background simple sequence: Topics by Science.gov

Sample records for background simple sequence

Developing expressed sequence tag libraries and the discovery of simple sequence repeat markers for two species of raspberry (Rubus L.)

USDA-ARS?s Scientific Manuscript database

Background: Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed S...
Not all transmembrane helices are born equal: Towards the extension of the sequence homology concept to membrane proteins

PubMed Central

2011-01-01

Background Sequence homology considerations widely used to transfer functional annotation to uncharacterized protein sequences require special precautions in the case of non-globular sequence segments including membrane-spanning stretches composed of non-polar residues. Simple, quantitative criteria are desirable for identifying transmembrane helices (TMs) that must be included into or should be excluded from start sequence segments in similarity searches aimed at finding distant homologues. Results We found that there are two types of TMs in membrane-associated proteins. On the one hand, there are so-called simple TMs with elevated hydrophobicity, low sequence complexity and extraordinary enrichment in long aliphatic residues. They merely serve as membrane-anchoring device. In contrast, so-called complex TMs have lower hydrophobicity, higher sequence complexity and some functional residues. These TMs have additional roles besides membrane anchoring such as intra-membrane complex formation, ligand binding or a catalytic role. Simple and complex TMs can occur both in single- and multi-membrane-spanning proteins essentially in any type of topology. Whereas simple TMs have the potential to confuse searches for sequence homologues and to generate unrelated hits with seemingly convincing statistical significance, complex TMs contain essential evolutionary information. Conclusion For extending the homology concept onto membrane proteins, we provide a necessary quantitative criterion to distinguish simple TMs (and a sufficient criterion for complex TMs) in query sequences prior to their usage in homology searches based on assessment of hydrophobicity and sequence complexity of the TM sequence segments. Reviewers This article was reviewed by Shamil Sunyaev, L. Aravind and Arcady Mushegian. PMID:22024092
Kangaroo – A pattern-matching program for biological sequences

PubMed Central

2002-01-01

Background Biologists are often interested in performing a simple database search to identify proteins or genes that contain a well-defined sequence pattern. Many databases do not provide straightforward or readily available query tools to perform simple searches, such as identifying transcription binding sites, protein motifs, or repetitive DNA sequences. However, in many cases simple pattern-matching searches can reveal a wealth of information. We present in this paper a regular expression pattern-matching tool that was used to identify short repetitive DNA sequences in human coding regions for the purpose of identifying potential mutation sites in mismatch repair deficient cells. Results Kangaroo is a web-based regular expression pattern-matching program that can search for patterns in DNA, protein, or coding region sequences in ten different organisms. The program is implemented to facilitate a wide range of queries with no restriction on the length or complexity of the query expression. The program is accessible on the web at http://bioinfo.mshri.on.ca/kangaroo/ and the source code is freely distributed at http://sourceforge.net/projects/slritools/. Conclusion A low-level simple pattern-matching application can prove to be a useful tool in many research settings. For example, Kangaroo was used to identify potential genetic targets in a human colorectal cancer variant that is characterized by a high frequency of mutations in coding regions containing mononucleotide repeats. PMID:12150718
FOUNTAIN: A JAVA open-source package to assist large sequencing projects

PubMed Central

Buerstedde, Jean-Marie; Prill, Florian

2001-01-01

Background Better automation, lower cost per reaction and a heightened interest in comparative genomics has led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results We describe here a new program, named FOUNTAIN, for the management of large sequencing projects . FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. Open source and largely platform and database independent, we wish FOUNTAIN to be improved and extended in a community effort. PMID:11591214
Robust Data Detection for the Photon-Counting Free-Space Optical System With Implicit CSI Acquisition and Background Radiation Compensation

NASA Astrophysics Data System (ADS)

Song, Tianyu; Kam, Pooi-Yuen

2016-02-01

Since atmospheric turbulence and pointing errors cause signal intensity fluctuations and the background radiation surrounding the free-space optical (FSO) receiver contributes an undesired noisy component, the receiver requires accurate channel state information (CSI) and background information to adjust the detection threshold. In most previous studies, for CSI acquisition, pilot symbols were employed, which leads to a reduction of spectral and energy efficiency; and an impractical assumption that the background radiation component is perfectly known was made. In this paper, we develop an efficient and robust sequence receiver, which acquires the CSI and the background information implicitly and requires no knowledge about the channel model information. It is robust since it can automatically estimate the CSI and background component and detect the data sequence accordingly. Its decision metric has a simple form and involves no integrals, and thus can be easily evaluated. A Viterbi-type trellis-search algorithm is adopted to improve the search efficiency, and a selective-store strategy is adopted to overcome a potential error floor problem as well as to increase the memory efficiency. To further simplify the receiver, a decision-feedback symbol-by-symbol receiver is proposed as an approximation of the sequence receiver. By simulations and theoretical analysis, we show that the performance of both the sequence receiver and the symbol-by-symbol receiver, approach that of detection with perfect knowledge of the CSI and background radiation, as the length of the window for forming the decision metric increases.
Capturing chloroplast variation for molecular ecology studies: a simple next generation sequencing approach applied to a rainforest tree

PubMed Central

2013-01-01

Background With high quantity and quality data production and low cost, next generation sequencing has the potential to provide new opportunities for plant phylogeographic studies on single and multiple species. Here we present an approach for in silicio chloroplast DNA assembly and single nucleotide polymorphism detection from short-read shotgun sequencing. The approach is simple and effective and can be implemented using standard bioinformatic tools. Results The chloroplast genome of Toona ciliata (Meliaceae), 159,514 base pairs long, was assembled from shotgun sequencing on the Illumina platform using de novo assembly of contigs. To evaluate its practicality, value and quality, we compared the short read assembly with an assembly completed using 454 data obtained after chloroplast DNA isolation. Sanger sequence verifications indicated that the Illumina dataset outperformed the longer read 454 data. Pooling of several individuals during preparation of the shotgun library enabled detection of informative chloroplast SNP markers. Following validation, we used the identified SNPs for a preliminary phylogeographic study of T. ciliata in Australia and to confirm low diversity across the distribution. Conclusions Our approach provides a simple method for construction of whole chloroplast genomes from shotgun sequencing of whole genomic DNA using short-read data and no available closely related reference genome (e.g. from the same species or genus). The high coverage of Illumina sequence data also renders this method appropriate for multiplexing and SNP discovery and therefore a useful approach for landscape level studies of evolutionary ecology. PMID:23497206
Action-perception coupling in violinists.

PubMed

Kajihara, Takafumi; Verdonschot, Rinus G; Sparks, Joseph; Stewart, Lauren

2013-01-01

The current study investigates auditory-motor coupling in musically trained participants using a Stroop-type task that required the execution of simple finger sequences according to aurally presented number sequences (e.g., "2," "4," "5," "3," "1"). Digital remastering was used to manipulate the pitch contour of the number sequences such that they were either congruent or incongruent with respect to the resulting action sequence. Conservatoire-level violinists showed a strong effect of congruency manipulation (increased response time for incongruent vs. congruent trials), in comparison to a control group of non-musicians. In Experiment 2, this paradigm was used to determine whether pedagogical background would influence this effect in a group of young violinists. Suzuki trained violinists differed significantly from those with no musical background, while traditionally trained violinists did not. The findings extend previous research in this area by demonstrating that obligatory audio-motor coupling is directly related to a musicians' expertise on their instrument of study and is influenced by pedagogy.
A Teaching Approach from the Exhaustive Search Method to the Needleman-Wunsch Algorithm

ERIC Educational Resources Information Center

Xu, Zhongneng; Yang, Yayun; Huang, Beibei

2017-01-01

The Needleman-Wunsch algorithm has become one of the core algorithms in bioinformatics; however, this programming requires more suitable explanations for students with different major backgrounds. In supposing sample sequences and using a simple store system, the connection between the exhaustive search method and the Needleman-Wunsch algorithm…
Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns

PubMed Central

2011-01-01

Background Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Results Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Conclusions Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks. PMID:21910886
Robust optical flow using adaptive Lorentzian filter for image reconstruction under noisy condition

NASA Astrophysics Data System (ADS)

Kesrarat, Darun; Patanavijit, Vorapoj

2017-02-01

In optical flow for motion allocation, the efficient result in Motion Vector (MV) is an important issue. Several noisy conditions may cause the unreliable result in optical flow algorithms. We discover that many classical optical flows algorithms perform better result under noisy condition when combined with modern optimized model. This paper introduces effective robust models of optical flow by using Robust high reliability spatial based optical flow algorithms using the adaptive Lorentzian norm influence function in computation on simple spatial temporal optical flows algorithm. Experiment on our proposed models confirm better noise tolerance in optical flow's MV under noisy condition when they are applied over simple spatial temporal optical flow algorithms as a filtering model in simple frame-to-frame correlation technique. We illustrate the performance of our models by performing an experiment on several typical sequences with differences in movement speed of foreground and background where the experiment sequences are contaminated by the additive white Gaussian noise (AWGN) at different noise decibels (dB). This paper shows very high effectiveness of noise tolerance models that they are indicated by peak signal to noise ratio (PSNR).
BioWord: A sequence manipulation suite for Microsoft Word

PubMed Central

2012-01-01

Background The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. Results BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. Conclusions BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms. PMID:22676326
GASP: Gapped Ancestral Sequence Prediction for proteins

PubMed Central

Edwards, Richard J; Shields, Denis C

2004-01-01

Background The prediction of ancestral protein sequences from multiple sequence alignments is useful for many bioinformatics analyses. Predicting ancestral sequences is not a simple procedure and relies on accurate alignments and phylogenies. Several algorithms exist based on Maximum Parsimony or Maximum Likelihood methods but many current implementations are unable to process residues with gaps, which may represent insertion/deletion (indel) events or sequence fragments. Results Here we present a new algorithm, GASP (Gapped Ancestral Sequence Prediction), for predicting ancestral sequences from phylogenetic trees and the corresponding multiple sequence alignments. Alignments may be of any size and contain gaps. GASP first assigns the positions of gaps in the phylogeny before using a likelihood-based approach centred on amino acid substitution matrices to assign ancestral amino acids. Important outgroup information is used by first working down from the tips of the tree to the root, using descendant data only to assign probabilities, and then working back up from the root to the tips using descendant and outgroup data to make predictions. GASP was tested on a number of simulated datasets based on real phylogenies. Prediction accuracy for ungapped data was similar to three alternative algorithms tested, with GASP performing better in some cases and worse in others. Adding simple insertions and deletions to the simulated data did not have a detrimental effect on GASP accuracy. Conclusions GASP (Gapped Ancestral Sequence Prediction) will predict ancestral sequences from multiple protein alignments of any size. Although not as accurate in all cases as some of the more sophisticated maximum likelihood approaches, it can process a wide range of input phylogenies and will predict ancestral sequences for gapped and ungapped residues alike. PMID:15350199
A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers

PubMed Central

Lewers, Kim S; Saski, Chris A; Cuthbertson, Brandon J; Henry, David C; Staton, Meg E; Main, Dorrie S; Dhanaraj, Anik L; Rowland, Lisa J; Tomkins, Jeff P

2008-01-01

Background The recent development of novel repeat-fruiting types of blackberry (Rubus L.) cultivars, combined with a long history of morphological marker-assisted selection for thornlessness by blackberry breeders, has given rise to increased interest in using molecular markers to facilitate blackberry breeding. Yet no genetic maps, molecular markers, or even sequences exist specifically for cultivated blackberry. The purpose of this study is to begin development of these tools by generating and annotating the first blackberry expressed sequence tag (EST) library, designing primers from the ESTs to amplify regions containing simple sequence repeats (SSR), and testing the usefulness of a subset of the EST-SSRs with two blackberry cultivars. Results A cDNA library of 18,432 clones was generated from expanding leaf tissue of the cultivar Merton Thornless, a progenitor of many thornless commercial cultivars. Among the most abundantly expressed of the 3,000 genes annotated were those involved with energy, cell structure, and defense. From individual sequences containing SSRs, 673 primer pairs were designed. Of a randomly chosen set of 33 primer pairs tested with two blackberry cultivars, 10 detected an average of 1.9 polymorphic PCR products. Conclusion This rate predicts that this library may yield as many as 940 SSR primer pairs detecting 1,786 polymorphisms. This may be sufficient to generate a genetic map that can be used to associate molecular markers with phenotypic traits, making possible molecular marker-assisted breeding to compliment existing morphological marker-assisted breeding in blackberry. PMID:18570660
Changes in spinal reflex excitability associated with motor sequence learning.

PubMed

Lungu, Ovidiu; Frigon, Alain; Piché, Mathieu; Rainville, Pierre; Rossignol, Serge; Doyon, Julien

2010-05-01

There is ample evidence that motor sequence learning is mediated by changes in brain activity. Yet the question of whether this form of learning elicits changes detectable at the spinal cord level has not been addressed. To date, studies in humans have revealed that spinal reflex activity may be altered during the acquisition of various motor skills, but a link between motor sequence learning and changes in spinal excitability has not been demonstrated. To address this issue, we studied the modulation of H-reflex amplitude evoked in the flexor carpi radialis muscle of 14 healthy individuals between blocks of movements that involved the implicit acquisition of a sequence versus other movements that did not require learning. Each participant performed the task in three conditions: "sequence"-externally triggered, repeating and sequential movements, "random"-similar movements, but performed in an arbitrary order, and "simple"- involving alternating movements in a left-right or up-down direction only. When controlling for background muscular activity, H-reflex amplitude was significantly more reduced in the sequence (43.8 +/- 1.47%. mean +/- SE) compared with the random (38.2 +/- 1.60%) and simple (31.5 +/- 1.82%) conditions, while the M-response was not different across conditions. Furthermore, H-reflex changes were observed from the beginning of the learning process up to when subjects reached asymptotic performance on the motor task. Changes also persisted for >60 s after motor activity ceased. Such findings suggest that the excitability in some spinal reflex circuits is altered during the implicit learning process of a new motor sequence.
Identification of simple objects in image sequences

NASA Astrophysics Data System (ADS)

Geiselmann, Christoph; Hahn, Michael

1994-08-01

We present an investigation in the identification and location of simple objects in color image sequences. As an example the identification of traffic signs is discussed. Three aspects are of special interest. First regions have to be detected which may contain the object. The separation of those regions from the background can be based on color, motion, and contours. In the experiments all three possibilities are investigated. The second aspect focuses on the extraction of suitable features for the identification of the objects. For that purpose the border line of the region of interest is used. For planar objects a sufficient approximation of perspective projection is affine mapping. In consequence, it is near at hand to extract affine-invariant features from the border line. The investigation includes invariant features based on Fourier descriptors and moments. Finally, the object is identified by maximum likelihood classification. In the experiments all three basic object types are correctly identified. The probabilities for misclassification have been found to be below 1%
Cytogenetic and molecular markers for detecting Aegilops uniaristata chromosomes in a wheat background.

PubMed

Gong, Wenping; Li, Guangrong; Zhou, Jianping; Li, Genying; Liu, Cheng; Huang, Chengyan; Zhao, Zhendong; Yang, Zujun

2014-09-01

Aegilops uniaristata has many agronomically useful traits that can be used for wheat breeding. So far, a Triticum turgidum - Ae. uniaristata amphiploid and one set of Chinese Spring (CS) - Ae. uniaristata addition lines have been produced. To guide Ae. uniaristata chromatin transformation from these lines into cultivated wheat through chromosome engineering, reliable cytogenetic and molecular markers specific for Ae. uniaristata chromosomes need to be developed. Standard C-banding shows that C-bands mainly exist in the centromeric regions of Ae. uniaristata but rarely at the distal ends. Fluorescence in situ hybridization (FISH) using (GAA)8 as a probe showed that the hybridization signal of chromosomes 1N-7N are different, thus (GAA)8 can be used to identify all Ae. uniaristata chromosomes in wheat background simultaneously. Moreover, a total of 42 molecular markers specific for Ae. uniaristata chromosomes were developed by screening expressed sequence tag - sequence tagged site (EST-STS), expressed sequence tag - simple sequence repeat (EST-SSR), and PCR-based landmark unique gene (PLUG) primers. The markers were subsequently localized using the CS - Ae. uniaristata addition lines and different wheat cultivars as controls. The cytogenetic and molecular markers developed herein will be helpful for screening and identifying wheat - Ae. uniaristata progeny.
Research on the algorithm of infrared target detection based on the frame difference and background subtraction method

NASA Astrophysics Data System (ADS)

Liu, Yun; Zhao, Yuejin; Liu, Ming; Dong, Liquan; Hui, Mei; Liu, Xiaohua; Wu, Yijian

2015-09-01

As an important branch of infrared imaging technology, infrared target tracking and detection has a very important scientific value and a wide range of applications in both military and civilian areas. For the infrared image which is characterized by low SNR and serious disturbance of background noise, an innovative and effective target detection algorithm is proposed in this paper, according to the correlation of moving target frame-to-frame and the irrelevance of noise in sequential images based on OpenCV. Firstly, since the temporal differencing and background subtraction are very complementary, we use a combined detection method of frame difference and background subtraction which is based on adaptive background updating. Results indicate that it is simple and can extract the foreground moving target from the video sequence stably. For the background updating mechanism continuously updating each pixel, we can detect the infrared moving target more accurately. It paves the way for eventually realizing real-time infrared target detection and tracking, when transplanting the algorithms on OpenCV to the DSP platform. Afterwards, we use the optimal thresholding arithmetic to segment image. It transforms the gray images to black-white images in order to provide a better condition for the image sequences detection. Finally, according to the relevance of moving objects between different frames and mathematical morphology processing, we can eliminate noise, decrease the area, and smooth region boundaries. Experimental results proves that our algorithm precisely achieve the purpose of rapid detection of small infrared target.
Simple Sequence Repeats Provide a Substrate for Phenotypic Variation in the Neurospora crassa Circadian Clock

PubMed Central

Michael, Todd P.; Park, Sohyun; Kim, Tae-Sung; Booth, Jim; Byer, Amanda; Sun, Qi; Chory, Joanne; Lee, Kwangwon

2007-01-01

Background WHITE COLLAR-1 (WC-1) mediates interactions between the circadian clock and the environment by acting as both a core clock component and as a blue light photoreceptor in Neurospora crassa. Loss of the amino-terminal polyglutamine (NpolyQ) domain in WC-1 results in an arrhythmic circadian clock; this data is consistent with this simple sequence repeat (SSR) being essential for clock function. Methodology/Principal Findings Since SSRs are often polymorphic in length across natural populations, we reasoned that investigating natural variation of the WC-1 NpolyQ may provide insight into its role in the circadian clock. We observed significant phenotypic variation in the period, phase and temperature compensation of circadian regulated asexual conidiation across 143 N. crassa accessions. In addition to the NpolyQ, we identified two other simple sequence repeats in WC-1. The sizes of all three WC-1 SSRs correlated with polymorphisms in other clock genes, latitude and circadian period length. Furthermore, in a cross between two N. crassa accessions, the WC-1 NpolyQ co-segregated with period length. Conclusions/Significance Natural variation of the WC-1 NpolyQ suggests a mechanism by which period length can be varied and selected for by the local environment that does not deleteriously affect WC-1 activity. Understanding natural variation in the N. crassa circadian clock will facilitate an understanding of how fungi exploit their environments. PMID:17726525
Simple, multiplexed, PCR-based barcoding of DNA enables sensitive mutation detection in liquid biopsies using sequencing.

PubMed

Ståhlberg, Anders; Krzyzanowski, Paul M; Jackson, Jennifer B; Egyud, Matthew; Stein, Lincoln; Godfrey, Tony E

2016-06-20

Detection of cell-free DNA in liquid biopsies offers great potential for use in non-invasive prenatal testing and as a cancer biomarker. Fetal and tumor DNA fractions however can be extremely low in these samples and ultra-sensitive methods are required for their detection. Here, we report an extremely simple and fast method for introduction of barcodes into DNA libraries made from 5 ng of DNA. Barcoded adapter primers are designed with an oligonucleotide hairpin structure to protect the molecular barcodes during the first rounds of polymerase chain reaction (PCR) and prevent them from participating in mis-priming events. Our approach enables high-level multiplexing and next-generation sequencing library construction with flexible library content. We show that uniform libraries of 1-, 5-, 13- and 31-plex can be generated. Utilizing the barcodes to generate consensus reads for each original DNA molecule reduces background sequencing noise and allows detection of variant alleles below 0.1% frequency in clonal cell line DNA and in cell-free plasma DNA. Thus, our approach bridges the gap between the highly sensitive but specific capabilities of digital PCR, which only allows a limited number of variants to be analyzed, with the broad target capability of next-generation sequencing which traditionally lacks the sensitivity to detect rare variants. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
A Simple and Efficient Methodology To Improve Geometric Accuracy in Gamma Knife Radiation Surgery: Implementation in Multiple Brain Metastases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karaiskos, Pantelis, E-mail: pkaraisk@med.uoa.gr; Gamma Knife Department, Hygeia Hospital, Athens; Moutsatsos, Argyris

Purpose: To propose, verify, and implement a simple and efficient methodology for the improvement of total geometric accuracy in multiple brain metastases gamma knife (GK) radiation surgery. Methods and Materials: The proposed methodology exploits the directional dependence of magnetic resonance imaging (MRI)-related spatial distortions stemming from background field inhomogeneities, also known as sequence-dependent distortions, with respect to the read-gradient polarity during MRI acquisition. First, an extra MRI pulse sequence is acquired with the same imaging parameters as those used for routine patient imaging, aside from a reversal in the read-gradient polarity. Then, “average” image data are compounded from data acquiredmore » from the 2 MRI sequences and are used for treatment planning purposes. The method was applied and verified in a polymer gel phantom irradiated with multiple shots in an extended region of the GK stereotactic space. Its clinical impact in dose delivery accuracy was assessed in 15 patients with a total of 96 relatively small (<2 cm) metastases treated with GK radiation surgery. Results: Phantom study results showed that use of average MR images eliminates the effect of sequence-dependent distortions, leading to a total spatial uncertainty of less than 0.3 mm, attributed mainly to gradient nonlinearities. In brain metastases patients, non-eliminated sequence-dependent distortions lead to target localization uncertainties of up to 1.3 mm (mean: 0.51 ± 0.37 mm) with respect to the corresponding target locations in the “average” MRI series. Due to these uncertainties, a considerable underdosage (5%-32% of the prescription dose) was found in 33% of the studied targets. Conclusions: The proposed methodology is simple and straightforward in its implementation. Regarding multiple brain metastases applications, the suggested approach may substantially improve total GK dose delivery accuracy in smaller, outlying targets.« less

Ultrafast Pulse Sequencing for Fast Projective Measurements of Atomic Hyperfine Qubits

NASA Astrophysics Data System (ADS)

Ip, Michael; Ransford, Anthony; Campbell, Wesley

2015-05-01

Projective readout of quantum information stored in atomic hyperfine structure typically uses state-dependent CW laser-induced fluorescence. This method requires an often sophisticated imaging system to spatially filter out the background CW laser light. We present an alternative approach that instead uses simple pulse sequences from a mode-locked laser to affect the same state-dependent excitations in less than 1 ns. The resulting atomic fluorescence occurs in the dark, allowing the placement of non-imaging detectors right next to the atom to improve the qubit state detection efficiency and speed. We also discuss methods of Doppler cooling with mode-locked lasers for trapped ions, where the creation of the necessary UV light is often difficult with CW lasers.
Stimulus novelty, task relevance and the visual evoked potential in man

NASA Technical Reports Server (NTRS)

Courchesne, E.; Hillyard, S. A.; Galambos, R.

1975-01-01

The effect of task relevance on P3 (waveform of human evoked potential) waves and the methodologies used to deal with them are outlined. Visual evoked potentials (VEPs) were recorded from normal adult subjects performing in a visual discrimination task. Subjects counted the number of presentations of the numeral 4 which was interposed rarely and randomly within a sequence of tachistoscopically flashed background stimuli. Intrusive, task-irrelevant (not counted) stimuli were also interspersed rarely and randomly in the sequence of 2s; these stimuli were of two types: simples, which were easily recognizable, and novels, which were completely unrecognizable. It was found that the simples and the counted 4s evoked posteriorly distributed P3 waves while the irrelevant novels evoked large, frontally distributed P3 waves. These large, frontal P3 waves to novels were also found to be preceded by large N2 waves. These findings indicate that the P3 wave is not a unitary phenomenon but should be considered in terms of a family of waves, differing in their brain generators and in their psychological correlates.
Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

PubMed Central

Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

2013-01-01

Background Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) unigenes showed similarity to the sequences in NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotypes analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799
Genome Wide Characterization of Simple Sequence Repeats in Cucumber

USDA-ARS?s Scientific Manuscript database

The whole genome sequence of the cucumber cultivar Gy14 was recently sequenced at 15× coverage with the Roche 454 Titanium technology. The microsatellite DNA sequences (simple sequence repeats, SSRs) in the assembled scaffolds were computationally explored and characterized. A total of 112,073 SSRs ...
Generation and analysis of expressed sequence tags from a cDNA library of the fruiting body of Ganoderma lucidum

PubMed Central

2010-01-01

Background Little genomic or trancriptomic information on Ganoderma lucidum (Lingzhi) is known. This study aims to discover the transcripts involved in secondary metabolite biosynthesis and developmental regulation of G. lucidum using an expressed sequence tag (EST) library. Methods A cDNA library was constructed from the G. lucidum fruiting body. Its high-quality ESTs were assembled into unique sequences with contigs and singletons. The unique sequences were annotated according to sequence similarities to genes or proteins available in public databases. The detection of simple sequence repeats (SSRs) was preformed by online analysis. Results A total of 1,023 clones were randomly selected from the G. lucidum library and sequenced, yielding 879 high-quality ESTs. These ESTs showed similarities to a diverse range of genes. The sequences encoding squalene epoxidase (SE) and farnesyl-diphosphate synthase (FPS) were identified in this EST collection. Several candidate genes, such as hydrophobin, MOB2, profilin and PHO84 were detected for the first time in G. lucidum. Thirteen (13) potential SSR-motif microsatellite loci were also identified. Conclusion The present study demonstrates a successful application of EST analysis in the discovery of transcripts involved in the secondary metabolite biosynthesis and the developmental regulation of G. lucidum. PMID:20230644
Microsatellite analysis in the genome of Acanthaceae: An in silico approach

PubMed Central

Kaliswamy, Priyadharsini; Vellingiri, Srividhya; Nathan, Bharathi; Selvaraj, Saravanakumar

2015-01-01

Background: Acanthaceae is one of the advanced and specialized families with conventionally used medicinal plants. Simple sequence repeats (SSRs) play a major role as molecular markers for genome analysis and plant breeding. The microsatellites existing in the complete genome sequences would help to attain a direct role in the genome organization, recombination, gene regulation, quantitative genetic variation, and evolution of genes. Objective: The current study reports the frequency of microsatellites and appropriate markers for the Acanthaceae family genome sequences. Materials and Methods: The whole nucleotide sequences of Acanthaceae species were obtained from National Center for Biotechnology Information database and screened for the presence of SSRs. SSR Locator tool was used to predict the microsatellites and inbuilt Primer3 module was used for primer designing. Results: Totally 110 repeats from 108 sequences of Acanthaceae family plant genomes were identified, and the occurrence of dinucleotide repeats was found to be abundant in the genome sequences. The essential amino acid isoleucine was found rich in all the sequences. We also designed the SSR-based primers/markers for 59 sequences of this family that contains microsatellite repeats in their genome. Conclusion: The identified microsatellites and primers might be useful for breeding and genetic studies of plants that belong to Acanthaceae family in the future. PMID:25709226
Fusion primer and nested integrated PCR (FPNI-PCR): a new high-efficiency strategy for rapid chromosome walking or flanking sequence cloning

PubMed Central

2011-01-01

Background The advent of genomics-based technologies has revolutionized many fields of biological enquiry. However, chromosome walking or flanking sequence cloning is still a necessary and important procedure to determining gene structure. Such methods are used to identify T-DNA insertion sites and so are especially relevant for organisms where large T-DNA insertion libraries have been created, such as rice and Arabidopsis. The currently available methods for flanking sequence cloning, including the popular TAIL-PCR technique, are relatively laborious and slow. Results Here, we report a simple and effective fusion primer and nested integrated PCR method (FPNI-PCR) for the identification and cloning of unknown genomic regions flanked known sequences. In brief, a set of universal primers was designed that consisted of various 15-16 base arbitrary degenerate oligonucleotides. These arbitrary degenerate primers were fused to the 3' end of an adaptor oligonucleotide which provided a known sequence without degenerate nucleotides, thereby forming the fusion primers (FPs). These fusion primers are employed in the first step of an integrated nested PCR strategy which defines the overall FPNI-PCR protocol. In order to demonstrate the efficacy of this novel strategy, we have successfully used it to isolate multiple genomic sequences namely, 21 orthologs of genes in various species of Rosaceace, 4 MYB genes of Rosa rugosa, 3 promoters of transcription factors of Petunia hybrida, and 4 flanking sequences of T-DNA insertion sites in transgenic tobacco lines and 6 specific genes from sequenced genome of rice and Arabidopsis. Conclusions The successful amplification of target products through FPNI-PCR verified that this novel strategy is an effective, low cost and simple procedure. Furthermore, FPNI-PCR represents a more sensitive, rapid and accurate technique than the established TAIL-PCR and hiTAIL-PCR procedures. PMID:22093809
A rapid and simple method of detection of Blepharisma japonicum using PCR and immobilisation on FTA paper

PubMed Central

Hide, Geoff; Hughes, Jacqueline M; McNuff, Robert

2003-01-01

Background The rapid expansion in the availability of genome and DNA sequence information has opened up new possibilities for the development of methods for detecting free-living protozoa in environmental samples. The protozoan Blepharisma japonicum was used to investigate a rapid and simple detection system based on polymerase chain reaction amplification (PCR) from organisms immobilised on FTA paper. Results Using primers designed from the α-tubulin genes of Blepharisma, specific and sensitive detection to the equivalent of a single Blepharisma cell could be achieved. Similar detection levels were found using water samples, containing Blepharisma, which were dried onto Whatman FTA paper. Conclusion This system has potential as a sensitive convenient detection system for Blepharisma and could be applied to other protozoan organisms. PMID:14516472
Universal sequence map (USM) of arbitrary discrete sequences

PubMed Central

2002-01-01

Background For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist on being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results We have successfully identified such an iterative function for bijective mappingψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool:http://bioinformatics.musc.edu/~jonas/usm/. Conclusions USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules. PMID:11895567
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age

PubMed Central

2010-01-01

Background Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. Results We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. Conclusions GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events. PMID:20534164
Analysis of expressed sequence tags from Prunus mume flower and fruit and development of simple sequence repeat markers

PubMed Central

2010-01-01

Background Expressed Sequence Tag (EST) has been a cost-effective tool in molecular biology and represents an abundant valuable resource for genome annotation, gene expression, and comparative genomics in plants. Results In this study, we constructed a cDNA library of Prunus mume flower and fruit, sequenced 10,123 clones of the library, and obtained 8,656 expressed sequence tag (EST) sequences with high quality. The ESTs were assembled into 4,473 unigenes composed of 1,492 contigs and 2,981 singletons and that have been deposited in NCBI (accession IDs: GW868575 - GW873047), among which 1,294 unique ESTs were with known or putative functions. Furthermore, we found 1,233 putative simple sequence repeats (SSRs) in the P. mume unigene dataset. We randomly tested 42 pairs of PCR primers flanking potential SSRs, and 14 pairs were identified as true-to-type SSR loci and could amplify polymorphic bands from 20 individual plants of P. mume. We further used the 14 EST-SSR primer pairs to test the transferability on peach and plum. The result showed that nearly 89% of the primer pairs produced target PCR bands in the two species. A high level of marker polymorphism was observed in the plum species (65%) and low in the peach (46%), and the clustering analysis of the three species indicated that these SSR markers were useful in the evaluation of genetic relationships and diversity between and within the Prunus species. Conclusions We have constructed the first cDNA library of P. mume flower and fruit, and our data provide sets of molecular biology resources for P. mume and other Prunus species. These resources will be useful for further study such as genome annotation, new gene discovery, gene functional analysis, molecular breeding, evolution and comparative genomics between Prunus species. PMID:20626882
Statistical tests to compare motif count exceptionalities

PubMed Central

Robin, Stéphane; Schbath, Sophie; Vandewalle, Vincent

2007-01-01

Background Finding over- or under-represented motifs in biological sequences is now a common task in genomics. Thanks to p-value calculation for motif counts, exceptional motifs are identified and represent candidate functional motifs. The present work addresses the related question of comparing the exceptionality of one motif in two different sequences. Just comparing the motif count p-values in each sequence is indeed not sufficient to decide if this motif is significantly more exceptional in one sequence compared to the other one. A statistical test is required. Results We develop and analyze two statistical tests, an exact binomial one and an asymptotic likelihood ratio test, to decide whether the exceptionality of a given motif is equivalent or significantly different in two sequences of interest. For that purpose, motif occurrences are modeled by Poisson processes, with a special care for overlapping motifs. Both tests can take the sequence compositions into account. As an illustration, we compare the octamer exceptionalities in the Escherichia coli K-12 backbone versus variable strain-specific loops. Conclusion The exact binomial test is particularly adapted for small counts. For large counts, we advise to use the likelihood ratio test which is asymptotic but strongly correlated with the exact binomial test and very simple to use. PMID:17346349
Investigation of sequential properties of snoring episodes for obstructive sleep apnoea identification.

PubMed

Cavusoglu, M; Ciloglu, T; Serinagaoglu, Y; Kamasak, M; Erogul, O; Akcam, T

2008-08-01

In this paper, 'snore regularity' is studied in terms of the variations of snoring sound episode durations, separations and average powers in simple snorers and in obstructive sleep apnoea (OSA) patients. The goal was to explore the possibility of distinguishing among simple snorers and OSA patients using only sleep sound recordings of individuals and to ultimately eliminate the need for spending a whole night in the clinic for polysomnographic recording. Sequences that contain snoring episode durations (SED), snoring episode separations (SES) and average snoring episode powers (SEP) were constructed from snoring sound recordings of 30 individuals (18 simple snorers and 12 OSA patients) who were also under polysomnographic recording in Gülhane Military Medical Academy Sleep Studies Laboratory (GMMA-SSL), Ankara, Turkey. Snore regularity is quantified in terms of mean, standard deviation and coefficient of variation values for the SED, SES and SEP sequences. In all three of these sequences, OSA patients' data displayed a higher variation than those of simple snorers. To exclude the effects of slow variations in the base-line of these sequences, new sequences that contain the coefficient of variation of the sample values in a 'short' signal frame, i.e., short time coefficient of variation (STCV) sequences, were defined. The mean, the standard deviation and the coefficient of variation values calculated from the STCV sequences displayed a stronger potential to distinguish among simple snorers and OSA patients than those obtained from the SED, SES and SEP sequences themselves. Spider charts were used to jointly visualize the three parameters, i.e., the mean, the standard deviation and the coefficient of variation values of the SED, SES and SEP sequences, and the corresponding STCV sequences as two-dimensional plots. Our observations showed that the statistical parameters obtained from the SED and SES sequences, and the corresponding STCV sequences, possessed a strong potential to distinguish among simple snorers and OSA patients, both marginally, i.e., when the parameters are examined individually, and jointly. The parameters obtained from the SEP sequences and the corresponding STCV sequences, on the other hand, did not have a strong discrimination capability. However, the joint behaviour of these parameters showed some potential to distinguish among simple snorers and OSA patients.
De Novo Assembly of the Japanese Flounder (Paralichthys olivaceus) Spleen Transcriptome to Identify Putative Genes Involved in Immunity

PubMed Central

Huang, Lin; Li, Guiyang; Mo, Zhaolan; Xiao, Peng; Li, Jie; Huang, Jie

2015-01-01

Background Japanese flounder (Paralichthys olivaceus) is an economically important marine fish in Asia and has suffered from disease outbreaks caused by various pathogens, which requires more information for immune relevant genes on genome background. However, genomic and transcriptomic data for Japanese flounder remain scarce, which limits studies on the immune system of this species. In this study, we characterized the Japanese flounder spleen transcriptome using an Illumina paired-end sequencing platform to identify putative genes involved in immunity. Methodology/Principal Findings A cDNA library from the spleen of P. olivaceus was constructed and randomly sequenced using an Illumina technique. The removal of low quality reads generated 12,196,968 trimmed reads, which assembled into 96,627 unigenes. A total of 21,391 unigenes (22.14%) were annotated in the NCBI Nr database, and only 1.1% of the BLASTx top-hits matched P. olivaceus protein sequences. Approximately 12,503 (58.45%) unigenes were categorized into three Gene Ontology groups, 19,547 (91.38%) were classified into 26 Cluster of Orthologous Groups, and 10,649 (49.78%) were assigned to six Kyoto Encyclopedia of Genes and Genomes pathways. Furthermore, 40,928 putative simple sequence repeats and 47, 362 putative single nucleotide polymorphisms were identified. Importantly, we identified 1,563 putative immune-associated unigenes that mapped to 15 immune signaling pathways. Conclusions/Significance The P. olivaceus transciptome data provides a rich source to discover and identify new genes, and the immune-relevant sequences identified here will facilitate our understanding of the mechanisms involved in the immune response. Furthermore, the plentiful potential SSRs and SNPs found in this study are important resources with respect to future development of a linkage map or marker assisted breeding programs for the flounder. PMID:25723398
Selective whole genome amplification for resequencing target microbial species from complex natural samples.

PubMed

Leichty, Aaron R; Brisson, Dustin

2014-10-01

Population genomic analyses have demonstrated power to address major questions in evolutionary and molecular microbiology. Collecting populations of genomes is hindered in many microbial species by the absence of a cost effective and practical method to collect ample quantities of sufficiently pure genomic DNA for next-generation sequencing. Here we present a simple method to amplify genomes of a target microbial species present in a complex, natural sample. The selective whole genome amplification (SWGA) technique amplifies target genomes using nucleotide sequence motifs that are common in the target microbe genome, but rare in the background genomes, to prime the highly processive phi29 polymerase. SWGA thus selectively amplifies the target genome from samples in which it originally represented a minor fraction of the total DNA. The post-SWGA samples are enriched in target genomic DNA, which are ideal for population resequencing. We demonstrate the efficacy of SWGA using both laboratory-prepared mixtures of cultured microbes as well as a natural host-microbe association. Targeted amplification of Borrelia burgdorferi mixed with Escherichia coli at genome ratios of 1:2000 resulted in >10(5)-fold amplification of the target genomes with <6.7-fold amplification of the background. SWGA-treated genomic extracts from Wolbachia pipientis-infected Drosophila melanogaster resulted in up to 70% of high-throughput resequencing reads mapping to the W. pipientis genome. By contrast, 2-9% of sequencing reads were derived from W. pipientis without prior amplification. The SWGA technique results in high sequencing coverage at a fraction of the sequencing effort, thus allowing population genomic studies at affordable costs. Copyright © 2014 by the Genetics Society of America.
Nanopore Technology: A Simple, Inexpensive, Futuristic Technology for DNA Sequencing.

PubMed

Gupta, P D

2016-10-01

In health care, importance of DNA sequencing has been fully established. Sanger's Capillary Electrophoresis DNA sequencing methodology is time consuming, cumbersome, hence become more expensive. Lately, because of its versatility DNA sequencing became house hold name, and therefore, there is an urgent need of simple, fast, inexpensive, DNA sequencing technology. In the beginning of this century efforts were made, and Nanopore DNA sequencing technology was developed; still it is infancy, nevertheless, it is the futuristic technology.
Information categorization approach to literary authorship disputes

NASA Astrophysics Data System (ADS)

Yang, Albert C.-C.; Peng, C.-K.; Yien, H.-W.; Goldberger, Ary L.

2003-11-01

Scientific analysis of the linguistic styles of different authors has generated considerable interest. We present a generic approach to measuring the similarity of two symbolic sequences that requires minimal background knowledge about a given human language. Our analysis is based on word rank order-frequency statistics and phylogenetic tree construction. We demonstrate the applicability of this method to historic authorship questions related to the classic Chinese novel “The Dream of the Red Chamber,” to the plays of William Shakespeare, and to the Federalist papers. This method may also provide a simple approach to other large databases based on their information content.
Simultaneous phylogeny reconstruction and multiple sequence alignment

PubMed Central

Yue, Feng; Shi, Jian; Tang, Jijun

2009-01-01

Background A phylogeny is the evolutionary history of a group of organisms. To date, sequence data is still the most used data type for phylogenetic reconstruction. Before any sequences can be used for phylogeny reconstruction, they must be aligned, and the quality of the multiple sequence alignment has been shown to affect the quality of the inferred phylogeny. At the same time, all the current multiple sequence alignment programs use a guide tree to produce the alignment and experiments showed that good guide trees can significantly improve the multiple alignment quality. Results We devise a new algorithm to simultaneously align multiple sequences and search for the phylogenetic tree that leads to the best alignment. We also implemented the algorithm as a C program package, which can handle both DNA and protein data and can take simple cost model as well as complex substitution matrices, such as PAM250 or BLOSUM62. The performance of the new method are compared with those from other popular multiple sequence alignment tools, including the widely used programs such as ClustalW and T-Coffee. Experimental results suggest that this method has good performance in terms of both phylogeny accuracy and alignment quality. Conclusion We present an algorithm to align multiple sequences and reconstruct the phylogenies that minimize the alignment score, which is based on an efficient algorithm to solve the median problems for three sequences. Our extensive experiments suggest that this method is very promising and can produce high quality phylogenies and alignments. PMID:19208110
Genome-wide characterization and selection of expressed sequence tag simple sequence repeat primers for optimized marker distribution and reliability in peach

USDA-ARS?s Scientific Manuscript database

Expressed sequence tag (EST) simple sequence repeats (SSRs) in Prunus were mined, and flanking primers designed and used for genome-wide characterization and selection of primers to optimize marker distribution and reliability. A total of 12,618 contigs were assembled from 84,727 ESTs, along with 34...
Implications of Secondary Aftershocks for Failure Processes

NASA Astrophysics Data System (ADS)

Gross, S. J.

2001-12-01

When a seismic sequence with more than one mainshock or an unusually large aftershock occurs, there is a compound aftershock sequence. The secondary aftershocks need not have exactly the same decay as the primary sequence, with the differences having implications for the failure process. When the stress step from the secondary mainshock is positive but not large enough to cause immediate failure of all the remaining primary aftershocks, failure processes which involve accelerating slip will produce secondary aftershocks that decay more rapidly than primary aftershocks. This is because the primary aftershocks are an accelerated version of the background seismicity, and secondary aftershocks are an accelerated version of the primary aftershocks. Real stress perturbations may be negative, and heterogeneities in mainshock stress fields mean that the real world situation is quite complicated. I will first describe and verify my picture of secondary aftershock decay with reference to a simple numerical model of slipping faults which obeys rate and state dependent friction and lacks stress heterogeneity. With such a model, it is possible to generate secondary aftershock sequences with perturbed decay patterns, quantify those patterns, and develop an analysis technique capable of correcting for the effect in real data. The secondary aftershocks are defined in terms of frequency linearized time s(T), which is equal to the number of primary aftershocks expected by a time T, $ s ≡ ∫ t=0T n(t) dt, where the start time t=0 is the time of the primary aftershock, and the primary aftershock decay function n(t) is extrapolated forward to the times of the secondary aftershocks. In the absence of secondary sequences the function s(T)$ re-scales the time so that approximately one event occurs per new time unit; the aftershock sequence is gone. If this rescaling is applied in the presence of a secondary sequence, the secondary sequence is shaped like a primary aftershock sequence, and can be fit by the same modeling techniques applied to simple sequences. The later part of the presentation will concern the decay of Hector Mine aftershocks as influenced by the Landers aftershocks. Although attempts to predict the abundance of Hector aftershocks based on stress overlap analysis are not very successful, the analysis does do a good job fitting the decay of secondary sequences.

Compression-based distance (CBD): a simple, rapid, and accurate method for microbiota composition comparison

PubMed Central

2013-01-01

Background Perturbations in intestinal microbiota composition have been associated with a variety of gastrointestinal tract-related diseases. The alleviation of symptoms has been achieved using treatments that alter the gastrointestinal tract microbiota toward that of healthy individuals. Identifying differences in microbiota composition through the use of 16S rRNA gene hypervariable tag sequencing has profound health implications. Current computational methods for comparing microbial communities are usually based on multiple alignments and phylogenetic inference, making them time consuming and requiring exceptional expertise and computational resources. As sequencing data rapidly grows in size, simpler analysis methods are needed to meet the growing computational burdens of microbiota comparisons. Thus, we have developed a simple, rapid, and accurate method, independent of multiple alignments and phylogenetic inference, to support microbiota comparisons. Results We create a metric, called compression-based distance (CBD) for quantifying the degree of similarity between microbial communities. CBD uses the repetitive nature of hypervariable tag datasets and well-established compression algorithms to approximate the total information shared between two datasets. Three published microbiota datasets were used as test cases for CBD as an applicable tool. Our study revealed that CBD recaptured 100% of the statistically significant conclusions reported in the previous studies, while achieving a decrease in computational time required when compared to similar tools without expert user intervention. Conclusion CBD provides a simple, rapid, and accurate method for assessing distances between gastrointestinal tract microbiota 16S hypervariable tag datasets. PMID:23617892
Quantification of the cerebrospinal fluid from a new whole body MRI sequence

NASA Astrophysics Data System (ADS)

Lebret, Alain; Petit, Eric; Durning, Bruno; Hodel, Jérôme; Rahmouni, Alain; Decq, Philippe

2012-03-01

Our work aims to develop a biomechanical model of hydrocephalus both intended to perform clinical research and to assist the neurosurgeon in diagnosis decisions. Recently, we have defined a new MR imaging sequence based on SPACE (Sampling Perfection with Application optimized Contrast using different flip-angle Evolution). On these images, the cerebrospinal fluid (CSF) appears as a homogeneous hypersignal. Therefore such images are suitable for segmentation and for volume assessment of the CSF. In this paper we present a fully automatic 3D segmentation of such SPACE MRI sequences. We choose a topological approach considering that CSF can be modeled as a simply connected object (i.e. a filled sphere). First an initial object which must be strictly included in the CSF and homotopic to a filled sphere, is determined by using a moment-preserving thresholding. Then a priority function based on an Euclidean distance map is computed in order to control the thickening process that adds "simple points" to the initial thresholded object. A point is called simple if its addition or its suppression does not result in change of topology neither for the object, nor for the background. The method is validated by measuring fluid volume of brain phantoms and by comparing our volume assessments on clinical data to those derived from a segmentation controlled by expert physicians. Then we show that a distinction between pathological cases and healthy adult people can be achieved by a linear discriminant analysis on volumes of the ventricular and intracranial subarachnoid spaces.
Early forest fire detection using principal component analysis of infrared video

NASA Astrophysics Data System (ADS)

Saghri, John A.; Radjabi, Ryan; Jacobs, John T.

2011-09-01

A land-based early forest fire detection scheme which exploits the infrared (IR) temporal signature of fire plume is described. Unlike common land-based and/or satellite-based techniques which rely on measurement and discrimination of fire plume directly from its infrared and/or visible reflectance imagery, this scheme is based on exploitation of fire plume temporal signature, i.e., temperature fluctuations over the observation period. The method is simple and relatively inexpensive to implement. The false alarm rate is expected to be lower that of the existing methods. Land-based infrared (IR) cameras are installed in a step-stare-mode configuration in potential fire-prone areas. The sequence of IR video frames from each camera is digitally processed to determine if there is a fire within camera's field of view (FOV). The process involves applying a principal component transformation (PCT) to each nonoverlapping sequence of video frames from the camera to produce a corresponding sequence of temporally-uncorrelated principal component (PC) images. Since pixels that form a fire plume exhibit statistically similar temporal variation (i.e., have a unique temporal signature), PCT conveniently renders the footprint/trace of the fire plume in low-order PC images. The PC image which best reveals the trace of the fire plume is then selected and spatially filtered via simple threshold and median filter operations to remove the background clutter, such as traces of moving tree branches due to wind.
Always look on both sides: Phylogenetic information conveyed by simple sequence repeat allele sequences

USDA-ARS?s Scientific Manuscript database

Simple sequence repeat (SSR) markers are widely used tools for inferences about genetic diversity, phylogeography and spatial genetic structure. Their applications assume that variation among alleles is essentially caused by an expansion or contraction of the number of repeats and that, accessorily,...
Digital detection of endonuclease mediated gene disruption in the HIV provirus

PubMed Central

Sedlak, Ruth Hall; Liang, Shu; Niyonzima, Nixon; De Silva Feelixge, Harshana S.; Roychoudhury, Pavitra; Greninger, Alexander L.; Weber, Nicholas D.; Boissel, Sandrine; Scharenberg, Andrew M.; Cheng, Anqi; Magaret, Amalia; Bumgarner, Roger; Stone, Daniel; Jerome, Keith R.

2016-01-01

Genome editing by designer nucleases is a rapidly evolving technology utilized in a highly diverse set of research fields. Among all fields, the T7 endonuclease mismatch cleavage assay, or Surveyor assay, is the most commonly used tool to assess genomic editing by designer nucleases. This assay, while relatively easy to perform, provides only a semi-quantitative measure of mutation efficiency that lacks sensitivity and accuracy. We demonstrate a simple droplet digital PCR assay that quickly quantitates a range of indel mutations with detection as low as 0.02% mutant in a wild type background and precision (≤6%CV) and accuracy superior to either mismatch cleavage assay or clonal sequencing when compared to next-generation sequencing. The precision and simplicity of this assay will facilitate comparison of gene editing approaches and their optimization, accelerating progress in this rapidly-moving field. PMID:26829887
Markerless video analysis for movement quantification in pediatric epilepsy monitoring.

PubMed

Lu, Haiping; Eng, How-Lung; Mandal, Bappaditya; Chan, Derrick W S; Ng, Yen-Ling

2011-01-01

This paper proposes a markerless video analytic system for quantifying body part movements in pediatric epilepsy monitoring. The system utilizes colored pajamas worn by a patient in bed to extract body part movement trajectories, from which various features can be obtained for seizure detection and analysis. Hence, it is non-intrusive and it requires no sensor/marker to be attached to the patient's body. It takes raw video sequences as input and a simple user-initialization indicates the body parts to be examined. In background/foreground modeling, Gaussian mixture models are employed in conjunction with HSV-based modeling. Body part detection follows a coarse-to-fine paradigm with graph-cut-based segmentation. Finally, body part parameters are estimated with domain knowledge guidance. Experimental studies are reported on sequences captured in an Epilepsy Monitoring Unit at a local hospital. The results demonstrate the feasibility of the proposed system in pediatric epilepsy monitoring and seizure detection.
Pattern statistics on Markov chains and sensitivity to parameter estimation

PubMed Central

Nuel, Grégory

2006-01-01

Background: In order to compute pattern statistics in computational biology a Markov model is commonly used to take into account the sequence composition. Usually its parameter must be estimated. The aim of this paper is to determine how sensitive these statistics are to parameter estimation, and what are the consequences of this variability on pattern studies (finding the most over-represented words in a genome, the most significant common words to a set of sequences,...). Results: In the particular case where pattern statistics (overlap counting only) computed through binomial approximations we use the delta-method to give an explicit expression of σ, the standard deviation of a pattern statistic. This result is validated using simulations and a simple pattern study is also considered. Conclusion: We establish that the use of high order Markov model could easily lead to major mistakes due to the high sensitivity of pattern statistics to parameter estimation. PMID:17044916
Development and characterization of simple sequence repeats for Bipolaris sokiniana and cross transferability to related species

USDA-ARS?s Scientific Manuscript database

Simple sequence repeats (SSR) markers were developed from a small insert genomic library for Bipolaris sorokiniana, a mitosporic fungal pathogen that causes spot blotch and root rot in switchgrass. About 59% of sequenced clones (n=384) harbored various SSR motifs. After eliminating the redundant seq...
BLAST Ring Image Generator (BRIG): simple prokaryote genome comparisons

PubMed Central

2011-01-01

Background Visualisation of genome comparisons is invaluable for helping to determine genotypic differences between closely related prokaryotes. New visualisation and abstraction methods are required in order to improve the validation, interpretation and communication of genome sequence information; especially with the increasing amount of data arising from next-generation sequencing projects. Visualising a prokaryote genome as a circular image has become a powerful means of displaying informative comparisons of one genome to a number of others. Several programs, imaging libraries and internet resources already exist for this purpose, however, most are either limited in the number of comparisons they can show, are unable to adequately utilise draft genome sequence data, or require a knowledge of command-line scripting for implementation. Currently, there is no freely available desktop application that enables users to rapidly visualise comparisons between hundreds of draft or complete genomes in a single image. Results BLAST Ring Image Generator (BRIG) can generate images that show multiple prokaryote genome comparisons, without an arbitrary limit on the number of genomes compared. The output image shows similarity between a central reference sequence and other sequences as a set of concentric rings, where BLAST matches are coloured on a sliding scale indicating a defined percentage identity. Images can also include draft genome assembly information to show read coverage, assembly breakpoints and collapsed repeats. In addition, BRIG supports the mapping of unassembled sequencing reads against one or more central reference sequences. Many types of custom data and annotations can be shown using BRIG, making it a versatile approach for visualising a range of genomic comparison data. BRIG is readily accessible to any user, as it assumes no specialist computational knowledge and will perform all required file parsing and BLAST comparisons automatically. Conclusions There is a clear need for a user-friendly program that can produce genome comparisons for a large number of prokaryote genomes with an emphasis on rapidly utilising unfinished or unassembled genome data. Here we present BRIG, a cross-platform application that enables the interactive generation of comparative genomic images via a simple graphical-user interface. BRIG is freely available for all operating systems at http://sourceforge.net/projects/brig/. PMID:21824423
Generation of non-genomic oligonucleotide tag sequences for RNA template-specific PCR

PubMed Central

Pinto, Fernando Lopes; Svensson, Håkan; Lindblad, Peter

2006-01-01

Background In order to overcome genomic DNA contamination in transcriptional studies, reverse template-specific polymerase chain reaction, a modification of reverse transcriptase polymerase chain reaction, is used. The possibility of using tags whose sequences are not found in the genome further improves reverse specific polymerase chain reaction experiments. Given the absence of software available to produce genome suitable tags, a simple tool to fulfill such need was developed. Results The program was developed in Perl, with separate use of the basic local alignment search tool, making the tool platform independent (known to run on Windows XP and Linux). In order to test the performance of the generated tags, several molecular experiments were performed. The results show that Tagenerator is capable of generating tags with good priming properties, which will deliberately not result in PCR amplification of genomic DNA. Conclusion The program Tagenerator is capable of generating tag sequences that combine genome absence with good priming properties for RT-PCR based experiments, circumventing the effects of genomic DNA contamination in an RNA sample. PMID:16820068
DOE Office of Scientific and Technical Information (OSTI.GOV)

Lo, Chien -Chi; Chain, Patrick S. G.

Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly processmore » large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.« less
A simple algorithm for quantifying DNA methylation levels on multiple independent CpG sites in bisulfite genomic sequencing electropherograms.

PubMed

Leakey, Tatiana I; Zielinski, Jerzy; Siegfried, Rachel N; Siegel, Eric R; Fan, Chun-Yang; Cooney, Craig A

2008-06-01

DNA methylation at cytosines is a widely studied epigenetic modification. Methylation is commonly detected using bisulfite modification of DNA followed by PCR and additional techniques such as restriction digestion or sequencing. These additional techniques are either laborious, require specialized equipment, or are not quantitative. Here we describe a simple algorithm that yields quantitative results from analysis of conventional four-dye-trace sequencing. We call this method Mquant and we compare it with the established laboratory method of combined bisulfite restriction assay (COBRA). This analysis of sequencing electropherograms provides a simple, easily applied method to quantify DNA methylation at specific CpG sites.
Label-free fluorescence strategy for sensitive detection of adenosine triphosphate using a loop DNA probe with low background noise.

PubMed

Lin, Chunshui; Cai, Zhixiong; Wang, Yiru; Zhu, Zhi; Yang, Chaoyong James; Chen, Xi

2014-07-15

A simple, rapid, label-free, and ultrasensitive fluorescence strategy for adenosine triphosphate (ATP) detection was developed using a loop DNA probe with low background noise. In this strategy, a loop DNA probe, which is the substrate for both ligation and digestion enzyme reaction, was designed. SYBR green I (SG I), a double-stranded specific dye, was applied for the readout fluorescence signal. Exonuclease I (Exo I) and exonuclease III (Exo III), sequence-independent nucleases, were selected to digest the loop DNA probe in order to minimize the background fluorescence signal. As a result, in the absence of ATP, the loop DNA was completely digested by Exo I and Exo III, leading to low background fluorescence owing to the weak electrostatic interaction between SG I and mononucleotides. On the other hand, ATP induced the ligation of the nicking site, and the sealed loop DNA resisted the digestion of Exo I and ExoIII, resulting in a remarkable increase of fluorescence response. Upon background noise reduction, the sensitivity of the ATP determination was improved significantly, and the detection limitation was found to be 1.2 pM, which is much lower than that in almost all the previously reported methods. This strategy has promise for wide application in the determination of ATP.
Genetic and epigenetic alterations induced by different levels of rye genome integration in wheat recipient.

PubMed

Zheng, X L; Zhou, J P; Zang, L L; Tang, A T; Liu, D Q; Deng, K J; Zhang, Y

2016-06-17

The narrow genetic variation present in common wheat (Triticum aestivum) varieties has greatly restricted the improvement of crop yield in modern breeding systems. Alien addition lines have proven to be an effective means to broaden the genetic diversity of common wheat. Wheat-rye addition lines, which are the direct bridge materials for wheat improvement, have been wildly used to produce new wheat cultivars carrying alien rye germplasm. In this study, we investigated the genetic and epigenetic alterations in two sets of wheat-rye disomic addition lines (1R-7R) and the corresponding triticales. We used expressed sequence tag-simple sequence repeat, amplified fragment length polymorphism, and methylation-sensitive amplification polymorphism analyses to analyze the effects of the introduction of alien chromosomes (either the entire genome or sub-genome) to wheat genetic background. We found obvious and diversiform variations in the genomic primary structure, as well as alterations in the extent and pattern of the genomic DNA methylation of the recipient. Meanwhile, these results also showed that introduction of different rye chromosomes could induce different genetic and epigenetic alterations in its recipient, and the genetic background of the parents is an important factor for genomic and epigenetic variation induced by alien chromosome addition.
RAD tag sequencing as a source of SNP markers in Cynara cardunculus L

PubMed Central

2012-01-01

Background The globe artichoke (Cynara cardunculus L. var. scolymus) genome is relatively poorly explored, especially compared to those of the other major Asteraceae crops sunflower and lettuce. No SNP markers are in the public domain. We have combined the recently developed restriction-site associated DNA (RAD) approach with the Illumina DNA sequencing platform to effect the rapid and mass discovery of SNP markers for C. cardunculus. Results RAD tags were sequenced from the genomic DNA of three C. cardunculus mapping population parents, generating 9.7 million reads, corresponding to ~1 Gbp of sequence. An assembly based on paired ends produced ~6.0 Mbp of genomic sequence, separated into ~19,000 contigs (mean length 312 bp), of which ~21% were fragments of putative coding sequence. The shared sequences allowed for the discovery of ~34,000 SNPs and nearly 800 indels, equivalent to a SNP frequency of 5.6 per 1,000 nt, and an indel frequency of 0.2 per 1,000 nt. A sample of heterozygous SNP loci was mapped by CAPS assays and this exercise provided validation of our mining criteria. The repetitive fraction of the genome had a high representation of retrotransposon sequence, followed by simple repeats, AT-low complexity regions and mobile DNA elements. The genomic k-mers distribution and CpG rate of C. cardunculus, compared with data derived from three whole genome-sequenced dicots species, provided a further evidence of the random representation of the C. cardunculus genome generated by RAD sampling. Conclusion The RAD tag sequencing approach is a cost-effective and rapid method to develop SNP markers in a highly heterozygous species. Our approach permitted to generate a large and robust SNP datasets by the adoption of optimized filtering criteria. PMID:22214349
Rapid evaluation and quality control of next generation sequencing data with FaQCs

DOE PAGES

Lo, Chien -Chi; Chain, Patrick S. G.

2014-12-01

Background: Next generation sequencing (NGS) technologies that parallelize the sequencing process and produce thousands to millions, or even hundreds of millions of sequences in a single sequencing run, have revolutionized genomic and genetic research. Because of the vagaries of any platform's sequencing chemistry, the experimental processing, machine failure, and so on, the quality of sequencing reads is never perfect, and often declines as the read is extended. These errors invariably affect downstream analysis/application and should therefore be identified early on to mitigate any unforeseen effects. Results: Here we present a novel FastQ Quality Control Software (FaQCs) that can rapidly processmore » large volumes of data, and which improves upon previous solutions to monitor the quality and remove poor quality data from sequencing runs. Both the speed of processing and the memory footprint of storing all required information have been optimized via algorithmic and parallel processing solutions. The trimmed output compared side-by-side with the original data is part of the automated PDF output. We show how this tool can help data analysis by providing a few examples, including an increased percentage of reads recruited to references, improved single nucleotide polymorphism identification as well as de novo sequence assembly metrics. Conclusion: FaQCs combines several features of currently available applications into a single, user-friendly process, and includes additional unique capabilities such as filtering the PhiX control sequences, conversion of FASTQ formats, and multi-threading. The original data and trimmed summaries are reported within a variety of graphics and reports, providing a simple way to do data quality control and assurance.« less
Highly Informative Simple Sequence Repeat (SSR) Markers for Fingerprinting Hazelnut

USDA-ARS?s Scientific Manuscript database

Simple sequence repeat (SSR) or microsatellite markers have many applications in breeding and genetic studies of plants, including fingerprinting of cultivars and investigations of genetic diversity, and therefore provide information for better management of germplasm collections. They are repeatab...
SIRW: A web server for the Simple Indexing and Retrieval System that combines sequence motif searches with keyword searches.

PubMed

Ramu, Chenna

2003-07-01

SIRW (http://sirw.embl.de/) is a World Wide Web interface to the Simple Indexing and Retrieval System (SIR) that is capable of parsing and indexing various flat file databases. In addition it provides a framework for doing sequence analysis (e.g. motif pattern searches) for selected biological sequences through keyword search. SIRW is an ideal tool for the bioinformatics community for searching as well as analyzing biological sequences of interest.
Identification of (R)-selective ω-aminotransferases by exploring evolutionary sequence space.

PubMed

Kim, Eun-Mi; Park, Joon Ho; Kim, Byung-Gee; Seo, Joo-Hyun

2018-03-01

Several (R)-selective ω-aminotransferases (R-ωATs) have been reported. The existence of additional R-ωATs having different sequence characteristics from previous ones is highly expected. In addition, it is generally accepted that R-ωATs are variants of aminotransferase group III. Based on these backgrounds, sequences in RefSeq database were scored using family profiles of branched-chain amino acid aminotransferase (BCAT) and d-alanine aminotransferase (DAT) to predict and identify putative R-ωATs. Sequences with two profile analysis scores were plotted on two-dimensional score space. Candidates with relatively similar scores in both BCAT and DAT profiles (i.e., profile analysis score using BCAT profile was similar to profile analysis score using DAT profile) were selected. Experimental results for selected candidates showed that putative R-ωATs from Saccharopolyspora erythraea (R-ωAT_Sery), Bacillus cellulosilyticus (R-ωAT_Bcel), and Bacillus thuringiensis (R-ωAT_Bthu) had R-ωAT activity. Additional experiments revealed that R-ωAT_Sery also possessed DAT activity while R-ωAT_Bcel and R-ωAT_Bthu had BCAT activity. Selecting putative R-ωATs from regions with similar profile analysis scores identified potential R-ωATs. Therefore, R-ωATs could be efficiently identified by using simple family profile analysis and exploring evolutionary sequence space. Copyright © 2017 Elsevier Inc. All rights reserved.
Mapping Simple Repeated DNA Sequences in Heterochromatin of Drosophila Melanogaster

PubMed Central

Lohe, A. R.; Hilliker, A. J.; Roberts, P. A.

1993-01-01

Heterochromatin in Drosophila has unusual genetic, cytological and molecular properties. Highly repeated DNA sequences (satellites) are the principal component of heterochromatin. Using probes from cloned satellites, we have constructed a chromosome map of 10 highly repeated, simple DNA sequences in heterochromatin of mitotic chromosomes of Drosophila melanogaster. Despite extensive sequence homology among some satellites, chromosomal locations could be distinguished by stringent in situ hybridizations for each satellite. Only two of the localizations previously determined using gradient-purified bulk satellite probes are correct. Eight new satellite localizations are presented, providing a megabase-level chromosome map of one-quarter of the genome. Five major satellites each exhibit a multichromosome distribution, and five minor satellites hybridize to single sites on the Y chromosome. Satellites closely related in sequence are often located near one another on the same chromosome. About 80% of Y chromosome DNA is composed of nine simple repeated sequences, in particular (AAGAC)(n) (8 Mb), (AAGAG)(n) (7 Mb) and (AATAT)(n) (6 Mb). Similarly, more than 70% of the DNA in chromosome 2 heterochromatin is composed of five simple repeated sequences. We have also generated a high resolution map of satellites in chromosome 2 heterochromatin, using a series of translocation chromosomes whose breakpoints in heterochromatin were ordered by N-banding. Finally, staining and banding patterns of heterochromatic regions are correlated with the locations of specific repeated DNA sequences. The basis for the cytochemical heterogeneity in banding appears to depend exclusively on the different satellite DNAs present in heterochromatin. PMID:8375654

Development of genomic resources for the narrow-leafed lupin (Lupinus angustifolius): construction of a bacterial artificial chromosome (BAC) library and BAC-end sequencing

PubMed Central

2011-01-01

Background Lupinus angustifolius L, also known as narrow-leafed lupin (NLL), is becoming an important grain legume crop that is valuable for sustainable farming and is becoming recognised as a potential human health food. Recent interest is being directed at NLL to improve grain production, disease and pest management and health benefits of the grain. However, studies have been hindered by a lack of extensive genomic resources for the species. Results A NLL BAC library was constructed consisting of 111,360 clones with an average insert size of 99.7 Kbp from cv Tanjil. The library has approximately 12 × genome coverage. Both ends of 9600 randomly selected BAC clones were sequenced to generate 13985 BAC end-sequences (BESs), covering approximately 1% of the NLL genome. These BESs permitted a preliminary characterisation of the NLL genome such as organisation and composition, with the BESs having approximately 39% G:C content, 16.6% repetitive DNA and 5.4% putative gene-encoding regions. From the BESs 9966 simple sequence repeat (SSR) motifs were identified and some of these are shown to be potential markers. Conclusions The NLL BAC library and BAC-end sequences are powerful resources for genetic and genomic research on lupin. These resources will provide a robust platform for future high-resolution mapping, map-based cloning, comparative genomics and assembly of whole-genome sequencing data for the species. PMID:22014081
Analysis of simple sequence repeat (SSR) structure and sequence within Epichloë endophyte genomes reveals impacts on gene structure and insights into ancestral hybridization events.

PubMed

Clayton, William; Eaton, Carla Jane; Dupont, Pierre-Yves; Gillanders, Tim; Cameron, Nick; Saikia, Sanjay; Scott, Barry

2017-01-01

Epichloë grass endophytes comprise a group of filamentous fungi of both sexual and asexual species. Known for the beneficial characteristics they endow upon their grass hosts, the identification of these endophyte species has been of great interest agronomically and scientifically. The use of simple sequence repeat loci and the variation in repeat elements has been used to rapidly identify endophyte species and strains, however, little is known of how the structure of repeat elements changes between species and strains, and where these repeat elements are located in the fungal genome. We report on an in-depth analysis of the structure and genomic location of the simple sequence repeat locus B10, commonly used for Epichloë endophyte species identification. The B10 repeat was found to be located within an exon of a putative bZIP transcription factor, suggesting possible impacts on polypeptide sequence and thus protein function. Analysis of this repeat in the asexual endophyte hybrid Epichloë uncinata revealed that the structure of B10 alleles reflects the ancestral species that hybridized to give rise to this species. Understanding the structure and sequence of these simple sequence repeats provides a useful set of tools for readily distinguishing strains and for gaining insights into the ancestral species that have undergone hybridization events.
Transcriptome analysis of carnation (Dianthus caryophyllus L.) based on next-generation sequencing technology

PubMed Central

2012-01-01

Background Carnation (Dianthus caryophyllus L.), in the family Caryophyllaceae, can be found in a wide range of colors and is a model system for studies of flower senescence. In addition, it is one of the most important flowers in the global floriculture industry. However, few genomics resources, such as sequences and markers are available for carnation or other members of the Caryophyllaceae. To increase our understanding of the genetic control of important characters in carnation, we generated an expressed sequence tag (EST) database for a carnation cultivar important in horticulture by high-throughput sequencing using 454 pyrosequencing technology. Results We constructed a normalized cDNA library and a 3’-UTR library of carnation, obtaining a total of 1,162,126 high-quality reads. These reads were assembled into 300,740 unigenes consisting of 37,844 contigs and 262,896 singlets. The contigs were searched against an Arabidopsis sequence database, and 61.8% (23,380) of them had at least one BLASTX hit. These contigs were also annotated with Gene Ontology (GO) and were found to cover a broad range of GO categories. Furthermore, we identified 17,362 potential simple sequence repeats (SSRs) in 14,291 of the unigenes. We focused on gene discovery in the areas of flower color and ethylene biosynthesis. Transcripts were identified for almost every gene involved in flower chlorophyll and carotenoid metabolism and in anthocyanin biosynthesis. Transcripts were also identified for every step in the ethylene biosynthesis pathway. Conclusions We present the first large-scale sequence data set for carnation, generated using next-generation sequencing technology. The large EST database generated from these sequences is an informative resource for identifying genes involved in various biological processes in carnation and provides an EST resource for understanding the genetic diversity of this plant. PMID:22747974
Biophysical and structural considerations for protein sequence evolution

PubMed Central

2011-01-01

Background Protein sequence evolution is constrained by the biophysics of folding and function, causing interdependence between interacting sites in the sequence. However, current site-independent models of sequence evolutions do not take this into account. Recent attempts to integrate the influence of structure and biophysics into phylogenetic models via statistical/informational approaches have not resulted in expected improvements in model performance. This suggests that further innovations are needed for progress in this field. Results Here we develop a coarse-grained physics-based model of protein folding and binding function, and compare it to a popular informational model. We find that both models violate the assumption of the native sequence being close to a thermodynamic optimum, causing directional selection away from the native state. Sampling and simulation show that the physics-based model is more specific for fold-defining interactions that vary less among residue type. The informational model diffuses further in sequence space with fewer barriers and tends to provide less support for an invariant sites model, although amino acid substitutions are generally conservative. Both approaches produce sequences with natural features like dN/dS < 1 and gamma-distributed rates across sites. Conclusions Simple coarse-grained models of protein folding can describe some natural features of evolving proteins but are currently not accurate enough to use in evolutionary inference. This is partly due to improper packing of the hydrophobic core. We suggest possible improvements on the representation of structure, folding energy, and binding function, as regards both native and non-native conformations, and describe a large number of possible applications for such a model. PMID:22171550
PSSRdb: a relational database of polymorphic simple sequence repeats extracted from prokaryotic genomes.

PubMed

Kumar, Pankaj; Chaitanya, Pasumarthy S; Nagarajaram, Hampapathalu A

2011-01-01

PSSRdb (Polymorphic Simple Sequence Repeats database) (http://www.cdfd.org.in/PSSRdb/) is a relational database of polymorphic simple sequence repeats (PSSRs) extracted from 85 different species of prokaryotes. Simple sequence repeats (SSRs) are the tandem repeats of nucleotide motifs of the sizes 1-6 bp and are highly polymorphic. SSR mutations in and around coding regions affect transcription and translation of genes. Such changes underpin phase variations and antigenic variations seen in some bacteria. Although SSR-mediated phase variation and antigenic variations have been well-studied in some bacteria there seems a lot of other species of prokaryotes yet to be investigated for SSR mediated adaptive and other evolutionary advantages. As a part of our on-going studies on SSR polymorphism in prokaryotes we compared the genome sequences of various strains and isolates available for 85 different species of prokaryotes and extracted a number of SSRs showing length variations and created a relational database called PSSRdb. This database gives useful information such as location of PSSRs in genomes, length variation across genomes, the regions harboring PSSRs, etc. The information provided in this database is very useful for further research and analysis of SSRs in prokaryotes.
Sequence Composition and Gene Content of the Short Arm of Rye (Secale cereale) Chromosome 1

PubMed Central

Fluch, Silvia; Kopecky, Dieter; Burg, Kornel; Šimková, Hana; Taudien, Stefan; Petzold, Andreas; Kubaláková, Marie; Platzer, Matthias; Berenyi, Maria; Krainer, Siegfried; Doležel, Jaroslav; Lelley, Tamas

2012-01-01

Background The purpose of the study is to elucidate the sequence composition of the short arm of rye chromosome 1 (Secale cereale) with special focus on its gene content, because this portion of the rye genome is an integrated part of several hundreds of bread wheat varieties worldwide. Methodology/Principal Findings Multiple Displacement Amplification of 1RS DNA, obtained from flow sorted 1RS chromosomes, using 1RS ditelosomic wheat-rye addition line, and subsequent Roche 454FLX sequencing of this DNA yielded 195,313,589 bp sequence information. This quantity of sequence information resulted in 0.43× sequence coverage of the 1RS chromosome arm, permitting the identification of genes with estimated probability of 95%. A detailed analysis revealed that more than 5% of the 1RS sequence consisted of gene space, identifying at least 3,121 gene loci representing 1,882 different gene functions. Repetitive elements comprised about 72% of the 1RS sequence, Gypsy/Sabrina (13.3%) being the most abundant. More than four thousand simple sequence repeat (SSR) sites mostly located in gene related sequence reads were identified for possible marker development. The existence of chloroplast insertions in 1RS has been verified by identifying chimeric chloroplast-genomic sequence reads. Synteny analysis of 1RS to the full genomes of Oryza sativa and Brachypodium distachyon revealed that about half of the genes of 1RS correspond to the distal end of the short arm of rice chromosome 5 and the proximal region of the long arm of Brachypodium distachyon chromosome 2. Comparison of the gene content of 1RS to 1HS barley chromosome arm revealed high conservation of genes related to chromosome 5 of rice. Conclusions The present study revealed the gene content and potential gene functions on this chromosome arm and demonstrated numerous sequence elements like SSRs and gene-related sequences, which can be utilised for future research as well as in breeding of wheat and rye. PMID:22328922
Simple sequence repeat markers that identify Claviceps species and strains

USDA-ARS?s Scientific Manuscript database

Claviceps purpurea is a pathogen that infects most members of the Pooideae subfamily and causes ergot, a floral disease in which the ovary is replaced with a sclerotium. This study was initiated to develop Simple Sequence Repeat (SSRs) markers for rapid identification of C. purpurea. SSRs were desi...
Genome Annotation Generator: a simple tool for generating and correcting WGS annotation tables for NCBI submission

PubMed Central

Hall, Brian; Derego, Theodore; Bremer, Forest T; Cannoles, Kyle

2018-01-01

Abstract Background One of the most overlooked, yet critical, components of a whole genome sequencing (WGS) project is the submission and curation of the data to a genomic repository, most commonly the National Center for Biotechnology Information (NCBI). While large genome centers or genome groups have developed software tools for post-annotation assembly filtering, annotation, and conversion into the NCBI’s annotation table format, these tools typically require back-end setup and connection to an Structured Query Language (SQL) database and/or some knowledge of programming (Perl, Python) to implement. With WGS becoming commonplace, genome sequencing projects are moving away from the genome centers and into the ecology or biology lab, where fewer resources are present to support the process of genome assembly curation. To fill this gap, we developed software to assess, filter, and transfer annotation and convert a draft genome assembly and annotation set into the NCBI annotation table (.tbl) format, facilitating submission to the NCBI Genome Assembly database. This software has no dependencies, is compatible across platforms, and utilizes a simple command to perform a variety of simple and complex post-analysis, pre-NCBI submission WGS project tasks. Findings The Genome Annotation Generator is a consistent and user-friendly bioinformatics tool that can be used to generate a .tbl file that is consistent with the NCBI submission pipeline Conclusions The Genome Annotation Generator achieves the goal of providing a publicly available tool that will facilitate the submission of annotated genome assemblies to the NCBI. It is useful for any individual researcher or research group that wishes to submit a genome assembly of their study system to the NCBI. PMID:29635297
Similarity-based gene detection: using COGs to find evolutionarily-conserved ORFs

PubMed Central

Powell, Bradford C; Hutchison, Clyde A

2006-01-01

Background Experimental verification of gene products has not kept pace with the rapid growth of microbial sequence information. However, existing annotations of gene locations contain sufficient information to screen for probable errors. Furthermore, comparisons among genomes become more informative as more genomes are examined. We studied all open reading frames (ORFs) of at least 30 codons from the genomes of 27 sequenced bacterial strains. We grouped the potential peptide sequences encoded from the ORFs by forming Clusters of Orthologous Groups (COGs). We used this grouping in order to find homologous relationships that would not be distinguishable from noise when using simple BLAST searches. Although COG analysis was initially developed to group annotated genes, we applied it to the task of grouping anonymous DNA sequences that may encode proteins. Results "Mixed COGs" of ORFs (clusters in which some sequences correspond to annotated genes and some do not) are attractive targets when seeking errors of gene predicion. Examination of mixed COGs reveals some situations in which genes appear to have been missed in current annotations and a smaller number of regions that appear to have been annotated as gene loci erroneously. This technique can also be used to detect potential pseudogenes or sequencing errors. Our method uses an adjustable parameter for degree of conservation among the studied genomes (stringency). We detail results for one level of stringency at which we found 83 potential genes which had not previously been identified, 60 potential pseudogenes, and 7 sequences with existing gene annotations that are probably incorrect. Conclusion Systematic study of sequence conservation offers a way to improve existing annotations by identifying potentially homologous regions where the annotation of the presence or absence of a gene is inconsistent among genomes. PMID:16423288
An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads

PubMed Central

2013-01-01

Background Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. Results Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. Conclusions Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. PMID:24564333
Detection of gas plumes in cluttered environments using long-wave infrared hyperspectral sensors

NASA Astrophysics Data System (ADS)

Broadwater, Joshua B.; Spisz, Thomas S.; Carr, Alison K.

2008-04-01

Long-wave infrared hyperspectral sensors provide the ability to detect gas plumes at stand-off distances. A number of detection algorithms have been developed for such applications, but in situations where the gas is released in a complex background and is at air temperature, these detectors can generate a considerable amount of false alarms. To make matters more difficult, the gas tends to have non-uniform concentrations throughout the plume making it spatially similar to the false alarms. Simple post-processing using median filters can remove a number of the false alarms, but at the cost of removing a significant amount of the gas plume as well. We approach the problem using an adaptive subpixel detector and morphological processing techniques. The adaptive subpixel detection algorithm is able to detect the gas plume against the complex background. We then use morphological processing techniques to isolate the gas plume while simultaneously rejecting nearly all false alarms. Results will be demonstrated on a set of ground-based long-wave infrared hyperspectral image sequences.
Genetic and Chemical Profiling of Gymnema sylvestre Accessions from Central India: Its Implication for Quality Control and Therapeutic Potential of Plant

PubMed Central

Verma, Ashutosh Kumar; Dhawan, Sunita Singh; Singh, Seema; Bharati, Kumar Avinash; Jyotsana

2016-01-01

Background: Gymnema sylvestre, a vulnerable plant species, is mentioned in Indian Pharmacopeia as an antidiabetic drug Objective: Study of genetic and chemical diversity and its implications in accessions of G. sylvestre Materials and Methods: Fourteen accessions of G. sylvestre collected from Central India and assessment of their genetic and chemical diversity were carried out using ISSR (inter simple sequence repeat) and HPLC (high performance liquid chromatography) fingerprinting methods Results: Among the screened 40 ISSR primers, 15 were found polymorphic and collectively produced nine unique accession-specific bands. The maximum and minimum numbers of amplicones were noted for ISSR-15 and ISSR-11, respectively. The ISSR -11 and ISSR-13 revealed 100% polymorphism. HPLC chromatograms showed that accessions possess the secondary metabolites of mid-polarity with considerable variability. Unknown peaks with retention time 2.63, 3.41, 23.83, 24.50, and 44.67 were found universal type. Comparative hierarchical clustering analysis based on foresaid fingerprints indicates that both techniques have equal potential to discriminate accessions according to percentage gymnemic acid in their leaf tissue. Second approach was noted more efficiently for separation of accessions according to their agro-climatic/collection site Conclusion: Highly polymorphic ISSRs could be utilized as molecular probes for further selection of high gymnemic acid yielding accessions. Observed accession specific bands may be used as a descriptor for plant accessions protection and converted into sequence tagged sites markers. Identified five universal type peaks could be helpful in identification of G. sylvestre-based various herbal preparations. SUMMARY Nine accession specific unique bandsFive marker peaks for G. sylvestre.Suitability of genetic and chemical fingerprinting Abbreviations used: HPLC: High Performance Liquid Chromatography, ISSR: Inter Simple Sequence Repeats, CTAB: Cetyl Trimethylammonium Bromide, DNTP: Deoxynucleotide Triphosphates PMID:27761067
Development of simple sequence repeat (SSR) markers from a genome survey of Chinese bayberry (Myrica rubra)

PubMed Central

2012-01-01

Background Chinese bayberry (Myrica rubra Sieb. and Zucc.) is a subtropical evergreen tree originating in China. It has been cultivated in southern China for several thousand years, and annual production has reached 1.1 million tons. The taste and high level of health promoting characters identified in the fruit in recent years has stimulated its extension in China and introduction to Australia. A limited number of co-dominant markers have been developed and applied in genetic diversity and identity studies. Here we report, for the first time, a survey of whole genome shotgun data to develop a large number of simple sequence repeat (SSR) markers to analyse the genetic diversity of the common cultivated Chinese bayberry and the relationship with three other Myrica species. Results The whole genome shotgun survey of Chinese bayberry produced 9.01Gb of sequence data, about 26x coverage of the estimated genome size of 323 Mb. The genome sequences were highly heterozygous, but with little duplication. From the initial assembled scaffold covering 255 Mb sequence data, 28,602 SSRs (≥5 repeats) were identified. Dinucleotide was the most common repeat motif with a frequency of 84.73%, followed by 13.78% trinucleotide, 1.34% tetranucleotide, 0.12% pentanucleotide and 0.04% hexanucleotide. From 600 primer pairs, 186 polymorphic SSRs were developed. Of these, 158 were used to screen 29 Chinese bayberry accessions and three other Myrica species: 91.14%, 89.87% and 46.84% SSRs could be used in Myrica adenophora, Myrica nana and Myrica cerifera, respectively. The UPGMA dendrogram tree showed that cultivated Myrica rubra is closely related to Myrica adenophora and Myrica nana, originating in southwest China, and very distantly related to Myrica cerifera, originating in America. These markers can be used in the construction of a linkage map and for genetic diversity studies in Myrica species. Conclusion Myrica rubra has a small genome of about 323 Mb with a high level of heterozygosity. A large number of SSRs were identified, and 158 polymorphic SSR markers developed, 91% of which can be transferred to other Myrica species. PMID:22621340
High-Throughput Sequencing and De Novo Assembly of the Isatis indigotica Transcriptome

PubMed Central

Tang, Xiaoqing; Xiao, Yunhua; Lv, Tingting; Wang, Fangquan; Zhu, QianHao; Zheng, Tianqing; Yang, Jie

2014-01-01

Background Isatis indigotica, the source of the traditional Chinese medicine Radix isatidis (Ban-Lan-Gen), is an extremely important economical crop in China. To facilitate biological, biochemical and molecular research on the medicinal chemicals in I. indigotica, here we report the first I. indigotica transcriptome generated by RNA sequencing (RNA-seq). Results RNA-seq library was created using RNA extracted from a mixed sample including leaf and root. A total of 33,238 unigenes were assembled from more than 28 million of high quality short reads. The quality of the assembly was experimentally examined by cDNA sequencing of seven randomly selected unigenes. Based on blast search 28,184 unigenes had a hit in at least one of the protein and nucleotide databases used in this study, and 8 unigenes were found to be associated with biosynthesis of indole and its derivatives. According to Gene Ontology classification, 22,365 unigenes were categorized into 48 functional groups. Furthermore, Clusters of Orthologous Group and Swiss-Port annotation were assigned for 7,707 and 18,679 unigenes, respectively. Analysis of repeat motifs identified 6,400 simple sequence repeat markers in 4,509 unigenes. Conclusion Our data provide a comprehensive sequence resource for molecular study of I. indigotica. Our results will facilitate studies on the functions of genes involved in the indole alkaloid biosynthesis pathway and on metabolism of nitrogen and indole alkaloids in I. indigotica and its related species. PMID:25259890
Assembly of a phased diploid Candida albicans genome facilitates allele-specific measurements and provides a simple model for repeat and indel structure

PubMed Central

2013-01-01

Background Candida albicans is a ubiquitous opportunistic fungal pathogen that afflicts immunocompromised human hosts. With rare and transient exceptions the yeast is diploid, yet despite its clinical relevance the respective sequences of its two homologous chromosomes have not been completely resolved. Results We construct a phased diploid genome assembly by deep sequencing a standard laboratory wild-type strain and a panel of strains homozygous for particular chromosomes. The assembly has 700-fold coverage on average, allowing extensive revision and expansion of the number of known SNPs and indels. This phased genome significantly enhances the sensitivity and specificity of allele-specific expression measurements by enabling pooling and cross-validation of signal across multiple polymorphic sites. Additionally, the diploid assembly reveals pervasive and unexpected patterns in allelic differences between homologous chromosomes. Firstly, we see striking clustering of indels, concentrated primarily in the repeat sequences in promoters. Secondly, both indels and their repeat-sequence substrate are enriched near replication origins. Finally, we reveal an intimate link between repeat sequences and indels, which argues that repeat length is under selective pressure for most eukaryotes. This connection is described by a concise one-parameter model that explains repeat-sequence abundance in C. albicans as a function of the indel rate, and provides a general framework to interpret repeat abundance in species ranging from bacteria to humans. Conclusions The phased genome assembly and insights into repeat plasticity will be valuable for better understanding allele-specific phenomena and genome evolution. PMID:24025428
De Novo Transcriptome of the Hemimetabolous German Cockroach (Blattella germanica)

PubMed Central

Zhou, Xiaojie; Qian, Kun; Tong, Ying; Zhu, Junwei Jerry; Qiu, Xinghui; Zeng, Xiaopeng

2014-01-01

Background The German cockroach, Blattella germanica, is an important insect pest that transmits various pathogens mechanically and causes severe allergic diseases. This insect has long served as a model system for studies of insect biology, physiology and ecology. However, the lack of genome or transcriptome information heavily hinder our further understanding about the German cockroach in every aspect at a molecular level and on a genome-wide scale. To explore the transcriptome and identify unique sequences of interest, we subjected the B. germanica transcriptome to massively parallel pyrosequencing and generated the first reference transcriptome for B. germanica. Methodology/Principal Findings A total of 1,365,609 raw reads with an average length of 529 bp were generated via pyrosequencing the mixed cDNA library from different life stages of German cockroach including maturing oothecae, nymphs, adult females and males. The raw reads were de novo assembled to 48,800 contigs and 3,961 singletons with high-quality unique sequences. These sequences were annotated and classified functionally in terms of BLAST, GO and KEGG, and the genes putatively coding detoxification enzyme systems, insecticide targets, key components in systematic RNA interference, immunity and chemoreception pathways were identified. A total of 3,601 SSRs (Simple Sequence Repeats) loci were also predicted. Conclusions/Significance The whole transcriptome pyrosequencing data from this study provides a usable genetic resource for future identification of potential functional genes involved in various biological processes. PMID:25265537
Host-Associated Metagenomics: A Guide to Generating Infectious RNA Viromes

PubMed Central

Robert, Catherine; Pascalis, Hervé; Michelle, Caroline; Jardot, Priscilla; Charrel, Rémi; Raoult, Didier; Desnues, Christelle

2015-01-01

Background Metagenomic analyses have been widely used in the last decade to describe viral communities in various environments or to identify the etiology of human, animal, and plant pathologies. Here, we present a simple and standardized protocol that allows for the purification and sequencing of RNA viromes from complex biological samples with an important reduction of host DNA and RNA contaminants, while preserving the infectivity of viral particles. Principal Findings We evaluated different viral purification steps, random reverse transcriptions and sequence-independent amplifications of a pool of representative RNA viruses. Viruses remained infectious after the purification process. We then validated the protocol by sequencing the RNA virome of human body lice engorged in vitro with artificially contaminated human blood. The full genomes of the most abundant viruses absorbed by the lice during the blood meal were successfully sequenced. Interestingly, random amplifications differed in the genome coverage of segmented RNA viruses. Moreover, the majority of reads were taxonomically identified, and only 7–15% of all reads were classified as “unknown”, depending on the random amplification method. Conclusion The protocol reported here could easily be applied to generate RNA viral metagenomes from complex biological samples of different origins. Our protocol allows further virological characterizations of the described viral communities because it preserves the infectivity of viral particles and allows for the isolation of viruses. PMID:26431175
Comparative study of the effectiveness and limitations of current methods for detecting sequence coevolution.

PubMed

Mao, Wenzhi; Kaya, Cihan; Dutta, Anindita; Horovitz, Amnon; Bahar, Ivet

2015-06-15

With rapid accumulation of sequence data on several species, extracting rational and systematic information from multiple sequence alignments (MSAs) is becoming increasingly important. Currently, there is a plethora of computational methods for investigating coupled evolutionary changes in pairs of positions along the amino acid sequence, and making inferences on structure and function. Yet, the significance of coevolution signals remains to be established. Also, a large number of false positives (FPs) arise from insufficient MSA size, phylogenetic background and indirect couplings. Here, a set of 16 pairs of non-interacting proteins is thoroughly examined to assess the effectiveness and limitations of different methods. The analysis shows that recent computationally expensive methods designed to remove biases from indirect couplings outperform others in detecting tertiary structural contacts as well as eliminating intermolecular FPs; whereas traditional methods such as mutual information benefit from refinements such as shuffling, while being highly efficient. Computations repeated with 2,330 pairs of protein families from the Negatome database corroborated these results. Finally, using a training dataset of 162 families of proteins, we propose a combined method that outperforms existing individual methods. Overall, the study provides simple guidelines towards the choice of suitable methods and strategies based on available MSA size and computing resources. Software is freely available through the Evol component of ProDy API. © The Author 2015. Published by Oxford University Press.
Cultivar identification, pedigree verification, and diversity analysis among Peach (Prunus persica L. Batsch) Cultivars based on Simple Sequence Repeat markers

USDA-ARS?s Scientific Manuscript database

The genetic relationships and pedigree inferences among peach (Prunus persica (L.) Batsch) accessions and breeding lines used in genetic improvement were evaluated using 15 simple sequence repeat (SSR) markers. A total of 80 alleles were detected among the 37 peach accessions with an average of 5.53...
THE USE OF INTER SIMPLE SEQUENCE REPEATS (ISSR) IN DISTINGUISHING NEIGHBORING DOUGLAS-FIR TREES AS A MEANS TO IDENTIFYING TREE ROOTS WITH ABOVE-GROUND BIOMASS

EPA Science Inventory

We are attempting to identify specific root fragments from soil cores with individual trees. We successfully used Inter Simple Sequence Repeats (ISSR) to distinguish neighboring old-growth Douglas-fir trees from one another, while maintaining identity among each tree's parts. W...

An integrated genetic linkage map of watermelon and genetic diversity based on single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers

USDA-ARS?s Scientific Manuscript database

Watermelon (Citrullus lanatus var. lanatus) is an important vegetable fruit throughout the world. A high number of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers should provide large coverage of the watermelon genome and high phylogenetic resolution of germplasm acces...
SeqHound: biological sequence and structure database as a platform for bioinformatics research

PubMed Central

2002-01-01

Background SeqHound has been developed as an integrated biological sequence, taxonomy, annotation and 3-D structure database system. It provides a high-performance server platform for bioinformatics research in a locally-hosted environment. Results SeqHound is based on the National Center for Biotechnology Information data model and programming tools. It offers daily updated contents of all Entrez sequence databases in addition to 3-D structural data and information about sequence redundancies, sequence neighbours, taxonomy, complete genomes, functional annotation including Gene Ontology terms and literature links to PubMed. SeqHound is accessible via a web server through a Perl, C or C++ remote API or an optimized local API. It provides functionality necessary to retrieve specialized subsets of sequences, structures and structural domains. Sequences may be retrieved in FASTA, GenBank, ASN.1 and XML formats. Structures are available in ASN.1, XML and PDB formats. Emphasis has been placed on complete genomes, taxonomy, domain and functional annotation as well as 3-D structural functionality in the API, while fielded text indexing functionality remains under development. SeqHound also offers a streamlined WWW interface for simple web-user queries. Conclusions The system has proven useful in several published bioinformatics projects such as the BIND database and offers a cost-effective infrastructure for research. SeqHound will continue to develop and be provided as a service of the Blueprint Initiative at the Samuel Lunenfeld Research Institute. The source code and examples are available under the terms of the GNU public license at the Sourceforge site http://sourceforge.net/projects/slritools/ in the SLRI Toolkit. PMID:12401134
Deep RNA-Seq to unlock the gene bank of floral development in Sinapis arvensis.

PubMed

Liu, Jia; Mei, Desheng; Li, Yunchang; Huang, Shunmou; Hu, Qiong

2014-01-01

Sinapis arvensis is a weed with strong biological activity. Despite being a problematic annual weed that contaminates agricultural crop yield, it is a valuable alien germplasm resource. It can be utilized for broadening the genetic background of Brassica crops with desirable agricultural traits like resistance to blackleg (Leptosphaeria maculans), stem rot (Sclerotinia sclerotium) and pod shatter (caused by FRUITFULL gene). However, few genetic studies of S. arvensis were reported because of the lack of genomic resources. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive dataset for S. arvensis for the first time. We used Illumina paired-end sequencing technology to sequence the S. arvensis flower transcriptome and generated 40,981,443 reads that were assembled into 131,278 transcripts. We de novo assembled 96,562 high quality unigenes with an average length of 832 bp. A total of 33,662 full-length ORF complete sequences were identified, and 41,415 unigenes were mapped onto 128 pathways using the KEGG Pathway database. The annotated unigenes were compared against Brassica rapa, B. oleracea, B. napus and Arabidopsis thaliana. Among these unigenes, 76,324 were identified as putative homologs of annotated sequences in the public protein databases, of which 1194 were associated with plant hormone signal transduction and 113 were related to gibberellin homeostasis/signaling. Unigenes that did not match any of those sequence datasets were considered to be unique to S. arvensis. Furthermore, 21,321 simple sequence repeats were found. Our study will enhance the currently available resources for Brassicaceae and will provide a platform for future genomic studies for genetic improvement of Brassica crops.
Deep RNA-Seq to Unlock the Gene Bank of Floral Development in Sinapis arvensis

PubMed Central

Liu, Jia; Mei, Desheng; Li, Yunchang; Huang, Shunmou; Hu, Qiong

2014-01-01

Sinapis arvensis is a weed with strong biological activity. Despite being a problematic annual weed that contaminates agricultural crop yield, it is a valuable alien germplasm resource. It can be utilized for broadening the genetic background of Brassica crops with desirable agricultural traits like resistance to blackleg (Leptosphaeria maculans), stem rot (Sclerotinia sclerotium) and pod shatter (caused by FRUITFULL gene). However, few genetic studies of S. arvensis were reported because of the lack of genomic resources. In the present study, we performed de novo transcriptome sequencing to produce a comprehensive dataset for S. arvensis for the first time. We used Illumina paired-end sequencing technology to sequence the S. arvensis flower transcriptome and generated 40,981,443 reads that were assembled into 131,278 transcripts. We de novo assembled 96,562 high quality unigenes with an average length of 832 bp. A total of 33,662 full-length ORF complete sequences were identified, and 41,415 unigenes were mapped onto 128 pathways using the KEGG Pathway database. The annotated unigenes were compared against Brassica rapa, B. oleracea, B. napus and Arabidopsis thaliana. Among these unigenes, 76,324 were identified as putative homologs of annotated sequences in the public protein databases, of which 1194 were associated with plant hormone signal transduction and 113 were related to gibberellin homeostasis/signaling. Unigenes that did not match any of those sequence datasets were considered to be unique to S. arvensis. Furthermore, 21,321 simple sequence repeats were found. Our study will enhance the currently available resources for Brassicaceae and will provide a platform for future genomic studies for genetic improvement of Brassica crops. PMID:25192023
Genome Survey Sequencing for the Characterization of the Genetic Background of Rosa roxburghii Tratt and Leaf Ascorbate Metabolism Genes.

PubMed

Lu, Min; An, Huaming; Li, Liangliang

2016-01-01

Rosa roxburghii Tratt is an important commercial horticultural crop in China that is recognized for its nutritional and medicinal values. In spite of the economic significance, genomic information on this rose species is currently unavailable. In the present research, a genome survey of R. roxburghii was carried out using next-generation sequencing (NGS) technologies. Total 30.29 Gb sequence data was obtained by HiSeq 2500 sequencing and an estimated genome size of R. roxburghii was 480.97 Mb, in which the guanine plus cytosine (GC) content was calculated to be 38.63%. All of these reads were technically assembled and a total of 627,554 contigs with a N50 length of 1.484 kb and furthermore 335,902 scaffolds with a total length of 409.36 Mb were obtained. Transposable elements (TE) sequence of 90.84 Mb which comprised 29.20% of the genome, and 167,859 simple sequence repeats (SSRs) were identified from the scaffolds. Among these, the mono-(66.30%), di-(25.67%), and tri-(6.64%) nucleotide repeats contributed to nearly 99% of the SSRs, and sequence motifs AG/CT (28.81%) and GAA/TTC (14.76%) were the most abundant among the dinucleotide and trinucleotide repeat motifs, respectively. Genome analysis predicted a total of 22,721 genes which have an average length of 2311.52 bp, an average exon length of 228.15 bp, and average intron length of 401.18 bp. Eleven genes putatively involved in ascorbate metabolism were identified and its expression in R. roxburghii leaves was validated by quantitative real-time PCR (qRT-PCR). This is the first report of genome-wide characterization of this rose species.
Label-Free Fluorescent DNA Dendrimers for microRNA Detection Based On Nonlinear Hybridization Chain Reaction-Mediated Multiple G-Quadruplex with Low Background Signal.

PubMed

Xue, Qingwang; Liu, Chunxue; Li, Xia; Dai, Li; Wang, Huaisheng

2018-04-18

Various fluorescent sensing systems for miRNA detection have been developed, but they mostly contain enzymatic amplification reactions and label procedures. The strict reaction conditions of tool enzymes and the high cost of labeling limit their potential applications, especially in complex biological matrices. Here, we have addressed the difficult problems and report a strategy for label-free fluorescent DNA dendrimers based on enzyme-free nonlinear hybridization chain reaction (HCR)-mediated multiple G-quadruplex for simple, sensitive, and selective detection of miRNAs with low-background signal. In the strategy, a split G-quadruplex (3:1) sequence is ingeniously designed at both ends of two double-stranded DNAs, which is exploited as building blocks for nonlinear HCR assembly, thereby acquiring a low background signal. A hairpin switch probe (HSP) was employed as recognition and transduction element. Upon sensing the target miRNA, the nonlinear HCR assembly of two blocks (blocks-A and blocks-B) was initiated with the help of two single-stranded DNA assistants, resulting in chain-branching growth of DNA dendrimers with multiple G-quadruplex incorporation. With the zinc(II)-protoporphyrin IX (ZnPPIX) selectively intercalated into the multiple G-quadruplexes, fluorescent DNA dendrimers were obtained, leading to an exponential fluorescence intensity increase. Benefiting from excellent performances of nonlinear HCR and low background signal, this strategy possesses the characteristics of a simplified reaction operation process, as well as high sensitivity. Moreover, the proposed fluorescent sensing strategy also shows preferable selectivity, and can be implemented without modified DNA blocks. Importantly, the strategy has also been tested for miRNA quantification with high confidence in breast cancer cells. Thus, this proposed strategy for label-free fluorescent DNA dendrimers based on a nonlinear HCR-mediated multiple G-quadruplex will be turned into an alternative approach for simple, sensitive, and selective miRNA quantification.
A Glance at Microsatellite Motifs from 454 Sequencing Reads of Watermelon Genomic DNA

USDA-ARS?s Scientific Manuscript database

A single 454 (Life Sciences Sequencing Technology) run of Charleston Gray watermelon (Citrullus lanatus var. lanatus) genomic DNA was performed and sequence data were assembled. A large scale identification of simple sequence repeat (SSR) was performed and SSR sequence data were used for the develo...
Development of chloroplast simple sequence repeats (cpSSRs) for the intraspecific study of Gracilaria tenuistipitata (Gracilariales, Rhodophyta) from different populations

PubMed Central

2014-01-01

Background Gracilaria tenuistipitata is an agarophyte with substantial economic potential because of its high growth rate and tolerance to a wide range of environment factors. This red seaweed is intensively cultured in China for the production of agar and fodder for abalone. Microsatellite markers were developed from the chloroplast genome of G. tenuistipitata var. liui to differentiate G. tenuistipitata obtained from six different localities: four from Peninsular Malaysia, one from Thailand and one from Vietnam. Eighty G. tenuistipitata specimens were analyzed using eight simple sequence repeat (SSR) primer-pairs that we developed for polymerase chain reaction (PCR) amplification. Findings Five mononucleotide primer-pairs and one trinucleotide primer-pair exhibited monomorphic alleles, whereas the other two primer-pairs separated the G. tenuistipitata specimens into two main clades. G. tenuistipitata from Thailand and Vietnam were grouped into one clade, and the populations from Batu Laut, Middle Banks and Kuah (Malaysia) were grouped into another clade. The combined dataset of these two primer-pairs separated G. tenuistipitata obtained from Kelantan, Malaysia from that obtained from other localities. Conclusions Based on the variations in repeated nucleotides of microsatellite markers, our results suggested that the populations of G. tenuistipitata were distributed into two main geographical regions: (i) populations in the west coast of Peninsular Malaysia and (ii) populations facing the South China Sea. The correct identification of G. tenuistipitata strains with traits of high economic potential will be advantageous for the mass cultivation of seaweeds. PMID:24490797
M13-Tailed Simple Sequence Repeat (SSR) Markers in Studies of Genetic Diversity and Population Structure of Common Oat Germplasm.

PubMed

Onyśk, Agnieszka; Boczkowska, Maja

2017-01-01

Simple Sequence Repeat (SSR) markers are one of the most frequently used molecular markers in studies of crop diversity and population structure. This is due to their uniform distribution in the genome, the high polymorphism, reproducibility, and codominant character. Additional advantages are the possibility of automatic analysis and simple interpretation of the results. The M13 tagged PCR reaction significantly reduces the costs of analysis by the automatic genetic analyzers. Here, we also disclose a short protocol of SSR data analysis.
Use of Sine Shaped High-Frequency Rhythmic Visual Stimuli Patterns for SSVEP Response Analysis and Fatigue Rate Evaluation in Normal Subjects

PubMed Central

Keihani, Ahmadreza; Shirzhiyan, Zahra; Farahi, Morteza; Shamsi, Elham; Mahnam, Amin; Makkiabadi, Bahador; Haidari, Mohsen R.; Jafari, Amir H.

2018-01-01

Background: Recent EEG-SSVEP signal based BCI studies have used high frequency square pulse visual stimuli to reduce subjective fatigue. However, the effect of total harmonic distortion (THD) has not been considered. Compared to CRT and LCD monitors, LED screen displays high-frequency wave with better refresh rate. In this study, we present high frequency sine wave simple and rhythmic patterns with low THD rate by LED to analyze SSVEP responses and evaluate subjective fatigue in normal subjects. Materials and Methods: We used patterns of 3-sequence high-frequency sine waves (25, 30, and 35 Hz) to design our visual stimuli. Nine stimuli patterns, 3 simple (repetition of each of above 3 frequencies e.g., P25-25-25) and 6 rhythmic (all of the frequencies in 6 different sequences e.g., P25-30-35) were chosen. A hardware setup with low THD rate (<0.1%) was designed to present these patterns on LED. Twenty two normal subjects (aged 23–30 (25 ± 2.1) yrs) were enrolled. Visual analog scale (VAS) was used for subjective fatigue evaluation after presentation of each stimulus pattern. PSD, CCA, and LASSO methods were employed to analyze SSVEP responses. The data including SSVEP features and fatigue rate for different visual stimuli patterns were statistically evaluated. Results: All 9 visual stimuli patterns elicited SSVEP responses. Overall, obtained accuracy rates were 88.35% for PSD and > 90% for CCA and LASSO (for TWs > 1 s). High frequency rhythmic patterns group with low THD rate showed higher accuracy rate (99.24%) than simple patterns group (98.48%). Repeated measure ANOVA showed significant difference between rhythmic pattern features (P < 0.0005). Overall, there was no significant difference between the VAS of rhythmic [3.85 ± 2.13] compared to the simple patterns group [3.96 ± 2.21], (P = 0.63). Rhythmic group had lower within group VAS variation (min = P25-30-35 [2.90 ± 2.45], max = P35-25-30 [4.81 ± 2.65]) as well as least individual pattern VAS (P25-30-35). Discussion and Conclusion: Overall, rhythmic and simple pattern groups had higher and similar accuracy rates. Rhythmic stimuli patterns showed insignificantly lower fatigue rate than simple patterns. We conclude that both rhythmic and simple visual high frequency sine wave stimuli require further research for human subject SSVEP-BCI studies. PMID:29892219
Use of Sine Shaped High-Frequency Rhythmic Visual Stimuli Patterns for SSVEP Response Analysis and Fatigue Rate Evaluation in Normal Subjects.

PubMed

Keihani, Ahmadreza; Shirzhiyan, Zahra; Farahi, Morteza; Shamsi, Elham; Mahnam, Amin; Makkiabadi, Bahador; Haidari, Mohsen R; Jafari, Amir H

2018-01-01

Background: Recent EEG-SSVEP signal based BCI studies have used high frequency square pulse visual stimuli to reduce subjective fatigue. However, the effect of total harmonic distortion (THD) has not been considered. Compared to CRT and LCD monitors, LED screen displays high-frequency wave with better refresh rate. In this study, we present high frequency sine wave simple and rhythmic patterns with low THD rate by LED to analyze SSVEP responses and evaluate subjective fatigue in normal subjects. Materials and Methods: We used patterns of 3-sequence high-frequency sine waves (25, 30, and 35 Hz) to design our visual stimuli. Nine stimuli patterns, 3 simple (repetition of each of above 3 frequencies e.g., P25-25-25) and 6 rhythmic (all of the frequencies in 6 different sequences e.g., P25-30-35) were chosen. A hardware setup with low THD rate (<0.1%) was designed to present these patterns on LED. Twenty two normal subjects (aged 23-30 (25 ± 2.1) yrs) were enrolled. Visual analog scale (VAS) was used for subjective fatigue evaluation after presentation of each stimulus pattern. PSD, CCA, and LASSO methods were employed to analyze SSVEP responses. The data including SSVEP features and fatigue rate for different visual stimuli patterns were statistically evaluated. Results: All 9 visual stimuli patterns elicited SSVEP responses. Overall, obtained accuracy rates were 88.35% for PSD and > 90% for CCA and LASSO (for TWs > 1 s). High frequency rhythmic patterns group with low THD rate showed higher accuracy rate (99.24%) than simple patterns group (98.48%). Repeated measure ANOVA showed significant difference between rhythmic pattern features ( P < 0.0005). Overall, there was no significant difference between the VAS of rhythmic [3.85 ± 2.13] compared to the simple patterns group [3.96 ± 2.21], ( P = 0.63). Rhythmic group had lower within group VAS variation (min = P25-30-35 [2.90 ± 2.45], max = P35-25-30 [4.81 ± 2.65]) as well as least individual pattern VAS (P25-30-35). Discussion and Conclusion: Overall, rhythmic and simple pattern groups had higher and similar accuracy rates. Rhythmic stimuli patterns showed insignificantly lower fatigue rate than simple patterns. We conclude that both rhythmic and simple visual high frequency sine wave stimuli require further research for human subject SSVEP-BCI studies.
All about the Human Genome Project (HGP)

MedlinePlus

... CSER), and Genome Sequencing Informatics Tools (GS-IT) Comparative Genomics Background information prepared for the media on ... other species to the human sequence. Background on Comparative Genomic Analysis New Process to Prioritize Animal Genomes ...
Evaluation of anonymous and expressed sequence tag derived polymorphic microsatellite markers in the tobacco budworm Heliothis virescens (Lepidoptera: noctuidae)

USDA-ARS?s Scientific Manuscript database

Polymorphic genetic markers were identified and characterized using a partial genomic library of Heliothis virescens enriched for simple sequence repeats (SSR) and nucleotide sequences of expressed sequence tags (EST). Nucleotide sequences of 192 clones from the partial genomic library yielded 147 u...
De novo Assembly of the Indo-Pacific Humpback Dolphin Leucocyte Transcriptome to Identify Putative Genes Involved in the Aquatic Adaptation and Immune Response

PubMed Central

Xia, Jia; Yang, Lili; Chen, Jialin; Wu, Yuping; Yi, Meisheng

2013-01-01

Background The Indo-Pacific humpback dolphin (Sousa chinensis), a marine mammal species inhabited in the waters of Southeast Asia, South Africa and Australia, has attracted much attention because of the dramatic decline in population size in the past decades, which raises the concern of extinction. So far, this species is poorly characterized at molecular level due to little sequence information available in public databases. Recent advances in large-scale RNA sequencing provide an efficient approach to generate abundant sequences for functional genomic analyses in the species with un-sequenced genomes. Principal Findings We performed a de novo assembly of the Indo-Pacific humpback dolphin leucocyte transcriptome by Illumina sequencing. 108,751 high quality sequences from 47,840,388 paired-end reads were generated, and 48,868 and 46,587 unigenes were functionally annotated by BLAST search against the NCBI non-redundant and Swiss-Prot protein databases (E-value<10−5), respectively. In total, 16,467 unigenes were clustered into 25 functional categories by searching against the COG database, and BLAST2GO search assigned 37,976 unigenes to 61 GO terms. In addition, 36,345 unigenes were grouped into 258 KEGG pathways. We also identified 9,906 simple sequence repeats and 3,681 putative single nucleotide polymorphisms as potential molecular markers in our assembled sequences. A large number of unigenes were predicted to be involved in immune response, and many genes were predicted to be relevant to adaptive evolution and cetacean-specific traits. Conclusion This study represented the first transcriptome analysis of the Indo-Pacific humpback dolphin, an endangered species. The de novo transcriptome analysis of the unique transcripts will provide valuable sequence information for discovery of new genes, characterization of gene expression, investigation of various pathways and adaptive evolution, as well as identification of genetic markers. PMID:24015242
Clustering method for counting passengers getting in a bus with single camera

NASA Astrophysics Data System (ADS)

Yang, Tao; Zhang, Yanning; Shao, Dapei; Li, Ying

2010-03-01

Automatic counting of passengers is very important for both business and security applications. We present a single-camera-based vision system that is able to count passengers in a highly crowded situation at the entrance of a traffic bus. The unique characteristics of the proposed system include, First, a novel feature-point-tracking- and online clustering-based passenger counting framework, which performs much better than those of background-modeling-and foreground-blob-tracking-based methods. Second, a simple and highly accurate clustering algorithm is developed that projects the high-dimensional feature point trajectories into a 2-D feature space by their appearance and disappearance times and counts the number of people through online clustering. Finally, all test video sequences in the experiment are captured from a real traffic bus in Shanghai, China. The results show that the system can process two 320×240 video sequences at a frame rate of 25 fps simultaneously, and can count passengers reliably in various difficult scenarios with complex interaction and occlusion among people. The method achieves high accuracy rates up to 96.5%.
Simple chained guide trees give high-quality protein multiple sequence alignments

PubMed Central

Boyce, Kieran; Sievers, Fabian; Higgins, Desmond G.

2014-01-01

Guide trees are used to decide the order of sequence alignment in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years in making these quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These also happen to be the fastest and simplest guide trees to construct, computationally. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages. There is a marked increase in accuracy and a marked decrease in computational time, once the number of sequences goes much above a few hundred. This is true, even if the order of sequences in the guide tree is random. PMID:25002495
Rapid Whole-Genome Sequencing for Investigation of a Neonatal MRSA Outbreak

PubMed Central

Köser, Claudio U.; Holden, Matthew T.G.; Ellington, Matthew J.; Cartwright, Edward J.P.; Brown, Nicholas M.; Ogilvy-Stuart, Amanda L.; Hsu, Li Yang; Chewapreecha, Claire; Croucher, Nicholas J.; Harris, Simon R.; Sanders, Mandy; Enright, Mark C.; Dougan, Gordon; Bentley, Stephen D.; Parkhill, Julian; Fraser, Louise J.; Betley, Jason R.; Schulz-Trieglaff, Ole B.; Smith, Geoffrey P.; Peacock, Sharon J.

2013-01-01

Background Isolates of methicillin-resistant Staphylococcus aureus (MRSA) belonging to a single lineage are often indistinguishable by means of current typing techniques. Whole-genome sequencing may provide improved resolution to define transmission pathways and characterize outbreaks. Methods We investigated a putative MRSA outbreak in a neonatal intensive care unit. By using rapid high-throughput sequencing technology with a clinically relevant turnaround time, we retrospectively sequenced the DNA from seven isolates associated with the outbreak and another seven MRSA isolates associated with carriage of MRSA or bacteremia in the same hospital. Results We constructed a phylogenetic tree by comparing single-nucleotide polymorphisms (SNPs) in the core genome to a reference genome (an epidemic MRSA clone, EMRSA-15 [sequence type 22]). This revealed a distinct cluster of outbreak isolates and clear separation between these and the nonoutbreak isolates. A previously missed transmission event was detected between two patients with bacteremia who were not part of the outbreak. We created an artificial “resistome” of antibiotic-resistance genes and demonstrated concordance between it and the results of phenotypic susceptibility testing; we also created a “toxome” consisting of toxin genes. One outbreak isolate had a hypermutator phenotype with a higher number of SNPs than the other outbreak isolates, highlighting the difficulty of imposing a simple threshold for the number of SNPs between isolates to decide whether they are part of a recent transmission chain. Conclusions Whole-genome sequencing can provide clinically relevant data within a time frame that can influence patient care. The need for automated data interpretation and the provision of clinically meaningful reports represent hurdles to clinical implementation. (Funded by the U.K. Clinical Research Collaboration Translational Infection Research Initiative and others.) PMID:22693998
Deconvoluting simulated metagenomes: the performance of hard- and soft- clustering algorithms applied to metagenomic chromosome conformation capture (3C)

PubMed Central

DeMaere, Matthew Z.

2016-01-01

Background Chromosome conformation capture, coupled with high throughput DNA sequencing in protocols like Hi-C and 3C-seq, has been proposed as a viable means of generating data to resolve the genomes of microorganisms living in naturally occuring environments. Metagenomic Hi-C and 3C-seq datasets have begun to emerge, but the feasibility of resolving genomes when closely related organisms (strain-level diversity) are present in the sample has not yet been systematically characterised. Methods We developed a computational simulation pipeline for metagenomic 3C and Hi-C sequencing to evaluate the accuracy of genomic reconstructions at, above, and below an operationally defined species boundary. We simulated datasets and measured accuracy over a wide range of parameters. Five clustering algorithms were evaluated (2 hard, 3 soft) using an adaptation of the extended B-cubed validation measure. Results When all genomes in a sample are below 95% sequence identity, all of the tested clustering algorithms performed well. When sequence data contains genomes above 95% identity (our operational definition of strain-level diversity), a naive soft-clustering extension of the Louvain method achieves the highest performance. Discussion Previously, only hard-clustering algorithms have been applied to metagenomic 3C and Hi-C data, yet none of these perform well when strain-level diversity exists in a metagenomic sample. Our simple extension of the Louvain method performed the best in these scenarios, however, accuracy remained well below the levels observed for samples without strain-level diversity. Strain resolution is also highly dependent on the amount of available 3C sequence data, suggesting that depth of sequencing must be carefully considered during experimental design. Finally, there appears to be great scope to improve the accuracy of strain resolution through further algorithm development. PMID:27843713
Microsatellite DNA in genomic survey sequences and UniGenes of loblolly pine

Treesearch

Craig S Echt; Surya Saha; Dennis L Deemer; C Dana Nelson

2011-01-01

Genomic DNA sequence databases are a potential and growing resource for simple sequence repeat (SSR) marker development in loblolly pine (Pinus taeda L.). Loblolly pine also has many expressed sequence tags (ESTs) available for microsatellite (SSR) marker development. We compared loblolly pine SSR densities in genome survey sequences (GSSs) to those in non-redundant...
Simulations Using Random-Generated DNA and RNA Sequences

ERIC Educational Resources Information Center

Bryce, C. F. A.

1977-01-01

Using a very simple computer program written in BASIC, a very large number of random-generated DNA or RNA sequences are obtained. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…

A SSR-based genetic linkage map of cultivated peanut (Arachis hypogaea L.)

USDA-ARS?s Scientific Manuscript database

The objective of this study was to construct a molecular linkage map of cultivated tetraploid peanut using simple sequence repeat (SSR) markers derived primarily from peanut genomic sequences, expressed sequence tags (ESTs), and by "data mining" sequences released in GenBank. Three recombinant inbre...
Development and transferability of black and red raspberry microsatellite markers from short-read sequences

USDA-ARS?s Scientific Manuscript database

The advent of next-generation sequencing technologies has been a boon to the cost-effective development of molecular markers, particularly in non-model species. Here, we demonstrate the efficiency of microsatellite or simple sequence repeat (SSR) marker development from short-read sequences using th...
Sequence analysis reveals genomic factors affecting EST-SSR primer performance and polymorphism

USDA-ARS?s Scientific Manuscript database

Search for simple sequence repeat (SSR) motifs and design of flanking primers in expressed sequence tag (EST) sequences can be easily done at a large scale using bioinformatics programs. However, failed amplification and/or detection, along with lack of polymorphism, is often seen among randomly sel...
Transcriptome characterization and polymorphism detection between subspecies of big sagebrush (Artemisia tridentata)

PubMed Central

2011-01-01

Background Big sagebrush (Artemisia tridentata) is one of the most widely distributed and ecologically important shrub species in western North America. This species serves as a critical habitat and food resource for many animals and invertebrates. Habitat loss due to a combination of disturbances followed by establishment of invasive plant species is a serious threat to big sagebrush ecosystem sustainability. Lack of genomic data has limited our understanding of the evolutionary history and ecological adaptation in this species. Here, we report on the sequencing of expressed sequence tags (ESTs) and detection of single nucleotide polymorphism (SNP) and simple sequence repeat (SSR) markers in subspecies of big sagebrush. Results cDNA of A. tridentata sspp. tridentata and vaseyana were normalized and sequenced using the 454 GS FLX Titanium pyrosequencing technology. Assembly of the reads resulted in 20,357 contig consensus sequences in ssp. tridentata and 20,250 contigs in ssp. vaseyana. A BLASTx search against the non-redundant (NR) protein database using 29,541 consensus sequences obtained from a combined assembly resulted in 21,436 sequences with significant blast alignments (≤ 1e-15). A total of 20,952 SNPs and 119 polymorphic SSRs were detected between the two subspecies. SNPs were validated through various methods including sequence capture. Validation of SNPs in different individuals uncovered a high level of nucleotide variation in EST sequences. EST sequences of a third, tetraploid subspecies (ssp. wyomingensis) obtained by Illumina sequencing were mapped to the consensus sequences of the combined 454 EST assembly. Approximately one-third of the SNPs between sspp. tridentata and vaseyana identified in the combined assembly were also polymorphic within the two geographically distant ssp. wyomingensis samples. Conclusion We have produced a large EST dataset for Artemisia tridentata, which contains a large sample of the big sagebrush leaf transcriptome. SNP mapping among the three subspecies suggest the origin of ssp. wyomingensis via mixed ancestry. A large number of SNP and SSR markers provide the foundation for future research to address questions in big sagebrush evolution, ecological genetics, and conservation using genomic approaches. PMID:21767398
Determining Phylogenetic Relationships Among Date Palm Cultivars Using Random Amplified Polymorphic DNA (RAPD) and Inter-Simple Sequence Repeat (ISSR) Markers.

PubMed

Haider, Nadia

2017-01-01

Investigation of genetic variation and phylogenetic relationships among date palm (Phoenix dactylifera L.) cultivars is useful for their conservation and genetic improvement. Various molecular markers such as restriction fragment length polymorphisms (RFLPs), simple sequence repeat (SSR), representational difference analysis (RDA), and amplified fragment length polymorphism (AFLP) have been developed to molecularly characterize date palm cultivars. PCR-based markers random amplified polymorphic DNA (RAPD) and inter-simple sequence repeat (ISSR) are powerful tools to determine the relatedness of date palm cultivars that are difficult to distinguish morphologically. In this chapter, the principles, materials, and methods of RAPD and ISSR techniques are presented. Analysis of data generated from these two techniques and the use of these data to reveal phylogenetic relationships among date palm cultivars are also discussed.
Characterization and Amplification of Gene-Based Simple Sequence Repeat (SSR) Markers in Date Palm.

PubMed

Zhao, Yongli; Keremane, Manjunath; Prakash, Channapatna S; He, Guohao

2017-01-01

The paucity of molecular markers limits the application of genetic and genomic research in date palm (Phoenix dactylifera L.). Availability of expressed sequence tag (EST) sequences in date palm may provide a good resource for developing gene-based markers. This study characterizes a substantial fraction of transcriptome sequences containing simple sequence repeats (SSRs) from the EST sequences in date palm. The EST sequences studied are mainly homologous to those of Elaeis guineensis and Musa acuminata. A total of 911 gene-based SSR markers, characterized with functional annotations, have provided a useful basis not only for discovering candidate genes and understanding genetic basis of traits of interest but also for developing genetic and genomic tools for molecular research in date palm, such as diversity study, quantitative trait locus (QTL) mapping, and molecular breeding. The procedures of DNA extraction, polymerase chain reaction (PCR) amplification of these gene-based SSR markers, and gel electrophoresis of PCR products are described in this chapter.
Object tracking via background subtraction for monitoring illegal activity in crossroad

NASA Astrophysics Data System (ADS)

Ghimire, Deepak; Jeong, Sunghwan; Park, Sang Hyun; Lee, Joonwhoan

2016-07-01

In the field of intelligent transportation system a great number of vision-based techniques have been proposed to prevent pedestrians from being hit by vehicles. This paper presents a system that can perform pedestrian and vehicle detection and monitoring of illegal activity in zebra crossings. In zebra crossing, according to the traffic light status, to fully avoid a collision, a driver or pedestrian should be warned earlier if they possess any illegal moves. In this research, at first, we detect the traffic light status of pedestrian and monitor the crossroad for vehicle pedestrian moves. The background subtraction based object detection and tracking is performed to detect pedestrian and vehicles in crossroads. Shadow removal, blob segmentation, trajectory analysis etc. are used to improve the object detection and classification performance. We demonstrate the experiment in several video sequences which are recorded in different time and environment such as day time and night time, sunny and raining environment. Our experimental results show that such simple and efficient technique can be used successfully as a traffic surveillance system to prevent accidents in zebra crossings.
Use of Dried Blood Spots to Elucidate Full-Length Transmitted/Founder HIV-1 Genomes

PubMed Central

Salazar-Gonzalez, Jesus F.; Salazar, Maria G.; Tully, Damien C.; Ogilvie, Colin B.; Learn, Gerald H.; Allen, Todd M.; Heath, Sonya L.; Goepfert, Paul; Bar, Katharine J.

2016-01-01

Background Identification of HIV-1 genomes responsible for establishing clinical infection in newly infected individuals is fundamental to prevention and pathogenesis research. Processing, storage, and transportation of the clinical samples required to perform these virologic assays in resource-limited settings requires challenging venipuncture and cold chain logistics. Here, we validate the use of dried-blood spots (DBS) as a simple and convenient alternative to collecting and storing frozen plasma. Methods We performed parallel nucleic acid extraction, single genome amplification (SGA), next generation sequencing (NGS), and phylogenetic analyses on plasma and DBS. Results We demonstrated the capacity to extract viral RNA from DBS and perform SGA to infer the complete nucleotide sequence of the transmitted/founder (TF) HIV-1 envelope gene and full-length genome in two acutely infected individuals. Using both SGA and NGS methodologies, we showed that sequences generated from DBS and plasma display comparable phylogenetic patterns in both acute and chronic infection. SGA was successful on samples with a range of plasma viremia, including samples as low as 1,700 copies/ml and an estimated ∼50 viral copies per blood spot. Further, we demonstrated reproducible efficiency in gp160 env sequencing in DBS stored at ambient temperature for up to three weeks or at -20°C for up to five months. Conclusions These findings support the use of DBS as a practical and cost-effective alternative to frozen plasma for clinical trials and translational research conducted in resource-limited settings. PMID:27819061
A comprehensive evaluation of assembly scaffolding tools

PubMed Central

2014-01-01

Background Genome assembly is typically a two-stage process: contig assembly followed by the use of paired sequencing reads to join contigs into scaffolds. Scaffolds are usually the focus of reported assembly statistics; longer scaffolds greatly facilitate the use of genome sequences in downstream analyses, and it is appealing to present larger numbers as metrics of assembly performance. However, scaffolds are highly prone to errors, especially when generated using short reads, which can directly result in inflated assembly statistics. Results Here we provide the first independent evaluation of scaffolding tools for second-generation sequencing data. We find large variations in the quality of results depending on the tool and dataset used. Even extremely simple test cases of perfect input, constructed to elucidate the behaviour of each algorithm, produced some surprising results. We further dissect the performance of the scaffolders using real and simulated sequencing data derived from the genomes of Staphylococcus aureus, Rhodobacter sphaeroides, Plasmodium falciparum and Homo sapiens. The results from simulated data are of high quality, with several of the tools producing perfect output. However, at least 10% of joins remains unidentified when using real data. Conclusions The scaffolders vary in their usability, speed and number of correct and missed joins made between contigs. Results from real data highlight opportunities for further improvements of the tools. Overall, SGA, SOPRA and SSPACE generally outperform the other tools on our datasets. However, the quality of the results is highly dependent on the read mapper and genome complexity. PMID:24581555
CRISPR Associated Diversity within a Population of Sulfolobus islandicus

PubMed Central

Held, Nicole L.; Herrera, Alfa; Cadillo-Quiroz, Hinsby; Whitaker, Rachel J.

2010-01-01

Background Predator-prey models for virus-host interactions predict that viruses will cause oscillations of microbial host densities due to an arms race between resistance and virulence. A new form of microbial resistance, CRISPRs (clustered regularly interspaced short palindromic repeats) are a rapidly evolving, sequence-specific immunity mechanism in which a short piece of invading viral DNA is inserted into the host's chromosome, thereby rendering the host resistant to further infection. Few studies have linked this form of resistance to population dynamics in natural microbial populations. Methodology/Principal Findings We examined sequence diversity in 39 strains of the archeaon Sulfolobus islandicus from a single, isolated hot spring from Kamchatka, Russia to determine the effects of CRISPR immunity on microbial population dynamics. First, multiple housekeeping genetic markers identify a large clonal group of identical genotypes coexisting with a diverse set of rare genotypes. Second, the sequence-specific CRISPR spacer arrays split the large group of isolates into two very different groups and reveal extensive diversity and no evidence for dominance of a single clone within the population. Conclusions/Significance The evenness of resistance genotypes found within this population of S. islandicus is indicative of a lack of strain dominance, in contrast to the prediction for a resistant strain in a simple predator-prey interaction. Based on evidence for the independent acquisition of resistant sequences, we hypothesize that CRISPR mediated clonal interference between resistant strains promotes and maintains diversity in this natural population. PMID:20927396
On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation

PubMed Central

2014-01-01

Background Protein sequence similarities to any types of non-globular segments (coiled coils, low complexity regions, transmembrane regions, long loops, etc. where either positional sequence conservation is the result of a very simple, physically induced pattern or rather integral sequence properties are critical) are pertinent sources for mistaken homologies. Regretfully, these considerations regularly escape attention in large-scale annotation studies since, often, there is no substitute to manual handling of these cases. Quantitative criteria are required to suppress events of function annotation transfer as a result of false homology assignments. Results The sequence homology concept is based on the similarity comparison between the structural elements, the basic building blocks for conferring the overall fold of a protein. We propose to dissect the total similarity score into fold-critical and other, remaining contributions and suggest that, for a valid homology statement, the fold-relevant score contribution should at least be significant on its own. As part of the article, we provide the DissectHMMER software program for dissecting HMMER2/3 scores into segment-specific contributions. We show that DissectHMMER reproduces HMMER2/3 scores with sufficient accuracy and that it is useful in automated decisions about homology for instructive sequence examples. To generalize the dissection concept for cases without 3D structural information, we find that a dissection based on alignment quality is an appropriate surrogate. The approach was applied to a large-scale study of SMART and PFAM domains in the space of seed sequences and in the space of UniProt/SwissProt. Conclusions Sequence similarity core dissection with regard to fold-critical and other contributions systematically suppresses false hits and, additionally, recovers previously obscured homology relationships such as the one between aquaporins and formate/nitrite transporters that, so far, was only supported by structure comparison. PMID:24890864
Microsatellites for Lindera species

Treesearch

Craig S. Echt; D. Deemer; T.L. Kubisiak; C.D. Nelson

2006-01-01

Microsatellite markers were developed for conservation genetic studies of Lindera melissifolia (pondberry), a federally endangered shrub of southern bottomland ecosystems. Microsatellite sequences were obtained from DNA libraries that were enriched for the (AC)n simple sequence repeat motif. From 35 clone sequences, 20 primer...
Application of next-generation sequencing for rapid marker development in molecular plant breeding: a case study on anthracnose disease resistance in Lupinus angustifolius L.

PubMed Central

2012-01-01

Background In the last 30 years, a number of DNA fingerprinting methods such as RFLP, RAPD, AFLP, SSR, DArT, have been extensively used in marker development for molecular plant breeding. However, it remains a daunting task to identify highly polymorphic and closely linked molecular markers for a target trait for molecular marker-assisted selection. The next-generation sequencing (NGS) technology is far more powerful than any existing generic DNA fingerprinting methods in generating DNA markers. In this study, we employed a grain legume crop Lupinus angustifolius (lupin) as a test case, and examined the utility of an NGS-based method of RAD (restriction-site associated DNA) sequencing as DNA fingerprinting for rapid, cost-effective marker development tagging a disease resistance gene for molecular breeding. Results Twenty informative plants from a cross of RxS (disease resistant x susceptible) in lupin were subjected to RAD single-end sequencing by multiplex identifiers. The entire RAD sequencing products were resolved in two lanes of the 16-lanes per run sequencing platform Solexa HiSeq2000. A total of 185 million raw reads, approximately 17 Gb of sequencing data, were collected. Sequence comparison among the 20 test plants discovered 8207 SNP markers. Filtration of DNA sequencing data with marker identification parameters resulted in the discovery of 38 molecular markers linked to the disease resistance gene Lanr1. Five randomly selected markers were converted into cost-effective, simple PCR-based markers. Linkage analysis using marker genotyping data and disease resistance phenotyping data on a F8 population consisting of 186 individual plants confirmed that all these five markers were linked to the R gene. Two of these newly developed sequence-specific PCR markers, AnSeq3 and AnSeq4, flanked the target R gene at a genetic distance of 0.9 centiMorgan (cM), and are now replacing the markers previously developed by a traditional DNA fingerprinting method for marker-assisted selection in the Australian national lupin breeding program. Conclusions We demonstrated that more than 30 molecular markers linked to a target gene of agronomic trait of interest can be identified from a small portion (1/8) of one sequencing run on HiSeq2000 by applying NGS based RAD sequencing in marker development. The markers developed by the strategy described in this study are all co-dominant SNP markers, which can readily be converted into high throughput multiplex format or low-cost, simple PCR-based markers desirable for large scale marker implementation in plant breeding programs. The high density and closely linked molecular markers associated with a target trait help to overcome a major bottleneck for implementation of molecular markers on a wide range of germplasm in breeding programs. We conclude that application of NGS based RAD sequencing as DNA fingerprinting is a very rapid and cost-effective strategy for marker development in molecular plant breeding. The strategy does not require any prior genome knowledge or molecular information for the species under investigation, and it is applicable to other plant species. PMID:22805587
Targeted isolation, sequence assembly and characterization of two white spruce (Picea glauca) BAC clones for terpenoid synthase and cytochrome P450 genes involved in conifer defence reveal insights into a conifer genome

PubMed Central

2009-01-01

Background Conifers are a large group of gymnosperm trees which are separated from the angiosperms by more than 300 million years of independent evolution. Conifer genomes are extremely large and contain considerable amounts of repetitive DNA. Currently, conifer sequence resources exist predominantly as expressed sequence tags (ESTs) and full-length (FL)cDNAs. There is no genome sequence available for a conifer or any other gymnosperm. Conifer defence-related genes often group into large families with closely related members. The goals of this study are to assess the feasibility of targeted isolation and sequence assembly of conifer BAC clones containing specific genes from two large gene families, and to characterize large segments of genomic DNA sequence for the first time from a conifer. Results We used a PCR-based approach to identify BAC clones for two target genes, a terpene synthase (3-carene synthase; 3CAR) and a cytochrome P450 (CYP720B4) from a non-arrayed genomic BAC library of white spruce (Picea glauca). Shotgun genomic fragments isolated from the BAC clones were sequenced to a depth of 15.6- and 16.0-fold coverage, respectively. Assembly and manual curation yielded sequence scaffolds of 172 kbp (3CAR) and 94 kbp (CYP720B4) long. Inspection of the genomic sequences revealed the intron-exon structures, the putative promoter regions and putative cis-regulatory elements of these genes. Sequences related to transposable elements (TEs), high complexity repeats and simple repeats were prevalent and comprised approximately 40% of the sequenced genomic DNA. An in silico simulation of the effect of sequencing depth on the quality of the sequence assembly provides direction for future efforts of conifer genome sequencing. Conclusion We report the first targeted cloning, sequencing, assembly, and annotation of large segments of genomic DNA from a conifer. We demonstrate that genomic BAC clones for individual members of multi-member gene families can be isolated in a gene-specific fashion. The results of the present work provide important new information about the structure and content of conifer genomic DNA that will guide future efforts to sequence and assemble conifer genomes. PMID:19656416
Differential effects of simple repeating DNA sequences on gene expression from the SV40 early promoter.

PubMed

Amirhaeri, S; Wohlrab, F; Wells, R D

1995-02-17

The influence of simple repeat sequences, cloned into different positions relative to the SV40 early promoter/enhancer, on the transient expression of the chloramphenicol acetyltransferase (CAT) gene was investigated. Insertion of (G)29.(C)29 in either orientation into the 5'-untranslated region of the CAT gene reduced expression in CV-1 cells 50-100 fold when compared with controls with random sequence inserts. Analysis of CAT-specific mRNA levels demonstrated that the effect was due to a reduction of CAT mRNA production rather than to posttranscriptional events. In contrast, insertion of the same insert in either orientation upstream of the promoter-enhancer or downstream of the gene stimulated gene expression 2-3-fold. These effects could be reversed by cotransfection of a competitor plasmid carrying (G)25.(C)25 sequences. The results suggest that a G.C-binding transcription factor modulates gene expression in this system and that promoter strength can be regulated by providing protein-binding sites in trans. Although constructs containing longer tracts of alternating (C-G), (T-G), or (A-T) sequences inhibited CAT expression when inserted in the 5'-untranslated region of the CAT gene, the amount of CAT mRNA was unaffected. Hence, these inhibitions must be due to posttranscriptional events, presumably at the level of translation. These effects of microsatellite sequences on gene expression are discussed with respect to recent data on related simple repeat sequences which cause several human genetic diseases.
Simple-MSSM: a simple and efficient method for simultaneous multi-site saturation mutagenesis.

PubMed

Cheng, Feng; Xu, Jian-Miao; Xiang, Chao; Liu, Zhi-Qiang; Zhao, Li-Qing; Zheng, Yu-Guo

2017-04-01

To develop a practically simple and robust multi-site saturation mutagenesis (MSSM) method that enables simultaneously recombination of amino acid positions for focused mutant library generation. A general restriction enzyme-free and ligase-free MSSM method (Simple-MSSM) based on prolonged overlap extension PCR (POE-PCR) and Simple Cloning techniques. As a proof of principle of Simple-MSSM, the gene of eGFP (enhanced green fluorescent protein) was used as a template gene for simultaneous mutagenesis of five codons. Forty-eight randomly selected clones were sequenced. Sequencing revealed that all the 48 clones showed at least one mutant codon (mutation efficiency = 100%), and 46 out of the 48 clones had mutations at all the five codons. The obtained diversities at these five codons are 27, 24, 26, 26 and 22, respectively, which correspond to 84, 75, 81, 81, 69% of the theoretical diversity offered by NNK-degeneration (32 codons; NNK, K = T or G). The enzyme-free Simple-MSSM method can simultaneously and efficiently saturate five codons within one day, and therefore avoid missing interactions between residues in interacting amino acid networks.
Method for Constructing Composite Response Surfaces by Combining Neural Networks with Polynominal Interpolation or Estimation Techniques

NASA Technical Reports Server (NTRS)

Rai, Man Mohan (Inventor); Madavan, Nateri K. (Inventor)

2007-01-01

A method and system for data modeling that incorporates the advantages of both traditional response surface methodology (RSM) and neural networks is disclosed. The invention partitions the parameters into a first set of s simple parameters, where observable data are expressible as low order polynomials, and c complex parameters that reflect more complicated variation of the observed data. Variation of the data with the simple parameters is modeled using polynomials; and variation of the data with the complex parameters at each vertex is analyzed using a neural network. Variations with the simple parameters and with the complex parameters are expressed using a first sequence of shape functions and a second sequence of neural network functions. The first and second sequences are multiplicatively combined to form a composite response surface, dependent upon the parameter values, that can be used to identify an accurate mode
Principles of protein folding--a perspective from simple exact models.

PubMed Central

Dill, K. A.; Bromberg, S.; Yue, K.; Fiebig, K. M.; Yee, D. P.; Thomas, P. D.; Chan, H. S.

1995-01-01

General principles of protein structure, stability, and folding kinetics have recently been explored in computer simulations of simple exact lattice models. These models represent protein chains at a rudimentary level, but they involve few parameters, approximations, or implicit biases, and they allow complete explorations of conformational and sequence spaces. Such simulations have resulted in testable predictions that are sometimes unanticipated: The folding code is mainly binary and delocalized throughout the amino acid sequence. The secondary and tertiary structures of a protein are specified mainly by the sequence of polar and nonpolar monomers. More specific interactions may refine the structure, rather than dominate the folding code. Simple exact models can account for the properties that characterize protein folding: two-state cooperativity, secondary and tertiary structures, and multistage folding kinetics--fast hydrophobic collapse followed by slower annealing. These studies suggest the possibility of creating "foldable" chain molecules other than proteins. The encoding of a unique compact chain conformation may not require amino acids; it may require only the ability to synthesize specific monomer sequences in which at least one monomer type is solvent-averse. PMID:7613459
Identification of Simple Sequence Repeats in Chloroplast Genomes of Magnoliids Through Bioinformatics Approach.

PubMed

Srivastava, Deepika; Shanker, Asheesh

2016-12-01

Basal angiosperms or Magnoliids is an important clade of commercially important plants which mainly include spices and edible fruits. In this study, 17 chloroplast genome sequences belonging to clade Magnoliids were screened for the identification of chloroplast simple sequence repeats (cpSSRs). Simple sequence repeats or microsatellites are short stretches of DNA up to 1-6 base pair in length. These repeats are ubiquitous and play important role in the development of molecular markers and to study the mapping of traits of economic, medical or ecological interest. A total of 479 SSRs were detected, showing average density of 1 SSR/6.91 kb. Depending on the repeat units, the length of SSRs ranged from 12 to 24 bp for mono-, 12 to 18 bp for di-, 12 to 26 bp for tri-, 12 to 24 bp for tetra-, 15 bp for penta- and 18 bp for hexanucleotide repeats. Mononucleotide repeats were the most frequent (207, 43.21 %) followed by tetranucleotide repeats (130, 27.13 %). Penta- and hexanucleotide repeats were least frequent or absent in these chloroplast genomes.
New chloroplast microsatellite markers suitable for assessing genetic diversity of Lolium perenne and other related grass species

PubMed Central

Diekmann, Kerstin; Hodkinson, Trevor R.; Barth, Susanne

2012-01-01

Background and Aims Lolium perenne (perennial ryegrass) is the most important forage grass species of temperate regions. We have previously released the chloroplast genome sequence of L. perenne ‘Cashel’. Here nine chloroplast microsatellite markers are published, which were designed based on knowledge about genetically variable regions within the L. perenne chloroplast genome. These markers were successfully used for characterizing the genetic diversity in Lolium and different grass species. Methods Chloroplast genomes of 14 Poaceae taxa were screened for mononucleotide microsatellite repeat regions and primers designed for their amplification from nine loci. The potential of these markers to assess genetic diversity was evaluated on a set of 16 Irish and 15 European L. perenne ecotypes, nine L. perenne cultivars, other Lolium taxa and other grass species. Key Results All analysed Poaceae chloroplast genomes contained more than 200 mononucleotide repeats (chloroplast simple sequence repeats, cpSSRs) of at least 7 bp in length, concentrated mainly in the large single copy region of the genome. Nucleotide composition varied considerably among subfamilies (with Pooideae biased towards poly A repeats). The nine new markers distinguish L. perenne from all non-Lolium taxa. TeaCpSSR28 was able to distinguish between all Lolium species and Lolium multiflorum due to an elongation of an A8 mononucleotide repeat in L. multiflorum. TeaCpSSR31 detected a considerable degree of microsatellite length variation and single nucleotide polymorphism. TeaCpSSR27 revealed variation within some L. perenne accessions due to a 44-bp indel and was hence readily detected by simple agarose gel electrophoresis. Smaller insertion/deletion events or single nucleotide polymorphisms detected by these new markers could be visualized by polyacrylamide gel electrophoresis or DNA sequencing, respectively. Conclusions The new markers are a valuable tool for plant breeding companies, seed testing agencies and the wider scientific community due to their ability to monitor genetic diversity within breeding pools, to trace maternal inheritance and to distinguish closely related species. PMID:22419761

Construction of the first genetic linkage map of Japanese gentian (Gentianaceae)

PubMed Central

2012-01-01

Background Japanese gentians (Gentiana triflora and Gentiana scabra) are amongst the most popular floricultural plants in Japan. However, genomic resources for Japanese gentians have not yet been developed, mainly because of the heterozygous genome structure conserved by outcrossing, the long juvenile period, and limited knowledge about the inheritance of important traits. In this study, we developed a genetic linkage map to improve breeding programs of Japanese gentians. Results Enriched simple sequence repeat (SSR) libraries from a G. triflora double haploid line yielded almost 20,000 clones using 454 pyrosequencing technology, 6.7% of which could be used to design SSR markers. To increase the number of molecular markers, we identified three putative long terminal repeat (LTR) sequences using the recently developed inter-primer binding site (iPBS) method. We also developed retrotransposon microsatellite amplified polymorphism (REMAP) markers combining retrotransposon and inter-simple sequence repeat (ISSR) markers. In addition to SSR and REMAP markers, modified amplified fragment length polymorphism (AFLP) and random amplification polymorphic DNA (RAPD) markers were developed. Using 93 BC1 progeny from G. scabra backcrossed with a G. triflora double haploid line, 19 linkage groups were constructed with a total of 263 markers (97 SSR, 97 AFLP, 39 RAPD, and 30 REMAP markers). One phenotypic trait (stem color) and 10 functional markers related to genes controlling flower color, flowering time and cold tolerance were assigned to the linkage map, confirming its utility. Conclusions This is the first reported genetic linkage map for Japanese gentians and for any species belonging to the family Gentianaceae. As demonstrated by mapping of functional markers and the stem color trait, our results will help to explain the genetic basis of agronomic important traits, and will be useful for marker-assisted selection in gentian breeding programs. Our map will also be an important resource for further genetic analyses such as mapping of quantitative trait loci and map-based cloning of genes in this species. PMID:23186361
A 48 SNP set for grapevine cultivar identification

PubMed Central

2011-01-01

Background Rapid and consistent genotyping is an important requirement for cultivar identification in many crop species. Among them grapevine cultivars have been the subject of multiple studies given the large number of synonyms and homonyms generated during many centuries of vegetative multiplication and exchange. Simple sequence repeat (SSR) markers have been preferred until now because of their high level of polymorphism, their codominant nature and their high profile repeatability. However, the rapid application of partial or complete genome sequencing approaches is identifying thousands of single nucleotide polymorphisms (SNP) that can be very useful for such purposes. Although SNP markers are bi-allelic, and therefore not as polymorphic as microsatellites, the high number of loci that can be multiplexed and the possibilities of automation as well as their highly repeatable results under any analytical procedure make them the future markers of choice for any type of genetic identification. Results We analyzed over 300 SNP in the genome of grapevine using a re-sequencing strategy in a selection of 11 genotypes. Among the identified polymorphisms, we selected 48 SNP spread across all grapevine chromosomes with allele frequencies balanced enough as to provide sufficient information content for genetic identification in grapevine allowing for good genotyping success rate. Marker stability was tested in repeated analyses of a selected group of cultivars obtained worldwide to demonstrate their usefulness in genetic identification. Conclusions We have selected a set of 48 stable SNP markers with a high discrimination power and a uniform genome distribution (2-3 markers/chromosome), which is proposed as a standard set for grapevine (Vitis vinifera L.) genotyping. Any previous problems derived from microsatellite allele confusion between labs or the need to run reference cultivars to identify allele sizes disappear using this type of marker. Furthermore, because SNP markers are bi-allelic, allele identification and genotype naming are extremely simple and genotypes obtained with different equipments and by different laboratories are always fully comparable. PMID:22060012
Use of PCR with Sequence-specific Primers for High-Resolution Human Leukocyte Antigen Typing of Patients with Narcolepsy

PubMed Central

Woo, Hye In; Joo, Eun Yeon; Lee, Kyung Wha

2012-01-01

Background Narcolepsy is a neurologic disorder characterized by excessive daytime sleepiness, symptoms of abnormal rapid eye movement (REM) sleep, and a strong association with HLA-DRB1*1501, -DQA1*0102, and -DQB1*0602. Here, we investigated the clinico-physical characteristics of Korean patients with narcolepsy, their HLA types, and the clinical utility of high-resolution PCR with sequence-specific primers (PCR-SSP) as a simple typing method for identifying DRB1*15/16, DQA1, and DQB1 alleles. Methods The study population consisted of 67 consecutively enrolled patients having unexplained daytime sleepiness and diagnosed narcolepsy based on clinical and neurological findings. Clinical data and the results of the multiple sleep latency test and polysomnography were reviewed, and HLA typing was performed using both high-resolution PCR-SSP and sequence-based typing (SBT). Results The 44 narcolepsy patients with cataplexy displayed significantly higher frequencies of DRB1*1501 (Pc= 0.003), DQA1*0102 (Pc=0.001), and DQB1*0602 (Pc=0.014) than the patients without cataplexy. Among patients carrying DRB1*1501-DQB1*0602 or DQA1*0102, the frequencies of a mean REM sleep latency of less than 20 min in nocturnal polysomnography and clinical findings, including sleep paralysis and hypnagogic hallucination were significantly higher. SBT and PCR-SSP showed 100% concordance for high-resolution typing of DRB1*15/16 alleles and DQA1 and DQB1 loci. Conclusions The clinical characteristics and somnographic findings of narcolepsy patients were associated with specific HLA alleles, including DRB1*1501, DQA1*0102, and DQB1*0602. Application of high-resolution PCR-SSP, a reliable and simple method, for both allele- and locus-specific HLA typing of DRB1*15/16, DQA1, and DQB1 would be useful for characterizing clinical status among subjects with narcolepsy. PMID:22259780
Aircraft stress sequence development: A complex engineering process made simple

NASA Technical Reports Server (NTRS)

Schrader, K. H.; Butts, D. G.; Sparks, W. A.

1994-01-01

Development of stress sequences for critical aircraft structure requires flight measured usage data, known aircraft loads, and established relationships between aircraft flight loads and structural stresses. Resulting cycle-by-cycle stress sequences can be directly usable for crack growth analysis and coupon spectra tests. Often, an expert in loads and spectra development manipulates the usage data into a typical sequence of representative flight conditions for which loads and stresses are calculated. For a fighter/trainer type aircraft, this effort is repeated many times for each of the fatigue critical locations (FCL) resulting in expenditure of numerous engineering hours. The Aircraft Stress Sequence Computer Program (ACSTRSEQ), developed by Southwest Research Institute under contract to San Antonio Air Logistics Center, presents a unique approach for making complex technical computations in a simple, easy to use method. The program is written in Microsoft Visual Basic for the Microsoft Windows environment.
Hippocampal Replay is Not a Simple Function of Experience

PubMed Central

Gupta, Anoopum S.; van der Meer, Matthijs A. A.; Touretzky, David S.; Redish, A. David

2015-01-01

Summary Replay of behavioral sequences in the hippocampus during sharp-wave-ripple-complexes (SWRs) provides a potential mechanism for memory consolidation and the learning of knowledge structures. Current hypotheses imply that replay should straightforwardly reflect recent experience. However, we find these hypotheses to be incompatible with the content of replay on a task with two distinct behavioral sequences (A&B). We observed forward and backward replay of B even when rats had been performing A for >10 minutes. Furthermore, replay of non-local sequence B occurred more often when B was infrequently experienced. Neither forward nor backward sequences preferentially represented highly-experienced trajectories within a session. Additionally, we observed the construction of never-experienced novel-path sequences. These observations challenge the idea that sequence activation during SWRs is a simple replay of recent experience. Instead, replay reflected all physically available trajectories within the environment, suggesting a potential role in active learning and maintenance of the cognitive map. PMID:20223204
Alignment-free Transcriptomic and Metatranscriptomic Comparison Using Sequencing Signatures with Variable Length Markov Chains.

PubMed

Liao, Weinan; Ren, Jie; Wang, Kun; Wang, Shun; Zeng, Feng; Wang, Ying; Sun, Fengzhu

2016-11-23

The comparison between microbial sequencing data is critical to understand the dynamics of microbial communities. The alignment-based tools analyzing metagenomic datasets require reference sequences and read alignments. The available alignment-free dissimilarity approaches model the background sequences with Fixed Order Markov Chain (FOMC) yielding promising results for the comparison of microbial communities. However, in FOMC, the number of parameters grows exponentially with the increase of the order of Markov Chain (MC). Under a fixed high order of MC, the parameters might not be accurately estimated owing to the limitation of sequencing depth. In our study, we investigate an alternative to FOMC to model background sequences with the data-driven Variable Length Markov Chain (VLMC) in metatranscriptomic data. The VLMC originally designed for long sequences was extended to apply to high-throughput sequencing reads and the strategies to estimate the corresponding parameters were developed. The flexible number of parameters in VLMC avoids estimating the vast number of parameters of high-order MC under limited sequencing depth. Different from the manual selection in FOMC, VLMC determines the MC order adaptively. Several beta diversity measures based on VLMC were applied to compare the bacterial RNA-Seq and metatranscriptomic datasets. Experiments show that VLMC outperforms FOMC to model the background sequences in transcriptomic and metatranscriptomic samples. A software pipeline is available at https://d2vlmc.codeplex.com.
Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus

PubMed Central

Labudde, Dirk

2015-01-01

The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations. PMID:26180540
Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus.

PubMed

Grunert, Steffen; Labudde, Dirk

2015-01-01

The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations.
Detection of Bacillus anthracis DNA in Complex Soil and Air Samples Using Next-Generation Sequencing

PubMed Central

Be, Nicholas A.; Thissen, James B.; Gardner, Shea N.; McLoughlin, Kevin S.; Fofanov, Viacheslav Y.; Koshinsky, Heather; Ellingson, Sally R.; Brettin, Thomas S.; Jackson, Paul J.; Jaing, Crystal J.

2013-01-01

Bacillus anthracis is the potentially lethal etiologic agent of anthrax disease, and is a significant concern in the realm of biodefense. One of the cornerstones of an effective biodefense strategy is the ability to detect infectious agents with a high degree of sensitivity and specificity in the context of a complex sample background. The nature of the B. anthracis genome, however, renders specific detection difficult, due to close homology with B. cereus and B. thuringiensis. We therefore elected to determine the efficacy of next-generation sequencing analysis and microarrays for detection of B. anthracis in an environmental background. We applied next-generation sequencing to titrated genome copy numbers of B. anthracis in the presence of background nucleic acid extracted from aerosol and soil samples. We found next-generation sequencing to be capable of detecting as few as 10 genomic equivalents of B. anthracis DNA per nanogram of background nucleic acid. Detection was accomplished by mapping reads to either a defined subset of reference genomes or to the full GenBank database. Moreover, sequence data obtained from B. anthracis could be reliably distinguished from sequence data mapping to either B. cereus or B. thuringiensis. We also demonstrated the efficacy of a microbial census microarray in detecting B. anthracis in the same samples, representing a cost-effective and high-throughput approach, complementary to next-generation sequencing. Our results, in combination with the capacity of sequencing for providing insights into the genomic characteristics of complex and novel organisms, suggest that these platforms should be considered important components of a biosurveillance strategy. PMID:24039948
MELOGEN: an EST database for melon functional genomics

PubMed Central

Gonzalez-Ibeas, Daniel; Blanca, José; Roig, Cristina; González-To, Mireia; Picó, Belén; Truniger, Verónica; Gómez, Pedro; Deleu, Wim; Caño-Delgado, Ana; Arús, Pere; Nuez, Fernando; Garcia-Mas, Jordi; Puigdomènech, Pere; Aranda, Miguel A

2007-01-01

Background Melon (Cucumis melo L.) is one of the most important fleshy fruits for fresh consumption. Despite this, few genomic resources exist for this species. To facilitate the discovery of genes involved in essential traits, such as fruit development, fruit maturation and disease resistance, and to speed up the process of breeding new and better adapted melon varieties, we have produced a large collection of expressed sequence tags (ESTs) from eight normalized cDNA libraries from different tissues in different physiological conditions. Results We determined over 30,000 ESTs that were clustered into 16,637 non-redundant sequences or unigenes, comprising 6,023 tentative consensus sequences (contigs) and 10,614 unclustered sequences (singletons). Many potential molecular markers were identified in the melon dataset: 1,052 potential simple sequence repeats (SSRs) and 356 single nucleotide polymorphisms (SNPs) were found. Sixty-nine percent of the melon unigenes showed a significant similarity with proteins in databases. Functional classification of the unigenes was carried out following the Gene Ontology scheme. In total, 9,402 unigenes were mapped to one or more ontology. Remarkably, the distributions of melon and Arabidopsis unigenes followed similar tendencies, suggesting that the melon dataset is representative of the whole melon transcriptome. Bioinformatic analyses primarily focused on potential precursors of melon micro RNAs (miRNAs) in the melon dataset, but many other genes potentially controlling disease resistance and fruit quality traits were also identified. Patterns of transcript accumulation were characterised by Real-Time-qPCR for 20 of these genes. Conclusion The collection of ESTs characterised here represents a substantial increase on the genetic information available for melon. A database (MELOGEN) which contains all EST sequences, contig images and several tools for analysis and data mining has been created. This set of sequences constitutes also the basis for an oligo-based microarray for melon that is being used in experiments to further analyse the melon transcriptome. PMID:17767721
Analysis of SSR information in EST resources of sugarcane

USDA-ARS?s Scientific Manuscript database

Expressed sequence tags ( ESTs) offer the opportunity to exploit single, low -copy, conserved sequence motifs for the development of simple sequence repeats ( SSRs). The total of 262 113 ESTs of sugarcane (Saccharum officinarum) in the database of NCBI were downloaded and analyzed, which resulted in...
The Complete Chloroplast Genome of Banana (Musa acuminata, Zingiberales): Insight into Plastid Monocotyledon Evolution

PubMed Central

Martin, Guillaume; Baurens, Franc-Christophe; Cardi, Céline; Aury, Jean-Marc; D’Hont, Angélique

2013-01-01

Background Banana (genus Musa) is a crop of major economic importance worldwide. It is a monocotyledonous member of the Zingiberales, a sister group of the widely studied Poales. Most cultivated bananas are natural Musa inter-(sub-)specific triploid hybrids. A Musa acuminata reference nuclear genome sequence was recently produced based on sequencing of genomic DNA enriched in nucleus. Methodology/Principal Findings The Musa acuminata chloroplast genome was assembled with chloroplast reads extracted from whole-genome-shotgun sequence data. The Musa chloroplast genome is a circular molecule of 169,972 bp with a quadripartite structure containing two single copy regions, a Large Single Copy region (LSC, 88,338 bp) and a Small Single Copy region (SSC, 10,768 bp) separated by Inverted Repeat regions (IRs, 35,433 bp). Two forms of the chloroplast genome relative to the orientation of SSC versus LSC were found. The Musa chloroplast genome shows an extreme IR expansion at the IR/SSC boundary relative to the most common structures found in angiosperms. This expansion consists of the integration of three additional complete genes (rps15, ndhH and ycf1) and part of the ndhA gene. No such expansion has been observed in monocots so far. Simple Sequence Repeats were identified in the Musa chloroplast genome and a new set of Musa chloroplastic markers was designed. Conclusion The complete sequence of M. acuminata ssp malaccensis chloroplast we reported here is the first one for the Zingiberales order. As such it provides new insight in the evolution of the chloroplast of monocotyledons. In particular, it reinforces that IR/SSC expansion has occurred independently several times within monocotyledons. The discovery of new polymorphic markers within Musa chloroplast opens new perspectives to better understand the origin of cultivated triploid bananas. PMID:23840670
Shifted termination assay (STA) fragment analysis to detect BRAF V600 mutations in papillary thyroid carcinomas

PubMed Central

2013-01-01

Background BRAF mutation is an important diagnostic and prognostic marker in patients with papillary thyroid carcinoma (PTC). To be applicable in clinical laboratories with limited equipment, diverse testing methods are required to detect BRAF mutation. Methods A shifted termination assay (STA) fragment analysis was used to detect common V600 BRAF mutations in 159 PTCs with DNAs extracted from formalin-fixed paraffin-embedded tumor tissue. The results of STA fragment analysis were compared to those of direct sequencing. Serial dilutions of BRAF mutant cell line (SNU-790) were used to calculate limit of detection (LOD). Results BRAF mutations were detected in 119 (74.8%) PTCs by STA fragment analysis. In direct sequencing, BRAF mutations were observed in 118 (74.2%) cases. The results of STA fragment analysis had high correlation with those of direct sequencing (p < 0.00001, κ = 0.98). The LOD of STA fragment analysis and direct sequencing was 6% and 12.5%, respectively. In PTCs with pT3/T4 stages, BRAF mutation was observed in 83.8% of cases. In pT1/T2 carcinomas, BRAF mutation was detected in 65.9% and this difference was statistically significant (p = 0.007). Moreover, BRAF mutation was more frequent in PTCs with extrathyroidal invasion than tumors without extrathyroidal invasion (84.7% versus 62.2%, p = 0.001). To prepare and run the reactions, direct sequencing required 450 minutes while STA fragment analysis needed 290 minutes. Conclusions STA fragment analysis is a simple and sensitive method to detect BRAF V600 mutations in formalin-fixed paraffin-embedded clinical samples. Virtual Slides The virtual slide(s) for this article can be found here: http://www.diagnosticpathology.diagnomx.eu/vs/5684057089135749 PMID:23883275
MetaLIMS, a simple open-source laboratory information management system for small metagenomic labs

PubMed Central

Gaultier, Nicolas Paul Eugène; Miller, Dana; Purbojati, Rikky Wenang; Lauro, Federico M.

2017-01-01

Abstract Background: As the cost of sequencing continues to fall, smaller groups increasingly initiate and manage larger sequencing projects and take on the complexity of data storage for high volumes of samples. This has created a need for low-cost laboratory information management systems (LIMS) that contain flexible fields to accommodate the unique nature of individual labs. Many labs do not have a dedicated information technology position, so LIMS must also be easy to setup and maintain with minimal technical proficiency. Findings: MetaLIMS is a free and open-source web-based application available via GitHub. The focus of MetaLIMS is to store sample metadata prior to sequencing and analysis pipelines. Initially designed for environmental metagenomics labs, in addition to storing generic sample collection information and DNA/RNA processing information, the user can also add fields specific to the user's lab. MetaLIMS can also produce a basic sequencing submission form compatible with the proprietary Clarity LIMS system used by some sequencing facilities. To help ease the technical burden associated with web deployment, MetaLIMS options the use of commercial web hosting combined with MetaLIMS bash scripts for ease of setup. Conclusions: MetaLIMS overcomes key challenges common in LIMS by giving labs access to a low-cost and open-source tool that also has the flexibility to meet individual lab needs and an option for easy deployment. By making the web application open source and hosting it on GitHub, we hope to encourage the community to build upon MetaLIMS, making it more robust and tailored to the needs of more researchers. PMID:28430964
Deep mRNA Sequencing of the Tritonia diomedea Brain Transcriptome Provides Access to Gene Homologues for Neuronal Excitability, Synaptic Transmission and Peptidergic Signalling

PubMed Central

Senatore, Adriano; Edirisinghe, Neranjan; Katz, Paul S.

2015-01-01

Background The sea slug Tritonia diomedea (Mollusca, Gastropoda, Nudibranchia), has a simple and highly accessible nervous system, making it useful for studying neuronal and synaptic mechanisms underlying behavior. Although many important contributions have been made using Tritonia, until now, a lack of genetic information has impeded exploration at the molecular level. Results We performed Illumina sequencing of central nervous system mRNAs from Tritonia, generating 133.1 million 100 base pair, paired-end reads. De novo reconstruction of the RNA-Seq data yielded a total of 185,546 contigs, which partitioned into 123,154 non-redundant gene clusters (unigenes). BLAST comparison with RefSeq and Swiss-Prot protein databases, as well as mRNA data from other invertebrates (gastropod molluscs: Aplysia californica, Lymnaea stagnalis and Biomphalaria glabrata; cnidarian: Nematostella vectensis) revealed that up to 76,292 unigenes in the Tritonia transcriptome have putative homologues in other databases, 18,246 of which are below a more stringent E-value cut-off of 1x10-6. In silico prediction of secreted proteins from the Tritonia transcriptome shotgun assembly (TSA) produced a database of 579 unique sequences of secreted proteins, which also exhibited markedly higher expression levels compared to other genes in the TSA. Conclusions Our efforts greatly expand the availability of gene sequences available for Tritonia diomedea. We were able to extract full length protein sequences for most queried genes, including those involved in electrical excitability, synaptic vesicle release and neurotransmission, thus confirming that the transcriptome will serve as a useful tool for probing the molecular correlates of behavior in this species. We also generated a neurosecretome database that will serve as a useful tool for probing peptidergic signalling systems in the Tritonia brain. PMID:25719197
Transcriptomic Analysis of the Endangered Neritid Species Clithon retropictus: De Novo Assembly, Functional Annotation, and Marker Discovery

PubMed Central

Park, So Young; Patnaik, Bharat Bhusan; Kang, Se Won; Hwang, Hee-Ju; Chung, Jong Min; Song, Dae Kwon; Sang, Min Kyu; Patnaik, Hongray Howrelia; Lee, Jae Bong; Noh, Mi Young; Kim, Changmu; Kim, Soonok; Park, Hong Seog; Lee, Jun Sang; Han, Yeon Soo; Lee, Yong Seok

2016-01-01

An aquatic gastropod belonging to the family Neritidae, Clithon retropictus is listed as an endangered class II species in South Korea. The lack of information on its genomic background limits the ability to obtain functional data resources and inhibits informed conservation planning for this species. In the present study, the transcriptomic sequencing and de novo assembly of C. retropictus generated a total of 241,696,750 high-quality reads. These assembled to 282,838 unigenes with mean and N50 lengths of 736.9 and 1201 base pairs, respectively. Of these, 125,616 unigenes were subjected to annotation analysis with known proteins in Protostome DB, COG, GO, and KEGG protein databases (BLASTX; E ≤ 0.00001) and with known nucleotides in the Unigene database (BLASTN; E ≤ 0.00001). The GO analysis indicated that cellular process, cell, and catalytic activity are the predominant GO terms in the biological process, cellular component, and molecular function categories, respectively. In addition, 2093 unigenes were distributed in 107 different KEGG pathways. Furthermore, 49,280 simple sequence repeats were identified in the unigenes (>1 kilobase sequences). This is the first report on the identification of transcriptomic and microsatellite resources for C. retropictus, which opens up the possibility of exploring traits related to the adaptation and acclimatization of this species. PMID:27455329
High-throughput sequencing of black pepper root transcriptome

PubMed Central

2012-01-01

Background Black pepper (Piper nigrum L.) is one of the most popular spices in the world. It is used in cooking and the preservation of food and even has medicinal properties. Losses in production from disease are a major limitation in the culture of this crop. The major diseases are root rot and foot rot, which are results of root infection by Fusarium solani and Phytophtora capsici, respectively. Understanding the molecular interaction between the pathogens and the host’s root region is important for obtaining resistant cultivars by biotechnological breeding. Genetic and molecular data for this species, though, are limited. In this paper, RNA-Seq technology has been employed, for the first time, to describe the root transcriptome of black pepper. Results The root transcriptome of black pepper was sequenced by the NGS SOLiD platform and assembled using the multiple-k method. Blast2Go and orthoMCL methods were used to annotate 10338 unigenes. The 4472 predicted proteins showed about 52% homology with the Arabidopsis proteome. Two root proteomes identified 615 proteins, which seem to define the plant’s root pattern. Simple-sequence repeats were identified that may be useful in studies of genetic diversity and may have applications in biotechnology and ecology. Conclusions This dataset of 10338 unigenes is crucially important for the biotechnological breeding of black pepper and the ecogenomics of the Magnoliids, a major group of basal angiosperms. PMID:22984782
Method for automatic localization of MR-visible markers using morphological image processing and conventional pulse sequences: feasibility for image-guided procedures.

PubMed

Busse, Harald; Trampel, Robert; Gründer, Wilfried; Moche, Michael; Kahn, Thomas

2007-10-01

To evaluate the feasibility and accuracy of an automated method to determine the 3D position of MR-visible markers. Inductively coupled RF coils were imaged in a whole-body 1.5T scanner using the body coil and two conventional gradient echo sequences (FLASH and TrueFISP) and large imaging volumes up to (300 mm(3)). To minimize background signals, a flip angle of approximately 1 degrees was used. Morphological 2D image processing in orthogonal scan planes was used to determine the 3D positions of a configuration of three fiducial markers (FMC). The accuracies of the marker positions and of the orientation of the plane defined by the FMC were evaluated at various distances r(M) from the isocenter. Fiducial marker detection with conventional equipment (pulse sequences, imaging coils) was very reliable and highly reproducible over a wide range of experimental conditions. For r(M)
Design and cloning strategies for constructing shRNA expression vectors

PubMed Central

McIntyre, Glen J; Fanning, Gregory C

2006-01-01

Background Short hairpin RNA (shRNA) encoded within an expression vector has proven an effective means of harnessing the RNA interference (RNAi) pathway in mammalian cells. A survey of the literature revealed that shRNA vector construction can be hindered by high mutation rates and the ensuing sequencing is often problematic. Current options for constructing shRNA vectors include the use of annealed complementary oligonucleotides (74 % of surveyed studies), a PCR approach using hairpin containing primers (22 %) and primer extension of hairpin templates (4 %). Results We considered primer extension the most attractive method in terms of cost. However, in initial experiments we encountered a mutation frequency of 50 % compared to a reported 20 – 40 % for other strategies. By modifying the technique to be an isothermal reaction using the DNA polymerase Phi29, we reduced the error rate to 10 %, making primer extension the most efficient and cost-effective approach tested. We also found that inclusion of a restriction site in the loop could be exploited for confirming construct integrity by automated sequencing, while maintaining intended gene suppression. Conclusion In this study we detail simple improvements for constructing and sequencing shRNA that overcome current limitations. We also compare the advantages of our solutions against proposed alternatives. Our technical modifications will be of tangible benefit to researchers looking for a more efficient and reliable shRNA construction process. PMID:16396676
Single-cell whole exome and targeted sequencing in NPM1/FLT3 positive pediatric acute myeloid leukemia.

PubMed

Walter, Christiane; Pozzorini, Christian; Reinhardt, Katarina; Geffers, Robert; Xu, Zhenyu; Reinhardt, Dirk; von Neuhoff, Nils; Hanenberg, Helmut

2018-02-01

The small portion of leukemic stem cells (LSCs) in acute myeloid leukemia (AML) present in children and adolescents is often masked by the high background of AML blasts and normal hematopoietic cells. The aim of the current study was to establish a simple workflow for reliable genetic analysis of single LSC-enriched blasts from pediatric patients. For three AMLs with mutations in nucleophosmin 1 and/or fms-like tyrosine kinase 3, we performed whole genome amplification on sorted single-cell DNA followed by whole exome sequencing (WES). The corresponding bulk bone marrow DNAs were also analyzed by WES and by targeted sequencing (TS) that included 54 genes associated with myeloid malignancies. Analysis revealed that read coverage statistics were comparable between single-cell and bulk WES data, indicating high-quality whole genome amplification. From 102 single-cell variants, 72 single nucleotide variants and insertions or deletions (70%) were consistently found in the two bulk DNA analyses. Variants reliably detected in single cells were also present in TS. However, initial screening by WES with read counts between 50-72× failed to detect rare AML subclones in the bulk DNAs. In summary, our study demonstrated that single-cell WES combined with bulk DNA TS is a promising tool set for detecting AML subclones and possibly LSCs. © 2017 Wiley Periodicals, Inc.

Query-seeded iterative sequence similarity searching improves selectivity 5–20-fold

PubMed Central

Li, Weizhong; Lopez, Rodrigo

2017-01-01

Abstract Iterative similarity search programs, like psiblast, jackhmmer, and psisearch, are much more sensitive than pairwise similarity search methods like blast and ssearch because they build a position specific scoring model (a PSSM or HMM) that captures the pattern of sequence conservation characteristic to a protein family. But models are subject to contamination; once an unrelated sequence has been added to the model, homologs of the unrelated sequence will also produce high scores, and the model can diverge from the original protein family. Examination of alignment errors during psiblast PSSM contamination suggested a simple strategy for dramatically reducing PSSM contamination. psiblast PSSMs are built from the query-based multiple sequence alignment (MSA) implied by the pairwise alignments between the query model (PSSM, HMM) and the subject sequences in the library. When the original query sequence residues are inserted into gapped positions in the aligned subject sequence, the resulting PSSM rarely produces alignment over-extensions or alignments to unrelated sequences. This simple step, which tends to anchor the PSSM to the original query sequence and slightly increase target percent identity, can reduce the frequency of false-positive alignments more than 20-fold compared with psiblast and jackhmmer, with little loss in search sensitivity. PMID:27923999
How Does Sequence Structure Affect the Judgment of Time? Exploring a Weighted Sum of Segments Model

ERIC Educational Resources Information Center

Matthews, William J.

2013-01-01

This paper examines the judgment of segmented temporal intervals, using short tone sequences as a convenient test case. In four experiments, we investigate how the relative lengths, arrangement, and pitches of the tones in a sequence affect judgments of sequence duration, and ask whether the data can be described by a simple weighted sum of…
Discrete sequence prediction and its applications

NASA Technical Reports Server (NTRS)

Laird, Philip

1992-01-01

Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply sequence prediction using a simple and practical sequence-prediction algorithm, called TDAG. The TDAG algorithm is first tested by comparing its performance with some common data compression algorithms. Then it is adapted to the detailed requirements of dynamic program optimization, with excellent results.
The Contribution of Short Repeats of Low Sequence Complexity to Large Conifer Genomes

Treesearch

A. Schmidt; R.L. Doudrick; J.S. Heslop-Harrison; T. Schmidt

2000-01-01

Abstract: The abundance and genomic organization of six simple sequence repeats, consisting of di-, tri-, and tetranucleotide sequence motifs, and a minisatellite repeat have been analyzed in different gymnosperms by Southern hybridization. Within the gymnosperm genomes investigated, the abundance and genomic organization of micro- and...
Infrared dim moving target tracking via sparsity-based discriminative classifier and convolutional network

NASA Astrophysics Data System (ADS)

Qian, Kun; Zhou, Huixin; Wang, Bingjian; Song, Shangzhen; Zhao, Dong

2017-11-01

Infrared dim and small target tracking is a great challenging task. The main challenge for target tracking is to account for appearance change of an object, which submerges in the cluttered background. An efficient appearance model that exploits both the global template and local representation over infrared image sequences is constructed for dim moving target tracking. A Sparsity-based Discriminative Classifier (SDC) and a Convolutional Network-based Generative Model (CNGM) are combined with a prior model. In the SDC model, a sparse representation-based algorithm is adopted to calculate the confidence value that assigns more weights to target templates than negative background templates. In the CNGM model, simple cell feature maps are obtained by calculating the convolution between target templates and fixed filters, which are extracted from the target region at the first frame. These maps measure similarities between each filter and local intensity patterns across the target template, therefore encoding its local structural information. Then, all the maps form a representation, preserving the inner geometric layout of a candidate template. Furthermore, the fixed target template set is processed via an efficient prior model. The same operation is applied to candidate templates in the CNGM model. The online update scheme not only accounts for appearance variations but also alleviates the migration problem. At last, collaborative confidence values of particles are utilized to generate particles' importance weights. Experiments on various infrared sequences have validated the tracking capability of the presented algorithm. Experimental results show that this algorithm runs in real-time and provides a higher accuracy than state of the art algorithms.
Label-Free Sensitive Detection of DNA Methyltransferase by Target-Induced Hyperbranched Amplification with Zero Background Signal.

PubMed

Zhang, Yan; Wang, Xin-Yan; Zhang, Qianyi; Zhang, Chun-Yang

2017-11-21

DNA methyltransferases (MTases) may specifically recognize the short palindromic sequences and transfer a methyl group from S-adenosyl-l-methionine to target cytosine/adenine. The aberrant DNA methylation is linked to the abnormal DNA MTase activity, and some DNA MTases have become promising targets of anticancer/antimicrobial drugs. However, the reported DNA MTase assays often involve laborious operation, expensive instruments, and radio-labeled substrates. Here, we develop a simple and label-free fluorescent method to sensitively detect DNA adenine methyltransferase (Dam) on the basis of terminal deoxynucleotidyl transferase (TdT)-activated Endonuclease IV (Endo IV)-assisted hyperbranched amplification. We design a hairpin probe with a palindromic sequence in the stem as the substrate and a NH 2 -modified 3' end for the prevention of nonspecific amplification. The substrate may be methylated by Dam and subsequently cleaved by DpnI, producing three single-stranded DNAs, two of which with 3'-OH termini may be amplified by hyperbranched amplification to generate a distinct fluorescence signal. Because high exactitude of TdT enables the amplification only in the presence of free 3'-OH termini and Endo IV only hydrolyzes the intact apurinic/apyrimidinic sites in double-stranded DNAs, zero background signal can be achieved. This method exhibits excellent selectivity and high sensitivity with a limit of detection of 0.003 U/mL for pure Dam and 9.61 × 10 -6 mg/mL for Dam in E. coli cells. Moreover, it can be used to screen the Dam inhibitors, holding great potentials in disease diagnosis and drug development.
Development of a novel pink-eyed dilution mouse model showing progressive darkening of the eyes and coat hair with aging

PubMed Central

ISHIKAWA, Akira; SUGIYAMA, Makoto; HONDO, Eiichi; KINOSHITA, Keiji; YAMAGISHI, Yuki

2015-01-01

Oca2p-cas (oculocutaneous albinism II; pink-eyed dilution castaneus) is a coat color mutant gene on mouse chromosome 7 that arose spontaneously in wild Mus musculus castaneus mice. Mice homozygous for Oca2p-cas usually exhibit pink eyes and gray coat hair on the non-agouti genetic background, and this ordinary phenotype remains unchanged throughout life. During breeding of a mixed strain carrying this gene on the C57BL/6J background, we discovered a novel spontaneous mutation that causes darkening of the eyes and coat hair with aging. In this study, we developed a novel mouse model showing this unique phenotype. Gross observations revealed that the pink eyes and gray coat hair of the novel mutant young mice became progressively darker in color by approximately 3 months after birth. Light and transmission-electron microscopic observations revealed a marked increase in melanin pigmentation of coat hair shafts and choroid of the eye in the novel mice compared to that in the ordinary mice. Sequence analysis of Oca2p-cas revealed a 4.1-kb deletion involving exons 15 and 16 of its wild-type gene. However, there was no sequence difference between the two types of mutant mice. Mating experiments suggested that the novel mutant phenotype was not inherited in a simple fashion, due to incomplete penetrance. The novel spontaneous mutant mouse is the first example of progressive hair darkening animals and is an essential animal model for understanding of the regulation mechanisms of melanin biosynthesis with aging. PMID:25739360
Development of a novel pink-eyed dilution mouse model showing progressive darkening of the eyes and coat hair with aging.

PubMed

Ishikawa, Akira; Sugiyama, Makoto; Hondo, Eiichi; Kinoshita, Keiji; Yamagishi, Yuki

2015-01-01

Oca2(p-cas) (oculocutaneous albinism II; pink-eyed dilution castaneus) is a coat color mutant gene on mouse chromosome 7 that arose spontaneously in wild Mus musculus castaneus mice. Mice homozygous for Oca2(p-cas) usually exhibit pink eyes and gray coat hair on the non-agouti genetic background, and this ordinary phenotype remains unchanged throughout life. During breeding of a mixed strain carrying this gene on the C57BL/6J background, we discovered a novel spontaneous mutation that causes darkening of the eyes and coat hair with aging. In this study, we developed a novel mouse model showing this unique phenotype. Gross observations revealed that the pink eyes and gray coat hair of the novel mutant young mice became progressively darker in color by approximately 3 months after birth. Light and transmission-electron microscopic observations revealed a marked increase in melanin pigmentation of coat hair shafts and choroid of the eye in the novel mice compared to that in the ordinary mice. Sequence analysis of Oca2(p-cas) revealed a 4.1-kb deletion involving exons 15 and 16 of its wild-type gene. However, there was no sequence difference between the two types of mutant mice. Mating experiments suggested that the novel mutant phenotype was not inherited in a simple fashion, due to incomplete penetrance. The novel spontaneous mutant mouse is the first example of progressive hair darkening animals and is an essential animal model for understanding of the regulation mechanisms of melanin biosynthesis with aging.
"Hook"-calibration of GeneChip-microarrays: theory and algorithm.

PubMed

Binder, Hans; Preibisch, Stephan

2008-08-29

: The improvement of microarray calibration methods is an essential prerequisite for quantitative expression analysis. This issue requires the formulation of an appropriate model describing the basic relationship between the probe intensity and the specific transcript concentration in a complex environment of competing interactions, the estimation of the magnitude these effects and their correction using the intensity information of a given chip and, finally the development of practicable algorithms which judge the quality of a particular hybridization and estimate the expression degree from the intensity values. : We present the so-called hook-calibration method which co-processes the log-difference (delta) and -sum (sigma) of the perfect match (PM) and mismatch (MM) probe-intensities. The MM probes are utilized as an internal reference which is subjected to the same hybridization law as the PM, however with modified characteristics. After sequence-specific affinity correction the method fits the Langmuir-adsorption model to the smoothed delta-versus-sigma plot. The geometrical dimensions of this so-called hook-curve characterize the particular hybridization in terms of simple geometric parameters which provide information about the mean non-specific background intensity, the saturation value, the mean PM/MM-sensitivity gain and the fraction of absent probes. This graphical summary spans a metrics system for expression estimates in natural units such as the mean binding constants and the occupancy of the probe spots. The method is single-chip based, i.e. it separately uses the intensities for each selected chip. : The hook-method corrects the raw intensities for the non-specific background hybridization in a sequence-specific manner, for the potential saturation of the probe-spots with bound transcripts and for the sequence-specific binding of specific transcripts. The obtained chip characteristics in combination with the sensitivity corrected probe-intensity values provide expression estimates scaled in natural units which are given by the binding constants of the particular hybridization.
Colloidal polymers with controlled sequence and branching constructed from magnetic field assembled nanoparticles.

PubMed

Bannwarth, Markus B; Utech, Stefanie; Ebert, Sandro; Weitz, David A; Crespy, Daniel; Landfester, Katharina

2015-03-24

The assembly of nanoparticles into polymer-like architectures is challenging and usually requires highly defined colloidal building blocks. Here, we show that the broad size-distribution of a simple dispersion of magnetic nanocolloids can be exploited to obtain various polymer-like architectures. The particles are assembled under an external magnetic field and permanently linked by thermal sintering. The remarkable variety of polymer-analogue architectures that arises from this simple process ranges from statistical and block copolymer-like sequencing to branched chains and networks. This library of architectures can be realized by controlling the sequencing of the particles and the junction points via a size-dependent self-assembly of the single building blocks.
A robust and cost-effective approach to sequence and analyze complete genomes of small RNA viruses

USDA-ARS?s Scientific Manuscript database

Background: Next-generation sequencing (NGS) allows ultra-deep sequencing of nucleic acids. The use of sequence-independent amplification of viral nucleic acids without utilization of target-specific primers provides advantages over traditional sequencing methods and allows detection of unsuspected ...
Deciphering mRNA Sequence Determinants of Protein Production Rate

NASA Astrophysics Data System (ADS)

Szavits-Nossan, Juraj; Ciandrini, Luca; Romano, M. Carmen

2018-03-01

One of the greatest challenges in biophysical models of translation is to identify coding sequence features that affect the rate of translation and therefore the overall protein production in the cell. We propose an analytic method to solve a translation model based on the inhomogeneous totally asymmetric simple exclusion process, which allows us to unveil simple design principles of nucleotide sequences determining protein production rates. Our solution shows an excellent agreement when compared to numerical genome-wide simulations of S. cerevisiae transcript sequences and predicts that the first 10 codons, which is the ribosome footprint length on the mRNA, together with the value of the initiation rate, are the main determinants of protein production rate under physiological conditions. Finally, we interpret the obtained analytic results based on the evolutionary role of the codons' choice for regulating translation rates and ribosome densities.
PeanutDB: an integrated bioinformatics web portal for Arachis hypogaea transcriptomics

PubMed Central

2012-01-01

Background The peanut (Arachis hypogaea) is an important crop cultivated worldwide for oil production and food sources. Its complex genetic architecture (e.g., the large and tetraploid genome possibly due to unique cross of wild diploid relatives and subsequent chromosome duplication: 2n = 4x = 40, AABB, 2800 Mb) presents a major challenge for its genome sequencing and makes it a less-studied crop. Without a doubt, transcriptome sequencing is the most effective way to harness the genome structure and gene expression dynamics of this non-model species that has a limited genomic resource. Description With the development of next generation sequencing technologies such as 454 pyro-sequencing and Illumina sequencing by synthesis, the transcriptomics data of peanut is rapidly accumulated in both the public databases and private sectors. Integrating 187,636 Sanger reads (103,685,419 bases), 1,165,168 Roche 454 reads (333,862,593 bases) and 57,135,995 Illumina reads (4,073,740,115 bases), we generated the first release of our peanut transcriptome assembly that contains 32,619 contigs. We provided EC, KEGG and GO functional annotations to these contigs and detected SSRs, SNPs and other genetic polymorphisms for each contig. Based on both open-source and our in-house tools, PeanutDB presents many seamlessly integrated web interfaces that allow users to search, filter, navigate and visualize easily the whole transcript assembly, its annotations and detected polymorphisms and simple sequence repeats. For each contig, sequence alignment is presented in both bird’s-eye view and nucleotide level resolution, with colorfully highlighted regions of mismatches, indels and repeats that facilitate close examination of assembly quality, genetic polymorphisms, sequence repeats and/or sequencing errors. Conclusion As a public genomic database that integrates peanut transcriptome data from different sources, PeanutDB (http://bioinfolab.muohio.edu/txid3818v1) provides the Peanut research community with an easy-to-use web portal that will definitely facilitate genomics research and molecular breeding in this less-studied crop. PMID:22712730
Microcomputer-Assisted Mathematics: From Simple Interest to e.

ERIC Educational Resources Information Center

Kimberling, Clark

1985-01-01

The progression from simple interest to compound interest leads naturally and quickly to the number e, involving mathematical discovery learning through writing programs. Several programs are given, with suggestions for a teaching sequence. (MNS)
Handling the data management needs of high-throughput sequencing data: SpeedGene, a compression algorithm for the efficient storage of genetic data

PubMed Central

2012-01-01

Background As Next-Generation Sequencing data becomes available, existing hardware environments do not provide sufficient storage space and computational power to store and process the data due to their enormous size. This is and will be a frequent problem that is encountered everyday by researchers who are working on genetic data. There are some options available for compressing and storing such data, such as general-purpose compression software, PBAT/PLINK binary format, etc. However, these currently available methods either do not offer sufficient compression rates, or require a great amount of CPU time for decompression and loading every time the data is accessed. Results Here, we propose a novel and simple algorithm for storing such sequencing data. We show that, the compression factor of the algorithm ranges from 16 to several hundreds, which potentially allows SNP data of hundreds of Gigabytes to be stored in hundreds of Megabytes. We provide a C++ implementation of the algorithm, which supports direct loading and parallel loading of the compressed format without requiring extra time for decompression. By applying the algorithm to simulated and real datasets, we show that the algorithm gives greater compression rate than the commonly used compression methods, and the data-loading process takes less time. Also, The C++ library provides direct-data-retrieving functions, which allows the compressed information to be easily accessed by other C++ programs. Conclusions The SpeedGene algorithm enables the storage and the analysis of next generation sequencing data in current hardware environment, making system upgrades unnecessary. PMID:22591016
Functional PMS2 Hybrid Alleles Containing a Pseudogene-Specific Missense Variant Trace Back to a Single Ancient Intrachromosomal Recombination Event

PubMed Central

Ganster, Christina; Wernstedt, Annekatrin; Kehrer-Sawatzki, Hildegard; Messiaen, Ludwine; Schmidt, Konrad; Rahner, Nils; Heinimann, Karl; Fonatsch, Christa; Zschocke, Johannes; Wimmer, Katharina

2012-01-01

Sequence exchange between PMS2 and its pseudogene PMS2CL, embedded in an inverted duplication on chromosome 7p22, has been reported to be an ongoing process that leads to functional PMS2 hybrid alleles containing PMS2- and PMS2CL-specific sequence variants at the 5′-and the 3′-end, respectively. The frequency of PMS2 hybrid alleles, their biological significance, and the mechanisms underlying their formation are largely unknown. Here we show that overall hybrid alleles account for one-third of 384 PMS2 alleles analyzed in individuals of different ethnic backgrounds. Depending on the population, 14–60% of hybrid alleles carry PMS2CL-specific sequences in exons 13–15, the remainder only in exon 15. We show that exons 13–15 hybrid alleles, named H1 hybrid alleles, constitute different haplotypes but trace back to a single ancient intrachromosomal recombination event with crossover. Taking advantage of an ancestral sequence variant specific for all H1 alleles we developed a simple gDNA-based polymerase chain reaction (PCR) assay that can be used to identify H1-allele carriers with high sensitivity and specificity (100 and 99%, respectively). Because H1 hybrid alleles harbor missense variant p.N775S of so far unknown functional significance, we assessed the H1-carrier frequency in 164 colorectal cancer patients. So far, we found no indication that the variant plays a major role with regard to cancer susceptibility. PMID:20186689
Functional PMS2 hybrid alleles containing a pseudogene-specific missense variant trace back to a single ancient intrachromosomal recombination event.

PubMed

Ganster, Christina; Wernstedt, Annekatrin; Kehrer-Sawatzki, Hildegard; Messiaen, Ludwine; Schmidt, Konrad; Rahner, Nils; Heinimann, Karl; Fonatsch, Christa; Zschocke, Johannes; Wimmer, Katharina

2010-05-01

Sequence exchange between PMS2 and its pseudogene PMS2CL, embedded in an inverted duplication on chromosome 7p22, has been reported to be an ongoing process that leads to functional PMS2 hybrid alleles containing PMS2- and PMS2CL-specific sequence variants at the 5'-and the 3'-end, respectively. The frequency of PMS2 hybrid alleles, their biological significance, and the mechanisms underlying their formation are largely unknown. Here we show that overall hybrid alleles account for one-third of 384 PMS2 alleles analyzed in individuals of different ethnic backgrounds. Depending on the population, 14-60% of hybrid alleles carry PMS2CL-specific sequences in exons 13-15, the remainder only in exon 15. We show that exons 13-15 hybrid alleles, named H1 hybrid alleles, constitute different haplotypes but trace back to a single ancient intrachromosomal recombination event with crossover. Taking advantage of an ancestral sequence variant specific for all H1 alleles we developed a simple gDNA-based polymerase chain reaction (PCR) assay that can be used to identify H1-allele carriers with high sensitivity and specificity (100 and 99%, respectively). Because H1 hybrid alleles harbor missense variant p.N775S of so far unknown functional significance, we assessed the H1-carrier frequency in 164 colorectal cancer patients. So far, we found no indication that the variant plays a major role with regard to cancer susceptibility. (c) 2010 Wiley-Liss, Inc.
Typing of artiodactyl MHC-DRB genes with the help of intronic simple repeated DNA sequences.

PubMed

Schwaiger, F W; Buitkamp, J; Weyers, E; Epplen, J T

1993-02-01

An efficient oligonucleotide typing method for the highly polymorphic MHC-DRB genes is described for artiodactyls like cattle, sheep and goat. By means of the polymerase chain reaction, the second exon of MHC-DRB is amplified as well as part of the adjacent intron containing a mixed simple repeat sequence. Using this primer combination we were able to amplify the MHC-DRB exons 2 and adjacent introns from all of the investigated 10 species of the family of Bovidae and giraffes. Therefore, the DRB genes of novel artiodactyl species can also be readily studied. Oligonucleotide probes specific for the polymorphisms of ungulate DRB genes are used with which sequences differing in at least one single base can be distinguished. Exonic polymorphism was found to be correlated with the allele lengths and the patterns of the repeat structures. Hence oligonucleotide probes specific for different simple repeats and polymorphic positions serve also for typing across species barriers. The strict correlation of sequence length and exonic polymorphism permits a preselection of specific oligonucleotides for hybridization. Thus more than 20 alleles can already be differentiated from each of the three species.
Comparison and correlation of Simple Sequence Repeats distribution in genomes of Brucella species

PubMed Central

Kiran, Jangampalli Adi Pradeep; Chakravarthi, Veeraraghavulu Praveen; Kumar, Yellapu Nanda; Rekha, Somesula Swapna; Kruti, Srinivasan Shanthi; Bhaskar, Matcha

2011-01-01

Computational genomics is one of the important tools to understand the distribution of closely related genomes including simple sequence repeats (SSRs) in an organism, which gives valuable information regarding genetic variations. The central objective of the present study was to screen the SSRs distributed in coding and non-coding regions among different human Brucella species which are involved in a range of pathological disorders. Computational analysis of the SSRs in the Brucella indicates few deviations from expected random models. Statistical analysis also reveals that tri-nucleotide SSRs are overrepresented and tetranucleotide SSRs underrepresented in Brucella genomes. From the data, it can be suggested that over expressed tri-nucleotide SSRs in genomic and coding regions might be responsible in the generation of functional variation of proteins expressed which in turn may lead to different pathogenicity, virulence determinants, stress response genes, transcription regulators and host adaptation proteins of Brucella genomes. Abbreviations SSRs - Simple Sequence Repeats, ORFs - Open Reading Frames. PMID:21738309
Genetic diversity studies and identification of SSR markers associated with Fusarium wilt (Fusarium udum) resistance in cultivated pigeonpea (Cajanus cajan).

PubMed

Singh, A K; Rai, V P; Chand, R; Singh, R P; Singh, M N

2013-01-01

Genetic diversity and identification of simple sequence repeat markers correlated with Fusarium wilt resistance was performed in a set of 36 elite cultivated pigeonpea genotypes differing in levels of resistance to Fusarium wilt. Twenty-four polymorphic sequence repeat markers were screened across these genotypes, and amplified a total of 59 alleles with an average high polymorphic information content value of 0.52. Cluster analysis, done by UPGMA and PCA, grouped the 36 pigeonpea genotypes into two main clusters according to their Fusarium wilt reaction. Based on the Kruskal-Wallis ANOVA and simple regression analysis, six simple sequence repeat markers were found to be significantly associated with Fusarium wilt resistance. The phenotypic variation explained by these markers ranged from 23.7 to 56.4%. The present study helps in finding out feasibility of prescreened SSR markers to be used in genetic diversity analysis and their potential association with disease resistance.

GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts

PubMed Central

Naito, Yuki; Bono, Hidemasa

2012-01-01

GGRNA (http://GGRNA.dbcls.jp/) is a Google-like, ultrafast search engine for genes and transcripts. The web server accepts arbitrary words and phrases, such as gene names, IDs, gene descriptions, annotations of gene and even nucleotide/amino acid sequences through one simple search box, and quickly returns relevant RefSeq transcripts. A typical search takes just a few seconds, which dramatically enhances the usability of routine searching. In particular, GGRNA can search sequences as short as 10 nt or 4 amino acids, which cannot be handled easily by popular sequence analysis tools. Nucleotide sequences can be searched allowing up to three mismatches, or the query sequences may contain degenerate nucleotide codes (e.g. N, R, Y, S). Furthermore, Gene Ontology annotations, Enzyme Commission numbers and probe sequences of catalog microarrays are also incorporated into GGRNA, which may help users to conduct searches by various types of keywords. GGRNA web server will provide a simple and powerful interface for finding genes and transcripts for a wide range of users. All services at GGRNA are provided free of charge to all users. PMID:22641850
GGRNA: an ultrafast, transcript-oriented search engine for genes and transcripts.

PubMed

Naito, Yuki; Bono, Hidemasa

2012-07-01

GGRNA (http://GGRNA.dbcls.jp/) is a Google-like, ultrafast search engine for genes and transcripts. The web server accepts arbitrary words and phrases, such as gene names, IDs, gene descriptions, annotations of gene and even nucleotide/amino acid sequences through one simple search box, and quickly returns relevant RefSeq transcripts. A typical search takes just a few seconds, which dramatically enhances the usability of routine searching. In particular, GGRNA can search sequences as short as 10 nt or 4 amino acids, which cannot be handled easily by popular sequence analysis tools. Nucleotide sequences can be searched allowing up to three mismatches, or the query sequences may contain degenerate nucleotide codes (e.g. N, R, Y, S). Furthermore, Gene Ontology annotations, Enzyme Commission numbers and probe sequences of catalog microarrays are also incorporated into GGRNA, which may help users to conduct searches by various types of keywords. GGRNA web server will provide a simple and powerful interface for finding genes and transcripts for a wide range of users. All services at GGRNA are provided free of charge to all users.
Comparison of simple sequence repeats in 19 Archaea.

PubMed

Trivedi, S

2006-12-05

All organisms that have been studied until now have been found to have differential distribution of simple sequence repeats (SSRs), with more SSRs in intergenic than in coding sequences. SSR distribution was investigated in Archaea genomes where complete chromosome sequences of 19 Archaea were analyzed with the program SPUTNIK to find di- to penta-nucleotide repeats. The number of repeats was determined for the complete chromosome sequences and for the coding and non-coding sequences. Different from what has been found for other groups of organisms, there is an abundance of SSRs in coding regions of the genome of some Archaea. Dinucleotide repeats were rare and CG repeats were found in only two Archaea. In general, trinucleotide repeats are the most abundant SSR motifs; however, pentanucleotide repeats are abundant in some Archaea. Some of the tetranucleotide and pentanucleotide repeat motifs are organism specific. In general, repeats are short and CG-rich repeats are present in Archaea having a CG-rich genome. Among the 19 Archaea, SSR density was not correlated with genome size or with optimum growth temperature. Pentanucleotide density had an inverse correlation with the CG content of the genome.
Simple sequence repeat marker loci discovery using SSR primer.

PubMed

Robinson, Andrew J; Love, Christopher G; Batley, Jacqueline; Barker, Gary; Edwards, David

2004-06-12

Simple sequence repeats (SSRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker assisted selection of crop plants and a range of molecular ecology and diversity studies. With the increase in the availability of DNA sequence information, an automated process to identify and design PCR primers for amplification of SSR loci would be a useful tool in plant breeding programs. We report an application that integrates SPUTNIK, an SSR repeat finder, with Primer3, a PCR primer design program, into one pipeline tool, SSR Primer. On submission of multiple FASTA formatted sequences, the script screens each sequence for SSRs using SPUTNIK. The results are parsed to Primer3 for locus-specific primer design. The script makes use of a Web-based interface, enabling remote use. This program has been written in PERL and is freely available for non-commercial users by request from the authors. The Web-based version may be accessed at http://hornbill.cspp.latrobe.edu.au/
Development of Genomic Simple Sequence Repeats (SSR) by Enrichment Libraries in Date Palm.

PubMed

Al-Faifi, Sulieman A; Migdadi, Hussein M; Algamdi, Salem S; Khan, Mohammad Altaf; Al-Obeed, Rashid S; Ammar, Megahed H; Jakse, Jerenj

2017-01-01

Development of highly informative markers such as simple sequence repeats (SSR) for cultivar identification and germplasm characterization and management is essential for date palms genetic studies. The present study documents the development of SSR markers and assesses genetic relationships of commonly grown date palm (Phoenix dactylifera L.) cultivars in different geographical regions of Saudi Arabia. A total of 93 novel simple sequence repeat (SSR) markers were screened for their ability to detect polymorphism in date palm. Around 71% of genomic SSRs are dinucleotide, 25% trinucleotide, 3% tetranucleotide, and 1% pentanucleotide motives and show 100% polymorphism. The Unweighted Pair Group Method with Arithmetic Mean (UPGMA) cluster analysis illustrates that cultivars trend to group according to their class of maturity, region of cultivation, and fruit color. Analysis of molecular variations (AMOVA) reveals genetic variation among and within cultivars of 27% and 73%, respectively, according to the geographical distribution of the cultivars. Developed microsatellite markers are of additional value to date palm characterization, tools which can be used by researchers in population genetics, cultivar identification, as well as genetic resource exploration and management. The cultivars tested exhibited a significant amount of genetic diversity and could be suitable for successful breeding programs. Genomic sequences generated from this study are available at the National Center for Biotechnology Information (NCBI), Sequence Read Archive (Accession numbers. LIBGSS_039019).
Winnowing DNA for rare sequences: highly specific sequence and methylation based enrichment.

PubMed

Thompson, Jason D; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

2012-01-01

Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue.
A behavior analytic analogue of learning to use synonyms, syntax, and parts of speech.

PubMed

Chase, Philip N; Ellenwood, David W; Madden, Gregory

2008-01-01

Matching-to-sample and sequence training procedures were used to develop responding to stimulus classes that were considered analogous to 3 aspects of verbal behavior: identifying synonyms and parts of speech, and using syntax. Matching-to-sample procedures were used to train 12 paired associates from among 24 stimuli. These pairs were analogous to synonyms. Then, sequence characteristics were trained to 6 of the stimuli. The result was the formation of 3 classes of 4 stimuli, with the classes controlling a sequence response analogous to a simple ordering syntax: first, second, and third. Matching-to-sample procedures were then used to add 4 stimuli to each class. These stimuli, without explicit sequence training, also began to control the same sequence responding as the other members of their class. Thus, three 8-member functionally equivalent sequence classes were formed. These classes were considered to be analogous to parts of speech. Further testing revealed three 8-member equivalence classes and 512 different sequences of first, second, and third. The study indicated that behavior analytic procedures may be used to produce some generative aspects of verbal behavior related to simple syntax and semantics.
Construction and Characterization of an in-vivo Linear Covalently Closed DNA Vector Production System

PubMed Central

2012-01-01

Background While safer than their viral counterparts, conventional non-viral gene delivery DNA vectors offer a limited safety profile. They often result in the delivery of unwanted prokaryotic sequences, antibiotic resistance genes, and the bacterial origins of replication to the target, which may lead to the stimulation of unwanted immunological responses due to their chimeric DNA composition. Such vectors may also impart the potential for chromosomal integration, thus potentiating oncogenesis. We sought to engineer an in vivo system for the quick and simple production of safer DNA vector alternatives that were devoid of non-transgene bacterial sequences and would lethally disrupt the host chromosome in the event of an unwanted vector integration event. Results We constructed a parent eukaryotic expression vector possessing a specialized manufactured multi-target site called “Super Sequence”, and engineered E. coli cells (R-cell) that conditionally produce phage-derived recombinase Tel (PY54), TelN (N15), or Cre (P1). Passage of the parent plasmid vector through R-cells under optimized conditions, resulted in rapid, efficient, and one step in vivo generation of mini lcc—linear covalently closed (Tel/TelN-cell), or mini ccc—circular covalently closed (Cre-cell), DNA constructs, separated from the backbone plasmid DNA. Site-specific integration of lcc plasmids into the host chromosome resulted in chromosomal disruption and 105 fold lower viability than that seen with the ccc counterpart. Conclusion We offer a high efficiency mini DNA vector production system that confers simple, rapid and scalable in vivo production of mini lcc DNA vectors that possess all the benefits of “minicircle” DNA vectors and virtually eliminate the potential for undesirable vector integration events. PMID:23216697
Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions

PubMed Central

Roy, Sushmita; Martinez, Diego; Platero, Harriett; Lane, Terran; Werner-Washburne, Margaret

2009-01-01

Background Computational prediction of protein interactions typically use protein domains as classifier features because they capture conserved information of interaction surfaces. However, approaches relying on domains as features cannot be applied to proteins without any domain information. In this paper, we explore the contribution of pure amino acid composition (AAC) for protein interaction prediction. This simple feature, which is based on normalized counts of single or pairs of amino acids, is applicable to proteins from any sequenced organism and can be used to compensate for the lack of domain information. Results AAC performed at par with protein interaction prediction based on domains on three yeast protein interaction datasets. Similar behavior was obtained using different classifiers, indicating that our results are a function of features and not of classifiers. In addition to yeast datasets, AAC performed comparably on worm and fly datasets. Prediction of interactions for the entire yeast proteome identified a large number of novel interactions, the majority of which co-localized or participated in the same processes. Our high confidence interaction network included both well-studied and uncharacterized proteins. Proteins with known function were involved in actin assembly and cell budding. Uncharacterized proteins interacted with proteins involved in reproduction and cell budding, thus providing putative biological roles for the uncharacterized proteins. Conclusion AAC is a simple, yet powerful feature for predicting protein interactions, and can be used alone or in conjunction with protein domains to predict new and validate existing interactions. More importantly, AAC alone performs at par with existing, but more complex, features indicating the presence of sequence-level information that is predictive of interaction, but which is not necessarily restricted to domains. PMID:19936254
Unexpected effects of different genetic backgrounds on identification of genomic rearrangements via whole-genome next generation sequencing.

PubMed

Chen, Zhangguo; Gowan, Katherine; Leach, Sonia M; Viboolsittiseri, Sawanee S; Mishra, Ameet K; Kadoishi, Tanya; Diener, Katrina; Gao, Bifeng; Jones, Kenneth; Wang, Jing H

2016-10-21

Whole genome next generation sequencing (NGS) is increasingly employed to detect genomic rearrangements in cancer genomes, especially in lymphoid malignancies. We recently established a unique mouse model by specifically deleting a key non-homologous end-joining DNA repair gene, Xrcc4, and a cell cycle checkpoint gene, Trp53, in germinal center B cells. This mouse model spontaneously develops mature B cell lymphomas (termed G1XP lymphomas). Here, we attempt to employ whole genome NGS to identify novel structural rearrangements, in particular inter-chromosomal translocations (CTXs), in these G1XP lymphomas. We sequenced six lymphoma samples, aligned our NGS data with mouse reference genome (in C57BL/6J (B6) background) and identified CTXs using CREST algorithm. Surprisingly, we detected widespread CTXs in both lymphomas and wildtype control samples, majority of which were false positive and attributable to different genetic backgrounds. In addition, we validated our NGS pipeline by sequencing multiple control samples from distinct tissues of different genetic backgrounds of mouse (B6 vs non-B6). Lastly, our studies showed that widespread false positive CTXs can be generated by simply aligning sequences from different genetic backgrounds of mouse. We conclude that mapping and alignment with reference genome might not be a preferred method for analyzing whole-genome NGS data obtained from a genetic background different from reference genome. Given the complex genetic background of different mouse strains or the heterogeneity of cancer genomes in human patients, in order to minimize such systematic artifacts and uncover novel CTXs, a preferred method might be de novo assembly of personalized normal control genome and cancer cell genome, instead of mapping and aligning NGS data to mouse or human reference genome. Thus, our studies have critical impact on the manner of data analysis for cancer genomics.
A ddRAD-based genetic map and its integration with the genome assembly of Japanese eel (Anguilla japonica) provides insights into genome evolution after the teleost-specific genome duplication

PubMed Central

2014-01-01

Background Recent advancements in next-generation sequencing technology have enabled cost-effective sequencing of whole or partial genomes, permitting the discovery and characterization of molecular polymorphisms. Double-digest restriction-site associated DNA sequencing (ddRAD-seq) is a powerful and inexpensive approach to developing numerous single nucleotide polymorphism (SNP) markers and constructing a high-density genetic map. To enrich genomic resources for Japanese eel (Anguilla japonica), we constructed a ddRAD-based genetic map using an Ion Torrent Personal Genome Machine and anchored scaffolds of the current genome assembly to 19 linkage groups of the Japanese eel. Furthermore, we compared the Japanese eel genome with genomes of model fishes to infer the history of genome evolution after the teleost-specific genome duplication. Results We generated the ddRAD-based linkage map of the Japanese eel, where the maps for female and male spanned 1748.8 cM and 1294.5 cM, respectively, and were arranged into 19 linkage groups. A total of 2,672 SNP markers and 115 Simple Sequence Repeat markers provide anchor points to 1,252 scaffolds covering 151 Mb (13%) of the current genome assembly of the Japanese eel. Comparisons among the Japanese eel, medaka, zebrafish and spotted gar genomes showed highly conserved synteny among teleosts and revealed part of the eight major chromosomal rearrangement events that occurred soon after the teleost-specific genome duplication. Conclusions The ddRAD-seq approach combined with the Ion Torrent Personal Genome Machine sequencing allowed us to conduct efficient and flexible SNP genotyping. The integration of the genetic map and the assembled sequence provides a valuable resource for fine mapping and positional cloning of quantitative trait loci associated with economically important traits and for investigating comparative genomics of the Japanese eel. PMID:24669946
Using simple artificial intelligence methods for predicting amyloidogenesis in antibodies

PubMed Central

2010-01-01

Background All polypeptide backbones have the potential to form amyloid fibrils, which are associated with a number of degenerative disorders. However, the likelihood that amyloidosis would actually occur under physiological conditions depends largely on the amino acid composition of a protein. We explore using a naive Bayesian classifier and a weighted decision tree for predicting the amyloidogenicity of immunoglobulin sequences. Results The average accuracy based on leave-one-out (LOO) cross validation of a Bayesian classifier generated from 143 amyloidogenic sequences is 60.84%. This is consistent with the average accuracy of 61.15% for a holdout test set comprised of 103 AM and 28 non-amyloidogenic sequences. The LOO cross validation accuracy increases to 81.08% when the training set is augmented by the holdout test set. In comparison, the average classification accuracy for the holdout test set obtained using a decision tree is 78.64%. Non-amyloidogenic sequences are predicted with average LOO cross validation accuracies between 74.05% and 77.24% using the Bayesian classifier, depending on the training set size. The accuracy for the holdout test set was 89%. For the decision tree, the non-amyloidogenic prediction accuracy is 75.00%. Conclusions This exploratory study indicates that both classification methods may be promising in providing straightforward predictions on the amyloidogenicity of a sequence. Nevertheless, the number of available sequences that satisfy the premises of this study are limited, and are consequently smaller than the ideal training set size. Increasing the size of the training set clearly increases the accuracy, and the expansion of the training set to include not only more derivatives, but more alignments, would make the method more sound. The accuracy of the classifiers may also be improved when additional factors, such as structural and physico-chemical data, are considered. The development of this type of classifier has significant applications in evaluating engineered antibodies, and may be adapted for evaluating engineered proteins in general. PMID:20144194
Protein interface classification by evolutionary analysis

PubMed Central

2012-01-01

Background Distinguishing biologically relevant interfaces from lattice contacts in protein crystals is a fundamental problem in structural biology. Despite efforts towards the computational prediction of interface character, many issues are still unresolved. Results We present here a protein-protein interface classifier that relies on evolutionary data to detect the biological character of interfaces. The classifier uses a simple geometric measure, number of core residues, and two evolutionary indicators based on the sequence entropy of homolog sequences. Both aim at detecting differential selection pressure between interface core and rim or rest of surface. The core residues, defined as fully buried residues (>95% burial), appear to be fundamental determinants of biological interfaces: their number is in itself a powerful discriminator of interface character and together with the evolutionary measures it is able to clearly distinguish evolved biological contacts from crystal ones. We demonstrate that this definition of core residues leads to distinctively better results than earlier definitions from the literature. The stringent selection and quality filtering of structural and sequence data was key to the success of the method. Most importantly we demonstrate that a more conservative selection of homolog sequences - with relatively high sequence identities to the query - is able to produce a clearer signal than previous attempts. Conclusions An evolutionary approach like the one presented here is key to the advancement of the field, which so far was missing an effective method exploiting the evolutionary character of protein interfaces. Its coverage and performance will only improve over time thanks to the incessant growth of sequence databases. Currently our method reaches an accuracy of 89% in classifying interfaces of the Ponstingl 2003 datasets and it lends itself to a variety of useful applications in structural biology and bioinformatics. We made the corresponding software implementation available to the community as an easy-to-use graphical web interface at http://www.eppic-web.org. PMID:23259833
Discovery and mapping of single feature polymorphisms in wheat using Affymetrix arrays

PubMed Central

Bernardo, Amy N; Bradbury, Peter J; Ma, Hongxiang; Hu, Shengwa; Bowden, Robert L; Buckler, Edward S; Bai, Guihua

2009-01-01

Background Wheat (Triticum aestivum L.) is a staple food crop worldwide. The wheat genome has not yet been sequenced due to its huge genome size (~17,000 Mb) and high levels of repetitive sequences; the whole genome sequence may not be expected in the near future. Available linkage maps have low marker density due to limitation in available markers; therefore new technologies that detect genome-wide polymorphisms are still needed to discover a large number of new markers for construction of high-resolution maps. A high-resolution map is a critical tool for gene isolation, molecular breeding and genomic research. Single feature polymorphism (SFP) is a new microarray-based type of marker that is detected by hybridization of DNA or cRNA to oligonucleotide probes. This study was conducted to explore the feasibility of using the Affymetrix GeneChip to discover and map SFPs in the large hexaploid wheat genome. Results Six wheat varieties of diverse origins (Ning 7840, Clark, Jagger, Encruzilhada, Chinese Spring, and Opata 85) were analyzed for significant probe by variety interactions and 396 probe sets with SFPs were identified. A subset of 164 unigenes was sequenced and 54% showed polymorphism within probes. Microarray analysis of 71 recombinant inbred lines from the cross Ning 7840/Clark identified 955 SFPs and 877 of them were mapped together with 269 simple sequence repeat markers. The SFPs were randomly distributed within a chromosome but were unevenly distributed among different genomes. The B genome had the most SFPs, and the D genome had the least. Map positions of a selected set of SFPs were validated by mapping single nucleotide polymorphism using SNaPshot and comparing with expressed sequence tags mapping data. Conclusion The Affymetrix array is a cost-effective platform for SFP discovery and SFP mapping in wheat. The new high-density map constructed in this study will be a useful tool for genetic and genomic research in wheat. PMID:19480702
SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop.

PubMed

Schumacher, André; Pireddu, Luca; Niemenmaa, Matti; Kallio, Aleksi; Korpelainen, Eija; Zanetti, Gianluigi; Heljanko, Keijo

2014-01-01

Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPigscripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig's scalability over many computing nodes and illustrate its use with example scripts. Available under the open source MIT license at http://sourceforge.net/projects/seqpig/
Chroma key without color restrictions based on asynchronous amplitude modulation of background illumination on retroreflective screens

NASA Astrophysics Data System (ADS)

Vidal, Borja; Lafuente, Juan A.

2016-03-01

A simple technique to avoid color limitations in image capture systems based on chroma key video composition using retroreflective screens and light-emitting diodes (LED) rings is proposed and demonstrated. The combination of an asynchronous temporal modulation onto the background illumination and simple image processing removes the usual restrictions on foreground colors in the scene. The technique removes technical constraints in stage composition, allowing its design to be purely based on artistic grounds. Since it only requires adding a very simple electronic circuit to widely used chroma keying hardware based on retroreflective screens, the technique is easily applicable to TV and filming studios.
High throughput SNP discovery and genotyping in grapevine (Vitis vinifera L.) by combining a re-sequencing approach and SNPlex technology

PubMed Central

Lijavetzky, Diego; Cabezas, José Antonio; Ibáñez, Ana; Rodríguez, Virginia; Martínez-Zapater, José M

2007-01-01

Background Single-nucleotide polymorphisms (SNPs) are the most abundant type of DNA sequence polymorphisms. Their higher availability and stability when compared to simple sequence repeats (SSRs) provide enhanced possibilities for genetic and breeding applications such as cultivar identification, construction of genetic maps, the assessment of genetic diversity, the detection of genotype/phenotype associations, or marker-assisted breeding. In addition, the efficiency of these activities can be improved thanks to the ease with which SNP genotyping can be automated. Expressed sequence tags (EST) sequencing projects in grapevine are allowing for the in silico detection of multiple putative sequence polymorphisms within and among a reduced number of cultivars. In parallel, the sequence of the grapevine cultivar Pinot Noir is also providing thousands of polymorphisms present in this highly heterozygous genome. Still the general application of those SNPs requires further validation since their use could be restricted to those specific genotypes. Results In order to develop a large SNP set of wide application in grapevine we followed a systematic re-sequencing approach in a group of 11 grape genotypes corresponding to ancient unrelated cultivars as well as wild plants. Using this approach, we have sequenced 230 gene fragments, what represents the analysis of over 1 Mb of grape DNA sequence. This analysis has allowed the discovery of 1573 SNPs with an average of one SNP every 64 bp (one SNP every 47 bp in non-coding regions and every 69 bp in coding regions). Nucleotide diversity in grape (π = 0.0051) was found to be similar to values observed in highly polymorphic plant species such as maize. The average number of haplotypes per gene sequence was estimated as six, with three haplotypes representing over 83% of the analyzed sequences. Short-range linkage disequilibrium (LD) studies within the analyzed sequences indicate the existence of a rapid decay of LD within the selected grapevine genotypes. To validate the use of the detected polymorphisms in genetic mapping, cultivar identification and genetic diversity studies we have used the SNPlex™ genotyping technology in a sample of grapevine genotypes and segregating progenies. Conclusion These results provide accurate values for nucleotide diversity in coding sequences and a first estimate of short-range LD in grapevine. Using SNPlex™ genotyping we have shown the application of a set of discovered SNPs as molecular markers for cultivar identification, linkage mapping and genetic diversity studies. Thus, the combination a highly efficient re-sequencing approach and the SNPlex™ high throughput genotyping technology provide a powerful tool for grapevine genetic analysis. PMID:18021442
Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL): adapting the Partial Phylogenetic Profiling algorithm to scan sequences for signatures that predict protein function

PubMed Central

2010-01-01

Background Comparative genomics methods such as phylogenetic profiling can mine powerful inferences from inherently noisy biological data sets. We introduce Sites Inferred by Metabolic Background Assertion Labeling (SIMBAL), a method that applies the Partial Phylogenetic Profiling (PPP) approach locally within a protein sequence to discover short sequence signatures associated with functional sites. The approach is based on the basic scoring mechanism employed by PPP, namely the use of binomial distribution statistics to optimize sequence similarity cutoffs during searches of partitioned training sets. Results Here we illustrate and validate the ability of the SIMBAL method to find functionally relevant short sequence signatures by application to two well-characterized protein families. In the first example, we partitioned a family of ABC permeases using a metabolic background property (urea utilization). Thus, the TRUE set for this family comprised members whose genome of origin encoded a urea utilization system. By moving a sliding window across the sequence of a permease, and searching each subsequence in turn against the full set of partitioned proteins, the method found which local sequence signatures best correlated with the urea utilization trait. Mapping of SIMBAL "hot spots" onto crystal structures of homologous permeases reveals that the significant sites are gating determinants on the cytosolic face rather than, say, docking sites for the substrate-binding protein on the extracellular face. In the second example, we partitioned a protein methyltransferase family using gene proximity as a criterion. In this case, the TRUE set comprised those methyltransferases encoded near the gene for the substrate RF-1. SIMBAL identifies sequence regions that map onto the substrate-binding interface while ignoring regions involved in the methyltransferase reaction mechanism in general. Neither method for training set construction requires any prior experimental characterization. Conclusions SIMBAL shows that, in functionally divergent protein families, selected short sequences often significantly outperform their full-length parent sequence for making functional predictions by sequence similarity, suggesting avenues for improved functional classifiers. When combined with structural data, SIMBAL affords the ability to localize and model functional sites. PMID:20102603
HMM-ModE: implementation, benchmarking and validation with HMMER3

PubMed Central

2014-01-01

Background HMM-ModE is a computational method that generates family specific profile HMMs using negative training sequences. The method optimizes the discrimination threshold using 10 fold cross validation and modifies the emission probabilities of profiles to reduce common fold based signals shared with other sub-families. The protocol depends on the program HMMER for HMM profile building and sequence database searching. The recent release of HMMER3 has improved database search speed by several orders of magnitude, allowing for the large scale deployment of the method in sequence annotation projects. We have rewritten our existing scripts both at the level of parsing the HMM profiles and modifying emission probabilities to upgrade HMM-ModE using HMMER3 that takes advantage of its probabilistic inference with high computational speed. The method is benchmarked and tested on GPCR dataset as an accurate and fast method for functional annotation. Results The implementation of this method, which now works with HMMER3, is benchmarked with the earlier version of HMMER, to show that the effect of local-local alignments is marked only in the case of profiles containing a large number of discontinuous match states. The method is tested on a gold standard set of families and we have reported a significant reduction in the number of false positive hits over the default HMM profiles. When implemented on GPCR sequences, the results showed an improvement in the accuracy of classification compared with other methods used to classify the familyat different levels of their classification hierarchy. Conclusions The present findings show that the new version of HMM-ModE is a highly specific method used to differentiate between fold (superfamily) and function (family) specific signals, which helps in the functional annotation of protein sequences. The use of modified profile HMMs of GPCR sequences provides a simple yet highly specific method for classification of the family, being able to predict the sub-family specific sequences with high accuracy even though sequences share common physicochemical characteristics between sub-families. PMID:25073805
Predicting PDZ domain mediated protein interactions from structure

PubMed Central

2013-01-01

Background PDZ domains are structural protein domains that recognize simple linear amino acid motifs, often at protein C-termini, and mediate protein-protein interactions (PPIs) in important biological processes, such as ion channel regulation, cell polarity and neural development. PDZ domain-peptide interaction predictors have been developed based on domain and peptide sequence information. Since domain structure is known to influence binding specificity, we hypothesized that structural information could be used to predict new interactions compared to sequence-based predictors. Results We developed a novel computational predictor of PDZ domain and C-terminal peptide interactions using a support vector machine trained with PDZ domain structure and peptide sequence information. Performance was estimated using extensive cross validation testing. We used the structure-based predictor to scan the human proteome for ligands of 218 PDZ domains and show that the predictions correspond to known PDZ domain-peptide interactions and PPIs in curated databases. The structure-based predictor is complementary to the sequence-based predictor, finding unique known and novel PPIs, and is less dependent on training–testing domain sequence similarity. We used a functional enrichment analysis of our hits to create a predicted map of PDZ domain biology. This map highlights PDZ domain involvement in diverse biological processes, some only found by the structure-based predictor. Based on this analysis, we predict novel PDZ domain involvement in xenobiotic metabolism and suggest new interactions for other processes including wound healing and Wnt signalling. Conclusions We built a structure-based predictor of PDZ domain-peptide interactions, which can be used to scan C-terminal proteomes for PDZ interactions. We also show that the structure-based predictor finds many known PDZ mediated PPIs in human that were not found by our previous sequence-based predictor and is less dependent on training–testing domain sequence similarity. Using both predictors, we defined a functional map of human PDZ domain biology and predict novel PDZ domain function. Users may access our structure-based and previous sequence-based predictors at http://webservice.baderlab.org/domains/POW. PMID:23336252

Design, implementation and evaluation of a practical pseudoknot folding algorithm based on thermodynamics

PubMed Central

Reeder, Jens; Giegerich, Robert

2004-01-01

Background The general problem of RNA secondary structure prediction under the widely used thermodynamic model is known to be NP-complete when the structures considered include arbitrary pseudoknots. For restricted classes of pseudoknots, several polynomial time algorithms have been designed, where the O(n6)time and O(n4) space algorithm by Rivas and Eddy is currently the best available program. Results We introduce the class of canonical simple recursive pseudoknots and present an algorithm that requires O(n4) time and O(n2) space to predict the energetically optimal structure of an RNA sequence, possible containing such pseudoknots. Evaluation against a large collection of known pseudoknotted structures shows the adequacy of the canonization approach and our algorithm. Conclusions RNA pseudoknots of medium size can now be predicted reliably as well as efficiently by the new algorithm. PMID:15294028
In defence of story-telling.

PubMed

Currie, Adrian; Sterelny, Kim

2017-04-01

We argue that narratives are central to the success of historical reconstruction. Narrative explanation involves tracing causal trajectories across time. The construction of narrative, then, often involves postulating relatively speculative causal connections between comparatively well-established events. But speculation is not always idle or harmful: it also aids in overcoming local underdetermination by forming scaffolds from which new evidence becomes relevant. Moreover, as our understanding of the past's causal milieus become richer, the constraints on narrative plausibility become increasingly strict: a narrative's admissibility does not turn on mere logical consistency with background data. Finally, narrative explanation and explanation generated by simple, formal models complement one another. Where models often achieve isolation and precision at the cost of simplification and abstraction, narratives can track complex changes in a trajectory over time at the cost of simplicity and precision. In combination both allow us to understand and explain highly complex historical sequences. Copyright © 2017 Elsevier Ltd. All rights reserved.
An efficient method for facial component detection in thermal images

NASA Astrophysics Data System (ADS)

Paul, Michael; Blanik, Nikolai; Blazek, Vladimir; Leonhardt, Steffen

2015-04-01

A method to detect certain regions in thermal images of human faces is presented. In this approach, the following steps are necessary to locate the periorbital and the nose regions: First, the face is segmented from the background by thresholding and morphological filtering. Subsequently, a search region within the face, around its center of mass, is evaluated. Automatically computed temperature thresholds are used per subject and image or image sequence to generate binary images, in which the periorbital regions are located by integral projections. Then, the located positions are used to approximate the nose position. It is possible to track features in the located regions. Therefore, these regions are interesting for different applications like human-machine interaction, biometrics and biomedical imaging. The method is easy to implement and does not rely on any training images or templates. Furthermore, the approach saves processing resources due to simple computations and restricted search regions.
Increasing the Efficacy of SLNB in Cases of Malignant Melanoma Located in Close Proximity to the Lymphatic Basin

PubMed Central

Bogdanov-Berezovsky, Alexander; Pagkalos, Vasileios A.; Silberstein, Eldad; Xanthinaki, Arsinoi A.; Krieger, Yuval

2014-01-01

Background. Being predictive of the entire nodal bed, sentinel lymph node biopsy (SLNB) is invaluable in the surgical management of melanoma. Although the concept is simple, sentinel lymph node (SLN) identification and removal can be technically challenging. Methods. A total of 102 consecutive patients have undergone SLNB in the Division of Plastic and Reconstructive Surgery of Soroka University Medical Center from 2009 to 2012. Patients have undergone SLNB using a radioactive tracer and blue stain in order to identify the SLN. Although SLNB usually precedes the wide excision of melanoma, primary lesions in close proximity (<10 cm) to the lymph basin require wide excision before beginning the SLN quest. Results. All pathology reports confirmed the excision of lymph nodes. Conclusions. When treating MM in close proximity to the lymph basin, changing the sequence of the SLNB procedure seems to increase the efficacy of the method. PMID:24660067
Visual display aid for orbital maneuvering - Design considerations

NASA Technical Reports Server (NTRS)

Grunwald, Arthur J.; Ellis, Stephen R.

1993-01-01

This paper describes the development of an interactive proximity operations planning system that allows on-site planning of fuel-efficient multiburn maneuvers in a potential multispacecraft environment. Although this display system most directly assists planning by providing visual feedback to aid visualization of the trajectories and constraints, its most significant features include: (1) the use of an 'inverse dynamics' algorithm that removes control nonlinearities facing the operator, and (2) a trajectory planning technique that separates, through a 'geometric spreadsheet', the normally coupled complex problems of planning orbital maneuvers and allows solution by an iterative sequence of simple independent actions. The visual feedback of trajectory shapes and operational constraints, provided by user-transparent and continuously active background computations, allows the operator to make fast, iterative design changes that rapidly converge to fuel-efficient solutions. The planning tool provides an example of operator-assisted optimization of nonlinear cost functions.
Winnowing DNA for Rare Sequences: Highly Specific Sequence and Methylation Based Enrichment

PubMed Central

Thompson, Jason D.; Shibahara, Gosuke; Rajan, Sweta; Pel, Joel; Marziali, Andre

2012-01-01

Rare mutations in cell populations are known to be hallmarks of many diseases and cancers. Similarly, differential DNA methylation patterns arise in rare cell populations with diagnostic potential such as fetal cells circulating in maternal blood. Unfortunately, the frequency of alleles with diagnostic potential, relative to wild-type background sequence, is often well below the frequency of errors in currently available methods for sequence analysis, including very high throughput DNA sequencing. We demonstrate a DNA preparation and purification method that through non-linear electrophoretic separation in media containing oligonucleotide probes, achieves 10,000 fold enrichment of target DNA with single nucleotide specificity, and 100 fold enrichment of unmodified methylated DNA differing from the background by the methylation of a single cytosine residue. PMID:22355378
Background sequence characteristics influence the occurrence and severity of disease-causing mtDNA mutations

PubMed Central

Wei, Wei; Hudson, Gavin

2017-01-01

Inherited mitochondrial DNA (mtDNA) mutations have emerged as a common cause of human disease, with mutations occurring multiple times in the world population. The clinical presentation of three pathogenic mtDNA mutations is strongly associated with a background mtDNA haplogroup, but it is not clear whether this is limited to a handful of examples or is a more general phenomenon. To address this, we determined the characteristics of 30,506 mtDNA sequences sampled globally. After performing several quality control steps, we ascribed an established pathogenicity score to the major alleles for each sequence. The mean pathogenicity score for known disease-causing mutations was significantly different between mtDNA macro-haplogroups. Several mutations were observed across all haplogroup backgrounds, whereas others were only observed on specific clades. In some instances this reflected a founder effect, but in others, the mutation recurred but only within the same phylogenetic cluster. Sequence diversity estimates showed that disease-causing mutations were more frequent on young sequences, and genomes with two or more disease-causing mutations were more common than expected by chance. These findings implicate the mtDNA background more generally in recurrent mutation events that have been purified through natural selection in older populations. This provides an explanation for the low frequency of mtDNA disease reported in specific ethnic groups. PMID:29253894
GABI-Kat SimpleSearch: new features of the Arabidopsis thaliana T-DNA mutant database.

PubMed

Kleinboelting, Nils; Huep, Gunnar; Kloetgen, Andreas; Viehoever, Prisca; Weisshaar, Bernd

2012-01-01

T-DNA insertion mutants are very valuable for reverse genetics in Arabidopsis thaliana. Several projects have generated large sequence-indexed collections of T-DNA insertion lines, of which GABI-Kat is the second largest resource worldwide. User access to the collection and its Flanking Sequence Tags (FSTs) is provided by the front end SimpleSearch (http://www.GABI-Kat.de). Several significant improvements have been implemented recently. The database now relies on the TAIRv10 genome sequence and annotation dataset. All FSTs have been newly mapped using an optimized procedure that leads to improved accuracy of insertion site predictions. A fraction of the collection with weak FST yield was re-analysed by generating new FSTs. Along with newly found predictions for older sequences about 20,000 new FSTs were included in the database. Information about groups of FSTs pointing to the same insertion site that is found in several lines but is real only in a single line are included, and many problematic FST-to-line links have been corrected using new wet-lab data. SimpleSearch currently contains data from ~71,000 lines with predicted insertions covering 62.5% of the 27,206 nuclear protein coding genes, and offers insertion allele-specific data from 9545 confirmed lines that are available from the Nottingham Arabidopsis Stock Centre.
Prediction during statistical learning, and implications for the implicit/explicit divide

PubMed Central

Dale, Rick; Duran, Nicholas D.; Morehead, J. Ryan

2012-01-01

Accounts of statistical learning, both implicit and explicit, often invoke predictive processes as central to learning, yet practically all experiments employ non-predictive measures during training. We argue that the common theoretical assumption of anticipation and prediction needs clearer, more direct evidence for it during learning. We offer a novel experimental context to explore prediction, and report results from a simple sequential learning task designed to promote predictive behaviors in participants as they responded to a short sequence of simple stimulus events. Predictive tendencies in participants were measured using their computer mouse, the trajectories of which served as a means of tapping into predictive behavior while participants were exposed to very short and simple sequences of events. A total of 143 participants were randomly assigned to stimulus sequences along a continuum of regularity. Analysis of computer-mouse trajectories revealed that (a) participants almost always anticipate events in some manner, (b) participants exhibit two stable patterns of behavior, either reacting to vs. predicting future events, (c) the extent to which participants predict relates to performance on a recall test, and (d) explicit reports of perceiving patterns in the brief sequence correlates with extent of prediction. We end with a discussion of implicit and explicit statistical learning and of the role prediction may play in both kinds of learning. PMID:22723817
Xenosurveillance reflects traditional sampling techniques for the identification of human pathogens: A comparative study in West Africa

PubMed Central

Fakoli, Lawrence S.; Bolay, Kpehe; Bolay, Fatorma K.; Diclaro, Joseph W.; Brackney, Doug E.; Stenglein, Mark D.; Ebel, Gregory D.

2018-01-01

Background Novel surveillance strategies are needed to detect the rapid and continuous emergence of infectious disease agents. Ideally, new sampling strategies should be simple to implement, technologically uncomplicated, and applicable to areas where emergence events are known to occur. To this end, xenosurveillance is a technique that makes use of blood collected by hematophagous arthropods to monitor and identify vertebrate pathogens. Mosquitoes are largely ubiquitous animals that often exist in sizable populations. As well, many domestic or peridomestic species of mosquitoes will preferentially take blood-meals from humans, making them a unique and largely untapped reservoir to collect human blood. Methodology/Principal findings We sought to take advantage of this phenomenon by systematically collecting blood-fed mosquitoes during a field trail in Northern Liberia to determine whether pathogen sequences from blood engorged mosquitoes accurately mirror those obtained directly from humans. Specifically, blood was collected from humans via finger-stick and by aspirating bloodfed mosquitoes from the inside of houses. Shotgun metagenomic sequencing of RNA and DNA derived from these specimens was performed to detect pathogen sequences. Samples obtained from xenosurveillance and from finger-stick blood collection produced a similar number and quality of reads aligning to two human viruses, GB virus C and hepatitis B virus. Conclusions/Significance This study represents the first systematic comparison between xenosurveillance and more traditional sampling methodologies, while also demonstrating the viability of xenosurveillance as a tool to sample human blood for circulating pathogens. PMID:29561834
A Toolkit for bulk PCR-based marker design from next-generation sequence data: application for development of a framework linkage map in bulb onion (Allium cepa L.)

PubMed Central

2012-01-01

Background Although modern sequencing technologies permit the ready detection of numerous DNA sequence variants in any organisms, converting such information to PCR-based genetic markers is hampered by a lack of simple, scalable tools. Onion is an example of an under-researched crop with a complex, heterozygous genome where genome-based research has previously been hindered by limited sequence resources and genetic markers. Results We report the development of generic tools for large-scale web-based PCR-based marker design in the Galaxy bioinformatics framework, and their application for development of next-generation genetics resources in a wide cross of bulb onion (Allium cepa L.). Transcriptome sequence resources were developed for the homozygous doubled-haploid bulb onion line ‘CUDH2150’ and the genetically distant Indian landrace ‘Nasik Red’, using 454™ sequencing of normalised cDNA libraries of leaf and shoot. Read mapping of ‘Nasik Red’ reads onto ‘CUDH2150’ assemblies revealed 16836 indel and SNP polymorphisms that were mined for portable PCR-based marker development. Tools for detection of restriction polymorphisms and primer set design were developed in BioPython and adapted for use in the Galaxy workflow environment, enabling large-scale and targeted assay design. Using PCR-based markers designed with these tools, a framework genetic linkage map of over 800cM spanning all chromosomes was developed in a subset of 93 F2 progeny from a very large F2 family developed from the ‘Nasik Red’ x ‘CUDH2150’ inter-cross. The utility of tools and genetic resources developed was tested by designing markers to transcription factor-like polymorphic sequences. Bin mapping these markers using a subset of 10 progeny confirmed the ability to place markers within 10 cM bins, enabling increased efficiency in marker assignment and targeted map refinement. The major genetic loci conditioning red bulb colour (R) and fructan content (Frc) were located on this map by QTL analysis. Conclusions The generic tools developed for the Galaxy environment enable rapid development of sets of PCR assays targeting sequence variants identified from Illumina and 454 sequence data. They enable non-specialist users to validate and exploit large volumes of next-generation sequence data using basic equipment. PMID:23157543
Yellow lupin (Lupinus luteus L.) transcriptome sequencing: molecular marker development and comparative studies

PubMed Central

2012-01-01

Background Yellow lupin (Lupinus luteus L.) is a minor legume crop characterized by its high seed protein content. Although grown in several temperate countries, its orphan condition has limited the generation of genomic tools to aid breeding efforts to improve yield and nutritional quality. In this study, we report the construction of 454-expresed sequence tag (EST) libraries, carried out comparative studies between L. luteus and model legume species, developed a comprehensive set of EST-simple sequence repeat (SSR) markers, and validated their utility on diversity studies and transferability to related species. Results Two runs of 454 pyrosequencing yielded 205 Mb and 530 Mb of sequence data for L1 (young leaves, buds and flowers) and L2 (immature seeds) EST- libraries. A combined assembly (L1L2) yielded 71,655 contigs with an average contig length of 632 nucleotides. L1L2 contigs were clustered into 55,309 isotigs. 38,200 isotigs translated into proteins and 8,741 of them were full length. Around 57% of L. luteus sequences had significant similarity with at least one sequence of Medicago, Lotus, Arabidopsis, or Glycine, and 40.17% showed positive matches with all of these species. L. luteus isotigs were also screened for the presence of SSR sequences. A total of 2,572 isotigs contained at least one EST-SSR, with a frequency of one SSR per 17.75 kbp. Empirical evaluation of the EST-SSR candidate markers resulted in 222 polymorphic EST-SSRs. Two hundred and fifty four (65.7%) and 113 (30%) SSR primer pairs were able to amplify fragments from L. hispanicus and L. mutabilis DNA, respectively. Fifty polymorphic EST-SSRs were used to genotype a sample of 64 L. luteus accessions. Neighbor-joining distance analysis detected the existence of several clusters among L. luteus accessions, strongly suggesting the existence of population subdivisions. However, no clear clustering patterns followed the accession’s origin. Conclusion L. luteus deep transcriptome sequencing will facilitate the further development of genomic tools and lupin germplasm. Massive sequencing of cDNA libraries will continue to produce raw materials for gene discovery, identification of polymorphisms (SNPs, EST-SSRs, INDELs, etc.) for marker development, anchoring sequences for genome comparisons and putative gene candidates for QTL detection. PMID:22920992
Development of somatic hybrids Solanum × michoacanum Bitter. (Rydb.) (+) S. tuberosum L. and autofused 4x S. × michoacanum plants as potential sources of late blight resistance for potato breeding.

PubMed

Smyda, P; Jakuczun, H; Dębski, K; Sliwka, J; Thieme, R; Nachtigall, M; Wasilewicz-Flis, I; Zimnoch-Guzowska, E

2013-08-01

Phytophthora infestans resistant somatic hybrids of S. × michoacanum (+) S. tuberosum and autofused 4 x S. × michoacanum were obtained. Our material is promising to introgress resistance from S. × michoacanum into cultivated potato background. Solanum × michoacanum (Bitter.) Rydb. (mch) is a wild diploid (2n = 2x = 24) potato species derived from spontaneous cross of S. bulbocastanum and S. pinnatisectum. This hybrid is a 1 EBN (endosperm balance number) species and can cross effectively only with other 1 EBN species. Plants of mch are resistant to Phytophthora infestans (Mont) de Bary. To introgress late blight resistance genes from mch into S. tuberosum (tbr), genepool somatic hybridization between mch and susceptible diploid potato clones (2n = 2x = 24) or potato cultivar Rywal (2n = 4x = 48) was performed. In total 18,775 calli were obtained from postfusion products from which 1,482 formed shoots. The Simple Sequence Repeat (SSR), Cleaved Amplified Polymorphic Sequences (CAPS) and Random Amplified Polymorphic DNA (RAPD) analyses confirmed hybrid nature of 228 plants and 116 autofused 4x mch. After evaluation of morphological features, flowering, pollen stainability, tuberization and ploidy level, 118 somatic hybrids and 116 autofused 4x mch were tested for late blight resistance using the detached leaf assay. After two seasons of testing three somatic hybrids and 109 4x mch were resistant. Resistant forms have adequate pollen stainability for use in crossing programme and are a promising material useful for introgression resistance from mch into the cultivated potato background.
Genotyping and Molecular Identification of Date Palm Cultivars Using Inter-Simple Sequence Repeat (ISSR) Markers.

PubMed

Ayesh, Basim M

2017-01-01

Molecular markers are credible for the discrimination of genotypes and estimation of the extent of genetic diversity and relatedness in a set of genotypes. Inter-simple sequence repeat (ISSR) markers rapidly reveal high polymorphic fingerprints and have been used frequently to determine the genetic diversity among date palm cultivars. This chapter describes the application of ISSR markers for genotyping of date palm cultivars. The application involves extraction of genomic DNA from the target cultivars with reliable quality and quantity. Subsequently the extracted DNA serves as a template for amplification of genomic regions flanked by inverted simple sequence repeats using a single primer. The similarity of each pair of samples is measured by calculating the number of mono- and polymorphic bands revealed by gel electrophoresis. Matrices constructed for similarity and genetic distance are used to build a phylogenetic tree and cluster analysis, to determine the molecular relatedness of cultivars. The protocol describes 3 out of 9 tested primers consistently amplified 31 loci in 6 date palm cultivars, with 28 polymorphic loci.
Analysis on the DNA Fingerprinting of Aspergillus Oryzae Mutant Induced by High Hydrostatic Pressure

NASA Astrophysics Data System (ADS)

Wang, Hua; Zhang, Jian; Yang, Fan; Wang, Kai; Shen, Si-Le; Liu, Bing-Bing; Zou, Bo; Zou, Guang-Tian

2011-01-01

The mutant strains of aspergillus oryzae (HP300a) are screened under 300 MPa for 20 min. Compared with the control strains, the screened mutant strains have unique properties such as genetic stability, rapid growth, lots of spores, and high protease activity. Random amplified polymorphic DNA (RAPD) and inter simple sequence repeats (ISSR) are used to analyze the DNA fingerprinting of HP300a and the control strains. There are 67.9% and 51.3% polymorphic bands obtained by these two markers, respectively, indicating significant genetic variations between HP300a and the control strains. In addition, comparison of HP300a and the control strains, the genetic distances of random sequence and simple sequence repeat of DNA are 0.51 and 0.34, respectively.
Simple, reliable, and nondestructive method for the measurement of vacuum pressure without specialized equipment.

PubMed

Yuan, Jin-Peng; Ji, Zhong-Hua; Zhao, Yan-Ting; Chang, Xue-Fang; Xiao, Lian-Tuan; Jia, Suo-Tang

2013-09-01

We present a simple, reliable, and nondestructive method for the measurement of vacuum pressure in a magneto-optical trap. The vacuum pressure is verified to be proportional to the collision rate constant between cold atoms and the background gas with a coefficient k, which can be calculated by means of the simple ideal gas law. The rate constant for loss due to collisions with all background gases can be derived from the total collision loss rate by a series of loading curves of cold atoms under different trapping laser intensities. The presented method is also applicable for other cold atomic systems and meets the miniaturization requirement of commercial applications.
Analysis of the Transcriptome of Erigeron breviscapus Uncovers Putative Scutellarin and Chlorogenic Acids Biosynthetic Genes and Genetic Markers

PubMed Central

Zhang, Jia-Jin; Shu, Li-Ping; Zhang, Wei; Long, Guang-Qiang; Liu, Tao; Meng, Zheng-Gui; Chen, Jun-Wen; Yang, Sheng-Chao

2014-01-01

Background Erigeron breviscapus (Vant.) Hand-Mazz. is a famous medicinal plant. Scutellarin and chlorogenic acids are the primary active components in this herb. However, the mechanisms of biosynthesis and regulation for scutellarin and chlorogenic acids in E. breviscapus are considerably unknown. In addition, genomic information of this herb is also unavailable. Principal Findings Using Illumina sequencing on GAIIx platform, a total of 64,605,972 raw sequencing reads were generated and assembled into 73,092 non-redundant unigenes. Among them, 44,855 unigenes (61.37%) were annotated in the public databases Nr, Swiss-Prot, KEGG, and COG. The transcripts encoding the known enzymes involved in flavonoids and in chlorogenic acids biosynthesis were discovered in the Illumina dataset. Three candidate cytochrome P450 genes were discovered which might encode flavone 6-hydroase converting apigenin to scutellarein. Furthermore, 4 unigenes encoding the homologues of maize P1 (R2R3-MYB transcription factors) were defined, which might regulate the biosynthesis of scutellarin. Additionally, a total of 11,077 simple sequence repeat (SSR) were identified from 9,255 unigenes. Of SSRs, tri-nucleotide motifs were the most abundant motif. Thirty-six primer pairs for SSRs were randomly selected for validation of the amplification and polymorphism. The result revealed that 34 (94.40%) primer pairs were successfully amplified and 19 (52.78%) primer pairs exhibited polymorphisms. Conclusion Using next generation sequencing (NGS) technology, this study firstly provides abundant genomic data for E. breviscapus. The candidate genes involved in the biosynthesis and transcriptional regulation of scutellarin and chlorogenic acids were obtained in this study. Additionally, a plenty of genetic makers were generated by identification of SSRs, which is a powerful tool for molecular breeding and genetics applications in this herb. PMID:24956277
Synthesis and sensing integration: A novel enzymatic reaction modulated Nanoclusters Beacon (NCB) "Illumination" strategy for label-free biosensing and logic gate operation.

PubMed

Hong, Lu; Zhou, Fu; Wang, Guangfeng; Zhang, Xiaojun

2016-12-15

A novel fluorescent label-free "turn-on" NAD(+) and adenosine triphosphate (ATP) biosensing strategy is proposed by fully exploiting ligation triggered Nanocluster Beacon (NCB). In the presence of the target, the split NCB was brought to intact, which brought the C-rich sequence and enhancer sequence in close proximity resulting in the lightening of dark DNA/AgNCs ("On" mode). Further application was presented for logic gate operation and aptasensor construction. The feasibility was investigated by Ultraviolet-visible spectroscopy (UV-vis), Fluorescence, lifetime and High Resolution Transmission Electron Microscopy (HRTEM) etc. The strategy displayed good performance in the detection of NAD(+) and ATP, with the detection limit of 0.002nM and 0.001mM, the linear range of 10-1000nM and 0.003-0.01mM, respectively. Due to the DNA/AgNCs as fluorescence reporter, the completely label-free fluorescent strategy boasts the features of simplicity and low cost, and showing little reliance on the sensing environment. Meanwhile, the regulation by overhang G-rich sequence not relying on Förster energy transfer quenching manifests the high signal-to-background ratios (S/B ratios). This method not only provided a simple, economical and reliable fluorescent NAD(+) assay but also explored a flexible G-rich sequence regulated NCB probe for the fluorescent biosensors. Furthermore, this sensing mode was expanded to the application of a logic gate design, which exhibited a high performance for not only versatile biosensors construction but also for molecular computing application. Copyright © 2016 Elsevier B.V. All rights reserved.
The Sudden Dominance of bla CTX–M Harbouring Plasmids in Shigella spp. Circulating in Southern Vietnam

PubMed Central

Nhu, Nguyen Thi Khanh; Vinh, Ha; Nga, Tran Vu Thieu; Stabler, Richard; Duy, Pham Thanh; Thi Minh Vien, Le; van Doorn, H. Rogier; Cerdeño-Tárraga, Ana; Thomson, Nicholas; Campbell, James; Van Minh Hoang, Nguyen; Thi Thu Nga, Tran; Minh, Pham Van; Thuy, Cao Thu; Wren, Brendan; Farrar, Jeremy; Baker, Stephen

2010-01-01

Background Plasmid mediated antimicrobial resistance in the Enterobacteriaceae is a global problem. The rise of CTX-M class extended spectrum beta lactamases (ESBLs) has been well documented in industrialized countries. Vietnam is representative of a typical transitional middle income country where the spectrum of infectious diseases combined with the spread of drug resistance is shifting and bringing new healthcare challenges. Methodology We collected hospital admission data from the pediatric population attending the hospital for tropical diseases in Ho Chi Minh City with Shigella infections. Organisms were cultured from all enrolled patients and subjected to antimicrobial susceptibility testing. Those that were ESBL positive were subjected to further investigation. These investigations included PCR amplification for common ESBL genes, plasmid investigation, conjugation, microarray hybridization and DNA sequencing of a bla CTX–M encoding plasmid. Principal Findings We show that two different bla CTX-M genes are circulating in this bacterial population in this location. Sequence of one of the ESBL plasmids shows that rather than the gene being integrated into a preexisting MDR plasmid, the bla CTX-M gene is located on relatively simple conjugative plasmid. The sequenced plasmid (pEG356) carried the bla CTX-M-24 gene on an ISEcp1 element and demonstrated considerable sequence homology with other IncFI plasmids. Significance The rapid dissemination, spread of antimicrobial resistance and changing population of Shigella spp. concurrent with economic growth are pertinent to many other countries undergoing similar development. Third generation cephalosporins are commonly used empiric antibiotics in Ho Chi Minh City. We recommend that these agents should not be considered for therapy of dysentery in this setting. PMID:20544028
RNAseq versus genome-predicted transcriptomes: a large population of novel transcripts identified in an Illumina-454 Hydra transcriptome

PubMed Central

2013-01-01

Background Evolutionary studies benefit from deep sequencing technologies that generate genomic and transcriptomic sequences from a variety of organisms. Genome sequencing and RNAseq have complementary strengths. In this study, we present the assembly of the most complete Hydra transcriptome to date along with a comparative analysis of the specific features of RNAseq and genome-predicted transcriptomes currently available in the freshwater hydrozoan Hydra vulgaris. Results To produce an accurate and extensive Hydra transcriptome, we combined Illumina and 454 Titanium reads, giving the primacy to Illumina over 454 reads to correct homopolymer errors. This strategy yielded an RNAseq transcriptome that contains 48’909 unique sequences including splice variants, representing approximately 24’450 distinct genes. Comparative analysis to the available genome-predicted transcriptomes identified 10’597 novel Hydra transcripts that encode 529 evolutionarily-conserved proteins. The annotation of 170 human orthologs points to critical functions in protein biosynthesis, FGF and TOR signaling, vesicle transport, immunity, cell cycle regulation, cell death, mitochondrial metabolism, transcription and chromatin regulation. However, a majority of these novel transcripts encodes short ORFs, at least 767 of them corresponding to pseudogenes. This RNAseq transcriptome also lacks 11’270 predicted transcripts that correspond either to silent genes or to genes expressed below the detection level of this study. Conclusions We established a simple and powerful strategy to combine Illumina and 454 reads and we produced, with genome assistance, an extensive and accurate Hydra transcriptome. The comparative analysis of the RNAseq transcriptome with genome-predicted transcriptomes lead to the identification of large populations of novel as well as missing transcripts that might reflect Hydra-specific evolutionary events. PMID:23530871

Implementing a genomic data management system using iRODS in the Wellcome Trust Sanger Institute

PubMed Central

2011-01-01

Background Increasingly large amounts of DNA sequencing data are being generated within the Wellcome Trust Sanger Institute (WTSI). The traditional file system struggles to handle these increasing amounts of sequence data. A good data management system therefore needs to be implemented and integrated into the current WTSI infrastructure. Such a system enables good management of the IT infrastructure of the sequencing pipeline and allows biologists to track their data. Results We have chosen a data grid system, iRODS (Rule-Oriented Data management systems), to act as the data management system for the WTSI. iRODS provides a rule-based system management approach which makes data replication much easier and provides extra data protection. Unlike the metadata provided by traditional file systems, the metadata system of iRODS is comprehensive and allows users to customize their own application level metadata. Users and IT experts in the WTSI can then query the metadata to find and track data. The aim of this paper is to describe how we designed and used (from both system and user viewpoints) iRODS as a data management system. Details are given about the problems faced and the solutions found when iRODS was implemented. A simple use case describing how users within the WTSI use iRODS is also introduced. Conclusions iRODS has been implemented and works as the production system for the sequencing pipeline of the WTSI. Both biologists and IT experts can now track and manage data, which could not previously be achieved. This novel approach allows biologists to define their own metadata and query the genomic data using those metadata. PMID:21906284
Development of highly polymorphic simple sequence repeat markers using genome-wide microsatellite variant analysis in Foxtail millet [Setaria italica (L.) P. Beauv.

PubMed Central

2014-01-01

Background Foxtail millet (Setaria italica (L.) Beauv.) is an important gramineous grain-food and forage crop. It is grown worldwide for human and livestock consumption. Its small genome and diploid nature have led to foxtail millet fast becoming a novel model for investigating plant architecture, drought tolerance and C4 photosynthesis of grain and bioenergy crops. Therefore, cost-effective, reliable and highly polymorphic molecular markers covering the entire genome are required for diversity, mapping and functional genomics studies in this model species. Result A total of 5,020 highly repetitive microsatellite motifs were isolated from the released genome of the genotype 'Yugu1’ by sequence scanning. Based on sequence comparison between S. italica and S. viridis, a set of 788 SSR primer pairs were designed. Of these primers, 733 produced reproducible amplicons and were polymorphic among 28 Setaria genotypes selected from diverse geographical locations. The number of alleles detected by these SSR markers ranged from 2 to 16, with an average polymorphism information content of 0.67. The result obtained by neighbor-joining cluster analysis of 28 Setaria genotypes, based on Nei’s genetic distance of the SSR data, showed that these SSR markers are highly polymorphic and effective. Conclusions A large set of highly polymorphic SSR markers were successfully and efficiently developed based on genomic sequence comparison between different genotypes of the genus Setaria. The large number of new SSR markers and their placement on the physical map represent a valuable resource for studying diversity, constructing genetic maps, functional gene mapping, QTL exploration and molecular breeding in foxtail millet and its closely related species. PMID:24472631
Learning predictive statistics from temporal sequences: Dynamics and strategies

PubMed Central

Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E.; Kourtzi, Zoe

2017-01-01

Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics—that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments. PMID:28973111
Learning predictive statistics from temporal sequences: Dynamics and strategies.

PubMed

Wang, Rui; Shen, Yuan; Tino, Peter; Welchman, Andrew E; Kourtzi, Zoe

2017-10-01

Human behavior is guided by our expectations about the future. Often, we make predictions by monitoring how event sequences unfold, even though such sequences may appear incomprehensible. Event structures in the natural environment typically vary in complexity, from simple repetition to complex probabilistic combinations. How do we learn these structures? Here we investigate the dynamics of structure learning by tracking human responses to temporal sequences that change in structure unbeknownst to the participants. Participants were asked to predict the upcoming item following a probabilistic sequence of symbols. Using a Markov process, we created a family of sequences, from simple frequency statistics (e.g., some symbols are more probable than others) to context-based statistics (e.g., symbol probability is contingent on preceding symbols). We demonstrate the dynamics with which individuals adapt to changes in the environment's statistics-that is, they extract the behaviorally relevant structures to make predictions about upcoming events. Further, we show that this structure learning relates to individual decision strategy; faster learning of complex structures relates to selection of the most probable outcome in a given context (maximizing) rather than matching of the exact sequence statistics. Our findings provide evidence for alternate routes to learning of behaviorally relevant statistics that facilitate our ability to predict future events in variable environments.
A coarse-grained biophysical model of sequence evolution and the population size dependence of the speciation rate

PubMed Central

Khatri, Bhavin S.; Goldstein, Richard A.

2015-01-01

Speciation is fundamental to understanding the huge diversity of life on Earth. Although still controversial, empirical evidence suggests that the rate of speciation is larger for smaller populations. Here, we explore a biophysical model of speciation by developing a simple coarse-grained theory of transcription factor-DNA binding and how their co-evolution in two geographically isolated lineages leads to incompatibilities. To develop a tractable analytical theory, we derive a Smoluchowski equation for the dynamics of binding energy evolution that accounts for the fact that natural selection acts on phenotypes, but variation arises from mutations in sequences; the Smoluchowski equation includes selection due to both gradients in fitness and gradients in sequence entropy, which is the logarithm of the number of sequences that correspond to a particular binding energy. This simple consideration predicts that smaller populations develop incompatibilities more quickly in the weak mutation regime; this trend arises as sequence entropy poises smaller populations closer to incompatible regions of phenotype space. These results suggest a generic coarse-grained approach to evolutionary stochastic dynamics, allowing realistic modelling at the phenotypic level. PMID:25936759
Development of genome-wide SNP assays for rice

USDA-ARS?s Scientific Manuscript database

With the introduction of new sequencing technologies, single nucleotide polymorphisms (SNPs) are rapidly replacing simple sequence repeats (SSRs) as the DNA marker of choice for applications in plant breeding and genetics because they are more abundant, stable, amenable to automation, efficient, and...
A Simple View of Writing in Chinese

ERIC Educational Resources Information Center

Yeung, Pui-sze; Ho, Connie Suk-han; Chan, David Wai-ock; Chung, Kevin Kien-hoa

2017-01-01

This study examined the Chinese written composition development of elementary-grade students in relation to the simple view of writing. Measures of nonverbal reasoning ability, component skills of transcription (stroke sequence knowledge, word spelling, and handwriting fluency), oral language (definitional skill, oral narrative skills, and…
Simple tools for assembling and searching high-density picolitre pyrophosphate sequence data.

PubMed

Parker, Nicolas J; Parker, Andrew G

2008-04-18

The advent of pyrophosphate sequencing makes large volumes of sequencing data available at a lower cost than previously possible. However, the short read lengths are difficult to assemble and the large dataset is difficult to handle. During the sequencing of a virus from the tsetse fly, Glossina pallidipes, we found the need for tools to search quickly a set of reads for near exact text matches. A set of tools is provided to search a large data set of pyrophosphate sequence reads under a "live" CD version of Linux on a standard PC that can be used by anyone without prior knowledge of Linux and without having to install a Linux setup on the computer. The tools permit short lengths of de novo assembly, checking of existing assembled sequences, selection and display of reads from the data set and gathering counts of sequences in the reads. Demonstrations are given of the use of the tools to help with checking an assembly against the fragment data set; investigating homopolymer lengths, repeat regions and polymorphisms; and resolving inserted bases caused by incomplete chain extension. The additional information contained in a pyrophosphate sequencing data set beyond a basic assembly is difficult to access due to a lack of tools. The set of simple tools presented here would allow anyone with basic computer skills and a standard PC to access this information.
Prediction and analysis of protein solubility using a novel scoring card method with dipeptide composition

PubMed Central

2012-01-01

Background Existing methods for predicting protein solubility on overexpression in Escherichia coli advance performance by using ensemble classifiers such as two-stage support vector machine (SVM) based classifiers and a number of feature types such as physicochemical properties, amino acid and dipeptide composition, accompanied with feature selection. It is desirable to develop a simple and easily interpretable method for predicting protein solubility, compared to existing complex SVM-based methods. Results This study proposes a novel scoring card method (SCM) by using dipeptide composition only to estimate solubility scores of sequences for predicting protein solubility. SCM calculates the propensities of 400 individual dipeptides to be soluble using statistic discrimination between soluble and insoluble proteins of a training data set. Consequently, the propensity scores of all dipeptides are further optimized using an intelligent genetic algorithm. The solubility score of a sequence is determined by the weighted sum of all propensity scores and dipeptide composition. To evaluate SCM by performance comparisons, four data sets with different sizes and variation degrees of experimental conditions were used. The results show that the simple method SCM with interpretable propensities of dipeptides has promising performance, compared with existing SVM-based ensemble methods with a number of feature types. Furthermore, the propensities of dipeptides and solubility scores of sequences can provide insights to protein solubility. For example, the analysis of dipeptide scores shows high propensity of α-helix structure and thermophilic proteins to be soluble. Conclusions The propensities of individual dipeptides to be soluble are varied for proteins under altered experimental conditions. For accurately predicting protein solubility using SCM, it is better to customize the score card of dipeptide propensities by using a training data set under the same specified experimental conditions. The proposed method SCM with solubility scores and dipeptide propensities can be easily applied to the protein function prediction problems that dipeptide composition features play an important role. Availability The used datasets, source codes of SCM, and supplementary files are available at http://iclab.life.nctu.edu.tw/SCM/. PMID:23282103
Chromosome arm-specific BAC end sequences permit comparative analysis of homoeologous chromosomes and genomes of polyploid wheat

PubMed Central

2012-01-01

Background Bread wheat, one of the world’s staple food crops, has the largest, highly repetitive and polyploid genome among the cereal crops. The wheat genome holds the key to crop genetic improvement against challenges such as climate change, environmental degradation, and water scarcity. To unravel the complex wheat genome, the International Wheat Genome Sequencing Consortium (IWGSC) is pursuing a chromosome- and chromosome arm-based approach to physical mapping and sequencing. Here we report on the use of a BAC library made from flow-sorted telosomic chromosome 3A short arm (t3AS) for marker development and analysis of sequence composition and comparative evolution of homoeologous genomes of hexaploid wheat. Results The end-sequencing of 9,984 random BACs from a chromosome arm 3AS-specific library (TaaCsp3AShA) generated 11,014,359 bp of high quality sequence from 17,591 BAC-ends with an average length of 626 bp. The sequence represents 3.2% of t3AS with an average DNA sequence read every 19 kb. Overall, 79% of the sequence consisted of repetitive elements, 1.38% as coding regions (estimated 2,850 genes) and another 19% of unknown origin. Comparative sequence analysis suggested that 70-77% of the genes present in both 3A and 3B were syntenic with model species. Among the transposable elements, gypsy/sabrina (12.4%) was the most abundant repeat and was significantly more frequent in 3A compared to homoeologous chromosome 3B. Twenty novel repetitive sequences were also identified using de novo repeat identification. BESs were screened to identify simple sequence repeats (SSR) and transposable element junctions. A total of 1,057 SSRs were identified with a density of one per 10.4 kb, and 7,928 junctions between transposable elements (TE) and other sequences were identified with a density of one per 1.39 kb. With the objective of enhancing the marker density of chromosome 3AS, oligonucleotide primers were successfully designed from 758 SSRs and 695 Insertion Site Based Polymorphisms (ISBPs). Of the 96 ISBP primer pairs tested, 28 (29%) were 3A-specific and compared to 17 (18%) for 96 SSRs. Conclusion This work reports on the use of wheat chromosome arm 3AS-specific BAC library for the targeted generation of sequence data from a particular region of the huge genome of wheat. A large quantity of sequences were generated from the A genome of hexaploid wheat for comparative genome analysis with homoeologous B and D genomes and other model grass genomes. Hundreds of molecular markers were developed from the 3AS arm-specific sequences; these and other sequences will be useful in gene discovery and physical mapping. PMID:22559868
Next generation DNA sequencing technology delivers valuable genetic markers for the genomic orphan legume species, Bituminaria bituminosa

PubMed Central

2011-01-01

Background Bituminaria bituminosa is a perennial legume species from the Canary Islands and Mediterranean region that has potential as a drought-tolerant pasture species and as a source of pharmaceutical compounds. Three botanical varieties have previously been identified in this species: albomarginata, bituminosa and crassiuscula. B. bituminosa can be considered a genomic 'orphan' species with very few genomic resources available. New DNA sequencing technologies provide an opportunity to develop high quality molecular markers for such orphan species. Results 432,306 mRNA molecules were sampled from a leaf transcriptome of a single B. bituminosa plant using Roche 454 pyrosequencing, resulting in an average read length of 345 bp (149.1 Mbp in total). Sequences were assembled into 3,838 isotigs/contigs representing putatively unique gene transcripts. Gene ontology descriptors were identified for 3,419 sequences. Raw sequence reads containing simple sequence repeat (SSR) motifs were identified, and 240 primer pairs flanking these motifs were designed. Of 87 primer pairs developed this way, 75 (86.2%) successfully amplified primarily single fragments by PCR. Fragment analysis using 20 primer pairs in 79 accessions of B. bituminosa detected 130 alleles at 21 SSR loci. Genetic diversity analyses confirmed that variation at these SSR loci accurately reflected known taxonomic relationships in original collections of B. bituminosa and provided additional evidence that a division of the botanical variety bituminosa into two according to geographical origin (Mediterranean region and Canary Islands) may be appropriate. Evidence of cross-pollination was also found between botanical varieties within a B. bituminosa breeding programme. Conclusions B. bituminosa can no longer be considered a genomic orphan species, having now a large (albeit incomplete) repertoire of expressed gene sequences that can serve as a resource for future genetic studies. This experimental approach was effective in developing codominant and polymorphic SSR markers for application in diverse genetic studies. These markers have already given new insight into genetic variation in B. bituminosa, providing evidence that a division of the botanical variety bituminosa may be appropriate. This approach is commended to those seeking to develop useful markers for genomic orphan species. PMID:22171578
SNP Discovery by Illumina-Based Transcriptome Sequencing of the Olive and the Genetic Characterization of Turkish Olive Genotypes Revealed by AFLP, SSR and SNP Markers

PubMed Central

Kaya, Hilal Betul; Cetin, Oznur; Kaya, Hulya; Sahin, Mustafa; Sefer, Filiz; Kahraman, Abdullah; Tanyolac, Bahattin

2013-01-01

Background The olive tree (Olea europaea L.) is a diploid (2n = 2x = 46) outcrossing species mainly grown in the Mediterranean area, where it is the most important oil-producing crop. Because of its economic, cultural and ecological importance, various DNA markers have been used in the olive to characterize and elucidate homonyms, synonyms and unknown accessions. However, a comprehensive characterization and a full sequence of its transcriptome are unavailable, leading to the importance of an efficient large-scale single nucleotide polymorphism (SNP) discovery in olive. The objectives of this study were (1) to discover olive SNPs using next-generation sequencing and to identify SNP primers for cultivar identification and (2) to characterize 96 olive genotypes originating from different regions of Turkey. Methodology/Principal Findings Next-generation sequencing technology was used with five distinct olive genotypes and generated cDNA, producing 126,542,413 reads using an Illumina Genome Analyzer IIx. Following quality and size trimming, the high-quality reads were assembled into 22,052 contigs with an average length of 1,321 bases and 45 singletons. The SNPs were filtered and 2,987 high-quality putative SNP primers were identified. The assembled sequences and singletons were subjected to BLAST similarity searches and annotated with a Gene Ontology identifier. To identify the 96 olive genotypes, these SNP primers were applied to the genotypes in combination with amplified fragment length polymorphism (AFLP) and simple sequence repeats (SSR) markers. Conclusions/Significance This study marks the highest number of SNP markers discovered to date from olive genotypes using transcriptome sequencing. The developed SNP markers will provide a useful source for molecular genetic studies, such as genetic diversity and characterization, high density quantitative trait locus (QTL) analysis, association mapping and map-based gene cloning in the olive. High levels of genetic variation among Turkish olive genotypes revealed by SNPs, AFLPs and SSRs allowed us to characterize the Turkish olive genotype. PMID:24058483
Development of genic-SSR markers by deep transcriptome sequencing in pigeonpea [Cajanus cajan (L.) Millspaugh

PubMed Central

2011-01-01

Background Pigeonpea [Cajanus cajan (L.) Millspaugh], one of the most important food legumes of semi-arid tropical and subtropical regions, has limited genomic resources, particularly expressed sequence based (genic) markers. We report a comprehensive set of validated genic simple sequence repeat (SSR) markers using deep transcriptome sequencing, and its application in genetic diversity analysis and mapping. Results In this study, 43,324 transcriptome shotgun assembly unigene contigs were assembled from 1.696 million 454 GS-FLX sequence reads of separate pooled cDNA libraries prepared from leaf, root, stem and immature seed of two pigeonpea varieties, Asha and UPAS 120. A total of 3,771 genic-SSR loci, excluding homopolymeric and compound repeats, were identified; of which 2,877 PCR primer pairs were designed for marker development. Dinucleotide was the most common repeat motif with a frequency of 60.41%, followed by tri- (34.52%), hexa- (2.62%), tetra- (1.67%) and pentanucleotide (0.76%) repeat motifs. Primers were synthesized and tested for 772 of these loci with repeat lengths of ≥18 bp. Of these, 550 markers were validated for consistent amplification in eight diverse pigeonpea varieties; 71 were found to be polymorphic on agarose gel electrophoresis. Genetic diversity analysis was done on 22 pigeonpea varieties and eight wild species using 20 highly polymorphic genic-SSR markers. The number of alleles at these loci ranged from 4-10 and the polymorphism information content values ranged from 0.46 to 0.72. Neighbor-joining dendrogram showed distinct separation of the different groups of pigeonpea cultivars and wild species. Deep transcriptome sequencing of the two parental lines helped in silico identification of polymorphic genic-SSR loci to facilitate the rapid development of an intra-species reference genetic map, a subset of which was validated for expected allelic segregation in the reference mapping population. Conclusion We developed 550 validated genic-SSR markers in pigeonpea using deep transcriptome sequencing. From these, 20 highly polymorphic markers were used to evaluate the genetic relationship among species of the genus Cajanus. A comprehensive set of genic-SSR markers was developed as an important genomic resource for diversity analysis and genetic mapping in pigeonpea. PMID:21251263
Recursive sequences in first-year calculus

NASA Astrophysics Data System (ADS)

Krainer, Thomas

2016-02-01

This article provides ready-to-use supplementary material on recursive sequences for a second-semester calculus class. It equips first-year calculus students with a basic methodical procedure based on which they can conduct a rigorous convergence or divergence analysis of many simple recursive sequences on their own without the need to invoke inductive arguments as is typically required in calculus textbooks. The sequences that are accessible to this kind of analysis are predominantly (eventually) monotonic, but also certain recursive sequences that alternate around their limit point as they converge can be considered.
SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

USGS Publications Warehouse

Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

2013-01-01

SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data.

PubMed

Miller, Mark P; Knaus, Brian J; Mullins, Thomas D; Haig, Susan M

2013-01-01

SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
Genotyping variability of computationally categorized peach microsatellite markers

USDA-ARS?s Scientific Manuscript database

Numerous expressed sequence tag (EST) simple sequence repeat (SSR) primers can be easily mined out. The obstacle to develop them into usable markers is how to optimally select downsized subsets of the primers for genotyping, which accordingly reduces amplification failure and monomorphism often occu...
A Fractal Excursion.

ERIC Educational Resources Information Center

Camp, Dane R.

1991-01-01

After introducing the two-dimensional Koch curve, which is generated by simple recursions on an equilateral triangle, the process is extended to three dimensions with simple recursions on a regular tetrahedron. Included, for both fractal sequences, are iterative formulae, illustrations of the first several iterations, and a sample PASCAL program.…
Noise-gating to Clean Astrophysical Image Data

DOE Office of Scientific and Technical Information (OSTI.GOV)

DeForest, C. E.

I present a family of algorithms to reduce noise in astrophysical images and image sequences, preserving more information from the original data than is retained by conventional techniques. The family uses locally adaptive filters (“noise gates”) in the Fourier domain to separate coherent image structure from background noise based on the statistics of local neighborhoods in the image. Processing of solar data limited by simple shot noise or by additive noise reveals image structure not easily visible in the originals, preserves photometry of observable features, and reduces shot noise by a factor of 10 or more with little to nomore » apparent loss of resolution. This reveals faint features that were either not directly discernible or not sufficiently strongly detected for quantitative analysis. The method works best on image sequences containing related subjects, for example movies of solar evolution, but is also applicable to single images provided that there are enough pixels. The adaptive filter uses the statistical properties of noise and of local neighborhoods in the data to discriminate between coherent features and incoherent noise without reference to the specific shape or evolution of those features. The technique can potentially be modified in a straightforward way to exploit additional a priori knowledge about the functional form of the noise.« less
Isolation and characterization of microsatellite markers for Dendranthema morifolium (Asteraceae) using next-generation sequencing.

PubMed

Yuan, W-J; Ye, S; Du, L-H; Li, S-M; Miao, X; Shang, F-D

2016-10-05

Dendranthema morifolium (Asteraceae) is a perennial herbaceous plant native to China. A long history of artificial crossings may have resulted in complex genetic background and decreased genetic diversity. To protect the genetic diversity of D. morifolium and enabling breeding of new D. morifolium cultivars, we developed a set of molecular markers. We used pyrosequencing of an enriched microsatellite library by Roche 454 FLX+ platform, to isolate D. morifolium simple sequence repeats (SSRs). A total of 32,863 raw reads containing 2251 SSRs were obtained. To test the effectiveness of these SSR markers, we designed primers by randomly selecting 100 novel SSRs, and amplified them across 60 cultivars representing five different petal shape groups. Sixteen SSRs were polymorphic with the number of alleles ranging from 6 to 19, and their expected and observed heterozygosities ranging from 0.477 to 0.848, and 0.250 to 0.804, respectively. The polymorphism information content ranged from 0.459 to 0.854 and the inbreeding coefficient ranged from -0.119 to 0.759. An unweighted pair-group method arithmetic average analysis was performed to survey the phylogenetic relationships of these 60 cultivars and five clusters were identified. These markers can be used for investigating genetic relationships and identifying elite alleles through linkage and association analyses.

Noise-gating to Clean Astrophysical Image Data

NASA Astrophysics Data System (ADS)

DeForest, C. E.

2017-04-01

I present a family of algorithms to reduce noise in astrophysical images and image sequences, preserving more information from the original data than is retained by conventional techniques. The family uses locally adaptive filters (“noise gates”) in the Fourier domain to separate coherent image structure from background noise based on the statistics of local neighborhoods in the image. Processing of solar data limited by simple shot noise or by additive noise reveals image structure not easily visible in the originals, preserves photometry of observable features, and reduces shot noise by a factor of 10 or more with little to no apparent loss of resolution. This reveals faint features that were either not directly discernible or not sufficiently strongly detected for quantitative analysis. The method works best on image sequences containing related subjects, for example movies of solar evolution, but is also applicable to single images provided that there are enough pixels. The adaptive filter uses the statistical properties of noise and of local neighborhoods in the data to discriminate between coherent features and incoherent noise without reference to the specific shape or evolution of those features. The technique can potentially be modified in a straightforward way to exploit additional a priori knowledge about the functional form of the noise.
Development of Reproducible EST-derived SSR Markers and Assessment of Genetic Diversity in Panax ginseng Cultivars and Related Species

PubMed Central

Choi, Hong-Il; Kim, Nam Hoon; Kim, Jun Ha; Choi, Beom Soon; Ahn, In-Ok; Lee, Joon-Soo; Yang, Tae-Jin

2011-01-01

Little is known about the genetics or genomics of Panax ginseng. In this study, we developed 70 expressed sequence tag-derived polymorphic simple sequence repeat markers by trials of 140 primer pairs. All of the 70 markers showed reproducible polymorphism among four Panax speciesand 19 of them were polymorphic in six P. ginseng cultivars. These markers segregated 1:2:1 manner of Mendelian inheritance in an F2 population of a cross between two P. ginseng cultivars, ‘Yunpoong’ and ‘Chunpoong’, indicating that these are reproducible and inheritable mappable markers. A phylogenetic analysis using the genotype data showed three distinctive groups: a P. ginseng-P. japonicus clade, P. notoginseng and P. quinquefolius, with similarity coefficients of 0.70. P. japonicus was intermingled with P. ginseng cultivars, indicating that both species have similar genetic backgrounds. P. ginseng cultivars were subdivided into three minor groups: an independent cultivar ‘Chunpoong’, a subgroup with three accessions including two cultivars, ‘Gumpoong’ and ‘Yunpoong’ and one landrace ‘Hwangsook’ and another subgroup with two accessions including one cultivar, ‘Gopoong’ and one landrace ‘Jakyung’. Each primer pair produced 1 to 4 bands, indicating that the ginseng genome has a highly replicated paleopolyploid genome structure. PMID:23717085
Influence of Neuroblastoma Stage on Serum-Based Detection of MYCN Amplification

PubMed Central

Combaret, Valerie; Hogarty, Michael D; London, Wendy B; McGrady, Patrick; Iacono, Isabelle; Brejon, Stephanie; Swerts, Katrien; Noguera, Rosa; Gross, Nicole; Rousseau, Raphael; Puisieux, Alain

2010-01-01

Background MYCN oncogene amplification has been defined as the most important prognostic factor for neuroblastoma, the most common solid extracranial neoplasm in children. High copy numbers are strongly associated with rapid tumor progression and poor outcome, independently of tumor stage or patient age, and this has become an important factor in treatment stratification. Procedure By Real Time Quantitative PCR analysis, we evaluated the clinical relevance of circulating MYCN DNA of 267 patients with locoregional or metastatic neuroblastoma in children less than 18 months of age. Results For patients in this age group with INSS stage 4 or 4S NB and stage 3 patients, serum-based determination of MYCN DNA sequences had good sensitivity (85%, 83% and 75% respectively) and high specificity (100%) when compared to direct tumor gene determination. In contrast, the approach showed low sensitivity patients with stage 1 and 2 disease. Conclusion Our results show that the sensitivity of the serum-based MYCN DNA sequence determination depends on the stage of the disease. However, this simple, reproducible assay may represent a reasonably sensitive and very specific tool to assess tumor MYCN status in cases with stage 3 and metastatic disease for whom a wait and see strategy is often recommended. PMID:19301388
SSRscanner: a program for reporting distribution and exact location of simple sequence repeats.

PubMed

Anwar, Tamanna; Khan, Asad U

2006-02-20

Simple sequence repeats (SSRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker assisted selection of crop plants and a range of molecular ecology and diversity studies. These repeated DNA sequences are found in both prokaryotes and eukaryotes. They are distributed almost at random throughout the genome, ranging from mononucleotide to trinucleotide repeats. They are also found at longer lengths (> 6 repeating units) of tracts. Most of the computer programs that find SSRs do not report its exact position. A computer program SSRscanner was written to find out distribution, frequency and exact location of each SSR in the genome. SSRscanner is user friendly. It can search repeats of any length and produce outputs with their exact position on chromosome and their frequency of occurrence in the sequence. This program has been written in PERL and is freely available for non-commercial users by request from the authors. Please contact the authors by E-mail: huzzi99@hotmail.com.
A hybrid swarm population of Pinus densiflora x P. sylvestris hybrids inferred from sequence analysis of chloroplast DNA and morphological characters

USDA-ARS?s Scientific Manuscript database

To confirm a hybrid swarm population of Pinus densiflora × P. sylvestris in Jilin, China and to study whether shoot apex morphology of 4-year old seedlings can be correlated with the sequence of a chloroplast DNA simple sequence repeat marker (cpDNA SSR), needles and seeds from P. densiflora, P. syl...
Comparative Evaluation of Background Subtraction Algorithms in Remote Scene Videos Captured by MWIR Sensors

PubMed Central

Yao, Guangle; Lei, Tao; Zhong, Jiandan; Jiang, Ping; Jia, Wenwu

2017-01-01

Background subtraction (BS) is one of the most commonly encountered tasks in video analysis and tracking systems. It distinguishes the foreground (moving objects) from the video sequences captured by static imaging sensors. Background subtraction in remote scene infrared (IR) video is important and common to lots of fields. This paper provides a Remote Scene IR Dataset captured by our designed medium-wave infrared (MWIR) sensor. Each video sequence in this dataset is identified with specific BS challenges and the pixel-wise ground truth of foreground (FG) for each frame is also provided. A series of experiments were conducted to evaluate BS algorithms on this proposed dataset. The overall performance of BS algorithms and the processor/memory requirements were compared. Proper evaluation metrics or criteria were employed to evaluate the capability of each BS algorithm to handle different kinds of BS challenges represented in this dataset. The results and conclusions in this paper provide valid references to develop new BS algorithm for remote scene IR video sequence, and some of them are not only limited to remote scene or IR video sequence but also generic for background subtraction. The Remote Scene IR dataset and the foreground masks detected by each evaluated BS algorithm are available online: https://github.com/JerryYaoGl/BSEvaluationRemoteSceneIR. PMID:28837112
Generation and analysis of expressed sequence tags in the extreme large genomes Lilium and Tulipa

PubMed Central

2012-01-01

Background Bulbous flowers such as lily and tulip (Liliaceae family) are monocot perennial herbs that are economically very important ornamental plants worldwide. However, there are hardly any genetic studies performed and genomic resources are lacking. To build genomic resources and develop tools to speed up the breeding in both crops, next generation sequencing was implemented. We sequenced and assembled transcriptomes of four lily and five tulip genotypes using 454 pyro-sequencing technology. Results Successfully, we developed the first set of 81,791 contigs with an average length of 514 bp for tulip, and enriched the very limited number of 3,329 available ESTs (Expressed Sequence Tags) for lily with 52,172 contigs with an average length of 555 bp. The contigs together with singletons covered on average 37% of lily and 39% of tulip estimated transcriptome. Mining lily and tulip sequence data for SSRs (Simple Sequence Repeats) showed that di-nucleotide repeats were twice more abundant in UTRs (UnTranslated Regions) compared to coding regions, while tri-nucleotide repeats were equally spread over coding and UTR regions. Two sets of single nucleotide polymorphism (SNP) markers suitable for high throughput genotyping were developed. In the first set, no SNPs flanking the target SNP (50 bp on either side) were allowed. In the second set, one SNP in the flanking regions was allowed, which resulted in a 2 to 3 fold increase in SNP marker numbers compared with the first set. Orthologous groups between the two flower bulbs: lily and tulip (12,017 groups) and among the three monocot species: lily, tulip, and rice (6,900 groups) were determined using OrthoMCL. Orthologous groups were screened for common SNP markers and EST-SSRs to study synteny between lily and tulip, which resulted in 113 common SNP markers and 292 common EST-SSR. Lily and tulip contigs generated were annotated and described according to Gene Ontology terminology. Conclusions Two transcriptome sets were built that are valuable resources for marker development, comparative genomic studies and candidate gene approaches. Next generation sequencing of leaf transcriptome is very effective; however, deeper sequencing and using more tissues and stages is advisable for extended comparative studies. PMID:23167289
Bioinformatic analysis of ESTs collected by Sanger and pyrosequencing methods for a keystone forest tree species: oak

PubMed Central

2010-01-01

Background The Fagaceae family comprises about 1,000 woody species worldwide. About half belong to the Quercus family. These oaks are often a source of raw material for biomass wood and fiber. Pedunculate and sessile oaks, are among the most important deciduous forest tree species in Europe. Despite their ecological and economical importance, very few genomic resources have yet been generated for these species. Here, we describe the development of an EST catalogue that will support ecosystem genomics studies, where geneticists, ecophysiologists, molecular biologists and ecologists join their efforts for understanding, monitoring and predicting functional genetic diversity. Results We generated 145,827 sequence reads from 20 cDNA libraries using the Sanger method. Unexploitable chromatograms and quality checking lead us to eliminate 19,941 sequences. Finally a total of 125,925 ESTs were retained from 111,361 cDNA clones. Pyrosequencing was also conducted for 14 libraries, generating 1,948,579 reads, from which 370,566 sequences (19.0%) were eliminated, resulting in 1,578,192 sequences. Following clustering and assembly using TGICL pipeline, 1,704,117 EST sequences collapsed into 69,154 tentative contigs and 153,517 singletons, providing 222,671 non-redundant sequences (including alternative transcripts). We also assembled the sequences using MIRA and PartiGene software and compared the three unigene sets. Gene ontology annotation was then assigned to 29,303 unigene elements. Blast search against the SWISS-PROT database revealed putative homologs for 32,810 (14.7%) unigene elements, but more extensive search with Pfam, Refseq_protein, Refseq_RNA and eight gene indices revealed homology for 67.4% of them. The EST catalogue was examined for putative homologs of candidate genes involved in bud phenology, cuticle formation, phenylpropanoids biosynthesis and cell wall formation. Our results suggest a good coverage of genes involved in these traits. Comparative orthologous sequences (COS) with other plant gene models were identified and allow to unravel the oak paleo-history. Simple sequence repeats (SSRs) and single nucleotide polymorphisms (SNPs) were searched, resulting in 52,834 SSRs and 36,411 SNPs. All of these are available through the Oak Contig Browser http://genotoul-contigbrowser.toulouse.inra.fr:9092/Quercus_robur/index.html. Conclusions This genomic resource provides a unique tool to discover genes of interest, study the oak transcriptome, and develop new markers to investigate functional diversity in natural populations. PMID:21092232
JCoDA: a tool for detecting evolutionary selection

PubMed Central

2010-01-01

Background The incorporation of annotated sequence information from multiple related species in commonly used databases (Ensembl, Flybase, Saccharomyces Genome Database, Wormbase, etc.) has increased dramatically over the last few years. This influx of information has provided a considerable amount of raw material for evaluation of evolutionary relationships. To aid in the process, we have developed JCoDA (Java Codon Delimited Alignment) as a simple-to-use visualization tool for the detection of site specific and regional positive/negative evolutionary selection amongst homologous coding sequences. Results JCoDA accepts user-inputted unaligned or pre-aligned coding sequences, performs a codon-delimited alignment using ClustalW, and determines the dN/dS calculations using PAML (Phylogenetic Analysis Using Maximum Likelihood, yn00 and codeml) in order to identify regions and sites under evolutionary selection. The JCoDA package includes a graphical interface for Phylip (Phylogeny Inference Package) to generate phylogenetic trees, manages formatting of all required file types, and streamlines passage of information between underlying programs. The raw data are output to user configurable graphs with sliding window options for straightforward visualization of pairwise or gene family comparisons. Additionally, codon-delimited alignments are output in a variety of common formats and all dN/dS calculations can be output in comma-separated value (CSV) format for downstream analysis. To illustrate the types of analyses that are facilitated by JCoDA, we have taken advantage of the well studied sex determination pathway in nematodes as well as the extensive sequence information available to identify genes under positive selection, examples of regional positive selection, and differences in selection based on the role of genes in the sex determination pathway. Conclusions JCoDA is a configurable, open source, user-friendly visualization tool for performing evolutionary analysis on homologous coding sequences. JCoDA can be used to rapidly screen for genes and regions of genes under selection using PAML. It can be freely downloaded at http://www.tcnj.edu/~nayaklab/jcoda. PMID:20507581
Billions of basepairs of recently expanded, repetitive sequences are eliminated from the somatic genome during copepod development

PubMed Central

2014-01-01

Background Chromatin diminution is the programmed deletion of DNA from presomatic cell or nuclear lineages during development, producing single organisms that contain two different nuclear genomes. Phylogenetically diverse taxa undergo chromatin diminution — some ciliates, nematodes, copepods, and vertebrates. In cyclopoid copepods, chromatin diminution occurs in taxa with massively expanded germline genomes; depending on species, germline genome sizes range from 15 – 75 Gb, 12–74 Gb of which are lost from pre-somatic cell lineages at germline – soma differentiation. This is more than an order of magnitude more sequence than is lost from other taxa. To date, the sequences excised from copepods have not been analyzed using large-scale genomic datasets, and the processes underlying germline genomic gigantism in this clade, as well as the functional significance of chromatin diminution, have remained unknown. Results Here, we used high-throughput genomic sequencing and qPCR to characterize the germline and somatic genomes of Mesocyclops edax, a freshwater cyclopoid copepod with a germline genome of ~15 Gb and a somatic genome of ~3 Gb. We show that most of the excised DNA consists of repetitive sequences that are either 1) verifiable transposable elements (TEs), or 2) non-simple repeats of likely TE origin. Repeat elements in both genomes are skewed towards younger (i.e. less divergent) elements. Excised DNA is a non-random sample of the germline repeat element landscape; younger elements, and high frequency DNA transposons and LINEs, are disproportionately eliminated from the somatic genome. Conclusions Our results suggest that germline genome expansion in M. edax reflects explosive repeat element proliferation, and that billions of base pairs of such repeats are deleted from the somatic genome every generation. Thus, we hypothesize that chromatin diminution is a mechanism that controls repeat element load, and that this load can evolve to be divergent between tissue types within single organisms. PMID:24618421
A Method for Selecting Structure-switching Aptamers Applied to a Colorimetric Gold Nanoparticle Assay

PubMed Central

Martin, Jennifer A.; Smith, Joshua E.; Warren, Mercedes; Chávez, Jorge L.; Hagen, Joshua A.; Kelley-Loughnane, Nancy

2015-01-01

Small molecules provide rich targets for biosensing applications due to their physiological implications as biomarkers of various aspects of human health and performance. Nucleic acid aptamers have been increasingly applied as recognition elements on biosensor platforms, but selecting aptamers toward small molecule targets requires special design considerations. This work describes modification and critical steps of a method designed to select structure-switching aptamers to small molecule targets. Binding sequences from a DNA library hybridized to complementary DNA capture probes on magnetic beads are separated from nonbinders via a target-induced change in conformation. This method is advantageous because sequences binding the support matrix (beads) will not be further amplified, and it does not require immobilization of the target molecule. However, the melting temperature of the capture probe and library is kept at or slightly above RT, such that sequences that dehybridize based on thermodynamics will also be present in the supernatant solution. This effectively limits the partitioning efficiency (ability to separate target binding sequences from nonbinders), and therefore many selection rounds will be required to remove background sequences. The reported method differs from previous structure-switching aptamer selections due to implementation of negative selection steps, simplified enrichment monitoring, and extension of the length of the capture probe following selection enrichment to provide enhanced stringency. The selected structure-switching aptamers are advantageous in a gold nanoparticle assay platform that reports the presence of a target molecule by the conformational change of the aptamer. The gold nanoparticle assay was applied because it provides a simple, rapid colorimetric readout that is beneficial in a clinical or deployed environment. Design and optimization considerations are presented for the assay as proof-of-principle work in buffer to provide a foundation for further extension of the work toward small molecule biosensing in physiological fluids. PMID:25870978
SeqPig: simple and scalable scripting for large sequencing data sets in Hadoop

PubMed Central

Schumacher, André; Pireddu, Luca; Niemenmaa, Matti; Kallio, Aleksi; Korpelainen, Eija; Zanetti, Gianluigi; Heljanko, Keijo

2014-01-01

Summary: Hadoop MapReduce-based approaches have become increasingly popular due to their scalability in processing large sequencing datasets. However, as these methods typically require in-depth expertise in Hadoop and Java, they are still out of reach of many bioinformaticians. To solve this problem, we have created SeqPig, a library and a collection of tools to manipulate, analyze and query sequencing datasets in a scalable and simple manner. SeqPigscripts use the Hadoop-based distributed scripting engine Apache Pig, which automatically parallelizes and distributes data processing tasks. We demonstrate SeqPig’s scalability over many computing nodes and illustrate its use with example scripts. Availability and Implementation: Available under the open source MIT license at http://sourceforge.net/projects/seqpig/ Contact: andre.schumacher@yahoo.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24149054
A Practical Workshop for Generating Simple DNA Fingerprints of Plants

ERIC Educational Resources Information Center

Rouziere, A.-S.; Redman, J. E.

2011-01-01

Gel electrophoresis DNA fingerprints offer a graphical and visually appealing illumination of the similarities and differences between DNA sequences of different species and individuals. A polymerase chain reaction (PCR) and restriction digest protocol was designed to give high-school students the opportunity to generate simple fingerprints of…
Multiplexed microsatellite recovery using massively parallel sequencing

Treesearch

T.N. Jennings; B.J. Knaus; T.D. Mullins; S.M. Haig; R.C. Cronn

2011-01-01

Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of...
Identification of Human Lineage-Specific Transcriptional Coregulators Enabled by a Glossary of Binding Modules and Tunable Genomic Backgrounds.

PubMed

Mariani, Luca; Weinand, Kathryn; Vedenko, Anastasia; Barrera, Luis A; Bulyk, Martha L

2017-09-27

Transcription factors (TFs) control cellular processes by binding specific DNA motifs to modulate gene expression. Motif enrichment analysis of regulatory regions can identify direct and indirect TF binding sites. Here, we created a glossary of 108 non-redundant TF-8mer "modules" of shared specificity for 671 metazoan TFs from publicly available and new universal protein binding microarray data. Analysis of 239 ENCODE TF chromatin immunoprecipitation sequencing datasets and associated RNA sequencing profiles suggest the 8mer modules are more precise than position weight matrices in identifying indirect binding motifs and their associated tethering TFs. We also developed GENRE (genomically equivalent negative regions), a tunable tool for construction of matched genomic background sequences for analysis of regulatory regions. GENRE outperformed four state-of-the-art approaches to background sequence construction. We used our TF-8mer glossary and GENRE in the analysis of the indirect binding motifs for the co-occurrence of tethering factors, suggesting novel TF-TF interactions. We anticipate that these tools will aid in elucidating tissue-specific gene-regulatory programs. Copyright © 2017 Elsevier Inc. All rights reserved.
Effects of tonal language background on tests of temporal sequencing in children.

PubMed

Mukari, Siti Zamratol-Mai S; Yu, Xuan; Ishak, Wan Syafira; Mazlan, Rafidah

2015-01-01

The aims of the present study were to determine the effects of language background on the performance of the pitch pattern sequence test (PPST) and duration pattern sequence test (DPST). As temporal order sequencing may be affected by age and working memory, these factors were also studied. Performance of tonal and non-tonal language speakers on PPST and DPST were compared. Twenty-eight native Mandarin (tonal language) speakers and twenty-nine native Malay (non-tonal language) speakers between seven to nine years old participated in this study. The results revealed that relative to native Malay speakers, native Mandarin speakers demonstrated better scores on the PPST in both humming and verbal labeling responses. However, a similar language effect was not apparent in the DPST. An age effect was only significant in the PPST (verbal labeling). Finally, no significant effect of working memory was found on the PPST and the DPST. These findings suggest that the PPST is affected by tonal language background, and highlight the importance of developing different normative values for tonal and non-tonal language speakers.
Navy LPD-17 Amphibious Ship Procurement: Background, Issues, and Options for Congress

DTIC Science & Technology

2010-07-01

Background, Issues, and Options for Congress 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e . TASK...performed out of sequence and significant rework has been required, disrupting the optimal construction sequence and application of lessons learned...deeply concerned about Northrop Grumman Ship Systems’ ( NGSS ) ability to recover in the aftermath of Hurricane Katrina, particularly in regard to
Navy LPD-17 Amphibious Ship Procurement: Background, Issues, and Options for Congress

DTIC Science & Technology

2010-06-10

Background, Issues, and Options for Congress 5a. CONTRACT NUMBER 5b. GRANT NUMBER 5c. PROGRAM ELEMENT NUMBER 6. AUTHOR(S) 5d. PROJECT NUMBER 5e . TASK...out of sequence and significant rework has been required, disrupting the optimal construction sequence and application of lessons learned for...concerned about Northrop Grumman Ship Systems’ ( NGSS ) ability to recover in the aftermath of Hurricane Katrina, particularly in regard to construction
Infrared maritime target detection using a probabilistic single Gaussian model of sea clutter in Fourier domain

NASA Astrophysics Data System (ADS)

Zhou, Anran; Xie, Weixin; Pei, Jihong; Chen, Yapei

2018-02-01

For ship targets detection in cluttered infrared image sequences, a robust detection method, based on the probabilistic single Gaussian model of sea background in Fourier domain, is put forward. The amplitude spectrum sequences at each frequency point of the pure seawater images in Fourier domain, being more stable than the gray value sequences of each background pixel in the spatial domain, are regarded as a Gaussian model. Next, a probability weighted matrix is built based on the stability of the pure seawater's total energy spectrum in the row direction, to make the Gaussian model more accurate. Then, the foreground frequency points are separated from the background frequency points by the model. Finally, the false-alarm points are removed utilizing ships' shape features. The performance of the proposed method is tested by visual and quantitative comparisons with others.
Integer sequence discovery from small graphs

PubMed Central

Hoppe, Travis; Petrone, Anna

2015-01-01

We have exhaustively enumerated all simple, connected graphs of a finite order and have computed a selection of invariants over this set. Integer sequences were constructed from these invariants and checked against the Online Encyclopedia of Integer Sequences (OEIS). 141 new sequences were added and six sequences were extended. From the graph database, we were able to programmatically suggest relationships among the invariants. It will be shown that we can readily visualize any sequence of graphs with a given criteria. The code has been released as an open-source framework for further analysis and the database was constructed to be extensible to invariants not considered in this work. PMID:27034526

A novel diagnostic method for malaria using loop-mediated isothermal amplification (LAMP) and MinION™ nanopore sequencer.

PubMed

Imai, Kazuo; Tarumoto, Norihito; Misawa, Kazuhisa; Runtuwene, Lucky Ronald; Sakai, Jun; Hayashida, Kyoko; Eshita, Yuki; Maeda, Ryuichiro; Tuda, Josef; Murakami, Takashi; Maesaki, Shigefumi; Suzuki, Yutaka; Yamagishi, Junya; Maeda, Takuya

2017-09-13

A simple and accurate molecular diagnostic method for malaria is urgently needed due to the limitations of conventional microscopic examination. In this study, we demonstrate a new diagnostic procedure for human malaria using loop mediated isothermal amplification (LAMP) and the MinION™ nanopore sequencer. We generated specific LAMP primers targeting the 18S-rRNA gene of all five human Plasmodium species including two P. ovale subspecies (P. falciparum, P. vivax, P. ovale wallikeri, P. ovale curtisi, P. knowlesi and P. malariae) and examined human blood samples collected from 63 malaria patients in Indonesia. Additionally, we performed amplicon sequencing of our LAMP products using MinION™ nanopore sequencer to identify each Plasmodium species. Our LAMP method allowed amplification of all targeted 18S-rRNA genes of the reference plasmids with detection limits of 10-100 copies per reaction. Among the 63 clinical samples, 54 and 55 samples were positive by nested PCR and our LAMP method, respectively. Identification of the Plasmodium species by LAMP amplicon sequencing analysis using the MinION™ was consistent with the reference plasmid sequences and the results of nested PCR. Our diagnostic method combined with LAMP and MinION™ could become a simple and accurate tool for the identification of human Plasmodium species, even in resource-limited situations.
Developing expressed sequence tag libraries and the discovery of simple sequence repeat markers for two species of raspberry (Rubus L.).

PubMed

Bushakra, Jill M; Lewers, Kim S; Staton, Margaret E; Zhebentyayeva, Tetyana; Saski, Christopher A

2015-10-26

Due to a relatively high level of codominant inheritance and transferability within and among taxonomic groups, simple sequence repeat (SSR) markers are important elements in comparative mapping and delineation of genomic regions associated with traits of economic importance. Expressed sequence tags (ESTs) are a source of SSRs that can be used to develop markers to facilitate plant breeding and for more basic research across genera and higher plant orders. Leaf and meristem tissue from 'Heritage' red raspberry (Rubus idaeus) and 'Bristol' black raspberry (R. occidentalis) were utilized for RNA extraction. After conversion to cDNA and library construction, ESTs were sequenced, quality verified, assembled and scanned for SSRs. Primers flanking the SSRs were designed and a subset tested for amplification, polymorphism and transferability across species. ESTs containing SSRs were functionally annotated using the GenBank non-redundant (nr) database and further classified using the gene ontology database. To accelerate development of EST-SSRs in the genus Rubus (Rosaceae), 1149 and 2358 cDNA sequences were generated from red raspberry and black raspberry, respectively. The cDNA sequences were screened using rigorous filtering criteria which resulted in the identification of 121 and 257 SSR loci for red and black raspberry, respectively. Primers were designed from the surrounding sequences resulting in 131 and 288 primer pairs, respectively, as some sequences contained more than one SSR locus. Sequence analysis revealed that the SSR-containing genes span a diversity of functions and share more sequence identity with strawberry genes than with other Rosaceous species. This resource of Rubus-specific, gene-derived markers will facilitate the construction of linkage maps composed of transferable markers for studying and manipulating important traits in this economically important genus.
Differential transferability of EST-SSR primers developed from diploid species Pseudoroegneria spicata, Thinopyrum bessarabicum, and Th. elongatum

USDA-ARS?s Scientific Manuscript database

Simple sequence repeat technology based on expressed sequence tag (EST-SSR) is a useful genomic tool for genome mapping, characterizing plant species relationships, elucidating genome evolution, and tracing genes on alien chromosome segments. EST-SSR primers developed from three perennial diploid T...
Stroke-model-based character extraction from gray-level document images.

PubMed

Ye, X; Cheriet, M; Suen, C Y

2001-01-01

Global gray-level thresholding techniques such as Otsu's method, and local gray-level thresholding techniques such as edge-based segmentation or the adaptive thresholding method are powerful in extracting character objects from simple or slowly varying backgrounds. However, they are found to be insufficient when the backgrounds include sharply varying contours or fonts in different sizes. A stroke-model is proposed to depict the local features of character objects as double-edges in a predefined size. This model enables us to detect thin connected components selectively, while ignoring relatively large backgrounds that appear complex. Meanwhile, since the stroke width restriction is fully factored in, the proposed technique can be used to extract characters in predefined font sizes. To process large volumes of documents efficiently, a hybrid method is proposed for character extraction from various backgrounds. Using the measurement of class separability to differentiate images with simple backgrounds from those with complex backgrounds, the hybrid method can process documents with different backgrounds by applying the appropriate methods. Experiments on extracting handwriting from a check image, as well as machine-printed characters from scene images demonstrate the effectiveness of the proposed model.
Universality classes of fluctuation dynamics in hierarchical complex systems

NASA Astrophysics Data System (ADS)

Macêdo, A. M. S.; González, Iván R. Roa; Salazar, D. S. P.; Vasconcelos, G. L.

2017-03-01

A unified approach is proposed to describe the statistics of the short-time dynamics of multiscale complex systems. The probability density function of the relevant time series (signal) is represented as a statistical superposition of a large time-scale distribution weighted by the distribution of certain internal variables that characterize the slowly changing background. The dynamics of the background is formulated as a hierarchical stochastic model whose form is derived from simple physical constraints, which in turn restrict the dynamics to only two possible classes. The probability distributions of both the signal and the background have simple representations in terms of Meijer G functions. The two universality classes for the background dynamics manifest themselves in the signal distribution as two types of tails: power law and stretched exponential, respectively. A detailed analysis of empirical data from classical turbulence and financial markets shows excellent agreement with the theory.
Sexing the Sciuridae: a simple and accurate set of molecular methods to determine sex in tree squirrels, ground squirrels and marmots.

PubMed

Gorrell, Jamieson C; Boutin, Stan; Raveh, Shirley; Neuhaus, Peter; Côté, Steeve D; Coltman, David W

2012-09-01

We determined the sequence of the male-specific minor histocompatibility complex antigen (Smcy) from the Y chromosome of seven squirrel species (Sciuridae, Rodentia). Based on conserved regions inside the Smcy intron sequence, we designed PCR primers for sex determination in these species that can be co-amplified with nuclear loci as controls. PCR co-amplification yields two products for males and one for females that are easily visualized as bands by agarose gel electrophoresis. Our method provides simple and reliable sex determination across a wide range of squirrel species. © 2012 Blackwell Publishing Ltd.
Plant genotyping using fluorescently tagged inter-simple sequence repeats (ISSRs): basic principles and methodology.

PubMed

Prince, Linda M

2015-01-01

Inter-simple sequence repeat PCR (ISSR-PCR) is a fast, inexpensive genotyping technique based on length variation in the regions between microsatellites. The method requires no species-specific prior knowledge of microsatellite location or composition. Very small amounts of DNA are required, making this method ideal for organisms of conservation concern, or where the quantity of DNA is extremely limited due to organism size. ISSR-PCR can be highly reproducible but requires careful attention to detail. Optimization of DNA extraction, fragment amplification, and normalization of fragment peak heights during fluorescent detection are critical steps to minimizing the downstream time spent verifying and scoring the data.
A resource of large-scale molecular markers for monitoring Agropyron cristatum chromatin introgression in wheat background based on transcriptome sequences.

PubMed

Zhang, Jinpeng; Liu, Weihua; Lu, Yuqing; Liu, Qunxing; Yang, Xinming; Li, Xiuquan; Li, Lihui

2017-09-20

Agropyron cristatum is a wild grass of the tribe Triticeae and serves as a gene donor for wheat improvement. However, very few markers can be used to monitor A. cristatum chromatin introgressions in wheat. Here, we reported a resource of large-scale molecular markers for tracking alien introgressions in wheat based on transcriptome sequences. By aligning A. cristatum unigenes with the Chinese Spring reference genome sequences, we designed 9602 A. cristatum expressed sequence tag-sequence-tagged site (EST-STS) markers for PCR amplification and experimental screening. As a result, 6063 polymorphic EST-STS markers were specific for the A. cristatum P genome in the single-receipt wheat background. A total of 4956 randomly selected polymorphic EST-STS markers were further tested in eight wheat variety backgrounds, and 3070 markers displaying stable and polymorphic amplification were validated. These markers covered more than 98% of the A. cristatum genome, and the marker distribution density was approximately 1.28 cM. An application case of all EST-STS markers was validated on the A. cristatum 6 P chromosome. These markers were successfully applied in the tracking of alien A. cristatum chromatin. Altogether, this study provided a universal method of large-scale molecular marker development to monitor wild relative chromatin in wheat.
Development of expressed sequence tag-simple sequence repeat markers for genetic characterization and population structure analysis of Praxelis clematidea (Asteraceae).

PubMed

Wang, Q Z; Huang, M; Downie, S R; Chen, Z X

2016-05-23

Invasive plants tend to spread aggressively in new habitats and an understanding of their genetic diversity and population structure is useful for their management. In this study, expressed sequence tag-simple sequence repeat (EST-SSR) markers were developed for the invasive plant species Praxelis clematidea (Asteraceae) from 5548 Stevia rebaudiana (Asteraceae) expressed sequence tags (ESTs). A total of 133 microsatellite-containing ESTs (2.4%) were identified, of which 56 (42.1%) were hexanucleotide repeat motifs and 50 (37.6%) were trinucleotide repeat motifs. Of the 24 primer pairs designed from these 133 ESTs, 7 (29.2%) resulted in significant polymorphisms. The number of alleles per locus ranged from 5 to 9. The relatively high genetic diversity (H = 0.2667, I = 0.4212, and P = 100%) of P. clematidea was related to high gene flow (Nm = 1.4996) among populations. The coefficient of population differentiation (GST = 0.2500) indicated that most genetic variation occurred within populations. A Mantel test suggested that there was significant correlation between genetic distance and geographical distribution (r = 0.3192, P = 0.012). These results further support the transferability of EST-SSR markers between closely related genera of the same family.
Characterization of the Kenaf (Hibiscus cannabinus) Global Transcriptome Using Illumina Paired-End Sequencing and Development of EST-SSR Markers

PubMed Central

Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi

2016-01-01

Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed tag sequences (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST-simple sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate construction of genetic linkage maps, investigation of fiber growth and development in kenaf, and also be of value to novel gene discovery and functional genomic studies. PMID:26960153
40 CFR 1065.650 - Emission calculations.

Code of Federal Regulations, 2011 CFR

2011-07-01

... following sequence of preliminary calculations on recorded concentrations: (i) Correct all THC and CH4.... (iii) Calculate all THC and NMHC concentrations, including dilution air background concentrations, as... NMHC to background corrected mass of THC. If the background corrected mass of NMHC is greater than 0.98...
A whole-genome assembly of the domestic cow, Bos taurus

USDA-ARS?s Scientific Manuscript database

Background: The genome of the domestic cow, Bos taurus, was sequenced using a mixture of hierarchical and whole-genome shotgun sequencing methods. Results: We have assembled the 35 million sequence reads and applied a variety of assembly improvement techniques, creating an assembly of 2.86 billion b...
Next generation sequencing provides rapid access to the genome of wheat stripe rust

USDA-ARS?s Scientific Manuscript database

Background: The wheat stripe rust fungus (Puccinia striiformis f. sp. tritici, PST) is responsible for significant yield losses in wheat production worldwide. In spite of its economic importance, the PST genomic sequence is not currently available. Fortunately Next Generation Sequencing (NGS) has ra...
A sensitive detection method for MPLW515L or MPLW515K mutation in chronic myeloproliferative disorders with locked nucleic acid-modified probes and real-time polymerase chain reaction.

PubMed

Pancrazzi, Alessandro; Guglielmelli, Paola; Ponziani, Vanessa; Bergamaschi, Gaetano; Bosi, Alberto; Barosi, Giovanni; Vannucchi, Alessandro M

2008-09-01

Acquired mutations in the juxtamembrane region of MPL (W515K or W515L), the receptor for thrombopoietin, have been described in patients with primary myelofibrosis or essential thrombocythemia, which are chronic myeloproliferative disorders. We have developed a real-time polymerase chain reaction assay for the detection and quantification of MPL mutations that is based on locked nucleic acid fluorescent probes. Mutational analysis was performed using DNA from granulocytes. Reference curves were obtained using cloned fragments of MPL containing either the wild-type or mutated sequence; the predicted sensitivity level was at least 0.1% mutant allele in a wild-type background. None of the 60 control subjects presented with a MPLW515L/K mutation. Of 217 patients with myelofibrosis, 19 (8.7%) harbored the MPLW515 mutation, 10 (52.6%) with the W515L allele. In one case, both the W515L and W515K alleles were detected by real-time polymerase chain reaction. By comparing results obtained with conventional sequencing, no erroneous genotype attribution using real-time polymerase chain reaction was found, whereas one patient considered wild type according to sequence analysis actually harbored a low W515L allele burden. This is a simple, sensitive, and cost-effective procedure for large-scale screening of the MPLW515L/K mutation in patients suspected to have a myeloproliferative disorder. It can also provide a quantitative estimate of mutant allele burden that might be useful for both patient prognosis and monitoring response to therapy.
Coherent Somatic Mutation in Autoimmune Disease

PubMed Central

Ross, Kenneth Andrew

2014-01-01

Background Many aspects of autoimmune disease are not well understood, including the specificities of autoimmune targets, and patterns of co-morbidity and cross-heritability across diseases. Prior work has provided evidence that somatic mutation caused by gene conversion and deletion at segmentally duplicated loci is relevant to several diseases. Simple tandem repeat (STR) sequence is highly mutable, both somatically and in the germ-line, and somatic STR mutations are observed under inflammation. Results Protein-coding genes spanning STRs having markers of mutability, including germ-line variability, high total length, repeat count and/or repeat similarity, are evaluated in the context of autoimmunity. For the initiation of autoimmune disease, antigens whose autoantibodies are the first observed in a disease, termed primary autoantigens, are informative. Three primary autoantigens, thyroid peroxidase (TPO), phogrin (PTPRN2) and filaggrin (FLG), include STRs that are among the eleven longest STRs spanned by protein-coding genes. This association of primary autoantigens with long STR sequence is highly significant (). Long STRs occur within twenty genes that are associated with sixteen common autoimmune diseases and atherosclerosis. The repeat within the TTC34 gene is an outlier in terms of length and a link with systemic lupus erythematosus is proposed. Conclusions The results support the hypothesis that many autoimmune diseases are triggered by immune responses to proteins whose DNA sequence mutates somatically in a coherent, consistent fashion. Other autoimmune diseases may be caused by coherent somatic mutations in immune cells. The coherent somatic mutation hypothesis has the potential to be a comprehensive explanation for the initiation of many autoimmune diseases. PMID:24988487
Molecular characterization of freshwater snails in the genus Bulinus: a role for barcodes?

PubMed Central

Kane, Richard A; Stothard, J Russell; Emery, Aidan M; Rollinson, David

2008-01-01

Background Reliable and consistent methods are required for the identification and classification of freshwater snails belonging to the genus Bulinus (Gastropoda, Planorbidae) which act as intermediate hosts for schistosomes of both medical and veterinary importance. The current project worked towards two main objectives, the development of a cost effective, simple screening method for the routine identification of Bulinus isolates and the use of resultant sequencing data to produce a model of relationships within the group. Results Phylogenetic analysis of the DNA sequence for a large section (1009 bp) of the mitochondrial gene cytochrome oxidase subunit 1 (cox1) for isolates of Bulinus demonstrated superior resolution over that employing the second internal transcribed spacer (its2) of the ribosomal gene complex. Removal of transitional substitutions within cox1 because of saturation effects still allowed identification of snails at species group level. Within groups, some species could be identified with ease but there were regions where the high degree of molecular diversity meant that clear identification of species was problematic, this was particularly so within the B. africanus group. Conclusion The sequence diversity within cox1 is such that a barcoding approach may offer the best method for characterization of populations and species within the genus from different geographical locations. The study has confirmed the definition of some accepted species within the species groups but additionally has revealed some unrecognized isolates which underlines the need to use molecular markers in addition to more traditional methods of identification. A barcoding approach based on part of the cox1 gene as defined by the Folmer primers is proposed. PMID:18544153
A reference genetic map of C. clementina hort. ex Tan.; citrus evolution inferences from comparative mapping

PubMed Central

2012-01-01

Background Most modern citrus cultivars have an interspecific origin. As a foundational step towards deciphering the interspecific genome structures, a reference whole genome sequence was produced by the International Citrus Genome Consortium from a haploid derived from Clementine mandarin. The availability of a saturated genetic map of Clementine was identified as an essential prerequisite to assist the whole genome sequence assembly. Clementine is believed to be a ‘Mediterranean’ mandarin × sweet orange hybrid, and sweet orange likely arose from interspecific hybridizations between mandarin and pummelo gene pools. The primary goals of the present study were to establish a Clementine reference map using codominant markers, and to perform comparative mapping of pummelo, sweet orange, and Clementine. Results Five parental genetic maps were established from three segregating populations, which were genotyped with Single Nucleotide Polymorphism (SNP), Simple Sequence Repeats (SSR) and Insertion-Deletion (Indel) markers. An initial medium density reference map (961 markers for 1084.1 cM) of the Clementine was established by combining male and female Clementine segregation data. This Clementine map was compared with two pummelo maps and a sweet orange map. The linear order of markers was highly conserved in the different species. However, significant differences in map size were observed, which suggests a variation in the recombination rates. Skewed segregations were much higher in the male than female Clementine mapping data. The mapping data confirmed that Clementine arose from hybridization between ‘Mediterranean’ mandarin and sweet orange. The results identified nine recombination break points for the sweet orange gamete that contributed to the Clementine genome. Conclusions A reference genetic map of citrus, used to facilitate the chromosome assembly of the first citrus reference genome sequence, was established. The high conservation of marker order observed at the interspecific level should allow reasonable inferences of most citrus genome sequences by mapping next-generation sequencing (NGS) data in the reference genome sequence. The genome of the haploid Clementine used to establish the citrus reference genome sequence appears to have been inherited primarily from the ‘Mediterranean’ mandarin. The high frequency of skewed allelic segregations in the male Clementine data underline the probable extent of deviation from Mendelian segregation for characters controlled by heterozygous loci in male parents. PMID:23126659
Analysis of the transcriptome of Panax notoginseng root uncovers putative triterpene saponin-biosynthetic genes and genetic markers

PubMed Central

2011-01-01

Background Panax notoginseng (Burk) F.H. Chen is important medicinal plant of the Araliacease family. Triterpene saponins are the bioactive constituents in P. notoginseng. However, available genomic information regarding this plant is limited. Moreover, details of triterpene saponin biosynthesis in the Panax species are largely unknown. Results Using the 454 pyrosequencing technology, a one-quarter GS FLX titanium run resulted in 188,185 reads with an average length of 410 bases for P. notoginseng root. These reads were processed and assembled by 454 GS De Novo Assembler software into 30,852 unique sequences. A total of 70.2% of unique sequences were annotated by Basic Local Alignment Search Tool (BLAST) similarity searches against public sequence databases. The Kyoto Encyclopedia of Genes and Genomes (KEGG) assignment discovered 41 unique sequences representing 11 genes involved in triterpene saponin backbone biosynthesis in the 454-EST dataset. In particular, the transcript encoding dammarenediol synthase (DS), which is the first committed enzyme in the biosynthetic pathway of major triterpene saponins, is highly expressed in the root of four-year-old P. notoginseng. It is worth emphasizing that the candidate cytochrome P450 (Pn02132 and Pn00158) and UDP-glycosyltransferase (Pn00082) gene most likely to be involved in hydroxylation or glycosylation of aglycones for triterpene saponin biosynthesis were discovered from 174 cytochrome P450s and 242 glycosyltransferases by phylogenetic analysis, respectively. Putative transcription factors were detected in 906 unique sequences, including Myb, homeobox, WRKY, basic helix-loop-helix (bHLH), and other family proteins. Additionally, a total of 2,772 simple sequence repeat (SSR) were identified from 2,361 unique sequences, of which, di-nucleotide motifs were the most abundant motif. Conclusion This study is the first to present a large-scale EST dataset for P. notoginseng root acquired by next-generation sequencing (NGS) technology. The candidate genes involved in triterpene saponin biosynthesis, including the putative CYP450s and UGTs, were obtained in this study. Additionally, the identification of SSRs provided plenty of genetic makers for molecular breeding and genetics applications in this species. These data will provide information on gene discovery, transcriptional regulation and marker-assisted selection for P. notoginseng. The dataset establishes an important foundation for the study with the purpose of ensuring adequate drug resources for this species. PMID:22369100
Using RNA-Seq for gene identification, polymorphism detection and transcript profiling in two alfalfa genotypes with divergent cell wall composition in stems

PubMed Central

2011-01-01

Background Alfalfa, [Medicago sativa (L.) sativa], a widely-grown perennial forage has potential for development as a cellulosic ethanol feedstock. However, the genomics of alfalfa, a non-model species, is still in its infancy. The recent advent of RNA-Seq, a massively parallel sequencing method for transcriptome analysis, provides an opportunity to expand the identification of alfalfa genes and polymorphisms, and conduct in-depth transcript profiling. Results Cell walls in stems of alfalfa genotype 708 have higher cellulose and lower lignin concentrations compared to cell walls in stems of genotype 773. Using the Illumina GA-II platform, a total of 198,861,304 expression sequence tags (ESTs, 76 bp in length) were generated from cDNA libraries derived from elongating stem (ES) and post-elongation stem (PES) internodes of 708 and 773. In addition, 341,984 ESTs were generated from ES and PES internodes of genotype 773 using the GS FLX Titanium platform. The first alfalfa (Medicago sativa) gene index (MSGI 1.0) was assembled using the Sanger ESTs available from GenBank, the GS FLX Titanium EST sequences, and the de novo assembled Illumina sequences. MSGI 1.0 contains 124,025 unique sequences including 22,729 tentative consensus sequences (TCs), 22,315 singletons and 78,981 pseudo-singletons. We identified a total of 1,294 simple sequence repeats (SSR) among the sequences in MSGI 1.0. In addition, a total of 10,826 single nucleotide polymorphisms (SNPs) were predicted between the two genotypes. Out of 55 SNPs randomly selected for experimental validation, 47 (85%) were polymorphic between the two genotypes. We also identified numerous allelic variations within each genotype. Digital gene expression analysis identified numerous candidate genes that may play a role in stem development as well as candidate genes that may contribute to the differences in cell wall composition in stems of the two genotypes. Conclusions Our results demonstrate that RNA-Seq can be successfully used for gene identification, polymorphism detection and transcript profiling in alfalfa, a non-model, allogamous, autotetraploid species. The alfalfa gene index assembled in this study, and the SNPs, SSRs and candidate genes identified can be used to improve alfalfa as a forage crop and cellulosic feedstock. PMID:21504589
Construction of a plant-transformation-competent BIBAC library and genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.)

PubMed Central

2013-01-01

Background Cotton, one of the world’s leading crops, is important to the world’s textile and energy industries, and is a model species for studies of plant polyploidization, cellulose biosynthesis and cell wall biogenesis. Here, we report the construction of a plant-transformation-competent binary bacterial artificial chromosome (BIBAC) library and comparative genome sequence analysis of polyploid Upland cotton (Gossypium hirsutum L.) with one of its diploid putative progenitor species, G. raimondii Ulbr. Results We constructed the cotton BIBAC library in a vector competent for high-molecular-weight DNA transformation in different plant species through either Agrobacterium or particle bombardment. The library contains 76,800 clones with an average insert size of 135 kb, providing an approximate 99% probability of obtaining at least one positive clone from the library using a single-copy probe. The quality and utility of the library were verified by identifying BIBACs containing genes important for fiber development, fiber cellulose biosynthesis, seed fatty acid metabolism, cotton-nematode interaction, and bacterial blight resistance. In order to gain an insight into the Upland cotton genome and its relationship with G. raimondii, we sequenced nearly 10,000 BIBAC ends (BESs) randomly selected from the library, generating approximately one BES for every 250 kb along the Upland cotton genome. The retroelement Gypsy/DIRS1 family predominates in the Upland cotton genome, accounting for over 77% of all transposable elements. From the BESs, we identified 1,269 simple sequence repeats (SSRs), of which 1,006 were new, thus providing additional markers for cotton genome research. Surprisingly, comparative sequence analysis showed that Upland cotton is much more diverged from G. raimondii at the genomic sequence level than expected. There seems to be no significant difference between the relationships of the Upland cotton D- and A-subgenomes with the G. raimondii genome, even though G. raimondii contains a D genome (D5). Conclusions The library represents the first BIBAC library in cotton and related species, thus providing tools useful for integrative physical mapping, large-scale genome sequencing and large-scale functional analysis of the Upland cotton genome. Comparative sequence analysis provides insights into the Upland cotton genome, and a possible mechanism underlying the divergence and evolution of polyploid Upland cotton from its diploid putative progenitor species, G. raimondii. PMID:23537070

Analysis of BAC-end sequences (BESs) and development of BES-SSR markers for genetic mapping and hybrid purity assessment in pigeonpea (Cajanus spp.)

PubMed Central

2011-01-01

Background Pigeonpea [Cajanus cajan (L.) Millsp.] is an important legume crop of rainfed agriculture. Despite of concerted research efforts directed to pigeonpea improvement, stagnated productivity of pigeonpea during last several decades may be accounted to prevalence of various biotic and abiotic constraints and the situation is exacerbated by availability of inadequate genomic resources to undertake any molecular breeding programme for accelerated crop improvement. With the objective of enhancing genomic resources for pigeonpea, this study reports for the first time, large scale development of SSR markers from BAC-end sequences and their subsequent use for genetic mapping and hybridity testing in pigeonpea. Results A set of 88,860 BAC (bacterial artificial chromosome)-end sequences (BESs) were generated after constructing two BAC libraries by using HindIII (34,560 clones) and BamHI (34,560 clones) restriction enzymes. Clustering based on sequence identity of BESs yielded a set of >52K non-redundant sequences, comprising 35 Mbp or >4% of the pigeonpea genome. These sequences were analyzed to develop annotation lists and subdivide the BESs into genome fractions (e.g., genes, retroelements, transpons and non-annotated sequences). Parallel analysis of BESs for microsatellites or simple sequence repeats (SSRs) identified 18,149 SSRs, from which a set of 6,212 SSRs were selected for further analysis. A total of 3,072 novel SSR primer pairs were synthesized and tested for length polymorphism on a set of 22 parental genotypes of 13 mapping populations segregating for traits of interest. In total, we identified 842 polymorphic SSR markers that will have utility in pigeonpea improvement. Based on these markers, the first SSR-based genetic map comprising of 239 loci was developed for this previously uncharacterized genome. Utility of developed SSR markers was also demonstrated by identifying a set of 42 markers each for two hybrids (ICPH 2671 and ICPH 2438) for genetic purity assessment in commercial hybrid breeding programme. Conclusion In summary, while BAC libraries and BESs should be useful for genomics studies, BES-SSR markers, and the genetic map should be very useful for linking the genetic map with a future physical map as well as for molecular breeding in pigeonpea. PMID:21447154
A Simple Acronym for Doing Calculus: CAL

ERIC Educational Resources Information Center

Hathaway, Richard J.

2008-01-01

An acronym is presented that provides students a potentially useful, unifying view of the major topics covered in an elementary calculus sequence. The acronym (CAL) is based on viewing the calculus procedure for solving a calculus problem P* in three steps: (1) recognizing that the problem cannot be solved using simple (non-calculus) techniques;…
A Writing Intervention to Teach Simple Sentences and Descriptive Paragraphs to Adolescents with Writing Difficulties

ERIC Educational Resources Information Center

Datchuk, Shawn M.; Kubina, Richard M., Jr.

2017-01-01

The present study used a multiple-baseline, single-case experimental design to investigate the effects of a multicomponent intervention on construction of simple sentences and word sequences. The intervention entailed sequential delivery of sentence instruction and frequency building to a performance criterion and paragraph instruction.…
Comprehensive analysis of Arabidopsis expression level polymorphisms with simple inheritance

PubMed Central

Plantegenet, Stephanie; Weber, Johann; Goldstein, Darlene R; Zeller, Georg; Nussbaumer, Cindy; Thomas, Jérôme; Weigel, Detlef; Harshman, Keith; Hardtke, Christian S

2009-01-01

In Arabidopsis thaliana, gene expression level polymorphisms (ELPs) between natural accessions that exhibit simple, single locus inheritance are promising quantitative trait locus (QTL) candidates to explain phenotypic variability. It is assumed that such ELPs overwhelmingly represent regulatory element polymorphisms. However, comprehensive genome-wide analyses linking expression level, regulatory sequence and gene structure variation are missing, preventing definite verification of this assumption. Here, we analyzed ELPs observed between the Eil-0 and Lc-0 accessions. Compared with non-variable controls, 5′ regulatory sequence variation in the corresponding genes is indeed increased. However, ∼42% of all the ELP genes also carry major transcription unit deletions in one parent as revealed by genome tiling arrays, representing a >4-fold enrichment over controls. Within the subset of ELPs with simple inheritance, this proportion is even higher and deletions are generally more severe. Similar results were obtained from analyses of the Bay-0 and Sha accessions, using alternative technical approaches. Collectively, our results suggest that drastic structural changes are a major cause for ELPs with simple inheritance, corroborating experimentally observed indel preponderance in cloned Arabidopsis QTL. PMID:19225455
A Simple Sequence Repeat- and Single-Nucleotide Polymorphism-Based Genetic Linkage Map of the Brown Planthopper, Nilaparvata lugens

PubMed Central

Jairin, Jirapong; Kobayashi, Tetsuya; Yamagata, Yoshiyuki; Sanada-Morimura, Sachiyo; Mori, Kazuki; Tashiro, Kosuke; Kuhara, Satoru; Kuwazaki, Seigo; Urio, Masahiro; Suetsugu, Yoshitaka; Yamamoto, Kimiko; Matsumura, Masaya; Yasui, Hideshi

2013-01-01

In this study, we developed the first genetic linkage map for the major rice insect pest, the brown planthopper (BPH, Nilaparvata lugens). The linkage map was constructed by integrating linkage data from two backcross populations derived from three inbred BPH strains. The consensus map consists of 474 simple sequence repeats, 43 single-nucleotide polymorphisms, and 1 sequence-tagged site, for a total of 518 markers at 472 unique positions in 17 linkage groups. The linkage groups cover 1093.9 cM, with an average distance of 2.3 cM between loci. The average number of marker loci per linkage group was 27.8. The sex-linkage group was identified by exploiting X-linked and Y-specific markers. Our linkage map and the newly developed markers used to create it constitute an essential resource and a useful framework for future genetic analyses in BPH. PMID:23204257
Low-pass sequencing for microbial comparative genomics

PubMed Central

Goo, Young Ah; Roach, Jared; Glusman, Gustavo; Baliga, Nitin S; Deutsch, Kerry; Pan, Min; Kennedy, Sean; DasSarma, Shiladitya; Victor Ng, Wailap; Hood, Leroy

2004-01-01

Background We studied four extremely halophilic archaea by low-pass shotgun sequencing: (1) the metabolically versatile Haloarcula marismortui; (2) the non-pigmented Natrialba asiatica; (3) the psychrophile Halorubrum lacusprofundi and (4) the Dead Sea isolate Halobaculum gomorrense. Approximately one thousand single pass genomic sequences per genome were obtained. The data were analyzed by comparative genomic analyses using the completed Halobacterium sp. NRC-1 genome as a reference. Low-pass shotgun sequencing is a simple, inexpensive, and rapid approach that can readily be performed on any cultured microbe. Results As expected, the four archaeal halophiles analyzed exhibit both bacterial and eukaryotic characteristics as well as uniquely archaeal traits. All five halophiles exhibit greater than sixty percent GC content and low isoelectric points (pI) for their predicted proteins. Multiple insertion sequence (IS) elements, often involved in genome rearrangements, were identified in H. lacusprofundi and H. marismortui. The core biological functions that govern cellular and genetic mechanisms of H. sp. NRC-1 appear to be conserved in these four other halophiles. Multiple TATA box binding protein (TBP) and transcription factor IIB (TFB) homologs were identified from most of the four shotgunned halophiles. The reconstructed molecular tree of all five halophiles shows a large divergence between these species, but with the closest relationship being between H. sp. NRC-1 and H. lacusprofundi. Conclusion Despite the diverse habitats of these species, all five halophiles share (1) high GC content and (2) low protein isoelectric points, which are characteristics associated with environmental exposure to UV radiation and hypersalinity, respectively. Identification of multiple IS elements in the genome of H. lacusprofundi and H. marismortui suggest that genome structure and dynamic genome reorganization might be similar to that previously observed in the IS-element rich genome of H. sp. NRC-1. Identification of multiple TBP and TFB homologs in these four halophiles are consistent with the hypothesis that different types of complex transcriptional regulation may occur through multiple TBP-TFB combinations in response to rapidly changing environmental conditions. Low-pass shotgun sequence analyses of genomes permit extensive and diverse analyses, and should be generally useful for comparative microbial genomics. PMID:14718067
Characterization of the transcriptome of an ecologically important avian species, the Vinous-throated Parrotbill Paradoxornis webbianus bulomachus (Paradoxornithidae; Aves)

PubMed Central

2012-01-01

Background Adaptive divergence driven by environmental heterogeneity has long been a fascinating topic in ecology and evolutionary biology. The study of the genetic basis of adaptive divergence has, however, been greatly hampered by a lack of genomic information. The recent development of transcriptome sequencing provides an unprecedented opportunity to generate large amounts of genomic data for detailed investigations of the genetics of adaptive divergence in non-model organisms. Herein, we used the Illumina sequencing platform to sequence the transcriptome of brain and liver tissues from a single individual of the Vinous-throated Parrotbill, Paradoxornis webbianus bulomachus, an ecologically important avian species in Taiwan with a wide elevational range of sea level to 3100 m. Results Our 10.1 Gbp of sequences were first assembled based on Zebra Finch (Taeniopygia guttata) and chicken (Gallus gallus) RNA references. The remaining reads were then de novo assembled. After filtering out contigs with low coverage (<10X), we retained 67,791 of 487,336 contigs, which covered approximately 5.3% of the P. w. bulomachus genome. Of 7,779 contigs retained for a top-hit species distribution analysis, the majority (about 86%) were matched to known Zebra Finch and chicken transcripts. We also annotated 6,365 contigs to gene ontology (GO) terms: in total, 122 GO-slim terms were assigned, including biological process (41%), molecular function (32%), and cellular component (27%). Many potential genetic markers for future adaptive genomic studies were also identified: 8,589 single nucleotide polymorphisms, 1,344 simple sequence repeats and 109 candidate genes that might be involved in elevational or climate adaptation. Conclusions Our study shows that transcriptome data can serve as a rich genetic resource, even for a single run of short-read sequencing from a single individual of a non-model species. This is the first study providing transcriptomic information for species in the avian superfamily Sylvioidea, which comprises more than 1,000 species. Our data can be used to study adaptive divergence in heterogeneous environments and investigate other important ecological and evolutionary questions in parrotbills from different populations and even in other species in the Sylvioidea. PMID:22530590
A simple method for semi-random DNA amplicon fragmentation using the methylation-dependent restriction enzyme MspJI.

PubMed

Shinozuka, Hiroshi; Cogan, Noel O I; Shinozuka, Maiko; Marshall, Alexis; Kay, Pippa; Lin, Yi-Han; Spangenberg, German C; Forster, John W

2015-04-11

Fragmentation at random nucleotide locations is an essential process for preparation of DNA libraries to be used on massively parallel short-read DNA sequencing platforms. Although instruments for physical shearing, such as the Covaris S2 focused-ultrasonicator system, and products for enzymatic shearing, such as the Nextera technology and NEBNext dsDNA Fragmentase kit, are commercially available, a simple and inexpensive method is desirable for high-throughput sequencing library preparation. MspJI is a recently characterised restriction enzyme which recognises the sequence motif CNNR (where R = G or A) when the first base is modified to 5-methylcytosine or 5-hydroxymethylcytosine. A semi-random enzymatic DNA amplicon fragmentation method was developed based on the unique cleavage properties of MspJI. In this method, random incorporation of 5-methyl-2'-deoxycytidine-5'-triphosphate is achieved through DNA amplification with DNA polymerase, followed by DNA digestion with MspJI. Due to the recognition sequence of the enzyme, DNA amplicons are fragmented in a relatively sequence-independent manner. The size range of the resulting fragments was capable of control through optimisation of 5-methyl-2'-deoxycytidine-5'-triphosphate concentration in the reaction mixture. A library suitable for sequencing using the Illumina MiSeq platform was prepared and processed using the proposed method. Alignment of generated short reads to a reference sequence demonstrated a relatively high level of random fragmentation. The proposed method may be performed with standard laboratory equipment. Although the uniformity of coverage was slightly inferior to the Covaris physical shearing procedure, due to efficiencies of cost and labour, the method may be more suitable than existing approaches for implementation in large-scale sequencing activities, such as bacterial artificial chromosome (BAC)-based genome sequence assembly, pan-genomic studies and locus-targeted genotyping-by-sequencing.
Adenine specific DNA chemical sequencing reaction.

PubMed Central

Iverson, B L; Dervan, P B

1987-01-01

Reaction of DNA with K2PdCl4 at pH 2.0 followed by a piperidine workup produces specific cleavage at adenine (A) residues. Product analysis revealed the K2PdCl4 reaction involves selective depurination at adenine, affording an excision reaction analogous to the other chemical DNA sequencing reactions. Adenine residues methylated at the exocyclic amine (N6) react with lower efficiency than unmethylated adenine in an identical sequence. This simple protocol specific for A may be a useful addition to current chemical sequencing reactions. Images PMID:3671067
Background subtraction for fluorescence EXAFS data of a very dilute dopant Z in Z + 1 host.

PubMed

Medling, Scott; Bridges, Frank

2011-07-01

When conducting EXAFS at the Cu K-edge for ZnS:Cu with very low Cu concentration (<0.04% Cu), a large background was present that increased with energy. This background arises from a Zn X-ray Raman peak, which moves through the Cu fluorescence window, plus the tail of the Zn fluorescence peak. This large background distorts the EXAFS and must be removed separately before reducing the data. A simple means to remove this background is described.
Development of polymorphic genic-SSR markers by cDNA library sequencing in boxwood, Buxus spp. (Buxaceae)

USDA-ARS?s Scientific Manuscript database

Genic microsatellites or simple sequence repeat (genic-SSR) markers were developed in boxwood (Buxus taxa) for genetic diversity analysis, identification of taxa, and to facilitate breeding. cDNA libraries were developed from mRNA extracted from leaves of Buxus sempervirens ‘Vardar Valley’ and seque...
Loblolly pine SSR markers for shortleaf pine genetics

Treesearch

C. Dana Nelson; Sedley Josserand; Craig S. Echt; Jeff Koppelman

2007-01-01

Simple sequence repeats (SSR) are highly informative DNA-based markers widely used in population genetic and linkage mapping studies. We have been developing PCR primer pairs for amplifying SSR markers for loblolly pine (Pinus taeda L.) using loblolly pine DNA and EST sequence data as starting materials. Fifty primer pairs known to reliably amplify...
Diversity analysis in Cannabis sativa based on large-scale development of expressed sequence tag-derived simple sequence repeat markers.

PubMed

Gao, Chunsheng; Xin, Pengfei; Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

2014-01-01

Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucloetide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis.
Diversity Analysis in Cannabis sativa Based on Large-Scale Development of Expressed Sequence Tag-Derived Simple Sequence Repeat Markers

PubMed Central

Cheng, Chaohua; Tang, Qing; Chen, Ping; Wang, Changbiao; Zang, Gonggu; Zhao, Lining

2014-01-01

Cannabis sativa L. is an important economic plant for the production of food, fiber, oils, and intoxicants. However, lack of sufficient simple sequence repeat (SSR) markers has limited the development of cannabis genetic research. Here, large-scale development of expressed sequence tag simple sequence repeat (EST-SSR) markers was performed to obtain more informative genetic markers, and to assess genetic diversity in cannabis (Cannabis sativa L.). Based on the cannabis transcriptome, 4,577 SSRs were identified from 3,624 ESTs. From there, a total of 3,442 complementary primer pairs were designed as SSR markers. Among these markers, trinucleotide repeat motifs (50.99%) were the most abundant, followed by hexanucleotide (25.13%), dinucleotide (16.34%), tetranucloetide (3.8%), and pentanucleotide (3.74%) repeat motifs, respectively. The AAG/CTT trinucleotide repeat (17.96%) was the most abundant motif detected in the SSRs. One hundred and seventeen EST-SSR markers were randomly selected to evaluate primer quality in 24 cannabis varieties. Among these 117 markers, 108 (92.31%) were successfully amplified and 87 (74.36%) were polymorphic. Forty-five polymorphic primer pairs were selected to evaluate genetic diversity and relatedness among the 115 cannabis genotypes. The results showed that 115 varieties could be divided into 4 groups primarily based on geography: Northern China, Europe, Central China, and Southern China. Moreover, the coefficient of similarity when comparing cannabis from Northern China with the European group cannabis was higher than that when comparing with cannabis from the other two groups, owing to a similar climate. This study outlines the first large-scale development of SSR markers for cannabis. These data may serve as a foundation for the development of genetic linkage, quantitative trait loci mapping, and marker-assisted breeding of cannabis. PMID:25329551
Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV)

PubMed Central

Martin, Andrew C. R.

2014-01-01

The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and ’dotifying’ repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/. PMID:25653836
Viewing multiple sequence alignments with the JavaScript Sequence Alignment Viewer (JSAV).

PubMed

Martin, Andrew C R

2014-01-01

The JavaScript Sequence Alignment Viewer (JSAV) is designed as a simple-to-use JavaScript component for displaying sequence alignments on web pages. The display of sequences is highly configurable with options to allow alternative coloring schemes, sorting of sequences and 'dotifying' repeated amino acids. An option is also available to submit selected sequences to another web site, or to other JavaScript code. JSAV is implemented purely in JavaScript making use of the JQuery and JQuery-UI libraries. It does not use any HTML5-specific options to help with browser compatibility. The code is documented using JSDOC and is available from http://www.bioinf.org.uk/software/jsav/.
Brassica ASTRA: an integrated database for Brassica genomic research.

PubMed

Love, Christopher G; Robinson, Andrew J; Lim, Geraldine A C; Hopkins, Clare J; Batley, Jacqueline; Barker, Gary; Spangenberg, German C; Edwards, David

2005-01-01

Brassica ASTRA is a public database for genomic information on Brassica species. The database incorporates expressed sequences with Swiss-Prot and GenBank comparative sequence annotation as well as secondary Gene Ontology (GO) annotation derived from the comparison with Arabidopsis TAIR GO annotations. Simple sequence repeat molecular markers are identified within resident sequences and mapped onto the closely related Arabidopsis genome sequence. Bacterial artificial chromosome (BAC) end sequences derived from the Multinational Brassica Genome Project are also mapped onto the Arabidopsis genome sequence enabling users to identify candidate Brassica BACs corresponding to syntenic regions of Arabidopsis. This information is maintained in a MySQL database with a web interface providing the primary means of interrogation. The database is accessible at http://hornbill.cspp.latrobe.edu.au.
Discovery and mapping of a new expressed sequence tag-single nucleotide polymorphism and simple sequence repeat panel for large-scale genetic studies and breeding of Theobroma cacao L.

PubMed Central

Allegre, Mathilde; Argout, Xavier; Boccara, Michel; Fouet, Olivier; Roguet, Yolande; Bérard, Aurélie; Thévenin, Jean Marc; Chauveau, Aurélie; Rivallan, Ronan; Clement, Didier; Courtois, Brigitte; Gramacho, Karina; Boland-Augé, Anne; Tahi, Mathias; Umaharan, Pathmanathan; Brunel, Dominique; Lanaud, Claire

2012-01-01

Theobroma cacao is an economically important tree of several tropical countries. Its genetic improvement is essential to provide protection against major diseases and improve chocolate quality. We discovered and mapped new expressed sequence tag-single nucleotide polymorphism (EST-SNP) and simple sequence repeat (SSR) markers and constructed a high-density genetic map. By screening 149 650 ESTs, 5246 SNPs were detected in silico, of which 1536 corresponded to genes with a putative function, while 851 had a clear polymorphic pattern across a collection of genetic resources. In addition, 409 new SSR markers were detected on the Criollo genome. Lastly, 681 new EST-SNPs and 163 new SSRs were added to the pre-existing 418 co-dominant markers to construct a large consensus genetic map. This high-density map and the set of new genetic markers identified in this study are a milestone in cocoa genomics and for marker-assisted breeding. The data are available at http://tropgenedb.cirad.fr. PMID:22210604
The Flushtration Count Illusion: Attribute substitution tricks our interpretation of a simple visual event sequence.

PubMed

Thomas, Cyril; Didierjean, André; Kuhn, Gustav

2018-04-17

When faced with a difficult question, people sometimes work out an answer to a related, easier question without realizing that a substitution has taken place (e.g., Kahneman, 2011, Thinking, fast and slow. New York, Farrar, Strauss, Giroux). In two experiments, we investigated whether this attribute substitution effect can also affect the interpretation of a simple visual event sequence. We used a magic trick called the 'Flushtration Count Illusion', which involves a technique used by magicians to give the illusion of having seen multiple cards with identical backs, when in fact only the back of one card (the bottom card) is repeatedly shown. In Experiment 1, we demonstrated that most participants are susceptible to the illusion, even if they have the visual and analytical reasoning capacity to correctly process the sequence. In Experiment 2, we demonstrated that participants construct a biased and simplified representation of the Flushtration Count by substituting some attributes of the event sequence. We discussed of the psychological processes underlying this attribute substitution effect. © 2018 The British Psychological Society.
SSRscanner: a program for reporting distribution and exact location of simple sequence repeats

PubMed Central

Anwar, Tamanna; Khan, Asad U

2006-01-01

Simple sequence repeats (SSRs) have become important molecular markers for a broad range of applications, such as genome mapping and characterization, phenotype mapping, marker assisted selection of crop plants and a range of molecular ecology and diversity studies. These repeated DNA sequences are found in both prokaryotes and eukaryotes. They are distributed almost at random throughout the genome, ranging from mononucleotide to trinucleotide repeats. They are also found at longer lengths (> 6 repeating units) of tracts. Most of the computer programs that find SSRs do not report its exact position. A computer program SSRscanner was written to find out distribution, frequency and exact location of each SSR in the genome. SSRscanner is user friendly. It can search repeats of any length and produce outputs with their exact position on chromosome and their frequency of occurrence in the sequence. Availability This program has been written in PERL and is freely available for non-commercial users by request from the authors. Please contact the authors by E-mail: huzzi99@hotmail.com PMID:17597863

Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species.

PubMed

Liu, Y T; Chen, R K; Lin, S J; Chen, Y C; Chin, S W; Chen, F C; Lee, C Y

2014-04-08

The Orchidaceae is one of the largest and most diverse families of flowering plants. The Dendrobium genus has high economic potential as ornamental plants and for medicinal purposes. In addition, the species of this genus are able to produce large crops. However, many Dendrobium varieties are very similar in outward appearance, making it difficult to distinguish one species from another. This study demonstrated that the 12 Dendrobium species used in this study may be divided into 2 groups by internal transcribed spacer (ITS) sequence analysis. Red and yellow flowers may also be used to separate these species into 2 main groups. In particular, the deciduous characteristic is associated with the ITS genetic diversity of the A group. Of 53 designed simple sequence repeat (SSR) primer pairs, 7 pairs were polymorphic for polymerase chain reaction products that were amplified from a specific band. The results of this study demonstrate that these 7 SSR primer pairs may potentially be used to identify Dendrobium species and their progeny in future studies.
Simple Experiments on Magnetism and Electricity...from Edison.

ERIC Educational Resources Information Center

Schultz, Robert F.

Background information, lists of materials needed and procedures used are provided for 16 simple experiments on electricity and magnetism. These experiments are organized into sections dealing with: (1) Edison's carbon experiments (building a galvanometer, investigating the variable conductivity of carbon, and examining the carbon transmitter…
EASY: a simple tool for simultaneously removing background, deadtime and acoustic ringing in quantitative NMR spectroscopy--part I: basic principle and applications.

PubMed

Jaeger, Christian; Hemmann, Felix

2014-01-01

Elimination of Artifacts in NMR SpectroscopY (EASY) is a simple but very effective tool to remove simultaneously any real NMR probe background signal, any spectral distortions due to deadtime ringdown effects and -specifically- severe acoustic ringing artifacts in NMR spectra of low-gamma nuclei. EASY enables and maintains quantitative NMR (qNMR) as only a single pulse (preferably 90°) is used for data acquisition. After the acquisition of the first scan (it contains the wanted NMR signal and the background/deadtime/ringing artifacts) the same experiment is repeated immediately afterwards before the T1 waiting delay. This second scan contains only the background/deadtime/ringing parts. Hence, the simple difference of both yields clean NMR line shapes free of artefacts. In this Part I various examples for complete (1)H, (11)B, (13)C, (19)F probe background removal due to construction parts of the NMR probes are presented. Furthermore, (25)Mg EASY of Mg(OH)2 is presented and this example shows how extremely strong acoustic ringing can be suppressed (more than a factor of 200) such that phase and baseline correction for spectra acquired with a single pulse is no longer a problem. EASY is also a step towards deadtime-free data acquisition as these effects are also canceled completely. EASY can be combined with any other NMR experiment, including 2D NMR, if baseline distortions are a big problem. © 2013 Published by Elsevier Inc.
Using sheep genomes from diverse U.S. breeds to identify missense variants in genes affecting fecundity

USDA-ARS?s Scientific Manuscript database

Background: Access to sheep genome sequences significantly improves the chances of identifying genes that may influence the health, welfare, and productivity of these animals. Methods: A public, searchable DNA sequence resource for U.S. sheep was created with whole genome sequence (WGS) of 96 rams. ...
Initial Steps in Creating a Developmentally Valid Tool for Observing/Assessing Rope Jumping

ERIC Educational Resources Information Center

Roberton, Mary Ann; Thompson, Gregory; Langendorfer, Stephen J.

2017-01-01

Background: Valid motor development sequences show the various behaviors that children display as they progress toward competence in specific motor skills. Teachers can use these sequences to observe informally or formally assess their students. While longitudinal study is ultimately required to validate developmental sequences, there are earlier,…
What's in your next-generation sequence data? An exploration of unmapped DNA and RNA sequence reads from the bovine reference individual

USDA-ARS?s Scientific Manuscript database

BACKGROUND: Next-generation sequencing projects commonly commence by aligning reads to a reference genome assembly. While improvements in alignment algorithms and computational hardware have greatly enhanced the efficiency and accuracy of alignments, a significant percentage of reads often remain u...
Digestion-ligation-only Hi-C is an efficient and cost-effective method for chromosome conformation capture.

PubMed

Lin, Da; Hong, Ping; Zhang, Siheng; Xu, Weize; Jamal, Muhammad; Yan, Keji; Lei, Yingying; Li, Liang; Ruan, Yijun; Fu, Zhen F; Li, Guoliang; Cao, Gang

2018-05-01

Chromosome conformation capture (3C) technologies can be used to investigate 3D genomic structures. However, high background noise, high costs, and a lack of straightforward noise evaluation in current methods impede the advancement of 3D genomic research. Here we developed a simple digestion-ligation-only Hi-C (DLO Hi-C) technology to explore the 3D landscape of the genome. This method requires only two rounds of digestion and ligation, without the need for biotin labeling and pulldown. Non-ligated DNA was efficiently removed in a cost-effective step by purifying specific linker-ligated DNA fragments. Notably, random ligation could be quickly evaluated in an early quality-control step before sequencing. Moreover, an in situ version of DLO Hi-C using a four-cutter restriction enzyme has been developed. We applied DLO Hi-C to delineate the genomic architecture of THP-1 and K562 cells and uncovered chromosomal translocations. This technology may facilitate investigation of genomic organization, gene regulation, and (meta)genome assembly.
Ultrafast state detection and 2D ion crystals in a Paul trap

NASA Astrophysics Data System (ADS)

Ip, Michael; Ransford, Anthony; Campbell, Wesley

2016-05-01

Projective readout of quantum information stored in atomic qubits typically uses state-dependent CW laser-induced fluorescence. This method requires an often sophisticated imaging system to spatially filter out the background CW laser light. We present an alternative approach that instead uses simple pulse sequences from a mode-locked laser to affect the same state-dependent excitations in less than 1 ns. The resulting atomic fluorescence occurs in the dark, allowing the placement of non-imaging detectors right next to the atom to improve the qubit state detection efficiency and speed. We also study 2D Coulomb crystals of atomic ions in an oblate Paul trap. We find that crystals with hundreds of ions can be held in the trap, potentially offering an alternative to the use of Penning traps for the quantum simulation of 2D lattice spin models. We discuss the classical physics of these crystals and the metastable states that are supported in 2D. This work is supported by the US Army Research Office.
SDM-Assist software to design site-directed mutagenesis primers introducing “silent” restriction sites

PubMed Central

2013-01-01

Background Over the past decades site-directed mutagenesis (SDM) has become an indispensable tool for biological structure-function studies. In principle, SDM uses modified primer pairs in a PCR reaction to introduce a mutation in a cDNA insert. DpnI digestion of the reaction mixture is used to eliminate template copies before amplification in E. coli; however, this process is inefficient resulting in un-mutated clones which can only be distinguished from mutant clones by sequencing. Results We have developed a program – ‘SDM-Assist’ which creates SDM primers adding a specific identifier: through additional silent mutations a restriction site is included or a previous one removed which allows for highly efficient identification of ‘mutated clones’ by a simple restriction digest. Conclusions The direct identification of SDM clones will save time and money for researchers. SDM-Assist also scores the primers based on factors such as Tm, GC content and secondary structure allowing for simplified selection of optimal primer pairs. PMID:23522286
DNA Extraction and Amplification from Contemporary Polynesian Bark-Cloth

PubMed Central

Moncada, Ximena; Payacán, Claudia; Arriaza, Francisco; Lobos, Sergio; Seelenfreund, Daniela; Seelenfreund, Andrea

2013-01-01

Background Paper mulberry has been used for thousands of years in Asia and Oceania for making paper and bark-cloth, respectively. Museums around the world hold valuable collections of Polynesian bark-cloth. Genetic analysis of the plant fibers from which the textiles were made may answer a number of questions of interest related to provenance, authenticity or species used in the manufacture of these textiles. Recovery of nucleic acids from paper mulberry bark-cloth has not been reported before. Methodology We describe a simple method for the extraction of PCR-amplifiable DNA from small samples of contemporary Polynesian bark-cloth (tapa) using two types of nuclear markers. We report the amplification of about 300 bp sequences of the ITS1 region and of a microsatellite marker. Conclusions Sufficient DNA was retrieved from all bark-cloth samples to permit successful PCR amplification. This method shows a means of obtaining useful genetic information from modern bark-cloth samples and opens perspectives for the analyses of small fragments derived from ethnographic materials. PMID:23437166
Testing the Role of Multicopy Plasmids in the Evolution of Antibiotic Resistance.

PubMed

Escudero, Jose Antonio; MacLean, R Craig; San Millan, Alvaro

2018-05-02

Multicopy plasmids are extremely abundant in prokaryotes but their role in bacterial evolution remains poorly understood. We recently showed that the increase in gene copy number per cell provided by multicopy plasmids could accelerate the evolution of plasmid-encoded genes. In this work, we present an experimental system to test the ability of multicopy plasmids to promote gene evolution. Using simple molecular biology methods, we constructed a model system where an antibiotic resistance gene can be inserted into Escherichia coli MG1655, either in the chromosome or on a multicopy plasmid. We use an experimental evolution approach to propagate the different strains under increasing concentrations of antibiotics and we measure survival of bacterial populations over time. The choice of the antibiotic molecule and the resistance gene is so that the gene can only confer resistance through the acquisition of mutations. This "evolutionary rescue" approach provides a simple method to test the potential of multicopy plasmids to promote the acquisition of antibiotic resistance. In the next step of the experimental system, the molecular bases of antibiotic resistance are characterized. To identify mutations responsible for the acquisition of antibiotic resistance we use deep DNA sequencing of samples obtained from whole populations and clones. Finally, to confirm the role of the mutations in the gene under study, we reconstruct them in the parental background and test the resistance phenotype of the resulting strains.
Asymptotic convertibility of entanglement: An information-spectrum approach to entanglement concentration and dilution

NASA Astrophysics Data System (ADS)

Jiao, Yong; Wakakuwa, Eyuri; Ogawa, Tomohiro

2018-02-01

We consider asymptotic convertibility of an arbitrary sequence of bipartite pure states into another by local operations and classical communication (LOCC). We adopt an information-spectrum approach to address cases where each element of the sequences is not necessarily a tensor power of a bipartite pure state. We derive necessary and sufficient conditions for the LOCC convertibility of one sequence to another in terms of spectral entropy rates of entanglement of the sequences. Based on these results, we also provide simple proofs for previously known results on the optimal rates of entanglement concentration and dilution of general sequences of bipartite pure states.
BioWord: a sequence manipulation suite for Microsoft Word.

PubMed

Anzaldi, Laura J; Muñoz-Fernández, Daniel; Erill, Ivan

2012-06-07

The ability to manipulate, edit and process DNA and protein sequences has rapidly become a necessary skill for practicing biologists across a wide swath of disciplines. In spite of this, most everyday sequence manipulation tools are distributed across several programs and web servers, sometimes requiring installation and typically involving frequent switching between applications. To address this problem, here we have developed BioWord, a macro-enabled self-installing template for Microsoft Word documents that integrates an extensive suite of DNA and protein sequence manipulation tools. BioWord is distributed as a single macro-enabled template that self-installs with a single click. After installation, BioWord will open as a tab in the Office ribbon. Biologists can then easily manipulate DNA and protein sequences using a familiar interface and minimize the need to switch between applications. Beyond simple sequence manipulation, BioWord integrates functionality ranging from dyad search and consensus logos to motif discovery and pair-wise alignment. Written in Visual Basic for Applications (VBA) as an open source, object-oriented project, BioWord allows users with varying programming experience to expand and customize the program to better meet their own needs. BioWord integrates a powerful set of tools for biological sequence manipulation within a handy, user-friendly tab in a widely used word processing software package. The use of a simple scripting language and an object-oriented scheme facilitates customization by users and provides a very accessible educational platform for introducing students to basic bioinformatics algorithms.
Real-time detection of small and dim moving objects in IR video sequences using a robust background estimator and a noise-adaptive double thresholding

NASA Astrophysics Data System (ADS)

Zingoni, Andrea; Diani, Marco; Corsini, Giovanni

2016-10-01

We developed an algorithm for automatically detecting small and poorly contrasted (dim) moving objects in real-time, within video sequences acquired through a steady infrared camera. The algorithm is suitable for different situations since it is independent of the background characteristics and of changes in illumination. Unlike other solutions, small objects of any size (up to single-pixel), either hotter or colder than the background, can be successfully detected. The algorithm is based on accurately estimating the background at the pixel level and then rejecting it. A novel approach permits background estimation to be robust to changes in the scene illumination and to noise, and not to be biased by the transit of moving objects. Care was taken in avoiding computationally costly procedures, in order to ensure the real-time performance even using low-cost hardware. The algorithm was tested on a dataset of 12 video sequences acquired in different conditions, providing promising results in terms of detection rate and false alarm rate, independently of background and objects characteristics. In addition, the detection map was produced frame by frame in real-time, using cheap commercial hardware. The algorithm is particularly suitable for applications in the fields of video-surveillance and computer vision. Its reliability and speed permit it to be used also in critical situations, like in search and rescue, defence and disaster monitoring.
Modeling How, When, and What Is Learned in a Simple Fault-Finding Task

ERIC Educational Resources Information Center

Ritter, Frank E.; Bibby, Peter A.

2008-01-01

We have developed a process model that learns in multiple ways while finding faults in a simple control panel device. The model predicts human participants' learning through its own learning. The model's performance was systematically compared to human learning data, including the time course and specific sequence of learned behaviors. These…
One Stroke at a Time

ERIC Educational Resources Information Center

Hollibaugh, Molly

2012-01-01

At first glance, a Zentangle creation can seem intricate and complicated. But, when you learn how it is done, you realize how simple it is. Zentangles are patterns, or "tangles," that have been reduced to a simple sequence of elemental strokes. When you learn to focus on each stroke you find yourself capable of things that you may have once…
An annotated genetic map of loblolly pine based on microsatellite and cDNA markers

USDA-ARS?s Scientific Manuscript database

Previous loblolly pine (Pinus taeda L.) genetic linkage maps have been based on a variety of DNA polymorphisms, such as AFLPs, RAPDs, RFLPs, and ESTPs, but only a few SSRs (simple sequence repeats), also known as simple tandem repeats or microsatellites, have been mapped in P. taeda. The objective o...
Detection of possible restriction sites for type II restriction enzymes in DNA sequences.

PubMed

Gagniuc, P; Cimponeriu, D; Ionescu-Tîrgovişte, C; Mihai, Andrada; Stavarachi, Monica; Mihai, T; Gavrilă, L

2011-01-01

In order to make a step forward in the knowledge of the mechanism operating in complex polygenic disorders such as diabetes and obesity, this paper proposes a new algorithm (PRSD -possible restriction site detection) and its implementation in Applied Genetics software. This software can be used for in silico detection of potential (hidden) recognition sites for endonucleases and for nucleotide repeats identification. The recognition sites for endonucleases may result from hidden sequences through deletion or insertion of a specific number of nucleotides. Tests were conducted on DNA sequences downloaded from NCBI servers using specific recognition sites for common type II restriction enzymes introduced in the software database (n = 126). Each possible recognition site indicated by the PRSD algorithm implemented in Applied Genetics was checked and confirmed by NEBcutter V2.0 and Webcutter 2.0 software. In the sequence NG_008724.1 (which includes 63632 nucleotides) we found a high number of potential restriction sites for ECO R1 that may be produced by deletion (n = 43 sites) or insertion (n = 591 sites) of one nucleotide. The second module of Applied Genetics has been designed to find simple repeats sizes with a real future in understanding the role of SNPs (Single Nucleotide Polymorphisms) in the pathogenesis of the complex metabolic disorders. We have tested the presence of simple repetitive sequences in five DNA sequence. The software indicated exact position of each repeats detected in the tested sequences. Future development of Applied Genetics can provide an alternative for powerful tools used to search for restriction sites or repetitive sequences or to improve genotyping methods.
``Sequence space soup'' of proteins and copolymers

NASA Astrophysics Data System (ADS)

Chan, Hue Sun; Dill, Ken A.

1991-09-01

To study the protein folding problem, we use exhaustive computer enumeration to explore ``sequence space soup,'' an imaginary solution containing the ``native'' conformations (i.e., of lowest free energy) under folding conditions, of every possible copolymer sequence. The model is of short self-avoiding chains of hydrophobic (H) and polar (P) monomers configured on the two-dimensional square lattice. By exhaustive enumeration, we identify all native structures for every possible sequence. We find that random sequences of H/P copolymers will bear striking resemblance to known proteins: Most sequences under folding conditions will be approximately as compact as known proteins, will have considerable amounts of secondary structure, and it is most probable that an arbitrary sequence will fold to a number of lowest free energy conformations that is of order one. In these respects, this simple model shows that proteinlike behavior should arise simply in copolymers in which one monomer type is highly solvent averse. It suggests that the structures and uniquenesses of native proteins are not consequences of having 20 different monomer types, or of unique properties of amino acid monomers with regard to special packing or interactions, and thus that simple copolymers might be designable to collapse to proteinlike structures and properties. A good strategy for designing a sequence to have a minimum possible number of native states is to strategically insert many P monomers. Thus known proteins may be marginally stable due to a balance: More H residues stabilize the desired native state, but more P residues prevent simultaneous stabilization of undesired native states.
Masking as an effective quality control method for next-generation sequencing data analysis.

PubMed

Yun, Sajung; Yun, Sijung

2014-12-13

Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases that results in a shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, both of the preprocessing methods did not affect the false-negative rate in SNP calling with statistical significance compared to the data analysis without preprocessing. False-positive rate and false-negative rate for small insertions and deletions did not show differences between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate although trimming is more commonly used currently in the field. The perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).

How much detail and accuracy is required in plant growth sub-models to address questions about optimal management strategies in agricultural systems?

PubMed Central

Renton, Michael

2011-01-01

Background and aims Simulations that integrate sub-models of important biological processes can be used to ask questions about optimal management strategies in agricultural and ecological systems. Building sub-models with more detail and aiming for greater accuracy and realism may seem attractive, but is likely to be more expensive and time-consuming and result in more complicated models that lack transparency. This paper illustrates a general integrated approach for constructing models of agricultural and ecological systems that is based on the principle of starting simple and then directly testing for the need to add additional detail and complexity. Methodology The approach is demonstrated using LUSO (Land Use Sequence Optimizer), an agricultural system analysis framework based on simulation and optimization. A simple sensitivity analysis and functional perturbation analysis is used to test to what extent LUSO's crop–weed competition sub-model affects the answers to a number of questions at the scale of the whole farming system regarding optimal land-use sequencing strategies and resulting profitability. Principal results The need for accuracy in the crop–weed competition sub-model within LUSO depended to a small extent on the parameter being varied, but more importantly and interestingly on the type of question being addressed with the model. Only a small part of the crop–weed competition model actually affects the answers to these questions. Conclusions This study illustrates an example application of the proposed integrated approach for constructing models of agricultural and ecological systems based on testing whether complexity needs to be added to address particular questions of interest. We conclude that this example clearly demonstrates the potential value of the general approach. Advantages of this approach include minimizing costs and resources required for model construction, keeping models transparent and easy to analyse, and ensuring the model is well suited to address the question of interest. PMID:22476477
Molecular Diagnosis of Pathogenic Sporothrix Species

PubMed Central

Rodrigues, Anderson Messias; de Hoog, G. Sybren; de Camargo, Zoilo Pires

2015-01-01

Background Sporotrichosis is a chronic (sub)cutaneous infection caused by thermodimorphic fungi in the order, Ophiostomatales. These fungi are characterized by major differences in routes of transmission, host predilections, species virulence, and susceptibilities to antifungals. Sporothrix species emerge in the form of outbreaks. Large zoonoses and sapronoses are ongoing in Brazil and China, respectively. Current diagnostic methods based on morphology and physiology are inaccurate due to closely related phenotypes with overlapping components between pathogenic and non-pathogenic Sporothrix. There is a critical need for new diagnostic tools that are specific, sensitive, and cost-effective. Methodology We developed a panel of novel markers, based on calmodulin (CAL) gene sequences, for the large-scale diagnosis and epidemiology of clinically relevant members of the Sporothrix genus, and its relative, Ophiostoma. We identified specific PCR-based markers for S. brasiliensis, S. schenckii, S. globosa, S. mexicana, S. pallida, and O. stenoceras. We employed a murine model of disseminated sporotrichosis to optimize a PCR assay for detecting Sporothrix in clinical specimens. Results Primer-BLAST searches revealed candidate sequences that were conserved within a single species. Species-specific primers showed no significant homology with human, mouse, or microorganisms outside the Sporothrix genus. The detection limit was 10–100 fg of DNA in a single round of PCR for identifying S. brasiliensis, S. schenckii, S. globosa, S. mexicana, and S. pallida. A simple, direct PCR assay, with conidia as a source of DNA, was effective for rapid, low-cost genotyping. Samples from a murine model of disseminated sporotrichosis confirmed the feasibility of detecting S. brasiliensis and S. schenckii DNA in spleen, liver, lungs, heart, brain, kidney, tail, and feces of infected animals. Conclusions This PCR-based method could successfully detect and identify a single species in samples from cultures and from clinical specimens. The method proved to be simple, high throughput, sensitive, and accurate for diagnosing sporotrichosis. PMID:26623643
Synthetic oligonucleotide probes deduced from amino acid sequence data. Theoretical and practical considerations.

PubMed

Lathe, R

1985-05-05

Synthetic probes deduced from amino acid sequence data are widely used to detect cognate coding sequences in libraries of cloned DNA segments. The redundancy of the genetic code dictates that a choice must be made between (1) a mixture of probes reflecting all codon combinations, and (2) a single longer "optimal" probe. The second strategy is examined in detail. The frequency of sequences matching a given probe by chance alone can be determined and also the frequency of sequences closely resembling the probe and contributing to the hybridization background. Gene banks cannot be treated as random associations of the four nucleotides, and probe sequences deduced from amino acid sequence data occur more often than predicted by chance alone. Probe lengths must be increased to confer the necessary specificity. Examination of hybrids formed between unique homologous probes and their cognate targets reveals that short stretches of perfect homology occurring by chance make a significant contribution to the hybridization background. Statistical methods for improving homology are examined, taking human coding sequences as an example, and considerations of codon utilization and dinucleotide frequencies yield an overall homology of greater than 82%. Recommendations for probe design and hybridization are presented, and the choice between using multiple probes reflecting all codon possibilities and a unique optimal probe is discussed.
De Novo Transcriptome Sequencing Reveals Important Molecular Networks and Metabolic Pathways of the Plant, Chlorophytum borivilianum

PubMed Central

Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

2013-01-01

Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum. PMID:24376689
Optimization of sequence alignment for simple sequence repeat regions.

PubMed

Jighly, Abdulqader; Hamwieh, Aladdin; Ogbonnaya, Francis C

2011-07-20

Microsatellites, or simple sequence repeats (SSRs), are tandemly repeated DNA sequences, including tandem copies of specific sequences no longer than six bases, that are distributed in the genome. SSR has been used as a molecular marker because it is easy to detect and is used in a range of applications, including genetic diversity, genome mapping, and marker assisted selection. It is also very mutable because of slipping in the DNA polymerase during DNA replication. This unique mutation increases the insertion/deletion (INDELs) mutation frequency to a high ratio - more than other types of molecular markers such as single nucleotide polymorphism (SNPs).SNPs are more frequent than INDELs. Therefore, all designed algorithms for sequence alignment fit the vast majority of the genomic sequence without considering microsatellite regions, as unique sequences that require special consideration. The old algorithm is limited in its application because there are many overlaps between different repeat units which result in false evolutionary relationships. To overcome the limitation of the aligning algorithm when dealing with SSR loci, a new algorithm was developed using PERL script with a Tk graphical interface. This program is based on aligning sequences after determining the repeated units first, and the last SSR nucleotides positions. This results in a shifting process according to the inserted repeated unit type.When studying the phylogenic relations before and after applying the new algorithm, many differences in the trees were obtained by increasing the SSR length and complexity. However, less distance between different linage had been observed after applying the new algorithm. The new algorithm produces better estimates for aligning SSR loci because it reflects more reliable evolutionary relations between different linages. It reduces overlapping during SSR alignment, which results in a more realistic phylogenic relationship.
De Novo transcriptome sequencing reveals important molecular networks and metabolic pathways of the plant, Chlorophytum borivilianum.

PubMed

Kalra, Shikha; Puniya, Bhanwar Lal; Kulshreshtha, Deepika; Kumar, Sunil; Kaur, Jagdeep; Ramachandran, Srinivasan; Singh, Kashmir

2013-01-01

Chlorophytum borivilianum, an endangered medicinal plant species is highly recognized for its aphrodisiac properties provided by saponins present in the plant. The transcriptome information of this species is limited and only few hundred expressed sequence tags (ESTs) are available in the public databases. To gain molecular insight of this plant, high throughput transcriptome sequencing of leaf RNA was carried out using Illumina's HiSeq 2000 sequencing platform. A total of 22,161,444 single end reads were retrieved after quality filtering. Available (e.g., De-Bruijn/Eulerian graph) and in-house developed bioinformatics tools were used for assembly and annotation of transcriptome. A total of 101,141 assembled transcripts were obtained, with coverage size of 22.42 Mb and average length of 221 bp. Guanine-cytosine (GC) content was found to be 44%. Bioinformatics analysis, using non-redundant proteins, gene ontology (GO), enzyme commission (EC) and kyoto encyclopedia of genes and genomes (KEGG) databases, extracted all the known enzymes involved in saponin and flavonoid biosynthesis. Few genes of the alkaloid biosynthesis, along with anticancer and plant defense genes, were also discovered. Additionally, several cytochrome P450 (CYP450) and glycosyltransferase unique sequences were also found. We identified simple sequence repeat motifs in transcripts with an abundance of di-nucleotide simple sequence repeat (SSR; 43.1%) markers. Large scale expression profiling through Reads per Kilobase per Million mapped reads (RPKM) showed major genes involved in different metabolic pathways of the plant. Genes, expressed sequence tags (ESTs) and unique sequences from this study provide an important resource for the scientific community, interested in the molecular genetics and functional genomics of C. borivilianum.
Weak beacon detection for air-to-ground optical wireless link establishment.

PubMed

Han, Yaoqiang; Dang, Anhong; Tang, Junxiong; Guo, Hong

2010-02-01

In an air-to-ground free-space optical communication system, strong background interference seriously affects the beacon detection, which makes it difficult to establish the optical link. In this paper, we propose a correlation beacon detection scheme under strong background interference conditions. As opposed to traditional beacon detection schemes, the beacon is modulated by an m-sequence at the transmitting terminal with a digital differential matched filter (DDMF) array introduced at the receiving end to detect the modulated beacon. This scheme is capable of suppressing both strong interference and noise by correlation reception of the received image sequence. In addition, the DDMF array enables each pixel of the image sensor to have its own DDMF of the same structure to process its received image sequence in parallel, thus it makes fast beacon detection possible. Theoretical analysis and an outdoor experiment have been demonstrated and show that the proposed scheme can realize fast and effective beacon detection under strong background interference conditions. Consequently, the required beacon transmission power can also be reduced dramatically.
Dog leukocyte antigen class II-associated genetic risk testing for immune disorders of dogs: simplified approaches using Pug dog necrotizing meningoencephalitis as a model.

PubMed

Pedersen, Niels; Liu, Hongwei; Millon, Lee; Greer, Kimberly

2011-01-01

A significantly increased risk for a number of autoimmune and infectious diseases in purebred and mixed-breed dogs has been associated with certain alleles or allele combinations of the dog leukocyte antigen (DLA) class II complex containing the DRB1, DQA1, and DQB1 genes. The exact level of risk depends on the specific disease, the alleles in question, and whether alleles exist in a homozygous or heterozygous state. The gold standard for identifying high-risk alleles and their zygosity has involved direct sequencing of the exon 2 regions of each of the 3 genes. However, sequencing and identification of specific alleles at each of the 3 loci are relatively expensive and sequencing techniques are not ideal for additional parentage or identity determination. However, it is often possible to get the same information from sequencing only 1 gene given the small number of possible alleles at each locus in purebred dogs, extensive homozygosity, and tendency for disease-causing alleles at each of the 3 loci to be strongly linked to each other into haplotypes. Therefore, genetic testing in purebred dogs with immune diseases can be often simplified by sequencing alleles at 1 rather than 3 loci. Further simplification of genetic tests for canine immune diseases can be achieved by the use of alternative genetic markers in the DLA class II region that are also strongly linked with the disease genotype. These markers consist of either simple tandem repeats or single nucleotide polymorphisms that are also in strong linkage with specific DLA class II genotypes and/or haplotypes. The current study uses necrotizing meningoencephalitis of Pug dogs as a paradigm to assess simple alternative genetic tests for disease risk. It was possible to attain identical necrotizing meningoencephalitis risk assessments to 3-locus DLA class II sequencing by sequencing only the DQB1 gene, using 3 DLA class II-linked simple tandem repeat markers, or with a small single nucleotide polymorphism array designed to identify breed-specific DQB1 alleles.
Cytogenetic Diversity of Simple Sequences Repeats in Morphotypes of Brassica rapa ssp. chinensis

PubMed Central

Zheng, Jin-shuang; Sun, Cheng-zhen; Zhang, Shu-ning; Hou, Xi-lin; Bonnema, Guusje

2016-01-01

A significant fraction of the nuclear DNA of all eukaryotes is comprised of simple sequence repeats (SSRs). Although these sequences are widely used for studying genetic variation, linkage mapping and evolution, little attention had been paid to the chromosomal distribution and cytogenetic diversity of these sequences. In this paper, we report the distribution characterization of mono-, di-, and tri-nucleotide SSRs in Brassica rapa ssp. chinensis. Fluorescence in situ hybridization was used to characterize the cytogenetic diversity of SSRs among morphotypes of B. rapa ssp. chinensis. The proportion of different SSR motifs varied among morphotypes of B. rapa ssp. chinensis, with tri-nucleotide SSRs being more prevalent in the genome of B. rapa ssp. chinensis. We determined the chromosomal locations of mono-, di-, and tri-nucleotide repeat loci. The results showed that the chromosomal distribution of SSRs in the different morphotypes is non-random and motif-dependent, and allowed us to characterize the relative variability in terms of SSR numbers and similar chromosomal distributions in centromeric/peri-centromeric heterochromatin. The differences between SSR repeats with respect to abundance and distribution indicate that SSRs are a driving force in the genomic evolution of B. rapa species. Our results provide a comprehensive view of the SSR sequence distribution and evolution for comparison among morphotypes B. rapa ssp. chinensis. PMID:27507974
Cytogenetic Diversity of Simple Sequences Repeats in Morphotypes of Brassica rapa ssp. chinensis.

PubMed

Zheng, Jin-Shuang; Sun, Cheng-Zhen; Zhang, Shu-Ning; Hou, Xi-Lin; Bonnema, Guusje

2016-01-01

A significant fraction of the nuclear DNA of all eukaryotes is comprised of simple sequence repeats (SSRs). Although these sequences are widely used for studying genetic variation, linkage mapping and evolution, little attention had been paid to the chromosomal distribution and cytogenetic diversity of these sequences. In this paper, we report the distribution characterization of mono-, di-, and tri-nucleotide SSRs in Brassica rapa ssp. chinensis. Fluorescence in situ hybridization was used to characterize the cytogenetic diversity of SSRs among morphotypes of B. rapa ssp. chinensis. The proportion of different SSR motifs varied among morphotypes of B. rapa ssp. chinensis, with tri-nucleotide SSRs being more prevalent in the genome of B. rapa ssp. chinensis. We determined the chromosomal locations of mono-, di-, and tri-nucleotide repeat loci. The results showed that the chromosomal distribution of SSRs in the different morphotypes is non-random and motif-dependent, and allowed us to characterize the relative variability in terms of SSR numbers and similar chromosomal distributions in centromeric/peri-centromeric heterochromatin. The differences between SSR repeats with respect to abundance and distribution indicate that SSRs are a driving force in the genomic evolution of B. rapa species. Our results provide a comprehensive view of the SSR sequence distribution and evolution for comparison among morphotypes B. rapa ssp. chinensis.
Simple Shared Motifs (SSM) in conserved region of promoters: a new approach to identify co-regulation patterns.

PubMed

Gruel, Jérémy; LeBorgne, Michel; LeMeur, Nolwenn; Théret, Nathalie

2011-09-12

Regulation of gene expression plays a pivotal role in cellular functions. However, understanding the dynamics of transcription remains a challenging task. A host of computational approaches have been developed to identify regulatory motifs, mainly based on the recognition of DNA sequences for transcription factor binding sites. Recent integration of additional data from genomic analyses or phylogenetic footprinting has significantly improved these methods. Here, we propose a different approach based on the compilation of Simple Shared Motifs (SSM), groups of sequences defined by their length and similarity and present in conserved sequences of gene promoters. We developed an original algorithm to search and count SSM in pairs of genes. An exceptional number of SSM is considered as a common regulatory pattern. The SSM approach is applied to a sample set of genes and validated using functional gene-set enrichment analyses. We demonstrate that the SSM approach selects genes that are over-represented in specific biological categories (Ontology and Pathways) and are enriched in co-expressed genes. Finally we show that genes co-expressed in the same tissue or involved in the same biological pathway have increased SSM values. Using unbiased clustering of genes, Simple Shared Motifs analysis constitutes an original contribution to provide a clearer definition of expression networks.
Identification of apple cultivars on the basis of simple sequence repeat markers.

PubMed

Liu, G S; Zhang, Y G; Tao, R; Fang, J G; Dai, H Y

2014-09-12

DNA markers are useful tools that play an important role in plant cultivar identification. They are usually based on polymerase chain reaction (PCR) and include simple sequence repeats (SSRs), inter-simple sequence repeats, and random amplified polymorphic DNA. However, DNA markers were not used effectively in the complete identification of plant cultivars because of the lack of known DNA fingerprints. Recently, a novel approach called the cultivar identification diagram (CID) strategy was developed to facilitate the use of DNA markers for separate plant individuals. The CID was designed whereby a polymorphic maker was generated from each PCR that directly allowed for cultivar sample separation at each step. Therefore, it could be used to identify cultivars and varieties easily with fewer primers. In this study, 60 apple cultivars, including a few main cultivars in fields and varieties from descendants (Fuji x Telamon) were examined. Of the 20 pairs of SSR primers screened, 8 pairs gave reproducible, polymorphic DNA amplification patterns. The banding patterns obtained from these 8 primers were used to construct a CID map. Each cultivar or variety in this study was distinguished from the others completely, indicating that this method can be used for efficient cultivar identification. The result contributed to studies on germplasm resources and the seedling industry in fruit trees.
The paradox of MHC-DRB exon/intron evolution: alpha-helix and beta-sheet encoding regions diverge while hypervariable intronic simple repeats coevolve with beta-sheet codons.

PubMed

Schwaiger, F W; Weyers, E; Epplen, C; Brün, J; Ruff, G; Crawford, A; Epplen, J T

1993-09-01

Twenty-one different caprine and 13 ovine MHC-DRB exon 2 sequences were determined including part of the adjacent introns containing simple repetitive (gt)n(ga)m elements. The positions for highly polymorphic DRB amino acids vary slightly among ungulates and other mammals. From man and mouse to ungulates the basic (gt)n(ga)m structure is fixed in evolution for 7 x 10(7) years whereas ample variations exist in the tandem (gt)n and (ga)m dinucleotides and especially their "degenerated" derivatives. Phylogenetic trees for the alpha-helices and beta-pleated sheets of the ungulate DRB sequences suggest different evolutionary histories. In hoofed animals as well as in humans DRB beta-sheet encoding sequences and adjacent intronic repeats can be assembled into virtually identical groups suggesting coevolution of noncoding as well as coding DNA. In contrast alpha-helices and C-terminal parts of the first DRB domain evolve distinctly. In the absence of a defined mechanism causing specific, site-directed mutations, double-recombination or gene-conversion-like events would readily explain this fact. The role of the intronic simple (gt)n(ga)m repeat is discussed with respect to these genetic exchange mechanisms during evolution.
Exogean: a framework for annotating protein-coding genes in eukaryotic genomic DNA

PubMed Central

Djebali, Sarah; Delaplace, Franck; Crollius, Hugues Roest

2006-01-01

Background Accurate and automatic gene identification in eukaryotic genomic DNA is more than ever of crucial importance to efficiently exploit the large volume of assembled genome sequences available to the community. Automatic methods have always been considered less reliable than human expertise. This is illustrated in the EGASP project, where reference annotations against which all automatic methods are measured are generated by human annotators and experimentally verified. We hypothesized that replicating the accuracy of human annotators in an automatic method could be achieved by formalizing the rules and decisions that they use, in a mathematical formalism. Results We have developed Exogean, a flexible framework based on directed acyclic colored multigraphs (DACMs) that can represent biological objects (for example, mRNA, ESTs, protein alignments, exons) and relationships between them. Graphs are analyzed to process the information according to rules that replicate those used by human annotators. Simple individual starting objects given as input to Exogean are thus combined and synthesized into complex objects such as protein coding transcripts. Conclusion We show here, in the context of the EGASP project, that Exogean is currently the method that best reproduces protein coding gene annotations from human experts, in terms of identifying at least one exact coding sequence per gene. We discuss current limitations of the method and several avenues for improvement. PMID:16925841
Transcript and proteomic analysis of developing white lupin (Lupinus albus L.) roots

PubMed Central

Tian, Li; Peel, Gregory J; Lei, Zhentian; Aziz, Naveed; Dai, Xinbin; He, Ji; Watson, Bonnie; Zhao, Patrick X; Sumner, Lloyd W; Dixon, Richard A

2009-01-01

Background White lupin (Lupinus albus L.) roots efficiently take up and accumulate (heavy) metals, adapt to phosphate deficiency by forming cluster roots, and secrete antimicrobial prenylated isoflavones during development. Genomic and proteomic approaches were applied to identify candidate genes and proteins involved in antimicrobial defense and (heavy) metal uptake and translocation. Results A cDNA library was constructed from roots of white lupin seedlings. Eight thousand clones were randomly sequenced and assembled into 2,455 unigenes, which were annotated based on homologous matches in the NCBInr protein database. A reference map of developing white lupin root proteins was established through 2-D gel electrophoresis and peptide mass fingerprinting. High quality peptide mass spectra were obtained for 170 proteins. Microsomal membrane proteins were separated by 1-D gel electrophoresis and identified by LC-MS/MS. A total of 74 proteins were putatively identified by the peptide mass fingerprinting and the LC-MS/MS methods. Genomic and proteomic analyses identified candidate genes and proteins encoding metal binding and/or transport proteins, transcription factors, ABC transporters and phenylpropanoid biosynthetic enzymes. Conclusion The combined EST and protein datasets will facilitate the understanding of white lupin's response to biotic and abiotic stresses and its utility for phytoremediation. The root ESTs provided 82 perfect simple sequence repeat (SSR) markers with potential utility in breeding white lupin for enhanced agronomic traits. PMID:19123941
D-Light on promoters: a client-server system for the analysis and visualization of cis-regulatory elements

PubMed Central

2013-01-01

Background The binding of transcription factors to DNA plays an essential role in the regulation of gene expression. Numerous experiments elucidated binding sequences which subsequently have been used to derive statistical models for predicting potential transcription factor binding sites (TFBS). The rapidly increasing number of genome sequence data requires sophisticated computational approaches to manage and query experimental and predicted TFBS data in the context of other epigenetic factors and across different organisms. Results We have developed D-Light, a novel client-server software package to store and query large amounts of TFBS data for any number of genomes. Users can add small-scale data to the server database and query them in a large scale, genome-wide promoter context. The client is implemented in Java and provides simple graphical user interfaces and data visualization. Here we also performed a statistical analysis showing what a user can expect for certain parameter settings and we illustrate the usage of D-Light with the help of a microarray data set. Conclusions D-Light is an easy to use software tool to integrate, store and query annotation data for promoters. A public D-Light server, the client and server software for local installation and the source code under GNU GPL license are available at http://biwww.che.sbg.ac.at/dlight. PMID:23617301
40 CFR 86.230-94 - Test sequence: general requirements.

Code of Federal Regulations, 2010 CFR

2010-07-01

... testing. (2) The ambient temperature reported shall be a simple average of the test cell temperatures... cell temperature shall be 20 °F±3 °F (−7 °C±1.7 °C) when measured in accordance with paragraph (e)(2... approximately level during all phases of the test sequence to prevent abnormal fuel distribution. (e) Engine...
Cross-species transferability and mapping of genomic and cDNA SSRs in pines

Treesearch

D. Chagne; P. Chaumeil; A. Ramboer; C. Collada; A. Guevara; M. T. Cervera; G. G. Vendramin; V. Garcia; J-M. Frigerio; Craig Echt; T. Richardson; Christophe Plomion

2004-01-01

Two unigene datasets of Pinus taeda and Pinus pinaster were screened to detect di-, tri and tetranucleotide repeated motifs using the SSRIT script. A total of 419 simple sequence repeats (SSRs) were identified, from which only 12.8% overlapped between the two sets. The position of the SSRs within the coding sequence were predicted...
Genetic variation patterns of American chestnut populations at EST-SSRs

Treesearch

Oliver Gailing; C. Dana Nelson

2017-01-01

The objective of this study is to analyze patterns of genetic variation at genic expressed sequence tag - simple sequence repeats (EST-SSRs) and at chloroplast DNA markers in populations of American chestnut (Castanea dentata Borkh.) to assist in conservation and breeding efforts. Allelic diversity at EST-SSRs decreased significantly from southwest to northeast along...
Does Sleep Facilitate the Consolidation of Allocentric or Egocentric Representations of Implicitly Learned Visual-Motor Sequence Learning?

ERIC Educational Resources Information Center

Viczko, Jeremy; Sergeeva, Valya; Ray, Laura B.; Owen, Adrian M.; Fogel, Stuart M.

2018-01-01

Sleep facilitates the consolidation (i.e., enhancement) of simple, explicit (i.e., conscious) motor sequence learning (MSL). MSL can be dissociated into egocentric (i.e., motor) or allocentric (i.e., spatial) frames of reference. The consolidation of the allocentric memory representation is sleep-dependent, whereas the egocentric consolidation…

Learning of Monotonic and Nonmonotonic Sequences in Domesticated Horses ("Equus Callabus") and Chickens ("Gallus Domesticus")

ERIC Educational Resources Information Center

Kundey, Shannon M. A.; Strandell, Brittany; Mathis, Heather; Rowan, James D.

2010-01-01

(Hulse and Dorsky, 1977) and (Hulse and Dorsky, 1979) found that rats, like humans, learn sequences following a simple rule-based structure more quickly than those lacking a rule-based structure. Through two experiments, we explored whether two additional species--domesticated horses ("Equus callabus") and chickens ("Gallus domesticus")--would…
The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

USDA-ARS?s Scientific Manuscript database

Background: Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results: We describe the sequencing and assembly of...
Improved efficiency in amplification of Escherichia coli o-antigen gene clusters using genome-wide sequence comparison

USDA-ARS?s Scientific Manuscript database

Background: In many bacteria including E. coli, genes encoding O-antigens are clustered in the chromosome, with a 39-bp JUMPstart sequence and gnd gene located upstream and downstream of the cluster, respectively. For determining the DNA sequence of the E. coli O-antigen gene cluster, one set of P...
Children's Criteria for Representational Adequacy in the Perception of Simple Sonic Stimuli

ERIC Educational Resources Information Center

Verschaffel, Lieven; Reybrouck, Mark; Jans, Christine; Van Dooren, Wim

2010-01-01

This study investigates children's metarepresentational competence with regard to listening to and making sense of simple sonic stimuli. Using diSessa's (2003) work on metarepresentational competence in mathematics and sciences as theoretical and empirical background, it aims to assess children's criteria for representational adequacy of graphical…
An Isotopic Dilution Experiment Using Liquid Scintillation: A Simple Two-System, Two-Phase Analysis.

ERIC Educational Resources Information Center

Moehs, Peter J.; Levine, Samuel

1982-01-01

A simple isotonic, dilution analysis whose principles apply to methods of more complex radioanalyses is described. Suitable for clinical and instrumental analysis chemistry students, experimental manipulations are kept to a minimum involving only aqueous extraction before counting. Background information, procedures, and results are discussed.…
Prediction of Transport Properties of Permeants through Polymer Films. A Simple Gravimetric Experiment.

ERIC Educational Resources Information Center

Britton, L. N.; And Others

1988-01-01

Considers the applicability of the simple emersion/weight-gain method for predicting diffusion coefficients, solubilities, and permeation rates of chemicals in polymers that do not undergo physical and chemical deterioration. Presents the theoretical background, procedures and typical results related to this activity. (CW)
Development of a EST dataset and characterization of EST-SSRs in a traditional Chinese medicinal plant, Epimedium sagittatum (Sieb. Et Zucc.) Maxim

PubMed Central

2010-01-01

Background Epimedium sagittatum (Sieb. Et Zucc.) Maxim, a traditional Chinese medicinal plant species, has been used extensively as genuine medicinal materials. Certain Epimedium species are endangered due to commercial overexploition, while sustainable application studies, conservation genetics, systematics, and marker-assisted selection (MAS) of Epimedium is less-studied due to the lack of molecular markers. Here, we report a set of expressed sequence tags (ESTs) and simple sequence repeats (SSRs) identified in these ESTs for E. sagittatum. Results cDNAs of E. sagittatum are sequenced using 454 GS-FLX pyrosequencing technology. The raw reads are cleaned and assembled into a total of 76,459 consensus sequences comprising of 17,231 contigs and 59,228 singlets. About 38.5% (29,466) of the consensus sequences significantly match to the non-redundant protein database (E-value < 1e-10), 22,295 of which are further annotated using Gene Ontology (GO) terms. A total of 2,810 EST-SSRs is identified from the Epimedium EST dataset. Trinucleotide SSR is the dominant repeat type (55.2%) followed by dinucleotide (30.4%), tetranuleotide (7.3%), hexanucleotide (4.9%), and pentanucleotide (2.2%) SSR. The dominant repeat motif is AAG/CTT (23.6%) followed by AG/CT (19.3%), ACC/GGT (11.1%), AT/AT (7.5%), and AAC/GTT (5.9%). Thirty-two SSR-ESTs are randomly selected and primer pairs are synthesized for testing the transferability across 52 Epimedium species. Eighteen primer pairs (85.7%) could be successfully transferred to Epimedium species and sixteen of those show high genetic diversity with 0.35 of observed heterozygosity (Ho) and 0.65 of expected heterozygosity (He) and high number of alleles per locus (11.9). Conclusion A large EST dataset with a total of 76,459 consensus sequences is generated, aiming to provide sequence information for deciphering secondary metabolism, especially for flavonoid pathway in Epimedium. A total of 2,810 EST-SSRs is identified from EST dataset and ~1580 EST-SSR markers are transferable. E. sagittatum EST-SSR transferability to the major Epimedium germplasm is up to 85.7%. Therefore, this EST dataset and EST-SSRs will be a powerful resource for further studies such as taxonomy, molecular breeding, genetics, genomics, and secondary metabolism in Epimedium species. PMID:20141623
Construction and sequence sampling of deep-coverage, large-insert BAC libraries for three model lepidopteran species

PubMed Central

Wu, Chengcang; Proestou, Dina; Carter, Dorothy; Nicholson, Erica; Santos, Filippe; Zhao, Shaying; Zhang, Hong-Bin; Goldsmith, Marian R

2009-01-01

Background Manduca sexta, Heliothis virescens, and Heliconius erato represent three widely-used insect model species for genomic and fundamental studies in Lepidoptera. Large-insert BAC libraries of these insects are critical resources for many molecular studies, including physical mapping and genome sequencing, but not available to date. Results We report the construction and characterization of six large-insert BAC libraries for the three species and sampling sequence analysis of the genomes. The six BAC libraries were constructed with two restriction enzymes, two libraries for each species, and each has an average clone insert size ranging from 152–175 kb. We estimated that the genome coverage of each library ranged from 6–9 ×, with the two combined libraries of each species being equivalent to 13.0–16.3 × haploid genomes. The genome coverage, quality and utility of the libraries were further confirmed by library screening using 6~8 putative single-copy probes. To provide a first glimpse into these genomes, we sequenced and analyzed the BAC ends of ~200 clones randomly selected from the libraries of each species. The data revealed that the genomes are AT-rich, contain relatively small fractions of repeat elements with a majority belonging to the category of low complexity repeats, and are more abundant in retro-elements than DNA transposons. Among the species, the H. erato genome is somewhat more abundant in repeat elements and simple repeats than those of M. sexta and H. virescens. The BLAST analysis of the BAC end sequences suggested that the evolution of the three genomes is widely varied, with the genome of H. virescens being the most conserved as a typical lepidopteran, whereas both genomes of H. erato and M. sexta appear to have evolved significantly, resulting in a higher level of species- or evolutionary lineage-specific sequences. Conclusion The high-quality and large-insert BAC libraries of the insects, together with the identified BACs containing genes of interest, provide valuable information, resources and tools for comprehensive understanding and studies of the insect genomes and for addressing many fundamental questions in Lepidoptera. The sample of the genomic sequences provides the first insight into the constitution and evolution of the insect genomes. PMID:19558662
LISTA, a comprehensive compilation of nucleotide sequences encoding proteins from the yeast Saccharomyces.

PubMed Central

Linder, P; Dölz, R; Mossé, M O; Lazowska, J; Slonimski, P P

1993-01-01

The amount of nucleotide sequence data is increasing exponentially. We therefore made an effort to make a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. Each sequence has been attributed a single genetic name and in the case of allelic duplicated sequences, synonyms are given, if necessary. For the nomenclature we have introduced a standard principle for naming gene sequences based on priority rules. We have also applied a simple method to distinguish duplicated sequences of one and the same gene from non-allelic sequences of duplicated genes. By using these principles we have sorted out a lot of confusion in the literature and databanks. Along with the genetic name, the mnemonic from the EMBL databank, the codon bias, reference of the publication of the sequence and the EMBL accession numbers are included in each entry. PMID:8332521
Double-modulation spectroscopy of molecular ions - Eliminating the background in velocity-modulation spectroscopy

NASA Technical Reports Server (NTRS)

Lan, Guang; Tholl, Hans Dieter; Farley, John W.

1991-01-01

Velocity-modulation spectroscopy is an established technique for performing laser absorption spectroscopy of molecular ions in a discharge. However, such experiments are often plagued by a coherent background signal arising from emission from the discharge or from electronic pickup. Fluctuations in the background can obscure the desired signal. A simple technique using amplitude modulation of the laser and two lock-in amplifiers in series to detect the signal is demonstrated. The background and background fluctuations are thereby eliminated, facilitating the detection of molecular ions.
Soundtrack contents and depicted sexual violence.

PubMed

Pfaus, J G; Myronuk, L D; Jacobs, W J

1986-06-01

Male undergraduates were exposed to a videotaped depiction of heterosexual rape accompanied by one of three soundtracks: the original soundtrack (featuring dialogue and background rock music), relaxing music, or no sound. Subjective reports of sexual arousal, general enjoyment, perceived erotic content, and perceived pornographic content of the sequence were then provided by each subject. Results indicated that males exposed to the videotape accompanied by the original soundtrack found the sequence significantly more pornographic than males exposed to the sequence accompanied by either relaxing background music or no sound. Ratings of sexual arousal, general enjoyment, and the perceived erotic content, however, did not differ significantly across soundtrack conditions. These results are compatible with the assertion that the content of a video soundtrack may influence the impact of depicted sexual violence.
"First generation" automated DNA sequencing technology.

PubMed

Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

2011-10-01

Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
A Simple Method to Determine the "R" or "S" Configuration of Molecules with an Axis of Chirality

ERIC Educational Resources Information Center

Wang, Cunde; Wu, Weiming

2011-01-01

A simple method for the "R" or "S" designation of molecules with an axis of chirality is described. The method involves projection of the substituents along the chiral axis, utilizes the Cahn-Ingold-Prelog sequence rules in assigning priority to the substituents, is easy to use, and has broad applicability. (Contains 5 figures.)
Meta4: a web application for sharing and annotating metagenomic gene predictions using web services.

PubMed

Richardson, Emily J; Escalettes, Franck; Fotheringham, Ian; Wallace, Robert J; Watson, Mick

2013-01-01

Whole-genome shotgun metagenomics experiments produce DNA sequence data from entire ecosystems, and provide a huge amount of novel information. Gene discovery projects require up-to-date information about sequence homology and domain structure for millions of predicted proteins to be presented in a simple, easy-to-use system. There is a lack of simple, open, flexible tools that allow the rapid sharing of metagenomics datasets with collaborators in a format they can easily interrogate. We present Meta4, a flexible and extensible web application that can be used to share and annotate metagenomic gene predictions. Proteins and predicted domains are stored in a simple relational database, with a dynamic front-end which displays the results in an internet browser. Web services are used to provide up-to-date information about the proteins from homology searches against public databases. Information about Meta4 can be found on the project website, code is available on Github, a cloud image is available, and an example implementation can be seen at.
Fine-tuning gene networks using simple sequence repeats

PubMed Central

Egbert, Robert G.; Klavins, Eric

2012-01-01

The parameters in a complex synthetic gene network must be extensively tuned before the network functions as designed. Here, we introduce a simple and general approach to rapidly tune gene networks in Escherichia coli using hypermutable simple sequence repeats embedded in the spacer region of the ribosome binding site. By varying repeat length, we generated expression libraries that incrementally and predictably sample gene expression levels over a 1,000-fold range. We demonstrate the utility of the approach by creating a bistable switch library that programmatically samples the expression space to balance the two states of the switch, and we illustrate the need for tuning by showing that the switch’s behavior is sensitive to host context. Further, we show that mutation rates of the repeats are controllable in vivo for stability or for targeted mutagenesis—suggesting a new approach to optimizing gene networks via directed evolution. This tuning methodology should accelerate the process of engineering functionally complex gene networks. PMID:22927382
In silico mining of putative microsatellite markers from whole genome sequence of water buffalo (Bubalus bubalis) and development of first BuffSatDB

PubMed Central

2013-01-01

Background Though India has sequenced water buffalo genome but its draft assembly is based on cattle genome BTau 4.0, thus de novo chromosome wise assembly is a major pending issue for global community. The existing radiation hybrid of buffalo and these reported STR can be used further in final gap plugging and “finishing” expected in de novo genome assembly. QTL and gene mapping needs mining of putative STR from buffalo genome at equal interval on each and every chromosome. Such markers have potential role in improvement of desirable characteristics, such as high milk yields, resistance to diseases, high growth rate. The STR mining from whole genome and development of user friendly database is yet to be done to reap the benefit of whole genome sequence. Description By in silico microsatellite mining of whole genome, we have developed first STR database of water buffalo, BuffSatDb (Buffalo MicroSatellite Database (http://cabindb.iasri.res.in/buffsatdb/) which is a web based relational database of 910529 microsatellite markers, developed using PHP and MySQL database. Microsatellite markers have been generated using MIcroSAtellite tool. It is simple and systematic web based search for customised retrieval of chromosome wise and genome-wide microsatellites. Search has been enabled based on chromosomes, motif type (mono-hexa), repeat motif and repeat kind (simple and composite). The search may be customised by limiting location of STR on chromosome as well as number of markers in that range. This is a novel approach and not been implemented in any of the existing marker database. This database has been further appended with Primer3 for primer designing of the selected markers enabling researcher to select markers of choice at desired interval over the chromosome. The unique add-on of degenerate bases further helps in resolving presence of degenerate bases in current buffalo assembly. Conclusion Being first buffalo STR database in the world , this would not only pave the way in resolving current assembly problem but shall be of immense use for global community in QTL/gene mapping critically required to increase knowledge in the endeavour to increase buffalo productivity, especially for third world country where rural economy is significantly dependent on buffalo productivity. PMID:23336431
The characterization of a new set of EST-derived simple sequence repeat (SSR) markers as a resource for the genetic analysis of Phaseolus vulgaris

PubMed Central

2011-01-01

Background Over recent years, a growing effort has been made to develop microsatellite markers for the genomic analysis of the common bean (Phaseolus vulgaris) to broaden the knowledge of the molecular genetic basis of this species. The availability of large sets of expressed sequence tags (ESTs) in public databases has given rise to an expedient approach for the identification of SSRs (Simple Sequence Repeats), specifically EST-derived SSRs. In the present work, a battery of new microsatellite markers was obtained from a search of the Phaseolus vulgaris EST database. The diversity, degree of transferability and polymorphism of these markers were tested. Results From 9,583 valid ESTs, 4,764 had microsatellite motifs, from which 377 were used to design primers, and 302 (80.11%) showed good amplification quality. To analyze transferability, a group of 167 SSRs were tested, and the results showed that they were 82% transferable across at least one species. The highest amplification rates were observed between the species from the Phaseolus (63.7%), Vigna (25.9%), Glycine (19.8%), Medicago (10.2%), Dipterix (6%) and Arachis (1.8%) genera. The average PIC (Polymorphism Information Content) varied from 0.53 for genomic SSRs to 0.47 for EST-SSRs, and the average number of alleles per locus was 4 and 3, respectively. Among the 315 newly tested SSRs in the BJ (BAT93 X Jalo EEP558) population, 24% (76) were polymorphic. The integration of these segregant loci into a framework map composed of 123 previously obtained SSR markers yielded a total of 199 segregant loci, of which 182 (91.5%) were mapped to 14 linkage groups, resulting in a map length of 1,157 cM. Conclusions A total of 302 newly developed EST-SSR markers, showing good amplification quality, are available for the genetic analysis of Phaseolus vulgaris. These markers showed satisfactory rates of transferability, especially between species that have great economic and genomic values. Their diversity was comparable to genomic SSRs, and they were incorporated in the common bean reference genetic map, which constitutes an important contribution to and advance in Phaseolus vulgaris genomic research. PMID:21554695
Psychophysical and perceptual performance in a simulated-scotoma model of human eye injury

NASA Astrophysics Data System (ADS)

Brandeis, R.; Egoz, I.; Peri, D.; Sapiens, N.; Turetz, J.

2008-02-01

Macular scotomas, affecting visual functioning, characterize many eye and neurological diseases like AMD, diabetes mellitus, multiple sclerosis, and macular hole. In this work, foveal visual field defects were modeled, and their effects were evaluated on spatial contrast sensitivity and a task of stimulus detection and aiming. The modeled occluding scotomas, of different size, were superimposed on the stimuli presented on the computer display, and were stabilized on the retina using a mono Purkinje Eye-Tracker. Spatial contrast sensitivity was evaluated using square-wave grating stimuli, whose contrast thresholds were measured using the method of constant stimuli with "catch trials". The detection task consisted of a triple conjunctive visual search display of: size (in visual angle), contrast and background (simple, low-level features vs. complex, high-level features). Search/aiming accuracy as well as R.T. measures used for performance evaluation. Artificially generated scotomas suppressed spatial contrast sensitivity in a size dependent manner, similar to previous studies. Deprivation effect was dependent on spatial frequency, consistent with retinal inhomogeneity models. Stimulus detection time was slowed in complex background search situation more than in simple background. Detection speed was dependent on scotoma size and size of stimulus. In contrast, visually guided aiming was more sensitive to scotoma effect in simple background search situation than in complex background. Both stimulus aiming R.T. and accuracy (precision targeting) were impaired, as a function of scotoma size and size of stimulus. The data can be explained by models distinguishing between saliency-based, parallel and serial search processes, guiding visual attention, which are supported by underlying retinal as well as neural mechanisms.
Background Checks on School Personnel. ERIC Digest Series EA 55.

ERIC Educational Resources Information Center

Baas, Alan

Although it is relatively simple to check on applicants' basic professional competency, ensuring the moral competency of potential school employees is much more difficult. This digest examines major legal issues, district liabilities and responsibilities, suggested guidelines, and information sources involving employee background checks. Of more…
Terminator oligo blocking efficiently eliminates rRNA from Drosophila small RNA sequencing libraries.

PubMed

Wickersheim, Michelle L; Blumenstiel, Justin P

2013-11-01

A large number of methods are available to deplete ribosomal RNA reads from high-throughput RNA sequencing experiments. Such methods are critical for sequencing Drosophila small RNAs between 20 and 30 nucleotides because size selection is not typically sufficient to exclude the highly abundant class of 30 nucleotide 2S rRNA. Here we demonstrate that pre-annealing terminator oligos complimentary to Drosophila 2S rRNA prior to 5' adapter ligation and reverse transcription efficiently depletes 2S rRNA sequences from the sequencing reaction in a simple and inexpensive way. This depletion is highly specific and is achieved with minimal perturbation of miRNA and piRNA profiles.

Next-generation sequencing library preparation method for identification of RNA viruses on the Ion Torrent Sequencing Platform.

PubMed

Chen, Guiqian; Qiu, Yuan; Zhuang, Qingye; Wang, Suchun; Wang, Tong; Chen, Jiming; Wang, Kaicheng

2018-05-09

Next generation sequencing (NGS) is a powerful tool for the characterization, discovery, and molecular identification of RNA viruses. There were multiple NGS library preparation methods published for strand-specific RNA-seq, but some methods are not suitable for identifying and characterizing RNA viruses. In this study, we report a NGS library preparation method to identify RNA viruses using the Ion Torrent PGM platform. The NGS sequencing adapters were directly inserted into the sequencing library through reverse transcription and polymerase chain reaction, without fragmentation and ligation of nucleic acids. The results show that this method is simple to perform, able to identify multiple species of RNA viruses in clinical samples.
Molecular beacon sequence design algorithm.

PubMed

Monroe, W Todd; Haselton, Frederick R

2003-01-01

A method based on Web-based tools is presented to design optimally functioning molecular beacons. Molecular beacons, fluorogenic hybridization probes, are a powerful tool for the rapid and specific detection of a particular nucleic acid sequence. However, their synthesis costs can be considerable. Since molecular beacon performance is based on its sequence, it is imperative to rationally design an optimal sequence before synthesis. The algorithm presented here uses simple Microsoft Excel formulas and macros to rank candidate sequences. This analysis is carried out using mfold structural predictions along with other free Web-based tools. For smaller laboratories where molecular beacons are not the focus of research, the public domain algorithm described here may be usefully employed to aid in molecular beacon design.
A Simple and Efficient Method for Assembling TALE Protein Based on Plasmid Library

PubMed Central

Xu, Huarong; Xin, Ying; Zhang, Tingting; Ma, Lixia; Wang, Xin; Chen, Zhilong; Zhang, Zhiying

2013-01-01

DNA binding domain of the transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied in varieties of species for genome engineering. However, assembling long TALE tandem repeats remains a big challenge precluding wide use of this technology. Although several new methodologies for efficiently assembling TALE repeats have been recently reported, all of them require either sophisticated facilities or skilled technicians to carry them out. Here, we described a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on TALE repeat tetramer library. A tetramer library consisting of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed for amplification of these tetramers. PCR products were assembled by one step of digestion/ligation reaction. 12 TALE constructs including 4 TALEN pairs targeted to mouse Gt(ROSA)26Sor gene and mouse Mstn gene sequences as well as 4 TALE-TF constructs targeted to mouse Oct4, c-Myc, Klf4 and Sox2 gene promoter sequences were generated by using our method. The construction routines took 3 days and parallel constructions were available. The rate of positive clones during colony PCR verification was 64% on average. Sequencing results suggested that all TALE constructs were performed with high successful rate. This is a rapid and cost-efficient method using the most common enzymes and facilities with a high success rate. PMID:23840477
A simple and efficient method for assembling TALE protein based on plasmid library.

PubMed

Zhang, Zhiqiang; Li, Duo; Xu, Huarong; Xin, Ying; Zhang, Tingting; Ma, Lixia; Wang, Xin; Chen, Zhilong; Zhang, Zhiying

2013-01-01

DNA binding domain of the transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied in varieties of species for genome engineering. However, assembling long TALE tandem repeats remains a big challenge precluding wide use of this technology. Although several new methodologies for efficiently assembling TALE repeats have been recently reported, all of them require either sophisticated facilities or skilled technicians to carry them out. Here, we described a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on TALE repeat tetramer library. A tetramer library consisting of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed for amplification of these tetramers. PCR products were assembled by one step of digestion/ligation reaction. 12 TALE constructs including 4 TALEN pairs targeted to mouse Gt(ROSA)26Sor gene and mouse Mstn gene sequences as well as 4 TALE-TF constructs targeted to mouse Oct4, c-Myc, Klf4 and Sox2 gene promoter sequences were generated by using our method. The construction routines took 3 days and parallel constructions were available. The rate of positive clones during colony PCR verification was 64% on average. Sequencing results suggested that all TALE constructs were performed with high successful rate. This is a rapid and cost-efficient method using the most common enzymes and facilities with a high success rate.
Sequencing, de novo assembly and characterization of the spotted scat Scatophagus argus (Linnaeus 1766) transcriptome for discovery of reproduction related genes and SSRs

NASA Astrophysics Data System (ADS)

Yang, Wei; Chen, Huapu; Cui, Xuefan; Zhang, Kewei; Jiang, Dongneng; Deng, Siping; Zhu, Chunhua; Li, Guangli

2017-09-01

Spotted scat (Scatophagus argus) is an economically important farmed fish, particularly in East and Southeast Asia. Because there has been little research on reproductive development and regulation in this species, the lack of a mature artificial reproduction technology remains a barrier for the sustainable development of the aquaculture industry. More genetic and genomic background knowledge is urgently needed for an in-depth understanding of the molecular mechanism of reproductive process and identification of functional genes related to sexual differentiation, gonad maturation and gametogenesis. For these reasons, we performed transcriptomic analysis on spotted scat using a multiple tissue sample mixing strategy. The Illumina RNA sequencing generated 118 510 486 raw reads. After trimming, de novo assembly was performed and yielded 99 888 unigenes with an average length of 905.75 bp. A total of 45 015 unigenes were successfully annotated to the Nr, Swiss-Prot, KOG and KEGG databases. Additionally, 23 783 and 27 183 annotated unigenes were assigned to 56 Gene Ontology (GO) functional groups and 228 KEGG pathways, respectively. Subsequently, 2 474 transcripts associated with reproduction were selected using GO term and KEGG pathway assignments, and a number of reproduction-related genes involved in sex differentiation, gonad development and gametogenesis were identified. Furthermore, 22 279 simple sequence repeat (SSR) loci were discovered and characterized. The comprehensive transcript dataset described here greatly increases the genetic information available for spotted scat and contributes valuable sequence resources for functional gene mining and analysis. Candidate transcripts involved in reproduction would make good starting points for future studies on reproductive mechanisms, and the putative sex differentiation-related genes will be helpful for sex-determining gene identification and sex-specific marker isolation. Lastly, the SSRs can serve as marker resources for future research into genetics, marker-assisted selection (MAS) and conservation biology.
Discovery of Pod Shatter-Resistant Associated SNPs by Deep Sequencing of a Representative Library Followed by Bulk Segregant Analysis in Rapeseed

PubMed Central

Huang, Shunmou; Yang, Hongli; Zhan, Gaomiao; Wang, Xinfa; Liu, Guihua; Wang, Hanzhong

2012-01-01

Background Single nucleotide polymorphisms (SNPs) are an important class of genetic marker for target gene mapping. As of yet, there is no rapid and effective method to identify SNPs linked with agronomic traits in rapeseed and other crop species. Methodology/Principal Findings We demonstrate a novel method for identifying SNP markers in rapeseed by deep sequencing a representative library and performing bulk segregant analysis. With this method, SNPs associated with rapeseed pod shatter-resistance were discovered. Firstly, a reduced representation of the rapeseed genome was used. Genomic fragments ranging from 450–550 bp were prepared from the susceptible bulk (ten F2 plants with the silique shattering resistance index, SSRI <0.10) and the resistance bulk (ten F2 plants with SSRI >0.90), and also Solexa sequencing-produced 90 bp reads. Approximately 50 million of these sequence reads were assembled into contigs to a depth of 20-fold coverage. Secondly, 60,396 ‘simple SNPs’ were identified, and the statistical significance was evaluated using Fisher's exact test. There were 70 associated SNPs whose –log10 p value over 16 were selected to be further analyzed. The distribution of these SNPs appeared a tight cluster, which consisted of 14 associated SNPs within a 396 kb region on chromosome A09. Our evidence indicates that this region contains a major quantitative trait locus (QTL). Finally, two associated SNPs from this region were mapped on a major QTL region. Conclusions/Significance 70 associated SNPs were discovered and a major QTL for rapeseed pod shatter-resistance was found on chromosome A09 using our novel method. The associated SNP markers were used for mapping of the QTL, and may be useful for improving pod shatter-resistance in rapeseed through marker-assisted selection and map-based cloning. This approach will accelerate the discovery of major QTLs and the cloning of functional genes for important agronomic traits in rapeseed and other crop species. PMID:22529909
Development of EST-SSR markers for Taxillus nigrans (Loranthaceae) in southwestern China using next-generation sequencing1

PubMed Central

Miao, Ning; Zhang, Lei; Li, Maoping; Fan, Liqiang; Mao, Kangshan

2017-01-01

Premise of the study: We developed transcriptome microsatellite markers (simple sequence repeats) for Taxillus nigrans (Loranthaceae) to survey the genetic diversity and population structure of this species. Methods and Results: We used Illumina HiSeq data to reconstruct the transcriptome of T. nigrans by de novo assembly and used the transcriptome to develop a set of simple sequence repeat markers. Overall, 40 primer pairs were designed and tested; 19 of them amplified successfully and demonstrated polymorphisms. Two loci that detected null alleles were eliminated, and the remaining 17, which were subjected to further analyses, yielded two to 21 alleles per locus. Conclusions: The markers will serve as a basis for studies to assess the extent and pattern of distribution of genetic variation in T. nigrans, and they may also be useful in conservation genetic, ecological, and evolutionary studies of the genus Taxillus, a group of plant species of importance in Chinese traditional medicine. PMID:28924510
Inverted-U Function Relating Cortical Plasticity and Task Difficulty

PubMed Central

Engineer, Navzer D.; Engineer, Crystal T.; Reed, Amanda C.; Pandya, Pritesh K.; Jakkamsetti, Vikram; Moucha, Raluca; Kilgard, Michael P.

2012-01-01

Many psychological and physiological studies with simple stimuli have suggested that perceptual learning specifically enhances the response of primary sensory cortex to task-relevant stimuli. The aim of this study was to determine whether auditory discrimination training on complex tasks enhances primary auditory cortex responses to a target sequence relative to non-target and novel sequences. We collected responses from more than 2,000 sites in 31 rats trained on one of six discrimination tasks that differed primarily in the similarity of the target and distractor sequences. Unlike training with simple stimuli, long-term training with complex stimuli did not generate target specific enhancement in any of the groups. Instead, cortical receptive field size decreased, latency decreased, and paired pulse depression decreased in rats trained on the tasks of intermediate difficulty while tasks that were too easy or too difficult either did not alter or degraded cortical responses. These results suggest an inverted-U function relating neural plasticity and task difficulty. PMID:22249158
[Analysis of MAT1A gene mutations in a child affected with simple hypermethioninemia].

PubMed

Sun, Yun; Ma, Dingyuan; Wang, Yanyun; Yang, Bin; Jiang, Tao

2017-02-10

To detect potential mutations of MAT1A gene in a child suspected with simple hypermethioninemia by MS/MS neonatal screening. Clinical data of the child was collected. Genomic DNA was extracted by a standard method and subjected to targeted sequencing using an Ion Ampliseq TM Inherited Disease Panel. Detected mutations were verified by Sanger sequencing. The child showed no clinical features except evaluated methionine. A novel compound mutation of the MAT1A gene, i.e., c.345delA and c.529C>T, was identified in the child. His father and mother were found to be heterozygous for the c.345delA mutation and c.529C>T mutation, respectively. The compound mutation c.345delA and c.529C>T of the MAT1A gene probably underlie the disease in the child. The semi-conductor sequencing has provided an important means for the diagnosis of hereditary diseases.
Investigation of microsatellite instability in Turkish breast cancer patients.

PubMed

Demokan, Semra; Muslumanoglu, Mahmut; Yazici, H; Igci, Abdullah; Dalay, Nejat

2002-01-01

Multiple somatic and inherited genetic changes that lead to loss of growth control may contribute to the development of breast cancer. Microsatellites are tandem repeats of simple sequences that occur abundantly and at random throughout most eucaryotic genomes. Microsatellite instability (MI), characterized by the presence of random contractions or expansions in the length of simple sequence repeats or microsatellites, is observed in a variety of tumors. The aim of this study was to compare tumor DNA fingerprints with constitutional DNA fingerprints to investigate changes specific to breast cancer and evaluate its correlation with clinical characteristics. Tumor and normal tissue samples of 38 patients with breast cancer were investigated by comparing PCR-amplified microsatellite sequences D2S443 and D21S1436. Microsatellite instability at D21S1436 and D2S443 was found in 5 (13%) and 7 (18%) patients, respectively. Two patients displayed instability at both marker loci. No association was found between MI and age, family history, lymph node involvement and other clinical parameters.
Simple diazonium chemistry to develop specific gene sensing platforms.

PubMed

Revenga-Parra, M; García-Mendiola, T; González-Costas, J; González-Romero, E; Marín, A García; Pau, J L; Pariente, F; Lorenzo, E

2014-02-27

A simple strategy for covalent immobilizing DNA sequences, based on the formation of stable diazonized conducting platforms, is described. The electrochemical reduction of 4-nitrobenzenediazonium salt onto screen-printed carbon electrodes (SPCE) in aqueous media gives rise to terminal grafted amino groups. The presence of primary aromatic amines allows the formation of diazonium cations capable to react with the amines present at the DNA capture probe. As a comparison a second strategy based on the binding of aminated DNA capture probes to the developed diazonized conducting platforms through a crosslinking agent was also employed. The resulting DNA sensing platforms were characterized by cyclic voltammetry, electrochemical impedance spectroscopy and spectroscopic ellipsometry. The hybridization event with the complementary sequence was detected using hexaamineruthenium (III) chloride as electrochemical indicator. Finally, they were applied to the analysis of a 145-bp sequence from the human gene MRP3, reaching a detection limit of 210 pg μL(-1). Copyright © 2014 Elsevier B.V. All rights reserved.
Isolation of a novel mutant gene for soil-surface rooting in rice (Oryza sativa L.)

PubMed Central

2013-01-01

Background Root system architecture is an important trait affecting the uptake of nutrients and water by crops. Shallower root systems preferentially take up nutrients from the topsoil and help avoid unfavorable environments in deeper soil layers. We have found a soil-surface rooting mutant from an M2 population that was regenerated from seed calli of a japonica rice cultivar, Nipponbare. In this study, we examined the genetic and physiological characteristics of this mutant. Results The primary roots of the mutant showed no gravitropic response from the seedling stage on, whereas the gravitropic response of the shoots was normal. Segregation analyses by using an F2 population derived from a cross between the soil-surface rooting mutant and wild-type Nipponbare indicated that the trait was controlled by a single recessive gene, designated as sor1. Fine mapping by using an F2 population derived from a cross between the mutant and an indica rice cultivar, Kasalath, revealed that sor1 was located within a 136-kb region between the simple sequence repeat markers RM16254 and 2935-6 on the terminal region of the short arm of chromosome 4, where 13 putative open reading frames (ORFs) were found. We sequenced these ORFs and detected a 33-bp deletion in one of them, Os04g0101800. Transgenic plants of the mutant transformed with the genomic fragment carrying the Os04g0101800 sequence from Nipponbare showed normal gravitropic responses and no soil-surface rooting. Conclusion These results suggest that sor1, a rice mutant causing soil-surface rooting and altered root gravitropic response, is allelic to Os04g0101800, and that a 33-bp deletion in the coding region of this gene causes the mutant phenotypes. PMID:24280269
Construction of nested genetic core collections to optimize the exploitation of natural diversity in Vitis vinifera L. subsp. sativa

PubMed Central

Le Cunff, Loïc; Fournier-Level, Alexandre; Laucou, Valérie; Vezzulli, Silvia; Lacombe, Thierry; Adam-Blondon, Anne-Françoise; Boursiquot, Jean-Michel; This, Patrice

2008-01-01

Background The first high quality draft of the grape genome sequence has just been published. This is a critical step in accessing all the genes of this species and increases the chances of exploiting the natural genetic diversity through association genetics. However, our basic knowledge of the extent of allelic variation within the species is still not sufficient. Towards this goal, we constructed nested genetic core collections (G-cores) to capture the simple sequence repeat (SSR) diversity of the grape cultivated compartment (Vitis vinifera L. subsp. sativa) from the world's largest germplasm collection (Domaine de Vassal, INRA Hérault, France), containing 2262 unique genotypes. Results Sub-samples of 12, 24, 48 and 92 varieties of V. vinifera L. were selected based on their genotypes for 20 SSR markers using the M-strategy. They represent respectively 58%, 73%, 83% and 100% of total SSR diversity. The capture of allelic diversity was analyzed by sequencing three genes scattered throughout the genome on 233 individuals: 41 single nucleotide polymorphisms (SNPs) were identified using the G-92 core (one SNP for every 49 nucleotides) while only 25 were observed using a larger sample of 141 individuals selected on the basis of 50 morphological traits, thus demonstrating the reliability of the approach. Conclusion The G-12 and G-24 core-collections displayed respectively 78% and 88% of the SNPs respectively, and are therefore of great interest for SNP discovery studies. Furthermore, the nested genetic core collections satisfactorily reflected the geographic and the genetic diversity of grape, which are also of great interest for the study of gene evolution in this species. PMID:18384667
Metabolic markers and microecological characteristics of tongue coating in patients with chronic gastritis

PubMed Central

2013-01-01

Background In Traditional Chinese Medicine (TCM), tongue diagnosis has been an important diagnostic method for the last 3000 years. Tongue diagnosis is a non-invasive, simple and valuable diagnostic tool. TCM treats the tongue coating on a very sensitive scale that reflects physiological and pathological changes in the organs, especially the spleen and stomach. Tongue coating can diagnose disease severity and determine the TCM syndrome (“Zheng” in Chinese). The biological bases of different tongue coating appearances are still poorly understood and lack systematic investigation at the molecular level. Methods Tongue coating samples were collected from 70 chronic gastritis patients and 20 normal controls. 16S rRNA denatured gradient gel electrophoresis (16S rRNA–DGGE) and liquid chromatography and mass spectrometry (LC–MS) were designed to profile tongue coatings. The statistical techniques used were principal component analysis and partial least squares–discriminate analysis. Results Ten potential metabolites or markers were found in chronic gastritis patients, including UDP-D-galactose, 3-ketolactose, and vitamin D2, based on LC–MS. Eight significantly different strips were observed in samples from chronic gastritis patients based on 16S rRNA–DGGE. Two strips, Strips 8 and 10, were selected for gene sequencing. Strip 10 sequencing showed a 100% similarity to Rothia mucilaginosa. Strip 8 sequencing showed a 96.2% similarity to Moraxella catarrhalis. Conclusions Changes in glucose metabolism could possibly form the basis of tongue coating conformation in chronic gastritis patients. The study revealed important connections between metabolic components, microecological components and tongue coating in chronic gastritis patients. Compared with other diagnostic regimens, such as blood tests or tissue biopsies, tongue coating is more amenable to, and more convenient for, both patients and doctors. PMID:24041039
Multiplex Amplification Refractory Mutation System Polymerase Chain Reaction (ARMS-PCR) for diagnosis of natural infection with canine distemper virus

PubMed Central

2010-01-01

Background Canine distemper virus (CDV) is present worldwide and produces a lethal systemic infection of wild and domestic Canidae. Pre-existing antibodies acquired from vaccination or previous CDV infection might interfere the interpretation of a serologic diagnosis method. In addition, due to the high similarity of nucleic acid sequences between wild-type CDV and the new vaccine strain, current PCR derived methods cannot be applied for the definite confirmation of CD infection. Hence, it is worthy of developing a simple and rapid nucleotide-based assay for differentiation of wild-type CDV which is a cause of disease from attenuated CDVs after vaccination. High frequency variations have been found in the region spanning from the 3'-untranslated region (UTR) of the matrix (M) gene to the fusion (F) gene (designated M-F UTR) in a few CDV strains. To establish a differential diagnosis assay, an amplification refractory mutation analysis was established based on the highly variable region on M-F UTR and F regions. Results Sequences of frequent polymorphisms were found scattered throughout the M-F UTR region; the identity of nucleic acid between local strains and vaccine strains ranged from 82.5% to 93.8%. A track of AAA residue located 35 nucleotides downstream from F gene start codon highly conserved in three vaccine strains were replaced with TGC in the local strains; that severed as target sequences for deign of discrimination primers. The method established in the present study successfully differentiated seven Taiwanese CDV field isolates, all belonging to the Asia-1 lineage, from vaccine strains. Conclusions The method described herein would be useful for several clinical applications, such as confirmation of nature CDV infection, evaluation of vaccination status and verification of the circulating viral genotypes. PMID:20534175
Bamboo tea: reduction of taxonomic complexity and application of DNA diagnostics based on rbcL and matK sequence data

PubMed Central

Häser, Annette

2016-01-01

Background Names used in ingredient lists of food products are trivial and in their nature rarely precise. The most recent scientific interpretation of the term bamboo (Bambusoideae, Poaceae) comprises over 1,600 distinct species. In the European Union only few of these exotic species are well known sources for food ingredients (i.e., bamboo sprouts) and are thus not considered novel foods, which would require safety assessments before marketing of corresponding products. In contrast, the use of bamboo leaves and their taxonomic origin is mostly unclear. However, products containing bamboo leaves are currently marketed. Methods We analysed bamboo species and tea products containing bamboo leaves using anatomical leaf characters and DNA sequence data. To reduce taxonomic complexity associated with the term bamboo, we used a phylogenetic framework to trace the origin of DNA from commercially available bamboo leaves within the bambusoid subfamily. For authentication purposes, we introduced a simple PCR based test distinguishing genuine bamboo from other leaf components and assessed the diagnostic potential of rbcL and matK to resolve taxonomic entities within the bamboo subfamily and tribes. Results Based on anatomical and DNA data we were able to trace the taxonomic origin of bamboo leaves used in products to the genera Phyllostachys and Pseudosasa from the temperate “woody” bamboo tribe (Arundinarieae). Currently available rbcL and matK sequence data allow the character based diagnosis of 80% of represented bamboo genera. We detected adulteration by carnation in four of eight tea products and, after adapting our objectives, could trace the taxonomic origin of the adulterant to Dianthus chinensis (Caryophyllaceae), a well known traditional Chinese medicine with counter indications for pregnant women. PMID:27957401
Is High Resolution Melting Analysis (HRMA) Accurate for Detection of Human Disease-Associated Mutations? A Meta Analysis

PubMed Central

Ma, Feng-Li; Jiang, Bo; Song, Xiao-Xiao; Xu, An-Gao

2011-01-01

Background High Resolution Melting Analysis (HRMA) is becoming the preferred method for mutation detection. However, its accuracy in the individual clinical diagnostic setting is variable. To assess the diagnostic accuracy of HRMA for human mutations in comparison to DNA sequencing in different routine clinical settings, we have conducted a meta-analysis of published reports. Methodology/Principal Findings Out of 195 publications obtained from the initial search criteria, thirty-four studies assessing the accuracy of HRMA were included in the meta-analysis. We found that HRMA was a highly sensitive test for detecting disease-associated mutations in humans. Overall, the summary sensitivity was 97.5% (95% confidence interval (CI): 96.8–98.5; I2 = 27.0%). Subgroup analysis showed even higher sensitivity for non-HR-1 instruments (sensitivity 98.7% (95%CI: 97.7–99.3; I2 = 0.0%)) and an eligible sample size subgroup (sensitivity 99.3% (95%CI: 98.1–99.8; I2 = 0.0%)). HRMA specificity showed considerable heterogeneity between studies. Sensitivity of the techniques was influenced by sample size and instrument type but by not sample source or dye type. Conclusions/Significance These findings show that HRMA is a highly sensitive, simple and low-cost test to detect human disease-associated mutations, especially for samples with mutations of low incidence. The burden on DNA sequencing could be significantly reduced by the implementation of HRMA, but it should be recognized that its sensitivity varies according to the number of samples with/without mutations, and positive results require DNA sequencing for confirmation. PMID:22194806
'PACLIMS': A component LIM system for high-throughput functional genomic analysis

PubMed Central

Donofrio, Nicole; Rajagopalon, Ravi; Brown, Douglas; Diener, Stephen; Windham, Donald; Nolin, Shelly; Floyd, Anna; Mitchell, Thomas; Galadima, Natalia; Tucker, Sara; Orbach, Marc J; Patel, Gayatri; Farman, Mark; Pampanwar, Vishal; Soderlund, Cari; Lee, Yong-Hwan; Dean, Ralph A

2005-01-01

Background Recent advances in sequencing techniques leading to cost reduction have resulted in the generation of a growing number of sequenced eukaryotic genomes. Computational tools greatly assist in defining open reading frames and assigning tentative annotations. However, gene functions cannot be asserted without biological support through, among other things, mutational analysis. In taking a genome-wide approach to functionally annotate an entire organism, in this application the ~11,000 predicted genes in the rice blast fungus (Magnaporthe grisea), an effective platform for tracking and storing both the biological materials created and the data produced across several participating institutions was required. Results The platform designed, named PACLIMS, was built to support our high throughput pipeline for generating 50,000 random insertion mutants of Magnaporthe grisea. To be a useful tool for materials and data tracking and storage, PACLIMS was designed to be simple to use, modifiable to accommodate refinement of research protocols, and cost-efficient. Data entry into PACLIMS was simplified through the use of barcodes and scanners, thus reducing the potential human error, time constraints, and labor. This platform was designed in concert with our experimental protocol so that it leads the researchers through each step of the process from mutant generation through phenotypic assays, thus ensuring that every mutant produced is handled in an identical manner and all necessary data is captured. Conclusion Many sequenced eukaryotes have reached the point where computational analyses are no longer sufficient and require biological support for their predicted genes. Consequently, there is an increasing need for platforms that support high throughput genome-wide mutational analyses. While PACLIMS was designed specifically for this project, the source and ideas present in its implementation can be used as a model for other high throughput mutational endeavors. PMID:15826298
A Sensitive Detection Method for MPLW515L or MPLW515K Mutation in Chronic Myeloproliferative Disorders with Locked Nucleic Acid-Modified Probes and Real-Time Polymerase Chain Reaction

PubMed Central

Pancrazzi, Alessandro; Guglielmelli, Paola; Ponziani, Vanessa; Bergamaschi, Gaetano; Bosi, Alberto; Barosi, Giovanni; Vannucchi, Alessandro M.

2008-01-01

Acquired mutations in the juxtamembrane region of MPL (W515K or W515L), the receptor for thrombopoietin, have been described in patients with primary myelofibrosis or essential thrombocythemia, which are chronic myeloproliferative disorders. We have developed a real-time polymerase chain reaction assay for the detection and quantification of MPL mutations that is based on locked nucleic acid fluorescent probes. Mutational analysis was performed using DNA from granulocytes. Reference curves were obtained using cloned fragments of MPL containing either the wild-type or mutated sequence; the predicted sensitivity level was at least 0.1% mutant allele in a wild-type background. None of the 60 control subjects presented with a MPLW515L/K mutation. Of 217 patients with myelofibrosis, 19 (8.7%) harbored the MPLW515 mutation, 10 (52.6%) with the W515L allele. In one case, both the W515L and W515K alleles were detected by real-time polymerase chain reaction. By comparing results obtained with conventional sequencing, no erroneous genotype attribution using real-time polymerase chain reaction was found, whereas one patient considered wild type according to sequence analysis actually harbored a low W515L allele burden. This is a simple, sensitive, and cost-effective procedure for large-scale screening of the MPLW515L/K mutation in patients suspected to have a myeloproliferative disorder. It can also provide a quantitative estimate of mutant allele burden that might be useful for both patient prognosis and monitoring response to therapy. PMID:18669880
WebSat--a web software for microsatellite marker development.

PubMed

Martins, Wellington Santos; Lucas, Divino César Soares; Neves, Kelligton Fabricio de Souza; Bertioli, David John

2009-01-01

Simple sequence repeats (SSR), also known as microsatellites, have been extensively used as molecular markers due to their abundance and high degree of polymorphism. We have developed a simple to use web software, called WebSat, for microsatellite molecular marker prediction and development. WebSat is accessible through the Internet, requiring no program installation. Although a web solution, it makes use of Ajax techniques, providing a rich, responsive user interface. WebSat allows the submission of sequences, visualization of microsatellites and the design of primers suitable for their amplification. The program allows full control of parameters and the easy export of the resulting data, thus facilitating the development of microsatellite markers. The web tool may be accessed at http://purl.oclc.org/NET/websat/

Plant genome and transcriptome annotations: from misconceptions to simple solutions

PubMed Central

Bolger, Marie E; Arsova, Borjana; Usadel, Björn

2018-01-01

Abstract Next-generation sequencing has triggered an explosion of available genomic and transcriptomic resources in the plant sciences. Although genome and transcriptome sequencing has become orders of magnitudes cheaper and more efficient, often the functional annotation process is lagging behind. This might be hampered by the lack of a comprehensive enumeration of simple-to-use tools available to the plant researcher. In this comprehensive review, we present (i) typical ontologies to be used in the plant sciences, (ii) useful databases and resources used for functional annotation, (iii) what to expect from an annotated plant genome, (iv) an automated annotation pipeline and (v) a recipe and reference chart outlining typical steps used to annotate plant genomes/transcriptomes using publicly available resources. PMID:28062412
GATA simple sequence repeats function as enhancer blocker boundaries.

PubMed

Kumar, Ram P; Krishnan, Jaya; Pratap Singh, Narendra; Singh, Lalji; Mishra, Rakesh K

2013-01-01

Simple sequence repeats (SSRs) account for ~3% of the human genome, but their functional significance still remains unclear. One of the prominent SSRs the GATA tetranucleotide repeat has preferentially accumulated in complex organisms. GATA repeats are particularly enriched on the human Y chromosome, and their non-random distribution and exclusive association with genes expressed during early development indicate their role in coordinated gene regulation. Here we show that GATA repeats have enhancer blocker activity in Drosophila and human cells. This enhancer blocker activity is seen in transgenic as well as native context of the enhancers at various developmental stages. These findings ascribe functional significance to SSRs and offer an explanation as to why SSRs, especially GATA, may have accumulated in complex organisms.
Assessment of clinical analytical sensitivity and specificity of next-generation sequencing for detection of simple and complex mutations.

PubMed

Chin, Ephrem L H; da Silva, Cristina; Hegde, Madhuri

2013-02-19

Detecting mutations in disease genes by full gene sequence analysis is common in clinical diagnostic laboratories. Sanger dideoxy terminator sequencing allows for rapid development and implementation of sequencing assays in the clinical laboratory, but it has limited throughput, and due to cost constraints, only allows analysis of one or at most a few genes in a patient. Next-generation sequencing (NGS), on the other hand, has evolved rapidly, although to date it has mainly been used for large-scale genome sequencing projects and is beginning to be used in the clinical diagnostic testing. One advantage of NGS is that many genes can be analyzed easily at the same time, allowing for mutation detection when there are many possible causative genes for a specific phenotype. In addition, regions of a gene typically not tested for mutations, like deep intronic and promoter mutations, can also be detected. Here we use 20 previously characterized Sanger-sequenced positive controls in disease-causing genes to demonstrate the utility of NGS in a clinical setting using standard PCR based amplification to assess the analytical sensitivity and specificity of the technology for detecting all previously characterized changes (mutations and benign SNPs). The positive controls chosen for validation range from simple substitution mutations to complex deletion and insertion mutations occurring in autosomal dominant and recessive disorders. The NGS data was 100% concordant with the Sanger sequencing data identifying all 119 previously identified changes in the 20 samples. We have demonstrated that NGS technology is ready to be deployed in clinical laboratories. However, NGS and associated technologies are evolving, and clinical laboratories will need to invest significantly in staff and infrastructure to build the necessary foundation for success.
Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller.

PubMed

Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun

2017-01-03

Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
Consistency of VDJ Rearrangement and Substitution Parameters Enables Accurate B Cell Receptor Sequence Annotation.

PubMed

Ralph, Duncan K; Matsen, Frederick A

2016-01-01

VDJ rearrangement and somatic hypermutation work together to produce antibody-coding B cell receptor (BCR) sequences for a remarkable diversity of antigens. It is now possible to sequence these BCRs in high throughput; analysis of these sequences is bringing new insight into how antibodies develop, in particular for broadly-neutralizing antibodies against HIV and influenza. A fundamental step in such sequence analysis is to annotate each base as coming from a specific one of the V, D, or J genes, or from an N-addition (a.k.a. non-templated insertion). Previous work has used simple parametric distributions to model transitions from state to state in a hidden Markov model (HMM) of VDJ recombination, and assumed that mutations occur via the same process across sites. However, codon frame and other effects have been observed to violate these parametric assumptions for such coding sequences, suggesting that a non-parametric approach to modeling the recombination process could be useful. In our paper, we find that indeed large modern data sets suggest a model using parameter-rich per-allele categorical distributions for HMM transition probabilities and per-allele-per-position mutation probabilities, and that using such a model for inference leads to significantly improved results. We present an accurate and efficient BCR sequence annotation software package using a novel HMM "factorization" strategy. This package, called partis (https://github.com/psathyrella/partis/), is built on a new general-purpose HMM compiler that can perform efficient inference given a simple text description of an HMM.
Cosmic microwave background radiation anisotropies in brane worlds.

PubMed

Koyama, Kazuya

2003-11-28

We propose a new formulation to calculate the cosmic microwave background (CMB) spectrum in the Randall-Sundrum two-brane model based on recent progress in solving the bulk geometry using a low energy approximation. The evolution of the anisotropic stress imprinted on the brane by the 5D Weyl tensor is calculated. An impact of the dark radiation perturbation on the CMB spectrum is investigated in a simple model assuming an initially scale-invariant adiabatic perturbation. The dark radiation perturbation induces isocurvature perturbations, but the resultant spectrum can be quite different from the prediction of simple mixtures of adiabatic and isocurvature perturbations due to Weyl anisotropic stress.
Draft Genome Sequence of the Cellulolytic Bacterium Clostridium papyrosolvens C7 (ATCC 700395).

PubMed

Zepeda, Veronica; Dassa, Bareket; Borovok, Ilya; Lamed, Raphael; Bayer, Edward A; Cate, Jamie H D

2013-09-12

We report the draft genome sequence of the cellulose-degrading bacterium Clostridium papyrosolvens C7, originally isolated from mud collected below a freshwater pond in Massachusetts. This Gram-positive bacterium grows in a mesophilic anaerobic environment with filter paper as the only carbon source, and it has a simple cellulosome system with multiple carbohydrate-degrading enzymes.
Draft Genome Sequence of the Cellulolytic Bacterium Clostridium papyrosolvens C7 (ATCC 700395)

PubMed Central

Zepeda, Veronica; Dassa, Bareket; Borovok, Ilya; Lamed, Raphael; Bayer, Edward A.

2013-01-01

We report the draft genome sequence of the cellulose-degrading bacterium Clostridium papyrosolvens C7, originally isolated from mud collected below a freshwater pond in Massachusetts. This Gram-positive bacterium grows in a mesophilic anaerobic environment with filter paper as the only carbon source, and it has a simple cellulosome system with multiple carbohydrate-degrading enzymes. PMID:24029755
Increasing Classroom Compliance: Using a High-Probability Command Sequence with Noncompliant Students

ERIC Educational Resources Information Center

Axelrod, Michael I.; Zank, Amber J.

2012-01-01

Noncompliance is one of the most problematic behaviors within the school setting. One strategy to increase compliance of noncompliant students is a high-probability command sequence (HPCS; i.e., a set of simple commands in which an individual is likely to comply immediately prior to the delivery of a command that has a lower probability of…
in silico Whole Genome Sequencer & Analyzer (iWGS): a computational pipeline to guide the design and analysis of de novo genome sequencing studies

USDA-ARS?s Scientific Manuscript database

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding it...
Simple sequence repeats in Escherichia coli: abundance, distribution, composition, and polymorphism.

PubMed

Gur-Arie, R; Cohen, C J; Eitan, Y; Shelef, L; Hallerman, E M; Kashi, Y

2000-01-01

Computer-based genome-wide screening of the DNA sequence of Escherichia coli strain K12 revealed tens of thousands of tandem simple sequence repeat (SSR) tracts, with motifs ranging from 1 to 6 nucleotides. SSRs were well distributed throughout the genome. Mononucleotide SSRs were over-represented in noncoding regions and under-represented in open reading frames (ORFs). Nucleotide composition of mono- and dinucleotide SSRs, both in ORFs and in noncoding regions, differed from that of the genomic region in which they occurred, with 93% of all mononucleotide SSRs proving to be of A or T. Computer-based analysis of the fine position of every SSR locus in the noncoding portion of the genome relative to downstream ORFs showed SSRs located in areas that could affect gene regulation. DNA sequences at 14 arbitrarily chosen SSR tracts were compared among E. coli strains. Polymorphisms of SSR copy number were observed at four of seven mononucleotide SSR tracts screened, with all polymorphisms occurring in noncoding regions. SSR polymorphism could prove important as a genome-wide source of variation, both for practical applications (including rapid detection, strain identification, and detection of loci affecting key phenotypes) and for evolutionary adaptation of microbes.
Improved Spin-Echo-Edited NMR Diffusion Measurements

NASA Astrophysics Data System (ADS)

Otto, William H.; Larive, Cynthia K.

2001-12-01

The need for simple and robust schemes for the analysis of ligand-protein binding has resulted in the development of diffusion-based NMR techniques that can be used to assay binding in protein solutions containing a mixture of several ligands. As a means of gaining spectral selectivity in NMR diffusion measurements, a simple experiment, the gradient modified spin-echo (GOSE), has been developed to reject the resonances of coupled spins and detect only the singlets in the 1H NMR spectrum. This is accomplished by first using a spin echo to null the resonances of the coupled spins. Following the spin echo, the singlet magnetization is flipped out of the transverse plane and a dephasing gradient is applied to reduce the spectral artifacts resulting from incomplete cancellation of the J-coupled resonances. The resulting modular sequence is combined here with the BPPSTE pulse sequence; however, it could be easily incorporated into any pulse sequence where additional spectral selectivity is desired. Results obtained with the GOSE-BPPSTE pulse sequence are compared with those obtained with the BPPSTE and CPMG-BPPSTE experiments for a mixture containing the ligands resorcinol and tryptophan in a solution of human serum albumin.
Using Graphical Notations to Assess Children's Experiencing of Simple and Complex Musical Fragments

ERIC Educational Resources Information Center

Verschaffel, Lieven; Reybrouck, Mark; Janssens, Marjan; Van Dooren, Wim

2010-01-01

The aim of this study was to analyze children's graphical notations as external representations of their experiencing when listening to simple sonic stimuli and complex musical fragments. More specifically, we assessed the impact of four factors on children's notations: age, musical background, complexity of the fragment, and most salient…
DNA sequencing using fluorescence background electroblotting membrane

DOEpatents

Caldwell, Karin D.; Chu, Tun-Jen; Pitt, William G.

1992-01-01

A method for the multiplex sequencing on DNA is disclosed which comprises the electroblotting or specific base terminated DNA fragments, which have been resolved by gel electrophoresis, onto the surface of a neutral non-aromatic polymeric microporous membrane exhibiting low background fluorescence which has been surface modified to contain amino groups. Polypropylene membranes are preferably and the introduction of amino groups is accomplished by subjecting the membrane to radio or microwave frequency plasma discharge in the presence of an aminating agent, preferably ammonia. The membrane, containing physically adsorbed DNA fragments on its surface after the electroblotting, is then treated with crosslinking means such as UV radiation or a glutaraldehyde spray to chemically bind the DNA fragments to the membrane through said smino groups contained on the surface thereof. The DNA fragments chemically bound to the membrane are subjected to hybridization probing with a tagged probe specific to the sequence of the DNA fragments. The tagging may be by either fluorophores or radioisotopes. The tagged probes hybridized to said target DNA fragments are detected and read by laser induced fluorescence detection or autoradiograms. The use of aminated low fluorescent background membranes allows the use of fluorescent detection and reading even when the available amount of DNA to be sequenced is small. The DNA bound to the membrances may be reprobed numerous times.
DNA sequencing using fluorescence background electroblotting membrane

DOEpatents

Caldwell, K.D.; Chu, T.J.; Pitt, W.G.

1992-05-12

A method for the multiplex sequencing on DNA is disclosed which comprises the electroblotting or specific base terminated DNA fragments, which have been resolved by gel electrophoresis, onto the surface of a neutral non-aromatic polymeric microporous membrane exhibiting low background fluorescence which has been surface modified to contain amino groups. Polypropylene membranes are preferably and the introduction of amino groups is accomplished by subjecting the membrane to radio or microwave frequency plasma discharge in the presence of an aminating agent, preferably ammonia. The membrane, containing physically adsorbed DNA fragments on its surface after the electroblotting, is then treated with crosslinking means such as UV radiation or a glutaraldehyde spray to chemically bind the DNA fragments to the membrane through amino groups contained on the surface. The DNA fragments chemically bound to the membrane are subjected to hybridization probing with a tagged probe specific to the sequence of the DNA fragments. The tagging may be by either fluorophores or radioisotopes. The tagged probes hybridized to the target DNA fragments are detected and read by laser induced fluorescence detection or autoradiograms. The use of aminated low fluorescent background membranes allows the use of fluorescent detection and reading even when the available amount of DNA to be sequenced is small. The DNA bound to the membranes may be reprobed numerous times. No Drawings
On the normalization of the minimum free energy of RNAs by sequence length.

PubMed

Trotta, Edoardo

2014-01-01

The minimum free energy (MFE) of ribonucleic acids (RNAs) increases at an apparent linear rate with sequence length. Simple indices, obtained by dividing the MFE by the number of nucleotides, have been used for a direct comparison of the folding stability of RNAs of various sizes. Although this normalization procedure has been used in several studies, the relationship between normalized MFE and length has not yet been investigated in detail. Here, we demonstrate that the variation of MFE with sequence length is not linear and is significantly biased by the mathematical formula used for the normalization procedure. For this reason, the normalized MFEs strongly decrease as hyperbolic functions of length and produce unreliable results when applied for the comparison of sequences with different sizes. We also propose a simple modification of the normalization formula that corrects the bias enabling the use of the normalized MFE for RNAs longer than 40 nt. Using the new corrected normalized index, we analyzed the folding free energies of different human RNA families showing that most of them present an average MFE density more negative than expected for a typical genomic sequence. Furthermore, we found that a well-defined and restricted range of MFE density characterizes each RNA family, suggesting the use of our corrected normalized index to improve RNA prediction algorithms. Finally, in coding and functional human RNAs the MFE density appears scarcely correlated with sequence length, consistent with a negligible role of thermodynamic stability demands in determining RNA size.
On the Normalization of the Minimum Free Energy of RNAs by Sequence Length

PubMed Central

Trotta, Edoardo

2014-01-01

The minimum free energy (MFE) of ribonucleic acids (RNAs) increases at an apparent linear rate with sequence length. Simple indices, obtained by dividing the MFE by the number of nucleotides, have been used for a direct comparison of the folding stability of RNAs of various sizes. Although this normalization procedure has been used in several studies, the relationship between normalized MFE and length has not yet been investigated in detail. Here, we demonstrate that the variation of MFE with sequence length is not linear and is significantly biased by the mathematical formula used for the normalization procedure. For this reason, the normalized MFEs strongly decrease as hyperbolic functions of length and produce unreliable results when applied for the comparison of sequences with different sizes. We also propose a simple modification of the normalization formula that corrects the bias enabling the use of the normalized MFE for RNAs longer than 40 nt. Using the new corrected normalized index, we analyzed the folding free energies of different human RNA families showing that most of them present an average MFE density more negative than expected for a typical genomic sequence. Furthermore, we found that a well-defined and restricted range of MFE density characterizes each RNA family, suggesting the use of our corrected normalized index to improve RNA prediction algorithms. Finally, in coding and functional human RNAs the MFE density appears scarcely correlated with sequence length, consistent with a negligible role of thermodynamic stability demands in determining RNA size. PMID:25405875
Forensic Loci Allele Database (FLAD): Automatically generated, permanent identifiers for sequenced forensic alleles.

PubMed

Van Neste, Christophe; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

2016-01-01

It is difficult to predict if and when massively parallel sequencing of forensic STR loci will replace capillary electrophoresis as the new standard technology in forensic genetics. The main benefits of sequencing are increased multiplexing scales and SNP detection. There is not yet a consensus on how sequenced profiles should be reported. We present the Forensic Loci Allele Database (FLAD) service, made freely available on http://forensic.ugent.be/FLAD/. It offers permanent identifiers for sequenced forensic alleles (STR or SNP) and their microvariants for use in forensic allele nomenclature. Analogous to Genbank, its aim is to provide permanent identifiers for forensically relevant allele sequences. Researchers that are developing forensic sequencing kits or are performing population studies, can register on http://forensic.ugent.be/FLAD/ and add loci and allele sequences with a short and simple application interface (API). Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
Single molecule targeted sequencing for cancer gene mutation detection.

PubMed

Gao, Yan; Deng, Liwei; Yan, Qin; Gao, Yongqian; Wu, Zengding; Cai, Jinsen; Ji, Daorui; Li, Gailing; Wu, Ping; Jin, Huan; Zhao, Luyang; Liu, Song; Ge, Liangjin; Deem, Michael W; He, Jiankui

2016-05-19

With the rapid decline in cost of sequencing, it is now affordable to examine multiple genes in a single disease-targeted clinical test using next generation sequencing. Current targeted sequencing methods require a separate step of targeted capture enrichment during sample preparation before sequencing. Although there are fast sample preparation methods available in market, the library preparation process is still relatively complicated for physicians to use routinely. Here, we introduced an amplification-free Single Molecule Targeted Sequencing (SMTS) technology, which combined targeted capture and sequencing in one step. We demonstrated that this technology can detect low-frequency mutations using artificially synthesized DNA sample. SMTS has several potential advantages, including simple sample preparation thus no biases and errors are introduced by PCR reaction. SMTS has the potential to be an easy and quick sequencing technology for clinical diagnosis such as cancer gene mutation detection, infectious disease detection, inherited condition screening and noninvasive prenatal diagnosis.
Mining and Development of Novel SSR Markers Using Next Generation Sequencing (NGS) Data in Plants.

PubMed

Taheri, Sima; Lee Abdullah, Thohirah; Yusop, Mohd Rafii; Hanafi, Mohamed Musa; Sahebi, Mahbod; Azizi, Parisa; Shamshiri, Redmond Ramin

2018-02-13

Microsatellites, or simple sequence repeats (SSRs), are one of the most informative and multi-purpose genetic markers exploited in plant functional genomics. However, the discovery of SSRs and development using traditional methods are laborious, time-consuming, and costly. Recently, the availability of high-throughput sequencing technologies has enabled researchers to identify a substantial number of microsatellites at less cost and effort than traditional approaches. Illumina is a noteworthy transcriptome sequencing technology that is currently used in SSR marker development. Although 454 pyrosequencing datasets can be used for SSR development, this type of sequencing is no longer supported. This review aims to present an overview of the next generation sequencing, with a focus on the efficient use of de novo transcriptome sequencing (RNA-Seq) and related tools for mining and development of microsatellites in plants.

SEXCMD: Development and validation of sex marker sequences for whole-exome/genome and RNA sequencing.

PubMed

Jeong, Seongmun; Kim, Jiwoong; Park, Won; Jeon, Hongmin; Kim, Namshin

2017-01-01

Over the last decade, a large number of nucleotide sequences have been generated by next-generation sequencing technologies and deposited to public databases. However, most of these datasets do not specify the sex of individuals sampled because researchers typically ignore or hide this information. Male and female genomes in many species have distinctive sex chromosomes, XX/XY and ZW/ZZ, and expression levels of many sex-related genes differ between the sexes. Herein, we describe how to develop sex marker sequences from syntenic regions of sex chromosomes and use them to quickly identify the sex of individuals being analyzed. Array-based technologies routinely use either known sex markers or the B-allele frequency of X or Z chromosomes to deduce the sex of an individual. The same strategy has been used with whole-exome/genome sequence data; however, all reads must be aligned onto a reference genome to determine the B-allele frequency of the X or Z chromosomes. SEXCMD is a pipeline that can extract sex marker sequences from reference sex chromosomes and rapidly identify the sex of individuals from whole-exome/genome and RNA sequencing after training with a known dataset through a simple machine learning approach. The pipeline counts total numbers of hits from sex-specific marker sequences and identifies the sex of the individuals sampled based on the fact that XX/ZZ samples do not have Y or W chromosome hits. We have successfully validated our pipeline with mammalian (Homo sapiens; XY) and avian (Gallus gallus; ZW) genomes. Typical calculation time when applying SEXCMD to human whole-exome or RNA sequencing datasets is a few minutes, and analyzing human whole-genome datasets takes about 10 minutes. Another important application of SEXCMD is as a quality control measure to avoid mixing samples before bioinformatics analysis. SEXCMD comprises simple Python and R scripts and is freely available at https://github.com/lovemun/SEXCMD.
An innovative diagnostic technology for the codon mutation C580Y in kelch13 of Plasmodium falciparum with MinION nanopore sequencer.

PubMed

Imai, Kazuo; Tarumoto, Norihito; Runtuwene, Lucky Ronald; Sakai, Jun; Hayashida, Kyoko; Eshita, Yuki; Maeda, Ryuichiro; Tuda, Josef; Ohno, Hideaki; Murakami, Takashi; Maesaki, Shigefumi; Suzuki, Yutaka; Yamagishi, Junya; Maeda, Takuya

2018-05-29

The recent spread of artemisinin (ART)-resistant Plasmodium falciparum represents an emerging global threat to public health. In Southeast Asia, the C580Y mutation of kelch13 (k13) is the dominant mutation of ART-resistant P. falciparum. Therefore, a simple method for the detection of C580Y mutation is urgently needed to enable widespread routine surveillance in the field. The aim of this study is to develop a new diagnostic procedure for the C580Y mutation using loop-mediated isothermal amplification (LAMP) combined with the MinION nanopore sequencer. A LAMP assay for the k13 gene of P. falciparum to detect the C580Y mutation was successfully developed. The detection limit of this procedure was 10 copies of the reference plasmid harboring the k13 gene within 60 min. Thereafter, amplicon sequencing of the LAMP products using the MinION nanopore sequencer was performed to clarify the nucleotide sequences of the gene. The C580Y mutation was identified based on the sequence data collected from MinION reads 30 min after the start of sequencing. Further, clinical evaluation of the LAMP assay in 34 human blood samples collected from patients with P. falciparum malaria in Indonesia revealed a positive detection rate of 100%. All LAMP amplicons of up to 12 specimens were simultaneously sequenced using MinION. The results of sequencing were consistent with those of the conventional PCR and Sanger sequencing protocol. All procedures from DNA extraction to variant calling were completed within 3 h. The C580Y mutation was not found among these 34 P. falciparum isolates in Indonesia. An innovative method combining LAMP and MinION will enable simple, rapid, and high-sensitivity detection of the C580Y mutation of P. falciparum, even in resource-limited situations in developing countries.
TrawlerWeb: an online de novo motif discovery tool for next-generation sequencing datasets.

PubMed

Dang, Louis T; Tondl, Markus; Chiu, Man Ho H; Revote, Jerico; Paten, Benedict; Tano, Vincent; Tokolyi, Alex; Besse, Florence; Quaife-Ryan, Greg; Cumming, Helen; Drvodelic, Mark J; Eichenlaub, Michael P; Hallab, Jeannette C; Stolper, Julian S; Rossello, Fernando J; Bogoyevitch, Marie A; Jans, David A; Nim, Hieu T; Porrello, Enzo R; Hudson, James E; Ramialison, Mirana

2018-04-05

A strong focus of the post-genomic era is mining of the non-coding regulatory genome in order to unravel the function of regulatory elements that coordinate gene expression (Nat 489:57-74, 2012; Nat 507:462-70, 2014; Nat 507:455-61, 2014; Nat 518:317-30, 2015). Whole-genome approaches based on next-generation sequencing (NGS) have provided insight into the genomic location of regulatory elements throughout different cell types, organs and organisms. These technologies are now widespread and commonly used in laboratories from various fields of research. This highlights the need for fast and user-friendly software tools dedicated to extracting cis-regulatory information contained in these regulatory regions; for instance transcription factor binding site (TFBS) composition. Ideally, such tools should not require prior programming knowledge to ensure they are accessible for all users. We present TrawlerWeb, a web-based version of the Trawler_standalone tool (Nat Methods 4:563-5, 2007; Nat Protoc 5:323-34, 2010), to allow for the identification of enriched motifs in DNA sequences obtained from next-generation sequencing experiments in order to predict their TFBS composition. TrawlerWeb is designed for online queries with standard options common to web-based motif discovery tools. In addition, TrawlerWeb provides three unique new features: 1) TrawlerWeb allows the input of BED files directly generated from NGS experiments, 2) it automatically generates an input-matched biologically relevant background, and 3) it displays resulting conservation scores for each instance of the motif found in the input sequences, which assists the researcher in prioritising the motifs to validate experimentally. Finally, to date, this web-based version of Trawler_standalone remains the fastest online de novo motif discovery tool compared to other popular web-based software, while generating predictions with high accuracy. TrawlerWeb provides users with a fast, simple and easy-to-use web interface for de novo motif discovery. This will assist in rapidly analysing NGS datasets that are now being routinely generated. TrawlerWeb is freely available and accessible at: http://trawler.erc.monash.edu.au .
High performance MRI simulations of motion on multi-GPU systems

PubMed Central

2014-01-01

Background MRI physics simulators have been developed in the past for optimizing imaging protocols and for training purposes. However, these simulators have only addressed motion within a limited scope. The purpose of this study was the incorporation of realistic motion, such as cardiac motion, respiratory motion and flow, within MRI simulations in a high performance multi-GPU environment. Methods Three different motion models were introduced in the Magnetic Resonance Imaging SIMULator (MRISIMUL) of this study: cardiac motion, respiratory motion and flow. Simulation of a simple Gradient Echo pulse sequence and a CINE pulse sequence on the corresponding anatomical model was performed. Myocardial tagging was also investigated. In pulse sequence design, software crushers were introduced to accommodate the long execution times in order to avoid spurious echoes formation. The displacement of the anatomical model isochromats was calculated within the Graphics Processing Unit (GPU) kernel for every timestep of the pulse sequence. Experiments that would allow simulation of custom anatomical and motion models were also performed. Last, simulations of motion with MRISIMUL on single-node and multi-node multi-GPU systems were examined. Results Gradient Echo and CINE images of the three motion models were produced and motion-related artifacts were demonstrated. The temporal evolution of the contractility of the heart was presented through the application of myocardial tagging. Better simulation performance and image quality were presented through the introduction of software crushers without the need to further increase the computational load and GPU resources. Last, MRISIMUL demonstrated an almost linear scalable performance with the increasing number of available GPU cards, in both single-node and multi-node multi-GPU computer systems. Conclusions MRISIMUL is the first MR physics simulator to have implemented motion with a 3D large computational load on a single computer multi-GPU configuration. The incorporation of realistic motion models, such as cardiac motion, respiratory motion and flow may benefit the design and optimization of existing or new MR pulse sequences, protocols and algorithms, which examine motion related MR applications. PMID:24996972
Genome signature analysis of thermal virus metagenomes reveals Archaea and thermophilic signatures

PubMed Central

Pride, David T; Schoenfeld, Thomas

2008-01-01

Background Metagenomic analysis provides a rich source of biological information for otherwise intractable viral communities. However, study of viral metagenomes has been hampered by its nearly complete reliance on BLAST algorithms for identification of DNA sequences. We sought to develop algorithms for examination of viral metagenomes to identify the origin of sequences independent of BLAST algorithms. We chose viral metagenomes obtained from two hot springs, Bear Paw and Octopus, in Yellowstone National Park, as they represent simple microbial populations where comparatively large contigs were obtained. Thermal spring metagenomes have high proportions of sequences without significant Genbank homology, which has hampered identification of viruses and their linkage with hosts. To analyze each metagenome, we developed a method to classify DNA fragments using genome signature-based phylogenetic classification (GSPC), where metagenomic fragments are compared to a database of oligonucleotide signatures for all previously sequenced Bacteria, Archaea, and viruses. Results From both Bear Paw and Octopus hot springs, each assembled contig had more similarity to other metagenome contigs than to any sequenced microbial genome based on GSPC analysis, suggesting a genome signature common to each of these extreme environments. While viral metagenomes from Bear Paw and Octopus share some similarity, the genome signatures from each locale are largely unique. GSPC using a microbial database predicts most of the Octopus metagenome has archaeal signatures, while bacterial signatures predominate in Bear Paw; a finding consistent with those of Genbank BLAST. When using a viral database, the majority of the Octopus metagenome is predicted to belong to archaeal virus Families Globuloviridae and Fuselloviridae, while none of the Bear Paw metagenome is predicted to belong to archaeal viruses. As expected, when microbial and viral databases are combined, each of the Octopus and Bear Paw metagenomic contigs are predicted to belong to viruses rather than to any Bacteria or Archaea, consistent with the apparent viral origin of both metagenomes. Conclusion That BLAST searches identify no significant homologs for most metagenome contigs, while GSPC suggests their origin as archaeal viruses or bacteriophages, indicates GSPC provides a complementary approach in viral metagenomic analysis. PMID:18798991
Novel numerical and graphical representation of DNA sequences and proteins.

PubMed

Randić, M; Novic, M; Vikić-Topić, D; Plavsić, D

2006-12-01

We have introduced novel numerical and graphical representations of DNA, which offer a simple and unique characterization of DNA sequences. The numerical representation of a DNA sequence is given as a sequence of real numbers derived from a unique graphical representation of the standard genetic code. There is no loss of information on the primary structure of a DNA sequence associated with this numerical representation. The novel representations are illustrated with the coding sequences of the first exon of beta-globin gene of half a dozen species in addition to human. The method can be extended to proteins as is exemplified by humanin, a 24-aa peptide that has recently been identified as a specific inhibitor of neuronal cell death induced by familial Alzheimer's disease mutant genes.
Complexity and Entropy Analysis of DNMT1 Gene

USDA-ARS?s Scientific Manuscript database

Background: The application of complexity information on DNA sequence and protein in biological processes are well established in this study. Available sequences for DNMT1 gene, which is a maintenance methyltransferase is responsible for copying DNA methylation patterns to the daughter strands durin...
An interleaved sequence for simultaneous magnetic resonance angiography (MRA), susceptibility weighted imaging (SWI) and quantitative susceptibility mapping (QSM).

PubMed

Chen, Yongsheng; Liu, Saifeng; Buch, Sagar; Hu, Jiani; Kang, Yan; Haacke, E Mark

2018-04-01

To image the entire vasculature of the brain with complete suppression of signal from background tissue using a single 3D excitation interleaved rephased/dephased multi-echo gradient echo sequence. This ensures no loss of signal from fast flow and provides co-registered susceptibility weighted images (SWI) and quantitative susceptibility maps (QSM) from the same scan. The suppression of background tissue was accomplished by subtracting the flow-dephased images from the flow-rephased images with the same echo time of 12.5ms to generate a magnetic resonance angiogram and venogram (MRAV). Further, a 2.5ms flow-compensated echo was added in the rephased portion to provide sufficient signal for major arteries with fast flow. The QSM data from the rephased 12.5ms echo was used to suppress veins on the MRAV to generate an artery-only MRA. The proposed approach was tested on five healthy volunteers at 3T. This three-echo interleaved GRE sequence provided complete background suppression of stationary tissues, while the short echo data gave high signal in the internal carotid and middle cerebral arteries (MCA). The contrast-to-noise ratio (CNR) of the arteries was significantly improved in the M3 territory of the MCA compared to the non-linear subtraction MRA and TOF-MRA. Veins were suppressed successfully utilizing the QSM data. The background tissue can be properly suppressed using the proposed interleaved MRAV sequence. One can obtain whole brain MRAV, MRA, SWI, true-SWI (or tSWI) and QSM data simultaneously from a single scan. Published by Elsevier Inc.
High-throughput sequencing of forensic genetic samples using punches of FTA cards with buccal swabs.

PubMed

Kampmann, Marie-Louise; Buchard, Anders; Børsting, Claus; Morling, Niels

2016-01-01

Here, we demonstrate that punches from buccal swab samples preserved on FTA cards can be used for high-throughput DNA sequencing, also known as massively parallel sequencing (MPS). We typed 44 reference samples with the HID-Ion AmpliSeq Identity Panel using washed 1.2 mm punches from FTA cards with buccal swabs and compared the results with those obtained with DNA extracted using the EZ1 DNA Investigator Kit. Concordant profiles were obtained for all samples. Our protocol includes simple punch, wash, and PCR steps, reducing cost and hands-on time in the laboratory. Furthermore, it facilitates automation of DNA sequencing.
Arbitrarily accurate twin composite π -pulse sequences

NASA Astrophysics Data System (ADS)

Torosov, Boyan T.; Vitanov, Nikolay V.

2018-04-01

We present three classes of symmetric broadband composite pulse sequences. The composite phases are given by analytic formulas (rational fractions of π ) valid for any number of constituent pulses. The transition probability is expressed by simple analytic formulas and the order of pulse area error compensation grows linearly with the number of pulses. Therefore, any desired compensation order can be produced by an appropriate composite sequence; in this sense, they are arbitrarily accurate. These composite pulses perform equally well as or better than previously published ones. Moreover, the current sequences are more flexible as they allow total pulse areas of arbitrary integer multiples of π .
SHAPE Selection (SHAPES) enrich for RNA structure signal in SHAPE sequencing-based probing data

PubMed Central

Poulsen, Line Dahl; Kielpinski, Lukasz Jan; Salama, Sofie R.; Krogh, Anders; Vinther, Jeppe

2015-01-01

Selective 2′ Hydroxyl Acylation analyzed by Primer Extension (SHAPE) is an accurate method for probing of RNA secondary structure. In existing SHAPE methods, the SHAPE probing signal is normalized to a no-reagent control to correct for the background caused by premature termination of the reverse transcriptase. Here, we introduce a SHAPE Selection (SHAPES) reagent, N-propanone isatoic anhydride (NPIA), which retains the ability of SHAPE reagents to accurately probe RNA structure, but also allows covalent coupling between the SHAPES reagent and a biotin molecule. We demonstrate that SHAPES-based selection of cDNA–RNA hybrids on streptavidin beads effectively removes the large majority of background signal present in SHAPE probing data and that sequencing-based SHAPES data contain the same amount of RNA structure data as regular sequencing-based SHAPE data obtained through normalization to a no-reagent control. Moreover, the selection efficiently enriches for probed RNAs, suggesting that the SHAPES strategy will be useful for applications with high-background and low-probing signal such as in vivo RNA structure probing. PMID:25805860
Simple Sequence Repeats in Escherichia coli: Abundance, Distribution, Composition, and Polymorphism

PubMed Central

Gur-Arie, Riva; Cohen, Cyril J.; Eitan, Yuval; Shelef, Leora; Hallerman, Eric M.; Kashi, Yechezkel

2000-01-01

Computer-based genome-wide screening of the DNA sequence of Escherichia coli strain K12 revealed tens of thousands of tandem simple sequence repeat (SSR) tracts, with motifs ranging from 1 to 6 nucleotides. SSRs were well distributed throughout the genome. Mononucleotide SSRs were over-represented in noncoding regions and under-represented in open reading frames (ORFs). Nucleotide composition of mono- and dinucleotide SSRs, both in ORFs and in noncoding regions, differed from that of the genomic region in which they occurred, with 93% of all mononucleotide SSRs proving to be of A or T. Computer-based analysis of the fine position of every SSR locus in the noncoding portion of the genome relative to downstream ORFs showed SSRs located in areas that could affect gene regulation. DNA sequences at 14 arbitrarily chosen SSR tracts were compared among E. coli strains. Polymorphisms of SSR copy number were observed at four of seven mononucleotide SSR tracts screened, with all polymorphisms occurring in noncoding regions. SSR polymorphism could prove important as a genome-wide source of variation, both for practical applications (including rapid detection, strain identification, and detection of loci affecting key phenotypes) and for evolutionary adaptation of microbes.[The sequence data described in this paper have been submitted to the GenBank data library under accession numbers AF209020–209030 and AF209508–209518.] PMID:10645951
A founder large deletion mutation in Xeroderma pigmentosum-Variant form in Tunisia: implication for molecular diagnosis and therapy.

PubMed

Ben Rekaya, Mariem; Laroussi, Nadia; Messaoud, Olfa; Jones, Mariem; Jerbi, Manel; Naouali, Chokri; Bouyacoub, Yosra; Chargui, Mariem; Kefi, Rym; Fazaa, Becima; Boubaker, Mohamed Samir; Boussen, Hamouda; Mokni, Mourad; Abdelhak, Sonia; Zghal, Mohamed; Khaled, Aida; Yacoub-Youssef, Houda

2014-01-01

Xeroderma pigmentosum Variant (XP-V) form is characterized by a late onset of skin symptoms. Our aim is the clinical and genetic investigations of XP-V Tunisian patients in order to develop a simple tool for early diagnosis. We investigated 16 suspected XP patients belonging to ten consanguineous families. Analysis of the POLH gene was performed by linkage analysis, long range PCR, and sequencing. Genetic analysis showed linkage to the POLH gene with a founder haplotype in all affected patients. Long range PCR of exon 9 to exon 11 showed a 3926 bp deletion compared to control individuals. Sequence analysis demonstrates that this deletion has occurred between two Alu-Sq2 repetitive sequences in the same orientation, respectively, in introns 9 and 10. We suggest that this mutation POLH NG_009252.1: g.36847_40771del3925 is caused by an equal crossover event that occurred between two homologous chromosomes at meiosis. These results allowed us to develop a simple test based on a simple PCR in order to screen suspected XP-V patients. In Tunisia, the prevalence of XP-V group seems to be underestimated and clinical diagnosis is usually later. Cascade screening of this founder mutation by PCR in regions with high frequency of XP provides a rapid and cost-effective tool for early diagnosis of XP-V in Tunisia and North Africa.
A Founder Large Deletion Mutation in Xeroderma Pigmentosum-Variant Form in Tunisia: Implication for Molecular Diagnosis and Therapy

PubMed Central

Ben Rekaya, Mariem; Laroussi, Nadia; Messaoud, Olfa; Jones, Mariem; Jerbi, Manel; Bouyacoub, Yosra; Chargui, Mariem; Kefi, Rym; Fazaa, Becima; Boubaker, Mohamed Samir; Boussen, Hamouda; Mokni, Mourad; Abdelhak, Sonia; Zghal, Mohamed; Khaled, Aida; Yacoub-Youssef, Houda

2014-01-01

Xeroderma pigmentosum Variant (XP-V) form is characterized by a late onset of skin symptoms. Our aim is the clinical and genetic investigations of XP-V Tunisian patients in order to develop a simple tool for early diagnosis. We investigated 16 suspected XP patients belonging to ten consanguineous families. Analysis of the POLH gene was performed by linkage analysis, long range PCR, and sequencing. Genetic analysis showed linkage to the POLH gene with a founder haplotype in all affected patients. Long range PCR of exon 9 to exon 11 showed a 3926 bp deletion compared to control individuals. Sequence analysis demonstrates that this deletion has occurred between two Alu-Sq2 repetitive sequences in the same orientation, respectively, in introns 9 and 10. We suggest that this mutation POLH NG_009252.1: g.36847_40771del3925 is caused by an equal crossover event that occurred between two homologous chromosomes at meiosis. These results allowed us to develop a simple test based on a simple PCR in order to screen suspected XP-V patients. In Tunisia, the prevalence of XP-V group seems to be underestimated and clinical diagnosis is usually later. Cascade screening of this founder mutation by PCR in regions with high frequency of XP provides a rapid and cost-effective tool for early diagnosis of XP-V in Tunisia and North Africa. PMID:24877075
Long-Term Predictive and Feedback Encoding of Motor Signals in the Simple Spike Discharge of Purkinje Cells

PubMed Central

Popa, Laurentiu S.; Streng, Martha L.

2017-01-01

Abstract Most hypotheses of cerebellar function emphasize a role in real-time control of movements. However, the cerebellum’s use of current information to adjust future movements and its involvement in sequencing, working memory, and attention argues for predicting and maintaining information over extended time windows. The present study examines the time course of Purkinje cell discharge modulation in the monkey (Macaca mulatta) during manual, pseudo-random tracking. Analysis of the simple spike firing from 183 Purkinje cells during tracking reveals modulation up to 2 s before and after kinematics and position error. Modulation significance was assessed against trial shuffled firing, which decoupled simple spike activity from behavior and abolished long-range encoding while preserving data statistics. Position, velocity, and position errors have the most frequent and strongest long-range feedforward and feedback modulations, with less common, weaker long-term correlations for speed and radial error. Position, velocity, and position errors can be decoded from the population simple spike firing with considerable accuracy for even the longest predictive (-2000 to -1500 ms) and feedback (1500 to 2000 ms) epochs. Separate analysis of the simple spike firing in the initial hold period preceding tracking shows similar long-range feedforward encoding of the upcoming movement and in the final hold period feedback encoding of the just completed movement, respectively. Complex spike analysis reveals little long-term modulation with behavior. We conclude that Purkinje cell simple spike discharge includes short- and long-range representations of both upcoming and preceding behavior that could underlie cerebellar involvement in error correction, working memory, and sequencing. PMID:28413823
Iteration with Spreadsheets.

ERIC Educational Resources Information Center

Smith, Michael

1990-01-01

Presents several examples of the iteration method using computer spreadsheets. Examples included are simple iterative sequences and the solution of equations using the Newton-Raphson formula, linear interpolation, and interval bisection. (YP)
A SSR-based composite genetic linkage map for the cultivated peanut (Arachis hypogaea L.) genome

PubMed Central

2010-01-01

Background The construction of genetic linkage maps for cultivated peanut (Arachis hypogaea L.) has and continues to be an important research goal to facilitate quantitative trait locus (QTL) analysis and gene tagging for use in a marker-assisted selection in breeding. Even though a few maps have been developed, they were constructed using diploid or interspecific tetraploid populations. The most recently published intra-specific map was constructed from the cross of cultivated peanuts, in which only 135 simple sequence repeat (SSR) markers were sparsely populated in 22 linkage groups. The more detailed linkage map with sufficient markers is necessary to be feasible for QTL identification and marker-assisted selection. The objective of this study was to construct a genetic linkage map of cultivated peanut using simple sequence repeat (SSR) markers derived primarily from peanut genomic sequences, expressed sequence tags (ESTs), and by "data mining" sequences released in GenBank. Results Three recombinant inbred lines (RILs) populations were constructed from three crosses with one common female parental line Yueyou 13, a high yielding Spanish market type. The four parents were screened with 1044 primer pairs designed to amplify SSRs and 901 primer pairs produced clear PCR products. Of the 901 primer pairs, 146, 124 and 64 primer pairs (markers) were polymorphic in these populations, respectively, and used in genotyping these RIL populations. Individual linkage maps were constructed from each of the three populations and a composite map based on 93 common loci were created using JoinMap. The composite linkage maps consist of 22 composite linkage groups (LG) with 175 SSR markers (including 47 SSRs on the published AA genome maps), representing the 20 chromosomes of A. hypogaea. The total composite map length is 885.4 cM, with an average marker density of 5.8 cM. Segregation distortion in the 3 populations was 23.0%, 13.5% and 7.8% of the markers, respectively. These distorted loci tended to cluster on LG1, LG3, LG4 and LG5. There were only 15 EST-SSR markers mapped due to low polymorphism. By comparison, there were potential synteny, collinear order of some markers and conservation of collinear linkage groups among the maps and with the AA genome but not fully conservative. Conclusion A composite linkage map was constructed from three individual mapping populations with 175 SSR markers in 22 composite linkage groups. This composite genetic linkage map is among the first "true" tetraploid peanut maps produced. This map also consists of 47 SSRs that have been used in the published AA genome maps, and could be used in comparative mapping studies. The primers described in this study are PCR-based markers, which are easy to share for genetic mapping in peanuts. All 1044 primer pairs are provided as additional files and the three RIL populations will be made available to public upon request for quantitative trait loci (QTL) analysis and linkage map improvement. PMID:20105299
Flexible, Mastery-Oriented Astrophysics Sequence.

ERIC Educational Resources Information Center

Zeilik, Michael, II

1981-01-01

Describes the implementation and impact of a two-semester mastery-oriented astrophysics sequence for upper-level physics/astrophysics majors designed to handle flexibly a wide range of student backgrounds. A Personalized System of Instruction (PSI) format was used fostering frequent student-instructor interaction and role-modeling behavior in…
FUNGAL-SPECIFIC PCR PRIMERS DEVELOPED FOR ANALYSIS OF THE ITS REGION OF ENVIRONMENTAL DNA EXTRACTS

EPA Science Inventory

Background The Internal Transcribed Spacer (ITS) regions of fungal ribosomal DNA (rDNA) are highly variable sequences of great importance in distinguishing fungal species by PCR analysis. Previously published PCR primers available for amplifying these sequences from environmenta...
Evaluation of rapid and simple techniques for the enrichment of viruses prior to metagenomic virus discovery.

PubMed

Hall, Richard J; Wang, Jing; Todd, Angela K; Bissielo, Ange B; Yen, Seiha; Strydom, Hugo; Moore, Nicole E; Ren, Xiaoyun; Huang, Q Sue; Carter, Philip E; Peacey, Matthew

2014-01-01

The discovery of new or divergent viruses using metagenomics and high-throughput sequencing has become more commonplace. The preparation of a sample is known to have an effect on the representation of virus sequences within the metagenomic dataset yet comparatively little attention has been given to this. Physical enrichment techniques are often applied to samples to increase the number of viral sequences and therefore enhance the probability of detection. With the exception of virus ecology studies, there is a paucity of information available to researchers on the type of sample preparation required for a viral metagenomic study that seeks to identify an aetiological virus in an animal or human diagnostic sample. A review of published virus discovery studies revealed the most commonly used enrichment methods, that were usually quick and simple to implement, namely low-speed centrifugation, filtration, nuclease-treatment (or combinations of these) which have been routinely used but often without justification. These were applied to a simple and well-characterised artificial sample composed of bacterial and human cells, as well as DNA (adenovirus) and RNA viruses (influenza A and human enterovirus), being either non-enveloped capsid or enveloped viruses. The effect of the enrichment method was assessed by both quantitative real-time PCR and metagenomic analysis that incorporated an amplification step. Reductions in the absolute quantities of bacteria and human cells were observed for each method as determined by qPCR, but the relative abundance of viral sequences in the metagenomic dataset remained largely unchanged. A 3-step method of centrifugation, filtration and nuclease-treatment showed the greatest increase in the proportion of viral sequences. This study provides a starting point for the selection of a purification method in future virus discovery studies, and highlights the need for more data to validate the effect of enrichment methods on different sample types, amplification, bioinformatics approaches and sequencing platforms. This study also highlights the potential risks that may attend selection of a virus enrichment method without any consideration for the sample type being investigated. Copyright © 2013 The Authors. Published by Elsevier B.V. All rights reserved.

A technique for setting analytical thresholds in massively parallel sequencing-based forensic DNA analysis

PubMed Central

2017-01-01

Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems such as single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) that currently are assayable using various instrumental analysis methods including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described. PMID:28542338
A technique for setting analytical thresholds in massively parallel sequencing-based forensic DNA analysis.

PubMed

Young, Brian; King, Jonathan L; Budowle, Bruce; Armogida, Luigi

2017-01-01

Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems such as single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) that currently are assayable using various instrumental analysis methods including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described.
Cognitive Impairment among the Aging Population in a Community in Southwest Nigeria

ERIC Educational Resources Information Center

Adebiyi, Akindele O.; Ogunniyi, Adesola; Adediran, Babatunde A.; Olakehinde, Olaide O.; Siwoku, Akeem A.

2016-01-01

Background: Vascular risk models can be quite informative in assisting the clinician to make a prediction of an individual's risk of cognitive impairment. Thus, a simple marker is a priority for low-capacity settings. This study examines the association of selected simple to deploy vascular markers with cognitive impairment in an elderly…
Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes

PubMed Central

Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney

2012-01-01

RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amendable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676
The Joint Effects of Background Selection and Genetic Recombination on Local Gene Genealogies

PubMed Central

Zeng, Kai; Charlesworth, Brian

2011-01-01

Background selection, the effects of the continual removal of deleterious mutations by natural selection on variability at linked sites, is potentially a major determinant of DNA sequence variability. However, the joint effects of background selection and genetic recombination on the shape of the neutral gene genealogy have proved hard to study analytically. The only existing formula concerns the mean coalescent time for a pair of alleles, making it difficult to assess the importance of background selection from genome-wide data on sequence polymorphism. Here we develop a structured coalescent model of background selection with recombination and implement it in a computer program that efficiently generates neutral gene genealogies for an arbitrary sample size. We check the validity of the structured coalescent model against forward-in-time simulations and show that it accurately captures the effects of background selection. The model produces more accurate predictions of the mean coalescent time than the existing formula and supports the conclusion that the effect of background selection is greater in the interior of a deleterious region than at its boundaries. The level of linkage disequilibrium between sites is elevated by background selection, to an extent that is well summarized by a change in effective population size. The structured coalescent model is readily extendable to more realistic situations and should prove useful for analyzing genome-wide polymorphism data. PMID:21705759
The joint effects of background selection and genetic recombination on local gene genealogies.

PubMed

Zeng, Kai; Charlesworth, Brian

2011-09-01

Background selection, the effects of the continual removal of deleterious mutations by natural selection on variability at linked sites, is potentially a major determinant of DNA sequence variability. However, the joint effects of background selection and genetic recombination on the shape of the neutral gene genealogy have proved hard to study analytically. The only existing formula concerns the mean coalescent time for a pair of alleles, making it difficult to assess the importance of background selection from genome-wide data on sequence polymorphism. Here we develop a structured coalescent model of background selection with recombination and implement it in a computer program that efficiently generates neutral gene genealogies for an arbitrary sample size. We check the validity of the structured coalescent model against forward-in-time simulations and show that it accurately captures the effects of background selection. The model produces more accurate predictions of the mean coalescent time than the existing formula and supports the conclusion that the effect of background selection is greater in the interior of a deleterious region than at its boundaries. The level of linkage disequilibrium between sites is elevated by background selection, to an extent that is well summarized by a change in effective population size. The structured coalescent model is readily extendable to more realistic situations and should prove useful for analyzing genome-wide polymorphism data.
A platform for efficient genotyping in Musa using microsatellite markers

PubMed Central

Christelová, Pavla; Valárik, Miroslav; Hřibová, Eva; Van den houwe, Ines; Channelière, Stéphanie; Roux, Nicolas; Doležel, Jaroslav

2011-01-01

Background and aims Bananas and plantains (Musa spp.) are one of the major fruit crops worldwide with acknowledged importance as a staple food for millions of people. The rich genetic diversity of this crop is, however, endangered by diseases, adverse environmental conditions and changed farming practices, and the need for its characterization and preservation is urgent. With the aim of providing a simple and robust approach for molecular characterization of Musa species, we developed an optimized genotyping platform using 19 published simple sequence repeat markers. Methodology The genotyping system is based on 19 microsatellite loci, which are scored using fluorescently labelled primers and high-throughput capillary electrophoresis separation with high resolution. This genotyping platform was tested and optimized on a set of 70 diploid and 38 triploid banana accessions. Principal results The marker set used in this study provided enough polymorphism to discriminate between individual species, subspecies and subgroups of all accessions of Musa. Likewise, the capability of identifying duplicate samples was confirmed. Based on the results of a blind test, the genotyping system was confirmed to be suitable for characterization of unknown accessions. Conclusions Here we report on the first complex and standardized platform for molecular characterization of Musa germplasm that is ready to use for the wider Musa research and breeding community. We believe that this genotyping system offers a versatile tool that can accommodate all possible requirements for characterizing Musa diversity, and is economical for samples ranging from one to many accessions. PMID:22476494
SERE: Single-parameter quality control and sample comparison for RNA-Seq

PubMed Central

2012-01-01

Background Assessing the reliability of experimental replicates (or global alterations corresponding to different experimental conditions) is a critical step in analyzing RNA-Seq data. Pearson’s correlation coefficient r has been widely used in the RNA-Seq field even though its statistical characteristics may be poorly suited to the task. Results Here we present a single-parameter test procedure for count data, the Simple Error Ratio Estimate (SERE), that can determine whether two RNA-Seq libraries are faithful replicates or globally different. Benchmarking shows that the interpretation of SERE is unambiguous regardless of the total read count or the range of expression differences among bins (exons or genes), a score of 1 indicating faithful replication (i.e., samples are affected only by Poisson variation of individual counts), a score of 0 indicating data duplication, and scores >1 corresponding to true global differences between RNA-Seq libraries. On the contrary the interpretation of Pearson’s r is generally ambiguous and highly dependent on sequencing depth and the range of expression levels inherent to the sample (difference between lowest and highest bin count). Cohen’s simple Kappa results are also ambiguous and are highly dependent on the choice of bins. For quantifying global sample differences SERE performs similarly to a measure based on the negative binomial distribution yet is simpler to compute. Conclusions SERE can therefore serve as a straightforward and reliable statistical procedure for the global assessment of pairs or large groups of RNA-Seq datasets by a single statistical parameter. PMID:23033915
WebSat ‐ A web software for microsatellite marker development

PubMed Central

Martins, Wellington Santos; Soares Lucas, Divino César; de Souza Neves, Kelligton Fabricio; Bertioli, David John

2009-01-01

Simple sequence repeats (SSR), also known as microsatellites, have been extensively used as molecular markers due to their abundance and high degree of polymorphism. We have developed a simple to use web software, called WebSat, for microsatellite molecular marker prediction and development. WebSat is accessible through the Internet, requiring no program installation. Although a web solution, it makes use of Ajax techniques, providing a rich, responsive user interface. WebSat allows the submission of sequences, visualization of microsatellites and the design of primers suitable for their amplification. The program allows full control of parameters and the easy export of the resulting data, thus facilitating the development of microsatellite markers. Availability The web tool may be accessed at http://purl.oclc.org/NET/websat/ PMID:19255650
Molecular Analysis of Date Palm Genetic Diversity Using Random Amplified Polymorphic DNA (RAPD) and Inter-Simple Sequence Repeats (ISSRs).

PubMed

El Sharabasy, Sherif F; Soliman, Khaled A

2017-01-01

The date palm is an ancient domesticated plant with great diversity and has been cultivated in the Middle East and North Africa for at last 5000 years. Date palm cultivars are classified based on the fruit moisture content, as dry, semidry, and soft dates. There are a number of biochemical and molecular techniques available for characterization of the date palm variation. This chapter focuses on the DNA-based markers random amplified polymorphic DNA (RAPD) and inter-simple sequence repeats (ISSR) techniques, in addition to biochemical markers based on isozyme analysis. These techniques coupled with appropriate statistical tools proved useful for determining phylogenetic relationships among date palm cultivars and provide information resources for date palm gene banks.
Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

PubMed Central

2011-01-01

Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but longer than many other dicot plants. Codon usages of melon full-length transcripts were largely similar to those of Arabidopsis coding sequences. Conclusion The collection of melon ESTs generated from full-length enriched and standard cDNA libraries is expected to play significant roles in annotating the melon genome. The ESTs and associated analysis results will be useful resources for gene discovery, functional analysis, marker-assisted breeding of melon and closely related species, comparative genomic studies and for gaining insights into gene expression patterns. PMID:21599934
Analysis of the leaf transcriptome of Musa acuminata during interaction with Mycosphaerella musicola: gene assembly, annotation and marker development

PubMed Central

2013-01-01

Background Although banana (Musa sp.) is an important edible crop, contributing towards poverty alleviation and food security, limited transcriptome datasets are available for use in accelerated molecular-based breeding in this genus. 454 GS-FLX Titanium technology was employed to determine the sequence of gene transcripts in genotypes of Musa acuminata ssp. burmannicoides Calcutta 4 and M. acuminata subgroup Cavendish cv. Grande Naine, contrasting in resistance to the fungal pathogen Mycosphaerella musicola, causal organism of Sigatoka leaf spot disease. To enrich for transcripts under biotic stress responses, full length-enriched cDNA libraries were prepared from whole plant leaf materials, both uninfected and artificially challenged with pathogen conidiospores. Results The study generated 846,762 high quality sequence reads, with an average length of 334 bp and totalling 283 Mbp. De novo assembly generated 36,384 and 35,269 unigene sequences for M. acuminata Calcutta 4 and Cavendish Grande Naine, respectively. A total of 64.4% of the unigenes were annotated through Basic Local Alignment Search Tool (BLAST) similarity analyses against public databases. Assembled sequences were functionally mapped to Gene Ontology (GO) terms, with unigene functions covering a diverse range of molecular functions, biological processes and cellular components. Genes from a number of defense-related pathways were observed in transcripts from each cDNA library. Over 99% of contig unigenes mapped to exon regions in the reference M. acuminata DH Pahang whole genome sequence. A total of 4068 genic-SSR loci were identified in Calcutta 4 and 4095 in Cavendish Grande Naine. A subset of 95 potential defense-related gene-derived simple sequence repeat (SSR) loci were validated for specific amplification and polymorphism across M. acuminata accessions. Fourteen loci were polymorphic, with alleles per polymorphic locus ranging from 3 to 8 and polymorphism information content ranging from 0.34 to 0.82. Conclusions A large set of unigenes were characterized in this study for both M. acuminata Calcutta 4 and Cavendish Grande Naine, increasing the number of public domain Musa ESTs. This transcriptome is an invaluable resource for furthering our understanding of biological processes elicited during biotic stresses in Musa. Gene-based markers will facilitate molecular breeding strategies, forming the basis of genetic linkage mapping and analysis of quantitative trait loci. PMID:23379821
Parameters of proteome evolution from histograms of amino-acid sequence identities of paralogous proteins

PubMed Central

Axelsen, Jacob Bock; Yan, Koon-Kiu; Maslov, Sergei

2007-01-01

Background The evolution of the full repertoire of proteins encoded in a given genome is mostly driven by gene duplications, deletions, and sequence modifications of existing proteins. Indirect information about relative rates and other intrinsic parameters of these three basic processes is contained in the proteome-wide distribution of sequence identities of pairs of paralogous proteins. Results We introduce a simple mathematical framework based on a stochastic birth-and-death model that allows one to extract some of this information and apply it to the set of all pairs of paralogous proteins in H. pylori, E. coli, S. cerevisiae, C. elegans, D. melanogaster, and H. sapiens. It was found that the histogram of sequence identities p generated by an all-to-all alignment of all protein sequences encoded in a genome is well fitted with a power-law form ~ p-γ with the value of the exponent γ around 4 for the majority of organisms used in this study. This implies that the intra-protein variability of substitution rates is best described by the Gamma-distribution with the exponent α ≈ 0.33. Different features of the shape of such histograms allow us to quantify the ratio between the genome-wide average deletion/duplication rates and the amino-acid substitution rate. Conclusion We separately measure the short-term ("raw") duplication and deletion rates rdup∗, rdel∗ which include gene copies that will be removed soon after the duplication event and their dramatically reduced long-term counterparts rdup, rdel. High deletion rate among recently duplicated proteins is consistent with a scenario in which they didn't have enough time to significantly change their functional roles and thus are to a large degree disposable. Systematic trends of each of the four duplication/deletion rates with the total number of genes in the genome were analyzed. All but the deletion rate of recent duplicates rdel∗ were shown to systematically increase with Ngenes. Abnormally flat shapes of sequence identity histograms observed for yeast and human are consistent with lineages leading to these organisms undergoing one or more whole-genome duplications. This interpretation is corroborated by our analysis of the genome of Paramecium tetraurelia where the p-4 profile of the histogram is gradually restored by the successive removal of paralogs generated in its four known whole-genome duplication events. PMID:18039386
A comprehensive resource of drought- and salinity- responsive ESTs for gene discovery and marker development in chickpea (Cicer arietinum L.)

PubMed Central

2009-01-01

Background Chickpea (Cicer arietinum L.), an important grain legume crop of the world is seriously challenged by terminal drought and salinity stresses. However, very limited number of molecular markers and candidate genes are available for undertaking molecular breeding in chickpea to tackle these stresses. This study reports generation and analysis of comprehensive resource of drought- and salinity-responsive expressed sequence tags (ESTs) and gene-based markers. Results A total of 20,162 (18,435 high quality) drought- and salinity- responsive ESTs were generated from ten different root tissue cDNA libraries of chickpea. Sequence editing, clustering and assembly analysis resulted in 6,404 unigenes (1,590 contigs and 4,814 singletons). Functional annotation of unigenes based on BLASTX analysis showed that 46.3% (2,965) had significant similarity (≤1E-05) to sequences in the non-redundant UniProt database. BLASTN analysis of unique sequences with ESTs of four legume species (Medicago, Lotus, soybean and groundnut) and three model plant species (rice, Arabidopsis and poplar) provided insights on conserved genes across legumes as well as novel transcripts for chickpea. Of 2,965 (46.3%) significant unigenes, only 2,071 (32.3%) unigenes could be functionally categorised according to Gene Ontology (GO) descriptions. A total of 2,029 sequences containing 3,728 simple sequence repeats (SSRs) were identified and 177 new EST-SSR markers were developed. Experimental validation of a set of 77 SSR markers on 24 genotypes revealed 230 alleles with an average of 4.6 alleles per marker and average polymorphism information content (PIC) value of 0.43. Besides SSR markers, 21,405 high confidence single nucleotide polymorphisms (SNPs) in 742 contigs (with ≥ 5 ESTs) were also identified. Recognition sites for restriction enzymes were identified for 7,884 SNPs in 240 contigs. Hierarchical clustering of 105 selected contigs provided clues about stress- responsive candidate genes and their expression profile showed predominance in specific stress-challenged libraries. Conclusion Generated set of chickpea ESTs serves as a resource of high quality transcripts for gene discovery and development of functional markers associated with abiotic stress tolerance that will be helpful to facilitate chickpea breeding. Mapping of gene-based markers in chickpea will also add more anchoring points to align genomes of chickpea and other legume species. PMID:19912666
Folding and Stabilization of Native-Sequence-Reversed Proteins

PubMed Central

Zhang, Yuanzhao; Weber, Jeffrey K; Zhou, Ruhong

2016-01-01

Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols. PMID:27113844
Folding and Stabilization of Native-Sequence-Reversed Proteins

NASA Astrophysics Data System (ADS)

Zhang, Yuanzhao; Weber, Jeffrey K.; Zhou, Ruhong

2016-04-01

Though the problem of sequence-reversed protein folding is largely unexplored, one might speculate that reversed native protein sequences should be significantly more foldable than purely random heteropolymer sequences. In this article, we investigate how the reverse-sequences of native proteins might fold by examining a series of small proteins of increasing structural complexity (α-helix, β-hairpin, α-helix bundle, and α/β-protein). Employing a tandem protein structure prediction algorithmic and molecular dynamics simulation approach, we find that the ability of reverse sequences to adopt native-like folds is strongly influenced by protein size and the flexibility of the native hydrophobic core. For β-hairpins with reverse-sequences that fail to fold, we employ a simple mutational strategy for guiding stable hairpin formation that involves the insertion of amino acids into the β-turn region. This systematic look at reverse sequence duality sheds new light on the problem of protein sequence-structure mapping and may serve to inspire new protein design and protein structure prediction protocols.
An annotated genetic map of loblolly pine based on microsatellite and cDNA markers

Treesearch

Craig S. Echt; Surya Saha; Konstantin V. Krutovsky; Kokulapalan Wimalanathan; John E. Erpelding; Chun Liang; C Dana Nelson

2011-01-01

Previous loblolly pine (Pinus taeda L.) genetic linkage maps have been based on a variety of DNA polymorphisms, such as AFLPs, RAPDs, RFLPs, and ESTPs, but only a few SSRs (simple sequence repeats), also known as simple tandem repeats or microsatellites, have been mapped in P. taeda. The objective of this study was to integrate a large set of SSR markers from a variety...
Strain rates, stress markers and earthquake clustering (Invited)

NASA Astrophysics Data System (ADS)

Fry, B.; Gerstenberger, M.; Abercrombie, R. E.; Reyners, M.; Eberhart-Phillips, D. M.

2013-12-01

The 2010-present Canterbury earthquakes comprise a well-recorded sequence in a relatively low strain-rate shallow crustal region. We present new scientific results to test the hypothesis that: Earthquake sequences in low-strain rate areas experience high stress drop events, low-post seismic relaxation, and accentuated seismic clustering. This hypothesis is based on a physical description of the aftershock process in which the spatial distribution of stress accumulation and stress transfer are controlled by fault strength and orientation. Following large crustal earthquakes, time dependent forecasts are often developed by fitting parameters defined by Omori's aftershock decay law. In high-strain rate areas, simple forecast models utilizing a single p-value fit observed aftershock sequences well. In low-strain rate areas such as Canterbury, assumptions of simple Omori decay may not be sufficient to capture the clustering (sub-sequence) nature exhibited by the punctuated rise in activity following significant child events. In Canterbury, the moment release is more clustered than in more typical Omori sequences. The individual earthquakes in these clusters also exhibit somewhat higher stress drops than in the average crustal sequence in high-strain rate regions, suggesting the earthquakes occur on strong Andersonian-oriented faults, possibly juvenile or well-healed . We use the spectral ratio procedure outlined in (Viegas et al., 2010) to determine corner frequencies and Madariaga stress-drop values for over 800 events in the sequence. Furthermore, we will discuss the relevance of tomographic results of Reyners and Eberhart-Phillips (2013) documenting post-seismic stress-driven fluid processes following the three largest events in the sequence as well as anisotropic patterns in surface wave tomography (Fry et al., 2013). These tomographic studies are both compatible with the hypothesis, providing strong evidence for the presence of widespread and hydrated regional upper crustal cracking parallel to sub-parallel to the dominant transverse failure plane in the sequence. Joint interpretation of the three separate datasets provide a positive first attempt at testing our fundamental hypothesis.
Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

DOE PAGES

Yim, Won Cheol; Cushman, John C.

2017-07-22

Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs perform searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible andmore » used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.« less
Markers and mapping revisited: finding your gene.

PubMed

Jones, Neil; Ougham, Helen; Thomas, Howard; Pasakinskiene, Izolda

2009-01-01

This paper is an update of our earlier review (Jones et al., 1997, Markers and mapping: we are all geneticists now. New Phytologist 137: 165-177), which dealt with the genetics of mapping, in terms of recombination as the basis of the procedure, and covered some of the first generation of markers, including restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNA (RAPDs), simple sequence repeats (SSRs) and quantitative trait loci (QTLs). In the intervening decade there have been numerous developments in marker science with many new systems becoming available, which are herein described: cleavage amplification polymorphism (CAP), sequence-specific amplification polymorphism (S-SAP), inter-simple sequence repeat (ISSR), sequence tagged site (STS), sequence characterized amplification region (SCAR), selective amplification of microsatellite polymorphic loci (SAMPL), single nucleotide polymorphism (SNP), expressed sequence tag (EST), sequence-related amplified polymorphism (SRAP), target region amplification polymorphism (TRAP), microarrays, diversity arrays technology (DArT), single-strand conformation polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE) and methylation-sensitive PCR. In addition there has been an explosion of knowledge and databases in the area of genomics and bioinformatics. The number of flowering plant ESTs is c. 19 million and counting, with all the opportunity that this provides for gene-hunting, while the survey of bioinformatics and computer resources points to a rapid growth point for future activities in unravelling and applying the burst of new information on plant genomes. A case study is presented on tracking down a specific gene (stay-green (SGR), a post-transcriptional senescence regulator) using the full suite of mapping tools and comparative mapping resources. We end with a brief speculation on how genome analysis may progress into the future of this highly dynamic arena of plant science.

Evolution Analysis of Simple Sequence Repeats in Plant Genome.

PubMed

Qin, Zhen; Wang, Yanping; Wang, Qingmei; Li, Aixian; Hou, Fuyun; Zhang, Liming

2015-01-01

Simple sequence repeats (SSRs) are widespread units on genome sequences, and play many important roles in plants. In order to reveal the evolution of plant genomes, we investigated the evolutionary regularities of SSRs during the evolution of plant species and the plant kingdom by analysis of twelve sequenced plant genome sequences. First, in the twelve studied plant genomes, the main SSRs were those which contain repeats of 1-3 nucleotides combination. Second, in mononucleotide SSRs, the A/T percentage gradually increased along with the evolution of plants (except for P. patens). With the increase of SSRs repeat number the percentage of A/T in C. reinhardtii had no significant change, while the percentage of A/T in terrestrial plants species gradually declined. Third, in dinucleotide SSRs, the percentage of AT/TA increased along with the evolution of plant kingdom and the repeat number increased in terrestrial plants species. This trend was more obvious in dicotyledon than monocotyledon. The percentage of CG/GC showed the opposite pattern to the AT/TA. Forth, in trinucleotide SSRs, the percentages of combinations including two or three A/T were in a rising trend along with the evolution of plant kingdom; meanwhile with the increase of SSRs repeat number in plants species, different species chose different combinations as dominant SSRs. SSRs in C. reinhardtii, P. patens, Z. mays and A. thaliana showed their specific patterns related to evolutionary position or specific changes of genome sequences. The results showed that, SSRs not only had the general pattern in the evolution of plant kingdom, but also were associated with the evolution of the specific genome sequence. The study of the evolutionary regularities of SSRs provided new insights for the analysis of the plant genome evolution.
Divide and Conquer (DC) BLAST: fast and easy BLAST execution within HPC environments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Yim, Won Cheol; Cushman, John C.

Bioinformatics is currently faced with very large-scale data sets that lead to computational jobs, especially sequence similarity searches, that can take absurdly long times to run. For example, the National Center for Biotechnology Information (NCBI) Basic Local Alignment Search Tool (BLAST and BLAST+) suite, which is by far the most widely used tool for rapid similarity searching among nucleic acid or amino acid sequences, is highly central processing unit (CPU) intensive. While the BLAST suite of programs perform searches very rapidly, they have the potential to be accelerated. In recent years, distributed computing environments have become more widely accessible andmore » used due to the increasing availability of high-performance computing (HPC) systems. Therefore, simple solutions for data parallelization are needed to expedite BLAST and other sequence analysis tools. However, existing software for parallel sequence similarity searches often requires extensive computational experience and skill on the part of the user. In order to accelerate BLAST and other sequence analysis tools, Divide and Conquer BLAST (DCBLAST) was developed to perform NCBI BLAST searches within a cluster, grid, or HPC environment by using a query sequence distribution approach. Scaling from one (1) to 256 CPU cores resulted in significant improvements in processing speed. Thus, DCBLAST dramatically accelerates the execution of BLAST searches using a simple, accessible, robust, and parallel approach. DCBLAST works across multiple nodes automatically and it overcomes the speed limitation of single-node BLAST programs. DCBLAST can be used on any HPC system, can take advantage of hundreds of nodes, and has no output limitations. Thus, this freely available tool simplifies distributed computation pipelines to facilitate the rapid discovery of sequence similarities between very large data sets.« less
Simple sequence repeat marker development from bacterial artificial chromosome end sequences and expressed sequence tags of flax (Linum usitatissimum L.).

PubMed

Cloutier, Sylvie; Miranda, Evelyn; Ward, Kerry; Radovanovic, Natasa; Reimer, Elsa; Walichnowski, Andrzej; Datla, Raju; Rowland, Gordon; Duguid, Scott; Ragupathy, Raja

2012-08-01

Flax is an important oilseed crop in North America and is mostly grown as a fibre crop in Europe. As a self-pollinated diploid with a small estimated genome size of ~370 Mb, flax is well suited for fast progress in genomics. In the last few years, important genetic resources have been developed for this crop. Here, we describe the assessment and comparative analyses of 1,506 putative simple sequence repeats (SSRs) of which, 1,164 were derived from BAC-end sequences (BESs) and 342 from expressed sequence tags (ESTs). The SSRs were assessed on a panel of 16 flax accessions with 673 (58 %) and 145 (42 %) primer pairs being polymorphic in the BESs and ESTs, respectively. With 818 novel polymorphic SSR primer pairs reported in this study, the repertoire of available SSRs in flax has more than doubled from the combined total of 508 of all previous reports. Among nucleotide motifs, trinucleotides were the most abundant irrespective of the class, but dinucleotides were the most polymorphic. SSR length was also positively correlated with polymorphism. Two dinucleotide (AT/TA and AG/GA) and two trinucleotide (AAT/ATA/TAA and GAA/AGA/AAG) motifs and their iterations, different from those reported in many other crops, accounted for more than half of all the SSRs and were also more polymorphic (63.4 %) than the rest of the markers (42.7 %). This improved resource promises to be useful in genetic, quantitative trait loci (QTL) and association mapping as well as for anchoring the physical/genetic map with the whole genome shotgun reference sequence of flax.
Investigating on the Differences between Triggered and Background Seismicity in Italy and Southern California.

NASA Astrophysics Data System (ADS)

Stallone, A.; Marzocchi, W.

2017-12-01

Earthquake occurrence may be approximated by a multidimensional Poisson clustering process, where each point of the Poisson process is replaced by a cluster of points, the latter corresponding to the well-known aftershock sequence (triggered events). Earthquake clusters and their parents are assumed to occur according to a Poisson process at a constant temporal rate proportional to the tectonic strain rate, while events within a cluster are modeled as generations of dependent events reproduced by a branching process. Although the occurrence of such space-time clusters is a general feature in different tectonic settings, seismic sequences seem to have marked differences from region to region: one example, among many others, is that seismic sequences of moderate magnitude in Italian Apennines seem to last longer than similar seismic sequences in California. In this work we investigate on the existence of possible differences in the earthquake clustering process in these two areas. At first, we separate the triggered and background components of seismicity in the Italian and Southern California seismic catalog. Then we study the space-time domain of the triggered earthquakes with the aim to identify possible variations in the triggering properties across the two regions. In the second part of the work we focus our attention on the characteristics of the background seismicity in both seismic catalogs. The assumption of time stationarity of the background seismicity (which includes both cluster parents and isolated events) is still under debate. Some authors suggest that the independent component of seismicity could undergo transient perturbations at various time scales due to different physical mechanisms, such as, for example, viscoelastic relaxation, presence of fluids, non-stationary plate motion, etc, whose impact may depend on the tectonic setting. Here we test if the background seismicity in the two regions can be satisfactorily described by the time-homogeneous Poisson process, and, in case, we characterize quantitatively possible discrepancies with this reference process, and the differences between the two regions.
Recognizing human actions by learning and matching shape-motion prototype trees.

PubMed

Jiang, Zhuolin; Lin, Zhe; Davis, Larry S

2012-03-01

A shape-motion prototype-based approach is introduced for action recognition. The approach represents an action as a sequence of prototypes for efficient and flexible action matching in long video sequences. During training, an action prototype tree is learned in a joint shape and motion space via hierarchical K-means clustering and each training sequence is represented as a labeled prototype sequence; then a look-up table of prototype-to-prototype distances is generated. During testing, based on a joint probability model of the actor location and action prototype, the actor is tracked while a frame-to-prototype correspondence is established by maximizing the joint probability, which is efficiently performed by searching the learned prototype tree; then actions are recognized using dynamic prototype sequence matching. Distance measures used for sequence matching are rapidly obtained by look-up table indexing, which is an order of magnitude faster than brute-force computation of frame-to-frame distances. Our approach enables robust action matching in challenging situations (such as moving cameras, dynamic backgrounds) and allows automatic alignment of action sequences. Experimental results demonstrate that our approach achieves recognition rates of 92.86 percent on a large gesture data set (with dynamic backgrounds), 100 percent on the Weizmann action data set, 95.77 percent on the KTH action data set, 88 percent on the UCF sports data set, and 87.27 percent on the CMU action data set.
Application of circular consensus sequencing and network analysis to characterize the bovine IgG repertoire

USDA-ARS?s Scientific Manuscript database

Background: Vertebrate immune systems generate diverse repertoires of antibodies capable of mediating response to a variety of antigens. Next generation sequencing methods provide unique approaches to a number of immuno-based research areas including antibody discovery and engineering, disease surve...
VISA--Vector Integration Site Analysis server: a web-based server to rapidly identify retroviral integration sites from next-generation sequencing.

PubMed

Hocum, Jonah D; Battrell, Logan R; Maynard, Ryan; Adair, Jennifer E; Beard, Brian C; Rawlings, David J; Kiem, Hans-Peter; Miller, Daniel G; Trobridge, Grant D

2015-07-07

Analyzing the integration profile of retroviral vectors is a vital step in determining their potential genotoxic effects and developing safer vectors for therapeutic use. Identifying retroviral vector integration sites is also important for retroviral mutagenesis screens. We developed VISA, a vector integration site analysis server, to analyze next-generation sequencing data for retroviral vector integration sites. Sequence reads that contain a provirus are mapped to the human genome, sequence reads that cannot be localized to a unique location in the genome are filtered out, and then unique retroviral vector integration sites are determined based on the alignment scores of the remaining sequence reads. VISA offers a simple web interface to upload sequence files and results are returned in a concise tabular format to allow rapid analysis of retroviral vector integration sites.
Solid-phase proximity ligation assays for individual or parallel protein analyses with readout via real-time PCR or sequencing.

PubMed

Nong, Rachel Yuan; Wu, Di; Yan, Junhong; Hammond, Maria; Gu, Gucci Jijuan; Kamali-Moghaddam, Masood; Landegren, Ulf; Darmanis, Spyros

2013-06-01

Solid-phase proximity ligation assays share properties with the classical sandwich immunoassays for protein detection. The proteins captured via antibodies on solid supports are, however, detected not by single antibodies with detectable functions, but by pairs of antibodies with attached DNA strands. Upon recognition by these sets of three antibodies, pairs of DNA strands brought in proximity are joined by ligation. The ligated reporter DNA strands are then detected via methods such as real-time PCR or next-generation sequencing (NGS). We describe how to construct assays that can offer improved detection specificity by virtue of recognition by three antibodies, as well as enhanced sensitivity owing to reduced background and amplified detection. Finally, we also illustrate how the assays can be applied for parallel detection of proteins, taking advantage of the oligonucleotide ligation step to avoid background problems that might arise with multiplexing. The protocol for the singleplex solid-phase proximity ligation assay takes ~5 h. The multiplex version of the assay takes 7-8 h depending on whether quantitative PCR (qPCR) or sequencing is used as the readout. The time for the sequencing-based protocol includes the library preparation but not the actual sequencing, as times may vary based on the choice of sequencing platform.
Motion detection and compensation in infrared retinal image sequences.

PubMed

Scharcanski, J; Schardosim, L R; Santos, D; Stuchi, A

2013-01-01

Infrared image data captured by non-mydriatic digital retinography systems often are used in the diagnosis and treatment of the diabetic macular edema (DME). Infrared illumination is less aggressive to the patient retina, and retinal studies can be carried out without pupil dilation. However, sequences of infrared eye fundus images of static scenes, tend to present pixel intensity fluctuations in time, and noisy and background illumination changes pose a challenge to most motion detection methods proposed in the literature. In this paper, we present a retinal motion detection method that is adaptive to background noise and illumination changes. Our experimental results indicate that this method is suitable for detecting retinal motion in infrared image sequences, and compensate the detected motion, which is relevant in retinal laser treatment systems for DME. Copyright © 2013 Elsevier Ltd. All rights reserved.
Meteor localization via statistical analysis of spatially temporal fluctuations in image sequences

NASA Astrophysics Data System (ADS)

Kukal, Jaromír.; Klimt, Martin; Šihlík, Jan; Fliegel, Karel

2015-09-01

Meteor detection is one of the most important procedures in astronomical imaging. Meteor path in Earth's atmosphere is traditionally reconstructed from double station video observation system generating 2D image sequences. However, the atmospheric turbulence and other factors cause spatially-temporal fluctuations of image background, which makes the localization of meteor path more difficult. Our approach is based on nonlinear preprocessing of image intensity using Box-Cox and logarithmic transform as its particular case. The transformed image sequences are then differentiated along discrete coordinates to obtain statistical description of sky background fluctuations, which can be modeled by multivariate normal distribution. After verification and hypothesis testing, we use the statistical model for outlier detection. Meanwhile the isolated outlier points are ignored, the compact cluster of outliers indicates the presence of meteoroids after ignition.
Specialized microbial databases for inductive exploration of microbial genome sequences

PubMed Central

Fang, Gang; Ho, Christine; Qiu, Yaowu; Cubas, Virginie; Yu, Zhou; Cabau, Cédric; Cheung, Frankie; Moszer, Ivan; Danchin, Antoine

2005-01-01

Background The enormous amount of genome sequence data asks for user-oriented databases to manage sequences and annotations. Queries must include search tools permitting function identification through exploration of related objects. Methods The GenoList package for collecting and mining microbial genome databases has been rewritten using MySQL as the database management system. Functions that were not available in MySQL, such as nested subquery, have been implemented. Results Inductive reasoning in the study of genomes starts from "islands of knowledge", centered around genes with some known background. With this concept of "neighborhood" in mind, a modified version of the GenoList structure has been used for organizing sequence data from prokaryotic genomes of particular interest in China. GenoChore , a set of 17 specialized end-user-oriented microbial databases (including one instance of Microsporidia, Encephalitozoon cuniculi, a member of Eukarya) has been made publicly available. These databases allow the user to browse genome sequence and annotation data using standard queries. In addition they provide a weekly update of searches against the world-wide protein sequences data libraries, allowing one to monitor annotation updates on genes of interest. Finally, they allow users to search for patterns in DNA or protein sequences, taking into account a clustering of genes into formal operons, as well as providing extra facilities to query sequences using predefined sequence patterns. Conclusion This growing set of specialized microbial databases organize data created by the first Chinese bacterial genome programs (ThermaList, Thermoanaerobacter tencongensis, LeptoList, with two different genomes of Leptospira interrogans and SepiList, Staphylococcus epidermidis) associated to related organisms for comparison. PMID:15698474
Preschool-aged children have difficulty constructing and interpreting simple utterances composed of graphic symbols.

PubMed

Sutton, Ann; Trudeau, Natacha; Morford, Jill; Rios, Monica; Poirier, Marie-Andrée

2010-01-01

Children who require augmentative and alternative communication (AAC) systems while they are in the process of acquiring language face unique challenges because they use graphic symbols for communication. In contrast to the situation of typically developing children, they use different modalities for comprehension (auditory) and expression (visual). This study explored the ability of three- and four-year-old children without disabilities to perform tasks involving sequences of graphic symbols. Thirty participants were asked to transpose spoken simple sentences into graphic symbols by selecting individual symbols corresponding to the spoken words, and to interpret graphic symbol utterances by selecting one of four photographs corresponding to a sequence of three graphic symbols. The results showed that these were not simple tasks for the participants, and few of them performed in the expected manner - only one in transposition, and only one-third of participants in interpretation. Individual response strategies in some cases lead to contrasting response patterns. Children at this age level have not yet developed the skills required to deal with graphic symbols even though they have mastered the corresponding spoken language structures.
Computational Software for Fitting Seismic Data to Epidemic-Type Aftershock Sequence Models

NASA Astrophysics Data System (ADS)

Chu, A.

2014-12-01

Modern earthquake catalogs are often analyzed using spatial-temporal point process models such as the epidemic-type aftershock sequence (ETAS) models of Ogata (1998). My work introduces software to implement two of ETAS models described in Ogata (1998). To find the Maximum-Likelihood Estimates (MLEs), my software provides estimates of the homogeneous background rate parameter and the temporal and spatial parameters that govern triggering effects by applying the Expectation-Maximization (EM) algorithm introduced in Veen and Schoenberg (2008). Despite other computer programs exist for similar data modeling purpose, using EM-algorithm has the benefits of stability and robustness (Veen and Schoenberg, 2008). Spatial shapes that are very long and narrow cause difficulties in optimization convergence and problems with flat or multi-modal log-likelihood functions encounter similar issues. My program uses a robust method to preset a parameter to overcome the non-convergence computational issue. In addition to model fitting, the software is equipped with useful tools for examining modeling fitting results, for example, visualization of estimated conditional intensity, and estimation of expected number of triggered aftershocks. A simulation generator is also given with flexible spatial shapes that may be defined by the user. This open-source software has a very simple user interface. The user may execute it on a local computer, and the program also has potential to be hosted online. Java language is used for the software's core computing part and an optional interface to the statistical package R is provided.
Accelerating String Set Matching in FPGA Hardware for Bioinformatics Research

PubMed Central

Dandass, Yoginder S; Burgess, Shane C; Lawrence, Mark; Bridges, Susan M

2008-01-01

Background This paper describes techniques for accelerating the performance of the string set matching problem with particular emphasis on applications in computational proteomics. The process of matching peptide sequences against a genome translated in six reading frames is part of a proteogenomic mapping pipeline that is used as a case-study. The Aho-Corasick algorithm is adapted for execution in field programmable gate array (FPGA) devices in a manner that optimizes space and performance. In this approach, the traditional Aho-Corasick finite state machine (FSM) is split into smaller FSMs, operating in parallel, each of which matches up to 20 peptides in the input translated genome. Each of the smaller FSMs is further divided into five simpler FSMs such that each simple FSM operates on a single bit position in the input (five bits are sufficient for representing all amino acids and special symbols in protein sequences). Results This bit-split organization of the Aho-Corasick implementation enables efficient utilization of the limited random access memory (RAM) resources available in typical FPGAs. The use of on-chip RAM as opposed to FPGA logic resources for FSM implementation also enables rapid reconfiguration of the FPGA without the place and routing delays associated with complex digital designs. Conclusion Experimental results show storage efficiencies of over 80% for several data sets. Furthermore, the FPGA implementation executing at 100 MHz is nearly 20 times faster than an implementation of the traditional Aho-Corasick algorithm executing on a 2.67 GHz workstation. PMID:18412963
Reranking candidate gene models with cross-species comparison for improved gene prediction

PubMed Central

Liu, Qian; Crammer, Koby; Pereira, Fernando CN; Roos, David S

2008-01-01

Background Most gene finders score candidate gene models with state-based methods, typically HMMs, by combining local properties (coding potential, splice donor and acceptor patterns, etc). Competing models with similar state-based scores may be distinguishable with additional information. In particular, functional and comparative genomics datasets may help to select among competing models of comparable probability by exploiting features likely to be associated with the correct gene models, such as conserved exon/intron structure or protein sequence features. Results We have investigated the utility of a simple post-processing step for selecting among a set of alternative gene models, using global scoring rules to rerank competing models for more accurate prediction. For each gene locus, we first generate the K best candidate gene models using the gene finder Evigan, and then rerank these models using comparisons with putative orthologous genes from closely-related species. Candidate gene models with lower scores in the original gene finder may be selected if they exhibit strong similarity to probable orthologs in coding sequence, splice site location, or signal peptide occurrence. Experiments on Drosophila melanogaster demonstrate that reranking based on cross-species comparison outperforms the best gene models identified by Evigan alone, and also outperforms the comparative gene finders GeneWise and Augustus+. Conclusion Reranking gene models with cross-species comparison improves gene prediction accuracy. This straightforward method can be readily adapted to incorporate additional lines of evidence, as it requires only a ranked source of candidate gene models. PMID:18854050
Hybridization-Induced Aggregation Technology for Practical Clinical Testing: KRAS Mutation Detection in Lung and Colorectal Tumors.

PubMed

Sloane, Hillary S; Landers, James P; Kelly, Kimberly A

2016-07-01

KRAS mutations have emerged as powerful predictors of response to targeted therapies in the treatment of lung and colorectal cancers; thus, prospective KRAS genotyping is essential for appropriate treatment stratification. Conventional mutation testing technologies are not ideal for routine clinical screening, as they often involve complex, time-consuming processes and/or costly instrumentation. In response, we recently introduced a unique analytical strategy for revealing KRAS mutations, based on the allele-specific hybridization-induced aggregation (HIA) of oligonucleotide probe-conjugated microbeads. Using simple, inexpensive instrumentation, this approach allows for the detection of any common KRAS mutation in <10 minutes after PCR. Here, we evaluate the clinical utility of the HIA method for mutation detection (HIAMD). In the analysis of 20 lung and colon tumor pathology specimens, we observed a 100% correlation between the KRAS mutation statuses determined by HIAMD and sequencing. In addition, we were able to detect KRAS mutations in a background of 75% wild-type DNA-a finding consistent with that reported for sequencing. With this, we show that HIAMD allows for the rapid and cost-effective detection of KRAS mutations, without compromising analytical performance. These results indicate the validity of HIAMD as a mutation-testing technology suitable for practical clinical testing. Further expansion of this platform may involve the detection of mutations in other key oncogenic pathways. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
Monodisperse Picoliter Droplets for Low-Bias and Contamination-Free Reactions in Single-Cell Whole Genome Amplification

PubMed Central

Maruyama, Toru; Yamagishi, Keisuke; Mori, Tetsushi; Takeyama, Haruko

2015-01-01

Whole genome amplification (WGA) is essential for obtaining genome sequences from single bacterial cells because the quantity of template DNA contained in a single cell is very low. Multiple displacement amplification (MDA), using Phi29 DNA polymerase and random primers, is the most widely used method for single-cell WGA. However, single-cell MDA usually results in uneven genome coverage because of amplification bias, background amplification of contaminating DNA, and formation of chimeras by linking of non-contiguous chromosomal regions. Here, we present a novel MDA method, termed droplet MDA, that minimizes amplification bias and amplification of contaminants by using picoliter-sized droplets for compartmentalized WGA reactions. Extracted DNA fragments from a lysed cell in MDA mixture are divided into 105 droplets (67 pL) within minutes via flow through simple microfluidic channels. Compartmentalized genome fragments can be individually amplified in these droplets without the risk of encounter with reagent-borne or environmental contaminants. Following quality assessment of WGA products from single Escherichia coli cells, we showed that droplet MDA minimized unexpected amplification and improved the percentage of genome recovery from 59% to 89%. Our results demonstrate that microfluidic-generated droplets show potential as an efficient tool for effective amplification of low-input DNA for single-cell genomics and greatly reduce the cost and labor investment required for determination of nearly complete genome sequences of uncultured bacteria from environmental samples. PMID:26389587
Draft Genome Sequence of Thermoanaerobacter sp. Strain A7A, Reconstructed from a Metagenome Obtained from a High-Temperature Hydrocarbon Reservoir in the Bass Strait, Australia

PubMed Central

Li, Dongmei; Greenfield, Paul; Rosewarne, Carly P.

2013-01-01

The draft genome sequence of Thermoanaerobacter sp. strain A7A was reconstructed from a metagenome of a microbial consortium obtained from the Tuna oil field in the Gippsland Basin, Australia. The organism is a strict anaerobe that is predicted to ferment a range of simple sugars and undertake sulfur reduction. PMID:24029756
Comparison between rpoB and 16S rRNA Gene Sequencing for Molecular Identification of 168 Clinical Isolates of Corynebacterium

PubMed Central

Khamis, Atieh; Raoult, Didier; La Scola, Bernard

2005-01-01

Higher proportions (91%) of 168 corynebacterial isolates were positively identified by partial rpoB gene determination than by that based on 16S rRNA gene sequences. This method is thus a simple, molecular-analysis-based method for identification of corynebacteria, but it should be used in conjunction with other tests for definitive identification. PMID:15815024
Long-term Recurrent Convolutional Networks for Visual Recognition and Description

DTIC Science & Technology

2014-11-17

deep???, are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large...models which are also recurrent, or “temporally deep”, are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent...limitation of simple RNN models which strictly integrate state information over time is known as the “vanishing gradient” effect : the ability to

Detection of genomic rearrangements in cucumber using genomecmp software

NASA Astrophysics Data System (ADS)

Kulawik, Maciej; Pawełkowicz, Magdalena Ewa; Wojcieszek, Michał; PlÄ der, Wojciech; Nowak, Robert M.

2017-08-01

Comparative genomic by increasing information about the genomes sequences available in the databases is a rapidly evolving science. A simple comparison of the general features of genomes such as genome size, number of genes, and chromosome number presents an entry point into comparative genomic analysis. Here we present the utility of the new tool genomecmp for finding rearrangements across the compared sequences and applications in plant comparative genomics.
Separating Putative Pathogens from Background Contamination with Principal Orthogonal Decomposition: Evidence for Leptospira in the Ugandan Neonatal Septisome

PubMed Central

Schiff, Steven J.; Kiwanuka, Julius; Riggio, Gina; Nguyen, Lan; Mu, Kevin; Sproul, Emily; Bazira, Joel; Mwanga-Amumpaire, Juliet; Tumusiime, Dickson; Nyesigire, Eunice; Lwanga, Nkangi; Bogale, Kaleb T.; Kapur, Vivek; Broach, James R.; Morton, Sarah U.; Warf, Benjamin C.; Poss, Mary

2016-01-01

Neonatal sepsis (NS) is responsible for over 1 million yearly deaths worldwide. In the developing world, NS is often treated without an identified microbial pathogen. Amplicon sequencing of the bacterial 16S rRNA gene can be used to identify organisms that are difficult to detect by routine microbiological methods. However, contaminating bacteria are ubiquitous in both hospital settings and research reagents and must be accounted for to make effective use of these data. In this study, we sequenced the bacterial 16S rRNA gene obtained from blood and cerebrospinal fluid (CSF) of 80 neonates presenting with NS to the Mbarara Regional Hospital in Uganda. Assuming that patterns of background contamination would be independent of pathogenic microorganism DNA, we applied a novel quantitative approach using principal orthogonal decomposition to separate background contamination from potential pathogens in sequencing data. We designed our quantitative approach contrasting blood, CSF, and control specimens and employed a variety of statistical random matrix bootstrap hypotheses to estimate statistical significance. These analyses demonstrate that Leptospira appears present in some infants presenting within 48 h of birth, indicative of infection in utero, and up to 28 days of age, suggesting environmental exposure. This organism cannot be cultured in routine bacteriological settings and is enzootic in the cattle that often live in close proximity to the rural peoples of western Uganda. Our findings demonstrate that statistical approaches to remove background organisms common in 16S sequence data can reveal putative pathogens in small volume biological samples from newborns. This computational analysis thus reveals an important medical finding that has the potential to alter therapy and prevention efforts in a critically ill population. PMID:27379237
Molecular Identification of Sex in Phoenix dactylifera Using Inter Simple Sequence Repeat Markers.

PubMed

Al-Ameri, Abdulhafed A; Al-Qurainy, Fahad; Gaafar, Abdel-Rhman Z; Khan, Salim; Nadeem, M

2016-01-01

Early sex identification of Date Palm (Phoenix dactylifera L.) at seedling stage is an economically desirable objective, which will significantly increase the profits of seed based cultivation. The utilization of molecular markers at this stage for early and rapid identification of sex is important due to the lack of morphological markers. In this study, a total of two hundred Inter Simple Sequence Repeat (ISSR) primers were screened among male and female Date palm plants to identify putative sex-specific marker, out of which only two primers (IS_A02 and IS_A71) were found to be associated with sex. The primer IS_A02 produced a unique band of size 390 bp and was found clearly in all female plants, while it was absent in all male plants. Contrary to this, the primer IS_A71 produced a unique band of size 380 bp and was clearly found in all male plants, whereas it was absent in all the female plants. Subsequently, these specific fragments were excised, purified, and sequenced for the development of sequence specific markers further in future for the implementation on dioecious Date Palm for sex determination. These markers are efficient, highly reliable, and reproducible for sex identification at the early stage of seedling.
Plasmonic SERS nanochips and nanoprobes for medical diagnostics and bio-energy applications

NASA Astrophysics Data System (ADS)

Ngo, Hoan T.; Wang, Hsin-Neng; Crawford, Bridget M.; Fales, Andrew M.; Vo-Dinh, Tuan

2017-02-01

The development of rapid, easy-to-use, cost-effective, high accuracy, and high sensitive DNA detection methods for molecular diagnostics has been receiving increasing interest. Over the last five years, our laboratory has developed several chip-based DNA detection techniques including the molecular sentinel-on-chip (MSC), the multiplex MSC, and the inverse molecular sentinel-on-chip (iMS-on-Chip). In these techniques, plasmonic surface-enhanced Raman scattering (SERS) Nanowave chips were functionalized with DNA probes for single-step DNA detection. Sensing mechanisms were based on hybridization of target sequences and DNA probes, resulting in a distance change between SERS reporters and the Nanowave chip's gold surface. This distance change resulted in change in SERS intensity, thus indicating the presence and capture of the target sequences. Our techniques were single-step DNA detection techniques. Target sequences were detected by simple delivery of sample solutions onto DNA probe-functionalized Nanowave chips and SERS signals were measured after 1h - 2h incubation. Target sequence labeling or washing to remove unreacted components was not required, making the techniques simple, easy-to-use, and cost effective. The usefulness of the techniques for medical diagnostics was illustrated by the detection of genetic biomarkers for respiratory viral infection and of dengue virus 4 DNA.
Hermes Transposon Distribution and Structure in Musca domestica

PubMed Central

Subramanian, Ramanand A.; Cathcart, Laura A.; Krafsur, Elliot S.; Atkinson, Peter W.

2009-01-01

Hermes are hAT transposons from Musca domestica that are very closely related to the hobo transposons from Drosophila melanogaster and are useful as gene vectors in a wide variety of organisms including insects, planaria, and yeast. hobo elements show distinct length variations in a rapidly evolving region of the transposase-coding region as a result of expansions and contractions of a simple repeat sequence encoding 3 amino acids threonine, proline, and glutamic acid (TPE). These variations in length may influence the function of the protein and the movement of hobo transposons in natural populations. Here, we determine the distribution of Hermes in populations of M. domestica as well as whether Hermes transposase has undergone similar sequence expansions and contractions during its evolution in this species. Hermes transposons were found in all M. domestica individuals sampled from 14 populations collected from 4 continents. All individuals with Hermes transposons had evidence for the presence of intact transposase open reading frames, and little sequence variation was observed among Hermes elements. A systematic analysis of the TPE-homologous region of the Hermes transposase-coding region revealed no evidence for length variation. The simple sequence repeat found in hobo elements is a feature of this transposon that evolved since the divergence of hobo and Hermes. PMID:19366812
Learning Orthographic Structure With Sequential Generative Neural Networks.

PubMed

Testolin, Alberto; Stoianov, Ivilin; Sperduti, Alessandro; Zorzi, Marco

2016-04-01

Learning the structure of event sequences is a ubiquitous problem in cognition and particularly in language. One possible solution is to learn a probabilistic generative model of sequences that allows making predictions about upcoming events. Though appealing from a neurobiological standpoint, this approach is typically not pursued in connectionist modeling. Here, we investigated a sequential version of the restricted Boltzmann machine (RBM), a stochastic recurrent neural network that extracts high-order structure from sensory data through unsupervised generative learning and can encode contextual information in the form of internal, distributed representations. We assessed whether this type of network can extract the orthographic structure of English monosyllables by learning a generative model of the letter sequences forming a word training corpus. We show that the network learned an accurate probabilistic model of English graphotactics, which can be used to make predictions about the letter following a given context as well as to autonomously generate high-quality pseudowords. The model was compared to an extended version of simple recurrent networks, augmented with a stochastic process that allows autonomous generation of sequences, and to non-connectionist probabilistic models (n-grams and hidden Markov models). We conclude that sequential RBMs and stochastic simple recurrent networks are promising candidates for modeling cognition in the temporal domain. Copyright © 2015 Cognitive Science Society, Inc.
A Simple Method for the Extraction, PCR-amplification, Cloning, and Sequencing of Pasteuria 16S rDNA from Small Numbers of Endospores.

PubMed

Atibalentja, N; Noel, G R; Ciancio, A

2004-03-01

For many years the taxonomy of the genus Pasteuria has been marred with confusion because the bacterium could not be cultured in vitro and, therefore, descriptions were based solely on morphological, developmental, and pathological characteristics. The current study sought to devise a simple method for PCR-amplification, cloning, and sequencing of Pasteuria 16S rDNA from small numbers of endospores, with no need for prior DNA purification. Results show that DNA extracts from plain glass bead-beating of crude suspensions containing 10,000 endospores at 0.2 x 10 endospores ml(-1) were sufficient for PCR-amplification of Pasteuria 16S rDNA, when used in conjunction with specific primers. These results imply that for P. penetrans and P. nishizawae only one parasitized female of Meloidogyne spp. and Heterodera glycines, respectively, should be sufficient, and as few as eight cadavers of Belonolaimus longicaudatus with an average number of 1,250 endospores of "Candidatus Pasteuria usgae" are needed for PCR-amplification of Pasteuria 16S rDNA. The method described in this paper should facilitate the sequencing of the 16S rDNA of the many Pasteuria isolates that have been reported on nematodes and, consequently, expedite the classification of those isolates through comparative sequence analysis.
GIGA: a simple, efficient algorithm for gene tree inference in the genomic age.

PubMed

Thomas, Paul D

2010-06-09

Phylogenetic relationships between genes are not only of theoretical interest: they enable us to learn about human genes through the experimental work on their relatives in numerous model organisms from bacteria to fruit flies and mice. Yet the most commonly used computational algorithms for reconstructing gene trees can be inaccurate for numerous reasons, both algorithmic and biological. Additional information beyond gene sequence data has been shown to improve the accuracy of reconstructions, though at great computational cost. We describe a simple, fast algorithm for inferring gene phylogenies, which makes use of information that was not available prior to the genomic age: namely, a reliable species tree spanning much of the tree of life, and knowledge of the complete complement of genes in a species' genome. The algorithm, called GIGA, constructs trees agglomeratively from a distance matrix representation of sequences, using simple rules to incorporate this genomic age information. GIGA makes use of a novel conceptualization of gene trees as being composed of orthologous subtrees (containing only speciation events), which are joined by other evolutionary events such as gene duplication or horizontal gene transfer. An important innovation in GIGA is that, at every step in the agglomeration process, the tree is interpreted/reinterpreted in terms of the evolutionary events that created it. Remarkably, GIGA performs well even when using a very simple distance metric (pairwise sequence differences) and no distance averaging over clades during the tree construction process. GIGA is efficient, allowing phylogenetic reconstruction of very large gene families and determination of orthologs on a large scale. It is exceptionally robust to adding more gene sequences, opening up the possibility of creating stable identifiers for referring to not only extant genes, but also their common ancestors. We compared trees produced by GIGA to those in the TreeFam database, and they were very similar in general, with most differences likely due to poor alignment quality. However, some remaining differences are algorithmic, and can be explained by the fact that GIGA tends to put a larger emphasis on minimizing gene duplication and deletion events.
Derivational Suffixes as Cues to Stress Position in Reading Greek

ERIC Educational Resources Information Center

Grimani, Aikaterini; Protopapas, Athanassios

2017-01-01

Background: In languages with lexical stress, reading aloud must include stress assignment. Stress information sources across languages include word-final letter sequences. Here, we examine whether such sequences account for stress assignment in Greek and whether this is attributable to absolute rules involving accenting morphemes or to…
Properties of Cordonnier, Perrin and Van der Laan Numbers

ERIC Educational Resources Information Center

Shannon, A. G.; Anderson, P. G.; Horadam, A. F.

2006-01-01

This paper aims to explore some properties of certain third-order linear sequences which have some properties analogous to the better known second-order sequences of Fibonacci and Lucas. Historical background issues are outlined. These, together with the number and combinatorial theoretical results, provide plenty of pedagogical opportunities for…
GSP: a web-based platform for designing genome-specific primers in polyploids

USDA-ARS?s Scientific Manuscript database

The primary goal of this research was to develop a web-based platform named GSP for designing genome-specific primers to distinguish subgenome sequences in the polyploid genome background. GSP uses BLAST to extract homeologous sequences of the subgenomes in the existing databases, performed a multip...
Whole genome sequencing of elite rice cultivars as a comprehensive information resource for marker assisted selection

USDA-ARS?s Scientific Manuscript database

Current advances in sequencing technologies and bioinformatics allow to determine a nearly complete genomic background of rice, a staple food for the poor people. Consequently, comprehensive databases of variation among thousands of varieties is currently being assembled and released. Proper analysi...
Regularization in Short-Term Memory for Serial Order

ERIC Educational Resources Information Center

Botvinick, Matthew; Bylsma, Lauren M.

2005-01-01

Previous research has shown that short-term memory for serial order can be influenced by background knowledge concerning regularities of sequential structure. Specifically, it has been shown that recall is superior for sequences that fit well with familiar sequencing constraints. The authors report a corresponding effect pertaining to serial…
The Relative Importance of Children's Criteria for Representational Adequacy in the Perception of Simple Sonic Stimuli

ERIC Educational Resources Information Center

Verschaffel, Lieven; Reybrouck, Mark; Degraeuwe, Goedele; Van Dooren, Wim

2013-01-01

This study investigates children's metarepresentational competence (MRC) with regard to listening to and making sense of simple sonic stimuli. Using diSessa's (2002) seminal work on MRC in mathematics and sciences as background, it aims to assess the relative importance children attribute to several criteria for representational adequacy…
Pizza Again? On the Division of Polygons into Sections with a Common Origin

ERIC Educational Resources Information Center

Sinitsky, Ilya; Stupel, Moshe; Sinitsky, Marina

2018-01-01

The paper explores the division of a polygon into equal-area pieces using line segments originating at a common point. The mathematical background of the proposed method is very simple and belongs to secondary school geometry. Simple examples dividing a square into two, four or eight congruent pieces provide a starting point to discovering how to…
RSAT: regulatory sequence analysis tools.

PubMed

Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

2008-07-01

The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundreds of researchers from all over the world. Several predictions made with RSAT were validated experimentally and published.
Consistency of gene starts among Burkholderia genomes

PubMed Central

2011-01-01

Background Evolutionary divergence in the position of the translational start site among orthologous genes can have significant functional impacts. Divergence can alter the translation rate, degradation rate, subcellular location, and function of the encoded proteins. Results Existing Genbank gene maps for Burkholderia genomes suggest that extensive divergence has occurred--53% of ortholog sets based on Genbank gene maps had inconsistent gene start sites. However, most of these inconsistencies appear to be gene-calling errors. Evolutionary divergence was the most plausible explanation for only 17% of the ortholog sets. Correcting probable errors in the Genbank gene maps decreased the percentage of ortholog sets with inconsistent starts by 68%, increased the percentage of ortholog sets with extractable upstream intergenic regions by 32%, increased the sequence similarity of intergenic regions and predicted proteins, and increased the number of proteins with identifiable signal peptides. Conclusions Our findings highlight an emerging problem in comparative genomics: single-digit percent errors in gene predictions can lead to double-digit percentages of inconsistent ortholog sets. The work demonstrates a simple approach to evaluate and improve the quality of gene maps. PMID:21342528
ColorTree: a batch customization tool for phylogenic trees

PubMed Central

Chen, Wei-Hua; Lercher, Martin J

2009-01-01

Background Genome sequencing projects and comparative genomics studies typically aim to trace the evolutionary history of large gene sets, often requiring human inspection of hundreds of phylogenetic trees. If trees are checked for compatibility with an explicit null hypothesis (e.g., the monophyly of certain groups), this daunting task is greatly facilitated by an appropriate coloring scheme. Findings In this note, we introduce ColorTree, a simple yet powerful batch customization tool for phylogenic trees. Based on pattern matching rules, ColorTree applies a set of customizations to an input tree file, e.g., coloring labels or branches. The customized trees are saved to an output file, which can then be viewed and further edited by Dendroscope (a freely available tree viewer). ColorTree runs on any Perl installation as a stand-alone command line tool, and its application can thus be easily automated. This way, hundreds of phylogenic trees can be customized for easy visual inspection in a matter of minutes. Conclusion ColorTree allows efficient and flexible visual customization of large tree sets through the application of a user-supplied configuration file to multiple tree files. PMID:19646243
Simple sequence repeat markers useful for sorghum downy mildew (Peronosclerospora sorghi) and related species

PubMed Central

Perumal, Ramasamy; Nimmakayala, Padmavathi; Erattaimuthu, Saradha R; No, Eun-Gyu; Reddy, Umesh K; Prom, Louis K; Odvody, Gary N; Luster, Douglas G; Magill, Clint W

2008-01-01

Background A recent outbreak of sorghum downy mildew in Texas has led to the discovery of both metalaxyl resistance and a new pathotype in the causal organism, Peronosclerospora sorghi. These observations and the difficulty in resolving among phylogenetically related downy mildew pathogens dramatically point out the need for simply scored markers in order to differentiate among isolates and species, and to study the population structure within these obligate oomycetes. Here we present the initial results from the use of a biotin capture method to discover, clone and develop PCR primers that permit the use of simple sequence repeats (microsatellites) to detect differences at the DNA level. Results Among the 55 primers pairs designed from clones from pathotype 3 of P. sorghi, 36 flanked microsatellite loci containing simple repeats, including 28 (55%) with dinucleotide repeats and 6 (11%) with trinucleotide repeats. A total of 22 microsatellites with CA/AC or GT/TG repeats were the most abundant (40%) and GA/AG or CT/TC types contribute 15% in our collection. When used to amplify DNA from 19 isolates from P. sorghi, as well as from 5 related species that cause downy mildew on other hosts, the number of different bands detected for each SSR primer pair using a LI-COR- DNA Analyzer ranged from two to eight. Successful cross-amplification for 12 primer pairs studied in detail using DNA from downy mildews that attack maize (P. maydis & P. philippinensis), sugar cane (P. sacchari), pearl millet (Sclerospora graminicola) and rose (Peronospora sparsa) indicate that the flanking regions are conserved in all these species. A total of 15 SSR amplicons unique to P. philippinensis (one of the potential threats to US maize production) were detected, and these have potential for development of diagnostic tests. A total of 260 alleles were obtained using 54 microsatellites primer combinations, with an average of 4.8 polymorphic markers per SSR across 34 Peronosclerospora, Peronospora and Sclerospora spp isolates studied. Cluster analysis by UPGMA as well as principal coordinate analysis (PCA) grouped the 34 isolates into three distinct groups (all 19 isolates of Peronosclerospora sorghi in cluster I, five isolates of P. maydis and three isolates of P. sacchari in cluster II and five isolates of Sclerospora graminicola in cluster III). Conclusion To our knowledge, this is the first attempt to extensively develop SSR markers from Peronosclerospora genomic DNA. The newly developed SSR markers can be readily used to distinguish isolates within several species of the oomycetes that cause downy mildew diseases. Also, microsatellite fragments likely include retrotransposon regions of DNA and these sequences can serve as useful genetic markers for strain identification, due to their degree of variability and their widespread occurrence among sorghum, maize, sugarcane, pearl millet and rose downy mildew isolates. PMID:19040756
LookSeq: a browser-based viewer for deep sequencing data.

PubMed

Manske, Heinrich Magnus; Kwiatkowski, Dominic P

2009-11-01

Sequencing a genome to great depth can be highly informative about heterogeneity within an individual or a population. Here we address the problem of how to visualize the multiple layers of information contained in deep sequencing data. We propose an interactive AJAX-based web viewer for browsing large data sets of aligned sequence reads. By enabling seamless browsing and fast zooming, the LookSeq program assists the user to assimilate information at different levels of resolution, from an overview of a genomic region to fine details such as heterogeneity within the sample. A specific problem, particularly if the sample is heterogeneous, is how to depict information about structural variation. LookSeq provides a simple graphical representation of paired sequence reads that is more revealing about potential insertions and deletions than are conventional methods.

Socio-Economic Background and Access to Internet as Correlates of Students' Achievement in Agricultural Science

ERIC Educational Resources Information Center

Adegoke, Sunday Paul; Osokoya, Modupe M.

2015-01-01

This study investigated access to internet and socio-economic background as correlates of students' achievement in Agricultural Science among selected Senior Secondary Schools Two Students in Ogbomoso South and North Local Government Areas. The study adopted multi-stage sampling technique. Simple random sampling was used to select 30 students from…
A Simple Exact Error Rate Analysis for DS-CDMA with Arbitrary Pulse Shape in Flat Nakagami Fading

NASA Astrophysics Data System (ADS)

Rahman, Mohammad Azizur; Sasaki, Shigenobu; Kikuchi, Hisakazu; Harada, Hiroshi; Kato, Shuzo

A simple exact error rate analysis is presented for random binary direct sequence code division multiple access (DS-CDMA) considering a general pulse shape and flat Nakagami fading channel. First of all, a simple model is developed for the multiple access interference (MAI). Based on this, a simple exact expression of the characteristic function (CF) of MAI is developed in a straight forward manner. Finally, an exact expression of error rate is obtained following the CF method of error rate analysis. The exact error rate so obtained can be much easily evaluated as compared to the only reliable approximate error rate expression currently available, which is based on the Improved Gaussian Approximation (IGA).
DOE Office of Scientific and Technical Information (OSTI.GOV)

Stevens, David E.

The process by which super-thermal ions slow down against background Coulomb potentials arises in many fields of study. In particular, this is one of the main mechanisms by which the mass and energy from the reaction products of fusion reactions is deposited back into the background. Many of these fields are characterized by length and time scales that are the same magnitude as the range and duration of the trajectory of these particles, before they thermalize into the background. This requires numerical simulation of this slowing down process through numerically integrating the velocities and energies of these particles. This papermore » first presents a simple introduction to the required plasma physics, followed by the description of the numerical integration used to integrate a beam of particles. This algorithm is unique in that it combines in an integrated manner both a second-order integration of the slowing down with the particle beam dispersion. These two processes are typically computed in isolation from each other. A simple test problem of a beam of alpha particles slowing down against an inert background of deuterium and tritium with varying properties of both the beam and the background illustrate the utility of the algorithm. This is followed by conclusions and appendices. The appendices define the notation, units, and several useful identities.« less
LISTA, LISTA-HOP and LISTA-HON: a comprehensive compilation of protein encoding sequences and its associated homology databases from the yeast Saccharomyces.

PubMed Central

Dölz, R; Mossé, M O; Slonimski, P P; Bairoch, A; Linder, P

1994-01-01

We continued our effort to make a comprehensive database (LISTA) for the yeast Saccharomyces cerevisiae. In this database each sequence has been attributed a single genetic name. In the case of duplicated sequences a simple method has been applied to distinguish between sequences of one and the same gene from non-allelic sequences of duplicated genes. If necessary, synonyms are given in the case of allelic duplicated sequences. Thus sequences can be found either by the name or by synonyms given in LISTA. Each entry contains the genetic name, the mnemonic from the EMBL data bank, the codon bias, reference of the publication of the sequence, Chromosomal location as far as known, Swissprot and EMBL accession numbers. To obtain more information on the included sequences, each entry has been screened against non-redundant nucleotide and protein data bank collections resulting in LISTA-HON and LISTA-HOP. The LISTA data base can be linked to the associated data sets or to nucleotide and protein banks by the Sequence Retrieval System (SRS). PMID:7937046
Prof. Hayashi's work on the pre-main sequence evolution and brown dwarfs

NASA Astrophysics Data System (ADS)

Nakano, Takenori

2012-09-01

Prof. Hayashi's work on the evolution of stars in the pre-main sequence stage is reviewed. The historical background and the process of finding the Hayashi phase are mentioned. The work on the evolution of low-mass stars is also reviewed including the determination of the bottom of the main sequence and evolution of brown dwarfs, and comparison is made with the other works in the same period.
High-Throughput Analysis of T-DNA Location and Structure Using Sequence Capture.

PubMed

Inagaki, Soichi; Henry, Isabelle M; Lieberman, Meric C; Comai, Luca

2015-01-01

Agrobacterium-mediated transformation of plants with T-DNA is used both to introduce transgenes and for mutagenesis. Conventional approaches used to identify the genomic location and the structure of the inserted T-DNA are laborious and high-throughput methods using next-generation sequencing are being developed to address these problems. Here, we present a cost-effective approach that uses sequence capture targeted to the T-DNA borders to select genomic DNA fragments containing T-DNA-genome junctions, followed by Illumina sequencing to determine the location and junction structure of T-DNA insertions. Multiple probes can be mixed so that transgenic lines transformed with different T-DNA types can be processed simultaneously, using a simple, index-based pooling approach. We also developed a simple bioinformatic tool to find sequence read pairs that span the junction between the genome and T-DNA or any foreign DNA. We analyzed 29 transgenic lines of Arabidopsis thaliana, each containing inserts from 4 different T-DNA vectors. We determined the location of T-DNA insertions in 22 lines, 4 of which carried multiple insertion sites. Additionally, our analysis uncovered a high frequency of unconventional and complex T-DNA insertions, highlighting the needs for high-throughput methods for T-DNA localization and structural characterization. Transgene insertion events have to be fully characterized prior to use as commercial products. Our method greatly facilitates the first step of this characterization of transgenic plants by providing an efficient screen for the selection of promising lines.
Directionality analysis on functional magnetic resonance imaging during motor task using Granger causality.

PubMed

Anwar, A R; Muthalib, M; Perrey, S; Galka, A; Granert, O; Wolff, S; Deuschl, G; Raethjen, J; Heute, U; Muthuraman, M

2012-01-01

Directionality analysis of signals originating from different parts of brain during motor tasks has gained a lot of interest. Since brain activity can be recorded over time, methods of time series analysis can be applied to medical time series as well. Granger Causality is a method to find a causal relationship between time series. Such causality can be referred to as a directional connection and is not necessarily bidirectional. The aim of this study is to differentiate between different motor tasks on the basis of activation maps and also to understand the nature of connections present between different parts of the brain. In this paper, three different motor tasks (finger tapping, simple finger sequencing, and complex finger sequencing) are analyzed. Time series for each task were extracted from functional magnetic resonance imaging (fMRI) data, which have a very good spatial resolution and can look into the sub-cortical regions of the brain. Activation maps based on fMRI images show that, in case of complex finger sequencing, most parts of the brain are active, unlike finger tapping during which only limited regions show activity. Directionality analysis on time series extracted from contralateral motor cortex (CMC), supplementary motor area (SMA), and cerebellum (CER) show bidirectional connections between these parts of the brain. In case of simple finger sequencing and complex finger sequencing, the strongest connections originate from SMA and CMC, while connections originating from CER in either direction are the weakest ones in magnitude during all paradigms.
Accounting for orphaned aftershocks in the earthquake background rate

USGS Publications Warehouse

Van Der Elst, Nicholas

2017-01-01

Aftershocks often occur within cascades of triggered seismicity in which each generation of aftershocks triggers an additional generation, and so on. The rate of earthquakes in any particular generation follows Omori's law, going approximately as 1/t. This function decays rapidly, but is heavy-tailed, and aftershock sequences may persist for long times at a rate that is difficult to discriminate from background. It is likely that some apparently spontaneous earthquakes in the observational catalogue are orphaned aftershocks of long-past main shocks. To assess the relative proportion of orphaned aftershocks in the apparent background rate, I develop an extension of the ETAS model that explicitly includes the expected contribution of orphaned aftershocks to the apparent background rate. Applying this model to California, I find that the apparent background rate can be almost entirely attributed to orphaned aftershocks, depending on the assumed duration of an aftershock sequence. This implies an earthquake cascade with a branching ratio (the average number of directly triggered aftershocks per main shock) of nearly unity. In physical terms, this implies that very few earthquakes are completely isolated from the perturbing effects of other earthquakes within the fault system. Accounting for orphaned aftershocks in the ETAS model gives more accurate estimates of the true background rate, and more realistic expectations for long-term seismicity patterns.
Accounting for orphaned aftershocks in the earthquake background rate

NASA Astrophysics Data System (ADS)

van der Elst, Nicholas J.

2017-11-01

Aftershocks often occur within cascades of triggered seismicity in which each generation of aftershocks triggers an additional generation, and so on. The rate of earthquakes in any particular generation follows Omori's law, going approximately as 1/t. This function decays rapidly, but is heavy-tailed, and aftershock sequences may persist for long times at a rate that is difficult to discriminate from background. It is likely that some apparently spontaneous earthquakes in the observational catalogue are orphaned aftershocks of long-past main shocks. To assess the relative proportion of orphaned aftershocks in the apparent background rate, I develop an extension of the ETAS model that explicitly includes the expected contribution of orphaned aftershocks to the apparent background rate. Applying this model to California, I find that the apparent background rate can be almost entirely attributed to orphaned aftershocks, depending on the assumed duration of an aftershock sequence. This implies an earthquake cascade with a branching ratio (the average number of directly triggered aftershocks per main shock) of nearly unity. In physical terms, this implies that very few earthquakes are completely isolated from the perturbing effects of other earthquakes within the fault system. Accounting for orphaned aftershocks in the ETAS model gives more accurate estimates of the true background rate, and more realistic expectations for long-term seismicity patterns.
Escherichia coli promoter sequences predict in vitro RNA polymerase selectivity.

PubMed

Mulligan, M E; Hawley, D K; Entriken, R; McClure, W R

1984-01-11

We describe a simple algorithm for computing a homology score for Escherichia coli promoters based on DNA sequence alone. The homology score was related to 31 values, measured in vitro, of RNA polymerase selectivity, which we define as the product KBk2, the apparent second order rate constant for open complex formation. We found that promoter strength could be predicted to within a factor of +/-4.1 in KBk2 over a range of 10(4) in the same parameter. The quantitative evaluation was linked to an automated (Apple II) procedure for searching and evaluating possible promoters in DNA sequence files.
Gamma-Ray Imaging Probes.

NASA Astrophysics Data System (ADS)

Wild, Walter James

1988-12-01

External nuclear medicine diagnostic imaging of early primary and metastatic lung cancer tumors is difficult due to the poor sensitivity and resolution of existing gamma cameras. Nonimaging counting detectors used for internal tumor detection give ambiguous results because distant background variations are difficult to discriminate from neighboring tumor sites. This suggests that an internal imaging nuclear medicine probe, particularly an esophageal probe, may be advantageously used to detect small tumors because of the ability to discriminate against background variations and the capability to get close to sites neighboring the esophagus. The design, theory of operation, preliminary bench tests, characterization of noise behavior and optimization of such an imaging probe is the central theme of this work. The central concept lies in the representation of the aperture shell by a sequence of binary digits. This, coupled with the mode of operation which is data encoding within an axial slice of space, leads to the fundamental imaging equation in which the coding operation is conveniently described by a circulant matrix operator. The coding/decoding process is a classic coded-aperture problem, and various estimators to achieve decoding are discussed. Some estimators require a priori information about the object (or object class) being imaged; the only unbiased estimator that does not impose this requirement is the simple inverse-matrix operator. The effects of noise on the estimate (or reconstruction) is discussed for general noise models and various codes/decoding operators. The choice of an optimal aperture for detector count times of clinical relevance is examined using a statistical class-separability formalism.
Simple data-smoothing and noise-suppression technique

NASA Technical Reports Server (NTRS)

Duty, R. L.

1970-01-01

Algorithm, based on the Borel method of summing divergent sequences, is used for smoothing noisy data where knowledge of frequency content is not required. Technique's effectiveness is demonstrated by a series of graphs.
Building Mathematical Models of Simple Harmonic and Damped Motion.

ERIC Educational Resources Information Center

Edwards, Thomas

1995-01-01

By developing a sequence of mathematical models of harmonic motion, shows that mathematical models are not right or wrong, but instead are better or poorer representations of the problem situation. (MKR)
Genetic variation assessment of acid lime accessions collected from south of Iran using SSR and ISSR molecular markers.

PubMed

Sharafi, Ata Allah; Abkenar, Asad Asadi; Sharafi, Ali; Masaeli, Mohammad

2016-01-01

Iran has a long history of acid lime cultivation and propagation. In this study, genetic variation in 28 acid lime accessions from five regions of south of Iran, and their relatedness with other 19 citrus cultivars were analyzed using Simple Sequence Repeat (SSR) and Inter-Simple Sequence Repeat (ISSR) molecular markers. Nine primers for SSR and nine ISSR primers were used for allele scoring. In total, 49 SSR and 131 ISSR polymorphic alleles were detected. Cluster analysis of SSR and ISSR data showed that most of the acid lime accessions (19 genotypes) have hybrid origin and genetically distance with nucellar of Mexican lime (9 genotypes). As nucellar of Mexican lime are susceptible to phytoplasma, these acid lime genotypes can be used to evaluate their tolerance against biotic constricts like lime "witches' broom disease".
A set of tetra-nucleotide core motif SSR markers for efficient identification of potato (Solanum tuberosum) cultivars.

PubMed

Kishine, Masahiro; Tsutsumi, Katsuji; Kitta, Kazumi

2017-12-01

Simple sequence repeat (SSR) is a popular tool for individual fingerprinting. The long-core motif (e.g. tetra-, penta-, and hexa-nucleotide) simple sequence repeats (SSRs) are preferred because they make it easier to separate and distinguish neighbor alleles. In the present study, a new set of 8 tetra-nucleotide SSRs in potato ( Solanum tuberosum ) is reported. By using these 8 markers, 72 out of 76 cultivars obtained from Japan and the United States were clearly discriminated, while two pairs, both of which arose from natural variation, showed identical profiles. The combined probability of identity between two random cultivars for the set of 8 SSR markers was estimated to be 1.10 × 10 -8 , confirming the usefulness of the proposed SSR markers for fingerprinting analyses of potato.
PERMutation Using Transposase Engineering (PERMUTE): A Simple Approach for Constructing Circularly Permuted Protein Libraries.

PubMed

Jones, Alicia M; Atkinson, Joshua T; Silberg, Jonathan J

2017-01-01

Rearrangements that alter the order of a protein's sequence are used in the lab to study protein folding, improve activity, and build molecular switches. One of the simplest ways to rearrange a protein sequence is through random circular permutation, where native protein termini are linked together and new termini are created elsewhere through random backbone fission. Transposase mutagenesis has emerged as a simple way to generate libraries encoding different circularly permuted variants of proteins. With this approach, a synthetic transposon (called a permuteposon) is randomly inserted throughout a circularized gene to generate vectors that express different permuted variants of a protein. In this chapter, we outline the protocol for constructing combinatorial libraries of circularly permuted proteins using transposase mutagenesis, and we describe the different permuteposons that have been developed to facilitate library construction.
Object-oriented parsing of biological databases with Python.

PubMed

Ramu, C; Gemünd, C; Gibson, T J

2000-07-01

While database activities in the biological area are increasing rapidly, rather little is done in the area of parsing them in a simple and object-oriented way. We present here an elegant, simple yet powerful way of parsing biological flat-file databases. We have taken EMBL, SWISSPROT and GENBANK as examples. EMBL and SWISS-PROT do not differ much in the format structure. GENBANK has a very different format structure than EMBL and SWISS-PROT. Extracting the desired fields in an entry (for example a sub-sequence with an associated feature) for later analysis is a constant need in the biological sequence-analysis community: this is illustrated with tools to make new splice-site databases. The interface to the parser is abstract in the sense that the access to all the databases is independent from their different formats, since parsing instructions are hidden.
Short Communication: Genetic linkage map of Cucurbita maxima with molecular and morphological markers.

PubMed

Ge, Y; Li, X; Yang, X X; Cui, C S; Qu, S P

2015-05-22

Cucurbita maxima is one of the most widely cultivated vegetables in China and exhibits distinct morphological characteristics. In this study, genetic linkage analysis with 57 simple-sequence repeats, 21 amplified fragment length polymorphisms, 3 random-amplified polymorphic DNA, and one morphological marker revealed 20 genetic linkage groups of C. maxima covering a genetic distance of 991.5 cM with an average of 12.1 cM between adjacent markers. Genetic linkage analysis identified the simple-sequence repeat marker 'PU078072' 5.9 cM away from the locus 'Rc', which controls rind color. The genetic map in the present study will be useful for better mapping, tagging, and cloning of quantitative trait loci/gene(s) affecting economically important traits and for breeding new varieties of C. maxima through marker-assisted selection.
Genetic diversity of Pinus nigra Arn. populations in Southern Spain and Northern Morocco revealed by inter-simple sequence repeat profiles.

PubMed

Rubio-Moraga, Angela; Candel-Perez, David; Lucas-Borja, Manuel E; Tiscar, Pedro A; Viñegla, Benjamin; Linares, Juan C; Gómez-Gómez, Lourdes; Ahrazem, Oussama

2012-01-01

Eight Pinus nigra Arn. populations from Southern Spain and Northern Morocco were examined using inter-simple sequence repeat markers to characterize the genetic variability amongst populations. Pair-wise population genetic distance ranged from 0.031 to 0.283, with a mean of 0.150 between populations. The highest inter-population average distance was between PaCU from Cuenca and YeCA from Cazorla, while the lowest distance was between TaMO from Morocco and MA Sierra Mágina populations. Analysis of molecular variance (AMOVA) and Nei's genetic diversity analyses revealed higher genetic variation within the same population than among different populations. Genetic differentiation (Gst) was 0.233. Cuenca showed the highest Nei's genetic diversity followed by the Moroccan region, Sierra Mágina, and Cazorla region. However, clustering of populations was not in accordance with their geographical locations. Principal component analysis showed the presence of two major groups-Group 1 contained all populations from Cuenca while Group 2 contained populations from Cazorla, Sierra Mágina and Morocco-while Bayesian analysis revealed the presence of three clusters. The low genetic diversity observed in PaCU and YeCA is probably a consequence of inappropriate management since no estimation of genetic variability was performed before the silvicultural treatments. Data indicates that the inter-simple sequence repeat (ISSR) method is sufficiently informative and powerful to assess genetic variability among populations of P. nigra.
Genetic Diversity of Pinus nigra Arn. Populations in Southern Spain and Northern Morocco Revealed By Inter-Simple Sequence Repeat Profiles †

PubMed Central

Rubio-Moraga, Angela; Candel-Perez, David; Lucas-Borja, Manuel E.; Tiscar, Pedro A.; Viñegla, Benjamin; Linares, Juan C.; Gómez-Gómez, Lourdes; Ahrazem, Oussama

2012-01-01

Eight Pinus nigra Arn. populations from Southern Spain and Northern Morocco were examined using inter-simple sequence repeat markers to characterize the genetic variability amongst populations. Pair-wise population genetic distance ranged from 0.031 to 0.283, with a mean of 0.150 between populations. The highest inter-population average distance was between PaCU from Cuenca and YeCA from Cazorla, while the lowest distance was between TaMO from Morocco and MA Sierra Mágina populations. Analysis of molecular variance (AMOVA) and Nei’s genetic diversity analyses revealed higher genetic variation within the same population than among different populations. Genetic differentiation (Gst) was 0.233. Cuenca showed the highest Nei’s genetic diversity followed by the Moroccan region, Sierra Mágina, and Cazorla region. However, clustering of populations was not in accordance with their geographical locations. Principal component analysis showed the presence of two major groups—Group 1 contained all populations from Cuenca while Group 2 contained populations from Cazorla, Sierra Mágina and Morocco—while Bayesian analysis revealed the presence of three clusters. The low genetic diversity observed in PaCU and YeCA is probably a consequence of inappropriate management since no estimation of genetic variability was performed before the silvicultural treatments. Data indicates that the inter-simple sequence repeat (ISSR) method is sufficiently informative and powerful to assess genetic variability among populations of P. nigra. PMID:22754321

Genetic fidelity and variability of micropropagated cassava plants (Manihot esculenta Crantz) evaluated using ISSR markers.

PubMed

Vidal, Á M; Vieira, L J; Ferreira, C F; Souza, F V D; Souza, A S; Ledo, C A S

2015-07-14

Molecular markers are efficient for assessing the genetic fidelity of various species of plants after in vitro culture. In this study, we evaluated the genetic fidelity and variability of micropropagated cassava plants (Manihot esculenta Crantz) using inter-simple sequence repeat markers. Twenty-two cassava accessions from the Embrapa Cassava & Fruits Germplasm Bank were used. For each accession, DNA was extracted from a plant maintained in the field and from 3 plants grown in vitro. For DNA amplification, 27 inter-simple sequence repeat primers were used, of which 24 generated 175 bands; 100 of those bands were polymorphic and were used to study genetic variability among accessions of cassava plants maintained in the field. Based on the genetic distance matrix calculated using the arithmetic complement of the Jaccard's index, genotypes were clustered using the unweighted pair group method using arithmetic averages. The number of bands per primer was 2-13, with an average of 7.3. For most micropropagated accessions, the fidelity study showed no genetic variation between plants of the same accessions maintained in the field and those maintained in vitro, confirming the high genetic fidelity of the micropropagated plants. However, genetic variability was observed among different accessions grown in the field, and clustering based on the dissimilarity matrix revealed 7 groups. Inter-simple sequence repeat markers were efficient for detecting the genetic homogeneity of cassava plants derived from meristem culture, demonstrating the reliability of this propagation system.
Automatic segmentation of low-visibility moving objects through energy analyis of the local 3D spectrum

NASA Astrophysics Data System (ADS)

Nestares, Oscar; Miravet, Carlos; Santamaria, Javier; Fonolla Navarro, Rafael

1999-05-01

Automatic object segmentation in highly noisy image sequences, composed by a translating object over a background having a different motion, is achieved through joint motion-texture analysis. Local motion and/or texture is characterized by the energy of the local spatio-temporal spectrum, as different textures undergoing different translational motions display distinctive features in their 3D (x,y,t) spectra. Measurements of local spectrum energy are obtained using a bank of directional 3rd order Gaussian derivative filters in a multiresolution pyramid in space- time (10 directions, 3 resolution levels). These 30 energy measurements form a feature vector describing texture-motion for every pixel in the sequence. To improve discrimination capability and reduce computational cost, we automatically select those 4 features (channels) that best discriminate object from background, under the assumptions that the object is smaller than the background and has a different velocity or texture. In this way we reject features irrelevant or dominated by noise, that could yield wrong segmentation results. This method has been successfully applied to sequences with extremely low visibility and for objects that are even invisible for the eye in absence of motion.
Impact of the HIV-1 genetic background and HIV-1 population size on the evolution of raltegravir resistance.

PubMed

Fun, Axel; Leitner, Thomas; Vandekerckhove, Linos; Däumer, Martin; Thielen, Alexander; Buchholz, Bernd; Hoepelman, Andy I M; Gisolf, Elizabeth H; Schipper, Pauline J; Wensing, Annemarie M J; Nijhuis, Monique

2018-01-05

Emergence of resistance against integrase inhibitor raltegravir in human immunodeficiency virus type 1 (HIV-1) patients is generally associated with selection of one of three signature mutations: Y143C/R, Q148K/H/R or N155H, representing three distinct resistance pathways. The mechanisms that drive selection of a specific pathway are still poorly understood. We investigated the impact of the HIV-1 genetic background and population dynamics on the emergence of raltegravir resistance. Using deep sequencing we analyzed the integrase coding sequence (CDS) in longitudinal samples from five patients who initiated raltegravir plus optimized background therapy at viral loads > 5000 copies/ml. To investigate the role of the HIV-1 genetic background we created recombinant viruses containing the viral integrase coding region from pre-raltegravir samples from two patients in whom raltegravir resistance developed through different pathways. The in vitro selections performed with these recombinant viruses were designed to mimic natural population bottlenecks. Deep sequencing analysis of the viral integrase CDS revealed that the virological response to raltegravir containing therapy inversely correlated with the relative amount of unique sequence variants that emerged suggesting diversifying selection during drug pressure. In 4/5 patients multiple signature mutations representing different resistance pathways were observed. Interestingly, the resistant population can consist of a single resistant variant that completely dominates the population but also of multiple variants from different resistance pathways that coexist in the viral population. We also found evidence for increased diversification after stronger bottlenecks. In vitro selections with low viral titers, mimicking population bottlenecks, revealed that both recombinant viruses and HXB2 reference virus were able to select mutations from different resistance pathways, although typically only one resistance pathway emerged in each individual culture. The generation of a specific raltegravir resistant variant is not predisposed in the genetic background of the viral integrase CDS. Typically, in the early phases of therapy failure the sequence space is explored and multiple resistance pathways emerge and then compete for dominance which frequently results in a switch of the dominant population over time towards the fittest variant or even multiple variants of similar fitness that can coexist in the viral population.
Observable tensor-to-scalar ratio and secondary gravitational wave background

NASA Astrophysics Data System (ADS)

Chatterjee, Arindam; Mazumdar, Anupam

2018-03-01

In this paper we will highlight how a simple vacuum energy dominated inflection-point inflation can match the current data from cosmic microwave background radiation, and predict large primordial tensor to scalar ratio, r ˜O (10-3-10-2), with observable second order gravitational wave background, which can be potentially detectable from future experiments, such as DECi-hertz Interferometer Gravitational wave Observatory (DECIGO), Laser Interferometer Space Antenna (eLISA), cosmic explorer (CE), and big bang observatory (BBO).
Comparison of double-locus sequence typing (DLST) and multilocus sequence typing (MLST) for the investigation of Pseudomonas aeruginosa populations.

PubMed

Cholley, Pascal; Stojanov, Milos; Hocquet, Didier; Thouverez, Michelle; Bertrand, Xavier; Blanc, Dominique S

2015-08-01

Reliable molecular typing methods are necessary to investigate the epidemiology of bacterial pathogens. Reference methods such as multilocus sequence typing (MLST) and pulsed-field gel electrophoresis (PFGE) are costly and time consuming. Here, we compared our newly developed double-locus sequence typing (DLST) method for Pseudomonas aeruginosa to MLST and PFGE on a collection of 281 isolates. DLST was as discriminatory as MLST and was able to recognize "high-risk" epidemic clones. Both methods were highly congruent. Not surprisingly, a higher discriminatory power was observed with PFGE. In conclusion, being a simple method (single-strand sequencing of only 2 loci), DLST is valuable as a first-line typing tool for epidemiological investigations of P. aeruginosa. Coupled to a more discriminant method like PFGE or whole genome sequencing, it might represent an efficient typing strategy to investigate or prevent outbreaks. Copyright © 2015 Elsevier Inc. All rights reserved.
SOBA: sequence ontology bioinformatics analysis.

PubMed

Moore, Barry; Fan, Guozhen; Eilbeck, Karen

2010-07-01

The advent of cheaper, faster sequencing technologies has pushed the task of sequence annotation from the exclusive domain of large-scale multi-national sequencing projects to that of research laboratories and small consortia. The bioinformatics burden placed on these laboratories, some with very little programming experience can be daunting. Fortunately, there exist software libraries and pipelines designed with these groups in mind, to ease the transition from an assembled genome to an annotated and accessible genome resource. We have developed the Sequence Ontology Bioinformatics Analysis (SOBA) tool to provide a simple statistical and graphical summary of an annotated genome. We envisage its use during annotation jamborees, genome comparison and for use by developers for rapid feedback during annotation software development and testing. SOBA also provides annotation consistency feedback to ensure correct use of terminology within annotations, and guides users to add new terms to the Sequence Ontology when required. SOBA is available at http://www.sequenceontology.org/cgi-bin/soba.cgi.
A Gibbs sampler for motif detection in phylogenetically close sequences

NASA Astrophysics Data System (ADS)

Siddharthan, Rahul; van Nimwegen, Erik; Siggia, Eric

2004-03-01

Genes are regulated by transcription factors that bind to DNA upstream of genes and recognize short conserved ``motifs'' in a random intergenic ``background''. Motif-finders such as the Gibbs sampler compare the probability of these short sequences being represented by ``weight matrices'' to the probability of their arising from the background ``null model'', and explore this space (analogous to a free-energy landscape). But closely related species may show conservation not because of functional sites but simply because they have not had sufficient time to diverge, so conventional methods will fail. We introduce a new Gibbs sampler algorithm that accounts for common ancestry when searching for motifs, while requiring minimal ``prior'' assumptions on the number and types of motifs, assessing the significance of detected motifs by ``tracking'' clusters that stay together. We apply this scheme to motif detection in sporulation-cycle genes in the yeast S. cerevisiae, using recent sequences of other closely-related Saccharomyces species.
Coherent direct sequence optical code multiple access encoding-decoding efficiency versus wavelength detuning.

PubMed

Pastor, D; Amaya, W; García-Olcina, R; Sales, S

2007-07-01

We present a simple theoretical model of and the experimental verification for vanishing of the autocorrelation peak due to wavelength detuning on the coding-decoding process of coherent direct sequence optical code multiple access systems based on a superstructured fiber Bragg grating. Moreover, the detuning vanishing effect has been explored to take advantage of this effect and to provide an additional degree of multiplexing and/or optical code tuning.
An Activation-Based Model of Routine Sequence Errors

DTIC Science & Technology

2015-04-01

part of the ACT-R frame- work (e.g., Anderson, 1983), we adopt a newer, richer no- tion of priming as part of our approach ( Harrison & Trafton, 2010...2014). Other models of routine sequence errors, such as the in- teractive activation network ( IAN ) model (Cooper & Shal- lice, 2006) and the simple...error patterns that results from an interface layout shift. The ideas behind our expanded priming approach, however, could apply to IAN , which uses
Estimate of Cosmic Muon Background for Shallow Underground Neutrino Detectors

NASA Astrophysics Data System (ADS)

Casimiro, E.; Simão, F. R. A.; Anjos, J. C.

One of the severe limitations in detecting neutrino signals from nuclear reactors is that the copious cosmic ray background imposes the use of a time veto upon the passage of the muons to reduce the number of fake signals due to muon-induced spallation neutrons. For this reason neutrino detectors are usually located underground, with a large overburden. However there are practical limitations that do restrain from locating the detectors at large depths underground. In order to decide the depth underground at which the Neutrino Angra Detector (currently in preparation) should be installed, an estimate of the cosmogenic background in the detector as a function of the depth is required. We report here a simple analytical estimation of the muon rates in the detector volume for different plausible depths, assuming a simple plain overburden geometry. We extend the calculation to the case of the San Onofre neutrino detector and to the case of the Double Chooz neutrino detector, where other estimates or measurements have been performed. Our estimated rates are consistent.
SCCmecFinder, a Web-Based Tool for Typing of Staphylococcal Cassette Chromosome mec in Staphylococcus aureus Using Whole-Genome Sequence Data.

PubMed

Kaya, Hülya; Hasman, Henrik; Larsen, Jesper; Stegger, Marc; Johannesen, Thor Bech; Allesøe, Rosa Lundbye; Lemvigh, Camilla Koldbæk; Aarestrup, Frank Møller; Lund, Ole; Larsen, Anders Rhod

2018-01-01

Typing of methicillin-resistant Staphylococcus aureus (MRSA) is important in infection control and surveillance. The current nomenclature of MRSA includes the genetic background of the S. aureus strain determined by multilocus sequence typing (MLST) or equivalent methods like spa typing and typing of the mobile genetic element staphylococcal cassette chromosome mec (SCC mec ), which carries the mecA or mecC gene. Whereas MLST and spa typing are relatively simple, typing of SCC mec is less trivial because of its heterogeneity. Whole-genome sequencing (WGS) provides the essential data for typing of the genetic background and SCC mec , but so far, no bioinformatic tools for SCC mec typing have been available. Here, we report the development and evaluation of SCC mec Finder for characterization of the SCC mec element from S. aureus WGS data. SCC mec Finder is able to identify all SCC mec element types, designated I to XIII, with subtyping of SCC mec types IV (2B) and V (5C2). SCC mec elements are characterized by two different gene prediction approaches to achieve correct annotation, a Basic Local Alignment Search Tool (BLAST)-based approach and a k -mer-based approach. Evaluation of SCC mec Finder by using a diverse collection of clinical isolates ( n = 93) showed a high typeability level of 96.7%, which increased to 98.9% upon modification of the default settings. In conclusion, SCC mec Finder can be an alternative to more laborious SCC mec typing methods and is freely available at https://cge.cbs.dtu.dk/services/SCCmecFinder. IMPORTANCE SCC mec in MRSA is acknowledged to be of importance not only because it contains the mecA or mecC gene but also for staphylococcal adaptation to different environments, e.g., in hospitals, the community, and livestock. Typing of SCC mec by PCR techniques has, because of its heterogeneity, been challenging, and whole-genome sequencing has only partially solved this since no good bioinformatic tools have been available. In this article, we describe the development of a new bioinformatic tool, SCC mec Finder, that includes most of the needs for infection control professionals and researchers regarding the interpretation of SCC mec elements. The software detects all of the SCC mec elements accepted by the International Working Group on the Classification of Staphylococcal Cassette Chromosome Elements, and users will be prompted if diverging and potential new elements are uploaded. Furthermore, SCC mec Finder will be curated and updated as new elements are found and it is easy to use and freely accessible.
Elements of Mathematics, Book O: Intuitive Background. Chapter 1, Operational Systems.

ERIC Educational Resources Information Center

Exner, Robert; And Others

The sixteen chapters of this book provide the core material for the Elements of Mathematics Program, a secondary sequence developed for highly motivated students with strong verbal abilities. The sequence is based on a functional-relational approach to mathematics teaching, and emphasizes teaching by analysis of real-life situations. This text is…
The Heteronuclear Single-Quantum Correlation (HSQC) Experiment: Vectors versus Product Operators

ERIC Educational Resources Information Center

de la Vega-Herna´ndez, Karen; Antuch, Manuel

2015-01-01

A vectorial representation of the full sequence of events occurring during the 2D-NMR heteronuclear single-quantum correlation (HSQC) experiment is presented. The proposed vectorial representation conveys an understanding of the magnetization evolution during the HSQC pulse sequence for those who have little or no quantum mechanical background.…
Elements of Mathematics, Book O: Intuitive Background. Chapter 5, Mappings.

ERIC Educational Resources Information Center

Exner, Robert; And Others

The sixteen chapters of this book provide the core material for the Elements of Mathematics Program, a secondary sequence developed for highly motivated students with strong verbal abilities. The sequence is based on a functional-relational approach to mathematics teaching, and emphasizes teaching by analysis of real-life situations. This text is…
Use of Intragenic Sequence Ribotyping (ISR) for serotyping Salmonella obtained from poultry and their environment

USDA-ARS?s Scientific Manuscript database

BACKGROUND: The dkgB-linked ribosomal region of Salmonella enterica flanking a 5S gene shows genetic heterogeneity that distinguishes closely related serovars such as Enteritidis, Dublin, Gallinarum and Pullorum (Morales et al, 2006). We wanted to know how sequence-based ISR compared to the traditio...
Elements of Mathematics, Book O: Intuitive Background. Chapter 2, The Integers.

ERIC Educational Resources Information Center

Exner, Robert; And Others

The sixteen chapters of this book provide the core materials for the Elements of Mathematics Program, a secondary sequence developed for highly motivated students with strong verbal abilities. The sequence is based on a functional-relational approach to mathematics teaching, and emphasizes teaching by analysis of real-life situations. This text is…
An Alternative Time for Telling: When Conceptual Instruction Prior to Problem Solving Improves Mathematical Knowledge

ERIC Educational Resources Information Center

Fyfe, Emily R.; DeCaro, Marci S.; Rittle-Johnson, Bethany

2014-01-01

Background: The sequencing of learning materials greatly influences the knowledge that learners construct. Recently, learning theorists have focused on the sequencing of instruction in relation to solving related problems. The general consensus suggests explicit instruction should be provided; however, when to provide instruction remains unclear.…
Improvements in Technique of NMR Imaging and NMR Diffusion Measurements in the Presence of Background Gradients.

NASA Astrophysics Data System (ADS)

Lian, Jianyu

In this work, modification of the cosine current distribution rf coil, PCOS, has been introduced and tested. The coil produces a very homogeneous rf magnetic field, and it is inexpensive to build and easy to tune for multiple resonance frequency. The geometrical parameters of the coil are optimized to produce the most homogeneous rf field over a large volume. To avoid rf field distortion when the coil length is comparable to a quarter wavelength, a parallel PCOS coil is proposed and discussed. For testing rf coils and correcting B _1 in NMR experiments, a simple, rugged and accurate NMR rf field mapping technique has been developed. The method has been tested and used in 1D, 2D, 3D and in vivo rf mapping experiments. The method has been proven to be very useful in the design of rf coils. To preserve the linear relation between rf output applied on an rf coil and modulating input for an rf modulating -amplifying system of NMR imaging spectrometer, a quadrature feedback loop is employed in an rf modulator with two orthogonal rf channels to correct the amplitude and phase non-linearities caused by the rf components in the rf system. The modulator is very linear over a large range and it can generate an arbitrary rf shape. A diffusion imaging sequence has been developed for measuring and imaging diffusion in the presence of background gradients. Cross terms between the diffusion sensitizing gradients and background gradients or imaging gradients can complicate diffusion measurement and make the interpretation of NMR diffusion data ambiguous, but these have been eliminated in this method. Further, the background gradients has been measured and imaged. A dipole random distribution model has been established to study background magnetic fields Delta B and background magnetic gradients G_0 produced by small particles in a sample when it is in a B_0 field. From this model, the minimum distance that a spin can approach a particle can be determined by measuring and <{bf G}_sp{0 }{2}>. From this model, the particle concentration in a sample can be determined by measuring the lineshape of a free induction decay (fid).
Modeling RNA interference in mammalian cells

PubMed Central

2011-01-01

Background RNA interference (RNAi) is a regulatory cellular process that controls post-transcriptional gene silencing. During RNAi double-stranded RNA (dsRNA) induces sequence-specific degradation of homologous mRNA via the generation of smaller dsRNA oligomers of length between 21-23nt (siRNAs). siRNAs are then loaded onto the RNA-Induced Silencing multiprotein Complex (RISC), which uses the siRNA antisense strand to specifically recognize mRNA species which exhibit a complementary sequence. Once the siRNA loaded-RISC binds the target mRNA, the mRNA is cleaved and degraded, and the siRNA loaded-RISC can degrade additional mRNA molecules. Despite the widespread use of siRNAs for gene silencing, and the importance of dosage for its efficiency and to avoid off target effects, none of the numerous mathematical models proposed in literature was validated to quantitatively capture the effects of RNAi on the target mRNA degradation for different concentrations of siRNAs. Here, we address this pressing open problem performing in vitro experiments of RNAi in mammalian cells and testing and comparing different mathematical models fitting experimental data to in-silico generated data. We performed in vitro experiments in human and hamster cell lines constitutively expressing respectively EGFP protein or tTA protein, measuring both mRNA levels, by quantitative Real-Time PCR, and protein levels, by FACS analysis, for a large range of concentrations of siRNA oligomers. Results We tested and validated four different mathematical models of RNA interference by quantitatively fitting models' parameters to best capture the in vitro experimental data. We show that a simple Hill kinetic model is the most efficient way to model RNA interference. Our experimental and modeling findings clearly show that the RNAi-mediated degradation of mRNA is subject to saturation effects. Conclusions Our model has a simple mathematical form, amenable to analytical investigations and a small set of parameters with an intuitive physical meaning, that makes it a unique and reliable mathematical tool. The findings here presented will be a useful instrument for better understanding RNAi biology and as modelling tool in Systems and Synthetic Biology. PMID:21272352
Analysis of Litopenaeus vannamei Transcriptome Using the Next-Generation DNA Sequencing Technique

PubMed Central

Li, Chaozheng; Weng, Shaoping; Chen, Yonggui; Yu, Xiaoqiang; Lü, Ling; Zhang, Haiqing; He, Jianguo; Xu, Xiaopeng

2012-01-01

Background Pacific white shrimp (Litopenaeus vannamei), the major species of farmed shrimps in the world, has been attracting extensive studies, which require more and more genome background knowledge. The now available transcriptome data of L. vannamei are insufficient for research requirements, and have not been adequately assembled and annotated. Methodology/Principal Findings This is the first study that used a next-generation high-throughput DNA sequencing technique, the Solexa/Illumina GA II method, to analyze the transcriptome from whole bodies of L. vannamei larvae. More than 2.4 Gb of raw data were generated, and 109,169 unigenes with a mean length of 396 bp were assembled using the SOAP denovo software. 73,505 unigenes (>200 bp) with good quality sequences were selected and subjected to annotation analysis, among which 37.80% can be matched in NCBI Nr database, 37.3% matched in Swissprot, and 44.1% matched in TrEMBL. Using BLAST and BLAST2Go softwares, 11,153 unigenes were classified into 25 Clusters of Orthologous Groups of proteins (COG) categories, 8171 unigenes were assigned into 51 Gene ontology (GO) functional groups, and 18,154 unigenes were divided into 220 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. To primarily verify part of the results of assembly and annotations, 12 assembled unigenes that are homologous to many embryo development-related genes were chosen and subjected to RT-PCR for electrophoresis and Sanger sequencing analyses, and to real-time PCR for expression profile analyses during embryo development. Conclusions/Significance The L. vannamei transcriptome analyzed using the next-generation sequencing technique enriches the information of L. vannamei genes, which will facilitate our understanding of the genome background of crustaceans, and promote the studies on L. vannamei. PMID:23071809

Some links on this page may take you to non-federal websites. Their policies may differ from this site.