random sequence generation: Topics by Science.gov

Sample records for random sequence generation

Image encryption using random sequence generated from generalized information domain

NASA Astrophysics Data System (ADS)

Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu

2016-05-01

A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.
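As an illustration of what passing a NIST statistical test involves, the sketch below implements only the simplest test of the NIST SP 800-22 suite, the frequency (monobit) test. It is not the authors' evaluation code; the function name `monobit_p_value` and the conventional 0.01 significance level are assumptions made for this example.

```python
import math

def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for an iterable of 0/1 bits."""
    bits = list(bits)
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    # Map 0 -> -1 and 1 -> +1, then sum; a balanced sequence sums near zero.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    # Under the randomness hypothesis the p-value is erfc(s_obs / sqrt(2)).
    return math.erfc(s_obs / math.sqrt(2))

balanced = [0, 1] * 50   # equal numbers of zeros and ones
biased = [1] * 100       # all ones
print(monobit_p_value(balanced), monobit_p_value(biased))
```

A sequence is usually taken to fail this test when the p-value falls below 0.01. A perfectly balanced sequence such as the alternating one above passes this test while being obviously non-random, which is why the suite combines many tests.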
Real-time fast physical random number generator with a photonic integrated circuit.

PubMed

Ugajin, Kazusa; Terashima, Yuta; Iwakawa, Kento; Uchida, Atsushi; Harayama, Takahisa; Yoshimura, Kazuyuki; Inubushi, Masanobu

2017-03-20

Random number generators are essential for applications in information security and numerical simulations. Most optical-chaos-based random number generators produce random bit sequences by offline post-processing with large optical components. We demonstrate a real-time hardware implementation of a fast physical random number generator with a photonic integrated circuit and a field programmable gate array (FPGA) electronic board. We generate 1-Tbit random bit sequences and evaluate their statistical randomness using NIST Special Publication 800-22 and TestU01. All of the BigCrush tests in TestU01 are passed using 410-Gbit random bit sequences. A maximum real-time generation rate of 21.1 Gb/s is achieved for random bit sequences in binary format stored in a computer, which can be directly used for applications involving secret keys in cryptography and random seeds in large-scale numerical simulations.
Implementation of a quantum random number generator based on the optimal clustering of photocounts

NASA Astrophysics Data System (ADS)

Balygin, K. A.; Zaitsev, V. I.; Klimov, A. N.; Kulik, S. P.; Molotkov, S. N.

2017-10-01

To implement quantum random number generators, it is fundamentally important to have a mathematically provable and experimentally testable process of measurements of a system from which an initial random sequence is generated. This makes sure that randomness indeed has a quantum nature. A quantum random number generator has been implemented with the use of the detection of quasi-single-photon radiation by a silicon photomultiplier (SiPM) matrix, which makes it possible to reliably reach the Poisson statistics of photocounts. The choice and use of the optimal clustering of photocounts for the initial sequence of photodetection events and a method of extraction of a random sequence of 0's and 1's, which is polynomial in the length of the sequence, have made it possible to reach a yield rate of 64 Mbit/s of the output certainly random sequence.
Heterogeneous Suppression of Sequential Effects in Random Sequence Generation, but Not in Operant Learning.

PubMed

Shteingart, Hanan; Loewenstein, Yonatan

2016-01-01

There is a long history of experiments in which participants are instructed to generate a long sequence of binary random numbers. The scope of this line of research has shifted over the years from identifying the basic psychological principles and/or the heuristics that lead to deviations from randomness, to one of predicting future choices. In this paper, we used generalized linear regression and the framework of Reinforcement Learning in order to address both points. In particular, we used logistic regression analysis in order to characterize the temporal sequence of participants' choices. Surprisingly, a population analysis indicated that the contribution of the most recent trial has only a weak effect on behavior, compared to earlier trials, a result that seems irreconcilable with standard sequential effects that decay monotonically with the delay. However, when considering each participant separately, we found that the magnitudes of the sequential effect are a monotonically decreasing function of the delay, yet these individual sequential effects are largely averaged out in a population analysis because of heterogeneity. The substantial behavioral heterogeneity in this task is further demonstrated quantitatively by considering the predictive power of the model. We show that a heterogeneous model of sequential dependencies captures the structure available in random sequence generation. Finally, we show that the results of the logistic regression analysis can be interpreted in the framework of reinforcement learning, allowing us to compare the sequential effects in the random sequence generation task to those in an operant learning task. We show that in contrast to the random sequence generation task, sequential effects in operant learning are far more homogeneous across the population. These results suggest that in the random sequence generation task, different participants adopt different cognitive strategies to suppress sequential dependencies when generating the "random" sequences.
Golden Ratio Versus Pi as Random Sequence Sources for Monte Carlo Integration

NASA Technical Reports Server (NTRS)

Sen, S. K.; Agarwal, Ravi P.; Shaykhian, Gholam Ali

2007-01-01

We discuss here the relative merits of these numbers as possible random sequence sources. The quality of these sequences is not judged directly based on the outcome of all known tests for the randomness of a sequence. Instead, it is determined implicitly by the accuracy of the Monte Carlo integration in a statistical sense. Since our main motive of using a random sequence is to solve real world problems, it is more desirable if we compare the quality of the sequences based on their performances for these problems in terms of quality/accuracy of the output. We also compare these sources against those generated by a popular pseudo-random generator, viz., the Matlab rand and the quasi-random generator Halton both in terms of error and time complexity. Our study demonstrates that consecutive blocks of digits of each of these numbers produce a good random sequence source. It is observed that randomly chosen blocks of digits do not have any remarkable advantage over consecutive blocks for the accuracy of the Monte Carlo integration. Also, it reveals that pi is a better source of a random sequence than theta when the accuracy of the integration is concerned.
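The comparison described in this abstract can be sketched in a few lines. The example below is a minimal illustration, not the authors' code: the integrand x², the sample size, the 5-digit block width and all function names are assumptions, Python's `random` module stands in for the Matlab generator, and the quasi-random generator is assumed to be the Halton sequence (its one-dimensional form, the base-2 van der Corput sequence).

```python
import random

def blocks_to_uniform(digits, width):
    """Turn consecutive blocks of decimal digits into numbers in [0, 1)."""
    usable = len(digits) - len(digits) % width
    return [int(digits[i:i + width]) / 10 ** width
            for i in range(0, usable, width)]

def van_der_corput(i, base=2):
    """Radical inverse of the integer i >= 1 (one-dimensional Halton point)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x

def mc_integrate(f, points):
    """Monte Carlo estimate of the integral of f over [0, 1]."""
    pts = list(points)
    return sum(f(x) for x in pts) / len(pts)

def f(x):
    return x * x   # exact integral over [0, 1] is 1/3

n = 4096
rng = random.Random(0)   # fixed seed so the run is repeatable
pseudo = mc_integrate(f, (rng.random() for _ in range(n)))
quasi = mc_integrate(f, (van_der_corput(i) for i in range(1, n + 1)))

# First 50 decimals of pi, cut into 5-digit blocks (only 10 sample points).
PI_DIGITS = "14159265358979323846264338327950288419716939937510"
from_pi = mc_integrate(f, blocks_to_uniform(PI_DIGITS, 5))

print(abs(pseudo - 1 / 3), abs(quasi - 1 / 3), abs(from_pi - 1 / 3))
```

Judging a source by the integration error, as the abstract proposes, only becomes meaningful with far more digits than the 50 used here; the digit string is included to show the block construction, not to reproduce the reported results.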
NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

PubMed

Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

2016-11-01

The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.
A Comparison of Three Random Number Generators for Aircraft Dynamic Modeling Applications

NASA Technical Reports Server (NTRS)

Grauer, Jared A.

2017-01-01

Three random number generators, which produce Gaussian white noise sequences, were compared to assess their suitability in aircraft dynamic modeling applications. The first generator considered was the MATLAB® implementation of the Mersenne-Twister algorithm. The second generator was a website called Random.org, which processes atmospheric noise measured using radios to create the random numbers. The third generator was based on synthesis of the Fourier series, where the random number sequences are constructed from prescribed amplitude and phase spectra. A total of 200 sequences, each having 601 random numbers, for each generator were collected and analyzed in terms of the mean, variance, normality, autocorrelation, and power spectral density. These sequences were then applied to two problems in aircraft dynamic modeling, namely estimating stability and control derivatives from simulated onboard sensor data, and simulating flight in atmospheric turbulence. In general, each random number generator had good performance and is well-suited for aircraft dynamic modeling applications. Specific strengths and weaknesses of each generator are discussed. For Monte Carlo simulation, the Fourier synthesis method is recommended because it most accurately and consistently approximated Gaussian white noise and can be implemented with reasonable computational effort.
Simulations Using Random-Generated DNA and RNA Sequences

ERIC Educational Resources Information Center

Bryce, C. F. A.

1977-01-01

Using a very simple computer program written in BASIC, a very large number of random-generated DNA or RNA sequences are obtained. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…
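The original program was written in BASIC and is not reproduced in the abstract. The following is a modern sketch in Python of the kind of exercise described, assuming equal probabilities for the four bases; the function names are illustrative only.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def random_dna(n, rng):
    """Random DNA sequence of length n with equiprobable bases."""
    return "".join(rng.choice("ACGT") for _ in range(n))

def reverse_complement(seq):
    """Complementary strand, written 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def transcribe(seq):
    """RNA with the same sequence as the given DNA strand (T -> U)."""
    return seq.replace("T", "U")

def base_composition(seq):
    """Fraction of each base in the sequence."""
    return {base: seq.count(base) / len(seq) for base in "ACGT"}

rng = random.Random(1977)   # fixed seed so a class sees the same sequence
dna = random_dna(30, rng)
print(dna)
print(reverse_complement(dna))
print(transcribe(dna))
print(base_composition(dna))
```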
DNA-based random number generation in security circuitry.

PubMed

Gearheart, Christy M; Arazi, Benjamin; Rouchka, Eric C

2010-06-01

DNA-based circuit design is an area of research in which traditional silicon-based technologies are replaced by naturally occurring phenomena taken from biochemistry and molecular biology. This research focuses on further developing DNA-based methodologies to mimic digital data manipulation. While exhibiting fundamental principles, this work was done in conjunction with the vision that DNA-based circuitry, when the technology matures, will form the basis for a tamper-proof security module, revolutionizing the meaning and concept of tamper-proofing and possibly preventing it altogether based on accurate scientific observations. A paramount part of such a solution would be self-generation of random numbers. A novel prototype schema employs solid phase synthesis of oligonucleotides for random construction of DNA sequences; temporary storage and retrieval is achieved through plasmid vectors. A discussion of how to evaluate sequence randomness is included, as well as how these techniques are applied to a simulation of the random number generation circuitry. Simulation results show generated sequences successfully pass three selected NIST random number generation tests specified for security applications.
On the design of henon and logistic map-based random number generator

NASA Astrophysics Data System (ADS)

Magfirawaty; Suryadi, M. T.; Ramli, Kalamullah

2017-10-01

The key sequence is one of the main elements in the cryptosystem. A True Random Number Generator (TRNG) is one approach to generating the key sequence. The randomness sources of TRNGs are divided into three main groups, i.e., electrical-noise based, jitter based, and chaos based. The chaos-based approach utilizes a non-linear dynamic system (continuous time or discrete time) as an entropy source. In this study, a new design of TRNG based on a discrete-time chaotic system is proposed, which is then simulated in LabVIEW. The principle of the design consists of combining 2D and 1D chaotic systems. A mathematical model is implemented for numerical simulations. We used a comparator process as a harvester method to obtain the series of random bits. Without any post-processing, the proposed design generated a random bit sequence with a high entropy value and passed all NIST SP 800-22 statistical tests.
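The comparator idea can be sketched with the one-dimensional logistic map alone. This is a simplified illustration under stated assumptions, not the proposed design: the coupling with the two-dimensional Hénon map is omitted because the abstract does not specify it, and the parameter r = 3.99, the threshold 0.5, the burn-in length and the function name are all choices made for this example.

```python
def logistic_bits(n, x0=0.123, r=3.99, threshold=0.5, burn_in=100):
    """Bits from the logistic map x -> r*x*(1-x) via a threshold comparator."""
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    x = x0
    for _ in range(burn_in):      # discard the initial transient
        x = r * x * (1.0 - x)
    bits = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        bits.append(1 if x > threshold else 0)   # comparator output
    return bits

print(logistic_bits(32))
```

A floating-point simulation like this one is fully deterministic for a given `x0`, so it behaves as a pseudo-random generator; any true randomness in a chaos-based TRNG would have to come from the physical realization.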
Generation of Aptamers from A Primer-Free Randomized ssDNA Library Using Magnetic-Assisted Rapid Aptamer Selection

NASA Astrophysics Data System (ADS)

Tsao, Shih-Ming; Lai, Ji-Ching; Horng, Horng-Er; Liu, Tu-Chen; Hong, Chin-Yih

2017-04-01

Aptamers are oligonucleotides that can bind to specific target molecules. Most aptamers are generated using random libraries in the standard systematic evolution of ligands by exponential enrichment (SELEX). Each random library contains oligonucleotides with a randomized central region and two fixed primer regions at both ends. The fixed primer regions are necessary for amplifying target-bound sequences by PCR. However, these extra-sequences may cause non-specific bindings, which potentially interfere with good binding for random sequences. The Magnetic-Assisted Rapid Aptamer Selection (MARAS) is a newly developed protocol for generating single-strand DNA aptamers. No repeat selection cycle is required in the protocol. This study proposes and demonstrates a method to isolate aptamers for C-reactive proteins (CRP) from a randomized ssDNA library containing no fixed sequences at 5′ and 3′ termini using the MARAS platform. Furthermore, the isolated primer-free aptamer was sequenced and binding affinity for CRP was analyzed. The specificity of the obtained aptamer was validated using blind serum samples. The result was consistent with monoclonal antibody-based nephelometry analysis, which indicated that a primer-free aptamer has high specificity toward targets. MARAS is a feasible platform for efficiently generating primer-free aptamers for clinical diagnoses.
Truly random number generation: an example

NASA Astrophysics Data System (ADS)

Frauchiger, Daniela; Renner, Renato

2013-10-01

Randomness is crucial for a variety of applications, ranging from gambling to computer simulations, and from cryptography to statistics. However, many of the currently used methods for generating randomness do not meet the criteria that are necessary for these applications to work properly and safely. A common problem is that a sequence of numbers may look random but nevertheless not be truly random. In fact, the sequence may pass all standard statistical tests and yet be perfectly predictable. This renders it useless for many applications. For example, in cryptography, the predictability of a "randomly" chosen password is obviously undesirable. Here, we review a recently developed approach to generating true, and hence unpredictable, randomness.
A high-speed on-chip pseudo-random binary sequence generator for multi-tone phase calibration

NASA Astrophysics Data System (ADS)

Gommé, Liesbeth; Vandersteen, Gerd; Rolain, Yves

2011-07-01

An on-chip reference generator is conceived by adopting the technique of decimating a pseudo-random binary sequence (PRBS) signal in parallel sequences. This is of great benefit when high-speed generation of PRBS and PRBS-derived signals is the objective. The design is implemented in standard CMOS logic that is available in commercial libraries to provide the logic functions for the generator. The design allows the user to select the periodicity of the PRBS and the PRBS-derived signals. The characterization of the on-chip generator marks its performance and reveals promising specifications.
Using Maximum Entropy to Find Patterns in Genomes

NASA Astrophysics Data System (ADS)

Liu, Sophia; Hockenberry, Adam; Lancichinetti, Andrea; Jewett, Michael; Amaral, Luis

The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. To accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. This approach can also easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes. National Institute of General Medical Science, Northwestern University Presidential Fellowship, National Science Foundation, David and Lucile Packard Foundation, Camille Dreyfus Teacher Scholar Award.
Analysis of Uniform Random Numbers Generated by Randu and Urn Ten Different Seeds.

DTIC Science & Technology

The statistical properties of the numbers generated by two uniform random number generators, RANDU and URN, each using ten different seeds are...The testing is performed on a sequence of 50,000 numbers generated by each uniform random number generator using each of the ten seeds. (Author)
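For context, RANDU is the classic multiplicative congruential generator x(k+1) = 65539 · x(k) mod 2^31 with an odd seed; this recurrence is general knowledge about RANDU and is not stated in the abstract. The sketch below generates RANDU values and checks the well-known three-term relation that makes consecutive triples fall on a small number of planes. URN is not sketched because the abstract gives no details of it.

```python
def randu(seed, count):
    """Return `count` successive RANDU integers starting from an odd seed."""
    if seed % 2 == 0:
        raise ValueError("RANDU seeds should be odd")
    x = seed
    out = []
    for _ in range(count):
        x = (65539 * x) % 2 ** 31
        out.append(x)
    return out

values = randu(1, 50_000)

# Because 65539 = 2**16 + 3, every value satisfies
# x[k+2] = 6*x[k+1] - 9*x[k]  (mod 2**31).
violations = sum(
    (values[k + 2] - 6 * values[k + 1] + 9 * values[k]) % 2 ** 31 != 0
    for k in range(len(values) - 2)
)
print(values[:3], violations)
```

Dividing each integer by 2^31 gives the uniform variates on (0, 1) that a statistical test battery would examine.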
A Micro-Computer Model for Army Air Defense Training.

DTIC Science & Technology

1985-03-01

generator. The period is 32763 numbers generated before a repetitive sequence is encountered on the development system. Chi-Squared tests for frequency...Tests: Periodicity. The period is 32763 numbers generated before a repetitive sequence is encountered on the development system. This was...positions in the test array. This was done with several different random number seeds. In each case 32763 random numbers were generated before a
Multiple ECG Fiducial Points-Based Random Binary Sequence Generation for Securing Wireless Body Area Networks.

PubMed

Zheng, Guanglou; Fang, Gengfa; Shankaran, Rajan; Orgun, Mehmet A; Zhou, Jie; Qiao, Li; Saleem, Kashif

2017-05-01

Generating random binary sequences (BSes) is a fundamental requirement in cryptography. A BS is a sequence of N bits, and each bit has a value of 0 or 1. For securing sensors within wireless body area networks (WBANs), electrocardiogram (ECG)-based BS generation methods have been widely investigated in which interpulse intervals (IPIs) from each heartbeat cycle are processed to produce BSes. Using these IPI-based methods to generate a 128-bit BS in real time normally takes around half a minute. In order to improve the time efficiency of such methods, this paper presents an ECG multiple fiducial-points based binary sequence generation (MFBSG) algorithm. The technique of discrete wavelet transforms is employed to detect arrival time of these fiducial points, such as P, Q, R, S, and T peaks. Time intervals between them, including RR, RQ, RS, RP, and RT intervals, are then calculated based on this arrival time, and are used as ECG features to generate random BSes with low latency. According to our analysis on real ECG data, these ECG feature values exhibit the property of randomness and, thus, can be utilized to generate random BSes. Compared with the schemes that solely rely on IPIs to generate BSes, this MFBSG algorithm uses five feature values from one heart beat cycle, and can be up to five times faster than the solely IPI-based methods. So, it achieves a design goal of low latency. According to our analysis, the complexity of the algorithm is comparable to that of fast Fourier transforms. These randomly generated ECG BSes can be used as security keys for encryption or authentication in a WBAN system.
Secure self-calibrating quantum random-bit generator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fiorentino, M.; Santori, C.; Spillane, S. M.

2007-03-15

Random-bit generators (RBGs) are key components of a variety of information processing applications ranging from simulations to cryptography. In particular, cryptographic systems require 'strong' RBGs that produce high-entropy bit sequences, but traditional software pseudo-RBGs have very low entropy content and therefore are relatively weak for cryptography. Hardware RBGs yield entropy from chaotic or quantum physical systems and therefore are expected to exhibit high entropy, but in current implementations their exact entropy content is unknown. Here we report a quantum random-bit generator (QRBG) that harvests entropy by measuring single-photon and entangled two-photon polarization states. We introduce and implement a quantum tomographic method to measure a lower bound on the 'min-entropy' of the system, and we employ this value to distill a truly random-bit sequence. This approach is secure: even if an attacker takes control of the source of optical states, a secure random sequence can be distilled.
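The min-entropy of a source whose most likely outcome has probability p_max is H_min = -log2(p_max). The sketch below computes it from a probability list and, as a rough illustration only, from observed symbol frequencies. The plug-in frequency estimate is an assumption of this example and is not the tomographic lower bound described in the abstract; unlike that bound, it offers no security guarantee.

```python
import math
from collections import Counter

def min_entropy(probabilities):
    """Min-entropy in bits per symbol: -log2 of the largest probability."""
    return -math.log2(max(probabilities))

def empirical_min_entropy(symbols):
    """Plug-in estimate of min-entropy from observed symbol frequencies."""
    counts = Counter(symbols)
    total = sum(counts.values())
    return min_entropy(c / total for c in counts.values())

raw = "0001" * 256                   # a heavily biased raw bit string
h = empirical_min_entropy(raw)       # bits of min-entropy per raw bit
print(h, int(len(raw) * h))          # rough count of extractable bits
```

The product of the raw length and the per-symbol min-entropy indicates approximately how many nearly uniform bits a randomness extractor could distill from the raw data.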
Random sequences generation through optical measurements by phase-shifting interferometry

NASA Astrophysics Data System (ADS)

François, M.; Grosges, T.; Barchiesi, D.; Erra, R.; Cornet, A.

2012-04-01

The development of new techniques for producing random sequences with a high level of security is a challenging topic of research in modern cryptography. The proposed method is based on the measurement by phase-shifting interferometry of the speckle signals of the interaction between light and structures. We show how the combination of amplitude and phase distributions (maps) under a numerical process can produce random sequences. The produced sequences satisfy all the statistical requirements of randomness and can be used in cryptographic schemes.
Problems with the random number generator RANF implemented on the CDC cyber 205

NASA Astrophysics Data System (ADS)

Kalle, Claus; Wansleben, Stephan

1984-10-01

We show that using RANF may lead to wrong results when lattice models are simulated by Monte Carlo methods. We present a shift-register sequence random number generator which generates two random numbers per cycle on a two pipe CDC Cyber 205.

Generating constrained randomized sequences: item frequency matters.

PubMed

French, Robert M; Perruchet, Pierre

2009-11-01

All experimental psychologists understand the importance of randomizing lists of items. However, randomization is generally constrained, and these constraints (in particular, not allowing immediately repeated items), which are designed to eliminate particular biases, frequently engender others. We describe a simple Monte Carlo randomization technique that solves a number of these problems. However, in many experimental settings, we are concerned not only with the number and distribution of items but also with the number and distribution of transitions between items. The algorithm mentioned above provides no control over this. We therefore introduce a simple technique that uses transition tables for generating correctly randomized sequences. We present an analytic method of producing item-pair frequency tables and item-pair transitional probability tables when immediate repetitions are not allowed. We illustrate these difficulties and how to overcome them, with reference to a classic article on word segmentation in infants. Finally, we provide free access to an Excel file that allows users to generate transition tables with up to 10 different item types, as well as to generate appropriately distributed randomized sequences of any length without immediately repeated elements. This file is freely available from http://leadserv.u-bourgogne.fr/IMG/xls/TransitionMatrix.xls.
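One plausible reading of a simple Monte Carlo randomization under a no-immediate-repetition constraint is rejection sampling: shuffle the list and keep the first ordering that satisfies the constraint. The sketch below implements that reading and tallies the resulting item-pair transitions. It is an assumption-laden illustration, not the authors' algorithm or their Excel tool, and the function names and item labels are hypothetical.

```python
import random
from collections import Counter

def shuffle_no_repeats(items, rng, max_tries=10_000):
    """Shuffle until no two adjacent items are equal (rejection sampling)."""
    seq = list(items)
    for _ in range(max_tries):
        rng.shuffle(seq)
        if all(a != b for a, b in zip(seq, seq[1:])):
            return seq
    raise RuntimeError("no valid ordering found; constraints may be too tight")

def transition_counts(seq):
    """Count how often each ordered item pair occurs in adjacent positions."""
    return Counter(zip(seq, seq[1:]))

rng = random.Random(1)
items = list("AAAABBBBCCCC")      # three item types, four of each
seq = shuffle_no_repeats(items, rng)
print("".join(seq))
print(transition_counts(seq))
```

Because every accepted ordering comes from a uniform shuffle, all valid orderings are equally likely. The transition tally makes it possible to inspect whether particular item pairs are over- or under-represented, which is the kind of side effect the abstract warns about.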
Random Number Generation and Executive Functions in Parkinson's Disease: An Event-Related Brain Potential Study.

PubMed

Münte, Thomas F; Joppich, Gregor; Däuper, Jan; Schrader, Christoph; Dengler, Reinhard; Heldmann, Marcus

2015-01-01

The generation of random sequences is considered to tax executive functions and has previously been reported to be impaired in Parkinson's disease (PD). The aim of this study was to assess the neurophysiological markers of random number generation in PD. Event-related potentials (ERP) were recorded in 12 PD patients and 12 age-matched normal controls (NC) while either engaging in random number generation (RNG) by pressing the number keys on a computer keyboard in a random sequence or in ordered number generation (ONG) necessitating key presses in the canonical order. Key presses were paced by an external auditory stimulus at a rate of 1 tone every 1800 ms. As a secondary task subjects had to monitor the tone-sequence for a particular target tone to which the number "0" key had to be pressed. This target tone occurred randomly and infrequently, thus creating a secondary oddball task. Behaviorally, PD patients showed an increased tendency to count in steps of one as well as a tendency towards repetition avoidance. Electrophysiologically, the amplitude of the P3 component of the ERP to the target tone of the secondary task was reduced during RNG in PD but not in NC. The behavioral findings indicate less random behavior in PD while the ERP findings suggest that this impairment comes about because attentional resources are depleted in PD.
Single-Molecule Electrical Random Resequencing of DNA and RNA

NASA Astrophysics Data System (ADS)

Ohshiro, Takahito; Matsubara, Kazuki; Tsutsui, Makusu; Furuhashi, Masayuki; Taniguchi, Masateru; Kawai, Tomoji

2012-07-01

Two paradigm shifts in DNA sequencing technologies--from bulk to single molecules and from optical to electrical detection--are expected to realize label-free, low-cost DNA sequencing that does not require PCR amplification. It will lead to development of high-throughput third-generation sequencing technologies for personalized medicine. Although nanopore devices have been proposed as third-generation DNA-sequencing devices, a significant milestone in these technologies has been attained by demonstrating a novel technique for resequencing DNA using electrical signals. Here we report single-molecule electrical resequencing of DNA and RNA using a hybrid method of identifying single-base molecules via tunneling currents and random sequencing. Our method reads sequences of nine types of DNA oligomers. The complete sequence of 5'-UGAGGUA-3' from the let-7 microRNA family was also identified by creating a composite of overlapping fragment sequences, which was randomly determined using tunneling current conducted by single-base molecules as they passed between a pair of nanoelectrodes.
Physical layer one-time-pad data encryption through synchronized semiconductor laser networks

NASA Astrophysics Data System (ADS)

Argyris, Apostolos; Pikasis, Evangelos; Syvridis, Dimitris

2016-02-01

Semiconductor lasers (SL) have been proven to be a key device in the generation of ultrafast true random bit streams. Their potential to emit chaotic signals under conditions with desirable statistics establishes them as a low-cost solution to cover various needs, from large volume key generation to real-time encrypted communications. Usually, only undemanding post-processing is needed to convert the acquired analog timeseries to digital sequences that pass all established tests of randomness. A novel architecture that can generate and exploit these true random sequences is through a fiber network in which the nodes are semiconductor lasers that are coupled and synchronized to a central hub laser. In this work we show experimentally that laser nodes in such a star network topology can synchronize with each other through complex broadband signals that are the seed to true random bit sequences (TRBS) generated at several Gb/s. Because each node can access, through the fiber-optic network, random bit streams that are generated in real time and synchronized with the rest of the nodes, a one-time-pad encryption protocol can be implemented that mixes the synchronized true random bit sequence with real data at Gb/s rates. Forward-error correction methods are used to reduce the errors in the TRBS and the final error rate at the data decoding level. An appropriate selection in the sampling methodology and properties, as well as in the physical properties of the chaotic seed signal through which the network locks in synchronization, allows error-free performance.
Assessing randomness and complexity in human motion trajectories through analysis of symbolic sequences

PubMed Central

Peng, Zhen; Genewein, Tim; Braun, Daniel A.

2014-01-01

Complexity is a hallmark of intelligent behavior consisting both of regular patterns and random variation. To quantitatively assess the complexity and randomness of human motion, we designed a motor task in which we translated subjects' motion trajectories into strings of symbol sequences. In the first part of the experiment participants were asked to perform self-paced movements to create repetitive patterns, copy pre-specified letter sequences, and generate random movements. To investigate whether the degree of randomness can be manipulated, in the second part of the experiment participants were asked to perform unpredictable movements in the context of a pursuit game, where they received feedback from an online Bayesian predictor guessing their next move. We analyzed symbol sequences representing subjects' motion trajectories with five common complexity measures: predictability, compressibility, approximate entropy, Lempel-Ziv complexity, as well as effective measure complexity. We found that subjects' self-created patterns were the most complex, followed by drawing movements of letters and self-paced random motion. We also found that participants could change the randomness of their behavior depending on context and feedback. Our results suggest that humans can adjust both complexity and regularity in different movement types and contexts and that this can be assessed with information-theoretic measures of the symbolic sequences generated from movement trajectories. PMID:24744716
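Of the five measures listed, Lempel-Ziv complexity is compact enough to sketch. The abstract does not say which variant was used; the example below assumes the 1976 Lempel-Ziv definition, which counts the phrases obtained when each new phrase is the shortest substring that cannot be copied from the preceding text. The function name is illustrative.

```python
def lempel_ziv_complexity(s):
    """Number of phrases in the Lempel-Ziv (1976) parsing of the string s."""
    i, phrases, n = 0, 0, len(s)
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy may overlap the phrase itself, hence the end index).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

print(lempel_ziv_complexity("0001101001000101"))  # parsed as 0|001|10|100|1000|101
print(lempel_ziv_complexity("0" * 16))            # a constant string needs 2 phrases
```

Highly regular symbol sequences parse into few phrases, while irregular ones need many, so the count can be compared across movement conditions once sequences of equal length are used.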
Concatenated shift registers generating maximally spaced phase shifts of PN-sequences

NASA Technical Reports Server (NTRS)

Hurd, W. J.; Welch, L. R.

1977-01-01

A large class of linearly concatenated shift registers is shown to generate approximately maximally spaced phase shifts of pn-sequences, for use in pseudorandom number generation. A constructive method is presented for finding members of this class, for almost all degrees for which primitive trinomials exist. The sequences which result are not normally characterized by trinomial recursions, which is desirable since trinomial sequences can have some undesirable randomness properties.
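As background for the terminology, the sketch below generates a PN-sequence (maximal-length sequence) with a single 7-bit shift register using the recurrence a[n+7] = a[n+1] XOR a[n], which corresponds to the primitive trinomial x^7 + x + 1, and then produces a phase-shifted copy by starting from a later register state. It illustrates only a plain trinomial register; the concatenated construction of the paper is not reproduced, and the register length and function names are choices made for this example.

```python
def pn_sequence(length, state=0b0000001):
    """Bits of the m-sequence defined by a[n+7] = a[n+1] XOR a[n]."""
    if not 0 < state < 128:
        raise ValueError("state must be a nonzero 7-bit value")
    out = []
    for _ in range(length):
        out.append(state & 1)                    # output the oldest bit
        feedback = (state ^ (state >> 1)) & 1    # a[n] XOR a[n+1]
        state = (state >> 1) | (feedback << 6)   # shift and insert feedback
    return out

def state_after(steps, state=0b0000001):
    """Register state after the given number of clock cycles."""
    for _ in range(steps):
        feedback = (state ^ (state >> 1)) & 1
        state = (state >> 1) | (feedback << 6)
    return state

period = 2 ** 7 - 1
seq = pn_sequence(2 * period)
shifted = pn_sequence(period, state=state_after(40))   # phase shift of 40 bits
print(seq[:period] == seq[period:], sum(seq[:period]), shifted == seq[40:40 + period])
```

Running several such generators from widely separated starting states is one simple way to obtain phase-shifted copies of the same sequence for parallel pseudorandom number generation.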
Compact quantum random number generator based on superluminescent light-emitting diodes

NASA Astrophysics Data System (ADS)

Wei, Shihai; Yang, Jie; Fan, Fan; Huang, Wei; Li, Dashuang; Xu, Bingjie

2017-12-01

By measuring the amplified spontaneous emission (ASE) noise of superluminescent light-emitting diodes, we propose and realize a practical quantum random number generator (QRNG). In the QRNG, after the detection and amplification of the ASE noise, data acquisition and randomness extraction, both integrated in a field programmable gate array (FPGA), are performed in real time, and the final random bit sequences are delivered to a host computer at a real-time generation rate of 1.2 Gbps. Further, to achieve compactness, all the components of the QRNG are integrated on three independent printed circuit boards, and the QRNG is packed in a small enclosure measuring 140 mm × 120 mm × 25 mm. The final random bit sequences pass all the NIST-STS and DIEHARD tests.
Random Item Generation Is Affected by Age

ERIC Educational Resources Information Center

Multani, Namita; Rudzicz, Frank; Wong, Wing Yiu Stephanie; Namasivayam, Aravind Kumar; van Lieshout, Pascal

2016-01-01

Purpose: Random item generation (RIG) involves central executive functioning. Measuring aspects of random sequences can therefore provide a simple method to complement other tools for cognitive assessment. We examine the extent to which RIG relates to specific measures of cognitive function, and whether those measures can be estimated using RIG…
On the limiting characteristics of quantum random number generators at various clusterings of photocounts

NASA Astrophysics Data System (ADS)

Molotkov, S. N.

2017-03-01

Various methods for the clustering of photocounts constituting a sequence of random numbers are considered. It is shown that the clustering of photocounts resulting in the Fermi-Dirac distribution makes it possible to achieve the theoretical limit of the random number generation rate.
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

PubMed

Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

2012-01-01

RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or the user can upload sequences of interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information (total number of nucleotide bases and ATGC base contents with their respective percentages), and a sequence cleaner. RDNAnalyzer was developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available. http://www.cemb.edu.pk/sw.html RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
Automatic generation of randomized trial sequences for priming experiments.

PubMed

Ihrke, Matthias; Behrendt, Jörg

2011-01-01

In most psychological experiments, a randomized presentation of successive displays is crucial for the validity of the results. For some paradigms, such as priming paradigms, this is not a trivial issue because trials are interdependent. We present software that automatically generates optimized trial sequences for (negative-) priming experiments. Our implementation is based on an optimization heuristic known as genetic algorithms that allows for an intuitive interpretation due to its similarity to natural evolution. The program features a graphical user interface that allows the user to generate trial sequences and to interactively improve them. The program is built on freely available software and is released under the GNU General Public License.
Pseudorandom number generation using chaotic true orbits of the Bernoulli map

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saito, Asaki, E-mail: saito@fun.ac.jp; Yamaguchi, Akihiro

We devise a pseudorandom number generator that exactly computes chaotic true orbits of the Bernoulli map on quadratic algebraic integers. Moreover, we describe a way to select the initial points (seeds) for generating multiple pseudorandom binary sequences. This selection method distributes the initial points almost uniformly (equidistantly) in the unit interval, and latter parts of the generated sequences are guaranteed not to coincide. We also demonstrate through statistical testing that the generated sequences possess good randomness properties.
Improved diagonal queue medical image steganography using Chaos theory, LFSR, and Rabin cryptosystem.

PubMed

Jain, Mamta; Kumar, Anil; Choudhary, Rishabh Charan

2017-06-01

In this article, we have proposed an improved diagonal queue medical image steganography for patient secret medical data transmission using chaotic standard map, linear feedback shift register, and Rabin cryptosystem, for improvement of previous technique (Jain and Lenka in Springer Brain Inform 3:39-51, 2016). The proposed algorithm comprises four stages, generation of pseudo-random sequences (pseudo-random sequences are generated by linear feedback shift register and standard chaotic map), permutation and XORing using pseudo-random sequences, encryption using Rabin cryptosystem, and steganography using the improved diagonal queues. Security analysis has been carried out. Performance analysis is observed using MSE, PSNR, maximum embedding capacity, as well as by histogram analysis between various Brain disease stego and cover images.
An investigation of the uniform random number generator

NASA Technical Reports Server (NTRS)

Temple, E. C.

1982-01-01

Most random number generators that are in use today are of the congruential form X(i+1) = (A X(i) + C) mod M, where A, C, and M are nonnegative integers. If C = 0, the generator is called the multiplicative type, and those for which C ≠ 0 are called mixed congruential generators. It is easy to see that congruential generators will repeat a sequence of numbers after a maximum of M values have been generated. The number of values that a procedure generates before restarting the sequence is called the length or the period of the generator. Generally, it is desirable to make the period as long as possible. A detailed discussion of congruential generators is given. Also, several promising procedures that differ from the multiplicative and mixed procedures are discussed.
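The congruential form and the notion of period can be made concrete with a short sketch. It reads the recurrence as X(i+1) = (A X(i) + C) mod M; the parameter values (A = 5, M = 16, C = 3 or 0) are small illustrative choices and are not taken from the report.

```python
def lcg(a, c, m, seed):
    """Yield successive values of X(i+1) = (a * X(i) + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m
        yield x


def cycle_length(a, c, m, seed):
    """Number of distinct values produced before the generator repeats."""
    first_seen = {}
    x, step = seed, 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - first_seen[x]


# Mixed generator (C != 0) versus multiplicative generator (C = 0).
print("mixed period:         ", cycle_length(5, 3, 16, seed=0))
print("multiplicative period:", cycle_length(5, 0, 16, seed=1))
```

With these toy parameters the mixed generator reaches the maximum period M = 16, while the multiplicative one with the same multiplier cycles after only 4 values, which illustrates why parameter choice matters for period length.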
Reduction of display artifacts by random sampling

NASA Technical Reports Server (NTRS)

Ahumada, A. J., Jr.; Nagel, D. C.; Watson, A. B.; Yellott, J. I., Jr.

1983-01-01

The application of random-sampling techniques to remove visible artifacts (such as flicker, moire patterns, and paradoxical motion) introduced in TV-type displays by discrete sequential scanning is discussed and demonstrated. Sequential-scanning artifacts are described; the window of visibility defined in spatiotemporal frequency space by Watson and Ahumada (1982 and 1983) and Watson et al. (1983) is explained; the basic principles of random sampling are reviewed and illustrated by the case of the human retina; and it is proposed that the sampling artifacts can be replaced by random noise, which can then be shifted to frequency-space regions outside the window of visibility. Vertical sequential, single-random-sequence, and continuously renewed random-sequence plotting displays generating 128 points at update rates up to 130 Hz are applied to images of stationary and moving lines, and best results are obtained with the single random sequence for the stationary lines and with the renewed random sequence for the moving lines.
Fast and secure encryption-decryption method based on chaotic dynamics

DOEpatents

Protopopescu, Vladimir A.; Santoro, Robert T.; Tolliver, Johnny S.

1995-01-01

A method and system for the secure encryption of information. The method comprises the steps of dividing a message of length L into its character components; generating m chaotic iterates from m independent chaotic maps; producing an "initial" value based upon the m chaotic iterates; transforming the "initial" value to create a pseudo-random integer; repeating the steps of generating, producing and transforming until a pseudo-random integer sequence of length L is created; and encrypting the message as ciphertext based upon the pseudo random integer sequence. A system for accomplishing the invention is also provided.
A measurement of disorder in binary sequences

NASA Astrophysics Data System (ADS)

Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

2015-03-01

We propose a complex quantity, A_L, to characterize the degree of disorder of L-length binary symbolic sequences. As examples, we apply it to typical random and deterministic sequences. One kind of random sequence is generated from a periodic binary sequence and the other is generated from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. In these analyzed sequences, we find that the modulus of A_L, denoted by |A_L|, is (statistically) equivalent to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can serve as an index to discern which sequence is more disordered. Moreover, there is one and only one value of |A_L| for the overall disorder characteristics. It has an extremely low computational cost and can easily be realized experimentally. For all these reasons, we believe that the proposed measure of disorder is a valuable complement to existing ones for symbolic sequences.
FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.

PubMed

El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

2016-01-01

A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data, as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein-DNA interfaces.
The RANDOM computer program: A linear congruential random number generator

NASA Technical Reports Server (NTRS)

Miles, R. F., Jr.

1986-01-01

The RANDOM Computer Program is a FORTRAN program for generating random number sequences and testing linear congruential random number generators (LCGs). The linear congruential form of random number generator is discussed, and the selection of parameters of an LCG for a microcomputer is described. This document describes the following: (1) The RANDOM Computer Program; (2) RANDOM.MOD, the computer code needed to implement an LCG in a FORTRAN program; and (3) The RANCYCLE and the ARITH Computer Programs that provide computational assistance in the selection of parameters for an LCG. The RANDOM, RANCYCLE, and ARITH Computer Programs are written in Microsoft FORTRAN for the IBM PC microcomputer and its compatibles. With only minor modifications, the RANDOM Computer Program and its LCG can be run on most microcomputers or mainframe computers.
Quasirandom geometric networks from low-discrepancy sequences

NASA Astrophysics Data System (ADS)

Estrada, Ernesto

2017-08-01

We define quasirandom geometric networks using low-discrepancy sequences, such as Halton, Sobol, and Niederreiter. The networks are built in d dimensions by considering the d-tuples of digits generated by these sequences as the coordinates of the vertices of the networks in the d-dimensional unit hypercube I^d. Then, two vertices are connected by an edge if they are at a distance smaller than a connection radius. We investigate computationally 11 network-theoretic properties of two-dimensional quasirandom networks and compare them with analogous random geometric networks. We also study their degree distribution and their spectral density distributions. We conclude from this intensive computational study that in terms of the uniformity of the distribution of the vertices in the unit square, the quasirandom networks look more random than the random geometric networks. We include an analysis of potential strategies for generating higher-dimensional quasirandom networks, where it is known that some of the low-discrepancy sequences are highly correlated. In this respect, we conclude that up to dimension 20, the use of scrambling, skipping and leaping strategies generates quasirandom networks with the desired properties of uniformity. Finally, we consider a diffusive process taking place on the nodes and edges of the quasirandom and random geometric graphs. We show that the diffusion time is shorter in the quasirandom graphs as a consequence of their larger structural homogeneity. In the random geometric graphs the diffusion produces clusters of concentration that slow the process. Such clusters are a direct consequence of the heterogeneous and irregular distribution of the nodes in the unit square on which the generation of random geometric graphs is based.
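The construction described above (low-discrepancy points as vertices, edges between vertices closer than a connection radius) can be sketched in a few lines. This is a minimal illustration assuming a two-dimensional Halton sequence with bases 2 and 3 and an arbitrary radius of 0.2; the function names are hypothetical, and no scrambling, skipping or leaping is applied.

```python
from itertools import combinations
from math import dist


def van_der_corput(n, base):
    """Radical inverse of the integer n in the given base, a value in [0, 1)."""
    q, denom = 0.0, 1.0
    while n:
        n, digit = divmod(n, base)
        denom *= base
        q += digit / denom
    return q


def halton_points(count, bases=(2, 3)):
    """First `count` Halton points; one coprime base per dimension."""
    return [tuple(van_der_corput(i, b) for b in bases)
            for i in range(1, count + 1)]


def geometric_edges(points, radius):
    """Connect every pair of vertices closer than the connection radius."""
    return [(i, j)
            for (i, p), (j, q) in combinations(enumerate(points), 2)
            if dist(p, q) < radius]


vertices = halton_points(100)
edges = geometric_edges(vertices, radius=0.2)
print(len(vertices), "vertices,", len(edges), "edges")
```

Replacing `halton_points` with uniformly sampled points gives the analogous random geometric network, so the two can be compared on the same radius.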

Direct generation of all-optical random numbers from optical pulse amplitude chaos.

PubMed

Li, Pu; Wang, Yun-Cai; Wang, An-Bang; Yang, Ling-Zhen; Zhang, Ming-Jiang; Zhang, Jian-Zhong

2012-02-13

We propose and theoretically demonstrate an all-optical method for directly generating all-optical random numbers from pulse amplitude chaos produced by a mode-locked fiber ring laser. Under an appropriate pump intensity, the mode-locked laser can experience a quasi-periodic route to chaos. Such chaos consists of a stream of pulses with a fixed repetition frequency but random intensities. This method requires neither a sampling procedure nor externally triggered clocks; instead, it directly quantizes the chaotic pulse stream into a random number sequence via an all-optical flip-flop. Moreover, our simulation results show that the pulse amplitude chaos has no periodicity and possesses a highly symmetric distribution of amplitude. Thus, in theory, the random number sequence obtained without post-processing has high-quality randomness, as verified by industry-standard statistical tests.
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis

PubMed Central

Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

2012-01-01

RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or the user can upload sequences of interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information (total number of nucleotide bases and ATGC base contents with their respective percentages), and a sequence cleaner. RDNAnalyzer was developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available. Availability: http://www.cemb.edu.pk/sw.html Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language. PMID:23055611
An On-Demand Optical Quantum Random Number Generator with In-Future Action and Ultra-Fast Response

PubMed Central

Stipčević, Mario; Ursin, Rupert

2015-01-01

Random numbers are essential for our modern information-based society, e.g., in cryptography. Unlike frequently used pseudo-random generators, physical random number generators do not depend on complex algorithms but rather on a physical process to provide true randomness. Quantum random number generators (QRNG) rely on a process which, even in principle, can be described only by a probabilistic theory. Here we present a conceptually simple implementation, which offers 100% efficiency in producing a random bit upon request and simultaneously exhibits ultra-low latency. A careful technical and statistical analysis demonstrates its robustness against imperfections of the technology actually implemented and enables quick estimation of the randomness of very long sequences. Generated random numbers pass standard statistical tests without any post-processing. The setup described, as well as the theory presented here, demonstrate the maturity and overall understanding of the technology. PMID:26057576
Shaping the spectrum of random-phase radar waveforms

DOEpatents

Doerry, Armin W.; Marquette, Brandeis

2017-05-09

The various technologies presented herein relate to generation of a desired waveform profile in the form of a spectrum of apparently random noise (e.g., white noise or colored noise), but with precise spectral characteristics. Hence, a waveform profile that could be readily determined (e.g., by a spoofing system) is effectively obscured. Obscuration is achieved by dividing the waveform into a series of chips, each with an assigned frequency, wherein the sequence of chips is subsequently randomized. Randomization can be a function of the application of a key to the chip sequence. During processing of the echo pulse, a copy of the randomized transmitted pulse is recovered or regenerated against which the received echo is correlated. Hence, with the echo energy range-compressed in this manner, it is possible to generate a radar image with precise impulse response.
Autonomous Byte Stream Randomizer

NASA Technical Reports Server (NTRS)

Paloulian, George K.; Woo, Simon S.; Chow, Edward T.

2013-01-01

Net-centric networking environments are often faced with limited resources and must utilize bandwidth as efficiently as possible. In networking environments that span wide areas, the data transmission has to be efficient without any redundant or exuberant metadata. The Autonomous Byte Stream Randomizer software provides an extra level of security on top of existing data encryption methods. Randomizing the data's byte stream adds an extra layer to existing data protection methods, thus making it harder for an attacker to decrypt protected data. Based on a generated cryptographically secure random seed, a random sequence of numbers is used to intelligently and efficiently swap the organization of bytes in data using the unbiased and memory-efficient in-place Fisher-Yates shuffle method. Swapping bytes and reorganizing the crucial structure of the byte data renders the data file unreadable and leaves the data in a deconstructed state. This deconstruction adds an extra level of security, requiring the byte stream to be reconstructed with the random seed in order to be readable. Once the data byte stream has been randomized, the software enables the data to be distributed to N nodes in an environment. Each piece of the data in randomized and distributed form is a separate entity, unreadable in its own right, but when all N pieces are combined, the data can be reconstructed. Reconstruction requires possession of the key used for randomizing the bytes, leading to the generation of the same cryptographically secure random sequence of numbers used to randomize the data. This software is a cornerstone capability, able to generate the same cryptographically secure sequence on different machines and at different times, thus allowing it to be used more heavily in net-centric environments where data transfer bandwidth is limited.
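The keyed, reversible byte shuffle described above can be sketched as follows. This is a minimal illustration, not the NASA software: the function names are hypothetical, distribution across N nodes is omitted, and Python's `random.Random` is used only as a reproducible stand-in for the cryptographically secure generator the abstract calls for (it is NOT cryptographically secure).

```python
import random


def _swap_plan(n, key):
    """Fisher-Yates swap pairs (i, j) for a buffer of n bytes, derived from key."""
    rng = random.Random(key)  # stand-in only: not a cryptographically secure PRNG
    return [(i, rng.randint(0, i)) for i in range(n - 1, 0, -1)]


def randomize(data: bytes, key: int) -> bytes:
    """Shuffle the bytes in place using the keyed swap sequence."""
    buf = bytearray(data)
    for i, j in _swap_plan(len(buf), key):
        buf[i], buf[j] = buf[j], buf[i]
    return bytes(buf)


def reconstruct(data: bytes, key: int) -> bytes:
    """Undo the shuffle by replaying the same swaps in reverse order."""
    buf = bytearray(data)
    for i, j in reversed(_swap_plan(len(buf), key)):
        buf[i], buf[j] = buf[j], buf[i]
    return bytes(buf)


message = b"telemetry frame 0001: nominal"
scrambled = randomize(message, key=2013)
print(scrambled)
print(reconstruct(scrambled, key=2013))
```

Because each swap is its own inverse, regenerating the same swap sequence from the key and applying it backwards restores the original byte order; without the key the swap sequence cannot be replayed.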
Verifying Digital Components of Physical Systems: Experimental Evaluation of Test Quality

NASA Astrophysics Data System (ADS)

Laputenko, A. V.; López, J. E.; Yevtushenko, N. V.

2018-03-01

This paper continues the study of high quality test derivation for verifying digital components which are used in various physical systems, such as sensors and data transfer components. For experimental evaluation we have used logic circuits b01-b010 of the ITC'99 benchmark package (Second Release), which, as stated before, describe digital components of physical systems designed for various applications. Test sequences are derived for detecting the most common faults of the reference logic circuit using three different approaches to test derivation. Three widely used fault types, namely stuck-at faults, bridges, and faults which slightly modify the behavior of one gate, are considered as possible faults of the reference behavior. The most interesting test sequences are short test sequences that can provide appropriate guarantees after testing, and thus, we experimentally study various approaches to the derivation of the so-called complete test suites which detect all fault types. In the first series of experiments, we compare two approaches for deriving complete test suites. In the first approach, a shortest test sequence is derived for testing each fault. In the second approach, a test sequence is pseudo-randomly generated using appropriate software for logic synthesis and verification (the ABC system in our study) and thus can be longer. However, after deleting sequences detecting the same set of faults, a test suite returned by the second approach is shorter. The latter underlines the fact that in many cases it is useless to spend time and effort on deriving a shortest distinguishing sequence; it is better to use test minimization afterwards. The performed experiments also show that the use of only randomly generated test sequences is not very efficient since such sequences do not detect all the faults of any type. After reaching a fault coverage of around 70%, saturation is observed, and the fault coverage cannot be increased anymore. For deriving high quality short test suites, combining randomly generated sequences with sequences aimed at detecting the faults missed by random tests makes it possible to reach good fault coverage using the shortest test sequences.
Method and apparatus for determining position using global positioning satellites

NASA Technical Reports Server (NTRS)

Ward, John (Inventor); Ward, William S. (Inventor)

1998-01-01

A global positioning satellite receiver having an antenna for receiving an L1 signal from a satellite. The L1 signal is processed by a preamplifier stage including a band pass filter and a low noise amplifier and output as a radio frequency (RF) signal. A mixer receives and de-spreads the RF signal in response to a pseudo-random noise code, i.e., Gold code, generated by an internal pseudo-random noise code generator. A microprocessor enters a code tracking loop, such that during the code tracking loop, it addresses the pseudo-random code generator to cause the pseudo-random code generator to sequentially output pseudo-random codes corresponding to satellite codes used to spread the L1 signal, until correlation occurs. When an output of the mixer is indicative of the occurrence of correlation between the RF signal and the generated pseudo-random codes, the microprocessor enters an operational state which slows the receiver code sequence to stay locked with the satellite code sequence. The output of the mixer is provided to a detector which, in turn, controls certain routines of the microprocessor. The microprocessor will output pseudo range information according to an interrupt routine in response to detection of correlation. The pseudo range information is to be telemetered to a ground station which determines the position of the global positioning satellite receiver.
Pseudo-random bit generator based on lag time series

NASA Astrophysics Data System (ADS)

García-Martínez, M.; Campos-Cantón, E.

2014-12-01

In this paper, we present a pseudo-random bit generator (PRBG) based on two lag time series of the logistic map using positive and negative values in the bifurcation parameter. In order to hide the map used to build the pseudo-random series, we have used a delay in the generation of the time series. When these new series are plotted as x_n against x_{n+1}, they present a cloud of points unrelated to the logistic map. Finally, the pseudo-random sequences have been tested with the NIST suite, giving satisfactory results for use in stream ciphers.
A new feedback image encryption scheme based on perturbation with dynamical compound chaotic sequence cipher generator

NASA Astrophysics Data System (ADS)

Tong, Xiaojun; Cui, Minggen; Wang, Zhu

2009-07-01

A new compound two-dimensional chaotic function is designed by exploiting two one-dimensional chaotic functions which switch randomly, and the design is used as a chaotic sequence generator which is proved chaotic under Devaney's definition of chaos. The properties of compound chaotic functions are also proved rigorously. In order to improve the robustness against differential cryptanalysis and produce an avalanche effect, a new feedback image encryption scheme is proposed using the new compound chaos by selecting one of the two one-dimensional chaotic functions randomly, and a new method of image pixel permutation and substitution is designed in detail, with array rows and columns controlled randomly by the compound chaos. The results from entropy analysis, differential analysis, statistical analysis, sequence randomness analysis, and analysis of cipher sensitivity to the key and plaintext indicate that the compound chaotic sequence cipher can resist cryptanalytic, statistical and brute-force attacks, and in particular that it accelerates encryption and achieves a higher level of security. By means of the dynamical compound chaos and perturbation technology, the paper addresses the problem of the low computer precision of one-dimensional chaotic functions.
Random Sequence for Optimal Low-Power Laser Generated Ultrasound

NASA Astrophysics Data System (ADS)

Vangi, D.; Virga, A.; Gulino, M. S.

2017-08-01

Low-power laser generated ultrasounds have lately been gaining importance in the research world, thanks to the possibility of investigating the structural integrity of a mechanical component through a non-contact, Non-Destructive Testing (NDT) procedure. The ultrasounds are, however, very low in amplitude, making it necessary to use pre-processing and post-processing operations on the signals to detect them. The cross-correlation technique is used in this work, meaning that a random signal must be used as laser input. For this purpose, a highly random and simple-to-create code called the T sequence, capable of enhancing the ultrasound detectability, is introduced (not previously available in the state of the art). Several important parameters which characterize the T sequence can influence the process: the number of pulses Npulses, the pulse duration δ and the distance between pulses dpulses. A finite element (FE) model of a 3 mm steel disk has been initially developed to study the longitudinal ultrasound generation mechanism and the obtainable outputs. Later, experimental tests have shown that the T sequence is highly flexible for ultrasound detection purposes, making it optimal to use high Npulses and δ but low dpulses. In the end, apart from describing all phenomena that arise in the low-power laser generation process, the results of this study are also important for setting up an effective NDT procedure using this technology.
Subjective randomness as statistical inference.

PubMed

Griffiths, Thomas L; Daniels, Dylan; Austerweil, Joseph L; Tenenbaum, Joshua B

2018-06-01

Some events seem more random than others. For example, when tossing a coin, a sequence of eight heads in a row does not seem very random. Where do these intuitions about randomness come from? We argue that subjective randomness can be understood as the result of a statistical inference assessing the evidence that an event provides for having been produced by a random generating process. We show how this account provides a link to previous work relating randomness to algorithmic complexity, in which random events are those that cannot be described by short computer programs. Algorithmic complexity is both incomputable and too general to capture the regularities that people can recognize, but viewing randomness as statistical inference provides two paths to addressing these problems: considering regularities generated by simpler computing machines, and restricting the set of probability distributions that characterize regularity. Building on previous work exploring these different routes to a more restricted notion of randomness, we define strong quantitative models of human randomness judgments that apply not just to binary sequences - which have been the focus of much of the previous work on subjective randomness - but also to binary matrices and spatial clustering. Copyright © 2018 Elsevier Inc. All rights reserved.
Autocorrelation peaks in congruential pseudorandom number generators

NASA Technical Reports Server (NTRS)

Neuman, F.; Merrick, R. B.

1976-01-01

The complete correlation structure of several congruential pseudorandom number generators (PRNG) of the same type and small cycle length was studied to deal with the problem of congruential PRNG almost repeating themselves at intervals smaller than their cycle lengths, during simulation of bandpass filtered normal random noise. Maximum period multiplicative and mixed congruential generators were studied, with inferences drawn from examination of several tractable members of a class of random number generators, and moduli from 2 to the 5th power to 2 to the 9th power. High correlation is shown to exist in mixed and multiplicative congruential random number generators and prime moduli Lehmer generators for shifts a fraction of their cycle length. The random noise sequences in question are required when simulating electrical noise, air turbulence, or time variation of wind parameters.
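The kind of correlation structure described above can be seen in a toy case. The sketch below assumes a full-period mixed congruential generator with modulus 2^5 and illustrative parameters A = 5, C = 3 (not taken from the report), and computes the circular autocorrelation of one full cycle at every shift.

```python
def lcg_cycle(a, c, m, seed=0):
    """One full pass of m outputs of X(i+1) = (a * X(i) + c) mod m."""
    out, x = [], seed
    for _ in range(m):
        x = (a * x + c) % m
        out.append(x)
    return out


def circular_autocorr(xs, shift):
    """Pearson correlation between the cycle and itself rotated by `shift`."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    cov = sum((xs[i] - mean) * (xs[(i + shift) % n] - mean)
              for i in range(n)) / n
    return cov / var


M = 2 ** 5
cycle = lcg_cycle(5, 3, M)
for shift in range(1, M):
    print(shift, round(circular_autocorr(cycle, shift), 3))
```

For a full-period generator with a power-of-two modulus, values half a cycle apart always differ by exactly M/2 modulo M, so the correlation at shift M/2 is strongly negative (close to -0.5) rather than near zero, a simple example of structure at a fraction of the cycle length.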
Correlations between prefrontal neurons form a small-world network that optimizes the generation of multineuron sequences of activity

PubMed Central

Luongo, Francisco J.; Zimmerman, Chris A.; Horn, Meryl E.

2016-01-01

Sequential patterns of prefrontal activity are believed to mediate important behaviors, e.g., working memory, but it remains unclear exactly how they are generated. In accordance with previous studies of cortical circuits, we found that prefrontal microcircuits in young adult mice spontaneously generate many more stereotyped sequences of activity than expected by chance. However, the key question of whether these sequences depend on a specific functional organization within the cortical microcircuit, or emerge simply as a by-product of random interactions between neurons, remains unanswered. We observed that correlations between prefrontal neurons do follow a specific functional organization—they have a small-world topology. However, until now it has not been possible to directly link small-world topologies to specific circuit functions, e.g., sequence generation. Therefore, we developed a novel analysis to address this issue. Specifically, we constructed surrogate data sets that have identical levels of network activity at every point in time but nevertheless represent various network topologies. We call this method shuffling activity to rearrange correlations (SHARC). We found that only surrogate data sets based on the actual small-world functional organization of prefrontal microcircuits were able to reproduce the levels of sequences observed in actual data. As expected, small-world data sets contained many more sequences than surrogate data sets with randomly arranged correlations. Surprisingly, small-world data sets also outperformed data sets in which correlations were maximally clustered. Thus the small-world functional organization of cortical microcircuits, which effectively balances the random and maximally clustered regimes, is optimal for producing stereotyped sequential patterns of activity. PMID:26888108
Method of multiplexed analysis using ion mobility spectrometer

DOEpatents

Belov, Mikhail E [Richland, WA]; Smith, Richard D [Richland, WA]

2009-06-02

A method for analyzing analytes from a sample introduced into a Spectrometer by generating a pseudo random sequence of modulation bins, organizing each modulation bin as a series of submodulation bins, thereby forming an extended pseudo random sequence of submodulation bins, releasing the analytes in a series of analyte packets into a Spectrometer, thereby generating an unknown original ion signal vector, detecting the analytes at a detector, and characterizing the sample using the plurality of analyte signal subvectors. The method is advantageously applied to an Ion Mobility Spectrometer, and an Ion Mobility Spectrometer interfaced with a Time of Flight Mass Spectrometer.
Revisiting sample size: are big trials the answer?

PubMed

Lurati Buse, Giovanna A L; Botto, Fernando; Devereaux, P J

2012-07-18

The superiority of the evidence generated in randomized controlled trials over observational data is not only conditional on randomization. Randomized controlled trials require proper design and implementation to provide a reliable effect estimate. Adequate random sequence generation, allocation implementation, analyses based on the intention-to-treat principle, and sufficient power are crucial to the quality of a randomized controlled trial. Power, or the probability that the trial detects a difference when a real difference between treatments exists, strongly depends on sample size. The quality of orthopaedic randomized controlled trials is frequently threatened by a limited sample size. This paper reviews basic concepts and pitfalls in sample-size estimation and focuses on the importance of large trials in the generation of valid evidence.
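The dependence of power on sample size can be made concrete with the textbook normal-approximation formula for a two-arm trial with a continuous outcome. This is a sketch under that assumption and is not taken from the paper; the function name and the default values of alpha and power are illustrative.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate patients per arm needed to detect a mean difference delta.

    Assumes two equal arms, a common standard deviation sigma, a two-sided
    test at level alpha, and the normal approximation:
        n = 2 * ((z_{1-alpha/2} + z_{power}) * sigma / delta) ** 2
    """
    z = NormalDist().inv_cdf
    return ceil(2 * ((z(1 - alpha / 2) + z(power)) * sigma / delta) ** 2)

medium_effect = n_per_group(delta=0.5, sigma=1.0)   # half a standard deviation
small_effect = n_per_group(delta=0.25, sigma=1.0)   # a quarter of a standard deviation
print(medium_effect, small_effect)
```

Halving the detectable difference roughly quadruples the required sample size, which is why small trials are often underpowered for modest treatment effects.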
Enzymatic synthesis of random sequences of RNA and RNA analogues by DNA polymerase theta mutants for the generation of aptamer libraries.

PubMed

Randrianjatovo-Gbalou, Irina; Rosario, Sandrine; Sismeiro, Odile; Varet, Hugo; Legendre, Rachel; Coppée, Jean-Yves; Huteau, Valérie; Pochet, Sylvie; Delarue, Marc

2018-05-21

Nucleic acid aptamers, especially RNA, exhibit valuable advantages compared to protein therapeutics in terms of size, affinity and specificity. However, the synthesis of libraries of large random RNAs is still difficult and expensive. The engineering of polymerases able to directly generate these libraries has the potential to replace the chemical synthesis approach. Here, we start with a DNA polymerase that already displays a significant template-free nucleotidyltransferase activity, human DNA polymerase theta, and we mutate it based on the knowledge of its three-dimensional structure as well as previous mutational studies on members of the same polA family. One mutant exhibited a high tolerance towards ribonucleotides (NTPs) and displayed an efficient ribonucleotidyltransferase activity that resulted in the assembly of long RNA polymers. HPLC analysis and RNA sequencing of the products were used to quantify the incorporation of the four NTPs as a function of initial NTP concentrations and established the randomness of each generated nucleic acid sequence. The same mutant revealed a propensity to accept other modified nucleotides and to extend them in long fragments. Hence, this mutant can deliver random natural and modified RNA polymers libraries ready to use for SELEX, with custom lengths and balanced or unbalanced ratios.
Reduced randomness in quantum cryptography with sequences of qubits encoded in the same basis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lamoureux, L.-P.; Cerf, N. J.; Bechmann-Pasquinucci, H.

2006-03-15

We consider the cloning of sequences of qubits prepared in the states used in the BB84 or six-state quantum cryptography protocol, and show that the single-qubit fidelity is unaffected even if entire sequences of qubits are prepared in the same basis. This result is only valid provided that the sequences are much shorter than the total key. It is of great importance for practical quantum cryptosystems because it reduces the need for high-speed random number generation without impairing the security against finite-size cloning attacks.
Quantum random number generator based on quantum nature of vacuum fluctuations

NASA Astrophysics Data System (ADS)

Ivanova, A. E.; Chivilikhin, S. A.; Gleim, A. V.

2017-11-01

A quantum random number generator (QRNG) allows true random bit sequences to be obtained. In a QRNG based on the quantum nature of the vacuum, an optical beam splitter with two inputs and two outputs is normally used. We compare the mathematical descriptions of a spatial beam splitter and a fiber Y-splitter in the quantum model for a QRNG based on homodyne detection. The descriptions are identical, which allows fiber Y-splitters to be used in practical QRNG schemes, simplifying the setup. We also obtain relations between the input radiation and the resulting differential current in the homodyne detector. We experimentally demonstrate the possibility of generating true random bits using a QRNG based on homodyne detection with a Y-splitter.
Dice and DNA

ERIC Educational Resources Information Center

Wernersson, Rasmus

2007-01-01

An important part of teaching students how to use the BLAST tool for searching large sequence databases, is to train the students to think critically about the quality of the sequence hits found--both in terms of the statistical significance and how informative the individual hits are. This paper describes how generating truly random sequences by…
Randomizer for High Data Rates

NASA Technical Reports Server (NTRS)

Garon, Howard; Sank, Victor J.

2018-01-01

NASA as well as a number of other space agencies now recognize that the current recommended CCSDS randomizer used for telemetry (TM) is too short. When multiple applications of the PN8 Maximal Length Sequence (MLS) are required in order to fully cover a channel access data unit (CADU), spectral problems in the form of elevated spurious discretes (spurs) appear. Originally the randomizer was called a bit transition generator (BTG) precisely because it was thought that its primary value was to ensure sufficient bit transitions to allow the bit/symbol synchronizer to lock and remain locked. We, NASA, have shown that the old BTG concept is a limited view of the real value of the randomizer sequence and that the randomizer also aids in signal acquisition as well as minimizing the potential for false decoder lock. Under the guidelines we considered here there are multiple maximal length sequences over GF(2) which appear attractive in this application. Although there may be mitigating reasons why another MLS sequence could be selected, one sequence in particular possesses a combination of desired properties which sets it apart from the others.
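A maximal length sequence of the kind discussed here comes from a linear feedback shift register over GF(2). The sketch below uses an 8-bit register with the feedback polynomial x^8 + x^4 + x^3 + x^2 + 1. That polynomial is an assumption chosen because it is a well-known primitive polynomial; it is not claimed to be the CCSDS randomizer polynomial or the sequence the authors recommend.

```python
from itertools import islice

POLY = 0x11D   # x^8 + x^4 + x^3 + x^2 + 1 (illustrative assumption)
WIDTH = 8

def lfsr_step(state):
    """Advance a Galois LFSR by one clock (multiply by x modulo POLY)."""
    state <<= 1
    if state & (1 << WIDTH):
        state ^= POLY
    return state

def lfsr_bits(state=0x01):
    """Yield the low bit of each successive register state."""
    while True:
        yield state & 1
        state = lfsr_step(state)

def cycle_length(start=0x01):
    """Count clocks until the register returns to its starting state."""
    state, n = lfsr_step(start), 1
    while state != start:
        state = lfsr_step(state)
        n += 1
    return n

def randomize(data_bits, state=0x01):
    """XOR data with the sequence; applying it twice restores the data."""
    return [d ^ p for d, p in zip(data_bits, lfsr_bits(state))]

period = cycle_length()
one_period = list(islice(lfsr_bits(), period))
print(period, sum(one_period))
```

An 8-bit register repeats after 2^8 - 1 = 255 bits, so a data unit longer than that sees the same pattern more than once, which is the repetition the abstract associates with spectral spurs.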

Toward DNA-based Security Circuitry: First Step - Random Number Generation.

PubMed

Bogard, Christy M; Arazi, Benjamin; Rouchka, Eric C

2008-08-10

DNA-based circuit design is an area of research in which traditional silicon-based technologies are replaced by naturally occurring phenomena taken from biochemistry and molecular biology. Our team investigates the implications of DNA-based circuit design in serving security applications. As an initial step we develop random number generation circuitry. A novel prototype schema employs solid-phase synthesis of oligonucleotides for random construction of DNA sequences. Temporary storage and retrieval is achieved through plasmid vectors.
The correlation structure of several popular pseudorandom number generators

NASA Technical Reports Server (NTRS)

Neuman, F.; Merrick, R.; Martin, C. F.

1973-01-01

One of the desirable properties of a pseudorandom number generator is that the sequence of numbers it generates should have very low autocorrelation for all shifts except for zero shift and those that are multiples of its cycle length. Due to the simple methods of constructing random numbers, the ideal is often not quite fulfilled. A simple method of examining any random generator for previously unsuspected regularities is discussed. Once they are discovered it is often easy to derive the mathematical relationships which describe the regular behavior. As examples, it is shown that high correlation exists in mixed and multiplicative congruential random number generators and prime moduli Lehmer generators for shifts a fraction of their cycle lengths.
Minimalist design of a robust real-time quantum random number generator

NASA Astrophysics Data System (ADS)

Kravtsov, K. S.; Radchenko, I. V.; Kulik, S. P.; Molotkov, S. N.

2015-08-01

We present a simple and robust construction of a real-time quantum random number generator (QRNG). Our minimalist approach ensures stable operation of the device as well as its simple and straightforward hardware implementation as a stand-alone module. As a source of randomness the device uses measurements of time intervals between clicks of a single-photon detector. The obtained raw sequence is then filtered and processed by a deterministic randomness extractor, which is realized as a look-up table. This enables high speed on-the-fly processing without the need of extensive computations. The overall performance of the device is around 1 random bit per detector click, resulting in 1.2 Mbit/s generation rate in our implementation.
Analysis of using interpulse intervals to generate 128-bit biometric random binary sequences for securing wireless body sensor networks.

PubMed

Zhang, Guang-He; Poon, Carmen C Y; Zhang, Yuan-Ting

2012-01-01

A wireless body sensor network (WBSN), a key building block for m-Health, operates under extremely stringent resource constraints, and thus lightweight security methods are preferred. To minimize resource consumption, utilizing information already available to a WBSN, particularly information common to different sensor nodes of a WBSN, for security purposes becomes an attractive solution. In this paper, we tested the randomness and distinctiveness of the 128-bit biometric binary sequences (BSs) generated from interpulse intervals (IPIs) of 20 healthy subjects as well as 30 patients suffering from myocardial infarction and 34 subjects with other cardiovascular diseases. The encoding time of a biometric BS on a WBSN node is on average 23 ms and memory occupation is 204 bytes for any given IPI sequence. The results from five U.S. National Institute of Standards and Technology statistical tests suggest that random biometric BSs can be generated from both healthy subjects and cardiovascular patients and can potentially be used as authentication identifiers for securing WBSNs. Ultimately, it is preferred that these biometric BSs can be used as encryption keys such that key distribution over the WBSN can be avoided.
PuLSE: Quality control and quantification of peptide sequences explored by phage display libraries.

PubMed

Shave, Steven; Mann, Stefan; Koszela, Joanna; Kerr, Alastair; Auer, Manfred

2018-01-01

The design of highly diverse phage display libraries is based on the assumption that DNA bases are incorporated at similar rates within the randomized sequence. As library complexity increases and expected copy numbers of unique sequences decrease, the exploration of library space becomes sparser and the presence of truly random sequences becomes critical. We present the program PuLSE (Phage Library Sequence Evaluation) as a tool for assessing randomness and therefore diversity of phage display libraries. PuLSE runs on a collection of sequence reads in the fastq file format and generates tables profiling the library in terms of unique DNA sequence counts and positions, translated peptide sequences, and normalized 'expected' occurrences from base to residue codon frequencies. The output allows at-a-glance quantitative quality control of a phage library in terms of sequence coverage both at the DNA base and translated protein residue level, which has been missing from toolsets and literature. The open source program PuLSE is available in two formats, a C++ source code package for compilation and integration into existing bioinformatics pipelines and precompiled binaries for ease of use.
640-Gbit/s fast physical random number generation using a broadband chaotic semiconductor laser

NASA Astrophysics Data System (ADS)

Zhang, Limeng; Pan, Biwei; Chen, Guangcan; Guo, Lu; Lu, Dan; Zhao, Lingjuan; Wang, Wei

2017-04-01

An ultra-fast physical random number generator is demonstrated utilizing a photonic integrated device based broadband chaotic source with a simple post-processing method. The compact chaotic source is implemented by using a monolithic integrated dual-mode amplified feedback laser (AFL) with self-injection, where a robust chaotic signal with RF frequency coverage of above 50 GHz and flatness of ±3.6 dB is generated. By retaining the 4 least significant bits (LSBs) from the 8-bit digitization of the chaotic waveform, random sequences with a bit rate up to 640 Gbit/s (160 GS/s × 4 bits) are realized. The generated random bits have passed each of the fifteen NIST statistical tests (NIST SP800-22), indicating their randomness for practical applications.
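The post-processing step named in this abstract, keeping only the low-order bits of each digitized sample, is simple enough to sketch. The function name and the two sample values below are hypothetical; the 8-bit width, the 4 retained bits and the 160 GS/s sampling rate are the figures quoted in the abstract.

```python
def retain_lsbs(samples, n_bits=4):
    """Keep the n_bits least significant bits of each sample, MSB first."""
    bits = []
    for s in samples:
        for k in range(n_bits - 1, -1, -1):
            bits.append((s >> k) & 1)
    return bits

# Two made-up 8-bit ADC samples standing in for the chaotic waveform.
samples = [0b10110101, 0b00001111]
bits = retain_lsbs(samples)

sample_rate = 160e9                 # samples per second, from the abstract
bit_rate = sample_rate * 4          # 4 retained bits per sample
print(bits, bit_rate)
```

Discarding the high-order bits removes the slowly varying part of the waveform, which is where most of the residual correlation between neighbouring samples would be expected to sit.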
Hiding message into DNA sequence through DNA coding and chaotic maps.

PubMed

Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

2014-09-01

The paper proposes an improved reversible substitution method to hide data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance the robustness and enlarge the hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity.
Recombination of polynucleotide sequences using random or defined primers

DOEpatents

Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin H; Giver, Lorraine J.

2000-01-01

A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Recombination of polynucleotide sequences using random or defined primers

DOEpatents

Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin; Giver, Lorraine J.

2001-01-01

A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Quantum random bit generation using energy fluctuations in stimulated Raman scattering.

PubMed

Bustard, Philip J; England, Duncan G; Nunn, Josh; Moffatt, Doug; Spanner, Michael; Lausten, Rune; Sussman, Benjamin J

2013-12-02

Random number sequences are a critical resource in modern information processing systems, with applications in cryptography, numerical simulation, and data sampling. We introduce a quantum random number generator based on the measurement of pulse energy quantum fluctuations in Stokes light generated by spontaneously-initiated stimulated Raman scattering. Bright Stokes pulse energy fluctuations up to five times the mean energy are measured with fast photodiodes and converted to unbiased random binary strings. Since the pulse energy is a continuous variable, multiple bits can be extracted from a single measurement. Our approach can be generalized to a wide range of Raman active materials; here we demonstrate a prototype using the optical phonon line in bulk diamond.
A revision of the subtract-with-borrow random number generators

NASA Astrophysics Data System (ADS)

Sibidanov, Alexei

2017-12-01

The most popular and widely used subtract-with-borrow generator, also known as RANLUX, is reimplemented as a linear congruential generator using large integer arithmetic with the modulus size of 576 bits. Modern computers, as well as the specific structure of the modulus inferred from RANLUX, allow for the development of a fast modular multiplication - the core of the procedure. This was previously believed to be slow and have too high cost in terms of computing resources. Our tests show a significant gain in generation speed which is comparable with other fast, high quality random number generators. An additional feature is the fast skipping of generator states leading to a seeding scheme which guarantees the uniqueness of random number sequences. Licensing provisions: GPLv3 Programming language: C++, C, Assembler
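The idea of running a subtract-with-borrow generator as a linear congruential generator, and of skipping ahead quickly, can be sketched with Python's built-in big integers. The base 2^24 and lags 24 and 10 are the values usually quoted for the RANLUX base generator and are assumptions here, since the abstract only states the 576-bit modulus size. Plain big-integer arithmetic stands in for the optimized modular multiplication the paper describes.

```python
B = 2 ** 24                   # base of the subtract-with-borrow digits (assumed)
M = B ** 24 - B ** 10 + 1     # LCG modulus implied by lags 24 and 10 (assumed)
A = M - (M - 1) // B          # multiplier; equals the inverse of B modulo M

def step(x):
    """One LCG step x -> A*x mod M."""
    return (A * x) % M

def skip(x, k):
    """Jump k steps at once using modular exponentiation."""
    return (pow(A, k, M) * x) % M

seed = 123456789              # arbitrary nonzero starting state
walked = seed
for _ in range(1000):
    walked = step(walked)
jumped = skip(seed, 1000)
print(M.bit_length(), walked == jumped)
```

Because a jump of k steps costs only about log2(k) modular multiplications, independent streams can be seeded by jumping each one far ahead of the previous, which is the kind of seeding scheme the abstract mentions.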
PERMutation Using Transposase Engineering (PERMUTE): A Simple Approach for Constructing Circularly Permuted Protein Libraries.

PubMed

Jones, Alicia M; Atkinson, Joshua T; Silberg, Jonathan J

2017-01-01

Rearrangements that alter the order of a protein's sequence are used in the lab to study protein folding, improve activity, and build molecular switches. One of the simplest ways to rearrange a protein sequence is through random circular permutation, where native protein termini are linked together and new termini are created elsewhere through random backbone fission. Transposase mutagenesis has emerged as a simple way to generate libraries encoding different circularly permuted variants of proteins. With this approach, a synthetic transposon (called a permuteposon) is randomly inserted throughout a circularized gene to generate vectors that express different permuted variants of a protein. In this chapter, we outline the protocol for constructing combinatorial libraries of circularly permuted proteins using transposase mutagenesis, and we describe the different permuteposons that have been developed to facilitate library construction.
Counting of oligomers in sequences generated by markov chains for DNA motif discovery.

PubMed

Shan, Gao; Zheng, Wei-Mou

2009-02-01

By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate first, second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs on a set of promoter sequences extracted from A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.
A hybrid-type quantum random number generator

NASA Astrophysics Data System (ADS)

Hai-Qiang, Ma; Wu, Zhu; Ke-Jin, Wei; Rui-Xue, Li; Hong-Wei, Liu

2016-05-01

This paper proposes a well-performing hybrid-type truly quantum random number generator based on the time interval between two independent single-photon detection signals, which is practical and intuitive, and generates the initial random number sources from a combination of multiple existing random number sources. A time-to-amplitude converter and multichannel analyzer are used for qualitative analysis to demonstrate that each and every step is random. Furthermore, a carefully designed data acquisition system is used to obtain a high-quality random sequence. Our scheme is simple and proves that the random number bit rate can be dramatically increased to satisfy practical requirements. Project supported by the National Natural Science Foundation of China (Grant Nos. 61178010 and 11374042), the Fund of State Key Laboratory of Information Photonics and Optical Communications (Beijing University of Posts and Telecommunications), China, and the Fundamental Research Funds for the Central Universities of China (Grant No. bupt2014TS01).
A critical evaluation of random copolymer mimesis of homogeneous antimicrobial peptides.

PubMed

Hu, Kan; Schmidt, Nathan W; Zhu, Rui; Jiang, Yunjiang; Lai, Ghee Hwee; Wei, Gang; Palermo, Edmund F; Kuroda, Kenichi; Wong, Gerard C L; Yang, Lihua

2013-01-01

Polymeric synthetic mimics of antimicrobial peptides (SMAMPs) have recently demonstrated similar antimicrobial activity as natural antimicrobial peptides (AMPs) from innate immunity. This is surprising, since polymeric SMAMPs are heterogeneous in terms of chemical structure (random sequence) and conformation (random coil), in contrast to defined amino acid sequence and intrinsic secondary structure. To understand this better, we compare AMPs with a 'minimal' mimic, a well characterized family of polydisperse cationic methacrylate-based random copolymer SMAMPs. Specifically, we focus on a comparison between the quantifiable membrane curvature generating capacity, charge density, and hydrophobicity of the polymeric SMAMPs and AMPs. Synchrotron small angle x-ray scattering (SAXS) results indicate that typical AMPs and these methacrylate SMAMPs generate similar amounts of membrane negative Gaussian curvature (NGC), which is topologically necessary for a variety of membrane-destabilizing processes. Moreover, the curvature generating ability of SMAMPs is more tolerant of changes in the lipid composition than that of natural AMPs with similar chemical groups, consistent with the lower specificity of SMAMPs. We find that, although the amount of NGC generated by these SMAMPs and AMPs are similar, the SMAMPs require significantly higher levels of hydrophobicity and cationic charge to achieve the same level of membrane deformation. We propose an explanation for these differences, which has implications for new synthetic strategies aimed at improved mimesis of AMPs.
Random digital encryption secure communication system

NASA Technical Reports Server (NTRS)

Doland, G. D. (Inventor)

1982-01-01

The design of a secure communication system is described. A product code, formed from two pseudorandom sequences of digital bits, is used to encipher or scramble data prior to transmission. The two pseudorandom sequences are periodically changed at intervals before they have had time to repeat. One of the two sequences is transmitted continuously with the scrambled data for synchronization. In the receiver portion of the system, the incoming signal is compared with one of two locally generated pseudorandom sequences until correspondence between the sequences is obtained. At this time, the two locally generated sequences are formed into a product code which deciphers the data from the incoming signal. Provision is made to ensure synchronization of the transmitting and receiving portions of the system.
Design of nucleic acid sequences for DNA computing based on a thermodynamic approach

PubMed Central

Tanaka, Fumiaki; Kameda, Atsushi; Yamamoto, Masahito; Ohuchi, Azuma

2005-01-01

We have developed an algorithm for designing multiple sequences of nucleic acids that have a uniform melting temperature between the sequence and its complement and that do not hybridize non-specifically with each other based on the minimum free energy (ΔGmin). Sequences that satisfy these constraints can be utilized in computations, various engineering applications such as microarrays, and nano-fabrications. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that the filtering method uses a greedy search to calculate ΔGmin. This effectively excludes inappropriate sequences before ΔGmin is calculated, thereby reducing computation time drastically when compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the Hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (ΔGexp) of 126 sequences correlated better with ΔGmin (|R| = 0.90) than with the Hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphical user interface-based program written in Java. PMID:15701762
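The generate-and-test loop described above can be sketched in a few lines. Note the substitution: this example filters candidates by Hamming distance, which the abstract calls the traditional approach, because the thermodynamic ΔGmin calculation is not reproduced here. All function names, the sequence length and the distance threshold are hypothetical.

```python
import random

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def generate_and_test(count, length, min_dist, rng, max_tries=100_000):
    """Draw random DNA sequences, keeping those far enough from all accepted ones."""
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == count:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        # Test step: reject candidates too similar to an accepted sequence.
        if all(hamming(candidate, s) >= min_dist for s in accepted):
            accepted.append(candidate)
    return accepted

sequences = generate_and_test(count=5, length=20, min_dist=8, rng=random.Random(0))
for s in sequences:
    print(s)
```

In the published algorithm the test step would instead compare free energies, with a cheap filter run first so that the expensive calculation is only performed on promising candidates.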
Not all numbers are equal: preferences and biases among children and adults when generating random sequences.

PubMed

Towse, John N; Loetscher, Tobias; Brugger, Peter

2014-01-01

We investigate the number preferences of children and adults when generating random digit sequences. Previous research has shown convincingly that adults prefer smaller numbers when randomly choosing between responses 1-6. We analyze randomization choices made by both children and adults, considering a range of experimental studies and task configurations. Children - most of whom are between 8 and 11 years - show a preference for relatively large numbers when choosing numbers 1-10. Adults show a preference for small numbers with the same response set. We report a modest association between children's age and numerical bias. However, children also exhibit a small number bias with a smaller response set available, and they show a preference specifically for the numbers 1-3 across many datasets. We argue that number space demonstrates both continuities (numbers 1-3 have a distinct status) and change (a developmentally emerging bias toward the left side of representational space or lower numbers).
Construction of a scFv Library with Synthetic, Non-combinatorial CDR Diversity.

PubMed

Bai, Xuelian; Shim, Hyunbo

2017-01-01

Many large synthetic antibody libraries have been designed and constructed, and have successfully generated high-quality antibodies suitable for various demanding applications. While synthetic antibody libraries have many advantages such as optimized framework sequences and a broader sequence landscape than natural antibodies, their sequence diversities typically are generated by random combinatorial synthetic processes which cause the incorporation of many undesired CDR sequences. Here, we describe the construction of a synthetic scFv library using oligonucleotide mixtures that contain predefined, non-combinatorially synthesized CDR sequences. Each CDR is first inserted into a master scFv framework sequence and the resulting single-CDR libraries are subjected to a round of proofread panning. The proofread CDR sequences are assembled to produce the final scFv library with six diversified CDRs.
On the conservative nature of intragenic recombination

PubMed Central

Drummond, D. Allan; Silberg, Jonathan J.; Meyer, Michelle M.; Wilke, Claus O.; Arnold, Frances H.

2005-01-01

Intragenic recombination rapidly creates protein sequence diversity compared with random mutation, but little is known about the relative effects of recombination and mutation on protein function. Here, we compare recombination of the distantly related β-lactamases PSE-4 and TEM-1 to mutation of PSE-4. We show that, among β-lactamase variants containing the same number of amino acid substitutions, variants created by recombination retain function with a significantly higher probability than those generated by random mutagenesis. We present a simple model that accurately captures the differing effects of mutation and recombination in real and simulated proteins with only four parameters: (i) the amino acid sequence distance between parents, (ii) the number of substitutions, (iii) the average probability that random substitutions will preserve function, and (iv) the average probability that substitutions generated by recombination will preserve function. Our results expose a fundamental functional enrichment in regions of protein sequence space accessible by recombination and provide a framework for evaluating whether the relative rates of mutation and recombination observed in nature reflect the underlying imbalance in their effects on protein function. PMID:15809422

On the conservative nature of intragenic recombination.

PubMed

Drummond, D Allan; Silberg, Jonathan J; Meyer, Michelle M; Wilke, Claus O; Arnold, Frances H

2005-04-12

Intragenic recombination rapidly creates protein sequence diversity compared with random mutation, but little is known about the relative effects of recombination and mutation on protein function. Here, we compare recombination of the distantly related beta-lactamases PSE-4 and TEM-1 to mutation of PSE-4. We show that, among beta-lactamase variants containing the same number of amino acid substitutions, variants created by recombination retain function with a significantly higher probability than those generated by random mutagenesis. We present a simple model that accurately captures the differing effects of mutation and recombination in real and simulated proteins with only four parameters: (i) the amino acid sequence distance between parents, (ii) the number of substitutions, (iii) the average probability that random substitutions will preserve function, and (iv) the average probability that substitutions generated by recombination will preserve function. Our results expose a fundamental functional enrichment in regions of protein sequence space accessible by recombination and provide a framework for evaluating whether the relative rates of mutation and recombination observed in nature reflect the underlying imbalance in their effects on protein function.
Random number generators tested on quantum Monte Carlo simulations.

PubMed

Hongo, Kenta; Maezono, Ryo; Miura, Kenichi

2010-08-01

We have tested and compared several (pseudo) random number generators (RNGs) applied to a practical application, ground state energy calculations of molecules using variational and diffusion Monte Carlo methods. A new multiple recursive generator with 8th-order recursion (MRG8) and the Mersenne twister generator (MT19937) are tested and compared with the RANLUX generator with five luxury levels (RANLUX-[0-4]). Both MRG8 and MT19937 are proven to give the same total energy as that evaluated with RANLUX-4 (highest luxury level) within the statistical error bars, with less computational cost to generate the sequence. We also tested RANDU, a notorious implementation of the linear congruential generator (LCG), for comparison. (c) 2010 Wiley Periodicals, Inc.
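As an illustration of why RANDU is regarded as a poor LCG, the sketch below implements its usual textbook definition, x_{n+1} = 65539 * x_n mod 2^31, and checks the well-known linear relation between consecutive triples. The function name `randu` and the seed are illustrative choices, not taken from the paper, and the abstract does not describe how the authors implemented or seeded the generator.

```python
M = 2**31  # RANDU modulus

def randu(seed, n):
    """Return the next n RANDU states after `seed` (seed should be odd)."""
    x = seed
    out = []
    for _ in range(n):
        x = (65539 * x) % M  # multiplier 65539 = 2**16 + 3
        out.append(x)
    return out

xs = randu(1, 1000)

# Because 65539**2 = 6*65539 - 9 (mod 2**31), every three consecutive
# outputs satisfy x[k+2] = 6*x[k+1] - 9*x[k] (mod 2**31).
violations = sum(
    (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % M != 0
    for k in range(len(xs) - 2)
)
print("triples violating the linear relation:", violations)
```

The relation means that points built from three consecutive outputs fall on a small number of planes in the unit cube, which is the kind of defect a Monte Carlo comparison can be sensitive to.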
Identification of cancer-specific motifs in mimotope profiles of serum antibody repertoire.

PubMed

Gerasimov, Ekaterina; Zelikovsky, Alex; Măndoiu, Ion; Ionov, Yurij

2017-06-07

For fighting cancer, earlier detection is crucial. Circulating auto-antibodies produced by the patient's own immune system after exposure to cancer proteins are promising bio-markers for the early detection of cancer. Since an antibody recognizes not the whole antigen but 4-7 critical amino acids within the antigenic determinant (epitope), the whole proteome can be represented by a random peptide phage display library. This opens the possibility to develop an early cancer detection test based on a set of peptide sequences identified by comparing cancer patients' and healthy donors' global peptide profiles of antibody specificities. Due to the enormously large number of peptide sequences contained in global peptide profiles generated by next generation sequencing, a large number of cancer and control sera is required to identify cancer-specific peptides with a high degree of statistical significance. To decrease the number of peptides in profiles generated by nextgen sequencing without losing cancer-specific sequences, we generated the profiles using a phage library enriched by panning on the pool of cancer sera. To further decrease the complexity of the profiles, we used computational methods to transform the list of peptides constituting the mimotope profiles into a list of motifs formed by similar peptide sequences. We have shown that the amino-acid order is meaningful in mimotope motifs, since they contain significantly more peptides than motifs among peptides where amino acids are randomly permuted. Also, the single-sample motifs differ significantly from motifs in peptides drawn from multiple samples. Finally, multiple cancer-specific motifs have been identified.
Comparative effectiveness of next generation genomic sequencing for disease diagnosis: design of a randomized controlled trial in patients with colorectal cancer/polyposis syndromes.

PubMed

Gallego, Carlos J; Bennette, Caroline S; Heagerty, Patrick; Comstock, Bryan; Horike-Pyne, Martha; Hisama, Fuki; Amendola, Laura M; Bennett, Robin L; Dorschner, Michael O; Tarczy-Hornoch, Peter; Grady, William M; Fullerton, S Malia; Trinidad, Susan B; Regier, Dean A; Nickerson, Deborah A; Burke, Wylie; Patrick, Donald L; Jarvik, Gail P; Veenstra, David L

2014-09-01

Whole exome and whole genome sequencing are applications of next generation sequencing transforming clinical care, but there is little evidence whether these tests improve patient outcomes or if they are cost effective compared to current standard of care. These gaps in knowledge can be addressed by comparative effectiveness and patient-centered outcomes research. We designed a randomized controlled trial that incorporates these research methods to evaluate whole exome sequencing compared to usual care in patients being evaluated for hereditary colorectal cancer and polyposis syndromes. Approximately 220 patients will be randomized and followed for 12 months after return of genomic findings. Patients will receive findings associated with colorectal cancer in a first return of results visit, and findings not associated with colorectal cancer (incidental findings) during a second return of results visit. The primary outcome is efficacy to detect mutations associated with these syndromes; secondary outcomes include psychosocial impact, cost-effectiveness and comparative costs. The secondary outcomes will be obtained via surveys before and after each return visit. The expected challenges in conducting this randomized controlled trial include the relatively low prevalence of genetic disease, difficult interpretation of some genetic variants, and uncertainty about which incidental findings should be returned to patients. The approaches utilized in this study may help guide other investigators in clinical genomics to identify useful outcome measures and strategies to address comparative effectiveness questions about the clinical implementation of genomic sequencing in clinical care. Copyright © 2014 Elsevier Inc. All rights reserved.
Identifying novel sequence variants of RNA 3D motifs

PubMed Central

Zirbel, Craig L.; Roll, James; Sweeney, Blake A.; Petrov, Anton I.; Pirrung, Meg; Leontis, Neocles B.

2015-01-01

Predicting RNA 3D structure from sequence is a major challenge in biophysics. An important sub-goal is accurately identifying recurrent 3D motifs from RNA internal and hairpin loop sequences extracted from secondary structure (2D) diagrams. We have developed and validated new probabilistic models for 3D motif sequences based on hybrid Stochastic Context-Free Grammars and Markov Random Fields (SCFG/MRF). The SCFG/MRF models are constructed using atomic-resolution RNA 3D structures. To parameterize each model, we use all instances of each motif found in the RNA 3D Motif Atlas and annotations of pairwise nucleotide interactions generated by the FR3D software. Isostericity relations between non-Watson–Crick basepairs are used in scoring sequence variants. SCFG techniques model nested pairs and insertions, while MRF ideas handle crossing interactions and base triples. We use test sets of randomly-generated sequences to set acceptance and rejection thresholds for each motif group and thus control the false positive rate. Validation was carried out by comparing results for four motif groups to RMDetect. The software developed for sequence scoring (JAR3D) is structured to automatically incorporate new motifs as they accumulate in the RNA 3D Motif Atlas when new structures are solved and is available free for download. PMID:26130723
Identification of Prostate Cancer-Specific microDNAs

DTIC Science & Technology

2016-02-01

circular DNA by rolling circle amplification (RCA) and then amplified DNA fragments were subject to deep sequencing. Deep sequencing of the...demonstrate the existence of microDNAs in prostate cancer. We adopted multiple displacement amplification (MDA) with random 2 primers for enriched...prostate cancer cells through multiple displacement amplification and next generation sequencing. [Figure: y-axis, relative cell growth (%)]
A simple method for semi-random DNA amplicon fragmentation using the methylation-dependent restriction enzyme MspJI.

PubMed

Shinozuka, Hiroshi; Cogan, Noel O I; Shinozuka, Maiko; Marshall, Alexis; Kay, Pippa; Lin, Yi-Han; Spangenberg, German C; Forster, John W

2015-04-11

Fragmentation at random nucleotide locations is an essential process for preparation of DNA libraries to be used on massively parallel short-read DNA sequencing platforms. Although instruments for physical shearing, such as the Covaris S2 focused-ultrasonicator system, and products for enzymatic shearing, such as the Nextera technology and NEBNext dsDNA Fragmentase kit, are commercially available, a simple and inexpensive method is desirable for high-throughput sequencing library preparation. MspJI is a recently characterised restriction enzyme which recognises the sequence motif CNNR (where R = G or A) when the first base is modified to 5-methylcytosine or 5-hydroxymethylcytosine. A semi-random enzymatic DNA amplicon fragmentation method was developed based on the unique cleavage properties of MspJI. In this method, random incorporation of 5-methyl-2'-deoxycytidine-5'-triphosphate is achieved through DNA amplification with DNA polymerase, followed by DNA digestion with MspJI. Due to the recognition sequence of the enzyme, DNA amplicons are fragmented in a relatively sequence-independent manner. The size range of the resulting fragments could be controlled through optimisation of the 5-methyl-2'-deoxycytidine-5'-triphosphate concentration in the reaction mixture. A library suitable for sequencing using the Illumina MiSeq platform was prepared and processed using the proposed method. Alignment of generated short reads to a reference sequence demonstrated a relatively high level of random fragmentation. The proposed method may be performed with standard laboratory equipment. Although the uniformity of coverage was slightly inferior to the Covaris physical shearing procedure, due to efficiencies of cost and labour, the method may be more suitable than existing approaches for implementation in large-scale sequencing activities, such as bacterial artificial chromosome (BAC)-based genome sequence assembly, pan-genomic studies and locus-targeted genotyping-by-sequencing.
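The dependence of site density on the methylated-nucleotide fraction can be illustrated with a toy simulation. The sketch below is only a model under stated assumptions: each C on a single strand is treated as methylated independently with probability `p` (a stand-in for the 5-methyl-dCTP fraction), and a site is counted wherever a methylated C is followed three bases later by G or A (the CNNR motif from the abstract). The function names are hypothetical, and the sketch does not model where the enzyme actually cuts or the complementary strand.

```python
import random

def methylate(seq, p, rng):
    """Return the set of positions of C bases marked as methylated (probability p each)."""
    return {i for i, base in enumerate(seq) if base == "C" and rng.random() < p}

def cnnr_sites(seq, methylated):
    """Positions i where seq[i] is a methylated C and seq[i+3] is G or A (motif CNNR)."""
    return [
        i for i in sorted(methylated)
        if i + 3 < len(seq) and seq[i + 3] in "GA"
    ]

rng = random.Random(0)
amplicon = "".join(rng.choice("ACGT") for _ in range(2000))  # toy random amplicon

for p in (0.05, 0.2, 0.5):
    sites = cnnr_sites(amplicon, methylate(amplicon, p, rng))
    print(f"methylated fraction {p}: {len(sites)} recognisable sites")
```

Under this toy model, a higher methylated fraction gives more recognisable sites per amplicon, which is consistent with the abstract's statement that fragment size range can be tuned through the nucleotide concentration.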
Undesirable Choice Biases with Small Differences in the Spatial Structure of Chance Stimulus Sequences.

PubMed

Herrera, David; Treviño, Mario

2015-01-01

In two-alternative discrimination tasks, experimenters usually randomize the location of the rewarded stimulus so that systematic behavior with respect to irrelevant stimuli can only produce chance performance on the learning curves. One way to achieve this is to use random numbers derived from a discrete binomial distribution to create a 'full random training schedule' (FRS). When using FRS, however, sporadic but long laterally-biased training sequences occur by chance and such 'input biases' are thought to promote the generation of laterally-biased choices (i.e., 'output biases'). As an alternative, a 'Gellerman-like training schedule' (GLS) can be used. It removes most input biases by prohibiting the reward from appearing on the same location for more than three consecutive trials. The sequence of past rewards obtained from choosing a particular discriminative stimulus influences the probability of choosing that same stimulus on subsequent trials. Assuming that the long-term average ratio of choices matches the long-term average ratio of reinforcers, we hypothesized that a reduced amount of input biases in GLS compared to FRS should lead to a reduced production of output biases. We compared the choice patterns produced by a 'Rational Decision Maker' (RDM) in response to computer-generated FRS and GLS training sequences. To create a virtual RDM, we implemented an algorithm that generated choices based on past rewards. Our simulations revealed that, although the GLS presented fewer input biases than the FRS, the virtual RDM produced more output biases with GLS than with FRS under a variety of test conditions. Our results reveal that the statistical and temporal properties of training sequences interacted with the RDM to influence the production of output biases. Thus, discrete changes in the training paradigms did not translate linearly into modifications in the pattern of choices generated by a RDM. 
Virtual RDMs could be further employed to guide the selection of proper training schedules for perceptual decision-making studies.
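The difference between the two kinds of training schedule can be sketched in a few lines. The code below is a minimal illustration, not the authors' implementation: the full random schedule draws each side independently with probability 1/2, and the Gellerman-like schedule is approximated by forcing a switch whenever the same side would otherwise appear more than three times in a row. The function names, the "L"/"R" labels, and the forced-switch rule are assumptions made for the example.

```python
import random

def full_random_schedule(n, rng):
    """FRS sketch: each trial's rewarded side is an independent fair draw."""
    return [rng.choice("LR") for _ in range(n)]

def gellerman_like_schedule(n, rng, max_run=3):
    """GLS sketch: as FRS, but never more than max_run identical sides in a row."""
    schedule = []
    for _ in range(n):
        side = rng.choice("LR")
        # If the last max_run trials already share this side, force a switch.
        if len(schedule) >= max_run and all(s == side for s in schedule[-max_run:]):
            side = "R" if side == "L" else "L"
        schedule.append(side)
    return schedule

def longest_run(schedule):
    """Length of the longest streak of identical consecutive entries."""
    best = run = 0
    prev = None
    for side in schedule:
        run = run + 1 if side == prev else 1
        prev = side
        best = max(best, run)
    return best

rng = random.Random(42)
frs = full_random_schedule(1000, rng)
gls = gellerman_like_schedule(1000, rng)
print("longest run, FRS:", longest_run(frs))
print("longest run, GLS:", longest_run(gls))
```

The longest-run statistic is one simple way to quantify the "input biases" discussed in the abstract; the forced switch makes the constrained schedule partly predictable after a run of three, which is the sort of structure a history-dependent decision maker could respond to.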
Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences

PubMed Central

Groves, Benjamin; Kuchina, Anna; Rosenberg, Alexander B.; Jojic, Nebojsa; Fields, Stanley; Seelig, Georg

2017-01-01

Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs as well as native S. cerevisiae 5′ UTRs. The model additionally was used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model. PMID:29097404
Regulatory sequence analysis tools.

PubMed

van Helden, Jacques

2003-07-01

The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.
High-throughput sequencing of complete human mtDNA genomes from the Caucasus and West Asia: high diversity and demographic inferences.

PubMed

Schönberg, Anna; Theunert, Christoph; Li, Mingkun; Stoneking, Mark; Nasidze, Ivan

2011-09-01

To investigate the demographic history of human populations from the Caucasus and surrounding regions, we used high-throughput sequencing to generate 147 complete mtDNA genome sequences from random samples of individuals from three groups from the Caucasus (Armenians, Azeri and Georgians), and one group each from Iran and Turkey. Overall diversity is very high, with 144 different sequences that fall into 97 different haplogroups found among the 147 individuals. Bayesian skyline plots (BSPs) of population size change through time show a population expansion around 40-50 kya, followed by a constant population size, and then another expansion around 15-18 kya for the groups from the Caucasus and Iran. The BSP for Turkey differs the most from the others, with an increase from 35 to 50 kya followed by a prolonged period of constant population size, and no indication of a second period of growth. An approximate Bayesian computation approach was used to estimate divergence times between each pair of populations; the oldest divergence times were between Turkey and the other four groups from the South Caucasus and Iran (~400-600 generations), while the divergence time of the three Caucasus groups from each other was comparable to their divergence time from Iran (average of ~360 generations). These results illustrate the value of random sampling of complete mtDNA genome sequences that can be obtained with high-throughput sequencing platforms.
A critical evaluation of random copolymer mimesis of homogeneous antimicrobial peptides

PubMed Central

Hu, Kan; Schmidt, Nathan W.; Zhu, Rui; Jiang, Yunjiang; Lai, Ghee Hwee; Wei, Gang; Palermo, Edmund F.; Kuroda, Kenichi; Wong, Gerard C. L.; Yang, Lihua

2013-01-01

Polymeric synthetic mimics of antimicrobial peptides (SMAMPs) have recently demonstrated antimicrobial activity similar to that of natural antimicrobial peptides (AMPs) from innate immunity. This is surprising, since polymeric SMAMPs are heterogeneous in terms of chemical structure (random sequence) and conformation (random coil), in contrast to the defined amino acid sequence and intrinsic secondary structure of AMPs. To understand this better, we compare AMPs with a ‘minimal’ mimic, a well characterized family of polydisperse cationic methacrylate-based random copolymer SMAMPs. Specifically, we focus on a comparison between the quantifiable membrane curvature generating capacity, charge density, and hydrophobicity of the polymeric SMAMPs and AMPs. Synchrotron small angle x-ray scattering (SAXS) results indicate that typical AMPs and these methacrylate SMAMPs generate similar amounts of membrane negative Gaussian curvature (NGC), which is topologically necessary for a variety of membrane-destabilizing processes. Moreover, the curvature generating ability of SMAMPs is more tolerant of changes in the lipid composition than that of natural AMPs with similar chemical groups, consistent with the lower specificity of SMAMPs. We find that, although the amounts of NGC generated by these SMAMPs and AMPs are similar, the SMAMPs require significantly higher levels of hydrophobicity and cationic charge to achieve the same level of membrane deformation. We propose an explanation for these differences, which has implications for new synthetic strategies aimed at improved mimesis of AMPs. PMID:23750051
Dynamic Loads Generation for Multi-Point Vibration Excitation Problems

NASA Technical Reports Server (NTRS)

Shen, Lawrence

2011-01-01

A random-force method has been developed to predict dynamic loads produced by rocket-engine random vibrations for new rocket-engine designs. The method develops random forces at multiple excitation points based on random vibration environments scaled from accelerometer data obtained during hot-fire tests of existing rocket engines. This random-force method applies random forces to the model and creates expected dynamic response in a manner that simulates the way the operating engine applies self-generated random vibration forces (random pressure acting on an area) with the resulting responses that we measure with accelerometers. This innovation includes the methodology (implementation sequence), the computer code, two methods to generate the random-force vibration spectra, and two methods to reduce some of the inherent conservatism in the dynamic loads. This methodology would be implemented to generate the random-force spectra at excitation nodes without requiring the use of artificial boundary conditions in a finite element model. More accurate random dynamic loads than those predicted by current industry methods can then be generated using the random force spectra. The scaling method used to develop the initial power spectral density (PSD) environments for deriving the random forces for the rocket engine case is based on the Barrett Criteria developed at Marshall Space Flight Center in 1963. This invention approach can be applied in the aerospace, automotive, and other industries to obtain reliable dynamic loads and responses from a finite element model for any structure subject to multipoint random vibration excitations.
de Bruijn cycles for neural decoding.

PubMed

Aguirre, Geoffrey Karl; Mattar, Marcelo Gomes; Magis-Weinberg, Lucía

2011-06-01

Stimulus counterbalance is critical for studies of neural habituation, bias, anticipation, and (more generally) the effect of stimulus history and context. We introduce de Bruijn cycles, a class of combinatorial objects, as the ideal source of pseudo-random stimulus sequences with arbitrary levels of counterbalance. Neuro-vascular imaging studies (such as BOLD fMRI) have an additional requirement imposed by the filtering and noise properties of the method: only some temporal frequencies of neural modulation are detectable. Extant methods of generating counterbalanced stimulus sequences yield neural modulations that are weakly (or not at all) detected by BOLD fMRI. We solve this limitation using a novel "path-guided" approach for the generation of de Bruijn cycles. The algorithm encodes a hypothesized neural modulation of specific temporal frequency within the seemingly random order of events. By positioning the modulation between the signal and noise bands of the neuro-vascular imaging method, the resulting sequence markedly improves detection power. These sequences may be used to study stimulus context and history effects in a manner not previously possible. Copyright © 2011 Elsevier Inc. All rights reserved.
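For readers unfamiliar with de Bruijn cycles, the sketch below generates one with the standard recursive Lyndon-word construction and verifies the counterbalance property: read cyclically, every ordered n-tuple of the k stimulus labels occurs exactly once. This is the generic textbook construction, not the "path-guided" algorithm described in the abstract, and the function names are illustrative.

```python
def de_bruijn(k, n):
    """Return a de Bruijn cycle over labels 0..k-1 containing every n-tuple once."""
    a = [0] * (k * n)
    sequence = []

    def db(t, p):
        if t > n:
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence

def cyclic_ngrams(seq, n):
    """All length-n windows of seq, wrapping around the end."""
    m = len(seq)
    return [tuple(seq[(i + j) % m] for j in range(n)) for i in range(m)]

cycle = de_bruijn(3, 2)           # 3 stimuli, first-order counterbalance
windows = cyclic_ngrams(cycle, 2)
print(len(cycle), len(set(windows)))  # both equal 3**2 when the property holds
```

With k stimuli and tuple length n the cycle has k**n events, so each stimulus is preceded equally often by every possible history of length n - 1.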
Application of sorting and next generation sequencing to study 5′-UTR influence on translation efficiency in Escherichia coli

PubMed Central

Evfratov, Sergey A.; Osterman, Ilya A.; Komarova, Ekaterina S.; Pogorelskaya, Alexandra M.; Rubtsova, Maria P.; Zatsepin, Timofei S.; Semashko, Tatiana A.; Kostryukova, Elena S.; Mironov, Andrey A.; Burnaev, Evgeny; Krymova, Ekaterina; Gelfand, Mikhail S.; Govorun, Vadim M.; Bogdanov, Alexey A.; Dontsova, Olga A.

2017-01-01

Yield of protein per translated mRNA may vary by four orders of magnitude. Many studies have analyzed the influence of mRNA features on the translation yield. However, a detailed understanding of how mRNA sequence determines its propensity to be translated is still missing. Here, we constructed a set of reporter plasmid libraries encoding CER fluorescent protein preceded by randomized 5′ untranslated regions (5′-UTR) and Red fluorescent protein (RFP) used as an internal control. Each library was transformed into Escherichia coli cells, separated by efficiency of CER mRNA translation by a cell sorter and subjected to next generation sequencing. We tested efficiency of translation of the CER gene preceded by each of 48 natural 5′-UTR sequences and introduced random and designed mutations into natural and artificially selected 5′-UTRs. Several distinct properties could be ascribed to a group of 5′-UTRs most efficient in translation. In addition to known ones, several previously unrecognized features that contribute to the translation enhancement were found, such as low proportion of cytidine residues, multiple SD sequences and AG repeats. The latter could be identified as a translation enhancer, albeit less efficient than the SD sequence, in several natural 5′-UTRs. PMID:27899632
Random Amplification and Pyrosequencing for Identification of Novel Viral Genome Sequences

PubMed Central

Hang, Jun; Forshey, Brett M.; Kochel, Tadeusz J.; Li, Tao; Solórzano, Víctor Fiestas; Halsey, Eric S.; Kuschner, Robert A.

2012-01-01

ssRNA viruses have high levels of genomic divergence, which can lead to difficulty in genomic characterization of new viruses using traditional PCR amplification and sequencing methods. In this study, random reverse transcription, anchored random PCR amplification, and high-throughput pyrosequencing were used to identify orthobunyavirus sequences from total RNA extracted from viral cultures of acute febrile illness specimens. Draft genome sequence for the orthobunyavirus L segment was assembled and sequentially extended using de novo assembly contigs from pyrosequencing reads and orthobunyavirus sequences in GenBank as guidance. Accuracy and continuous coverage were achieved by mapping all reads to the L segment draft sequence. Subsequently, RT-PCR and Sanger sequencing were used to complete the genome sequence. The complete L segment was found to be 6936 bases in length, encoding a 2248-aa putative RNA polymerase. The identified L segment was distinct from previously published South American orthobunyaviruses, sharing 63% and 54% identity at the nucleotide and amino acid level, respectively, with the complete Oropouche virus L segment and 73% and 81% identity at the nucleotide and amino acid level, respectively, with a partial Caraparu virus L segment. The result demonstrated the effectiveness of a sequence-independent amplification and next-generation sequencing approach for obtaining complete viral genomes from total nucleic acid extracts and its use in pathogen discovery. PMID:22468136
Sustained State-Independent Quantum Contextual Correlations from a Single Ion

NASA Astrophysics Data System (ADS)

Leupold, F. M.; Malinowski, M.; Zhang, C.; Negnevitsky, V.; Alonso, J.; Home, J. P.; Cabello, A.

2018-05-01

We use a single trapped-ion qutrit to demonstrate the quantum-state-independent violation of noncontextuality inequalities using a sequence of randomly chosen quantum nondemolition projective measurements. We concatenate 53 × 10^6 sequential measurements of 13 observables, and unambiguously violate an optimal noncontextual bound. We use the same data set to characterize imperfections including signaling and repeatability of the measurements. The experimental sequence was generated in real time with a quantum random number generator integrated into our control system to select the subsequent observable with a latency below 50 μs, which can be used to constrain contextual hidden-variable models that might describe our results. The state-recycling experimental procedure is resilient to noise and independent of the qutrit state, substantiating the fact that the contextual nature of quantum physics is connected to measurements and not necessarily to designated states. The use of extended sequences of quantum nondemolition measurements finds applications in the fields of sensing and quantum information.
Host-Associated Metagenomics: A Guide to Generating Infectious RNA Viromes

PubMed Central

Robert, Catherine; Pascalis, Hervé; Michelle, Caroline; Jardot, Priscilla; Charrel, Rémi; Raoult, Didier; Desnues, Christelle

2015-01-01

Background: Metagenomic analyses have been widely used in the last decade to describe viral communities in various environments or to identify the etiology of human, animal, and plant pathologies. Here, we present a simple and standardized protocol that allows for the purification and sequencing of RNA viromes from complex biological samples with an important reduction of host DNA and RNA contaminants, while preserving the infectivity of viral particles. Principal Findings: We evaluated different viral purification steps, random reverse transcriptions and sequence-independent amplifications of a pool of representative RNA viruses. Viruses remained infectious after the purification process. We then validated the protocol by sequencing the RNA virome of human body lice engorged in vitro with artificially contaminated human blood. The full genomes of the most abundant viruses absorbed by the lice during the blood meal were successfully sequenced. Interestingly, random amplifications differed in the genome coverage of segmented RNA viruses. Moreover, the majority of reads were taxonomically identified, and only 7–15% of all reads were classified as “unknown”, depending on the random amplification method. Conclusion: The protocol reported here could easily be applied to generate RNA viral metagenomes from complex biological samples of different origins. Our protocol allows further virological characterizations of the described viral communities because it preserves the infectivity of viral particles and allows for the isolation of viruses. PMID:26431175
A method for multi-codon scanning mutagenesis of proteins based on asymmetric transposons.

PubMed

Liu, Jia; Cropp, T Ashton

2012-02-01

Random mutagenesis followed by selection or screening is a commonly used strategy to improve protein function. Despite many available methods for random mutagenesis, nearly all generate mutations at the nucleotide level. An ideal mutagenesis method would allow for the generation of 'codon mutations' to change protein sequence with defined or mixed amino acids of choice. Herein we report a method that allows for mutations of one, two or three consecutive codons. Key to this method is the development of a Mu transposon variant with asymmetric terminal sequences. As a demonstration of the method, we performed multi-codon scanning on the gene encoding superfolder GFP (sfGFP). Characterization of 50 randomly chosen clones from each library showed that more than 40% of the mutants in these three libraries contained seamless, in-frame mutations with low site preference. By screening only 500 colonies from each library, we successfully identified several spectra-shift mutations, including an S205D variant that was found to bear a single excitation peak in the UV region.
Scrambled Sobol Sequences via Permutation

DTIC Science & Technology

2009-01-01

[Figure 3: class diagram; labels: LCG, LCG64, LFG, MLFG, PMLCG, Sobol, Scrambler, PermutationScrambler, LinearScrambler, PermuationFactory, StaticFactory, DynamicFactory; relation: <<uses>>]...Phy., 19:252–256, 1979. [2] Emanouil I. Atanassov. A new efficient algorithm for generating the scrambled Sobol' sequence. In NMA '02: Revised Papers...Deidre W.Evan, and Micheal Mascagni. On the scrambled Sobol sequence. In ICCS2005, pages 775–782, 2005. [7] Richard Durstenfeld. Algorithm 235: Random

Real-time UAV trajectory generation using feature points matching between video image sequences

NASA Astrophysics Data System (ADS)

Byun, Younggi; Song, Jeongheon; Han, Dongyeob

2017-09-01

Unmanned aerial vehicles (UAVs), equipped with navigation systems and video capability, are currently being deployed for intelligence, reconnaissance and surveillance missions. In this paper, we present a systematic approach for the generation of a UAV trajectory using a video image matching system based on SURF (Speeded-Up Robust Features) and Preemptive RANSAC (Random Sample Consensus). Video image matching to find matching points is one of the most important steps for the accurate generation of a UAV trajectory (a sequence of poses in 3D space). We used the SURF algorithm to find the matching points between video image sequences, and removed mismatches using Preemptive RANSAC, which divides all matching points into outliers and inliers. Only the inliers are used to determine the epipolar geometry for estimating the relative pose (rotation and translation) between image sequences. Experimental results from simulated video image sequences showed that our approach has good potential to be applied to the automatic geo-localization of UAV systems.
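The inlier/outlier split that RANSAC performs can be illustrated on a much simpler model than epipolar geometry. The sketch below is plain RANSAC (not the Preemptive variant used in the paper) fitting a 2D line: it repeatedly samples two points, builds the line through them, and keeps the hypothesis with the most points within a distance tolerance. The function name, the tolerance, the iteration count, and the toy data are all assumptions for the example.

```python
import random

def ransac_line(points, n_iter=200, tol=0.1, rng=None):
    """Return the largest set of points lying within tol of a line through two samples."""
    rng = rng or random.Random(0)
    best_inliers = []
    for _ in range(n_iter):
        (x1, y1), (x2, y2) = rng.sample(points, 2)
        # Line through the two samples in the form a*x + b*y + c = 0.
        a, b = y2 - y1, x1 - x2
        norm = (a * a + b * b) ** 0.5
        if norm == 0:
            continue  # degenerate sample: identical points
        c = -(a * x1 + b * y1)
        inliers = [
            (x, y) for x, y in points
            if abs(a * x + b * y + c) / norm <= tol
        ]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return best_inliers

# Toy data: 20 points on y = 2x + 1 plus 5 gross mismatches.
line_points = [(x, 2 * x + 1) for x in range(20)]
mismatches = [(1, 10), (5, -7), (12, 3), (15, 40), (18, 0)]
inliers = ransac_line(line_points + mismatches)
print(len(inliers), "inliers kept out of", len(line_points) + len(mismatches))
```

In the pose-estimation setting the two-point line model would be replaced by a minimal solver for the epipolar geometry and the point-to-line distance by an epipolar error, but the sample, score, and keep-the-best loop has the same shape.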
Statistical complexity measure of pseudorandom bit generators

NASA Astrophysics Data System (ADS)

González, C. M.; Larrondo, H. A.; Rosso, O. A.

2005-08-01

Pseudorandom number generators (PRNGs) are extensively used in Monte Carlo simulations, gambling machines and cryptography as substitutes for ideal random number generators (RNGs). Each application imposes different statistical requirements on PRNGs. As L’Ecuyer clearly states, “the main goal for Monte Carlo methods is to reproduce the statistical properties on which these methods are based whereas for gambling machines and cryptology, observing the sequence of output values for some time should provide no practical advantage for predicting the forthcoming numbers better than by just guessing at random”. In accordance with the different applications, several statistical test suites have been developed to analyze the sequences generated by PRNGs. In a recent paper a new statistical complexity measure [Phys. Lett. A 311 (2003) 126] was defined. Here we propose this measure as a randomness quantifier for PRNGs. The test is applied to three very well known and widely tested PRNGs available in the literature. All of them are based on mathematical algorithms. Another PRNG, based on the Lorenz 3D chaotic dynamical system, is also analyzed. PRNGs based on chaos may be considered as a model for physical noise sources, and important new results have recently been reported. All the design steps of this PRNG are described, and each stage increases the PRNG randomness using different strategies. It is shown that the MPR statistical complexity measure is capable of quantifying this randomness improvement. The PRNG based on the chaotic 3D Lorenz dynamical system is also evaluated using traditional digital signal processing tools for comparison.
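A complexity measure of the entropy-times-disequilibrium kind can be sketched as follows. The code assumes the commonly used form C = H * Q, where H is the Shannon entropy normalised by its maximum and Q is the Jensen-Shannon divergence from the uniform distribution normalised by its maximum (reached for a distribution concentrated on one outcome). The abstract does not say how the probability distribution is extracted from the PRNG output, so a plain value histogram is assumed here, and all function names are illustrative.

```python
import math
import random

def shannon(p):
    """Shannon entropy (natural log) of a probability vector."""
    return -sum(x * math.log(x) for x in p if x > 0)

def jensen_shannon(p, q):
    """Jensen-Shannon divergence between two probability vectors."""
    mix = [(a + b) / 2 for a, b in zip(p, q)]
    return shannon(mix) - shannon(p) / 2 - shannon(q) / 2

def statistical_complexity(p):
    """Return (H, C): normalised entropy and entropy-times-disequilibrium complexity."""
    n = len(p)
    uniform = [1.0 / n] * n
    delta = [1.0] + [0.0] * (n - 1)  # most ordered distribution
    h = shannon(p) / math.log(n)
    q = jensen_shannon(p, uniform) / jensen_shannon(delta, uniform)
    return h, h * q

def histogram(values, n_bins):
    """Relative frequencies of integer values in range(n_bins)."""
    counts = [0] * n_bins
    for v in values:
        counts[v] += 1
    return [c / len(values) for c in counts]

rng = random.Random(0)
fair = [rng.randrange(16) for _ in range(100_000)]
biased = [min(rng.randrange(16), rng.randrange(16)) for _ in range(100_000)]
for name, data in (("fair", fair), ("biased", biased)):
    h, c = statistical_complexity(histogram(data, 16))
    print(name, "H =", round(h, 4), "C =", round(c, 4))
```

Both a perfectly uniform histogram and a fully concentrated one give C = 0, so a source that behaves like an ideal RNG is expected to sit near H = 1, C = 0 in the entropy-complexity plane.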
Perceptions of Randomness: Why Three Heads Are Better than Four

ERIC Educational Resources Information Center

Hahn, Ulrike; Warren, Paul A.

2009-01-01

A long tradition of psychological research has lamented the systematic errors and biases in people's perception of the characteristics of sequences generated by a random mechanism such as a coin toss. It is proposed that once the likely nature of people's actual experience of such processes is taken into account, these "errors" and "biases"…
Single-electron random-number generator (RNG) for highly secure ubiquitous computing applications

NASA Astrophysics Data System (ADS)

Uchida, Ken; Tanamoto, Tetsufumi; Fujita, Shinobu

2007-11-01

Since the security of all modern cryptographic techniques relies on unpredictable and irreproducible digital keys generated by random-number generators (RNGs), the realization of high-quality RNG is essential for secure communications. In this report, a new RNG, which utilizes single-electron phenomena, is proposed. A room-temperature operating silicon single-electron transistor (SET) having nearby an electron pocket is used as a high-quality, ultra-small RNG. In the proposed RNG, stochastic single-electron capture/emission processes to/from the electron pocket are detected with high sensitivity by the SET, and result in giant random telegraphic signals (GRTS) on the SET current. It is experimentally demonstrated that the single-electron RNG generates extremely high-quality random digital sequences at room temperature, in spite of its simple configuration. Because of its small-size and low-power properties, the single-electron RNG is promising as a key nanoelectronic device for future ubiquitous computing systems with highly secure mobile communication capabilities.
Random sampling of constrained phylogenies: conducting phylogenetic analyses when the phylogeny is partially known.

PubMed

Housworth, E A; Martins, E P

2001-01-01

Statistical randomization tests in evolutionary biology often require a set of random, computer-generated trees. For example, earlier studies have shown how large numbers of computer-generated trees can be used to conduct phylogenetic comparative analyses even when the phylogeny is uncertain or unknown. These methods were limited, however, in that (in the absence of molecular sequence or other data) they allowed users to assume that no phylogenetic information was available or that all possible trees were known. Intermediate situations where only a taxonomy or other limited phylogenetic information (e.g., polytomies) are available are technically more difficult. The current study describes a procedure for generating random samples of phylogenies while incorporating limited phylogenetic information (e.g., four taxa belong together in a subclade). The procedure can be used to conduct comparative analyses when the phylogeny is only partially resolved or can be used in other randomization tests in which large numbers of possible phylogenies are needed.
Generation of Some First-Order Autoregressive Markovian Sequences of Positive Random Variables with Given Marginal Distributions,

DTIC Science & Technology

1981-03-01

Again E( XnX 1 Xn) Xn + (l-aB)/X PlXn-1 + (l-Pl)/x 2.11) and X0 E0 gives a stationary sequence. Thus the correlations and regressions are the...sequence, although the sample paths will tend to have runs-up. A similar analysis given in Lawrance and Lewis [5] shows that 1 1 + i a + au (3.7) E( XnX
DNA/RNA hybrid substrates modulate the catalytic activity of purified AID.

PubMed

Abdouni, Hala S; King, Justin J; Ghorbani, Atefeh; Fifield, Heather; Berghuis, Lesley; Larijani, Mani

2018-01-01

Activation-induced cytidine deaminase (AID) converts cytidine to uridine at Immunoglobulin (Ig) loci, initiating somatic hypermutation and class switching of antibodies. In vitro, AID acts on single stranded DNA (ssDNA), but neither double-stranded DNA (dsDNA) oligonucleotides nor RNA, and it is believed that transcription is the in vivo generator of ssDNA targeted by AID. It is also known that the Ig loci, particularly the switch (S) regions targeted by AID are rich in transcription-generated DNA/RNA hybrids. Here, we examined the binding and catalytic behavior of purified AID on DNA/RNA hybrid substrates bearing either random sequences or GC-rich sequences simulating Ig S regions. If substrates were made up of a random sequence, AID preferred substrates composed entirely of DNA over DNA/RNA hybrids. In contrast, if substrates were composed of S region sequences, AID preferred to mutate DNA/RNA hybrids over substrates composed entirely of DNA. Accordingly, AID exhibited a significantly higher affinity for binding DNA/RNA hybrid substrates composed specifically of S region sequences, than any other substrates composed of DNA. Thus, in the absence of any other cellular processes or factors, AID itself favors binding and mutating DNA/RNA hybrids composed of S region sequences. AID:DNA/RNA complex formation and supporting mutational analyses suggest that recognition of DNA/RNA hybrids is an inherent structural property of AID. Copyright © 2017 Elsevier Ltd. All rights reserved.
Synchronization of random bit generators based on coupled chaotic lasers and application to cryptography.

PubMed

Kanter, Ido; Butkovski, Maria; Peleg, Yitzhak; Zigzag, Meital; Aviad, Yaara; Reidler, Igor; Rosenbluh, Michael; Kinzel, Wolfgang

2010-08-16

Random bit generators (RBGs) constitute an important tool in cryptography, stochastic simulations and secure communications. The latter in particular has some difficult requirements: a high generation rate of unpredictable bit strings and secure key-exchange protocols over public channels. Deterministic algorithms generate pseudo-random number sequences at high rates; however, their unpredictability is limited by the very nature of their deterministic origin. Recently, physical RBGs based on chaotic semiconductor lasers were shown to exceed Gbit/s rates. Whether secure synchronization of two high-rate physical RBGs is possible remains an open question. Here we propose a method whereby two fast RBGs based on mutually coupled chaotic lasers are synchronized. Using information-theoretic analysis we demonstrate security against a powerful computational eavesdropper, capable of noiseless amplification, where all parameters are publicly known. The method is also extended to secure synchronization of a small network of three RBGs.
Secure communications using quantum cryptography

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hughes, R.J.; Buttler, W.T.; Kwiat, P.G.

1997-08-01

The secure distribution of the secret random bit sequences known as "key" material is an essential precursor to their use for the encryption and decryption of confidential communications. Quantum cryptography is an emerging technology for secure key distribution with single-photon transmissions, in which an eavesdropper cannot evade detection (eavesdropping raises the key error rate above a threshold value). We have developed experimental quantum cryptography systems based on the transmission of non-orthogonal single-photon states to generate shared key material over multi-kilometer optical fiber paths and over line-of-sight links. In both cases, key material is built up using the transmission of a single photon per bit of an initial secret random sequence. A quantum-mechanically random subset of this sequence is identified, becoming the key material after a data reconciliation stage with the sender. In our optical fiber experiment we have performed quantum key distribution over 24 km of underground optical fiber using single-photon interference states, demonstrating that secure, real-time key generation over "open" multi-km node-to-node optical fiber communications links is possible. We have also constructed a quantum key distribution system for free-space, line-of-sight transmission using single-photon polarization states, which is currently undergoing laboratory testing. 7 figs.
Using multi-locus allelic sequence data to estimate genetic divergence among four Lilium (Liliaceae) cultivars

PubMed Central

Shahin, Arwa; Smulders, Marinus J. M.; van Tuyl, Jaap M.; Arens, Paul; Bakker, Freek T.

2014-01-01

Next Generation Sequencing (NGS) may enable estimating relationships among genotypes using allelic variation of multiple nuclear genes simultaneously. We explored the potential and caveats of this strategy in four genetically distant Lilium cultivars to estimate their genetic divergence from transcriptome sequences using three approaches: POFAD (Phylogeny of Organisms from Allelic Data, uses allelic information of sequence data), RAxML (Randomized Accelerated Maximum Likelihood, tree building based on concatenated consensus sequences) and Consensus Network (constructing a network summarizing among gene tree conflicts). Twenty six gene contigs were chosen based on the presence of orthologous sequences in all cultivars, seven of which also had an orthologous sequence in Tulipa, used as out-group. The three approaches generated the same topology. Although the resolution offered by these approaches is high, in this case there was no extra benefit in using allelic information. We conclude that these 26 genes can be widely applied to construct a species tree for the genus Lilium. PMID:25368628
RSAT: regulatory sequence analysis tools.

PubMed

Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

2008-07-01

The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundreds of researchers from all over the world. Several predictions made with RSAT were validated experimentally and published.
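The two background models named in the abstract can be illustrated with a short sketch. This is not RSAT code: the function names and the probabilities in the usage lines are hypothetical, chosen only to show how a Bernoulli model (independent bases) differs from a first-order Markov model (each base conditioned on the previous one).

```python
import random

def bernoulli_sequence(length, freqs, rng):
    """Draw each base independently from fixed base frequencies."""
    bases = list(freqs)
    weights = [freqs[b] for b in bases]
    return "".join(rng.choices(bases, weights=weights, k=length))

def markov_sequence(length, initial, transitions, rng):
    """Draw each base conditioned on the previous one (order-1 Markov)."""
    if length <= 0:
        return ""
    bases = list(initial)
    seq = rng.choices(bases, weights=[initial[b] for b in bases], k=1)
    while len(seq) < length:
        row = transitions[seq[-1]]  # probabilities of the next base
        seq.append(rng.choices(bases, weights=[row[b] for b in bases], k=1)[0])
    return "".join(seq)

rng = random.Random(42)  # seeded so the control set is reproducible
# Hypothetical AT-rich background frequencies.
freqs = {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}
# Hypothetical transition table: every row reuses the same frequencies,
# except that C is rarely followed by G (a CpG-depleted background).
transitions = {b: dict(freqs) for b in freqs}
transitions["C"] = {"A": 0.35, "C": 0.25, "G": 0.05, "T": 0.35}

print(bernoulli_sequence(60, freqs, rng))
print(markov_sequence(60, freqs, transitions, rng))
```

A higher-order Markov model would condition on the last k bases instead of one; in practice the transition table is typically estimated from real genomic sequence rather than written by hand.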
Analysis of entropy extraction efficiencies in random number generation systems

NASA Astrophysics Data System (ADS)

Wang, Chao; Wang, Shuang; Chen, Wei; Yin, Zhen-Qiang; Han, Zheng-Fu

2016-05-01

Random numbers (RNs) have applications in many areas: lottery games, gambling, computer simulation, and, most importantly, cryptography [N. Gisin et al., Rev. Mod. Phys. 74 (2002) 145]. In cryptography theory, the theoretical security of the system calls for high quality RNs. Therefore, developing methods for producing unpredictable RNs with adequate speed is an attractive topic. Early on, despite the lack of theoretical support, pseudo RNs generated by algorithmic methods performed well and satisfied reasonable statistical requirements. However, as implemented, those pseudorandom sequences were completely determined by mathematical formulas and initial seeds, which cannot introduce extra entropy or information. In these cases, “random” bits are generated that are not at all random. Physical random number generators (RNGs), which, in contrast to algorithmic methods, are based on unpredictable physical random phenomena, have attracted considerable research interest. However, the way that we extract random bits from those physical entropy sources has a large influence on the efficiency and performance of the system. In this manuscript, we will review and discuss several randomness extraction schemes that are based on radiation or photon arrival times. We analyze the robustness, post-processing requirements and, in particular, the extraction efficiency of those methods to aid in the construction of efficient, compact and robust physical RNG systems.
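As an illustration of what an arrival-time extraction scheme can look like, the sketch below compares consecutive pairs of inter-arrival intervals and emits one bit per pair. It is a generic textbook-style scheme and is not claimed to be one of the schemes analyzed in the paper; the function name is hypothetical and the photon source is simulated with exponentially distributed intervals.

```python
import random

def bits_from_intervals(intervals):
    """Emit 0 if the first interval of a pair is shorter, 1 if longer.

    Pairs with equal intervals are discarded, and an unpaired trailing
    interval is ignored, so at most one bit is produced per two intervals.
    """
    bits = []
    for t1, t2 in zip(intervals[0::2], intervals[1::2]):
        if t1 != t2:
            bits.append(0 if t1 < t2 else 1)
    return bits

# Simulated detector: exponential waiting times between detection events.
rng = random.Random(0)
intervals = [rng.expovariate(1.0) for _ in range(10_000)]
bits = bits_from_intervals(intervals)
print(len(bits), "bits, fraction of ones:", sum(bits) / len(bits))
```

For independent, identically distributed intervals the two orderings are equally likely, so the output is unbiased regardless of the detection rate, but the scheme spends two detection events per bit. That trade-off between robustness and bits extracted per event is the kind of efficiency question the abstract refers to.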
Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment

PubMed Central

2011-01-01

Background: Many bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships. Results: In this paper, we have proposed a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences, and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied on the solutions of the initial generation and of each child generation within VDGA. We have used two mechanisms to generate an initial population in this research: the first mechanism is to generate guide trees with randomly selected sequences and the second is shuffling the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we have compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN, and PILEUP8, and also other methods, based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0. Conclusions: The experimental results showed that the VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison to other divisions considered with VDGA. The experimental results also confirmed that VDGA outperformed the other methods considered in this research. PMID:21867510
Unlocking hidden genomic sequence

PubMed Central

Keith, Jonathan M.; Cochran, Duncan A. E.; Lala, Gita H.; Adams, Peter; Bryant, Darryn; Mitchelson, Keith R.

2004-01-01

Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs. PMID:14973330
Artificial neural network study on organ-targeting peptides

NASA Astrophysics Data System (ADS)

Jung, Eunkyoung; Kim, Junhyoung; Choi, Seung-Hoon; Kim, Minkyoung; Rhee, Hokyoung; Shin, Jae-Min; Choi, Kihang; Kang, Sang-Kee; Lee, Nam Kyung; Choi, Yun-Jaie; Jung, Dong Hyun

2010-01-01

We report a new approach to studying organ targeting of peptides on the basis of peptide sequence information. The positive control data sets consist of organ-targeting peptide sequences identified by the peroral phage-display technique for four organs, and the negative control data are prepared from random sequences. The capacity of our models to make appropriate predictions is validated by statistical indicators including sensitivity, specificity, enrichment curve, and the area under the receiver operating characteristic (ROC) curve (the ROC score). VHSE descriptor produces statistically significant training models and the models with simple neural network architectures show slightly greater predictive power than those with complex ones. The training and test set statistics indicate that our models could discriminate between organ-targeting and random sequences. We anticipate that our models will be applicable to the selection of organ-targeting peptides for generating peptide drugs or peptidomimetics.
Repeats of base oligomers as the primordial coding sequences of the primeval earth and their vestiges in modern genes.

PubMed

Ohno, S

1984-01-01

Three outstanding properties uniquely qualify repeats of base oligomers as the primordial coding sequences of all polypeptide chains. First, when compared with randomly generated base sequences in general, they are more likely to have long open reading frames. Second, periodical polypeptide chains specified by such repeats are more likely to assume either alpha-helical or beta-sheet secondary structures than are polypeptide chains of random sequence. Third, provided that the number of bases in the oligomeric unit is not a multiple of 3, these internally repetitious coding sequences are impervious to randomly sustained base substitutions, deletions, and insertions. This is because the recurring periodicity of their polypeptide chains is given by three consecutive copies of the oligomeric unit translated in three different reading frames. Accordingly, when one reading frame is open, the other two are automatically open as well, all three being capable of coding for polypeptide chains of identical periodicity. Under this circumstance, a frame shift due to the deletion or insertion of a number of bases that is not a multiple of 3 fails to alter the down-stream amino acid sequence, and even a base change causing premature chain-termination can silence only one of the three potential coding units. Newly arisen coding sequences in modern organisms are oligomeric repeats, and most of the older genes retain various vestiges of their original internal repetitions. Some of the genes (e.g., oncogenes) have even inherited the property of being impervious to randomly sustained base changes.
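The reading-frame argument can be checked on a toy example. The sketch below is an illustration only: the oligomers GCTGG and GCTGGA are hypothetical and not taken from the paper, and the helper names are ours. It lists the repeating codon cycle in each of the three frames and tests whether the cycles are rotations of one another.

```python
def codons(seq, frame):
    """Split seq into complete codons, starting at offset `frame`."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

def is_rotation(a, b):
    """True if list b is a cyclic shift of list a."""
    return len(a) == len(b) and any(a[k:] + a[:k] == b for k in range(len(a)))

def frame_cycles(unit, copies=12):
    """Return one period of the codon series for each reading frame."""
    seq = unit * copies
    # If len(unit) is not a multiple of 3, the codon series repeats every
    # len(unit) codons (three copies of the unit); otherwise every len(unit)/3.
    period = len(unit) if len(unit) % 3 else len(unit) // 3
    return [codons(seq, f)[:period] for f in range(3)]

for unit in ("GCTGG", "GCTGGA"):  # 5 bases vs 6 bases (a multiple of 3)
    cycles = frame_cycles(unit)
    same = all(is_rotation(cycles[0], c) for c in cycles)
    print(unit, cycles, "same codon cycle in all frames:", same)
```

With the 5-base unit, all three frames read the same five codons in the same cyclic order, so a frame shift leaves the downstream periodicity intact. With the 6-base unit, each frame reads a different pair of codons, and the property is lost.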
All about Eve: Secret Sharing using Quantum Effects

NASA Technical Reports Server (NTRS)

Jackson, Deborah J.

2005-01-01

This document discusses the nature of light (including classical light and photons), encryption, quantum key distribution (QKD), light polarization and beamsplitters and their application to information communication. A quantum of light represents the smallest possible subdivision of radiant energy (light) and is called a photon. The QKD key generation sequence is outlined including the receiver broadcasting the initial signal indicating reception availability, timing pulses from the sender to provide reference for gated detection of photons, the sender generating photons through random polarization while the receiver detects photons with random polarization and communicating via data link to mutually establish random keys. The QKD network vision includes inter-SATCOM, point-to-point Gnd Fiber and SATCOM-fiber nodes. QKD offers an unconditionally secure method of exchanging encryption keys. Ongoing research will focus on how to increase the key generation rate.
The Neural Correlates of Implicit Sequence Learning in Schizophrenia

PubMed Central

Marvel, Cherie L.; Turner, Beth M.; O’Leary, Daniel S.; Johnson, Hans J.; Pierson, Ronald K.; Boles Ponto, Laura L.; Andreasen, Nancy C.

2009-01-01

Twenty-seven schizophrenia spectrum patients and 25 healthy controls performed a probabilistic version of the serial reaction time task (SRT) that included sequence trials embedded within random trials. Patients showed diminished, yet measurable, sequence learning. Postexperimental analyses revealed that a group of patients performed above chance when generating short spans of the sequence. This high-generation group showed SRT learning that was similar in magnitude to that of controls. Their learning was evident from the very 1st block; however, unlike controls, learning did not develop further with continued testing. A subset of 12 patients and 11 controls performed the SRT in conjunction with positron emission tomography. High-generation performance, which corresponded to SRT learning in patients, correlated to activity in the premotor cortex and parahippocampus. These areas have been associated with stimulus-driven visuospatial processing. Taken together, these results suggest that a subset of patients who showed moderate success on the SRT used an explicit stimulus-driven strategy to process the sequential stimuli. This adaptive strategy facilitated sequence learning but may have interfered with conventional implicit learning of the overall stimulus pattern. PMID:17983290
Application of Stochastic Labeling with Random-Sequence Barcodes for Simultaneous Quantification and Sequencing of Environmental 16S rRNA Genes.

PubMed

Hoshino, Tatsuhiko; Inagaki, Fumio

2017-01-01

Next-generation sequencing (NGS) is a powerful tool for analyzing environmental DNA and provides a comprehensive molecular view of microbial communities. For obtaining the copy number of particular sequences in the NGS library, however, additional quantitative analysis such as quantitative PCR (qPCR) or digital PCR (dPCR) is required. Furthermore, the number of sequences in a sequence library does not always reflect the original copy number of a target gene because of biases caused by PCR amplification, making it difficult to convert the proportion of particular sequences in the NGS library to the copy number using the mass of input DNA. To address this issue, we applied a stochastic labeling approach with random-tag sequences and developed an NGS-based quantification protocol, which enables simultaneous sequencing and quantification of the targeted DNA. This quantitative sequencing (qSeq) is initiated from single-primer extension (SPE) using a primer with a random tag adjacent to the 5' end of the target-specific sequence. During SPE, each DNA molecule is stochastically labeled with the random tag. Subsequently, first-round PCR is conducted, specifically targeting the SPE product, followed by second-round PCR to index for NGS. The number of random tags is determined only during the SPE step and is therefore not affected by the two rounds of PCR that may introduce amplification biases. In the case of 16S rRNA genes, after NGS sequencing and taxonomic classification, the absolute number of 16S rRNA gene copies of each target phylotype can be estimated by Poisson statistics by counting random tags incorporated at the end of the sequence. To test the feasibility of this approach, the 16S rRNA gene of Sulfolobus tokodaii was subjected to qSeq, which resulted in accurate quantification of 5.0 × 10^3 to 5.0 × 10^4 copies of the 16S rRNA gene. Furthermore, qSeq was applied to mock microbial communities and environmental samples, and the results were comparable to those obtained using digital PCR and relative abundance based on a standard sequence library. We demonstrated that the qSeq protocol proposed here is advantageous for providing less-biased absolute copy numbers of each target DNA with NGS sequencing at one time. With this new experimental scheme in microbial ecology, microbial community compositions can be explored in a more quantitative manner, thus expanding our knowledge of microbial ecosystems in natural environments.
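The Poisson step can be sketched as follows. This is an assumption-laden illustration, not the authors' implementation: it supposes that every molecule receives one of 4^L equally likely tags of length L, so that some molecules share a tag by chance and the count of distinct tags underestimates the molecule count. The function name and the tag length in the example are hypothetical.

```python
import math

def estimate_molecules(unique_tags: float, tag_length: int) -> float:
    """Estimate the number of labeled molecules from distinct tags seen.

    With M = 4**tag_length equally likely tags and N labeled molecules,
    the expected number of distinct tags is about M * (1 - exp(-N / M)).
    Solving for N gives N = -M * ln(1 - k / M) for k observed tags.
    """
    pool = 4 ** tag_length
    if not 0 <= unique_tags < pool:
        raise ValueError("unique_tags must be in [0, 4**tag_length)")
    return -pool * math.log1p(-unique_tags / pool)

# Hypothetical example: 8-base random tag, 5,000 distinct tags observed.
print(round(estimate_molecules(5000, 8)))
```

The correction is small when the tag pool is much larger than the number of molecules and grows as the pool saturates, which is why the tag length has to be matched to the expected copy number.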
Random number generation in bilingual Balinese and German students: preliminary findings from an exploratory cross-cultural study.

PubMed

Strenge, Hans; Lesmana, Cokorda Bagus Jaya; Suryani, Luh Ketut

2009-08-01

Verbal random number generation is a procedurally simple task to assess executive function and appears ideally suited for use in diverse settings in cross-cultural research. The objective of this study was to examine ethnic group differences between young adults in Bali (Indonesia) and Kiel (Germany): 50 bilingual healthy students, 30 Balinese and 20 Germans, attempted to generate a random sequence of the digits 1 to 9. In Balinese participants, randomization was done in Balinese (native language L1) and Indonesian (first foreign language L2); in German subjects, in German (L1) and English (L2). Ten of 30 Balinese (33%), but no Germans, were unable to inhibit habitual counting in more than half of the responses. The Balinese produced significantly more nonrandom responses than the Germans, with higher rates of counting and significantly fewer occurrences of the digits 2 and 3 in L1 compared with L2. Repetition and cycling behavior did not differ between the four languages. The findings highlight the importance of taking into account culture-bound psychosocial factors for Balinese individuals when administering and interpreting a random number generation test.

Single molecule counting and assessment of random molecular tagging errors with transposable giga-scale error-correcting barcodes.

PubMed

Lau, Billy T; Ji, Hanlee P

2017-09-21

RNA-Seq measures gene expression by counting sequence reads belonging to unique cDNA fragments. Molecular barcodes commonly in the form of random nucleotides were recently introduced to improve gene expression measures by detecting amplification duplicates, but are susceptible to errors generated during PCR and sequencing. This results in false positive counts, leading to inaccurate transcriptome quantification especially at low input and single-cell RNA amounts where the total number of molecules present is minuscule. To address this issue, we demonstrated the systematic identification of molecular species using transposable error-correcting barcodes that are exponentially expanded to tens of billions of unique labels. We experimentally showed random-mer molecular barcodes suffer from substantial and persistent errors that are difficult to resolve. To assess our method's performance, we applied it to the analysis of known reference RNA standards. By including an inline random-mer molecular barcode, we systematically characterized the presence of sequence errors in random-mer molecular barcodes. We observed that such errors are extensive and become more dominant at low input amounts. We described the first study to use transposable molecular barcodes and its use for studying random-mer molecular barcode errors. Extensive errors found in random-mer molecular barcodes may warrant the use of error correcting barcodes for transcriptome analysis as input amounts decrease.
Importance Sampling of Word Patterns in DNA and Protein Sequences

PubMed Central

Chan, Hock Peng; Chen, Louis H.Y.

2010-01-01

Monte Carlo methods can provide accurate p-value estimates of word counting test statistics and are easy to implement. They are especially attractive when an asymptotic theory is absent or when either the search sequence or the word pattern is too short for the application of asymptotic formulae. Naive direct Monte Carlo is undesirable for the estimation of small probabilities because the associated rare events of interest are seldom generated. We propose instead efficient importance sampling algorithms that use controlled insertion of the desired word patterns on randomly generated sequences. The implementation is illustrated on word patterns of biological interest: palindromes and inverted repeats, patterns arising from position-specific weight matrices (PSWMs), and co-occurrences of pairs of motifs. PMID:21128856
Transient sequences in a hypernetwork generated by an adaptive network of spiking neurons.

PubMed

Maslennikov, Oleg V; Shchapin, Dmitry S; Nekorkin, Vladimir I

2017-06-28

We propose a model of an adaptive network of spiking neurons that gives rise to a hypernetwork of its dynamic states at the upper level of description. Left to itself, the network exhibits a sequence of transient clustering which relates to traffic in the hypernetwork in the form of a random walk. When receiving inputs, the system is able to generate reproducible sequences corresponding to stimulus-specific paths in the hypernetwork. We illustrate these basic notions by a simple network of discrete-time spiking neurons together with its FPGA realization and analyse their properties. This article is part of the themed issue 'Mathematical methods in medicine: neuroscience, cardiology and pathology'. © 2017 The Author(s).
Perception of randomness: On the time of streaks.

PubMed

Sun, Yanlong; Wang, Hongbin

2010-12-01

People tend to think that streaks in random sequential events are rare and remarkable. When they actually encounter streaks, they tend to consider the underlying process as non-random. The present paper examines the time of pattern occurrences in sequences of Bernoulli trials, and shows that among all patterns of the same length, a streak is the most delayed pattern for its first occurrence. It is argued that when time is of essence, how often a pattern is to occur (mean time, or, frequency) and when a pattern is to first occur (waiting time) are different questions and bear different psychological relevance. The waiting time statistics may provide a quantitative measure to the psychological distance when people are expecting a probabilistic event, and such measure is consistent with both of the representativeness and availability heuristics in people's perception of randomness. We discuss some of the recent empirical findings and suggest that people's judgment and generation of random sequences may be guided by their actual experiences of the waiting time statistics. Published by Elsevier Inc.
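The distinction between frequency and waiting time can be made concrete with a short calculation. The sketch below is not taken from the paper: it uses the standard overlap formula for the expected first-occurrence time of a pattern in independent, equally likely trials (a fair coin here), and the function name is ours.

```python
def expected_waiting_time(pattern: str, alphabet_size: int = 2) -> int:
    """Expected number of trials until `pattern` first appears.

    Assumes independent trials with equally likely symbols. The result is
    the sum of alphabet_size**k over every k for which the length-k prefix
    of the pattern equals its length-k suffix (its self-overlaps).
    """
    n = len(pattern)
    return sum(
        alphabet_size ** k
        for k in range(1, n + 1)
        if pattern[:k] == pattern[n - k:]
    )

for p in ("HHHH", "HTHT", "HHHT"):
    print(p, expected_waiting_time(p))
```

For a fair coin this gives 30 tosses for HHHH, 20 for HTHT and 16 for HHHT, even though each pattern has the same probability, 1/16, of occupying any given four tosses. The streak overlaps with itself at every length, which is what delays its first occurrence.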
Effects of learning duration on implicit transfer.

PubMed

Tanaka, Kanji; Watanabe, Katsumi

2015-10-01

Implicit learning and transfer in sequence acquisition play important roles in daily life. Several previous studies have found that even when participants are not aware that a transfer sequence has been transformed from the learning sequence, they are able to perform the transfer sequence faster and more accurately; this suggests implicit transfer of visuomotor sequences. Here, we investigated whether implicit transfer could be modulated by the number of trials completed in a learning session. Participants learned a sequence through trial and error, known as the m × n task (Hikosaka et al. in J Neurophysiol 74:1652-1661, 1995). In the learning session, participants were required to successfully perform the same sequence 4, 12, 16, or 20 times. In the transfer session, participants then learned one of two other sequences: one where the button configuration Vertically Mirrored the learning sequence, or a randomly generated sequence. Our results show that even when participants did not notice the alternation rule (i.e., vertical mirroring), their total working time was less and their total number of errors was lower in the transfer session compared with those who performed a Random sequence, irrespective of the number of trials completed in the learning session. This result suggests that implicit transfer likely occurs even over a shorter learning duration.
Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

DOE PAGES

Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; ...

2015-05-12

Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.
Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.

Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.
True random bit generators based on current time series of contact glow discharge electrolysis

NASA Astrophysics Data System (ADS)

Rojas, Andrea Espinel; Allagui, Anis; Elwakil, Ahmed S.; Alawadhi, Hussain

2018-05-01

Random bit generators (RBGs) in today's digital information and communication systems employ high-rate physical entropy sources such as electronic, photonic, or thermal time series signals. However, the proper functioning of such physical systems is bound by specific constraints that make them in some cases weak and susceptible to external attacks. In this study, we show that the electrical current time series of contact glow discharge electrolysis, which is a dc voltage-powered micro-plasma in liquids, can be used for generating random bit sequences in a wide range of high dc voltages. The current signal is quantized into a binary stream by first using a simple moving average function, which centers the distribution around zero, and then applying logical operations, which enable the binarized data to pass all tests in the industry-standard randomness test suite of the National Institute of Standards and Technology. Furthermore, the robustness of this RBG against power supply attacks has been examined and verified.
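The quantization step described in this abstract can be illustrated roughly as follows. This is a minimal sketch, not the authors' procedure: the abstract does not say which logical operations are used, so XOR over non-overlapping bit pairs is an assumption, and the input is a synthetic drifting noise signal rather than a measured discharge current. The names `binarize`, `xor_pairs` and the window length are hypothetical.

```python
import random

def binarize(signal, window):
    """Subtract a trailing moving average, then threshold the residual at zero."""
    bits = []
    for i in range(window, len(signal)):
        mean = sum(signal[i - window:i]) / window
        bits.append(1 if signal[i] - mean > 0 else 0)
    return bits

def xor_pairs(bits):
    """Assumed post-processing: XOR each non-overlapping pair of bits."""
    return [bits[i] ^ bits[i + 1] for i in range(0, len(bits) - 1, 2)]

# Synthetic stand-in for a current time series: slow drift plus Gaussian noise.
rng = random.Random(0)
signal = [0.01 * t + rng.gauss(0.0, 1.0) for t in range(2000)]

raw_bits = binarize(signal, window=20)
out_bits = xor_pairs(raw_bits)
```

Bits produced this way would still need to be checked with a statistical test suite before being treated as random.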
A Study of Ontogenetic and Generational Change in Adolescent Personality by Means of Multivariate Longitudinal Sequences: Phase II. Final Report.

ERIC Educational Resources Information Center

Nesselroade, John R.; Baltes, Paul B.

Assessment of the relationship between ontogenetic (individual) and generational (historical) change in adolescent personality development was the focus of this study. The total sample included 1000 male and female adolescents (ages 13-18) randomly drawn from 32 public school systems in West Virginia following a design using longitudinal sequences…
Viral metagenomic analysis of feces of wild small carnivores

PubMed Central

2014-01-01

Background Recent studies have clearly demonstrated the enormous virus diversity that exists among wild animals. This exemplifies the required expansion of our knowledge of the virus diversity present in wildlife, as well as the potential transmission of these viruses to domestic animals or humans. Methods In the present study we evaluated the viral diversity of fecal samples (n = 42) collected from 10 different species of wild small carnivores inhabiting the northern part of Spain using random PCR in combination with next-generation sequencing. Samples were collected from American mink (Neovison vison), European mink (Mustela lutreola), European polecat (Mustela putorius), European pine marten (Martes martes), stone marten (Martes foina), Eurasian otter (Lutra lutra) and Eurasian badger (Meles meles) of the family of Mustelidae; common genet (Genetta genetta) of the family of Viverridae; red fox (Vulpes vulpes) of the family of Canidae and European wild cat (Felis silvestris) of the family of Felidae. Results A number of sequences of possible novel viruses or virus variants were detected, including a theilovirus, phleboviruses, an amdovirus, a kobuvirus and picobirnaviruses. Conclusions Using random PCR in combination with next generation sequencing, sequences of various novel viruses or virus variants were detected in fecal samples collected from Spanish carnivores. Detected novel viruses highlight the viral diversity that is present in fecal material of wild carnivores. PMID:24886057
Genome Sequencing of Steroid Producing Bacteria Using Ion Torrent Technology and a Reference Genome.

PubMed

Sola-Landa, Alberto; Rodríguez-García, Antonio; Barreiro, Carlos; Pérez-Redondo, Rosario

2017-01-01

Next-generation sequencing technology has greatly eased bacterial genome sequencing, and several tens of thousands of genomes have been sequenced during the last 10 years. Most genome projects are published as draft versions; however, for certain applications the complete genome sequence is required. In this chapter, we describe the strategy that allowed the complete genome sequencing of Mycobacterium neoaurum NRRL B-3805, an industrial strain exploited for steroid production, using Ion Torrent sequencing reads and the genome of a close strain as the reference. This protocol can be applied to analyze the genetic variations between closely related strains; for example, to elucidate the point mutations between a parental strain and a random mutagenesis-derived mutant.
A Generative Angular Model of Protein Structure Evolution

PubMed Central

Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun

2017-01-01

Abstract Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724
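The idea of a random walk in dihedral angle space can be pictured with a toy example. The sketch below is not the angular diffusion process of the paper; it is a plain Gaussian random walk with each angle wrapped back onto the circle, so that the pair of angles stays on the two-dimensional torus. The function names, step size and starting angles are illustrative assumptions.

```python
import math
import random

def wrap(angle):
    """Map an angle in radians onto the interval [-pi, pi)."""
    return (angle + math.pi) % (2 * math.pi) - math.pi

def torus_walk(start, steps, sigma, rng):
    """Gaussian random walk on a pair of angles, wrapped after every step."""
    phi, psi = start
    path = [(phi, psi)]
    for _ in range(steps):
        phi = wrap(phi + rng.gauss(0.0, sigma))
        psi = wrap(psi + rng.gauss(0.0, sigma))
        path.append((phi, psi))
    return path

rng = random.Random(42)
path = torus_walk(start=(-1.0, 2.5), steps=100, sigma=0.3, rng=rng)
```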
Generation and Analysis of Expressed Sequence Tags from Olea europaea L.

PubMed Central

Ozdemir Ozgenturk, Nehir; Oruç, Fatma; Sezerman, Ugur; Kuçukural, Alper; Vural Korkut, Senay; Toksoz, Feriha; Un, Cemal

2010-01-01

Olive (Olea europaea L.) is an important source of edible oil that originated in the Near East region. In this study, two cDNA libraries were constructed from young olive leaves and immature olive fruits to generate ESTs, discover novel genes, and investigate the function of unknown olive genes. A total of 3840 randomly selected colonies from both libraries were sequenced for EST collection. The 2228 readable sequences for olive leaf and 1506 for olive fruit were assembled into 205 and 69 contigs, respectively, whereas 2478 were singletons. Putative functions of all 2752 differentially expressed unique sequences were assigned by gene homology based on BLAST and annotated using BLAST2GO. While 1339 ESTs show no homology to the database, 2024 ESTs have homology (under 80%) with hypothetical proteins, putative proteins, expressed proteins, and unknown proteins in NCBI-GenBank. Unique gene sequences for 635 ESTs were identified by over 80% homology to genes of known function in other species that were not previously described in the Olea family. Only 3.1% of the total ESTs showed similarity to the olive sequences existing in NCBI. The generated EST data and consensus sequences were submitted to NCBI as a valuable resource for functional genome studies of olive. PMID:21197085
Random and externally controlled occurrences of Dansgaard-Oeschger events

NASA Astrophysics Data System (ADS)

Lohmann, Johannes; Ditlevsen, Peter D.

2018-05-01

Dansgaard-Oeschger (DO) events constitute the most pronounced mode of centennial to millennial climate variability of the last glacial period. Since their discovery, many decades of research have been devoted to understanding the origin and nature of these rapid climate shifts. In recent years, a number of studies have appeared that report emergence of DO-type variability in fully coupled general circulation models via different mechanisms. These mechanisms result in the occurrence of DO events at varying degrees of regularity, ranging from periodic to random. When examining the full sequence of DO events as captured in the North Greenland Ice Core Project (NGRIP) ice core record, one can observe high irregularity in the timing of individual events at any stage within the last glacial period. In addition to the prevailing irregularity, certain properties of the DO event sequence, such as the average event frequency or the relative distribution of cold versus warm periods, appear to be changing throughout the glacial. By using statistical hypothesis tests on simple event models, we investigate whether the observed event sequence may have been generated by stationary random processes or rather was strongly modulated by external factors. We find that the sequence of DO warming events is consistent with a stationary random process, whereas dividing the event sequence into warming and cooling events leads to inconsistency with two independent event processes. As we include external forcing, we find a particularly good fit to the observed DO sequence in a model where the average residence times in warm periods are controlled by global ice volume and those in cold periods by boreal summer insolation.
DNA based random key generation and management for OTP encryption.

PubMed

Zhang, Yunpeng; Liu, Xin; Sun, Manhui

2017-09-01

One-time pad (OTP) is a principle of key generation applied to the stream ciphering method which offers total privacy. The OTP encryption scheme has proved to be unbreakable in theory, but difficult to realize in practical applications. Because OTP encryption specifically requires the absolute randomness of the key, its development has suffered from severe constraints. DNA cryptography is a new and promising technology in the field of information security. The storage capabilities of DNA chromosomes can be used as one-time pad structures with pseudo-random number generation and indexing in order to encrypt plaintext messages. In this paper, we present a feasible solution to the OTP symmetric key generation and transmission problem with DNA at the molecular level. Through recombinant DNA technology, by using only sender-receiver known restriction enzymes to combine the secure key represented by DNA sequence and the T vector, we generate the DNA bio-hiding secure key and then place the recombinant plasmid in implanted bacteria for secure key transmission. The designed bio experiments and simulation results show that the security of the transmission of the key is further improved and the environmental requirements of key transmission are reduced. Analysis has demonstrated that the proposed DNA-based random key generation and management solutions are marked by high security and usability. Published by Elsevier B.V.
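For readers unfamiliar with the one-time pad itself, a minimal sketch of the underlying operation follows. It shows only the generic XOR construction with a key as long as the message, not the DNA-based key generation or transmission scheme of the paper. Drawing the key from the operating system's random source via Python's `secrets` module is a stand-in assumption, and `otp_xor` is a hypothetical helper name.

```python
import secrets

def otp_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a key of equal length; the same call encrypts and decrypts."""
    if len(key) != len(data):
        raise ValueError("one-time pad key must be exactly as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"random key"
key = secrets.token_bytes(len(message))  # must be used once and then discarded
ciphertext = otp_xor(message, key)
recovered = otp_xor(ciphertext, key)
```

The length check reflects the constraint that makes the scheme hard to use in practice: every message needs fresh key material of the same size, which is the key distribution problem the abstract addresses.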
Determining Phylogenetic Relationships Among Date Palm Cultivars Using Random Amplified Polymorphic DNA (RAPD) and Inter-Simple Sequence Repeat (ISSR) Markers.

PubMed

Haider, Nadia

2017-01-01

Investigation of genetic variation and phylogenetic relationships among date palm (Phoenix dactylifera L.) cultivars is useful for their conservation and genetic improvement. Various molecular markers such as restriction fragment length polymorphisms (RFLPs), simple sequence repeat (SSR), representational difference analysis (RDA), and amplified fragment length polymorphism (AFLP) have been developed to molecularly characterize date palm cultivars. PCR-based markers random amplified polymorphic DNA (RAPD) and inter-simple sequence repeat (ISSR) are powerful tools to determine the relatedness of date palm cultivars that are difficult to distinguish morphologically. In this chapter, the principles, materials, and methods of RAPD and ISSR techniques are presented. Analysis of data generated from these two techniques and the use of these data to reveal phylogenetic relationships among date palm cultivars are also discussed.
Gene discovery in Boophilus microplus, the cattle tick: the transcriptomes of ovaries, salivary glands, and hemocytes.

PubMed

Santos, Isabel K F de Miranda; Valenzuela, Jesus G; Ribeiro, José Marcos C; de Castro, Marilia; Costa, Juliana Nardelli; Costa, Ana Maria; da Silva, Edson Ramiro; Neto, Olavo Bilac Rego; Rocha, Clarisse; Daffre, Sirlei; Ferreira, Beatriz R; da Silva, João Santana; Szabó, Matias Pablo; Bechara, Gervasio Henrique

2004-10-01

The quest for new control strategies for ticks can profit from high throughput genomics. In order to identify genes that are involved in oogenesis and development, in defense, and in hematophagy, the transcriptomes of ovaries, hemocytes, and salivary glands from rapidly ingurgitating females, and of salivary glands from males of Boophilus microplus were PCR amplified, and the expressed sequence tags (EST) of random clones were mass sequenced. So far, more than 1,344 EST have been generated for these tissues, with approximately 30% novelty, depending on the tissue studied. To date approximately 760 nucleotide sequences from B. microplus are deposited in the NCBI database. Mass sequencing of partial cDNAs of parasite genes can build up this scant database and rapidly generate a large quantity of useful information about potential targets for immunobiological or chemical control.
Statistical inference of the generation probability of T-cell receptors from sequence repertoires.

PubMed

Murugan, Anand; Mora, Thierry; Walczak, Aleksandra M; Callan, Curtis G

2012-10-02

Stochastic rearrangement of germline V-, D-, and J-genes to create variable coding sequence for certain cell surface receptors is at the origin of immune system diversity. This process, known as "VDJ recombination", is implemented via a series of stochastic molecular events involving gene choices and random nucleotide insertions between, and deletions from, genes. We use large sequence repertoires of the variable CDR3 region of human CD4+ T-cell receptor beta chains to infer the statistical properties of these basic biochemical events. Because any given CDR3 sequence can be produced in multiple ways, the probability distribution of hidden recombination events cannot be inferred directly from the observed sequences; we therefore develop a maximum likelihood inference method to achieve this end. To separate the properties of the molecular rearrangement mechanism from the effects of selection, we focus on nonproductive CDR3 sequences in T-cell DNA. We infer the joint distribution of the various generative events that occur when a new T-cell receptor gene is created. We find a rich picture of correlation (and absence thereof), providing insight into the molecular mechanisms involved. The generative event statistics are consistent between individuals, suggesting a universal biochemical process. Our probabilistic model predicts the generation probability of any specific CDR3 sequence by the primitive recombination process, allowing us to quantify the potential diversity of the T-cell repertoire and to understand why some sequences are shared between individuals. We argue that the use of formal statistical inference methods, of the kind presented in this paper, will be essential for quantitative understanding of the generation and evolution of diversity in the adaptive immune system.
Optical Processing Techniques For Pseudorandom Sequence Prediction

NASA Astrophysics Data System (ADS)

Gustafson, Steven C.

1983-11-01

Pseudorandom sequences are series of apparently random numbers generated, for example, by linear or nonlinear feedback shift registers. An important application of these sequences is in spread spectrum communication systems, in which, for example, the transmitted carrier phase is digitally modulated rapidly and pseudorandomly and in which the information to be transmitted is incorporated as a slow modulation in the pseudorandom sequence. In this case the transmitted information can be extracted only by a receiver that uses for demodulation the same pseudorandom sequence used by the transmitter, and thus this type of communication system has a very high immunity to third-party interference. However, if a third party can predict in real time the probable future course of the transmitted pseudorandom sequence given past samples of this sequence, then interference immunity can be significantly reduced. In this application effective pseudorandom sequence prediction techniques should be (1) applicable in real time to rapid (e.g., megahertz) sequence generation rates, (2) applicable to both linear and nonlinear pseudorandom sequence generation processes, and (3) applicable to error-prone past sequence samples of limited number and continuity. Certain optical processing techniques that may meet these requirements are discussed in this paper. In particular, techniques based on incoherent optical processors that perform general linear transforms or (more specifically) matrix-vector multiplications are considered. Computer simulation examples are presented which indicate that significant prediction accuracy can be obtained using these transforms for simple pseudorandom sequences. However, the useful prediction of more complex pseudorandom sequences will probably require the application of more sophisticated optical processing techniques.
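As a concrete illustration of the linear feedback shift registers mentioned in this abstract, the sketch below implements a small Fibonacci-style register. It is a generic textbook construction, not a generator taken from the paper. The 4-bit width and the tap positions (0 and 1, corresponding to the primitive polynomial x^4 + x + 1) are chosen only to keep the period short enough to inspect, and `lfsr_bits` is a hypothetical name.

```python
def lfsr_bits(seed, taps, nbits, count):
    """Fibonacci LFSR: emit the low bit, feed the XOR of the tapped bits into the top."""
    if not 0 < seed < (1 << nbits):
        raise ValueError("seed must be a nonzero value that fits in nbits")
    state = seed
    out = []
    for _ in range(count):
        out.append(state & 1)
        feedback = 0
        for t in taps:
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (nbits - 1))
    return out

# 4-bit register with taps at bits 0 and 1; the sequence repeats every 15 bits.
bits = lfsr_bits(seed=0b0001, taps=(0, 1), nbits=4, count=30)
```

Each output bit is a fixed XOR of earlier output bits, which is the linear structure that makes such sequences predictable in principle from past samples.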
Normal and compound poisson approximations for pattern occurrences in NGS reads.

PubMed

Zhai, Zhiyuan; Reinert, Gesine; Song, Kai; Waterman, Michael S; Luan, Yihui; Sun, Fengzhu

2012-06-01

Next generation sequencing (NGS) technologies are now widely used in many biological studies. In NGS, sequence reads are randomly sampled from the genome sequence of interest. Most computational approaches for NGS data first map the reads to the genome and then analyze the data based on the mapped reads. Since many organisms have unknown genome sequences and many reads cannot be uniquely mapped to the genomes even if the genome sequences are known, alternative analytical methods are needed for the study of NGS data. Here we suggest using word patterns to analyze NGS data. Word pattern counting (the study of the probabilistic distribution of the number of occurrences of word patterns in one or multiple long sequences) has played an important role in molecular sequence analysis. However, no studies are available on the distribution of the number of occurrences of word patterns in NGS reads. In this article, we build probabilistic models for the background sequence and the sampling process of the sequence reads from the genome. Based on the models, we provide normal and compound Poisson approximations for the number of occurrences of word patterns from the sequence reads, with bounds on the approximation error. The main challenge is to consider the randomness in generating the long background sequence, as well as in the sampling of the reads using NGS. We show the accuracy of these approximations under a variety of conditions for different patterns with various characteristics. Under realistic assumptions, the compound Poisson approximation seems to outperform the normal approximation in most situations. These approximate distributions can be used to evaluate the statistical significance of the occurrence of patterns from NGS data. The theory and the computational algorithm for calculating the approximate distributions are then used to analyze ChIP-Seq data using transcription factor GABP. 
Software is available online (www-rcf.usc.edu/∼fsun/Programs/NGS_motif_power/NGS_motif_power.html). In addition, Supplementary Material can be found online (www.liebertonline.com/cmb).
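The setting analyzed in this abstract, counting word occurrences in reads sampled at random from a background sequence, can be simulated in a few lines. This sketch only produces the empirical count; it does not implement the normal or compound Poisson approximations of the paper. The uniform i.i.d. background, the read length, the number of reads and the word "ACG" are all illustrative assumptions.

```python
import random

def count_word(seq, word):
    """Count occurrences of word in seq, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)

def sample_reads(genome, read_len, n_reads, rng):
    """Draw reads whose start positions are uniform over the genome."""
    last_start = len(genome) - read_len
    return [genome[s:s + read_len]
            for s in (rng.randrange(last_start + 1) for _ in range(n_reads))]

rng = random.Random(1)
genome = "".join(rng.choice("ACGT") for _ in range(10_000))  # i.i.d. background
reads = sample_reads(genome, read_len=50, n_reads=200, rng=rng)
total = sum(count_word(read, "ACG") for read in reads)
```

Repeating the simulation with different seeds gives an empirical distribution of `total` against which an approximate distribution could be compared.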

Generation of a total of 6483 expressed sequence tags from 60 day-old bovine whole fetus and fetal placenta.

PubMed

Oishi, M; Gohma, H; Lejukole, H Y; Taniguchi, Y; Yamada, T; Suzuki, K; Shinkai, H; Uenishi, H; Yasue, H; Sasaki, Y

2004-05-01

Expressed sequence tags (ESTs) generated based on characterization of clones isolated randomly from cDNA libraries are used to study gene expression profiles in specific tissues and to provide useful information for characterizing tissue physiology. In this study, two directionally cloned cDNA libraries were constructed from 60 day-old bovine whole fetus and fetal placenta. We have characterized 5357 and 1126 clones, and then identified 3464 and 795 unique sequences for the fetus and placenta cDNA libraries: 1851 and 504 showed homology to already identified genes, and 1613 and 291 showed no significant matches to any of the sequences in DNA databases, respectively. Further, we found 94 unique sequences overlapping in both the fetus and the placenta, leading to a catalog of 4165 genes expressed in 60 day-old fetus and placenta. The catalog is used to examine expression profile of genes in 60 day-old bovine fetus and placenta.
High-Throughput Next-Generation Sequencing of Polioviruses

PubMed Central

Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

2016-01-01

ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
Differential gene expression in the siphonophore Nanomia bijuga (Cnidaria) assessed with multiple next-generation sequencing workflows.

PubMed

Siebert, Stefan; Robinson, Mark D; Tintori, Sophia C; Goetz, Freya; Helm, Rebecca R; Smith, Stephen A; Shaner, Nathan; Haddock, Steven H D; Dunn, Casey W

2011-01-01

We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. 
Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing.
Differential Gene Expression in the Siphonophore Nanomia bijuga (Cnidaria) Assessed with Multiple Next-Generation Sequencing Workflows

PubMed Central

Siebert, Stefan; Robinson, Mark D.; Tintori, Sophia C.; Goetz, Freya; Helm, Rebecca R.; Smith, Stephen A.; Shaner, Nathan; Haddock, Steven H. D.; Dunn, Casey W.

2011-01-01

We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. 
Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing. PMID:21829563
Random whole metagenomic sequencing for forensic discrimination of soils.

PubMed

Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

2014-01-01

Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.
The neural correlates of implicit sequence learning in schizophrenia.

PubMed

Marvel, Cherie L; Turner, Beth M; O'Leary, Daniel S; Johnson, Hans J; Pierson, Ronald K; Ponto, Laura L Boles; Andreasen, Nancy C

2007-11-01

Twenty-seven schizophrenia spectrum patients and 25 healthy controls performed a probabilistic version of the serial reaction time task (SRT) that included sequence trials embedded within random trials. Patients showed diminished, yet measurable, sequence learning. Postexperimental analyses revealed that a group of patients performed above chance when generating short spans of the sequence. This high-generation group showed SRT learning that was similar in magnitude to that of controls. Their learning was evident from the very 1st block; however, unlike controls, learning did not develop further with continued testing. A subset of 12 patients and 11 controls performed the SRT in conjunction with positron emission tomography. High-generation performance, which corresponded to SRT learning in patients, correlated to activity in the premotor cortex and parahippocampus. These areas have been associated with stimulus-driven visuospatial processing. Taken together, these results suggest that a subset of patients who showed moderate success on the SRT used an explicit stimulus-driven strategy to process the sequential stimuli. This adaptive strategy facilitated sequence learning but may have interfered with conventional implicit learning of the overall stimulus pattern. PsycINFO Database Record (c) 2007 APA, all rights reserved.
Methodological reporting of randomized clinical trials in respiratory research in 2010.

PubMed

Lu, Yi; Yao, Qiuju; Gu, Jie; Shen, Ce

2013-09-01

Although randomized controlled trials (RCTs) are considered the highest level of evidence, they are also subject to bias, due to a lack of adequately reported randomization, and therefore the reporting should be as explicit as possible for readers to determine the significance of the contents. We evaluated the methodological quality of RCTs in respiratory research in high ranking clinical journals, published in 2010. We assessed the methodological quality, including generation of the allocation sequence, allocation concealment, double-blinding, sample-size calculation, intention-to-treat analysis, flow diagrams, number of medical centers involved, diseases, funding sources, types of interventions, trial registration, number of times the papers have been cited, journal impact factor, journal type, and journal endorsement of the CONSORT (Consolidated Standards of Reporting Trials) rules, in RCTs published in 12 top ranking clinical respiratory journals and 5 top ranking general medical journals. We included 176 trials, of which 93 (53%) reported adequate generation of the allocation sequence, 66 (38%) reported adequate allocation concealment, 79 (45%) were double-blind, 123 (70%) reported adequate sample-size calculation, 88 (50%) reported intention-to-treat analysis, and 122 (69%) included a flow diagram. Multivariate logistic regression analysis revealed that journal impact factor ≥ 5 was the only variable that significantly influenced adequate allocation sequence generation. Trial registration and journal impact factor ≥ 5 significantly influenced adequate allocation concealment. Medical interventions, trial registration, and journal endorsement of the CONSORT statement influenced adequate double-blinding. Publication in one of the general medical journals influenced adequate sample-size calculation. The methodological quality of RCTs in respiratory research needs improvement. Stricter enforcement of the CONSORT statement should enhance the quality of RCTs.
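The percentages quoted in this abstract can be rechecked from the reported counts. The short Python sketch below does only that arithmetic; the dictionary labels and the round-half-up convention are assumptions made for illustration, not something the authors describe.

```python
import math

# Counts reported in the abstract, out of 176 included trials.
TOTAL_TRIALS = 176
REPORTED_COUNTS = {
    "adequate allocation sequence generation": 93,
    "adequate allocation concealment": 66,
    "double-blind": 79,
    "adequate sample-size calculation": 123,
    "intention-to-treat analysis": 88,
    "flow diagram included": 122,
}


def percentage(count, total=TOTAL_TRIALS):
    """Return count/total as a whole-number percentage (round half up)."""
    return math.floor(100 * count / total + 0.5)


for item, count in REPORTED_COUNTS.items():
    print(f"{item}: {count}/{TOTAL_TRIALS} = {percentage(count)}%")
```

Under these assumptions the computed values agree with the figures given in the abstract (53%, 38%, 45%, 70%, 50%, 69%).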
Long period pseudo random number sequence generator

NASA Technical Reports Server (NTRS)

Wang, Charles C. (Inventor)

1989-01-01

A circuit for generating a sequence of pseudo random numbers (A sub K). There is an exponentiator in GF(2 sup m) for the normal basis representation of elements in a finite field GF(2 sup m), each represented by m binary digits, having two inputs and an output from which the sequence (A sub K) of pseudo random numbers is taken. One of the two inputs is connected to receive the outputs (E sub K) of a maximal length shift register of n stages. There is a switch having a pair of inputs and an output. The switch output is connected to the other of the two inputs of the exponentiator. One of the switch inputs is connected for initially receiving a primitive element (A sub 0) in GF(2 sup m). Finally, there is a delay circuit having an input and an output. The delay circuit output is connected to the other of the switch inputs and the delay circuit input is connected to the output of the exponentiator. Thus, after the exponentiator initially receives the primitive element (A sub 0) in GF(2 sup m) through the switch, the switch can be switched to cause the exponentiator to receive as its input a delayed output A(K-1) from the exponentiator, thereby generating (A sub K) continuously at the output of the exponentiator. The exponentiator in GF(2 sup m) is novel and comprises a cyclic-shift circuit, a Massey-Omura multiplier, and a control logic circuit, all operably connected together to perform the function U(sub i) = 92(sup i) (for n(sub i) = 1) or 1 (for n(sub i) = 0).
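The feedback loop described in this record, in which each output is the previous output raised to a power supplied by a maximal length shift register, can be sketched in software. The Python below is a toy illustration under several assumptions that are not taken from the patent: it uses a polynomial basis rather than the normal basis, a small field GF(2^5) with the irreducible polynomial x^5 + x^2 + 1, a 3-stage shift register whose whole state is read as the exponent, and the element x as the starting value. All function names are hypothetical.

```python
M = 5                 # field is GF(2^5); chosen only to keep the example small
POLY = 0b100101       # x^5 + x^2 + 1, irreducible over GF(2)


def gf_mul(a, b):
    """Multiply two field elements (polynomial basis, bitmask encoding)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if a & (1 << M):      # reduce when the degree reaches M
            a ^= POLY
    return result


def gf_pow(base, exponent):
    """Raise a field element to a non-negative integer power (square and multiply)."""
    result = 1
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result


def lfsr_step(state):
    """Advance a 3-stage Fibonacci shift register; nonzero states cycle with period 7."""
    feedback = (state ^ (state >> 1)) & 1
    return (state >> 1) | (feedback << 2)


def generate(count, a0=0b00010, lfsr_state=0b001):
    """Return count values A_K, where A_K = A_(K-1) ** E_K in GF(2^M)."""
    a = a0
    outputs = []
    for _ in range(count):
        lfsr_state = lfsr_step(lfsr_state)   # E_K from the shift register
        a = gf_pow(a, lfsr_state)            # delayed output fed back as the base
        outputs.append(a)
    return outputs


print(generate(10))
```

The field size was picked so that 2^5 - 1 = 31 is prime; every exponent the 3-stage register can produce (1 to 7) is then coprime to the group order, so the loop never collapses to the identity element. A hardware design in a normal basis would replace the squaring in gf_pow with a cyclic shift.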
AFLP fragment isolation technique as a method to produce random sequences for single nucleotide polymorphism discovery in the green turtle, Chelonia mydas.

PubMed

Roden, Suzanne E; Dutton, Peter H; Morin, Phillip A

2009-01-01

The green sea turtle, Chelonia mydas, was used as a case study for single nucleotide polymorphism (SNP) discovery in a species that has little genetic sequence information available. As green turtles have a complex population structure, additional nuclear markers other than microsatellites could add to our understanding of their complex life history. Amplified fragment length polymorphism technique was used to generate sets of random fragments of genomic DNA, which were then electrophoretically separated with precast gels, stained with SYBR green, excised, and directly sequenced. It was possible to perform this method without the use of polyacrylamide gels, radioactive or fluorescent labeled primers, or hybridization methods, reducing the time, expense, and safety hazards of SNP discovery. Within 13 loci, 2547 base pairs were screened, resulting in the discovery of 35 SNPs. Using this method, it was possible to yield a sufficient number of loci to screen for SNP markers without the availability of prior sequence information.
The relationships of 'ecstasy' (MDMA) and cannabis use to impaired executive inhibition and access to semantic long-term memory.

PubMed

Murphy, Philip N; Erwin, Philip G; Maciver, Linda; Fisk, John E; Larkin, Derek; Wareing, Michelle; Montgomery, Catharine; Hilton, Joanne; Tames, Frank J; Bradley, Belinda; Yanulevitch, Kate; Ralley, Richard

2011-10-01

This study aimed to examine the relationship between the consumption of ecstasy (3,4-methylenedioxymethamphetamine (MDMA)) and cannabis, and performance on the random letter generation task which generates dependent variables drawing upon executive inhibition and access to semantic long-term memory (LTM). The participant group was a between-participant independent variable with users of both ecstasy and cannabis (E/C group, n = 15), users of cannabis but not ecstasy (CA group, n = 13) and controls with no exposure to these drugs (CO group, n = 12). Dependent variables measured violations of randomness: number of repeat sequences, number of alphabetical sequences (both drawing upon inhibition) and redundancy (drawing upon access to semantic LTM). E/C participants showed significantly higher redundancy than CO participants but did not differ from CA participants. There were no significant effects for the other dependent variables. A regression model comprising intelligence measures and estimates of ecstasy and cannabis consumption predicted redundancy scores, but only cannabis consumption contributed significantly to this prediction. Impaired access to semantic LTM may be related to cannabis consumption, although the involvement of ecstasy and other stimulant drugs cannot be excluded here. Executive inhibitory functioning, as measured by the random letter generation task, is unrelated to ecstasy and cannabis consumption. Copyright © 2011 John Wiley & Sons, Ltd.
Efficient error correction for next-generation sequencing of viral amplicons

PubMed Central

2012-01-01

Background Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm PMID:22759430
Efficient error correction for next-generation sequencing of viral amplicons.

PubMed

Skums, Pavel; Dimitrova, Zoya; Campo, David S; Vaughan, Gilberto; Rossi, Livia; Forbi, Joseph C; Yokosawa, Jonny; Zelikovsky, Alex; Khudyakov, Yury

2012-06-25

Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm.
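The general idea behind k-mer-based error detection can be illustrated with a few lines of Python. This is not the KEC algorithm from the paper, whose details the abstract does not give; it is a minimal sketch that counts k-mers across reads and flags those seen fewer times than a threshold. The function names, the value of k, and the threshold are all assumptions made for the example.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every length-k substring across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def weak_positions(read, counts, k, threshold):
    """Return start indices of k-mers in read that occur fewer than threshold times."""
    return [
        i
        for i in range(len(read) - k + 1)
        if counts[read[i:i + k]] < threshold
    ]


# Toy data: five identical reads plus one read with a single substitution.
reads = ["ACGTACGT"] * 5 + ["ACGTTCGT"]
counts = kmer_counts(reads, k=4)

print(weak_positions("ACGTACGT", counts, k=4, threshold=2))  # no rare k-mers
print(weak_positions("ACGTTCGT", counts, k=4, threshold=2))  # k-mers covering the error
```

In this toy input the rare k-mers cluster around the substituted base, which is the signal a correction step would act on. A real pipeline would also need to handle homopolymer errors, which the abstract singles out as a major source of 454 sequencing error.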
Theory and implementation of a very high throughput true random number generator in field programmable gate array

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Yonggang, E-mail: wangyg@ustc.edu.cn; Hui, Cong; Liu, Chong

The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.
Theory and implementation of a very high throughput true random number generator in field programmable gate array.

PubMed

Wang, Yonggang; Hui, Cong; Liu, Chong; Xu, Chao

2016-04-01

The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.
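The scaling claim in this abstract (64 parallel units giving 7.68 Gbps in total) implies a per-unit rate that is easy to work out. The Python sketch below does that arithmetic and estimates how many units a different target rate would need. It assumes every unit contributes equally and that throughput scales linearly with the number of units, which the abstract suggests but does not state explicitly; the function name is hypothetical.

```python
TOTAL_MBPS = 7680      # 7.68 Gbps reported for the prototype
UNITS = 64             # circuit units integrated in the prototype

# Assumption: all units contribute the same share of the total.
PER_UNIT_MBPS = TOTAL_MBPS // UNITS


def units_needed(target_mbps, per_unit_mbps=PER_UNIT_MBPS):
    """Smallest number of units whose combined rate reaches target_mbps."""
    return -(-target_mbps // per_unit_mbps)   # ceiling division


print(PER_UNIT_MBPS)           # per-unit rate in Mbps
print(units_needed(1000))      # units for a hypothetical 1 Gbps target
```

With these numbers each unit supplies 120 Mbps, so a hypothetical 1 Gbps target would need 9 units under the linear-scaling assumption.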
Cooperation of Deterministic Dynamics and Random Noise in Production of Complex Syntactical Avian Song Sequences: A Neural Network Model

PubMed Central

Yamashita, Yuichi; Okumura, Tetsu; Okanoya, Kazuo; Tani, Jun

2011-01-01

How the brain learns and generates temporal sequences is a fundamental issue in neuroscience. The production of birdsongs, a process which involves complex learned sequences, provides researchers with an excellent biological model for this topic. The Bengalese finch in particular learns a highly complex song with syntactical structure. The nucleus HVC (HVC), a premotor nucleus within the avian song system, plays a key role in generating the temporal structures of their songs. From lesion studies, the nucleus interfacialis (NIf) projecting to the HVC is considered one of the essential regions that contribute to the complexity of their songs. However, the types of interaction between the HVC and the NIf that can produce complex syntactical songs remain unclear. In order to investigate the function of interactions between the HVC and NIf, we have proposed a neural network model based on previous biological evidence. The HVC is modeled by a recurrent neural network (RNN) that learns to generate temporal patterns of songs. The NIf is modeled as a mechanism that provides auditory feedback to the HVC and generates random noise that feeds into the HVC. The model showed that complex syntactical songs can be replicated by simple interactions between deterministic dynamics of the RNN and random noise. In the current study, the plausibility of the model is tested by the comparison between the changes in the songs of actual birds induced by pharmacological inhibition of the NIf and the changes in the songs produced by the model resulting from modification of parameters representing NIf functions. The efficacy of the model demonstrates that the changes of songs induced by pharmacological inhibition of the NIf can be interpreted as a trade-off between the effects of noise and the effects of feedback on the dynamics of the RNN of the HVC. These facts suggest that the current model provides a convincing hypothesis for the functional role of NIf–HVC interaction. PMID:21559065
Partial characterization of normal and Haemophilus influenzae-infected mucosal complementary DNA libraries in chinchilla middle ear mucosa.

PubMed

Kerschner, Joseph E; Erdos, Geza; Hu, Fen Ze; Burrows, Amy; Cioffi, Joseph; Khampang, Pawjai; Dahlgren, Margaret; Hayes, Jay; Keefe, Randy; Janto, Benjamin; Post, J Christopher; Ehrlich, Garth D

2010-04-01

We sought to construct and partially characterize complementary DNA (cDNA) libraries prepared from the middle ear mucosa (MEM) of chinchillas to better understand pathogenic aspects of infection and inflammation, particularly with respect to leukotriene biogenesis and response. Chinchilla MEM was harvested from controls and after middle ear inoculation with nontypeable Haemophilus influenzae. RNA was extracted to generate cDNA libraries. Randomly selected clones were subjected to sequence analysis to characterize the libraries and to provide DNA sequence for phylogenetic analyses. Reverse transcription-polymerase chain reaction of the RNA pools was used to generate cDNA sequences corresponding to genes associated with leukotriene biosynthesis and metabolism. Sequence analysis of 921 randomly selected clones from the uninfected MEM cDNA library produced approximately 250,000 nucleotides of almost entirely novel sequence data. Searches of the GenBank database with the Basic Local Alignment Search Tool provided for identification of 515 unique genes expressed in the MEM and not previously described in chinchillas. In almost all cases, the chinchilla cDNA sequences displayed much greater homology to human or other primate genes than with rodent species. Genes associated with leukotriene metabolism were present in both normal and infected MEM. Based on both phylogenetic comparisons and gene expression similarities with humans, chinchilla MEM appears to be an excellent model for the study of middle ear inflammation and infection. The higher degree of sequence similarity between chinchillas and humans compared to chinchillas and rodents was unexpected. The cDNA libraries from normal and infected chinchilla MEM will serve as useful molecular tools in the study of otitis media and should yield important information with respect to middle ear pathogenesis.
Partial Characterization of Normal and Haemophilus influenzae–Infected Mucosal Complementary DNA Libraries in Chinchilla Middle Ear Mucosa

PubMed Central

Kerschner, Joseph E.; Erdos, Geza; Hu, Fen Ze; Burrows, Amy; Cioffi, Joseph; Khampang, Pawjai; Dahlgren, Margaret; Hayes, Jay; Keefe, Randy; Janto, Benjamin; Post, J. Christopher; Ehrlich, Garth D.

2010-01-01

Objectives We sought to construct and partially characterize complementary DNA (cDNA) libraries prepared from the middle ear mucosa (MEM) of chinchillas to better understand pathogenic aspects of infection and inflammation, particularly with respect to leukotriene biogenesis and response. Methods Chinchilla MEM was harvested from controls and after middle ear inoculation with nontypeable Haemophilus influenzae. RNA was extracted to generate cDNA libraries. Randomly selected clones were subjected to sequence analysis to characterize the libraries and to provide DNA sequence for phylogenetic analyses. Reverse transcription–polymerase chain reaction of the RNA pools was used to generate cDNA sequences corresponding to genes associated with leukotriene biosynthesis and metabolism. Results Sequence analysis of 921 randomly selected clones from the uninfected MEM cDNA library produced approximately 250,000 nucleotides of almost entirely novel sequence data. Searches of the GenBank database with the Basic Local Alignment Search Tool provided for identification of 515 unique genes expressed in the MEM and not previously described in chinchillas. In almost all cases, the chinchilla cDNA sequences displayed much greater homology to human or other primate genes than with rodent species. Genes associated with leukotriene metabolism were present in both normal and infected MEM. Conclusions Based on both phylogenetic comparisons and gene expression similarities with humans, chinchilla MEM appears to be an excellent model for the study of middle ear inflammation and infection. The higher degree of sequence similarity between chinchillas and humans compared to chinchillas and rodents was unexpected. The cDNA libraries from normal and infected chinchilla MEM will serve as useful molecular tools in the study of otitis media and should yield important information with respect to middle ear pathogenesis. PMID:20433028
Bicomponent Block Copolymers Derived from One or More Random Copolymers as an Alternative Route to Controllable Phase Behavior

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ashraf, Arman R.; Ryan, Justin J.; Satkowski, Michael M.

Block copolymers have been extensively studied due to their ability to spontaneously self-organize into a wide variety of morphologies that are valuable in energy-, medical- and conservation-related (nano)technologies. While the phase behavior of bicomponent diblock and triblock copolymers is conventionally governed by temperature and individual block masses, we demonstrate that their phase behavior can alternatively be controlled through the use of blocks with random monomer sequencing. Block random copolymers (BRCs), i.e., diblock copolymers wherein one or both blocks is a random copolymer comprised of A and B repeat units, have been synthesized, and their phase behavior, expressed in terms of the order-disorder transition (ODT), has been investigated. Our results establish that, depending on the block composition contrast and molecular weight, BRCs can microphase-separate. We also report that the predicted ODT can be generated at relatively constant molecular weight and temperature with these new soft materials. This sequence-controlled synthetic strategy is extended to thermoplastic elastomeric triblock copolymers differing in chemistry and possessing a random-copolymer midblock.
Deep Sequencing to Identify the Causes of Viral Encephalitis

PubMed Central

Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.

2014-01-01

Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691
On the Role of Aggregation Prone Regions in Protein Evolution, Stability, and Enzymatic Catalysis: Insights from Diverse Analyses

PubMed Central

Buck, Patrick M.; Kumar, Sandeep; Singh, Satish K.

2013-01-01

The various roles that aggregation prone regions (APRs) are capable of playing in proteins are investigated here via comprehensive analyses of multiple non-redundant datasets containing randomly generated amino acid sequences, monomeric proteins, intrinsically disordered proteins (IDPs) and catalytic residues. Results from this study indicate that the aggregation propensities of monomeric protein sequences have been minimized compared to random sequences with uniform and natural amino acid compositions, as observed by a lower average aggregation propensity and fewer APRs that are shorter in length and more often punctuated by gate-keeper residues. However, evidence for evolutionary selective pressure to disrupt these sequence regions among homologous proteins is inconsistent. APRs are less conserved than average sequence identity among closely related homologues (≥80% sequence identity with a parent) but APRs are more conserved than average sequence identity among homologues that have at least 50% sequence identity with a parent. Structural analyses of APRs indicate that APRs are three times more likely to contain ordered versus disordered residues and that APRs frequently contribute more towards stabilizing proteins than equal length segments from the same protein. Catalytic residues and APRs were also found to be in structural contact significantly more often than expected by random chance. Our findings suggest that proteins have evolved by optimizing their risk of aggregation for cellular environments by both minimizing aggregation prone regions and by conserving those that are important for folding and function. In many cases, these sequence optimizations are insufficient to develop recombinant proteins into commercial products. Rational design strategies aimed at improving protein solubility for biotechnological purposes should carefully evaluate the contributions made by candidate APRs, targeted for disruption, towards protein structure and activity. 
PMID:24146608
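Random amino acid sequences with either a uniform or a specified composition, as used for the baseline comparison in this study, can be generated along the following lines. This Python sketch is an illustration only: the function name is hypothetical, and the example composition weights are toy values, not the natural frequencies used by the authors.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the 20 standard residues, one-letter codes


def random_protein(length, rng, composition=None):
    """Return a random amino acid sequence of the given length.

    composition maps residue letters to relative weights; when it is None,
    all 20 residues are drawn with equal probability.
    """
    if composition is None:
        return "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
    residues = list(composition)
    weights = [composition[r] for r in residues]
    return "".join(rng.choices(residues, weights=weights, k=length))


rng = random.Random(2013)              # fixed seed so runs are repeatable
uniform_seq = random_protein(50, rng)

# Toy weights for illustration only; real use would supply measured frequencies.
toy_composition = {"A": 3.0, "L": 4.0, "G": 2.0, "W": 0.5}
weighted_seq = random_protein(50, rng, toy_composition)

print(uniform_seq)
print(weighted_seq)
```

Passing an explicit random.Random instance keeps the generated control set reproducible, which matters when the random sequences serve as a statistical baseline.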

Proteasix: a tool for automated and large-scale prediction of proteases involved in naturally occurring peptide generation.

PubMed

Klein, Julie; Eales, James; Zürbig, Petra; Vlahou, Antonia; Mischak, Harald; Stevens, Robert

2013-04-01

In this study, we have developed Proteasix, an open-source peptide-centric tool that can be used to predict in silico the proteases involved in naturally occurring peptide generation. We developed a curated cleavage site (CS) database, containing 3500 entries about human protease/CS combinations. On top of this database, we built a tool, Proteasix, which allows CS retrieval and protease associations from a list of peptides. To establish the proof of concept of the approach, we used a list of 1388 peptides identified from human urine samples, and compared the prediction to the analysis of 1003 randomly generated amino acid sequences. Metalloprotease activity was predominantly involved in urinary peptide generation, and more particularly to peptides associated with extracellular matrix remodelling, compared to proteins from other origins. In comparison, random sequences returned almost no results, highlighting the specificity of the prediction. This study provides a tool that can facilitate linking of identified protein fragments to predicted protease activity, and therefore into presumed mechanisms of disease. Experiments are needed to confirm the in silico hypotheses; nevertheless, this approach may be of great help to better understand molecular mechanisms of disease, and define new biomarkers, and therapeutic targets. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
A Robust and Versatile Method of Combinatorial Chemical Synthesis of Gene Libraries via Hierarchical Assembly of Partially Randomized Modules

PubMed Central

Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried

2015-01-01

A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotide mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvent the notoriously troublesome long-term archiving and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessment of the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acids synthetic chemistry, biochemistry and molecular genetics to a coherent, flexible and robust method of combinatorial gene synthesis. PMID:26355961
A Robust and Versatile Method of Combinatorial Chemical Synthesis of Gene Libraries via Hierarchical Assembly of Partially Randomized Modules.

PubMed

Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried

2015-01-01

A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotide mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the thisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvent the notoriously troublesome long-term archiving and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessment of the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acids synthetic chemistry, biochemistry and molecular genetics to a coherent, flexible and robust method of combinatorial gene synthesis.
Time- and Cost-Efficient Identification of T-DNA Insertion Sites through Targeted Genomic Sequencing

PubMed Central

Lepage, Étienne; Zampini, Éric; Boyle, Brian; Brisson, Normand

2013-01-01

Forward genetic screens enable the unbiased identification of genes involved in biological processes. In Arabidopsis, several mutant collections are publicly available, which greatly facilitates such practice. Most of these collections were generated by agrotransformation of a T-DNA at random sites in the plant genome. However, precise mapping of T-DNA insertion sites in mutants isolated from such screens is a laborious and time-consuming task. Here we report a simple, low-cost and time efficient approach to precisely map T-DNA insertions simultaneously in many different mutants. By combining sequence capture, next-generation sequencing and 2D-PCR pooling, we developed a new method that allowed the rapid localization of T-DNA insertion sites in 55 out of 64 mutant plants isolated in a screen for gyrase inhibition hypersensitivity. PMID:23951038
The Teaching of Protein Synthesis--A Microcomputer Based Method.

ERIC Educational Resources Information Center

Goodridge, Frank

1983-01-01

Describes two computer programs (BASIC for 32K Commodore PET) for teaching protein synthesis. The first is an interactive test of base-pairing knowledge, and the second generates random DNA nucleotide sequences, with instructions for substitution, insertion, and deletion printed out for each student. (JN)
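The second program described above can be approximated in a few lines of modern code. The following is only an illustrative Python sketch, not a port of the original BASIC program; the function names and the 0-based position convention are assumptions made for this example.

```python
import random

BASES = "ACGT"

def random_dna(length, rng):
    """Return a random DNA string of the given length using the supplied RNG."""
    return "".join(rng.choice(BASES) for _ in range(length))

def substitute(seq, pos, base):
    """Replace the base at 0-based position pos with base."""
    return seq[:pos] + base + seq[pos + 1:]

def insert(seq, pos, base):
    """Insert base before 0-based position pos."""
    return seq[:pos] + base + seq[pos:]

def delete(seq, pos):
    """Remove the base at 0-based position pos."""
    return seq[:pos] + seq[pos + 1:]

if __name__ == "__main__":
    rng = random.Random(1983)  # fixed seed so each run prints the same exercise
    seq = random_dna(12, rng)
    print("original:    ", seq)
    print("substitution:", substitute(seq, 3, "A"))
    print("insertion:   ", insert(seq, 3, "G"))
    print("deletion:    ", delete(seq, 3))
```

Seeding the generator differently for each student would give each one a distinct sequence to work on.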
Linkage mapping in a watermelon population segregating for fusarium wilt resistance

Treesearch

Leigh K. Hawkins; Fenny Dane; Thomas L. Kubisiak; Billy B. Rhodes; Robert L. Jarret

2001-01-01

Isozyme, randomly amplified polymorphic DNA (RAPD), and simple sequence repeats (SSR) markers were used to generate a linkage map in an F2 and F3 watermelon (Citrullus lanatus (Thunb.) Matsum. & Nakai) population derived from a cross between the fusarium wilt (Fusarium oxysporum f....
Three-dimensional information hierarchical encryption based on computer-generated holograms

NASA Astrophysics Data System (ADS)

Kong, Dezhao; Shen, Xueju; Cao, Liangcai; Zhang, Hao; Zong, Song; Jin, Guofan

2016-12-01

A novel approach for encrypting three-dimensional (3-D) scene information hierarchically based on computer-generated holograms (CGHs) is proposed. The CGHs of the layer-oriented 3-D scene information are produced by angular-spectrum propagation algorithm at different depths. All the CGHs are then modulated by different chaotic random phase masks generated by the logistic map. Hierarchical encryption encoding is applied when all the CGHs are accumulated one by one, and the reconstructed volume of the 3-D scene information depends on permissions of different users. The chaotic random phase masks could be encoded into several parameters of the chaotic sequences to simplify the transmission and preservation of the keys. Optical experiments verify the proposed method and numerical simulations show the high key sensitivity, high security, and application flexibility of the method.
Spontaneous Generation of Infectious Prion Disease in Transgenic Mice

PubMed Central

Castilla, Joaquín; Pintado, Belén; Gutiérrez-Adan, Alfonso; Andréoletti, Olivier; Aguilar-Calvo, Patricia; Arroba, Ana-Isabel; Parra-Arrondo, Beatriz; Ferrer, Isidro; Manzanares, Jorge; Espinosa, Juan-Carlos

2013-01-01

We generated transgenic mice expressing bovine cellular prion protein (PrPC) with a leucine substitution at codon 113 (113L). This protein is homologous to human protein with mutation 102L, and its genetic link with Gerstmann–Sträussler–Scheinker syndrome has been established. This mutation in bovine PrPC causes a fully penetrant, lethal, spongiform encephalopathy. This genetic disease was transmitted by intracerebral inoculation of brain homogenate from ill mice expressing mutant bovine PrP to mice expressing wild-type bovine PrP, which indicated de novo generation of infectious prions. Our findings demonstrate that a single amino acid change in the PrPC sequence can induce spontaneous generation of an infectious prion disease that differs from all others identified in hosts expressing the same PrPC sequence. These observations support the view that a variety of infectious prion strains might spontaneously emerge in hosts displaying random genetic PrPC mutations. PMID:24274622
Bit Error Probability for Maximum Likelihood Decoding of Linear Block Codes

NASA Technical Reports Server (NTRS)

Lin, Shu; Fossorier, Marc P. C.; Rhee, Dojun

1996-01-01

In this paper, the bit error probability P_b for maximum likelihood decoding of binary linear codes is investigated. The contribution of each information bit to P_b is considered. For randomly generated codes, it is shown that the conventional high-SNR approximation P_b ≈ (d_H/N) P_s, where P_s represents the block error probability, holds for systematic encoding only. Also, systematic encoding provides the minimum P_b when the inverse mapping corresponding to the generator matrix of the code is used to retrieve the information sequence. The bit error performances corresponding to other generator matrix forms are also evaluated. Although derived for codes with a randomly generated generator matrix, these results are shown to provide good approximations for codes used in practice. Finally, for decoding methods which require a generator matrix with a particular structure, such as trellis decoding or algebraic-based soft decision decoding, equivalent schemes that reduce the bit error probability are discussed.
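As a small worked illustration of the high-SNR approximation quoted above, the Python sketch below evaluates the bit error probability from the block error probability. The abstract does not define the symbols, so the reading of d_H as the minimum Hamming distance and N as the block length is an assumption, and the (7,4) Hamming code figures are hypothetical example inputs rather than results from the report.

```python
def approx_bit_error(d_h, n, p_block):
    """High-SNR approximation: bit error ~= (d_H / N) * block error.

    Assumed meanings (not defined in the abstract):
      d_h     minimum Hamming distance of the code
      n       block length N
      p_block block error probability P_s
    """
    if not 0 < d_h <= n:
        raise ValueError("expected 0 < d_h <= n")
    return d_h / n * p_block

if __name__ == "__main__":
    # Hypothetical example: a (7,4) Hamming code (d_H = 3, N = 7)
    # with an assumed block error probability of 1e-3.
    print(approx_bit_error(3, 7, 1e-3))
```

According to the abstract, this approximation should only be trusted for systematic encoding.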
Singular over-representation of an octameric palindrome, HIP1, in DNA from many cyanobacteria.

PubMed

Robinson, N J; Robinson, P J; Gupta, A; Bleasby, A J; Whitton, B A; Morby, A P

1995-03-11

An octameric palindrome (5'-GCGATCGC-3') is abundant in cyanobacterial sequences within databases (GenBank/EMBL) and was designated HIP1 (highly iterated palindrome). The frequency of occurrence of all 256 octameric palindromes has now been determined in sub-databases revealing large and unique over-representation of HIP1 in cyanobacterial entries. DNA sequences from other bacteria were searched for any over-represented octameric palindromes analogous to HIP1. Only two sequences were identified, in the genomes of a thermophile and halophilic archaebacteria, although these were less abundant than HIP1 in cyanobacteria and relate to codon usage. To test the proposed widespread distribution of HIP1 in DNA from the cyanobacterium Synechococcus PCC 6301, randomly selected genomic clones were partly sequenced. HIP1 constituted 2.5% of the novel sequences, equivalent to a site on average once every 320 nucleotides. An oligonucleotide including HIP1 was also tested in PCR. Multiple products were obtained using template DNA from cyanobacterial strains in which HIP1 is abundant in known sequences, and some strains generated characteristic HIP-PCR banding patterns. However, analysis of DNA from one strain (not previously represented in databases) by random sequencing, HIP-PCR and PvuI digestion confirms that not all cyanobacterial genomes are rich in HIP1.
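The count of 256 octameric palindromes follows from the fact that a DNA palindrome equals its own reverse complement, so the first four bases fix the last four (4^4 = 256). The Python sketch below enumerates them and counts overlapping occurrences of a motif in a sequence; the function names are chosen for this example and it is not the authors' database search procedure.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]

def octameric_palindromes():
    """All 8-mers that equal their own reverse complement."""
    palindromes = []
    for half in product("ACGT", repeat=4):
        left = "".join(half)
        # The right half is forced by the left half.
        palindromes.append(left + reverse_complement(left))
    return palindromes

def count_occurrences(seq, motif):
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count

if __name__ == "__main__":
    pals = octameric_palindromes()
    print(len(pals), "GCGATCGC" in pals)
    toy = "AAGCGATCGCTTGCGATCGC"  # made-up sequence for illustration
    print(count_occurrences(toy, "GCGATCGC"))
```

The density quoted in the abstract is consistent with this picture: one 8-nucleotide site every 320 nucleotides corresponds to 8/320 = 2.5% of the sequence.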
Usefulness of fire ant genetics in insecticide efficacy trials

USDA-ARS's Scientific Manuscript database

Mature fire ant colonies contain an average of 80,000 worker ants. For this study, eight fire ant workers were randomly sampled from each colony. DNA fingerprints for each individual ant were generated using 21 simple sequence repeats (SSR) markers that were developed from fire ant DNA by other lab...
Unbiased Combinatorial Genomic Approaches to Identify Alternative Therapeutic Targets within the TSC Signaling Network

DTIC Science & Technology

2013-06-01

number of ways to generate either random mutations or specific alterations to the genome sequence. Unlike previous approaches however, both TALENs and...made to the donor construct will be incorporated into the endogenous genomic sequence (examples in Liu et al., 2012; Zu et al., 2013). One challenge... Drosophila with the CRISPR RNA-guided Cas9 nuclease. Genetics. 2013. Hwang WY, Fu Y, Reyon D, Maeder ML, Tsai SQ, Sander JD, et al. Efficient genome
mtDNA sequence diversity of Hazara ethnic group from Pakistan.

PubMed

Rakha, Allah; Fatima; Peng, Min-Sheng; Adan, Atif; Bi, Rui; Yasmin, Memona; Yao, Yong-Gang

2017-09-01

The present study was undertaken to investigate mitochondrial DNA (mtDNA) control region sequences of Hazaras from Pakistan, so as to generate mtDNA reference database for forensic casework in Pakistan and to analyze phylogenetic relationship of this particular ethnic group with geographically proximal populations. Complete mtDNA control region (nt 16024-576) sequences were generated through Sanger Sequencing for 319 Hazara individuals from Quetta, Baluchistan. The population sample set showed a total of 189 distinct haplotypes, belonging mainly to West Eurasian (51.72%), East & Southeast Asian (29.78%) and South Asian (18.50%) haplogroups. Compared with other populations from Pakistan, the Hazara population had a relatively high haplotype diversity (0.9945) and a lower random match probability (0.0085). The dataset has been incorporated into EMPOP database under accession number EMP00680. The data herein comprises the largest, and likely most thoroughly examined, control region mtDNA dataset from Hazaras of Pakistan. Copyright © 2017 Elsevier B.V. All rights reserved.
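The two summary statistics quoted above are closely related. The Python sketch below assumes the usual forensic definitions, which the abstract does not spell out: random match probability as the sum of squared haplotype frequencies, and haplotype diversity as Nei's unbiased estimator. The haplotype counts in the example are invented for illustration.

```python
def random_match_probability(counts):
    """Sum of squared haplotype frequencies (assumed definition)."""
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)

def haplotype_diversity(counts):
    """Nei's unbiased estimator n/(n-1) * (1 - sum p_i^2) (assumed definition)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    return n / (n - 1) * (1 - random_match_probability(counts))

if __name__ == "__main__":
    # Invented counts: four individuals carrying three distinct haplotypes.
    counts = [2, 1, 1]
    print(random_match_probability(counts))
    print(haplotype_diversity(counts))
```

Under these assumed definitions, plugging the reported random match probability of 0.0085 and the sample size of 319 into n/(n-1) * (1 - RMP) gives roughly 0.995, which agrees with the reported haplotype diversity of 0.9945 up to rounding.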
[Screening specific recognition motif of RNA-binding proteins by SELEX in combination with next-generation sequencing technique].

PubMed

Zhang, Lu; Xu, Jinhao; Ma, Jinbiao

2016-07-25

RNA-binding proteins exert important biological functions by specifically recognizing RNA motifs. SELEX (systematic evolution of ligands by exponential enrichment), an in vitro selection method, can obtain consensus motifs with high affinity and specificity for many target molecules from DNA or RNA libraries. Here, we combined SELEX with next-generation sequencing to study protein-RNA interactions in vitro. A pool of RNAs with 20 bp random sequences was transcribed from a T7 promoter, and the target protein was inserted into a plasmid containing an SBP-tag, which can be captured by streptavidin beads. Through only one cycle, the specific RNA motif can be obtained, which dramatically improved the selection efficiency. Using this method, we found that the human hnRNP A1 RRMs domain (UP1 domain) bound RNA motifs containing AGG and AG sequences. The EMSA experiment indicated that hnRNP A1 RRMs could bind the obtained RNA motif. Taken together, this method provides a rapid and effective way to study the RNA binding specificity of proteins.
High-throughput and site-specific identification of 2'-O-methylation sites using ribose oxidation sequencing (RibOxi-seq).

PubMed

Zhu, Yinzhou; Pirnie, Stephan P; Carmichael, Gordon G

2017-08-01

Ribose methylation (2'-O-methylation, 2'-OMe) occurs at high frequencies in rRNAs and other small RNAs and is carried out using a shared mechanism across eukaryotes and archaea. As RNA modifications are important for ribosome maturation, and alterations in these modifications are associated with cellular defects and diseases, it is important to characterize the landscape of 2'-O-methylation. Here we report the development of a highly sensitive and accurate method for ribose methylation detection using next-generation sequencing. A key feature of this method is the generation of RNA fragments with random 3'-ends, followed by periodate oxidation of all molecules terminating in 2',3'-OH groups. This allows only RNAs harboring 2'-OMe groups at their 3'-ends to be sequenced. Although currently requiring microgram amounts of starting material, this method is robust for the analysis of rRNAs even at low sequencing depth. © 2017 Zhu et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism

PubMed Central

Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

2015-01-01

HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows the detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing are rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones with accurate sensitivity thresholds. PMID:26585833
Position-specific automated processing of V3 env ultra-deep pyrosequencing data for predicting HIV-1 tropism.

PubMed

Jeanne, Nicolas; Saliou, Adrien; Carcenac, Romain; Lefebvre, Caroline; Dubois, Martine; Cazabat, Michelle; Nicot, Florence; Loiseau, Claire; Raymond, Stéphanie; Izopet, Jacques; Delobel, Pierre

2015-11-20

HIV-1 coreceptor usage must be accurately determined before starting CCR5 antagonist-based treatment as the presence of undetected minor CXCR4-using variants can cause subsequent virological failure. Ultra-deep pyrosequencing of HIV-1 V3 env allows the detection of low levels of CXCR4-using variants that current genotypic approaches miss. However, the computation of the mass of sequence data and the need to identify true minor variants while excluding artifactual sequences generated during amplification and ultra-deep pyrosequencing are rate-limiting. Arbitrary fixed cut-offs below which minor variants are discarded are currently used, but the errors generated during ultra-deep pyrosequencing are sequence-dependent rather than random. We have developed automated processing of HIV-1 V3 env ultra-deep pyrosequencing data that uses biological filters to discard artifactual or non-functional V3 sequences followed by statistical filters to determine position-specific sensitivity thresholds, rather than arbitrary fixed cut-offs. It retains authentic sequences with point mutations at V3 positions of interest and discards artifactual ones with accurate sensitivity thresholds.
Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data

PubMed Central

2010-01-01

Background: In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations both for low and high complexity patterns in the framework of homogeneous or heterogeneous Markov models. Results: The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also avoids any convolution product of the pattern distribution in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in the complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms and their relative interest in comparison with existing ones were then tested and discussed on a toy example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists in concatenating the sequences in the data set into a single sequence.
Conclusions: Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures for example). In addition, these exact algorithms allow us to avoid the edge effect observed under the single sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model displays a slow convergence toward the stationary distribution. We end up with a discussion on our method and on its potential improvements. PMID:20205909
Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data.

PubMed

Nuel, Gregory; Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

2010-01-26

In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations both for low and high complexity patterns in the framework of homogeneous or heterogeneous Markov models. The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also avoids any convolution product of the pattern distribution in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in the complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms and their relative interest in comparison with existing ones were then tested and discussed on a toy example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists in concatenating the sequences in the data set into a single sequence.
Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures for example). In addition, these exact algorithms allow us to avoid the edge effect observed under the single sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model displays a slow convergence toward the stationary distribution. We end up with a discussion on our method and on its potential improvements.
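The general idea of embedding a pattern automaton in a Markov chain can be illustrated with a deliberately naive Python sketch. It is not any of the three algorithms from the paper: it handles a single word under a homogeneous first-order model, tracks the count explicitly, and combines independent sequences by plain convolution, which is the step the abstract says Algorithm 1 avoids. All function names and the toy model are assumptions made for this example.

```python
from collections import defaultdict

def build_dfa(pattern, alphabet):
    """KMP-style automaton: delta[q][a] is the length of the longest
    prefix of pattern that is a suffix of pattern[:q] + a."""
    m = len(pattern)
    delta = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for a in alphabet:
            s = pattern[:q] + a
            k = min(m, len(s))
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            delta[q][a] = k
    return delta

def count_distribution(pattern, length, start, trans):
    """Exact distribution of the (overlapping) occurrence count of pattern
    in one sequence of the given length (>= 1) from a first-order Markov
    chain with initial distribution start and transition table trans."""
    alphabet = sorted(start)
    delta = build_dfa(pattern, alphabet)
    m = len(pattern)
    # State of the embedded chain: (automaton state, last letter, count).
    current = defaultdict(float)
    for a in alphabet:
        q = delta[0][a]
        current[(q, a, int(q == m))] += start[a]
    for _ in range(length - 1):
        nxt = defaultdict(float)
        for (q, last, c), p in current.items():
            for a in alphabet:
                q2 = delta[q][a]
                nxt[(q2, a, c + int(q2 == m))] += p * trans[last][a]
        current = nxt
    dist = defaultdict(float)
    for (_, _, c), p in current.items():
        dist[c] += p
    return dict(dist)

def convolve(d1, d2):
    """Distribution of the sum of two independent counts."""
    out = defaultdict(float)
    for c1, p1 in d1.items():
        for c2, p2 in d2.items():
            out[c1 + c2] += p1 * p2
    return dict(out)

def count_distribution_in_set(pattern, lengths, start, trans):
    """Count distribution over a set of independent sequences."""
    total = {0: 1.0}
    for length in lengths:
        total = convolve(total, count_distribution(pattern, length, start, trans))
    return total

if __name__ == "__main__":
    # Toy model: two letters, every transition equally likely.
    start = {"A": 0.5, "B": 0.5}
    trans = {"A": {"A": 0.5, "B": 0.5}, "B": {"A": 0.5, "B": 0.5}}
    print(count_distribution_in_set("AA", [3, 3], start, trans))
    # Concatenating into one sequence of length 6 gives a different answer,
    # because occurrences can straddle the junction (the edge effect).
    print(count_distribution("AA", 6, start, trans))
```

Comparing the two printed distributions shows, on a tiny scale, why the single-sequence approximation mentioned in the abstract can be misleading.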
Leaf Transcriptome Sequencing for Identifying Genic-SSR Markers and SNP Heterozygosity in Crossbred Mango Variety 'Amrapali' (Mangifera indica L.).

PubMed

Mahato, Ajay Kumar; Sharma, Nimisha; Singh, Akshay; Srivastav, Manish; Jaiprakash; Singh, Sanjay Kumar; Singh, Anand Kumar; Sharma, Tilak Raj; Singh, Nagendra Kumar

2016-01-01

Mango (Mangifera indica L.) is called "king of fruits" due to its sweetness, richness of taste, diversity, large production volume and a variety of end usage. Despite its huge economic importance genomic resources in mango are scarce and genetics of useful horticultural traits are poorly understood. Here we generated deep coverage leaf RNA sequence data for mango parental varieties 'Neelam', 'Dashehari' and their hybrid 'Amrapali' using next generation sequencing technologies. De-novo sequence assembly generated 27,528, 20,771 and 35,182 transcripts for the three genotypes, respectively. The transcripts were further assembled into a non-redundant set of 70,057 unigenes that were used for SSR and SNP identification and annotation. Total 5,465 SSR loci were identified in 4,912 unigenes with 288 type I SSR (n ≥ 20 bp). One hundred type I SSR markers were randomly selected of which 43 yielded PCR amplicons of expected size in the first round of validation and were designated as validated genic-SSR markers. Further, 22,306 SNPs were identified by aligning high quality sequence reads of the three mango varieties to the reference unigene set, revealing significantly enhanced SNP heterozygosity in the hybrid Amrapali. The present study on leaf RNA sequencing of mango varieties and their hybrid provides useful genomic resource for genetic improvement of mango.

Leaf Transcriptome Sequencing for Identifying Genic-SSR Markers and SNP Heterozygosity in Crossbred Mango Variety ‘Amrapali’ (Mangifera indica L.)

PubMed Central

Mahato, Ajay Kumar; Sharma, Nimisha; Singh, Akshay; Srivastav, Manish; Jaiprakash; Singh, Sanjay Kumar; Singh, Anand Kumar; Sharma, Tilak Raj; Singh, Nagendra Kumar

2016-01-01

Mango (Mangifera indica L.) is called “king of fruits” due to its sweetness, richness of taste, diversity, large production volume and a variety of end usage. Despite its huge economic importance genomic resources in mango are scarce and genetics of useful horticultural traits are poorly understood. Here we generated deep coverage leaf RNA sequence data for mango parental varieties ‘Neelam’, ‘Dashehari’ and their hybrid ‘Amrapali’ using next generation sequencing technologies. De-novo sequence assembly generated 27,528, 20,771 and 35,182 transcripts for the three genotypes, respectively. The transcripts were further assembled into a non-redundant set of 70,057 unigenes that were used for SSR and SNP identification and annotation. Total 5,465 SSR loci were identified in 4,912 unigenes with 288 type I SSR (n ≥ 20 bp). One hundred type I SSR markers were randomly selected of which 43 yielded PCR amplicons of expected size in the first round of validation and were designated as validated genic-SSR markers. Further, 22,306 SNPs were identified by aligning high quality sequence reads of the three mango varieties to the reference unigene set, revealing significantly enhanced SNP heterozygosity in the hybrid Amrapali. The present study on leaf RNA sequencing of mango varieties and their hybrid provides useful genomic resource for genetic improvement of mango. PMID:27736892
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum.

PubMed

VanBuren, Robert; Bryant, Doug; Edger, Patrick P; Tang, Haibao; Burgess, Diane; Challabathula, Dinakar; Spittle, Kristi; Hall, Richard; Gu, Jenny; Lyons, Eric; Freeling, Michael; Bartels, Dorothea; Ten Hallers, Boudewijn; Hastie, Alex; Michael, Todd P; Mockler, Todd C

2015-11-26

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a 'near-complete' draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. The Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
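The N50 statistic quoted above summarizes assembly contiguity: it is the contig length such that contigs of that length or longer contain at least half of the assembled bases. A minimal Python sketch of the standard definition follows; the contig lengths in the example are made up and unrelated to the Oropetium assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are visited from longest to shortest; the N50 is the length of
    the contig at which the running total first reaches half of all bases.
    """
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs supplied")

if __name__ == "__main__":
    # Made-up contig lengths for illustration.
    print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```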
Simultaneous genomic identification and profiling of a single cell using semiconductor-based next generation sequencing.

PubMed

Watanabe, Manabu; Kusano, Junko; Ohtaki, Shinsaku; Ishikura, Takashi; Katayama, Jin; Koguchi, Akira; Paumen, Michael; Hayashi, Yoshiharu

2014-09-01

Combining single-cell methods and next-generation sequencing should provide a powerful means to understand single-cell biology and obviate the effects of sample heterogeneity. Here we report a single-cell identification method and seamless cancer gene profiling using semiconductor-based massively parallel sequencing. A549 cells (adenocarcinomic human alveolar basal epithelial cell line) were used as a model. Single-cell capture was performed using laser capture microdissection (LCM) with an Arcturus® XT system, and a captured single cell and a bulk population of A549 cells (≈10^6 cells) were subjected to whole genome amplification (WGA). For cell identification, a multiplex PCR method (AmpliSeq™ SNP HID panel) was used to enrich 136 highly discriminatory SNPs with a genotype concordance probability of 10^(31-35). For cancer gene profiling, we used mutation profiling that was performed in parallel using a hotspot panel for 50 cancer-related genes. Sequencing was performed using a semiconductor-based bench top sequencer. The distribution of sequence reads for both HID and Cancer panel amplicons was consistent across these samples. For the bulk population of cells, the percentages of sequence covered at coverage of more than 100× were 99.04% for the HID panel and 98.83% for the Cancer panel, while for the single cell percentages of sequence covered at coverage of more than 100× were 55.93% for the HID panel and 65.96% for the Cancer panel. Partial amplification failure or randomly distributed non-amplified regions across samples from single cells during the WGA procedures or random allele drop out probably caused these differences. However, comparative analyses showed that this method successfully discriminated a single A549 cancer cell from a bulk population of A549 cells. Thus, our approach provides a powerful means to overcome tumor sample heterogeneity when searching for somatic mutations.
A new complexity measure for time series analysis and classification

NASA Astrophysics Data System (ADS)

Nagaraj, Nithin; Balasubramanian, Karthi; Dey, Sutirth

2013-07-01

Complexity measures are used in a number of applications including extraction of information from data such as ecological time series, detection of non-random structure in biomedical signals, testing of random number generators, language recognition and authorship attribution. Different complexity measures proposed in the literature, like Shannon entropy, relative entropy, Lempel-Ziv, Kolmogorov and algorithmic complexity, are mostly ineffective in analyzing short sequences that are further corrupted with noise. To address this problem, we propose a new complexity measure ETC and define it as the "Effort To Compress" the input sequence by a lossless compression algorithm. Here, we employ the lossless compression algorithm known as Non-Sequential Recursive Pair Substitution (NSRPS) and define ETC as the number of iterations needed for NSRPS to transform the input sequence to a constant sequence. We demonstrate the utility of ETC in two applications. ETC is shown to have better correlation with Lyapunov exponent than Shannon entropy even with relatively short and noisy time series. The measure also has a greater rate of success in automatic identification and classification of short noisy sequences, compared to entropy and a popular measure based on Lempel-Ziv compression (implemented by Gzip).
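The definition of ETC above can be sketched in Python. This is a simplified reading of NSRPS and not the authors' implementation: at each iteration the most frequent adjacent pair is replaced, left to right without overlap, by a fresh symbol, and iteration stops once the sequence is constant or has length one. The integer symbol encoding and the tie-breaking rule (the pair seen first wins) are assumptions of this example.

```python
from collections import Counter

def nsrps_step(seq, new_symbol):
    """Replace the most frequent adjacent pair in seq with new_symbol."""
    pairs = Counter(zip(seq, seq[1:]))
    # max() returns the first pair with the highest count, so ties are
    # broken in favour of the pair encountered first (an assumption).
    target = max(pairs, key=pairs.get)
    out, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == target:
            out.append(new_symbol)
            i += 2
        else:
            out.append(seq[i])
            i += 1
    return out

def effort_to_compress(symbols):
    """Number of NSRPS iterations needed to reach a constant sequence.

    symbols must be a sequence of non-negative integers.
    """
    seq = list(symbols)
    next_symbol = max(seq, default=-1) + 1
    steps = 0
    while len(seq) > 1 and len(set(seq)) > 1:
        seq = nsrps_step(seq, next_symbol)
        next_symbol += 1
        steps += 1
    return steps

if __name__ == "__main__":
    print(effort_to_compress([0, 0, 0, 0]))              # already constant
    print(effort_to_compress([0, 1, 0, 1, 0, 1, 0, 1]))  # periodic
    print(effort_to_compress([1, 1, 0, 1, 0, 0, 1, 0]))  # less regular
```

In this sketch a constant sequence needs no iterations and a strictly periodic one needs very few, while less regular sequences take more, which matches the intuition behind using the iteration count as a complexity measure.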
Structure-Function Analysis of Chloroplast Proteins via Random Mutagenesis Using Error-Prone PCR.

PubMed

Dumas, Louis; Zito, Francesca; Auroy, Pascaline; Johnson, Xenie; Peltier, Gilles; Alric, Jean

2018-06-01

Site-directed mutagenesis of chloroplast genes was developed three decades ago and has greatly advanced the field of photosynthesis research. Here, we describe a new approach for generating random chloroplast gene mutants that combines error-prone polymerase chain reaction of a gene of interest with chloroplast complementation of the knockout Chlamydomonas reinhardtii mutant. As a proof of concept, we targeted a 300-bp sequence of the petD gene that encodes subunit IV of the thylakoid membrane-bound cytochrome b6f complex. By sequencing chloroplast transformants, we revealed 149 mutations in the 300-bp target petD sequence that resulted in 92 amino acid substitutions in the 100-residue target subunit IV sequence. Our results show that this method is suited to the study of highly hydrophobic, multisubunit, and chloroplast-encoded proteins containing cofactors such as hemes, iron-sulfur clusters, and chlorophyll pigments. Moreover, we show that mutant screening and sequencing can be used to study photosynthetic mechanisms or to probe the mutational robustness of chloroplast-encoded proteins, and we propose that this method is a valuable tool for the directed evolution of enzymes in the chloroplast. © 2018 American Society of Plant Biologists. All rights reserved.
Rapid Quantification of Mutant Fitness in Diverse Bacteria by Sequencing Randomly Bar-Coded Transposons

PubMed Central

Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth

2015-01-01

Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative d-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. PMID:25968644
ECB deacylase mutants

DOEpatents

Arnold, Frances H.; Shao, Zhixin; Zhao, Huimin; Giver, Lorraine J.

2002-01-01

A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Population and performance analyses of four major populations with Illumina's FGx Forensic Genomics System.

PubMed

Churchill, Jennifer D; Novroski, Nicole M M; King, Jonathan L; Seah, Lay Hong; Budowle, Bruce

2017-09-01

The MiSeq FGx Forensic Genomics System (Illumina) enables amplification and massively parallel sequencing of 59 STRs, 94 identity informative SNPs, 54 ancestry informative SNPs, and 24 phenotypic informative SNPs. Allele frequency and population statistics data were generated for the 172 SNP loci included in this panel on four major population groups (Chinese, African Americans, US Caucasians, and Southwest Hispanics). Single-locus and combined random match probability values were generated for the identity informative SNPs. The average combined STR and identity informative SNP random match probabilities (assuming independence) across all four populations were 1.75E-67 and 2.30E-71 with length-based and sequence-based STR alleles, respectively. Ancestry and phenotype predictions were obtained using the ForenSeq™ Universal Analysis System (UAS; Illumina) based on the ancestry informative and phenotype informative SNP profiles generated for each sample. Additionally, performance metrics, including profile completeness, read depth, relative locus performance, and allele coverage ratios, were evaluated and detailed for the 725 samples included in this study. While some genetic markers included in this panel performed notably better than others, performance across populations was generally consistent. The performance and population data included in this study support that accurate and reliable profiles were generated and provide valuable background information for laboratories considering internal validation studies and implementation. Copyright © 2017 Elsevier B.V. All rights reserved.
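The combined random match probability "assuming independence" mentioned above is a product of single-locus values. The following is only a minimal sketch of that arithmetic, not the study's pipeline: it assumes biallelic SNPs in Hardy-Weinberg proportions, takes the single-locus match probability as the sum of squared genotype frequencies, and uses made-up allele frequencies. The function names are hypothetical.

```python
from math import prod


def locus_match_probability(p):
    """Chance that two unrelated people share a genotype at one biallelic SNP.

    Assumes Hardy-Weinberg proportions for an allele of frequency p.
    """
    q = 1.0 - p
    genotype_freqs = (p * p, 2 * p * q, q * q)
    # Two random individuals match if both carry the same genotype.
    return sum(g * g for g in genotype_freqs)


def combined_match_probability(allele_freqs):
    """Multiply single-locus values, which is valid only if loci are independent."""
    return prod(locus_match_probability(p) for p in allele_freqs)


if __name__ == "__main__":
    # Hypothetical allele frequencies, for illustration only.
    example_freqs = [0.5, 0.4, 0.3, 0.45, 0.35]
    print(combined_match_probability(example_freqs))
```

Each added locus multiplies in a factor below 1, which is how a panel of many markers can reach the very small combined values quoted in the abstract.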
RAD tag sequencing as a source of SNP markers in Cynara cardunculus L

PubMed Central

2012-01-01

Background The globe artichoke (Cynara cardunculus L. var. scolymus) genome is relatively poorly explored, especially compared to those of the other major Asteraceae crops sunflower and lettuce. No SNP markers are in the public domain. We have combined the recently developed restriction-site associated DNA (RAD) approach with the Illumina DNA sequencing platform to effect the rapid and mass discovery of SNP markers for C. cardunculus. Results RAD tags were sequenced from the genomic DNA of three C. cardunculus mapping population parents, generating 9.7 million reads, corresponding to ~1 Gbp of sequence. An assembly based on paired ends produced ~6.0 Mbp of genomic sequence, separated into ~19,000 contigs (mean length 312 bp), of which ~21% were fragments of putative coding sequence. The shared sequences allowed for the discovery of ~34,000 SNPs and nearly 800 indels, equivalent to a SNP frequency of 5.6 per 1,000 nt, and an indel frequency of 0.2 per 1,000 nt. A sample of heterozygous SNP loci was mapped by CAPS assays and this exercise provided validation of our mining criteria. The repetitive fraction of the genome had a high representation of retrotransposon sequence, followed by simple repeats, AT-low complexity regions and mobile DNA elements. The genomic k-mer distribution and CpG rate of C. cardunculus, compared with data derived from three whole-genome-sequenced dicot species, provided further evidence of the random representation of the C. cardunculus genome generated by RAD sampling. Conclusion The RAD tag sequencing approach is a cost-effective and rapid method to develop SNP markers in a highly heterozygous species. Our approach generated a large and robust SNP dataset through the adoption of optimized filtering criteria. PMID:22214349
The Color-Word Interference Test and Its Relation to Performance Impairment under Auditory Distraction.

ERIC Educational Resources Information Center

Thackray, Richard I.; And Others

The ability to resist distraction is an important requirement for air traffic controllers. The study examined the relationship between performance on the Stroop color-word interference test (a suggested measure of distraction susceptibility) and impairment under auditory distraction on a task requiring the subject to generate random sequences of…
Studying long 16S rDNA sequences with ultrafast-metagenomic sequence classification using exact alignments (Kraken).

PubMed

Valenzuela-González, Fabiola; Martínez-Porchas, Marcel; Villalpando-Canchola, Enrique; Vargas-Albores, Francisco

2016-03-01

Ultrafast-metagenomic sequence classification using exact alignments (Kraken) is a novel approach to classify 16S rDNA sequences. The classifier is based on mapping short sequences to the lowest ancestor and performing alignments to form subtrees with specific weights in each taxon node. This study aimed to evaluate the classification performance of Kraken with long 16S rDNA random environmental sequences produced by cloning and then Sanger sequencing. A total of 480 clones were isolated and expanded, and 264 of these clones formed contigs (1352 ± 153 bp). The same sequences were analyzed using the Ribosomal Database Project (RDP) classifier. Deeper classification performance was achieved by Kraken than by the RDP: 73% of the contigs were classified up to the species or variety levels, whereas 67% of these contigs were classified no further than the genus level by the RDP. The results also demonstrated that unassembled sequences analyzed by Kraken provide similar or even deeper information. Moreover, sequences that did not form contigs, which are usually discarded by other programs, provided meaningful information when analyzed by Kraken. Finally, it appears that the assembly step for Sanger sequences can be eliminated when using Kraken. Kraken accumulates the information from both sequence senses, providing additional elements for the classification. In conclusion, the results demonstrate that Kraken is an excellent choice for use in the taxonomic assignment of sequences obtained by Sanger sequencing or based on third generation sequencing, of which the main goal is to generate larger sequences. Copyright © 2016 Elsevier B.V. All rights reserved.
Percolation in random-Sierpiński carpets: A real space renormalization group approach

NASA Astrophysics Data System (ADS)

Perreau, Michel; Peiro, Joaquina; Berthier, Serge

1996-11-01

The site percolation transition in random Sierpiński carpets is investigated by real space renormalization. Unlike in regular translationally invariant lattices, the fixed point is not unique but depends on the number k of segmentation steps in the generation process of the fractal. It is shown that, for each scale invariance ratio n, the sequence of fixed points p_{n,k} increases with k and converges as k → ∞ toward a limit p_n strictly less than 1. Moreover, in such scale invariant structures, the percolation threshold depends not only on the scale invariance ratio n but also on the scale. The sequences p_{n,k} and the limits p_n are calculated for n = 4, 8, 16, 32, and 64, for k = 1 to k = 11, and for k = ∞. The corresponding thermal exponent sequence ν_{n,k} is calculated for n = 8 and 16, for k = 1 to k = 5, and for k = ∞. Suggestions are made for an experimental test in physical self-similar structures.
Pulse Compression Techniques for Laser Generated Ultrasound

NASA Technical Reports Server (NTRS)

Anastasi, R. F.; Madaras, E. I.

1999-01-01

Laser generated ultrasound for nondestructive evaluation has an optical power density limit because rapid, intense heating causes material damage. This damage threshold limits the generated ultrasound amplitude, which impacts nondestructive evaluation inspection capability. To increase ultrasound signal levels and improve the ultrasound signal-to-noise ratio without exceeding laser power limitations, it is possible to use pulse compression techniques. The approach illustrated here uses a 150 mW laser diode modulated with a pseudo-random sequence and signal correlation. Results demonstrate the successful generation of ultrasonic bulk waves in aluminum and graphite-epoxy composite materials using a modulated low-power laser diode and illustrate ultrasound bandwidth control.
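The pulse compression idea above, a low-power source modulated with a pseudo-random sequence and recovered by correlation, can be sketched in a few lines. This is an illustrative simulation under stated assumptions, not the authors' setup: the 127-chip maximal-length sequence, the echo delay, the amplitude, and the noise level are all hypothetical choices, and the function names are invented for this example.

```python
import random


def m_sequence():
    """127-chip maximal-length sequence from a 7-bit LFSR (x^7 + x + 1), as +/-1 chips."""
    state = [1, 0, 0, 0, 0, 0, 0]  # any non-zero start state works
    chips = []
    for _ in range(127):
        chips.append(1 if state[-1] else -1)
        feedback = state[-1] ^ state[-2]
        state = [feedback] + state[:-1]
    return chips


def simulate_echo(code, delay, amplitude, noise_sigma, length, rng):
    """Noisy record containing one weak, delayed copy of the code."""
    received = [rng.gauss(0.0, noise_sigma) for _ in range(length)]
    for i, chip in enumerate(code):
        received[delay + i] += amplitude * chip
    return received


def cross_correlate(received, code):
    """Sliding dot product of the record with the known code."""
    n_lags = len(received) - len(code) + 1
    return [sum(received[lag + i] * c for i, c in enumerate(code))
            for lag in range(n_lags)]


def estimate_delay(received, code):
    """Lag at which the correlation peaks."""
    corr = cross_correlate(received, code)
    return max(range(len(corr)), key=lambda lag: corr[lag])


if __name__ == "__main__":
    rng = random.Random(0)
    code = m_sequence()
    # Hypothetical scenario: per-sample echo amplitude equals the noise level.
    record = simulate_echo(code, delay=50, amplitude=0.5,
                           noise_sigma=0.5, length=400, rng=rng)
    print("estimated delay:", estimate_delay(record, code))
```

Correlation sums the echo coherently over all 127 chips while the noise adds incoherently, so the peak stands out even though no single sample does. That is the sense in which a long, weak code stands in for a short, intense pulse.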
A blackberry (Rubus L.) expressed sequence tag library for the development of simple sequence repeat markers

PubMed Central

Lewers, Kim S; Saski, Chris A; Cuthbertson, Brandon J; Henry, David C; Staton, Meg E; Main, Dorrie S; Dhanaraj, Anik L; Rowland, Lisa J; Tomkins, Jeff P

2008-01-01

Background The recent development of novel repeat-fruiting types of blackberry (Rubus L.) cultivars, combined with a long history of morphological marker-assisted selection for thornlessness by blackberry breeders, has given rise to increased interest in using molecular markers to facilitate blackberry breeding. Yet no genetic maps, molecular markers, or even sequences exist specifically for cultivated blackberry. The purpose of this study is to begin development of these tools by generating and annotating the first blackberry expressed sequence tag (EST) library, designing primers from the ESTs to amplify regions containing simple sequence repeats (SSR), and testing the usefulness of a subset of the EST-SSRs with two blackberry cultivars. Results A cDNA library of 18,432 clones was generated from expanding leaf tissue of the cultivar Merton Thornless, a progenitor of many thornless commercial cultivars. Among the most abundantly expressed of the 3,000 genes annotated were those involved with energy, cell structure, and defense. From individual sequences containing SSRs, 673 primer pairs were designed. Of a randomly chosen set of 33 primer pairs tested with two blackberry cultivars, 10 detected an average of 1.9 polymorphic PCR products. Conclusion This rate predicts that this library may yield as many as 940 SSR primer pairs detecting 1,786 polymorphisms. This may be sufficient to generate a genetic map that can be used to associate molecular markers with phenotypic traits, making possible molecular marker-assisted breeding to complement existing morphological marker-assisted breeding in blackberry. PMID:18570660
Genarris: Random generation of molecular crystal structures and fast screening with a Harris approximation

NASA Astrophysics Data System (ADS)

Li, Xiayue; Curtis, Farren S.; Rose, Timothy; Schober, Christoph; Vazquez-Mayagoitia, Alvaro; Reuter, Karsten; Oberhofer, Harald; Marom, Noa

2018-06-01

We present Genarris, a Python package that performs configuration space screening for molecular crystals of rigid molecules by random sampling with physical constraints. For fast energy evaluations, Genarris employs a Harris approximation, whereby the total density of a molecular crystal is constructed via superposition of single molecule densities. Dispersion-inclusive density functional theory is then used for the Harris density without performing a self-consistency cycle. Genarris uses machine learning for clustering, based on a relative coordinate descriptor developed specifically for molecular crystals, which is shown to be robust in identifying packing motif similarity. In addition to random structure generation, Genarris offers three workflows based on different sequences of successive clustering and selection steps: the "Rigorous" workflow is an exhaustive exploration of the potential energy landscape, the "Energy" workflow produces a set of low energy structures, and the "Diverse" workflow produces a maximally diverse set of structures. The latter is recommended for generating initial populations for genetic algorithms. Here, the implementation of Genarris is reported and its application is demonstrated for three test cases.
On the joint spectral density of bivariate random sequences. Thesis Technical Report No. 21

NASA Technical Reports Server (NTRS)

Aalfs, David D.

1995-01-01

For univariate random sequences, the power spectral density acts like a probability density function of the frequencies present in the sequence. This dissertation extends that concept to bivariate random sequences. For this purpose, a function called the joint spectral density is defined that represents a joint probability weighting of the frequency content of pairs of random sequences. Given a pair of random sequences, the joint spectral density is not uniquely determined in the absence of any constraints. Two approaches to constraining the sequences are suggested: (1) assume the sequences are the margins of some stationary random field, (2) assume the sequences conform to a particular model that is linked to the joint spectral density. For both approaches, the properties of the resulting sequences are investigated in some detail, and simulation is used to corroborate theoretical results. It is concluded that under either of these two constraints, the joint spectral density can be computed from the non-stationary cross-correlation.
Annotation-based genome-wide SNP discovery in the large and complex Aegilops tauschii genome using next-generation sequencing without a reference genome sequence

PubMed Central

2011-01-01

Background Many plants have large and complex genomes with an abundance of repeated sequences. Many plants are also polyploid. Both of these attributes typify the genome architecture in the tribe Triticeae, whose members include economically important wheat, rye and barley. Large genome sizes, an abundance of repeated sequences, and polyploidy present challenges to genome-wide SNP discovery using next-generation sequencing (NGS) of total genomic DNA by making alignment and clustering of short reads generated by the NGS platforms difficult, particularly in the absence of a reference genome sequence. Results An annotation-based, genome-wide SNP discovery pipeline is reported using NGS data for large and complex genomes without a reference genome sequence. Roche 454 shotgun reads with low genome coverage of one genotype are annotated in order to distinguish single-copy sequences and repeat junctions from repetitive sequences and sequences shared by paralogous genes. Multiple genome equivalents of shotgun reads of another genotype generated with SOLiD or Solexa are then mapped to the annotated Roche 454 reads to identify putative SNPs. A pipeline program package, AGSNP, was developed and used for genome-wide SNP discovery in Aegilops tauschii, the diploid source of the wheat D genome, which has a genome size of 4.02 Gb, of which 90% is repetitive sequence. Genomic DNA of Ae. tauschii accession AL8/78 was sequenced with the Roche 454 NGS platform. Genomic DNA and cDNA of Ae. tauschii accession AS75 were sequenced primarily with SOLiD, although some Solexa and Roche 454 genomic sequences were also generated. A total of 195,631 putative SNPs were discovered in gene sequences, 155,580 putative SNPs were discovered in uncharacterized single-copy regions, and another 145,907 putative SNPs were discovered in repeat junctions. These SNPs were dispersed across the entire Ae. tauschii genome.
To assess the false positive SNP discovery rate, DNA containing putative SNPs was amplified by PCR from AL8/78 and AS75 and resequenced with the ABI 3730 xl. In a sample of 302 randomly selected putative SNPs, 84.0% in gene regions, 88.0% in repeat junctions, and 81.3% in uncharacterized regions were validated. Conclusion An annotation-based genome-wide SNP discovery pipeline for NGS platforms was developed. The pipeline is suitable for SNP discovery in genomic libraries of complex genomes and does not require a reference genome sequence. The pipeline is applicable to all current NGS platforms, provided that at least one such platform generates relatively long reads. The pipeline package, AGSNP, and the discovered 497,118 Ae. tauschii SNPs can be accessed at (http://avena.pw.usda.gov/wheatD/agsnp.shtml). PMID:21266061
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum

DOE Office of Scientific and Technical Information (OSTI.GOV)

VanBuren, Robert; Bryant, Doug; Edger, Patrick P.

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. As a result, the Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
Single-molecule sequencing of the desiccation-tolerant grass Oropetium thomaeum

DOE PAGES

VanBuren, Robert; Bryant, Doug; Edger, Patrick P.; ...

2015-11-11

Plant genomes, and eukaryotic genomes in general, are typically repetitive, polyploid and heterozygous, which complicates genome assembly. The short read lengths of early Sanger and current next-generation sequencing platforms hinder assembly through complex repeat regions, and many draft and reference genomes are fragmented, lacking skewed GC and repetitive intergenic sequences, which are gaining importance due to projects like the Encyclopedia of DNA Elements (ENCODE). Here we report the whole-genome sequencing and assembly of the desiccation-tolerant grass Oropetium thomaeum. Using only single-molecule real-time sequencing, which generates long (>16 kilobases) reads with random errors, we assembled 99% (244 megabases) of the Oropetium genome into 625 contigs with an N50 length of 2.4 megabases. Oropetium is an example of a ‘near-complete’ draft genome which includes gapless coverage over gene space as well as intergenic sequences such as centromeres, telomeres, transposable elements and rRNA clusters that are typically unassembled in draft genomes. Oropetium has 28,466 protein-coding genes and 43% repeat sequences, yet with 30% more compact euchromatic regions it is the smallest known grass genome. As a result, the Oropetium genome demonstrates the utility of single-molecule real-time sequencing for assembling high-quality plant and other eukaryotic genomes, and serves as a valuable resource for the plant comparative genomics community.
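The N50 length quoted above is a standard contiguity statistic: the contig length at which the longest contigs, taken in descending order, first cover at least half of the assembled bases. A minimal sketch of that definition follows; the contig lengths are made up for illustration and are not from the Oropetium assembly.

```python
def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("n50 is undefined for an empty assembly")


if __name__ == "__main__":
    # Hypothetical contig lengths in base pairs.
    lengths = [2, 2, 2, 3, 3, 4, 8, 8]
    print(n50(lengths))
```

Because the statistic is weighted by bases rather than by contig count, a handful of very long contigs can dominate it, which is why it is usually reported alongside the number of contigs, as in the abstract.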
The primitive code and repeats of base oligomers as the primordial protein-encoding sequence.

PubMed Central

Ohno, S; Epplen, J T

1983-01-01

Even if the prebiotic self-replication of nucleic acids and the subsequent emergence of primitive, enzyme-independent tRNAs are accepted as plausible, the origin of life by spontaneous generation still appears improbable. This is because the just-emerged primitive translational machinery had to cope with base sequences that were not preselected for their coding potentials. Particularly if the primitive mitochondria-like code with four chain-terminating base triplets preceded the universal code, the translation of long, randomly generated, base sequences at this critical stage would have merely resulted in the production of short oligopeptides instead of long polypeptide chains. We present the base sequence of a mouse transcript containing tetranucleotide repeats conserved during evolution. Even if translated in accordance with the primitive mitochondria-like code, this transcript in its three reading frames can yield 245-, 246-, and 251-residue-long tetrapeptidic periodical polypeptides that are already acquiring longer periodicities. We contend that the first set of base sequences translated at the beginning of life were such oligonucleotide repeats. By quickly acquiring longer periodicities, their products must have soon gained characteristic secondary structures--alpha-helical or beta-sheet or both. PMID:6574491
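The claim that random base sequences would yield only short oligopeptides under a code with four chain-terminating triplets can be checked with simple arithmetic. The sketch below assumes independent, equiprobable bases, which the abstract does not state. Under that assumption each codon is a stop with probability 4/64, so the number of sense codons read before the first stop is geometric with mean (1 - p)/p = 15. The stop set used in the simulation is the vertebrate mitochondrial one (TAA, TAG, AGA, AGG), chosen here only as a stand-in for the "mitochondria-like" code; the function names are hypothetical.

```python
import random


def expected_sense_run(n_stop, n_codons=64):
    """Mean number of sense codons before the first stop, for equiprobable codons."""
    p_stop = n_stop / n_codons
    return (1 - p_stop) / p_stop


def simulated_sense_run(stop_codons, n_trials, rng):
    """Monte Carlo estimate of the same mean on random base sequences."""
    total = 0
    for _ in range(n_trials):
        run = 0
        while True:
            codon = "".join(rng.choice("ACGT") for _ in range(3))
            if codon in stop_codons:
                break
            run += 1
        total += run
    return total / n_trials


if __name__ == "__main__":
    # Assumed stand-in for the four-stop code discussed in the abstract.
    mito_like_stops = {"TAA", "TAG", "AGA", "AGG"}
    print("analytic mean, 4 stops:", expected_sense_run(4))
    print("analytic mean, 3 stops:", expected_sense_run(3))
    print("simulated mean, 4 stops:",
          simulated_sense_run(mito_like_stops, 20000, random.Random(1)))
```

A mean of about 15 residues per random reading frame is far short of the 245- to 251-residue products the authors report for the repeat-containing transcript, which is the contrast their argument rests on.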

Fast registration and reconstruction of aliased low-resolution frames by use of a modified maximum-likelihood approach.

PubMed

Alam, M S; Bognar, J G; Cain, S; Yasuda, B J

1998-03-10

During the process of microscanning a controlled vibrating mirror typically is used to produce subpixel shifts in a sequence of forward-looking infrared (FLIR) images. If the FLIR is mounted on a moving platform, such as an aircraft, uncontrolled random vibrations associated with the platform can be used to generate the shifts. Iterative techniques such as the expectation-maximization (EM) approach by means of the maximum-likelihood algorithm can be used to generate high-resolution images from multiple randomly shifted aliased frames. In the maximum-likelihood approach the data are considered to be Poisson random variables and an EM algorithm is developed that iteratively estimates an unaliased image that is compensated for known imager-system blur while it simultaneously estimates the translational shifts. Although this algorithm yields high-resolution images from a sequence of randomly shifted frames, it requires significant computation time and cannot be implemented for real-time applications that use the currently available high-performance processors. The new image shifts are iteratively calculated by evaluation of a cost function that compares the shifted and interlaced data frames with the corresponding values in the algorithm's latest estimate of the high-resolution image. We present a registration algorithm that estimates the shifts in one step. The shift parameters provided by the new algorithm are accurate enough to eliminate the need for iterative recalculation of translational shifts. Using this shift information, we apply a simplified version of the EM algorithm to estimate a high-resolution image from a given sequence of video frames. The proposed modified EM algorithm has been found to reduce significantly the computational burden when compared with the original EM algorithm, thus making it more attractive for practical implementation. Both simulation and experimental results are presented to verify the effectiveness of the proposed technique.
The nonlinear, complex sequential organization of behavior in schizophrenic patients: neurocognitive strategies and clinical correlations.

PubMed

Paulus, M P; Perry, W; Braff, D L

1999-09-01

Thought disorder is a hallmark of schizophrenia and can be inferred from disorganized behavior. Measures of the sequential organization of behavior are important because they reflect the cognitive processes of the selection and sequencing of behavioral elements, which generate observable and analyzable behavioral patterns. In this context, sequences of choices generated by schizophrenic patients in a two-choice guessing task fluctuate significantly, which reflects an "oscillating dysregulation" between highly predictable and highly unpredictable subsequences within a single test session. In this study, we aimed to clarify the significance of dysregulation by seeing whether demographic, clinical, neuropsychological, and psychological measures predict the degree of dysregulation observed on this two-choice task. Thirty schizophrenic patients repeatedly performed a LEFT or RIGHT key press that was followed by a stimulus, which occurred randomly on the left or right side of the computer screen. Thus, the stimulus location had nothing to do with the key press behavior. The range of key press sequence predictabilities as measured by the dynamical entropy was used to quantify the dysregulation of response sequences and reflects the range of fixity and randomness of the responses. A factor analysis was performed and step-wise multiple regression analyses were used to relate the factor scores to demographic, clinical, symptomatic, Wisconsin Card Sorting Test (WCST), and Rorschach variables. The LEFT/RIGHT key press sequences were determined by three factors: 1) the degree of win-stay/lose-shift strategy; 2) the degree of contextual influence on the current choice; and 3) the degree of dysregulation on the choice task. Demographic and clinical variables did not predict any of the three response patterns on the choice task. In contrast, the WCST and Rorschach test predicted performance on various factors of choice task response patterns. 
Schizophrenic patients employ several rules, i.e., "win-stay/lose-shift" and "decide according to the previous choice," that fluctuate significantly when generating sequences on this task, confirming that a basic behavioral dysregulation occurs in a single schizophrenic subject across a single test session. The organization or the "temporal architecture" of the behavioral sequences is not related to symptoms per se, but is related to deficits in executive functioning, problem solving, and perceptual organizational abilities.
The Status, Quality, and Expansion of the NIH Full-Length cDNA Project: The Mammalian Gene Collection (MGC)

PubMed Central

2004-01-01

The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5′-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline. PMID:15489334
MDC-Analyzer: a novel degenerate primer design tool for the construction of intelligent mutagenesis libraries with contiguous sites.

PubMed

Tang, Lixia; Wang, Xiong; Ru, Beibei; Sun, Hengfei; Huang, Jian; Gao, Hui

2014-06-01

Recent computational and bioinformatics advances have enabled the efficient creation of novel biocatalysts by reducing amino acid variability at hot spot regions. To further expand the utility of this strategy, we present here a tool called Multi-site Degenerate Codon Analyzer (MDC-Analyzer) for the automated design of intelligent mutagenesis libraries that can completely cover user-defined randomized sequences, especially when multiple contiguous and/or adjacent sites are targeted. By initially defining an objective function, the possible optimal degenerate PCR primer profiles could be automatically explored using the heuristic approach of Greedy Best-First-Search. Compared to the previously developed DC-Analyzer, MDC-Analyzer allows for the existence of a small amount of undesired sequences as a tradeoff between the number of degenerate primers and the encoded library size while still providing all the benefits of DC-Analyzer with the ability to randomize multiple contiguous sites. MDC-Analyzer was validated using a series of randomly generated mutation schemes and experimental case studies on the evolution of halohydrin dehalogenase, which proved that the MDC methodology is more efficient than other methods and is particularly well-suited to exploring the sequence space of proteins using data-driven protein engineering strategies.
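The tradeoff described above, between the number of degenerate primers, the encoded library size, and undesired sequences, comes from how a degenerate codon expands into concrete codons. The sketch below is not MDC-Analyzer's algorithm (the abstract gives no implementation details beyond the greedy best-first search). It only expands one IUPAC degenerate codon under the standard genetic code and reports what it encodes; the function names are hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered with bases T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def expand_degenerate_codon(codon):
    """List every concrete codon matched by a three-letter IUPAC codon."""
    return ["".join(bases) for bases in product(*(IUPAC[ch] for ch in codon.upper()))]


def summarize(codon):
    """Return (number of codons, sorted amino acids encoded, number of stop codons)."""
    codons = expand_degenerate_codon(codon)
    residues = [CODON_TABLE[c] for c in codons]
    amino_acids = sorted(set(residues) - {"*"})
    return len(codons), amino_acids, residues.count("*")


if __name__ == "__main__":
    for degenerate in ("NNN", "NNK", "NDT"):
        size, amino_acids, stops = summarize(degenerate)
        print(degenerate, size, "".join(amino_acids), "stops:", stops)
```

Comparing NNN with NNK shows the kind of choice such a tool has to make: both cover all 20 amino acids, but NNK needs half as many codons and admits a single stop codon instead of three.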
Iteration and superposition encryption scheme for image sequences based on multi-dimensional keys

NASA Astrophysics Data System (ADS)

Han, Chao; Shen, Yuzhen; Ma, Wenlin

2017-12-01

An iteration and superposition encryption scheme for image sequences based on multi-dimensional keys is proposed for high-security, high-capacity, and low-noise information transmission. The multiple images to be encrypted are transformed into phase-only images with an iterative algorithm, and each is then encrypted with a different random phase. Each encrypted phase-only image is inverse Fourier transformed, generating a new object function. The new functions are placed in different blocks and zero-padded to give a sparse distribution; they then propagate to a specific region over different distances by angular spectrum diffraction and are superposed to form a single image. The single image is multiplied by a random phase in the frequency domain, and then the phase part of the frequency spectrum is truncated and the amplitude information is retained. The random phases, the propagation distances, and the truncated phase information in the frequency domain are employed as multi-dimensional keys. The iteration processing and sparse distribution greatly reduce the crosstalk among the multiple encrypted images. The superposition of image sequences greatly improves the capacity of encrypted information. Several numerical experiments based on a designed optical system demonstrate that the proposed scheme can enhance encrypted information capacity and achieve a high level of security for image transmission.
Quantum cryptography for secure free-space communications

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hughes, R.J.; Buttler, W.T.; Kwiat, P.G.

1999-03-01

The secure distribution of the secret random bit sequences known as key material is an essential precursor to their use for the encryption and decryption of confidential communications. Quantum cryptography is a new technique for secure key distribution with single-photon transmissions: Heisenberg's uncertainty principle ensures that an adversary can neither successfully tap the key transmissions, nor evade detection (eavesdropping raises the key error rate above a threshold value). The authors have developed experimental quantum cryptography systems based on the transmission of non-orthogonal photon polarization states to generate shared key material over line-of-sight optical links. Key material is built up using the transmission of a single photon per bit of an initial secret random sequence. A quantum-mechanically random subset of this sequence is identified, becoming the key material after a data reconciliation stage with the sender. The authors have developed and tested a free-space quantum key distribution (QKD) system over an outdoor optical path of approximately 1 km at Los Alamos National Laboratory under nighttime conditions. Results show that free-space QKD can provide secure real-time key distribution between parties who have a need to communicate secretly. Finally, they examine the feasibility of surface-to-satellite QKD.
Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods

PubMed Central

Dröge, J.; Gregor, I.; McHardy, A. C.

2015-01-01

Motivation: Metagenomics characterizes microbial communities by random shotgun sequencing of DNA isolated directly from an environment of interest. An essential step in computational metagenome analysis is taxonomic sequence assignment, which allows identifying the sequenced community members and reconstructing taxonomic bins with sequence data for the individual taxa. For the massive datasets generated by next-generation sequencing technologies, this cannot be performed with de-novo phylogenetic inference methods. We describe an algorithm and the accompanying software, taxator-tk, which performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities. Results: Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with 10 CPU cores and microbial RefSeq as the genomic reference data. Availability and implementation: Taxator-tk source and binary program files are publicly available at http://algbio.cs.uni-duesseldorf.de/software/. Contact: Alice.McHardy@uni-duesseldorf.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25388150
Neutrality and evolvability of designed protein sequences

NASA Astrophysics Data System (ADS)

Bhattacherjee, Arnab; Biswas, Parbati

2010-07-01

The effect of foldability on protein’s evolvability is analyzed by a two-prong approach consisting of a self-consistent mean-field theory and Monte Carlo simulations. Theory and simulation models representing protein sequences with binary patterning of amino acid residues compatible with a particular foldability criterion are used. This generalized foldability criterion is derived using the high temperature cumulant expansion approximating the free energy of folding. The effect of cumulative point mutations on these designed proteins is studied under neutral condition. The robustness, the protein’s ability to tolerate random point mutations, is determined with a selective pressure of stability (ΔΔG) for the theory designed sequences, which are found to be more robust than Monte Carlo and mean-field-biased Monte Carlo generated sequences. The results show that this foldability criterion selects viable protein sequences more effectively compared to the Monte Carlo method, which has a marked effect on how the selective pressure shapes the evolutionary sequence space. These observations may impact de novo sequence design and its applications in protein engineering.
Novel pseudo-random number generator based on quantum random walks.

PubMed

Yang, Yu-Guang; Zhao, Qian-Qian

2016-02-04

In this paper, we investigate the potential application of quantum computation for constructing pseudo-random number generators (PRNGs) and further construct a novel PRNG based on quantum random walks (QRWs), a famous quantum computation model. The PRNG merely relies on the equations used in the QRWs, and thus the generation algorithm is simple and the computation speed is fast. The proposed PRNG is subjected to statistical tests such as NIST and successfully passed the test. Compared with the representative PRNG based on quantum chaotic maps (QCM), the present QRWs-based PRNG has some advantages such as better statistical complexity and recurrence. For example, the normalized Shannon entropy and the statistical complexity of the QRWs-based PRNG are 0.999699456771172 and 1.799961178212329e-04 respectively given the number of 8 bits-words, say, 16Mbits. By contrast, the corresponding values of the QCM-based PRNG are 0.999448131481064 and 3.701210794388818e-04 respectively. Thus the statistical complexity and the normalized entropy of the QRWs-based PRNG are closer to 0 and 1 respectively than those of the QCM-based PRNG when the number of words of the analyzed sequence increases. It provides a new clue to construct PRNGs and also extends the applications of quantum computation.
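The normalized Shannon entropy quoted above for 8-bit words can be illustrated with a short sketch. This is not the authors' code: the function name `normalized_shannon_entropy` is hypothetical, the input here is operating-system randomness rather than a QRW-based generator, and only the entropy (not the statistical complexity) is computed.

```python
import math
import os
from collections import Counter


def normalized_shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram divided by its maximum (8 bits).

    Assumption: "8 bits-words" means the sequence is read as bytes, so the
    alphabet has 256 symbols and the maximum entropy is log2(256) = 8.
    """
    if not data:
        raise ValueError("data must be non-empty")
    n = len(data)
    counts = Counter(data)  # occurrences of each byte value
    entropy_bits = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return entropy_bits / 8.0


if __name__ == "__main__":
    # Illustration only: OS randomness stands in for a PRNG under test.
    sample = os.urandom(1 << 16)
    print(normalized_shannon_entropy(sample))
```

A value close to 1 means the byte histogram is close to uniform; it says nothing by itself about correlations between successive words.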
Novel pseudo-random number generator based on quantum random walks

PubMed Central

Yang, Yu-Guang; Zhao, Qian-Qian

2016-01-01

In this paper, we investigate the potential application of quantum computation for constructing pseudo-random number generators (PRNGs) and further construct a novel PRNG based on quantum random walks (QRWs), a famous quantum computation model. The PRNG merely relies on the equations used in the QRWs, and thus the generation algorithm is simple and the computation speed is fast. The proposed PRNG is subjected to statistical tests such as NIST and successfully passed the test. Compared with the representative PRNG based on quantum chaotic maps (QCM), the present QRWs-based PRNG has some advantages such as better statistical complexity and recurrence. For example, the normalized Shannon entropy and the statistical complexity of the QRWs-based PRNG are 0.999699456771172 and 1.799961178212329e-04 respectively given the number of 8 bits-words, say, 16Mbits. By contrast, the corresponding values of the QCM-based PRNG are 0.999448131481064 and 3.701210794388818e-04 respectively. Thus the statistical complexity and the normalized entropy of the QRWs-based PRNG are closer to 0 and 1 respectively than those of the QCM-based PRNG when the number of words of the analyzed sequence increases. It provides a new clue to construct PRNGs and also extends the applications of quantum computation. PMID:26842402
Recurrent Network models of sequence generation and memory

PubMed Central

Rajan, Kanaka; Harvey, Christopher D; Tank, David W

2016-01-01

SUMMARY Sequential activation of neurons is a common feature of network activity during a variety of behaviors, including working memory and decision making. Previous network models for sequences and memory emphasized specialized architectures in which a principled mechanism is pre-wired into their connectivity. Here, we demonstrate that starting from random connectivity and modifying a small fraction of connections, a largely disordered recurrent network can produce sequences and implement working memory efficiently. We use this process, called Partial In-Network training (PINning), to model and match cellular-resolution imaging data from the posterior parietal cortex during a virtual memory-guided two-alternative forced choice task [Harvey, Coen and Tank, 2012]. Analysis of the connectivity reveals that sequences propagate by the cooperation between recurrent synaptic interactions and external inputs, rather than through feedforward or asymmetric connections. Together our results suggest that neural sequences may emerge through learning from largely unstructured network architectures. PMID:26971945
Markovian Analysis of the Sequential Behavior of the Spontaneous Spinal Cord Dorsum Potentials Induced by Acute Nociceptive Stimulation in the Anesthetized Cat.

PubMed

Martin, Mario; Béjar, Javier; Esposito, Gennaro; Chávez, Diógenes; Contreras-Hernández, Enrique; Glusman, Silvio; Cortés, Ulises; Rudomín, Pablo

2017-01-01

In a previous study we developed a Machine Learning procedure for the automatic identification and classification of spontaneous cord dorsum potentials (CDPs). This study further supported the proposal that in the anesthetized cat, the spontaneous CDPs recorded from different lumbar spinal segments are generated by a distributed network of dorsal horn neurons with structured (non-random) patterns of functional connectivity and that these configurations can be changed to other non-random and stable configurations after the nociceptive stimulation produced by the intradermic injection of capsaicin in the anesthetized cat. Here we present a study showing that the sequence of identified forms of the spontaneous CDPs follows a Markov chain of at least order one. That is, the system has memory in the sense that the spontaneous activation of dorsal horn neuronal ensembles producing the CDPs is not independent of the most recent activity. We used this Markovian property to build a procedure to identify portions of signals as belonging to a specific functional state of connectivity among the neuronal networks involved in the generation of the CDPs. We have tested this procedure during acute nociceptive stimulation produced by the intradermic injection of capsaicin in intact as well as spinalized preparations. Altogether, our results indicate that CDP sequences cannot be generated by a renewal stochastic process. Moreover, it is possible to describe some functional features of activity in the cord dorsum by modeling the CDP sequences as generated by a Markov order one stochastic process. Finally, these Markov models make it possible to determine the functional state which produced a CDP sequence. The proposed identification procedures appear to be useful for the analysis of the sequential behavior of the ongoing CDPs recorded from different spinal segments in response to a variety of experimental procedures including the changes produced by acute nociceptive stimulation. They are envisaged as a useful tool to examine alterations of the patterns of functional connectivity between dorsal horn neurons under normal and different pathological conditions, an issue of potential clinical concern.
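To make the "Markov chain of order one" idea concrete, the sketch below estimates first-order transition probabilities from a sequence of class labels. It is an illustration under assumptions, not the study's procedure: the function name `transition_probabilities` and the two-letter label sequence are hypothetical stand-ins for classified CDP shapes.

```python
from collections import Counter, defaultdict


def transition_probabilities(sequence):
    """Estimate P(next label | current label) from consecutive pairs."""
    pair_counts = Counter(zip(sequence, sequence[1:]))
    totals = defaultdict(int)
    for (current, _next_label), count in pair_counts.items():
        totals[current] += count  # how often `current` is followed by anything
    return {
        (current, next_label): count / totals[current]
        for (current, next_label), count in pair_counts.items()
    }


if __name__ == "__main__":
    # Hypothetical label sequence; real input would be the classified CDP forms.
    labels = list("AABABBBAAB")
    for (current, next_label), p in sorted(transition_probabilities(labels).items()):
        print(f"P({next_label} | {current}) = {p:.2f}")
```

If successive labels were independent, the estimated P(next | current) would be roughly the same for every current label; clear differences between rows are what a first-order model captures.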
Design automation techniques for custom LSI arrays

NASA Technical Reports Server (NTRS)

Feller, A.

1975-01-01

The standard cell design automation technique is described as an approach for generating random logic PMOS, CMOS or CMOS/SOS custom large scale integration arrays with low initial nonrecurring costs and quick turnaround time or design cycle. The system is composed of predesigned circuit functions or cells and computer programs capable of automatic placement and interconnection of the cells in accordance with an input data net list. The program generates a set of instructions to drive an automatic precision artwork generator. A series of support design automation and simulation programs are described, including programs for verifying correctness of the logic on the arrays, performing dc and dynamic analysis of MOS devices, and generating test sequences.
Multi-site Stochastic Simulation of Daily Streamflow with Markov Chain and KNN Algorithm

NASA Astrophysics Data System (ADS)

Mathai, J.; Mujumdar, P.

2017-12-01

A key focus of this study is to develop a method which is physically consistent with the hydrologic processes that can capture short-term characteristics of daily hydrograph as well as the correlation of streamflow in temporal and spatial domains. In complex water resource systems, flow fluctuations at small time intervals require that discretisation be done at small time scales such as daily scales. Also, simultaneous generation of synthetic flows at different sites in the same basin are required. We propose a method to equip water managers with a streamflow generator within a stochastic streamflow simulation framework. The motivation for the proposed method is to generate sequences that extend beyond the variability represented in the historical record of streamflow time series. The method has two steps: In step 1, daily flow is generated independently at each station by a two-state Markov chain, with rising limb increments randomly sampled from a Gamma distribution and the falling limb modelled as exponential recession and in step 2, the streamflow generated in step 1 is input to a nonparametric K-nearest neighbor (KNN) time series bootstrap resampler. The KNN model, being data driven, does not require assumptions on the dependence structure of the time series. A major limitation of KNN based streamflow generators is that they do not produce new values, but merely reshuffle the historical data to generate realistic streamflow sequences. However, daily flow generated using the Markov chain approach is capable of generating a rich variety of streamflow sequences. Furthermore, the rising and falling limbs of daily hydrograph represent different physical processes, and hence they need to be modelled individually. Thus, our method combines the strengths of the two approaches. We show the utility of the method and improvement over the traditional KNN by simulating daily streamflow sequences at 7 locations in the Godavari River basin in India.
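Step 1 of the method described above (a two-state Markov chain with Gamma-distributed rising increments and exponential recession) can be sketched for a single site as follows. Every parameter value, the function name `simulate_daily_flow`, and the way the state is updated are assumptions for illustration; the KNN bootstrap resampler of step 2 is not shown.

```python
import math
import random


def simulate_daily_flow(n_days, p_rise_after_rise=0.4, p_rise_after_fall=0.2,
                        gamma_shape=2.0, gamma_scale=5.0, recession_k=0.1,
                        initial_flow=10.0, seed=None):
    """Generate a synthetic daily flow series at one site.

    The chain has two states, rising and falling. On a rising day the flow
    grows by a Gamma-distributed increment; on a falling day it decays as
    flow * exp(-recession_k). All default values are illustrative only.
    """
    rng = random.Random(seed)
    flow = initial_flow
    rising = False
    flows = []
    for _ in range(n_days):
        # Transition probability depends only on the previous day's state.
        p_rise = p_rise_after_rise if rising else p_rise_after_fall
        rising = rng.random() < p_rise
        if rising:
            flow += rng.gammavariate(gamma_shape, gamma_scale)
        else:
            flow *= math.exp(-recession_k)
        flows.append(flow)
    return flows


if __name__ == "__main__":
    series = simulate_daily_flow(10, seed=42)
    print([round(q, 2) for q in series])
```

Because the increments are sampled rather than resampled from history, a generator of this kind can produce flow values that never occurred in the record, which is the property the abstract contrasts with KNN-only resampling.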
Immunization of chickens with an avian paramyxovirus 10 isolated from Rockhopper Penguins does not provide protection against challenge with virulent Newcastle disease virus

USDA-ARS's Scientific Manuscript database

Four viral isolates from Rockhopper Penguins were previously identified as members of a novel avian paramyxovirus serotype 10 (APMV-10). Whole genome random next-generation sequencing was performed and phylogenetic analysis showed that the isolates were most closely related to APMV-2 and APMV-8. Int...
Feedback shift register sequences versus uniformly distributed random sequences for correlation chromatography

NASA Technical Reports Server (NTRS)

Kaljurand, M.; Valentin, J. R.; Shao, M.

1996-01-01

Two alternative input sequences are commonly employed in correlation chromatography (CC). They are sequences derived according to the algorithm of the feedback shift register (i.e., pseudo random binary sequences (PRBS)) and sequences derived by using the uniform random binary sequences (URBS). These two sequences are compared. By applying the "cleaning" data processing technique to the correlograms that result from these sequences, we show that when the PRBS is used the S/N of the correlogram is much higher than the one resulting from using URBS.
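As an illustration of a feedback-shift-register sequence, the sketch below generates a 7-bit PRBS with a linear feedback shift register and checks its circular autocorrelation. The register length, the tap positions, and the function names are assumptions chosen for a small example; the abstract does not state which register the authors used.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate bits from a 7-bit linear feedback shift register.

    Taps at register bits 6 and 5 give a maximal-length sequence with
    period 2**7 - 1 = 127 for any non-zero seed.
    """
    if not 0 < seed < 128:
        raise ValueError("seed must be a non-zero 7-bit value")
    state = seed
    bits = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of the two taps
        state = ((state << 1) | new_bit) & 0x7F      # shift in the feedback bit
        bits.append(new_bit)
    return bits


def circular_autocorrelation(bits, shift):
    """Circular autocorrelation of the sequence mapped to +1 / -1."""
    n = len(bits)
    signs = [1 - 2 * b for b in bits]
    return sum(signs[i] * signs[(i + shift) % n] for i in range(n))


if __name__ == "__main__":
    one_period = prbs7(127)
    print(sum(one_period), "ones in one period")
    print([circular_autocorrelation(one_period, s) for s in range(4)])
```

Over one full period, a maximal-length sequence has an autocorrelation of 127 at zero shift and exactly -1 at every other shift, whereas a sequence of independent random bits only approximates this on average.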
Machine-learned analysis of the association of next-generation sequencing-based human TRPV1 and TRPA1 genotypes with the sensitivity to heat stimuli and topically applied capsaicin.

PubMed

Kringel, Dario; Geisslinger, Gerd; Resch, Eduard; Oertel, Bruno G; Thrun, Michael C; Heinemann, Sarah; Lötsch, Jörn

2018-03-27

Heat pain and its modulation by capsaicin varies among subjects in experimental and clinical settings. A plausible cause is a genetic component, of which TRPV1 ion channels, by their response to both heat and capsaicin, are primary candidates. However, TRPA1 channels can heterodimerize with TRPV1 channels and carry genetic variants reported to modulate heat pain sensitivity. To address the role of these candidate genes in capsaicin-induced hypersensitization to heat, pain thresholds acquired before and after topical application of capsaicin and TRPA1/TRPV1 exomic sequences derived by next-generation sequencing were assessed in n = 75 healthy volunteers and the genetic information comprised 278 loci. Gaussian mixture modeling indicated 2 phenotype groups with high or low capsaicin-induced hypersensitization to heat. Unsupervised machine learning implemented as swarm-based clustering hinted at differences in the genetic pattern between these phenotype groups. Several methods of supervised machine learning implemented as random forests, adaptive boosting, k-nearest neighbors, naive Bayes, support vector machines, and for comparison, binary logistic regression predicted the phenotype group association consistently better when based on the observed genotypes than when using a random permutation of the exomic sequences. Of note, TRPA1 variants were more important for correct phenotype group association than TRPV1 variants. This indicates a role of the TRPA1 and TRPV1 next-generation sequencing-based genetic pattern in the modulation of the individual response to heat-related pain phenotypes. 
When considering earlier evidence that topical capsaicin can induce neuropathy-like quantitative sensory testing patterns in healthy subjects, implications for future analgesic treatments with transient receptor potential inhibitors arise.
SNP Discovery in the Transcriptome of White Pacific Shrimp Litopenaeus vannamei by Next Generation Sequencing

PubMed Central

Yu, Yang; Wei, Jiankai; Zhang, Xiaojun; Liu, Jingwen; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

2014-01-01

The application of next generation sequencing technology has greatly facilitated high throughput single nucleotide polymorphism (SNP) discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei) generated from Illumina sequencing platform HiSeq 2000. One transcriptome of L. vannamei was obtained through sequencing on the RNA from larvae at mysis stage and its reference sequence was de novo assembled. The data from another transcriptome were downloaded from NCBI and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 SNPs with high quality were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of two transcriptomes together, and a total of 96,040 SNPs with high quality were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs respectively. Characterization analysis of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp) and the estimated ratio for transition to transversion was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for the high density linkage map construction and genome-wide association studies. PMID:24498047
Input dependent cell assembly dynamics in a model of the striatal medium spiny neuron network.

PubMed

Ponzi, Adam; Wickens, Jeff

2012-01-01

The striatal medium spiny neuron (MSN) network is sparsely connected with fairly weak GABAergic collaterals receiving an excitatory glutamatergic cortical projection. Peri-stimulus time histograms (PSTH) of MSN population response investigated in various experimental studies display strong firing rate modulations distributed throughout behavioral task epochs. In previous work we have shown by numerical simulation that sparse random networks of inhibitory spiking neurons with characteristics appropriate for UP state MSNs form cell assemblies which fire together coherently in sequences on long behaviorally relevant timescales when the network receives a fixed pattern of constant input excitation. Here we first extend that model to the case where cortical excitation is composed of many independent noisy Poisson processes and demonstrate that cell assembly dynamics is still observed when the input is sufficiently weak. However if cortical excitation strength is increased more regularly firing and completely quiescent cells are found, which depend on the cortical stimulation. Subsequently we further extend previous work to consider what happens when the excitatory input varies as it would when the animal is engaged in behavior. We investigate how sudden switches in excitation interact with network generated patterned activity. We show that sequences of cell assembly activations can be locked to the excitatory input sequence and outline the range of parameters where this behavior is shown. Model cell population PSTH display both stimulus and temporal specificity, with large population firing rate modulations locked to elapsed time from task events. Thus the random network can generate a large diversity of temporally evolving stimulus dependent responses even though the input is fixed between switches. We suggest the MSN network is well suited to the generation of such slow coherent task dependent response which could be utilized by the animal in behavior.
Input Dependent Cell Assembly Dynamics in a Model of the Striatal Medium Spiny Neuron Network

PubMed Central

Ponzi, Adam; Wickens, Jeff

2012-01-01

The striatal medium spiny neuron (MSN) network is sparsely connected with fairly weak GABAergic collaterals receiving an excitatory glutamatergic cortical projection. Peri-stimulus time histograms (PSTH) of MSN population response investigated in various experimental studies display strong firing rate modulations distributed throughout behavioral task epochs. In previous work we have shown by numerical simulation that sparse random networks of inhibitory spiking neurons with characteristics appropriate for UP state MSNs form cell assemblies which fire together coherently in sequences on long behaviorally relevant timescales when the network receives a fixed pattern of constant input excitation. Here we first extend that model to the case where cortical excitation is composed of many independent noisy Poisson processes and demonstrate that cell assembly dynamics is still observed when the input is sufficiently weak. However if cortical excitation strength is increased more regularly firing and completely quiescent cells are found, which depend on the cortical stimulation. Subsequently we further extend previous work to consider what happens when the excitatory input varies as it would when the animal is engaged in behavior. We investigate how sudden switches in excitation interact with network generated patterned activity. We show that sequences of cell assembly activations can be locked to the excitatory input sequence and outline the range of parameters where this behavior is shown. Model cell population PSTH display both stimulus and temporal specificity, with large population firing rate modulations locked to elapsed time from task events. Thus the random network can generate a large diversity of temporally evolving stimulus dependent responses even though the input is fixed between switches. We suggest the MSN network is well suited to the generation of such slow coherent task dependent response which could be utilized by the animal in behavior. PMID:22438838

SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.

PubMed

Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

2016-06-15

Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. 
Contact: yasu@bio.keio.ac.jp Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

DOEpatents

Studier, F. William

1995-04-18

Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.
Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

DOEpatents

Studier, F.W.

1995-04-18

Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.
Simulative research on generating UWB signals by all-optical BPF

NASA Astrophysics Data System (ADS)

Yang, Chunyong; Hou, Rui; Chen, Shaoping

2007-11-01

Simulation is used to investigate the generation and distribution of Ultra-Wide-Band (UWB) signals over fiber transmission. Numerical results for the system's frequency response show band-pass filter characteristics, and the shorter the wavelength, the wider the bandwidth at lower frequencies. Transmission performance simulation for a 12.5 Gb/s pseudo-random sequence also shows that a Gaussian pulse signal, after being transported in fiber, is similar to the FCC UWB waveform mask in the time domain and to the FCC frequency spectrum specification in the frequency domain.
rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison.

PubMed

Hahn, Lars; Leimeister, Chris-André; Ounit, Rachid; Lonardi, Stefano; Morgenstern, Burkhard

2016-10-01

Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/.
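The role of a binary pattern of match and don't-care positions can be shown with a small sketch. It only illustrates the filtering step on two equal-length, already aligned toy sequences; it is not rasbhari's optimization algorithm, and the function names and example sequences are hypothetical.

```python
def spaced_words(sequence, pattern):
    """Extract spaced words: keep only characters at match ('1') positions."""
    match_positions = [i for i, c in enumerate(pattern) if c == "1"]
    span = len(pattern)
    return [
        "".join(sequence[start + i] for i in match_positions)
        for start in range(len(sequence) - span + 1)
    ]


def count_spaced_hits(seq_a, seq_b, pattern):
    """Count window offsets where both sequences yield the same spaced word.

    Assumption: the sequences are compared offset by offset, as if aligned
    without gaps.
    """
    words_a = spaced_words(seq_a, pattern)
    words_b = spaced_words(seq_b, pattern)
    return sum(1 for wa, wb in zip(words_a, words_b) if wa == wb)


if __name__ == "__main__":
    a = "ACGTACGT"
    b = "ACCTACGA"
    print(spaced_words(a, "1101"))
    print(count_spaced_hits(a, b, "1101"))
```

With the pattern `1101`, a mismatch at the third position of a window is ignored, so the first windows of the two toy sequences still count as a hit even though they differ at that position.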
Development of Pineapple Microsatellite Markers and Germplasm Genetic Diversity Analysis

PubMed Central

Tong, Helin; Chen, You; Wang, Jingyi; Chen, Yeyuan; Sun, Guangming; He, Junhu; Wu, Yaoting

2013-01-01

Two methods were used to develop pineapple microsatellite markers. Genomic library-based SSR development: using selectively amplified microsatellite assay, 86 sequences were generated from pineapple genomic library. 91 (96.8%) of the 94 Simple Sequence Repeat (SSR) loci were dinucleotide repeats (39 AC/GT repeats and 52 GA/TC repeats, accounting for 42.9% and 57.1%, resp.), and the other three were mononucleotide repeats. Thirty-six pairs of SSR primers were designed; 24 of them generated clear bands of expected sizes, and 13 of them showed polymorphism. EST-based SSR development: 5659 pineapple EST sequences obtained from NCBI were analyzed; among 1397 nonredundant EST sequences, 843 were found containing 1110 SSR loci (217 of them contained more than one SSR locus). Frequency of SSRs in pineapple EST sequences is 1SSR/3.73 kb, and 44 types were found. Mononucleotide, dinucleotide, and trinucleotide repeats dominate, accounting for 95.6% in total. AG/CT and AGC/GCT were the dominant type of dinucleotide and trinucleotide repeats, accounting for 83.5% and 24.1%, respectively. Thirty pairs of primers were designed for each of randomly selected 30 sequences; 26 of them generated clear and reproducible bands, and 22 of them showed polymorphism. Eighteen pairs of primers obtained by the one or the other of the two methods above that showed polymorphism were selected to carry out germplasm genetic diversity analysis for 48 breeds of pineapple; similarity coefficients of these breeds were between 0.59 and 1.00, and they can be divided into four groups accordingly. Amplification products of five SSR markers were extracted and sequenced, corresponding repeat loci were found and locus mutations are mainly in copy number of repeats and base mutations in the flanking region. PMID:24024187
At least some errors are randomly generated (Freud was wrong)

NASA Technical Reports Server (NTRS)

Sellen, A. J.; Senders, J. W.

1986-01-01

An experiment was carried out to expose something about human error generating mechanisms. In the context of the experiment, an error was made when a subject pressed the wrong key on a computer keyboard or pressed no key at all in the time allotted. These might be considered, respectively, errors of substitution and errors of omission. Each of seven subjects saw a sequence of three digital numbers, made an easily learned binary judgement about each, and was to press the appropriate one of two keys. Each session consisted of 1,000 presentations of randomly permuted, fixed numbers broken into 10 blocks of 100. One of two keys should have been pressed within one second of the onset of each stimulus. These data were subjected to statistical analyses in order to probe the nature of the error generating mechanisms. Goodness of fit tests for a Poisson distribution for the number of errors per 50 trial interval and for an exponential distribution of the length of the intervals between errors were carried out. There is evidence for an endogenous mechanism that may best be described as a random error generator. Furthermore, an item analysis of the number of errors produced per stimulus suggests the existence of a second mechanism operating on task driven factors producing exogenous errors. Some errors, at least, are the result of constant probability generating mechanisms with error rate idiosyncratically determined for each subject.
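The "constant probability generating mechanism" described in this abstract can be illustrated with a small simulation. The sketch below is not the authors' analysis; it assumes a hypothetical per-trial error probability and only shows which quantities (errors per 50-trial interval, gaps between successive errors) such goodness-of-fit tests would examine. All function names are illustrative.

```python
import random

def simulate_errors(n_trials, p_error, seed=0):
    """Return a list of booleans: True where an error occurs.

    Each trial errs independently with the same probability p_error,
    i.e. a constant-probability (memoryless) error generator.
    """
    rng = random.Random(seed)
    return [rng.random() < p_error for _ in range(n_trials)]

def errors_per_interval(errors, width=50):
    """Count errors in consecutive blocks of `width` trials."""
    return [sum(errors[i:i + width]) for i in range(0, len(errors), width)]

def gaps_between_errors(errors):
    """Number of trials separating each pair of successive errors."""
    positions = [i for i, e in enumerate(errors) if e]
    return [b - a for a, b in zip(positions, positions[1:])]

if __name__ == "__main__":
    # 1,000 presentations per session, as in the abstract; p_error is assumed.
    session = simulate_errors(1000, p_error=0.05, seed=1)
    print(errors_per_interval(session))
    print(gaps_between_errors(session)[:10])
```

Under this model the per-interval counts are approximately Poisson and the gaps are geometric (the discrete analogue of the exponential distribution), which is the pattern the reported tests look for.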
Hurdles and sorting by inversions: combinatorial, statistical, and experimental results.

PubMed

Swenson, Krister M; Lin, Yu; Rajan, Vaibhav; Moret, Bernard M E

2009-10-01

As data about genomic architecture accumulates, genomic rearrangements have attracted increasing attention. One of the main rearrangement mechanisms, inversions (also called reversals), was characterized by Hannenhalli and Pevzner and this characterization in turn extended by various authors. The characterization relies on the concepts of breakpoints, cycles, and obstructions colorfully named hurdles and fortresses. In this paper, we study the probability of generating a hurdle in the process of sorting a permutation if one does not take special precautions to avoid them (as in a randomized algorithm, for instance). To do this we revisit and extend the work of Caprara and of Bergeron by providing simple and exact characterizations of the probability of encountering a hurdle in a random permutation. Using similar methods we provide the first asymptotically tight analysis of the probability that a fortress exists in a random permutation. Finally, we study other aspects of hurdles, both analytically and through experiments: when are they created in a sequence of sorting inversions, how much later are they detected, and how much work may need to be undone to return to a sorting sequence.
Randomized clinical trials in dentistry: Risks of bias, risks of random errors, reporting quality, and methodologic quality over the years 1955–2013

PubMed Central

Armijo-Olivo, Susan; Cummings, Greta G.; Amin, Maryam; Flores-Mir, Carlos

2017-01-01

Objectives: To examine the risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions and the development of these aspects over time. Methods: We included 540 randomized clinical trials from 64 selected systematic reviews. We extracted, in duplicate, details from each of the selected randomized clinical trials with respect to publication and trial characteristics, reporting and methodologic characteristics, and Cochrane risk of bias domains. We analyzed data using logistic regression and Chi-square statistics. Results: Sequence generation was assessed to be inadequate (at unclear or high risk of bias) in 68% (n = 367) of the trials, while allocation concealment was inadequate in the majority of trials (n = 464; 85.9%). Blinding of participants and blinding of the outcome assessment were judged to be inadequate in 28.5% (n = 154) and 40.5% (n = 219) of the trials, respectively. A sample size calculation before the initiation of the study was not performed/reported in 79.1% (n = 427) of the trials, while the sample size was assessed as adequate in only 17.6% (n = 95) of the trials. Two thirds of the trials were not described as double blinded (n = 358; 66.3%), while the method of blinding was appropriate in 53% (n = 286) of the trials. We identified a significant decrease over time (1955–2013) in the proportion of trials assessed as having inadequately addressed methodological quality items (P < 0.05) in 30 out of the 40 quality criteria, or as being inadequate (at high or unclear risk of bias) in five domains of the Cochrane risk of bias tool: sequence generation, allocation concealment, incomplete outcome data, other sources of bias, and overall risk of bias. Conclusions: The risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions have improved over time; however, further efforts that contribute to the development of more stringent methodology and detailed reporting of trials are still needed. PMID:29272315
Randomized clinical trials in dentistry: Risks of bias, risks of random errors, reporting quality, and methodologic quality over the years 1955-2013.

PubMed

Saltaji, Humam; Armijo-Olivo, Susan; Cummings, Greta G; Amin, Maryam; Flores-Mir, Carlos

2017-01-01

To examine the risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions and the development of these aspects over time. We included 540 randomized clinical trials from 64 selected systematic reviews. We extracted, in duplicate, details from each of the selected randomized clinical trials with respect to publication and trial characteristics, reporting and methodologic characteristics, and Cochrane risk of bias domains. We analyzed data using logistic regression and Chi-square statistics. Sequence generation was assessed to be inadequate (at unclear or high risk of bias) in 68% (n = 367) of the trials, while allocation concealment was inadequate in the majority of trials (n = 464; 85.9%). Blinding of participants and blinding of the outcome assessment were judged to be inadequate in 28.5% (n = 154) and 40.5% (n = 219) of the trials, respectively. A sample size calculation before the initiation of the study was not performed/reported in 79.1% (n = 427) of the trials, while the sample size was assessed as adequate in only 17.6% (n = 95) of the trials. Two thirds of the trials were not described as double blinded (n = 358; 66.3%), while the method of blinding was appropriate in 53% (n = 286) of the trials. We identified a significant decrease over time (1955-2013) in the proportion of trials assessed as having inadequately addressed methodological quality items (P < 0.05) in 30 out of the 40 quality criteria, or as being inadequate (at high or unclear risk of bias) in five domains of the Cochrane risk of bias tool: sequence generation, allocation concealment, incomplete outcome data, other sources of bias, and overall risk of bias. 
The risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions have improved over time; however, further efforts that contribute to the development of more stringent methodology and detailed reporting of trials are still needed.
High density bit transition requirements versus the effects on BCH error correcting code. [bit synchronization]

NASA Technical Reports Server (NTRS)

Ingels, F. M.; Schoggen, W. O.

1982-01-01

The design to achieve the required bit transition density for the Space Shuttle high rate multiplexes (HRM) data stream of the Space Laboratory Vehicle is reviewed. It contained a recommended circuit approach, specified the pseudo random (PN) sequence to be used and detailed the properties of the sequence. Calculations showing the probability of failing to meet the required transition density were included. A computer simulation of the data stream and PN cover sequence was provided. All worst case situations were simulated and the bit transition density exceeded that required. The Preliminary Design Review and the critical Design Review are documented. The Cover Sequence Generator (CSG) Encoder/Decoder design was constructed and demonstrated. The demonstrations were successful. All HRM and HRDM units incorporate the CSG encoder or CSG decoder as appropriate.
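The idea of a PN cover sequence can be sketched as follows: the data stream is XOR-ed with a pseudo random bit sequence so that long runs of identical bits acquire transitions, and the decoder removes the cover by XOR-ing with the same sequence. The abstract does not give the actual Shuttle PN polynomial, so the 4-stage shift register below (period 15) is purely a hypothetical stand-in, and all names are illustrative.

```python
def lfsr_bits(n, taps=(4, 3), seed=(1, 0, 0, 0)):
    """Generate n bits from a small Fibonacci LFSR.

    taps and seed are illustrative assumptions, not the flight design.
    With these defaults the output repeats every 15 bits.
    """
    state = list(seed)
    out = []
    for _ in range(n):
        out.append(state[-1])              # output the last stage
        feedback = 0
        for t in taps:                     # XOR the tapped stages (1-indexed)
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]    # shift right, insert feedback
    return out

def cover(bits, pn):
    """XOR a bit stream with the PN sequence (encoder and decoder are identical)."""
    return [b ^ p for b, p in zip(bits, pn)]

def transitions(bits):
    """Count 0->1 and 1->0 changes between adjacent bits."""
    return sum(1 for a, b in zip(bits, bits[1:]) if a != b)

if __name__ == "__main__":
    data = [0] * 30                        # worst case: no transitions at all
    pn = lfsr_bits(len(data))
    sent = cover(data, pn)
    print(transitions(data), transitions(sent))
    print(cover(sent, pn) == data)         # decoding restores the data
```

Because XOR is its own inverse, the same routine serves as encoder and decoder, provided both ends start the register from the same state.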
Comprehensive genotyping in dyslipidemia: mendelian dyslipidemias caused by rare variants and Mendelian randomization studies using common variants.

PubMed

Tada, Hayato; Kawashiri, Masa-Aki; Yamagishi, Masakazu

2017-04-01

Dyslipidemias, especially hyper-low-density lipoprotein cholesterolemia and hypertriglyceridemia, are important causal risk factors for coronary artery disease. Comprehensive genotyping using the 'next-generation sequencing' technique has facilitated the investigation of Mendelian dyslipidemias, in addition to Mendelian randomization studies using common genetic variants associated with plasma lipids and coronary artery disease. The beneficial effects of low-density lipoprotein cholesterol-lowering therapies on coronary artery disease have been verified by many randomized controlled trials over the years, and subsequent genetic studies have supported these findings. More recently, Mendelian randomization studies have preceded randomized controlled trials. When the on-target/off-target effects of rare variants and common variants exhibit the same direction, novel drugs targeting molecules identified by investigations of rare Mendelian lipid disorders could be promising. Such a strategy could aid in the search for drug discovery seeds other than those for dyslipidemias.
Partial bisulfite conversion for unique template sequencing

PubMed Central

Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael

2018-01-01

We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135,000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. PMID:29161423
The development of GPU-based parallel PRNG for Monte Carlo applications in CUDA Fortran

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kargaran, Hamed, E-mail: h-kargaran@sbu.ac.ir; Minuchehr, Abdolhamid; Zolfaghari, Ahmad

The implementation of Monte Carlo simulation in CUDA Fortran requires fast random number generation with good statistical properties on the GPU. In this study, a GPU-based parallel pseudo random number generator (GPPRNG) has been proposed for use in high performance computing systems. According to the type of GPU memory usage, the GPU scheme is divided into two work modes, GLOBAL-MODE and SHARED-MODE. To generate parallel random numbers based on the independent sequence method, the combination of the middle-square method and a chaotic map along with the Xorshift PRNG has been employed. Implementation of our developed PPRNG on a single GPU showed a speedup of 150x and 470x (with respect to the speed of PRNG on a single CPU core) for GLOBAL-MODE and SHARED-MODE, respectively. To evaluate the accuracy of our developed GPPRNG, its performance was compared to that of some other commercially available PPRNGs such as MATLAB, FORTRAN and Miller-Park algorithm through employing the specific standard tests. The results of this comparison showed that the developed GPPRNG in this study can be used as a fast and accurate tool for computational science applications.
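For readers unfamiliar with the Xorshift component mentioned in this abstract, the following is a minimal CPU-side sketch of Marsaglia's 32-bit xorshift step (shifts 13, 17, 5), with one generator state per stream. It is not the authors' GPPRNG: the middle-square and chaotic-map seeding stages are omitted, the per-stream seeds are arbitrary placeholders, and the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF

def xorshift32(state):
    """Advance a 32-bit xorshift state by one step (shift triple 13, 17, 5)."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state & MASK32

def stream(seed, n):
    """Return n floats in [0, 1) from one xorshift32 stream.

    The seed must be nonzero: zero is a fixed point of the recurrence.
    """
    x = seed & MASK32
    if x == 0:
        raise ValueError("xorshift32 requires a nonzero seed")
    out = []
    for _ in range(n):
        x = xorshift32(x)
        out.append(x / 2**32)
    return out

if __name__ == "__main__":
    # One stream per (hypothetical) thread, each with its own placeholder seed.
    seeds = [2463534242, 88172645, 521288629]
    for s in seeds:
        print(stream(s, 3))
```

In a GPU setting each thread would hold its own state; how those states are seeded so that the streams do not overlap is the part the abstract attributes to the middle-square and chaotic-map stages.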
Transcriptome sequencing and differential gene expression analysis in Viola yedoensis Makino (Fam. Violaceae) responsive to cadmium (Cd) pollution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gao, Jian; Luo, Mao; Zhu, Ye

2015-03-27

Viola yedoensis Makino is an important Chinese traditional medicine plant adapted to cadmium (Cd) pollution regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were investigated between the two libraries of untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly accordant with the Solexa analysis. This study firstly generated a successful global analysis of the V. yedoensis transcriptome and it will provide for further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 5,4479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR.
Generative adversarial networks for brain lesion detection

NASA Astrophysics Data System (ADS)

Alex, Varghese; Safwan, K. P. Mohammed; Chennamsetty, Sai Saketh; Krishnamurthi, Ganapathy

2017-02-01

Manual segmentation of brain lesions from Magnetic Resonance Images (MRI) is cumbersome and introduces errors due to inter-rater variability. This paper introduces a semi-supervised technique for detection of brain lesions from MRI using Generative Adversarial Networks (GANs). A GAN comprises a Generator network and a Discriminator network, which are trained simultaneously with the objective of one bettering the other. The networks were trained using non-lesion patches (n=13,000) from 4 different MR sequences. The network was trained on the BraTS dataset, and patches were extracted from regions excluding the tumor region. The Generator network generates data by modeling the underlying probability distribution of the training data (P_Data). The Discriminator learns the posterior probability P(Label | Data) by classifying training data and generated data as "Real" or "Fake" respectively. The Generator, upon learning the joint distribution, produces images/patches such that the performance of the Discriminator on them is random, i.e. P(Label | Data = GeneratedData) = 0.5. During testing, the Discriminator assigns posterior probability values close to 0.5 for patches from non-lesion regions, while patches centered on a lesion arise from a different distribution (P_Lesion) and hence are assigned lower posterior probability values by the Discriminator. On the test set (n=14), the proposed technique achieves a whole tumor dice score of 0.69, sensitivity of 91% and specificity of 59%. Additionally, the generator network was capable of generating non-lesion patches from various MR sequences.
High efficiency family shuffling based on multi-step PCR and in vivo DNA recombination in yeast: statistical and functional analysis of a combinatorial library between human cytochrome P450 1A1 and 1A2.

PubMed

Abécassis, V; Pompon, D; Truan, G

2000-10-15

The design of a family shuffling strategy (CLERY: Combinatorial Libraries Enhanced by Recombination in Yeast) associating PCR-based and in vivo recombination and expression in yeast is described. This strategy was tested using human cytochrome P450 CYP1A1 and CYP1A2 as templates, which share 74% nucleotide sequence identity. Construction of highly shuffled libraries of mosaic structures and reduction of parental gene contamination were two major goals. Library characterization involved multiprobe hybridization on DNA macro-arrays. The statistical analysis of randomly selected clones revealed a high proportion of chimeric genes (86%) and a homogeneous representation of the parental contribution among the sequences (55.8 +/- 2.5% for parental sequence 1A2). A microtiter plate screening system was designed to achieve colorimetric detection of polycyclic hydrocarbon hydroxylation by transformed yeast cells. Full sequences of five randomly picked and five functionally selected clones were analyzed. Results confirmed the shuffling efficiency and allowed calculation of the average length of sequence exchange and mutation rates. The efficient and statistically representative generation of mosaic structures by this type of family shuffling in a yeast expression system constitutes a novel and promising tool for structure-function studies and tuning enzymatic activities of multicomponent eucaryote complexes involving non-soluble enzymes.
Nonlinear Estimation of Discrete-Time Signals Under Random Observation Delay

DOE Office of Scientific and Technical Information (OSTI.GOV)

Caballero-Aguila, R.; Jimenez-Lopez, J. D.; Hermoso-Carazo, A.

2008-11-06

This paper presents an approximation to the nonlinear least-squares estimation problem of discrete-time stochastic signals using nonlinear observations with additive white noise which can be randomly delayed by one sampling time. The observation delay is modelled by a sequence of independent Bernoulli random variables whose values, zero or one, indicate that the real observation arrives on time or it is delayed and, hence, the available measurement to estimate the signal is not up-to-date. Assuming that the state-space model generating the signal is unknown and only the covariance functions of the processes involved in the observation equation are ready for use, a filtering algorithm based on linear approximations of the real observations is proposed.
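The one-step random delay described here is commonly written as y_k = (1 - γ_k) z_k + γ_k z_{k-1}, where z_k is the real observation and γ_k is a Bernoulli variable equal to 1 when the measurement is delayed. The sketch below simulates only this measurement model under that assumed form; it is not the proposed filter, and the names are illustrative.

```python
import random

def delayed_observations(z, p_delay, seed=0):
    """Apply a Bernoulli one-step delay to a sequence of real observations z.

    Returns (y, gammas) where gammas[k] == 1 means y[k] carries z[k-1]
    instead of z[k]. The first sample is assumed to arrive on time.
    """
    rng = random.Random(seed)
    y = [z[0]]
    gammas = [0]
    for k in range(1, len(z)):
        g = 1 if rng.random() < p_delay else 0
        gammas.append(g)
        y.append((1 - g) * z[k] + g * z[k - 1])
    return y, gammas

if __name__ == "__main__":
    z = list(range(10))                    # stand-in for noisy nonlinear observations
    y, gammas = delayed_observations(z, p_delay=0.3, seed=42)
    print(gammas)
    print(y)
```

The estimator only sees y, not the delay indicators, which is why the filter has to account for the delay probability rather than for the realized delays.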
Polarization chaos and random bit generation in nonlinear fiber optics induced by a time-delayed counter-propagating feedback loop.

PubMed

Morosi, J; Berti, N; Akrout, A; Picozzi, A; Guasoni, M; Fatome, J

2018-01-22

In this manuscript, we experimentally and numerically investigate the chaotic dynamics of the state-of-polarization in a nonlinear optical fiber due to the cross-interaction between an incident signal and its intense backward replica generated at the fiber-end through an amplified reflective delayed loop. Thanks to the cross-polarization interaction between the two delayed counter-propagating waves, the output polarization exhibits fast temporal chaotic dynamics, which enable a powerful scrambling process with moving speeds up to 600 krad/s. The performance of this all-optical scrambler was then evaluated on a 10-Gbit/s On/Off Keying telecom signal, achieving an error-free transmission. We also describe how these temporal and chaotic polarization fluctuations can be exploited as an all-optical random number generator. To this aim, a billion-bit sequence was experimentally generated and successfully tested against the dieharder statistical benchmarking tools. Our experimental analyses are supported by numerical simulations based on the resolution of counter-propagating coupled nonlinear propagation equations that confirm the observed behaviors.
The Conduct and Reporting of Child Health Research: An Analysis of Randomized Controlled Trials Published in 2012 and Evaluation of Change over 5 Years.

PubMed

Gates, Allison; Hartling, Lisa; Vandermeer, Ben; Caldwell, Patrina; Contopoulos-Ioannidis, Despina G; Curtis, Sarah; Fernandes, Ricardo M; Klassen, Terry P; Williams, Katrina; Dyson, Michele P

2018-02-01

For child health randomized controlled trials (RCTs) published in 2012, we aimed to describe design and reporting characteristics and evaluate changes since 2007; assess the association between trial design and registration and risk of bias (RoB); and assess the association between RoB and effect size. For 300 RCTs, we extracted design and reporting characteristics and assessed RoB. We assessed 5-year changes in design and reporting (based on 300 RCTs we had previously analyzed) using the Fisher exact test. We tested for associations between design and reporting characteristics and overall RoB and registration using the Fisher exact, Cochran-Armitage, Kruskal-Wallis, and Jonckheere-Terpstra tests. We pooled effect sizes and tested for differences by RoB using the χ² test for subgroups in meta-analysis. The 2012 and 2007 RCTs differed with respect to many design and reporting characteristics. From 2007 to 2012, RoB did not change for random sequence generation and improved for allocation concealment (P < .001). Fewer 2012 RCTs were rated high overall RoB and more were rated unclear (P = .03). Only 7.3% of 2012 RCTs were rated low overall RoB. Trial registration doubled from 2007 to 2012 (23% to 46%) (P < .001) and was associated with lower RoB (P = .009). Effect size did not differ by RoB (P = .43). Conclusions: Random sequence generation and allocation concealment were not often reported, and selective reporting was prevalent. Measures to increase trialists' awareness and application of existing reporting guidance, and the prospective registration of RCTs, are needed to improve the trustworthiness of findings from this field.

Clonality and serotypes of Streptococcus mutans among children by multilocus sequence typing

PubMed Central

Momeni, Stephanie S.; Whiddon, Jennifer; Cheon, Kyounga; Moser, Stephen A.; Childers, Noel K.

2015-01-01

Studies using multilocus sequence typing (MLST) have demonstrated that Streptococcus mutans isolates are genetically diverse. Our laboratory previously demonstrated clonality of S. mutans using MLST but could not discount the possibility of sampling bias. In this study, the clonality of randomly selected S. mutans plaque isolates from African American children was examined using MLST. Serotype and presence of collagen-binding proteins (CBP) cnm/cbm were also assessed. One hundred S. mutans isolates were randomly selected for MLST analysis. Sequence analysis was performed and phylogenetic trees were generated using START2 and MEGA. Thirty-four sequence types (ST) were identified, of which 27 were unique to this population. Seventy-five percent of the isolates clustered into 16 clonal groups. Serotypes observed were c (n=84), e (n=3), and k (n=11). The prevalence of S. mutans isolates of serotype k was notably high at 17.5%. All isolates were cnm/cbm negative. The clonality of S. mutans demonstrated in this study illustrates the importance of localized population studies and is consistent with transmission. The prevalence of serotype k, a recently proposed systemic pathogen, observed in this study is higher than reported in most populations, and this is the first report of S. mutans serotype k in a US population. PMID:26443288
Detecting targets hidden in random forests

NASA Astrophysics Data System (ADS)

Kouritzin, Michael A.; Luo, Dandan; Newton, Fraser; Wu, Biao

2009-05-01

Military tanks, cargo or troop carriers, missile carriers or rocket launchers often hide themselves from detection in the forests. This plagues the detection problem of locating these hidden targets. An electro-optic camera mounted on a surveillance aircraft or unmanned aerial vehicle is used to capture the images of the forests with possible hidden targets, e.g., rocket launchers. We consider random forests of longitudinal and latitudinal correlations. Specifically, foliage coverage is encoded with a binary representation (i.e., foliage or no foliage), and is correlated in adjacent regions. We address the detection problem of camouflaged targets hidden in random forests by building memory into the observations. In particular, we propose an efficient algorithm to generate random forests, ground, and camouflage of hidden targets with two dimensional correlations. The observations are a sequence of snapshots consisting of foliage-obscured ground or target. Theoretically, detection is possible because there are subtle differences in the correlations of the ground and camouflage of the rocket launcher. However, these differences are well beyond human perception. To detect the presence of hidden targets automatically, we develop a Markov representation for these sequences and modify the classical filtering equations to allow the Markov chain observation. Particle filters are used to estimate the position of the targets in combination with a novel random weighting technique. Furthermore, we give positive proof-of-concept simulations.
Target Site Recognition by a Diversity-Generating Retroelement

PubMed Central

Guo, Huatao; Tse, Longping V.; Nieh, Angela W.; Czornyj, Elizabeth; Williams, Steven; Oukil, Sabrina; Liu, Vincent B.; Miller, Jeff F.

2011-01-01

Diversity-generating retroelements (DGRs) are in vivo sequence diversification machines that are widely distributed in bacterial, phage, and plasmid genomes. They function to introduce vast amounts of targeted diversity into protein-encoding DNA sequences via mutagenic homing. Adenine residues are converted to random nucleotides in a retrotransposition process from a donor template repeat (TR) to a recipient variable repeat (VR). Using the Bordetella bacteriophage BPP-1 element as a prototype, we have characterized requirements for DGR target site function. Although sequences upstream of VR are dispensable, a 24 bp sequence immediately downstream of VR, which contains short inverted repeats, is required for efficient retrohoming. The inverted repeats form a hairpin or cruciform structure and mutational analysis demonstrated that, while the structure of the stem is important, its sequence can vary. In contrast, the loop has a sequence-dependent function. Structure-specific nuclease digestion confirmed the existence of a DNA hairpin/cruciform, and marker coconversion assays demonstrated that it influences the efficiency, but not the site of cDNA integration. Comparisons with other phage DGRs suggested that similar structures are a conserved feature of target sequences. Using a kanamycin resistance determinant as a reporter, we found that transplantation of the IMH and hairpin/cruciform-forming region was sufficient to target the DGR diversification machinery to a heterologous gene. In addition to furthering our understanding of DGR retrohoming, our results suggest that DGRs may provide unique tools for directed protein evolution via in vivo DNA diversification. PMID:22194701
Comparing viral metagenomics methods using a highly multiplexed human viral pathogens reagent

PubMed Central

Li, Linlin; Deng, Xutao; Mee, Edward T.; Collot-Teixeira, Sophie; Anderson, Rob; Schepelmann, Silke; Minor, Philip D.; Delwart, Eric

2014-01-01

Unbiased metagenomic sequencing holds significant potential as a diagnostic tool for the simultaneous detection of any previously genetically described viral nucleic acids in clinical samples. Viral genome sequences can also inform on likely phenotypes including drug susceptibility or neutralization serotypes. In this study, different variables of the laboratory methods often used to generate viral metagenomics libraries on the efficiency of viral detection and virus genome coverage were compared. A biological reagent consisting of 25 different human RNA and DNA viral pathogens was used to estimate the effect of filtration and nuclease digestion, DNA/RNA extraction methods, pre-amplification and the use of different library preparation kits on the detection of viral nucleic acids. Filtration and nuclease treatment led to slight decreases in the percentage of viral sequence reads and number of viruses detected. For nucleic acid extractions silica spin columns improved viral sequence recovery relative to magnetic beads and Trizol extraction. Pre-amplification using random RT-PCR while generating more viral sequence reads resulted in detection of fewer viruses, more overlapping sequences, and lower genome coverage. The ScriptSeq library preparation method retrieved more viruses and a greater fraction of their genomes than the TruSeq and Nextera methods. Viral metagenomics sequencing was able to simultaneously detect up to 22 different viruses in the biological reagent analyzed including all those detected by qPCR. Further optimization will be required for the detection of viruses in biologically more complex samples such as tissues, blood, or feces. PMID:25497414
Identification of Novel Growth Regulators in Plant Populations Expressing Random Peptides

PubMed Central

Bao, Zhilong; Clancy, Maureen A.

2017-01-01

The use of chemical genomics approaches allows the identification of small molecules that integrate into biological systems, thereby changing discrete processes that influence growth, development, or metabolism. Libraries of chemicals are applied to living systems, and changes in phenotype are observed, potentially leading to the identification of new growth regulators. This work describes an approach that is the nexus of chemical genomics and synthetic biology. Here, each plant in an extensive population synthesizes a unique small peptide arising from a transgene composed of a randomized nucleic acid sequence core flanked by translational start, stop, and cysteine-encoding (for disulfide cyclization) sequences. Ten and 16 amino acid sequences, bearing a core of six and 12 random amino acids, have been synthesized in Arabidopsis (Arabidopsis thaliana) plants. Populations were screened for phenotypes from the seedling stage through senescence. Dozens of phenotypes were observed in over 2,000 plants analyzed. Ten conspicuous phenotypes were verified through separate transformation and analysis of multiple independent lines. The results indicate that these populations contain sequences that often influence discrete aspects of plant biology. Novel peptides that affect photosynthesis, flowering, and red light response are described. The challenge now is to identify the mechanistic integrations of these peptides into biochemical processes. These populations serve as a new tool to identify small molecules that modulate discrete plant functions that could be produced later in transgenic plants or potentially applied exogenously to impart their effects. These findings could usher in a new generation of agricultural growth regulators, herbicides, or defense compounds. PMID:28807931
Sequencing of the large dsDNA genome of Oryctes rhinoceros nudivirus using multiple displacement amplification of nanogram amounts of virus DNA.

PubMed

Wang, Yongjie; Kleespies, Regina G; Ramle, Moslim B; Jehle, Johannes A

2008-09-01

The genomic sequence analysis of many large dsDNA viruses is hampered by the lack of enough sample materials. Here, we report a whole genome amplification of the Oryctes rhinoceros nudivirus (OrNV) isolate Ma07 starting from as few as about 10 ng of purified viral DNA by application of phi29 DNA polymerase- and exonuclease-resistant random hexamer-based multiple displacement amplification (MDA) method. About 60 microg of high molecular weight DNA with fragment sizes of up to 25 kbp was amplified. A genomic DNA clone library was generated using the product DNA. After 8-fold sequencing coverage, the 127,615 bp of OrNV whole genome was sequenced successfully. The results demonstrate that the MDA-based whole genome amplification enables rapid access to genomic information from exiguous virus samples.
Conditional Monte Carlo randomization tests for regression models.

PubMed

Parhat, Parwen; Rosenberger, William F; Diao, Guoqing

2014-08-15

We discuss the computation of randomization tests for clinical trials of two treatments when the primary outcome is based on a regression model. We begin by revisiting the seminal paper of Gail, Tan, and Piantadosi (1988), and then describe a method based on Monte Carlo generation of randomization sequences. The tests based on this Monte Carlo procedure are design based, in that they incorporate the particular randomization procedure used. We discuss permuted block designs, complete randomization, and biased coin designs. We also use a new technique by Plamadeala and Rosenberger (2012) for simple computation of conditional randomization tests. Like Gail, Tan, and Piantadosi, we focus on residuals from generalized linear models and martingale residuals from survival models. Such techniques do not apply to longitudinal data analysis, and we introduce a method for computation of randomization tests based on the predicted rate of change from a generalized linear mixed model when outcomes are longitudinal. We show, by simulation, that these randomization tests preserve the size and power well under model misspecification. Copyright © 2014 John Wiley & Sons, Ltd.
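The Monte Carlo idea in the abstract above can be illustrated with a small sketch. It is not the authors' procedure: it assumes a permuted block design with blocks of four, uses the difference in mean residuals between arms as the test statistic, and computes an unconditional rather than a conditional test. All function names are hypothetical.

```python
import random


def permuted_block_sequence(n, block_size, rng):
    """Return n treatment assignments (0/1) built from balanced, shuffled blocks."""
    if block_size % 2 or n % block_size:
        raise ValueError("block_size must be even and must divide n")
    sequence = []
    for _ in range(n // block_size):
        block = [0] * (block_size // 2) + [1] * (block_size // 2)
        rng.shuffle(block)  # random order within each block
        sequence.extend(block)
    return sequence


def mean_difference(values, assignment):
    """Mean of values in arm 1 minus mean of values in arm 0."""
    arm1 = [v for v, t in zip(values, assignment) if t == 1]
    arm0 = [v for v, t in zip(values, assignment) if t == 0]
    return sum(arm1) / len(arm1) - sum(arm0) / len(arm0)


def randomization_p_value(residuals, observed_assignment,
                          block_size=4, draws=2000, seed=0):
    """Two-sided Monte Carlo p-value under the permuted block design.

    Residuals are held fixed; only the assignment sequence is re-generated
    with the same randomization procedure that produced the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean_difference(residuals, observed_assignment))
    hits = 0
    for _ in range(draws):
        sequence = permuted_block_sequence(len(residuals), block_size, rng)
        if abs(mean_difference(residuals, sequence)) >= observed - 1e-12:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (draws + 1)
```

Because the reference distribution is generated by the same design that produced the observed allocation, the test is design based in the sense described in the abstract; a different design (complete randomization, biased coin) would only require swapping the sequence generator.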
Development and cross-species/genera transferability of microsatellite markers discovered using 454 genome sequencing in chokecherry (Prunus virginiana L.).

PubMed

Wang, Hongxia; Walla, James A; Zhong, Shaobin; Huang, Danqiong; Dai, Wenhao

2012-11-01

Chokecherry (Prunus virginiana L.) (2n = 4x = 32) is a unique Prunus species for both genetics and disease-resistance research due to its tetraploid nature and X-disease resistance. However, no genetic and genomic information on chokecherry is available. A partial chokecherry genome was sequenced using Roche 454 sequencing technology. A total of 145,094 reads covering 4.8 Mbp of the chokecherry genome were generated and 15,113 contigs were assembled, of which 11,675 contigs were larger than 100 bp in size. A total of 481 SSR loci were identified from 234 (out of 11,675) contigs and 246 polymerase chain reaction (PCR) primer pairs were designed. Of the 246 primer pairs, 212 (86.2 %) effectively produced amplification from the genomic DNA of chokecherry. All 212 amplifiable chokecherry primers were used to amplify genomic DNA from 11 other rosaceous species (sour cherry, sweet cherry, black cherry, peach, apricot, plum, apple, crabapple, pear, juneberry, and raspberry). Thus, chokecherry SSR primers are transferable across Prunus species and other rosaceous species. An average of 63.2 and 58.7 % of amplifiable chokecherry primers amplified DNA from cherry and other Prunus species, respectively, while 47.2 % of amplifiable chokecherry primers amplified DNA from other rosaceous species. Using random genome sequence data generated from next-generation sequencing technology to identify microsatellite loci appears to be rapid and cost-efficient, particularly for species with no sequence information available. Sequence information and confirmed transferability of the identified chokecherry SSRs among species will be valuable for genetic research in Prunus and other rosaceous species. Key message: A total of 246 SSR primers were identified from chokecherry genome sequences, of which 212 were confirmed amplifiable both in chokecherry and in 11 other rosaceous species.
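As a rough illustration of how SSR loci can be picked out of assembled contigs, the sketch below scans a sequence for perfect tandem repeats. The abstract does not state the search criteria or software used, so the motif length range (2 to 6 bp) and the minimum of five repeat units are assumptions, and `find_ssrs` is a hypothetical helper.

```python
import re

# A 2-6 bp motif (group 2) followed by at least four more copies of itself,
# i.e. five or more tandem repeat units in total (assumed threshold).
SSR_PATTERN = re.compile(r"(([ACGT]{2,6}?)\2{4,})")


def find_ssrs(sequence):
    """Return (start, motif, repeat_count) for each perfect SSR found."""
    hits = []
    for match in SSR_PATTERN.finditer(sequence.upper()):
        motif = match.group(2)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs such as AAAAAAAAAA
        repeat_count = len(match.group(1)) // len(motif)
        hits.append((match.start(), motif, repeat_count))
    return hits


if __name__ == "__main__":
    contig = "GGATC" + "AG" * 6 + "TTCGA" + "CTT" * 5 + "GCA"
    for start, motif, count in find_ssrs(contig):
        print(f"position {start}: ({motif}){count}")
```

Primer design around each locus and imperfect or compound repeats are outside the scope of this sketch.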
Effects of inbreeding on economic traits of channel catfish.

PubMed

Bondari, K; Dunham, R A

1987-05-01

Inbred channel catfish (Ictalurus punctatus) were produced from two generations of full-sib matings to study the effect of inbreeding on reproduction, growth and survival. A randomly mated control line was propagated from the same base population to be used for the evaluation of the inbred fish. First generation inbred (I1) and control (C1) lines comprised five full-sib families each. Second generation inbred (I2) and control (C2) lines were produced by mating each male catfish from the I1 or C1 line to two females in sequence, one from the I1 and one from the C1 line. The design also produced two reciprocal outcross lines to be compared to their contemporary inbred and control lines. The coefficient of inbreeding for the inbred line increased from 0.25 in generation 1 to 0.375 in generation 2. The inbreeding coefficient was zero for all other lines. The resulting fish were performance tested in two locations, Tifton, Georgia and Auburn, Alabama, and no genotype-environment interactions occurred. Results indicated that one generation of inbreeding increased the number of days required for eggs to hatch by 21%, but did not significantly influence spawn weight or hatchability score. However, inbred females produced more eggs/kg body weight than control females. Two generations of full-sib mating in Georgia did not depress weight when expressed as a deviation from random controls, but weight was depressed 13-16% when expressed as a deviation from half-sib outcrosses. Second generation inbreds produced in Alabama exhibited a 19% depression in growth rate when compared to either random or half-sib outcross controls. Survival rates at various age intervals were not decreased by inbreeding. The amount of inbreeding depression varied among families and between sexes.
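The inbreeding coefficients quoted above (0.25, then 0.375) agree with the standard recursion for continued full-sib mating, F_t = (1 + 2 F_{t-1} + F_{t-2}) / 4. The short sketch below reproduces them, assuming a non-inbred, unrelated base population; the function name is illustrative only.

```python
def full_sib_inbreeding(generations):
    """Inbreeding coefficient F for successive generations of full-sib mating.

    Uses F_t = (1 + 2*F_{t-1} + F_{t-2}) / 4 and assumes the base
    population is non-inbred and unrelated (F = 0 before generation 1).
    """
    f_two_back, f_one_back = 0.0, 0.0
    coefficients = []
    for _ in range(generations):
        f_now = 0.25 * (1 + 2 * f_one_back + f_two_back)
        coefficients.append(f_now)
        f_two_back, f_one_back = f_one_back, f_now
    return coefficients


if __name__ == "__main__":
    for generation, f in enumerate(full_sib_inbreeding(3), start=1):
        print(f"generation {generation}: F = {f}")
```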
Methodological reporting of randomized trials in five leading Chinese nursing journals.

PubMed

Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu

2014-01-01

Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting complying with CONSORT and explore associated trial level variables in the Chinese nursing care field. In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured through the methods section of the CONSORT checklist and the overall CONSORT methodological items score was calculated and expressed as a percentage. Meanwhile, we hypothesized that some general and methodological characteristics were associated with reporting quality and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34 ± 0.97 (Mean ± SD). No RCT reported descriptions and changes in "trial design," changes in "outcomes" and "implementation," or descriptions of the similarity of interventions for "blinding." Poor reporting was found in detailing the "settings of participants" (13.1%), "type of randomization sequence generation" (1.8%), calculation methods of "sample size" (0.4%), explanation of any interim analyses and stopping guidelines for "sample size" (0.3%), "allocation concealment mechanism" (0.3%), additional analyses in "statistical methods" (2.1%), and targeted subjects and methods of "blinding" (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of "participants," "interventions," and definitions of the "outcomes" and "statistical methods." The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score. 
The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods.
Automated design evolution of stereochemically randomized protein foldamers

NASA Astrophysics Data System (ADS)

Ranbhor, Ranjit; Kumar, Anil; Patel, Kirti; Ramakrishnan, Vibin; Durani, Susheel

2018-05-01

Diversification of chain stereochemistry opens up the possibilities of an ‘in principle’ increase in the design space of proteins. This huge increase in the sequence and consequent structural variation is aimed at the generation of smart materials. To diversify protein structure stereochemically, we introduced L- and D-α-amino acids as the design alphabet. With a sequence design algorithm, we explored the usage of specific variables such as chirality and the sequence of this alphabet in independent steps. With molecular dynamics, we folded stereochemically diverse homopolypeptides and evaluated their ‘fitness’ for possible design as protein-like foldamers. We propose a fitness function to prune the most optimal fold among 1000 structures simulated with an automated repetitive simulated annealing molecular dynamics (AR-SAMD) approach. The highly scored poly-leucine fold with sequence lengths of 24 and 30 amino acids were later sequence-optimized using a Dead End Elimination cum Monte Carlo based optimization tool. This paper demonstrates a novel approach for the de novo design of protein-like foldamers.
Computational Analysis of Mouse piRNA Sequence and Biogenesis

PubMed Central

Betel, Doron; Sheridan, Robert; Marks, Debora S; Sander, Chris

2007-01-01

The recent discovery of a new class of 30-nucleotide long RNAs in mammalian testes, called PIWI-interacting RNA (piRNA), with similarities to microRNAs and repeat-associated small interfering RNAs (rasiRNAs), has raised puzzling questions regarding their biogenesis and function. We report a comparative analysis of currently available piRNA sequence data from the pachytene stage of mouse spermatogenesis that sheds light on their sequence diversity and mechanism of biogenesis. We conclude that (i) there are at least four times as many piRNAs in mouse testes as currently known; (ii) piRNAs, which originate from long precursor transcripts, are generated by quasi-random enzymatic processing that is guided by a weak sequence signature at the piRNA 5′ ends, resulting in a large number of distinct sequences; and (iii) many of the piRNA clusters contain inverted repeat segments capable of forming double-strand RNA fold-back segments that may initiate piRNA processing analogous to transposon silencing. PMID:17997596
Varietal Discrimination and Genetic Variability Analysis of Cymbopogon Using RAPD and ISSR Markers Analysis.

PubMed

Bishoyi, Ashok Kumar; Sharma, Anjali; Kavane, Aarti; Geetha, K A

2016-06-01

Cymbopogon is an important genus of the family Poaceae, cultivated mainly for its essential oils, which possess high medicinal and economic value. Several cultivars of Cymbopogon species are available for commercial cultivation in India, and identification of these cultivars has been conducted by means of morphological markers and essential oil constitution. Since these parameters are highly influenced by environmental factors, in most cases it is difficult to identify Cymbopogon cultivars. In the present study, random amplified polymorphic DNA (RAPD) and inter-simple sequence repeat (ISSR) markers were employed to discriminate nine leading varieties of Cymbopogon, since little or no prior genomic information is available for the genus. Ninety RAPD and 70 ISSR primers were used, which generated 63 and 69 % polymorphic amplicons, respectively. Similarity in the pattern of the UPGMA-derived dendrograms from RAPD and ISSR analysis revealed the reliability of the markers chosen for the study. Varietal/cultivar-specific markers generated from the study could be utilised for varietal/cultivar authentication, thus monitoring the quality of the essential oil production in Cymbopogon. These markers can also be utilised for the IPR protection of the cultivars. Moreover, the study provides a molecular marker tool kit of both random and simple sequence repeat markers for diverse molecular research in the same or related genera.
Apparatus for the conversion of power strokes of a random sequence and of random lengths of strokes into potential energy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Elkuch, E.

1984-01-17

The apparatus comprises at least one positive displacement pump, which is driven by the sea waves. The quantity of delivery of this pump is adjustable in accordance with the lengths of strokes made by the ocean waves. This is made possible in that the positive displacement pump comprises pistons having different volume displacements. The height of the incoming waves is measured by a membrane box connected to a transducer which generates signals such that only that piston of the plurality of pistons is made to operate, which has by design a volume displacement which gives the optimal recovery of the energy of the ocean waves. The piston or pistons pump a working fluid into a storage vessel, which allows the generation of peak load as well as base load electrical energy.
Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

PubMed Central

Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

2010-01-01

Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614
Memory and learning with rapid audiovisual sequences

PubMed Central

Keller, Arielle S.; Sekuler, Robert

2015-01-01

We examined short-term memory for sequences of visual stimuli embedded in varying multisensory contexts. In two experiments, subjects judged the structure of the visual sequences while disregarding concurrent, but task-irrelevant auditory sequences. Stimuli were eight-item sequences in which varying luminances and frequencies were presented concurrently and rapidly (at 8 Hz). Subjects judged whether the final four items in a visual sequence identically replicated the first four items. Luminances and frequencies in each sequence were either perceptually correlated (Congruent) or were unrelated to one another (Incongruent). Experiment 1 showed that, despite encouragement to ignore the auditory stream, subjects' categorization of visual sequences was strongly influenced by the accompanying auditory sequences. Moreover, this influence tracked the similarity between a stimulus's separate audio and visual sequences, demonstrating that task-irrelevant auditory sequences underwent a considerable degree of processing. Using a variant of Hebb's repetition design, Experiment 2 compared musically trained subjects and subjects who had little or no musical training on the same task as used in Experiment 1. Test sequences included some that intermittently and randomly recurred, which produced better performance than sequences that were generated anew for each trial. The auditory component of a recurring audiovisual sequence influenced musically trained subjects more than it did other subjects. This result demonstrates that stimulus-selective, task-irrelevant learning of sequences can occur even when such learning is an incidental by-product of the task being performed. PMID:26575193
Relatively Random: Context Effects on Perceived Randomness and Predicted Outcomes

ERIC Educational Resources Information Center

Matthews, William J.

2013-01-01

This article concerns the effect of context on people's judgments about sequences of chance outcomes. In Experiment 1, participants judged whether sequences were produced by random, mechanical processes (such as a roulette wheel) or skilled human action (such as basketball shots). Sequences with lower alternation rates were judged more likely to…
A Numerical Study of New Logistic Map

NASA Astrophysics Data System (ADS)

Khmou, Youssef

In this paper, we propose a new logistic map based on the relation of the information entropy, and we study its bifurcation diagram in comparison with that of the standard logistic map. In the first part, we compare the diagram obtained by numerical simulation with that of the standard logistic map. It is found that the structures of both diagrams are similar, where the range of the growth parameter is restricted to the interval [0,e]. In the second part, we present an application of the proposed map to traffic flow using a macroscopic model. It is found that the bifurcation diagram is an exact model of Greenberg's model of traffic flow, where the growth parameter corresponds to the optimal velocity and the random sequence corresponds to the density. In the last part, we present a second possible application of the proposed map, which consists of random number generation. The results of the analysis show that the excluded initial values of the sequences are (0,1).
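For orientation, the sketch below iterates the standard logistic map, x_{n+1} = r x_n (1 - x_n), which the abstract uses as its point of comparison. The proposed entropy-based map is not written out in the abstract, so it is not reproduced here. Turning the orbit into bits by thresholding at 0.5 is an assumption made for illustration, and the function names are hypothetical.

```python
def logistic_orbit(r, x0, n, burn_in=100):
    """Iterate the standard logistic map and return n values after a burn-in."""
    x = x0
    for _ in range(burn_in):      # discard the initial transient
        x = r * x * (1 - x)
    orbit = []
    for _ in range(n):
        x = r * x * (1 - x)
        orbit.append(x)
    return orbit


def logistic_bits(r, x0, n):
    """Threshold the orbit at 0.5 to obtain a 0/1 sequence (illustrative only)."""
    return [1 if x > 0.5 else 0 for x in logistic_orbit(r, x0, n)]


if __name__ == "__main__":
    # r = 2.5 settles on the fixed point 1 - 1/r = 0.6; r = 4 is chaotic.
    print(logistic_orbit(2.5, 0.3, 3))
    print(logistic_bits(4.0, 0.3, 32))
```

Sweeping r over a grid and plotting the post-transient orbit values for each r gives the bifurcation diagram the abstract refers to. Bits produced this way are a demonstration of the idea and should not be treated as a vetted random number generator.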
Image Encryption Algorithm Based on Hyperchaotic Maps and Nucleotide Sequences Database

PubMed Central

2017-01-01

Image encryption technology is one of the main means to ensure the safety of image information. Using the characteristics of chaos, such as randomness, regularity, ergodicity, and initial value sensitiveness, combined with the unique space conformation of DNA molecules and their unique information storage and processing ability, an efficient method for image encryption based on the chaos theory and a DNA sequence database is proposed. In this paper, digital image encryption employs a process of transforming the image pixel gray value by using chaotic sequence scrambling image pixel location and establishing superchaotic mapping, which maps quaternary sequences and DNA sequences, and by combining with the logic of the transformation between DNA sequences. The bases are replaced under the displaced rules by using DNA coding in a certain number of iterations that are based on the enhanced quaternary hyperchaotic sequence; the sequence is generated by Chen chaos. The cipher feedback mode and chaos iteration are employed in the encryption process to enhance the confusion and diffusion properties of the algorithm. Theoretical analysis and experimental results show that the proposed scheme not only demonstrates excellent encryption but also effectively resists chosen-plaintext attack, statistical attack, and differential attack. PMID:28392799

Partial bisulfite conversion for unique template sequencing.

PubMed

Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael; Levy, Dan

2018-01-25

We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135 000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
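The counting idea described above can be sketched in a few lines: each template receives a random set of C-to-T conversions, every copy inherits that set, and reads are grouped by the conversion pattern they carry. This is a simplified illustration under stated assumptions, not the muSeq pipeline: reads are full-length and error-free, only C-to-T on one strand is modeled (no G-to-A), and all names are hypothetical.

```python
import random
from collections import defaultdict


def deaminate(template, rate, rng):
    """Convert each C to T independently with probability `rate`."""
    return "".join(
        "T" if base == "C" and rng.random() < rate else base
        for base in template
    )


def conversion_signature(read, reference):
    """Positions where a reference C is observed as T in the read."""
    return tuple(
        i for i, (ref_base, read_base) in enumerate(zip(reference, read))
        if ref_base == "C" and read_base == "T"
    )


def cluster_by_signature(reads, reference):
    """Group reads sharing the same conversion pattern (one cluster per template)."""
    clusters = defaultdict(list)
    for read in reads:
        clusters[conversion_signature(read, reference)].append(read)
    return dict(clusters)


if __name__ == "__main__":
    rng = random.Random(7)
    reference = "ACCGTCCATCGCCTAGCCATCGGCTACCGATCCGTACGCC"
    templates = [deaminate(reference, 0.5, rng) for _ in range(3)]
    reads = [t for t in templates for _ in range(4)]  # four copies of each
    clusters = cluster_by_signature(reads, reference)
    print(len(clusters), "distinct signatures from", len(reads), "reads")
```

Two templates can receive the same pattern by chance, in which case they collapse into one cluster; a longer template or a well-chosen conversion rate makes such collisions less likely.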
Whole genome sequencing distinguishes between relapse and reinfection in recurrent leprosy cases

PubMed Central

Bührer-Sékula, Samira; Benjak, Andrej; Loiseau, Chloé; Singh, Pushpendra; Pontes, Maria A. A.; Gonçalves, Heitor S.; Hungria, Emerith M.; Busso, Philippe; Piton, Jérémie; Silveira, Maria I. S.; Cruz, Rossilene; Schetinni, Antônio; Costa, Maurício B.; Virmond, Marcos C. L.; Diorio, Suzana M.; Dias-Baptista, Ida M. F.; Rosa, Patricia S.; Matsuoka, Masanori; Penna, Maria L. F.; Cole, Stewart T.; Penna, Gerson O.

2017-01-01

Background Since leprosy is both treated and controlled by multidrug therapy (MDT) it is important to monitor recurrent cases for drug resistance and to distinguish between relapse and reinfection as a means of assessing therapeutic efficacy. All three objectives can be reached with single nucleotide resolution using next generation sequencing and bioinformatics analysis of Mycobacterium leprae DNA present in human skin. Methodology DNA was isolated by means of optimized extraction and enrichment methods from samples from three recurrent cases in leprosy patients participating in an open-label, randomized, controlled clinical trial of uniform MDT in Brazil (U-MDT/CT-BR). Genome-wide sequencing of M. leprae was performed and the resultant sequence assemblies analyzed in silico. Principal findings In all three cases, no mutations responsible for resistance to rifampicin, dapsone and ofloxacin were found, thus eliminating drug resistance as a possible cause of disease recurrence. However, sequence differences were detected between the strains from the first and second disease episodes in all three patients. In one case, clear evidence was obtained for reinfection with an unrelated strain whereas in the other two cases, relapse appeared more probable. Conclusions/Significance This is the first report of using M. leprae whole genome sequencing to reveal that treated and cured leprosy patients who remain in endemic areas can be reinfected by another strain. Next generation sequencing can be applied reliably to M. leprae DNA extracted from biopsies to discriminate between cases of relapse and reinfection, thereby providing a powerful tool for evaluating different outcomes of therapeutic regimens and for following disease transmission. PMID:28617800
Dimeric PROP1 binding to diverse palindromic TAAT sequences promotes its transcriptional activity.

PubMed

Nakayama, Michie; Kato, Takako; Susa, Takao; Sano, Akiko; Kitahara, Kousuke; Kato, Yukio

2009-08-13

Mutations in the Prop1 gene are responsible for murine Ames dwarfism and human combined pituitary hormone deficiency with hypogonadism. Recently, we reported that PROP1 is a possible transcription factor for gonadotropin subunit genes through plural cis-acting sites composed of AT-rich sequences containing a TAAT motif which differs from its consensus binding sequence known as PRDQ9 (TAATTGAATTA). This study aimed to verify the binding specificity and sequence of PROP1 by applying the method of SELEX (Systematic Evolution of Ligands by EXponential enrichment), EMSA (electrophoretic mobility shift assay) and transient transfection assay. SELEX, after 5, 7 and 9 generations of selection using a random sequence library, showed that nucleotides containing one or two TAAT motifs were accumulated and accounted for 98.5% at the 9th generation. Aligned sequences and EMSA demonstrated that PROP1 binds preferentially to 11 nucleotides composed of an inverted TAAT motif separated by 3 nucleotides with variation in the half site of palindromic TAAT motifs and with preferential requirement of T at the nucleotide number 5 immediately 3' to a TAAT motif. Transient transfection assay demonstrated first that dimeric binding of PROP1 to an inverted TAAT motif and its cognates resulted in transcriptional activation, whereas monomeric binding of PROP1 to a single TAAT motif and an inverted ATTA motif did not mediate activation. Thus, this study demonstrated that dimeric binding of PROP1 is able to recognize diverse palindromic TAAT sequences separated by 3 nucleotides and to exhibit its transcriptional activity.
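The preferred site described above, an inverted TAAT motif separated by 3 nucleotides (11 nucleotides in total, as in the PRDQ9 consensus TAATTGAATTA), can be located with a simple pattern search. The sketch below is only an illustration of that description: it looks for perfect TAAT-NNN-ATTA sites on the given strand and ignores the half-site variation and position-5 preference reported in the study. The function name is hypothetical.

```python
import re

# TAAT, any three nucleotides, then ATTA. The lookahead lets
# overlapping sites be reported as well.
INVERTED_TAAT = re.compile(r"(?=(TAAT[ACGT]{3}ATTA))")


def find_inverted_taat_sites(sequence):
    """Return (start, site) for every TAAT-NNN-ATTA occurrence."""
    return [
        (match.start(), match.group(1))
        for match in INVERTED_TAAT.finditer(sequence.upper())
    ]


if __name__ == "__main__":
    promoter_fragment = "GGTAATTGAATTACC"  # contains the PRDQ9 consensus
    print(find_inverted_taat_sites(promoter_fragment))
```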
Improved methods of DNA extraction from human spermatozoa that mitigate experimentally-induced oxidative DNA damage.

PubMed

Xavier, Miguel J; Nixon, Brett; Roman, Shaun D; Aitken, Robert John

2018-01-01

Current approaches for DNA extraction and fragmentation from mammalian spermatozoa provide several challenges for the investigation of the oxidative stress burden carried in the genome of male gametes. Indeed, the potential introduction of oxidative DNA damage induced by reactive oxygen species, reducing agents (dithiothreitol or beta-mercaptoethanol), and DNA shearing techniques used in the preparation of samples for chromatin immunoprecipitation and next-generation sequencing serves to confound the reliability and accuracy of the results obtained. Here we report optimised methodology that minimises, or completely eliminates, exposure to DNA damaging compounds during extraction and fragmentation procedures. Specifically, we show that Micrococcal nuclease (MNase) digestion prior to cellular lysis generates a greater DNA yield with minimal collateral oxidation while randomly fragmenting the entire paternal genome. This modified methodology represents a significant improvement over traditional fragmentation achieved via sonication in the preparation of genomic DNA from human spermatozoa for downstream applications, such as next-generation sequencing. We also present a redesigned bioinformatic pipeline framework adjusted to correctly analyse this form of data and detect statistically relevant targets of oxidation.
Characterization and Modulation of Proteins Involved in Sulfur Mustard Vesication

DTIC Science & Technology

2000-06-01

PARP staining was present throughout the nucleus, the DBD showed a more localized punctate pattern in the region of the nucleolus and throughout the...34 oligonucleotide is synthesized that is identical in base composition to the antisense, but had a randomly generated sequence. This is an important control...reversed this inhibitory effect. The roles of PARP in modulating the composition and enzyme activities of the DNA synthesome were further investigated by
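The control described in the excerpt above, an oligonucleotide with the same base composition as the antisense but a randomly generated order, amounts to shuffling the antisense sequence. A minimal sketch follows; the example oligonucleotide and the function name are hypothetical and are not taken from the report.

```python
import random
from collections import Counter


def scrambled_control(oligo, seed=None, max_tries=1000):
    """Return a random reordering of `oligo` with identical base composition."""
    rng = random.Random(seed)
    bases = list(oligo)
    for _ in range(max_tries):
        rng.shuffle(bases)
        candidate = "".join(bases)
        if candidate != oligo:    # reject the unshuffled sequence itself
            return candidate
    raise ValueError("could not produce a sequence different from the input")


if __name__ == "__main__":
    antisense = "GCTAGGTCATCGATTGCA"  # hypothetical example sequence
    control = scrambled_control(antisense, seed=1)
    print(control, Counter(control) == Counter(antisense))
```

In practice a scrambled control would also be screened against the target transcriptome to confirm it has no unintended complementarity, which this sketch does not attempt.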
Absolute nuclear material assay

DOEpatents

Prasad, Manoj K [Pleasanton, CA; Snyderman, Neal J [Berkeley, CA; Rowland, Mark S [Alamo, CA

2012-05-15

A method of absolute nuclear material assay of an unknown source comprising counting neutrons from the unknown source and providing an absolute nuclear material assay utilizing a model to optimally compare to the measured count distributions. In one embodiment, the step of providing an absolute nuclear material assay comprises utilizing a random sampling of analytically computed fission chain distributions to generate a continuous time-evolving sequence of event-counts by spreading the fission chain distribution in time.
Absolute nuclear material assay

DOEpatents

Prasad, Manoj K [Pleasanton, CA; Snyderman, Neal J [Berkeley, CA; Rowland, Mark S [Alamo, CA

2010-07-13

A method of absolute nuclear material assay of an unknown source comprising counting neutrons from the unknown source and providing an absolute nuclear material assay utilizing a model to optimally compare to the measured count distributions. In one embodiment, the step of providing an absolute nuclear material assay comprises utilizing a random sampling of analytically computed fission chain distributions to generate a continuous time-evolving sequence of event-counts by spreading the fission chain distribution in time.
Assessment of clonality and serotypes of Streptococcus mutans among children by multilocus sequence typing.

PubMed

Momeni, Stephanie S; Whiddon, Jennifer; Cheon, Kyounga; Moser, Stephen A; Childers, Noel K

2015-12-01

Studies using multilocus sequence typing (MLST) have demonstrated that Streptococcus mutans isolates are genetically diverse. Our laboratory previously demonstrated clonality of S. mutans using MLST but could not discount the possibility of sampling bias. In this study, the clonality of randomly selected S. mutans plaque isolates from African-American children was examined using MLST. Serotype and the presence of collagen-binding proteins (CBPs) encoded by cnm/cbm were also assessed. One hundred S. mutans isolates were randomly selected for MLST analysis. Sequence analysis was performed and phylogenetic trees were generated using START2 and MEGA. Thirty-four sequence types were identified, of which 27 were unique to this population. Seventy-five per cent of the isolates clustered into 16 clonal groups. The serotypes observed were c (n = 84), e (n = 3), and k (n = 11). The prevalence of S. mutans isolates of serotype k was notably high, at 17.5%. All isolates were cnm/cbm negative. The clonality of S. mutans demonstrated in this study illustrates the importance of localized population studies and is consistent with transmission. The prevalence of serotype k, a recently proposed systemic pathogen, observed in this study is higher than reported in most populations, and this is the first report of S. mutans serotype k in a United States population. © 2015 Eur J Oral Sci.
Genetic variability in isolates of Chromobacterium violaceum from pulmonary secretion, water, and soil.

PubMed

Santini, A C; Magalhães, J T; Cascardo, J C M; Corrêa, R X

2016-04-28

Chromobacterium violaceum is a free-living Gram-negative bacillus usually found in the water and soil in tropical regions, which causes infections in humans. Chromobacteriosis is characterized by rapid dissemination and high mortality. The aim of this study was to detect the genetic variability among C. violaceum type strain ATCC 12472, and seven isolates from the environment and one from a pulmonary secretion from a chromobacteriosis patient from Ilhéus, Bahia. The molecular characterization of all samples was performed by polymerase chain reaction (PCR) sequencing and 16S rDNA analysis. Primers specific for two ATCC 12472 pathogenicity genes, hilA and yscD, as well as random amplified polymorphic DNA (RAPD), were used for PCR amplification and comparative sequencing of the products. For a more specific approach, the PCR products of 16S rDNA were digested with restriction enzymes. Seven of the samples, including type-strain ATCC 12472, were amplified by the hilA primers; these were subsequently sequenced. Gene yscD was amplified only in type-strain ATCC 12472. MspI and AluI digestion revealed 16S rDNA polymorphisms. These data allowed the generation of a dendrogram for each analysis. The isolates of C. violaceum have variability in random genomic regions demonstrated by RAPD. Also, these isolates have variability in pathogenicity genes, as demonstrated by sequencing and restriction enzyme digestion.
The Effect of Interference on Temporal Order Memory for Random and Fixed Sequences in Nondemented Older Adults

ERIC Educational Resources Information Center

Tolentino, Jerlyn C.; Pirogovsky, Eva; Luu, Trinh; Toner, Chelsea K.; Gilbert, Paul E.

2012-01-01

Two experiments tested the effect of temporal interference on order memory for fixed and random sequences in young adults and nondemented older adults. The results demonstrate that temporal order memory for fixed and random sequences is impaired in nondemented older adults, particularly when temporal interference is high. However, temporal order…
Next-Generation DNA Sequencing of VH/VL Repertoires: A Primer and Guide to Applications in Single-Domain Antibody Discovery.

PubMed

Henry, Kevin A

2018-01-01

Immunogenetic analyses of expressed antibody repertoires are becoming increasingly common experimental investigations and are critical to furthering our understanding of autoimmunity, infectious disease, and cancer. Next-generation DNA sequencing (NGS) technologies have now made it possible to interrogate antibody repertoires to unprecedented depths, typically by sequencing of cDNAs encoding immunoglobulin variable domains. In this chapter, we describe simple, fast, and reliable methods for producing and sequencing multiplex PCR amplicons derived from the variable regions (VH, VHH or VL) of rearranged immunoglobulin heavy and light chain genes using the Illumina MiSeq platform. We include complete protocols and primer sets for amplicon sequencing of VH/VHH/VL repertoires directly from human, mouse, and llama lymphocytes as well as from phage-displayed VH/VHH/VL libraries; these can easily be adapted to other types of amplicons with little modification. The resulting amplicons are diverse and representative, even using as few as 10³ input B cells, and their generation is relatively inexpensive, requiring no special equipment and only a limited set of primers. In the absence of heavy-light chain pairing, single-domain antibodies are uniquely amenable to NGS analyses. We present a number of applications of NGS technology useful in discovery of single-domain antibodies from phage display libraries, including: (i) assessment of library functionality; (ii) confirmation of desired library randomization; (iii) estimation of library diversity; and (iv) monitoring the progress of panning experiments. While the case studies presented here are of phage-displayed single-domain antibody libraries, the principles extend to other types of in vitro display libraries.
Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

PubMed

Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

2018-03-20

The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R² = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R² = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample.
Again, the outputs from functional profiling analysis using SUPER-FOCUS were generally accordant between the platforms at different sequencing depths. Finally, and expectedly, metagenome assembly completeness was significantly lower on the MiSeq than either on the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth. Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.
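The depth test mentioned in this record (randomly subsampling reads) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the function name `subsample_reads`, the toy read identifiers, the chosen depths, and the fixed seed are all assumptions made for illustration.

```python
import random

def subsample_reads(reads, depth, seed=0):
    """Return `depth` reads drawn uniformly at random without replacement."""
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of reads")
    rng = random.Random(seed)  # fixed seed so a given subsample is reproducible
    return rng.sample(reads, depth)

# Hypothetical toy data: 1,000 read identifiers standing in for a FASTQ file.
reads = [f"read_{i}" for i in range(1000)]

# Draw one subsample per depth; the same downstream analysis can then be
# repeated on each subsample to see how results change with depth.
subsamples = {depth: subsample_reads(reads, depth) for depth in (10, 100, 500)}
```

Sampling without replacement keeps every retained read unique, which is one common way to mimic a shallower sequencing run of the same library.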
Sequence Complexity of Chromosome 3 in Caenorhabditis elegans

PubMed Central

Pierro, Gaetano

2012-01-01

The complexity of the nucleotide sequences in chromosome 3 of Caenorhabditis elegans (C. elegans) is studied. The complexity of these sequences is compared with that of some random sequences. Moreover, by using some parameters related to complexity, such as fractal dimension and frequency, and the indicator matrix, a first classification of the sequences of C. elegans is given. In particular, the sequences with the highest and lowest fractal values are singled out. It is shown that the low fractal dimension sequences have many features in common with the random sequences. PMID:22919380
Visible digital watermarking system using perceptual models

NASA Astrophysics Data System (ADS)

Cheng, Qiang; Huang, Thomas S.

2001-03-01

This paper presents a visible watermarking system using perceptual models. A watermark image is overlaid translucently onto a primary image, for the purposes of immediate claim of copyright, instantaneous recognition of owner or creator, or deterrence to piracy of digital images or video. The watermark is modulated by exploiting combined DCT-domain and DWT-domain perceptual models. The resulting watermarked image is visually pleasing and unobtrusive. The location, size and strength of the watermark vary randomly with the underlying image. The randomization makes the automatic removal of the watermark difficult even when the algorithm is publicly known, as long as the key to the random sequence generator is kept secret. The experiments demonstrate that the watermarked images have pleasant visual effect and strong robustness. The watermarking system can be used in copyright notification and protection.
Generating Correlated Gamma Sequences for Sea-Clutter Simulation

DTIC Science & Technology

2012-03-01

generation of correlated Gamma random fields via SIRP theory is examined in [Conte et al. 1991, Armstrong & Griffiths 1991]. In these papers, the Gamma...²〉² + |〈x[n]x∗[n+k]〉|². (4) Because 〈|x|²〉² = z̄² and |〈x[n]x∗[n+k]〉|² ≥ 0, this results in 〈z[n]z[n+k]〉 ≥ z̄² if the realisation of z[n] is...linear mapping. In a practical situation, a process with a given auto-covariance function would be specified. It is shown that by using an
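The inequality quoted in this excerpt can be checked numerically. The sketch below is not taken from the report: it assumes that z[n] = |x[n]|² with x a zero-mean circular complex Gaussian process, and that the truncated equation (4) is the standard Gaussian moment relation 〈z[n]z[n+k]〉 = 〈|x|²〉² + |〈x[n]x∗[n+k]〉|². The function names, the correlation value 0.8, and the sample count are illustrative choices.

```python
import math
import random

def complex_gaussian(rng):
    """Circular complex Gaussian sample with unit mean power, E|x|^2 = 1."""
    s = math.sqrt(0.5)
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))

def intensity_moments(rho, n_samples=200_000, seed=1):
    """Monte Carlo estimates of <z1 z2>, of z_bar^2 + |<x1 x2*>|^2, and of z_bar,
    where z = |x|^2 and the pair (x1, x2) has correlation coefficient rho."""
    rng = random.Random(seed)
    sum_zz = 0.0
    sum_z = 0.0
    sum_xx = 0j
    for _ in range(n_samples):
        a, b = complex_gaussian(rng), complex_gaussian(rng)
        x1 = a
        x2 = rho * a + math.sqrt(1.0 - rho * rho) * b  # unit power, correlated with x1
        z1, z2 = abs(x1) ** 2, abs(x2) ** 2
        sum_zz += z1 * z2
        sum_z += z1
        sum_xx += x1 * x2.conjugate()
    lhs = sum_zz / n_samples                          # estimate of <z1 z2>
    z_bar = sum_z / n_samples                         # estimate of <|x|^2>
    rhs = z_bar ** 2 + abs(sum_xx / n_samples) ** 2   # assumed right-hand side of (4)
    return lhs, rhs, z_bar

lhs, rhs, z_bar = intensity_moments(rho=0.8)
print(f"<z1 z2> ~ {lhs:.3f}, z_bar^2 + |<x1 x2*>|^2 ~ {rhs:.3f}, z_bar^2 ~ {z_bar ** 2:.3f}")
```

Under these assumptions both estimates should sit near 1 + 0.8² = 1.64, above z̄² ≈ 1, which is the sense in which the intensity correlation of such a process cannot fall below z̄².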
Universal quantum computation with temporal-mode bilayer square lattices

NASA Astrophysics Data System (ADS)

Alexander, Rafael N.; Yokoyama, Shota; Furusawa, Akira; Menicucci, Nicolas C.

2018-03-01

We propose an experimental design for universal continuous-variable quantum computation that incorporates recent innovations in linear-optics-based continuous-variable cluster state generation and cubic-phase gate teleportation. The first ingredient is a protocol for generating the bilayer-square-lattice cluster state (a universal resource state) with temporal modes of light. With this state, measurement-based implementation of Gaussian unitary gates requires only homodyne detection. Second, we describe a measurement device that implements an adaptive cubic-phase gate, up to a random phase-space displacement. It requires a two-step sequence of homodyne measurements and consumes a (non-Gaussian) cubic-phase state.
Scenario generation for stochastic optimization problems via the sparse grid method

DOE PAGES

Chen, Michael; Mehrotra, Sanjay; Papp, David

2015-04-19

We study the use of sparse grids in the scenario generation (or discretization) problem in stochastic programming problems where the uncertainty is modeled using a continuous multivariate distribution. We show that, under a regularity assumption on the random function involved, the sequence of optimal objective function values of the sparse grid approximations converges to the true optimal objective function values as the number of scenarios increases. The rate of convergence is also established. We treat separately the special case when the underlying distribution is an affine transform of a product of univariate distributions, and show how the sparse grid method can be adapted to the distribution by the use of quadrature formulas tailored to the distribution. We numerically compare the performance of the sparse grid method using different quadrature rules with classic quasi-Monte Carlo (QMC) methods, optimal rank-one lattice rules, and Monte Carlo (MC) scenario generation, using a series of utility maximization problems with up to 160 random variables. The results show that the sparse grid method is very efficient, especially if the integrand is sufficiently smooth. In such problems the sparse grid scenario generation method is found to need several orders of magnitude fewer scenarios than MC and QMC scenario generation to achieve the same accuracy. As a result, it is indicated that the method scales well with the dimension of the distribution--especially when the underlying distribution is an affine transform of a product of univariate distributions, in which case the method appears scalable to thousands of random variables.
Prospective identification of parasitic sequences in phage display screens

PubMed Central

Matochko, Wadim L.; Cory Li, S.; Tang, Sindy K.Y.; Derda, Ratmir

2014-01-01

Phage display empowered the development of proteins with new function and ligands for clinically relevant targets. In this report, we use next-generation sequencing to analyze phage-displayed libraries and uncover a strong bias induced by amplification preferences of phage in bacteria. This bias favors fast-growing sequences that collectively constitute <0.01% of the available diversity. Specifically, a library of 10⁹ random 7-mer peptides (Ph.D.-7) includes a few thousand sequences that grow quickly (the ‘parasites’), which are the sequences that are typically identified in phage display screens published to date. A similar collapse was observed in other libraries. Using Illumina and Ion Torrent sequencing and multiple biological replicates of amplification of Ph.D.-7 library, we identified a focused population of 770 ‘parasites’. In all, 197 sequences from this population have been identified in literature reports that used Ph.D.-7 library. Many of these enriched sequences have confirmed function (e.g. target binding capacity). The bias in the literature, thus, can be viewed as a selection with two different selection pressures: (i) target-binding selection, and (ii) amplification-induced selection. Enrichment of parasitic sequences could be minimized if amplification bias is removed. Here, we demonstrate that emulsion amplification in libraries of ∼10⁶ diverse clones prevents the biased selection of parasitic clones. PMID:24217917
Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

PubMed Central

Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

2013-01-01

Background: Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only a few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings: Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) unigenes showed similarity to the sequences in NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotype analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance: SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799
Gift from statistical learning: Visual statistical learning enhances memory for sequence elements and impairs memory for items that disrupt regularities.

PubMed

Otsuka, Sachio; Saiki, Jun

2016-02-01

Prior studies have shown that visual statistical learning (VSL) enhances familiarity (a type of memory) of sequences. How do statistical regularities influence the processing of each triplet element and inserted distractors that disrupt the regularity? Given the increased attention to triplets induced by VSL and the inhibition of unattended triplets, we predicted that VSL would promote memory for each triplet constituent, and degrade memory for inserted stimuli. Across the first two experiments, we found that objects from structured sequences were more likely to be remembered than objects from random sequences, and that letters (Experiment 1) or objects (Experiment 2) inserted into structured sequences were less likely to be remembered than those inserted into random sequences. In the subsequent two experiments, we examined an alternative account for our results, whereby the difference in memory for inserted items between structured and random conditions is due to individuation of items within random sequences. Our findings replicated even when control letters (Experiment 3A) or objects (Experiment 3B) were presented before or after, rather than inserted into, random sequences. Our findings suggest that statistical learning enhances memory for each item in a regular set and impairs memory for items that disrupt the regularity. Copyright © 2015 Elsevier B.V. All rights reserved.

P41 IDENTIFICATION OF GLIOMA SPECIFIC APTAMER TARGETS

PubMed Central

Arora, Mohit; Alder, Jane; Lawrence, Clare; Davis, Charles; Dawson, Tim; Hall, Greg; Shaw, Lisa

2014-01-01

INTRODUCTION: Aptamers are in vitro generated DNA and RNA sequences which are randomly created as a library, with multiple permutations and combinations. These are then exposed to the target structure against which we want an aptamer ‘selected’ using Systematic Evolution of Ligands by Exponential enrichment (SELEX). METHOD: Commercially available glioma and glial cell lines and in-house generated primary glioma cultures were used. Modified aptamers based on published sequences against glioma cell lines and newly generated sequences were used in the project to identify their binding targets. Cy3 or biotin-conjugated aptamers were incubated with live glioma cell cultures and imaged using confocal or light microscopy. To determine the target ligand, aptamers were then reacted with glial cell lysate and subjected to precipitation using streptavidin agarose beads and SDS polyacrylamide electrophoresis. Proteins were analysed by mass spectroscopy. RESULTS: Known and unknown aptamer protein ligands were co-precipitated. Ku70, Ku80 were precipitated along with nucleolin and related proteins. CONCLUSION: The aptamer has shown preferential binding to glioma cells and could act as a delivery system for therapeutic payloads. The aptamer targets Ku70 and Ku80, which are known to be overexpressed in other forms of cancer but their role in gliomagenesis has not been fully elucidated. Other novel proteins have also been identified. Thus the aptamer co-precipitation technique has identified potential glioma biomarkers that may be of clinical significance.
Simulation of gene evolution under directional mutational pressure

NASA Astrophysics Data System (ADS)

Dudkiewicz, Małgorzata; Mackiewicz, Paweł; Kowalczuk, Maria; Mackiewicz, Dorota; Nowicka, Aleksandra; Polak, Natalia; Smolarczyk, Kamila; Banaszak, Joanna; R. Dudek, Mirosław; Cebrat, Stanisław

2004-05-01

The two main mechanisms generating genetic diversity, mutation and recombination, are random in character but biased, which affects the generation of asymmetry in the bacterial chromosome structure and in the protein coding sequences. Thus, as in the case of two chiral molecules, the two possible orientations of a gene in relation to the topology of a chromosome are not equivalent. Assuming that the sequence of a gene may oscillate only between certain limits of its structural composition means that the gene could be forced out of these limits by the directional mutation pressure in the course of evolution. The probability of the event depends on the time the gene stays under the same mutation pressure. Inversion of the gene changes the directional mutational pressure to the reciprocal one and hence it changes the distance of the gene to its lower and upper bounds of structural tolerance. Using Monte Carlo methods we were able to simulate the evolution of genes under experimentally found mutational pressure, assuming simple mechanisms of selection. We found that mutation and recombination should work in accordance to lower their negative effects on the function of the products of coding sequences.
GTRAC: fast retrieval from compressed collections of genomic variants

PubMed Central

Tatwawadi, Kedar; Hernaez, Mikel; Ochoa, Idoia; Weissman, Tsachy

2016-01-01

Motivation: The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether. Results: We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures. Availability and Implementation: The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC Contact: kedart@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27587665
GTRAC: fast retrieval from compressed collections of genomic variants.

PubMed

Tatwawadi, Kedar; Hernaez, Mikel; Ochoa, Idoia; Weissman, Tsachy

2016-09-01

The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether. We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures. The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC Contact: kedart@stanford.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Absolute nuclear material assay using count distribution (LAMBDA) space

DOE Office of Scientific and Technical Information (OSTI.GOV)

Prasad, Manoj K.; Snyderman, Neal J.; Rowland, Mark S.

A method of absolute nuclear material assay of an unknown source comprising counting neutrons from the unknown source and providing an absolute nuclear material assay utilizing a model to optimally compare to the measured count distributions. In one embodiment, the step of providing an absolute nuclear material assay comprises utilizing a random sampling of analytically computed fission chain distributions to generate a continuous time-evolving sequence of event-counts by spreading the fission chain distribution in time.
Absolute nuclear material assay using count distribution (LAMBDA) space

DOEpatents

Prasad, Manoj K [Pleasanton, CA]; Snyderman, Neal J [Berkeley, CA]; Rowland, Mark S [Alamo, CA]

2012-06-05

A method of absolute nuclear material assay of an unknown source comprising counting neutrons from the unknown source and providing an absolute nuclear material assay utilizing a model to optimally compare to the measured count distributions. In one embodiment, the step of providing an absolute nuclear material assay comprises utilizing a random sampling of analytically computed fission chain distributions to generate a continuous time-evolving sequence of event-counts by spreading the fission chain distribution in time.
[Influence of "prehistory" of sequential movements of the right and the left hand on reproduction: coding of positions, movements and sequence structure].

PubMed

Bobrova, E V; Liakhovetskiĭ, V A; Borshchevskaia, E R

2011-01-01

The dependence of errors during reproduction of a sequence of hand movements without visual feedback on the previous right- and left-hand performance ("prehistory") and on positions in space of sequence elements (random or ordered by the explicit rule) was analyzed. It was shown that the preceding information about the ordered positions of the sequence elements was used during right-hand movements, whereas left-hand movements were performed with involvement of the information about the random sequence. The data testify to a central mechanism of the analysis of spatial structure of sequence elements. This mechanism activates movement coding specific for the left hemisphere (vector coding) in case of an ordered sequence structure and positional coding specific for the right hemisphere in case of a random sequence structure.
Denoising DNA deep sequencing data—high-throughput sequencing errors and their correction

PubMed Central

Laehnemann, David; Borkhardt, Arndt

2016-01-01

Characterizing the errors generated by common high-throughput sequencing platforms and telling true genetic variation from technical artefacts are two interdependent steps, essential to many analyses such as single nucleotide variant calling, haplotype inference, sequence assembly and evolutionary studies. Both random and systematic errors can show a specific occurrence profile for each of the six prominent sequencing platforms surveyed here: 454 pyrosequencing, Complete Genomics DNA nanoball sequencing, Illumina sequencing by synthesis, Ion Torrent semiconductor sequencing, Pacific Biosciences single-molecule real-time sequencing and Oxford Nanopore sequencing. There is a large variety of programs available for error removal in sequencing read data, which differ in the error models and statistical techniques they use, the features of the data they analyse, the parameters they determine from them and the data structures and algorithms they use. We highlight the assumptions they make and for which data types these hold, providing guidance which tools to consider for benchmarking with regard to the data properties. While no benchmarking results are included here, such specific benchmarks would greatly inform tool choices and future software development. The development of stand-alone error correctors, as well as single nucleotide variant and haplotype callers, could also benefit from using more of the knowledge about error profiles and from (re)combining ideas from the existing approaches presented here. PMID:26026159
Genetic analysis of the Yavapai Native Americans from West-Central Arizona using the Illumina MiSeq FGx™ forensic genomics system.

PubMed

Wendt, Frank R; Churchill, Jennifer D; Novroski, Nicole M M; King, Jonathan L; Ng, Jillian; Oldt, Robert F; McCulloh, Kelly L; Weise, Jessica A; Smith, David Glenn; Kanthaswamy, Sreetharan; Budowle, Bruce

2016-09-01

Forensically-relevant genetic markers were typed for sixty-two Yavapai Native Americans using the ForenSeq™ DNA Signature Prep Kit. These data are invaluable to the human identity community due to the greater genetic differentiation among Native American tribes than among other subdivisions within major populations of the United States. Autosomal, X-chromosomal, and Y-chromosomal short tandem repeat (STR) and identity-informative (iSNPs), ancestry-informative (aSNPs), and phenotype-informative (pSNPs) single nucleotide polymorphism (SNP) allele frequencies are reported. Sequence-based allelic variants were observed in 13 autosomal, 3 X, and 3 Y STRs. These observations increased observed and expected heterozygosities for autosomal STRs by 0.081±0.068 and 0.073±0.063, respectively, and decreased single-locus random match probabilities by 0.051±0.043 for 13 autosomal STRs. The autosomal random match probabilities (RMPs) were 2.37×10⁻²⁶ and 2.81×10⁻²⁹ for length-based and sequence-based alleles, respectively. There were 22 and 25 unique Y-STR haplotypes among 26 males, generating haplotype diversities of 0.95 and 0.96, for length-based and sequence-based alleles, respectively. Of the 26 haplotypes generated, 17 were assigned to haplogroup Q, three to haplogroup R1b, two each to haplogroups E1b1b and L, and one each to haplogroups R1a and I1. Male and female sequence-based X-STR random match probabilities were 3.28×10⁻⁷ and 1.22×10⁻⁶, respectively. The average observed and expected heterozygosities for 94 iSNPs were 0.39±0.12 and 0.39±0.13, respectively, and the combined iSNP RMP was 1.08×10⁻³². The combined STR and iSNP RMPs were 2.55×10⁻⁵⁸ and 3.02×10⁻⁶¹ for length-based and sequence-based STR alleles, respectively.
Ancestry and phenotypic SNP information, performed using the ForenSeq™ Universal Analysis Software, predicted black hair, brown eyes, and some probability of East Asian ancestry for all but one sample that clustered between European and Admixed American ancestry on a principal components analysis. These data serve as the first population assessment using the ForenSeq™ panel and highlight the value of employing sequence-based alleles for forensic DNA typing to increase heterozygosity, which is beneficial for identity testing in populations with reduced genetic diversity. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
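The combined random match probabilities reported in this record are numerically consistent with multiplying the autosomal STR RMP by the iSNP RMP. The Python sketch below illustrates that product rule under the assumption that the two marker sets are independent; the abstract does not state how the combined values were derived, and the helper `combine_rmps` is a hypothetical name introduced here.

```python
import math

# Values quoted in the abstract above.
str_rmp = {"length-based": 2.37e-26, "sequence-based": 2.81e-29}
isnp_rmp = 1.08e-32

def combine_rmps(*rmps):
    """Multiply match probabilities, assuming the marker sets are independent.

    Summing log10 values instead of multiplying directly avoids underflow
    when many very small probabilities are combined.
    """
    return 10 ** sum(math.log10(p) for p in rmps)

combined = {kind: combine_rmps(p, isnp_rmp) for kind, p in str_rmp.items()}
for kind, p in combined.items():
    print(f"{kind}: {p:.2e}")
```

The products land within about one per cent of the reported combined values; the small differences presumably reflect rounding of the published inputs.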
Mitochondrial DNA control region sequences from Nairobi (Kenya): inferring phylogenetic parameters for the establishment of a forensic database.

PubMed

Brandstätter, Anita; Peterson, Christine T; Irwin, Jodi A; Mpoke, Solomon; Koech, Davy K; Parson, Walther; Parsons, Thomas J

2004-10-01

Large forensic mtDNA databases that adhere to strict guidelines for generation and maintenance are not available for many populations outside of the United States and western Europe. We have established a high quality mtDNA control region sequence database for urban Nairobi as both a reference database for forensic investigations, and as a tool to examine the genetic variation of Kenyan sequences in the context of known African variation. The Nairobi sequences exhibited high variation and a low random match probability, indicating utility for forensic testing. Haplogroup identification and frequencies were compared with those reported from other published studies on African or African-origin populations from Mozambique, Sierra Leone, and the United States, and suggest significant differences in the mtDNA compositions of the various populations. The quality of the sequence data in our study was investigated and supported using phylogenetic measures. Our data demonstrate the diversity and distinctiveness of African populations, and underline the importance of establishing additional forensic mtDNA databases of indigenous African populations.
Templated sequence insertion polymorphisms in the human genome

NASA Astrophysics Data System (ADS)

Onozawa, Masahiro; Aplan, Peter

2016-11-01

Templated Sequence Insertion Polymorphism (TSIP) is a recently described form of polymorphism recognized in the human genome, in which a sequence that is templated from a distant genomic region is inserted into the genome, seemingly at random. TSIPs can be grouped into two classes based on nucleotide sequence features at the insertion junctions; Class 1 TSIPs show features of insertions that are mediated via the LINE-1 ORF2 protein, including 1) target-site duplication (TSD), 2) polyadenylation 10-30 nucleotides downstream of a “cryptic” polyadenylation signal, and 3) preference for insertion at a 5’-TTTT/A-3’ sequence. In contrast, class 2 TSIPs show features consistent with repair of a DNA double-strand break via insertion of a DNA “patch” that is derived from a distant genomic region. Survey of a large number of normal human volunteers demonstrates that most individuals have 25-30 TSIPs, and that these TSIPs track with specific geographic regions. Similar to other forms of human polymorphism, we suspect that these TSIPs may be important for the generation of human diversity and genetic diseases.
Population genetics and molecular evolution of DNA sequences in transposable elements. I. A simulation framework.

PubMed

Kijima, T E; Innan, Hideki

2013-11-01

A population genetic simulation framework is developed to understand the behavior and molecular evolution of DNA sequences of transposable elements. Our model incorporates random transposition and excision of transposable element (TE) copies, two modes of selection against TEs, and degeneration of transpositional activity by point mutations. We first investigated the relationships between the behavior of the copy number of TEs and these parameters. Our results show that when selection is weak, the genome can maintain a relatively large number of TEs, but most of them are less active. In contrast, with strong selection, the genome can maintain only a limited number of TEs but the proportion of active copies is large. In such a case, there could be substantial fluctuations of the copy number over generations. We also explored how DNA sequences of TEs evolve through the simulations. In general, active copies form clusters around the original sequence, while less active copies have long branches specific to themselves, exhibiting a star-shaped phylogeny. It is demonstrated that the phylogeny of TE sequences could be informative to understand the dynamics of TE evolution.
Assessment of antibody library diversity through next generation sequencing and technical error compensation

PubMed Central

Lisi, Simonetta; Chirichella, Michele; Arisi, Ivan; Goracci, Martina; Cremisi, Federico; Cattaneo, Antonino

2017-01-01

Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody of sufficiently high affinity against a given antigen. Quantitative evaluation of antibody library complexity and quality has long been inadequately addressed, due to the high similarity and length of the sequences of the library. Complexity was usually inferred from the transformation efficiency and tested by fingerprinting and/or sequencing of a few hundred random library elements. Inferring complexity from such a small sample is, however, very rudimentary and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the assessment of antibody library complexity and quality. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, we present a new, PCR-free NGS approach to sequencing antibody libraries on the Illumina platform, coupled to a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that reliably estimates the complexity while taking the sequencing error into consideration. PMID:28505201
Assessment of antibody library diversity through next generation sequencing and technical error compensation.

PubMed

Fantini, Marco; Pandolfini, Luca; Lisi, Simonetta; Chirichella, Michele; Arisi, Ivan; Terrigno, Marco; Goracci, Martina; Cremisi, Federico; Cattaneo, Antonino

2017-01-01

Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody of sufficiently high affinity against a given antigen. Quantitative evaluation of antibody library complexity and quality has long been inadequately addressed, due to the high similarity and length of the sequences of the library. Complexity was usually inferred from the transformation efficiency and tested by fingerprinting and/or sequencing of a few hundred random library elements. Inferring complexity from such a small sample is, however, very rudimentary and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the assessment of antibody library complexity and quality. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, we present a new, PCR-free NGS approach to sequencing antibody libraries on the Illumina platform, coupled to a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that reliably estimates the complexity while taking the sequencing error into consideration.
Natural Time and Nowcasting Earthquakes: Are Large Global Earthquakes Temporally Clustered?

NASA Astrophysics Data System (ADS)

Luginbuhl, Molly; Rundle, John B.; Turcotte, Donald L.

2018-02-01

The objective of this paper is to analyze the temporal clustering of large global earthquakes with respect to natural time, or interevent count, as opposed to regular clock time. To do this, we use two techniques: (1) nowcasting, a new method of statistically classifying seismicity and seismic risk, and (2) time series analysis of interevent counts. We chose the sequences of M_{λ} ≥ 7.0 and M_{λ} ≥ 8.0 earthquakes from the global centroid moment tensor (CMT) catalog from 2004 to 2016 for analysis. A significant number of these earthquakes will be aftershocks of the largest events, but no satisfactory method of declustering the aftershocks in clock time is available. A major advantage of using natural time is that it eliminates the need for declustering aftershocks. The event count we utilize is the number of small earthquakes that occur between large earthquakes. The small earthquake magnitude is chosen to be as small as possible, such that the catalog is still complete based on the Gutenberg-Richter statistics. For the CMT catalog, starting in 2004, we found the completeness magnitude to be M_{σ} ≥ 5.1. For the nowcasting method, the cumulative probability distribution of these interevent counts is obtained. We quantify the distribution using the exponent, β, of the best fitting Weibull distribution; β = 1 for a random (exponential) distribution. We considered 197 earthquakes with M_{λ} ≥ 7.0 and found β = 0.83 ± 0.08. We considered 15 earthquakes with M_{λ} ≥ 8.0, but this number was considered too small to generate a meaningful distribution. For comparison, we generated synthetic catalogs of earthquakes that occur randomly with the Gutenberg-Richter frequency-magnitude statistics. We considered a synthetic catalog of 1.97 × 10^5 M_{λ} ≥ 7.0 earthquakes and found β = 0.99 ± 0.01. The random catalog converted to natural time was also random.
We then generated 1.5 × 10^4 synthetic catalogs with 197 M_{λ} ≥ 7.0 earthquakes in each catalog and found the statistical range of β values. The observed value of β = 0.83 for the CMT catalog corresponds to a p value of p = 0.004, leading us to conclude that the interevent natural times in the CMT catalog are not random. For the time series analysis, we calculated the autocorrelation function for the sequence of natural time intervals between large global earthquakes and again compared with data from 1.5 × 10^4 synthetic catalogs of random data. In this case, the spread of autocorrelation values was much larger, so we concluded that this approach is insensitive to deviations from random behavior.
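The role of the Weibull exponent β in the abstract above can be illustrated with a small sketch. It assumes a least-squares fit on the linearized Weibull CDF, ln(-ln(1 - F)) = β ln(x) - β ln(τ), which is a common textbook estimator and not necessarily the fitting procedure used by the authors. The function name `weibull_beta`, the sample sizes and the continuous synthetic draws (rather than integer interevent counts) are all assumptions made for illustration.

```python
import math
import random

def weibull_beta(samples):
    """Estimate the Weibull exponent by least squares on the linearized CDF:
    ln(-ln(1 - F)) = beta * ln(x) - beta * ln(tau)."""
    xs = sorted(s for s in samples if s > 0)
    n = len(xs)
    # Median-rank plotting positions keep F strictly inside (0, 1).
    pts = [(math.log(x), math.log(-math.log(1.0 - (i + 0.7) / (n + 0.4))))
           for i, x in enumerate(xs)]
    mean_x = sum(px for px, _ in pts) / n
    mean_y = sum(py for _, py in pts) / n
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in pts)
    sxx = sum((px - mean_x) ** 2 for px, _ in pts)
    return sxy / sxx  # slope of the fitted line is beta

rng = random.Random(0)
# Hypothetical synthetic data: exponential intervals (random, beta near 1)
# and Weibull intervals with shape 0.8 (clustered, beta below 1).
random_counts = [rng.expovariate(1.0) for _ in range(20000)]
clustered_counts = [rng.weibullvariate(1.0, 0.8) for _ in range(20000)]

beta_random = weibull_beta(random_counts)
beta_clustered = weibull_beta(clustered_counts)
print(f"random: beta = {beta_random:.2f}, clustered: beta = {beta_clustered:.2f}")
```

An exponent near 1 is what a random (exponential) sequence would give, while a value clearly below 1, such as the 0.83 reported for the CMT catalog, indicates an excess of short intervals, i.e. clustering.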
Fine-scale population structure and the era of next-generation sequencing.

PubMed

Henn, Brenna M; Gravel, Simon; Moreno-Estrada, Andres; Acevedo-Acevedo, Suehelay; Bustamante, Carlos D

2010-10-15

Fine-scale population structure characterizes most continents and is especially pronounced in non-cosmopolitan populations. Roughly half of the world's population remains non-cosmopolitan and even populations within cities often assort along ethnic and linguistic categories. Barriers to random mating can be ecologically extreme, such as the Sahara Desert, or cultural, such as the Indian caste system. In either case, subpopulations accumulate genetic differences if the barrier is maintained over multiple generations. Genome-wide polymorphism data, initially with only a few hundred autosomal microsatellites, have clearly established differences in allele frequency not only among continental regions, but also within continents and within countries. We review recent evidence from the analysis of genome-wide polymorphism data for genetic boundaries delineating human population structure and the main demographic and genomic processes shaping variation, and discuss the implications of population structure for the distribution and discovery of disease-causing genetic variants, in the light of the imminent availability of sequencing data for a multitude of diverse human genomes.
Syntactic sequencing in Hebbian cell assemblies.

PubMed

Wennekers, Thomas; Palm, Günther

2009-12-01

Hebbian cell assemblies provide a theoretical framework for the modeling of cognitive processes that grounds them in the underlying physiological neural circuits. We recently presented an extension of cell assemblies with operational components that allows aspects of language, rules, and complex behaviour to be modeled. In the present work we study the generation of syntactic sequences using operational cell assemblies timed by unspecific trigger signals. Syntactic patterns are implemented in terms of hetero-associative transition graphs in attractor networks which cause a directed flow of activity through the neural state space. We provide regimes for parameters that enable an unspecific excitatory control signal to switch reliably between attractors in accordance with the implemented syntactic rules. If several target attractors are possible in a given state, noise in the system in conjunction with a winner-takes-all mechanism can randomly choose a target. Disambiguation can also be guided by context signals or specific additional external signals. Given a permanently elevated level of external excitation, the model can enter an autonomous mode, in which it generates temporal grammatical patterns continuously.
EMPOP-quality mtDNA control region sequences from Kashmiri of Azad Jammu & Kashmir, Pakistan.

PubMed

Rakha, Allah; Peng, Min-Sheng; Bi, Rui; Song, Jiao-Jiao; Salahudin, Zeenat; Adan, Atif; Israr, Muhammad; Yao, Yong-Gang

2016-11-01

The mitochondrial DNA (mtDNA) control region (nucleotide positions 16024-576) sequences were generated by Sanger sequencing for 317 self-identified Kashmiris from all districts of Azad Jammu & Kashmir, Pakistan. The population sample set showed a total of 251 haplotypes, with a relatively high haplotype diversity (0.9977) and a low random match probability (0.54%). The matrilineal lineages belong to three different phylogeographic origins: Western Eurasian (48.9%), South Asian (47.0%) and East Asian (4.1%). The present data were compared to previous data from Pakistan and other worldwide populations (Central Asia, Western Asia, and East & Southeast Asia). The dataset is made available through EMPOP under accession number EMP00679 and will serve as an mtDNA reference database in forensic casework in Pakistan. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
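The two summary statistics quoted in the abstract above are linked by standard population-genetics formulas: the random match probability is RMP = Σ p_i², and the haplotype diversity is H = n/(n-1) · (1 - Σ p_i²). The sketch below assumes these conventional definitions (the abstract does not state which estimator was used), and the toy haplotype labels are hypothetical. It also checks that the reported values for n = 317 are mutually consistent.

```python
from collections import Counter

def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())

def haplotype_diversity(haplotypes):
    """Unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    return n / (n - 1) * (1.0 - random_match_probability(haplotypes))

def rmp_from_diversity(h, n):
    """Invert the diversity formula to recover the random match probability."""
    return 1.0 - h * (n - 1) / n

# Hypothetical toy sample: four individuals, three distinct haplotypes.
toy = ["H1", "H1", "H2", "H3"]
toy_rmp = random_match_probability(toy)
toy_h = haplotype_diversity(toy)

# Values reported in the abstract: H = 0.9977 for n = 317 individuals.
reported_rmp = rmp_from_diversity(0.9977, 317)
print(f"toy RMP = {toy_rmp:.3f}, toy H = {toy_h:.3f}")
print(f"RMP implied by the reported diversity = {reported_rmp * 100:.2f}%")
```

Under these definitions the reported diversity of 0.9977 implies a random match probability of about 0.54%, in agreement with the figure given in the abstract.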
Theta oscillations promote temporal sequence learning.

PubMed

Crivelli-Decker, Jordan; Hsieh, Liang-Tien; Clarke, Alex; Ranganath, Charan

2018-05-17

Many theoretical models suggest that neural oscillations play a role in learning or retrieval of temporal sequences, but the extent to which oscillations support sequence representation remains unclear. To address this question, we used scalp electroencephalography (EEG) to examine oscillatory activity over learning of different object sequences. Participants made semantic decisions on each object as they were presented in a continuous stream. For three "Consistent" sequences, the order of the objects was always fixed. Activity during Consistent sequences was compared to "Random" sequences that consisted of the same objects presented in a different order on each repetition. Over the course of learning, participants made faster semantic decisions to objects in Consistent, as compared to objects in Random sequences. Thus, participants were able to use sequence knowledge to predict upcoming items in Consistent sequences. EEG analyses revealed decreased oscillatory power in the theta (4-7 Hz) band at frontal sites following decisions about objects in Consistent sequences, as compared with objects in Random sequences. The theta power difference between Consistent and Random only emerged in the second half of the task, as participants were more effectively able to predict items in Consistent sequences. Moreover, we found increases in parieto-occipital alpha (10-13 Hz) and beta (14-28 Hz) power during the pre-response period for objects in Consistent sequences, relative to objects in Random sequences. Linear mixed effects modeling revealed that single trial theta oscillations were related to reaction time for future objects in a sequence, whereas beta and alpha oscillations were only predictive of reaction time on the current trial. These results indicate that theta and alpha/beta activity preferentially relate to future and current events, respectively. 
More generally our findings highlight the importance of band-specific neural oscillations in the learning of temporal order information. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
HASP server: a database and structural visualization platform for comparative models of influenza A hemagglutinin proteins.

PubMed

Ambroggio, Xavier I; Dommer, Jennifer; Gopalan, Vivek; Dunham, Eleca J; Taubenberger, Jeffery K; Hurt, Darrell E

2013-06-18

Influenza A viruses possess RNA genomes that mutate frequently in response to immune pressures. The mutations in the hemagglutinin genes are particularly significant, as the hemagglutinin proteins mediate attachment and fusion to host cells, thereby influencing viral pathogenicity and species specificity. Large-scale influenza A genome sequencing efforts have been ongoing to understand past epidemics and pandemics and anticipate future outbreaks. Sequencing efforts thus far have generated nearly 9,000 distinct hemagglutinin amino acid sequences. Comparative models for all publicly available influenza A hemagglutinin protein sequences (8,769 to date) were generated using the Rosetta modeling suite. The C-alpha root mean square deviations between a randomly chosen test set of models and their crystallographic templates were less than 2 Å, suggesting that the modeling protocols yielded high-quality results. The models were compiled into an online resource, the Hemagglutinin Structure Prediction (HASP) server. The HASP server was designed as a scientific tool for researchers to visualize hemagglutinin protein sequences of interest in a three-dimensional context. With a built-in molecular viewer, hemagglutinin models can be compared side-by-side and navigated by a corresponding sequence alignment. The models and alignments can be downloaded for offline use and further analysis. The modeling protocols used in the HASP server scale well for large amounts of sequences and will keep pace with expanded sequencing efforts. The conservative approach to modeling and the intuitive search and visualization interfaces allow researchers to quickly analyze hemagglutinin sequences of interest in the context of the most highly related experimental structures, and allow them to directly compare hemagglutinin sequences to each other simultaneously in their two- and three-dimensional contexts. 
The models and methodology have shown utility in current research efforts and the ongoing aim of the HASP server is to continue to accelerate influenza A research and have a positive impact on global public health.

Generation and analysis of expressed sequence tags from a cDNA library of the fruiting body of Ganoderma lucidum

PubMed Central

2010-01-01

Background Little genomic or transcriptomic information on Ganoderma lucidum (Lingzhi) is known. This study aims to discover the transcripts involved in secondary metabolite biosynthesis and developmental regulation of G. lucidum using an expressed sequence tag (EST) library. Methods A cDNA library was constructed from the G. lucidum fruiting body. Its high-quality ESTs were assembled into unique sequences with contigs and singletons. The unique sequences were annotated according to sequence similarities to genes or proteins available in public databases. The detection of simple sequence repeats (SSRs) was performed by online analysis. Results A total of 1,023 clones were randomly selected from the G. lucidum library and sequenced, yielding 879 high-quality ESTs. These ESTs showed similarities to a diverse range of genes. The sequences encoding squalene epoxidase (SE) and farnesyl-diphosphate synthase (FPS) were identified in this EST collection. Several candidate genes, such as hydrophobin, MOB2, profilin and PHO84, were detected for the first time in G. lucidum. Thirteen potential SSR-motif microsatellite loci were also identified. Conclusion The present study demonstrates a successful application of EST analysis in the discovery of transcripts involved in the secondary metabolite biosynthesis and the developmental regulation of G. lucidum. PMID:20230644
The Role of the Y-Chromosome in the Establishment of Murine Hybrid Dysgenesis and in the Analysis of the Nucleotide Sequence Organization, Genetic Transmission and Evolution of Repeated Sequences.

NASA Astrophysics Data System (ADS)

Nallaseth, Ferez Soli

The Y-chromosome presents a unique cytogenetic framework for the evolution of nucleotide sequences. Alignment of nine Y-chromosomal fragments in their increasing Y-specific/non Y-specific (male/female) sequence divergence ratios was directly and inversely related to their interspersion on these two respective genomic fractions. Sequence analysis confirmed a direct relationship between divergence ratios and the Alu, LINE-1, Satellite and their derivative oligonucleotide contents. Thus their relocation on the Y-chromosome is followed by sequence divergence rather than the well documented concerted evolution of these non-coding progenitor repeated sequences. Five of the nine Y-chromosomal fragments are non-pseudoautosomal and transcribed into heterogeneous PolyA^+ RNA and thus can be retrotransposed. Evolutionary and computer analysis identified homologous oligonucleotide tracts in several human loci suggesting common and random mechanistic origins. Dysgenic genomes represent the accelerated evolution driving sequence divergence (McClintock, 1984). Sex reversal and sterility characterizing dysgenesis occurs in C57BL/6JY^{Pos} but not in 129/SvY^{Pos} derivative strains. High frequency, random, multi-locus deletion products of the feral Y^{Pos}-chromosome are generated in the germlines of F1(C57BL/6J X 129/SvY^{Pos})(male) and C57BL/6JY^{Pos}(male) but not in 129/SvY^{Pos}(male). Equal, 10^{-1}, 10^{-2}, and 0 copies (relative to males) of Y^{Pos}-specific deletion products respectively characterize C57BL/6JY^{Pos} (HC), (LC), (T) and (F) females. The testes determining loci of inactive Y^{Pos}-chromosomes in C57BL/6JY^{Pos} HC females are the preferentially deleted/rearranged Y^{Pos}-sequences. Disruption of regulation of plasma testosterone and hepatic MUP-A mRNA levels, TRD of a 4.7 Kbp EcoR1 fragment suggest disruption of autosomal/X-chromosomal sequences.
These data and the highly repeated progenitor (Alu, GATA, LINE-1) sequence content of deletion products confirmed the previously unidentified loss of genetic control of mammalian chromosome biology and hybrid dysgenesis.
In Darwinian evolution, feedback from natural selection leads to biased mutations.

PubMed

Caporale, Lynn Helena; Doyle, John

2013-12-01

Natural selection provides feedback through which information about the environment and its recurring challenges is captured, inherited, and accumulated within genomes in the form of variations that contribute to survival. The variation upon which natural selection acts is generally described as "random." Yet evidence has been mounting for decades, from such phenomena as mutation hotspots, horizontal gene transfer, and highly mutable repetitive sequences, that variation is far from the simplifying idealization of random processes as white (uniform in space and time and independent of the environment or context). This paper focuses on what is known about the generation and control of mutational variation, emphasizing that it is not uniform across the genome or in time, not unstructured with respect to survival, and is neither memoryless nor independent of the (also far from white) environment. We suggest that, as opposed to frequentist methods, Bayesian analysis could capture the evolution of nonuniform probabilities of distinct classes of mutation, and argue not only that the locations, styles, and timing of real mutations are not correctly modeled as generated by a white noise random process, but that such a process would be inconsistent with evolutionary theory. © 2013 New York Academy of Sciences.
Pseudo-random dynamic address configuration (PRDAC) algorithm for mobile ad hoc networks

NASA Astrophysics Data System (ADS)

Wu, Shaochuan; Tan, Xuezhi

2007-11-01

Based on an analysis of existing address configuration algorithms, this paper proposes a new pseudo-random dynamic address configuration (PRDAC) algorithm for mobile ad hoc networks. In PRDAC, the first node that initializes the network randomly chooses a nonlinear shift register that can generate an m-sequence. When another node joins the network, the initial node acts as an IP address configuration server: it computes an IP address according to this shift register, allocates the address, and tells the new node the generator polynomial of the shift register. In this way, when further nodes join the network, any node that has already obtained an IP address can act as a server and allocate an address to the new node. Like prophet address (PA) allocation and the dynamic configuration and distribution protocol (DCDP), PRDAC can efficiently avoid IP conflicts and deal with network partition and merging. Furthermore, PRDAC has lower algorithmic complexity, lower computational complexity and more sufficient assumptions than PA. In addition, PRDAC radically avoids address conflicts and maximizes the utilization rate of IP addresses. Analysis and simulation results show that PRDAC converges rapidly, has low overhead and is immune to topological structure.
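The idea that a shift-register sequence can hand out addresses without repetition can be sketched as follows. This is only an illustration under stated assumptions: it uses a 4-bit linear feedback shift register with taps at stages 4 and 3 (polynomial x^4 + x^3 + 1), the textbook generator of an m-sequence, whereas the abstract speaks of a nonlinear shift register and gives no parameters. The function name `lfsr_states` and the mapping of register states to the hypothetical host addresses `10.0.0.x` are not taken from the paper.

```python
from itertools import islice

def lfsr_states(seed, taps, nbits):
    """Yield successive states of a Fibonacci LFSR.
    taps are 1-indexed stage numbers (stage 1 = least significant bit)."""
    mask = (1 << nbits) - 1
    state = seed & mask
    while True:
        yield state
        feedback = 0
        for t in taps:
            feedback ^= (state >> (t - 1)) & 1
        # Shift left by one stage and feed the XOR of the taps into stage 1.
        state = ((state << 1) | feedback) & mask

# Assumed parameters: 4-bit register, taps (4, 3), non-zero seed.
states = list(islice(lfsr_states(seed=1, taps=(4, 3), nbits=4), 16))
period = states[:15]

# Hypothetical address mapping: each state becomes a host number.
addresses = [f"10.0.0.{s}" for s in period]
print(addresses)
```

With these parameters the register visits all 15 non-zero states exactly once before returning to the seed, so no address is issued twice within a period. A node that knows the polynomial and the current state can continue the sequence on its own, which is the property the abstract relies on when any configured node acts as a server.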
[Methodological quality and reporting quality evaluation of randomized controlled trials published in China Journal of Chinese Materia Medica].

PubMed

Yu, Dan-Dan; Xie, Yan-Ming; Liao, Xing; Zhi, Ying-Jie; Jiang, Jun-Jie; Chen, Wei

2018-02-01

To evaluate the methodological quality and reporting quality of randomized controlled trials (RCTs) published in China Journal of Chinese Materia Medica, we searched CNKI and the China Journal of Chinese Materia Medica webpage to collect RCTs published since the establishment of the journal. The Cochrane risk of bias assessment tool was used to evaluate the methodological quality of the RCTs. The CONSORT 2010 checklist was adopted as the tool for evaluating reporting quality. Finally, 184 RCTs were included and evaluated methodologically, of which 97 RCTs were evaluated for reporting quality. For the methodological evaluation, 62 trials (33.70%) reported the random sequence generation; 9 trials (4.89%) reported the allocation concealment; 25 trials (13.59%) adopted blinding; 30 trials (16.30%) reported the number of patients withdrawing, dropping out and lost to follow-up; 2 trials (1.09%) reported trial registration and none reported the trial protocol; only 8 trials (4.35%) reported the sample size estimation in detail. For the reporting quality appraisal, 3 of the 25 reporting items were rated high quality: abstract, participant eligibility criteria, and statistical methods; 4 items were rated medium quality: purpose, intervention, random sequence method, and data collection sites and locations; 9 items were rated low quality: title, background, random sequence types, allocation concealment, blinding, recruitment of subjects, baseline data, harms, and funding; the remaining items were of extremely low quality (compliance rate of the reporting item < 10%). On the whole, the methodological and reporting quality of RCTs published in the journal is generally low. Further improvement in both the methodological and reporting quality of RCTs of traditional Chinese medicine is warranted. It is recommended that the international standards and procedures for RCT design be strictly followed to conduct high-quality trials.
At the same time, in order to improve the reporting quality of randomized controlled trials, CONSORT standards should be adopted in the preparation of research reports and submissions. Copyright© by the Chinese Pharmaceutical Association.
Generation and analysis of a barcode-tagged insertion mutant library in the fission yeast Schizosaccharomyces pombe.

PubMed

Chen, Bo-Ruei; Hale, Devin C; Ciolek, Peter J; Runge, Kurt W

2012-05-03

Barcodes are unique DNA sequence tags that can be used to specifically label individual mutants. The barcode-tagged open reading frame (ORF) haploid deletion mutant collections in the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe allow for high-throughput mutant phenotyping because the relative growth of mutants in a population can be determined by monitoring the proportions of their associated barcodes. While these mutant collections have greatly facilitated genome-wide studies, mutations in essential genes are not present, and the roles of these genes are not as easily studied. To further support genome-scale research in S. pombe, we generated a barcode-tagged fission yeast insertion mutant library that has the potential of generating viable mutations in both essential and non-essential genes and can be easily analyzed using standard molecular biological techniques. An insertion vector containing a selectable ura4+ marker and a random barcode was used to generate a collection of 10,000 fission yeast insertion mutants stored individually in 384-well plates and as six pools of mixed mutants. Individual barcodes are flanked by Sfi I recognition sites and can be oligomerized in a unique orientation to facilitate barcode sequencing. Independent genetic screens on a subset of mutants suggest that this library contains a diverse collection of single insertion mutations. We present several approaches to determine insertion sites. This collection of S. pombe barcode-tagged insertion mutants is well-suited for genome-wide studies. Because insertion mutations may eliminate, reduce or alter the function of essential and non-essential genes, this library will contain strains with a wide range of phenotypes that can be assayed by their associated barcodes. The design of the barcodes in this library allows for barcode sequencing using next generation or standard benchtop cloning approaches.
Optimized scheduling technique of null subcarriers for peak power control in 3GPP LTE downlink.

PubMed

Cho, Soobum; Park, Sang Kyu

2014-01-01

Orthogonal frequency division multiple access (OFDMA) is a key multiple access technique for the long term evolution (LTE) downlink. However, a high peak-to-average power ratio (PAPR) can degrade power efficiency. The well-known PAPR reduction technique, dummy sequence insertion (DSI), can be a realistic solution because of its structural simplicity. However, the heavy use of subcarriers for the dummy sequences may decrease the transmitted data rate in the DSI scheme. In this paper, a novel DSI scheme is applied to the LTE system. First, we obtain the null subcarriers in single-input single-output (SISO) and multiple-input multiple-output (MIMO) systems, respectively; then, optimized dummy sequences are inserted into the obtained null subcarriers. Simulation results show that the Walsh-Hadamard transform (WHT) sequence is the best dummy sequence and that a ratio of 16 to 20 for the WHT and randomly generated sequences gives the maximum PAPR reduction performance. The near-optimal number of iterations is derived to prevent exhaustive iteration. It is also shown that the proposed technique causes no bit error rate (BER) degradation in the LTE downlink system.
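The quantity being reduced in the abstract above, the PAPR of an OFDM symbol, and the basic idea of filling null subcarriers with a dummy sequence can be sketched in a few lines. This is a brute-force illustration under assumed parameters (16 subcarriers, 12 carrying BPSK data and 4 treated as null, candidates taken from the rows of a 4×4 Walsh-Hadamard matrix); it is not the optimized scheduling scheme of the paper, and the names `papr_db`, `hadamard` and `data` are hypothetical.

```python
import cmath
import math

def idft(freq):
    """Naive inverse DFT: frequency-domain subcarriers to time-domain samples."""
    n = len(freq)
    return [sum(f * cmath.exp(2j * cmath.pi * k * t / n)
                for k, f in enumerate(freq)) / n
            for t in range(n)]

def papr_db(freq):
    """Peak-to-average power ratio of one OFDM symbol, in dB."""
    power = [abs(x) ** 2 for x in idft(freq)]
    return 10 * math.log10(max(power) / (sum(power) / len(power)))

def hadamard(n):
    """Sylvester construction of an n x n Walsh-Hadamard matrix (n a power of 2)."""
    rows = [[1]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    return rows

# Assumed worst-case data: 12 identical BPSK symbols, followed by 4 null subcarriers.
data = [1] * 12
baseline = papr_db(data + [0] * 4)

# Try each Walsh-Hadamard row as the dummy sequence and keep the lowest PAPR.
candidates = {tuple(row): papr_db(data + row) for row in hadamard(4)}
best_row = min(candidates, key=candidates.get)
print(f"nulls left empty: {baseline:.2f} dB")
print(f"best dummy sequence {best_row}: {candidates[best_row]:.2f} dB")
```

The dummy symbols carry no information, so the receiver can discard them, which is consistent with the abstract's claim that the technique does not degrade the bit error rate.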
Metagenomic approaches for direct and cell culture evaluation of the virological quality of wastewater

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aw, Tiong Gim; Howe, Adina; Rose, Joan B.

2014-12-01

Genomic-based molecular techniques are emerging as powerful tools that allow a comprehensive characterization of water and wastewater microbiomes. Most recently, next generation sequencing (NGS) technologies which produce large amounts of sequence data are beginning to impact the field of environmental virology. In this study, NGS and bioinformatics have been employed for the direct detection and characterization of viruses in wastewater and of viruses isolated after cell culture. Viral particles were concentrated and purified from sewage samples by polyethylene glycol precipitation. Viral nucleic acid was extracted and randomly amplified prior to sequencing using Illumina technology, yielding a total of 18 million sequence reads. Most of the viral sequences detected could not be characterized, indicating the great viral diversity that is yet to be discovered. This sewage virome was dominated by bacteriophages and contained sequences related to known human pathogenic viruses such as adenoviruses (species B, C and F), polyomaviruses JC and BK and enteroviruses (type B). An array of other animal viruses was also found, suggesting unknown zoonotic viruses. This study demonstrated the feasibility of metagenomic approaches to characterize viruses in complex environmental water samples.
Optimized Scheduling Technique of Null Subcarriers for Peak Power Control in 3GPP LTE Downlink

PubMed Central

Park, Sang Kyu

2014-01-01

Orthogonal frequency division multiple access (OFDMA) is a key multiple access technique for the long term evolution (LTE) downlink. However, high peak-to-average power ratio (PAPR) can cause the degradation of power efficiency. The well-known PAPR reduction technique, dummy sequence insertion (DSI), can be a realistic solution because of its structural simplicity. However, the large usage of subcarriers for the dummy sequences may decrease the transmitted data rate in the DSI scheme. In this paper, a novel DSI scheme is applied to the LTE system. Firstly, we obtain the null subcarriers in single-input single-output (SISO) and multiple-input multiple-output (MIMO) systems, respectively; then, optimized dummy sequences are inserted into the obtained null subcarriers. Simulation results show that the Walsh-Hadamard transform (WHT) sequence is the best for the dummy sequence and the ratio of 16 to 20 for the WHT and randomly generated sequences has the maximum PAPR reduction performance. A near-optimal number of iterations is derived to avoid exhaustive iteration. It is also shown that there is no bit error rate (BER) degradation with the proposed technique in the LTE downlink system. PMID:24883376
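The idea behind dummy sequence insertion can be illustrated with a minimal sketch. It is not the optimized scheme of the paper: it assumes a hypothetical 64-subcarrier symbol with 48 QPSK data subcarriers and 16 dummy positions, tries a handful of random +/-1 dummy sequences, and keeps the one with the lowest PAPR. The function names (`idft`, `papr_db`, `best_dummy`) and all parameter values are illustrative assumptions.

```python
import cmath
import math
import random

def idft(freq):
    """Naive inverse DFT: frequency-domain subcarriers -> time-domain samples."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]

def papr_db(freq):
    """Peak-to-average power ratio of one OFDM symbol, in dB."""
    power = [abs(sample) ** 2 for sample in idft(freq)]
    return 10 * math.log10(max(power) / (sum(power) / len(power)))

def best_dummy(data, n_dummy, n_candidates, rng):
    """Try random +/-1 dummy sequences and keep the one with the lowest PAPR."""
    best = None
    for _ in range(n_candidates):
        dummy = [rng.choice((-1, 1)) for _ in range(n_dummy)]
        score = papr_db(data + dummy)
        if best is None or score < best[0]:
            best = (score, dummy)
    return best

rng = random.Random(1)
qpsk = [complex(a, b) for a in (-1, 1) for b in (-1, 1)]
data = [rng.choice(qpsk) for _ in range(48)]  # hypothetical data subcarriers
score, dummy = best_dummy(data, n_dummy=16, n_candidates=20, rng=rng)
print(f"best PAPR over 20 candidates: {score:.2f} dB")
```

Because the dummy subcarriers carry no user data, the receiver can discard them, which is consistent with the abstract's statement that BER is not degraded.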
Assessment of replicate bias in 454 pyrosequencing and a multi-purpose read-filtering tool.

PubMed

Jérôme, Mariette; Noirot, Céline; Klopp, Christophe

2011-05-26

The Roche 454 pyrosequencing platform is often considered the most versatile of the Next Generation Sequencing technology platforms, permitting the sequencing of large genomes, the analysis of variations or the study of transcriptomes. A recently reported bias leads to the production of multiple reads for a unique DNA fragment in a random manner within a run. This bias has a direct impact on the quality of the measurement of the representation of the fragments using the reads. Other cleaning steps are usually performed on the reads before assembly or alignment. PyroCleaner is a software module intended to clean 454 pyrosequencing reads in order to ease the assembly process. This program is free software and is distributed under the terms of the GNU General Public License as published by the Free Software Foundation. It implements several filters using criteria such as read duplication, length, complexity, base-pair quality and number of undetermined bases. It can also clean flowgram files (.sff) of paired-end sequences, generating a validated paired-end file on the one hand and a single-read file on the other. Read cleaning has always been an important step in sequence analysis. The pyrocleaner Python module is a Swiss Army knife dedicated to cleaning 454 reads. It includes commonly used filters as well as specialised ones such as duplicated read removal and paired-end read verification.
Simulation of Crack Propagation in Engine Rotating Components under Variable Amplitude Loading

NASA Technical Reports Server (NTRS)

Bonacuse, P. J.; Ghosn, L. J.; Telesman, J.; Calomino, A. M.; Kantzos, P.

1998-01-01

The crack propagation life of tested specimens has been repeatedly shown to strongly depend on the loading history. Overloads and extended stress holds at temperature can either retard or accelerate the crack growth rate. Therefore, to accurately predict the crack propagation life of an actual component, it is essential to approximate the true loading history. In military rotorcraft engine applications, the loading profile (stress amplitudes, temperature, and number of excursions) can vary significantly depending on the type of mission flown. To accurately assess the durability of a fleet of engines, the crack propagation life distribution of a specific component should account for the variability in the missions performed (proportion of missions flown and sequence). In this report, analytical and experimental studies are described that calibrate/validate the crack propagation prediction capability for a disk alloy under variable amplitude loading. A crack closure based model was adopted to analytically predict the load interaction effects. Furthermore, a methodology has been developed to realistically simulate the actual mission mix loading on a fleet of engines over their lifetime. A sequence of missions is randomly selected and the number of repeats of each mission in the sequence is determined assuming a Poisson distributed random variable with a given mean occurrence rate. Multiple realizations of random mission histories are generated in this manner and are used to produce stress, temperature, and time points for fracture mechanics calculations. The result is a cumulative distribution of crack propagation lives for a given, life limiting, component location. This information can be used to determine a safe retirement life or inspection interval for the given location.
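The mission-mix sampling described in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the mission names and mean repeat rates are hypothetical, missions are chosen uniformly at random, and the Poisson draw uses Knuth's multiplication method because the Python standard library has no Poisson sampler. The report's actual selection weights and crack growth model are not reproduced.

```python
import math
import random

def poisson(rng, mean):
    """Draw one Poisson-distributed integer (Knuth's multiplication method)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def mission_history(rng, mission_rates, n_blocks):
    """Return one random history as a list of (mission, repeats) blocks."""
    names = list(mission_rates)
    history = []
    for _ in range(n_blocks):
        mission = rng.choice(names)                      # random mission selection
        repeats = poisson(rng, mission_rates[mission])   # Poisson number of repeats
        history.append((mission, repeats))
    return history

rng = random.Random(0)
rates = {"training": 3.0, "transport": 1.5, "combat": 0.5}  # hypothetical means
histories = [mission_history(rng, rates, n_blocks=20) for _ in range(100)]
print(histories[0][:3])
```

Each realization in `histories` would then be expanded into stress, temperature, and time points and passed to a fracture mechanics calculation; collecting the resulting lives gives the cumulative distribution mentioned in the abstract.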
Key Aspects of Nucleic Acid Library Design for in Vitro Selection

PubMed Central

Vorobyeva, Maria A.; Davydova, Anna S.; Vorobjev, Pavel E.; Pyshnyi, Dmitrii V.; Venyaminova, Alya G.

2018-01-01

Nucleic acid aptamers capable of selectively recognizing their target molecules have nowadays been established as powerful and tunable tools for biospecific applications, be it therapeutics, drug delivery systems or biosensors. It is now generally acknowledged that in vitro selection enables one to generate aptamers to almost any target of interest. However, the success of selection and the affinity of the resulting aptamers depend to a large extent on the nature and design of an initial random nucleic acid library. In this review, we summarize and discuss the most important features of the design of nucleic acid libraries for in vitro selection such as the nature of the library (DNA, RNA or modified nucleotides), the length of a randomized region and the presence of fixed sequences. We also compare and contrast different randomization strategies and consider computer methods of library design and some other aspects. PMID:29401748
Investigation of the contextual interference effect in the manipulation of the motor parameter of over-all force.

PubMed

Goodwin, J E; Meeuwsen, H J

1996-12-01

This investigation examined the contextual interference effect when manipulating over-all force in a golf-putting task. Undergraduate women (N = 30) were randomly assigned to a Random, Blocked-Random, or Blocked practice condition and practiced golf putting from distances of 2.43 m, 3.95 m, and 5.47 m during acquisition. Subjects in the Random condition practiced trials in a quasirandom sequence and those in the Blocked-Random condition practiced trials initially in a blocked sequence with the remainder of the trials practiced in a quasirandom sequence. In the Blocked condition subjects practiced trials in a blocked sequence. A 24-hr. transfer test consisted of 30 trials with 10 trials each from 1.67 m, 3.19 m, and 6.23 m. Transfer scores supported the Magill and Hall (1990) hypothesis that, when task variations involve learning parameters of a generalized motor program, the benefit of random practice over blocked practice would not be found.
Least squares deconvolution for leak detection with a pseudo random binary sequence excitation

NASA Astrophysics Data System (ADS)

Nguyen, Si Tran Nguyen; Gong, Jinzhe; Lambert, Martin F.; Zecchin, Aaron C.; Simpson, Angus R.

2018-01-01

Leak detection and localisation is critical for water distribution system pipelines. This paper examines the use of the time-domain impulse response function (IRF) for leak detection and localisation in a pressurised water pipeline with a pseudo random binary sequence (PRBS) signal excitation. Compared to the conventional step wave generated using a single fast operation of a valve closure, a PRBS signal offers advantageous correlation properties, in that the signal has very low autocorrelation for lags different from zero and low cross correlation with other signals including noise and other interference. These properties result in a significant improvement in the IRF signal to noise ratio (SNR), leading to more accurate leak localisation. In this paper, the estimation of the system IRF is formulated as an optimisation problem in which the l2 norm of the IRF is minimised to suppress the impact of noise and interference sources. Both numerical and experimental data are used to verify the proposed technique. The resultant estimated IRF provides not only accurate leak location estimation, but also good sensitivity to small leak sizes due to the improved SNR.
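The correlation property that motivates PRBS excitation can be checked with a short sketch. It assumes the common PRBS7 generator (polynomial x^7 + x^6 + 1, period 127) rather than the specific sequence used in the paper, maps the bits to +/-1, and computes the circular autocorrelation over one period.

```python
def prbs7(seed=0x7F):
    """One period (127 bits) of PRBS7, polynomial x^7 + x^6 + 1."""
    state = seed & 0x7F
    bits = []
    for _ in range(127):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1
        bits.append(new_bit)
        state = ((state << 1) | new_bit) & 0x7F
    return bits

def circular_autocorrelation(signal):
    """Autocorrelation of a periodic signal at every lag within one period."""
    n = len(signal)
    return [
        sum(signal[i] * signal[(i + lag) % n] for i in range(n))
        for lag in range(n)
    ]

chips = [1 if bit else -1 for bit in prbs7()]
acf = circular_autocorrelation(chips)
print("lag 0:", acf[0], "other lags:", set(acf[1:]))
```

For a maximal-length sequence the autocorrelation is N at lag zero and -1 at every other lag, which is why correlating the measured response with the excitation concentrates the signal energy into a sharp estimate of the impulse response.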
Operations analysis (study 2.1): Program manual and users guide for the LOVES computer code

NASA Technical Reports Server (NTRS)

Wray, S. T., Jr.

1975-01-01

The information necessary to use the LOVES Computer Program in its existing state, or to modify the program to include studies not properly handled by the basic model, is provided. The Users Guide defines the basic elements assembled together to form the model for servicing satellites in orbit. As the program is a simulation, the method of attack is to disassemble the problem into a sequence of events, each occurring instantaneously and each creating one or more other events in the future. The main driving force of the simulation is the deterministic launch schedule of satellites and the subsequent failure of the various modules which make up the satellites. The LOVES Computer Program uses a random number generator to simulate the failure of module elements and therefore operates over a long span of time, typically 10 to 15 years. The sequence of events is varied by making several runs in succession with different random numbers, resulting in a Monte Carlo technique to determine statistical parameters of minimum value, average value, and maximum value.
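The event-driven Monte Carlo structure described here can be sketched in a few lines. This is not the LOVES code: it assumes a hypothetical model in which each launch puts one module in orbit, module lifetimes are exponentially distributed, and a failed module is replaced instantly. The launch schedule, mean time between failures, and horizon are illustrative values.

```python
import heapq
import random

def simulate(seed, launch_years, mtbf_years, horizon_years):
    """Count module failures over the horizon for one Monte Carlo run."""
    rng = random.Random(seed)
    events = [(year, "launch") for year in launch_years]  # deterministic schedule
    heapq.heapify(events)
    failures = 0
    while events:
        now, kind = heapq.heappop(events)
        if now > horizon_years:
            break  # the heap is time-ordered, so every remaining event is later
        if kind == "failure":
            failures += 1
        # A launch or a replacement creates one future failure event.
        lifetime = rng.expovariate(1.0 / mtbf_years)
        heapq.heappush(events, (now + lifetime, "failure"))
    return failures

runs = [
    simulate(seed, launch_years=[0, 2, 4, 6], mtbf_years=3.0, horizon_years=15.0)
    for seed in range(50)
]
summary = (min(runs), sum(runs) / len(runs), max(runs))
print("min, average, max failures:", summary)
```

Repeating the run with a different seed changes only the random failure times, so the spread across runs yields the minimum, average, and maximum statistics the abstract refers to.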
Stimulus novelty, task relevance and the visual evoked potential in man

NASA Technical Reports Server (NTRS)

Courchesne, E.; Hillyard, S. A.; Galambos, R.

1975-01-01

The effect of task relevance on P3 (waveform of human evoked potential) waves and the methodologies used to deal with them are outlined. Visual evoked potentials (VEPs) were recorded from normal adult subjects performing in a visual discrimination task. Subjects counted the number of presentations of the numeral 4 which was interposed rarely and randomly within a sequence of tachistoscopically flashed background stimuli. Intrusive, task-irrelevant (not counted) stimuli were also interspersed rarely and randomly in the sequence of 2s; these stimuli were of two types: simples, which were easily recognizable, and novels, which were completely unrecognizable. It was found that the simples and the counted 4s evoked posteriorly distributed P3 waves while the irrelevant novels evoked large, frontally distributed P3 waves. These large, frontal P3 waves to novels were also found to be preceded by large N2 waves. These findings indicate that the P3 wave is not a unitary phenomenon but should be considered in terms of a family of waves, differing in their brain generators and in their psychological correlates.
Layers: A molecular surface peeling algorithm and its applications to analyze protein structures

PubMed Central

Karampudi, Naga Bhushana Rao; Bahadur, Ranjit Prasad

2015-01-01

We present an algorithm ‘Layers’ to peel the atoms of proteins as layers. Using Layers we show an efficient way to transform protein structures into a 2D pattern, named residue transition pattern (RTP), which is independent of molecular orientations. RTP explains the folding patterns of proteins and hence identification of similarity between proteins is simpler and more reliable using RTP than with the standard sequence or structure based methods. Moreover, Layers generates a fine-tunable coarse model for the molecular surface by using non-random sampling. The coarse model can be used for shape comparison, protein recognition and ligand design. Additionally, Layers can be used to develop biased initial configuration of molecules for protein folding simulations. We have developed a random forest classifier to predict the RTP of a given polypeptide sequence. Layers is a standalone application; however, it can be merged with other applications to reduce the computational load when working with large datasets of protein structures. Layers is available freely at http://www.csb.iitkgp.ernet.in/applications/mol_layers/main. PMID:26553411
Identification of Multiple Novel Viruses, Including a Parvovirus and a Hepevirus, in Feces of Red Foxes

PubMed Central

van der Giessen, Joke; Haagmans, Bart L.; Osterhaus, Albert D. M. E.; Smits, Saskia L.

2013-01-01

Red foxes (Vulpes vulpes) are the most widespread members of the order of Carnivora. Since they often live in (peri)urban areas, they are a potential reservoir of viruses that transmit from wildlife to humans or domestic animals. Here we evaluated the fecal viral microbiome of 13 red foxes by random PCR in combination with next-generation sequencing. Various novel viruses, including a parvovirus, bocavirus, adeno-associated virus, hepevirus, astroviruses, and picobirnaviruses, were identified. PMID:23616657
Wideband propagation measurements at 30.3 GHz through a pecan orchard in Texas

NASA Astrophysics Data System (ADS)

Papazian, Peter B.; Jones, David L.; Espeland, Richard H.

1992-09-01

Wideband propagation measurements were made in a pecan orchard in Texas during April and August of 1990 to examine the propagation characteristics of millimeter-wave signals through vegetation. Measurements were made on tree obstructed paths with and without leaves. The study presents narrowband attenuation data at 9.6 and 28.8 GHz as well as wideband impulse response measurements at 30.3 GHz. The wideband probe (Violette et al., 1983), provides amplitude and delay of reflected and scattered signals and bit-error rate. This is accomplished using a 500 MBit/sec pseudo-random code to BPSK modulate a 28.8 GHz carrier. The channel impulse response is then extracted by cross correlating the received pseudo-random sequence with a locally generated replica.
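Extracting a channel impulse response by cross-correlating the received pseudo-random sequence with a local replica can be sketched as follows. The sketch makes several assumptions that are not in the abstract: a short PRBS7 code (127 chips) stands in for the 500 Mbit/s probe code, the channel is a noiseless, real-valued set of hypothetical delayed taps, and the code repeats periodically so that circular correlation applies.

```python
def prbs7():
    """One period (127 chips, as +/-1) of PRBS7, polynomial x^7 + x^6 + 1."""
    state, chips = 0x7F, []
    for _ in range(127):
        bit = ((state >> 6) ^ (state >> 5)) & 1
        chips.append(1 if bit else -1)
        state = ((state << 1) | bit) & 0x7F
    return chips

def channel_output(chips, taps):
    """Circularly convolve the chips with a sparse channel {delay: gain}."""
    n = len(chips)
    return [sum(g * chips[(i - d) % n] for d, g in taps.items()) for i in range(n)]

def estimate_impulse_response(received, replica):
    """Cross-correlate with the replica, then undo the -1 off-peak bias."""
    n = len(replica)
    corr = [
        sum(received[i] * replica[(i - lag) % n] for i in range(n))
        for lag in range(n)
    ]
    total = sum(corr)  # equals the sum of all channel taps for an m-sequence
    return [(c + total) / (n + 1) for c in corr]

replica = prbs7()
taps = {0: 1.0, 5: 0.5, 12: -0.25}  # hypothetical direct path plus two echoes
estimate = estimate_impulse_response(channel_output(replica, taps), replica)
print([round(estimate[d], 3) for d in (0, 5, 12)])
```

The correction term follows from the m-sequence autocorrelation (N at lag zero, -1 elsewhere); in a real measurement, noise and carrier phase would also have to be handled.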
Chromosome arm-specific BAC end sequences permit comparative analysis of homoeologous chromosomes and genomes of polyploid wheat

PubMed Central

2012-01-01

Background Bread wheat, one of the world’s staple food crops, has the largest, highly repetitive and polyploid genome among the cereal crops. The wheat genome holds the key to crop genetic improvement against challenges such as climate change, environmental degradation, and water scarcity. To unravel the complex wheat genome, the International Wheat Genome Sequencing Consortium (IWGSC) is pursuing a chromosome- and chromosome arm-based approach to physical mapping and sequencing. Here we report on the use of a BAC library made from flow-sorted telosomic chromosome 3A short arm (t3AS) for marker development and analysis of sequence composition and comparative evolution of homoeologous genomes of hexaploid wheat. Results The end-sequencing of 9,984 random BACs from a chromosome arm 3AS-specific library (TaaCsp3AShA) generated 11,014,359 bp of high quality sequence from 17,591 BAC-ends with an average length of 626 bp. The sequence represents 3.2% of t3AS with an average DNA sequence read every 19 kb. Overall, 79% of the sequence consisted of repetitive elements, 1.38% as coding regions (estimated 2,850 genes) and another 19% of unknown origin. Comparative sequence analysis suggested that 70-77% of the genes present in both 3A and 3B were syntenic with model species. Among the transposable elements, gypsy/sabrina (12.4%) was the most abundant repeat and was significantly more frequent in 3A compared to homoeologous chromosome 3B. Twenty novel repetitive sequences were also identified using de novo repeat identification. BESs were screened to identify simple sequence repeats (SSR) and transposable element junctions. A total of 1,057 SSRs were identified with a density of one per 10.4 kb, and 7,928 junctions between transposable elements (TE) and other sequences were identified with a density of one per 1.39 kb. 
With the objective of enhancing the marker density of chromosome 3AS, oligonucleotide primers were successfully designed from 758 SSRs and 695 Insertion Site Based Polymorphisms (ISBPs). Of the 96 ISBP primer pairs tested, 28 (29%) were 3A-specific, compared with 17 (18%) of the 96 SSR primer pairs. Conclusion This work reports on the use of wheat chromosome arm 3AS-specific BAC library for the targeted generation of sequence data from a particular region of the huge genome of wheat. A large quantity of sequences were generated from the A genome of hexaploid wheat for comparative genome analysis with homoeologous B and D genomes and other model grass genomes. Hundreds of molecular markers were developed from the 3AS arm-specific sequences; these and other sequences will be useful in gene discovery and physical mapping. PMID:22559868
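The summary statistics quoted in this abstract follow directly from the reported counts, as the short check below shows. Only numbers stated in the abstract are used; the variable names are illustrative.

```python
total_bp = 11_014_359      # high quality sequence, in bp
bac_ends = 17_591          # number of BAC-end reads
ssr_count = 1_057          # simple sequence repeats found
te_junctions = 7_928       # transposable element junctions found

mean_read_bp = total_bp / bac_ends              # average read length
ssr_spacing_kb = total_bp / ssr_count / 1000    # one SSR per this many kb
te_spacing_kb = total_bp / te_junctions / 1000  # one junction per this many kb
isbp_specific_pct = 100 * 28 / 96               # 3A-specific ISBP primer pairs
ssr_specific_pct = 100 * 17 / 96                # 3A-specific SSR primer pairs

print(f"mean read length: {mean_read_bp:.0f} bp")
print(f"SSR spacing: {ssr_spacing_kb:.1f} kb, TE junction spacing: {te_spacing_kb:.2f} kb")
print(f"3A-specific: ISBP {isbp_specific_pct:.0f}%, SSR {ssr_specific_pct:.0f}%")
```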

Ranked solutions to a class of combinatorial optimizations—with applications in mass spectrometry based peptide sequencing and a variant of directed paths in random media

NASA Astrophysics Data System (ADS)

Doerr, Timothy P.; Alves, Gelio; Yu, Yi-Kuo

2005-08-01

Typical combinatorial optimizations are NP-hard; however, for a particular class of cost functions the corresponding combinatorial optimizations can be solved in polynomial time using the transfer matrix technique or, equivalently, the dynamic programming approach. This suggests a way to efficiently find approximate solutions: find a transformation that makes the cost function as similar as possible to that of the solvable class. After keeping many high-ranking solutions using the approximate cost function, one may then re-assess these solutions with the full cost function to find the best approximate solution. Under this approach, it is important to be able to assess the quality of the solutions obtained, e.g., by finding the true ranking of the kth best approximate solution when all possible solutions are considered exhaustively. To tackle this statistical issue, we provide a systematic method starting with a scaling function generated from the finite number of high-ranking solutions followed by a convergent iterative mapping. This method, useful in a variant of the directed paths in random media problem proposed here, can also provide a statistical significance assessment for one of the most important proteomic tasks: peptide sequencing using tandem mass spectrometry data. For directed paths in random media, the scaling function depends on the particular realization of randomness; in the mass spectrometry case, the scaling function is spectrum-specific.
Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

PubMed Central

Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

2013-01-01

Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically-important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton. 
These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870
The effect of interference on temporal order memory for random and fixed sequences in nondemented older adults.

PubMed

Tolentino, Jerlyn C; Pirogovsky, Eva; Luu, Trinh; Toner, Chelsea K; Gilbert, Paul E

2012-05-21

Two experiments tested the effect of temporal interference on order memory for fixed and random sequences in young adults and nondemented older adults. The results demonstrate that temporal order memory for fixed and random sequences is impaired in nondemented older adults, particularly when temporal interference is high. However, temporal order memory for fixed sequences is comparable between older adults and young adults when temporal interference is minimized. The results suggest that temporal order memory is less efficient and more susceptible to interference in older adults, possibly due to impaired temporal pattern separation.
A computer aided thermodynamic approach for predicting the formation of Z-DNA in naturally occurring sequences

NASA Technical Reports Server (NTRS)

Ho, P. S.; Ellison, M. J.; Quigley, G. J.; Rich, A.

1986-01-01

The ease with which a particular DNA segment adopts the left-handed Z-conformation depends largely on the sequence and on the degree of negative supercoiling to which it is subjected. We describe a computer program (Z-hunt) that is designed to search long sequences of naturally occurring DNA and retrieve those nucleotide combinations of up to 24 bp in length which show a strong propensity for Z-DNA formation. Incorporated into Z-hunt is a statistical mechanical model based on empirically determined energetic parameters for the B to Z transition accumulated to date. The Z-forming potential of a sequence is assessed by ranking its behavior as a function of negative superhelicity relative to the behavior of similar sized randomly generated nucleotide sequences assembled from over 80,000 combinations. The program makes it possible to compare directly the Z-forming potential of sequences with different base compositions and different sequence lengths. Using Z-hunt, we have analyzed the DNA sequences of the bacteriophage phi X174, plasmid pBR322, the animal virus SV40 and the replicative form of the eukaryotic adenovirus-2. The results are compared with those previously obtained by others from experiments designed to locate Z-DNA forming regions in these sequences using probes which show specificity for the left-handed DNA conformation.
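The ranking step, in which a sequence is scored relative to randomly generated sequences of the same length, can be sketched as below. The scoring function is a deliberately crude stand-in and an assumption of this sketch: it counts purine/pyrimidine alternations, whereas Z-hunt itself uses a statistical mechanical model with empirically determined energies. The function names and the number of random sequences are also illustrative.

```python
import random

PURINES = set("AG")

def alternation_score(seq):
    """Count adjacent bases that alternate purine/pyrimidine (crude proxy)."""
    return sum((a in PURINES) != (b in PURINES) for a, b in zip(seq, seq[1:]))

def fraction_outscored(seq, n_random=2000, seed=0):
    """Fraction of same-length random sequences that score below `seq`."""
    rng = random.Random(seed)
    target = alternation_score(seq)
    beaten = 0
    for _ in range(n_random):
        random_seq = "".join(rng.choice("ACGT") for _ in seq)
        if alternation_score(random_seq) < target:
            beaten += 1
    return beaten / n_random

print(fraction_outscored("CG" * 12))   # alternating CG repeat, 24 bp
print(fraction_outscored("A" * 24))    # homopolymer, no alternation
```

Expressing the score as a rank against random sequences of equal length is what allows segments of different lengths and base compositions to be compared on a common scale.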
Origins of Protein Functions in Cells

NASA Technical Reports Server (NTRS)

Seelig, Burchard; Pohorille, Andrzej

2011-01-01

In modern organisms proteins perform a majority of cellular functions, such as chemical catalysis, energy transduction and transport of material across cell walls. Although great strides have been made towards understanding protein evolution, a meaningful extrapolation from contemporary proteins to their earliest ancestors is virtually impossible. In an alternative approach, the origin of water-soluble proteins was probed through the synthesis and in vitro evolution of very large libraries of random amino acid sequences. In combination with computer modeling and simulations, these experiments allow us to address a number of fundamental questions about the origins of proteins. Can functionality emerge from random sequences of proteins? How did the initial repertoire of functional proteins diversify to facilitate new functions? Did this diversification proceed primarily through drawing novel functionalities from random sequences or through evolution of already existing proto-enzymes? Did protein evolution start from a pool of proteins defined by a frozen accident, such that other collections of proteins could have started a different evolutionary pathway? Although we do not have definitive answers to these questions yet, important clues have been uncovered. In one example (Keefe and Szostak, 2001), novel ATP binding proteins were identified that appear to be unrelated in both sequence and structure to any known ATP binding proteins. One of these proteins was subsequently redesigned computationally to bind GTP by introducing several mutations that make targeted structural changes to the protein, improve its binding to guanine and prevent water from accessing the active center. This study facilitates further investigations of individual evolutionary steps that lead to a change of function in primordial proteins. In a second study (Seelig and Szostak, 2007), novel enzymes were generated that can join two pieces of RNA in a reaction for which no natural enzymes are known. 
Recently it was found that, as in the previous case, the proteins have a structure unknown among modern enzymes. In this case, in vitro evolution started from a small, non-enzymatic protein. A similar selection process initiated from a library of random polypeptides is in progress. These results not only allow for estimating the occurrence of function in random protein assemblies but also provide evidence for the possibility of alternative protein worlds. Extant proteins might simply represent a frozen accident in the world of possible proteins. Alternative collections of proteins, even with similar functions, could originate alternative evolutionary paths.
A Next-Generation Sequencing Primer—How Does It Work and What Can It Do?

PubMed Central

Alekseyev, Yuriy O.; Fazeli, Roghayeh; Yang, Shi; Basran, Raveen; Miller, Nancy S.

2018-01-01

Next-generation sequencing refers to a high-throughput technology that determines the nucleic acid sequences and identifies variants in a sample. The technology has been introduced into clinical laboratory testing and produces test results for precision medicine. Since next-generation sequencing is relatively new, graduate students, medical students, pathology residents, and other physicians may benefit from a primer to provide a foundation about basic next-generation sequencing methods and applications, as well as specific examples where it has had diagnostic and prognostic utility. Next-generation sequencing technology grew out of advances in multiple fields to produce a sophisticated laboratory test with tremendous potential. Next-generation sequencing may be used in the clinical setting to look for specific genetic alterations in patients with cancer, diagnose inherited conditions such as cystic fibrosis, and detect and profile microbial organisms. This primer will review DNA sequencing technology, the commercialization of next-generation sequencing, and clinical uses of next-generation sequencing. Specific applications where next-generation sequencing has demonstrated utility in oncology are provided. PMID:29761157
When Gravity Fails: Local Search Topology

NASA Technical Reports Server (NTRS)

Frank, Jeremy; Cheeseman, Peter; Stutz, John; Lau, Sonie (Technical Monitor)

1997-01-01

Local search algorithms for combinatorial search problems frequently encounter a sequence of states in which it is impossible to improve the value of the objective function; moves through these regions, called plateau moves, dominate the time spent in local search. We analyze and characterize plateaus for three different classes of randomly generated Boolean Satisfiability problems. We identify several interesting features of plateaus that impact the performance of local search algorithms. We show that local minima tend to be small but occasionally may be very large. We also show that local minima can be escaped without unsatisfying a large number of clauses, but that systematically searching for an escape route may be computationally expensive if the local minimum is large. We show that plateaus with exits, called benches, tend to be much larger than minima, and that some benches have very few exit states which local search can use to escape. We show that the solutions (i.e. global minima) of randomly generated problem instances form clusters, which behave similarly to local minima. We revisit several enhancements of local search algorithms and explain their performance in light of our results. Finally we discuss strategies for creating the next generation of local search algorithms.
Incompleteness and limit of security theory of quantum key distribution

NASA Astrophysics Data System (ADS)

Hirota, Osamu; Murakami, Dan; Kato, Kentaro; Futami, Fumio

2012-10-01

Many papers claim that the trace distance d guarantees universal composition security in quantum key distribution (QKD) protocols such as BB84. This introductory paper first explains explicitly the main misconception in the claim of unconditional security for QKD theory. In general terms, the misunderstanding of the security claim stems from the Lemma in Renner's paper. It suggests that a perfect random key is generated with probability (1-d) and that the failure probability is d. It therefore concludes that the generated key is a perfect random key sequence whenever the protocol succeeds, so that QKD provides perfect secrecy to the one-time pad. This is the reason for the composition claim. However, the trace distance (or variational distance) is not the probability of such an event. If d is not small enough, the generated key sequence is never uniform. The evaluation of the trace distance must therefore be reconstructed before it can be used. One should first go back to the indistinguishability theory based on computational complexity and clarify the meaning of the value of the variational distance. The same analysis is also necessary for the information-theoretic case. The recent series of papers by H. P. Yuen has answered these questions. In this paper, we give a more concise description of Yuen's theory and clarify that the upper-bound theories for the trace distance by Tomamichel et al. and Hayashi et al. are built on Renner's incorrect reasoning and are unsuitable as security analyses. Finally, we introduce a new macroscopic quantum communication scheme to replace Q-bit QKD.
High throughput mutagenesis for identification of residues regulating human prostacyclin (hIP) receptor expression and function.

PubMed

Bill, Anke; Rosethorne, Elizabeth M; Kent, Toby C; Fawcett, Lindsay; Burchell, Lynn; van Diepen, Michiel T; Marelli, Anthony; Batalov, Sergey; Miraglia, Loren; Orth, Anthony P; Renaud, Nicole A; Charlton, Steven J; Gosling, Martin; Gaither, L Alex; Groot-Kormelink, Paul J

2014-01-01

The human prostacyclin receptor (hIP receptor) is a seven-transmembrane G protein-coupled receptor (GPCR) that plays a critical role in vascular smooth muscle relaxation and platelet aggregation. hIP receptor dysfunction has been implicated in numerous cardiovascular abnormalities, including myocardial infarction, hypertension, thrombosis and atherosclerosis. Genomic sequencing has discovered several genetic variations in the PTGIR gene coding for hIP receptor; however, its structure-function relationship has not been sufficiently explored. Here we set out to investigate the applicability of high throughput random mutagenesis to study the structure-function relationship of hIP receptor. While chemical mutagenesis was not suitable to generate a mutagenesis library with sufficient coverage, our data demonstrate error-prone PCR (epPCR) mediated mutagenesis as a valuable method for the unbiased screening of residues regulating hIP receptor function and expression. Here we describe the generation and functional characterization of an epPCR derived mutagenesis library comprising >4000 mutants of the hIP receptor. We introduce next generation sequencing as a useful tool to validate the quality of mutagenesis libraries by providing information about the coverage, mutation rate and mutational bias. We identified 18 mutants of the hIP receptor that were expressed at the cell surface, but demonstrated impaired receptor function. A total of 38 non-synonymous mutations were identified within the coding region of the hIP receptor, mapping to 36 distinct residues, including several mutations previously reported to affect the signaling of the hIP receptor. Thus, our data demonstrate epPCR mediated random mutagenesis as a valuable and practical method to study the structure-function relationship of GPCRs.
High Throughput Mutagenesis for Identification of Residues Regulating Human Prostacyclin (hIP) Receptor Expression and Function

PubMed Central

Kent, Toby C.; Fawcett, Lindsay; Burchell, Lynn; van Diepen, Michiel T.; Marelli, Anthony; Batalov, Sergey; Miraglia, Loren; Orth, Anthony P.; Renaud, Nicole A.; Charlton, Steven J.; Gosling, Martin; Gaither, L. Alex; Groot-Kormelink, Paul J.

2014-01-01

The human prostacyclin receptor (hIP receptor) is a seven-transmembrane G protein-coupled receptor (GPCR) that plays a critical role in vascular smooth muscle relaxation and platelet aggregation. hIP receptor dysfunction has been implicated in numerous cardiovascular abnormalities, including myocardial infarction, hypertension, thrombosis and atherosclerosis. Genomic sequencing has discovered several genetic variations in the PTGIR gene coding for hIP receptor; however, its structure-function relationship has not been sufficiently explored. Here we set out to investigate the applicability of high throughput random mutagenesis to study the structure-function relationship of hIP receptor. While chemical mutagenesis was not suitable to generate a mutagenesis library with sufficient coverage, our data demonstrate error-prone PCR (epPCR) mediated mutagenesis as a valuable method for the unbiased screening of residues regulating hIP receptor function and expression. Here we describe the generation and functional characterization of an epPCR derived mutagenesis library comprising >4000 mutants of the hIP receptor. We introduce next generation sequencing as a useful tool to validate the quality of mutagenesis libraries by providing information about the coverage, mutation rate and mutational bias. We identified 18 mutants of the hIP receptor that were expressed at the cell surface, but demonstrated impaired receptor function. A total of 38 non-synonymous mutations were identified within the coding region of the hIP receptor, mapping to 36 distinct residues, including several mutations previously reported to affect the signaling of the hIP receptor. Thus, our data demonstrate epPCR mediated random mutagenesis as a valuable and practical method to study the structure-function relationship of GPCRs. PMID:24886841
Violation of an Evolutionarily Conserved Immunoglobulin Diversity Gene Sequence Preference Promotes Production of dsDNA-Specific IgG Antibodies

PubMed Central

Silva-Sanchez, Aaron; Liu, Cun Ren; Vale, Andre M.; Khass, Mohamed; Kapoor, Pratibha; Elgavish, Ada; Ivanov, Ivaylo I.; Ippolito, Gregory C.; Schelonka, Robert L.; Schoeb, Trenton R.; Burrows, Peter D.; Schroeder, Harry W.

2015-01-01

Variability in the developing antibody repertoire is focused on the third complementarity determining region of the H chain (CDR-H3), which lies at the center of the antigen binding site where it often plays a decisive role in antigen binding. The power of VDJ recombination and N nucleotide addition has led to the common conception that the sequence of CDR-H3 is unrestricted in its variability and random in its composition. Under this view, the immune response is solely controlled by somatic positive and negative clonal selection mechanisms that act on individual B cells to promote production of protective antibodies and prevent the production of self-reactive antibodies. This concept of a repertoire of random antigen binding sites is inconsistent with the observation that diversity (DH) gene segment sequence content by reading frame (RF) is evolutionarily conserved, creating biases in the prevalence and distribution of individual amino acids in CDR-H3. For example, arginine, which is often found in the CDR-H3 of dsDNA binding autoantibodies, is under-represented in the commonly used DH RFs rearranged by deletion, but is a frequent component of rarely used inverted RF1 (iRF1), which is rearranged by inversion. To determine the effect of altering this germline bias in DH gene segment sequence on autoantibody production, we generated mice that by genetic manipulation are forced to utilize an iRF1 sequence encoding two arginines. Over a one year period we collected serial serum samples from these unimmunized, specific pathogen-free mice and found that more than one-fifth of them contained elevated levels of dsDNA-binding IgG, but not IgM; whereas mice with a wild type DH sequence did not. Thus, germline bias against the use of arginine enriched DH sequence helps to reduce the likelihood of producing self-reactive antibodies. PMID:25706374
A package of Linux scripts for the parallelization of Monte Carlo simulations

NASA Astrophysics Data System (ADS)

Badal, Andreu; Sempau, Josep

2006-09-01

Despite the fact that fast computers are nowadays available at low cost, there are many situations where obtaining a reasonably low statistical uncertainty in a Monte Carlo (MC) simulation involves a prohibitively large amount of time. This limitation can be overcome by having recourse to parallel computing. Most tools designed to facilitate this approach require modification of the source code and the installation of additional software, which may be inconvenient for some users. We present a set of tools, named clonEasy, that implement a parallelization scheme of a MC simulation that is free from these drawbacks. In clonEasy, which is designed to run under Linux, a set of "clone" CPUs is governed by a "master" computer by taking advantage of the capabilities of the Secure Shell (ssh) protocol. Any Linux computer on the Internet that can be ssh-accessed by the user can be used as a clone. A key ingredient for the parallel calculation to be reliable is the availability of an independent string of random numbers for each CPU. Many generators—such as RANLUX, RANECU or the Mersenne Twister—can readily produce these strings by initializing them appropriately and, hence, they are suitable to be used with clonEasy. This work was primarily motivated by the need to find a straightforward way to parallelize PENELOPE, a code for MC simulation of radiation transport that (in its current 2005 version) employs the generator RANECU, which uses a combination of two multiplicative linear congruential generators (MLCGs). Thus, this paper is focused on this class of generators and, in particular, we briefly present an extension of RANECU that increases its period up to ˜5×10 and we introduce seedsMLCG, a tool that provides the information necessary to initialize disjoint sequences of an MLCG to feed different CPUs. 
This program, in combination with clonEasy, makes it possible to run PENELOPE in parallel easily, without requiring specific libraries or significant alterations of the sequential code.

Program summary 1
Title of program: clonEasy
Catalogue identifier: ADYD_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYD_v1_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, Northern Ireland
Computer for which the program is designed and others in which it is operable: Any computer with a Unix style shell (bash), support for the Secure Shell protocol and a FORTRAN compiler
Operating systems under which the program has been tested: Linux (RedHat 8.0, SuSe 8.1, Debian Woody 3.1)
Compilers: GNU FORTRAN g77 (Linux); g95 (Linux); Intel Fortran Compiler 7.1 (Linux)
Programming language used: Linux shell (bash) script, FORTRAN 77
No. of bits in a word: 32
No. of lines in distributed program, including test data, etc.: 1916
No. of bytes in distributed program, including test data, etc.: 18 202
Distribution format: tar.gz
Nature of the physical problem: There are many situations where a Monte Carlo simulation involves a huge amount of CPU time. The parallelization of such calculations is a simple way of obtaining a relatively low statistical uncertainty using a reasonable amount of time.
Method of solution: The presented collection of Linux scripts and auxiliary FORTRAN programs implements Secure Shell-based communication between a "master" computer and a set of "clones". The aim of this communication is to execute a code that performs a Monte Carlo simulation on all the clones simultaneously. The code is unique, but each clone is fed with a different set of random seeds. Hence, clonEasy effectively permits the parallelization of the calculation.
Restrictions on the complexity of the program: clonEasy can only be used with programs that produce statistically independent results using the same code, but with a different sequence of random numbers. Users must choose the initialization values for the random number generator on each computer and combine the output from the different executions. A FORTRAN program to combine the final results is also provided.
Typical running time: The execution time of each script largely depends on the number of computers that are used, the actions that are to be performed and, to a lesser extent, on the network connexion bandwidth.
Unusual features of the program: Any computer on the Internet with a Secure Shell client/server program installed can be used as a node of a virtual computer cluster for parallel calculations with the sequential source code. The simplicity of the parallelization scheme makes the use of this package a straightforward task, which does not require installing any additional libraries.

Program summary 2
Title of program: seedsMLCG
Catalogue identifier: ADYE_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYE_v1_0
Program obtainable from: CPC Program Library, Queen's University of Belfast, Northern Ireland
Computer for which the program is designed and others in which it is operable: Any computer with a FORTRAN compiler
Operating systems under which the program has been tested: Linux (RedHat 8.0, SuSe 8.1, Debian Woody 3.1), MS Windows (2000, XP)
Compilers: GNU FORTRAN g77 (Linux and Windows); g95 (Linux); Intel Fortran Compiler 7.1 (Linux); Compaq Visual Fortran 6.1 (Windows)
Programming language used: FORTRAN 77
No. of bits in a word: 32
Memory required to execute with typical data: 500 kilobytes
No. of lines in distributed program, including test data, etc.: 492
No. of bytes in distributed program, including test data, etc.: 5582
Distribution format: tar.gz
Nature of the physical problem: Statistically independent results from different runs of a Monte Carlo code can be obtained using uncorrelated sequences of random numbers on each execution. Multiplicative linear congruential generators (MLCG), or other generators that are based on them such as RANECU, can be adapted to produce these sequences.
Method of solution: For a given MLCG, the presented program calculates initialization values that produce disjoint, consecutive sequences of pseudo-random numbers. The calculated values initiate the generator in distant positions of the random number cycle and can be used, for instance, in a parallel simulation. The values are found using the formula S_{i+J} = (a^J S_i) MOD m, which gives the random value that will be generated after J iterations of the MLCG.
Restrictions on the complexity of the program: The 32-bit length restriction for the integer variables in standard FORTRAN 77 limits the separation between the produced seeds to less than 2^31 when the distance J is expressed as an integer value. The program allows the user to input the distance as a power of 10 for the purpose of efficiently splitting the sequence of generators with a very long period.
Typical running time: The execution time depends on the parameters of the MLCG used and the distance between the generated seeds. The generation of 10^6 seeds separated by 10^12 units in the sequential cycle, for one of the MLCGs found in the RANECU generator, takes 3 s on a 2.4 GHz Intel Pentium 4 using the g77 compiler.
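
The jump-ahead relation described in this record (the state after J iterations of an MLCG equals a^J times the current state, modulo m) can be illustrated with a short Python sketch. This is not the seedsMLCG code, which is written in FORTRAN 77; the function names are hypothetical, and the multiplier and modulus are assumed to be those of the first MLCG in L'Ecuyer's combined generator, on which RANECU is based.

```python
# Sketch only: jump-ahead seeding for a multiplicative LCG, S_{i+1} = (a * S_i) mod m.
# A1 and M1 are assumed to be the first MLCG of L'Ecuyer's combined generator.
A1, M1 = 40014, 2147483563


def mlcg_step(state, a=A1, m=M1):
    """Advance the generator by one iteration."""
    return (a * state) % m


def mlcg_jump(state, jump, a=A1, m=M1):
    """Return the state reached after `jump` iterations, without iterating.

    pow(a, jump, m) uses fast modular exponentiation, so the cost grows
    with log(jump) rather than with jump itself.
    """
    return (pow(a, jump, m) * state) % m


def disjoint_seeds(first_seed, spacing, count, a=A1, m=M1):
    """Seeds for `count` streams, each `spacing` iterations after the previous one."""
    seeds = [first_seed]
    for _ in range(count - 1):
        seeds.append(mlcg_jump(seeds[-1], spacing, a, m))
    return seeds


if __name__ == "__main__":
    # Four starting states separated by 10**12 iterations each.
    print(disjoint_seeds(12345, 10**12, 4))
```

Because Python integers have arbitrary precision, the 32-bit restriction on J mentioned in the program summary does not arise in this sketch.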
Sma3s: a three-step modular annotator for large sequence datasets.

PubMed

Muñoz-Mérida, Antonio; Viguera, Enrique; Claros, M Gonzalo; Trelles, Oswaldo; Pérez-Pulido, Antonio J

2014-08-01

Automatic sequence annotation is an essential component of modern 'omics' studies, which aim to extract information from large collections of sequence data. Most existing tools use sequence homology to establish evolutionary relationships and assign putative functions to sequences. However, it can be difficult to define a similarity threshold that achieves sufficient coverage without sacrificing annotation quality. Defining the correct configuration is critical and can be challenging for non-specialist users. Thus, the development of robust automatic annotation techniques that generate high-quality annotations without needing expert knowledge would be very valuable for the research community. We present Sma3s, a tool for automatically annotating very large collections of biological sequences from any kind of gene library or genome. Sma3s is composed of three modules that progressively annotate query sequences using either: (i) very similar homologues, (ii) orthologous sequences or (iii) terms enriched in groups of homologous sequences. We trained the system using several random sets of known sequences, demonstrating average sensitivity and specificity values of ~85%. In conclusion, Sma3s is a versatile tool for high-throughput annotation of a wide variety of sequence datasets that outperforms the accuracy of other well-established annotation algorithms, and it can enrich existing database annotations and uncover previously hidden features. Importantly, Sma3s has already been used in the functional annotation of two published transcriptomes. © The Author 2014. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Sim3C: simulation of Hi-C and Meta3C proximity ligation sequencing technologies.

PubMed

DeMaere, Matthew Z; Darling, Aaron E

2018-02-01

Chromosome conformation capture (3C) and Hi-C DNA sequencing methods have rapidly advanced our understanding of the spatial organization of genomes and metagenomes. Many variants of these protocols have been developed, each with their own strengths. Currently there is no systematic means for simulating sequence data from this family of sequencing protocols, potentially hindering the advancement of algorithms to exploit this new datatype. We describe a computational simulator that, given simple parameters and reference genome sequences, will simulate Hi-C sequencing on those sequences. The simulator models the basic spatial structure in genomes that is commonly observed in Hi-C and 3C datasets, including the distance-decay relationship in proximity ligation, differences in the frequency of interaction within and across chromosomes, and the structure imposed by cells. A means to model the 3D structure of randomly generated topologically associating domains is provided. The simulator considers several sources of error common to 3C and Hi-C library preparation and sequencing methods, including spurious proximity ligation events and sequencing error. We have introduced the first comprehensive simulator for 3C and Hi-C sequencing protocols. We expect the simulator to have use in testing of Hi-C data analysis algorithms, as well as more general value for experimental design, where questions such as the required depth of sequencing, enzyme choice, and other decisions can be made in advance in order to ensure adequate statistical power with respect to experimental hypothesis testing.
Analysis of the Transcriptome of Erigeron breviscapus Uncovers Putative Scutellarin and Chlorogenic Acids Biosynthetic Genes and Genetic Markers

PubMed Central

Zhang, Jia-Jin; Shu, Li-Ping; Zhang, Wei; Long, Guang-Qiang; Liu, Tao; Meng, Zheng-Gui; Chen, Jun-Wen; Yang, Sheng-Chao

2014-01-01

Background: Erigeron breviscapus (Vant.) Hand-Mazz. is a famous medicinal plant. Scutellarin and chlorogenic acids are the primary active components in this herb. However, the mechanisms of biosynthesis and regulation for scutellarin and chlorogenic acids in E. breviscapus are largely unknown. In addition, genomic information for this herb is unavailable. Principal Findings: Using Illumina sequencing on the GAIIx platform, a total of 64,605,972 raw sequencing reads were generated and assembled into 73,092 non-redundant unigenes. Among them, 44,855 unigenes (61.37%) were annotated in the public databases Nr, Swiss-Prot, KEGG, and COG. The transcripts encoding the known enzymes involved in flavonoid and chlorogenic acid biosynthesis were discovered in the Illumina dataset. Three candidate cytochrome P450 genes were discovered which might encode flavone 6-hydroxylase converting apigenin to scutellarein. Furthermore, 4 unigenes encoding the homologues of maize P1 (R2R3-MYB transcription factors) were defined, which might regulate the biosynthesis of scutellarin. Additionally, a total of 11,077 simple sequence repeats (SSRs) were identified from 9,255 unigenes. Among the SSRs, tri-nucleotide motifs were the most abundant. Thirty-six primer pairs for SSRs were randomly selected for validation of the amplification and polymorphism. The result revealed that 34 (94.40%) primer pairs were successfully amplified and 19 (52.78%) primer pairs exhibited polymorphisms. Conclusion: Using next generation sequencing (NGS) technology, this study provides, for the first time, abundant genomic data for E. breviscapus. The candidate genes involved in the biosynthesis and transcriptional regulation of scutellarin and chlorogenic acids were obtained in this study. Additionally, plenty of genetic markers were generated by identification of SSRs, which is a powerful tool for molecular breeding and genetics applications in this herb. PMID:24956277
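
The abstract does not state which tool or thresholds were used to call SSRs, so the following Python sketch only illustrates the general idea of scanning a sequence for perfect di- and tri-nucleotide repeats. The function name and the minimum repeat counts (6 for di-, 5 for tri-nucleotide motifs) are assumptions chosen for illustration, not values from the study.

```python
import re


def find_ssrs(seq, min_repeats=((2, 6), (3, 5))):
    """Return (start, motif, repeat_count) for perfect di-/tri-nucleotide SSRs.

    min_repeats pairs a motif length with the minimum number of tandem
    copies required; the defaults are illustrative assumptions.
    """
    hits = []
    for motif_len, min_count in min_repeats:
        # A motif of motif_len bases followed by at least min_count - 1 copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_count - 1))
        for match in pattern.finditer(seq.upper()):
            motif = match.group(1)
            if len(set(motif)) == 1:
                # Skip homopolymer runs such as "AA" repeated, which are not di-/tri- SSRs.
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return hits


if __name__ == "__main__":
    example = "GG" + "AT" * 6 + "C" + "GCA" * 5 + "T"
    for start, motif, count in find_ssrs(example):
        print(start, motif, count)
```

Dedicated SSR finders also handle longer motifs, compound repeats and imperfect repeats, none of which this sketch attempts.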
Generating intrinsically disordered protein conformational ensembles from a Markov chain

NASA Astrophysics Data System (ADS)

Cukier, Robert I.

2018-03-01

Intrinsically disordered proteins (IDPs) sample a diverse conformational space. They are important to signaling and regulatory pathways in cells. An entropy penalty must be paid when an IDP becomes ordered upon interaction with another protein or a ligand. Thus, the degree of conformational disorder of an IDP is of interest. We create a dichotomic Markov model that can explore entropic features of an IDP. The Markov condition introduces local (neighbor residues in a protein sequence) rotamer dependences that arise from van der Waals and other chemical constraints. A protein sequence of length N is characterized by its (information) entropy and mutual information, MIMC, the latter providing a measure of the dependence among the random variables describing the rotamer probabilities of the residues that comprise the sequence. For a Markov chain, the MIMC is proportional to the pair mutual information MI which depends on the singlet and pair probabilities of neighbor residue rotamer sampling. All 2^N sequence states are generated, along with their probabilities, and contrasted with the probabilities under the assumption of independent residues. An efficient method to generate realizations of the chain is also provided. The chain entropy, MIMC, and state probabilities provide the ingredients to distinguish different scenarios using the terminologies: MoRF (molecular recognition feature), not-MoRF, and not-IDP. A MoRF corresponds to large entropy and large MIMC (strong dependence among the residues' rotamer sampling), a not-MoRF corresponds to large entropy but small MIMC, and not-IDP corresponds to low entropy irrespective of the MIMC. We show that MoRFs are most appropriate as descriptors of IDPs. They provide a reasonable number of high-population states that reflect the dependences between neighbor residues, thus classifying them as IDPs, yet without very large entropy that might lead to an excessively high entropy penalty.
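
As a rough illustration of the quantities named in this abstract, the Python sketch below enumerates all 2^N states of a stationary two-state Markov chain, computes the chain entropy, and compares the total dependence among residues with the neighbor-pair mutual information. The transition matrix, the function names and the choice of a stationary starting distribution are assumptions made for the example; they are not taken from the paper.

```python
import itertools
import math


def stationary(T):
    """Stationary distribution of a two-state chain; T[i][j] = P(next = j | current = i)."""
    a, b = T[0][1], T[1][0]
    return [b / (a + b), a / (a + b)]


def enumerate_chain(T, n):
    """Map each of the 2**n state sequences to its probability under the chain."""
    pi = stationary(T)
    probs = {}
    for states in itertools.product((0, 1), repeat=n):
        p = pi[states[0]]
        for x, y in zip(states, states[1:]):
            p *= T[x][y]
        probs[states] = p
    return probs


def entropy(probabilities):
    """Shannon entropy in nats."""
    return -sum(p * math.log(p) for p in probabilities if p > 0)


def pair_mutual_information(T):
    """Mutual information between two neighboring residues of the stationary chain."""
    pi = stationary(T)
    return sum(
        pi[x] * T[x][y] * math.log(T[x][y] / pi[y])
        for x in (0, 1) for y in (0, 1) if T[x][y] > 0
    )


if __name__ == "__main__":
    T = [[0.9, 0.1], [0.3, 0.7]]   # hypothetical rotamer transition probabilities
    n = 6
    chain_entropy = entropy(enumerate_chain(T, n).values())
    independent_entropy = n * entropy(stationary(T))
    print("chain entropy:", chain_entropy)
    print("total dependence:", independent_entropy - chain_entropy)
    print("(n - 1) * pair MI:", (n - 1) * pair_mutual_information(T))
```

For a stationary chain the last two printed values agree, which is one way to read the statement that the chain mutual information is proportional to the pair mutual information.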
Analysis of the transcriptome of Erigeron breviscapus uncovers putative scutellarin and chlorogenic acids biosynthetic genes and genetic markers.

PubMed

Jiang, Ni-Hao; Zhang, Guang-Hui; Zhang, Jia-Jin; Shu, Li-Ping; Zhang, Wei; Long, Guang-Qiang; Liu, Tao; Meng, Zheng-Gui; Chen, Jun-Wen; Yang, Sheng-Chao

2014-01-01

Erigeron breviscapus (Vant.) Hand-Mazz. is a famous medicinal plant. Scutellarin and chlorogenic acids are the primary active components in this herb. However, the mechanisms of biosynthesis and regulation for scutellarin and chlorogenic acids in E. breviscapus are largely unknown. In addition, genomic information for this herb is unavailable. Using Illumina sequencing on the GAIIx platform, a total of 64,605,972 raw sequencing reads were generated and assembled into 73,092 non-redundant unigenes. Among them, 44,855 unigenes (61.37%) were annotated in the public databases Nr, Swiss-Prot, KEGG, and COG. The transcripts encoding the known enzymes involved in flavonoid and chlorogenic acid biosynthesis were discovered in the Illumina dataset. Three candidate cytochrome P450 genes were discovered which might encode flavone 6-hydroxylase converting apigenin to scutellarein. Furthermore, 4 unigenes encoding the homologues of maize P1 (R2R3-MYB transcription factors) were defined, which might regulate the biosynthesis of scutellarin. Additionally, a total of 11,077 simple sequence repeats (SSRs) were identified from 9,255 unigenes. Among the SSRs, tri-nucleotide motifs were the most abundant. Thirty-six primer pairs for SSRs were randomly selected for validation of the amplification and polymorphism. The result revealed that 34 (94.40%) primer pairs were successfully amplified and 19 (52.78%) primer pairs exhibited polymorphisms. Using next generation sequencing (NGS) technology, this study provides, for the first time, abundant genomic data for E. breviscapus. The candidate genes involved in the biosynthesis and transcriptional regulation of scutellarin and chlorogenic acids were obtained in this study. Additionally, plenty of genetic markers were generated by identification of SSRs, which is a powerful tool for molecular breeding and genetics applications in this herb.
Differential Expression and Functional Analysis of High-Throughput -Omics Data Using Open Source Tools.

PubMed

Kebschull, Moritz; Fittler, Melanie Julia; Demmer, Ryan T; Papapanou, Panos N

2017-01-01

Today, -omics analyses, including the systematic cataloging of messenger RNA and microRNA sequences or DNA methylation patterns in a cell population, organ, or tissue sample, allow for an unbiased, comprehensive genome-level analysis of complex diseases, offering a large advantage over earlier "candidate" gene or pathway analyses. A primary goal in the analysis of these high-throughput assays is the detection of those features among several thousand that differ between different groups of samples. In the context of oral biology, our group has successfully utilized -omics technology to identify key molecules and pathways in different diagnostic entities of periodontal disease. A major issue when inferring biological information from high-throughput -omics studies is the fact that the sheer volume of high-dimensional data generated by contemporary technology is not appropriately analyzed using common statistical methods employed in the biomedical sciences. In this chapter, we outline a robust and well-accepted bioinformatics workflow for the initial analysis of -omics data generated using microarrays or next-generation sequencing technology using open-source tools. Starting with quality control measures and necessary preprocessing steps for data originating from different -omics technologies, we next outline a differential expression analysis pipeline that can be used for data from both microarray and sequencing experiments, and offers the possibility to account for random or fixed effects. Finally, we present an overview of the possibilities for a functional analysis of the obtained data.
Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases

PubMed Central

Schadt, Eric E.; Banerjee, Onureena; Fang, Gang; Feng, Zhixing; Wong, Wing H.; Zhang, Xuegong; Kislyuk, Andrey; Clark, Tyson A.; Luong, Khai; Keren-Paz, Alona; Chess, Andrew; Kumar, Vipin; Chen-Plotkin, Alice; Sondheimer, Neal; Korlach, Jonas; Kasarskis, Andrew

2013-01-01

Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types. PMID:23093720
Research on parallel algorithm for sequential pattern mining

NASA Astrophysics Data System (ADS)

Zhou, Lijuan; Qin, Bai; Wang, Yu; Hao, Zhongxiao

2008-03-01

Sequential pattern mining is the mining of frequent sequences related to time or other orders from a sequence database. Its initial motivation was to discover the laws of customer purchasing over a period of time by finding the frequent sequences. In recent years, sequential pattern mining has become an important direction of data mining, and its application field is no longer confined to business databases but has extended to new data sources such as the Web and advanced science fields such as DNA analysis. The data for sequential pattern mining has two characteristics: massive volume and distributed storage. Most existing sequential pattern mining algorithms have not considered these characteristics together. Based on these traits and on parallel theory, this paper puts forward a new distributed parallel algorithm, SPP (Sequential Pattern Parallel). The algorithm abides by the principle of pattern reduction and utilizes the divide-and-conquer strategy for parallelization. The first parallel task is to construct frequent item sets applying the frequent concept and search space partition theory, and the second task is to construct frequent sequences using the depth-first search method at each processor. The algorithm only needs to access the database twice and does not generate candidate sequences, which reduces the access time and improves the mining efficiency. Based on a random data generation procedure and the different information structures designed, this paper simulated the SPP algorithm in a concrete parallel environment and implemented the AprioriAll algorithm. The experiments demonstrate that, compared with AprioriAll, the SPP algorithm had an excellent speedup factor and efficiency.
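
To make the notion of a "frequent sequence" concrete, the Python sketch below computes the support of a pattern, that is, the fraction of database sequences that contain the pattern's items in order (not necessarily adjacent). It implements neither SPP nor AprioriAll; it only shows the support measure such algorithms threshold, under the simplifying assumption that each sequence element is a single item rather than an itemset. The function names are hypothetical.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items can only match later positions.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Fraction of sequences in `database` that contain `pattern` as a subsequence."""
    return sum(is_subsequence(pattern, seq) for seq in database) / len(database)


if __name__ == "__main__":
    database = [list("abcd"), list("acbd"), list("bd"), list("dcba")]
    print(support(["a", "d"], database))
    print(support(["b", "d"], database))
```

A pattern is then called frequent when its support reaches a user-chosen minimum threshold.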

Modeling kinetic rate variation in third generation DNA sequencing data to detect putative modifications to DNA bases.

PubMed

Schadt, Eric E; Banerjee, Onureena; Fang, Gang; Feng, Zhixing; Wong, Wing H; Zhang, Xuegong; Kislyuk, Andrey; Clark, Tyson A; Luong, Khai; Keren-Paz, Alona; Chess, Andrew; Kumar, Vipin; Chen-Plotkin, Alice; Sondheimer, Neal; Korlach, Jonas; Kasarskis, Andrew

2013-01-01

Current generation DNA sequencing instruments are moving closer to seamlessly sequencing genomes of entire populations as a routine part of scientific investigation. However, while significant inroads have been made identifying small nucleotide variation and structural variations in DNA that impact phenotypes of interest, progress has not been as dramatic regarding epigenetic changes and base-level damage to DNA, largely due to technological limitations in assaying all known and unknown types of modifications at genome scale. Recently, single-molecule real time (SMRT) sequencing has been reported to identify kinetic variation (KV) events that have been demonstrated to reflect epigenetic changes of every known type, providing a path forward for detecting base modifications as a routine part of sequencing. However, to date no statistical framework has been proposed to enhance the power to detect these events while also controlling for false-positive events. By modeling enzyme kinetics in the neighborhood of an arbitrary location in a genomic region of interest as a conditional random field, we provide a statistical framework for incorporating kinetic information at a test position of interest as well as at neighboring sites that help enhance the power to detect KV events. The performance of this and related models is explored, with the best-performing model applied to plasmid DNA isolated from Escherichia coli and mitochondrial DNA isolated from human brain tissue. We highlight widespread kinetic variation events, some of which strongly associate with known modification events, while others represent putative chemically modified sites of unknown types.
Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes.

PubMed

Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

2012-02-01

The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information.
Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes

PubMed Central

Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

2012-01-01

The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild type-copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information. PMID:22384404
Linear Lepidopteran ambidensovirus 1 sequences drive random integration of a reporter gene in transfected Spodoptera frugiperda cells.

PubMed

Rizk, Francine; Laverdure, Sylvain; d'Alençon, Emmanuelle; Bossin, Hervé; Dupressoir, Thierry

2018-01-01

The Lepidopteran ambidensovirus 1 isolated from Junonia coenia (hereafter JcDV) is an invertebrate parvovirus considered as a viral transduction vector as well as a potential tool for the biological control of insect pests. Previous works showed that JcDV-based circular plasmids experimentally integrate into insect cells genomic DNA. In order to approach the natural conditions of infection and possible integration, we generated linear JcDV-gfp based molecules which were transfected into non-permissive Spodoptera frugiperda (Sf9) cultured cells. Cells were monitored for the expression of green fluorescent protein (GFP) and DNA was analyzed for integration of transduced viral sequences. Non-structural protein modulation of the VP-gene cassette promoter activity was additionally assayed. We show that linear JcDV-derived molecules are capable of long term genomic integration and sustained transgene expression in Sf9 cells. As expected, only the deletion of both inverted terminal repeats (ITR) or the polyadenylation signals of NS and VP genes dramatically impairs the global transduction/expression efficiency. However, all the integrated viral sequences we characterized appear "scrambled" whatever the viral content of the transfected vector. Despite a strong GFP expression, we were unable to recover any full sequence of the original constructs and found rearranged viral and non-viral sequences as well. Cellular flanking sequences were identified as non-coding ones. On the other hand, the kinetics of GFP expression over time led us to investigate the apparent down-regulation by non-structural proteins of the VP-gene cassette promoter. Altogether, our results show that JcDV-derived sequences included in linear DNA molecules are able to drive efficiently the integration and expression of a foreign gene into the genome of insect cells, whatever their composition, provided that at least one ITR is present.
However, the transfected sequences were extensively rearranged with cellular DNA during or after random integration in the host cell genome. Lastly, the non-structural proteins seem to participate in the regulation of p9 promoter activity rather than to the integration of viral sequences.
CNV-RF Is a Random Forest-Based Copy Number Variation Detection Method Using Next-Generation Sequencing.

PubMed

Onsongo, Getiria; Baughn, Linda B; Bower, Matthew; Henzler, Christine; Schomaker, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat

2016-11-01

Simultaneous detection of small copy number variations (CNVs) (<0.5 kb) and single-nucleotide variants in clinically significant genes is of great interest for clinical laboratories. The analytical variability in next-generation sequencing (NGS) and artifacts in coverage data because of issues with mappability along with lack of robust bioinformatics tools for CNV detection have limited the utility of targeted NGS data to identify CNVs. We describe the development and implementation of a bioinformatics algorithm, copy number variation-random forest (CNV-RF), that incorporates a machine learning component to identify CNVs from targeted NGS data. Using CNV-RF, we identified 12 of 13 deletions in samples with known CNVs, two cases with duplications, and identified novel deletions in 22 additional cases. Furthermore, no CNVs were identified among 60 genes in 14 cases with normal copy number and no CNVs were identified in another 104 patients with clinical suspicion of CNVs. All positive deletions and duplications were confirmed using a quantitative PCR method. CNV-RF also detected heterozygous deletions and duplications with a specificity of 50% across 4813 genes. The ability of CNV-RF to detect clinically relevant CNVs with a high degree of sensitivity along with confirmation using a low-cost quantitative PCR method provides a framework for providing comprehensive NGS-based CNV/single-nucleotide variant detection in a clinical molecular diagnostics laboratory. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
10-year trend in quantity and quality of pediatric randomized controlled trials published in mainland China: 2002–2011

PubMed Central

2013-01-01

Background Quality assessment of pediatric randomized controlled trials (RCTs) in China is limited. The aim of this study was to evaluate the quantitative trends and quality indicators of RCTs published in mainland China over a recent 10-year period. Methods We individually searched all 17 available pediatric journals published in China from January 1, 2002 to December 30, 2011 to identify RCTs of drug treatment in participants under the age of 18 years. The quality was evaluated according to the Cochrane quality assessment protocol. Results Of 1287 journal issues containing 44398 articles, a total of 2.4% (1077/44398) articles were included in the analysis. The proportion of RCTs increased from 0.28% in 2002 to 0.32% in 2011. Individual sample sizes ranged from 10 to 905 participants (median 81 participants); 2.3% of the RCTs were multiple center trials; 63.9% evaluated Western medicine, 32.5% evaluated traditional Chinese medicine; 15% used an adequate method of random sequence generation; and 10.4% used a quasi-random method for randomization. Only 1% of the RCTs reported adequate allocation concealment and 0.6% reported the method of blinding. The follow-up period was from 7 days to 96 months, with a median of 7.5 months. There was incomplete outcome data reported in 8.3%, of which 4.5% (4/89) used intention-to-treat analysis. Only 0.4% of the included trials used adequate random sequence allocation, concealment and blinding. The articles published from 2007 to 2011 revealed an improvement in the randomization method compared with articles published from 2002 to 2006 (from 2.7% to 23.6%, p = 0.000). Conclusions In mainland China, the quantity of RCTs did not increase in the pediatric population, and the general quality was relatively poor. Quality improvements were suboptimal in the later 5 years. PMID:23914882
Rapid development of microsatellite markers for the endangered fish Schizothorax biddulphi (Günther) using next generation sequencing and cross-species amplification.

PubMed

Luo, Wei; Nie, Zhulan; Zhan, Fanbin; Wei, Jie; Wang, Weimin; Gao, Zexia

2012-11-14

Tarim schizothoracin (Schizothorax biddulphi) is an endemic fish species native to the Tarim River system of Xinjiang and has been classified as an extremely endangered freshwater fish species in China. Here, we used a next generation sequencing platform (Ion Torrent PGM™) to obtain a large number of microsatellites for S. biddulphi for the first time. A total of 40577 contigs were assembled, which contained 1379 SSRs. Among these SSRs, dinucleotide repeats were the most frequent (77.08%), and AC repeats were the most frequently occurring microsatellite, followed by AG, AAT and AT. Fifty loci were randomly selected for primer development; of these, 38 loci were successfully amplified and 29 loci were polymorphic across panels of 30 individuals. The H(o) ranged from 0.15 to 0.83, and H(e) ranged from 0.15 to 0.85, with 3.5 alleles per locus on average. Cross-species utility indicated that 20 of these markers were successfully amplified in a related and also endangered fish species, S. irregularis. This study suggests that PGM™ sequencing is a rapid and cost-effective tool for developing microsatellite markers for non-model species, and the microsatellite markers developed in this study would be useful in Schizothorax genetic analysis.
A pedagogical example of second-order arithmetic sequences applied to the construction of computer passwords by upper elementary grade students

NASA Astrophysics Data System (ADS)

Coggins, Porter E.

2015-04-01

The purpose of this paper is (1) to present how general education elementary school age students constructed computer passwords using digital root sums and second-order arithmetic sequences, (2) to argue that computer password construction can be used as an engaging introduction to generate interest in elementary school students to study mathematics related to computer science, and (3) to share additional mathematical ideas accessible to elementary school students that can be used to create computer passwords. This paper serves to fill a current gap in the literature regarding the integration of mathematical content accessible to upper elementary school students and aspects of computer science in general, and computer password construction in particular. In addition, the protocols presented here can serve as a hook to generate further interest in mathematics and computer science. Students learned to create a random-looking computer password by using biometric measurements of their shoe size, height, and age in months to create a second-order arithmetic sequence, then converted the resulting numbers into characters that become their computer passwords. This password protocol can be used to introduce students to good computer password habits that can serve as a foundation for a life-long awareness of data security. A refinement of the password protocol is also presented.
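The abstract does not spell out the classroom protocol, so the following is only a minimal sketch of the two ideas it names: a digital root and a second-order arithmetic sequence (one whose second differences are constant). The seed values, the use of the digital root as the second difference, and the letter mapping are all assumptions made for illustration.

```python
def digital_root(n: int) -> int:
    """Repeatedly sum the decimal digits of n until one digit remains."""
    return 0 if n == 0 else 1 + (n - 1) % 9


def second_order_sequence(first, first_diff, second_diff, length):
    """Terms whose consecutive differences grow by a constant second_diff."""
    terms = []
    term, diff = first, first_diff
    for _ in range(length):
        terms.append(term)
        term += diff          # step to the next term
        diff += second_diff   # the step itself grows linearly
    return terms


def to_password(terms, alphabet="abcdefghijklmnopqrstuvwxyz"):
    """Hypothetical mapping: reduce each term modulo the alphabet size."""
    return "".join(alphabet[t % len(alphabet)] for t in terms)


# Hypothetical measurements: shoe size 7, height 52 (inches), age 125 months.
seq = second_order_sequence(first=7, first_diff=52,
                            second_diff=digital_root(125), length=8)
print(seq, to_password(seq))
```

The result only looks random: anyone who knows the three measurements and the rule can regenerate it, which is consistent with the abstract's wording "random-looking".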
Identification and Characterization of an Acinetobacter baumannii Biofilm-Associated Protein▿

PubMed Central

Loehfelm, Thomas W.; Luke, Nicole R.; Campagnari, Anthony A.

2008-01-01

We have identified a homologue to the staphylococcal biofilm-associated protein (Bap) in a bloodstream isolate of Acinetobacter baumannii. The fully sequenced open reading frame is 25,863 bp and encodes a protein with a predicted molecular mass of 854 kDa. Analysis of the nucleotide sequence reveals a repetitive structure consistent with bacterial cell surface adhesins. Bap-specific monoclonal antibody (MAb) 6E3 was generated to an epitope conserved among 41% of A. baumannii strains isolated during a recent outbreak in the U.S. military health care system. Flow cytometry confirms that the MAb 6E3 epitope is surface exposed. Random transposon mutagenesis was used to generate A. baumannii bap1302::EZ-Tn5, a mutant negative for surface reactivity to MAb 6E3 in which the transposon disrupts the coding sequence of bap. Time course confocal laser scanning microscopy and three-dimensional image analysis of actively growing biofilms demonstrate that this mutant is unable to sustain biofilm thickness and volume, suggesting a role for Bap in supporting the development of the mature biofilm structure. This is the first identification of a specific cell surface protein directly involved in biofilm formation by A. baumannii and suggests that Bap is involved in intercellular adhesion within the mature biofilm. PMID:18024522
Oligo Design: a computer program for development of probes for oligonucleotide microarrays.

PubMed

Herold, Keith E; Rasooly, Avraham

2003-12-01

Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
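The relationship between GC content and melting temperature mentioned above can be explored with a few lines of code. This is a rough sketch, not the Oligo Design implementation: the abstract does not say which melting-temperature model the program uses, so the Wallace rule (2 °C per A/T, 4 °C per G/C, reasonable only for short oligonucleotides) is substituted here, and all function names are hypothetical.

```python
import random


def random_dna(length, gc_fraction, rng):
    """Draw bases independently; each is G or C with probability gc_fraction."""
    return "".join(
        rng.choice("GC") if rng.random() < gc_fraction else rng.choice("AT")
        for _ in range(length)
    )


def gc_content(seq):
    """Fraction of bases that are G or C."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Wallace rule estimate in degrees Celsius (assumed model, short oligos only)."""
    gc = seq.count("G") + seq.count("C")
    return 2 * (len(seq) - gc) + 4 * gc


rng = random.Random(0)  # fixed seed so runs are repeatable
for target in (0.2, 0.5, 0.8):
    seq = random_dna(20, target, rng)
    print(f"{seq}  GC={gc_content(seq):.2f}  Tm~{wallace_tm(seq)} C")
```

Under this rule the estimate rises linearly with the number of G/C bases at fixed length, which is the kind of trend a random sequence generator makes visible.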
Complementary DNA sequencing and identification of mRNAs from the venomous gland of Agkistrodon piscivorus leucostoma.

PubMed

Jia, Ying; Cantu, Bruno A; Sánchez, Elda E; Pérez, John C

2008-06-15

To advance our knowledge on the snake venom composition and transcripts expressed in venom gland at the molecular level, we constructed a cDNA library from the venom gland of Agkistrodon piscivorus leucostoma for the generation of expressed sequence tags (ESTs) database. From the randomly sequenced 2112 independent clones, we have obtained ESTs for 1309 (62%) cDNAs, which showed significant deduced amino acid sequence similarity (scores >80) to previously characterized proteins in National Center for Biotechnology Information (NCBI) database. Ribosomal proteins make up 47 clones (2%) and the remaining 756 (36%) cDNAs represent either unknown identity or show BLASTX sequence identity scores of <80 with known GenBank accessions. The most highly expressed gene encoding phospholipase A(2) (PLA(2)) accounting for 35% of A. p. leucostoma venom gland cDNAs was identified and further confirmed by crude venom applied to sodium dodecyl sulfate/polyacrylamide gel electrophoresis (SDS-PAGE) electrophoresis and protein sequencing. A total of 180 representative genes were obtained from the sequence assemblies and deposited to EST database. Clones showing sequence identity to disintegrins, thrombin-like enzymes, hemorrhagic toxins, fibrinogen clotting inhibitors and plasminogen activators were also identified in our EST database. These data can be used to develop a research program that will help us identify genes encoding proteins that are of medical importance or proteins involved in the mechanisms of the toxin venom.
Characterization of the Kenaf (Hibiscus cannabinus) Global Transcriptome Using Illumina Paired-End Sequencing and Development of EST-SSR Markers

PubMed Central

Li, Hui; Li, Defang; Chen, Anguo; Tang, Huijuan; Li, Jianjun; Huang, Siqi

2016-01-01

Kenaf (Hibiscus cannabinus L.) is an economically important natural fiber crop grown worldwide. However, only 20 expressed tag sequences (ESTs) for kenaf are available in public databases. The aim of this study was to develop large-scale simple sequence repeat (SSR) markers to lay a solid foundation for the construction of genetic linkage maps and marker-assisted breeding in kenaf. We used Illumina paired-end sequencing technology to generate new EST-simple sequences and MISA software to mine SSR markers. We identified 71,318 unigenes with an average length of 1143 nt and annotated these unigenes using four different protein databases. Overall, 9324 complementary pairs were designated as EST-SSR markers, and their quality was validated using 100 randomly selected SSR markers. In total, 72 primer pairs reproducibly amplified target amplicons, and 61 of these primer pairs detected significant polymorphism among 28 kenaf accessions. Thus, in this study, we have developed large-scale SSR markers for kenaf, and this new resource will facilitate construction of genetic linkage maps, investigation of fiber growth and development in kenaf, and also be of value to novel gene discovery and functional genomic studies. PMID:26960153
Origin and implications of zero degeneracy in networks spectra.

PubMed

Yadav, Alok; Jalan, Sarika

2015-04-01

The spectra of many real-world networks exhibit properties that differ from those of random networks generated using various models. One such property is a very high degeneracy at the zero eigenvalue. In this work, we provide all the possible reasons behind the occurrence of the zero degeneracy in the network spectra, namely the complete and partial duplications, as well as their implications. The power-law degree sequence and preferential attachment are the properties that enhance the occurrence of such duplications and hence lead to the zero degeneracy. A comparison of the zero degeneracy in protein-protein interaction networks of six different species and in their corresponding model networks indicates the importance of the degree sequences and the power-law exponent for the occurrence of zero degeneracy.
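To make the link between duplication and zero degeneracy concrete, here is a small sketch (not taken from the paper). When two nodes have identical neighbor sets, their rows in the adjacency matrix are identical, the rank drops, and each such redundancy contributes a zero eigenvalue. For a symmetric matrix the multiplicity of eigenvalue 0 equals n minus the rank, so exact rank computation is enough. The star and path graphs are illustrative choices.

```python
from fractions import Fraction


def rank(matrix):
    """Rank by Gaussian elimination over exact rationals (no rounding error)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    r = 0
    n_cols = len(rows[0]) if rows else 0
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            if factor:
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r


def zero_degeneracy(adjacency):
    """Multiplicity of eigenvalue 0 of a symmetric adjacency matrix."""
    return len(adjacency) - rank(adjacency)


def star(leaves):
    """Hub node 0 connected to `leaves` leaf nodes; all leaves are duplicates."""
    n = leaves + 1
    adj = [[0] * n for _ in range(n)]
    for leaf in range(1, n):
        adj[0][leaf] = adj[leaf][0] = 1
    return adj


path4 = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(zero_degeneracy(star(4)), zero_degeneracy(path4))
```

The four leaves of the star share one neighbor set, giving three zero eigenvalues, while the four-node path has no duplicated nodes and no zero eigenvalue.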
Phage display as a technology delivering on the promise of peptide drug discovery.

PubMed

Hamzeh-Mivehroud, Maryam; Alizadeh, Ali Akbar; Morris, Michael B; Church, W Bret; Dastmalchi, Siavoush

2013-12-01

Phage display represents an important approach in the development pipeline for producing peptides and peptidomimetics therapeutics. Using randomly generated DNA sequences and molecular biology techniques, large diverse peptide libraries can be displayed on the phage surface. The phage library can be incubated with a target of interest and the phage which bind can be isolated and sequenced to reveal the displayed peptides' primary structure. In this review, we focus on the 'mechanics' of the phage display process, whilst highlighting many diverse and subtle ways it has been used to further the drug-development process, including the potential for the phage particle itself to be used as a drug carrier targeted to a particular pathogen or cell type in the body. Copyright © 2013 Elsevier Ltd. All rights reserved.
Primer-Free Aptamer Selection Using A Random DNA Library

PubMed Central

Pan, Weihua; Xin, Ping; Patrick, Susan; Dean, Stacey; Keating, Christine; Clawson, Gary

2010-01-01

Aptamers are highly structured oligonucleotides (DNA or RNA) that can bind to targets with affinities comparable to antibodies 1. They are identified through an in vitro selection process called Systematic Evolution of Ligands by EXponential enrichment (SELEX) to recognize a wide variety of targets, from small molecules to proteins and other macromolecules 2-4. Aptamers have properties that are well suited for in vivo diagnostic and/or therapeutic applications: Besides good specificity and affinity, they are easily synthesized, survive more rigorous processing conditions, they are poorly immunogenic, and their relatively small size can result in facile penetration of tissues. Aptamers that are identified through the standard SELEX process usually comprise ~80 nucleotides (nt), since they are typically selected from nucleic acid libraries with ~40 nt long randomized regions plus fixed primer sites of ~20 nt on each side. The fixed primer sequences thus can comprise nearly ~50% of the library sequences, and therefore may positively or negatively compromise identification of aptamers in the selection process 3, although bioinformatics approaches suggest that the fixed sequences do not contribute significantly to aptamer structure after selection 5. To address these potential problems, primer sequences have been blocked by complementary oligonucleotides or switched to different sequences midway during the rounds of SELEX 6, or they have been trimmed to 6-9 nt 7, 8. Wen and Gray 9 designed a primer-free genomic SELEX method, in which the primer sequences were completely removed from the library before selection and were then regenerated to allow amplification of the selected genomic fragments. However, to employ the technique, a unique genomic library has to be constructed, which possesses limited diversity, and regeneration after rounds of selection relies on a linear reamplification step. 
Alternatively, efforts to circumvent problems caused by fixed primer sequences using high efficiency partitioning are met with problems regarding PCR amplification 10. We have developed a primer-free (PF) selection method that significantly simplifies SELEX procedures and effectively eliminates primer-interference problems 11, 12. The protocols work in a straightforward manner. The central random region of the library is purified without extraneous flanking sequences and is bound to a suitable target (for example to a purified protein or complex mixtures such as cell lines). Then the bound sequences are obtained, reunited with flanking sequences, and re-amplified to generate selected sub-libraries. As an example, here we selected aptamers to S100B, a protein marker for melanoma. Binding assays showed Kd values in the 10^-7 to 10^-8 M range after a few rounds of selection, and we demonstrate that the aptamers function effectively in a sandwich binding format. PMID:20689511
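The library layout described in this record (a randomized region of about 40 nt flanked by fixed primer sites of about 20 nt on each side) can be sketched as follows. The two flank sequences are placeholders invented for illustration, not the primers used by the authors, and the helper names are hypothetical.

```python
import random

# Hypothetical 20-nt flanks; real SELEX primer sites would differ.
LEFT_FLANK = "ACGTACGTACGTACGTACGT"
RIGHT_FLANK = "TGCATGCATGCATGCATGCA"


def random_region(length, rng):
    """Uniformly random DNA of the given length."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def make_library(n_members, random_len=40, rng=None):
    """Each member is left flank + random core + right flank."""
    rng = rng or random.Random()
    return [LEFT_FLANK + random_region(random_len, rng) + RIGHT_FLANK
            for _ in range(n_members)]


def fixed_fraction(member):
    """Share of a library member taken up by the fixed flanks."""
    return (len(LEFT_FLANK) + len(RIGHT_FLANK)) / len(member)


def strip_flanks(member):
    """Primer-free form: keep only the central random region."""
    return member[len(LEFT_FLANK):len(member) - len(RIGHT_FLANK)]


library = make_library(5, rng=random.Random(42))
print(len(library[0]), fixed_fraction(library[0]), strip_flanks(library[0]))
```

With these lengths the flanks account for half of every 80-nt member, which is the roughly 50% figure quoted in the abstract.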
Association between funding source, methodological quality and research outcomes in randomized controlled trials of synbiotics, probiotics and prebiotics added to infant formula: A Systematic Review

PubMed Central

2013-01-01

Background There is little or no information available on the impact of funding by the food industry on trial outcomes and methodological quality of synbiotics, probiotics and prebiotics research in infants. The objective of this study was to compare the methodological quality, outcomes of food industry sponsored trials versus non industry sponsored trials, with regards to supplementation of synbiotics, probiotics and prebiotics in infant formula. Methods A comprehensive search was conducted to identify published and unpublished randomized clinical trials (RCTs). Cochrane methodology was used to assess the risk of bias of included RCTs in the following domains: 1) sequence generation; 2) allocation concealment; 3) blinding; 4) incomplete outcome data; 5) selective outcome reporting; and 6) other bias. Clinical outcomes and authors’ conclusions were reported in frequencies and percentages. The association between source of funding, risk of bias, clinical outcomes and conclusions were assessed using Pearson’s Chi-square test and the Fisher’s exact test. A p-value < 0.05 was statistically significant. Results Sixty seven completed and 3 on-going RCTs were included. Forty (59.7%) were funded by food industry, 11 (16.4%) by non-industry entities and 16 (23.9%) did not specify source of funding. Several risk of bias domains, especially sequence generation, allocation concealment and blinding, were not adequately reported. There was no significant association between the source of funding and sequence generation, allocation concealment, blinding and selective reporting, majority of reported clinical outcomes or authors’ conclusions. On the other hand, source of funding was significantly associated with the domains of incomplete outcome data, free of other bias domains as well as reported antibiotic use and conclusions on weight gain. 
Conclusion In RCTs on infants fed infant formula containing probiotics, prebiotics or synbiotics, the source of funding did not influence the majority of outcomes in favour of the sponsors’ products. More non-industry funded research is needed to further assess the impact of funding on methodological quality, reported clinical outcomes and authors’ conclusions. PMID:24219082
Low rank approximation methods for MR fingerprinting with large scale dictionaries.

PubMed

Yang, Mingrui; Ma, Dan; Jiang, Yun; Hamilton, Jesse; Seiberlich, Nicole; Griswold, Mark A; McGivney, Debra

2018-04-01

This work proposes new low rank approximation approaches with significant memory savings for large scale MR fingerprinting (MRF) problems. We introduce a compressed MRF with randomized singular value decomposition method to significantly reduce the memory requirement for calculating a low rank approximation of large sized MRF dictionaries. We further relax this requirement by exploiting the structures of MRF dictionaries in the randomized singular value decomposition space and fitting them to low-degree polynomials to generate high resolution MRF parameter maps. In vivo 1.5T and 3T brain scan data are used to validate the approaches. T1, T2, and off-resonance maps are in good agreement with that of the standard MRF approach. Moreover, the memory savings is up to 1000 times for the MRF-fast imaging with steady-state precession sequence and more than 15 times for the MRF-balanced, steady-state free precession sequence. The proposed compressed MRF with randomized singular value decomposition and dictionary fitting methods are memory efficient low rank approximation methods, which can benefit the usage of MRF in clinical settings. They also have great potentials in large scale MRF problems, such as problems considering multi-component MRF parameters or high resolution in the parameter space. Magn Reson Med 79:2392-2400, 2018. © 2017 International Society for Magnetic Resonance in Medicine.
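As a rough illustration of why randomized methods save memory, the sketch below implements only the randomized range finder, which is the first stage of a generic randomized SVD. It is not the authors' compressed MRF algorithm. The matrix A is multiplied by a random Gaussian test matrix, the result is orthonormalized into Q, and A is approximated by Q(QᵀA). Plain Python lists are used to keep the example dependency-free; the function names are hypothetical.

```python
import random


def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def orthonormal_columns(y, tol=1e-12):
    """Modified Gram-Schmidt on the columns of y; near-zero columns are dropped."""
    basis = []
    for col in transpose(y):
        v = list(col)
        for q in basis:
            dot = sum(qi * vi for qi, vi in zip(q, v))
            v = [vi - dot * qi for vi, qi in zip(v, q)]
        norm = sum(vi * vi for vi in v) ** 0.5
        if norm > tol:
            basis.append([vi / norm for vi in v])
    return transpose(basis)


def randomized_low_rank(a, k, rng):
    """Return (Q, B) with A approximately equal to Q B, Q having at most k columns."""
    omega = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(len(a[0]))]
    q = orthonormal_columns(matmul(a, omega))  # basis for the sampled range of A
    b = matmul(transpose(q), a)                # small k-by-n factor
    return q, b


# Exactly rank-2 test matrix (6 x 5) built from two small factors.
u = [[1, 0], [0, 1], [1, 1], [2, -1], [0, 3], [1, 2]]
v = [[1, 2, 0, -1, 3], [0, 1, 4, 2, -2]]
a = matmul(u, v)
q, b = randomized_low_rank(a, 2, random.Random(0))
approx = matmul(q, b)
print(max(abs(x - y) for ra, rb in zip(a, approx) for x, y in zip(ra, rb)))
```

Storing Q (m × k) and B (k × n) takes k(m + n) numbers instead of mn, which is where the savings come from when k is small. A full randomized SVD would go on to decompose the small factor B.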
Molecular selection in a unified evolutionary sequence

NASA Technical Reports Server (NTRS)

Fox, S. W.

1986-01-01

With guidance from experiments and observations that indicate internally limited phenomena, an outline of unified evolutionary sequence is inferred. Such unification is not visible for a context of random matrix and random mutation. The sequence proceeds from Big Bang through prebiotic matter, protocells, through the evolving cell via molecular and natural selection, to mind, behavior, and society.
Ocean biogeochemistry modeled with emergent trait-based genomics

NASA Astrophysics Data System (ADS)

Coles, V. J.; Stukel, M. R.; Brooks, M. T.; Burd, A.; Crump, B. C.; Moran, M. A.; Paul, J. H.; Satinsky, B. M.; Yager, P. L.; Zielinski, B. L.; Hood, R. R.

2017-12-01

Marine ecosystem models have advanced to incorporate metabolic pathways discovered with genomic sequencing, but direct comparisons between models and “omics” data are lacking. We developed a model that directly simulates metagenomes and metatranscriptomes for comparison with observations. Model microbes were randomly assigned genes for specialized functions, and communities of 68 species were simulated in the Atlantic Ocean. Unfit organisms were replaced, and the model self-organized to develop community genomes and transcriptomes. Emergent communities from simulations that were initialized with different cohorts of randomly generated microbes all produced realistic vertical and horizontal ocean nutrient, genome, and transcriptome gradients. Thus, the library of gene functions available to the community, rather than the distribution of functions among specific organisms, drove community assembly and biogeochemical gradients in the model ocean.
AntiClustal: Multiple Sequence Alignment by antipole clustering and linear approximate 1-median computation.

PubMed

Di Pietro, C; Di Pietro, V; Emmanuele, G; Ferro, A; Maugeri, T; Modica, E; Pigola, G; Pulvirenti, A; Purrello, M; Ragusa, M; Scalia, M; Shasha, D; Travali, S; Zimmitti, V

2003-01-01

In this paper we present a new Multiple Sequence Alignment (MSA) algorithm called AntiClusAl. The method makes use of the commonly used idea of aligning homologous sequences belonging to classes generated by some clustering algorithm, and then continuing the alignment process in a bottom-up way along a suitable tree structure. The final result is then read at the root of the tree. Multiple sequence alignment in each cluster makes use of the progressive alignment with the 1-median (center) of the cluster. The 1-median of a set S of sequences is the element of S which minimizes the average distance from any other sequence in S. Its exact computation requires quadratic time. The basic idea of our proposed algorithm is to make use of a simple and natural algorithmic technique based on randomized tournaments, which has been successfully applied to large size search problems in general metric spaces. In particular, a clustering algorithm called Antipole tree and an approximate linear 1-median computation are used. Compared with Clustal W, a widely used tool for MSA, our algorithm shows better running times with fully comparable alignment quality. A successful biological application showing high amino acid conservation during evolution of Xenopus laevis SOD2 is also cited.
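The difference between the exact 1-median and a tournament-based approximation can be sketched as follows. This is an illustration of the general idea only: the group size, the shuffling scheme, and the use of Hamming distance on equal-length strings (rather than an alignment or edit distance) are assumptions, not details from the paper.

```python
import random


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def exact_one_median(items, dist):
    """Element with the smallest summed distance to all others (quadratic cost)."""
    return min(items, key=lambda s: sum(dist(s, t) for t in items))


def tournament_one_median(items, dist, group_size=3, rng=None):
    """Approximate 1-median: winners of small random groups advance each round."""
    rng = rng or random.Random()
    pool = list(items)
    while len(pool) > group_size:
        rng.shuffle(pool)
        groups = [pool[i:i + group_size] for i in range(0, len(pool), group_size)]
        pool = [exact_one_median(g, dist) for g in groups]  # local winners
    return exact_one_median(pool, dist)


seqs = ["AAAA", "AAAT", "AAAC", "TTTT"]
print(exact_one_median(seqs, hamming))
print(tournament_one_median(seqs, hamming, rng=random.Random(0)))
```

Minimizing the summed distance is equivalent to minimizing the average. For a fixed group size each round costs time proportional to the pool size, and the pool shrinks geometrically, so the total work grows roughly linearly with the number of sequences, at the price of returning an approximate rather than exact center.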

DOE Office of Scientific and Technical Information (OSTI.GOV)

Modeste Nguimdo, Romain, E-mail: Romain.Nguimdo@vub.ac.be; Tchitnga, Robert; Woafo, Paul

We numerically investigate the possibility of using a coupling to increase the complexity in the simplest chaotic two-component electronic circuits operating at high frequency. We subsequently show that the complex behaviors generated in such coupled systems, together with post-processing, are suitable for generating bit-streams which pass all the NIST tests for randomness. The electronic circuit is built up by unidirectionally coupling three two-component (one active and one passive) oscillators in a ring configuration through resistances. It turns out that, with such a coupling, highly chaotic signals can be obtained. By extracting points at a fixed interval of 10 ns (corresponding to a bit rate of 100 Mb/s) on such chaotic signals, each point being simultaneously converted into 16 bits (or 8 bits), we find that the binary sequence constructed by including the 10 (or 2) least significant bits passes statistical tests of randomness, meaning that bit-streams with random properties can be achieved with an overall bit rate of up to 10 × 100 Mb/s = 1 Gbit/s (or 2 × 100 Mb/s = 200 Mbit/s). Moreover, by varying the bias voltages, we also investigate the parameter range for which more complex signals can be obtained. Besides being simple to implement, the two-component electronic circuit setup is very cheap as compared to optical and electro-optical systems.
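The post-processing step described above (keep only the least significant bits of each digitized sample) and the resulting bit-rate arithmetic can be sketched as follows. The logistic map is a stand-in chaotic source chosen for convenience; it is not the coupled circuit from the abstract, and this sketch makes no claim about passing NIST tests.

```python
def lsb_bits(samples, keep):
    """Concatenate the `keep` least significant bits of each integer sample."""
    mask = (1 << keep) - 1
    bits = []
    for s in samples:
        low = s & mask
        bits.extend((low >> i) & 1 for i in reversed(range(keep)))
    return bits


def bit_rate(sample_interval_s, keep):
    """Bits per second when `keep` bits are retained per sample."""
    return keep / sample_interval_s


def logistic_samples(n, x=0.3, width=16):
    """Stand-in chaotic source (logistic map), quantized to `width` bits."""
    out = []
    for _ in range(n):
        x = 3.99 * x * (1.0 - x)
        out.append(int(x * ((1 << width) - 1)))
    return out


stream = lsb_bits(logistic_samples(1000), keep=10)
print(len(stream), bit_rate(10e-9, 10), bit_rate(10e-9, 2))
```

Sampling every 10 ns and keeping 10 bits per 16-bit sample gives 10 bits per 10 ns, that is 1 Gbit/s; keeping 2 bits per 8-bit sample gives 200 Mbit/s, matching the figures in the abstract.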
Variations on a theme of Lander and Waterman

DOE Office of Scientific and Technical Information (OSTI.GOV)

Speed, T.

1997-12-01

The original Lander and Waterman mathematical analysis was for fingerprinting random clones. Since that time, a number of variants of their theory have appeared, including ones which apply to mapping by anchoring random clones, and to non-random or directed clone mapping. The same theory is now widely used to devise random sequencing strategies. In this talk I will review these developments, and go on to discuss the theory required for directed sequencing strategies.
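For readers unfamiliar with the theory, the basic Lander and Waterman quantities for a random strategy can be computed directly. With N reads (or clones) of length L on a target of length G, the redundancy is c = NL/G, the expected fraction of the target covered is 1 − e^(−c), and the expected number of islands is N·e^(−cσ), where σ = 1 − θ and θ is the fraction of a read that must overlap to be detected. The numeric values below are hypothetical, and the variants discussed in the talk are not covered by this sketch.

```python
import math


def coverage(n_reads, read_len, genome_len):
    """Redundancy c = N * L / G."""
    return n_reads * read_len / genome_len


def fraction_covered(c):
    """Expected fraction of the target covered by at least one read."""
    return 1.0 - math.exp(-c)


def expected_islands(n_reads, c, min_overlap_fraction=0.0):
    """Expected number of islands: N * exp(-c * sigma), sigma = 1 - theta."""
    sigma = 1.0 - min_overlap_fraction
    return n_reads * math.exp(-c * sigma)


# Hypothetical project: 1000 reads of 500 bases on a 100 kb target.
c = coverage(1000, 500, 100_000)
print(c, fraction_covered(c), expected_islands(1000, c))
```

At fivefold redundancy more than 99% of the target is expected to be covered, yet several gaps typically remain, which is one motivation for the directed strategies the talk goes on to discuss.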
Rate of de novo mutations and the importance of father's age to disease risk.

PubMed

Kong, Augustine; Frigge, Michael L; Masson, Gisli; Besenbacher, Soren; Sulem, Patrick; Magnusson, Gisli; Gudjonsson, Sigurjon A; Sigurdsson, Asgeir; Jonasdottir, Aslaug; Jonasdottir, Adalbjorg; Wong, Wendy S W; Sigurdsson, Gunnar; Walters, G Bragi; Steinberg, Stacy; Helgason, Hannes; Thorleifsson, Gudmar; Gudbjartsson, Daniel F; Helgason, Agnar; Magnusson, Olafur Th; Thorsteinsdottir, Unnur; Stefansson, Kari

2012-08-23

Mutations generate sequence diversity and provide a substrate for selection. The rate of de novo mutations is therefore of major importance to evolution. Here we conduct a study of genome-wide mutation rates by sequencing the entire genomes of 78 Icelandic parent-offspring trios at high coverage. We show that in our samples, with an average father's age of 29.7, the average de novo mutation rate is 1.20 × 10^-8 per nucleotide per generation. Most notably, the diversity in mutation rate of single nucleotide polymorphisms is dominated by the age of the father at conception of the child. The effect is an increase of about two mutations per year. An exponential model estimates paternal mutations doubling every 16.5 years. After accounting for random Poisson variation, father's age is estimated to explain nearly all of the remaining variation in the de novo mutation counts. These observations shed light on the importance of the father's age on the risk of diseases such as schizophrenia and autism.
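The doubling time quoted for the exponential model can be turned into a yearly growth factor with one line of arithmetic. The sketch below uses only the 16.5-year figure from the abstract; the ages passed to the function are arbitrary examples, and no baseline mutation count is assumed.

```python
DOUBLING_YEARS = 16.5  # doubling time stated in the abstract


def annual_growth_factor(doubling_years=DOUBLING_YEARS):
    """Multiplicative increase per year implied by the doubling time."""
    return 2.0 ** (1.0 / doubling_years)


def relative_paternal_mutations(age, reference_age):
    """Ratio of expected paternal mutations at `age` versus `reference_age`."""
    return 2.0 ** ((age - reference_age) / DOUBLING_YEARS)


print(annual_growth_factor())
print(relative_paternal_mutations(46.5, 30.0))
```

A doubling every 16.5 years corresponds to an increase of roughly 4.3% per year of paternal age.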
Identification and Validation of Expressed Sequence Tags from Pigeonpea (Cajanus cajan L.) Root

PubMed Central

Kumar, Ravi Ranjan; Yadav, Shailesh; Joshi, Shourabh; Bhandare, Prithviraj P.; Patil, Vinod Kumar; Kulkarni, Pramod B.; Sonkawade, Swati; Naik, G. R.

2014-01-01

Pigeonpea (Cajanus cajan (L) Millsp.) is an important food legume crop of rain fed agriculture in the arid and semiarid tropics of the world. It has a deep and extensive root system which serves a number of important physiological and metabolic functions in plant development and growth. In order to identify genes associated with pigeonpea root, ESTs were generated from the root tissues of pigeonpea (GRG-295 genotype) using a normalized cDNA library. A total of 105 high quality ESTs were generated by sequencing of 250 random clones which resulted in 72 unigenes comprising 25 contigs and 47 singlets. The ESTs were assigned to 9 functional categories on the basis of their putative function. In order to validate the possible expression of transcripts, four genes, namely, S-adenosylmethionine synthetase, phosphoglycerate kinase, serine carboxypeptidase, and methionine aminopeptidase, were further analyzed by reverse transcriptase PCR. The possible role of the identified transcripts and their functions associated with root will also be a valuable resource for functional genomics studies in legume crops. PMID:24895494
Automated Sequence Generation Process and Software

NASA Technical Reports Server (NTRS)

Gladden, Roy

2007-01-01

"Automated sequence generation" (autogen) signifies both a process and software used to automatically generate sequences of commands to operate various spacecraft. The autogen software comprises the autogen script plus the Activity Plan Generator (APGEN) program. APGEN can be used for planning missions and command sequences.
Comparison and correlation of Simple Sequence Repeats distribution in genomes of Brucella species

PubMed Central

Kiran, Jangampalli Adi Pradeep; Chakravarthi, Veeraraghavulu Praveen; Kumar, Yellapu Nanda; Rekha, Somesula Swapna; Kruti, Srinivasan Shanthi; Bhaskar, Matcha

2011-01-01

Computational genomics is one of the important tools to understand the distribution of closely related genomes including simple sequence repeats (SSRs) in an organism, which gives valuable information regarding genetic variations. The central objective of the present study was to screen the SSRs distributed in coding and non-coding regions among different human Brucella species which are involved in a range of pathological disorders. Computational analysis of the SSRs in the Brucella indicates few deviations from expected random models. Statistical analysis also reveals that tri-nucleotide SSRs are overrepresented and tetranucleotide SSRs underrepresented in Brucella genomes. From the data, it can be suggested that over expressed tri-nucleotide SSRs in genomic and coding regions might be responsible in the generation of functional variation of proteins expressed which in turn may lead to different pathogenicity, virulence determinants, stress response genes, transcription regulators and host adaptation proteins of Brucella genomes. Abbreviations SSRs - Simple Sequence Repeats, ORFs - Open Reading Frames. PMID:21738309
Controlling Light Transmission Through Highly Scattering Media Using Semi-Definite Programming as a Phase Retrieval Computation Method.

PubMed

N'Gom, Moussa; Lien, Miao-Bin; Estakhri, Nooshin M; Norris, Theodore B; Michielssen, Eric; Nadakuditi, Raj Rao

2017-05-31

Complex Semi-Definite Programming (SDP) is introduced as a novel approach to phase retrieval enabled control of monochromatic light transmission through highly scattering media. In a simple optical setup, a spatial light modulator is used to generate a random sequence of phase-modulated wavefronts, and the resulting intensity speckle patterns in the transmitted light are acquired on a camera. The SDP algorithm allows computation of the complex transmission matrix of the system from this sequence of intensity-only measurements, without need for a reference beam. Once the transmission matrix is determined, optimal wavefronts are computed that focus the incident beam to any position or sequence of positions on the far side of the scattering medium, without the need for any subsequent measurements or wavefront shaping iterations. The number of measurements required and the degree of enhancement of the intensity at focus is determined by the number of pixels controlled by the spatial light modulator.
Sequenced RAPD markers to detect hybridization in the barbary partridge (Alectoris barbara, Phasianidae).

PubMed

Barbanera, Filippo; Guerrini, Monica; Bertoncini, Franco; Cappelli, Fabio; Muzzeddu, Marco; Dini, Fernando

2011-01-01

In the Alectoris partridges (Phasianidae), hybridization occurs occasionally as a result of the natural breakdown of isolating mechanisms but more frequently as a result of human activity. No genetic record of hybridization is known for the barbary partridge (A. barbara). This species is distributed mostly in North Africa and, in Europe, on the island of Sardinia (Italy) and on Gibraltar. The risk of hybridization between barbary and red-legged partridge (A. rufa: Iberian Peninsula, France, Italy) is high in Sardinia and in Spain. We developed two random amplified polymorphic DNA (RAPD) markers to detect A. barbara × A. rufa hybrid partridges. We tested them on 125 experimental hybrids, sequenced the relative species-specific bands and found that the bands and their corresponding sequences were reliably transmitted through a number of generations (F1, F2, F3, BC1, BC2). Our markers represent a highly valuable tool for the preservation of the A. barbara genome from the pressing threat of A. rufa pollution. © 2010 Blackwell Publishing Ltd.
A multi-center randomized controlled trial to compare a self-ligating bracket with a conventional bracket in a UK population: Part 1: Treatment efficiency.

PubMed

O'Dywer, Lian; Littlewood, Simon J; Rahman, Shahla; Spencer, R James; Barber, Sophy K; Russell, Joanne S

2016-01-01

To use a two-arm parallel trial to compare treatment efficiency between a self-ligating and a conventional preadjusted edgewise appliance system. A prospective multi-center randomized controlled clinical trial was conducted in three hospital orthodontic departments. Subjects were randomly allocated to receive treatment with either a self-ligating (3M SmartClip) or conventional (3M Victory) preadjusted edgewise appliance bracket system using a computer-generated random sequence concealed in opaque envelopes, with stratification for operator and center. Two operators followed a standardized protocol regarding bracket bonding procedure and archwire sequence. Efficiency of each ligation system was assessed by comparing the duration of treatment (months), total number of appointments (scheduled and emergency visits), and number of bracket bond failures. One hundred thirty-eight subjects (mean age 14 years 11 months) were enrolled in the study, of which 135 subjects (97.8%) completed treatment. The mean treatment time and number of visits were 25.12 months and 19.97 visits in the SmartClip group and 25.80 months and 20.37 visits in the Victory group. The overall bond failure rate was 6.6% for the SmartClip and 7.2% for Victory, with a similar debond distribution between the two appliances. No significant differences were found between the bracket systems in any of the outcome measures. No serious harm was observed from either bracket system. There was no clinically significant difference in treatment efficiency between treatment with a self-ligating bracket system and a conventional ligation system.
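A minimal sketch of how a computer-generated allocation sequence with stratification for operator and center might be produced is given below in Python. The abstract does not say how the sequence was generated, so the permuted-block design, the block size, the seed, and the stratum labels are all assumptions made for illustration.

```python
import random
from itertools import product


def block_sequence(n_blocks, block_size, arms, rng):
    """Return an allocation list built from shuffled, balanced blocks."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    per_arm = block_size // len(arms)
    sequence = []
    for _ in range(n_blocks):
        block = list(arms) * per_arm  # equal number of each arm per block
        rng.shuffle(block)            # random order within the block
        sequence.extend(block)
    return sequence


def stratified_allocation(centers, operators, n_blocks=3, block_size=4,
                          arms=("SmartClip", "Victory"), seed=2016):
    """One independent blocked sequence per (center, operator) stratum.

    All parameter defaults are hypothetical, chosen only for the example.
    """
    rng = random.Random(seed)  # fixed seed so the sequence can be reproduced
    return {
        (center, operator): block_sequence(n_blocks, block_size, arms, rng)
        for center, operator in product(centers, operators)
    }


if __name__ == "__main__":
    allocation = stratified_allocation(
        ["center_1", "center_2", "center_3"], ["operator_1", "operator_2"])
    for stratum, sequence in allocation.items():
        print(stratum, sequence)
```

Blocking keeps the two arms balanced within each stratum after every completed block, which is one common reason to prefer it over simple coin-flip allocation in small strata.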
Rényi continuous entropy of DNA sequences.

PubMed

Vinga, Susana; Almeida, Jonas S

2004-12-07

Entropy measures of DNA sequences estimate their randomness or, inversely, their repeatability. L-block Shannon discrete entropy accounts for the empirical distribution of all length-L words and has convergence problems for finite sequences. A new entropy measure that extends Shannon's formalism is proposed. Rényi's quadratic entropy calculated with the Parzen window density estimation method applied to CGR/USM continuous maps of DNA sequences constitutes a novel technique to evaluate sequence global randomness without some of the former method's drawbacks. The asymptotic behaviour of this new measure was analytically deduced and the calculation of entropies for several synthetic and experimental biological sequences was performed. The results obtained were compared with the distributions of the null model of randomness obtained by simulation. The biological sequences have shown a different p-value according to the kernel resolution of Parzen's method, which might indicate an unknown level of organization of their patterns. This new technique can be very useful in the study of DNA sequence complexity and provide additional tools for DNA entropy estimation. The main MATLAB applications developed and additional material are available at the webpage . Specialized functions can be obtained from the authors.
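The idea of combining a chaos game representation (CGR) map with a Parzen-window estimate of Rényi's quadratic entropy can be sketched in Python as follows. This is not the authors' MATLAB code: the corner assignment, the isotropic Gaussian kernel, the kernel width, and the decision to ignore boundary effects of the unit square are assumptions made for the example. It relies on the standard identity that the integral of a squared Gaussian Parzen estimate equals the average of Gaussians with doubled variance evaluated at all pairwise differences.

```python
import math
import random

# Assumed CGR corner assignment for the four nucleotides.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_points(seq):
    """Map a DNA string to CGR coordinates in the unit square."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0  # move halfway to the corner
        points.append((x, y))
    return points


def renyi_quadratic_entropy(points, sigma):
    """H2 = -ln(integral of p^2) for a 2-D Gaussian Parzen estimate p."""
    n = len(points)
    var = 2.0 * sigma ** 2              # two kernels convolved: variances add
    norm = 1.0 / (2.0 * math.pi * var)  # 2-D Gaussian normalising constant
    total = 0.0
    for xi, yi in points:
        for xj, yj in points:
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            total += norm * math.exp(-d2 / (2.0 * var))
    return -math.log(total / n ** 2)


if __name__ == "__main__":
    rng = random.Random(0)
    random_seq = "".join(rng.choice("ACGT") for _ in range(200))
    repeat_seq = "A" * 200
    print(renyi_quadratic_entropy(cgr_points(random_seq), sigma=0.05))
    print(renyi_quadratic_entropy(cgr_points(repeat_seq), sigma=0.05))
```

In this sketch a highly repetitive sequence collapses onto a small region of the map and yields a lower entropy than a pseudorandom sequence of the same length; the double loop is quadratic in sequence length, so it is only suitable for short inputs.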
DNA polymerase preference determines PCR priming efficiency.

PubMed

Pan, Wenjing; Byrne-Steele, Miranda; Wang, Chunlin; Lu, Stanley; Clemmons, Scott; Zahorchak, Robert J; Han, Jian

2014-01-30

Polymerase chain reaction (PCR) is one of the most important developments in modern biotechnology. However, PCR is known to introduce biases, especially during multiplex reactions. Recent studies have implicated the DNA polymerase as the primary source of bias, particularly initiation of polymerization on the template strand. In our study, amplification from a synthetic library containing a 12 nucleotide random portion was used to provide an in-depth characterization of DNA polymerase priming bias. The synthetic library was amplified with three commercially available DNA polymerases using an anchored primer with a random 3' hexamer end. After normalization, the next generation sequencing (NGS) results of the amplified libraries were directly compared to the unamplified synthetic library. Here, high throughput sequencing was used to systematically demonstrate and characterize DNA polymerase priming bias. We demonstrate that certain sequence motifs are preferred over others as primers where the six nucleotide sequences at the 3' end of the primer, as well as the sequences four base pairs downstream of the priming site, may influence priming efficiencies. DNA polymerases in the same family from two different commercial vendors prefer similar motifs, while another commercially available enzyme from a different DNA polymerase family prefers different motifs. Furthermore, the preferred priming motifs are GC-rich. The DNA polymerase preference for certain sequence motifs was verified by amplification from single-primer templates. We incorporated the observed DNA polymerase preference into a primer-design program that guides the placement of the primer to an optimal location on the template. DNA polymerase priming bias was characterized using a synthetic library amplification system and NGS. 
The characterization of DNA polymerase priming bias was then utilized to guide the primer-design process and demonstrate varying amplification efficiencies among three commercially available DNA polymerases. The results suggest that the interaction of the DNA polymerase with the primer:template junction during the initiation of DNA polymerization is very important in terms of overall amplification bias and has broader implications for both the primer design process and multiplex PCR.
Chemical Evolution and the Evolutionary Definition of Life.

PubMed

Higgs, Paul G

2017-06-01

Darwinian evolution requires a mechanism for generation of diversity in a population, and selective differences between individuals that influence reproduction. In biology, diversity is generated by mutations and selective differences arise because of the encoded functions of the sequences (e.g., ribozymes or proteins). Here, I draw attention to a process that I will call chemical evolution, in which the diversity is generated by random chemical synthesis instead of (or in addition to) mutation, and selection acts on physicochemical properties, such as hydrolysis, photolysis, solubility, or surface binding. Chemical evolution applies to short oligonucleotides that can be generated by random polymerization, as well as by template-directed replication, and which may be too short to encode a specific function. Chemical evolution is an important stage on the pathway to life, between the stage of "just chemistry" and the stage of full biological evolution. A mathematical model is presented here that illustrates the differences between these three stages. Chemical evolution leads to much larger differences in molecular concentrations than can be achieved by selection without replication. However, chemical evolution is not open-ended, unlike biological evolution. The ability to undergo Darwinian evolution is often considered to be a defining feature of life. Here, I argue that chemical evolution, although Darwinian, does not quite constitute life, and that a good place to put the conceptual boundary between non-life and life is between chemical and biological evolution.
Observation of quantum criticality with ultracold atoms in optical lattices

NASA Astrophysics Data System (ADS)

Zhang, Xibo

As biological problems are becoming more complex and data are growing at a rate much faster than that of computer hardware, new and faster algorithms are required. This dissertation investigates computational problems arising in two fields, comparative genomics and epigenomics, and employs a variety of computational techniques to address the problems. One fundamental question in the studies of chromosome evolution is whether the rearrangement breakpoints are happening at random positions or along certain hotspots. We investigate the breakpoint reuse phenomenon, and show analyses that support the more recently proposed fragile breakage model as opposed to the conventional random breakage models for chromosome evolution. The identification of syntenic regions between chromosomes forms the basis for studies of genome architectures, comparative genomics, and evolutionary genomics. The previous synteny block reconstruction algorithms could not be scaled to a large number of mammalian genomes being sequenced; neither did they address the issue of generating non-overlapping synteny blocks suitable for analyzing rearrangements and evolutionary history of large-scale duplications prevalent in plant genomes. We present a new unified synteny block generation algorithm based on the A-Bruijn graph framework that overcomes these shortcomings. In epigenome sequencing, a sample may contain a mixture of epigenomes and there is a need to resolve the distinct methylation patterns from the mixture. Many sequencing applications, such as haplotype inference for diploid or polyploid genomes, and metagenomic sequencing, share a similar objective: to infer a set of distinct assemblies from reads that are sequenced from a heterogeneous sample and subsequently aligned to a reference genome. We model the problem from both combinatorial and statistical angles. First, we describe a theoretical framework. 
A linear-time algorithm is then given to resolve a minimum number of assemblies that are consistent with all reads, substantially improving on previous algorithms. An efficient algorithm is also described to determine a set of assemblies that is consistent with a maximum subset of the reads, a previously untreated problem. We then prove that allowing nested reads or permitting mismatches between reads and their assemblies renders these problems NP-hard. Second, we describe a mixture model-based approach, and apply the model to the detection of allele-specific methylation.
Structure and function of neonatal social communication in a genetic mouse model of autism.

PubMed

Takahashi, T; Okabe, S; Broin, P Ó; Nishi, A; Ye, K; Beckert, M V; Izumi, T; Machida, A; Kang, G; Abe, S; Pena, J L; Golden, A; Kikusui, T; Hiroi, N

2016-09-01

A critical step toward understanding autism spectrum disorder (ASD) is to identify both genetic and environmental risk factors. A number of rare copy number variants (CNVs) have emerged as robust genetic risk factors for ASD, but not all CNV carriers exhibit ASD and the severity of ASD symptoms varies among CNV carriers. Although evidence exists that various environmental factors modulate symptomatic severity, the precise mechanisms by which these factors determine the ultimate severity of ASD are still poorly understood. Here, using a mouse heterozygous for Tbx1 (a gene encoded in 22q11.2 CNV), we demonstrate that a genetically triggered neonatal phenotype in vocalization generates a negative environmental loop in pup-mother social communication. Wild-type pups used individually diverse sequences of simple and complicated call types, but heterozygous pups used individually invariable call sequences with less complicated call types. When played back, representative wild-type call sequences elicited maternal approach, but heterozygous call sequences were ineffective. When the representative wild-type call sequences were randomized, they were ineffective in eliciting vigorous maternal approach behavior. These data demonstrate that an ASD risk gene alters the neonatal call sequence of its carriers and this pup phenotype in turn diminishes maternal care through atypical social communication. Thus, an ASD risk gene induces, through atypical neonatal call sequences, less than optimal maternal care as a negative neonatal environmental factor.
Structure and function of neonatal social communication in a genetic mouse model of autism

PubMed Central

Takahashi, Tomohisa; Okabe, Shota; Ó Broin, Pilib; Nishi, Akira; Ye, Kenny; Beckert, Michael V.; Izumi, Takeshi; Machida, Akihiro; Kang, Gina; Abe, Seiji; Pena, Jose L.; Golden, Aaron; Kikusui, Takefumi; Hiroi, Noboru

2015-01-01

A critical step toward understanding autism spectrum disorder (ASD) is to identify both genetic and environmental risk factors. A number of rare copy number variants (CNVs) have emerged as robust genetic risk factors for ASD, but not all CNV carriers exhibit ASD and the severity of ASD symptoms varies among CNV carriers. Although evidence exists that various environmental factors modulate symptomatic severity, the precise mechanisms by which these factors determine the ultimate severity of ASD are still poorly understood. Here, using a mouse heterozygous for Tbx1 (a gene encoded in 22q11.2 CNV), we demonstrate that a genetically-triggered neonatal phenotype in vocalization generates a negative environmental loop in pup-mother social communication. Wild-type pups used individually diverse sequences of simple and complicated call types, but heterozygous pups used individually invariable call sequences with less complicated call types. When played back, representative wild-type call sequences elicited maternal approach, but heterozygous call sequences were ineffective. When the representative wild-type call sequences were randomized, they were ineffective in eliciting vigorous maternal approach behavior. These data demonstrate that an ASD risk gene alters the neonatal call sequence of its carriers and this pup phenotype in turn diminishes maternal care through atypical social communication. Thus, an ASD risk gene induces, through atypical neonatal call sequences, less than optimal maternal care as a negative neonatal environmental factor. PMID:26666205
A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.

PubMed

Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme

2013-07-01

The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and more importantly do not allow the user to control explicitly the nucleotide distribution such as the GC-content in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC-content and target structure. IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. Supplementary data are available at Bioinformatics online.
Whole genome sequencing identifies influenza A H3N2 transmission and offers superior resolution to classical typing methods.

PubMed

Meinel, Dominik M; Heinzinger, Susanne; Eberle, Ute; Ackermann, Nikolaus; Schönberger, Katharina; Sing, Andreas

2018-02-01

Influenza with its annual epidemic waves is a major cause of morbidity and mortality worldwide. However, only little whole genome data are available regarding the molecular epidemiology promoting our understanding of viral spread in human populations. We implemented a RT-PCR strategy starting from patient material to generate influenza A whole genome sequences for molecular epidemiological surveillance. Samples were obtained within the Bavarian Influenza Sentinel. The complete influenza virus genome was amplified by a one-tube multiplex RT-PCR and sequenced on an Illumina MiSeq. We report whole genomic sequences for 50 influenza A H3N2 viruses, which was the predominating virus in the season 2014/15, directly from patient specimens. The dataset included random samples from Bavaria (Germany) throughout the influenza season and samples from three suspected transmission clusters. We identified the outbreak samples based on sequence identity. Whole genome sequencing (WGS) was superior in resolution compared to analysis of single segments or partial segment analysis. Additionally, we detected manifestation of substantial amounts of viral quasispecies in several patients, carrying mutations varying from the dominant virus in each patient. Our rapid whole genome sequencing approach for influenza A virus shows that WGS can effectively be used to detect and understand outbreaks in large communities. Additionally, the genomic data provide in-depth details about the circulating virus within one season.
Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

NASA Technical Reports Server (NTRS)

Gatlin, L. L.

1974-01-01

Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
Seismoelectric data processing for surface surveys of shallow targets

USGS Publications Warehouse

Haines, S.S.; Guitton, A.; Biondi, B.

2007-01-01

The utility of the seismoelectric method relies on the development of methods to extract the signal of interest from background and source-generated coherent noise that may be several orders-of-magnitude stronger. We compare data processing approaches to develop a sequence of preprocessing and signal/noise separation and to quantify the noise level from which we can extract signal events. Our preferred sequence begins with the removal of power line harmonic noise and the use of frequency filters to minimize random and source-generated noise. Mapping to the linear Radon domain with an inverse process incorporating a sparseness constraint provides good separation of signal from noise, though it is ineffective on noise that shows the same dip as the signal. Similarly, the seismoelectric signal and noise do not separate cleanly in the Fourier domain, so f-k filtering cannot remove all of the source-generated noise and it also disrupts signal amplitude patterns. We find that prediction-error filters provide the most effective method to separate signal and noise, while also preserving amplitude information, assuming that adequate pattern models can be determined for the signal and noise. These Radon-domain and prediction-error-filter methods successfully separate signal from <33 dB stronger noise in our test data. © 2007 Society of Exploration Geophysicists.
Development of a Web Tool for Escherichia coli Subtyping Based on fimH Alleles.

PubMed

Roer, Louise; Tchesnokova, Veronika; Allesøe, Rosa; Muradova, Mariya; Chattopadhyay, Sujay; Ahrenfeldt, Johanne; Thomsen, Martin C F; Lund, Ole; Hansen, Frank; Hammerum, Anette M; Sokurenko, Evgeni; Hasman, Henrik

2017-08-01

The aim of this study was to construct a valid publicly available method for in silico fimH subtyping of Escherichia coli particularly suitable for differentiation of fine-resolution subgroups within clonal groups defined by standard multilocus sequence typing (MLST). FimTyper was constructed as a FASTA database containing all currently known fimH alleles. The software source code is publicly available at https://bitbucket.org/genomicepidemiology/fimtyper, the database is freely available at https://bitbucket.org/genomicepidemiology/fimtyper_db, and a service implementing the software is available at https://cge.cbs.dtu.dk/services/FimTyper. FimTyper was validated on three data sets: one containing Sanger sequences of fimH alleles of 42 E. coli isolates generated prior to the current study (data set 1), one containing whole-genome sequence (WGS) data of 243 third-generation-cephalosporin-resistant E. coli isolates (data set 2), and one containing a randomly chosen subset of 40 E. coli isolates from data set 2 that were subjected to conventional fimH subtyping (data set 3). The combination of the three data sets enabled an evaluation and comparison of FimTyper on both Sanger sequences and WGS data. FimTyper correctly predicted all 42 fimH subtypes from the Sanger sequences from data set 1 and successfully analyzed all 243 draft genomes from data set 2. FimTyper subtyping of the Sanger sequences and WGS data from data set 3 were in complete agreement. Additionally, fimH subtyping was evaluated on a phylogenetic network of 122 sequence type 131 (ST131) E. coli isolates. There was perfect concordance between the typology and fimH-based subclones within ST131, with accurate identification of the pandemic multidrug-resistant clonal subgroup ST131-H30. FimTyper provides a standardized tool, as a rapid alternative to conventional fimH subtyping, highly suitable for surveillance and outbreak detection. Copyright © 2017 American Society for Microbiology.

A statistical approach to selecting and confirming validation targets in -omics experiments

PubMed Central

2012-01-01

Background: Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results: Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions: For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145
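To make the idea of validating a list from a small random sample concrete, the Python sketch below draws a random subset of a result list and computes a one-sided lower confidence bound on the proportion of true results. It is not the method of the paper: it substitutes an exact binomial (Clopper-Pearson) bound, treats the list as large relative to the sample, and uses a made-up gene list, sample size, and confirmation count.

```python
import math
import random


def binom_tail_ge(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(k, n + 1))


def lower_bound_true_rate(confirmed, sampled, alpha=0.05):
    """One-sided Clopper-Pearson lower bound on the proportion of true results."""
    if confirmed == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    for _ in range(60):  # bisection on p; the tail probability increases with p
        mid = (lo + hi) / 2.0
        if binom_tail_ge(confirmed, sampled, mid) < alpha:
            lo = mid
        else:
            hi = mid
    return lo


if __name__ == "__main__":
    # Hypothetical list of significant results and validation outcome.
    significant = [f"gene_{i}" for i in range(1000)]
    to_validate = random.Random(1).sample(significant, 20)
    print(to_validate[:5])
    # Suppose all 20 sampled results were confirmed in the lab.
    print(lower_bound_true_rate(confirmed=20, sampled=20))
```

Because the validated subset is chosen at random rather than from the top of the list, the bound applies to the list as a whole, which is the contrast the abstract draws with confirming only the most significant results.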
Bottom-up driven involuntary auditory evoked field change: constant sound sequencing amplifies but does not sharpen neural activity.

PubMed

Okamoto, Hidehiko; Stracke, Henning; Lagemann, Lothar; Pantev, Christo

2010-01-01

The capability of involuntarily tracking certain sound signals during the simultaneous presence of noise is essential in human daily life. Previous studies have demonstrated that top-down auditory focused attention can enhance excitatory and inhibitory neural activity, resulting in sharpening of frequency tuning of auditory neurons. In the present study, we investigated bottom-up driven involuntary neural processing of sound signals in noisy environments by means of magnetoencephalography. We contrasted two sound signal sequencing conditions: "constant sequencing" versus "random sequencing." Based on a pool of 16 different frequencies, either identical (constant sequencing) or pseudorandomly chosen (random sequencing) test frequencies were presented blockwise together with band-eliminated noises to nonattending subjects. The results demonstrated that the auditory evoked fields elicited in the constant sequencing condition were significantly enhanced compared with the random sequencing condition. However, the enhancement was not significantly different between different band-eliminated noise conditions. Thus the present study confirms that by constant sound signal sequencing under nonattentive listening the neural activity in human auditory cortex can be enhanced, but not sharpened. Our results indicate that bottom-up driven involuntary neural processing may mainly amplify excitatory neural networks, but may not effectively enhance inhibitory neural circuits.
Novel application of the MSSCP method in biodiversity studies.

PubMed

Tomczyk-Żak, Karolina; Kaczanowski, Szymon; Górecka, Magdalena; Zielenkiewicz, Urszula

2012-02-01

Analysis of 16S rRNA sequence diversity is widely performed for characterizing the biodiversity of microbial samples. The number of determined sequences has a considerable impact on complete results. Although the cost of mass sequencing is decreasing, it is often still too high for individual projects. We applied the multi-temperature single-strand conformational polymorphism (MSSCP) method to decrease the number of analysed sequences. This was a novel application of this method. As a control, the same sample was analysed using random sequencing. In this paper, we adapted the MSSCP technique for screening of unique sequences of the 16S rRNA gene library and bacterial strains isolated from biofilms growing on the walls of an ancient gold mine in Poland and determined whether the results obtained by both methods differed and whether random sequencing could be replaced by MSSCP. Although it was biased towards the detection of rare sequences in the samples, the qualitative results of MSSCP were not different than those of random sequencing. Unambiguous discrimination of unique clones and strains creates an opportunity to effectively estimate the biodiversity of natural communities, especially in populations which are numerous but species poor. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Definition of Proteasomal Peptide Splicing Rules for High-Efficiency Spliced Peptide Presentation by MHC Class I Molecules

PubMed Central

Berkers, Celia R.; de Jong, Annemieke; Schuurman, Karianne G.; Linnemann, Carsten; Meiring, Hugo D.; Janssen, Lennert; Neefjes, Jacques J.; Schumacher, Ton N. M.; Rodenko, Boris

2015-01-01

Peptide splicing, in which two distant parts of a protein are excised and then ligated to form a novel peptide, can generate unique MHC class I–restricted responses. Because these peptides are not genetically encoded and the rules behind proteasomal splicing are unknown, it is difficult to predict these spliced Ags. In the current study, small libraries of short peptides were used to identify amino acid sequences that affect the efficiency of this transpeptidation process. We observed that splicing does not occur at random, neither in terms of the amino acid sequences nor through random splicing of peptides from different sources. In contrast, splicing followed distinct rules that we deduced and validated both in vitro and in cells. Peptide ligation was quantified using a model peptide and demonstrated to occur with up to 30% ligation efficiency in vitro, provided that optimal structural requirements for ligation were met by both ligating partners. In addition, many splicing products could be formed from a single protein. Our splicing rules will facilitate prediction and detection of new spliced Ags to expand the peptidome presented by MHC class I Ags. PMID:26401003
De novo selection of oncogenes.

PubMed

Chacón, Kelly M; Petti, Lisa M; Scheideman, Elizabeth H; Pirazzoli, Valentina; Politi, Katerina; DiMaio, Daniel

2014-01-07

All cellular proteins are derived from preexisting ones by natural selection. Because of the random nature of this process, many potentially useful protein structures never arose or were discarded during evolution. Here, we used a single round of genetic selection in mouse cells to isolate chemically simple, biologically active transmembrane proteins that do not contain any amino acid sequences from preexisting proteins. We screened a retroviral library expressing hundreds of thousands of proteins consisting of hydrophobic amino acids in random order to isolate four 29-aa proteins that induced focus formation in mouse and human fibroblasts and tumors in mice. These proteins share no amino acid sequences with known cellular or viral proteins, and the simplest of them contains only seven different amino acids. They transformed cells by forming a stable complex with the platelet-derived growth factor β receptor transmembrane domain and causing ligand-independent receptor activation. We term this approach de novo selection and suggest that it can be used to generate structures and activities not observed in nature, create prototypes for novel research reagents and therapeutics, and provide insight into cell biology, transmembrane protein-protein interactions, and possibly virus evolution and the origin of life.
Adaptive Designs for Randomized Trials in Public Health

PubMed Central

Brown, C. Hendricks; Have, Thomas R. Ten; Jo, Booil; Dagne, Getachew; Wyman, Peter A.; Muthén, Bengt; Gibbons, Robert D.

2009-01-01

In this article, we present a discussion of two general ways in which the traditional randomized trial can be modified or adapted in response to the data being collected. We use the term adaptive design to refer to a trial in which characteristics of the study itself, such as the proportion assigned to active intervention versus control, change during the trial in response to data being collected. The term adaptive sequence of trials refers to a decision-making process that fundamentally informs the conceptualization and conduct of each new trial with the results of previous trials. Our discussion below investigates the utility of these two types of adaptations for public health evaluations. Examples are provided to illustrate how adaptation can be used in practice. From these case studies, we discuss whether such evaluations can or should be analyzed as if they were formal randomized trials, and we discuss practical as well as ethical issues arising in the conduct of these new-generation trials. PMID:19296774
Optical image encryption using chaos-based compressed sensing and phase-shifting interference in fractional wavelet domain

NASA Astrophysics Data System (ADS)

Liu, Qi; Wang, Ying; Wang, Jun; Wang, Qiong-Hua

2018-02-01

In this paper, a novel optical image encryption system combining compressed sensing with phase-shifting interference in fractional wavelet domain is proposed. To improve the encryption efficiency, the data volume of the original image is reduced by compressed sensing. Then the compacted image is encoded through double random phase encoding in asymmetric fractional wavelet domain. In the encryption system, three pseudo-random sequences, generated by a three-dimensional chaos map, are used as the measurement matrix of compressed sensing and two random-phase masks in the asymmetric fractional wavelet transform. This not only simplifies key storage and transmission, but also enhances the nonlinearity of our cryptosystem to resist some common attacks. Further, the holograms, which are obtained by two-step-only quadrature phase-shifting interference, make our cryptosystem immune to noise and occlusion attacks. Compression and encryption are achieved simultaneously in the final result. Numerical experiments have verified the security and validity of the proposed algorithm.
Systematic Evaluation of the Dependence of Deoxyribozyme Catalysis on Random Region Length

PubMed Central

Velez, Tania E.; Singh, Jaydeep; Xiao, Ying; Allen, Emily C.; Wong, On Yi; Chandra, Madhavaiah; Kwon, Sarah C.; Silverman, Scott K.

2012-01-01

Functional nucleic acids are DNA and RNA aptamers that bind targets, or they are deoxyribozymes and ribozymes that have catalytic activity. These functional DNA and RNA sequences can be identified from random-sequence pools by in vitro selection, which requires choosing the length of the random region. Shorter random regions allow more complete coverage of sequence space but may not permit the structural complexity necessary for binding or catalysis. In contrast, longer random regions are sampled incompletely but may allow adoption of more complicated structures that enable function. In this study, we systematically examined random region length (N20 through N60) for two particular deoxyribozyme catalytic activities, DNA cleavage and tyrosine-RNA nucleopeptide linkage formation. For both activities, we previously identified deoxyribozymes using only N40 regions. In the case of DNA cleavage, here we found that shorter N20 and N30 regions allowed robust catalytic function, either by DNA hydrolysis or by DNA deglycosylation and strand scission via β-elimination, whereas longer N50 and N60 regions did not lead to catalytically active DNA sequences. Follow-up selections with N20, N30, and N40 regions revealed an interesting interplay of metal ion cofactors and random region length. Separately, for Tyr-RNA linkage formation, N30 and N60 regions provided catalytically active sequences, whereas N20 was unsuccessful, and the N40 deoxyribozymes were functionally superior (in terms of rate and yield) to N30 and N60. Collectively, the results indicate that with future in vitro selection experiments for DNA and RNA catalysts, and by extension for aptamers, random region length should be an important experimental variable. PMID:23088677
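The trade-off between random region length and coverage of sequence space can be illustrated with a short calculation. The sketch below counts the 4^N possible DNA sequences for each region length named in the abstract and gives an upper bound on the fraction a selection pool could contain. The pool size of 1e14 molecules is an illustrative assumption, not a value taken from the study, and the function names are hypothetical.

```python
# Sketch: upper bound on sequence-space coverage for a random region of length n.
# ASSUMPTION: the pool size of 1e14 molecules is illustrative, not from the study.

def sequence_space_size(n: int) -> int:
    """Number of distinct DNA sequences of length n (4 choices per position)."""
    return 4 ** n


def max_coverage(n: int, pool_size: float) -> float:
    """Largest fraction of sequence space a pool could contain,
    assuming every molecule in the pool carries a distinct sequence."""
    return min(1.0, pool_size / sequence_space_size(n))


if __name__ == "__main__":
    pool = 1e14  # hypothetical number of molecules in the starting pool
    for n in (20, 30, 40, 50, 60):
        size = sequence_space_size(n)
        print(f"N{n}: {size:.3e} sequences, coverage <= {max_coverage(n, pool):.3e}")
```

Under this assumed pool size, only the shortest region can in principle be sampled exhaustively; every longer region is sampled sparsely, which is the sense in which the abstract calls longer regions "sampled incompletely".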
Portable and Error-Free DNA-Based Data Storage.

PubMed

Yazdi, S M Hossein Tabatabaei; Gabrys, Ryan; Milenkovic, Olgica

2017-07-10

DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures enable reading and writing but do not offer random-access and error-free data recovery from low-cost, portable devices, which is crucial for making the storage technology competitive with classical recorders. Here we show for the first time that a portable, random-access platform may be implemented in practice using nanopore sequencers. The novelty of our approach is to design an integrated processing pipeline that encodes data to avoid costly synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable sequencing via new iterative alignment and deletion error-correcting codes. Our work represents the only known random access DNA-based data storage system that uses error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density. As such, it represents a crucial step towards practical employment of DNA molecules as storage media.
Ultraaccurate genome sequencing and haplotyping of single human cells.

PubMed

Chu, Wai Keung; Edge, Peter; Lee, Ho Suk; Bansal, Vikas; Bafna, Vineet; Huang, Xiaohua; Zhang, Kun

2017-11-21

Accurate detection of variants and long-range haplotypes in genomes of single human cells remains very challenging. Common approaches require extensive in vitro amplification of genomes of individual cells using DNA polymerases and high-throughput short-read DNA sequencing. These approaches have two notable drawbacks. First, polymerase replication errors could generate tens of thousands of false-positive calls per genome. Second, relatively short sequence reads contain little to no haplotype information. Here we report a method, which is dubbed SISSOR (single-stranded sequencing using microfluidic reactors), for accurate single-cell genome sequencing and haplotyping. A microfluidic processor is used to separate the Watson and Crick strands of the double-stranded chromosomal DNA in a single cell and to randomly partition megabase-size DNA strands into multiple nanoliter compartments for amplification and construction of barcoded libraries for sequencing. The separation and partitioning of large single-stranded DNA fragments of the homologous chromosome pairs allows for the independent sequencing of each of the complementary and homologous strands. This enables the assembly of long haplotypes and reduction of sequence errors by using the redundant sequence information and haplotype-based error removal. We demonstrated the ability to sequence single-cell genomes with error rates as low as 10⁻⁸ and average 500-kb-long DNA fragments that can be assembled into haplotype contigs with N50 greater than 7 Mb. The performance could be further improved with more uniform amplification and more accurate sequence alignment. The ability to obtain accurate genome sequences and haplotype information from single cells will enable applications of genome sequencing for diverse clinical needs. Copyright © 2017 the Author(s). Published by PNAS.
Genetic recombination pathways and their application for genome modification of human embryonic stem cells.

PubMed

Nieminen, Mikko; Tuuri, Timo; Savilahti, Harri

2010-10-01

Human embryonic stem cells are pluripotent cells derived from early human embryo and retain a potential to differentiate into all adult cell types. They provide vast opportunities in cell replacement therapies and are expected to become significant tools in drug discovery as well as in the studies of cellular and developmental functions of human genes. The progress in applying different types of DNA recombination reactions for genome modification in a variety of eukaryotic cell types has provided means to utilize recombination-based strategies also in human embryonic stem cells. Homologous recombination-based methods, particularly those utilizing extended homologous regions and those employing zinc finger nucleases to boost genomic integration, have shown their usefulness in efficient genome modification. Site-specific recombination systems are potent genome modifiers, and they can be used to integrate DNA into loci that contain an appropriate recombination signal sequence, either naturally occurring or suitably pre-engineered. Non-homologous recombination can be used to generate random integrations in genomes relatively effortlessly, albeit with a moderate efficiency and precision. DNA transposition-based strategies offer substantially more efficient random strategies and provide means to generate single-copy insertions, thus potentiating the generation of genome-wide insertion libraries applicable in genetic screens. 2010 Elsevier Inc. All rights reserved.
Generation of a novel artificial TrkB agonist, BM17d99, using T7 phage-displayed random peptide libraries.

PubMed

Ohnishi, Toshiyuki; Sakamoto, Kotaro; Asami-Odaka, Asano; Nakamura, Kimie; Shimizu, Ayako; Ito, Takashi; Asami, Taiji; Ohtaki, Tetsuya; Inooka, Hiroshi

2017-01-29

Tropomyosin receptor kinase B (TrkB) is a known receptor of brain-derived neurotrophic factor (BDNF). Because it plays a critical role in the regulation of neuronal development, maturation, survival, etc., TrkB is a good target for drugs against central nervous system diseases. In this study, we aimed to generate peptidic TrkB agonists by applying random peptide phage display technology. After the phage panning against recombinant Fc-fused TrkB (TrkB-Fc), agonistic phages were directly screened against TrkB-expressing HEK293 cells. Through subsequent screening of the first-hit BM17 peptide-derived focus library, we successfully obtained the BM17d99 peptide, which had no sequence similarity with BDNF but had TrkB-binding capacity. We then synthesized a dimeric BM17d99 analog peptide that could phosphorylate or activate TrkB by facilitating receptor homodimerization. Treatment of TrkB-expressing HEK293 cells with the dimeric BM17d99 analog peptide significantly induced the phosphorylation of TrkB, suggesting that homodimerization of TrkB was enhanced by the dimeric peptide. This report demonstrates that our approach is useful for the generation of artificial peptidic agonists of cell surface receptors. Copyright © 2016 Elsevier Inc. All rights reserved.
HPV integration hijacks and multimerizes a cellular enhancer to generate a viral-cellular super-enhancer that drives high viral oncogene expression

PubMed Central

Redmond, Catherine J.; Dooley, Katharine E.; Fu, Haiqing; Gillison, Maura L.; Akagi, Keiko; Symer, David E.; Aladjem, Mirit I.

2018-01-01

Integration of human papillomavirus (HPV) genomes into cellular chromatin is common in HPV-associated cancers. Integration is random, and each site is unique depending on how and where the virus integrates. We recently showed that tandemly integrated HPV16 could result in the formation of a super-enhancer-like element that drives transcription of the viral oncogenes. Here, we characterize the chromatin landscape and genomic architecture of this integration locus to elucidate the mechanisms that promoted de novo super-enhancer formation. Using next-generation sequencing and molecular combing/fiber-FISH, we show that ~26 copies of HPV16 are integrated into an intergenic region of chromosome 2p23.2, interspersed with 25 kb of amplified, flanking cellular DNA. This interspersed, co-amplified viral-host pattern is frequent in HPV-associated cancers and here we designate it as Type III integration. An abundant viral-cellular fusion transcript encoding the viral E6/E7 oncogenes is expressed from the integration locus and the chromatin encompassing both the viral enhancer and a region in the adjacent amplified cellular sequences is strongly enriched in the super-enhancer markers H3K27ac and Brd4. Notably, the peak in the amplified cellular sequence corresponds to an epithelial-cell-type specific enhancer. Thus, HPV16 integration generated a super-enhancer-like element composed of tandem interspersed copies of the viral upstream regulatory region and a cellular enhancer, to drive high levels of oncogene expression. PMID:29364907
Maximizing lipocalin prediction through balanced and diversified training set and decision fusion.

PubMed

Nath, Abhigyan; Subbiah, Karthikeyan

2015-12-01

Lipocalins are short in sequence length and perform several important biological functions. These proteins have less than 20% sequence similarity among paralogs. Experimentally identifying them is an expensive and time-consuming process. Computational methods based on sequence similarity for allocating putative members to this family also remain elusive because of the low sequence similarity among the members of this family. Consequently, machine learning methods become a viable alternative for their prediction by using the underlying sequence/structurally derived features as the input. Ideally, any machine learning based prediction method must be trained with all possible variations in the input feature vector (all the sub-class input patterns) to achieve perfect learning. Near perfect learning can be achieved by training the model with diverse types of input instances belonging to the different regions of the entire input space. Furthermore, the prediction performance can be improved by balancing the training set, as imbalanced data sets tend to bias prediction towards the majority class and its sub-classes. This paper aims to achieve (i) high generalization ability without any classification bias through diversified and balanced training sets and (ii) enhanced prediction accuracy by combining the results of individual classifiers with an appropriate fusion scheme. Instead of creating the training set randomly, we first used the unsupervised Kmeans clustering algorithm to create diversified clusters of input patterns and created the diversified and balanced training set by selecting an equal number of patterns from each of these clusters.
Finally, a probability based classifier fusion scheme was applied to the boosted random forest algorithm (which produced greater sensitivity) and the K nearest neighbour algorithm (which produced greater specificity) to achieve better predictive performance than that of the individual base classifiers. The performance of the learned models trained on the Kmeans preprocessed training set is far better than that of models trained on randomly generated training sets. The proposed method achieved a sensitivity of 90.6%, specificity of 91.4% and accuracy of 91.0% on the first test set and sensitivity of 92.9%, specificity of 96.2% and accuracy of 94.7% on the second blind test set. These results have established that diversifying the training set improves the performance of predictive models through superior generalization ability and that balancing the training set improves prediction accuracy. For smaller data sets, unsupervised Kmeans based sampling can be a more effective technique for increasing generalization than the usual random splitting method. Copyright © 2015 Elsevier Ltd. All rights reserved.
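The cluster-then-sample step described in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it uses a plain Lloyd's k-means written with the standard library, and the feature vectors, the number of clusters `k`, and the `per_cluster` quota are all hypothetical.

```python
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length tuples; returns one label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def diversified_sample(points, k, per_cluster, seed=0):
    """Return sorted indices of up to per_cluster points drawn from each k-means cluster."""
    rng = random.Random(seed)
    labels = kmeans(points, k, seed=seed)
    chosen = []
    for c in range(k):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        rng.shuffle(idx)
        chosen.extend(idx[:per_cluster])
    return sorted(chosen)


if __name__ == "__main__":
    # Hypothetical 2-D feature vectors forming two well-separated groups.
    data = [(0, 0), (0, 1), (1, 0), (10, 11), (11, 10), (11, 11)]
    print(diversified_sample(data, k=2, per_cluster=2))
```

Drawing the same number of patterns from every cluster is what makes the resulting training set both diversified (every region of the input space is represented) and balanced (no region dominates).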
Long-range correlations and charge transport properties of DNA sequences

NASA Astrophysics Data System (ADS)

Liu, Xiao-liang; Ren, Yi; Xie, Qiong-tao; Deng, Chao-sheng; Xu, Hui

2010-04-01

By using Hurst's analysis and transfer approach, the rescaled range functions and Hurst exponents of human chromosome 22 and enterobacteria phage lambda DNA sequences are investigated and the transmission coefficients, Landauer resistances and Lyapunov coefficients of finite segments based on above genomic DNA sequences are calculated. In a comparison with quasiperiodic and random artificial DNA sequences, we find that λ-DNA exhibits anticorrelation behavior characterized by a Hurst exponent 0.5
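Rescaled range (R/S) analysis, on which the Hurst exponent is based, can be sketched in a few lines. The code below is a generic illustration, not the authors' procedure: the window sizes are arbitrary, and any mapping from nucleotides to numbers (for example purine = +1, pyrimidine = -1) is an assumption that the abstract does not specify.

```python
import math
import random


def rescaled_range(x):
    """R/S of one window: range of the cumulative mean-adjusted sum divided by the standard deviation."""
    n = len(x)
    mean = sum(x) / n
    cum, lo, hi = 0.0, 0.0, 0.0
    for v in x:
        cum += v - mean
        lo, hi = min(lo, cum), max(hi, cum)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    return (hi - lo) / std if std > 0 else float("nan")


def hurst_exponent(series, window_sizes):
    """Least-squares slope of log(mean R/S) against log(window size)."""
    xs, ys = [], []
    for w in window_sizes:
        # Non-overlapping windows of length w.
        vals = [rescaled_range(series[i:i + w]) for i in range(0, len(series) - w + 1, w)]
        vals = [v for v in vals if not math.isnan(v)]
        if vals:
            xs.append(math.log(w))
            ys.append(math.log(sum(vals) / len(vals)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical uncorrelated +1/-1 sequence standing in for a numerically mapped DNA sequence.
    seq = [rng.choice((-1, 1)) for _ in range(4096)]
    print(hurst_exponent(seq, [16, 32, 64, 128, 256, 512]))
```

An exponent near 0.5 indicates an uncorrelated sequence, values above 0.5 indicate persistent long-range correlation, and values below 0.5 indicate anticorrelation. Finite-length estimates from this simple procedure are known to be biased, so the output should be read as indicative only.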
Generation and analysis of a barcode-tagged insertion mutant library in the fission yeast Schizosaccharomyces pombe

PubMed Central

2012-01-01

Background: Barcodes are unique DNA sequence tags that can be used to specifically label individual mutants. The barcode-tagged open reading frame (ORF) haploid deletion mutant collections in the budding yeast Saccharomyces cerevisiae and the fission yeast Schizosaccharomyces pombe allow for high-throughput mutant phenotyping because the relative growth of mutants in a population can be determined by monitoring the proportions of their associated barcodes. While these mutant collections have greatly facilitated genome-wide studies, mutations in essential genes are not present, and the roles of these genes are not as easily studied. To further support genome-scale research in S. pombe, we generated a barcode-tagged fission yeast insertion mutant library that has the potential of generating viable mutations in both essential and non-essential genes and can be easily analyzed using standard molecular biological techniques. Results: An insertion vector containing a selectable ura4+ marker and a random barcode was used to generate a collection of 10,000 fission yeast insertion mutants stored individually in 384-well plates and as six pools of mixed mutants. Individual barcodes are flanked by Sfi I recognition sites and can be oligomerized in a unique orientation to facilitate barcode sequencing. Independent genetic screens on a subset of mutants suggest that this library contains a diverse collection of single insertion mutations. We present several approaches to determine insertion sites. Conclusions: This collection of S. pombe barcode-tagged insertion mutants is well-suited for genome-wide studies. Because insertion mutations may eliminate, reduce or alter the function of essential and non-essential genes, this library will contain strains with a wide range of phenotypes that can be assayed by their associated barcodes. The design of the barcodes in this library allows for barcode sequencing using next generation or standard benchtop cloning approaches. PMID:22554201
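The idea of inferring relative growth from barcode proportions can be sketched as below. This is an illustrative calculation only: the barcode strings, the read lists, and the pseudocount are hypothetical, and the abstract does not describe a specific scoring formula.

```python
import math
from collections import Counter


def barcode_proportions(reads):
    """Fraction of reads carrying each barcode."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {bc: n / total for bc, n in counts.items()}


def log2_fold_change(before, after, pseudo=1e-6):
    """log2 change in each barcode's proportion between two samples of a pooled culture.
    The pseudocount (an assumption) avoids division by zero for barcodes that drop out."""
    p0, p1 = barcode_proportions(before), barcode_proportions(after)
    return {bc: math.log2((p1.get(bc, 0.0) + pseudo) / (p0[bc] + pseudo)) for bc in p0}


if __name__ == "__main__":
    # Hypothetical barcode reads sampled before and after competitive growth.
    before = ["AAA"] * 50 + ["CCC"] * 50
    after = ["AAA"] * 75 + ["CCC"] * 25
    print(log2_fold_change(before, after))
```

A positive value means the mutant carrying that barcode increased its share of the pool, and a negative value means it was outcompeted.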
Transposable element islands facilitate adaptation to novel environments in an invasive species

PubMed Central

Schrader, Lukas; Kim, Jay W.; Ence, Daniel; Zimin, Aleksey; Klein, Antonia; Wyschetzki, Katharina; Weichselgartner, Tobias; Kemena, Carsten; Stökl, Johannes; Schultner, Eva; Wurm, Yannick; Smith, Christopher D.; Yandell, Mark; Heinze, Jürgen; Gadau, Jürgen; Oettler, Jan

2014-01-01

Adaptation requires genetic variation, but founder populations are generally genetically depleted. Here we sequence two populations of an inbred ant that diverge in phenotype to determine how variability is generated. Cardiocondyla obscurior has the smallest of the sequenced ant genomes and its structure suggests a fundamental role of transposable elements (TEs) in adaptive evolution. Accumulations of TEs (TE islands) comprising 7.18% of the genome evolve faster than other regions with regard to single-nucleotide variants, gene/exon duplications and deletions and gene homology. A non-random distribution of gene families, larvae/adult specific gene expression and signs of differential methylation in TE islands indicate intragenomic differences in regulation, evolutionary rates and coalescent effective population size. Our study reveals a tripartite interplay between TEs, life history and adaptation in an invasive species. PMID:25510865
A general strategy for cloning viroids and other small circular RNAs that uses minimal amounts of template and does not require prior knowledge of its sequence.

PubMed

Navarro, B; Daròs, J A; Flores, R

1996-01-01

Two PCR-based methods are described for obtaining clones of small circular RNAs of unknown sequence and for which only minute amounts are available. To avoid introducing any assumption about the RNA sequence, synthesis of the cDNAs is initiated with random primers. The cDNA population is then PCR-amplified using a primer whose sequence is present at both sides of the cDNAs, since they have been obtained with random hexamers and then a linker with the sequence of the PCR primer has been ligated to their termini, or because the cDNAs have been synthesized with an oligonucleotide that contains the sequence of the PCR primer at its 5' end and six randomized positions at its 3' end. The procedures need only approximately 50 ng of purified RNA template. The reasons for the emergence of cloning artifacts and precautions to avoid them are discussed.
Highly sensitive detection of mutations in CHO cell recombinant DNA using multi-parallel single molecule real-time DNA sequencing.

PubMed

Cartwright, Joseph F; Anderson, Karin; Longworth, Joseph; Lobb, Philip; James, David C

2018-06-01

High-fidelity replication of biologic-encoding recombinant DNA sequences by engineered mammalian cell cultures is an essential pre-requisite for the development of stable cell lines for the production of biotherapeutics. However, immortalized mammalian cells characteristically exhibit an increased point mutation frequency compared to mammalian cells in vivo, both across their genomes and at specific loci (hotspots). Thus unforeseen mutations in recombinant DNA sequences can arise and be maintained within producer cell populations. These may affect both the stability of recombinant gene expression and give rise to protein sequence variants with variable bioactivity and immunogenicity. Rigorous quantitative assessment of recombinant DNA integrity should therefore form part of the cell line development process and be an essential quality assurance metric for instances where synthetic/multi-component assemblies are utilized to engineer mammalian cells, such as the assessment of recombinant DNA fidelity or the mutability of single-site integration target loci. Based on Pacific Biosciences (Menlo Park, CA) single molecule real-time (SMRT™) circular consensus sequencing (CCS) technology we developed a rDNA sequence analysis tool to process the multi-parallel sequencing of ∼40,000 single recombinant DNA molecules. After statistical filtering of raw sequencing data, we show that this analytical method is capable of detecting single point mutations in rDNA to a minimum single mutation frequency of 0.0042% (<1/24,000 bases). Using a stable CHO transfectant pool harboring a randomly integrated 5 kb plasmid construct encoding GFP we found that 28% of recombinant plasmid copies contained at least one low frequency (<0.3%) point mutation. These mutations were predominantly found in GC base pairs (85%), and there was no positional bias in mutation across the plasmid sequence. There was no discernable difference between the mutation frequencies of coding and non-coding DNA. 
The putative ratio of non-synonymous and synonymous changes within the open reading frames (ORFs) in the plasmid sequence indicates that natural selection does not impact upon the prevalence of these mutations. Here we have demonstrated the abundance of mutations that fall outside of the reported range of detection of next generation sequencing (NGS) and second generation sequencing (SGS) platforms, providing a methodology capable of being utilized in cell line development platforms to identify the fidelity of recombinant genes throughout the production process. © 2018 Wiley Periodicals, Inc.
Weight distributions for turbo codes using random and nonrandom permutations

NASA Technical Reports Server (NTRS)

Dolinar, S.; Divsalar, D.

1995-01-01

This article takes a preliminary look at the weight distributions achievable for turbo codes using random, nonrandom, and semirandom permutations. Due to the recursiveness of the encoders, it is important to distinguish between self-terminating and non-self-terminating input sequences. The non-self-terminating sequences have little effect on decoder performance, because they accumulate high encoded weight until they are artificially terminated at the end of the block. From probabilistic arguments based on selecting the permutations randomly, it is concluded that the self-terminating weight-2 data sequences are the most important consideration in the design of constituent codes; higher-weight self-terminating sequences have successively decreasing importance. Also, increasing the number of codes and, correspondingly, the number of permutations makes it more and more likely that the bad input sequences will be broken up by one or more of the permuters. It is possible to design nonrandom permutations that ensure that the minimum distance due to weight-2 input sequences grows roughly as the square root of (2N), where N is the block length. However, these nonrandom permutations amplify the bad effects of higher-weight inputs, and as a result they are inferior in performance to randomly selected permutations. But there are 'semirandom' permutations that perform nearly as well as the designed nonrandom permutations with respect to weight-2 input sequences and are not as susceptible to being foiled by higher-weight inputs.
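One widely used way to build a semirandom permutation is the S-random construction, in which a randomly drawn permutation is constrained so that inputs close together are mapped far apart. The sketch below assumes that construction; the abstract itself does not say how its semirandom permutations are generated, and the block length, spread `s`, and restart limit are hypothetical values.

```python
import random


def s_random_permutation(n, s, seed=0, max_restarts=1000):
    """Draw a permutation of range(n) such that any two entries at most s positions
    apart differ in value by more than s. Greedy selection with random restarts."""
    rng = random.Random(seed)
    for _ in range(max_restarts):
        remaining = list(range(n))
        rng.shuffle(remaining)
        perm = []
        while remaining:
            recent = perm[-s:] if s > 0 else []
            # Take the first remaining candidate far enough from the last s picks.
            pick = next((c for c in remaining if all(abs(c - r) > s for r in recent)), None)
            if pick is None:
                break  # dead end: restart with a fresh shuffle
            perm.append(pick)
            remaining.remove(pick)
        if len(perm) == n:
            return perm
    raise RuntimeError("no S-random permutation found; try a smaller s")


def has_spread(perm, s):
    """Check the spreading property for every pair of positions at most s apart."""
    return all(
        abs(perm[i] - perm[j]) > s
        for i in range(len(perm))
        for j in range(i + 1, min(i + s + 1, len(perm)))
    )


if __name__ == "__main__":
    p = s_random_permutation(64, 3)
    print(p, has_spread(p, 3))
```

The spreading constraint targets the low-weight input patterns whose nonzero bits sit close together, while the random draw keeps the permutation from having the regular structure that higher-weight inputs could exploit. The search tends to succeed quickly only when `s` is small relative to the square root of the block length.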

Engineering RNA phage MS2 virus-like particles for peptide display

NASA Astrophysics Data System (ADS)

Jordan, Sheldon Keith

Phage display is a powerful and versatile technology that enables the selection of novel binding functions from large populations of randomly generated peptide sequences. Random sequences are genetically fused to a viral structural protein to produce complex peptide libraries. From a sufficiently complex library, phage bearing peptides with practically any desired binding activity can be physically isolated by affinity selection, and, since each particle carries in its genome the genetic information for its own replication, the selectants can be amplified by infection of bacteria. For certain applications however, existing phage display platforms have limitations. One such area is in the field of vaccine development, where the goal is to identify relevant epitopes by affinity-selection against an antibody target, and then to utilize them as immunogens to elicit a desired antibody response. Today, affinity selection is usually conducted using display on filamentous phages like M13. This technology provides an efficient means for epitope identification, but, because filamentous phages do not display peptides in the high-density, multivalent arrays the immune system prefers to recognize, they generally make poor immunogens and are typically useless as vaccines. This makes it necessary to confer immunogenicity by conjugating synthetic versions of the peptides to more immunogenic carriers. Unfortunately, when introduced into these new structural environments, the epitopes often fail to elicit relevant antibody responses. Thus, it would be advantageous to combine the epitope selection and immunogen functions into a single platform where the structural constraints present during affinity selection can be preserved during immunization. This dissertation describes efforts to develop a peptide display system based on the virus-like particles (VLPs) of bacteriophage MS2. 
Phage display technologies rely on (1) the identification of a site in a viral structural protein that is present on the surface of the virus particle and can accept foreign sequence insertions without disruption of protein folding and viral particle assembly, and (2) on the encapsidation of nucleic acid sequences encoding both the VLP and the peptide it displays. The experiments described here are aimed at satisfying the first of these two requirements by engineering efficient peptide display at two different sites in MS2 coat protein. First, we evaluated the suitability of the N-terminus of MS2 coat for peptide insertions. It was observed that random N-terminal 10-mer fusions generally disrupted protein folding and VLP assembly, but by bracketing the foreign sequences with certain specific dipeptides, these defects could be suppressed. Next, the suitability of a coat protein surface loop for foreign sequence insertion was tested. Specifically, random sequence peptides were inserted into the N-terminal-most AB-loop of a coat protein single-chain dimer. Again we found that efficient display required the presence of appropriate dipeptides bracketing the peptide insertion. Finally, it was shown that an N-terminal fusion that tended to interfere specifically with capsid assembly could be efficiently incorporated into mosaic particles when co-expressed with wild-type coat protein.
Free Vibration of Uncertain Unsymmetrically Laminated Beams

NASA Technical Reports Server (NTRS)

Kapania, Rakesh K.; Goyal, Vijay K.

2001-01-01

Monte Carlo Simulation and Stochastic FEA are used to predict randomness in the free vibration response of thin unsymmetrically laminated beams. For the present study, it is assumed that randomness in the response is only caused by uncertainties in the ply orientations. The ply orientations may become random or uncertain during the manufacturing process. A new 16-dof beam element, based on the first-order shear deformation beam theory, is used to study the stochastic nature of the natural frequencies. Using variational principles, the element stiffness matrix and mass matrix are obtained through analytical integration. Using a random sequence, a large data set containing possible random ply orientations is generated. This data is assumed to be symmetric. The stochastic-based finite element model for free vibrations predicts the relation between the randomness in fundamental natural frequencies and the randomness in ply orientation. The sensitivity derivatives are calculated numerically through an exact formulation. The squared fundamental natural frequencies are expressed in terms of deterministic and probabilistic quantities, which makes it possible to determine how sensitive they are to variations in ply angles. The predicted mean-valued fundamental natural frequency squared and the variance of the present model are in good agreement with Monte Carlo Simulation. Results also show that variations between plus or minus 5 degrees in ply angles can affect the free vibration response of unsymmetrically and symmetrically laminated beams.
Variable speed wind turbine generator with zero-sequence filter

DOEpatents

Muljadi, Eduard

1998-01-01

A variable speed wind turbine generator system to convert mechanical power into electrical power or energy and to recover the electrical power or energy in the form of three phase alternating current and return the power or energy to a utility or other load with single phase sinusoidal waveform at sixty (60) hertz and unity power factor includes an excitation controller for generating three phase commanded current, a generator, and a zero sequence filter. Each commanded current signal includes two components: a positive sequence variable frequency current signal to provide the balanced three phase excitation currents required in the stator windings of the generator to generate the rotating magnetic field needed to recover an optimum level of real power from the generator; and a zero frequency sixty (60) hertz current signal to allow the real power generated by the generator to be supplied to the utility. The positive sequence current signals are balanced three phase signals and are prevented from entering the utility by the zero sequence filter. The zero sequence current signals have zero phase displacement from each other and are prevented from entering the generator by the star connected stator windings. The zero sequence filter allows the zero sequence current signals to pass through to deliver power to the utility.
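The split between the balanced excitation currents and the in-phase zero sequence currents can be illustrated numerically. The following is a minimal time-domain sketch, not the patented controller: the function names, the 23 Hz excitation frequency, and the amplitudes are illustrative assumptions. It relies only on the standard fact that three balanced currents displaced by 120 degrees sum to zero at every instant, so averaging the three phase currents isolates the common (zero sequence) component.

```python
import math

def commanded_currents(t, f_exc, i_exc, i_zero, f_zero=60.0):
    """Return the three phase currents at time t (hypothetical helper).

    Each phase carries a balanced variable-frequency excitation current,
    displaced by 120 degrees per phase, plus an identical 60 Hz component
    that has zero phase displacement between phases.
    """
    common = i_zero * math.cos(2 * math.pi * f_zero * t)
    phases = []
    for k in range(3):
        shift = k * 2 * math.pi / 3
        excitation = i_exc * math.cos(2 * math.pi * f_exc * t - shift)
        phases.append(excitation + common)
    return phases

def zero_sequence(currents):
    """Average of the three phase currents; the balanced part cancels."""
    return sum(currents) / 3.0

# Illustrative values only: 23 Hz excitation, amplitudes 1.0 and 0.5.
t = 0.0137
i_abc = commanded_currents(t, f_exc=23.0, i_exc=1.0, i_zero=0.5)
print(zero_sequence(i_abc), 0.5 * math.cos(2 * math.pi * 60.0 * t))
```

The two printed values agree to within rounding, which is the property the abstract attributes to the star connected windings on one side and the zero sequence filter on the other.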
Variable Speed Wind Turbine Generator with Zero-sequence Filter

DOEpatents

Muljadi, Eduard

1998-08-25

A variable speed wind turbine generator system to convert mechanical power into electrical power or energy and to recover the electrical power or energy in the form of three phase alternating current and return the power or energy to a utility or other load with single phase sinusoidal waveform at sixty (60) hertz and unity power factor includes an excitation controller for generating three phase commanded current, a generator, and a zero sequence filter. Each commanded current signal includes two components: a positive sequence variable frequency current signal to provide the balanced three phase excitation currents required in the stator windings of the generator to generate the rotating magnetic field needed to recover an optimum level of real power from the generator; and a zero frequency sixty (60) hertz current signal to allow the real power generated by the generator to be supplied to the utility. The positive sequence current signals are balanced three phase signals and are prevented from entering the utility by the zero sequence filter. The zero sequence current signals have zero phase displacement from each other and are prevented from entering the generator by the star connected stator windings. The zero sequence filter allows the zero sequence current signals to pass through to deliver power to the utility.
Variable speed wind turbine generator with zero-sequence filter

DOEpatents

Muljadi, E.

1998-08-25

A variable speed wind turbine generator system to convert mechanical power into electrical power or energy and to recover the electrical power or energy in the form of three phase alternating current and return the power or energy to a utility or other load with single phase sinusoidal waveform at sixty (60) hertz and unity power factor includes an excitation controller for generating three phase commanded current, a generator, and a zero sequence filter. Each commanded current signal includes two components: a positive sequence variable frequency current signal to provide the balanced three phase excitation currents required in the stator windings of the generator to generate the rotating magnetic field needed to recover an optimum level of real power from the generator; and a zero frequency sixty (60) hertz current signal to allow the real power generated by the generator to be supplied to the utility. The positive sequence current signals are balanced three phase signals and are prevented from entering the utility by the zero sequence filter. The zero sequence current signals have zero phase displacement from each other and are prevented from entering the generator by the star connected stator windings. The zero sequence filter allows the zero sequence current signals to pass through to deliver power to the utility. 14 figs.
Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

PubMed Central

Matochko, Wadim L.; Derda, Ratmir

2013-01-01

Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing can be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
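The operator picture above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' code: diagonal matrices are stored as plain lists of their diagonal entries, the sampling operator is modeled as a weighted draw of reads, and applying the censorship matrix before sampling is an assumed ordering. All names and numbers are hypothetical.

```python
import random

def apply_diagonal(diag, n):
    """Multiply a diagonal matrix (given by its diagonal) by the vector n."""
    return [d * x for d, x in zip(diag, n)]

def sampling_diagonal(n, reads, rng):
    """Diagonal of a random sampling operator (assumed model).

    Draws `reads` clones with probability proportional to copy number and
    returns, for each sequence, the fraction of its copies that was sampled.
    """
    picks = rng.choices(range(len(n)), weights=n, k=reads)
    counts = [0] * len(n)
    for i in picks:
        counts[i] += 1
    return [c / x if x else 0.0 for c, x in zip(counts, n)]

rng = random.Random(0)
n = [100, 50, 10, 0, 40]            # toy library with N = 5 possible sequences
cen = [1.0, 1.0, 0.0, 1.0, 0.5]     # hypothetical censorship: third clone eliminated
censored = apply_diagonal(cen, n)
sa = sampling_diagonal(censored, reads=1000, rng=rng)
observed = apply_diagonal(sa, censored)
print(observed)
```

With `cen` set to all ones the example reduces to unbiased sequencing, where only the stochastic sampling separates the observed counts from the library.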
Method For Determining And Modifying Protein/Peptide Solubility

DOEpatents

Waldo, Geoffrey S.

2005-03-15

A solubility reporter for measuring a protein's solubility in vivo or in vitro is described. The reporter, which can be used in a single living cell, gives a specific signal suitable for determining whether the cell bears a soluble version of the protein of interest. A pool of random mutants of an arbitrary protein, generated using error-prone in vitro recombination, may also be screened for more soluble versions using the reporter, and these versions may be recombined to yield variants having further-enhanced solubility. The method of the present invention includes "irrational" (random mutagenesis) methods, which do not require a priori knowledge of the three-dimensional structure of the protein of interest. Multiple sequences of mutation/genetic recombination and selection for improved solubility are demonstrated to yield versions of the protein which display enhanced solubility.
Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

NASA Technical Reports Server (NTRS)

Wallace, G. R.; Weathers, G. D.; Graf, E. R.

1973-01-01

The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
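A hybrid-sum sequence as described here is the bitwise modulo-two sum of several maximum-length sequences. The sketch below is an illustration only: it generates two maximum-length sequences from the primitive polynomials x^3 + x + 1 and x^5 + x^2 + 1 (chosen for this example, not taken from the report) and XORs them. The filtering step and the statistical quality factor are not reproduced.

```python
from itertools import islice

def lfsr(taps, state):
    """Yield bits of the linear recurrence s[n+k] = XOR of s[n+t] for t in taps.

    `state` holds the first k bits. With taps taken from a primitive
    polynomial and a nonzero state, the output is a maximum-length sequence.
    """
    state = list(state)
    while True:
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        yield state[0]
        state = state[1:] + [new_bit]

def hybrid_sum(*generators):
    """Yield the modulo-two sum of the bits produced by several generators."""
    for bits in zip(*generators):
        total = 0
        for b in bits:
            total ^= b
        yield total

# x^3 + x + 1   -> s[n+3] = s[n] ^ s[n+1], period 7
# x^5 + x^2 + 1 -> s[n+5] = s[n] ^ s[n+2], period 31
m3 = list(islice(lfsr((0, 1), [1, 0, 0]), 7))
m5 = list(islice(lfsr((0, 2), [1, 0, 0, 0, 0]), 31))
hybrid = list(islice(
    hybrid_sum(lfsr((0, 1), [1, 0, 0]), lfsr((0, 2), [1, 0, 0, 0, 0])), 434))
print(m3, sum(m5), hybrid[:217] == hybrid[217:])
```

Because the component periods 7 and 31 are coprime, the summed sequence repeats after 7 × 31 = 217 bits, which is what the final comparison checks.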
Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences.

PubMed

Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter

2014-01-13

Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters used to select potential miRNA candidates for experimental verification.
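The P-value calculation implied by this abstract can be sketched as follows. This is a simplified illustration, not the published tool: the precomputed table is reduced to a single GC-fraction axis (the paper interpolates over nucleotide composition), and every number in it is made up for the example.

```python
from statistics import NormalDist

# Hypothetical precomputed parameters: GC fraction -> (mean MFE, sd), kcal/mol.
PRECOMPUTED = {0.4: (-28.0, 4.0), 0.6: (-36.0, 5.0)}

def interpolate_params(gc, table):
    """Linearly interpolate (mean, sd) between the two bracketing grid points."""
    keys = sorted(table)
    if not keys[0] <= gc <= keys[-1]:
        raise ValueError("composition outside the precomputed grid")
    for lo, hi in zip(keys, keys[1:]):
        if lo <= gc <= hi:
            w = (gc - lo) / (hi - lo)
            (m0, s0), (m1, s1) = table[lo], table[hi]
            return m0 + w * (m1 - m0), s0 + w * (s1 - s0)

def mfe_p_value(candidate_mfe, gc, table=PRECOMPUTED):
    """Probability that a randomized sequence folds at least as stably."""
    mean, sd = interpolate_params(gc, table)
    return NormalDist(mu=mean, sigma=sd).cdf(candidate_mfe)

# A candidate with MFE -45.5 at GC 0.5 lies three standard deviations below
# the interpolated mean of -32.0 (sd 4.5), so p is about 0.00135.
p = mfe_p_value(-45.5, 0.5)
print(p)
```

The lookup replaces thousands of folding runs per candidate with one interpolation and one normal CDF evaluation, which is the source of the speedup the abstract describes.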
Autogen Version 2.0

NASA Technical Reports Server (NTRS)

Gladden, Roy

2007-01-01

Version 2.0 of the autogen software has been released. "Autogen" (automated sequence generation) signifies both a process and software used to implement the process of automated generation of sequences of commands in a standard format for uplink to spacecraft. Autogen requires fewer workers than are needed for older manual sequence-generation processes and reduces sequence-generation times from weeks to minutes.
Periodic, On-Demand, and User-Specified Information Reconciliation

NASA Technical Reports Server (NTRS)

Kolano, Paul

2007-01-01

Automated sequence generation (autogen) signifies both a process and software used to automatically generate sequences of commands to operate various spacecraft. Autogen requires fewer workers than are needed for older manual sequence-generation processes and reduces sequence-generation times from weeks to minutes. The autogen software comprises the autogen script plus the Activity Plan Generator (APGEN) program. APGEN can be used for planning missions and command sequences. APGEN includes a graphical user interface that facilitates scheduling of activities on a time line and affords a capability to automatically expand, decompose, and schedule activities.
A global sampling approach to designing and reengineering RNA secondary structures.

PubMed

Levin, Alex; Lis, Mieszko; Ponty, Yann; O'Donnell, Charles W; Devadas, Srinivas; Berger, Bonnie; Waldispühl, Jérôme

2012-11-01

The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign.
Genome-wide analysis of Tol2 transposon reintegration in zebrafish.

PubMed

Kondrychyn, Igor; Garcia-Lecea, Marta; Emelyanov, Alexander; Parinov, Sergey; Korzh, Vladimir

2009-09-08

Tol2, a member of the hAT family of transposons, has become a useful tool for genetic manipulation of model animals, but information about its interactions with vertebrate genomes is still limited. Furthermore, published reports on Tol2 have mainly been based on random integration of the transposon system after co-injection of a plasmid DNA harboring the transposon and a transposase mRNA. It is important to understand how Tol2 would behave upon activation after integration into the genome. We performed a large-scale enhancer trap (ET) screen and generated 338 insertions of the Tol2 transposon-based ET cassette into the zebrafish genome. These insertions were generated by remobilizing the transposon from two different donor sites in two transgenic lines. We found that 39% of Tol2 insertions occurred in transcription units, mostly into introns. Analysis of the transposon target sites revealed no strict specificity at the DNA sequence level. However, Tol2 was prone to target AT-rich regions with weak palindromic consensus sequences centered at the insertion site. Our systematic analysis of sequential remobilizations of the Tol2 transposon from two independent sites within a vertebrate genome has revealed properties such as a tendency to integrate into transcription units and into AT-rich palindrome-like sequences. This information will influence the development of various applications involving DNA transposons and Tol2 in particular.
A global sampling approach to designing and reengineering RNA secondary structures

PubMed Central

Levin, Alex; Lis, Mieszko; Ponty, Yann; O’Donnell, Charles W.; Devadas, Srinivas; Berger, Bonnie; Waldispühl, Jérôme

2012-01-01

The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign. PMID:22941632
Living laboratory: whole-genome sequencing as a learning healthcare enterprise.

PubMed

Angrist, M; Jamal, L

2015-04-01

With the proliferation of affordable large-scale human genomic data come profound and vexing questions about management of such data and their clinical uncertainty. These issues challenge the view that genomic research on human beings can (or should) be fully segregated from clinical genomics, either conceptually or practically. Here, we argue that the sharp distinction between clinical care and research is especially problematic in the context of large-scale genomic sequencing of people with suspected genetic conditions. Core goals of both enterprises (e.g. understanding genotype-phenotype relationships; generating an evidence base for genomic medicine) are more likely to be realized at a population scale if both those ordering and those undergoing sequencing for diagnostic reasons are routinely and longitudinally studied. Rather than relying on expensive and lengthy randomized clinical trials and meta-analyses, we propose leveraging nascent clinical-research hybrid frameworks into a broader, more permanent instantiation of exploratory medical sequencing. Such an investment could enlighten stakeholders about the real-life challenges posed by whole-genome sequencing, such as establishing the clinical actionability of genetic variants, returning 'off-target' results to families, developing effective service delivery models and monitoring long-term outcomes. © 2014 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
Deep nirS amplicon sequencing of San Francisco Bay sediments enables prediction of geography and environmental conditions from denitrifying community composition.

PubMed

Lee, Jessica A; Francis, Christopher A

2017-12-01

Denitrification is a dominant nitrogen loss process in the sediments of San Francisco Bay. In this study, we sought to understand the ecology of denitrifying bacteria by using next-generation sequencing (NGS) to survey the diversity of a denitrification functional gene, nirS (encoding cytochrome-cd1 nitrite reductase), along the salinity gradient of San Francisco Bay over the course of a year. We compared our dataset to a library of nirS sequences obtained previously from the same samples by standard PCR cloning and Sanger sequencing, and showed that both methods similarly demonstrated geography, salinity and, to a lesser extent, nitrogen, to be strong determinants of community composition. Furthermore, the depth afforded by NGS enabled novel techniques for measuring the association between environment and community composition. We used Random Forests modelling to demonstrate that the site and salinity of a sample could be predicted from its nirS sequences, and to identify indicator taxa associated with those environmental characteristics. This work contributes significantly to our understanding of the distribution and dynamics of denitrifying communities in San Francisco Bay, and provides valuable tools for the further study of this key N-cycling guild in all estuarine systems. © 2017 Society for Applied Microbiology and John Wiley & Sons Ltd.
An In-Depth Analysis of the Chung-Lu Model

DOE Office of Scientific and Technical Information (OSTI.GOV)

Winlaw, M.; DeSterck, H.; Sanders, G.

2015-10-28

In the classic Erdős-Rényi random graph model [5] each edge is chosen with uniform probability and the degree distribution is binomial, limiting the number of graphs that can be modeled using the Erdős-Rényi framework [10]. The Chung-Lu model [1, 2, 3] is an extension of the Erdős-Rényi model that allows for more general degree distributions. The probability of each edge is no longer uniform and is a function of a user-supplied degree sequence, which by design is the expected degree sequence of the model. This property makes it an easy model to work with theoretically and since the Chung-Lu model is a special case of a random graph model with a given degree sequence, many of its properties are well known and have been studied extensively [2, 3, 13, 8, 9]. It is also an attractive null model for many real-world networks, particularly those with power-law degree distributions and it is sometimes used as a benchmark for comparison with other graph generators despite some of its limitations [12, 11]. We know for example, that the average clustering coefficient is too low relative to most real world networks. As well, measures of affinity are also too low relative to most real-world networks of interest. However, despite these limitations or perhaps because of them, the Chung-Lu model provides a basis for comparing new graph models.
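A common way to realize the Chung-Lu model is to include each edge (i, j) independently with probability w_i w_j / S, where w is the supplied degree sequence and S is its sum. The sketch below assumes that formulation and, as a simplification, omits self-loops, so the expected degree of node i is w_i (S - w_i) / S rather than exactly w_i. The function name and the example degree sequence are illustrative.

```python
import random

def chung_lu(weights, rng):
    """Sample an undirected graph with edge probability w_i * w_j / sum(w).

    Self-loops are omitted in this sketch, which slightly lowers the expected
    degrees. Probabilities are capped at 1 for safety.
    """
    total = sum(weights)
    edges = []
    for i in range(len(weights)):
        for j in range(i + 1, len(weights)):
            p = min(1.0, weights[i] * weights[j] / total)
            if rng.random() < p:
                edges.append((i, j))
    return edges

def degrees(edges, n):
    """Count the degree of each of the n nodes from an edge list."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    return deg

weights = [3, 3, 2, 2, 2, 2, 1, 1, 1, 1]   # hypothetical target degrees
edges = chung_lu(weights, random.Random(0))
print(degrees(edges, len(weights)))
```

Any single sample deviates from the target sequence; only the average over many samples approaches it, which is the sense in which the supplied sequence is the expected degree sequence.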
Volume calculation of CT lung lesions based on Halton low-discrepancy sequences

NASA Astrophysics Data System (ADS)

Li, Shusheng; Wang, Liansheng; Li, Shuo

2017-03-01

Volume calculation from Computed Tomography (CT) lung lesion data yields a significant parameter for clinical diagnosis. The volume is widely used to assess the severity of lung nodules and track their progression; however, previous methods have not achieved the accuracy and efficiency needed for clinical use. The task remains challenging because lesions attach tightly to the lung wall, background noise is inhomogeneous, and lesions vary widely in size and shape. In this paper, we employ Halton low-discrepancy sequences to calculate the volume of lung lesions. The proposed method directly computes the volume without three-dimensional (3D) model reconstruction and surface triangulation, which significantly improves efficiency and reduces complexity. The main steps of the proposed method are: (1) generate a certain number of random points in each slice using Halton low-discrepancy sequences and calculate the lesion area of each slice through the proportion; (2) obtain the volume by integrating the areas in the sagittal direction. To evaluate the proposed method, experiments were conducted on sufficient data sets covering lung lesions of different sizes. With the uniform distribution of random points, the proposed method achieves more accurate results than other methods, which demonstrates its robustness and accuracy for the volume calculation of CT lung lesions. In addition, the proposed method is easy to follow and can be applied to other tasks, e.g., volume calculation of liver tumors, atrial wall aneurysms, etc.
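The two steps listed in this abstract can be sketched with a standard Halton generator (radical inverse in bases 2 and 3). This is a minimal illustration under assumptions: each slice is represented by a hypothetical `inside(x, y)` predicate rather than a segmented CT image, and the volume is taken as the sum of slice areas times a uniform slice thickness.

```python
import math

def halton(index, base):
    """Radical inverse of `index` in the given base (one Halton coordinate)."""
    result, fraction = 0.0, 1.0
    while index > 0:
        fraction /= base
        result += fraction * (index % base)
        index //= base
    return result

def estimate_area(inside, width, height, n_points):
    """Area of a region as (fraction of Halton points inside) * box area."""
    hits = 0
    for i in range(1, n_points + 1):
        x = halton(i, 2) * width
        y = halton(i, 3) * height
        if inside(x, y):
            hits += 1
    return width * height * hits / n_points

def estimate_volume(slices, width, height, thickness, n_points=4000):
    """Sum the per-slice areas and multiply by the slice thickness."""
    return thickness * sum(
        estimate_area(s, width, height, n_points) for s in slices)

# Toy lesion: ten identical circular cross-sections of radius 1 in a 2 x 2 box.
disc = lambda x, y: (x - 1.0) ** 2 + (y - 1.0) ** 2 <= 1.0
area = estimate_area(disc, 2.0, 2.0, 4000)
volume = estimate_volume([disc] * 10, 2.0, 2.0, thickness=0.5)
print(area, math.pi, volume)
```

The printed area can be compared with pi, the exact area of the toy cross-section; no surface mesh or 3D reconstruction is needed at any point.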
Self-correcting random number generator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Humble, Travis S.; Pooser, Raphael C.

2016-09-06

A system and method for generating random numbers. The system may include a random number generator (RNG), such as a quantum random number generator (QRNG) configured to self-correct or adapt in order to substantially achieve randomness from the output of the RNG. By adapting, the RNG may generate a random number that may be considered random regardless of whether the random number itself is tested as such. As an example, the RNG may include components to monitor one or more characteristics of the RNG during operation, and may use the monitored characteristics as a basis for adapting, or self-correcting, to provide a random number according to one or more performance criteria.
Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing reveals extensive transcription in the human genome.

PubMed

Wu, Jia Qian; Du, Jiang; Rozowsky, Joel; Zhang, Zhengdong; Urban, Alexander E; Euskirchen, Ghia; Weissman, Sherman; Gerstein, Mark; Snyder, Michael

2008-01-03

Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced. We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but primarily we focused on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many of the antisense transcripts (but not all) may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins. We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.

The Effect of Practice Schedule on Context-Dependent Learning.

PubMed

Lee, Ya-Yun; Fisher, Beth E

2018-03-02

It is well established that random practice compared to blocked practice enhances motor learning. Additionally, while information in the environment may be incidental, learning is also enhanced when an individual performs a task within the same environmental context in which the task was originally practiced. This study aimed to disentangle the effects of practice schedule and incidental/environmental context on motor learning. Participants practiced three finger sequences under either a random or blocked practice schedule. Each sequence was associated with specific incidental context (i.e., color and location on the computer screen) during practice. The participants were tested under the conditions when the sequence-context associations remained the same or were changed from that of practice. When the sequence-context association was changed, the participants who practiced under blocked schedule demonstrated greater performance decrement than those who practiced under random schedule. The findings suggested that those participants who practiced under random schedule were more resistant to the change of environmental context.
Molecular analysis of the microbial diversity present in the colonic wall, colonic lumen, and cecal lumen of a pig.

PubMed

Pryde, S E; Richardson, A J; Stewart, C S; Flint, H J

1999-12-01

Random clones of 16S ribosomal DNA gene sequences were isolated after PCR amplification with eubacterial primers from total genomic DNA recovered from samples of the colonic lumen, colonic wall, and cecal lumen from a pig. Sequences were also obtained for cultures isolated anaerobically from the same colonic-wall sample. Phylogenetic analysis showed that many sequences were related to those of Lactobacillus or Streptococcus spp. or fell into clusters IX, XIVa, and XI of gram-positive bacteria. In addition, 59% of randomly cloned sequences showed less than 95% similarity to database entries or sequences from cultivated organisms. Cultivation bias is also suggested by the fact that the majority of isolates (54%) recovered from the colon wall by culturing were related to Lactobacillus and Streptococcus, whereas this group accounted for only one-third of the sequence variation for the same sample from random cloning. The remaining cultured isolates were mainly Selenomonas related. A higher proportion of Lactobacillus reuteri-related sequences than of Lactobacillus acidophilus- and Lactobacillus amylovorus-related sequences were present in the colonic-wall sample. Since the majority of bacterial ribosomal sequences recovered from the colon wall are less than 95% related to known organisms, the roles of many of the predominant wall-associated bacteria remain to be defined.
Molecular Analysis of the Microbial Diversity Present in the Colonic Wall, Colonic Lumen, and Cecal Lumen of a Pig

PubMed Central

Pryde, Susan E.; Richardson, Anthony J.; Stewart, Colin S.; Flint, Harry J.

1999-01-01

Random clones of 16S ribosomal DNA gene sequences were isolated after PCR amplification with eubacterial primers from total genomic DNA recovered from samples of the colonic lumen, colonic wall, and cecal lumen from a pig. Sequences were also obtained for cultures isolated anaerobically from the same colonic-wall sample. Phylogenetic analysis showed that many sequences were related to those of Lactobacillus or Streptococcus spp. or fell into clusters IX, XIVa, and XI of gram-positive bacteria. In addition, 59% of randomly cloned sequences showed less than 95% similarity to database entries or sequences from cultivated organisms. Cultivation bias is also suggested by the fact that the majority of isolates (54%) recovered from the colon wall by culturing were related to Lactobacillus and Streptococcus, whereas this group accounted for only one-third of the sequence variation for the same sample from random cloning. The remaining cultured isolates were mainly Selenomonas related. A higher proportion of Lactobacillus reuteri-related sequences than of Lactobacillus acidophilus- and Lactobacillus amylovorus-related sequences were present in the colonic-wall sample. Since the majority of bacterial ribosomal sequences recovered from the colon wall are less than 95% related to known organisms, the roles of many of the predominant wall-associated bacteria remain to be defined. PMID:10583991
Selection of Optimal Polypurine Tract Region Sequences during Moloney Murine Leukemia Virus Replication

PubMed Central

Robson, Nicole D.; Telesnitsky, Alice

2000-01-01

Retrovirus plus-strand synthesis is primed by a cleavage remnant of the polypurine tract (PPT) region of viral RNA. In this study, we tested replication properties for Moloney murine leukemia viruses with targeted mutations in the PPT and in conserved sequences upstream, as well as for pools of mutants with randomized sequences in these regions. The importance of maintaining some purine residues within the PPT was indicated both by examining the evolution of random PPT pools and from the replication properties of targeted mutants. Although many different PPT sequences could support efficient replication and one mutant that contained two differences in the core PPT was found to replicate as well as the wild type, some sequences in the core PPT clearly conferred advantages over others. Contributions of sequences upstream of the core PPT were examined with deletion mutants. A conserved T-stretch within the upstream sequence was examined in detail and found to be unimportant to helper functions. Evolution of virus pools containing randomized T-stretch sequences demonstrated marked preference for the wild-type sequence in six of its eight positions. These findings demonstrate that maintenance of the T-rich element is more important to viral replication than is maintenance of the core PPT. PMID:11044073
Rcorrector: efficient and accurate error correction for Illumina RNA-seq reads.

PubMed

Song, Li; Florea, Liliana

2015-01-01

Next-generation sequencing of cellular RNA (RNA-seq) is rapidly becoming the cornerstone of transcriptomic analysis. However, sequencing errors in the already short RNA-seq reads complicate bioinformatics analyses, in particular alignment and assembly. Error correction methods have been highly effective for whole-genome sequencing (WGS) reads, but are unsuitable for RNA-seq reads, owing to the variation in gene expression levels and alternative splicing. We developed a k-mer based method, Rcorrector, to correct random sequencing errors in Illumina RNA-seq reads. Rcorrector uses a De Bruijn graph to compactly represent all trusted k-mers in the input reads. Unlike WGS read correctors, which use a global threshold to determine trusted k-mers, Rcorrector computes a local threshold at every position in a read. Rcorrector has an accuracy higher than or comparable to existing methods, including the only other method (SEECER) designed for RNA-seq reads, and is more time and memory efficient. With a 5 GB memory footprint for 100 million reads, it can be run on virtually any desktop or server. The software is available free of charge under the GNU General Public License from https://github.com/mourisl/Rcorrector/.
Transcriptome Analysis and Development of SSR Molecular Markers in Glycyrrhiza uralensis Fisch.

PubMed Central

Liu, Yaling; Zhang, Pengfei; Song, Meiling; Hou, Junling; Qing, Mei; Wang, Wenquan; Liu, Chunsheng

2015-01-01

Licorice is an important traditional Chinese medicine with clinical and industrial applications. Genetic resources of licorice are insufficient for analysis of molecular biology and genetic functions; as such, transcriptome sequencing must be conducted for functional characterization and development of molecular markers. In this study, transcriptome sequencing on the Illumina HiSeq 2500 sequencing platform generated a total of 5.41 Gb of clean data. De novo assembly yielded a total of 46,641 unigenes. Comparison analysis using BLAST showed that the annotations of 29,614 unigenes were conserved. Further study revealed 773 genes related to biosynthesis of secondary metabolites of licorice, 40 genes involved in biosynthesis of the terpenoid backbone, and 16 genes associated with biosynthesis of glycyrrhizic acid. Analysis of unigenes larger than 1 Kb with a length of 11,702 nt presented 7,032 simple sequence repeats (SSR). Sixty-four of 69 randomly designed and synthesized SSR pairs were successfully amplified, and 33 pairs of primers were polymorphic in Glycyrrhiza uralensis Fisch., Glycyrrhiza inflata Bat., Glycyrrhiza glabra L. and Glycyrrhiza pallidiflora Maxim. This study not only presents the molecular biology data of licorice but also provides a basis for genetic diversity research and molecular marker-assisted breeding of licorice. PMID:26571372
Primer design for a prokaryotic differential display RT-PCR.

PubMed Central

Fislage, R; Berceanu, M; Humboldt, Y; Wendt, M; Oberender, H

1997-01-01

We have developed a primer set for a prokaryotic differential display of mRNA in the Enterobacteriaceae group. Each combination of ten 10mer and ten 11mer primers generates up to 85 bands from total Escherichia coli RNA, thus covering expressed sequences of a complete bacterial genome. Due to the lack of polyadenylation in prokaryotic RNA, the type T11VN anchored oligonucleotides for the reverse transcriptase reaction had to be replaced with respect to the original method described by Liang and Pardee [Science, 257, 967-971 (1992)]. Therefore, the sequences of both the 10mer and the new 11mer oligonucleotides were determined by a statistical evaluation of species-specific coding regions extracted from the EMBL database. The 11mer primers used for reverse transcription were selected for localization in the 3'-region of the bacterial RNA. The 10mer primers preferentially bind to the 5'-end of the RNA. None of the primers show homology to rRNA or other abundant small RNA species. Randomly sampled cDNA bands were checked for their bacterial origin either by re-amplification, cloning and sequencing or by re-amplification and direct sequencing with 10mer and 11mer primers after asymmetric PCR. PMID:9108168
Primer design for a prokaryotic differential display RT-PCR.

PubMed

Fislage, R; Berceanu, M; Humboldt, Y; Wendt, M; Oberender, H

1997-05-01

We have developed a primer set for a prokaryotic differential display of mRNA in the Enterobacteriaceae group. Each combination of ten 10mer and ten 11mer primers generates up to 85 bands from total Escherichia coli RNA, thus covering expressed sequences of a complete bacterial genome. Due to the lack of polyadenylation in prokaryotic RNA, the type T11VN anchored oligonucleotides for the reverse transcriptase reaction had to be replaced with respect to the original method described by Liang and Pardee [Science, 257, 967-971 (1992)]. Therefore, the sequences of both the 10mer and the new 11mer oligonucleotides were determined by a statistical evaluation of species-specific coding regions extracted from the EMBL database. The 11mer primers used for reverse transcription were selected for localization in the 3'-region of the bacterial RNA. The 10mer primers preferentially bind to the 5'-end of the RNA. None of the primers show homology to rRNA or other abundant small RNA species. Randomly sampled cDNA bands were checked for their bacterial origin either by re-amplification, cloning and sequencing or by re-amplification and direct sequencing with 10mer and 11mer primers after asymmetric PCR.
Congruence analysis of point clouds from unstable stereo image sequences

NASA Astrophysics Data System (ADS)

Jepping, C.; Bethmann, F.; Luhmann, T.

2014-06-01

This paper deals with the correction of exterior orientation parameters of stereo image sequences over deformed free-form surfaces without control points. Such an imaging situation can occur, for example, during photogrammetric car crash test recordings where onboard high-speed stereo cameras are used to measure 3D surfaces. As a result of such measurements, 3D point clouds of deformed surfaces are generated for a complete stereo sequence. The first objective of this research is the development and investigation of methods for the detection of corresponding spatial and temporal tie points within the stereo image sequences (by stereo image matching and 3D point tracking) that are robust enough for a reliable handling of occlusions and other disturbances that may occur. The second objective of this research is the analysis of object deformations in order to detect stable areas (congruence analysis). For this purpose a RANSAC-based method for congruence analysis has been developed. This process is based on the sequential transformation of randomly selected point groups from one epoch to another by using a 3D similarity transformation. The paper gives a detailed description of the congruence analysis. The approach has been tested successfully on synthetic and real image data.
OPEN PROBLEM: Orbits' statistics in chaotic dynamical systems

NASA Astrophysics Data System (ADS)

Arnold, V.

2008-07-01

This paper shows how the measurement of the stochasticity degree of a finite sequence of real numbers, published by Kolmogorov in Italian in a journal of insurance statistics, can be usefully applied to measure the objective stochasticity degree of sequences originating from dynamical systems theory and from number theory. Namely, whenever the value of Kolmogorov's stochasticity parameter of a given sequence of numbers is too small (or too big), one may conclude that the conjecture describing this sequence as a sample of independent values of a random variable is highly improbable. Kolmogorov used this strategy (in a paper in 'Doklady', 1940) in fighting against Lysenko, who had tried to disprove Mendel's law of classical genetics experimentally. Calculating his stochasticity parameter value for the numbers from Lysenko's experiment reports, Kolmogorov deduced that, while these numbers were different from the exact fulfilment of Mendel's 3 : 1 law, any smaller deviation would be a manifestation of falsification of the report's numbers. The calculation of the values of the stochasticity parameter would be useful for many other generators of pseudorandom numbers and for many other chaotically looking statistics, including even the distribution of prime numbers (discussed in this paper as an example).
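As a rough illustration of the stochasticity parameter described above, the following Python sketch assumes the usual definition, lambda_n = sqrt(n) * sup_x |F_n(x) - F(x)|, where F_n is the empirical distribution function of the n observed values and F is the hypothesised distribution function. The function names and the choice of a uniform distribution on [0, 1] are illustrative assumptions, not taken from the paper.

```python
import math

def kolmogorov_lambda(sample, cdf):
    """Return sqrt(n) * sup_x |F_n(x) - F(x)| for the given sample.

    `cdf` is the hypothesised distribution function F (an assumption
    supplied by the caller).
    """
    xs = sorted(sample)
    n = len(xs)
    sup = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i - 1)/n to i/n at x, so the
        # largest deviation near x is attained on one side of the jump.
        sup = max(sup, i / n - f, f - (i - 1) / n)
    return math.sqrt(n) * sup

def uniform_cdf(x):
    """Distribution function of the uniform law on [0, 1]."""
    return min(1.0, max(0.0, x))

if __name__ == "__main__":
    print(kolmogorov_lambda([0.25, 0.75], uniform_cdf))
```

A value of the parameter that is very small or very large makes the hypothesis of independent draws from F implausible, which is the test the abstract describes.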
Preparation of Meloidogyne javanica near-isogenic lines virulent and avirulent against the tomato resistance gene Mi and preliminary analyses of the genetic variation between the two lines.

PubMed

Xu, Jian-Hua; Narabu, Takashi; Li, Hong-Mei; Fu, Peng

2002-01-01

Meloidogyne javanica, reproducing by mitotic parthenogenesis, is an economically important pathogen of a wide range of crops. A pair of near-isogenic lines virulent and avirulent toward the tomato resistance gene Mi were prepared for M. javanica by continuously selecting an avirulent population on the resistant tomato cultivar Momotaro over 19 generations. Random amplified polymorphic DNA (RAPD) analysis with 102 primers revealed that RAPD patterns were highly conserved between the virulent and avirulent lines, confirming that the two lines were genomically very similar. Nevertheless, with one of the primers a distinct polymorphic fragment, specific for the avirulent lines, was amplified. Southern hybridization results indicated that the polymorphic fragment and its homologs were deleted from the genome of the virulent line during the process of virulence acquisition. Sequence analysis and homology searches of public data bases, however, revealed no published sequences significantly similar to the sequence of the fragment, precluding a prediction of the potential function of the sequence. The successful preparation of the near-isogenic Mi-virulent and avirulent lines laid a firm foundation for the further identification and isolation of virulence-related genes in M. javanica.
A perturbation method to the tent map based on Lyapunov exponent and its application

NASA Astrophysics Data System (ADS)

Cao, Lv-Chen; Luo, Yu-Ling; Qiu, Sen-Hui; Liu, Jun-Xiu

2015-10-01

Perturbation imposed on a chaotic system is an effective way to maintain its chaotic features. A novel parameter perturbation method for the tent map based on the Lyapunov exponent is proposed in this paper. The pseudo-random sequence generated by the tent map is sent to another chaotic function, the Chebyshev map, for post-processing. If the output value of the Chebyshev map falls into a certain range, it is sent back to replace the parameter of the tent map. As a result, the parameter of the tent map keeps changing dynamically. The statistical analysis and experimental results show that the disturbed tent map has a highly random distribution and achieves good cryptographic properties of a pseudo-random sequence. The method thereby weakens the strong correlation caused by finite precision and effectively compensates for the degradation of the dynamics of the digital chaos system. Project supported by the Guangxi Provincial Natural Science Foundation, China (Grant No. 2014GXNSFBA118271), the Research Project of Guangxi University, China (Grant No. ZD2014022), the Fund from Guangxi Provincial Key Laboratory of Multi-source Information Mining & Security, China (Grant No. MIMS14-04), the Fund from the Guangxi Provincial Key Laboratory of Wireless Wideband Communication & Signal Processing, China (Grant No. GXKL0614205), the Education Development Foundation and the Doctoral Research Foundation of Guangxi Normal University, the State Scholarship Fund of China Scholarship Council (Grant No. [2014]3012), and the Innovation Project of Guangxi Graduate Education, China (Grant No. YCSZ2015102).
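The feedback loop described in this abstract can be sketched in Python as follows. This is only an illustration under stated assumptions: the skew tent map form, the Chebyshev degree k = 4, the acceptance range (0.1, 0.9), and the function name are all hypothetical choices, since the abstract does not give the actual map definitions or thresholds.

```python
import math

def perturbed_tent_sequence(x0, p0, length, k=4, low=0.1, high=0.9):
    """Iterate a skew tent map whose parameter is perturbed by a Chebyshev map.

    Assumed forms (not from the paper):
      tent:      x -> x / p            if x < p
                 x -> (1 - x)/(1 - p)  otherwise,   with 0 < p < 1
      Chebyshev: y = cos(k * arccos(u)),            with u in [-1, 1]
    """
    x, p = x0, p0
    out = []
    for _ in range(length):
        x = x / p if x < p else (1.0 - x) / (1.0 - p)
        x = min(1.0, max(0.0, x))  # guard against rounding outside [0, 1]
        out.append(x)
        # Post-process the tent output with the Chebyshev map on [-1, 1].
        y = math.cos(k * math.acos(2.0 * x - 1.0))
        # Feed the value back as the new tent parameter only when it falls
        # in the (assumed) admissible range.
        if low < y < high:
            p = y
    return out

if __name__ == "__main__":
    print(perturbed_tent_sequence(0.3141, 0.4, 5))
```

The sketch makes no claim about statistical quality; it only shows how the tent parameter can change dynamically in response to the post-processed output.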
Discovery of common sequences absent in the human reference genome using pooled samples from next generation sequencing.

PubMed

Liu, Yu; Koyutürk, Mehmet; Maxwell, Sean; Xiang, Min; Veigl, Martina; Cooper, Richard S; Tayo, Bamidele O; Li, Li; LaFramboise, Thomas; Wang, Zhenghe; Zhu, Xiaofeng; Chance, Mark R

2014-08-16

Sequences up to several megabases in length have been found to be present in individual genomes but absent in the human reference genome. These sequences may be common in populations, and their absence in the reference genome may indicate rare variants in the genomes of individuals who served as donors for the human genome project. As the reference genome is used in probe design for microarray technology and mapping short reads in next generation sequencing (NGS), this missing sequence could be a source of bias in functional genomic studies and variant analysis. One End Anchor (OEA) and/or orphan reads from paired-end sequencing have been used to identify novel sequences that are absent in reference genome. However, there is no study to investigate the distribution, evolution and functionality of those sequences in human populations. To systematically identify and study the missing common sequences (micSeqs), we extended the previous method by pooling OEA reads from large number of individuals and applying strict filtering methods to remove false sequences. The pipeline was applied to data from phase 1 of the 1000 Genomes Project. We identified 309 micSeqs that are present in at least 1% of the human population, but absent in the reference genome. We confirmed 76% of these 309 micSeqs by comparison to other primate genomes, individual human genomes, and gene expression data. Furthermore, we randomly selected fifteen micSeqs and confirmed their presence using PCR validation in 38 additional individuals. Functional analysis using published RNA-seq and ChIP-seq data showed that eleven micSeqs are highly expressed in human brain and three micSeqs contain transcription factor (TF) binding regions, suggesting they are functional elements. 
In addition, the identified micSeqs are absent in non-primates and show dynamic acquisition during primate evolution culminating with most micSeqs being present in Africans, suggesting some micSeqs may be important sources of human diversity. 76% of micSeqs were confirmed by a comparative genomics approach. Fourteen micSeqs are expressed in human brain or contain TF binding regions. Some micSeqs are primate-specific, conserved and may play a role in the evolution of primates.
Quantum random number generator

DOEpatents

Pooser, Raphael C.

2016-05-10

A quantum random number generator (QRNG) and a photon generator for a QRNG are provided. The photon generator may be operated in a spontaneous mode below a lasing threshold to emit photons. Photons emitted from the photon generator may have at least one random characteristic, which may be monitored by the QRNG to generate a random number. In one embodiment, the photon generator may include a photon emitter and an amplifier coupled to the photon emitter. The amplifier may enable the photon generator to be used in the QRNG without introducing significant bias in the random number and may enable multiplexing of multiple random numbers. The amplifier may also desensitize the photon generator to fluctuations in power supplied thereto while operating in the spontaneous mode. In one embodiment, the photon emitter and amplifier may be a tapered diode amplifier.
Perceptions of randomness in binary sequences: Normative, heuristic, or both?

PubMed

Reimers, Stian; Donkin, Chris; Le Pelley, Mike E

2018-03-01

When people consider a series of random binary events, such as tossing an unbiased coin and recording the sequence of heads (H) and tails (T), they tend to erroneously rate sequences with less internal structure or order (such as HTTHT) as more probable than sequences containing more structure or order (such as HHHHH). This is traditionally explained as a local representativeness effect: Participants assume that the properties of long sequences of random outcomes-such as an equal proportion of heads and tails, and little internal structure-should also apply to short sequences. However, recent theoretical work has noted that the probability of a particular sequence of say, heads and tails of length n, occurring within a larger (>n) sequence of coin flips actually differs by sequence, so P(HHHHH)
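The observation that the chance of a particular pattern appearing somewhere within a longer run of coin flips depends on the pattern can be checked by brute force. The Python sketch below enumerates every sequence of n fair flips and counts those containing the pattern; the function name is illustrative, and the exhaustive approach is only practical for small n.

```python
from itertools import product

def prob_contains(pattern, n):
    """Probability that `pattern` occurs at least once in n fair coin flips.

    Computed by exhaustive enumeration of all 2**n equally likely sequences.
    """
    hits = sum(pattern in "".join(flips) for flips in product("HT", repeat=n))
    return hits / 2 ** n

if __name__ == "__main__":
    for pat in ("HHHHH", "HTTHT"):
        print(pat, prob_contains(pat, 6))
```

With n = 5 both patterns have probability 1/32, but with n = 6 the enumeration gives 3/64 for HHHHH and 4/64 for HTTHT, because overlapping occurrences of HHHHH share sequences.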
Quantum random number generation

DOE PAGES

Ma, Xiongfeng; Yuan, Xiao; Cao, Zhu; ...

2016-06-28

Quantum physics can be exploited to generate true random numbers, which play important roles in many applications, especially in cryptography. Genuine randomness from the measurement of a quantum system reveals the inherent nature of quantumness -- coherence, an important feature that differentiates quantum mechanics from classical physics. The generation of genuine randomness is generally considered impossible with only classical means. Based on the degree of trustworthiness on devices, quantum random number generators (QRNGs) can be grouped into three categories. The first category, practical QRNG, is built on fully trusted and calibrated devices and typically can generate randomness at a high speed by properly modeling the devices. The second category is self-testing QRNG, where verifiable randomness can be generated without trusting the actual implementation. The third category, semi-self-testing QRNG, is an intermediate category which provides a tradeoff between the trustworthiness on the device and the random number generation speed.
Indirect vs direct bonding of mandibular fixed retainers in orthodontic patients: a single-center randomized controlled trial comparing placement time and failure over a 6-month period.

PubMed

Bovali, Efstathia; Kiliaridis, Stavros; Cornelis, Marie A

2014-12-01

The objective of this 2-arm parallel single-center trial was to compare placement time and numbers of failures of mandibular lingual retainers bonded with an indirect procedure vs a direct bonding procedure. Sixty-four consecutive patients at the postgraduate orthodontic clinic of the University of Geneva in Switzerland scheduled for debonding and mandibular fixed retainer placement were randomly allocated to either an indirect bonding procedure or a traditional direct bonding procedure. Eligibility criteria were the presence of the 4 mandibular incisors and the 2 mandibular canines, and no active caries, restorations, fractures, or periodontal disease of these teeth. The patients were randomized in blocks of 4; the randomization sequence was generated using an online randomization service (www.randomization.com). Allocation concealment was secured by contacting the sequence generator for treatment assignment; blinding was possible for outcome assessment only. Bonding time was measured for each procedure. Unpaired t tests were used to assess differences in time. Patients were recalled at 1, 2, 4, and 6 months after bonding. Mandibular fixed retainers having at least 1 composite pad debonded were considered as failures. The log-rank test was used to compare the Kaplan-Meier survival curves of both procedures. A test of proportion was applied to compare the failures at 6 months between the treatment groups. Sixty-four patients were randomized in a 1:1 ratio. One patient dropped out at baseline after the bonding procedure, and 3 patients did not attend the recalls at 4 and 6 months. Bonding time was significantly shorter for the indirect procedure (321 ± 31 seconds, mean ± SD) than for the direct procedure (401 ± 40 seconds) (per protocol analysis of 63 patients: mean difference = 80 seconds; 95% CI = 62.4-98.1; P <0.001). 
The 6-month numbers of failures were 10 of 31 (32%) with the indirect technique and 7 of 29 (24%) with the direct technique (log rank: P = 0.35; test of proportions: risk difference = 0.08; 95% CI = -0.15 to 0.31; P = 0.49). No serious harm was observed except for plaque accumulation. Indirect bonding was statistically significantly faster than direct bonding, with both techniques showing similar risks of failure. This trial was not registered. The protocol was not published before trial commencement. No funding or conflict of interest to be declared. Copyright © 2014 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.
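Randomization in blocks of 4 with a 1:1 allocation ratio, as reported in this trial, can be sketched in Python as below. The trial itself used an online service, so this is not the authors' procedure; the function name, arm labels, and seed handling are illustrative assumptions.

```python
import random

def block_randomize(n_patients, block_size=4, arms=("indirect", "direct"), seed=None):
    """Return a permuted-block allocation sequence with equal arms per block."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    sequence = []
    while len(sequence) < n_patients:
        # Each block contains every arm equally often, in random order.
        block = list(arms) * per_arm
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_patients]

if __name__ == "__main__":
    print(block_randomize(8, seed=1))
```

Because every complete block is balanced, the two arms never differ by more than half a block at any point during enrolment.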
Molecular and bioinformatic analysis of the FB-NOF transposable element.

PubMed

Badal, Martí; Portela, Anna; Xamena, Noel; Cabré, Oriol

2006-04-12

The Drosophila melanogaster transposable element FB-NOF is known to play a role in genome plasticity through the generation of all sorts of genomic rearrangements. Moreover, several insertional mutants due to FB mobilizations have been reported. Its structure and sequence, however, have been poorly studied, mainly as a consequence of the long, complex and repetitive sequence of FB inverted repeats. This repetitive region is composed of several 154 bp blocks, each with five almost identical repeats. In this paper, we report the sequencing process of 2 kb long FB inverted repeats of a complete FB-NOF element, with high precision and reliability. This achievement has been possible using a new map of the FB repetitive region, which identifies unambiguously each repeat with new features that can be used as landmarks. With this new vision of the element, a list of FB-NOF in the D. melanogaster genomic clones has been compiled, improving on previous works that used only bioinformatic algorithms. The availability of many FB and FB-NOF sequences allowed an analysis of the FB insertion sequences that showed no sequence specificity, but a preference for A/T rich sequences. The position of NOF within FB is also studied, revealing that it is always located after a second repeat in a random block. With the results of this analysis, we propose a model of transposition in which NOF jumps from FB to FB, using an unidentified transposase enzyme that should specifically recognize the second repeat end of the FB blocks.
Linguistic Analysis of the Human Heartbeat Using Frequency and Rank Order Statistics

NASA Astrophysics Data System (ADS)

Yang, Albert C.-C.; Hseu, Shu-Shya; Yien, Huey-Wen; Goldberger, Ary L.; Peng, C.-K.

2003-03-01

Complex physiologic signals may carry unique dynamical signatures that are related to their underlying mechanisms. We present a method based on rank order statistics of symbolic sequences to investigate the profile of different types of physiologic dynamics. We apply this method to heart rate fluctuations, the output of a central physiologic control system. The method robustly discriminates patterns generated from healthy and pathologic states, as well as aging. Furthermore, we observe increased randomness in the heartbeat time series with physiologic aging and pathologic states and also uncover nonrandom patterns in the ventricular response to atrial fibrillation.
Identifying uniformly mutated segments within repeats.

PubMed

Sahinalp, S Cenk; Eichler, Evan; Goldberg, Paul; Berenbrink, Petra; Friedetzky, Tom; Ergun, Funda

2004-12-01

Given a long string of characters from a constant size alphabet we present an algorithm to determine whether its characters have been generated by a single i.i.d. random source. More specifically, consider all possible n-coin models for generating a binary string S, where each bit of S is generated via an independent toss of one of the n coins in the model. The choice of which coin to toss is decided by a random walk on the set of coins where the probability of a coin change is much lower than the probability of using the same coin repeatedly. We present a procedure to evaluate the likelihood of an n-coin model for given S, subject to a uniform prior distribution over the parameters of the model (that represent mutation rates and probabilities of copying events). In the absence of detailed prior knowledge of these parameters, the algorithm can be used to determine whether the a posteriori probability for n=1 is higher than for any other n>1. Our algorithm runs in time O(l^4 log l), where l is the length of S, through a dynamic programming approach which exploits the assumed convexity of the a posteriori probability for n. Our test can be used in the analysis of long alignments between pairs of genomic sequences in a number of ways. For example, functional regions in genome sequences exhibit much lower mutation rates than non-functional regions. Because our test provides means for determining variations in the mutation rate, it may be used to distinguish functional regions from non-functional ones. Another application is in determining whether two highly similar, thus evolutionarily related, genome segments are the result of a single copy event or of a complex series of copy events. This is particularly an issue in evolutionary studies of genome regions rich with repeat segments (especially tandemly repeated segments).

Generation of diversity in Streptococcus mutans genes demonstrated by MLST.

PubMed

Do, Thuy; Gilbert, Steven C; Clark, Douglas; Ali, Farida; Fatturi Parolo, Clarissa C; Maltz, Marisa; Russell, Roy R; Holbrook, Peter; Wade, William G; Beighton, David

2010-02-05

Streptococcus mutans, consisting of serotypes c, e, f and k, is an oral aciduric organism associated with the initiation and progression of dental caries. A total of 135 independent Streptococcus mutans strains from caries-free and caries-active subjects isolated from various geographical locations were examined in two versions of an MLST scheme consisting of either 6 housekeeping genes [accC (acetyl-CoA carboxylase biotin carboxylase subunit), gki (glucokinase), lepA (GTP-binding protein), recP (transketolase), sodA (superoxide dismutase), and tyrS (tyrosyl-tRNA synthetase)] or the housekeeping genes supplemented with 2 extracellular putative virulence genes [gtfB (glucosyltransferase B) and spaP (surface protein antigen I/II)] to increase sequence type diversity. The number of alleles found varied between 20 (lepA) and 37 (spaP). Overall, 121 sequence types (STs) were defined using the housekeeping genes alone and 122 with all genes. However, pi, the nucleotide diversity per site, was low for all loci, in the range 0.007-0.019. The virulence genes exhibited the greatest nucleotide diversity and the recombination/mutation ratio was 0.67 [95% confidence interval 0.3-1.15] compared to 8.3 [95% confidence interval 5.0-14.5] for the 6 concatenated housekeeping genes alone. The ML trees generated for individual MLST loci were significantly incongruent and not significantly different from random trees. Analysis using ClonalFrame indicated that the majority of isolates were singletons and no evidence for a clonal structure or evidence to support serotype c strains as the ancestral S. mutans strain was apparent. There was also no evidence of a geographical distribution of individual isolates or that particular isolate clusters were associated with caries. The overall low sequence diversity suggests that S. mutans is a newly emerged species which has not accumulated large numbers of mutations but those that have occurred have been shuffled as a consequence of intra-species recombination generating genotypes which can be readily distinguished by sequence analysis.
Ancient DNA sequence revealed by error-correcting codes.

PubMed

Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

2015-07-10

A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.
Ancient DNA sequence revealed by error-correcting codes

PubMed Central

Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

2015-01-01

A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
Fatigue testing of weldable high strength steels under simulated service conditions

NASA Astrophysics Data System (ADS)

Tantbirojn, Natee

There have been concerns over the effect of Cathodic Protection (CP) on weldable high strength steels employed in Jack-up production platforms. The guidance provided by the Department of Energy HSE on higher strength steels, based on previous work, was to avoid overprotection as this could cause hydrogen embrittlement. However, the tests conducted so far at UCL for the SE702 type high strength steels (yield strength around 690 MPa) have shown that the effect of overprotection on high strength steels may not be as severe as previously thought. For this thesis, SE702 high strength steels have been investigated in more detail. Thick (85mm) parent and ground welded plates were tested under constant amplitude in air and seawater with CP. Tests were also conducted on thick (40mm) T-butt welded plates under variable amplitude loading in air and seawater with two CP levels (-800mV and -1050mV). Different backing materials (ceramic and metallic) for the welding process of the T-butt plates were also investigated. The variable amplitude sequences employed were generated using the Jack-up Offshore Standard load History (JOSH). The fatigue results are presented as crack growth and S/N curves. They were compared to the conventional offshore steel (BS 4360 50D). The results suggested that the fatigue life of the high strength steels was comparable to the BS 4360 50D steels. The effect of increasing the CP was found to be detrimental to the fatigue life but the effect was not large. The effect of CP was less noticeable in T-butt welded plates. However, in general, the effect of overprotection is not as detrimental to the Jack-up steels as previously thought. The load histories generated by JOSH were found to have some unfavourable characteristics. The framework is based on a Markov chain method and a pseudo-random number generator for selecting sea-states. A study was carried out on the sequence generated by JOSH. 
The generated sequences were analysed for their validity for fatigue testing. This has resulted in recommendations on the methods for generating standard load histories.
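The sea-state selection mentioned in this abstract can be illustrated with a minimal sketch. It is not JOSH itself: the three sea states and the transition probabilities below are hypothetical placeholders, chosen only to show how a Markov chain driven by a seeded pseudo-random number generator yields a repeatable sequence of states.

```python
import random

# Hypothetical sea states and transition probabilities (illustrative only,
# not the values used by JOSH). Each row of weights sums to 1.
STATES = ["calm", "moderate", "severe"]
TRANSITIONS = {
    "calm":     [0.70, 0.25, 0.05],
    "moderate": [0.30, 0.50, 0.20],
    "severe":   [0.10, 0.50, 0.40],
}


def generate_sea_states(n_steps, start="calm", seed=0):
    """Return a list of n_steps sea states drawn from the Markov chain."""
    rng = random.Random(seed)  # seeded PRNG makes the sequence repeatable
    state = start
    sequence = []
    for _ in range(n_steps):
        sequence.append(state)
        # Draw the next state from the current state's transition row.
        state = rng.choices(STATES, weights=TRANSITIONS[state], k=1)[0]
    return sequence
```

Because the generator is pseudo-random, the same seed always reproduces the same state sequence, so any statistical weakness of a generated history is repeated every time it is used. That is one plausible reason for checking generated sequences before relying on them for fatigue testing.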
Competition between B-Z and B-L transitions in a single DNA molecule: Computational studies

NASA Astrophysics Data System (ADS)

Kwon, Ah-Young; Nam, Gi-Moon; Johner, Albert; Kim, Seyong; Hong, Seok-Cheol; Lee, Nam-Kyung

2016-02-01

Under negative torsion, DNA adopts left-handed helical forms, such as Z-DNA and L-DNA. Using the random copolymer model developed for a wormlike chain, we represent a single DNA molecule with structural heterogeneity as a helical chain consisting of monomers which can be characterized by different helical senses and pitches. By Monte Carlo simulation, where we take into account bending and twist fluctuations explicitly, we study sequence dependence of B-Z transitions under torsional stress and tension focusing on the interaction with B-L transitions. We consider core sequences, (GC) n repeats or (TG) n repeats, which can interconvert between the right-handed B form and the left-handed Z form, imbedded in a random sequence, which can convert to left-handed L form with different (tension dependent) helical pitch. We show that Z-DNA formation from the (GC) n sequence is always supported by unwinding torsional stress but Z-DNA formation from the (TG) n sequence, which are more costly to convert but numerous, can be strongly influenced by the quenched disorder in the surrounding random sequence.
Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees.

PubMed

Kück, Patrick; Meusemann, Karen; Dambach, Johannes; Thormann, Birthe; von Reumont, Björn M; Wägele, Johann W; Misof, Bernhard

2010-03-31

Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS) which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold which choice is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. Testing performance on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the most decrease in conflict. Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. 
Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood based models of sequence evolution which opens the possibility of further improvements.
De novo assembly and characterization of bark transcriptome using Illumina sequencing and development of EST-SSR markers in rubber tree (Hevea brasiliensis Muell. Arg.)

PubMed Central

2012-01-01

Background In rubber tree, bark is one of the important agricultural and biological organs. However, the molecular mechanism involved in the bark formation and development in rubber tree remains largely unknown, which is at least partially due to lack of bark transcriptomic and genomic information. Therefore, it is necessary to carry out high-throughput transcriptome sequencing of rubber tree bark to generate enormous transcript sequences for the functional characterization and molecular marker development. Results In this study, more than 30 million sequencing reads were generated using Illumina paired-end sequencing technology. In total, 22,756 unigenes with an average length of 485 bp were obtained with de novo assembly. The similarity search indicated that 16,520 and 12,558 unigenes showed significant similarities to known proteins from NCBI non-redundant and Swissprot protein databases, respectively. Among these annotated unigenes, 6,867 and 5,559 unigenes were separately assigned to Gene Ontology (GO) and Clusters of Orthologous Group (COG). When the 22,756 unigenes were searched against the Kyoto Encyclopedia of Genes and Genomes Pathway (KEGG) database, 12,097 unigenes were assigned to 5 main categories including 123 KEGG pathways. Among the main KEGG categories, metabolism was the biggest category (9,043, 74.75%), suggesting the active metabolic processes in rubber tree bark. In addition, a total of 39,257 EST-SSRs were identified from 22,756 unigenes, and the characterizations of EST-SSRs were further analyzed in rubber tree. 110 potential marker sites were randomly selected to validate the assembly quality and develop EST-SSR markers. Among 13 Hevea germplasms, PCR success rate and polymorphism rate of 110 markers were separately 96.36% and 55.45% in this study. Conclusion By assembling and analyzing de novo transcriptome sequencing data, we reported the comprehensive functional characterization of rubber tree bark. 
This research generated a substantial fraction of rubber tree transcriptome sequences, which were very useful resources for gene annotation and discovery, molecular markers development, genome assembly and annotation, and microarrays development in rubber tree. The EST-SSR markers identified and developed in this study will facilitate marker-assisted selection breeding in rubber tree. Moreover, this study also supported that transcriptome analysis based on Illumina paired-end sequencing is a powerful tool for transcriptome characterization and molecular marker development in non-model species, especially those with large and complex genomes. PMID:22607098
Characterization and Simulation of Gunfire with Wavelets

DOE PAGES

Smallwood, David O.

1999-01-01

Gunfire is used as an example to show how the wavelet transform can be used to characterize and simulate nonstationary random events when an ensemble of events is available. The structural response to nearby firing of a high-firing rate gun has been characterized in several ways as a nonstationary random process. The current paper will explore a method to describe the nonstationary random process using a wavelet transform. The gunfire record is broken up into a sequence of transient waveforms each representing the response to the firing of a single round. A wavelet transform is performed on each of these records. The gunfire is simulated by generating realizations of records of a single-round firing by computing an inverse wavelet transform from Gaussian random coefficients with the same mean and standard deviation as those estimated from the previously analyzed gunfire record. The individual records are assembled into a realization of many rounds firing. A second-order correction of the probability density function is accomplished with a zero memory nonlinear function. The method is straightforward, easy to implement, and produces a simulated record much like the measured gunfire record.
ARTS: automated randomization of multiple traits for study design.

PubMed

Maienschein-Cline, Mark; Lei, Zhengdeng; Gardeux, Vincent; Abbasi, Taimur; Machado, Roberto F; Gordeuk, Victor; Desai, Ankit A; Saraf, Santosh; Bahroos, Neil; Lussier, Yves

2014-06-01

Collecting data from large studies on high-throughput platforms, such as microarray or next-generation sequencing, typically requires processing samples in batches. There are often systematic but unpredictable biases from batch-to-batch, so proper randomization of biologically relevant traits across batches is crucial for distinguishing true biological differences from experimental artifacts. When a large number of traits are biologically relevant, as is common for clinical studies of patients with varying sex, age, genotype and medical background, proper randomization can be extremely difficult to prepare by hand, especially because traits may affect biological inferences, such as differential expression, in a combinatorial manner. Here we present ARTS (automated randomization of multiple traits for study design), which aids researchers in study design by automatically optimizing batch assignment for any number of samples, any number of traits and any batch size. ARTS is implemented in Perl and is available at github.com/mmaiensc/ARTS. ARTS is also available in the Galaxy Tool Shed, and can be used at the Galaxy installation hosted by the UIC Center for Research Informatics (CRI) at galaxy.cri.uic.edu. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Low-Energy Truly Random Number Generation with Superparamagnetic Tunnel Junctions for Unconventional Computing

NASA Astrophysics Data System (ADS)

Vodenicarevic, D.; Locatelli, N.; Mizrahi, A.; Friedman, J. S.; Vincent, A. F.; Romera, M.; Fukushima, A.; Yakushiji, K.; Kubota, H.; Yuasa, S.; Tiwari, S.; Grollier, J.; Querlioz, D.

2017-11-01

Low-energy random number generation is critical for many emerging computing schemes proposed to complement or replace von Neumann architectures. However, current random number generators are always associated with an energy cost that is prohibitive for these computing schemes. We introduce random number bit generation based on specific nanodevices: superparamagnetic tunnel junctions. We experimentally demonstrate high-quality random bit generation that represents an orders-of-magnitude improvement in energy efficiency over current solutions. We show that the random generation speed improves with nanodevice scaling, and we investigate the impact of temperature, magnetic field, and cross talk. Finally, we show how alternative computing schemes can be implemented using superparamagnetic tunnel junctions as random number generators. These results open the way for fabricating efficient hardware computing devices leveraging stochasticity, and they highlight an alternative use for emerging nanodevices.
A complex of RAG-1 and RAG-2 proteins persists on DNA after single-strand cleavage at V(D)J recombination signal sequences.

PubMed Central

Grawunder, U; Lieber, M R

1997-01-01

The recombination activating gene (RAG) 1 and 2 proteins are required for initiation of V(D)J recombination in vivo and have been shown to be sufficient to introduce DNA double-strand breaks at recombination signal sequences (RSSs) in a cell-free assay in vitro. RSSs consist of a highly conserved palindromic heptamer that is separated from a slightly less conserved A/T-rich nonamer by either a 12 or 23 bp spacer of random sequence. Despite the high sequence specificity of RAG-mediated cleavage at RSSs, direct binding of the RAG proteins to these sequences has been difficult to demonstrate by standard methods. Even when this can be demonstrated, questions about the order of events for an individual RAG-RSS complex will require methods that monitor aspects of the complex during transitions from one step of the reaction to the next. Here we have used template-independent DNA polymerase terminal deoxynucleotidyl transferase (TdT) in order to assess occupancy of the reaction intermediates by the RAG complex during the reaction. In addition, this approach allows analysis of the accessibility of end products of a RAG-catalyzed cleavage reaction for N nucleotide addition. The results indicate that RAG proteins form a long-lived complex with the RSS once the initial nick is generated, because the 3'-OH group at the nick remains obstructed for TdT-catalyzed N nucleotide addition. In contrast, the 3'-OH group generated at the signal end after completion of the cleavage reaction can be efficiently tailed by TdT, suggesting that the RAG proteins disassemble from the signal end after DNA double-strand cleavage has been completed. Therefore, a single RAG complex maintains occupancy from the first step (nick formation) to the second step (cleavage). In addition, the results suggest that N region diversity at V(D)J junctions within rearranged immunoglobulin and T cell receptor gene loci can only be introduced after the generation of RAG-catalyzed DNA double-strand breaks, i.e. during the DNA end joining phase of the V(D)J recombination reaction. PMID:9060432
De novo assembly of highly polymorphic metagenomic data using in situ generated reference sequences and a novel BLAST-based assembly pipeline.

PubMed

Lin, You-Yu; Hsieh, Chia-Hung; Chen, Jiun-Hong; Lu, Xuemei; Kao, Jia-Horng; Chen, Pei-Jer; Chen, Ding-Shinn; Wang, Hurng-Yi

2017-04-26

The accuracy of metagenomic assembly is usually compromised by high levels of polymorphism due to divergent reads from the same genomic region recognized as different loci when sequenced and assembled together. A viral quasispecies is a group of abundant and diversified genetically related viruses found in a single carrier. Current mainstream assembly methods, such as Velvet and SOAPdenovo, were not originally intended for the assembly of such metagenomics data, and there is therefore a demand for new methods that provide accurate and informative assembly results for metagenomic data. In this study, we present a hybrid method for assembling highly polymorphic data combining the partial de novo-reference assembly (PDR) strategy and the BLAST-based assembly pipeline (BBAP). The PDR strategy generates in situ reference sequences through de novo assembly of a randomly extracted partial data set which is subsequently used for the reference assembly for the full data set. BBAP employs a greedy algorithm to assemble polymorphic reads. We used 12 hepatitis B virus quasispecies NGS data sets from a previous study to assess and compare the performance of both PDR and BBAP. Analyses suggest the high polymorphism of a full metagenomic data set leads to fragmentized de novo assembly results, whereas the biased or limited representation of external reference sequences included fewer reads into the assembly with lower assembly accuracy and variation sensitivity. In comparison, the PDR generated in situ reference sequence incorporated more reads into the final PDR assembly of the full metagenomics data set along with greater accuracy and higher variation sensitivity. BBAP assembly results also suggest higher assembly efficiency and accuracy compared to other assembly methods. Additionally, BBAP assembly recovered HBV structural variants that were not observed amongst assembly results of other methods. Together, PDR/BBAP assembly results were significantly better than other compared methods. 
Both PDR and BBAP independently increased the assembly efficiency and accuracy of highly polymorphic data, and assembly performances were further improved when used together. BBAP also provides nucleotide frequency information. Together, PDR and BBAP provide powerful tools for metagenomic data studies.
Selection of peptides binding to metallic borides by screening M13 phage display libraries.

PubMed

Ploss, Martin; Facey, Sandra J; Bruhn, Carina; Zemel, Limor; Hofmann, Kathrin; Stark, Robert W; Albert, Barbara; Hauer, Bernhard

2014-02-10

Metal borides are a class of inorganic solids that is much less known and investigated than for example metal oxides or intermetallics. At the same time it is a highly versatile and interesting class of compounds in terms of physical and chemical properties, like semiconductivity, ferromagnetism, or catalytic activity. This makes these substances attractive for the generation of new materials. Very little is known about the interaction between organic materials and borides. To generate nanostructured and composite materials which consist of metal borides and organic modifiers it is necessary to develop new synthetic strategies. Phage peptide display libraries are commonly used to select peptides that bind specifically to metals, metal oxides, and semiconductors. Further, these binding peptides can serve as templates to control the nucleation and growth of inorganic nanoparticles. Additionally, the combination of two different binding motifs into a single bifunctional phage could be useful for the generation of new composite materials. In this study, we have identified a unique set of sequences that bind to amorphous and crystalline nickel boride (Ni3B) nanoparticles, from a random peptide library using the phage display technique. Using this technique, strong binders were identified that are selective for nickel boride. Sequence analysis of the peptides revealed that the sequences exhibit similar, yet subtly different patterns of amino acid usage. Although a predominant binding motif was not observed, certain charged amino acids emerged as essential in specific binding to both substrates. The 7-mer peptide sequence LGFREKE, isolated on amorphous Ni3B, emerged as the best binder for both substrates. Fluorescence microscopy and atomic force microscopy confirmed the specific binding affinity of LGFREKE expressing phage to amorphous and crystalline Ni3B nanoparticles. 
This study is, to our knowledge, the first to identify peptides that bind specifically to amorphous and to crystalline Ni3B nanoparticles. We think that the identified strong binding sequences described here could potentially serve for the utilisation of M13 phage as a viable alternative to other methods to create tailor-made boride composite materials or new catalytic surfaces by a biologically driven nano-assembly synthesis and structuring.
Human Inferences about Sequences: A Minimal Transition Probability Model

PubMed Central

2016-01-01

The brain constantly infers the causes of the inputs it receives and uses these inferences to generate statistical expectations about future observations. Experimental evidence for these expectations and their violations includes explicit reports, sequential effects on reaction times, and mismatch or surprise signals recorded in electrophysiology and functional MRI. Here, we explore the hypothesis that the brain acts as a near-optimal inference device that constantly attempts to infer the time-varying matrix of transition probabilities between the stimuli it receives, even when those stimuli are in fact fully unpredictable. This parsimonious Bayesian model, with a single free parameter, accounts for a broad range of findings on surprise signals, sequential effects and the perception of randomness. Notably, it explains the pervasive asymmetry between repetitions and alternations encountered in those studies. Our analysis suggests that a neural machinery for inferring transition probabilities lies at the core of human sequence knowledge. PMID:28030543
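A rough sketch of what inferring time-varying transition probabilities can look like for a binary stimulus stream is given below. It assumes the single free parameter is an exponential forgetting factor, here called `leak`, and uses a uniform prior. Both are illustrative assumptions and may differ from the paper's exact formulation.

```python
import math


def surprise_sequence(stimuli, leak=0.9):
    """Return the surprise (-log2 p) of each stimulus after the first.

    stimuli: sequence of 0/1 items.
    leak: hypothetical forgetting factor in (0, 1]; 1.0 means perfect memory.
    """
    # counts[prev][nxt] holds leaky counts of observed transitions.
    counts = [[0.0, 0.0], [0.0, 0.0]]
    surprises = []
    for prev, nxt in zip(stimuli, stimuli[1:]):
        row = counts[prev]
        # Predictive probability under a uniform prior (one pseudo-count
        # per possible transition) plus the leaky evidence so far.
        p_next = (row[nxt] + 1.0) / (row[0] + row[1] + 2.0)
        surprises.append(-math.log2(p_next))
        # Decay old evidence, then add the transition just observed.
        for r in counts:
            r[0] *= leak
            r[1] *= leak
        counts[prev][nxt] += 1.0
    return surprises
```

With this estimator, a run of repetitions makes further repetitions progressively less surprising, while an alternation after such a run produces a surprise above one bit. The prediction always depends on the previous item, even when the input is in fact unpredictable.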
Forecasting drought risks for a water supply storage system using bootstrap position analysis

USGS Publications Warehouse

Tasker, Gary; Dunne, Paul

1997-01-01

Forecasting the likelihood of drought conditions is an integral part of managing a water supply storage and delivery system. Position analysis uses a large number of possible flow sequences as inputs to a simulation of a water supply storage and delivery system. For a given set of operating rules and water use requirements, water managers can use such a model to forecast the likelihood of specified outcomes such as reservoir levels falling below a specified level or streamflows falling below statutory passing flows a few months ahead conditioned on the current reservoir levels and streamflows. The large number of possible flow sequences are generated using a stochastic streamflow model with a random resampling of innovations. The advantages of this resampling scheme, called bootstrap position analysis, are that it does not rely on the unverifiable assumption of normality and it allows incorporation of long-range weather forecasts into the analysis.
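The resampling idea can be sketched as follows. The sketch assumes a first-order autoregressive (AR(1)) model as the stochastic streamflow model, which is an illustrative choice and not necessarily the one used by the authors. The function names and the example flows are hypothetical.

```python
import random


def fit_ar1(series):
    """Fit x[t] = mean + phi * (x[t-1] - mean) + e[t].

    Returns (mean, phi, residuals). Assumes the series is not constant.
    """
    mean = sum(series) / len(series)
    dev = [x - mean for x in series]
    num = sum(a * b for a, b in zip(dev[:-1], dev[1:]))
    den = sum(a * a for a in dev[:-1])
    phi = num / den
    residuals = [b - phi * a for a, b in zip(dev[:-1], dev[1:])]
    return mean, phi, residuals


def bootstrap_traces(series, current, horizon, n_traces, seed=0):
    """Generate flow traces that all start from the current flow."""
    rng = random.Random(seed)
    mean, phi, residuals = fit_ar1(series)
    traces = []
    for _ in range(n_traces):
        x = current
        trace = []
        for _ in range(horizon):
            # Resample an observed innovation instead of drawing a normal
            # deviate, so no normality assumption is needed.
            x = mean + phi * (x - mean) + rng.choice(residuals)
            trace.append(x)
        traces.append(trace)
    return traces
```

The likelihood of a specified outcome can then be estimated as the fraction of traces in which it occurs, for example `sum(min(t) < threshold for t in traces) / len(traces)` for flows falling below a hypothetical `threshold`.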
Neuronal Vacuolization in Feline Panleukopenia Virus Infection.

PubMed

Pfankuche, Vanessa M; Jo, Wendy K; van der Vries, Erhard; Jungwirth, Nicole; Lorenzen, Stephan; Osterhaus, Albert D M E; Baumgärtner, Wolfgang; Puff, Christina

2018-03-01

Feline panleukopenia virus (FPV) infections are typically associated with anorexia, vomiting, diarrhea, neutropenia, and lymphopenia. In cases of late prenatal or early neonatal infections, cerebellar hypoplasia is reported in kittens. In addition, single cases of encephalitis are described. FPV replication was recently identified in neurons, although it is mainly found in cells with high mitotic activity. A female cat, 2 months old, was submitted to necropsy after it died with neurologic deficits. Besides typical FPV intestinal tract changes, multifocal, randomly distributed intracytoplasmic vacuoles within neurons of the thoracic spinal cord were found histologically. Next-generation sequencing identified FPV-specific sequences within the central nervous system. FPV antigen was detected within central nervous system cells, including the vacuolated neurons, via immunohistochemistry. In situ hybridization confirmed the presence of FPV DNA within the vacuolated neurons. Thus, FPV should be considered a cause for neuronal vacuolization in cats presenting with ataxia.
Solution to urn models of pairwise interaction with application to social, physical, and biological sciences

NASA Astrophysics Data System (ADS)

Pickering, William; Lim, Chjan

2017-07-01

We investigate a family of urn models that correspond to one-dimensional random walks with quadratic transition probabilities that have highly diverse applications. Well-known instances of these two-urn models are the Ehrenfest model of molecular diffusion, the voter model of social influence, and the Moran model of population genetics. We also provide a generating function method for diagonalizing the corresponding transition matrix that is valid if and only if the underlying mean density satisfies a linear differential equation and express the eigenvector components in terms of ordinary hypergeometric functions. The nature of the models leads to a natural extension to interaction between agents in a general network topology. We analyze the dynamics on uncorrelated heterogeneous degree sequence networks and relate the convergence times to the moments of the degree sequences for various pairwise interaction mechanisms.
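As a concrete instance of such a random walk, the sketch below simulates the classical Ehrenfest model, the simplest of the examples named in this abstract. The state is the number of balls in urn A. At each step one ball is chosen uniformly at random and moved to the other urn. The function name and arguments are illustrative, not taken from the paper.

```python
import random


def ehrenfest_walk(n_balls, n_steps, start=0, seed=0):
    """Return the path of the number of balls in urn A."""
    rng = random.Random(seed)
    n = start
    path = [n]
    for _ in range(n_steps):
        # The chosen ball is in urn A with probability n / n_balls.
        if rng.random() < n / n_balls:
            n -= 1  # ball leaves urn A
        else:
            n += 1  # ball enters urn A
        path.append(n)
    return path
```

The walk never leaves the range 0 to `n_balls`, because the probability of a downward step is 0 when urn A is empty and 1 when it is full.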
The intellectual developmental disorders Mexico study: situational diagnosis, burden, genomics and intervention proposal.

PubMed

Lazcano-Ponce, Eduardo; Katz, Gregorio; Rodríguez-Valentín, Rocío; Castro, Filipa de; Allen-Leigh, Betania; Márquez-Caraveo, María Elena; Ramírez-García, Miguel Ángel; Arroyo-García, Eduardo; Medina-Mora, María Elena; Ángeles, Gustavo; Urquieta-Salomón, José Edmundo; Salvador-Carulla, Luis

2016-01-01

This study aims to generate evidence on intellectual development disorders (IDD) in Mexico. IDD disease burden will be estimated with a probabilistic model, using population-based surveys. Direct and indirect costs of catastrophic expenses of families with a member with an IDD will be evaluated. Genomic characterization of IDD will include: sequencing participant exomes and performing bioinformatics analyses to identify de novo or inherited variants through trio analysis; identifying genetic variants associated with IDD, and validating randomly selected variants by polymerase chain reaction (PCR) and sequencing or real-time quantitative PCR (qPCR). Delphi surveys will be done on best practices for IDD diagnosis and management. An external evaluation will employ qualitative case studies of two social and labor inclusion programs for people with IDD. The results will constitute scientific evidence for the design, promotion and evaluation of public policies, which are currently absent on IDD.
Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences

PubMed Central

2014-01-01

Background Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Results Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. It allows on-the-fly calculation of the normal distribution for any candidate sequence composition. Conclusion The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone will not discriminate miRNAs from other sequences sufficiently, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification. PMID:24418292
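The interpolation and P-value steps can be sketched as follows. For brevity the sequence composition is reduced to a single GC fraction, which is a simplifying assumption. The table values are invented for illustration and are not the pre-computed distributions from the study.

```python
import math

# Hypothetical pre-computed table: GC fraction -> (mean MFE, standard deviation)
# in kcal/mol for randomized sequences of one fixed length. Values are made up.
PRECOMPUTED = {0.3: (-18.0, 4.0), 0.5: (-26.0, 4.5), 0.7: (-36.0, 5.0)}


def interpolate_params(gc, table=PRECOMPUTED):
    """Linearly interpolate (mean, sd) between the two nearest grid points."""
    keys = sorted(table)
    if not keys[0] <= gc <= keys[-1]:
        raise ValueError("GC fraction outside the pre-computed grid")
    for lo, hi in zip(keys, keys[1:]):
        if lo <= gc <= hi:
            w = (gc - lo) / (hi - lo)
            mean = (1 - w) * table[lo][0] + w * table[hi][0]
            sd = (1 - w) * table[lo][1] + w * table[hi][1]
            return mean, sd


def mfe_p_value(mfe, gc, table=PRECOMPUTED):
    """Lower-tail probability of an MFE this low under the normal model."""
    mean, sd = interpolate_params(gc, table)
    z = (mfe - mean) / sd
    # Standard normal cumulative distribution function evaluated at z.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

A candidate whose MFE lies well below the interpolated mean receives a small P-value. Since only a table lookup and one `erf` call are needed per candidate, no randomized sequences have to be folded at screening time.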
Comparison of Methods of Detection of Exceptional Sequences in Prokaryotic Genomes.

PubMed

Rusinov, I S; Ershova, A S; Karyagina, A S; Spirin, S A; Alexeevski, A V

2018-02-01

Many proteins need recognition of specific DNA sequences for functioning. The number of recognition sites and their distribution along the DNA might be of biological importance. For example, the number of restriction sites is often reduced in prokaryotic and phage genomes to decrease the probability of DNA cleavage by restriction endonucleases. We call a sequence an exceptional one if its frequency in a genome significantly differs from one predicted by some mathematical model. An exceptional sequence could be either under- or over-represented, depending on its frequency in comparison with the predicted one. Exceptional sequences could be considered biologically meaningful, for example, as targets of DNA-binding proteins or as parts of abundant repetitive elements. Several methods to predict the frequency of a short sequence in a genome, based on actual frequencies of certain of its subsequences, are used. The most popular are methods based on Markov chain models. However, no rigorous comparison of the methods has previously been performed. We compared three methods for the prediction of short sequence frequencies: the maximum-order Markov chain model-based method, the method that uses geometric mean of extended Markovian estimates, and the method that utilizes frequencies of all subsequences including discontiguous ones. We applied them to restriction sites in complete genomes of 2500 prokaryotic species and demonstrated that the results depend greatly on the method used: lists of 5% of the most under-represented sites differed by up to 50%. The method designed by Burge and coauthors in 1992, which utilizes all subsequences of the sequence, showed a higher precision than the other two methods both on prokaryotic genomes and randomly generated sequences after computational imitation of selective pressure. We propose this method as the first choice for detection of exceptional sequences in prokaryotic genomes.
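Of the three approaches, the maximum-order Markov chain estimate is the simplest to write down. The expected count of a word is the count of its prefix (all letters but the last) times the count of its suffix (all letters but the first), divided by the count of the shared core. The sketch below is a minimal illustration on a single linear strand. The function names are hypothetical, and the method of Burge and coauthors is not implemented here.

```python
def count_overlapping(genome, word):
    """Count occurrences of word in genome, allowing overlaps."""
    count = 0
    start = genome.find(word)
    while start != -1:
        count += 1
        start = genome.find(word, start + 1)
    return count


def markov_expected_count(genome, word):
    """Expected count of word under the maximum-order Markov model."""
    if len(word) < 3:
        raise ValueError("word must have at least 3 letters")
    prefix = count_overlapping(genome, word[:-1])
    suffix = count_overlapping(genome, word[1:])
    core = count_overlapping(genome, word[1:-1])
    if core == 0:
        return 0.0
    return prefix * suffix / core


def observed_to_expected(genome, word):
    """Ratio below 1 suggests under-representation; None if undefined."""
    expected = markov_expected_count(genome, word)
    if expected == 0:
        return None
    return count_overlapping(genome, word) / expected
```

A real analysis would also have to account for both DNA strands and for statistical significance, which this sketch leaves out.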

de novo assembly and population genomic survey of natural yeast isolates with the Oxford Nanopore MinION sequencer.

PubMed

Istace, Benjamin; Friedrich, Anne; d'Agata, Léo; Faye, Sébastien; Payen, Emilie; Beluche, Odette; Caradec, Claudia; Davidas, Sabrina; Cruaud, Corinne; Liti, Gianni; Lemainque, Arnaud; Engelen, Stefan; Wincker, Patrick; Schacherer, Joseph; Aury, Jean-Marc

2017-02-01

Oxford Nanopore Technologies Ltd (Oxford, UK) have recently commercialized MinION, a small single-molecule nanopore sequencer, that offers the possibility of sequencing long DNA fragments from small genomes in a matter of seconds. The Oxford Nanopore technology is truly disruptive; it has the potential to revolutionize genomic applications due to its portability, low cost, and ease of use compared with existing long reads sequencing technologies. The MinION sequencer enables the rapid sequencing of small eukaryotic genomes, such as the yeast genome. Combined with existing assembler algorithms, near complete genome assemblies can be generated and comprehensive population genomic analyses can be performed. Here, we resequenced the genome of the Saccharomyces cerevisiae S288C strain to evaluate the performance of nanopore-only assemblers. Then we de novo sequenced and assembled the genomes of 21 isolates representative of the S. cerevisiae genetic diversity using the MinION platform. The contiguity of our assemblies was 14 times higher than the Illumina-only assemblies and we obtained one or two long contigs for 65 % of the chromosomes. This high contiguity allowed us to accurately detect large structural variations across the 21 studied genomes. Because of the high completeness of the nanopore assemblies, we were able to produce a complete cartography of transposable elements insertions and inspect structural variants that are generally missed using a short-read sequencing strategy. Our analyses show that the Oxford Nanopore technology is already usable for de novo sequencing and assembly; however, non-random errors in homopolymers require polishing the consensus using an alternate sequencing technology. © The Author 2017. Published by Oxford University Press.
de novo assembly and population genomic survey of natural yeast isolates with the Oxford Nanopore MinION sequencer

PubMed Central

Istace, Benjamin; Friedrich, Anne; d'Agata, Léo; Faye, Sébastien; Payen, Emilie; Beluche, Odette; Caradec, Claudia; Davidas, Sabrina; Cruaud, Corinne; Liti, Gianni; Lemainque, Arnaud; Engelen, Stefan; Wincker, Patrick; Schacherer, Joseph

2017-01-01

Abstract Background: Oxford Nanopore Technologies Ltd (Oxford, UK) have recently commercialized MinION, a small single-molecule nanopore sequencer, that offers the possibility of sequencing long DNA fragments from small genomes in a matter of seconds. The Oxford Nanopore technology is truly disruptive; it has the potential to revolutionize genomic applications due to its portability, low cost, and ease of use compared with existing long reads sequencing technologies. The MinION sequencer enables the rapid sequencing of small eukaryotic genomes, such as the yeast genome. Combined with existing assembler algorithms, near complete genome assemblies can be generated and comprehensive population genomic analyses can be performed. Results: Here, we resequenced the genome of the Saccharomyces cerevisiae S288C strain to evaluate the performance of nanopore-only assemblers. Then we de novo sequenced and assembled the genomes of 21 isolates representative of the S. cerevisiae genetic diversity using the MinION platform. The contiguity of our assemblies was 14 times higher than the Illumina-only assemblies and we obtained one or two long contigs for 65 % of the chromosomes. This high contiguity allowed us to accurately detect large structural variations across the 21 studied genomes. Conclusion: Because of the high completeness of the nanopore assemblies, we were able to produce a complete cartography of transposable elements insertions and inspect structural variants that are generally missed using a short-read sequencing strategy. Our analyses show that the Oxford Nanopore technology is already usable for de novo sequencing and assembly; however, non-random errors in homopolymers require polishing the consensus using an alternate sequencing technology. PMID:28369459
C5-epimerase and 2-O-sulfotransferase associate in vitro to generate contiguous epimerized and 2-O-sulfated heparan sulfate domains.

PubMed

Préchoux, Aurélie; Halimi, Célia; Simorre, Jean-Pierre; Lortat-Jacob, Hugues; Laguri, Cédric

2015-04-17

Heparan sulfate (HS), a complex polysaccharide of the cell surface, is endowed with the remarkable ability to bind numerous proteins and, as such, regulates a large variety of biological processes. Protein binding depends on HS structure; however, in the absence of a template driving its biosynthesis, the mechanism by which protein binding sequences are assembled remains poorly known. Here, we developed a chemically defined 13C-labeled substrate and NMR based experiments to simultaneously follow in real time the activity of HS biosynthetic enzymes and characterize the reaction products. Using this new approach, we report that the association of C5-epimerase and 2-O-sulfotransferase, which catalyze the production of iduronic acid and its 2-O-sulfation, respectively, is necessary to processively generate extended sequences of contiguous IdoA2S-containing disaccharides, whereas modifications are randomly introduced when the enzymes are uncoupled. These data shed light on the mechanisms by which HS motifs are generated during biosynthesis. They support the view that HS structure assembly is controlled not only by the availability of the biosynthetic enzymes but also by their physical association, which in the case of the C5-epimerase and 2-O-sulfotransferase was characterized by an affinity of 80 nM as demonstrated by surface plasmon resonance experiments.
Specific and Modular Binding Code for Cytosine Recognition in Pumilio/FBF (PUF) RNA-binding Domains

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dong, Shuyun; Wang, Yang; Cassidy-Amstutz, Caleb

2011-10-28

Pumilio/fem-3 mRNA-binding factor (PUF) proteins possess a recognition code for bases A, U, and G, allowing designed RNA sequence specificity of their modular Pumilio (PUM) repeats. However, recognition side chains in a PUM repeat for cytosine are unknown. Here we report identification of a cytosine-recognition code by screening random amino acid combinations at conserved RNA recognition positions using a yeast three-hybrid system. This C-recognition code is specific and modular as specificity can be transferred to different positions in the RNA recognition sequence. A crystal structure of a modified PUF domain reveals specific contacts between an arginine side chain and the cytosine base. We applied the C-recognition code to design PUF domains that recognize targets with multiple cytosines and to generate engineered splicing factors that modulate alternative splicing. Finally, we identified a divergent yeast PUF protein, Nop9p, that may recognize natural target RNAs with cytosine. This work deepens our understanding of natural PUF protein target recognition and expands the ability to engineer PUF domains to recognize any RNA sequence.
Overview of recurrent chromosomal losses in retinoblastoma detected by low coverage next generation sequencing

PubMed Central

García-Chequer, A.J.; Méndez-Tenorio, A.; Olguín-Ruiz, G.; Sánchez-Vallejo, C.; Isa, P.; Arias, C.F.; Torres, J.; Hernández-Angeles, A.; Ramírez-Ortiz, M.A.; Lara, C.; Cabrera-Muñoz, M.L.; Sadowinski-Pine, S.; Bravo-Ortiz, J.C.; Ramón-García, G.; Diegopérez-Ramírez, J.; Ramírez-Reyes, G.; Casarrubias-Islas, R.; Ramírez, J.; Orjuela, M.A.; Ponce-Castañeda, M.V.

2016-01-01

Genes are frequently lost or gained in malignant tumors and the analysis of these changes can be informative about the underlying tumor biology. Retinoblastoma is a pediatric intraocular malignancy, and since deletions in chromosome 13 have been described in this tumor, we performed genome-wide sequencing with the Illumina platform to test whether recurrent losses could be detected in low-coverage data from DNA pools of Rb cases. An in silico reference profile for each pool was created from the human genome sequence GRCh37p5; a chromosome integrity score and a graphical 40-kb window analysis approach allowed us to identify, with high resolution, previously reported nonrandom recurrent losses in all chromosomes of these tumors. We also found a pattern of gains and losses associated with clear and dark cytogenetic bands, respectively. We further analyzed a medulloblastoma pool and found a more stable genomic profile and previously reported losses in this tumor. This approach facilitates identification of recurrent deletions from many patients that may be biologically relevant for tumor development. PMID:26883451
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery

PubMed Central

Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo

2012-01-01

Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases our tool was able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2–ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data. PMID:22570408
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery.

PubMed

Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo

2012-09-01

Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases our tool was able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2-ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data.
Pathogenesis, Molecular Genetics, and Genomics of Mycobacterium avium subsp. paratuberculosis, the Etiologic Agent of Johne’s Disease

PubMed Central

Rathnaiah, Govardhan; Zinniel, Denise K.; Bannantine, John P.; Stabel, Judith R.; Gröhn, Yrjö T.; Collins, Michael T.; Barletta, Raúl G.

2017-01-01

Mycobacterium avium subsp. paratuberculosis (MAP) is the etiologic agent of Johne’s disease in ruminants causing chronic diarrhea, malnutrition, and muscular wasting. Neonates and young animals are infected primarily by the fecal–oral route. MAP attaches to, translocates via the intestinal mucosa, and is phagocytosed by macrophages. The ensuing host cellular immune response leads to granulomatous enteritis characterized by a thick and corrugated intestinal wall. We review various tissue culture systems, ileal loops, and mice, goats, and cattle used to study MAP pathogenesis. MAP can be detected in clinical samples by microscopy, culturing, PCR, and an enzyme-linked immunosorbent assay. There are commercial vaccines that reduce clinical disease and shedding; unfortunately, their efficacies are limited and may not engender long-term protective immunity. Moreover, the potential linkage with Crohn’s disease and other human diseases makes MAP a concern as a zoonotic pathogen. Potential therapies with anti-mycobacterial agents are also discussed. The completion of the MAP K-10 genome sequence has greatly improved our understanding of MAP pathogenesis. The analysis of this sequence has identified a wide range of gene functions involved in virulence, lipid metabolism, transcriptional regulation, and main metabolic pathways. We also review the transposons utilized to generate random transposon mutant libraries and the recent advances in the post-genomic era. This includes the generation and characterization of allelic exchange mutants, transcriptomic analysis, transposon mutant banks analysis, new efforts to generate comprehensive mutant libraries, and the application of transposon site hybridization mutagenesis and transposon sequencing for global analysis of the MAP genome. Further analysis of candidate vaccine strains development is also provided with critical discussions on their benefits and shortcomings, and strategies to develop a highly efficacious live-attenuated vaccine capable of differentiating infected from vaccinated animals. PMID:29164142
Deciphering of the Dual oxidase (Nox family) gene from kuruma shrimp, Marsupenaeus japonicus: full-length cDNA cloning and characterization.

PubMed

Inada, Mari; Kihara, Keisuke; Kono, Tomoya; Sudhakaran, Raja; Mekata, Tohru; Sakai, Masahiro; Yoshida, Terutoyo; Itami, Toshiaki

2013-02-01

In many physiological processes, including the innate immune system, free radicals such as nitric oxide (NO) and reactive oxygen species (ROS) play significant roles. In humans, 2 homologs of Dual oxidases (Duox) generate hydrogen peroxide (H(2)O(2)), which is a type of ROS. Here, we report the identification and characterization of a Duox from kuruma shrimp, Marsupenaeus japonicus. The full-length cDNA sequence of the M. japonicus Dual oxidase (MjDuox) gene contains 4695 bp and was generated using reverse transcriptase-polymerase chain reaction (RT-PCR) and random amplification of cDNA ends (RACE). The open reading frame of MjDuox encodes a protein of 1498 amino acids with an estimated mass of 173 kDa. In a homology analysis using amino acid sequences, MjDuox exhibited 69.3% sequence homology with the Duox of the red flour beetle, Tribolium castaneum. A transcriptional analysis revealed that the MjDuox mRNA is highly expressed in the gills of healthy kuruma shrimp. In the gills, MjDuox expression reached its peak 60 h after injection with WSSV and decreased to its normal level at 72 h. In gene knockdown experiments of free radical-generating enzymes, the survival rates decreased during the early stages of a white spot syndrome virus (WSSV) infection following the knockdown of the NADPH oxidase (MjNox) or MjDuox genes. In the present study, the identification, cloning and gene knockdown of the kuruma shrimp MjDuox are reported. Duoxes have been identified in vertebrates and some insects; however, few reports have investigated Duoxes in crustaceans. This study is the first to identify and clone a Dual oxidase from a crustacean species. Copyright © 2012 Elsevier Ltd. All rights reserved.
'PACLIMS': a component LIM system for high-throughput functional genomic analysis.

PubMed

Donofrio, Nicole; Rajagopalon, Ravi; Brown, Douglas; Diener, Stephen; Windham, Donald; Nolin, Shelly; Floyd, Anna; Mitchell, Thomas; Galadima, Natalia; Tucker, Sara; Orbach, Marc J; Patel, Gayatri; Farman, Mark; Pampanwar, Vishal; Soderlund, Cari; Lee, Yong-Hwan; Dean, Ralph A

2005-04-12

Recent advances in sequencing techniques leading to cost reduction have resulted in the generation of a growing number of sequenced eukaryotic genomes. Computational tools greatly assist in defining open reading frames and assigning tentative annotations. However, gene functions cannot be asserted without biological support through, among other things, mutational analysis. In taking a genome-wide approach to functionally annotate an entire organism, in this application the approximately 11,000 predicted genes in the rice blast fungus (Magnaporthe grisea), an effective platform for tracking and storing both the biological materials created and the data produced across several participating institutions was required. The platform designed, named PACLIMS, was built to support our high throughput pipeline for generating 50,000 random insertion mutants of Magnaporthe grisea. To be a useful tool for materials and data tracking and storage, PACLIMS was designed to be simple to use, modifiable to accommodate refinement of research protocols, and cost-efficient. Data entry into PACLIMS was simplified through the use of barcodes and scanners, thus reducing the potential human error, time constraints, and labor. This platform was designed in concert with our experimental protocol so that it leads the researchers through each step of the process from mutant generation through phenotypic assays, thus ensuring that every mutant produced is handled in an identical manner and all necessary data is captured. Many sequenced eukaryotes have reached the point where computational analyses are no longer sufficient and require biological support for their predicted genes. Consequently, there is an increasing need for platforms that support high throughput genome-wide mutational analyses. While PACLIMS was designed specifically for this project, the source and ideas present in its implementation can be used as a model for other high throughput mutational endeavors.
'PACLIMS': A component LIM system for high-throughput functional genomic analysis

PubMed Central

Donofrio, Nicole; Rajagopalon, Ravi; Brown, Douglas; Diener, Stephen; Windham, Donald; Nolin, Shelly; Floyd, Anna; Mitchell, Thomas; Galadima, Natalia; Tucker, Sara; Orbach, Marc J; Patel, Gayatri; Farman, Mark; Pampanwar, Vishal; Soderlund, Cari; Lee, Yong-Hwan; Dean, Ralph A

2005-01-01

Background: Recent advances in sequencing techniques leading to cost reduction have resulted in the generation of a growing number of sequenced eukaryotic genomes. Computational tools greatly assist in defining open reading frames and assigning tentative annotations. However, gene functions cannot be asserted without biological support through, among other things, mutational analysis. In taking a genome-wide approach to functionally annotate an entire organism, in this application the ~11,000 predicted genes in the rice blast fungus (Magnaporthe grisea), an effective platform for tracking and storing both the biological materials created and the data produced across several participating institutions was required. Results: The platform designed, named PACLIMS, was built to support our high throughput pipeline for generating 50,000 random insertion mutants of Magnaporthe grisea. To be a useful tool for materials and data tracking and storage, PACLIMS was designed to be simple to use, modifiable to accommodate refinement of research protocols, and cost-efficient. Data entry into PACLIMS was simplified through the use of barcodes and scanners, thus reducing the potential human error, time constraints, and labor. This platform was designed in concert with our experimental protocol so that it leads the researchers through each step of the process from mutant generation through phenotypic assays, thus ensuring that every mutant produced is handled in an identical manner and all necessary data is captured. Conclusion: Many sequenced eukaryotes have reached the point where computational analyses are no longer sufficient and require biological support for their predicted genes. Consequently, there is an increasing need for platforms that support high throughput genome-wide mutational analyses. While PACLIMS was designed specifically for this project, the source and ideas present in its implementation can be used as a model for other high throughput mutational endeavors. PMID:15826298
Acceptability and performance of the menstrual cup in South Africa: a randomized crossover trial comparing the menstrual cup to tampons or sanitary pads.

PubMed

Beksinska, Mags E; Smit, Jenni; Greener, Ross; Todd, Catherine S; Lee, Mei-ling Ting; Maphumulo, Virginia; Hoffmann, Vivian

2015-02-01

In low-income settings, many women and girls face activity restrictions during menses, owing to lack of affordable menstrual products. The menstrual cup (MC) is a nonabsorbent reusable cup that collects menstrual blood. We assessed the acceptability and performance of the MPower® MC compared to pads or tampons among women in a low-resource setting. We conducted a randomized two-period crossover trial at one site in Durban, South Africa, between January and November 2013. Participants aged 18-45 years with regular menstrual cycles were eligible for inclusion if they had no intention of becoming pregnant, were using an effective contraceptive method, had water from the municipal system as their primary water source, and had no sexually transmitted infections. We used a computer-generated randomization sequence to assign participants to one of two sequences of menstrual product use, with allocation concealed only from the study investigators. Participants used each method over three menstrual cycles (total 6 months) and were interviewed at baseline and monthly follow-up visits. The product acceptability outcome compared product satisfaction question scores using an ordinal logistic regression model with individual random effects. This study is registered on the South African Clinical Trials database: number DOH-27-01134273. Of 124 women assessed, 110 were eligible and randomly assigned to selected menstrual products. One hundred and five women completed all follow-up visits. By comparison to pads/tampons (usual product used), the MC was rated significantly better for comfort, quality, menstrual blood collection, appearance, and preference. Both of these comparative outcome measures, along with likelihood of continued use, recommending the product, and future purchase, increased for the MC over time. MC acceptance in a population of novice users, many with limited experience with tampons, indicates that there is a pool of potential users in low-resource settings.
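A computer-generated randomization sequence for a two-period crossover design can be sketched as below. The abstract does not say which algorithm was used, so the permuted-block scheme, the block size, the seed, and the labels `cup-first` and `usual-first` are assumptions made purely for illustration.

```python
import random

def crossover_allocation(n_participants, block_size=4, seed=2013):
    """Assign participants to one of two product-use sequences.

    Hypothetical permuted-block scheme: every block contains equal numbers
    of 'cup-first' and 'usual-first' assignments in shuffled order.
    """
    if block_size % 2:
        raise ValueError("block_size must be even")
    rng = random.Random(seed)  # a fixed seed makes the list reproducible
    allocation = []
    while len(allocation) < n_participants:
        block = ["cup-first", "usual-first"] * (block_size // 2)
        rng.shuffle(block)
        allocation.extend(block)
    return allocation[:n_participants]

allocation = crossover_allocation(110)
```

Blocking keeps the two arms close to balanced at every point during enrolment; a simple coin flip per participant would not guarantee that.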
Survey and Analysis of Microsatellites in the Silkworm, Bombyx mori

PubMed Central

Prasad, M. Dharma; Muthulakshmi, M.; Madhu, M.; Archak, Sunil; Mita, K.; Nagaraju, J.

2005-01-01

We studied microsatellite frequency and distribution in 21.76-Mb random genomic sequences, 0.67-Mb BAC sequences from the Z chromosome, and 6.3-Mb EST sequences of Bombyx mori. We mined microsatellites of ≥15 bases of mononucleotide repeats and ≥5 repeat units of other classes of repeats. We estimated that microsatellites account for 0.31% of the genome of B. mori. Microsatellite tracts of A, AT, and ATT were the most abundant whereas their number drastically decreased as the length of the repeat motif increased. In general, tri- and hexanucleotide repeats were overrepresented in the transcribed sequences except TAA, GTA, and TGA, which were in excess in genomic sequences. The Z chromosome sequences contained shorter repeat types than the rest of the chromosomes in addition to a higher abundance of AT-rich repeats. Our results showed that base composition of the flanking sequence has an influence on the origin and evolution of microsatellites. Transitions/transversions were high in microsatellites of ESTs, whereas the genomic sequence had an equal number of substitutions and indels. The average heterozygosity value for 23 polymorphic microsatellite loci surveyed in 13 diverse silkmoth strains having 2–14 alleles was 0.54. Only 36 (18.2%) of 198 microsatellite loci were polymorphic between the two divergent silkworm populations and 10 (5%) loci revealed null alleles. The microsatellite map generated using these polymorphic markers resulted in 8 linkage groups. B. mori microsatellite loci were the most conserved in its immediate ancestor, B. mandarina, followed by the wild saturniid silkmoth, Antheraea assama. PMID:15371363
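The mining thresholds quoted above (mononucleotide runs of at least 15 bases, at least 5 repeat units for other classes) can be illustrated with a small regular-expression scan. This is a sketch for perfect repeats only; the function name, the upper motif length of 6, and the example sequence are assumptions and not the authors' pipeline.

```python
import re

def find_microsatellites(seq, min_mono_len=15, min_units=5, max_motif=6):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    # Mononucleotide runs of at least min_mono_len bases.
    for m in re.finditer(r"([ACGT])\1{%d,}" % (min_mono_len - 1), seq):
        hits.append((m.start(), m.group(1), len(m.group(0))))
    # Di- to hexanucleotide motifs repeated at least min_units times.
    for k in range(2, max_motif + 1):
        pattern = r"([ACGT]{%d})\1{%d,}" % (k, min_units - 1)
        for m in re.finditer(pattern, seq):
            motif = m.group(1)
            # Skip motifs that are repeats of a shorter unit (e.g. "AA", "ATAT").
            if any(k % d == 0 and motif == motif[:d] * (k // d) for d in range(1, k)):
                continue
            hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

example = "GG" + "A" * 16 + "CC" + "AT" * 6 + "GC" + "ATT" * 5 + "G"
hits = find_microsatellites(example)
```

For the example sequence, `hits` contains one A run, one AT repeat, and one ATT repeat.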
JVM: Java Visual Mapping tool for next generation sequencing read.

PubMed

Yang, Ye; Liu, Juan

2015-01-01

We developed JVM (Java Visual Mapping), a program for mapping next-generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to handle millions of short reads generated by Illumina sequencing technology. It employs a seed index strategy and octal encoding operations for sequence alignment. JVM is useful for DNA-Seq and RNA-Seq when dealing with single-end resequencing. JVM is a desktop application that supports read data sets from 1 MB to 10 GB.
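A seed index strategy of the general kind mentioned here can be sketched as follows: index every k-mer of the reference, look up seeds taken from the read, and verify each candidate position by counting mismatches. This is not JVM's implementation (which is in Java and uses an octal encoding not reproduced here); the function names, seed length, and mismatch limit are assumptions.

```python
from collections import defaultdict

def build_seed_index(reference, k=8):
    """Map every k-mer in the reference to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(reference) - k + 1):
        index[reference[i:i + k]].append(i)
    return index

def map_read(read, reference, index, k=8, max_mismatches=2):
    """Return (position, mismatches) of the best ungapped hit, or None."""
    best = None
    # Use seeds at several offsets so a mismatch inside one seed does not lose the read.
    for offset in range(0, len(read) - k + 1, k):
        for hit in index.get(read[offset:offset + k], ()):
            start = hit - offset
            if start < 0 or start + len(read) > len(reference):
                continue
            window = reference[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(read, window))
            if mismatches <= max_mismatches and (best is None or mismatches < best[1]):
                best = (start, mismatches)
    return best

reference = "ACGTTGCATGCCGATAGCTAGGCTTACGATCGATCGGATTACAGCTAGCTAAGGCTA"
index = build_seed_index(reference)
read = reference[10:34]
```

Calling `map_read(read, reference, index)` recovers the position the read was cut from; a read with a single substitution is still placed because the seeds taken at later offsets remain exact.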
Physiology is rocking the foundations of evolutionary biology.

PubMed

Noble, Denis

2013-08-01

The 'Modern Synthesis' (Neo-Darwinism) is a mid-20th century gene-centric view of evolution, based on random mutations accumulating to produce gradual change through natural selection. Any role of physiological function in influencing genetic inheritance was excluded. The organism became a mere carrier of the real objects of selection, its genes. We now know that genetic change is far from random and often not gradual. Molecular genetics and genome sequencing have deconstructed this unnecessarily restrictive view of evolution in a way that reintroduces physiological function and interactions with the environment as factors influencing the speed and nature of inherited change. Acquired characteristics can be inherited, and in a few but growing number of cases that inheritance has now been shown to be robust for many generations. The 21st century can look forward to a new synthesis that will reintegrate physiology with evolutionary biology.
Ocean biogeochemistry modeled with emergent trait-based genomics.

PubMed

Coles, V J; Stukel, M R; Brooks, M T; Burd, A; Crump, B C; Moran, M A; Paul, J H; Satinsky, B M; Yager, P L; Zielinski, B L; Hood, R R

2017-12-01

Marine ecosystem models have advanced to incorporate metabolic pathways discovered with genomic sequencing, but direct comparisons between models and "omics" data are lacking. We developed a model that directly simulates metagenomes and metatranscriptomes for comparison with observations. Model microbes were randomly assigned genes for specialized functions, and communities of 68 species were simulated in the Atlantic Ocean. Unfit organisms were replaced, and the model self-organized to develop community genomes and transcriptomes. Emergent communities from simulations that were initialized with different cohorts of randomly generated microbes all produced realistic vertical and horizontal ocean nutrient, genome, and transcriptome gradients. Thus, the library of gene functions available to the community, rather than the distribution of functions among specific organisms, drove community assembly and biogeochemical gradients in the model ocean. Copyright © 2017 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
Genetic relatedness between oral and intestinal isolates of Porphyromonas endodontalis by analysis of random amplified polymorphic DNA.

PubMed

Gonçalves, R B; Väisänen, M L; Van Steenbergen, T J; Sundqvist, G; Mouton, C

1999-01-01

Genomic fingerprints from the DNA of 27 strains of Porphyromonas endodontalis from diverse clinical and geographic origins were generated as random amplified polymorphic DNA (RAPD) using the technique of PCR amplification with a single primer of arbitrary sequence. Cluster analysis of the combined RAPD data obtained with three selected 9- or 10-mer-long primers identified 25 distinct RAPD types which clustered as three main groups identifying three genogroups. Genogroups I and II included exclusively P. endodontalis isolates of oral origin, while 7/9 human intestinal strains of genogroup III which linked at a similarity level of 52% constituted the most homogeneous group in our study. Genotypic diversity within P. endodontalis, as shown by RAPD analysis, suggests that the taxon is composed of two oral genogroups and one intestinal genogroup. This hypothesis remains to be confirmed.
Color image encryption by using Yang-Gu mixture amplitude-phase retrieval algorithm in gyrator transform domain and two-dimensional Sine logistic modulation map

NASA Astrophysics Data System (ADS)

Sui, Liansheng; Liu, Benqing; Wang, Qiang; Li, Ye; Liang, Junli

2015-12-01

A color image encryption scheme is proposed based on the Yang-Gu mixture amplitude-phase retrieval algorithm and a two-coupled logistic map in the gyrator transform domain. First, the color plaintext image is decomposed into red, green and blue components, which are scrambled individually by three random sequences generated by using the two-dimensional Sine logistic modulation map. Second, each scrambled component is encrypted into a real-valued function with stationary white noise distribution in the iterative amplitude-phase retrieval process in the gyrator transform domain, and then the three obtained functions are considered as red, green and blue channels to form the color ciphertext image. The ciphertext image is therefore a real-valued function, which is more convenient to store and transmit. In the encryption and decryption processes, the chaotic random phase mask generated based on the logistic map is employed as the phase key, which means that only the initial values are used as the private key and key management is convenient. Meanwhile, the security of the cryptosystem is enhanced greatly because of the high sensitivity of the private keys. Simulation results are presented to prove the security and robustness of the proposed scheme.
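The scrambling step can be illustrated with a much simpler stand-in: a one-dimensional logistic map whose rank order defines a pixel permutation. This sketch deliberately swaps in the plain logistic map for the two-dimensional Sine logistic modulation map named in the abstract; the parameter r = 3.99, the burn-in length, and all function names are assumptions.

```python
def logistic_sequence(x0, n, r=3.99, burn_in=100):
    """Iterate x -> r*x*(1-x) and return n values after discarding a burn-in."""
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    values = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        values.append(x)
    return values

def scramble(pixels, x0):
    """Permute pixels by the rank order of a chaotic sequence keyed by x0."""
    chaos = logistic_sequence(x0, len(pixels))
    order = sorted(range(len(pixels)), key=chaos.__getitem__)
    return [pixels[i] for i in order], order

def unscramble(scrambled, order):
    """Invert the permutation produced by scramble()."""
    restored = [None] * len(scrambled)
    for dest, src in enumerate(order):
        restored[src] = scrambled[dest]
    return restored

red_channel = list(range(16))  # stand-in for one flattened colour component
mixed, order = scramble(red_channel, x0=0.3141592)
```

Because the permutation depends only on the initial value `x0`, a receiver who knows `x0` can regenerate `order` and undo the scrambling, which is the sense in which the initial values act as the key.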
Pseudo-Random Number Generator Based on Coupled Map Lattices

NASA Astrophysics Data System (ADS)

Lü, Huaping; Wang, Shihong; Hu, Gang

A one-way coupled chaotic map lattice is used for generating pseudo-random numbers. It is shown that with suitable cooperative applications of both chaotic and conventional approaches, the output of the spatiotemporally chaotic system can easily meet the practical requirements of random numbers, i.e., excellent random statistical properties, long periodicity of computer realizations, and fast speed of random number generations. This pseudo-random number generator system can be used as ideal synchronous and self-synchronizing stream cipher systems for secure communications.
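A one-way coupled map lattice can be sketched as a chain of logistic maps in which each site is driven only by its left neighbour. The coupling strength, the map parameter, the warm-up length, and the way bytes are taken from the last site are all assumptions for illustration; the abstract does not give these details, and this sketch is not suitable for cryptographic use.

```python
def cml_bytes(seed_state, n_bytes, eps=0.95, a=3.99, warmup=200):
    """Return pseudo-random bytes from a one-way coupled logistic lattice."""
    def f(x):
        return a * x * (1.0 - x)

    state = list(seed_state)
    out = bytearray()
    step = 0
    while len(out) < n_bytes:
        fx = [f(x) for x in state]
        # Site 0 runs freely; site j > 0 is driven by site j - 1 only.
        state = [fx[0]] + [
            (1.0 - eps) * fx[j] + eps * fx[j - 1] for j in range(1, len(state))
        ]
        step += 1
        if step > warmup:
            # Keep low-order bits of the last site, which vary fastest.
            out.append(int(state[-1] * 2**32) & 0xFF)
    return bytes(out)

stream = cml_bytes([0.123, 0.456, 0.789, 0.321], 64)
```

The seed state plays the role of the key: the same initial lattice values always reproduce the same byte stream.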
System, method and apparatus for generating phrases from a database

NASA Technical Reports Server (NTRS)

McGreevy, Michael W. (Inventor)

2004-01-01

Phrase generation is a method of generating sequences of terms, such as phrases, that may occur within a database of subsets containing sequences of terms, such as text. A database is provided and a relational model of the database is created. A query is then input. The query includes a term or a sequence of terms or multiple individual terms or multiple sequences of terms or combinations thereof. Next, several sequences of terms that are contextually related to the query are assembled from contextual relations in the model of the database. The sequences of terms are then sorted and output. Phrase generation can also be an iterative process used to produce sequences of terms from a relational model of a database.

Generating random numbers by means of nonlinear dynamic systems

NASA Astrophysics Data System (ADS)

Zang, Jiaqi; Hu, Haojie; Zhong, Juhua; Luo, Duanbin; Fang, Yi

2018-07-01

To introduce the randomness of a physical process to students, a chaotic pendulum experiment was introduced at the undergraduate level in the physics department of East China University of Science and Technology (ECUST). It was shown that chaotic motion could be initiated by adjusting the operation of a chaotic pendulum. By using the angular displacement data of the chaotic motion, random binary numerical arrays can be generated. To check the randomness of the generated numerical arrays, the NIST Special Publication 800-20 method was adopted. As a result, it was found that all the random arrays generated by the chaotic motion could pass the validity criteria, and some of them were of even better quality than pseudo-random numbers generated by a computer. The experiments demonstrate that a chaotic pendulum can be used as an efficient mechanical facility for generating random numbers and can be applied in teaching random motion to students.
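A randomness check of this kind can be illustrated with the frequency (monobit) test from the NIST SP 800-22 statistical test suite. The abstract does not say which individual tests were run or how the angles were converted to bits, so the median-threshold digitisation and the choice of this single test are assumptions.

```python
import math

def displacements_to_bits(angles):
    """Hypothetical digitisation: 1 if a sample exceeds the median, else 0."""
    ordered = sorted(angles)
    median = ordered[len(ordered) // 2]
    return [1 if a > median else 0 for a in angles]

def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for the balance of ones and zeros."""
    n = len(bits)
    s = sum(2 * b - 1 for b in bits)  # +1 for each one, -1 for each zero
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

balanced = [0, 1] * 500
biased = [1] * 900 + [0] * 100
```

A sequence is conventionally accepted when the p-value is at least 0.01. The `balanced` list passes trivially even though it is perfectly periodic, which is why a full suite applies many complementary tests rather than this one alone.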
Extracting random numbers from quantum tunnelling through a single diode.

PubMed

Bernardo-Gavito, Ramón; Bagci, Ibrahim Ethem; Roberts, Jonathan; Sexton, James; Astbury, Benjamin; Shokeir, Hamzah; McGrath, Thomas; Noori, Yasir J; Woodhead, Christopher S; Missous, Mohamed; Roedig, Utz; Young, Robert J

2017-12-19

Random number generation is crucial in many aspects of everyday life, as online security and privacy depend ultimately on the quality of random numbers. Many current implementations are based on pseudo-random number generators, but information security requires true random numbers for sensitive applications like key generation in banking, defence or even social media. True random number generators are systems whose outputs cannot be determined, even if their internal structure and response history are known. Sources of quantum noise are thus ideal for this application due to their intrinsic uncertainty. In this work, we propose using resonant tunnelling diodes as practical true random number generators based on a quantum mechanical effect. The output of the proposed devices can be directly used as a random stream of bits or can be further distilled using randomness extraction algorithms, depending on the application.
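One classic randomness extraction algorithm is the von Neumann extractor, shown here as an example of how a raw bit stream might be distilled. The abstract does not name the extractor used, so this choice is an assumption; it removes bias only if the raw bits are independent with a constant bias.

```python
def von_neumann_extract(bits):
    """Debias a bit stream: 01 -> 0, 10 -> 1, while 00 and 11 are discarded."""
    out = []
    for first, second in zip(bits[0::2], bits[1::2]):
        if first != second:
            out.append(first)
    return out

raw = [0, 1, 1, 0, 1, 1, 0, 0, 1, 0]
distilled = von_neumann_extract(raw)
```

The price of the debiasing is throughput: at best one output bit is produced for every four raw bits on average, and fewer when the source is strongly biased.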
IDENTIFICATION OF AVIAN-SPECIFIC FECAL METAGENOMIC SEQUENCES USING GENOME FRAGMENT ENRICHMENTS

EPA Science Inventory

Sequence analysis of microbial genomes has provided biologists the opportunity to compare genetic differences between closely related microorganisms. While random sequencing has also been used to study natural microbial communities, metagenomic comparisons via sequencing analysis...
Evolution in a Test Tube: Exploring the Structure and Function of RNA Probes

DTIC Science & Technology

2008-05-02

Bartel, D.P. and Szostak, J.W. (1993) Isolation of New Ribozymes from a Large Pool of Random Sequences. Science, New Series 261, 1141-1418. 24...Szostak, J.W. (1993) Isolation of New Ribozymes from a Large Pool of Random Sequences. Science, New Series 261, 1141-1418. Chen, Ying; Carlini
Generating and using truly random quantum states in Mathematica

NASA Astrophysics Data System (ADS)

Miszczak, Jarosław Adam

2012-01-01

The problem of generating random quantum states is of great interest from the quantum information theory point of view. In this paper we present a package for Mathematica computing system harnessing a specific piece of hardware, namely Quantis quantum random number generator (QRNG), for investigating statistical properties of quantum states. The described package implements a number of functions for generating random states, which use Quantis QRNG as a source of randomness. It also provides procedures which can be used in simulations not related directly to quantum information processing. Program summary. Program title: TRQS; Catalogue identifier: AEKA_v1_0; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEKA_v1_0.html; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html; No. of lines in distributed program, including test data, etc.: 7924; No. of bytes in distributed program, including test data, etc.: 88 651; Distribution format: tar.gz; Programming language: Mathematica, C; Computer: Requires a Quantis quantum random number generator (QRNG, http://www.idquantique.com/true-random-number-generator/products-overview.html) and supporting a recent version of Mathematica; Operating system: Any platform supporting Mathematica; tested with GNU/Linux (32 and 64 bit); RAM: Case dependent; Classification: 4.15; Nature of problem: Generation of random density matrices; Solution method: Use of a physical quantum random number generator; Running time: Generating 100 random numbers takes about 1 second, generating 1000 random density matrices takes more than a minute.
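Generation of a random density matrix can be sketched with one standard construction: draw a complex Gaussian (Ginibre) matrix G and normalise G G† by its trace. This is not the TRQS package's code (which is written for Mathematica and reads a hardware QRNG); the seeded software generator below is only a stand-in for the hardware source, and the function name is an assumption.

```python
import random

def random_density_matrix(dim, rng):
    """Return a dim x dim density matrix rho = G G^dagger / Tr(G G^dagger)."""
    # G has independent complex Gaussian entries (Ginibre ensemble).
    g = [[complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(dim)]
         for _ in range(dim)]
    # (G G^dagger)[i][j] = sum_k G[i][k] * conj(G[j][k])
    ggd = [[sum(g[i][k] * g[j][k].conjugate() for k in range(dim))
            for j in range(dim)] for i in range(dim)]
    trace = sum(ggd[i][i] for i in range(dim)).real
    return [[entry / trace for entry in row] for row in ggd]

rng = random.Random(7)  # pseudo-random stand-in for a hardware QRNG
rho = random_density_matrix(2, rng)
```

The result is Hermitian, positive semidefinite, and has unit trace by construction, which are the defining properties of a density matrix.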
Quantum random number generation for loophole-free Bell tests

NASA Astrophysics Data System (ADS)

Mitchell, Morgan; Abellan, Carlos; Amaya, Waldimar

2015-05-01

We describe the generation of quantum random numbers at multi-Gbps rates, combined with real-time randomness extraction, to give very high purity random numbers based on quantum events at most tens of ns in the past. The system satisfies the stringent requirements of quantum non-locality tests that aim to close the timing loophole. We describe the generation mechanism using spontaneous-emission-driven phase diffusion in a semiconductor laser, digitization, and extraction by parity calculation using multi-GHz logic chips. We pay special attention to experimental proof of the quality of the random numbers and analysis of the randomness extraction. In contrast to widely-used models of randomness generators in the computer science literature, we argue that randomness generation by spontaneous emission can be extracted from a single source.
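The abstract names extraction by parity calculation but gives no further detail. The Python sketch below illustrates the general idea only, under the assumption that the raw bits are independent; the block size and function names are hypothetical. XORing a block of biased bits yields one output bit whose bias shrinks geometrically with the block size (the piling-up lemma).

```python
from functools import reduce
from operator import xor


def parity_extract(raw_bits, block_size):
    """XOR each complete block of raw bits into a single output bit.

    Trailing bits that do not fill a whole block are discarded.
    """
    usable = len(raw_bits) - len(raw_bits) % block_size
    return [reduce(xor, raw_bits[i:i + block_size])
            for i in range(0, usable, block_size)]


def parity_bias(p_one, block_size):
    """Bias |P(1) - 1/2| of the parity of `block_size` independent bits.

    Each input bit is 1 with probability `p_one`; by the piling-up lemma
    the parity has bias 0.5 * |1 - 2 * p_one| ** block_size.
    """
    return abs(0.5 * (1.0 - 2.0 * p_one) ** block_size)
```

For instance, under this independence assumption a raw source with P(1) = 0.6 has a bias of 0.1 per bit, while the parity of four such bits has a bias of 0.0008, at the cost of a fourfold lower output rate.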
Entropy and long-range memory in random symbolic additive Markov chains

NASA Astrophysics Data System (ADS)

Melnik, S. S.; Usatenko, O. V.

2016-06-01

The goal of this paper is to develop an estimate for the entropy of random symbolic sequences with elements belonging to a finite alphabet. As a plausible model, we use the high-order additive stationary ergodic Markov chain with long-range memory. Supposing that the correlations between random elements of the chain are weak, we express the conditional entropy of the sequence by means of the symbolic pair correlation function. We also examine an algorithm for estimating the conditional entropy of finite symbolic sequences. We show that the entropy contains two contributions, i.e., the correlation and the fluctuation. The obtained analytical results are used for numerical evaluation of the entropy of written English texts and DNA nucleotide sequences. The developed theory opens the way for constructing a more consistent and sophisticated approach to describe the systems with strong short-range and weak long-range memory.
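To make the notion of conditional entropy of a finite symbolic sequence concrete, here is a minimal Python sketch of the textbook plug-in estimator, h_L = H_(L+1) - H_L, where H_L is the Shannon entropy of words of length L. This is not the correlation-function method of the paper; it is a simpler baseline, and the function names are hypothetical.

```python
import math
from collections import Counter


def block_entropy(seq, length):
    """Plug-in Shannon entropy (in bits) of overlapping words of `length` symbols."""
    if length == 0:
        return 0.0
    words = Counter(tuple(seq[i:i + length])
                    for i in range(len(seq) - length + 1))
    total = sum(words.values())
    return -sum(c / total * math.log2(c / total) for c in words.values())


def conditional_entropy(seq, memory):
    """Estimate h_L = H_(L+1) - H_L: entropy of the next symbol given `memory` previous ones."""
    return block_entropy(seq, memory + 1) - block_entropy(seq, memory)
```

For a strictly alternating sequence such as "ABAB...", the estimate is about 1 bit with no memory and about 0 bits once one previous symbol is known. Note that plug-in estimates become unreliable when the number of possible words approaches the sequence length, which is one reason finite-sequence effects need separate treatment.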
Entropy and long-range memory in random symbolic additive Markov chains.

PubMed

Melnik, S S; Usatenko, O V

2016-06-01

The goal of this paper is to develop an estimate for the entropy of random symbolic sequences with elements belonging to a finite alphabet. As a plausible model, we use the high-order additive stationary ergodic Markov chain with long-range memory. Supposing that the correlations between random elements of the chain are weak, we express the conditional entropy of the sequence by means of the symbolic pair correlation function. We also examine an algorithm for estimating the conditional entropy of finite symbolic sequences. We show that the entropy contains two contributions, i.e., the correlation and the fluctuation. The obtained analytical results are used for numerical evaluation of the entropy of written English texts and DNA nucleotide sequences. The developed theory opens the way for constructing a more consistent and sophisticated approach to describe the systems with strong short-range and weak long-range memory.
K-Ras(G12D)-selective inhibitory peptides generated by random peptide T7 phage display technology.

PubMed

Sakamoto, Kotaro; Kamada, Yusuke; Sameshima, Tomoya; Yaguchi, Masahiro; Niida, Ayumu; Sasaki, Shigekazu; Miwa, Masanori; Ohkubo, Shoichi; Sakamoto, Jun-Ichi; Kamaura, Masahiro; Cho, Nobuo; Tani, Akiyoshi

2017-03-11

Amino-acid mutations of Gly 12 (e.g. G12D, G12V, G12C) of V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (K-Ras), the most promising drug target in cancer therapy, are major growth drivers in various cancers. Although over 30 years have passed since the discovery of these mutations in most cancer patients, effective mutated K-Ras inhibitors have not been marketed. Here, we report novel and selective inhibitory peptides to K-Ras(G12D). We screened random peptide libraries displayed on T7 phage against purified recombinant K-Ras(G12D), with thorough subtraction of phages bound to wild-type K-Ras, and obtained KRpep-2 (Ac-RRCPLYISYDPVCRR-NH2) as a consensus sequence. KRpep-2 showed more than 10-fold binding- and inhibition-selectivity to K-Ras(G12D), both in SPR analysis and GDP/GTP exchange enzyme assay. KD and IC50 values were 51 and 8.9 nM, respectively. After subsequent sequence optimization, we successfully generated KRpep-2d (Ac-RRRRCPLYISYDPVCRRRR-NH2) that inhibited enzyme activity of K-Ras(G12D) with IC50 = 1.6 nM and significantly suppressed ERK-phosphorylation, downstream of K-Ras(G12D), along with A427 cancer cell proliferation at 30 μM peptide concentration. To our knowledge, this is the first report of a K-Ras(G12D)-selective inhibitor, contributing to the development and study of K-Ras(G12D)-targeting drugs. Copyright © 2017 Elsevier Inc. All rights reserved.
An Image Encryption Algorithm Utilizing Julia Sets and Hilbert Curves

PubMed Central

Sun, Yuanyuan; Chen, Lina; Xu, Rudan; Kong, Ruiqing

2014-01-01

Image encryption is an important and effective technique to protect image security. In this paper, a novel image encryption algorithm combining Julia sets and Hilbert curves is proposed. The algorithm uses the parameters of Julia sets to generate a random sequence as the initial keys and obtains the final encryption keys by scrambling the initial keys through the Hilbert curve. The final cipher image is obtained by modulo arithmetic and a diffusion operation. The method needs only a few parameters for key generation, which greatly reduces the storage space. Moreover, because of the Julia sets' properties, such as infiniteness and chaotic characteristics, the keys are highly sensitive even to a tiny perturbation. The experimental results indicate that the algorithm has a large key space, good statistical properties, high key sensitivity, and effective resistance to the chosen-plaintext attack. PMID:24404181
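The abstract does not describe how the Hilbert curve is applied to the keys. As one possible reading, the Python sketch below reorders a square, row-major key matrix by visiting its cells along a Hilbert curve. The index-to-coordinate conversion is the standard iterative algorithm; the function names and the scrambling convention are assumptions, and the side length must be a power of two.

```python
def hilbert_d2xy(side, d):
    """Map distance `d` along a Hilbert curve to (x, y) on a side x side grid.

    `side` must be a power of two and 0 <= d < side * side.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Reflect the sub-square before rotating it.
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_scramble(keys, side):
    """Reorder a row-major list of side*side keys by walking the Hilbert curve."""
    if len(keys) != side * side:
        raise ValueError("keys must contain side * side elements")
    scrambled = []
    for d in range(side * side):
        x, y = hilbert_d2xy(side, d)
        scrambled.append(keys[y * side + x])
    return scrambled
```

Because the curve visits every cell exactly once, the scrambling is a permutation and can be inverted by writing the values back along the same path.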
J3Gen: A PRNG for Low-Cost Passive RFID

PubMed Central

Melià-Seguí, Joan; Garcia-Alfaro, Joaquin; Herrera-Joancomartí, Jordi

2013-01-01

Pseudorandom number generation (PRNG) is the main security tool in low-cost passive radio-frequency identification (RFID) technologies, such as EPC Gen2. We present a lightweight PRNG design for low-cost passive RFID tags, named J3Gen. J3Gen is based on a linear feedback shift register (LFSR) configured with multiple feedback polynomials. The polynomials are alternated during the generation of sequences via a physical source of randomness. J3Gen successfully handles the inherent linearity of LFSR based PRNGs and satisfies the statistical requirements imposed by the EPC Gen2 standard. A hardware implementation of J3Gen is presented and evaluated with regard to different design parameters, defining the key-equivalence security and nonlinearity of the design. The results of a SPICE simulation confirm the power-consumption suitability of the proposal. PMID:23519344
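The idea of an LFSR whose feedback polynomial is switched by a random source can be sketched as follows. This is a hedged Python illustration, not the J3Gen design: the 16-bit width, the two tap sets, the switching interval `switch_every`, and the use of `secrets` as a stand-in for the physical randomness source are all assumptions.

```python
import secrets

WIDTH = 16
# Illustrative tap sets (1-based bit positions); not taken from the J3Gen paper.
TAP_SETS = ((16, 14, 13, 11), (16, 15, 13, 4))


def lfsr_step(state, taps):
    """Advance a Fibonacci LFSR by one step; return (new_state, output_bit)."""
    feedback = 0
    for tap in taps:
        feedback ^= (state >> (WIDTH - tap)) & 1
    output_bit = state & 1
    new_state = (state >> 1) | (feedback << (WIDTH - 1))
    return new_state, output_bit


def multi_poly_bits(seed, count, switch_every=8,
                    pick=lambda: secrets.randbelow(len(TAP_SETS))):
    """Generate `count` bits, re-selecting the tap set every `switch_every` bits.

    `pick` returns the index of the next tap set; the default draws it from
    the operating system's randomness as a stand-in for a physical source.
    """
    state = seed & ((1 << WIDTH) - 1)
    if state == 0:
        raise ValueError("seed must be non-zero in its low 16 bits")
    taps = TAP_SETS[pick()]
    bits = []
    for i in range(count):
        if i and i % switch_every == 0:
            taps = TAP_SETS[pick()]
        state, bit = lfsr_step(state, taps)
        bits.append(bit)
    return bits
```

With a fixed polynomial, the output of an LFSR satisfies a linear recurrence that can be recovered from a short run of output bits; switching polynomials unpredictably is meant to break that single recurrence. Passing a deterministic `pick` (for example `lambda: 0`) makes the sketch reproducible for testing.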
NHash: Randomized N-Gram Hashing for Distributed Generation of Validatable Unique Study Identifiers in Multicenter Research.

PubMed

Zhang, Guo-Qiang; Tao, Shiqiang; Xing, Guangming; Mozes, Jeno; Zonjy, Bilal; Lhatoo, Samden D; Cui, Licong

2015-11-10

A unique study identifier serves as a key for linking research data about a study subject without revealing protected health information in the identifier. While sufficient for single-site and limited-scale studies, the use of common unique study identifiers has several drawbacks for large multicenter studies, where thousands of research participants may be recruited from multiple sites. An important property of study identifiers is error tolerance (validatability): inadvertent editing mistakes during their transmission and use will most likely result in invalid study identifiers. This paper introduces a novel method called "Randomized N-gram Hashing (NHash)" for generating unique study identifiers in a distributed and validatable fashion in multicenter research. NHash has a unique set of properties: (1) it is a pseudonym serving the purpose of linking research data about a study participant for research purposes; (2) it can be generated automatically in a completely distributed fashion with virtually no risk for identifier collision; (3) it incorporates a set of cryptographic hash functions based on N-grams, with a combination of additional encryption techniques such as a shift cipher; (4) it is validatable (error tolerant) in the sense that inadvertent edit errors will mostly result in invalid identifiers. NHash consists of 2 phases. In the first phase, an intermediate string is generated using randomized N-gram hashing. This string consists of a collection of N-gram hashes f1, f2, ..., fk. The input for each function fi has 3 components: a random number r, an integer n, and input data m. The result, fi(r, n, m), is an n-gram of m with a starting position s, which is computed as (r mod |m|), where |m| represents the length of m. The output of Phase 1 is the concatenation of the sequence f1(r1, n1, m1), f2(r2, n2, m2), ..., fk(rk, nk, mk). In the second phase, the intermediate string generated in Phase 1 is encrypted using techniques such as a shift cipher. The result of the encryption, concatenated with the random number r, is the final NHash study identifier. We performed experiments using a large synthesized dataset comparing NHash with random strings, and demonstrated negligible probability of collision. We implemented NHash for the Center for SUDEP Research (CSR), a National Institute for Neurological Disorders and Stroke-funded Center Without Walls for Collaborative Research in the Epilepsies. This multicenter collaboration involves 14 institutions across the United States and Europe, bringing together extensive and diverse expertise to understand sudden unexpected death in epilepsy patients (SUDEP). The CSR Data Repository has successfully used NHash to link deidentified multimodal clinical data collected in participating CSR institutions, meeting all desired objectives of NHash.
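The two phases described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the published NHash implementation: it uses a single random number r and a single n for all input fields, wraps around the end of m so that every n-gram has length n, uses a hypothetical 36-character alphabet for the shift cipher, and appends r as six decimal digits. The validation step is not covered because the abstract does not describe it.

```python
import secrets
import string

# Hypothetical identifier alphabet used by the shift cipher.
ALPHABET = string.ascii_uppercase + string.digits


def ngram_hash(r, n, m):
    """f(r, n, m): the n-gram of m starting at position s = r mod |m|."""
    s = r % len(m)
    # Assumption: wrap around the end of m so the result always has length n.
    return "".join(m[(s + i) % len(m)] for i in range(n))


def shift_cipher(text, shift):
    """Shift each character by `shift` places within ALPHABET; leave others unchanged."""
    shifted = []
    for ch in text:
        if ch in ALPHABET:
            shifted.append(ALPHABET[(ALPHABET.index(ch) + shift) % len(ALPHABET)])
        else:
            shifted.append(ch)
    return "".join(shifted)


def nhash(fields, n, shift, r=None):
    """Phase 1: concatenate n-gram hashes of the fields. Phase 2: encrypt, then append r."""
    if r is None:
        r = secrets.randbelow(10 ** 6)
    intermediate = "".join(ngram_hash(r, n, m) for m in fields)
    return shift_cipher(intermediate, shift) + f"{r:06d}"
```

For example, `nhash(["SMITH", "SITE01"], 2, 3, r=7)` takes the 2-gram "IT" from each field (starting positions 7 mod 5 = 2 and 7 mod 6 = 1), shifts it to "LW", and yields "LWLW000007". The field values here are made up for illustration.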
Free-Space Quantum Communication with a Portable Quantum Memory

NASA Astrophysics Data System (ADS)

Namazi, Mehdi; Vallone, Giuseppe; Jordaan, Bertus; Goham, Connor; Shahrokhshahi, Reihaneh; Villoresi, Paolo; Figueroa, Eden

2017-12-01

The realization of an elementary quantum network that is intrinsically secure and operates over long distances requires the interconnection of several quantum modules performing different tasks. In this work, we report the realization of a communication network functioning in a quantum regime, consisting of four different quantum modules: (i) a random polarization qubit generator, (ii) a free-space quantum-communication channel, (iii) an ultralow-noise portable quantum memory, and (iv) a qubit decoder, in a functional elementary quantum network possessing all capabilities needed for quantum-information distribution protocols. We create weak coherent pulses at the single-photon level encoding polarization states |H⟩, |V⟩, |D⟩, and |A⟩ in a randomized sequence. The random qubits are sent over a free-space link and coupled into a dual-rail room-temperature quantum memory and after storage and retrieval are analyzed in a four-detector polarization analysis akin to the requirements of the BB84 protocol. We also show ultralow noise and fully portable operation, paving the way towards memory-assisted all-environment free-space quantum cryptographic networks.
Completely device-independent quantum key distribution

NASA Astrophysics Data System (ADS)

Aguilar, Edgar A.; Ramanathan, Ravishankar; Kofler, Johannes; Pawłowski, Marcin

2016-08-01

Quantum key distribution (QKD) is a provably secure way for two distant parties to establish a common secret key, which then can be used in a classical cryptographic scheme. Using quantum entanglement, one can reduce the necessary assumptions that the parties have to make about their devices, giving rise to device-independent QKD (DIQKD). However, in all existing protocols to date the parties need to have an initial (at least partially) random seed as a resource. In this work, we show that this requirement can be dropped. Using recent advances in the fields of randomness amplification and randomness expansion, we demonstrate that it is sufficient for the message the parties want to communicate to be (partially) unknown to the adversaries—an assumption without which any type of cryptography would be pointless to begin with. One party can use her secret message to locally generate a secret sequence of bits, which can then be openly used by herself and the other party in a DIQKD protocol. Hence our work reduces the requirements needed to perform secure DIQKD and establish safe communication.
Parallel Mitogenome Sequencing Alleviates Random Rooting Effect in Phylogeography.

PubMed

Hirase, Shotaro; Takeshima, Hirohiko; Nishida, Mutsumi; Iwasaki, Wataru

2016-04-28

Reliably rooted phylogenetic trees play irreplaceable roles in clarifying diversification in the patterns of species and populations. However, such trees are often unavailable in phylogeographic studies, particularly when the focus is on rapidly expanded populations that exhibit star-like trees. A fundamental bottleneck is known as the random rooting effect, where a distant outgroup tends to root an unrooted tree "randomly." We investigated whether parallel mitochondrial genome (mitogenome) sequencing alleviates this effect in phylogeography using a case study on the Sea of Japan lineage of the intertidal goby Chaenogobius annularis. Eighty-three C. annularis individuals were collected and their mitogenomes were determined by high-throughput and low-cost parallel sequencing. Phylogenetic analysis of these mitogenome sequences was conducted to root the Sea of Japan lineage, which has a star-like phylogeny and had not been reliably rooted. The topologies of the bootstrap trees were investigated to determine whether the use of mitogenomes alleviated the random rooting effect. The mitogenome data successfully rooted the Sea of Japan lineage by alleviating the effect, which hindered phylogenetic analysis that used specific gene sequences. The reliable rooting of the lineage led to the discovery of a novel, northern lineage that expanded during an interglacial period with high bootstrap support. Furthermore, the finding of this lineage suggested the existence of additional glacial refugia and provided a new recent calibration point that revised the divergence time estimation between the Sea of Japan and Pacific Ocean lineages. This study illustrates the effectiveness of parallel mitogenome sequencing for solving the random rooting problem in phylogeographic studies. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Generating Models of Surgical Procedures using UMLS Concepts and Multiple Sequence Alignment

PubMed Central

Meng, Frank; D’Avolio, Leonard W.; Chen, Andrew A.; Taira, Ricky K.; Kangarloo, Hooshang

2005-01-01

Surgical procedures can be viewed as a process composed of a sequence of steps performed on, by, or with the patient’s anatomy. This sequence is typically the pattern followed by surgeons when generating surgical report narratives for documenting surgical procedures. This paper describes a methodology for semi-automatically deriving a model of conducted surgeries, utilizing a sequence of derived Unified Medical Language System (UMLS) concepts for representing surgical procedures. A multiple sequence alignment was computed from a collection of such sequences and was used for generating the model. These models have the potential of being useful in a variety of informatics applications such as information retrieval and automatic document generation. PMID:16779094
Improve homology search sensitivity of PacBio data by correcting frameshifts.

PubMed

Du, Nan; Sun, Yanni

2016-09-01

Single-molecule, real-time (SMRT) sequencing developed by Pacific BioSciences produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data has a high sequencing error rate, and most of the errors are insertions or deletions. During alignment-based homology search, insertion or deletion errors in genes cause frameshifts and may lead to only marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments, and the ambiguity will incur errors in structural and functional annotation. Existing frameshift correction tools are designed for data with a much lower error rate and are not optimized for PacBio data. As an increasing number of groups are using SMRT, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, we can correct more errors than a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/ Contact: yannisun@msu.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Comparative Genomic and Transcriptomic Characterization of the Toxigenic Marine Dinoflagellate Alexandrium ostenfeldii

PubMed Central

Jaeckisch, Nina; Yang, Ines; Wohlrab, Sylke; Glöckner, Gernot; Kroymann, Juergen; Vogel, Heiko; Cembella, Allan; John, Uwe

2011-01-01

Many dinoflagellate species are notorious for the toxins they produce and ecological and human health consequences associated with harmful algal blooms (HABs). Dinoflagellates are particularly refractory to genomic analysis due to the enormous genome size, lack of knowledge about their DNA composition and structure, and peculiarities of gene regulation, such as spliced leader (SL) trans-splicing and mRNA transposition mechanisms. Alexandrium ostenfeldii is known to produce macrocyclic imine toxins, described as spirolides. We characterized the genome of A. ostenfeldii using a combination of transcriptomic data and random genomic clones for comparison with other dinoflagellates, particularly Alexandrium species. Examination of SL sequences revealed similar features as in other dinoflagellates, including Alexandrium species. SL sequences in decay indicate frequent retro-transposition of mRNA species. This probably contributes to overall genome complexity by generating additional gene copies. Sequencing of several thousand fosmid and bacterial artificial chromosome (BAC) ends yielded a wealth of simple repeats and tandemly repeated longer sequence stretches which we estimated to comprise more than half of the whole genome. Surprisingly, the repeats comprise a very limited set of 79–97 bp sequences; in part the genome is thus a relatively uniform sequence space interrupted by coding sequences. Our genomic sequence survey (GSS) represents the largest genomic data set of a dinoflagellate to date. Alexandrium ostenfeldii is a typical dinoflagellate with respect to its transcriptome and mRNA transposition but demonstrates Alexandrium-like stop codon usage. The large portion of repetitive sequences and the organization within the genome is in agreement with several other studies on dinoflagellates using different approaches. 
It remains to be determined whether this unusual composition is directly correlated to the exceptional genome organization of dinoflagellates with a low amount of histones and histone-like proteins. PMID:22164224
Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity.

PubMed

Gerth, Michael; Hurst, Gregory D D

2017-01-01

High throughput (or 'next generation') sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology and, through comparison of genomes, reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and 'contaminating' material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess whether these 'contaminations' provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well-established microbiome. To this end, we screened almost 1,000 short read libraries of honey bee (Apis sp.) DNA sequencing projects for the presence of microbial sequences, and found sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis-associated bacteria, as well as several viral strains, de novo. We conclude that 'contamination' in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note that variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses.
Employing online quantum random number generators for generating truly random quantum states in Mathematica

NASA Astrophysics Data System (ADS)

Miszczak, Jarosław Adam

2013-01-01

The presented package for the Mathematica computing system allows the harnessing of quantum random number generators (QRNG) for investigating the statistical properties of quantum states. The described package implements a number of functions for generating random states. The new version of the package adds the ability to use the on-line quantum random number generator service and implements new functions for retrieving lists of random numbers. Thanks to the introduced improvements, the new version provides faster access to high-quality sources of random numbers and can be used in simulations requiring large amounts of random data. New version program summary. Program title: TRQS Catalogue identifier: AEKA_v2_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEKA_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 18 134 No. of bytes in distributed program, including test data, etc.: 2 520 49 Distribution format: tar.gz Programming language: Mathematica, C. Computer: Any supporting Mathematica in version 7 or higher. Operating system: Any platform supporting Mathematica; tested with GNU/Linux (32 and 64 bit). RAM: Case-dependent Supplementary material: Fig. 1 mentioned below can be downloaded. Classification: 4.15. External routines: Quantis software library (http://www.idquantique.com/support/quantis-trng.html) Catalogue identifier of previous version: AEKA_v1_0 Journal reference of previous version: Comput. Phys. Comm. 183(2012)118 Does the new version supersede the previous version?: Yes Nature of problem: Generation of random density matrices and utilization of high-quality random numbers for the purpose of computer simulation. 
Solution method: Use of a physical quantum random number generator and an on-line service providing access to the source of true random numbers generated by a quantum random number generator. Reasons for new version: Added support for the high-speed on-line quantum random number generator and improved methods for retrieving lists of random numbers. Summary of revisions: The presented version provides two significant improvements. The first one is the ability to use the on-line Quantum Random Number Generation service developed by PicoQuant GmbH and the Nano-Optics groups at the Department of Physics of Humboldt University. The on-line service supported in version 2.0 of the TRQS package provides faster access to true randomness sources constructed using the laws of quantum physics. The service is freely available at https://qrng.physik.hu-berlin.de/. The use of this service allows using the presented package without the need for a physical quantum random number generator. The second improvement introduced in this version is the ability to retrieve arrays of random data directly from the used source. This increases the speed of the random number generation, especially in the case of an on-line service, where it reduces the time necessary to establish the connection. Thanks to the speed improvement of the presented version, the package can now be used in simulations requiring larger amounts of random data. Moreover, the functions for generating random numbers provided by the current version of the package more closely follow the pattern of functions for generating pseudo-random numbers provided in Mathematica. Additional comments: Speed comparison: The implementation of the support for the QRNG on-line service provides a noticeable improvement in the speed of random number generation. For the samples of real numbers of size 10^1, 10^2, …, 10^7 the times required to generate these samples using the Quantis USB device and the QRNG service are compared in Fig. 1. 
The presented results show that the use of the on-line service provides faster access to random numbers. One should note, however, that the speed gain can increase or decrease depending on the connection speed between the computer and the server providing random numbers. Running time: Depends on the used source of randomness and the amount of random data used in the experiment. References: [1] M. Wahl, M. Leifgen, M. Berlin, T. Röhlicke, H.-J. Rahn, O. Benson, An ultrafast quantum random number generator with provably bounded output bias based on photon arrival time measurements, Applied Physics Letters, Vol. 98, 171105 (2011). http://dx.doi.org/10.1063/1.3578456.
