Sample records for random number sequence

  1. Real-time fast physical random number generator with a photonic integrated circuit.

    PubMed

    Ugajin, Kazusa; Terashima, Yuta; Iwakawa, Kento; Uchida, Atsushi; Harayama, Takahisa; Yoshimura, Kazuyuki; Inubushi, Masanobu

    2017-03-20

    Random number generators are essential for applications in information security and numerical simulations. Most optical-chaos-based random number generators produce random bit sequences by offline post-processing with large optical components. We demonstrate a real-time hardware implementation of a fast physical random number generator with a photonic integrated circuit and a field programmable gate array (FPGA) electronic board. We generate 1-Tbit random bit sequences and evaluate their statistical randomness using NIST Special Publication 800-22 and TestU01. All of the BigCrush tests in TestU01 are passed using 410-Gbit random bit sequences. A maximum real-time generation rate of 21.1 Gb/s is achieved for random bit sequences in binary format stored in a computer, which can be directly used for applications involving secret keys in cryptography and random seeds in large-scale numerical simulations.
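
    As a rough illustration of the kind of statistical check referred to above, the following sketch implements the frequency (monobit) test, the first test in NIST SP 800-22. It is a minimal stand-alone version, not the authors' test harness; the function name and the two toy sequences are assumptions made for illustration.

```python
import math


def monobit_p_value(bits):
    """P-value of the NIST SP 800-22 frequency (monobit) test for a 0/1 sequence."""
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    # Map 0 -> -1 and 1 -> +1 and sum; a balanced sequence sums to about 0.
    s = sum(2 * b - 1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


# Toy inputs (illustrative only): a perfectly balanced and a constant sequence.
balanced = [0, 1] * 500
constant = [1] * 1000
print(monobit_p_value(balanced))
print(monobit_p_value(constant))
```

    A sequence is normally treated as failing this test when the p-value falls below 0.01. Passing it says nothing about the other tests in the suite.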

  2. Implementation of a quantum random number generator based on the optimal clustering of photocounts

    NASA Astrophysics Data System (ADS)

    Balygin, K. A.; Zaitsev, V. I.; Klimov, A. N.; Kulik, S. P.; Molotkov, S. N.

    2017-10-01

    To implement quantum random number generators, it is fundamentally important to have a mathematically provable and experimentally testable process of measurements of a system from which an initial random sequence is generated. This ensures that the randomness indeed has a quantum nature. A quantum random number generator has been implemented with the use of the detection of quasi-single-photon radiation by a silicon photomultiplier (SiPM) matrix, which makes it possible to reliably reach the Poisson statistics of photocounts. The choice and use of the optimal clustering of photocounts for the initial sequence of photodetection events and a method of extraction of a random sequence of 0's and 1's, which is polynomial in the length of the sequence, have made it possible to reach a yield rate of 64 Mbit/s for the provably random output sequence.

  3. A Comparison of Three Random Number Generators for Aircraft Dynamic Modeling Applications

    NASA Technical Reports Server (NTRS)

    Grauer, Jared A.

    2017-01-01

    Three random number generators, which produce Gaussian white noise sequences, were compared to assess their suitability in aircraft dynamic modeling applications. The first generator considered was the MATLAB® implementation of the Mersenne-Twister algorithm. The second generator was a website called Random.org, which processes atmospheric noise measured using radios to create the random numbers. The third generator was based on synthesis of the Fourier series, where the random number sequences are constructed from prescribed amplitude and phase spectra. A total of 200 sequences, each having 601 random numbers, for each generator were collected and analyzed in terms of the mean, variance, normality, autocorrelation, and power spectral density. These sequences were then applied to two problems in aircraft dynamic modeling, namely estimating stability and control derivatives from simulated onboard sensor data, and simulating flight in atmospheric turbulence. In general, each random number generator had good performance and is well-suited for aircraft dynamic modeling applications. Specific strengths and weaknesses of each generator are discussed. For Monte Carlo simulation, the Fourier synthesis method is recommended because it most accurately and consistently approximated Gaussian white noise and can be implemented with reasonable computational effort.
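
    The Fourier-synthesis idea can be sketched as a sum of equal-amplitude harmonics. The sketch below assumes a flat amplitude spectrum and uniformly random phases, which is one common choice and not necessarily the spectra prescribed in the report; the function name and parameters are hypothetical.

```python
import math
import random


def fourier_noise(n_samples, sigma=1.0, rng=None):
    """Approximate white noise as a sum of equal-amplitude cosines with random phases.

    Assumption: flat amplitude spectrum and uniform random phases; the report's
    prescribed spectra may differ.
    """
    rng = rng or random.Random()
    n_harmonics = (n_samples - 1) // 2          # stay below the Nyquist harmonic
    if n_harmonics < 1:
        raise ValueError("need at least 3 samples")
    # Each harmonic contributes amplitude**2 / 2 to the variance.
    amplitude = sigma * math.sqrt(2.0 / n_harmonics)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_harmonics)]
    return [
        sum(
            amplitude * math.cos(2.0 * math.pi * k * n / n_samples + phases[k - 1])
            for k in range(1, n_harmonics + 1)
        )
        for n in range(n_samples)
    ]


x = fourier_noise(601, sigma=1.0, rng=random.Random(0))
mean = sum(x) / len(x)
variance = sum((v - mean) ** 2 for v in x) / len(x)
print(mean, variance)
```

    Because the harmonics are orthogonal over one full record, the sample mean and variance match the targets up to rounding, which is consistent with the abstract's remark that this method reproduces the prescribed statistics consistently.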

  4. Golden Ratio Versus Pi as Random Sequence Sources for Monte Carlo Integration

    NASA Technical Reports Server (NTRS)

    Sen, S. K.; Agarwal, Ravi P.; Shaykhian, Gholam Ali

    2007-01-01

    We discuss here the relative merits of these numbers as possible random sequence sources. The quality of these sequences is not judged directly based on the outcome of all known tests for the randomness of a sequence. Instead, it is determined implicitly by the accuracy of the Monte Carlo integration in a statistical sense. Since our main motive of using a random sequence is to solve real world problems, it is more desirable if we compare the quality of the sequences based on their performances for these problems in terms of quality/accuracy of the output. We also compare these sources against those generated by a popular pseudo-random generator, viz., the Matlab rand and the quasi-random generator Halton, both in terms of error and time complexity. Our study demonstrates that consecutive blocks of digits of each of these numbers produce a good random sequence source. It is observed that randomly chosen blocks of digits do not have any remarkable advantage over consecutive blocks for the accuracy of the Monte Carlo integration. Also, it reveals that pi is a better source of a random sequence than the golden ratio when the accuracy of the integration is concerned.
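
    A minimal sketch of the approach, using consecutive digit blocks of the golden ratio as the sample source. The block size of 5 digits, the integrand x^2, and all function names are assumptions chosen for illustration; the report's actual test integrals and block sizes are not given in the abstract.

```python
import math


def golden_ratio_digits(n):
    """First n decimal digits of the golden ratio after the decimal point."""
    scale = 10 ** n
    root5 = math.isqrt(5 * scale * scale)   # floor(sqrt(5) * 10**n), exact
    phi_scaled = (scale + root5) // 2       # floor(phi * 10**n)
    return str(phi_scaled)[1:]              # drop the leading "1"


def digit_block_samples(digits, block):
    """Turn consecutive, non-overlapping digit blocks into numbers in [0, 1)."""
    usable = len(digits) - len(digits) % block
    return [int(digits[i:i + block]) / 10 ** block for i in range(0, usable, block)]


def mc_integrate(f, samples):
    """Monte Carlo estimate of the integral of f over [0, 1]."""
    return sum(f(x) for x in samples) / len(samples)


samples = digit_block_samples(golden_ratio_digits(5000), 5)
estimate = mc_integrate(lambda x: x * x, samples)
print(len(samples), estimate)   # the exact integral of x**2 on [0, 1] is 1/3
```

    Judging the sequence by the integration error, as the abstract proposes, then amounts to comparing `estimate` with the known value of the integral.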

  5. Application of Stochastic Labeling with Random-Sequence Barcodes for Simultaneous Quantification and Sequencing of Environmental 16S rRNA Genes.

    PubMed

    Hoshino, Tatsuhiko; Inagaki, Fumio

    2017-01-01

    Next-generation sequencing (NGS) is a powerful tool for analyzing environmental DNA and provides a comprehensive molecular view of microbial communities. For obtaining the copy number of particular sequences in the NGS library, however, additional quantitative analysis such as quantitative PCR (qPCR) or digital PCR (dPCR) is required. Furthermore, the number of sequences in a sequence library does not always reflect the original copy number of a target gene because of biases caused by PCR amplification, making it difficult to convert the proportion of particular sequences in the NGS library to the copy number using the mass of input DNA. To address this issue, we applied a stochastic labeling approach with random-tag sequences and developed an NGS-based quantification protocol, which enables simultaneous sequencing and quantification of the targeted DNA. This quantitative sequencing (qSeq) is initiated from single-primer extension (SPE) using a primer with a random tag adjacent to the 5' end of the target-specific sequence. During SPE, each DNA molecule is stochastically labeled with the random tag. Subsequently, first-round PCR is conducted, specifically targeting the SPE product, followed by second-round PCR to index for NGS. The number of random tags is only determined during the SPE step and is therefore not affected by the two rounds of PCR that may introduce amplification biases. In the case of 16S rRNA genes, after NGS sequencing and taxonomic classification, the absolute number of 16S rRNA genes of each target phylotype can be estimated by Poisson statistics by counting random tags incorporated at the end of the sequence. To test the feasibility of this approach, the 16S rRNA gene of Sulfolobus tokodaii was subjected to qSeq, which resulted in accurate quantification of 5.0 × 10^3 to 5.0 × 10^4 copies of the 16S rRNA gene. Furthermore, qSeq was applied to mock microbial communities and environmental samples, and the results were comparable to those obtained using digital PCR and relative abundance based on a standard sequence library. We demonstrated that the qSeq protocol proposed here is advantageous for providing less-biased absolute copy numbers of each target DNA with NGS sequencing at one time. By this new experiment scheme in microbial ecology, microbial community compositions can be explored in a more quantitative manner, thus expanding our knowledge of microbial ecosystems in natural environments.
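
    The Poisson correction behind counting random tags can be sketched as follows. This assumes the standard stochastic-labeling model in which each molecule picks one of N equally likely tags, so the expected number of distinct tags seen from n molecules is N(1 - exp(-n/N)); the abstract does not state the exact estimator used, and the 8-base tag length (4^8 possible tags) is a hypothetical value.

```python
import math


def expected_distinct_tags(n_molecules, n_tags):
    """Expected number of distinct tags observed when n_molecules are labeled."""
    return n_tags * (1.0 - math.exp(-n_molecules / n_tags))


def estimate_molecules(distinct_tags, n_tags):
    """Invert the relation above to estimate the number of labeled molecules."""
    if not 0 <= distinct_tags < n_tags:
        raise ValueError("distinct_tags must satisfy 0 <= distinct_tags < n_tags")
    return -n_tags * math.log1p(-distinct_tags / n_tags)


N_TAGS = 4 ** 8   # hypothetical 8-base random tag
observed = expected_distinct_tags(5000, N_TAGS)
print(observed, estimate_molecules(observed, N_TAGS))
```

    The correction matters most when the number of molecules approaches the number of possible tags, because several molecules then tend to share a tag and the raw count of distinct tags underestimates the copy number.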

  6. DNA-based random number generation in security circuitry.

    PubMed

    Gearheart, Christy M; Arazi, Benjamin; Rouchka, Eric C

    2010-06-01

    DNA-based circuit design is an area of research in which traditional silicon-based technologies are replaced by naturally occurring phenomena taken from biochemistry and molecular biology. This research focuses on further developing DNA-based methodologies to mimic digital data manipulation. While exhibiting fundamental principles, this work was done in conjunction with the vision that DNA-based circuitry, when the technology matures, will form the basis for a tamper-proof security module, revolutionizing the meaning and concept of tamper-proofing and possibly preventing it altogether based on accurate scientific observations. A paramount part of such a solution would be self-generation of random numbers. A novel prototype schema employs solid phase synthesis of oligonucleotides for random construction of DNA sequences; temporary storage and retrieval is achieved through plasmid vectors. A discussion of how to evaluate sequence randomness is included, as well as how these techniques are applied to a simulation of the random number generation circuitry. Simulation results show generated sequences successfully pass three selected NIST random number generation tests specified for security applications.

  7. Simulations Using Random-Generated DNA and RNA Sequences

    ERIC Educational Resources Information Center

    Bryce, C. F. A.

    1977-01-01

    Using a very simple computer program written in BASIC, a very large number of random-generated DNA or RNA sequences are obtained. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…

  8. A Micro-Computer Model for Army Air Defense Training.

    DTIC Science & Technology

    1985-03-01

    generator. The period is 32763 numbers generated before a repetitive sequence is encountered on the development system. Chi-Squared tests for frequency... Tests. Periodicity. The period is 32763 numbers generated before a repetitive sequence is encountered on the development system. This was... positions in the test array. This was done with several different random number seeds. In each case 32763 random numbers were generated before a

  9. Analysis of Uniform Random Numbers Generated by Randu and Urn Ten Different Seeds.

    DTIC Science & Technology

    The statistical properties of the numbers generated by two uniform random number generators, RANDU and URN, each using ten different seeds are... The testing is performed on a sequence of 50,000 numbers generated by each uniform random number generator using each of the ten seeds. (Author)

  10. Direct generation of all-optical random numbers from optical pulse amplitude chaos.

    PubMed

    Li, Pu; Wang, Yun-Cai; Wang, An-Bang; Yang, Ling-Zhen; Zhang, Ming-Jiang; Zhang, Jian-Zhong

    2012-02-13

    We propose and theoretically demonstrate an all-optical method for directly generating all-optical random numbers from pulse amplitude chaos produced by a mode-locked fiber ring laser. Under an appropriate pump intensity, the mode-locked laser can experience a quasi-periodic route to chaos. Such a chaos consists of a stream of pulses with a fixed repetition frequency but random intensities. In this method, we do not require sampling procedure and external triggered clocks but directly quantize the chaotic pulses stream into random number sequence via an all-optical flip-flop. Moreover, our simulation results show that the pulse amplitude chaos has no periodicity and possesses a highly symmetric distribution of amplitude. Thus, in theory, the obtained random number sequence without post-processing has a high-quality randomness verified by industry-standard statistical tests.

  11. Random Number Generation and Executive Functions in Parkinson's Disease: An Event-Related Brain Potential Study.

    PubMed

    Münte, Thomas F; Joppich, Gregor; Däuper, Jan; Schrader, Christoph; Dengler, Reinhard; Heldmann, Marcus

    2015-01-01

    The generation of random sequences is considered to tax executive functions and has been reported to be impaired in Parkinson's disease (PD) previously. To assess the neurophysiological markers of random number generation in PD. Event-related potentials (ERP) were recorded in 12 PD patients and 12 age-matched normal controls (NC) while either engaging in random number generation (RNG) by pressing the number keys on a computer keyboard in a random sequence or in ordered number generation (ONG) necessitating key presses in the canonical order. Key presses were paced by an external auditory stimulus at a rate of 1 tone every 1800 ms. As a secondary task subjects had to monitor the tone-sequence for a particular target tone to which the number "0" key had to be pressed. This target tone occurred randomly and infrequently, thus creating a secondary oddball task. Behaviorally, PD patients showed an increased tendency to count in steps of one as well as a tendency towards repetition avoidance. Electrophysiologically, the amplitude of the P3 component of the ERP to the target tone of the secondary task was reduced during RNG in PD but not in NC. The behavioral findings indicate less random behavior in PD while the ERP findings suggest that this impairment comes about, because attentional resources are depleted in PD.

  12. Truly random number generation: an example

    NASA Astrophysics Data System (ADS)

    Frauchiger, Daniela; Renner, Renato

    2013-10-01

    Randomness is crucial for a variety of applications, ranging from gambling to computer simulations, and from cryptography to statistics. However, many of the currently used methods for generating randomness do not meet the criteria that are necessary for these applications to work properly and safely. A common problem is that a sequence of numbers may look random but nevertheless not be truly random. In fact, the sequence may pass all standard statistical tests and yet be perfectly predictable. This renders it useless for many applications. For example, in cryptography, the predictability of a "randomly" chosen password is obviously undesirable. Here, we review a recently developed approach to generating true (and hence unpredictable) randomness.

  13. Problems with the random number generator RANF implemented on the CDC Cyber 205

    NASA Astrophysics Data System (ADS)

    Kalle, Claus; Wansleben, Stephan

    1984-10-01

    We show that using RANF may lead to wrong results when lattice models are simulated by Monte Carlo methods. We present a shift-register sequence random number generator which generates two random numbers per cycle on a two pipe CDC Cyber 205.

  14. Not all numbers are equal: preferences and biases among children and adults when generating random sequences.

    PubMed

    Towse, John N; Loetscher, Tobias; Brugger, Peter

    2014-01-01

    We investigate the number preferences of children and adults when generating random digit sequences. Previous research has shown convincingly that adults prefer smaller numbers when randomly choosing between responses 1-6. We analyze randomization choices made by both children and adults, considering a range of experimental studies and task configurations. Children, most of whom are between 8 and 11 years, show a preference for relatively large numbers when choosing numbers 1-10. Adults show a preference for small numbers with the same response set. We report a modest association between children's age and numerical bias. However, children also exhibit a small number bias with a smaller response set available, and they show a preference specifically for the numbers 1-3 across many datasets. We argue that number space demonstrates both continuities (numbers 1-3 have a distinct status) and change (a developmentally emerging bias toward the left side of representational space or lower numbers).

  15. On the limiting characteristics of quantum random number generators at various clusterings of photocounts

    NASA Astrophysics Data System (ADS)

    Molotkov, S. N.

    2017-03-01

    Various methods for the clustering of photocounts constituting a sequence of random numbers are considered. It is shown that the clustering of photocounts resulting in the Fermi-Dirac distribution makes it possible to achieve the theoretical limit of the random number generation rate.

  16. On the design of Henon and logistic map-based random number generator

    NASA Astrophysics Data System (ADS)

    Magfirawaty; Suryadi, M. T.; Ramli, Kalamullah

    2017-10-01

    The key sequence is one of the main elements in a cryptosystem. The True Random Number Generator (TRNG) method is one of the approaches to generating the key sequence. The randomness sources of TRNGs are divided into three main groups, i.e. electrical-noise based, jitter based, and chaos based. The chaos-based group utilizes a non-linear dynamic system (continuous time or discrete time) as an entropy source. In this study, a new design of TRNG based on a discrete-time chaotic system is proposed, which is then simulated in LabVIEW. The principle of the design consists of combining 2D and 1D chaotic systems. A mathematical model is implemented for numerical simulations. We used a comparator process as a harvester method to obtain the series of random bits. Without any post-processing, the proposed design generated random bit sequences with a high entropy value and passed all NIST 800-22 statistical tests.

  17. Generating constrained randomized sequences: item frequency matters.

    PubMed

    French, Robert M; Perruchet, Pierre

    2009-11-01

    All experimental psychologists understand the importance of randomizing lists of items. However, randomization is generally constrained, and these constraints (in particular, not allowing immediately repeated items), which are designed to eliminate particular biases, frequently engender others. We describe a simple Monte Carlo randomization technique that solves a number of these problems. However, in many experimental settings, we are concerned not only with the number and distribution of items but also with the number and distribution of transitions between items. The algorithm mentioned above provides no control over this. We therefore introduce a simple technique that uses transition tables for generating correctly randomized sequences. We present an analytic method of producing item-pair frequency tables and item-pair transitional probability tables when immediate repetitions are not allowed. We illustrate these difficulties and how to overcome them, with reference to a classic article on word segmentation in infants. Finally, we provide free access to an Excel file that allows users to generate transition tables with up to 10 different item types, as well as to generate appropriately distributed randomized sequences of any length without immediately repeated elements. This file is freely available from http://leadserv.u-bourgogne.fr/IMG/xls/TransitionMatrix.xls.
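
    The no-immediate-repetition constraint can be sketched with a uniform transition rule: each next item is drawn from all items except the previous one. This is only an illustration of the constraint and of how transitions can be tallied; it is not the authors' transition-table method or their Excel tool, and the function names are hypothetical.

```python
import random


def no_repeat_sequence(items, length, rng=None):
    """Random sequence over `items` in which no item immediately repeats."""
    rng = rng or random.Random()
    items = list(items)
    if len(set(items)) < 2:
        raise ValueError("need at least two distinct items")
    if length <= 0:
        return []
    seq = [rng.choice(items)]
    while len(seq) < length:
        # Uniform transition to any item other than the previous one.
        candidates = [x for x in items if x != seq[-1]]
        seq.append(rng.choice(candidates))
    return seq


def transition_counts(seq):
    """Tally item-pair transitions (a followed by b) in a sequence."""
    counts = {}
    for a, b in zip(seq, seq[1:]):
        counts[(a, b)] = counts.get((a, b), 0) + 1
    return counts


seq = no_repeat_sequence("ABC", 200, random.Random(1))
print("".join(seq[:30]))
print(transition_counts(seq))
```

    With equally frequent items this rule keeps item frequencies balanced in the long run. With unequal target frequencies it does not, which is the kind of side effect the abstract warns about and the reason a transition table is needed.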

  18. Novel application of the MSSCP method in biodiversity studies.

    PubMed

    Tomczyk-Żak, Karolina; Kaczanowski, Szymon; Górecka, Magdalena; Zielenkiewicz, Urszula

    2012-02-01

    Analysis of 16S rRNA sequence diversity is widely performed for characterizing the biodiversity of microbial samples. The number of determined sequences has a considerable impact on complete results. Although the cost of mass sequencing is decreasing, it is often still too high for individual projects. We applied the multi-temperature single-strand conformational polymorphism (MSSCP) method to decrease the number of analysed sequences. This was a novel application of this method. As a control, the same sample was analysed using random sequencing. In this paper, we adapted the MSSCP technique for screening of unique sequences of the 16S rRNA gene library and bacterial strains isolated from biofilms growing on the walls of an ancient gold mine in Poland and determined whether the results obtained by both methods differed and whether random sequencing could be replaced by MSSCP. Although it was biased towards the detection of rare sequences in the samples, the qualitative results of MSSCP were not different than those of random sequencing. Unambiguous discrimination of unique clones and strains creates an opportunity to effectively estimate the biodiversity of natural communities, especially in populations which are numerous but species poor.

  19. Variations on a theme of Lander and Waterman

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Speed, T.

    1997-12-01

    The original Lander and Waterman mathematical analysis was for fingerprinting random clones. Since that time, a number of variants of their theory have appeared, including ones which apply to mapping by anchoring random clones, and to non-random or directed clone mapping. The same theory is now widely used to devise random sequencing strategies. In this talk I will review these developments, and go on to discuss the theory required for directed sequencing strategies.

  20. The RANDOM computer program: A linear congruential random number generator

    NASA Technical Reports Server (NTRS)

    Miles, R. F., Jr.

    1986-01-01

    The RANDOM Computer Program is a FORTRAN program for generating random number sequences and testing linear congruential random number generators (LCGs). The linear congruential form of random number generator is discussed, and the selection of parameters of an LCG for a microcomputer is described. This document describes the following: (1) The RANDOM Computer Program; (2) RANDOM.MOD, the computer code needed to implement an LCG in a FORTRAN program; and (3) The RANCYCLE and the ARITH Computer Programs that provide computational assistance in the selection of parameters for an LCG. The RANDOM, RANCYCLE, and ARITH Computer Programs are written in Microsoft FORTRAN for the IBM PC microcomputer and its compatibles. With only minor modifications, the RANDOM Computer Program and its LCG can be run on most microcomputers or mainframe computers.

  1. Weight distributions for turbo codes using random and nonrandom permutations

    NASA Technical Reports Server (NTRS)

    Dolinar, S.; Divsalar, D.

    1995-01-01

    This article takes a preliminary look at the weight distributions achievable for turbo codes using random, nonrandom, and semirandom permutations. Due to the recursiveness of the encoders, it is important to distinguish between self-terminating and non-self-terminating input sequences. The non-self-terminating sequences have little effect on decoder performance, because they accumulate high encoded weight until they are artificially terminated at the end of the block. From probabilistic arguments based on selecting the permutations randomly, it is concluded that the self-terminating weight-2 data sequences are the most important consideration in the design of constituent codes; higher-weight self-terminating sequences have successively decreasing importance. Also, increasing the number of codes and, correspondingly, the number of permutations makes it more and more likely that the bad input sequences will be broken up by one or more of the permuters. It is possible to design nonrandom permutations that ensure that the minimum distance due to weight-2 input sequences grows roughly as the square root of (2N), where N is the block length. However, these nonrandom permutations amplify the bad effects of higher-weight inputs, and as a result they are inferior in performance to randomly selected permutations. But there are 'semirandom' permutations that perform nearly as well as the designed nonrandom permutations with respect to weight-2 input sequences and are not as susceptible to being foiled by higher-weight inputs.

  2. Effects of learning duration on implicit transfer.

    PubMed

    Tanaka, Kanji; Watanabe, Katsumi

    2015-10-01

    Implicit learning and transfer in sequence acquisition play important roles in daily life. Several previous studies have found that even when participants are not aware that a transfer sequence has been transformed from the learning sequence, they are able to perform the transfer sequence faster and more accurately; this suggests implicit transfer of visuomotor sequences. Here, we investigated whether implicit transfer could be modulated by the number of trials completed in a learning session. Participants learned a sequence through trial and error, known as the m × n task (Hikosaka et al. in J Neurophysiol 74:1652-1661, 1995). In the learning session, participants were required to successfully perform the same sequence 4, 12, 16, or 20 times. In the transfer session, participants then learned one of two other sequences: one where the button configuration Vertically Mirrored the learning sequence, or a randomly generated sequence. Our results show that even when participants did not notice the alternation rule (i.e., vertical mirroring), their total working time was less and their total number of errors was lower in the transfer session compared with those who performed a Random sequence, irrespective of the number of trials completed in the learning session. This result suggests that implicit transfer likely occurs even over a shorter learning duration.

  3. Compact quantum random number generator based on superluminescent light-emitting diodes

    NASA Astrophysics Data System (ADS)

    Wei, Shihai; Yang, Jie; Fan, Fan; Huang, Wei; Li, Dashuang; Xu, Bingjie

    2017-12-01

    By measuring the amplified spontaneous emission (ASE) noise of the superluminescent light emitting diodes, we propose and realize a quantum random number generator (QRNG) featured with practicability. In the QRNG, after the detection and amplification of the ASE noise, the data acquisition and randomness extraction which is integrated in a field programmable gate array (FPGA) are both implemented in real-time, and the final random bit sequences are delivered to a host computer with a real-time generation rate of 1.2 Gbps. Further, to achieve compactness, all the components of the QRNG are integrated on three independent printed circuit boards with a compact design, and the QRNG is packed in a small enclosure sized 140 mm × 120 mm × 25 mm. The final random bit sequences can pass all the NIST-STS and DIEHARD tests.

  4. An On-Demand Optical Quantum Random Number Generator with In-Future Action and Ultra-Fast Response

    PubMed Central

    Stipčević, Mario; Ursin, Rupert

    2015-01-01

    Random numbers are essential for our modern information-based society, e.g. in cryptography. Unlike frequently used pseudo-random generators, physical random number generators do not depend on complex algorithms but rather on a physical process to provide true randomness. Quantum random number generators (QRNG) rely on a process which can be described by a probabilistic theory only, even in principle. Here we present a conceptually simple implementation, which offers a 100% efficiency of producing a random bit upon a request and simultaneously exhibits an ultra-low latency. A careful technical and statistical analysis demonstrates its robustness against imperfections of the actual implemented technology and enables quick estimation of the randomness of very long sequences. Generated random numbers pass standard statistical tests without any post-processing. The setup described, as well as the theory presented here, demonstrate the maturity and overall understanding of the technology. PMID:26057576

  5. An investigation of the uniform random number generator

    NASA Technical Reports Server (NTRS)

    Temple, E. C.

    1982-01-01

    Most random number generators that are in use today are of the congruential form X(i+1) = (A X(i) + C) mod M, where A, C, and M are nonnegative integers. If C = 0, the generator is called the multiplicative type, and those for which C ≠ 0 are called mixed congruential generators. It is easy to see that congruential generators will repeat a sequence of numbers after a maximum of M values have been generated. The number of numbers that a procedure generates before restarting the sequence is called the length or the period of the generator. Generally, it is desirable to make the period as long as possible. A detailed discussion of congruential generators is given. Also, several promising procedures that differ from the multiplicative and mixed procedure are discussed.
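
    The congruential recurrence and the notion of period can be sketched as below. The small parameters (M = 16, A = 5, C = 3 or 0) are toy values chosen so the cycles are easy to check by hand; they are assumptions for illustration and are not taken from the report.

```python
from itertools import islice


def lcg(a, c, m, seed):
    """Yield successive values of X(i+1) = (a * X(i) + c) mod m."""
    x = seed % m
    while True:
        x = (a * x + c) % m
        yield x


def cycle_length(a, c, m, seed):
    """Number of values generated before the sequence starts repeating."""
    seen = {}
    x = seed % m
    step = 0
    while x not in seen:
        seen[x] = step
        x = (a * x + c) % m
        step += 1
    # Length of the cycle the sequence eventually enters.
    return step - seen[x]


print(list(islice(lcg(5, 3, 16, 0), 4)))
print(cycle_length(5, 3, 16, 0))   # mixed generator (C != 0)
print(cycle_length(5, 0, 16, 1))   # multiplicative generator (C == 0)
```

    In this toy case the mixed generator visits all 16 residues before repeating, while the multiplicative one cycles after 4 values, which illustrates why parameter selection matters for the period.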

  6. A hybrid-type quantum random number generator

    NASA Astrophysics Data System (ADS)

    Hai-Qiang, Ma; Wu, Zhu; Ke-Jin, Wei; Rui-Xue, Li; Hong-Wei, Liu

    2016-05-01

    This paper proposes a well-performing hybrid-type truly quantum random number generator based on the time interval between two independent single-photon detection signals, which is practical and intuitive, and generates the initial random number sources from a combination of multiple existing random number sources. A time-to-amplitude converter and multichannel analyzer are used for qualitative analysis to demonstrate that each and every step is random. Furthermore, a carefully designed data acquisition system is used to obtain a high-quality random sequence. Our scheme is simple and proves that the random number bit rate can be dramatically increased to satisfy practical requirements. Project supported by the National Natural Science Foundation of China (Grant Nos. 61178010 and 11374042), the Fund of State Key Laboratory of Information Photonics and Optical Communications (Beijing University of Posts and Telecommunications), China, and the Fundamental Research Funds for the Central Universities of China (Grant No. bupt2014TS01).

  7. Reduced randomness in quantum cryptography with sequences of qubits encoded in the same basis

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Lamoureux, L.-P.; Cerf, N. J.; Bechmann-Pasquinucci, H.

    2006-03-15

    We consider the cloning of sequences of qubits prepared in the states used in the BB84 or six-state quantum cryptography protocol, and show that the single-qubit fidelity is unaffected even if entire sequences of qubits are prepared in the same basis. This result is only valid provided that the sequences are much shorter than the total key. It is of great importance for practical quantum cryptosystems because it reduces the need for high-speed random number generation without impairing the security against finite-size cloning attacks.

  8. Randomizer for High Data Rates

    NASA Technical Reports Server (NTRS)

    Garon, Howard; Sank, Victor J.

    2018-01-01

    NASA as well as a number of other space agencies now recognize that the current recommended CCSDS randomizer used for telemetry (TM) is too short. When multiple applications of the PN8 Maximal Length Sequence (MLS) are required in order to fully cover a channel access data unit (CADU), spectral problems in the form of elevated spurious discretes (spurs) appear. Originally the randomizer was called a bit transition generator (BTG) precisely because it was thought that its primary value was to insure sufficient bit transitions to allow the bit/symbol synchronizer to lock and remain locked. We, NASA, have shown that the old BTG concept is a limited view of the real value of the randomizer sequence and that the randomizer also aids in signal acquisition as well as minimizing the potential for false decoder lock. Under the guidelines we considered here there are multiple maximal length sequences under GF(2) which appear attractive in this application. Although there may be mitigating reasons why another MLS sequence could be selected, one sequence in particular possesses a combination of desired properties which offsets it from the others.
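
    A maximal length sequence over GF(2) can be sketched with a small linear feedback shift register. The example uses a toy 4-bit register with the recurrence s[k+4] = s[k] XOR s[k+1] (polynomial x^4 + x + 1, period 15), chosen because it can be checked by hand; it is not the CCSDS randomizer polynomial or the sequence proposed in the report, and the function names are hypothetical.

```python
def lfsr_bits(n_bits, taps, state, count):
    """Output `count` bits of a Fibonacci LFSR over GF(2).

    Bit i of `state` holds s[k+i]; the new bit is the XOR of the bits at `taps`.
    """
    out = []
    for _ in range(count):
        out.append(state & 1)
        new_bit = 0
        for t in taps:
            new_bit ^= (state >> t) & 1
        state = (state >> 1) | (new_bit << (n_bits - 1))
    return out


def randomize(data_bits, pn_bits):
    """XOR data with the PN sequence, repeating the sequence as needed."""
    return [d ^ pn_bits[i % len(pn_bits)] for i, d in enumerate(data_bits)]


pn = lfsr_bits(4, (0, 1), 0b0001, 15)   # one full period of the toy sequence
data = [0] * 20                          # a run with no bit transitions
scrambled = randomize(data, pn)
print(pn)
print(scrambled)
print(randomize(scrambled, pn) == data)
```

    Because the 20-bit block is longer than the 15-bit period, the sequence is applied more than once, which is the situation the abstract identifies as the source of spectral spurs when the randomizer is too short.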

  9. Repeats of base oligomers as the primordial coding sequences of the primeval earth and their vestiges in modern genes.

    PubMed

    Ohno, S

    1984-01-01

    Three outstanding properties uniquely qualify repeats of base oligomers as the primordial coding sequences of all polypeptide chains. First, when compared with randomly generated base sequences in general, they are more likely to have long open reading frames. Second, periodical polypeptide chains specified by such repeats are more likely to assume either alpha-helical or beta-sheet secondary structures than are polypeptide chains of random sequence. Third, provided that the number of bases in the oligomeric unit is not a multiple of 3, these internally repetitious coding sequences are impervious to randomly sustained base substitutions, deletions, and insertions. This is because the recurring periodicity of their polypeptide chains is given by three consecutive copies of the oligomeric unit translated in three different reading frames. Accordingly, when one reading frame is open, the other two are automatically open as well, all three being capable of coding for polypeptide chains of identical periodicity. Under this circumstance, a frame shift due to the deletion or insertion of a number of bases that is not a multiple of 3 fails to alter the down-stream amino acid sequence, and even a base change causing premature chain-termination can silence only one of the three potential coding units. Newly arisen coding sequences in modern organisms are oligomeric repeats, and most of the older genes retain various vestiges of their original internal repetitions. Some of the genes (e.g., oncogenes) have even inherited the property of being impervious to randomly sustained base changes.

  10. Minimizing the average distance to a closest leaf in a phylogenetic tree.

    PubMed

    Matsen, Frederick A; Gallagher, Aaron; McCoy, Connor O

    2013-11-01

    When performing an analysis on a collection of molecular sequences, it can be convenient to reduce the number of sequences under consideration while maintaining some characteristic of a larger collection of sequences. For example, one may wish to select a subset of high-quality sequences that represent the diversity of a larger collection of sequences. One may also wish to specialize a large database of characterized "reference sequences" to a smaller subset that is as close as possible on average to a collection of "query sequences" of interest. Such a representative subset can be useful whenever one wishes to find a set of reference sequences that is appropriate to use for comparative analysis of environmentally derived sequences, such as for selecting "reference tree" sequences for phylogenetic placement of metagenomic reads. In this article, we formalize these problems in terms of the minimization of the Average Distance to the Closest Leaf (ADCL) and investigate algorithms to perform the relevant minimization. We show that the greedy algorithm is not effective, show that a variant of the Partitioning Around Medoids (PAM) heuristic gets stuck in local minima, and develop an exact dynamic programming approach. Using this exact program we note that the performance of PAM appears to be good for simulated trees, and is faster than the exact algorithm for small trees. On the other hand, the exact program gives solutions for all numbers of leaves less than or equal to the given desired number of leaves, whereas PAM only gives a solution for the prespecified number of leaves. Via application to real data, we show that the ADCL criterion chooses chimeric sequences less often than random subsets, whereas the maximization of phylogenetic diversity chooses them more often than random. These algorithms have been implemented in publicly available software.

  11. PuLSE: Quality control and quantification of peptide sequences explored by phage display libraries.

    PubMed

    Shave, Steven; Mann, Stefan; Koszela, Joanna; Kerr, Alastair; Auer, Manfred

    2018-01-01

    The design of highly diverse phage display libraries is based on the assumption that DNA bases are incorporated at similar rates within the randomized sequence. As library complexity increases and expected copy numbers of unique sequences decrease, the exploration of library space becomes sparser and the presence of truly random sequences becomes critical. We present the program PuLSE (Phage Library Sequence Evaluation) as a tool for assessing randomness and therefore diversity of phage display libraries. PuLSE runs on a collection of sequence reads in the fastq file format and generates tables profiling the library in terms of unique DNA sequence counts and positions, translated peptide sequences, and normalized 'expected' occurrences from base to residue codon frequencies. The output allows at-a-glance quantitative quality control of a phage library in terms of sequence coverage both at the DNA base and translated protein residue level, which has been missing from toolsets and literature. The open source program PuLSE is available in two formats, a C++ source code package for compilation and integration into existing bioinformatics pipelines and precompiled binaries for ease of use.

  12. Heterogeneous Suppression of Sequential Effects in Random Sequence Generation, but Not in Operant Learning.

    PubMed

    Shteingart, Hanan; Loewenstein, Yonatan

    2016-01-01

    There is a long history of experiments in which participants are instructed to generate a long sequence of binary random numbers. The scope of this line of research has shifted over the years from identifying the basic psychological principles and/or the heuristics that lead to deviations from randomness, to one of predicting future choices. In this paper, we used generalized linear regression and the framework of Reinforcement Learning in order to address both points. In particular, we used logistic regression analysis in order to characterize the temporal sequence of participants' choices. Surprisingly, a population analysis indicated that the contribution of the most recent trial has only a weak effect on behavior, compared to more preceding trials, a result that seems irreconcilable with standard sequential effects that decay monotonically with the delay. However, when considering each participant separately, we found that the magnitudes of the sequential effect are a monotonically decreasing function of the delay, yet these individual sequential effects are largely averaged out in a population analysis because of heterogeneity. The substantial behavioral heterogeneity in this task is further demonstrated quantitatively by considering the predictive power of the model. We show that a heterogeneous model of sequential dependencies captures the structure available in random sequence generation. Finally, we show that the results of the logistic regression analysis can be interpreted in the framework of reinforcement learning, allowing us to compare the sequential effects in the random sequence generation task to those in an operant learning task. We show that in contrast to the random sequence generation task, sequential effects in operant learning are far more homogenous across the population. These results suggest that in the random sequence generation task, different participants adopt different cognitive strategies to suppress sequential dependencies when generating the "random" sequences.

  13. Recombination of polynucleotide sequences using random or defined primers

    DOEpatents

    Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin H; Giver, Lorraine J.

    2000-01-01

    A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequences or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturization followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.

  14. Recombination of polynucleotide sequences using random or defined primers

    DOEpatents

    Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin; Giver, Lorraine J.

    2001-01-01

    A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequences or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturization followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.

  15. Toward DNA-based Security Circuitry: First Step - Random Number Generation.

    PubMed

    Bogard, Christy M; Arazi, Benjamin; Rouchka, Eric C

    2008-08-10

    DNA-based circuit design is an area of research in which traditional silicon-based technologies are replaced by naturally occurring phenomena taken from biochemistry and molecular biology. Our team investigates the implications of DNA-based circuit design in serving security applications. As an initial step we develop a random number generation circuitry. A novel prototype schema employs solid-phase synthesis of oligonucleotides for random construction of DNA sequences. Temporary storage and retrieval is achieved through plasmid vectors.

  16. A new complexity measure for time series analysis and classification

    NASA Astrophysics Data System (ADS)

    Nagaraj, Nithin; Balasubramanian, Karthi; Dey, Sutirth

    2013-07-01

    Complexity measures are used in a number of applications including extraction of information from data such as ecological time series, detection of non-random structure in biomedical signals, testing of random number generators, language recognition and authorship attribution etc. Different complexity measures proposed in the literature like Shannon entropy, Relative entropy, Lempel-Ziv, Kolmogorov and Algorithmic complexity are mostly ineffective in analyzing short sequences that are further corrupted with noise. To address this problem, we propose a new complexity measure ETC and define it as the "Effort To Compress" the input sequence by a lossless compression algorithm. Here, we employ the lossless compression algorithm known as Non-Sequential Recursive Pair Substitution (NSRPS) and define ETC as the number of iterations needed for NSRPS to transform the input sequence to a constant sequence. We demonstrate the utility of ETC in two applications. ETC is shown to have better correlation with Lyapunov exponent than Shannon entropy even with relatively short and noisy time series. The measure also has a greater rate of success in automatic identification and classification of short noisy sequences, compared to entropy and a popular measure based on Lempel-Ziv compression (implemented by Gzip).
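
    The definition of ETC given above (the number of pair-substitution passes needed to reach a constant sequence) can be sketched as follows. This is a simplified reading of the abstract, not the authors' implementation: the function name `effort_to_compress` is hypothetical, ties between equally frequent pairs are broken by first occurrence, and overlapping pairs in runs of identical symbols are counted naively.

```python
def effort_to_compress(sequence):
    """Count pair-substitution passes until the sequence becomes constant."""
    # Map arbitrary symbols to small integers so fresh symbols are easy to mint.
    alphabet = {sym: i for i, sym in enumerate(sorted(set(sequence)))}
    seq = [alphabet[sym] for sym in sequence]
    passes = 0
    while len(set(seq)) > 1:
        # Count adjacent pairs (assumption: overlaps are not treated specially).
        counts = {}
        for pair in zip(seq, seq[1:]):
            counts[pair] = counts.get(pair, 0) + 1
        # max() returns the first pair with the highest count (dict keeps insertion order).
        target = max(counts, key=counts.get)
        fresh = max(seq) + 1
        # Replace non-overlapping occurrences of the target pair, left to right.
        reduced, i = [], 0
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == target:
                reduced.append(fresh)
                i += 2
            else:
                reduced.append(seq[i])
                i += 1
        seq = reduced
        passes += 1
    return passes
```

    Under these assumptions a constant input needs no passes, a strictly periodic input such as "0101" collapses after one pass, and less regular inputs take more passes.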

  17. Pseudorandom number generation using chaotic true orbits of the Bernoulli map

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Saito, Asaki; Yamaguchi, Akihiro

    We devise a pseudorandom number generator that exactly computes chaotic true orbits of the Bernoulli map on quadratic algebraic integers. Moreover, we describe a way to select the initial points (seeds) for generating multiple pseudorandom binary sequences. This selection method distributes the initial points almost uniformly (equidistantly) in the unit interval, and latter parts of the generated sequences are guaranteed not to coincide. We also demonstrate through statistical testing that the generated sequences possess good randomness properties.

  18. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

    PubMed

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or the user can upload sequences of interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analysis tasks routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information (total number of nucleotide bases, ATGC base contents along with their respective percentages), and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available. Availability: http://www.cemb.edu.pk/sw.html Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.

  19. Mean convergence theorems and weak laws of large numbers for weighted sums of random variables under a condition of weighted integrability

    NASA Astrophysics Data System (ADS)

    Ordóñez Cabrera, Manuel; Volodin, Andrei I.

    2005-05-01

    From the classical notion of uniform integrability of a sequence of random variables, a new concept of integrability (called h-integrability) is introduced for an array of random variables, concerning an array of constants. We prove that this concept is weaker than other previous related notions of integrability, such as Cesàro uniform integrability [Chandra, Sankhya Ser. A 51 (1989) 309-317], uniform integrability concerning the weights [Ordóñez Cabrera, Collect. Math. 45 (1994) 121-132] and Cesàro [alpha]-integrability [Chandra and Goswami, J. Theoret. Probab. 16 (2003) 655-669]. Under this condition of integrability and appropriate conditions on the array of weights, mean convergence theorems and weak laws of large numbers for weighted sums of an array of random variables are obtained when the random variables are subject to some special kinds of dependence: (a) rowwise pairwise negative dependence, (b) rowwise pairwise non-positive correlation, (c) when the sequence of random variables in every row is [phi]-mixing. Finally, we consider the general weak law of large numbers in the sense of Gut [Statist. Probab. Lett. 14 (1992) 49-52] under this new condition of integrability for a Banach space setting.

  20. Autonomous Byte Stream Randomizer

    NASA Technical Reports Server (NTRS)

    Paloulian, George K.; Woo, Simon S.; Chow, Edward T.

    2013-01-01

    Net-centric networking environments are often faced with limited resources and must utilize bandwidth as efficiently as possible. In networking environments that span wide areas, the data transmission has to be efficient without any redundant or exuberant metadata. The Autonomous Byte Stream Randomizer software provides an extra level of security on top of existing data encryption methods. Randomizing the data's byte stream adds an extra layer to existing data protection methods, thus making it harder for an attacker to decrypt protected data. Based on a generated cryptographically secure random seed, a random sequence of numbers is used to intelligently and efficiently swap the organization of bytes in data using the unbiased and memory-efficient in-place Fisher-Yates shuffle method. Swapping bytes and reorganizing the crucial structure of the byte data renders the data file unreadable and leaves the data in a deconstructed state. This deconstruction adds an extra level of security requiring the byte stream to be reconstructed with the random seed in order to be readable. Once the data byte stream has been randomized, the software enables the data to be distributed to N nodes in an environment. Each piece of the data in randomized and distributed form is a separate entity unreadable in its own right, but when combined with all N pieces, is able to be reconstructed back to one. Reconstruction requires possession of the key used for randomizing the bytes, leading to the generation of the same cryptographically secure random sequence of numbers used to randomize the data. This software is a cornerstone capability possessing the ability to generate the same cryptographically secure sequence on different machines and time intervals, thus allowing this software to be used more heavily in net-centric environments where data transfer bandwidth is limited.
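
    The seeded shuffle and its reversal described above can be sketched as follows. This is only an illustration of an in-place Fisher-Yates shuffle that can be undone by regenerating the same swap sequence from the seed; the function names are hypothetical, and Python's `random.Random` is used purely as a reproducible stand-in, since it is not a cryptographically secure generator like the one the abstract describes.

```python
import random


def _swap_targets(length, seed):
    """Regenerate the Fisher-Yates swap partners for positions length-1 .. 1."""
    rng = random.Random(seed)  # assumption: stand-in for a secure seeded generator
    return [(i, rng.randrange(i + 1)) for i in range(length - 1, 0, -1)]


def randomize_bytes(data, seed):
    """Shuffle the bytes in place (on a copy) with a seeded Fisher-Yates pass."""
    buf = bytearray(data)
    for i, j in _swap_targets(len(buf), seed):
        buf[i], buf[j] = buf[j], buf[i]
    return bytes(buf)


def reconstruct_bytes(shuffled, seed):
    """Undo the shuffle by replaying the same swaps in reverse order."""
    buf = bytearray(shuffled)
    for i, j in reversed(_swap_targets(len(buf), seed)):
        buf[i], buf[j] = buf[j], buf[i]
    return bytes(buf)
```

    Because every swap is its own inverse, replaying the swaps backwards restores the original order, which is why possession of the seed is sufficient for reconstruction in this sketch.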

  1. Autocorrelation peaks in congruential pseudorandom number generators

    NASA Technical Reports Server (NTRS)

    Neuman, F.; Merrick, R. B.

    1976-01-01

    The complete correlation structure of several congruential pseudorandom number generators (PRNG) of the same type and small cycle length was studied to deal with the problem of congruential PRNG almost repeating themselves at intervals smaller than their cycle lengths, during simulation of bandpass filtered normal random noise. Maximum period multiplicative and mixed congruential generators were studied, with inferences drawn from examination of several tractable members of a class of random number generators, and moduli from 2 to the 5th power to 2 to the 9th power. High correlation is shown to exist in mixed and multiplicative congruential random number generators and prime moduli Lehmer generators for shifts a fraction of their cycle length. The random noise sequences in question are required when simulating electrical noise, air turbulence, or time variation of wind parameters.

  2. Random sampling causes the low reproducibility of rare eukaryotic OTUs in Illumina COI metabarcoding.

    PubMed

    Leray, Matthieu; Knowlton, Nancy

    2017-01-01

    DNA metabarcoding, the PCR-based profiling of natural communities, is becoming the method of choice for biodiversity monitoring because it circumvents some of the limitations inherent to traditional ecological surveys. However, potential sources of bias that can affect the reproducibility of this method remain to be quantified. The interpretation of differences in patterns of sequence abundance and the ecological relevance of rare sequences remain particularly uncertain. Here we used one artificial mock community to explore the significance of abundance patterns and disentangle the effects of two potential biases on data reproducibility: indexed PCR primers and random sampling during Illumina MiSeq sequencing. We amplified a short fragment of the mitochondrial Cytochrome c Oxidase Subunit I (COI) for a single mock sample containing equimolar amounts of total genomic DNA from 34 marine invertebrates belonging to six phyla. We used seven indexed broad-range primers and sequenced the resulting library on two consecutive Illumina MiSeq runs. The total number of Operational Taxonomic Units (OTUs) was ∼4 times higher than expected based on the composition of the mock sample. Moreover, the total number of reads for the 34 components of the mock sample differed by up to three orders of magnitude. However, 79 out of 86 of the unexpected OTUs were represented by <10 sequences that did not appear consistently across replicates. Our data suggest that random sampling of rare OTUs (e.g., small associated fauna such as parasites) accounted for most of variation in OTU presence-absence, whereas biases associated with indexed PCRs accounted for a larger amount of variation in relative abundance patterns. These results suggest that random sampling during sequencing leads to the low reproducibility of rare OTUs. We suggest that the strategy for handling rare OTUs should depend on the objectives of the study. Systematic removal of rare OTUs may avoid inflating diversity based on common β descriptors but will exclude positive records of taxa that are functionally important. Our results further reinforce the need for technical replicates (parallel PCR and sequencing from the same sample) in metabarcoding experimental designs. Data reproducibility should be determined empirically as it will depend upon the sequencing depth, the type of sample, the sequence analysis pipeline, and the number of replicates. Moreover, estimating relative biomasses or abundances based on read counts remains elusive at the OTU level.

  3. Digital RNA sequencing minimizes sequence-dependent bias and amplification noise with optimized single-molecule barcodes

    PubMed Central

    Shiroguchi, Katsuyuki; Jia, Tony Z.; Sims, Peter A.; Xie, X. Sunney

    2012-01-01

    RNA sequencing (RNA-Seq) is a powerful tool for transcriptome profiling, but is hampered by sequence-dependent bias and inaccuracy at low copy numbers intrinsic to exponential PCR amplification. We developed a simple strategy for mitigating these complications, allowing truly digital RNA-Seq. Following reverse transcription, a large set of barcode sequences is added in excess, and nearly every cDNA molecule is uniquely labeled by random attachment of barcode sequences to both ends. After PCR, we applied paired-end deep sequencing to read the two barcodes and cDNA sequences. Rather than counting the number of reads, RNA abundance is measured based on the number of unique barcode sequences observed for a given cDNA sequence. We optimized the barcodes to be unambiguously identifiable, even in the presence of multiple sequencing errors. This method allows counting with single-copy resolution despite sequence-dependent bias and PCR-amplification noise, and is analogous to digital PCR but amenable to quantifying a whole transcriptome. We demonstrated transcriptome profiling of Escherichia coli with more accurate and reproducible quantification than conventional RNA-Seq. PMID:22232676
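
    The counting rule described above (abundance as the number of unique barcode combinations per cDNA rather than the number of reads) can be sketched as follows. The read representation as `(cdna, left_barcode, right_barcode)` tuples and the function name are assumptions made for illustration, and the sketch omits the barcode error correction that the abstract mentions.

```python
from collections import defaultdict


def digital_counts(reads):
    """Return, per cDNA sequence, the number of distinct barcode pairs observed.

    `reads` is an iterable of (cdna, left_barcode, right_barcode) tuples.
    PCR duplicates share the same barcode pair and therefore count once.
    """
    barcode_pairs = defaultdict(set)
    for cdna, left_barcode, right_barcode in reads:
        barcode_pairs[cdna].add((left_barcode, right_barcode))
    return {cdna: len(pairs) for cdna, pairs in barcode_pairs.items()}
```

    For example, five reads of one cDNA that carry only two distinct barcode pairs are counted as two original molecules, regardless of how unevenly PCR amplified them.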

  4. Accuracy of Reaction Cross Section for Exotic Nuclei in Glauber Model Based on MCMC Diagnostics

    NASA Astrophysics Data System (ADS)

    Rueter, Keiti; Novikov, Ivan

    2017-01-01

    Parameters of a nuclear density distribution for exotic nuclei with halo or skin structures can be determined from the experimentally measured reaction cross-section. In the presented work, to extract parameters such as nuclear size information for a halo and core, we compare experimental data on reaction cross-sections with values obtained using expressions of the Glauber Model. These calculations are performed using a Markov Chain Monte Carlo algorithm. We discuss the accuracy of the Monte Carlo approach and its dependence on k*, the power law turnover point in the discrete power spectrum of the random number sequence and on the lag-1 autocorrelation time of the random number sequence.

  5. The correlation structure of several popular pseudorandom number generators

    NASA Technical Reports Server (NTRS)

    Neuman, F.; Merrick, R.; Martin, C. F.

    1973-01-01

    One of the desirable properties of a pseudorandom number generator is that the sequence of numbers it generates should have very low autocorrelation for all shifts except for zero shift and those that are multiples of its cycle length. Due to the simple methods of constructing random numbers, the ideal is often not quite fulfilled. A simple method of examining any random generator for previously unsuspected regularities is discussed. Once they are discovered it is often easy to derive the mathematical relationships which describe the regular behavior. As examples, it is shown that high correlation exists in mixed and multiplicative congruential random number generators and prime moduli Lehmer generators for shifts a fraction of their cycle lengths.
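
    The kind of check described above can be sketched by computing the circular autocorrelation of a small generator over one full cycle. The mixed congruential generator used here (multiplier 5, increment 1, modulus 2^9) is an illustrative choice that has full period; it is not claimed to be one of the generators examined in the report, and the function names are hypothetical.

```python
def lcg_cycle(a=5, c=1, m=512, seed=0):
    """Return one full cycle of x -> (a*x + c) mod m starting from `seed`."""
    values, x = [], seed
    for _ in range(m):
        values.append(x)
        x = (a * x + c) % m
    return values


def circular_autocorrelation(values, shift):
    """Normalized autocorrelation of a periodic sequence at the given shift."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    covariance = sum(
        (values[i] - mean) * (values[(i + shift) % n] - mean) for i in range(n)
    ) / n
    return covariance / variance
```

    For this power-of-two modulus, shifting by half the cycle leaves the low-order bits unchanged and flips only the top bit, so the autocorrelation at shift 256 comes out close to -0.5 rather than near zero, an instance of strong correlation at a fraction of the cycle length.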

  6. On the conservative nature of intragenic recombination

    PubMed Central

    Drummond, D. Allan; Silberg, Jonathan J.; Meyer, Michelle M.; Wilke, Claus O.; Arnold, Frances H.

    2005-01-01

    Intragenic recombination rapidly creates protein sequence diversity compared with random mutation, but little is known about the relative effects of recombination and mutation on protein function. Here, we compare recombination of the distantly related β-lactamases PSE-4 and TEM-1 to mutation of PSE-4. We show that, among β-lactamase variants containing the same number of amino acid substitutions, variants created by recombination retain function with a significantly higher probability than those generated by random mutagenesis. We present a simple model that accurately captures the differing effects of mutation and recombination in real and simulated proteins with only four parameters: (i) the amino acid sequence distance between parents, (ii) the number of substitutions, (iii) the average probability that random substitutions will preserve function, and (iv) the average probability that substitutions generated by recombination will preserve function. Our results expose a fundamental functional enrichment in regions of protein sequence space accessible by recombination and provide a framework for evaluating whether the relative rates of mutation and recombination observed in nature reflect the underlying imbalance in their effects on protein function. PMID:15809422

  7. On the conservative nature of intragenic recombination.

    PubMed

    Drummond, D Allan; Silberg, Jonathan J; Meyer, Michelle M; Wilke, Claus O; Arnold, Frances H

    2005-04-12

    Intragenic recombination rapidly creates protein sequence diversity compared with random mutation, but little is known about the relative effects of recombination and mutation on protein function. Here, we compare recombination of the distantly related beta-lactamases PSE-4 and TEM-1 to mutation of PSE-4. We show that, among beta-lactamase variants containing the same number of amino acid substitutions, variants created by recombination retain function with a significantly higher probability than those generated by random mutagenesis. We present a simple model that accurately captures the differing effects of mutation and recombination in real and simulated proteins with only four parameters: (i) the amino acid sequence distance between parents, (ii) the number of substitutions, (iii) the average probability that random substitutions will preserve function, and (iv) the average probability that substitutions generated by recombination will preserve function. Our results expose a fundamental functional enrichment in regions of protein sequence space accessible by recombination and provide a framework for evaluating whether the relative rates of mutation and recombination observed in nature reflect the underlying imbalance in their effects on protein function.

  8. RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis

    PubMed Central

    Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

    2012-01-01

    RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or the user can upload sequences of interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analysis tasks routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information (total number of nucleotide bases, ATGC base contents along with their respective percentages), and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available. Availability: http://www.cemb.edu.pk/sw.html Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language. PMID:23055611

  9. Pilot Study on the Applicability of Variance Reduction Techniques to the Simulation of a Stochastic Combat Model

    DTIC Science & Technology

    1987-09-01

    inverse transform method to obtain unit-mean exponential random variables, where Vi is the jth random number in the sequence of a stream of uniform random...numbers. The inverse transform method is discussed in the simulation textbooks listed in the reference section of this thesis. X(b,c,d) = - P(b,c,d...Defender ,C * P(b,c,d) We again use the inverse transform method to obtain the conditions for an interim event to occur and to induce the change in
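
    The inverse transform step mentioned in the excerpt above can be sketched for the unit-mean exponential case. This is a generic textbook illustration, not a reconstruction of the thesis model: the function names are hypothetical, and `1 - u` is used in place of `u` only to avoid taking the logarithm of zero, which leaves the distribution unchanged.

```python
import math
import random


def unit_exponential(u):
    """Map a uniform number u in [0, 1) to a unit-mean exponential variate."""
    # Inverting the CDF F(x) = 1 - exp(-x) gives x = -ln(1 - u).
    return -math.log(1.0 - u)


def exponential_stream(count, seed=0):
    """Draw `count` unit-mean exponential variates from one uniform stream."""
    rng = random.Random(seed)
    return [unit_exponential(rng.random()) for _ in range(count)]
```

    Because each variate is a monotone function of a single uniform number, the same uniform stream can be reused or mirrored across runs, which is the property that variance reduction techniques typically exploit.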

  10. Genomes: At the edge of chaos with maximum information capacity

    NASA Astrophysics Data System (ADS)

    Kong, Sing-Guan; Chen, Hong-Da; Torda, Andrew; Lee, H. C.

    2016-12-01

    We propose an order index, ϕ, which quantifies the notion of “life at the edge of chaos” when applied to genome sequences. It maps genomes to a number from 0 (random and of infinite length) to 1 (fully ordered) and applies regardless of sequence length and base composition. The 786 complete genomic sequences in GenBank were found to have ϕ values in a very narrow range, 0.037 ± 0.027. We show this implies that genomes are halfway towards being completely random, namely, at the edge of chaos. We argue that this narrow range represents the neighborhood of a fixed-point in the space of sequences, and genomes are driven there by the dynamics of a robust, predominantly neutral evolution process.

  11. Complete convergence of randomly weighted END sequences and its application.

    PubMed

    Li, Penghua; Li, Xiaoqin; Wu, Kehan

    2017-01-01

    We investigate the complete convergence of partial sums of randomly weighted extended negatively dependent (END) random variables. Some results of complete moment convergence, complete convergence and the strong law of large numbers for this dependent structure are obtained. As an application, we study the convergence of the state observers of linear-time-invariant systems. Our results extend the corresponding earlier ones.

  12. Concatenated shift registers generating maximally spaced phase shifts of PN-sequences

    NASA Technical Reports Server (NTRS)

    Hurd, W. J.; Welch, L. R.

    1977-01-01

    A large class of linearly concatenated shift registers is shown to generate approximately maximally spaced phase shifts of pn-sequences, for use in pseudorandom number generation. A constructive method is presented for finding members of this class, for almost all degrees for which primitive trinomials exist. The sequences which result are not normally characterized by trinomial recursions, which is desirable since trinomial sequences can have some undesirable randomness properties.

  13. Quantum random number generator based on quantum nature of vacuum fluctuations

    NASA Astrophysics Data System (ADS)

    Ivanova, A. E.; Chivilikhin, S. A.; Gleim, A. V.

    2017-11-01

    A quantum random number generator (QRNG) allows true random bit sequences to be obtained. In a QRNG based on the quantum nature of vacuum, an optical beam splitter with two inputs and two outputs is normally used. We compare mathematical descriptions of a spatial beam splitter and a fiber Y-splitter in the quantum model for a QRNG based on homodyne detection. These descriptions are identical, which allows fiber Y-splitters to be used in practical QRNG schemes, simplifying the setup. We also obtain relations between the input radiation and the resulting differential current in the homodyne detector. We experimentally demonstrate the possibility of true random bit generation using a QRNG based on homodyne detection with a Y-splitter.

  14. Gene copy number evolution during tetraploid cotton radiation.

    PubMed

    Rong, J; Feltus, F A; Liu, L; Lin, L; Paterson, A H

    2010-11-01

    After polyploid formation, retention or loss of duplicated genes is not random. Genes with some functional domains are convergently restored to 'singleton' state after many independent genome duplications, and have been referred to as 'duplication-resistant' (DR) genes. To further explore the timeframe for their restoration to the singleton state, 27 cotton homologs of genes found to be 'DR' in Arabidopsis were selected based on diagnostic Pfam domains. Their copy numbers were studied using southern hybridization and sequence analysis in five tetraploid species and their ancestral A and D genome diploids. DR genes had significantly lower copy number than gene families hybridizing to randomly selected cotton ESTs. Three DR genes showed complete loss of D genome-derived homoeologs in some or all tetraploid species. Prior analysis has shown gene loss in polyploid cotton to be rare, and herein only one randomly selected gene showed loss of a homoeolog in only one of the five tetraploid species (Gossypium mustelinum). BAC sequencing confirmed two cases of gene loss in tetraploid cotton. Divergence among 5' sequences of DR genes amplified from G. arboreum, G. raimondii, and Gossypioides kirkii was correlated with gene copy number. These results show that genes containing Pfam domains associated with duplication resistance in Arabidopsis have also been preferentially restored to low copy number after a more recent polyploidization event in cotton. In tetraploid cotton, genes from the progenitor D genome seem to experience more gene copy number divergence than genes from the A genome. Together with D subgenome-biased alterations in gene expression, perhaps gene loss may contribute to the relatively larger portion of quantitative trait variation attributable to D than A subgenome chromosomes of tetraploid cotton.

  15. Random trinomial tree models and vanilla options

    NASA Astrophysics Data System (ADS)

    Ganikhodjaev, Nasir; Bayram, Kamola

    2013-09-01

    In this paper we introduce and study the random trinomial model. The usual trinomial model is prescribed by a triple of numbers (u, d, m). We call the triple (u, d, m) an environment of the trinomial model. A triple (Un, Dn, Mn), where {Un}, {Dn} and {Mn} are sequences of independent, identically distributed random variables with 0 < Dn < 1 < Un and Mn = 1 for all n, is called a random environment, and a trinomial tree model with a random environment is called a random trinomial model. The random trinomial model is considered to produce more accurate results than the random binomial model or the usual trinomial model.

  16. Minimalist design of a robust real-time quantum random number generator

    NASA Astrophysics Data System (ADS)

    Kravtsov, K. S.; Radchenko, I. V.; Kulik, S. P.; Molotkov, S. N.

    2015-08-01

    We present a simple and robust construction of a real-time quantum random number generator (QRNG). Our minimalist approach ensures stable operation of the device as well as its simple and straightforward hardware implementation as a stand-alone module. As a source of randomness the device uses measurements of time intervals between clicks of a single-photon detector. The obtained raw sequence is then filtered and processed by a deterministic randomness extractor, which is realized as a look-up table. This enables high speed on-the-fly processing without the need of extensive computations. The overall performance of the device is around 1 random bit per detector click, resulting in 1.2 Mbit/s generation rate in our implementation.

  17. Quantum random bit generation using energy fluctuations in stimulated Raman scattering.

    PubMed

    Bustard, Philip J; England, Duncan G; Nunn, Josh; Moffatt, Doug; Spanner, Michael; Lausten, Rune; Sussman, Benjamin J

    2013-12-02

    Random number sequences are a critical resource in modern information processing systems, with applications in cryptography, numerical simulation, and data sampling. We introduce a quantum random number generator based on the measurement of pulse energy quantum fluctuations in Stokes light generated by spontaneously-initiated stimulated Raman scattering. Bright Stokes pulse energy fluctuations up to five times the mean energy are measured with fast photodiodes and converted to unbiased random binary strings. Since the pulse energy is a continuous variable, multiple bits can be extracted from a single measurement. Our approach can be generalized to a wide range of Raman active materials; here we demonstrate a prototype using the optical phonon line in bulk diamond.

  18. Normal and compound poisson approximations for pattern occurrences in NGS reads.

    PubMed

    Zhai, Zhiyuan; Reinert, Gesine; Song, Kai; Waterman, Michael S; Luan, Yihui; Sun, Fengzhu

    2012-06-01

    Next generation sequencing (NGS) technologies are now widely used in many biological studies. In NGS, sequence reads are randomly sampled from the genome sequence of interest. Most computational approaches for NGS data first map the reads to the genome and then analyze the data based on the mapped reads. Since many organisms have unknown genome sequences and many reads cannot be uniquely mapped to the genomes even if the genome sequences are known, alternative analytical methods are needed for the study of NGS data. Here we suggest using word patterns to analyze NGS data. Word pattern counting (the study of the probabilistic distribution of the number of occurrences of word patterns in one or multiple long sequences) has played an important role in molecular sequence analysis. However, no studies are available on the distribution of the number of occurrences of word patterns in NGS reads. In this article, we build probabilistic models for the background sequence and the sampling process of the sequence reads from the genome. Based on the models, we provide normal and compound Poisson approximations for the number of occurrences of word patterns from the sequence reads, with bounds on the approximation error. The main challenge is to consider the randomness in generating the long background sequence, as well as in the sampling of the reads using NGS. We show the accuracy of these approximations under a variety of conditions for different patterns with various characteristics. Under realistic assumptions, the compound Poisson approximation seems to outperform the normal approximation in most situations. These approximate distributions can be used to evaluate the statistical significance of the occurrence of patterns from NGS data. The theory and the computational algorithm for calculating the approximate distributions are then used to analyze ChIP-Seq data using transcription factor GABP. 
    Software is available online (www-rcf.usc.edu/~fsun/Programs/NGS_motif_power/NGS_motif_power.html). In addition, Supplementary Material can be found online (www.liebertonline.com/cmb).

  19. Whole-genome random sequencing and assembly of Haemophilus influenzae Rd

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fleischmann, R.D.; Adams, M.D.; White, O.

    1995-07-28

    An approach for genome analysis based on sequencing and assembly of unselected pieces of DNA from the whole chromosome has been applied to obtain the complete nucleotide sequence (1,830,137 base pairs) of the genome from the bacterium Haemophilus influenzae Rd. This approach eliminates the need for initial mapping efforts and is therefore applicable to the vast array of microbial species for which genome maps are unavailable. The H. influenzae Rd genome sequence (Genome Sequence DataBase accession number L42023) represents the only complete genome sequence from a free-living organism. 46 refs., 4 figs., 4 tabs.

  20. Counting of oligomers in sequences generated by markov chains for DNA motif discovery.

    PubMed

    Shan, Gao; Zheng, Wei-Mou

    2009-02-01

    By means of the technique of the imbedded Markov chain, an efficient algorithm is proposed to exactly calculate first, second moments of word counts and the probability for a word to occur at least once in random texts generated by a Markov chain. A generating function is introduced directly from the imbedded Markov chain to derive asymptotic approximations for the problem. Two Z-scores, one based on the number of sequences with hits and the other on the total number of word hits in a set of sequences, are examined for discovery of motifs on a set of promoter sequences extracted from A. thaliana genome. Source code is available at http://www.itp.ac.cn/zheng/oligo.c.
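
    The simplest of the quantities named above, the first moment of a word count, can be illustrated without the imbedded Markov chain machinery. The sketch below is not the authors' algorithm: it uses only linearity of expectation and assumes a first-order chain whose per-position letter distribution is stationary. The function name and the dictionary layout are hypothetical.

```python
def expected_word_count(word, text_len, stationary, trans):
    """Expected number of (possibly overlapping) occurrences of word.

    stationary: dict letter -> stationary probability of that letter
    trans:      dict (a, b) -> probability that b follows a
    Assumes the text is generated by a first-order Markov chain started
    from its stationary distribution.
    """
    if not word or len(word) > text_len:
        return 0.0
    # Probability that the word starts at one fixed position.
    p = stationary[word[0]]
    for a, b in zip(word, word[1:]):
        p *= trans[(a, b)]
    # Linearity of expectation over all admissible start positions.
    return (text_len - len(word) + 1) * p


# Usage with an i.i.d. uniform DNA model written as a Markov chain.
letters = "ACGT"
uniform = {a: 0.25 for a in letters}
uniform_trans = {(a, b): 0.25 for a in letters for b in letters}
mean_hits = expected_word_count("ACG", 10, uniform, uniform_trans)
```

    Second moments and the probability of at least one hit depend on how the word overlaps itself, which is where a dedicated algorithm such as the one described in the abstract is needed.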

  1. Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs

    NASA Astrophysics Data System (ADS)

    Choi, Woo-Yong; Chatterjee, Mainak

    2015-03-01

    With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that end, we propose a real-time cluster-based multipolling sequencing algorithm that eliminates more than 90% of the polling overhead, particularly when the dynamic search algorithm fails to derive the multipolling sequence in real time.

  2. Identification of cancer-specific motifs in mimotope profiles of serum antibody repertoire.

    PubMed

    Gerasimov, Ekaterina; Zelikovsky, Alex; Măndoiu, Ion; Ionov, Yurij

    2017-06-07

    For fighting cancer, earlier detection is crucial. Circulating auto-antibodies produced by the patient's own immune system after exposure to cancer proteins are promising bio-markers for the early detection of cancer. Since an antibody recognizes not the whole antigen but 4-7 critical amino acids within the antigenic determinant (epitope), the whole proteome can be represented by a random peptide phage display library. This opens the possibility to develop an early cancer detection test based on a set of peptide sequences identified by comparing cancer patients' and healthy donors' global peptide profiles of antibody specificities. Due to the enormously large number of peptide sequences contained in global peptide profiles generated by next generation sequencing, a large number of cancer and control sera is required to identify cancer-specific peptides with a high degree of statistical significance. To decrease the number of peptides in profiles generated by nextgen sequencing without losing cancer-specific sequences, we generated the profiles from a phage library enriched by panning on a pool of cancer sera. To further decrease the complexity of the profiles, we used computational methods to transform the list of peptides constituting the mimotope profiles into a list of motifs formed by similar peptide sequences. We have shown that the amino-acid order is meaningful in mimotope motifs, since they contain significantly more peptides than motifs among peptides where amino acids are randomly permuted. Also, the single-sample motifs significantly differ from motifs in peptides drawn from multiple samples. Finally, multiple cancer-specific motifs have been identified.

  3. Effects of 16S rDNA sampling on estimates of the number of endosymbiont lineages in sucking lice

    PubMed Central

    Burleigh, J. Gordon; Light, Jessica E.; Reed, David L.

    2016-01-01

    Phylogenetic trees can reveal the origins of endosymbiotic lineages of bacteria and detect patterns of co-evolution with their hosts. Although taxon sampling can greatly affect phylogenetic and co-evolutionary inference, most hypotheses of endosymbiont relationships are based on few available bacterial sequences. Here we examined how different sampling strategies of Gammaproteobacteria sequences affect estimates of the number of endosymbiont lineages in parasitic sucking lice (Insecta: Phthiraptera: Anoplura). We estimated the number of louse endosymbiont lineages using both newly obtained and previously sequenced 16S rDNA bacterial sequences and more than 42,000 16S rDNA sequences from other Gammaproteobacteria. We also performed parametric and nonparametric bootstrapping experiments to examine the effects of phylogenetic error and uncertainty on these estimates. Sampling of 16S rDNA sequences affects the estimates of endosymbiont diversity in sucking lice until we reach a threshold of genetic diversity, the size of which depends on the sampling strategy. Sampling by maximizing the diversity of 16S rDNA sequences is more efficient than randomly sampling available 16S rDNA sequences. Although simulation results validate estimates of multiple endosymbiont lineages in sucking lice, the bootstrap results suggest that the precise number of endosymbiont origins is still uncertain. PMID:27547523

  4. The randomized benchmarking number is not what you think it is

    NASA Astrophysics Data System (ADS)

    Proctor, Timothy; Rudinger, Kenneth; Blume-Kohout, Robin; Sarovar, Mohan; Young, Kevin

    Randomized benchmarking (RB) is a widely used technique for characterizing a gate set, whereby random sequences of gates are used to probe the average behavior of the gate set. The gates are chosen to ideally compose to the identity, and the rate of decay in the survival probability of an initial state with increasing length sequences is extracted from a set of experiments - this is the `RB number'. For reasonably well-behaved noise and particular gate sets, it has been claimed that the RB number is a reliable estimate of the average gate fidelity (AGF) of each noisy gate to the ideal target unitary, averaged over all gates in the set. Contrary to this widely held view, we show that this is not the case. We show that there are physically relevant situations, in which RB was thought to be provably reliable, where the RB number is many orders of magnitude away from the AGF. These results have important implications for interpreting the RB protocol, and immediate consequences for many advanced RB techniques. Sandia National Laboratories is a multi-mission laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy's National Nuclear Security Administration under contract DE-AC04-94AL85000.

  5. OPEN PROBLEM: Orbits' statistics in chaotic dynamical systems

    NASA Astrophysics Data System (ADS)

    Arnold, V.

    2008-07-01

    This paper shows how the measurement of the stochasticity degree of a finite sequence of real numbers, published by Kolmogorov in Italian in a journal of insurance statistics, can be usefully applied to measure the objective stochasticity degree of sequences originating from dynamical systems theory and from number theory. Namely, whenever the value of Kolmogorov's stochasticity parameter of a given sequence of numbers is too small (or too big), one may conclude that the conjecture describing this sequence as a sample of independent values of a random variable is highly improbable. Kolmogorov used this strategy when fighting (in a paper in 'Doklady', 1940) against Lysenko, who had tried to disprove the classical genetics law of Mendel experimentally. Calculating his stochasticity parameter value for the numbers from Lysenko's experiment reports, Kolmogorov deduced that, while these numbers were different from the exact fulfilment of Mendel's 3 : 1 law, any smaller deviation would be a manifestation of falsification of the reported numbers. The calculation of the values of the stochasticity parameter would be useful for many other generators of pseudorandom numbers and for many other chaotically looking statistics, including even the prime numbers distribution (discussed in this paper as an example).
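
    The stochasticity parameter discussed above can be sketched in a few lines. The code assumes the standard form of Kolmogorov's statistic, lambda_n = sqrt(n) * sup |F_n(x) - F(x)|, where F_n is the empirical distribution function of the sample and F the hypothesized one, together with Kolmogorov's limiting distribution Phi(lambda) = sum over all integers k of (-1)^k exp(-2 k^2 lambda^2). The function names and the series truncation are illustrative choices, not taken from the paper.

```python
import math


def kolmogorov_lambda(sample, cdf):
    """sqrt(n) times the largest gap between the empirical CDF and cdf."""
    xs = sorted(sample)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return math.sqrt(n) * gap


def kolmogorov_phi(lam, terms=100):
    """Limiting probability that the parameter is at most lam (lam > 0)."""
    return sum((-1) ** k * math.exp(-2.0 * k * k * lam * lam)
               for k in range(-terms, terms + 1))


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(1.0, max(0.0, x))


lam = kolmogorov_lambda([0.25, 0.75], uniform_cdf)
```

    A value of Phi(lambda) very close to 0 signals a sample that follows the hypothesized law suspiciously well, and a value very close to 1 signals one that deviates too much; both cases make the independence hypothesis improbable, which is the two-sided reading described in the abstract.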

  6. Population genetics and molecular evolution of DNA sequences in transposable elements. I. A simulation framework.

    PubMed

    Kijima, T E; Innan, Hideki

    2013-11-01

    A population genetic simulation framework is developed to understand the behavior and molecular evolution of DNA sequences of transposable elements. Our model incorporates random transposition and excision of transposable element (TE) copies, two modes of selection against TEs, and degeneration of transpositional activity by point mutations. We first investigated the relationships between the behavior of the copy number of TEs and these parameters. Our results show that when selection is weak, the genome can maintain a relatively large number of TEs, but most of them are less active. In contrast, with strong selection, the genome can maintain only a limited number of TEs but the proportion of active copies is large. In such a case, there could be substantial fluctuations of the copy number over generations. We also explored how DNA sequences of TEs evolve through the simulations. In general, active copies form clusters around the original sequence, while less active copies have long branches specific to themselves, exhibiting a star-shaped phylogeny. It is demonstrated that the phylogeny of TE sequences could be informative to understand the dynamics of TE evolution.

  7. Simple chained guide trees give high-quality protein multiple sequence alignments

    PubMed Central

    Boyce, Kieran; Sievers, Fabian; Higgins, Desmond G.

    2014-01-01

    Guide trees are used to decide the order of sequence alignment in the progressive multiple sequence alignment heuristic. These guide trees are often the limiting factor in making large alignments, and considerable effort has been expended over the years in making these quickly or accurately. In this article we show that, at least for protein families with large numbers of sequences that can be benchmarked with known structures, simple chained guide trees give the most accurate alignments. These also happen to be the fastest and simplest guide trees to construct, computationally. Such guide trees have a striking effect on the accuracy of alignments produced by some of the most widely used alignment packages. There is a marked increase in accuracy and a marked decrease in computational time, once the number of sequences goes much above a few hundred. This is true, even if the order of sequences in the guide tree is random. PMID:25002495

  8. FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.

    PubMed

    El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

    2016-01-01

    A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses, are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational effort needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that randomly sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data, as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein-DNA interfaces.

  9. A revision of the subtract-with-borrow random number generators

    NASA Astrophysics Data System (ADS)

    Sibidanov, Alexei

    2017-12-01

    The most popular and widely used subtract-with-borrow generator, also known as RANLUX, is reimplemented as a linear congruential generator using large integer arithmetic with the modulus size of 576 bits. Modern computers, as well as the specific structure of the modulus inferred from RANLUX, allow for the development of a fast modular multiplication - the core of the procedure. This was previously believed to be slow and have too high cost in terms of computing resources. Our tests show a significant gain in generation speed which is comparable with other fast, high quality random number generators. An additional feature is the fast skipping of generator states leading to a seeding scheme which guarantees the uniqueness of random number sequences. Licensing provisions: GPLv3 Programming language: C++, C, Assembler
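
    To make the link between the subtract-with-borrow recursion and a 576-bit modulus concrete, here is a small sketch. It is not the reimplementation described in the abstract. The base 2^24 and the lags 24 and 10 are assumptions taken from the commonly cited definition of the generator underlying RANLUX, and the multiplier shown (the inverse of the base modulo m) is the form usually quoted for the equivalent linear congruential generator; the abstract itself states only the modulus size.

```python
B = 2 ** 24      # base (assumption: commonly cited RANLUX parameters)
R, S = 24, 10    # long and short lags (assumption)


def swb_stream(seed_words, n):
    """Return n values of x[k] = (x[k-S] - x[k-R] - carry) mod B."""
    if len(seed_words) != R or not all(0 <= w < B for w in seed_words):
        raise ValueError("seed must hold R words in the range [0, B)")
    x = list(seed_words)          # x[i] holds x[k-R+i]
    carry = 0
    out = []
    for _ in range(n):
        delta = x[R - S] - x[0] - carry
        carry = 1 if delta < 0 else 0   # borrow for the next step
        new = delta % B
        out.append(new)
        x = x[1:] + [new]
    return out


# Modulus and multiplier of the equivalent LCG, using Python's big integers.
M = B ** R - B ** S + 1
A = M - (M - 1) // B              # satisfies A * B == 1 (mod M)

values = swb_stream(list(range(1, R + 1)), 100)
```

    Under these assumptions the modulus b^24 - b^10 + 1 has exactly 24 * 24 = 576 bits, which matches the size quoted in the abstract. Because one LCG step is a multiplication modulo M, skipping ahead by many steps reduces to modular exponentiation, for example `pow(A, steps, M)`.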

  10. 640-Gbit/s fast physical random number generation using a broadband chaotic semiconductor laser

    NASA Astrophysics Data System (ADS)

    Zhang, Limeng; Pan, Biwei; Chen, Guangcan; Guo, Lu; Lu, Dan; Zhao, Lingjuan; Wang, Wei

    2017-04-01

    An ultra-fast physical random number generator is demonstrated utilizing a broadband chaotic source based on a photonic integrated device, with a simple data post-processing method. The compact chaotic source is implemented using a monolithically integrated dual-mode amplified feedback laser (AFL) with self-injection, which generates a robust chaotic signal with RF frequency coverage above 50 GHz and flatness of ±3.6 dB. By retaining the 4 least significant bits (LSBs) from the 8-bit digitization of the chaotic waveform, random sequences with a bit rate of up to 640 Gbit/s (160 GS/s × 4 bits) are realized. The generated random bits have passed each of the fifteen NIST statistical tests (NIST SP800-22), indicating their randomness for practical applications.
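
    The post-processing step described above, keeping a few low-order bits of each digitized sample, can be sketched as follows. The function names and the most-significant-first bit order within each sample are illustrative assumptions; the abstract does not specify an output order.

```python
def retain_lsbs(samples, n_lsb=4):
    """Keep the n_lsb least significant bits of each integer sample.

    Bits of each sample are emitted most significant first (an assumption).
    """
    mask = (1 << n_lsb) - 1
    bits = []
    for s in samples:
        low = s & mask
        bits.extend((low >> i) & 1 for i in reversed(range(n_lsb)))
    return bits


def bit_rate_gbps(sample_rate_gsps, n_lsb):
    """Output bit rate in Gbit/s for a given sampling rate in GS/s."""
    return sample_rate_gsps * n_lsb


bits = retain_lsbs([0b10110101, 0b11110000])
rate = bit_rate_gbps(160, 4)
```

    With the figures quoted in the abstract, 160 GS/s and 4 retained bits per sample, the arithmetic gives the stated 640 Gbit/s.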

  11. Optimization and validation of sample preparation for metagenomic sequencing of viruses in clinical samples.

    PubMed

    Lewandowska, Dagmara W; Zagordi, Osvaldo; Geissberger, Fabienne-Desirée; Kufner, Verena; Schmutz, Stefan; Böni, Jürg; Metzner, Karin J; Trkola, Alexandra; Huber, Michael

    2017-08-08

    Sequence-specific PCR is the most common approach for virus identification in diagnostic laboratories. However, as specific PCR only detects pre-defined targets, novel virus strains or viruses not included in routine test panels will be missed. Recently, advances in high-throughput sequencing allow for virus-sequence-independent identification of entire virus populations in clinical samples, yet standardized protocols are needed to allow broad application in clinical diagnostics. Here, we describe a comprehensive sample preparation protocol for high-throughput metagenomic virus sequencing using random amplification of total nucleic acids from clinical samples. In order to optimize metagenomic sequencing for application in virus diagnostics, we tested different enrichment and amplification procedures on plasma samples spiked with RNA and DNA viruses. A protocol including filtration, nuclease digestion, and random amplification of RNA and DNA in separate reactions provided the best results, allowing reliable recovery of viral genomes and a good correlation of the relative number of sequencing reads with the virus input. We further validated our method by sequencing a multiplexed viral pathogen reagent containing a range of human viruses from different virus families. Our method proved successful in detecting the majority of the included viruses with high read numbers and compared well to other protocols in the field validated against the same reference reagent. Our sequencing protocol works not only with plasma but also with other clinical samples such as urine and throat swabs. The workflow for virus metagenomic sequencing that we established proved successful in detecting a variety of viruses in different clinical samples. Our protocol supplements existing virus-specific detection strategies, providing opportunities to identify atypical and novel viruses commonly not accounted for in routine diagnostic panels.

  12. Single molecule counting and assessment of random molecular tagging errors with transposable giga-scale error-correcting barcodes.

    PubMed

    Lau, Billy T; Ji, Hanlee P

    2017-09-21

    RNA-Seq measures gene expression by counting sequence reads belonging to unique cDNA fragments. Molecular barcodes commonly in the form of random nucleotides were recently introduced to improve gene expression measures by detecting amplification duplicates, but are susceptible to errors generated during PCR and sequencing. This results in false positive counts, leading to inaccurate transcriptome quantification especially at low input and single-cell RNA amounts where the total number of molecules present is minuscule. To address this issue, we demonstrated the systematic identification of molecular species using transposable error-correcting barcodes that are exponentially expanded to tens of billions of unique labels. We experimentally showed random-mer molecular barcodes suffer from substantial and persistent errors that are difficult to resolve. To assess our method's performance, we applied it to the analysis of known reference RNA standards. By including an inline random-mer molecular barcode, we systematically characterized the presence of sequence errors in random-mer molecular barcodes. We observed that such errors are extensive and become more dominant at low input amounts. We described the first study to use transposable molecular barcodes and its use for studying random-mer molecular barcode errors. Extensive errors found in random-mer molecular barcodes may warrant the use of error correcting barcodes for transcriptome analysis as input amounts decrease.

  13. Structurally complex and highly active RNA ligases derived from random RNA sequences

    NASA Technical Reports Server (NTRS)

    Ekland, E. H.; Szostak, J. W.; Bartel, D. P.

    1995-01-01

    Seven families of RNA ligases, previously isolated from random RNA sequences, fall into three classes on the basis of secondary structure and regiospecificity of ligation. Two of the three classes of ribozymes have been engineered to act as true enzymes, catalyzing the multiple-turnover transformation of substrates into products. The most complex of these ribozymes has a minimal catalytic domain of 93 nucleotides. An optimized version of this ribozyme has a kcat exceeding one per second, a value far greater than that of most natural RNA catalysts and approaching that of comparable protein enzymes. The fact that such a large and complex ligase emerged from a very limited sampling of sequence space implies the existence of a large number of distinct RNA structures of equivalent complexity and activity.

  14. Large deviations in the random sieve

    NASA Astrophysics Data System (ADS)

    Grimmett, Geoffrey

    1997-05-01

    The proportion $\rho_k$ of gaps with length $k$ between square-free numbers is shown to satisfy $\log\rho_k = -(1+o(1))(6/\pi^2)\,k\log k$ as $k\to\infty$. Such asymptotics are consistent with Erdős's challenge to prove that the gap following the square-free number $t$ is smaller than $c\log t/\log\log t$, for all $t$ and some constant $c$ satisfying $c>\pi^2/12$. The results of this paper are achieved by studying the probabilities of large deviations in a certain ‘random sieve’, for which the proportions $\rho_k$ have representations as probabilities. The asymptotic form of $\rho_k$ may be obtained in situations of greater generality, when the squared primes are replaced by an arbitrary sequence $(s_r)$ of relatively prime integers satisfying $\sum_r 1/s_r<\infty$, subject to two further conditions of regularity on this sequence.

  15. Random and externally controlled occurrences of Dansgaard-Oeschger events

    NASA Astrophysics Data System (ADS)

    Lohmann, Johannes; Ditlevsen, Peter D.

    2018-05-01

    Dansgaard-Oeschger (DO) events constitute the most pronounced mode of centennial to millennial climate variability of the last glacial period. Since their discovery, many decades of research have been devoted to understanding the origin and nature of these rapid climate shifts. In recent years, a number of studies have appeared that report emergence of DO-type variability in fully coupled general circulation models via different mechanisms. These mechanisms result in the occurrence of DO events at varying degrees of regularity, ranging from periodic to random. When examining the full sequence of DO events as captured in the North Greenland Ice Core Project (NGRIP) ice core record, one can observe high irregularity in the timing of individual events at any stage within the last glacial period. In addition to the prevailing irregularity, certain properties of the DO event sequence, such as the average event frequency or the relative distribution of cold versus warm periods, appear to be changing throughout the glacial. By using statistical hypothesis tests on simple event models, we investigate whether the observed event sequence may have been generated by stationary random processes or rather was strongly modulated by external factors. We find that the sequence of DO warming events is consistent with a stationary random process, whereas dividing the event sequence into warming and cooling events leads to inconsistency with two independent event processes. As we include external forcing, we find a particularly good fit to the observed DO sequence in a model where the average residence time in warm periods is controlled by global ice volume and that in cold periods by boreal summer insolation.

  16. Spontaneous Spatial Mapping of Learned Sequence in Chimpanzees: Evidence for a SNARC-Like Effect

    PubMed Central

    Adachi, Ikuma

    2014-01-01

    In the last couple of decades, there has been a growing number of reports on space-based representation of numbers and serial order in humans. In the present study, to explore evolutionary origins of such representations, we examined whether our closest evolutionary relatives, chimpanzees, map an acquired sequence onto space in a similar way to humans. The subjects had been trained to perform a number sequence task in which they touched a sequence of “small” to “large” Arabic numerals presented in random locations on the monitor. This task was presented in sessions that also included test trials consisting of only two numerals (1 and 9) horizontally arranged. On half of the trials 1 was located to the left of 9, whereas on the other half 1 was to the right of 9. The chimpanzees' performance was systematically influenced by the spatial arrangement of the stimuli; specifically, they responded quicker when 1 was on the left and 9 on the right compared to the other way around. This result suggests that chimpanzees, like humans, spontaneously map a learned sequence onto space. PMID:24643044

  17. Lyapunov exponents for one-dimensional aperiodic photonic bandgap structures

    NASA Astrophysics Data System (ADS)

    Kissel, Glen J.

    2011-10-01

    Existing in the "gray area" between perfectly periodic and purely randomized photonic bandgap structures are the so-called aperiodic structures whose layers are chosen according to some deterministic rule. We consider here a one-dimensional photonic bandgap structure, a quarter-wave stack, with the layer thickness of one of the bilayers subject to being either thin or thick according to five deterministic sequence rules and binary random selection. To produce these aperiodic structures we examine the following sequences: Fibonacci, Thue-Morse, Period doubling, Rudin-Shapiro, as well as the triadic Cantor sequence. We model these structures numerically with a long chain (approximately 5,000,000) of transfer matrices, and then use the reliable algorithm of Wolf to calculate the (upper) Lyapunov exponent for the long product of matrices. The Lyapunov exponent is the statistically well-behaved variable used to characterize the Anderson localization effect (exponential confinement) when the layers are randomized, so its calculation allows us to more precisely compare the purely randomized structure with its aperiodic counterparts. It is found that the aperiodic photonic systems show much fine structure in their Lyapunov exponents as a function of frequency, and, in a number of cases, the exponents are quite obviously fractal.
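
    As an illustration of the deterministic rules named in this abstract, the following minimal sketch generates two of them, the Thue-Morse sequence and the Fibonacci word, as 0/1 lists. The function names and the mapping of 0 to "thin" and 1 to "thick" are assumptions made here for illustration; the abstract does not say how the authors encoded the layers.

```python
def thue_morse(n):
    """Return the first n terms of the Thue-Morse sequence as 0/1 values."""
    # Term i is the parity of the number of 1 bits in the binary form of i.
    return [bin(i).count("1") % 2 for i in range(n)]


def fibonacci_word(n):
    """Return the first n letters of the Fibonacci word as 0/1 values."""
    # Repeatedly apply the substitution 0 -> 01, 1 -> 0, starting from 0,
    # until the word is at least n letters long.
    word = [0]
    while len(word) < n:
        word = [s for letter in word for s in ((0, 1) if letter == 0 else (0,))]
    return word[:n]


def to_layers(bits):
    """Hypothetical encoding: 0 means a thin layer, 1 means a thick layer."""
    return ["thin" if b == 0 else "thick" for b in bits]


print(thue_morse(8))                 # [0, 1, 1, 0, 1, 0, 0, 1]
print(fibonacci_word(8))             # [0, 1, 0, 0, 1, 0, 1, 0]
print(to_layers(fibonacci_word(4)))  # ['thin', 'thick', 'thin', 'thin']
```

    Both rules are fully deterministic yet never settle into a repeating period, which is what places such stacks between the periodic and the randomized cases.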

  18. Statistical complexity measure of pseudorandom bit generators

    NASA Astrophysics Data System (ADS)

    González, C. M.; Larrondo, H. A.; Rosso, O. A.

    2005-08-01

    Pseudorandom number generators (PRNGs) are extensively used in Monte Carlo simulations, gambling machines and cryptography as substitutes for ideal random number generators (RNGs). Each application imposes different statistical requirements on PRNGs. As L’Ecuyer clearly states “the main goal for Monte Carlo methods is to reproduce the statistical properties on which these methods are based whereas for gambling machines and cryptology, observing the sequence of output values for some time should provide no practical advantage for predicting the forthcoming numbers better than by just guessing at random”. In accordance with different applications several statistical test suites have been developed to analyze the sequences generated by PRNGs. In a recent paper a new statistical complexity measure [Phys. Lett. A 311 (2003) 126] has been defined. Here we propose this measure as a randomness quantifier of PRNGs. The test is applied to three very well known and widely tested PRNGs available in the literature. All of them are based on mathematical algorithms. Another PRNG, based on the Lorenz 3D chaotic dynamical system, is also analyzed. PRNGs based on chaos may be considered as a model for physical noise sources and important new results have recently been reported. All the design steps of this PRNG are described, and each stage increases the PRNG randomness using different strategies. It is shown that the MPR statistical complexity measure is capable of quantifying this randomness improvement. The PRNG based on the chaotic 3D Lorenz dynamical system is also evaluated using traditional digital signal processing tools for comparison.

  19. Computationally assisted screening and design of cell-interactive peptides by a cell-based assay using peptide arrays and a fuzzy neural network algorithm.

    PubMed

    Kaga, Chiaki; Okochi, Mina; Tomita, Yasuyuki; Kato, Ryuji; Honda, Hiroyuki

    2008-03-01

    We developed a method of effective peptide screening that combines experiments and computational analysis. The method is based on the concept that screening efficiency can be enhanced from even limited data by use of a model derived from computational analysis that serves as a guide to screening and combining the model with subsequent repeated experiments. Here we focus on cell-adhesion peptides as a model application of this peptide-screening strategy. Cell-adhesion peptides were screened by use of a cell-based assay of a peptide array. Starting with the screening data obtained from a limited, random 5-mer library (643 sequences), a rule regarding structural characteristics of cell-adhesion peptides was extracted by fuzzy neural network (FNN) analysis. According to this rule, peptides with unfavored residues in certain positions that led to inefficient binding were eliminated from the random sequences. In the restricted, second random library (273 sequences), the yield of cell-adhesion peptides having an adhesion rate more than 1.5-fold to that of the basal array support was significantly high (31%) compared with the unrestricted random library (20%). In the restricted third library (50 sequences), the yield of cell-adhesion peptides increased to 84%. We conclude that a repeated cycle of experiments screening limited numbers of peptides can be assisted by the rule-extracting feature of FNN.

  20. What's your number? The effects of trial order on the one-target advantage.

    PubMed

    Bested, Stephen R; Khan, Michael A; Lawrence, Gavin P; Tremblay, Luc

    2018-05-01

    When moving our upper-limb towards a single target, movement times are typically shorter than when movement to a second target is required. This is known as the one-target advantage. Most studies that have demonstrated the one-target advantage have employed separate trial blocks for the one- and two-segment movements. To test if the presence of the one-target advantage depends on advance knowledge of the number of segments, the present study investigated whether the one-target advantage would emerge under different trial orders/sequences. One- and two-segment responses were organized in blocked (i.e., 1-1-1, 2-2-2), alternating (i.e., 1-2-1-2-1-2), and random (i.e., 1-1-2-1-2-2) trial sequences. Similar to previous studies, where only blocked schedules have typically been utilized, the one-target advantage emerged during the blocked and alternate conditions, but not in the random condition. This finding indicates that the one-target advantage is contingent on participants knowing the number of movement segments prior to stimulus onset. Copyright © 2018 Elsevier B.V. All rights reserved.

  1. Applying Agrep to r-NSA to solve multiple sequences approximate matching.

    PubMed

    Ni, Bing; Wong, Man-Hon; Lam, Chi-Fai David; Leung, Kwong-Sak

    2014-01-01

    This paper addresses the approximate matching problem in a database consisting of multiple DNA sequences, where the proposed approach applies Agrep to a new truncated suffix array, r-NSA. The construction time of the structure is linear in the database size, and the cost of indexing a substring in the structure is constant. The number of characters processed in applying Agrep is analysed theoretically, and the theoretical upper bound closely approximates the empirical number of characters, which is obtained through enumerating the characters in the actual structure built. Experiments are carried out using (synthetic) random DNA sequences, as well as (real) genome sequences including Hepatitis-B Virus and X-chromosome. Experimental results show that, compared to the straightforward approach that applies Agrep to multiple sequences individually, the proposed approach solves the matching problem in much shorter time. The speed-up of our approach depends on the sequence patterns, and for highly similar homologous genome sequences, which are the common cases in real-life genomes, it can be up to several orders of magnitude.

  2. AFLP fragment isolation technique as a method to produce random sequences for single nucleotide polymorphism discovery in the green turtle, Chelonia mydas.

    PubMed

    Roden, Suzanne E; Dutton, Peter H; Morin, Phillip A

    2009-01-01

    The green sea turtle, Chelonia mydas, was used as a case study for single nucleotide polymorphism (SNP) discovery in a species that has little genetic sequence information available. As green turtles have a complex population structure, additional nuclear markers other than microsatellites could add to our understanding of their complex life history. Amplified fragment length polymorphism technique was used to generate sets of random fragments of genomic DNA, which were then electrophoretically separated with precast gels, stained with SYBR green, excised, and directly sequenced. It was possible to perform this method without the use of polyacrylamide gels, radioactive or fluorescent labeled primers, or hybridization methods, reducing the time, expense, and safety hazards of SNP discovery. Within 13 loci, 2547 base pairs were screened, resulting in the discovery of 35 SNPs. Using this method, it was possible to yield a sufficient number of loci to screen for SNP markers without the availability of prior sequence information.

  3. Determination of the stacking fault density in highly defective single GaAs nanowires by means of coherent diffraction imaging

    NASA Astrophysics Data System (ADS)

    Davtyan, Arman; Biermanns, Andreas; Loffeld, Otmar; Pietsch, Ullrich

    2016-06-01

    Coherent x-ray diffraction imaging is used to measure diffraction patterns from individual highly defective nanowires, showing a complex speckle pattern instead of well-defined Bragg peaks. The approach is tested for nanowires of 500 nm diameter and 500 nm height predominantly composed of zinc-blende (ZB) and twinned zinc-blende (TZB) phase domains. Phase retrieval is used to reconstruct the measured 2-dimensional intensity patterns recorded from single nanowires with 3.48 nm and 0.98 nm spatial resolution. Whereas the speckle amplitudes and distribution are perfectly reconstructed, no unique solution could be obtained for the phase structure. The number of phase switches is found to be proportional to the number of measured speckles and follows a narrow number distribution. Using data with 0.98 nm spatial resolution the mean number of phase switches is in reasonable agreement with estimates taken from TEM. However, since the resolved phase domain still is 3-4 times larger than a single GaAs bilayer we explain the non-ambiguous phase reconstruction by the fact that depending on starting phase and sequence of subroutines used during the phase retrieval the retrieved phase domain hosts a different sequence of randomly stacked bilayers. Modelling possible arrangements of bilayer sequences within a phase domain demonstrates that the complex speckle patterns measured can indeed be explained by the random arrangement of the ZB and TZB phase domains.

  4. Simulation of rockfalls triggered by earthquakes

    USGS Publications Warehouse

    Kobayashi, Y.; Harp, E.L.; Kagawa, T.

    1990-01-01

    A computer program to simulate the downslope movement of boulders in rolling or bouncing modes has been developed and applied to actual rockfalls triggered by the Mammoth Lakes, California, earthquake sequence in 1980 and the Central Idaho earthquake in 1983. In order to reproduce a movement mode where bouncing predominated, we introduced an artificial unevenness to the slope surface by adding a small random number to the interpolated value of the mid-points between the adjacent surveyed points. Three hundred simulations were computed for each site by changing the random number series, which determined distances and bouncing intervals. The movement of the boulders was, in general, rather erratic depending on the random numbers employed, and the results could not be seen as deterministic but stochastic. The closest agreement between calculated and actual movements was obtained at the site with the most detailed and accurate topographic measurements. © 1990 Springer-Verlag.

  5. Nullomers and High Order Nullomers in Genomic Sequences

    PubMed Central

    Vergni, Davide; Santoni, Daniele

    2016-01-01

    A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers, whether their absence in genomes has just a statistical explanation or it is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally the study of mean helical rise of nullomers sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. 
    The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpG islands), so that the hypermutability model, also taking into account CpG islands, seems not to be sufficient to explain the nullomer phenomenon. Finally, high order nullomers could emphasize those features that already make simple nullomers useful in several applications. PMID:27906971
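
    The definitions in this abstract can be made concrete with a small sketch. It enumerates the k-mers over the alphabet ACGT that never occur in a sequence, then keeps those whose single-substitution mutants are also all absent. Reading "mutated sequences" as single-base substitutions, ignoring the reverse-complement strand, and the function names are all assumptions made here; the paper's exact definitions may differ.

```python
from itertools import product


def nullomers(sequence, k):
    """Return the sorted k-mers over ACGT that never occur in `sequence`."""
    sequence = sequence.upper()
    # Every k-mer actually present, collected with a sliding window.
    present = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    all_kmers = ("".join(p) for p in product("ACGT", repeat=k))
    return sorted(kmer for kmer in all_kmers if kmer not in present)


def substitution_neighbours(kmer):
    """Yield every k-mer that differs from `kmer` at exactly one position."""
    for i, base in enumerate(kmer):
        for alt in "ACGT":
            if alt != base:
                yield kmer[:i] + alt + kmer[i + 1:]


def high_order_nullomers(sequence, k):
    """Nullomers whose single-substitution mutants are all nullomers too.

    Assumption: one substitution per mutant; the source abstract does not
    spell out the mutation model.
    """
    absent = set(nullomers(sequence, k))
    return sorted(
        kmer for kmer in absent
        if all(n in absent for n in substitution_neighbours(kmer))
    )


print(nullomers("AACC", 1))                  # ['G', 'T']
print(len(nullomers("ACGTA", 2)))            # 12
print(len(high_order_nullomers("AAAA", 2)))  # 9
```

    Exhaustive enumeration costs 4^k set lookups, so this is only practical for short k; it is meant to pin down the definitions, not to scale to a genome.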

  6. Viral morphogenesis is the dominant source of sequence censorship in M13 combinatorial peptide phage display.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Rodi, D. J.; Soares, A. S.; Makowski, L.

    Novel statistical methods have been developed and used to quantitate and annotate the sequence diversity within combinatorial peptide libraries on the basis of small numbers (1-200) of sequences selected at random from commercially available M13 p3-based phage display libraries. These libraries behave statistically as though they correspond to populations containing roughly 4.0±1.6% of the random dodecapeptides and 7.9±2.6% of the random constrained heptapeptides that are theoretically possible within the phage populations. Analysis of amino acid residue occurrence patterns shows no demonstrable influence on sequence censorship by Escherichia coli tRNA isoacceptor profiles or either overall codon or Class II codon usage patterns, suggesting no metabolic constraints on recombinant p3 synthesis. There is an overall depression in the occurrence of cysteine, arginine and glycine residues and an overabundance of proline, threonine and histidine residues. The majority of position-dependent amino acid sequence bias is clustered at three positions within the inserted peptides of the dodecapeptide library, +1, +3 and +12 downstream from the signal peptidase cleavage site. Conformational tendency measures of the peptides indicate a significant preference for inserts favoring a β-turn conformation. The observed protein sequence limitations can primarily be attributed to genetic codon degeneracy and signal peptidase cleavage preferences. These data suggest that for applications in which maximal sequence diversity is essential, such as epitope mapping or novel receptor identification, combinatorial peptide libraries should be constructed using codon-corrected trinucleotide cassettes within vector-host systems designed to minimize morphogenesis-related censorship.

  7. Long period pseudo random number sequence generator

    NASA Technical Reports Server (NTRS)

    Wang, Charles C. (Inventor)

    1989-01-01

    A circuit for generating a sequence of pseudo random numbers, (A sub K), is described. There is an exponentiator in GF(2 sup m) for the normal basis representation of elements in a finite field GF(2 sup m), each represented by m binary digits, having two inputs and an output from which the sequence (A sub K) of pseudo random numbers is taken. One of the two inputs is connected to receive the outputs (E sub K) of a maximal length shift register of n stages. There is a switch having a pair of inputs and an output. The switch output is connected to the other of the two inputs of the exponentiator. One of the switch inputs is connected for initially receiving a primitive element (A sub O) in GF(2 sup m). Finally, there is a delay circuit having an input and an output. The delay circuit output is connected to the other of the switch inputs and the delay circuit input is connected to the output of the exponentiator. Thus, after the exponentiator initially receives the primitive element (A sub O) in GF(2 sup m) through the switch, the switch can be switched to cause the exponentiator to receive as its input a delayed output A(K-1) from the exponentiator, thereby generating (A sub K) continuously at the output of the exponentiator. The exponentiator in GF(2 sup m) is novel and comprises a cyclic-shift circuit, a Massey-Omura multiplier, and a control logic circuit, all operably connected together to perform the function U(sub i) = 92(sup i) (for n(sub i) = 1) or 1 (for n(sub i) = 0).
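
    The feedback loop described in this abstract can be read as the recurrence A_K = (A_{K-1})^(E_K) in GF(2^m), with E_K taken from a maximal length shift register. The sketch below is a software illustration of that reading only, under several assumptions that are not in the source: it uses an ordinary polynomial basis with square-and-multiply instead of the normal basis and Massey-Omura multiplier, and it picks m = 5, n = 4, the field polynomial x^5 + x^2 + 1, the register polynomial x^4 + x + 1, and the starting element x. All function names are hypothetical.

```python
M = 5              # assumed field GF(2^5); 2^5 - 1 = 31 is prime
POLY = 0b100101    # assumed field polynomial x^5 + x^2 + 1


def gf_mul(a, b):
    """Multiply two elements of GF(2^M) in polynomial-basis form."""
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if a >> M:               # degree reached M: reduce modulo POLY
            a ^= POLY
    return result


def gf_pow(base, exponent):
    """Raise `base` to `exponent` in GF(2^M) by square-and-multiply."""
    result = 1
    while exponent:
        if exponent & 1:
            result = gf_mul(result, base)
        base = gf_mul(base, base)
        exponent >>= 1
    return result


def lfsr_states(seed, count):
    """Yield `count` states of a 4-stage register for x^4 + x + 1."""
    state = seed
    for _ in range(count):
        yield state
        feedback = (state ^ (state >> 1)) & 1
        state = (state >> 1) | (feedback << 3)


def pseudo_random_sequence(count, a0=2, seed=1):
    """Iterate A_K = A_(K-1) ** E_K, starting from the element a0."""
    a, out = a0, []
    for e in lfsr_states(seed, count):
        a = gf_pow(a, e)
        out.append(a)
    return out


print(pseudo_random_sequence(10))
```

    With these assumed parameters every exponent lies between 1 and 15 and is therefore coprime to 31, so each output keeps multiplicative order 31 and the loop cannot collapse to the element 1. Whether the patent relies on the same argument is not stated in the abstract.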

  8. Wide brick tunnel randomization - an unequal allocation procedure that limits the imbalance in treatment totals.

    PubMed

    Kuznetsova, Olga M; Tymofyeyev, Yevgen

    2014-04-30

    In open-label studies, partial predictability of permuted block randomization provides potential for selection bias. To lessen the selection bias in two-arm studies with equal allocation, a number of allocation procedures that limit the imbalance in treatment totals at a pre-specified level but do not require the exact balance at the ends of the blocks were developed. In studies with unequal allocation, however, the task of designing a randomization procedure that sets a pre-specified limit on imbalance in group totals is not resolved. Existing allocation procedures either do not preserve the allocation ratio at every allocation or do not include all allocation sequences that comply with the pre-specified imbalance threshold. Kuznetsova and Tymofyeyev described the brick tunnel randomization for studies with unequal allocation that preserves the allocation ratio at every step and, in the two-arm case, includes all sequences that satisfy the smallest possible imbalance threshold. This article introduces wide brick tunnel randomization for studies with unequal allocation that allows all allocation sequences with imbalance not exceeding any pre-specified threshold while preserving the allocation ratio at every step. In open-label studies, allowing a larger imbalance in treatment totals lowers selection bias because of the predictability of treatment assignments. The applications of the technique in two-arm and multi-arm open-label studies with unequal allocation are described. Copyright © 2013 John Wiley & Sons, Ltd.

  9. Analysis of entropy extraction efficiencies in random number generation systems

    NASA Astrophysics Data System (ADS)

    Wang, Chao; Wang, Shuang; Chen, Wei; Yin, Zhen-Qiang; Han, Zheng-Fu

    2016-05-01

    Random numbers (RNs) have applications in many areas: lottery games, gambling, computer simulation, and, most importantly, cryptography [N. Gisin et al., Rev. Mod. Phys. 74 (2002) 145]. In cryptography theory, the theoretical security of the system calls for high quality RNs. Therefore, developing methods for producing unpredictable RNs with adequate speed is an attractive topic. Early on, despite the lack of theoretical support, pseudo RNs generated by algorithmic methods performed well and satisfied reasonable statistical requirements. However, as implemented, those pseudorandom sequences were completely determined by mathematical formulas and initial seeds, which cannot introduce extra entropy or information. In these cases, “random” bits are generated that are not at all random. Physical random number generators (RNGs), which, in contrast to algorithmic methods, are based on unpredictable physical random phenomena, have attracted considerable research interest. However, the way that we extract random bits from those physical entropy sources has a large influence on the efficiency and performance of the system. In this manuscript, we will review and discuss several randomness extraction schemes that are based on radiation or photon arrival times. We analyze the robustness, post-processing requirements and, in particular, the extraction efficiency of those methods to aid in the construction of efficient, compact and robust physical RNG systems.
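
    To make the notion of extraction efficiency concrete, here is a minimal sketch of one simple timing-based scheme: compare two consecutive, non-overlapping waiting times and emit a bit according to which is longer. This scheme is chosen here purely for illustration and is not necessarily one of the methods reviewed in the paper. The arrival times are simulated with Python's pseudorandom generator as a stand-in for detector timestamps, so the output is not physically random; the rate and sample count are arbitrary assumptions.

```python
import random


def bits_from_intervals(intervals):
    """Emit one bit per non-overlapping pair of waiting times.

    A pair (t1, t2) gives 1 if t1 > t2 and 0 if t1 < t2; ties are discarded.
    """
    bits = []
    for t1, t2 in zip(intervals[0::2], intervals[1::2]):
        if t1 != t2:
            bits.append(1 if t1 > t2 else 0)
    return bits


def simulated_intervals(count, rate, rng):
    """Stand-in for measured data: exponential waiting times (Poisson arrivals)."""
    return [rng.expovariate(rate) for _ in range(count)]


rng = random.Random(1)                       # PRNG used only to fake detector data
intervals = simulated_intervals(10_000, 1.0, rng)
bits = bits_from_intervals(intervals)

# Extraction efficiency in bits per detected event; at most 0.5 for this scheme.
efficiency = len(bits) / len(intervals)
print(f"{len(bits)} bits from {len(intervals)} intervals, efficiency {efficiency:.3f}")
```

    Because the two intervals in a pair are independent and identically distributed, either ordering is equally likely regardless of the source rate, which is why this comparison needs no further debiasing. The price is an efficiency capped at half a bit per event.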

  10. PCV2d-2 is the predominant type of PCV2 DNA in pig samples collected in the U.S. during 2014-2016.

    PubMed

    Xiao, Chao-Ting; Harmon, Karen M; Halbur, Patrick G; Opriessnig, Tanja

    2016-12-25

    Porcine circovirus type 2 (PCV2) vaccination was introduced in the US in 2006 and since has been adopted by most pig producers. While porcine circovirus associated disease (PCVAD) outbreaks are now relatively uncommon in the US, PCV2 remains a concern which is emphasized by increasing numbers of PCR and sequencing requests for PCV2. In the present study, randomly selected lung tissues from 586 pigs submitted in 2015 were tested for presence of PCV2 DNA. Positive samples were further characterized by sequencing and combined with available PCV2 open-reading-frame (ORF) 2 sequences from the client data base of the Iowa State University Veterinary Diagnostic Laboratory. The prevalence of PCV2 in the randomly selected lung tissues was 23% (135/586) with 11.3% PCV2a, 29% PCV2b and 71.8% for PCV2d subgroup PCV2d-2. A total of 455 ORF2 sequences obtained from 2014 through 2016 were analyzed and PCV2d accounted for 66.7% of the 2014 sequences, 71.8% of the 2015 sequences, and 72% of the 2016 sequences. Interestingly, only 1.9% (9/455) of the sequences belonged to the recently identified PCV2e genotype. The present data indicates that despite an almost 100% PCV2 vaccine coverage in the US, PCV2 DNA can still be detected in almost 1 of 4 randomly selected pig tissues. PCV2d-2 is now the predominant genotype in the USA suggesting that PCV2d-2 may have some advantage over PCV2a and PCV2b in its ability to replicate in pigs under vaccination pressure. Copyright © 2016. Published by Elsevier B.V.

  11. Alignment of RNA molecules: Binding energy and statistical properties of random sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Valba, O. V., E-mail: valbaolga@gmail.com; Nechaev, S. K., E-mail: sergei.nechaev@gmail.com; Tamm, M. V., E-mail: thumm.m@gmail.com

    2012-02-15

    A new statistical approach to the problem of pairwise alignment of RNA sequences is proposed. The problem is analyzed for a pair of interacting polymers forming RNA-like hierarchical cloverleaf structures. An alignment is characterized by the numbers of matches, mismatches, and gaps. A weight function is assigned to each alignment; this function is interpreted as a free energy taking into account both direct monomer-monomer interactions and a combinatorial contribution due to formation of various cloverleaf secondary structures. The binding free energy is determined for a pair of RNA molecules. Statistical properties are discussed, including fluctuations of the binding energy between a pair of RNA molecules and loop length distribution in a complex. Based on an analysis of the free energy per nucleotide pair complexes of random RNAs as a function of the number of nucleotide types c, a hypothesis is put forward about the exclusivity of the alphabet c = 4 used by nature.

  12. A multi-center randomized controlled trial to compare a self-ligating bracket with a conventional bracket in a UK population: Part 1: Treatment efficiency.

    PubMed

    O'Dywer, Lian; Littlewood, Simon J; Rahman, Shahla; Spencer, R James; Barber, Sophy K; Russell, Joanne S

    2016-01-01

    To use a two-arm parallel trial to compare treatment efficiency between a self-ligating and a conventional preadjusted edgewise appliance system. A prospective multi-center randomized controlled clinical trial was conducted in three hospital orthodontic departments. Subjects were randomly allocated to receive treatment with either a self-ligating (3M SmartClip) or conventional (3M Victory) preadjusted edgewise appliance bracket system using a computer-generated random sequence concealed in opaque envelopes, with stratification for operator and center. Two operators followed a standardized protocol regarding bracket bonding procedure and archwire sequence. Efficiency of each ligation system was assessed by comparing the duration of treatment (months), total number of appointments (scheduled and emergency visits), and number of bracket bond failures. One hundred thirty-eight subjects (mean age 14 years 11 months) were enrolled in the study, of which 135 subjects (97.8%) completed treatment. The mean treatment time and number of visits were 25.12 months and 19.97 visits in the SmartClip group and 25.80 months and 20.37 visits in the Victory group. The overall bond failure rate was 6.6% for the SmartClip and 7.2% for Victory, with a similar debond distribution between the two appliances. No significant differences were found between the bracket systems in any of the outcome measures. No serious harm was observed from either bracket system. There was no clinically significant difference in treatment efficiency between treatment with a self-ligating bracket system and a conventional ligation system.

  13. Random sampling of constrained phylogenies: conducting phylogenetic analyses when the phylogeny is partially known.

    PubMed

    Housworth, E A; Martins, E P

    2001-01-01

    Statistical randomization tests in evolutionary biology often require a set of random, computer-generated trees. For example, earlier studies have shown how large numbers of computer-generated trees can be used to conduct phylogenetic comparative analyses even when the phylogeny is uncertain or unknown. These methods were limited, however, in that (in the absence of molecular sequence or other data) they allowed users to assume that no phylogenetic information was available or that all possible trees were known. Intermediate situations where only a taxonomy or other limited phylogenetic information (e.g., polytomies) are available are technically more difficult. The current study describes a procedure for generating random samples of phylogenies while incorporating limited phylogenetic information (e.g., four taxa belong together in a subclade). The procedure can be used to conduct comparative analyses when the phylogeny is only partially resolved or can be used in other randomization tests in which large numbers of possible phylogenies are needed.

  14. Sequence requirement of the ade6-4095 meiotic recombination hotspot in Schizosaccharomyces pombe.

    PubMed

    Foulis, Steven J; Fowler, Kyle R; Steiner, Walter W

    2018-02-01

    Homologous recombination occurs at a greatly elevated frequency in meiosis compared to mitosis and is initiated by programmed double-strand DNA breaks (DSBs). DSBs do not occur at uniform frequency throughout the genome in most organisms, but occur preferentially at a limited number of sites referred to as hotspots. The locations of hotspots have been determined at nucleotide-level resolution in both the budding and fission yeasts, and while several patterns have emerged regarding preferred locations for DSB hotspots, it remains unclear why particular sites experience DSBs at much higher frequency than other sites with seemingly similar properties. Short sequence motifs, which are often sites for binding of transcription factors, are known to be responsible for a number of hotspots. In this study we identified the minimum sequence required for activity of one such motif identified in a screen of random sequences capable of producing recombination hotspots. The experimentally determined sequence, GGTCTRGACC, closely matches the previously inferred sequence. Full hotspot activity requires an effective sequence length of 9.5 bp, whereas moderate activity requires an effective sequence length of approximately 8.2 bp and shows significant association with DSB hotspots. In combination with our previous work, this result is consistent with a large number of different sequence motifs capable of producing recombination hotspots, and supports a model in which hotspots can be rapidly regenerated by mutation as they are lost through recombination.

  15. On the joint spectral density of bivariate random sequences. Thesis Technical Report No. 21

    NASA Technical Reports Server (NTRS)

    Aalfs, David D.

    1995-01-01

    For univariate random sequences, the power spectral density acts like a probability density function of the frequencies present in the sequence. This dissertation extends that concept to bivariate random sequences. For this purpose, a function called the joint spectral density is defined that represents a joint probability weighing of the frequency content of pairs of random sequences. Given a pair of random sequences, the joint spectral density is not uniquely determined in the absence of any constraints. Two approaches to constraining the sequences are suggested: (1) assume the sequences are the margins of some stationary random field, (2) assume the sequences conform to a particular model that is linked to the joint spectral density. For both approaches, the properties of the resulting sequences are investigated in some detail, and simulation is used to corroborate theoretical results. It is concluded that under either of these two constraints, the joint spectral density can be computed from the non-stationary cross-correlation.

  16. Image encryption using random sequence generated from generalized information domain

    NASA Astrophysics Data System (ADS)

    Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu

    2016-05-01

    A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.

  17. Random number generation in bilingual Balinese and German students: preliminary findings from an exploratory cross-cultural study.

    PubMed

    Strenge, Hans; Lesmana, Cokorda Bagus Jaya; Suryani, Luh Ketut

    2009-08-01

    Verbal random number generation is a procedurally simple task to assess executive function and appears ideally suited for the use under diverse settings in cross-cultural research. The objective of this study was to examine ethnic group differences between young adults in Bali (Indonesia) and Kiel (Germany): 50 bilingual healthy students, 30 Balinese and 20 Germans, attempted to generate a random sequence of the digits 1 to 9. In Balinese participants, randomization was done in Balinese (native language L1) and Indonesian (first foreign language L2), in German subjects in the German (L1) and English (L2) languages. 10 of 30 Balinese (33%), but no Germans, were unable to inhibit habitual counting in more than half of the responses. The Balinese produced significantly more nonrandom responses than the Germans with higher rates of counting and significantly less occurrence of the digits 2 and 3 in L1 compared with L2. Repetition and cycling behavior did not differ between the four languages. The findings highlight the importance of taking into account culture-bound psychosocial factors for Balinese individuals when administering and interpreting a random number generation test.

  18. Fossils out of sequence: Computer simulations and strategies for dealing with stratigraphic disorder

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cutler, A.H.; Flessa, K.W.

    Microstratigraphic resolution is limited by vertical mixing and reworking of fossils. Stratigraphic disorder is the degree to which fossils within a stratigraphic sequence are not in proper chronological order. Stratigraphic disorder arises through in situ vertical mixing of fossils and reworking of older fossils into younger deposits. The authors simulated the effects of mixing and reworking by simple computer models, and measured stratigraphic disorder using rank correlation between age and stratigraphic position (Spearman and Kendall coefficients). Mixing was simulated by randomly transposing pairs of adjacent fossils in a sequence. Reworking was simulated by randomly inserting older fossils into a younger sequence. Mixing is an inefficient means of producing disorder; after 500 mixing steps stratigraphic order is still significant at the 99% to 95% level, depending on the coefficient used. Reworking disorders sequences very efficiently: significant order begins to be lost when reworked shells make up 35% of the sequence. Thus a sequence can be dominated by undisturbed, autochthonous shells and still be disordered. The effects of mixing-produced disorder can be minimized by increasing sample size at each horizon. Increased spacing between samples is of limited utility in dealing with disordered sequences: while widely separated samples are more likely to be stratigraphically ordered, the smaller number of samples makes the detection of trends problematic.
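
    The mixing experiment described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' model: the sequence length (100), the seed, and the function names are hypothetical, ages are tie-free ranks, and reworking is not modeled. Spearman's coefficient is computed with the closed form for permutations without ties.

```python
import random


def spearman_rho(order):
    """Spearman rank correlation between stratigraphic position and age rank.

    `order[pos]` is the age rank of the fossil at position `pos`; the
    closed form below is valid only for a permutation without ties.
    """
    n = len(order)
    d2 = sum((pos - age) ** 2 for pos, age in enumerate(order))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def mix(order, steps, rng):
    """Return a copy of `order` after `steps` random adjacent transpositions."""
    seq = list(order)
    for _ in range(steps):
        i = rng.randrange(len(seq) - 1)  # pick a random adjacent pair
        seq[i], seq[i + 1] = seq[i + 1], seq[i]
    return seq


rng = random.Random(0)           # hypothetical seed, for repeatability
ordered = list(range(100))       # hypothetical, perfectly ordered sequence
mixed = mix(ordered, 500, rng)   # 500 mixing steps, as in the abstract
print(spearman_rho(ordered), spearman_rho(mixed))
```

    Because each step moves a fossil by only one position, many steps are needed before the rank correlation drops appreciably, which is the intuition behind the abstract's remark that mixing is an inefficient source of disorder.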

  19. Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

    PubMed Central

    Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

    2013-01-01

    Background: Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only a few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings: Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) showed similarity to sequences in the NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotype analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance: SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799

  20. Experimental rugged fitness landscape in protein sequence space.

    PubMed

    Hayashi, Yuuki; Aita, Takuyo; Toyota, Hitoshi; Husimi, Yuzuru; Urabe, Itaru; Yomo, Tetsuya

    2006-12-20

    The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12-130 of the initial random polypeptide and selection for infectivity, the selected phage showed a 1.7×10^4-fold increase in infectivity, defined as the number of infected cells per ml of phage suspension. Fitness was defined as the logarithm of infectivity, and we analyzed (1) the dependence of stationary fitness on library size, which increased gradually, and (2) the time course of changes in fitness in transitional phases, based on an original theory regarding the evolutionary dynamics in Kauffman's n-k fitness landscape model. In the landscape model, single mutations at single sites among n sites affect the contribution of k other sites to fitness. Based on the results of these analyses, k was estimated to be 18-24. According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value. Based on the landscapes of these two different surfaces, it appears possible for adaptive walks with only random substitutions to climb with relative ease up to the middle region of the fitness landscape from any primordial or random sequence, whereas an enormous range of sequence diversity is required to climb further up the rugged surface above the middle region.

  1. Experimental Rugged Fitness Landscape in Protein Sequence Space

    PubMed Central

    Hayashi, Yuuki; Aita, Takuyo; Toyota, Hitoshi; Husimi, Yuzuru; Urabe, Itaru; Yomo, Tetsuya

    2006-01-01

    The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12–130 of the initial random polypeptide and selection for infectivity, the selected phage showed a 1.7×10^4-fold increase in infectivity, defined as the number of infected cells per ml of phage suspension. Fitness was defined as the logarithm of infectivity, and we analyzed (1) the dependence of stationary fitness on library size, which increased gradually, and (2) the time course of changes in fitness in transitional phases, based on an original theory regarding the evolutionary dynamics in Kauffman's n-k fitness landscape model. In the landscape model, single mutations at single sites among n sites affect the contribution of k other sites to fitness. Based on the results of these analyses, k was estimated to be 18–24. According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value. Based on the landscapes of these two different surfaces, it appears possible for adaptive walks with only random substitutions to climb with relative ease up to the middle region of the fitness landscape from any primordial or random sequence, whereas an enormous range of sequence diversity is required to climb further up the rugged surface above the middle region. PMID:17183728

  2. Unbiased Combinatorial Genomic Approaches to Identify Alternative Therapeutic Targets within the TSC Signaling Network

    DTIC Science & Technology

    2013-06-01

    number of ways to generate either random mutations or specific alterations to the genome sequence . Unlike previous approaches however, both TALENs and...made to the donor construct will be incorporated into the endogenous genomic sequence (examples in Liu et al., 2012; Zu et al., 2013). One challenge... Drosophila with the CRISPR RNA-guided Cas9 nuclease. Genetics. 2013. Hwang WY, Fu Y, Reyon D, Maeder ML, Tsai SQ, Sander JD, et al. Efficient genome

  3. Improved training for target detection using Fukunaga-Koontz transform and distance classifier correlation filter

    NASA Astrophysics Data System (ADS)

    Elbakary, M. I.; Alam, M. S.; Aslan, M. S.

    2008-03-01

    In a FLIR image sequence, a target may disappear permanently or may reappear after some frames, and crucial information such as the direction, position and size of the target is lost. If the target reappears at a later frame, it may not be tracked again because the 3D orientation, size and location of the target might have changed. To obtain information about the target before it disappears and to detect the target after it reappears, a distance classifier correlation filter (DCCF) is trained manually by selecting a number of chips randomly. This paper introduces a novel idea to eliminate the manual intervention in the training phase of the DCCF. Instead of selecting the training chips manually and selecting the number of training chips randomly, we adopted the K-means algorithm to cluster the training frames, and based on the number of clusters we select the training chips such that there is one training chip for each cluster. To detect and track the target after it reappears in the field of view, TBF and DCCF are employed. The experiments conducted using real FLIR sequences show results similar to those of the traditional algorithm, but eliminating the manual intervention is the advantage of the proposed algorithm.

  4. Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

    PubMed Central

    Matochko, Wadim L.; Derda, Ratmir

    2013-01-01

    Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the i-th sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation of the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071

  5. Dimensional Reduction for the General Markov Model on Phylogenetic Trees.

    PubMed

    Sumner, Jeremy G

    2017-03-01

    We present a method of dimensional reduction for the general Markov model of sequence evolution on a phylogenetic tree. We show that taking certain linear combinations of the associated random variables (site pattern counts) reduces the dimensionality of the model from exponential in the number of extant taxa, to quadratic in the number of taxa, while retaining the ability to statistically identify phylogenetic divergence events. A key feature is the identification of an invariant subspace which depends only bilinearly on the model parameters, in contrast to the usual multi-linear dependence in the full space. We discuss potential applications including the computation of split (edge) weights on phylogenetic trees from observed sequence data.

  6. Subrandom methods for multidimensional nonuniform sampling.

    PubMed

    Worley, Bradley

    2016-08-01

    Methods of nonuniform sampling that utilize pseudorandom number sequences to select points from a weighted Nyquist grid are commonplace in biomolecular NMR studies, due to the beneficial incoherence introduced by pseudorandom sampling. However, these methods require the specification of a non-arbitrary seed number in order to initialize a pseudorandom number generator. Because the performance of pseudorandom sampling schedules can substantially vary based on seed number, this can complicate the task of routine data collection. Approaches such as jittered sampling and stochastic gap sampling are effective at reducing random seed dependence of nonuniform sampling schedules, but still require the specification of a seed number. This work formalizes the use of subrandom number sequences in nonuniform sampling as a means of seed-independent sampling, and compares the performance of three subrandom methods to their pseudorandom counterparts using commonly applied schedule performance metrics. Reconstruction results using experimental datasets are also provided to validate claims made using these performance metrics.
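
    To make the idea of seed-independent sampling concrete, the sketch below draws grid points from a van der Corput sequence, a familiar low-discrepancy (subrandom) sequence. The abstract does not name the three subrandom methods it compares, so this choice is an assumption made purely for illustration; the grid is also treated as uniform rather than weighted, and the function names are hypothetical.

```python
def van_der_corput(index, base=2):
    """Return the `index`-th van der Corput value in [0, 1) for `base`."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom  # mirror the digits about the radix point
    return value


def subrandom_schedule(grid_size, n_points, base=2):
    """Pick `n_points` distinct grid indices without any seed.

    Uniform (unweighted) grid only; a weighted Nyquist grid would need
    an extra mapping step that is not shown here.
    """
    if n_points > grid_size:
        raise ValueError("cannot pick more points than the grid holds")
    chosen = []
    index = 1
    while len(chosen) < n_points:
        point = int(van_der_corput(index, base) * grid_size)
        if point not in chosen:  # skip grid points already selected
            chosen.append(point)
        index += 1
    return sorted(chosen)


print(subrandom_schedule(64, 16))
```

    Since no generator state is initialized, two calls with the same arguments always return the same schedule, which is the property the abstract refers to as seed independence.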

  7. Cancelable biometrics realization with multispace random projections.

    PubMed

    Teoh, Andrew Beng Jin; Yuang, Chong Tze

    2007-10-01

    Biometric characteristics cannot be changed; therefore, the loss of privacy is permanent if they are ever compromised. This paper presents a two-factor cancelable formulation, where the biometric data are distorted in a revocable but non-reversible manner by first transforming the raw biometric data into a fixed-length feature vector and then projecting the feature vector onto a sequence of random subspaces that were derived from a user-specific pseudorandom number (PRN). This process is revocable and makes replacing biometrics as easy as replacing PRNs. The formulation has been verified under a number of scenarios (normal, stolen PRN, and compromised biometrics scenarios) using 2400 Facial Recognition Technology face images. The diversity property is also examined.

  8. Random number generators tested on quantum Monte Carlo simulations.

    PubMed

    Hongo, Kenta; Maezono, Ryo; Miura, Kenichi

    2010-08-01

    We have tested and compared several (pseudo) random number generators (RNGs) applied to a practical application, ground state energy calculations of molecules using variational and diffusion Monte Carlo methods. A new multiple recursive generator with 8th-order recursion (MRG8) and the Mersenne twister generator (MT19937) are tested and compared with the RANLUX generator with five luxury levels (RANLUX-[0-4]). Both MRG8 and MT19937 are proven to give the same total energy as that evaluated with RANLUX-4 (highest luxury level) within the statistical error bars with less computational cost to generate the sequence. We also tested the notorious implementation of linear congruential generator (LCG), RANDU, for comparison.
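
    For readers unfamiliar with why RANDU is called notorious, the following sketch implements its recurrence, x_{n+1} = 65539 · x_n mod 2^31, and checks the well-known three-term relation x_{n+2} = 6·x_{n+1} − 9·x_n (mod 2^31) that makes consecutive triples strongly correlated. The function name and the seed are illustrative choices; nothing here reproduces the test setup of the paper.

```python
MODULUS = 2 ** 31
MULTIPLIER = 65539  # 2**16 + 3, the RANDU multiplier


def randu(seed, count):
    """Return `count` successive RANDU states starting after `seed` (seed should be odd)."""
    states = []
    x = seed
    for _ in range(count):
        x = (MULTIPLIER * x) % MODULUS
        states.append(x)
    return states


values = randu(1, 1000)

# Every value is fixed by the two before it, which is why triples of
# RANDU outputs fall on a small number of planes in 3-D.
for a, b, c in zip(values, values[1:], values[2:]):
    assert c == (6 * b - 9 * a) % MODULUS

print(values[:3])
```

    The relation follows from (2^16 + 3)^2 = 2^32 + 6·2^16 + 9, which reduces to 6·(2^16 + 3) − 9 modulo 2^31.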

  9. Rumor Processes in Random Environment on ℕ and on Galton-Watson Trees

    NASA Astrophysics Data System (ADS)

    Bertacchi, Daniela; Zucca, Fabio

    2013-11-01

    The aim of this paper is to study rumor processes in random environment. In a rumor process a signal starts from the stations of a fixed vertex (the root) and travels on a graph from vertex to vertex. We consider two rumor processes. In the firework process each station, when reached by the signal, transmits it up to a random distance. In the reverse firework process, on the other hand, stations do not send any signal but they “listen” for it up to a random distance. The first random environment that we consider is the deterministic 1-dimensional tree ℕ with a random number of stations on each vertex; in this case the root is the origin of ℕ. We give conditions for the survival/extinction on almost every realization of the sequence of stations. Later on, we study the processes on Galton-Watson trees with random number of stations on each vertex. We show that if the probability of survival is positive, then there is survival on almost every realization of the infinite tree such that there is at least one station at the root. We characterize the survival of the process in some cases and we give sufficient conditions for survival/extinction.

  10. Molecular Analysis of Date Palm Genetic Diversity Using Random Amplified Polymorphic DNA (RAPD) and Inter-Simple Sequence Repeats (ISSRs).

    PubMed

    El Sharabasy, Sherif F; Soliman, Khaled A

    2017-01-01

    The date palm is an ancient domesticated plant with great diversity and has been cultivated in the Middle East and North Africa for at least 5000 years. Date palm cultivars are classified based on the fruit moisture content, as dry, semidry, and soft dates. There are a number of biochemical and molecular techniques available for characterization of the date palm variation. This chapter focuses on the DNA-based markers random amplified polymorphic DNA (RAPD) and inter-simple sequence repeats (ISSR) techniques, in addition to biochemical markers based on isozyme analysis. These techniques coupled with appropriate statistical tools proved useful for determining phylogenetic relationships among date palm cultivars and provide information resources for date palm gene banks.

  11. Composition bias and the origin of ORFan genes

    PubMed Central

    Yomtovian, Inbal; Teerakulkittipong, Nuttinee; Lee, Byungkook; Moult, John; Unger, Ron

    2010-01-01

    Motivation: Intriguingly, sequence analysis of genomes reveals that a large number of genes are unique to each organism. The origin of these genes, termed ORFans, is not known. Here, we explore the origin of ORFan genes by defining a simple measure called ‘composition bias’, based on the deviation of the amino acid composition of a given sequence from the average composition of all proteins of a given genome. Results: For a set of 47 prokaryotic genomes, we show that the amino acid composition bias of real proteins, random ‘proteins’ (created by using the nucleotide frequencies of each genome) and ‘proteins’ translated from intergenic regions are distinct. For ORFans, we observed a correlation between their composition bias and their relative evolutionary age. Recent ORFan proteins have compositions more similar to those of random ‘proteins’, while the compositions of more ancient ORFan proteins are more similar to those of the set of all proteins of the organism. This observation is consistent with an evolutionary scenario wherein ORFan genes emerged and underwent a large number of random mutations and selection, eventually adapting to the composition preference of their organism over time. Contact: ron@biocoml.ls.biu.ac.il Supplementary information: Supplementary data are available at Bioinformatics online. PMID:20231229

  12. ECB deacylase mutants

    DOEpatents

    Arnold, Frances H.; Shao, Zhixin; Zhao, Huimin; Giver, Lorraine J.

    2002-01-01

    A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.

  13. Random Sequence for Optimal Low-Power Laser Generated Ultrasound

    NASA Astrophysics Data System (ADS)

    Vangi, D.; Virga, A.; Gulino, M. S.

    2017-08-01

    Low-power laser-generated ultrasounds have lately been gaining importance in research, thanks to the possibility of investigating the structural integrity of a mechanical component through a non-contact, Non-Destructive Testing (NDT) procedure. The ultrasounds are, however, very low in amplitude, making it necessary to use pre-processing and post-processing operations on the signals to detect them. The cross-correlation technique is used in this work, meaning that a random signal must be used as laser input. For this purpose, a highly random and simple-to-create code called the T sequence, capable of enhancing the ultrasound detectability and not previously available in the state of the art, is introduced. Several important parameters which characterize the T sequence can influence the process: the number of pulses N_pulses, the pulse duration δ and the distance between pulses d_pulses. A Finite Element (FE) model of a 3 mm steel disk was initially developed to analytically study the longitudinal ultrasound generation mechanism and the obtainable outputs. Later, experimental tests showed that the T sequence is highly flexible for ultrasound detection purposes, making it optimal to use high N_pulses and δ but low d_pulses. In the end, apart from describing all phenomena that arise in the low-power laser generation process, the results of this study are also important for setting up an effective NDT procedure using this technology.

  14. Feedback shift register sequences versus uniformly distributed random sequences for correlation chromatography

    NASA Technical Reports Server (NTRS)

    Kaljurand, M.; Valentin, J. R.; Shao, M.

    1996-01-01

    Two alternative input sequences are commonly employed in correlation chromatography (CC). They are sequences derived according to the algorithm of the feedback shift register (i.e., pseudo random binary sequences (PRBS)) and sequences derived by using the uniform random binary sequences (URBS). These two sequences are compared. By applying the "cleaning" data processing technique to the correlograms that result from these sequences, we show that when the PRBS is used the S/N of the correlogram is much higher than the one resulting from using URBS.

  15. Simulation of Crack Propagation in Engine Rotating Components under Variable Amplitude Loading

    NASA Technical Reports Server (NTRS)

    Bonacuse, P. J.; Ghosn, L. J.; Telesman, J.; Calomino, A. M.; Kantzos, P.

    1998-01-01

    The crack propagation life of tested specimens has been repeatedly shown to strongly depend on the loading history. Overloads and extended stress holds at temperature can either retard or accelerate the crack growth rate. Therefore, to accurately predict the crack propagation life of an actual component, it is essential to approximate the true loading history. In military rotorcraft engine applications, the loading profile (stress amplitudes, temperature, and number of excursions) can vary significantly depending on the type of mission flown. To accurately assess the durability of a fleet of engines, the crack propagation life distribution of a specific component should account for the variability in the missions performed (proportion of missions flown and sequence). In this report, analytical and experimental studies are described that calibrate/validate the crack propagation prediction capability for a disk alloy under variable amplitude loading. A crack closure based model was adopted to analytically predict the load interaction effects. Furthermore, a methodology has been developed to realistically simulate the actual mission mix loading on a fleet of engines over their lifetime. A sequence of missions is randomly selected and the number of repeats of each mission in the sequence is determined assuming a Poisson distributed random variable with a given mean occurrence rate. Multiple realizations of random mission histories are generated in this manner and are used to produce stress, temperature, and time points for fracture mechanics calculations. The result is a cumulative distribution of crack propagation lives for a given, life limiting, component location. This information can be used to determine a safe retirement life or inspection interval for the given location.
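
    The mission-mix sampling step can be sketched as follows. This is a rough illustration under stated assumptions rather than the report's procedure: the mission names, proportions, and mean repeat counts are invented, the Poisson sampler is Knuth's multiplication method (chosen because it needs only the standard library), and no stress or temperature history is attached to the missions.

```python
import math
import random


def poisson(mean, rng):
    """Draw one Poisson-distributed integer using Knuth's multiplication method."""
    limit = math.exp(-mean)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k


def mission_history(mix, mean_repeats, n_blocks, rng):
    """Build one random mission history.

    `mix` maps mission name -> proportion flown; `mean_repeats` maps
    mission name -> mean number of consecutive repeats. A block may
    contribute zero flights, since a Poisson draw can be 0.
    """
    names = list(mix)
    weights = [mix[name] for name in names]
    history = []
    for _ in range(n_blocks):
        mission = rng.choices(names, weights=weights)[0]
        history.extend([mission] * poisson(mean_repeats[mission], rng))
    return history


# Hypothetical mission mix, for illustration only.
mix = {"training": 0.6, "transport": 0.3, "combat": 0.1}
mean_repeats = {"training": 4.0, "transport": 2.0, "combat": 1.0}
rng = random.Random(42)
realizations = [mission_history(mix, mean_repeats, 50, rng) for _ in range(5)]
print([len(h) for h in realizations])
```

    Each realization would then be expanded into stress, temperature, and time points and passed to the crack growth calculation; repeating this over many realizations gives the life distribution the abstract describes.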

  16. Percolation in random-Sierpiński carpets: A real space renormalization group approach

    NASA Astrophysics Data System (ADS)

    Perreau, Michel; Peiro, Joaquina; Berthier, Serge

    1996-11-01

    The site percolation transition in random Sierpiński carpets is investigated by real space renormalization. The fixed point is not unique like in regular translationally invariant lattices, but depends on the number k of segmentation steps of the generation process of the fractal. It is shown that, for each scale invariance ratio n, the sequence of fixed points p_{n,k} is increasing with k, and converges when k→∞ toward a limit p_n strictly less than 1. Moreover, in such scale invariant structures, the percolation threshold does not depend only on the scale invariance ratio n, but also on the scale. The sequence p_{n,k} and p_n are calculated for n=4, 8, 16, 32, and 64, and for k=1 to k=11, and k=∞. The corresponding thermal exponent sequence ν_{n,k} is calculated for n=8 and 16, and for k=1 to k=5, and k=∞. Suggestions are made for an experimental test in physical self-similar structures.

  17. Sustained State-Independent Quantum Contextual Correlations from a Single Ion

    NASA Astrophysics Data System (ADS)

    Leupold, F. M.; Malinowski, M.; Zhang, C.; Negnevitsky, V.; Alonso, J.; Home, J. P.; Cabello, A.

    2018-05-01

    We use a single trapped-ion qutrit to demonstrate the quantum-state-independent violation of noncontextuality inequalities using a sequence of randomly chosen quantum nondemolition projective measurements. We concatenate 53 × 10^6 sequential measurements of 13 observables, and unambiguously violate an optimal noncontextual bound. We use the same data set to characterize imperfections including signaling and repeatability of the measurements. The experimental sequence was generated in real time with a quantum random number generator integrated into our control system to select the subsequent observable with a latency below 50 μs, which can be used to constrain contextual hidden-variable models that might describe our results. The state-recycling experimental procedure is resilient to noise and independent of the qutrit state, substantiating the fact that the contextual nature of quantum physics is connected to measurements and not necessarily to designated states. The use of extended sequences of quantum nondemolition measurements finds applications in the fields of sensing and quantum information.

  18. Microsatellite genotyping and genome-wide single nucleotide polymorphism-based indices of Plasmodium falciparum diversity within clinical infections.

    PubMed

    Murray, Lee; Mobegi, Victor A; Duffy, Craig W; Assefa, Samuel A; Kwiatkowski, Dominic P; Laman, Eugene; Loua, Kovana M; Conway, David J

    2016-05-12

    In regions where malaria is endemic, individuals are often infected with multiple distinct parasite genotypes, a situation that may impact on evolution of parasite virulence and drug resistance. Most approaches to studying genotypic diversity have involved analysis of a modest number of polymorphic loci, although whole genome sequencing enables a broader characterisation of samples. PCR-based microsatellite typing of a panel of ten loci was performed on Plasmodium falciparum in 95 clinical isolates from a highly endemic area in the Republic of Guinea, to characterize within-isolate genetic diversity. Separately, single nucleotide polymorphism (SNP) data from genome-wide short-read sequences of the same samples were used to derive within-isolate fixation indices (Fws), an inverse measure of diversity within each isolate compared to overall local genetic diversity. The latter indices were compared with the microsatellite results, and also with indices derived by randomly sampling modest numbers of SNPs. As expected, the number of microsatellite loci with more than one allele in each isolate was highly significantly inversely correlated with the genome-wide Fws fixation index (r = -0.88, P < 0.001). However, the microsatellite analysis revealed that most isolates contained mixed genotypes, even those that had no detectable genome sequence heterogeneity. Random sampling of different numbers of SNPs showed that an Fws index derived from ten or more SNPs with minor allele frequencies of >10% had high correlation (r > 0.90) with the index derived using all SNPs. Different types of data give highly correlated indices of within-infection diversity, although PCR-based analysis detects low-level minority genotypes not apparent in bulk sequence analysis. When whole-genome data are not obtainable, quantitative assay of ten or more SNPs can yield a reasonably accurate estimate of the within-infection fixation index (Fws).

  19. Single-electron random-number generator (RNG) for highly secure ubiquitous computing applications

    NASA Astrophysics Data System (ADS)

    Uchida, Ken; Tanamoto, Tetsufumi; Fujita, Shinobu

    2007-11-01

    Since the security of all modern cryptographic techniques relies on unpredictable and irreproducible digital keys generated by random-number generators (RNGs), the realization of a high-quality RNG is essential for secure communications. In this report, a new RNG, which utilizes single-electron phenomena, is proposed. A room-temperature operating silicon single-electron transistor (SET) with a nearby electron pocket is used as a high-quality, ultra-small RNG. In the proposed RNG, stochastic single-electron capture/emission processes to/from the electron pocket are detected with high sensitivity by the SET, and result in giant random telegraphic signals (GRTS) on the SET current. It is experimentally demonstrated that the single-electron RNG generates extremely high-quality random digital sequences at room temperature, in spite of its simple configuration. Because of its small size and low power consumption, the single-electron RNG is promising as a key nanoelectronic device for future ubiquitous computing systems with highly secure mobile communication capabilities.

  20. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F. William

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.

  1. Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

    DOEpatents

    Studier, F.W.

    1995-04-18

    Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.

  2. Hide and vanish: data sets where the most parsimonious tree is known but hard to find, and their implications for tree search methods.

    PubMed

    Goloboff, Pablo A

    2014-10-01

    Three different types of data sets, for which the uniquely most parsimonious tree can be known exactly but is hard to find with heuristic tree search methods, are studied. Tree searches are complicated more by the shape of the tree landscape (i.e. the distribution of homoplasy on different trees) than by the sheer abundance of homoplasy or character conflict. Data sets of Type 1 are those constructed by Radel et al. (2013). Data sets of Type 2 present a very rugged landscape, with narrow peaks and valleys, but relatively low amounts of homoplasy. For such a tree landscape, subjecting the trees to TBR and saving suboptimal trees produces much better results when the sequence of clipping for the tree branches is randomized instead of fixed. An unexpected finding for data sets of Types 1 and 2 is that starting a search from a random tree instead of a random addition sequence Wagner tree may increase the probability that the search finds the most parsimonious tree; a small artificial example where these probabilities can be calculated exactly is presented. Data sets of Type 3, the most difficult data sets studied here, comprise only congruent characters, and a single island with only one most parsimonious tree. Even if there is a single island, missing entries create a very flat landscape which is difficult to traverse with tree search algorithms because the number of equally parsimonious trees that need to be saved and swapped to effectively move around the plateaus is too large. Minor modifications of the parameters of tree drifting, ratchet, and sectorial searches allow travelling around these plateaus much more efficiently than saving and swapping large numbers of equally parsimonious trees with TBR. For these data sets, two new related criteria for selecting taxon addition sequences in Wagner trees (the "selected" and "informative" addition sequences) produce much better results than the standard random or closest addition sequences. 
    These new methods for Wagner trees and for moving around plateaus can be useful when analyzing phylogenomic data sets formed by concatenation of genes with uneven taxon representation ("sparse" supermatrices), which are likely to present a tree landscape with extensive plateaus. Copyright © 2014 Elsevier Inc. All rights reserved.

  3. Novel pseudo-random number generator based on quantum random walks.

    PubMed

    Yang, Yu-Guang; Zhao, Qian-Qian

    2016-02-04

    In this paper, we investigate the potential application of quantum computation for constructing pseudo-random number generators (PRNGs) and further construct a novel PRNG based on quantum random walks (QRWs), a famous quantum computation model. The PRNG merely relies on the equations used in the QRWs, and thus the generation algorithm is simple and the computation speed is fast. The proposed PRNG is subjected to statistical tests such as NIST and successfully passed the test. Compared with the representative PRNG based on quantum chaotic maps (QCM), the present QRWs-based PRNG has some advantages such as better statistical complexity and recurrence. For example, the normalized Shannon entropy and the statistical complexity of the QRWs-based PRNG are 0.999699456771172 and 1.799961178212329e-04 respectively given the number of 8 bits-words, say, 16Mbits. By contrast, the corresponding values of the QCM-based PRNG are 0.999448131481064 and 3.701210794388818e-04 respectively. Thus the statistical complexity and the normalized entropy of the QRWs-based PRNG are closer to 0 and 1 respectively than those of the QCM-based PRNG when the number of words of the analyzed sequence increases. It provides a new clue to construct PRNGs and also extends the applications of quantum computation.
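
    The normalized Shannon entropy quoted above for 8-bit words can be illustrated with a minimal sketch. It assumes the textbook definition (entropy of the byte histogram divided by its maximum of 8 bits); the abstract does not state the authors' exact estimator, and the function name `normalized_entropy` is hypothetical. The statistical complexity measure is not reproduced here.

```python
import math
import os
from collections import Counter


def normalized_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, scaled to [0, 1].

    Assumption: 'normalized' means division by log2(256) = 8 bits,
    the maximum entropy of a distribution over 8-bit words.
    """
    if not data:
        raise ValueError("need at least one byte")
    n = len(data)
    counts = Counter(data)  # iterating bytes yields ints 0..255
    h = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return h / 8.0


if __name__ == "__main__":
    # A constant stream has zero entropy; a stream in which every byte
    # value appears equally often reaches the maximum of 1.
    print(normalized_entropy(b"\x00" * 1024))
    print(normalized_entropy(bytes(range(256)) * 4))
    # Bytes from the operating system's generator should land close to 1.
    print(normalized_entropy(os.urandom(1 << 20)))
```

    Values very close to 1, such as those reported in the abstract, indicate a nearly uniform distribution of 8-bit words. A uniform histogram alone does not establish randomness, which is why batteries such as the NIST tests are applied as well.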

  4. Novel pseudo-random number generator based on quantum random walks

    PubMed Central

    Yang, Yu-Guang; Zhao, Qian-Qian

    2016-01-01

    In this paper, we investigate the potential application of quantum computation for constructing pseudo-random number generators (PRNGs) and further construct a novel PRNG based on quantum random walks (QRWs), a famous quantum computation model. The PRNG merely relies on the equations used in the QRWs, and thus the generation algorithm is simple and the computation speed is fast. The proposed PRNG is subjected to statistical tests such as NIST and successfully passed the test. Compared with the representative PRNG based on quantum chaotic maps (QCM), the present QRWs-based PRNG has some advantages such as better statistical complexity and recurrence. For example, the normalized Shannon entropy and the statistical complexity of the QRWs-based PRNG are 0.999699456771172 and 1.799961178212329e-04 respectively given the number of 8 bits-words, say, 16Mbits. By contrast, the corresponding values of the QCM-based PRNG are 0.999448131481064 and 3.701210794388818e-04 respectively. Thus the statistical complexity and the normalized entropy of the QRWs-based PRNG are closer to 0 and 1 respectively than those of the QCM-based PRNG when the number of words of the analyzed sequence increases. It provides a new clue to construct PRNGs and also extends the applications of quantum computation. PMID:26842402

  5. Operations analysis (study 2.1): Program manual and users guide for the LOVES computer code

    NASA Technical Reports Server (NTRS)

    Wray, S. T., Jr.

    1975-01-01

    Information necessary to use the LOVES Computer Program in its existing state, or to modify the program to include studies not properly handled by the basic model, is provided. The Users Guide defines the basic elements assembled together to form the model for servicing satellites in orbit. As the program is a simulation, the method of attack is to disassemble the problem into a sequence of events, each occurring instantaneously and each creating one or more other events in the future. The main driving force of the simulation is the deterministic launch schedule of satellites and the subsequent failure of the various modules which make up the satellites. The LOVES Computer Program uses a random number generator to simulate the failure of module elements and therefore operates over a long span of time, typically 10 to 15 years. The sequence of events is varied by making several runs in succession with different random numbers, resulting in a Monte Carlo technique to determine statistical parameters of minimum value, average value, and maximum value.
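
    The Monte Carlo procedure described above (repeated runs with different random numbers, summarized by minimum, average, and maximum) can be sketched as follows. This is not the LOVES code: the exponential module lifetimes, the immediate replacement of failed modules, and all names and parameter values are assumptions made for illustration.

```python
import random


def count_failures(num_modules, mean_life_years, horizon_years, rng):
    """One simulated mission: count module failures within the horizon.

    Assumption: each module fails after an exponentially distributed
    lifetime and is replaced at once by an identical module.
    """
    failures = 0
    for _ in range(num_modules):
        t = rng.expovariate(1.0 / mean_life_years)  # time of first failure
        while t < horizon_years:
            failures += 1
            t += rng.expovariate(1.0 / mean_life_years)  # next failure
    return failures


def monte_carlo(runs, num_modules, mean_life_years, horizon_years, seed=0):
    """Repeat the simulation with a different random stream per run and
    return (minimum, average, maximum) of the failure counts."""
    results = []
    for i in range(runs):
        rng = random.Random(seed + i)  # a distinct, reproducible stream per run
        results.append(
            count_failures(num_modules, mean_life_years, horizon_years, rng)
        )
    return min(results), sum(results) / len(results), max(results)


if __name__ == "__main__":
    lo, avg, hi = monte_carlo(runs=50, num_modules=10,
                              mean_life_years=5.0, horizon_years=15.0)
    print(f"failures over 15 years: min={lo} avg={avg:.1f} max={hi}")
```

    Seeding each run separately keeps the whole experiment reproducible while still varying the sequence of events from run to run.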

  6. Abstracting Sequences: Reasoning That Is a Key to Academic Achievement.

    PubMed

    Pasnak, Robert; Kidd, Julie K; Gadzichowski, K Marinka; Gallington, Debbie A; Schmerold, Katrina Lea; West, Heather

    2015-01-01

    The ability to understand sequences of items may be an important cognitive ability. To test this proposition, 8 first-grade children from each of 36 classes were randomly assigned to four conditions. Some were taught sequences that represented increasing or decreasing values, or were symmetrical, or were rotations of an object through 6 or 8 positions. Control children received equal numbers of sessions on mathematics, reading, or social studies. Instruction was conducted three times weekly in 15-min sessions for seven months. In May, the children taught sequences applied their understanding to novel sequences, and scored as well or better on three standardized reading tests as the control children. They outscored all children on tests of mathematics concepts, and scored better than control children on some mathematics scales. These findings indicate that developing an understanding of sequences is a form of abstraction, probably involving fluid reasoning, that provides a foundation for academic achievement in early education.

  7. An approach for Ewing test selection to support the clinical assessment of cardiac autonomic neuropathy.

    PubMed

    Stranieri, Andrew; Abawajy, Jemal; Kelarev, Andrei; Huda, Shamsul; Chowdhury, Morshed; Jelinek, Herbert F

    2013-07-01

    This article addresses the problem of determining optimal sequences of tests for the clinical assessment of cardiac autonomic neuropathy (CAN). We investigate the accuracy of using only one of the recommended Ewing tests to classify CAN and the additional accuracy obtained by adding the remaining tests of the Ewing battery. This is important as not all five Ewing tests can always be applied in each situation in practice. We used a new and unique database of the diabetes screening research initiative project, which is more than ten times larger than the data set used by Ewing in his original investigation of CAN. We utilized decision trees and the optimal decision path finder (ODPF) procedure for identifying optimal sequences of tests. We present experimental results on the accuracy of using each one of the recommended Ewing tests to classify CAN and the additional accuracy that can be achieved by adding the remaining tests of the Ewing battery. We found the best sequences of tests for the cost-function equal to the number of tests. The accuracies achieved by the initial segments of the optimal sequences for 2, 3 and 4 categories of CAN are 80.80, 91.33, 93.97 and 94.14, and respectively, 79.86, 89.29, 91.16 and 91.76, and 78.90, 86.21, 88.15 and 88.93. They show significant improvement compared to the sequence considered previously in the literature and the mathematical expectations of the accuracies of a random sequence of tests. The complete outcomes obtained for all subsets of the Ewing features are required for determining optimal sequences of tests for any cost-function with the use of the ODPF procedure. We have also found the two most significant additional features that can increase the accuracy when some of the Ewing attributes cannot be obtained. The outcomes obtained can be used to determine the optimal sequences of tests for each individual cost-function by following the ODPF procedure. The results show that the best single Ewing test for diagnosing CAN is the deep breathing heart rate variation test. Optimal sequences found for the cost-function equal to the number of tests guarantee that the best accuracy is achieved after any number of tests and provide an improvement in comparison with the previous ordering of tests or a random sequence. Copyright © 2013 Elsevier B.V. All rights reserved.

  8. ARTS: automated randomization of multiple traits for study design.

    PubMed

    Maienschein-Cline, Mark; Lei, Zhengdeng; Gardeux, Vincent; Abbasi, Taimur; Machado, Roberto F; Gordeuk, Victor; Desai, Ankit A; Saraf, Santosh; Bahroos, Neil; Lussier, Yves

    2014-06-01

    Collecting data from large studies on high-throughput platforms, such as microarray or next-generation sequencing, typically requires processing samples in batches. There are often systematic but unpredictable biases from batch-to-batch, so proper randomization of biologically relevant traits across batches is crucial for distinguishing true biological differences from experimental artifacts. When a large number of traits are biologically relevant, as is common for clinical studies of patients with varying sex, age, genotype and medical background, proper randomization can be extremely difficult to prepare by hand, especially because traits may affect biological inferences, such as differential expression, in a combinatorial manner. Here we present ARTS (automated randomization of multiple traits for study design), which aids researchers in study design by automatically optimizing batch assignment for any number of samples, any number of traits and any batch size. ARTS is implemented in Perl and is available at github.com/mmaiensc/ARTS. ARTS is also available in the Galaxy Tool Shed, and can be used at the Galaxy installation hosted by the UIC Center for Research Informatics (CRI) at galaxy.cri.uic.edu. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.

  9. The Status, Quality, and Expansion of the NIH Full-Length cDNA Project: The Mammalian Gene Collection (MGC)

    PubMed Central

    2004-01-01

    The National Institutes of Health's Mammalian Gene Collection (MGC) project was designed to generate and sequence a publicly accessible cDNA resource containing a complete open reading frame (ORF) for every human and mouse gene. The project initially used a random strategy to select clones from a large number of cDNA libraries from diverse tissues. Candidate clones were chosen based on 5′-EST sequences, and then fully sequenced to high accuracy and analyzed by algorithms developed for this project. Currently, more than 11,000 human and 10,000 mouse genes are represented in MGC by at least one clone with a full ORF. The random selection approach is now reaching a saturation point, and a transition to protocols targeted at the missing transcripts is now required to complete the mouse and human collections. Comparison of the sequence of the MGC clones to reference genome sequences reveals that most cDNA clones are of very high sequence quality, although it is likely that some cDNAs may carry missense variants as a consequence of experimental artifact, such as PCR, cloning, or reverse transcriptase errors. Recently, a rat cDNA component was added to the project, and ongoing frog (Xenopus) and zebrafish (Danio) cDNA projects were expanded to take advantage of the high-throughput MGC pipeline. PMID:15489334

  10. Deep Sequencing of Random Mutant Libraries Reveals the Active Site of the Narrow Specificity CphA Metallo-β-Lactamase is Fragile to Mutations.

    PubMed

    Sun, Zhizeng; Mehta, Shrenik C; Adamski, Carolyn J; Gibbs, Richard A; Palzkill, Timothy

    2016-09-12

    CphA is a Zn(2+)-dependent metallo-β-lactamase that efficiently hydrolyzes only carbapenem antibiotics. To understand the sequence requirements for CphA function, single codon random mutant libraries were constructed for residues in and near the active site and mutants were selected for E. coli growth on increasing concentrations of imipenem, a carbapenem antibiotic. At high concentrations of imipenem that select for phenotypically wild-type mutants, the active-site residues exhibit stringent sequence requirements in that nearly all residues in positions that contact zinc, the substrate, or the catalytic water do not tolerate amino acid substitutions. In addition, at high imipenem concentrations a number of residues that do not directly contact zinc or substrate are also essential and do not tolerate substitutions. Biochemical analysis confirmed that amino acid substitutions at essential positions decreased the stability or catalytic activity of the CphA enzyme. Therefore, the CphA active site is fragile to substitutions, suggesting active-site residues are optimized for imipenem hydrolysis. These results also suggest that resistance to inhibitors targeted to the CphA active site would be slow to develop because of the strong sequence constraints on function.

  11. Optical Processing Techniques For Pseudorandom Sequence Prediction

    NASA Astrophysics Data System (ADS)

    Gustafson, Steven C.

    1983-11-01

    Pseudorandom sequences are series of apparently random numbers generated, for example, by linear or nonlinear feedback shift registers. An important application of these sequences is in spread spectrum communication systems, in which, for example, the transmitted carrier phase is digitally modulated rapidly and pseudorandomly and in which the information to be transmitted is incorporated as a slow modulation in the pseudorandom sequence. In this case the transmitted information can be extracted only by a receiver that uses for demodulation the same pseudorandom sequence used by the transmitter, and thus this type of communication system has a very high immunity to third-party interference. However, if a third party can predict in real time the probable future course of the transmitted pseudorandom sequence given past samples of this sequence, then interference immunity can be significantly reduced. In this application effective pseudorandom sequence prediction techniques should be (1) applicable in real time to rapid (e.g., megahertz) sequence generation rates, (2) applicable to both linear and nonlinear pseudorandom sequence generation processes, and (3) applicable to error-prone past sequence samples of limited number and continuity. Certain optical processing techniques that may meet these requirements are discussed in this paper. In particular, techniques based on incoherent optical processors that perform general linear transforms or (more specifically) matrix-vector multiplications are considered. Computer simulation examples are presented which indicate that significant prediction accuracy can be obtained using these transforms for simple pseudorandom sequences. However, the useful prediction of more complex pseudorandom sequences will probably require the application of more sophisticated optical processing techniques.
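
    As a small illustration of the linear feedback shift registers mentioned above, the following sketch steps a 4-bit Fibonacci register whose feedback is the XOR of its two lowest bits. The register width, feedback positions, and function names are chosen for illustration only and are not taken from the paper.

```python
from itertools import islice


def lfsr_states(seed, shifts, width):
    """Yield successive states of a right-shifting Fibonacci LFSR.

    `shifts` lists the bit positions (0 = least significant bit) that are
    XORed to form the feedback bit, which enters at the top of the register.
    """
    state = seed
    while True:
        yield state
        bit = 0
        for s in shifts:
            bit ^= (state >> s) & 1
        state = (state >> 1) | (bit << (width - 1))


def period(seed, shifts, width):
    """Number of steps until the register returns to `seed`.

    Terminates whenever position 0 is among `shifts`, because the update
    is then invertible and every state lies on a cycle.
    """
    gen = lfsr_states(seed, shifts, width)
    first = next(gen)
    for n, state in enumerate(gen, start=1):
        if state == first:
            return n


if __name__ == "__main__":
    # The output bit stream is the least significant bit of each state.
    bits = [s & 1 for s in islice(lfsr_states(0b0001, (0, 1), 4), 15)]
    print(bits)
    print(period(0b0001, (0, 1), 4))  # 15: every nonzero 4-bit state is visited
```

    Because each new bit is a fixed linear function of earlier bits, a short run of error-free output is enough to extend such a sequence, which is the kind of predictability the abstract is concerned with.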

  12. Facilitated sequence counting and assembly by template mutagenesis

    PubMed Central

    Levy, Dan; Wigler, Michael

    2014-01-01

    Presently, inferring the long-range structure of the DNA templates is limited by short read lengths. Accurate template counts suffer from distortions occurring during PCR amplification. We explore the utility of introducing random mutations in identical or nearly identical templates to create distinguishable patterns that are inherited during subsequent copying. We simulate the applications of this process under assumptions of error-free sequencing and perfect mapping, using cytosine deamination as a model for mutation. The simulations demonstrate that within readily achievable conditions of nucleotide conversion and sequence coverage, we can accurately count the number of otherwise identical molecules as well as connect variants separated by long spans of identical sequence. We discuss many potential applications, such as transcript profiling, isoform assembly, haplotype phasing, and de novo genome assembly. PMID:25313059

  13. MDC-Analyzer: a novel degenerate primer design tool for the construction of intelligent mutagenesis libraries with contiguous sites.

    PubMed

    Tang, Lixia; Wang, Xiong; Ru, Beibei; Sun, Hengfei; Huang, Jian; Gao, Hui

    2014-06-01

    Recent computational and bioinformatics advances have enabled the efficient creation of novel biocatalysts by reducing amino acid variability at hot spot regions. To further expand the utility of this strategy, we present here a tool called Multi-site Degenerate Codon Analyzer (MDC-Analyzer) for the automated design of intelligent mutagenesis libraries that can completely cover user-defined randomized sequences, especially when multiple contiguous and/or adjacent sites are targeted. By initially defining an objective function, the possible optimal degenerate PCR primer profiles could be automatically explored using the heuristic approach of Greedy Best-First-Search. Compared to the previously developed DC-Analyzer, MDC-Analyzer allows for the existence of a small amount of undesired sequences as a tradeoff between the number of degenerate primers and the encoded library size while still providing all the benefits of DC-Analyzer with the ability to randomize multiple contiguous sites. MDC-Analyzer was validated using a series of randomly generated mutation schemes and experimental case studies on the evolution of halohydrin dehalogenase, which proved that the MDC methodology is more efficient than other methods and is particularly well-suited to exploring the sequence space of proteins using data-driven protein engineering strategies.

  14. Random sequences generation through optical measurements by phase-shifting interferometry

    NASA Astrophysics Data System (ADS)

    François, M.; Grosges, T.; Barchiesi, D.; Erra, R.; Cornet, A.

    2012-04-01

    The development of new techniques for producing random sequences with a high level of security is a challenging topic of research in modern cryptographics. The proposed method is based on the measurement by phase-shifting interferometry of the speckle signals of the interaction between light and structures. We show how the combination of amplitude and phase distributions (maps) under a numerical process can produce random sequences. The produced sequences satisfy all the statistical requirements of randomness and can be used in cryptographic schemes.

  15. Influence of age on adaptability of human mastication.

    PubMed

    Peyron, Marie-Agnès; Blanc, Olivier; Lund, James P; Woda, Alain

    2004-08-01

    The objective of this work was to study the influence of age on the ability of subjects to adapt mastication to changes in the hardness of foods. The study was carried out on 67 volunteers aged from 25 to 75 yr (29 males, 38 females) who had complete healthy dentitions. Surface electromyograms of the left and right masseter and temporalis muscles were recorded simultaneously with jaw movements using an electromagnetic transducer. Each volunteer was asked to chew and swallow four visco-elastic model foods of different hardness, each presented three times in random order. The number of masticatory cycles, their frequency, and the sum of all electromyographic (EMG) activity in all four muscles were calculated for each masticatory sequence. Multiple linear regression analyses were used to assess the effects of hardness, age, and gender. Hardness was associated with an increase in the mean number of cycles and mean summed EMG activity per sequence. It also increased mean vertical amplitude. Mean vertical amplitude and mean summed EMG activity per sequence were higher in males. These adaptations were present at all ages. Age was associated with an increase of 0.3 cycles per sequence per year of life and with a progressive increase in mean summed EMG activity per sequence. Cycle and opening duration early in the sequence also fell with age. We concluded that although the number of cycles needed to chew a standard piece of food increases progressively with age, the capacity to adapt to changes in the hardness of food is maintained.

  16. Methodological reporting of randomized clinical trials in respiratory research in 2010.

    PubMed

    Lu, Yi; Yao, Qiuju; Gu, Jie; Shen, Ce

    2013-09-01

    Although randomized controlled trials (RCTs) are considered the highest level of evidence, they are also subject to bias, due to a lack of adequately reported randomization, and therefore the reporting should be as explicit as possible for readers to determine the significance of the contents. We evaluated the methodological quality of RCTs in respiratory research in high ranking clinical journals, published in 2010. We assessed the methodological quality, including generation of the allocation sequence, allocation concealment, double-blinding, sample-size calculation, intention-to-treat analysis, flow diagrams, number of medical centers involved, diseases, funding sources, types of interventions, trial registration, number of times the papers have been cited, journal impact factor, journal type, and journal endorsement of the CONSORT (Consolidated Standards of Reporting Trials) rules, in RCTs published in 12 top ranking clinical respiratory journals and 5 top ranking general medical journals. We included 176 trials, of which 93 (53%) reported adequate generation of the allocation sequence, 66 (38%) reported adequate allocation concealment, 79 (45%) were double-blind, 123 (70%) reported adequate sample-size calculation, 88 (50%) reported intention-to-treat analysis, and 122 (69%) included a flow diagram. Multivariate logistic regression analysis revealed that journal impact factor ≥ 5 was the only variable that significantly influenced adequate allocation sequence generation. Trial registration and journal impact factor ≥ 5 significantly influenced adequate allocation concealment. Medical interventions, trial registration, and journal endorsement of the CONSORT statement influenced adequate double-blinding. Publication in one of the general medical journals influenced adequate sample-size calculation. The methodological quality of RCTs in respiratory research needs improvement. Stricter enforcement of the CONSORT statement should enhance the quality of RCTs.

  17. MIP models and hybrid algorithms for simultaneous job splitting and scheduling on unrelated parallel machines.

    PubMed

    Eroglu, Duygu Yilmaz; Ozmutlu, H Cenk

    2014-01-01

    We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job sequence and machine-dependent setup times and with job splitting property. The first contribution of this paper is to introduce novel algorithms which perform splitting and scheduling simultaneously with a variable number of subjobs. We proposed a simple chromosome structure constituted by random key numbers in the hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but they create additional difficulty when hybrid factors in local search are implemented. We developed algorithms that adapt the results of local search into the genetic algorithm with a minimum of relocation operations on the genes' random key numbers. This is the second contribution of the paper. The third contribution of this paper is three new MIP models which perform splitting and scheduling simultaneously. The fourth contribution of this paper is the implementation of the GAspLAMIP. This implementation lets us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature and the results validate the effectiveness of the proposed algorithms.
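
    The random key representation mentioned above can be illustrated in its generic form: each job carries a real-valued key, and sorting the jobs by key decodes the chromosome into a processing order. How GAspLA encodes machine assignment and job splitting is not described in the abstract, so the sketch below covers only this general decoding step, and its function names are hypothetical.

```python
import random


def decode_random_keys(keys):
    """Return job indices ordered by ascending key (generic random-key decoding)."""
    return sorted(range(len(keys)), key=lambda j: keys[j])


def random_chromosome(num_jobs, rng):
    """Draw one key in [0, 1) per job."""
    return [rng.random() for _ in range(num_jobs)]


if __name__ == "__main__":
    # Job 2 has the smallest key and is scheduled first, job 1 last.
    print(decode_random_keys([0.46, 0.91, 0.33, 0.75]))  # [2, 0, 3, 1]

    rng = random.Random(42)
    chromosome = random_chromosome(6, rng)
    print(decode_random_keys(chromosome))
```

    Any vector of keys decodes to a valid permutation, so crossover and mutation never produce infeasible job orders. The difficulty noted in the abstract arises in the other direction: when local search changes the order, the keys must be relocated so that they decode to the new sequence.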

  18. The influence of environmental forcing on biodiversity and extinction in a resource competition model

    NASA Astrophysics Data System (ADS)

    Vakulenko, Sergey A.; Sudakov, Ivan; Mander, Luke

    2018-03-01

    In this paper, we study a model of many species that compete, directly or indirectly, for a pool of common resources under the influence of periodic, stochastic, and/or chaotic environmental forcing. Using numerical simulations, we find the number and sequence of species going extinct when the community is initially packed with a large number of species of random initial densities. Thereby, any species with a density below a given threshold is regarded to be extinct.

  19. The influence of environmental forcing on biodiversity and extinction in a resource competition model.

    PubMed

    Vakulenko, Sergey A; Sudakov, Ivan; Mander, Luke

    2018-03-01

    In this paper, we study a model of many species that compete, directly or indirectly, for a pool of common resources under the influence of periodic, stochastic, and/or chaotic environmental forcing. Using numerical simulations, we find the number and sequence of species going extinct when the community is initially packed with a large number of species of random initial densities. Thereby, any species with a density below a given threshold is regarded to be extinct.

  20. The relationships of 'ecstasy' (MDMA) and cannabis use to impaired executive inhibition and access to semantic long-term memory.

    PubMed

    Murphy, Philip N; Erwin, Philip G; Maciver, Linda; Fisk, John E; Larkin, Derek; Wareing, Michelle; Montgomery, Catharine; Hilton, Joanne; Tames, Frank J; Bradley, Belinda; Yanulevitch, Kate; Ralley, Richard

    2011-10-01

    This study aimed to examine the relationship between the consumption of ecstasy (3,4-methylenedioxymethamphetamine (MDMA)) and cannabis, and performance on the random letter generation task which generates dependent variables drawing upon executive inhibition and access to semantic long-term memory (LTM). The participant group was a between-participant independent variable with users of both ecstasy and cannabis (E/C group, n = 15), users of cannabis but not ecstasy (CA group, n = 13) and controls with no exposure to these drugs (CO group, n = 12). Dependent variables measured violations of randomness: number of repeat sequences, number of alphabetical sequences (both drawing upon inhibition) and redundancy (drawing upon access to semantic LTM). E/C participants showed significantly higher redundancy than CO participants but did not differ from CA participants. There were no significant effects for the other dependent variables. A regression model comprising intelligence measures and estimates of ecstasy and cannabis consumption predicted redundancy scores, but only cannabis consumption contributed significantly to this prediction. Impaired access to semantic LTM may be related to cannabis consumption, although the involvement of ecstasy and other stimulant drugs cannot be excluded here. Executive inhibitory functioning, as measured by the random letter generation task, is unrelated to ecstasy and cannabis consumption. Copyright © 2011 John Wiley & Sons, Ltd.

  1. NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

    PubMed

    Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

    2016-11-01

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.

  2. Alternation blindness in the representation of binary sequences.

    PubMed

    Yu, Ru Qi; Osherson, Daniel; Zhao, Jiaying

    2018-03-01

    Binary information is prevalent in the environment and contains 2 distinct outcomes. Binary sequences consist of a mixture of alternation and repetition. Understanding how people perceive such sequences would contribute to a general theory of information processing. In this study, we examined how people process alternation and repetition in binary sequences. Across 4 paradigms involving estimation, working memory, change detection, and visual search, we found that the number of alternations is underestimated compared with repetitions (Experiment 1). Moreover, recall for binary sequences deteriorates as the sequence alternates more (Experiment 2). Changes in bits are also harder to detect as the sequence alternates more (Experiment 3). Finally, visual targets superimposed on bits of a binary sequence take longer to process as alternation increases (Experiment 4). Overall, our results indicate that compared with repetition, alternation in a binary sequence is less salient in the sense of requiring more attention for successful encoding. The current study thus reveals the cognitive constraints in the representation of alternation and provides a new explanation for the overalternation bias in randomness perception. (PsycINFO Database Record (c) 2018 APA, all rights reserved).

  3. Reduction of display artifacts by random sampling

    NASA Technical Reports Server (NTRS)

    Ahumada, A. J., Jr.; Nagel, D. C.; Watson, A. B.; Yellott, J. I., Jr.

    1983-01-01

    The application of random-sampling techniques to remove visible artifacts (such as flicker, moire patterns, and paradoxical motion) introduced in TV-type displays by discrete sequential scanning is discussed and demonstrated. Sequential-scanning artifacts are described; the window of visibility defined in spatiotemporal frequency space by Watson and Ahumada (1982 and 1983) and Watson et al. (1983) is explained; the basic principles of random sampling are reviewed and illustrated by the case of the human retina; and it is proposed that the sampling artifacts can be replaced by random noise, which can then be shifted to frequency-space regions outside the window of visibility. Vertical sequential, single-random-sequence, and continuously renewed random-sequence plotting displays generating 128 points at update rates up to 130 Hz are applied to images of stationary and moving lines, and best results are obtained with the single random sequence for the stationary lines and with the renewed random sequence for the moving lines.

  4. Structure and function of neonatal social communication in a genetic mouse model of autism.

    PubMed

    Takahashi, T; Okabe, S; Broin, P Ó; Nishi, A; Ye, K; Beckert, M V; Izumi, T; Machida, A; Kang, G; Abe, S; Pena, J L; Golden, A; Kikusui, T; Hiroi, N

    2016-09-01

    A critical step toward understanding autism spectrum disorder (ASD) is to identify both genetic and environmental risk factors. A number of rare copy number variants (CNVs) have emerged as robust genetic risk factors for ASD, but not all CNV carriers exhibit ASD and the severity of ASD symptoms varies among CNV carriers. Although evidence exists that various environmental factors modulate symptomatic severity, the precise mechanisms by which these factors determine the ultimate severity of ASD are still poorly understood. Here, using a mouse heterozygous for Tbx1 (a gene encoded in 22q11.2 CNV), we demonstrate that a genetically triggered neonatal phenotype in vocalization generates a negative environmental loop in pup-mother social communication. Wild-type pups used individually diverse sequences of simple and complicated call types, but heterozygous pups used individually invariable call sequences with less complicated call types. When played back, representative wild-type call sequences elicited maternal approach, but heterozygous call sequences were ineffective. When the representative wild-type call sequences were randomized, they were ineffective in eliciting vigorous maternal approach behavior. These data demonstrate that an ASD risk gene alters the neonatal call sequence of its carriers and this pup phenotype in turn diminishes maternal care through atypical social communication. Thus, an ASD risk gene induces, through atypical neonatal call sequences, less than optimal maternal care as a negative neonatal environmental factor.

  5. Structure and function of neonatal social communication in a genetic mouse model of autism

    PubMed Central

    Takahashi, Tomohisa; Okabe, Shota; Ó Broin, Pilib; Nishi, Akira; Ye, Kenny; Beckert, Michael V.; Izumi, Takeshi; Machida, Akihiro; Kang, Gina; Abe, Seiji; Pena, Jose L.; Golden, Aaron; Kikusui, Takefumi; Hiroi, Noboru

    2015-01-01

    A critical step toward understanding autism spectrum disorder (ASD) is to identify both genetic and environmental risk factors. A number of rare copy number variants (CNVs) have emerged as robust genetic risk factors for ASD, but not all CNV carriers exhibit ASD and the severity of ASD symptoms varies among CNV carriers. Although evidence exists that various environmental factors modulate symptomatic severity, the precise mechanisms by which these factors determine the ultimate severity of ASD are still poorly understood. Here, using a mouse heterozygous for Tbx1 (a gene encoded in 22q11.2 CNV), we demonstrate that a genetically-triggered neonatal phenotype in vocalization generates a negative environmental loop in pup-mother social communication. Wild-type pups used individually diverse sequences of simple and complicated call types, but heterozygous pups used individually invariable call sequences with less complicated call types. When played back, representative wild-type call sequences elicited maternal approach, but heterozygous call sequences were ineffective. When the representative wild-type call sequences were randomized, they were ineffective in eliciting vigorous maternal approach behavior. These data demonstrate that an ASD risk gene alters the neonatal call sequence of its carriers and this pup phenotype in turn diminishes maternal care through atypical social communication. Thus, an ASD risk gene induces, through atypical neonatal call sequences, less than optimal maternal care as a negative neonatal environmental factor. PMID:26666205

  6. Molecular Identification of Date Palm Cultivars Using Random Amplified Polymorphic DNA (RAPD) Markers.

    PubMed

    Al-Khalifah, Nasser S; Shanavaskhan, A E

    2017-01-01

    Ambiguity in the total number of date palm cultivars across the world is pointing toward the necessity for an enumerative study using standard morphological and molecular markers. Among molecular markers, DNA markers are more suitable and ubiquitous to most applications. They are highly polymorphic in nature, frequently occurring in genomes, easy to access, and highly reproducible. Various molecular markers such as restriction fragment length polymorphism (RFLP), amplified fragment length polymorphism (AFLP), simple sequence repeats (SSR), inter-simple sequence repeats (ISSR), and random amplified polymorphic DNA (RAPD) markers have been successfully used as efficient tools for analysis of genetic variation in date palm. This chapter explains a stepwise protocol for extracting total genomic DNA from date palm leaves. A user-friendly protocol for RAPD analysis and a table showing the primers used in different molecular techniques that produce polymorphisms in date palm are also provided.

  7. Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

    PubMed Central

    Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

    2010-01-01

    Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614

  8. Controlling Light Transmission Through Highly Scattering Media Using Semi-Definite Programming as a Phase Retrieval Computation Method.

    PubMed

    N'Gom, Moussa; Lien, Miao-Bin; Estakhri, Nooshin M; Norris, Theodore B; Michielssen, Eric; Nadakuditi, Raj Rao

    2017-05-31

    Complex Semi-Definite Programming (SDP) is introduced as a novel approach to phase retrieval enabled control of monochromatic light transmission through highly scattering media. In a simple optical setup, a spatial light modulator is used to generate a random sequence of phase-modulated wavefronts, and the resulting intensity speckle patterns in the transmitted light are acquired on a camera. The SDP algorithm allows computation of the complex transmission matrix of the system from this sequence of intensity-only measurements, without need for a reference beam. Once the transmission matrix is determined, optimal wavefronts are computed that focus the incident beam to any position or sequence of positions on the far side of the scattering medium, without the need for any subsequent measurements or wavefront shaping iterations. The number of measurements required and the degree of enhancement of the intensity at focus is determined by the number of pixels controlled by the spatial light modulator.

  9. Determinism and randomness in the evolution of introns and SINE inserts in mouse and human mitochondrial solute carrier and cytokine receptor genes.

    PubMed

    Cianciulli, Antonia; Calvello, Rosa; Panaro, Maria A

    2015-04-01

    In the homologous genes studied, the exons and introns alternated in the same order in mouse and human. We studied, in both species: corresponding short segments of introns, whole corresponding introns and complete homologous genes. We considered the total number of nucleotides and the number and orientation of the SINE inserts. Comparisons of mouse and human data series showed that at the level of individual relatively short segments of intronic sequences the stochastic variability prevails in the local structuring, but at higher levels of organization a deterministic component emerges, conserved in mouse and human during the divergent evolution, despite the ample re-editing of the intronic sequences and the fact that processes such as SINE spread had taken place in an independent way in the two species. Intron conservation is negatively correlated with the SINE occupancy, suggesting that virus inserts interfere with the conservation of the sequences inherited from the common ancestor. Copyright © 2015 Elsevier Ltd. All rights reserved.

  10. CNV-RF Is a Random Forest-Based Copy Number Variation Detection Method Using Next-Generation Sequencing.

    PubMed

    Onsongo, Getiria; Baughn, Linda B; Bower, Matthew; Henzler, Christine; Schomaker, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat

    2016-11-01

    Simultaneous detection of small copy number variations (CNVs) (<0.5 kb) and single-nucleotide variants in clinically significant genes is of great interest for clinical laboratories. The analytical variability in next-generation sequencing (NGS) and artifacts in coverage data because of issues with mappability along with lack of robust bioinformatics tools for CNV detection have limited the utility of targeted NGS data to identify CNVs. We describe the development and implementation of a bioinformatics algorithm, copy number variation-random forest (CNV-RF), that incorporates a machine learning component to identify CNVs from targeted NGS data. Using CNV-RF, we identified 12 of 13 deletions in samples with known CNVs, two cases with duplications, and identified novel deletions in 22 additional cases. Furthermore, no CNVs were identified among 60 genes in 14 cases with normal copy number and no CNVs were identified in another 104 patients with clinical suspicion of CNVs. All positive deletions and duplications were confirmed using a quantitative PCR method. CNV-RF also detected heterozygous deletions and duplications with a specificity of 50% across 4813 genes. The ability of CNV-RF to detect clinically relevant CNVs with a high degree of sensitivity along with confirmation using a low-cost quantitative PCR method provides a framework for providing comprehensive NGS-based CNV/single-nucleotide variant detection in a clinical molecular diagnostics laboratory. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  11. THE EFFECTS ON LEARNING FROM A MOTION PICTURE FILM OF SELECTIVE CHANGES IN SOUND TRACK LOUDNESS LEVEL. FINAL REPORT.

    ERIC Educational Resources Information Center

    MOAKLEY, FRANCIS X.

    EFFECTS OF PERIODIC VARIATIONS IN AN INSTRUCTIONAL FILM'S NORMAL LOUDNESS LEVEL FOR RELEVANT AND IRRELEVANT FILM SEQUENCES WERE MEASURED BY A MULTIPLE CHOICE TEST. RIGOROUS PILOT STUDIES, RANDOM GROUPING OF SEVENTH GRADERS FOR TREATMENTS, AND RATINGS OF RELEVANT AND IRRELEVANT PORTIONS OF THE FILM BY AN UNSPECIFIED NUMBER OF JUDGES PRECEDED THE…

  12. Relatively Random: Context Effects on Perceived Randomness and Predicted Outcomes

    ERIC Educational Resources Information Center

    Matthews, William J.

    2013-01-01

    This article concerns the effect of context on people's judgments about sequences of chance outcomes. In Experiment 1, participants judged whether sequences were produced by random, mechanical processes (such as a roulette wheel) or skilled human action (such as basketball shots). Sequences with lower alternation rates were judged more likely to…

  13. Partial bisulfite conversion for unique template sequencing

    PubMed Central

    Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael

    2018-01-01

    We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135,000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. PMID:29161423

  14. Stability of Tandem Repeats in the Drosophila Melanogaster HSR-Omega Nuclear RNA

    PubMed Central

    Hogan, N. C.; Slot, F.; Traverse, K. L.; Garbe, J. C.; Bendena, W. G.; Pardue, M. L.

    1995-01-01

    The Drosophila melanogaster Hsr-omega locus produces a nuclear RNA containing >5 kb of tandem repeat sequences. These repeats are unique to Hsr-omega and show concerted evolution similar to that seen with classical satellite DNAs. In D. melanogaster the monomer is ~280 bp. Sequences of 19 1/2 monomers differ by 8 +/- 5% (mean +/- SD), when all pairwise comparisons are considered. Differences are single nucleotide substitutions and 1-3 nucleotide deletions/insertions. Changes appear to be randomly distributed over the repeat unit. Outer repeats do not show the decrease in monomer homogeneity that might be expected if homogeneity is maintained by recombination. However, just outside the last complete repeat at each end, there are a few fragments of sequence similar to the monomer. The sequences in these flanking regions are not those predicted for sequences decaying in the absence of recombination. Instead, the fragmentation of the sequence homology suggests that flanking regions have undergone more severe disruptions, possibly during an insertion or amplification event. Hsr-omega alleles differing in the number of repeats are detected and appear to be stable over a few thousand generations; however, both increases and decreases in repeat numbers have been observed. The new alleles appear to be as stable as their predecessors. No alleles of less than ~5 kb nor more than ~16 kb of repeats were seen in any stocks examined. The evidence that there is a limit on the minimum number of repeats is consistent with the suggestion that these repeats are important in the function of the unusual Hsr-omega nuclear RNA. PMID:7540581

  15. Viral metagenomic analysis of feces of wild small carnivores

    PubMed Central

    2014-01-01

    Background: Recent studies have clearly demonstrated the enormous virus diversity that exists among wild animals. This exemplifies the required expansion of our knowledge of the virus diversity present in wildlife, as well as the potential transmission of these viruses to domestic animals or humans. Methods: In the present study we evaluated the viral diversity of fecal samples (n = 42) collected from 10 different species of wild small carnivores inhabiting the northern part of Spain using random PCR in combination with next-generation sequencing. Samples were collected from American mink (Neovison vison), European mink (Mustela lutreola), European polecat (Mustela putorius), European pine marten (Martes martes), stone marten (Martes foina), Eurasian otter (Lutra lutra) and Eurasian badger (Meles meles) of the family of Mustelidae; common genet (Genetta genetta) of the family of Viverridae; red fox (Vulpes vulpes) of the family of Canidae and European wild cat (Felis silvestris) of the family of Felidae. Results: A number of sequences of possible novel viruses or virus variants were detected, including a theilovirus, phleboviruses, an amdovirus, a kobuvirus and picobirnaviruses. Conclusions: Using random PCR in combination with next generation sequencing, sequences of various novel viruses or virus variants were detected in fecal samples collected from Spanish carnivores. Detected novel viruses highlight the viral diversity that is present in fecal material of wild carnivores. PMID:24886057

  16. Detection by real-time PCR and pyrosequencing of the cry1Ab and cry1Ac genes introduced in genetically modified (GM) constructs.

    PubMed

    Debode, Frederic; Janssen, Eric; Bragard, Claude; Berben, Gilbert

    2017-08-01

    The presence of genetically modified organisms (GMOs) in food and feed is mainly detected by the use of targets focusing on promoters and terminators. As some genes are frequently used in genetically modified (GM) construction, they also constitute excellent screening elements and their use is increasing. In this paper we propose a new target for the detection of cry1Ab and cry1Ac genes by real-time polymerase chain reaction (PCR) and pyrosequencing. The specificity, sensitivity and robustness of the real-time PCR method were tested following the recommendations of international guidelines and the method met the expected performance criteria. This paper also shows how the robustness testing was assessed. This new cry1Ab/Ac method can provide a positive signal with a larger number of GM events than do the other existing methods using double dye-probes. The method permits the analysis of results with less ambiguity than the SYBRGreen method recommended by the European Reference Laboratory (EURL) GM Food and Feed (GMFF). A pyrosequencing method was also developed to gain additional information thanks to the sequence of the amplicon. This method of sequencing-by-synthesis can determine the sequence between the primers used for PCR. Pyrosequencing showed that the sequences internal to the primers present differences following the GM events considered and three different sequences were observed. The sensitivity of the pyrosequencing was tested on reference flours with a low percentage GM content and different copy numbers. Improvements in the pyrosequencing protocol provided correct sequences with 50 copies of the target. Below this copy number, the quality of the sequence was more random.

  17. The Effect of Interference on Temporal Order Memory for Random and Fixed Sequences in Nondemented Older Adults

    ERIC Educational Resources Information Center

    Tolentino, Jerlyn C.; Pirogovsky, Eva; Luu, Trinh; Toner, Chelsea K.; Gilbert, Paul E.

    2012-01-01

    Two experiments tested the effect of temporal interference on order memory for fixed and random sequences in young adults and nondemented older adults. The results demonstrate that temporal order memory for fixed and random sequences is impaired in nondemented older adults, particularly when temporal interference is high. However, temporal order…

  18. Local contextual processing of abstract and meaningful real-life images in professional athletes.

    PubMed

    Fogelson, Noa; Fernandez-Del-Olmo, Miguel; Acero, Rafael Martín

    2012-05-01

    We investigated the effect of abstract versus real-life meaningful images from sports on local contextual processing in two groups of professional athletes. Local context was defined as the occurrence of a short predictive series of stimuli occurring before delivery of a target event. EEG was recorded in 10 professional basketball players and 9 professional athletes of individual sports during three sessions. In each session, a different set of visual stimuli were presented: triangles facing left, up, right, or down; four images of a basketball player throwing a ball; four images of a baseball player pitching a baseball. Stimuli consisted of 15 % targets and 85 % of equal numbers of three types of standards. Recording blocks consisted of targets preceded by randomized sequences of standards and by sequences including a predictive sequence signaling the occurrence of a subsequent target event. Subjects pressed a button in response to targets. In all three sessions, reaction times and peak P3b latencies were shorter for predicted targets compared with random targets, the last most informative stimulus of the predictive sequence induced a robust P3b, and N2 amplitude was larger for random targets compared with predicted targets. P3b and N2 peak amplitudes were larger in the professional basketball group in comparison with professional athletes of individual sports, across the three sessions. The findings of this study suggest that local contextual information is processed similarly for abstract and for meaningful images and that professional basketball players seem to allocate more attentional resources in the processing of these visual stimuli.

  19. The DNA of ciliated protozoa.

    PubMed Central

    Prescott, D M

    1994-01-01

    Ciliates contain two types of nuclei: a micronucleus and a macronucleus. The micronucleus serves as the germ line nucleus but does not express its genes. The macronucleus provides the nuclear RNA for vegetative growth. Mating cells exchange haploid micronuclei, and a new macronucleus develops from a new diploid micronucleus. The old macronucleus is destroyed. This conversion consists of amplification, elimination, fragmentation, and splicing of DNA sequences on a massive scale. Fragmentation produces subchromosomal molecules in Tetrahymena and Paramecium cells and much smaller, gene-sized molecules in hypotrichous ciliates to which telomere sequences are added. These molecules are then amplified, some to higher copy numbers than others. rDNA is differentially amplified to thousands of copies per macronucleus. Eliminated sequences include transposonlike elements and sequences called internal eliminated sequences that interrupt gene coding regions in the micronuclear genome. Some, perhaps all, of these are excised as circular molecules and destroyed. In at least some hypotrichs, segments of some micronuclear genes are scrambled in a nonfunctional order and are reordered during macronuclear development. Vegetatively growing ciliates appear to possess a mechanism for adjusting copy numbers of individual genes, which corrects gene imbalances resulting from random distribution of DNA molecules during amitosis of the macronucleus. Other distinctive features of ciliate DNA include an altered use of the conventional stop codons. PMID:8078435

  20. MIP Models and Hybrid Algorithms for Simultaneous Job Splitting and Scheduling on Unrelated Parallel Machines

    PubMed Central

    Ozmutlu, H. Cenk

    2014-01-01

    We developed mixed integer programming (MIP) models and hybrid genetic-local search algorithms for the scheduling problem of unrelated parallel machines with job sequence- and machine-dependent setup times and with the job splitting property. The first contribution of this paper is to introduce novel algorithms that perform splitting and scheduling simultaneously with a variable number of subjobs. We propose a simple chromosome structure constituted by random key numbers in a hybrid genetic-local search algorithm (GAspLA). Random key numbers are used frequently in genetic algorithms, but they create additional difficulty when hybrid factors in local search are implemented. We developed algorithms that adapt the results of local search into the genetic algorithm with a minimum number of relocation operations on the genes' random key numbers. This is the second contribution of the paper. The third contribution is three new MIP models that perform splitting and scheduling simultaneously. The fourth contribution is the implementation of GAspLAMIP, which lets us verify the optimality of GAspLA for the studied combinations. The proposed methods are tested on a set of problems taken from the literature, and the results validate the effectiveness of the proposed algorithms. PMID:24977204
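
    The abstract does not spell out the chromosome layout. As an illustration only, the Python sketch below decodes a random-key chromosome under one common convention for parallel machines, assumed here: the integer part of each non-negative key selects the machine and the fractional part orders the jobs on that machine. Job splitting, setup times, and the paper's local search are omitted, and `decode_random_keys` is a hypothetical name.

```python
from collections import defaultdict


def decode_random_keys(keys):
    """Map {job: key} to {machine: [jobs in processing order]}.

    Assumed convention: int(key) is the machine index, and the fractional
    part of the key is the job's priority on that machine (smaller first).
    """
    per_machine = defaultdict(list)
    for job, key in keys.items():
        machine = int(key)
        per_machine[machine].append((key - machine, job))
    return {
        machine: [job for _, job in sorted(pairs)]
        for machine, pairs in sorted(per_machine.items())
    }


# Example chromosome with four jobs spread over two machines.
example = decode_random_keys({"J1": 0.71, "J2": 1.20, "J3": 0.15, "J4": 1.05})
```

    Because any vector of real keys decodes to a feasible assignment and order, crossover and mutation can operate on the keys directly without repair steps, which is the usual motivation for this encoding.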

  1. Sequence Complexity of Chromosome 3 in Caenorhabditis elegans

    PubMed Central

    Pierro, Gaetano

    2012-01-01

    The complexity of the nucleotide sequences in chromosome 3 of Caenorhabditis elegans (C. elegans) is studied and compared with that of some random sequences. Moreover, by using some parameters related to complexity, such as fractal dimension and frequency, and the indicator matrix, a first classification of the sequences of C. elegans is given. In particular, the sequences with highest and lowest fractal value are singled out. It is shown that the intrinsic nature of the low fractal dimension sequences has many common features with the random sequences. PMID:22919380

  2. The Effect of the Number and Nature of Features and of General Ability on the Simultaneous and Successive Processing of Maps.

    ERIC Educational Resources Information Center

    Sutherland, Sandra; Winn, William

    The interactions of three factors that may be involved with the memory for pattern or sequence in visual materials were investigated in this study: (1) arbitrariness of representation; (2) task; and (3) ability of students. The subjects, who were 29 graduate students in education, were pretested for general ability and randomly assigned to four…

  3. The development of GPU-based parallel PRNG for Monte Carlo applications in CUDA Fortran

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kargaran, Hamed, E-mail: h-kargaran@sbu.ac.ir; Minuchehr, Abdolhamid; Zolfaghari, Ahmad

    Implementing Monte Carlo simulation in CUDA Fortran requires fast random number generation with good statistical properties on the GPU. In this study, a GPU-based parallel pseudo random number generator (GPPRNG) is proposed for use in high performance computing systems. According to the type of GPU memory usage, the GPU scheme is divided into two work modes, GLOBAL-MODE and SHARED-MODE. To generate parallel random numbers based on the independent sequence method, a combination of the middle-square method and a chaotic map, along with the Xorshift PRNG, has been employed. Implementation of our developed PPRNG on a single GPU showed a speedup of 150x and 470x (with respect to the speed of PRNG on a single CPU core) for GLOBAL-MODE and SHARED-MODE, respectively. To evaluate the accuracy of our developed GPPRNG, its performance was compared to that of some other commercially available PPRNGs such as MATLAB, FORTRAN and Miller-Park algorithm through employing the specific standard tests. The results of this comparison showed that the developed GPPRNG in this study can be used as a fast and accurate tool for computational science applications.
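
    For orientation, the Xorshift component named in this abstract can be sketched in a few lines of Python. This is the classic 32-bit xorshift step with shifts 13, 17 and 5, not the authors' CUDA Fortran code. The middle-square and chaotic-map seeding is not reproduced, so the per-thread seeds below are arbitrary placeholders, and `independent_streams` is a hypothetical helper that only mimics the "one independent sequence per thread" layout.

```python
MASK32 = 0xFFFFFFFF


def xorshift32(state):
    """Advance a nonzero 32-bit xorshift state by one step (shifts 13, 17, 5)."""
    state ^= (state << 13) & MASK32
    state ^= state >> 17
    state ^= (state << 5) & MASK32
    return state & MASK32


def independent_streams(seeds, n):
    """Generate n values per seed, one sequence per (simulated) thread."""
    streams = []
    for seed in seeds:
        if seed == 0 or seed != seed & MASK32:
            raise ValueError("seed must be a nonzero 32-bit integer")
        values, state = [], seed
        for _ in range(n):
            state = xorshift32(state)
            values.append(state / 2**32)  # scale to [0, 1)
        streams.append(values)
    return streams


# Placeholder seeds; a real scheme must choose them so streams do not overlap.
streams = independent_streams([1, 2463534242, 88172645], n=4)
```

    A zero state is rejected because xorshift maps zero to zero forever; how to pick seeds so that parallel streams are statistically independent is the hard part that the paper's seeding scheme addresses and this sketch does not.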

  4. Meta-structure correlation in protein space unveils different selection rules for folded and intrinsically disordered proteins.

    PubMed

    Naranjo, Yandi; Pons, Miquel; Konrat, Robert

    2012-01-01

    The number of existing protein sequences spans a very small fraction of sequence space. Natural proteins have overcome a strong negative selective pressure to avoid the formation of insoluble aggregates. Stably folded globular proteins and intrinsically disordered proteins (IDPs) use alternative solutions to the aggregation problem. While in globular proteins folding minimizes the access to aggregation prone regions, IDPs on average display large exposed contact areas. Here, we introduce the concept of average meta-structure correlation maps to analyze sequence space. Using this novel conceptual view we show that representative ensembles of folded and ID proteins show distinct characteristics and respond differently to sequence randomization. By studying the way evolutionary constraints act on IDPs to disable a negative function (aggregation) we might gain insight into the mechanisms by which function-enabling information is encoded in IDPs.

  5. Design, production and molecular structure of a new family of artificial alpha-helicoidal repeat proteins (αRep) based on thermostable HEAT-like repeats.

    PubMed

    Urvoas, Agathe; Guellouz, Asma; Valerio-Lepiniec, Marie; Graille, Marc; Durand, Dominique; Desravines, Danielle C; van Tilbeurgh, Herman; Desmadril, Michel; Minard, Philippe

    2010-11-26

    Repeat proteins have a modular organization and a regular architecture that make them attractive models for design and directed evolution experiments. HEAT repeat proteins, although very common, have not been used as a scaffold for artificial proteins, probably because they are made of long and irregular repeats. Here, we present and validate a consensus sequence for artificial HEAT repeat proteins. The sequence was defined from the structure-based sequence analysis of a thermostable HEAT-like repeat protein. Appropriate sequences were identified for the N- and C-caps. A library of genes coding for artificial proteins based on this sequence design, named αRep, was assembled using new and versatile methodology based on circular amplification. Proteins picked randomly from this library are expressed as soluble proteins. The biophysical properties of proteins with different numbers of repeats and different combinations of side chains in hypervariable positions were characterized. Circular dichroism and differential scanning calorimetry experiments showed that all these proteins are folded cooperatively and are very stable (T(m) >70 °C). Stability of these proteins increases with the number of repeats. Detailed gel filtration and small-angle X-ray scattering studies showed that the purified proteins form either monomers or dimers. The X-ray structure of a stable dimeric variant structure was solved. The protein is folded with a highly regular topology and the repeat structure is organized, as expected, as pairs of alpha helices. In this protein variant, the dimerization interface results directly from the variable surface enriched in aromatic residues located in the randomized positions of the repeats. The dimer was crystallized both in an apo and in a PEG-bound form, revealing a very well defined binding crevice and some structure flexibility at the interface. 
    This fortuitous binding site could later prove to be a useful binding site for other low molecular mass partners. Copyright © 2010 Elsevier Ltd. All rights reserved.

  6. Gift from statistical learning: Visual statistical learning enhances memory for sequence elements and impairs memory for items that disrupt regularities.

    PubMed

    Otsuka, Sachio; Saiki, Jun

    2016-02-01

    Prior studies have shown that visual statistical learning (VSL) enhances familiarity (a type of memory) of sequences. How do statistical regularities influence the processing of each triplet element and inserted distractors that disrupt the regularity? Given the increased attention to triplets induced by VSL and the inhibition of unattended triplets, we predicted that VSL would promote memory for each triplet constituent, and degrade memory for inserted stimuli. Across the first two experiments, we found that objects from structured sequences were more likely to be remembered than objects from random sequences, and that letters (Experiment 1) or objects (Experiment 2) inserted into structured sequences were less likely to be remembered than those inserted into random sequences. In the subsequent two experiments, we examined an alternative account for our results, whereby the difference in memory for inserted items between structured and random conditions is due to individuation of items within random sequences. Our findings replicated even when control letters (Experiment 3A) or objects (Experiment 3B) were presented before or after, rather than inserted into, random sequences. Our findings suggest that statistical learning enhances memory for each item in a regular set and impairs memory for items that disrupt the regularity. Copyright © 2015 Elsevier B.V. All rights reserved.

  7. [Influence of "prehistory" of sequential movements of the right and the left hand on reproduction: coding of positions, movements and sequence structure].

    PubMed

    Bobrova, E V; Liakhovetskiĭ, V A; Borshchevskaia, E R

    2011-01-01

    The dependence of errors during reproduction of a sequence of hand movements without visual feedback on the previous right- and left-hand performance ("prehistory") and on positions in space of sequence elements (random or ordered by the explicit rule) was analyzed. It was shown that the preceding information about the ordered positions of the sequence elements was used during right-hand movements, whereas left-hand movements were performed with involvement of the information about the random sequence. The data testify to a central mechanism of the analysis of spatial structure of sequence elements. This mechanism activates movement coding specific for the left hemisphere (vector coding) in case of an ordered sequence structure and positional coding specific for the right hemisphere in case of a random sequence structure.

  8. Computational Analysis of Mouse piRNA Sequence and Biogenesis

    PubMed Central

    Betel, Doron; Sheridan, Robert; Marks, Debora S; Sander, Chris

    2007-01-01

    The recent discovery of a new class of 30-nucleotide long RNAs in mammalian testes, called PIWI-interacting RNA (piRNA), with similarities to microRNAs and repeat-associated small interfering RNAs (rasiRNAs), has raised puzzling questions regarding their biogenesis and function. We report a comparative analysis of currently available piRNA sequence data from the pachytene stage of mouse spermatogenesis that sheds light on their sequence diversity and mechanism of biogenesis. We conclude that (i) there are at least four times as many piRNAs in mouse testes as currently known; (ii) piRNAs, which originate from long precursor transcripts, are generated by quasi-random enzymatic processing that is guided by a weak sequence signature at the piRNA 5′ ends, resulting in a large number of distinct sequences; and (iii) many of the piRNA clusters contain inverted repeat segments capable of forming double-strand RNA fold-back segments that may initiate piRNA processing analogous to transposon silencing. PMID:17997596

  9. Comparison of Methods of Detection of Exceptional Sequences in Prokaryotic Genomes.

    PubMed

    Rusinov, I S; Ershova, A S; Karyagina, A S; Spirin, S A; Alexeevski, A V

    2018-02-01

    Many proteins need recognition of specific DNA sequences for functioning. The number of recognition sites and their distribution along the DNA might be of biological importance. For example, the number of restriction sites is often reduced in prokaryotic and phage genomes to decrease the probability of DNA cleavage by restriction endonucleases. We call a sequence an exceptional one if its frequency in a genome significantly differs from one predicted by some mathematical model. An exceptional sequence could be either under- or over-represented, depending on its frequency in comparison with the predicted one. Exceptional sequences could be considered biologically meaningful, for example, as targets of DNA-binding proteins or as parts of abundant repetitive elements. Several methods are used to predict the frequency of a short sequence in a genome from the actual frequencies of certain of its subsequences. The most popular are methods based on Markov chain models. However, no rigorous comparison of the methods has previously been performed. We compared three methods for the prediction of short sequence frequencies: the maximum-order Markov chain model-based method, the method that uses geometric mean of extended Markovian estimates, and the method that utilizes frequencies of all subsequences including discontiguous ones. We applied them to restriction sites in complete genomes of 2500 prokaryotic species and demonstrated that the results depend greatly on the method used: lists of 5% of the most under-represented sites differed by up to 50%. The method designed by Burge and coauthors in 1992, which utilizes all subsequences of the sequence, showed a higher precision than the other two methods both on prokaryotic genomes and randomly generated sequences after computational imitation of selective pressure. We propose this method as the first choice for detection of exceptional sequences in prokaryotic genomes.
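    As an illustration of the first of the three methods named in this abstract, the maximum-order Markov estimate for a word w of length k is commonly written as N(w[1..k-1]) * N(w[2..k]) / N(w[2..k-1]), where N counts overlapping occurrences. The sketch below uses that textbook form; it is not taken from the paper, and the function names and toy sequences are assumptions made for illustration only.

```python
def count_overlapping(seq, word):
    """Count occurrences of `word` in `seq`, allowing overlaps."""
    k = len(word)
    return sum(1 for i in range(len(seq) - k + 1) if seq[i:i + k] == word)


def markov_expected_count(seq, word):
    """Maximum-order Markov estimate of the expected count of `word`.

    Uses the textbook form N(prefix) * N(suffix) / N(core), where
    prefix = word[:-1], suffix = word[1:], core = word[1:-1].
    """
    if len(word) < 3:
        raise ValueError("word must have at least 3 letters")
    prefix = count_overlapping(seq, word[:-1])
    suffix = count_overlapping(seq, word[1:])
    core = count_overlapping(seq, word[1:-1])
    if core == 0:
        return 0.0  # the core never occurs, so the word cannot occur either
    return prefix * suffix / core


def observed_to_expected(seq, word):
    """Ratio below 1 suggests under-representation, above 1 over-representation."""
    expected = markov_expected_count(seq, word)
    observed = count_overlapping(seq, word)
    return observed / expected if expected > 0 else float("nan")
```

    A ratio well below 1 for a restriction site such as GAATTC would flag it as a candidate under-represented (exceptional) sequence; a real analysis would also need a significance test, which this sketch omits.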

  10. Impact of recombination on polymorphism of genes encoding Kunitz-type protease inhibitors in the genus Solanum.

    PubMed

    Speranskaya, Anna S; Krinitsina, Anastasia A; Kudryavtseva, Anna V; Poltronieri, Palmiro; Santino, Angelo; Oparina, Nina Y; Dmitriev, Alexey A; Belenikin, Maxim S; Guseva, Marina A; Shevelev, Alexei B

    2012-08-01

    The group of Kunitz-type protease inhibitors (KPI) from potato is encoded by a polymorphic family of multiple allelic and non-allelic genes. The previous explanations of the KPI variability were based on the hypothesis of random mutagenesis as a key factor of KPI polymorphism. KPI-A genes from the genomes of Solanum tuberosum cv. Istrinskii and the wild species Solanum palustre were amplified by PCR with subsequent cloning in plasmids. True KPI sequences were derived from comparison of the cloned copies. "Hot spots" of recombination in KPI genes were independently identified by DnaSP 4.0 and TOPALi v2.5 software. The KPI-A sequence from potato cv. Istrinskii was found to be 100% identical to the gene from Solanum nigrum. This fact illustrates a high degree of similarity of KPI genes in the genus Solanum. Pairwise comparison of KPI A and B genes unambiguously showed a non-uniform extent of polymorphism at different nt positions. Moreover, the occurrence of substitutions was not random along the strand. Taken together, these facts contradict the traditional hypothesis of random mutagenesis as a principal source of KPI gene polymorphism. The experimentally found mosaic structure of KPI genes in both plants studied is consistent with the hypothesis suggesting recombination of ancestral genes. The same mechanism was proposed earlier for other resistance-conferring genes in the nightshade family (Solanaceae). Based on the data obtained, we searched for potential motifs of site-specific binding with plant DNA recombinases. During this work, we analyzed the sequencing data reported by the Potato Genome Sequencing Consortium (PGSC), 2011, and found considerable inconsistency in their data concerning the number, location, and orientation of KPI genes of groups A and B. The key role of recombination rather than random point mutagenesis in KPI polymorphism was demonstrated for the first time. Copyright © 2012 Elsevier Masson SAS. All rights reserved.

  11. Random whole metagenomic sequencing for forensic discrimination of soils.

    PubMed

    Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

    2014-01-01

    Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.

  12. Pulse homodyne field disturbance sensor

    DOEpatents

    McEwan, Thomas E.

    1997-01-01

    A field disturbance sensor operates with relatively low power, provides an adjustable operating range, is not hypersensitive at close range, allows co-location of multiple sensors, and is inexpensive to manufacture. The sensor includes a transmitter that transmits a sequence of transmitted bursts of electromagnetic energy. The transmitter frequency is modulated at an intermediate frequency. The sequence of bursts has a burst repetition rate, and each burst has a burst width and comprises a number of cycles at a transmitter frequency. The sensor includes a receiver which receives electromagnetic energy at the transmitter frequency, and includes a mixer which mixes a transmitted burst with reflections of the same transmitted burst to produce an intermediate frequency signal. Circuitry, responsive to the intermediate frequency signal indicates disturbances in the sensor field. Because the mixer mixes the transmitted burst with reflections of the transmitted burst, the burst width defines the sensor range. The burst repetition rate is randomly or pseudo-randomly modulated so that bursts in the sequence of bursts have a phase which varies. A second range-defining mode transmits two radio frequency bursts, where the time spacing between the bursts defines the maximum range divided by two.

  13. Pulse homodyne field disturbance sensor

    DOEpatents

    McEwan, T.E.

    1997-10-28

    A field disturbance sensor operates with relatively low power, provides an adjustable operating range, is not hypersensitive at close range, allows co-location of multiple sensors, and is inexpensive to manufacture. The sensor includes a transmitter that transmits a sequence of transmitted bursts of electromagnetic energy. The transmitter frequency is modulated at an intermediate frequency. The sequence of bursts has a burst repetition rate, and each burst has a burst width and comprises a number of cycles at a transmitter frequency. The sensor includes a receiver which receives electromagnetic energy at the transmitter frequency, and includes a mixer which mixes a transmitted burst with reflections of the same transmitted burst to produce an intermediate frequency signal. Circuitry, responsive to the intermediate frequency signal indicates disturbances in the sensor field. Because the mixer mixes the transmitted burst with reflections of the transmitted burst, the burst width defines the sensor range. The burst repetition rate is randomly or pseudo-randomly modulated so that bursts in the sequence of bursts have a phase which varies. A second range-defining mode transmits two radio frequency bursts, where the time spacing between the bursts defines the maximum range divided by two. 12 figs.

  14. Theory and implementation of a very high throughput true random number generator in field programmable gate array

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wang, Yonggang, E-mail: wangyg@ustc.edu.cn; Hui, Cong; Liu, Chong

    The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.

  15. Theory and implementation of a very high throughput true random number generator in field programmable gate array.

    PubMed

    Wang, Yonggang; Hui, Cong; Liu, Chong; Xu, Chao

    2016-04-01

    The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.

  16. Survey and Analysis of Microsatellites in the Silkworm, Bombyx mori

    PubMed Central

    Prasad, M. Dharma; Muthulakshmi, M.; Madhu, M.; Archak, Sunil; Mita, K.; Nagaraju, J.

    2005-01-01

    We studied microsatellite frequency and distribution in 21.76-Mb random genomic sequences, 0.67-Mb BAC sequences from the Z chromosome, and 6.3-Mb EST sequences of Bombyx mori. We mined microsatellites of ≥15 bases of mononucleotide repeats and ≥5 repeat units of other classes of repeats. We estimated that microsatellites account for 0.31% of the genome of B. mori. Microsatellite tracts of A, AT, and ATT were the most abundant whereas their number drastically decreased as the length of the repeat motif increased. In general, tri- and hexanucleotide repeats were overrepresented in the transcribed sequences except TAA, GTA, and TGA, which were in excess in genomic sequences. The Z chromosome sequences contained shorter repeat types than the rest of the chromosomes in addition to a higher abundance of AT-rich repeats. Our results showed that base composition of the flanking sequence has an influence on the origin and evolution of microsatellites. Transitions/transversions were high in microsatellites of ESTs, whereas the genomic sequence had an equal number of substitutions and indels. The average heterozygosity value for 23 polymorphic microsatellite loci surveyed in 13 diverse silkmoth strains having 2–14 alleles was 0.54. Only 36 (18.2%) of 198 microsatellite loci were polymorphic between the two divergent silkworm populations and 10 (5%) loci revealed null alleles. The microsatellite map generated using these polymorphic markers resulted in 8 linkage groups. B. mori microsatellite loci were the most conserved in its immediate ancestor, B. mandarina, followed by the wild saturniid silkmoth, Antheraea assama. PMID:15371363
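    The mining criteria stated in this abstract (mononucleotide runs of at least 15 bases, and at least 5 repeat units for other repeat classes) can be sketched with regular expressions. This is a minimal illustration, not the authors' pipeline: the motif length range of 2 to 6 bases, the function name, and the toy sequence are assumptions, and overlapping or imperfect repeats are not handled.

```python
import re

# Mononucleotide runs of >= 15 bases (one base followed by 14 or more copies).
MONO = re.compile(r"([ACGT])\1{14,}")
# Motifs of 2-6 bases (assumed range) repeated >= 5 times in total;
# the lazy quantifier prefers the shortest motif, so ATAT is reported as AT.
MULTI = re.compile(r"([ACGT]{2,6}?)\1{4,}")


def find_microsatellites(seq):
    """Return sorted (start, motif, repeat_units) tuples for perfect repeats."""
    seq = seq.upper()
    hits = []
    for m in MONO.finditer(seq):
        hits.append((m.start(), m.group(1), len(m.group(0))))
    for m in MULTI.finditer(seq):
        motif = m.group(1)
        if len(set(motif)) == 1:
            continue  # a homopolymer run; handled by the MONO rule instead
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return sorted(hits)
```

    For example, a sequence containing 15 consecutive A bases and five AT units would yield one hit for each, while four CAG units would fall below the threshold and be ignored.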

  17. "Sequence space soup" of proteins and copolymers

    NASA Astrophysics Data System (ADS)

    Chan, Hue Sun; Dill, Ken A.

    1991-09-01

    To study the protein folding problem, we use exhaustive computer enumeration to explore "sequence space soup," an imaginary solution containing the "native" conformations (i.e., of lowest free energy) under folding conditions, of every possible copolymer sequence. The model is of short self-avoiding chains of hydrophobic (H) and polar (P) monomers configured on the two-dimensional square lattice. By exhaustive enumeration, we identify all native structures for every possible sequence. We find that random sequences of H/P copolymers will bear striking resemblance to known proteins: Most sequences under folding conditions will be approximately as compact as known proteins, will have considerable amounts of secondary structure, and it is most probable that an arbitrary sequence will fold to a number of lowest free energy conformations that is of order one. In these respects, this simple model shows that proteinlike behavior should arise simply in copolymers in which one monomer type is highly solvent averse. It suggests that the structures and uniquenesses of native proteins are not consequences of having 20 different monomer types, or of unique properties of amino acid monomers with regard to special packing or interactions, and thus that simple copolymers might be designable to collapse to proteinlike structures and properties. A good strategy for designing a sequence to have a minimum possible number of native states is to strategically insert many P monomers. Thus known proteins may be marginally stable due to a balance: More H residues stabilize the desired native state, but more P residues prevent simultaneous stabilization of undesired native states.

  18. Theta oscillations promote temporal sequence learning.

    PubMed

    Crivelli-Decker, Jordan; Hsieh, Liang-Tien; Clarke, Alex; Ranganath, Charan

    2018-05-17

    Many theoretical models suggest that neural oscillations play a role in learning or retrieval of temporal sequences, but the extent to which oscillations support sequence representation remains unclear. To address this question, we used scalp electroencephalography (EEG) to examine oscillatory activity over learning of different object sequences. Participants made semantic decisions on each object as they were presented in a continuous stream. For three "Consistent" sequences, the order of the objects was always fixed. Activity during Consistent sequences was compared to "Random" sequences that consisted of the same objects presented in a different order on each repetition. Over the course of learning, participants made faster semantic decisions to objects in Consistent, as compared to objects in Random sequences. Thus, participants were able to use sequence knowledge to predict upcoming items in Consistent sequences. EEG analyses revealed decreased oscillatory power in the theta (4-7 Hz) band at frontal sites following decisions about objects in Consistent sequences, as compared with objects in Random sequences. The theta power difference between Consistent and Random only emerged in the second half of the task, as participants were more effectively able to predict items in Consistent sequences. Moreover, we found increases in parieto-occipital alpha (10-13 Hz) and beta (14-28 Hz) power during the pre-response period for objects in Consistent sequences, relative to objects in Random sequences. Linear mixed effects modeling revealed that single trial theta oscillations were related to reaction time for future objects in a sequence, whereas beta and alpha oscillations were only predictive of reaction time on the current trial. These results indicate that theta and alpha/beta activity preferentially relate to future and current events, respectively. 
    More generally, our findings highlight the importance of band-specific neural oscillations in the learning of temporal order information. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.

  19. Image Encryption Algorithm Based on Hyperchaotic Maps and Nucleotide Sequences Database

    PubMed Central

    2017-01-01

    Image encryption technology is one of the main means to ensure the safety of image information. Using the characteristics of chaos, such as randomness, regularity, ergodicity, and initial value sensitiveness, combined with the unique space conformation of DNA molecules and their unique information storage and processing ability, an efficient method for image encryption based on the chaos theory and a DNA sequence database is proposed. In this paper, digital image encryption employs a process of transforming the image pixel gray value by using chaotic sequence scrambling image pixel location and establishing superchaotic mapping, which maps quaternary sequences and DNA sequences, and by combining with the logic of the transformation between DNA sequences. The bases are replaced under the displaced rules by using DNA coding in a certain number of iterations that are based on the enhanced quaternary hyperchaotic sequence; the sequence is generated by Chen chaos. The cipher feedback mode and chaos iteration are employed in the encryption process to enhance the confusion and diffusion properties of the algorithm. Theoretical analysis and experimental results show that the proposed scheme not only demonstrates excellent encryption but also effectively resists chosen-plaintext attack, statistical attack, and differential attack. PMID:28392799

  20. A New Strategy to Enhance Cavitational Tissue Erosion Using a High-Intensity, Initiating Sequence

    PubMed Central

    Xu, Zhen; Fowlkes, J. Brian; Cain, Charles A.

    2009-01-01

    Our previous studies have shown that pulsed ultrasound can physically remove soft tissue through cavitation. A new strategy to enhance the cavitation-induced erosion is proposed wherein tissue erosion is initiated by a short, high-intensity sequence of pulses and sustained by lower intensity pulses. We investigated effects of the initiating sequence on erosion and cavitation sustained by lower intensity pulses. Multiple three-cycle pulses at a pulse repetition frequency of 20 kHz delivered by a 788-kHz focused transducer were used for tissue erosion. Fixing the initiating sequence at an I_SPPA of 9000 W/cm², 16 combinations of different numbers of pulses within the initiating sequence and different sustaining pulse intensities were tested. Results showed: the initiating sequence increases the probability of erosion occurrence and the erosion rate with only slight overall increases in propagated energy; the initiating sequence containing more pulses does not increase the sustained cavitation period; and if extinguished and reinitiated, the sustained cavitation period becomes shorter after each initiation, although the waiting time between adjacent cavitation periods is random. The high-intensity, initiating sequence enhances cavitational tissue erosion and enables erosion at intensities significantly lower than what is required to initiate erosion. PMID:16921893

  1. Phylogenetic mixtures and linear invariants for equal input models.

    PubMed

    Casanellas, Marta; Steel, Mike

    2017-04-01

    The reconstruction of phylogenetic trees from molecular sequence data relies on modelling site substitutions by a Markov process, or a mixture of such processes. In general, allowing mixed processes can result in different tree topologies becoming indistinguishable from the data, even for infinitely long sequences. However, when the underlying Markov process supports linear phylogenetic invariants, then provided these are sufficiently informative, the identifiability of the tree topology can be restored. In this paper, we investigate a class of processes that support linear invariants once the stationary distribution is fixed, the 'equal input model'. This model generalizes the 'Felsenstein 1981' model (and thereby the Jukes-Cantor model) from four states to an arbitrary number of states (finite or infinite), and it can also be described by a 'random cluster' process. We describe the structure and dimension of the vector spaces of phylogenetic mixtures and of linear invariants for any fixed phylogenetic tree (and for all trees, the so-called 'model invariants'), on any number n of leaves. We also provide a precise description of the space of mixtures and linear invariants for the special case of [Formula: see text] leaves. By combining techniques from discrete random processes and (multi-) linear algebra, our results build on a classic result that was first established by James Lake (Mol Biol Evol 4:167-191, 1987).

  2. Forecasting drought risks for a water supply storage system using bootstrap position analysis

    USGS Publications Warehouse

    Tasker, Gary; Dunne, Paul

    1997-01-01

    Forecasting the likelihood of drought conditions is an integral part of managing a water supply storage and delivery system. Position analysis uses a large number of possible flow sequences as inputs to a simulation of a water supply storage and delivery system. For a given set of operating rules and water use requirements, water managers can use such a model to forecast the likelihood of specified outcomes such as reservoir levels falling below a specified level or streamflows falling below statutory passing flows a few months ahead conditioned on the current reservoir levels and streamflows. The large number of possible flow sequences are generated using a stochastic streamflow model with a random resampling of innovations. The advantages of this resampling scheme, called bootstrap position analysis, are that it does not rely on the unverifiable assumption of normality and it allows incorporation of long-range weather forecasts into the analysis.

  3. Dynamical decoupling of local transverse random telegraph noise in a two-qubit gate

    NASA Astrophysics Data System (ADS)

    D'Arrigo, A.; Falci, G.; Paladino, E.

    2015-10-01

    Achieving high-fidelity universal two-qubit gates is a central requisite of any implementation of quantum information processing. The presence of spurious fluctuators of various physical origin represents a limiting factor for superconducting nanodevices. Operating qubits at optimal points, where the qubit-fluctuator interaction is transverse with respect to the single qubit Hamiltonian, considerably improved single qubit gates. Further enhancement has been achieved by dynamical decoupling (DD). In this article we investigate DD of transverse random telegraph noise acting locally on each of the qubits forming an entangling gate. Our analysis is based on the exact numerical solution of the stochastic Schrödinger equation. We evaluate the gate error under local periodic, Carr-Purcell and Uhrig DD sequences. We find that a threshold value of the number, n, of pulses exists above which the gate error decreases with a sequence-specific power-law dependence on n. Below threshold, DD may even increase the error with respect to the unconditioned evolution, a behaviour reminiscent of the anti-Zeno effect.

  4. A Numerical Study of New Logistic Map

    NASA Astrophysics Data System (ADS)

    Khmou, Youssef

    In this paper, we propose a new logistic map based on the relation of the information entropy and study its bifurcation diagram in comparison with that of the standard logistic map. In the first part, we compare the obtained diagram, by numerical simulations, with that of the standard logistic map. It is found that the structures of both diagrams are similar where the range of the growth parameter is restricted to the interval [0,e]. In the second part, we present an application of the proposed map in traffic flow using a macroscopic model. It is found that the bifurcation diagram is an exact model of Greenberg’s model of traffic flow, where the growth parameter corresponds to the optimal velocity and the random sequence corresponds to the density. In the last part, we present a second possible application of the proposed map, which consists of random number generation. The results of the analysis show that the excluded initial values of the sequences are (0,1).
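    For orientation, the standard logistic map that this abstract uses as its baseline is x_{n+1} = r * x_n * (1 - x_n). The sketch below iterates that standard map and thresholds the orbit into bits. It does not implement the paper's proposed entropy-based map, whose formula is not given in the abstract; the function names, burn-in length, and threshold are illustrative assumptions, and the output is not suitable for cryptographic use.

```python
def logistic_orbit(x0, r=4.0, n=1000, burn_in=100):
    """Iterate the standard logistic map x -> r * x * (1 - x).

    The first `burn_in` iterates are discarded so the returned orbit
    is less dependent on the transient from the initial value.
    """
    if not 0.0 < x0 < 1.0:
        raise ValueError("x0 must lie strictly between 0 and 1")
    x = x0
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    orbit = []
    for _ in range(n):
        x = r * x * (1.0 - x)
        orbit.append(x)
    return orbit


def orbit_to_bits(orbit, threshold=0.5):
    """Map each orbit value to a bit: 1 if above the threshold, else 0."""
    return [1 if x > threshold else 0 for x in orbit]
```

    Calling orbit_to_bits(logistic_orbit(0.123)) gives a deterministic bit sequence for that initial value; sweeping r and plotting the retained iterates against r would reproduce the familiar bifurcation diagram of the standard map.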

  5. Can a school-based hand hygiene program reduce asthma exacerbations among elementary school children?

    PubMed Central

    Gerald, Joe K.; Zhang, Bin; McClure, Leslie A.; Bailey, William C.; Harrington, Kathy F.

    2012-01-01

    Background: Viral upper respiratory infections have been implicated as a major cause of asthma exacerbations among school age children. Regular hand washing is the most effective method to prevent the spread of viral respiratory infections, but effective hand washing practices are difficult to establish in schools. Objectives: This randomized controlled trial evaluated whether a standardized regimen of hand washing plus alcohol-based hand sanitizer could reduce asthma exacerbations more than schools’ usual hand hygiene practices. Methods: This was a two-year, community-based, randomized controlled crossover trial. Schools were randomized to usual care then intervention (Sequence 1) or intervention then usual care (Sequence 2). Intervention schools were provided with alcohol-based hand sanitizer, hand soap, and hand hygiene education. The primary outcome was the proportion of students experiencing an asthma exacerbation each month. Generalized estimating equations were used to model the difference in the marginal rate of exacerbations between sequences while controlling for individual demographic factors and the correlation within each student and between students within each school. Results: 527 students with asthma were enrolled among 31 schools. The hand hygiene intervention did not reduce the number of asthma exacerbations as compared to the schools’ usual hand hygiene practices (p=0.132). There was a strong temporal trend as both sequences experienced fewer exacerbations during Year 2 as compared to Year 1 (p<0.001). Conclusions: While the intervention was not found to be effective, the results were confounded by the H1N1 influenza pandemic that resulted in substantially increased hand hygiene behaviors and resources in usual care schools. Therefore, these results should be viewed cautiously. PMID:23069487

  6. Beyond Reasonable Doubt: Evolution from DNA Sequences

    PubMed Central

    Penny, David

    2013-01-01

    We demonstrate quantitatively that, as predicted by evolutionary theory, sequences of homologous proteins from different species converge as we go further and further back in time. The converse, a non-evolutionary model, can be expressed as probabilities, and the test works for chloroplast, nuclear and mitochondrial sequences, as well as for sequences that diverged at different time depths. Even on our conservative test, the probability that chance could produce the observed levels of ancestral convergence for just one of the eight datasets of 51 proteins is ≈1×10^−19 and combined over 8 datasets is ≈1×10^−132. By comparison, there are about 10^80 protons in the universe, hence the probability that the sequences could have been produced by a process involving unrelated ancestral sequences is about 10^50 times lower than picking, among all protons, the same proton at random twice in a row. A non-evolutionary control model shows no convergence, and only a small number of parameters are required to account for the observations. It is time that researchers insisted that doubters put up testable alternatives to evolution. PMID:23950906

  7. Partial bisulfite conversion for unique template sequencing.

    PubMed

    Kumar, Vijay; Rosenbaum, Julie; Wang, Zihua; Forcier, Talitha; Ronemus, Michael; Wigler, Michael; Levy, Dan

    2018-01-25

    We introduce a new protocol, mutational sequencing or muSeq, which uses sodium bisulfite to randomly deaminate unmethylated cytosines at a fixed and tunable rate. The muSeq protocol marks each initial template molecule with a unique mutation signature that is present in every copy of the template, and in every fragmented copy of a copy. In the sequenced read data, this signature is observed as a unique pattern of C-to-T or G-to-A nucleotide conversions. Clustering reads with the same conversion pattern enables accurate count and long-range assembly of initial template molecules from short-read sequence data. We explore count and low-error sequencing by profiling 135 000 restriction fragments in a PstI representation, demonstrating that muSeq improves copy number inference and significantly reduces sporadic sequencer error. We explore long-range assembly in the context of cDNA, generating contiguous transcript clusters greater than 3,000 bp in length. The muSeq assemblies reveal transcriptional diversity not observable from short-read data alone. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  8. Treatment of fatigue with methylphenidate, modafinil and amantadine in multiple sclerosis (TRIUMPHANT-MS): Study design for a pragmatic, randomized, double-blind, crossover clinical trial.

    PubMed

    Nourbakhsh, Bardia; Revirajan, Nisha; Waubant, Emmanuelle

    2018-01-01

    Fatigue is the most common symptom of multiple sclerosis (MS). Amantadine, modafinil and amphetamine-like stimulants are commonly used in clinical practice for treatment of fatigue; however, the evidence supporting their effectiveness is sparse and conflicting. To describe the design of a trial study funded by Patient-Centered Outcome Research Institute (PCORI) that will compare the efficacy of commonly used fatigue medications in patients with MS. The study is a randomized, placebo-controlled, crossover, four-sequence, four-period, double-blind, multicenter trial of three commonly used medications for the treatment of MS-related fatigue (amantadine, modafinil, methylphenidate) versus placebo in fatigued subjects with MS. Adult patients with MS, with an Expanded Disability Status Scale of <7.0 are eligible to participate. Participants will be randomized to one of four predefined sequences of medication administration. Each sequence comprises four 6-week periods of treatment with one of the 3 study drugs or placebo, and three 2-week washout periods between medication periods. 136 participants will be randomized over two years in two academic centers in the United States starting in the Summer 2017. Complete enrollment is expected by early 2019. The primary outcome of the study is the modified fatigue impact scale (MFIS) score while participants receive the maximally tolerated dose of each study medication (or placebo). Safety and tolerability of the medications and heterogeneity of treatment effect will also be assessed. Results of the proposed study will provide evidence-based and personalized treatment options for patients affected by MS-related fatigue. Clinicaltrials.gov registration number: NCT03185065. Copyright © 2017 Elsevier Inc. All rights reserved.

  9. A Robust and Versatile Method of Combinatorial Chemical Synthesis of Gene Libraries via Hierarchical Assembly of Partially Randomized Modules

    PubMed Central

    Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried

    2015-01-01

    A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotide mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the tHisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvent the notoriously troublesome long-term archiving and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessment of the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acids synthetic chemistry, biochemistry and molecular genetics into a coherent, flexible and robust method of combinatorial gene synthesis. PMID:26355961

  10. A Robust and Versatile Method of Combinatorial Chemical Synthesis of Gene Libraries via Hierarchical Assembly of Partially Randomized Modules.

    PubMed

    Popova, Blagovesta; Schubert, Steffen; Bulla, Ingo; Buchwald, Daniela; Kramer, Wilfried

    2015-01-01

    A major challenge in gene library generation is to guarantee a large functional size and diversity that significantly increases the chances of selecting different functional protein variants. The use of trinucleotide mixtures for controlled randomization results in superior library diversity and offers the ability to specify the type and distribution of the amino acids at each position. Here we describe the generation of a high diversity gene library using tHisF of the hyperthermophile Thermotoga maritima as a scaffold. Combining various rational criteria with contingency, we targeted 26 selected codons of the tHisF gene sequence for randomization at a controlled level. We have developed a novel method of creating full-length gene libraries by combinatorial assembly of smaller sub-libraries. Full-length libraries of high diversity can easily be assembled on demand from smaller and much less diverse sub-libraries, which circumvent the notoriously troublesome long-term archiving and repeated proliferation of high diversity ensembles of phages or plasmids. We developed a generally applicable software tool for sequence analysis of mutated gene sequences that provides efficient assistance for analysis of library diversity. Finally, practical utility of the library was demonstrated in principle by assessment of the conformational stability of library members and isolating protein variants with HisF activity from it. Our approach integrates a number of features of nucleic acids synthetic chemistry, biochemistry and molecular genetics into a coherent, flexible and robust method of combinatorial gene synthesis.

  11. Assessing randomness and complexity in human motion trajectories through analysis of symbolic sequences

    PubMed Central

    Peng, Zhen; Genewein, Tim; Braun, Daniel A.

    2014-01-01

    Complexity is a hallmark of intelligent behavior consisting both of regular patterns and random variation. To quantitatively assess the complexity and randomness of human motion, we designed a motor task in which we translated subjects' motion trajectories into strings of symbol sequences. In the first part of the experiment participants were asked to perform self-paced movements to create repetitive patterns, copy pre-specified letter sequences, and generate random movements. To investigate whether the degree of randomness can be manipulated, in the second part of the experiment participants were asked to perform unpredictable movements in the context of a pursuit game, where they received feedback from an online Bayesian predictor guessing their next move. We analyzed symbol sequences representing subjects' motion trajectories with five common complexity measures: predictability, compressibility, approximate entropy, Lempel-Ziv complexity, as well as effective measure complexity. We found that subjects' self-created patterns were the most complex, followed by drawing movements of letters and self-paced random motion. We also found that participants could change the randomness of their behavior depending on context and feedback. Our results suggest that humans can adjust both complexity and regularity in different movement types and contexts and that this can be assessed with information-theoretic measures of the symbolic sequences generated from movement trajectories. PMID:24744716
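
    As an illustration of one of the measures named in this abstract, the following minimal Python sketch counts the phrases in an LZ78-style parsing of a symbol string, a common proxy for Lempel-Ziv complexity. It is an assumption for illustration only: the abstract does not say which Lempel-Ziv variant the authors used, and the function name lz78_phrase_count is hypothetical.

```python
def lz78_phrase_count(symbols):
    """Count phrases in an LZ78-style incremental parsing of `symbols`.

    The string is split left to right into the shortest phrases that have
    not been seen before. Regular strings reuse long phrases and so yield
    few phrases; irregular strings yield many.
    """
    phrases = set()
    current = ""
    count = 0
    for symbol in symbols:
        current += symbol
        if current not in phrases:
            # New phrase found: record it and start a fresh one.
            phrases.add(current)
            count += 1
            current = ""
    if current:
        # A trailing, already-seen phrase still counts as one phrase.
        count += 1
    return count


# Hypothetical symbol strings standing in for encoded motion trajectories.
print(lz78_phrase_count("aaaaaaaa"))
print(lz78_phrase_count("abababab"))
print(lz78_phrase_count("abcbdacd"))
```

    Raw phrase counts grow with string length, so in practice they are usually normalized before sequences of different lengths are compared.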

  12. Investigation of the contextual interference effect in the manipulation of the motor parameter of over-all force.

    PubMed

    Goodwin, J E; Meeuwsen, H J

    1996-12-01

    This investigation examined the contextual interference effect when manipulating over-all force in a golf-putting task. Undergraduate women (N = 30) were randomly assigned to a Random, Blocked-Random, or Blocked practice condition and practiced golf putting from distances of 2.43 m, 3.95 m, and 5.47 m during acquisition. Subjects in the Random condition practiced trials in a quasirandom sequence and those in the Blocked-Random condition practiced trials initially in a blocked sequence with the remainder of the trials practiced in a quasirandom sequence. In the Blocked condition subjects practiced trials in a blocked sequence. A 24-hr. transfer test consisted of 30 trials with 10 trials each from 1.67 m, 3.19 m, and 6.23 m. Transfer scores supported the Magill and Hall (1990) hypothesis that, when task variations involve learning parameters of a generalized motor program, the benefit of random practice over blocked practice would not be found.

  13. Using Maximum Entropy to Find Patterns in Genomes

    NASA Astrophysics Data System (ADS)

    Liu, Sophia; Hockenberry, Adam; Lancichinetti, Andrea; Jewett, Michael; Amaral, Luis

    The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. To accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. This approach can also easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs, which will lead to a better understanding of biological processes. National Institute of General Medical Science, Northwestern University Presidential Fellowship, National Science Foundation, David and Lucile Packard Foundation, Camille Dreyfus Teacher Scholar Award.

  14. Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers.

    PubMed

    Girardot, Charles; Scholtalbers, Jelle; Sauer, Sajoscha; Su, Shu-Yi; Furlong, Eileen E M

    2016-10-08

    The yield obtained from next generation sequencers has increased almost exponentially in recent years, making sample multiplexing common practice. While barcodes (known sequences of fixed length) primarily encode the sample identity of sequenced DNA fragments, barcodes made of random sequences (Unique Molecular Identifiers, or UMIs) are often used to distinguish between PCR duplicates and transcript abundance in, for example, single-cell RNA sequencing (scRNA-seq). In paired-end sequencing, different barcodes can be inserted at each fragment end to either increase the number of multiplexed samples in the library or to use one of the barcodes as UMI. Alternatively, UMIs can be combined with the sample barcodes into composite barcodes, or with standard Illumina® indexing. Subsequent analysis must take read duplicates and sample identity into account, by identifying UMIs. Existing tools do not support these complex barcoding configurations and custom code development is frequently required. Here, we present Je, a suite of tools that accommodates complex barcoding strategies, extracts UMIs and filters read duplicates taking UMIs into account. Using Je on publicly available scRNA-seq and iCLIP data containing UMIs, the number of unique reads increased by up to 36%, compared to when UMIs are ignored. Je is implemented in JAVA and uses the Picard API. Code, executables and documentation are freely available at http://gbcs.embl.de/Je. Je can also be easily installed in Galaxy through the Galaxy toolshed.
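
    The effect described above, more reads surviving duplicate filtering once UMIs are taken into account, can be illustrated with a minimal Python sketch. This is not Je's implementation or API: the Read record, both function names and the exact-match rule are assumptions made for illustration, and real tools typically also tolerate sequencing errors within the UMI.

```python
from collections import namedtuple

# Hypothetical minimal read record; real alignments carry many more fields.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi"])


def dedup_ignoring_umis(reads):
    """Keep one read per mapping position (chrom, pos, strand)."""
    seen, kept = set(), []
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


def dedup_with_umis(reads):
    """Keep one read per (mapping position, UMI) pair, using exact UMI matches."""
    seen, kept = set(), []
    for read in reads:
        key = (read.chrom, read.pos, read.strand, read.umi)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Toy data: four reads at one position, originating from three molecules.
reads = [
    Read("r1", "chr1", 100, "+", "AAC"),
    Read("r2", "chr1", 100, "+", "AAC"),  # PCR duplicate of r1
    Read("r3", "chr1", 100, "+", "GTT"),  # distinct molecule, same position
    Read("r4", "chr1", 100, "+", "CCA"),  # distinct molecule, same position
]
print(len(dedup_ignoring_umis(reads)), len(dedup_with_umis(reads)))
```

    In this toy case position-only filtering collapses all four reads into one, while UMI-aware filtering keeps the three distinct molecules.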

  15. Undesirable Choice Biases with Small Differences in the Spatial Structure of Chance Stimulus Sequences.

    PubMed

    Herrera, David; Treviño, Mario

    2015-01-01

    In two-alternative discrimination tasks, experimenters usually randomize the location of the rewarded stimulus so that systematic behavior with respect to irrelevant stimuli can only produce chance performance on the learning curves. One way to achieve this is to use random numbers derived from a discrete binomial distribution to create a 'full random training schedule' (FRS). When using FRS, however, sporadic but long laterally-biased training sequences occur by chance and such 'input biases' are thought to promote the generation of laterally-biased choices (i.e., 'output biases'). As an alternative, a 'Gellerman-like training schedule' (GLS) can be used. It removes most input biases by prohibiting the reward from appearing on the same location for more than three consecutive trials. The sequence of past rewards obtained from choosing a particular discriminative stimulus influences the probability of choosing that same stimulus on subsequent trials. Assuming that the long-term average ratio of choices matches the long-term average ratio of reinforcers, we hypothesized that a reduced amount of input biases in GLS compared to FRS should lead to a reduced production of output biases. We compared the choice patterns produced by a 'Rational Decision Maker' (RDM) in response to computer-generated FRS and GLS training sequences. To create a virtual RDM, we implemented an algorithm that generated choices based on past rewards. Our simulations revealed that, although the GLS presented fewer input biases than the FRS, the virtual RDM produced more output biases with GLS than with FRS under a variety of test conditions. Our results reveal that the statistical and temporal properties of training sequences interacted with the RDM to influence the production of output biases. Thus, discrete changes in the training paradigms did not translate linearly into modifications in the pattern of choices generated by an RDM. Virtual RDMs could be further employed to guide the selection of proper training schedules for perceptual decision-making studies.
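
    The two schedule types contrasted in this abstract can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code: the function names are hypothetical, the FRS is modelled as independent fair coin flips, and the GLS-like schedule enforces only the one constraint the abstract mentions (no more than three consecutive rewards on the same side).

```python
import random


def full_random_schedule(n_trials, rng):
    """FRS: the rewarded side of each trial is an independent fair coin flip."""
    return [rng.choice("LR") for _ in range(n_trials)]


def gellerman_like_schedule(n_trials, rng, max_run=3):
    """GLS-like: random sides, but never more than `max_run` repeats in a row."""
    schedule = []
    for _ in range(n_trials):
        side = rng.choice("LR")
        # If the previous `max_run` trials all used this side, force a switch.
        if len(schedule) >= max_run and all(s == side for s in schedule[-max_run:]):
            side = "R" if side == "L" else "L"
        schedule.append(side)
    return schedule


def longest_run(schedule):
    """Length of the longest run of identical consecutive entries."""
    best = current = 0
    previous = None
    for side in schedule:
        current = current + 1 if side == previous else 1
        best = max(best, current)
        previous = side
    return best


rng = random.Random(0)  # fixed seed so the sketch is repeatable
frs = full_random_schedule(1000, rng)
gls = gellerman_like_schedule(1000, rng)
print("longest same-side run, FRS:", longest_run(frs))
print("longest same-side run, GLS:", longest_run(gls))
```

    The longest same-side run is one simple way to quantify the 'input bias' that the GLS constraint is meant to limit; by construction it cannot exceed three for the GLS-like schedule.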

  16. The effect of interference on temporal order memory for random and fixed sequences in nondemented older adults.

    PubMed

    Tolentino, Jerlyn C; Pirogovsky, Eva; Luu, Trinh; Toner, Chelsea K; Gilbert, Paul E

    2012-05-21

    Two experiments tested the effect of temporal interference on order memory for fixed and random sequences in young adults and nondemented older adults. The results demonstrate that temporal order memory for fixed and random sequences is impaired in nondemented older adults, particularly when temporal interference is high. However, temporal order memory for fixed sequences is comparable between older adults and young adults when temporal interference is minimized. The results suggest that temporal order memory is less efficient and more susceptible to interference in older adults, possibly due to impaired temporal pattern separation.

  17. At least some errors are randomly generated (Freud was wrong)

    NASA Technical Reports Server (NTRS)

    Sellen, A. J.; Senders, J. W.

    1986-01-01

    An experiment was carried out to expose something about human error generating mechanisms. In the context of the experiment, an error was made when a subject pressed the wrong key on a computer keyboard or pressed no key at all in the time allotted. These might be considered, respectively, errors of substitution and errors of omission. Each of seven subjects saw a sequence of three digital numbers, made an easily learned binary judgement about each, and was to press the appropriate one of two keys. Each session consisted of 1,000 presentations of randomly permuted, fixed numbers broken into 10 blocks of 100. One of two keys should have been pressed within one second of the onset of each stimulus. These data were subjected to statistical analyses in order to probe the nature of the error generating mechanisms. Goodness of fit tests for a Poisson distribution for the number of errors per 50 trial interval and for an exponential distribution of the length of the intervals between errors were carried out. There is evidence for an endogenous mechanism that may best be described as a random error generator. Furthermore, an item analysis of the number of errors produced per stimulus suggests the existence of a second mechanism operating on task driven factors producing exogenous errors. Some errors, at least, are the result of constant probability generating mechanisms with error rate idiosyncratically determined for each subject.

  18. A simple method for semi-random DNA amplicon fragmentation using the methylation-dependent restriction enzyme MspJI.

    PubMed

    Shinozuka, Hiroshi; Cogan, Noel O I; Shinozuka, Maiko; Marshall, Alexis; Kay, Pippa; Lin, Yi-Han; Spangenberg, German C; Forster, John W

    2015-04-11

    Fragmentation at random nucleotide locations is an essential process for preparation of DNA libraries to be used on massively parallel short-read DNA sequencing platforms. Although instruments for physical shearing, such as the Covaris S2 focused-ultrasonicator system, and products for enzymatic shearing, such as the Nextera technology and NEBNext dsDNA Fragmentase kit, are commercially available, a simple and inexpensive method is desirable for high-throughput sequencing library preparation. MspJI is a recently characterised restriction enzyme which recognises the sequence motif CNNR (where R = G or A) when the first base is modified to 5-methylcytosine or 5-hydroxymethylcytosine. A semi-random enzymatic DNA amplicon fragmentation method was developed based on the unique cleavage properties of MspJI. In this method, random incorporation of 5-methyl-2'-deoxycytidine-5'-triphosphate is achieved through DNA amplification with DNA polymerase, followed by DNA digestion with MspJI. Due to the recognition sequence of the enzyme, DNA amplicons are fragmented in a relatively sequence-independent manner. The size range of the resulting fragments could be controlled through optimisation of 5-methyl-2'-deoxycytidine-5'-triphosphate concentration in the reaction mixture. A library suitable for sequencing using the Illumina MiSeq platform was prepared and processed using the proposed method. Alignment of generated short reads to a reference sequence demonstrated a relatively high level of random fragmentation. The proposed method may be performed with standard laboratory equipment. Although the uniformity of coverage was slightly inferior to the Covaris physical shearing procedure, due to efficiencies of cost and labour, the method may be more suitable than existing approaches for implementation in large-scale sequencing activities, such as bacterial artificial chromosome (BAC)-based genome sequence assembly, pan-genomic studies and locus-targeted genotyping-by-sequencing.

  19. GTRAC: fast retrieval from compressed collections of genomic variants

    PubMed Central

    Tatwawadi, Kedar; Hernaez, Mikel; Ochoa, Idoia; Weissman, Tsachy

    2016-01-01

    Motivation: The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether. Results: We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures. Availability and Implementation: The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC Contact: kedart@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27587665

  20. GTRAC: fast retrieval from compressed collections of genomic variants.

    PubMed

    Tatwawadi, Kedar; Hernaez, Mikel; Ochoa, Idoia; Weissman, Tsachy

    2016-09-01

    The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether. We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures. The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC Contact: kedart@stanford.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  1. Multidimensional density shaping by sigmoids.

    PubMed

    Roth, Z; Baram, Y

    1996-01-01

    An estimate of the probability density function of a random vector is obtained by maximizing the output entropy of a feedforward network of sigmoidal units with respect to the input weights. Classification problems can be solved by selecting the class associated with the maximal estimated density. Newton's optimization method, applied to the estimated density, yields a recursive estimator for a random variable or a random sequence. A constrained connectivity structure yields a linear estimator, which is particularly suitable for "real time" prediction. A Gaussian nonlinearity yields a closed-form solution for the network's parameters, which may also be used for initializing the optimization algorithm when other nonlinearities are employed. A triangular connectivity between the neurons and the input, which is naturally suggested by the statistical setting, reduces the number of parameters. Applications to classification and forecasting problems are demonstrated.

  2. Single-Molecule Electrical Random Resequencing of DNA and RNA

    NASA Astrophysics Data System (ADS)

    Ohshiro, Takahito; Matsubara, Kazuki; Tsutsui, Makusu; Furuhashi, Masayuki; Taniguchi, Masateru; Kawai, Tomoji

    2012-07-01

    Two paradigm shifts in DNA sequencing technologies--from bulk to single molecules and from optical to electrical detection--are expected to realize label-free, low-cost DNA sequencing that does not require PCR amplification. It will lead to development of high-throughput third-generation sequencing technologies for personalized medicine. Although nanopore devices have been proposed as third-generation DNA-sequencing devices, a significant milestone in these technologies has been attained by demonstrating a novel technique for resequencing DNA using electrical signals. Here we report single-molecule electrical resequencing of DNA and RNA using a hybrid method of identifying single-base molecules via tunneling currents and random sequencing. Our method reads sequences of nine types of DNA oligomers. The complete sequence of 5'-UGAGGUA-3' from the let-7 microRNA family was also identified by creating a composite of overlapping fragment sequences, which was randomly determined using tunneling current conducted by single-base molecules as they passed between a pair of nanoelectrodes.

  3. mtDNA sequence diversity of Hazara ethnic group from Pakistan.

    PubMed

    Rakha, Allah; Fatima; Peng, Min-Sheng; Adan, Atif; Bi, Rui; Yasmin, Memona; Yao, Yong-Gang

    2017-09-01

    The present study was undertaken to investigate mitochondrial DNA (mtDNA) control region sequences of Hazaras from Pakistan, so as to generate mtDNA reference database for forensic casework in Pakistan and to analyze phylogenetic relationship of this particular ethnic group with geographically proximal populations. Complete mtDNA control region (nt 16024-576) sequences were generated through Sanger Sequencing for 319 Hazara individuals from Quetta, Baluchistan. The population sample set showed a total of 189 distinct haplotypes, belonging mainly to West Eurasian (51.72%), East & Southeast Asian (29.78%) and South Asian (18.50%) haplogroups. Compared with other populations from Pakistan, the Hazara population had a relatively high haplotype diversity (0.9945) and a lower random match probability (0.0085). The dataset has been incorporated into EMPOP database under accession number EMP00680. The data herein comprises the largest, and likely most thoroughly examined, control region mtDNA dataset from Hazaras of Pakistan. Copyright © 2017 Elsevier B.V. All rights reserved.
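
    The two summary statistics quoted in this abstract are conventionally computed from haplotype counts. The Python sketch below assumes the standard definitions (random match probability as the sum of squared haplotype frequencies, and haplotype diversity as Nei's unbiased estimator); the abstract does not state which formulas the authors used, and the function names and toy data are hypothetical.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return sum((count / n) ** 2 for count in counts.values())


def haplotype_diversity(haplotypes):
    """Nei's unbiased estimator: n / (n - 1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two individuals")
    return n / (n - 1) * (1.0 - random_match_probability(haplotypes))


# Toy sample of four individuals carrying three distinct haplotypes.
sample = ["H1", "H1", "H2", "H3"]
print(random_match_probability(sample))
print(haplotype_diversity(sample))
```

    Applying the same formula to the figures in the abstract gives 319/318 × (1 − 0.0085) ≈ 0.9946, which is close to the reported haplotype diversity of 0.9945; the small gap is plausibly due to rounding of the reported match probability.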

  4. Spatial serial order processing in schizophrenia.

    PubMed

    Fraser, David; Park, Sohee; Clark, Gina; Yohanna, Daniel; Houk, James C

    2004-10-01

    The aim of this study was to examine serial order processing deficits in 21 schizophrenia patients and 16 age- and education-matched healthy controls. In a spatial serial order working memory task, one to four spatial targets were presented in a randomized sequence. Subjects were required to remember the locations and the order in which the targets were presented. Patients showed a marked deficit in ability to remember the sequences compared with controls. Increasing the number of targets within a sequence resulted in poorer memory performance for both control and schizophrenia subjects, but the effect was much more pronounced in the patients. Targets presented at the end of a long sequence were more vulnerable to memory error in schizophrenia patients. Performance deficits were not attributable to motor errors, but to errors in target choice. The results support the idea that the memory errors seen in schizophrenia patients may be due to saturating the working memory network at relatively low levels of memory load.

  5. Cognitive functioning in opioid-dependent patients treated with buprenorphine, methadone, and other psychoactive medications: stability and correlates

    PubMed Central

    2011-01-01

    Background: In many but not in all neuropsychological studies buprenorphine-treated opioid-dependent patients have shown fewer cognitive deficits than patients treated with methadone. In order to examine whether the hypothesized cognitive advantage of buprenorphine in relation to methadone is seen in clinical patients we did a neuropsychological follow-up study in an unselected sample of buprenorphine- vs. methadone-treated patients. Methods: In part I of the study fourteen buprenorphine-treated and 12 methadone-treated patients were tested by cognitive tests within two months (T1), 6-9 months (T2), and 12-17 months (T3) from the start of opioid substitution treatment. Fourteen healthy controls were examined at similar intervals. Benzodiazepine and other psychoactive comedications were common among the patients. Test results were analyzed with repeated measures analysis of variance and planned contrasts. In part II of the study the patient sample was extended to include 36 patients at T2 and T3. Correlations between cognitive functioning and medication, substance abuse, or demographic variables were then analyzed. Results: In part I methadone patients were inferior to healthy controls in all tests measuring attention, working memory, or verbal memory. Buprenorphine patients were inferior to healthy controls in the first working memory task, the Paced Auditory Serial Addition Task and verbal memory. In the second working memory task, the Letter-Number Sequencing, their performance improved between T2 and T3. In part II only group membership (buprenorphine vs. methadone) correlated significantly with attention performance and improvement in the Letter-Number Sequencing. High frequency of substance abuse in the past month was associated with poor performance in the Letter-Number Sequencing. Conclusions: The results underline the differences between non-randomized and randomized studies comparing cognitive performance in opioid substitution treated patients (fewer deficits in buprenorphine patients vs. no difference between buprenorphine and methadone patients, respectively). Possible reasons for this are discussed. PMID:21854644

  6. Discovery and analysis of an active long terminal repeat-retrotransposable element in Aspergillus oryzae.

    PubMed

    Jie Jin, Feng; Hara, Seiichi; Sato, Atsushi; Koyama, Yasuji

    2014-01-01

    Wild-type Aspergillus oryzae RIB40 contains two copies of the AO090005001597 gene. We previously constructed an A. oryzae RIB40 strain, RKuAF8B, with multiple chromosomal deletions, in which the AO090005001597 copy number was found to be increased significantly. Sequence analysis indicated that AO090005001597 is part of a putative 6,000-bp retrotransposable element, flanked by two long terminal repeats (LTRs) of 669 bp, with characteristics of retroviruses and retrotransposons, and thus designated AoLTR (A. oryzae LTR-retrotransposable element). AoLTR comprised putative reverse transcriptase, RNase H, and integrase domains. The deduced amino acid sequence alignment of AoLTR showed 94% overall identity with AFLAV, an A. flavus Tf1/sushi retrotransposon. Quantitative real-time RT-PCR showed that AoLTR gene expression was significantly increased in RKuAF8B, in accordance with the increased copy number. Inverse PCR indicated that the full-length retrotransposable element was randomly integrated into multiple genomic locations. However, no obvious phenotypic changes were associated with the increased AoLTR gene copy number.

  7. Generation of Aptamers from A Primer-Free Randomized ssDNA Library Using Magnetic-Assisted Rapid Aptamer Selection

    NASA Astrophysics Data System (ADS)

    Tsao, Shih-Ming; Lai, Ji-Ching; Horng, Horng-Er; Liu, Tu-Chen; Hong, Chin-Yih

    2017-04-01

    Aptamers are oligonucleotides that can bind to specific target molecules. Most aptamers are generated using random libraries in the standard systematic evolution of ligands by exponential enrichment (SELEX). Each random library contains oligonucleotides with a randomized central region and two fixed primer regions at both ends. The fixed primer regions are necessary for amplifying target-bound sequences by PCR. However, these extra-sequences may cause non-specific bindings, which potentially interfere with good binding for random sequences. The Magnetic-Assisted Rapid Aptamer Selection (MARAS) is a newly developed protocol for generating single-strand DNA aptamers. No repeat selection cycle is required in the protocol. This study proposes and demonstrates a method to isolate aptamers for C-reactive proteins (CRP) from a randomized ssDNA library containing no fixed sequences at 5′ and 3′ termini using the MARAS platform. Furthermore, the isolated primer-free aptamer was sequenced and binding affinity for CRP was analyzed. The specificity of the obtained aptamer was validated using blind serum samples. The result was consistent with monoclonal antibody-based nephelometry analysis, which indicated that a primer-free aptamer has high specificity toward targets. MARAS is a feasible platform for efficiently generating primer-free aptamers for clinical diagnoses.

  8. [Methodological quality and reporting quality evaluation of randomized controlled trials published in China Journal of Chinese Materia Medica].

    PubMed

    Yu, Dan-Dan; Xie, Yan-Ming; Liao, Xing; Zhi, Ying-Jie; Jiang, Jun-Jie; Chen, Wei

    2018-02-01

    To evaluate the methodological quality and reporting quality of randomized controlled trials (RCTs) published in China Journal of Chinese Materia Medica, we searched CNKI and the China Journal of Chinese Materia Medica webpage to collect RCTs published since the establishment of the journal. The Cochrane risk of bias assessment tool was used to evaluate the methodological quality of RCTs. The CONSORT 2010 checklist was adopted as the reporting quality evaluation tool. Finally, 184 RCTs were included and evaluated methodologically, of which 97 RCTs were evaluated for reporting quality. For the methodological evaluation, 62 trials (33.70%) reported the random sequence generation; 9 trials (4.89%) reported the allocation concealment; 25 trials (13.59%) adopted the method of blinding; 30 trials (16.30%) reported the number of patients withdrawing, dropping out and those lost to follow-up; 2 trials (1.09%) reported trial registration and none of the trials reported the trial protocol; only 8 trials (4.35%) reported the sample size estimation in detail. For the reporting quality appraisal, 3 of the 25 reporting items were rated high quality, including abstract, participant eligibility criteria, and statistical methods; 4 reporting items were rated medium quality, including purpose, intervention, random sequence method, and data collection of sites and locations; 9 reporting items were rated low quality, including title, backgrounds, random sequence types, allocation concealment, blindness, recruitment of subjects, baseline data, harms, and funding; the rest of the items were of extremely low quality (compliance rate per reporting item <10%). On the whole, the methodological and reporting quality of RCTs published in the journal is generally low. Further improvement in both methodological and reporting quality for RCTs of traditional Chinese medicine is warranted. It is recommended that the international standards and procedures for RCT design should be strictly followed to conduct high-quality trials. At the same time, in order to improve the reporting quality of randomized controlled trials, CONSORT standards should be adopted in the preparation of research reports and submissions. Copyright© by the Chinese Pharmaceutical Association.

  9. A double-blind, randomized, placebo-controlled trial studying the effects of Saccharomyces boulardii on the gastrointestinal tolerability, safety, and pharmacokinetics of miglustat.

    PubMed

    Remenova, Tatiana; Morand, Olivier; Amato, Dominick; Chadha-Boreham, Harbajan; Tsurutani, Scott; Marquardt, Thorsten

    2015-06-19

    Gastrointestinal (GI) disturbances such as diarrhea and flatulence are the most frequent adverse effects associated with miglustat therapy in type 1 Gaucher disease (GD1) and Niemann-Pick disease type C (NP-C), and the most common recorded reason for stopping treatment during clinical trials and in clinical practice settings. Miglustat-related GI disturbances are thought to arise from the inhibition of intestinal disaccharidases, mainly sucrase isomaltase. We report the effects of a co-administered dietary probiotic, S. boulardii, on the GI tolerability of miglustat in healthy adult subjects. In a double-blind, placebo-controlled, two-period, two-treatment cross-over trial, healthy adult male and female subjects were randomly allocated to treatment sequences, A-B and B-A (treatment A - miglustat 100 mg t.i.d. + placebo; treatment B - miglustat 100 mg t.i.d. + S. boulardii [500 mg, b.i.d.]). GI tolerability data were collected in patient diaries. The primary endpoint was the total number of 'diarrhea days' (≥3 loose stools within a 24-h period meeting Bristol Stool Scores [BSS] 6-7) based on WHO criteria. Secondary endpoints comprised numerous other diarrhea and GI tolerability indices. Twenty-one subjects received randomized therapy in each treatment sequence (total N = 42), and overall, 37 (88 %) subjects completed the study. The total number of diarrhea days was <1.5 for both treatment sequences, and approximately 60 % of subjects did not experience diarrhea during either treatment period. The mean (SD) number of diarrhea days was lower with miglustat + S. boulardii (0.8 [2.4] days) than with miglustat + placebo (1.3 [2.4] days), but the paired treatment difference was not statistically significant (-0.5 [2.4] days; p = 0.159). However, a significant treatment difference (-0.7 [1.9]; p < 0.05) was identified after post hoc exclusion of a clear outlier who had a very high number of diarrhea days (n = 13) and inconsistent GI tolerability reporting. 
    The incidence of the GI AEs was higher with miglustat + placebo (82 %) than with miglustat + S. boulardii (73 %). There were no between-treatment differences in miglustat pharmacokinetics. Although the primary endpoint was not met, the results of the post-hoc analysis suggest that co-administration of miglustat with S. boulardii might improve GI tolerability.

  10. cWINNOWER algorithm for finding fuzzy dna motifs

    NASA Technical Reports Server (NTRS)

    Liang, S.; Samanta, M. P.; Biegel, B. A.

    2004-01-01

    The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if a clique consisting of a sufficiently large number of mutated copies of the motif (i.e., the signals) is present in the DNA sequence. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum detectable clique size qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12,000 for (l, d) = (15, 4). Copyright Imperial College Press.
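
    The signal definition in this record (a short pattern differing from a motif of length l by up to d mutations) can be illustrated with a small sketch. This is not the cWINNOWER algorithm itself, which searches for an unknown motif via cliques of signals; the sketch assumes the motif is already known and that "mutations" means substitutions only. The function names `hamming` and `find_signals` are hypothetical.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def find_signals(sequence: str, motif: str, d: int) -> list:
    """Start positions of windows within d substitutions of a KNOWN motif.

    Assumption: mutations are substitutions only, so a window counts as a
    signal when its Hamming distance to the motif is at most d.
    """
    l = len(motif)
    return [
        i
        for i in range(len(sequence) - l + 1)
        if hamming(sequence[i:i + l], motif) <= d
    ]


# Toy usage with (l, d) = (4, 1).
positions = find_signals("ACGTACGAACGT", "ACGT", 1)
```

    In the toy call, the windows at positions 0 and 8 match the motif exactly and the window at position 4 differs by one substitution.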

  11. cWINNOWER Algorithm for Finding Fuzzy DNA Motifs

    NASA Technical Reports Server (NTRS)

    Liang, Shoudan

    2003-01-01

    The cWINNOWER algorithm detects fuzzy motifs in DNA sequences rich in protein-binding signals. A signal is defined as any short nucleotide pattern having up to d mutations differing from a motif of length l. The algorithm finds such motifs if multiple mutated copies of the motif (i.e., the signals) are present in the DNA sequence in sufficient abundance. The cWINNOWER algorithm substantially improves the sensitivity of the winnower method of Pevzner and Sze by imposing a consensus constraint, enabling it to detect much weaker signals. We studied the minimum number of detectable motifs qc as a function of sequence length N for random sequences. We found that qc increases linearly with N for a fast version of the algorithm based on counting three-member sub-cliques. Imposing consensus constraints reduces qc by a factor of three in this case, which makes the algorithm dramatically more sensitive. Our most sensitive algorithm, which counts four-member sub-cliques, needs a minimum of only 13 signals to detect motifs in a sequence of length N = 12000 for (l,d) = (15,4).

  12. Templated sequence insertion polymorphisms in the human genome

    NASA Astrophysics Data System (ADS)

    Onozawa, Masahiro; Aplan, Peter

    2016-11-01

    Templated Sequence Insertion Polymorphism (TSIP) is a recently described form of polymorphism recognized in the human genome, in which a sequence that is templated from a distant genomic region is inserted into the genome, seemingly at random. TSIPs can be grouped into two classes based on nucleotide sequence features at the insertion junctions; Class 1 TSIPs show features of insertions that are mediated via the LINE-1 ORF2 protein, including 1) target-site duplication (TSD), 2) polyadenylation 10-30 nucleotides downstream of a “cryptic” polyadenylation signal, and 3) preference for insertion at a 5’-TTTT/A-3’ sequence. In contrast, class 2 TSIPs show features consistent with repair of a DNA double-strand break via insertion of a DNA “patch” that is derived from a distant genomic region. Survey of a large number of normal human volunteers demonstrates that most individuals have 25-30 TSIPs, and that these TSIPs track with specific geographic regions. Similar to other forms of human polymorphism, we suspect that these TSIPs may be important for the generation of human diversity and genetic diseases.

  13. Red, green, blue equals 1, 2, 3: Digit-color synesthetes can use structured digit information to boost recall of color sequences.

    PubMed

    Teichmann, A Lina; Nieuwenstein, Mark R; Rich, Anina N

    2015-01-01

    Digit-color synesthetes report experiencing colors when perceiving letters and digits. The conscious experience is typically unidirectional (e.g., digits elicit colors but not vice versa) but recent evidence shows subtle bidirectional effects. We examined whether short-term memory for colors could be affected by the order of presentation reflecting more or less structure in the associated digits. We presented a stream of colored squares and asked participants to report the colors in order. The colors matched each synesthete's colors for digits 1-9 and the order of the colors corresponded either to a sequence of numbers (e.g., [red, green, blue] if 1 = red, 2 = green, 3 = blue) or no systematic sequence. The results showed that synesthetes recalled sequential color sequences more accurately than pseudo-randomized colors, whereas no such effect was found for the non-synesthetic controls. Synesthetes did not differ from non-synesthetic controls in recall of color sequences overall, providing no evidence of a general advantage in memory for serial recall of colors.

  14. Efficient Detection of Copy Number Mutations in PMS2 Exons with a Close Homolog.

    PubMed

    Herman, Daniel S; Smith, Christina; Liu, Chang; Vaughn, Cecily P; Palaniappan, Selvi; Pritchard, Colin C; Shirts, Brian H

    2018-07-01

    Detection of 3' PMS2 copy-number mutations that cause Lynch syndrome is difficult because of highly homologous pseudogenes. To improve the accuracy and efficiency of clinical screening for these mutations, we developed a new method to analyze standard capture-based, next-generation sequencing data to identify deletions and duplications in PMS2 exons 9 to 15. The approach captures sequences using PMS2 targets, maps sequences randomly among regions with equal mapping quality, counts reads aligned to homologous exons and introns, and flags read count ratios outside of empirically derived reference ranges. The method was trained on 1352 samples, including 8 known positives, and tested on 719 samples, including 17 known positives. Clinical implementation of the first version of this method detected new mutations in the training (N = 7) and test (N = 2) sets that had not been identified by our initial clinical testing pipeline. The described final method showed complete sensitivity in both sample sets and false-positive rates of 5% (training) and 7% (test), dramatically decreasing the number of cases needing additional mutation evaluation. This approach leveraged the differences between gene and pseudogene to distinguish between PMS2 and PMS2CL copy-number mutations. These methods enable efficient and sensitive Lynch syndrome screening for 3' PMS2 copy-number mutations and may be applied similarly to other genomic regions with highly homologous pseudogenes. Copyright © 2018 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  15. Molecular selection in a unified evolutionary sequence

    NASA Technical Reports Server (NTRS)

    Fox, S. W.

    1986-01-01

    With guidance from experiments and observations that indicate internally limited phenomena, an outline of a unified evolutionary sequence is inferred. Such unification is not visible for a context of random matrix and random mutation. The sequence proceeds from Big Bang through prebiotic matter, protocells, through the evolving cell via molecular and natural selection, to mind, behavior, and society.

  16. EMPOP-quality mtDNA control region sequences from Kashmiri of Azad Jammu & Kashmir, Pakistan.

    PubMed

    Rakha, Allah; Peng, Min-Sheng; Bi, Rui; Song, Jiao-Jiao; Salahudin, Zeenat; Adan, Atif; Israr, Muhammad; Yao, Yong-Gang

    2016-11-01

    The mitochondrial DNA (mtDNA) control region (nucleotide position 16024-576) sequences were generated by Sanger sequencing for 317 self-identified Kashmiris from all districts of Azad Jammu & Kashmir, Pakistan. The population sample set showed a total of 251 haplotypes, with a relatively high haplotype diversity (0.9977) and a low random match probability (0.54%). The matrilineal lineages belong to three different phylogeographic origins: Western Eurasian (48.9%), South Asian (47.0%) and East Asian (4.1%). The present study was compared to previous data from Pakistan and other worldwide populations (Central Asia, Western Asia, and East & Southeast Asia). The dataset is made available through EMPOP under accession number EMP00679 and will serve as an mtDNA reference database in forensic casework in Pakistan. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
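
    The two summary statistics quoted in this record are linked. The abstract does not state which estimators were used, so the sketch below assumes the commonly used ones: random match probability as the sum of squared haplotype frequencies, and haplotype diversity as Nei's unbiased estimator n/(n-1) * (1 - sum of squared frequencies). Under that assumption the reported figures are mutually consistent, since (317/316) * (1 - 0.0054) is approximately 0.9977. The function names are hypothetical.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies (assumed estimator)."""
    n = len(haplotypes)
    if n == 0:
        raise ValueError("need at least one sample")
    counts = Counter(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())


def haplotype_diversity(haplotypes):
    """Nei's unbiased estimator: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two samples")
    return n / (n - 1) * (1.0 - random_match_probability(haplotypes))


# Toy sample of four individuals carrying three distinct haplotypes.
sample = ["A", "A", "B", "C"]
rmp = random_match_probability(sample)   # 0.25 + 0.0625 + 0.0625
hd = haplotype_diversity(sample)         # 4/3 * (1 - rmp)
```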

  17. A Gibbs sampler for motif detection in phylogenetically close sequences

    NASA Astrophysics Data System (ADS)

    Siddharthan, Rahul; van Nimwegen, Erik; Siggia, Eric

    2004-03-01

    Genes are regulated by transcription factors that bind to DNA upstream of genes and recognize short conserved "motifs" in a random intergenic "background". Motif-finders such as the Gibbs sampler compare the probability of these short sequences being represented by "weight matrices" to the probability of their arising from the background "null model", and explore this space (analogous to a free-energy landscape). But closely related species may show conservation not because of functional sites but simply because they have not had sufficient time to diverge, so conventional methods will fail. We introduce a new Gibbs sampler algorithm that accounts for common ancestry when searching for motifs, while requiring minimal "prior" assumptions on the number and types of motifs, assessing the significance of detected motifs by "tracking" clusters that stay together. We apply this scheme to motif detection in sporulation-cycle genes in the yeast S. cerevisiae, using recent sequences of other closely-related Saccharomyces species.

  18. Massively parallel polymerase cloning and genome sequencing of single cells using nanoliter microwells

    PubMed Central

    Gole, Jeff; Gore, Athurva; Richards, Andrew; Chiu, Yu-Jui; Fung, Ho-Lim; Bushman, Diane; Chiang, Hsin-I; Chun, Jerold; Lo, Yu-Hwa; Zhang, Kun

    2013-01-01

    Genome sequencing of single cells has a variety of applications, including characterizing difficult-to-culture microorganisms and identifying somatic mutations in single cells from mammalian tissues. A major hurdle in this process is the bias in amplifying the genetic material from a single cell, a procedure known as polymerase cloning. Here we describe the microwell displacement amplification system (MIDAS), a massively parallel polymerase cloning method in which single cells are randomly distributed into hundreds to thousands of nanoliter wells and simultaneously amplified for shotgun sequencing. MIDAS reduces amplification bias because polymerase cloning occurs in physically separated nanoliter-scale reactors, facilitating the de novo assembly of near-complete microbial genomes from single E. coli cells. In addition, MIDAS allowed us to detect single-copy number changes in primary human adult neurons at 1–2 Mb resolution. MIDAS will further the characterization of genomic diversity in many heterogeneous cell populations. PMID:24213699

  19. Optimized scheduling technique of null subcarriers for peak power control in 3GPP LTE downlink.

    PubMed

    Cho, Soobum; Park, Sang Kyu

    2014-01-01

    Orthogonal frequency division multiple access (OFDMA) is a key multiple access technique for the long term evolution (LTE) downlink. However, a high peak-to-average power ratio (PAPR) can degrade power efficiency. The well-known PAPR reduction technique, dummy sequence insertion (DSI), can be a realistic solution because of its structural simplicity. However, heavy use of subcarriers for the dummy sequences may decrease the transmitted data rate in the DSI scheme. In this paper, a novel DSI scheme is applied to the LTE system. First, we obtain the null subcarriers in single-input single-output (SISO) and multiple-input multiple-output (MIMO) systems, respectively; then, optimized dummy sequences are inserted into the obtained null subcarriers. Simulation results show that the Walsh-Hadamard transform (WHT) sequence is the best for the dummy sequence and that a ratio of 16 to 20 between the WHT and randomly generated sequences gives the maximum PAPR reduction performance. A near-optimal number of iterations is derived to avoid exhaustive iteration. It is also shown that there is no bit error rate (BER) degradation with the proposed technique in the LTE downlink system.
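
    The idea of dummy sequence insertion can be sketched in a few lines: compute the PAPR of the time-domain symbol, then try candidate dummy values on the null subcarriers and keep the candidate with the lowest PAPR. This is a minimal illustration, not the authors' scheme; it uses a plain inverse DFT without oversampling, does not build WHT sequences, and the names `idft`, `papr_db` and `select_dummy` are hypothetical.

```python
import cmath
import math


def idft(freq):
    """Naive inverse DFT, O(n^2); adequate for a small illustration."""
    n = len(freq)
    return [
        sum(freq[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
        for t in range(n)
    ]


def papr_db(freq):
    """Peak-to-average power ratio (dB) of the time-domain symbol."""
    power = [abs(v) ** 2 for v in idft(freq)]
    mean = sum(power) / len(power)
    if mean == 0:
        raise ValueError("all-zero symbol has undefined PAPR")
    return 10 * math.log10(max(power) / mean)


def select_dummy(symbols, null_positions, candidates):
    """Place each candidate on the null subcarriers; return (papr_db, dummy)
    for the candidate giving the lowest PAPR."""
    best = None
    for dummy in candidates:
        trial = list(symbols)
        for pos, value in zip(null_positions, dummy):
            trial[pos] = value
        score = papr_db(trial)
        if best is None or score < best[0]:
            best = (score, dummy)
    return best


# Toy usage: 8 subcarriers, the last 4 treated as null subcarriers.
data = [1, 1, 1, 1, 0, 0, 0, 0]
best_papr, best_dummy = select_dummy(
    data, [4, 5, 6, 7], [(1, 1, 1, 1), (0, 0, 0, 0)]
)
```

    Because only the null subcarriers are altered, the data subcarriers are untouched, which is in line with the record's statement that the technique causes no BER degradation.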

  20. Optimized Scheduling Technique of Null Subcarriers for Peak Power Control in 3GPP LTE Downlink

    PubMed Central

    Park, Sang Kyu

    2014-01-01

    Orthogonal frequency division multiple access (OFDMA) is a key multiple access technique for the long term evolution (LTE) downlink. However, a high peak-to-average power ratio (PAPR) can degrade power efficiency. The well-known PAPR reduction technique, dummy sequence insertion (DSI), can be a realistic solution because of its structural simplicity. However, heavy use of subcarriers for the dummy sequences may decrease the transmitted data rate in the DSI scheme. In this paper, a novel DSI scheme is applied to the LTE system. First, we obtain the null subcarriers in single-input single-output (SISO) and multiple-input multiple-output (MIMO) systems, respectively; then, optimized dummy sequences are inserted into the obtained null subcarriers. Simulation results show that the Walsh-Hadamard transform (WHT) sequence is the best for the dummy sequence and that a ratio of 16 to 20 between the WHT and randomly generated sequences gives the maximum PAPR reduction performance. A near-optimal number of iterations is derived to avoid exhaustive iteration. It is also shown that there is no bit error rate (BER) degradation with the proposed technique in the LTE downlink system. PMID:24883376

  1. Short-Sequence DNA Repeats in Prokaryotic Genomes

    PubMed Central

    van Belkum, Alex; Scherer, Stewart; van Alphen, Loek; Verbrugh, Henri

    1998-01-01

    Short-sequence DNA repeat (SSR) loci can be identified in all eukaryotic and many prokaryotic genomes. These loci harbor short or long stretches of repeated nucleotide sequence motifs. DNA sequence motifs in a single locus can be identical and/or heterogeneous. SSRs are encountered in many different branches of the prokaryote kingdom. They are found in genes encoding products as diverse as microbial surface components recognizing adhesive matrix molecules and specific bacterial virulence factors such as lipopolysaccharide-modifying enzymes or adhesins. SSRs enable genetic and consequently phenotypic flexibility. SSRs function at various levels of gene expression regulation. Variations in the number of repeat units per locus or changes in the nature of the individual repeat sequences may result from recombination processes or polymerase inadequacy such as slipped-strand mispairing (SSM), either alone or in combination with DNA repair deficiencies. These rather complex phenomena can occur with relative ease, with SSM approaching a frequency of 10^-4 per bacterial cell division and allowing high-frequency genetic switching. Bacteria use this random strategy to adapt their genetic repertoire in response to selective environmental pressure. SSR-mediated variation has important implications for bacterial pathogenesis and evolutionary fitness. Molecular analysis of changes in SSRs allows epidemiological studies on the spread of pathogenic bacteria. The occurrence, evolution and function of SSRs, and the molecular methods used to analyze them are discussed in the context of responsiveness to environmental factors, bacterial pathogenicity, epidemiology, and the availability of full-genome sequences for increasing numbers of microorganisms, especially those that are medically relevant. PMID:9618442

  2. A Systematic Prediction of Drug-Target Interactions Using Molecular Fingerprints and Protein Sequences.

    PubMed

    Huang, Yu-An; You, Zhu-Hong; Chen, Xing

    2018-01-01

    Drug-Target Interactions (DTI) play a crucial role in discovering new drug candidates and finding new proteins to target for drug development. Although the number of detected DTI obtained by high-throughput techniques has been increasing, the number of known DTI is still limited. On the other hand, the experimental methods for detecting the interactions among drugs and proteins are costly and inefficient. Therefore, computational approaches for predicting DTI are drawing increasing attention in recent years. In this paper, we report a novel computational model for predicting the DTI using extremely randomized trees model and protein amino acids information. More specifically, the protein sequence is represented as a Pseudo Substitution Matrix Representation (Pseudo-SMR) descriptor in which the influence of biological evolutionary information is retained. For the representation of drug molecules, a novel fingerprint feature vector is utilized to describe its substructure information. Then the DTI pair is characterized by concatenating the two vector spaces of protein sequence and drug substructure. Finally, the proposed method is explored for predicting the DTI on four benchmark datasets: Enzyme, Ion Channel, GPCRs and Nuclear Receptor. The experimental results demonstrate that this method achieves promising prediction accuracies of 89.85%, 87.87%, 82.99% and 81.67%, respectively. For further evaluation, we compared the performance of Extremely Randomized Trees model with that of the state-of-the-art Support Vector Machine classifier. And we also compared the proposed model with existing computational models, and confirmed 15 potential drug-target interactions by looking for existing databases. The experiment results show that the proposed method is feasible and promising for predicting drug-target interactions for new drug candidate screening based on sizeable features. 
    Copyright© Bentham Science Publishers.

  3. Sequence Determinants of Compaction in Intrinsically Disordered Proteins

    PubMed Central

    Marsh, Joseph A.; Forman-Kay, Julie D.

    2010-01-01

    Intrinsically disordered proteins (IDPs), which lack folded structure and are disordered under nondenaturing conditions, have been shown to perform important functions in a large number of cellular processes. These proteins have interesting structural properties that deviate from the random-coil-like behavior exhibited by chemically denatured proteins. In particular, IDPs are often observed to exhibit significant compaction. In this study, we have analyzed the hydrodynamic radii of a number of IDPs to investigate the sequence determinants of this compaction. Net charge and proline content are observed to be strongly correlated with increased hydrodynamic radii, suggesting that these are the dominant contributors to compaction. Hydrophobicity and secondary structure, on the other hand, appear to have negligible effects on compaction, which implies that the determinants of structure in folded and intrinsically disordered proteins are profoundly different. Finally, we observe that polyhistidine tags seem to increase IDP compaction, which suggests that these tags have significant perturbing effects and thus should be removed before any structural characterizations of IDPs. Using the relationships observed in this analysis, we have developed a sequence-based predictor of hydrodynamic radius for IDPs that shows substantial improvement over a simple model based upon chain length alone. PMID:20483348

  4. Novel and highly informative Capsicum SSR markers and their cross-species transferability.

    PubMed

    Buso, G S C; Reis, A M M; Amaral, Z P S; Ferreira, M E

    2016-09-23

    This study was undertaken primarily to develop new simple sequence repeat (SSR) markers for Capsicum. As part of this project aimed at broadening the use of molecular tools in Capsicum breeding, two genomic libraries enriched for AG/TC repeat sequences were constructed for Capsicum annuum. A total of 475 DNA clones were sequenced from both libraries and 144 SSR markers were tested on cultivated and wild species of Capsicum. Forty-five SSR markers were randomly selected to genotype a panel of 48 accessions of the Capsicum germplasm bank. The number of alleles per locus ranged from 2 to 11, with an average of 6 alleles. The polymorphism information content was on average 0.60, ranging from 0.20 to 0.83. The cross-species transferability to seven cultivated and wild Capsicum species was tested with a set of 91 SSR markers. We found that a high proportion of the loci produced amplicons in all species tested. C. frutescens had the highest number of transferable markers, whereas the wild species had the lowest. Our results indicate that the new markers can be readily used in genetic analyses of Capsicum.
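
    The polymorphism information content (PIC) reported in this record is computed from allele frequencies at a locus. The abstract does not give the formula; the sketch below assumes the widely used Botstein et al. definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2. The function name `pic` is hypothetical.

```python
def pic(freqs):
    """Polymorphism information content for one locus.

    Assumed formula: 1 - sum(p_i^2) - sum_{i<j} 2 * p_i^2 * p_j^2,
    where freqs are allele frequencies that sum to 1.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    n = len(freqs)
    homozygosity = sum(p * p for p in freqs)
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )
    return 1.0 - homozygosity - cross


# Two equally frequent alleles: 1 - 0.5 - 2 * 0.25 * 0.25 = 0.375.
two_allele_pic = pic([0.5, 0.5])
```

    With more alleles at even frequencies the value rises toward 1, which is why loci with up to 11 alleles can reach the upper end of the range reported above.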

  5. Natural Time and Nowcasting Earthquakes: Are Large Global Earthquakes Temporally Clustered?

    NASA Astrophysics Data System (ADS)

    Luginbuhl, Molly; Rundle, John B.; Turcotte, Donald L.

    2018-02-01

    The objective of this paper is to analyze the temporal clustering of large global earthquakes with respect to natural time, or interevent count, as opposed to regular clock time. To do this, we use two techniques: (1) nowcasting, a new method of statistically classifying seismicity and seismic risk, and (2) time series analysis of interevent counts. We chose the sequences of M_λ ≥ 7.0 and M_λ ≥ 8.0 earthquakes from the global centroid moment tensor (CMT) catalog from 2004 to 2016 for analysis. A significant number of these earthquakes will be aftershocks of the largest events, but no satisfactory method of declustering the aftershocks in clock time is available. A major advantage of using natural time is that it eliminates the need for declustering aftershocks. The event count we utilize is the number of small earthquakes that occur between large earthquakes. The small earthquake magnitude is chosen to be as small as possible, such that the catalog is still complete based on the Gutenberg-Richter statistics. For the CMT catalog, starting in 2004, we found the completeness magnitude to be M_σ ≥ 5.1. For the nowcasting method, the cumulative probability distribution of these interevent counts is obtained. We quantify the distribution using the exponent, β, of the best fitting Weibull distribution; β = 1 for a random (exponential) distribution. We considered 197 earthquakes with M_λ ≥ 7.0 and found β = 0.83 ± 0.08. We considered 15 earthquakes with M_λ ≥ 8.0, but this number was considered too small to generate a meaningful distribution. For comparison, we generated synthetic catalogs of earthquakes that occur randomly with the Gutenberg-Richter frequency-magnitude statistics. We considered a synthetic catalog of 1.97 × 10^5 M_λ ≥ 7.0 earthquakes and found β = 0.99 ± 0.01. The random catalog converted to natural time was also random. We then generated 1.5 × 10^4 synthetic catalogs with 197 M_λ ≥ 7.0 earthquakes in each catalog and found the statistical range of β values. The observed value of β = 0.83 for the CMT catalog corresponds to a p value of p = 0.004, leading us to conclude that the interevent natural times in the CMT catalog are not random. For the time series analysis, we calculated the autocorrelation function for the sequence of natural time intervals between large global earthquakes and again compared with data from 1.5 × 10^4 synthetic catalogs of random data. In this case, the spread of autocorrelation values was much larger, so we concluded that this approach is insensitive to deviations from random behavior.
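
    The natural-time event count described in this record (the number of small earthquakes between successive large ones) can be sketched as follows. This is an illustration only: it assumes a time-ordered list of magnitudes, treats events at or above the completeness magnitude but below the large-event threshold as "small", and does not include the Weibull fit. The function name `interevent_counts` and its parameters are hypothetical.

```python
def interevent_counts(magnitudes, large=7.0, small=5.1):
    """Counts of small events between successive large events.

    magnitudes: event magnitudes in chronological order (assumed input).
    Events before the first large event and after the last one are ignored,
    because they do not lie between two large events.
    """
    counts = []
    current = None  # stays None until the first large event is seen
    for m in magnitudes:
        if m >= large:
            if current is not None:
                counts.append(current)
            current = 0
        elif m >= small and current is not None:
            current += 1
    return counts


# Toy catalog: three large events, giving two interevent counts.
catalog = [5.5, 7.2, 5.3, 5.0, 6.1, 7.5, 8.1, 5.2]
counts = interevent_counts(catalog)
```

    In the toy catalog, two qualifying small events (5.3 and 6.1) fall between the first and second large events, and none between the second and third; the 5.0 event is below the assumed completeness magnitude and is skipped.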

  6. Origins of Protein Functions in Cells

    NASA Technical Reports Server (NTRS)

    Seelig, Burchard; Pohorille, Andrzej

    2011-01-01

    In modern organisms proteins perform a majority of cellular functions, such as chemical catalysis, energy transduction and transport of material across cell walls. Although great strides have been made towards understanding protein evolution, a meaningful extrapolation from contemporary proteins to their earliest ancestors is virtually impossible. In an alternative approach, the origin of water-soluble proteins was probed through the synthesis and in vitro evolution of very large libraries of random amino acid sequences. In combination with computer modeling and simulations, these experiments allow us to address a number of fundamental questions about the origins of proteins. Can functionality emerge from random sequences of proteins? How did the initial repertoire of functional proteins diversify to facilitate new functions? Did this diversification proceed primarily through drawing novel functionalities from random sequences or through evolution of already existing proto-enzymes? Did protein evolution start from a pool of proteins defined by a frozen accident, such that other collections of proteins could have started a different evolutionary pathway? Although we do not have definitive answers to these questions yet, important clues have been uncovered. In one example (Keefe and Szostak, 2001), novel ATP binding proteins were identified that appear to be unrelated in both sequence and structure to any known ATP binding proteins. One of these proteins was subsequently redesigned computationally to bind GTP by introducing several mutations that make targeted structural changes to the protein, improve its binding to guanine and prevent water from accessing the active center. This study facilitates further investigations of individual evolutionary steps that lead to a change of function in primordial proteins. In a second study (Seelig and Szostak, 2007), novel enzymes were generated that can join two pieces of RNA in a reaction for which no natural enzymes are known. Recently it was found that, as in the previous case, the proteins have a structure unknown among modern enzymes. In this case, in vitro evolution started from a small, non-enzymatic protein. A similar selection process initiated from a library of random polypeptides is in progress. These results not only allow for estimating the occurrence of function in random protein assemblies but also provide evidence for the possibility of alternative protein worlds. Extant proteins might simply represent a frozen accident in the world of possible proteins. Alternative collections of proteins, even with similar functions, could originate alternative evolutionary paths.

  7. Mycelial Propagation and Molecular Phylogenetic Relationships of Commercially Cultivated Agrocybe cylindracea based on ITS Sequences and RAPD

    PubMed Central

    Alam, Nuhu; Kim, Jeong Hwa; Shim, Mi Ja; Lee, U Youn

    2010-01-01

    This study evaluated the optimal vegetative growth conditions and molecular phylogenetic relationships of eleven strains of Agrocybe cylindracea collected from different ecological regions of Korea, China and Taiwan. The optimal temperature and pH for mycelial growth were observed at 25℃ and 6. Potato dextrose agar and Hennerberg were the favorable media for vegetative growth, whereas glucose tryptone was unfavorable. Dextrin, maltose, and fructose were the most effective carbon sources. The most suitable nitrogen sources were arginine and glycine, whereas methionine, alanine, histidine, and urea were least effective for the mycelial propagation of A. cylindracea. The internal transcribed spacer (ITS) regions of rDNA were amplified using PCR. The sequence of ITS2 was more variable than that of ITS1, while the 5.8S sequences were identical. The reciprocal homologies of the ITS sequences ranged from 98 to 100%. The strains were also analyzed by random amplification of polymorphic DNA (RAPD) using 20 arbitrary primers. Fifteen primers efficiently amplified the genomic DNA. The average number of polymorphic bands observed per primer was 3.8. The numbers of amplified bands varied based on the primers and strains, with polymorphic fragments ranging from 0.1 to 2.9 kb. The results of RAPD analysis were similar to the ITS region sequences. The results revealed that RAPD and ITS techniques were well suited for detecting the genetic diversity of all A. cylindracea strains tested. PMID:23956633

  8. rasbhari: Optimizing Spaced Seeds for Database Searching, Read Mapping and Alignment-Free Sequence Comparison.

    PubMed

    Hahn, Lars; Leimeister, Chris-André; Ounit, Rachid; Lonardi, Stefano; Morgenstern, Burkhard

    2016-10-01

    Many algorithms for sequence analysis rely on word matching or word statistics. Often, these approaches can be improved if binary patterns representing match and don't-care positions are used as a filter, such that only those positions of words are considered that correspond to the match positions of the patterns. The performance of these approaches, however, depends on the underlying patterns. Herein, we show that the overlap complexity of a pattern set that was introduced by Ilie and Ilie is closely related to the variance of the number of matches between two evolutionarily related sequences with respect to this pattern set. We propose a modified hill-climbing algorithm to optimize pattern sets for database searching, read mapping and alignment-free sequence comparison of nucleic-acid sequences; our implementation of this algorithm is called rasbhari. Depending on the application at hand, rasbhari can either minimize the overlap complexity of pattern sets, maximize their sensitivity in database searching or minimize the variance of the number of pattern-based matches in alignment-free sequence comparison. We show that, for database searching, rasbhari generates pattern sets with slightly higher sensitivity than existing approaches. In our Spaced Words approach to alignment-free sequence comparison, pattern sets calculated with rasbhari led to more accurate estimates of phylogenetic distances than the randomly generated pattern sets that we previously used. Finally, we used rasbhari to generate patterns for short read classification with CLARK-S. Here too, the sensitivity of the results could be improved, compared to the default patterns of the program. We integrated rasbhari into Spaced Words; the source code of rasbhari is freely available at http://rasbhari.gobics.de/.
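
    To make the idea of binary patterns with match and don't-care positions concrete, here is a small Python sketch. It does not reproduce rasbhari or its hill-climbing optimization; the function names and the convention that '1' marks a match position and '0' a don't-care position are assumptions chosen for illustration.

```python
def spaced_word(seq, start, pattern):
    """Return the characters of seq at the match ('1') positions of pattern."""
    window = seq[start:start + len(pattern)]
    if len(window) < len(pattern):
        raise ValueError("pattern runs past the end of the sequence")
    return "".join(c for c, p in zip(window, pattern) if p == "1")


def count_spaced_matches(a, b, pattern):
    """Count ungapped offsets where a and b agree at every match position."""
    n = min(len(a), len(b)) - len(pattern) + 1
    return sum(
        spaced_word(a, i, pattern) == spaced_word(b, i, pattern)
        for i in range(max(n, 0))
    )


a = "ACGTACGT"
b = "ACCTACTT"  # differs from a at two positions
print(count_spaced_matches(a, b, "1101"))  # spaced pattern
print(count_spaced_matches(a, b, "1111"))  # contiguous word of the same length
```

    In this toy pair, the spaced pattern still finds matches where the contiguous word of the same length finds none, because the don't-care position can absorb a mismatch.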

  9. Succinylcholine versus rocuronium for rapid sequence intubation in intensive care: a prospective, randomized controlled trial

    PubMed Central

    2011-01-01

    Introduction: Succinylcholine and rocuronium are widely used to facilitate rapid sequence induction (RSI) intubation in intensive care. Concerns relate to the side effects of succinylcholine and to slower onset and inferior intubation conditions associated with rocuronium. So far, succinylcholine and rocuronium have not been compared in an adequately powered randomized trial in intensive care. Accordingly, the aim of the present study was to compare the incidence of hypoxemia after rocuronium or succinylcholine in critically ill patients requiring an emergent RSI. Methods: This was a prospective randomized controlled single-blind trial conducted from 2006 to 2010 at the University Hospital of Basel. Participants were 401 critically ill patients requiring emergent RSI. Patients were randomized to receive 1 mg/kg succinylcholine or 0.6 mg/kg rocuronium for neuromuscular blockade. The primary outcome was the incidence of oxygen desaturations defined as a decrease in oxygen saturation ≥ 5%, assessed by continuous pulse oximetry, at any time between the start of the induction sequence and two minutes after the completion of the intubation. A severe oxygen desaturation was defined as a decrease in oxygen saturation ≥ 5% leading to a saturation value of ≤ 80%. Results: There was no difference between succinylcholine and rocuronium regarding oxygen desaturations (succinylcholine 73/196; rocuronium 66/195; P = 0.67); severe oxygen desaturations (succinylcholine 20/196; rocuronium 20/195; P = 1.0); and extent of oxygen desaturations (succinylcholine -14 ± 12%; rocuronium -16 ± 13%; P = 0.77). The duration of the intubation sequence was shorter after succinylcholine than after rocuronium (81 ± 38 sec versus 95 ± 48 sec; P = 0.002). Intubation conditions (succinylcholine 8.3 ± 0.8; rocuronium 8.2 ± 0.9; P = 0.7) and failed first intubation attempts (succinylcholine 32/200; rocuronium 36/201; P = 1.0) did not differ between the groups. Conclusions: In critically ill patients undergoing emergent RSI, incidence and severity of oxygen desaturations, the quality of intubation conditions, and incidence of failed intubation attempts did not differ between succinylcholine and rocuronium. Trial Registration: ClinicalTrials.gov, number NCT00355368. PMID:21846380

  10. Rényi continuous entropy of DNA sequences.

    PubMed

    Vinga, Susana; Almeida, Jonas S

    2004-12-07

    Entropy measures of DNA sequences estimate their randomness or, inversely, their repeatability. L-block Shannon discrete entropy accounts for the empirical distribution of all length-L words and has convergence problems for finite sequences. A new entropy measure that extends Shannon's formalism is proposed. Rényi's quadratic entropy, calculated with the Parzen window density estimation method applied to CGR/USM continuous maps of DNA sequences, constitutes a novel technique to evaluate global sequence randomness without some of the drawbacks of the former method. The asymptotic behaviour of this new measure was analytically deduced and the calculation of entropies for several synthetic and experimental biological sequences was performed. The results obtained were compared with the distributions of the null model of randomness obtained by simulation. The biological sequences have shown a different p-value according to the kernel resolution of Parzen's method, which might indicate an unknown level of organization of their patterns. This new technique can be very useful in the study of DNA sequence complexity and provide additional tools for DNA entropy estimation. The main MATLAB applications developed and additional material are available at the webpage . Specialized functions can be obtained from the authors.

  11. Ranked solutions to a class of combinatorial optimizations—with applications in mass spectrometry based peptide sequencing and a variant of directed paths in random media

    NASA Astrophysics Data System (ADS)

    Doerr, Timothy P.; Alves, Gelio; Yu, Yi-Kuo

    2005-08-01

    Typical combinatorial optimizations are NP-hard; however, for a particular class of cost functions the corresponding combinatorial optimizations can be solved in polynomial time using the transfer matrix technique or, equivalently, the dynamic programming approach. This suggests a way to efficiently find approximate solutions: find a transformation that makes the cost function as similar as possible to that of the solvable class. After keeping many high-ranking solutions using the approximate cost function, one may then re-assess these solutions with the full cost function to find the best approximate solution. Under this approach, it is important to be able to assess the quality of the solutions obtained, e.g., by finding the true ranking of the kth best approximate solution when all possible solutions are considered exhaustively. To tackle this statistical issue, we provide a systematic method starting with a scaling function generated from the finite number of high-ranking solutions followed by a convergent iterative mapping. This method, useful in a variant of the directed paths in random media problem proposed here, can also provide a statistical significance assessment for one of the most important proteomic tasks: peptide sequencing using tandem mass spectrometry data. For directed paths in random media, the scaling function depends on the particular realization of randomness; in the mass spectrometry case, the scaling function is spectrum-specific.

  12. Monoallelic Gene Expression in Mammals.

    PubMed

    Chess, Andrew

    2016-11-23

    Monoallelic expression not due to cis-regulatory sequence polymorphism poses an intriguing problem in epigenetics because it requires the unequal treatment of two segments of DNA that are present in the same nucleus and that can indeed have absolutely identical sequences. Here, I focus on a few recent developments in the field of monoallelic expression that are of particular interest and raise interesting questions for future work. One development is regarding analyses of imprinted genes, in which recent work suggests the possibility that intriguing networks of imprinted genes exist and are important for genetic and physiological studies. Another issue that has been raised in recent years by a number of publications is the question of how skewed allelic expression should be for it to be designated as monoallelic expression and, further, what methods are appropriate or inappropriate for analyzing genomic data to examine allele-specific expression. Perhaps the most exciting recent development in mammalian monoallelic expression is a clever and carefully executed analysis of genetic diversity of autosomal genes subject to random monoallelic expression (RMAE), which provides compelling evidence for distinct evolutionary forces acting on random monoallelically expressed genes.

  13. Effects of different preservation methods on inter simple sequence repeat (ISSR) and random amplified polymorphic DNA (RAPD) molecular markers in botanic samples.

    PubMed

    Wang, Xiaolong; Li, Lin; Zhao, Jiaxin; Li, Fangliang; Guo, Wei; Chen, Xia

    2017-04-01

    To evaluate the effects of different preservation methods (stored in a -20°C ice chest, preserved in liquid nitrogen and dried in silica gel) on inter simple sequence repeat (ISSR) or random amplified polymorphic DNA (RAPD) analyses in various botanical specimens (including broad-leaved plants, needle-leaved plants and succulent plants) for different times (three weeks and three years), we used a statistical analysis based on the number of bands, genetic index and cluster analysis. The results demonstrate that the methods used to preserve samples can provide sufficient amounts of genomic DNA for ISSR and RAPD analyses; however, the effects of different preservation methods on these analyses vary significantly, whereas the preservation time has little effect on them. Our results provide a reference for researchers to select the most suitable preservation method depending on their study subject for the analysis of molecular markers based on genomic DNA. Copyright © 2017 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.

  14. Stimulus novelty, task relevance and the visual evoked potential in man

    NASA Technical Reports Server (NTRS)

    Courchesne, E.; Hillyard, S. A.; Galambos, R.

    1975-01-01

    The effect of task relevance on P3 (waveform of human evoked potential) waves and the methodologies used to deal with them are outlined. Visual evoked potentials (VEPs) were recorded from normal adult subjects performing in a visual discrimination task. Subjects counted the number of presentations of the numeral 4 which was interposed rarely and randomly within a sequence of tachistoscopically flashed background stimuli. Intrusive, task-irrelevant (not counted) stimuli were also interspersed rarely and randomly in the sequence of 2s; these stimuli were of two types: simples, which were easily recognizable, and novels, which were completely unrecognizable. It was found that the simples and the counted 4s evoked posteriorly distributed P3 waves while the irrelevant novels evoked large, frontally distributed P3 waves. These large, frontal P3 waves to novels were also found to be preceded by large N2 waves. These findings indicate that the P3 wave is not a unitary phenomenon but should be considered in terms of a family of waves, differing in their brain generators and in their psychological correlates.

  15. Synchronization of random bit generators based on coupled chaotic lasers and application to cryptography.

    PubMed

    Kanter, Ido; Butkovski, Maria; Peleg, Yitzhak; Zigzag, Meital; Aviad, Yaara; Reidler, Igor; Rosenbluh, Michael; Kinzel, Wolfgang

    2010-08-16

    Random bit generators (RBGs) constitute an important tool in cryptography, stochastic simulations and secure communications. The latter in particular has some difficult requirements: high generation rate of unpredictable bit strings and secure key-exchange protocols over public channels. Deterministic algorithms generate pseudo-random number sequences at high rates; however, their unpredictability is limited by the very nature of their deterministic origin. Recently, physical RBGs based on chaotic semiconductor lasers were shown to exceed Gbit/s rates. Whether secure synchronization of two high rate physical RBGs is possible remains an open question. Here we propose a method whereby two fast RBGs based on mutually coupled chaotic lasers are synchronized. Using information theoretic analysis we demonstrate security against a powerful computational eavesdropper, capable of noiseless amplification, where all parameters are publicly known. The method is also extended to secure synchronization of a small network of three RBGs.

  16. Run charts revisited: a simulation study of run chart rules for detection of non-random variation in health care processes.

    PubMed

    Anhøj, Jacob; Olesen, Anne Vingaard

    2014-01-01

    A run chart is a line graph of a measure plotted over time with the median as a horizontal line. The main purpose of the run chart is to identify process improvement or degradation, which may be detected by statistical tests for non-random patterns in the data sequence. We studied the sensitivity to shifts and linear drifts in simulated processes using the shift, crossings and trend rules for detecting non-random variation in run charts. The shift and crossings rules are effective in detecting shifts and drifts in process centre over time while keeping the false signal rate constant around 5% and independent of the number of data points in the chart. The trend rule is virtually useless for detection of linear drift over time, the purpose it was intended for.
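
    The two quantities behind the shift and crossings rules (the longest run of points on one side of the median, and the number of times the line crosses the median) can be computed as in the Python sketch below. The abstract does not state the signal thresholds, so none are applied here; the function name and the convention of dropping points that fall exactly on the median are assumptions.

```python
from statistics import median


def run_chart_stats(values):
    """Return the median, longest one-sided run, and number of median crossings."""
    centre = median(values)
    # True = above the median, False = below; points on the median are dropped
    # (an assumed convention, not taken from the abstract).
    sides = [v > centre for v in values if v != centre]
    longest = run = crossings = 0
    previous = None
    for side in sides:
        if side == previous:
            run += 1
        else:
            if previous is not None:
                crossings += 1  # the line moved to the other side of the median
            run = 1
        longest = max(longest, run)
        previous = side
    return {
        "median": centre,
        "n_useful": len(sides),
        "longest_run": longest,
        "crossings": crossings,
    }


print(run_chart_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))  # steady drift upwards
print(run_chart_stats([1, 10, 2, 9, 3, 8]))              # rapid alternation
```

    A steadily drifting series produces one long run and a single crossing, whereas an alternating series produces short runs and many crossings; a shift or crossings rule would compare these values against limits that depend on the number of useful points.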

  17. Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

    NASA Technical Reports Server (NTRS)

    Gatlin, L. L.

    1974-01-01

    Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.

  18. Numerical study of Potts models with aperiodic modulations: influence on first-order transitions

    NASA Astrophysics Data System (ADS)

    Branco, Nilton; Girardi, Daniel

    2012-02-01

    We perform a numerical study of Potts models on a rectangular lattice with aperiodic interactions along one spatial direction. The number of states q is such that the transition is a first-order one for the uniform model. The Wolff algorithm is employed, for many lattice sizes, allowing for a finite-size scaling analysis to be carried out. Three different self-dual aperiodic sequences are employed, such that the exact critical temperature is known: this leads to precise results for the exponents. We analyze models with q=6 and 15 and show that the Harris-Luck criterion, originally introduced in the study of continuous transitions, is obeyed also for first-order ones. The new universality class that emerges for relevant aperiodic modulations depends on the number of states of the Potts model, as obtained elsewhere for random disorder, and on the aperiodic sequence. We determine the occurrence of log-periodic behavior, as expected for models with aperiodic modulated interactions.

  19. A Generative Angular Model of Protein Structure Evolution

    PubMed Central

    Golden, Michael; García-Portugués, Eduardo; Sørensen, Michael; Mardia, Kanti V.; Hamelryck, Thomas; Hein, Jotun

    2017-01-01

    Recently described stochastic models of protein evolution have demonstrated that the inclusion of structural information in addition to amino acid sequences leads to a more reliable estimation of evolutionary parameters. We present a generative, evolutionary model of protein structure and sequence that is valid on a local length scale. The model concerns the local dependencies between sequence and structure evolution in a pair of homologous proteins. The evolutionary trajectory between the two structures in the protein pair is treated as a random walk in dihedral angle space, which is modeled using a novel angular diffusion process on the two-dimensional torus. Coupling sequence and structure evolution in our model allows for modeling both “smooth” conformational changes and “catastrophic” conformational jumps, conditioned on the amino acid changes. The model has interpretable parameters and is comparatively more realistic than previous stochastic models, providing new insights into the relationship between sequence and structure evolution. For example, using the trained model we were able to identify an apparent sequence–structure evolutionary motif present in a large number of homologous protein pairs. The generative nature of our model enables us to evaluate its validity and its ability to simulate aspects of protein evolution conditioned on an amino acid sequence, a related amino acid sequence, a related structure or any combination thereof. PMID:28453724

  20. Exceptional diversity, non-random distribution, and rapid evolution of retroelements in the B73 maize genome.

    PubMed

    Baucom, Regina S; Estill, James C; Chaparro, Cristian; Upshaw, Naadira; Jogi, Ansuya; Deragon, Jean-Marc; Westerman, Richard P; Sanmiguel, Phillip J; Bennetzen, Jeffrey L

    2009-11-01

    Recent comprehensive sequence analysis of the maize genome now permits detailed discovery and description of all transposable elements (TEs) in this complex nuclear environment. Reiteratively optimized structural and homology criteria were used in the computer-assisted search for retroelements, TEs that transpose by reverse transcription of an RNA intermediate, with the final results verified by manual inspection. Retroelements were found to occupy the majority (>75%) of the nuclear genome in maize inbred B73. Unprecedented genetic diversity was discovered in the long terminal repeat (LTR) retrotransposon class of retroelements, with >400 families (>350 newly discovered) contributing >31,000 intact elements. The two other classes of retroelements, SINEs (four families) and LINEs (at least 30 families), were observed to contribute 1,991 and approximately 35,000 copies, respectively, or a combined approximately 1% of the B73 nuclear genome. With regard to fully intact elements, the median copy number for all retroelement families in maize was 2 because >250 LTR retrotransposon families contained only one or two intact members that could be detected in the B73 draft sequence. The majority, perhaps all, of the investigated retroelement families exhibited non-random dispersal across the maize genome, with LINEs, SINEs, and many low-copy-number LTR retrotransposons exhibiting a bias for accumulation in gene-rich regions. In contrast, most (but not all) medium- and high-copy-number LTR retrotransposons were found to preferentially accumulate in gene-poor regions like pericentromeric heterochromatin, while a few high-copy-number families exhibited the opposite bias. Regions of the genome with the highest LTR retrotransposon density contained the lowest LTR retrotransposon diversity. These results indicate that the maize genome provides a great number of different niches for the survival and procreation of a great variety of retroelements that have evolved to differentially occupy and exploit this genomic diversity.

  1. Bottom-up driven involuntary auditory evoked field change: constant sound sequencing amplifies but does not sharpen neural activity.

    PubMed

    Okamoto, Hidehiko; Stracke, Henning; Lagemann, Lothar; Pantev, Christo

    2010-01-01

    The capability of involuntarily tracking certain sound signals during the simultaneous presence of noise is essential in human daily life. Previous studies have demonstrated that top-down auditory focused attention can enhance excitatory and inhibitory neural activity, resulting in sharpening of frequency tuning of auditory neurons. In the present study, we investigated bottom-up driven involuntary neural processing of sound signals in noisy environments by means of magnetoencephalography. We contrasted two sound signal sequencing conditions: "constant sequencing" versus "random sequencing." Based on a pool of 16 different frequencies, either identical (constant sequencing) or pseudorandomly chosen (random sequencing) test frequencies were presented blockwise together with band-eliminated noises to nonattending subjects. The results demonstrated that the auditory evoked fields elicited in the constant sequencing condition were significantly enhanced compared with the random sequencing condition. However, the enhancement was not significantly different between different band-eliminated noise conditions. Thus the present study confirms that by constant sound signal sequencing under nonattentive listening the neural activity in human auditory cortex can be enhanced, but not sharpened. Our results indicate that bottom-up driven involuntary neural processing may mainly amplify excitatory neural networks, but may not effectively enhance inhibitory neural circuits.

  2. A yoga program for cognitive enhancement.

    PubMed

    Brunner, Devon; Abramovitch, Amitai; Etherton, Joseph

    2017-01-01

    Recent studies suggest that yoga practice may improve cognitive functioning. Although preliminary data indicate that yoga improves working memory (WM), high-resolution information about the type of WM subconstructs, namely maintenance and manipulation, is not available. Furthermore, the association between cognitive enhancement and improved mindfulness as a result of yoga practice requires empirical examination. The aim of the present study is to assess the impact of a brief yoga program on WM maintenance, WM manipulation and attentive mindfulness. Measures of WM (Digit Span Forward, Backward, and Sequencing, and Letter-Number Sequencing) were administered prior to and following 6 sessions of yoga (N = 43). Additionally, the Mindfulness Attention Awareness Scale was administered to examine the potential impact of yoga practice on mindfulness, as well as the relationships among changes in WM and mindfulness. Analyses revealed significant improvement from pre- to post-training assessment on both maintenance WM (Digit Span Forward) and manipulation WM (Digit Span Backward and Letter-Number Sequencing). No change was found on Digit Span Sequencing. Improvement was also found on mindfulness scores. However, no correlation was observed between mindfulness and WM measures. A 6-session yoga program was associated with improvement on manipulation and maintenance WM measures as well as enhanced mindfulness scores. Additional research is needed to understand the extent of yoga-related cognitive enhancement and mechanisms by which yoga may enhance cognition, ideally by utilizing randomized controlled trials and more comprehensive neuropsychological batteries.

  3. Systematic Evaluation of the Dependence of Deoxyribozyme Catalysis on Random Region Length

    PubMed Central

    Velez, Tania E.; Singh, Jaydeep; Xiao, Ying; Allen, Emily C.; Wong, On Yi; Chandra, Madhavaiah; Kwon, Sarah C.; Silverman, Scott K.

    2012-01-01

    Functional nucleic acids are DNA and RNA aptamers that bind targets, or they are deoxyribozymes and ribozymes that have catalytic activity. These functional DNA and RNA sequences can be identified from random-sequence pools by in vitro selection, which requires choosing the length of the random region. Shorter random regions allow more complete coverage of sequence space but may not permit the structural complexity necessary for binding or catalysis. In contrast, longer random regions are sampled incompletely but may allow adoption of more complicated structures that enable function. In this study, we systematically examined random region length (N20 through N60) for two particular deoxyribozyme catalytic activities, DNA cleavage and tyrosine-RNA nucleopeptide linkage formation. For both activities, we previously identified deoxyribozymes using only N40 regions. In the case of DNA cleavage, here we found that shorter N20 and N30 regions allowed robust catalytic function, either by DNA hydrolysis or by DNA deglycosylation and strand scission via β-elimination, whereas longer N50 and N60 regions did not lead to catalytically active DNA sequences. Follow-up selections with N20, N30, and N40 regions revealed an interesting interplay of metal ion cofactors and random region length. Separately, for Tyr-RNA linkage formation, N30 and N60 regions provided catalytically active sequences, whereas N20 was unsuccessful, and the N40 deoxyribozymes were functionally superior (in terms of rate and yield) to N30 and N60. Collectively, the results indicate that with future in vitro selection experiments for DNA and RNA catalysts, and by extension for aptamers, random region length should be an important experimental variable. PMID:23088677

  4. Portable and Error-Free DNA-Based Data Storage.

    PubMed

    Yazdi, S M Hossein Tabatabaei; Gabrys, Ryan; Milenkovic, Olgica

    2017-07-10

    DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures enable reading and writing but do not offer random-access and error-free data recovery from low-cost, portable devices, which is crucial for making the storage technology competitive with classical recorders. Here we show for the first time that a portable, random-access platform may be implemented in practice using nanopore sequencers. The novelty of our approach is to design an integrated processing pipeline that encodes data to avoid costly synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable sequencing via new iterative alignment and deletion error-correcting codes. Our work represents the only known random access DNA-based data storage system that uses error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density. As such, it represents a crucial step towards practical employment of DNA molecules as storage media.

  5. Multistate Lempel-Ziv (MLZ) index interpretation as a measure of amplitude and complexity changes.

    PubMed

    Sarlabous, Leonardo; Torres, Abel; Fiz, Jose A; Gea, Joaquim; Galdiz, Juan B; Jane, Raimon

    2009-01-01

    The Lempel-Ziv complexity (LZ) has been widely used to evaluate the randomness of finite sequences. In general, the LZ complexity has been used to determine the complexity grade present in biomedical signals. The LZ complexity is not able to discern between signals with different amplitude variations and similar random components. On the other hand, amplitude parameters, such as the root mean square (RMS), are not able to discern between signals with similar power distributions and different random components. In this work, we present a novel method to quantify amplitude and complexity variations in biomedical signals by means of the computation of the LZ coefficient using more than two quantification states, and with thresholds fixed and independent of the dynamic range or standard deviation of the analyzed signal: the Multistate Lempel-Ziv (MLZ) index. Our results indicate that the MLZ index with few quantification levels evaluates only the complexity changes of the signal; with a high number of levels, the amplitude variations; and with an intermediate number of levels, both amplitude and complexity variations. The study performed on diaphragmatic mechanomyographic signals shows that the amplitude variations of this signal are more correlated with the respiratory effort than the complexity variations. Furthermore, it has been observed that the MLZ index with a high number of levels is practically unaffected by impulsive, sinusoidal, constant and Gaussian noise compared with the RMS amplitude parameter.
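
    As a rough illustration of the idea, the sketch below quantizes a signal against fixed thresholds and counts Lempel-Ziv (1976-style) phrases in the resulting symbol sequence. It is a minimal sketch, not the authors' implementation: the function names, the example thresholds and the absence of any normalization are all assumptions.

```python
from bisect import bisect_right


def quantize(signal, thresholds):
    """Map each sample to a state index using fixed, sorted thresholds."""
    return [bisect_right(thresholds, x) for x in signal]


def lz76_complexity(symbols):
    """Count phrases in an LZ76-style exhaustive parsing (unnormalized)."""
    s = "".join(chr(65 + v) for v in symbols)  # one character per state
    n = len(s)
    i, phrases = 0, 0
    while i < n:
        length = 1
        # Extend the phrase while it can still be copied from earlier text
        # (the copy may overlap the phrase itself).
        while i + length <= n and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases


# Hypothetical signal and thresholds, chosen only for illustration.
signal = [0.1, 0.2, 0.9, 1.6, 0.3, 1.1, 0.2, 1.8, 0.4, 0.1]
for thresholds in ([1.0], [0.5, 1.0, 1.5]):
    states = quantize(signal, thresholds)
    print(len(thresholds) + 1, "states:", states, "->", lz76_complexity(states))
```

    Because the thresholds are fixed and not derived from the signal's range, rescaling the signal changes the symbol sequence, which is how a multistate index can respond to amplitude as well as complexity.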

  6. Long-range correlations and charge transport properties of DNA sequences

    NASA Astrophysics Data System (ADS)

    Liu, Xiao-liang; Ren, Yi; Xie, Qiong-tao; Deng, Chao-sheng; Xu, Hui

    2010-04-01

    By using Hurst's analysis and transfer approach, the rescaled range functions and Hurst exponents of human chromosome 22 and enterobacteria phage lambda DNA sequences are investigated and the transmission coefficients, Landauer resistances and Lyapunov coefficients of finite segments based on above genomic DNA sequences are calculated. In a comparison with quasiperiodic and random artificial DNA sequences, we find that λ-DNA exhibits anticorrelation behavior characterized by a Hurst exponent 0.5

  7. Discovery of Genome-Wide Microsatellite Markers in Scombridae: A Pilot Study on Albacore Tuna

    PubMed Central

    Nikolic, Natacha; Duthoy, Stéphanie; Destombes, Antoine; Bodin, Nathalie; West, Wendy; Puech, Alexis; Bourjea, Jérôme

    2015-01-01

    Recent developments in sequencing technologies and bioinformatics analysis provide a greater amount of DNA sequencing reads at a low cost. Microsatellites are the markers of choice for a variety of population genetic studies, and high quality markers can be discovered in non-model organisms, such as tuna, with these recent developments. Here, we use a high-throughput method to isolate microsatellite markers in albacore tuna, Thunnus alalunga, based on coupling multiplex enrichment and next-generation sequencing on 454 GS-FLX Titanium pyrosequencing. The crucial minimum number of polymorphic markers to infer evolutionary and ecological processes for this species has been described for the first time. We provide 1670 microsatellite design primer pairs, and technical and molecular genetics selection resulting in 43 polymorphic microsatellite markers. On this panel, we characterized 34 random and selectively neutral markers («neutral») and 9 «non-neutral» markers. The variability of «neutral» markers was screened with 136 individuals of albacore tuna from southwest Indian Ocean (42), northwest Indian Ocean (31), South Africa (31), and southeast Atlantic Ocean (32). Power analysis demonstrated that the panel of genetic markers can be applied in diversity and population genetics studies. Global genetic diversity for albacore was high with a mean number of alleles at 16.94; observed heterozygosity 66% and expected heterozygosity 77%. The number of individuals was insufficient to provide accurate results on differentiation. Of the 9 «non-neutral» markers, 3 were linked to a sequence of known function. The one is located to a sequence having an immunity function (ThuAla-Tcell-01) and the other to a sequence having energy allocation function (ThuAla-Hki-01). These two markers were genotyped on the 136 individuals and presented different diversity levels. ThuAla-Tcell-01 has a high number of alleles (20), heterozygosity (87–90%), and assignment index. 
    ThuAla-Hki-01 has a lower number of alleles (9), low heterozygosity (24–27%), low assignment index and significant inbreeding. Finally, the 34 «neutral» and 3 «non-neutral» microsatellite markers were tested on four economically important Scombridae species: Thunnus albacares, Thunnus thynnus, Thunnus obesus, and Acanthocybium solandri. PMID:26544051

  8. A general strategy for cloning viroids and other small circular RNAs that uses minimal amounts of template and does not require prior knowledge of its sequence.

    PubMed

    Navarro, B; Daròs, J A; Flores, R

    1996-01-01

    Two PCR-based methods are described for obtaining clones of small circular RNAs of unknown sequence and for which only minute amounts are available. To avoid introducing any assumption about the RNA sequence, synthesis of the cDNAs is initiated with random primers. The cDNA population is then PCR-amplified using a primer whose sequence is present at both sides of the cDNAs, since they have been obtained with random hexamers and then a linker with the sequence of the PCR primer has been ligated to their termini, or because the cDNAs have been synthesized with an oligonucleotide that contains the sequence of the PCR primer at its 5' end and six randomized positions at its 3' end. The procedures need only approximately 50 ng of purified RNA template. The reasons for the emergence of cloning artifacts and precautions to avoid them are discussed.

  9. Oligo Design: a computer program for development of probes for oligonucleotide microarrays.

    PubMed

    Herold, Keith E; Rasooly, Avraham

    2003-12-01

    Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
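
    The relationship between melting temperature and GC content mentioned above can be explored with a few lines of code. The sketch below uses the Wallace rule (2 °C per A/T, 4 °C per G/C), a common rule of thumb for short oligonucleotides; the abstract does not say which melting-temperature model Oligo Design uses, so this choice and all function names are assumptions.

```python
import random


def random_dna(length, gc_fraction, rng):
    """Draw a random DNA sequence with an expected GC fraction."""
    return "".join(
        rng.choice("GC") if rng.random() < gc_fraction else rng.choice("AT")
        for _ in range(length)
    )


def gc_content(seq):
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Wallace-rule melting temperature estimate in degrees Celsius."""
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


rng = random.Random(0)  # seeded so the run is repeatable
for target_gc in (0.2, 0.5, 0.8):
    seq = random_dna(12, target_gc, rng)
    print(seq, f"GC={gc_content(seq):.2f}", f"Tm~{wallace_tm(seq)} C")
```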

  10. Secure self-calibrating quantum random-bit generator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fiorentino, M.; Santori, C.; Spillane, S. M.

    2007-03-15

    Random-bit generators (RBGs) are key components of a variety of information processing applications ranging from simulations to cryptography. In particular, cryptographic systems require 'strong' RBGs that produce high-entropy bit sequences, but traditional software pseudo-RBGs have very low entropy content and therefore are relatively weak for cryptography. Hardware RBGs yield entropy from chaotic or quantum physical systems and therefore are expected to exhibit high entropy, but in current implementations their exact entropy content is unknown. Here we report a quantum random-bit generator (QRBG) that harvests entropy by measuring single-photon and entangled two-photon polarization states. We introduce and implement a quantum tomographic method to measure a lower bound on the 'min-entropy' of the system, and we employ this value to distill a truly random-bit sequence. This approach is secure: even if an attacker takes control of the source of optical states, a secure random sequence can be distilled.
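
    For readers unfamiliar with min-entropy, the sketch below evaluates its standard definition, H_min = -log2(max p), for a biased bit and shows how a lower bound on it limits the number of nearly uniform bits that can be distilled. It only illustrates the quantity; it does not reproduce the tomographic estimation described in the abstract, and the function names and example probabilities are assumptions.

```python
import math


def min_entropy(probabilities):
    """Min-entropy in bits of a discrete distribution: -log2(max p)."""
    if not math.isclose(sum(probabilities), 1.0, abs_tol=1e-9):
        raise ValueError("probabilities must sum to 1")
    return -math.log2(max(probabilities))


def extractable_bits(n_raw_bits, h_min_per_bit):
    """Idealized upper bound on distilled bits from n raw bits."""
    return math.floor(n_raw_bits * h_min_per_bit)


# Hypothetical biases for a single raw bit.
for p_one in (0.5, 0.6, 0.9):
    h = min_entropy([p_one, 1.0 - p_one])
    print(f"P(1)={p_one}: H_min={h:.3f} bits, "
          f"at most {extractable_bits(1000, h)} bits from 1000 raw bits")
```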

  11. Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences.

    PubMed

    Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter

    2014-01-13

    Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. It allows on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone will not be sufficient to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters of choice to be included in the selection of potential miRNA candidates for experimental verification.
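
    Once the mean and standard deviation of the random-sequence MFE distribution are known for a given composition, the P-value follows directly from the normal distribution. The sketch below shows that final step only; the interpolation over pre-computed sets is not reproduced, and the function name and numeric values are hypothetical placeholders, not data from the paper.

```python
import math


def mfe_p_value(candidate_mfe, random_mean, random_sd):
    """Return (z-score, lower-tail P-value) under a normal model.

    A small P-value means the candidate's MFE is unusually low compared
    with randomized sequences of the same composition.
    """
    z = (candidate_mfe - random_mean) / random_sd
    p = 0.5 * math.erfc(-z / math.sqrt(2.0))  # standard normal CDF at z
    return z, p


# Placeholder values in kcal/mol, chosen only for illustration.
z, p = mfe_p_value(candidate_mfe=-35.0, random_mean=-20.0, random_sd=5.0)
print(f"z = {z:.2f}, P = {p:.5f}")
```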

  12. Constructing high complexity synthetic libraries of long ORFs using in vitro selection

    NASA Technical Reports Server (NTRS)

    Cho, G.; Keefe, A. D.; Liu, R.; Wilson, D. S.; Szostak, J. W.

    2000-01-01

    We present a method that can significantly increase the complexity of protein libraries used for in vitro or in vivo protein selection experiments. Protein libraries are often encoded by chemically synthesized DNA, in which part of the open reading frame is randomized. There are, however, major obstacles associated with the chemical synthesis of long open reading frames, especially those containing random segments. Insertions and deletions that occur during chemical synthesis cause frameshifts, and stop codons in the random region will cause premature termination. These problems can together greatly reduce the number of full-length synthetic genes in the library. We describe a strategy in which smaller segments of the synthetic open reading frame are selected in vitro using mRNA display for the absence of frameshifts and stop codons. These smaller segments are then ligated together to form combinatorial libraries of long uninterrupted open reading frames. This process can increase the number of full-length open reading frames in libraries by up to two orders of magnitude, resulting in protein libraries with complexities of greater than 10^13. We have used this methodology to generate three types of displayed protein library: a completely random sequence library, a library of concatemerized oligopeptide cassettes with a propensity for forming amphipathic alpha-helical or beta-strand structures, and a library based on one of the most common enzymatic scaffolds, the alpha/beta (TIM) barrel. Copyright 2000 Academic Press.

  13. Mitochondrial genomic variation associated with higher mitochondrial copy number: the Cache County Study on Memory Health and Aging.

    PubMed

    Ridge, Perry G; Maxwell, Taylor J; Foutz, Spencer J; Bailey, Matthew H; Corcoran, Christopher D; Tschanz, JoAnn T; Norton, Maria C; Munger, Ronald G; O'Brien, Elizabeth; Kerber, Richard A; Cawthon, Richard M; Kauwe, John S K

    2014-01-01

    The mitochondria are essential organelles and are the location of cellular respiration, which is responsible for the majority of ATP production. Each cell contains multiple mitochondria, and each mitochondrion contains multiple copies of its own circular genome. The ratio of mitochondrial genomes to nuclear genomes is referred to as mitochondrial copy number. Decreases in mitochondrial copy number are known to occur in many tissues as people age, and in certain diseases. The regulation of mitochondrial copy number by nuclear genes has been studied extensively. While mitochondrial variation has been associated with longevity and some of the diseases known to have reduced mitochondrial copy number, the role that the mitochondrial genome itself has in regulating mitochondrial copy number remains poorly understood. We analyzed the complete mitochondrial genomes from 1007 individuals randomly selected from the Cache County Study on Memory Health and Aging utilizing the inferred evolutionary history of the mitochondrial haplotypes present in our dataset to identify sequence variation and mitochondrial haplotypes associated with changes in mitochondrial copy number. Three variants belonging to mitochondrial haplogroups U5A1 and T2 were significantly associated with higher mitochondrial copy number in our dataset. We identified three variants associated with higher mitochondrial copy number and suggest several hypotheses for how these variants influence mitochondrial copy number by interacting with known regulators of mitochondrial copy number. Our results are the first to report sequence variation in the mitochondrial genome that causes changes in mitochondrial copy number. The identification of these variants that increase mtDNA copy number has important implications in understanding the pathological processes that underlie these phenotypes.

  14. Emergence of Complexity in Protein Functions and Metabolic Networks

    NASA Technical Reports Server (NTRS)

    Pohorille, Andzej

    2009-01-01

    In modern organisms proteins perform a majority of cellular functions, such as chemical catalysis, energy transduction and transport of material across cell walls. Although great strides have been made towards understanding protein evolution, a meaningful extrapolation from contemporary proteins to their earliest ancestors is virtually impossible. In an alternative approach, the origin of water-soluble proteins was probed through the synthesis of very large libraries of random amino acid sequences and subsequently subjecting them to in vitro evolution. In combination with computer modeling and simulations, these experiments allow us to address a number of fundamental questions about the origins of proteins. Can functionality emerge from random sequences of proteins? How did the initial repertoire of functional proteins diversify to facilitate new functions? Did this diversification proceed primarily through drawing novel functionalities from random sequences or through evolution of already existing proto-enzymes? Did protein evolution start from a pool of proteins defined by a frozen accident and other collections of proteins could start a different evolutionary pathway? Although we do not have definitive answers to these questions, important clues have been uncovered. Considerable progress has been also achieved in understanding the origins of membrane proteins. We will address this issue in the example of ion channels - proteins that mediate transport of ions across cell walls. Remarkably, despite overall complexity of these proteins in contemporary cells, their structural motifs are quite simple, with α-helices being most common. By combining results of experimental and computer simulation studies on synthetic models and simple, natural channels, I will show that, even though architectures of membrane proteins are not nearly as diverse as those of water-soluble proteins, they are sufficiently flexible to adapt readily to the functional demands arising during evolution.

  15. Environmental distribution, abundance and activity of the Miscellaneous Crenarchaeotal Group

    NASA Astrophysics Data System (ADS)

    Lloyd, K. G.; Biddle, J.; Teske, A.

    2011-12-01

    Many marine sedimentary microbes have only been identified by 16S rRNA sequences. Consequently, little is known about the types of metabolism, activity levels, or relative abundance of these groups in marine sediments. We found that one of these uncultured groups, called the Miscellaneous Crenarchaeotal Group (MCG), dominated clone libraries made from reverse transcribed 16S rRNA, and 454 pyrosequenced 16S rRNA genes, in the White Oak River estuary. Primers suitable for quantitative PCR were developed for MCG and used to show that 16S rRNA DNA copy numbers from MCG account for nearly all the archaeal 16S rRNA genes present. RT-qPCR shows much less MCG rRNA than total archaeal rRNA, but comparisons of different primers for each group suggest bias in the RNA-based work relative to the DNA-based work. There is no evidence of a population shift with depth below the sulfate-methane transition zone, suggesting that the metabolism of MCG may not be tied to sulfur or methane cycles. We classified 2,771 new sequences within the SSU Silva 106 database that, along with the classified sequences in the Silva database was used to make an MCG database of 4,646 sequences that allowed us to increase the named subgroups of MCG from 7 to 19. Percent terrestrial sequences in each subgroup is positively correlated with percent of the marine sequences that are nearshore, suggesting that membership in the different subgroups is not random, but dictated by environmental selective pressures. Given their high phylogenetic diversity, ubiquitous distribution in anoxic environments, and high DNA copy number relative to total archaea, members of MCG are most likely anaerobic heterotrophs who are integral to the post-depositional marine carbon cycle.

  16. Authentication of Cordyceps sinensis by DNA Analyses: Comparison of ITS Sequence Analysis and RAPD-Derived Molecular Markers.

    PubMed

    Lam, Kelly Y C; Chan, Gallant K L; Xin, Gui-Zhong; Xu, Hong; Ku, Chuen-Fai; Chen, Jian-Ping; Yao, Ping; Lin, Huang-Quan; Dong, Tina T X; Tsim, Karl W K

    2015-12-15

    Cordyceps sinensis is an endoparasitic fungus widely used as a tonic and medicinal food in the practice of traditional Chinese medicine (TCM). In historical usage, Cordyceps refers specifically to the species C. sinensis. However, a number of closely related species are also named Cordyceps and are commonly sold as C. sinensis. The substitutes and adulterants of C. sinensis are often introduced either intentionally or accidentally in the herbal market, which seriously affects the therapeutic effects or even leads to life-threatening poisoning. Here, we aim to identify Cordyceps by DNA sequencing technology. Two different DNA-based approaches were compared. The internal transcribed spacer (ITS) sequences and the random amplified polymorphic DNA (RAPD)-sequence characterized amplified region (SCAR) were developed here to authenticate different species of Cordyceps. Both approaches generally enabled discrimination of C. sinensis from others. The application of the two methods, supporting each other, increases the security of identification. For better reproducibility and faster analysis, the SCAR markers derived from the RAPD results provide a new method for quick authentication of Cordyceps.

  17. The Effect of Practice Schedule on Context-Dependent Learning.

    PubMed

    Lee, Ya-Yun; Fisher, Beth E

    2018-03-02

    It is well established that random practice compared to blocked practice enhances motor learning. Additionally, while information in the environment may be incidental, learning is also enhanced when an individual performs a task within the same environmental context in which the task was originally practiced. This study aimed to disentangle the effects of practice schedule and incidental/environmental context on motor learning. Participants practiced three finger sequences under either a random or blocked practice schedule. Each sequence was associated with specific incidental context (i.e., color and location on the computer screen) during practice. The participants were tested under the conditions when the sequence-context associations remained the same or were changed from that of practice. When the sequence-context association was changed, the participants who practiced under blocked schedule demonstrated greater performance decrement than those who practiced under random schedule. The findings suggested that those participants who practiced under random schedule were more resistant to the change of environmental context.

  18. Molecular analysis of the microbial diversity present in the colonic wall, colonic lumen, and cecal lumen of a pig.

    PubMed

    Pryde, S E; Richardson, A J; Stewart, C S; Flint, H J

    1999-12-01

    Random clones of 16S ribosomal DNA gene sequences were isolated after PCR amplification with eubacterial primers from total genomic DNA recovered from samples of the colonic lumen, colonic wall, and cecal lumen from a pig. Sequences were also obtained for cultures isolated anaerobically from the same colonic-wall sample. Phylogenetic analysis showed that many sequences were related to those of Lactobacillus or Streptococcus spp. or fell into clusters IX, XIVa, and XI of gram-positive bacteria. In addition, 59% of randomly cloned sequences showed less than 95% similarity to database entries or sequences from cultivated organisms. Cultivation bias is also suggested by the fact that the majority of isolates (54%) recovered from the colon wall by culturing were related to Lactobacillus and Streptococcus, whereas this group accounted for only one-third of the sequence variation for the same sample from random cloning. The remaining cultured isolates were mainly Selenomonas related. A higher proportion of Lactobacillus reuteri-related sequences than of Lactobacillus acidophilus- and Lactobacillus amylovorus-related sequences were present in the colonic-wall sample. Since the majority of bacterial ribosomal sequences recovered from the colon wall are less than 95% related to known organisms, the roles of many of the predominant wall-associated bacteria remain to be defined.

  19. Molecular Analysis of the Microbial Diversity Present in the Colonic Wall, Colonic Lumen, and Cecal Lumen of a Pig

    PubMed Central

    Pryde, Susan E.; Richardson, Anthony J.; Stewart, Colin S.; Flint, Harry J.

    1999-01-01

    Random clones of 16S ribosomal DNA gene sequences were isolated after PCR amplification with eubacterial primers from total genomic DNA recovered from samples of the colonic lumen, colonic wall, and cecal lumen from a pig. Sequences were also obtained for cultures isolated anaerobically from the same colonic-wall sample. Phylogenetic analysis showed that many sequences were related to those of Lactobacillus or Streptococcus spp. or fell into clusters IX, XIVa, and XI of gram-positive bacteria. In addition, 59% of randomly cloned sequences showed less than 95% similarity to database entries or sequences from cultivated organisms. Cultivation bias is also suggested by the fact that the majority of isolates (54%) recovered from the colon wall by culturing were related to Lactobacillus and Streptococcus, whereas this group accounted for only one-third of the sequence variation for the same sample from random cloning. The remaining cultured isolates were mainly Selenomonas related. A higher proportion of Lactobacillus reuteri-related sequences than of Lactobacillus acidophilus- and Lactobacillus amylovorus-related sequences were present in the colonic-wall sample. Since the majority of bacterial ribosomal sequences recovered from the colon wall are less than 95% related to known organisms, the roles of many of the predominant wall-associated bacteria remain to be defined. PMID:10583991

  20. Selection of Optimal Polypurine Tract Region Sequences during Moloney Murine Leukemia Virus Replication

    PubMed Central

    Robson, Nicole D.; Telesnitsky, Alice

    2000-01-01

    Retrovirus plus-strand synthesis is primed by a cleavage remnant of the polypurine tract (PPT) region of viral RNA. In this study, we tested replication properties for Moloney murine leukemia viruses with targeted mutations in the PPT and in conserved sequences upstream, as well as for pools of mutants with randomized sequences in these regions. The importance of maintaining some purine residues within the PPT was indicated both by examining the evolution of random PPT pools and from the replication properties of targeted mutants. Although many different PPT sequences could support efficient replication and one mutant that contained two differences in the core PPT was found to replicate as well as the wild type, some sequences in the core PPT clearly conferred advantages over others. Contributions of sequences upstream of the core PPT were examined with deletion mutants. A conserved T-stretch within the upstream sequence was examined in detail and found to be unimportant to helper functions. Evolution of virus pools containing randomized T-stretch sequences demonstrated marked preference for the wild-type sequence in six of its eight positions. These findings demonstrate that maintenance of the T-rich element is more important to viral replication than is maintenance of the core PPT. PMID:11044073

  1. Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities.

    PubMed

    Gilbert, Jack A; Field, Dawn; Huang, Ying; Edwards, Rob; Li, Weizhong; Gilna, Paul; Joint, Ian

    2008-08-22

    Sequencing the expressed genetic information of an ecosystem (metatranscriptome) can provide information about the response of organisms to varying environmental conditions. Until recently, metatranscriptomics has been limited to microarray technology and random cloning methodologies. The application of high-throughput sequencing technology is now enabling access to both known and previously unknown transcripts in natural communities. We present a study of a complex marine metatranscriptome obtained from random whole-community mRNA using the GS-FLX Pyrosequencing technology. Eight samples, four DNA and four mRNA, were processed from two time points in a controlled coastal ocean mesocosm study (Bergen, Norway) involving an induced phytoplankton bloom producing a total of 323,161,989 base pairs. Our study confirms the finding of the first published metatranscriptomic studies of marine and soil environments that metatranscriptomics targets highly expressed sequences which are frequently novel. Our alternative methodology increases the range of experimental options available for conducting such studies and is characterized by an exceptional enrichment of mRNA (99.92%) versus ribosomal RNA. Analysis of corresponding metagenomes confirms much higher levels of assembly in the metatranscriptomic samples and a far higher yield of large gene families with >100 members, approximately 91% of which were novel. This study provides further evidence that metatranscriptomic studies of natural microbial communities are not only feasible, but when paired with metagenomic data sets, offer an unprecedented opportunity to explore both structure and function of microbial communities--if we can overcome the challenges of elucidating the functions of so many never-seen-before gene families.

  2. Identification of peptide sequences that target to the brain using in vivo phage display.

    PubMed

    Li, Jingwei; Zhang, Qizhi; Pang, Zhiqing; Wang, Yuchen; Liu, Qingfeng; Guo, Liangran; Jiang, Xinguo

    2012-06-01

    Phage display technology could provide a rapid means for the discovery of novel peptides. To find peptide ligands specific for the brain vascular receptors, we performed a modified phage display method. Phages were recovered from mouse brain parenchyma after intravenous administration of a random 7-mer peptide library. A longer circulation time was arranged according to the biodistributive brain/blood ratios of phage particles. Following sequential rounds of isolation, a number of phages were sequenced and a peptide sequence (CTSTSAPYC, denoted as PepC7) was identified. Clone 7-1, which encodes PepC7, exhibited translocation efficiency about 41-fold higher than the random library phage. Immunofluorescence analysis revealed that Clone 7-1 was significantly superior to native M13 phage in transport efficiency into the brain. Clone 7-1 was inhibited from homing to the brain in a dose-dependent fashion when cyclic peptides of the same sequence were present in a competition assay. Interestingly, the linear peptide (ATSTSAPYA, Pep7) and a scrambled control peptide PepSC7 (CSPATSYTC) did not compete with the phage at the same tested concentration (0.2-200 pg). Labeled by Cy5.5, PepC7 exhibited significant brain-targeting capability in in vivo optical imaging analysis. The cyclic conformation of PepC7 formed by the disulfide bond, and the correct structure itself, play a critical role in maintaining the selectivity and affinity for the brain. In conclusion, PepC7 is a promising brain-targeting motif that has not been reported before, and it could be applied to targeted drug delivery into the brain.

  3. Rhythm sensitivity in macaque monkeys

    PubMed Central

    Selezneva, Elena; Deike, Susann; Knyazeva, Stanislava; Scheich, Henning; Brechmann, André; Brosch, Michael

    2013-01-01

    This study provides evidence that monkeys are rhythm sensitive. We composed isochronous tone sequences consisting of repeating triplets of two short tones and one long tone which humans perceive as repeating triplets of two weak and one strong beat. This regular sequence was compared to an irregular sequence with the same number of randomly arranged short and long tones with no such beat structure. To search for indication of rhythm sensitivity we employed an oddball paradigm in which occasional duration deviants were introduced in the sequences. In a pilot study on humans we showed that subjects more easily detected these deviants when they occurred in a regular sequence. In the monkeys we searched for spontaneous behaviors the animals executed concomitant with the deviants. We found that monkeys more frequently exhibited changes of gaze and facial expressions to the deviants when they occurred in the regular sequence compared to the irregular sequence. In addition we recorded neuronal firing and local field potentials from 175 sites of the primary auditory cortex during sequence presentation. We found that both types of neuronal signals differentiated regular from irregular sequences. Both signals were stronger in regular sequences and occurred after the onset of the long tones, i.e., at the position of the strong beat. Local field potential responses were also significantly larger for the durational deviants in regular sequences, yet in a later time window. We speculate that these temporal pattern-selective mechanisms with a focus on strong beats and their deviants underlie the perception of rhythm in the chosen sequences. PMID:24046732

  4. Efficient error correction for next-generation sequencing of viral amplicons

    PubMed Central

    2012-01-01

    Background: Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. Results: In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Conclusions: Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm PMID:22759430
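
    The abstract does not describe how KEC works internally, so the following is only a minimal sketch of the general k-mer-spectrum idea that k-mer-based correctors build on: k-mers seen very rarely across the reads are treated as likely sequencing errors. The function names, the toy reads, and the `min_count` threshold are all hypothetical and are not taken from the KEC implementation.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every k-mer across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def suspicious_positions(read, counts, k, min_count):
    """Start positions of k-mers in `read` that occur fewer than `min_count` times."""
    return [i for i in range(len(read) - k + 1)
            if counts[read[i:i + k]] < min_count]


# Toy data (hypothetical): five identical reads plus one read with a
# single substitution (G -> C) at index 6.
reads = ["ACGTACGTAC"] * 5 + ["ACGTACCTAC"]
counts = kmer_counts(reads, k=4)
flagged = suspicious_positions(reads[-1], counts, k=4, min_count=2)
```

    In this toy case the flagged k-mers are exactly those that overlap the substituted base, which is what makes rare k-mers a usable signal for locating errors. A real corrector would also need to handle homopolymer-dependent error rates, which the abstract identifies as important for 454 data.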

  5. Efficient error correction for next-generation sequencing of viral amplicons.

    PubMed

    Skums, Pavel; Dimitrova, Zoya; Campo, David S; Vaughan, Gilberto; Rossi, Livia; Forbi, Joseph C; Yokosawa, Jonny; Zelikovsky, Alex; Khudyakov, Yury

    2012-06-25

    Next-generation sequencing allows the analysis of an unprecedented number of viral sequence variants from infected patients, presenting a novel opportunity for understanding virus evolution, drug resistance and immune escape. However, sequencing in bulk is error prone. Thus, the generated data require error identification and correction. Most error-correction methods to date are not optimized for amplicon analysis and assume that the error rate is randomly distributed. Recent quality assessment of amplicon sequences obtained using 454-sequencing showed that the error rate is strongly linked to the presence and size of homopolymers, position in the sequence and length of the amplicon. All these parameters are strongly sequence specific and should be incorporated into the calibration of error-correction algorithms designed for amplicon sequencing. In this paper, we present two new efficient error correction algorithms optimized for viral amplicons: (i) k-mer-based error correction (KEC) and (ii) empirical frequency threshold (ET). Both were compared to a previously published clustering algorithm (SHORAH), in order to evaluate their relative performance on 24 experimental datasets obtained by 454-sequencing of amplicons with known sequences. All three algorithms show similar accuracy in finding true haplotypes. However, KEC and ET were significantly more efficient than SHORAH in removing false haplotypes and estimating the frequency of true ones. Both algorithms, KEC and ET, are highly suitable for rapid recovery of error-free haplotypes obtained by 454-sequencing of amplicons from heterogeneous viruses. The implementations of the algorithms and data sets used for their testing are available at: http://alan.cs.gsu.edu/NGS/?q=content/pyrosequencing-error-correction-algorithm.

  6. Perceptions of randomness in binary sequences: Normative, heuristic, or both?

    PubMed

    Reimers, Stian; Donkin, Chris; Le Pelley, Mike E

    2018-03-01

    When people consider a series of random binary events, such as tossing an unbiased coin and recording the sequence of heads (H) and tails (T), they tend to erroneously rate sequences with less internal structure or order (such as HTTHT) as more probable than sequences containing more structure or order (such as HHHHH). This is traditionally explained as a local representativeness effect: Participants assume that the properties of long sequences of random outcomes-such as an equal proportion of heads and tails, and little internal structure-should also apply to short sequences. However, recent theoretical work has noted that the probability of a particular sequence of say, heads and tails of length n, occurring within a larger (>n) sequence of coin flips actually differs by sequence, so P(HHHHH)
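
    The point that a pattern's chance of occurring somewhere within a longer run of flips depends on the pattern can be checked by brute force. The sketch below enumerates every equally likely string of n fair coin flips and counts those containing a given pattern; the function name and the choice of n = 10 are illustrative assumptions, not part of the study.

```python
from itertools import product


def occurrence_probability(pattern, n):
    """Exact probability that `pattern` appears at least once in n fair coin flips."""
    hits = sum(pattern in "".join(flips) for flips in product("HT", repeat=n))
    return hits / 2 ** n


# Compare a highly structured pattern with a less structured one
# over the same number of flips (n = 10 is an arbitrary small choice).
p_run = occurrence_probability("HHHHH", 10)
p_mixed = occurrence_probability("HTTHT", 10)
```

    Both patterns have the same probability as an isolated five-flip outcome, yet in this enumeration the all-heads run appears in fewer of the length-10 strings than the mixed pattern. The reason is that occurrences of a self-overlapping pattern cluster together, so the same expected number of occurrences is spread over fewer strings.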

  7. Bone-eating Osedax females and their 'harems' of dwarf males are recruited from a common larval pool.

    PubMed

    Vrijenhoek, R C; Johnson, S B; Rouse, G W

    2008-10-01

    Extreme male dwarfism occurs in Osedax (Annelida: Siboglinidae), marine worms with sessile females that bore into submerged bones. Osedax are hypothesized to use environmental sex determination, in which undifferentiated larvae that settle on bones develop as females, and subsequent larvae that settle on females transform into dwarf males. This study addresses several hypotheses regarding possible recruitment sources for the males: (i) common larval pool--males and females are sampled from a common pool of larvae; (ii) neighbourhood--males are supplied by a limited number of neighbouring females; and (iii) arrhenotoky--males are primarily the sons of host females. Osedax rubiplumus were sampled from submerged whalebones located at 1820-m and 2893-m depths in Monterey Bay, California. Immature females typically did not host males, but mature females maintained male 'harems' that grew exponentially in the number of males as female size increased. Allozyme analysis of the females revealed binomial proportions of nuclear genotypes, an indication of random sexual mating. Analysis of mitochondrial DNA sequences from the male harems and their host females allowed us to reject the arrhenotoky and neighbourhood hypotheses for male recruitment. No significant partitioning of mitochondrial diversity existed between the male and female sexes, or between subsamples of worms collected at different depths or during different years (2002-2007). Mitochondrial sequence diversity was very high in these worms, suggesting that as many as 10^6 females contributed to a common larval pool from which the two sexes were randomly drawn.

  8. Redshift data and statistical inference

    NASA Technical Reports Server (NTRS)

    Newman, William I.; Haynes, Martha P.; Terzian, Yervant

    1994-01-01

    Frequency histograms and the 'power spectrum analysis' (PSA) method, the latter developed by Yu & Peebles (1969), have been widely employed as techniques for establishing the existence of periodicities. We provide a formal analysis of these two classes of methods, including controlled numerical experiments, to better understand their proper use and application. In particular, we note that typical published applications of frequency histograms commonly employ far greater numbers of class intervals or bins than is advisable by statistical theory, sometimes giving rise to the appearance of spurious patterns. The PSA method generates a sequence of random numbers from observational data which, it is claimed, is exponentially distributed with unit mean and variance, essentially independent of the distribution of the original data. We show that the derived random process is nonstationary and produces a small but systematic bias in the usual estimate of the mean and variance. Although the derived variable may be reasonably described by an exponential distribution, the tail of the distribution is far removed from that of an exponential, thereby rendering statistical inference and confidence testing based on the tail of the distribution completely unreliable. Finally, we examine a number of astronomical examples wherein these methods have been used, giving rise to widespread acceptance of statistically unconfirmed conclusions.
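
    The abstract does not give the PSA formula. As an assumption, the sketch below uses one standard periodogram-type statistic with the property described: the squared magnitude of the summed unit phasors, divided by the number of data points. For independent, uniformly scattered phases this quantity is approximately exponentially distributed with unit mean. The function name and the data are hypothetical.

```python
import cmath
import math


def periodicity_power(values, period):
    """Normalized power |sum_j exp(2*pi*i*x_j/period)|^2 / N for a trial period."""
    total = sum(cmath.exp(2j * math.pi * x / period) for x in values)
    return abs(total) ** 2 / len(values)


# Hypothetical data: values spaced exactly one trial period apart
# add up coherently, so the power equals the number of points N.
aligned = [0.0, 1.0, 2.0, 3.0]
power_aligned = periodicity_power(aligned, period=1.0)

# Two values half a period apart cancel, giving power close to zero.
power_cancel = periodicity_power([0.0, 0.5], period=1.0)
```

    A large power at some trial period is what gets read as evidence of periodicity. The abstract's warning concerns exactly that step: a significance level taken from the tail of an assumed exponential distribution can be unreliable.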

  9. Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences

    PubMed Central

    Groves, Benjamin; Kuchina, Anna; Rosenberg, Alexander B.; Jojic, Nebojsa; Fields, Stanley; Seelig, Georg

    2017-01-01

    Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs as well as native S. cerevisiae 5′ UTRs. The model additionally was used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model. PMID:29097404
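
    To make the setup concrete, the sketch below generates a random 50-nucleotide sequence and converts it to a one-hot matrix, a common input format for convolutional networks over DNA. The abstract does not state which encoding the authors used, so the one-hot choice, the function names, and the fixed seed are assumptions for illustration only.

```python
import random

ALPHABET = "ACGT"


def random_utr(length=50, rng=random):
    """Draw a uniformly random DNA string of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def one_hot(seq):
    """Encode a DNA string as rows of four 0/1 values in A, C, G, T order."""
    return [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]


rng = random.Random(0)      # fixed seed so the example is repeatable
utr = random_utr(50, rng)   # one member of a hypothetical random library
encoded = one_hot(utr)      # 50 rows x 4 columns, ready for a 1D convolution
```

    Each row contains exactly one 1, so a convolutional filter sliding along the rows sees short sequence motifs directly.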

  10. Information and redundancy in the burial folding code of globular proteins within a wide range of shapes and sizes.

    PubMed

    Ferreira, Diogo C; van der Linden, Marx G; de Oliveira, Leandro C; Onuchic, José N; de Araújo, Antônio F Pereira

    2016-04-01

    Recent ab initio folding simulations for a limited number of small proteins have corroborated a previous suggestion that atomic burial information obtainable from sequence could be sufficient for tertiary structure determination when combined to sequence-independent geometrical constraints. Here, we use simulations parameterized by native burials to investigate the required amount of information in a diverse set of globular proteins comprising different structural classes and a wide size range. Burial information is provided by a potential term pushing each atom towards one among a small number L of equiprobable concentric layers. An upper bound for the required information is provided by the minimal number of layers L(min) still compatible with correct folding behavior. We obtain L(min) between 3 and 5 for seven small to medium proteins with 50 ≤ Nr ≤ 110 residues while for a larger protein with Nr = 141 we find that L ≥ 6 is required to maintain native stability. We additionally estimate the usable redundancy for a given L ≥ L(min) from the burial entropy associated to the largest folding-compatible fraction of "superfluous" atoms, for which the burial term can be turned off or target layers can be chosen randomly. The estimated redundancy for small proteins with L = 4 is close to 0.8. Our results are consistent with the above-average quality of burial predictions used in previous simulations and indicate that the fraction of approachable proteins could increase significantly with even a mild, plausible, improvement on sequence-dependent burial prediction or on sequence-independent constraints that augment the detectable redundancy during simulations. © 2016 Wiley Periodicals, Inc.

  11. Random Amplification and Pyrosequencing for Identification of Novel Viral Genome Sequences

    PubMed Central

    Hang, Jun; Forshey, Brett M.; Kochel, Tadeusz J.; Li, Tao; Solórzano, Víctor Fiestas; Halsey, Eric S.; Kuschner, Robert A.

    2012-01-01

    ssRNA viruses have high levels of genomic divergence, which can lead to difficulty in genomic characterization of new viruses using traditional PCR amplification and sequencing methods. In this study, random reverse transcription, anchored random PCR amplification, and high-throughput pyrosequencing were used to identify orthobunyavirus sequences from total RNA extracted from viral cultures of acute febrile illness specimens. Draft genome sequence for the orthobunyavirus L segment was assembled and sequentially extended using de novo assembly contigs from pyrosequencing reads and orthobunyavirus sequences in GenBank as guidance. Accuracy and continuous coverage were achieved by mapping all reads to the L segment draft sequence. Subsequently, RT-PCR and Sanger sequencing were used to complete the genome sequence. The complete L segment was found to be 6936 bases in length, encoding a 2248-aa putative RNA polymerase. The identified L segment was distinct from previously published South American orthobunyaviruses, sharing 63% and 54% identity at the nucleotide and amino acid level, respectively, with the complete Oropouche virus L segment and 73% and 81% identity at the nucleotide and amino acid level, respectively, with a partial Caraparu virus L segment. The result demonstrated the effectiveness of a sequence-independent amplification and next-generation sequencing approach for obtaining complete viral genomes from total nucleic acid extracts and its use in pathogen discovery. PMID:22468136

  12. Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods

    PubMed Central

    Dröge, J.; Gregor, I.; McHardy, A. C.

    2015-01-01

    Motivation: Metagenomics characterizes microbial communities by random shotgun sequencing of DNA isolated directly from an environment of interest. An essential step in computational metagenome analysis is taxonomic sequence assignment, which allows identifying the sequenced community members and reconstructing taxonomic bins with sequence data for the individual taxa. For the massive datasets generated by next-generation sequencing technologies, this cannot be performed with de-novo phylogenetic inference methods. We describe an algorithm and the accompanying software, taxator-tk, which performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities. Results: Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with 10 CPU cores and microbial RefSeq as the genomic reference data. Availability and implementation: Taxator-tk source and binary program files are publicly available at http://algbio.cs.uni-duesseldorf.de/software/. Contact: Alice.McHardy@uni-duesseldorf.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25388150

  13. Rapid development of microsatellite markers for the endangered fish Schizothorax biddulphi (Günther) using next generation sequencing and cross-species amplification.

    PubMed

    Luo, Wei; Nie, Zhulan; Zhan, Fanbin; Wei, Jie; Wang, Weimin; Gao, Zexia

    2012-11-14

    Tarim schizothoracin (Schizothorax biddulphi) is an endemic fish species native to the Tarim River system of Xinjiang and has been classified as an extremely endangered freshwater fish species in China. Here, we used a next generation sequencing platform (Ion Torrent PGM™) to obtain, for the first time, a large number of microsatellites for S. biddulphi. A total of 40577 contigs were assembled, which contained 1379 SSRs. Among these SSRs, dinucleotide repeats were the most frequent (77.08%), and AC repeats were the most frequently occurring microsatellite, followed by AG, AAT and AT. Fifty loci were randomly selected for primer development; of these, 38 loci were successfully amplified and 29 loci were polymorphic across panels of 30 individuals. The H(o) ranged from 0.15 to 0.83, and H(e) ranged from 0.15 to 0.85, with 3.5 alleles per locus on average. Cross-species testing indicated that 20 of these markers were successfully amplified in a related and also endangered fish species, S. irregularis. This study suggests that PGM™ sequencing is a rapid and cost-effective tool for developing microsatellite markers for non-model species and that the microsatellite markers developed in this study would be useful in Schizothorax genetic analysis.
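
    Finding SSRs in assembled contigs comes down to scanning for short motifs repeated in tandem. The sketch below does this for perfect dinucleotide repeats with a regular expression. The abstract does not name the tool or thresholds the authors used, so the function name and the `min_repeats` default are hypothetical.

```python
import re


def find_dinucleotide_repeats(seq, min_repeats=6):
    """Yield (start, motif, repeat_count) for perfect dinucleotide tandem repeats."""
    # One captured 2-base motif followed by at least (min_repeats - 1) copies of it.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_repeats - 1))
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if motif[0] == motif[1]:
            continue  # AA, CC, GG, TT runs are homopolymers, not dinucleotide SSRs
        yield match.start(), motif, len(match.group(0)) // 2


# Hypothetical contig fragment containing seven tandem copies of AC.
contig = "GGT" + "AC" * 7 + "TTG"
repeats = list(find_dinucleotide_repeats(contig))
```

    The flanking sequence on either side of each hit is where primers would be designed, which is the step for which the fifty loci were selected.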

  14. Population and forensic genetic analyses of mitochondrial DNA control region variation from six major provinces in the Korean population.

    PubMed

    Hong, Seung Beom; Kim, Ki Cheol; Kim, Wook

    2015-07-01

    We generated complete mitochondrial DNA (mtDNA) control region sequences from 704 unrelated individuals residing in six major provinces in Korea. In addition to our earlier survey of the distribution of mtDNA haplogroup variation, a total of 560 different haplotypes characterized by 271 polymorphic sites were identified, of which 473 haplotypes were unique. The gene diversity and random match probability were 0.9989 and 0.0025, respectively. According to the pairwise comparison of the 704 control region sequences, the mean number of pairwise differences between individuals was 13.47±6.06. Based on the result of mtDNA control region sequences, pairwise FST genetic distances revealed genetic homogeneity of the Korean provinces on a peninsular level, except in samples from Jeju Island. This result indicates there may be a need to formulate a local mtDNA database for Jeju Island, to avoid bias in forensic parameter estimates caused by genetic heterogeneity of the population. Thus, the present data may help not only in personal identification but also in determining maternal lineages to provide an expanded and reliable Korean mtDNA database. These data will be available on the EMPOP database via accession number EMP00661. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
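
    The two summary statistics can be related with a short calculation. The sketch assumes the definitions commonly used for haplotype data, which the abstract does not spell out: random match probability as the sum of squared haplotype frequencies, and gene diversity as n/(n-1) times one minus that sum. The function names and the toy sample are hypothetical.

```python
from collections import Counter


def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies in the sample."""
    n = len(haplotypes)
    return sum((count / n) ** 2 for count in Counter(haplotypes).values())


def gene_diversity(haplotypes):
    """Unbiased diversity estimate: n/(n-1) * (1 - sum of squared frequencies)."""
    n = len(haplotypes)
    return n / (n - 1) * (1 - random_match_probability(haplotypes))


# Toy sample (hypothetical): four individuals, three distinct haplotypes.
sample = ["a", "a", "b", "c"]
rmp = random_match_probability(sample)    # 0.5^2 + 0.25^2 + 0.25^2
diversity = gene_diversity(sample)

# Consistency check on the reported figures under the assumed definitions:
# n = 704 individuals and a random match probability of 0.0025.
implied_diversity = 704 / 703 * (1 - 0.0025)
```

    Under these assumed definitions the reported random match probability of 0.0025 implies a gene diversity of about 0.9989 for 704 individuals, which agrees with the value given in the abstract.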

  15. Design of a randomized controlled trial for genomic carrier screening in healthy patients seeking preconception genetic testing.

    PubMed

    Kauffman, Tia L; Wilfond, Benjamin S; Jarvik, Gail P; Leo, Michael C; Lynch, Frances L; Reiss, Jacob A; Richards, C Sue; McMullen, Carmit; Nickerson, Deborah; Dorschner, Michael O; Goddard, Katrina A B

    2017-02-01

    Population-based carrier screening is limited to well-studied or high-impact genetic conditions for which the benefits may outweigh the associated harms and costs. As the cost of genome sequencing declines and availability increases, the balance of risks and benefits may change for a much larger number of genetic conditions, including medically actionable additional findings. We designed an RCT to evaluate genomic clinical sequencing for women and partners considering a pregnancy. All results are placed into the medical record for use by healthcare providers. Through quantitative and qualitative measures, including baseline and post result disclosure surveys, post result disclosure interviews, 1-2year follow-up interviews, and team journaling, we are obtaining data about the clinical and personal utility of genomic carrier screening in this population. Key outcomes include the number of reportable carrier and additional findings, and the comparative cost, utilization, and psychosocial impacts of usual care vs. genomic carrier screening. As the study progresses, we will compare the costs of genome sequencing and usual care as well as the cost of screening, pattern of use of genetic or mental health counseling services, number of outpatient visits, and total healthcare costs. This project includes novel investigation into human reactions and responses from would-be parents who are learning information that could both affect a future pregnancy and their own health. Copyright © 2016 The Authors. Published by Elsevier Inc. All rights reserved.

  16. Competition between B-Z and B-L transitions in a single DNA molecule: Computational studies

    NASA Astrophysics Data System (ADS)

    Kwon, Ah-Young; Nam, Gi-Moon; Johner, Albert; Kim, Seyong; Hong, Seok-Cheol; Lee, Nam-Kyung

    2016-02-01

    Under negative torsion, DNA adopts left-handed helical forms, such as Z-DNA and L-DNA. Using the random copolymer model developed for a wormlike chain, we represent a single DNA molecule with structural heterogeneity as a helical chain consisting of monomers which can be characterized by different helical senses and pitches. By Monte Carlo simulation, where we take into account bending and twist fluctuations explicitly, we study sequence dependence of B-Z transitions under torsional stress and tension focusing on the interaction with B-L transitions. We consider core sequences, (GC) n repeats or (TG) n repeats, which can interconvert between the right-handed B form and the left-handed Z form, imbedded in a random sequence, which can convert to left-handed L form with different (tension dependent) helical pitch. We show that Z-DNA formation from the (GC) n sequence is always supported by unwinding torsional stress but Z-DNA formation from the (TG) n sequence, which are more costly to convert but numerous, can be strongly influenced by the quenched disorder in the surrounding random sequence.

  17. Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees.

    PubMed

    Kück, Patrick; Meusemann, Karen; Dambach, Johannes; Thormann, Birthe; von Reumont, Björn M; Wägele, Johann W; Misof, Bernhard

    2010-03-31

    Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS) which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold which choice is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. Testing performance on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the most decrease in conflict. Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. 
Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood based models of sequence evolution which opens the possibility of further improvements.

  18. Maximizing lipocalin prediction through balanced and diversified training set and decision fusion.

    PubMed

    Nath, Abhigyan; Subbiah, Karthikeyan

    2015-12-01

    Lipocalins are short in sequence length and perform several important biological functions. These proteins have less than 20% sequence similarity among paralogs. Experimentally identifying them is an expensive and time-consuming process. Computational methods based on sequence similarity for allocating putative members to this family also remain elusive due to the low sequence similarity among the members of this family. Consequently, machine learning methods become a viable alternative for their prediction by using the underlying sequence/structurally derived features as the input. Ideally, any machine learning based prediction method must be trained with all possible variations in the input feature vector (all the sub-class input patterns) to achieve perfect learning. Near-perfect learning can be achieved by training the model with diverse types of input instances belonging to the different regions of the entire input space. Furthermore, the prediction performance can be improved through balancing the training set, as imbalanced data sets tend to bias predictions towards the majority class and its sub-classes. This paper aims to achieve (i) high generalization ability without any classification bias through diversified and balanced training sets as well as (ii) enhanced prediction accuracy by combining the results of individual classifiers with an appropriate fusion scheme. Instead of creating the training set randomly, we first used the unsupervised Kmeans clustering algorithm to create diversified clusters of input patterns and created the diversified and balanced training set by selecting an equal number of patterns from each of these clusters.
Finally, probability based classifier fusion scheme was applied on boosted random forest algorithm (which produced greater sensitivity) and K nearest neighbour algorithm (which produced greater specificity) to achieve the enhanced predictive performance than that of individual base classifiers. The performance of the learned models trained on Kmeans preprocessed training set is far better than the randomly generated training sets. The proposed method achieved a sensitivity of 90.6%, specificity of 91.4% and accuracy of 91.0% on the first test set and sensitivity of 92.9%, specificity of 96.2% and accuracy of 94.7% on the second blind test set. These results have established that diversifying training set improves the performance of predictive models through superior generalization ability and balancing the training set improves prediction accuracy. For smaller data sets, unsupervised Kmeans based sampling can be an effective technique to increase generalization than that of the usual random splitting method. Copyright © 2015 Elsevier Ltd. All rights reserved.

  19. Range-gated field disturbance sensor with range-sensitivity compensation

    DOEpatents

    McEwan, T.E.

    1996-05-28

    A field disturbance sensor operates with relatively low power, provides an adjustable operating range, is not hypersensitive at close range, allows co-location of multiple sensors, and is inexpensive to manufacture. The sensor includes a transmitter that transmits a sequence of transmitted bursts of electromagnetic energy. The transmitter frequency is modulated at an intermediate frequency. The sequence of bursts has a burst repetition rate, and each burst has a burst width and comprises a number of cycles at a transmitter frequency. The sensor includes a receiver which receives electromagnetic energy at the transmitter frequency, and includes a mixer which mixes a transmitted burst with reflections of the same transmitted burst to produce an intermediate frequency signal. Circuitry, responsive to the intermediate frequency signal indicates disturbances in the sensor field. Because the mixer mixes the transmitted burst with reflections of the transmitted burst, the burst width defines the sensor range. The burst repetition rate is randomly or pseudorandomly modulated so that bursts in the sequence of bursts have a phase which varies. 8 figs.

  20. Sequenced RAPD markers to detect hybridization in the barbary partridge (Alectoris barbara, Phasianidae).

    PubMed

    Barbanera, Filippo; Guerrini, Monica; Bertoncini, Franco; Cappelli, Fabio; Muzzeddu, Marco; Dini, Fernando

    2011-01-01

    In the Alectoris partridges (Phasianidae), hybridization occurs occasionally as a result of the natural breakdown of isolating mechanisms but more frequently as a result of human activity. No genetic record of hybridization is known for the barbary partridge (A. barbara). This species is distributed mostly in North Africa and, in Europe, on the island of Sardinia (Italy) and on Gibraltar. The risk of hybridization between barbary and red-legged partridge (A. rufa: Iberian Peninsula, France, Italy) is high in Sardinia and in Spain. We developed two random amplified polymorphic DNA (RAPD) markers to detect A. barbara × A. rufa hybrid partridges. We tested them on 125 experimental hybrids, sequenced the relative species-specific bands and found that the bands and their corresponding sequences were reliably transmitted through a number of generations (F1, F2, F3, BC1, BC2). Our markers represent a highly valuable tool for the preservation of the A. barbara genome from the pressing threat of A. rufa pollution. © 2010 Blackwell Publishing Ltd.

  1. Range-gated field disturbance sensor with range-sensitivity compensation

    DOEpatents

    McEwan, Thomas E.

    1996-01-01

    A field disturbance sensor operates with relatively low power, provides an adjustable operating range, is not hypersensitive at close range, allows co-location of multiple sensors, and is inexpensive to manufacture. The sensor includes a transmitter that transmits a sequence of transmitted bursts of electromagnetic energy. The transmitter frequency is modulated at an intermediate frequency. The sequence of bursts has a burst repetition rate, and each burst has a burst width and comprises a number of cycles at a transmitter frequency. The sensor includes a receiver which receives electromagnetic energy at the transmitter frequency, and includes a mixer which mixes a transmitted burst with reflections of the same transmitted burst to produce an intermediate frequency signal. Circuitry, responsive to the intermediate frequency signal indicates disturbances in the sensor field. Because the mixer mixes the transmitted burst with reflections of the transmitted burst, the burst width defines the sensor range. The burst repetition rate is randomly or pseudorandomly modulated so that bursts in the sequence of bursts have a phase which varies.

  2. Evolution of biological sequences implies an extreme value distribution of type I for both global and local pairwise alignment scores.

    PubMed

    Bastien, Olivier; Maréchal, Eric

    2008-08-07

    Confidence in pairwise alignments of biological sequences, obtained by various methods such as Blast or Smith-Waterman, is critical for automatic analyses of genomic data. Two statistical models have been proposed. In the asymptotic limit of long sequences, the Karlin-Altschul model is based on the computation of a P-value, assuming that the number of high scoring matching regions above a threshold is Poisson distributed. Alternatively, the Lipman-Pearson model is based on the computation of a Z-value from a random score distribution obtained by a Monte-Carlo simulation. Z-values allow the deduction of an upper bound of the P-value (1/Z-value^2) following the TULIP theorem. Simulated Z-value distributions are known to fit a Gumbel law. This remarkable property was not demonstrated and had no obvious biological support. We built a model of evolution of sequences based on aging, as meant in Reliability Theory, using the fact that the amount of information shared between an initial sequence and the sequences in its lineage (i.e., mutual information in Information Theory) is a decreasing function of time. This quantity is simply measured by a sequence alignment score. In systems aging, the failure rate is related to the system's longevity. The system can be a machine with structured components, or a living entity or population. "Reliability" refers to the ability to operate properly according to a standard. Here, the "reliability" of a sequence refers to the ability to conserve a sufficient functional level at the folded and maturated protein level (positive selection pressure). Homologous sequences were considered as systems 1) having a high redundancy of information reflected by the magnitude of their alignment scores, 2) whose components are the amino acids that can independently be damaged by random DNA mutations.
From these assumptions, we deduced that information shared at each amino acid position evolved with a constant rate, corresponding to the information hazard rate, and that pairwise sequence alignment scores should follow a Gumbel distribution, which parameters could find some theoretical rationale. In particular, one parameter corresponds to the information hazard rate. Extreme value distribution of alignment scores, assessed from high scoring segments pairs following the Karlin-Altschul model, can also be deduced from the Reliability Theory applied to molecular sequences. It reflects the redundancy of information between homologous sequences, under functional conservative pressure. This model also provides a link between concepts of biological sequence analysis and of systems biology.
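
    The Monte-Carlo Z-value and the 1/Z² bound mentioned in this abstract can be illustrated with a minimal sketch. It assumes a toy score (count of identical positions) in place of a real Smith-Waterman or Blast score; the names toy_score, z_value and p_value_upper_bound are hypothetical and do not come from the paper.

```python
import random
import statistics


def toy_score(a, b):
    """Hypothetical stand-in for an alignment score: identical positions."""
    return sum(x == y for x, y in zip(a, b))


def z_value(a, b, score=toy_score, n_shuffles=200, seed=0):
    """Z-value of score(a, b) against scores obtained by shuffling b."""
    rng = random.Random(seed)
    observed = score(a, b)
    chars = list(b)
    null_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(chars)  # shuffling preserves the composition of b
        null_scores.append(score(a, "".join(chars)))
    mu = statistics.mean(null_scores)
    sigma = statistics.pstdev(null_scores)
    if sigma == 0:
        raise ValueError("null score distribution has zero variance")
    return (observed - mu) / sigma


def p_value_upper_bound(z):
    """Bound 1/Z^2 quoted in the abstract, capped at 1 (trivial for Z <= 1)."""
    if z <= 0:
        return 1.0
    return min(1.0, 1.0 / (z * z))
```

    A real analysis would pass an actual alignment scoring function as the score argument; the shuffle count of 200 is an arbitrary illustrative choice.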

  3. Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences

    PubMed Central

    2014-01-01

    Background: Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Results: Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition. Conclusion: The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone cannot sufficiently discriminate miRNAs from other sequences, the MFE-based P-value should be added to the parameters used to select potential miRNA candidates for experimental verification. PMID:24418292
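
    Since the abstract states that MFE values of randomized sequences are normally distributed, the MFE-based P-value can be sketched as a lower-tail normal probability. This is an illustrative sketch only: the function names are hypothetical, and the mean and standard deviation would in practice come from the pre-computed, interpolated distributions, which are not reproduced here.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mfe_p_value(observed_mfe, null_mean, null_sd):
    """Probability that a randomized sequence has an MFE <= observed_mfe.

    null_mean and null_sd are assumed to describe the MFE distribution of
    randomized sequences with the same nucleotide composition.
    """
    if null_sd <= 0:
        raise ValueError("null_sd must be positive")
    return normal_cdf(observed_mfe, null_mean, null_sd)
```

    A candidate whose MFE lies two standard deviations below the mean of its randomized counterparts would receive a P-value of about 0.023 under this sketch.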

  4. Development of a high-copy plasmid for enhanced production of recombinant proteins in Leuconostoc citreum.

    PubMed

    Son, Yeon Jeong; Ryu, Ae Jin; Li, Ling; Han, Nam Soo; Jeong, Ki Jun

    2016-01-15

    Leuconostoc is a hetero-fermentative lactic acid bacteria, and its importance is widely recognized in the dairy industry. However, due to limited genetic tools including plasmids for Leuconostoc, there has not been much extensive research on the genetics and engineering of Leuconostoc yet. Thus, there is a big demand for high-copy-number plasmids for useful gene manipulation and overproduction of recombinant proteins in Leuconostoc. Using an existing low-copy plasmid, the copy number of plasmid was increased by random mutagenesis followed by FACS-based high-throughput screening. First, a random library of plasmids was constructed by randomizing the region responsible for replication in Leuconostoc citreum; additionally, a superfolder green fluorescent protein (sfGFP) was used as a reporter protein. With a high-speed FACS sorter, highly fluorescent cells were enriched, and after two rounds of sorting, single clone exhibiting the highest level of sfGFP was isolated. The copy number of the isolated plasmid (pCB4270) was determined by quantitative PCR (qPCR). It was found that the isolated plasmid has approximately a 30-fold higher copy number (approx. 70 copies per cell) than that of the original plasmid. From the sequence analysis, a single mutation (C→T) at position 4690 was found, and we confirmed that this single mutation was responsible for the increased plasmid copy number. The effectiveness of the isolated high-copy-number plasmid for the overproduction of recombinant proteins was successfully demonstrated with two protein models Glutathione-S-transferase (GST) and α-amylase. The high-copy number plasmid was successfully isolated by FACS-based high-throughput screening of a plasmid library in L. citreum. The isolated plasmid could be a useful genetic tool for high-level gene expression in Leuconostoc, and for extending the applications of this useful bacteria to various areas in the dairy and pharmaceutical industries.

  5. Observation of quantum criticality with ultracold atoms in optical lattices

    NASA Astrophysics Data System (ADS)

    Zhang, Xibo

    As biological problems are becoming more complex and data are growing at a rate much faster than that of computer hardware, new and faster algorithms are required. This dissertation investigates computational problems arising in two fields, comparative genomics and epigenomics, and employs a variety of computational techniques to address the problems. One fundamental question in the studies of chromosome evolution is whether the rearrangement breakpoints are happening at random positions or along certain hotspots. We investigate the breakpoint reuse phenomenon, and show the analyses that support the more recently proposed fragile breakage model as opposed to the conventional random breakage models for chromosome evolution. The identification of syntenic regions between chromosomes forms the basis for studies of genome architectures, comparative genomics, and evolutionary genomics. The previous synteny block reconstruction algorithms could not be scaled to a large number of mammalian genomes being sequenced; neither did they address the issue of generating non-overlapping synteny blocks suitable for analyzing rearrangements and evolutionary history of large-scale duplications prevalent in plant genomes. We present a new unified synteny block generation algorithm based on A-Bruijn graph framework that overcomes these shortcomings. In the epigenome sequencing, a sample may contain a mixture of epigenomes and there is a need to resolve the distinct methylation patterns from the mixture. Many sequencing applications, such as haplotype inference for diploid or polyploid genomes, and metagenomic sequencing, share a similar objective: to infer a set of distinct assemblies from reads that are sequenced from a heterogeneous sample and subsequently aligned to a reference genome. We model the problem from both combinatorial and statistical angles. First, we describe a theoretical framework. 
A linear-time algorithm is then given to resolve a minimum number of assemblies that are consistent with all reads, substantially improving on previous algorithms. An efficient algorithm is also described to determine a set of assemblies that is consistent with a maximum subset of the reads, a previously untreated problem. We then prove that allowing nested reads or permitting mismatches between reads and their assemblies renders these problems NP-hard. Second, we describe a mixture model-based approach, and apply the model to the detection of allele-specific methylation.

  6. A measurement of disorder in binary sequences

    NASA Astrophysics Data System (ADS)

    Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

    2015-03-01

    We propose a complex quantity, A_L, to characterize the degree of disorder of L-length binary symbolic sequences. As examples, we apply it to typical random and deterministic sequences. One kind of random sequence is generated from a periodic binary sequence and the other is generated from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. In these analyzed sequences, we find that the modulus of A_L, denoted by |A_L|, is a (statistically) equivalent quantity to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can serve as a fruitful index to discern which sequence is more disordered. Moreover, there is one and only one value of |A_L| for the overall disorder characteristics. It has an extremely low computational cost and can easily be realized experimentally. For all these reasons, we believe that the proposed measure of disorder is a valuable complement to existing ones for symbolic sequences.

  7. Boosting the FM-Index on the GPU: Effective Techniques to Mitigate Random Memory Access.

    PubMed

    Chacón, Alejandro; Marco-Sola, Santiago; Espinosa, Antonio; Ribeca, Paolo; Moure, Juan Carlos

    2015-01-01

    The recent advent of high-throughput sequencing machines producing big amounts of short reads has boosted the interest in efficient string searching techniques. As of today, many mainstream sequence alignment software tools rely on a special data structure, called the FM-index, which allows for fast exact searches in large genomic references. However, such searches translate into a pseudo-random memory access pattern, thus making memory access the limiting factor of all computation-efficient implementations, both on CPUs and GPUs. Here, we show that several strategies can be put in place to remove the memory bottleneck on the GPU: more compact indexes can be implemented by having more threads work cooperatively on larger memory blocks, and a k-step FM-index can be used to further reduce the number of memory accesses. The combination of those and other optimisations yields an implementation that is able to process about two Gbases of queries per second on our test platform, being about 8 × faster than a comparable multi-core CPU version, and about 3 × to 5 × faster than the FM-index implementation on the GPU provided by the recently announced Nvidia NVBIO bioinformatics library.

  8. The effect of two different visual presentation modalities on the narratives of mainstream grade 3 children.

    PubMed

    Klop, D; Engelbrecht, L

    2013-12-01

    This study investigated whether a dynamic visual presentation method (a soundless animated video presentation) would elicit better narratives than a static visual presentation method (a wordless picture book). Twenty mainstream grade 3 children were randomly assigned to two groups and assessed with one of the visual presentation methods. Narrative performance was measured in terms of micro- and macrostructure variables. Microstructure variables included productivity (total number of words, total number of T-units), syntactic complexity (mean length of T-unit) and lexical diversity measures (number of different words). Macrostructure variables included episodic structure in terms of goal-attempt-outcome (GAO) sequences. Both visual presentation modalities elicited narratives of similar quantity and quality in terms of the micro- and macrostructure variables that were investigated. Animation of picture stimuli did not elicit better narratives than static picture stimuli.

  9. Visual Perceptual Echo Reflects Learning of Regularities in Rapid Luminance Sequences.

    PubMed

    Chang, Acer Y-C; Schwartzman, David J; VanRullen, Rufin; Kanai, Ryota; Seth, Anil K

    2017-08-30

    A novel neural signature of active visual processing has recently been described in the form of the "perceptual echo", in which the cross-correlation between a sequence of randomly fluctuating luminance values and occipital electrophysiological signals exhibits a long-lasting periodic (∼100 ms cycle) reverberation of the input stimulus (VanRullen and Macdonald, 2012). As yet, however, the mechanisms underlying the perceptual echo and its function remain unknown. Reasoning that natural visual signals often contain temporally predictable, though nonperiodic features, we hypothesized that the perceptual echo may reflect a periodic process associated with regularity learning. To test this hypothesis, we presented subjects with successive repetitions of a rapid nonperiodic luminance sequence, and examined the effects on the perceptual echo, finding that echo amplitude linearly increased with the number of presentations of a given luminance sequence. These data suggest that the perceptual echo reflects a neural signature of regularity learning. Furthermore, when a set of repeated sequences was followed by a sequence with inverted luminance polarities, the echo amplitude decreased to the same level evoked by a novel stimulus sequence. Crucially, when the original stimulus sequence was re-presented, the echo amplitude returned to a level consistent with the number of presentations of this sequence, indicating that the visual system retained sequence-specific information, for many seconds, even in the presence of intervening visual input. Altogether, our results reveal a previously undiscovered regularity learning mechanism within the human visual system, reflected by the perceptual echo. SIGNIFICANCE STATEMENT How the brain encodes and learns fast-changing but nonperiodic visual input remains unknown, even though such visual input characterizes natural scenes. We investigated whether the phenomenon of "perceptual echo" might index such learning. 
The perceptual echo is a long-lasting reverberation between a rapidly changing visual input and evoked neural activity, apparent in cross-correlations between occipital EEG and stimulus sequences, peaking in the alpha (∼10 Hz) range. We indeed found that perceptual echo is enhanced by repeatedly presenting the same visual sequence, indicating that the human visual system can rapidly and automatically learn regularities embedded within fast-changing dynamic sequences. These results point to a previously undiscovered regularity learning mechanism, operating at a rate defined by the alpha frequency.

  10. Experimental studies of two-stage centrifugal dust concentrator

    NASA Astrophysics Data System (ADS)

    Vechkanova, M. V.; Fadin, Yu M.; Ovsyannikov, Yu G.

    2018-03-01

    The article presents experimental results for a two-stage centrifugal dust concentrator, describes its design, and shows the development of an engineering calculation method and laboratory investigations. For the experiments, the authors used quartz, ceramic dust and slag. Experimental dispersion analysis of dust particles was obtained by the sedimentation method. A mathematical model of the dust collection process was built using a central composite rotatable design of a four-factor experiment. A sequence of experiments was conducted in accordance with the table of random numbers. Conclusions were made.

  11. IDENTIFICATION OF AVIAN-SPECIFIC FECAL METAGENOMIC SEQUENCES USING GENOME FRAGMENT ENRICHMENTS

    EPA Science Inventory

    Sequence analysis of microbial genomes has provided biologists the opportunity to compare genetic differences between closely related microorganisms. While random sequencing has also been used to study natural microbial communities, metagenomic comparisons via sequencing analysis...

  12. Evolution in a Test Tube: Exploring the Structure and Function of RNA Probes

    DTIC Science & Technology

    2008-05-02

    Bartel, D.P. and Szostak, J.W. (1993) Isolation of New Ribozymes from a Large Pool of Random Sequences. Science, New Series 261, 1141-1418. 24...Szostak, J.W. (1993) Isolation of New Ribozymes from a Large Pool of Random Sequences. Science, New Series 261, 1141-1418. Chen, Ying; Carlini

  13. Movie denoising by average of warped lines.

    PubMed

    Bertalmío, Marcelo; Caselles, Vicent; Pardo, Alvaro

    2007-09-01

    Here, we present an efficient method for movie denoising that does not require any motion estimation. The method is based on the well-known fact that averaging several realizations of a random variable reduces the variance. For each pixel to be denoised, we look for close similar samples along the level surface passing through it. With these similar samples, we estimate the denoised pixel. The method to find close similar samples is done via warping lines in spatiotemporal neighborhoods. For that end, we present an algorithm based on a method for epipolar line matching in stereo pairs which has per-line complexity O (N), where N is the number of columns in the image. In this way, when applied to the image sequence, our algorithm is computationally efficient, having a complexity of the order of the total number of pixels. Furthermore, we show that the presented method is unsupervised and is adapted to denoise image sequences with an additive white noise while respecting the visual details on the movie frames. We have also experimented with other types of noise with satisfactory results.

  14. Reconstruction of DNA sequences using genetic algorithms and cellular automata: towards mutation prediction?

    PubMed

    Mizas, Ch; Sirakoulis, G Ch; Mardiris, V; Karafyllidis, I; Glykos, N; Sandaltzopoulos, R

    2008-04-01

    Change of DNA sequence that fuels evolution is, to a certain extent, a deterministic process because mutagenesis does not occur in an absolutely random manner. So far, it has not been possible to decipher the rules that govern DNA sequence evolution due to the extreme complexity of the entire process. In our attempt to approach this issue we focus solely on the mechanisms of mutagenesis and deliberately disregard the role of natural selection. Hence, in this analysis, evolution refers to the accumulation of genetic alterations that originate from mutations and are transmitted through generations without being subjected to natural selection. We have developed a software tool that allows modelling of a DNA sequence as a one-dimensional cellular automaton (CA) with four states per cell which correspond to the four DNA bases, i.e. A, C, T and G. The four states are represented by numbers of the quaternary number system. Moreover, we have developed genetic algorithms (GAs) in order to determine the rules of CA evolution that simulate the DNA evolution process. Linear evolution rules were considered and square matrices were used to represent them. If DNA sequences of different evolution steps are available, our approach allows the determination of the underlying evolution rule(s). Conversely, once the evolution rules are deciphered, our tool may reconstruct the DNA sequence in any previous evolution step for which the exact sequence information was unknown. The developed tool may be used to test various parameters that could influence evolution. We describe a paradigm relying on the assumption that mutagenesis is governed by a near-neighbour-dependent mechanism. Based on the satisfactory performance of our system in the deliberately simplified example, we propose that our approach could offer a starting point for future attempts to understand the mechanisms that govern evolution. The developed software is open-source and has a user-friendly graphical input interface.

  15. Entropy and long-range memory in random symbolic additive Markov chains

    NASA Astrophysics Data System (ADS)

    Melnik, S. S.; Usatenko, O. V.

    2016-06-01

    The goal of this paper is to develop an estimate for the entropy of random symbolic sequences with elements belonging to a finite alphabet. As a plausible model, we use the high-order additive stationary ergodic Markov chain with long-range memory. Supposing that the correlations between random elements of the chain are weak, we express the conditional entropy of the sequence by means of the symbolic pair correlation function. We also examine an algorithm for estimating the conditional entropy of finite symbolic sequences. We show that the entropy contains two contributions, i.e., the correlation and the fluctuation. The obtained analytical results are used for numerical evaluation of the entropy of written English texts and DNA nucleotide sequences. The developed theory opens the way for constructing a more consistent and sophisticated approach to describe the systems with strong short-range and weak long-range memory.
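
    For orientation, the conditional entropy of a finite symbolic sequence can be estimated directly from symbol counts. The sketch below is a plain plug-in (frequency-based) estimate, not the pair-correlation-function expression derived in the paper; the function name and the choice of bits as units are assumptions made for illustration.

```python
import math
from collections import Counter


def conditional_entropy(seq, order):
    """Plug-in estimate (in bits) of H(next symbol | previous `order` symbols)."""
    if order < 0 or len(seq) <= order:
        raise ValueError("sequence must be longer than the context order")
    pair_counts = Counter()     # counts of (context, next symbol)
    context_counts = Counter()  # counts of each context
    for i in range(order, len(seq)):
        context = tuple(seq[i - order:i])
        pair_counts[(context, seq[i])] += 1
        context_counts[context] += 1
    total = sum(context_counts.values())
    entropy = 0.0
    for (context, _symbol), count in pair_counts.items():
        joint = count / total                         # p(context, symbol)
        conditional = count / context_counts[context]  # p(symbol | context)
        entropy -= joint * math.log2(conditional)
    return entropy
```

    Plug-in estimates of this kind are biased for long contexts and short sequences, which is one motivation for correlation-based estimates such as the one developed in the paper.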

  16. Entropy and long-range memory in random symbolic additive Markov chains.

    PubMed

    Melnik, S S; Usatenko, O V

    2016-06-01

    The goal of this paper is to develop an estimate for the entropy of random symbolic sequences with elements belonging to a finite alphabet. As a plausible model, we use the high-order additive stationary ergodic Markov chain with long-range memory. Supposing that the correlations between random elements of the chain are weak, we express the conditional entropy of the sequence by means of the symbolic pair correlation function. We also examine an algorithm for estimating the conditional entropy of finite symbolic sequences. We show that the entropy contains two contributions, i.e., the correlation and the fluctuation. The obtained analytical results are used for numerical evaluation of the entropy of written English texts and DNA nucleotide sequences. The developed theory opens the way for constructing a more consistent and sophisticated approach to describe the systems with strong short-range and weak long-range memory.

  17. Parallel Mitogenome Sequencing Alleviates Random Rooting Effect in Phylogeography.

    PubMed

    Hirase, Shotaro; Takeshima, Hirohiko; Nishida, Mutsumi; Iwasaki, Wataru

    2016-04-28

    Reliably rooted phylogenetic trees play irreplaceable roles in clarifying diversification in the patterns of species and populations. However, such trees are often unavailable in phylogeographic studies, particularly when the focus is on rapidly expanded populations that exhibit star-like trees. A fundamental bottleneck is known as the random rooting effect, where a distant outgroup tends to root an unrooted tree "randomly." We investigated whether parallel mitochondrial genome (mitogenome) sequencing alleviates this effect in phylogeography using a case study on the Sea of Japan lineage of the intertidal goby Chaenogobius annularis. Eighty-three C. annularis individuals were collected and their mitogenomes were determined by high-throughput and low-cost parallel sequencing. Phylogenetic analysis of these mitogenome sequences was conducted to root the Sea of Japan lineage, which has a star-like phylogeny and had not been reliably rooted. The topologies of the bootstrap trees were investigated to determine whether the use of mitogenomes alleviated the random rooting effect. The mitogenome data successfully rooted the Sea of Japan lineage by alleviating the effect, which hindered phylogenetic analysis that used specific gene sequences. The reliable rooting of the lineage led to the discovery of a novel, northern lineage that expanded during an interglacial period with high bootstrap support. Furthermore, the finding of this lineage suggested the existence of additional glacial refugia and provided a new recent calibration point that revised the divergence time estimation between the Sea of Japan and Pacific Ocean lineages. This study illustrates the effectiveness of parallel mitogenome sequencing for solving the random rooting problem in phylogeographic studies. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

  18. solar spicules and jets

    NASA Astrophysics Data System (ADS)

    Tavabi, E.; Koutchmy, S.; Ajabshirizadeh, A.

    2012-06-01

    In order to clear up the origin and possibly explain some solar limb and disc spicule quasi-periodic recurrences produced by overlapping effects, we present a simulation model assuming quasi-random positions of spicules. We also allow a set number of spicules with different physical properties (such as height, lifetime and tilt angle, as shown by an individual spicule) occurring randomly. Results of simulations made with three different spatial resolutions of the corresponding frames, and also for different number densities of spicules, are analyzed. The wavelet time/frequency method is used to obtain the exact period of spicule visibility. Results are compared with observations of the chromosphere from (i) the Transition Region and Coronal Explorer (TRACE) filtergrams taken at 1600 angstrom, (ii) the Solar Optical Telescope (SOT) of Hinode taken in the Ca II H-line and (iii) the Sac-Peak Dunn's VTT taken in the Hα line. Our results suggest the need to be cautious when interpreting apparent oscillations seen in spicule image sequences when overlapping is present, i.e., when the spatial resolution is not enough to resolve individual components of spicules.

  19. Verifying Digital Components of Physical Systems: Experimental Evaluation of Test Quality

    NASA Astrophysics Data System (ADS)

    Laputenko, A. V.; López, J. E.; Yevtushenko, N. V.

    2018-03-01

    This paper continues the study of high quality test derivation for verifying digital components which are used in various physical systems; those are sensors, data transfer components, etc. We have used logic circuits b01-b010 of the package of ITC'99 benchmarks (Second Release) for experimental evaluation which as stated before, describe digital components of physical systems designed for various applications. Test sequences are derived for detecting the most known faults of the reference logic circuit using three different approaches to test derivation. Three widely used fault types such as stuck-at-faults, bridges, and faults which slightly modify the behavior of one gate are considered as possible faults of the reference behavior. The most interesting test sequences are short test sequences that can provide appropriate guarantees after testing, and thus, we experimentally study various approaches to the derivation of the so-called complete test suites which detect all fault types. In the first series of experiments, we compare two approaches for deriving complete test suites. In the first approach, a shortest test sequence is derived for testing each fault. In the second approach, a test sequence is pseudo-randomly generated by the use of an appropriate software for logic synthesis and verification (ABC system in our study) and thus, can be longer. However, after deleting sequences detecting the same set of faults, a test suite returned by the second approach is shorter. The latter underlines the fact that in many cases it is useless to spend `time and efforts' for deriving a shortest distinguishing sequence; it is better to use the test minimization afterwards. The performed experiments also show that the use of only randomly generated test sequences is not very efficient since such sequences do not detect all the faults of any type. After reaching the fault coverage around 70%, saturation is observed, and the fault coverage cannot be increased anymore. 
For deriving high-quality short test suites, combining randomly generated sequences with sequences aimed at detecting the faults missed by random tests achieves good fault coverage using the shortest test sequences.

  20. Systematic analysis of transcribed loci in ENCODE regions using RACE sequencing reveals extensive transcription in the human genome.

    PubMed

    Wu, Jia Qian; Du, Jiang; Rozowsky, Joel; Zhang, Zhengdong; Urban, Alexander E; Euskirchen, Ghia; Weissman, Sherman; Gerstein, Mark; Snyder, Michael

    2008-01-03

    Recent studies of the mammalian transcriptome have revealed a large number of additional transcribed regions and extraordinary complexity in transcript diversity. However, there is still much uncertainty regarding precisely what portion of the genome is transcribed, the exact structures of these novel transcripts, and the levels of the transcripts produced. We have interrogated the transcribed loci in 420 selected ENCyclopedia Of DNA Elements (ENCODE) regions using rapid amplification of cDNA ends (RACE) sequencing. We analyzed annotated known gene regions, but primarily we focused on novel transcriptionally active regions (TARs), which were previously identified by high-density oligonucleotide tiling arrays and on random regions that were not believed to be transcribed. We found RACE sequencing to be very sensitive and were able to detect low levels of transcripts in specific cell types that were not detectable by microarrays. We also observed many instances of sense-antisense transcripts; further analysis suggests that many of the antisense transcripts (but not all) may be artifacts generated from the reverse transcription reaction. Our results show that the majority of the novel TARs analyzed (60%) are connected to other novel TARs or known exons. Of previously unannotated random regions, 17% were shown to produce overlapping transcripts. Furthermore, it is estimated that 9% of the novel transcripts encode proteins. We conclude that RACE sequencing is an efficient, sensitive, and highly accurate method for characterization of the transcriptome of specific cell/tissue types. Using this method, it appears that much of the genome is represented in polyA+ RNA. Moreover, a fraction of the novel RNAs can encode protein and are likely to be functional.

  1. Influence of Layup Sequence on the Surface Accuracy of Carbon Fiber Composite Space Mirrors

    NASA Astrophysics Data System (ADS)

    Yang, Zhiyong; Liu, Qingnian; Zhang, Boming; Xu, Liang; Tang, Zhanwen; Xie, Yongjie

    2018-04-01

    Layup sequence is directly related to the stiffness and deformation resistance of a composite space mirror, and error caused by layup sequence can evidently affect the surface precision of composite mirrors. Varying the layup sequence at the same total thickness changes the surface form of the composite mirror, which is the focus of our study. In our research, the influence of varied quasi-isotropic stacking sequences and random angular deviation on the surface accuracy of composite space mirrors was investigated through finite element analyses (FEA). We established a simulation model for the studied concave mirror with 500 mm diameter, and the essential factors of layup sequences and random angular deviations on different plies were discussed. Five guiding findings were described in this study. Increasing the total number of plies, optimizing the stacking sequence and keeping ply alignment consistent during ply placement are effective ways to improve the surface accuracy of composite mirrors.

  2. Network harness: bundles of routes in public transport networks

    NASA Astrophysics Data System (ADS)

    Berche, B.; von Ferber, C.; Holovatch, T.

    2009-12-01

    Public transport routes sharing the same grid of streets and tracks are often found to proceed in parallel along shorter or longer sequences of stations. Similar phenomena are observed in other networks built with space consuming links such as cables, vessels, pipes, neurons, etc. In the case of public transport networks (PTNs) this behavior may be easily worked out on the basis of sequences of stations serviced by each route. To quantify this behavior we use the recently introduced notion of network harness. It is described by the harness distribution P(r, s): the number of sequences of s consecutive stations that are serviced by r parallel routes. For certain PTNs that we have analyzed we observe that the harness distribution may be described by power laws. These power laws indicate a certain level of organization and planning which may be driven by the need to minimize the costs of infrastructure and secondly by the fact that points of interest tend to be clustered in certain locations of a city. This effect may be seen as a result of the strong interdependence of the evolutions of both the city and its PTN. To further investigate the significance of the empirical results we have studied one- and two-dimensional models of randomly placed routes modeled by different types of walks. While in one dimension an analytic treatment was successful, the two dimensional case was studied by simulations showing that the empirical results for real PTNs deviate significantly from those expected for randomly placed routes.
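
    The harness distribution P(r, s) described in this abstract can be approximated with a short counting sketch. It rests on a simplified reading of the definition: each distinct run of s consecutive stations is counted once, together with the number r of routes that contain it, and a run and its reverse are treated as the same stretch. The paper's exact definition may differ (for example regarding maximal runs), and the function name is hypothetical.

```python
from collections import Counter, defaultdict


def harness_distribution(routes, s):
    """Return {r: number of distinct s-station runs served by exactly r routes}."""
    routes_per_run = defaultdict(set)
    for route_id, stations in enumerate(routes):
        for i in range(len(stations) - s + 1):
            run = tuple(stations[i:i + s])
            # Assumption: direction of travel is ignored, so a run and its
            # reverse map to the same key.
            key = min(run, tuple(reversed(run)))
            routes_per_run[key].add(route_id)
    return dict(Counter(len(ids) for ids in routes_per_run.values()))
```

    For three toy routes A-B-C-D, X-B-C-D and D-C-B-Y with s = 2, the stretches B-C and C-D are each shared by all three routes, while the remaining three stretches are served by a single route.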

  3. An In-Depth Analysis of the Chung-Lu Model

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Winlaw, M.; DeSterck, H.; Sanders, G.

    2015-10-28

    In the classic Erdős-Rényi random graph model [5] each edge is chosen with uniform probability and the degree distribution is binomial, limiting the number of graphs that can be modeled using the Erdős-Rényi framework [10]. The Chung-Lu model [1, 2, 3] is an extension of the Erdős-Rényi model that allows for more general degree distributions. The probability of each edge is no longer uniform and is a function of a user-supplied degree sequence, which by design is the expected degree sequence of the model. This property makes it an easy model to work with theoretically and since the Chung-Lu model is a special case of a random graph model with a given degree sequence, many of its properties are well known and have been studied extensively [2, 3, 13, 8, 9]. It is also an attractive null model for many real-world networks, particularly those with power-law degree distributions and it is sometimes used as a benchmark for comparison with other graph generators despite some of its limitations [12, 11]. We know for example, that the average clustering coefficient is too low relative to most real world networks. As well, measures of affinity are also too low relative to most real-world networks of interest. However, despite these limitations or perhaps because of them, the Chung-Lu model provides a basis for comparing new graph models.
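
    The edge rule described in the abstract above can be sketched in a few lines. This is a minimal illustration, not the authors' code: it assumes the commonly used Chung-Lu edge probability min(1, w_i * w_j / S), where S is the sum of the supplied degree sequence, and it omits self-loops for simplicity. The function names are hypothetical.

```python
import random


def chung_lu_graph(weights, seed=None):
    """Sample an undirected simple graph from a target degree sequence.

    Assumption: edge {i, j} is included independently with probability
    min(1, w_i * w_j / S), where S = sum(weights). Self-loops are omitted.
    """
    rng = random.Random(seed)
    total = sum(weights)
    n = len(weights)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            p = min(1.0, weights[i] * weights[j] / total)
            if rng.random() < p:
                edges.append((i, j))
    return edges


def expected_degrees(weights):
    """Expected degree of each node under the loop-free rule above."""
    total = sum(weights)
    n = len(weights)
    return [
        sum(min(1.0, weights[i] * weights[j] / total) for j in range(n) if j != i)
        for i in range(n)
    ]
```

    Because this sketch drops self-loops, the expected degree of node i is slightly below w_i; when self-loops are allowed and no probability is capped at 1, the expected degree equals w_i exactly.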

  4. Conditional poliovirus mutants made by random deletion mutagenesis of infectious cDNA.

    PubMed Central

    Kirkegaard, K; Nelsen, B

    1990-01-01

    Small deletions were introduced into DNA plasmids bearing cDNA copies of Mahoney type 1 poliovirus RNA. The procedure used was similar to that of P. Hearing and T. Shenk (J. Mol. Biol. 167:809-822, 1983), with modifications designed to introduce only one lesion randomly into each DNA molecule. Methods to map small deletions in either large DNA or RNA molecules were employed. Two poliovirus mutants, VP1-101 and VP1-102, were selected from mutagenized populations on the basis of their host range phenotype, showing a large reduction in the relative numbers of plaques on CV1 and HeLa cells compared with wild-type virus. The deletions borne by the mutant genomes were mapped to the region encoding the amino terminus of VP1. That these lesions were responsible for the mutant phenotypes was substantiated by reintroduction of the sequenced lesions into a wild-type poliovirus cDNA by deoxyoligonucleotide-directed mutagenesis. The deletion of nucleotides encoding amino acids 8 and 9 of VP1 was responsible for the VP1-101 phenotype; the VP1-102 defect was caused by the deletion of the sequences encoding the first four amino acids of VP1. The peptide sequence at the VP1-VP3 proteolytic cleavage site was altered from glutamine-glycine to glutamine-methionine in VP1-102; this apparently did not alter the proteolytic cleavage pattern. The biochemical defects resulting from these mutations are discussed in the accompanying report. PMID:2152811

  5. Volume calculation of CT lung lesions based on Halton low-discrepancy sequences

    NASA Astrophysics Data System (ADS)

    Li, Shusheng; Wang, Liansheng; Li, Shuo

    2017-03-01

    Volume calculation from the Computed Tomography (CT) lung lesions data is a significant parameter for clinical diagnosis. The volume is widely used to assess the severity of lung nodules and track their progression; however, the accuracy and efficiency achieved in previous studies are not adequate for clinical use. It remains a challenging task because of the lesions' tight attachment to the lung wall, inhomogeneous background noise and large variations in size and shape. In this paper, we employ Halton low-discrepancy sequences to calculate the volume of the lung lesions. The proposed method directly computes the volume without the procedure of three-dimensional (3D) model reconstruction and surface triangulation, which significantly improves the efficiency and reduces the complexity. The main steps of the proposed method are: (1) generate a certain number of random points in each slice using Halton low-discrepancy sequences and calculate the lesion area of each slice through the proportion; (2) obtain the volume by integrating the areas in the sagittal direction. In order to evaluate our proposed method, experiments were conducted on sufficient data sets with different sizes of lung lesions. With the uniform distribution of random points, our proposed method achieves more accurate results compared with other methods, which demonstrates the robustness and accuracy for the volume calculation of CT lung lesions. In addition, our proposed method is easy to follow and can be extensively applied to other applications, e.g., volume calculation of liver tumor, atrial wall aneurysm, etc.
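
    The two steps listed in the abstract above can be sketched as follows. This is a rough illustration under stated assumptions, not the authors' implementation: the lesion in each slice is represented by a hypothetical `inside(x, y)` predicate, the Halton points use bases 2 and 3, and the volume is approximated as the sum of slice areas times a uniform slice spacing.

```python
def halton(index, base):
    """Radical inverse of a 1-based index in the given base, a value in (0, 1)."""
    result = 0.0
    fraction = 1.0 / base
    i = index
    while i > 0:
        result += fraction * (i % base)
        i //= base
        fraction /= base
    return result


def estimate_area(inside, width, height, n_points=2000):
    """Step 1: area = bounding-box area times the share of Halton points inside."""
    hits = 0
    for k in range(1, n_points + 1):
        x = halton(k, 2) * width   # base 2 for the x coordinate
        y = halton(k, 3) * height  # base 3 for the y coordinate
        if inside(x, y):
            hits += 1
    return width * height * hits / n_points


def estimate_volume(slice_areas, slice_spacing):
    """Step 2: integrate slice areas along the stacking direction (uniform spacing assumed)."""
    return sum(slice_areas) * slice_spacing
```

    For example, `estimate_area(lambda x, y: (x - 1) ** 2 + (y - 1) ** 2 <= 1, 2, 2)` approximates the area of a unit disc inside a 2 by 2 box.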

  6. The low information content of Neurospora splicing signals: implications for RNA splicing and intron origin.

    PubMed

    Collins, Richard A; Stajich, Jason E; Field, Deborah J; Olive, Joan E; DeAbreu, Diane M

    2015-05-01

    When we expressed a small (0.9 kb) nonprotein-coding transcript derived from the mitochondrial VS plasmid in the nucleus of Neurospora we found that it was efficiently spliced at one or more of eight 5' splice sites and ten 3' splice sites, which are present apparently by chance in the sequence. Further experimental and bioinformatic analyses of other mitochondrial plasmids, random sequences, and natural nuclear genes in Neurospora and other fungi indicate that fungal spliceosomes recognize a wide range of 5' splice site and branchpoint sequences and predict introns to be present at high frequency in random sequence. In contrast, analysis of intronless fungal nuclear genes indicates that branchpoint, 5' splice site and 3' splice site consensus sequences are underrepresented compared with random sequences. This underrepresentation of splicing signals is sufficient to deplete the nuclear genome of splice sites at locations that do not comprise biologically relevant introns. Thus, the splicing machinery can recognize a wide range of splicing signal sequences, but splicing still occurs with great accuracy, not because the splicing machinery distinguishes correct from incorrect introns, but because incorrect introns are substantially depleted from the genome. © 2015 Collins et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  7. Sites of instability in the human TCF3 (E2A) gene adopt G-quadruplex DNA structures in vitro

    PubMed Central

    Williams, Jonathan D.; Fleetwood, Sara; Berroyer, Alexandra; Kim, Nayun; Larson, Erik D.

    2015-01-01

    The formation of highly stable four-stranded DNA, called G-quadruplex (G4), promotes site-specific genome instability. G4 DNA structures fold from repetitive guanine sequences, and increasing experimental evidence connects G4 sequence motifs with specific gene rearrangements. The human transcription factor 3 (TCF3) gene (also termed E2A) is subject to genetic instability associated with severe disease, most notably a common translocation event t(1;19) associated with acute lymphoblastic leukemia. The sites of instability in TCF3 are not randomly distributed, but focused to certain sequences. We asked if G4 DNA formation could explain why TCF3 is prone to recombination and mutagenesis. Here we demonstrate that sequences surrounding the major t(1;19) break site and a region associated with copy number variations both contain G4 sequence motifs. The motifs identified readily adopt G4 DNA structures that are stable enough to interfere with DNA synthesis in physiological salt conditions in vitro. When introduced into the yeast genome, TCF3 G4 motifs promoted gross chromosomal rearrangements in a transcription-dependent manner. Our results provide a molecular rationale for the site-specific instability of human TCF3, suggesting that G4 DNA structures contribute to oncogenic DNA breaks and recombination. PMID:26029241

  8. Cytogenetic Diversity of Simple Sequences Repeats in Morphotypes of Brassica rapa ssp. chinensis

    PubMed Central

    Zheng, Jin-shuang; Sun, Cheng-zhen; Zhang, Shu-ning; Hou, Xi-lin; Bonnema, Guusje

    2016-01-01

    A significant fraction of the nuclear DNA of all eukaryotes is comprised of simple sequence repeats (SSRs). Although these sequences are widely used for studying genetic variation, linkage mapping and evolution, little attention had been paid to the chromosomal distribution and cytogenetic diversity of these sequences. In this paper, we report the distribution characterization of mono-, di-, and tri-nucleotide SSRs in Brassica rapa ssp. chinensis. Fluorescence in situ hybridization was used to characterize the cytogenetic diversity of SSRs among morphotypes of B. rapa ssp. chinensis. The proportion of different SSR motifs varied among morphotypes of B. rapa ssp. chinensis, with tri-nucleotide SSRs being more prevalent in the genome of B. rapa ssp. chinensis. We determined the chromosomal locations of mono-, di-, and tri-nucleotide repeat loci. The results showed that the chromosomal distribution of SSRs in the different morphotypes is non-random and motif-dependent, and allowed us to characterize the relative variability in terms of SSR numbers and similar chromosomal distributions in centromeric/peri-centromeric heterochromatin. The differences between SSR repeats with respect to abundance and distribution indicate that SSRs are a driving force in the genomic evolution of B. rapa species. Our results provide a comprehensive view of the SSR sequence distribution and evolution for comparison among morphotypes B. rapa ssp. chinensis. PMID:27507974

  9. Cytogenetic Diversity of Simple Sequences Repeats in Morphotypes of Brassica rapa ssp. chinensis.

    PubMed

    Zheng, Jin-Shuang; Sun, Cheng-Zhen; Zhang, Shu-Ning; Hou, Xi-Lin; Bonnema, Guusje

    2016-01-01

    A significant fraction of the nuclear DNA of all eukaryotes is comprised of simple sequence repeats (SSRs). Although these sequences are widely used for studying genetic variation, linkage mapping and evolution, little attention had been paid to the chromosomal distribution and cytogenetic diversity of these sequences. In this paper, we report the distribution characterization of mono-, di-, and tri-nucleotide SSRs in Brassica rapa ssp. chinensis. Fluorescence in situ hybridization was used to characterize the cytogenetic diversity of SSRs among morphotypes of B. rapa ssp. chinensis. The proportion of different SSR motifs varied among morphotypes of B. rapa ssp. chinensis, with tri-nucleotide SSRs being more prevalent in the genome of B. rapa ssp. chinensis. We determined the chromosomal locations of mono-, di-, and tri-nucleotide repeat loci. The results showed that the chromosomal distribution of SSRs in the different morphotypes is non-random and motif-dependent, and allowed us to characterize the relative variability in terms of SSR numbers and similar chromosomal distributions in centromeric/peri-centromeric heterochromatin. The differences between SSR repeats with respect to abundance and distribution indicate that SSRs are a driving force in the genomic evolution of B. rapa species. Our results provide a comprehensive view of the SSR sequence distribution and evolution for comparison among morphotypes B. rapa ssp. chinensis.

  10. Hiding message into DNA sequence through DNA coding and chaotic maps.

    PubMed

    Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

    2014-09-01

    The paper proposes an improved reversible substitution method to hide data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance the robustness and enlarge the hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity.

  11. Random oligonucleotide mutagenesis: application to a large protein coding sequence of a major histocompatibility complex class I gene, H-2DP.

    PubMed Central

    Murray, R; Pederson, K; Prosser, H; Muller, D; Hutchison, C A; Frelinger, J A

    1988-01-01

    We have used random oligonucleotide mutagenesis (or saturation mutagenesis) to create a library of point mutations in the alpha 1 protein domain of a Major Histocompatibility Complex (MHC) molecule. This protein domain is critical for T cell and B cell recognition. We altered the MHC class I H-2DP gene sequence such that synthetic mutant alpha 1 exons (270 bp of coding sequence), which contain mutations identified by sequence analysis, can replace the wild type alpha 1 exon. The synthetic exons were constructed from twelve overlapping oligonucleotides which contained an average of 1.3 random point mutations per intact exon. DNA sequence analysis of mutant alpha 1 exons has shown a point mutant distribution that fits a Poisson distribution, and thus emphasizes the utility of this mutagenesis technique to "scan" a large protein sequence for important mutations. We report our use of saturation mutagenesis to scan an entire exon of the H-2DP gene, a cassette strategy to replace the wild type alpha 1 exon with individual mutant alpha 1 exons, and analysis of mutant molecules expressed on the surface of transfected mouse L cells. PMID:2903482
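
    As a worked illustration of the Poisson fit mentioned in the abstract above, the sketch below computes the share of exons expected to carry 0, 1, 2, or more point mutations. It assumes an exact Poisson model with the reported mean of 1.3 mutations per exon; the function names are hypothetical and the numbers are model values, not the paper's observed counts.

```python
import math

MEAN_MUTATIONS = 1.3  # average point mutations per exon, as reported in the abstract


def poisson_pmf(k, mean):
    """Probability of exactly k events under a Poisson distribution."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def mutation_class_fractions(mean=MEAN_MUTATIONS, max_k=2):
    """Expected fraction of exons with 0..max_k mutations, plus the remaining tail."""
    fractions = {k: poisson_pmf(k, mean) for k in range(max_k + 1)}
    tail = 1.0 - sum(fractions.values())
    fractions[f"{max_k + 1}+"] = tail
    return fractions
```

    Under these assumptions roughly 27% of exons would be unmutated, about 35% would carry a single point mutation, and about 23% would carry two.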

  12. Determining the Significance of Item Order in Randomized Problem Sets

    ERIC Educational Resources Information Center

    Pardos, Zachary A.; Heffernan, Neil T.

    2009-01-01

    Researchers who make tutoring systems would like to know which sequences of educational content lead to the most effective learning by their students. The majority of data collected in many ITS systems consist of answers to a group of questions of a given skill often presented in a random sequence. Following work that identifies which items…

  13. Improved diagonal queue medical image steganography using Chaos theory, LFSR, and Rabin cryptosystem.

    PubMed

    Jain, Mamta; Kumar, Anil; Choudhary, Rishabh Charan

    2017-06-01

    In this article, we have proposed an improved diagonal queue medical image steganography for patient secret medical data transmission using chaotic standard map, linear feedback shift register, and Rabin cryptosystem, for improvement of previous technique (Jain and Lenka in Springer Brain Inform 3:39-51, 2016). The proposed algorithm comprises four stages, generation of pseudo-random sequences (pseudo-random sequences are generated by linear feedback shift register and standard chaotic map), permutation and XORing using pseudo-random sequences, encryption using Rabin cryptosystem, and steganography using the improved diagonal queues. Security analysis has been carried out. Performance analysis is observed using MSE, PSNR, maximum embedding capacity, as well as by histogram analysis between various Brain disease stego and cover images.
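
    To make the LFSR and XOR stages named in the abstract above concrete, here is a minimal sketch. It is not the authors' construction: the register width, the tap positions (16, 14, 13, 11, a standard maximal-length choice for a 16-bit Fibonacci LFSR), and the function names are assumptions chosen for illustration, and the chaotic map, permutation, Rabin encryption, and embedding stages are left out.

```python
def lfsr16_bits(seed, count):
    """Generate `count` bits from a 16-bit Fibonacci LFSR (taps 16, 14, 13, 11)."""
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("seed must be non-zero, otherwise the register stays at zero")
    bits = []
    for _ in range(count):
        bits.append(state & 1)  # output the least significant bit
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)
    return bits


def xor_with_keystream(data, seed):
    """XOR each byte of `data` with 8 keystream bits; applying it twice restores the input."""
    bits = lfsr16_bits(seed, 8 * len(data))
    out = bytearray()
    for i, byte in enumerate(data):
        key = 0
        for b in bits[8 * i:8 * i + 8]:
            key = (key << 1) | b  # pack bits most-significant first
        out.append(byte ^ key)
    return bytes(out)
```

    A bare LFSR is linear and therefore predictable from a short run of output, so it is not secure on its own; the abstract describes combining it with a chaotic map and Rabin encryption.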

  14. Event related potentials to digit learning: tracking neurophysiologic changes accompanying recall performance.

    PubMed

    Jongsma, Marijtje L A; Gerrits, Niels J H M; van Rijn, Clementina M; Quiroga, Rodrigo Quian; Maes, Joseph H R

    2012-07-01

    The aim of this study was to track recall performance and event-related potentials (ERPs) across multiple trials in a digit-learning task. When a sequence is practiced by repetition, the number of errors typically decreases and a learning curve emerges. Until now, almost all ERP learning and memory research has focused on effects after a single presentation and, therefore, fails to capture the dynamic changes that characterize a learning process. However, the current study used a free-recall task in which a sequence of ten auditory digits was presented repeatedly. Auditory sequences of ten digits were presented in a logical order (control sequences) or in a random order (experimental sequences). Each sequence was presented six times. Participants had to reproduce the sequence after each presentation. EEG recordings were made at the time of the digit presentations. Recall performance for the control sequences was close to asymptote right after the first learning trial, whereas performance for the experimental sequences initially displayed primacy and recency effects. However, these latter effects gradually disappeared over the six repetitions, resulting in near-asymptotic recall performance for all digits. The performance improvement for the middle items of the list was accompanied by an increase in P300 amplitude, implying a close correspondence between this ERP component and the behavioral data. These results, which were discussed in the framework of theories on the functional significance of the P300 amplitude, add to the scarce empirical data on the dynamics of ERP responses in the process of intentional learning. Copyright © 2011 Elsevier B.V. All rights reserved.

  15. Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment

    PubMed Central

    2011-01-01

    Background: Many Bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships. Results: In this paper, we have proposed a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences, and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied on the solutions of the initial generation and of each child generation within VDGA. We have used two mechanisms to generate an initial population in this research: the first mechanism is to generate guide trees with randomly selected sequences and the second is shuffling the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we have compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN, and PILEUP8, and also other methods, based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0. Conclusions: The experimental results showed that the VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison to other divisions considered with VDGA. The experimental results also confirmed that VDGA outperformed the other methods considered in this research. PMID:21867510

  16. δ-exceedance records and random adaptive walks

    NASA Astrophysics Data System (ADS)

    Park, Su-Chan; Krug, Joachim

    2016-08-01

    We study a modified record process where the kth record in a series of independent and identically distributed random variables is defined recursively through the condition $Y_k > Y_{k-1} - \delta_{k-1}$ with a deterministic sequence $\delta_k > 0$ called the handicap. For constant $\delta_k \equiv \delta$ and exponentially distributed random variables it has been shown in previous work that the process displays a phase transition as a function of $\delta$ between a normal phase where the mean record value increases indefinitely and a stationary phase where the mean record value remains bounded and a finite fraction of all entries are records (Park et al 2015 Phys. Rev. E 91 042707). Here we explore the behavior for general probability distributions and decreasing and increasing sequences $\delta_k$, focusing in particular on the case when $\delta_k$ matches the typical spacing between subsequent records in the underlying simple record process without handicap. We find that a continuous phase transition occurs only in the exponential case, but a novel kind of first order transition emerges when $\delta_k$ is increasing. The problem is partly motivated by the dynamics of evolutionary adaptation in biological fitness landscapes, where $\delta_k$ corresponds to the change of the deterministic fitness component after k mutational steps. The results for the record process are used to compute the mean number of steps that a population performs in such a landscape before being trapped at a local fitness maximum.
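
    The recursive record condition in the abstract above can be simulated directly. The sketch below is an illustration under stated assumptions, not the authors' code: the first entry is always treated as a record, the handicap is passed as a hypothetical function of the number of records found so far, and the helper for exponential variates uses unit rate.

```python
import random


def delta_exceedance_records(values, handicap):
    """Return the record values of a sequence under a handicap.

    An entry x becomes the next record if x > (current record) - handicap(k),
    where k is the number of records found so far. Assumption: the first entry
    always counts as a record.
    """
    records = []
    for x in values:
        if not records or x > records[-1] - handicap(len(records)):
            records.append(x)
    return records


def simulate_exponential(n, delta, seed=0):
    """Records among n unit-rate exponential variates with a constant handicap delta."""
    rng = random.Random(seed)
    values = [rng.expovariate(1.0) for _ in range(n)]
    return delta_exceedance_records(values, lambda k: delta)
```

    With a zero handicap this reduces to the ordinary record process; with a positive handicap a record value may be lower than its predecessor, which is what allows the mean record value to remain bounded in the stationary phase.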

  17. A maize map standard with sequenced core markers, grass genome reference points and 932 expressed sequence tagged sites (ESTs) in a 1736-locus map.

    PubMed Central

    Davis, G L; McMullen, M D; Baysdorfer, C; Musket, T; Grant, D; Staebell, M; Xu, G; Polacco, M; Koster, L; Melia-Hancock, S; Houchins, K; Chao, S; Coe, E H

    1999-01-01

    We have constructed a 1736-locus maize genome map containing 1156 loci probed by cDNAs, 545 probed by random genomic clones, 16 by simple sequence repeats (SSRs), 14 by isozymes, and 5 by anonymous clones. Sequence information is available for 56% of the loci with 66% of the sequenced loci assigned functions. A total of 596 new ESTs were mapped from a B73 library of 5-wk-old shoots. The map contains 237 loci probed by barley, oat, wheat, rice, or tripsacum clones, which serve as grass genome reference points in comparisons between maize and other grass maps. Ninety core markers selected for low copy number, high polymorphism, and even spacing along the chromosome delineate the 100 bins on the map. The average bin size is 17 cM. Use of bin assignments enables comparison among different maize mapping populations and experiments including those involving cytogenetic stocks, mutants, or quantitative trait loci. Integration of nonmaize markers in the map extends the resources available for gene discovery beyond the boundaries of maize mapping information into the expanse of map, sequence, and phenotype information from other grass species. This map provides a foundation for numerous basic and applied investigations including studies of gene organization, gene and genome evolution, targeted cloning, and dissection of complex traits. PMID:10388831

  18. Investigation of a protein complex network

    NASA Astrophysics Data System (ADS)

    Mashaghi, A. R.; Ramezanpour, A.; Karimipour, V.

    2004-09-01

    The budding yeast Saccharomyces cerevisiae is the first eukaryote whose genome has been completely sequenced. It is also the first eukaryotic cell whose proteome (the set of all proteins) and interactome (the network of all mutual interactions between proteins) has been analyzed. In this paper we study the structure of the yeast protein complex network in which weighted edges between complexes represent the number of shared proteins. It is found that the network of protein complexes is a small world network with scale free behavior for many of its distributions. However we find that there are no strong correlations between the weights and degrees of neighboring complexes. To reveal non-random features of the network we also compare it with a null model in which the complexes randomly select their proteins. Finally we propose a simple evolutionary model based on duplication and divergence of proteins.

  19. Test-enhanced web-based learning: optimizing the number of questions (a randomized crossover trial).

    PubMed

    Cook, David A; Thompson, Warren G; Thomas, Kris G

    2014-01-01

    Questions enhance learning in Web-based courses, but preliminary evidence suggests that too many questions may interfere with learning. The authors sought to determine how varying the number of self-assessment questions affects knowledge outcomes in a Web-based course. The authors conducted a randomized crossover trial in one internal medicine and one family medicine residency program between January 2009 and July 2010. Eight Web-based modules on ambulatory medicine topics were developed, with varying numbers of self-assessment questions (0, 1, 5, 10, or 15). Participants completed modules in four different formats each year, with sequence randomly assigned. Participants completed a pretest for half their modules. Outcomes included knowledge, completion time, and module ratings. One hundred eighty residents provided data. The mean (standard error) percent correct knowledge score was 53.2 (0.8) for pretests and 73.7 (0.5) for posttests. In repeated-measures analysis pooling all data, mean posttest knowledge scores were highest for the 10- and 15-question formats (75.7 [1.1] and 74.4 [1.0], respectively) and lower for 0-, 1-, and 5-question formats (73.1 [1.3], 72.9 [1.0], and 72.8 [1.5], respectively); P = .04 for differences across all modules. Modules with more questions generally took longer to complete and were rated higher, although differences were small. Residents most often identified 10 questions as ideal. Posttest knowledge scores were higher for modules that included a pretest (75.4 [0.9] versus 72.2 [0.9]; P = .0002). Increasing the number of self-assessment questions improves learning until a plateau beyond which additional questions do not add value.

  20. Benchmarking protein classification algorithms via supervised cross-validation.

    PubMed

    Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

    2008-04-24

    Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 

  1. Effect of Ginkgo biloba on visual field and contrast sensitivity in Chinese patients with normal tension glaucoma: a randomized, crossover clinical trial.

    PubMed

    Guo, Xinxing; Kong, Xiangbin; Huang, Rui; Jin, Ling; Ding, Xiaohu; He, Mingguang; Liu, Xing; Patel, Mehul Chimanlal; Congdon, Nathan G

    2014-01-07

    We evaluated the effect of ginkgo biloba extract on visual field defect and contrast sensitivity in a Chinese cohort with normal tension glaucoma. In this prospective, randomized, placebo-controlled crossover study, patients newly diagnosed with normal tension glaucoma, either in a tertiary glaucoma clinic (n = 5) or in a cohort undergoing routine general physical examinations in a primary care clinic (n = 30), underwent two 4-week phases of treatment, separated by a washout period of 8 weeks. Randomization determined whether ginkgo biloba extract (40 mg, 3 times per day) or placebo (identical-appearing tablets) was received first. Primary outcomes were change in contrast sensitivity and mean deviation on 24-2 SITA standard visual field testing, while secondary outcomes included IOP and self-reported adverse events. A total of 35 patients with mean age 63.7 (6.5) years were randomized to the ginkgo biloba extract-placebo (n = 18) or the placebo-ginkgo biloba extract (n = 17) sequence. A total of 28 patients (80.0%, 14 in each group) who completed testing did not differ at baseline in age, sex, visual field mean deviation, contrast sensitivity, IOP, or blood pressure. Changes in visual field and contrast sensitivity did not differ by treatment received or sequence (P > 0.2 for all). Power to have detected a difference in mean defect as large as previously reported was 80%. In contrast to some previous reports, ginkgo biloba extract treatment had no effect on mean defect or contrast sensitivity in this group of normal tension glaucoma patients. (http://www.chictr.org number, ChiCTR-TRC-08000724).

  2. GuiTope: an application for mapping random-sequence peptides to protein sequences.

    PubMed

    Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert

    2012-01-03

    Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.

  3. Microscale simulations of shock interaction with large assembly of particles for developing point-particle models

    NASA Astrophysics Data System (ADS)

    Thakur, Siddharth; Neal, Chris; Mehta, Yash; Sridharan, Prasanth; Jackson, Thomas; Balachandar, S.

    2017-01-01

    Microscale simulations are being conducted for developing point-particle and other related models that are needed for the mesoscale and macroscale simulations of explosive dispersal of particles. These particle models are required to compute (a) instantaneous aerodynamic force on the particle and (b) instantaneous net heat transfer between the particle and the surrounding. A strategy for a sequence of microscale simulations has been devised that allows systematic development of the hybrid surrogate models that are applicable at conditions representative of the explosive dispersal application. The ongoing microscale simulations seek to examine particle force dependence on: (a) Mach number, (b) Reynolds number, and (c) volume fraction (different particle arrangements such as cubic, face-centered cubic (FCC), body-centered cubic (BCC) and random). Future plans include investigation of sequences of fully-resolved microscale simulations consisting of an array of particles subjected to more realistic time-dependent flows that progressively better approximate the actual problem of explosive dispersal. Additionally, effects of particle shape, size, and number in simulation as well as the transient particle deformation dependence on various parameters including: (a) particle material, (b) medium material, (c) multiple particles, (d) incoming shock pressure and speed, (e) medium to particle impedance ratio, (f) particle shape and orientation to shock, etc. are being investigated.

  4. Missing value imputation for gene expression data by tailored nearest neighbors.

    PubMed

    Faisal, Shahla; Tutz, Gerhard

    2017-04-25

    High dimensional data like gene expression and RNA-sequences often contain missing values. The subsequent analysis and results based on these incomplete data can suffer strongly from the presence of these missing values. Several approaches to imputation of missing values in gene expression data have been developed but the task is difficult due to the high dimensionality (number of genes) of the data. Here an imputation procedure is proposed that uses weighted nearest neighbors. Instead of using nearest neighbors defined by a distance that includes all genes, the distance is computed only for genes that are apt to contribute to the accuracy of imputed values. The method aims at avoiding the curse of dimensionality, which typically occurs if local methods such as nearest neighbors are applied in high dimensional settings. The proposed weighted nearest neighbors algorithm is compared to existing missing value imputation techniques like mean imputation, KNNimpute and the recently proposed imputation by random forests. We use RNA-sequence and microarray data from studies on human cancer to compare the performance of the methods. The results from simulations as well as real studies show that the weighted distance procedure can successfully handle missing values for high dimensional data structures where the number of predictors is larger than the number of samples. The method typically outperforms the considered competitors.
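    The idea of restricting the distance to a subset of genes can be sketched as follows. This is a simplified illustration, not the authors' algorithm: the `relevant` column list, the Gaussian kernel, and the `bandwidth` parameter are all assumptions introduced here.

```python
import math

def impute_weighted_nn(data, row, col, relevant, bandwidth=1.0):
    """Impute data[row][col] as a weighted mean of the same column in other rows.

    Distances use only the `relevant` columns (an assumed, caller-supplied
    selection); None marks a missing value.
    """
    num = 0.0
    den = 0.0
    for r, other in enumerate(data):
        if r == row or other[col] is None:
            continue  # a donor row must have the target value observed
        # Compare only relevant columns observed in both rows.
        shared = [c for c in relevant
                  if c != col and data[row][c] is not None and other[c] is not None]
        if not shared:
            continue
        # Root mean squared difference, so donors with fewer shared columns stay comparable.
        dist = math.sqrt(sum((data[row][c] - other[c]) ** 2 for c in shared) / len(shared))
        weight = math.exp(-(dist / bandwidth) ** 2)  # Gaussian kernel (assumption)
        num += weight * other[col]
        den += weight
    if den == 0.0:
        raise ValueError("no usable donor rows")
    return num / den

# Toy matrix: rows are samples, columns are genes.
samples = [
    [1.0, 2.0, None],
    [1.0, 2.0, 5.0],
    [9.0, 9.0, 50.0],
]
estimate = impute_weighted_nn(samples, row=0, col=2, relevant=[0, 1])
```

    In the toy matrix the second sample is identical to the first on the relevant genes, so it dominates the weighted mean and the estimate is essentially 5.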

  5. Comparing the white dwarf cooling sequences in 47 Tuc and NGC 6397

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Richer, Harvey B.; Goldsbury, Ryan; Heyl, Jeremy

    2013-12-01

    Using deep Hubble Space Telescope imaging, color-magnitude diagrams are constructed for the globular clusters 47 Tuc and NGC 6397. As expected, because of its lower metal abundance, the main sequence of NGC 6397 lies well to the blue of that of 47 Tuc. A comparison of the white dwarf cooling sequences of the two clusters, however, demonstrates that these sequences are indistinguishable over most of their loci, a consequence of the settling out of heavy elements in the dense white dwarf atmosphere and the near equality of their masses. Lower quality data on M4 continues this trend to a third cluster whose metallicity is intermediate between these two. While the path of the white dwarfs in the color-magnitude diagram is nearly identical in 47 Tuc and NGC 6397, the numbers of white dwarfs along the path are not. This results from the relatively rapid relaxation in NGC 6397 compared to 47 Tuc and provides a cautionary note that simply counting objects in star clusters in random locations as a method of testing stellar evolutionary theory is likely dangerous unless dynamical considerations are included.

  6. Trinucleotide cassettes increase diversity of T7 phage-displayed peptide library.

    PubMed

    Krumpe, Lauren R H; Schumacher, Kathryn M; McMahon, James B; Makowski, Lee; Mori, Toshiyuki

    2007-10-05

    Amino acid sequence diversity is introduced into a phage-displayed peptide library by randomizing library oligonucleotide DNA. We recently evaluated the diversity of peptide libraries displayed on T7 lytic phage and M13 filamentous phage and showed that T7 phage can display a more diverse amino acid sequence repertoire due to differing processes of viral morphogenesis. In this study, we evaluated and compared the diversity of a 12-mer T7 phage-displayed peptide library randomized using codon-corrected trinucleotide cassettes with a T7 and an M13 12-mer phage-displayed peptide library constructed using the degenerate codon randomization method. We herein demonstrate that the combination of trinucleotide cassette amino acid codon randomization and T7 phage display construction methods resulted in a significant enhancement to the functional diversity of a 12-mer peptide library. This novel library exhibited superior amino acid uniformity and order-of-magnitude increases in amino acid sequence diversity as compared to degenerate codon randomized peptide libraries. Comparative analyses of the biophysical characteristics of the 12-mer peptide libraries revealed the trinucleotide cassette-randomized library to be a unique resource. The combination of T7 phage display and trinucleotide cassette randomization resulted in a novel resource for the potential isolation of binding peptides for new and previously studied molecular targets.

  7. The effects of eszopiclone on sleep spindles and memory consolidation in schizophrenia: a randomized placebo-controlled trial.

    PubMed

    Wamsley, Erin J; Shinn, Ann K; Tucker, Matthew A; Ono, Kim E; McKinley, Sophia K; Ely, Alice V; Goff, Donald C; Stickgold, Robert; Manoach, Dara S

    2013-09-01

    In schizophrenia there is a dramatic reduction of sleep spindles that predicts deficient sleep-dependent memory consolidation. Eszopiclone (Lunesta), a non-benzodiazepine hypnotic, acts on γ-aminobutyric acid (GABA) neurons in the thalamic reticular nucleus where spindles are generated. We investigated whether eszopiclone could increase spindles and thereby improve memory consolidation in schizophrenia. In a double-blind design, patients were randomly assigned to receive either placebo or 3 mg of eszopiclone. Patients completed Baseline and Treatment visits, each consisting of two consecutive nights of polysomnography. On the second night of each visit, patients were trained on the motor sequence task (MST) at bedtime and tested the following morning. Academic research center. Twenty-one chronic, medicated schizophrenia outpatients. We compared the effects of two nights of eszopiclone vs. placebo on stage 2 sleep spindles and overnight changes in MST performance. Eszopiclone increased the number and density of spindles over baseline levels significantly more than placebo, but did not significantly enhance overnight MST improvement. In the combined eszopiclone and placebo groups, spindle number and density predicted overnight MST improvement. Eszopiclone significantly increased sleep spindles, which correlated with overnight motor sequence task improvement. These findings provide partial support for the hypothesis that the spindle deficit in schizophrenia impairs sleep-dependent memory consolidation and may be ameliorated by eszopiclone. Larger samples may be needed to detect a significant effect on memory. Given the general role of sleep spindles in cognition, they offer a promising novel potential target for treating cognitive deficits in schizophrenia.

  8. Universality of long-range correlations in expansion randomization systems

    NASA Astrophysics Data System (ADS)

    Messer, P. W.; Lässig, M.; Arndt, P. F.

    2005-10-01

    We study the stochastic dynamics of sequences evolving by single-site mutations, segmental duplications, deletions, and random insertions. These processes are relevant for the evolution of genomic DNA. They define a universality class of non-equilibrium 1D expansion-randomization systems with generic stationary long-range correlations in a regime of growing sequence length. We obtain explicitly the two-point correlation function of the sequence composition and the distribution function of the composition bias in sequences of finite length. The characteristic exponent χ of these quantities is determined by the ratio of two effective rates, which are explicitly calculated for several specific sequence evolution dynamics of the universality class. Depending on the value of χ, we find two different scaling regimes, which are distinguished by the detectability of the initial composition bias. All analytic results are accurately verified by numerical simulations. We also discuss the non-stationary build-up and decay of correlations, as well as more complex evolutionary scenarios, where the rates of the processes vary in time. Our findings provide a possible example for the emergence of universality in molecular biology.

  9. Serial Reaction Time Learning in Preschool- and School-Age Children.

    ERIC Educational Resources Information Center

    Thomas, Kathleen M.; Nelson, Charles A.

    2001-01-01

    Two experiments assessed visuomotor sequence learning in 4- to 10-year-olds using a serial reaction time (SRT) task with random and sequenced trials. Found that children demonstrated sequence-specific decreases in RT. Participants with explicit awareness of the sequence at the session's end showed larger sequence-specific RT decrements than…

  10. Stationary Random Metrics on Hierarchical Graphs Via (min,+)-type Recursive Distributional Equations

    NASA Astrophysics Data System (ADS)

    Khristoforov, Mikhail; Kleptsyn, Victor; Triestino, Michele

    2016-07-01

    This paper is inspired by the problem of understanding in a mathematical sense the Liouville quantum gravity on surfaces. Here we show how to define a stationary random metric on self-similar spaces which are the limit of nice finite graphs: these are the so-called hierarchical graphs. They possess a well-defined level structure and any level is built using a simple recursion. Stopping the construction at any finite level, we have a discrete random metric space when we set the edges to have random length (using a multiplicative cascade with fixed law m). We introduce a tool, the cut-off process, by means of which one finds that renormalizing the sequence of metrics by an exponential factor, they converge in law to a non-trivial metric on the limit space. Such limit law is stationary, in the sense that glueing together a certain number of copies of the random limit space, according to the combinatorics of the brick graph, the obtained random metric has the same law when rescaled by a random factor of law m. In other words, the stationary random metric is the solution of a distributional equation. When the measure m has continuous positive density on R+, the stationary law is unique up to rescaling and any other distribution tends to a rescaled stationary law under the iterations of the hierarchical transformation. We also investigate topological and geometric properties of the random space when m is log-normal, detecting a phase transition influenced by the branching random walk associated to the multiplicative cascade.

  11. The Advanced Glaucoma Intervention Study (AGIS): 9. Comparison of glaucoma outcomes in black and white patients within treatment groups.

    PubMed

    2001-09-01

    To compare in eyes of black and white patients the progression of glaucoma after failure of medical therapy and upon start of surgical intervention. Cohort study analysis of data from a randomized clinical trial. This multicenter study included open-angle glaucoma patients who had failed medical therapy: 451 eyes of 332 black patients, 325 eyes of 249 white patients. Eyes were randomly assigned to an argon laser trabeculoplasty (ALT)-trabeculectomy-trabeculectomy (ATT) sequence or a trabeculectomy-ALT-trabeculectomy (TAT) sequence; they had been followed for 7 to 11 years at database closure. Main outcome measures were decrease of visual field (DVF), sustained decrease of visual field (SDVF), decrease of visual acuity (DVA), sustained decrease of visual acuity (SDVA), and failure of first surgical glaucoma intervention. Statistical methods included logistic regression to obtain average adjusted black-white odds ratios for binary outcomes, and Cox regression to estimate adjusted black-white risk ratios for time-to-event outcomes. In the ATT sequence blacks were at lower risk than whites of failure of first intervention (ALT, RR = 0.68, P = 0.040). In the TAT sequence blacks were at higher risk than whites of failure of the first intervention (trabeculectomy, RR = 1.79, P = 0.033), of intraocular pressure ≥18 mm Hg (average OR = 1.41, P = 0.026), and of DVF (average OR = 1.78, P = 0.007). In both treatment sequences, the average number of prescribed medications was greater for blacks than whites (P ≤ 0.002). The results support the hypothesis that after failure of medical therapy and upon initiation of surgical intervention, an initial intervention with trabeculectomy retards the progression of glaucoma more effectively in white than in black patients. The data provide a weak suggestion that an initial surgical intervention with ALT retards the progression of glaucoma more effectively in black than in white patients.

  12. Alu repeats: A source for the genesis of primate microsatellites

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Arcot, S.S.; Batzer, M.A.; Wang, Zhenyuan

    1995-09-01

    As a result of their abundance, relatively uniform distribution, and high degree of polymorphism, microsatellites and minisatellites have become valuable tools in genetic mapping, forensic identity testing, and population studies. In recent years, a number of microsatellite repeats have been found to be associated with Alu interspersed repeated DNA elements. The association of an Alu element with a microsatellite repeat could result from the integration of an Alu element within a preexisting microsatellite repeat. Alternatively, Alu elements could have a direct role in the origin of microsatellite repeats. Errors introduced during reverse transcription of the primary transcript derived from an Alu "master" gene or the accumulation of random mutations in the middle A-rich regions and oligo(dA)-rich tails of Alu elements after insertion and subsequent expansion and contraction of these sequences could result in the genesis of a microsatellite repeat. We have tested these hypotheses by a direct evolutionary comparison of the sequences of some recent Alu elements that are found only in humans and are absent from nonhuman primates, as well as some older Alu elements that are present at orthologous positions in a number of nonhuman primates. The origin of "young" Alu insertions, absence of sequences that resemble microsatellite repeats at the orthologous loci in chimpanzees, and the gradual expansion of microsatellite repeats in some old Alu repeats at orthologous positions within the genomes of a number of nonhuman primates suggest that Alu elements are a source for the genesis of primate microsatellite repeats. 48 refs., 5 figs., 3 tabs.

  13. Self-correcting random number generator

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Humble, Travis S.; Pooser, Raphael C.

    2016-09-06

    A system and method for generating random numbers. The system may include a random number generator (RNG), such as a quantum random number generator (QRNG) configured to self-correct or adapt in order to substantially achieve randomness from the output of the RNG. By adapting, the RNG may generate a random number that may be considered random regardless of whether the random number itself is tested as such. As an example, the RNG may include components to monitor one or more characteristics of the RNG during operation, and may use the monitored characteristics as a basis for adapting, or self-correcting, to provide a random number according to one or more performance criteria.

  14. Comparing viral metagenomics methods using a highly multiplexed human viral pathogens reagent

    PubMed Central

    Li, Linlin; Deng, Xutao; Mee, Edward T.; Collot-Teixeira, Sophie; Anderson, Rob; Schepelmann, Silke; Minor, Philip D.; Delwart, Eric

    2014-01-01

    Unbiased metagenomic sequencing holds significant potential as a diagnostic tool for the simultaneous detection of any previously genetically described viral nucleic acids in clinical samples. Viral genome sequences can also inform on likely phenotypes including drug susceptibility or neutralization serotypes. In this study, different variables of the laboratory methods often used to generate viral metagenomics libraries on the efficiency of viral detection and virus genome coverage were compared. A biological reagent consisting of 25 different human RNA and DNA viral pathogens was used to estimate the effect of filtration and nuclease digestion, DNA/RNA extraction methods, pre-amplification and the use of different library preparation kits on the detection of viral nucleic acids. Filtration and nuclease treatment led to slight decreases in the percentage of viral sequence reads and number of viruses detected. For nucleic acid extractions silica spin columns improved viral sequence recovery relative to magnetic beads and Trizol extraction. Pre-amplification using random RT-PCR while generating more viral sequence reads resulted in detection of fewer viruses, more overlapping sequences, and lower genome coverage. The ScriptSeq library preparation method retrieved more viruses and a greater fraction of their genomes than the TruSeq and Nextera methods. Viral metagenomics sequencing was able to simultaneously detect up to 22 different viruses in the biological reagent analyzed including all those detected by qPCR. Further optimization will be required for the detection of viruses in biologically more complex samples such as tissues, blood, or feces. PMID:25497414

  15. Augmented brain function by coordinated reset stimulation with slowly varying sequences.

    PubMed

    Zeitler, Magteld; Tass, Peter A

    2015-01-01

    Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS.

  16. Augmented brain function by coordinated reset stimulation with slowly varying sequences

    PubMed Central

    Zeitler, Magteld; Tass, Peter A.

    2015-01-01

    Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS. PMID:25873867

  17. Finding specific RNA motifs: Function in a zeptomole world?

    PubMed Central

    KNIGHT, ROB; YARUS, MICHAEL

    2003-01-01

    We have developed a new method for estimating the abundance of any modular (piecewise) RNA motif within a longer random region. We have used this method to estimate the size of the active motifs available to modern SELEX experiments (picomoles of unique sequences) and to a plausible RNA World (zeptomoles of unique sequences: 1 zmole = 602 sequences). Unexpectedly, activities such as specific isoleucine binding are almost certainly present in zeptomoles of molecules, and even ribozymes such as self-cleavage motifs may appear (depending on assumptions about the minimal structures). The number of specified nucleotides is not the only important determinant of a motif’s rarity: The number of modules into which it is divided, and the details of this division, are also crucial. We propose three maxims for easily isolated motifs: the Maxim of Minimization, the Maxim of Multiplicity, and the Maxim of the Median. These maxims together state that selected motifs should be small and composed of as many separate, equally sized modules as possible. For evenly divided motifs with four modules, the largest accessible activity in picomole scale (1–1000 pmole) pools of length 100 is about 34 nucleotides; while for zeptomole scale (1–1000 zmole) pools it is about 20 specific nucleotides (50% probability of occurrence). This latter figure includes some ribozymes and aptamers. Consequently, an RNA metabolism apparently could have begun with only zeptomoles of RNA molecules. PMID:12554865
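    The conversion quoted above (1 zmole = 602 sequences) follows directly from Avogadro's number. A minimal sketch of the arithmetic, with the helper name chosen here purely for illustration:

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)

def molecules(amount, prefix_exponent):
    """Number of molecules in `amount` x 10**prefix_exponent moles."""
    return amount * 10.0 ** prefix_exponent * AVOGADRO

per_zeptomole = molecules(1, -21)  # zepto = 10**-21, roughly 602 molecules
per_picomole = molecules(1, -12)   # pico = 10**-12, roughly 6.02e11 molecules
```

    The nine orders of magnitude between the two pool sizes are what separate a modern SELEX experiment from the zeptomole-scale pools considered in the abstract.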

  18. High-Throughput Development of SSR Markers from Pea (Pisum sativum L.) Based on Next Generation Sequencing of a Purified Chinese Commercial Variety

    PubMed Central

    Zhang, Xiaoyan; Hu, Jinguo; Bao, Shiying; Hao, Junjie; Li, Ling; He, Yuhua; Jiang, Junye; Wang, Fang; Tian, Shufang; Zong, Xuxiao

    2015-01-01

    Pea (Pisum sativum L.) is an important food legume globally, and is the plant species that J.G. Mendel used to lay the foundation of modern genetics. However, genomics resources of pea are limited compared to other crop species. Application of marker assisted selection (MAS) in pea breeding has lagged behind many other crops. Development of a large number of novel and reliable SSR (simple sequence repeat) or microsatellite markers will help both basic and applied genomics research of this crop. The Illumina HiSeq 2500 System was used to uncover 8,899 putative SSR containing sequences, and 3,275 non-redundant primers were designed to amplify these SSRs. Among the 1,644 SSRs that were randomly selected for primer validation, 841 yielded reliable amplifications of detectable polymorphisms among 24 genotypes of cultivated pea (Pisum sativum L.) and wild relatives (P. fulvum Sm.) originating from diverse geographical locations. The dataset indicated that the allele number per locus ranged from 2 to 10, and that the polymorphism information content (PIC) ranged from 0.08 to 0.82 with an average of 0.38. These 1,644 novel SSR markers were also tested for polymorphism between genotypes G0003973 and G0005527. Finally, 33 polymorphic SSR markers were anchored on the genetic linkage map of G0003973 × G0005527 F2 population. PMID:26440522
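    The abstract reports PIC values without giving a formula. The sketch below assumes the widely used Botstein-style definition, PIC = 1 - sum(p_i^2) - sum over i<j of 2 p_i^2 p_j^2, where p_i are allele frequencies at a locus; whether the authors used exactly this form is not stated.

```python
def polymorphism_information_content(freqs):
    """PIC for one locus from its allele frequencies (assumed to sum to 1)."""
    n = len(freqs)
    homozygosity = sum(p * p for p in freqs)
    # Correction term over all unordered pairs of distinct alleles.
    pair_term = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                    for i in range(n) for j in range(i + 1, n))
    return 1 - homozygosity - pair_term

pic_two_equal = polymorphism_information_content([0.5, 0.5])
pic_monomorphic = polymorphism_information_content([1.0])
```

    Two equally frequent alleles give 0.375, the ceiling for a biallelic locus, so the reported maximum of 0.82 can only arise at loci with many alleles.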

  19. Occurrence and Nonoccurrence of Random Sequences: Comment on Hahn and Warren (2009)

    ERIC Educational Resources Information Center

    Sun, Yanlong; Tweney, Ryan D.; Wang, Hongbin

    2010-01-01

    On the basis of the statistical concept of waiting time and on computer simulations of the "probabilities of nonoccurrence" (p. 457) for random sequences, Hahn and Warren (2009) proposed that given people's experience of a finite data stream from the environment, the gambler's fallacy is not as gross an error as it might seem. We deal with two…
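    The waiting-time concept mentioned above can be made concrete with the standard overlap formula for the expected number of fair draws until a pattern first appears. This is a textbook result offered as an illustration; it is not taken from the commented paper, and the function name is chosen here.

```python
def expected_waiting_time(pattern, alphabet_size=2):
    """Expected number of fair, independent draws until `pattern` first occurs.

    Sums alphabet_size**k over every length k at which the pattern's first k
    symbols equal its last k symbols (the pattern's self-overlaps).
    """
    n = len(pattern)
    return sum(alphabet_size ** k
               for k in range(1, n + 1)
               if pattern[:k] == pattern[-k:])

wait_hh = expected_waiting_time("HH")  # self-overlapping pattern
wait_ht = expected_waiting_time("HT")  # no self-overlap
```

    For a fair coin, HH takes 6 flips on average while HT takes 4, even though both have probability 1/4 in any given pair of flips.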

  20. Indirect vs direct bonding of mandibular fixed retainers in orthodontic patients: a single-center randomized controlled trial comparing placement time and failure over a 6-month period.

    PubMed

    Bovali, Efstathia; Kiliaridis, Stavros; Cornelis, Marie A

    2014-12-01

    The objective of this 2-arm parallel single-center trial was to compare placement time and numbers of failures of mandibular lingual retainers bonded with an indirect procedure vs a direct bonding procedure. Sixty-four consecutive patients at the postgraduate orthodontic clinic of the University of Geneva in Switzerland scheduled for debonding and mandibular fixed retainer placement were randomly allocated to either an indirect bonding procedure or a traditional direct bonding procedure. Eligibility criteria were the presence of the 4 mandibular incisors and the 2 mandibular canines, and no active caries, restorations, fractures, or periodontal disease of these teeth. The patients were randomized in blocks of 4; the randomization sequence was generated using an online randomization service (www.randomization.com). Allocation concealment was secured by contacting the sequence generator for treatment assignment; blinding was possible for outcome assessment only. Bonding time was measured for each procedure. Unpaired t tests were used to assess differences in time. Patients were recalled at 1, 2, 4, and 6 months after bonding. Mandibular fixed retainers having at least 1 composite pad debonded were considered as failures. The log-rank test was used to compare the Kaplan-Meier survival curves of both procedures. A test of proportion was applied to compare the failures at 6 months between the treatment groups. Sixty-four patients were randomized in a 1:1 ratio. One patient dropped out at baseline after the bonding procedure, and 3 patients did not attend the recalls at 4 and 6 months. Bonding time was significantly shorter for the indirect procedure (321 ± 31 seconds, mean ± SD) than for the direct procedure (401 ± 40 seconds) (per protocol analysis of 63 patients: mean difference = 80 seconds; 95% CI = 62.4-98.1; P <0.001). The 6-month numbers of failures were 10 of 31 (32%) with the indirect technique and 7 of 29 (24%) with the direct technique (log rank: P = 0.35; test of proportions: risk difference = 0.08; 95% CI = -0.15 to 0.31; P = 0.49). No serious harm was observed except for plaque accumulation. Indirect bonding was statistically significantly faster than direct bonding, with both techniques showing similar risks of failure. This trial was not registered. The protocol was not published before trial commencement. No funding or conflict of interest to be declared. Copyright © 2014 American Association of Orthodontists. Published by Elsevier Inc. All rights reserved.
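    The reported risk difference and its interval can be reproduced from the failure counts. The sketch assumes a simple Wald (normal-approximation) interval; the abstract does not say which method the authors used, although this one matches the published figures after rounding.

```python
import math

def risk_difference(events_a, n_a, events_b, n_b, z=1.96):
    """Difference in proportions (a minus b) with a Wald confidence interval."""
    p_a = events_a / n_a
    p_b = events_b / n_b
    diff = p_a - p_b
    # Standard error of a difference between two independent proportions.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    return diff, diff - z * se, diff + z * se

# Indirect bonding: 10 failures of 31; direct bonding: 7 failures of 29.
diff, low, high = risk_difference(10, 31, 7, 29)
```

    Rounded to two decimals this gives 0.08 with an interval of -0.15 to 0.31, so the data are compatible with either technique failing more often.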

  1. A new feedback image encryption scheme based on perturbation with dynamical compound chaotic sequence cipher generator

    NASA Astrophysics Data System (ADS)

    Tong, Xiaojun; Cui, Minggen; Wang, Zhu

    2009-07-01

    The design of the new compound two-dimensional chaotic function is presented by exploiting two one-dimensional chaotic functions which switch randomly, and the design is used as a chaotic sequence generator which is proved by Devaney's definition proof of chaos. The properties of compound chaotic functions are also proved rigorously. In order to improve the robustness against difference cryptanalysis and produce avalanche effect, a new feedback image encryption scheme is proposed using the new compound chaos by selecting one of the two one-dimensional chaotic functions randomly and a new image pixels method of permutation and substitution is designed in detail by array row and column random controlling based on the compound chaos. The results from entropy analysis, difference analysis, statistical analysis, sequence randomness analysis, cipher sensitivity analysis depending on key and plaintext have proven that the compound chaotic sequence cipher can resist cryptanalytic, statistical and brute-force attacks, and especially it accelerates encryption speed, and achieves higher level of security. By the dynamical compound chaos and perturbation technology, the paper solves the problem of computer low precision of one-dimensional chaotic function.

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kundu, Sourav, E-mail: sourav.kundu@saha.ac.in; Karmakar, S. N., E-mail: sachindranath.karmakar@saha.ac.in

    We present a tight-binding study of conformation dependent electronic transport properties of DNA double-helix including its helical symmetry. We have studied the changes in the localization properties of DNA as we alter the number of stacked bases within every pitch of the double-helix keeping fixed the total number of nitrogen bases within the DNA molecule. We take three DNA sequences, two of them are periodic and one is random and observe that in all the cases localization length increases as we increase the radius of DNA double-helix i.e., number of nucleobases within a pitch. We have also investigated the effect of backbone energetic on the I-V response of the system and found that in presence of helical symmetry, depending on the interplay of conformal variation and disorder, DNA can be found in metallic, semiconducting, or insulating phases, as observed experimentally.

  3. Acceptability and performance of the menstrual cup in South Africa: a randomized crossover trial comparing the menstrual cup to tampons or sanitary pads.

    PubMed

    Beksinska, Mags E; Smit, Jenni; Greener, Ross; Todd, Catherine S; Lee, Mei-ling Ting; Maphumulo, Virginia; Hoffmann, Vivian

    2015-02-01

    In low-income settings, many women and girls face activity restrictions during menses, owing to lack of affordable menstrual products. The menstrual cup (MC) is a nonabsorbent reusable cup that collects menstrual blood. We assessed the acceptability and performance of the MPower® MC compared to pads or tampons among women in a low-resource setting. We conducted a randomized two-period crossover trial at one site in Durban, South Africa, between January and November 2013. Participants aged 18-45 years with regular menstrual cycles were eligible for inclusion if they had no intention of becoming pregnant, were using an effective contraceptive method, had water from the municipal system as their primary water source, and had no sexually transmitted infections. We used a computer-generated randomization sequence to assign participants to one of two sequences of menstrual product use, with allocation concealed only from the study investigators. Participants used each method over three menstrual cycles (total 6 months) and were interviewed at baseline and monthly follow-up visits. The product acceptability outcome compared product satisfaction question scores using an ordinal logistic regression model with individual random effects. This study is registered on the South African Clinical Trials database: number DOH-27-01134273. Of 124 women assessed, 110 were eligible and randomly assigned to selected menstrual products. One hundred and five women completed all follow-up visits. By comparison to pads/tampons (usual product used), the MC was rated significantly better for comfort, quality, menstrual blood collection, appearance, and preference. Both of these comparative outcome measures, along with likelihood of continued use, recommending the product, and future purchase, increased for the MC over time. MC acceptance in a population of novice users, many with limited experience with tampons, indicates that there is a pool of potential users in low-resource settings.

  4. Sequence analysis reveals genomic factors affecting EST-SSR primer performance and polymorphism

    USDA-ARS's Scientific Manuscript database

    Search for simple sequence repeat (SSR) motifs and design of flanking primers in expressed sequence tag (EST) sequences can be easily done at a large scale using bioinformatics programs. However, failed amplification and/or detection, along with lack of polymorphism, is often seen among randomly sel...

  5. Analysis of in vitro evolution reveals the underlying distribution of catalytic activity among random sequences.

    PubMed

    Pressman, Abe; Moretti, Janina E; Campbell, Gregory W; Müller, Ulrich F; Chen, Irene A

    2017-08-21

    The emergence of catalytic RNA is believed to have been a key event during the origin of life. Understanding how catalytic activity is distributed across random sequences is fundamental to estimating the probability that catalytic sequences would emerge. Here, we analyze the in vitro evolution of triphosphorylating ribozymes and translate their fitnesses into absolute estimates of catalytic activity for hundreds of ribozyme families. The analysis efficiently identified highly active ribozymes and estimated catalytic activity with good accuracy. The evolutionary dynamics follow Fisher's Fundamental Theorem of Natural Selection and a corollary, permitting retrospective inference of the distribution of fitness and activity in the random sequence pool for the first time. The frequency distribution of rate constants appears to be log-normal, with a surprisingly steep dropoff at higher activity, consistent with a mechanism for the emergence of activity as the product of many independent contributions. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  6. CRF: detection of CRISPR arrays using random forest.

    PubMed

    Wang, Kai; Liang, Chun

    2017-01-01

    CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in a wide range of bacterial and archaeal genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Unlike other CRISPR detection tools, CRF uses a random forest classifier to filter out invalid CRISPR arrays from all putative candidates, which enhances detection accuracy. In particular, triplet elements that combine both sequence content and structure information were extracted from CRISPR repeats for classifier training. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available among other CRISPR detection tools. After detection, the query sequence, CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be visualized for examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.

  7. Non-random distribution and co-localization of purine/pyrimidine-encoded information and transcriptional regulatory domains.

    PubMed

    Povinelli, C M

    1992-01-01

    In order to detect sequence-based information predictive for the location of eukaryotic transcriptional regulatory domains, the frequencies and distributions of the 36 possible purine/pyrimidine reverse complement hexamer pairs were determined for test sets of real and random sequences. The distribution of one of the hexamer pairs (RRYYRR/YYRRYY, referred to as M1) was further examined in a larger set of sequences (> 32 genes, 230 kb). Predominant clusters of M1 and the locations of eukaryotic transcriptional regulatory domains were found to be associated and non-randomly distributed along the DNA, consistent with a periodicity of approximately 1.2 kb. In the context of higher ordered chromatin this would align promoters, enhancers and the predominant clusters of M1 longitudinally along one face of a 30 nm fiber. Using only information about the distribution of the M1 motif, 50-70% of a sequence could be eliminated as being unlikely to contain transcriptional regulatory domains with an 87% recovery of the regulatory domains present.
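
    The count of 36 hexamer pairs can be checked by direct enumeration. The following is a minimal sketch, not the author's code: the function names are hypothetical, and it assumes that a self-complementary hexamer counts as its own pair and that motif occurrences are counted with overlap.

```python
from itertools import product

# Purines (R = A/G) pair with pyrimidines (Y = C/T).
COMPLEMENT = {"R": "Y", "Y": "R"}
TO_RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def reverse_complement(word):
    """Reverse complement of a word written in the R/Y alphabet."""
    return "".join(COMPLEMENT[b] for b in reversed(word))


def hexamer_pairs():
    """Group all 64 R/Y hexamers into reverse-complement classes."""
    classes = set()
    for letters in product("RY", repeat=6):
        word = "".join(letters)
        classes.add(frozenset((word, reverse_complement(word))))
    return classes


def count_motif(seq, motif="RRYYRR"):
    """Count overlapping hits of a motif or its reverse complement in a DNA string."""
    ry = "".join(TO_RY[b] for b in seq.upper())
    targets = {motif, reverse_complement(motif)}
    k = len(motif)
    return sum(ry[i:i + k] in targets for i in range(len(ry) - k + 1))


if __name__ == "__main__":
    print(len(hexamer_pairs()))      # number of reverse-complement classes
    print(count_motif("AACTGACT"))   # hits of M1 (RRYYRR/YYRRYY) in a toy sequence
```

    Of the 64 R/Y hexamers, 8 are their own reverse complement and the remaining 56 form 28 pairs, which gives the 36 classes mentioned in the abstract.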

  8. Subjective randomness as statistical inference.

    PubMed

    Griffiths, Thomas L; Daniels, Dylan; Austerweil, Joseph L; Tenenbaum, Joshua B

    2018-06-01

    Some events seem more random than others. For example, when tossing a coin, a sequence of eight heads in a row does not seem very random. Where do these intuitions about randomness come from? We argue that subjective randomness can be understood as the result of a statistical inference assessing the evidence that an event provides for having been produced by a random generating process. We show how this account provides a link to previous work relating randomness to algorithmic complexity, in which random events are those that cannot be described by short computer programs. Algorithmic complexity is both incomputable and too general to capture the regularities that people can recognize, but viewing randomness as statistical inference provides two paths to addressing these problems: considering regularities generated by simpler computing machines, and restricting the set of probability distributions that characterize regularity. Building on previous work exploring these different routes to a more restricted notion of randomness, we define strong quantitative models of human randomness judgments that apply not just to binary sequences - which have been the focus of much of the previous work on subjective randomness - but also to binary matrices and spatial clustering. Copyright © 2018 Elsevier Inc. All rights reserved.

  9. An RNA motif that binds ATP

    NASA Technical Reports Server (NTRS)

    Sassanfar, M.; Szostak, J. W.

    1993-01-01

    RNAs that contain specific high-affinity binding sites for small molecule ligands immobilized on a solid support are present at a frequency of roughly one in 10(10)-10(11) in pools of random sequence RNA molecules. Here we describe a new in vitro selection procedure designed to ensure the isolation of RNAs that bind the ligand of interest in solution as well as on a solid support. We have used this method to isolate a remarkably small RNA motif that binds ATP, a substrate in numerous biological reactions and the universal biological high-energy intermediate. The selected ATP-binding RNAs contain a consensus sequence, embedded in a common secondary structure. The binding properties of ATP analogues and modified RNAs show that the binding interaction is characterized by a large number of close contacts between the ATP and RNA, and by a change in the conformation of the RNA.

  10. Probabilistic motor sequence learning in a virtual reality serial reaction time task.

    PubMed

    Sense, Florian; van Rijn, Hedderik

    2018-01-01

    The serial reaction time task is widely used to study learning and memory. The task is traditionally administered by showing target positions on a computer screen and collecting responses using a button box or keyboard. By comparing response times to random or sequenced items or by using different transition probabilities, various forms of learning can be studied. However, this traditional laboratory setting limits the number of possible experimental manipulations. Here, we present a virtual reality version of the serial reaction time task and show that learning effects emerge as expected despite the novel way in which responses are collected. We also show that response times are distributed as expected. The current experiment was conducted in a blank virtual reality room to verify these basic principles. For future applications, the technology can be used to modify the virtual reality environment in any conceivable way, permitting a wide range of previously impossible experimental manipulations.

  11. Role of working memory in transformation of visual and motor representations for use in mental simulation.

    PubMed

    Gabbard, Carl; Lee, Jihye; Caçola, Priscila

    2013-01-01

    This study examined the role of visual working memory when transforming visual representations to motor representations in the context of motor imagery. Participants viewed randomized number sequences of three, four, and five digits, and then reproduced the sequence by finger tapping using motor imagery or actually executing the movements; movement duration was recorded. One group viewed the stimulus for three seconds and responded immediately, while the second group had a three-second view followed by a three-second blank screen delay before responding. As expected, delay group times were longer with each condition and digit load. Whereas correlations between imagined and executed actions (temporal congruency) were significant in a positive direction for both groups, interestingly, the delay group's values were significantly stronger. That outcome prompts speculation that delay influenced the congruency between motor representation and actual execution.

  12. Identification of the critical residues responsible for differential reactivation of the triosephosphate isomerases of two trypanosomes

    PubMed Central

    Rodríguez-Bolaños, Monica; Cabrera, Nallely

    2016-01-01

    The reactivation of triosephosphate isomerase (TIM) from unfolded monomers induced by guanidine hydrochloride involves different amino acids of its sequence in different stages of protein refolding. We describe a systematic mutagenesis method to find critical residues for certain physico-chemical properties of a protein. The two similar TIMs of Trypanosoma brucei and Trypanosoma cruzi have different reactivation velocities and efficiencies. We used a small number of chimeric enzymes, additive mutants and planned site-directed mutants to produce an enzyme from T. brucei with 13 mutations in its sequence, which reactivates fast and efficiently like wild-type (WT) TIM from T. cruzi, and another enzyme from T. cruzi, with 13 slightly altered mutations, which reactivated slowly and inefficiently like the WT TIM of T. brucei. Our method is a shorter alternative to random mutagenesis, saturation mutagenesis or directed evolution to find multiple amino acids critical for certain properties of proteins. PMID:27733588

  13. Memory replay in balanced recurrent networks

    PubMed Central

    Chenkov, Nikolay; Sprekeler, Henning; Kempter, Richard

    2017-01-01

    Complex patterns of neural activity appear during up-states in the neocortex and sharp waves in the hippocampus, including sequences that resemble those during prior behavioral experience. The mechanisms underlying this replay are not well understood. How can small synaptic footprints engraved by experience control large-scale network activity during memory retrieval and consolidation? We hypothesize that sparse and weak synaptic connectivity between Hebbian assemblies is boosted by pre-existing recurrent connectivity within them. To investigate this idea, we connect sequences of assemblies in randomly connected spiking neuronal networks with a balance of excitation and inhibition. Simulations and analytical calculations show that recurrent connections within assemblies allow for a fast amplification of signals that indeed reduces the required number of inter-assembly connections. Replay can be evoked by small sensory-like cues or emerge spontaneously by activity fluctuations. Global, potentially neuromodulatory, alterations of neuronal excitability can switch between network states that favor retrieval and consolidation. PMID:28135266

  14. Question 3: The Worlds of the Prebiotic and Never Born Proteins

    NASA Astrophysics Data System (ADS)

    Chiarabelli, Cristiano; de Lucrezia, Davide

    2007-10-01

    Starting from the statement that no reliable methods are known to produce high molecular weight polypeptides under prebiotic conditions, a possible approach, at least to understand the differences between extant proteins and the possibly large number of never born proteins, could be biological. Using the phage display method, a large library of totally random amino acid sequences was obtained. Different experiments to directly assess the frequency of stable folds were then performed, and the interesting results obtained from this new approach are discussed in terms of contingency, contributing to the discussion on the selection mechanism of extant proteins.

  15. Controllability of Deterministic Networks with the Identical Degree Sequence

    PubMed Central

    Ma, Xiujuan; Zhao, Haixing; Wang, Binghong

    2015-01-01

    Controlling complex networks is an essential problem in network science and engineering. Recent advances indicate that the controllability of a complex network depends on the network's topology. Liu, Barabási, et al. speculated that the degree distribution is one of the most important factors affecting controllability for arbitrary complex directed networks with random link weights. In this paper, we analyse the effect of degree distribution on the controllability of deterministic networks that are unweighted and undirected. We introduce a class of deterministic networks with identical degree sequence, called (x,y)-flowers. We analyse the controllability of two such networks, the (1, 3)-flower and the (2, 2)-flower, in detail using exact controllability theory and give exact results for the minimum number of driver nodes for the two networks. In simulation, we compare the controllability of (x,y)-flower networks. Our results show that networks in the (x,y)-flower family have the same degree sequence, but their controllability is totally different. The degree distribution alone is therefore not sufficient to characterize the controllability of unweighted, undirected deterministic networks. PMID:26020920

  16. Scaling exponents for ordered maxima

    DOE PAGES

    Ben-Naim, E.; Krapivsky, P. L.; Lemons, N. W.

    2015-12-22

    We study extreme value statistics of multiple sequences of random variables. For each sequence with N variables, independently drawn from the same distribution, the running maximum is defined as the largest variable to date. We compare the running maxima of m independent sequences and investigate the probability $S_N$ that the maxima are perfectly ordered, that is, the running maximum of the first sequence is always larger than that of the second sequence, which is always larger than the running maximum of the third sequence, and so on. The probability $S_N$ is universal: it does not depend on the distribution from which the random variables are drawn. For two sequences, $S_N \sim N^{-1/2}$, and in general, the decay is algebraic, $S_N \sim N^{-\sigma_m}$, for large N. We analytically obtain the exponent $\sigma_3 \cong 1.302931$ as the root of a transcendental equation. Moreover, the exponents $\sigma_m$ grow with m, and we show that $\sigma_m \sim m$ for large m.
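
    The ordering probability described above can be estimated numerically. The following Monte Carlo sketch is an illustration only, not the authors' method: the function names, the uniform distribution, the trial count and the seed are all assumptions made for the example. Because the probability is stated to be distribution-independent, any continuous distribution should serve.

```python
import random


def maxima_ordered(n, m, rng):
    """True if the running maxima of m i.i.d. sequences stay strictly ordered for n steps."""
    maxima = [float("-inf")] * m
    for _ in range(n):
        # Draw one new variable per sequence and update each running maximum.
        for k in range(m):
            x = rng.random()
            if x > maxima[k]:
                maxima[k] = x
        # Ordering fails as soon as a lower-ranked sequence catches up.
        if any(maxima[k] <= maxima[k + 1] for k in range(m - 1)):
            return False
    return True


def estimate_s(n, m=2, trials=20000, seed=0):
    """Monte Carlo estimate of the probability that the maxima remain ordered."""
    rng = random.Random(seed)
    hits = sum(maxima_ordered(n, m, rng) for _ in range(trials))
    return hits / trials


if __name__ == "__main__":
    for n in (1, 4, 16, 64):
        print(n, estimate_s(n))
```

    For N = 1 the estimate should sit near 1/2 for two sequences and near 1/6 for three, since every ordering of the first draws is equally likely. Plotting the estimates against N on log-log axes is one way to inspect the algebraic decay.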

  17. Physical layer one-time-pad data encryption through synchronized semiconductor laser networks

    NASA Astrophysics Data System (ADS)

    Argyris, Apostolos; Pikasis, Evangelos; Syvridis, Dimitris

    2016-02-01

    Semiconductor lasers (SL) have been proven to be a key device in the generation of ultrafast true random bit streams. Their potential to emit chaotic signals with desirable statistics under suitable conditions establishes them as a low-cost solution for various needs, from large-volume key generation to real-time encrypted communications. Usually, only undemanding post-processing is needed to convert the acquired analog time series to digital sequences that pass all established tests of randomness. A novel architecture that can generate and exploit these true random sequences is a fiber network in which the nodes are semiconductor lasers that are coupled and synchronized to a central hub laser. In this work we show experimentally that laser nodes in such a star network topology can synchronize with each other through complex broadband signals that are the seed to true random bit sequences (TRBS) generated at several Gb/s. Because each node can access, through the fiber optic network, random bit streams that are generated in real time and synchronized with those of the other nodes, a one-time-pad encryption protocol can be implemented that mixes the synchronized true random bit sequence with real data at Gb/s rates. Forward-error correction methods are used to reduce the errors in the TRBS and the final error rate at the data decoding level. An appropriate choice of the sampling methodology and properties, as well as of the physical properties of the chaotic seed signal through which the network locks in synchronization, allows error-free performance.

  18. Models of Protocellular Structure, Function and Evolution

    NASA Technical Reports Server (NTRS)

    New, Michael H.; Pohorille, Andrew; Szostak, Jack W.; Keefe, Tony; Lanyi, Janos K.

    2001-01-01

    In the absence of any record of protocells, the most direct way to test our understanding of the origin of cellular life is to construct laboratory models that capture important features of protocellular systems. Such efforts are currently underway in a collaborative project between NASA-Ames, Harvard Medical School and University of California. They are accompanied by computational studies aimed at explaining self-organization of simple molecules into ordered structures. The centerpiece of this project is a method for the in vitro evolution of protein enzymes toward arbitrary catalytic targets. A similar approach has already been developed for nucleic acids in which a small number of functional molecules are selected from a large, random population of candidates. The selected molecules are next vastly multiplied using the polymerase chain reaction. A mutagenic approach, in which the sequences of selected molecules are randomly altered, can yield further improvements in performance or alterations of specificities. Unfortunately, the catalytic potential of nucleic acids is rather limited. Proteins are more catalytically capable but cannot be directly amplified. In the new technique, this problem is circumvented by covalently linking each protein of the initial, diverse, pool to the RNA sequence that codes for it. Then, selection is performed on the proteins, but the nucleic acids are replicated. Additional information is contained in the original extended abstract.

  19. Improve homology search sensitivity of PacBio data by correcting frameshifts.

    PubMed

    Du, Nan; Sun, Yanni

    2016-09-01

    Single-molecule, real-time sequencing (SMRT) developed by Pacific BioSciences produces longer reads than second-generation sequencing technologies such as Illumina. The long read length enables PacBio sequencing to close gaps in genome assembly, reveal structural variations, and identify gene isoforms with higher accuracy in transcriptomic sequencing. However, PacBio data has a high sequencing error rate, and most of the errors are insertion or deletion errors. During alignment-based homology search, insertion or deletion errors in genes will cause frameshifts and may lead to only marginal alignment scores and short alignments. As a result, it is hard to distinguish true alignments from random alignments, and the ambiguity will incur errors in structural and functional annotation. Existing frameshift correction tools are designed for data with a much lower error rate and are not optimized for PacBio data. As an increasing number of groups are using SMRT, there is an urgent need for dedicated homology search tools for PacBio data. In this work, we introduce Frame-Pro, a profile homology search tool for PacBio reads. Our tool corrects sequencing errors and also outputs the profile alignments of the corrected sequences against characterized protein families. We applied our tool to both simulated and real PacBio data. The results showed that our method enables more sensitive homology search, especially for PacBio data sets of low sequencing coverage. In addition, we can correct more errors than a popular error correction tool that does not rely on hybrid sequencing. The source code is freely available at https://sourceforge.net/projects/frame-pro/. Contact: yannisun@msu.edu. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  20. Extracting random numbers from quantum tunnelling through a single diode.

    PubMed

    Bernardo-Gavito, Ramón; Bagci, Ibrahim Ethem; Roberts, Jonathan; Sexton, James; Astbury, Benjamin; Shokeir, Hamzah; McGrath, Thomas; Noori, Yasir J; Woodhead, Christopher S; Missous, Mohamed; Roedig, Utz; Young, Robert J

    2017-12-19

    Random number generation is crucial in many aspects of everyday life, as online security and privacy depend ultimately on the quality of random numbers. Many current implementations are based on pseudo-random number generators, but information security requires true random numbers for sensitive applications like key generation in banking, defence or even social media. True random number generators are systems whose outputs cannot be determined, even if their internal structure and response history are known. Sources of quantum noise are thus ideal for this application due to their intrinsic uncertainty. In this work, we propose using resonant tunnelling diodes as practical true random number generators based on a quantum mechanical effect. The output of the proposed devices can be directly used as a random stream of bits or can be further distilled using randomness extraction algorithms, depending on the application.

  1. Distribution and sequence homogeneity of an abundant satellite DNA in the beetle, Tenebrio molitor.

    PubMed Central

    Davis, C A; Wyatt, G R

    1989-01-01

    The mealworm beetle, Tenebrio molitor, contains an unusually abundant and homogeneous satellite DNA which constitutes up to 60% of its genome. The satellite DNA is shown to be present in all of the chromosomes by in situ hybridization. 18 dimers of the repeat unit were cloned and sequenced. The consensus sequence is 142 nt long and lacks any internal repeat structure. Monomers of the sequence are very similar, showing on average a 2% divergence from the calculated consensus. Variant nucleotides are scattered randomly throughout the sequence although some variants are more common than others. Neighboring repeat units are no more alike than randomly chosen ones. The results suggest that some mechanism, perhaps gene conversion, is acting to maintain the homogeneity of the satellite DNA despite its abundance and distribution on all of the chromosomes. PMID:2762148

  2. Physiology is rocking the foundations of evolutionary biology.

    PubMed

    Noble, Denis

    2013-08-01

    The 'Modern Synthesis' (Neo-Darwinism) is a mid-20th century gene-centric view of evolution, based on random mutations accumulating to produce gradual change through natural selection. Any role of physiological function in influencing genetic inheritance was excluded. The organism became a mere carrier of the real objects of selection, its genes. We now know that genetic change is far from random and often not gradual. Molecular genetics and genome sequencing have deconstructed this unnecessarily restrictive view of evolution in a way that reintroduces physiological function and interactions with the environment as factors influencing the speed and nature of inherited change. Acquired characteristics can be inherited, and in a few but growing number of cases that inheritance has now been shown to be robust for many generations. The 21st century can look forward to a new synthesis that will reintegrate physiology with evolutionary biology.

  3. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Miall, A.D.

    The basic premise of the recent Exxon cycle chart, that there exists a globally correlatable suite of third-order eustatic cycles, remains unproven. Many of the tests of this premise are based on circular reasoning. The implied precision of the Exxon global cycle chart is not supportable, because it is greater than that of the best available chronostratigraphic techniques, such as those used to construct the global standard time scale. Correlations of new stratigraphic sections with the Exxon chart will almost always succeed, because there are so many Exxon sequence-boundary events from which to choose. This is demonstrated by the use of four synthetic sections constructed from tables of random numbers. A minimum of 77% successful correlations of random events with the Exxon chart was achieved. The existing cycle chart represents an amalgam of regional and local tectonic events and probably also includes unrecognized miscorrelations. It is of questionable value as an independent standard of geologic time.

  4. Generation of physical random numbers by using homodyne detection

    NASA Astrophysics Data System (ADS)

    Hirakawa, Kodai; Oya, Shota; Oguri, Yusuke; Ichikawa, Tsubasa; Eto, Yujiro; Hirano, Takuya; Tsurumaru, Toyohiro

    2016-10-01

    Physical random numbers generated by quantum measurements are, in principle, impossible to predict. We have demonstrated the generation of physical random numbers by using a high-speed balanced photodetector to measure the quadrature amplitudes of vacuum states. Using this method, random numbers were generated at 500 Mbps, which is more than one order of magnitude faster than previously reported [Gabriel et al., Nature Photonics 4, 711-715 (2010)]. The Crush test battery of the TestU01 suite consists of 31 tests in 144 variations, and we used them to statistically analyze these numbers. The generated random numbers passed 14 of the 31 tests. To improve the randomness, we performed a hash operation, in which each random number was multiplied by a random Toeplitz matrix; the resulting numbers passed all of the tests in the TestU01 Crush battery.
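
    The hash step mentioned above, multiplication by a random Toeplitz matrix, is commonly carried out on bit vectors over GF(2). The sketch below illustrates that common construction under stated assumptions: the abstract does not give the matrix dimensions, the bit width or the arithmetic used, so the function name, the sizes and the seed handling here are hypothetical.

```python
import random


def toeplitz_hash(bits, seed_bits, out_len):
    """Multiply an input bit vector by a binary Toeplitz matrix over GF(2).

    The matrix entry T[i][j] is seed_bits[i - j + n - 1], so it is constant
    along each diagonal and the whole matrix is fixed by n + out_len - 1 bits.
    """
    n = len(bits)
    if len(seed_bits) != n + out_len - 1:
        raise ValueError("seed must contain n + out_len - 1 bits")
    out = []
    for i in range(out_len):
        acc = 0
        for j in range(n):
            acc ^= seed_bits[i - j + n - 1] & bits[j]  # AND is multiply, XOR is add
        out.append(acc)
    return out


if __name__ == "__main__":
    rng = random.Random(1)  # illustrative seed; a real extractor needs true random seed bits
    raw = [rng.getrandbits(1) for _ in range(16)]
    seed = [rng.getrandbits(1) for _ in range(16 + 8 - 1)]
    print(toeplitz_hash(raw, seed, 8))  # 16 raw bits compressed to 8 output bits
```

    Producing fewer output bits than input bits is what lets such a hash concentrate the randomness of an imperfect source; how much compression is appropriate depends on the entropy of the raw data, which this sketch does not estimate.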

  5. RSAT: regulatory sequence analysis tools.

    PubMed

    Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

    2008-07-01

    The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundreds of researchers from all over the world. Several predictions made with RSAT were validated experimentally and published.

  6. Identification and quantification of homologous series of compound in complex mixtures: autocovariance study of GC/MS chromatograms.

    PubMed

    Pietrogrande, Maria Chiara; Zampolli, Maria Grazia; Dondi, Francesco

    2006-04-15

    The paper describes a method for determining homologous classes of compounds in a multicomponent complex chromatogram obtained under programming elution conditions. The method is based on the computation of the autocovariance function of the experimental chromatogram (EACVF). The EACVF plot, if properly interpreted, can be regarded as a "class chromatogram", i.e., a virtual chromatogram formed by peaks whose positions and heights allow identification and quantification of the different homologous series, even if they are embedded in a random complex chromatogram. Theoretical models were developed to describe complex chromatograms displaying random retention patterns, ordered sequences, or a combination of them. On the basis of the theoretical autocovariance function, the properties of the chromatogram can be experimentally evaluated under well-defined conditions: in particular, the two components of the chromatogram, ordered and random, can be identified. Moreover, the total number of single components (SCs) and the separate numbers of SCs belonging to the random and ordered components can be determined when the two components display the same concentration. If the mixture contains several homologous series with common frequency and different phase values, the number and identity of the different homologous series as well as the number of SCs belonging to each of them can be evaluated. Moreover, the power of the EACVF method can be magnified by applying it to the single ion monitoring (SIM) signals to selectively detect specific compound classes in order to identify the different homologous series. In this way, a full "decoding" of the complex multicomponent chromatogram is achieved. The method was validated on synthetic mixtures containing known amounts of SCs belonging to homologous series of hydrocarbons, alcohols, ketones, and aromatic compounds, in addition to other SCs that are not structurally related. The method was applied to both the total ion monitoring (TIC) and the SIM signals, to describe step by step the essence of the procedure. Moreover, the systematic use of both SIM and TIC can simplify the decoding procedure of complex chromatograms by singling out only specific compound classes or by confirming the identification of the different homologous series. The method was further applied to a sample containing an unknown number of compounds and homologous series (a petroleum benzine, bp 140-160 °C): the results obtained were meaningful in terms of both the identified number of components and the identified homologous series.
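
    The central quantity in this record, the autocovariance function of a sampled signal, can be sketched in a few lines. This is an illustration under assumptions, not the authors' procedure: the function name is hypothetical, the 1/N normalization is one common convention that may differ from the paper's, and the toy signal of evenly spaced unit spikes merely stands in for an ordered homologous series.

```python
def autocovariance(signal, max_lag):
    """Sample autocovariance of a digitized signal for lags 0..max_lag (1/N convention)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [y - mean for y in signal]
    return [
        sum(centered[i] * centered[i + k] for i in range(n - k)) / n
        for k in range(max_lag + 1)
    ]


if __name__ == "__main__":
    # Toy "chromatogram": one unit spike every 10 samples mimics a regular series.
    toy = [1.0 if i % 10 == 0 else 0.0 for i in range(100)]
    acvf = autocovariance(toy, 20)
    best = max(range(1, 21), key=lambda k: acvf[k])
    print(best)  # lag at which the repeating pattern lines up with itself
```

    A regularly spaced series produces autocovariance maxima at multiples of its spacing, which is the sense in which the plot can be read as a "class chromatogram"; randomly placed peaks contribute no such repeating structure.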

  7. A high-speed on-chip pseudo-random binary sequence generator for multi-tone phase calibration

    NASA Astrophysics Data System (ADS)

    Gommé, Liesbeth; Vandersteen, Gerd; Rolain, Yves

    2011-07-01

    An on-chip reference generator is conceived by adopting the technique of decimating a pseudo-random binary sequence (PRBS) signal into parallel sequences. This is of great benefit when high-speed generation of PRBS and PRBS-derived signals is the objective. The design is implemented in standard CMOS logic, which is available in commercial libraries, to provide the logic functions for the generator. The design allows the user to select the periodicity of the PRBS and the PRBS-derived signals. The characterization of the on-chip generator establishes its performance and reveals promising specifications.
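
    The decimation idea can be illustrated in software. The sketch below is not the circuit from this record: it assumes the widely used PRBS7 sequence (a 7-bit linear feedback shift register with period 127), and the function names and the choice of two parallel streams are hypothetical. It splits the sequence into slower parallel streams and interleaves them back into the full-rate sequence.

```python
def prbs7(seed=0x7F, length=127):
    """Generate PRBS7 bits from a 7-bit LFSR with feedback taps at bits 7 and 6."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the register stays at zero")
    out = []
    for _ in range(length):
        newbit = ((state >> 6) ^ (state >> 5)) & 1
        out.append(newbit)
        state = ((state << 1) | newbit) & 0x7F
    return out


def decimate(bits, k):
    """Split a bit stream into k parallel streams, each holding every k-th bit."""
    return [bits[p::k] for p in range(k)]


def interleave(streams):
    """Recombine equal-length parallel streams into one full-rate stream."""
    out = []
    for group in zip(*streams):
        out.extend(group)
    return out


if __name__ == "__main__":
    seq = prbs7(length=254)           # two full periods
    low_rate = decimate(seq, 2)       # two streams at half the bit rate
    print(interleave(low_rate) == seq)
```

    In hardware terms, each decimated stream can run at a fraction of the output clock, and only the final interleaving stage has to operate at full speed, which is why the approach suits high-speed generation.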

  8. Mammalian genome projects reveal new growth hormone (GH) sequences. Characterization of the GH-encoding genes of armadillo (Dasypus novemcinctus), hedgehog (Erinaceus europaeus), bat (Myotis lucifugus), hyrax (Procavia capensis), shrew (Sorex araneus), ground squirrel (Spermophilus tridecemlineatus), elephant (Loxodonta africana), cat (Felis catus) and opossum (Monodelphis domestica).

    PubMed

    Wallis, Michael

    2008-01-15

    Mammalian growth hormone (GH) sequences have been shown previously to display episodic evolution: the sequence is generally strongly conserved but on at least two occasions during mammalian evolution (on lineages leading to higher primates and ruminants) bursts of rapid evolution occurred. However, the number of mammalian orders studied previously has been relatively limited, and the availability of sequence data via mammalian genome projects provides the potential for extending the range of GH gene sequences examined. Complete or nearly complete GH gene sequences for six mammalian species for which no data were previously available have been extracted from the genome databases: Dasypus novemcinctus (nine-banded armadillo), Erinaceus europaeus (western European hedgehog), Myotis lucifugus (little brown bat), Procavia capensis (cape rock hyrax), Sorex araneus (European shrew), Spermophilus tridecemlineatus (13-lined ground squirrel). In addition, incomplete data for several other species have been extended. Examination of the data in detail and comparison with previously available sequences has allowed assessment of the reliability of deduced sequences. Several of the new sequences differ substantially from the consensus sequence previously determined for eutherian GHs, indicating greater variability than previously recognised, and confirming the episodic pattern of evolution. The episodic pattern is not seen for signal sequences, 5' upstream sequence or synonymous substitutions; it is specific to the mature protein sequence, suggesting that it relates to the hormonal function. The substitutions accumulated during the course of GH evolution have occurred mainly on the side of the hormone facing away from the receptor, in a non-random fashion, and it is suggested that this may reflect interaction of the receptor-bound hormone with other proteins or small ligands.

  9. Sequence and Structure Dependent DNA-DNA Interactions

    NASA Astrophysics Data System (ADS)

    Kopchick, Benjamin; Qiu, Xiangyun

    Molecular forces between dsDNA strands are largely dominated by electrostatics and have been extensively studied. Quantitative knowledge has been accumulated on how DNA-DNA interactions are modulated by varied biological constituents such as ions, cationic ligands, and proteins. Despite its central role in biology, the sequence of DNA has not received substantial attention and "random" DNA sequences are typically used in biophysical studies. However, ~50% of the human genome is composed of non-random-sequence DNAs, particularly repetitive sequences. Furthermore, covalent modifications of DNA such as methylation play key roles in gene functions. Such DNAs with specific sequences or modifications often take on structures other than the canonical B-form. Here we present a series of quantitative measurements of the DNA-DNA forces with the osmotic stress method on different DNA sequences, from short repeats to the most frequent sequences in the genome, and to modifications such as bromination and methylation. We observe peculiar behaviors that appear to be strongly correlated with the incurred structural changes. We speculate on the causalities in terms of the differences in hydration shell and DNA surface structures.

  10. Do humans and nonhuman animals share the grouping principles of the Iambic - Trochaic Law?

    PubMed Central

    de la Mora, Daniela M.; Nespor, Marina; Toro, Juan M.

    2014-01-01

    The Iambic-Trochaic Law describes humans’ tendency to form trochaic groups over sequences varying in pitch or intensity (i.e., the loudest or highest sound marks group beginnings), and iambic groups over sequences varying in duration (i.e., the longest sound marks group endings). The extent to which these perceptual biases are shared by humans and nonhuman animals is yet unclear. In Experiment 1, we trained rats to discriminate pitch-alternating sequences of tones from sequences randomly varying in pitch. In Experiment 2, rats were trained to discriminate duration-alternating sequences of tones from sequences randomly varying in duration. We found that nonhuman animals group as trochees sequences based on pitch variations, but they do not group as iambs sequences varying in duration. Importantly, humans grouped the same stimuli following the principles of the Iambic-Trochaic Law (Experiment 3). These results suggest an early emergence of the trochaic rhythmic grouping bias based on pitch, possibly relying on perceptual abilities shared by humans and other mammals as well, whereas the iambic rhythmic grouping bias based on duration might depend on language experience. PMID:22956287

  11. Do humans and nonhuman animals share the grouping principles of the iambic-trochaic law?

    PubMed

    de la Mora, Daniela M; Nespor, Marina; Toro, Juan M

    2013-01-01

    The iambic-trochaic law describes humans' tendency to form trochaic groups over sequences varying in pitch or intensity (i.e., the loudest or highest sounds mark group beginnings), and iambic groups over sequences varying in duration (i.e., the longest sounds mark group endings). The extent to which these perceptual biases are shared by humans and nonhuman animals is yet unclear. In Experiment 1, we trained rats to discriminate pitch-alternating sequences of tones from sequences randomly varying in pitch. In Experiment 2, rats were trained to discriminate duration-alternating sequences of tones from sequences randomly varying in duration. We found that nonhuman animals group sequences based on pitch variations as trochees, but they do not group sequences varying in duration as iambs. Importantly, humans grouped the same stimuli following the principles of the iambic-trochaic law (Exp. 3). These results suggest the early emergence of the trochaic rhythmic grouping bias based on pitch, possibly relying on perceptual abilities shared by humans and other mammals, whereas the iambic rhythmic grouping bias based on duration might depend on language experience.

  12. The quality of reporting of randomized controlled trials of traditional Chinese medicine: a survey of 13 randomly selected journals from mainland China.

    PubMed

    Wang, Gang; Mao, Bing; Xiong, Ze-Yu; Fan, Tao; Chen, Xiao-Dong; Wang, Lei; Liu, Guan-Jian; Liu, Jia; Guo, Jia; Chang, Jing; Wu, Tai-Xiang; Li, Ting-Qian

    2007-07-01

    The number of randomized controlled trials (RCTs) of traditional Chinese medicine (TCM) is increasing. However, there have been few systematic assessments of the quality of reporting of these trials. This study was undertaken to evaluate the quality of reporting of RCTs in TCM journals published in mainland China from 1999 to 2004. Thirteen TCM journals were randomly selected by stratified sampling of the approximately 100 TCM journals published in mainland China. All issues of the selected journals published from 1999 to 2004 were hand-searched according to guidelines from the Cochrane Centre. All reviewers underwent training in the evaluation of RCTs at the Chinese Centre of Evidence-based Medicine. A comprehensive quality assessment of each RCT was completed using a modified version of the Consolidated Standards of Reporting Trials (CONSORT) checklist (total of 30 items) and the Jadad scale. Disagreements were resolved by consensus. Seven thousand four hundred twenty-two RCTs were identified. The proportion of published RCTs relative to all types of published clinical trials increased significantly over the period studied, from 18.6% in 1999 to 35.9% in 2004 (P < 0.001). The mean (SD) Jadad score was 1.03 (0.61) overall. One RCT had a Jadad score of 5 points; 14 had a score of 4 points; and 102 had a score of 3 points. The mean (SD) Jadad score was 0.85 (0.53) in 1999 (746 RCTs) and 1.20 (0.62) in 2004 (1634 RCTs). Across all trials, 39.4% of the items on the modified CONSORT checklist were reported, which was equivalent to 11.82 (5.78) of the 30 items. Some important methodologic components of RCTs were incompletely reported, such as sample-size calculation (reported in 1.1% of RCTs), randomization sequence (7.9%), allocation concealment (0.3%), implementation of the random-allocation sequence (0%), and analysis of intention to treat (0%). The findings of this study indicate that the quality of reporting of RCTs of TCM has improved, but remains poor.

  13. Active learning reduces annotation time for clinical concept extraction.

    PubMed

    Kholghi, Mahnoosh; Sitbon, Laurianne; Zuccon, Guido; Nguyen, Anthony

    2017-10-01

    To investigate: (1) the annotation time savings by various active learning query strategies compared to supervised learning and a random sampling baseline, and (2) the benefits of active learning-assisted pre-annotations in accelerating the manual annotation process compared to de novo annotation. There are 73 and 120 discharge summary reports provided by Beth Israel institute in the train and test sets of the concept extraction task in the i2b2/VA 2010 challenge, respectively. The 73 reports were used in user study experiments for manual annotation. First, all sequences within the 73 reports were manually annotated from scratch. Next, active learning models were built to generate pre-annotations for the sequences selected by a query strategy. The annotation/reviewing time per sequence was recorded. The 120 test reports were used to measure the effectiveness of the active learning models. When annotating from scratch, active learning reduced the annotation time up to 35% and 28% compared to a fully supervised approach and a random sampling baseline, respectively. Reviewing active learning-assisted pre-annotations resulted in 20% further reduction of the annotation time when compared to de novo annotation. The number of concepts that require manual annotation is a good indicator of the annotation time for various active learning approaches as demonstrated by high correlation between time rate and concept annotation rate. Active learning has a key role in reducing the time required to manually annotate domain concepts from clinical free text, either when annotating from scratch or reviewing active learning-assisted pre-annotations. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. The estimation of genetic divergence

    NASA Technical Reports Server (NTRS)

    Holmquist, R.; Conroy, T.

    1981-01-01

    Consideration is given to the criticism of Nei and Tateno (1978) of the REH (random evolutionary hits) theory of genetic divergence in nucleic acids and proteins, and to their proposed alternative estimator of total fixed mutations designated X2. It is argued that the assumption of nonuniform amino acid or nucleotide substitution will necessarily increase REH estimates relative to those made for a model where each locus has an equal likelihood of fixing mutations, thus the resulting value will not be an overestimation. The relative values of X2 and measures calculated on the basis of the PAM and REH theories for the number of nucleotide substitutions necessary to explain a given number of observed amino acid differences between two homologous proteins are compared, and the smaller values of X2 are attributed to (1) a mathematical model based on the incorrect assumption that an entire structural gene is free to fix mutations and (2) the assumptions of different numbers of variable codons for the X2 and REH calculations. Results of a repeat of the computer simulations of Nei and Tateno are presented which, in contrast to the original results, confirm the REH theory. It is pointed out that while a negative correlation is observed between estimations of the fixation intensity per varion and the number of varions for a given pair of sequences, the correlation between the two fixation intensities and varion numbers of two different pairs of sequences need not be negative. Finally, REH theory is used to resolve a paradox concerning the high rate of covarion turnover and the nature of general function sites as permanent covarions.

  15. Fast and secure encryption-decryption method based on chaotic dynamics

    DOEpatents

    Protopopescu, Vladimir A.; Santoro, Robert T.; Tolliver, Johnny S.

    1995-01-01

    A method and system for the secure encryption of information. The method comprises the steps of dividing a message of length L into its character components; generating m chaotic iterates from m independent chaotic maps; producing an "initial" value based upon the m chaotic iterates; transforming the "initial" value to create a pseudo-random integer; repeating the steps of generating, producing and transforming until a pseudo-random integer sequence of length L is created; and encrypting the message as ciphertext based upon the pseudo random integer sequence. A system for accomplishing the invention is also provided.
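
The steps listed in this abstract can be sketched roughly as follows. This is only an illustration under stated assumptions: the record does not say which chaotic maps are used, how the m iterates are combined into the "initial" value, or how that value is transformed into an integer. The logistic map, the sum modulo 1, the scaling by 1e6, the byte-wise modular addition, and all function names below are hypothetical choices, and the sketch should not be treated as cryptographically secure.

```python
from typing import List, Sequence


def logistic(x: float, r: float) -> float:
    """One iterate of the logistic map (an assumed choice of chaotic map)."""
    return r * x * (1.0 - x)


def keystream(seeds: Sequence[float], params: Sequence[float],
              length: int, burn_in: int = 100) -> List[int]:
    """Produce `length` pseudo-random integers in [0, 256) from m maps."""
    states = list(seeds)
    # Discard early iterates so the output depends sensitively on the seeds.
    for _ in range(burn_in):
        states = [logistic(x, r) for x, r in zip(states, params)]
    out = []
    for _ in range(length):
        # Generate m chaotic iterates, one per independent map.
        states = [logistic(x, r) for x, r in zip(states, params)]
        # Combine them into an "initial" value (assumed: sum modulo 1).
        initial = sum(states) % 1.0
        # Transform the "initial" value into a pseudo-random integer.
        out.append(int(initial * 1e6) % 256)
    return out


def encrypt(message: bytes, seeds, params) -> bytes:
    """Add the keystream to each message byte modulo 256."""
    ks = keystream(seeds, params, len(message))
    return bytes((b + k) % 256 for b, k in zip(message, ks))


def decrypt(cipher: bytes, seeds, params) -> bytes:
    """Regenerate the same keystream and subtract it modulo 256."""
    ks = keystream(seeds, params, len(cipher))
    return bytes((c - k) % 256 for c, k in zip(cipher, ks))
```

In this sketch the seeds and map parameters together play the role of the secret key: a receiver holding the same values regenerates the same integer sequence of length L and inverts the byte-wise addition.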

  16. A Protocol for Functional Assessment of Whole-Protein Saturation Mutagenesis Libraries Utilizing High-Throughput Sequencing.

    PubMed

    Stiffler, Michael A; Subramanian, Subu K; Salinas, Victor H; Ranganathan, Rama

    2016-07-03

    Site-directed mutagenesis has long been used as a method to interrogate protein structure, function and evolution. Recent advances in massively-parallel sequencing technology have opened up the possibility of assessing the functional or fitness effects of large numbers of mutations simultaneously. Here, we present a protocol for experimentally determining the effects of all possible single amino acid mutations in a protein of interest utilizing high-throughput sequencing technology, using the 263 amino acid antibiotic resistance enzyme TEM-1 β-lactamase as an example. In this approach, a whole-protein saturation mutagenesis library is constructed by site-directed mutagenic PCR, randomizing each position individually to all possible amino acids. The library is then transformed into bacteria, and selected for the ability to confer resistance to β-lactam antibiotics. The fitness effect of each mutation is then determined by deep sequencing of the library before and after selection. Importantly, this protocol introduces methods which maximize sequencing read depth and permit the simultaneous selection of the entire mutation library, by mixing adjacent positions into groups of length accommodated by high-throughput sequencing read length and utilizing orthogonal primers to barcode each group. Representative results using this protocol are provided by assessing the fitness effects of all single amino acid mutations in TEM-1 at a clinically relevant dosage of ampicillin. The method should be easily extendable to other proteins for which a high-throughput selection assay is in place.

  17. BAC-End Sequence-Based SNP Mining in Allotetraploid Cotton (Gossypium) Utilizing Resequencing Data, Phylogenetic Inferences, and Perspectives for Genetic Mapping

    PubMed Central

    Hulse-Kemp, Amanda M.; Ashrafi, Hamid; Stoffel, Kevin; Zheng, Xiuting; Saski, Christopher A.; Scheffler, Brian E.; Fang, David D.; Chen, Z. Jeffrey; Van Deynze, Allen; Stelly, David M.

    2015-01-01

    A bacterial artificial chromosome library and BAC-end sequences for cultivated cotton (Gossypium hirsutum L.) have recently been developed. This report presents genome-wide single nucleotide polymorphism (SNP) mining utilizing resequencing data with BAC-end sequences as a reference by alignment of 12 G. hirsutum L. lines, one G. barbadense L. line, and one G. longicalyx Hutch and Lee line. A total of 132,262 intraspecific SNPs have been developed for G. hirsutum, whereas 223,138 and 470,631 interspecific SNPs have been developed for G. barbadense and G. longicalyx, respectively. Using a set of interspecific SNPs, 11 randomly selected and 77 SNPs that are putatively associated with the homeologous chromosome pair 12 and 26, we mapped 77 SNPs into two linkage groups representing these chromosomes, spanning a total of 236.2 cM in an interspecific F2 population (G. barbadense 3-79 × G. hirsutum TM-1). The mapping results validated the approach for reliably producing large numbers of both intraspecific and interspecific SNPs aligned to BAC-ends. This will allow for future construction of high-density integrated physical and genetic maps for cotton and other complex polyploid genomes. The methods developed will allow for future Gossypium resequencing data to be automatically genotyped for identified SNPs along the BAC-end sequence reference for anchoring sequence assemblies and comparative studies. PMID:25858960

  18. From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems.

    PubMed

    Garza, Daniel R; Dutilh, Bas E

    2015-11-01

    Microorganisms and the viruses that infect them are the most numerous biological entities on Earth and enclose its greatest biodiversity and genetic reservoir. With strength in their numbers, these microscopic organisms are major players in the cycles of energy and matter that sustain all life. Scientists have only scratched the surface of this vast microbial world through culture-dependent methods. Recent developments in generating metagenomes, large random samples of nucleic acid sequences isolated directly from the environment, are providing comprehensive portraits of the composition, structure, and functioning of microbial communities. Moreover, advances in metagenomic analysis have created the possibility of obtaining complete or nearly complete genome sequences from uncultured microorganisms, providing important means to study their biology, ecology, and evolution. Here we review some of the recent developments in the field of metagenomics, focusing on the discovery of genetic novelty and on methods for obtaining uncultured genome sequences, including through the recycling of previously published datasets. Moreover we discuss how metagenomics has become a core scientific tool to characterize eco-evolutionary patterns of microbial ecosystems, thus allowing us to simultaneously discover new microbes and study their natural communities. We conclude by discussing general guidelines and challenges for modeling the interactions between uncultured microorganisms and viruses based on the information contained in their genome sequences. These models will significantly advance our understanding of the functioning of microbial ecosystems and the roles of microbes in the environment.

  19. A generator for unique quantum random numbers based on vacuum states

    NASA Astrophysics Data System (ADS)

    Gabriel, Christian; Wittmann, Christoffer; Sych, Denis; Dong, Ruifang; Mauerer, Wolfgang; Andersen, Ulrik L.; Marquardt, Christoph; Leuchs, Gerd

    2010-10-01

    Random numbers are a valuable component in diverse applications that range from simulations over gambling to cryptography. The quest for true randomness in these applications has engendered a large variety of different proposals for producing random numbers based on the foundational unpredictability of quantum mechanics. However, most approaches do not consider that a potential adversary could have knowledge about the generated numbers, so the numbers are not verifiably random and unique. Here we present a simple experimental setup based on homodyne measurements that uses the purity of a continuous-variable quantum vacuum state to generate unique random numbers. We use the intrinsic randomness in measuring the quadratures of a mode in the lowest energy vacuum state, which cannot be correlated to any other state. The simplicity of our source, combined with its verifiably unique randomness, are important attributes for achieving high-reliability, high-speed and low-cost quantum random number generators.

  20. Shaping the spectrum of random-phase radar waveforms

    DOEpatents

    Doerry, Armin W.; Marquette, Brandeis

    2017-05-09

    The various technologies presented herein relate to generation of a desired waveform profile in the form of a spectrum of apparently random noise (e.g., white noise or colored noise), but with precise spectral characteristics. Hence, a waveform profile that could be readily determined (e.g., by a spoofing system) is effectively obscured. Obscuration is achieved by dividing the waveform into a series of chips, each with an assigned frequency, wherein the sequence of chips are subsequently randomized. Randomization can be a function of the application of a key to the chip sequence. During processing of the echo pulse, a copy of the randomized transmitted pulse is recovered or regenerated against which the received echo is correlated. Hence, with the echo energy range-compressed in this manner, it is possible to generate a radar image with precise impulse response.
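
The keyed randomization of the chip sequence described above might look like the following sketch. It assumes the chips can be represented simply by their assigned frequencies and that the key seeds a software pseudo-random generator; the function name and the use of Python's `random.Random` are illustrative assumptions, not details taken from the patent record.

```python
import random
from typing import List, Sequence


def randomize_chips(chip_freqs: Sequence[float], key: int) -> List[float]:
    """Return the chip frequencies in a key-dependent pseudo-random order."""
    order = list(range(len(chip_freqs)))
    # The key seeds the generator, so the same key always gives the same order.
    random.Random(key).shuffle(order)
    return [chip_freqs[i] for i in order]
```

Because the ordering depends only on the key, a receiver that knows the key can call the same function to regenerate the transmitted chip order and build the reference against which the echo is correlated.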

  1. Mirror Numbers and Wigner's "Unreasonable Effectiveness"

    NASA Astrophysics Data System (ADS)

    Berezin, Alexander

    2006-04-01

    Wigner's "unreasonable effectiveness of mathematics in physics" can be augmented by the concept of a mirror number (MN). It is defined as a digital string infinite in both directions. An example is ()5141327182(), where the first 5 digits are Pi "spelled" backward ("mirrored") and the last 5 digits are the beginning of the decimal string of exp(1). If MNs are constructed from two different transcendental (or algebraically irrational) numbers, the set of such MNs is Cantor-uncountable. Most MNs contain any finite digital sequence repeated infinitely many times. In the spirit of "Contact" (C.Sagan), each normal MN contains a "Library of Babel" of all possible texts and patterns (J.L.Borges). Infinite at both ends, MNs do not have any numerical value and, contrary to numbers written in positional systems, all digits in MNs have equal weight, a sort of "numerological democracy". In Pythagorean-Platonic models (space-time and physical world originating from pure numbers) the idea of MN resolves the paradox of the "beginning" (or "end") of time. Because in MNs all digits have equal status, (quantum) randomness leads to more uniform and fully ergodic phase trajectories (cf. F.Dyson, Infinite in All Directions).

  2. Statistical method to compare massive parallel sequencing pipelines.

    PubMed

    Elsensohn, M H; Leblay, N; Dimassi, S; Campan-Fournier, A; Labalme, A; Roucher-Boulez, F; Sanlaville, D; Lesca, G; Bardel, C; Roy, P

    2017-03-01

    Today, sequencing is frequently carried out by Massive Parallel Sequencing (MPS), which drastically cuts sequencing time and expenses. Nevertheless, Sanger sequencing remains the main validation method to confirm the presence of variants. The analysis of MPS data involves the development of several bioinformatic tools, academic or commercial. We present here a statistical method to compare MPS pipelines and test it in a comparison between an academic (BWA-GATK) and a commercial pipeline (TMAP-NextGENe®), with and without reference to a gold standard (here, Sanger sequencing), on a panel of 41 genes in 43 epileptic patients. This method used the number of variants to fit log-linear models for pairwise agreements between pipelines. To assess the heterogeneity of the margins and the odds ratios of agreement, four log-linear models were used: a full model, a homogeneous-margin model, a model with single odds ratio for all patients, and a model with single intercept. Then a log-linear mixed model was fitted considering the biological variability as a random effect. Among the 390,339 base-pairs sequenced, TMAP-NextGENe® and BWA-GATK found, on average, 2253.49 and 1857.14 variants (single nucleotide variants and indels), respectively. Against the gold standard, the pipelines had similar sensitivities (63.47% vs. 63.42%) and close but significantly different specificities (99.57% vs. 99.65%; p < 0.001). Same-trend results were obtained when only single nucleotide variants were considered (99.98% specificity and 76.81% sensitivity for both pipelines). The method thus allows pipeline comparison and selection. It is generalizable to all types of MPS data and all pipelines.

  3. Dimeric PROP1 binding to diverse palindromic TAAT sequences promotes its transcriptional activity.

    PubMed

    Nakayama, Michie; Kato, Takako; Susa, Takao; Sano, Akiko; Kitahara, Kousuke; Kato, Yukio

    2009-08-13

    Mutations in the Prop1 gene are responsible for murine Ames dwarfism and human combined pituitary hormone deficiency with hypogonadism. Recently, we reported that PROP1 is a possible transcription factor for gonadotropin subunit genes through plural cis-acting sites composed of AT-rich sequences containing a TAAT motif which differs from its consensus binding sequence known as PRDQ9 (TAATTGAATTA). This study aimed to verify the binding specificity and sequence of PROP1 by applying the method of SELEX (Systematic Evolution of Ligands by EXponential enrichment), EMSA (electrophoretic mobility shift assay) and transient transfection assay. SELEX, after 5, 7 and 9 generations of selection using a random sequence library, showed that nucleotides containing one or two TAAT motifs were accumulated and accounted for 98.5% at the 9th generation. Aligned sequences and EMSA demonstrated that PROP1 binds preferentially to 11 nucleotides composed of an inverted TAAT motif separated by 3 nucleotides with variation in the half site of palindromic TAAT motifs and with preferential requirement of T at the nucleotide number 5 immediately 3' to a TAAT motif. Transient transfection assay demonstrated first that dimeric binding of PROP1 to an inverted TAAT motif and its cognates resulted in transcriptional activation, whereas monomeric binding of PROP1 to a single TAAT motif and an inverted ATTA motif did not mediate activation. Thus, this study demonstrated that dimeric binding of PROP1 is able to recognize diverse palindromic TAAT sequences separated by 3 nucleotides and to exhibit its transcriptional activity.

  4. Pseudo-Random Number Generator Based on Coupled Map Lattices

    NASA Astrophysics Data System (ADS)

    Lü, Huaping; Wang, Shihong; Hu, Gang

    A one-way coupled chaotic map lattice is used for generating pseudo-random numbers. It is shown that with suitable cooperative applications of both chaotic and conventional approaches, the output of the spatiotemporally chaotic system can easily meet the practical requirements of random numbers, i.e., excellent random statistical properties, long periodicity of computer realizations, and fast speed of random number generations. This pseudo-random number generator system can be used as ideal synchronous and self-synchronizing stream cipher systems for secure communications.
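
A one-way coupled map lattice of the kind mentioned in this abstract can be sketched as below. The abstract gives no equations, so everything specific here is an assumption made for illustration: the logistic map f(x) = 4x(1 - x) at each site, the coupling strength `eps`, the periodic boundary, and the "conventional" step of taking the low 32 bits of a scaled lattice value. The class and method names are hypothetical, and the sketch makes no claim about statistical quality or security.

```python
from typing import List, Sequence


class CoupledMapLatticePRNG:
    """Toy generator driven by a one-way coupled logistic map lattice."""

    def __init__(self, seeds: Sequence[float], eps: float = 0.95) -> None:
        if not all(0.0 < s < 1.0 for s in seeds):
            raise ValueError("each seed must lie strictly between 0 and 1")
        self.x: List[float] = list(seeds)
        self.eps = eps

    def _step(self) -> None:
        # Apply the local chaotic map at every site.
        f = [4.0 * v * (1.0 - v) for v in self.x]
        # One-way coupling: site j is driven only by site j-1.
        # Index -1 wraps around, which gives a periodic boundary.
        self.x = [(1.0 - self.eps) * f[j] + self.eps * f[j - 1]
                  for j in range(len(f))]

    def next_u32(self) -> int:
        """Advance the lattice and return a 32-bit unsigned integer."""
        self._step()
        # Conventional post-processing: keep the low 32 bits of the
        # last site's value scaled to a 52-bit integer.
        return int(self.x[-1] * 2**52) & 0xFFFFFFFF
```

Two instances created with the same seeds and coupling produce the same output stream, which is the property a synchronous stream cipher would rely on.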

  5. Contribution to the modeling of solar spicules

    NASA Astrophysics Data System (ADS)

    Tavabi, E.; Koutchmy, S.; Ajabshirizadeh, A.

    2011-06-01

    Solar limb and disk spicule quasi-periodic motions have been reported for a long time, strongly suggesting that they are oscillating. In order to clear up the origin and possibly explain some solar limb and disk spicule quasi-periodic recurrences produced by overlapping effects, we present a simulation model assuming quasi-random positions of spicules. We also allow a set number of spicules with different physical properties (such as: height, lifetime and tilt angle as shown by an individual spicule) occurring randomly. Results of simulations made with three different spatial resolutions of the corresponding frames and also for different number density of spicules, are analyzed. The wavelet time/frequency method is used to obtain the exact period of spicule visibility. Results are compared with observations of the chromosphere from (i) the Transition Region and Coronal Explorer (TRACE) filtergrams taken at 1600 Å, (ii) the Solar Optical Telescope (SOT) of Hinode taken in the Ca II H-line and (iii) the Sac-Peak Dunn's VTT taken in Hα line. Our results suggest the need to be cautious when interpreting apparent oscillations seen in spicule image sequences when overlapping is present, i.e., when the spatial resolution is not enough to resolve individual components of spicules.

  6. On the Role of Aggregation Prone Regions in Protein Evolution, Stability, and Enzymatic Catalysis: Insights from Diverse Analyses

    PubMed Central

    Buck, Patrick M.; Kumar, Sandeep; Singh, Satish K.

    2013-01-01

    The various roles that aggregation prone regions (APRs) are capable of playing in proteins are investigated here via comprehensive analyses of multiple non-redundant datasets containing randomly generated amino acid sequences, monomeric proteins, intrinsically disordered proteins (IDPs) and catalytic residues. Results from this study indicate that the aggregation propensities of monomeric protein sequences have been minimized compared to random sequences with uniform and natural amino acid compositions, as observed by a lower average aggregation propensity and fewer APRs that are shorter in length and more often punctuated by gate-keeper residues. However, evidence for evolutionary selective pressure to disrupt these sequence regions among homologous proteins is inconsistent. APRs are less conserved than average sequence identity among closely related homologues (≥80% sequence identity with a parent) but APRs are more conserved than average sequence identity among homologues that have at least 50% sequence identity with a parent. Structural analyses of APRs indicate that APRs are three times more likely to contain ordered versus disordered residues and that APRs frequently contribute more towards stabilizing proteins than equal length segments from the same protein. Catalytic residues and APRs were also found to be in structural contact significantly more often than expected by random chance. Our findings suggest that proteins have evolved by optimizing their risk of aggregation for cellular environments by both minimizing aggregation prone regions and by conserving those that are important for folding and function. In many cases, these sequence optimizations are insufficient to develop recombinant proteins into commercial products. Rational design strategies aimed at improving protein solubility for biotechnological purposes should carefully evaluate the contributions made by candidate APRs, targeted for disruption, towards protein structure and activity. PMID:24146608

  7. Topical silver diamine fluoride for dental caries arrest in preschool children: A randomized controlled trial and microbiological analysis of caries associated microbes and resistance gene expression.

    PubMed

    Milgrom, Peter; Horst, Jeremy A; Ludwig, Sharity; Rothen, Marilynn; Chaffee, Benjamin W; Lyalina, Svetlana; Pollard, Katherine S; DeRisi, Joseph L; Mancl, Lloyd

    2018-01-01

    The Stopping Cavities Trial investigated effectiveness and safety of 38% silver diamine fluoride in arresting caries lesions. The study was a double-blind randomized placebo-controlled superiority trial with 2 parallel groups. The sites were Oregon preschools. Sixty-six preschool children with ≥1 lesion were enrolled. Silver diamine fluoride (38%) or placebo (blue-tinted water) was applied topically to the lesion. The primary endpoint was caries arrest (lesion inactivity, Nyvad criteria) 14-21 days post intervention. Dental plaque was collected from all children, and microbial composition was assessed by RNA sequencing from 2 lesions and 1 unaffected surface before treatment and at follow-up for 3 children from each group. Average proportion of arrested caries lesions in the silver diamine fluoride group was higher (0.72; 95% CI; 0.55, 0.84) than in the placebo group (0.05; 95% CI; 0.00, 0.16). Confirmatory analysis using generalized estimating equation log-linear regression, based on the number of arrested lesions and accounting for the number of treated surfaces and length of follow-up, indicates the risk of arrested caries was significantly higher in the treatment group (relative risk, 17.3; 95% CI: 4.3 to 69.4). No harms were observed. RNA sequencing analysis identified no consistent changes in relative abundance of caries-associated microbes, nor emergence of antibiotic or metal resistance gene expression. Topical 38% silver diamine fluoride is effective and safe in arresting cavities in preschool children. The treatment is applicable to primary care practice and may reduce the burden of untreated tooth decay in the population. Copyright © 2017 The Author(s). Published by Elsevier Ltd. All rights reserved.

  8. Characterization of Adelphocoris suturalis (Hemiptera: Miridae) Transcriptome from Different Developmental Stages

    NASA Astrophysics Data System (ADS)

    Tian, Caihong; Tek Tay, Wee; Feng, Hongqiang; Wang, Ying; Hu, Yongmin; Li, Guoping

    2015-06-01

    Adelphocoris suturalis is one of the most serious pest insects of Bt cotton in China, however its molecular genetics, biochemistry and physiology are poorly understood. We used high throughput sequencing platform to perform de novo transcriptome assembly and gene expression analyses across different developmental stages (eggs, 2nd and 5th instar nymphs, female and male adults). We obtained 20 GB of clean data and revealed 88,614 unigenes, including 23,830 clusters and 64,784 singletons. These unigene sequences were annotated and classified by Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases. A large number of differentially expressed genes were discovered through pairwise comparisons between these developmental stages. Gene expression profiles were dramatically different between life stage transitions, with some of these most differentially expressed genes being associated with sex difference, metabolism and development. Quantitative real-time PCR results confirm deep-sequencing findings based on relative expression levels of nine randomly selected genes. Furthermore, over 791,390 single nucleotide polymorphisms and 2,682 potential simple sequence repeats were identified. Our study provided comprehensive transcriptional gene expression information for A. suturalis that will form the basis to better understanding of development pathways, hormone biosynthesis, sex differences and wing formation in mirid bugs.

  9. Characterization of Adelphocoris suturalis (Hemiptera: Miridae) Transcriptome from Different Developmental Stages

    PubMed Central

    Tian, Caihong; Tek Tay, Wee; Feng, Hongqiang; Wang, Ying; Hu, Yongmin; Li, Guoping

    2015-01-01

    Adelphocoris suturalis is one of the most serious pest insects of Bt cotton in China; however, its molecular genetics, biochemistry and physiology are poorly understood. We used a high-throughput sequencing platform to perform de novo transcriptome assembly and gene expression analyses across different developmental stages (eggs, 2nd and 5th instar nymphs, female and male adults). We obtained 20 GB of clean data and revealed 88,614 unigenes, including 23,830 clusters and 64,784 singletons. These unigene sequences were annotated and classified by Gene Ontology, Clusters of Orthologous Groups, and Kyoto Encyclopedia of Genes and Genomes databases. A large number of differentially expressed genes were discovered through pairwise comparisons between these developmental stages. Gene expression profiles were dramatically different between life stage transitions, with some of the most differentially expressed genes being associated with sex difference, metabolism and development. Quantitative real-time PCR results confirm deep-sequencing findings based on relative expression levels of nine randomly selected genes. Furthermore, over 791,390 single nucleotide polymorphisms and 2,682 potential simple sequence repeats were identified. Our study provided comprehensive transcriptional gene expression information for A. suturalis that will form the basis for a better understanding of development pathways, hormone biosynthesis, sex differences and wing formation in mirid bugs. PMID:26047353

  10. Quasirandom geometric networks from low-discrepancy sequences

    NASA Astrophysics Data System (ADS)

    Estrada, Ernesto

    2017-08-01

    We define quasirandom geometric networks using low-discrepancy sequences, such as Halton, Sobol, and Niederreiter. The networks are built in d dimensions by considering the d-tuples of digits generated by these sequences as the coordinates of the vertices of the networks in a d-dimensional unit hypercube I^d. Then, two vertices are connected by an edge if they are at a distance smaller than a connection radius. We investigate computationally 11 network-theoretic properties of two-dimensional quasirandom networks and compare them with analogous random geometric networks. We also study their degree distribution and their spectral density distributions. We conclude from this intensive computational study that in terms of the uniformity of the distribution of the vertices in the unit square, the quasirandom networks look more random than the random geometric networks. We include an analysis of potential strategies for generating higher-dimensional quasirandom networks, where it is known that some of the low-discrepancy sequences are highly correlated. In this respect, we conclude that up to dimension 20, the use of scrambling, skipping and leaping strategies generates quasirandom networks with the desired properties of uniformity. Finally, we consider a diffusive process taking place on the nodes and edges of the quasirandom and random geometric graphs. We show that the diffusion time is shorter in the quasirandom graphs as a consequence of their larger structural homogeneity. In the random geometric graphs the diffusion produces clusters of concentration that make the process slower. Such clusters are a direct consequence of the heterogeneous and irregular distribution of the nodes in the unit square on which the generation of random geometric graphs is based.
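    The construction described in this abstract (low-discrepancy points as vertices, an edge whenever two vertices are closer than a connection radius) can be illustrated with a minimal sketch. It is not the author's code: the function names, the choice of the Halton sequence with bases 2 and 3, the 50 points, and the radius 0.2 are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist


def halton(index, base):
    """Radical inverse of `index` (index >= 1) in the given base."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * fraction
        fraction /= base
    return result


def halton_points(n, bases=(2, 3)):
    """First n Halton points in the unit hypercube, one coordinate per base."""
    return [tuple(halton(i, b) for b in bases) for i in range(1, n + 1)]


def geometric_graph(points, radius):
    """Edges (i, j), i < j, between points closer than `radius` (Euclidean)."""
    return [
        (i, j)
        for (i, p), (j, q) in combinations(enumerate(points), 2)
        if dist(p, q) < radius
    ]


# Hypothetical parameters chosen only for illustration.
points = halton_points(50)
edges = geometric_graph(points, radius=0.2)
print(len(points), "vertices,", len(edges), "edges")
```

    Replacing `halton_points` with uniformly random points would give the analogous random geometric graph for comparison.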

  11. Multiple ECG Fiducial Points-Based Random Binary Sequence Generation for Securing Wireless Body Area Networks.

    PubMed

    Zheng, Guanglou; Fang, Gengfa; Shankaran, Rajan; Orgun, Mehmet A; Zhou, Jie; Qiao, Li; Saleem, Kashif

    2017-05-01

    Generating random binary sequences (BSes) is a fundamental requirement in cryptography. A BS is a sequence of N bits, and each bit has a value of 0 or 1. For securing sensors within wireless body area networks (WBANs), electrocardiogram (ECG)-based BS generation methods have been widely investigated in which interpulse intervals (IPIs) from each heartbeat cycle are processed to produce BSes. Using these IPI-based methods to generate a 128-bit BS in real time normally takes around half a minute. In order to improve the time efficiency of such methods, this paper presents an ECG multiple fiducial-points based binary sequence generation (MFBSG) algorithm. The technique of discrete wavelet transforms is employed to detect arrival time of these fiducial points, such as P, Q, R, S, and T peaks. Time intervals between them, including RR, RQ, RS, RP, and RT intervals, are then calculated based on this arrival time, and are used as ECG features to generate random BSes with low latency. According to our analysis on real ECG data, these ECG feature values exhibit the property of randomness and, thus, can be utilized to generate random BSes. Compared with the schemes that solely rely on IPIs to generate BSes, this MFBSG algorithm uses five feature values from one heart beat cycle, and can be up to five times faster than the solely IPI-based methods. So, it achieves a design goal of low latency. According to our analysis, the complexity of the algorithm is comparable to that of fast Fourier transforms. These randomly generated ECG BSes can be used as security keys for encryption or authentication in a WBAN system.

  12. A computerized handheld decision-support system to improve pulmonary embolism diagnosis: a randomized trial.

    PubMed

    Roy, Pierre-Marie; Durieux, Pierre; Gillaizeau, Florence; Legall, Catherine; Armand-Perroux, Aurore; Martino, Ludovic; Hachelaf, Mohamed; Dubart, Alain-Eric; Schmidt, Jeannot; Cristiano, Mirko; Chretien, Jean-Marie; Perrier, Arnaud; Meyer, Guy

    2009-11-17

    Testing for pulmonary embolism often differs from that recommended by evidence-based guidelines. To assess the effectiveness of a handheld clinical decision-support system to improve the diagnostic work-up of suspected pulmonary embolism among patients in the emergency department. Cluster randomized trial. Assignment was by random-number table, providers were not blinded, and outcome assessment was automated. (ClinicalTrials.gov registration number: NCT00188032). 20 emergency departments in France. 1103 and 1768 consecutive outpatients with suspected pulmonary embolism. After a preintervention period involving 20 centers and 1103 patients, in which providers grew accustomed to inputting clinical data into handheld devices and investigators assessed baseline testing, emergency departments were randomly assigned to activation of a decision-support system on the devices (10 centers, 753 patients) or posters and pocket cards that showed validated diagnostic strategies (10 centers, 1015 patients). Appropriateness of diagnostic work-up, defined as any sequence of tests that yielded a posttest probability less than 5% or greater than 85% (primary outcome) or as strict adherence to guideline recommendations (secondary outcome); number of tests per patient (secondary outcome). The proportion of patients who received appropriate diagnostic work-ups was greater during the trial than in the preintervention period in both groups, but the increase was greater in the computer-based guidelines group (adjusted mean difference in increase, 19.3 percentage points favoring computer-based guidelines [95% CI, 2.9 to 35.6 percentage points]; P = 0.023). Among patients with appropriate work-ups, those in the computer-based guidelines group received slightly fewer tests than did patients in the paper guidelines group (mean tests per patient, 1.76 [SD, 0.98] vs. 2.25 [SD, 1.04]; P < 0.001). 
    The study was not designed to show a difference in the clinical outcomes of patients during follow-up. A handheld decision-support system improved diagnostic decision making for patients with suspected pulmonary embolism in the emergency department.

  13. Affinity selection of Nipah and Hendra virus-related vaccine candidates from a complex random peptide library displayed on bacteriophage virus-like particles

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Peabody, David S.; Chackerian, Bryce; Ashley, Carlee

    The invention relates to virus-like particles of bacteriophage MS2 (MS2 VLPs) displaying peptide epitopes or peptide mimics of epitopes of Nipah Virus envelope glycoprotein that elicit an immune response against Nipah Virus upon vaccination of humans or animals. Affinity selection on Nipah Virus-neutralizing monoclonal antibodies using random sequence peptide libraries on MS2 VLPs selected peptides with sequence similarity to peptide sequences found within the envelope glycoprotein of Nipah itself, thus identifying the epitopes the antibodies recognize. The selected peptide sequences themselves are not necessarily identical in all respects to a sequence within Nipah Virus glycoprotein, and therefore may be referred to as epitope mimics. VLPs displaying these epitope mimics can serve as vaccine. On the other hand, display of the corresponding wild-type sequence derived from Nipah Virus and corresponding to the epitope mapped by affinity selection, may also be used as a vaccine.

  14. Influence of motion on face recognition.

    PubMed

    Bonfiglio, Natale S; Manfredi, Valentina; Pessa, Eliano

    2012-02-01

    The influence of motion information and temporal associations on recognition of non-familiar faces was investigated using two groups that performed a face recognition task. One group was presented with regular temporal sequences of face views designed to produce the impression of motion of the face rotating in depth, the other group with random sequences of the same views. In one condition, participants viewed the sequences of the views in rapid succession with a negligible interstimulus interval (ISI). This condition was characterized by three different presentation times. In another condition, participants were presented a sequence with a 1-sec. ISI between the views. Regular sequences of views with a negligible ISI and a shorter presentation time were hypothesized to give rise to better recognition, related to a stronger impression of face rotation. Analysis of data from 45 participants showed a shorter presentation time was associated with significantly better accuracy on the recognition task; however, differences between performances associated with regular and random sequences were not significant.

  15. Comparative Analysis of the Genomes of Two Field Isolates of the Rice Blast Fungus Magnaporthe oryzae

    PubMed Central

    Li, Zhigang; Hu, Songnian; Yao, Nan; Dean, Ralph A.; Zhao, Wensheng; Shen, Mi; Zhang, Haiwang; Li, Chao; Liu, Liyuan; Cao, Lei; Xu, Xiaowen; Xing, Yunfei; Hsiang, Tom; Zhang, Ziding; Xu, Jin-Rong; Peng, You-Liang

    2012-01-01

    Rice blast caused by Magnaporthe oryzae is one of the most destructive diseases of rice worldwide. The fungal pathogen is notorious for its ability to overcome host resistance. To better understand its genetic variation in nature, we sequenced the genomes of two field isolates, Y34 and P131. In comparison with the previously sequenced laboratory strain 70-15, both field isolates had a similar genome size but slightly more genes. Sequences from the field isolates were used to improve genome assembly and gene prediction of 70-15. Although the overall genome structure is similar, a number of gene families that are likely involved in plant-fungal interactions are expanded in the field isolates. Genome-wide analysis on asynonymous to synonymous nucleotide substitution rates revealed that many infection-related genes underwent diversifying selection. The field isolates also have hundreds of isolate-specific genes and a number of isolate-specific gene duplication events. Functional characterization of randomly selected isolate-specific genes revealed that they play diverse roles, some of which affect virulence. Furthermore, each genome contains thousands of loci of transposon-like elements, but less than 30% of them are conserved among different isolates, suggesting active transposition events in M. oryzae. A total of approximately 200 genes were disrupted in these three strains by transposable elements. Interestingly, transposon-like elements tend to be associated with isolate-specific or duplicated sequences. Overall, our results indicate that gain or loss of unique genes, DNA duplication, gene family expansion, and frequent translocation of transposon-like elements are important factors in genome variation of the rice blast fungus. PMID:22876203

  16. Foldamer hypothesis for the growth and sequence differentiation of prebiotic polymers

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Guseva, Elizaveta; Zuckermann, Ronald N.; Dill, Ken A.

    It is not known how life originated. It is thought that prebiotic processes were able to synthesize short random polymers. However, then, how do short-chain molecules spontaneously grow longer? Also, how would random chains grow more informational and become autocatalytic (i.e., increasing their own concentrations)? We study the folding and binding of random sequences of hydrophobic (H) and polar (P) monomers in a computational model. We find that even short hydrophobic polar (HP) chains can collapse into relatively compact structures, exposing hydrophobic surfaces. In this way, they act as primitive versions of today’s protein catalysts, elongating other such HP polymers as ribosomes would now do. Such foldamer catalysts are shown to form an autocatalytic set, through which short chains grow into longer chains that have particular sequences. An attractive feature of this model is that it does not overconverge to a single solution; it gives ensembles that could further evolve under selection. This mechanism describes how specific sequences and conformations could contribute to the chemistry-to-biology (CTB) transition.

  17. Foldamer hypothesis for the growth and sequence differentiation of prebiotic polymers

    DOE PAGES

    Guseva, Elizaveta; Zuckermann, Ronald N.; Dill, Ken A.

    2017-08-22

    It is not known how life originated. It is thought that prebiotic processes were able to synthesize short random polymers. However, then, how do short-chain molecules spontaneously grow longer? Also, how would random chains grow more informational and become autocatalytic (i.e., increasing their own concentrations)? We study the folding and binding of random sequences of hydrophobic (H) and polar (P) monomers in a computational model. We find that even short hydrophobic polar (HP) chains can collapse into relatively compact structures, exposing hydrophobic surfaces. In this way, they act as primitive versions of today’s protein catalysts, elongating other such HP polymers as ribosomes would now do. Such foldamer catalysts are shown to form an autocatalytic set, through which short chains grow into longer chains that have particular sequences. An attractive feature of this model is that it does not overconverge to a single solution; it gives ensembles that could further evolve under selection. This mechanism describes how specific sequences and conformations could contribute to the chemistry-to-biology (CTB) transition.

  18. Foldamer hypothesis for the growth and sequence differentiation of prebiotic polymers

    PubMed Central

    Guseva, Elizaveta; Zuckermann, Ronald N.; Dill, Ken A.

    2017-01-01

    It is not known how life originated. It is thought that prebiotic processes were able to synthesize short random polymers. However, then, how do short-chain molecules spontaneously grow longer? Also, how would random chains grow more informational and become autocatalytic (i.e., increasing their own concentrations)? We study the folding and binding of random sequences of hydrophobic (H) and polar (P) monomers in a computational model. We find that even short hydrophobic polar (HP) chains can collapse into relatively compact structures, exposing hydrophobic surfaces. In this way, they act as primitive versions of today’s protein catalysts, elongating other such HP polymers as ribosomes would now do. Such foldamer catalysts are shown to form an autocatalytic set, through which short chains grow into longer chains that have particular sequences. An attractive feature of this model is that it does not overconverge to a single solution; it gives ensembles that could further evolve under selection. This mechanism describes how specific sequences and conformations could contribute to the chemistry-to-biology (CTB) transition. PMID:28831002

  19. Selective interference of grasp and space representations with number magnitude and serial order processing.

    PubMed

    van Dijck, Jean-Philippe; Fias, Wim; Andres, Michael

    2015-10-01

    It has been proposed that the metrics of space, time and other magnitudes relevant for action are coupled through a generalized magnitude system that also contributes to number representation. Several studies capitalized on stimulus-response compatibility effects to show that numbers map onto left-right representations and grasp representations as a function of their magnitude. However, the tasks typically used do not allow disentangling magnitude from serial order processing. Here, we devised a working memory (WM) task where participants had to remember random sequences of numbers and perform a precision/whole-hand grip (Experiment 1) or a uni-manual left/right button press (Experiment 2) in response to numbers presented during the retention interval. This task does allow differentiating the interference of number magnitude and serial order with each set of responses. Experiment 1 showed that precision grips were initiated faster than whole-hand grips in response to small numbers, irrespective of their serial position in WM. In contrast, Experiment 2 revealed an advantage of right over left button presses as serial position increased, without any influence of number magnitude. These findings demonstrate that grasping and left-right movements overlap with distinct dimensions of number processing. These findings are discussed in the light of different theories explaining the interactions between numbers, space and action.

  20. An Unconditional Test for Change Point Detection in Binary Sequences with Applications to Clinical Registries.

    PubMed

    Ellenberger, David; Friede, Tim

    2016-08-05

    Methods for change point (also sometimes referred to as threshold or breakpoint) detection in binary sequences are not new and were introduced as early as 1955. Much of the research in this area has focussed on asymptotic and exact conditional methods. Here we develop an exact unconditional test. An unconditional exact test is developed which assumes the total number of events as random instead of conditioning on the number of observed events. The new test is shown to be uniformly more powerful than Worsley's exact conditional test and means for its efficient numerical calculations are given. Adaptions of methods by Berger and Boos are made to deal with the issue that the unknown event probability imposes a nuisance parameter. The methods are compared in a Monte Carlo simulation study and applied to a cohort of patients undergoing traumatic orthopaedic surgery involving external fixators where a change in pin site infections is investigated. The unconditional test controls the type I error rate at the nominal level and is uniformly more powerful than (or to be more precise uniformly at least as powerful as) Worsley's exact conditional test which is very conservative for small sample sizes. In the application a beneficial effect associated with the introduction of a new treatment procedure for pin site care could be revealed. We consider the new test an effective and easy to use exact test which is recommended in small sample size change point problems in binary sequences.

  1. Distinguishing functional polymorphism from random variation in the sequences of >10,000 HLA-A, -B and -C alleles.

    PubMed

    Robinson, James; Guethlein, Lisbeth A; Cereb, Nezih; Yang, Soo Young; Norman, Paul J; Marsh, Steven G E; Parham, Peter

    2017-06-01

    HLA class I glycoproteins contain the functional sites that bind peptide antigens and engage lymphocyte receptors. Recently, clinical application of sequence-based HLA typing has uncovered an unprecedented number of novel HLA class I alleles. Here we define the nature and extent of the variation in 3,489 HLA-A, 4,356 HLA-B and 3,111 HLA-C alleles. This analysis required development of suites of methods, having general applicability, for comparing and analyzing large numbers of homologous sequences. At least three amino-acid substitutions are present at every position in the polymorphic α1 and α2 domains of HLA-A, -B and -C. A minority of positions have an incidence >1% for the 'second' most frequent nucleotide, comprising 70 positions in HLA-A, 85 in HLA-B and 54 in HLA-C. The majority of these positions have three or four alternative nucleotides. These positions were subject to positive selection and correspond to binding sites for peptides and receptors. Most alleles of HLA class I (>80%) are very rare, often identified in one person or family, and they differ by point mutation from older, more common alleles. These alleles with single nucleotide polymorphisms reflect the germ-line mutation rate. Their frequency predicts the human population harbors 8-9 million HLA class I variants. The common alleles of human populations comprise 42 core alleles, which represent all selected polymorphism, and recombinants that have assorted this polymorphism.

  2. Distinguishing functional polymorphism from random variation in the sequences of >10,000 HLA-A, -B and -C alleles

    PubMed Central

    Cereb, Nezih; Yang, Soo Young; Marsh, Steven G. E.; Parham, Peter

    2017-01-01

    HLA class I glycoproteins contain the functional sites that bind peptide antigens and engage lymphocyte receptors. Recently, clinical application of sequence-based HLA typing has uncovered an unprecedented number of novel HLA class I alleles. Here we define the nature and extent of the variation in 3,489 HLA-A, 4,356 HLA-B and 3,111 HLA-C alleles. This analysis required development of suites of methods, having general applicability, for comparing and analyzing large numbers of homologous sequences. At least three amino-acid substitutions are present at every position in the polymorphic α1 and α2 domains of HLA-A, -B and -C. A minority of positions have an incidence >1% for the ‘second’ most frequent nucleotide, comprising 70 positions in HLA-A, 85 in HLA-B and 54 in HLA-C. The majority of these positions have three or four alternative nucleotides. These positions were subject to positive selection and correspond to binding sites for peptides and receptors. Most alleles of HLA class I (>80%) are very rare, often identified in one person or family, and they differ by point mutation from older, more common alleles. These alleles with single nucleotide polymorphisms reflect the germ-line mutation rate. Their frequency predicts the human population harbors 8–9 million HLA class I variants. The common alleles of human populations comprise 42 core alleles, which represent all selected polymorphism, and recombinants that have assorted this polymorphism. PMID:28650991

  3. Employing online quantum random number generators for generating truly random quantum states in Mathematica

    NASA Astrophysics Data System (ADS)

    Miszczak, Jarosław Adam

    2013-01-01

    The presented package for the Mathematica computing system allows the harnessing of quantum random number generators (QRNG) for investigating the statistical properties of quantum states. The described package implements a number of functions for generating random states. The new version of the package adds the ability to use the on-line quantum random number generator service and implements new functions for retrieving lists of random numbers. Thanks to the introduced improvements, the new version provides faster access to high-quality sources of random numbers and can be used in simulations requiring large amounts of random data.
    New version program summary
    Program title: TRQS
    Catalogue identifier: AEKA_v2_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEKA_v2_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 18 134
    No. of bytes in distributed program, including test data, etc.: 2 520 49
    Distribution format: tar.gz
    Programming language: Mathematica, C.
    Computer: Any supporting Mathematica in version 7 or higher.
    Operating system: Any platform supporting Mathematica; tested with GNU/Linux (32 and 64 bit).
    RAM: Case-dependent
    Supplementary material: Fig. 1 mentioned below can be downloaded.
    Classification: 4.15.
    External routines: Quantis software library (http://www.idquantique.com/support/quantis-trng.html)
    Catalogue identifier of previous version: AEKA_v1_0
    Journal reference of previous version: Comput. Phys. Comm. 183(2012)118
    Does the new version supersede the previous version?: Yes
    Nature of problem: Generation of random density matrices and utilization of high-quality random numbers for the purpose of computer simulation.
    Solution method: Use of a physical quantum random number generator and an on-line service providing access to the source of true random numbers generated by a quantum random number generator.
    Reasons for new version: Added support for the high-speed on-line quantum random number generator and improved methods for retrieving lists of random numbers.
    Summary of revisions: The presented version provides two significant improvements. The first one is the ability to use the on-line Quantum Random Number Generation service developed by PicoQuant GmbH and the Nano-Optics groups at the Department of Physics of Humboldt University. The on-line service supported in the version 2.0 of the TRQS package provides faster access to true randomness sources constructed using the laws of quantum physics. The service is freely available at https://qrng.physik.hu-berlin.de/. The use of this service allows using the presented package without the need of a physical quantum random number generator. The second improvement introduced in this version is the ability to retrieve arrays of random data directly from the used source. This increases the speed of the random number generation, especially in the case of an on-line service, where it reduces the time necessary to establish the connection. Thanks to the speed improvement of the presented version, the package can now be used in simulations requiring larger amounts of random data. Moreover, the functions for generating random numbers provided by the current version of the package more closely follow the pattern of functions for generating pseudo-random numbers provided in Mathematica.
    Additional comments: Speed comparison: The implementation of the support for the QRNG on-line service provides a noticeable improvement in the speed of random number generation. For the samples of real numbers of size 10^1, 10^2, …, 10^7 the times required to generate these samples using Quantis USB device and QRNG service are compared in Fig. 1. The presented results show that the use of the on-line service provides faster access to random numbers. One should note, however, that the speed gain can increase or decrease depending on the connection speed between the computer and the server providing random numbers.
    Running time: Depends on the used source of randomness and the amount of random data used in the experiment.
    References: [1] M. Wahl, M. Leifgen, M. Berlin, T. Röhlicke, H.-J. Rahn, O. Benson, An ultrafast quantum random number generator with provably bounded output bias based on photon arrival time measurements, Applied Physics Letters, Vol. 098, 171105 (2011). http://dx.doi.org/10.1063/1.3578456.

  4. Not all (possibly) “random” sequences are created equal

    PubMed Central

    Pincus, Steve; Kalman, Rudolf E.

    1997-01-01

    The need to assess the randomness of a single sequence, especially a finite sequence, is ubiquitous, yet is unaddressed by axiomatic probability theory. Here, we assess randomness via approximate entropy (ApEn), a computable measure of sequential irregularity, applicable to single sequences of both (even very short) finite and infinite length. We indicate the novelty and facility of the multidimensional viewpoint taken by ApEn, in contrast to classical measures. Furthermore and notably, for finite length, finite state sequences, one can identify maximally irregular sequences, and then apply ApEn to quantify the extent to which given sequences differ from maximal irregularity, via a set of deficit (def_m) functions. The utility of these def_m functions, which we show allows one to considerably refine the notions of probabilistic independence and normality, is featured in several studies, including (i) digits of e, π, √2, and √3, both in base 2 and in base 10, and (ii) sequences given by fractional parts of multiples of irrationals. We prove companion analytic results, which also feature in a discussion of the role and validity of the almost sure properties from axiomatic probability theory insofar as they apply to specified sequences and sets of sequences (in the physical world). We conclude by relating the present results and perspective to both previous and subsequent studies. PMID:11038612
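    As a rough illustration of the measure named in this abstract, the sketch below computes approximate entropy in its commonly stated form, ApEn(m, r) = Φ^m(r) − Φ^(m+1)(r), where Φ^k is the average log-frequency of length-k blocks that match within tolerance r under the maximum norm (self-matches included). The function name, the example sequences, and the parameters m = 1, r = 0.5 are assumptions for illustration; the deficit (def_m) functions of the paper are not implemented here.

```python
from math import log


def approximate_entropy(u, m, r):
    """ApEn(m, r) of the finite sequence u, counting self-matches."""

    def phi(k):
        n = len(u) - k + 1                      # number of length-k blocks
        blocks = [u[i:i + k] for i in range(n)]
        total = 0.0
        for a in blocks:
            # Blocks within tolerance r of `a` in the maximum norm.
            matches = sum(
                1 for b in blocks
                if max(abs(x - y) for x, y in zip(a, b)) <= r
            )
            total += log(matches / n)
        return total / n

    return phi(m) - phi(m + 1)


# Hypothetical binary examples: a strictly alternating sequence and a
# less regular one built from a repeated de Bruijn word.
alternating = [0, 1] * 10
irregular = [int(c) for c in "0001011100010111"]

print(approximate_entropy(alternating, 1, 0.5))
print(approximate_entropy(irregular, 1, 0.5))
```

    For binary data any tolerance below 1 demands exact block matches, so the alternating sequence scores close to zero while the less regular one scores clearly higher.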

  5. Construction of random sheared fosmid library from Chinese cabbage and its use for Brassica rapa genome sequencing project.

    PubMed

    Park, Tae-Ho; Park, Beom-Seok; Kim, Jin-A; Hong, Joon Ki; Jin, Mina; Seol, Young-Joo; Mun, Jeong-Hwan

    2011-01-01

    As a part of the Multinational Genome Sequencing Project of Brassica rapa, linkage group R9 and R3 were sequenced using a bacterial artificial chromosome (BAC) by BAC strategy. The current physical contigs are expected to cover approximately 90% euchromatins of both chromosomes. As the project progresses, BAC selection for sequence extension becomes more limited because BAC libraries are restriction enzyme-specific. To support the project, a random sheared fosmid library was constructed. The library consists of 97536 clones with average insert size of approximately 40 kb corresponding to seven genome equivalents, assuming a Chinese cabbage genome size of 550 Mb. The library was screened with primers designed at the end of sequences of nine points of scaffold gaps where BAC clones cannot be selected to extend the physical contigs. The selected positive clones were end-sequenced to check the overlap between the fosmid clones and the adjacent BAC clones. Nine fosmid clones were selected and fully sequenced. The sequences revealed two completed gap filling and seven sequence extensions, which can be used for further selection of BAC clones confirming that the fosmid library will facilitate the sequence completion of B. rapa. Copyright © 2011. Published by Elsevier Ltd.
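    The "seven genome equivalents" quoted in this abstract follow from the stated clone count, average insert size, and assumed genome size. A small check of that arithmetic, with a hypothetical helper name and taking 1 Mb = 1000 kb:

```python
def genome_equivalents(clones, insert_kb, genome_mb):
    """Library coverage: total cloned DNA divided by genome size."""
    total_mb = clones * insert_kb / 1000   # convert kb to Mb
    return total_mb / genome_mb


# Figures taken from the abstract: 97536 clones, ~40 kb inserts, 550 Mb genome.
coverage = genome_equivalents(clones=97_536, insert_kb=40, genome_mb=550)
print(f"{coverage:.2f} genome equivalents")
```

    The result is slightly above 7, consistent with the rounded figure in the abstract.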

  6. Roles of the 2 microns gene products in stable maintenance of the 2 microns plasmid of Saccharomyces cerevisiae.

    PubMed Central

    Reynolds, A E; Murray, A W; Szostak, J W

    1987-01-01

    We have examined the replication and segregation of the Saccharomyces cerevisiae 2 microns circle. The amplification of the plasmid at low copy numbers requires site-specific recombination between the 2 microns inverted repeat sequences catalyzed by the plasmid-encoded FLP gene. No other 2 microns gene products are required. The overexpression of FLP in a strain carrying endogenous 2 microns leads to uncontrolled plasmid replication, longer cell cycles, and cell death. Two different assays show that the level of Flp activity decreases with increasing 2 microns copy number. This regulation requires the products of the REP1 and REP2 genes. These gene products also act together to ensure that 2 microns molecules are randomly segregated between mother and daughter cells at cell division. PMID:3316982

  7. Contrast-enhanced 3-dimensional SPACE versus MP-RAGE for the detection of brain metastases: considerations with a 32-channel head coil.

    PubMed

    Reichert, Miriam; Morelli, John N; Runge, Val M; Tao, Ai; von Ritschl, Ruediger; von Ritschl, Andreas; Padua, Abraham; Dix, James E; Marra, Michael J; Schoenberg, Stefan O; Attenberger, Ulrike I

    2013-01-01

    The aim of this study was to compare the detection of brain metastases at 3 T using a 32-channel head coil with 2 different 3-dimensional (3D) contrast-enhanced sequences, a T1-weighted fast spin-echo-based (SPACE; sampling perfection with application-optimized contrasts using different flip angle evolutions) sequence and a conventional magnetization-prepared rapid gradient-echo (MP-RAGE) sequence. Seventeen patients with 161 brain metastases were examined prospectively using both SPACE and MP-RAGE sequences on a 3-T magnetic resonance system. Eight healthy volunteers were similarly examined for determination of signal-to-noise ratio (SNR) values. Parameters were adjusted to equalize acquisition times between the sequences (3 minutes and 30 seconds). The order in which sequences were performed was randomized. Two blinded board-certified neuroradiologists evaluated the number of detectable metastatic lesions with each sequence relative to a criterion standard reading conducted at the Gamma Knife facility by a neuroradiologist with access to all clinical and imaging data. In the volunteer assessment with SPACE and MP-RAGE, SNR (10.3 ± 0.8 vs 7.7 ± 0.7) and contrast-to-noise ratio (0.8 ± 0.2 vs 0.5 ± 0.1) were statistically significantly greater with the SPACE sequence (P < 0.05). Overall, lesion detection was markedly improved with the SPACE sequence (99.1% of lesions for reader 1 and 96.3% of lesions for reader 2) compared with the MP-RAGE sequence (73.6% of lesions for reader 1 and 68.5% of lesions for reader 2; P < 0.01). A 3D T1-weighted fast spin echo sequence (SPACE) improves detection of metastatic lesions relative to 3D T1-weighted gradient-echo-based scan (MP-RAGE) imaging when implemented with a 32-channel head coil at identical scan acquisition times (3 minutes and 30 seconds).

  8. Introducing Perception and Modelling of Spatial Randomness in Classroom

    ERIC Educational Resources Information Center

    De Nóbrega, José Renato

    2017-01-01

    A strategy to facilitate understanding of spatial randomness is described, using student activities developed in sequence: looking at spatial patterns, simulating approximate spatial randomness using a grid of equally-likely squares, using binomial probabilities for approximations and predictions and then comparing with given Poisson…
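
    The activity sequence described here (a grid of equally likely squares, binomial probabilities, comparison with Poisson) can be sketched in a few lines. This is a minimal illustration, not the author's classroom material: the grid size, number of points, seed, and function names are all assumptions chosen for the example.

```python
import math
import random
from collections import Counter


def simulate_counts(n_points, n_squares, rng):
    """Drop n_points into n_squares equally likely squares; return points per square."""
    hits = Counter(rng.randrange(n_squares) for _ in range(n_points))
    return [hits.get(i, 0) for i in range(n_squares)]


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """Poisson probability of k events when the mean is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)


rng = random.Random(0)            # fixed seed only so the run is repeatable
n_points, n_squares = 200, 100    # hypothetical classroom values
counts = simulate_counts(n_points, n_squares, rng)
observed = Counter(counts)        # how many squares hold 0, 1, 2, ... points

p = 1 / n_squares                 # chance that a given point lands in a given square
lam = n_points / n_squares        # mean number of points per square

print("k  observed  binomial  poisson")
for k in range(6):
    expected_binomial = n_squares * binomial_pmf(k, n_points, p)
    expected_poisson = n_squares * poisson_pmf(k, lam)
    print(k, observed.get(k, 0), round(expected_binomial, 1), round(expected_poisson, 1))
```

    With many squares and a small per-square probability, the binomial and Poisson columns are close to each other, which is the approximation the abstract refers to.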

  9. True random numbers from amplified quantum vacuum.

    PubMed

    Jofre, M; Curty, M; Steinlechner, F; Anzolin, G; Torres, J P; Mitchell, M W; Pruneri, V

    2011-10-10

    Random numbers are essential for applications ranging from secure communications to numerical simulation and quantitative finance. Algorithms can rapidly produce pseudo-random outcomes, series of numbers that mimic most properties of true random numbers while quantum random number generators (QRNGs) exploit intrinsic quantum randomness to produce true random numbers. Single-photon QRNGs are conceptually simple but produce few random bits per detection. In contrast, vacuum fluctuations are a vast resource for QRNGs: they are broad-band and thus can encode many random bits per second. Direct recording of vacuum fluctuations is possible, but requires shot-noise-limited detectors, at the cost of bandwidth. We demonstrate efficient conversion of vacuum fluctuations to true random bits using optical amplification of vacuum and interferometry. Using commercially-available optical components we demonstrate a QRNG at a bit rate of 1.11 Gbps. The proposed scheme has the potential to be extended to 10 Gbps and even up to 100 Gbps by taking advantage of high speed modulation sources and detectors for optical fiber telecommunication devices.

  10. Impact of Sequencing Radiation Therapy and Chemotherapy on Long-Term Local Toxicity for Early Breast Cancer: Results of a Randomized Study at 15-Year Follow-Up

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Pinnarò, Paola; Giordano, Carolina; Farneti, Alessia

    Purpose: To compare long-term late local toxicity after either concomitant or sequential chemoradiation therapy after breast-conserving surgery. Methods and Materials: From 1997 to 2002, women aged 18 to 75 years who underwent breast-conserving surgery and axillary dissection for early breast cancer and in whom CMF (cyclophosphamide, methotrexate, and 5-fluorouracil) chemotherapy was planned were randomized between concomitant and sequential radiation therapy. Radiation therapy was delivered to the whole breast through tangential fields to 50 Gy in 20 fractions over a period of 4 weeks, followed by an electron boost. Surviving patients were tentatively contacted and examined between March and September 2014. Patients in whom progressive disease had developed or who had undergone further breast surgery were excluded. Local toxicity (fibrosis, telangiectasia, and breast atrophy or retraction) was scored blindly to the treatment received. A logistic regression was run to investigate the effect of treatment sequence after correction for several patient-, treatment-, and tumor-related covariates on selected endpoints. The median time to cross-sectional analysis was 15.7 years (range, 12.0-17.8 years). Results: Of 206 patients randomized, 154 (74.8%) were potentially eligible. Of these, 43 (27.9%) refused participation and 4 (2.6%) had been lost to follow-up, and for 5 (3.2%), we could not restore planning data; thus, the final number of analyzed patients was 102. No grade 4 toxicity had been observed, whereas the number of grade 3 toxicity events was low (<8%) for each item, allowing pooling of grade 2 and 3 events for further analysis. Treatment sequence (concomitant vs sequential) was an independent predictor of grade 2 or 3 fibrosis according to both the National Cancer Institute Common Terminology Criteria for Adverse Events (odds ratio [OR], 4.05; 95% confidence interval [CI], 1.34-12.2; P=.013) and the SOMA (Subjective, Objective, Management and Analytic) scale (OR, 3.75; 95% CI, 1.19-11.79; P=.018), as well as grade 2 or 3 breast atrophy or retraction (OR, 3.87; 95% CI, 1.42-10.56; P=.008). No effect on telangiectasia was detected. Conclusions: At long-term follow-up, concomitant chemoradiation therapy has a detrimental effect on both fibrosis and retraction with an approximately 4-fold increase in the odds of grade 2 or 3 toxicity.

  11. Theory on the mechanism of site-specific DNA-protein interactions in the presence of traps

    NASA Astrophysics Data System (ADS)

    Niranjani, G.; Murugan, R.

    2016-08-01

    The speed of site-specific binding of transcription factor (TF) proteins with genomic DNA seems to be strongly retarded by the randomly occurring sequence traps. Traps are those DNA sequences sharing significant similarity with the original specific binding sites (SBSs). It is an intriguing question how the naturally occurring TFs and their SBSs are designed to manage the retarding effects of such randomly occurring traps. We develop a simple random walk model on the site-specific binding of TFs with genomic DNA in the presence of sequence traps. Our dynamical model predicts that (a) the retarding effects of traps will be minimum when the traps are arranged around the SBS such that there is a negative correlation between the binding strength of TFs with traps and the distance of traps from the SBS and (b) the retarding effects of sequence traps can be appeased by the condensed conformational state of DNA. Our computational analysis results on the distribution of sequence traps around the putative binding sites of various TFs in mouse and human genomes clearly agree well with the theoretical predictions. We propose that the distribution of traps can be used as an additional metric to efficiently identify the SBSs of TFs on genomic DNA.

  12. On the synchronizability and detectability of random PPM sequences

    NASA Technical Reports Server (NTRS)

    Georghiades, Costas N.; Lin, Shu

    1987-01-01

    The problem of synchronization and detection of random pulse-position-modulation (PPM) sequences is investigated under the assumption of perfect slot synchronization. Maximum-likelihood PPM symbol synchronization and receiver algorithms are derived that make decisions based both on soft as well as hard data; these algorithms are seen to be easily implementable. Bounds derived on the symbol error probability as well as the probability of false synchronization indicate the existence of a rather severe performance floor, which can easily be the limiting factor in the overall system performance. The performance floor is inherent in the PPM format and random data and becomes more serious as the PPM alphabet size Q is increased. A way to eliminate the performance floor is suggested by inserting special PPM symbols in the random data stream.

  13. On the synchronizability and detectability of random PPM sequences

    NASA Technical Reports Server (NTRS)

    Georghiades, Costas N.

    1987-01-01

    The problem of synchronization and detection of random pulse-position-modulation (PPM) sequences is investigated under the assumption of perfect slot synchronization. Maximum likelihood PPM symbol synchronization and receiver algorithms are derived that make decisions based both on soft as well as hard data; these algorithms are seen to be easily implementable. Bounds were derived on the symbol error probability as well as the probability of false synchronization that indicate the existence of a rather severe performance floor, which can easily be the limiting factor in the overall system performance. The performance floor is inherent in the PPM format and random data and becomes more serious as the PPM alphabet size Q is increased. A way to eliminate the performance floor is suggested by inserting special PPM symbols in the random data stream.

  14. Random variability explains apparent global clustering of large earthquakes

    USGS Publications Warehouse

    Michael, A.J.

    2011-01-01

    The occurrence of 5 Mw ≥ 8.5 earthquakes since 2004 has created a debate over whether or not we are in a global cluster of large earthquakes, temporarily raising risks above long-term levels. I use three classes of statistical tests to determine if the record of M ≥ 7 earthquakes since 1900 can reject a null hypothesis of independent random events with a constant rate plus localized aftershock sequences. The data cannot reject this null hypothesis. Thus, the temporal distribution of large global earthquakes is well-described by a random process, plus localized aftershocks, and apparent clustering is due to random variability. Therefore the risk of future events has not increased, except within ongoing aftershock sequences, and should be estimated from the longest possible record of events.

  15. Implicit transfer of spatial structure in visuomotor sequence learning.

    PubMed

    Tanaka, Kanji; Watanabe, Katsumi

    2014-11-01

    Implicit learning and transfer in sequence learning are essential in daily life. Here, we investigated the implicit transfer of visuomotor sequences following a spatial transformation. In the two experiments, participants used trial and error to learn a sequence consisting of several button presses, known as the m×n task (Hikosaka et al., 1995). After this learning session, participants learned another sequence in which the button configuration was spatially transformed in one of the following ways: mirrored, rotated, and random arrangement. Our results showed that even when participants were unaware of the transformation rules, accuracy in the transfer session in the mirrored and rotated groups was higher than that in the random group (i.e., implicit transfer occurred). Both those who noticed the transformation rules and those who did not (i.e., explicit and implicit transfer instances, respectively) showed faster performance in the mirrored sequences than in the rotated sequences. Taken together, the present results suggest that people can use their implicit visuomotor knowledge to spatially transform sequences and that implicit transfers are modulated by a transformation cost, similar to that in explicit transfer. Copyright © 2014 Elsevier B.V. All rights reserved.

  16. Texture analysis of common renal masses in multiple MR sequences for prediction of pathology

    NASA Astrophysics Data System (ADS)

    Hoang, Uyen N.; Malayeri, Ashkan A.; Lay, Nathan S.; Summers, Ronald M.; Yao, Jianhua

    2017-03-01

    This pilot study performs texture analysis on multiple magnetic resonance (MR) images of common renal masses for differentiation of renal cell carcinoma (RCC). Bounding boxes are drawn around each mass on one axial slice in the T1 delayed sequence to use for feature extraction and classification. All sequences (T1 delayed, venous, arterial, pre-contrast phases, T2, and T2 fat saturated sequences) are co-registered and texture features are extracted from each sequence simultaneously. Random forest is used to construct models to classify lesions on 96 normal regions, 87 clear cell RCCs, 8 papillary RCCs, and 21 renal oncocytomas; ground truths are verified through pathology reports. The highest performance is seen in the random forest model when data from all sequences are used in conjunction, achieving an overall classification accuracy of 83.7%. When using data from one single sequence, the overall accuracies achieved for T1 delayed, venous, arterial, and pre-contrast phase, T2, and T2 fat saturated were 79.1%, 70.5%, 56.2%, 61.0%, 60.0%, and 44.8%, respectively. This demonstrates promising results of utilizing intensity information from multiple MR sequences for accurate classification of renal masses.

  17. Comparative effectiveness of next generation genomic sequencing for disease diagnosis: design of a randomized controlled trial in patients with colorectal cancer/polyposis syndromes.

    PubMed

    Gallego, Carlos J; Bennette, Caroline S; Heagerty, Patrick; Comstock, Bryan; Horike-Pyne, Martha; Hisama, Fuki; Amendola, Laura M; Bennett, Robin L; Dorschner, Michael O; Tarczy-Hornoch, Peter; Grady, William M; Fullerton, S Malia; Trinidad, Susan B; Regier, Dean A; Nickerson, Deborah A; Burke, Wylie; Patrick, Donald L; Jarvik, Gail P; Veenstra, David L

    2014-09-01

    Whole exome and whole genome sequencing are applications of next generation sequencing transforming clinical care, but there is little evidence whether these tests improve patient outcomes or if they are cost effective compared to current standard of care. These gaps in knowledge can be addressed by comparative effectiveness and patient-centered outcomes research. We designed a randomized controlled trial that incorporates these research methods to evaluate whole exome sequencing compared to usual care in patients being evaluated for hereditary colorectal cancer and polyposis syndromes. Approximately 220 patients will be randomized and followed for 12 months after return of genomic findings. Patients will receive findings associated with colorectal cancer in a first return of results visit, and findings not associated with colorectal cancer (incidental findings) during a second return of results visit. The primary outcome is efficacy to detect mutations associated with these syndromes; secondary outcomes include psychosocial impact, cost-effectiveness and comparative costs. The secondary outcomes will be obtained via surveys before and after each return visit. The expected challenges in conducting this randomized controlled trial include the relatively low prevalence of genetic disease, difficult interpretation of some genetic variants, and uncertainty about which incidental findings should be returned to patients. The approaches utilized in this study may help guide other investigators in clinical genomics to identify useful outcome measures and strategies to address comparative effectiveness questions about the clinical implementation of genomic sequencing in clinical care. Copyright © 2014 Elsevier Inc. All rights reserved.

  18. DNA based random key generation and management for OTP encryption.

    PubMed

    Zhang, Yunpeng; Liu, Xin; Sun, Manhui

    2017-09-01

    One-time pad (OTP) is a principle of key generation applied to the stream ciphering method which offers total privacy. The OTP encryption scheme has proved to be unbreakable in theory, but difficult to realize in practical applications. Because OTP encryption specially requires the absolute randomness of the key, its development has suffered from dense constraints. DNA cryptography is a new and promising technology in the field of information security. DNA chromosomes storing capabilities can be used as one-time pad structures with pseudo-random number generation and indexing in order to encrypt the plaintext messages. In this paper, we present a feasible solution to the OTP symmetric key generation and transmission problem with DNA at the molecular level. Through recombinant DNA technology, by using only sender-receiver known restriction enzymes to combine the secure key represented by DNA sequence and the T vector, we generate the DNA bio-hiding secure key and then place the recombinant plasmid in implanted bacteria for secure key transmission. The designed bio experiments and simulation results show that the security of the transmission of the key is further improved and the environmental requirements of key transmission are reduced. Analysis has demonstrated that the proposed DNA-based random key generation and management solutions are marked by high security and usability. Published by Elsevier B.V.

  19. Query construction, entropy, and generalization in neural-network models

    NASA Astrophysics Data System (ADS)

    Sollich, Peter

    1994-05-01

    We study query construction algorithms, which aim at improving the generalization ability of systems that learn from examples by choosing optimal, nonredundant training sets. We set up a general probabilistic framework for deriving such algorithms from the requirement of optimizing a suitable objective function; specifically, we consider the objective functions entropy (or information gain) and generalization error. For two learning scenarios, the high-low game and the linear perceptron, we evaluate the generalization performance obtained by applying the corresponding query construction algorithms and compare it to training on random examples. We find qualitative differences between the two scenarios due to the different structure of the underlying rules (nonlinear and ``noninvertible'' versus linear); in particular, for the linear perceptron, random examples lead to the same generalization ability as a sequence of queries in the limit of an infinite number of examples. We also investigate learning algorithms which are ill matched to the learning environment and find that, in this case, minimum entropy queries can in fact yield a lower generalization ability than random examples. Finally, we study the efficiency of single queries and its dependence on the learning history, i.e., on whether the previous training examples were generated randomly or by querying, and the difference between globally and locally optimal query construction.

  20. Random mutagenesis of the hyperthermophilic archaeon Pyrococcus furiosus using in vitro mariner transposition and natural transformation.

    PubMed

    Guschinskaya, Natalia; Brunel, Romain; Tourte, Maxime; Lipscomb, Gina L; Adams, Michael W W; Oger, Philippe; Charpentier, Xavier

    2016-11-08

    Transposition mutagenesis is a powerful tool to identify the function of genes, reveal essential genes and generally to unravel the genetic basis of living organisms. However, transposon-mediated mutagenesis has only been successfully applied to a limited number of archaeal species and has never been reported in Thermococcales. Here, we report random insertion mutagenesis in the hyperthermophilic archaeon Pyrococcus furiosus. The strategy takes advantage of the natural transformability of derivatives of the P. furiosus COM1 strain and of in vitro Mariner-based transposition. A transposon bearing a genetic marker is randomly transposed in vitro in genomic DNA that is then used for natural transformation of P. furiosus. A small-scale transposition reaction routinely generates several hundred and up to two thousand transformants. Southern analysis and sequencing showed that the obtained mutants contain a single and random genomic insertion. Polyploidy has been reported in Thermococcales and P. furiosus is suspected of being polyploid. Yet, about half of the mutants obtained on the first selection are homozygous for the transposon insertion. Two rounds of isolation on selective medium were sufficient to obtain gene conversion in initially heterozygous mutants. This transposition mutagenesis strategy will greatly facilitate functional exploration of the Thermococcales genomes.

  1. Dice and DNA

    ERIC Educational Resources Information Center

    Wernersson, Rasmus

    2007-01-01

    An important part of teaching students how to use the BLAST tool for searching large sequence databases is to train the students to think critically about the quality of the sequence hits found, both in terms of the statistical significance and how informative the individual hits are. This paper describes how generating truly random sequences by…

  2. A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network.

    PubMed

    Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso

    2015-07-01

    In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification task is performed with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach the accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches the accuracy of 20.9%. Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.
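
    The k-mer counts that underlie such a representation can be illustrated briefly. The sketch below shows only generic overlapping k-mer counting; it is not the authors' spectral representation, signature selection, or neural gas clustering, and the function names and toy sequences are assumptions made for the example.

```python
from collections import Counter


def kmer_spectrum(sequence, k):
    """Count every overlapping k-mer (word of length k) in a DNA sequence."""
    seq = sequence.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shared_kmers(spectrum_a, spectrum_b):
    """Return the set of k-mers that occur in both spectra."""
    return set(spectrum_a) & set(spectrum_b)


# Toy sequences, invented purely for illustration.
spectrum = kmer_spectrum("ACGTACGTGA", 3)
other = kmer_spectrum("TTACGTT", 3)

print(spectrum.most_common(2))
print(sorted(shared_kmers(spectrum, other)))
```

    A sequence of length n yields n - k + 1 overlapping k-mers, so short fragments such as the 200-bp and 300-bp pieces mentioned in the abstract still produce a usable count vector.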

  3. Individual Differences Methods for Randomized Experiments

    ERIC Educational Resources Information Center

    Tucker-Drob, Elliot M.

    2011-01-01

    Experiments allow researchers to randomly vary the key manipulation, the instruments of measurement, and the sequences of the measurements and manipulations across participants. To date, however, the advantages of randomized experiments to manipulate both the aspects of interest and the aspects that threaten internal validity have been primarily…

  4. Application of a case-control study design to investigate genotypic signatures of HIV-1 transmission.

    PubMed

    Mota, Talia M; Murray, John M; Center, Rob J; Purcell, Damian F J; McCaw, James M

    2012-06-25

    The characterization of HIV-1 transmission strains may inform the design of an effective vaccine. Shorter variable loops with fewer predicted glycosites have been suggested as signatures enriched in envelope sequences derived during acute HIV-1 infection. Specifically, a transmission-linked lack of glycosites within the V1 and V2 loops of gp120 provides greater access to an α4β7 binding motif, which promotes the establishment of infection. Also, a histidine at position 12 in the leader sequence of Env has been described as a transmission signature that is selected against during chronic infection. The purpose of this study is to measure the association of the presence of an α4β7 binding motif, the number of N-linked glycosites, the length of the variable loops, and the prevalence of histidine at position 12 with HIV-1 transmission. A case-control study design was used to measure the prevalence of these variables between subtype B and C transmission sequences and frequency-matched randomly-selected sequences derived from chronically infected controls. Subtype B transmission strains had shorter V3 regions than chronic strains (p = 0.031); subtype C transmission strains had shorter V1 loops than chronic strains (p = 0.047); subtype B transmission strains had more V3 loop glycosites (p = 0.024) than chronic strains. Further investigation showed that these statistically significant results were unlikely to be biologically meaningful. Also, there was no difference observed in the prevalence of a histidine at position 12 among transmission strains and controls of either subtype. Although a genetic bottleneck is observed after HIV-1 transmission, our results indicate that summary characteristics of Env hypothesised to be important in transmission are not divergent between transmission and chronic strains of either subtype. The success of a transmission strain to initiate infection may be a random event from the divergent pool of donor viral sequences. 
The characteristics explored through this study are important, but may not function as genotypic signatures of transmission as previously described.

  5. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    PubMed Central

    Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

    2009-01-01

    The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
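
    The label-renaming step described in this abstract can be sketched as follows. This is a hedged illustration of the general idea, not the REFGEN or TREENAMER code: the header layout ("accession genus species ..."), the placeholder accession strings, and both function names are assumptions.

```python
def rename_fasta_labels(fasta_text, make_label):
    """Return FASTA text with each header replaced by make_label(old_header)."""
    renamed = []
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            renamed.append(">" + make_label(line[1:].strip()))
        else:
            renamed.append(line)  # sequence lines are left untouched
    return "\n".join(renamed)


def species_accession_label(header):
    """Build 'Genus_species_accession' from a header.

    Assumes a hypothetical layout: '<accession> <Genus> <species> ...'.
    """
    parts = header.split()
    accession, genus, species = parts[0], parts[1], parts[2]
    return f"{genus}_{species}_{accession}"


# Placeholder records; the accession strings are not real database entries.
example = (
    ">ACC0001 Homo sapiens example gene\n"
    "ACGT\n"
    ">ACC0002 Mus musculus example gene\n"
    "GGCC"
)

result = rename_fasta_labels(example, species_accession_label)
print(result)
```

    Passing the labelling rule in as a function mirrors the "user-defined combination" mentioned in the abstract: a different rule can be swapped in without touching the file-handling loop.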

  6. Automatic trajectory measurement of large numbers of crowded objects

    NASA Astrophysics Data System (ADS)

    Li, Hui; Liu, Ye; Chen, Yan Qiu

    2013-06-01

    Complex motion patterns of natural systems, such as fish schools, bird flocks, and cell groups, have attracted great attention from scientists for years. Trajectory measurement of individuals is vital for quantitative and high-throughput study of their collective behaviors. However, such data are rare mainly due to the challenges of detection and tracking of large numbers of objects with similar visual features and frequent occlusions. We present an automatic and effective framework to measure trajectories of large numbers of crowded oval-shaped objects, such as fish and cells. We first use a novel dual ellipse locator to detect the coarse position of each individual and then propose a variance minimization active contour method to obtain the optimal segmentation results. For tracking, cost matrix of assignment between consecutive frames is trainable via a random forest classifier with many spatial, texture, and shape features. The optimal trajectories are found for the whole image sequence by solving two linear assignment problems. We evaluate the proposed method on many challenging data sets.

  7. Rational evolutionary design: the theory of in vitro protein evolution.

    PubMed

    Voigt, C A; Kauffman, S; Wang, Z G

    2000-01-01

    Directed evolution uses a combination of powerful search techniques to generate proteins with improved properties. Part of the success is due to the stochastic element of random mutagenesis; improvements can be made without a detailed description of the complex interactions that constitute function or stability. However, optimization is not a conglomeration of random processes. Rather, it requires both knowledge of the system that is being optimized and a logical series of techniques that best explores the pathways of evolution (Eigen et al., 1988). The weighing of parameters associated with mutation, recombination, and screening to achieve the maximum fitness improvement is the beginning of rational evolutionary design. The optimal mutation rate is strongly influenced by the finite number of mutants that can be screened. A smooth fitness landscape implies that many mutations can be accumulated without disrupting the fitness. This has the effect of lowering the required library size to sample a higher mutation rate. As the sequence ascends the fitness landscape, the optimal mutation rate decreases as the probability of discovering improved mutations also decreases. Highly coupled regions require that many mutations be simultaneously made to generate a positive mutant. Therefore, positive mutations are discovered at uncoupled positions as the fitness of the parent increases. The benefit of recombination is twofold: it combines good mutations and searches more sequence space in a meaningful way. Recombination is most beneficial when the number of mutants that can be screened is limited and the landscape is of an intermediate ruggedness. The structure of schema in proteins leads to the conclusion that many cut points are required. The number of parents and their sequence identity are determined by the balance between exploration and exploitation. Many disparate parents can explore more space, but at the risk of losing information. 
The required screening effort is related to the number of uphill paths, which decreases more rapidly for rugged landscapes. Noise in the fitness measurements causes a dramatic increase in the required mutant library size, thus implying a smaller optimal mutation rate. Because of strict limitations on the number of mutants that can be screened, there is motivation to optimize the content of the mutant library. By restricting mutations to regions of the gene that are expected to show improvement, a greater return can be made with the same number of mutants. Initial studies with subtilisin E have shown that structurally tolerant positions tend to be where positive activity mutants are made during directed evolution. Mutant fitness information is produced by the screening step that has the potential to provide insight into the structure of the fitness landscape, thus aiding the setting of experimental parameters. By analyzing the mutant fitness distribution and targeting specific regions of the sequence, in vitro evolution can be accelerated. However, when expediting the search, there is a trade-off between rapid improvement and the quality of the long-term solution. The benefit of neutrality has yet to be captured with in vitro protein evolution. Neutral theory predicts the punctuated emergence of novel structure and function, however, with current methods, the required time scale is not feasible. Utilizing neutral evolution to accelerate the discovery of new functional and structural solutions requires a theory that predicts the behavior of mutational pathways between networks. Because the transition from neutral to adaptive evolution requires a multi-mutational switch, increasing the mutation rate decreases the time required for a punctuated change to occur. By limiting the search to the less coupled region of the sequence (smooth portion of the fitness landscape), the required larger mutation rate can be tolerated. 
Advances in directed evolution will be achieved when the driving forces behind such proce

  8. Using Computer-Generated Random Numbers to Calculate the Lifetime of a Comet.

    ERIC Educational Resources Information Center

    Danesh, Iraj

    1991-01-01

    An educational technique to calculate the lifetime of a comet using software-generated random numbers is introduced to undergraduate physics and astronomy students. Discussed are the generation and eligibility of the required random numbers, background literature related to the problem, and the solution to the problem using random numbers.…

  9. SHARAKU: an algorithm for aligning and clustering read mapping profiles of deep sequencing in non-coding RNA processing.

    PubMed

    Tsuchiya, Mariko; Amano, Kojiro; Abe, Masaya; Seki, Misato; Hase, Sumitaka; Sato, Kengo; Sakakibara, Yasubumi

    2016-06-15

    Deep sequencing of the transcripts of regulatory non-coding RNA generates footprints of post-transcriptional processes. After obtaining sequence reads, the short reads are mapped to a reference genome, and specific mapping patterns can be detected called read mapping profiles, which are distinct from random non-functional degradation patterns. These patterns reflect the maturation processes that lead to the production of shorter RNA sequences. Recent next-generation sequencing studies have revealed not only the typical maturation process of miRNAs but also the various processing mechanisms of small RNAs derived from tRNAs and snoRNAs. We developed an algorithm termed SHARAKU to align two read mapping profiles of next-generation sequencing outputs for non-coding RNAs. In contrast with previous work, SHARAKU incorporates the primary and secondary sequence structures into an alignment of read mapping profiles to allow for the detection of common processing patterns. Using a benchmark simulated dataset, SHARAKU exhibited superior performance to previous methods for correctly clustering the read mapping profiles with respect to 5'-end processing and 3'-end processing from degradation patterns and in detecting similar processing patterns in deriving the shorter RNAs. Further, using experimental data of small RNA sequencing for the common marmoset brain, SHARAKU succeeded in identifying the significant clusters of read mapping profiles for similar processing patterns of small derived RNA families expressed in the brain. The source code of our program SHARAKU is available at http://www.dna.bio.keio.ac.jp/sharaku/, and the simulated dataset used in this work is available at the same link. Accession code: The sequence data from the whole RNA transcripts in the hippocampus of the left brain used in this work is available from the DNA DataBank of Japan (DDBJ) Sequence Read Archive (DRA) under the accession number DRA004502. 
Contact: yasu@bio.keio.ac.jp. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  10. Viral Linkage in HIV-1 Seroconverters and Their Partners in an HIV-1 Prevention Clinical Trial

    PubMed Central

    Campbell, Mary S.; Mullins, James I.; Hughes, James P.; Celum, Connie; Wong, Kim G.; Raugi, Dana N.; Sorensen, Stefanie; Stoddard, Julia N.; Zhao, Hong; Deng, Wenjie; Kahle, Erin; Panteleeff, Dana; Baeten, Jared M.; McCutchan, Francine E.; Albert, Jan; Leitner, Thomas; Wald, Anna; Corey, Lawrence; Lingappa, Jairam R.

    2011-01-01

    Background: Characterization of viruses in HIV-1 transmission pairs will help identify biological determinants of infectiousness and evaluate candidate interventions to reduce transmission. Although HIV-1 sequencing is frequently used to substantiate linkage between newly HIV-1 infected individuals and their sexual partners in epidemiologic and forensic studies, viral sequencing is seldom applied in HIV-1 prevention trials. The Partners in Prevention HSV/HIV Transmission Study (ClinicalTrials.gov #NCT00194519) was a prospective randomized placebo-controlled trial that enrolled serodiscordant heterosexual couples to determine the efficacy of genital herpes suppression in reducing HIV-1 transmission; as part of the study analysis, HIV-1 sequences were examined for genetic linkage between seroconverters and their enrolled partners. Methodology/Principal Findings: We obtained partial consensus HIV-1 env and gag sequences from blood plasma for 151 transmission pairs and performed deep sequencing of env in some cases. We analyzed sequences with phylogenetic techniques and developed a Bayesian algorithm to evaluate the probability of linkage. For linkage, we required monophyletic clustering between enrolled partners' sequences and a Bayesian posterior probability of ≥50%. Adjudicators classified each seroconversion, finding 108 (71.5%) linked, 40 (26.5%) unlinked, and 3 (2.0%) indeterminate transmissions, with linkage determined by consensus env sequencing in 91 (84%). Male seroconverters had a higher frequency of unlinked transmissions than female seroconverters. The likelihood of transmission from the enrolled partner was related to time on study, with increasing numbers of unlinked transmissions occurring after longer observation periods. Finally, baseline viral load was found to be significantly higher among linked transmitters. 
Conclusions/Significance: In this first use of HIV-1 sequencing to establish endpoints in a large clinical trial, more than one-fourth of transmissions were unlinked to the enrolled partner, illustrating the relevance of these methods in the design of future HIV-1 prevention trials in serodiscordant couples. A hierarchy of sequencing techniques, analysis methods, and expert adjudication contributed to the linkage determination process. PMID:21399681

  11. HIT'nDRIVE: patient-specific multidriver gene prioritization for precision oncology

    PubMed Central

    Hodzic, Ermin; Sauerwald, Thomas; Dao, Phuong; Wang, Kendric; Yeung, Jake; Anderson, Shawn; Vandin, Fabio; Haffari, Gholamreza; Collins, Colin C.; Sahinalp, S. Cenk

    2017-01-01

    Prioritizing molecular alterations that act as drivers of cancer remains a crucial bottleneck in therapeutic development. Here we introduce HIT'nDRIVE, a computational method that integrates genomic and transcriptomic data to identify a set of patient-specific, sequence-altered genes, with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the “random walk facility location” (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by its use of an alternative distance measure: “multihitting time,” the expected length of the shortest random walk from any one of the set of sequence-altered genes to an expression-altered target gene. When applied to 2200 tumors from four major cancer types, HIT'nDRIVE revealed many potentially clinically actionable driver genes. We also demonstrated that it is possible to perform accurate phenotype prediction for tumor samples by only using HIT'nDRIVE-seeded driver gene modules from gene interaction networks. In addition, we identified a number of breast cancer subtype-specific driver modules that are associated with patients’ survival outcome. Furthermore, HIT'nDRIVE, when applied to a large panel of pan-cancer cell lines, accurately predicted drug efficacy using the driver genes and their seeded gene modules. Overall, HIT'nDRIVE may help clinicians contextualize massive multiomics data in therapeutic decision making, enabling widespread implementation of precision oncology. PMID:28768687
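
    The "multihitting time" described above can be approximated by simulation. The Python sketch below is one possible reading of that definition, not the HIT'nDRIVE implementation: a walker starts at every source node, all walkers step in lockstep, and a trial ends when the first walker reaches the target. The function names, the adjacency-list graph format, and the toy graph are hypothetical.

```python
import random


def multi_hitting_steps(graph, sources, target, rng, max_steps=10_000):
    """Run one trial and return the steps taken until any walker hits target.

    graph maps each node to a non-empty list of neighbours. The trial is
    truncated at max_steps so that an unreachable target cannot loop forever.
    """
    walkers = list(sources)
    steps = 0
    while target not in walkers and steps < max_steps:
        # Every walker moves to a uniformly chosen neighbour.
        walkers = [rng.choice(graph[node]) for node in walkers]
        steps += 1
    return steps


def estimate_multi_hitting_time(graph, sources, target, trials=1000, seed=0):
    """Average the per-trial step counts to estimate the expected value."""
    rng = random.Random(seed)
    total = sum(
        multi_hitting_steps(graph, sources, target, rng) for _ in range(trials)
    )
    return total / trials


# Toy undirected network (hypothetical gene labels g1..g4 and target t).
toy_graph = {
    "g1": ["g2", "g3"],
    "g2": ["g1", "t"],
    "g3": ["g1", "g4"],
    "g4": ["g3", "t"],
    "t": ["g2", "g4"],
}
print(estimate_multi_hitting_time(toy_graph, ["g1", "g4"], "t"))
```

    Adding a source that lies closer to the target can only shorten the estimate, which is the property that makes this distance useful for choosing a small set of influential genes.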

  12. Pseudo-random number generator for the Sigma 5 computer

    NASA Technical Reports Server (NTRS)

    Carroll, S. N.

    1983-01-01

    A technique is presented for developing a pseudo-random number generator based on the linear congruential form. The two numbers used for the generator are a prime number and a corresponding primitive root, where the prime is the largest prime number that can be accurately represented on a particular computer. The primitive root is selected by applying Marsaglia's lattice test. The technique presented was applied to write a random number program for the Sigma 5 computer. The new program, named S:RANDOM1, is judged to be superior to the older program named S:RANDOM. For applications requiring several independent random number generators, a table is included showing several acceptable primitive roots. The technique and programs described can be applied to any computer having word length different from that of the Sigma 5.
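
    As a rough illustration of the linear congruential form described above, the Python sketch below implements a multiplicative generator x_{n+1} = a * x_n mod p with a prime modulus p and a primitive root a. The constants p = 2^31 - 1 and a = 16807 are the widely used Park-Miller "minimal standard" values, chosen here as an assumption for illustration; they are not the Sigma 5 constants from the report, the function names are hypothetical, and the lattice test used in the report to rank candidate roots is not reproduced.

```python
from itertools import islice


def prime_factors(n):
    """Return the set of distinct prime factors of n by trial division."""
    factors, d = set(), 2
    while d * d <= n:
        while n % d == 0:
            factors.add(d)
            n //= d
        d += 1
    if n > 1:
        factors.add(n)
    return factors


def is_primitive_root(a, p):
    """True if a generates the full multiplicative group modulo the prime p."""
    if a % p == 0:
        return False
    # a is a primitive root iff a^((p-1)/q) != 1 for every prime q dividing p-1.
    return all(pow(a, (p - 1) // q, p) != 1 for q in prime_factors(p - 1))


def lehmer(seed, a=16807, p=2**31 - 1):
    """Yield x_{n+1} = a * x_n mod p forever; seed must satisfy 0 < seed < p."""
    x = seed
    while True:
        x = (a * x) % p
        yield x


print(is_primitive_root(16807, 2**31 - 1))
print(list(islice(lehmer(seed=1), 3)))
```

    Because a is a primitive root of the prime p, the sequence visits every value from 1 to p - 1 before repeating, which is why this pairing of constants is attractive.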

  13. Assessment of antibody library diversity through next generation sequencing and technical error compensation

    PubMed Central

    Lisi, Simonetta; Chirichella, Michele; Arisi, Ivan; Goracci, Martina; Cremisi, Federico; Cattaneo, Antonino

    2017-01-01

    Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody against a given antigen, of sufficiently high affinity. Quantitative evaluation of antibody library complexity and quality has been for a long time inadequately addressed, due to the high similarity and length of the sequences of the library. Complexity was usually inferred by the transformation efficiency and tested either by fingerprinting and/or sequencing of a few hundred random library elements. Inferring complexity from such a small sampling is, however, very rudimental and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the antibody library complexity quality assessment. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable antibody library complexity estimate here we show a new, PCR-free, NGS approach to sequence antibody libraries on Illumina platform, coupled to a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that allows to reliably estimate the complexity, taking in consideration the sequencing error. PMID:28505201
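
    The remark that complexity does not scale linearly with sample size can be illustrated numerically. The Python sketch below assumes a hypothetical library of equally abundant clones sampled with replacement, which is a simplification and not the DEAL method; the function names are illustrative only.

```python
import random


def expected_distinct(library_size, sample_size):
    """Expected number of distinct clones seen when sampling with replacement
    from a library of equally abundant clones."""
    return library_size * (1.0 - (1.0 - 1.0 / library_size) ** sample_size)


def observed_distinct(library_size, sample_size, rng):
    """Simulate one sampling experiment and count the distinct clones drawn."""
    return len({rng.randrange(library_size) for _ in range(sample_size)})


rng = random.Random(0)
for library_size in (10**4, 10**6, 10**8):
    # A few hundred reads look almost identical for very different libraries.
    print(
        library_size,
        observed_distinct(library_size, 300, rng),
        round(expected_distinct(library_size, 300), 2),
    )
```

    Under these assumptions, a sample of a few hundred clones yields nearly all-unique sequences whether the library holds a million or a hundred million members, so the distinct count alone cannot separate the two cases.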

  14. Complete chloroplast genome sequence and comparative analysis of loblolly pine (Pinus taeda L.) with related species

    PubMed Central

    Khan, Abdul Latif; Khan, Muhammad Aaqil; Shahzad, Raheem; Lubna; Kang, Sang Mo; Al-Harrasi, Ahmed; Al-Rawahi, Ahmed; Lee, In-Jung

    2018-01-01

    Pinaceae, the largest family of conifers, has a diversified organization of chloroplast (cp) genomes with two typical highly reduced inverted repeats (IRs). In the current study, we determined the complete sequence of the cp genome of an economically and ecologically important conifer tree, the loblolly pine (Pinus taeda L.), using Illumina paired-end sequencing and compared the sequence with those of other pine species. The results revealed a genome size of 121,531 base pairs (bp) containing a pair of 830-bp IR regions, distinguished by a small single copy (42,258 bp) and large single copy (77,614 bp) region. The chloroplast genome of P. taeda encodes 120 genes, comprising 81 protein-coding genes, four ribosomal RNA genes, and 35 tRNA genes, with 151 randomly distributed microsatellites. Approximately 6 palindromic, 34 forward, and 22 tandem repeats were found in the P. taeda cp genome. Whole cp genome comparison with those of other Pinus species exhibited an overall high degree of sequence similarity, with some divergence in intergenic spacers. Higher and lower numbers of indels and single-nucleotide polymorphism substitutions were observed relative to P. contorta and P. monophylla, respectively. Phylogenomic analyses based on the complete genome sequence revealed that 60 shared genes generated trees with the same topologies, and P. taeda was closely related to P. contorta in the subgenus Pinus. Thus, the complete P. taeda genome provided valuable resources for population and evolutionary studies of gymnosperms and can be used to identify related species. PMID:29596414

  15. Assessment of antibody library diversity through next generation sequencing and technical error compensation.

    PubMed

    Fantini, Marco; Pandolfini, Luca; Lisi, Simonetta; Chirichella, Michele; Arisi, Ivan; Terrigno, Marco; Goracci, Martina; Cremisi, Federico; Cattaneo, Antonino

    2017-01-01

    Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody against a given antigen, of sufficiently high affinity. Quantitative evaluation of antibody library complexity and quality has been for a long time inadequately addressed, due to the high similarity and length of the sequences of the library. Complexity was usually inferred by the transformation efficiency and tested either by fingerprinting and/or sequencing of a few hundred random library elements. Inferring complexity from such a small sampling is, however, very rudimental and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the antibody library complexity quality assessment. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable antibody library complexity estimate here we show a new, PCR-free, NGS approach to sequence antibody libraries on Illumina platform, coupled to a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that allows to reliably estimate the complexity, taking in consideration the sequencing error.

  16. A semi-supervised learning approach for RNA secondary structure prediction.

    PubMed

    Yonemoto, Haruka; Asai, Kiyoshi; Hamada, Michiaki

    2015-08-01

    RNA secondary structure prediction is a key technology in RNA bioinformatics. Most algorithms for RNA secondary structure prediction use probabilistic models, in which the model parameters are trained with reliable RNA secondary structures. Because of the difficulty of determining RNA secondary structures by experimental procedures, such as NMR or X-ray crystal structural analyses, there are still many RNA sequences that could be useful for training whose secondary structures have not been experimentally determined. In this paper, we introduce a novel semi-supervised learning approach for training parameters in a probabilistic model of RNA secondary structures in which we employ not only RNA sequences with annotated secondary structures but also ones with unknown secondary structures. Our model is based on a hybrid of generative (stochastic context-free grammars) and discriminative models (conditional random fields) that has been successfully applied to natural language processing. Computational experiments indicate that the accuracy of secondary structure prediction is improved by incorporating RNA sequences with unknown secondary structures into training. To our knowledge, this is the first study of a semi-supervised learning approach for RNA secondary structure prediction. This technique will be useful when the number of reliable structures is limited. Copyright © 2015 Elsevier Ltd. All rights reserved.

  17. Multiplex single-molecule interaction profiling of DNA-barcoded proteins.

    PubMed

    Gu, Liangcai; Li, Chao; Aach, John; Hill, David E; Vidal, Marc; Church, George M

    2014-11-27

    In contrast with advances in massively parallel DNA sequencing, high-throughput protein analyses are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule protein detection using optical methods is limited by the number of spectrally non-overlapping chromophores. Here we introduce a single-molecular-interaction sequencing (SMI-seq) technology for parallel protein interaction profiling leveraging single-molecule advantages. DNA barcodes are attached to proteins collectively via ribosome display or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide thin film to construct a random single-molecule array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies) and analysed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimetre. Furthermore, protein interactions can be measured on the basis of the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor and antibody-binding profiling, are demonstrated. SMI-seq enables 'library versus library' screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity.

  18. Multiplex single-molecule interaction profiling of DNA barcoded proteins

    PubMed Central

    Gu, Liangcai; Li, Chao; Aach, John; Hill, David E.; Vidal, Marc; Church, George M.

    2014-01-01

    In contrast with advances in massively parallel DNA sequencing, high-throughput protein analyses are often limited by ensemble measurements, individual analyte purification and hence compromised quality and cost-effectiveness. Single-molecule (SM) protein detection achieved using optical methods is limited by the number of spectrally nonoverlapping chromophores. Here, we introduce a single molecular interaction-sequencing (SMI-Seq) technology for parallel protein interaction profiling leveraging SM advantages. DNA barcodes are attached to proteins collectively via ribosome display or individually via enzymatic conjugation. Barcoded proteins are assayed en masse in aqueous solution and subsequently immobilized in a polyacrylamide (PAA) thin film to construct a random SM array, where barcoding DNAs are amplified into in situ polymerase colonies (polonies) and analyzed by DNA sequencing. This method allows precise quantification of various proteins with a theoretical maximum array density of over one million polonies per square millimeter. Furthermore, protein interactions can be measured based on the statistics of colocalized polonies arising from barcoding DNAs of interacting proteins. Two demanding applications, G-protein coupled receptor (GPCR) and antibody binding profiling, were demonstrated. SMI-Seq enables “library vs. library” screening in a one-pot assay, simultaneously interrogating molecular binding affinity and specificity. PMID:25252978

  19. Test equality in binary data for a 4 × 4 crossover trial under a Latin-square design.

    PubMed

    Lui, Kung-Jong; Chang, Kuang-Chao

    2016-10-15

    When there are four or more treatments under comparison, the use of a crossover design with a complete set of treatment-receipt sequences in binary data is of limited use because of too many treatment-receipt sequences. Thus, we may consider use of a 4 × 4 Latin square to reduce the number of treatment-receipt sequences when comparing three experimental treatments with a control treatment. Under a distribution-free random effects logistic regression model, we develop simple procedures for testing non-equality between any of the three experimental treatments and the control treatment in a crossover trial with dichotomous responses. We further derive interval estimators in closed forms for the relative effect between treatments. To evaluate the performance of these test procedures and interval estimators, we employ Monte Carlo simulation. We use the data taken from a crossover trial using a 4 × 4 Latin-square design for studying four treatments to illustrate the use of test procedures and interval estimators developed here. Copyright © 2016 John Wiley & Sons, Ltd.
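
    To make the reduction in treatment-receipt sequences concrete, the Python sketch below builds a cyclic 4 × 4 Latin square and compares it with the complete set of orderings. The treatment labels and function names are hypothetical, and the cyclic construction is only one of several valid Latin squares; the abstract does not say which square the authors used.

```python
from itertools import permutations


def cyclic_latin_square(treatments):
    """Row r is the treatment list rotated left by r positions.

    Each row is one treatment-receipt sequence; each column is one period.
    """
    k = len(treatments)
    return [
        [treatments[(row + period) % k] for period in range(k)]
        for row in range(k)
    ]


def is_latin_square(square):
    """Check that every symbol appears exactly once in each row and column."""
    symbols = set(square[0])
    rows_ok = all(
        set(row) == symbols and len(row) == len(symbols) for row in square
    )
    cols_ok = all(set(col) == symbols for col in zip(*square))
    return rows_ok and cols_ok


treatments = ["control", "exp1", "exp2", "exp3"]  # hypothetical labels
square = cyclic_latin_square(treatments)
for sequence in square:
    print(" -> ".join(sequence))

# A complete crossover design would need every ordering of the treatments.
print(len(list(permutations(treatments))), "complete sequences versus", len(square))
```

    With four treatments the complete design needs 4! = 24 sequences, whereas the Latin square uses 4 while still giving every treatment once in every period.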

  20. A note on the efficiencies of sampling strategies in two-stage Bayesian regional fine mapping of a quantitative trait.

    PubMed

    Chen, Zhijian; Craiu, Radu V; Bull, Shelley B

    2014-11-01

    In focused studies designed to follow up associations detected in a genome-wide association study (GWAS), investigators can proceed to fine-map a genomic region by targeted sequencing or dense genotyping of all variants in the region, aiming to identify a functional sequence variant. For the analysis of a quantitative trait, we consider a Bayesian approach to fine-mapping study design that incorporates stratification according to a promising GWAS tag SNP in the same region. Improved cost-efficiency can be achieved when the fine-mapping phase incorporates a two-stage design, with identification of a smaller set of more promising variants in a subsample taken in stage 1, followed by their evaluation in an independent stage 2 subsample. To avoid the potential negative impact of genetic model misspecification on inference we incorporate genetic model selection based on posterior probabilities for each competing model. Our simulation study shows that, compared to simple random sampling that ignores genetic information from GWAS, tag-SNP-based stratified sample allocation methods reduce the number of variants continuing to stage 2 and are more likely to promote the functional sequence variant into confirmation studies. © 2014 WILEY PERIODICALS, INC.

  1. Random Item Generation Is Affected by Age

    ERIC Educational Resources Information Center

    Multani, Namita; Rudzicz, Frank; Wong, Wing Yiu Stephanie; Namasivayam, Aravind Kumar; van Lieshout, Pascal

    2016-01-01

    Purpose: Random item generation (RIG) involves central executive functioning. Measuring aspects of random sequences can therefore provide a simple method to complement other tools for cognitive assessment. We examine the extent to which RIG relates to specific measures of cognitive function, and whether those measures can be estimated using RIG…

  2. Reduced rDNA Copy Number Does Not Affect “Competitive” Chromosome Pairing in XYY Males of Drosophila melanogaster

    PubMed Central

    Maggert, Keith A.

    2014-01-01

    The ribosomal DNA (rDNA) arrays are causal agents in X-Y chromosome pairing in meiosis I of Drosophila males. Despite broad variation in X-linked and Y-linked rDNA copy number, polymorphisms in regulatory/spacer sequences between rRNA genes, and variance in copy number of interrupting R1 and R2 retrotransposable elements, there is little evidence that different rDNA arrays affect pairing efficacy. I investigated whether induced rDNA copy number polymorphisms affect chromosome pairing in a “competitive” situation in which complex pairing configurations were possible using males with XYY constitution. Using a common normal X chromosome, one of two different full-length Y chromosomes, and a third chromosome from a series of otherwise-isogenic rDNA deletions, I detected no differences in X-Y or Y-Y pairing or chromosome segregation frequencies that could not be attributed to random variation alone. This work was performed in the context of an undergraduate teaching program at Texas A&M University, and I discuss the pedagogical utility of this and other such experiments. PMID:24449686

  3. Transposon facilitated DNA sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berg, D.E.; Berg, C.M.; Huang, H.V.

    1990-01-01

    The purpose of this research is to investigate and develop methods that exploit the power of bacterial transposable elements for large scale DNA sequencing. Our premise is that the use of transposons to put primer binding sites randomly in target DNAs should provide access to all portions of large DNA fragments, without the inefficiencies of methods involving random subcloning and attendant repetitive sequencing, or of sequential synthesis of many oligonucleotide primers that are used to march systematically along a DNA molecule. Two unrelated bacterial transposons, Tn5 and γδ, are being used because they have both proven useful for molecular analyses, and because they differ sufficiently in mechanism and specificity of transposition to merit parallel development.

  4. Automatic generation of randomized trial sequences for priming experiments.

    PubMed

    Ihrke, Matthias; Behrendt, Jörg

    2011-01-01

    In most psychological experiments, a randomized presentation of successive displays is crucial for the validity of the results. For some paradigms, such as priming paradigms, this is not a trivial issue because trials are interdependent. We present software that automatically generates optimized trial sequences for (negative-) priming experiments. Our implementation is based on an optimization heuristic known as genetic algorithms that allows for an intuitive interpretation due to its similarity to natural evolution. The program features a graphical user interface that allows the user to generate trial sequences and to interactively improve them. The software is based on freely available software and is released under the GNU General Public License.

  5. Markers and mapping revisited: finding your gene.

    PubMed

    Jones, Neil; Ougham, Helen; Thomas, Howard; Pasakinskiene, Izolda

    2009-01-01

    This paper is an update of our earlier review (Jones et al., 1997, Markers and mapping: we are all geneticists now. New Phytologist 137: 165-177), which dealt with the genetics of mapping, in terms of recombination as the basis of the procedure, and covered some of the first generation of markers, including restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNA (RAPDs), simple sequence repeats (SSRs) and quantitative trait loci (QTLs). In the intervening decade there have been numerous developments in marker science with many new systems becoming available, which are herein described: cleavage amplification polymorphism (CAP), sequence-specific amplification polymorphism (S-SAP), inter-simple sequence repeat (ISSR), sequence tagged site (STS), sequence characterized amplification region (SCAR), selective amplification of microsatellite polymorphic loci (SAMPL), single nucleotide polymorphism (SNP), expressed sequence tag (EST), sequence-related amplified polymorphism (SRAP), target region amplification polymorphism (TRAP), microarrays, diversity arrays technology (DArT), single-strand conformation polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE) and methylation-sensitive PCR. In addition there has been an explosion of knowledge and databases in the area of genomics and bioinformatics. The number of flowering plant ESTs is c. 19 million and counting, with all the opportunity that this provides for gene-hunting, while the survey of bioinformatics and computer resources points to a rapid growth point for future activities in unravelling and applying the burst of new information on plant genomes. A case study is presented on tracking down a specific gene (stay-green (SGR), a post-transcriptional senescence regulator) using the full suite of mapping tools and comparative mapping resources. 
We end with a brief speculation on how genome analysis may progress into the future of this highly dynamic arena of plant science.

  6. Development of Pineapple Microsatellite Markers and Germplasm Genetic Diversity Analysis

    PubMed Central

    Tong, Helin; Chen, You; Wang, Jingyi; Chen, Yeyuan; Sun, Guangming; He, Junhu; Wu, Yaoting

    2013-01-01

    Two methods were used to develop pineapple microsatellite markers. Genomic library-based SSR development: using selectively amplified microsatellite assay, 86 sequences were generated from pineapple genomic library. 91 (96.8%) of the 94 Simple Sequence Repeat (SSR) loci were dinucleotide repeats (39 AC/GT repeats and 52 GA/TC repeats, accounting for 42.9% and 57.1%, resp.), and the other three were mononucleotide repeats. Thirty-six pairs of SSR primers were designed; 24 of them generated clear bands of expected sizes, and 13 of them showed polymorphism. EST-based SSR development: 5659 pineapple EST sequences obtained from NCBI were analyzed; among 1397 nonredundant EST sequences, 843 were found containing 1110 SSR loci (217 of them contained more than one SSR locus). Frequency of SSRs in pineapple EST sequences is 1SSR/3.73 kb, and 44 types were found. Mononucleotide, dinucleotide, and trinucleotide repeats dominate, accounting for 95.6% in total. AG/CT and AGC/GCT were the dominant type of dinucleotide and trinucleotide repeats, accounting for 83.5% and 24.1%, respectively. Thirty pairs of primers were designed for each of randomly selected 30 sequences; 26 of them generated clear and reproducible bands, and 22 of them showed polymorphism. Eighteen pairs of primers obtained by the one or the other of the two methods above that showed polymorphism were selected to carry out germplasm genetic diversity analysis for 48 breeds of pineapple; similarity coefficients of these breeds were between 0.59 and 1.00, and they can be divided into four groups accordingly. Amplification products of five SSR markers were extracted and sequenced, corresponding repeat loci were found and locus mutations are mainly in copy number of repeats and base mutations in the flanking region. PMID:24024187
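
    The EST-based screen described above amounts to scanning sequences for short tandem repeats. The Python sketch below shows one simple way to do that with regular expressions; it is not the authors' pipeline, and the minimum repeat counts (10 for mononucleotide, 6 for dinucleotide, 5 for trinucleotide motifs) and the function name are assumptions chosen for illustration.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {1: 10, 2: 6, 3: 5}


def find_ssrs(sequence, min_repeats=MIN_REPEATS):
    """Return (start, motif, copies) for each simple sequence repeat found.

    start is the 0-based position of the repeat in the input sequence.
    """
    sequence = sequence.upper()
    hits = []
    for motif_len, min_copies in min_repeats.items():
        # One motif followed by at least (min_copies - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (motif_len, min_copies - 1))
        for match in pattern.finditer(sequence):
            motif = match.group(1)
            # Skip motifs such as "AA" or "AAA": they are mononucleotide runs.
            if motif_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start(), motif, len(match.group(0)) // motif_len))
    return sorted(hits)


example = "CC" + "GA" * 7 + "TT" + "AGC" * 5 + "C"
for start, motif, copies in find_ssrs(example):
    print(start, motif, copies)
```

    The motif is reported as it is read from the given strand, so grouping reverse-complement classes such as AG/CT would need an extra normalisation step.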

  7. Genetic Characterization of Fasciola Isolates from West Azerbaijan Province Iran Based on ITS1 and ITS2 Sequence of Ribosomal DNA

    PubMed Central

    GALAVANI, Hossein; GHOLIZADEH, Saber; HAZRATI TAPPEH, Khosrow

    2016-01-01

    Background: Fascioliasis, caused by Fasciola hepatica and F. gigantica, has medical and economic importance in the world. Compared with traditional methods used for the identification and characterization of Fasciola spp., molecular approaches are precise and reliable. The aims of the current study were the molecular characterization of Fasciola spp. in West Azerbaijan Province, Iran, and their comparative analysis against GenBank sequences. Methods: A total of 580 isolates were collected in 2014 from 90 slaughtered cattle (n=50) and sheep (n=40) in five cities of West Azerbaijan Province. After morphological identification and DNA extraction, specifically designed primers were used to amplify the ITS1, 5.8S and ITS2 regions, and 50 randomly selected samples were sequenced. Result: Using morphometric characters, 99.14% and 0.86% of isolates were identified as F. hepatica and F. gigantica, respectively. PCR amplification of a 1081 bp fragment and the sequencing results showed 100% similarity with F. hepatica in the ITS1 (428 bp), 5.8S (158 bp), and ITS2 (366 bp) regions. Sequence comparison between the current study sequences and GenBank data showed 98% identity with 11 nucleotide mismatches. However, in the phylogenetic tree, F. hepatica sequences from West Azerbaijan Province, Iran, were closely related to Iranian, Asian, and African isolates. Conclusions: Only F. hepatica is distributed among sheep and cattle in West Azerbaijan Province, Iran. However, the 5 and 6 bp variation in the ITS1 and ITS2 regions, respectively, is not enough to separate Fasciola spp. Therefore, more studies are essential for designing new molecular markers for correct species identification. PMID:27095969

  8. BAC-End Sequence-Based SNP Mining in Allotetraploid Cotton (Gossypium) Utilizing Resequencing Data, Phylogenetic Inferences, and Perspectives for Genetic Mapping.

    PubMed

    Hulse-Kemp, Amanda M; Ashrafi, Hamid; Stoffel, Kevin; Zheng, Xiuting; Saski, Christopher A; Scheffler, Brian E; Fang, David D; Chen, Z Jeffrey; Van Deynze, Allen; Stelly, David M

    2015-04-09

    A bacterial artificial chromosome library and BAC-end sequences for cultivated cotton (Gossypium hirsutum L.) have recently been developed. This report presents genome-wide single nucleotide polymorphism (SNP) mining utilizing resequencing data with BAC-end sequences as a reference by alignment of 12 G. hirsutum L. lines, one G. barbadense L. line, and one G. longicalyx Hutch and Lee line. A total of 132,262 intraspecific SNPs have been developed for G. hirsutum, whereas 223,138 and 470,631 interspecific SNPs have been developed for G. barbadense and G. longicalyx, respectively. Using a set of interspecific SNPs, 11 randomly selected and 77 SNPs that are putatively associated with the homeologous chromosome pair 12 and 26, we mapped 77 SNPs into two linkage groups representing these chromosomes, spanning a total of 236.2 cM in an interspecific F2 population (G. barbadense 3-79 × G. hirsutum TM-1). The mapping results validated the approach for reliably producing large numbers of both intraspecific and interspecific SNPs aligned to BAC-ends. This will allow for future construction of high-density integrated physical and genetic maps for cotton and other complex polyploid genomes. The methods developed will allow for future Gossypium resequencing data to be automatically genotyped for identified SNPs along the BAC-end sequence reference for anchoring sequence assemblies and comparative studies. Copyright © 2015 Hulse-Kemp et al.

  9. Gene Discovery through Genomic Sequencing of Brucella abortus

    PubMed Central

    Sánchez, Daniel O.; Zandomeni, Ruben O.; Cravero, Silvio; Verdún, Ramiro E.; Pierrou, Ester; Faccio, Paula; Diaz, Gabriela; Lanzavecchia, Silvia; Agüero, Fernán; Frasch, Alberto C. C.; Andersson, Siv G. E.; Rossetti, Osvaldo L.; Grau, Oscar; Ugalde, Rodolfo A.

    2001-01-01

    Brucella abortus is the etiological agent of brucellosis, a disease that affects bovines and humans. We generated random DNA sequences from the genome of B. abortus strain 2308 in order to characterize molecular targets that might be useful for developing immunological or chemotherapeutic strategies against this pathogen. The partial sequencing of 1,899 clones allowed the identification of 1,199 genomic sequence surveys (GSSs) with high homology (BLAST expect value < 10^-5) to sequences deposited in the GenBank databases. Among them, 925 represent putative novel genes for the Brucella genus. Out of 925 nonredundant GSSs, 470 were classified in 15 categories based on cellular function. Seven hundred GSSs showed no significant database matches and remain available for further studies in order to identify their function. A high number of GSSs with homology to Agrobacterium tumefaciens and Rhizobium meliloti proteins were observed, thus confirming their close phylogenetic relationship. Among them, several GSSs showed high similarity with genes related to nodule nitrogen fixation, synthesis of nod factors, nodulation protein symbiotic plasmid, and nodule bacteroid differentiation. We have also identified several B. abortus homologs of virulence and pathogenesis genes from other pathogens, including a homolog to both the Shda gene from Salmonella enterica serovar Typhimurium and the AidA-1 gene from Escherichia coli. Other GSSs displayed significant homologies to genes encoding components of the type III and type IV secretion machineries, suggesting that Brucella might also have an active type III secretion machinery. PMID:11159979

  10. AFRESh: an adaptive framework for compression of reads and assembled sequences with random access functionality.

    PubMed

    Paridaens, Tom; Van Wallendael, Glenn; De Neve, Wesley; Lambert, Peter

    2017-05-15

    The past decade has seen the introduction of new technologies that have steadily lowered the cost of genomic sequencing. We can even observe that the cost of sequencing is dropping significantly faster than the cost of storage and transmission. The latter motivates a need for continuous improvements in the area of genomic data compression, not only at the level of effectiveness (compression rate), but also at the level of functionality (e.g. random access), configurability (effectiveness versus complexity, coding tool set …) and versatility (support for both sequenced reads and assembled sequences). In that regard, we can point out that current approaches mostly do not support random access, requiring full files to be transmitted, and that current approaches are restricted to either read or sequence compression. We propose AFRESh, an adaptive framework for no-reference compression of genomic data with random access functionality, targeting the effective representation of the raw genomic symbol streams of both reads and assembled sequences. AFRESh makes use of a configurable set of prediction and encoding tools, extended by a Context-Adaptive Binary Arithmetic Coding scheme (CABAC), to compress raw genetic codes. To the best of our knowledge, our paper is the first to describe an effective implementation of CABAC outside of its original application. By applying CABAC, the compression effectiveness improves by up to 19% for assembled sequences and up to 62% for reads. By applying AFRESh to the genomic symbols of the MPEG genomic compression test set for reads, a compression gain is achieved of up to 51% compared to SCALCE, 42% compared to LFQC and 44% compared to ORCOM. When comparing to generic compression approaches, a compression gain is achieved of up to 41% compared to GNU Gzip and 22% compared to 7-Zip at the Ultra setting. 
Additionally, when compressing assembled sequences of the Human Genome, a compression gain is achieved of up to 34% compared to GNU Gzip and 16% compared to 7-Zip at the Ultra setting. A Windows executable version can be downloaded at https://github.com/tparidae/AFresh. Contact: tom.paridaens@ugent.be. © The Author 2017. Published by Oxford University Press.

  11. A package of Linux scripts for the parallelization of Monte Carlo simulations

    NASA Astrophysics Data System (ADS)

    Badal, Andreu; Sempau, Josep

    2006-09-01

    Despite the fact that fast computers are nowadays available at low cost, there are many situations where obtaining a reasonably low statistical uncertainty in a Monte Carlo (MC) simulation involves a prohibitively large amount of time. This limitation can be overcome by having recourse to parallel computing. Most tools designed to facilitate this approach require modification of the source code and the installation of additional software, which may be inconvenient for some users. We present a set of tools, named clonEasy, that implement a parallelization scheme of a MC simulation that is free from these drawbacks. In clonEasy, which is designed to run under Linux, a set of "clone" CPUs is governed by a "master" computer by taking advantage of the capabilities of the Secure Shell (ssh) protocol. Any Linux computer on the Internet that can be ssh-accessed by the user can be used as a clone. A key ingredient for the parallel calculation to be reliable is the availability of an independent string of random numbers for each CPU. Many generators—such as RANLUX, RANECU or the Mersenne Twister—can readily produce these strings by initializing them appropriately and, hence, they are suitable to be used with clonEasy. This work was primarily motivated by the need to find a straightforward way to parallelize PENELOPE, a code for MC simulation of radiation transport that (in its current 2005 version) employs the generator RANECU, which uses a combination of two multiplicative linear congruential generators (MLCGs). Thus, this paper is focused on this class of generators and, in particular, we briefly present an extension of RANECU that increases its period up to ˜5×10 and we introduce seedsMLCG, a tool that provides the information necessary to initialize disjoint sequences of an MLCG to feed different CPUs. 
This program, in combination with clonEasy, makes it possible to run PENELOPE in parallel easily, without requiring specific libraries or significant alterations of the sequential code.

    Program summary 1
    Title of program: clonEasy
    Catalogue identifier: ADYD_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYD_v1_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, Northern Ireland
    Computer for which the program is designed and others in which it is operable: Any computer with a Unix style shell (bash), support for the Secure Shell protocol and a FORTRAN compiler
    Operating systems under which the program has been tested: Linux (RedHat 8.0, SuSe 8.1, Debian Woody 3.1)
    Compilers: GNU FORTRAN g77 (Linux); g95 (Linux); Intel Fortran Compiler 7.1 (Linux)
    Programming language used: Linux shell (bash) script, FORTRAN 77
    No. of bits in a word: 32
    No. of lines in distributed program, including test data, etc.: 1916
    No. of bytes in distributed program, including test data, etc.: 18 202
    Distribution format: tar.gz
    Nature of the physical problem: There are many situations where a Monte Carlo simulation involves a huge amount of CPU time. The parallelization of such calculations is a simple way of obtaining a relatively low statistical uncertainty using a reasonable amount of time.
    Method of solution: The presented collection of Linux scripts and auxiliary FORTRAN programs implement Secure Shell-based communication between a "master" computer and a set of "clones". The aim of this communication is to execute a code that performs a Monte Carlo simulation on all the clones simultaneously. The code is unique, but each clone is fed with a different set of random seeds. Hence, clonEasy effectively permits the parallelization of the calculation.
    Restrictions on the complexity of the program: clonEasy can only be used with programs that produce statistically independent results using the same code, but with a different sequence of random numbers. Users must choose the initialization values for the random number generator on each computer and combine the output from the different executions. A FORTRAN program to combine the final results is also provided.
    Typical running time: The execution time of each script largely depends on the number of computers that are used, the actions that are to be performed and, to a lesser extent, on the network connexion bandwidth.
    Unusual features of the program: Any computer on the Internet with a Secure Shell client/server program installed can be used as a node of a virtual computer cluster for parallel calculations with the sequential source code. The simplicity of the parallelization scheme makes the use of this package a straightforward task, which does not require installing any additional libraries.

    Program summary 2
    Title of program: seedsMLCG
    Catalogue identifier: ADYE_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYE_v1_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, Northern Ireland
    Computer for which the program is designed and others in which it is operable: Any computer with a FORTRAN compiler
    Operating systems under which the program has been tested: Linux (RedHat 8.0, SuSe 8.1, Debian Woody 3.1), MS Windows (2000, XP)
    Compilers: GNU FORTRAN g77 (Linux and Windows); g95 (Linux); Intel Fortran Compiler 7.1 (Linux); Compaq Visual Fortran 6.1 (Windows)
    Programming language used: FORTRAN 77
    No. of bits in a word: 32
    Memory required to execute with typical data: 500 kilobytes
    No. of lines in distributed program, including test data, etc.: 492
    No. of bytes in distributed program, including test data, etc.: 5582
    Distribution format: tar.gz
    Nature of the physical problem: Statistically independent results from different runs of a Monte Carlo code can be obtained using uncorrelated sequences of random numbers on each execution. Multiplicative linear congruential generators (MLCG), or other generators that are based on them such as RANECU, can be adapted to produce these sequences.
    Method of solution: For a given MLCG, the presented program calculates initialization values that produce disjoint, consecutive sequences of pseudo-random numbers. The calculated values initiate the generator in distant positions of the random number cycle and can be used, for instance, on a parallel simulation. The values are found using the formula S_{i+J} = (a^J S_i) MOD m, which gives the random value that will be generated after J iterations of the MLCG.
    Restrictions on the complexity of the program: The 32-bit length restriction for the integer variables in standard FORTRAN 77 limits the produced seeds to be separated a distance smaller than 2^31, when the distance J is expressed as an integer value. The program allows the user to input the distance as a power of 10 for the purpose of efficiently splitting the sequence of generators with a very long period.
    Typical running time: The execution time depends on the parameters of the used MLCG and the distance between the generated seeds. The generation of 10^6 seeds separated 10^12 units in the sequential cycle, for one of the MLCGs found in the RANECU generator, takes 3 s on a 2.4 GHz Intel Pentium 4 using the g77 compiler.
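
    The seed calculation described for seedsMLCG relies on the fact that J steps of the recurrence S -> (a S) mod m collapse into a single multiplication by a^J mod m. The sketch below illustrates that jump-ahead in Python rather than FORTRAN 77; it is not the distributed program. The function names are hypothetical, and the multiplier and modulus are assumed to be those commonly cited for the first MLCG of L'Ecuyer's combined generator on which RANECU is based.

```python
# Assumed parameters (not stated in the abstract): multiplier and modulus
# commonly cited for the first MLCG of L'Ecuyer's combined generator.
A1 = 40014
M1 = 2147483563


def mlcg_jump(seed, a, m, jump):
    """State reached after `jump` iterations of S -> (a * S) mod m."""
    # Three-argument pow performs modular exponentiation, so the cost grows
    # with log(jump) instead of jump.
    return (pow(a, jump, m) * seed) % m


def disjoint_seeds(seed, a, m, jump, count):
    """Return `count` seeds, each `jump` positions after the previous one."""
    seeds = [seed]
    for _ in range(count - 1):
        seeds.append(mlcg_jump(seeds[-1], a, m, jump))
    return seeds
```

    For instance, disjoint_seeds(1, A1, M1, 10**12, 4) would give four starting states spaced 10^12 draws apart, one per CPU. Python integers are unbounded, so the 2^31 restriction mentioned for FORTRAN 77 does not apply to this sketch.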

  12. Construction of a reference genetic linkage map for carnation (Dianthus caryophyllus L.)

    PubMed Central

    2013-01-01

    Background: Genetic linkage maps are important tools for many genetic applications including mapping of quantitative trait loci (QTLs), identifying DNA markers for fingerprinting, and map-based gene cloning. Carnation (Dianthus caryophyllus L.) is an important ornamental flower worldwide. We previously reported a random amplified polymorphic DNA (RAPD)-based genetic linkage map derived from Dianthus capitatus ssp. andrezejowskianus and a simple sequence repeat (SSR)-based genetic linkage map constructed using data from intraspecific F2 populations; however, the number of markers was insufficient, and so the number of linkage groups (LGs) did not coincide with the number of chromosomes (x = 15). Therefore, we aimed to produce a high-density genetic map to improve its usefulness for breeding purposes and genetic research. Results: We improved the SSR-based genetic linkage map using SSR markers derived from a genomic library, expression sequence tags, and RNA-seq data. Linkage analysis revealed that 412 SSR loci (including 234 newly developed SSR loci) could be mapped to 17 linkage groups (LGs) covering 969.6 cM. Comparison of five minor LGs covering less than 50 cM with LGs in our previous RAPD-based genetic map suggested that four LGs could be integrated into two LGs by anchoring common SSR loci. Consequently, the number of LGs corresponded to the number of chromosomes (x = 15). We added 192 new SSRs, eight RAPD, and two sequence-tagged site loci to refine the RAPD-based genetic linkage map, which comprised 15 LGs consisting of 348 loci covering 978.3 cM. The two maps had 125 SSR loci in common, and most of the positions of markers were conserved between them. We identified 635 loci in carnation using the two linkage maps. We also mapped QTLs for two traits (bacterial wilt resistance and anthocyanin pigmentation in the flower) and a phenotypic locus for flower-type by analyzing previously reported genotype and phenotype data. Conclusions: The improved genetic linkage maps and SSR markers developed in this study will serve as reference genetic linkage maps for members of the genus Dianthus, including carnation, and will be useful for mapping QTLs associated with various traits, and for improving carnation breeding programs. PMID:24160306

  13. Construction of a reference genetic linkage map for carnation (Dianthus caryophyllus L.).

    PubMed

    Yagi, Masafumi; Yamamoto, Toshiya; Isobe, Sachiko; Hirakawa, Hideki; Tabata, Satoshi; Tanase, Koji; Yamaguchi, Hiroyasu; Onozaki, Takashi

    2013-10-26

    Genetic linkage maps are important tools for many genetic applications including mapping of quantitative trait loci (QTLs), identifying DNA markers for fingerprinting, and map-based gene cloning. Carnation (Dianthus caryophyllus L.) is an important ornamental flower worldwide. We previously reported a random amplified polymorphic DNA (RAPD)-based genetic linkage map derived from Dianthus capitatus ssp. andrezejowskianus and a simple sequence repeat (SSR)-based genetic linkage map constructed using data from intraspecific F2 populations; however, the number of markers was insufficient, and so the number of linkage groups (LGs) did not coincide with the number of chromosomes (x = 15). Therefore, we aimed to produce a high-density genetic map to improve its usefulness for breeding purposes and genetic research. We improved the SSR-based genetic linkage map using SSR markers derived from a genomic library, expression sequence tags, and RNA-seq data. Linkage analysis revealed that 412 SSR loci (including 234 newly developed SSR loci) could be mapped to 17 linkage groups (LGs) covering 969.6 cM. Comparison of five minor LGs covering less than 50 cM with LGs in our previous RAPD-based genetic map suggested that four LGs could be integrated into two LGs by anchoring common SSR loci. Consequently, the number of LGs corresponded to the number of chromosomes (x = 15). We added 192 new SSRs, eight RAPD, and two sequence-tagged site loci to refine the RAPD-based genetic linkage map, which comprised 15 LGs consisting of 348 loci covering 978.3 cM. The two maps had 125 SSR loci in common, and most of the positions of markers were conserved between them. We identified 635 loci in carnation using the two linkage maps. We also mapped QTLs for two traits (bacterial wilt resistance and anthocyanin pigmentation in the flower) and a phenotypic locus for flower-type by analyzing previously reported genotype and phenotype data. 
The improved genetic linkage maps and SSR markers developed in this study will serve as reference genetic linkage maps for members of the genus Dianthus, including carnation, and will be useful for mapping QTLs associated with various traits, and for improving carnation breeding programs.

  14. Effect of Structured Touch and Guided Imagery for Pain and Anxiety in Elective Joint Replacement Patients--A Randomized Controlled Trial: M-TIJRP.

    PubMed

    Forward, John Brent; Greuter, Nancy Elizabeth; Crisall, Santa J; Lester, Houston F

    2015-01-01

    Postoperative management of pain after total joint arthroplasty remains a challenge despite advancements in analgesics. Evidence shows that complementary modalities with mind-body and tactile-based approaches are valid and effective adjuncts to reduce pain and anxiety postoperatively. To investigate the effectiveness of the "M" Technique (M), a registered method of structured touch using a set sequence and number of strokes, and a consistent level of pressure on hands and feet, compared with guided imagery and usual care, for the reduction of pain and anxiety in patients undergoing elective total knee or hip replacement surgery. Randomized controlled trial: M-TIJRP (MiTechnique and guided Imagery in Joint Replacement Patients [Mighty Junior P]). At a community hospital, 225 male and female patients, aged 38 to 90 years, undergoing elective total hip or knee replacement were randomly assigned to 1 of 3 groups (75 patients in each): M, guided imagery, or usual care. They were blinded to their assignment until the intervention. Reduction of pain and anxiety postoperatively. Secondary outcomes measured use of pain medication and patient satisfaction. This study yielded positive findings for the management of pain and anxiety in patients undergoing elective joint replacement using M and guided imagery for 18 to 20 minutes compared with usual care. M showed the largest predicted decreases in both pain and anxiety between groups. There was no significant difference in narcotic pain medication use between groups. Patient satisfaction survey ratings were highest for M, followed by guided imagery. The benefit of M may be because of the specifically structured sequence of touch by competent caring, trained providers.

  15. Effect of Structured Touch and Guided Imagery for Pain and Anxiety in Elective Joint Replacement Patients—A Randomized Controlled Trial: M-TIJRP

    PubMed Central

    Forward, John Brent; Greuter, Nancy Elizabeth; Crisall, Santa J; Lester, Houston F

    2015-01-01

    Context: Postoperative management of pain after total joint arthroplasty remains a challenge despite advancements in analgesics. Evidence shows that complementary modalities with mind-body and tactile-based approaches are valid and effective adjuncts to reduce pain and anxiety postoperatively. Objective: To investigate the effectiveness of the “M” Technique (M), a registered method of structured touch using a set sequence and number of strokes, and a consistent level of pressure on hands and feet, compared with guided imagery and usual care, for the reduction of pain and anxiety in patients undergoing elective total knee or hip replacement surgery. Methods: Randomized controlled trial: M-TIJRP (MiTechnique and guided Imagery in Joint Replacement Patients [Mighty Junior P]). At a community hospital, 225 male and female patients, aged 38 to 90 years, undergoing elective total hip or knee replacement were randomly assigned to 1 of 3 groups (75 patients in each): M, guided imagery, or usual care. They were blinded to their assignment until the intervention. Main Outcome Measures: Reduction of pain and anxiety postoperatively. Secondary outcomes measured use of pain medication and patient satisfaction. Results: This study yielded positive findings for the management of pain and anxiety in patients undergoing elective joint replacement using M and guided imagery for 18 to 20 minutes compared with usual care. M showed the largest predicted decreases in both pain and anxiety between groups. There was no significant difference in narcotic pain medication use between groups. Patient satisfaction survey ratings were highest for M, followed by guided imagery. Conclusion: The benefit of M may be because of the specifically structured sequence of touch by competent caring, trained providers. PMID:26222093

  16. Differentiating Visual from Response Sequencing during Long-term Skill Learning.

    PubMed

    Lynch, Brighid; Beukema, Patrick; Verstynen, Timothy

    2017-01-01

    The dual-system model of sequence learning posits that during early learning there is an advantage for encoding sequences in sensory frames; however, it remains unclear whether this advantage extends to long-term consolidation. Using the serial RT task, we set out to distinguish the dynamics of learning sequential orders of visual cues from learning sequential responses. On each day, most participants learned a new mapping between a set of symbolic cues and responses made with one of four fingers, after which they were exposed to trial blocks of either randomly ordered cues or deterministic ordered cues (12-item sequence). Participants were randomly assigned to one of four groups (n = 15 per group): Visual sequences (same sequence of visual cues across training days), Response sequences (same order of key presses across training days), Combined (same serial order of cues and responses on all training days), and a Control group (a novel sequence each training day). Across 5 days of training, sequence-specific measures of response speed and accuracy improved faster in the Visual group than any of the other three groups, despite no group differences in explicit awareness of the sequence. The two groups that were exposed to the same visual sequence across days showed a marginal improvement in response binding that was not found in the other groups. These results indicate that there is an advantage, in terms of rate of consolidation across multiple days of training, for learning sequences of actions in a sensory representational space, rather than as motoric representations.

  17. High resolution identity testing of inactivated poliovirus vaccines

    PubMed Central

    Mee, Edward T.; Minor, Philip D.; Martin, Javier

    2015-01-01

    Background: Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. Methods: We employed random RT-PCR and high throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. Results: All expected strains were detected in previously characterised products and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Conclusion: Random RT-PCR and shotgun sequencing provided high resolution identification of vaccine components. In addition to the recovery of full-length genome sequences, the method could also be easily adapted to the characterisation of minor variant frequencies and distinction of closely related products on the basis of distinguishing consensus and low frequency polymorphisms. PMID:26049003

  18. A computational proposal for designing structured RNA pools for in vitro selection of RNAs.

    PubMed

    Kim, Namhee; Gan, Hin Hark; Schlick, Tamar

    2007-04-01

    Although in vitro selection technology is a versatile experimental tool for discovering novel synthetic RNA molecules, finding complex RNA molecules is difficult because most RNAs identified from random sequence pools are simple motifs, consistent with recent computational analysis of such sequence pools. Thus, enriching in vitro selection pools with complex structures could increase the probability of discovering novel RNAs. Here we develop an approach for engineering sequence pools that links RNA sequence space regions with corresponding structural distributions via a "mixing matrix" approach combined with a graph theory analysis. We define five classes of mixing matrices motivated by covariance mutations in RNA; these constructs define nucleotide transition rates and are applied to chosen starting sequences to yield specific nonrandom pools. We examine the coverage of sequence space as a function of the mixing matrix and starting sequence via clustering analysis. We show that, in contrast to random sequences, which are associated only with a local region of sequence space, our designed pools, including a structured pool for GTP aptamers, can target specific motifs. It follows that experimental synthesis of designed pools can benefit from using optimized starting sequences, mixing matrices, and pool fractions associated with each of our constructed pools as a guide. Automation of our approach could provide practical tools for pool design applications for in vitro selection of RNAs and related problems.
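
    The idea of applying a mixing matrix to a starting sequence can be pictured as drawing each pool member position by position from nucleotide probabilities that depend on the starting base. The sketch below is only an illustration of that idea under stated assumptions: the function name, the matrix values, and the example starting sequence are hypothetical and are not taken from the paper, which defines five specific classes of matrices.

```python
import random

NUCLEOTIDES = "ACGU"

# Hypothetical mixing matrix: row = base in the starting sequence,
# entries = probabilities of synthesizing A, C, G, U at that position.
HYPOTHETICAL_MATRIX = {
    "A": [0.85, 0.05, 0.05, 0.05],
    "C": [0.05, 0.85, 0.05, 0.05],
    "G": [0.05, 0.05, 0.85, 0.05],
    "U": [0.05, 0.05, 0.05, 0.85],
}

# Identity matrix: every pool member equals the starting sequence.
IDENTITY = {
    b: [1.0 if b == c else 0.0 for c in NUCLEOTIDES] for b in NUCLEOTIDES
}


def design_pool(start_seq, mixing_matrix, pool_size, seed=0):
    """Sample `pool_size` sequences from start_seq under a mixing matrix."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    pool = []
    for _ in range(pool_size):
        member = "".join(
            rng.choices(NUCLEOTIDES, weights=mixing_matrix[base])[0]
            for base in start_seq
        )
        pool.append(member)
    return pool
```

    With the identity matrix the pool collapses to copies of the starting sequence, while a uniform matrix (0.25 in every entry) would correspond to a fully random pool; intermediate matrices keep the pool concentrated around the starting sequence.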

  19. Common position of indels that cause deviations from canonical genome organization in different measles virus strains.

    PubMed

    Ivancic-Jelecki, Jelena; Slovic, Anamarija; Šantak, Maja; Tešović, Goran; Forcic, Dubravko

    2016-07-29

    The canonical genome organization of measles virus (MV) is characterized by a total size of 15 894 nucleotides (nts) and a defined length of every genomic region, both coding and non-coding. Only rarely have reports of strains possessing non-canonical genomic properties (possessing indels, with or without a change of total genome length) been published. The observed mutations are mutually compensatory in the sense that the total genome length remains polyhexameric. Although programmed and highly precise pseudo-templated nucleotide additions during transcription are inherent to polymerases of all viruses belonging to the family Paramyxoviridae, a similar mechanism that would serve to non-randomly correct genome length, if an indel has occurred during replication, has so far not been described in the context of a complete virus genome. We compiled all complete MV genomic sequences (64 in total) available in open access sequence databases. Multiple sequence comparisons and phylogenetic analyses were performed with the aim of exploring whether non-recombinant measles strains that are not evolutionarily linked and that show deviations from canonical genome organization possess a common genetic characteristic. In 11 MV sequences we detected deviations from canonical genome organization due to short indels located within homopolymeric stretches or next to them. In nine out of 11 identified non-canonical MV sequences, a common feature was observed: one mutation, either an insertion or a deletion, was located in a 28-nt-long region in the F gene 5' untranslated region (positions 5051-5078 in genomic cDNA of canonical strains). This segment is composed of five tandemly linked homopolymeric stretches; its consensus sequence is G(6-7)C(7-8)A(6-7)G(1-3)C(5-6). Although none of the mononucleotide repeats within this segment has a fixed length, the total number of nts in canonical strains is always 28. These nine non-canonical strains, as well as the tenth (not mutated in the 5051-5078 segment), can be grouped in three clusters, based on their passage histories/epidemiological data/genetic similarities. There are no indications that the 3 clusters are evolutionarily linked, other than the fact that they all belong to clade D. A common narrow genomic region was found to be mutated in different, non-related, wild type strains, suggesting that this region might have a function in non-random genome length corrections occurring during MV replication.
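
    Two of the constraints named in this abstract are easy to state as checks: a genome length is polyhexameric when it is a multiple of six, and the F gene 5' UTR segment follows a consensus of five homopolymeric stretches (6-7 G, 7-8 C, 6-7 A, 1-3 G, 5-6 C) that totals 28 nts in canonical strains. The following is a minimal sketch of those checks, with hypothetical function names and made-up test segments; it is not the authors' analysis.

```python
import re

CANONICAL_GENOME_LENGTH = 15894  # nts, as stated in the abstract
SEGMENT_LENGTH = 28              # canonical length of the 5051-5078 segment

# Consensus of the five tandem homopolymeric stretches in the segment.
CONSENSUS = re.compile(r"G{6,7}C{7,8}A{6,7}G{1,3}C{5,6}")


def is_polyhexameric(genome_length):
    """True when the genome length is a multiple of six."""
    return genome_length % 6 == 0


def classify_segment(segment):
    """Return (matches_consensus, has_canonical_length) for a cDNA segment."""
    matches = CONSENSUS.fullmatch(segment.upper()) is not None
    return matches, len(segment) == SEGMENT_LENGTH
```

    A segment can satisfy the consensus pattern and still deviate from the canonical length, which is the situation a single-nucleotide indel inside one homopolymeric stretch would produce.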

  20. Efficient encapsulation of proteins with random copolymers.

    PubMed

    Nguyen, Trung Dac; Qiao, Baofu; Olvera de la Cruz, Monica

    2018-06-12

    Membraneless organelles are aggregates of disordered proteins that form spontaneously to promote specific cellular functions in vivo. The possibility of synthesizing membraneless organelles out of cells will therefore enable fabrication of protein-based materials with functions inherent to biological matter. Since random copolymers contain various compositions and sequences of solvophobic and solvophilic groups, they are expected to function in nonbiological media similarly to a set of disordered proteins in membraneless organelles. Interestingly, the internal environment of these organelles has been noted to behave more like an organic solvent than like water. Therefore, an adsorbed layer of random copolymers that mimics the function of disordered proteins could, in principle, protect and enhance the proteins' enzymatic activity even in organic solvents, which are ideal when the products and/or the reactants have limited solubility in aqueous media. Here, we demonstrate via multiscale simulations that random copolymers efficiently incorporate proteins into different solvents with the potential to optimize their enzymatic activity. We investigate the key factors that govern the ability of random copolymers to encapsulate proteins, including the adsorption energy, copolymer average composition, and solvent selectivity. The adsorbed polymer chains have remarkably similar sequences, indicating that the proteins are able to select certain sequences that best reduce their exposure to the solvent. We also find that the protein surface coverage decreases when the fluctuation in the average distance between the protein adsorption sites increases. The results herein set the stage for computational design of random copolymers for stabilizing and delivering proteins across multiple media.

  1. Construction, Characterization, and Preliminary BAC-End Sequence Analysis of a Bacterial Artificial Chromosome Library of the Tea Plant (Camellia sinensis)

    PubMed Central

    Lin, Jinke; Kudrna, Dave; Wing, Rod A.

    2011-01-01

    We describe the construction and characterization of a publicly available BAC library for the tea plant, Camellia sinensis. Using modified methods, the library was constructed with the aim of developing public molecular resources to advance tea plant genomics research. The library consists of a total of 401,280 clones with an average insert size of 135 kb, providing an approximate coverage of 13.5 haploid genome equivalents. No empty vector clones were observed in a random sampling of 576 BAC clones. Further analysis of 182 BAC-end sequences from randomly selected clones revealed a GC content of 40.35% and low chloroplast and mitochondrial contamination. Repetitive sequence analyses indicated that LTR retrotransposons were the most predominant sequence class (86.93%–87.24%), followed by DNA retrotransposons (11.16%–11.69%). Additionally, we found 25 simple sequence repeats (SSRs) that could potentially be used as genetic markers. PMID:21234344
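
    The coverage figure above can be checked with simple arithmetic. The sketch below assumes coverage is computed as (number of clones × average insert size) / haploid genome size; the function name is hypothetical, and the genome size it prints is only what the abstract's three numbers imply, not a value stated in the record.

```python
def implied_genome_size_kb(num_clones, mean_insert_kb, coverage):
    """Haploid genome size (kb) implied by library size and stated coverage.

    Assumes coverage = num_clones * mean_insert_kb / genome_size_kb.
    """
    return num_clones * mean_insert_kb / coverage


# Figures taken from the abstract: 401,280 clones, 135 kb inserts, 13.5x coverage.
total_insert_kb = 401_280 * 135
genome_kb = implied_genome_size_kb(401_280, 135, 13.5)

print(f"total cloned DNA: {total_insert_kb / 1e6:.2f} Gb")
print(f"implied haploid genome size: {genome_kb / 1e6:.2f} Gb")
```

    Under that assumption the three reported numbers are mutually consistent with a haploid genome of roughly 4 Gb.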

  2. Short-term memory capacity in networks via the restricted isometry property.

    PubMed

    Charles, Adam S; Yap, Han Lun; Rozell, Christopher J

    2014-06-01

    Cortical networks are hypothesized to rely on transient network activity to support short-term memory (STM). In this letter, we study the capacity of randomly connected recurrent linear networks for performing STM when the input signals are approximately sparse in some basis. We leverage results from compressed sensing to provide rigorous nonasymptotic recovery guarantees, quantifying the impact of the input sparsity level, the input sparsity basis, and the network characteristics on the system capacity. Our analysis demonstrates that network memory capacities can scale superlinearly with the number of nodes and in some situations can achieve STM capacities that are much larger than the network size. We provide perfect recovery guarantees for finite sequences and recovery bounds for infinite sequences. The latter analysis predicts that network STM systems may have an optimal recovery length that balances errors due to omission and recall mistakes. Furthermore, we show that the conditions yielding optimal STM capacity can be embodied in several network topologies, including networks with sparse or dense connectivities.

  3. Model for calculation of electrostatic contribution into protein stability

    NASA Astrophysics Data System (ADS)

    Kundrotas, Petras; Karshikoff, Andrey

    2003-03-01

    Existing models of the denatured state of proteins consider only one possible spatial distribution of protein charges and therefore are applicable to a limited number of cases. In this presentation a more general framework for the modeling of the denatured state is proposed. It is based on the assumption that the titratable groups of an unfolded protein can adopt a quasi-random distribution, restricted by the protein sequence. The model was tested on two proteins, barnase and the N-terminal domain of the ribosomal protein L9. The calculated free energy of denaturation, ΔG(pH), reproduces the experimental data substantially better than the commonly used null approximation (NA). It was demonstrated that the seemingly good agreement with experimental data obtained by NA originates from the compensatory effect between the pair-wise electrostatic interactions and the desolvation energy of the individual sites. It was also found that the ionization properties of denatured proteins are influenced by the protein sequence.

  4. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields.

    PubMed

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-11

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  5. Protein Secondary Structure Prediction Using Deep Convolutional Neural Fields

    NASA Astrophysics Data System (ADS)

    Wang, Sheng; Peng, Jian; Ma, Jianzhu; Xu, Jinbo

    2016-01-01

    Protein secondary structure (SS) prediction is important for studying protein structure and function. When only the sequence (profile) information is used as input feature, currently the best predictors can obtain ~80% Q3 accuracy, which has not been improved in the past decade. Here we present DeepCNF (Deep Convolutional Neural Fields) for protein SS prediction. DeepCNF is a Deep Learning extension of Conditional Neural Fields (CNF), which is an integration of Conditional Random Fields (CRF) and shallow neural networks. DeepCNF can model not only complex sequence-structure relationship by a deep hierarchical architecture, but also interdependency between adjacent SS labels, so it is much more powerful than CNF. Experimental results show that DeepCNF can obtain ~84% Q3 accuracy, ~85% SOV score, and ~72% Q8 accuracy, respectively, on the CASP and CAMEO test proteins, greatly outperforming currently popular predictors. As a general framework, DeepCNF can be used to predict other protein structure properties such as contact number, disorder regions, and solvent accessibility.

  6. Contingency, convergence and hyper-astronomical numbers in biological evolution.

    PubMed

    Louis, Ard A

    2016-08-01

    Counterfactual questions such as "what would happen if you re-run the tape of life?" turn on the nature of the landscape of biological possibilities. Since the number of potential sequences that store genetic information grows exponentially with length, genetic possibility spaces can be so unimaginably vast that commentators frequently reach for hyper-astronomical metaphors that compare their size to that of the universe. Re-run the tape of life and the likelihood of encountering the same sequences in such hyper-astronomically large spaces is infinitesimally small, suggesting that evolutionary outcomes are highly contingent. On the other hand, the wide-spread occurrence of evolutionary convergence implies that similar phenotypes can be found again with relative ease. How can this be? Part of the solution to this conundrum must lie in the manner that genotypes map to phenotypes. By studying simple genotype-phenotype maps, where the counterfactual space of all possible phenotypes can be enumerated, it is shown that strong bias in the arrival of variation may explain why certain phenotypes are (repeatedly) observed in nature, while others never appear. This biased variation provides a non-selective cause for certain types of convergence. It illustrates how the role of randomness and contingency may differ significantly between genetic and phenotype spaces. Copyright © 2016 Elsevier Ltd. All rights reserved.

  7. Random-effects linear modeling and sample size tables for two special crossover designs of average bioequivalence studies: the four-period, two-sequence, two-formulation and six-period, three-sequence, three-formulation designs.

    PubMed

    Diaz, Francisco J; Berg, Michel J; Krebill, Ron; Welty, Timothy; Gidal, Barry E; Alloway, Rita; Privitera, Michael

    2013-12-01

    Due to concern and debate in the epilepsy medical community and to the current interest of the US Food and Drug Administration (FDA) in revising approaches to the approval of generic drugs, the FDA is currently supporting ongoing bioequivalence studies of antiepileptic drugs, the EQUIGEN studies. During the design of these crossover studies, the researchers could not find commercial or non-commercial statistical software that quickly allowed computation of sample sizes for their designs, particularly software implementing the FDA requirement of using random-effects linear models for the analyses of bioequivalence studies. This article presents tables for sample-size evaluations of average bioequivalence studies based on the two crossover designs used in the EQUIGEN studies: the four-period, two-sequence, two-formulation design, and the six-period, three-sequence, three-formulation design. Sample-size computations assume that random-effects linear models are used in bioequivalence analyses with crossover designs. Random-effects linear models have been traditionally viewed by many pharmacologists and clinical researchers as just mathematical devices to analyze repeated-measures data. In contrast, a modern view of these models attributes an important mathematical role in theoretical formulations in personalized medicine to them, because these models not only have parameters that represent average patients, but also have parameters that represent individual patients. Moreover, the notation and language of random-effects linear models have evolved over the years. Thus, another goal of this article is to provide a presentation of the statistical modeling of data from bioequivalence studies that highlights the modern view of these models, with special emphasis on power analyses and sample-size computations.

  8. Demonstration of Nondeclarative Sequence Learning in Mice: Development of an Animal Analog of the Human Serial Reaction Time Task

    ERIC Educational Resources Information Center

    Christie, Michael A.; Hersch, Steven M.

    2004-01-01

    In this paper, we demonstrate nondeclarative sequence learning in mice using an animal analog of the human serial reaction time task (SRT) that uses a within-group comparison of behavior in response to a repeating sequence versus a random sequence. Ten female B6CBA mice performed eleven 96-trial sessions containing 24 repetitions of a 4-trial…

  9. Robust High Data Rate MIMO Underwater Acoustic Communications

    DTIC Science & Technology

    2010-12-31

    algorithm is referred to as periodic CAN ( PeCAN ). Unlike most existing sequence construction methods which are algebraic and deterministic in nature, we...start the iteration of PeCAN from random phase initializations and then proceed to cyclically minimize the desired metric. In this way, through...by the foe and hence are especially useful as training sequences or as spreading sequences for UAC applications. We will use PeCAN sequences for

  10. Secure uniform random-number extraction via incoherent strategies

    NASA Astrophysics Data System (ADS)

    Hayashi, Masahito; Zhu, Huangjun

    2018-01-01

    To guarantee the security of uniform random numbers generated by a quantum random-number generator, we study secure extraction of uniform random numbers when the environment of a given quantum state is controlled by the third party, the eavesdropper. Here we restrict our operations to incoherent strategies that are composed of the measurement on the computational basis and incoherent operations (or incoherence-preserving operations). We show that the maximum secure extraction rate is equal to the relative entropy of coherence. By contrast, the coherence of formation gives the extraction rate when a certain constraint is imposed on the eavesdropper's operations. The condition under which the two extraction rates coincide is then determined. Furthermore, we find that the exponential decreasing rate of the leaked information is characterized by Rényi relative entropies of coherence. These results clarify the power of incoherent strategies in random-number generation, and can be applied to guarantee the quality of random numbers generated by a quantum random-number generator.

  11. Design of nucleic acid sequences for DNA computing based on a thermodynamic approach

    PubMed Central

    Tanaka, Fumiaki; Kameda, Atsushi; Yamamoto, Masahito; Ohuchi, Azuma

    2005-01-01

    We have developed an algorithm for designing multiple sequences of nucleic acids that have a uniform melting temperature between the sequence and its complement and that do not hybridize non-specifically with each other based on the minimum free energy (ΔGmin). Sequences that satisfy these constraints can be utilized in computations, various engineering applications such as microarrays, and nano-fabrications. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that the filtering method uses a greedy search to calculate ΔGmin. This effectively excludes inappropriate sequences before ΔGmin is calculated, thereby reducing computation time drastically when compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the Hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (ΔGexp) of 126 sequences correlated better with ΔGmin (|R| = 0.90) than with the Hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphic user interface-based program written in Java. PMID:15701762
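
    The generate-and-test loop described above can be sketched as follows. This is a minimal illustration, not the authors' method: it swaps in the Hamming-distance criterion (the "traditional approach" the abstract compares against) and a GC-content window as a crude stand-in for the melting-temperature and ΔGmin constraints. All function names, thresholds and defaults are assumptions made for the example.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def gc_fraction(seq):
    """Fraction of G/C bases; used here as a rough proxy for melting temperature."""
    return sum(base in "GC" for base in seq) / len(seq)


def passes(candidate, accepted, min_dist, gc_range):
    """Test step: GC window plus distance to every accepted sequence and its complement."""
    low, high = gc_range
    if not low <= gc_fraction(candidate) <= high:
        return False
    for other in accepted:
        if hamming(candidate, other) < min_dist:
            return False
        if hamming(candidate, reverse_complement(other)) < min_dist:
            return False
    return True


def design_sequences(n, length, min_dist, gc_range=(0.4, 0.6), seed=0, max_tries=100_000):
    """Generate random candidates and keep those that pass the tests."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_tries):
        if len(accepted) == n:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if passes(candidate, accepted, min_dist, gc_range):
            accepted.append(candidate)
    return accepted


if __name__ == "__main__":
    for seq in design_sequences(n=5, length=20, min_dist=8):
        print(seq, f"GC={gc_fraction(seq):.2f}")
```

    A thermodynamic version would replace `passes` with a free-energy calculation; the cheap tests here only show where such a filter would sit in the loop.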

  12. Implementation of a parallel protein structure alignment service on cloud.

    PubMed

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform.

  13. Structural Transitions in Densifying Networks

    NASA Astrophysics Data System (ADS)

    Lambiotte, R.; Krapivsky, P. L.; Bhat, U.; Redner, S.

    2016-11-01

    We introduce a minimal generative model for densifying networks in which a new node attaches to a randomly selected target node and also to each of its neighbors with probability p. The networks that emerge from this copying mechanism are sparse for p < 1/2 and dense (average degree increasing with number of nodes N) for p ≥ 1/2. The behavior in the dense regime is especially rich; for example, individual network realizations that are built by copying are disparate and not self-averaging. Further, there is an infinite sequence of structural anomalies at p = 2/3, 3/4, 4/5, etc., where the N dependences of the number of triangles (3-cliques), 4-cliques, etc., undergo phase transitions. When linking to second neighbors of the target can occur, the probability that the resulting graph is complete (all nodes are connected) is nonzero as N → ∞.
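
    The copying mechanism in this abstract is simple enough to simulate directly. The sketch below is a plain reading of the stated rule (attach to a random target, then to each of the target's neighbors independently with probability p). The function names, the single-node starting graph and the seed handling are assumptions for illustration.

```python
import random


def grow_copying_network(n_nodes, p, seed=0):
    """Grow a network by copying; returns an adjacency dict {node: set(neighbors)}."""
    rng = random.Random(seed)
    adj = {0: set()}  # assumed initial condition: one isolated node
    for new in range(1, n_nodes):
        target = rng.randrange(new)  # uniformly random existing node
        links = {target}
        for neighbor in adj[target]:
            if rng.random() < p:  # copy each of the target's links with probability p
                links.add(neighbor)
        adj[new] = set()
        for node in links:
            adj[new].add(node)
            adj[node].add(new)
    return adj


def average_degree(adj):
    """Mean number of neighbors per node."""
    return sum(len(neighbors) for neighbors in adj.values()) / len(adj)


if __name__ == "__main__":
    for p in (0.25, 0.5, 0.75):
        for n in (500, 2000):
            degree = average_degree(grow_copying_network(n, p, seed=1))
            print(f"p={p:.2f} N={n:5d} average degree={degree:.2f}")
```

    Two limiting cases give a quick sanity check: p = 0 yields a tree, and p = 1 yields a complete graph.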

  14. Thermodynamic prediction of protein neutrality.

    PubMed

    Bloom, Jesse D; Silberg, Jonathan J; Wilke, Claus O; Drummond, D Allan; Adami, Christoph; Arnold, Frances H

    2005-01-18

    We present a simple theory that uses thermodynamic parameters to predict the probability that a protein retains the wild-type structure after one or more random amino acid substitutions. Our theory predicts that for large numbers of substitutions the probability that a protein retains its structure will decline exponentially with the number of substitutions, with the severity of this decline determined by properties of the structure. Our theory also predicts that a protein can gain extra robustness to the first few substitutions by increasing its thermodynamic stability. We validate our theory with simulations on lattice protein models and by showing that it quantitatively predicts previously published experimental measurements on subtilisin and our own measurements on variants of TEM1 beta-lactamase. Our work unifies observations about the clustering of functional proteins in sequence space, and provides a basis for interpreting the response of proteins to substitutions in protein engineering applications.

  15. Thermodynamic prediction of protein neutrality

    PubMed Central

    Bloom, Jesse D.; Silberg, Jonathan J.; Wilke, Claus O.; Drummond, D. Allan; Adami, Christoph; Arnold, Frances H.

    2005-01-01

    We present a simple theory that uses thermodynamic parameters to predict the probability that a protein retains the wild-type structure after one or more random amino acid substitutions. Our theory predicts that for large numbers of substitutions the probability that a protein retains its structure will decline exponentially with the number of substitutions, with the severity of this decline determined by properties of the structure. Our theory also predicts that a protein can gain extra robustness to the first few substitutions by increasing its thermodynamic stability. We validate our theory with simulations on lattice protein models and by showing that it quantitatively predicts previously published experimental measurements on subtilisin and our own measurements on variants of TEM1 β-lactamase. Our work unifies observations about the clustering of functional proteins in sequence space, and provides a basis for interpreting the response of proteins to substitutions in protein engineering applications. PMID:15644440

  16. Implementation of a Parallel Protein Structure Alignment Service on Cloud

    PubMed Central

    Hung, Che-Lun; Lin, Yaw-Ling

    2013-01-01

    Protein structure alignment has become an important strategy by which to identify evolutionary relationships between protein sequences. Several alignment tools are currently available for online comparison of protein structures. In this paper, we propose a parallel protein structure alignment service based on the Hadoop distribution framework. This service includes a protein structure alignment algorithm, a refinement algorithm, and a MapReduce programming model. The refinement algorithm refines the result of alignment. To process vast numbers of protein structures in parallel, the alignment and refinement algorithms are implemented using MapReduce. We analyzed and compared the structure alignments produced by different methods using a dataset randomly selected from the PDB database. The experimental results verify that the proposed algorithm refines the resulting alignments more accurately than existing algorithms. Meanwhile, the computational performance of the proposed service is proportional to the number of processors used in our cloud platform. PMID:23671842

  17. An investigation of error correcting techniques for OMV and AXAF

    NASA Technical Reports Server (NTRS)

    Ingels, Frank; Fryer, John

    1991-01-01

    The original objectives of this project were to build a test system for the NASA 255/223 Reed/Solomon encoding/decoding chip set and circuit board. This test system was then to be interfaced with a convolutional system at MSFC to examine the performance of the concatenated codes. After considerable work, it was discovered that the convolutional system could not function as needed. This report documents the design, construction, and testing of the test apparatus for the R/S chip set. The approach taken was to verify the error correcting behavior of the chip set by injecting known error patterns onto data and observing the results. Error sequences were generated using pseudo-random number generator programs, with Poisson time distribution between errors and Gaussian burst lengths. Sample means, variances, and number of un-correctable errors were calculated for each data set before testing.
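
    An error-pattern generator of the kind described can be sketched as below. This is a guess at the general shape, not the report's program: "Poisson time distribution between errors" is interpreted here as exponentially distributed gaps (a Poisson process), burst lengths are rounded Gaussian draws clipped to at least one bit, and every name and parameter is hypothetical.

```python
import random


def error_bursts(n_bits, mean_gap, burst_mean, burst_sd, seed=0):
    """Return non-overlapping (start_bit, length) error bursts within n_bits."""
    rng = random.Random(seed)
    bursts = []
    pos = 0
    while True:
        # Exponential gap (Poisson-process assumption), at least one clean bit.
        pos += int(rng.expovariate(1.0 / mean_gap)) + 1
        if pos >= n_bits:
            break
        length = max(1, round(rng.gauss(burst_mean, burst_sd)))
        length = min(length, n_bits - pos)  # keep the burst inside the data
        bursts.append((pos, length))
        pos += length
    return bursts


def apply_errors(data, bursts):
    """Flip every bit covered by a burst (bit 0 is the MSB of byte 0)."""
    corrupted = bytearray(data)
    for start, length in bursts:
        for bit in range(start, start + length):
            corrupted[bit // 8] ^= 1 << (7 - bit % 8)
    return bytes(corrupted)


if __name__ == "__main__":
    block = bytes(255)  # one 255-byte block of zeros
    pattern = error_bursts(len(block) * 8, mean_gap=300, burst_mean=6, burst_sd=2, seed=42)
    damaged = apply_errors(block, pattern)
    print(len(pattern), "bursts,", sum(b != 0 for b in damaged), "bytes hit")
```

    Because the pattern is returned explicitly, the injected errors are known in advance and can be compared against what a decoder reports.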

  18. Algorithm to determine the percolation largest component in interconnected networks.

    PubMed

    Schneider, Christian M; Araújo, Nuno A M; Herrmann, Hans J

    2013-04-01

    Interconnected networks have been shown to be much more vulnerable to random and targeted failures than isolated ones, raising several interesting questions regarding the identification and mitigation of their risk. The paradigm to address these questions is the percolation model, where the resilience of the system is quantified by the dependence of the size of the largest cluster on the number of failures. Numerically, the major challenge is the identification of this cluster and the calculation of its size. Here, we propose an efficient algorithm to tackle this problem. We show that the algorithm scales as O(N log N), where N is the number of nodes in the network, a significant improvement compared to O(N^2) for a greedy algorithm, which permits studying much larger networks. Our new strategy can be applied to any network topology and distribution of interdependencies, as well as any sequence of failures.
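
    The quantity discussed here, largest-cluster size as a function of the number of failures, can be illustrated for a single isolated network. The sketch below is not the authors' algorithm for interconnected networks. It uses a standard union-find structure and processes the failure sequence in reverse, restoring nodes one by one, so that the whole curve is obtained in near-linear time. Function and variable names are assumptions.

```python
def largest_cluster_curve(n_nodes, edges, failure_order):
    """curve[k] = size of the largest cluster after the first k failures.

    failure_order must list every node exactly once.
    """
    neighbors = [[] for _ in range(n_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    parent = list(range(n_nodes))
    size = [1] * n_nodes
    alive = [False] * n_nodes

    def find(x):
        # Root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    largest = 0
    curve = [0]  # state with every node failed
    for node in reversed(failure_order):  # undo failures one at a time
        alive[node] = True
        largest = max(largest, 1)
        for nb in neighbors[node]:
            if alive[nb]:
                a, b = find(node), find(nb)
                if a != b:
                    if size[a] < size[b]:
                        a, b = b, a
                    parent[b] = a  # union by size
                    size[a] += size[b]
                    largest = max(largest, size[a])
        curve.append(largest)
    curve.reverse()
    return curve


if __name__ == "__main__":
    path_edges = [(0, 1), (1, 2), (2, 3)]
    print(largest_cluster_curve(4, path_edges, failure_order=[1, 0, 2, 3]))
```

    Recomputing the components from scratch after each failure would cost a full traversal per step; the reverse-order trick avoids that for a single network, while handling interdependencies between networks requires the more elaborate method the abstract refers to.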

  19. Improved Targeting Through Collaborative Decision-Making and Brain Computer Interfaces

    NASA Technical Reports Server (NTRS)

    Stoica, Adrian; Barrero, David F.; McDonald-Maier, Klaus

    2013-01-01

    This paper reports a first step toward a brain-computer interface (BCI) for collaborative targeting. Specifically, we explore, from a broad perspective, how the collaboration of a group of people can increase the performance on a simple target identification task. To this end, we asked a group of people to identify the location and color of a sequence of targets appearing on the screen and measured the time and accuracy of the response. The individual results are compared to a collective identification result determined by simple majority voting, with random choice in case of a draw. The results are promising, as the identification becomes significantly more reliable even with this simple voting and a small number of people (either odd or even number) involved in the decision. In addition, the paper briefly analyzes the role of brain-computer interfaces in collaborative targeting, extending the targeting task by using a BCI instead of a mechanical response.
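
    The voting rule mentioned in the abstract (simple majority, random choice on a draw) can be written in a few lines. The sketch below is illustrative only; the function name and the example answers are made up.

```python
import random
from collections import Counter


def majority_vote(answers, rng=random):
    """Return the most common answer; break ties uniformly at random."""
    counts = Counter(answers)
    top = max(counts.values())
    tied = sorted(option for option, count in counts.items() if count == top)
    return tied[0] if len(tied) == 1 else rng.choice(tied)


if __name__ == "__main__":
    print(majority_vote(["red", "red", "blue"]))  # clear majority
    print(majority_vote(["red", "blue"], rng=random.Random(0)))  # draw, random pick
```

    With an even number of voters draws are possible, which is why the tie-breaking rule matters for the "either odd or even number" case noted in the abstract.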

  20. Dominant genetics using a yeast genomic library under the control of a strong inducible promoter.

    PubMed

    Ramer, S W; Elledge, S J; Davis, R W

    1992-12-01

    In Saccharomyces cerevisiae, numerous genes have been identified by selection from high-copy-number libraries based on "multicopy suppression" or other phenotypic consequences of overexpression. Although fruitful, this approach suffers from two major drawbacks. First, high copy number alone may not permit high-level expression of tightly regulated genes. Conversely, other genes expressed in proportion to dosage cannot be identified if their products are toxic at elevated levels. This work reports construction of a genomic DNA expression library for S. cerevisiae that circumvents both limitations by fusing randomly sheared genomic DNA to the strong, inducible yeast GAL1 promoter, which can be regulated by carbon source. The library obtained contains 5 × 10^7 independent recombinants, representing a breakpoint at every base in the yeast genome. This library was used to examine aberrant gene expression in S. cerevisiae. A screen for dominant activators of yeast mating response identified eight genes that activate the pathway in the absence of exogenous mating pheromone, including one previously unidentified gene. One activator was a truncated STE11 gene lacking approximately 1000 base pairs of amino-terminal coding sequence. In two different clones, the same GAL1 promoter-proximal ATG is in-frame with the coding sequence of STE11, suggesting that internal initiation of translation there results in production of a biologically active, truncated STE11 protein. Thus this library allows isolation based on dominant phenotypes of genes that might have been difficult or impossible to isolate from high-copy-number libraries.

  1. Regulatory sequence analysis tools.

    PubMed

    van Helden, Jacques

    2003-07-01

    The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.

  2. Assessment of replicate bias in 454 pyrosequencing and a multi-purpose read-filtering tool.

    PubMed

    Jérôme, Mariette; Noirot, Céline; Klopp, Christophe

    2011-05-26

    The Roche 454 pyrosequencing platform is often considered the most versatile of the Next Generation Sequencing technology platforms, permitting the sequencing of large genomes, the analysis of variations or the study of transcriptomes. A recently reported bias leads to the production of multiple reads for a unique DNA fragment in a random manner within a run. This bias has a direct impact on the quality of the measurement of the representation of the fragments using the reads. Other cleaning steps are usually performed on the reads before assembly or alignment. PyroCleaner is a software module intended to clean 454 pyrosequencing reads in order to ease the assembly process. This program is free software and is distributed under the terms of the GNU General Public License as published by the Free Software Foundation. It implements several filters using criteria such as read duplication, length, complexity, base-pair quality and number of undetermined bases. It also allows cleaning of flowgram files (.sff) of paired-end sequences, generating on the one hand a file of validated paired ends and on the other hand a file of single reads. Read cleaning has always been an important step in sequence analysis. The pyrocleaner Python module is a Swiss Army knife dedicated to cleaning 454 reads. It includes commonly used filters as well as specialised ones such as duplicated read removal and paired-end read verification.

  3. Perception of randomness: On the time of streaks.

    PubMed

    Sun, Yanlong; Wang, Hongbin

    2010-12-01

    People tend to think that streaks in random sequential events are rare and remarkable. When they actually encounter streaks, they tend to consider the underlying process as non-random. The present paper examines the time of pattern occurrences in sequences of Bernoulli trials, and shows that among all patterns of the same length, a streak is the most delayed pattern for its first occurrence. It is argued that when time is of the essence, how often a pattern is to occur (mean time, or, frequency) and when a pattern is to first occur (waiting time) are different questions and bear different psychological relevance. The waiting time statistics may provide a quantitative measure of the psychological distance when people are expecting a probabilistic event, and such a measure is consistent with both the representativeness and availability heuristics in people's perception of randomness. We discuss some of the recent empirical findings and suggest that people's judgment and generation of random sequences may be guided by their actual experiences of the waiting time statistics. Published by Elsevier Inc.
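
    The claim that a streak is the most delayed pattern can be checked with the standard overlap formula for independent trials: the expected waiting time of a pattern is the sum, over every length k at which the pattern's first k symbols equal its last k symbols, of 1 / P(first k symbols). The sketch below applies that formula; the function name and the H/T encoding are choices made for this example, not taken from the paper.

```python
from itertools import product


def expected_waiting_time(pattern, p_heads=0.5):
    """Expected number of Bernoulli trials until `pattern` (a string of H/T) first appears."""
    prob = {"H": p_heads, "T": 1.0 - p_heads}
    total = 0.0
    for k in range(1, len(pattern) + 1):
        if pattern[:k] == pattern[-k:]:  # prefix of length k overlaps the suffix
            weight = 1.0
            for symbol in pattern[:k]:
                weight /= prob[symbol]
            total += weight
    return total


if __name__ == "__main__":
    times = {"".join(p): expected_waiting_time("".join(p)) for p in product("HT", repeat=4)}
    for pattern, t in sorted(times.items(), key=lambda item: item[1]):
        print(pattern, t)
```

    For a fair coin every length-4 pattern has the same long-run frequency (once per 16 positions on average), yet the streaks HHHH and TTTT take 30 tosses on average to appear for the first time, compared with 16 for a pattern such as HHHT.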

  4. Cluster Tails for Critical Power-Law Inhomogeneous Random Graphs

    NASA Astrophysics Data System (ADS)

    van der Hofstad, Remco; Kliem, Sandra; van Leeuwaarden, Johan S. H.

    2018-04-01

    Recently, the scaling limit of cluster sizes for critical inhomogeneous random graphs of rank-1 type having finite variance but infinite third moment degrees was obtained in Bhamidi et al. (Ann Probab 40:2299-2361, 2012). It was proved that when the degrees obey a power law with exponent τ ∈ (3,4), the sequence of clusters ordered in decreasing size and multiplied through by n^{-(τ-2)/(τ-1)} converges as n → ∞ to a sequence of decreasing non-degenerate random variables. Here, we study the tails of the limit of the rescaled largest cluster, i.e., the probability that the scaling limit of the largest cluster takes a large value u, as a function of u. This extends a related result of Pittel (J Combin Theory Ser B 82(2):237-269, 2001) for the Erdős-Rényi random graph to the setting of rank-1 inhomogeneous random graphs with infinite third moment degrees. We make use of delicate large deviations and weak convergence arguments.

  5. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE PAGES

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; ...

    2015-05-12

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.

  6. Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.

    Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.

  7. Random numbers from vacuum fluctuations

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shi, Yicheng; Kurtsiefer, Christian (Center for Quantum Technologies, National University of Singapore, 3 Science Drive 2, Singapore 117543)

    2016-07-25

    We implement a quantum random number generator based on a balanced homodyne measurement of vacuum fluctuations of the electromagnetic field. The digitized signal is directly processed with a fast randomness extraction scheme based on a linear feedback shift register. The random bit stream is continuously read in a computer at a rate of about 480 Mbit/s and passes an extended test suite for random numbers.
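
    The abstract does not state the register length, the feedback polynomial, or the compression ratio of the extraction scheme. As a rough sketch only, the general idea of folding raw digitized bits into a linear feedback shift register (LFSR) and emitting fewer bits than were consumed might look like the following. The 16-bit width, the taps (x^16 + x^14 + x^13 + x^11 + 1) and the `ratio` parameter are illustrative assumptions, not the authors' design.

```python
from typing import Iterable, List

# Hypothetical register: 16 bits, taps for x^16 + x^14 + x^13 + x^11 + 1.
# The actual register used in the paper is not given in the abstract.
WIDTH = 16
TAPS = (16, 14, 13, 11)


def lfsr_extract(raw_bits: Iterable[int], ratio: int = 2, state: int = 0) -> List[int]:
    """Fold raw bits into an LFSR and keep one feedback bit per `ratio` inputs."""
    mask = (1 << WIDTH) - 1
    out: List[int] = []
    for i, bit in enumerate(raw_bits):
        # Feedback is the XOR of the incoming raw bit and the tapped state bits.
        fb = bit & 1
        for t in TAPS:
            fb ^= (state >> (t - 1)) & 1
        # Shift the feedback bit into the register.
        state = ((state << 1) | fb) & mask
        # Emit only every `ratio`-th feedback bit (compression step).
        if (i + 1) % ratio == 0:
            out.append(fb)
    return out


if __name__ == "__main__":
    raw = [1, 0, 1, 1, 0, 0, 1, 0]
    print(lfsr_extract(raw, ratio=2))
```

    Such a register only mixes and compresses its input. Any real entropy guarantee depends on the measured noise source and on an analysis that this sketch does not attempt.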

  8. Investigating the Randomness of Numbers

    ERIC Educational Resources Information Center

    Pendleton, Kenn L.

    2009-01-01

    The use of random numbers is pervasive in today's world. Random numbers have practical applications in such far-flung arenas as computer simulations, cryptography, gambling, the legal system, statistical sampling, and even the war on terrorism. Evaluating the randomness of extremely large samples is a complex, intricate process. However, the…

  9. The Origins of Order: Self-Organization and Selection in Evolution

    NASA Astrophysics Data System (ADS)

    Kauffman, Stuart A.

    The following sections are included: * Introduction * Fitness Landscapes in Sequence Space * The NK Model of Rugged Fitness Landscapes * The NK Model of Random Epistatic Interactions * The Rank Order Statistics on K = N - 1 Random Landscapes * The number of local optima is very large * The expected fraction of fitter 1-mutant neighbors dwindles by 1/2 on each improvement step * Walks to local optima are short and vary as a logarithmic function of N * The expected time to reach an optimum is proportional to the dimensionality of the space * The ratio of accepted to tried mutations scales as lnN/N * Any genotype can only climb to a small fraction of the local optima * A small fraction of the genotypes can climb to any one optimum * Conflicting constraints cause a "complexity catastrophe": as complexity increases, accessible adaptive peaks fall toward the mean fitness * The "Tunable" NK Family of Correlated Landscapes * Other Combinatorial Optimization Problems and Their Landscapes * Summary * References
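
    The section list above refers to NK landscapes and adaptive walks through 1-mutant neighbors. A minimal sketch of those two ideas, assuming binary genotypes, uniformly random fitness contributions, and randomly chosen epistatic partners, is given below. All function names and these modelling choices are illustrative assumptions rather than details taken from the record.

```python
import random
from typing import Callable, List, Sequence, Tuple


def make_nk_landscape(n: int, k: int, rng: random.Random) -> Callable[[Sequence[int]], float]:
    """Return a fitness function for N loci, each coupled to K other loci."""
    # Locus i depends on itself plus k other loci chosen at random (assumption).
    neighbors = [[i] + rng.sample([j for j in range(n) if j != i], k) for i in range(n)]
    # Contribution tables are filled lazily with uniform random values.
    tables: List[dict] = [{} for _ in range(n)]

    def fitness(genotype: Sequence[int]) -> float:
        total = 0.0
        for i in range(n):
            key = tuple(genotype[j] for j in neighbors[i])
            if key not in tables[i]:
                tables[i][key] = rng.random()
            total += tables[i][key]
        return total / n  # mean of the per-locus contributions

    return fitness


def one_mutant_neighbors(genotype: Sequence[int]):
    """Yield every genotype that differs from `genotype` at exactly one locus."""
    for i in range(len(genotype)):
        flipped = list(genotype)
        flipped[i] ^= 1
        yield flipped


def adaptive_walk(fitness, start: Sequence[int], rng: random.Random) -> Tuple[List[int], int]:
    """Step to a randomly chosen fitter 1-mutant neighbor until none exists."""
    current, steps = list(start), 0
    while True:
        f_cur = fitness(current)
        fitter = [g for g in one_mutant_neighbors(current) if fitness(g) > f_cur]
        if not fitter:
            return current, steps  # local optimum reached
        current = rng.choice(fitter)
        steps += 1


if __name__ == "__main__":
    f = make_nk_landscape(n=10, k=3, rng=random.Random(0))
    peak, steps = adaptive_walk(f, [0] * 10, random.Random(1))
    print(peak, steps, round(f(peak), 3))
```

    Raising `k` toward `n - 1` in this sketch makes the landscape more rugged, which is the regime the rank order statistics entries in the list concern.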

  10. No evidence for MHC class II-based non-random mating at the gametic haplotype in Atlantic salmon.

    PubMed

    Promerová, M; Alavioon, G; Tusso, S; Burri, R; Immler, S

    2017-06-01

    Genes of the major histocompatibility complex (MHC) are a likely target of mate choice because of their role in inbreeding avoidance and potential benefits for offspring immunocompetence. Evidence for female choice for complementary MHC alleles among competing males exists both for the pre- and the postmating stages. However, it remains unclear whether the latter may involve non-random fusion of gametes depending on gametic haplotypes resulting in transmission ratio distortion or non-random sequence divergence among fused gametes. We tested whether non-random gametic fusion of MHC-II haplotypes occurs in Atlantic salmon Salmo salar. We performed in vitro fertilizations that excluded interindividual sperm competition using a split family design with large clutch sample sizes to test for a possible role of the gametic haplotype in mate choice. We sequenced two MHC-II loci in 50 embryos per clutch to assess allelic frequencies and sequence divergence. We found no evidence for transmission ratio distortion at two linked MHC-II loci, nor for non-random gamete fusion with respect to MHC-II alleles. Our findings suggest that the gametic MHC-II haplotypes play no role in gamete association in Atlantic salmon and that earlier findings of MHC-based mate choice most likely reflect choice among diploid genotypes. We discuss possible explanations for these findings and how they differ from findings in mammals.

  11. Estimating Genomic Distance from DNA Sequence Location in Cell Nuclei by a Random Walk Model

    NASA Astrophysics Data System (ADS)

    van den Engh, Ger; Sachs, Rainer; Trask, Barbara J.

    1992-09-01

    The folding of chromatin in interphase cell nuclei was studied by fluorescent in situ hybridization with pairs of unique DNA sequence probes. The sites of DNA sequences separated by 100 to 2000 kilobase pairs (kbp) are distributed in interphase chromatin according to a random walk model. This model provides the basis for calculating the spacing of sequences along the linear DNA molecule from interphase distance measurements. An interphase mapping strategy based on this model was tested with 13 probes from a 4-megabase pair (Mbp) region of chromosome 4 containing the Huntington disease locus. The results confirmed the locations of the probes and showed that the remaining gap in the published maps of this region is negligible in size. Interphase distance measurements should facilitate construction of chromosome maps with an average marker density of one per 100 kbp, approximately ten times greater than that achieved by hybridization to metaphase chromosomes.
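
    In a random walk, the mean squared end-to-end distance grows in proportion to the number of steps, which is why a measured interphase distance can be converted back into a separation along the DNA. The sketch below illustrates that relationship with a simulated 3D Gaussian walk. The step distribution, the function names, and the calibration constant `r2_per_unit` are assumptions for illustration, and no values from the paper are used.

```python
import random


def mean_square_distance(n_steps: int, trials: int, rng: random.Random) -> float:
    """Average squared end-to-end distance of a 3D walk with unit-variance Gaussian steps."""
    total = 0.0
    for _ in range(trials):
        x = y = z = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, 1.0)
            y += rng.gauss(0.0, 1.0)
            z += rng.gauss(0.0, 1.0)
        total += x * x + y * y + z * z
    return total / trials


def estimate_separation(mean_r2: float, r2_per_unit: float) -> float:
    """Invert the linear relation <r^2> = r2_per_unit * separation.

    `r2_per_unit` is a hypothetical calibration constant that would have to be
    fitted from probe pairs of known genomic separation.
    """
    return mean_r2 / r2_per_unit


if __name__ == "__main__":
    rng = random.Random(0)
    for steps in (50, 100, 200):
        # For this step model the expected value is 3 * steps.
        print(steps, round(mean_square_distance(steps, 2000, rng), 1))
```

    The estimate uses the mean of many squared distance measurements, since a single measured distance from one nucleus is too variable to invert on its own.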

  12. Artificial neural network study on organ-targeting peptides

    NASA Astrophysics Data System (ADS)

    Jung, Eunkyoung; Kim, Junhyoung; Choi, Seung-Hoon; Kim, Minkyoung; Rhee, Hokyoung; Shin, Jae-Min; Choi, Kihang; Kang, Sang-Kee; Lee, Nam Kyung; Choi, Yun-Jaie; Jung, Dong Hyun

    2010-01-01

    We report a new approach to studying organ targeting of peptides on the basis of peptide sequence information. The positive control data sets consist of organ-targeting peptide sequences identified by the peroral phage-display technique for four organs, and the negative control data are prepared from random sequences. The capacity of our models to make appropriate predictions is validated by statistical indicators including sensitivity, specificity, enrichment curve, and the area under the receiver operating characteristic (ROC) curve (the ROC score). VHSE descriptor produces statistically significant training models and the models with simple neural network architectures show slightly greater predictive power than those with complex ones. The training and test set statistics indicate that our models could discriminate between organ-targeting and random sequences. We anticipate that our models will be applicable to the selection of organ-targeting peptides for generating peptide drugs or peptidomimetics.

  13. Genetic discovery in Xylella fastidiosa through sequence analysis of selected randomly amplified polymorphic DNAs.

    PubMed

    Chen, Jianchi; Civerolo, Edwin L; Jarret, Robert L; Van Sluys, Marie-Anne; de Oliveira, Mariana C

    2005-02-01

    Xylella fastidiosa causes many important plant diseases including Pierce's disease (PD) in grape and almond leaf scorch disease (ALSD). DNA-based methodologies, such as randomly amplified polymorphic DNA (RAPD) analysis, have been playing key roles in genetic information collection of the bacterium. This study further analyzed the nucleotide sequences of selected RAPDs from X. fastidiosa strains in conjunction with the available genome sequence databases and unveiled several previously unknown novel genetic traits. These include a sequence highly similar to those in the phage family of Podoviridae. Genome comparisons among X. fastidiosa strains suggested that the "phage" is currently active. Two other RAPDs were also related to horizontal gene transfer: one was part of a broadly distributed cryptic plasmid and the other was associated with conjugal transfer. One RAPD inferred a genomic rearrangement event among X. fastidiosa PD strains and another identified a single nucleotide polymorphism of evolutionary value.

  14. Using random forests for assistance in the curation of G-protein coupled receptor databases.

    PubMed

    Shkurin, Aleksei; Vellido, Alfredo

    2017-08-18

    Biology is experiencing a gradual but fast transformation from a laboratory-centred science towards a data-centred one. As such, it requires robust data engineering and the use of quantitative data analysis methods as part of database curation. This paper focuses on G protein-coupled receptors, a large and heterogeneous super-family of cell membrane proteins of interest to biology in general. One of its families, Class C, is of particular interest to pharmacology and drug design. This family is quite heterogeneous on its own, and the discrimination of its several sub-families is a challenging problem. In the absence of known crystal structure, such discrimination must rely on their primary amino acid sequences. We are interested not as much in achieving maximum sub-family discrimination accuracy using quantitative methods, but in exploring sequence misclassification behavior. Specifically, we are interested in isolating those sequences showing consistent misclassification, that is, sequences that are very often misclassified and almost always to the same wrong sub-family. Random forests are used for this analysis due to their ensemble nature, which makes them naturally suited to gauge the consistency of misclassification. This consistency is here defined through the voting scheme of their base tree classifiers. Detailed consistency results for the random forest ensemble classification were obtained for all receptors and for all data transformations of their unaligned primary sequences. Shortlists of the most consistently misclassified receptors for each subfamily and transformation, as well as an overall shortlist including those cases that were consistently misclassified across transformations, were obtained. The latter should be referred to experts for further investigation as a data curation task. The automatic discrimination of the Class C sub-families of G protein-coupled receptors from their unaligned primary sequences shows clear limits. 
This study has investigated in some detail the consistency of their misclassification using random forest ensemble classifiers. Different sub-families have been shown to display very different discrimination consistency behaviors. The individual identification of consistently misclassified sequences should provide a tool for quality control to GPCR database curators.

  15. Multimodal brain-tumor segmentation based on Dirichlet process mixture model with anisotropic diffusion and Markov random field prior.

    PubMed

    Lu, Yisu; Jiang, Jun; Yang, Wei; Feng, Qianjin; Chen, Wufan

    2014-01-01

    Brain-tumor segmentation is an important clinical requirement for brain-tumor diagnosis and radiotherapy planning. It is well-known that the number of clusters is one of the most important parameters for automatic segmentation. However, it is difficult to define owing to the high diversity in appearance of tumor tissue among different patients and the ambiguous boundaries of lesions. In this study, a nonparametric mixture of Dirichlet process (MDP) model is applied to segment the tumor images, and the MDP segmentation can be performed without the initialization of the number of clusters. Because the classical MDP segmentation cannot be applied for real-time diagnosis, a new nonparametric segmentation algorithm combined with anisotropic diffusion and a Markov random field (MRF) smooth constraint is proposed in this study. Besides the segmentation of single modal brain-tumor images, we developed the algorithm to segment multimodal brain-tumor images by the magnetic resonance (MR) multimodal features and obtain the active tumor and edema at the same time. The proposed algorithm is evaluated using 32 multimodal MR glioma image sequences, and the segmentation results are compared with other approaches. The accuracy and computation time of our algorithm demonstrate very impressive performance and have great potential for practical real-time clinical use.

  16. Multimodal Brain-Tumor Segmentation Based on Dirichlet Process Mixture Model with Anisotropic Diffusion and Markov Random Field Prior

    PubMed Central

    Lu, Yisu; Jiang, Jun; Chen, Wufan

    2014-01-01

    Brain-tumor segmentation is an important clinical requirement for brain-tumor diagnosis and radiotherapy planning. It is well-known that the number of clusters is one of the most important parameters for automatic segmentation. However, it is difficult to define owing to the high diversity in appearance of tumor tissue among different patients and the ambiguous boundaries of lesions. In this study, a nonparametric mixture of Dirichlet process (MDP) model is applied to segment the tumor images, and the MDP segmentation can be performed without the initialization of the number of clusters. Because the classical MDP segmentation cannot be applied for real-time diagnosis, a new nonparametric segmentation algorithm combined with anisotropic diffusion and a Markov random field (MRF) smooth constraint is proposed in this study. Besides the segmentation of single modal brain-tumor images, we developed the algorithm to segment multimodal brain-tumor images by the magnetic resonance (MR) multimodal features and obtain the active tumor and edema at the same time. The proposed algorithm is evaluated using 32 multimodal MR glioma image sequences, and the segmentation results are compared with other approaches. The accuracy and computation time of our algorithm demonstrate very impressive performance and have great potential for practical real-time clinical use. PMID:25254064

  17. Frequency of RNA–RNA interaction in a model of the RNA World

    PubMed Central

    STRIGGLES, JOHN C.; MARTIN, MATTHEW B.; SCHMIDT, FRANCIS J.

    2006-01-01

    The RNA World model for prebiotic evolution posits the selection of catalytic/template RNAs from random populations. The mechanisms by which these random populations could be generated de novo are unclear. Non-enzymatic and RNA-catalyzed nucleic acid polymerizations are poorly processive, which means that the resulting short-chain RNA population could contain only limited diversity. Nonreciprocal recombination of smaller RNAs provides an alternative mechanism for the assembly of larger species with concomitantly greater structural diversity; however, the frequency of any specific recombination event in a random RNA population is limited by the low probability of an encounter between any two given molecules. This low probability could be overcome if the molecules capable of productive recombination were redundant, with many nonhomologous but functionally equivalent RNAs being present in a random population. Here we report fluctuation experiments to estimate the redundancy of the set of RNAs in a population of random sequences that are capable of non-Watson-Crick interaction with another RNA. Parallel SELEX experiments showed that at least one in 10^6 random 20-mers binds to the P5.1 stem–loop of Bacillus subtilis RNase P RNA with affinities equal to that of its naturally occurring partner. This high frequency predicts that a single RNA in an RNA World would encounter multiple interacting RNAs within its lifetime, supporting recombination as a plausible mechanism for prebiotic RNA evolution. The large number of equivalent species implies that the selection of any single interacting species in the RNA World would be a contingent event, i.e., one resulting from historical accident. PMID:16495233

  18. Tauberian theorems for Abel summability of sequences of fuzzy numbers

    NASA Astrophysics Data System (ADS)

    Yavuz, Enes; Çoşkun, Hüsamettin

    2015-09-01

    We give some conditions under which Abel summable sequences of fuzzy numbers are convergent. As corollaries we obtain the results given in [E. Yavuz, Ö. Talo, Abel summability of sequences of fuzzy numbers, Soft computing 2014, doi: 10.1007/s00500-014-1563-7].

  19. Fundamental Bounds for Sequence Reconstruction from Nanopore Sequencers.

    PubMed

    Magner, Abram; Duda, Jarosław; Szpankowski, Wojciech; Grama, Ananth

    2016-06-01

    Nanopore sequencers are emerging as promising new platforms for high-throughput sequencing. As with other technologies, sequencer errors pose a major challenge for their effective use. In this paper, we present a novel information theoretic analysis of the impact of insertion-deletion (indel) errors in nanopore sequencers. In particular, we consider the following problems: (i) for given indel error characteristics and rate, what is the probability of accurate reconstruction as a function of sequence length; (ii) using replicated extrusion (the process of passing a DNA strand through the nanopore), what is the number of replicas needed to accurately reconstruct the true sequence with high probability? Our results provide a number of important insights: (i) the probability of accurate reconstruction of a sequence from a single sample in the presence of indel errors tends quickly (i.e., exponentially) to zero as the length of the sequence increases; and (ii) replicated extrusion is an effective technique for accurate reconstruction. We show that for typical distributions of indel errors, the required number of replicas is a slow function (polylogarithmic) of sequence length - implying that through replicated extrusion, we can sequence large reads using nanopore sequencers. Moreover, we show that in certain cases, the required number of replicas can be related to information-theoretic parameters of the indel error distributions.

  20. The Use of a Sequenced Questioning Paradigm to Facilitate Associative Fluency in Preschoolers.

    ERIC Educational Resources Information Center

    Pellegrini, A. D.; Greene, Helen

    The extent to which free play versus sequenced questioning conditions facilitates preschoolers' associative fluency was investigated in this study. Twenty-four children (12 boys and 12 girls, with a mean age of 50.7 months) were randomly assigned to one of three conditions: free play, sequenced questioning, and control. In the sequenced…
