randomly sampled sequences: Topics by Science.gov

Sample records for randomly sampled sequences

Reduction of display artifacts by random sampling

NASA Technical Reports Server (NTRS)

Ahumada, A. J., Jr.; Nagel, D. C.; Watson, A. B.; Yellott, J. I., Jr.

1983-01-01

The application of random-sampling techniques to remove visible artifacts (such as flicker, moire patterns, and paradoxical motion) introduced in TV-type displays by discrete sequential scanning is discussed and demonstrated. Sequential-scanning artifacts are described; the window of visibility defined in spatiotemporal frequency space by Watson and Ahumada (1982 and 1983) and Watson et al. (1983) is explained; the basic principles of random sampling are reviewed and illustrated by the case of the human retina; and it is proposed that the sampling artifacts can be replaced by random noise, which can then be shifted to frequency-space regions outside the window of visibility. Vertical sequential, single-random-sequence, and continuously renewed random-sequence plotting displays generating 128 points at update rates up to 130 Hz are applied to images of stationary and moving lines, and best results are obtained with the single random sequence for the stationary lines and with the renewed random sequence for the moving lines.
Molecular analysis of the microbial diversity present in the colonic wall, colonic lumen, and cecal lumen of a pig.

PubMed

Pryde, S E; Richardson, A J; Stewart, C S; Flint, H J

1999-12-01

Random clones of 16S ribosomal DNA gene sequences were isolated after PCR amplification with eubacterial primers from total genomic DNA recovered from samples of the colonic lumen, colonic wall, and cecal lumen from a pig. Sequences were also obtained for cultures isolated anaerobically from the same colonic-wall sample. Phylogenetic analysis showed that many sequences were related to those of Lactobacillus or Streptococcus spp. or fell into clusters IX, XIVa, and XI of gram-positive bacteria. In addition, 59% of randomly cloned sequences showed less than 95% similarity to database entries or sequences from cultivated organisms. Cultivation bias is also suggested by the fact that the majority of isolates (54%) recovered from the colon wall by culturing were related to Lactobacillus and Streptococcus, whereas this group accounted for only one-third of the sequence variation for the same sample from random cloning. The remaining cultured isolates were mainly Selenomonas related. A higher proportion of Lactobacillus reuteri-related sequences than of Lactobacillus acidophilus- and Lactobacillus amylovorus-related sequences were present in the colonic-wall sample. Since the majority of bacterial ribosomal sequences recovered from the colon wall are less than 95% related to known organisms, the roles of many of the predominant wall-associated bacteria remain to be defined.
Molecular Analysis of the Microbial Diversity Present in the Colonic Wall, Colonic Lumen, and Cecal Lumen of a Pig

PubMed Central

Pryde, Susan E.; Richardson, Anthony J.; Stewart, Colin S.; Flint, Harry J.

1999-01-01

Random clones of 16S ribosomal DNA gene sequences were isolated after PCR amplification with eubacterial primers from total genomic DNA recovered from samples of the colonic lumen, colonic wall, and cecal lumen from a pig. Sequences were also obtained for cultures isolated anaerobically from the same colonic-wall sample. Phylogenetic analysis showed that many sequences were related to those of Lactobacillus or Streptococcus spp. or fell into clusters IX, XIVa, and XI of gram-positive bacteria. In addition, 59% of randomly cloned sequences showed less than 95% similarity to database entries or sequences from cultivated organisms. Cultivation bias is also suggested by the fact that the majority of isolates (54%) recovered from the colon wall by culturing were related to Lactobacillus and Streptococcus, whereas this group accounted for only one-third of the sequence variation for the same sample from random cloning. The remaining cultured isolates were mainly Selenomonas related. A higher proportion of Lactobacillus reuteri-related sequences than of Lactobacillus acidophilus- and Lactobacillus amylovorus-related sequences were present in the colonic-wall sample. Since the majority of bacterial ribosomal sequences recovered from the colon wall are less than 95% related to known organisms, the roles of many of the predominant wall-associated bacteria remain to be defined. PMID:10583991
Novel application of the MSSCP method in biodiversity studies.

PubMed

Tomczyk-Żak, Karolina; Kaczanowski, Szymon; Górecka, Magdalena; Zielenkiewicz, Urszula

2012-02-01

Analysis of 16S rRNA sequence diversity is widely performed for characterizing the biodiversity of microbial samples. The number of determined sequences has a considerable impact on complete results. Although the cost of mass sequencing is decreasing, it is often still too high for individual projects. We applied the multi-temperature single-strand conformational polymorphism (MSSCP) method to decrease the number of analysed sequences. This was a novel application of this method. As a control, the same sample was analysed using random sequencing. In this paper, we adapted the MSSCP technique for screening of unique sequences of the 16S rRNA gene library and bacterial strains isolated from biofilms growing on the walls of an ancient gold mine in Poland and determined whether the results obtained by both methods differed and whether random sequencing could be replaced by MSSCP. Although it was biased towards the detection of rare sequences in the samples, the qualitative results of MSSCP were not different than those of random sequencing. Unambiguous discrimination of unique clones and strains creates an opportunity to effectively estimate the biodiversity of natural communities, especially in populations which are numerous but species poor. Copyright © 2012 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Random sampling causes the low reproducibility of rare eukaryotic OTUs in Illumina COI metabarcoding.

PubMed

Leray, Matthieu; Knowlton, Nancy

2017-01-01

DNA metabarcoding, the PCR-based profiling of natural communities, is becoming the method of choice for biodiversity monitoring because it circumvents some of the limitations inherent to traditional ecological surveys. However, potential sources of bias that can affect the reproducibility of this method remain to be quantified. The interpretation of differences in patterns of sequence abundance and the ecological relevance of rare sequences remain particularly uncertain. Here we used one artificial mock community to explore the significance of abundance patterns and disentangle the effects of two potential biases on data reproducibility: indexed PCR primers and random sampling during Illumina MiSeq sequencing. We amplified a short fragment of the mitochondrial Cytochrome c Oxidase Subunit I (COI) for a single mock sample containing equimolar amounts of total genomic DNA from 34 marine invertebrates belonging to six phyla. We used seven indexed broad-range primers and sequenced the resulting library on two consecutive Illumina MiSeq runs. The total number of Operational Taxonomic Units (OTUs) was ∼4 times higher than expected based on the composition of the mock sample. Moreover, the total number of reads for the 34 components of the mock sample differed by up to three orders of magnitude. However, 79 out of 86 of the unexpected OTUs were represented by <10 sequences that did not appear consistently across replicates. Our data suggest that random sampling of rare OTUs (e.g., small associated fauna such as parasites) accounted for most of variation in OTU presence-absence, whereas biases associated with indexed PCRs accounted for a larger amount of variation in relative abundance patterns. These results suggest that random sampling during sequencing leads to the low reproducibility of rare OTUs. We suggest that the strategy for handling rare OTUs should depend on the objectives of the study. 
Systematic removal of rare OTUs may avoid inflating diversity based on common β descriptors but will exclude positive records of taxa that are functionally important. Our results further reinforce the need for technical replicates (parallel PCR and sequencing from the same sample) in metabarcoding experimental designs. Data reproducibility should be determined empirically as it will depend upon the sequencing depth, the type of sample, the sequence analysis pipeline, and the number of replicates. Moreover, estimating relative biomasses or abundances based on read counts remains elusive at the OTU level.
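The link between random sampling and the poor reproducibility of rare OTUs can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' pipeline: the two-OTU community, its relative abundances, the read depth, and the function names `sequence_replicate` and `detection_counts` are all hypothetical, and each read is modeled as an independent draw from the community.

```python
import random
from collections import Counter


def sequence_replicate(abundances, depth, rng):
    """Draw `depth` reads at random, each OTU weighted by its relative abundance."""
    otus = list(abundances)
    weights = [abundances[otu] for otu in otus]
    return Counter(rng.choices(otus, weights=weights, k=depth))


def detection_counts(abundances, depth, n_replicates, seed=0):
    """Count in how many replicates each OTU is seen at least once."""
    rng = random.Random(seed)
    detected = Counter()
    for _ in range(n_replicates):
        reads = sequence_replicate(abundances, depth, rng)
        detected.update(reads.keys())  # +1 for every OTU present in this replicate
    return detected


# Hypothetical community: one dominant OTU and one rare OTU.
community = {"common": 0.999, "rare": 0.001}
detected = detection_counts(community, depth=1000, n_replicates=10)
print(detected["common"], detected["rare"])
```

Under these assumptions the dominant OTU appears in every replicate, while the rare OTU is expected to drop out of some replicates purely by chance, which is the presence-absence variation the abstract attributes to random sampling.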
Sequential time interleaved random equivalent sampling for repetitive signal.

PubMed

Zhao, Yijiu; Liu, Jingjing

2016-12-01

Compressed sensing (CS) based sampling techniques exhibit many advantages over other existing approaches for sparse signal spectrum sensing; they are also incorporated into non-uniform sampling signal reconstruction to improve the efficiency, such as random equivalent sampling (RES). However, in CS based RES, only one sample of each acquisition is considered in the signal reconstruction stage, and it will result in more acquisition runs and longer sampling time. In this paper, a sampling sequence is taken in each RES acquisition run, and the corresponding block measurement matrix is constructed using a Whittaker-Shannon interpolation formula. All the block matrices are combined into an equivalent measurement matrix with respect to all sampling sequences. We implemented the proposed approach with a multi-cores analog-to-digital converter (ADC), whose ADC cores are time interleaved. A prototype realization of this proposed CS based sequential random equivalent sampling method has been developed. It is able to capture an analog waveform at an equivalent sampling rate of 40 GHz while sampled at 1 GHz physically. Experiments indicate that, for a sparse signal, the proposed CS based sequential random equivalent sampling exhibits high efficiency.
Optimization and validation of sample preparation for metagenomic sequencing of viruses in clinical samples.

PubMed

Lewandowska, Dagmara W; Zagordi, Osvaldo; Geissberger, Fabienne-Desirée; Kufner, Verena; Schmutz, Stefan; Böni, Jürg; Metzner, Karin J; Trkola, Alexandra; Huber, Michael

2017-08-08

Sequence-specific PCR is the most common approach for virus identification in diagnostic laboratories. However, as specific PCR only detects pre-defined targets, novel virus strains or viruses not included in routine test panels will be missed. Recently, advances in high-throughput sequencing allow for virus-sequence-independent identification of entire virus populations in clinical samples, yet standardized protocols are needed to allow broad application in clinical diagnostics. Here, we describe a comprehensive sample preparation protocol for high-throughput metagenomic virus sequencing using random amplification of total nucleic acids from clinical samples. In order to optimize metagenomic sequencing for application in virus diagnostics, we tested different enrichment and amplification procedures on plasma samples spiked with RNA and DNA viruses. A protocol including filtration, nuclease digestion, and random amplification of RNA and DNA in separate reactions provided the best results, allowing reliable recovery of viral genomes and a good correlation of the relative number of sequencing reads with the virus input. We further validated our method by sequencing a multiplexed viral pathogen reagent containing a range of human viruses from different virus families. Our method proved successful in detecting the majority of the included viruses with high read numbers and compared well to other protocols in the field validated against the same reference reagent. Our sequencing protocol does work not only with plasma but also with other clinical samples such as urine and throat swabs. The workflow for virus metagenomic sequencing that we established proved successful in detecting a variety of viruses in different clinical samples. Our protocol supplements existing virus-specific detection strategies providing opportunities to identify atypical and novel viruses commonly not accounted for in routine diagnostic panels.
Random-effects linear modeling and sample size tables for two special crossover designs of average bioequivalence studies: the four-period, two-sequence, two-formulation and six-period, three-sequence, three-formulation designs.

PubMed

Diaz, Francisco J; Berg, Michel J; Krebill, Ron; Welty, Timothy; Gidal, Barry E; Alloway, Rita; Privitera, Michael

2013-12-01

Due to concern and debate in the epilepsy medical community and to the current interest of the US Food and Drug Administration (FDA) in revising approaches to the approval of generic drugs, the FDA is currently supporting ongoing bioequivalence studies of antiepileptic drugs, the EQUIGEN studies. During the design of these crossover studies, the researchers could not find commercial or non-commercial statistical software that quickly allowed computation of sample sizes for their designs, particularly software implementing the FDA requirement of using random-effects linear models for the analyses of bioequivalence studies. This article presents tables for sample-size evaluations of average bioequivalence studies based on the two crossover designs used in the EQUIGEN studies: the four-period, two-sequence, two-formulation design, and the six-period, three-sequence, three-formulation design. Sample-size computations assume that random-effects linear models are used in bioequivalence analyses with crossover designs. Random-effects linear models have been traditionally viewed by many pharmacologists and clinical researchers as just mathematical devices to analyze repeated-measures data. In contrast, a modern view of these models attributes an important mathematical role in theoretical formulations in personalized medicine to them, because these models not only have parameters that represent average patients, but also have parameters that represent individual patients. Moreover, the notation and language of random-effects linear models have evolved over the years. Thus, another goal of this article is to provide a presentation of the statistical modeling of data from bioequivalence studies that highlights the modern view of these models, with special emphasis on power analyses and sample-size computations.
Deep Sequencing to Identify the Causes of Viral Encephalitis

PubMed Central

Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.

2014-01-01

Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691
The Performance of the Date-Randomization Test in Phylogenetic Analyses of Time-Structured Virus Data.

PubMed

Duchêne, Sebastián; Duchêne, David; Holmes, Edward C; Ho, Simon Y W

2015-07-01

Rates and timescales of viral evolution can be estimated using phylogenetic analyses of time-structured molecular sequences. This involves the use of molecular-clock methods, calibrated by the sampling times of the viral sequences. However, the spread of these sampling times is not always sufficient to allow the substitution rate to be estimated accurately. We conducted Bayesian phylogenetic analyses of simulated virus data to evaluate the performance of the date-randomization test, which is sometimes used to investigate whether time-structured data sets have temporal signal. An estimate of the substitution rate passes this test if its mean does not fall within the 95% credible intervals of rate estimates obtained using replicate data sets in which the sampling times have been randomized. We find that the test sometimes fails to detect rate estimates from data with no temporal signal. This error can be minimized by using a more conservative criterion, whereby the 95% credible interval of the estimate with correct sampling times should not overlap with those obtained with randomized sampling times. We also investigated the behavior of the test when the sampling times are not uniformly distributed throughout the tree, which sometimes occurs in empirical data sets. The test performs poorly in these circumstances, such that a modification to the randomization scheme is needed. Finally, we illustrate the behavior of the test in analyses of nucleotide sequences of cereal yellow dwarf virus. Our results validate the use of the date-randomization test and allow us to propose guidelines for interpretation of its results. © The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
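The two pass criteria described in this abstract can be written as simple interval checks. The sketch below assumes the rate estimates and their 95% credible intervals have already been obtained from a Bayesian phylogenetic analysis; the function names and the numeric values are hypothetical and serve only to show how the two criteria differ.

```python
import random


def randomize_dates(sampling_dates, rng):
    """Return the sampling dates shuffled among the sequences (one replicate)."""
    shuffled = list(sampling_dates)
    rng.shuffle(shuffled)
    return shuffled


def passes_mean_criterion(real_mean, randomized_intervals):
    """Standard criterion: the mean rate estimate lies outside every randomized 95% interval."""
    return all(not (low <= real_mean <= high) for low, high in randomized_intervals)


def passes_overlap_criterion(real_interval, randomized_intervals):
    """Conservative criterion: the real 95% interval overlaps no randomized 95% interval."""
    real_low, real_high = real_interval
    return all(real_high < low or high < real_low for low, high in randomized_intervals)


# Hypothetical substitution-rate estimates (substitutions/site/year).
real_mean = 1.0e-3
real_interval = (4.0e-4, 1.6e-3)
randomized = [(1.0e-5, 5.0e-4), (2.0e-5, 3.0e-4)]

print(passes_mean_criterion(real_mean, randomized))         # True
print(passes_overlap_criterion(real_interval, randomized))  # False
```

In this made-up case the estimate passes the standard criterion but fails the conservative one, because the first randomized interval overlaps the lower end of the real interval.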
Fossils out of sequence: Computer simulations and strategies for dealing with stratigraphic disorder

DOE Office of Scientific and Technical Information (OSTI.GOV)

Cutler, A.H.; Flessa, K.W.

Microstratigraphic resolution is limited by vertical mixing and reworking of fossils. Stratigraphic disorder is the degree to which fossils within a stratigraphic sequence are not in proper chronological order. Stratigraphic disorder arises through in situ vertical mixing of fossils and reworking of older fossils into younger deposits. The authors simulated the effects of mixing and reworking by simple computer models, and measured stratigraphic disorder using rank correlation between age and stratigraphic position (Spearman and Kendall coefficients). Mixing was simulated by randomly transposing pairs of adjacent fossils in a sequence. Reworking was simulated by randomly inserting older fossils into a younger sequence. Mixing is an inefficient means of producing disorder; after 500 mixing steps stratigraphic order is still significant at the 99% to 95% level, depending on the coefficient used. Reworking disorders sequences very efficiently: significant order begins to be lost when reworked shells make up 35% of the sequence. Thus a sequence can be dominated by undisturbed, autochthonous shells and still be disordered. The effects of mixing-produced disorder can be minimized by increasing sample size at each horizon. Increased spacing between samples is of limited utility in dealing with disordered sequences: while widely separated samples are more likely to be stratigraphically ordered, the smaller number of samples makes the detection of trends problematic.
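The mixing and reworking models described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' program: the sequence length, the use of integer deposition order as a stand-in for age, the way reworked fossils are numbered, and all function names are assumptions. Only the Spearman coefficient is shown, computed with the no-ties formula.

```python
import random


def spearman_rho(values):
    """Spearman rank correlation between position (0..n-1) and value; assumes no ties."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, index in enumerate(order):
        ranks[index] = rank
    d_squared = sum((i - ranks[i]) ** 2 for i in range(n))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def mix(sequence, steps, rng):
    """Mixing: transpose a randomly chosen pair of adjacent fossils, `steps` times."""
    seq = list(sequence)
    for _ in range(steps):
        i = rng.randrange(len(seq) - 1)
        seq[i], seq[i + 1] = seq[i + 1], seq[i]
    return seq


def rework(sequence, fraction, rng):
    """Reworking: insert older fossils at random positions until they form `fraction` of the result."""
    seq = list(sequence)
    n_reworked = round(fraction * len(seq) / (1 - fraction))
    oldest = min(seq)
    for k in range(1, n_reworked + 1):
        # Each reworked fossil is older than anything in the original sequence.
        seq.insert(rng.randrange(len(seq) + 1), oldest - k)
    return seq


rng = random.Random(42)
ordered = list(range(100))  # deposition order: 0 = oldest, at the bottom
print(spearman_rho(ordered))                      # 1.0 for a perfectly ordered sequence
print(spearman_rho(mix(ordered, 500, rng)))       # order after 500 adjacent swaps
print(spearman_rho(rework(ordered, 0.35, rng)))   # order with 35% reworked fossils
```

Because adjacent swaps move a fossil only one position at a time, the rank correlation degrades slowly under mixing, whereas a reworked fossil can land anywhere in the sequence; the sketch lets a reader compare the two effects for parameters of their choosing.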
Revisiting sample size: are big trials the answer?

PubMed

Lurati Buse, Giovanna A L; Botto, Fernando; Devereaux, P J

2012-07-18

The superiority of the evidence generated in randomized controlled trials over observational data is not only conditional to randomization. Randomized controlled trials require proper design and implementation to provide a reliable effect estimate. Adequate random sequence generation, allocation implementation, analyses based on the intention-to-treat principle, and sufficient power are crucial to the quality of a randomized controlled trial. Power, or the probability of the trial to detect a difference when a real difference between treatments exists, strongly depends on sample size. The quality of orthopaedic randomized controlled trials is frequently threatened by a limited sample size. This paper reviews basic concepts and pitfalls in sample-size estimation and focuses on the importance of large trials in the generation of valid evidence.
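The dependence of power on sample size can be made concrete with the standard normal-approximation formula for comparing two means, n per group = 2(z_{1-α/2} + z_{1-β})² σ² / δ². The sketch below is a textbook approximation and is not taken from the paper; the function name, the default α and power, and the example effect sizes are assumptions.

```python
import math
from statistics import NormalDist


def sample_size_per_group(delta, sigma, alpha=0.05, power=0.80):
    """Approximate n per arm to detect a mean difference `delta` (two-sided test, equal arms)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_beta = NormalDist().inv_cdf(power)           # quantile corresponding to the desired power
    n = 2 * (z_alpha + z_beta) ** 2 * sigma ** 2 / delta ** 2
    return math.ceil(n)


print(sample_size_per_group(delta=0.5, sigma=1.0))   # 63
print(sample_size_per_group(delta=0.25, sigma=1.0))  # 252
```

Halving the detectable difference roughly quadruples the required sample size, which is why trials aimed at small but clinically relevant effects need to be large.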
[Krigle estimation and its simulated sampling of Chilo suppressalis population density].

PubMed

Yuan, Zheming; Bai, Lianyang; Wang, Kuiwu; Hu, Xiangyue

2004-07-01

In order to draw up a rational sampling plan for the larval population of Chilo suppressalis, an original population and its two derivative populations, a random population and a sequence population, were sampled and compared using random sampling, gap-range-random sampling, and a new systematic sampling scheme that integrates Krigle interpolation with a random original position. For the original population, whose distribution was aggregated and whose dependence range in the line direction was 115 cm (6.9 units), gap-range-random sampling in the line direction was more precise than random sampling. Correctly distinguishing the population pattern is the key to better precision. Gap-range-random sampling and random sampling are suited to aggregated and random populations, respectively, but both are difficult to apply in practice. Therefore, a new systematic sampling scheme, named the Krigle sample (n = 441), was developed to estimate the density of the partial sample (partial estimation, n = 441) and of the population (overall estimation, N = 1500). For the original population, the Krigle sample estimated both the partial sample and the population more precisely than the investigation sample did. As the aggregation intensity of the population increased, the Krigle sample was more effective than the investigation sample in both partial and overall estimation, provided the sampling gap was chosen appropriately according to the dependence range.
Method of multiplexed analysis using ion mobility spectrometer

DOEpatents

Belov, Mikhail E [Richland, WA; Smith, Richard D [Richland, WA

2009-06-02

A method for analyzing analytes from a sample introduced into a Spectrometer by generating a pseudo random sequence of modulation bins, organizing each modulation bin as a series of submodulation bins, thereby forming an extended pseudo random sequence of submodulation bins, releasing the analytes in a series of analyte packets into a Spectrometer, thereby generating an unknown original ion signal vector, detecting the analytes at a detector, and characterizing the sample using the plurality of analyte signal subvectors. The method is advantageously applied to an Ion Mobility Spectrometer, and an Ion Mobility Spectrometer interfaced with a Time of Flight Mass Spectrometer.
Rapid Quantification of Mutant Fitness in Diverse Bacteria by Sequencing Randomly Bar-Coded Transposons

PubMed Central

Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; Lamson, Jacob S.; He, Jennifer; Hoover, Cindi A.; Blow, Matthew J.; Bristow, James; Butland, Gareth

2015-01-01

ABSTRACT Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative d-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling. PMID:25968644
Generation of Aptamers from A Primer-Free Randomized ssDNA Library Using Magnetic-Assisted Rapid Aptamer Selection

NASA Astrophysics Data System (ADS)

Tsao, Shih-Ming; Lai, Ji-Ching; Horng, Horng-Er; Liu, Tu-Chen; Hong, Chin-Yih

2017-04-01

Aptamers are oligonucleotides that can bind to specific target molecules. Most aptamers are generated using random libraries in the standard systematic evolution of ligands by exponential enrichment (SELEX). Each random library contains oligonucleotides with a randomized central region and two fixed primer regions at both ends. The fixed primer regions are necessary for amplifying target-bound sequences by PCR. However, these extra-sequences may cause non-specific bindings, which potentially interfere with good binding for random sequences. The Magnetic-Assisted Rapid Aptamer Selection (MARAS) is a newly developed protocol for generating single-strand DNA aptamers. No repeat selection cycle is required in the protocol. This study proposes and demonstrates a method to isolate aptamers for C-reactive proteins (CRP) from a randomized ssDNA library containing no fixed sequences at 5′ and 3′ termini using the MARAS platform. Furthermore, the isolated primer-free aptamer was sequenced and binding affinity for CRP was analyzed. The specificity of the obtained aptamer was validated using blind serum samples. The result was consistent with monoclonal antibody-based nephelometry analysis, which indicated that a primer-free aptamer has high specificity toward targets. MARAS is a feasible platform for efficiently generating primer-free aptamers for clinical diagnoses.
FastRNABindR: Fast and Accurate Prediction of Protein-RNA Interface Residues.

PubMed

El-Manzalawy, Yasser; Abbas, Mostafa; Malluhi, Qutaibah; Honavar, Vasant

2016-01-01

A wide range of biological processes, including regulation of gene expression, protein synthesis, and replication and assembly of many viruses are mediated by RNA-protein interactions. However, experimental determination of the structures of protein-RNA complexes is expensive and technically challenging. Hence, a number of computational tools have been developed for predicting protein-RNA interfaces. Some of the state-of-the-art protein-RNA interface predictors rely on position-specific scoring matrix (PSSM)-based encoding of the protein sequences. The computational efforts needed for generating PSSMs severely limits the practical utility of protein-RNA interface prediction servers. In this work, we experiment with two approaches, random sampling and sequence similarity reduction, for extracting a representative reference database of protein sequences from more than 50 million protein sequences in UniRef100. Our results suggest that random sampled databases produce better PSSM profiles (in terms of the number of hits used to generate the profile and the distance of the generated profile to the corresponding profile generated using the entire UniRef100 data as well as the accuracy of the machine learning classifier trained using these profiles). Based on our results, we developed FastRNABindR, an improved version of RNABindR for predicting protein-RNA interface residues using PSSM profiles generated using 1% of the UniRef100 sequences sampled uniformly at random. To the best of our knowledge, FastRNABindR is the only protein-RNA interface residue prediction online server that requires generation of PSSM profiles for query sequences and accepts hundreds of protein sequences per submission. 
Our approach for determining the optimal BLAST database for a protein-RNA interface residue classification task has the potential of substantially speeding up, and hence increasing the practical utility of, other amino acid sequence based predictors of protein-protein and protein-DNA interfaces.
Effects of 16S rDNA sampling on estimates of the number of endosymbiont lineages in sucking lice

PubMed Central

Burleigh, J. Gordon; Light, Jessica E.; Reed, David L.

2016-01-01

Phylogenetic trees can reveal the origins of endosymbiotic lineages of bacteria and detect patterns of co-evolution with their hosts. Although taxon sampling can greatly affect phylogenetic and co-evolutionary inference, most hypotheses of endosymbiont relationships are based on few available bacterial sequences. Here we examined how different sampling strategies of Gammaproteobacteria sequences affect estimates of the number of endosymbiont lineages in parasitic sucking lice (Insecta: Phthiraptera: Anoplura). We estimated the number of louse endosymbiont lineages using both newly obtained and previously sequenced 16S rDNA bacterial sequences and more than 42,000 16S rDNA sequences from other Gammaproteobacteria. We also performed parametric and nonparametric bootstrapping experiments to examine the effects of phylogenetic error and uncertainty on these estimates. Sampling of 16S rDNA sequences affects the estimates of endosymbiont diversity in sucking lice until we reach a threshold of genetic diversity, the size of which depends on the sampling strategy. Sampling by maximizing the diversity of 16S rDNA sequences is more efficient than randomly sampling available 16S rDNA sequences. Although simulation results validate estimates of multiple endosymbiont lineages in sucking lice, the bootstrap results suggest that the precise number of endosymbiont origins is still uncertain. PMID:27547523
Direct generation of all-optical random numbers from optical pulse amplitude chaos.

PubMed

Li, Pu; Wang, Yun-Cai; Wang, An-Bang; Yang, Ling-Zhen; Zhang, Ming-Jiang; Zhang, Jian-Zhong

2012-02-13

We propose and theoretically demonstrate an all-optical method for directly generating all-optical random numbers from pulse amplitude chaos produced by a mode-locked fiber ring laser. Under an appropriate pump intensity, the mode-locked laser can experience a quasi-periodic route to chaos. Such a chaos consists of a stream of pulses with a fixed repetition frequency but random intensities. In this method, we do not require sampling procedure and external triggered clocks but directly quantize the chaotic pulses stream into random number sequence via an all-optical flip-flop. Moreover, our simulation results show that the pulse amplitude chaos has no periodicity and possesses a highly symmetric distribution of amplitude. Thus, in theory, the obtained random number sequence without post-processing has a high-quality randomness verified by industry-standard statistical tests.
PCV2d-2 is the predominant type of PCV2 DNA in pig samples collected in the U.S. during 2014-2016.

PubMed

Xiao, Chao-Ting; Harmon, Karen M; Halbur, Patrick G; Opriessnig, Tanja

2016-12-25

Porcine circovirus type 2 (PCV2) vaccination was introduced in the US in 2006 and since has been adopted by most pig producers. While porcine circovirus associated disease (PCVAD) outbreaks are now relatively uncommon in the US, PCV2 remains a concern which is emphasized by increasing numbers of PCR and sequencing requests for PCV2. In the present study, randomly selected lung tissues from 586 pigs submitted in 2015 were tested for presence of PCV2 DNA. Positive samples were further characterized by sequencing and combined with available PCV2 open-reading-frame (ORF) 2 sequences from the client data base of the Iowa State University Veterinary Diagnostic Laboratory. The prevalence of PCV2 in the randomly selected lung tissues was 23% (135/586) with 11.3% PCV2a, 16.9% PCV2b and 71.8% for PCV2d subgroup PCV2d-2. A total of 455 ORF2 sequences obtained from 2014 through 2016 were analyzed and PCV2d accounted for 66.7% of the 2014 sequences, 71.8% of the 2015 sequences, and 72% of the 2016 sequences. Interestingly, only 1.9% (9/455) of the sequences belonged to the recently identified PCV2e genotype. The present data indicate that despite an almost 100% PCV2 vaccine coverage in the US, PCV2 DNA can still be detected in almost 1 of 4 randomly selected pig tissues. PCV2d-2 is now the predominant genotype in the USA suggesting that PCV2d-2 may have some advantage over PCV2a and PCV2b in its ability to replicate in pigs under vaccination pressure. Copyright © 2016. Published by Elsevier B.V.

Rapid quantification of mutant fitness in diverse bacteria by sequencing randomly bar-coded transposons

DOE PAGES

Wetmore, Kelly M.; Price, Morgan N.; Waters, Robert J.; ...

2015-05-12

Transposon mutagenesis with next-generation sequencing (TnSeq) is a powerful approach to annotate gene function in bacteria, but existing protocols for TnSeq require laborious preparation of every sample before sequencing. Thus, the existing protocols are not amenable to the throughput necessary to identify phenotypes and functions for the majority of genes in diverse bacteria. Here, we present a method, random bar code transposon-site sequencing (RB-TnSeq), which increases the throughput of mutant fitness profiling by incorporating random DNA bar codes into Tn5 and mariner transposons and by using bar code sequencing (BarSeq) to assay mutant fitness. RB-TnSeq can be used with any transposon, and TnSeq is performed once per organism instead of once per sample. Each BarSeq assay requires only a simple PCR, and 48 to 96 samples can be sequenced on one lane of an Illumina HiSeq system. We demonstrate the reproducibility and biological significance of RB-TnSeq with Escherichia coli, Phaeobacter inhibens, Pseudomonas stutzeri, Shewanella amazonensis, and Shewanella oneidensis. To demonstrate the increased throughput of RB-TnSeq, we performed 387 successful genome-wide mutant fitness assays representing 130 different bacterium-carbon source combinations and identified 5,196 genes with significant phenotypes across the five bacteria. In P. inhibens, we used our mutant fitness data to identify genes important for the utilization of diverse carbon substrates, including a putative D-mannose isomerase that is required for mannitol catabolism. RB-TnSeq will enable the cost-effective functional annotation of diverse bacteria using mutant fitness profiling.
A large challenge in microbiology is the functional assessment of the millions of uncharacterized genes identified by genome sequencing. Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach to assign phenotypes and functions to genes. However, the current strategies for TnSeq are too laborious to be applied to hundreds of experimental conditions across multiple bacteria. Here, we describe an approach, random bar code transposon-site sequencing (RB-TnSeq), which greatly simplifies the measurement of gene fitness by using bar code sequencing (BarSeq) to monitor the abundance of mutants. We performed 387 genome-wide fitness assays across five bacteria and identified phenotypes for over 5,000 genes. RB-TnSeq can be applied to diverse bacteria and is a powerful tool to annotate uncharacterized genes using phenotype data.
A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing.

PubMed

Oono, Ryoko

2017-01-01

High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions 'how and why are communities different?' This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences.
A confidence interval analysis of sampling effort, sequencing depth, and taxonomic resolution of fungal community ecology in the era of high-throughput sequencing

PubMed Central

2017-01-01

High-throughput sequencing technology has helped microbial community ecologists explore ecological and evolutionary patterns at unprecedented scales. The benefits of a large sample size still typically outweigh that of greater sequencing depths per sample for accurate estimations of ecological inferences. However, excluding or not sequencing rare taxa may mislead the answers to the questions ‘how and why are communities different?’ This study evaluates the confidence intervals of ecological inferences from high-throughput sequencing data of foliar fungal endophytes as case studies through a range of sampling efforts, sequencing depths, and taxonomic resolutions to understand how technical and analytical practices may affect our interpretations. Increasing sampling size reliably decreased confidence intervals across multiple community comparisons. However, the effects of sequencing depths on confidence intervals depended on how rare taxa influenced the dissimilarity estimates among communities and did not significantly decrease confidence intervals for all community comparisons. A comparison of simulated communities under random drift suggests that sequencing depths are important in estimating dissimilarities between microbial communities under neutral selective processes. Confidence interval analyses reveal important biases as well as biological trends in microbial community studies that otherwise may be ignored when communities are only compared for statistically significant differences. PMID:29253889
A fosmid cloning strategy for detecting the widest possible spectrum of microbes from the international space station drinking water system.

PubMed

Choi, Sangdun; Chang, Mi Sook; Stuecker, Tara; Chung, Christine; Newcombe, David A; Venkateswaran, Kasthuri

2012-12-01

In this study, fosmid cloning strategies were used to assess the microbial populations in water from the International Space Station (ISS) drinking water system (henceforth referred to as Prebiocide and Tank A water samples). The goals of this study were: to compare the sensitivity of the fosmid cloning strategy with that of traditional culture-based and 16S rRNA-based approaches and to detect the widest possible spectrum of microbial populations during the water purification process. Initially, microbes could not be cultivated, and conventional PCR failed to amplify 16S rDNA fragments from these low biomass samples. Therefore, randomly primed rolling-circle amplification was used to amplify any DNA that might be present in the samples, followed by size selection by using pulsed-field gel electrophoresis. The amplified high-molecular-weight DNA from both samples was cloned into fosmid vectors. Several hundred clones were randomly selected for sequencing, followed by Blastn/Blastx searches. Sequences encoding specific genes from Burkholderia, a species abundant in the soil and groundwater, were found in both samples. Bradyrhizobium and Mesorhizobium, which belong to rhizobia, a large community of nitrogen fixers often found in association with plant roots, were present in the Prebiocide samples. Ralstonia, which is prevalent in soils with a high heavy metal content, was detected in the Tank A samples. The detection of many unidentified sequences suggests the presence of potentially novel microbial fingerprints. The bacterial diversity detected in this pilot study using a fosmid vector approach was higher than that detected by conventional 16S rRNA gene sequencing.
Random whole metagenomic sequencing for forensic discrimination of soils.

PubMed

Khodakova, Anastasia S; Smith, Renee J; Burgoyne, Leigh; Abarno, Damien; Linacre, Adrian

2014-01-01

Here we assess the ability of random whole metagenomic sequencing approaches to discriminate between similar soils from two geographically distinct urban sites for application in forensic science. Repeat samples from two parklands in residential areas separated by approximately 3 km were collected and the DNA was extracted. Shotgun, whole genome amplification (WGA) and single arbitrarily primed DNA amplification (AP-PCR) based sequencing techniques were then used to generate soil metagenomic profiles. Full and subsampled metagenomic datasets were then annotated against M5NR/M5RNA (taxonomic classification) and SEED Subsystems (metabolic classification) databases. Further comparative analyses were performed using a number of statistical tools including: hierarchical agglomerative clustering (CLUSTER); similarity profile analysis (SIMPROF); non-metric multidimensional scaling (NMDS); and canonical analysis of principal coordinates (CAP) at all major levels of taxonomic and metabolic classification. Our data showed that shotgun and WGA-based approaches generated highly similar metagenomic profiles for the soil samples such that the soil samples could not be distinguished accurately. An AP-PCR based approach was shown to be successful at obtaining reproducible site-specific metagenomic DNA profiles, which in turn were employed for successful discrimination of visually similar soil samples collected from two different locations.
Systematic Evaluation of the Dependence of Deoxyribozyme Catalysis on Random Region Length

PubMed Central

Velez, Tania E.; Singh, Jaydeep; Xiao, Ying; Allen, Emily C.; Wong, On Yi; Chandra, Madhavaiah; Kwon, Sarah C.; Silverman, Scott K.

2012-01-01

Functional nucleic acids are DNA and RNA aptamers that bind targets, or they are deoxyribozymes and ribozymes that have catalytic activity. These functional DNA and RNA sequences can be identified from random-sequence pools by in vitro selection, which requires choosing the length of the random region. Shorter random regions allow more complete coverage of sequence space but may not permit the structural complexity necessary for binding or catalysis. In contrast, longer random regions are sampled incompletely but may allow adoption of more complicated structures that enable function. In this study, we systematically examined random region length (N20 through N60) for two particular deoxyribozyme catalytic activities, DNA cleavage and tyrosine-RNA nucleopeptide linkage formation. For both activities, we previously identified deoxyribozymes using only N40 regions. In the case of DNA cleavage, here we found that shorter N20 and N30 regions allowed robust catalytic function, either by DNA hydrolysis or by DNA deglycosylation and strand scission via β-elimination, whereas longer N50 and N60 regions did not lead to catalytically active DNA sequences. Follow-up selections with N20, N30, and N40 regions revealed an interesting interplay of metal ion cofactors and random region length. Separately, for Tyr-RNA linkage formation, N30 and N60 regions provided catalytically active sequences, whereas N20 was unsuccessful, and the N40 deoxyribozymes were functionally superior (in terms of rate and yield) to N30 and N60. Collectively, the results indicate that with future in vitro selection experiments for DNA and RNA catalysts, and by extension for aptamers, random region length should be an important experimental variable. PMID:23088677
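The trade-off between random region length and coverage of sequence space can be made concrete with a little arithmetic. The sketch below compares the number of possible sequences, 4^N, with a pool size of 10^14 molecules; that pool size is an assumed, illustrative figure and is not taken from the abstract. The ratio is only an upper bound on the fraction of sequence space that a pool can contain.

```python
def coverage_upper_bound(region_length, pool_size=1e14):
    """Upper bound on the fraction of 4**N sequence space present in the pool.

    pool_size is a hypothetical number of molecules, chosen for illustration.
    """
    sequence_space = 4 ** region_length
    return min(1.0, pool_size / sequence_space)


# Print the bound for the random region lengths examined in the study.
for n in (20, 30, 40, 50, 60):
    print(f"N{n}: 4^{n} = {4 ** n:.2e} sequences, "
          f"coverage <= {coverage_upper_bound(n):.2e}")
```

Under this assumption an N20 pool can in principle contain every sequence, whereas an N40 pool samples well under one billionth of the possibilities.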
Viral metagenomic analysis of feces of wild small carnivores

PubMed Central

2014-01-01

Background: Recent studies have clearly demonstrated the enormous virus diversity that exists among wild animals. This exemplifies the required expansion of our knowledge of the virus diversity present in wildlife, as well as the potential transmission of these viruses to domestic animals or humans. Methods: In the present study we evaluated the viral diversity of fecal samples (n = 42) collected from 10 different species of wild small carnivores inhabiting the northern part of Spain using random PCR in combination with next-generation sequencing. Samples were collected from American mink (Neovison vison), European mink (Mustela lutreola), European polecat (Mustela putorius), European pine marten (Martes martes), stone marten (Martes foina), Eurasian otter (Lutra lutra) and Eurasian badger (Meles meles) of the family of Mustelidae; common genet (Genetta genetta) of the family of Viverridae; red fox (Vulpes vulpes) of the family of Canidae and European wild cat (Felis silvestris) of the family of Felidae. Results: A number of sequences of possible novel viruses or virus variants were detected, including a theilovirus, phleboviruses, an amdovirus, a kobuvirus and picobirnaviruses. Conclusions: Using random PCR in combination with next-generation sequencing, sequences of various novel viruses or virus variants were detected in fecal samples collected from Spanish carnivores. Detected novel viruses highlight the viral diversity that is present in fecal material of wild carnivores. PMID:24886057
Assessing the Relationship of Ancient and Modern Populations

PubMed Central

Schraiber, Joshua G.

2018-01-01

Genetic material sequenced from ancient samples is revolutionizing our understanding of the recent evolutionary past. However, ancient DNA is often degraded, resulting in low-coverage, error-prone sequencing. Several solutions exist to this problem, ranging from simple approaches, such as selecting a read at random for each site, to more complicated approaches involving genotype likelihoods. In this work, we present a novel method for assessing the relationship of an ancient sample with a modern population, while accounting for sequencing error and postmortem damage by analyzing raw reads from multiple ancient individuals simultaneously. We show that, when analyzing SNP data, it is better to sequence more ancient samples to low coverage: two samples sequenced to 0.5× coverage provide better resolution than a single sample sequenced to 2× coverage. We also examined the power to detect whether an ancient sample is directly ancestral to a modern population, finding that, with even a few high-coverage individuals, even ancient samples that are very slightly diverged from the modern population can be detected with ease. When we applied our approach to European samples, we found that no ancient samples represent direct ancestors of modern Europeans. We also found that, as shown previously, the most ancient Europeans appear to have had the smallest effective population sizes, indicating a role for agriculture in modern population growth. PMID:29167200
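The simple baseline mentioned in this abstract, selecting one read at random per site, can be sketched as follows. The input layout (a dictionary mapping a site name to the list of alleles observed in reads covering it) and the function name are hypothetical, chosen only for illustration; this is not the novel method the abstract presents.

```python
import random


def pseudo_haploid_calls(reads_by_site, rng=None):
    """Pick one observed allele uniformly at random for each site.

    Sites with no covering reads are reported as None (missing data).
    """
    rng = rng or random.Random()
    calls = {}
    for site, alleles in reads_by_site.items():
        calls[site] = rng.choice(alleles) if alleles else None
    return calls


# Toy low-coverage data: alleles seen in the reads overlapping each SNP.
reads = {"snp1": ["A", "A", "G"], "snp2": ["C"], "snp3": []}
calls = pseudo_haploid_calls(reads, random.Random(0))
```

Random selection avoids biasing calls toward the majority allele at low coverage, but it discards information that genotype-likelihood approaches retain.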
Simultaneous genomic identification and profiling of a single cell using semiconductor-based next generation sequencing.

PubMed

Watanabe, Manabu; Kusano, Junko; Ohtaki, Shinsaku; Ishikura, Takashi; Katayama, Jin; Koguchi, Akira; Paumen, Michael; Hayashi, Yoshiharu

2014-09-01

Combining single-cell methods and next-generation sequencing should provide a powerful means to understand single-cell biology and obviate the effects of sample heterogeneity. Here we report a single-cell identification method and seamless cancer gene profiling using semiconductor-based massively parallel sequencing. A549 cells (adenocarcinomic human alveolar basal epithelial cell line) were used as a model. Single-cell capture was performed using laser capture microdissection (LCM) with an Arcturus® XT system, and a captured single cell and a bulk population of A549 cells (≈ 10^6 cells) were subjected to whole genome amplification (WGA). For cell identification, a multiplex PCR method (AmpliSeq™ SNP HID panel) was used to enrich 136 highly discriminatory SNPs with a genotype concordance probability of 10^(31-35). For cancer gene profiling, we used mutation profiling that was performed in parallel using a hotspot panel for 50 cancer-related genes. Sequencing was performed using a semiconductor-based bench top sequencer. The distribution of sequence reads for both HID and Cancer panel amplicons was consistent across these samples. For the bulk population of cells, the percentages of sequence covered at coverage of more than 100× were 99.04% for the HID panel and 98.83% for the Cancer panel, while for the single cell the percentages of sequence covered at coverage of more than 100× were 55.93% for the HID panel and 65.96% for the Cancer panel. Partial amplification failure or randomly distributed non-amplified regions across samples from single cells during the WGA procedures or random allele drop out probably caused these differences. However, comparative analyses showed that this method successfully discriminated a single A549 cancer cell from a bulk population of A549 cells. Thus, our approach provides a powerful means to overcome tumor sample heterogeneity when searching for somatic mutations.
Host-Associated Metagenomics: A Guide to Generating Infectious RNA Viromes

PubMed Central

Robert, Catherine; Pascalis, Hervé; Michelle, Caroline; Jardot, Priscilla; Charrel, Rémi; Raoult, Didier; Desnues, Christelle

2015-01-01

Background: Metagenomic analyses have been widely used in the last decade to describe viral communities in various environments or to identify the etiology of human, animal, and plant pathologies. Here, we present a simple and standardized protocol that allows for the purification and sequencing of RNA viromes from complex biological samples with an important reduction of host DNA and RNA contaminants, while preserving the infectivity of viral particles. Principal Findings: We evaluated different viral purification steps, random reverse transcriptions and sequence-independent amplifications of a pool of representative RNA viruses. Viruses remained infectious after the purification process. We then validated the protocol by sequencing the RNA virome of human body lice engorged in vitro with artificially contaminated human blood. The full genomes of the most abundant viruses absorbed by the lice during the blood meal were successfully sequenced. Interestingly, random amplifications differed in the genome coverage of segmented RNA viruses. Moreover, the majority of reads were taxonomically identified, and only 7–15% of all reads were classified as “unknown”, depending on the random amplification method. Conclusion: The protocol reported here could easily be applied to generate RNA viral metagenomes from complex biological samples of different origins. Our protocol allows further virological characterizations of the described viral communities because it preserves the infectivity of viral particles and allows for the isolation of viruses. PMID:26431175
Normal and compound poisson approximations for pattern occurrences in NGS reads.

PubMed

Zhai, Zhiyuan; Reinert, Gesine; Song, Kai; Waterman, Michael S; Luan, Yihui; Sun, Fengzhu

2012-06-01

Next generation sequencing (NGS) technologies are now widely used in many biological studies. In NGS, sequence reads are randomly sampled from the genome sequence of interest. Most computational approaches for NGS data first map the reads to the genome and then analyze the data based on the mapped reads. Since many organisms have unknown genome sequences and many reads cannot be uniquely mapped to the genomes even if the genome sequences are known, alternative analytical methods are needed for the study of NGS data. Here we suggest using word patterns to analyze NGS data. Word pattern counting (the study of the probabilistic distribution of the number of occurrences of word patterns in one or multiple long sequences) has played an important role in molecular sequence analysis. However, no studies are available on the distribution of the number of occurrences of word patterns in NGS reads. In this article, we build probabilistic models for the background sequence and the sampling process of the sequence reads from the genome. Based on the models, we provide normal and compound Poisson approximations for the number of occurrences of word patterns from the sequence reads, with bounds on the approximation error. The main challenge is to consider the randomness in generating the long background sequence, as well as in the sampling of the reads using NGS. We show the accuracy of these approximations under a variety of conditions for different patterns with various characteristics. Under realistic assumptions, the compound Poisson approximation seems to outperform the normal approximation in most situations. These approximate distributions can be used to evaluate the statistical significance of the occurrence of patterns from NGS data. The theory and the computational algorithm for calculating the approximate distributions are then used to analyze ChIP-Seq data using transcription factor GABP. 
Software is available online (www-rcf.usc.edu/~fsun/Programs/NGS_motif_power/NGS_motif_power.html). In addition, Supplementary Material can be found online (www.liebertonline.com/cmb).
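To make the setting of this abstract concrete, the sketch below simulates a background sequence, samples reads from it at random start positions, and counts occurrences of a word across the reads. The i.i.d. uniform background, the parameter values, and the function names are assumptions chosen for illustration; the code computes only the expected count under that simple model and does not implement the paper's normal or compound Poisson approximations.

```python
import random


def random_genome(length, rng):
    """Background sequence with i.i.d. uniform nucleotides (assumed model)."""
    return "".join(rng.choice("ACGT") for _ in range(length))


def sample_reads(genome, num_reads, read_len, rng):
    """Sample reads whose start positions are uniform along the genome."""
    starts = (rng.randrange(len(genome) - read_len + 1) for _ in range(num_reads))
    return [genome[s:s + read_len] for s in starts]


def count_word(reads, word):
    """Count all (possibly overlapping) occurrences of word across the reads."""
    k = len(word)
    return sum(
        1
        for read in reads
        for i in range(len(read) - k + 1)
        if read[i:i + k] == word
    )


rng = random.Random(1)
genome = random_genome(100_000, rng)
reads = sample_reads(genome, 2_000, 100, rng)
observed = count_word(reads, "ACGT")
# Expected count under the i.i.d. model: reads * positions per read * P(word).
expected = 2_000 * (100 - 4 + 1) * 0.25 ** 4
```

The observed count fluctuates around the expected value both because the genome itself is random and because overlapping reads count the same genomic occurrence more than once, which is the double source of randomness the abstract highlights.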
A Fosmid Cloning Strategy for Detecting the Widest Possible Spectrum of Microbes from the International Space Station Drinking Water System

PubMed Central

Choi, Sangdun; Chang, Mi Sook; Stuecker, Tara; Chung, Christine; Newcombe, David A.; Venkateswaran, Kasthuri

2012-01-01

In this study, fosmid cloning strategies were used to assess the microbial populations in water from the International Space Station (ISS) drinking water system (henceforth referred to as Prebiocide and Tank A water samples). The goals of this study were: to compare the sensitivity of the fosmid cloning strategy with that of traditional culture-based and 16S rRNA-based approaches and to detect the widest possible spectrum of microbial populations during the water purification process. Initially, microbes could not be cultivated, and conventional PCR failed to amplify 16S rDNA fragments from these low biomass samples. Therefore, randomly primed rolling-circle amplification was used to amplify any DNA that might be present in the samples, followed by size selection by using pulsed-field gel electrophoresis. The amplified high-molecular-weight DNA from both samples was cloned into fosmid vectors. Several hundred clones were randomly selected for sequencing, followed by Blastn/Blastx searches. Sequences encoding specific genes from Burkholderia, a species abundant in the soil and groundwater, were found in both samples. Bradyrhizobium and Mesorhizobium, which belong to rhizobia, a large community of nitrogen fixers often found in association with plant roots, were present in the Prebiocide samples. Ralstonia, which is prevalent in soils with a high heavy metal content, was detected in the Tank A samples. The detection of many unidentified sequences suggests the presence of potentially novel microbial fingerprints. The bacterial diversity detected in this pilot study using a fosmid vector approach was higher than that detected by conventional 16S rRNA gene sequencing. PMID:23346038
Application of Stochastic Labeling with Random-Sequence Barcodes for Simultaneous Quantification and Sequencing of Environmental 16S rRNA Genes.

PubMed

Hoshino, Tatsuhiko; Inagaki, Fumio

2017-01-01

Next-generation sequencing (NGS) is a powerful tool for analyzing environmental DNA and provides a comprehensive molecular view of microbial communities. To obtain the copy number of particular sequences in the NGS library, however, additional quantitative analysis such as quantitative PCR (qPCR) or digital PCR (dPCR) is required. Furthermore, the number of sequences in a sequence library does not always reflect the original copy number of a target gene because of biases caused by PCR amplification, making it difficult to convert the proportion of particular sequences in the NGS library to the copy number using the mass of input DNA. To address this issue, we applied a stochastic labeling approach with random-tag sequences and developed an NGS-based quantification protocol, which enables simultaneous sequencing and quantification of the targeted DNA. This quantitative sequencing (qSeq) is initiated from single-primer extension (SPE) using a primer with a random tag adjacent to the 5' end of the target-specific sequence. During SPE, each DNA molecule is stochastically labeled with the random tag. Subsequently, first-round PCR is conducted, specifically targeting the SPE product, followed by second-round PCR to index for NGS. The number of random tags is determined only during the SPE step and is therefore not affected by the two rounds of PCR that may introduce amplification biases. In the case of 16S rRNA genes, after NGS sequencing and taxonomic classification, the absolute number of each target phylotype's 16S rRNA gene can be estimated by Poisson statistics by counting random tags incorporated at the end of the sequence. To test the feasibility of this approach, the 16S rRNA gene of Sulfolobus tokodaii was subjected to qSeq, which resulted in accurate quantification of 5.0 × 10^3 to 5.0 × 10^4 copies of the 16S rRNA gene. Furthermore, qSeq was applied to mock microbial communities and environmental samples, and the results were comparable to those obtained using digital PCR and relative abundance based on a standard sequence library. We demonstrated that the qSeq protocol proposed here is advantageous for providing less-biased absolute copy numbers of each target DNA with NGS sequencing at one time. With this new experimental scheme in microbial ecology, microbial community compositions can be explored in a more quantitative manner, thus expanding our knowledge of microbial ecosystems in natural environments.
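The abstract states that copy numbers are estimated by Poisson statistics from counts of random tags but does not give the formula. The sketch below uses a standard occupancy correction as an assumption: if n molecules each receive one of T equally likely tags, the expected number of distinct tags is about T(1 - exp(-n/T)), so n can be estimated as -T ln(1 - k/T) from k observed distinct tags. The tag length of 8 nt and the function name are hypothetical.

```python
import math


def estimate_molecules(unique_tags, tag_length=8):
    """Estimate labeled molecules from the number of distinct random tags seen.

    Assumes uniform random tags drawn from 4**tag_length possibilities
    (tag_length is a hypothetical value) and inverts the occupancy
    expectation k = T * (1 - exp(-n / T)).
    """
    tag_space = 4 ** tag_length
    if not 0 <= unique_tags < tag_space:
        raise ValueError("unique_tags must be in [0, 4**tag_length)")
    return -tag_space * math.log(1.0 - unique_tags / tag_space)


# With 5,000 distinct tags out of 4**8 = 65,536, the estimate is slightly
# above 5,000 because some molecules are expected to share a tag.
estimate = estimate_molecules(5_000)
```

The correction grows as the observed tag count approaches the size of the tag space, so longer tags keep the estimate close to the raw count.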
High-speed imaging using CMOS image sensor with quasi pixel-wise exposure

NASA Astrophysics Data System (ADS)

Sonoda, T.; Nagahara, H.; Endo, K.; Sugiyama, Y.; Taniguchi, R.

2017-02-01

Several recent studies in compressive video sensing have realized scene capture beyond the fundamental trade-off limit between spatial resolution and temporal resolution using random space-time sampling. However, most of these studies showed results for higher frame rate video that were produced by simulation experiments or using an optically simulated random sampling camera, because there are currently no commercially available image sensors with random exposure or sampling capabilities. We fabricated a prototype complementary metal oxide semiconductor (CMOS) image sensor with quasi pixel-wise exposure timing that can realize nonuniform space-time sampling. The prototype sensor can reset exposures independently by columns and fix the amount of exposure by rows for each 8x8 pixel block. This CMOS sensor is not fully controllable via the pixels, and has line-dependent controls, but it offers flexibility when compared with regular CMOS or charge-coupled device sensors with global or rolling shutters. We propose a method to realize pseudo-random sampling for high-speed video acquisition that uses the flexibility of the CMOS sensor. We reconstruct the high-speed video sequence from the images produced by pseudo-random sampling using an over-complete dictionary.
Generation of Some First-Order Autoregressive Markovian Sequences of Positive Random Variables with Given Marginal Distributions

DTIC Science & Technology

1981-03-01

Again [equation (2.11): a conditional expectation of the sequence, illegible in the extracted text] and X0 = E0 gives a stationary sequence. Thus the correlations and regressions are the...sequence, although the sample paths will tend to have runs-up. A similar analysis given in Lawrance and Lewis [5] shows that [equation (3.7): illegible in the extracted text]
Dynamic learning and context-dependence in sequential, attribute-based, stated-preference valuation questions

Treesearch

Thomas P. Holmes; Kevin J. Boyle

2005-01-01

A hybrid stated-preference model is presented that combines the referendum contingent valuation response format with an experimentally designed set of attributes. A sequence of valuation questions is asked to a random sample in a mailout mail-back format. Econometric analysis shows greater discrimination between alternatives in the final choice in the sequence, and the...
A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.

PubMed

Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme

2013-07-01

The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and more importantly do not allow the user to control explicitly the nucleotide distribution such as the GC-content in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC-content and target structure. IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. Supplementary data are available at Bioinformatics online.
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

PubMed Central

Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

2007-01-01

While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes is feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434
Construction, Characterization, and Preliminary BAC-End Sequence Analysis of a Bacterial Artificial Chromosome Library of the Tea Plant (Camellia sinensis)

PubMed Central

Lin, Jinke; Kudrna, Dave; Wing, Rod A.

2011-01-01

We describe the construction and characterization of a publicly available BAC library for the tea plant, Camellia sinensis. Using modified methods, the library was constructed with the aim of developing public molecular resources to advance tea plant genomics research. The library consists of a total of 401,280 clones with an average insert size of 135 kb, providing an approximate coverage of 13.5 haploid genome equivalents. No empty vector clones were observed in a random sampling of 576 BAC clones. Further analysis of 182 BAC-end sequences from randomly selected clones revealed a GC content of 40.35% and low chloroplast and mitochondrial contamination. Repetitive sequence analyses indicated that LTR retrotransposons were the most predominant sequence class (86.93%–87.24%), followed by DNA retrotransposons (11.16%–11.69%). Additionally, we found 25 simple sequence repeats (SSRs) that could potentially be used as genetic markers. PMID:21234344
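As a quick check on the figures in this abstract, the total insert length of the library and the haploid genome size implied by the stated 13.5-fold coverage can be worked out directly. The sketch below uses only the numbers quoted above. The variable names are illustrative, and the genome size it prints is what those figures imply, not a value reported in the abstract.

```python
# Figures quoted in the abstract
n_clones = 401_280        # BAC clones in the library
avg_insert_kb = 135       # average insert size in kb
coverage = 13.5           # haploid genome equivalents

# Total cloned DNA, then the genome size that would give the stated coverage
total_insert_kb = n_clones * avg_insert_kb
implied_genome_gb = total_insert_kb / coverage / 1e6   # kb -> Gb

print(f"total insert length: {total_insert_kb / 1e6:.2f} Gb")
print(f"implied haploid genome size: {implied_genome_gb:.2f} Gb")
```

The same relation (coverage = clones × average insert / genome size) can be rearranged to estimate how many clones a library needs for a target coverage.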

A high speed implementation of the random decrement algorithm

NASA Technical Reports Server (NTRS)

Kiraly, L. J.

1982-01-01

The algorithm is useful for measuring net system damping levels in stochastic processes and for the development of equivalent linearized system response models. The algorithm works by summing together all subrecords which occur after a predefined threshold level is crossed. The random decrement signature is normally developed by scanning stored data and adding subrecords together. The high speed implementation of the random decrement algorithm exploits the digital character of sampled data and uses fixed record lengths of 2^n samples to greatly speed up the process. The contribution of each data point to the random decrement signature is calculated only once and in the same sequence as the data were taken. A hardware implementation of the algorithm using random logic is diagrammed and the process is shown to be limited only by the record size and the threshold crossing frequency of the sampled data. With a hardware cycle time of 200 ns and a 1024 point signature, a threshold crossing frequency of 5000 Hertz can be processed and a stably averaged signature presented in real time.
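The basic (software) form of the random decrement procedure described above can be sketched as follows. This is a minimal illustration, not the high-speed hardware scheme of the report: it assumes an upward level crossing as the trigger (one common choice), and the simulated oscillator, its parameters, and all function names are hypothetical.

```python
import math
import random

def random_decrement(x, threshold, seg_len):
    """Average every subrecord of length seg_len that begins where x
    crosses `threshold` from below. Returns (signature, n_subrecords)."""
    sums = [0.0] * seg_len
    count = 0
    for i in range(1, len(x) - seg_len + 1):
        if x[i - 1] < threshold <= x[i]:      # upward threshold crossing
            for k in range(seg_len):
                sums[k] += x[i + k]
            count += 1
    if count == 0:
        raise ValueError("no threshold crossing followed by a full subrecord")
    return [s / count for s in sums], count

# Hypothetical test signal: lightly damped 2 Hz oscillator driven by white noise.
random.seed(0)
dt, n = 0.01, 20_000
w0, zeta = 2 * math.pi * 2.0, 0.02
x, v = [0.0], 0.0
for _ in range(n - 1):
    a = -2 * zeta * w0 * v - w0 ** 2 * x[-1] + random.gauss(0.0, 1.0) / math.sqrt(dt)
    v += a * dt
    x.append(x[-1] + v * dt)

rms = math.sqrt(sum(s * s for s in x) / n)
signature, count = random_decrement(x, threshold=rms, seg_len=256)  # 2**8 samples
print(count, "subrecords averaged; signature[0] =", round(signature[0], 4))
```

Because the random forcing averages out across subrecords, the signature approximates a free-decay curve from which damping can be estimated. The record length is a power of two here only to mirror the fixed 2^n lengths mentioned in the abstract.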
Structurally complex and highly active RNA ligases derived from random RNA sequences

NASA Technical Reports Server (NTRS)

Ekland, E. H.; Szostak, J. W.; Bartel, D. P.

1995-01-01

Seven families of RNA ligases, previously isolated from random RNA sequences, fall into three classes on the basis of secondary structure and regiospecificity of ligation. Two of the three classes of ribozymes have been engineered to act as true enzymes, catalyzing the multiple-turnover transformation of substrates into products. The most complex of these ribozymes has a minimal catalytic domain of 93 nucleotides. An optimized version of this ribozyme has a kcat exceeding one per second, a value far greater than that of most natural RNA catalysts and approaching that of comparable protein enzymes. The fact that such a large and complex ligase emerged from a very limited sampling of sequence space implies the existence of a large number of distinct RNA structures of equivalent complexity and activity.
Sampling through time and phylodynamic inference with coalescent and birth–death models

PubMed Central

Volz, Erik M.; Frost, Simon D. W.

2014-01-01

Many population genetic models have been developed for the purpose of inferring population size and growth rates from random samples of genetic data. We examine two popular approaches to this problem, the coalescent and the birth–death-sampling model (BDM), in the context of estimating population size and birth rates in a population growing exponentially according to the birth–death branching process. For sequences sampled at a single time, we found the coalescent and the BDM gave virtually indistinguishable results in terms of the growth rates and fraction of the population sampled, even when sampling from a small population. For sequences sampled at multiple time points, we find that the birth–death model estimators are subject to large bias if the sampling process is misspecified. Since BDMs incorporate a model of the sampling process, we show how much of the statistical power of BDMs arises from the sequence of sample times and not from the genealogical tree. This motivates the development of a new coalescent estimator, which is augmented with a model of the known sampling process and is potentially more precise than the coalescent that does not use sample time information. PMID:25401173
Physical layer one-time-pad data encryption through synchronized semiconductor laser networks

NASA Astrophysics Data System (ADS)

Argyris, Apostolos; Pikasis, Evangelos; Syvridis, Dimitris

2016-02-01

Semiconductor lasers (SL) have been proven to be a key device in the generation of ultrafast true random bit streams. Their potential to emit chaotic signals with desirable statistics under suitable conditions establishes them as a low-cost solution to cover various needs, from large-volume key generation to real-time encrypted communications. Usually, only undemanding post-processing is needed to convert the acquired analog time series to digital sequences that pass all established tests of randomness. A novel architecture that can generate and exploit these true random sequences is a fiber network in which the nodes are semiconductor lasers that are coupled and synchronized to a central hub laser. In this work we show experimentally that laser nodes in such a star network topology can synchronize with each other through complex broadband signals that are the seed to true random bit sequences (TRBS) generated at several Gb/s. The ability of each node to access, through the fiber optic network, random bit streams that are generated in real time and synchronized with those of the other nodes makes it possible to implement a one-time-pad encryption protocol that mixes the synchronized true random bit sequence with real data at Gb/s rates. Forward-error correction methods are used to reduce the errors in the TRBS and the final error rate at the data decoding level. An appropriate selection of the sampling methodology and properties, as well as of the physical properties of the chaotic seed signal through which the network locks in synchronization, allows error-free performance.
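The mixing step of a one-time pad, as referred to in this abstract, is a bitwise XOR of the data with a key of equal length that both ends share. The sketch below illustrates only that step. It assumes the key is already available at both nodes and uses Python's `secrets` module as a software stand-in for the laser-generated TRBS; the function name is hypothetical.

```python
import secrets

def otp_xor(data: bytes, key: bytes) -> bytes:
    """XOR data with a key at least as long as the data (one-time pad)."""
    if len(key) < len(data):
        raise ValueError("one-time pad key must be at least as long as the data")
    return bytes(d ^ k for d, k in zip(data, key))

message = b"real data at Gb/s rates"
key = secrets.token_bytes(len(message))   # stand-in for the shared random bit sequence
ciphertext = otp_xor(message, key)        # sender mixes data with the pad
recovered = otp_xor(ciphertext, key)      # receiver applies the same pad again
print(recovered == message)
```

Decryption works only if both copies of the key are identical bit for bit, which is why residual errors in the synchronized sequences have to be corrected before the pad is applied.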
High-throughput sequencing of complete human mtDNA genomes from the Caucasus and West Asia: high diversity and demographic inferences.

PubMed

Schönberg, Anna; Theunert, Christoph; Li, Mingkun; Stoneking, Mark; Nasidze, Ivan

2011-09-01

To investigate the demographic history of human populations from the Caucasus and surrounding regions, we used high-throughput sequencing to generate 147 complete mtDNA genome sequences from random samples of individuals from three groups from the Caucasus (Armenians, Azeri and Georgians), and one group each from Iran and Turkey. Overall diversity is very high, with 144 different sequences that fall into 97 different haplogroups found among the 147 individuals. Bayesian skyline plots (BSPs) of population size change through time show a population expansion around 40-50 kya, followed by a constant population size, and then another expansion around 15-18 kya for the groups from the Caucasus and Iran. The BSP for Turkey differs the most from the others, with an increase from 35 to 50 kya followed by a prolonged period of constant population size, and no indication of a second period of growth. An approximate Bayesian computation approach was used to estimate divergence times between each pair of populations; the oldest divergence times were between Turkey and the other four groups from the South Caucasus and Iran (~400-600 generations), while the divergence time of the three Caucasus groups from each other was comparable to their divergence time from Iran (average of ~360 generations). These results illustrate the value of random sampling of complete mtDNA genome sequences that can be obtained with high-throughput sequencing platforms.
Importance Sampling of Word Patterns in DNA and Protein Sequences

PubMed Central

Chan, Hock Peng; Chen, Louis H.Y.

2010-01-01

Monte Carlo methods can provide accurate p-value estimates of word counting test statistics and are easy to implement. They are especially attractive when an asymptotic theory is absent or when either the search sequence or the word pattern is too short for the application of asymptotic formulae. Naive direct Monte Carlo is undesirable for the estimation of small probabilities because the associated rare events of interest are seldom generated. We propose instead efficient importance sampling algorithms that use controlled insertion of the desired word patterns on randomly generated sequences. The implementation is illustrated on word patterns of biological interest: palindromes and inverted repeats, patterns arising from position-specific weight matrices (PSWMs), and co-occurrences of pairs of motifs. PMID:21128856
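The limitation of naive direct Monte Carlo mentioned above can be seen in a small sketch. It estimates the p-value of a word count by simulating random sequences under an assumed i.i.d. uniform background. It does not implement the importance sampling algorithms of the paper, and the word, sequence length, and function names are illustrative choices.

```python
import random

def count_word(seq: str, word: str) -> int:
    """Count (possibly overlapping) occurrences of word in seq."""
    return sum(seq.startswith(word, i) for i in range(len(seq) - len(word) + 1))

def naive_mc_pvalue(word, seq_len, observed, n_sim=1000, alphabet="ACGT", seed=0):
    """Direct Monte Carlo estimate of P(count >= observed) under an
    i.i.d. uniform background (an assumed null model)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sim):
        seq = "".join(rng.choice(alphabet) for _ in range(seq_len))
        if count_word(seq, word) >= observed:
            hits += 1
    return hits / n_sim

print(naive_mc_pvalue("GAATTC", seq_len=200, observed=1))  # moderately rare event
print(naive_mc_pvalue("GAATTC", seq_len=200, observed=3))  # very rare event, seldom simulated
```

With a fixed simulation budget, the rarer event is generated so infrequently that its estimate is often zero or dominated by noise, which is the motivation for steering the simulation toward sequences that contain the word.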
DS/LPI autocorrelation detection in noise plus random-tone interference. [Direct Sequence Low-Probability of Intercept]

NASA Technical Reports Server (NTRS)

Hinedi, S.; Polydoros, A.

1988-01-01

The authors present and analyze a frequency-noncoherent two-lag autocorrelation statistic for the wideband detection of random BPSK signals in noise-plus-random-multitone interference. It is shown that this detector is quite robust to the presence or absence of interference and its specific parameter values, contrary to the case of an energy detector. The rule assumes knowledge of the data rate and the active scenario under H0. It is concluded that the real-time autocorrelation domain and its samples (lags) are a viable approach for detecting random signals in dense environments.
Whole genome sequencing identifies influenza A H3N2 transmission and offers superior resolution to classical typing methods.

PubMed

Meinel, Dominik M; Heinzinger, Susanne; Eberle, Ute; Ackermann, Nikolaus; Schönberger, Katharina; Sing, Andreas

2018-02-01

Influenza with its annual epidemic waves is a major cause of morbidity and mortality worldwide. However, only little whole genome data are available regarding the molecular epidemiology promoting our understanding of viral spread in human populations. We implemented a RT-PCR strategy starting from patient material to generate influenza A whole genome sequences for molecular epidemiological surveillance. Samples were obtained within the Bavarian Influenza Sentinel. The complete influenza virus genome was amplified by a one-tube multiplex RT-PCR and sequenced on an Illumina MiSeq. We report whole genomic sequences for 50 influenza A H3N2 viruses, which was the predominating virus in the season 2014/15, directly from patient specimens. The dataset included random samples from Bavaria (Germany) throughout the influenza season and samples from three suspected transmission clusters. We identified the outbreak samples based on sequence identity. Whole genome sequencing (WGS) was superior in resolution compared to analysis of single segments or partial segment analysis. Additionally, we detected manifestation of substantial amounts of viral quasispecies in several patients, carrying mutations varying from the dominant virus in each patient. Our rapid whole genome sequencing approach for influenza A virus shows that WGS can effectively be used to detect and understand outbreaks in large communities. Additionally, the genomic data provide in-depth details about the circulating virus within one season.
Prevalence of pathogenic bacteria in Ixodes ricinus ticks in Central Bohemia.

PubMed

Klubal, Radek; Kopecky, Jan; Nesvorna, Marta; Sparagano, Olivier A E; Thomayerova, Jana; Hubert, Jan

2016-01-01

Bacteria associated with the tick Ixodes ricinus were assessed in specimens unattached or attached to the skin of cats, dogs and humans, collected in the Czech Republic. The bacteria were detected by PCR in 97 of 142 pooled samples including 204 ticks, i.e. 1-7 ticks per sample, collected at the same time from one host. A fragment of the bacterial 16S rRNA gene was amplified, cloned and sequenced from 32 randomly selected samples. The most frequent sequences were those related to Candidatus Midichloria mitochondrii (71% of cloned sequences), followed by Diplorickettsia (13%), Spiroplasma (3%), Rickettsia (3%), Pasteurella (3%), Morganella (3%), Pseudomonas (2%), Bacillus (1%), Methylobacterium (1%) and Phyllobacterium (1%). The phylogenetic analysis of Spiroplasma 16S rRNA gene sequences showed two groups related to Spiroplasma eriocheiris and Spiroplasma melliferum, respectively. Using group-specific primers, the following potentially pathogenic bacteria were detected: Borrelia (in 20% of the 142 samples), Rickettsia (12%), Spiroplasma (5%), Diplorickettsia (5%) and Anaplasma (2%). In total, 68% of I. ricinus samples (97/142) contained detectable bacteria and 13% contained two or more putative pathogenic groups. The prevalence of tick-borne bacteria was similar to the observations in other European countries.
Molecular Diagnosis of Orthopedic-Device-Related Infection Directly from Sonication Fluid by Metagenomic Sequencing

PubMed Central

Sanderson, Nicholas D.; Atkins, Bridget L.; Brent, Andrew J.; Cole, Kevin; Foster, Dona; McNally, Martin A.; Oakley, Sarah; Peto, Leon; Taylor, Adrian; Peto, Tim E. A.; Crook, Derrick W.; Eyre, David W.

2017-01-01

Culture of multiple periprosthetic tissue samples is the current gold standard for microbiological diagnosis of prosthetic joint infections (PJI). Additional diagnostic information may be obtained through culture of sonication fluid from explants. However, current techniques can have relatively low sensitivity, with prior antimicrobial therapy and infection by fastidious organisms influencing results. We assessed if metagenomic sequencing of total DNA extracts obtained direct from sonication fluid can provide an alternative rapid and sensitive tool for diagnosis of PJI. We compared metagenomic sequencing with standard aerobic and anaerobic culture in 97 sonication fluid samples from prosthetic joint and other orthopedic device infections. Reads from Illumina MiSeq sequencing were taxonomically classified using Kraken. Using 50 derivation samples, we determined optimal thresholds for the number and proportion of bacterial reads required to identify an infection and confirmed our findings in 47 independent validation samples. Compared to results from sonication fluid culture, the species-level sensitivity of metagenomic sequencing was 61/69 (88%; 95% confidence interval [CI], 77 to 94%; for derivation samples 35/38 [92%; 95% CI, 79 to 98%]; for validation samples, 26/31 [84%; 95% CI, 66 to 95%]), and genus-level sensitivity was 64/69 (93%; 95% CI, 84 to 98%). Species-level specificity, adjusting for plausible fastidious causes of infection, species found in concurrently obtained tissue samples, and prior antibiotics, was 85/97 (88%; 95% CI, 79 to 93%; for derivation samples, 43/50 [86%; 95% CI, 73 to 94%]; for validation samples, 42/47 [89%; 95% CI, 77 to 96%]). High levels of human DNA contamination were seen despite the use of laboratory methods to remove it. Rigorous laboratory good practice was required to minimize bacterial DNA contamination. We demonstrate that metagenomic sequencing can provide accurate diagnostic information in PJI. Our findings, combined with the increasing availability of portable, random-access sequencing technology, offer the potential to translate metagenomic sequencing into a rapid diagnostic tool in PJI. PMID:28490492
Sequencing of the large dsDNA genome of Oryctes rhinoceros nudivirus using multiple displacement amplification of nanogram amounts of virus DNA.

PubMed

Wang, Yongjie; Kleespies, Regina G; Ramle, Moslim B; Jehle, Johannes A

2008-09-01

The genomic sequence analysis of many large dsDNA viruses is hampered by the lack of sufficient sample material. Here, we report a whole genome amplification of the Oryctes rhinoceros nudivirus (OrNV) isolate Ma07 starting from as little as about 10 ng of purified viral DNA by application of the phi29 DNA polymerase- and exonuclease-resistant random hexamer-based multiple displacement amplification (MDA) method. About 60 µg of high molecular weight DNA with fragment sizes of up to 25 kbp was amplified. A genomic DNA clone library was generated using the product DNA. After 8-fold sequencing coverage, the 127,615 bp of OrNV whole genome was sequenced successfully. The results demonstrate that the MDA-based whole genome amplification enables rapid access to genomic information from exiguous virus samples.
Identification of cancer-specific motifs in mimotope profiles of serum antibody repertoire.

PubMed

Gerasimov, Ekaterina; Zelikovsky, Alex; Măndoiu, Ion; Ionov, Yurij

2017-06-07

For fighting cancer, earlier detection is crucial. Circulating auto-antibodies produced by the patient's own immune system after exposure to cancer proteins are promising biomarkers for the early detection of cancer. Since an antibody recognizes not the whole antigen but 4-7 critical amino acids within the antigenic determinant (epitope), the whole proteome can be represented by a random peptide phage display library. This opens the possibility of developing an early cancer detection test based on a set of peptide sequences identified by comparing cancer patients' and healthy donors' global peptide profiles of antibody specificities. Due to the enormously large number of peptide sequences contained in global peptide profiles generated by next generation sequencing, a large number of cancer and control sera is required to identify cancer-specific peptides with a high degree of statistical significance. To decrease the number of peptides in profiles generated by next generation sequencing without losing cancer-specific sequences, we generated the profiles from a phage library enriched by panning on a pool of cancer sera. To further decrease the complexity of profiles, we used computational methods to transform the list of peptides constituting the mimotope profiles into a list of motifs formed by similar peptide sequences. We have shown that the amino-acid order is meaningful in mimotope motifs, since they contain significantly more peptides than motifs among peptides in which the amino acids are randomly permuted. Also, the single-sample motifs differ significantly from motifs in peptides drawn from multiple samples. Finally, multiple cancer-specific motifs have been identified.
Je, a versatile suite to handle multiplexed NGS libraries with unique molecular identifiers.

PubMed

Girardot, Charles; Scholtalbers, Jelle; Sauer, Sajoscha; Su, Shu-Yi; Furlong, Eileen E M

2016-10-08

The yield obtained from next generation sequencers has increased almost exponentially in recent years, making sample multiplexing common practice. While barcodes (known sequences of fixed length) primarily encode the sample identity of sequenced DNA fragments, barcodes made of random sequences (Unique Molecular Identifier or UMIs) are often used to distinguish between PCR duplicates and transcript abundance in, for example, single-cell RNA sequencing (scRNA-seq). In paired-end sequencing, different barcodes can be inserted at each fragment end to either increase the number of multiplexed samples in the library or to use one of the barcodes as UMI. Alternatively, UMIs can be combined with the sample barcodes into composite barcodes, or with standard Illumina® indexing. Subsequent analysis must take read duplicates and sample identity into account, by identifying UMIs. Existing tools do not support these complex barcoding configurations and custom code development is frequently required. Here, we present Je, a suite of tools that accommodates complex barcoding strategies, extracts UMIs and filters read duplicates taking UMIs into account. Using Je on publicly available scRNA-seq and iCLIP data containing UMIs, the number of unique reads increased by up to 36%, compared to when UMIs are ignored. Je is implemented in Java and uses the Picard API. Code, executables and documentation are freely available at http://gbcs.embl.de/Je . Je can also be easily installed in Galaxy through the Galaxy toolshed.
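The idea of filtering read duplicates while taking UMIs into account can be illustrated with a toy sketch. This is not Je's implementation or interface: it assumes exact UMI matching and a simplified read record, and the dictionary keys and function name are hypothetical.

```python
from collections import defaultdict

def dedup_by_umi(reads):
    """Keep one read per (sample barcode, mapping position, UMI) group.
    `reads` is an iterable of dicts with hypothetical keys
    'name', 'sample', 'pos' and 'umi'."""
    groups = defaultdict(list)
    for r in reads:
        groups[(r["sample"], r["pos"], r["umi"])].append(r)
    return [g[0] for g in groups.values()]   # first read seen represents its group

reads = [
    {"name": "r1", "sample": "ACGT", "pos": 100, "umi": "AAGT"},
    {"name": "r2", "sample": "ACGT", "pos": 100, "umi": "AAGT"},  # PCR duplicate of r1
    {"name": "r3", "sample": "ACGT", "pos": 100, "umi": "CCTA"},  # same position, distinct molecule
    {"name": "r4", "sample": "TTGA", "pos": 100, "umi": "AAGT"},  # different sample
]
unique = dedup_by_umi(reads)
position_only = len({(r["sample"], r["pos"]) for r in reads})

print([r["name"] for r in unique])   # ['r1', 'r3', 'r4']
print(position_only)                 # 2
```

Deduplicating on position alone would keep two reads here, whereas the UMI-aware grouping keeps three, which is the kind of gain in unique reads the abstract reports.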
No evidence for MHC class II-based non-random mating at the gametic haplotype in Atlantic salmon.

PubMed

Promerová, M; Alavioon, G; Tusso, S; Burri, R; Immler, S

2017-06-01

Genes of the major histocompatibility complex (MHC) are a likely target of mate choice because of their role in inbreeding avoidance and potential benefits for offspring immunocompetence. Evidence for female choice for complementary MHC alleles among competing males exists both for the pre- and the postmating stages. However, it remains unclear whether the latter may involve non-random fusion of gametes depending on gametic haplotypes resulting in transmission ratio distortion or non-random sequence divergence among fused gametes. We tested whether non-random gametic fusion of MHC-II haplotypes occurs in Atlantic salmon Salmo salar. We performed in vitro fertilizations that excluded interindividual sperm competition using a split family design with large clutch sample sizes to test for a possible role of the gametic haplotype in mate choice. We sequenced two MHC-II loci in 50 embryos per clutch to assess allelic frequencies and sequence divergence. We found no evidence for transmission ratio distortion at two linked MHC-II loci, nor for non-random gamete fusion with respect to MHC-II alleles. Our findings suggest that the gametic MHC-II haplotypes play no role in gamete association in Atlantic salmon and that earlier findings of MHC-based mate choice most likely reflect choice among diploid genotypes. We discuss possible explanations for these findings and how they differ from findings in mammals.
Methodological reporting of randomized clinical trials in respiratory research in 2010.

PubMed

Lu, Yi; Yao, Qiuju; Gu, Jie; Shen, Ce

2013-09-01

Although randomized controlled trials (RCTs) are considered the highest level of evidence, they are also subject to bias, due to a lack of adequately reported randomization, and therefore the reporting should be as explicit as possible for readers to determine the significance of the contents. We evaluated the methodological quality of RCTs in respiratory research in high ranking clinical journals, published in 2010. We assessed the methodological quality, including generation of the allocation sequence, allocation concealment, double-blinding, sample-size calculation, intention-to-treat analysis, flow diagrams, number of medical centers involved, diseases, funding sources, types of interventions, trial registration, number of times the papers have been cited, journal impact factor, journal type, and journal endorsement of the CONSORT (Consolidated Standards of Reporting Trials) rules, in RCTs published in 12 top ranking clinical respiratory journals and 5 top ranking general medical journals. We included 176 trials, of which 93 (53%) reported adequate generation of the allocation sequence, 66 (38%) reported adequate allocation concealment, 79 (45%) were double-blind, 123 (70%) reported adequate sample-size calculation, 88 (50%) reported intention-to-treat analysis, and 122 (69%) included a flow diagram. Multivariate logistic regression analysis revealed that journal impact factor ≥ 5 was the only variable that significantly influenced adequate allocation sequence generation. Trial registration and journal impact factor ≥ 5 significantly influenced adequate allocation concealment. Medical interventions, trial registration, and journal endorsement of the CONSORT statement influenced adequate double-blinding. Publication in one of the general medical journals influenced adequate sample-size calculation. The methodological quality of RCTs in respiratory research needs improvement. Stricter enforcement of the CONSORT statement should enhance the quality of RCTs.
Subrandom methods for multidimensional nonuniform sampling.

PubMed

Worley, Bradley

2016-08-01

Methods of nonuniform sampling that utilize pseudorandom number sequences to select points from a weighted Nyquist grid are commonplace in biomolecular NMR studies, due to the beneficial incoherence introduced by pseudorandom sampling. However, these methods require the specification of a non-arbitrary seed number in order to initialize a pseudorandom number generator. Because the performance of pseudorandom sampling schedules can substantially vary based on seed number, this can complicate the task of routine data collection. Approaches such as jittered sampling and stochastic gap sampling are effective at reducing random seed dependence of nonuniform sampling schedules, but still require the specification of a seed number. This work formalizes the use of subrandom number sequences in nonuniform sampling as a means of seed-independent sampling, and compares the performance of three subrandom methods to their pseudorandom counterparts using commonly applied schedule performance metrics. Reconstruction results using experimental datasets are also provided to validate claims made using these performance metrics.
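To illustrate what seed-independent selection from a sampling grid can look like, the sketch below draws grid indices from a van der Corput sequence, a standard subrandom (low-discrepancy) sequence. It is not necessarily one of the three methods compared in the paper, it uses an unweighted grid for simplicity, and the function names are hypothetical.

```python
def van_der_corput(i: int, base: int = 2) -> float:
    """i-th term (i >= 1) of the base-`base` van der Corput sequence in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom
    return x

def subrandom_schedule(grid_size: int, n_points: int) -> list[int]:
    """Pick n_points distinct indices from a uniform grid without any seed."""
    if n_points > grid_size:
        raise ValueError("cannot pick more points than the grid contains")
    chosen, seen, i = [], set(), 1
    while len(chosen) < n_points:
        idx = int(van_der_corput(i) * grid_size)
        if idx not in seen:          # skip grid points that were already selected
            seen.add(idx)
            chosen.append(idx)
        i += 1
    return sorted(chosen)

print(subrandom_schedule(grid_size=16, n_points=4))   # [2, 4, 8, 12]
```

The schedule is fully determined by the grid size and the number of points, so repeated runs give the same result without a seed having to be chosen or recorded.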
On the joint spectral density of bivariate random sequences. Thesis Technical Report No. 21

NASA Technical Reports Server (NTRS)

Aalfs, David D.

1995-01-01

For univariate random sequences, the power spectral density acts like a probability density function of the frequencies present in the sequence. This dissertation extends that concept to bivariate random sequences. For this purpose, a function called the joint spectral density is defined that represents a joint probability weighting of the frequency content of pairs of random sequences. Given a pair of random sequences, the joint spectral density is not uniquely determined in the absence of any constraints. Two approaches to constraining the sequences are suggested: (1) assume the sequences are the margins of some stationary random field, (2) assume the sequences conform to a particular model that is linked to the joint spectral density. For both approaches, the properties of the resulting sequences are investigated in some detail, and simulation is used to corroborate theoretical results. It is concluded that under either of these two constraints, the joint spectral density can be computed from the non-stationary cross-correlation.
Genetic variability in isolates of Chromobacterium violaceum from pulmonary secretion, water, and soil.

PubMed

Santini, A C; Magalhães, J T; Cascardo, J C M; Corrêa, R X

2016-04-28

Chromobacterium violaceum is a free-living Gram-negative bacillus usually found in the water and soil in tropical regions, which causes infections in humans. Chromobacteriosis is characterized by rapid dissemination and high mortality. The aim of this study was to detect the genetic variability among C. violaceum type strain ATCC 12472, and seven isolates from the environment and one from a pulmonary secretion from a chromobacteriosis patient from Ilhéus, Bahia. The molecular characterization of all samples was performed by polymerase chain reaction (PCR) sequencing and 16S rDNA analysis. Primers specific for two ATCC 12472 pathogenicity genes, hilA and yscD, as well as random amplified polymorphic DNA (RAPD), were used for PCR amplification and comparative sequencing of the products. For a more specific approach, the PCR products of 16S rDNA were digested with restriction enzymes. Seven of the samples, including type-strain ATCC 12472, were amplified by the hilA primers; these were subsequently sequenced. Gene yscD was amplified only in type-strain ATCC 12472. MspI and AluI digestion revealed 16S rDNA polymorphisms. These data allowed the generation of a dendrogram for each analysis. The isolates of C. violaceum have variability in random genomic regions, as demonstrated by RAPD. Also, these isolates have variability in pathogenicity genes, as demonstrated by sequencing and restriction enzyme digestion.
Image encryption using random sequence generated from generalized information domain

NASA Astrophysics Data System (ADS)

Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu

2016-05-01

A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called the drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method approximately a one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.
Random sampling of constrained phylogenies: conducting phylogenetic analyses when the phylogeny is partially known.

PubMed

Housworth, E A; Martins, E P

2001-01-01

Statistical randomization tests in evolutionary biology often require a set of random, computer-generated trees. For example, earlier studies have shown how large numbers of computer-generated trees can be used to conduct phylogenetic comparative analyses even when the phylogeny is uncertain or unknown. These methods were limited, however, in that (in the absence of molecular sequence or other data) they allowed users to assume that no phylogenetic information was available or that all possible trees were known. Intermediate situations where only a taxonomy or other limited phylogenetic information (e.g., polytomies) are available are technically more difficult. The current study describes a procedure for generating random samples of phylogenies while incorporating limited phylogenetic information (e.g., four taxa belong together in a subclade). The procedure can be used to conduct comparative analyses when the phylogeny is only partially resolved or can be used in other randomization tests in which large numbers of possible phylogenies are needed.

Golden Ratio Versus Pi as Random Sequence Sources for Monte Carlo Integration

NASA Technical Reports Server (NTRS)

Sen, S. K.; Agarwal, Ravi P.; Shaykhian, Gholam Ali

2007-01-01

We discuss here the relative merits of these numbers as possible random sequence sources. The quality of these sequences is not judged directly based on the outcome of all known tests for the randomness of a sequence. Instead, it is determined implicitly by the accuracy of the Monte Carlo integration in a statistical sense. Since our main motive of using a random sequence is to solve real world problems, it is more desirable if we compare the quality of the sequences based on their performances for these problems in terms of quality/accuracy of the output. We also compare these sources against those generated by a popular pseudo-random generator, viz., the Matlab rand and the quasi-random generator Halton both in terms of error and time complexity. Our study demonstrates that consecutive blocks of digits of each of these numbers produce a good random sequence source. It is observed that randomly chosen blocks of digits do not have any remarkable advantage over consecutive blocks for the accuracy of the Monte Carlo integration. Also, it reveals that pi is a better source of a random sequence than theta when the accuracy of the integration is concerned.
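As an illustration of judging a sequence source by integration accuracy, the following minimal sketch compares a pseudo-random source with a base-2 Halton (van der Corput) sequence on a one-dimensional integral. It does not reproduce the digit-block sources from the abstract; the integrand, sample size, seed, and function names are assumptions chosen only for the example.

```python
import math
import random

def halton(index, base):
    """Return element `index` (index >= 1) of the van der Corput sequence in `base`."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += fraction * (index % base)
        index //= base
        fraction /= base
    return result

def mc_integrate(func, points):
    """Estimate the integral of func over [0, 1) as the mean of func at the sample points."""
    return sum(func(x) for x in points) / len(points)

def integrand(x):
    # Assumed test integrand: its integral over [0, 1] equals pi.
    return 4.0 / (1.0 + x * x)

n = 10_000
rng = random.Random(12345)                       # arbitrary fixed seed
pseudo = [rng.random() for _ in range(n)]        # pseudo-random points
quasi = [halton(i, 2) for i in range(1, n + 1)]  # quasi-random points

err_pseudo = abs(mc_integrate(integrand, pseudo) - math.pi)
err_quasi = abs(mc_integrate(integrand, quasi) - math.pi)
print(f"pseudo-random error: {err_pseudo:.2e}, Halton error: {err_quasi:.2e}")
```

The same `mc_integrate` helper could be fed points built from digit blocks of any constant, which is the kind of comparison the abstract describes.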
Read clouds uncover variation in complex regions of the human genome

PubMed Central

Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E.; West, Robert; Sidow, Arend; Batzoglou, Serafim

2015-01-01

Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. PMID:26286554
The genome sequence of pepper vein yellows virus (family Luteoviridae, genus Polerovirus).

PubMed

Murakami, Ritsuko; Nakashima, Nobuhiko; Hinomoto, Norihide; Kawano, Shinji; Toyosato, Tetsuya

2011-05-01

The complete genome of pepper vein yellows virus (PeVYV) was sequenced using random amplification of RNA samples isolated from vector insects (Aphis gossypii) that had been given access to PeVYV-infected plants. The PeVYV genome consisted of 6244 nucleotides and had a genomic organization characteristic of members of the genus Polerovirus. PeVYV had highest amino acid sequence identities in ORF0 to ORF3 (75.9 - 91.9%) with tobacco vein distorting polerovirus, with which it was only 25.1% identical in ORF5. These sequence comparisons and previously studied biological properties indicate that PeVYV is a distinctly different virus and belongs to a new species of the genus Polerovirus.
Investigation of modulation parameters in multiplexing gas chromatography.

PubMed

Trapp, Oliver

2010-10-22

Combination of information technology and separation sciences opens a new avenue to achieve high sample throughputs and therefore is of great interest to bypass bottlenecks in catalyst screening of parallelized reactors or using multititer well plates in reaction optimization. Multiplexing gas chromatography utilizes pseudo-random injection sequences derived from Hadamard matrices to perform rapid sample injections which gives a convoluted chromatogram containing the information of a single sample or of several samples with similar analyte composition. The conventional chromatogram is obtained by application of the Hadamard transform using the known injection sequence or in case of several samples an averaged transformed chromatogram is obtained which can be used in a Gauss-Jordan deconvolution procedure to obtain all single chromatograms of the individual samples. The performance of such a system depends on the modulation precision and on the parameters, e.g. the sequence length and modulation interval. Here we demonstrate the effects of the sequence length and modulation interval on the deconvoluted chromatogram, peak shapes and peak integration for sequences between 9-bit (511 elements) and 13-bit (8191 elements) and modulation intervals Δt between 5 s and 500 ms using a mixture of five components. It could be demonstrated that even for high-speed modulation at time intervals of 500 ms the chromatographic information is very well preserved and that the separation efficiency can be improved by very narrow sample injections. Furthermore this study shows that the relative peak areas in multiplexed chromatograms do not deviate from conventionally recorded chromatograms. Copyright © 2010 Elsevier B.V. All rights reserved.
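The multiplexing and decoding idea can be sketched in a few lines, assuming an idealized noise-free detector and circular (wrap-around) convolution. The 9-bit sequence below comes from a shift register with the recurrence a[n] = a[n-9] XOR a[n-5]; the toy peak positions and all function names are hypothetical and are not taken from the paper.

```python
def m_sequence(n_bits=9, taps=(9, 5)):
    """One period (2**n_bits - 1 elements) of a 0/1 maximal-length sequence.

    Implements a[n] = a[n-9] XOR a[n-5], which corresponds to the
    primitive polynomial x^9 + x^4 + 1.
    """
    state = [1] + [0] * (n_bits - 1)      # any non-zero start state works
    out = []
    for _ in range(2 ** n_bits - 1):
        out.append(state[0])
        new_bit = 0
        for t in taps:
            new_bit ^= state[n_bits - t]
        state = state[1:] + [new_bit]
    return out

def multiplex(chrom, seq):
    """Circularly convolve a single-injection chromatogram with the injection sequence."""
    n = len(seq)
    return [sum(seq[(i - j) % n] * chrom[j] for j in range(n)) for i in range(n)]

def demultiplex(signal, seq):
    """Recover the chromatogram using the S-matrix inverse 2/(N+1) * (2*S^T - J)."""
    n = len(seq)
    scale = 2.0 / (n + 1)
    return [scale * sum((2 * seq[(i - j) % n] - 1) * signal[i] for i in range(n))
            for j in range(n)]

seq = m_sequence()                                  # 511 elements, as in the 9-bit case
chrom = [0.0] * len(seq)
chrom[40], chrom[41], chrom[120] = 1.0, 0.5, 2.0    # toy peaks (assumed)
recovered = demultiplex(multiplex(chrom, seq), seq)
max_error = max(abs(a - b) for a, b in zip(recovered, chrom))
print(len(seq), sum(seq), max_error)
```

In this idealized setting the recovery is exact; real data add noise and modulation imprecision, which is what the abstract's parameter study addresses.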
The female urinary microbiome in urgency urinary incontinence.

PubMed

Pearce, Meghan M; Zilliox, Michael J; Rosenfeld, Amy B; Thomas-White, Krystal J; Richter, Holly E; Nager, Charles W; Visco, Anthony G; Nygaard, Ingrid E; Barber, Matthew D; Schaffer, Joseph; Moalli, Pamela; Sung, Vivian W; Smith, Ariana L; Rogers, Rebecca; Nolen, Tracy L; Wallace, Dennis; Meikle, Susan F; Gai, Xiaowu; Wolfe, Alan J; Brubaker, Linda

2015-09-01

The purpose of this study was to characterize the urinary microbiota in women who are planning treatment for urgency urinary incontinence and to describe clinical associations with urinary symptoms, urinary tract infection, and treatment outcomes. Catheterized urine samples were collected from multisite randomized trial participants who had no clinical evidence of urinary tract infection; 16S ribosomal RNA gene sequencing was used to dichotomize participants as either DNA sequence-positive or sequence-negative. Associations with demographics, urinary symptoms, urinary tract infection risk, and treatment outcomes were determined. In sequence-positive samples, microbiotas were characterized on the basis of their dominant microorganisms. More than one-half (51.1%; 93/182) of the participants' urine samples were sequence-positive. Sequence-positive participants were younger (55.8 vs 61.3 years old; P = .0007), had a higher body mass index (33.7 vs 30.1 kg/m²; P = .0009), had a higher mean baseline daily urgency urinary incontinence episodes (5.7 vs 4.2 episodes; P < .0001), responded better to treatment (decrease in urgency urinary incontinence episodes, -4.4 vs -3.3; P = .0013), and were less likely to experience urinary tract infection (9% vs 27%; P = .0011). In sequence-positive samples, 8 major bacterial clusters were identified; 7 clusters were dominated not only by a single genus, most commonly Lactobacillus (45%) or Gardnerella (17%), but also by other taxa (25%). The remaining cluster had no dominant genus (13%). DNA sequencing confirmed urinary bacterial DNA in many women with urgency urinary incontinence who had no signs of infection. Sequence status was associated with baseline urgency urinary incontinence episodes, treatment response, and posttreatment urinary tract infection risk. Copyright © 2015 Elsevier Inc. All rights reserved.
Studies in astronomical time series analysis: Modeling random processes in the time domain

NASA Technical Reports Server (NTRS)

Scargle, J. D.

1979-01-01

Random process models phrased in the time domain are used to analyze astrophysical time series data produced by random processes. A moving average (MA) model represents the data as a sequence of pulses occurring randomly in time, with random amplitudes. An autoregressive (AR) model represents the correlations in the process in terms of a linear function of past values. The best AR model is determined from sampled data and transformed to an MA for interpretation. The randomness of the pulse amplitudes is maximized by a FORTRAN algorithm which is relatively stable numerically. Results of test cases are given to study the effects of adding noise and of different distributions for the pulse amplitudes. A preliminary analysis of the optical light curve of the quasar 3C 273 is given.
Optimization of whole-transcriptome amplification from low cell density deep-sea microbial samples for metatranscriptomic analysis.

PubMed

Wu, Jieying; Gao, Weimin; Zhang, Weiwen; Meldrum, Deirdre R

2011-01-01

Limitation in sample quality and quantity is one of the big obstacles for applying metatranscriptomic technologies to explore gene expression and functionality of microbial communities in natural environments. In this study, several amplification methods were evaluated for whole-transcriptome amplification of deep-sea microbial samples, which are of low cell density and high impurity. The best amplification method was identified and incorporated into a complete protocol to isolate and amplify deep-sea microbial samples. In the protocol, total RNA was first isolated by a modified method combining Trizol (Invitrogen, CA) and RNeasy (QIAGEN, CA) method, amplified with a WT-Ovation™ Pico RNA Amplification System (NuGEN, CA), and then converted to double-strand DNA from single-strand cDNA with a WT-Ovation™ Exon Module (NuGEN, CA). The products from the whole-transcriptome amplification of deep-sea microbial samples were assessed first through random clone library sequencing. The BLAST search results showed that marine-based sequences are dominant in the libraries, consistent with the ecological source of the samples. The products were then used for next-generation Roche GS FLX Titanium sequencing to obtain metatranscriptome data. Preliminary analysis of the metatranscriptomic data showed good sequencing quality. Although the protocol was designed and demonstrated to be effective for deep-sea microbial samples, it should be applicable to similar samples from other extreme environments in exploring community structure and functionality of microbial communities. Copyright © 2010 Elsevier B.V. All rights reserved.
GTRAC: fast retrieval from compressed collections of genomic variants

PubMed Central

Tatwawadi, Kedar; Hernaez, Mikel; Ochoa, Idoia; Weissman, Tsachy

2016-01-01

Motivation: The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether. Results: We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures. Availability and Implementation: The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC Contact: kedart@stanford.edu Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27587665
GTRAC: fast retrieval from compressed collections of genomic variants.

PubMed

Tatwawadi, Kedar; Hernaez, Mikel; Ochoa, Idoia; Weissman, Tsachy

2016-09-01

The dramatic decrease in the cost of sequencing has resulted in the generation of huge amounts of genomic data, as evidenced by projects such as the UK10K and the Million Veteran Project, with the number of sequenced genomes ranging in the order of 10 K to 1 M. Due to the large redundancies among genomic sequences of individuals from the same species, most of the medical research deals with the variants in the sequences as compared with a reference sequence, rather than with the complete genomic sequences. Consequently, millions of genomes represented as variants are stored in databases. These databases are constantly updated and queried to extract information such as the common variants among individuals or groups of individuals. Previous algorithms for compression of this type of databases lack efficient random access capabilities, rendering querying the database for particular variants and/or individuals extremely inefficient, to the point where compression is often relinquished altogether. We present a new algorithm for this task, called GTRAC, that achieves significant compression ratios while allowing fast random access over the compressed database. For example, GTRAC is able to compress a Homo sapiens dataset containing 1092 samples in 1.1 GB (compression ratio of 160), while allowing for decompression of specific samples in less than a second and decompression of specific variants in 17 ms. GTRAC uses and adapts techniques from information theory, such as a specialized Lempel-Ziv compressor, and tailored succinct data structures. The GTRAC algorithm is available for download at: https://github.com/kedartatwawadi/GTRAC Contact: kedart@stanford.edu Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Sampled-Data Consensus of Linear Multi-agent Systems With Packet Losses.

PubMed

Zhang, Wenbing; Tang, Yang; Huang, Tingwen; Kurths, Jurgen

In this paper, the consensus problem is studied for a class of multi-agent systems with sampled data and packet losses, where random and deterministic packet losses are considered, respectively. For random packet losses, a Bernoulli-distributed white sequence is used to describe packet dropouts among agents in a stochastic way. For deterministic packet losses, a switched system with stable and unstable subsystems is employed to model packet dropouts in a deterministic way. The purpose of this paper is to derive consensus criteria, such that linear multi-agent systems with sampled-data and packet losses can reach consensus. By means of the Lyapunov function approach and the decomposition method, the design problem of a distributed controller is solved in terms of convex optimization. The interplay among the allowable bound of the sampling interval, the probability of random packet losses, and the rate of deterministic packet losses are explicitly derived to characterize consensus conditions. The obtained criteria are closely related to the maximum eigenvalue of the Laplacian matrix versus the second minimum eigenvalue of the Laplacian matrix, which reveals the intrinsic effect of communication topologies on consensus performance. Finally, simulations are given to show the effectiveness of the proposed results.
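The random-loss case can be illustrated with a much simpler model than the one analyzed in the paper: single-integrator agents on a ring, updated at each sampling instant, with every directed link dropped independently according to a Bernoulli draw. The topology, step size, loss probability, and names below are assumptions for the sketch, and no controller design or Lyapunov analysis is reproduced.

```python
import random

def simulate_consensus(x0, neighbors, step=0.2, loss_prob=0.3, n_steps=200, seed=0):
    """Sampled-data averaging in which each packet is lost with probability loss_prob."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(n_steps):
        new_x = []
        for i, xi in enumerate(x):
            update = 0.0
            for j in neighbors[i]:
                delivered = rng.random() >= loss_prob   # Bernoulli packet arrival
                if delivered:
                    update += x[j] - xi
            # step * (number of neighbors) < 1 keeps each update a convex combination
            new_x.append(xi + step * update)
        x = new_x
    return x

ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}   # assumed 4-agent ring topology
initial = [1.0, -2.0, 4.0, 0.5]
final = simulate_consensus(initial, ring)
spread = max(final) - min(final)
print(final, spread)
```

Raising `loss_prob` or `step` in this toy model gives a feel for the trade-off between sampling interval and loss probability that the abstract characterizes analytically.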
Generalized species sampling priors with latent Beta reinforcements

PubMed Central

Airoldi, Edoardo M.; Costa, Thiago; Bassetti, Federico; Leisen, Fabrizio; Guindani, Michele

2014-01-01

Many popular Bayesian nonparametric priors can be characterized in terms of exchangeable species sampling sequences. However, in some applications, exchangeability may not be appropriate. We introduce a novel and probabilistically coherent family of non-exchangeable species sampling sequences characterized by a tractable predictive probability function with weights driven by a sequence of independent Beta random variables. We compare their theoretical clustering properties with those of the Dirichlet Process and the two parameters Poisson-Dirichlet process. The proposed construction provides a complete characterization of the joint process, differently from existing work. We then propose the use of such process as prior distribution in a hierarchical Bayes modeling framework, and we describe a Markov Chain Monte Carlo sampler for posterior inference. We evaluate the performance of the prior and the robustness of the resulting inference in a simulation study, providing a comparison with popular Dirichlet Processes mixtures and Hidden Markov Models. Finally, we develop an application to the detection of chromosomal aberrations in breast cancer by leveraging array CGH data. PMID:25870462
Quantum random bit generation using energy fluctuations in stimulated Raman scattering.

PubMed

Bustard, Philip J; England, Duncan G; Nunn, Josh; Moffatt, Doug; Spanner, Michael; Lausten, Rune; Sussman, Benjamin J

2013-12-02

Random number sequences are a critical resource in modern information processing systems, with applications in cryptography, numerical simulation, and data sampling. We introduce a quantum random number generator based on the measurement of pulse energy quantum fluctuations in Stokes light generated by spontaneously-initiated stimulated Raman scattering. Bright Stokes pulse energy fluctuations up to five times the mean energy are measured with fast photodiodes and converted to unbiased random binary strings. Since the pulse energy is a continuous variable, multiple bits can be extracted from a single measurement. Our approach can be generalized to a wide range of Raman active materials; here we demonstrate a prototype using the optical phonon line in bulk diamond.
Molecular Survey of Hepatozoon canis in Red Foxes (Vulpes vulpes) from Romania.

PubMed

Imre, Mirela; Dudu, Andreea; Ilie, Marius S; Morariu, Sorin; Imre, Kálmán; Dărăbuş, Gheorghe

2015-08-01

Blood samples of 119 red foxes, originating from 44 hunting grounds of 3 western counties (Arad, Hunedoara, and Timiş) of Romania, have been examined for the presence of Hepatozoon canis infection using the conventional polymerase chain reaction (PCR) of the fragment of 18S rRNA gene. Overall, 15 (12.6%) samples were found to be PCR-positive. Of the sampled hunting grounds, 29.5% (13/44) were found positive. Positive samples were recorded in all screened counties with the prevalence of 14.8% (9/61) in Arad, 9.8% (5/51) in Timiş, and 14.3% (1/7) in Hunedoara, respectively. No correlation was found (P > 0.05) between H. canis positivity and gender or territorial distribution of the infection. To confirm PCR results, 9 randomly selected amplicons were sequenced. The obtained sequences were identical to each other, confirmed the results of the conventional PCR, and showed 98-100% homology to other H. canis sequences. The results of the current survey support the role of red foxes as sylvatic reservoirs of H. canis in Romania.
Usefulness of fire ant genetics in insecticide efficacy trials

USDA-ARS's Scientific Manuscript database

Mature fire ant colonies contain an average of 80,000 worker ants. For this study, eight fire ant workers were randomly sampled from each colony. DNA fingerprints for each individual ant were generated using 21 simple sequence repeats (SSR) markers that were developed from fire ant DNA by other lab...
Feedback shift register sequences versus uniformly distributed random sequences for correlation chromatography

NASA Technical Reports Server (NTRS)

Kaljurand, M.; Valentin, J. R.; Shao, M.

1996-01-01

Two alternative input sequences are commonly employed in correlation chromatography (CC). They are sequences derived according to the algorithm of the feedback shift register (i.e., pseudo random binary sequences (PRBS)) and sequences derived by using the uniform random binary sequences (URBS). These two sequences are compared. By applying the "cleaning" data processing technique to the correlograms that result from these sequences, we show that when the PRBS is used the S/N of the correlogram is much higher than the one resulting from using URBS.
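One property that plausibly underlies the difference between the two inputs is their circular autocorrelation: a maximal-length shift-register sequence has a perfectly flat off-peak autocorrelation, while a uniformly random binary sequence does not. The sketch below checks this numerically; the register length, taps, seed, and names are assumptions, and the "cleaning" procedure from the abstract is not implemented.

```python
import random

def prbs(n_bits=9, taps=(9, 5)):
    """One period of a maximal-length 0/1 sequence from a feedback shift register."""
    state = [1] + [0] * (n_bits - 1)      # any non-zero start state works
    out = []
    for _ in range(2 ** n_bits - 1):
        out.append(state[0])
        new_bit = 0
        for t in taps:
            new_bit ^= state[n_bits - t]  # a[n] = a[n-9] XOR a[n-5]
        state = state[1:] + [new_bit]
    return out

def circular_autocorr(bits, lag):
    """Circular autocorrelation of the sequence after mapping 0/1 to -1/+1."""
    n = len(bits)
    return sum((2 * bits[i] - 1) * (2 * bits[(i + lag) % n] - 1) for i in range(n))

n = 511
prbs_bits = prbs()
rng = random.Random(0)                                # arbitrary fixed seed
urbs_bits = [rng.randint(0, 1) for _ in range(n)]     # uniform random binary sequence

prbs_side = [circular_autocorr(prbs_bits, k) for k in range(1, n)]
urbs_side = [circular_autocorr(urbs_bits, k) for k in range(1, n)]
print(set(prbs_side), max(abs(v) for v in urbs_side))
```

The fluctuating off-peak values of the random sequence show up as correlation noise in a correlogram, which is consistent with the lower S/N the abstract reports for URBS.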
Database-independent Protein Sequencing (DiPS) Enables Full-length de Novo Protein and Antibody Sequence Determination.

PubMed

Savidor, Alon; Barzilay, Rotem; Elinger, Dalia; Yarden, Yosef; Lindzen, Moshit; Gabashvili, Alexandra; Adiv Tal, Ophir; Levin, Yishai

2017-06-01

Traditional "bottom-up" proteomic approaches use proteolytic digestion, LC-MS/MS, and database searching to elucidate peptide identities and their parent proteins. Protein sequences absent from the database cannot be identified, and even if present in the database, complete sequence coverage is rarely achieved even for the most abundant proteins in the sample. Thus, sequencing of unknown proteins such as antibodies or constituents of metaproteomes remains a challenging problem. To date, there is no available method for full-length protein sequencing, independent of a reference database, in high throughput. Here, we present Database-independent Protein Sequencing, a method for unambiguous, rapid, database-independent, full-length protein sequencing. The method is a novel combination of non-enzymatic, semi-random cleavage of the protein, LC-MS/MS analysis, peptide de novo sequencing, extraction of peptide tags, and their assembly into a consensus sequence using an algorithm named "Peptide Tag Assembler." As proof-of-concept, the method was applied to samples of three known proteins representing three size classes and to a previously un-sequenced, clinically relevant monoclonal antibody. Excluding leucine/isoleucine and glutamic acid/deamidated glutamine ambiguities, end-to-end full-length de novo sequencing was achieved with 99-100% accuracy for all benchmarking proteins and the antibody light chain. Accuracy of the sequenced antibody heavy chain, including the entire variable region, was also 100%, but there was a 23-residue gap in the constant region sequence. © 2017 by The American Society for Biochemistry and Molecular Biology, Inc.
Analysis of delay reducing and fuel saving sequencing and spacing algorithms for arrival traffic

NASA Technical Reports Server (NTRS)

Neuman, Frank; Erzberger, Heinz

1991-01-01

The air traffic control subsystem that performs sequencing and spacing is discussed. The function of the sequencing and spacing algorithms is to automatically plan the most efficient landing order and to assign optimally spaced landing times to all arrivals. Several algorithms are described and their statistical performance is examined. Sequencing brings order to an arrival sequence for aircraft. First-come-first-served sequencing (FCFS) establishes a fair order, based on estimated times of arrival, and determines proper separations. Because of the randomness of the arriving traffic, gaps will remain in the sequence of aircraft. Delays are reduced by time-advancing the leading aircraft of each group while still preserving the FCFS order. Tightly spaced groups of aircraft remain with a mix of heavy and large aircraft. Spacing requirements differ for different types of aircraft trailing each other. Traffic is reordered slightly to take advantage of this spacing criterion, thus shortening the groups and reducing average delays. For heavy traffic, delays for different traffic samples vary widely, even when the same set of statistical parameters is used to produce each sample. This report supersedes NASA TM-102795 on the same subject. It includes a new method of time-advance as well as an efficient method of sequencing and spacing for two dependent runways.
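A minimal sketch of first-come-first-served sequencing with type-dependent spacing follows. The separation values, aircraft data, and function names are hypothetical, and the time-advance and reordering refinements described in the abstract are not included.

```python
# Hypothetical minimum separations in seconds, keyed by (leader type, follower type).
SEPARATION = {
    ("heavy", "heavy"): 90, ("heavy", "large"): 120,
    ("large", "heavy"): 60, ("large", "large"): 60,
}

def fcfs_schedule(arrivals):
    """Assign landing times in order of estimated time of arrival (ETA).

    arrivals: list of (callsign, aircraft_type, eta_seconds).
    Returns a list of (callsign, landing_time) in landing order.
    """
    schedule = []
    prev_time, prev_type = None, None
    for callsign, ac_type, eta in sorted(arrivals, key=lambda a: a[2]):
        if prev_time is None:
            landing = eta
        else:
            # Land at the ETA unless the required spacing behind the leader forces a delay.
            landing = max(eta, prev_time + SEPARATION[(prev_type, ac_type)])
        schedule.append((callsign, landing))
        prev_time, prev_type = landing, ac_type
    return schedule

arrivals = [("A1", "heavy", 0), ("A2", "large", 30), ("A3", "heavy", 50), ("A4", "large", 400)]
schedule = fcfs_schedule(arrivals)
eta_by_callsign = {c: eta for c, _, eta in arrivals}
total_delay = sum(t - eta_by_callsign[c] for c, t in schedule)
print(schedule, total_delay)
```

In this toy sample the first three aircraft form a tightly spaced group, and the gap before A4 is the kind of slack that arises from random arrival times.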
Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

DOEpatents

Studier, F. William

1995-04-18

Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient.
Method for high-volume sequencing of nucleic acids: random and directed priming with libraries of oligonucleotides

DOEpatents

Studier, F.W.

1995-04-18

Random and directed priming methods for determining nucleotide sequences by enzymatic sequencing techniques, using libraries of primers of lengths 8, 9 or 10 bases, are disclosed. These methods permit direct sequencing of nucleic acids as large as 45,000 base pairs or larger without the necessity for subcloning. Individual primers are used repeatedly to prime sequence reactions in many different nucleic acid molecules. Libraries containing as few as 10,000 octamers, 14,200 nonamers, or 44,000 decamers would have the capacity to determine the sequence of almost any cosmid DNA. Random priming with a fixed set of primers from a smaller library can also be used to initiate the sequencing of individual nucleic acid molecules, with the sequence being completed by directed priming with primers from the library. In contrast to random cloning techniques, a combined random and directed priming strategy is far more efficient. 2 figs.
First Description of Two Sequence Type 2 Acinetobacter baumannii Isolates Carrying OXA-23 Carbapenemase in Pagellus acarne Fished from the Mediterranean Sea near Bejaia, Algeria

PubMed Central

Brahmi, Soumia; Touati, Abdelaziz; Cadière, Axelle; Djahmi, Nassima; Pantel, Alix; Sotto, Albert; Dunyach-Remy, Catherine

2016-01-01

To determine the occurrence of carbapenem-resistant Acinetobacter baumannii in fish fished from the Mediterranean Sea near the Bejaia coast (Algeria), we studied 300 gills and gut samples that had been randomly and prospectively collected during 1 year. After screening on selective agar media, using PCR arrays and whole-genome sequencing, we identified for the first time two OXA-23-producing A. baumannii strains belonging to the widespread sequence type 2 (ST2)/international clone II and harboring aminoglycoside-modifying enzymes [aac(6′)-Ib and aac(3′)-I genes]. PMID:26787693

Betting on Illusory Patterns: Probability Matching in Habitual Gamblers.

PubMed

Gaissmaier, Wolfgang; Wilke, Andreas; Scheibehenne, Benjamin; McCanney, Paige; Barrett, H Clark

2016-03-01

Why do people gamble? A large body of research suggests that cognitive distortions play an important role in pathological gambling. Many of these distortions are specific cases of a more general misperception of randomness, specifically of an illusory perception of patterns in random sequences. In this article, we provide further evidence for the assumption that gamblers are particularly prone to perceiving illusory patterns. In particular, we compared habitual gamblers to a matched sample of community members with regard to how much they exhibit the choice anomaly 'probability matching'. Probability matching describes the tendency to match response proportions to outcome probabilities when predicting binary outcomes. It leads to a lower expected accuracy than the maximizing strategy of predicting the most likely event on each trial. Previous research has shown that an illusory perception of patterns in random sequences fuels probability matching. So does impulsivity, which is also reported to be higher in gamblers. We therefore hypothesized that gamblers will exhibit more probability matching than non-gamblers, which was confirmed in a controlled laboratory experiment. Additionally, gamblers scored much lower than community members on the cognitive reflection task, which indicates higher impulsivity. This difference could account for the difference in probability matching between the samples. These results suggest that gamblers are more willing to bet impulsively on perceived illusory patterns.
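The accuracy cost of probability matching can be made concrete under a simple assumption: outcomes are independent draws with a fixed probability p for the more likely event, and predictions are made independently of the outcomes. The function names below are illustrative only.

```python
def matching_accuracy(p):
    """Expected accuracy when each outcome is predicted with the same probability it occurs."""
    return p * p + (1 - p) * (1 - p)

def maximizing_accuracy(p):
    """Expected accuracy when the more likely outcome is predicted on every trial."""
    return max(p, 1 - p)

p = 0.7
print(matching_accuracy(p), maximizing_accuracy(p))
```

For p = 0.7, matching is correct on about 58% of trials while maximizing is correct on 70%, and the gap closes only at p = 0.5 or p = 1.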
Investigation of microsatellite instability in Turkish breast cancer patients.

PubMed

Demokan, Semra; Muslumanoglu, Mahmut; Yazici, H; Igci, Abdullah; Dalay, Nejat

2002-01-01

Multiple somatic and inherited genetic changes that lead to loss of growth control may contribute to the development of breast cancer. Microsatellites are tandem repeats of simple sequences that occur abundantly and at random throughout most eucaryotic genomes. Microsatellite instability (MI), characterized by the presence of random contractions or expansions in the length of simple sequence repeats or microsatellites, is observed in a variety of tumors. The aim of this study was to compare tumor DNA fingerprints with constitutional DNA fingerprints to investigate changes specific to breast cancer and evaluate its correlation with clinical characteristics. Tumor and normal tissue samples of 38 patients with breast cancer were investigated by comparing PCR-amplified microsatellite sequences D2S443 and D21S1436. Microsatellite instability at D21S1436 and D2S443 was found in 5 (13%) and 7 (18%) patients, respectively. Two patients displayed instability at both marker loci. No association was found between MI and age, family history, lymph node involvement and other clinical parameters.
Reducing DNA context dependence in bacterial promoters

PubMed Central

Carr, Swati B.; Densmore, Douglas M.

2017-01-01

Variation in the DNA sequence upstream of bacterial promoters is known to affect the expression levels of the products they regulate, sometimes dramatically. While neutral synthetic insulator sequences have been found to buffer promoters from upstream DNA context, there are no established methods for designing effective insulator sequences with predictable effects on expression levels. We address this problem with Degenerate Insulation Screening (DIS), a novel method based on a randomized 36-nucleotide insulator library and a simple, high-throughput, flow-cytometry-based screen that randomly samples from a library of 4^36 potential insulated promoters. The results of this screen can then be compared against a reference uninsulated device to select a set of insulated promoters providing a precise level of expression. We verify this method by insulating the constitutive, inducible, and repressible promoters of a four transcriptional-unit inverter (NOT-gate) circuit, finding both that order dependence is largely eliminated by insulation and that circuit performance is also significantly improved, with a 5.8-fold mean improvement in on/off ratio. PMID:28422998
Metagenomic approaches for direct and cell culture evaluation of the virological quality of wastewater

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aw, Tiong Gim; Howe, Adina; Rose, Joan B.

2014-12-01

Genomic-based molecular techniques are emerging as powerful tools that allow a comprehensive characterization of water and wastewater microbiomes. Most recently, next generation sequencing (NGS) technologies which produce large amounts of sequence data are beginning to impact the field of environmental virology. In this study, NGS and bioinformatics have been employed for the direct detection and characterization of viruses in wastewater and of viruses isolated after cell culture. Viral particles were concentrated and purified from sewage samples by polyethylene glycol precipitation. Viral nucleic acid was extracted and randomly amplified prior to sequencing using Illumina technology, yielding a total of 18 million sequence reads. Most of the viral sequences detected could not be characterized, indicating the great viral diversity that is yet to be discovered. This sewage virome was dominated by bacteriophages and contained sequences related to known human pathogenic viruses such as adenoviruses (species B, C and F), polyomaviruses JC and BK and enteroviruses (type B). An array of other animal viruses was also found, suggesting unknown zoonotic viruses. This study demonstrated the feasibility of metagenomic approaches to characterize viruses in complex environmental water samples.
Microsatellite genotyping and genome-wide single nucleotide polymorphism-based indices of Plasmodium falciparum diversity within clinical infections.

PubMed

Murray, Lee; Mobegi, Victor A; Duffy, Craig W; Assefa, Samuel A; Kwiatkowski, Dominic P; Laman, Eugene; Loua, Kovana M; Conway, David J

2016-05-12

In regions where malaria is endemic, individuals are often infected with multiple distinct parasite genotypes, a situation that may impact on evolution of parasite virulence and drug resistance. Most approaches to studying genotypic diversity have involved analysis of a modest number of polymorphic loci, although whole genome sequencing enables a broader characterisation of samples. PCR-based microsatellite typing of a panel of ten loci was performed on Plasmodium falciparum in 95 clinical isolates from a highly endemic area in the Republic of Guinea, to characterize within-isolate genetic diversity. Separately, single nucleotide polymorphism (SNP) data from genome-wide short-read sequences of the same samples were used to derive within-isolate fixation indices (F ws), an inverse measure of diversity within each isolate compared to overall local genetic diversity. The latter indices were compared with the microsatellite results, and also with indices derived by randomly sampling modest numbers of SNPs. As expected, the number of microsatellite loci with more than one allele in each isolate was highly significantly inversely correlated with the genome-wide F ws fixation index (r = -0.88, P < 0.001). However, the microsatellite analysis revealed that most isolates contained mixed genotypes, even those that had no detectable genome sequence heterogeneity. Random sampling of different numbers of SNPs showed that an F ws index derived from ten or more SNPs with minor allele frequencies of >10 % had high correlation (r > 0.90) with the index derived using all SNPs. Different types of data give highly correlated indices of within-infection diversity, although PCR-based analysis detects low-level minority genotypes not apparent in bulk sequence analysis. When whole-genome data are not obtainable, quantitative assay of ten or more SNPs can yield a reasonably accurate estimate of the within-infection fixation index (F ws).
Predicting membrane protein types using various decision tree classifiers based on various modes of general PseAAC for imbalanced datasets.

PubMed

Sankari, E Siva; Manimegalai, D

2017-12-21

Predicting membrane protein types is an important and challenging research area in bioinformatics and proteomics. Traditional biophysical methods are used to classify membrane protein types. Due to large exploration of uncharacterized protein sequences in databases, traditional methods are very time consuming, expensive and susceptible to errors. Hence, it is highly desirable to develop a robust, reliable, and efficient method to predict membrane protein types. Imbalanced datasets and large datasets are often handled well by decision tree classifiers. Since imbalanced datasets are taken, the performance of various decision tree classifiers such as Decision Tree (DT), Classification And Regression Tree (CART), C4.5, Random tree, REP (Reduced Error Pruning) tree, ensemble methods such as Adaboost, RUS (Random Under Sampling) boost, Rotation forest and Random forest are analysed. Among the various decision tree classifiers Random forest performs well in less time with good accuracy of 96.35%. Another inference is RUS boost decision tree classifier is able to classify one or two samples in the class with very less samples while the other classifiers such as DT, Adaboost, Rotation forest and Random forest are not sensitive for the classes with fewer samples. Also the performance of decision tree classifiers is compared with SVM (Support Vector Machine) and Naive Bayes classifier. Copyright © 2017 Elsevier Ltd. All rights reserved.
Read clouds uncover variation in complex regions of the human genome.

PubMed

Bishara, Alex; Liu, Yuling; Weng, Ziming; Kashef-Haghighi, Dorna; Newburger, Daniel E; West, Robert; Sidow, Arend; Batzoglou, Serafim

2015-10-01

Although an increasing amount of human genetic variation is being identified and recorded, determining variants within repeated sequences of the human genome remains a challenge. Most population and genome-wide association studies have therefore been unable to consider variation in these regions. Core to the problem is the lack of a sequencing technology that produces reads with sufficient length and accuracy to enable unique mapping. Here, we present a novel methodology of using read clouds, obtained by accurate short-read sequencing of DNA derived from long fragment libraries, to confidently align short reads within repeat regions and enable accurate variant discovery. Our novel algorithm, Random Field Aligner (RFA), captures the relationships among the short reads governed by the long read process via a Markov Random Field. We utilized a modified version of the Illumina TruSeq synthetic long-read protocol, which yielded shallow-sequenced read clouds. We test RFA through extensive simulations and apply it to discover variants on the NA12878 human sample, for which shallow TruSeq read cloud sequencing data are available, and on an invasive breast carcinoma genome that we sequenced using the same method. We demonstrate that RFA facilitates accurate recovery of variation in 155 Mb of the human genome, including 94% of 67 Mb of segmental duplication sequence and 96% of 11 Mb of transcribed sequence, that are currently hidden from short-read technologies. © 2015 Bishara et al.; Published by Cold Spring Harbor Laboratory Press.
Methodological reporting of randomized trials in five leading Chinese nursing journals.

PubMed

Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu

2014-01-01

Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting complying with CONSORT and explore associated trial level variables in the Chinese nursing care field. In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured through the methods section of the CONSORT checklist and the overall CONSORT methodological items score was calculated and expressed as a percentage. Meanwhile, we hypothesized that some general and methodological characteristics were associated with reporting quality and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34 ± 0.97 (Mean ± SD). No RCT reported descriptions and changes in "trial design," changes in "outcomes" and "implementation," or descriptions of the similarity of interventions for "blinding." Poor reporting was found in detailing the "settings of participants" (13.1%), "type of randomization sequence generation" (1.8%), calculation methods of "sample size" (0.4%), explanation of any interim analyses and stopping guidelines for "sample size" (0.3%), "allocation concealment mechanism" (0.3%), additional analyses in "statistical methods" (2.1%), and targeted subjects and methods of "blinding" (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of "participants," "interventions," and definitions of the "outcomes" and "statistical methods." The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score. 
The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods.
Differential gene expression in the siphonophore Nanomia bijuga (Cnidaria) assessed with multiple next-generation sequencing workflows.

PubMed

Siebert, Stefan; Robinson, Mark D; Tintori, Sophia C; Goetz, Freya; Helm, Rebecca R; Smith, Stephen A; Shaner, Nathan; Haddock, Steven H D; Dunn, Casey W

2011-01-01

We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. 
Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing.
Differential Gene Expression in the Siphonophore Nanomia bijuga (Cnidaria) Assessed with Multiple Next-Generation Sequencing Workflows

PubMed Central

Siebert, Stefan; Robinson, Mark D.; Tintori, Sophia C.; Goetz, Freya; Helm, Rebecca R.; Smith, Stephen A.; Shaner, Nathan; Haddock, Steven H. D.; Dunn, Casey W.

2011-01-01

We investigated differential gene expression between functionally specialized feeding polyps and swimming medusae in the siphonophore Nanomia bijuga (Cnidaria) with a hybrid long-read/short-read sequencing strategy. We assembled a set of partial gene reference sequences from long-read data (Roche 454), and generated short-read sequences from replicated tissue samples that were mapped to the references to quantify expression. We collected and compared expression data with three short-read expression workflows that differ in sample preparation, sequencing technology, and mapping tools. These workflows were Illumina mRNA-Seq, which generates sequence reads from random locations along each transcript, and two tag-based approaches, SOLiD SAGE and Helicos DGE, which generate reads from particular tag sites. Differences in expression results across workflows were mostly due to the differential impact of missing data in the partial reference sequences. When all 454-derived gene reference sequences were considered, Illumina mRNA-Seq detected more than twice as many differentially expressed (DE) reference sequences as the tag-based workflows. This discrepancy was largely due to missing tag sites in the partial reference that led to false negatives in the tag-based workflows. When only the subset of reference sequences that unambiguously have tag sites was considered, we found broad congruence across workflows, and they all identified a similar set of DE sequences. Our results are promising in several regards for gene expression studies in non-model organisms. First, we demonstrate that a hybrid long-read/short-read sequencing strategy is an effective way to collect gene expression data when an annotated genome sequence is not available. Second, our replicated sampling indicates that expression profiles are highly consistent across field-collected animals in this case. 
Third, the impacts of partial reference sequences on the ability to detect DE can be mitigated through workflow choice and deeper reference sequencing. PMID:21829563
Using a Calendar and Explanatory Instructions to Aid Within-Household Selection in Mail Surveys

ERIC Educational Resources Information Center

Stange, Mathew; Smyth, Jolene D.; Olson, Kristen

2016-01-01

Although researchers can easily select probability samples of addresses using the U.S. Postal Service's Delivery Sequence File, randomly selecting respondents within households for surveys remains challenging. Researchers often place within-household selection instructions, such as the next or last birthday methods, in survey cover letters to…
Supplementing Literacy Instruction with a Media-Rich Intervention: Results of a Randomized Controlled Trial

ERIC Educational Resources Information Center

Penuel, William R.; Bates, Lauren; Gallagher, Lawrence P.; Pasnik, Shelley; Llorente, Carlin; Townsend, Eve; Hupert, Naomi; Dominguez, Ximena; VanderBorght, Mieke

2012-01-01

This study investigates whether a curriculum supplement organized as a sequence of teacher-led literacy activities using digital content from public educational television programs can improve early literacy outcomes of low-income preschoolers. The study sample was 436 children in 80 preschool classrooms in California and New York. Preschool…
Effects of different preservation methods on inter simple sequence repeat (ISSR) and random amplified polymorphic DNA (RAPD) molecular markers in botanic samples.

PubMed

Wang, Xiaolong; Li, Lin; Zhao, Jiaxin; Li, Fangliang; Guo, Wei; Chen, Xia

2017-04-01

To evaluate the effects of different preservation methods (stored in a -20°C ice chest, preserved in liquid nitrogen and dried in silica gel) on inter simple sequence repeat (ISSR) or random amplified polymorphic DNA (RAPD) analyses in various botanical specimens (including broad-leaved plants, needle-leaved plants and succulent plants) for different times (three weeks and three years), we used a statistical analysis based on the number of bands, genetic index and cluster analysis. The results demonstrate that methods used to preserve samples can provide sufficient amounts of genomic DNA for ISSR and RAPD analyses; however, the effect of different preservation methods on these analyses vary significantly, and the preservation time has little effect on these analyses. Our results provide a reference for researchers to select the most suitable preservation method depending on their study subject for the analysis of molecular markers based on genomic DNA. Copyright © 2017 Académie des sciences. Published by Elsevier Masson SAS. All rights reserved.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Denef, Vincent; Shah, Manesh B; Verberkmoes, Nathan C

The recent surge in microbial genomic sequencing, combined with the development of high-throughput liquid chromatography-mass-spectrometry-based (LC/LC-MS/MS) proteomics, has raised the question of the extent to which genomic information of one strain or environmental sample can be used to profile proteomes of related strains or samples. Even with decreasing sequencing costs, it remains impractical to obtain genomic sequence for every strain or sample analyzed. Here, we evaluate how shotgun proteomics is affected by amino acid divergence between the sample and the genomic database using a probability-based model and a random mutation simulation model constrained by experimental data. To assess the effects of nonrandom distribution of mutations, we also evaluated identification levels using in silico peptide data from sequenced isolates with average amino acid identities (AAI) varying between 76 and 98%. We compared the predictions to experimental protein identification levels for a sample that was evaluated using a database that included genomic information for the dominant organism and for a closely related variant (95% AAI). The range of models set the boundaries at which half of the proteins in a proteomic experiment can be identified to be 77-92% AAI between orthologs in the sample and database. Consistent with this prediction, experimental data indicated loss of half the identifiable proteins at 90% AAI. Additional analysis indicated a 6.4% reduction of the initial protein coverage per 1% amino acid divergence and total identification loss at 86% AAI. Consequently, shotgun proteomics is capable of cross-strain identifications but avoids most cross-species false positives.
Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

DOEpatents

Lucas, J.N.; Straume, T.; Bogen, K.T.

1998-03-24

A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration. 14 figs.
Identification of random nucleic acid sequence aberrations using dual capture probes which hybridize to different chromosome regions

DOEpatents

Lucas, Joe N.; Straume, Tore; Bogen, Kenneth T.

1998-01-01

A method is provided for detecting nucleic acid sequence aberrations using two immobilization steps. According to the method, a nucleic acid sequence aberration is detected by detecting nucleic acid sequences having both a first nucleic acid sequence type (e.g., from a first chromosome) and a second nucleic acid sequence type (e.g., from a second chromosome), the presence of the first and the second nucleic acid sequence type on the same nucleic acid sequence indicating the presence of a nucleic acid sequence aberration. In the method, immobilization of a first hybridization probe is used to isolate a first set of nucleic acids in the sample which contain the first nucleic acid sequence type. Immobilization of a second hybridization probe is then used to isolate a second set of nucleic acids from within the first set of nucleic acids which contain the second nucleic acid sequence type. The second set of nucleic acids are then detected, their presence indicating the presence of a nucleic acid sequence aberration.
Implementation of a quantum random number generator based on the optimal clustering of photocounts

NASA Astrophysics Data System (ADS)

Balygin, K. A.; Zaitsev, V. I.; Klimov, A. N.; Kulik, S. P.; Molotkov, S. N.

2017-10-01

To implement quantum random number generators, it is fundamentally important to have a mathematically provable and experimentally testable process of measurements of a system from which an initial random sequence is generated. This makes sure that randomness indeed has a quantum nature. A quantum random number generator has been implemented with the use of the detection of quasi-single-photon radiation by a silicon photomultiplier (SiPM) matrix, which makes it possible to reliably reach the Poisson statistics of photocounts. The choice and use of the optimal clustering of photocounts for the initial sequence of photodetection events and a method of extraction of a random sequence of 0's and 1's, which is polynomial in the length of the sequence, have made it possible to reach a yield rate of 64 Mbit/s of the output certainly random sequence.
Real-time fast physical random number generator with a photonic integrated circuit.

PubMed

Ugajin, Kazusa; Terashima, Yuta; Iwakawa, Kento; Uchida, Atsushi; Harayama, Takahisa; Yoshimura, Kazuyuki; Inubushi, Masanobu

2017-03-20

Random number generators are essential for applications in information security and numerical simulations. Most optical-chaos-based random number generators produce random bit sequences by offline post-processing with large optical components. We demonstrate a real-time hardware implementation of a fast physical random number generator with a photonic integrated circuit and a field programmable gate array (FPGA) electronic board. We generate 1-Tbit random bit sequences and evaluate their statistical randomness using NIST Special Publication 800-22 and TestU01. All of the BigCrush tests in TestU01 are passed using 410-Gbit random bit sequences. A maximum real-time generation rate of 21.1 Gb/s is achieved for random bit sequences in binary format stored in a computer, which can be directly used for applications involving secret keys in cryptography and random seeds in large-scale numerical simulations.
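As a rough illustration of the kind of check found in NIST Special Publication 800-22, the sketch below implements its simplest member, the frequency (monobit) test. It is not the authors' evaluation code and covers only one test of the suite; the function name, the seed, and the use of Python's software generator as a stand-in bit source are assumptions made for the example.

```python
import math
import random


def monobit_p_value(bits):
    """Frequency (monobit) test: p-value for the balance of ones and zeros."""
    n = len(bits)
    if n == 0:
        raise ValueError("need at least one bit")
    # Map 1 -> +1 and 0 -> -1, then sum.
    s = sum(1 if b else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    # Under the null hypothesis of fair, independent bits, s_obs is
    # approximately half-normal, which gives the erfc expression below.
    return math.erfc(s_obs / math.sqrt(2))


# Stand-in bit source (assumption): a seeded software PRNG, not a physical device.
rng = random.Random(2017)
sample = [rng.getrandbits(1) for _ in range(100_000)]
p = monobit_p_value(sample)
# SP 800-22 conventionally uses a significance level of 0.01.
print("p-value:", p, "pass" if p >= 0.01 else "fail")
```

A strongly biased stream (for example, all ones) produces a p-value near zero and is rejected, while passing this test alone says little about randomness; the full suites named in the abstract apply many complementary tests.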
Robust reliable sampled-data control for switched systems with application to flight control

NASA Astrophysics Data System (ADS)

Sakthivel, R.; Joby, Maya; Shi, P.; Mathiyalagan, K.

2016-11-01

This paper addresses the robust reliable stabilisation problem for a class of uncertain switched systems with random delays and norm bounded uncertainties. The main aim of this paper is to obtain the reliable robust sampled-data control design which involves random time delay with an appropriate gain control matrix for achieving the robust exponential stabilisation for uncertain switched system against actuator failures. In particular, the involved delays are assumed to be randomly time-varying which obeys certain mutually uncorrelated Bernoulli distributed white noise sequences. By constructing an appropriate Lyapunov-Krasovskii functional (LKF) and employing an average-dwell time approach, a new set of criteria is derived for ensuring the robust exponential stability of the closed-loop switched system. More precisely, the Schur complement and Jensen's integral inequality are used in derivation of stabilisation criteria. By considering the relationship among the random time-varying delay and its lower and upper bounds, a new set of sufficient condition is established for the existence of reliable robust sampled-data control in terms of solution to linear matrix inequalities (LMIs). Finally, an illustrative example based on the F-18 aircraft model is provided to show the effectiveness of the proposed design procedures.
Theory and implementation of a very high throughput true random number generator in field programmable gate array

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, Yonggang; Hui, Cong; Liu, Chong

The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.

Theory and implementation of a very high throughput true random number generator in field programmable gate array.

PubMed

Wang, Yonggang; Hui, Cong; Liu, Chong; Xu, Chao

2016-04-01

The contribution of this paper is proposing a new entropy extraction mechanism based on sampling phase jitter in ring oscillators to make a high throughput true random number generator in a field programmable gate array (FPGA) practical. Starting from experimental observation and analysis of the entropy source in FPGA, a multi-phase sampling method is exploited to harvest the clock jitter with a maximum entropy and fast sampling speed. This parametrized design is implemented in a Xilinx Artix-7 FPGA, where the carry chains in the FPGA are explored to realize the precise phase shifting. The generator circuit is simple and resource-saving, so that multiple generation channels can run in parallel to scale the output throughput for specific applications. The prototype integrates 64 circuit units in the FPGA to provide a total output throughput of 7.68 Gbps, which meets the requirement of current high-speed quantum key distribution systems. The randomness evaluation, as well as its robustness to ambient temperature, confirms that the new method in a purely digital fashion can provide high-speed high-quality random bit sequences for a variety of embedded applications.
Random sequences generation through optical measurements by phase-shifting interferometry

NASA Astrophysics Data System (ADS)

François, M.; Grosges, T.; Barchiesi, D.; Erra, R.; Cornet, A.

2012-04-01

The development of new techniques for producing random sequences with a high level of security is a challenging topic of research in modern cryptography. The proposed method is based on the measurement by phase-shifting interferometry of the speckle signals of the interaction between light and structures. We show how the combination of amplitude and phase distributions (maps) under a numerical process can produce random sequences. The produced sequences satisfy all the statistical requirements of randomness and can be used in cryptographic schemes.
Error baseline rates of five sample preparation methods used to characterize RNA virus populations.

PubMed

Kugelman, Jeffrey R; Wiley, Michael R; Nagle, Elyse R; Reyes, Daniel; Pfeffer, Brad P; Kuhn, Jens H; Sanchez-Lockhart, Mariano; Palacios, Gustavo F

2017-01-01

Individual RNA viruses typically occur as populations of genomes that differ slightly from each other due to mutations introduced by the error-prone viral polymerase. Understanding the variability of RNA virus genome populations is critical for understanding virus evolution because individual mutant genomes may gain evolutionary selective advantages and give rise to dominant subpopulations, possibly even leading to the emergence of viruses resistant to medical countermeasures. Reverse transcription of virus genome populations followed by next-generation sequencing is the only available method to characterize variation for RNA viruses. However, both steps may lead to the introduction of artificial mutations, thereby skewing the data. To better understand how such errors are introduced during sample preparation, we determined and compared error baseline rates of five different sample preparation methods by analyzing in vitro transcribed Ebola virus RNA from an artificial plasmid-based system. These methods included: shotgun sequencing from plasmid DNA or in vitro transcribed RNA as a basic "no amplification" method, amplicon sequencing from the plasmid DNA or in vitro transcribed RNA as a "targeted" amplification method, sequence-independent single-primer amplification (SISPA) as a "random" amplification method, rolling circle reverse transcription sequencing (CirSeq) as an advanced "no amplification" method, and Illumina TruSeq RNA Access as a "targeted" enrichment method. The measured error frequencies indicate that RNA Access offers the best tradeoff between sensitivity and sample preparation error (1.4 × 10^-5) of all compared methods.
Are quantitative trait-dependent sampling designs cost-effective for analysis of rare and common variants?

PubMed

Yilmaz, Yildiz E; Bull, Shelley B

2011-11-29

Use of trait-dependent sampling designs in whole-genome association studies of sequence data can reduce total sequencing costs with modest losses of statistical efficiency. In a quantitative trait (QT) analysis of data from the Genetic Analysis Workshop 17 mini-exome for unrelated individuals in the Asian subpopulation, we investigate alternative designs that sequence only 50% of the entire cohort. In addition to a simple random sampling design, we consider extreme-phenotype designs that are of increasing interest in genetic association analysis of QTs, especially in studies concerned with the detection of rare genetic variants. We also evaluate a novel sampling design in which all individuals have a nonzero probability of being selected into the sample but in which individuals with extreme phenotypes have a proportionately larger probability. We take differential sampling of individuals with informative trait values into account by inverse probability weighting using standard survey methods which thus generalizes to the source population. In replicate 1 data, we applied the designs in association analysis of Q1 with both rare and common variants in the FLT1 gene, based on knowledge of the generating model. Using all 200 replicate data sets, we similarly analyzed Q1 and Q4 (which is known to be free of association with FLT1) to evaluate relative efficiency, type I error, and power. Simulation study results suggest that the QT-dependent selection designs generally yield greater than 50% relative efficiency compared to using the entire cohort, implying cost-effectiveness of 50% sample selection and worthwhile reduction of sequencing costs.
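The inverse probability weighting step can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' analysis: the trait distribution, the selection probabilities (0.9 for the extreme deciles, 0.4 otherwise, giving roughly a 50% sample), and all function names are hypothetical, and the estimator shown is a simple weighted mean rather than an association test.

```python
import random


def ipw_mean(values, selection_probs):
    """Weighted mean with weights 1/p, where p is each unit's selection probability."""
    if len(values) != len(selection_probs):
        raise ValueError("values and selection_probs must have equal length")
    if any(p <= 0 or p > 1 for p in selection_probs):
        raise ValueError("selection probabilities must lie in (0, 1]")
    weights = [1.0 / p for p in selection_probs]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


def selection_prob(trait, low, high):
    """Hypothetical design: everyone can be selected, extremes more often."""
    return 0.9 if (trait < low or trait > high) else 0.4


rng = random.Random(17)
cohort = [rng.expovariate(1.0) for _ in range(5000)]  # skewed toy trait
ranked = sorted(cohort)
low = ranked[len(ranked) // 10]        # 10th percentile cutoff
high = ranked[9 * len(ranked) // 10]   # 90th percentile cutoff

sampled, probs = [], []
for x in cohort:
    p = selection_prob(x, low, high)
    if rng.random() < p:
        sampled.append(x)
        probs.append(p)

print("fraction sequenced:", len(sampled) / len(cohort))
print("cohort mean       :", sum(cohort) / len(cohort))
print("naive sample mean :", sum(sampled) / len(sampled))
print("IPW sample mean   :", ipw_mean(sampled, probs))
```

Because every individual has a nonzero selection probability, the weights are always defined; oversampled extreme-phenotype individuals simply receive smaller weights, which is what lets the estimate generalize to the source cohort.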
Covariance Matrix Estimation for Massive MIMO

NASA Astrophysics Data System (ADS)

Upadhya, Karthik; Vorobyov, Sergiy A.

2018-04-01

We propose a novel pilot structure for covariance matrix estimation in massive multiple-input multiple-output (MIMO) systems in which each user transmits two pilot sequences, with the second pilot sequence multiplied by a random phase-shift. The covariance matrix of a particular user is obtained by computing the sample cross-correlation of the channel estimates obtained from the two pilot sequences. This approach relaxes the requirement that all the users transmit their uplink pilots over the same set of symbols. We derive expressions for the achievable rate and the mean-squared error of the covariance matrix estimate when the proposed method is used with staggered pilots. The performance of the proposed method is compared with existing methods through simulations.
Optical Processing Techniques For Pseudorandom Sequence Prediction

NASA Astrophysics Data System (ADS)

Gustafson, Steven C.

1983-11-01

Pseudorandom sequences are series of apparently random numbers generated, for example, by linear or nonlinear feedback shift registers. An important application of these sequences is in spread spectrum communication systems, in which, for example, the transmitted carrier phase is digitally modulated rapidly and pseudorandomly and in which the information to be transmitted is incorporated as a slow modulation in the pseudorandom sequence. In this case the transmitted information can be extracted only by a receiver that uses for demodulation the same pseudorandom sequence used by the transmitter, and thus this type of communication system has a very high immunity to third-party interference. However, if a third party can predict in real time the probable future course of the transmitted pseudorandom sequence given past samples of this sequence, then interference immunity can be significantly reduced. In this application effective pseudorandom sequence prediction techniques should be (1) applicable in real time to rapid (e.g., megahertz) sequence generation rates, (2) applicable to both linear and nonlinear pseudorandom sequence generation processes, and (3) applicable to error-prone past sequence samples of limited number and continuity. Certain optical processing techniques that may meet these requirements are discussed in this paper. In particular, techniques based on incoherent optical processors that perform general linear transforms or (more specifically) matrix-vector multiplications are considered. Computer simulation examples are presented which indicate that significant prediction accuracy can be obtained using these transforms for simple pseudorandom sequences. However, the useful prediction of more complex pseudorandom sequences will probably require the application of more sophisticated optical processing techniques.
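As a small digital illustration of why a linear feedback shift register sequence is predictable from past samples, the sketch below generates bits from a 4-bit register and predicts each bit from earlier outputs. The register size, tap positions, and function names are assumptions chosen for the example; it does not reproduce the optical processing techniques of the paper.

```python
def lfsr_bits(state, shifts, nbits, count):
    """Return `count` output bits of a right-shifting Fibonacci LFSR.

    state  : nonzero initial register contents (an integer of `nbits` bits)
    shifts : right-shift amounts selecting the register bits XORed into the feedback
    """
    out = []
    for _ in range(count):
        out.append(state & 1)                    # output is the least significant bit
        feedback = 0
        for s in shifts:
            feedback ^= (state >> s) & 1         # XOR of the tapped bits
        state = (state >> 1) | (feedback << (nbits - 1))
    return out

# Hypothetical 4-bit register with taps chosen so the period is maximal: 2**4 - 1 = 15.
bits = lfsr_bits(state=0b0001, shifts=(0, 1), nbits=4, count=30)

def predict_next(past):
    """Predict the next bit of this particular register from its last four outputs.

    The feedback bit enters at the top of the register and reaches the output
    four steps later, so each output is the XOR of the outputs four and three
    steps earlier.
    """
    return past[-4] ^ past[-3]
```

For a linear register of this kind, a handful of error-free past samples fixes the whole future sequence; nonlinear generators and noisy samples, which the abstract singles out, are the harder cases.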
NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

PubMed

Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

2016-11-01

The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.
Migratory flyway and geographical distance are barriers to the gene flow of influenza virus among North American birds

USGS Publications Warehouse

Lam, Tommy Tsan-Yuk; Ip, Hon S.; Ghedin, Elodie; Wentworth, David E.; Halpin, Rebecca A.; Stockwell, Timothy B.; Spiro, David J.; Dusek, Robert J.; Bortner, James B.; Hoskins, Jenny; Bales, Bradley D.; Yparraguirre, Dan R.; Holmes, Edward C.

2012-01-01

Despite the importance of migratory birds in the ecology and evolution of avian influenza virus (AIV), there is a lack of information on the patterns of AIV spread at the intra-continental scale. We applied a variety of statistical phylogeographic techniques to a plethora of viral genome sequence data to determine the strength, pattern and determinants of gene flow in AIV sampled from wild birds in North America. These analyses revealed a clear isolation-by-distance of AIV among sampling localities. In addition, we show that phylogeographic models incorporating information on the avian flyway of sampling proved a better fit to the observed sequence data than those specifying homogeneous or random rates of gene flow among localities. In sum, these data strongly suggest that the intra-continental spread of AIV by migratory birds is subject to major ecological barriers, including spatial distance and avian flyway.
Detection of large numbers of novel sequences in the metatranscriptomes of complex marine microbial communities.

PubMed

Gilbert, Jack A; Field, Dawn; Huang, Ying; Edwards, Rob; Li, Weizhong; Gilna, Paul; Joint, Ian

2008-08-22

Sequencing the expressed genetic information of an ecosystem (metatranscriptome) can provide information about the response of organisms to varying environmental conditions. Until recently, metatranscriptomics has been limited to microarray technology and random cloning methodologies. The application of high-throughput sequencing technology is now enabling access to both known and previously unknown transcripts in natural communities. We present a study of a complex marine metatranscriptome obtained from random whole-community mRNA using the GS-FLX Pyrosequencing technology. Eight samples, four DNA and four mRNA, were processed from two time points in a controlled coastal ocean mesocosm study (Bergen, Norway) involving an induced phytoplankton bloom producing a total of 323,161,989 base pairs. Our study confirms the finding of the first published metatranscriptomic studies of marine and soil environments that metatranscriptomics targets highly expressed sequences which are frequently novel. Our alternative methodology increases the range of experimental options available for conducting such studies and is characterized by an exceptional enrichment of mRNA (99.92%) versus ribosomal RNA. Analysis of corresponding metagenomes confirms much higher levels of assembly in the metatranscriptomic samples and a far higher yield of large gene families with >100 members, approximately 91% of which were novel. This study provides further evidence that metatranscriptomic studies of natural microbial communities are not only feasible, but when paired with metagenomic data sets, offer an unprecedented opportunity to explore both structure and function of microbial communities--if we can overcome the challenges of elucidating the functions of so many never-seen-before gene families.
Novel Degenerate PCR Method for Whole-Genome Amplification Applied to Peru Margin (ODP Leg 201) Subsurface Samples

PubMed Central

Martino, Amanda J.; Rhodes, Matthew E.; Biddle, Jennifer F.; Brandt, Leah D.; Tomsho, Lynn P.; House, Christopher H.

2011-01-01

A degenerate polymerase chain reaction (PCR)-based method of whole-genome amplification, designed to work fluidly with 454 sequencing technology, was developed and tested for use on deep marine subsurface DNA samples. While optimized here for use with Roche 454 technology, the general framework presented may be applicable to other next generation sequencing systems as well (e.g., Illumina, Ion Torrent). The method, which we have called random amplification metagenomic PCR (RAMP), involves the use of specific primers from Roche 454 amplicon sequencing, modified by the addition of a degenerate region at the 3′ end. It utilizes a PCR reaction, which resulted in no amplification from blanks, even after 50 cycles of PCR. After efforts to optimize experimental conditions, the method was tested with DNA extracted from cultured E. coli cells, and genome coverage was estimated after sequencing on three different occasions. Coverage did not vary greatly with the different experimental conditions tested, and was around 62% with a sequencing effort equivalent to a theoretical genome coverage of 14.10×. The GC content of the sequenced amplification product was within 2% of the predicted values for this strain of E. coli. The method was also applied to DNA extracted from marine subsurface samples from ODP Leg 201 site 1229 (Peru Margin), and results of a taxonomic analysis revealed microbial communities dominated by Proteobacteria, Chloroflexi, Firmicutes, Euryarchaeota, and Crenarchaeota, among others. These results were similar to those obtained previously for those samples; however, variations in the proportions of taxa identified illustrate well the generally accepted view that community analysis is sensitive to both the amplification technique used and the method of assigning sequences to taxonomic groups. Overall, we find that RAMP represents a valid methodology for amplifying metagenomes from low-biomass samples. PMID:22319519
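To put the reported figures in context (about 62% of the genome covered at a sequencing effort equivalent to 14.10× theoretical coverage), the sketch below uses the standard Poisson (Lander-Waterman) approximation for idealized, uniformly random reads. This model is an assumption brought in for illustration and is not part of the study; the function names are hypothetical.

```python
import math

def expected_fraction_covered(depth):
    """Fraction of bases hit by at least one read if reads land uniformly at random."""
    return 1.0 - math.exp(-depth)

def depth_for_fraction(fraction):
    """Uniform sequencing depth that would give the stated covered fraction."""
    return -math.log(1.0 - fraction)

ideal_fraction = expected_fraction_covered(14.10)   # very close to 1 under the ideal model
effective_depth = depth_for_fraction(0.62)          # roughly 1x under the ideal model
```

Under this idealization, 14.10× of uniformly placed reads would cover essentially the whole genome, whereas 62% coverage corresponds to only about 1× of uniform depth. The gap is one way to quantify how unevenly an amplification method samples the genome.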
Multiplex Amplification Refractory Mutation System PCR (ARMS-PCR) provides sequencing independent typing of canine parvovirus.

PubMed

Chander, Vishal; Chakravarti, Soumendu; Gupta, Vikas; Nandi, Sukdeb; Singh, Mithilesh; Badasara, Surendra Kumar; Sharma, Chhavi; Mittal, Mitesh; Dandapat, S; Gupta, V K

2016-12-01

Canine parvovirus-2 antigenic variants (CPV-2a, CPV-2b and CPV-2c), ubiquitously distributed worldwide in the canine population, cause severe fatal gastroenteritis. Antigenic typing of CPV-2 remains a prime focus of research groups worldwide in understanding the disease epidemiology and virus evolution. The present study was thus envisioned to provide a simple, sequencing-independent, rapid, robust, specific, user-friendly technique for detecting and typing the presently circulating CPV-2 antigenic variants. An ARMS-PCR strategy was employed using specific primers for CPV-2a, CPV-2b and CPV-2c to differentiate these antigenic types. ARMS-PCR was initially optimized with reference positive controls in two steps: the first reaction was used to differentiate CPV-2a from CPV-2b/CPV-2c, and the second reaction was carried out with CPV-2c-specific primers to confirm the presence of CPV-2c. Initial validation of the ARMS-PCR was carried out with 24 sequenced samples and the results were matched with the sequencing results. The ARMS-PCR technique was further used to screen and type 90 suspected clinical samples. Fifteen randomly selected suspected clinical samples that were typed with this technique were sequenced. The results of ARMS-PCR and the sequencing matched exactly with each other. The developed technique has the potential to become a sequencing-independent method for simultaneous detection and typing of CPV-2 antigenic variants in veterinary disease diagnostic laboratories globally.
Application of a time-dependent coalescence process for inferring the history of population size changes from DNA sequence data.

PubMed

Polanski, A; Kimmel, M; Chakraborty, R

1998-05-12

Distribution of pairwise differences of nucleotides from data on a sample of DNA sequences from a given segment of the genome has been used in the past to draw inferences about the past history of population size changes. However, all earlier methods assume a given model of population size changes (such as sudden expansion), parameters of which (e.g., time and amplitude of expansion) are fitted to the observed distributions of nucleotide differences among pairwise comparisons of all DNA sequences in the sample. Our theory indicates that for any time-dependent population size, N(tau) (in which time tau is counted backward from present), a time-dependent coalescence process yields the distribution, p(tau), of the time of coalescence between two DNA sequences randomly drawn from the population. Prediction of p(tau) and N(tau) requires the use of a reverse Laplace transform known to be unstable. Nevertheless, simulated data obtained from three models of monotone population change (stepwise, exponential, and logistic) indicate that the pattern of a past population size change leaves its signature on the pattern of DNA polymorphism. Application of the theory to the published mtDNA sequences indicates that the current mtDNA sequence variation is not inconsistent with a logistic growth of the human population.
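The relationship between N(tau) and p(tau) can be illustrated numerically. The sketch below assumes the standard pairwise coalescent form p(tau) = (1/N(tau)) exp(-integral from 0 to tau of du/N(u)), with a haploid population and time in generations; this scaling is an assumption made here, not a statement taken from the abstract, and the function name and grid are hypothetical.

```python
import math

def coalescence_density(pop_size, tau_max, steps):
    """Evaluate p(tau) on a uniform grid for a population-size function N(tau).

    pop_size : callable giving N(tau), with tau counted backward from the present
    Returns (taus, density); the integral of 1/N is accumulated with the trapezoid rule.
    """
    dt = tau_max / steps
    taus = [i * dt for i in range(steps + 1)]
    rates = [1.0 / pop_size(t) for t in taus]     # pairwise coalescence rate at each tau
    density = [rates[0]]
    cumulative = 0.0
    for i in range(1, steps + 1):
        cumulative += 0.5 * (rates[i - 1] + rates[i]) * dt
        density.append(rates[i] * math.exp(-cumulative))
    return taus, density

# Constant size: p(tau) reduces to an exponential density with mean N.
taus_const, p_const = coalescence_density(lambda t: 100.0, 1000.0, 10000)

# A population that was smaller in the past (growth forward in time); parameters are illustrative.
taus_grow, p_grow = coalescence_density(lambda t: 1e4 * math.exp(-0.01 * t), 500.0, 5000)
```

Going from N(tau) to p(tau) in this direction is a stable forward calculation. The difficulty noted in the abstract lies in the opposite direction, recovering N(tau) from observed data.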
Comparative characterization of random-sequence proteins consisting of 5, 12, and 20 kinds of amino acids

PubMed Central

Tanaka, Junko; Doi, Nobuhide; Takashima, Hideaki; Yanagawa, Hiroshi

2010-01-01

Screening of functional proteins from a random-sequence library has been used to evolve novel proteins in the field of evolutionary protein engineering. However, random-sequence proteins consisting of the 20 natural amino acids tend to aggregate, and the occurrence rate of functional proteins in a random-sequence library is low. From the viewpoint of the origin of life, it has been proposed that primordial proteins consisted of a limited set of amino acids that could have been abundantly formed early during chemical evolution. We have previously found that members of a random-sequence protein library constructed with five primitive amino acids show high solubility (Doi et al., Protein Eng Des Sel 2005;18:279–284). Although such a library is expected to be appropriate for finding functional proteins, the functionality may be limited, because they have no positively charged amino acid. Here, we constructed three libraries of 120-amino acid, random-sequence proteins using alphabets of 5, 12, and 20 amino acids by preselection using mRNA display (to eliminate sequences containing stop codons and frameshifts) and characterized and compared the structural properties of random-sequence proteins arbitrarily chosen from these libraries. We found that random-sequence proteins constructed with the 12-member alphabet (including five primitive amino acids and positively charged amino acids) have higher solubility than those constructed with the 20-member alphabet, though other biophysical properties are very similar in the two libraries. Thus, a library of moderate complexity constructed from 12 amino acids may be a more appropriate resource for functional screening than one constructed from 20 amino acids. PMID:20162614
Application of SCAR (sequence characterized amplified region) analysis to authenticate Lycium barbarum (wolfberry) and its adulterants.

PubMed

Sze, Stephen Cho-Wing; Song, Ju-Xian; Wong, Ricky Ngok-Shun; Feng, Yi-Bin; Ng, Tzi-Bun; Tong, Yao; Zhang, Kalin Yan-Bo

2008-09-01

Fructus Lycii (Gouqizi) is well known in Chinese herbal medicine for its restorative function of benefiting the liver and kidney, replenishing vital essence and improving eyesight. However, ten species and varieties of Lycium have been found to be substitutes or adulterants of Lycium barbarum (wolfberry) in commercial markets in the Hong Kong Special Administrative Region and in China generally. L. barbarum cv. 'Tianjinense' and Lycium chinense var. potaninii are the most common examples. It is difficult to differentiate among the Lycium species by traditional morphological and histological analyses. An easy and reliable approach based on SCAR (sequence characterized amplified region) analysis was developed in the present study to differentiate L. barbarum from other Lycium species. Two characteristic bands of approx. 700 and 650 bp were detected on the RAPD (random amplification of polymorphic DNA) profiles generated from samples of L. barbarum and L. chinense var. potaninii using the primer OPC-7. They were isolated and sequenced. Two primer sets, based on these sequences, each amplified a single specific band in samples of L. barbarum, whereas no bands were detected in samples of L. chinense var. potaninii. The results confirmed that the SCAR technique can be employed for authenticating L. barbarum and its adulterants.
Pretreatment drug resistance in a large countrywide Ethiopian HIV-1C cohort: a comparison of Sanger and high-throughput sequencing.

PubMed

Telele, Nigus Fikrie; Kalu, Amare Worku; Gebre-Selassie, Solomon; Fekade, Daniel; Abdurahman, Samir; Marrone, Gaetano; Neogi, Ujjwal; Tegbaru, Belete; Sönnerborg, Anders

2018-05-15

Baseline plasma samples of 490 randomly selected antiretroviral therapy (ART) naïve patients from seven hospitals participating in the first nationwide Ethiopian HIV-1 cohort were analysed for surveillance drug resistance mutations (sDRM) by population based Sanger sequencing (PBSS). Also next generation sequencing (NGS) was used in a subset of 109 baseline samples of patients. Treatment outcome after 6- and 12-months was assessed by on-treatment (OT) and intention-to-treat (ITT) analyses. Transmitted drug resistance (TDR) was detected in 3.9% (18/461) of successfully sequenced samples by PBSS. However, NGS detected sDRM more often (24%; 26/109) than PBSS (6%; 7/109) (p = 0.0001) and major integrase strand transfer inhibitors (INSTI) DRMs were also found in minor viral variants from five patients. Patients with sDRM had more frequent treatment failure in both OT and ITT analyses. The high rate of TDR by NGS and the identification of preexisting INSTI DRMs in minor wild-type HIV-1 subtype C viral variants infected Ethiopian patients underscores the importance of TDR surveillance in low- and middle-income countries and shows added value of high-throughput NGS in such studies.
Relatively Random: Context Effects on Perceived Randomness and Predicted Outcomes

ERIC Educational Resources Information Center

Matthews, William J.

2013-01-01

This article concerns the effect of context on people's judgments about sequences of chance outcomes. In Experiment 1, participants judged whether sequences were produced by random, mechanical processes (such as a roulette wheel) or skilled human action (such as basketball shots). Sequences with lower alternation rates were judged more likely to…
First report of human parvovirus 4 detection in Iran.

PubMed

Asiyabi, Sanaz; Nejati, Ahmad; Shoja, Zabihollah; Shahmahmoodi, Shohreh; Jalilvand, Somayeh; Farahmand, Mohammad; Gorzin, Ali-Akbar; Najafi, Alireza; Haji Mollahoseini, Mostafa; Marashi, Sayed Mahdi

2016-08-01

Parvovirus 4 (PARV4) is an emerging and intriguing virus that has recently received considerable attention. The high prevalence of PARV4 infection in high-risk groups such as HIV-infected patients highlights the potential clinical outcomes that this virus might have. Molecular techniques were used to determine both the presence and the genotype of circulating PARV4 in previously collected serum samples from 133 HIV-infected patients and 120 healthy blood donors. Nested PCR was applied to assess the presence of the PARV4 DNA genome in both groups. PARV4 DNA was detected in 35.3% of HIV-infected patients compared to 16.6% of healthy donors. To genetically characterize the PARV4 genotype in these groups, positive samples were randomly selected and subjected to sequencing and phylogenetic analysis. All PARV4 sequences were found to be genotype 1 and clustered with the reference sequences of PARV4 genotype 1. J. Med. Virol. 88:1314-1318, 2016. © 2016 Wiley Periodicals, Inc.
Randomized clinical trials in dentistry: Risks of bias, risks of random errors, reporting quality, and methodologic quality over the years 1955–2013

PubMed Central

Armijo-Olivo, Susan; Cummings, Greta G.; Amin, Maryam; Flores-Mir, Carlos

2017-01-01

Objectives: To examine the risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions and the development of these aspects over time. Methods: We included 540 randomized clinical trials from 64 selected systematic reviews. We extracted, in duplicate, details from each of the selected randomized clinical trials with respect to publication and trial characteristics, reporting and methodologic characteristics, and Cochrane risk of bias domains. We analyzed data using logistic regression and Chi-square statistics. Results: Sequence generation was assessed to be inadequate (at unclear or high risk of bias) in 68% (n = 367) of the trials, while allocation concealment was inadequate in the majority of trials (n = 464; 85.9%). Blinding of participants and blinding of the outcome assessment were judged to be inadequate in 28.5% (n = 154) and 40.5% (n = 219) of the trials, respectively. A sample size calculation before the initiation of the study was not performed/reported in 79.1% (n = 427) of the trials, while the sample size was assessed as adequate in only 17.6% (n = 95) of the trials. Two thirds of the trials were not described as double blinded (n = 358; 66.3%), while the method of blinding was appropriate in 53% (n = 286) of the trials. We identified a significant decrease over time (1955–2013) in the proportion of trials assessed as having inadequately addressed methodological quality items (P < 0.05) in 30 out of the 40 quality criteria, or as being inadequate (at high or unclear risk of bias) in five domains of the Cochrane risk of bias tool: sequence generation, allocation concealment, incomplete outcome data, other sources of bias, and overall risk of bias.
Conclusions: The risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions have improved over time; however, further efforts that contribute to the development of more stringent methodology and detailed reporting of trials are still needed. PMID:29272315
Randomized clinical trials in dentistry: Risks of bias, risks of random errors, reporting quality, and methodologic quality over the years 1955-2013.

PubMed

Saltaji, Humam; Armijo-Olivo, Susan; Cummings, Greta G; Amin, Maryam; Flores-Mir, Carlos

2017-01-01

To examine the risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions and the development of these aspects over time. We included 540 randomized clinical trials from 64 selected systematic reviews. We extracted, in duplicate, details from each of the selected randomized clinical trials with respect to publication and trial characteristics, reporting and methodologic characteristics, and Cochrane risk of bias domains. We analyzed data using logistic regression and Chi-square statistics. Sequence generation was assessed to be inadequate (at unclear or high risk of bias) in 68% (n = 367) of the trials, while allocation concealment was inadequate in the majority of trials (n = 464; 85.9%). Blinding of participants and blinding of the outcome assessment were judged to be inadequate in 28.5% (n = 154) and 40.5% (n = 219) of the trials, respectively. A sample size calculation before the initiation of the study was not performed/reported in 79.1% (n = 427) of the trials, while the sample size was assessed as adequate in only 17.6% (n = 95) of the trials. Two thirds of the trials were not described as double blinded (n = 358; 66.3%), while the method of blinding was appropriate in 53% (n = 286) of the trials. We identified a significant decrease over time (1955-2013) in the proportion of trials assessed as having inadequately addressed methodological quality items (P < 0.05) in 30 out of the 40 quality criteria, or as being inadequate (at high or unclear risk of bias) in five domains of the Cochrane risk of bias tool: sequence generation, allocation concealment, incomplete outcome data, other sources of bias, and overall risk of bias. 
The risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions have improved over time; however, further efforts that contribute to the development of more stringent methodology and detailed reporting of trials are still needed.
Nicotine pharmacokinetic profiles of the Tobacco Heating System 2.2, cigarettes and nicotine gum in Japanese smokers.

PubMed

Brossard, Patrick; Weitkunat, Rolf; Poux, Valerie; Lama, Nicola; Haziza, Christelle; Picavet, Patrick; Baker, Gizelle; Lüdicke, Frank

2017-10-01

Two open-label randomized cross-over studies in Japanese smokers investigated the single-use nicotine pharmacokinetic profile of the Tobacco Heating System (THS) 2.2, cigarettes (CC) and nicotine replacement therapy (Gum). In each study, one on the regular and one on the menthol variants of the THS and CC, both using Gum as reference, 62 subjects were randomized to four sequences: Sequence 1: THS - CC (n = 22); Sequence 2: CC - THS (n = 22); Sequence 3: THS - Gum (n = 9); Sequence 4: Gum - THS (n = 9). Plasma nicotine concentrations were measured in 16 blood samples collected over 24 h after single use. Maximal nicotine concentration (Cmax) and area under the curve from start of product use to time of last quantifiable concentration (AUC0-last) were similar between THS and CC in both studies, with ratios varying from 88 to 104% for Cmax and from 96 to 98% for AUC0-last. Urge-to-smoke total scores were comparable between THS and CC. The THS nicotine pharmacokinetic profile was close to CC, with similar levels of urge-to-smoke. This suggests that THS can satisfy smokers and be a viable alternative to cigarettes for adult smokers who want to continue using tobacco.
The Effect of Interference on Temporal Order Memory for Random and Fixed Sequences in Nondemented Older Adults

ERIC Educational Resources Information Center

Tolentino, Jerlyn C.; Pirogovsky, Eva; Luu, Trinh; Toner, Chelsea K.; Gilbert, Paul E.

2012-01-01

Two experiments tested the effect of temporal interference on order memory for fixed and random sequences in young adults and nondemented older adults. The results demonstrate that temporal order memory for fixed and random sequences is impaired in nondemented older adults, particularly when temporal interference is high. However, temporal order…
A Study of Ontogenetic and Generational Change in Adolescent Personality by Means of Multivariate Longitudinal Sequences: Phase II. Final Report.

ERIC Educational Resources Information Center

Nesselroade, John R.; Baltes, Paul B.

Assessment of the relationship between ontogenetic (individual) and generational (historical) change in adolescent personality development was the focus of this study. The total sample included 1000 male and female adolescents (ages 13-18) randomly drawn from 32 public school systems in West Virginia following a design using longitudinal sequences…
Pseudo-Random Sequence Modifications for Ion Mobility Orthogonal Time of Flight Mass Spectrometry

PubMed Central

Clowers, Brian H.; Belov, Mikhail E.; Prior, David C.; Danielson, William F.; Ibrahim, Yehia; Smith, Richard D.

2008-01-01

Due to the inherently low duty cycle of ion mobility spectrometry (IMS) experiments that sample from continuous ion sources, a range of experimental advances have been developed to maximize ion utilization efficiency. The use of ion trapping mechanisms prior to the ion mobility drift tube has demonstrated significant gains over discrete sampling from continuous sources; however, these technologies have traditionally relied upon signal averaging (SA) to attain analytically relevant signal-to-noise ratios (SNR). Multiplexed (MP) techniques based upon the Hadamard transform (HT) offer an alternative experimental approach by which ion utilization efficiency can be elevated to ∼50%. Recently, our research group demonstrated a unique multiplexed ion mobility time-of-flight (MP-IMS-TOF) approach that incorporates ion trapping and can extend ion utilization efficiency beyond 50%. However, the spectral reconstruction of the multiplexed signal using this experimental approach requires the use of sample-specific weighing designs. Though general weighing designs have been shown to significantly enhance ion utilization efficiency using this MP technique, such weighing designs cannot be applied to all samples. By modifying both the ion funnel trap and the pseudo-random sequence (PRS) used for the MP experiment, we have eliminated the need for complex weighing matrices. For both simple and complex mixtures, SNR enhancements of up to 13 were routinely observed as compared to the SA-IMS-TOF experiment. In addition, this new class of PRS provides a twofold enhancement in ion throughput compared to the traditional HT-IMS experiment. PMID:18311942
Heterogeneous Suppression of Sequential Effects in Random Sequence Generation, but Not in Operant Learning.

PubMed

Shteingart, Hanan; Loewenstein, Yonatan

2016-01-01

There is a long history of experiments in which participants are instructed to generate a long sequence of binary random numbers. The scope of this line of research has shifted over the years from identifying the basic psychological principles and/or the heuristics that lead to deviations from randomness, to one of predicting future choices. In this paper, we used generalized linear regression and the framework of Reinforcement Learning in order to address both points. In particular, we used logistic regression analysis in order to characterize the temporal sequence of participants' choices. Surprisingly, a population analysis indicated that the most recent trial has only a weak effect on behavior, compared to earlier trials, a result that seems irreconcilable with standard sequential effects that decay monotonically with the delay. However, when considering each participant separately, we found that the magnitudes of the sequential effect are a monotonically decreasing function of the delay, yet these individual sequential effects are largely averaged out in a population analysis because of heterogeneity. The substantial behavioral heterogeneity in this task is further demonstrated quantitatively by considering the predictive power of the model. We show that a heterogeneous model of sequential dependencies captures the structure available in random sequence generation. Finally, we show that the results of the logistic regression analysis can be interpreted in the framework of reinforcement learning, allowing us to compare the sequential effects in the random sequence generation task to those in an operant learning task. We show that in contrast to the random sequence generation task, sequential effects in operant learning are far more homogeneous across the population.
These results suggest that in the random sequence generation task, different participants adopt different cognitive strategies to suppress sequential dependencies when generating the "random" sequences.
Clonality and serotypes of Streptococcus mutans among children by multilocus sequence typing

PubMed Central

Momeni, Stephanie S.; Whiddon, Jennifer; Cheon, Kyounga; Moser, Stephen A.; Childers, Noel K.

2015-01-01

Studies using multilocus sequence typing (MLST) have demonstrated that Streptococcus mutans isolates are genetically diverse. Our laboratory previously demonstrated clonality of S. mutans using MLST but could not discount the possibility of sampling bias. In this study, the clonality of randomly selected S. mutans plaque isolates from African American children was examined using MLST. Serotype and presence of collagen-binding proteins (CBP) cnm/cbm were also assessed. One hundred S. mutans isolates were randomly selected for MLST analysis. Sequence analysis was performed and phylogenetic trees were generated using START2 and MEGA. Thirty-four sequence types (ST) were identified of which 27 were unique to this population. Seventy-five percent of the isolates clustered into 16 clonal groups. Serotypes observed were c (n=84), e (n=3), and k (n=11). The prevalence of S. mutans isolates serotype k was notably high at 17.5%. All isolates were cnm/cbm negative. The clonality of S. mutans demonstrated in this study illustrates the importance of localized population studies and is consistent with transmission. The prevalence of serotype k, a recently proposed systemic pathogen, observed in this study is higher than reported in most populations and is the first report of S. mutans serotype k in a US population. PMID:26443288
Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

PubMed Central

Matochko, Wadim L.; Derda, Ratmir

2013-01-01

Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. Low theoretical complexity of this phage library, as compared to complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an N × 1 frequency vector n = ||n_i||, where n_i is the copy number of the ith sequence and N is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on n. Selection, amplification, or sequencing could be described as a product of an N × N matrix and a stochastic sampling operator (Sa). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of Sa and use them to define the sequencing operator (Seq). Sequencing without any bias and errors is Seq = Sa I_N, where I_N is an N × N unity matrix. Any bias in sequencing changes I_N to a nonunity matrix. We identified a diagonal censorship matrix (CEN), which describes elimination or statistically significant downsampling of specific reads during the sequencing process. PMID:24416071
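The vector-and-operator picture in this abstract can be made concrete with a toy sketch. The five-sequence library, the sample size and the helper names below are hypothetical, and the way the diagonal is built (sampled copies divided by original copies) is one assumed reading of "random diagonal matrix that describes sampling", not the paper's definition.

```python
import random

def sample_library(n, m, rng):
    """Draw m clones without replacement from a library with copy numbers n."""
    pool = [i for i, copies in enumerate(n) for _ in range(copies)]
    drawn = rng.sample(pool, m)
    sampled = [0] * len(n)
    for i in drawn:
        sampled[i] += 1
    return sampled

def sampling_operator_diagonal(n, sampled):
    """Diagonal entries d_i such that d_i * n_i equals the sampled copy number."""
    return [s / c if c else 0.0 for s, c in zip(sampled, n)]

def apply_diagonal(diag, n):
    """Apply a diagonal operator (stored as a list) to the frequency vector n."""
    return [d * c for d, c in zip(diag, n)]

rng = random.Random(1)
n = [50, 30, 15, 5, 0]            # toy frequency vector, N = 5 possible sequences
sampled = sample_library(n, 20, rng)
diag = sampling_operator_diagonal(n, sampled)
print("sampled copy numbers:", sampled)
print("diagonal of the sampling operator:", [round(d, 3) for d in diag])
```

Because the diagonal is redrawn on every call, repeating the sampling gives a different operator each time, which is the sense in which the operator is stochastic. A censorship effect could be mimicked by multiplying selected entries by a factor below one, though how CEN enters the full sequencing operator is not stated in the abstract.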
Genetic alterations of hepatocellular carcinoma by random amplified polymorphic DNA analysis and cloning sequencing of tumor differential DNA fragment

PubMed Central

Xian, Zhi-Hong; Cong, Wen-Ming; Zhang, Shu-Hui; Wu, Meng-Chao

2005-01-01

AIM: To study the genetic alterations and their association with clinicopathological characteristics of hepatocellular carcinoma (HCC), and to find the tumor related DNA fragments. METHODS: DNA isolated from tumors and corresponding noncancerous liver tissues of 56 HCC patients was amplified by random amplified polymorphic DNA (RAPD) with 10 random 10-mer arbitrary primers. The RAPD bands showing obvious differences between tumor tissue DNA and the corresponding normal tissue DNA were separated, purified, cloned and sequenced. DNA sequences were analyzed and compared with GenBank data. RESULTS: A total of 56 cases of HCC were demonstrated to have genetic alterations, which were detected by at least one primer. The detectability of genetic alterations ranged from 20% to 70% in each case, and 17.9% to 50% in each primer. Serum HBV infection, tumor size, histological grade, tumor capsule, as well as tumor intrahepatic metastasis, might be correlated with genetic alterations on certain primers. An amplified band of approximately 480 bp with higher intensity in tumor DNA relative to normal DNA could be seen in 27 of 56 tumor samples using primer 4. Sequence analysis of these fragments showed 91% homology with Homo sapiens double homeobox protein DUX10 gene. CONCLUSION: Genetic alterations are a frequent event in HCC, and tumor related DNA fragments have been found in this study, which may be associated with hepatocarcinogenesis. RAPD is an effective method for the identification and analysis of genetic alterations in HCC, and may provide new information for further evaluating the molecular mechanism of hepatocarcinogenesis. PMID:15996039
Simulations Using Random-Generated DNA and RNA Sequences

ERIC Educational Resources Information Center

Bryce, C. F. A.

1977-01-01

Using a very simple computer program written in BASIC, a very large number of random-generated DNA or RNA sequences are obtained. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…
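A modern equivalent of the exercise described in this record can be sketched in a few lines. This is not the original BASIC program; the function names, sequence length and fixed seed are assumptions made for illustration.

```python
import random
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def random_dna(length, rng):
    """Generate a DNA sequence with each base drawn uniformly at random."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def complement(seq):
    """Base-by-base complement (A<->T, G<->C), same orientation as the input."""
    return "".join(COMPLEMENT[b] for b in seq)

def base_composition(seq):
    """Fraction of each base in the sequence."""
    counts = Counter(seq)
    return {b: counts.get(b, 0) / len(seq) for b in "ACGT"}

def codon_counts(seq):
    """Count non-overlapping triplets starting at position 0."""
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))

rng = random.Random(42)
dna = random_dna(300, rng)
print(dna[:30], "...")
print(complement(dna)[:30], "...")
print(base_composition(dna))
print(codon_counts(dna).most_common(3))
```

Replacing T with U in the alphabet and the complement table would give the RNA variant of the same exercise.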
Sequence Complexity of Chromosome 3 in Caenorhabditis elegans

PubMed Central

Pierro, Gaetano

2012-01-01

The complexity of the nucleotide sequences in chromosome 3 of Caenorhabditis elegans (C. elegans) is studied. The complexity of these sequences is compared with that of some random sequences. Moreover, using parameters related to complexity, such as fractal dimension and frequency, together with the indicator matrix, a first classification of sequences of C. elegans is given. In particular, the sequences with the highest and lowest fractal value are singled out. It is shown that the intrinsic nature of the low fractal dimension sequences has many common features with the random sequences. PMID:22919380
Nonlinear Estimation of Discrete-Time Signals Under Random Observation Delay

DOE Office of Scientific and Technical Information (OSTI.GOV)

Caballero-Aguila, R.; Jimenez-Lopez, J. D.; Hermoso-Carazo, A.

2008-11-06

This paper presents an approximation to the nonlinear least-squares estimation problem of discrete-time stochastic signals using nonlinear observations with additive white noise which can be randomly delayed by one sampling time. The observation delay is modelled by a sequence of independent Bernoulli random variables whose values, zero or one, indicate that the real observation arrives on time or is delayed and, hence, the available measurement to estimate the signal is not up-to-date. Assuming that the state-space model generating the signal is unknown and only the covariance functions of the processes involved in the observation equation are ready for use, a filtering algorithm based on linear approximations of the real observations is proposed.
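The delay mechanism in this abstract can be simulated in a few lines. The sketch below covers only the observation model, not the proposed filter, and the explicit form y_k = (1 - g_k) z_k + g_k z_{k-1} is an assumed way of writing "delayed by one sampling time"; the function name and the treatment of the first sample are likewise illustrative.

```python
import random

def delayed_measurements(observations, p_delay, rng):
    """Return (y, flags) with y_k = (1 - g_k) * z_k + g_k * z_{k-1}.

    g_k ~ Bernoulli(p_delay) is drawn independently at every step; g_k = 1
    means the measurement available at time k is the previous observation.
    """
    y = [observations[0]]      # first sample has no predecessor: assumed on time
    flags = [0]
    for k in range(1, len(observations)):
        g = 1 if rng.random() < p_delay else 0
        flags.append(g)
        y.append((1 - g) * observations[k] + g * observations[k - 1])
    return y, flags

rng = random.Random(2)
z = [float(k) for k in range(50)]          # stand-in for noisy observations z_k
y, flags = delayed_measurements(z, 0.3, rng)
print("fraction delayed:", sum(flags) / len(flags))
```

Using a ramp for z makes the effect easy to inspect: every delayed sample is exactly one unit behind the on-time value.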
Methodological Reporting of Randomized Trials in Five Leading Chinese Nursing Journals

PubMed Central

Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu

2014-01-01

Background Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting complying with CONSORT and explore associated trial level variables in the Chinese nursing care field. Methods In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured through the methods section of the CONSORT checklist and the overall CONSORT methodological items score was calculated and expressed as a percentage. Meanwhile, we hypothesized that some general and methodological characteristics were associated with reporting quality and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. Results In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34±0.97 (Mean ± SD). No RCT reported descriptions and changes in “trial design,” changes in “outcomes” and “implementation,” or descriptions of the similarity of interventions for “blinding.” Poor reporting was found in detailing the “settings of participants” (13.1%), “type of randomization sequence generation” (1.8%), calculation methods of “sample size” (0.4%), explanation of any interim analyses and stopping guidelines for “sample size” (0.3%), “allocation concealment mechanism” (0.3%), additional analyses in “statistical methods” (2.1%), and targeted subjects and methods of “blinding” (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of “participants,” “interventions,” and definitions of the “outcomes” and “statistical methods.” The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score. 
Conclusions The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods. PMID:25415382
Fungal diversity in grape must and wine fermentation assessed by massive sequencing, quantitative PCR and DGGE

PubMed Central

Wang, Chunxiao; García-Fernández, David; Mas, Albert; Esteve-Zarzoso, Braulio

2015-01-01

The diversity of fungi in grape must and during wine fermentation was investigated in this study by culture-dependent and culture-independent techniques. Carignan and Grenache grapes were harvested from three vineyards in the Priorat region (Spain) in 2012, and nine samples were selected from the grape must after crushing and during wine fermentation. From culture-dependent techniques, 362 isolates were randomly selected and identified by 5.8S-ITS-RFLP and 26S-D1/D2 sequencing. Meanwhile, genomic DNA was extracted directly from the nine samples and analyzed by qPCR, DGGE and massive sequencing. The results indicated that grape must after crushing harbored a high species richness of fungi with Aspergillus tubingensis, Aureobasidium pullulans, or Starmerella bacillaris as the dominant species. As fermentation proceeded, the species richness decreased, and yeasts such as Hanseniaspora uvarum, Starmerella bacillaris and Saccharomyces cerevisiae successively occupied the must samples. The “terroir” characteristics of the fungus population are more related to the location of the vineyard than to grape variety. Sulfur dioxide treatment caused a low effect on yeast diversity by similarity analysis. Because of the existence of large population of fungi on grape berries, massive sequencing was more appropriate to understand the fungal community in grape must after crushing than the other techniques used in this study. Suitable target sequences and databases were necessary for accurate evaluation of the community and the identification of species by the 454 pyrosequencing of amplicons. PMID:26557110
Nested PCR detection and phylogenetic analysis of Babesia bovis and Babesia bigemina in cattle from Peri-urban localities in Gauteng Province, South Africa.

PubMed

Mtshali, Phillip Senzo; Tsotetsi, Ana Mbokeleng; Thekisoe, Matlhahane Molifi Oriel; Mtshali, Moses Sibusiso

2014-01-01

Babesia bovis and Babesia bigemina are tick-borne hemoparasites causing babesiosis in cattle worldwide. This study was aimed at providing information about the occurrence and geographical distribution of B. bovis and B. bigemina species in cattle from Gauteng province, South Africa. A total of 268 blood samples collected from apparently healthy animals in 14 different peri-urban localities were tested using previously established nested PCR assays for the detection of B. bovis and B. bigemina species-specific genes encoding rhoptry-associated protein 1 (RAP-1) and SpeI-AvaI restriction fragment, respectively. Nested PCR assays revealed that the overall prevalence was 35.5% (95% confidence interval [CI]=± 5.73) and 76.1% (95% CI=± 5.11) for B. bovis and B. bigemina, respectively. PCR results were corroborated by sequencing amplicons of randomly selected samples. The neighbor-joining trees were constructed to study the phylogenetic relationship between B. bovis and B. bigemina sequences of randomly selected isolates. Analysis of phylogram inferred with B. bovis RAP-1 sequences indicated a close relationship between our isolates and GenBank strains. On the other hand, a tree constructed with B. bigemina gp45 sequences revealed a high degree of polymorphism among the B. bigemina isolates investigated in this study. Taken together, the results presented in this work indicate the high incidence of Babesia parasites in cattle from previously uncharacterised peri-urban areas of the Gauteng province. These findings suggest that effective preventative and control measures are essential to curtail the spread of Babesia infections among cattle populations in Gauteng.
High-quality mtDNA control region sequences from 680 individuals sampled across the Netherlands to establish a national forensic mtDNA reference database.

PubMed

Chaitanya, Lakshmi; van Oven, Mannis; Brauer, Silke; Zimmermann, Bettina; Huber, Gabriela; Xavier, Catarina; Parson, Walther; de Knijff, Peter; Kayser, Manfred

2016-03-01

The use of mitochondrial DNA (mtDNA) for maternal lineage identification often marks the last resort when investigating forensic and missing-person cases involving highly degraded biological materials. As with all comparative DNA testing, a match between evidence and reference sample requires a statistical interpretation, for which high-quality mtDNA population frequency data are crucial. Here, we determined, under high quality standards, the complete mtDNA control-region sequences of 680 individuals from across the Netherlands sampled at 54 sites, covering the entire country with 10 geographic sub-regions. The complete mtDNA control region (nucleotide positions 16,024-16,569 and 1-576) was amplified with two PCR primers and sequenced with ten different sequencing primers using the EMPOP protocol. Haplotype diversity of the entire sample set was very high at 99.63% and, accordingly, the random-match probability was 0.37%. No population substructure within the Netherlands was detected with our dataset. Phylogenetic analyses were performed to determine mtDNA haplogroups. Inclusion of these high-quality data in the EMPOP database (accession number: EMP00666) will improve its overall data content and geographic coverage in the interest of all EMPOP users worldwide. Moreover, this dataset will serve as (the start of) a national reference database for mtDNA applications in forensic and missing person casework in the Netherlands. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
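The two figures quoted in this abstract (haplotype diversity 99.63%, random-match probability 0.37%) sum to 100%, which matches the simple relationship sketched below. This is an assumption about how the numbers relate, not the authors' stated procedure; some texts apply an additional n/(n-1) correction to the diversity, which is omitted here. The haplotype labels are made up.

```python
from collections import Counter

def random_match_probability(haplotypes):
    """Sum of squared haplotype frequencies: chance two random samples match."""
    counts = Counter(haplotypes)
    n = len(haplotypes)
    return sum((c / n) ** 2 for c in counts.values())

def haplotype_diversity(haplotypes):
    """Uncorrected diversity, taken here as 1 minus the random-match probability."""
    return 1.0 - random_match_probability(haplotypes)

toy = ["H1", "H1", "H2", "H3"]             # hypothetical haplotype labels
print(random_match_probability(toy))       # 0.375
print(haplotype_diversity(toy))            # 0.625
```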
A note on the efficiencies of sampling strategies in two-stage Bayesian regional fine mapping of a quantitative trait.

PubMed

Chen, Zhijian; Craiu, Radu V; Bull, Shelley B

2014-11-01

In focused studies designed to follow up associations detected in a genome-wide association study (GWAS), investigators can proceed to fine-map a genomic region by targeted sequencing or dense genotyping of all variants in the region, aiming to identify a functional sequence variant. For the analysis of a quantitative trait, we consider a Bayesian approach to fine-mapping study design that incorporates stratification according to a promising GWAS tag SNP in the same region. Improved cost-efficiency can be achieved when the fine-mapping phase incorporates a two-stage design, with identification of a smaller set of more promising variants in a subsample taken in stage 1, followed by their evaluation in an independent stage 2 subsample. To avoid the potential negative impact of genetic model misspecification on inference we incorporate genetic model selection based on posterior probabilities for each competing model. Our simulation study shows that, compared to simple random sampling that ignores genetic information from GWAS, tag-SNP-based stratified sample allocation methods reduce the number of variants continuing to stage 2 and are more likely to promote the functional sequence variant into confirmation studies. © 2014 WILEY PERIODICALS, INC.
Linkage of Viral Sequences among HIV-Infected Village Residents in Botswana: Estimation of Linkage Rates in the Presence of Missing Data

PubMed Central

Carnegie, Nicole Bohme; Wang, Rui; Novitsky, Vladimir; De Gruttola, Victor

2014-01-01

Linkage analysis is useful in investigating disease transmission dynamics and the effect of interventions on them, but estimates of probabilities of linkage between infected people from observed data can be biased downward when missingness is informative. We investigate variation in the rates at which subjects' viral genotypes link across groups defined by viral load (low/high) and antiretroviral treatment (ART) status using blood samples from household surveys in the Northeast sector of Mochudi, Botswana. The probability of obtaining a sequence from a sample varies with viral load; samples with low viral load are harder to amplify. Pairwise genetic distances were estimated from aligned nucleotide sequences of HIV-1C env gp120. It is first shown that the probability that randomly selected sequences are linked can be estimated consistently from observed data. This is then used to develop estimates of the probability that a sequence from one group links to at least one sequence from another group under the assumption of independence across pairs. Furthermore, a resampling approach is developed that accounts for the presence of correlation across pairs, with diagnostics for assessing the reliability of the method. Sequences were obtained for 65% of subjects with high viral load (HVL, n = 117), 54% of subjects with low viral load but not on ART (LVL, n = 180), and 45% of subjects on ART (ART, n = 126). The probability of linkage between two individuals is highest if both have HVL, and lowest if one has LVL and the other has LVL or is on ART. Linkage across groups is high for HVL and lower for LVL and ART. Adjustment for missing data increases the group-wise linkage rates by 40–100%, and changes the relative rates between groups. Bias in inferences regarding HIV viral linkage that arise from differential ability to genotype samples can be reduced by appropriate methods for accommodating missing data. PMID:24415932
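Under the independence assumption mentioned in this abstract, the step from a pairwise linkage probability to an "at least one link" probability is a one-line formula. The sketch below is illustrative only: the distance threshold, the toy distances and the function names are hypothetical, and the resampling correction for correlated pairs described in the abstract is not reproduced.

```python
def estimate_pair_linkage(distances, threshold):
    """Fraction of observed pairwise genetic distances at or below a threshold."""
    linked = sum(1 for d in distances if d <= threshold)
    return linked / len(distances)

def prob_links_to_at_least_one(p_pair, m):
    """P(a sequence links to >= 1 of m sequences), assuming independent pairs."""
    return 1.0 - (1.0 - p_pair) ** m

toy_distances = [0.01, 0.08, 0.12, 0.03, 0.20, 0.15, 0.09, 0.30]  # made-up values
p = estimate_pair_linkage(toy_distances, threshold=0.05)
print("pairwise linkage estimate:", p)
print("P(at least one link among 10):", prob_links_to_at_least_one(p, 10))
```

Because the at-least-one probability grows quickly with m, even a modest downward bias in the pairwise estimate can change group-wise linkage rates noticeably.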
Gift from statistical learning: Visual statistical learning enhances memory for sequence elements and impairs memory for items that disrupt regularities.

PubMed

Otsuka, Sachio; Saiki, Jun

2016-02-01

Prior studies have shown that visual statistical learning (VSL) enhances familiarity (a type of memory) of sequences. How do statistical regularities influence the processing of each triplet element and inserted distractors that disrupt the regularity? Given the increased attention to triplets induced by VSL and the inhibition of unattended triplets, we predicted that VSL would promote memory for each triplet constituent, and degrade memory for inserted stimuli. Across the first two experiments, we found that objects from structured sequences were more likely to be remembered than objects from random sequences, and that letters (Experiment 1) or objects (Experiment 2) inserted into structured sequences were less likely to be remembered than those inserted into random sequences. In the subsequent two experiments, we examined an alternative account for our results, whereby the difference in memory for inserted items between structured and random conditions is due to individuation of items within random sequences. Our findings replicated even when control letters (Experiment 3A) or objects (Experiment 3B) were presented before or after, rather than inserted into, random sequences. Our findings suggest that statistical learning enhances memory for each item in a regular set and impairs memory for items that disrupt the regularity. Copyright © 2015 Elsevier B.V. All rights reserved.
Deep nirS amplicon sequencing of San Francisco Bay sediments enables prediction of geography and environmental conditions from denitrifying community composition.

PubMed

Lee, Jessica A; Francis, Christopher A

2017-12-01

Denitrification is a dominant nitrogen loss process in the sediments of San Francisco Bay. In this study, we sought to understand the ecology of denitrifying bacteria by using next-generation sequencing (NGS) to survey the diversity of a denitrification functional gene, nirS (encoding cytochrome-cd1 nitrite reductase), along the salinity gradient of San Francisco Bay over the course of a year. We compared our dataset to a library of nirS sequences obtained previously from the same samples by standard PCR cloning and Sanger sequencing, and showed that both methods similarly demonstrated geography, salinity and, to a lesser extent, nitrogen, to be strong determinants of community composition. Furthermore, the depth afforded by NGS enabled novel techniques for measuring the association between environment and community composition. We used Random Forests modelling to demonstrate that the site and salinity of a sample could be predicted from its nirS sequences, and to identify indicator taxa associated with those environmental characteristics. This work contributes significantly to our understanding of the distribution and dynamics of denitrifying communities in San Francisco Bay, and provides valuable tools for the further study of this key N-cycling guild in all estuarine systems. © 2017 Society for Applied Microbiology and John Wiley & Sons Ltd.
Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

PubMed

Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

2018-03-20

The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R² = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R² = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample.
Again, the outputs from functional profiling analysis using SUPER-FOCUS were generally accordant between the platforms at different sequencing depths. Finally, and expectedly, metagenome assembly completeness was significantly lower on the MiSeq than either on the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth. Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.
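The depth comparison in this abstract rests on randomly subsampling reads. A minimal sketch of that step follows; the read identifiers, depths and seeding scheme are assumptions, and real pipelines would operate on FASTQ files with a dedicated tool rather than on an in-memory list.

```python
import random

def subsample_reads(reads, depth, seed=0):
    """Draw `depth` reads uniformly at random without replacement."""
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of reads")
    return random.Random(seed).sample(reads, depth)

reads = [f"read_{i}" for i in range(10_000)]      # stand-in read identifiers
subsets = {d: subsample_reads(reads, d, seed=d) for d in (100, 1_000, 5_000)}
for depth, subset in subsets.items():
    print(depth, len(subset), subset[:2])
```

Fixing the seed per depth keeps each subsample reproducible, so downstream classifier outputs can be compared across runs.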

Benchmarking protein classification algorithms via supervised cross-validation.

PubMed

Kertész-Farkas, Attila; Dhir, Somdutta; Sonego, Paolo; Pacurar, Mircea; Netoteia, Sergiu; Nijveen, Harm; Kuzniar, Arnold; Leunissen, Jack A M; Kocsor, András; Pongor, Sándor

2008-04-24

Development and testing of protein classification algorithms are hampered by the fact that the protein universe is characterized by groups vastly different in the number of members, in average protein size, similarity within group, etc. Datasets based on traditional cross-validation (k-fold, leave-one-out, etc.) may not give reliable estimates on how an algorithm will generalize to novel, distantly related subtypes of the known protein classes. Supervised cross-validation, i.e., selection of test and train sets according to the known subtypes within a database has been successfully used earlier in conjunction with the SCOP database. Our goal was to extend this principle to other databases and to design standardized benchmark datasets for protein classification. Hierarchical classification trees of protein categories provide a simple and general framework for designing supervised cross-validation strategies for protein classification. Benchmark datasets can be designed at various levels of the concept hierarchy using a simple graph-theoretic distance. A combination of supervised and random sampling was selected to construct reduced size model datasets, suitable for algorithm comparison. Over 3000 new classification tasks were added to our recently established protein classification benchmark collection that currently includes protein sequence (including protein domains and entire proteins), protein structure and reading frame DNA sequence data. We carried out an extensive evaluation based on various machine-learning algorithms such as nearest neighbor, support vector machines, artificial neural networks, random forests and logistic regression, used in conjunction with comparison algorithms, BLAST, Smith-Waterman, Needleman-Wunsch, as well as 3D comparison methods DALI and PRIDE. The resulting datasets provide lower, and in our opinion more realistic estimates of the classifier performance than do random cross-validation schemes. 
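The idea of supervised cross-validation, holding out a whole known subtype rather than random members, can be sketched as follows. The record layout, protein identifiers and subtype names are hypothetical, and the sketch covers only the splitting of one class into folds, not the benchmark construction described in the abstract.

```python
from collections import defaultdict

def leave_one_subtype_out(records):
    """Yield (held_out_subtype, train_ids, test_ids) for each known subtype.

    `records` is an iterable of (item_id, subtype) pairs for one protein class.
    Every member of the held-out subtype goes to the test set, so the test
    items are never close relatives of anything seen in training.
    """
    by_subtype = defaultdict(list)
    for item_id, subtype in records:
        by_subtype[subtype].append(item_id)
    for held_out in sorted(by_subtype):
        test = list(by_subtype[held_out])
        train = [i for s in sorted(by_subtype) if s != held_out
                 for i in by_subtype[s]]
        yield held_out, train, test

records = [("p1", "famA"), ("p2", "famA"), ("p3", "famB"),
           ("p4", "famB"), ("p5", "famC")]
splits = list(leave_one_subtype_out(records))
for held_out, train, test in splits:
    print(held_out, "train:", train, "test:", test)
```

In random k-fold cross-validation, members of the same subtype typically land in both train and test sets, which is one reason it can give more optimistic estimates than this scheme.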
A combination of supervised and random sampling was used to construct model datasets, suitable for algorithm comparison.
[Influence of "prehistory" of sequential movements of the right and the left hand on reproduction: coding of positions, movements and sequence structure].

PubMed

Bobrova, E V; Liakhovetskiĭ, V A; Borshchevskaia, E R

2011-01-01

The dependence of errors during reproduction of a sequence of hand movements without visual feedback on the previous right- and left-hand performance ("prehistory") and on positions in space of sequence elements (random or ordered by the explicit rule) was analyzed. It was shown that the preceding information about the ordered positions of the sequence elements was used during right-hand movements, whereas left-hand movements were performed with involvement of the information about the random sequence. The data testify to a central mechanism of the analysis of spatial structure of sequence elements. This mechanism activates movement coding specific for the left hemisphere (vector coding) in case of an ordered sequence structure and positional coding specific for the right hemisphere in case of a random sequence structure.
Insights into the Performance of SD Bioline Malaria Ag P.f/Pan Rapid Diagnostic Test and Plasmodium falciparum Histidine-Rich Protein 2 Gene Variation in Madagascar.

PubMed

Willie, Nigani; Mehlotra, Rajeev K; Howes, Rosalind E; Rakotomanga, Tovonahary A; Ramboarina, Stephanie; Ratsimbasoa, Arsène C; Zimmerman, Peter A

2018-06-01

Plasmodium falciparum histidine-rich protein 2 (PfHRP2) forms the basis of many current malaria rapid diagnostic tests (RDTs). However, the parasites lacking part or all of the pfhrp2 gene do not express the PfHRP2 protein and are, therefore, not identifiable by PfHRP2-detecting RDTs. We evaluated the performance of the SD Bioline Malaria Ag P.f/Pan RDT together with pfhrp2 variation in Madagascar. Genomic DNA isolated from 260 patient blood samples was polymerase chain reaction (PCR)-amplified for the parasite 18S rRNA and pfhrp2 genes. Post-PCR ligation detection reaction-fluorescent microsphere assay (LDR-FMA) was performed for the identification of parasite species. Plasmodium falciparum histidine-rich protein 2 amplicons were sequenced. Polymerase chain reaction diagnosis of patient samples showed that 29% (75/260) were infected and P. falciparum was present in 95% (71/75) of these PCR-positive samples. Comparing RDT and P. falciparum detection by LDR-FMA, eight samples were RDT negative but P. falciparum positive (false negatives), all of which were pfhrp2 positive. The sensitivity and specificity of the RDT were 87% and 90%, respectively. Seventy-three samples were amplified for pfhrp2, from which nine randomly selected amplicons were sequenced, yielding 13 sequences. Amplification of pfhrp2, combined with RDT analysis and P. falciparum detection by LDR-FMA, showed that there was no indication of pfhrp2 deletion. Sequence analysis of pfhrp2 showed that the correlation between pfhrp2 sequence structure and RDT detection rates was unclear. Although the observed absence of pfhrp2 deletion from the samples screened here is encouraging, continued monitoring of the efficacy of the SD Bioline Malaria Ag P.f/Pan RDT for malaria diagnosis in Madagascar is warranted.
MHC diversity in two Acrocephalus species: the outbred Great reed warbler and the inbred Seychelles warbler.

PubMed

Richardson, David S; Westerdahl, Helena

2003-12-01

The Great reed warbler (GRW) and the Seychelles warbler (SW) are congeners with markedly different demographic histories. The GRW is a normal outbred bird species while the SW population remains isolated and inbred after undergoing a severe population bottleneck. We examined variation at Major Histocompatibility Complex (MHC) class I exon 3 using restriction fragment length polymorphism, denaturing gradient gel electrophoresis and DNA sequencing. Although genetic variation was higher in the GRW, considerable variation has been maintained in the SW. The ten exon 3 sequences found in the SW were as diverged from each other as were a random sub-sample of the 67 sequences from the GRW. There was evidence for balancing selection in both species, and the phylogenetic analysis showing that the exon 3 sequences did not separate according to species, was consistent with transspecies evolution of the MHC.
Comparing viral metagenomics methods using a highly multiplexed human viral pathogens reagent

PubMed Central

Li, Linlin; Deng, Xutao; Mee, Edward T.; Collot-Teixeira, Sophie; Anderson, Rob; Schepelmann, Silke; Minor, Philip D.; Delwart, Eric

2014-01-01

Unbiased metagenomic sequencing holds significant potential as a diagnostic tool for the simultaneous detection of any previously genetically described viral nucleic acids in clinical samples. Viral genome sequences can also inform on likely phenotypes including drug susceptibility or neutralization serotypes. In this study, different variables of the laboratory methods often used to generate viral metagenomics libraries on the efficiency of viral detection and virus genome coverage were compared. A biological reagent consisting of 25 different human RNA and DNA viral pathogens was used to estimate the effect of filtration and nuclease digestion, DNA/RNA extraction methods, pre-amplification and the use of different library preparation kits on the detection of viral nucleic acids. Filtration and nuclease treatment led to slight decreases in the percentage of viral sequence reads and number of viruses detected. For nucleic acid extractions silica spin columns improved viral sequence recovery relative to magnetic beads and Trizol extraction. Pre-amplification using random RT-PCR while generating more viral sequence reads resulted in detection of fewer viruses, more overlapping sequences, and lower genome coverage. The ScriptSeq library preparation method retrieved more viruses and a greater fraction of their genomes than the TruSeq and Nextera methods. Viral metagenomics sequencing was able to simultaneously detect up to 22 different viruses in the biological reagent analyzed including all those detected by qPCR. Further optimization will be required for the detection of viruses in biologically more complex samples such as tissues, blood, or feces. PMID:25497414
Theta oscillations promote temporal sequence learning.

PubMed

Crivelli-Decker, Jordan; Hsieh, Liang-Tien; Clarke, Alex; Ranganath, Charan

2018-05-17

Many theoretical models suggest that neural oscillations play a role in learning or retrieval of temporal sequences, but the extent to which oscillations support sequence representation remains unclear. To address this question, we used scalp electroencephalography (EEG) to examine oscillatory activity over learning of different object sequences. Participants made semantic decisions on each object as they were presented in a continuous stream. For three "Consistent" sequences, the order of the objects was always fixed. Activity during Consistent sequences was compared to "Random" sequences that consisted of the same objects presented in a different order on each repetition. Over the course of learning, participants made faster semantic decisions to objects in Consistent, as compared to objects in Random sequences. Thus, participants were able to use sequence knowledge to predict upcoming items in Consistent sequences. EEG analyses revealed decreased oscillatory power in the theta (4-7 Hz) band at frontal sites following decisions about objects in Consistent sequences, as compared with objects in Random sequences. The theta power difference between Consistent and Random only emerged in the second half of the task, as participants were more effectively able to predict items in Consistent sequences. Moreover, we found increases in parieto-occipital alpha (10-13 Hz) and beta (14-28 Hz) power during the pre-response period for objects in Consistent sequences, relative to objects in Random sequences. Linear mixed effects modeling revealed that single trial theta oscillations were related to reaction time for future objects in a sequence, whereas beta and alpha oscillations were only predictive of reaction time on the current trial. These results indicate that theta and alpha/beta activity preferentially relate to future and current events, respectively. More generally, our findings highlight the importance of band-specific neural oscillations in the learning of temporal order information. Copyright © 2018 The Authors. Published by Elsevier Inc. All rights reserved.
SNP-VISTA: An Interactive SNPs Visualization Tool

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shah, Nameeta; Teplitsky, Michael V.; Pennacchio, Len A.

2005-07-05

Recent advances in sequencing technologies promise better diagnostics for many diseases as well as better understanding of evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it is possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease and then screen for causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples makes possible more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at http://genome.lbl.gov/vista/snpvista.
BAsE-Seq: a method for obtaining long viral haplotypes from short sequence reads.

PubMed

Hong, Lewis Z; Hong, Shuzhen; Wong, Han Teng; Aw, Pauline P K; Cheng, Yan; Wilm, Andreas; de Sessions, Paola F; Lim, Seng Gee; Nagarajan, Niranjan; Hibberd, Martin L; Quake, Stephen R; Burkholder, William F

2014-01-01

We present a method for obtaining long haplotypes, of over 3 kb in length, using a short-read sequencer, Barcode-directed Assembly for Extra-long Sequences (BAsE-Seq). BAsE-Seq relies on transposing a template-specific barcode onto random segments of the template molecule and assembling the barcoded short reads into complete haplotypes. We applied BAsE-Seq on mixed clones of hepatitis B virus and accurately identified haplotypes occurring at frequencies greater than or equal to 0.4%, with >99.9% specificity. Applying BAsE-Seq to a clinical sample, we obtained over 9,000 viral haplotypes, which provided an unprecedented view of hepatitis B virus population structure during chronic infection. BAsE-Seq is readily applicable for monitoring quasispecies evolution in viral diseases.
Real-time UAV trajectory generation using feature points matching between video image sequences

NASA Astrophysics Data System (ADS)

Byun, Younggi; Song, Jeongheon; Han, Dongyeob

2017-09-01

Unmanned aerial vehicles (UAVs), equipped with navigation systems and video capability, are currently being deployed for intelligence, reconnaissance and surveillance missions. In this paper, we present a systematic approach for the generation of UAV trajectory using a video image matching system based on SURF (Speeded-Up Robust Features) and Preemptive RANSAC (Random Sample Consensus). Video image matching to find matching points is one of the most important steps for the accurate generation of a UAV trajectory (a sequence of poses in 3D space). We used the SURF algorithm to find the matching points between video image sequences, and removed mismatches by using Preemptive RANSAC, which divides all matching points into outliers and inliers. Only the inliers are used to determine the epipolar geometry for estimating the relative pose (rotation and translation) between image sequences. Experimental results from simulated video image sequences showed that our approach has good potential to be applied to the automatic geo-localization of UAV systems.
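The inlier/outlier split that RANSAC performs can be illustrated with a minimal sketch. This is plain RANSAC on a toy 2D line-fitting problem, not the Preemptive RANSAC variant or the epipolar-geometry model used in the record above; the function names, threshold, and iteration count are illustrative assumptions.

```python
import random

def fit_line(p, q):
    """Line through two points as normalized (a, b, c) with a*x + b*y + c = 0."""
    (x1, y1), (x2, y2) = p, q
    a, b = y2 - y1, x1 - x2
    norm = (a * a + b * b) ** 0.5
    if norm == 0:
        return None  # identical points define no line
    c = -(a * x1 + b * y1)
    return a / norm, b / norm, c / norm

def ransac_line(points, threshold, iterations, rng):
    """Return (inliers, outliers) for the best line found by random sampling."""
    best_inliers = []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)          # minimal sample: two points
        model = fit_line(p, q)
        if model is None:
            continue
        a, b, c = model
        # Points whose distance to the candidate line is within the threshold.
        inliers = [pt for pt in points
                   if abs(a * pt[0] + b * pt[1] + c) <= threshold]
        if len(inliers) > len(best_inliers):  # keep the consensus set
            best_inliers = inliers
    outliers = [pt for pt in points if pt not in best_inliers]
    return best_inliers, outliers
```

In an image-matching setting the "points" would be matched feature pairs and the model would be an essential or fundamental matrix, but the sample, score, and keep-the-largest-consensus loop has the same shape.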
RAD tag sequencing as a source of SNP markers in Cynara cardunculus L

PubMed Central

2012-01-01

Background: The globe artichoke (Cynara cardunculus L. var. scolymus) genome is relatively poorly explored, especially compared to those of the other major Asteraceae crops sunflower and lettuce. No SNP markers are in the public domain. We have combined the recently developed restriction-site associated DNA (RAD) approach with the Illumina DNA sequencing platform to effect the rapid and mass discovery of SNP markers for C. cardunculus. Results: RAD tags were sequenced from the genomic DNA of three C. cardunculus mapping population parents, generating 9.7 million reads, corresponding to ~1 Gbp of sequence. An assembly based on paired ends produced ~6.0 Mbp of genomic sequence, separated into ~19,000 contigs (mean length 312 bp), of which ~21% were fragments of putative coding sequence. The shared sequences allowed for the discovery of ~34,000 SNPs and nearly 800 indels, equivalent to a SNP frequency of 5.6 per 1,000 nt, and an indel frequency of 0.2 per 1,000 nt. A sample of heterozygous SNP loci was mapped by CAPS assays and this exercise provided validation of our mining criteria. The repetitive fraction of the genome had a high representation of retrotransposon sequence, followed by simple repeats, AT-low complexity regions and mobile DNA elements. The genomic k-mer distribution and CpG rate of C. cardunculus, compared with data derived from three whole genome-sequenced dicot species, provided further evidence of the random representation of the C. cardunculus genome generated by RAD sampling. Conclusion: The RAD tag sequencing approach is a cost-effective and rapid method to develop SNP markers in a highly heterozygous species. By adopting optimized filtering criteria, our approach generated a large and robust SNP dataset. PMID:22214349
A preliminary survey of Chlamydia psittaci genotypes from native and introduced birds in New Zealand.

PubMed

Gedye, K R; Fremaux, M; Garcia-Ramirez, J C; Gartrell, B D

2018-05-01

To describe the Chlamydia psittaci genotypes in samples from native and introduced birds from New Zealand by analysis of the sequence variation of the ompA gene. DNA was extracted from samples collected from a non-random sample of birds; either swabs from live asymptomatic birds or birds with clinical signs, or formalin-fixed, paraffin-embedded (FFPE) samples from historical post-mortem cases. The presence of C. psittaci in all samples had been confirmed using a quantitative PCR assay. The C. psittaci ompA gene was amplified and sequenced from samples from 26 native and introduced infected birds comprising 12 different species. These sequences were compared to published available C. psittaci genotypes. Genotypes A and C of C. psittaci were identified in the samples. Genotype A was identified in samples from nine birds, including various native and introduced species. Genotype C was identified in samples from 16 different waterfowl species, and a mixed infection of both genotypes was found in a kaka (Nestor meridionalis). In native birds, C. psittaci infection was confirmed in seven new host species. Two genotypes (A and C) of C. psittaci were found in samples from a wider range of both native and introduced species of birds in New Zealand than previously reported. Both genotypes have been globally associated with significant disease in birds and humans. These initial results suggest the host range of C. psittaci in New Zealand birds is under-reported. However, the prevalence of C. psittaci infection in New Zealand, and the associated impact on avian and public health, remains to be determined. There are biosecurity implications associated with the importation of birds to New Zealand if there is a limited diversity of C. psittaci genotypes present.
Molecular epidemiology of goat pox viruses.

PubMed

Roy, P; Jaisree, S; Balakrishnan, S; Senthilkumar, K; Mahaprabhu, R; Mishra, A; Maity, B; Ghosh, T K; Karmakar, A P

2018-02-01

Goat pox disease outbreaks were observed in different places, affecting Black Bengal goats in West Bengal (WB) and Tellicherry, Vembur and non-descriptive breeds in Tamil Nadu (TN), and causing severe lesions and mortality of up to 30%. Clinical specimens from all the outbreaks were screened by polymerase chain reaction followed by restriction fragment length polymorphism (PCR-RFLP), which confirmed the disease as goat pox. Virus isolation in the Vero cell line was performed with ten randomly selected samples; cytopathic effects (CPE) characterized by syncytia and intracytoplasmic inclusion bodies were observed after several blind passages. Nucleotide sequencing of the complete p32 gene from two randomly selected isolates and three clinical specimens revealed the presence of Goat pox virus (GTPV)-specific signature residues in all the sequences. Phylogenetic analysis using the present five sequences along with GenBank data of GTPV complete p32 gene sequences showed that all the GTPV sequences cluster together except the Pellor strain (NC004003) and the FZ Chinese strain (KC951854). The five sequences, whether from WB or TN, cluster more closely with GTPV isolates of Maharashtra state that were responsible for a cross-species outbreak of pox disease in both sheep (KF468759) and goats (KF468762) in India during the year 2010. All the Indian goat pox viruses, including the Mukteswar strain, isolated in 1946 and with its sequence reported in 2004, clustered together with the GTPVs causing the recent outbreaks. It was observed that GTPVs caused similar clinical manifestations irrespective of their geographical locations and breed characteristics, that no variation was observed among the Indian isolates based on the p32 gene over a period of seventy years, and that no disease outbreaks were observed or reported in vaccinated goats. © 2017 Blackwell Verlag GmbH.
Improving the performance of minimizers and winnowing schemes

PubMed Central

Marçais, Guillaume; Pellow, David; Bork, Daniel; Orenstein, Yaron; Shamir, Ron; Kingsford, Carl

2017-01-01

Abstract Motivation: The minimizers scheme is a method for selecting k-mers from sequences. It is used in many bioinformatics software tools to bin comparable sequences or to sample a sequence in a deterministic fashion at approximately regular intervals, in order to reduce memory consumption and processing time. Although very useful, the minimizers selection procedure has undesirable behaviors (e.g. too many k-mers are selected when processing certain sequences). Some of these problems were already known to the authors of the minimizers technique, and the natural lexicographic ordering of k-mers used by minimizers was recognized as their origin. Many software tools using minimizers employ ad hoc variations of the lexicographic order to alleviate those issues. Results: We provide an in-depth analysis of the effect of k-mer ordering on the performance of the minimizers technique. By using small universal hitting sets (a recently defined concept), we show how to significantly improve the performance of minimizers and avoid some of its worse behaviors. Based on these results, we encourage bioinformatics software developers to use an ordering based on a universal hitting set or, if not possible, a randomized ordering, rather than the lexicographic order. This analysis also settles negatively a conjecture (by Schleimer et al.) on the expected density of minimizers in a random sequence. Availability and Implementation: The software used for this analysis is available on GitHub: https://github.com/gmarcais/minimizers.git. Contact: gmarcais@cs.cmu.edu or carlk@cs.cmu.edu PMID:28881970
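The selection rule described in this record can be sketched as follows: in every window of w consecutive k-mers, pick the smallest k-mer under some ordering. The sketch below compares the lexicographic order with a hash-based order used here as a stand-in for a "randomized ordering"; the function names and the use of SHA-256 are assumptions for illustration, and the universal-hitting-set ordering recommended by the authors is not implemented.

```python
import hashlib

def lexicographic_order(kmer):
    return kmer

def hashed_order(kmer):
    # Pseudo-random but deterministic ordering of k-mers (illustrative choice).
    return hashlib.sha256(kmer.encode()).hexdigest()

def minimizers(seq, k, w, order=lexicographic_order):
    """Sorted start positions of the k-mers selected as minimizers.

    Each window holds w consecutive k-mers; ties go to the leftmost k-mer.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (order(kmers[i]), i))
        selected.add(best)
    return sorted(selected)

def density(seq, k, w, order=lexicographic_order):
    """Fraction of k-mer positions that are selected."""
    return len(minimizers(seq, k, w, order)) / (len(seq) - k + 1)
```

Comparing `density(seq, k, w, lexicographic_order)` with `density(seq, k, w, hashed_order)` on a sequence of interest gives a rough feel for how much the ordering alone changes the number of selected k-mers.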
Hybridization Capture Using RAD Probes (hyRAD), a New Tool for Performing Genomic Analyses on Collection Specimens

PubMed Central

Suchan, Tomasz; Pitteloud, Camille; Gerasimova, Nadezhda S.; Kostikova, Anna; Schmid, Sarah; Arrigo, Nils; Pajkovic, Mila; Ronikier, Michał; Alvarez, Nadir

2016-01-01

In recent years, many protocols aimed at reproducibly sequencing reduced-genome subsets in non-model organisms have been published. Among them, RAD-sequencing is one of the most widely used. It relies on digesting DNA with specific restriction enzymes and performing size selection on the resulting fragments. Despite its acknowledged utility, this method is of limited use with degraded DNA samples, such as those isolated from museum specimens, as these samples are less likely to harbor fragments long enough to comprise two restriction sites making possible ligation of the adapter sequences (in the case of double-digest RAD) or performing size selection of the resulting fragments (in the case of single-digest RAD). Here, we address these limitations by presenting a novel method called hybridization RAD (hyRAD). In this approach, biotinylated RAD fragments, covering a random fraction of the genome, are used as baits for capturing homologous fragments from genomic shotgun sequencing libraries. This simple and cost-effective approach allows sequencing of orthologous loci even from highly degraded DNA samples, opening new avenues of research in the field of museum genomics. Not relying on the restriction site presence, it improves among-sample loci coverage. In a trial study, hyRAD allowed us to obtain a large set of orthologous loci from fresh and museum samples from a non-model butterfly species, with a high proportion of single nucleotide polymorphisms present in all eight analyzed specimens, including 58-year-old museum samples. The utility of the method was further validated using 49 museum and fresh samples of a Palearctic grasshopper species for which the spatial genetic structure was previously assessed using mtDNA amplicons. The application of the method is eventually discussed in a wider context. As it does not rely on the restriction site presence, it is not sensitive to among-sample polymorphisms in the restriction sites, which usually cause loci dropout. This should enable the application of hyRAD to analyses at broader evolutionary scales. PMID:26999359
The genealogy of samples in models with selection.

PubMed

Neuhauser, C; Krone, S M

1997-02-01

We introduce the genealogy of a random sample of genes taken from a large haploid population that evolves according to random reproduction with selection and mutation. Without selection, the genealogy is described by Kingman's well-known coalescent process. In the selective case, the genealogy of the sample is embedded in a graph with a coalescing and branching structure. We describe this graph, called the ancestral selection graph, and point out differences and similarities with Kingman's coalescent. We present simulations for a two-allele model with symmetric mutation in which one of the alleles has a selective advantage over the other. We find that when the allele frequencies in the population are already in equilibrium, then the genealogy does not differ much from the neutral case. This is supported by rigorous results. Furthermore, we describe the ancestral selection graph for other selective models with finitely many selection classes, such as the K-allele models, infinitely-many-alleles models, DNA sequence models, and infinitely-many-sites models, and briefly discuss the diploid case.
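For the neutral baseline mentioned in this record, a minimal sketch of Kingman's coalescent is given below: while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2 in coalescent time units. Only the waiting times are simulated; the ancestral selection graph itself (with its branching events) is not implemented, and the function names are illustrative.

```python
import random

def coalescent_waiting_times(n, rng):
    """Waiting times while k = n, n-1, ..., 2 lineages remain (neutral case)."""
    times = []
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2        # number of lineage pairs that can coalesce
        times.append(rng.expovariate(rate))
    return times

def time_to_mrca(n, rng):
    """Time to the most recent common ancestor of a sample of n genes."""
    return sum(coalescent_waiting_times(n, rng))
```

Averaged over many replicates, `time_to_mrca(n, rng)` should approach the standard expectation 2(1 - 1/n) in these units, which gives a quick sanity check on any simulation built on top of it.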
The Genealogy of Samples in Models with Selection

PubMed Central

Neuhauser, C.; Krone, S. M.

1997-01-01

We introduce the genealogy of a random sample of genes taken from a large haploid population that evolves according to random reproduction with selection and mutation. Without selection, the genealogy is described by Kingman's well-known coalescent process. In the selective case, the genealogy of the sample is embedded in a graph with a coalescing and branching structure. We describe this graph, called the ancestral selection graph, and point out differences and similarities with Kingman's coalescent. We present simulations for a two-allele model with symmetric mutation in which one of the alleles has a selective advantage over the other. We find that when the allele frequencies in the population are already in equilibrium, then the genealogy does not differ much from the neutral case. This is supported by rigorous results. Furthermore, we describe the ancestral selection graph for other selective models with finitely many selection classes, such as the K-allele models, infinitely-many-alleles models, DNA sequence models, and infinitely-many-sites models, and briefly discuss the diploid case. PMID:9071604
ARTS: automated randomization of multiple traits for study design.

PubMed

Maienschein-Cline, Mark; Lei, Zhengdeng; Gardeux, Vincent; Abbasi, Taimur; Machado, Roberto F; Gordeuk, Victor; Desai, Ankit A; Saraf, Santosh; Bahroos, Neil; Lussier, Yves

2014-06-01

Collecting data from large studies on high-throughput platforms, such as microarray or next-generation sequencing, typically requires processing samples in batches. There are often systematic but unpredictable biases from batch-to-batch, so proper randomization of biologically relevant traits across batches is crucial for distinguishing true biological differences from experimental artifacts. When a large number of traits are biologically relevant, as is common for clinical studies of patients with varying sex, age, genotype and medical background, proper randomization can be extremely difficult to prepare by hand, especially because traits may affect biological inferences, such as differential expression, in a combinatorial manner. Here we present ARTS (automated randomization of multiple traits for study design), which aids researchers in study design by automatically optimizing batch assignment for any number of samples, any number of traits and any batch size. ARTS is implemented in Perl and is available at github.com/mmaiensc/ARTS. ARTS is also available in the Galaxy Tool Shed, and can be used at the Galaxy installation hosted by the UIC Center for Research Informatics (CRI) at galaxy.cri.uic.edu. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
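The idea of balancing traits across batches can be sketched with a simple random-restart search. This is not the ARTS algorithm or its objective function; the imbalance score, the function names, and the `tries` and `seed` parameters are assumptions made for illustration.

```python
import random

def imbalance(batches, samples, traits):
    """Sum, over every trait value, of (max - min) count across batches."""
    total = 0
    for trait in traits:
        for value in {s[trait] for s in samples}:
            counts = [sum(1 for i in batch if samples[i][trait] == value)
                      for batch in batches]
            total += max(counts) - min(counts)
    return total

def assign_batches(samples, traits, batch_size, tries=1000, seed=0):
    """Shuffle repeatedly and keep the batch assignment with the lowest score."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    best, best_score = None, None
    for _ in range(tries):
        rng.shuffle(indices)
        batches = [indices[i:i + batch_size]
                   for i in range(0, len(indices), batch_size)]
        score = imbalance(batches, samples, traits)
        if best_score is None or score < best_score:
            best, best_score = batches, score
    return best, best_score
```

Here each sample is a dictionary of trait values, for example `{"sex": "F", "arm": "A"}`, and the returned batches are lists of sample indices. A score of 0 means every trait value is spread evenly across all batches.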
Molecular analysis of microbial community in a groundwater sample polluted by landfill leachate and seawater*

PubMed Central

Tian, Yang-jie; Yang, Hong; Wu, Xiu-juan; Li, Dao-tang

2005-01-01

Seashore landfill aquifers are environments of special physicochemical conditions (high organic load and high salinity), and microbes in leachate-polluted aquifers play a significant role for intrinsic bioremediation. In order to characterize microbial diversity and look for clues on the relationship between microbial community structure and hydrochemistry, a culture-independent examination of a typical groundwater sample obtained from a seashore landfill was conducted by sequence analysis of 16S rDNA clone library. Two sets of universal 16S rDNA primers were used to amplify DNA extracted from the groundwater so that problems arising from primer efficiency and specificity could be reduced. Of 74 clones randomly selected from the libraries, 30 contained unique sequences whose analysis showed that the majority of them belonged to bacteria (95.9%), with Proteobacteria (63.5%) being the dominant division. One archaeal sequence and one eukaryotic sequence were found as well. Bacterial sequences belonging to the following phylogenic groups were identified: Bacteroidetes (20.3%), β, γ, δ and ε-subdivisions of Proteobacteria (47.3%, 9.5%, 5.4% and 1.3%, respectively), Firmicutes (1.4%), Actinobacteria (2.7%), Cyanobacteria (2.7%). The percentages of Proteobacteria and Bacteroides in seawater were greater than those in the groundwater from a non-seashore landfill, indicating a possible influence of seawater. Quite a few sequences had close relatives in marine or hypersaline environments. Many sequences showed affiliations with microbes involved in anaerobic fermentation. The remarkable abundance of sequences related to (per)chlorate-reducing bacteria (ClRB) in the groundwater was significant and worthy of further study. PMID:15682499
Assessing randomness and complexity in human motion trajectories through analysis of symbolic sequences

PubMed Central

Peng, Zhen; Genewein, Tim; Braun, Daniel A.

2014-01-01

Complexity is a hallmark of intelligent behavior consisting both of regular patterns and random variation. To quantitatively assess the complexity and randomness of human motion, we designed a motor task in which we translated subjects' motion trajectories into strings of symbol sequences. In the first part of the experiment participants were asked to perform self-paced movements to create repetitive patterns, copy pre-specified letter sequences, and generate random movements. To investigate whether the degree of randomness can be manipulated, in the second part of the experiment participants were asked to perform unpredictable movements in the context of a pursuit game, where they received feedback from an online Bayesian predictor guessing their next move. We analyzed symbol sequences representing subjects' motion trajectories with five common complexity measures: predictability, compressibility, approximate entropy, Lempel-Ziv complexity, as well as effective measure complexity. We found that subjects' self-created patterns were the most complex, followed by drawing movements of letters and self-paced random motion. We also found that participants could change the randomness of their behavior depending on context and feedback. Our results suggest that humans can adjust both complexity and regularity in different movement types and contexts and that this can be assessed with information-theoretic measures of the symbolic sequences generated from movement trajectories. PMID:24744716
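One of the measures named in this record, Lempel-Ziv complexity, can be sketched by counting the phrases in an incremental parse of a symbol string. The version below uses an LZ78-style parse, which is an assumption: the study may have used a different Lempel-Ziv variant or normalization.

```python
def lz78_phrase_count(symbols):
    """Number of phrases in an LZ78-style parse of a string of symbols.

    Each phrase is the shortest prefix of the remaining input that has not
    been seen as a phrase before; a trailing repeat counts as one phrase.
    """
    phrases = set()
    current = ""
    count = 0
    for symbol in symbols:
        current += symbol
        if current not in phrases:
            phrases.add(current)
            count += 1
            current = ""
    if current:
        count += 1  # leftover phrase that duplicates an earlier one
    return count
```

A repetitive string such as `"aaaaaaaa"` parses into fewer phrases than a string of the same length with no repeats, so higher counts indicate less regular symbol sequences.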
Investigation of the contextual interference effect in the manipulation of the motor parameter of over-all force.

PubMed

Goodwin, J E; Meeuwsen, H J

1996-12-01

This investigation examined the contextual interference effect when manipulating over-all force in a golf-putting task. Undergraduate women (N = 30) were randomly assigned to a Random, Blocked-Random, or Blocked practice condition and practiced golf putting from distances of 2.43 m, 3.95 m, and 5.47 m during acquisition. Subjects in the Random condition practiced trials in a quasirandom sequence and those in the Blocked-Random condition practiced trials initially in a blocked sequence with the remainder of the trials practiced in a quasirandom sequence. In the Blocked condition subjects practiced trials in a blocked sequence. A 24-hr. transfer test consisted of 30 trials with 10 trials each from 1.67 m, 3.19 m, and 6.23 m. Transfer scores supported the Magill and Hall (1990) hypothesis that, when task variations involve learning parameters of a generalized motor program, the benefit of random practice over blocked practice would not be found.

Using Maximum Entropy to Find Patterns in Genomes

NASA Astrophysics Data System (ADS)

Liu, Sophia; Hockenberry, Adam; Lancichinetti, Andrea; Jewett, Michael; Amaral, Luis

The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. To accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. This approach can also easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes. National Institute of General Medical Science, Northwestern University Presidential Fellowship, National Science Foundation, David and Lucile Packard Foundation, Camille Dreyfus Teacher Scholar Award.
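One way to realize a maximum-entropy model of this kind is sketched below, under stated assumptions: the amino acid sequence is fixed, and each synonymous codon is drawn with probability proportional to exp(lambda * GC count), with lambda tuned by bisection so that the expected GC fraction matches a target. This exponential form is the textbook maximum-entropy solution for a constraint on an expected value; it may differ from the authors' actual method. The codon table is deliberately partial (six amino acids) and all function names are illustrative.

```python
import math
import random

# Partial standard genetic code (six amino acids only) to keep the sketch short.
CODONS = {
    "M": ["ATG"],
    "K": ["AAA", "AAG"],
    "F": ["TTT", "TTC"],
    "G": ["GGT", "GGC", "GGA", "GGG"],
    "A": ["GCT", "GCC", "GCA", "GCG"],
    "L": ["TTA", "TTG", "CTT", "CTC", "CTA", "CTG"],
}

def gc_count(codon):
    return sum(base in "GC" for base in codon)

def codon_weights(aa, lam):
    """Unnormalized maximum-entropy weights for the synonymous codons of aa."""
    return [math.exp(lam * gc_count(c)) for c in CODONS[aa]]

def expected_gc(protein, lam):
    """Expected GC fraction of a coding sequence drawn with parameter lam."""
    total = 0.0
    for aa in protein:
        weights = codon_weights(aa, lam)
        z = sum(weights)
        total += sum(w * gc_count(c) for w, c in zip(weights, CODONS[aa])) / z
    return total / (3 * len(protein))

def solve_lambda(protein, target_gc, lo=-20.0, hi=20.0, steps=100):
    """Bisection on lam; expected_gc is increasing in lam."""
    for _ in range(steps):
        mid = (lo + hi) / 2
        if expected_gc(protein, mid) < target_gc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def sample_coding_sequence(protein, lam, rng):
    """Draw one random coding sequence that translates to `protein`."""
    return "".join(
        rng.choices(CODONS[aa], weights=codon_weights(aa, lam))[0]
        for aa in protein
    )
```

The target GC fraction must lie between the minimum and maximum achievable for the given protein; outside that range the bisection simply saturates at one end of the search interval.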
Practical quantum key distribution protocol without monitoring signal disturbance.

PubMed

Sasaki, Toshihiko; Yamamoto, Yoshihisa; Koashi, Masato

2014-05-22

Quantum cryptography exploits the fundamental laws of quantum mechanics to provide a secure way to exchange private information. Such an exchange requires a common random bit sequence, called a key, to be shared secretly between the sender and the receiver. The basic idea behind quantum key distribution (QKD) has widely been understood as the property that any attempt to distinguish encoded quantum states causes a disturbance in the signal. As a result, implementation of a QKD protocol involves an estimation of the experimental parameters influenced by the eavesdropper's intervention, which is achieved by randomly sampling the signal. If the estimation of many parameters with high precision is required, the portion of the signal that is sacrificed increases, thus decreasing the efficiency of the protocol. Here we propose a QKD protocol based on an entirely different principle. The sender encodes a bit sequence onto non-orthogonal quantum states and the receiver randomly dictates how a single bit should be calculated from the sequence. The eavesdropper, who is unable to learn the whole of the sequence, cannot guess the bit value correctly. An achievable rate of secure key distribution is calculated by considering complementary choices between quantum measurements of two conjugate observables. We found that a practical implementation using a laser pulse train achieves a key rate comparable to a decoy-state QKD protocol, an often-used technique for lasers. It also has a better tolerance of bit errors and of finite-sized-key effects. We anticipate that this finding will give new insight into how the probabilistic nature of quantum mechanics can be related to secure communication, and will facilitate the simple and efficient use of conventional lasers for QKD.
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery

PubMed Central

Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo

2012-01-01

Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases, our tool was able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2–ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data. PMID:22570408
FusionAnalyser: a new graphical, event-driven tool for fusion rearrangements discovery.

PubMed

Piazza, Rocco; Pirola, Alessandra; Spinelli, Roberta; Valletta, Simona; Redaelli, Sara; Magistroni, Vera; Gambacorti-Passerini, Carlo

2012-09-01

Gene fusions are common driver events in leukaemias and solid tumours; here we present FusionAnalyser, a tool dedicated to the identification of driver fusion rearrangements in human cancer through the analysis of paired-end high-throughput transcriptome sequencing data. We initially tested FusionAnalyser by using a set of in silico randomly generated sequencing data from 20 known human translocations occurring in cancer and subsequently using transcriptome data from three chronic and three acute myeloid leukaemia samples. In all cases our tool was able to detect the presence of the correct driver fusion event(s) with high specificity. In one of the acute myeloid leukaemia samples, FusionAnalyser identified a novel, cryptic, in-frame ETS2-ERG fusion. A fully event-driven graphical interface and a flexible filtering system allow complex analyses to be run in the absence of any a priori programming or scripting knowledge. Therefore, we propose FusionAnalyser as an efficient and robust graphical tool for the identification of functional rearrangements in the context of high-throughput transcriptome sequencing data.
The effect of interference on temporal order memory for random and fixed sequences in nondemented older adults.

PubMed

Tolentino, Jerlyn C; Pirogovsky, Eva; Luu, Trinh; Toner, Chelsea K; Gilbert, Paul E

2012-05-21

Two experiments tested the effect of temporal interference on order memory for fixed and random sequences in young adults and nondemented older adults. The results demonstrate that temporal order memory for fixed and random sequences is impaired in nondemented older adults, particularly when temporal interference is high. However, temporal order memory for fixed sequences is comparable between older adults and young adults when temporal interference is minimized. The results suggest that temporal order memory is less efficient and more susceptible to interference in older adults, possibly due to impaired temporal pattern separation.
Assessment of clonality and serotypes of Streptococcus mutans among children by multilocus sequence typing.

PubMed

Momeni, Stephanie S; Whiddon, Jennifer; Cheon, Kyounga; Moser, Stephen A; Childers, Noel K

2015-12-01

Studies using multilocus sequence typing (MLST) have demonstrated that Streptococcus mutans isolates are genetically diverse. Our laboratory previously demonstrated clonality of S. mutans using MLST but could not discount the possibility of sampling bias. In this study, the clonality of randomly selected S. mutans plaque isolates from African-American children was examined using MLST. Serotype and the presence of collagen-binding proteins (CBPs) encoded by cnm/cbm were also assessed. One hundred S. mutans isolates were randomly selected for MLST analysis. Sequence analysis was performed and phylogenetic trees were generated using START2 and MEGA. Thirty-four sequence types were identified, of which 27 were unique to this population. Seventy-five per cent of the isolates clustered into 16 clonal groups. The serotypes observed were c (n = 84), e (n = 3), and k (n = 11). The prevalence of S. mutans isolates of serotype k was notably high, at 17.5%. All isolates were cnm/cbm negative. The clonality of S. mutans demonstrated in this study illustrates the importance of localized population studies and is consistent with transmission. The prevalence of serotype k, a recently proposed systemic pathogen, observed in this study is higher than reported in most populations, and this is the first report of S. mutans serotype k in a United States population. © 2015 Eur J Oral Sci.
Randomizer for High Data Rates

NASA Technical Reports Server (NTRS)

Garon, Howard; Sank, Victor J.

2018-01-01

NASA as well as a number of other space agencies now recognize that the current recommended CCSDS randomizer used for telemetry (TM) is too short. When multiple applications of the PN8 Maximal Length Sequence (MLS) are required in order to fully cover a channel access data unit (CADU), spectral problems in the form of elevated spurious discretes (spurs) appear. Originally the randomizer was called a bit transition generator (BTG) precisely because it was thought that its primary value was to ensure sufficient bit transitions to allow the bit/symbol synchronizer to lock and remain locked. We, NASA, have shown that the old BTG concept is a limited view of the real value of the randomizer sequence and that the randomizer also aids in signal acquisition as well as minimizing the potential for false decoder lock. Under the guidelines we considered here there are multiple maximal length sequences under GF(2) which appear attractive in this application. Although there may be mitigating reasons why another MLS could be selected, one sequence in particular possesses a combination of desired properties which sets it apart from the others.
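To make the "too short" point concrete, the following minimal Python sketch generates a PN8 maximal length sequence with an 8-stage linear feedback shift register over GF(2) and measures its period. The tap positions (equivalent to x^8 + x^7 + x^5 + x^3 + 1) and the all-ones seed are assumptions taken from the commonly cited CCSDS telemetry randomizer, not from the abstract; the function names are hypothetical.

```python
def randomizer_bits(n_bits, seed=0xFF):
    """Return n_bits of an 8-stage LFSR sequence (assumed CCSDS-style taps)."""
    # state[0] is the oldest bit a[n]; state[7] is the newest bit a[n+7].
    state = [(seed >> (7 - i)) & 1 for i in range(8)]
    out = []
    for _ in range(n_bits):
        out.append(state[0])
        # Recurrence a[n+8] = a[n+7] ^ a[n+5] ^ a[n+3] ^ a[n] (assumption).
        new_bit = state[7] ^ state[5] ^ state[3] ^ state[0]
        state = state[1:] + [new_bit]
    return out


def pack_bytes(bits):
    """Pack a bit list into bytes, most significant bit first."""
    usable = len(bits) - len(bits) % 8
    packed = bytearray()
    for i in range(0, usable, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        packed.append(value)
    return bytes(packed)


def sequence_period(seed=0xFF):
    """Count the bits generated before the sequence starts to repeat."""
    first = randomizer_bits(8, seed)
    # 600 bits is more than twice the longest possible period of 255.
    stream = randomizer_bits(600, seed)
    for shift in range(1, 300):
        if stream[shift:shift + 8] == first:
            return shift
    return None


if __name__ == "__main__":
    print(pack_bytes(randomizer_bits(32)).hex())
    print(sequence_period())
```

An 8-stage register can hold at most 255 nonzero states, so any data unit longer than 255 bits has to reuse the sequence, which is the repetition the abstract links to spurious spectral lines.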
Active learning reduces annotation time for clinical concept extraction.

PubMed

Kholghi, Mahnoosh; Sitbon, Laurianne; Zuccon, Guido; Nguyen, Anthony

2017-10-01

To investigate: (1) the annotation time savings by various active learning query strategies compared to supervised learning and a random sampling baseline, and (2) the benefits of active learning-assisted pre-annotations in accelerating the manual annotation process compared to de novo annotation. There are 73 and 120 discharge summary reports provided by Beth Israel institute in the train and test sets of the concept extraction task in the i2b2/VA 2010 challenge, respectively. The 73 reports were used in user study experiments for manual annotation. First, all sequences within the 73 reports were manually annotated from scratch. Next, active learning models were built to generate pre-annotations for the sequences selected by a query strategy. The annotation/reviewing time per sequence was recorded. The 120 test reports were used to measure the effectiveness of the active learning models. When annotating from scratch, active learning reduced the annotation time up to 35% and 28% compared to a fully supervised approach and a random sampling baseline, respectively. Reviewing active learning-assisted pre-annotations resulted in 20% further reduction of the annotation time when compared to de novo annotation. The number of concepts that require manual annotation is a good indicator of the annotation time for various active learning approaches as demonstrated by high correlation between time rate and concept annotation rate. Active learning has a key role in reducing the time required to manually annotate domain concepts from clinical free text, either when annotating from scratch or reviewing active learning-assisted pre-annotations. Copyright © 2017 Elsevier B.V. All rights reserved.
The quality of reporting of randomized controlled trials of traditional Chinese medicine: a survey of 13 randomly selected journals from mainland China.

PubMed

Wang, Gang; Mao, Bing; Xiong, Ze-Yu; Fan, Tao; Chen, Xiao-Dong; Wang, Lei; Liu, Guan-Jian; Liu, Jia; Guo, Jia; Chang, Jing; Wu, Tai-Xiang; Li, Ting-Qian

2007-07-01

The number of randomized controlled trials (RCTs) of traditional Chinese medicine (TCM) is increasing. However, there have been few systematic assessments of the quality of reporting of these trials. This study was undertaken to evaluate the quality of reporting of RCTs in TCM journals published in mainland China from 1999 to 2004. Thirteen TCM journals were randomly selected by stratified sampling of the approximately 100 TCM journals published in mainland China. All issues of the selected journals published from 1999 to 2004 were hand-searched according to guidelines from the Cochrane Centre. All reviewers underwent training in the evaluation of RCTs at the Chinese Centre of Evidence-based Medicine. A comprehensive quality assessment of each RCT was completed using a modified version of the Consolidated Standards of Reporting Trials (CONSORT) checklist (total of 30 items) and the Jadad scale. Disagreements were resolved by consensus. Seven thousand four hundred twenty-two RCTs were identified. The proportion of published RCTs relative to all types of published clinical trials increased significantly over the period studied, from 18.6% in 1999 to 35.9% in 2004 (P < 0.001). The mean (SD) Jadad score was 1.03 (0.61) overall. One RCT had a Jadad score of 5 points; 14 had a score of 4 points; and 102 had a score of 3 points. The mean (SD) Jadad score was 0.85 (0.53) in 1999 (746 RCTs) and 1.20 (0.62) in 2004 (1634 RCTs). Across all trials, 39.4% of the items on the modified CONSORT checklist were reported, which was equivalent to 11.82 (5.78) of the 30 items. Some important methodologic components of RCTs were incompletely reported, such as sample-size calculation (reported in 1.1% of RCTs), randomization sequence (7.9%), allocation concealment (0.3%), implementation of the random-allocation sequence (0%), and analysis of intention to treat (0%). The findings of this study indicate that the quality of reporting of RCTs of TCM has improved, but remains poor.
A large-scale study of the random variability of a coding sequence: a study on the CFTR gene.

PubMed

Modiano, Guido; Bombieri, Cristina; Ciminelli, Bianca Maria; Belpinati, Francesca; Giorgi, Silvia; Georges, Marie des; Scotet, Virginie; Pompei, Fiorenza; Ciccacci, Cinzia; Guittard, Caroline; Audrézet, Marie Pierre; Begnini, Angela; Toepfer, Michael; Macek, Milan; Ferec, Claude; Claustres, Mireille; Pignatti, Pier Franco

2005-02-01

Coding single nucleotide substitutions (cSNSs) have been studied on hundreds of genes using small samples (n(g) approximately 100-150 genes). In the present investigation, a large random European population sample (average n(g) approximately 1500) was studied for a single gene, the CFTR (Cystic Fibrosis Transmembrane conductance Regulator). The nonsynonymous (NS) substitutions exhibited, in accordance with previous reports, a mean probability of being polymorphic (q > 0.005), much lower than that of the synonymous (S) substitutions, but they showed a similar rate of subpolymorphic (q < 0.005) variability. This indicates that, in autosomal genes that may have harmful recessive alleles (nonduplicated genes with important functions), genetic drift overwhelms selection in the subpolymorphic range of variability, making disadvantageous alleles behave as neutral. These results imply that the majority of the subpolymorphic nonsynonymous alleles of these genes are selectively negative or even pathogenic.
A simple method for semi-random DNA amplicon fragmentation using the methylation-dependent restriction enzyme MspJI.

PubMed

Shinozuka, Hiroshi; Cogan, Noel O I; Shinozuka, Maiko; Marshall, Alexis; Kay, Pippa; Lin, Yi-Han; Spangenberg, German C; Forster, John W

2015-04-11

Fragmentation at random nucleotide locations is an essential process for preparation of DNA libraries to be used on massively parallel short-read DNA sequencing platforms. Although instruments for physical shearing, such as the Covaris S2 focused-ultrasonicator system, and products for enzymatic shearing, such as the Nextera technology and NEBNext dsDNA Fragmentase kit, are commercially available, a simple and inexpensive method is desirable for high-throughput sequencing library preparation. MspJI is a recently characterised restriction enzyme which recognises the sequence motif CNNR (where R = G or A) when the first base is modified to 5-methylcytosine or 5-hydroxymethylcytosine. A semi-random enzymatic DNA amplicon fragmentation method was developed based on the unique cleavage properties of MspJI. In this method, random incorporation of 5-methyl-2'-deoxycytidine-5'-triphosphate is achieved through DNA amplification with DNA polymerase, followed by DNA digestion with MspJI. Due to the recognition sequence of the enzyme, DNA amplicons are fragmented in a relatively sequence-independent manner. The size range of the resulting fragments could be controlled by optimising the 5-methyl-2'-deoxycytidine-5'-triphosphate concentration in the reaction mixture. A library suitable for sequencing using the Illumina MiSeq platform was prepared and processed using the proposed method. Alignment of generated short reads to a reference sequence demonstrated a relatively high level of random fragmentation. The proposed method may be performed with standard laboratory equipment. Although the uniformity of coverage was slightly inferior to the Covaris physical shearing procedure, due to efficiencies of cost and labour, the method may be more suitable than existing approaches for implementation in large-scale sequencing activities, such as bacterial artificial chromosome (BAC)-based genome sequence assembly, pan-genomic studies and locus-targeted genotyping-by-sequencing.
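The dependence of fragment size on the methylated-nucleotide concentration can be illustrated with a toy simulation. The Python sketch below is a deliberate simplification and not the authors' protocol: it scans one strand only, treats `methyl_fraction` (a hypothetical parameter) as the probability that any given C is methylated, and cuts at the motif position itself rather than at the enzyme's true cleavage offset.

```python
import random

PURINES = {"G", "A"}  # R in the CNNR motif


def cnnr_sites(seq):
    """Return 0-based indices i where seq[i] is C and seq[i + 3] is G or A."""
    return [i for i in range(len(seq) - 3)
            if seq[i] == "C" and seq[i + 3] in PURINES]


def simulate_fragments(seq, methyl_fraction, rng):
    """Return fragment lengths after cutting at randomly methylated CNNR sites.

    Assumption: each candidate C is methylated independently with
    probability methyl_fraction, and a methylated site is always cut.
    """
    cuts = [i for i in cnnr_sites(seq) if rng.random() < methyl_fraction]
    bounds = [0] + cuts + [len(seq)]
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


if __name__ == "__main__":
    rng = random.Random(1)
    amplicon = "".join(rng.choice("ACGT") for _ in range(5000))
    for fraction in (0.01, 0.05, 0.2):
        lengths = simulate_fragments(amplicon, fraction, rng)
        print(fraction, len(lengths), sum(lengths) / len(lengths))
```

Under these assumptions a higher methylation fraction yields more cuts and shorter fragments on average, which mirrors the size control described in the abstract.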
Single-Molecule Electrical Random Resequencing of DNA and RNA

NASA Astrophysics Data System (ADS)

Ohshiro, Takahito; Matsubara, Kazuki; Tsutsui, Makusu; Furuhashi, Masayuki; Taniguchi, Masateru; Kawai, Tomoji

2012-07-01

Two paradigm shifts in DNA sequencing technologies--from bulk to single molecules and from optical to electrical detection--are expected to realize label-free, low-cost DNA sequencing that does not require PCR amplification. It will lead to development of high-throughput third-generation sequencing technologies for personalized medicine. Although nanopore devices have been proposed as third-generation DNA-sequencing devices, a significant milestone in these technologies has been attained by demonstrating a novel technique for resequencing DNA using electrical signals. Here we report single-molecule electrical resequencing of DNA and RNA using a hybrid method of identifying single-base molecules via tunneling currents and random sequencing. Our method reads sequences of nine types of DNA oligomers. The complete sequence of 5'-UGAGGUA-3' from the let-7 microRNA family was also identified by creating a composite of overlapping fragment sequences, which was randomly determined using tunneling current conducted by single-base molecules as they passed between a pair of nanoelectrodes.
Parents' interest in whole-genome sequencing of newborns.

PubMed

Goldenberg, Aaron J; Dodson, Daniel S; Davis, Matthew M; Tarini, Beth A

2014-01-01

The aim of this study was to assess parents' interest in whole-genome sequencing for newborns. We conducted a survey of a nationally representative sample of 1,539 parents about their interest in whole-genome sequencing of newborns. Participants were randomly presented with one of two scenarios that differed in the venue of testing: one offered whole-genome sequencing through a state newborn screening program, whereas the other offered whole-genome sequencing in a pediatrician's office. Overall interest in having future newborns undergo whole-genome sequencing was generally high among parents. If whole-genome sequencing were offered through a state's newborn-screening program, 74% of parents were either definitely or somewhat interested in utilizing this technology. If offered in a pediatrician's office, 70% of parents were either definitely or somewhat interested. Parents in both groups most frequently identified test accuracy and the ability to prevent a child from developing a disease as "very important" in making a decision to have a newborn's whole genome sequenced. These data may help health departments and children's health-care providers anticipate parents' level of interest in genomic screening for newborns. As whole-genome sequencing is integrated into clinical and public health services, these findings may inform the development of educational strategies and outreach messages for parents.
Significant variance in genetic diversity among populations of Schistosoma haematobium detected using microsatellite DNA loci from a genome-wide database.

PubMed

Glenn, Travis C; Lance, Stacey L; McKee, Anna M; Webster, Bonnie L; Emery, Aidan M; Zerlotini, Adhemar; Oliveira, Guilherme; Rollinson, David; Faircloth, Brant C

2013-10-17

Urogenital schistosomiasis caused by Schistosoma haematobium is widely distributed across Africa and is increasingly being targeted for control. Genome sequences and population genetic parameters can give insight into the potential for population- or species-level drug resistance. Microsatellite DNA loci are genetic markers in wide use by Schistosoma researchers, but there are few primers available for S. haematobium. We sequenced 1,058,114 random DNA fragments from clonal cercariae collected from a snail infected with a single Schistosoma haematobium miracidium. We assembled and aligned the S. haematobium sequences to the genomes of S. mansoni and S. japonicum, identifying microsatellite DNA loci across all three species and designing primers to amplify the loci in S. haematobium. To validate our primers, we screened 32 randomly selected primer pairs with population samples of S. haematobium. We designed >13,790 primer pairs to amplify unique microsatellite loci in S. haematobium (available at http://www.cebio.org/projetos/schistosoma-haematobium-genome). The three Schistosoma genomes contained similar overall frequencies of microsatellites, but the frequency and length distributions of specific motifs differed among species. We identified 15 primer pairs that amplified consistently and were easily scored. We genotyped these 15 loci in S. haematobium individuals from six locations: Zanzibar had the highest levels of diversity; Malawi, Mauritius, Nigeria, and Senegal were nearly as diverse; but the sample from South Africa was much less diverse. About half of the primers in the database of Schistosoma haematobium microsatellite DNA loci should yield amplifiable and easily scored polymorphic markers, thus providing thousands of potential markers. Sequence conservation among S. haematobium, S. japonicum, and S. mansoni is relatively high, thus it should now be possible to identify markers that are universal among Schistosoma species (i.e., using DNA sequences conserved among species), as well as other markers that are specific to species or species-groups (i.e., using DNA sequences that differ among species). Full genome-sequencing of additional species and specimens of S. haematobium, S. japonicum, and S. mansoni is desirable to better characterize differences within and among these species, to develop additional genetic markers, and to examine genes as well as conserved non-coding elements associated with drug resistance.
A Comparison of Three Random Number Generators for Aircraft Dynamic Modeling Applications

NASA Technical Reports Server (NTRS)

Grauer, Jared A.

2017-01-01

Three random number generators, which produce Gaussian white noise sequences, were compared to assess their suitability in aircraft dynamic modeling applications. The first generator considered was the MATLAB® implementation of the Mersenne-Twister algorithm. The second generator was a website called Random.org, which processes atmospheric noise measured using radios to create the random numbers. The third generator was based on synthesis of the Fourier series, where the random number sequences are constructed from prescribed amplitude and phase spectra. A total of 200 sequences, each having 601 random numbers, for each generator were collected and analyzed in terms of the mean, variance, normality, autocorrelation, and power spectral density. These sequences were then applied to two problems in aircraft dynamic modeling, namely estimating stability and control derivatives from simulated onboard sensor data, and simulating flight in atmospheric turbulence. In general, each random number generator had good performance and is well-suited for aircraft dynamic modeling applications. Specific strengths and weaknesses of each generator are discussed. For Monte Carlo simulation, the Fourier synthesis method is recommended because it most accurately and consistently approximated Gaussian white noise and can be implemented with reasonable computational effort.
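The Fourier synthesis idea can be sketched in a few lines of Python. The report's actual amplitude and phase spectra are not given in the abstract, so this example assumes a flat amplitude spectrum with independent uniform random phases; the function name and defaults are hypothetical. With the amplitude chosen as below, the sample mean is zero and the sample variance is one by construction, up to floating-point error.

```python
import math
import random


def fourier_noise(n_samples=601, rng=None):
    """Build one sequence as a sum of equal-amplitude, random-phase cosines.

    Assumptions: flat amplitude spectrum, uniform random phases, and
    harmonics k = 1 .. (n_samples - 1) // 2 with no constant (DC) term.
    """
    rng = rng or random.Random()
    n_harmonics = (n_samples - 1) // 2
    # Each cosine contributes amplitude**2 / 2 to the variance.
    amplitude = math.sqrt(2.0 / n_harmonics)
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_harmonics)]
    sequence = []
    for n in range(n_samples):
        t = 2.0 * math.pi * n / n_samples
        sequence.append(sum(amplitude * math.cos(k * t + phases[k - 1])
                            for k in range(1, n_harmonics + 1)))
    return sequence


def mean_and_variance(values):
    """Return the sample mean and the population variance."""
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


if __name__ == "__main__":
    print(mean_and_variance(fourier_noise(601, random.Random(0))))
```

Because the second-order statistics are fixed by the prescribed spectrum rather than left to chance, every realization has the same mean and variance, which may be one reason such a generator behaves consistently across Monte Carlo runs.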
Absolute nuclear material assay

DOEpatents

Prasad, Manoj K [Pleasanton, CA; Snyderman, Neal J [Berkeley, CA; Rowland, Mark S [Alamo, CA

2012-05-15

A method of absolute nuclear material assay of an unknown source comprising counting neutrons from the unknown source and providing an absolute nuclear material assay utilizing a model to optimally compare to the measured count distributions. In one embodiment, the step of providing an absolute nuclear material assay comprises utilizing a random sampling of analytically computed fission chain distributions to generate a continuous time-evolving sequence of event-counts by spreading the fission chain distribution in time.
Absolute nuclear material assay

DOEpatents

Prasad, Manoj K [Pleasanton, CA; Snyderman, Neal J [Berkeley, CA; Rowland, Mark S [Alamo, CA

2010-07-13

A method of absolute nuclear material assay of an unknown source comprising counting neutrons from the unknown source and providing an absolute nuclear material assay utilizing a model to optimally compare to the measured count distributions. In one embodiment, the step of providing an absolute nuclear material assay comprises utilizing a random sampling of analytically computed fission chain distributions to generate a continuous time-evolving sequence of event-counts by spreading the fission chain distribution in time.
PuLSE: Quality control and quantification of peptide sequences explored by phage display libraries.

PubMed

Shave, Steven; Mann, Stefan; Koszela, Joanna; Kerr, Alastair; Auer, Manfred

2018-01-01

The design of highly diverse phage display libraries is based on the assumption that DNA bases are incorporated at similar rates within the randomized sequence. As library complexity increases and expected copy numbers of unique sequences decrease, the exploration of library space becomes sparser and the presence of truly random sequences becomes critical. We present the program PuLSE (Phage Library Sequence Evaluation) as a tool for assessing randomness and therefore diversity of phage display libraries. PuLSE runs on a collection of sequence reads in the fastq file format and generates tables profiling the library in terms of unique DNA sequence counts and positions, translated peptide sequences, and normalized 'expected' occurrences from base to residue codon frequencies. The output allows at-a-glance quantitative quality control of a phage library in terms of sequence coverage both at the DNA base and translated protein residue level, which has been missing from toolsets and literature. The open source program PuLSE is available in two formats, a C++ source code package for compilation and integration into existing bioinformatics pipelines and precompiled binaries for ease of use.
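As an illustration of the kind of profile described, the short Python sketch below counts unique sequences and tabulates per-position base frequencies for a set of reads. It is not PuLSE's code or output format; the function names are hypothetical and the reads are assumed to be equal-length strings already extracted from a fastq file.

```python
from collections import Counter


def unique_sequence_counts(reads):
    """Return how many times each distinct read occurs."""
    return Counter(reads)


def position_base_frequencies(reads):
    """Return one dict per position mapping base -> observed fraction.

    Assumption: all reads cover the same randomized region and have
    the same length as the first read.
    """
    length = len(reads[0])
    counts = [Counter() for _ in range(length)]
    for read in reads:
        for i, base in enumerate(read[:length]):
            counts[i][base] += 1
    profile = []
    for counter in counts:
        total = sum(counter.values())
        profile.append({base: n / total for base, n in counter.items()})
    return profile


if __name__ == "__main__":
    reads = ["ACGT", "ACGA", "TCGT", "ACGT"]
    print(unique_sequence_counts(reads))
    print(position_base_frequencies(reads))
```

In a well-randomized library each base would be expected near 25% at every randomized position, so a position dominated by a single base flags biased incorporation.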
Identifying Group-Specific Sequences for Microbial Communities Using Long k-mer Sequence Signatures

PubMed Central

Wang, Ying; Fu, Lei; Ren, Jie; Yu, Zhaoxia; Chen, Ting; Sun, Fengzhu

2018-01-01

Comparing metagenomic samples is crucial for understanding microbial communities. For different groups of microbial communities, such as human gut metagenomic samples from patients with a certain disease and healthy controls, identifying group-specific sequences offers essential information for potential biomarker discovery. A sequence that is present, or rich, in one group, but absent, or scarce, in another group is considered “group-specific” in our study. Our main purpose is to discover group-specific sequence regions between control and case groups as disease-associated markers. We developed a long k-mer (k ≥ 30 bps)-based computational pipeline to detect group-specific sequences at strain resolution free from reference sequences, sequence alignments, and metagenome-wide de novo assembly. We called our method MetaGO: Group-specific oligonucleotide analysis for metagenomic samples. An open-source pipeline on Apache Spark was developed with parallel computing. We applied MetaGO to one simulated and three real metagenomic datasets to evaluate the discriminative capability of identified group-specific markers. In the simulated dataset, 99.11% of group-specific logical 40-mers covered 98.89% disease-specific regions from the disease-associated strain. In addition, 97.90% of group-specific numerical 40-mers covered 99.61 and 96.39% of differentially abundant genome and regions between two groups, respectively. For a large-scale metagenomic liver cirrhosis (LC)-associated dataset, we identified 37,647 group-specific 40-mer features. Any one of the features can predict disease status of the training samples with the average of sensitivity and specificity higher than 0.8. The random forests classification using the top 10 group-specific features yielded a higher AUC (from ∼0.8 to ∼0.9) than that of previous studies. All group-specific 40-mers were present in LC patients, but not healthy controls. 
All the assembled 11 LC-specific sequences can be mapped to two strains of Veillonella parvula: UTDB1-3 and DSM2008. The experiments on the other two real datasets related to Inflammatory Bowel Disease and Type 2 Diabetes in Women consistently demonstrated that MetaGO achieved better prediction accuracy with fewer features compared to previous studies. The experiments showed that MetaGO is a powerful tool for identifying group-specific k-mers, which would be clinically applicable for disease prediction. MetaGO is available at https://github.com/VVsmileyx/MetaGO. PMID:29774017
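The presence/absence ("logical") notion of a group-specific k-mer can be sketched as plain set operations. This Python example is only an illustration under a strict assumption, namely that a k-mer must occur in every case sample and in no control sample; MetaGO itself runs on Apache Spark and also handles abundance-based ("numerical") features, which are not modelled here. All names are hypothetical.

```python
def sample_kmers(reads, k):
    """Return the set of all k-mers found in a sample's reads."""
    found = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            found.add(read[i:i + k])
    return found


def group_specific_kmers(case_samples, control_samples, k):
    """Return k-mers present in every case sample and absent from all controls.

    Assumption: strict presence/absence; real data would need tolerance
    for sequencing errors and uneven coverage.
    """
    case_sets = [sample_kmers(reads, k) for reads in case_samples]
    shared_by_cases = set.intersection(*case_sets)
    seen_in_controls = set()
    for reads in control_samples:
        seen_in_controls |= sample_kmers(reads, k)
    return shared_by_cases - seen_in_controls


if __name__ == "__main__":
    cases = [["ACGTAC"], ["CGTACG"]]
    controls = [["ACGTTT"]]
    # k = 4 keeps the toy example readable; the study used k >= 30.
    print(sorted(group_specific_kmers(cases, controls, 4)))
```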
Improving the performance of minimizers and winnowing schemes.

PubMed

Marçais, Guillaume; Pellow, David; Bork, Daniel; Orenstein, Yaron; Shamir, Ron; Kingsford, Carl

2017-07-15

The minimizers scheme is a method for selecting k-mers from sequences. It is used in many bioinformatics software tools to bin comparable sequences or to sample a sequence in a deterministic fashion at approximately regular intervals, in order to reduce memory consumption and processing time. Although very useful, the minimizers selection procedure has undesirable behaviors (e.g. too many k-mers are selected when processing certain sequences). Some of these problems were already known to the authors of the minimizers technique, and the natural lexicographic ordering of k-mers used by minimizers was recognized as their origin. Many software tools using minimizers employ ad hoc variations of the lexicographic order to alleviate those issues. We provide an in-depth analysis of the effect of k-mer ordering on the performance of the minimizers technique. By using small universal hitting sets (a recently defined concept), we show how to significantly improve the performance of minimizers and avoid some of its worst behaviors. Based on these results, we encourage bioinformatics software developers to use an ordering based on a universal hitting set or, if not possible, a randomized ordering, rather than the lexicographic order. This analysis also settles negatively a conjecture (by Schleimer et al.) on the expected density of minimizers in a random sequence. The software used for this analysis is available on GitHub: https://github.com/gmarcais/minimizers.git. Contact: gmarcais@cs.cmu.edu or carlk@cs.cmu.edu. © The Author 2017. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com
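The role of the k-mer ordering can be seen in a minimal Python sketch of the minimizers scheme: in every window of w consecutive k-mers, the smallest k-mer under a chosen order is selected. The hash-based order below is a stand-in for a randomized ordering and is an assumption of this example; it is not the universal-hitting-set ordering studied in the paper, and the function names are hypothetical.

```python
import hashlib


def lexicographic_order(kmer):
    """Order k-mers alphabetically (the classic default)."""
    return kmer


def hashed_order(kmer):
    """Order k-mers by a hash digest, as a simple pseudo-random ordering."""
    return hashlib.sha256(kmer.encode("ascii")).digest()


def minimizer_positions(seq, k, w, order=lexicographic_order):
    """Return sorted start positions of the k-mers selected as minimizers.

    Each window of w consecutive k-mers contributes its smallest k-mer
    under `order`; ties are broken by taking the leftmost position.
    """
    n_kmers = len(seq) - k + 1
    selected = set()
    for start in range(n_kmers - w + 1):
        best = min(range(start, start + w),
                   key=lambda i: (order(seq[i:i + k]), i))
        selected.add(best)
    return sorted(selected)


def density(seq, k, w, order=lexicographic_order):
    """Return the fraction of k-mer positions that are selected."""
    return len(minimizer_positions(seq, k, w, order)) / (len(seq) - k + 1)


if __name__ == "__main__":
    sequence = "ACGTACGTTTGACCA"
    print(minimizer_positions(sequence, 3, 4))
    print(minimizer_positions(sequence, 3, 4, hashed_order))
```

Density, the fraction of positions selected, is the quantity whose expected value in a random sequence the abstract refers to; a lower density means fewer k-mers stored, while every window is still guaranteed at least one selected k-mer.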

On the existence, uniqueness, and asymptotic normality of a consistent solution of the likelihood equations for nonidentically distributed observations: Applications to missing data problems

NASA Technical Reports Server (NTRS)

Peters, C. (Principal Investigator)

1980-01-01

A general theorem is given which establishes the existence and uniqueness of a consistent solution of the likelihood equations given a sequence of independent random vectors whose distributions are not identical but have the same parameter set. In addition, it is shown that the consistent solution is a MLE and that it is asymptotically normal and efficient. Two applications are discussed: one in which independent observations of a normal random vector have missing components, and the other in which the parameters in a mixture from an exponential family are estimated using independent homogeneous sample blocks of different sizes.
[Methodological quality and reporting quality evaluation of randomized controlled trials published in China Journal of Chinese Materia Medica].

PubMed

Yu, Dan-Dan; Xie, Yan-Ming; Liao, Xing; Zhi, Ying-Jie; Jiang, Jun-Jie; Chen, Wei

2018-02-01

To evaluate the methodological quality and reporting quality of randomized controlled trials (RCTs) published in China Journal of Chinese Materia Medica, we searched CNKI and the China Journal of Chinese Materia Medica webpage to collect RCTs published since the establishment of the journal. The Cochrane risk of bias assessment tool was used to evaluate the methodological quality of RCTs. The CONSORT 2010 checklist was adopted as the tool for evaluating reporting quality. Finally, 184 RCTs were included and evaluated methodologically, of which 97 RCTs were evaluated for reporting quality. For the methodological evaluation, 62 trials (33.70%) reported the random sequence generation; 9 trials (4.89%) reported the allocation concealment; 25 trials (13.59%) adopted the method of blinding; 30 trials (16.30%) reported the number of patients withdrawing, dropping out and those lost to follow-up; 2 trials (1.09%) reported trial registration and none of the trials reported the trial protocol; only 8 trials (4.35%) reported the sample size estimation in detail. For the reporting quality appraisal, 3 of the 25 reporting items were rated as high quality, including abstract, participant eligibility criteria, and statistical methods; 4 reporting items were of medium quality, including purpose, intervention, random sequence method, and data collection of sites and locations; 9 reporting items were of low quality, including title, background, random sequence types, allocation concealment, blinding, recruitment of subjects, baseline data, harms, and funding; the rest of the items were of extremely low quality (compliance rate of reporting item <10%). On the whole, the methodological and reporting quality of RCTs published in the journal is generally low. Further improvement in both methodological and reporting quality for RCTs of traditional Chinese medicine is warranted. It is recommended that the international standards and procedures for RCT design should be strictly followed to conduct high-quality trials. At the same time, in order to improve the reporting quality of randomized controlled trials, CONSORT standards should be adopted in the preparation of research reports and submissions. Copyright © by the Chinese Pharmaceutical Association.
mtDNA sequence diversity of Hazara ethnic group from Pakistan.

PubMed

Rakha, Allah; Fatima; Peng, Min-Sheng; Adan, Atif; Bi, Rui; Yasmin, Memona; Yao, Yong-Gang

2017-09-01

The present study was undertaken to investigate mitochondrial DNA (mtDNA) control region sequences of Hazaras from Pakistan, so as to generate mtDNA reference database for forensic casework in Pakistan and to analyze phylogenetic relationship of this particular ethnic group with geographically proximal populations. Complete mtDNA control region (nt 16024-576) sequences were generated through Sanger Sequencing for 319 Hazara individuals from Quetta, Baluchistan. The population sample set showed a total of 189 distinct haplotypes, belonging mainly to West Eurasian (51.72%), East & Southeast Asian (29.78%) and South Asian (18.50%) haplogroups. Compared with other populations from Pakistan, the Hazara population had a relatively high haplotype diversity (0.9945) and a lower random match probability (0.0085). The dataset has been incorporated into EMPOP database under accession number EMP00680. The data herein comprises the largest, and likely most thoroughly examined, control region mtDNA dataset from Hazaras of Pakistan. Copyright © 2017 Elsevier B.V. All rights reserved.
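The two summary statistics quoted above are related by a simple formula, sketched here in Python. The definitions used are the standard forensic ones (random match probability as the sum of squared haplotype frequencies, and haplotype diversity with the n/(n - 1) small-sample correction); that the authors used exactly these definitions is an assumption, and the function names are hypothetical.

```python
def random_match_probability(counts):
    """Return the sum of squared haplotype frequencies.

    counts holds the number of individuals carrying each distinct haplotype.
    """
    n = sum(counts)
    return sum((c / n) ** 2 for c in counts)


def haplotype_diversity(counts):
    """Return haplotype diversity with the n / (n - 1) correction."""
    n = sum(counts)
    return n / (n - 1) * (1.0 - random_match_probability(counts))


if __name__ == "__main__":
    # Toy sample: four individuals, three distinct haplotypes.
    toy_counts = [2, 1, 1]
    print(random_match_probability(toy_counts))
    print(haplotype_diversity(toy_counts))
    # Consistency check with the reported values (n = 319, RMP = 0.0085).
    print(319 / 318 * (1.0 - 0.0085))
```

Plugging the reported sample size and random match probability into the second formula gives a diversity of about 0.9946, in line with the reported 0.9945 once rounding of the match probability is taken into account.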
Phylogeography of Influenza A(H3N2) Virus in Peru, 2010-2012.

PubMed

Pollett, Simon; Nelson, Martha I; Kasper, Matthew; Tinoco, Yeny; Simons, Mark; Romero, Candice; Silva, Marita; Lin, Xudong; Halpin, Rebecca A; Fedorova, Nadia; Stockwell, Timothy B; Wentworth, David; Holmes, Edward C; Bausch, Daniel G

2015-08-01

It remains unclear whether lineages of influenza A(H3N2) virus can persist in the tropics and seed temperate areas. We used viral gene sequence data sampled from Peru to test this source-sink model for a Latin American country. Viruses were obtained during 2010-2012 from influenza surveillance cohorts in Cusco, Tumbes, Puerto Maldonado, and Lima. Specimens positive for influenza A(H3N2) virus were randomly selected and underwent hemagglutinin sequencing and phylogeographic analyses. Analysis of 389 hemagglutinin sequences from Peru and 2,192 global sequences demonstrated interseasonal extinction of Peruvian lineages. Extensive mixing occurred with global clades, but some spatial structure was observed at all sites; this structure was weakest in Lima and Puerto Maldonado, indicating that these locations may experience greater viral traffic. The broad diversity and co-circulation of many simultaneous lineages of H3N2 virus in Peru suggests that this country should not be overlooked as a potential source for novel pandemic strains.
Phylogeography of Influenza A(H3N2) Virus in Peru, 2010–2012

PubMed Central

Nelson, Martha I.; Kasper, Matthew; Tinoco, Yeny; Simons, Mark; Romero, Candice; Silva, Marita; Lin, Xudong; Halpin, Rebecca A.; Fedorova, Nadia; Stockwell, Timothy B.; Wentworth, David; Holmes, Edward C.; Bausch, Daniel G.

2015-01-01

It remains unclear whether lineages of influenza A(H3N2) virus can persist in the tropics and seed temperate areas. We used viral gene sequence data sampled from Peru to test this source–sink model for a Latin American country. Viruses were obtained during 2010–2012 from influenza surveillance cohorts in Cusco, Tumbes, Puerto Maldonado, and Lima. Specimens positive for influenza A(H3N2) virus were randomly selected and underwent hemagglutinin sequencing and phylogeographic analyses. Analysis of 389 hemagglutinin sequences from Peru and 2,192 global sequences demonstrated interseasonal extinction of Peruvian lineages. Extensive mixing occurred with global clades, but some spatial structure was observed at all sites; this structure was weakest in Lima and Puerto Maldonado, indicating that these locations may experience greater viral traffic. The broad diversity and co-circulation of many simultaneous lineages of H3N2 virus in Peru suggests that this country should not be overlooked as a potential source for novel pandemic strains. PMID:26196599
Use of simulated data sets to evaluate the fidelity of metagenomic processing methods

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mavromatis, K; Ivanova, N; Barry, Kerrie

2007-01-01

Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene-finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
Use of simulated data sets to evaluate the fidelity of Metagenomic processing methods

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mavromatis, Konstantinos; Ivanova, Natalia; Barry, Kerri

2006-12-01

Metagenomics is a rapidly emerging field of research for studying microbial communities. To evaluate methods presently used to process metagenomic sequences, we constructed three simulated data sets of varying complexity by combining sequencing reads randomly selected from 113 isolate genomes. These data sets were designed to model real metagenomes in terms of complexity and phylogenetic composition. We assembled sampled reads using three commonly used genome assemblers (Phrap, Arachne and JAZZ), and predicted genes using two popular gene finding pipelines (fgenesb and CRITICA/GLIMMER). The phylogenetic origins of the assembled contigs were predicted using one sequence similarity-based (blast hit distribution) and two sequence composition-based (PhyloPythia, oligonucleotide frequencies) binning methods. We explored the effects of the simulated community structure and method combinations on the fidelity of each processing step by comparison to the corresponding isolate genomes. The simulated data sets are available online to facilitate standardized benchmarking of tools for metagenomic analysis.
Molecular selection in a unified evolutionary sequence

NASA Technical Reports Server (NTRS)

Fox, S. W.

1986-01-01

With guidance from experiments and observations that indicate internally limited phenomena, an outline of a unified evolutionary sequence is inferred. Such unification is not visible in a context of random matrix and random mutation. The sequence proceeds from the Big Bang through prebiotic matter, protocells, and the evolving cell via molecular and natural selection, to mind, behavior, and society.
Single-cell full-length total RNA sequencing uncovers dynamics of recursive splicing and enhancer RNAs.

PubMed

Hayashi, Tetsutaro; Ozaki, Haruka; Sasagawa, Yohei; Umeda, Mana; Danno, Hiroki; Nikaido, Itoshi

2018-02-12

Total RNA sequencing has been used to reveal poly(A) and non-poly(A) RNA expression, RNA processing and enhancer activity. To date, no method for full-length total RNA sequencing of single cells has been developed despite the potential of this technology for single-cell biology. Here we describe random displacement amplification sequencing (RamDA-seq), the first full-length total RNA-sequencing method for single cells. Compared with other methods, RamDA-seq shows high sensitivity to non-poly(A) RNA and near-complete full-length transcript coverage. Using RamDA-seq with differentiation time course samples of mouse embryonic stem cells, we reveal hundreds of dynamically regulated non-poly(A) transcripts, including histone transcripts and long noncoding RNA Neat1. Moreover, RamDA-seq profiles recursive splicing in >300-kb introns. RamDA-seq also detects enhancer RNAs and their cell type-specific activity in single cells. Taken together, we demonstrate that RamDA-seq could help investigate the dynamics of gene expression, RNA-processing events and transcriptional regulation in single cells.
Variations on a theme of Lander and Waterman

DOE Office of Scientific and Technical Information (OSTI.GOV)

Speed, T.

1997-12-01

The original Lander and Waterman mathematical analysis was for fingerprinting random clones. Since that time, a number of variants of their theory have appeared, including ones that apply to mapping by anchoring random clones, and to non-random or directed clone mapping. The same theory is now widely used to devise random sequencing strategies. In this talk I will review these developments, and go on to discuss the theory required for directed sequencing strategies.
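The basic quantities from the Lander and Waterman analysis of random clones can be sketched numerically as below. This is an illustrative calculation only, using the standard expressions for redundancy, covered fraction, and expected number of islands; the function name, parameter names, and example numbers are assumptions and are not taken from the talk.

```python
import math

def lander_waterman(genome_len, clone_len, n_clones, theta=0.0):
    """Expected coverage statistics for randomly placed clones.

    theta is the fraction of a clone that must overlap a neighbour for
    the overlap to be detected (0 means any overlap is detected).
    """
    c = n_clones * clone_len / genome_len   # redundancy (fold coverage)
    sigma = 1.0 - theta
    return {
        "coverage": c,
        # fraction of the genome covered by at least one clone
        "fraction_covered": 1.0 - math.exp(-c),
        # expected number of islands (contigs plus isolated clones)
        "expected_islands": n_clones * math.exp(-c * sigma),
    }

# Hypothetical project: 1 Mb target, 1 kb clones, 5,000 clones (5-fold).
stats = lander_waterman(genome_len=1_000_000, clone_len=1_000, n_clones=5_000)
```

With these example numbers the redundancy is 5, so roughly 99% of the target is expected to be covered, in a few dozen islands.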
A DNA-Based Procedure for In Planta Detection of Fusarium oxysporum f. sp. phaseoli.

PubMed

Alves-Santos, Fernando M; Ramos, Brisa; García-Sánchez, M Asunción; Eslava, Arturo P; Díaz-Mínguez, José María

2002-03-01

We have characterized, by random amplified polymorphic DNA (RAPD) analysis, strains of Fusarium oxysporum from common bean fields in Spain that were nonpathogenic on common bean, as well as F. oxysporum strains (F. oxysporum f. sp. phaseoli) pathogenic to common bean. We identified a RAPD marker (RAPD 4.12) specific for the highly virulent pathogenic strains of the seven races of F. oxysporum f. sp. phaseoli. Sequence analysis of RAPD 4.12 allowed the design of oligonucleotides that amplify a 609-bp sequence characterized amplified region (SCAR) marker (SCAR-B310A280). Under controlled environmental and greenhouse conditions, detection of the pathogen by polymerase chain reaction was 100% successful in root samples of infected but still symptomless plants and in stem samples of plants with disease severity of ≥4 on the Centro Internacional de Agricultura Tropical (CIAT; Cali, Colombia) scale. The diagnostic procedure can be completed in 5 h and allows the detection of all known races of the pathogen in plant samples at early stages of the disease with no visible symptoms.
Movie denoising by average of warped lines.

PubMed

Bertalmío, Marcelo; Caselles, Vicent; Pardo, Alvaro

2007-09-01

Here, we present an efficient method for movie denoising that does not require any motion estimation. The method is based on the well-known fact that averaging several realizations of a random variable reduces the variance. For each pixel to be denoised, we look for close similar samples along the level surface passing through it. With these similar samples, we estimate the denoised pixel. Close similar samples are found by warping lines in spatiotemporal neighborhoods. To that end, we present an algorithm based on a method for epipolar line matching in stereo pairs which has per-line complexity O(N), where N is the number of columns in the image. In this way, when applied to the image sequence, our algorithm is computationally efficient, having a complexity of the order of the total number of pixels. Furthermore, we show that the presented method is unsupervised and is adapted to denoise image sequences with an additive white noise while respecting the visual details on the movie frames. We have also experimented with other types of noise with satisfactory results.
[Detection and Analysis of Human Parainfluenza Virus Infection in Hospitalized Adults with Acute Respiratory Tract Infections].

PubMed

Li, Xing-Qiao; Liu, Xue-Wei; Zhou, Tao; Pei, Xiao-Fang

2017-11-01

To investigate the prevalence and gene characteristics of different groups of human parainfluenza virus (HPIV) infection in hospitalized adults with acute respiratory tract infections (ARI), RT-PCR was used to detect HPIV hemagglutinin (HA) DNA, which was extracted from sputum samples of 1,039 adult patients with ARI from March 2014 to June 2016. The HA genes amplified from randomly selected positive samples were sequenced to analyze homology and variation. Of these samples, 10.6% (110/1,039) were positive for HPIV, including 8 cases of HPIV-1, 22 cases of HPIV-2, 46 cases of HPIV-3 and 34 cases of HPIV-4. The detection rate varied among HPIV groups according to season and patient age. No significant differences were found between the positive samples and the reference sequences. Compared with reference strains from different regions, the strains tested in this study had the smallest nucleotide genetic distance to reference strains from other provinces and cities in China. In the Chengdu region, HPIV is frequently detected in ARI; all subtypes were detected, with HPIV-3 being the main subtype.
pSuc-Lys: Predict lysine succinylation sites in proteins with PseAAC and ensemble random forest approach.

PubMed

Jia, Jianhua; Liu, Zi; Xiao, Xuan; Liu, Bingxiang; Chou, Kuo-Chen

2016-04-07

Being one type of post-translational modifications (PTMs), protein lysine succinylation is important in regulating varieties of biological processes. It is also involved with some diseases, however. Consequently, from the angles of both basic research and drug development, we are facing a challenging problem: for an uncharacterized protein sequence having many Lys residues therein, which ones can be succinylated, and which ones cannot? To address this problem, we have developed a predictor called pSuc-Lys through (1) incorporating the sequence-coupled information into the general pseudo amino acid composition, (2) balancing out skewed training dataset by random sampling, and (3) constructing an ensemble predictor by fusing a series of individual random forest classifiers. Rigorous cross-validations indicated that it remarkably outperformed the existing methods. A user-friendly web-server for pSuc-Lys has been established at http://www.jci-bioinfo.cn/pSuc-Lys, by which users can easily obtain their desired results without the need to go through the complicated mathematical equations involved. It has not escaped our notice that the formulation and approach presented here can also be used to analyze many other problems in computational proteomics. Copyright © 2016 Elsevier Ltd. All rights reserved.
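Steps (2) and (3) of this abstract, balancing a skewed training set by random sampling and building an ensemble, might be prepared along the following lines. This is only a sketch under stated assumptions: the abstract does not give the sampling scheme, so drawing one random majority-class subset per ensemble member is an assumption, and `balanced_subsets` and the toy site labels are hypothetical names.

```python
import random

def balanced_subsets(minority, majority, n_subsets, seed=0):
    """Build n_subsets balanced training sets by random undersampling.

    Each subset keeps every minority-class example and pairs it with an
    equally sized random draw (without replacement) from the majority
    class. Requires len(majority) >= len(minority).
    """
    rng = random.Random(seed)
    subsets = []
    for _ in range(n_subsets):
        drawn = rng.sample(majority, len(minority))
        subsets.append((list(minority), drawn))
    return subsets

# Hypothetical toy data: 3 succinylated sites versus 20 non-succinylated sites.
positives = ["p1", "p2", "p3"]
negatives = [f"n{i}" for i in range(20)]
subsets = balanced_subsets(positives, negatives, n_subsets=4)
```

Each balanced subset could then train one classifier, with the individual predictions fused by voting.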
DNA barcode analysis: a comparison of phylogenetic and statistical classification methods.

PubMed

Austerlitz, Frederic; David, Olivier; Schaeffer, Brigitte; Bleakley, Kevin; Olteanu, Madalina; Leblois, Raphael; Veuille, Michel; Laredo, Catherine

2009-11-10

DNA barcoding aims to assign individuals to given species according to their sequence at a small locus, generally part of the CO1 mitochondrial gene. Amongst other issues, this raises the question of how to deal with within-species genetic variability and potential transpecific polymorphism. In this context, we examine several assignation methods belonging to two main categories: (i) phylogenetic methods (neighbour-joining and PhyML) that attempt to account for the genealogical framework of DNA evolution and (ii) supervised classification methods (k-nearest neighbour, CART, random forest and kernel methods). These methods range from basic to elaborate. We investigated the ability of each method to correctly classify query sequences drawn from samples of related species using both simulated and real data. Simulated data sets were generated using coalescent simulations in which we varied the genealogical history, mutation parameter, sample size and number of species. No method was found to be the best in all cases. The simplest method of all, "one nearest neighbour", was found to be the most reliable with respect to changes in the parameters of the data sets. The parameter most influencing the performance of the various methods was molecular diversity of the data. Addition of genetically independent loci--nuclear genes--improved the predictive performance of most methods. The study implies that taxonomists can influence the quality of their analyses either by choosing a method best-adapted to the configuration of their sample, or, given a certain method, increasing the sample size or altering the amount of molecular diversity. This can be achieved either by sequencing more mtDNA or by sequencing additional nuclear genes. In the latter case, they may also have to modify their data analysis method.
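The "one nearest neighbour" assignment described in this abstract can be illustrated with a few lines of code. This is a minimal sketch, assuming aligned sequences of equal length and a simple mismatch count as the distance; the abstract does not specify the distance used, and the reference sequences and species labels are invented for illustration.

```python
def hamming(a, b):
    """Number of mismatched positions between two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))

def one_nearest_neighbour(query, references):
    """Return the species label of the closest reference sequence.

    references is a list of (sequence, species) pairs; ties go to the
    first closest reference in the list.
    """
    _, best_species = min(references, key=lambda r: hamming(query, r[0]))
    return best_species

# Hypothetical reference barcodes for two species.
refs = [
    ("ACGTACGT", "species_A"),
    ("ACGTTCGT", "species_A"),
    ("TTGTACCA", "species_B"),
    ("TTGAACCA", "species_B"),
]
label = one_nearest_neighbour("TTGTACGA", refs)
```

Here the query differs from its closest reference at a single position, so it is assigned to that reference's species.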
High-density, microsphere-based fiber optic DNA microarrays.

PubMed

Epstein, Jason R; Leung, Amy P K; Lee, Kyong Hoon; Walt, David R

2003-05-01

A high-density fiber optic DNA microarray has been developed consisting of oligonucleotide-functionalized, 3.1-µm-diameter microspheres randomly distributed on the etched face of an imaging fiber bundle. The fiber bundles are comprised of 6000-50000 fused optical fibers and each fiber terminates with an etched well. The microwell array is capable of housing complementary-sized microspheres, each containing thousands of copies of a unique oligonucleotide probe sequence. The array fabrication process results in random microsphere placement. Determining the position of microspheres in the random array requires an optical encoding scheme. This array platform provides many advantages over other array formats. The microsphere-stock suspension concentration added to the etched fiber can be controlled to provide inherent sensor redundancy. Examining identical microspheres has a beneficial effect on the signal-to-noise ratio. As other sequences of interest are discovered, new microsphere sensing elements can be added to existing microsphere pools and new arrays can be fabricated incorporating the new sequences without altering the existing detection capabilities. These microarrays contain the smallest feature sizes (3 µm) of any DNA array, allowing interrogation of extremely small sample volumes. Reducing the feature size results in higher local target molecule concentrations, creating rapid and highly sensitive assays. The microsphere array platform is also flexible in its applications; research has included DNA-protein interaction profiles, microbial strain differentiation, and non-labeled target interrogation with molecular beacons. Fiber optic microsphere-based DNA microarrays have a simple fabrication protocol enabling their expansion into other applications, such as single cell-based assays.
Rényi continuous entropy of DNA sequences.

PubMed

Vinga, Susana; Almeida, Jonas S

2004-12-07

Entropy measures of DNA sequences estimate their randomness or, inversely, their repeatability. L-block Shannon discrete entropy accounts for the empirical distribution of all length-L words and has convergence problems for finite sequences. A new entropy measure that extends Shannon's formalism is proposed. Rényi's quadratic entropy, calculated with the Parzen window density estimation method applied to CGR/USM continuous maps of DNA sequences, constitutes a novel technique to evaluate global sequence randomness without some of the drawbacks of the former method. The asymptotic behaviour of this new measure was analytically deduced and the calculation of entropies for several synthetic and experimental biological sequences was performed. The results obtained were compared with the distributions of the null model of randomness obtained by simulation. The biological sequences have shown a different p-value according to the kernel resolution of Parzen's method, which might indicate an unknown level of organization of their patterns. This new technique can be very useful in the study of DNA sequence complexity and provide additional tools for DNA entropy estimation. The main MATLAB applications developed and additional material are available at the webpage . Specialized functions can be obtained from the authors.
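The idea of combining a chaos game representation (CGR) with a Parzen-window estimate of Rényi's quadratic entropy can be sketched as follows. This is an illustrative reconstruction, not the authors' MATLAB code: it assumes the common CGR corner assignment, an isotropic Gaussian kernel of width `sigma`, and no correction for the unit-square boundary, all of which are assumptions.

```python
import math

# Assumed CGR corner assignment on the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Map a DNA sequence to CGR coordinates, starting from the centre."""
    x, y = 0.5, 0.5
    points = []
    for base in seq:
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0   # move halfway to the corner
        points.append((x, y))
    return points

def renyi_quadratic_entropy(points, sigma):
    """H2 = -ln( (1/N^2) * sum_ij G(x_i - x_j; 2*sigma^2) ).

    The integral of the squared Parzen density reduces to pairwise
    Gaussian kernels whose variance is twice the kernel variance.
    """
    n = len(points)
    var = 2.0 * sigma ** 2
    norm = 1.0 / (2.0 * math.pi * var)          # 2-D Gaussian normalisation
    total = 0.0
    for x1, y1 in points:
        for x2, y2 in points:
            d2 = (x1 - x2) ** 2 + (y1 - y2) ** 2
            total += norm * math.exp(-d2 / (2.0 * var))
    return -math.log(total / n ** 2)

h_repeat = renyi_quadratic_entropy(cgr_points("A" * 32), sigma=0.1)
h_mixed = renyi_quadratic_entropy(cgr_points("ACGT" * 8), sigma=0.1)
```

In this sketch the homopolymer collapses onto one corner of the map and scores a lower entropy than the four-letter repeat; `sigma` plays the role of the kernel resolution mentioned in the abstract.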
Prevalence and phylogenetic analysis of hepatitis E virus in pigs, wild boars, roe deer, red deer and moose in Lithuania.

PubMed

Spancerniene, Ugne; Grigas, Juozas; Buitkuviene, Jurate; Zymantiene, Judita; Juozaitiene, Vida; Stankeviciute, Milda; Razukevicius, Dainius; Zienius, Dainius; Stankevicius, Arunas

2018-02-23

Hepatitis E virus (HEV) is one of the major causes of acute viral hepatitis worldwide. In Europe, food-borne zoonotic transmission of HEV genotype 3 has been associated with domestic pigs and wild boar. Controversial data are available on the circulation of the virus in animals that are used for human consumption, and to date, no gold standard has yet been defined for the diagnosis of HEV-associated hepatitis. To investigate the current HEV infection status in Lithuanian pigs and wild ungulates, the presence of viral RNA was analyzed by nested reverse transcription polymerase chain reaction (RT-nPCR) in randomly selected samples, and the viral RNA was subsequently genotyped. In total, 32.98 and 22.55% of the domestic pig samples were HEV-positive using RT-nPCR targeting the ORF1 and ORF2 fragments, respectively. Among ungulates, 25.94% of the wild boar samples, 22.58% of the roe deer samples, 6.67% of the red deer samples and 7.69% of the moose samples were positive for HEV RNA using primers targeting the ORF1 fragment. Using primers targeting the ORF2 fragment of the HEV genome, viral RNA was only detected in 17.03% of the wild boar samples and 12.90% of the roe deer samples. Phylogenetic analysis based on a 348-nucleotide-long region of the HEV ORF2 showed that all obtained sequences detected in Lithuanian domestic pigs and wildlife belonged to genotype 3. In this study, the sequences identified from pigs, wild boars and roe deer clustered within the 3i subtype reference sequences from the GenBank database. The sequences obtained from pig farms located in two different counties of Lithuania were of the HEV 3f subtype. The wild boar sequences clustered within subtypes 3i and 3h, clearly indicating that wild boars can harbor additional subtypes of HEV. For the first time, the ORF2 nucleotide sequences obtained from roe deer proved that HEV subtype 3i can be found in a novel host. 
The results of the viral prevalence and phylogenetic analyses clearly demonstrated viral infection in Lithuanian pigs and wild ungulates, thus highlighting a significant concern for zoonotic virus transmission through both the food chain and direct contact with animals. Unexpected HEV genotype 3 subtype diversity in Lithuania and neighboring countries revealed that further studies are necessary to understand the mode of HEV transmission between animals and humans in the Baltic States region.
EMPOP-quality mtDNA control region sequences from Kashmiri of Azad Jammu & Kashmir, Pakistan.

PubMed

Rakha, Allah; Peng, Min-Sheng; Bi, Rui; Song, Jiao-Jiao; Salahudin, Zeenat; Adan, Atif; Israr, Muhammad; Yao, Yong-Gang

2016-11-01

The mitochondrial DNA (mtDNA) control region (nucleotide position 16024-576) sequences were generated by Sanger sequencing for 317 self-identified Kashmiris from all districts of Azad Jammu & Kashmir, Pakistan. The population sample set showed a total of 251 haplotypes, with a relatively high haplotype diversity (0.9977) and a low random match probability (0.54%). The matrilineal lineages it contained belonged to three different phylogeographic origins: Western Eurasian (48.9%), South Asian (47.0%) and East Asian (4.1%). The data from the present study were compared with previous data from Pakistan and other worldwide populations (Central Asia, Western Asia, and East & Southeast Asia). The dataset is made available through EMPOP under accession number EMP00679 and will serve as an mtDNA reference database in forensic casework in Pakistan. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Staphylococcus nepalensis in the guano of bats (Mammalia).

PubMed

Vandžurová, A; Bačkor, P; Javorský, P; Pristaš, P

2013-05-31

Thirty randomly selected mesophilic isolates from a six-year-old guano sample from a mixed Myotis myotis and M. blythii summer roost colony were isolated and identified as Staphylococcus nepalensis using MALDI TOF analysis. 16S rRNA gene sequencing of five selected isolates and subsequent phylogenetic analysis confirmed that all sequences showed the highest similarity to S. nepalensis sequences. Several virulence factors were produced by the tested isolates, mainly capsule formation and resistance to the antibiotics tetracycline, ampicillin, gentamycin, and chloramphenicol. Our experiments show that the majority of cultivable mesophilic bacteria from the guano of bats belong to the S. nepalensis species. This is the first report on the occurrence of this species in the guano of bats, and our results indicate that guano accumulated near or directly in human dwellings and buildings may represent a significant risk for human health. Copyright © 2013 Elsevier B.V. All rights reserved.

Supersampling multiframe blind deconvolution resolution enhancement of adaptive-optics-compensated imagery of LEO satellites

NASA Astrophysics Data System (ADS)

Gerwe, David R.; Lee, David J.; Barchers, Jeffrey D.

2000-10-01

A post-processing methodology for reconstructing undersampled image sequences with randomly varying blur is described which can provide image enhancement beyond the sampling resolution of the sensor. This method is demonstrated on simulated imagery and on adaptive optics compensated imagery taken by the Starfire Optical Range 3.5 meter telescope that has been artificially undersampled. Also shown are the results of multiframe blind deconvolution of some of the highest quality optical imagery of low earth orbit satellites collected with a ground based telescope to date. The algorithm used is a generalization of multiframe blind deconvolution techniques which includes a representation of spatial sampling by the focal plane array elements in the forward stochastic model of the imaging system. This generalization enables the random shifts and shape of the adaptive compensated PSF to be used to partially eliminate the aliasing effects associated with sub-Nyquist sampling of the image by the focal plane array. The method could be used to reduce resolution loss which occurs when imaging in wide FOV modes.
Supersampling multiframe blind deconvolution resolution enhancement of adaptive optics compensated imagery of low earth orbit satellites

NASA Astrophysics Data System (ADS)

Gerwe, David R.; Lee, David J.; Barchers, Jeffrey D.

2002-09-01

We describe a postprocessing methodology for reconstructing undersampled image sequences with randomly varying blur that can provide image enhancement beyond the sampling resolution of the sensor. This method is demonstrated on simulated imagery and on adaptive-optics-(AO)-compensated imagery taken by the Starfire Optical Range 3.5-m telescope that has been artificially undersampled. Also shown are the results of multiframe blind deconvolution of some of the highest quality optical imagery of low earth orbit satellites collected with a ground-based telescope to date. The algorithm used is a generalization of multiframe blind deconvolution techniques that include a representation of spatial sampling by the focal plane array elements based on a forward stochastic model. This generalization enables the random shifts and shape of the AO-compensated point spread function (PSF) to be used to partially eliminate the aliasing effects associated with sub-Nyquist sampling of the image by the focal plane array. The method could be used to reduce resolution loss that occurs when imaging in wide-field-of-view (FOV) modes.
The human clone ST22 SCCmec IV methicillin-resistant Staphylococcus aureus isolated from swine herds and wild primates in Nepal: is man the common source?

PubMed

Roberts, Marilyn C; Joshi, Prabhu Raj; Greninger, Alexander L; Melendez, Daira; Paudel, Saroj; Acharya, Mahesh; Bimali, Nabin Kishor; Koju, Narayan P; No, David; Chalise, Mukesh; Kyes, Randall C

2018-05-01

Swine nasal samples [n = 282] were collected from healthy animals on 12 randomly selected farms around Kathmandu, Nepal. In addition, wild monkey (Macaca mulatta) saliva samples [n = 59] were collected near temple areas in Kathmandu using a non-invasive sampling technique. All samples were processed for MRSA using standardized selective media and conventional biochemical tests. MRSA was verified and isolates were characterized by SCCmec typing, multilocus sequence typing, whole genome sequencing [WGS] and antibiotic susceptibility testing. Six (2.1%) swine MRSA were isolated from five of the different swine herds tested; five were ST22 type IV and one was ST88 type V. Four (6.8%) macaque MRSA were isolated, with three ST22 SCCmec type IV and one ST239 type III. WGS showed that the eight ciprofloxacin-resistant ST22 isolates carried a gyrA mutation [S84L]. Six isolates carried the erm(C) genes, five isolates carried aacC-aphD genes and four isolates carried blaZ genes. The swine linezolid-resistant ST22 did not carry any known acquired linezolid resistance genes but had a mutation in ribosomal protein L22 [A29V] and an insertion in L4 [68KG69], both previously associated with linezolid resistance. Multiple virulence factors were also identified. This is the first time MRSA ST22 SCCmec IV has been isolated from livestock or primates.
Conservation of Shannon's redundancy for proteins. [information theory applied to amino acid sequences]

NASA Technical Reports Server (NTRS)

Gatlin, L. L.

1974-01-01

Concepts of information theory are applied to examine various proteins in terms of their redundancy in natural originators such as animals and plants. The Monte Carlo method is used to derive information parameters for random protein sequences. Real protein sequence parameters are compared with the standard parameters of protein sequences having a specific length. The tendency of a chain to contain some amino acids more frequently than others and the tendency of a chain to contain certain amino acid pairs more frequently than other pairs are used as randomness measures of individual protein sequences. Non-periodic proteins are generally found to have random Shannon redundancies except in cases of constraints due to short chain length and genetic codes. Redundant characteristics of highly periodic proteins are discussed. A degree of periodicity parameter is derived.
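The Monte Carlo baseline for random protein sequences of a specific length can be illustrated with a first-order measure only. This is a rough sketch, assuming the divergence from equiprobability D1 = log2(20) - H1 as the randomness measure and uniformly drawn residues; the pair-based measure mentioned in the abstract is not computed here, and all names and parameter values are hypothetical.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def d1(seq, alphabet=AMINO_ACIDS):
    """Divergence from equiprobability: log2(alphabet size) - H1."""
    n = len(seq)
    h1 = -sum((c / n) * math.log2(c / n) for c in Counter(seq).values())
    return math.log2(len(alphabet)) - h1

def monte_carlo_d1(length, trials=50, seed=0):
    """Mean D1 over random sequences with uniformly drawn residues."""
    rng = random.Random(seed)
    values = []
    for _ in range(trials):
        seq = "".join(rng.choice(AMINO_ACIDS) for _ in range(length))
        values.append(d1(seq))
    return sum(values) / trials

short_chain = monte_carlo_d1(50)    # baseline for a 50-residue chain
long_chain = monte_carlo_d1(500)    # baseline for a 500-residue chain
```

Even fully random short chains show a nonzero apparent D1, which is one way to see why a real sequence should be compared with a random standard of the same length.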
Absolute nuclear material assay using count distribution (LAMBDA) space

DOE Office of Scientific and Technical Information (OSTI.GOV)

Prasad, Manoj K.; Snyderman, Neal J.; Rowland, Mark S.

A method of absolute nuclear material assay of an unknown source comprising counting neutrons from the unknown source and providing an absolute nuclear material assay utilizing a model to optimally compare to the measured count distributions. In one embodiment, the step of providing an absolute nuclear material assay comprises utilizing a random sampling of analytically computed fission chain distributions to generate a continuous time-evolving sequence of event-counts by spreading the fission chain distribution in time.
Absolute nuclear material assay using count distribution (LAMBDA) space

DOEpatents

Prasad, Manoj K [Pleasanton, CA]; Snyderman, Neal J [Berkeley, CA]; Rowland, Mark S [Alamo, CA]

2012-06-05

A method of absolute nuclear material assay of an unknown source comprising counting neutrons from the unknown source and providing an absolute nuclear material assay utilizing a model to optimally compare to the measured count distributions. In one embodiment, the step of providing an absolute nuclear material assay comprises utilizing a random sampling of analytically computed fission chain distributions to generate a continuous time-evolving sequence of event-counts by spreading the fission chain distribution in time.
Prevalence of K13-propeller gene polymorphisms among Plasmodium falciparum parasites isolated from adult symptomatic patients in northern Uganda.

PubMed

Ocan, Moses; Bwanga, Freddie; Okeng, Alfred; Katabazi, Fred; Kigozi, Edgar; Kyobe, Samuel; Ogwal-Okeng, Jasper; Obua, Celestino

2016-08-19

In the absence of an effective vaccine, malaria treatment and eradication is still a challenge in most endemic areas globally. This is especially the case with the currently reported emergence of resistance to artemisinin agents in Southeast Asia. This study therefore explored the prevalence of K13-propeller gene polymorphisms among Plasmodium falciparum parasites in northern Uganda. Adult patients (≥18 years) presenting to the out-patient departments of Lira and Gulu regional referral hospitals in northern Uganda were randomly recruited. Laboratory investigation for the presence of Plasmodium infection among patients was done using a Plasmodium falciparum-exclusive rapid diagnostic test, histidine rich protein-2 (HRP2) (Pf). Finger prick capillary blood from patients with a positive malaria test was spotted on Whatman no. 903 filter paper. The parasite DNA was extracted using the chelex resin method and sequenced for mutations in the K13-propeller gene using Sanger sequencing. PCR DNA sequence products were analyzed using DnaSP 5.10.01 software, and data were further processed in an Excel 2007 spreadsheet. A total of 60 parasite DNA samples were sequenced. Polymorphisms in the K13-propeller gene were detected in four (4) of the 60 parasite DNA samples sequenced. A non-synonymous polymorphism at codon 533 previously detected in Cambodia was found in the parasite DNA samples analyzed. Polymorphisms at codon 522 (non-synonymous) and codon 509 (synonymous) were also found in the samples analyzed. The study found evidence of positive selection in the Plasmodium falciparum population in northern Uganda (Tajima's D = -1.83205; Fu and Li's D = -1.82458). A polymorphism in the K13-propeller gene previously reported in Cambodia has been found in the Ugandan Plasmodium falciparum parasites. There is a need for continuous surveillance for artemisinin resistance gene markers in the country.
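The neutrality statistic quoted in this abstract, Tajima's D, can be sketched from a set of aligned sequences as follows. This is a minimal illustration of the standard published formula and not the DnaSP implementation; it assumes gap-free aligned sequences of equal length, and the toy alignment is hypothetical.

```python
import math
from itertools import combinations

def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(seqs)
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites
    s = sum(1 for i in range(length) if len({q[i] for q in seqs}) > 1)
    if s == 0:
        raise ValueError("no segregating sites")
    # pi: mean number of pairwise differences
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    # Constants from Tajima's variance derivation
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))

# Hypothetical alignment in which every variant is a singleton.
toy = ["AAAA", "TAAA", "ATAA", "AATA"]
d_value = tajimas_d(toy)
```

A negative value, as in this toy alignment, reflects an excess of rare variants relative to the neutral expectation; such a signal is commonly read as selection but can also arise from population expansion.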
Population and performance analyses of four major populations with Illumina's FGx Forensic Genomics System.

PubMed

Churchill, Jennifer D; Novroski, Nicole M M; King, Jonathan L; Seah, Lay Hong; Budowle, Bruce

2017-09-01

The MiSeq FGx Forensic Genomics System (Illumina) enables amplification and massively parallel sequencing of 59 STRs, 94 identity informative SNPs, 54 ancestry informative SNPs, and 24 phenotypic informative SNPs. Allele frequency and population statistics data were generated for the 172 SNP loci included in this panel on four major population groups (Chinese, African Americans, US Caucasians, and Southwest Hispanics). Single-locus and combined random match probability values were generated for the identity informative SNPs. The average combined STR and identity informative SNP random match probabilities (assuming independence) across all four populations were 1.75E-67 and 2.30E-71 with length-based and sequence-based STR alleles, respectively. Ancestry and phenotype predictions were obtained using the ForenSeq™ Universal Analysis System (UAS; Illumina) based on the ancestry informative and phenotype informative SNP profiles generated for each sample. Additionally, performance metrics, including profile completeness, read depth, relative locus performance, and allele coverage ratios, were evaluated and detailed for the 725 samples included in this study. While some genetic markers included in this panel performed notably better than others, performance across populations was generally consistent. The performance and population data included in this study support that accurate and reliable profiles were generated and provide valuable background information for laboratories considering internal validation studies and implementation. Copyright © 2017 Elsevier B.V. All rights reserved.
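The abstract reports combined random match probabilities "assuming independence", which means the per-locus values are multiplied together. The following is a minimal sketch of that arithmetic, not the authors' pipeline. The Hardy-Weinberg genotype model, the function names and the allele frequencies are all assumptions made for illustration.

```python
import math

def snp_match_probability(p):
    """Probability that two unrelated people share a genotype at a
    biallelic SNP with allele frequency p. Assumes Hardy-Weinberg
    proportions (an assumption of this sketch, not stated in the abstract)."""
    q = 1.0 - p
    genotype_freqs = (p * p, 2.0 * p * q, q * q)
    return sum(g * g for g in genotype_freqs)

def combined_log10_rmp(single_locus_rmps):
    """Under independence the combined value is the product of the
    per-locus values; summing log10 values keeps tiny products readable."""
    return sum(math.log10(r) for r in single_locus_rmps)

# Hypothetical allele frequencies, for illustration only.
freqs = [0.5, 0.3, 0.1, 0.45]
per_locus = [snp_match_probability(p) for p in freqs]
log10_rmp = combined_log10_rmp(per_locus)
print(f"combined RMP = 10^{log10_rmp:.2f}")
```

Working in log10 units makes it easy to compare a result with figures quoted in scientific notation, such as the 1.75E-67 reported above.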
Occurrence and characterization of livestock-associated methicillin-resistant Staphylococcus aureus in pig industries of northern Thailand.

PubMed

Patchanee, Prapas; Tadee, Pakpoom; Arjkumpa, Orapun; Love, David; Chanachai, Karoon; Alter, Thomas; Hinjoy, Soawapak; Tharavichitkul, Prasit

2014-12-01

This study was conducted to determine the prevalence of livestock-associated methicillin-resistant Staphylococcus aureus (LA-MRSA) in pigs, farm workers, and the environment in northern Thailand, and to assess LA-MRSA isolate phenotypic characteristics. One hundred and four pig farms were randomly selected from the 21,152 in Chiang Mai and Lamphun provinces in 2012. Nasal and skin swab samples were collected from pigs and farm workers. Environmental swabs (pig stable floor, faucet, and feeder) were also collected. MRSA was identified by conventional bacterial culture technique, with results confirmed by multiplex PCR and multi locus sequence typing (MLST). Herd prevalence of MRSA was 9.61% (10 of 104 farms). Among pigs, workers, and farm environments, prevalence was 0.68% (two of 292 samples), 2.53% (seven of 276 samples), and 1.28% (four of 312 samples), respectively. Thirteen MRSA isolates (seven from workers, four from environmental samples, and two from pigs) were identified as Staphylococcal chromosomal cassette mec IV sequences type 9. Antimicrobial sensitivity tests found 100% of the MRSA isolates resistant to clindamycin, oxytetracycline, and tetracycline, while 100% were susceptible to cloxacillin and vancomycin. All possessed a multidrug-resistant phenotype. This is the first evidence of an LA-MRSA interrelationship among pigs, workers, and the farm environment in Thailand.
Occurrence and characterization of livestock-associated methicillin-resistant Staphylococcus aureus in pig industries of northern Thailand

PubMed Central

Tadee, Pakpoom; Arjkumpa, Orapun; Love, David; Chanachai, Karoon; Alter, Thomas; Hinjoy, Soawapak; Tharavichitkul, Prasit

2014-01-01

This study was conducted to determine the prevalence of livestock-associated methicillin-resistant Staphylococcus aureus (LA-MRSA) in pigs, farm workers, and the environment in northern Thailand, and to assess LA-MRSA isolate phenotypic characteristics. One hundred and four pig farms were randomly selected from the 21,152 in Chiang Mai and Lamphun provinces in 2012. Nasal and skin swab samples were collected from pigs and farm workers. Environmental swabs (pig stable floor, faucet, and feeder) were also collected. MRSA was identified by conventional bacterial culture technique, with results confirmed by multiplex PCR and multi locus sequence typing (MLST). Herd prevalence of MRSA was 9.61% (10 of 104 farms). Among pigs, workers, and farm environments, prevalence was 0.68% (two of 292 samples), 2.53% (seven of 276 samples), and 1.28% (four of 312 samples), respectively. Thirteen MRSA isolates (seven from workers, four from environmental samples, and two from pigs) were identified as Staphylococcal chromosomal cassette mec IV sequences type 9. Antimicrobial sensitivity tests found 100% of the MRSA isolates resistant to clindamycin, oxytetracycline, and tetracycline, while 100% were susceptible to cloxacillin and vancomycin. All possessed a multidrug-resistant phenotype. This is the first evidence of an LA-MRSA interrelationship among pigs, workers, and the farm environment in Thailand. PMID:25530702
Assessment of antibody library diversity through next generation sequencing and technical error compensation

PubMed Central

Lisi, Simonetta; Chirichella, Michele; Arisi, Ivan; Goracci, Martina; Cremisi, Federico; Cattaneo, Antonino

2017-01-01

Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody of sufficiently high affinity against a given antigen. Quantitative evaluation of antibody library complexity and quality has long been inadequately addressed, due to the high similarity and length of the sequences of the library. Complexity was usually inferred from the transformation efficiency and tested by fingerprinting and/or sequencing of a few hundred random library elements. Inferring complexity from such a small sampling is, however, very rudimentary and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the assessment of antibody library complexity and quality. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, here we show a new, PCR-free, NGS approach to sequence antibody libraries on the Illumina platform, coupled to a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that allows reliable estimation of the complexity, taking into consideration the sequencing error. PMID:28505201
Assessment of antibody library diversity through next generation sequencing and technical error compensation.

PubMed

Fantini, Marco; Pandolfini, Luca; Lisi, Simonetta; Chirichella, Michele; Arisi, Ivan; Terrigno, Marco; Goracci, Martina; Cremisi, Federico; Cattaneo, Antonino

2017-01-01

Antibody libraries are important resources to derive antibodies to be used for a wide range of applications, from structural and functional studies to intracellular protein interference studies to developing new diagnostics and therapeutics. Whatever the goal, the key parameter for an antibody library is its complexity (also known as diversity), i.e. the number of distinct elements in the collection, which directly reflects the probability of finding in the library an antibody of sufficiently high affinity against a given antigen. Quantitative evaluation of antibody library complexity and quality has long been inadequately addressed, due to the high similarity and length of the sequences of the library. Complexity was usually inferred from the transformation efficiency and tested by fingerprinting and/or sequencing of a few hundred random library elements. Inferring complexity from such a small sampling is, however, very rudimentary and gives limited information about the real diversity, because complexity does not scale linearly with sample size. Next-generation sequencing (NGS) has opened new ways to tackle the assessment of antibody library complexity and quality. However, much remains to be done to fully exploit the potential of NGS for the quantitative analysis of antibody repertoires and to overcome current limitations. To obtain a more reliable estimate of antibody library complexity, here we show a new, PCR-free, NGS approach to sequence antibody libraries on the Illumina platform, coupled to a new bioinformatic analysis and software (Diversity Estimator of Antibody Library, DEAL) that allows reliable estimation of the complexity, taking into consideration the sequencing error.
High-Throughput Sequencing Reveals Drastic Changes in Fungal Communities in the Phyllosphere of Norway Spruce (Picea abies) Following Invasion of the Spruce Bud Scale (Physokermes piceae).

PubMed

Menkis, Audrius; Marčiulynas, Adas; Gedminas, Artūras; Lynikienė, Jūratė; Povilaitienė, Aistė

2015-11-01

The aim of this study was to assess the diversity and composition of fungal communities in damaged and undamaged shoots of Norway spruce (Picea abies) following recent invasion of the spruce bud scale (Physokermes piceae) in Lithuania. Sampling was done in July 2013 and included 50 random lateral shoots from ten random trees in each of five visually undamaged and five damaged 40-50-year-old pure stands of P. abies. DNA was isolated from 500 individual shoots, subjected to amplification of the internal transcribed spacer of fungal ribosomal DNA (ITS rDNA), barcoded and sequenced. Clustering of 149,426 high-quality sequences resulted in 1193 non-singleton contigs of which 1039 (87.1 %) were fungal. In total, there were 893 fungal taxa in damaged shoots and 608 taxa in undamaged shoots (p < 0.0001). Furthermore, 431 (41.5 %) fungal taxa were exclusively in damaged shoots, 146 (14.0 %) were exclusively in undamaged shoots, and 462 (44.5 %) were common to both types of samples. Correspondence analysis showed that study sites representing damaged and undamaged shoots were separated from each other, indicating that their fungal communities were largely different and, therefore, heavily affected by P. piceae. In conclusion, the results demonstrated that invasive alien tree pests may have a profound effect on fungal mycobiota associated with the phyllosphere of P. abies, and therefore, in addition to their direct negative effect owing to physical damage of the tissue, they may also indirectly determine health, sustainability and, ultimately, distribution of the forest tree species.
Molecular characterization of Wolbachia infection in bed bugs (Cimex lectularius) collected from several localities in France

PubMed Central

Akhoundi, Mohammad; Cannet, Arnaud; Loubatier, Céline; Berenger, Jean-Michel; Izri, Arezki; Marty, Pierre; Delaunay, Pascal

2016-01-01

Wolbachia symbionts are maternally inherited intracellular bacteria that have been detected in numerous insects including bed bugs. The objective of this study, the first epidemiological study in Europe, was to screen Wolbachia infection among Cimex lectularius collected in the field, using PCR targeting the surface protein gene (wsp), and to compare obtained Wolbachia strains with those reported from laboratory colonies of C. lectularius as well as other Wolbachia groups. For this purpose, 284 bed bug specimens were caught and studied from eight different regions of France including the suburbs of Paris, Bouches-du-Rhône, Lot-et-Garonne, and five localities in Alpes-Maritimes. Among the samples, 166 were adults and the remaining 118 were considered nymphs. In all, 47 out of 118 nymphs (40%) and 61 out of 166 adults (37%) were found positive on wsp screening. Among the positive cases, 10 samples were selected randomly for sequencing. The sequences had 100% homology with wsp sequences belonging to the F-supergroup strains of Wolbachia. Therefore, we confirm the similarity of Wolbachia strains detected in this epidemiological study to Wolbachia spp. reported from laboratory colonies of C. lectularius. PMID:27492563
Population entropies estimates of proteins

NASA Astrophysics Data System (ADS)

Low, Wai Yee

2017-05-01

The Shannon entropy equation provides a way to estimate variability of amino acids sequences in a multiple sequence alignment of proteins. Knowledge of protein variability is useful in many areas such as vaccine design, identification of antibody binding sites, and exploration of protein 3D structural properties. In cases where the population entropies of a protein are of interest but only a small sample size can be obtained, a method based on linear regression and random subsampling can be used to estimate the population entropy. This method is useful for comparisons of entropies where the actual sequence counts differ and thus, correction for alignment size bias is needed. In the current work, an R based package named EntropyCorrect that enables estimation of population entropy is presented and an empirical study on how well this new algorithm performs on simulated dataset of various combinations of population and sample sizes is discussed. The package is available at https://github.com/lloydlow/EntropyCorrect. This article, which was originally published online on 12 May 2017, contained an error in Eq. (1), where the summation sign was missing. The corrected equation appears in the Corrigendum attached to the pdf.
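As a rough illustration of the idea described above, the sketch below computes the Shannon entropy of one alignment column, averages it over random subsamples of several sizes, and extrapolates by linear regression. It is not the EntropyCorrect implementation. The regression of entropy on 1/n, the function names and the example column are assumptions made here for illustration.

```python
import math
import random
from collections import Counter

def shannon_entropy(column):
    """Shannon entropy, in bits, of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def mean_subsample_entropy(column, size, repeats, rng):
    """Average entropy over random subsamples drawn without replacement."""
    return sum(shannon_entropy(rng.sample(column, size))
               for _ in range(repeats)) / repeats

def extrapolated_entropy(column, sizes, repeats=200, seed=0):
    """Fit H = a + b * (1/n) by least squares and return the intercept a,
    i.e. the entropy extrapolated to 1/n -> 0 (an assumed regression form)."""
    rng = random.Random(seed)
    xs = [1.0 / n for n in sizes]
    ys = [mean_subsample_entropy(column, n, repeats, rng) for n in sizes]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
             / sum((x - x_mean) ** 2 for x in xs))
    return y_mean - slope * x_mean

# Hypothetical column of 20 residues at one alignment position.
column = list("AAAAAAAAGGGGGSSSTTVL")
estimate = extrapolated_entropy(column, sizes=[5, 10, 15, 20])
print(f"observed: {shannon_entropy(column):.3f} bits, "
      f"extrapolated: {estimate:.3f} bits")
```

Small samples tend to miss rare residues and so underestimate entropy, which is why the sketch extrapolates towards large n rather than reporting the raw sample value.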
Bottom-up driven involuntary auditory evoked field change: constant sound sequencing amplifies but does not sharpen neural activity.

PubMed

Okamoto, Hidehiko; Stracke, Henning; Lagemann, Lothar; Pantev, Christo

2010-01-01

The capability of involuntarily tracking certain sound signals during the simultaneous presence of noise is essential in human daily life. Previous studies have demonstrated that top-down auditory focused attention can enhance excitatory and inhibitory neural activity, resulting in sharpening of frequency tuning of auditory neurons. In the present study, we investigated bottom-up driven involuntary neural processing of sound signals in noisy environments by means of magnetoencephalography. We contrasted two sound signal sequencing conditions: "constant sequencing" versus "random sequencing." Based on a pool of 16 different frequencies, either identical (constant sequencing) or pseudorandomly chosen (random sequencing) test frequencies were presented blockwise together with band-eliminated noises to nonattending subjects. The results demonstrated that the auditory evoked fields elicited in the constant sequencing condition were significantly enhanced compared with the random sequencing condition. However, the enhancement was not significantly different between different band-eliminated noise conditions. Thus the present study confirms that by constant sound signal sequencing under nonattentive listening the neural activity in human auditory cortex can be enhanced, but not sharpened. Our results indicate that bottom-up driven involuntary neural processing may mainly amplify excitatory neural networks, but may not effectively enhance inhibitory neural circuits.
Bioavailability of everolimus administered as a single 5 mg tablet versus five 1 mg tablets: a randomized, open-label, two-way crossover study of healthy volunteers.

PubMed

Thudium, Karen; Gallo, Jorge; Bouillaud, Emmanuel; Sachs, Carolin; Eddy, Simantini; Cheung, Wing

2015-01-01

The mammalian target of rapamycin (mTOR) inhibitor everolimus has a well-established pharmacokinetics profile. We conducted a randomized, single-center, open-label, two-sequence, two-period crossover study of healthy volunteers to assess the relative bioavailability of everolimus administered as one 5 mg tablet or five 1 mg tablets. Subjects were randomized 1:1 to receive everolimus dosed as one 5 mg tablet or as five 1 mg tablets on day 1, followed by a washout period on days 8-14 and then the opposite formulation on day 15. Blood sampling for pharmacokinetic evaluation was performed at prespecified time points, with 17 samples taken for each treatment period. Primary variables for evaluation of relative bioavailability were area under the concentration-time curve from time zero to infinity (AUCinf) and maximum blood concentration (Cmax). Safety was assessed by reporting the incidence of adverse events (AEs). Twenty-two participants received everolimus as one 5 mg tablet followed by five 1 mg tablets (n=11) or the opposite sequence (n=11). The Cmax of five 1 mg tablets was 48% higher than that of one 5 mg tablet (geometric mean ratio, 1.48; 90% confidence interval [CI], 1.35-1.62). AUCinf was similar (geometric mean ratio, 1.08; 90% CI, 1.02-1.16), as were the extent of absorption and the distribution and elimination kinetics. AEs, all grade 1 or 2, were observed in 54.5% of subjects. Although the extent of absorption was similar, the Cmax of five 1 mg tablets was higher than that of one 5 mg tablet, suggesting these formulations lead to different peak blood concentrations and are not interchangeable at the dose tested.
Development of Specific Sequence-Characterized Amplified Region Markers for Detecting Histoplasma capsulatum in Clinical and Environmental Samples

PubMed Central

Frías De León, María Guadalupe; Arenas López, Gabina; Taylor, Maria Lucia; Acosta Altamirano, Gustavo

2012-01-01

Sequence-characterized amplified region (SCAR) markers, generated by randomly amplified polymorphic DNA (RAPD)-PCR, were developed to detect Histoplasma capsulatum selectively in clinical and environmental samples. A 1,200-bp RAPD-PCR-specific band produced with the 1281-1283 primers was cloned, sequenced, and used to design two SCAR markers, 1281-1283₂₂₀ and 1281-1283₂₃₀. The specificity of these markers was confirmed by Southern hybridization. To evaluate the relevance of the SCAR markers for the diagnosis of histoplasmosis, another molecular marker (M antigen probe) was used for comparison. To validate 1281-1283₂₂₀ and 1281-1283₂₃₀ as new tools for the identification of H. capsulatum, the specificity and sensitivity of these markers were assessed for the detection of the pathogen in 36 clinical (17 humans, as well as 9 experimentally and 10 naturally infected nonhuman mammals) and 20 environmental (10 contaminated soil and 10 guano) samples. Although the two SCAR markers and the M antigen probe identified H. capsulatum isolates from different geographic origins in America, the 1281-1283₂₂₀ SCAR marker was the most specific and detected the pathogen in all samples tested. In contrast, the 1281-1283₂₃₀ SCAR marker and the M antigen probe also amplified DNA from Aspergillus niger and Cryptococcus neoformans, respectively. Both SCAR markers detected as little as 0.001 ng of H. capsulatum DNA, while the M antigen probe detected 0.5 ng of fungal DNA. The SCAR markers revealed the fungal presence better than the M antigen probe in contaminated soil and guano samples. Based on our results, the 1281-1283₂₂₀ marker can be used to detect and identify H. capsulatum in samples from different sources. PMID:22189121
Violation of an Evolutionarily Conserved Immunoglobulin Diversity Gene Sequence Preference Promotes Production of dsDNA-Specific IgG Antibodies

PubMed Central

Silva-Sanchez, Aaron; Liu, Cun Ren; Vale, Andre M.; Khass, Mohamed; Kapoor, Pratibha; Elgavish, Ada; Ivanov, Ivaylo I.; Ippolito, Gregory C.; Schelonka, Robert L.; Schoeb, Trenton R.; Burrows, Peter D.; Schroeder, Harry W.

2015-01-01

Variability in the developing antibody repertoire is focused on the third complementarity determining region of the H chain (CDR-H3), which lies at the center of the antigen binding site where it often plays a decisive role in antigen binding. The power of VDJ recombination and N nucleotide addition has led to the common conception that the sequence of CDR-H3 is unrestricted in its variability and random in its composition. Under this view, the immune response is solely controlled by somatic positive and negative clonal selection mechanisms that act on individual B cells to promote production of protective antibodies and prevent the production of self-reactive antibodies. This concept of a repertoire of random antigen binding sites is inconsistent with the observation that diversity (DH) gene segment sequence content by reading frame (RF) is evolutionarily conserved, creating biases in the prevalence and distribution of individual amino acids in CDR-H3. For example, arginine, which is often found in the CDR-H3 of dsDNA binding autoantibodies, is under-represented in the commonly used DH RFs rearranged by deletion, but is a frequent component of rarely used inverted RF1 (iRF1), which is rearranged by inversion. To determine the effect of altering this germline bias in DH gene segment sequence on autoantibody production, we generated mice that by genetic manipulation are forced to utilize an iRF1 sequence encoding two arginines. Over a one year period we collected serial serum samples from these unimmunized, specific pathogen-free mice and found that more than one-fifth of them contained elevated levels of dsDNA-binding IgG, but not IgM; whereas mice with a wild type DH sequence did not. Thus, germline bias against the use of arginine enriched DH sequence helps to reduce the likelihood of producing self-reactive antibodies. PMID:25706374
Layers: A molecular surface peeling algorithm and its applications to analyze protein structures

PubMed Central

Karampudi, Naga Bhushana Rao; Bahadur, Ranjit Prasad

2015-01-01

We present an algorithm ‘Layers’ to peel the atoms of proteins as layers. Using Layers we show an efficient way to transform protein structures into a 2D pattern, named residue transition pattern (RTP), which is independent of molecular orientations. RTP explains the folding patterns of proteins, and hence identification of similarity between proteins is simpler and more reliable using RTP than with the standard sequence- or structure-based methods. Moreover, Layers generates a fine-tunable coarse model for the molecular surface by using non-random sampling. The coarse model can be used for shape comparison, protein recognition and ligand design. Additionally, Layers can be used to develop biased initial configurations of molecules for protein folding simulations. We have developed a random forest classifier to predict the RTP of a given polypeptide sequence. Layers is a standalone application; however, it can be merged with other applications to reduce the computational load when working with large datasets of protein structures. Layers is available freely at http://www.csb.iitkgp.ernet.in/applications/mol_layers/main. PMID:26553411

Portable and Error-Free DNA-Based Data Storage.

PubMed

Yazdi, S M Hossein Tabatabaei; Gabrys, Ryan; Milenkovic, Olgica

2017-07-10

DNA-based data storage is an emerging nonvolatile memory technology of potentially unprecedented density, durability, and replication efficiency. The basic system implementation steps include synthesizing DNA strings that contain user information and subsequently retrieving them via high-throughput sequencing technologies. Existing architectures enable reading and writing but do not offer random-access and error-free data recovery from low-cost, portable devices, which is crucial for making the storage technology competitive with classical recorders. Here we show for the first time that a portable, random-access platform may be implemented in practice using nanopore sequencers. The novelty of our approach is to design an integrated processing pipeline that encodes data to avoid costly synthesis and sequencing errors, enables random access through addressing, and leverages efficient portable sequencing via new iterative alignment and deletion error-correcting codes. Our work represents the only known random access DNA-based data storage system that uses error-prone nanopore sequencers, while still producing error-free readouts with the highest reported information rate/density. As such, it represents a crucial step towards practical employment of DNA molecules as storage media.
Epidemiology of foot-and-mouth disease in Landhi Dairy Colony, Pakistan, the world largest Buffalo colony

PubMed Central

Klein, Joern; Hussain, Manzoor; Ahmad, Munir; Afzal, Muhammad; Alexandersen, Soren

2008-01-01

Background: Foot-and-mouth disease (FMD) is endemic in Pakistan and causes huge economic losses. This work focuses on the Landhi Dairy Colony (LDC), located in the suburbs of Karachi. LDC is the largest buffalo colony in the world, with more than 300,000 animals (around 95% buffaloes and 5% cattle, as well as an unknown number of sheep and goats). Each month from April 2006 to April 2007 we collected mouth swabs from apparently healthy buffaloes and cattle, applying convenience sampling based on a two-stage random sampling scheme, in conjunction with participatory information from each selected farm. Furthermore, we also collected epithelium samples from animals with clinical disease, as well as mouth-swab samples from those farms. In addition, we analysed a total of 180 serum samples, randomly collecting 30 samples each month at the local slaughterhouse from October 2006 to March 2007. Samples were screened for FMDV by real-time RT-PCR, and the partial or full 1D coding region of selected isolates was sequenced. Serum samples were analysed by serotype-specific antibody ELISA and non-structural protein (NSP) antibody ELISA. Results: FMDV infection prevalence at aggregate level shows an endemic occurrence of FMDV in the colony, with peaks in August 2006, December 2006 and February 2007 to March 2007. A significant association of prevalence peaks with the rainy seasons, which include the coldest time of the year and the Muslim Eid festival, has been demonstrated. Participatory information indicated that 88% of all questioned farmers vaccinate their animals. Analysis of the serum samples showed high levels of antibodies for serotypes O, A, Asia 1 and C. The median endpoint titre for all tested serotypes, except serotype C, in VNT titration is at a serum dilution equal to or above 1/100. All 180 serum samples collected were tested for antibodies against the non-structural proteins, and all but four were found positive.
Out of the 106 swab samples from apparently healthy and affected animals that were positive in real-time RT-PCR, we sequenced the partial or full 1D coding region from 58 samples. In addition, we sequenced the full 1D coding region of 17 epithelium samples from animals with clinical signs of FMD. Of all sequenced samples, swabs and epithelium, 19 belong to the regional PanAsia II lineage of serotype O and 56 to the A/Iran/2005 lineage of serotype A. Conclusion: For an effective and realisable FMD control program in LDC, we suggest introducing twice-yearly mass vaccination of all buffaloes and cattle in the colony. These mass vaccinations should optimally take place shortly before the beginning of the two rainy periods, e.g. in June and September. Those vaccinations should, in our opinion, be in addition to the individual vaccinations of single animals already performed, as the latter usually target only newly introduced animals. This suggested combination of mass vaccination of all large ruminants with the individual vaccination already performed should provide a continuously high level of herd immunity in the entire colony. Vaccines used for this purpose should contain the matching vaccine strains, i.e. as our results indicate, antigens for A/Iran/2005 and the regional type of serotype O (PanAsia II), but antigens of the Asia 1 lineage, which is endemic in this region of the world, should also be included. In the long term it will be important to control vaccine use, so that subclinical FMD is avoided. PMID:18445264
Long-range correlations and charge transport properties of DNA sequences

NASA Astrophysics Data System (ADS)

Liu, Xiao-liang; Ren, Yi; Xie, Qiong-tao; Deng, Chao-sheng; Xu, Hui

2010-04-01

By using Hurst's analysis and transfer approach, the rescaled range functions and Hurst exponents of human chromosome 22 and enterobacteria phage lambda DNA sequences are investigated and the transmission coefficients, Landauer resistances and Lyapunov coefficients of finite segments based on above genomic DNA sequences are calculated. In a comparison with quasiperiodic and random artificial DNA sequences, we find that λ-DNA exhibits anticorrelation behavior characterized by a Hurst exponent 0.5
Truly random number generation: an example

NASA Astrophysics Data System (ADS)

Frauchiger, Daniela; Renner, Renato

2013-10-01

Randomness is crucial for a variety of applications, ranging from gambling to computer simulations, and from cryptography to statistics. However, many of the currently used methods for generating randomness do not meet the criteria that are necessary for these applications to work properly and safely. A common problem is that a sequence of numbers may look random but nevertheless not be truly random. In fact, the sequence may pass all standard statistical tests and yet be perfectly predictable. This renders it useless for many applications. For example, in cryptography, the predictability of a "randomly" chosen password is obviously undesirable. Here, we review a recently developed approach to generating true, and hence unpredictable, randomness.
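The point that a sequence can look random and still be perfectly predictable can be illustrated with any seeded pseudorandom generator. The sketch below uses Python's built-in Mersenne Twister as a stand-in. It is not the approach reviewed in the article, and the seed and function name are arbitrary choices made for illustration.

```python
import random

def pseudo_random_bits(seed, count):
    """Bits from Python's Mersenne Twister, a deterministic generator."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(count)]

# One party generates bits that look statistically balanced ...
observed = pseudo_random_bits(seed=2013, count=10_000)
fraction_of_ones = sum(observed) / len(observed)

# ... but anyone who knows the seed reproduces every bit exactly.
predicted = pseudo_random_bits(seed=2013, count=10_000)

print(f"fraction of ones: {fraction_of_ones:.3f}")
print("perfectly predicted:", predicted == observed)
```

A balance check on the bits says nothing about predictability, which is why statistical tests alone cannot certify a source for cryptographic use.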
A general strategy for cloning viroids and other small circular RNAs that uses minimal amounts of template and does not require prior knowledge of its sequence.

PubMed

Navarro, B; Daròs, J A; Flores, R

1996-01-01

Two PCR-based methods are described for obtaining clones of small circular RNAs of unknown sequence and for which only minute amounts are available. To avoid introducing any assumption about the RNA sequence, synthesis of the cDNAs is initiated with random primers. The cDNA population is then PCR-amplified using a primer whose sequence is present at both sides of the cDNAs, since they have been obtained with random hexamers and then a linker with the sequence of the PCR primer has been ligated to their termini, or because the cDNAs have been synthesized with an oligonucleotide that contains the sequence of the PCR primer at its 5' end and six randomized positions at its 3' end. The procedures need only approximately 50 ng of purified RNA template. The reasons for the emergence of cloning artifacts and precautions to avoid them are discussed.
Interfaces of Malignant and Immunologic Clonal Dynamics in Ovarian Cancer.

PubMed

Zhang, Allen W; McPherson, Andrew; Milne, Katy; Kroeger, David R; Hamilton, Phineas T; Miranda, Alex; Funnell, Tyler; Little, Nicole; de Souza, Camila P E; Laan, Sonya; LeDoux, Stacey; Cochrane, Dawn R; Lim, Jamie L P; Yang, Winnie; Roth, Andrew; Smith, Maia A; Ho, Julie; Tse, Kane; Zeng, Thomas; Shlafman, Inna; Mayo, Michael R; Moore, Richard; Failmezger, Henrik; Heindl, Andreas; Wang, Yi Kan; Bashashati, Ali; Grewal, Diljot S; Brown, Scott D; Lai, Daniel; Wan, Adrian N C; Nielsen, Cydney B; Huebner, Curtis; Tessier-Cloutier, Basile; Anglesio, Michael S; Bouchard-Côté, Alexandre; Yuan, Yinyin; Wasserman, Wyeth W; Gilks, C Blake; Karnezis, Anthony N; Aparicio, Samuel; McAlpine, Jessica N; Huntsman, David G; Holt, Robert A; Nelson, Brad H; Shah, Sohrab P

2018-05-07

High-grade serous ovarian cancer (HGSC) exhibits extensive malignant clonal diversity with widespread but non-random patterns of disease dissemination. We investigated whether local immune microenvironment factors shape tumor progression properties at the interface of tumor-infiltrating lymphocytes (TILs) and cancer cells. Through multi-region study of 212 samples from 38 patients with whole-genome sequencing, immunohistochemistry, histologic image analysis, gene expression profiling, and T and B cell receptor sequencing, we identified three immunologic subtypes across samples and extensive within-patient diversity. Epithelial CD8+ TILs negatively associated with malignant diversity, reflecting immunological pruning of tumor clones inferred by neoantigen depletion, HLA I loss of heterozygosity, and spatial tracking between T cell and tumor clones. In addition, combinatorial prognostic effects of mutational processes and immune properties were observed, illuminating how specific genomic aberration types associate with immune response and impact survival. We conclude that within-patient spatial immune microenvironment variation shapes intraperitoneal malignant spread, provoking new evolutionary perspectives on HGSC clonal dispersion. Copyright © 2018 Elsevier Inc. All rights reserved.
Weight distributions for turbo codes using random and nonrandom permutations

NASA Technical Reports Server (NTRS)

Dolinar, S.; Divsalar, D.

1995-01-01

This article takes a preliminary look at the weight distributions achievable for turbo codes using random, nonrandom, and semirandom permutations. Due to the recursiveness of the encoders, it is important to distinguish between self-terminating and non-self-terminating input sequences. The non-self-terminating sequences have little effect on decoder performance, because they accumulate high encoded weight until they are artificially terminated at the end of the block. From probabilistic arguments based on selecting the permutations randomly, it is concluded that the self-terminating weight-2 data sequences are the most important consideration in the design of constituent codes; higher-weight self-terminating sequences have successively decreasing importance. Also, increasing the number of codes and, correspondingly, the number of permutations makes it more and more likely that the bad input sequences will be broken up by one or more of the permuters. It is possible to design nonrandom permutations that ensure that the minimum distance due to weight-2 input sequences grows roughly as the square root of (2N), where N is the block length. However, these nonrandom permutations amplify the bad effects of higher-weight inputs, and as a result they are inferior in performance to randomly selected permutations. But there are 'semirandom' permutations that perform nearly as well as the designed nonrandom permutations with respect to weight-2 input sequences and are not as susceptible to being foiled by higher-weight inputs.
DNA-based random number generation in security circuitry.

PubMed

Gearheart, Christy M; Arazi, Benjamin; Rouchka, Eric C

2010-06-01

DNA-based circuit design is an area of research in which traditional silicon-based technologies are replaced by naturally occurring phenomena taken from biochemistry and molecular biology. This research focuses on further developing DNA-based methodologies to mimic digital data manipulation. While exhibiting fundamental principles, this work was done in conjunction with the vision that DNA-based circuitry, when the technology matures, will form the basis for a tamper-proof security module, revolutionizing the meaning and concept of tamper-proofing and possibly preventing it altogether based on accurate scientific observations. A paramount part of such a solution would be self-generation of random numbers. A novel prototype schema employs solid phase synthesis of oligonucleotides for random construction of DNA sequences; temporary storage and retrieval is achieved through plasmid vectors. A discussion of how to evaluate sequence randomness is included, as well as how these techniques are applied to a simulation of the random number generation circuitry. Simulation results show generated sequences successfully pass three selected NIST random number generation tests specified for security applications.
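As an illustration of how sequence randomness can be evaluated, the sketch below applies the frequency (monobit) test from NIST SP 800-22 to a DNA string. The abstract does not name the three tests that were used, so the choice of this test is an assumption, and the 2-bit nucleotide encoding (`BASE_TO_BITS`) is hypothetical.

```python
import math

# Hypothetical 2-bit encoding of nucleotides; the abstract does not specify one.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def dna_to_bits(seq):
    """Convert a DNA string into a bit string using the assumed encoding."""
    return "".join(BASE_TO_BITS[base] for base in seq.upper())


def monobit_p_value(bits):
    """Frequency (monobit) test: P-value for the balance of ones and zeros."""
    n = len(bits)
    # Map 1 -> +1 and 0 -> -1, then sum.
    s = sum(1 if b == "1" else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


if __name__ == "__main__":
    p = monobit_p_value(dna_to_bits("ACGTTGCAACGTGGCATCAG"))
    # SP 800-22 treats P >= 0.01 as consistent with randomness for this test.
    print(f"monobit P-value: {p:.4f}")
```

A single test of this kind only checks the balance of ones and zeros; a sequence can pass it and still be highly structured, which is why test batteries are applied.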
Bacterial diversity in faeces from polar bear (Ursus maritimus) in Arctic Svalbard.

PubMed

Glad, Trine; Bernhardsen, Pål; Nielsen, Kaare M; Brusetti, Lorenzo; Andersen, Magnus; Aars, Jon; Sundset, Monica A

2010-01-14

Polar bears (Ursus maritimus) are major predators in the Arctic marine ecosystem, feeding mainly on seals, and living closely associated with sea ice. Little is known of their gut microbial ecology and the main purpose of this study was to investigate the microbial diversity in faeces of polar bears in Svalbard, Norway (74-81 degrees N, 10-33 degrees E). In addition, the level of blaTEM alleles, encoding ampicillin resistance (ampr), was determined. In total, ten samples were collected from ten individual bears, rectum swabs from five individuals in 2004 and faeces samples from five individuals in 2006. A 16S rRNA gene clone library was constructed, and all sequences obtained from 161 clones showed affiliation with the phylum Firmicutes, with 160 sequences identified as Clostridiales and one sequence identified as unclassified Firmicutes. The majority of the sequences (70%) were affiliated with the genus Clostridium. Aerobic heterotrophic cell counts on chocolate agar ranged from 5.0 x 10^4 to 1.6 x 10^6 colony forming units (cfu)/ml for the rectum swabs and from 4.0 x 10^3 to 1.0 x 10^5 cfu/g for the faeces samples. The proportion of ampr bacteria ranged from 0% to 44%. All of 144 randomly selected ampr isolates tested positive for enzymatic beta-lactamase activity. Three percent of the ampr isolates from the rectal samples yielded positive results when screened for the presence of blaTEM genes by PCR. BlaTEM alleles were also detected by PCR in two out of three total faecal DNA samples from polar bears. The bacterial diversity in faeces from polar bears in their natural environment in Svalbard is low compared to other animal species, with all obtained clones affiliating to Firmicutes. Furthermore, only low levels of blaTEM alleles were detected in contrast to their increasing prevalence in some clinical and commensal bacterial populations.
Bacterial diversity in faeces from polar bear (Ursus maritimus) in Arctic Svalbard

PubMed Central

2010-01-01

Background: Polar bears (Ursus maritimus) are major predators in the Arctic marine ecosystem, feeding mainly on seals, and living closely associated with sea ice. Little is known of their gut microbial ecology and the main purpose of this study was to investigate the microbial diversity in faeces of polar bears in Svalbard, Norway (74-81°N, 10-33°E). In addition, the level of blaTEM alleles, encoding ampicillin resistance (ampr), was determined. In total, ten samples were collected from ten individual bears, rectum swabs from five individuals in 2004 and faeces samples from five individuals in 2006. Results: A 16S rRNA gene clone library was constructed, and all sequences obtained from 161 clones showed affiliation with the phylum Firmicutes, with 160 sequences identified as Clostridiales and one sequence identified as unclassified Firmicutes. The majority of the sequences (70%) were affiliated with the genus Clostridium. Aerobic heterotrophic cell counts on chocolate agar ranged from 5.0 × 10^4 to 1.6 × 10^6 colony forming units (cfu)/ml for the rectum swabs and from 4.0 × 10^3 to 1.0 × 10^5 cfu/g for the faeces samples. The proportion of ampr bacteria ranged from 0% to 44%. All of 144 randomly selected ampr isolates tested positive for enzymatic β-lactamase activity. Three percent of the ampr isolates from the rectal samples yielded positive results when screened for the presence of blaTEM genes by PCR. BlaTEM alleles were also detected by PCR in two out of three total faecal DNA samples from polar bears. Conclusion: The bacterial diversity in faeces from polar bears in their natural environment in Svalbard is low compared to other animal species, with all obtained clones affiliating to Firmicutes. Furthermore, only low levels of blaTEM alleles were detected in contrast to their increasing prevalence in some clinical and commensal bacterial populations. PMID:20074323
On the design of henon and logistic map-based random number generator

NASA Astrophysics Data System (ADS)

Magfirawaty; Suryadi, M. T.; Ramli, Kalamullah

2017-10-01

The key sequence is one of the main elements of a cryptosystem. A True Random Number Generator (TRNG) is one approach to generating the key sequence. The randomness sources of TRNGs are divided into three main groups: electrical-noise based, jitter based, and chaos based. The chaos-based approach uses a non-linear dynamic system (continuous time or discrete time) as an entropy source. In this study, a new TRNG design based on a discrete-time chaotic system is proposed and then simulated in LabVIEW. The design combines 2D and 1D chaotic systems. A mathematical model is implemented for numerical simulations. We used a comparator process as the harvesting method to obtain the series of random bits. Without any post-processing, the proposed design generated a random bit sequence with a high entropy value and passed all NIST SP 800-22 statistical tests.
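The idea of combining a 2D and a 1D chaotic map and harvesting bits with a comparator can be sketched as follows. The abstract does not give the coupling, the parameters, or the comparator rule, so everything below (the classic Henon parameters a = 1.4 and b = 0.3, the logistic parameter r = 3.99, the initial values, and the rescaling of the Henon output) is an assumption chosen only to make the sketch runnable. A software simulation like this is deterministic and is not a true random source.

```python
def chaotic_bits(n, a=1.4, b=0.3, r=3.99, x=0.0, y=0.0, z=0.3):
    """Generate n bits by comparing a logistic-map state with a Henon-map state.

    All parameters and the comparator rule are illustrative assumptions,
    not the design from the paper.
    """
    bits = []
    for _ in range(n):
        # 2D Henon map: both updates use the previous x and y.
        x, y = 1.0 - a * x * x + y, b * x
        # 1D logistic map, which stays inside (0, 1) for 0 < r < 4.
        z = r * z * (1.0 - z)
        # Rescale the Henon x (roughly in [-1.5, 1.5]) to roughly [0, 1].
        scaled = (x + 1.5) / 3.0
        # Comparator: emit 1 when the logistic state exceeds the scaled Henon state.
        bits.append(1 if z > scaled else 0)
    return bits


if __name__ == "__main__":
    sample = chaotic_bits(32)
    print("".join(str(bit) for bit in sample))
```

The output of such a generator would still need to be checked with a statistical test suite before any claim about its quality could be made.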
Secure self-calibrating quantum random-bit generator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fiorentino, M.; Santori, C.; Spillane, S. M.

2007-03-15

Random-bit generators (RBGs) are key components of a variety of information processing applications ranging from simulations to cryptography. In particular, cryptographic systems require 'strong' RBGs that produce high-entropy bit sequences, but traditional software pseudo-RBGs have very low entropy content and therefore are relatively weak for cryptography. Hardware RBGs yield entropy from chaotic or quantum physical systems and therefore are expected to exhibit high entropy, but in current implementations their exact entropy content is unknown. Here we report a quantum random-bit generator (QRBG) that harvests entropy by measuring single-photon and entangled two-photon polarization states. We introduce and implement a quantum tomographic method to measure a lower bound on the 'min-entropy' of the system, and we employ this value to distill a truly random-bit sequence. This approach is secure: even if an attacker takes control of the source of optical states, a secure random sequence can be distilled.
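For readers unfamiliar with the term, min-entropy is defined from the most likely outcome of a source: H_min = -log2(max p). The sketch below only illustrates that definition for a known outcome distribution; it is not the tomographic bounding method reported in the abstract, and the function name is hypothetical.

```python
import math


def min_entropy(probabilities):
    """Min-entropy in bits per sample: -log2 of the most likely outcome."""
    return -math.log2(max(probabilities))


if __name__ == "__main__":
    # An ideal unbiased bit carries 1 bit of min-entropy.
    print(min_entropy([0.5, 0.5]))
    # A biased bit carries less, so fewer random bits can be distilled from it.
    print(round(min_entropy([0.75, 0.25]), 4))
```

Roughly speaking, a randomness extractor can distill on the order of n times H_min nearly uniform bits from n raw samples, which is why a lower bound on the min-entropy is the quantity of interest.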
10-year trend in quantity and quality of pediatric randomized controlled trials published in mainland China: 2002–2011

PubMed Central

2013-01-01

Background Quality assessment of pediatric randomized controlled trials (RCTs) in China is limited. The aim of this study was to evaluate the quantitative trends and quality indicators of RCTs published in mainland China over a recent 10-year period. Methods We individually searched all 17 available pediatric journals published in China from January 1, 2002 to December 30, 2011 to identify RCTs of drug treatment in participants under the age of 18 years. The quality was evaluated according to the Cochrane quality assessment protocol. Results Of 1287 journal issues containing 44398 articles, a total of 2.4% (1077/44398) articles were included in the analysis. The proportion of RCTs increased from 0.28% in 2002 to 0.32% in 2011. Individual sample sizes ranged from 10 to 905 participants (median 81 participants); 2.3% of the RCTs were multiple center trials; 63.9% evaluated Western medicine, 32.5% evaluated traditional Chinese medicine; 15% used an adequate method of random sequence generation; and 10.4% used a quasi-random method for randomization. Only 1% of the RCTs reported adequate allocation concealment and 0.6% reported the method of blinding. The follow-up period was from 7 days to 96 months, with a median of 7.5 months. There was incomplete outcome data reported in 8.3%, of which 4.5% (4/89) used intention-to-treat analysis. Only 0.4% of the included trials used adequate random sequence allocation, concealment and blinding. The articles published from 2007 to 2011 revealed an improvement in the randomization method compared with articles published from 2002 to 2006 (from 2.7% to 23.6%, p = 0.000). Conclusions In mainland China, the quantity of RCTs did not increase in the pediatric population, and the general quality was relatively poor. Quality improvements were suboptimal in the later 5 years. PMID:23914882
Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences.

PubMed

Warris, Sven; Boymans, Sander; Muiser, Iwe; Noback, Michiel; Krijnen, Wim; Nap, Jan-Peter

2014-01-13

Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition. The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this particular property alone is not sufficiently discriminative to distinguish miRNAs from other sequences, the MFE-based P-value should be added to the parameters considered when selecting potential miRNA candidates for experimental verification.
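Once the MFE distribution of randomized sequences is summarized by a normal distribution, the P-value of a candidate is simply the lower tail of that distribution at the candidate's MFE. A minimal sketch, assuming the mean and standard deviation have already been obtained (for example by the interpolation described above); the function name and the numbers in the usage example are hypothetical.

```python
from statistics import NormalDist


def mfe_p_value(candidate_mfe, random_mean, random_sd):
    """Probability that a randomized sequence has an MFE at least as low
    as the candidate, under a normal model of the random MFE values."""
    return NormalDist(mu=random_mean, sigma=random_sd).cdf(candidate_mfe)


if __name__ == "__main__":
    # Hypothetical values in kcal/mol: candidate at -40, random background
    # with mean -30 and standard deviation 5 (two standard deviations lower).
    print(f"P = {mfe_p_value(-40.0, -30.0, 5.0):.4f}")
```

Because only a mean and a standard deviation are needed per composition, the expensive folding of many shuffled sequences is avoided at screening time.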
Discriminative motif discovery via simulated evolution and random under-sampling.

PubMed

Song, Tao; Gu, Hong

2014-01-01

Conserved motifs in biological sequences are closely related to their structure and functions. Recently, discriminative motif discovery methods have attracted more and more attention. However, little attention has been devoted to the data imbalance problem, which is one of the main reasons affecting the performance of the discriminative models. In this article, a simulated evolution method is applied to solve the multi-class imbalance problem at the stage of data preprocessing, and at the stage of Hidden Markov Models (HMMs) training, a random under-sampling method is introduced for the imbalance between the positive and negative datasets. It is shown that, in the task of discovering targeting motifs of nine subcellular compartments, the motifs found by our method are more conserved than the methods without considering data imbalance problem and recover the most known targeting motifs from Minimotif Miner and InterPro. Meanwhile, we use the found motifs to predict protein subcellular localization and achieve higher prediction precision and recall for the minority classes.
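Random under-sampling, as mentioned for the HMM training stage, can be sketched generically: the larger class is randomly reduced to the size of the smaller one before training. This is a plain illustration of the technique, not the authors' pipeline; the function name and the fixed seed are assumptions.

```python
import random


def random_under_sample(positives, negatives, seed=0):
    """Return equally sized random subsets of the two classes.

    The larger class is reduced to the size of the smaller one; the smaller
    class is returned in full (in shuffled order).
    """
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    k = min(len(positives), len(negatives))
    return rng.sample(list(positives), k), rng.sample(list(negatives), k)


if __name__ == "__main__":
    pos = [f"pos_seq_{i}" for i in range(5)]
    neg = [f"neg_seq_{i}" for i in range(50)]
    pos_sub, neg_sub = random_under_sample(pos, neg)
    print(len(pos_sub), len(neg_sub))
```

Under-sampling discards data from the majority class, so it is often repeated with different seeds and the resulting models are compared or combined.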
The coalescent process in models with selection and recombination.

PubMed

Hudson, R R; Kaplan, N L

1988-11-01

The statistical properties of the process describing the genealogical history of a random sample of genes at a selectively neutral locus which is linked to a locus at which natural selection operates are investigated. It is found that the equations describing this process are simple modifications of the equations describing the process assuming that the two loci are completely linked. Thus, the statistical properties of the genealogical process for a random sample at a neutral locus linked to a locus with selection follow from the results obtained for the selected locus. Sequence data from the alcohol dehydrogenase (Adh) region of Drosophila melanogaster are examined and compared to predictions based on the theory. It is found that the spatial distribution of nucleotide differences between Fast and Slow alleles of Adh is very similar to the spatial distribution predicted if balancing selection operates to maintain the allozyme variation at the Adh locus. The spatial distribution of nucleotide differences between different Slow alleles of Adh does not match the predictions of this simple model very well.
The Effect of Practice Schedule on Context-Dependent Learning.

PubMed

Lee, Ya-Yun; Fisher, Beth E

2018-03-02

It is well established that random practice compared to blocked practice enhances motor learning. Additionally, while information in the environment may be incidental, learning is also enhanced when an individual performs a task within the same environmental context in which the task was originally practiced. This study aimed to disentangle the effects of practice schedule and incidental/environmental context on motor learning. Participants practiced three finger sequences under either a random or blocked practice schedule. Each sequence was associated with specific incidental context (i.e., color and location on the computer screen) during practice. The participants were tested under the conditions when the sequence-context associations remained the same or were changed from that of practice. When the sequence-context association was changed, the participants who practiced under blocked schedule demonstrated greater performance decrement than those who practiced under random schedule. The findings suggested that those participants who practiced under random schedule were more resistant to the change of environmental context.
Selection of Optimal Polypurine Tract Region Sequences during Moloney Murine Leukemia Virus Replication

PubMed Central

Robson, Nicole D.; Telesnitsky, Alice

2000-01-01

Retrovirus plus-strand synthesis is primed by a cleavage remnant of the polypurine tract (PPT) region of viral RNA. In this study, we tested replication properties for Moloney murine leukemia viruses with targeted mutations in the PPT and in conserved sequences upstream, as well as for pools of mutants with randomized sequences in these regions. The importance of maintaining some purine residues within the PPT was indicated both by examining the evolution of random PPT pools and from the replication properties of targeted mutants. Although many different PPT sequences could support efficient replication and one mutant that contained two differences in the core PPT was found to replicate as well as the wild type, some sequences in the core PPT clearly conferred advantages over others. Contributions of sequences upstream of the core PPT were examined with deletion mutants. A conserved T-stretch within the upstream sequence was examined in detail and found to be unimportant to helper functions. Evolution of virus pools containing randomized T-stretch sequences demonstrated marked preference for the wild-type sequence in six of its eight positions. These findings demonstrate that maintenance of the T-rich element is more important to viral replication than is maintenance of the core PPT. PMID:11044073
Perceptions of randomness in binary sequences: Normative, heuristic, or both?

PubMed

Reimers, Stian; Donkin, Chris; Le Pelley, Mike E

2018-03-01

When people consider a series of random binary events, such as tossing an unbiased coin and recording the sequence of heads (H) and tails (T), they tend to erroneously rate sequences with less internal structure or order (such as HTTHT) as more probable than sequences containing more structure or order (such as HHHHH). This is traditionally explained as a local representativeness effect: Participants assume that the properties of long sequences of random outcomes-such as an equal proportion of heads and tails, and little internal structure-should also apply to short sequences. However, recent theoretical work has noted that the probability of a particular sequence of say, heads and tails of length n, occurring within a larger (>n) sequence of coin flips actually differs by sequence, so P(HHHHH)
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

PubMed

Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

2012-01-01

RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as the total number of nucleotide bases and ATGC base contents with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available at http://www.cemb.edu.pk/sw.html. Abbreviations: RDNAnalyzer, Random DNA Analyser; GUI, graphical user interface; XAML, Extensible Application Markup Language.
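The Nussinov algorithm mentioned above maximizes the number of nested base pairs by dynamic programming. The sketch below is a textbook version adapted to DNA (A-T and G-C pairs only) with a minimum hairpin loop length; it is written in Python for illustration and is not RDNAnalyzer's C# implementation or its extensions. The `min_loop` default is an assumption.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested A-T / G-C pairs in a DNA sequence.

    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j] (inclusive).
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j stays unpaired
            # case 2: base j pairs with some k, leaving at least min_loop bases between
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    # A small hairpin: three G-C pairs closing an AAAA loop.
    print(nussinov_max_pairs("GGGAAAACCC"))
```

Pair maximization ignores stacking energies, so it is a coarse structural model; a traceback over the same table would be needed to recover the actual pairings.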

Deep learning of the regulatory grammar of yeast 5′ untranslated regions from 500,000 random sequences

PubMed Central

Groves, Benjamin; Kuchina, Anna; Rosenberg, Alexander B.; Jojic, Nebojsa; Fields, Stanley; Seelig, Georg

2017-01-01

Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5′ untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5′ UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5′ UTRs as well as native S. cerevisiae 5′ UTRs. The model additionally was used to computationally evolve highly active 5′ UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model. PMID:29097404
Random Amplification and Pyrosequencing for Identification of Novel Viral Genome Sequences

PubMed Central

Hang, Jun; Forshey, Brett M.; Kochel, Tadeusz J.; Li, Tao; Solórzano, Víctor Fiestas; Halsey, Eric S.; Kuschner, Robert A.

2012-01-01

ssRNA viruses have high levels of genomic divergence, which can lead to difficulty in genomic characterization of new viruses using traditional PCR amplification and sequencing methods. In this study, random reverse transcription, anchored random PCR amplification, and high-throughput pyrosequencing were used to identify orthobunyavirus sequences from total RNA extracted from viral cultures of acute febrile illness specimens. Draft genome sequence for the orthobunyavirus L segment was assembled and sequentially extended using de novo assembly contigs from pyrosequencing reads and orthobunyavirus sequences in GenBank as guidance. Accuracy and continuous coverage were achieved by mapping all reads to the L segment draft sequence. Subsequently, RT-PCR and Sanger sequencing were used to complete the genome sequence. The complete L segment was found to be 6936 bases in length, encoding a 2248-aa putative RNA polymerase. The identified L segment was distinct from previously published South American orthobunyaviruses, sharing 63% and 54% identity at the nucleotide and amino acid level, respectively, with the complete Oropouche virus L segment and 73% and 81% identity at the nucleotide and amino acid level, respectively, with a partial Caraparu virus L segment. The result demonstrated the effectiveness of a sequence-independent amplification and next-generation sequencing approach for obtaining complete viral genomes from total nucleic acid extracts and its use in pathogen discovery. PMID:22468136
The impact of within-herd genetic variation upon inferred transmission trees for foot-and-mouth disease virus.

PubMed

Valdazo-González, Begoña; Kim, Jan T; Soubeyrand, Samuel; Wadsworth, Jemma; Knowles, Nick J; Haydon, Daniel T; King, Donald P

2015-06-01

Full-genome sequences have been used to monitor the fine-scale dynamics of epidemics caused by RNA viruses. However, the ability of this approach to confidently reconstruct transmission trees is limited by the knowledge of the genetic diversity of viruses that exist within different epidemiological units. In order to address this question, this study investigated the variability of 45 foot-and-mouth disease virus (FMDV) genome sequences (from 33 animals) that were collected during 2007 from eight premises (10 different herds) in the United Kingdom. Bayesian and statistical parsimony analysis demonstrated that these sequences exhibited clustering which was consistent with a transmission scenario describing herd-to-herd spread of the virus. As an alternative to analysing all of the available samples in future epidemics, the impact of randomly selecting one sequence from each of these herds was used to assess cost-effective methods that might be used to infer transmission trees during FMD outbreaks. Using these approaches, 85% and 91% of the resulting topologies were either identical or differed by only one edge from a reference tree comprising all of the sequences generated within the outbreak. The sequence distances that accrued during sequential transmission events between epidemiological units was estimated to be 4.6 nucleotides, although the genetic variability between viruses recovered from chronic carrier animals was higher than between viruses from animals with acute-stage infection: an observation which poses challenges for the use of simple approaches to infer transmission trees. This study helps to develop strategies for sampling during FMD outbreaks, and provides data that will guide the development of further models to support control policies in the event of virus incursions into FMD free countries.
DNA Barcoding through Quaternary LDPC Codes

PubMed Central

Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

2015-01-01

For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10^-2 per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10^-9 at the expense of a rate of read losses just in the order of 10^-6. PMID:26492348
DNA Barcoding through Quaternary LDPC Codes.

PubMed

Tapia, Elizabeth; Spetale, Flavio; Krsticevic, Flavia; Angelone, Laura; Bulacio, Pilar

2015-01-01

For many parallel applications of Next-Generation Sequencing (NGS) technologies short barcodes able to accurately multiplex a large number of samples are demanded. To address these competitive requirements, the use of error-correcting codes is advised. Current barcoding systems are mostly built from short random error-correcting codes, a feature that strongly limits their multiplexing accuracy and experimental scalability. To overcome these problems on sequencing systems impaired by mismatch errors, the alternative use of binary BCH and pseudo-quaternary Hamming codes has been proposed. However, these codes either fail to provide a fine-scale with regard to size of barcodes (BCH) or have intrinsic poor error correcting abilities (Hamming). Here, the design of barcodes from shortened binary BCH codes and quaternary Low Density Parity Check (LDPC) codes is introduced. Simulation results show that although accurate barcoding systems of high multiplexing capacity can be obtained with any of these codes, using quaternary LDPC codes may be particularly advantageous due to the lower rates of read losses and undetected sample misidentification errors. Even at mismatch error rates of 10^-2 per base, 24-nt LDPC barcodes can be used to multiplex roughly 2000 samples with a sample misidentification error rate in the order of 10^-9 at the expense of a rate of read losses just in the order of 10^-6.
Reproducibility and quantitation of amplicon sequencing-based detection

PubMed Central

Zhou, Jizhong; Wu, Liyou; Deng, Ye; Zhi, Xiaoyang; Jiang, Yi-Huei; Tu, Qichao; Xie, Jianping; Van Nostrand, Joy D; He, Zhili; Yang, Yunfeng

2011-01-01

To determine the reproducibility and quantitation of the amplicon sequencing-based detection approach for analyzing microbial community structure, a total of 24 microbial communities from a long-term global change experimental site were examined. Genomic DNA obtained from each community was used to amplify 16S rRNA genes with two or three barcode tags as technical replicates in the presence of a small quantity (0.1% wt/wt) of genomic DNA from Shewanella oneidensis MR-1 as the control. The technical reproducibility of the amplicon sequencing-based detection approach is quite low, with an average operational taxonomic unit (OTU) overlap of 17.2%±2.3% between two technical replicates, and 8.2%±2.3% among three technical replicates, which is most likely due to problems associated with random sampling processes. Such variations in technical replicates could have substantial effects on estimating β-diversity but less on α-diversity. A high variation was also observed in the control across different samples (for example, 66.7-fold for the forward primer), suggesting that the amplicon sequencing-based detection approach could not be quantitative. In addition, various strategies were examined to improve the comparability of amplicon sequencing data, such as increasing biological replicates, and removing singleton sequences and less-representative OTUs across biological replicates. Finally, as expected, various statistical analyses with preprocessed experimental data revealed clear differences in the composition and structure of microbial communities between warming and non-warming, or between clipping and non-clipping. Taken together, these results suggest that amplicon sequencing-based detection is useful in analyzing microbial community structure even though it is not reproducible and quantitative. However, great caution should be taken in experimental design and data interpretation when the amplicon sequencing-based detection approach is used for quantitative analysis of the β-diversity of microbial communities. PMID:21346791
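To make the notion of OTU overlap between technical replicates concrete, the sketch below computes the fraction of OTUs shared by all replicates out of all OTUs seen in any of them. The abstract does not state the exact formula used, so this shared-over-union definition is an assumption, as are the function name and the example OTU labels.

```python
def otu_overlap(*replicates):
    """Fraction of OTUs present in every replicate, out of all OTUs observed.

    Each replicate is an iterable of OTU identifiers; at least one is required.
    """
    sets = [set(rep) for rep in replicates]
    shared = set.intersection(*sets)
    total = set.union(*sets)
    return len(shared) / len(total) if total else 0.0


if __name__ == "__main__":
    rep1 = {"OTU1", "OTU2", "OTU3"}
    rep2 = {"OTU2", "OTU3", "OTU4"}
    print(f"overlap: {otu_overlap(rep1, rep2):.0%}")
```

With this definition, adding a third replicate can only keep the overlap the same or lower it, which is in line with the lower figure reported for three replicates.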
Competition between B-Z and B-L transitions in a single DNA molecule: Computational studies

NASA Astrophysics Data System (ADS)

Kwon, Ah-Young; Nam, Gi-Moon; Johner, Albert; Kim, Seyong; Hong, Seok-Cheol; Lee, Nam-Kyung

2016-02-01

Under negative torsion, DNA adopts left-handed helical forms, such as Z-DNA and L-DNA. Using the random copolymer model developed for a wormlike chain, we represent a single DNA molecule with structural heterogeneity as a helical chain consisting of monomers which can be characterized by different helical senses and pitches. By Monte Carlo simulation, where we take into account bending and twist fluctuations explicitly, we study sequence dependence of B-Z transitions under torsional stress and tension focusing on the interaction with B-L transitions. We consider core sequences, (GC)n repeats or (TG)n repeats, which can interconvert between the right-handed B form and the left-handed Z form, embedded in a random sequence, which can convert to left-handed L form with different (tension dependent) helical pitch. We show that Z-DNA formation from the (GC)n sequence is always supported by unwinding torsional stress but Z-DNA formation from the (TG)n sequences, which are more costly to convert but numerous, can be strongly influenced by the quenched disorder in the surrounding random sequence.
Parametric and non-parametric masking of randomness in sequence alignments can be improved and leads to better resolved trees.

PubMed

Kück, Patrick; Meusemann, Karen; Dambach, Johannes; Thormann, Birthe; von Reumont, Björn M; Wägele, Johann W; Misof, Bernhard

2010-03-31

Methods of alignment masking, which refers to the technique of excluding alignment blocks prior to tree reconstructions, have been successful in improving the signal-to-noise ratio in sequence alignments. However, the lack of formally well-defined methods to identify randomness in sequence alignments has prevented a routine application of alignment masking. In this study, we compared the effects on tree reconstructions of the most commonly used profiling method (GBLOCKS), which uses a predefined set of rules in combination with alignment masking, with a new profiling approach (ALISCORE) based on Monte Carlo resampling within a sliding window, using different data sets and alignment methods. While the GBLOCKS approach excludes variable sections above a certain threshold, the choice of which is left arbitrary, the ALISCORE algorithm is free of a priori rating of parameter space and therefore more objective. ALISCORE was successfully extended to amino acids using a proportional model and empirical substitution matrices to score randomness in multiple sequence alignments. A complex bootstrap resampling leads to an even distribution of scores of randomly similar sequences to assess randomness of the observed sequence similarity. Testing performance on real data, both masking methods, GBLOCKS and ALISCORE, helped to improve tree resolution. The sliding window approach was less sensitive to different alignments of identical data sets and performed equally well on all data sets. Concurrently, ALISCORE is capable of dealing with different substitution patterns and heterogeneous base composition. ALISCORE and the most relaxed GBLOCKS gap parameter setting performed best on all data sets. Correspondingly, Neighbor-Net analyses showed the greatest decrease in conflict. Alignment masking improves signal-to-noise ratio in multiple sequence alignments prior to phylogenetic reconstruction. 
Given the robust performance of alignment profiling, alignment masking should routinely be used to improve tree reconstructions. Parametric methods of alignment profiling can be easily extended to more complex likelihood based models of sequence evolution which opens the possibility of further improvements.
A Sequence-Independent Strategy for Detection and Cloning of Circular DNA Virus Genomes by Using Multiply Primed Rolling-Circle Amplification

PubMed Central

Rector, Annabel; Tachezy, Ruth; Van Ranst, Marc

2004-01-01

The discovery of novel viruses has often been accomplished by using hybridization-based methods that necessitate the availability of a previously characterized virus genome probe or knowledge of the viral nucleotide sequence to construct consensus or degenerate PCR primers. In their natural replication cycle, certain viruses employ a rolling-circle mechanism to propagate their circular genomes, and multiply primed rolling-circle amplification (RCA) with φ29 DNA polymerase has recently been applied in the amplification of circular plasmid vectors used in cloning. We employed an isothermal RCA protocol that uses random hexamer primers to amplify the complete genomes of papillomaviruses without the need for prior knowledge of their DNA sequences. We optimized this RCA technique with extracted human papillomavirus type 16 (HPV-16) DNA from W12 cells, using a real-time quantitative PCR assay to determine amplification efficiency, and obtained a 2.4 × 10^4-fold increase in HPV-16 DNA concentration. We were able to clone the complete HPV-16 genome from this multiply primed RCA product. The optimized protocol was subsequently applied to a bovine fibropapillomatous wart tissue sample. Whereas no papillomavirus DNA could be detected by restriction enzyme digestion of the original sample, multiply primed RCA enabled us to obtain a sufficient amount of papillomavirus DNA for restriction enzyme analysis, cloning, and subsequent sequencing of a novel variant of bovine papillomavirus type 1. The multiply primed RCA method allows the discovery of previously unknown papillomaviruses, and possibly also other circular DNA viruses, without a priori sequence information. PMID:15113879
Identification of the infectious source of an unusual outbreak of histoplasmosis, in a hotel in Acapulco, state of Guerrero, Mexico.

PubMed

Taylor, Maria Lucia; Ruíz-Palacios, Guillermo M; del Rocío Reyes-Montes, María; Rodríguez-Arellanes, Gabriela; Carreto-Binaghi, Laura E; Duarte-Escalante, Esperanza; Hernández-Ramírez, Aurora; Pérez, Armando; Suárez-Alvarez, Roberto O; Roldán-Aragón, Yuri A; Romero-Martínez, Rafael; Sahaza-Cardona, Jorge H; Sifuentes-Osornio, José; Soto-Ramírez, Luis E; Peña-Sandoval, Gabriela R

2005-09-01

Three isolates of Histoplasma capsulatum were recovered from the lungs, livers, and spleens of mice inoculated with soil samples from the X hotel's ornamental potted plants, which had been fertilized with organic material known as compost. The presence of H. capsulatum in the original compost was detected using the dot-enzyme-linked immunosorbent assay. Nested PCR, using a specific sequence of the gene coding for the Hcp100 protein, confirmed the identity of the fungus associated with an unusual histoplasmosis outbreak in Acapulco. Although diversity between the H. capsulatum isolate from the hotel and some clinical isolates from Guerrero (positive controls) was observed using random amplification of polymorphic DNA-based PCR, sequence analyses of the H-anti and ole gene fragments revealed high homology (92-99%) between them.
Environmental Screening for the Scedosporium apiospermum Species Complex in Public Parks in Bangkok, Thailand.

PubMed

Luplertlop, Natthanej; Pumeesat, Potjaman; Muangkaew, Watcharamat; Wongsuk, Thanwa; Alastruey-Izquierdo, Ana

2016-01-01

The Scedosporium apiospermum species complex, comprising filamentous fungal species S. apiospermum sensu stricto, S. boydii, S. aurantiacum, S. dehoogii and S. minutispora, are important pathogens that cause a wide variety of infections. Although some species (S. boydii and S. apiospermum) have been isolated from patients in Thailand, no environmental surveys of these fungi have been performed in Thailand or surrounding countries. In this study, we isolated and identified species of these fungi from 68 soil and 16 water samples randomly collected from 10 parks in Bangkok. After filtration and subsequent inoculation of samples on Scedo-Select III medium, colony morphological examinations and microscopic observations were performed. Scedosporium species were isolated from soil in 8 of the 10 parks, but were only detected in one water sample. Colony morphologies of isolates from 41 of 68 soil samples (60.29%) and 1 of 15 water samples (6.67%) were consistent with that of the S. apiospermum species complex. Each morphological type was selected for species identification based on DNA sequencing and phylogenetic analysis of the β-tubulin gene. Three species of the S. apiospermum species complex were identified: S. apiospermum (71 isolates), S. aurantiacum (6 isolates) and S. dehoogii (5 isolates). In addition, 16 sequences could not be assigned to an exact Scedosporium species. According to our environmental survey, the S. apiospermum species complex is widespread in soil in Bangkok, Thailand.
Environmental Screening for the Scedosporium apiospermum Species Complex in Public Parks in Bangkok, Thailand

PubMed Central

Pumeesat, Potjaman; Muangkaew, Watcharamat; Wongsuk, Thanwa; Alastruey-Izquierdo, Ana

2016-01-01

The Scedosporium apiospermum species complex, comprising filamentous fungal species S. apiospermum sensu stricto, S. boydii, S. aurantiacum, S. dehoogii and S. minutispora, are important pathogens that cause a wide variety of infections. Although some species (S. boydii and S. apiospermum) have been isolated from patients in Thailand, no environmental surveys of these fungi have been performed in Thailand or surrounding countries. In this study, we isolated and identified species of these fungi from 68 soil and 16 water samples randomly collected from 10 parks in Bangkok. After filtration and subsequent inoculation of samples on Scedo-Select III medium, colony morphological examinations and microscopic observations were performed. Scedosporium species were isolated from soil in 8 of the 10 parks, but were only detected in one water sample. Colony morphologies of isolates from 41 of 68 soil samples (60.29%) and 1 of 15 water samples (6.67%) were consistent with that of the S. apiospermum species complex. Each morphological type was selected for species identification based on DNA sequencing and phylogenetic analysis of the β-tubulin gene. Three species of the S. apiospermum species complex were identified: S. apiospermum (71 isolates), S. aurantiacum (6 isolates) and S. dehoogii (5 isolates). In addition, 16 sequences could not be assigned to an exact Scedosporium species. According to our environmental survey, the S. apiospermum species complex is widespread in soil in Bangkok, Thailand. PMID:27467209
Two-sample discrimination of Poisson means

NASA Technical Reports Server (NTRS)

Lampton, M.

1994-01-01

This paper presents a statistical test for detecting significant differences between two random count accumulations. The null hypothesis is that the two samples share a common random arrival process with a mean count proportional to each sample's exposure. The model represents the partition of N total events into two counts, A and B, as a sequence of N independent Bernoulli trials whose partition fraction, f, is determined by the ratio of the exposures of A and B. The detection of a significant difference is claimed when the background (null) hypothesis is rejected, which occurs when the observed sample falls in a critical region of (A, B) space. The critical region depends on f and the desired significance level, alpha. The model correctly takes into account the fluctuations in both the signals and the background data, including the important case of small numbers of counts in the signal, the background, or both. The significance can be exactly determined from the cumulative binomial distribution, which in turn can be inverted to determine the critical A(B) or B(A) contour. This paper gives efficient implementations of these tests, based on lookup tables. Applications include the detection of clustering of astronomical objects, the detection of faint emission or absorption lines in photon-limited spectroscopy, the detection of faint emitters or absorbers in photon-limited imaging, and dosimetry.
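The test described in the abstract above can be sketched with the cumulative binomial distribution. This is a minimal illustration, not the paper's lookup-table implementation: under the null hypothesis the count A out of N = A + B events is binomial with partition fraction f set by the exposure ratio, and the one-sided significance of an excess in A is the upper tail probability. The function names and the example counts are hypothetical.

```python
from math import comb


def binomial_tail(a, n, f):
    """P(X >= a) for X ~ Binomial(n, f), summed exactly."""
    return sum(comb(n, k) * f**k * (1 - f) ** (n - k) for k in range(a, n + 1))


def excess_significance(count_a, count_b, exposure_a, exposure_b):
    """One-sided p-value for an excess of counts in A, given both exposures."""
    n = count_a + count_b
    f = exposure_a / (exposure_a + exposure_b)  # partition fraction under the null
    return binomial_tail(count_a, n, f)


# Equal exposures, so f = 0.5 (made-up counts).
p_equal = excess_significance(3, 3, 1.0, 1.0)   # no excess: large p-value
p_excess = excess_significance(9, 1, 1.0, 1.0)  # 9 of 10 events fall in A
```

The excess is declared significant at level alpha when the returned p-value is below alpha; inverting this relation over A for each B gives the critical contour the abstract mentions.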
Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods

PubMed Central

Dröge, J.; Gregor, I.; McHardy, A. C.

2015-01-01

Motivation: Metagenomics characterizes microbial communities by random shotgun sequencing of DNA isolated directly from an environment of interest. An essential step in computational metagenome analysis is taxonomic sequence assignment, which allows identifying the sequenced community members and reconstructing taxonomic bins with sequence data for the individual taxa. For the massive datasets generated by next-generation sequencing technologies, this cannot be performed with de-novo phylogenetic inference methods. We describe an algorithm and the accompanying software, taxator-tk, which performs taxonomic sequence assignment by fast approximate determination of evolutionary neighbors from sequence similarities. Results: Taxator-tk was precise in its taxonomic assignment across all ranks and taxa for a range of evolutionary distances and for short as well as for long sequences. In addition to the taxonomic binning of metagenomes, it is well suited for profiling microbial communities from metagenome samples because it identifies bacterial, archaeal and eukaryotic community members without being affected by varying primer binding strengths, as in marker gene amplification, or copy number variations of marker genes across different taxa. Taxator-tk has an efficient, parallelized implementation that allows the assignment of 6 Gb of sequence data per day on a standard multiprocessor system with 10 CPU cores and microbial RefSeq as the genomic reference data. Availability and implementation: Taxator-tk source and binary program files are publicly available at http://algbio.cs.uni-duesseldorf.de/software/. Contact: Alice.McHardy@uni-duesseldorf.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:25388150
Bone-eating Osedax females and their 'harems' of dwarf males are recruited from a common larval pool.

PubMed

Vrijenhoek, R C; Johnson, S B; Rouse, G W

2008-10-01

Extreme male dwarfism occurs in Osedax (Annelida: Siboglinidae), marine worms with sessile females that bore into submerged bones. Osedax are hypothesized to use environmental sex determination, in which undifferentiated larvae that settle on bones develop as females, and subsequent larvae that settle on females transform into dwarf males. This study addresses several hypotheses regarding possible recruitment sources for the males: (i) common larval pool--males and females are sampled from a common pool of larvae; (ii) neighbourhood--males are supplied by a limited number of neighbouring females; and (iii) arrhenotoky--males are primarily the sons of host females. Osedax rubiplumus were sampled from submerged whalebones located at 1820-m and 2893-m depths in Monterey Bay, California. Immature females typically did not host males, but mature females maintained male 'harems' that grew exponentially in the number of males as female size increased. Allozyme analysis of the females revealed binomial proportions of nuclear genotypes, an indication of random sexual mating. Analysis of mitochondrial DNA sequences from the male harems and their host females allowed us to reject the arrhenotoky and neighbourhood hypotheses for male recruitment. No significant partitioning of mitochondrial diversity existed between the male and female sexes, or between subsamples of worms collected at different depths or during different years (2002-2007). Mitochondrial sequence diversity was very high in these worms, suggesting that as many as 10^6 females contributed to a common larval pool from which the two sexes were randomly drawn.
Fast selection of miRNA candidates based on large-scale pre-computed MFE sets of randomized sequences

PubMed Central

2014-01-01

Background: Small RNAs are important regulators of genome function, yet their prediction in genomes is still a major computational challenge. Statistical analyses of pre-miRNA sequences indicated that their 2D structure tends to have a minimal free energy (MFE) significantly lower than MFE values of equivalently randomized sequences with the same nucleotide composition, in contrast to other classes of non-coding RNA. The computation of many MFEs is, however, too intensive to allow for genome-wide screenings. Results: Using a local grid infrastructure, MFE distributions of random sequences were pre-calculated on a large scale. These distributions follow a normal distribution and can be used to determine the MFE distribution for any given sequence composition by interpolation. This allows on-the-fly calculation of the normal distribution for any candidate sequence composition. Conclusion: The speedup achieved makes genome-wide screening with this characteristic of a pre-miRNA sequence practical. Although this property alone cannot distinguish miRNAs from other sequences with sufficient discriminative power, the MFE-based P-value should be added to the parameters used to select potential miRNA candidates for experimental verification. PMID:24418292
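The idea of replacing repeated folding of shuffled sequences with a pre-computed normal distribution can be sketched as follows. This is a simplified illustration under stated assumptions: the table is indexed by a single composition variable (GC fraction) rather than the full nucleotide composition, the interpolation is linear, and the grid values, function names and candidate MFE are invented for the example.

```python
from math import erfc, sqrt


def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation between two pre-computed grid points."""
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)


def mfe_p_value(mfe, mu, sigma):
    """Lower-tail probability that a randomized sequence has an MFE <= mfe,
    assuming randomized MFEs are normally distributed with mean mu and s.d. sigma."""
    z = (mfe - mu) / sigma
    return 0.5 * erfc(-z / sqrt(2))


# Hypothetical pre-computed parameters (kcal/mol) at two GC fractions.
grid = {0.4: (-20.0, 4.0), 0.6: (-30.0, 5.0)}

gc = 0.5
mu = interpolate(gc, 0.4, grid[0.4][0], 0.6, grid[0.6][0])
sigma = interpolate(gc, 0.4, grid[0.4][1], 0.6, grid[0.6][1])

candidate_mfe = -34.0  # made-up MFE of a candidate hairpin
p_value = mfe_p_value(candidate_mfe, mu, sigma)
```

A small P-value indicates that the candidate folds more stably than expected for its composition. As the abstract notes, this would be one selection parameter among several, not a classifier on its own.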
Reduced randomness in quantum cryptography with sequences of qubits encoded in the same basis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lamoureux, L.-P.; Cerf, N. J.; Bechmann-Pasquinucci, H.

2006-03-15

We consider the cloning of sequences of qubits prepared in the states used in the BB84 or six-state quantum cryptography protocol, and show that the single-qubit fidelity is unaffected even if entire sequences of qubits are prepared in the same basis. This result is only valid provided that the sequences are much shorter than the total key. It is of great importance for practical quantum cryptosystems because it reduces the need for high-speed random number generation without impairing the security against finite-size cloning attacks.
Polymorphism in the Eruption Sequence of Primary Dentition: A Cross-sectional Study

PubMed Central

Bhojraj, Nandlal; Narayanappa

2017-01-01

Introduction: Primary teeth show wide variation in eruption time among different populations. Population-specific eruption ages are reported as means with standard deviations or as median ages with percentile ranges. These alone are insufficient for predicting tooth eruption sequence because they provide no information on the frequency of sequence variation within pairs of teeth. Norms of polymorphic variation in the eruption sequence can be more useful. Aim: This study aims to provide norms for sequence polymorphism in primary teeth among children of the Mysore population. Materials and Methods: A cross-sectional study was designed with 1392 children, recruited from December 2015 to June 2016 by simple random sampling. Each tooth was recorded as present or absent. Across all possible intra-quadrant tooth pairs, cases of present-present, absent-absent, present-absent and absent-present were counted and computed as percentages. Results: Sequence polymorphisms were more common in the 82-84 pair of teeth. Significant polymorphic reverse sequence was observed in 52-54 (9%) and 82-84 (35%) in males and in 82-84 (18%) in females. There was no polymorphism in the maxillary arch in females. Conclusion: The present study provides baseline data for sequence variation in primary tooth eruption. To the best of the investigators' knowledge, there are no previous studies describing sequence polymorphism in primary teeth in an Indian population. The results of this study help in the assessment of eruption sequence problems in paediatric dentistry and in the evaluation and prediction of tooth eruption sequence in the individual child. PMID:28658912
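The pairwise tally described in the methods above can be sketched in a few lines. The records below are invented, the tooth labels follow the two-digit notation used in the abstract, and the function name `pair_percentages` is hypothetical. The sketch only counts the four presence/absence categories for one tooth pair; it does not reproduce the study's significance testing.

```python
from collections import Counter

CATEGORIES = [(True, True), (False, False), (True, False), (False, True)]


def pair_percentages(records, tooth_a, tooth_b):
    """Percentage of children in each presence/absence category for one tooth pair."""
    total = len(records)
    if total == 0:
        return {cat: 0.0 for cat in CATEGORIES}
    counts = Counter((r[tooth_a], r[tooth_b]) for r in records)
    return {cat: 100.0 * counts[cat] / total for cat in CATEGORIES}


# Made-up observations: True means the tooth has erupted.
records = [
    {"82": True, "84": True},
    {"82": True, "84": True},
    {"82": True, "84": False},
    {"82": False, "84": True},
]

result = pair_percentages(records, "82", "84")
```

In a tally like this, the two discordant categories (present-absent and absent-present) show how often each tooth of the pair erupts before the other.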
Generating intrinsically disordered protein conformational ensembles from a Markov chain

NASA Astrophysics Data System (ADS)

Cukier, Robert I.

2018-03-01

Intrinsically disordered proteins (IDPs) sample a diverse conformational space. They are important to signaling and regulatory pathways in cells. An entropy penalty must be paid when an IDP becomes ordered upon interaction with another protein or a ligand. Thus, the degree of conformational disorder of an IDP is of interest. We create a dichotomic Markov model that can explore entropic features of an IDP. The Markov condition introduces local (neighbor residues in a protein sequence) rotamer dependences that arise from van der Waals and other chemical constraints. A protein sequence of length N is characterized by its (information) entropy and mutual information, MIMC, the latter providing a measure of the dependence among the random variables describing the rotamer probabilities of the residues that comprise the sequence. For a Markov chain, the MIMC is proportional to the pair mutual information MI which depends on the singlet and pair probabilities of neighbor residue rotamer sampling. All 2^N sequence states are generated, along with their probabilities, and contrasted with the probabilities under the assumption of independent residues. An efficient method to generate realizations of the chain is also provided. The chain entropy, MIMC, and state probabilities provide the ingredients to distinguish different scenarios using the terminologies: MoRF (molecular recognition feature), not-MoRF, and not-IDP. A MoRF corresponds to large entropy and large MIMC (strong dependence among the residues' rotamer sampling), a not-MoRF corresponds to large entropy but small MIMC, and not-IDP corresponds to low entropy irrespective of the MIMC. We show that MoRFs are most appropriate as descriptors of IDPs. They provide a reasonable number of high-population states that reflect the dependences between neighbor residues, thus classifying them as IDPs, yet without very large entropy that might lead to a too high entropy penalty.
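The enumeration described in the abstract above can be sketched for a short chain. This is an illustrative toy, not the author's code: each residue has two rotamer states (0 or 1), the chain is assumed to start in the stationary distribution of a made-up transition matrix, and the chain mutual information is taken to be the sum of the singlet entropies minus the joint entropy. All names and numbers are hypothetical.

```python
from itertools import product
from math import log2


def chain_states(n, p_start, transition):
    """Probability of every one of the 2**n states of a two-state Markov chain."""
    probs = {}
    for state in product((0, 1), repeat=n):
        p = p_start[state[0]]
        for prev, cur in zip(state, state[1:]):
            p *= transition[prev][cur]
        probs[state] = p
    return probs


def entropy(probabilities):
    """Shannon entropy in bits."""
    return -sum(p * log2(p) for p in probabilities if p > 0)


n = 4
transition = [[0.9, 0.1], [0.2, 0.8]]  # made-up neighbor dependence
p_start = [2 / 3, 1 / 3]               # stationary distribution of `transition`

probs = chain_states(n, p_start, transition)
chain_entropy = entropy(probs.values())
singlet_entropy = entropy(p_start)

# Dependence among all residues: independent-residue entropy minus chain entropy.
mi_chain = n * singlet_entropy - chain_entropy

# Pair mutual information of two neighboring residues.
pair = [p_start[i] * transition[i][j] for i in (0, 1) for j in (0, 1)]
mi_pair = 2 * singlet_entropy - entropy(pair)
```

For a stationary chain defined this way, `mi_chain` equals `(n - 1) * mi_pair`, which is one concrete form of the proportionality between the chain and pair mutual information stated in the abstract. Exhaustive enumeration is only practical for small N, since the number of states doubles with each added residue.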
Genetic variability of porcine circovirus 2 (PCV2) field isolates from vaccinated and non-vaccinated pig herds in Germany.

PubMed

Reiner, Gerald; Hofmeister, Regina; Willems, Hermann

2015-10-22

Porcine circovirus 2 (PCV2) is responsible for a wide range of associated diseases (PCVD) affecting swine production worldwide. Highly efficient commercial vaccines induce protective immunity, but PCV2 is still circulating in vaccinated farms. For this reason, and because of the virus's high mutation rate, recent findings raise concerns about PCV2 strains capable of escaping vaccination. Based on 2156 samples from individual pigs of 315 herds from Germany, we describe a high effectiveness of vaccination between 2008 and the third quarter of 2011. In this period, virus load dropped continuously, and at the end of this period it hardly reached the limit of quantification. Thereafter, virus loads re-increased, although most of the herds were still vaccinated. Sixty-two randomly selected samples from vaccinated (n=28) and non-vaccinated (n=26) herds between 2008 and 2012 were completely sequenced. As compared to the PCV2b reference sequence, 259 polymorphisms were detected. Polymorphisms were analysed for associations with vaccination status, genotype (PCV2a/PCV2b), and virus load. PCV2a sequences were significantly repelled by PCV2b. One SNP at position 1182 (g.1182G>T), involved in capsid epitope formation, was significantly associated with the PCV2 genotype (2a/2b). Moreover, this SNP was affected by vaccination, with effects on allele frequencies and viral load, independent of the PCV2 genotype (2a/2b). We conclude that there is indeed evidence for a selective impact of vaccination on the PCV2 sequence, especially on nucleotides involved in epitope formation. Such variation might be responsible for the observed re-increase of PCV2 loads in samples from the end of 2011 in Germany. Copyright © 2015 Elsevier B.V. All rights reserved.

Prevalence of Complement-Mediated Cell Lysis-like Gene (sicG) in Streptococcus dysgalactiae subsp. equisimilis Isolates From Japan (2014-2016).

PubMed

Takahashi, Takashi; Fujita, Tomohiro; Shibayama, Akiyoshi; Tsuyuki, Yuzo; Yoshida, Haruno

2017-07-01

Streptococcus dysgalactiae subsp. equisimilis (SDSE; a β-hemolytic streptococcus of human or animal origin) infections are emerging worldwide. We evaluated the clonal distribution of complement-mediated cell lysis-like gene (sicG) among SDSE isolates from three central prefectures of Japan. Group G/C β-hemolytic streptococci were collected from three institutions from April 2014 to March 2016. Fifty-five strains (52 from humans and three from animals) were identified as SDSE on the basis of 16S rRNA sequencing data; they were obtained from 25 sterile (blood, joint fluid, and cerebrospinal fluid) and 30 non-sterile (skin-, respiratory tract-, and genitourinary tract-origin) samples. emm genotyping, multilocus sequence typing, sicG amplification/sequencing, and random amplified polymorphic DNA (RAPD) analysis of sicG-positive strains were performed. sicG was detected in 30.9% of the isolates (16 human and one canine) and the genes from the 16 human samples (blood, 10; open pus, 3; sputum, 2; throat swab, 1) and one canine sample (open pus) showed the same sequence pattern. All sicG-harboring isolates belonged to clonal complex (CC) 17, and the most prevalent emm type was stG6792 (82.4%). There was a significant association between sicG presence and the development of skin/soft tissue infections. CC17 isolates with sicG could be divided into three subtypes by RAPD analysis. CC17 SDSE harboring sicG might have spread into three closely-related prefectures in central Japan during 2014-2016. Clonal analysis of isolates from other areas might be needed to monitor potentially virulent strains in humans and animals. © The Korean Society for Laboratory Medicine
Maximizing lipocalin prediction through balanced and diversified training set and decision fusion.

PubMed

Nath, Abhigyan; Subbiah, Karthikeyan

2015-12-01

Lipocalins are short in sequence length and perform several important biological functions. These proteins have less than 20% sequence similarity among paralogs. Experimentally identifying them is an expensive and time-consuming process. Computational methods based on sequence similarity for allocating putative members to this family also remain elusive because of the low sequence similarity among the members of this family. Consequently, machine learning methods become a viable alternative for their prediction, using the underlying sequence- and structure-derived features as input. Ideally, any machine learning based prediction method must be trained with all possible variations in the input feature vector (all the sub-class input patterns) to achieve perfect learning. Near-perfect learning can be achieved by training the model with diverse types of input instances belonging to different regions of the entire input space. Furthermore, prediction performance can be improved by balancing the training set, as imbalanced data sets tend to bias prediction towards the majority class and its sub-classes. This paper aims to achieve (i) high generalization ability without any classification bias through diversified and balanced training sets and (ii) enhanced prediction accuracy by combining the results of individual classifiers with an appropriate fusion scheme. Instead of creating the training set randomly, we first used the unsupervised Kmeans clustering algorithm to create diversified clusters of input patterns and then created the diversified and balanced training set by selecting an equal number of patterns from each of these clusters. 
Finally, a probability based classifier fusion scheme was applied to the boosted random forest algorithm (which produced greater sensitivity) and the K nearest neighbour algorithm (which produced greater specificity) to achieve better predictive performance than that of the individual base classifiers. The performance of the learned models trained on the Kmeans preprocessed training set is far better than that of models trained on randomly generated training sets. The proposed method achieved a sensitivity of 90.6%, specificity of 91.4% and accuracy of 91.0% on the first test set and sensitivity of 92.9%, specificity of 96.2% and accuracy of 94.7% on the second blind test set. These results have established that diversifying the training set improves the performance of predictive models through superior generalization ability and that balancing the training set improves prediction accuracy. For smaller data sets, unsupervised Kmeans based sampling can be a more effective technique for increasing generalization than the usual random splitting method. Copyright © 2015 Elsevier Ltd. All rights reserved.
A measurement of disorder in binary sequences

NASA Astrophysics Data System (ADS)

Gong, Longyan; Wang, Haihong; Cheng, Weiwen; Zhao, Shengmei

2015-03-01

We propose a complex quantity, A_L, to characterize the degree of disorder of L-length binary symbolic sequences. As examples, we apply it to typical random and deterministic sequences. One kind of random sequence is generated from a periodic binary sequence and the other is generated from the logistic map. The deterministic sequences are the Fibonacci and Thue-Morse sequences. In these analyzed sequences, we find that the modulus of A_L, denoted by |A_L|, is a (statistically) equivalent quantity to the Boltzmann entropy, the metric entropy, the conditional block entropy and/or other quantities, so it is a useful quantitative measure of disorder. It can serve as a useful index to discern which sequence is more disordered. Moreover, there is one and only one value of |A_L| for the overall disorder characteristics. It requires extremely low computational cost and can be easily realized experimentally. For all these reasons, we believe that the proposed measure of disorder is a valuable complement to existing ones for symbolic sequences.
Genetic distances and phylogenetic trees of different Awassi sheep populations based on DNA sequencing.

PubMed

Al-Atiyat, R M; Aljumaah, R S

2014-08-27

This study aimed to estimate evolutionary distances and to reconstruct phylogenetic trees for different Awassi sheep populations. Thirty-two sheep from three different geographical areas of Jordan and the Kingdom of Saudi Arabia (KSA) were randomly sampled. DNA was extracted from the tissue samples and sequenced using the T7 promoter universal primer. Different phylogenetic trees were reconstructed from 0.64-kb DNA sequences using the MEGA software with the best general time reversible distance model. Three methods of distance estimation were then used. The maximum composite likelihood test was considered for reconstructing maximum likelihood, neighbor-joining and UPGMA trees. The maximum likelihood tree indicated three major clusters separated by cytosine (C) and thymine (T). The greatest distance was shown between the South sheep and North sheep. On the other hand, the KSA sheep as an outgroup showed a shorter evolutionary distance to the North sheep population than to the others. The neighbor-joining and UPGMA trees showed quite reliable clusters of evolutionary differentiation of Jordan sheep populations from the Saudi population. The overall results support geographical information and ecological types of the sheep populations studied. Summing up, the resulting phylogenetic trees may contribute to the limited information about the genetic relatedness and phylogeny of Awassi sheep in nearby Arab countries.
Score distributions of gapped multiple sequence alignments down to the low-probability tail

NASA Astrophysics Data System (ADS)

Fieth, Pascal; Hartmann, Alexander K.

2016-08-01

Assessing the significance of alignment scores of optimally aligned DNA or amino acid sequences can be achieved via the knowledge of the score distribution of random sequences. But this requires obtaining the distribution in the biologically relevant high-scoring region, where the probabilities are exponentially small. For gapless local alignments of infinitely long sequences this distribution is known analytically to follow a Gumbel distribution. Distributions for gapped local alignments and global alignments of finite lengths can only be obtained numerically. To obtain results for the small-probability region, specific statistical mechanics-based rare-event algorithms can be applied. In previous studies, this was achieved for pairwise alignments. They showed that, contrary to results from previous simple sampling studies, strong deviations from the Gumbel distribution occur in the case of finite sequence lengths. Here we extend the studies to multiple sequence alignments with gaps, which are much more relevant for practical applications in molecular biology. We study the distributions of scores over a large range of the support, reaching probabilities as small as 10^-160, for global and local (sum-of-pair scores) multiple alignments. We find that even after suitable rescaling, eliminating the sequence-length dependence, the distributions for multiple alignment differ from the pairwise alignment case. Furthermore, we also show that the previously discussed Gaussian correction to the Gumbel distribution needs to be refined, also for the case of pairwise alignments.
Recommendations for Accurate Resolution of Gene and Isoform Allele-Specific Expression in RNA-Seq Data

PubMed Central

Wood, David L. A.; Nones, Katia; Steptoe, Anita; Christ, Angelika; Harliwong, Ivon; Newell, Felicity; Bruxner, Timothy J. C.; Miller, David; Cloonan, Nicole; Grimmond, Sean M.

2015-01-01

Genetic variation modulates gene expression transcriptionally or post-transcriptionally, and can profoundly alter an individual’s phenotype. Measuring allelic differential expression at heterozygous loci within an individual, a phenomenon called allele-specific expression (ASE), can assist in identifying such factors. Massively parallel DNA and RNA sequencing and advances in bioinformatic methodologies provide an outstanding opportunity to measure ASE genome-wide. In this study, matched DNA and RNA sequencing, genotyping arrays and computationally phased haplotypes were integrated to comprehensively and conservatively quantify ASE in a single human brain and liver tissue sample. We describe a methodological evaluation and assessment of common bioinformatic steps for ASE quantification, and recommend a robust approach to accurately measure SNP, gene and isoform ASE through the use of personalized haplotype genome alignment, strict alignment quality control and intragenic SNP aggregation. Our results indicate that accurate ASE quantification requires careful bioinformatic analyses and is adversely affected by sample specific alignment confounders and random sampling even at moderate sequence depths. We identified multiple known and several novel ASE genes in liver, including WDR72, DSP and UBD, as well as genes that contained ASE SNPs with imbalance direction discordant with haplotype phase, explainable by annotated transcript structure, suggesting isoform derived ASE. The methods evaluated in this study will be of use to researchers performing highly conservative quantification of ASE, and the genes and isoforms identified as ASE of interest to researchers studying those loci. PMID:25965996
[High resolution melting analysis for detecting of JAK2V617F mutation in patients with myeloproliferative neoplasms].

PubMed

Chen, Hai-Hua; Yang, Ji-Long; Lu, Hui-Fang; Zhou, Wei-Jun; Yao, Fei; Deng, Lan

2014-02-01

This study investigated the feasibility of high resolution melting (HRM) analysis for detecting the JAK2V617F mutation in patients with myeloproliferative neoplasms (MPN). Twenty-nine marrow samples, randomly selected from patients with clinically diagnosed MPN between January 2008 and January 2011, were analyzed by HRM. The HRM results were compared with those obtained by allele-specific polymerase chain reaction (AS-PCR) and direct DNA sequencing. The JAK2V617F mutation was detected in 11 cases (37.9%, 11/29) by HRM, in 100% agreement with direct sequencing, whereas the consistency of AS-PCR with direct sequencing was moderate (Kappa = 0.179, P = 0.316). It is concluded that HRM analysis may be an optimal method for clinical screening of the JAK2V617F mutation owing to its simplicity, speed and high specificity.
IDENTIFICATION OF AVIAN-SPECIFIC FECAL METAGENOMIC SEQUENCES USING GENOME FRAGMENT ENRICHMENTS

EPA Science Inventory

Sequence analysis of microbial genomes has provided biologists the opportunity to compare genetic differences between closely related microorganisms. While random sequencing has also been used to study natural microbial communities, metagenomic comparisons via sequencing analysis...
Evolution in a Test Tube: Exploring the Structure and Function of RNA Probes

DTIC Science & Technology

2008-05-02

Bartel, D.P. and Szostak, J.W. (1993) Isolation of New Ribozymes from a Large Pool of Random Sequences. Science, New Series 261, 1141-1418. 24...Szostak, J.W. (1993) Isolation of New Ribozymes from a Large Pool of Random Sequences. Science, New Series 261, 1141-1418. Chen, Ying; Carlini
Dynamic laser speckle for non-destructive quality evaluation of bread

NASA Astrophysics Data System (ADS)

Stoykova, E.; Ivanov, B.; Shopova, M.; Lyubenova, T.; Panchev, I.; Sainov, V.

2010-10-01

Coherent illumination of a diffuse object yields a randomly varying interference pattern, which changes over time at any modification of the object. This phenomenon can be used for detection and visualization of physical or biological activity in various objects (e.g. fruits, seeds, coatings) through statistical description of laser speckle dynamics. The present report aims at non-destructive full-field evaluation of bread by spatial-temporal characterization of laser speckle. The main purpose of the conducted experiments was to prove the ability of the dynamic speckle method to indicate activity within the studied bread samples. In the set-up for acquisition and storage of dynamic speckle patterns an expanded beam from a DPSS laser (532 nm and 100mW) illuminated the sample through a ground glass diffuser. A CCD camera, adjusted to focus the sample, recorded regularly a sequence of images (8 bits and 780 x 582 squared pixels, sized 8.1 × 8.1 μm) at sampling frequency 0.25 Hz. A temporal structure function was calculated to evaluate activity of the bread samples in time using the full images in the sequence. In total, 7 samples of two types of bread were monitored during a chemical and physical process of bread's staling. Segmentation of images into matrixes of isometric fragments was also utilized. The results proved the potential of dynamic speckle as effective means for monitoring the process of bread staling and ability of this approach to differentiate between different types of bread.
Observation of quantum criticality with ultracold atoms in optical lattices

NASA Astrophysics Data System (ADS)

Zhang, Xibo

As biological problems become more complex and data grow at a rate much faster than that of computer hardware, new and faster algorithms are required. This dissertation investigates computational problems arising in two fields, comparative genomics and epigenomics, and employs a variety of computational techniques to address them. One fundamental question in the study of chromosome evolution is whether rearrangement breakpoints occur at random positions or at certain hotspots. We investigate the breakpoint reuse phenomenon and present analyses that support the more recently proposed fragile breakage model over the conventional random breakage model of chromosome evolution. The identification of syntenic regions between chromosomes forms the basis for studies of genome architecture, comparative genomics, and evolutionary genomics. Previous synteny block reconstruction algorithms could not be scaled to the large number of mammalian genomes being sequenced; neither did they address the issue of generating non-overlapping synteny blocks suitable for analyzing rearrangements and the evolutionary history of large-scale duplications prevalent in plant genomes. We present a new unified synteny block generation algorithm, based on the A-Bruijn graph framework, that overcomes these shortcomings. In epigenome sequencing, a sample may contain a mixture of epigenomes, and there is a need to resolve the distinct methylation patterns from the mixture. Many sequencing applications, such as haplotype inference for diploid or polyploid genomes and metagenomic sequencing, share a similar objective: to infer a set of distinct assemblies from reads that are sequenced from a heterogeneous sample and subsequently aligned to a reference genome. We model the problem from both combinatorial and statistical angles. First, we describe a theoretical framework. A linear-time algorithm is then given to resolve a minimum number of assemblies that are consistent with all reads, substantially improving on previous algorithms. An efficient algorithm is also described to determine a set of assemblies that is consistent with a maximum subset of the reads, a previously untreated problem. We then prove that allowing nested reads or permitting mismatches between reads and their assemblies renders these problems NP-hard. Second, we describe a mixture model-based approach and apply the model to the detection of allele-specific methylation.
Entropy and long-range memory in random symbolic additive Markov chains

NASA Astrophysics Data System (ADS)

Melnik, S. S.; Usatenko, O. V.

2016-06-01

The goal of this paper is to develop an estimate for the entropy of random symbolic sequences with elements belonging to a finite alphabet. As a plausible model, we use the high-order additive stationary ergodic Markov chain with long-range memory. Supposing that the correlations between random elements of the chain are weak, we express the conditional entropy of the sequence by means of the symbolic pair correlation function. We also examine an algorithm for estimating the conditional entropy of finite symbolic sequences. We show that the entropy contains two contributions, i.e., the correlation and the fluctuation. The obtained analytical results are used for numerical evaluation of the entropy of written English texts and DNA nucleotide sequences. The developed theory opens the way for constructing a more consistent and sophisticated approach to describe the systems with strong short-range and weak long-range memory.
Entropy and long-range memory in random symbolic additive Markov chains.

PubMed

Melnik, S S; Usatenko, O V

2016-06-01

The goal of this paper is to develop an estimate for the entropy of random symbolic sequences with elements belonging to a finite alphabet. As a plausible model, we use the high-order additive stationary ergodic Markov chain with long-range memory. Supposing that the correlations between random elements of the chain are weak, we express the conditional entropy of the sequence by means of the symbolic pair correlation function. We also examine an algorithm for estimating the conditional entropy of finite symbolic sequences. We show that the entropy contains two contributions, i.e., the correlation and the fluctuation. The obtained analytical results are used for numerical evaluation of the entropy of written English texts and DNA nucleotide sequences. The developed theory opens the way for constructing a more consistent and sophisticated approach to describe the systems with strong short-range and weak long-range memory.
Evaluation of a Real-Time Reverse Transcription-PCR Assay for Detection of Enterovirus D68 in Clinical Samples from an Outbreak in New York State in 2014.

PubMed

Zhuge, Jian; Vail, Eric; Bush, Jeffrey L; Singelakis, Lauren; Huang, Weihua; Nolan, Sheila M; Haas, Janet P; Engel, Helen; Della Posta, Millicent; Yoon, Esther C; Fallon, John T; Wang, Guiqing

2015-06-01

An outbreak of severe respiratory illness associated with enterovirus D68 (EV-D68) infection was reported in mid-August 2014 in the United States. In this study, we evaluated the diagnostic utility of an EV-D68-specific real-time reverse transcription-PCR (rRT-PCR) that was recently developed by the Centers for Disease Control and Prevention in clinical samples. Nasopharyngeal (NP) swab specimens from patients in a recent outbreak of respiratory illness in the lower Hudson Valley, New York State, were collected and examined for the presence of human rhinovirus or enterovirus using the FilmArray Respiratory Panel (RP) assay. Samples positive by RP were assessed using EV-D68 rRT-PCR, and the data were compared to results from sequencing analysis of partial VP1 and 5' untranslated region (5'-UTR) sequences of the EV genome. A total of 285 RP-positive NP specimens (260 from the 2014 outbreak and 25 from 2013) were analyzed by rRT-PCR; EV-D68 was detected in 74 of 285 (26.0%) specimens examined. Data for comparisons between rRT-PCR and sequencing analysis were obtained from 194 NP specimens. EV-D68 detection was confirmed by sequencing analysis in 71 of 74 positive and in 1 of 120 randomly selected negative specimens by rRT-PCR. The EV-D68 rRT-PCR showed diagnostic sensitivity and specificity of 98.6% and 97.5%, respectively. Our data suggest that the EV-D68 rRT-PCR is a reliable assay for detection of EV-D68 in clinical samples and has a potential to be used as a tool for rapid diagnosis and outbreak investigation of EV-D68-associated infections in clinical and public health laboratories. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
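The reported sensitivity and specificity can be reproduced from the counts given in the abstract. The short sketch below assumes that sequencing is treated as the reference method and that the 74 rRT-PCR-positive and 120 rRT-PCR-negative specimens form the full comparison set; the variable names are illustrative only.

```python
# Counts taken from the abstract; sequencing is assumed to be the reference.
true_pos = 71          # rRT-PCR positive, confirmed by sequencing
false_pos = 74 - 71    # rRT-PCR positive, not confirmed by sequencing
false_neg = 1          # rRT-PCR negative, but EV-D68 found by sequencing
true_neg = 120 - 1     # rRT-PCR negative, sequencing negative

sensitivity = true_pos / (true_pos + false_neg)   # 71 / 72
specificity = true_neg / (true_neg + false_pos)   # 119 / 122

print(f"sensitivity = {sensitivity:.1%}, specificity = {specificity:.1%}")
# prints "sensitivity = 98.6%, specificity = 97.5%"
```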
Evaluation of a Real-Time Reverse Transcription-PCR Assay for Detection of Enterovirus D68 in Clinical Samples from an Outbreak in New York State in 2014

PubMed Central

Zhuge, Jian; Vail, Eric; Bush, Jeffrey L.; Singelakis, Lauren; Huang, Weihua; Nolan, Sheila M.; Haas, Janet P.; Engel, Helen; Della Posta, Millicent; Yoon, Esther C.; Fallon, John T.

2015-01-01

An outbreak of severe respiratory illness associated with enterovirus D68 (EV-D68) infection was reported in mid-August 2014 in the United States. In this study, we evaluated the diagnostic utility of an EV-D68-specific real-time reverse transcription-PCR (rRT-PCR) that was recently developed by the Centers for Disease Control and Prevention in clinical samples. Nasopharyngeal (NP) swab specimens from patients in a recent outbreak of respiratory illness in the lower Hudson Valley, New York State, were collected and examined for the presence of human rhinovirus or enterovirus using the FilmArray Respiratory Panel (RP) assay. Samples positive by RP were assessed using EV-D68 rRT-PCR, and the data were compared to results from sequencing analysis of partial VP1 and 5′ untranslated region (5′-UTR) sequences of the EV genome. A total of 285 RP-positive NP specimens (260 from the 2014 outbreak and 25 from 2013) were analyzed by rRT-PCR; EV-D68 was detected in 74 of 285 (26.0%) specimens examined. Data for comparisons between rRT-PCR and sequencing analysis were obtained from 194 NP specimens. EV-D68 detection was confirmed by sequencing analysis in 71 of 74 positive and in 1 of 120 randomly selected negative specimens by rRT-PCR. The EV-D68 rRT-PCR showed diagnostic sensitivity and specificity of 98.6% and 97.5%, respectively. Our data suggest that the EV-D68 rRT-PCR is a reliable assay for detection of EV-D68 in clinical samples and has a potential to be used as a tool for rapid diagnosis and outbreak investigation of EV-D68-associated infections in clinical and public health laboratories. PMID:25854481
Examination of Sarcocystis spp. of giant snakes from Australia and Southeast Asia confirms presence of a known pathogen - Sarcocystis nesbitti.

PubMed

Wassermann, Marion; Raisch, Lisa; Lyons, Jessica Ann; Natusch, Daniel James Deans; Richter, Sarah; Wirth, Mareike; Preeprem, Piyarat; Khoprasert, Yuvaluk; Ginting, Sulaiman; Mackenstedt, Ute; Jäkel, Thomas

2017-01-01

We examined Sarcocystis spp. in giant snakes from the Indo-Australian Archipelago and Australia using a combination of morphological (size of sporocyst) and molecular analyses. We amplified by PCR nuclear 18S rDNA from single sporocysts in order to detect mixed infections and unequivocally assign the retrieved sequences to the corresponding parasite stage. Sarcocystis infection was generally high across the study area, with 78 (68%) of 115 examined pythons being infected by one or more Sarcocystis spp. Among 18 randomly chosen, sporocyst-positive samples (11 from Southeast Asia, 7 from Northern Australia) the only Sarcocystis species detected in Southeast Asian snakes was S. singaporensis (in reticulated pythons), which was absent from all Australian samples. We distinguished three different Sarcocystis spp. in the Australian sample set; two were excreted by scrub pythons and one by the spotted python. The sequence of the latter is an undescribed species phylogenetically related to S. lacertae. Of the two Sarcocystis species found in scrub pythons, one showed an 18S rRNA gene sequence similar to S. zamani, which is described from Australia for the first time. The second sequence was identical/similar to that of S. nesbitti, a known human pathogen that was held responsible for outbreaks of disease among tourists in Malaysia. The potential presence of S. nesbitti in Australia challenges the current hypothesis of a snake-primate life cycle, and would have implications for human health in the region. Further molecular and biological characterizations are required to confirm species identity and determine whether or not the Australian isolate has the same zoonotic potential as its Malaysian counterpart. Finally, the absence of S. nesbitti in samples from reticulated pythons (which were reported to be definitive hosts), coupled with our phylogenetic analyses, suggest that alternative snake hosts may be responsible for transmitting this parasite in Malaysia.
Parallel Mitogenome Sequencing Alleviates Random Rooting Effect in Phylogeography.

PubMed

Hirase, Shotaro; Takeshima, Hirohiko; Nishida, Mutsumi; Iwasaki, Wataru

2016-04-28

Reliably rooted phylogenetic trees play irreplaceable roles in clarifying diversification in the patterns of species and populations. However, such trees are often unavailable in phylogeographic studies, particularly when the focus is on rapidly expanded populations that exhibit star-like trees. A fundamental bottleneck is known as the random rooting effect, where a distant outgroup tends to root an unrooted tree "randomly." We investigated whether parallel mitochondrial genome (mitogenome) sequencing alleviates this effect in phylogeography using a case study on the Sea of Japan lineage of the intertidal goby Chaenogobius annularis. Eighty-three C. annularis individuals were collected and their mitogenomes were determined by high-throughput and low-cost parallel sequencing. Phylogenetic analysis of these mitogenome sequences was conducted to root the Sea of Japan lineage, which has a star-like phylogeny and had not been reliably rooted. The topologies of the bootstrap trees were investigated to determine whether the use of mitogenomes alleviated the random rooting effect. The mitogenome data successfully rooted the Sea of Japan lineage by alleviating the effect, which hindered phylogenetic analysis that used specific gene sequences. The reliable rooting of the lineage led to the discovery, with high bootstrap support, of a novel northern lineage that expanded during an interglacial period. Furthermore, the finding of this lineage suggested the existence of additional glacial refugia and provided a new recent calibration point that revised the divergence time estimation between the Sea of Japan and Pacific Ocean lineages. This study illustrates the effectiveness of parallel mitogenome sequencing for solving the random rooting problem in phylogeographic studies. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis

PubMed Central

Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

2012-01-01

RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications in sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, and sequence-specific information such as the total number of nucleotide bases and ATGC base contents with their respective percentages, as well as a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available. Availability: http://www.cemb.edu.pk/sw.html Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language. PMID:23055611
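For readers unfamiliar with the Nussinov algorithm mentioned above, the following is a minimal sketch of its basic form, which maximizes the number of nested base pairs. It is not RDNAnalyzer's code (the tool is written in C# and extends the algorithm in ways the abstract does not describe); the Watson-Crick pairing set for DNA and the `min_loop` parameter are assumptions made for this illustration.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (basic Nussinov recursion).

    min_loop is an assumed minimum number of unpaired bases enclosed
    by any pair; it is an illustrative parameter, not taken from the tool.
    """
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    # best[i][j] holds the optimum for the subsequence seq[i..j].
    best = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = best[i][j - 1]              # case 1: base j stays unpaired
            for k in range(i, j - min_loop):    # case 2: base j pairs with base k
                if (seq[k], seq[j]) in pairs:
                    left = best[i][k - 1] if k > i else 0
                    score = max(score, left + 1 + best[k + 1][j - 1])
            best[i][j] = score
    return best[0][n - 1] if n else 0


print(nussinov_max_pairs("GGGAAATCCC"))  # three G-C pairs form a hairpin stem
```

The table is filled by increasing subsequence length, so every smaller subproblem is available when it is needed; the cubic running time is the usual cost of this recursion.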
Repeats of base oligomers as the primordial coding sequences of the primeval earth and their vestiges in modern genes.

PubMed

Ohno, S

1984-01-01

Three outstanding properties uniquely qualify repeats of base oligomers as the primordial coding sequences of all polypeptide chains. First, when compared with randomly generated base sequences in general, they are more likely to have long open reading frames. Second, periodical polypeptide chains specified by such repeats are more likely to assume either alpha-helical or beta-sheet secondary structures than are polypeptide chains of random sequence. Third, provided that the number of bases in the oligomeric unit is not a multiple of 3, these internally repetitious coding sequences are impervious to randomly sustained base substitutions, deletions, and insertions. This is because the recurring periodicity of their polypeptide chains is given by three consecutive copies of the oligomeric unit translated in three different reading frames. Accordingly, when one reading frame is open, the other two are automatically open as well, all three being capable of coding for polypeptide chains of identical periodicity. Under this circumstance, a frame shift due to the deletion or insertion of a number of bases that is not a multiple of 3 fails to alter the down-stream amino acid sequence, and even a base change causing premature chain-termination can silence only one of the three potential coding units. Newly arisen coding sequences in modern organisms are oligomeric repeats, and most of the older genes retain various vestiges of their original internal repetitions. Some of the genes (e.g., oncogenes) have even inherited the property of being impervious to randomly sustained base changes.
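The reading-frame argument can be checked on a toy example. The sketch below uses a hypothetical 7-base unit (chosen only because 7 is not a multiple of 3 and the unit happens to contain no stop codon in any frame) and verifies that the three reading frames of its tandem repeat carry the same cycle of codons, merely shifted, so that all three frames are open together.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def codons(seq, frame):
    """Split seq into complete codons starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

def is_rotation(a, b):
    """True if list b is a cyclic rotation of list a."""
    return len(a) == len(b) and any(a[s:] + a[:s] == b for s in range(len(a)))

unit = "GCCGATC"      # hypothetical 7-base oligomer, not from the paper
seq = unit * 12       # tandem repeat, 84 bases
period = len(unit)    # the codon pattern repeats every 7 codons (21 bases)

frames = [codons(seq, f) for f in range(3)]
# Each frame reads the same 7-codon cycle, only starting at a different point.
same_cycle = all(is_rotation(frames[0][:period], fr[:period]) for fr in frames)
# No frame contains a stop codon, so all three frames are open.
all_open = all(c not in STOP_CODONS for fr in frames for c in fr)

print(same_cycle, all_open)  # prints "True True"
```

Because a shift into another frame only moves the starting point within the same codon cycle, the encoded periodicity is preserved, which is the sense in which such repeats are described as impervious to frame shifts.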
Verifying Digital Components of Physical Systems: Experimental Evaluation of Test Quality

NASA Astrophysics Data System (ADS)

Laputenko, A. V.; López, J. E.; Yevtushenko, N. V.

2018-03-01

This paper continues the study of high quality test derivation for verifying digital components which are used in various physical systems, such as sensors and data transfer components. For experimental evaluation we used logic circuits b01-b010 of the ITC'99 benchmark package (Second Release), which, as stated before, describe digital components of physical systems designed for various applications. Test sequences are derived for detecting the best-known faults of the reference logic circuit using three different approaches to test derivation. Three widely used fault types, namely stuck-at faults, bridges, and faults which slightly modify the behavior of one gate, are considered as possible faults of the reference behavior. The most interesting test sequences are short test sequences that can provide appropriate guarantees after testing, and thus we experimentally study various approaches to the derivation of so-called complete test suites, which detect all fault types. In the first series of experiments, we compare two approaches for deriving complete test suites. In the first approach, a shortest test sequence is derived for testing each fault. In the second approach, a test sequence is pseudo-randomly generated using appropriate software for logic synthesis and verification (the ABC system in our study) and thus can be longer. However, after deleting sequences that detect the same set of faults, the test suite returned by the second approach is shorter. This underlines the fact that in many cases it is useless to spend time and effort on deriving a shortest distinguishing sequence; it is better to apply test minimization afterwards. The experiments also show that using only randomly generated test sequences is not very efficient, since such sequences do not detect all the faults of any type. After the fault coverage reaches around 70%, saturation is observed and the fault coverage cannot be increased further. For deriving high quality short test suites, combining randomly generated sequences with sequences aimed at detecting the faults missed by random tests makes it possible to reach good fault coverage using the shortest test sequences.

Influence of Layup Sequence on the Surface Accuracy of Carbon Fiber Composite Space Mirrors

NASA Astrophysics Data System (ADS)

Yang, Zhiyong; Liu, Qingnian; Zhang, Boming; Xu, Liang; Tang, Zhanwen; Xie, Yongjie

2018-04-01

Layup sequence is directly related to stiffness and deformation resistance of the composite space mirror, and error caused by layup sequence can affect the surface precision of composite mirrors evidently. Variation of layup sequence with the same total thickness of composite space mirror changes surface form of the composite mirror, which is the focus of our study. In our research, the influence of varied quasi-isotropic stacking sequences and random angular deviation on the surface accuracy of composite space mirrors was investigated through finite element analyses (FEA). We established a simulation model for the studied concave mirror with 500 mm diameter, essential factors of layup sequences and random angular deviations on different plies were discussed. Five guiding findings were described in this study. Increasing total plies, optimizing stacking sequence and keeping consistency of ply alignment in ply placement are effective to improve surface accuracy of composite mirror.
The low information content of Neurospora splicing signals: implications for RNA splicing and intron origin.

PubMed

Collins, Richard A; Stajich, Jason E; Field, Deborah J; Olive, Joan E; DeAbreu, Diane M

2015-05-01

When we expressed a small (0.9 kb) nonprotein-coding transcript derived from the mitochondrial VS plasmid in the nucleus of Neurospora we found that it was efficiently spliced at one or more of eight 5' splice sites and ten 3' splice sites, which are present apparently by chance in the sequence. Further experimental and bioinformatic analyses of other mitochondrial plasmids, random sequences, and natural nuclear genes in Neurospora and other fungi indicate that fungal spliceosomes recognize a wide range of 5' splice site and branchpoint sequences and predict introns to be present at high frequency in random sequence. In contrast, analysis of intronless fungal nuclear genes indicates that branchpoint, 5' splice site and 3' splice site consensus sequences are underrepresented compared with random sequences. This underrepresentation of splicing signals is sufficient to deplete the nuclear genome of splice sites at locations that do not comprise biologically relevant introns. Thus, the splicing machinery can recognize a wide range of splicing signal sequences, but splicing still occurs with great accuracy, not because the splicing machinery distinguishes correct from incorrect introns, but because incorrect introns are substantially depleted from the genome. © 2015 Collins et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Hiding message into DNA sequence through DNA coding and chaotic maps.

PubMed

Liu, Guoyan; Liu, Hongjun; Kadir, Abdurahman

2014-09-01

The paper proposes an improved reversible substitution method to hide data in a deoxyribonucleic acid (DNA) sequence. Four measures are taken to enhance the robustness and enlarge the hiding capacity: encoding the secret message by DNA coding, encrypting it with a pseudo-random sequence, generating the relative hiding locations with a piecewise linear chaotic map, and embedding the encoded and encrypted message into a randomly selected DNA sequence using the complementary rule. The key space and the hiding capacity are analyzed. Experimental results indicate that the proposed method performs better than the competing methods with respect to robustness and capacity.
Recombination of polynucleotide sequences using random or defined primers

DOEpatents

Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin H; Giver, Lorraine J.

2000-01-01

A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Recombination of polynucleotide sequences using random or defined primers

DOEpatents

Arnold, Frances H.; Shao, Zhixin; Affholter, Joseph A.; Zhao, Huimin; Giver, Lorraine J.

2001-01-01

A method for in vitro mutagenesis and recombination of polynucleotide sequences based on polymerase-catalyzed extension of primer oligonucleotides is disclosed. The method involves priming template polynucleotide(s) with random-sequence or defined-sequence primers to generate a pool of short DNA fragments with a low level of point mutations. The DNA fragments are subjected to denaturation followed by annealing and further enzyme-catalyzed DNA polymerization. This procedure is repeated a sufficient number of times to produce full-length genes which comprise mutants of the original template polynucleotides. These genes can be further amplified by the polymerase chain reaction and cloned into a vector for expression of the encoded proteins.
Random oligonucleotide mutagenesis: application to a large protein coding sequence of a major histocompatibility complex class I gene, H-2DP.

PubMed Central

Murray, R; Pederson, K; Prosser, H; Muller, D; Hutchison, C A; Frelinger, J A

1988-01-01

We have used random oligonucleotide mutagenesis (or saturation mutagenesis) to create a library of point mutations in the alpha 1 protein domain of a Major Histocompatibility Complex (MHC) molecule. This protein domain is critical for T cell and B cell recognition. We altered the MHC class I H-2DP gene sequence such that synthetic mutant alpha 1 exons (270 bp of coding sequence), which contain mutations identified by sequence analysis, can replace the wild type alpha 1 exon. The synthetic exons were constructed from twelve overlapping oligonucleotides which contained an average of 1.3 random point mutations per intact exon. DNA sequence analysis of mutant alpha 1 exons has shown a point mutant distribution that fits a Poisson distribution, and thus emphasizes the utility of this mutagenesis technique to "scan" a large protein sequence for important mutations. We report our use of saturation mutagenesis to scan an entire exon of the H-2DP gene, a cassette strategy to replace the wild type alpha 1 exon with individual mutant alpha 1 exons, and analysis of mutant molecules expressed on the surface of transfected mouse L cells. PMID:2903482
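To make the Poisson statement concrete, the sketch below computes the expected fraction of exons carrying k point mutations, assuming a Poisson distribution with the reported mean of 1.3 mutations per exon. These are model expectations under that assumption, not the observed counts from the paper.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k events under a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

mean_mutations = 1.3  # average point mutations per exon, from the abstract

for k in range(5):
    print(f"P({k} mutations) = {poisson_pmf(k, mean_mutations):.3f}")
```

Under this model roughly 27% of exons would carry no mutation and about 35% exactly one, which is why a modest library is enough to sample many distinct single-site changes.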
Model parameter estimation approach based on incremental analysis for lithium-ion batteries without using open circuit voltage

NASA Astrophysics Data System (ADS)

Wu, Hongjie; Yuan, Shifei; Zhang, Xi; Yin, Chengliang; Ma, Xuerui

2015-08-01

To improve the suitability of lithium-ion battery models under varying conditions, such as fluctuating temperature and SoC variation, a dynamic model with parameters updated in real time should be developed. In this paper, an incremental analysis-based auto regressive exogenous (I-ARX) modeling method is proposed to eliminate the modeling error caused by the OCV effect and improve the accuracy of parameter estimation. Then, its numerical stability, modeling error, and parametric sensitivity are analyzed at different sampling intervals (0.02, 0.1, 0.5 and 1 s). To identify the model parameters recursively, a bias-correction recursive least squares (CRLS) algorithm is applied. Finally, pseudo random binary sequence (PRBS) and urban dynamic driving sequence (UDDS) profiles are applied to verify the real-time performance and robustness of the newly proposed model and algorithm. Different sampling rates (1 Hz and 10 Hz) and multiple temperature points (5, 25, and 45 °C) are covered in our experiments. The experimental and simulation results indicate that the proposed I-ARX model achieves high accuracy and suitability for parameter identification without using open circuit voltage.
Determining the Significance of Item Order in Randomized Problem Sets

ERIC Educational Resources Information Center

Pardos, Zachary A.; Heffernan, Neil T.

2009-01-01

Researchers who make tutoring systems would like to know which sequences of educational content lead to the most effective learning by their students. The majority of data collected in many ITS systems consist of answers to a group of questions of a given skill often presented in a random sequence. Following work that identifies which items…
Improved diagonal queue medical image steganography using Chaos theory, LFSR, and Rabin cryptosystem.

PubMed

Jain, Mamta; Kumar, Anil; Choudhary, Rishabh Charan

2017-06-01

In this article, we have proposed an improved diagonal queue medical image steganography for patient secret medical data transmission using chaotic standard map, linear feedback shift register, and Rabin cryptosystem, for improvement of previous technique (Jain and Lenka in Springer Brain Inform 3:39-51, 2016). The proposed algorithm comprises four stages, generation of pseudo-random sequences (pseudo-random sequences are generated by linear feedback shift register and standard chaotic map), permutation and XORing using pseudo-random sequences, encryption using Rabin cryptosystem, and steganography using the improved diagonal queues. Security analysis has been carried out. Performance analysis is observed using MSE, PSNR, maximum embedding capacity, as well as by histogram analysis between various Brain disease stego and cover images.
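To make the "pseudo-random sequence plus XORing" stage more concrete, here is a minimal sketch of a linear feedback shift register used as a keystream for XOR masking. It is an illustration only: the 16-bit register, the tap positions, the seed and the function names are assumptions rather than details from the article, and the chaotic map, permutation, Rabin encryption and embedding stages are not shown.

```python
def lfsr_bits(seed: int, count: int):
    """Yield `count` pseudo-random bits from a 16-bit Fibonacci LFSR.

    Taps 16, 14, 13, 11 (polynomial x^16 + x^14 + x^13 + x^11 + 1) are a
    textbook choice, not taken from the article.
    """
    if not 0 < seed < (1 << 16):
        raise ValueError("seed must be a non-zero 16-bit value")
    state = seed
    for _ in range(count):
        yield state & 1  # output the bit that is about to be shifted out
        feedback = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (feedback << 15)

def keystream(seed: int, length: int) -> bytes:
    """Pack LFSR bits into `length` bytes, most significant bit first."""
    bits = lfsr_bits(seed, 8 * length)
    out = bytearray()
    for _ in range(length):
        byte = 0
        for _ in range(8):
            byte = (byte << 1) | next(bits)
        out.append(byte)
    return bytes(out)

def xor_mask(data: bytes, seed: int) -> bytes:
    """XOR data with the keystream; applying it twice restores the data."""
    return bytes(d ^ k for d, k in zip(data, keystream(seed, len(data))))

secret = b"patient record"  # hypothetical payload
masked = xor_mask(secret, seed=0xACE1)
assert xor_mask(masked, seed=0xACE1) == secret
```

An LFSR on its own is not cryptographically secure, which is presumably why the scheme described here layers a chaotic map and a public-key cryptosystem on top of it.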
GuiTope: an application for mapping random-sequence peptides to protein sequences.

PubMed

Halperin, Rebecca F; Stafford, Phillip; Emery, Jack S; Navalkar, Krupa Arun; Johnston, Stephen Albert

2012-01-03

Random-sequence peptide libraries are a commonly used tool to identify novel ligands for binding antibodies, other proteins, and small molecules. It is often of interest to compare the selected peptide sequences to the natural protein binding partners to infer the exact binding site or the importance of particular residues. The ability to search a set of sequences for similarity to a set of peptides may sometimes enable the prediction of an antibody epitope or a novel binding partner. We have developed a software application designed specifically for this task. GuiTope provides a graphical user interface for aligning peptide sequences to protein sequences. All alignment parameters are accessible to the user including the ability to specify the amino acid frequency in the peptide library; these frequencies often differ significantly from those assumed by popular alignment programs. It also includes a novel feature to align di-peptide inversions, which we have found improves the accuracy of antibody epitope prediction from peptide microarray data and shows utility in analyzing phage display datasets. Finally, GuiTope can randomly select peptides from a given library to estimate a null distribution of scores and calculate statistical significance. GuiTope provides a convenient method for comparing selected peptide sequences to protein sequences, including flexible alignment parameters, novel alignment features, ability to search a database, and statistical significance of results. The software is available as an executable (for PC) at http://www.immunosignature.com/software and ongoing updates and source code will be available at sourceforge.net.
Trinucleotide cassettes increase diversity of T7 phage-displayed peptide library.

PubMed

Krumpe, Lauren R H; Schumacher, Kathryn M; McMahon, James B; Makowski, Lee; Mori, Toshiyuki

2007-10-05

Amino acid sequence diversity is introduced into a phage-displayed peptide library by randomizing library oligonucleotide DNA. We recently evaluated the diversity of peptide libraries displayed on T7 lytic phage and M13 filamentous phage and showed that T7 phage can display a more diverse amino acid sequence repertoire due to differing processes of viral morphogenesis. In this study, we evaluated and compared the diversity of a 12-mer T7 phage-displayed peptide library randomized using codon-corrected trinucleotide cassettes with a T7 and an M13 12-mer phage-displayed peptide library constructed using the degenerate codon randomization method. We herein demonstrate that the combination of trinucleotide cassette amino acid codon randomization and T7 phage display construction methods resulted in a significant enhancement to the functional diversity of a 12-mer peptide library. This novel library exhibited superior amino acid uniformity and order-of-magnitude increases in amino acid sequence diversity as compared to degenerate codon randomized peptide libraries. Comparative analyses of the biophysical characteristics of the 12-mer peptide libraries revealed the trinucleotide cassette-randomized library to be a unique resource. The combination of T7 phage display and trinucleotide cassette randomization resulted in a novel resource for the potential isolation of binding peptides for new and previously studied molecular targets.
A statistical approach to selecting and confirming validation targets in -omics experiments

PubMed Central

2012-01-01

Background Genomic technologies are, by their very nature, designed for hypothesis generation. In some cases, the hypotheses that are generated require that genome scientists confirm findings about specific genes or proteins. But one major advantage of high-throughput technology is that global genetic, genomic, transcriptomic, and proteomic behaviors can be observed. Manual confirmation of every statistically significant genomic result is prohibitively expensive. This has led researchers in genomics to adopt the strategy of confirming only a handful of the most statistically significant results, a small subset chosen for biological interest, or a small random subset. But there is no standard approach for selecting and quantitatively evaluating validation targets. Results Here we present a new statistical method and approach for statistically validating lists of significant results based on confirming only a small random sample. We apply our statistical method to show that the usual practice of confirming only the most statistically significant results does not statistically validate result lists. We analyze an extensively validated RNA-sequencing experiment to show that confirming a random subset can statistically validate entire lists of significant results. Finally, we analyze multiple publicly available microarray experiments to show that statistically validating random samples can both (i) provide evidence to confirm long gene lists and (ii) save thousands of dollars and hundreds of hours of labor over manual validation of each significant result. Conclusions For high-throughput -omics studies, statistical validation is a cost-effective and statistically valid approach to confirming lists of significant results. PMID:22738145
Methodological reporting quality of randomized controlled trials: A survey of seven core journals of orthopaedics from Mainland China over 5 years following the CONSORT statement.

PubMed

Zhang, J; Chen, X; Zhu, Q; Cui, J; Cao, L; Su, J

2016-11-01

In recent years, the number of randomized controlled trials (RCTs) in the field of orthopaedics has been increasing in Mainland China. However, RCTs are prone to bias if they lack methodological quality. We therefore surveyed RCTs to assess (1) the quality of RCTs in the field of orthopaedics in Mainland China and (2) whether there is a difference between the core Chinese orthopaedic journals and Orthopaedics Traumatology Surgery & Research (OTSR). This research aimed to evaluate, according to the CONSORT statement, the methodological reporting quality of RCTs in seven key orthopaedic journals published in Mainland China over the 5 years from 2010 to 2014. All articles published between 2010 and 2014 were hand-searched in the Chongqing VIP database. Studies were considered eligible if the words "random", "randomly", "randomization", or "randomized" were used to describe the method of allocation. Trials involving animals or cadavers, trials published as abstracts or case reports, trials dealing with subgroup analyses, and trials without outcomes were excluded. In addition, eight articles selected from OTSR between 2010 and 2014 were included for comparison. The identified RCTs were analyzed using a modified version of the Consolidated Standards of Reporting Trials (CONSORT), covering sample size calculation, allocation sequence generation, allocation concealment, blinding, and handling of dropouts. A total of 222 RCTs were identified in the seven core orthopaedic journals. No trials reported adequate sample size calculation, 74 (33.4%) reported adequate allocation generation, 8 (3.7%) reported adequate allocation concealment, 18 (8.1%) reported adequate blinding, and 16 (7.2%) reported handling of dropouts. In OTSR, 1 (12.5%) trial reported adequate sample size calculation, 4 (50.0%) reported adequate allocation generation, 1 (12.5%) reported adequate allocation concealment, 2 (25.0%) reported adequate blinding, and 5 (62.5%) reported handling of dropouts. There were statistically significant differences in sample size calculation and handling of dropouts between papers from Mainland China and OTSR (P<0.05). The findings show that the methodological reporting quality of RCTs in the seven core orthopaedic journals from Mainland China is far from satisfactory and needs further improvement to meet the standards of the CONSORT statement. Level III case control. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
Generating constrained randomized sequences: item frequency matters.

PubMed

French, Robert M; Perruchet, Pierre

2009-11-01

All experimental psychologists understand the importance of randomizing lists of items. However, randomization is generally constrained, and these constraints-in particular, not allowing immediately repeated items-which are designed to eliminate particular biases, frequently engender others. We describe a simple Monte Carlo randomization technique that solves a number of these problems. However, in many experimental settings, we are concerned not only with the number and distribution of items but also with the number and distribution of transitions between items. The algorithm mentioned above provides no control over this. We therefore introduce a simple technique that uses transition tables for generating correctly randomized sequences. We present an analytic method of producing item-pair frequency tables and item-pair transitional probability tables when immediate repetitions are not allowed. We illustrate these difficulties and how to overcome them, with reference to a classic article on word segmentation in infants. Finally, we provide free access to an Excel file that allows users to generate transition tables with up to 10 different item types, as well as to generate appropriately distributed randomized sequences of any length without immediately repeated elements. This file is freely available from http://leadserv.u-bourgogne.fr/IMG/xls/TransitionMatrix.xls.
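The constraint discussed in this abstract (no immediately repeated items) can be illustrated with a simple rejection-sampling shuffle, followed by a tally of item-to-item transitions. This is a generic sketch, not the authors' transition-table technique or their Excel tool; the function names and the example trial list are assumptions made for illustration.

```python
import random
from collections import Counter

def shuffle_no_repeats(items, rng, max_tries=10_000):
    """Shuffle `items` until no two neighbours are equal (rejection sampling)."""
    seq = list(items)
    for _ in range(max_tries):
        rng.shuffle(seq)
        if all(a != b for a, b in zip(seq, seq[1:])):
            return seq
    raise RuntimeError("no valid ordering found; constraints may be infeasible")

def transition_counts(seq):
    """Count how often each ordered pair of items occurs back to back."""
    return Counter(zip(seq, seq[1:]))

rng = random.Random(0)  # fixed seed so the example is reproducible
trial_list = ["A"] * 4 + ["B"] * 4 + ["C"] * 4  # hypothetical item set
sequence = shuffle_no_repeats(trial_list, rng)
print("".join(sequence))
print(transition_counts(sequence))
```

The shuffle keeps item frequencies fixed but gives no control over how often each transition occurs. Inspecting the output of `transition_counts` shows the kind of imbalance that motivates working from transition tables instead.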
Random sampling of the Central European bat fauna reveals the existence of numerous hitherto unknown adenoviruses.

PubMed

Vidovszky, Márton; Kohl, Claudia; Boldogh, Sándor; Görföl, Tamás; Wibbelt, Gudrun; Kurth, Andreas; Harrach, Balázs

2015-12-01

From over 1250 extant species of the order Chiroptera, 25 and 28 are known to occur in Germany and Hungary, respectively. Close to 350 samples originating from 28 bat species (17 from Germany, 27 from Hungary) were screened for the presence of adenoviruses (AdVs) using a nested PCR that targets the DNA polymerase gene of AdVs. An additional PCR was designed and applied to amplify a fragment from the gene encoding the IVa2 protein of mastadenoviruses. All German samples originated from organs of bats found moribund or dead. The Hungarian samples were excrements collected from colonies of known bat species, throat or rectal swab samples, taken from live individuals that had been captured for faunistic surveys and migration studies, as well as internal organs of dead specimens. Overall, 51 samples (14.73%) were found positive. We detected 28 seemingly novel and six previously described bat AdVs by sequencing the PCR products. The positivity rate was the highest among the guano samples of bat colonies. In phylogeny reconstructions, the AdVs detected in bats clustered roughly, but not perfectly, according to the hosts' families (Vespertilionidae, Rhinolophidae, Hipposideridae, Phyllostomidae and Pteropodidae). In a few cases, identical sequences were derived from animals of closely related species. On the other hand, some bat species proved to harbour more than one type of AdV. The high prevalence of infection and the large number of chiropteran species worldwide make us hypothesise that hundreds of different yet unknown AdV types might circulate in bats.
HCV Genotyping from NGS Short Reads and Its Application in Genotype Detection from HCV Mixed Infected Plasma

PubMed Central

Qiu, Ping; Stevens, Richard; Wei, Bo; Lahser, Fred; Howe, Anita Y. M.; Klappenbach, Joel A.; Marton, Matthew J.

2015-01-01

Genotyping of hepatitis C virus (HCV) plays an important role in the treatment of HCV. As new genotype-specific treatment options become available, it has become increasingly important to have accurate HCV genotype and subtype information to ensure that the most appropriate treatment regimen is selected. Most current genotyping methods are unable to detect mixed genotypes from two or more HCV infections. Next generation sequencing (NGS) allows for rapid and low cost mass sequencing of viral genomes and provides an opportunity to probe the viral population from a single host. In this paper, the possibility of using short NGS reads for direct HCV genotyping without genome assembly was evaluated. We surveyed the publicly-available genetic content of three HCV drug target regions (NS3, NS5A, NS5B) in terms of whether these genes contained genotype-specific regions that could predict genotype. Six genotypes and 38 subtypes were included in this study. An automated phylogenetic analysis based HCV genotyping method was implemented and used to assess different HCV target gene regions. Candidate regions of 250-bp each were found for all three genes that have enough genetic information to predict HCV genotypes/subtypes. Validation using public datasets shows 100% genotyping accuracy. To test whether these 250-bp regions were sufficient to identify mixed genotypes, we developed a random primer-based method to sequence HCV plasma samples containing mixtures of two HCV genotypes in different ratios. We were able to determine the genotypes without ambiguity and to quantify the ratio of the abundances of the mixed genotypes in the samples. These data provide a proof-of-concept that this random primed, NGS-based short-read genotyping approach does not need prior information about the viral population and is capable of detecting mixed viral infection. PMID:25830316
Cognitive functioning in opioid-dependent patients treated with buprenorphine, methadone, and other psychoactive medications: stability and correlates

PubMed Central

2011-01-01

Background In many, but not all, neuropsychological studies, buprenorphine-treated opioid-dependent patients have shown fewer cognitive deficits than patients treated with methadone. To examine whether the hypothesized cognitive advantage of buprenorphine over methadone is seen in clinical patients, we conducted a neuropsychological follow-up study in an unselected sample of buprenorphine- vs. methadone-treated patients. Methods In part I of the study, 14 buprenorphine-treated and 12 methadone-treated patients were given cognitive tests within two months (T1), 6-9 months (T2), and 12-17 months (T3) from the start of opioid substitution treatment. Fourteen healthy controls were examined at similar intervals. Benzodiazepine and other psychoactive comedications were common among the patients. Test results were analyzed with repeated measures analysis of variance and planned contrasts. In part II of the study, the patient sample was extended to include 36 patients at T2 and T3. Correlations between cognitive functioning and medication, substance abuse, or demographic variables were then analyzed. Results In part I, methadone patients were inferior to healthy controls in all tests measuring attention, working memory, or verbal memory. Buprenorphine patients were inferior to healthy controls in the first working memory task, the Paced Auditory Serial Addition Task, and in verbal memory. In the second working memory task, the Letter-Number Sequencing, their performance improved between T2 and T3. In part II, only group membership (buprenorphine vs. methadone) correlated significantly with attention performance and improvement in the Letter-Number Sequencing. High frequency of substance abuse in the past month was associated with poor performance in the Letter-Number Sequencing. Conclusions The results underline the differences between non-randomized and randomized studies comparing cognitive performance in opioid substitution treated patients (fewer deficits in buprenorphine patients vs. no difference between buprenorphine and methadone patients, respectively). Possible reasons for this are discussed. PMID:21854644
Universality of long-range correlations in expansion randomization systems

NASA Astrophysics Data System (ADS)

Messer, P. W.; Lässig, M.; Arndt, P. F.

2005-10-01

We study the stochastic dynamics of sequences evolving by single-site mutations, segmental duplications, deletions, and random insertions. These processes are relevant for the evolution of genomic DNA. They define a universality class of non-equilibrium 1D expansion-randomization systems with generic stationary long-range correlations in a regime of growing sequence length. We obtain explicitly the two-point correlation function of the sequence composition and the distribution function of the composition bias in sequences of finite length. The characteristic exponent χ of these quantities is determined by the ratio of two effective rates, which are explicitly calculated for several specific sequence evolution dynamics of the universality class. Depending on the value of χ, we find two different scaling regimes, which are distinguished by the detectability of the initial composition bias. All analytic results are accurately verified by numerical simulations. We also discuss the non-stationary build-up and decay of correlations, as well as more complex evolutionary scenarios, where the rates of the processes vary in time. Our findings provide a possible example for the emergence of universality in molecular biology.
The sampled-data consensus of multi-agent systems with probabilistic time-varying delays and packet losses

NASA Astrophysics Data System (ADS)

Sui, Xin; Yang, Yongqing; Xu, Xianyun; Zhang, Shuai; Zhang, Lingzhong

2018-02-01

This paper investigates the consensus of multi-agent systems with probabilistic time-varying delays and packet losses via sampled-data control. On the one hand, a Bernoulli-distributed white sequence is employed to model random packet losses among agents. On the other hand, a switched system is used to describe packet dropouts in a deterministic way. Based on the special property of the Laplacian matrix, the consensus problem can be converted into a stabilization problem of a switched system with lower dimensions. Some mean square consensus criteria are derived in terms of constructing an appropriate Lyapunov function and using linear matrix inequalities (LMIs). Finally, two numerical examples are given to show the effectiveness of the proposed method.
State estimator for multisensor systems with irregular sampling and time-varying delays

NASA Astrophysics Data System (ADS)

Peñarrocha, I.; Sanchis, R.; Romero, J. A.

2012-08-01

This article addresses the state estimation in linear time-varying systems with several sensors with different availability, randomly sampled in time and whose measurements have a time-varying delay. The approach is based on a modification of the Kalman filter with the negative-time measurement update strategy, avoiding running back the full standard Kalman filter, the use of full augmented order models or the use of reorganisation techniques, leading to a lower implementation cost algorithm. The update equations are run every time a new measurement is available, independently of the time when it was taken. The approach is useful for networked control systems, systems with long delays and scarce measurements and for out-of-sequence measurements.

Serial Reaction Time Learning in Preschool- and School-Age Children.

ERIC Educational Resources Information Center

Thomas, Kathleen M.; Nelson, Charles A.

2001-01-01

Two experiments assessed visuomotor sequence learning in 4- to 10-year-olds using a serial reaction time (SRT) task with random and sequenced trials. Found that children demonstrated sequence-specific decreases in RT. Participants with explicit awareness of the sequence at the session's end showed larger sequence-specific RT decrements than…
SNP-VISTA: An interactive SNP visualization tool

PubMed Central

Shah, Nameeta; Teplitsky, Michael V; Minovitsky, Simon; Pennacchio, Len A; Hugenholtz, Philip; Hamann, Bernd; Dubchak, Inna L

2005-01-01

Background Recent advances in sequencing technologies promise to provide a better understanding of the genetics of human disease as well as the evolution of microbial populations. Single Nucleotide Polymorphisms (SNPs) are established genetic markers that aid in the identification of loci affecting quantitative traits and/or disease in a wide variety of eukaryotic species. With today's technological capabilities, it has become possible to re-sequence a large set of appropriate candidate genes in individuals with a given disease in an attempt to identify causative mutations. In addition, SNPs have been used extensively in efforts to study the evolution of microbial populations, and the recent application of random shotgun sequencing to environmental samples enables more extensive SNP analysis of co-occurring and co-evolving microbial populations. The program is available at [1]. Results We have developed and present two modifications of an interactive visualization tool, SNP-VISTA, to aid in the analyses of the following types of data: A. Large-scale re-sequence data of disease-related genes for discovery of associated and/or causative alleles (GeneSNP-VISTA). B. Massive amounts of ecogenomics data for studying homologous recombination in microbial populations (EcoSNP-VISTA). The main features and capabilities of SNP-VISTA are: 1) mapping of SNPs to gene structure; 2) classification of SNPs, based on their location in the gene, frequency of occurrence in samples and allele composition; 3) clustering, based on user-defined subsets of SNPs, highlighting haplotypes as well as recombinant sequences; 4) integration of protein evolutionary conservation visualization; and 5) display of automatically calculated recombination points that are user-editable. Conclusion The main strength of SNP-VISTA is its graphical interface and use of visual representations, which support interactive exploration and hence better understanding of large-scale SNP data by the user. PMID:16336665
Classification of HCV and HIV-1 Sequences with the Branching Index

PubMed Central

Hraber, Peter; Kuiken, Carla; Waugh, Mark; Geer, Shaun; Bruno, William J.; Leitner, Thomas

2009-01-01

SUMMARY Classification of viral sequences should be fast, objective, accurate, and reproducible. Most methods that classify sequences use either pairwise distances or phylogenetic relations, but cannot discern when a sequence is unclassifiable. The branching index (BI) combines distance and phylogeny methods to compute a ratio that quantifies how closely a query sequence clusters with a subtype clade. In the hypothesis-testing framework of statistical inference, the BI is compared with a threshold to test whether sufficient evidence exists for the query sequence to be classified among known sequences. If above the threshold, the null hypothesis of no support for the subtype relation is rejected and the sequence is taken as belonging to the subtype clade with which it clusters on the tree. This study evaluates statistical properties of the branching index for subtype classification in HCV and HIV-1. Pairs of BI values with known positive and negative test results were computed from 10,000 random fragments of reference alignments. Sampled fragments were of sufficient length to contain phylogenetic signal that groups reference sequences together properly into subtype clades. For HCV, a threshold BI of 0.71 yields 95.1% agreement with reference subtypes, with equal false positive and false negative rates. For HIV-1, a threshold of 0.66 yields 93.5% agreement. Higher thresholds can be used where lower false positive rates are required. In synthetic recombinants, regions without breakpoints are recognized accurately; regions with breakpoints do not uniquely represent any known subtype. Web-based services for viral subtype classification with the branching index are available online. PMID:18753218
Markov and semi-Markov switching linear mixed models used to identify forest tree growth components.

PubMed

Chaubert-Pereira, Florence; Guédon, Yann; Lavergne, Christian; Trottier, Catherine

2010-09-01

Tree growth is assumed to be mainly the result of three components: (i) an endogenous component assumed to be structured as a succession of roughly stationary phases separated by marked change points that are asynchronous among individuals, (ii) a time-varying environmental component assumed to take the form of synchronous fluctuations among individuals, and (iii) an individual component corresponding mainly to the local environment of each tree. To identify and characterize these three components, we propose to use semi-Markov switching linear mixed models, i.e., models that combine linear mixed models in a semi-Markovian manner. The underlying semi-Markov chain represents the succession of growth phases and their lengths (endogenous component) whereas the linear mixed models attached to each state of the underlying semi-Markov chain represent-in the corresponding growth phase-both the influence of time-varying climatic covariates (environmental component) as fixed effects, and interindividual heterogeneity (individual component) as random effects. In this article, we address the estimation of Markov and semi-Markov switching linear mixed models in a general framework. We propose a Monte Carlo expectation-maximization like algorithm whose iterations decompose into three steps: (i) sampling of state sequences given random effects, (ii) prediction of random effects given state sequences, and (iii) maximization. The proposed statistical modeling approach is illustrated by the analysis of successive annual shoots along Corsican pine trunks influenced by climatic covariates. © 2009, The International Biometric Society.
Time multiplexing super-resolution nanoscopy based on the Brownian motion of gold nanoparticles

NASA Astrophysics Data System (ADS)

Ilovitsh, Tali; Ilovitsh, Asaf; Wagner, Omer; Zalevsky, Zeev

2017-02-01

Super-resolution localization microscopy can overcome the diffraction limit and achieve a tens of order improvement in resolution. It requires labeling the sample with fluorescent probes, followed by repeated cycles of their activation and photobleaching. This work presents an alternative approach that is free from direct labeling and does not require the activation and photobleaching cycles. Fluorescently labeled gold nanoparticles in a solution are distributed on top of the sample. The nanoparticles move in a random Brownian motion and interact with the sample. By obscuring different areas in the sample, the nanoparticles encode the sub-wavelength features. A sequence of images of the sample is captured and decoded by digital post-processing to create the super-resolution image. The achievable resolution is limited by the additive noise and the size of the nanoparticles. Regular nanoparticles with diameter smaller than 100 nm are barely seen in a conventional bright field microscope, thus fluorescently labeled gold nanoparticles were used, with proper
[Observation on gene polymorphism of Rh blood group in Chinese Han nationality].

PubMed

Lan, Jiong-Cai; Wang, Cong-Rong; Wei, Ya-Ming; Zhou, Hua-You; Cao, Qiong; Zhang, Yin-Ze; Jiang, KuReXi; Wu, Da-Lin; Liu, Zhong

2003-12-01

To observe the gene polymorphism of the Rh blood group in unrelated random individuals and families of Chinese Han nationality, polymerase chain reaction with sequence-specific primers (PCR-SSP) was used to amplify the Rh C/E gene, RhD gene, exons, introns 2 and 10, insert and Rh Box in 160 blood samples from RhD-positive unrelated individuals, 71 samples from RhD-negative unrelated individuals, and 7 samples from families whose probands were RhD-negative. The results showed that the RhD genes of RhD-negative individuals with C antigens were polymorphic: three forms of the D exons were found, namely intact, partially deleted, and completely deleted exons. Insert fragments and the Rh Box were found in most of the families whose probands were RhD-negative; their inheritance followed Mendel's laws and did not affect the expression of the RhD gene. A "normal" RhD exon 4 amplification product was not found in all of the samples. It was concluded that the gene structure of RhD-negative Chinese individuals is polymorphic; intact, partially deleted, and completely deleted exons were found in individuals with the C antigen, and a specific D (nf) Ce haplotype probably exists. The function of the insert is uncertain. The Rh gene sequences of the Chinese Han nationality differ from those of Caucasians, and an Rh gene library based on the Han nationality should be established.
Spread-Spectrum Beamforming and Clutter Filtering for Plane-Wave Color Doppler Imaging.

PubMed

Mansour, Omar; Poepping, Tamie L; Lacefield, James C

2016-07-21

Plane-wave imaging is desirable for its ability to achieve high frame rates, allowing the capture of fast dynamic events and continuous Doppler data. In most implementations of plane-wave imaging, multiple low-resolution images from different plane wave tilt angles are compounded to form a single high-resolution image, thereby reducing the frame rate. Compounding improves the lateral beam profile in the high-resolution image, but it also acts as a low-pass filter in slow time that causes attenuation and aliasing of signals with high Doppler shifts. This paper introduces a spread-spectrum color Doppler imaging method that produces high-resolution images without the use of compounding, thereby eliminating the tradeoff between beam quality, maximum unaliased Doppler frequency, and frame rate. The method uses a long, random sequence of transmit angles rather than a linear sweep of plane wave directions. The random angle sequence randomizes the phase of off-focus (clutter) signals, thereby spreading the clutter power in the Doppler spectrum, while keeping the spectrum of the in-focus signal intact. The ensemble of randomly tilted low-resolution frames also acts as the Doppler ensemble, so it can be much longer than a conventional linear sweep, thereby improving beam formation while also making the slow-time Doppler sampling frequency equal to the pulse repetition frequency. Experiments performed using a carotid artery phantom with constant flow demonstrate that the spread-spectrum method more accurately measures the parabolic flow profile of the vessel and outperforms conventional plane-wave Doppler in both contrast resolution and estimation of high flow velocities. The spread-spectrum method is expected to be valuable for Doppler applications that require measurement of high velocities at high frame rates.
Augmented brain function by coordinated reset stimulation with slowly varying sequences.

PubMed

Zeitler, Magteld; Tass, Peter A

2015-01-01

Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS.
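The three sequencing schemes named in this abstract (fixed, rapidly varying, slowly varying) can be sketched as one generator with a switching probability. This is an illustrative sketch only: the function names and the `switch_prob` parameter are assumptions, and the study's actual switching rule and repetition counts are not given here.

```python
import random


def cr_sequences(n_sites, n_cycles, switch_prob, seed=0):
    """Return one stimulation-site sequence per CR cycle.

    Each sequence is a permutation in which every site appears exactly once.
    After each cycle the sequence is replaced by a new random permutation
    with probability switch_prob (hypothetical parameter):
      switch_prob = 0.0   -> fixed sequence
      switch_prob = 1.0   -> rapidly varying sequences (RVS)
      small switch_prob   -> slowly varying sequences (SVS)
    """
    rng = random.Random(seed)
    current = list(range(n_sites))
    rng.shuffle(current)
    cycles = []
    for _ in range(n_cycles):
        cycles.append(tuple(current))
        if rng.random() < switch_prob:
            rng.shuffle(current)
    return cycles


def count_switches(cycles):
    """Number of cycle boundaries at which the sequence actually changed."""
    return sum(1 for a, b in zip(cycles, cycles[1:]) if a != b)


if __name__ == "__main__":
    for label, p in [("fixed", 0.0), ("SVS", 0.05), ("RVS", 1.0)]:
        print(label, count_switches(cr_sequences(4, 50, p)))
```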
Augmented brain function by coordinated reset stimulation with slowly varying sequences

PubMed Central

Zeitler, Magteld; Tass, Peter A.

2015-01-01

Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS. PMID:25873867
RFMix: A Discriminative Modeling Approach for Rapid and Robust Local-Ancestry Inference

PubMed Central

Maples, Brian K.; Gravel, Simon; Kenny, Eimear E.; Bustamante, Carlos D.

2013-01-01

Local-ancestry inference is an important step in the genetic analysis of fully sequenced human genomes. Current methods can only detect continental-level ancestry (i.e., European versus African versus Asian) accurately even when using millions of markers. Here, we present RFMix, a powerful discriminative modeling approach that is faster (∼30×) and more accurate than existing methods. We accomplish this by using a conditional random field parameterized by random forests trained on reference panels. RFMix is capable of learning from the admixed samples themselves to boost performance and autocorrect phasing errors. RFMix shows high sensitivity and specificity in simulated Hispanics/Latinos and African Americans and admixed Europeans, Africans, and Asians. Finally, we demonstrate that African Americans in HapMap contain modest (but nonzero) levels of Native American ancestry (∼0.4%). PMID:23910464
Optimized approach for Ion Proton RNA sequencing reveals details of RNA splicing and editing features of the transcriptome.

PubMed

Brown, Roger B; Madrid, Nathaniel J; Suzuki, Hideaki; Ness, Scott A

2017-01-01

RNA-sequencing (RNA-seq) has become the standard method for unbiased analysis of gene expression but also provides access to more complex transcriptome features, including alternative RNA splicing, RNA editing, and even detection of fusion transcripts formed through chromosomal translocations. However, differences in library methods can adversely affect the ability to recover these different types of transcriptome data. For example, some methods have bias for one end of transcripts or rely on low-efficiency steps that limit the complexity of the resulting library, making detection of rare transcripts less likely. We tested several commonly used methods of RNA-seq library preparation and found vast differences in the detection of advanced transcriptome features, such as alternatively spliced isoforms and RNA editing sites. By comparing several different protocols available for the Ion Proton sequencer and by utilizing detailed bioinformatics analysis tools, we were able to develop an optimized random primer based RNA-seq technique that is reliable at uncovering rare transcript isoforms and RNA editing features, as well as fusion reads from oncogenic chromosome rearrangements. The combination of optimized libraries and rapid Ion Proton sequencing provides a powerful platform for the transcriptome analysis of research and clinical samples.
Primer design for a prokaryotic differential display RT-PCR.

PubMed Central

Fislage, R; Berceanu, M; Humboldt, Y; Wendt, M; Oberender, H

1997-01-01

We have developed a primer set for a prokaryotic differential display of mRNA in the Enterobacteriaceae group. Each combination of ten 10mer and ten 11mer primers generates up to 85 bands from total Escherichia coli RNA, thus covering expressed sequences of a complete bacterial genome. Due to the lack of polyadenylation in prokaryotic RNA the type T11VN anchored oligonucleotides for the reverse transcriptase reaction had to be replaced with respect to the original method described by Liang and Pardee [ Science , 257, 967-971 (1992)]. Therefore, the sequences of both the 10mer and the new 11mer oligonucleotides were determined by a statistical evaluation of species-specific coding regions extracted from the EMBL database. The 11mer primers used for reverse transcription were selected for localization in the 3'-region of the bacterial RNA. The 10mer primers preferentially bind to the 5'-end of the RNA. None of the primers show homology to rRNA or other abundant small RNA species. Randomly sampled cDNA bands were checked for their bacterial origin either by re-amplification, cloning and sequencing or by re-amplification and direct sequencing with 10mer and 11mer primers after asymmetric PCR. PMID:9108168
Primer design for a prokaryotic differential display RT-PCR.

PubMed

Fislage, R; Berceanu, M; Humboldt, Y; Wendt, M; Oberender, H

1997-05-01

We have developed a primer set for a prokaryotic differential display of mRNA in the Enterobacteriaceae group. Each combination of ten 10mer and ten 11mer primers generates up to 85 bands from total Escherichia coli RNA, thus covering expressed sequences of a complete bacterial genome. Due to the lack of polyadenylation in prokaryotic RNA the type T11VN anchored oligonucleotides for the reverse transcriptase reaction had to be replaced with respect to the original method described by Liang and Pardee [ Science , 257, 967-971 (1992)]. Therefore, the sequences of both the 10mer and the new 11mer oligonucleotides were determined by a statistical evaluation of species-specific coding regions extracted from the EMBL database. The 11mer primers used for reverse transcription were selected for localization in the 3'-region of the bacterial RNA. The 10mer primers preferentially bind to the 5'-end of the RNA. None of the primers show homology to rRNA or other abundant small RNA species. Randomly sampled cDNA bands were checked for their bacterial origin either by re-amplification, cloning and sequencing or by re-amplification and direct sequencing with 10mer and 11mer primers after asymmetric PCR.
OPEN PROBLEM: Orbits' statistics in chaotic dynamical systems

NASA Astrophysics Data System (ADS)

Arnold, V.

2008-07-01

This paper shows how the measurement of the stochasticity degree of a finite sequence of real numbers, published by Kolmogorov in Italian in a journal of insurance statistics, can be usefully applied to measure the objective stochasticity degree of sequences originating from dynamical systems theory and from number theory. Namely, whenever the value of Kolmogorov's stochasticity parameter of a given sequence of numbers is too small (or too big), one may conclude that the conjecture describing this sequence as a sample of independent values of a random variable is highly improbable. Kolmogorov used this strategy (in a paper in 'Doklady', 1940) against Lysenko, who had tried to disprove Mendel's classical law of genetics experimentally. Calculating his stochasticity parameter value for the numbers from Lysenko's experiment reports, Kolmogorov deduced that, while these numbers differed from the exact fulfilment of Mendel's 3 : 1 law, any smaller deviation would be a manifestation of falsification of the reported numbers. The calculation of the values of the stochasticity parameter would be useful for many other generators of pseudorandom numbers and for many other chaotically looking statistics, including even the distribution of prime numbers (discussed in this paper as an example).
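As a rough illustration of the test described above, the sketch below computes the stochasticity parameter in its usual form, the Kolmogorov statistic lambda_n = sqrt(n) * sup |F_n(x) - F(x)|, and evaluates the limiting Kolmogorov distribution at that value. The function names are illustrative, and the truncation of the series at `terms` is an assumption of this sketch.

```python
import math


def stochasticity_parameter(values, cdf):
    """lambda_n = sqrt(n) * sup_x |F_n(x) - F(x)| for a hypothesised CDF F."""
    xs = sorted(values)
    n = len(xs)
    sup = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        sup = max(sup, abs((i + 1) / n - f), abs(i / n - f))
    return math.sqrt(n) * sup


def kolmogorov_cdf(lam, terms=100):
    """Limiting distribution: 1 - 2 * sum_{k>=1} (-1)^(k-1) exp(-2 k^2 lam^2)."""
    if lam <= 0:
        return 0.0
    tail = sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
               for k in range(1, terms + 1))
    return 1.0 - 2.0 * tail


if __name__ == "__main__":
    def uniform_cdf(x):
        return min(1.0, max(0.0, x))

    # An evenly spaced sample is "too regular" for independent uniform draws.
    regular = [(2 * i + 1) / 200 for i in range(100)]
    lam = stochasticity_parameter(regular, uniform_cdf)
    print(lam, kolmogorov_cdf(lam))
```

A value of the distribution function very close to 0 (parameter too small) or very close to 1 (parameter too big) makes the independence hypothesis implausible, which is the two-sided reading the abstract describes.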
Occurrence and Nonoccurrence of Random Sequences: Comment on Hahn and Warren (2009)

ERIC Educational Resources Information Center

Sun, Yanlong; Tweney, Ryan D.; Wang, Hongbin

2010-01-01

On the basis of the statistical concept of waiting time and on computer simulations of the "probabilities of nonoccurrence" (p. 457) for random sequences, Hahn and Warren (2009) proposed that given people's experience of a finite data stream from the environment, the gambler's fallacy is not as gross an error as it might seem. We deal with two…
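The two quantities mentioned here, waiting time and probability of nonoccurrence, can be computed exactly for fair-coin patterns. The sketch below is illustrative and is not taken from either paper: it uses the standard overlap formula for the expected waiting time and a small dynamic program over pattern-matching states for nonoccurrence; the function names are assumptions.

```python
def expected_waiting_time(pattern):
    """Expected number of fair-coin tosses until `pattern` first appears.

    Overlap formula: sum 2**k over every k for which the length-k prefix
    of the pattern equals its length-k suffix.
    """
    m = len(pattern)
    return sum(2 ** k for k in range(1, m + 1)
               if pattern[:k] == pattern[m - k:])


def prob_nonoccurrence(pattern, n_tosses):
    """Probability that `pattern` (over 'H'/'T') is absent from n fair tosses."""
    m = len(pattern)

    def next_state(state, toss):
        # Longest prefix of the pattern that is a suffix of what was matched.
        text = pattern[:state] + toss
        for k in range(min(m, len(text)), 0, -1):
            if text.endswith(pattern[:k]):
                return k
        return 0

    # probs[s] = probability of having matched s characters without ever
    # completing the pattern.
    probs = [0.0] * m
    probs[0] = 1.0
    for _ in range(n_tosses):
        updated = [0.0] * m
        for state, p in enumerate(probs):
            for toss in "HT":
                target = next_state(state, toss)
                if target < m:  # reaching state m means the pattern occurred
                    updated[target] += 0.5 * p
        probs = updated
    return sum(probs)


if __name__ == "__main__":
    for pat in ("HHHH", "HHHT"):
        print(pat, expected_waiting_time(pat), prob_nonoccurrence(pat, 20))
```

Although both patterns have the same probability at any fixed position, the self-overlapping pattern HHHH has the longer waiting time and is more likely to be absent from a short finite stream.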
A new feedback image encryption scheme based on perturbation with dynamical compound chaotic sequence cipher generator

NASA Astrophysics Data System (ADS)

Tong, Xiaojun; Cui, Minggen; Wang, Zhu

2009-07-01

A new compound two-dimensional chaotic function is designed from two one-dimensional chaotic functions that switch randomly, and it is used as a chaotic sequence generator, which is proved chaotic under Devaney's definition of chaos. The properties of the compound chaotic functions are also proved rigorously. To improve robustness against differential cryptanalysis and to produce an avalanche effect, a new feedback image encryption scheme is proposed that uses the compound chaos to select one of the two one-dimensional chaotic functions randomly, and a new pixel permutation and substitution method is designed in detail, with random control of array rows and columns based on the compound chaos. Results from entropy analysis, difference analysis, statistical analysis, sequence randomness analysis, and analysis of cipher sensitivity to the key and plaintext show that the compound chaotic sequence cipher can resist cryptanalytic, statistical and brute-force attacks, and in particular that it accelerates encryption and achieves a higher level of security. By means of the dynamical compound chaos and perturbation technology, the paper addresses the problem of low computer precision in one-dimensional chaotic functions.
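To make the idea of switching between two one-dimensional chaotic maps concrete, here is a toy keystream sketch. It is not the scheme from the paper and is not secure: the logistic and sine maps, the state-dependent switching rule, the re-seeding guard and all names are assumptions chosen only for illustration, and the paper's feedback, permutation and substitution steps are omitted.

```python
import math


def chaotic_keystream(key, n_bytes, burn_in=100):
    """Toy keystream from two 1-D chaotic maps selected by the current state."""
    if not 0.0 < key < 1.0:
        raise ValueError("key must lie strictly between 0 and 1")
    x = key
    stream = []
    for step in range(burn_in + n_bytes):
        if int(x * 1_000_000) % 2 == 0:
            x = 4.0 * x * (1.0 - x)      # logistic map on [0, 1]
        else:
            x = math.sin(math.pi * x)    # sine map on [0, 1]
        if x <= 0.0 or x >= 1.0:
            x = key                      # crude guard against a stuck orbit
        if step >= burn_in:
            stream.append(int(x * 256) % 256)
    return bytes(stream)


def xor_cipher(data, key):
    """XOR data with the keystream; applying it twice restores the input."""
    stream = chaotic_keystream(key, len(data))
    return bytes(d ^ s for d, s in zip(data, stream))


if __name__ == "__main__":
    message = b"random sequence"
    cipher = xor_cipher(message, 0.3141592)
    print(cipher.hex(), xor_cipher(cipher, 0.3141592))
```

Because the switch is driven by the map state rather than by an external random source, the same key reproduces the same keystream, which is what makes decryption possible in this sketch.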
Microbial contamination of contact lenses after scaling and root planing using ultrasonic scalers with and without protective eyewear: A clinical and microbiological study.

PubMed

Afzha, Rooh; Chatterjee, Anirban; Subbaiah, Shobha Krishna; Pradeep, Avani Rangaraju

2016-01-01

The ultrasonic scaler is a preferred treatment modality among clinicians. However, the aerosol/splatter generated is a concern for patients and practitioners. Therefore, the purpose of this study was to evaluate contamination of the dentist's contact lenses after scaling and root planing using ultrasonic scalers with and without protective eyewear. Thirty patients were randomly selected for scaling and root planing and divided into 2 groups of 15 each. Group A - dentist wearing contact lenses and protective eyewear. Group B - dentist wearing only contact lenses. After scaling and root planing using ultrasonic scalers, the lenses were subjected to culture and 16S rRNA (16S ribosomal RNA) gene sequencing. In Group A, 15 out of thirty samples were contaminated; in Group B, all thirty samples were contaminated. Most of the samples showed Gram-positive bacteria and 5 samples were contaminated with fungi. 16S rRNA gene sequencing of forty contaminated samples showed that 31 were contaminated with Streptococcus mutans and 9 with Staphylococcus aureus. Bearing in mind the study's limitation of lacking a negative control, we conclude that dental practitioners should avoid wearing contact lenses in a dental setting because of the risk of contamination from dental procedures that produce aerosol/splatter; if contact lenses are worn, protective eyewear is recommended.
Microbial contamination of contact lenses after scaling and root planing using ultrasonic scalers with and without protective eyewear: A clinical and microbiological study

PubMed Central

Afzha, Rooh; Chatterjee, Anirban; Subbaiah, Shobha Krishna; Pradeep, Avani Rangaraju

2016-01-01

Background: The ultrasonic scaler is a preferred treatment modality among clinicians. However, the aerosol/splatter generated is a concern for patients and practitioners. Therefore, the purpose of this study was to evaluate contamination of the dentist's contact lenses after scaling and root planing using ultrasonic scalers with and without protective eyewear. Materials and Methods: Thirty patients were randomly selected for scaling and root planing and divided into 2 groups of 15 each. Group A - dentist wearing contact lenses and protective eyewear. Group B - dentist wearing only contact lenses. After scaling and root planing using ultrasonic scalers, the lenses were subjected to culture and 16S rRNA (16S ribosomal RNA) gene sequencing. Results: In Group A, 15 out of thirty samples were contaminated; in Group B, all thirty samples were contaminated. Most of the samples showed Gram-positive bacteria and 5 samples were contaminated with fungi. 16S rRNA gene sequencing of forty contaminated samples showed that 31 were contaminated with Streptococcus mutans and 9 with Staphylococcus aureus. Conclusion: Bearing in mind the study's limitation of lacking a negative control, we conclude that dental practitioners should avoid wearing contact lenses in a dental setting because of the risk of contamination from dental procedures that produce aerosol/splatter; if contact lenses are worn, protective eyewear is recommended. PMID:27563200
Intestinal virome changes precede autoimmunity in type I diabetes-susceptible children

PubMed Central

Vatanen, Tommi; Droit, Lindsay; Kostic, Aleksandar D.; Poon, Tiffany W.; Vlamakis, Hera; Siljander, Heli; Härkönen, Taina; Hämäläinen, Anu-Maaria; Peet, Aleksandr; Tillmann, Vallo; Ilonen, Jorma; Wang, David; Knip, Mikael; Xavier, Ramnik J.

2017-01-01

Viruses have long been considered potential triggers of autoimmune diseases. Here we defined the intestinal virome from birth to the development of autoimmunity in children at risk for type 1 diabetes (T1D). A total of 220 virus-enriched preparations from serially collected fecal samples from 11 children (cases) who developed serum autoantibodies associated with T1D (of whom five developed clinical T1D) were compared with samples from controls. Intestinal viromes of case subjects were less diverse than those of controls. Among eukaryotic viruses, we identified significant enrichment of Circoviridae-related sequences in samples from controls in comparison with cases. Enterovirus, kobuvirus, parechovirus, parvovirus, and rotavirus sequences were frequently detected but were not associated with autoimmunity. For bacteriophages, we found higher Shannon diversity and richness in controls compared with cases and observed that changes in the intestinal virome over time differed between cases and controls. Using Random Forests analysis, we identified disease-associated viral bacteriophage contigs after subtraction of age-associated contigs. These disease-associated contigs were statistically linked to specific components of the bacterial microbiome. Thus, changes in the intestinal virome preceded autoimmunity in this cohort. Specific components of the virome were both directly and inversely associated with the development of human autoimmune disease. PMID:28696303
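For readers unfamiliar with the diversity measures named in this abstract, the following sketch computes richness and the Shannon diversity index H = -sum(p_i * ln p_i) from a vector of per-taxon counts. The function names and the example counts are hypothetical and are not data from the study.

```python
import math


def richness(counts):
    """Number of taxa (for example, phage contigs) with a nonzero count."""
    return sum(1 for c in counts if c > 0)


def shannon_diversity(counts):
    """Shannon index H = -sum(p * ln p) over taxa with nonzero counts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("counts must contain at least one observation")
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


if __name__ == "__main__":
    even_sample = [25, 25, 25, 25]    # hypothetical read counts per contig
    skewed_sample = [97, 1, 1, 1]
    print(richness(even_sample), shannon_diversity(even_sample))
    print(richness(skewed_sample), shannon_diversity(skewed_sample))
```

Two samples can have the same richness but different Shannon diversity, because the index also reflects how evenly the reads are spread across taxa.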
Molecular Epidemiology of Rhinovirus Detections in Young Children

PubMed Central

Howard, Leigh M.; Johnson, Monika; Gil, Ana I.; Griffin, Marie R.; Edwards, Kathryn M.; Lanata, Claudio F.; Williams, John V.; Grijalva, Carlos G.

2016-01-01

Background. Human rhinoviruses (HRVs) are frequently detected in children with acute respiratory illnesses (ARIs) but also in asymptomatic children. We compared features of ARI with HRV species (A, B, C) and determined genotypes associated with repeated HRV detections within individuals. Methods. We used clinical data and respiratory samples obtained from children <3 years old during weekly active household-based surveillance. A random subset of samples in which HRV was detected from individuals during both ARI and an asymptomatic period within 120 days of the ARI were genotyped. Features of ARI were compared among HRV species. Concordance of genotype among repeated HRV detections within individuals was assessed. Results. Among 207 ARI samples sequenced, HRV-A, HRV-B, and HRV-C were detected in 104 (50%), 20 (10%), and 83 (40%), respectively. Presence of fever, decreased appetite, and malaise were significantly higher in children with HRV-B. When codetections with other viruses were excluded (n = 155), these trends persisted, but some did not reach statistical significance. When 58 paired sequential HRV detections during asymptomatic and ARI episodes were sequenced, only 9 (16%) were identical genotypes of HRV. Conclusions. Clinical features may differ among HRV species. Repeated HRV detections in young children frequently represented acquisition of new HRV strains. PMID:26900577

Canine parvovirus in asymptomatic feline carriers.

PubMed

Clegg, S R; Coyne, K P; Dawson, S; Spibey, N; Gaskell, R M; Radford, A D

2012-05-25

Canine parvovirus (CPV) and feline panleukopaenia virus (FPLV) are two closely related viruses, which are known to cause severe disease in younger unvaccinated animals. As well as causing disease in their respective hosts, CPV has recently acquired the feline host range, allowing it to infect both cats and dogs. As well as causing disease in dogs, there is evidence that under some circumstances CPV may also cause disease in cats. This study has investigated the prevalence of parvoviruses in the faeces of clinically healthy cats and dogs in two rescue shelters. Canine parvovirus was demonstrated in 32.5% (13/50) of faecal samples in a cross sectional study of 50 cats from a feline only shelter, and 33.9% (61/180) of faecal samples in a longitudinal study of 74 cats at a mixed canine and feline shelter. Virus was isolated in cell cultures of both canine and feline origin from all PCR-positive samples suggesting they contained viable, infectious virus. In contrast to the high CPV prevalence in cats, no FPLV was found, and none of 122 faecal samples from dogs, or 160 samples collected from the kennel environment, tested positive for parvovirus by PCR. Sequence analysis of major capsid VP2 gene from all positive samples, as well as the non-structural gene from 18 randomly selected positive samples, showed that all positive cats were shedding CPV2a or 2b, rather than FPLV. Longitudinal sampling in one shelter showed that all cats appeared to shed the same virus sequence type at each date they were positive (up to six weeks), despite a lack of clinical signs. Fifty percent of the sequences obtained here were shown to be similar to those recently obtained in a study of sick dogs in the UK (Clegg et al., 2011). These results suggest that in some circumstances, clinically normal cats may be able to shed CPV for prolonged periods of time, and raise the possibility that such cats may be important reservoirs for the maintenance of infection in both the cat and the dog population. Copyright © 2011 Elsevier B.V. All rights reserved.
Sequence analysis reveals genomic factors affecting EST-SSR primer performance and polymorphism

USDA-ARS's Scientific Manuscript database

Search for simple sequence repeat (SSR) motifs and design of flanking primers in expressed sequence tag (EST) sequences can be easily done at a large scale using bioinformatics programs. However, failed amplification and/or detection, along with lack of polymorphism, is often seen among randomly sel...
Molecular epidemiology of pathogenic Leptospira spp. in the straw-colored fruit bat (Eidolon helvum) migrating to Zambia from the Democratic Republic of Congo.

PubMed

Ogawa, Hirohito; Koizumi, Nobuo; Ohnuma, Aiko; Mutemwa, Alisheke; Hang'ombe, Bernard M; Mweene, Aaron S; Takada, Ayato; Sugimoto, Chihiro; Suzuki, Yasuhiko; Kida, Hiroshi; Sawa, Hirofumi

2015-06-01

The role played by bats as a potential source of transmission of Leptospira spp. to humans is poorly understood, despite various pathogenic Leptospira spp. being identified in these mammals. Here, we investigated the prevalence and diversity of pathogenic Leptospira spp. that infect the straw-colored fruit bat (Eidolon helvum). We captured this bat species, which is widely distributed in Africa, in Zambia during 2008-2013. We detected the flagellin B gene (flaB) from pathogenic Leptospira spp. in kidney samples from 79 of 529 E. helvum (14.9%) bats. Phylogenetic analysis of 70 flaB fragments amplified from E. helvum samples and previously reported sequences, revealed that 12 of the fragments grouped with Leptospira borgpetersenii and Leptospira kirschneri; however, the remaining 58 flaB fragments appeared not to be associated with any reported species. Additionally, the 16S ribosomal RNA gene (rrs) amplified from 27 randomly chosen flaB-positive samples was compared with previously reported sequences, including bat-derived Leptospira spp. All 27 rrs fragments clustered into a pathogenic group. Eight fragments were located in unique branches, the other 19 fragments were closely related to Leptospira spp. detected in bats. These results show that rrs sequences in bats are genetically related to each other without regional variation, suggesting that Leptospira are evolutionarily well-adapted to bats and have uniquely evolved in the bat population. Our study indicates that pathogenic Leptospira spp. in E. helvum in Zambia have unique genotypes. Copyright © 2015 The Authors. Published by Elsevier B.V. All rights reserved.
Efficient Detection of Copy Number Mutations in PMS2 Exons with a Close Homolog.

PubMed

Herman, Daniel S; Smith, Christina; Liu, Chang; Vaughn, Cecily P; Palaniappan, Selvi; Pritchard, Colin C; Shirts, Brian H

2018-07-01

Detection of 3' PMS2 copy-number mutations that cause Lynch syndrome is difficult because of highly homologous pseudogenes. To improve the accuracy and efficiency of clinical screening for these mutations, we developed a new method to analyze standard capture-based, next-generation sequencing data to identify deletions and duplications in PMS2 exons 9 to 15. The approach captures sequences using PMS2 targets, maps sequences randomly among regions with equal mapping quality, counts reads aligned to homologous exons and introns, and flags read count ratios outside of empirically derived reference ranges. The method was trained on 1352 samples, including 8 known positives, and tested on 719 samples, including 17 known positives. Clinical implementation of the first version of this method detected new mutations in the training (N = 7) and test (N = 2) sets that had not been identified by our initial clinical testing pipeline. The described final method showed complete sensitivity in both sample sets and false-positive rates of 5% (training) and 7% (test), dramatically decreasing the number of cases needing additional mutation evaluation. This approach leveraged the differences between gene and pseudogene to distinguish between PMS2 and PMS2CL copy-number mutations. These methods enable efficient and sensitive Lynch syndrome screening for 3' PMS2 copy-number mutations and may be applied similarly to other genomic regions with highly homologous pseudogenes. Copyright © 2018 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
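The flagging step and the reported performance measures can be sketched as follows. This is a simplified illustration, not the published pipeline: normalising each exon's read count by a single control count, the reference ranges, and all function names are assumptions.

```python
def read_count_ratios(exon_counts, control_count):
    """Ratio of each exon's read count to a control count (assumed scheme)."""
    return {exon: count / control_count for exon, count in exon_counts.items()}


def flag_exons(ratios, reference_ranges):
    """Return exons whose ratio falls outside its (low, high) reference range."""
    flagged = []
    for exon, ratio in ratios.items():
        low, high = reference_ranges[exon]
        if ratio < low or ratio > high:
            flagged.append(exon)
    return flagged


def sensitivity_and_false_positive_rate(calls, truth):
    """calls[i] is True if sample i was flagged; truth[i] if truly positive."""
    tp = sum(1 for c, t in zip(calls, truth) if c and t)
    fn = sum(1 for c, t in zip(calls, truth) if not c and t)
    fp = sum(1 for c, t in zip(calls, truth) if c and not t)
    tn = sum(1 for c, t in zip(calls, truth) if not c and not t)
    return tp / (tp + fn), fp / (fp + tn)


if __name__ == "__main__":
    ranges = {"exon13": (0.8, 1.2), "exon14": (0.8, 1.2)}   # hypothetical
    ratios = read_count_ratios({"exon13": 100, "exon14": 55}, control_count=100)
    print(flag_exons(ratios, ranges))
```

In this framing, a ratio well below the range is the kind of signal a heterozygous deletion would produce, and flagged samples would then go on to the additional mutation evaluation mentioned in the abstract.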
Analysis of in vitro evolution reveals the underlying distribution of catalytic activity among random sequences.

PubMed

Pressman, Abe; Moretti, Janina E; Campbell, Gregory W; Müller, Ulrich F; Chen, Irene A

2017-08-21

The emergence of catalytic RNA is believed to have been a key event during the origin of life. Understanding how catalytic activity is distributed across random sequences is fundamental to estimating the probability that catalytic sequences would emerge. Here, we analyze the in vitro evolution of triphosphorylating ribozymes and translate their fitnesses into absolute estimates of catalytic activity for hundreds of ribozyme families. The analysis efficiently identified highly active ribozymes and estimated catalytic activity with good accuracy. The evolutionary dynamics follow Fisher's Fundamental Theorem of Natural Selection and a corollary, permitting retrospective inference of the distribution of fitness and activity in the random sequence pool for the first time. The frequency distribution of rate constants appears to be log-normal, with a surprisingly steep dropoff at higher activity, consistent with a mechanism for the emergence of activity as the product of many independent contributions. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
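The link between a log-normal distribution and "the product of many independent contributions" follows from the central limit theorem applied to logarithms. The simulation below is a hedged illustration of that statistical point only: the uniform factor range, the number of contributions and the function names are arbitrary assumptions and say nothing about real ribozyme rate constants.

```python
import math
import random
import statistics


def simulate_activities(n_sequences, n_contributions, seed=0):
    """Toy activities, each the product of independent positive factors."""
    rng = random.Random(seed)
    activities = []
    for _ in range(n_sequences):
        value = 1.0
        for _ in range(n_contributions):
            value *= rng.uniform(0.5, 1.5)   # hypothetical contribution
        activities.append(value)
    return activities


def log_moments(activities):
    """Mean and standard deviation of ln(activity)."""
    logs = [math.log(a) for a in activities]
    return statistics.mean(logs), statistics.stdev(logs)


if __name__ == "__main__":
    acts = simulate_activities(2000, 50)
    print("mean  :", statistics.mean(acts))
    print("median:", statistics.median(acts))
    print("ln moments:", log_moments(acts))
```

The logarithm of each product is a sum of independent terms and is therefore approximately normal, so the products themselves are approximately log-normal, with a mean well above the median because of the long right tail.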
CRF: detection of CRISPR arrays using random forest.

PubMed

Wang, Kai; Liang, Chun

2017-01-01

CRISPRs (clustered regularly interspaced short palindromic repeats) are particular repeat sequences found in a wide range of bacterial and archaeal genomes. Several tools are available for detecting CRISPR arrays in the genomes of both domains. Here we developed a new web-based CRISPR detection tool named CRF (CRISPR Finder by Random Forest). Unlike other CRISPR detection tools, CRF uses a random forest classifier to filter out invalid CRISPR arrays from all putative candidates, which enhances detection accuracy. In particular, triplet elements that combine both sequence content and structure information were extracted from CRISPR repeats for classifier training. The classifier achieved high accuracy and sensitivity. Moreover, CRF offers a highly interactive web interface for robust data visualization that is not available among other CRISPR detection tools. After detection, the query sequence, CRISPR array architecture, and the sequences and secondary structures of CRISPR repeats and spacers can be visualized for visual examination and validation. CRF is freely available at http://bioinfolab.miamioh.edu/crf/home.php.
Non-random distribution and co-localization of purine/pyrimidine-encoded information and transcriptional regulatory domains.

PubMed

Povinelli, C M

1992-01-01

In order to detect sequence-based information predictive for the location of eukaryotic transcriptional regulatory domains, the frequencies and distributions of the 36 possible purine/pyrimidine reverse complement hexamer pairs were determined for test sets of real and random sequences. The distribution of one of the hexamer pairs (RRYYRR/YYRRYY, referred to as M1) was further examined in a larger set of sequences (> 32 genes, 230 kb). Predominant clusters of M1 and the locations of eukaryotic transcriptional regulatory domains were found to be associated and non-randomly distributed along the DNA consistent with a periodicity of approximately 1.2 kb. In the context of higher ordered chromatin this would align promoters, enhancers and the predominant clusters of M1 longitudinally along one face of a 30 nm fiber. Using only information about the distribution of the M1 motif, 50-70% of a sequence could be eliminated as being unlikely to contain transcriptional regulatory domains with an 87% recovery of the regulatory domains present.
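The count of 36 hexamer pairs can be checked by enumeration: there are 64 purine/pyrimidine (R/Y) hexamers, 8 of which are their own reverse complement, giving (64 - 8) / 2 + 8 = 36 pairs. The sketch below performs that enumeration and locates M1 occurrences in a DNA string; the function names are illustrative and the scan is a plain sliding window, not the analysis used in the paper.

```python
from itertools import product

COMPLEMENT = {"R": "Y", "Y": "R"}
TO_RY = {"A": "R", "G": "R", "C": "Y", "T": "Y"}


def reverse_complement(word):
    """Reverse complement in the R/Y alphabet (purine pairs with pyrimidine)."""
    return "".join(COMPLEMENT[c] for c in reversed(word))


def hexamer_pairs():
    """All distinct {hexamer, reverse complement} pairs over the R/Y alphabet."""
    pairs = set()
    for letters in product("RY", repeat=6):
        word = "".join(letters)
        pairs.add(tuple(sorted((word, reverse_complement(word)))))
    return pairs


def to_ry(dna):
    """Translate DNA to the R/Y alphabet; other symbols become 'N'."""
    return "".join(TO_RY.get(base, "N") for base in dna.upper())


def motif_positions(dna, motif="RRYYRR"):
    """0-based start positions of the motif or its reverse complement."""
    ry = to_ry(dna)
    targets = {motif, reverse_complement(motif)}
    width = len(motif)
    return [i for i in range(len(ry) - width + 1) if ry[i:i + width] in targets]


if __name__ == "__main__":
    print(len(hexamer_pairs()))
    print(motif_positions("TTAGCTAGCC"))
```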
DOE Office of Scientific and Technical Information (OSTI.GOV)

Allen, J; Velsko, S

This report explores the question of whether meaningful conclusions can be drawn regarding the transmission relationship between two microbial samples on the basis of differences observed between the two samples' respective genomes. Unlike similar forensic applications using human DNA, the rapid rate of microbial genome evolution combined with the dynamics of infectious disease requires a shift in thinking on what it means for two samples to 'match' in support of a forensic hypothesis. Previous outbreaks of SARS-CoV, FMDV and HIV were examined to investigate how microbial sequence data can be used to draw inferences that link two infected individuals by direct transmission. The results are counterintuitive with respect to human DNA forensic applications in that some genetic change, rather than exact matching, improves confidence in inferring direct transmission links; however, too much genetic change poses challenges, which can weaken confidence in inferred links. High rates of infection coupled with relatively weak selective pressure observed in the SARS-CoV and FMDV data lead to fairly low confidence for direct transmission links. Confidence values for forensic hypotheses increased when testing for the possibility that samples are separated by at most a few intermediate hosts. Moreover, the observed outbreak conditions support the potential to provide high confidence values for hypotheses that exclude direct transmission links. Transmission inferences are based on the total number of observed or inferred genetic changes separating two sequences rather than uniquely weighing the importance of any one genetic mismatch. Thus, inferences are surprisingly robust in the presence of sequencing errors provided the error rates are randomly distributed across all samples in the reference outbreak database and the novel sequence samples in question. When the number of observed nucleotide mutations is limited due to characteristics of the outbreak or the availability of only partial rather than whole genome sequencing, indel information was shown to have the potential to improve performance but only for select outbreak conditions. In examined HIV transmission cases, extended evolution proved to be the limiting factor in assigning high confidence to transmission links; however, the potential to correct for extended evolution not associated with transmission events is demonstrated. Outbreak-specific conditions such as selective pressure (in the form of varying mutation rate) are shown to impact the strength of inference made, and a Monte Carlo simulation tool is introduced, which is used to provide upper and lower bounds on the confidence values associated with a forensic hypothesis.
Construction and sequence sampling of deep-coverage, large-insert BAC libraries for three model lepidopteran species

PubMed Central

Wu, Chengcang; Proestou, Dina; Carter, Dorothy; Nicholson, Erica; Santos, Filippe; Zhao, Shaying; Zhang, Hong-Bin; Goldsmith, Marian R

2009-01-01

Background Manduca sexta, Heliothis virescens, and Heliconius erato represent three widely-used insect model species for genomic and fundamental studies in Lepidoptera. Large-insert BAC libraries of these insects are critical resources for many molecular studies, including physical mapping and genome sequencing, but not available to date. Results We report the construction and characterization of six large-insert BAC libraries for the three species and sampling sequence analysis of the genomes. The six BAC libraries were constructed with two restriction enzymes, two libraries for each species, and each has an average clone insert size ranging from 152–175 kb. We estimated that the genome coverage of each library ranged from 6–9 ×, with the two combined libraries of each species being equivalent to 13.0–16.3 × haploid genomes. The genome coverage, quality and utility of the libraries were further confirmed by library screening using 6~8 putative single-copy probes. To provide a first glimpse into these genomes, we sequenced and analyzed the BAC ends of ~200 clones randomly selected from the libraries of each species. The data revealed that the genomes are AT-rich, contain relatively small fractions of repeat elements with a majority belonging to the category of low complexity repeats, and are more abundant in retro-elements than DNA transposons. Among the species, the H. erato genome is somewhat more abundant in repeat elements and simple repeats than those of M. sexta and H. virescens. The BLAST analysis of the BAC end sequences suggested that the evolution of the three genomes is widely varied, with the genome of H. virescens being the most conserved as a typical lepidopteran, whereas both genomes of H. erato and M. sexta appear to have evolved significantly, resulting in a higher level of species- or evolutionary lineage-specific sequences. 
Conclusion The high-quality and large-insert BAC libraries of the insects, together with the identified BACs containing genes of interest, provide valuable information, resources and tools for comprehensive understanding and studies of the insect genomes and for addressing many fundamental questions in Lepidoptera. The sample of the genomic sequences provides the first insight into the constitution and evolution of the insect genomes. PMID:19558662
Nitrous Oxide Reductase (nosZ) Gene Fragments Differ between Native and Cultivated Michigan Soils

PubMed Central

Stres, Blaž; Mahne, Ivan; Avguštin, Gorazd; Tiedje, James M.

2004-01-01

The effect of standard agricultural management on the genetic heterogeneity of nitrous oxide reductase (nosZ) fragments from denitrifying prokaryotes in native and cultivated soil was explored. Thirty-six soil cores were composited from each of the two soil management conditions. nosZ gene fragments were amplified from triplicate samples, and PCR products were cloned and screened by restriction fragment length polymorphism (RFLP). The total nosZ RFLP profiles increased in similarity with soil sample size until triplicate 3-g samples produced visually identical RFLP profiles for each treatment. Large differences in total nosZ profiles were observed between the native and cultivated soils. The fragments representing major groups of clones encountered at least twice and four randomly selected clones with unique RFLP patterns were sequenced to verify nosZ identity. The sequence diversity of nosZ clones from the cultivated field was higher, and only eight patterns were found in clone libraries from both soils among the 182 distinct nosZ RFLP patterns identified from the two soils. A group of clones that comprised 32% of all clones dominated the gene library of native soil, whereas many minor groups were observed in the gene library of cultivated soil. The 95% confidence intervals of the Chao1 nonparametric richness estimator for nosZ RFLP data did not overlap, indicating that the levels of species richness are significantly different in the two soils, the cultivated soil having higher diversity. Phylogenetic analysis of deduced amino acid sequences grouped the majority of nosZ clones into an interleaved Michigan soil cluster whose cultured members are α-Proteobacteria. Only four nosZ sequences from cultivated soil and one from the native soil were related to sequences found in γ-Proteobacteria. Sequences from the native field formed a distinct, closely related cluster (Dmean = 0.16) containing 91.6% of the native clones. 
Clones from the cultivated field were more distantly related to each other (Dmean = 0.26), and 65% were found outside of the cluster from the native soil, further indicating a difference in the two communities. Overall, there appears to be a relationship between use and richness, diversity, and the phylogenetic position of nosZ sequences, indicating that agricultural use of soil caused a shift to a more diverse denitrifying community. PMID:14711656
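The abstract above compares soils using the Chao1 nonparametric richness estimator. As a rough illustration only, the sketch below computes the classic Chao1 point estimate from a list of clone labels (for example, one RFLP pattern label per clone). The function names and the example labels are hypothetical, the confidence-interval calculation used in the study is not reproduced, and the fallback for zero doubletons is the commonly used bias-corrected form rather than anything stated in the abstract.

```python
from collections import Counter


def chao1(counts):
    """Classic Chao1 richness estimate from per-type abundance counts."""
    counts = [c for c in counts if c > 0]
    s_obs = len(counts)                      # number of distinct types observed
    f1 = sum(1 for c in counts if c == 1)    # singletons: types seen exactly once
    f2 = sum(1 for c in counts if c == 2)    # doubletons: types seen exactly twice
    if f2 > 0:
        return s_obs + f1 * f1 / (2.0 * f2)
    # Bias-corrected form, used here when no doubletons are present
    return s_obs + f1 * (f1 - 1) / 2.0


def chao1_from_labels(labels):
    """Convenience wrapper: one label per clone (hypothetical input format)."""
    return chao1(list(Counter(labels).values()))


if __name__ == "__main__":
    # Made-up pattern labels, purely for illustration
    clones = ["p1"] * 5 + ["p2"] * 3 + ["p3", "p4", "p5"] + ["p6"] * 2 + ["p7"] * 2
    print(chao1_from_labels(clones))
```

Because the estimate depends only on the numbers of singletons and doubletons, libraries dominated by a few large groups and libraries with many rare patterns can give very different estimates even when the observed pattern counts are similar.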
High-Throughput Next-Generation Sequencing of Polioviruses

PubMed Central

Montmayeur, Anna M.; Schmidt, Alexander; Zhao, Kun; Magaña, Laura; Iber, Jane; Castro, Christina J.; Chen, Qi; Henderson, Elizabeth; Ramos, Edward; Shaw, Jing; Tatusov, Roman L.; Dybdahl-Sissoko, Naomi; Endegue-Zanga, Marie Claire; Adeniji, Johnson A.; Oberste, M. Steven; Burns, Cara C.

2016-01-01

ABSTRACT The poliovirus (PV) is currently targeted for worldwide eradication and containment. Sanger-based sequencing of the viral protein 1 (VP1) capsid region is currently the standard method for PV surveillance. However, the whole-genome sequence is sometimes needed for higher resolution global surveillance. In this study, we optimized whole-genome sequencing protocols for poliovirus isolates and FTA cards using next-generation sequencing (NGS), aiming for high sequence coverage, efficiency, and throughput. We found that DNase treatment of poliovirus RNA followed by random reverse transcription (RT), amplification, and the use of the Nextera XT DNA library preparation kit produced significantly better results than other preparations. The average viral reads per total reads, a measurement of efficiency, was as high as 84.2% ± 15.6%. PV genomes covering >99 to 100% of the reference length were obtained and validated with Sanger sequencing. A total of 52 PV genomes were generated, multiplexing as many as 64 samples in a single Illumina MiSeq run. This high-throughput, sequence-independent NGS approach facilitated the detection of a diverse range of PVs, especially for those in vaccine-derived polioviruses (VDPV), circulating VDPV, or immunodeficiency-related VDPV. In contrast to results from previous studies on other viruses, our results showed that filtration and nuclease treatment did not discernibly increase the sequencing efficiency of PV isolates. However, DNase treatment after nucleic acid extraction to remove host DNA significantly improved the sequencing results. This NGS method has been successfully implemented to generate PV genomes for molecular epidemiology of the most recent PV isolates. Additionally, the ability to obtain full PV genomes from FTA cards will aid in facilitating global poliovirus surveillance. PMID:27927929
Random and externally controlled occurrences of Dansgaard-Oeschger events

NASA Astrophysics Data System (ADS)

Lohmann, Johannes; Ditlevsen, Peter D.

2018-05-01

Dansgaard-Oeschger (DO) events constitute the most pronounced mode of centennial to millennial climate variability of the last glacial period. Since their discovery, many decades of research have been devoted to understanding the origin and nature of these rapid climate shifts. In recent years, a number of studies have appeared that report emergence of DO-type variability in fully coupled general circulation models via different mechanisms. These mechanisms result in the occurrence of DO events at varying degrees of regularity, ranging from periodic to random. When examining the full sequence of DO events as captured in the North Greenland Ice Core Project (NGRIP) ice core record, one can observe high irregularity in the timing of individual events at any stage within the last glacial period. In addition to the prevailing irregularity, certain properties of the DO event sequence, such as the average event frequency or the relative distribution of cold versus warm periods, appear to be changing throughout the glacial. By using statistical hypothesis tests on simple event models, we investigate whether the observed event sequence may have been generated by stationary random processes or rather was strongly modulated by external factors. We find that the sequence of DO warming events is consistent with a stationary random process, whereas dividing the event sequence into warming and cooling events leads to inconsistency with two independent event processes. As we include external forcing, we find a particularly good fit to the observed DO sequence in a model where the average residence time in warm periods is controlled by global ice volume and that in cold periods by boreal summer insolation.
Egg laying sequence influences egg mercury concentrations and egg size in three bird species: Implications for contaminant monitoring programs

USGS Publications Warehouse

Ackerman, Joshua T.; Eagles-Smith, Collin A.; Herzog, Mark P.; Yee, Julie L.; Hartman, C. Alex

2016-01-01

Bird eggs are commonly used in contaminant monitoring programs and toxicological risk assessments, but intra-clutch variation and sampling methodology could influence interpretability. We examined the influence of egg laying sequence on egg mercury concentrations and burdens in American avocets, black-necked stilts, and Forster's terns. The average decline in mercury concentrations between the first and last egg laid was 33% for stilts, 22% for terns, and 11% for avocets, and most of this decline occurred between the first and second eggs laid (24% for stilts, 18% for terns, and 9% for avocets). Trends in egg size with egg laying order were inconsistent among species and overall differences in egg volume, mass, length, and width were <3%. We summarized the literature and, among 17 species studied, mercury concentrations generally declined by 16% between the first and second eggs laid. Despite the strong effect of egg laying sequence, most of the variance in egg mercury concentrations still occurred among clutches (75%-91%) rather than within clutches (9%-25%). Using simulations, we determined that to accurately estimate a population's mean egg mercury concentration using only a single random egg from a subset of nests, it would require sampling >60 nests to represent a large population (10% accuracy) or ≥14 nests to represent a small colony that contained <100 nests (20% accuracy).
A global sampling approach to designing and reengineering RNA secondary structures.

PubMed

Levin, Alex; Lis, Mieszko; Ponty, Yann; O'Donnell, Charles W; Devadas, Srinivas; Berger, Bonnie; Waldispühl, Jérôme

2012-11-01

The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign.
A global sampling approach to designing and reengineering RNA secondary structures

PubMed Central

Levin, Alex; Lis, Mieszko; Ponty, Yann; O’Donnell, Charles W.; Devadas, Srinivas; Berger, Bonnie; Waldispühl, Jérôme

2012-01-01

The development of algorithms for designing artificial RNA sequences that fold into specific secondary structures has many potential biomedical and synthetic biology applications. To date, this problem remains computationally difficult, and current strategies to address it resort to heuristics and stochastic search techniques. The most popular methods consist of two steps: First a random seed sequence is generated; next, this seed is progressively modified (i.e. mutated) to adopt the desired folding properties. Although computationally inexpensive, this approach raises several questions such as (i) the influence of the seed; and (ii) the efficiency of single-path directed searches that may be affected by energy barriers in the mutational landscape. In this article, we present RNA-ensign, a novel paradigm for RNA design. Instead of taking a progressive adaptive walk driven by local search criteria, we use an efficient global sampling algorithm to examine large regions of the mutational landscape under structural and thermodynamical constraints until a solution is found. When considering the influence of the seeds and the target secondary structures, our results show that, compared to single-path directed searches, our approach is more robust, succeeds more often and generates more thermodynamically stable sequences. An ensemble approach to RNA design is thus well worth pursuing as a complement to existing approaches. RNA-ensign is available at http://csb.cs.mcgill.ca/RNAensign. PMID:22941632
Methods and analysis of realizing randomized grouping.

PubMed

Hu, Liang-Ping; Bao, Xiao-Lei; Wang, Qi

2011-07-01

Randomization is one of the four basic principles of research design. The meaning of randomization includes two aspects: one is to randomly select samples from the population, which is known as random sampling; the other is to randomly group all the samples, which is called randomized grouping. Randomized grouping can be subdivided into three categories: completely, stratified and dynamically randomized grouping. This article mainly introduces the steps of complete randomization, the definition of dynamic randomization and the realization of random sampling and grouping by SAS software.
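The article above implements randomized grouping in SAS. For readers without SAS, the following is a minimal Python sketch of completely randomized grouping only (not stratified or dynamic randomization). The function name, the subject identifiers, and the round-robin dealing step are assumptions made for illustration and are not taken from the article.

```python
import random


def complete_randomization(subject_ids, n_groups, seed=None):
    """Shuffle all subjects, then deal them into n_groups of near-equal size."""
    rng = random.Random(seed)        # a fixed seed makes the allocation reproducible
    shuffled = list(subject_ids)
    rng.shuffle(shuffled)
    # Round-robin dealing: group sizes differ by at most one subject
    return [shuffled[i::n_groups] for i in range(n_groups)]


if __name__ == "__main__":
    groups = complete_randomization(range(1, 21), n_groups=2, seed=2011)
    for k, members in enumerate(groups, start=1):
        print(f"group {k}: {sorted(members)}")
```

Recording the seed alongside the allocation lets the grouping be audited or regenerated later.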
Subjective randomness as statistical inference.

PubMed

Griffiths, Thomas L; Daniels, Dylan; Austerweil, Joseph L; Tenenbaum, Joshua B

2018-06-01

Some events seem more random than others. For example, when tossing a coin, a sequence of eight heads in a row does not seem very random. Where do these intuitions about randomness come from? We argue that subjective randomness can be understood as the result of a statistical inference assessing the evidence that an event provides for having been produced by a random generating process. We show how this account provides a link to previous work relating randomness to algorithmic complexity, in which random events are those that cannot be described by short computer programs. Algorithmic complexity is both incomputable and too general to capture the regularities that people can recognize, but viewing randomness as statistical inference provides two paths to addressing these problems: considering regularities generated by simpler computing machines, and restricting the set of probability distributions that characterize regularity. Building on previous work exploring these different routes to a more restricted notion of randomness, we define strong quantitative models of human randomness judgments that apply not just to binary sequences - which have been the focus of much of the previous work on subjective randomness - but also to binary matrices and spatial clustering. Copyright © 2018 Elsevier Inc. All rights reserved.
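One way to make the "statistical inference" idea above concrete is a log-likelihood ratio between a random generating process and a regular one. The sketch below is a toy instance under strong assumptions that are not taken from the paper: the random process is a fair coin, and the only "regular" alternative considered is a coin of unknown bias with a uniform prior. The function name is hypothetical.

```python
from math import comb, log2


def randomness_score(sequence, heads="H"):
    """log2 [ P(x | fair coin) / P(x | coin of unknown bias, uniform prior) ].

    Positive scores favor the fair-coin (random) explanation; negative scores
    favor the biased-coin (regular) explanation.
    """
    n = len(sequence)
    k = sum(1 for s in sequence if s == heads)
    log_p_random = -n                              # each outcome has probability 2**-n
    log_p_regular = -log2((n + 1) * comb(n, k))    # Beta-Binomial marginal likelihood
    return log_p_random - log_p_regular


if __name__ == "__main__":
    for seq in ("HHHHHHHH", "HHTHTTHT", "HTHTHTHT"):
        print(seq, round(randomness_score(seq), 3))
```

This toy alternative only detects imbalance between heads and tails, so a strictly alternating sequence scores the same as any other sequence with four heads. Capturing regularities such as alternation requires a richer family of regular processes, which is the kind of extension the abstract describes.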
Autonomous Byte Stream Randomizer

NASA Technical Reports Server (NTRS)

Paloulian, George K.; Woo, Simon S.; Chow, Edward T.

2013-01-01

Net-centric networking environments are often faced with limited resources and must utilize bandwidth as efficiently as possible. In networking environments that span wide areas, the data transmission has to be efficient without any redundant or exuberant metadata. The Autonomous Byte Stream Randomizer software provides an extra level of security on top of existing data encryption methods. Randomizing the data's byte stream adds an extra layer to existing data protection methods, thus making it harder for an attacker to decrypt protected data. Based on a generated cryptographically secure random seed, a random sequence of numbers is used to intelligently and efficiently swap the organization of bytes in data using the unbiased and memory-efficient in-place Fisher-Yates shuffle method. Swapping bytes and reorganizing the crucial structure of the byte data renders the data file unreadable and leaves the data in a deconstructed state. This deconstruction adds an extra level of security requiring the byte stream to be reconstructed with the random seed in order to be readable. Once the data byte stream has been randomized, the software enables the data to be distributed to N nodes in an environment. Each piece of the data in randomized and distributed form is a separate entity, unreadable in its own right, but when combined with all N pieces, it can be reconstructed back into one. Reconstruction requires possession of the key used for randomizing the bytes, leading to the generation of the same cryptographically secure random sequence of numbers used to randomize the data. This software is a cornerstone capability possessing the ability to generate the same cryptographically secure sequence on different machines and time intervals, thus allowing this software to be used more heavily in net-centric environments where data transfer bandwidth is limited.
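The seeded, reversible byte shuffle described above can be sketched in a few lines. This is an illustration of the general idea only and not the NASA software: the function names are hypothetical, the distribution of pieces to N nodes is omitted, and Python's `random.Random` (a Mersenne Twister) stands in for the cryptographically secure generator the abstract calls for, so it must not be used for real data protection.

```python
import random


def _swap_schedule(n, seed):
    """Fisher-Yates swap pairs (i, j) for a buffer of length n, derived from seed."""
    rng = random.Random(seed)   # NOT cryptographically secure; illustration only
    # For i from n-1 down to 1, pick j uniformly from 0..i (inclusive)
    return [(i, rng.randint(0, i)) for i in range(n - 1, 0, -1)]


def randomize(data: bytes, seed) -> bytes:
    """Shuffle the bytes in place using the seeded swap schedule."""
    buf = bytearray(data)
    for i, j in _swap_schedule(len(buf), seed):
        buf[i], buf[j] = buf[j], buf[i]
    return bytes(buf)


def reconstruct(data: bytes, seed) -> bytes:
    """Undo randomize() by replaying the same swaps in reverse order."""
    buf = bytearray(data)
    for i, j in reversed(_swap_schedule(len(buf), seed)):
        buf[i], buf[j] = buf[j], buf[i]
    return bytes(buf)


if __name__ == "__main__":
    scrambled = randomize(b"net-centric payload", seed=1234)
    print(scrambled)
    print(reconstruct(scrambled, seed=1234))
```

The shuffle adds no metadata and keeps the payload length unchanged, which matches the bandwidth constraint stated in the abstract. Reconstruction works only because both sides regenerate the identical swap schedule from the same seed.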
Random Sampling of Squamate Reptiles in Spanish Natural Reserves Reveals the Presence of Novel Adenoviruses in Lacertids (Family Lacertidae) and Worm Lizards (Amphisbaenia)

PubMed Central

Szirovicza, Leonóra; López, Pilar; Kopena, Renáta; Benkő, Mária; Martín, José; Pénzes, Judit J.

2016-01-01

Here, we report the results of a large-scale PCR survey on the prevalence and diversity of adenoviruses (AdVs) in samples collected randomly from free-living reptiles. On the territories of the Guadarrama Mountains National Park in Central Spain and of the Chafarinas Islands in North Africa, cloacal swabs were taken from 318 specimens of eight native species representing five squamate reptilian families. The healthy-looking animals had been captured temporarily for physiological and ethological examinations, after which they were released. We found 22 AdV-positive samples in representatives of three species, all from Central Spain. Sequence analysis of the PCR products revealed the existence of three hitherto unknown AdVs in 11 Carpetane rock lizards (Iberolacerta cyreni), nine Iberian worm lizards (Blanus cinereus), and two Iberian green lizards (Lacerta schreiberi), respectively. Phylogeny inference showed every novel putative virus to be a member of the genus Atadenovirus. This is the very first description of the occurrence of AdVs in amphisbaenian and lacertid hosts. Unlike all squamate atadenoviruses examined previously, two of the novel putative AdVs had A+T rich DNA, a feature generally deemed to mirror previous host switch events. Our results shed new light on the diversity and evolution of atadenoviruses. PMID:27399970
The zoonotic potential of Giardia intestinalis assemblage E in rural settings.

PubMed

Abdel-Moein, Khaled A; Saeed, Hossam

2016-08-01

Giardiasis is a globally re-emerging protozoan disease with veterinary and public health implications. The current study was carried out to investigate the zoonotic potential of livestock-specific assemblage E in rural settings. For this purpose, a total of 40 microscopically positive Giardia stool samples from children with gastrointestinal complaints with or without diarrhea were enrolled in the study as well as fecal samples from 46 diarrheic cattle (18 dairy cows and 28 calves). Animal samples were examined by the sedimentation method to identify Giardia spp., and then all Giardia-positive samples from humans and animals were processed for molecular detection of livestock-specific assemblage E through amplification of the assemblage-specific triosephosphate isomerase (tpi) gene using nested polymerase chain reaction (PCR). The results of the study revealed an unexpectedly high occurrence of assemblage E among human samples (62.5%), whereas the occurrence among patients with diarrhea and those without was 42.1% and 81%, respectively. On the other hand, the prevalence of Giardia spp. among diarrheic dairy cattle was 8.7%; only calves yielded positive results (14.3%), and all bovine Giardia spp. were genetically classified as Giardia intestinalis assemblage E. Moreover, DNA sequencing of one randomly selected positive human sample and one bovine sample revealed 100% and 99% identity, respectively, with assemblage E tpi gene sequences available at GenBank after BLAST analysis. In conclusion, the current study highlights the wide dissemination of livestock-specific assemblage E among humans in rural areas, and thus, the zoonotic transmission cycle should not be discounted during the control of giardiasis in such settings.
Scaling exponents for ordered maxima

DOE PAGES

Ben-Naim, E.; Krapivsky, P. L.; Lemons, N. W.

2015-12-22

We study extreme value statistics of multiple sequences of random variables. For each sequence with $N$ variables, independently drawn from the same distribution, the running maximum is defined as the largest variable to date. We compare the running maxima of $m$ independent sequences and investigate the probability $S_N$ that the maxima are perfectly ordered, that is, the running maximum of the first sequence is always larger than that of the second sequence, which is always larger than the running maximum of the third sequence, and so on. The probability $S_N$ is universal: it does not depend on the distribution from which the random variables are drawn. For two sequences, $S_N \sim N^{-1/2}$, and in general, the decay is algebraic, $S_N \sim N^{-\sigma_m}$, for large $N$. We analytically obtain the exponent $\sigma_3 \cong 1.302931$ as the root of a transcendental equation. Moreover, the exponents $\sigma_m$ grow with $m$, and we show that $\sigma_m \sim m$ for large $m$.
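The ordering probability defined above is easy to estimate numerically. The following Monte Carlo sketch is an illustration under stated assumptions: the function names, trial counts, and the choice of uniform random variables are mine (the abstract states the result does not depend on the distribution), and the code only estimates the probability rather than reproducing the analytical exponents.

```python
import random


def perfectly_ordered(sequences):
    """True if, at every step, the running maximum of sequence k strictly
    exceeds the running maximum of sequence k+1."""
    maxima = [float("-inf")] * len(sequences)
    for values in zip(*sequences):
        maxima = [max(m, v) for m, v in zip(maxima, values)]
        if any(a <= b for a, b in zip(maxima, maxima[1:])):
            return False
    return True


def estimate_S(N, m=2, trials=20000, seed=0):
    """Monte Carlo estimate of the probability that m running maxima stay ordered."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        seqs = [[rng.random() for _ in range(N)] for _ in range(m)]
        hits += perfectly_ordered(seqs)
    return hits / trials


if __name__ == "__main__":
    for N in (1, 4, 16, 64):
        print(N, estimate_S(N))
```

For two sequences, plotting the estimates against $N$ on log-log axes should give a slope near $-1/2$ at large $N$ if the simulation agrees with the scaling quoted in the abstract, although many more trials are needed at large $N$ for a stable fit.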
Random Number Generation and Executive Functions in Parkinson's Disease: An Event-Related Brain Potential Study.

PubMed

Münte, Thomas F; Joppich, Gregor; Däuper, Jan; Schrader, Christoph; Dengler, Reinhard; Heldmann, Marcus

2015-01-01

The generation of random sequences is considered to tax executive functions and has previously been reported to be impaired in Parkinson's disease (PD). The aim of this study was to assess the neurophysiological markers of random number generation in PD. Event-related potentials (ERP) were recorded in 12 PD patients and 12 age-matched normal controls (NC) while they engaged either in random number generation (RNG), pressing the number keys on a computer keyboard in a random sequence, or in ordered number generation (ONG), which required key presses in the canonical order. Key presses were paced by an external auditory stimulus at a rate of 1 tone every 1800 ms. As a secondary task, subjects had to monitor the tone sequence for a particular target tone, to which the number "0" key had to be pressed. This target tone occurred randomly and infrequently, thus creating a secondary oddball task. Behaviorally, PD patients showed an increased tendency to count in steps of one as well as a tendency towards repetition avoidance. Electrophysiologically, the amplitude of the P3 component of the ERP to the target tone of the secondary task was reduced during RNG in PD but not in NC. The behavioral findings indicate less random behavior in PD, while the ERP findings suggest that this impairment comes about because attentional resources are depleted in PD.
Single molecule counting and assessment of random molecular tagging errors with transposable giga-scale error-correcting barcodes.

PubMed

Lau, Billy T; Ji, Hanlee P

2017-09-21

RNA-Seq measures gene expression by counting sequence reads belonging to unique cDNA fragments. Molecular barcodes commonly in the form of random nucleotides were recently introduced to improve gene expression measures by detecting amplification duplicates, but are susceptible to errors generated during PCR and sequencing. This results in false positive counts, leading to inaccurate transcriptome quantification especially at low input and single-cell RNA amounts where the total number of molecules present is minuscule. To address this issue, we demonstrated the systematic identification of molecular species using transposable error-correcting barcodes that are exponentially expanded to tens of billions of unique labels. We experimentally showed random-mer molecular barcodes suffer from substantial and persistent errors that are difficult to resolve. To assess our method's performance, we applied it to the analysis of known reference RNA standards. By including an inline random-mer molecular barcode, we systematically characterized the presence of sequence errors in random-mer molecular barcodes. We observed that such errors are extensive and become more dominant at low input amounts. We described the first study to use transposable molecular barcodes and its use for studying random-mer molecular barcode errors. Extensive errors found in random-mer molecular barcodes may warrant the use of error correcting barcodes for transcriptome analysis as input amounts decrease.
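To see why errors in random-mer barcodes inflate molecule counts, consider the toy example below. It counts distinct barcodes naively and then with a simple heuristic that absorbs any barcode within one mismatch of a more abundant barcode. This heuristic is a generic illustration and is not the transposable error-correcting barcode method of the study; the function names, the distance threshold, and the example reads are all hypothetical, and barcodes are assumed to have equal length.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))


def count_molecules(barcodes, max_dist=1):
    """Count barcodes after absorbing likely error variants.

    Barcodes are visited from most to least abundant (ties broken
    alphabetically); one is kept only if it is more than max_dist
    mismatches away from every barcode already kept.
    """
    counts = Counter(barcodes)
    kept = []
    for bc, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if not any(hamming(bc, k) <= max_dist for k in kept):
            kept.append(bc)
    return len(kept)


if __name__ == "__main__":
    reads = ["ACGT"] * 5 + ["ACGA"] + ["TTTT"] * 3 + ["TTTA"]
    print("naive count:    ", len(set(reads)))
    print("collapsed count:", count_molecules(reads))
```

A distance-based heuristic like this cannot distinguish a sequencing error from a genuinely different molecule whose barcode happens to be one mismatch away, which is one reason designed error-correcting barcodes are attractive at low input amounts.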
Enhanced Wang Landau sampling of adsorbed protein conformations.

PubMed

Radhakrishna, Mithun; Sharma, Sumit; Kumar, Sanat K

2012-03-21

Using computer simulations to model the folding of proteins into their native states is computationally expensive due to the extraordinarily low degeneracy of the ground state. In this paper, we develop an efficient way to sample these folded conformations using Wang Landau sampling coupled with the configurational bias method (which uses an unphysical "temperature" that lies between the collapse and folding transition temperatures of the protein). This method speeds up the folding process by roughly an order of magnitude over existing algorithms for the sequences studied. We apply this method to study the adsorption of intrinsically disordered hydrophobic polar protein fragments on a hydrophobic surface. We find that these fragments, which are unstructured in the bulk, acquire secondary structure upon adsorption onto a strong hydrophobic surface. Apparently, the presence of a hydrophobic surface allows these random coil fragments to fold by providing hydrophobic contacts that were lost in protein fragmentation. © 2012 American Institute of Physics
A survey of fish viruses isolated from wild marine fishes from the coastal waters of southern Korea.

PubMed

Kim, Wi-Sik; Choi, Shin-Young; Kim, Do-Hyung; Oh, Myung-Joo

2013-11-01

A survey was conducted to investigate viral infection in 253 wild marine fishes harvested in the southern coastal area of Korea from 2010 to 2012. The fish that were captured by local anglers were randomly bought and sampled for virus examination. The samples were tested for presence of virus by virus isolation with FHM, FSP, and BF-2 cells and molecular methods (polymerase chain reaction and sequencing). Of the 253 fish sampled, 9 fish were infected with virus. Aquabirnaviruses (ABVs), Viral hemorrhagic septicemia virus (VHSV), and Red seabream iridovirus (RSIV) were detected in 7, 1, and 1 fish, respectively. Molecular phylogenies demonstrated the detected viruses (ABV, VHSV, and RSIV) were more closely related to viruses reported of the same type from Korea and Japan than from other countries, suggesting these viruses may be indigenous to Korean and Japanese coastal waters.
Integrating sampling techniques and inverse virtual screening: toward the discovery of artificial peptide-based receptors for ligands.

PubMed

Pérez, Germán M; Salomón, Luis A; Montero-Cabrera, Luis A; de la Vega, José M García; Mascini, Marcello

2016-05-01

A novel heuristic using an iterative select-and-purge strategy is proposed. It combines statistical techniques for sampling and classification by rigid molecular docking through an inverse virtual screening scheme. This approach aims at the de novo discovery of short peptides that may act as docking receptors for small target molecules when there are no data available about known association complexes between them. The algorithm performs an unbiased stochastic exploration of the sample space, acting as a binary classifier when analyzing the entire peptide population. It uses a novel and effective criterion for weighting the likelihood of a given peptide to form an association complex with a particular ligand molecule based on amino acid sequences. The exploratory analysis relies on chemical information on peptide composition, sequence patterns, and association free energies (docking scores) in order to converge to those peptides forming the association complexes with higher affinities. Statistical estimations support these results, providing an association probability and improving prediction accuracy even in cases where only a fraction of all possible combinations is sampled. The false positives/false negatives ratio was also improved with this method. A simple rigid-body docking approach together with the proper information about amino acid sequences was used. The methodology was applied in a retrospective docking study to all 8000 possible tripeptide combinations using the 20 natural amino acids, screened against a training set of 77 different ligands with diverse functional groups. Afterward, all tripeptides were screened against a test set of 82 ligands, also containing different functional groups. Results show that our integrated methodology is capable of finding a representative group of the top-scoring tripeptides. 
The associated probability of identifying the best receptor or a group of the top-ranked receptors is more than double and about 10 times higher, respectively, when compared to classical random sampling methods.
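For orientation, the classical random-sampling baseline that the abstract compares against can be worked out in closed form. The sketch below is not the authors' method: the function names and the sample size of 800 are hypothetical, and the only figure taken from the abstract is the library size of 8000 tripeptides. It assumes uniform sampling without replacement.

```python
from math import comb

def p_hit_best(n_total, n_sampled):
    """Chance that a uniform sample without replacement contains the single best item."""
    return n_sampled / n_total

def p_hit_top_k(n_total, k, n_sampled):
    """Chance that the sample contains at least one of the k top-ranked items."""
    # Complement of "all sampled items come from the n_total - k non-top items".
    return 1.0 - comb(n_total - k, n_sampled) / comb(n_total, n_sampled)

# Hypothetical usage: docking 800 of the 8000 tripeptides at random.
baseline_best = p_hit_best(8000, 800)
baseline_top10 = p_hit_top_k(8000, 10, 800)
```

Any improvement factor quoted for a guided search is relative to baselines of this kind, which depend only on the library size and the number of peptides actually docked.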
Distribution and sequence homogeneity of an abundant satellite DNA in the beetle, Tenebrio molitor.

PubMed Central

Davis, C A; Wyatt, G R

1989-01-01

The mealworm beetle, Tenebrio molitor, contains an unusually abundant and homogeneous satellite DNA which constitutes up to 60% of its genome. The satellite DNA is shown to be present in all of the chromosomes by in situ hybridization. 18 dimers of the repeat unit were cloned and sequenced. The consensus sequence is 142 nt long and lacks any internal repeat structure. Monomers of the sequence are very similar, showing on average a 2% divergence from the calculated consensus. Variant nucleotides are scattered randomly throughout the sequence although some variants are more common than others. Neighboring repeat units are no more alike than randomly chosen ones. The results suggest that some mechanism, perhaps gene conversion, is acting to maintain the homogeneity of the satellite DNA despite its abundance and distribution on all of the chromosomes. PMID:2762148
RSAT: regulatory sequence analysis tools.

PubMed

Thomas-Chollier, Morgane; Sand, Olivier; Turatsinze, Jean-Valéry; Janky, Rekin's; Defrance, Matthieu; Vervisch, Eric; Brohée, Sylvain; van Helden, Jacques

2008-07-01

The regulatory sequence analysis tools (RSAT, http://rsat.ulb.ac.be/rsat/) is a software suite that integrates a wide collection of modular tools for the detection of cis-regulatory elements in genome sequences. The suite includes programs for sequence retrieval, pattern discovery, phylogenetic footprint detection, pattern matching, genome scanning and feature map drawing. Random controls can be performed with random gene selections or by generating random sequences according to a variety of background models (Bernoulli, Markov). Beyond the original word-based pattern-discovery tools (oligo-analysis and dyad-analysis), we recently added a battery of tools for matrix-based detection of cis-acting elements, with some original features (adaptive background models, Markov-chain estimation of P-values) that do not exist in other matrix-based scanning tools. The web server offers an intuitive interface, where each program can be accessed either separately or connected to the other tools. In addition, the tools are now available as web services, enabling their integration in programmatic workflows. Genomes are regularly updated from various genome repositories (NCBI and EnsEMBL) and 682 organisms are currently supported. Since 1998, the tools have been used by several hundreds of researchers from all over the world. Several predictions made with RSAT were validated experimentally and published.
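The idea of generating random control sequences from a Bernoulli or Markov background model can be illustrated with a small sketch. This is not RSAT code and does not use the RSAT web services; the function names, the fallback to a uniform draw for unseen contexts, and the choice of starting prefix are all assumptions made for illustration. Order 0 corresponds to a Bernoulli model (independent bases), and higher orders condition each base on the preceding ones.

```python
import random
from collections import defaultdict

def train_markov(sequences, order=1):
    """Count how often each base follows each context of `order` bases."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for i in range(len(seq) - order):
            counts[seq[i:i + order]][seq[i + order]] += 1
    return counts

def random_sequence(counts, length, order=1, rng=None, alphabet="ACGT"):
    """Draw one random sequence from the trained background model."""
    rng = rng or random.Random()
    # Assumption: start from a uniformly chosen context seen in training.
    out = list(rng.choice(sorted(counts)))
    while len(out) < length:
        context = "".join(out[-order:]) if order else ""
        followers = counts.get(context)
        if followers:
            bases, weights = zip(*sorted(followers.items()))
            out.append(rng.choices(bases, weights=weights)[0])
        else:
            # Context never seen in training: fall back to a uniform draw.
            out.append(rng.choice(alphabet))
    return "".join(out[:length])

# Hypothetical usage: a first-order model trained on two toy sequences.
model = train_markov(["ACGTACGGTCA", "TTGACCGTA"], order=1)
control = random_sequence(model, 30, order=1, rng=random.Random(42))
```

Running a pattern-discovery tool on such control sequences gives an empirical sense of how many motifs are reported by chance alone.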
A high-speed on-chip pseudo-random binary sequence generator for multi-tone phase calibration

NASA Astrophysics Data System (ADS)

Gommé, Liesbeth; Vandersteen, Gerd; Rolain, Yves

2011-07-01

An on-chip reference generator is conceived by adopting the technique of decimating a pseudo-random binary sequence (PRBS) signal into parallel sequences. This is of great benefit when high-speed generation of PRBS and PRBS-derived signals is the objective. The design is implemented with standard CMOS logic available in commercial libraries, which provides the logic functions for the generator. The design allows the user to select the periodicity of the PRBS and the PRBS-derived signals. The characterization of the on-chip generator marks its performance and reveals promising specifications.
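A software sketch can make the decimation idea concrete. It is not the circuit described in the abstract: the choice of the PRBS7 polynomial (x^7 + x^6 + 1), the lane count of 4, and all function names are assumptions. The point it illustrates is that a fast serial PRBS can be split into several slower parallel streams which, when interleaved by a multiplexer, reproduce the full-rate sequence.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of a PRBS7 sequence (period 127) with a 7-bit LFSR."""
    state = seed & 0x7F
    if state == 0:
        raise ValueError("seed must be non-zero, or the LFSR stays stuck at 0")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # feedback taps at bits 7 and 6
        state = ((state << 1) | new_bit) & 0x7F
        out.append(new_bit)
    return out

def decimate(bits, n_lanes):
    """Split a serial stream into n_lanes parallel streams, each at 1/n_lanes the rate."""
    return [bits[phase::n_lanes] for phase in range(n_lanes)]

def interleave(lanes):
    """Multiplex equal-length parallel streams back into one serial stream."""
    return [bit for group in zip(*lanes) for bit in group]

# Hypothetical usage: four low-rate lanes that rebuild the full-rate PRBS.
serial = prbs7(508)
lanes = decimate(serial, 4)
rebuilt = interleave(lanes)
```

In hardware, each lane would be produced by its own slower logic, so only the final multiplexer has to run at the full bit rate.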
Analysis of sequencing and scheduling methods for arrival traffic

NASA Technical Reports Server (NTRS)

Neuman, Frank; Erzberger, Heinz

1990-01-01

The air traffic control subsystem that performs scheduling is discussed. The function of the scheduling algorithms is to plan automatically the most efficient landing order and to assign optimally spaced landing times to all arrivals. Several important scheduling algorithms are described and the statistical performance of the scheduling algorithms is examined. Scheduling brings order to an arrival sequence for aircraft. First-come-first-served scheduling (FCFS) establishes a fair order, based on estimated times of arrival, and determines proper separations. Because of the randomness of the traffic, gaps will remain in the scheduled sequence of aircraft. These gaps are filled, or partially filled, by time-advancing the leading aircraft after a gap while still preserving the FCFS order. Tightly scheduled groups of aircraft remain with a mix of heavy and large aircraft. Separation requirements differ for different types of aircraft trailing each other. Advantage is taken of this fact through mild reordering of the traffic, thus shortening the groups and reducing average delays. Actual delays for different samples with the same statistical parameters vary widely, especially for heavy traffic.
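The first-come-first-served step described above can be sketched in a few lines. This is a simplified illustration, not the algorithms analysed in the report: it assumes a single fixed separation for every aircraft pair (the abstract notes that real separations depend on aircraft type), and the function name and data layout are hypothetical.

```python
def fcfs_schedule(etas, separation):
    """Assign landing times in order of estimated time of arrival (ETA).

    etas: list of (aircraft_id, eta_seconds) pairs.
    separation: minimum time in seconds between consecutive landings.
    Returns a list of (aircraft_id, scheduled_time, delay) tuples.
    """
    schedule = []
    previous_time = None
    for aircraft, eta in sorted(etas, key=lambda item: item[1]):
        if previous_time is None:
            landing = eta
        else:
            # Land at the ETA if possible, otherwise as soon as separation allows.
            landing = max(eta, previous_time + separation)
        schedule.append((aircraft, landing, landing - eta))
        previous_time = landing
    return schedule

# Hypothetical usage: B is delayed behind A; the gap before C remains unfilled.
plan = fcfs_schedule([("A", 0), ("B", 30), ("C", 200)], separation=60)
```

The gap before aircraft C in this example is the kind of slack that the report's time-advance and reordering strategies try to reclaim.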
Sequence and Structure Dependent DNA-DNA Interactions

NASA Astrophysics Data System (ADS)

Kopchick, Benjamin; Qiu, Xiangyun

Molecular forces between dsDNA strands are largely dominated by electrostatics and have been extensively studied. Quantitative knowledge has been accumulated on how DNA-DNA interactions are modulated by varied biological constituents such as ions, cationic ligands, and proteins. Despite its central role in biology, the sequence of DNA has not received substantial attention and "random" DNA sequences are typically used in biophysical studies. However, ~50% of the human genome is composed of non-random-sequence DNAs, particularly repetitive sequences. Furthermore, covalent modifications of DNA such as methylation play key roles in gene functions. Such DNAs with specific sequences or modifications often take on structures other than the canonical B-form. Here we present a series of quantitative measurements of the DNA-DNA forces with the osmotic stress method on different DNA sequences, from short repeats to the most frequent sequences in the genome, and to modifications such as bromination and methylation. We observe peculiar behaviors that appear to be strongly correlated with the incurred structural changes. We speculate on the causes in terms of differences in the hydration shell and DNA surface structures.
Do humans and nonhuman animals share the grouping principles of the Iambic-Trochaic Law?

PubMed Central

de la Mora, Daniela M.; Nespor, Marina; Toro, Juan M.

2014-01-01

The Iambic-Trochaic Law describes humans’ tendency to form trochaic groups over sequences varying in pitch or intensity (i.e., the loudest or highest sound marks group beginnings), and iambic groups over sequences varying in duration (i.e., the longest sound marks group endings). The extent to which these perceptual biases are shared by humans and nonhuman animals is yet unclear. In Experiment 1, we trained rats to discriminate pitch-alternating sequences of tones from sequences randomly varying in pitch. In Experiment 2, rats were trained to discriminate duration-alternating sequences of tones from sequences randomly varying in duration. We found that nonhuman animals group as trochees sequences based on pitch variations, but they do not group as iambs sequences varying in duration. Importantly, humans grouped the same stimuli following the principles of the Iambic-Trochaic Law (Experiment 3). These results suggest an early emergence of the trochaic rhythmic grouping bias based on pitch, possibly relying on perceptual abilities shared by humans and other mammals as well, whereas the iambic rhythmic grouping bias based on duration might depend on language experience. PMID:22956287
Do humans and nonhuman animals share the grouping principles of the iambic-trochaic law?

PubMed

de la Mora, Daniela M; Nespor, Marina; Toro, Juan M

2013-01-01

The iambic-trochaic law describes humans' tendency to form trochaic groups over sequences varying in pitch or intensity (i.e., the loudest or highest sounds mark group beginnings), and iambic groups over sequences varying in duration (i.e., the longest sounds mark group endings). The extent to which these perceptual biases are shared by humans and nonhuman animals is yet unclear. In Experiment 1, we trained rats to discriminate pitch-alternating sequences of tones from sequences randomly varying in pitch. In Experiment 2, rats were trained to discriminate duration-alternating sequences of tones from sequences randomly varying in duration. We found that nonhuman animals group sequences based on pitch variations as trochees, but they do not group sequences varying in duration as iambs. Importantly, humans grouped the same stimuli following the principles of the iambic-trochaic law (Exp. 3). These results suggest the early emergence of the trochaic rhythmic grouping bias based on pitch, possibly relying on perceptual abilities shared by humans and other mammals, whereas the iambic rhythmic grouping bias based on duration might depend on language experience.
Machine learning prediction for classification of outcomes in local minimisation

NASA Astrophysics Data System (ADS)

Das, Ritankar; Wales, David J.

2017-01-01

Machine learning schemes are employed to predict which local minimum will result from local energy minimisation of random starting configurations for a triatomic cluster. The input data consists of structural information at one or more of the configurations in optimisation sequences that converge to one of four distinct local minima. The ability to make reliable predictions, in terms of the energy or other properties of interest, could save significant computational resources in sampling procedures that involve systematic geometry optimisation. Results are compared for two energy minimisation schemes, and for neural network and quadratic functions of the inputs.
A test for patterns of modularity in sequences of developmental events.

PubMed

Poe, Steven

2004-08-01

This study presents a statistical test for modularity in the context of relative timing of developmental events. The test assesses whether sets of developmental events show special phylogenetic conservation of rank order. The test statistic is the correlation coefficient of developmental ranks of the N events of the hypothesized module across taxa. The null distribution is obtained by taking correlation coefficients for randomly sampled sets of N events. This test was applied to two datasets, including one where phylogenetic information was taken into account. The events of limb development in two frog species were found to behave as a module.
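The logic of the test (an observed correlation compared with correlations for randomly sampled sets of N events) can be sketched as follows. This is a simplified illustration under stated assumptions, not the published procedure: it handles only two taxa, uses a Pearson correlation on the raw ranks, ignores phylogeny, and all names are hypothetical.

```python
import random
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient; returns 0.0 if either input has no variance."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)

def modularity_test(ranks_a, ranks_b, module, n_samples=1000, rng=None):
    """Compare the rank correlation of `module` events with random sets of equal size.

    ranks_a, ranks_b: dicts mapping event name -> developmental rank in each taxon.
    Returns (observed correlation, fraction of random sets at least as correlated).
    """
    rng = rng or random.Random()
    events = sorted(ranks_a)

    def statistic(subset):
        return pearson([ranks_a[e] for e in subset], [ranks_b[e] for e in subset])

    observed = statistic(module)
    null = [statistic(rng.sample(events, len(module))) for _ in range(n_samples)]
    p_value = sum(value >= observed for value in null) / n_samples
    return observed, p_value
```

A small p-value here would indicate that the hypothesized module conserves its internal rank order more strongly than arbitrary sets of events of the same size.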
Proteasix: a tool for automated and large-scale prediction of proteases involved in naturally occurring peptide generation.

PubMed

Klein, Julie; Eales, James; Zürbig, Petra; Vlahou, Antonia; Mischak, Harald; Stevens, Robert

2013-04-01

In this study, we have developed Proteasix, an open-source peptide-centric tool that can be used to predict in silico the proteases involved in naturally occurring peptide generation. We developed a curated cleavage site (CS) database, containing 3500 entries about human protease/CS combinations. On top of this database, we built a tool, Proteasix, which allows CS retrieval and protease associations from a list of peptides. To establish the proof of concept of the approach, we used a list of 1388 peptides identified from human urine samples, and compared the prediction to the analysis of 1003 randomly generated amino acid sequences. Metalloprotease activity was predominantly involved in urinary peptide generation, and more particularly to peptides associated with extracellular matrix remodelling, compared to proteins from other origins. In comparison, random sequences returned almost no results, highlighting the specificity of the prediction. This study provides a tool that can facilitate linking of identified protein fragments to predicted protease activity, and therefore into presumed mechanisms of disease. Experiments are needed to confirm the in silico hypotheses; nevertheless, this approach may be of great help to better understand molecular mechanisms of disease, and define new biomarkers, and therapeutic targets. © 2013 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Genomes: At the edge of chaos with maximum information capacity

NASA Astrophysics Data System (ADS)

Kong, Sing-Guan; Chen, Hong-Da; Torda, Andrew; Lee, H. C.

2016-12-01

We propose an order index, ϕ, which quantifies the notion of “life at the edge of chaos” when applied to genome sequences. It maps genomes to a number from 0 (random and of infinite length) to 1 (fully ordered) and applies regardless of sequence length and base composition. The 786 complete genomic sequences in GenBank were found to have ϕ values in a very narrow range, 0.037 ± 0.027. We show this implies that genomes are halfway towards being completely random, namely, at the edge of chaos. We argue that this narrow range represents the neighborhood of a fixed-point in the space of sequences, and genomes are driven there by the dynamics of a robust, predominantly neutral evolution process.
Fast and secure encryption-decryption method based on chaotic dynamics

DOEpatents

Protopopescu, Vladimir A.; Santoro, Robert T.; Tolliver, Johnny S.

1995-01-01

A method and system for the secure encryption of information. The method comprises the steps of dividing a message of length L into its character components; generating m chaotic iterates from m independent chaotic maps; producing an "initial" value based upon the m chaotic iterates; transforming the "initial" value to create a pseudo-random integer; repeating the steps of generating, producing and transforming until a pseudo-random integer sequence of length L is created; and encrypting the message as ciphertext based upon the pseudo-random integer sequence. A system for accomplishing the invention is also provided.
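The sequence of steps in the abstract can be illustrated with a toy sketch. Everything specific here is an assumption rather than the patented method: the logistic map with r = 3.99 stands in for the unspecified chaotic maps, the "initial" value is taken as the fractional part of the sum of the iterates, the integer transform is a scaling to 0..255, and the cipher step is a byte-wise XOR. The sketch is for illustration only and should not be treated as secure.

```python
def keystream(seeds, length, r=3.99):
    """Produce `length` pseudo-random integers in 0..255 from m logistic maps.

    seeds: the m initial conditions in (0, 1); together they act as the key.
    """
    states = list(seeds)
    integers = []
    for _ in range(length):
        # One iterate from each of the m independent chaotic maps.
        states = [r * x * (1.0 - x) for x in states]
        # Assumed combination rule: fractional part of the sum of the iterates.
        combined = sum(states) % 1.0
        # Assumed transform: scale the combined value to a byte.
        integers.append(int(combined * 256))
    return integers

def crypt(data, seeds):
    """XOR each byte with the keystream; applying it twice restores the input."""
    stream = keystream(seeds, len(data))
    return bytes(byte ^ key for byte, key in zip(data, stream))

# Hypothetical usage with m = 3 maps.
key = (0.123, 0.456, 0.789)
ciphertext = crypt(b"randomly sampled", key)
plaintext = crypt(ciphertext, key)
```

Because the keystream depends only on the seeds and the message length, the receiver regenerates it from the same key and recovers the message by repeating the XOR.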
Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity

PubMed Central

Gerth, Michael; Hurst, Gregory D.D.

2017-01-01

High throughput (or ‘next generation’) sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology, and (through comparison of genomes), reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and ‘contaminating’ material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess if these ‘contaminations’ provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well established microbiome. To this end, we screened almost 1,000 short read libraries of honey bee (Apis sp.) DNA sequencing project for the presence of microbial sequences, and find sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis associated bacteria, as well as several viral strains de novo. We conclude that ‘contamination’ in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses. PMID:28717593

Short reads from honey bee (Apis sp.) sequencing projects reflect microbial associate diversity.

PubMed

Gerth, Michael; Hurst, Gregory D D

2017-01-01

High throughput (or 'next generation') sequencing has transformed most areas of biological research and is now a standard method that underpins empirical study of organismal biology, and (through comparison of genomes), reveals patterns of evolution. For projects focused on animals, these sequencing methods do not discriminate between the primary target of sequencing (the animal genome) and 'contaminating' material, such as associated microbes. A common first step is to filter out these contaminants to allow better assembly of the animal genome or transcriptome. Here, we aimed to assess if these 'contaminations' provide information with regard to biologically important microorganisms associated with the individual. To achieve this, we examined whether the short read data from Apis retrieved elements of its well established microbiome. To this end, we screened almost 1,000 short read libraries of honey bee (Apis sp.) DNA sequencing project for the presence of microbial sequences, and find sequences from known honey bee microbial associates in at least 11% of them. Further to this, we screened ∼500 Apis RNA sequencing libraries for evidence of viral infections, which were found to be present in about half of them. We then used the data to reconstruct draft genomes of three Apis associated bacteria, as well as several viral strains de novo. We conclude that 'contamination' in short read sequencing libraries can provide useful genomic information on microbial taxa known to be associated with the target organisms, and may even lead to the discovery of novel associations. Finally, we demonstrate that RNAseq samples from experiments commonly carry uneven viral loads across libraries. We note variation in viral presence and load may be a confounding feature of differential gene expression analyses, and as such it should be incorporated as a random factor in analyses.
Shaping the spectrum of random-phase radar waveforms

DOEpatents

Doerry, Armin W.; Marquette, Brandeis

2017-05-09

The various technologies presented herein relate to generation of a desired waveform profile in the form of a spectrum of apparently random noise (e.g., white noise or colored noise), but with precise spectral characteristics. Hence, a waveform profile that could be readily determined (e.g., by a spoofing system) is effectively obscured. Obscuration is achieved by dividing the waveform into a series of chips, each with an assigned frequency, wherein the sequence of chips is subsequently randomized. Randomization can be a function of the application of a key to the chip sequence. During processing of the echo pulse, a copy of the randomized transmitted pulse is recovered or regenerated against which the received echo is correlated. Hence, with the echo energy range-compressed in this manner, it is possible to generate a radar image with precise impulse response.
Compact quantum random number generator based on superluminescent light-emitting diodes

NASA Astrophysics Data System (ADS)

Wei, Shihai; Yang, Jie; Fan, Fan; Huang, Wei; Li, Dashuang; Xu, Bingjie

2017-12-01

By measuring the amplified spontaneous emission (ASE) noise of superluminescent light-emitting diodes, we propose and realize a practical quantum random number generator (QRNG). In the QRNG, after the detection and amplification of the ASE noise, the data acquisition and randomness extraction, which are integrated in a field programmable gate array (FPGA), are both implemented in real time, and the final random bit sequences are delivered to a host computer with a real-time generation rate of 1.2 Gbps. Further, to achieve compactness, all the components of the QRNG are integrated on three independent printed circuit boards, and the QRNG is packed in a small enclosure sized 140 mm × 120 mm × 25 mm. The final random bit sequences pass all the NIST-STS and DIEHARD tests.
Systematic versus random sampling in stereological studies.

PubMed

West, Mark J

2012-12-01

The sampling that takes place at all levels of an experimental design must be random if the estimate is to be unbiased in a statistical sense. There are two fundamental ways by which one can make a random sample of the sections and positions to be probed on the sections. Using a card-sampling analogy, one can pick any card at all out of a deck of cards. This is referred to as independent random sampling because the sampling of any one card is made without reference to the position of the other cards. The other approach to obtaining a random sample would be to pick a card within a set number of cards and others at equal intervals within the deck. Systematic sampling along one axis of many biological structures is more efficient than random sampling, because most biological structures are not randomly organized. This article discusses the merits of systematic versus random sampling in stereological studies.
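The two schemes in the card-deck analogy can be written out directly. This is a minimal sketch with hypothetical function names; it assumes items (cards, sections, or probe positions) are indexed from 0, and that the systematic scheme uses a random start within the first interval followed by fixed steps.

```python
import random

def independent_random_sample(n_items, k, rng=None):
    """Pick k distinct positions, each chosen without reference to the others."""
    rng = rng or random.Random()
    return sorted(rng.sample(range(n_items), k))

def systematic_random_sample(n_items, interval, rng=None):
    """Pick a random start within the first interval, then every interval-th position."""
    rng = rng or random.Random()
    start = rng.randrange(interval)
    return list(range(start, n_items, interval))

# Hypothetical usage on a 52-card deck: 13 cards with either scheme.
rng = random.Random(7)
independent = independent_random_sample(52, 13, rng)
systematic = systematic_random_sample(52, 4, rng)
```

Both schemes give every position the same chance of selection, but the systematic sample spreads evenly along the axis, which is why it tends to be more efficient for structures that are not randomly organized.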
[Methodological quality of an article on the treatment of gastric cancer adopted as protocol by some Chilean hospitals].

PubMed

Manterola, Carlos; Torres, Rodrigo; Burgos, Luis; Vial, Manuel; Pineda, Viviana

2006-07-01

Surgery is a curative treatment for gastric cancer (GC). As relapse is frequent, adjuvant therapies such as postoperative chemo radiotherapy have been tried. In Chile, some hospitals adopted Macdonald's study as a protocol for the treatment of GC. To determine methodological quality and internal and external validity of the Macdonald study. Three instruments were applied that assess methodological quality. A critical appraisal was done and the internal and external validity of the methodological quality was analyzed with two scales: MINCIR (Methodology and Research in Surgery), valid for therapy studies and CONSORT (Consolidated Standards of Reporting Trials), valid for randomized controlled trials (RCT). Guides and scales were applied by 5 researchers with training in clinical epidemiology. The reader's guide verified that the Macdonald study was not directed to answer a clearly defined question. There was random assignment, but the method used is not described and the patients were not considered until the end of the study (36% of the group with surgery plus chemo radiotherapy did not complete treatment). MINCIR scale confirmed a multicentric RCT, not blinded, with an unclear randomized sequence, erroneous sample size estimation, vague objectives and no exclusion criteria. CONSORT system proved the lack of working hypothesis and specific objectives as well as an absence of exclusion criteria and identification of the primary variable, an imprecise estimation of sample size, ambiguities in the randomization process, no blinding, an absence of statistical adjustment and the omission of a subgroup analysis. The instruments applied demonstrated methodological shortcomings that compromise the internal and external validity of the.
Hubble Tarantula Treasury Project - VI. Identification of Pre-Main-Sequence Stars using Machine Learning techniques

NASA Astrophysics Data System (ADS)

Ksoll, Victor F.; Gouliermis, Dimitrios A.; Klessen, Ralf S.; Grebel, Eva K.; Sabbi, Elena; Anderson, Jay; Lennon, Daniel J.; Cignoni, Michele; de Marchi, Guido; Smith, Linda J.; Tosi, Monica; van der Marel, Roeland P.

2018-05-01

The Hubble Tarantula Treasury Project (HTTP) has provided an unprecedented photometric coverage of the entire star-burst region of 30 Doradus down to the half Solar mass limit. We use the deep stellar catalogue of HTTP to identify all the pre-main-sequence (PMS) stars of the region, i.e., stars that have not started their lives on the main-sequence yet. The photometric distinction of these stars from the more evolved populations is not a trivial task due to several factors that alter their colour-magnitude diagram positions. The identification of PMS stars requires, thus, sophisticated statistical methods. We employ Machine Learning Classification techniques on the HTTP survey of more than 800,000 sources to identify the PMS stellar content of the observed field. Our methodology consists of 1) carefully selecting the most probable low-mass PMS stellar population of the star-forming cluster NGC2070, 2) using this sample to train classification algorithms to build a predictive model for PMS stars, and 3) applying this model in order to identify the most probable PMS content across the entire Tarantula Nebula. We employ Decision Tree, Random Forest and Support Vector Machine classifiers to categorise the stars as PMS and Non-PMS. The Random Forest and Support Vector Machine provided the most accurate models, predicting about 20,000 sources with a candidateship probability higher than 50 percent, and almost 10,000 PMS candidates with a probability higher than 95 percent. This is the richest and most accurate photometric catalogue of extragalactic PMS candidates across the extent of a whole star-forming complex.
Safety and Immune Responses in Children After Concurrent or Sequential 2009 H1N1 and 2009–2010 Seasonal Trivalent Influenza Vaccinations

PubMed Central

Frey, Sharon E.; Bernstein, David I.; Gerber, Michael A.; Keyserling, Harry L.; Munoz, Flor M.; Winokur, Patricia L.; Turley, Christine B.; Rupp, Richard E.; Hill, Heather; Wolff, Mark; Noah, Diana L.; Ross, Allison C.; Cress, Gretchen; Belshe, Robert B.

2012-01-01

Background. Administering 2 separate vaccines for seasonal and pandemic influenza was necessary in 2009. Therefore, we conducted a randomized trial of monovalent 2009 H1N1 influenza vaccine (2009 H1N1 vaccine) and seasonal trivalent inactivated influenza vaccine (TIV; split virion) given sequentially or concurrently in previously vaccinated children. Methods. Children randomized to 4 study groups and stratified by age received 1 dose of seasonal TIV and 2 doses of 2009 H1N1 vaccine in 1 of 4 combinations. Injections were given at 21-day intervals and serum samples for hemagglutination inhibition antibody responses were obtained prior to and 21 days after each vaccination. Reactogenicity and adverse events were monitored. Results. All combinations of vaccines were safe in the 531 children enrolled. Generally, 1 dose of 2009 H1N1 vaccine and 1 dose of TIV, regardless of sequence or concurrency of administration, was immunogenic in children ≥10 years of age; children <10 years of age required 2 doses of 2009 H1N1 vaccine. Conclusions. Vaccines were generally well tolerated. The immune responses to 2009 H1N1 vaccine were adequate regardless of the sequence of vaccination in all age groups but the sequence affected titers to TIV antigens. Two doses of 2009 H1N1 vaccine were required to achieve a protective immune response in children <10 years of age. Clinical Trials Registration. NCT00943202. PMID:22802432
Effects of learning duration on implicit transfer.

PubMed

Tanaka, Kanji; Watanabe, Katsumi

2015-10-01

Implicit learning and transfer in sequence acquisition play important roles in daily life. Several previous studies have found that even when participants are not aware that a transfer sequence has been transformed from the learning sequence, they are able to perform the transfer sequence faster and more accurately; this suggests implicit transfer of visuomotor sequences. Here, we investigated whether implicit transfer could be modulated by the number of trials completed in a learning session. Participants learned a sequence through trial and error, known as the m × n task (Hikosaka et al. in J Neurophysiol 74:1652-1661, 1995). In the learning session, participants were required to successfully perform the same sequence 4, 12, 16, or 20 times. In the transfer session, participants then learned one of two other sequences: one where the button configuration Vertically Mirrored the learning sequence, or a randomly generated sequence. Our results show that even when participants did not notice the alternation rule (i.e., vertical mirroring), their total working time was less and their total number of errors was lower in the transfer session compared with those who performed a Random sequence, irrespective of the number of trials completed in the learning session. This result suggests that implicit transfer likely occurs even over a shorter learning duration.
Maintenance treatment for opioid dependence with slow-release oral morphine: a randomized cross-over, non-inferiority study versus methadone

PubMed Central

Beck, Thilo; Haasen, Christian; Verthein, Uwe; Walcher, Stephan; Schuler, Christoph; Backmund, Markus; Ruckes, Christian; Reimer, Jens

2014-01-01

Aims: To compare the efficacy of slow-release oral morphine (SROM) and methadone as maintenance medication for opioid dependence in patients previously treated with methadone. Design: Prospective, multiple-dose, open label, randomized, non-inferiority, cross-over study over two 11-week periods. Methadone treatment was switched to SROM with flexible dosing and vice versa according to period and sequence of treatment. Setting: Fourteen out-patient addiction treatment centres in Switzerland and Germany. Participants: Adults with opioid dependence in methadone maintenance programmes (dose ≥50 mg/day) for ≥26 weeks. Measurements: The efficacy end-point was the proportion of heroin-positive urine samples per patient and period of treatment. Each week, two urine samples were collected, randomly selected and analysed for 6-monoacetyl-morphine and 6-acetylcodeine. Non-inferiority was concluded if the two-sided 95% confidence interval (CI) in the difference of proportions of positive urine samples was below the predefined boundary of 10%. Findings: One hundred and fifty-seven patients fulfilled criteria to form the per protocol population. The proportion of heroin-positive urine samples under SROM treatment (0.20) was non-inferior to the proportion under methadone treatment (0.15) (least-squares mean difference 0.05; 95% CI = 0.02, 0.08; P > 0.01). The 95% CI fell within the 10% non-inferiority margin, confirming the non-inferiority of SROM to methadone. A dose-dependent effect was shown for SROM (i.e. decreasing proportions of heroin-positive urine samples with increasing SROM doses). Retention in treatment showed no significant differences between treatments (period 1/period 2: SROM: 88.7%/82.1%, methadone: 91.1%/88.0%; period 1: P = 0.50, period 2: P = 0.19). Overall, safety outcomes were similar between the two groups. Conclusions: Slow-release oral morphine appears to be at least as effective as methadone in treating people with opioid use disorder. PMID:24304412
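The non-inferiority decision rule stated in the abstract reduces to a single comparison, shown here as a small worked check. The function name is hypothetical; the numbers are the ones reported above (upper 95% CI limit 0.08, margin 0.10), and the sketch assumes that a higher proportion of positive samples is the worse outcome.

```python
def is_non_inferior(ci_upper, margin):
    """Non-inferiority holds when the upper CI limit of the difference stays below the margin."""
    return ci_upper < margin

# Reported values: difference 0.05 with 95% CI (0.02, 0.08), margin 0.10.
sromVsMethadone = is_non_inferior(ci_upper=0.08, margin=0.10)
```

Note that the lower limit of 0.02 lies above zero, so the interval is compatible with a small difference in favour of methadone while still falling inside the predefined margin.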
A Mainly Circum-Mediterranean Origin for West Eurasian and North African mtDNAs in Puerto Rico with Strong Contributions from the Canary Islands and West Africa.

PubMed

Díaz-Zabala, Héctor J; Nieves-Colón, María A; Martínez-Cruzado, Juan C

2017-04-01

Maternal lineages of West Eurasian and North African origin account for 11.5% of total mitochondrial ancestry in Puerto Rico. Historical sources suggest that this ancestry arrived mostly from European migrations that took place during the four centuries of the Spanish colonization of Puerto Rico. This study analyzed 101 mitochondrial control region sequences and diagnostic coding region variants from a sample set randomly and systematically selected using a census-based sampling frame to be representative of the Puerto Rican population, with the goal of defining West Eurasian-North African maternal clades and estimating their possible geographical origin. Median-joining haplotype networks were constructed using hypervariable regions 1 and 2 sequences from various reference populations in search of shared haplotypes. A posterior probability analysis was performed to estimate the percentage of possible origins across wide geographic regions for the entire sample set and for the most common haplogroups on the island. Principal component analyses were conducted to place the Puerto Rican mtDNA set within the variation present among all reference populations. Our study shows that up to 38% of West Eurasian and North African mitochondrial ancestry in Puerto Rico most likely migrated from the Canary Islands. However, most of those haplotypes had previously migrated to the Canary Islands from elsewhere, and there are substantial contributions from various populations across the circum-Mediterranean region and from West African populations related to the modern Wolof and Serer peoples from Senegal and the nomad Fulani who extend up to Cameroon. In conclusion, the West Eurasian mitochondrial ancestry in Puerto Ricans is geographically diverse. However, haplotype diversity seems to be low, and frequencies have been shaped by population bottlenecks, migration waves, and random genetic drift. 
Consequently, approximately 47% of mtDNAs of West Eurasian and North African ancestry in Puerto Rico probably arrived early in its colonial history.
On the Role of Aggregation Prone Regions in Protein Evolution, Stability, and Enzymatic Catalysis: Insights from Diverse Analyses

PubMed Central

Buck, Patrick M.; Kumar, Sandeep; Singh, Satish K.

2013-01-01

The various roles that aggregation prone regions (APRs) are capable of playing in proteins are investigated here via comprehensive analyses of multiple non-redundant datasets containing randomly generated amino acid sequences, monomeric proteins, intrinsically disordered proteins (IDPs) and catalytic residues. Results from this study indicate that the aggregation propensities of monomeric protein sequences have been minimized compared to random sequences with uniform and natural amino acid compositions, as observed by a lower average aggregation propensity and fewer APRs that are shorter in length and more often punctuated by gate-keeper residues. However, evidence for evolutionary selective pressure to disrupt these sequence regions among homologous proteins is inconsistent. APRs are less conserved than average sequence identity among closely related homologues (≥80% sequence identity with a parent) but APRs are more conserved than average sequence identity among homologues that have at least 50% sequence identity with a parent. Structural analyses of APRs indicate that APRs are three times more likely to contain ordered versus disordered residues and that APRs frequently contribute more towards stabilizing proteins than equal length segments from the same protein. Catalytic residues and APRs were also found to be in structural contact significantly more often than expected by random chance. Our findings suggest that proteins have evolved by optimizing their risk of aggregation for cellular environments by both minimizing aggregation prone regions and by conserving those that are important for folding and function. In many cases, these sequence optimizations are insufficient to develop recombinant proteins into commercial products. Rational design strategies aimed at improving protein solubility for biotechnological purposes should carefully evaluate the contributions made by candidate APRs, targeted for disruption, towards protein structure and activity. 
PMID:24146608
Genarris: Random generation of molecular crystal structures and fast screening with a Harris approximation

NASA Astrophysics Data System (ADS)

Li, Xiayue; Curtis, Farren S.; Rose, Timothy; Schober, Christoph; Vazquez-Mayagoitia, Alvaro; Reuter, Karsten; Oberhofer, Harald; Marom, Noa

2018-06-01

We present Genarris, a Python package that performs configuration space screening for molecular crystals of rigid molecules by random sampling with physical constraints. For fast energy evaluations, Genarris employs a Harris approximation, whereby the total density of a molecular crystal is constructed via superposition of single molecule densities. Dispersion-inclusive density functional theory is then used for the Harris density without performing a self-consistency cycle. Genarris uses machine learning for clustering, based on a relative coordinate descriptor developed specifically for molecular crystals, which is shown to be robust in identifying packing motif similarity. In addition to random structure generation, Genarris offers three workflows based on different sequences of successive clustering and selection steps: the "Rigorous" workflow is an exhaustive exploration of the potential energy landscape, the "Energy" workflow produces a set of low energy structures, and the "Diverse" workflow produces a maximally diverse set of structures. The latter is recommended for generating initial populations for genetic algorithms. Here, the implementation of Genarris is reported and its application is demonstrated for three test cases.
Translocation and deletion breakpoints in cancer genomes are associated with potential non-B DNA-forming sequences.

PubMed

Bacolla, Albino; Tainer, John A; Vasquez, Karen M; Cooper, David N

2016-07-08

Gross chromosomal rearrangements (including translocations, deletions, insertions and duplications) are a hallmark of cancer genomes and often create oncogenic fusion genes. An obligate step in the generation of such gross rearrangements is the formation of DNA double-strand breaks (DSBs). Since the genomic distribution of rearrangement breakpoints is non-random, intrinsic cellular factors may predispose certain genomic regions to breakage. Notably, certain DNA sequences with the potential to fold into secondary structures [potential non-B DNA structures (PONDS); e.g. triplexes, quadruplexes, hairpin/cruciforms, Z-DNA and single-stranded looped-out structures with implications in DNA replication and transcription] can stimulate the formation of DNA DSBs. Here, we tested the postulate that these DNA sequences might be found at, or in close proximity to, rearrangement breakpoints. By analyzing the distribution of PONDS-forming sequences within ±500 bases of 19 947 translocation and 46 365 sequence-characterized deletion breakpoints in cancer genomes, we find significant association between PONDS-forming repeats and cancer breakpoints. Specifically, (AT)n, (GAA)n and (GAAA)n constitute the most frequent repeats at translocation breakpoints, whereas A-tracts occur preferentially at deletion breakpoints. Translocation breakpoints near PONDS-forming repeats also recur in different individuals and patient tumor samples. Hence, PONDS-forming sequences represent an intrinsic risk factor for genomic rearrangements in cancer genomes. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
Does the sequence of data collection influence participants' responses to closed and open-ended questions? A methodological study.

PubMed

Covell, Christine L; Sidani, Souraya; Ritchie, Judith A

2012-06-01

The sequence used for collecting quantitative and qualitative data in concurrent mixed-methods research may influence participants' responses. Empirical evidence is needed to determine if the order of data collection in concurrent mixed-methods research biases participants' responses to closed and open-ended questions. This study examined the influence of the quantitative-qualitative sequence on responses to closed and open-ended questions when assessing the same variables or aspects of a phenomenon simultaneously within the same study phase. A descriptive cross-sectional, concurrent mixed-methods design was used to collect quantitative (survey) and qualitative (interview) data. The setting was a large multi-site health care centre in Canada. A convenience sample of 50 registered nurses was selected and participated in the study. Participants were randomly assigned to one of two sequences for data collection, quantitative-qualitative or qualitative-quantitative. Independent t-tests were performed to compare the two groups' responses to the survey items. Directed content analysis was used to compare the participants' responses to the interview questions. The sequence of data collection did not greatly affect the participants' responses to the closed-ended questions (survey items) or the open-ended questions (interview questions). The sequencing of data collection, when using both survey and semi-structured interviews, may not bias participants' responses to closed or open-ended questions. Additional research is required to confirm these findings. Copyright © 2011 Elsevier Ltd. All rights reserved.
From cultured to uncultured genome sequences: metagenomics and modeling microbial ecosystems.

PubMed

Garza, Daniel R; Dutilh, Bas E

2015-11-01

Microorganisms and the viruses that infect them are the most numerous biological entities on Earth and enclose its greatest biodiversity and genetic reservoir. With strength in their numbers, these microscopic organisms are major players in the cycles of energy and matter that sustain all life. Scientists have only scratched the surface of this vast microbial world through culture-dependent methods. Recent developments in generating metagenomes, large random samples of nucleic acid sequences isolated directly from the environment, are providing comprehensive portraits of the composition, structure, and functioning of microbial communities. Moreover, advances in metagenomic analysis have created the possibility of obtaining complete or nearly complete genome sequences from uncultured microorganisms, providing important means to study their biology, ecology, and evolution. Here we review some of the recent developments in the field of metagenomics, focusing on the discovery of genetic novelty and on methods for obtaining uncultured genome sequences, including through the recycling of previously published datasets. Moreover we discuss how metagenomics has become a core scientific tool to characterize eco-evolutionary patterns of microbial ecosystems, thus allowing us to simultaneously discover new microbes and study their natural communities. We conclude by discussing general guidelines and challenges for modeling the interactions between uncultured microorganisms and viruses based on the information contained in their genome sequences. These models will significantly advance our understanding of the functioning of microbial ecosystems and the roles of microbes in the environment.
Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples

PubMed Central

Liu, Zhandong; Venkatesh, Santosh S; Maley, Carlo C

2008-01-01

Background: Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. Results: We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while < 2% of 19 bp oligomers are present. Other species showed different ranges of > 98% to < 2% of possible oligomers in D. melanogaster (12–17 bp), C. elegans (11–17 bp), A. thaliana (11–17 bp), S. cerevisiae (10–16 bp) and E. coli (9–15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy.
Conclusion: Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues. PMID:18973670
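The sequence-space coverage reported in the abstract above (the share of all possible oligomers of a given length that actually occur in a genome) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `kmer_coverage` is hypothetical, only one strand is scanned, and windows containing characters other than A, C, G or T are skipped, all of which are simplifying assumptions.

```python
def kmer_coverage(sequence: str, k: int) -> float:
    """Return the fraction of the 4**k possible DNA k-mers found in `sequence`.

    Assumptions (not taken from the abstract): only the given strand is
    scanned, and any window containing a non-ACGT symbol is ignored.
    """
    seq = sequence.upper()
    alphabet = set("ACGT")
    seen = set()
    # Slide a window of length k along the sequence and record each k-mer.
    for start in range(len(seq) - k + 1):
        kmer = seq[start:start + k]
        if set(kmer) <= alphabet:
            seen.add(kmer)
    return len(seen) / 4 ** k
```

Calling `kmer_coverage(genome, 12)` on a full genome string would give the kind of coverage fraction the abstract describes; for real genomes a memory-efficient k-mer counter would be a more practical choice than a Python set.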
Sequence space coverage, entropy of genomes and the potential to detect non-human DNA in human samples.

PubMed

Liu, Zhandong; Venkatesh, Santosh S; Maley, Carlo C

2008-10-30

Genomes store information for building and maintaining organisms. Complete sequencing of many genomes provides the opportunity to study and compare global information properties of those genomes. We have analyzed aspects of the information content of Homo sapiens, Mus musculus, Drosophila melanogaster, Caenorhabditis elegans, Arabidopsis thaliana, Saccharomyces cerevisiae, and Escherichia coli (K-12) genomes. Virtually all possible (> 98%) 12 bp oligomers appear in vertebrate genomes while < 2% of 19 bp oligomers are present. Other species showed different ranges of > 98% to < 2% of possible oligomers in D. melanogaster (12-17 bp), C. elegans (11-17 bp), A. thaliana (11-17 bp), S. cerevisiae (10-16 bp) and E. coli (9-15 bp). Frequencies of unique oligomers in the genomes follow similar patterns. We identified a set of 2.6 M 15-mers that are more than 1 nucleotide different from all 15-mers in the human genome and so could be used as probes to detect microbes in human samples. In a human sample, these probes would detect 100% of the 433 currently fully sequenced prokaryotes and 75% of the 3065 fully sequenced viruses. The human genome is significantly more compact in sequence space than a random genome. We identified the most frequent 5- to 20-mers in the human genome, which may prove useful as PCR primers. We also identified a bacterium, Anaeromyxobacter dehalogenans, which has an exceptionally low diversity of oligomers given the size of its genome and its GC content. The entropy of coding regions in the human genome is significantly higher than non-coding regions and chromosomes. However chromosomes 1, 2, 9, 12 and 14 have a relatively high proportion of coding DNA without high entropy, and chromosome 20 is the opposite with a low frequency of coding regions but relatively high entropy. 
Measures of the frequency of oligomers are useful for designing PCR assays and for identifying chromosomes and organisms with hidden structure that had not been previously recognized. This information may be used to detect novel microbes in human tissues.
Quasirandom geometric networks from low-discrepancy sequences

NASA Astrophysics Data System (ADS)

Estrada, Ernesto

2017-08-01

We define quasirandom geometric networks using low-discrepancy sequences, such as Halton, Sobol, and Niederreiter. The networks are built in d dimensions by considering the d-tuples of digits generated by these sequences as the coordinates of the vertices of the networks in a d-dimensional unit hypercube I^d. Then, two vertices are connected by an edge if they are at a distance smaller than a connection radius. We investigate computationally 11 network-theoretic properties of two-dimensional quasirandom networks and compare them with analogous random geometric networks. We also study their degree distribution and their spectral density distributions. We conclude from this intensive computational study that in terms of the uniformity of the distribution of the vertices in the unit square, the quasirandom networks look more random than the random geometric networks. We include an analysis of potential strategies for generating higher-dimensional quasirandom networks, where it is known that some of the low-discrepancy sequences are highly correlated. In this respect, we conclude that up to dimension 20, the use of scrambling, skipping and leaping strategies generates quasirandom networks with the desired properties of uniformity. Finally, we consider a diffusive process taking place on the nodes and edges of the quasirandom and random geometric graphs. We show that the diffusion time is shorter in the quasirandom graphs as a consequence of their larger structural homogeneity. In the random geometric graphs the diffusion produces clusters of concentration that make the process slower. Such clusters are a direct consequence of the heterogeneous and irregular distribution of the nodes in the unit square on which the generation of random geometric graphs is based.
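The construction described in the abstract above (low-discrepancy points as vertices, edges between points closer than a connection radius) can be sketched in Python as follows. This is an illustrative sketch only, not the author's implementation: it uses an unscrambled Halton sequence with bases 2 and 3, Euclidean distance, and the hypothetical function names `halton`, `halton_points` and `geometric_edges`.

```python
import math


def halton(index: int, base: int) -> float:
    """Radical inverse of `index` in `base`: one coordinate of a Halton point."""
    result, fraction = 0.0, 1.0 / base
    while index > 0:
        result += (index % base) * fraction
        index //= base
        fraction /= base
    return result


def halton_points(n: int, bases=(2, 3)):
    """First n Halton points in the unit hypercube, one coordinate per base."""
    return [tuple(halton(i, b) for b in bases) for i in range(1, n + 1)]


def geometric_edges(points, radius: float):
    """Connect every pair of points whose Euclidean distance is below `radius`."""
    edges = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) < radius:
                edges.append((i, j))
    return edges
```

For example, `geometric_edges(halton_points(200), 0.1)` gives the edge list of a two-dimensional quasirandom network. Replacing `halton_points` with uniformly drawn random points would give the analogous random geometric network for comparison.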
Multiple ECG Fiducial Points-Based Random Binary Sequence Generation for Securing Wireless Body Area Networks.

PubMed

Zheng, Guanglou; Fang, Gengfa; Shankaran, Rajan; Orgun, Mehmet A; Zhou, Jie; Qiao, Li; Saleem, Kashif

2017-05-01

Generating random binary sequences (BSes) is a fundamental requirement in cryptography. A BS is a sequence of N bits, and each bit has a value of 0 or 1. For securing sensors within wireless body area networks (WBANs), electrocardiogram (ECG)-based BS generation methods have been widely investigated in which interpulse intervals (IPIs) from each heartbeat cycle are processed to produce BSes. Using these IPI-based methods to generate a 128-bit BS in real time normally takes around half a minute. In order to improve the time efficiency of such methods, this paper presents an ECG multiple fiducial-points based binary sequence generation (MFBSG) algorithm. The technique of discrete wavelet transforms is employed to detect arrival time of these fiducial points, such as P, Q, R, S, and T peaks. Time intervals between them, including RR, RQ, RS, RP, and RT intervals, are then calculated based on this arrival time, and are used as ECG features to generate random BSes with low latency. According to our analysis on real ECG data, these ECG feature values exhibit the property of randomness and, thus, can be utilized to generate random BSes. Compared with the schemes that solely rely on IPIs to generate BSes, this MFBSG algorithm uses five feature values from one heart beat cycle, and can be up to five times faster than the solely IPI-based methods. So, it achieves a design goal of low latency. According to our analysis, the complexity of the algorithm is comparable to that of fast Fourier transforms. These randomly generated ECG BSes can be used as security keys for encryption or authentication in a WBAN system.
Strategies for Achieving High Sequencing Accuracy for Low Diversity Samples and Avoiding Sample Bleeding Using Illumina Platform

PubMed Central

Mitra, Abhishek; Skrzypczak, Magdalena; Ginalski, Krzysztof; Rowicka, Maga

2015-01-01

Sequencing microRNA, reduced representation sequencing, Hi-C technology and any method requiring the use of in-house barcodes result in sequencing libraries with low initial sequence diversity. Sequencing such data on the Illumina platform typically produces low quality data due to the limitations of the Illumina cluster calling algorithm. Moreover, even in the case of diverse samples, these limitations are causing substantial inaccuracies in multiplexed sample assignment (sample bleeding). Such inaccuracies are unacceptable in clinical applications, and in some other fields (e.g. detection of rare variants). Here, we discuss how both problems with quality of low-diversity samples and sample bleeding are caused by incorrect detection of clusters on the flowcell during initial sequencing cycles. We propose simple software modifications (Long Template Protocol) that overcome this problem. We present experimental results showing that our Long Template Protocol remarkably increases data quality for low diversity samples, as compared with the standard analysis protocol; it also substantially reduces sample bleeding for all samples. For comprehensiveness, we also discuss and compare experimental results from alternative approaches to sequencing low diversity samples. First, we discuss how the low diversity problem, if caused by barcodes, can be avoided altogether at the barcode design stage. Second and third, we present modified guidelines, which are more stringent than the manufacturer’s, for mixing low diversity samples with diverse samples and lowering cluster density, which in our experience consistently produces high quality data from low diversity samples. Fourth and fifth, we present rescue strategies that can be applied when sequencing results in low quality data and when there is no more biological material available. In such cases, we propose that the flowcell be re-hybridized and sequenced again using our Long Template Protocol. 
Alternatively, we discuss how analysis can be repeated from saved sequencing images using the Long Template Protocol to increase accuracy. PMID:25860802

Affinity selection of Nipah and Hendra virus-related vaccine candidates from a complex random peptide library displayed on bacteriophage virus-like particles

DOE Office of Scientific and Technical Information (OSTI.GOV)

Peabody, David S.; Chackerian, Bryce; Ashley, Carlee

The invention relates to virus-like particles of bacteriophage MS2 (MS2 VLPs) displaying peptide epitopes or peptide mimics of epitopes of Nipah Virus envelope glycoprotein that elicit an immune response against Nipah Virus upon vaccination of humans or animals. Affinity selection on Nipah Virus-neutralizing monoclonal antibodies using random sequence peptide libraries on MS2 VLPs selected peptides with sequence similarity to peptide sequences found within the envelope glycoprotein of Nipah itself, thus identifying the epitopes the antibodies recognize. The selected peptide sequences themselves are not necessarily identical in all respects to a sequence within Nipah Virus glycoprotein, and therefore may be referred to as epitope mimics. VLPs displaying these epitope mimics can serve as a vaccine. On the other hand, display of the corresponding wild-type sequence derived from Nipah Virus and corresponding to the epitope mapped by affinity selection may also be used as a vaccine.
Cluster-Based Multipolling Sequencing Algorithm for Collecting RFID Data in Wireless LANs

NASA Astrophysics Data System (ADS)

Choi, Woo-Yong; Chatterjee, Mainak

2015-03-01

With the growing use of RFID (Radio Frequency Identification), it is becoming important to devise ways to read RFID tags in real time. Access points (APs) of IEEE 802.11-based wireless Local Area Networks (LANs) are being integrated with RFID networks that can efficiently collect real-time RFID data. Several schemes, such as multipolling methods based on the dynamic search algorithm and random sequencing, have been proposed. However, as the number of RFID readers associated with an AP increases, it becomes difficult for the dynamic search algorithm to derive the multipolling sequence in real time. Though multipolling methods can eliminate the polling overhead, we still need to enhance the performance of the multipolling methods based on random sequencing. To that end, we propose a real-time cluster-based multipolling sequencing algorithm that eliminates more than 90% of the polling overhead, particularly when the dynamic search algorithm fails to derive the multipolling sequence in real time.
Influence of motion on face recognition.

PubMed

Bonfiglio, Natale S; Manfredi, Valentina; Pessa, Eliano

2012-02-01

The influence of motion information and temporal associations on recognition of non-familiar faces was investigated using two groups which performed a face recognition task. One group was presented with regular temporal sequences of face views designed to produce the impression of motion of the face rotating in depth, the other group with random sequences of the same views. In one condition, participants viewed the sequences of the views in rapid succession with a negligible interstimulus interval (ISI). This condition was characterized by three different presentation times. In another condition, participants were presented a sequence with a 1-sec. ISI among the views. Regular sequences of views with a negligible ISI and a shorter presentation time were hypothesized to give rise to better recognition, related to a stronger impression of face rotation. Analysis of data from 45 participants showed a shorter presentation time was associated with significantly better accuracy on the recognition task; however, differences between performances associated with regular and random sequences were not significant.
Tick-borne haemoparasites and Anaplasmataceae in domestic dogs in Zambia.

PubMed

Qiu, Yongjin; Kaneko, Chiho; Kajihara, Masahiro; Ngonda, Saasa; Simulundu, Edgar; Muleya, Walter; Thu, May June; Hang'ombe, Mudenda Bernard; Katakura, Ken; Takada, Ayato; Sawa, Hirofumi; Simuunza, Martin; Nakao, Ryo

2018-05-01

Tick-borne diseases (TBDs), including emerging and re-emerging infectious diseases, are important threats to human and animal health worldwide. Indeed, the number of reported human and animal infectious cases of novel TBD agents has increased in recent decades. However, TBDs tend to be neglected, especially in resource-limited countries that often have limited diagnostic capacity. The aim of this molecular survey was to detect and characterise tick-borne pathogens (Babesia, Theileria, and Hepatozoon parasites and Anaplasmataceae bacteria) in domestic dogs in Zambia. In total, 247 canine peripheral blood samples were collected in Lusaka, Mazabuka, Monze, and Shangombo. Conventional PCR to detect the selected pathogens was performed using DNA extracted from canine blood. One hundred eleven samples were positive for protozoa and 5 were positive for Anaplasmataceae. Sequencing of thirty-five randomly selected protozoa-positive samples revealed the presence of Babesia rossi, Babesia vogeli, and Hepatozoon canis 18S rDNA. Based on these sequences, a multiplex PCR system was developed to yield PCR products with different amplicons, the size of which depended on the parasite species; thus, each species could be identified without the need for sequence analysis. Approximately 40% of dogs were positive for H. canis. In particular, the positive rate (75.2%) of H. canis infection was significantly higher in Shangombo than in other sampling sites. Multiplex PCR assay detected B. rossi and B. vogeli infections in five and seven dogs, respectively, indicating that this approach is useful for detecting parasites with low prevalence. Sequencing analysis of gltA and groEL genes of Anaplasmataceae revealed that two and one dogs in Lusaka were infected with Anaplasma platys and Ehrlichia canis, respectively. The data indicated that Zambian dogs were infected with multiple tick-borne pathogens such as H. canis, B. rossi, B. vogeli, A. platys, E. canis and uncharacterized Ehrlichia sp. 
Since some of these parasites are zoonotic, concerted efforts are needed to raise awareness of, and control, these tick-borne pathogens. Copyright © 2018 Elsevier GmbH. All rights reserved.
Use of Sequence-independent, single-primer amplification (SISPA) with NGS platform for detection of RNA viruses in clinical samples

USDA-ARS's Scientific Manuscript database

Current technologies for next generation sequencing (NGS) have revolutionized metagenomics analysis of clinical samples. One advantage of the NGS platform is the possibility to sequence the genetic material in samples without any prior knowledge of the sequence contained within. Sequence-Independent...
VarWalker: Personalized Mutation Network Analysis of Putative Cancer Genes from Next-Generation Sequencing Data

PubMed Central

Jia, Peilin; Zhao, Zhongming

2014-01-01

A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single-genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutation genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method in >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data. PMID:24516372
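The Random Walk with Restart step mentioned in the abstract above can be sketched generically in Python. This is a textbook formulation, not VarWalker's actual code or parameters: the function name `random_walk_with_restart`, the restart probability of 0.5, the convergence tolerance, and the dictionary-based graph format are all assumptions made for illustration. The graph is assumed to be undirected, with every neighbour also present as a key.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    `adjacency` maps each node to a list of its neighbours; `seeds` are the
    start nodes (for example, mutated genes). W spreads the probability held
    by a node evenly over its neighbours. All defaults are illustrative.
    """
    nodes = list(adjacency)
    seed_set = set(seeds)
    p0 = {n: (1.0 / len(seed_set) if n in seed_set else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        # Restart term: a fixed share of probability always returns to the seeds.
        nxt = {n: restart * p0[n] for n in nodes}
        for u in nodes:
            neighbours = adjacency[u]
            if not neighbours:
                # Isolated node: its walking probability stays where it is.
                nxt[u] += (1.0 - restart) * p[u]
                continue
            share = (1.0 - restart) * p[u] / len(neighbours)
            for v in neighbours:
                nxt[v] += share
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p
```

Nodes with a high stationary probability are those close to the seeds in the network, which is the sense in which such a walk can select "close interactors" of mutated genes in a protein-protein interaction network.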
VarWalker: personalized mutation network analysis of putative cancer genes from next-generation sequencing data.

PubMed

Jia, Peilin; Zhao, Zhongming

2014-02-01

A major challenge in interpreting the large volume of mutation data identified by next-generation sequencing (NGS) is to distinguish driver mutations from neutral passenger mutations to facilitate the identification of targetable genes and new drugs. Current approaches are primarily based on mutation frequencies of single-genes, which lack the power to detect infrequently mutated driver genes and ignore functional interconnection and regulation among cancer genes. We propose a novel mutation network method, VarWalker, to prioritize driver genes in large scale cancer mutation data. VarWalker fits generalized additive models for each sample based on sample-specific mutation profiles and builds on the joint frequency of both mutation genes and their close interactors. These interactors are selected and optimized using the Random Walk with Restart algorithm in a protein-protein interaction network. We applied the method in >300 tumor genomes in two large-scale NGS benchmark datasets: 183 lung adenocarcinoma samples and 121 melanoma samples. In each cancer, we derived a consensus mutation subnetwork containing significantly enriched consensus cancer genes and cancer-related functional pathways. These cancer-specific mutation networks were then validated using independent datasets for each cancer. Importantly, VarWalker prioritizes well-known, infrequently mutated genes, which are shown to interact with highly recurrently mutated genes yet have been ignored by conventional single-gene-based approaches. Utilizing VarWalker, we demonstrated that network-assisted approaches can be effectively adapted to facilitate the detection of cancer driver genes in NGS data.
Whole genome sequencing distinguishes between relapse and reinfection in recurrent leprosy cases

PubMed Central

Bührer-Sékula, Samira; Benjak, Andrej; Loiseau, Chloé; Singh, Pushpendra; Pontes, Maria A. A.; Gonçalves, Heitor S.; Hungria, Emerith M.; Busso, Philippe; Piton, Jérémie; Silveira, Maria I. S.; Cruz, Rossilene; Schetinni, Antônio; Costa, Maurício B.; Virmond, Marcos C. L.; Diorio, Suzana M.; Dias-Baptista, Ida M. F.; Rosa, Patricia S.; Matsuoka, Masanori; Penna, Maria L. F.; Cole, Stewart T.; Penna, Gerson O.

2017-01-01

Background: Since leprosy is both treated and controlled by multidrug therapy (MDT) it is important to monitor recurrent cases for drug resistance and to distinguish between relapse and reinfection as a means of assessing therapeutic efficacy. All three objectives can be reached with single nucleotide resolution using next generation sequencing and bioinformatics analysis of Mycobacterium leprae DNA present in human skin. Methodology: DNA was isolated by means of optimized extraction and enrichment methods from samples from three recurrent cases in leprosy patients participating in an open-label, randomized, controlled clinical trial of uniform MDT in Brazil (U-MDT/CT-BR). Genome-wide sequencing of M. leprae was performed and the resultant sequence assemblies analyzed in silico. Principal findings: In all three cases, no mutations responsible for resistance to rifampicin, dapsone and ofloxacin were found, thus eliminating drug resistance as a possible cause of disease recurrence. However, sequence differences were detected between the strains from the first and second disease episodes in all three patients. In one case, clear evidence was obtained for reinfection with an unrelated strain whereas in the other two cases, relapse appeared more probable. Conclusions/Significance: This is the first report of using M. leprae whole genome sequencing to reveal that treated and cured leprosy patients who remain in endemic areas can be reinfected by another strain. Next generation sequencing can be applied reliably to M. leprae DNA extracted from biopsies to discriminate between cases of relapse and reinfection, thereby providing a powerful tool for evaluating different outcomes of therapeutic regimens and for following disease transmission. PMID:28617800
Foldamer hypothesis for the growth and sequence differentiation of prebiotic polymers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Guseva, Elizaveta; Zuckermann, Ronald N.; Dill, Ken A.

It is not known how life originated. It is thought that prebiotic processes were able to synthesize short random polymers. However, then, how do short-chain molecules spontaneously grow longer? Also, how would random chains grow more informational and become autocatalytic (i.e., increasing their own concentrations)? We study the folding and binding of random sequences of hydrophobic (H) and polar (P) monomers in a computational model. We find that even short hydrophobic polar (HP) chains can collapse into relatively compact structures, exposing hydrophobic surfaces. In this way, they act as primitive versions of today’s protein catalysts, elongating other such HP polymers as ribosomes would now do. Such foldamer catalysts are shown to form an autocatalytic set, through which short chains grow into longer chains that have particular sequences. An attractive feature of this model is that it does not overconverge to a single solution; it gives ensembles that could further evolve under selection. This mechanism describes how specific sequences and conformations could contribute to the chemistry-to-biology (CTB) transition.
Foldamer hypothesis for the growth and sequence differentiation of prebiotic polymers

DOE PAGES

Guseva, Elizaveta; Zuckermann, Ronald N.; Dill, Ken A.

2017-08-22

It is not known how life originated. It is thought that prebiotic processes were able to synthesize short random polymers. However, then, how do short-chain molecules spontaneously grow longer? Also, how would random chains grow more informational and become autocatalytic (i.e., increasing their own concentrations)? We study the folding and binding of random sequences of hydrophobic (H) and polar (P) monomers in a computational model. We find that even short hydrophobic polar (HP) chains can collapse into relatively compact structures, exposing hydrophobic surfaces. In this way, they act as primitive versions of today’s protein catalysts, elongating other such HP polymers as ribosomes would now do. Such foldamer catalysts are shown to form an autocatalytic set, through which short chains grow into longer chains that have particular sequences. An attractive feature of this model is that it does not overconverge to a single solution; it gives ensembles that could further evolve under selection. This mechanism describes how specific sequences and conformations could contribute to the chemistry-to-biology (CTB) transition.
Foldamer hypothesis for the growth and sequence differentiation of prebiotic polymers

PubMed Central

Guseva, Elizaveta; Zuckermann, Ronald N.; Dill, Ken A.

2017-01-01

It is not known how life originated. It is thought that prebiotic processes were able to synthesize short random polymers. However, then, how do short-chain molecules spontaneously grow longer? Also, how would random chains grow more informational and become autocatalytic (i.e., increasing their own concentrations)? We study the folding and binding of random sequences of hydrophobic (H) and polar (P) monomers in a computational model. We find that even short hydrophobic polar (HP) chains can collapse into relatively compact structures, exposing hydrophobic surfaces. In this way, they act as primitive versions of today’s protein catalysts, elongating other such HP polymers as ribosomes would now do. Such foldamer catalysts are shown to form an autocatalytic set, through which short chains grow into longer chains that have particular sequences. An attractive feature of this model is that it does not overconverge to a single solution; it gives ensembles that could further evolve under selection. This mechanism describes how specific sequences and conformations could contribute to the chemistry-to-biology (CTB) transition. PMID:28831002
Improved Compressive Sensing of Natural Scenes Using Localized Random Sampling

PubMed Central

Barranca, Victor J.; Kovačič, Gregor; Zhou, Douglas; Cai, David

2016-01-01

Compressive sensing (CS) theory demonstrates that by using uniformly-random sampling, rather than uniformly-spaced sampling, higher quality image reconstructions are often achievable. Considering that the structure of sampling protocols has such a profound impact on the quality of image reconstructions, we formulate a new sampling scheme motivated by physiological receptive field structure, localized random sampling, which yields significantly improved CS image reconstructions. For each set of localized image measurements, our sampling method first randomly selects an image pixel and then measures its nearby pixels with probability depending on their distance from the initially selected pixel. We compare the uniformly-random and localized random sampling methods over a large space of sampling parameters, and show that, for the optimal parameter choices, higher quality image reconstructions can be consistently obtained by using localized random sampling. In addition, we argue that the localized random CS optimal parameter choice is stable with respect to diverse natural images, and scales with the number of samples used for reconstruction. We expect that the localized random sampling protocol helps to explain the evolutionarily advantageous nature of receptive field structure in visual systems and suggests several future research areas in CS theory and its application to brain imaging. PMID:27555464
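The sampling step described in this abstract can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the Gaussian fall-off `exp(-d^2 / (2 * sigma^2))`, the function name `localized_measurement`, and the parameter `sigma` are assumptions, since the abstract says only that the inclusion probability depends on the distance from the initially selected pixel.

```python
import math
import random


def localized_measurement(width, height, sigma, rng):
    """Return (center, pixels) for one localized random measurement.

    ASSUMPTION: a pixel at distance d from the center is included with
    probability exp(-d^2 / (2 * sigma^2)). The abstract does not state
    the actual probability function.
    """
    # Step 1: pick the initial pixel uniformly at random.
    cx, cy = rng.randrange(width), rng.randrange(height)
    pixels = set()
    # Step 2: include every other pixel with a distance-dependent probability.
    for x in range(width):
        for y in range(height):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            if rng.random() < math.exp(-d2 / (2.0 * sigma ** 2)):
                pixels.add((x, y))
    return (cx, cy), pixels


rng = random.Random(0)  # fixed seed so the sketch is reproducible
center, pixels = localized_measurement(32, 32, sigma=2.0, rng=rng)
```

Repeating the call once per measurement would give a set of localized masks; a uniformly-random scheme would instead include every pixel with the same probability regardless of distance.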
A Micro-Computer Model for Army Air Defense Training.

DTIC Science & Technology

1985-03-01

generator. The period is 32763 numbers generated before a repetitive sequence is encountered on the development system. Chi-Squared tests for frequency... Tests. Periodicity. The period is 32763 numbers generated before a repetitive sequence is encountered on the development system. This was... positions in the test array. This was done with several different random number seeds. In each case 32763 random numbers were generated before a
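The two checks mentioned in this excerpt, periodicity and a chi-squared frequency test, can be illustrated as follows. This is only a sketch under stated assumptions: the excerpt does not specify the report's generator, so a toy linear congruential generator with hypothetical constants is used here. Its period is 32768, not the 32763 quoted above.

```python
def lcg_step(state, a=25173, c=13849, m=32768):
    """One step of a toy LCG. ASSUMPTION: constants are illustrative only."""
    return (a * state + c) % m


def cycle_length(step, seed):
    """Count how many numbers are generated before the sequence repeats."""
    seen = {}
    state, count = seed, 0
    while state not in seen:
        seen[state] = count
        state = step(state)
        count += 1
    return count - seen[state]


def chi_squared_uniform(values, bins, modulus):
    """Chi-squared statistic for equal-width bins over range(modulus)."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // modulus] += 1
    expected = len(values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


period = cycle_length(lcg_step, seed=1)

# Collect one full cycle of outputs and test their frequency distribution.
full_cycle, state = [], 1
for _ in range(period):
    full_cycle.append(state)
    state = lcg_step(state)
statistic = chi_squared_uniform(full_cycle, bins=16, modulus=32768)
```

Over a full cycle this toy generator visits every value exactly once, so the frequency statistic is zero by construction; on shorter runs the statistic would be compared against a chi-squared critical value for `bins - 1` degrees of freedom.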
The short-term treatment effects on the microbiota at the dorsum of the tongue in intra-oral halitosis patients--a randomized clinical trial.

PubMed

Ademovski, Seida Erovic; Persson, G Rutger; Winkel, Edwin; Tangerman, Albert; Lingström, Peter; Renvert, Stefan

2013-03-01

This study aims to assess the effects of rinsing with zinc- and chlorhexidine-containing mouth rinse with or without adjunct tongue scraping on volatile sulfur compounds (VSCs) in breath air, and the microbiota at the dorsum of the tongue. A randomized single-masked controlled clinical trial with a cross-over study design over 14 days including 21 subjects was performed. Bacterial samples from the dorsum of the tongue were assayed by checkerboard DNA-DNA hybridization. No halitosis (identified by VSC assessments) at day 14 was identified in 12/21 subjects with active rinse alone, in 10/21 with adjunct use of tongue scraper, in 1/21 for negative control rinse alone, and in 3/21 in the control and tongue scraping sequence. At day 14, significantly lower counts were identified only in the active rinse sequence (p < 0.001) for 15/78 species including, Fusobacterium sp., Porphyromonas gingivalis, Pseudomonas aeruginosa, Staphylococcus aureus, and Tannerella forsythia. A decrease in bacteria from baseline to day 14 was found in successfully treated subjects for 9/74 species including: P. gingivalis, Prevotella melaninogenica, S. aureus, and Treponema denticola. Baseline VSC scores were correlated with several bacterial species. The use of a tongue scraper combined with active rinse did not change the levels of VSC compared to rinsing alone. VSC scores were not associated with bacterial counts in samples taken from the dorsum of the tongue. The active rinse alone containing zinc and chlorhexidine had effects on intra-oral halitosis and reduced bacterial counts of species associated with malodor. Tongue scraping provided no beneficial effects on the microbiota studied. Periodontally healthy subjects with intra-oral halitosis benefit from daily rinsing with zinc- and chlorhexidine-containing mouth rinse.
Not all (possibly) “random” sequences are created equal

PubMed Central

Pincus, Steve; Kalman, Rudolf E.

1997-01-01

The need to assess the randomness of a single sequence, especially a finite sequence, is ubiquitous, yet is unaddressed by axiomatic probability theory. Here, we assess randomness via approximate entropy (ApEn), a computable measure of sequential irregularity, applicable to single sequences of both (even very short) finite and infinite length. We indicate the novelty and facility of the multidimensional viewpoint taken by ApEn, in contrast to classical measures. Furthermore and notably, for finite length, finite state sequences, one can identify maximally irregular sequences, and then apply ApEn to quantify the extent to which given sequences differ from maximal irregularity, via a set of deficit (defm) functions. The utility of these defm functions which we show allows one to considerably refine the notions of probabilistic independence and normality, is featured in several studies, including (i) digits of e, π, √2, and √3, both in base 2 and in base 10, and (ii) sequences given by fractional parts of multiples of irrationals. We prove companion analytic results, which also feature in a discussion of the role and validity of the almost sure properties from axiomatic probability theory insofar as they apply to specified sequences and sets of sequences (in the physical world). We conclude by relating the present results and perspective to both previous and subsequent studies. PMID:11038612
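As a rough illustration of the measure discussed in this abstract, the sketch below computes approximate entropy using the standard definition ApEn(m, r) = Φ^m(r) − Φ^(m+1)(r), where Φ^m(r) averages the logarithm of the fraction of length-m blocks lying within tolerance r of each block (self-matches included). The function names, the choice m = 1, r = 0.5, and the two example sequences are assumptions made for illustration; the deficit (defm) functions are not implemented.

```python
import math


def _phi(u, m, r):
    """Average log-frequency of length-m blocks matching within tolerance r."""
    n = len(u) - m + 1
    blocks = [u[i:i + m] for i in range(n)]
    total = 0.0
    for a in blocks:
        # Distance between blocks is the maximum componentwise difference.
        matches = sum(
            1 for b in blocks
            if max(abs(x - y) for x, y in zip(a, b)) <= r
        )
        total += math.log(matches / n)
    return total / n


def apen(u, m, r):
    """Approximate entropy ApEn(m, r) of the finite sequence u."""
    return _phi(u, m, r) - _phi(u, m + 1, r)


# A perfectly alternating binary sequence versus a less regular one.
alternating = [0, 1] * 10
irregular = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 0, 1, 1]

apen_alternating = apen(alternating, m=1, r=0.5)
apen_irregular = apen(irregular, m=1, r=0.5)
```

The alternating sequence is fully predictable from one symbol to the next, so its ApEn is close to zero, while the irregular sequence scores much higher. Both sequences contain ten zeros and ten ones, which shows that ApEn captures sequential structure that a simple frequency count misses.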
Construction of random sheared fosmid library from Chinese cabbage and its use for Brassica rapa genome sequencing project.

PubMed

Park, Tae-Ho; Park, Beom-Seok; Kim, Jin-A; Hong, Joon Ki; Jin, Mina; Seol, Young-Joo; Mun, Jeong-Hwan

2011-01-01

As a part of the Multinational Genome Sequencing Project of Brassica rapa, linkage group R9 and R3 were sequenced using a bacterial artificial chromosome (BAC) by BAC strategy. The current physical contigs are expected to cover approximately 90% euchromatins of both chromosomes. As the project progresses, BAC selection for sequence extension becomes more limited because BAC libraries are restriction enzyme-specific. To support the project, a random sheared fosmid library was constructed. The library consists of 97536 clones with average insert size of approximately 40 kb corresponding to seven genome equivalents, assuming a Chinese cabbage genome size of 550 Mb. The library was screened with primers designed at the end of sequences of nine points of scaffold gaps where BAC clones cannot be selected to extend the physical contigs. The selected positive clones were end-sequenced to check the overlap between the fosmid clones and the adjacent BAC clones. Nine fosmid clones were selected and fully sequenced. The sequences revealed two completed gap filling and seven sequence extensions, which can be used for further selection of BAC clones confirming that the fosmid library will facilitate the sequence completion of B. rapa. Copyright © 2011. Published by Elsevier Ltd.
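The "seven genome equivalents" figure follows directly from the numbers quoted in this abstract, as the short calculation below shows. The second quantity, the chance that a given locus is present in the library, uses the Clarke-Carbon formula P = 1 − (1 − f)^N, which the abstract does not mention; it is added here only as an illustrative assumption, and the variable names are hypothetical.

```python
# Figures taken from the abstract.
clones = 97536
insert_size_mb = 0.040      # approximately 40 kb average insert
genome_size_mb = 550.0      # assumed Chinese cabbage genome size

# Library coverage in genome equivalents.
coverage = clones * insert_size_mb / genome_size_mb

# ASSUMPTION: Clarke-Carbon estimate of the probability that any given
# locus is represented at least once (not stated in the abstract).
fraction_per_clone = insert_size_mb / genome_size_mb
p_locus_present = 1.0 - (1.0 - fraction_per_clone) ** clones

print(f"coverage: {coverage:.2f} genome equivalents")
print(f"P(locus present): {p_locus_present:.4f}")
```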
Prediction of protein long-range contacts using an ensemble of genetic algorithm classifiers with sequence profile centers.

PubMed

Chen, Peng; Li, Jinyan

2010-05-17

Prediction of long-range inter-residue contacts is an important topic in bioinformatics research. It is helpful for determining protein structures, understanding protein foldings, and therefore advancing the annotation of protein functions. In this paper, we propose a novel ensemble of genetic algorithm classifiers (GaCs) to address the long-range contact prediction problem. Our method is based on the key idea called sequence profile centers (SPCs). Each SPC is the average sequence profiles of residue pairs belonging to the same contact class or non-contact class. GaCs train on multiple but different pairs of long-range contact data (positive data) and long-range non-contact data (negative data). The negative data sets, having roughly the same sizes as the positive ones, are constructed by random sampling over the original imbalanced negative data. As a result, about 21.5% long-range contacts are correctly predicted. We also found that the ensemble of GaCs indeed makes an accuracy improvement by around 5.6% over the single GaC. Classifiers with the use of sequence profile centers may advance the long-range contact prediction. In line with this approach, key structural features in proteins would be determined with high efficiency and accuracy.
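The construction of balanced negative sets by random sampling, as described in this abstract, might look like the sketch below. It is an assumption-laden illustration rather than the authors' procedure: the function name `balanced_training_sets`, the exact 1:1 ratio (the abstract says only "roughly the same sizes"), and the toy data are all hypothetical, and the genetic algorithm classifiers themselves are not shown.

```python
import random


def balanced_training_sets(positives, negatives, n_classifiers, rng):
    """Pair the positives with a fresh random subset of negatives per classifier.

    ASSUMPTION: each negative subset has exactly as many items as the
    positive set and is drawn without replacement.
    """
    sets = []
    for _ in range(n_classifiers):
        sampled_negatives = rng.sample(negatives, k=len(positives))
        sets.append((list(positives), sampled_negatives))
    return sets


rng = random.Random(42)  # fixed seed for reproducibility
positives = [("contact", i) for i in range(50)]        # toy positive data
negatives = [("non-contact", i) for i in range(5000)]  # imbalanced negatives
training_sets = balanced_training_sets(positives, negatives, 5, rng)
```

Because every classifier sees a different negative subset, the ensemble as a whole covers more of the negative data than any single balanced training set could.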
The assessment of epiphytic yeast diversity in sugarcane phyllosphere in Thailand by culture-independent method.

PubMed

Nasanit, Rujikan; Tangwong-O-Thai, Apirat; Tantirungkij, Manee; Limtong, Savitree

2015-12-01

The diversity of epiphytic yeasts from sugarcane (Saccharum officinarum Linn.) phyllospheres in Thailand was investigated by a culture-independent method based on the analysis of the D1/D2 domains of the large subunit rRNA gene sequences. Forty-five samples of sugarcane leaf were collected randomly from ten provinces in Thailand. A total of 1342 clones were obtained from 45 clone libraries. 426 clones (31.7 %) were closely related to yeast strains in the GenBank database, and they were clustered into 31 operational taxonomic units (OTUs) with a similarity threshold of 99 %. All OTU sequences were classified in phylum Basidiomycota and were closely related to 11 yeast species in seven genera including Cryptococcus flavus, Hannaella coprosmaensis, Rhodotorula taiwanensis, Jaminaea angkoreiensis, Malassezia restricta, Pseudozyma antarctica, Pseudozyma aphidis, Pseudozyma hubeiensis, Pseudozyma prolifica, Pseudozyma shanxiensis, and Sporobolomyces vermiculatus. The most predominant yeasts detected belonged to Ustilaginales with 89.4 % relative frequency and the prevalent yeast genus was Pseudozyma. However, the majority could not be identified as known yeast species, and these sequences may represent new yeast taxa. In addition, the OTU closely related to P. prolifica was commonly detected in the sugarcane phyllosphere. Copyright © 2015 The British Mycological Society. Published by Elsevier Ltd. All rights reserved.
Viruses of invasive Argentine ants from the European Main supercolony: characterization, interactions and evolution.

PubMed

Viljakainen, Lumi; Holmberg, Ida; Abril, Sílvia; Jurvansuu, Jaana

2018-06-25

The Argentine ant (Linepithema humile) is a highly invasive pest, yet very little is known about its viruses. We analysed individual RNA-sequencing data from 48 Argentine ant queens to identify and characterise their viruses. We discovered eight complete RNA virus genomes - all from different virus families - and one putative partial entomopoxvirus genome. Seven of the nine virus sequences were found from ant samples spanning 7 years, suggesting that these viruses may cause long-term infections within the super-colony. Although all nine viruses successfully infect Argentine ants, they have very different characteristics, such as genome organization, prevalence, loads, activation frequencies and rates of evolution. The eight RNA viruses constituted in total 23 different virus combinations which, based on statistical analysis, were non-random, suggesting that virus compatibility is a factor in infections. We also searched for virus sequences from New Zealand and Californian Argentine ant RNA-sequencing data and discovered that many of the viruses are found on different continents, yet some viruses are prevalent only in certain colonies. The viral loads described here most probably represent a normal asymptomatic level of infection; nevertheless, detailed knowledge of Argentine ant viruses may enable the design of viral biocontrol methods against this pest.
Rapid and efficient cDNA library screening by self-ligation of inverse PCR products (SLIP).

PubMed

Hoskins, Roger A; Stapleton, Mark; George, Reed A; Yu, Charles; Wan, Kenneth H; Carlson, Joseph W; Celniker, Susan E

2005-12-02

cDNA cloning is a central technology in molecular biology. cDNA sequences are used to determine mRNA transcript structures, including splice junctions, open reading frames (ORFs) and 5'- and 3'-untranslated regions (UTRs). cDNA clones are valuable reagents for functional studies of genes and proteins. Expressed Sequence Tag (EST) sequencing is the method of choice for recovering cDNAs representing many of the transcripts encoded in a eukaryotic genome. However, EST sequencing samples a cDNA library at random, and it recovers transcripts with low expression levels inefficiently. We describe a PCR-based method for directed screening of plasmid cDNA libraries. We demonstrate its utility in a screen of libraries used in our Drosophila EST projects for 153 transcription factor genes that were not represented by full-length cDNA clones in our Drosophila Gene Collection. We recovered high-quality, full-length cDNAs for 72 genes and variously compromised clones for an additional 32 genes. The method can be used at any scale, from the isolation of cDNA clones for a particular gene of interest, to the improvement of large gene collections in model organisms and the human. Finally, we discuss the relative merits of directed cDNA library screening and RT-PCR approaches.

Species identification and sex determination of the genus Nepenthes (Nepenthaceae).

PubMed

Mokkamul, Piya; Chaveerach, Arunrat; Sudmoon, Runglawan; Tanee, Tawatchai

2007-02-15

Nepenthes species are well known for their ornamentally attractive pitchers. Species diversity was randomly surveyed in some conservation areas of Thailand, and three species were found, namely N. gracilis Korth., N. mirabilis Druce. and N. smilesii Hemsl. Young plants of unknown species from Chatuchak market were added to the sample set. Thirty-two Inter Simple Sequence Repeat (ISSR) primers were screened, and 13 successful primers were used to produce DNA banding patterns for constructing a dendrogram. The dendrogram is potentially a powerful tool for identifying the unknown species from Chatuchak market, differentiating species populations and populations by geographical area, and determining sex. The geographical area of N. mirabilis was resolved into Southern and Northeastern regions and, finally, subdivided into exact areas according to province. Male and female plants of N. gracilis at Phu Wua Wildlife Sanctuary and N. mirabilis at Bung Khonglong non-hunting area were determined. Two unknown species from Chatuchak market were identified as N. mirabilis, with genetic similarities (S) of 77.2 to 84.7. To obtain more sex-specific results across all samples studied, 37 Random Amplified Polymorphic DNA (RAPD) primers were investigated. The results show that only one RAPD primer gave a high-resolution result, a male-specific marker of about 750 bp.
Observed oil and gas field size distributions: A consequence of the discovery process and prices of oil and gas

USGS Publications Warehouse

Drew, L.J.; Attanasi, E.D.; Schuenemeyer, J.H.

1988-01-01

If observed oil and gas field size distributions are obtained by random samplings, the fitted distributions should approximate that of the parent population of oil and gas fields. However, empirical evidence strongly suggests that larger fields tend to be discovered earlier in the discovery process than they would be by random sampling. Economic factors also can limit the number of small fields that are developed and reported. This paper examines observed size distributions in state and federal waters of offshore Texas. Results of the analysis demonstrate how the shape of the observable size distributions change with significant hydrocarbon price changes. Comparison of state and federal observed size distributions in the offshore area shows how production cost differences also affect the shape of the observed size distribution. Methods for modifying the discovery rate estimation procedures when economic factors significantly affect the discovery sequence are presented. A primary conclusion of the analysis is that, because hydrocarbon price changes can significantly affect the observed discovery size distribution, one should not be confident about inferring the form and specific parameters of the parent field size distribution from the observed distributions. © 1988 International Association for Mathematical Geology.
Are all data created equal?--Exploring some boundary conditions for a lazy intuitive statistician.

PubMed

Lindskog, Marcus; Winman, Anders

2014-01-01

The study investigated potential effects of the presentation order of numeric information on retrospective subjective judgments of descriptive statistics of this information. The studies were theoretically motivated by the assumption in the naïve sampling model of independence between temporal encoding order of data in long-term memory and retrieval probability (i.e. as implied by a "random sampling" from memory metaphor). In Experiment 1, participants experienced Arabic numbers that varied in distribution shape/variability between the first and the second half of the information sequence. Results showed no effects of order on judgments of mean, variability or distribution shape. To strengthen the interpretation of these results, Experiment 2 used a repeated judgment procedure, with an initial judgment occurring prior to the change in distribution shape of the information half-way through data presentation. The results of Experiment 2 were in line with those from Experiment 1, and in addition showed that the act of making explicit judgments did not impair accuracy of later judgments, as would be suggested by an anchoring and insufficient adjustment strategy. Overall, the results indicated that participants were very responsive to the properties of the data while at the same time being more or less immune to order effects. The results were interpreted as being in line with the naïve sampling models in which values are stored as exemplars and sampled randomly from long-term memory.
Introducing Perception and Modelling of Spatial Randomness in Classroom

ERIC Educational Resources Information Center

De Nóbrega, José Renato

2017-01-01

A strategy to facilitate understanding of spatial randomness is described, using student activities developed in sequence: looking at spatial patterns, simulating approximate spatial randomness using a grid of equally-likely squares, using binomial probabilities for approximations and predictions and then comparing with given Poisson…
UVnovo: A De Novo Sequencing Algorithm Using Single Series of Fragment Ions via Chromophore Tagging and 351 nm Ultraviolet Photodissociation Mass Spectrometry

PubMed Central

Robotham, Scott A.; Horton, Andrew P.; Cannon, Joe R.; Cotham, Victoria C.; Marcotte, Edward M.; Brodbelt, Jennifer S.

2016-01-01

De novo peptide sequencing by mass spectrometry represents an important strategy for characterizing novel peptides and proteins, in which a peptide’s amino acid sequence is inferred directly from the precursor peptide mass and tandem mass spectrum (MS/MS or MS3) fragment ions, without comparison to a reference proteome. This method is ideal for organisms or samples lacking a complete or well-annotated reference sequence set. One of the major barriers to de novo spectral interpretation arises from confusion of N- and C-terminal ion series due to the symmetry between b and y ion pairs created by collisional activation methods (or c, z ions for electron-based activation methods). This is known as the ‘antisymmetric path problem’ and leads to inverted amino acid subsequences within a de novo reconstruction. Here, we combine several key strategies for de novo peptide sequencing into a single high-throughput pipeline: high efficiency carbamylation blocks lysine side chains, and subsequent tryptic digestion and N-terminal peptide derivatization with the ultraviolet chromophore AMCA yields peptides susceptible to 351 nm ultraviolet photodissociation (UVPD). UVPD-MS/MS of the AMCA-modified peptides then predominantly produces y ions in the MS/MS spectra, specifically addressing the antisymmetric path problem. Finally, the program UVnovo applies a random forest algorithm to automatically learn from and then interpret UVPD mass spectra, passing results to a hidden Markov model for de novo sequence prediction and scoring. We show this combined strategy provides high performance de novo peptide sequencing, enabling the de novo sequencing of thousands of peptides from an E. coli lysate at high confidence. PMID:26938041
Comparison of Immunohistochemistry and Direct Sanger Sequencing for Detection of the BRAFV600E Mutation in Thyroid Neoplasm

PubMed Central

Oh, Hye-Seon; Kwon, Hyemi; Park, Suyeon; Kim, Mijin; Jeon, Min Ji; Kim, Tae Yong; Shong, Young Kee; Kim, Won Bae; Choi, Jene

2018-01-01

Background The BRAFV600E mutation is the most common genetic alteration identified in papillary thyroid carcinoma (PTC). Direct Sanger sequencing has several limitations with respect to cost-effectiveness and sensitivity. The aim of this study was to evaluate the efficiency of immunohistochemistry (IHC) as an alternative method to detect the BRAFV600E mutation in preoperative and postoperative tissue samples. Methods We evaluated 71 patients who underwent thyroid surgery with the result of direct sequencing of the BRAFV600E mutation. IHC staining of the BRAFV600E mutation was performed in 49 preoperative and 23 postoperative thyroid specimens. Results Sixty-two patients (87.3%) had PTC, and of these, BRAFV600E was confirmed by direct sequencing in 57 patients (91.9%). In 23 postoperative tissue samples, the BRAFV600E mutation was detected in 16 samples (70%) by direct sequencing and 18 samples (78%) by IHC. In 24 fine needle aspiration (FNA) samples, BRAFV600E was detected in 18 samples (75%) by direct sequencing and 16 samples (67%) by IHC. In 25 core needle biopsy (CNB) samples, the BRAFV600E mutation was detected in 15 samples (60%) by direct sequencing and 16 samples (64%) by IHC. The sensitivity and specificity of IHC for detecting the BRAFV600E mutation were 77.8% and 66.7% in FNA samples and 93.3% and 80.0% in CNB samples. Conclusion IHC could be an alternative method to direct Sanger sequencing for BRAFV600E mutation detection both in postoperative and preoperative samples. However, application of IHC to detect the BRAFV600E mutation in FNA samples is of limited value compared with direct sequencing. PMID:29388401
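The sensitivity and specificity figures for the FNA samples can be reproduced from a 2x2 table with direct sequencing as the reference standard. The cell counts below are an assumption: the abstract does not report them, and they are simply the counts consistent with its FNA totals (24 samples, 18 positive by sequencing, 16 positive by IHC) and its reported percentages.

```python
def sensitivity(tp, fn):
    """Fraction of reference-positive samples that the test also calls positive."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of reference-negative samples that the test also calls negative."""
    return tn / (tn + fp)


# ASSUMED cell counts for the 24 FNA samples (reference: direct sequencing).
tp, fn = 14, 4   # 18 sequencing-positive samples
fp, tn = 2, 4    # 6 sequencing-negative samples

fna_sensitivity = sensitivity(tp, fn)
fna_specificity = specificity(tn, fp)
ihc_positive = tp + fp  # samples called positive by IHC

print(f"sensitivity: {fna_sensitivity:.1%}, specificity: {fna_specificity:.1%}")
```

With only six sequencing-negative FNA samples, a single discordant call shifts the specificity by nearly 17 percentage points, which is worth keeping in mind when reading the comparison.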
Theory on the mechanism of site-specific DNA-protein interactions in the presence of traps

NASA Astrophysics Data System (ADS)

Niranjani, G.; Murugan, R.

2016-08-01

The speed of site-specific binding of transcription factor (TF) proteins with genomic DNA seems to be strongly retarded by randomly occurring sequence traps. Traps are those DNA sequences sharing significant similarity with the original specific binding sites (SBSs). It is an intriguing question how the naturally occurring TFs and their SBSs are designed to manage the retarding effects of such randomly occurring traps. We develop a simple random walk model of the site-specific binding of TFs with genomic DNA in the presence of sequence traps. Our dynamical model predicts that (a) the retarding effects of traps will be minimal when the traps are arranged around the SBS such that there is a negative correlation between the binding strength of TFs with traps and the distance of traps from the SBS and (b) the retarding effects of sequence traps can be alleviated by the condensed conformational state of DNA. Our computational analysis of the distribution of sequence traps around the putative binding sites of various TFs in the mouse and human genomes agrees well with the theoretical predictions. We propose that the distribution of traps can be used as an additional metric to efficiently identify the SBSs of TFs on genomic DNA.
Pattern-based integer sample motion search strategies in the context of HEVC

NASA Astrophysics Data System (ADS)

Maier, Georg; Bross, Benjamin; Grois, Dan; Marpe, Detlev; Schwarz, Heiko; Veltkamp, Remco C.; Wiegand, Thomas

2015-09-01

The H.265/MPEG-H High Efficiency Video Coding (HEVC) standard provides a significant increase in coding efficiency compared to its predecessor, the H.264/MPEG-4 Advanced Video Coding (AVC) standard, which however comes at the cost of a high computational burden for a compliant encoder. Motion estimation (ME), which is a part of the inter-picture prediction process, typically consumes a high amount of computational resources, while significantly increasing the coding efficiency. Although both the H.265/MPEG-H HEVC and H.264/MPEG-4 AVC standards allow processing motion information on a fractional sample level, motion search algorithms based on the integer sample level remain an integral part of ME. In this paper, a flexible integer sample ME framework is proposed, allowing a significant reduction of ME computation time to be traded off against a coding efficiency penalty in terms of bit rate overhead. As a result, through extensive experimentation, an integer sample ME algorithm that provides a good trade-off is derived, incorporating a combination and optimization of known predictive, pattern-based and early termination techniques. The proposed ME framework is implemented on the basis of the HEVC Test Model (HM) reference software and compared to the state-of-the-art fast search algorithm, which is a native part of HM. It is observed that for high resolution sequences, the integer sample ME process can be sped up by factors varying from 3.2 to 7.6, resulting in a bit-rate overhead of 1.5% and 0.6% for Random Access (RA) and Low Delay P (LDP) configurations, respectively. In addition, a similar speed-up is observed for sequences with mainly Computer-Generated Imagery (CGI) content, at the cost of a bit rate overhead of up to 5.2%.
New method of extracting information of arterial oxygen saturation based on ∑ | 𝚫 |

NASA Astrophysics Data System (ADS)

Dai, Wenting; Lin, Ling; Li, Gang

2017-04-01

Noninvasive detection of oxygen saturation with near-infrared spectroscopy has been widely used in clinics. In order to further enhance its detection precision and reliability, this paper proposes a method of time domain absolute difference summation (∑|Δ|) based on a dynamic spectrum. In this method, the ratio of absolute differences between intervals of two differential sampling points at the same moment on logarithm photoplethysmography signals of red and infrared light is obtained in turn, yielding a ratio sequence that is screened with a statistical method. Finally, the summation of the screened ratio sequence is used as the oxygen saturation coefficient Q. We collected 120 reference samples of SpO2 and then compared the results of the two methods, ∑|Δ| and peak-peak. Average root-mean-square errors of the two methods were 3.02% and 6.80%, respectively, in the 20 cases which were selected randomly. In addition, the average variance of Q of the 10 samples obtained by the new method was reduced to 22.77% of that obtained by the peak-peak method. Compared with the commercial product, the new method gives more accurate results. Theoretical and experimental analysis indicates that the application of the ∑|Δ| method could enhance the precision and reliability of oxygen saturation detection in real time.
New method of extracting information of arterial oxygen saturation based on ∑|𝚫 |

NASA Astrophysics Data System (ADS)

Wenting, Dai; Ling, Lin; Gang, Li

2017-04-01

Noninvasive detection of oxygen saturation with near-infrared spectroscopy has been widely used in clinics. In order to further enhance its detection precision and reliability, this paper proposes a method of time domain absolute difference summation (∑|Δ|) based on a dynamic spectrum. In this method, the ratio of absolute differences between intervals of two differential sampling points at the same moment on logarithm photoplethysmography signals of red and infrared light is obtained in turn, yielding a ratio sequence that is screened with a statistical method. Finally, the summation of the screened ratio sequence is used as the oxygen saturation coefficient Q. We collected 120 reference samples of SpO2 and then compared the results of the two methods, ∑|Δ| and peak-peak. Average root-mean-square errors of the two methods were 3.02% and 6.80%, respectively, in the 20 cases which were selected randomly. In addition, the average variance of Q of the 10 samples obtained by the new method was reduced to 22.77% of that obtained by the peak-peak method. Compared with the commercial product, the new method gives more accurate results. Theoretical and experimental analysis indicates that the application of the ∑|Δ| method could enhance the precision and reliability of oxygen saturation detection in real time.
Analysis of Puumala hantavirus in a bank vole population in northern Finland: evidence for co-circulation of two genetic lineages and frequent reassortment between strains.

PubMed

Razzauti, Maria; Plyusnina, Angelina; Sironen, Tarja; Henttonen, Heikki; Plyusnin, Alexander

2009-08-01

In this study, for the first time, two distinct genetic lineages of Puumala virus (PUUV) were found within a small sampling area and within a single host genetic lineage (Ural mtDNA) at Pallasjärvi, northern Finland. Lung tissue samples of 171 bank voles (Myodes glareolus) trapped in September 1998 were screened for the presence of PUUV nucleocapsid antigen and 25 were found to be positive. Partial sequences of the PUUV small (S), medium (M) and large (L) genome segments were recovered from these samples using RT-PCR. Phylogenetic analysis revealed two genetic groups of PUUV sequences that belonged to the Finnish and north Scandinavian lineages. This presented a unique opportunity to study inter-lineage reassortment in PUUV; indeed, 32 % of the studied bank voles appeared to carry reassortant virus genomes. Thus, the frequency of inter-lineage reassortment in PUUV was comparable to that of intra-lineage reassortment observed previously (Razzauti, M., Plyusnina, A., Henttonen, H. & Plyusnin, A. (2008). J Gen Virol 89, 1649-1660). Of six possible reassortant S/M/L combinations, only two were found at Pallasjärvi and, notably, in all reassortants, both S and L segments originated from the same genetic lineage, suggesting a non-random pattern for the reassortment. These findings are discussed in connection to PUUV evolution in Fennoscandia.
On the synchronizability and detectability of random PPM sequences

NASA Technical Reports Server (NTRS)

Georghiades, Costas N.; Lin, Shu

1987-01-01

The problem of synchronization and detection of random pulse-position-modulation (PPM) sequences is investigated under the assumption of perfect slot synchronization. Maximum-likelihood PPM symbol synchronization and receiver algorithms are derived that make decisions based both on soft as well as hard data; these algorithms are seen to be easily implementable. Bounds derived on the symbol error probability as well as the probability of false synchronization indicate the existence of a rather severe performance floor, which can easily be the limiting factor in the overall system performance. The performance floor is inherent in the PPM format and random data and becomes more serious as the PPM alphabet size Q is increased. A way to eliminate the performance floor is suggested by inserting special PPM symbols in the random data stream.
On the synchronizability and detectability of random PPM sequences

NASA Technical Reports Server (NTRS)

Georghiades, Costas N.

1987-01-01

The problem of synchronization and detection of random pulse-position-modulation (PPM) sequences is investigated under the assumption of perfect slot synchronization. Maximum likelihood PPM symbol synchronization and receiver algorithms are derived that make decisions based both on soft as well as hard data; these algorithms are seen to be easily implementable. Bounds were derived on the symbol error probability as well as the probability of false synchronization that indicate the existence of a rather severe performance floor, which can easily be the limiting factor in the overall system performance. The performance floor is inherent in the PPM format and random data and becomes more serious as the PPM alphabet size Q is increased. A way to eliminate the performance floor is suggested by inserting special PPM symbols in the random data stream.
Random variability explains apparent global clustering of large earthquakes

USGS Publications Warehouse

Michael, A.J.

2011-01-01

The occurrence of 5 Mw ≥ 8.5 earthquakes since 2004 has created a debate over whether or not we are in a global cluster of large earthquakes, temporarily raising risks above long-term levels. I use three classes of statistical tests to determine if the record of M ≥ 7 earthquakes since 1900 can reject a null hypothesis of independent random events with a constant rate plus localized aftershock sequences. The data cannot reject this null hypothesis. Thus, the temporal distribution of large global earthquakes is well-described by a random process, plus localized aftershocks, and apparent clustering is due to random variability. Therefore the risk of future events has not increased, except within ongoing aftershock sequences, and should be estimated from the longest possible record of events.
Implicit transfer of spatial structure in visuomotor sequence learning.

PubMed

Tanaka, Kanji; Watanabe, Katsumi

2014-11-01

Implicit learning and transfer in sequence learning are essential in daily life. Here, we investigated the implicit transfer of visuomotor sequences following a spatial transformation. In the two experiments, participants used trial and error to learn a sequence consisting of several button presses, known as the m×n task (Hikosaka et al., 1995). After this learning session, participants learned another sequence in which the button configuration was spatially transformed in one of the following ways: mirrored, rotated, and random arrangement. Our results showed that even when participants were unaware of the transformation rules, accuracy of transfer session in the mirrored and rotated groups was higher than that in the random group (i.e., implicit transfer occurred). Both those who noticed the transformation rules and those who did not (i.e., explicit and implicit transfer instances, respectively) showed faster performance in the mirrored sequences than in the rotated sequences. Taken together, the present results suggest that people can use their implicit visuomotor knowledge to spatially transform sequences and that implicit transfers are modulated by a transformation cost, similar to that in explicit transfer. Copyright © 2014 Elsevier B.V. All rights reserved.
Texture analysis of common renal masses in multiple MR sequences for prediction of pathology

NASA Astrophysics Data System (ADS)

Hoang, Uyen N.; Malayeri, Ashkan A.; Lay, Nathan S.; Summers, Ronald M.; Yao, Jianhua

2017-03-01

This pilot study performs texture analysis on multiple magnetic resonance (MR) images of common renal masses for differentiation of renal cell carcinoma (RCC). Bounding boxes are drawn around each mass on one axial slice in T1 delayed sequence to use for feature extraction and classification. All sequences (T1 delayed, venous, arterial, pre-contrast phases, T2, and T2 fat saturated sequences) are co-registered and texture features are extracted from each sequence simultaneously. Random forest is used to construct models to classify lesions on 96 normal regions, 87 clear cell RCCs, 8 papillary RCCs, and 21 renal oncocytomas; ground truths are verified through pathology reports. The highest performance is seen in random forest model when data from all sequences are used in conjunction, achieving an overall classification accuracy of 83.7%. When using data from one single sequence, the overall accuracies achieved for T1 delayed, venous, arterial, and pre-contrast phase, T2, and T2 fat saturated were 79.1%, 70.5%, 56.2%, 61.0%, 60.0%, and 44.8%, respectively. This demonstrates promising results of utilizing intensity information from multiple MR sequences for accurate classification of renal masses.
Comparative effectiveness of next generation genomic sequencing for disease diagnosis: design of a randomized controlled trial in patients with colorectal cancer/polyposis syndromes.

PubMed

Gallego, Carlos J; Bennette, Caroline S; Heagerty, Patrick; Comstock, Bryan; Horike-Pyne, Martha; Hisama, Fuki; Amendola, Laura M; Bennett, Robin L; Dorschner, Michael O; Tarczy-Hornoch, Peter; Grady, William M; Fullerton, S Malia; Trinidad, Susan B; Regier, Dean A; Nickerson, Deborah A; Burke, Wylie; Patrick, Donald L; Jarvik, Gail P; Veenstra, David L

2014-09-01

Whole exome and whole genome sequencing are applications of next generation sequencing transforming clinical care, but there is little evidence whether these tests improve patient outcomes or if they are cost effective compared to current standard of care. These gaps in knowledge can be addressed by comparative effectiveness and patient-centered outcomes research. We designed a randomized controlled trial that incorporates these research methods to evaluate whole exome sequencing compared to usual care in patients being evaluated for hereditary colorectal cancer and polyposis syndromes. Approximately 220 patients will be randomized and followed for 12 months after return of genomic findings. Patients will receive findings associated with colorectal cancer in a first return of results visit, and findings not associated with colorectal cancer (incidental findings) during a second return of results visit. The primary outcome is efficacy to detect mutations associated with these syndromes; secondary outcomes include psychosocial impact, cost-effectiveness and comparative costs. The secondary outcomes will be obtained via surveys before and after each return visit. The expected challenges in conducting this randomized controlled trial include the relatively low prevalence of genetic disease, difficult interpretation of some genetic variants, and uncertainty about which incidental findings should be returned to patients. The approaches utilized in this study may help guide other investigators in clinical genomics to identify useful outcome measures and strategies to address comparative effectiveness questions about the clinical implementation of genomic sequencing in clinical care. Copyright © 2014 Elsevier Inc. All rights reserved.
Genetic analysis of the Yavapai Native Americans from West-Central Arizona using the Illumina MiSeq FGx™ forensic genomics system.

PubMed

Wendt, Frank R; Churchill, Jennifer D; Novroski, Nicole M M; King, Jonathan L; Ng, Jillian; Oldt, Robert F; McCulloh, Kelly L; Weise, Jessica A; Smith, David Glenn; Kanthaswamy, Sreetharan; Budowle, Bruce

2016-09-01

Forensically-relevant genetic markers were typed for sixty-two Yavapai Native Americans using the ForenSeq™ DNA Signature Prep Kit. These data are invaluable to the human identity community due to the greater genetic differentiation among Native American tribes than among other subdivisions within major populations of the United States. Autosomal, X-chromosomal, and Y-chromosomal short tandem repeat (STR) and identity-informative (iSNPs), ancestry-informative (aSNPs), and phenotype-informative (pSNPs) single nucleotide polymorphism (SNP) allele frequencies are reported. Sequence-based allelic variants were observed in 13 autosomal, 3 X, and 3 Y STRs. These observations increased observed and expected heterozygosities for autosomal STRs by 0.081±0.068 and 0.073±0.063, respectively, and decreased single-locus random match probabilities by 0.051±0.043 for 13 autosomal STRs. The autosomal random match probabilities (RMPs) were 2.37×10⁻²⁶ and 2.81×10⁻²⁹ for length-based and sequence-based alleles, respectively. There were 22 and 25 unique Y-STR haplotypes among 26 males, generating haplotype diversities of 0.95 and 0.96 for length-based and sequence-based alleles, respectively. Of the 26 haplotypes generated, 17 were assigned to haplogroup Q, three to haplogroup R1b, two each to haplogroups E1b1b and L, and one each to haplogroups R1a and I1. Male and female sequence-based X-STR random match probabilities were 3.28×10⁻⁷ and 1.22×10⁻⁶, respectively. The average observed and expected heterozygosities for 94 iSNPs were 0.39±0.12 and 0.39±0.13, respectively, and the combined iSNP RMP was 1.08×10⁻³². The combined STR and iSNP RMPs were 2.55×10⁻⁵⁸ and 3.02×10⁻⁶¹ for length-based and sequence-based STR alleles, respectively. Ancestry and phenotypic SNP information, performed using the ForenSeq™ Universal Analysis Software, predicted black hair, brown eyes, and some probability of East Asian ancestry for all but one sample that clustered between European and Admixed American ancestry on a principal components analysis. These data serve as the first population assessment using the ForenSeq™ panel and highlight the value of employing sequence-based alleles for forensic DNA typing to increase heterozygosity, which is beneficial for identity testing in populations with reduced genetic diversity. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
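The combined STR and iSNP random match probabilities in the record above can be sanity-checked with simple arithmetic. The sketch below assumes that a combined RMP is the product of the per-marker-set RMPs (that is, that the marker sets are treated as independent); the abstract does not state how the combined figures were computed, so this is an illustrative assumption, and the function name combine_rmps is hypothetical.

```python
# Values as reported in the abstract (autosomal STR RMPs and combined iSNP RMP).
autosomal_str_rmp = {"length-based": 2.37e-26, "sequence-based": 2.81e-29}
combined_isnp_rmp = 1.08e-32
reported_combined = {"length-based": 2.55e-58, "sequence-based": 3.02e-61}


def combine_rmps(*rmps):
    """Multiply match probabilities, ASSUMING the marker sets are independent."""
    product = 1.0
    for rmp in rmps:
        product *= rmp
    return product


for allele_type, str_rmp in autosomal_str_rmp.items():
    combined = combine_rmps(str_rmp, combined_isnp_rmp)
    reported = reported_combined[allele_type]
    # Print both figures side by side so the agreement can be inspected.
    print(f"{allele_type}: product {combined:.3e}, reported {reported:.2e}")
```

Under this assumption the products agree with the reported combined values to within about 1%, a gap that is consistent with rounding of the published inputs.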
Dice and DNA

ERIC Educational Resources Information Center

Wernersson, Rasmus

2007-01-01

An important part of teaching students how to use the BLAST tool for searching large sequence databases is to train the students to think critically about the quality of the sequence hits found, both in terms of the statistical significance and how informative the individual hits are. This paper describes how generating truly random sequences by…
[Molecular epidemiology and transmission of HIV-1 infection in Zhejiang province, 2015].

PubMed

Yang, J Z; Chen, W J; Zhang, W J; He, L; Zhang, J F; Pan, X H

2017-11-10

Objective: To understand the distribution of HIV-1 subtype diversity and its transmission characteristics in Zhejiang province. Methods: A total of 302 newly diagnosed HIV-1 positive patients were selected through stratified random sampling in Zhejiang in 2015. HIV-1 pol genes were sequenced successfully with reverse transcription PCR/nested PCR and phylogenetic analysis was conducted for 276 patients. Then a molecular epidemiologic study was performed combined with field epidemiological investigation. Results: Of 276 sequence samples analyzed, 122 CRF07_BC strains (44.2%), 103 CRF01_AE strains (37.3%), 17 CRF08_BC strains (6.1%), 9 B strains (3.2%), 6 CRF55_01B strains (2.2%), 5 C strains (1.8%), 1 CRF59_01B strain (0.4%), 1 CRF67_01B strain (0.4%), 1 A1 strain (0.4%), and 11 URF strains (4.0%) were identified. Phylogenetic analysis revealed 16 clusters, with only 15.1% (34/225) of sequences involved among CRF07_BC and CRF01_AE strains. Clustered cases were more frequent among MSM than among populations with other transmission routes, and clusters also existed between populations with different transmission routes. Conclusion: The major strains of HIV-1 in Zhejiang are CRF07_BC and CRF01_AE. The HIV subtypes showed more complexity in Zhejiang. It is necessary to strengthen the surveillance for HIV subtypes, carry out classified management and conduct effective prevention and control in the population at high risk.

An Unconditional Test for Change Point Detection in Binary Sequences with Applications to Clinical Registries.

PubMed

Ellenberger, David; Friede, Tim

2016-08-05

Methods for change point (also sometimes referred to as threshold or breakpoint) detection in binary sequences are not new and were introduced as early as 1955. Much of the research in this area has focussed on asymptotic and exact conditional methods. Here we develop an exact unconditional test, which treats the total number of events as random instead of conditioning on the number of observed events. The new test is shown to be uniformly more powerful than Worsley's exact conditional test and means for its efficient numerical calculation are given. Adaptations of methods by Berger and Boos are made to deal with the issue that the unknown event probability imposes a nuisance parameter. The methods are compared in a Monte Carlo simulation study and applied to a cohort of patients undergoing traumatic orthopaedic surgery involving external fixators where a change in pin site infections is investigated. The unconditional test controls the type I error rate at the nominal level and is uniformly more powerful than (or to be more precise uniformly at least as powerful as) Worsley's exact conditional test which is very conservative for small sample sizes. In the application a beneficial effect associated with the introduction of a new treatment procedure for pin site care could be revealed. We consider the new test an effective and easy to use exact test which is recommended in small sample size change point problems in binary sequences.
Characterization of Microbial Communities in Gas Industry Pipelines

PubMed Central

Zhu, Xiang Y.; Lubeck, John; Kilbane, John J.

2003-01-01

Culture-independent techniques, denaturing gradient gel electrophoresis (DGGE) analysis, and random cloning of 16S rRNA gene sequences amplified from community DNA were used to determine the diversity of microbial communities in gas industry pipelines. Samples obtained from natural gas pipelines were used directly for DNA extraction, inoculated into sulfate-reducing bacterium medium, or used to inoculate a reactor that simulated a natural gas pipeline environment. The variable V2-V3 (average size, 384 bp) and V3-V6 (average size, 648 bp) regions of bacterial and archaeal 16S rRNA genes, respectively, were amplified from genomic DNA isolated from nine natural gas pipeline samples and analyzed. A total of 106 bacterial 16S rDNA sequences were derived from DGGE bands, and these formed three major clusters: beta and gamma subdivisions of Proteobacteria and gram-positive bacteria. The most frequently encountered bacterial species was Comamonas denitrificans, which was not previously reported to be associated with microbial communities found in gas pipelines or with microbially influenced corrosion. The 31 archaeal 16S rDNA sequences obtained in this study were all related to those of methanogens and phylogenetically fall into three clusters: order I, Methanobacteriales; order III, Methanomicrobiales; and order IV, Methanosarcinales. Further microbial ecology studies are needed to better understand the relationship among bacterial and archaeal groups and the involvement of these groups in the process of microbially influenced corrosion in order to develop improved ways of monitoring and controlling microbially influenced corrosion. PMID:12957923
Differential Expression and Functional Analysis of High-Throughput -Omics Data Using Open Source Tools.

PubMed

Kebschull, Moritz; Fittler, Melanie Julia; Demmer, Ryan T; Papapanou, Panos N

2017-01-01

Today, -omics analyses, including the systematic cataloging of messenger RNA and microRNA sequences or DNA methylation patterns in a cell population, organ, or tissue sample, allow for an unbiased, comprehensive genome-level analysis of complex diseases, offering a large advantage over earlier "candidate" gene or pathway analyses. A primary goal in the analysis of these high-throughput assays is the detection of those features among several thousand that differ between different groups of samples. In the context of oral biology, our group has successfully utilized -omics technology to identify key molecules and pathways in different diagnostic entities of periodontal disease. A major issue when inferring biological information from high-throughput -omics studies is the fact that the sheer volume of high-dimensional data generated by contemporary technology is not appropriately analyzed using common statistical methods employed in the biomedical sciences. In this chapter, we outline a robust and well-accepted bioinformatics workflow for the initial analysis of -omics data generated using microarrays or next-generation sequencing technology using open-source tools. Starting with quality control measures and necessary preprocessing steps for data originating from different -omics technologies, we next outline a differential expression analysis pipeline that can be used for data from both microarray and sequencing experiments, and offers the possibility to account for random or fixed effects. Finally, we present an overview of the possibilities for a functional analysis of the obtained data.
Biased phylodynamic inferences from analysing clusters of viral sequences

PubMed Central

Xiang, Fei; Frost, Simon D. W.

2017-01-01

Phylogenetic methods are being increasingly used to help understand the transmission dynamics of measurably evolving viruses, including HIV. Clusters of highly similar sequences are often observed, which appear to follow a ‘power law’ behaviour, with a small number of very large clusters. These clusters may help to identify subpopulations in an epidemic, and inform where intervention strategies should be implemented. However, clustering of samples does not necessarily imply the presence of a subpopulation with high transmission rates, as groups of closely related viruses can also occur due to non-epidemiological effects such as over-sampling. It is important to ensure that observed phylogenetic clustering reflects true heterogeneity in the transmitting population, and is not being driven by non-epidemiological effects. We qualify the effect of using a falsely identified ‘transmission cluster’ of sequences to estimate phylodynamic parameters including the effective population size and exponential growth rate under several demographic scenarios. Our simulation studies show that taking the maximum size cluster to re-estimate parameters from trees simulated under a randomly mixing, constant population size coalescent process systematically underestimates the overall effective population size. In addition, the transmission cluster wrongly resembles an exponential or logistic growth model 99% of the time. We also illustrate the consequences of false clusters in exponentially growing coalescent and birth-death trees, where again, the growth rate is skewed upwards. This has clear implications for identifying clusters in large viral databases, where a false cluster could result in wasted intervention resources. PMID:28852573
A k-mer-based barcode DNA classification methodology based on spectral representation and a neural gas network.

PubMed

Fiannaca, Antonino; La Rosa, Massimo; Rizzo, Riccardo; Urso, Alfonso

2015-07-01

In this paper, an alignment-free method for DNA barcode classification that is based on both a spectral representation and a neural gas network for unsupervised clustering is proposed. In the proposed methodology, distinctive words are identified from a spectral representation of DNA sequences. A taxonomic classification of the DNA sequence is then performed using the sequence signature, i.e., the smallest set of k-mers that can assign a DNA sequence to its proper taxonomic category. Experiments were then performed to compare our method with other supervised machine learning classification algorithms, such as support vector machine, random forest, ripper, naïve Bayes, ridor, and classification tree, which also consider short DNA sequence fragments of 200 and 300 base pairs (bp). The experimental tests were conducted over 10 real barcode datasets belonging to different animal species, which were provided by the on-line resource "Barcode of Life Database". The experimental results showed that our k-mer-based approach is directly comparable, in terms of accuracy, recall and precision metrics, with the other classifiers when considering full-length sequences. In addition, we demonstrate the robustness of our method when a classification task is performed with a set of short DNA sequences that were randomly extracted from the original data. For example, the proposed method can reach an accuracy of 64.8% at the species level with 200-bp fragments. Under the same conditions, the best other classifier (random forest) reaches an accuracy of 20.9%. Our results indicate that we obtained a clear improvement over the other classifiers for the study of short DNA barcode sequence fragments. Copyright © 2015 Elsevier B.V. All rights reserved.
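As a rough illustration of the k-mer "spectral representation" mentioned in the record above, the sketch below counts overlapping k-mers in a sequence and draws a random fragment of fixed length, as one might when testing on short reads. It is a minimal sketch only: the function names kmer_spectrum and random_fragment are hypothetical, and the neural gas clustering and signature selection described in the abstract are not reproduced here.

```python
import random
from collections import Counter


def kmer_spectrum(sequence, k):
    """Count every overlapping k-mer (word of length k) in a DNA sequence."""
    sequence = sequence.upper()
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def random_fragment(sequence, length, rng):
    """Return a fragment of the given length starting at a random position."""
    if length > len(sequence):
        raise ValueError("fragment length exceeds sequence length")
    start = rng.randrange(len(sequence) - length + 1)
    return sequence[start:start + length]


if __name__ == "__main__":
    rng = random.Random(0)            # fixed seed so runs are repeatable
    barcode = "ACGTACGGTCAGTTACGATCGA"  # toy sequence, not real barcode data
    fragment = random_fragment(barcode, 10, rng)
    print(fragment, dict(kmer_spectrum(fragment, 3)))
```

A sequence of length n contributes n - k + 1 overlapping k-mers, so shorter fragments give sparser spectra, which is one reason short-fragment classification is the harder case.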
CpG PatternFinder: a Windows-based utility program for easy and rapid identification of the CpG methylation status of DNA.

PubMed

Xu, Yi-Hua; Manoharan, Herbert T; Pitot, Henry C

2007-09-01

The bisulfite genomic sequencing technique is one of the most widely used techniques to study sequence-specific DNA methylation because of its unambiguous ability to reveal DNA methylation status to the order of a single nucleotide. One characteristic feature of the bisulfite genomic sequencing technique is that a number of sample sequence files will be produced from a single DNA sample. The PCR products of bisulfite-treated DNA samples cannot be sequenced directly because they are heterogeneous in nature; therefore they should be cloned into suitable plasmids and then sequenced. This procedure generates an enormous number of sample DNA sequence files as well as adding extra bases belonging to the plasmids to the sequence, which will cause problems in the final sequence comparison. Finding the methylation status for each CpG in each sample sequence is not an easy job. As a result, CpG PatternFinder was developed for this purpose. The main functions of CpG PatternFinder are: (i) to analyze the reference sequence to obtain CpG and non-CpG-C residue position information; (ii) to tailor sample sequence files (delete insertions and mark deletions from the sample sequence files) based on a configuration of ClustalW multiple alignment; (iii) to align sample sequence files with a reference file to obtain bisulfite conversion efficiency and CpG methylation status; and (iv) to produce graphics, highlighted aligned sequence text and a summary report which can be easily exported to Microsoft Office suite. CpG PatternFinder is designed to operate cooperatively with BioEdit, a freeware program available on the internet. It can handle up to 100 files of sample DNA sequences simultaneously, and the total CpG pattern analysis process can be finished in minutes. CpG PatternFinder is an ideal software tool for DNA methylation studies to determine the differential methylation pattern in a large number of individuals in a population. Previously we developed the CpG Analyzer program; CpG PatternFinder is our further effort to create software tools for DNA methylation studies.
Contrast-enhanced 3-dimensional SPACE versus MP-RAGE for the detection of brain metastases: considerations with a 32-channel head coil.

PubMed

Reichert, Miriam; Morelli, John N; Runge, Val M; Tao, Ai; von Ritschl, Ruediger; von Ritschl, Andreas; Padua, Abraham; Dix, James E; Marra, Michael J; Schoenberg, Stefan O; Attenberger, Ulrike I

2013-01-01

The aim of this study was to compare the detection of brain metastases at 3 T using a 32-channel head coil with 2 different 3-dimensional (3D) contrast-enhanced sequences, a T1-weighted fast spin-echo-based (SPACE; sampling perfection with application-optimized contrasts using different flip angle evolutions) sequence and a conventional magnetization-prepared rapid gradient-echo (MP-RAGE) sequence. Seventeen patients with 161 brain metastases were examined prospectively using both SPACE and MP-RAGE sequences on a 3-T magnetic resonance system. Eight healthy volunteers were similarly examined for determination of signal-to-noise ratio (SNR) values. Parameters were adjusted to equalize acquisition times between the sequences (3 minutes and 30 seconds). The order in which sequences were performed was randomized. Two blinded board-certified neuroradiologists evaluated the number of detectable metastatic lesions with each sequence relative to a criterion standard reading conducted at the Gamma Knife facility by a neuroradiologist with access to all clinical and imaging data. In the volunteer assessment with SPACE and MP-RAGE, SNR (10.3 ± 0.8 vs 7.7 ± 0.7) and contrast-to-noise ratio (0.8 ± 0.2 vs 0.5 ± 0.1) were statistically significantly greater with the SPACE sequence (P < 0.05). Overall, lesion detection was markedly improved with the SPACE sequence (99.1% of lesions for reader 1 and 96.3% of lesions for reader 2) compared with the MP-RAGE sequence (73.6% of lesions for reader 1 and 68.5% of lesions for reader 2; P < 0.01). A 3D T1-weighted fast spin echo sequence (SPACE) improves detection of metastatic lesions relative to 3D T1-weighted gradient-echo-based scan (MP-RAGE) imaging when implemented with a 32-channel head coil at identical scan acquisition times (3 minutes and 30 seconds).
Computationally assisted screening and design of cell-interactive peptides by a cell-based assay using peptide arrays and a fuzzy neural network algorithm.

PubMed

Kaga, Chiaki; Okochi, Mina; Tomita, Yasuyuki; Kato, Ryuji; Honda, Hiroyuki

2008-03-01

We developed a method of effective peptide screening that combines experiments and computational analysis. The method is based on the concept that screening efficiency can be enhanced from even limited data by use of a model derived from computational analysis that serves as a guide to screening and combining the model with subsequent repeated experiments. Here we focus on cell-adhesion peptides as a model application of this peptide-screening strategy. Cell-adhesion peptides were screened by use of a cell-based assay of a peptide array. Starting with the screening data obtained from a limited, random 5-mer library (643 sequences), a rule regarding structural characteristics of cell-adhesion peptides was extracted by fuzzy neural network (FNN) analysis. According to this rule, peptides with unfavored residues in certain positions that led to inefficient binding were eliminated from the random sequences. In the restricted, second random library (273 sequences), the yield of cell-adhesion peptides having an adhesion rate more than 1.5-fold to that of the basal array support was significantly high (31%) compared with the unrestricted random library (20%). In the restricted third library (50 sequences), the yield of cell-adhesion peptides increased to 84%. We conclude that a repeated cycle of experiments screening limited numbers of peptides can be assisted by the rule-extracting feature of FNN.
Individual Differences Methods for Randomized Experiments

ERIC Educational Resources Information Center

Tucker-Drob, Elliot M.

2011-01-01

Experiments allow researchers to randomly vary the key manipulation, the instruments of measurement, and the sequences of the measurements and manipulations across participants. To date, however, the advantages of randomized experiments to manipulate both the aspects of interest and the aspects that threaten internal validity have been primarily…
Problems with the random number generator RANF implemented on the CDC cyber 205

NASA Astrophysics Data System (ADS)

Kalle, Claus; Wansleben, Stephan

1984-10-01

We show that using RANF may lead to wrong results when lattice models are simulated by Monte Carlo methods. We present a shift-register sequence random number generator which generates two random numbers per cycle on a two pipe CDC Cyber 205.
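For readers unfamiliar with shift-register sequence generators, the following is a minimal sketch of a generalized feedback shift register using the recurrence x[n] = x[n-103] XOR x[n-250] (the lags of the well-known R250 generator). These lags, the 32-bit word size, the class name ShiftRegisterRNG and the seeding scheme are all assumptions made for illustration; the abstract does not give the authors' parameters, and the two-pipe vectorization on the Cyber 205 is not modelled.

```python
import random


class ShiftRegisterRNG:
    """Generalized feedback shift register: x[n] = x[n-103] XOR x[n-250]."""

    P, Q, BITS = 250, 103, 32  # assumed lags and word size (R250-style)

    def __init__(self, seed=12345):
        # Simplified seeding: fill the register from Python's own generator.
        # A careful implementation would also ensure the bit columns are
        # linearly independent; that step is omitted in this sketch.
        seeder = random.Random(seed)
        self.state = [seeder.getrandbits(self.BITS) for _ in range(self.P)]
        self.index = 0  # position of the oldest word, x[n-250]

    def next_uint(self):
        i = self.index
        j = (i + self.P - self.Q) % self.P  # position of x[n-103]
        value = self.state[i] ^ self.state[j]
        self.state[i] = value               # overwrite the oldest word
        self.index = (i + 1) % self.P
        return value

    def next_float(self):
        """Uniform float in [0, 1)."""
        return self.next_uint() / float(1 << self.BITS)


if __name__ == "__main__":
    rng = ShiftRegisterRNG(seed=1)
    print([round(rng.next_float(), 4) for _ in range(5)])
```

Each new word needs only one XOR and two register reads, which is why generators of this family were attractive on vector hardware.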
Viral morphogenesis is the dominant source of sequence censorship in M13 combinatorial peptide phage display.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rodi, D. J.; Soares, A. S.; Makowski, L.

Novel statistical methods have been developed and used to quantitate and annotate the sequence diversity within combinatorial peptide libraries on the basis of small numbers (1-200) of sequences selected at random from commercially available M13 p3-based phage display libraries. These libraries behave statistically as though they correspond to populations containing roughly 4.0±1.6% of the random dodecapeptides and 7.9±2.6% of the random constrained heptapeptides that are theoretically possible within the phage populations. Analysis of amino acid residue occurrence patterns shows no demonstrable influence on sequence censorship by Escherichia coli tRNA isoacceptor profiles or either overall codon or Class II codon usage patterns, suggesting no metabolic constraints on recombinant p3 synthesis. There is an overall depression in the occurrence of cysteine, arginine and glycine residues and an overabundance of proline, threonine and histidine residues. The majority of position-dependent amino acid sequence bias is clustered at three positions within the inserted peptides of the dodecapeptide library, +1, +3 and +12 downstream from the signal peptidase cleavage site. Conformational tendency measures of the peptides indicate a significant preference for inserts favoring a β-turn conformation. The observed protein sequence limitations can primarily be attributed to genetic codon degeneracy and signal peptidase cleavage preferences. These data suggest that for applications in which maximal sequence diversity is essential, such as epitope mapping or novel receptor identification, combinatorial peptide libraries should be constructed using codon-corrected trinucleotide cassettes within vector-host systems designed to minimize morphogenesis-related censorship.
Rare event simulation in radiation transport

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kollman, Craig

1993-10-01

This dissertation studies methods for estimating extremely small probabilities by Monte Carlo simulation. Problems in radiation transport typically involve estimating very rare events or the expected value of a random variable which is with overwhelming probability equal to zero. These problems often have high dimensional state spaces and irregular geometries so that analytic solutions are not possible. Monte Carlo simulation must be used to estimate the radiation dosage being transported to a particular location. If the area is well shielded the probability of any one particular particle getting through is very small. Because of the large number of particles involved, even a tiny fraction penetrating the shield may represent an unacceptable level of radiation. It therefore becomes critical to be able to accurately estimate this extremely small probability. Importance sampling is a well known technique for improving the efficiency of rare event calculations. Here, a new set of probabilities is used in the simulation runs. The results are multiplied by the likelihood ratio between the true and simulated probabilities so as to keep the estimator unbiased. The variance of the resulting estimator is very sensitive to which new set of transition probabilities is chosen. It is shown that a zero variance estimator does exist, but that its computation requires exact knowledge of the solution. A simple random walk with an associated killing model for the scatter of neutrons is introduced. Large deviation results for optimal importance sampling in random walks are extended to the case where killing is present. An adaptive "learning" algorithm for implementing importance sampling is given for more general Markov chain models of neutron scatter. For finite state spaces this algorithm is shown to give with probability one, a sequence of estimates converging exponentially fast to the true solution.
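The importance-sampling idea described in this abstract (simulate under altered probabilities, then reweight by the likelihood ratio) can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it assumes a plain one-dimensional random walk with absorbing barriers and no killing, and the function names, the parameter values, and the choice of simply swapping the up and down probabilities are all assumptions made for illustration.

```python
import random


def exact_hit_probability(p, start, top):
    """Closed-form gambler's-ruin probability of reaching `top` before 0.

    The walk steps +1 with probability p and -1 with probability 1 - p
    (p != 0.5), starting from `start`.
    """
    r = (1.0 - p) / p
    return (r ** start - 1.0) / (r ** top - 1.0)


def importance_sampling_estimate(p, start, top, n_runs, seed=0):
    """Estimate the same probability by importance sampling.

    Assumption: the simulation uses the swapped up-probability 1 - p, so the
    rare upward excursion becomes common. Each step multiplies a weight by
    (true probability) / (simulated probability) to keep the estimate unbiased.
    """
    rng = random.Random(seed)
    q = 1.0 - p
    p_sim = q  # simulated up-probability (hypothetical choice of new measure)
    total = 0.0
    for _ in range(n_runs):
        pos = start
        weight = 1.0
        while 0 < pos < top:
            if rng.random() < p_sim:
                pos += 1
                weight *= p / p_sim
            else:
                pos -= 1
                weight *= q / (1.0 - p_sim)
        if pos == top:
            total += weight  # only runs that reach the top contribute
    return total / n_runs


if __name__ == "__main__":
    print(exact_hit_probability(0.3, 1, 20))
    print(importance_sampling_estimate(0.3, 1, 20, 5000, seed=1))
```

With a downward drift the target event is far too rare to observe by naive simulation in a few thousand runs, whereas under the swapped probabilities it occurs in a large fraction of runs and every successful run carries the same small weight, which is why the estimate is stable.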
Estimating residual fault hitting rates by recapture sampling

NASA Technical Reports Server (NTRS)

Lee, Larry; Gupta, Rajan

1988-01-01

For the recapture debugging design introduced by Nayak (1988) the problem of estimating the hitting rates of the faults remaining in the system is considered. In the context of a conditional likelihood, moment estimators are derived and are shown to be asymptotically normal and fully efficient. Fixed sample properties of the moment estimators are compared, through simulation, with those of the conditional maximum likelihood estimators. Properties of the conditional model are investigated such as the asymptotic distribution of linear functions of the fault hitting frequencies and a representation of the full data vector in terms of a sequence of independent random vectors. It is assumed that the residual hitting rates follow a log linear rate model and that the testing process is truncated when the gaps between the detection of new errors exceed a fixed amount of time.
Molecular Epidemiology and Phylogenetic Analyses of Influenza B Virus in Thailand during 2010 to 2014

PubMed Central

Tewawong, Nipaporn; Suwannakarn, Kamol; Prachayangprecha, Slinporn; Korkong, Sumeth; Vichiwattana, Preeyaporn; Vongpunsawad, Sompong; Poovorawan, Yong

2015-01-01

Influenza B virus remains a major contributor to the seasonal influenza outbreak and its prevalence has increased worldwide. We investigated the epidemiology and analyzed the full genome sequences of influenza B virus strains in Thailand between 2010 and 2014. Samples from the upper respiratory tract were collected from patients diagnosed with influenza-like illness. All samples were screened for influenza A/B viruses by one-step multiplex real-time RT-PCR. The whole genomes of 53 influenza B isolates were amplified, sequenced, and analyzed. From 14,418 respiratory samples collected during 2010 to 2014, a total of 3,050 tested positive for influenza virus. Approximately 3.27% (471/14,418) were influenza B virus samples. Fifty-three isolates of influenza B virus were randomly chosen for detailed whole genome analysis. Phylogenetic analysis of the HA gene showed clusters in Victoria clades 1A, 1B, 3, 5 and Yamagata clades 2 and 3. Both B/Victoria and B/Yamagata lineages were found to co-circulate during this time. The NA sequences of all isolates belonged to lineage II and consisted of viruses from both HA Victoria and Yamagata lineages, reflecting possible reassortment of the HA and NA genes. No significant changes were seen in the NA protein. The phylogenetic trees generated through the analysis of the PB1 and PB2 genes closely resembled that of the HA gene, while trees generated from the analysis of the PA, NP, and M genes showed similar topology. The NS gene exhibited a pattern of genetic reassortment distinct from those of the PA, NP or M genes. Thus, antigenic drift and genetic reassortment among the influenza B virus strains were observed in the isolates examined. Our findings indicate that the co-circulation of two distinct lineages of influenza B viruses and the limitation of cross-protection of the current vaccine formulation provide support for quadrivalent influenza vaccine in this region. PMID:25602617
Whole-gene analysis of two groups of hepatitis B virus C/D inter-genotype recombinant strains isolated in Tibet, China

PubMed Central

Liu, Tiezhu; Wang, Fuzhen; Zhang, Shuang; Wang, Feng; Meng, Qingling; Zhang, Guomin; Cui, Fuqiang; Dunzhu, Dorji; Yin, Wenjiao; Bi, Shengli

2017-01-01

Tibet is a highly hepatitis B virus (HBV) endemic area. Two types of C/D recombinant HBV are commonly isolated in Tibet and have been previously described. In an effort to better understand the molecular characteristics of these C/D recombinant strains from Tibet, we undertook a multistage random sampling project to collect HBsAg positive samples. Molecular epidemiological and bioinformatic methods were used to analyze the characteristics of the sequences found in this study. There were 60 samples enrolled in the survey, and we obtained 19 whole-genome sequences. All 19 samples were C/D recombinant, and could be divided into two sub-types named C/D1 and C/D2 according to the differences in the location of the recombination breakpoint. The recombination breakpoint of the 10 strains belonging to the C/D1 sub-type was located at nt750, while the 9 strains belonging to C/D2 had their recombination breakpoint at nt1530. According to whole-genome sequence analysis, the 19 identified strains belong to genotype C, but the nucleotide distance was more than 5% between the 19 strains and sub-genotypes C1 to C15. The distance between C/D1 and C2 was 5.8±2.1%, while the distance between C/D2 and C2 was 6.4±2.1%. The parental strain was most likely sub-genotype C2. C/D1 strains were all collected in the middle and northern areas of Tibet including Lhasa, Linzhi and Ali, while C/D2 was predominant in Shannan in southern Tibet. This indicates that the two recombinant genotypes are regionally distributed in Tibet. These results provide important information for the study of special HBV recombination events, gene features, virus evolution, and the control and prevention policy of HBV in Tibet. PMID:28654691
Diverse Gene Cassettes in Class 1 Integrons of Facultative Oligotrophic Bacteria of River Mahananda, West Bengal, India

PubMed Central

Chakraborty, Ranadhir; Kumar, Arvind; Bhowal, Suparna Saha; Mandal, Amit Kumar; Tiwary, Bipransh Kumar; Mukherjee, Shriparna

2013-01-01

Background: In this study a large random collection (n = 2188) of facultative oligotrophic bacteria, from 90 water samples gathered in three consecutive years (2007–2009) from three different sampling sites of River Mahananda in Siliguri, West Bengal, India, was investigated for the presence of class 1 integrons and sequences of the amplification products. Methodology/Principal Findings: Replica plating method was employed for determining the antibiotic resistance profile of the randomly assorted facultative oligotrophic isolates. Genomic DNA from each isolate was analyzed by PCR for the presence of class 1 integron. Amplicons were cloned and sequenced. Numerical taxonomy and 16S rRNA gene sequence analyses were done to ascertain putative genera of the class 1 integron bearing isolates. Out of 2188 isolates, 1667 (76.19%) were antibiotic-resistant, comprising both single-antibiotic resistance (SAR) and multiple-antibiotic resistance (MAR), and 521 (23.81%) were sensitive to all twelve different antibiotics used in this study. Ninety out of 2188 isolates produced amplicon(s) of varying sizes from 0.15 to 3.45 kb. Chi-square (χ²) test revealed that the possession of class 1 integron in sensitive, SAR and MAR isolates is not equally probable at the 1% level of significance. Diverse antibiotic-resistance gene cassettes, aadA1, aadA2, aadA4, aadA5, dfrA1, dfrA5, dfrA7, dfrA12, dfrA16, dfrA17, dfrA28, dfrA30, dfr-IIe, blaIMP-9, aacA4, Ac-6′-Ib, oxa1, oxa10 and arr2 were detected in 64 isolates. The novel cassettes encoding proteins unrelated to any known antibiotic resistance gene function were identified in 26 isolates. Antibiotic-sensitive isolates have a greater propensity to carry gene cassettes unrelated to known antibiotic-resistance genes. The integron-positive isolates under the class Betaproteobacteria comprised only two genera, Comamonas and Acidovorax of family Comamonadaceae, while isolates under class Gammaproteobacteria fell under the families Moraxellaceae, Pseudomonadaceae, Aeromonadaceae and Enterobacteriaceae. Conclusions: Oligotrophic bacteria are good sources of novel genes as well as potential reservoirs of antibiotic resistance gene cassettes. PMID:23951238
Random Item Generation Is Affected by Age

ERIC Educational Resources Information Center

Multani, Namita; Rudzicz, Frank; Wong, Wing Yiu Stephanie; Namasivayam, Aravind Kumar; van Lieshout, Pascal

2016-01-01

Purpose: Random item generation (RIG) involves central executive functioning. Measuring aspects of random sequences can therefore provide a simple method to complement other tools for cognitive assessment. We examine the extent to which RIG relates to specific measures of cognitive function, and whether those measures can be estimated using RIG…
On the limiting characteristics of quantum random number generators at various clusterings of photocounts

NASA Astrophysics Data System (ADS)

Molotkov, S. N.

2017-03-01

Various methods for the clustering of photocounts constituting a sequence of random numbers are considered. It is shown that the clustering of photocounts resulting in the Fermi-Dirac distribution makes it possible to achieve the theoretical limit of the random number generation rate.
Two stochastic models useful in petroleum exploration

NASA Technical Reports Server (NTRS)

Kaufman, G. M.; Bradley, P. G.

1972-01-01

A model of the petroleum exploration process that tests empirically the hypothesis that, at an early stage in the exploration of a basin, the process behaves like sampling without replacement is proposed, along with a model of the spatial distribution of petroleum reservoirs that conforms to observed facts. In developing the model of discovery, the following topics are discussed: probabilistic proportionality, likelihood function, and maximum likelihood estimation. In addition, the spatial model is described, which is defined as a stochastic process generating values of a sequence of random variables in a way that simulates the frequency distribution of areal extent, the geographic location, and shape of oil deposits.
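One possible reading of "sampling without replacement" combined with "probabilistic proportionality" is that each undiscovered deposit is found next with probability proportional to its size. The sketch below illustrates that reading only; the abstract does not spell out the model, so the function name, the size values, and the proportional-to-size rule itself are assumptions.

```python
import random


def discovery_order(sizes, seed=0):
    """Simulate a discovery sequence by size-biased sampling without replacement.

    Assumption (not stated in the abstract): at every step, each remaining
    deposit is discovered with probability proportional to its size.
    """
    rng = random.Random(seed)
    remaining = list(sizes)
    order = []
    while remaining:
        # Draw one index with weights equal to the remaining sizes.
        idx = rng.choices(range(len(remaining)), weights=remaining, k=1)[0]
        order.append(remaining.pop(idx))
    return order


if __name__ == "__main__":
    hypothetical_sizes = [100.0, 50.0, 10.0, 5.0, 1.0]  # illustrative values
    print(discovery_order(hypothetical_sizes, seed=1))
```

Under this rule large deposits tend to appear early in the sequence, so the average size of early discoveries overstates the average size of all deposits in the basin.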
Transposon facilitated DNA sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berg, D.E.; Berg, C.M.; Huang, H.V.

1990-01-01

The purpose of this research is to investigate and develop methods that exploit the power of bacterial transposable elements for large scale DNA sequencing. Our premise is that the use of transposons to put primer binding sites randomly in target DNAs should provide access to all portions of large DNA fragments, without the inefficiencies of methods involving random subcloning and attendant repetitive sequencing, or of sequential synthesis of many oligonucleotide primers that are used to match systematically along a DNA molecule. Two unrelated bacterial transposons, Tn5 and γδ, are being used because they have both proven useful for molecular analyses, and because they differ sufficiently in mechanism and specificity of transposition to merit parallel development.

Automatic generation of randomized trial sequences for priming experiments.

PubMed

Ihrke, Matthias; Behrendt, Jörg

2011-01-01

In most psychological experiments, a randomized presentation of successive displays is crucial for the validity of the results. For some paradigms, this is not a trivial issue because trials are interdependent, e.g., priming paradigms. We present software that automatically generates optimized trial sequences for (negative-) priming experiments. Our implementation is based on an optimization heuristic known as genetic algorithms that allows for an intuitive interpretation due to its similarity to natural evolution. The program features a graphical user interface that allows the user to generate trial sequences and to interactively improve them. The software is based on freely available software and is released under the GNU General Public License.
AFRESh: an adaptive framework for compression of reads and assembled sequences with random access functionality.

PubMed

Paridaens, Tom; Van Wallendael, Glenn; De Neve, Wesley; Lambert, Peter

2017-05-15

The past decade has seen the introduction of new technologies that increasingly lowered the cost of genomic sequencing. We can even observe that the cost of sequencing is dropping significantly faster than the cost of storage and transmission. The latter motivates a need for continuous improvements in the area of genomic data compression, not only at the level of effectiveness (compression rate), but also at the level of functionality (e.g. random access), configurability (effectiveness versus complexity, coding tool set …) and versatility (support for both sequenced reads and assembled sequences). In that regard, we can point out that current approaches mostly do not support random access, requiring full files to be transmitted, and that current approaches are restricted to either read or sequence compression. We propose AFRESh, an adaptive framework for no-reference compression of genomic data with random access functionality, targeting the effective representation of the raw genomic symbol streams of both reads and assembled sequences. AFRESh makes use of a configurable set of prediction and encoding tools, extended by a Context-Adaptive Binary Arithmetic Coding scheme (CABAC), to compress raw genetic codes. To the best of our knowledge, our paper is the first to describe an effective implementation of CABAC outside of its original application. By applying CABAC, the compression effectiveness improves by up to 19% for assembled sequences and up to 62% for reads. By applying AFRESh to the genomic symbols of the MPEG genomic compression test set for reads, a compression gain is achieved of up to 51% compared to SCALCE, 42% compared to LFQC and 44% compared to ORCOM. When comparing to generic compression approaches, a compression gain is achieved of up to 41% compared to GNU Gzip and 22% compared to 7-Zip at the Ultra setting. Additionally, when compressing assembled sequences of the Human Genome, a compression gain is achieved of up to 34% compared to GNU Gzip and 16% compared to 7-Zip at the Ultra setting. A Windows executable version can be downloaded at https://github.com/tparidae/AFresh .
Differentiating Visual from Response Sequencing during Long-term Skill Learning.

PubMed

Lynch, Brighid; Beukema, Patrick; Verstynen, Timothy

2017-01-01

The dual-system model of sequence learning posits that during early learning there is an advantage for encoding sequences in sensory frames; however, it remains unclear whether this advantage extends to long-term consolidation. Using the serial RT task, we set out to distinguish the dynamics of learning sequential orders of visual cues from learning sequential responses. On each day, most participants learned a new mapping between a set of symbolic cues and responses made with one of four fingers, after which they were exposed to trial blocks of either randomly ordered cues or deterministic ordered cues (12-item sequence). Participants were randomly assigned to one of four groups (n = 15 per group): Visual sequences (same sequence of visual cues across training days), Response sequences (same order of key presses across training days), Combined (same serial order of cues and responses on all training days), and a Control group (a novel sequence each training day). Across 5 days of training, sequence-specific measures of response speed and accuracy improved faster in the Visual group than any of the other three groups, despite no group differences in explicit awareness of the sequence. The two groups that were exposed to the same visual sequence across days showed a marginal improvement in response binding that was not found in the other groups. These results indicate that there is an advantage, in terms of rate of consolidation across multiple days of training, for learning sequences of actions in a sensory representational space, rather than as motoric representations.
High resolution identity testing of inactivated poliovirus vaccines

PubMed Central

Mee, Edward T.; Minor, Philip D.; Martin, Javier

2015-01-01

Background: Definitive identification of poliovirus strains in vaccines is essential for quality control, particularly where multiple wild-type and Sabin strains are produced in the same facility. Sequence-based identification provides the ultimate in identity testing and would offer several advantages over serological methods. Methods: We employed random RT-PCR and high throughput sequencing to recover full-length genome sequences from monovalent and trivalent poliovirus vaccine products at various stages of the manufacturing process. Results: All expected strains were detected in previously characterised products and the method permitted identification of strains comprising as little as 0.1% of sequence reads. Highly similar Mahoney and Sabin 1 strains were readily discriminated on the basis of specific variant positions. Analysis of a product known to contain incorrect strains demonstrated that the method correctly identified the contaminants. Conclusion: Random RT-PCR and shotgun sequencing provided high resolution identification of vaccine components. In addition to the recovery of full-length genome sequences, the method could also be easily adapted to the characterisation of minor variant frequencies and distinction of closely related products on the basis of distinguishing consensus and low frequency polymorphisms. PMID:26049003
A computational proposal for designing structured RNA pools for in vitro selection of RNAs.

PubMed

Kim, Namhee; Gan, Hin Hark; Schlick, Tamar

2007-04-01

Although in vitro selection technology is a versatile experimental tool for discovering novel synthetic RNA molecules, finding complex RNA molecules is difficult because most RNAs identified from random sequence pools are simple motifs, consistent with recent computational analysis of such sequence pools. Thus, enriching in vitro selection pools with complex structures could increase the probability of discovering novel RNAs. Here we develop an approach for engineering sequence pools that links RNA sequence space regions with corresponding structural distributions via a "mixing matrix" approach combined with a graph theory analysis. We define five classes of mixing matrices motivated by covariance mutations in RNA; these constructs define nucleotide transition rates and are applied to chosen starting sequences to yield specific nonrandom pools. We examine the coverage of sequence space as a function of the mixing matrix and starting sequence via clustering analysis. We show that, in contrast to random sequences, which are associated only with a local region of sequence space, our designed pools, including a structured pool for GTP aptamers, can target specific motifs. It follows that experimental synthesis of designed pools can benefit from using optimized starting sequences, mixing matrices, and pool fractions associated with each of our constructed pools as a guide. Automation of our approach could provide practical tools for pool design applications for in vitro selection of RNAs and related problems.
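The notion of a mixing matrix that defines nucleotide transition rates and is applied to a starting sequence can be sketched as follows. This is an illustration under stated assumptions, not the authors' method: the matrix values, the per-position independent mutation scheme, and the names `apply_mixing_matrix` and `EXAMPLE_MATRIX` are hypothetical, and the paper's five matrix classes are not reproduced here.

```python
import random

NUCLEOTIDES = "ACGU"

# Hypothetical mixing matrix: row = starting base, entries = probabilities of
# emitting A, C, G, U at that position. Each row sums to 1.
EXAMPLE_MATRIX = {
    "A": [0.7, 0.1, 0.1, 0.1],
    "C": [0.1, 0.7, 0.1, 0.1],
    "G": [0.1, 0.1, 0.7, 0.1],
    "U": [0.1, 0.1, 0.1, 0.7],
}


def apply_mixing_matrix(start_seq, matrix, pool_size, seed=0):
    """Generate a nonrandom pool by mutating each position independently.

    Assumption: every position of `start_seq` is replaced by a base drawn
    from the matrix row of the original base at that position.
    """
    rng = random.Random(seed)
    pool = []
    for _ in range(pool_size):
        mutated = [
            rng.choices(NUCLEOTIDES, weights=matrix[base], k=1)[0]
            for base in start_seq
        ]
        pool.append("".join(mutated))
    return pool


if __name__ == "__main__":
    for seq in apply_mixing_matrix("GGACUUCGGUCC", EXAMPLE_MATRIX, 5, seed=1):
        print(seq)
```

A pool built this way stays concentrated around the starting sequence, in contrast to a uniformly random pool; how strongly it concentrates depends on the diagonal of the matrix.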
Efficient encapsulation of proteins with random copolymers.

PubMed

Nguyen, Trung Dac; Qiao, Baofu; Olvera de la Cruz, Monica

2018-06-12

Membraneless organelles are aggregates of disordered proteins that form spontaneously to promote specific cellular functions in vivo. The possibility of synthesizing membraneless organelles out of cells will therefore enable fabrication of protein-based materials with functions inherent to biological matter. Since random copolymers contain various compositions and sequences of solvophobic and solvophilic groups, they are expected to function in nonbiological media similarly to a set of disordered proteins in membraneless organelles. Interestingly, the internal environment of these organelles has been noted to behave more like an organic solvent than like water. Therefore, an adsorbed layer of random copolymers that mimics the function of disordered proteins could, in principle, protect and enhance the proteins' enzymatic activity even in organic solvents, which are ideal when the products and/or the reactants have limited solubility in aqueous media. Here, we demonstrate via multiscale simulations that random copolymers efficiently incorporate proteins into different solvents with the potential to optimize their enzymatic activity. We investigate the key factors that govern the ability of random copolymers to encapsulate proteins, including the adsorption energy, copolymer average composition, and solvent selectivity. The adsorbed polymer chains have remarkably similar sequences, indicating that the proteins are able to select certain sequences that best reduce their exposure to the solvent. We also find that the protein surface coverage decreases when the fluctuation in the average distance between the protein adsorption sites increases. The results herein set the stage for computational design of random copolymers for stabilizing and delivering proteins across multiple media.
Direct typing of Canine parvovirus (CPV) from infected dog faeces by rapid mini sequencing technique.

PubMed

V, Pavana Jyothi; S, Akila; Selvan, Malini K; Naidu, Hariprasad; Raghunathan, Shwethaa; Kota, Sathish; Sundaram, R C Raja; Rana, Samir Kumar; Raj, G Dhinakar; Srinivasan, V A; Mohana Subramanian, B

2016-12-01

Canine parvovirus (CPV) is a non-enveloped single-stranded DNA virus with an icosahedral capsid. Mini-sequencing based CPV typing was developed earlier to detect and differentiate all the CPV types and FPV in a single reaction. This technique was further evaluated in the present study by performing the mini-sequencing directly from fecal samples, which avoided tedious virus isolation steps by cell culture system. Fecal swab samples were collected from 84 dogs with enteritis symptoms suggestive of parvoviral infection from different locations across India. Seventy-six of these samples were positive by PCR; the subsequent mini-sequencing reaction typed 74 of them as type 2a virus, and 2 samples as type 2b. Additionally, 25 of the positive samples were typed by cycle sequencing of PCR products. Direct CPV typing from fecal samples using mini-sequencing showed 100% correlation with CPV typing by cycle sequencing. Moreover, CPV typing was achieved by mini-sequencing even with faintly positive PCR amplicons, which was not possible by cycle sequencing. Therefore, the mini-sequencing technique is recommended for regular epidemiological follow up of CPV types, since it is a rapid, highly sensitive and high-capacity method for CPV typing.
HIT'nDRIVE: patient-specific multidriver gene prioritization for precision oncology

PubMed Central

Hodzic, Ermin; Sauerwald, Thomas; Dao, Phuong; Wang, Kendric; Yeung, Jake; Anderson, Shawn; Vandin, Fabio; Haffari, Gholamreza; Collins, Colin C.; Sahinalp, S. Cenk

2017-01-01

Prioritizing molecular alterations that act as drivers of cancer remains a crucial bottleneck in therapeutic development. Here we introduce HIT'nDRIVE, a computational method that integrates genomic and transcriptomic data to identify a set of patient-specific, sequence-altered genes, with sufficient collective influence over dysregulated transcripts. HIT'nDRIVE aims to solve the “random walk facility location” (RWFL) problem in a gene (or protein) interaction network, which differs from the standard facility location problem by its use of an alternative distance measure: “multihitting time,” the expected length of the shortest random walk from any one of the set of sequence-altered genes to an expression-altered target gene. When applied to 2200 tumors from four major cancer types, HIT'nDRIVE revealed many potentially clinically actionable driver genes. We also demonstrated that it is possible to perform accurate phenotype prediction for tumor samples by only using HIT'nDRIVE-seeded driver gene modules from gene interaction networks. In addition, we identified a number of breast cancer subtype-specific driver modules that are associated with patients’ survival outcome. Furthermore, HIT'nDRIVE, when applied to a large panel of pan-cancer cell lines, accurately predicted drug efficacy using the driver genes and their seeded gene modules. Overall, HIT'nDRIVE may help clinicians contextualize massive multiomics data in therapeutic decision making, enabling widespread implementation of precision oncology. PMID:28768687
Quantitative assessments of the distinct contributions of polypeptide backbone amides versus sidechain groups to chain expansion via chemical denaturation

PubMed Central

Holehouse, Alex S.; Garai, Kanchan; Lyle, Nicholas; Vitalis, Andreas; Pappu, Rohit V.

2015-01-01

In aqueous solutions with high concentrations of chemical denaturants such as urea and guanidinium chloride (GdmCl) proteins expand to populate heterogeneous conformational ensembles. These denaturing environments are thought to be good solvents for generic protein sequences because properties of conformational distributions align with those of canonical random coils. Previous studies showed that water is a poor solvent for polypeptide backbones and therefore backbones form collapsed globular structures in aqueous solvents. Here, we ask if polypeptide backbones can intrinsically undergo the requisite chain expansion in aqueous solutions with high concentrations of urea and GdmCl. We answer this question using a combination of molecular dynamics simulations and fluorescence correlation spectroscopy. We find that the degree of backbone expansion is minimal in aqueous solutions with high concentrations of denaturants. Instead, polypeptide backbones sample conformations that are denaturant-specific mixtures of coils and globules, with a persistent preference for globules. Therefore, typical denaturing environments cannot be classified as good solvents for polypeptide backbones. How then do generic protein sequences expand in denaturing environments? To answer this question, we investigated the effects of sidechains using simulations of two archetypal sequences with amino acid compositions that are mixtures of charged, hydrophobic, and polar groups. We find that sidechains lower the effective concentration of backbone amides in water leading to an intrinsic expansion of polypeptide backbones in the absence of denaturants. Additional dilution of the effective concentration of backbone amides is achieved through preferential interactions with denaturants. These effects lead to conformational statistics in denaturing environments that are congruent with those of canonical random coils. Our results highlight the role of sidechain-mediated interactions as determinants of the conformational properties of unfolded states in water and in influencing chain expansion upon denaturation. PMID:25664638
Sequence-To-Conformation Relationships of Disordered Regions Tethered to Folded Domains of Proteins.

PubMed

Mittal, Anuradha; Holehouse, Alex S; Cohan, Megan C; Pappu, Rohit V

2018-05-12

Intrinsically disordered proteins and regions (IDPs / IDRs) are characterized by well-defined sequence-to-conformation relationships (SCRs). These relationships refer to the sequence-specific preferences for average sizes, shapes, residue-specific secondary structure propensities, and amplitudes of multiscale conformational fluctuations. SCRs are discerned from the sequence-specific conformational ensembles of IDPs. A vast majority of IDPs are actually tethered to folded domains (FDs). This raises the question of whether or not SCRs inferred for IDPs are applicable to IDRs tethered to folded domains. Here, we use atomistic simulations based on a well-established forcefield paradigm and an enhanced sampling method to obtain comparative assessments of SCRs for thirteen archetypal IDRs modeled as autonomous units, as C-terminal tails connected to folded domains, and as linkers between pairs of folded domains. Our studies uncover a set of general observations regarding context-independent versus context-dependent SCRs of IDRs. SCRs are minimally perturbed upon tethering to folded domains if the IDRs are deficient in charged residues and for polyampholytic IDRs where the oppositely charged residues within the sequence of the IDR are separated into distinct blocks. In contrast, the interplay between IDRs and tethered folded domains has a significant modulatory effect on SCRs if the IDRs have intermediate fractions of charged residues or if they have sequence-intrinsic conformational preferences for canonical random coils. Our findings suggest that IDRs with context-independent SCRs might be independent evolutionary modules whereas IDRs with context-dependent intrinsic SCRs might co-evolve with the FDs to which they are tethered.
Preserving correlations between trajectories for efficient path sampling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gingrich, Todd R.; Geissler, Phillip L.; Chemical Sciences Division, Lawrence Berkeley National Laboratory, Berkeley, California 94720

2015-06-21

Importance sampling of trajectories has proved a uniquely successful strategy for exploring rare dynamical behaviors of complex systems in an unbiased way. Carrying out this sampling, however, requires an ability to propose changes to dynamical pathways that are substantial, yet sufficiently modest to obtain reasonable acceptance rates. Satisfying this requirement becomes very challenging in the case of long trajectories, due to the characteristic divergences of chaotic dynamics. Here, we examine schemes for addressing this problem, which engineer correlation between a trial trajectory and its reference path, for instance using artificial forces. Our analysis is facilitated by a modern perspective on Markov chain Monte Carlo sampling, inspired by non-equilibrium statistical mechanics, which clarifies the types of sampling strategies that can scale to long trajectories. Viewed in this light, the most promising such strategy guides a trial trajectory by manipulating the sequence of random numbers that advance its stochastic time evolution, as done in a handful of existing methods. In cases where this “noise guidance” synchronizes trajectories effectively, as in the Glauber dynamics of a two-dimensional Ising model, we show that efficient path sampling can be achieved for even very long trajectories.
Classification of Pelteobagrus fish in Poyang Lake based on mitochondrial COI gene sequence.

PubMed

Zhong, Bin; Chen, Ting-Ting; Gong, Rui-Yue; Zhao, Zhe-Xia; Wang, Binhua; Fang, Chunlin; Mao, Hui-Ling

2016-11-01

We used DNA molecular marker technology to compensate for the shortcomings of traditional morphological taxonomy. A total of 770 Pelteobagrus fish were collected from Poyang Lake. After preliminary morphological classification, eight samples of each species were randomly selected for DNA extraction. The mitochondrial COI gene sequence was cloned with universal primers and sequenced. The results showed that four species of Pelteobagrus live in Poyang Lake. The average intraspecific genetic distance was 0.003, while the average interspecific genetic distance was 0.128, so the interspecific distance far exceeds the intraspecific distance. In addition, phylogenetic tree analysis showed that the molecular systematics agreed with the morphological classification, indicating that the COI gene is an effective DNA molecular marker for Pelteobagrus classification. Surprisingly, the genetic distance of some individuals (P. e6, P. n6, P. e5, and P. v4) from their originally assigned species exceeded the species threshold (2%), and these individuals should be reclassified as Pelteobagrus fulvidraco. Another individual, P. v3, was very different: its genetic distance from the originally assigned species, Pelteobagrus vachelli, was over 8.4%. Its taxonomic status remains to be studied further.
Lyapunov exponents for one-dimensional aperiodic photonic bandgap structures

NASA Astrophysics Data System (ADS)

Kissel, Glen J.

2011-10-01

Existing in the "gray area" between perfectly periodic and purely randomized photonic bandgap structures are the so-called aperiodic structures whose layers are chosen according to some deterministic rule. We consider here a one-dimensional photonic bandgap structure, a quarter-wave stack, with the layer thickness of one of the bilayers subject to being either thin or thick according to five deterministic sequence rules and binary random selection. To produce these aperiodic structures we examine the following sequences: Fibonacci, Thue-Morse, period doubling, Rudin-Shapiro, as well as the triadic Cantor sequence. We model these structures numerically with a long chain (approximately 5,000,000) of transfer matrices, and then use the reliable algorithm of Wolf to calculate the (upper) Lyapunov exponent for the long product of matrices. The Lyapunov exponent is the statistically well-behaved variable used to characterize the Anderson localization effect (exponential confinement) when the layers are randomized, so its calculation allows us to more precisely compare the purely randomized structure with its aperiodic counterparts. It is found that the aperiodic photonic systems show much fine structure in their Lyapunov exponents as a function of frequency, and, in a number of cases, the exponents are quite obviously fractal.
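As a rough illustration of the deterministic rules named in this abstract, the sketch below generates binary thin/thick choices for three of them (Fibonacci, Thue-Morse, period doubling). The function names and the 0 = thin, 1 = thick encoding are assumptions made for illustration; the transfer-matrix model and Wolf's algorithm used in the paper are not reproduced here.

```python
def substitution_sequence(rules, n, start=0):
    """First n symbols of the fixed point of a binary substitution rule.

    rules maps each symbol (0 or 1) to the list of symbols that replaces it.
    """
    word = [start]
    while len(word) < n:
        # Apply the substitution to every symbol of the current word.
        word = [t for s in word for t in rules[s]]
    return word[:n]


def fibonacci_word(n):
    """Fibonacci word: 0 -> 01, 1 -> 0."""
    return substitution_sequence({0: [0, 1], 1: [0]}, n)


def period_doubling(n):
    """Period-doubling sequence: 0 -> 01, 1 -> 00."""
    return substitution_sequence({0: [0, 1], 1: [0, 0]}, n)


def thue_morse(n):
    """Thue-Morse sequence: parity of the number of 1 bits in the index."""
    return [bin(i).count("1") % 2 for i in range(n)]


# Hypothetical encoding: 0 selects the thin layer, 1 the thick layer.
layers = ["thin" if s == 0 else "thick" for s in fibonacci_word(8)]
```

A binary random selection, the comparison case in the abstract, would simply draw each symbol independently instead of following a rule.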
Markers and mapping revisited: finding your gene.

PubMed

Jones, Neil; Ougham, Helen; Thomas, Howard; Pasakinskiene, Izolda

2009-01-01

This paper is an update of our earlier review (Jones et al., 1997, Markers and mapping: we are all geneticists now. New Phytologist 137: 165-177), which dealt with the genetics of mapping, in terms of recombination as the basis of the procedure, and covered some of the first generation of markers, including restriction fragment length polymorphisms (RFLPs), random amplified polymorphic DNA (RAPDs), simple sequence repeats (SSRs) and quantitative trait loci (QTLs). In the intervening decade there have been numerous developments in marker science with many new systems becoming available, which are herein described: cleavage amplification polymorphism (CAP), sequence-specific amplification polymorphism (S-SAP), inter-simple sequence repeat (ISSR), sequence tagged site (STS), sequence characterized amplification region (SCAR), selective amplification of microsatellite polymorphic loci (SAMPL), single nucleotide polymorphism (SNP), expressed sequence tag (EST), sequence-related amplified polymorphism (SRAP), target region amplification polymorphism (TRAP), microarrays, diversity arrays technology (DArT), single-strand conformation polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), temperature gradient gel electrophoresis (TGGE) and methylation-sensitive PCR. In addition there has been an explosion of knowledge and databases in the area of genomics and bioinformatics. The number of flowering plant ESTs is c. 19 million and counting, with all the opportunity that this provides for gene-hunting, while the survey of bioinformatics and computer resources points to a rapid growth point for future activities in unravelling and applying the burst of new information on plant genomes. A case study is presented on tracking down a specific gene (stay-green (SGR), a post-transcriptional senescence regulator) using the full suite of mapping tools and comparative mapping resources. We end with a brief speculation on how genome analysis may progress into the future of this highly dynamic arena of plant science.
Genetic Characterization of Fasciola Isolates from West Azerbaijan Province Iran Based on ITS1 and ITS2 Sequence of Ribosomal DNA

PubMed Central

GALAVANI, Hossein; GHOLIZADEH, Saber; HAZRATI TAPPEH, Khosrow

2016-01-01

Background: Fascioliasis, caused by Fasciola hepatica and F. gigantica, is of medical and economic importance worldwide. Compared with traditional methods, molecular approaches to the identification and characterization of Fasciola spp. are precise and reliable. The aims of the current study were the molecular characterization of Fasciola spp. in West Azerbaijan Province, Iran, and their comparative analysis against GenBank sequences. Methods: A total of 580 isolates were collected in 2014 from 90 slaughtered cattle (n=50) and sheep (n=40), representing different hosts in five cities of West Azerbaijan Province. After morphological identification and DNA extraction, specifically designed primers were used to amplify the ITS1, 5.8S, and ITS2 regions, and 50 randomly chosen samples were sequenced. Results: By morphometric characters, 99.14% and 0.86% of isolates were identified as F. hepatica and F. gigantica, respectively. PCR amplification of a 1081 bp fragment and the sequencing results showed 100% similarity with F. hepatica in the ITS1 (428 bp), 5.8S (158 bp), and ITS2 (366 bp) regions. Comparison of the sequences from the current study with GenBank data showed 98% identity, with 11 nucleotide mismatches. However, in the phylogenetic tree, the F. hepatica sequences from West Azerbaijan Province, Iran, were closely related to Iranian, Asian, and African isolates. Conclusions: Only F. hepatica is distributed among sheep and cattle in West Azerbaijan Province, Iran. However, the 5 and 6 bp variations in the ITS1 and ITS2 regions, respectively, are not enough to separate Fasciola spp. Therefore, more studies are needed to design new molecular markers for correct species identification. PMID:27095969
Local linear regression for function learning: an analysis based on sample discrepancy.

PubMed

Cervellera, Cristiano; Macciò, Danilo

2014-11-01

Local linear regression models, a kind of nonparametric structure that performs a local linear estimation of the target function, are analyzed in the context of empirical risk minimization (ERM) for function learning. The analysis is carried out with emphasis on geometric properties of the available data. In particular, the discrepancy of the observation points used both to build the local regression models and to compute the empirical risk is considered. This allows the case in which the samples come from a random external source and the case in which the input space can be freely explored to be treated in the same way. Both consistency of the ERM procedure and approximating capabilities of the estimator are analyzed, proving conditions to ensure convergence. Since the theoretical analysis shows that the estimation improves as the discrepancy of the observation points becomes smaller, low-discrepancy sequences, a family of sampling methods commonly employed for efficient numerical integration, are also analyzed. Simulation results involving two different examples of function learning are provided.
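To make the two ingredients of this abstract concrete, here is a minimal one-dimensional sketch: a van der Corput sequence as an example of a low-discrepancy sequence, and a kernel-weighted local linear estimate evaluated at a query point. The kernel (Epanechnikov-style weights), the bandwidth `h`, and all function names are assumptions for illustration and are not taken from the paper.

```python
def van_der_corput(i, base=2):
    """i-th point (i >= 1) of the base-b van der Corput sequence in [0, 1)."""
    x, denom = 0.0, 1.0
    while i > 0:
        i, digit = divmod(i, base)
        denom *= base
        x += digit / denom  # mirror the digits of i about the radix point
    return x


def local_linear(xs, ys, x0, h):
    """Local linear estimate of y at x0 from samples (xs, ys).

    Fits y ~ a + b * (x - x0) by weighted least squares, using the weights
    w = 1 - u**2 for |u| < 1 with u = (x - x0) / h, and returns the intercept a.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        u = (x - x0) / h
        if abs(u) >= 1.0:
            continue  # sample lies outside the local window
        w = 1.0 - u * u
        d = x - x0
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    det = s0 * s2 - s1 * s1
    if det == 0.0:
        raise ValueError("not enough distinct samples near x0")
    # Solve the 2x2 normal equations for the intercept.
    return (s2 * t0 - s1 * t1) / det


# Observation points from the low-discrepancy sequence, with a linear target.
xs = [van_der_corput(i) for i in range(1, 65)]
ys = [2.0 * x + 1.0 for x in xs]
estimate = local_linear(xs, ys, x0=0.3, h=0.2)
```

Because the target here is exactly linear, the local fit recovers it up to rounding; for a nonlinear target, the evenness of the observation points inside each window is what the discrepancy-based analysis is concerned with.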
High-Throughput Sequencing and De Novo Assembly of the Isatis indigotica Transcriptome

PubMed Central

Tang, Xiaoqing; Xiao, Yunhua; Lv, Tingting; Wang, Fangquan; Zhu, QianHao; Zheng, Tianqing; Yang, Jie

2014-01-01

Background: Isatis indigotica, the source of the traditional Chinese medicine Radix isatidis (Ban-Lan-Gen), is an extremely important economic crop in China. To facilitate biological, biochemical and molecular research on the medicinal chemicals in I. indigotica, here we report the first I. indigotica transcriptome generated by RNA sequencing (RNA-seq). Results: An RNA-seq library was created using RNA extracted from a mixed sample including leaf and root. A total of 33,238 unigenes were assembled from more than 28 million high-quality short reads. The quality of the assembly was experimentally examined by cDNA sequencing of seven randomly selected unigenes. Based on a BLAST search, 28,184 unigenes had a hit in at least one of the protein and nucleotide databases used in this study, and 8 unigenes were found to be associated with biosynthesis of indole and its derivatives. According to Gene Ontology classification, 22,365 unigenes were categorized into 48 functional groups. Furthermore, Clusters of Orthologous Groups and Swiss-Prot annotations were assigned for 7,707 and 18,679 unigenes, respectively. Analysis of repeat motifs identified 6,400 simple sequence repeat markers in 4,509 unigenes. Conclusion: Our data provide a comprehensive sequence resource for molecular study of I. indigotica. Our results will facilitate studies on the functions of genes involved in the indole alkaloid biosynthesis pathway and on metabolism of nitrogen and indole alkaloids in I. indigotica and its related species. PMID:25259890
Comparison of three human papillomavirus DNA detection methods: Next generation sequencing, multiplex-PCR and nested-PCR followed by Sanger based sequencing.

PubMed

da Fonseca, Allex Jardim; Galvão, Renata Silva; Miranda, Angelica Espinosa; Ferreira, Luiz Carlos de Lima; Chen, Zigui

2016-05-01

To compare the diagnostic performance for HPV infection using three laboratory techniques. Ninety-five cervicovaginal samples were randomly selected; each was tested for HPV DNA and genotypes using 3 methods in parallel: Multiplex-PCR, Nested-PCR followed by Sanger sequencing, and Next-Generation Sequencing (NGS) with two assays (NGS-A1, NGS-A2). The study was approved by the Brazilian National IRB (CONEP protocol 16,800). The prevalence of HPV by the NGS assays was higher than that using the Multiplex-PCR (64.2% vs. 45.2%, respectively; P = 0.001) and the Nested-PCR (64.2% vs. 49.5%, respectively; P = 0.003). NGS also showed better performance in detecting high-risk HPV (HR-HPV) and HPV16. There was weak agreement between the results of Multiplex-PCR and Nested-PCR in relation to NGS for the diagnosis of HPV infection, and a moderate correlation for HR-HPV detection. Both NGS assays showed a strong correlation for detection of HPVs (k = 0.86), HR-HPVs (k = 0.91), HPV16 (k = 0.92) and HPV18 (k = 0.91). NGS is more sensitive than the traditional Sanger sequencing and the Multiplex-PCR to genotype HPVs, with promising ability to detect multiple infections, and may have the potential to establish an alternative method for the diagnosis and genotyping of HPV. © 2015 Wiley Periodicals, Inc.
Human papillomavirus type 18 variant lineages in United States populations characterized by sequence analysis of LCR-E6, E2, and L1 regions.

PubMed

Arias-Pulido, Hugo; Peyton, Cheri L; Torrez-Martínez, Norah; Anderson, D Nelson; Wheeler, Cosette M

2005-07-20

While HPV 16 variant lineages have been well characterized, the knowledge about HPV 18 variants is limited. In this study, HPV 18 nucleotide variations in the E2 hinge region were characterized by sequence analysis in 47 control and 51 tumor specimens. Fifty of these specimens were randomly selected for sequencing of an LCR-E6 segment and 20 samples representative of LCR-E6 and E2 sequence variants were examined across the L1 region. A total of 2770 nucleotides per HPV 18 variant genome were considered in this study. HPV 18 variant nucleotides were linked among all gene segments analyzed and grouped into three main branches: Asian-American (AA), European (E), and African (Af). These three branches were equally distributed among controls and cases and when stratified by Hispanic and non-Hispanic ethnicities. Among invasive cervical cancer cases, no significant differences in the three HPV variant branches were observed among ethnic groups or when stratified by histopathology (squamous vs. adenocarcinoma). The Af branch showed the greatest nucleotide variability when compared to the HPV 18 reference sequence and was more closely related to HPV 45 than either AA or E branches. Our data also characterize nucleotide and amino acid variations in the L1 capsid gene among HPV 18 variants, which may be relevant to vaccine strategies and subsequent studies of naturally occurring HPV 18 variants. Several novel HPV 18 nucleotide variations were identified in this study.
Effects of Ethnic Attributes on the Quality of Family Planning Services in Lima, Peru: A Randomized Crossover Trial

PubMed Central

Planas, Maria-Elena; García, Patricia J.; Bustelo, Monserrat; Carcamo, Cesar P.; Martinez, Sebastian; Nopo, Hugo; Rodriguez, Julio; Merino, Maria-Fernanda; Morrison, Andrew

2015-01-01

Most studies reporting ethnic disparities in the quality of healthcare come from developed countries and rely on observational methods. We conducted the first experimental study to evaluate whether health providers in Peru provide differential quality of care for family planning services, based on the indigenous or mestizo (mixed ethnoracial ancestry) profile of the patient. In a crossover randomized controlled trial conducted in 2012, a sample of 351 out of the 408 public health establishments in Metropolitan Lima, Peru were randomly assigned to receive unannounced simulated patients enacting indigenous and mestizo profiles (sequence-1) or mestizo and then indigenous profiles (sequence-2), with a five week wash-out period. Both ethnic profiles used the same scripted scenario for seeking contraceptive advice but had distinctive cultural attributes such as clothing, styling of hair, make-up, accessories, posture and patterns of movement and speech. Our primary outcome measure of quality of care is the proportion of technical tasks performed by providers, as established by Peruvian family planning clinical guidelines. Providers and data analysts were kept blinded to the allocation. We found a non-significant mean difference of -0·7% (p = 0·23) between ethnic profiles in the percentage of technical tasks performed by providers. However we report large deficiencies in the compliance with quality standards of care for both profiles. Differential provider behaviour based on the patient's ethnic profiles compared in the study did not contribute to deficiencies in family planning outcomes observed. The study highlights the need to explore other determinants for poor compliance with quality standards, including demand and supply side factors, and calls for interventions to improve the quality of care for family planning services in Metropolitan Lima. PMID:25671664

Effects of ethnic attributes on the quality of family planning services in Lima, Peru: a randomized crossover trial.

PubMed

Planas, Maria-Elena; García, Patricia J; Bustelo, Monserrat; Carcamo, Cesar P; Martinez, Sebastian; Nopo, Hugo; Rodriguez, Julio; Merino, Maria-Fernanda; Morrison, Andrew

2015-01-01

Most studies reporting ethnic disparities in the quality of healthcare come from developed countries and rely on observational methods. We conducted the first experimental study to evaluate whether health providers in Peru provide differential quality of care for family planning services, based on the indigenous or mestizo (mixed ethnoracial ancestry) profile of the patient. In a crossover randomized controlled trial conducted in 2012, a sample of 351 out of the 408 public health establishments in Metropolitan Lima, Peru were randomly assigned to receive unannounced simulated patients enacting indigenous and mestizo profiles (sequence-1) or mestizo and then indigenous profiles (sequence-2), with a five week wash-out period. Both ethnic profiles used the same scripted scenario for seeking contraceptive advice but had distinctive cultural attributes such as clothing, styling of hair, make-up, accessories, posture and patterns of movement and speech. Our primary outcome measure of quality of care is the proportion of technical tasks performed by providers, as established by Peruvian family planning clinical guidelines. Providers and data analysts were kept blinded to the allocation. We found a non-significant mean difference of -0.7% (p = 0.23) between ethnic profiles in the percentage of technical tasks performed by providers. However we report large deficiencies in the compliance with quality standards of care for both profiles. Differential provider behaviour based on the patient's ethnic profiles compared in the study did not contribute to deficiencies in family planning outcomes observed. The study highlights the need to explore other determinants for poor compliance with quality standards, including demand and supply side factors, and calls for interventions to improve the quality of care for family planning services in Metropolitan Lima.
Open-target sparse sensing of biological agents using DNA microarray

PubMed Central

2011-01-01

Background: Current biosensors are designed to target and react to specific nucleic acid sequences or structural epitopes. These 'target-specific' platforms require creation of new physical capture reagents when new organisms are targeted. An 'open-target' approach to DNA microarray biosensing is proposed and substantiated using laboratory generated data. The microarray consisted of 12,900 25 bp oligonucleotide capture probes derived from a statistical model trained on randomly selected genomic segments of pathogenic prokaryotic organisms. Open-target detection of organisms was accomplished using a reference library of hybridization patterns for three test organisms whose DNA sequences were not included in the design of the microarray probes. Results: A multivariate mathematical model based on the partial least squares regression (PLSR) was developed to detect the presence of three test organisms in mixed samples. When all 12,900 probes were used, the model correctly detected the signature of three test organisms in all mixed samples (mean R² = 0.76, CI = 0.95), with a 6% false positive rate. A sampling algorithm was then developed to sparsely sample the probe space for a minimal number of probes required to capture the hybridization imprints of the test organisms. The PLSR detection model was capable of correctly identifying the presence of the three test organisms in all mixed samples using only 47 probes (mean R² = 0.77, CI = 0.95) with nearly 100% specificity. Conclusions: We conceived an 'open-target' approach to biosensing, and hypothesized that a relatively small, non-specifically designed, DNA microarray is capable of identifying the presence of multiple organisms in mixed samples. Coupled with a mathematical model applied to laboratory generated data, and sparse sampling of capture probes, the prototype microarray platform was able to capture the signature of each organism in all mixed samples with high sensitivity and specificity. It was demonstrated that this new approach to biosensing closely follows the principles of sparse sensing. PMID:21801424
Experimental rugged fitness landscape in protein sequence space.

PubMed

Hayashi, Yuuki; Aita, Takuyo; Toyota, Hitoshi; Husimi, Yuzuru; Urabe, Itaru; Yomo, Tetsuya

2006-12-20

The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12-130 of the initial random polypeptide and selection for infectivity, the selected phage showed a 1.7 × 10⁴-fold increase in infectivity, defined as the number of infected cells per ml of phage suspension. Fitness was defined as the logarithm of infectivity, and we analyzed (1) the dependence of stationary fitness on library size, which increased gradually, and (2) the time course of changes in fitness in transitional phases, based on an original theory regarding the evolutionary dynamics in Kauffman's n-k fitness landscape model. In the landscape model, single mutations at single sites among n sites affect the contribution of k other sites to fitness. Based on the results of these analyses, k was estimated to be 18-24. According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value. Based on the landscapes of these two different surfaces, it appears possible for adaptive walks with only random substitutions to climb with relative ease up to the middle region of the fitness landscape from any primordial or random sequence, whereas an enormous range of sequence diversity is required to climb further up the rugged surface above the middle region.
Experimental Rugged Fitness Landscape in Protein Sequence Space

PubMed Central

Hayashi, Yuuki; Aita, Takuyo; Toyota, Hitoshi; Husimi, Yuzuru; Urabe, Itaru; Yomo, Tetsuya

2006-01-01

The fitness landscape in sequence space determines the process of biomolecular evolution. To plot the fitness landscape of protein function, we carried out in vitro molecular evolution beginning with a defective fd phage carrying a random polypeptide of 139 amino acids in place of the g3p minor coat protein D2 domain, which is essential for phage infection. After 20 cycles of random substitution at sites 12–130 of the initial random polypeptide and selection for infectivity, the selected phage showed a 1.7 × 10⁴-fold increase in infectivity, defined as the number of infected cells per ml of phage suspension. Fitness was defined as the logarithm of infectivity, and we analyzed (1) the dependence of stationary fitness on library size, which increased gradually, and (2) the time course of changes in fitness in transitional phases, based on an original theory regarding the evolutionary dynamics in Kauffman's n-k fitness landscape model. In the landscape model, single mutations at single sites among n sites affect the contribution of k other sites to fitness. Based on the results of these analyses, k was estimated to be 18–24. According to the estimated parameters, the landscape was plotted as a smooth surface up to a relative fitness of 0.4 of the global peak, whereas the landscape had a highly rugged surface with many local peaks above this relative fitness value. Based on the landscapes of these two different surfaces, it appears possible for adaptive walks with only random substitutions to climb with relative ease up to the middle region of the fitness landscape from any primordial or random sequence, whereas an enormous range of sequence diversity is required to climb further up the rugged surface above the middle region. PMID:17183728
Demonstration of Nondeclarative Sequence Learning in Mice: Development of an Animal Analog of the Human Serial Reaction Time Task

ERIC Educational Resources Information Center

Christie, Michael A.; Hersch, Steven M.

2004-01-01

In this paper, we demonstrate nondeclarative sequence learning in mice using an animal analog of the human serial reaction time task (SRT) that uses a within-group comparison of behavior in response to a repeating sequence versus a random sequence. Ten female B6CBA mice performed eleven 96-trial sessions containing 24 repetitions of a 4-trial…
Robust High Data Rate MIMO Underwater Acoustic Communications

DTIC Science & Technology

2010-12-31

algorithm is referred to as periodic CAN (PeCAN). Unlike most existing sequence construction methods which are algebraic and deterministic in nature, we...start the iteration of PeCAN from random phase initializations and then proceed to cyclically minimize the desired metric. In this way, through...by the foe and hence are especially useful as training sequences or as spreading sequences for UAC applications. We will use PeCAN sequences for
Design of nucleic acid sequences for DNA computing based on a thermodynamic approach

PubMed Central

Tanaka, Fumiaki; Kameda, Atsushi; Yamamoto, Masahito; Ohuchi, Azuma

2005-01-01

We have developed an algorithm for designing multiple sequences of nucleic acids that have a uniform melting temperature between the sequence and its complement and that do not hybridize non-specifically with each other based on the minimum free energy (ΔGmin). Sequences that satisfy these constraints can be utilized in computations, various engineering applications such as microarrays, and nano-fabrications. Our algorithm is a random generate-and-test algorithm: it generates a candidate sequence randomly and tests whether the sequence satisfies the constraints. The novelty of our algorithm is that the filtering method uses a greedy search to calculate ΔGmin. This effectively excludes inappropriate sequences before ΔGmin is calculated, thereby reducing computation time drastically when compared with an algorithm without the filtering. Experimental results in silico showed the superiority of the greedy search over the traditional approach based on the Hamming distance. In addition, experimental results in vitro demonstrated that the experimental free energy (ΔGexp) of 126 sequences correlated better with ΔGmin (|R| = 0.90) than with the Hamming distance (|R| = 0.80). These results validate the rationality of a thermodynamic approach. We implemented our algorithm in a graphical user interface-based program written in Java. PMID:15701762
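The generate-and-test loop described in this abstract can be sketched as follows. This is only the outer loop under stated assumptions: the constraint check uses GC fraction and Hamming distance as cheap stand-ins, whereas the paper tests melting temperature and ΔGmin with a greedy-search filter, none of which is implemented here. All names, thresholds, and defaults are hypothetical.

```python
import random

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def passes_constraints(candidate, accepted, min_distance, gc_range):
    """Stand-in test: GC fraction in range, and far from accepted sequences."""
    gc = sum(c in "GC" for c in candidate) / len(candidate)
    if not gc_range[0] <= gc <= gc_range[1]:
        return False
    for other in accepted:
        # Reject candidates too similar to an accepted sequence or to its
        # reverse complement (a crude proxy for unwanted hybridization).
        if hamming(candidate, other) < min_distance:
            return False
        if hamming(candidate, reverse_complement(other)) < min_distance:
            return False
    return True


def design_sequences(count, length, min_distance=6, gc_range=(0.4, 0.6),
                     max_trials=100_000, seed=0):
    """Random generate-and-test: propose candidates, keep those that pass."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(max_trials):
        if len(accepted) == count:
            break
        candidate = "".join(rng.choice("ACGT") for _ in range(length))
        if passes_constraints(candidate, accepted, min_distance, gc_range):
            accepted.append(candidate)
    return accepted  # may hold fewer than `count` if max_trials ran out


seqs = design_sequences(count=5, length=20)
```

In a thermodynamic version, `passes_constraints` would be replaced by a cheap filter followed by the expensive free-energy calculation, which is where the abstract reports the time savings.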
Nonuniform sampling theorems for random signals in the linear canonical transform domain

NASA Astrophysics Data System (ADS)

Shuiqing, Xu; Congmei, Jiang; Yi, Chai; Youqiang, Hu; Lei, Huang

2018-06-01

Nonuniform sampling arises in various practical processes because of random events or a poor timebase. The analysis and applications of nonuniform sampling for deterministic signals related to the linear canonical transform (LCT) have been well studied, but so far no papers have been published on nonuniform sampling theorems for random signals related to the LCT. The aim of this article is to explore the nonuniform sampling and reconstruction of random signals associated with the LCT. First, some special nonuniform sampling models are briefly introduced. Second, based on these models, some reconstruction theorems for random signals from various nonuniform samples associated with the LCT are derived. Finally, simulation results are presented to verify the accuracy of the sampling theorems. In addition, potential practical applications of nonuniform sampling for random signals are discussed.
How to infer relative fitness from a sample of genomic sequences.

PubMed

Dayarian, Adel; Shraiman, Boris I

2014-07-01

Mounting evidence suggests that natural populations can harbor extensive fitness diversity with numerous genomic loci under selection. It is also known that genealogical trees for populations under selection are quantifiably different from those expected under neutral evolution and described statistically by Kingman's coalescent. While differences in the statistical structure of genealogies have long been used as a test for the presence of selection, the full extent of the information that they contain has not been exploited. Here we demonstrate that the shape of the reconstructed genealogical tree for a moderately large number of random genomic samples taken from a fitness diverse, but otherwise unstructured, asexual population can be used to predict the relative fitness of individuals within the sample. To achieve this we define a heuristic algorithm, which we test in silico, using simulations of a Wright-Fisher model for a realistic range of mutation rates and selection strength. Our inferred fitness ranking is based on a linear discriminator that identifies rapidly coalescing lineages in the reconstructed tree. Inferred fitness ranking correlates strongly with actual fitness, with a genome in the top 10% ranked being in the top 20% fittest with false discovery rate of 0.1-0.3, depending on the mutation/selection parameters. The ranking also enables us to predict the genotypes that future populations inherit from the present one. While the inference accuracy increases monotonically with sample size, samples of 200 nearly saturate the performance. We propose that our approach can be used for inferring relative fitness of genomes obtained in single-cell sequencing of tumors and in monitoring viral outbreaks. Copyright © 2014 by the Genetics Society of America.
Analysis of Uniform Random Numbers Generated by Randu and Urn Ten Different Seeds.

DTIC Science & Technology

The statistical properties of the numbers generated by two uniform random number generators, RANDU and URN, each using ten different seeds are...The testing is performed on a sequence of 50,000 numbers generated by each uniform random number generator using each of the ten seeds. (Author)
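For readers unfamiliar with RANDU, the sketch below implements its well-known recurrence, x_{k+1} = 65539 · x_k mod 2^31, and applies a simple chi-square uniformity statistic to 50,000 numbers per seed. The ten seed values and the choice of test are assumptions for illustration; the report's actual seeds and test battery are not given in this record, and URN is not implemented because its definition is not stated here.

```python
MODULUS = 2 ** 31


def randu(seed, n):
    """Return n integers from RANDU: x_{k+1} = 65539 * x_k mod 2**31.

    The seed should be odd.
    """
    x = seed
    out = []
    for _ in range(n):
        x = (65539 * x) % MODULUS
        out.append(x)
    return out


def chi_square_uniform(values, bins=10):
    """Chi-square statistic for equal-width bins over [0, 2**31)."""
    counts = [0] * bins
    for v in values:
        counts[v * bins // MODULUS] += 1
    expected = len(values) / bins
    return sum((c - expected) ** 2 / expected for c in counts)


# Hypothetical odd seeds; the report's ten seeds are not listed in the record.
seeds = [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
statistics = {s: chi_square_uniform(randu(s, 50_000)) for s in seeds}

# Known structural defect of RANDU: because 65539 = 2**16 + 3, every term
# satisfies x_{k+2} = 6 * x_{k+1} - 9 * x_k (mod 2**31).
xs = randu(1, 1000)
defect_holds = all(
    (xs[k + 2] - 6 * xs[k + 1] + 9 * xs[k]) % MODULUS == 0
    for k in range(len(xs) - 2)
)
```

A one-dimensional uniformity test like this one can look acceptable even though the three-term relation above confines consecutive triples to a small number of planes, which is why tests on tuples of consecutive values are more revealing for this generator.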
Regulatory sequence analysis tools.

PubMed

van Helden, Jacques

2003-07-01

The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.
The single-species metagenome: subtyping Staphylococcus aureus core genome sequences from shotgun metagenomic data

PubMed Central

Li, Ben; Petit III, Robert A.; Qin, Zhaohui S.; Darrow, Lyndsey

2016-01-01

In this study we developed a genome-based method for detecting Staphylococcus aureus subtypes from metagenome shotgun sequence data. We used a binomial mixture model and the coverage counts at >100,000 known S. aureus SNP (single nucleotide polymorphism) sites derived from prior comparative genomic analysis to estimate the proportion of 40 subtypes in metagenome samples. We were able to obtain >87% sensitivity and >94% specificity at 0.025X coverage for S. aureus. We found that 321 and 149 metagenome samples from the Human Microbiome Project and metaSUB analysis of the New York City subway, respectively, contained S. aureus at genome coverage >0.025. In both projects, CC8 and CC30 were the most common S. aureus clonal complexes encountered. We found evidence that the subtype composition at different body sites of the same individual were more similar than random sampling and more limited evidence that certain body sites were enriched for particular subtypes. One surprising finding was the apparent high frequency of CC398, a lineage often associated with livestock, in samples from the tongue dorsum. Epidemiologic analysis of the HMP subject population suggested that high BMI (body mass index) and health insurance are possibly associated with S. aureus carriage but there was limited power to identify factors linked to carriage of even the most common subtype. In the NYC subway data, we found a small signal of geographic distance affecting subtype clustering but other unknown factors influence taxonomic distribution of the species around the city. PMID:27781166
EXONSAMPLER: a computer program for genome-wide and candidate gene exon sampling for targeted next-generation sequencing.

PubMed

Cosart, Ted; Beja-Pereira, Albano; Luikart, Gordon

2014-11-01

The computer program EXONSAMPLER automates the sampling of thousands of exon sequences from publicly available reference genome sequences and gene annotation databases. It was designed to provide exon sequences for the efficient, next-generation gene sequencing method called exon capture. The exon sequences can be sampled by a list of gene name abbreviations (e.g. IFNG, TLR1), or by sampling exons from genes spaced evenly across chromosomes. It provides a list of genomic coordinates (a bed file), as well as a set of sequences in fasta format. User-adjustable parameters for collecting exon sequences include a minimum and maximum acceptable exon length, maximum number of exonic base pairs (bp) to sample per gene, and maximum total bp for the entire collection. It allows for partial sampling of very large exons. It can preferentially sample upstream (5 prime) exons, downstream (3 prime) exons, both external exons, or all internal exons. It is written in the Python programming language using its free libraries. We describe the use of EXONSAMPLER to collect exon sequences from the domestic cow (Bos taurus) genome for the design of an exon-capture microarray to sequence exons from related species, including the zebu cow and wild bison. We collected ~10% of the exome (~3 million bp), including 155 candidate genes, and ~16,000 exons evenly spaced genomewide. We prioritized the collection of 5 prime exons to facilitate discovery and genotyping of SNPs near upstream gene regulatory DNA sequences, which control gene expression and are often under natural selection. © 2014 John Wiley & Sons Ltd.
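The user-adjustable limits described above (minimum and maximum exon length, maximum bp per gene, maximum total bp) can be illustrated with a minimal sketch. This is not EXONSAMPLER's code: the function name `sample_exons`, the tuple layout, and the greedy first-come order are assumptions made for illustration, and unlike the real program the sketch skips oversized exons instead of sampling them partially.

```python
def sample_exons(exons, min_len, max_len, max_bp_per_gene, max_total_bp):
    """Greedily keep exons that satisfy length and budget limits.

    exons: iterable of (gene, start, end) with BED-style half-open coordinates.
    All names here are hypothetical; this is not the EXONSAMPLER interface.
    """
    picked = []
    bp_per_gene = {}
    total_bp = 0
    for gene, start, end in exons:
        length = end - start
        if not (min_len <= length <= max_len):
            continue  # exon too short or too long
        if bp_per_gene.get(gene, 0) + length > max_bp_per_gene:
            continue  # per-gene bp budget would be exceeded
        if total_bp + length > max_total_bp:
            continue  # collection-wide bp budget would be exceeded
        picked.append((gene, start, end))
        bp_per_gene[gene] = bp_per_gene.get(gene, 0) + length
        total_bp += length
    return picked
```

The returned tuples map directly onto the first three columns of a BED file (with the gene name standing in for a chromosome only in this toy layout).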
LLNL Genomic Assessment: Viral and Bacterial Sequencing Needs for TMTI, Task 1.4.2 Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Slezak, T; Borucki, M; Lam, M

Good progress has been made on both bacterial and viral sequencing by the TMTI centers. While access to appropriate samples is a limiting factor to throughput, excellent progress has been made with respect to getting agreements in place with key sources of relevant materials. Sharing of sequenced genomes funded by TMTI has been extremely limited to date. The April 2010 exercise should force a resolution to this, but additional managerial pressures may be needed to ensure that rapid sharing of TMTI-funded sequencing occurs, regardless of collaborator constraints concerning ultimate publication(s). Policies to permit TMTI-internal rapid sharing of sequenced genomes should be written into all TMTI agreements with collaborators now being negotiated. TMTI needs to establish a Web-based system for tracking samples destined for sequencing. This includes metadata on sample origins and contributor, information on sample shipment/receipt, prioritization by TMTI, assignment to one or more sequencing centers (including possible TMTI-sponsored sequencing at a contributor site), and status history of the sample sequencing effort. While this system could be a component of the AFRL system, it is not part of any current development effort. Policy and standardized procedures are needed to ensure appropriate verification of all TMTI samples prior to the investment in sequencing. PCR, arrays, and classical biochemical tests are examples of potential verification methods. Verification is needed to detect mislabeled, degraded, mixed or contaminated samples. Regular QC exercises are needed to ensure that the TMTI-funded centers are meeting all standards for producing quality genomic sequence data.
Perception of randomness: On the time of streaks.

PubMed

Sun, Yanlong; Wang, Hongbin

2010-12-01

People tend to think that streaks in random sequential events are rare and remarkable. When they actually encounter streaks, they tend to consider the underlying process as non-random. The present paper examines the time of pattern occurrences in sequences of Bernoulli trials, and shows that among all patterns of the same length, a streak is the most delayed pattern for its first occurrence. It is argued that when time is of essence, how often a pattern is to occur (mean time, or, frequency) and when a pattern is to first occur (waiting time) are different questions and bear different psychological relevance. The waiting time statistics may provide a quantitative measure to the psychological distance when people are expecting a probabilistic event, and such measure is consistent with both of the representativeness and availability heuristics in people's perception of randomness. We discuss some of the recent empirical findings and suggest that people's judgment and generation of random sequences may be guided by their actual experiences of the waiting time statistics. Published by Elsevier Inc.
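The distinction between how often a pattern occurs and how long one waits for its first occurrence can be made concrete with a short calculation. The sketch below uses the standard overlap (Conway-style) formula for independent trials: the expected waiting time is the sum of 1/P(prefix) over every prefix of the pattern that is also a suffix. The function name and the H/T encoding are assumptions for illustration, not taken from the paper.

```python
from fractions import Fraction


def expected_waiting_time(pattern, p_heads=Fraction(1, 2)):
    """Expected number of Bernoulli trials until `pattern` first appears.

    pattern: string over {"H", "T"}; p_heads: probability of "H" on each trial.
    """
    prob = {"H": Fraction(p_heads), "T": 1 - Fraction(p_heads)}
    total = Fraction(0)
    for k in range(1, len(pattern) + 1):
        # A prefix of length k that equals the suffix of length k is a
        # self-overlap; each such overlap adds 1 / P(prefix) to the mean wait.
        if pattern[:k] == pattern[-k:]:
            p_prefix = Fraction(1)
            for symbol in pattern[:k]:
                p_prefix *= prob[symbol]
            total += 1 / p_prefix
    return total
```

For a fair coin, every length-3 pattern has the same frequency (1/8), yet the expected waiting time is 14 tosses for HHH, 10 for HTH, and 8 for HHT: the streak overlaps itself at every length, so it is the most delayed.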
Cluster Tails for Critical Power-Law Inhomogeneous Random Graphs

NASA Astrophysics Data System (ADS)

van der Hofstad, Remco; Kliem, Sandra; van Leeuwaarden, Johan S. H.

2018-04-01

Recently, the scaling limit of cluster sizes for critical inhomogeneous random graphs of rank-1 type having finite variance but infinite third moment degrees was obtained in Bhamidi et al. (Ann Probab 40:2299-2361, 2012). It was proved that when the degrees obey a power law with exponent τ ∈ (3,4), the sequence of clusters ordered in decreasing size and multiplied through by n^{-(τ-2)/(τ-1)} converges as n → ∞ to a sequence of decreasing non-degenerate random variables. Here, we study the tails of the limit of the rescaled largest cluster, i.e., the probability that the scaling limit of the largest cluster takes a large value u, as a function of u. This extends a related result of Pittel (J Combin Theory Ser B 82(2):237-269, 2001) for the Erdős-Rényi random graph to the setting of rank-1 inhomogeneous random graphs with infinite third moment degrees. We make use of delicate large deviations and weak convergence arguments.
Wide brick tunnel randomization - an unequal allocation procedure that limits the imbalance in treatment totals.

PubMed

Kuznetsova, Olga M; Tymofyeyev, Yevgen

2014-04-30

In open-label studies, partial predictability of permuted block randomization provides potential for selection bias. To lessen the selection bias in two-arm studies with equal allocation, a number of allocation procedures that limit the imbalance in treatment totals at a pre-specified level but do not require the exact balance at the ends of the blocks were developed. In studies with unequal allocation, however, the task of designing a randomization procedure that sets a pre-specified limit on imbalance in group totals is not resolved. Existing allocation procedures either do not preserve the allocation ratio at every allocation or do not include all allocation sequences that comply with the pre-specified imbalance threshold. Kuznetsova and Tymofyeyev described the brick tunnel randomization for studies with unequal allocation that preserves the allocation ratio at every step and, in the two-arm case, includes all sequences that satisfy the smallest possible imbalance threshold. This article introduces wide brick tunnel randomization for studies with unequal allocation that allows all allocation sequences with imbalance not exceeding any pre-specified threshold while preserving the allocation ratio at every step. In open-label studies, allowing a larger imbalance in treatment totals lowers selection bias because of the predictability of treatment assignments. The applications of the technique in two-arm and multi-arm open-label studies with unequal allocation are described. Copyright © 2013 John Wiley & Sons, Ltd.
Estimating Genomic Distance from DNA Sequence Location in Cell Nuclei by a Random Walk Model

NASA Astrophysics Data System (ADS)

van den Engh, Ger; Sachs, Rainer; Trask, Barbara J.

1992-09-01

The folding of chromatin in interphase cell nuclei was studied by fluorescent in situ hybridization with pairs of unique DNA sequence probes. The sites of DNA sequences separated by 100 to 2000 kilobase pairs (kbp) are distributed in interphase chromatin according to a random walk model. This model provides the basis for calculating the spacing of sequences along the linear DNA molecule from interphase distance measurements. An interphase mapping strategy based on this model was tested with 13 probes from a 4-megabase pair (Mbp) region of chromosome 4 containing the Huntington disease locus. The results confirmed the locations of the probes and showed that the remaining gap in the published maps of this region is negligible in size. Interphase distance measurements should facilitate construction of chromosome maps with an average marker density of one per 100 kbp, approximately ten times greater than that achieved by hybridization to metaphase chromosomes.
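In an ideal random walk the mean squared end-to-end distance grows linearly with contour length, which is what allows interphase distances to be converted into genomic spacing. A minimal sketch of that conversion follows; the function names, the through-the-origin fit, and the units (kbp and square micrometres) are assumptions for illustration, and the calibration numbers in the usage note are invented, not data from the paper.

```python
def fit_slope(separations_kbp, mean_sq_distances):
    """Least-squares slope through the origin for <r^2> = slope * separation."""
    numerator = sum(x * y for x, y in zip(separations_kbp, mean_sq_distances))
    denominator = sum(x * x for x in separations_kbp)
    return numerator / denominator


def estimate_separation(mean_sq_distance, slope):
    """Invert the random-walk relation to estimate genomic separation in kbp."""
    return mean_sq_distance / slope
```

With hypothetical calibration pairs such as 100, 500 and 1000 kbp measured at 0.2, 1.0 and 2.0 square micrometres, `fit_slope` gives the proportionality constant, and `estimate_separation` applies it to a probe pair of unknown spacing.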
Artificial neural network study on organ-targeting peptides

NASA Astrophysics Data System (ADS)

Jung, Eunkyoung; Kim, Junhyoung; Choi, Seung-Hoon; Kim, Minkyoung; Rhee, Hokyoung; Shin, Jae-Min; Choi, Kihang; Kang, Sang-Kee; Lee, Nam Kyung; Choi, Yun-Jaie; Jung, Dong Hyun

2010-01-01

We report a new approach to studying organ targeting of peptides on the basis of peptide sequence information. The positive control data sets consist of organ-targeting peptide sequences identified by the peroral phage-display technique for four organs, and the negative control data are prepared from random sequences. The capacity of our models to make appropriate predictions is validated by statistical indicators including sensitivity, specificity, enrichment curve, and the area under the receiver operating characteristic (ROC) curve (the ROC score). VHSE descriptor produces statistically significant training models and the models with simple neural network architectures show slightly greater predictive power than those with complex ones. The training and test set statistics indicate that our models could discriminate between organ-targeting and random sequences. We anticipate that our models will be applicable to the selection of organ-targeting peptides for generating peptide drugs or peptidomimetics.
Genetic discovery in Xylella fastidiosa through sequence analysis of selected randomly amplified polymorphic DNAs.

PubMed

Chen, Jianchi; Civerolo, Edwin L; Jarret, Robert L; Van Sluys, Marie-Anne; de Oliveira, Mariana C

2005-02-01

Xylella fastidiosa causes many important plant diseases including Pierce's disease (PD) in grape and almond leaf scorch disease (ALSD). DNA-based methodologies, such as randomly amplified polymorphic DNA (RAPD) analysis, have been playing key roles in genetic information collection of the bacterium. This study further analyzed the nucleotide sequences of selected RAPDs from X. fastidiosa strains in conjunction with the available genome sequence databases and unveiled several previously unknown novel genetic traits. These include a sequence highly similar to those in the phage family of Podoviridae. Genome comparisons among X. fastidiosa strains suggested that the "phage" is currently active. Two other RAPDs were also related to horizontal gene transfer: one was part of a broadly distributed cryptic plasmid and the other was associated with conjugal transfer. One RAPD inferred a genomic rearrangement event among X. fastidiosa PD strains and another identified a single nucleotide polymorphism of evolutionary value.

The Airborne Metagenome in an Indoor Urban Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tringe, Susannah; Zhang, Tao; Liu, Xuguo

2008-02-12

The indoor atmosphere is an ecological unit that impacts on public health. To investigate the composition of organisms in this space, we applied culture-independent approaches to microbes harvested from the air of two densely populated urban buildings, from which we analyzed 80 megabases genomic DNA sequence and 6000 16S rDNA clones. The air microbiota is primarily bacteria, including potential opportunistic pathogens commonly isolated from human-inhabited environments such as hospitals, but none of the data contain matches to virulent pathogens or bioterror agents. Comparison of air samples with each other and nearby environments suggested that the indoor air microbes are not random transients from surrounding outdoor environments, but rather originate from indoor niches. Sequence annotation by gene function revealed specific adaptive capabilities enriched in the air environment, including genes potentially involved in resistance to desiccation and oxidative damage. This baseline index of air microbiota will be valuable for improving designs of surveillance for natural or man-made release of virulent pathogens.
Comprehensive analysis of an Antarctic bacterial community with the adaptability of growth at higher temperatures than those in Antarctica.

PubMed

Hosoi-Tanabe, Shoko; Zhang, Hongyan; Zhu, Daochen; Nagata, Shinichi; Ban, Syuhei; Imura, Satoshi

2010-06-01

To investigate the adaptability to higher temperatures of Antarctic microorganisms that have persisted in low-temperature conditions for a long time, Antarctic lake samples were incubated in several selection media at 25 degrees C and 30 degrees C. The microorganisms did not grow at 30 degrees C; however, some of them grew at 25 degrees C, indicating that the bacteria in Antarctica have the ability to grow at a wide range of temperatures. Total DNA was extracted from these microorganisms and amplified using the bacteria-universal primers. The amplified fragments were cloned, and 48 randomly selected clones were sequenced. The sequenced clones showed high similarity to the alpha-subdivision of the Proteobacteria with specific affinity to the genera Agrobacterium, Caulobacter and Brevundimonas, the beta-subdivision of Proteobacteria with specific affinity to the genus Cupriavidus, and Bacillus of the phylum Firmicutes. These results showed the presence of universal genera, suggesting that the bacteria in the Antarctic lake were not specific to this environment.
The genealogy of sequences containing multiple sites subject to strong selection in a subdivided population.

PubMed Central

Nordborg, Magnus; Innan, Hideki

2003-01-01

A stochastic model for the genealogy of a sample of recombining sequences containing one or more sites subject to selection in a subdivided population is described. Selection is incorporated by dividing the population into allelic classes and then conditioning on the past sizes of these classes. The past allele frequencies at the selected sites are thus treated as parameters rather than as random variables. The purpose of the model is not to investigate the dynamics of selection, but to investigate effects of linkage to the selected sites on the genealogy of the surrounding chromosomal region. This approach is useful for modeling strong selection, when it is natural to parameterize the past allele frequencies at the selected sites. Several models of strong balancing selection are used as examples, and the effects on the pattern of neutral polymorphism in the chromosomal region are discussed. We focus in particular on the statistical power to detect balancing selection when it is present. PMID:12663556
The genealogy of sequences containing multiple sites subject to strong selection in a subdivided population.

PubMed

Nordborg, Magnus; Innan, Hideki

2003-03-01

A stochastic model for the genealogy of a sample of recombining sequences containing one or more sites subject to selection in a subdivided population is described. Selection is incorporated by dividing the population into allelic classes and then conditioning on the past sizes of these classes. The past allele frequencies at the selected sites are thus treated as parameters rather than as random variables. The purpose of the model is not to investigate the dynamics of selection, but to investigate effects of linkage to the selected sites on the genealogy of the surrounding chromosomal region. This approach is useful for modeling strong selection, when it is natural to parameterize the past allele frequencies at the selected sites. Several models of strong balancing selection are used as examples, and the effects on the pattern of neutral polymorphism in the chromosomal region are discussed. We focus in particular on the statistical power to detect balancing selection when it is present.
A discovery of novel microRNAs in the silkworm (Bombyx mori) genome.

PubMed

Yu, Xiaomin; Zhou, Qing; Cai, Yimei; Luo, Qibin; Lin, Hongbin; Hu, Songnian; Yu, Jun

2009-12-01

MicroRNAs (miRNAs) are pivotal regulators involved in various physiological and pathological processes via their post-transcriptional regulation of gene expressions. We sequenced 14 libraries of small RNAs constructed from samples spanning the life cycle of silkworms, and discovered 50 novel miRNAs previously not known in animals and verified 43 of them using stem-loop RT-PCR. Our genome-wide analyses of 27 species-specific miRNAs suggest they arise from transposable elements, protein-coding genes duplication/transposition and random foldback sequences; which is consistent with the idea that novel animal miRNAs may evolve from incomplete self-complementary transcripts and become fixed in the process of co-adaptation with their targets. Computational prediction suggests that the silkworm-specific miRNAs may have a preference of regulating genes that are related to life-cycle-associated traits, and these genes can serve as potential targets for subsequent studies of the modulating networks in the development of Bombyx mori.
The Airborne Metagenome in an Indoor Urban Environment

PubMed Central

Liu, Xuguo; Yu, Yiting; Lee, Wah Heng; Yap, Jennifer; Yao, Fei; Suan, Sim Tiow; Ing, Seah Keng; Haynes, Matthew; Rohwer, Forest; Wei, Chia Lin; Tan, Patrick; Bristow, James; Rubin, Edward M.; Ruan, Yijun

2008-01-01

The indoor atmosphere is an ecological unit that impacts on public health. To investigate the composition of organisms in this space, we applied culture-independent approaches to microbes harvested from the air of two densely populated urban buildings, from which we analyzed 80 megabases genomic DNA sequence and 6000 16S rDNA clones. The air microbiota is primarily bacteria, including potential opportunistic pathogens commonly isolated from human-inhabited environments such as hospitals, but none of the data contain matches to virulent pathogens or bioterror agents. Comparison of air samples with each other and nearby environments suggested that the indoor air microbes are not random transients from surrounding outdoor environments, but rather originate from indoor niches. Sequence annotation by gene function revealed specific adaptive capabilities enriched in the air environment, including genes potentially involved in resistance to desiccation and oxidative damage. This baseline index of air microbiota will be valuable for improving designs of surveillance for natural or man-made release of virulent pathogens. PMID:18382653
Using random forests for assistance in the curation of G-protein coupled receptor databases.

PubMed

Shkurin, Aleksei; Vellido, Alfredo

2017-08-18

Biology is experiencing a gradual but fast transformation from a laboratory-centred science towards a data-centred one. As such, it requires robust data engineering and the use of quantitative data analysis methods as part of database curation. This paper focuses on G protein-coupled receptors, a large and heterogeneous super-family of cell membrane proteins of interest to biology in general. One of its families, Class C, is of particular interest to pharmacology and drug design. This family is quite heterogeneous on its own, and the discrimination of its several sub-families is a challenging problem. In the absence of known crystal structure, such discrimination must rely on their primary amino acid sequences. We are interested not as much in achieving maximum sub-family discrimination accuracy using quantitative methods, but in exploring sequence misclassification behavior. Specifically, we are interested in isolating those sequences showing consistent misclassification, that is, sequences that are very often misclassified and almost always to the same wrong sub-family. Random forests are used for this analysis due to their ensemble nature, which makes them naturally suited to gauge the consistency of misclassification. This consistency is here defined through the voting scheme of their base tree classifiers. Detailed consistency results for the random forest ensemble classification were obtained for all receptors and for all data transformations of their unaligned primary sequences. Shortlists of the most consistently misclassified receptors for each subfamily and transformation, as well as an overall shortlist including those cases that were consistently misclassified across transformations, were obtained. The latter should be referred to experts for further investigation as a data curation task. The automatic discrimination of the Class C sub-families of G protein-coupled receptors from their unaligned primary sequences shows clear limits. 
This study has investigated in some detail the consistency of their misclassification using random forest ensemble classifiers. Different sub-families have been shown to display very different discrimination consistency behaviors. The individual identification of consistently misclassified sequences should provide a tool for quality control to GPCR database curators.
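The notion of consistent misclassification defined through the votes of the base trees can be sketched without any particular library. The function below is a hypothetical illustration, not the authors' procedure: given the label each tree assigns to one sequence, it reports the most-voted wrong sub-family and the share of all trees that chose it. The labels "A", "B", "C" are placeholders.

```python
from collections import Counter


def misclassification_consistency(tree_votes, true_label):
    """Return (most-voted wrong label, fraction of all trees voting for it).

    tree_votes: list with one predicted label per base tree.
    Returns (None, 0.0) when every tree votes for the true label.
    """
    wrong_votes = Counter(v for v in tree_votes if v != true_label)
    if not wrong_votes:
        return None, 0.0
    label, count = wrong_votes.most_common(1)[0]
    return label, count / len(tree_votes)
```

A sequence whose fraction stays high and whose wrong label stays the same across data transformations would be a candidate for the kind of shortlist the abstract describes.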
Random trinomial tree models and vanilla options

NASA Astrophysics Data System (ADS)

Ganikhodjaev, Nasir; Bayram, Kamola

2013-09-01

In this paper we introduce and study the random trinomial model. The usual trinomial model is prescribed by a triple of numbers (u, d, m). We call the triple (u, d, m) an environment of the trinomial model. A triple (Un, Dn, Mn), where {Un}, {Dn} and {Mn} are sequences of independent, identically distributed random variables with 0 < Dn < 1 < Un and Mn = 1 for all n, is called a random environment, and a trinomial tree model with a random environment is called a random trinomial model. The random trinomial model is considered to produce more accurate results than the random binomial model or the usual trinomial model.
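A price path under a random environment can be simulated in a few lines. The sketch below is only an illustration of the definition: the uniform ranges for Un and Dn and the move probabilities `p_up` and `p_down` are assumptions chosen here, since the abstract does not specify distributions or probabilities.

```python
import random


def simulate_path(s0, n_steps, rng, p_up=1 / 3, p_down=1 / 3):
    """Simulate one price path of a trinomial tree with a random environment."""
    price = s0
    path = [price]
    for _ in range(n_steps):
        # Draw this step's environment (U_n, D_n, M_n) with 0 < D_n < 1 < U_n, M_n = 1.
        u_n = rng.uniform(1.05, 1.5)   # assumed range for the up factor
        d_n = rng.uniform(0.6, 0.95)   # assumed range for the down factor
        m_n = 1.0
        # Pick which branch of the tree the price follows at this step.
        r = rng.random()
        if r < p_up:
            factor = u_n
        elif r < p_up + p_down:
            factor = d_n
        else:
            factor = m_n
        price *= factor
        path.append(price)
    return path
```

Passing a seeded generator, for example `simulate_path(100.0, 10, random.Random(0))`, makes a run reproducible.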
The Use of a Sequenced Questioning Paradigm to Facilitate Associative Fluency in Preschoolers.

ERIC Educational Resources Information Center

Pellegrini, A. D.; Greene, Helen

The extent to which free play versus sequenced questioning conditions facilitates preschoolers' associative fluency was investigated in this study. Twenty-four children (12 boys and 12 girls, with a mean age of 50.7 months) were randomly assigned to one of three conditions: free play, sequenced questioning, and control. In the sequenced…
Assessing the Impact of Sequencing Practicums for Welding in Agricultural Mechanics

ERIC Educational Resources Information Center

Rose, Malcolm; Pate, Michael L.; Lawver, Rebecca G.; Warnick, Brian K.; Dai, Xin

2015-01-01

This study examined the impact of sequencing practicums for welding on students' ability to perform a 1F (flat position-fillet lap joint) weld on low-carbon steel. Participants were randomly assigned a specific practice sequence of welding for using gas metal arc welding (GMAW) and shielded metal arc welding (SMAW). A total of 71 participants…
Single molecule targeted sequencing for cancer gene mutation detection.

PubMed

Gao, Yan; Deng, Liwei; Yan, Qin; Gao, Yongqian; Wu, Zengding; Cai, Jinsen; Ji, Daorui; Li, Gailing; Wu, Ping; Jin, Huan; Zhao, Luyang; Liu, Song; Ge, Liangjin; Deem, Michael W; He, Jiankui

2016-05-19

With the rapid decline in the cost of sequencing, it is now affordable to examine multiple genes in a single disease-targeted clinical test using next-generation sequencing. Current targeted sequencing methods require a separate step of targeted capture enrichment during sample preparation before sequencing. Although fast sample preparation methods are available on the market, the library preparation process is still relatively complicated for physicians to use routinely. Here, we introduced an amplification-free Single Molecule Targeted Sequencing (SMTS) technology, which combined targeted capture and sequencing in one step. We demonstrated that this technology can detect low-frequency mutations using an artificially synthesized DNA sample. SMTS has several potential advantages, including simple sample preparation, so that no biases or errors are introduced by PCR. SMTS has the potential to be an easy and quick sequencing technology for clinical diagnosis such as cancer gene mutation detection, infectious disease detection, inherited condition screening and noninvasive prenatal diagnosis.
Analysis of sequences from field samples reveals the presence of the recently described pepper vein yellows virus (genus Polerovirus) in six additional countries.

PubMed

Knierim, Dennis; Tsai, Wen-Shi; Kenyon, Lawrence

2013-06-01

Polerovirus infection was detected by reverse transcription polymerase chain reaction (RT-PCR) in 29 pepper plants (Capsicum spp.) and one black nightshade plant (Solanum nigrum) sample collected from fields in India, Indonesia, Mali, Philippines, Thailand and Taiwan. At least two representative samples for each country were selected to generate a general polerovirus RT-PCR product of 1.4 kb length for sequencing. Sequence analysis of the partial genome sequences revealed the presence of pepper vein yellows virus (PeVYV) in all 13 samples. A 1990 Australian herbarium sample of pepper described by serological means as infected with capsicum yellows virus (CYV) was identified by sequence analysis of a partial CP sequence as probably infected with a potato leaf roll virus (PLRV) isolate.
Differentially Private Frequent Sequence Mining via Sampling-based Candidate Pruning

PubMed Central

Xu, Shengzhi; Cheng, Xiang; Li, Zhengyi; Xiong, Li

2016-01-01

In this paper, we study the problem of mining frequent sequences under the rigorous differential privacy model. We explore the possibility of designing a differentially private frequent sequence mining (FSM) algorithm which can achieve both high data utility and a high degree of privacy. We found, in differentially private FSM, the amount of required noise is proportionate to the number of candidate sequences. If we could effectively reduce the number of unpromising candidate sequences, the utility and privacy tradeoff can be significantly improved. To this end, by leveraging a sampling-based candidate pruning technique, we propose a novel differentially private FSM algorithm, which is referred to as PFS2. The core of our algorithm is to utilize sample databases to further prune the candidate sequences generated based on the downward closure property. In particular, we use the noisy local support of candidate sequences in the sample databases to estimate which sequences are potentially frequent. To improve the accuracy of such private estimations, a sequence shrinking method is proposed to enforce the length constraint on the sample databases. Moreover, to decrease the probability of misestimating frequent sequences as infrequent, a threshold relaxation method is proposed to relax the user-specified threshold for the sample databases. Through formal privacy analysis, we show that our PFS2 algorithm is ε-differentially private. Extensive experiments on real datasets illustrate that our PFS2 algorithm can privately find frequent sequences with high accuracy. PMID:26973430
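Why the required noise grows with the number of candidate sequences can be seen from a plain Laplace mechanism. The sketch below is not the PFS2 algorithm: it assumes each input sequence changes every candidate's support by at most 1, so releasing k counts under a total budget ε calls for Laplace noise of scale k/ε on each count. Function names are hypothetical.

```python
import random


def noise_scale(n_candidates, epsilon):
    """Laplace scale when k counts of sensitivity 1 share a budget epsilon."""
    return n_candidates / epsilon


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def noisy_supports(supports, epsilon, rng):
    """Add Laplace noise to every candidate's support count."""
    scale = noise_scale(len(supports), epsilon)
    return {seq: count + laplace_noise(scale, rng) for seq, count in supports.items()}
```

Pruning unpromising candidates before this step shrinks k and therefore the noise on every remaining count, which is the utility gain the abstract points to.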
First isolation of Leptospira noguchii serogroups Panama and Autumnalis from cattle.

PubMed

Martins, G; Loureiro, A P; Hamond, C; Pinna, M H; Bremont, S; Bourhy, P; Lilenbaum, W

2015-05-01

Prevention and control of leptospirosis are based on the knowledge of locally circulating strains. Thus, efforts to obtain local isolates are paramount to the epidemiological understanding of leptospirosis. We report and discuss here the first isolation of members of serogroups Autumnalis and Panama from cattle, both belonging to Leptospira noguchii species. Urine samples (n = 167) were collected directly by puncture of the bladder from randomly selected cows from a slaughterhouse in Rio de Janeiro, Brazil, for bacteriological culture. Isolates were characterized by serogrouping and sequencing (rrs and secY genes). Overall, 10/167 positive urine samples (6%) were obtained. Sequencing of amplicons targeting for both rrs and secY genes identified two of them (2013_U73 and 2013_U232) as L. noguchii. Serogrouping of those strains indicated that 2013_U73 belonged to the Panama serogroup (titre 1600), and 2013_U232 to the Autumnalis serogroup (titre 12800). Both Panama and Autumnalis are known agents of incidental leptospirosis in cattle. This group of leptospires could be particularly important in tropical countries. This is the first report of members of serogroups Autumnalis and Panama belonging to L. noguchii species from cattle. Although related to previously reported strains, these isolates have been shown to be genetically diverse from them.
Identification and tracing of Enterococcus spp. by RAPD-PCR in traditional fermented sausages and meat environment.

PubMed

Martín, B; Corominas, L; Garriga, M; Aymerich, T

2009-01-01

Four local small-scale factories were studied to determine the sources of enterococci in traditional fermented sausages. Different points during the production of a traditional fermented sausage type (fuet) were evaluated. Randomly amplified polymorphic DNA (RAPD)-PCR was used to type 596 Enterococcus isolates from the final products, the initial meat batter, the casing, the workers' hands and the equipment. Species-specific PCR-multiplex and the partial sequencing of atpA gene and 16S rRNA gene sequencing allowed the identification of the isolates: Enterococcus faecalis (31.4%), Enterococcus faecium (30.7%), Enterococcus sanguinicola (14.9%), Enterococcus devriesei (9.7%), Enterococcus malodoratus (7.2%), Enterococcus gilvus (1.0%), Enterococcus gallinarum (1.3%), Enterococcus casseliflavus (3.4%), Enterococcus hermanniensis (0.2%), and Enterococcus durans (0.2%). A total of 92 different RAPD-PCR profiles were distributed among the different factories and samples evaluated. Most of the genotypes found in fuet samples were traced back to their source. The major sources of enterococci in the traditional fermented sausages studied were mainly the equipment followed by the raw ingredients, although a low proportion was traced back to human origin. This work contributes to determine the source of enterococcal contamination in fermented sausages and also to the knowledge of the meat environment.
Diversity, Physiochemical and Phylogenetic Analyses of Bacteria Isolated from Various Drinking Water Sources.

PubMed

Eid, Neveen H; Al Doghaither, Huda A; Kumosani, Taha A; Gull, Munazza

2017-01-01

This study evaluated the indigenous bacterial strains in drinking water from the most common commercial water types, including bottled and filtered water, currently used in Saudi Arabia. Thirty randomly selected commercial brands of bottled water were purchased from Saudi local markets. Moreover, samples from tap water and filtered water were collected in sterilized glass bottles and stored at 4°C. Biochemical analyses including pH, temperature, lactose fermentation test (LAC), indole test (IND), methyl red test (MR), Voges-Proskauer test (VP), urease test (URE), catalase test (CAT), and aerobic and anaerobic test (Ae/An) were performed. Molecular identification and comparative sequence analyses were done by full length 16S rRNA gene sequences using gene bank databases, and phylogenetic trees were constructed to assess the relatedness of the bacterial strains. Among 30 water samples tested, 18 were found positive for bacterial growth. Molecular identification of four selected bacterial strains indicated the alarming presence of pathogenic bacteria Bacillus spp. in the most common commercial types of drinking water used in Saudi Arabia. The lack of awareness about good sanitation, poor personal hygienic practices and failure of safe water management and supply are important factors in the poor drinking water quality of these sources and need to be addressed.
A novel, privacy-preserving cryptographic approach for sharing sequencing data

PubMed Central

Cassa, Christopher A; Miller, Rachel A; Mandl, Kenneth D

2013-01-01

Objective: DNA samples are often processed and sequenced in facilities external to the point of collection. These samples are routinely labeled with patient identifiers or pseudonyms, allowing for potential linkage to identity and private clinical information if intercepted during transmission. We present a cryptographic scheme to securely transmit externally generated sequence data which does not require any patient identifiers, public key infrastructure, or the transmission of passwords. Materials and methods: This novel encryption scheme cryptographically protects participant sequence data using a shared secret key that is derived from a unique subset of an individual’s genetic sequence. This scheme requires access to a subset of an individual’s genetic sequence to acquire full access to the transmitted sequence data, which helps to prevent sample mismatch. Results: We validate that the proposed encryption scheme is robust to sequencing errors, population uniqueness, and sibling disambiguation, and provides sufficient cryptographic key space. Discussion: Access to a set of an individual’s genotypes and a mutually agreed cryptographic seed is needed to unlock the full sequence, which provides additional sample authentication and authorization security. We present modest fixed and marginal costs to implement this transmission architecture. Conclusions: It is possible for genomics researchers who sequence participant samples externally to protect the transmission of sequence data using unique features of an individual’s genetic sequence. PMID:23125421
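The idea of deriving a shared secret from a subset of genotypes plus a mutually agreed seed can be illustrated with a minimal Python sketch. This is not the authors' construction, which the abstract does not specify. The function name `derive_key`, the marker identifiers, the canonical string encoding, and the choice of PBKDF2-HMAC-SHA256 are all assumptions made for illustration, and the sketch does not attempt the tolerance to sequencing errors that the abstract reports.

```python
import hashlib


def derive_key(genotypes, seed, length=32):
    """Derive a symmetric key from a genotype subset and an agreed seed.

    genotypes: dict mapping a (hypothetical) marker ID to a genotype call.
    seed: bytes agreed on in advance by both parties; used here as the salt.
    """
    # Sort by marker ID so both parties hash exactly the same byte string,
    # regardless of the order in which the genotypes were recorded.
    canonical = ";".join(
        f"{marker}={call}" for marker, call in sorted(genotypes.items())
    )
    # Assumption: PBKDF2-HMAC-SHA256 stands in for whatever key-derivation
    # step the published scheme actually uses.
    return hashlib.pbkdf2_hmac(
        "sha256", canonical.encode("utf-8"), seed, 100_000, dklen=length
    )


# The collection site and the sequencing facility each hold the same
# genotype subset and seed, so they arrive at the same key independently.
site_key = derive_key({"rs1": "AG", "rs2": "CC"}, b"agreed-seed")
lab_key = derive_key({"rs2": "CC", "rs1": "AG"}, b"agreed-seed")
```

In this sketch no identifier or password travels with the data: a party that lacks the genotype subset or the seed cannot reproduce the key, and a sample whose genotypes differ yields a different key, which is one way a mismatch would surface.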
Antibody performance in ChIP-sequencing assays: From quality scores of public data sets to quantitative certification.

PubMed

Mendoza-Parra, Marco-Antonio; Saravaki, Vincent; Cholley, Pierre-Etienne; Blum, Matthias; Billoré, Benjamin; Gronemeyer, Hinrich

2016-01-01

We have established a certification system for antibodies to be used in chromatin immunoprecipitation assays coupled to massive parallel sequencing (ChIP-seq). This certification comprises a standardized ChIP procedure and the attribution of a numerical quality control indicator (QCi) to biological replicate experiments. The QCi computation is based on a universally applicable quality assessment that quantitates the global deviation of randomly sampled subsets of a ChIP-seq dataset from the original genome-aligned sequence reads. Comparison with a QCi database for >28,000 ChIP-seq assays was used to attribute quality grades (ranging from 'AAA' to 'DDD') to a given dataset. In the present report we used the numerical QC system to assess the factors influencing the quality of ChIP-seq assays, including the nature of the target, the sequencing depth and the commercial source of the antibody. We have used this approach specifically to certify mono- and polyclonal antibodies obtained from Active Motif directed against the histone modification marks H3K4me3, H3K27ac and H3K9ac for ChIP-seq. The antibodies received the grades AAA to BBC (www.ngs-qc.org). We propose that such quantitative grading be applied to all antibodies carrying the label "ChIP-seq grade".
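The subsampling idea behind such a quality indicator can be sketched in Python. The abstract does not give the QCi formula, so everything below is an assumption for illustration: reads are reduced to integer start positions, the genome is cut into fixed-size bins, and the score is the fraction of bins whose subsampled count deviates from its expected value by more than a tolerance. The names `bin_counts` and `subsample_deviation` and all default parameters are hypothetical.

```python
import random
from collections import Counter


def bin_counts(positions, bin_size):
    """Count reads per fixed-size genomic bin."""
    return Counter(p // bin_size for p in positions)


def subsample_deviation(positions, fraction, bin_size=500,
                        tolerance=0.10, seed=0):
    """Fraction of occupied bins that deviate after random subsampling.

    positions: non-empty list of read start positions.
    A bin 'deviates' when its subsampled count differs from
    fraction * original count by more than tolerance * expected.
    """
    rng = random.Random(seed)
    k = round(len(positions) * fraction)
    sample = rng.sample(positions, k)  # sampling without replacement

    full = bin_counts(positions, bin_size)
    sub = bin_counts(sample, bin_size)

    deviating = 0
    for b, n in full.items():
        expected = n * fraction
        if abs(sub.get(b, 0) - expected) > tolerance * expected:
            deviating += 1
    return deviating / len(full)
```

Under these assumptions, a profile with well-covered bins keeps its shape when reads are discarded and scores close to 0, while a sparse, noisy profile scores closer to 1. Repeating the calculation at several sampling fractions would give one number per fraction to compare across datasets.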
Transcriptome sequencing and differential gene expression analysis in Viola yedoensis Makino (Fam. Violaceae) responsive to cadmium (Cd) pollution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gao, Jian; Luo, Mao; Zhu, Ye

2015-03-27

Viola yedoensis Makino is an important Chinese traditional medicine plant adapted to cadmium (Cd) pollution regions. Illumina sequencing technology was used to sequence the transcriptome of V. yedoensis Makino. We sequenced Cd-treated (VIYCd) and untreated (VIYCK) samples of V. yedoensis, and obtained 100,410,834 and 83,587,676 high quality reads, respectively. After de novo assembly and quantitative assessment, 109,800 unigenes were finally generated with an average length of 661 bp. We then obtained functional annotations by aligning unigenes with public protein databases including NR, NT, SwissProt, KEGG and COG. In addition, 892 differentially expressed genes (DEGs) were identified between the two libraries of untreated (VIYCK) and Cd-treated (VIYCd) plants. Moreover, 15 randomly selected DEGs were further validated with qRT-PCR and the results were highly consistent with the Solexa analysis. This study is the first to generate a global analysis of the V. yedoensis transcriptome, and it will support further studies on gene expression, genomics, and functional genomics in Violaceae. - Highlights: • A de novo assembly generated 109,800 unigenes and 54,479 of them were annotated. • 31,285 could be classified into 26 COG categories. • 263 biosynthesis pathways were predicted and classified into five categories. • 892 DEGs were detected and 15 of them were validated by qRT-PCR.
Concatenated shift registers generating maximally spaced phase shifts of PN-sequences

NASA Technical Reports Server (NTRS)

Hurd, W. J.; Welch, L. R.

1977-01-01

A large class of linearly concatenated shift registers is shown to generate approximately maximally spaced phase shifts of pn-sequences, for use in pseudorandom number generation. A constructive method is presented for finding members of this class, for almost all degrees for which primitive trinomials exist. The sequences which result are not normally characterized by trinomial recursions, which is desirable since trinomial sequences can have some undesirable randomness properties.
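For readers unfamiliar with PN-sequences, the following Python sketch generates one from a single linear feedback shift register and shows what a phase shift of it looks like. It does not implement the concatenated-register construction of the report, which the abstract does not detail. The function name `pn_sequence`, the degree-5 register, and the primitive trinomial x^5 + x^2 + 1 (recurrence s[n+5] = s[n] XOR s[n+2]) are choices made here for illustration.

```python
def pn_sequence(seed_bits, taps, length):
    """Generate bits from a linear feedback shift register.

    seed_bits: initial register contents (not all zero).
    taps: register indices XORed together to form each new bit, so that
          s[n + r] = XOR of s[n + t] for t in taps, with r = len(seed_bits).
    """
    state = list(seed_bits)
    out = []
    for _ in range(length):
        out.append(state[0])          # emit the oldest bit
        new_bit = 0
        for t in taps:
            new_bit ^= state[t]
        state = state[1:] + [new_bit]  # shift and append the feedback bit
    return out


# Trinomial x^5 + x^2 + 1: s[n+5] = s[n] XOR s[n+2], period 2**5 - 1 = 31.
TAPS = (0, 2)
base = pn_sequence([1, 0, 0, 0, 0], TAPS, 62)

# A second register loaded with the base register's state at step 15
# produces the same sequence shifted by 15 positions.
shifted = pn_sequence(base[15:20], TAPS, 31)
```

A shift of 15 or 16 is the largest separation two copies of a period-31 sequence can have, which is the sense of "maximally spaced" for two phases. The report's contribution is a register structure that produces such well-separated phases directly, whereas this sketch obtains the shift simply by choosing the starting state.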
