Sample records for system call sequences

  1. A Machine Learning Method for Power Prediction on the Mobile Devices.

    PubMed

    Chen, Da-Ren; Chen, You-Shyang; Chen, Lin-Chih; Hsu, Ming-Yang; Chiang, Kai-Feng

    2015-10-01

    Energy profiling and estimation have been popular areas of research in multicore mobile architectures. While short sequences of system calls have been recognized by machine learning as pattern descriptions for anomaly detection, the power consumption of running processes with respect to system-call patterns is not well studied. In this paper, we propose a fuzzy neural network (FNN) for training and analyzing process execution behaviour with respect to series of system calls, their parameters, and their power consumption. On the basis of the patterns of a series of system calls, we develop a power estimation daemon (PED) to analyze and predict the energy consumption of the running process. In the initial stage, PED categorizes sequences of system calls as functional groups and predicts their energy consumption with the FNN. In the operational stage, PED is applied to identify the predefined sequences of system calls invoked by running processes and estimates their energy consumption.
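
    The idea of matching short system-call sequences against known patterns can be illustrated with a minimal sketch. It assumes a fixed sliding window and a plain lookup table of per-pattern energy costs; the paper uses a fuzzy neural network instead, and the function names, window size, and energy values below are hypothetical.

```python
from collections import Counter

def syscall_ngrams(trace, n=3):
    """Return every length-n window of consecutive system calls in the trace."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]

def estimate_energy(trace, energy_per_pattern, n=3, default=0.0):
    """Sum per-pattern energy costs over all windows found in the trace."""
    counts = Counter(syscall_ngrams(trace, n))
    return sum(energy_per_pattern.get(p, default) * c for p, c in counts.items())

# Illustrative trace of system-call names.
trace = ["open", "read", "read", "write", "close"]

# Hypothetical energy table (arbitrary units per pattern); not taken from the paper.
table = {
    ("open", "read", "read"): 2.0,
    ("read", "read", "write"): 3.0,
}

windows = syscall_ngrams(trace)
total = estimate_energy(trace, table)
print(windows)
print(total)
```

    Windows that are absent from the table contribute the default cost, so unknown patterns do not abort the estimate.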

  2. ParticleCall: A particle filter for base calling in next-generation sequencing systems

    PubMed Central

    2012-01-01

    Background: Next-generation sequencing systems are capable of rapid and cost-effective DNA sequencing, thus enabling routine sequencing tasks and taking us one step closer to personalized medicine. Accuracy and lengths of their reads, however, are yet to surpass those provided by the conventional Sanger sequencing method. This motivates the search for computationally efficient algorithms capable of reliable and accurate detection of the order of nucleotides in short DNA fragments from the acquired data. Results: In this paper, we consider Illumina’s sequencing-by-synthesis platform, which relies on reversible terminator chemistry, and describe the acquired signal by reformulating its mathematical model as a Hidden Markov Model. Relying on this model and sequential Monte Carlo methods, we develop a parameter estimation and base calling scheme called ParticleCall. ParticleCall is tested on a data set obtained by sequencing phiX174 bacteriophage using Illumina’s Genome Analyzer II. The results show that the developed base calling scheme is significantly more computationally efficient than the best performing unsupervised method currently available, while achieving the same accuracy. Conclusions: The proposed ParticleCall provides more accurate calls than Illumina’s base calling algorithm, Bustard. At the same time, ParticleCall is significantly more computationally efficient than other recent schemes with similar performance, rendering it more feasible for high-throughput sequencing data analysis. Improvement of base calling accuracy will have immediate beneficial effects on the performance of downstream applications such as SNP and genotype calling. ParticleCall is freely available at https://sourceforge.net/projects/particlecall. PMID:22776067

  3. Gelada vocal sequences follow Menzerath's linguistic law.

    PubMed

    Gustison, Morgan L; Semple, Stuart; Ferrer-I-Cancho, Ramon; Bergman, Thore J

    2016-05-10

    Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath's law is a linguistic law that states that, the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath's law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath's law reflects compression, the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language.
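
    The negative relationship between construct size and constituent size can be checked with a simple correlation. The sketch below is illustrative only: it uses a Pearson correlation on invented call durations, and neither the data nor the choice of statistic is taken from the study.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Invented example: each inner list holds the call durations (seconds) of one sequence.
sequences = [
    [0.9],
    [0.8, 0.7],
    [0.6, 0.6, 0.5],
    [0.5, 0.4, 0.4, 0.3],
]

sizes = [len(s) for s in sequences]                   # construct size
mean_durations = [sum(s) / len(s) for s in sequences]  # constituent size
r = pearson(sizes, mean_durations)
print(sizes, mean_durations, r)
```

    A value of r below zero is the pattern Menzerath's law predicts: longer sequences are built from shorter calls.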

  4. The alarm call system of two species of black-and-white colobus monkeys (Colobus polykomos and Colobus guereza).

    PubMed

    Schel, Anne Marijke; Tranquilli, Sandra; Zuberbühler, Klaus

    2009-05-01

    Vervet monkey alarm calling has long been the paradigmatic example of how primates use vocalizations in response to predators. In vervets, there is a close and direct relationship between the production of distinct alarm vocalizations and the presence of distinct predator types. Recent fieldwork has however revealed the use of several additional alarm calling systems in primates. Here, the authors describe playback studies on the alarm call system of two colobine species, the King colobus (Colobus polykomos) of Taï Forest, Ivory Coast, and the Guereza colobus (C. guereza) of Budongo Forest, Uganda. Both species produce two basic alarm call types, snorts and acoustically variable roaring phrases, when confronted with leopards or crowned eagles. Neither call type is given exclusively to one predator, but the authors found strong regularities in call sequencing. Leopards typically elicited sequences consisting of a snort followed by few phrases, while eagles typically elicited sequences with no snorts and many phrases. The authors discuss how these call sequences have the potential to encode information at different levels, such as predator type, response-urgency, or the caller's imminent behavior.

  5. Gelada vocal sequences follow Menzerath’s linguistic law

    PubMed Central

    Gustison, Morgan L.; Semple, Stuart; Ferrer-i-Cancho, Ramon; Bergman, Thore J.

    2016-01-01

    Identifying universal principles underpinning diverse natural systems is a key goal of the life sciences. A powerful approach in addressing this goal has been to test whether patterns consistent with linguistic laws are found in nonhuman animals. Menzerath’s law is a linguistic law that states that, the larger the construct, the smaller the size of its constituents. Here, to our knowledge, we present the first evidence that Menzerath’s law holds in the vocal communication of a nonhuman species. We show that, in vocal sequences of wild male geladas (Theropithecus gelada), construct size (sequence size in number of calls) is negatively correlated with constituent size (duration of calls). Call duration does not vary significantly with position in the sequence, but call sequence composition does change with sequence size and most call types are abbreviated in larger sequences. We also find that intercall intervals follow the same relationship with sequence size as do calls. Finally, we provide formal mathematical support for the idea that Menzerath’s law reflects compression—the principle of minimizing the expected length of a code. Our findings suggest that a common principle underpins human and gelada vocal communication, highlighting the value of exploring the applicability of linguistic laws in vocal systems outside the realm of language. PMID:27091968

  6. When seconds count: A study of communication variables in the opening segment of emergency calls.

    PubMed

    Penn, Claire; Koole, Tom; Nattrass, Rhona

    2017-09-01

    The opening sequence of an emergency call influences the efficiency of the ambulance dispatch time. The greeting sequences in 105 calls to a South African emergency service were analysed. Initial results suggested the advantage of a specific two-part opening sequence. An on-site experiment aimed at improving call efficiency was conducted during one shift (1100 calls). Results indicated reduced conversational repairs and a significant reduction of 4 seconds in mean call length. Implications for systems and training are derived.

  7. Wild Birds Use an Ordering Rule to Decode Novel Call Sequences.

    PubMed

    Suzuki, Toshitaka N; Wheatcroft, David; Griesser, Michael

    2017-08-07

    The generative power of human language depends on grammatical rules, such as word ordering, that allow us to produce and comprehend even novel combinations of words [1-3]. Several species of birds and mammals produce sequences of calls [4-6], and, like words in human sentences, their order may influence receiver responses [7]. However, it is unknown whether animals use call ordering to extract meaning from truly novel sequences. Here, we use a novel experimental approach to test this in a wild bird species, the Japanese tit (Parus minor). Japanese tits are attracted to mobbing a predator when they hear conspecific alert and recruitment calls ordered as alert-recruitment sequences [7]. They also approach in response to recruitment calls of heterospecific individuals in mixed-species flocks [8, 9]. Using experimental playbacks, we assess their responses to artificial sequences in which their own alert calls are combined into different orderings with heterospecific recruitment calls. We find that Japanese tits respond similarly to mixed-species alert-recruitment call sequences and to their own alert-recruitment sequences. Importantly, however, tits rarely respond to mixed-species sequences in which the call order is reversed. Thus, Japanese tits extract a compound meaning from novel call sequences using an ordering rule. These results demonstrate a new parallel between animal communication systems and human language, opening new avenues for exploring the evolution of ordering rules and compositionality in animal vocal sequences.

  8. Campbell's monkeys concatenate vocalizations into context-specific call sequences

    PubMed Central

    Ouattara, Karim; Lemasson, Alban; Zuberbühler, Klaus

    2009-01-01

    Primate vocal behavior is often considered irrelevant in modeling human language evolution, mainly because of the caller's limited vocal control and apparent lack of intentional signaling. Here, we present the results of a long-term study on Campbell's monkeys, which has revealed an unrivaled degree of vocal complexity. Adult males produced six different loud call types, which they combined into various sequences in highly context-specific ways. We found stereotyped sequences that were strongly associated with cohesion and travel, falling trees, neighboring groups, nonpredatory animals, unspecific predatory threat, and specific predator classes. Within the responses to predators, we found that crowned eagles triggered four and leopards three different sequences, depending on how the caller learned about their presence. Callers followed a number of principles when concatenating sequences, such as nonrandom transition probabilities of call types, addition of specific calls into an existing sequence to form a different one, or recombination of two sequences to form a third one. We conclude that these primates have overcome some of the constraints of limited vocal control by combinatorial organization. As the different sequences were so tightly linked to specific external events, the Campbell's monkey call system may be the most complex example of ‘proto-syntax’ in animal communication known to date. PMID:20007377
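
    Nonrandom transition probabilities between call types can be estimated from observed sequences by counting adjacent pairs. The following is a minimal sketch with placeholder call labels ("A", "B", "C") and invented sequences; it is not the analysis used in the study.

```python
from collections import Counter, defaultdict

def transition_probabilities(sequences):
    """Estimate P(next call | current call) from adjacent pairs in each sequence."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return {
        current: {nxt: c / sum(following.values()) for nxt, c in following.items()}
        for current, following in counts.items()
    }

# Placeholder call types and invented sequences, for illustration only.
sequences = [["A", "A", "B"], ["A", "C", "C"], ["B", "B"]]
probs = transition_probabilities(sequences)
print(probs)
```

    Comparing these estimates with the probabilities expected under random ordering is one way to test whether concatenation follows rules.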

  9. CoVaCS: a consensus variant calling system.

    PubMed

    Chiara, Matteo; Gioiosa, Silvia; Chillemi, Giovanni; D'Antonio, Mattia; Flati, Tiziano; Picardi, Ernesto; Zambelli, Federico; Horner, David Stephen; Pesole, Graziano; Castrignanò, Tiziana

    2018-02-05

    The advent and ongoing development of next generation sequencing (NGS) technologies has led to a rapid increase in the rate of human genome re-sequencing data, paving the way for personalized genomics and precision medicine. The body of genome resequencing data is progressively increasing, underlining the need for accurate and time-effective bioinformatics systems for genotyping, a crucial prerequisite for identification of candidate causal mutations in diagnostic screens. Here we present CoVaCS, a fully automated, highly accurate system with a web based graphical interface for genotyping and variant annotation. Extensive tests on a gold standard benchmark data-set (the NA12878 Illumina platinum genome) confirm that call-sets based on our consensus strategy are completely in line with those attained by similar command line based approaches, and far more accurate than call-sets from any individual tool. Importantly, our system exhibits better sensitivity and higher specificity than equivalent commercial software. CoVaCS offers optimized pipelines integrating state of the art tools for variant calling and annotation for whole genome sequencing (WGS), whole-exome sequencing (WES) and target-gene sequencing (TGS) data. The system is currently hosted at Cineca, and offers the speed of an HPC facility, a crucial consideration when large numbers of samples must be analysed. Importantly, all the analyses are performed automatically, allowing high reproducibility of the results. As such, we believe that CoVaCS can be a valuable tool for the analysis of human genome resequencing studies. CoVaCS is available at: https://bioinformatics.cineca.it/covacs .
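
    A consensus strategy over several variant callers can be sketched as a vote on each variant. The abstract does not state the rule CoVaCS applies, so the "at least two callers agree" threshold, the function name, and the example variants below are assumptions made for illustration.

```python
from collections import Counter

def consensus_calls(callsets, min_support=2):
    """Keep variants reported by at least min_support of the given call-sets."""
    support = Counter(v for callset in callsets for v in set(callset))
    return {v for v, k in support.items() if k >= min_support}

# Invented call-sets; each variant is (chromosome, position, ref, alt).
caller_a = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
caller_b = {("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")}
caller_c = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}

consensus = consensus_calls([caller_a, caller_b, caller_c])
print(sorted(consensus))
```

    Raising min_support trades sensitivity for specificity: fewer variants pass, but each is backed by more tools.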

  10. Mixed Sequence Reader: A Program for Analyzing DNA Sequences with Heterozygous Base Calling

    PubMed Central

    Chang, Chun-Tien; Tsai, Chi-Neu; Tang, Chuan Yi; Chen, Chun-Houh; Lian, Jang-Hau; Hu, Chi-Yu; Tsai, Chia-Lung; Chao, Angel; Lai, Chyong-Huey; Wang, Tzu-Hao; Lee, Yun-Shien

    2012-01-01

    The direct sequencing of PCR products generates heterozygous base-calling fluorescence chromatograms that are useful for identifying single-nucleotide polymorphisms (SNPs), insertion-deletions (indels), short tandem repeats (STRs), and paralogous genes. Indels and STRs can be easily detected using the currently available Indelligent or ShiftDetector programs, which do not search reference sequences. However, the detection of other genomic variants remains a challenge due to the lack of appropriate tools for heterozygous base-calling fluorescence chromatogram data analysis. In this study, we developed a free web-based program, Mixed Sequence Reader (MSR), which can directly analyze heterozygous base-calling fluorescence chromatogram data in .abi file format using comparisons with reference sequences. The heterozygous sequences are identified as two distinct sequences and aligned with reference sequences. Our results showed that MSR may be used to (i) physically locate indel and STR sequences and determine STR copy number by searching NCBI reference sequences; (ii) predict combinations of microsatellite patterns using the Federal Bureau of Investigation Combined DNA Index System (CODIS); (iii) determine human papilloma virus (HPV) genotypes by searching current viral databases in cases of double infections; (iv) estimate the copy number of paralogous genes, such as β-defensin 4 (DEFB4) and its paralog HSPDP3. PMID:22778697

  11. Machine Learned Replacement of N-Labels for Basecalled Sequences in DNA Barcoding.

    PubMed

    Ma, Eddie Y T; Ratnasingham, Sujeevan; Kremer, Stefan C

    2018-01-01

    This study presents a machine learning method that increases the number of identified bases in Sanger Sequencing. The system post-processes a KB basecalled chromatogram. It selects a recoverable subset of N-labels in the KB-called chromatogram to replace with basecalls (A,C,G,T). An N-label correction is defined given an additional read of the same sequence, and a human finished sequence. Corrections are added to the dataset when an alignment determines the additional read and human agree on the identity of the N-label. KB must also rate the replacement with quality value of in the additional read. Corrections are only available during system training. Developing the system, nearly 850,000 N-labels are obtained from Barcode of Life Datasystems, the premier database of genetic markers called DNA Barcodes. Increasing the number of correct bases improves reference sequence reliability, increases sequence identification accuracy, and assures analysis correctness. Keeping with barcoding standards, our system maintains an error rate of percent. Our system only applies corrections when it estimates low rate of error. Tested on this data, our automation selects and recovers: 79 percent of N-labels from COI (animal barcode); 80 percent from matK and rbcL (plant barcodes); and 58 percent from non-protein-coding sequences (across eukaryotes).

  12. Group-based variant calling leveraging next-generation supercomputing for large-scale whole-genome sequencing studies.

    PubMed

    Standish, Kristopher A; Carland, Tristan M; Lockwood, Glenn K; Pfeiffer, Wayne; Tatineni, Mahidhar; Huang, C Chris; Lamberth, Sarah; Cherkas, Yauheniya; Brodmerkel, Carrie; Jaeger, Ed; Smith, Lance; Rajagopal, Gunaretnam; Curran, Mark E; Schork, Nicholas J

    2015-09-22

    Next-generation sequencing (NGS) technologies have become much more efficient, allowing whole human genomes to be sequenced faster and cheaper than ever before. However, processing the raw sequence reads associated with NGS technologies requires care and sophistication in order to draw compelling inferences about phenotypic consequences of variation in human genomes. It has been shown that different approaches to variant calling from NGS data can lead to different conclusions. Ensuring appropriate accuracy and quality in variant calling can come at a computational cost. We describe our experience implementing and evaluating a group-based approach to calling variants on large numbers of whole human genomes. We explore the influence of many factors that may impact the accuracy and efficiency of group-based variant calling, including group size, the biogeographical backgrounds of the individuals who have been sequenced, and the computing environment used. We make efficient use of the Gordon supercomputer cluster at the San Diego Supercomputer Center by incorporating job-packing and parallelization considerations into our workflow while calling variants on 437 whole human genomes generated as part of a large association study. We ultimately find that our workflow resulted in high-quality variant calls in a computationally efficient manner. We argue that studies like ours should motivate further investigations combining hardware-oriented advances in computing systems with algorithmic developments to tackle emerging 'big data' problems in biomedical research brought on by the expansion of NGS technologies.

  13. Integrated on-line system for DNA sequencing by capillary electrophoresis: From template to called bases

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ton, H.; Yeung, E.S.

    1997-02-15

    An integrated on-line prototype for coupling a microreactor to capillary electrophoresis for DNA sequencing has been demonstrated. A dye-labeled terminator cycle-sequencing reaction is performed in a fused-silica capillary. Subsequently, the sequencing ladder is directly injected into a size-exclusion chromatographic column operated at nearly 95 °C for purification. On-line injection to a capillary for electrophoresis is accomplished at a junction set at nearly 70 °C. High temperature at the purification column and injection junction prevents the renaturation of DNA fragments during on-line transfer without affecting the separation. The high solubility of DNA in, and the relatively low ionic strength of, 1× TE buffer permit both effective purification and electrokinetic injection of the DNA sample. The system is compatible with highly efficient separations by a replaceable poly(ethylene oxide) polymer solution in uncoated capillary tubes. Future automation and adaptation to a multiple-capillary array system should allow high-speed, high-throughput DNA sequencing from templates to called bases in one step. 32 refs., 5 figs.

  14. Meaningful call combinations and compositional processing in the southern pied babbler

    PubMed Central

    Engesser, Sabrina; Ridley, Amanda R.; Townsend, Simon W.

    2016-01-01

    Language’s expressive power is largely attributable to its compositionality: meaningful words are combined into larger/higher-order structures with derived meaning. Despite its importance, little is known regarding the evolutionary origins and emergence of this syntactic ability. Although previous research has shown a rudimentary capability to combine meaningful calls in primates, because of a scarcity of comparative data, it is unclear to what extent analog forms might also exist outside of primates. Here, we address this ambiguity and provide evidence for rudimentary compositionality in the discrete vocal system of a social passerine, the pied babbler (Turdoides bicolor). Natural observations and predator presentations revealed that babblers produce acoustically distinct alert calls in response to close, low-urgency threats and recruitment calls when recruiting group members during locomotion. On encountering terrestrial predators, both vocalizations are combined into a “mobbing sequence,” potentially to recruit group members in a dangerous situation. To investigate whether babblers process the sequence in a compositional way, we conducted systematic experiments, playing back the individual calls in isolation as well as naturally occurring and artificial sequences. Babblers reacted most strongly to mobbing sequence playbacks, showing a greater attentiveness and a quicker approach to the loudspeaker, compared with individual calls or control sequences. We conclude that the sequence constitutes a compositional structure, communicating information on both the context and the requested action. Our work supports previous research suggesting combinatoriality as a viable mechanism to increase communicative output and indicates that the ability to combine and process meaningful vocal structures, a basic syntax, may be more widespread than previously thought. PMID:27155011

  15. Meaningful call combinations and compositional processing in the southern pied babbler.

    PubMed

    Engesser, Sabrina; Ridley, Amanda R; Townsend, Simon W

    2016-05-24

    Language's expressive power is largely attributable to its compositionality: meaningful words are combined into larger/higher-order structures with derived meaning. Despite its importance, little is known regarding the evolutionary origins and emergence of this syntactic ability. Although previous research has shown a rudimentary capability to combine meaningful calls in primates, because of a scarcity of comparative data, it is unclear to what extent analog forms might also exist outside of primates. Here, we address this ambiguity and provide evidence for rudimentary compositionality in the discrete vocal system of a social passerine, the pied babbler (Turdoides bicolor). Natural observations and predator presentations revealed that babblers produce acoustically distinct alert calls in response to close, low-urgency threats and recruitment calls when recruiting group members during locomotion. On encountering terrestrial predators, both vocalizations are combined into a "mobbing sequence," potentially to recruit group members in a dangerous situation. To investigate whether babblers process the sequence in a compositional way, we conducted systematic experiments, playing back the individual calls in isolation as well as naturally occurring and artificial sequences. Babblers reacted most strongly to mobbing sequence playbacks, showing a greater attentiveness and a quicker approach to the loudspeaker, compared with individual calls or control sequences. We conclude that the sequence constitutes a compositional structure, communicating information on both the context and the requested action. Our work supports previous research suggesting combinatoriality as a viable mechanism to increase communicative output and indicates that the ability to combine and process meaningful vocal structures, a basic syntax, may be more widespread than previously thought.

  16. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects

    PubMed Central

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB PMID:25281234

  17. Performance characteristics of the AmpliSeq Cancer Hotspot panel v2 in combination with the Ion Torrent Next Generation Sequencing Personal Genome Machine.

    PubMed

    Butler, Kimberly S; Young, Megan Y L; Li, Zhihua; Elespuru, Rosalie K; Wood, Steven C

    2016-02-01

    Next-Generation Sequencing is a rapidly advancing technology that has research and clinical applications. For many cancers, it is important to know the precise mutation(s) present, as specific mutations could indicate or contra-indicate certain treatments as well as be indicative of prognosis. Using the Ion Torrent Personal Genome Machine and the AmpliSeq Cancer Hotspot panel v2, we sequenced two pancreatic cancer cell lines, BxPC-3 and HPAF-II, alone or in mixtures, to determine the error rate, sensitivity, and reproducibility of this system. The system resulted in coverage averaging 2000× across the various amplicons and was able to reliably and reproducibly identify mutations present at a rate of 5%. Identification of mutations present at a lower rate was possible by altering the parameters by which calls were made, but with an increase in erroneous, low-level calls. The panel was able to identify known mutations in these cell lines that are present in the COSMIC database. In addition, other, novel mutations were also identified that may prove clinically useful. The system was assessed for systematic errors such as homopolymer effects, end of amplicon effects and patterns in NO CALL sequence. Overall, the system is adequate at identifying the known, targeted mutations in the panel.

  18. CanvasDB: a local database infrastructure for analysis of targeted- and whole genome re-sequencing projects.

    PubMed

    Ameur, Adam; Bunikis, Ignas; Enroth, Stefan; Gyllensten, Ulf

    2014-01-01

    CanvasDB is an infrastructure for management and analysis of genetic variants from massively parallel sequencing (MPS) projects. The system stores SNP and indel calls in a local database, designed to handle very large datasets, to allow for rapid analysis using simple commands in R. Functional annotations are included in the system, making it suitable for direct identification of disease-causing mutations in human exome- (WES) or whole-genome sequencing (WGS) projects. The system has a built-in filtering function implemented to simultaneously take into account variant calls from all individual samples. This enables advanced comparative analysis of variant distribution between groups of samples, including detection of candidate causative mutations within family structures and genome-wide association by sequencing. In most cases, these analyses are executed within just a matter of seconds, even when there are several hundreds of samples and millions of variants in the database. We demonstrate the scalability of canvasDB by importing the individual variant calls from all 1092 individuals present in the 1000 Genomes Project into the system, over 4.4 billion SNPs and indels in total. Our results show that canvasDB makes it possible to perform advanced analyses of large-scale WGS projects on a local server. Database URL: https://github.com/UppsalaGenomeCenter/CanvasDB.

  19. Systematic comparison of variant calling pipelines using gold standard personal exome variants

    PubMed Central

    Hwang, Sohyun; Kim, Eiru; Lee, Insuk; Marcotte, Edward M.

    2015-01-01

    The success of clinical genomics using next generation sequencing (NGS) requires the accurate and consistent identification of personal genome variants. Assorted variant calling methods have been developed, which show low concordance between their calls. Hence, a systematic comparison of the variant callers could give important guidance to NGS-based clinical genomics. Recently, a set of high-confidence variant calls for one individual (NA12878) has been published by the Genome in a Bottle (GIAB) consortium, enabling performance benchmarking of different variant calling pipelines. Based on the gold standard reference variant calls from GIAB, we compared the performance of thirteen variant calling pipelines, testing combinations of three read aligners (BWA-MEM, Bowtie2, and Novoalign) and four variant callers: Genome Analysis Tool Kit HaplotypeCaller (GATK-HC), Samtools mpileup, Freebayes, and Ion Proton Variant Caller (TVC). The pipelines were run on twelve data sets for the NA12878 genome sequenced by different platforms including Illumina2000, Illumina2500, and Ion Proton, with various exome capture systems and exome coverage. We observed different biases toward specific types of SNP genotyping errors by the different variant callers. The results of our study provide useful guidelines for reliable variant identification from deep sequencing of personal genomes. PMID:26639839
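
    Benchmarking a pipeline against a gold standard call-set reduces to counting matches and mismatches. The sketch below assumes exact matching on (chromosome, position, ref, alt) and reports precision and recall; the study's actual comparison procedure and metrics are not given in the abstract, and all variants shown are invented.

```python
def benchmark(called, truth):
    """Compare a call-set with a gold standard set and return summary metrics."""
    tp = len(called & truth)   # variants found in both sets
    fp = len(called - truth)   # called but absent from the gold standard
    fn = len(truth - called)   # gold standard variants that were missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"tp": tp, "fp": fp, "fn": fn, "precision": precision, "recall": recall}

# Invented gold standard and pipeline output, for illustration only.
truth = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 75, "T", "C"),
}
called = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr4", 10, "G", "C"),
}

metrics = benchmark(called, truth)
print(metrics)
```

    Running the same function over each aligner and caller combination gives a table of metrics that can be compared pipeline by pipeline.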

  20. THE RHIC SEQUENCER.

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    VAN ZEIJTS, J.; DOTTAVIO, T.; FRAK, B.

    The Relativistic Heavy Ion Collider (RHIC) has a high level asynchronous time-line driven by a controlling program called the "Sequencer". Most high-level magnet and beam related issues are orchestrated by this system. The system also plays an important role in coordinated data acquisition and saving. We present the program, operator interface, operational impact and experience.

  1. STORMSeq: an open-source, user-friendly pipeline for processing personal genomics data in the cloud.

    PubMed

    Karczewski, Konrad J; Fernald, Guy Haskin; Martin, Alicia R; Snyder, Michael; Tatonetti, Nicholas P; Dudley, Joel T

    2014-01-01

    The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5-10 hours to process a full exome sequence and $30 and 3-8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2.

  2. Automation and integration of multiplexed on-line sample preparation with capillary electrophoresis for DNA sequencing

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Tan, H.

    1999-03-31

    The purpose of this research is to develop a multiplexed sample processing system in conjunction with multiplexed capillary electrophoresis for high-throughput DNA sequencing. The concept from DNA template to called bases was first demonstrated with a manually operated single capillary system. Later, an automated microfluidic system with 8 channels based on the same principle was successfully constructed. The instrument automatically processes 8 templates through reaction, purification, denaturation, pre-concentration, injection, separation and detection in a parallel fashion. A multiplexed freeze/thaw switching principle and a distribution network were implemented to manage flow direction and sample transportation. Dye-labeled terminator cycle-sequencing reactions are performed in an 8-capillary array in a hot air thermal cycler. Subsequently, the sequencing ladders are directly loaded into a corresponding size-exclusion chromatographic column operated at approximately 60 °C for purification. On-line denaturation and stacking injection for capillary electrophoresis is simultaneously accomplished at a cross assembly set at approximately 70 °C. Not only the separation capillary array but also the reaction capillary array and purification columns can be regenerated after every run. DNA sequencing data from this system allow base calling up to 460 bases with accuracy of 98%.

  3. An Assessment for Learning System Called ACED: Designing for Learning Effectiveness and Accessibility. Research Report. ETS RR-07-26

    ERIC Educational Resources Information Center

    Shute, Valerie J.; Hansen, Eric G.; Almond, Russell G.

    2007-01-01

    This paper reports on a 3-year, NSF-funded research and development project called ACED: Adaptive Content with Evidence-based Diagnosis. The purpose of the project was to design, develop, and evaluate an assessment for learning (AfL) system for diverse students, using Algebra I content related to geometric sequences (i.e., successive numbers…

  4. Generation of animation sequences of three dimensional models

    NASA Technical Reports Server (NTRS)

    Poi, Sharon (Inventor); Bell, Brad N. (Inventor)

    1990-01-01

    The invention is directed toward a method and apparatus for generating an animated sequence through the movement of three-dimensional graphical models. A plurality of pre-defined graphical models are stored and manipulated in response to interactive commands or by means of a pre-defined command file. The models may be combined as part of a hierarchical structure to represent physical systems without need to create a separate model which represents the combined system. System motion is simulated through the introduction of translation, rotation and scaling parameters upon a model within the system. The motion is then transmitted down through the system hierarchy of models in accordance with hierarchical definitions and joint movement limitations. The present invention also calls for a method of editing hierarchical structure in response to interactive commands or a command file such that a model may be included, deleted, copied or moved within multiple system model hierarchies. The present invention also calls for the definition of multiple viewpoints or cameras which may exist as part of a system hierarchy or as an independent camera. The simulated movement of the models and systems is graphically displayed on a monitor and a frame is recorded by means of a video controller. Multiple movement and hierarchy manipulations are then recorded as a sequence of frames which may be played back as an animation sequence on a video cassette recorder.

  5. An adaptive, object oriented strategy for base calling in DNA sequence analysis.

    PubMed Central

    Giddings, M C; Brumley, R L; Haker, M; Smith, L M

    1993-01-01

    An algorithm has been developed for the determination of nucleotide sequence from data produced in fluorescence-based automated DNA sequencing instruments employing the four-color strategy. This algorithm takes advantage of object oriented programming techniques for modularity and extensibility. The algorithm is adaptive in that data sets from a wide variety of instruments and sequencing conditions can be used with good results. Confidence values are provided on the base calls as an estimate of accuracy. The algorithm iteratively employs confidence determinations from several different modules, each of which examines a different feature of the data for accurate peak identification. Modules within this system can be added or removed for increased performance or for application to a different task. In comparisons with commercial software, the algorithm performed well. PMID:8233787

  6. STORMSeq: An Open-Source, User-Friendly Pipeline for Processing Personal Genomics Data in the Cloud

    PubMed Central

    Karczewski, Konrad J.; Fernald, Guy Haskin; Martin, Alicia R.; Snyder, Michael; Tatonetti, Nicholas P.; Dudley, Joel T.

    2014-01-01

    The increasing public availability of personal complete genome sequencing data has ushered in an era of democratized genomics. However, read mapping and variant calling software is constantly improving and individuals with personal genomic data may prefer to customize and update their variant calls. Here, we describe STORMSeq (Scalable Tools for Open-Source Read Mapping), a graphical interface cloud computing solution that does not require a parallel computing environment or extensive technical experience. This customizable and modular system performs read mapping, read cleaning, and variant calling and annotation. At present, STORMSeq costs approximately $2 and 5–10 hours to process a full exome sequence and $30 and 3–8 days to process a whole genome sequence. We provide this open-access and open-source resource as a user-friendly interface in Amazon EC2. PMID:24454756

  7. BONSAI Garden: Parallel knowledge discovery system for amino acid sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shoudai, T.; Miyano, S.; Shinohara, A.

    1995-12-31

    We have developed a machine discovery system BONSAI which receives positive and negative examples as inputs and produces as a hypothesis a pair of a decision tree over regular patterns and an alphabet indexing. This system has succeeded in discovering reasonable knowledge on transmembrane domain sequences and signal peptide sequences by computer experiments. However, when several kinds of sequences are mixed in the data, it does not seem reasonable for a single BONSAI system to find a hypothesis of a reasonably small size with high accuracy. For this purpose, we have designed a system BONSAI Garden, in which several BONSAIs and a program called Gardener run over a network in parallel, to partition the data into some number of classes together with hypotheses explaining these classes accurately.

  8. Structure and function of neonatal social communication in a genetic mouse model of autism.

    PubMed

    Takahashi, T; Okabe, S; Broin, P Ó; Nishi, A; Ye, K; Beckert, M V; Izumi, T; Machida, A; Kang, G; Abe, S; Pena, J L; Golden, A; Kikusui, T; Hiroi, N

    2016-09-01

    A critical step toward understanding autism spectrum disorder (ASD) is to identify both genetic and environmental risk factors. A number of rare copy number variants (CNVs) have emerged as robust genetic risk factors for ASD, but not all CNV carriers exhibit ASD and the severity of ASD symptoms varies among CNV carriers. Although evidence exists that various environmental factors modulate symptomatic severity, the precise mechanisms by which these factors determine the ultimate severity of ASD are still poorly understood. Here, using a mouse heterozygous for Tbx1 (a gene encoded in 22q11.2 CNV), we demonstrate that a genetically triggered neonatal phenotype in vocalization generates a negative environmental loop in pup-mother social communication. Wild-type pups used individually diverse sequences of simple and complicated call types, but heterozygous pups used individually invariable call sequences with less complicated call types. When played back, representative wild-type call sequences elicited maternal approach, but heterozygous call sequences were ineffective. When the representative wild-type call sequences were randomized, they were ineffective in eliciting vigorous maternal approach behavior. These data demonstrate that an ASD risk gene alters the neonatal call sequence of its carriers and this pup phenotype in turn diminishes maternal care through atypical social communication. Thus, an ASD risk gene induces, through atypical neonatal call sequences, less than optimal maternal care as a negative neonatal environmental factor.

  9. Structure and function of neonatal social communication in a genetic mouse model of autism

    PubMed Central

    Takahashi, Tomohisa; Okabe, Shota; Ó Broin, Pilib; Nishi, Akira; Ye, Kenny; Beckert, Michael V.; Izumi, Takeshi; Machida, Akihiro; Kang, Gina; Abe, Seiji; Pena, Jose L.; Golden, Aaron; Kikusui, Takefumi; Hiroi, Noboru

    2015-01-01

    A critical step toward understanding autism spectrum disorder (ASD) is to identify both genetic and environmental risk factors. A number of rare copy number variants (CNVs) have emerged as robust genetic risk factors for ASD, but not all CNV carriers exhibit ASD and the severity of ASD symptoms varies among CNV carriers. Although evidence exists that various environmental factors modulate symptomatic severity, the precise mechanisms by which these factors determine the ultimate severity of ASD are still poorly understood. Here, using a mouse heterozygous for Tbx1 (a gene encoded in 22q11.2 CNV), we demonstrate that a genetically-triggered neonatal phenotype in vocalization generates a negative environmental loop in pup-mother social communication. Wild-type pups used individually diverse sequences of simple and complicated call types, but heterozygous pups used individually invariable call sequences with less complicated call types. When played back, representative wild-type call sequences elicited maternal approach, but heterozygous call sequences were ineffective. When the representative wild-type call sequences were randomized, they were ineffective in eliciting vigorous maternal approach behavior. These data demonstrate that an ASD risk gene alters the neonatal call sequence of its carriers and this pup phenotype in turn diminishes maternal care through atypical social communication. Thus, an ASD risk gene induces, through atypical neonatal call sequences, less than optimal maternal care as a negative neonatal environmental factor. PMID:26666205

  10. RIKEN Integrated Sequence Analysis (RISA) System—384-Format Sequencing Pipeline with 384 Multicapillary Sequencer

    PubMed Central

    Shibata, Kazuhiro; Itoh, Masayoshi; Aizawa, Katsunori; Nagaoka, Sumiharu; Sasaki, Nobuya; Carninci, Piero; Konno, Hideaki; Akiyama, Junichi; Nishi, Katsuo; Kitsunai, Tokuji; Tashiro, Hideo; Itoh, Mari; Sumi, Noriko; Ishii, Yoshiyuki; Nakamura, Shin; Hazama, Makoto; Nishine, Tsutomu; Harada, Akira; Yamamoto, Rintaro; Matsumoto, Hiroyuki; Sakaguchi, Sumito; Ikegami, Takashi; Kashiwagi, Katsuya; Fujiwake, Syuji; Inoue, Kouji; Togawa, Yoshiyuki; Izawa, Masaki; Ohara, Eiji; Watahiki, Masanori; Yoneda, Yuko; Ishikawa, Tomokazu; Ozawa, Kaori; Tanaka, Takumi; Matsuura, Shuji; Kawai, Jun; Okazaki, Yasushi; Muramatsu, Masami; Inoue, Yorinao; Kira, Akira; Hayashizaki, Yoshihide

    2000-01-01

    The RIKEN high-throughput 384-format sequencing pipeline (RISA system) including a 384-multicapillary sequencer (the so-called RISA sequencer) was developed for the RIKEN mouse encyclopedia project. The RISA system consists of colony picking, template preparation, sequencing reaction, and the sequencing process. A novel high-throughput 384-format capillary sequencer system (RISA sequencer system) was developed for the sequencing process. This system consists of a 384-multicapillary auto sequencer (RISA sequencer), a 384-multicapillary array assembler (CAS), and a 384-multicapillary casting device. The RISA sequencer can simultaneously analyze 384 independent sequencing products. The optical system is a scanning system chosen after careful comparison with an image detection system for the simultaneous detection of the 384-capillary array. This scanning system can be used with any fluorescent-labeled sequencing reaction (chain termination reaction), including transcriptional sequencing based on RNA polymerase, which was originally developed by us, and cycle sequencing based on thermostable DNA polymerase. For long-read sequencing, 380 out of 384 sequences (99.2%) were successfully analyzed and the average read length, with more than 99% accuracy, was 654.4 bp. A single RISA sequencer can analyze 216 kb with >99% accuracy in 2.7 h (90 kb/h). For short-read sequencing to cluster the 3′ end and 5′ end sequencing by reading 350 bp, 384 samples can be analyzed in 1.5 h. We have also developed a RISA inoculator, RISA filtrator and densitometer, RISA plasmid preparator which can handle throughput of 40,000 samples in 17.5 h, and a high-throughput RISA thermal cycler which has four 384-well sites. The combination of these technologies allowed us to construct the RISA system consisting of 16 RISA sequencers, which can process 50,000 DNA samples per day. 
    One haploid genome shotgun sequence of a higher organism, such as human, mouse, rat, domestic animals, and plants, can be revealed by seven RISA systems within one month. PMID:11076861

  11. Eye movement sequence generation in humans: Motor or goal updating?

    PubMed Central

    Quaia, Christian; Joiner, Wilsaan M.; FitzGibbon, Edmond J.; Optican, Lance M.; Smith, Maurice A.

    2011-01-01

    Saccadic eye movements are often grouped in pre-programmed sequences. The mechanism underlying the generation of each saccade in a sequence is currently poorly understood. Broadly speaking, two alternative schemes are possible: first, after each saccade the retinotopic location of the next target could be estimated, and an appropriate saccade could be generated. We call this the goal updating hypothesis. Alternatively, multiple motor plans could be pre-computed, and they could then be updated after each movement. We call this the motor updating hypothesis. We used McLaughlin’s intra-saccadic step paradigm to artificially create a condition under which these two hypotheses make discriminable predictions. We found that in human subjects, when sequences of two saccades are planned, the motor updating hypothesis predicts the landing position of the second saccade in two-saccade sequences much better than the goal updating hypothesis. This finding suggests that the human saccadic system is capable of executing sequences of saccades to multiple targets by planning multiple motor commands, which are then updated by serial subtraction of ongoing motor output. PMID:21191134

  12. A machine learning model to determine the accuracy of variant calls in capture-based next generation sequencing.

    PubMed

    van den Akker, Jeroen; Mishne, Gilad; Zimmer, Anjali D; Zhou, Alicia Y

    2018-04-17

    Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features like reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS alone are in fact accurate and reliable. However, a small subset of difficult-to-call variants that still do require orthogonal confirmation exist. For this reason, many clinical laboratories confirm NGS results using orthogonal technologies such as Sanger sequencing. Here, we report the development of a deterministic machine-learning-based model to differentiate between these two types of variant calls: those that do not require confirmation using an orthogonal technology (high confidence), and those that require additional quality testing (low confidence). This approach allows reliable NGS-based calling in a clinical setting by identifying the few important variant calls that require orthogonal confirmation. We developed and tested the model using a set of 7179 variants identified by a targeted NGS panel and re-tested by Sanger sequencing. The model incorporated several signals of sequence characteristics and call quality to determine if a variant was identified at high or low confidence. The model was tuned to eliminate false positives, defined as variants that were called by NGS but not confirmed by Sanger sequencing. The model achieved very high accuracy: 99.4% (95% confidence interval: +/- 0.03%). It categorized 92.2% (6622/7179) of the variants as high confidence, and 100% of these were confirmed to be present by Sanger sequencing. Among the variants that were categorized as low confidence, defined as NGS calls of low quality that are likely to be artifacts, 92.1% (513/557) were found to be not present by Sanger sequencing. 
    This work shows that NGS data contains sufficient characteristics for a machine-learning-based model to differentiate low from high confidence variants. Additionally, it reveals the importance of incorporating site-specific features as well as variant call features in such a model.
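
    The percentages quoted in this abstract follow directly from the reported counts. The short check below recomputes them; only the counts 7179, 6622, 513 and 557 come from the abstract, and the variable names are illustrative.

```python
# Counts as reported in the abstract.
total_variants = 7179
high_confidence = 6622
low_confidence = total_variants - high_confidence  # should equal the reported 557
low_confidence_absent = 513  # low-confidence calls not confirmed by Sanger

# Share of all variants that the model labelled high confidence.
high_fraction = high_confidence / total_variants

# Share of low-confidence calls that Sanger sequencing found to be absent.
low_absent_fraction = low_confidence_absent / low_confidence

# Low-confidence calls that Sanger sequencing nevertheless confirmed (derived).
low_confidence_present = low_confidence - low_confidence_absent

print(f"high confidence: {100 * high_fraction:.1f}%")
print(f"low confidence, absent by Sanger: {100 * low_absent_fraction:.1f}%")
print(f"low confidence, present by Sanger: {low_confidence_present}")
```

    The derived figure of 44 low-confidence variants confirmed by Sanger is simple subtraction from the reported counts, not a number stated in the abstract.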

  13. Comparison and evaluation of two exome capture kits and sequencing platforms for variant calling.

    PubMed

    Zhang, Guoqiang; Wang, Jianfeng; Yang, Jin; Li, Wenjie; Deng, Yutian; Li, Jing; Huang, Jun; Hu, Songnian; Zhang, Bing

    2015-08-05

    To promote the clinical application of next-generation sequencing, it is important to obtain accurate and consistent variants of target genomic regions at low cost. Ion Proton, the latest updated semiconductor-based sequencing instrument from Life Technologies, is designed to provide investigators with an inexpensive platform for human whole exome sequencing that achieves a rapid turnaround time. However, few studies have comprehensively compared and evaluated the accuracy of variant calling between Ion Proton and Illumina sequencing platforms such as HiSeq 2000, which is the most popular sequencing platform for the human genome. The Ion Proton sequencer combined with the Ion TargetSeq Exome Enrichment Kit makes up TargetSeq-Proton, whereas SureSelect-HiSeq is based on the Agilent SureSelect Human All Exon v4 Kit and the HiSeq 2000 sequencer. Here, we sequenced exonic DNA from four human blood samples using both TargetSeq-Proton and SureSelect-HiSeq. We then called variants in the exonic regions that overlapped between the two exome capture kits (33.6 Mb). The rates of shared variant loci called by the two sequencing platforms ranged from 68.0 to 75.3% across the four samples, whereas the concordance of co-detected variant loci reached 99%. Sanger sequencing validation revealed that the validation rate of concordant single nucleotide polymorphisms (SNPs) (91.5%) was higher than that of SNPs specific to TargetSeq-Proton (60.0%) or specific to SureSelect-HiSeq (88.3%). With regard to 1-bp small insertions and deletions (InDels), the Sanger validation rates of concordant variants (100.0%) and SureSelect-HiSeq-specific variants (89.6%) were higher than that of TargetSeq-Proton-specific variants (15.8%). In the sequencing of exonic regions, combining the two sequencing strategies (SureSelect-HiSeq and TargetSeq-Proton) increased the variant calling specificity for concordant variant loci and the sensitivity for variant loci called by any one platform. 
    However, for the sequencing of platform-specific variants, the accuracy of variant calling by HiSeq 2000 was higher than that of Ion Proton, specifically for the InDel detection. Moreover, the variant calling software also influences the detection of SNPs and, specifically, InDels in Ion Proton exome sequencing.
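
    The two quantities reported in this abstract, the rate of shared variant loci and the concordance of co-detected loci, can be sketched as follows. The abstract does not define the denominators, so this sketch assumes the shared rate is taken over the union of loci called by either platform and concordance is the fraction of shared loci with identical genotypes; the function name and toy genotypes are hypothetical.

```python
from typing import Dict, Tuple

# Assumed representation: locus = (chromosome, position), value = genotype string.
Locus = Tuple[str, int]


def compare_platforms(a: Dict[Locus, str], b: Dict[Locus, str]) -> Tuple[float, float]:
    """Return (shared_rate, concordance) for two platforms' call sets.

    shared_rate: loci called by both / loci called by either (assumed definition).
    concordance: shared loci with the same genotype / shared loci.
    """
    shared = a.keys() & b.keys()
    union = a.keys() | b.keys()
    shared_rate = len(shared) / len(union) if union else 0.0
    agreeing = sum(1 for locus in shared if a[locus] == b[locus])
    concordance = agreeing / len(shared) if shared else 0.0
    return shared_rate, concordance


# Hypothetical toy calls for two platforms.
platform_a = {("chr1", 10): "A/G", ("chr1", 20): "T/T", ("chr2", 5): "C/T"}
platform_b = {("chr1", 10): "A/G", ("chr1", 20): "T/C", ("chr3", 7): "G/G"}

shared_rate, concordance = compare_platforms(platform_a, platform_b)
print(shared_rate, concordance)
```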

  14. Experimental Design-Based Functional Mining and Characterization of High-Throughput Sequencing Data in the Sequence Read Archive

    PubMed Central

    Nakazato, Takeru; Ohta, Tazro; Bono, Hidemasa

    2013-01-01

    High-throughput sequencing technology, also called next-generation sequencing (NGS), has the potential to revolutionize the whole process of genome sequencing, transcriptomics, and epigenetics. Sequencing data is captured in a public primary data archive, the Sequence Read Archive (SRA). As of January 2013, data from more than 14,000 projects have been submitted to SRA, which is double that of the previous year. Researchers can download raw sequence data from the SRA website to perform further analyses and to compare with their own data. However, it is extremely difficult to search entries and download raw sequences of interest with SRA because the data structure is complicated, and experimental conditions along with raw sequences are partly described in natural language. Additionally, some sequences are of inconsistent quality because anyone can submit sequencing data to SRA with no quality check. Therefore, as a criterion of data quality, we focused on SRA entries that were cited in journal articles. We extracted SRA IDs and PubMed IDs (PMIDs) from SRA and full-text versions of journal articles and retrieved 2748 SRA ID-PMID pairs. We constructed a publication list referring to SRA entries. Since one of the main themes of -omics analyses is clarification of disease mechanisms, we also characterized SRA entries by disease keywords, according to the Medical Subject Headings (MeSH) extracted from articles assigned to each SRA entry. We obtained 989 SRA ID-MeSH disease term pairs, and constructed a disease list referring to SRA data. We previously developed feature profiles of diseases in a system called “Gendoo”. We generated hyperlinks between the diseases extracted from SRA and their feature profiles. The developed project, publication and disease lists resulting from this study are available at our web service, called “DBCLS SRA” (http://sra.dbcls.jp/). This service will improve accessibility to high-quality data from SRA. PMID:24167589
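
    The extraction of SRA IDs from article text described above might look roughly like the sketch below. This is not the authors' pipeline: the regular expression is an assumed approximation of INSDC-style accessions (a prefix such as SRP, SRX, SRR, ERX or DRR followed by six or more digits), and the article keys are placeholders rather than real PMIDs.

```python
import re
from typing import Dict, List, Tuple

# Assumed accession pattern: S/E/D + R + type letter + at least six digits.
SRA_ID = re.compile(r"\b[SED]R[APRSXZ]\d{6,}\b")


def extract_pairs(articles: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (accession, article_id) pairs for every accession found in each text."""
    pairs = []
    for article_id, text in articles.items():
        # Deduplicate within an article and sort for a stable order.
        for accession in sorted(set(SRA_ID.findall(text))):
            pairs.append((accession, article_id))
    return pairs


# Placeholder article keys; the first text reuses accessions quoted in this listing.
articles = {
    "article-1": "The sequencing data were deposited in the Sequence Read "
                 "Archive (SRX450968 and SRX451773).",
    "article-2": "No archive accession is mentioned in this text.",
}

pairs = extract_pairs(articles)
print(pairs)
```

    Keying each pair by article makes it straightforward to join the result against per-article metadata such as MeSH terms, which is the kind of linkage the abstract describes.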

  15. Diffusion modulation of DNA by toehold exchange

    NASA Astrophysics Data System (ADS)

    Rodjanapanyakul, Thanapop; Takabatake, Fumi; Abe, Keita; Kawamata, Ibuki; Nomura, Shinichiro M.; Murata, Satoshi

    2018-05-01

    We propose a method to control the diffusion speed of DNA molecules with a target sequence in a polymer solution. The interaction between solute DNA and diffusion-suppressing DNA that has been anchored to a polymer matrix is modulated by the concentration of the third DNA molecule called the competitor by a mechanism called toehold exchange. Experimental results show that the sequence-specific modulation of the diffusion coefficient is successfully achieved. The diffusion coefficient can be modulated up to sixfold by changing the concentration of the competitor. The specificity of the modulation is also verified under the coexistence of a set of DNA with noninteracting base sequences. With this mechanism, we are able to control the diffusion coefficient of individual DNA species by the concentration of another DNA species. This methodology introduces a programmability to a DNA-based reaction-diffusion system.

  16. Fast single-pass alignment and variant calling using sequencing data

    USDA-ARS's Scientific Manuscript database

    Sequencing research requires efficient computation. Few programs use already known information about DNA variants when aligning sequence data to the reference map. New program findmap.f90 reads the previous variant list before aligning sequence, calling variant alleles, and summing the allele counts...

  17. Evolutionary neural networks for anomaly detection based on the behavior of a program.

    PubMed

    Han, Sang-Jun; Cho, Sung-Bae

    2006-06-01

    The process of learning the behavior of a given program by using machine-learning techniques (based on system-call audit data) is effective to detect intrusions. Rule learning, neural networks, statistics, and hidden Markov models (HMMs) are some of the kinds of representative methods for intrusion detection. Among them, neural networks are known for good performance in learning system-call sequences. In order to apply this knowledge to real-world problems successfully, it is important to determine the structures and weights of these call sequences. However, finding the appropriate structures requires very long time periods because there are no suitable analytical solutions. In this paper, a novel intrusion-detection technique based on evolutionary neural networks (ENNs) is proposed. One advantage of using ENNs is that it takes less time to obtain superior neural networks than when using conventional approaches. This is because they discover the structures and weights of the neural networks simultaneously. Experimental results with the 1999 Defense Advanced Research Projects Agency (DARPA) Intrusion Detection Evaluation (IDEVAL) data confirm that ENNs are promising tools for intrusion detection.
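
    As background for what "learning system-call sequences" can mean, here is a deliberately simple baseline: record every short window of system calls seen in normal traces, then score a new trace by the fraction of its windows never seen before. This swaps in a plain lookup table for the evolutionary neural networks proposed in the paper, and the window length, function names and traces are all assumptions for illustration.

```python
from typing import Iterable, List, Set, Tuple

Window = Tuple[str, ...]


def windows(trace: List[str], k: int) -> List[Window]:
    """All contiguous length-k windows of a system-call trace."""
    return [tuple(trace[i:i + k]) for i in range(len(trace) - k + 1)]


def train(normal_traces: Iterable[List[str]], k: int) -> Set[Window]:
    """Collect every window observed in traces of normal behaviour."""
    seen: Set[Window] = set()
    for trace in normal_traces:
        seen.update(windows(trace, k))
    return seen


def anomaly_rate(trace: List[str], seen: Set[Window], k: int) -> float:
    """Fraction of the trace's windows that never occurred in training."""
    ws = windows(trace, k)
    if not ws:
        return 0.0
    return sum(1 for w in ws if w not in seen) / len(ws)


# Hypothetical traces; real audit data would be far longer.
normal = [
    ["open", "read", "read", "close"],
    ["open", "read", "write", "close"],
]
profile = train(normal, k=3)

benign_score = anomaly_rate(["open", "read", "write", "close"], profile, k=3)
suspect_score = anomaly_rate(["open", "write", "exec", "close"], profile, k=3)
print(benign_score, suspect_score)
```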

  18. Writing DNA with GenoCAD.

    PubMed

    Czar, Michael J; Cai, Yizhi; Peccoud, Jean

    2009-07-01

    Chemical synthesis of custom DNA made to order calls for software streamlining the design of synthetic DNA sequences. GenoCAD (www.genocad.org) is a free web-based application to design protein expression vectors, artificial gene networks and other genetic constructs composed of multiple functional blocks called genetic parts. By capturing design strategies in grammatical models of DNA sequences, GenoCAD guides the user through the design process. By successively clicking on icons representing structural features or actual genetic parts, complex constructs composed of dozens of functional blocks can be designed in a matter of minutes. GenoCAD automatically derives the construct sequence from its comprehensive libraries of genetic parts. Upon completion of the design process, users can download the sequence for synthesis or further analysis. Users who elect to create a personal account on the system can customize their workspace by creating their own parts libraries, adding new parts to the libraries, or reusing designs to quickly generate sets of related constructs.

  19. Best practices for evaluating single nucleotide variant calling methods for microbial genomics

    PubMed Central

    Olson, Nathan D.; Lund, Steven P.; Colman, Rebecca E.; Foster, Jeffrey T.; Sahl, Jason W.; Schupp, James M.; Keim, Paul; Morrow, Jayne B.; Salit, Marc L.; Zook, Justin M.

    2015-01-01

    Innovations in sequencing technologies have allowed biologists to make incredible advances in understanding biological systems. As experience grows, researchers increasingly recognize that analyzing the wealth of data provided by these new sequencing platforms requires careful attention to detail for robust results. Thus far, much of the scientific community’s focus for use in bacterial genomics has been on evaluating genome assembly algorithms and rigorously validating assembly program performance. Missing, however, is a focus on critical evaluation of variant callers for these genomes. Variant calling is essential for comparative genomics as it yields insights into nucleotide-level organismal differences. Variant calling is a multistep process with a host of potential error sources that may lead to incorrect variant calls. Identifying and resolving these incorrect calls is critical for bacterial genomics to advance. The goal of this review is to provide guidance on validating algorithms and pipelines used in variant calling for bacterial genomics. First, we will provide an overview of the variant calling procedures and the potential sources of error associated with the methods. We will then identify appropriate datasets for use in evaluating algorithms and describe statistical methods for evaluating algorithm performance. As variant calling moves from basic research to the applied setting, standardized methods for performance evaluation and reporting are required; it is our hope that this review provides the groundwork for the development of these standards. PMID:26217378

  20. Masking as an effective quality control method for next-generation sequencing data analysis.

    PubMed

    Yun, Sajung; Yun, Sijung

    2014-12-13

    Next generation sequencing produces base calls with low quality scores that can affect the accuracy of identifying simple nucleotide variation calls, including single nucleotide polymorphisms and small insertions and deletions. Here we compare the effectiveness of two data preprocessing methods, masking and trimming, and the accuracy of simple nucleotide variation calls on whole-genome sequence data from Caenorhabditis elegans. Masking substitutes low quality base calls with 'N's (undetermined bases), whereas trimming removes low quality bases, which results in shorter read lengths. We demonstrate that masking is more effective than trimming in reducing the false-positive rate in single nucleotide polymorphism (SNP) calling. However, neither preprocessing method affected the false-negative rate in SNP calling with statistical significance compared to the data analysis without preprocessing. False-positive rate and false-negative rate for small insertions and deletions did not show differences between masking and trimming. We recommend masking over trimming as a more effective preprocessing method for next generation sequencing data analysis, since masking reduces the false-positive rate in SNP calling without sacrificing the false-negative rate, although trimming is currently more commonly used in the field. The Perl script for masking is available at http://code.google.com/p/subn/. The sequencing data used in the study were deposited in the Sequence Read Archive (SRX450968 and SRX451773).
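
    The difference between masking and trimming can be illustrated on a single read. The sketch below is not the authors' Perl script: it assumes Phred+33 quality encoding, a threshold of Q20, and a simple trimming rule that removes low-quality bases from the 3' end only, all of which are choices made here for illustration.

```python
from typing import List


def phred_scores(quality: str, offset: int = 33) -> List[int]:
    """Decode a FASTQ quality string (assumed Phred+33) into integer scores."""
    return [ord(ch) - offset for ch in quality]


def mask_read(seq: str, quality: str, min_q: int = 20) -> str:
    """Replace every base below min_q with 'N'; read length is preserved."""
    scores = phred_scores(quality)
    return "".join(base if q >= min_q else "N" for base, q in zip(seq, scores))


def trim_read(seq: str, quality: str, min_q: int = 20) -> str:
    """Drop low-quality bases from the 3' end; the read gets shorter."""
    scores = phred_scores(quality)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end]


# Hypothetical read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
seq = "ACGTACGT"
quality = "IIII#II#"

masked = mask_read(seq, quality)
trimmed = trim_read(seq, quality)
print(masked, trimmed)
```

    Note that masking keeps the internal low-quality position as 'N' and leaves the read's coordinates intact, while this trimming rule shortens the read but leaves the internal low-quality base in place.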

  1. Project Report: Automatic Sequence Processor Software Analysis

    NASA Technical Reports Server (NTRS)

    Benjamin, Brandon

    2011-01-01

    The Mission Planning and Sequencing (MPS) element of Multi-Mission Ground System and Services (MGSS) provides space missions with multi-purpose software to plan spacecraft activities, sequence spacecraft commands, and then integrate these products and execute them on spacecraft. Jet Propulsion Laboratory (JPL) is currently flying many missions. The processes for building, integrating, and testing the multi-mission uplink software need to be improved to meet the needs of the missions and the operations teams that command the spacecraft. The Multi-Mission Sequencing Team is responsible for collecting and processing the observations, experiments and engineering activities that are to be performed on a selected spacecraft. The collection of these activities is called a sequence and ultimately a sequence becomes a sequence of spacecraft commands. The operations teams check the sequence to make sure that no constraints are violated. The workflow process involves sending a program start command, which activates the Automatic Sequence Processor (ASP). The ASP is currently a file-based system composed of scripts written in Perl, C shell and awk. Once this start process is complete, the system checks for errors and aborts if there are any; otherwise the system converts the commands to binary, and then sends the resultant information to be radiated to the spacecraft.

  2. VizieR Online Data Catalog: Habitable zones around main-sequence stars (Kopparapu+, 2014)

    NASA Astrophysics Data System (ADS)

    Kopparapu, R. K.; Ramirez, R. M.; Schottelkotte, J.; Kasting, J. F.; Domagal-Goldman, S.; Eymet, V.

    2017-08-01

    Language: Fortran 90. Code tested under the following compilers/operating systems: ifort/CentOS Linux. Description of input data: no input necessary. Description of output data: output files HZs.dat and HZ_coefficients.dat. System requirements: no major system requirement; Fortran compiler necessary. Calls to external routines: none. Additional comments: none (1 data file).

  3. Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma.

    PubMed

    Wrzeszczynski, Kazimierz O; Frank, Mayu O; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Vacic, Vladimir; Norel, Raquel; Bilal, Erhan; Bergmann, Ewa A; Moore Vogel, Julia L; Bruce, Jeffrey N; Lassman, Andrew B; Canoll, Peter; Grommes, Christian; Harvey, Steve; Parida, Laxmi; Michelini, Vanessa V; Zody, Michael C; Jobanputra, Vaidehi; Royyuru, Ajay K; Darnell, Robert B

    2017-08-01

    To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each. Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts. The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. NCT02725684.

  4. Model-based quality assessment and base-calling for second-generation sequencing data.

    PubMed

    Bravo, Héctor Corrada; Irizarry, Rafael A

    2010-09-01

    Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T's, between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance. © 2009, The International Biometric Society.

  5. The Kritzel System for handwriting interpretation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Qian, G.

    We present a new system for recognizing on-line cursive handwriting. The system, which is called the Kritzel System, has four features. First, the system characterizes handwriting as a sequence of feature vectors. Second, the system adapts itself to a particular writing style through a learning process. Third, the reasoning of the system is formulated in propositional logic with likelihoods. Fourth, the system can be readily linked with other English processing systems for lexical and contextual checking.

  6. Correcting for Sample Contamination in Genotype Calling of DNA Sequence Data

    PubMed Central

    Flickinger, Matthew; Jun, Goo; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min

    2015-01-01

    DNA sample contamination is a frequent problem in DNA sequencing studies and can result in genotyping errors and reduced power for association testing. We recently described methods to identify within-species DNA sample contamination based on sequencing read data, showed that our methods can reliably detect and estimate contamination levels as low as 1%, and suggested strategies to identify and remove contaminated samples from sequencing studies. Here we propose methods to model contamination during genotype calling as an alternative to removal of contaminated samples from further analyses. We compare our contamination-adjusted calls to calls that ignore contamination and to calls based on uncontaminated data. We demonstrate that, for moderate contamination levels (5%–20%), contamination-adjusted calls eliminate 48%–77% of the genotyping errors. For lower levels of contamination, our contamination correction methods produce genotypes nearly as accurate as those based on uncontaminated data. Our contamination correction methods are useful generally, but are particularly helpful for sample contamination levels from 2% to 20%. PMID:26235984
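
    To illustrate the idea of modeling contamination during genotype calling, here is a minimal sketch under simplifying assumptions that are not taken from the paper: reads are treated as independent binomial draws, the contaminating DNA is assumed to carry the alternate allele at the population allele frequency, and genotype priors follow Hardy-Weinberg proportions. The function names, the error rate, and the example numbers are all hypothetical.

```python
from math import comb

def alt_read_prob(genotype, alpha, pop_af, err=0.01):
    """Probability that a single read shows the alternate allele.

    genotype: number of alternate alleles carried by the sample (0, 1, or 2)
    alpha:    assumed contamination fraction (0 means no contamination)
    pop_af:   assumed population alternate-allele frequency of the contaminant
    err:      assumed per-read sequencing error rate
    """
    p_true = (1 - alpha) * genotype / 2 + alpha * pop_af
    return p_true * (1 - err) + (1 - p_true) * err

def genotype_posteriors(n_alt, n_total, alpha, pop_af, err=0.01):
    """Posterior over genotypes 0/1/2 given alt-read counts (toy model)."""
    priors = [(1 - pop_af) ** 2, 2 * pop_af * (1 - pop_af), pop_af ** 2]
    joint = []
    for g, prior in zip((0, 1, 2), priors):
        p = alt_read_prob(g, alpha, pop_af, err)
        lik = comb(n_total, n_alt) * p ** n_alt * (1 - p) ** (n_total - n_alt)
        joint.append(lik * prior)
    total = sum(joint)
    return [j / total for j in joint]

# Toy site: 6 alternate reads out of 30, population allele frequency 0.5.
ignoring = genotype_posteriors(6, 30, alpha=0.0, pop_af=0.5)
adjusted = genotype_posteriors(6, 30, alpha=0.2, pop_af=0.5)
call_ignoring = ignoring.index(max(ignoring))
call_adjusted = adjusted.index(max(adjusted))
```

    In this toy case the call that ignores contamination is heterozygous, while the contamination-adjusted call is homozygous reference, because 20% contamination alone is expected to contribute about 10% alternate reads.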

  7. Apparatus for improved DNA sequencing

    DOEpatents

    Douthart, R.J.; Crowell, S.L.

    1996-05-07

    This invention is a means for the rapid sequencing of DNA samples. More specifically, it consists of a new design direct blotting electrophoresis unit. The DNA sequence is deposited on a membrane attached to a rotating drum. Initial data compaction is facilitated by the use of a machined multi-channeled plate called a ribbon channel plate. Each channel is an isolated mini gel system much like a gel filled capillary. The system as a whole, however, is in a slab gel like format with the advantages of uniformity and easy reusability. The system can be used in different embodiments. The drum system is unique in that after deposition the drum rotates the deposited DNA into a large non-buffer open space where processing and detection can occur. The drum can also be removed in toto to special workstations for downstream processing, multiplexing and detection. 18 figs.

  8. Apparatus for improved DNA sequencing

    DOEpatents

    Douthart, Richard J.; Crowell, Shannon L.

    1996-01-01

    This invention is a means for the rapid sequencing of DNA samples. More specifically, it consists of a new design direct blotting electrophoresis unit. The DNA sequence is deposited on a membrane attached to a rotating drum. Initial data compaction is facilitated by the use of a machined multi-channeled plate called a ribbon channel plate. Each channel is an isolated mini gel system much like a gel filled capillary. The system as a whole, however, is in a slab gel like format with the advantages of uniformity and easy reusability. The system can be used in different embodiments. The drum system is unique in that after deposition the drum rotates the deposited DNA into a large non-buffer open space where processing and detection can occur. The drum can also be removed in toto to special workstations for downstream processing, multiplexing and detection.

  9. Characterization of a Fluorescent Protein Reporter System

    DTIC Science & Technology

    2008-03-01

    pathways are initiated with the binding of a small molecule to a catalytic ribonucleic acid molecule (RNA), called a ribozyme (Thodima et al., 2006). The... ribozyme is part of a larger RNA construct, called a riboswitch, which initiates translation of a specific genetic sequence on a plasmid (circular...protein gene. Yen et al. (2004) reported insertion of a self-cleaving ribozyme upstream of the reporter gene. In the absence of a regulator (“off

  10. VarDict: a novel and versatile variant caller for next-generation sequencing in cancer research

    PubMed Central

    Lai, Zhongwu; Markovets, Aleksandra; Ahdesmaki, Miika; Chapman, Brad; Hofmann, Oliver; McEwen, Robert; Johnson, Justin; Dougherty, Brian; Barrett, J. Carl; Dry, Jonathan R.

    2016-01-01

    Accurate variant calling in next generation sequencing (NGS) is critical to understand cancer genomes better. Here we present VarDict, a novel and versatile variant caller for both DNA- and RNA-sequencing data. VarDict simultaneously calls SNV, MNV, InDels, complex and structural variants, expanding the detected genetic driver landscape of tumors. It performs local realignments on the fly for more accurate allele frequency estimation. VarDict performance scales linearly to sequencing depth, enabling ultra-deep sequencing used to explore tumor evolution or detect tumor DNA circulating in blood. In addition, VarDict performs amplicon aware variant calling for polymerase chain reaction (PCR)-based targeted sequencing often used in diagnostic settings, and is able to detect PCR artifacts. Finally, VarDict also detects differences in somatic and loss of heterozygosity variants between paired samples. VarDict reprocessing of The Cancer Genome Atlas (TCGA) Lung Adenocarcinoma dataset called known driver mutations in KRAS, EGFR, BRAF, PIK3CA and MET in 16% more patients than previously published variant calls. We believe VarDict will greatly facilitate application of NGS in clinical cancer research. PMID:27060149

  11. The influence of bat echolocation call duration and timing on auditory encoding of predator distance in noctuoid moths.

    PubMed

    Gordon, Shira D; Ter Hofstede, Hannah M

    2018-03-22

    Animals co-occur with multiple predators, making sensory systems that can encode information about diverse predators advantageous. Moths in the families Noctuidae and Erebidae have ears with two auditory receptor cells (A1 and A2) used to detect the echolocation calls of predatory bats. Bat communities contain species that vary in echolocation call duration, and the dynamic range of A1 is limited by the duration of sound, suggesting that A1 provides less information about bats with shorter echolocation calls. To test this hypothesis, we obtained intensity-response functions for both receptor cells across many moth species for sound pulse durations representing the range of echolocation call durations produced by bat species in northeastern North America. We found that the threshold and dynamic range of both cells varied with sound pulse duration. The number of A1 action potentials per sound pulse increases linearly with increasing amplitude for long-duration pulses, saturating near the A2 threshold. For short sound pulses, however, A1 saturates with only a few action potentials per pulse at amplitudes far lower than the A2 threshold for both single sound pulses and pulse sequences typical of searching or approaching bats. Neural adaptation was only evident in response to approaching bat sequences at high amplitudes, not search-phase sequences. These results show that, for short echolocation calls, a large range of sound levels cannot be coded by moth auditory receptor activity, resulting in no information about the distance of a bat, although differences in activity between ears might provide information about direction. © 2018. Published by The Company of Biologists Ltd.

  12. GI-POP: a combinational annotation and genomic island prediction pipeline for ongoing microbial genome projects.

    PubMed

    Lee, Chi-Ching; Chen, Yi-Ping Phoebe; Yao, Tzu-Jung; Ma, Cheng-Yu; Lo, Wei-Cheng; Lyu, Ping-Chiang; Tang, Chuan Yi

    2013-04-10

    Sequencing of microbial genomes is important because of microbial-carrying antibiotic and pathogenetic activities. However, even with the help of new assembling software, finishing a whole genome is a time-consuming task. In most bacteria, pathogenetic or antibiotic genes are carried in genomic islands. Therefore, a quick genomic island (GI) prediction method is useful for ongoing sequencing genomes. In this work, we built a Web server called GI-POP (http://gipop.life.nthu.edu.tw) which integrates a sequence assembling tool, a functional annotation pipeline, and a high-performance GI predicting module, in a support vector machine (SVM)-based method called genomic island genomic profile scanning (GI-GPS). The draft genomes of the ongoing genome projects in contigs or scaffolds can be submitted to our Web server, and it provides the functional annotation and highly probable GI-predicting results. GI-POP is a comprehensive annotation Web server designed for ongoing genome project analysis. Researchers can perform annotation and obtain pre-analytic information, including possible GIs, coding/non-coding sequences and functional analysis, from their draft genomes. This pre-analytic system can provide useful information for finishing a genome sequencing project. Copyright © 2012 Elsevier B.V. All rights reserved.

  13. Virtual Machine Language

    NASA Technical Reports Server (NTRS)

    Grasso, Christopher; Page, Dennis; O'Reilly, Taifun; Fteichert, Ralph; Lock, Patricia; Lin, Imin; Naviaux, Keith; Sisino, John

    2005-01-01

    Virtual Machine Language (VML) is a mission-independent, reusable software system for programming for spacecraft operations. Features of VML include a rich set of data types, named functions, parameters, IF and WHILE control structures, polymorphism, and on-the-fly creation of spacecraft commands from calculated values. Spacecraft functions can be abstracted into named blocks that reside in files aboard the spacecraft. These named blocks accept parameters and execute in a repeatable fashion. The sizes of uplink products are minimized by the ability to call blocks that implement most of the command steps. This block approach also enables some autonomous operations aboard the spacecraft, such as aerobraking, telemetry conditional monitoring, and anomaly response, without developing autonomous flight software. Operators on the ground write blocks and command sequences in a concise, high-level, human-readable programming language (also called VML). A compiler translates the human-readable blocks and command sequences into binary files (the operations products). The flight portion of VML interprets the uplinked binary files. The ground subsystem of VML also includes an interactive sequence-execution tool hosted on workstations, which runs sequences at several thousand times real-time speed, affords debugging, and generates reports. This tool enables iterative development of blocks and sequences within times of the order of seconds.

  14. Evaluation of exome variants using the Ion Proton Platform to sequence error-prone regions.

    PubMed

    Seo, Heewon; Park, Yoomi; Min, Byung Joo; Seo, Myung Eui; Kim, Ju Han

    2017-01-01

    The Ion Proton sequencer from Thermo Fisher accurately determines sequence variants from target regions with a rapid turnaround time at a low cost. However, misleading variant-calling errors can occur. We performed a systematic evaluation and manual curation of read-level alignments for the 675 ultrarare variants that were reported by the Ion Proton sequencer from 27 whole-exome sequencing data sets but are not present in either the 1000 Genomes Project or the Exome Aggregation Consortium. We classified positive variant calls into 393 highly likely false positives, 126 likely false positives, and 156 likely true positives, which comprised 58.2%, 18.7%, and 23.1% of the variants, respectively. We identified four distinct error patterns of variant calling that may be bioinformatically corrected when using different strategies: simplicity region, SNV cluster, peripheral sequence read, and base inversion. Local de novo assembly successfully corrected 201 (38.7%) of the 519 highly likely or likely false positives. We also demonstrate that the two sequencing kits from Thermo Fisher (the Ion PI Sequencing 200 kit V3 and the Ion PI Hi-Q kit) exhibit different error profiles across different error types. A refined calling algorithm with better polymerase may improve the performance of the Ion Proton sequencing platform.
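
    The percentages quoted in this abstract can be re-derived from the reported counts. The short check below uses only the numbers stated above; the variable names are illustrative.

```python
# Counts as reported in the abstract.
counts = {
    "highly likely false positive": 393,
    "likely false positive": 126,
    "likely true positive": 156,
}

total = sum(counts.values())  # all curated ultrarare variants
shares = {label: round(100 * n / total, 1) for label, n in counts.items()}

# Share of the false positives (highly likely + likely) fixed by local de novo assembly.
false_positives = counts["highly likely false positive"] + counts["likely false positive"]
corrected_share = round(100 * 201 / false_positives, 1)

for label, pct in shares.items():
    print(f"{label}: {pct}%")
print(f"corrected by local assembly: {corrected_share}%")
```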

  15. Utilization of sequence on relatives to improve analysis of individuals' low-coverage NGS data

    USDA-ARS's Scientific Manuscript database

    Low-coverage sequence data is expected to have low call rates under the prevailing paradigm that genotypes are first “called” from sequence data of each individual independently and subsequent analyses (including determination of haplotypes) are dependent on those called genotypes. However, provide...

  16. Robust sensorimotor representation to physical interaction changes in humanoid motion learning.

    PubMed

    Shimizu, Toshihiko; Saegusa, Ryo; Ikemoto, Shuhei; Ishiguro, Hiroshi; Metta, Giorgio

    2015-05-01

    This paper proposes a learning from demonstration system based on a motion feature, called phase transfer sequence. The system aims to synthesize the knowledge on humanoid whole body motions learned during teacher-supported interactions, and apply this knowledge during different physical interactions between a robot and its surroundings. The phase transfer sequence represents the temporal order of the changing points in multiple time sequences. It encodes the dynamical aspects of the sequences so as to absorb the gaps in timing and amplitude derived from interaction changes. The phase transfer sequence was evaluated in reinforcement learning of sitting-up and walking motions conducted by a real humanoid robot and compatible simulator. In both tasks, the robotic motions were less dependent on physical interactions when learned by the proposed feature than by conventional similarity measurements. Phase transfer sequence also enhanced the convergence speed of motion learning. Our proposed feature is original primarily because it absorbs the gaps caused by changes of the originally acquired physical interactions, thereby enhancing the learning speed in subsequent interactions.

  17. Automated constraint checking of spacecraft command sequences

    NASA Astrophysics Data System (ADS)

    Horvath, Joan C.; Alkalaj, Leon J.; Schneider, Karl M.; Spitale, Joseph M.; Le, Dang

    1995-01-01

    Robotic spacecraft are controlled by onboard sets of commands called "sequences." Determining that sequences will have the desired effect on the spacecraft can be expensive in terms of both labor and computer coding time, with different particular costs for different types of spacecraft. Specification languages and appropriate user interface to the languages can be used to make the most effective use of engineering validation time. This paper describes one specification and verification environment ("SAVE") designed for validating that command sequences have not violated any flight rules. This SAVE system was subsequently adapted for flight use on the TOPEX/Poseidon spacecraft. The relationship of this work to rule-based artificial intelligence and to other specification techniques is discussed, as well as the issues that arise in the transfer of technology from a research prototype to a full flight system.
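
    As a rough illustration of checking a command sequence against flight rules, the sketch below treats each rule as a function that returns a list of violations. It is not the SAVE system; the command names, the two rules, and their thresholds are invented for the example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Command:
    time_s: float  # scheduled execution time, in seconds from sequence start
    name: str

def check_min_spacing(seq, min_gap_s):
    """Hypothetical rule: consecutive commands must be at least min_gap_s apart."""
    violations = []
    for prev, cur in zip(seq, seq[1:]):
        if cur.time_s - prev.time_s < min_gap_s:
            violations.append(
                f"{cur.name} at {cur.time_s}s is within {min_gap_s}s of {prev.name}"
            )
    return violations

def check_requires_before(seq, required, target, min_lead_s):
    """Hypothetical rule: `target` needs a `required` command at least min_lead_s earlier."""
    violations = []
    last_required = None
    for cmd in seq:
        if cmd.name == required:
            last_required = cmd.time_s
        elif cmd.name == target:
            if last_required is None or cmd.time_s - last_required < min_lead_s:
                violations.append(
                    f"{target} at {cmd.time_s}s lacks {required} {min_lead_s}s earlier"
                )
    return violations

def check_sequence(seq):
    """Run every rule over the time-ordered sequence and collect violations."""
    ordered = sorted(seq, key=lambda c: c.time_s)
    return (
        check_min_spacing(ordered, min_gap_s=1.0)
        + check_requires_before(ordered, "HEATER_ON", "THRUSTER_FIRE", min_lead_s=60.0)
    )

good = [Command(0, "HEATER_ON"), Command(90, "THRUSTER_FIRE"), Command(100, "HEATER_OFF")]
bad = [Command(0, "HEATER_ON"), Command(30, "THRUSTER_FIRE")]
print(check_sequence(good))
print(check_sequence(bad))
```

    Keeping each rule as a separate function means new flight rules can be added without touching the existing ones, which is one simple way to separate the rule specification from the checking loop.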

  18. The instant sequencing task: Toward constraint-checking a complex spacecraft command sequence interactively

    NASA Technical Reports Server (NTRS)

    Horvath, Joan C.; Alkalaj, Leon J.; Schneider, Karl M.; Amador, Arthur V.; Spitale, Joseph N.

    1993-01-01

    Robotic spacecraft are controlled by sets of commands called 'sequences.' These sequences must be checked against mission constraints. Making our existing constraint checking program faster would enable new capabilities in our uplink process. Therefore, we are rewriting this program to run on a parallel computer. To do so, we had to determine how to run constraint-checking algorithms in parallel and create a new method of specifying spacecraft models and constraints. This new specification gives us a means of representing flight systems and their predicted response to commands which could be used in a variety of applications throughout the command process, particularly during anomaly or high-activity operations. This commonality could reduce operations cost and risk for future complex missions. Lessons learned in applying some parts of this system to the TOPEX/Poseidon mission will be described.

  19. Functional and computational analysis of amino acid patterns predictive of type III secretion system substrates in Pseudomonas syringae

    USDA-ARS's Scientific Manuscript database

    Bacterial type III secretion systems (T3SSs) deliver proteins called effectors into eukaryotic cells. Although N-terminal amino acid sequences are required for translocation, the mechanism of substrate recognition by the T3SS is unknown. Almost all actively deployed T3SS substrates in the plant path...

  20. Generating Nice Linear Systems for Matrix Gaussian Elimination

    ERIC Educational Resources Information Center

    Homewood, L. James

    2004-01-01

    In this article an augmented matrix that represents a system of linear equations is called nice if a sequence of elementary row operations that reduces the matrix to row-echelon form, through matrix Gaussian elimination, does so by restricting all entries to integers in every step. Many instructors wish to use the example of matrix Gaussian…
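
    One way to produce such systems, offered here as an assumption and not as the article's own construction, is to multiply a unit lower-triangular integer matrix by a unit upper-triangular integer matrix. Elimination without row swaps or scaling then meets a pivot of 1 in every column, so all multipliers and all intermediate entries are integers. The sketch below builds such an augmented matrix and verifies integrality with exact rational arithmetic; the check covers only this one elimination order.

```python
from fractions import Fraction
import random

def make_nice_system(n, seed=0, lo=-3, hi=3):
    """Build an augmented matrix [A | b] with A = L*U (unit triangular, integer)."""
    rng = random.Random(seed)
    L = [[1 if i == j else (rng.randint(lo, hi) if i > j else 0) for j in range(n)]
         for i in range(n)]
    U = [[1 if i == j else (rng.randint(lo, hi) if j > i else 0) for j in range(n)]
         for i in range(n)]
    A = [[sum(L[i][k] * U[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
    x = [rng.randint(lo, hi) for _ in range(n)]  # chosen integer solution
    b = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
    return [row + [bi] for row, bi in zip(A, b)], x

def stays_integer_without_pivoting(aug):
    """Forward-eliminate with no row swaps or scaling; report whether every entry stays integral."""
    m = [[Fraction(v) for v in row] for row in aug]
    n = len(m)
    for col in range(n):
        pivot = m[col][col]
        if pivot == 0:
            return False  # this simple strategy would need a row swap here
        for r in range(col + 1, n):
            factor = m[r][col] / pivot
            m[r] = [a - factor * p for a, p in zip(m[r], m[col])]
            if any(v.denominator != 1 for v in m[r]):
                return False
    return True

aug, solution = make_nice_system(3, seed=1)
print(aug, stays_integer_without_pivoting(aug))
```

    A matrix that fails this check may still be nice in the article's sense, since a different sequence of row operations (for example, one that starts with a row swap) could keep every entry an integer.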

  1. Variant calling in low-coverage whole genome sequencing of a Native American population sample.

    PubMed

    Bizon, Chris; Spiegel, Michael; Chasse, Scott A; Gizer, Ian R; Li, Yun; Malc, Ewa P; Mieczkowski, Piotr A; Sailsbery, Josh K; Wang, Xiaoshu; Ehlers, Cindy L; Wilhelmsen, Kirk C

    2014-01-30

    The reduction in the cost of sequencing a human genome has led to the use of genotype sampling strategies in order to impute and infer the presence of sequence variants that can then be tested for associations with traits of interest. Low-coverage Whole Genome Sequencing (WGS) is a sampling strategy that overcomes some of the deficiencies seen in fixed content SNP array studies. Linkage-disequilibrium (LD) aware variant callers, such as the program Thunder, may provide a calling rate and accuracy that makes a low-coverage sequencing strategy viable. We examined the performance of an LD-aware variant calling strategy in a population of 708 low-coverage whole genome sequences from a community sample of Native Americans. We assessed variant calling through a comparison of the sequencing results to genotypes measured in 641 of the same subjects using a fixed content first generation exome array. The comparison was made using the variant calling routines GATK Unified Genotyper program and the LD-aware variant caller Thunder. Thunder was found to improve concordance in a coverage dependent fashion, while correctly calling nearly all of the common variants as well as a high percentage of the rare variants present in the sample. Low-coverage WGS is a strategy that appears to collect genetic information intermediate in scope between fixed content genotyping arrays and deep-coverage WGS. Our data suggests that low-coverage WGS is a viable strategy with a greater chance of discovering novel variants and associations than fixed content arrays for large sample association analyses.

  2. SeqReporter: automating next-generation sequencing result interpretation and reporting workflow in a clinical laboratory.

    PubMed

    Roy, Somak; Durso, Mary Beth; Wald, Abigail; Nikiforov, Yuri E; Nikiforova, Marina N

    2014-01-01

    A wide repertoire of bioinformatics applications exists for next-generation sequencing data analysis; however, certain requirements of the clinical molecular laboratory limit their use: i) comprehensive report generation, ii) compatibility with existing laboratory information systems and computer operating system, iii) knowledgebase development, iv) quality management, and v) data security. SeqReporter is a web-based application developed using ASP.NET framework version 4.0. The client-side was designed using HTML5, CSS3, and JavaScript. The server-side processing (VB.NET) relied on interaction with a customized SQL Server 2008 R2 database. Overall, 104 cases (1062 variant calls) were analyzed by SeqReporter. Each variant call was classified into one of five report levels: i) known clinical significance, ii) uncertain clinical significance, iii) pending pathologists' review, iv) synonymous and deep intronic, and v) platform and panel-specific sequence errors. SeqReporter correctly annotated and classified 99.9% (859 of 860) of sequence variants, including 68.7% synonymous single-nucleotide variants, 28.3% nonsynonymous single-nucleotide variants, 1.7% insertions, and 1.3% deletions. One variant of potential clinical significance was re-classified after pathologist review. Laboratory information system-compatible clinical reports were generated automatically. SeqReporter also facilitated quality management activities. SeqReporter is an example of a customized and well-designed informatics solution to optimize and automate the downstream analysis of clinical next-generation sequencing data. We propose it as a model that may envisage the development of a comprehensive clinical informatics solution. Copyright © 2014 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.

  3. Command system output bit verification

    NASA Technical Reports Server (NTRS)

    Odd, C. W.; Abbate, S. F.

    1981-01-01

    An automatic test was developed to test the ability of the deep space station (DSS) command subsystem and exciter to generate and radiate, from the exciter, the correct idle bit sequence for a given flight project or to store and radiate received command data elements and files without alteration. This test, called the command system output bit verification test, is an extension of the command system performance test (SPT) and can be selected as an SPT option. The test compares the bit stream radiated from the DSS exciter with reference sequences generated by the SPT software program. The command subsystem and exciter are verified when the bit stream and reference sequences are identical. It is a key element of the acceptance testing conducted on the command processor assembly (CPA) operational program (DMC-0584-OP-G) prior to its transfer from development to operations.
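
    The pass criterion described here, that the radiated bit stream and the reference sequence are identical, can be sketched as a position-by-position comparison that also reports where the first difference occurs. The bit patterns and function names below are hypothetical and do not represent the actual SPT software.

```python
def first_mismatch(radiated, reference):
    """Return the index of the first differing bit, or None if the streams are identical."""
    for i, (a, b) in enumerate(zip(radiated, reference)):
        if a != b:
            return i
    if len(radiated) != len(reference):
        # One stream is a strict prefix of the other; report where it ends.
        return min(len(radiated), len(reference))
    return None

def verify_output(radiated, reference):
    """The subsystem is considered verified only when the two streams match exactly."""
    return first_mismatch(radiated, reference) is None

reference = "10" * 8                 # assumed idle pattern, for illustration only
radiated_ok = "1010101010101010"
radiated_bad = "1010101110101010"

print(verify_output(radiated_ok, reference))
print(first_mismatch(radiated_bad, reference))
```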

  4. Fundamental Mechanisms of NeuroInformation Processing: Inverse Problems and Spike Processing

    DTIC Science & Technology

    2016-08-04

    platform called Neurokernel for collaborative development of comprehensive models of the brain of the fruit fly Drosophila melanogaster and their execution...example. We investigated the following nonlinear identification problem: given both the input signal $u$ and the time sequence $(t_k)_{k \in \mathbb{Z}}$ at the output of...from a time sequence is to be contrasted with existing methods for rate-based models in neuroscience. In such models the output of the system is taken

  5. Comparing sequencing assays and human-machine analyses in actionable genomics for glioblastoma

    PubMed Central

    Wrzeszczynski, Kazimierz O.; Frank, Mayu O.; Koyama, Takahiko; Rhrissorrakrai, Kahn; Robine, Nicolas; Utro, Filippo; Emde, Anne-Katrin; Chen, Bo-Juen; Arora, Kanika; Shah, Minita; Vacic, Vladimir; Norel, Raquel; Bilal, Erhan; Bergmann, Ewa A.; Moore Vogel, Julia L.; Bruce, Jeffrey N.; Lassman, Andrew B.; Canoll, Peter; Grommes, Christian; Harvey, Steve; Parida, Laxmi; Michelini, Vanessa V.; Zody, Michael C.; Jobanputra, Vaidehi; Royyuru, Ajay K.

    2017-01-01

    Objective: To analyze a glioblastoma tumor specimen with 3 different platforms and compare potentially actionable calls from each. Methods: Tumor DNA was analyzed by a commercial targeted panel. In addition, tumor-normal DNA was analyzed by whole-genome sequencing (WGS) and tumor RNA was analyzed by RNA sequencing (RNA-seq). The WGS and RNA-seq data were analyzed by a team of bioinformaticians and cancer oncologists, and separately by IBM Watson Genomic Analytics (WGA), an automated system for prioritizing somatic variants and identifying drugs. Results: More variants were identified by WGS/RNA analysis than by targeted panels. WGA completed a comparable analysis in a fraction of the time required by the human analysts. Conclusions: The development of an effective human-machine interface in the analysis of deep cancer genomic datasets may provide potentially clinically actionable calls for individual patients in a more timely and efficient manner than currently possible. ClinicalTrials.gov identifier: NCT02725684. PMID:28740869

  6. Dynamic variable selection in SNP genotype autocalling from APEX microarray data.

    PubMed

    Podder, Mohua; Welch, William J; Zamar, Ruben H; Tebbutt, Scott J

    2006-11-30

    Single nucleotide polymorphisms (SNPs) are DNA sequence variations, occurring when a single nucleotide--adenine (A), thymine (T), cytosine (C) or guanine (G)--is altered. Arguably, SNPs account for more than 90% of human genetic variation. Our laboratory has developed a highly redundant SNP genotyping assay consisting of multiple probes with signals from multiple channels for a single SNP, based on arrayed primer extension (APEX). This mini-sequencing method is a powerful combination of a highly parallel microarray with distinctive Sanger-based dideoxy terminator sequencing chemistry. Using this microarray platform, our current genotype calling system (known as SNP Chart) is capable of calling single SNP genotypes by manual inspection of the APEX data, which is time-consuming and exposed to user subjectivity bias. Using a set of 32 Coriell DNA samples plus three negative PCR controls as a training data set, we have developed a fully-automated genotyping algorithm based on simple linear discriminant analysis (LDA) using dynamic variable selection. The algorithm combines separate analyses based on the multiple probe sets to give a final posterior probability for each candidate genotype. We have tested our algorithm on a completely independent data set of 270 DNA samples, with validated genotypes, from patients admitted to the intensive care unit (ICU) of St. Paul's Hospital (plus one negative PCR control sample). Our method achieves a concordance rate of 98.9% with a 99.6% call rate for a set of 96 SNPs. By adjusting the threshold value for the final posterior probability of the called genotype, the call rate reduces to 94.9% with a higher concordance rate of 99.6%. We also reversed the two independent data sets in their training and testing roles, achieving a concordance rate up to 99.8%. The strength of this APEX chemistry-based platform is its unique redundancy having multiple probes for a single SNP. Our model-based genotype calling algorithm captures the redundancy in the system considering all the underlying probe features of a particular SNP, automatically down-weighting any 'bad data' corresponding to image artifacts on the microarray slide or failure of a specific chemistry. In this regard, our method is able to automatically select the probes which work well and reduce the effect of other so-called bad performing probes in a sample-specific manner, for any number of SNPs.
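
    The trade-off between call rate and concordance reported in this abstract follows from thresholding the posterior probability of the best genotype: a stricter threshold leaves more samples uncalled but makes the remaining calls more reliable. The sketch below shows that mechanism on invented posteriors; it does not reproduce the LDA model or the study data.

```python
def call_genotypes(posteriors, threshold):
    """Call the most probable genotype, or None (no call) if its posterior is below threshold."""
    calls = []
    for post in posteriors:
        best = max(post, key=post.get)
        calls.append(best if post[best] >= threshold else None)
    return calls

def call_rate_and_concordance(calls, truth):
    """Call rate = called / all samples; concordance = correct / called."""
    called = [(c, t) for c, t in zip(calls, truth) if c is not None]
    call_rate = len(called) / len(calls)
    concordance = (
        sum(c == t for c, t in called) / len(called) if called else float("nan")
    )
    return call_rate, concordance

# Invented posteriors for four samples at one SNP, with their validated genotypes.
posteriors = [
    {"AA": 0.99, "AB": 0.01, "BB": 0.00},
    {"AA": 0.05, "AB": 0.90, "BB": 0.05},
    {"AA": 0.55, "AB": 0.45, "BB": 0.00},  # ambiguous sample, true genotype is AB
    {"AA": 0.00, "AB": 0.02, "BB": 0.98},
]
truth = ["AA", "AB", "AB", "BB"]

loose = call_rate_and_concordance(call_genotypes(posteriors, 0.5), truth)
strict = call_rate_and_concordance(call_genotypes(posteriors, 0.8), truth)
print(loose, strict)
```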

  7. Describing sequencing results of structural chromosome rearrangements with a suggested next-generation cytogenetic nomenclature.

    PubMed

    Ordulu, Zehra; Wong, Kristen E; Currall, Benjamin B; Ivanov, Andrew R; Pereira, Shahrin; Althari, Sara; Gusella, James F; Talkowski, Michael E; Morton, Cynthia C

    2014-05-01

    With recent rapid advances in genomic technologies, precise delineation of structural chromosome rearrangements at the nucleotide level is becoming increasingly feasible. In this era of "next-generation cytogenetics" (i.e., an integration of traditional cytogenetic techniques and next-generation sequencing), a consensus nomenclature is essential for accurate communication and data sharing. Currently, nomenclature for describing the sequencing data of these aberrations is lacking. Herein, we present a system called Next-Gen Cytogenetic Nomenclature, which is concordant with the International System for Human Cytogenetic Nomenclature (2013). This system starts with the alignment of rearrangement sequences by BLAT or BLAST (alignment tools) and arrives at a concise and detailed description of chromosomal changes. To facilitate usage and implementation of this nomenclature, we are developing a program designated BLA(S)T Output Sequence Tool of Nomenclature (BOSToN), a demonstrative version of which is accessible online. A standardized characterization of structural chromosomal rearrangements is essential both for research analyses and for application in the clinical setting. Copyright © 2014 The American Society of Human Genetics. Published by Elsevier Inc. All rights reserved.

  8. The Geogenomic Mutational Atlas of Pathogens (GoMAP) Web System

    PubMed Central

    Sargeant, David P.; Hedden, Michael W.; Deverasetty, Sandeep; Strong, Christy L.; Alaniz, Izua J.; Bartlett, Alexandria N.; Brandon, Nicholas R.; Brooks, Steven B.; Brown, Frederick A.; Bufi, Flaviona; Chakarova, Monika; David, Roxanne P.; Dobritch, Karlyn M.; Guerra, Horacio P.; Levit, Kelvy S.; Mathew, Kiran R.; Matti, Ray; Maza, Dorothea Q.; Mistry, Sabyasachy; Novakovic, Nemanja; Pomerantz, Austin; Rafalski, Timothy F.; Rathnayake, Viraj; Rezapour, Noura; Ross, Christian A.; Schooler, Steve G.; Songao, Sarah; Tuggle, Sean L.; Wing, Helen J.; Yousif, Sandy; Schiller, Martin R.

    2014-01-01

    We present a new approach for pathogen surveillance we call Geogenomics. Geogenomics examines the geographic distribution of the genomes of pathogens, with a particular emphasis on those mutations that give rise to drug resistance. We engineered a new web system called Geogenomic Mutational Atlas of Pathogens (GoMAP) that enables investigation of the global distribution of individual drug resistance mutations. As a test case we examined mutations associated with HIV resistance to FDA-approved antiretroviral drugs. GoMAP-HIV makes use of existing public drug resistance and HIV protein sequence data to examine the distribution of 872 drug resistance mutations in ∼502,000 sequences for many countries in the world. We also implemented a broadened classification scheme for HIV drug resistance mutations. Several patterns for geographic distributions of resistance mutations were identified by visual mining using this web tool. GoMAP-HIV is an open access web application available at http://www.bio-toolkit.com/GoMap/project/ PMID:24675726

  9. The Geogenomic Mutational Atlas of Pathogens (GoMAP) web system.

    PubMed

    Sargeant, David P; Hedden, Michael W; Deverasetty, Sandeep; Strong, Christy L; Alaniz, Izua J; Bartlett, Alexandria N; Brandon, Nicholas R; Brooks, Steven B; Brown, Frederick A; Bufi, Flaviona; Chakarova, Monika; David, Roxanne P; Dobritch, Karlyn M; Guerra, Horacio P; Levit, Kelvy S; Mathew, Kiran R; Matti, Ray; Maza, Dorothea Q; Mistry, Sabyasachy; Novakovic, Nemanja; Pomerantz, Austin; Rafalski, Timothy F; Rathnayake, Viraj; Rezapour, Noura; Ross, Christian A; Schooler, Steve G; Songao, Sarah; Tuggle, Sean L; Wing, Helen J; Yousif, Sandy; Schiller, Martin R

    2014-01-01

    We present a new approach for pathogen surveillance we call Geogenomics. Geogenomics examines the geographic distribution of the genomes of pathogens, with a particular emphasis on those mutations that give rise to drug resistance. We engineered a new web system called Geogenomic Mutational Atlas of Pathogens (GoMAP) that enables investigation of the global distribution of individual drug resistance mutations. As a test case we examined mutations associated with HIV resistance to FDA-approved antiretroviral drugs. GoMAP-HIV makes use of existing public drug resistance and HIV protein sequence data to examine the distribution of 872 drug resistance mutations in ∼ 502,000 sequences for many countries in the world. We also implemented a broadened classification scheme for HIV drug resistance mutations. Several patterns for geographic distributions of resistance mutations were identified by visual mining using this web tool. GoMAP-HIV is an open access web application available at http://www.bio-toolkit.com/GoMap/project/

  10. TotalReCaller: improved accuracy and performance via integrated alignment and base-calling.

    PubMed

    Menges, Fabian; Narzisi, Giuseppe; Mishra, Bud

    2011-09-01

    Currently, re-sequencing approaches use multiple modules serially to interpret raw sequencing data from next-generation sequencing platforms, while remaining oblivious to the genomic information until the final alignment step. Such approaches fail to exploit the full information from both raw sequencing data and the reference genome that can yield better quality sequence reads, SNP-calls, variant detection, as well as an alignment at the best possible location in the reference genome. Thus, there is a need for novel reference-guided bioinformatics algorithms for interpreting analog signals representing sequences of the bases ({A, C, G, T}), while simultaneously aligning possible sequence reads to a source reference genome whenever available. Here, we propose a new base-calling algorithm, TotalReCaller, to achieve improved performance. A linear error model for the raw intensity data and Burrows-Wheeler transform (BWT) based alignment are combined utilizing a Bayesian score function, which is then globally optimized over all possible genomic locations using an efficient branch-and-bound approach. The algorithm has been implemented in soft- and hardware [field-programmable gate array (FPGA)] to achieve real-time performance. Empirical results on real high-throughput Illumina data were used to evaluate TotalReCaller's performance relative to its peers (Bustard, BayesCall, Ibis and Rolexa) based on several criteria, particularly those important in clinical and scientific applications. Namely, it was evaluated for (i) its base-calling speed and throughput, (ii) its read accuracy and (iii) its specificity and sensitivity in variant calling. A software implementation of TotalReCaller, as well as additional information, is available at http://bioinformatics.nyu.edu/wordpress/projects/totalrecaller/. Contact: fabian.menges@nyu.edu.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Stoiber, Marcus H.; Brown, James B.

    This software implements the first base caller for nanopore data that calls bases directly from raw data. The basecRAWller algorithm has two major advantages over current nanopore base calling software: (1) streaming base calling and (2) base calling from information rich raw signal. The ability to perform truly streaming base calling as signal is received from the sequencer can be very powerful as this is one of the major advantages of this technology as compared to other sequencing technologies. As such enabling as much streaming potential as possible will be incredibly important as this technology continues to become more widely applied in biosciences. All other base callers currently employ the Viterbi algorithm which requires the whole sequence to employ the complete base calling procedure and thus precludes a natural streaming base calling procedure. The other major advantage of the basecRAWller algorithm is the prediction of bases from raw signal which contains much richer information than the segmented chunks that current algorithms employ. This leads to the potential for much more accurate base calls which would make this technology much more valuable to all of the growing user base for this technology.

  12. [The dilemma of data flood - reducing costs and increasing quality control].

    PubMed

    Gassmann, B

    2012-09-05

    Digitization is found everywhere in sonography. Ultrasound images are printed on a video printer with special paper only in isolated cases. Sonography procedures are increasingly documented by saving image sequences instead of still frames. Echocardiography is now routinely recorded as so-called R-R loops. In contrast-enhanced ultrasound, recording sequences is necessary to get a thorough impression of the vascular structure of interest. Handling this data flood in daily practice requires specialized software. Comparing stored and recent images or sequences at follow-up is very helpful. Quality control of the ultrasound system and the transducers is nevertheless simple and safe: when a phantom is used to check detail resolution and general image quality, the stored images and sequences remain comparable over the life cycle of the system. Comparison at follow-up immediately reveals decreased image quality and transducer defects.

  13. Evaluation and optimisation of indel detection workflows for ion torrent sequencing of the BRCA1 and BRCA2 genes.

    PubMed

    Yeo, Zhen Xuan; Wong, Joshua Chee Leong; Rozen, Steven G; Lee, Ann Siew Gek

    2014-06-24

    The Ion Torrent PGM is a popular benchtop sequencer that shows promise in replacing conventional Sanger sequencing as the gold standard for mutation detection. Despite the PGM's reported high accuracy in calling single nucleotide variations, it tends to generate many false positive calls in detecting insertions and deletions (indels), which may hinder its utility for clinical genetic testing. Recently, the proprietary analytical workflow for the Ion Torrent sequencer, Torrent Suite (TS), underwent a series of upgrades. We evaluated three major upgrades of TS by calling indels in the BRCA1 and BRCA2 genes. Our analysis revealed that false negative indels could be generated by TS under both default calling parameters and parameters adjusted for maximum sensitivity. However, indel calling with the same data using the open source variant callers, GATK and SAMtools showed that false negatives could be minimised with the use of appropriate bioinformatics analysis. Furthermore, we identified two variant calling measures, Quality-by-Depth (QD) and VARiation of the Width of gaps and inserts (VARW), which substantially reduced false positive indels, including non-homopolymer associated errors without compromising sensitivity. In our best case scenario that involved the TMAP aligner and SAMtools, we achieved 100% sensitivity, 99.99% specificity and 29% False Discovery Rate (FDR) in indel calling from all 23 samples, which is a good performance for mutation screening using PGM. New versions of TS, BWA and GATK have shown improvements in indel calling sensitivity and specificity over their older counterpart. However, the variant caller of TS exhibits a lower sensitivity than GATK and SAMtools. 
Our findings demonstrate that although indel calling from PGM sequences may appear to be noisy at first glance, proper computational indel calling analysis is able to maximize both the sensitivity and specificity at the single base level, paving the way for the usage of this technology for future clinical genetic testing.
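
    The Quality-by-Depth measure mentioned in this abstract can be illustrated with a short sketch. QD is taken here as the variant quality score divided by read depth at the site; the record layout (dictionaries with `qual` and `depth` keys) and the `min_qd` cut-off of 2.0 are assumptions for illustration and are not the values or data structures used in the study. VARW is not sketched because the abstract does not define how it is computed.

```python
def quality_by_depth(qual, depth):
    """QD: variant quality normalised by read depth at the site."""
    if depth <= 0:
        return 0.0  # no coverage: treat as lowest possible confidence
    return qual / depth


def filter_indels(calls, min_qd=2.0):
    """Keep only calls whose QD reaches the (illustrative) cut-off."""
    return [
        call for call in calls
        if quality_by_depth(call["qual"], call["depth"]) >= min_qd
    ]


if __name__ == "__main__":
    # Hypothetical indel calls, not data from the paper.
    calls = [
        {"id": "indel_a", "qual": 300.0, "depth": 100},
        {"id": "indel_b", "qual": 50.0, "depth": 100},
    ]
    print([c["id"] for c in filter_indels(calls)])
```

    Normalising by depth keeps a deeply covered site from passing on raw quality alone, which is one plausible reason such a measure helps against false positive indels.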

  14. Systematic and stochastic influences on the performance of the MinION nanopore sequencer across a range of nucleotide bias

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.

    Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is being able to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and the field. Genomes can have radically different GC content however, such that accurate sequence analysis can be challenging depending upon the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have diagnosed the quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systemic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization of the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are likely to lead to accurate and rapid field-forward diagnostics.

  15. Systematic and stochastic influences on the performance of the MinION nanopore sequencer across a range of nucleotide bias

    DOE PAGES

    Krishnakumar, Raga; Sinha, Anupama; Bird, Sara W.; ...

    2018-02-16

    Emerging sequencing technologies are allowing us to characterize environmental, clinical and laboratory samples with increasing speed and detail, including real-time analysis and interpretation of data. One example of this is being able to rapidly and accurately detect a wide range of pathogenic organisms, both in the clinic and the field. Genomes can have radically different GC content however, such that accurate sequence analysis can be challenging depending upon the technology used. Here, we have characterized the performance of the Oxford MinION nanopore sequencer for detection and evaluation of organisms with a range of genomic nucleotide bias. We have diagnosed the quality of base-calling across individual reads and discovered that the position within the read affects base-calling and quality scores. Finally, we have evaluated the performance of the current state-of-the-art neural network-based MinION basecaller, characterizing its behavior with respect to systemic errors as well as context- and sequence-specific errors. Overall, we present a detailed characterization of the capabilities of the MinION in terms of generating high-accuracy sequence data from genomes with a wide range of nucleotide content. This study provides a framework for designing the appropriate experiments that are likely to lead to accurate and rapid field-forward diagnostics.

  16. Using comparative genome analysis to identify problems in annotated microbial genomes.

    PubMed

    Poptsova, Maria S; Gogarten, J Peter

    2010-07-01

    Genome annotation is a tedious task that is mostly done by automated methods; however, the accuracy of these approaches has been questioned since the beginning of the sequencing era. Genome annotation is a multilevel process, and errors can emerge at different stages: during sequencing, as a result of gene-calling procedures, and in the process of assigning gene functions. Missed or wrongly annotated genes differentially impact different types of analyses. Here we discuss and demonstrate how the methods of comparative genome analysis can refine annotations by locating missing orthologues. We also discuss possible reasons for errors and show that the second-generation annotation systems, which combine multiple gene-calling programs with similarity-based methods, perform much better than the first annotation tools. Since old errors may propagate to the newly sequenced genomes, we emphasize that the problem of continuously updating popular public databases is an urgent and unresolved one. Due to the progress in genome-sequencing technologies, automated annotation techniques will remain the main approach in the future. Researchers need to be aware of the existing errors in the annotation of even well-studied genomes, such as Escherichia coli, and consider additional quality control for their results.

  17. On optimal fuzzy best proximity coincidence points of fuzzy order preserving proximal Ψ(σ, α)-lower-bounding asymptotically contractive mappings in non-Archimedean fuzzy metric spaces.

    PubMed

    De la Sen, Manuel; Abbas, Mujahid; Saleem, Naeem

    2016-01-01

    This paper discusses some convergence properties in fuzzy ordered proximal approaches defined by [Formula: see text]-sequences of pairs, where [Formula: see text] is a surjective self-mapping and [Formula: see text] where A and B are nonempty subsets of an abstract nonempty set X and [Formula: see text] is a partially ordered non-Archimedean fuzzy metric space which is endowed with a fuzzy metric M, a triangular norm * and an ordering [Formula: see text] The fuzzy set M takes values in a sequence or set [Formula: see text] where the elements of the so-called switching rule [Formula: see text] are defined from [Formula: see text] to a subset of [Formula: see text] Such a switching rule selects a particular realization of M at the nth iteration and it is parameterized by a growth evolution sequence [Formula: see text] and a sequence or set [Formula: see text] which belongs to the so-called [Formula: see text]-lower-bounding mappings which are defined from [0, 1] to [0, 1]. Some application examples concerning discrete systems under switching rules and best approximation solvability of algebraic equations are discussed.

  18. Exploitation of peptide motif sequences and their use in nanobiotechnology.

    PubMed

    Shiba, Kiyotaka

    2010-08-01

    Short amino acid sequences extracted from natural proteins or created using in vitro evolution systems are sometimes associated with particular biological functions. These peptides, called peptide motifs, can serve as functional units for the creation of various tools for nanobiotechnology. In particular, peptide motifs that have the ability to specifically recognize the surfaces of solid materials and to mineralize certain inorganic materials have been linking biological science to material science. Here, I review how these peptide motifs have been isolated from natural proteins or created using in vitro evolution systems, and how they have been used in the nanobiotechnology field. Copyright © 2010 Elsevier Ltd. All rights reserved.

  19. Learning to Observe in a Geomorphological Context

    ERIC Educational Resources Information Center

    Martinez, Patricia; Bannan-Ritland, Brenda; Peters, Erin E.; Baek, John

    2011-01-01

    This three-lesson sequence, addressing the topic of slow geomorphological change caused by water movement, integrates a Web-based system called Goinquire into a series of activities aimed to help upper-elementary, diverse students improve their observation skills and content knowledge in geomorphology. During the inquiry-based lessons, students…

  20. Evaluating Variant Calling Tools for Non-Matched Next-Generation Sequencing Data

    NASA Astrophysics Data System (ADS)

    Sandmann, Sarah; de Graaf, Aniek O.; Karimi, Mohsen; van der Reijden, Bert A.; Hellström-Lindberg, Eva; Jansen, Joop H.; Dugas, Martin

    2017-02-01

    Valid variant calling results are crucial for the use of next-generation sequencing in clinical routine. However, there are numerous variant calling tools that usually differ in algorithms, filtering strategies, recommendations and thus, also in the output. We evaluated eight open-source tools regarding their ability to call single nucleotide variants and short indels with allelic frequencies as low as 1% in non-matched next-generation sequencing data: GATK HaplotypeCaller, Platypus, VarScan, LoFreq, FreeBayes, SNVer, SAMtools and VarDict. We analysed two real datasets from patients with myelodysplastic syndrome, covering 54 Illumina HiSeq samples and 111 Illumina NextSeq samples. Mutations were validated by re-sequencing on the same platform, on a different platform and expert based review. In addition we considered two simulated datasets with varying coverage and error profiles, covering 50 samples each. In all cases an identical target region consisting of 19 genes (42,322 bp) was analysed. Altogether, no tool succeeded in calling all mutations. High sensitivity was always accompanied by low precision. Influence of varying coverage and background noise on variant calling was generally low. Taking everything into account, VarDict performed best. However, our results indicate that there is a need to improve reproducibility of the results in the context of multithreading.
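
    The sensitivity and precision figures used to compare variant callers in this abstract follow from simple set arithmetic. The sketch below assumes each variant is represented as a hypothetical `(chromosome, position, ref, alt)` tuple and that a validated truth set is available; neither the representation nor the example variants come from the study.

```python
def evaluate_calls(called, truth):
    """Compare a caller's output with validated variants.

    sensitivity = TP / (TP + FN), precision = TP / (TP + FP).
    """
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # called and validated
    fp = len(called - truth)   # called but not validated
    fn = len(truth - called)   # validated but missed
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return {"tp": tp, "fp": fp, "fn": fn,
            "sensitivity": sensitivity, "precision": precision}


if __name__ == "__main__":
    # Hypothetical variants for illustration only.
    truth = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
             ("chr3", 5, "T", "G"), ("chr3", 9, "A", "G")}
    called = {("chr1", 100, "A", "T"), ("chr1", 200, "G", "C"),
              ("chr2", 50, "C", "CT")}
    print(evaluate_calls(called, truth))
```

    A caller that reports more candidate variants tends to raise `tp` and `fp` together, which is the trade-off behind the observation that high sensitivity was accompanied by low precision.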

  1. Identification of immunoglobulins using Chou's pseudo amino acid composition with feature selection technique.

    PubMed

    Tang, Hua; Chen, Wei; Lin, Hao

    2016-04-01

    Immunoglobulins, also called antibodies, are a group of cell surface proteins which are produced by the immune system in response to the presence of a foreign substance (called antigen). They play key roles in many medical, diagnostic and biotechnological applications. Correct identification of immunoglobulins is crucial to the comprehension of humoral immune function. With the avalanche of protein sequences identified in postgenomic age, it is highly desirable to develop computational methods to timely identify immunoglobulins. In view of this, we designed a predictor called "IGPred" by formulating protein sequences with the pseudo amino acid composition into which nine physiochemical properties of amino acids were incorporated. Jackknife cross-validated results showed that 96.3% of immunoglobulins and 97.5% of non-immunoglobulins can be correctly predicted, indicating that IGPred holds very high potential to become a useful tool for antibody analysis. For the convenience of most experimental scientists, a web-server for IGPred was established at http://lin.uestc.edu.cn/server/IGPred. We believe that the web-server will become a powerful tool to study immunoglobulins and to guide related experimental validations.

  2. Asset - An application in mission automation for science planning

    NASA Technical Reports Server (NTRS)

    Finnerty, D. F.; Martin, J.; Doms, P. E.

    1987-01-01

    Recent advances in computer technology were used to great advantage in planning science observation sequences for the Voyager 2 encounter with Uranus in 1986. Despite a loss of experienced personnel, a challenging schedule, workforce limitations, and the complex nature of the Uranus encounter itself, the resultant science observation timelines were the most highly optimized of the five Voyager encounters with the outer planets. In part, this was due to the development of a microcomputer-based system, called ASSET (Automated Science Sequence Encounter Timelines generator), which was used to design those science observation timelines. This paper details the development of that system. ASSET demonstrates several features essential to the design of the first expert systems for science planning which will be applied for future missions.

  3. Advances in DNA sequencing technologies for high resolution HLA typing.

    PubMed

    Cereb, Nezih; Kim, Hwa Ran; Ryu, Jaejun; Yang, Soo Young

    2015-12-01

    This communication describes our experience in large-scale G group-level high resolution HLA typing using three different DNA sequencing platforms - ABI 3730 xl, Illumina MiSeq and PacBio RS II. Recent advances in DNA sequencing technologies, so-called next generation sequencing (NGS), have brought breakthroughs in deciphering the genetic information in all living species at a large scale and at an affordable level. The NGS DNA indexing system allows sequencing multiple genes for large number of individuals in a single run. Our laboratory has adopted and used these technologies for HLA molecular testing services. We found that each sequencing technology has its own strengths and weaknesses, and their sequencing performances complement each other. HLA genes are highly complex and genotyping them is quite challenging. Using these three sequencing platforms, we were able to meet all requirements for G group-level high resolution and high volume HLA typing. Copyright © 2015 American Society for Histocompatibility and Immunogenetics. Published by Elsevier Inc. All rights reserved.

  4. preAssemble: a tool for automatic sequencer trace data processing.

    PubMed

    Adzhubei, Alexei A; Laerdahl, Jon K; Vlasova, Anna V

    2006-01-17

    Trace or chromatogram files (raw data) are produced by automatic nucleic acid sequencing equipment or sequencers. Each file contains information which can be interpreted by specialised software to reveal the sequence (base calling). This is done by the sequencer proprietary software or publicly available programs. Depending on the size of a sequencing project the number of trace files can vary from just a few to thousands of files. Sequencing quality assessment on various criteria is important at the stage preceding clustering and contig assembly. Two major publicly available packages--Phred and Staden are used by preAssemble to perform sequence quality processing. The preAssemble pre-assembly sequence processing pipeline has been developed for small to large scale automatic processing of DNA sequencer chromatogram (trace) data. The Staden Package Pregap4 module and base-calling program Phred are utilized in the pipeline, which produces detailed and self-explanatory output that can be displayed with a web browser. preAssemble can be used successfully with very little previous experience, however options for parameter tuning are provided for advanced users. preAssemble runs under UNIX and LINUX operating systems. It is available for downloading and will run as stand-alone software. It can also be accessed on the Norwegian Salmon Genome Project web site where preAssemble jobs can be run on the project server. preAssemble is a tool allowing to perform quality assessment of sequences generated by automatic sequencing equipment. preAssemble is flexible since both interactive jobs on the preAssemble server and the stand alone downloadable version are available. Virtually no previous experience is necessary to run a default preAssemble job, on the other hand options for parameter tuning are provided. Consequently preAssemble can be used as efficiently for just several trace files as for large scale sequence processing.
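
    The quality assessment that precedes assembly rests on Phred-style quality scores, where a score Q corresponds to a base-calling error probability P through Q = -10 log10(P). The sketch below illustrates that relationship and a simple end-trimming rule; the trimming rule and the `min_q` default of 20 are assumptions for the example and do not reproduce what preAssemble, Phred, or Pregap4 actually do.

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by a Phred quality score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred quality score implied by an error probability."""
    return -10 * math.log10(p)


def trim_low_quality_ends(bases, quals, min_q=20):
    """Drop bases from both ends until a base with quality >= min_q is met."""
    start, end = 0, len(bases)
    while start < end and quals[start] < min_q:
        start += 1
    while end > start and quals[end - 1] < min_q:
        end -= 1
    return bases[start:end], quals[start:end]


if __name__ == "__main__":
    # Hypothetical read and per-base qualities.
    print(trim_low_quality_ends("ACGTAC", [5, 10, 30, 35, 8, 3]))
```

    With these definitions a score of 20 means roughly one error in 100 base calls and a score of 30 roughly one in 1000.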

  5. MuSE: accounting for tumor heterogeneity using a sample-specific error model improves sensitivity and specificity in mutation calling from sequencing data.

    PubMed

    Fan, Yu; Xi, Liu; Hughes, Daniel S T; Zhang, Jianjun; Zhang, Jianhua; Futreal, P Andrew; Wheeler, David A; Wang, Wenyi

    2016-08-24

    Subclonal mutations reveal important features of the genetic architecture of tumors. However, accurate detection of mutations in genetically heterogeneous tumor cell populations using next-generation sequencing remains challenging. We develop MuSE ( http://bioinformatics.mdanderson.org/main/MuSE ), Mutation calling using a Markov Substitution model for Evolution, a novel approach for modeling the evolution of the allelic composition of the tumor and normal tissue at each reference base. MuSE adopts a sample-specific error model that reflects the underlying tumor heterogeneity to greatly improve the overall accuracy. We demonstrate the accuracy of MuSE in calling subclonal mutations in the context of large-scale tumor sequencing projects using whole exome and whole genome sequencing.

  6. Medical Operations Console Procedure Evaluation: BME Response to Crew Call Down for an Emergency

    NASA Technical Reports Server (NTRS)

    Johnson-Troop; Pettys, Marianne; Hurst, Victor, IV; Smaka, Todd; Paul, Bonnie; Rosenquist, Kevin; Gast, Karin; Gillis, David; McCulley, Phyllis

    2006-01-01

    International Space Station (ISS) Mission Operations are managed by multiple flight control disciplines located at the lead Mission Control Center (MCC) at NASA-Johnson Space Center (JSC). ISS Medical Operations are supported by the complementary roles of Flight Surgeons (Surgeon) and Biomedical Engineer (BME) flight controllers. The Surgeon, a board certified physician, oversees all medical concerns of the crew and the BME provides operational and engineering support for Medical Operations Crew Health Care System. ISS Medical Operations is currently addressing the coordinated response to a crew call down for an emergent medical event, in particular when the BME is the only Medical Operations representative in MCC. In this case, the console procedure BME Response to Crew Call Down for an Emergency will be used. The procedure instructs the BME to contact a Surgeon as soon as possible, coordinate with other flight disciplines to establish a Private Medical Conference (PMC) for the crew and Surgeon, gather information from the crew if time permits, and provide Surgeon with pertinent console resources. It is paramount that this procedure is clearly written and easily navigated to assist the BME to respond consistently and efficiently. A total of five BME flight controllers participated in the study. Each BME participant sat in a simulated MCC environment at a console configured with resources specific to the BME MCC console and was presented with two scripted emergency call downs from an ISS crew member. Each participant used the procedure while interacting with analog MCC disciplines to respond to the crew call down. Audio and video recordings of the simulations were analyzed and each BME participant's actions were compared to the procedure. Structured debriefs were conducted at the conclusion of both simulations. The procedure was evaluated for its ability to elicit consistent responses from each BME participant. 
Trials were examined for deviations in procedure task completion and/or navigation, in particular the execution of the Surgeon call sequence. Debrief comments were used to analyze unclear procedural steps and to discern any discrepancies between the procedure and generally accepted BME actions. The sequence followed by BME participants differed considerably from the sequence intended by the procedure. Common deviations included the call sequence used to contact Surgeon, the content of BME and crew interaction and the gathering of pertinent console resources. Differing perceptions of task priority and imprecise language seem to have caused multiple deviations from the procedure's intended sequence. The study generated 40 recommendations for the procedure, of which 34 are being implemented. These recommendations address improving the clarity of the instructions, identifying training considerations, expediting Surgeon contact, improving cues for anticipated flight control team communication and identifying missing console tools.

  7. Epsilon-Q: An Automated Analyzer Interface for Mass Spectral Library Search and Label-Free Protein Quantification.

    PubMed

    Cho, Jin-Young; Lee, Hyoung-Joo; Jeong, Seul-Ki; Paik, Young-Ki

    2017-12-01

    Mass spectrometry (MS) is a widely used proteome analysis tool for biomedical science. In an MS-based bottom-up proteomic approach to protein identification, sequence database (DB) searching has been routinely used because of its simplicity and convenience. However, searching a sequence DB with multiple variable modification options can increase processing time and false-positive errors in large and complicated MS data sets. Spectral library searching is an alternative solution, avoiding the limitations of sequence DB searching and allowing the detection of more peptides with high sensitivity. Unfortunately, this technique has less proteome coverage, resulting in limitations in the detection of novel and whole peptide sequences in biological samples. To solve these problems, we previously developed the "Combo-Spec Search" method, which uses manually multiple references and simulated spectral library searching to analyze whole proteomes in a biological sample. In this study, we have developed a new analytical interface tool called "Epsilon-Q" to enhance the functions of both the Combo-Spec Search method and label-free protein quantification. Epsilon-Q automatically performs multiple spectral library searching, class-specific false-discovery rate control, and result integration. It has a user-friendly graphical interface and demonstrates good performance in identifying and quantifying proteins by supporting standard MS data formats and spectrum-to-spectrum matching powered by SpectraST. Furthermore, when the Epsilon-Q interface is combined with the Combo-Spec search method, called the Epsilon-Q system, it shows a synergistic function by outperforming other sequence DB search engines for identifying and quantifying low-abundance proteins in biological samples. The Epsilon-Q system can be a versatile tool for comparative proteome analysis based on multiple spectral libraries and label-free quantification.

  8. Direct-Sequence Spread Spectrum System

    DTIC Science & Technology

    1990-06-01

    by directly modulating a conventional narrowband frequency-modulated (FM) carrier by a high rate digital code. The direct modulation is binary phase ...specification of the DSSS system will not be developed. The results of the evaluation phase of this research will be compared against theoretical...spread spectrum is called binary phase-shift keying (BPSK). BPSK is a modulation in which a binary Ŕ" represents a 0-degree relative phase

  9. A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages.

    PubMed

    Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young

    2017-03-01

    Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. 'dada2' performs trimming of the high-throughput sequencing data. 'QuasR' and 'mosaics' perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, 'ChIPseeker' performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git.

  10. A ChIP-Seq Data Analysis Pipeline Based on Bioconductor Packages

    PubMed Central

    Park, Seung-Jin; Kim, Jong-Hwan; Yoon, Byung-Ha; Kim, Seon-Young

    2017-01-01

    Nowadays, huge volumes of chromatin immunoprecipitation-sequencing (ChIP-Seq) data are generated to increase the knowledge on DNA-protein interactions in the cell, and accordingly, many tools have been developed for ChIP-Seq analysis. Here, we provide an example of a streamlined workflow for ChIP-Seq data analysis composed of only four packages in Bioconductor: dada2, QuasR, mosaics, and ChIPseeker. ‘dada2’ performs trimming of the high-throughput sequencing data. ‘QuasR’ and ‘mosaics’ perform quality control and mapping of the input reads to the reference genome and peak calling, respectively. Finally, ‘ChIPseeker’ performs annotation and visualization of the called peaks. This workflow runs well independently of operating systems (e.g., Windows, Mac, or Linux) and processes the input fastq files into various results in one run. R code is available at github: https://github.com/ddhb/Workflow_of_Chipseq.git. PMID:28416945

  11. SVM-Fold: a tool for discriminative multi-class protein fold and superfamily recognition

    PubMed Central

    Melvin, Iain; Ie, Eugene; Kuang, Rui; Weston, Jason; Stafford, William Noble; Leslie, Christina

    2007-01-01

    Background Predicting a protein's structural class from its amino acid sequence is a fundamental problem in computational biology. Much recent work has focused on developing new representations for protein sequences, called string kernels, for use with support vector machine (SVM) classifiers. However, while some of these approaches exhibit state-of-the-art performance at the binary protein classification problem, i.e. discriminating between a particular protein class and all other classes, few of these studies have addressed the real problem of multi-class superfamily or fold recognition. Moreover, there are only limited software tools and systems for SVM-based protein classification available to the bioinformatics community. Results We present a new multi-class SVM-based protein fold and superfamily recognition system and web server called SVM-Fold, which can be found at . Our system uses an efficient implementation of a state-of-the-art string kernel for sequence profiles, called the profile kernel, where the underlying feature representation is a histogram of inexact matching k-mer frequencies. We also employ a novel machine learning approach to solve the difficult multi-class problem of classifying a sequence of amino acids into one of many known protein structural classes. Binary one-vs-the-rest SVM classifiers that are trained to recognize individual structural classes yield prediction scores that are not comparable, so that standard "one-vs-all" classification fails to perform well. Moreover, SVMs for classes at different levels of the protein structural hierarchy may make useful predictions, but one-vs-all does not try to combine these multiple predictions. To deal with these problems, our method learns relative weights between one-vs-the-rest classifiers and encodes information about the protein structural hierarchy for multi-class prediction. 
    In large-scale benchmark results based on the SCOP database, our code weighting approach significantly improves on the standard one-vs-all method for both the superfamily and fold prediction in the remote homology setting and on the fold recognition problem. Moreover, our code weight learning algorithm strongly outperforms nearest-neighbor methods based on PSI-BLAST in terms of prediction accuracy on every structure classification problem we consider. Conclusion By combining state-of-the-art SVM kernel methods with a novel multi-class algorithm, the SVM-Fold system delivers efficient and accurate protein fold and superfamily recognition. PMID:17570145
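
The k-mer histogram representation and the weighting of one-vs-the-rest scores described in this record can be illustrated with a much simplified sketch. It assumes exact k-mer matching on raw sequences (the profile kernel in the abstract uses inexact matching on sequence profiles), and the function names, the value of k, and the class weights are hypothetical rather than taken from SVM-Fold.

```python
from collections import Counter


def kmer_histogram(sequence: str, k: int = 3) -> Counter:
    """Count every overlapping k-mer in an amino acid sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def spectrum_kernel(a: str, b: str, k: int = 3) -> int:
    """Dot product of two k-mer histograms (exact matches only, an assumption)."""
    hist_a, hist_b = kmer_histogram(a, k), kmer_histogram(b, k)
    # Counter returns 0 for k-mers that are absent from hist_b.
    return sum(count * hist_b[kmer] for kmer, count in hist_a.items())


def weighted_one_vs_rest(scores: dict, weights: dict) -> str:
    """Return the class whose weighted one-vs-the-rest score is largest.

    `weights` stands in for learned per-classifier weights; classes
    without an entry default to a weight of 1.0.
    """
    return max(scores, key=lambda cls: weights.get(cls, 1.0) * scores[cls])
```

Rescaling each classifier's score before taking the maximum is one simple way to make otherwise incomparable one-vs-the-rest outputs comparable; how the weights are learned is outside this sketch.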

  12. Base-Calling Algorithm with Vocabulary (BCV) Method for Analyzing Population Sequencing Chromatograms

    PubMed Central

    Fantin, Yuri S.; Neverov, Alexey D.; Favorov, Alexander V.; Alvarez-Figueroa, Maria V.; Braslavskaya, Svetlana I.; Gordukova, Maria A.; Karandashova, Inga V.; Kuleshov, Konstantin V.; Myznikova, Anna I.; Polishchuk, Maya S.; Reshetov, Denis A.; Voiciehovskaya, Yana A.; Mironov, Andrei A.; Chulanov, Vladimir P.

    2013-01-01

    Sanger sequencing is a common method of reading DNA sequences. It is less expensive than high-throughput methods, and it is appropriate for numerous applications including molecular diagnostics. However, sequencing mixtures of similar DNA of pathogens with this method is challenging. This is important because most clinical samples contain such mixtures, rather than pure single strains. The traditional solution is to sequence selected clones of PCR products, a complicated, time-consuming, and expensive procedure. Here, we propose the base-calling with vocabulary (BCV) method that computationally deciphers Sanger chromatograms obtained from mixed DNA samples. The inputs to the BCV algorithm are a chromatogram and a dictionary of sequences that are similar to those we expect to obtain. We apply the base-calling function on a test dataset of chromatograms without ambiguous positions, as well as one with 3–14% sequence degeneracy. Furthermore, we use BCV to assemble a consensus sequence for an HIV genome fragment in a sample containing a mixture of viral DNA variants and to determine the positions of the indels. Finally, we detect drug-resistant Mycobacterium tuberculosis strains carrying frameshift mutations mixed with wild-type bacteria in the pncA gene, and roughly characterize bacterial communities in clinical samples by direct 16S rRNA sequencing. PMID:23382983

  13. Flight Experiment Investigation of General Aviation Self-Separation and Sequencing Tasks

    NASA Technical Reports Server (NTRS)

    Murdoch, Jennifer L.; Ramiscal, Ermin R.; McNabb, Jennifer L.; Bussink, Frank J. L.

    2005-01-01

    A new flight operations concept called Small Aircraft Transportation System (SATS) Higher Volume Operations (HVO) was developed to increase capacity during Instrument Meteorological Conditions (IMC) at non-towered, non-radar airports by enabling concurrent operations of multiple aircraft. One aspect of this concept involves having pilots safely self-separate from other aircraft during approaches into these airports using appropriate SATS HVO procedures. A flight experiment was conducted to determine if instrument-rated general aviation (GA) pilots could self-separate and sequence their ownship aircraft, while following a simulated aircraft, into a non-towered, non-radar airport during simulated IMC. Six GA pilots' workload levels and abilities to perform self-separation and sequencing procedures while flying a global positioning system (GPS) instrument approach procedure were examined. The results showed that the evaluation pilots maintained at least the minimum specified separation between their ownship aircraft and simulated traffic and maintained their assigned landing sequence 100-percent of the time. Neither flight path deviations nor subjective workload assessments were negatively impacted by the additional tasks of self-separating and sequencing during these instrument approaches.

  14. Development of a State Machine Sequencer for the Keck Interferometer: Evolution, Development and Lessons Learned using a CASE Tool Approach

    NASA Technical Reports Server (NTRS)

    Rede, Leonard J.; Booth, Andrew; Hsieh, Jonathon; Summer, Kellee

    2004-01-01

    This paper presents a discussion of the evolution of a sequencer from a simple EPICS (Experimental Physics and Industrial Control System) based sequencer into a complex implementation designed utilizing UML (Unified Modeling Language) methodologies and a CASE (Computer Aided Software Engineering) tool approach. The main purpose of the sequencer (called the IF Sequencer) is to provide overall control of the Keck Interferometer to enable science operations to be carried out by a single operator (and/or observer). The interferometer links the two 10m telescopes of the W. M. Keck Observatory at Mauna Kea, Hawaii. The IF Sequencer is a high-level, multi-threaded, Harel finite state machine software program designed to orchestrate several lower-level hardware and software hard real-time subsystems that must perform their work in a specific and sequential order. The sequencing need not be done in hard real-time. Each state machine thread commands either a high-speed real-time multiple mode embedded controller via CORBA, or slower controllers via EPICS Channel Access interfaces. The overall operation of the system is simplified by the automation. The UML is discussed and our use of it to implement the sequencer is presented. The decision to use the Rhapsody product as our CASE tool is explained and reflected upon. Most importantly, a section on lessons learned is presented and the difficulty of integrating CASE tool automatically generated C++ code into a large control system consisting of multiple infrastructures is presented.

  15. Development of a state machine sequencer for the Keck Interferometer: evolution, development, and lessons learned using a CASE tool approach

    NASA Astrophysics Data System (ADS)

    Reder, Leonard J.; Booth, Andrew; Hsieh, Jonathan; Summers, Kellee R.

    2004-09-01

    This paper presents a discussion of the evolution of a sequencer from a simple Experimental Physics and Industrial Control System (EPICS) based sequencer into a complex implementation designed utilizing UML (Unified Modeling Language) methodologies and a Computer Aided Software Engineering (CASE) tool approach. The main purpose of the Interferometer Sequencer (called the IF Sequencer) is to provide overall control of the Keck Interferometer to enable science operations to be carried out by a single operator (and/or observer). The interferometer links the two 10m telescopes of the W. M. Keck Observatory at Mauna Kea, Hawaii. The IF Sequencer is a high-level, multi-threaded, Harel finite state machine software program designed to orchestrate several lower-level hardware and software hard real-time subsystems that must perform their work in a specific and sequential order. The sequencing need not be done in hard real-time. Each state machine thread commands either a high-speed real-time multiple mode embedded controller via CORBA, or slower controllers via EPICS Channel Access interfaces. The overall operation of the system is simplified by the automation. The UML is discussed and our use of it to implement the sequencer is presented. The decision to use the Rhapsody product as our CASE tool is explained and reflected upon. Most importantly, a section on lessons learned is presented and the difficulty of integrating CASE tool automatically generated C++ code into a large control system consisting of multiple infrastructures is presented.
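
The idea of a sequencer that forces subsystems to act in a specific order can be sketched as a small finite state machine. This is a flat, single-threaded simplification (Harel statecharts add hierarchy and concurrency), and the state and event names below are hypothetical, not taken from the IF Sequencer.

```python
class Sequencer:
    """Minimal table-driven finite state machine."""

    # (current state, event) -> next state; all names are illustrative.
    TRANSITIONS = {
        ("idle", "start"): "acquiring",
        ("acquiring", "locked"): "tracking",
        ("acquiring", "abort"): "idle",
        ("tracking", "stop"): "idle",
    }

    def __init__(self):
        self.state = "idle"
        self.history = ["idle"]

    def dispatch(self, event: str) -> str:
        """Apply an event, rejecting any transition the table does not allow."""
        key = (self.state, event)
        if key not in self.TRANSITIONS:
            raise ValueError(f"event {event!r} not allowed in state {self.state!r}")
        self.state = self.TRANSITIONS[key]
        self.history.append(self.state)
        return self.state
```

Keeping the allowed transitions in one table makes out-of-order commands fail loudly instead of silently reaching the hardware layer.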

  16. Somatic Point Mutation Calling in Low Cellularity Tumors

    PubMed Central

    Kassahn, Karin S.; Holmes, Oliver; Nones, Katia; Patch, Ann-Marie; Miller, David K.; Christ, Angelika N.; Harliwong, Ivon; Bruxner, Timothy J.; Xu, Qinying; Anderson, Matthew; Wood, Scott; Leonard, Conrad; Taylor, Darrin; Newell, Felicity; Song, Sarah; Idrisoglu, Senel; Nourse, Craig; Nourbakhsh, Ehsan; Manning, Suzanne; Wani, Shivangi; Steptoe, Anita; Pajic, Marina; Cowley, Mark J.; Pinese, Mark; Chang, David K.; Gill, Anthony J.; Johns, Amber L.; Wu, Jianmin; Wilson, Peter J.; Fink, Lynn; Biankin, Andrew V.; Waddell, Nicola; Grimmond, Sean M.; Pearson, John V.

    2013-01-01

    Somatic mutation calling from next-generation sequencing data remains a challenge due to the difficulties of distinguishing true somatic events from artifacts arising from PCR, sequencing errors or mis-mapping. Tumor cellularity or purity, sub-clonality and copy number changes also confound the identification of true somatic events against a background of germline variants. We have developed a heuristic strategy and software (http://www.qcmg.org/bioinformatics/qsnp/) for somatic mutation calling in samples with low tumor content and we show the superior sensitivity and precision of our approach using a previously sequenced cell line, a series of tumor/normal admixtures, and 3,253 putative somatic SNVs verified on an orthogonal platform. PMID:24250782

  17. Sequence and batch language programs and alarm-related ``C`` programs for the 242-A MCS. Revision 2

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berger, J.F.

    1995-03-01

    A Distributive Process Control system was purchased by Project B-534, ``242-A Evaporator/Crystallizer Upgrades``. This control system, called the Monitor and Control System (MCS), was installed in the 242-A Evaporator located in the 200 East Area. The purpose of the MCS is to monitor and control the Evaporator and monitor a number of alarms and other signals from various Tank Farm facilities. Applications software for the MCS was developed by the Waste Treatment Systems Engineering (WTSE) group of Westinghouse. The standard displays and alarm scheme provide for control and monitoring, but do not directly indicate the signal location or depict the overall process. To do this, WTSE developed a second alarm scheme which uses special programs, annunciator keys, and process graphics. The special programs are written in two languages: Sequence and Batch Language (SABL), and ``C`` language. The WTSE-developed alarm scheme works as described below: SABL relates signals and alarms to the annunciator keys, called SKID keys. When an alarm occurs, a SABL program causes a SKID key to flash, and if the alarm is of yellow or white priority then a ``C`` program turns on an audible horn (the D/3 system uses a different audible horn for the red priority alarms). The horn and flashing key draw the attention of the operator.

  18. Urban, Indoor and Subterranean Navigation Sensors and Systems (Capteurs et systemes de navigation urbains, interieurs et souterrains)

    DTIC Science & Technology

    2010-11-01

    3-10 Multiple Images of an Image Sequence Figure 3-10 A Digital Magnetic Compass from KVH Industries 3-11 Figure 3-11 Earth’s Magnetic Field 3-11...ARINO SENER – Ingenieria y Sistemas S.A Aerospace Division Parque Tecnologico de Madrid Calle Severo Ocho 4 28760 Tres Cantos Madrid Email...experts from government, academia, industry and the military produced an analysis of future navigation sensors and systems whose performance

  19. Spacecraft command verification: The AI solution

    NASA Technical Reports Server (NTRS)

    Fesq, Lorraine M.; Stephan, Amy; Smith, Brian K.

    1990-01-01

    Recently, a knowledge-based approach was used to develop a system called the Command Constraint Checker (CCC) for TRW. CCC was created to automate the process of verifying spacecraft command sequences. To check command files by hand for timing and sequencing errors is a time-consuming and error-prone task. Conventional software solutions were rejected when it was estimated that it would require 36 man-months to build an automated tool to check constraints by conventional methods. Using rule-based representation to model the various timing and sequencing constraints of the spacecraft, CCC was developed and tested in only three months. By applying artificial intelligence techniques, CCC designers were able to demonstrate the viability of AI as a tool to transform difficult problems into easily managed tasks. The design considerations used in developing CCC are discussed and the potential impact of this system on future satellite programs is examined.
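
Rule-based checking of timing and sequencing constraints, as described in this record, might look roughly like the sketch below. The command names, the two rule types, and the 30-second gap are hypothetical examples chosen for illustration; nothing here reflects the actual CCC rule base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    time: float  # seconds from the start of the sequence
    name: str


def check_min_gap(commands, first, second, min_gap):
    """Timing rule: `second` must follow the latest `first` by at least `min_gap` s."""
    violations = []
    last_first = None
    for cmd in sorted(commands, key=lambda c: c.time):
        if cmd.name == first:
            last_first = cmd.time
        elif cmd.name == second and last_first is not None:
            gap = cmd.time - last_first
            if gap < min_gap:
                violations.append(
                    f"{second} at t={cmd.time} follows {first} by {gap} s (< {min_gap} s)"
                )
    return violations


def check_requires_prior(commands, required, dependent):
    """Sequencing rule: `dependent` may only be issued after some `required`."""
    violations = []
    seen_required = False
    for cmd in sorted(commands, key=lambda c: c.time):
        if cmd.name == required:
            seen_required = True
        elif cmd.name == dependent and not seen_required:
            violations.append(f"{dependent} at t={cmd.time} issued before any {required}")
    return violations
```

Each rule is an independent function over the same command list, so adding a constraint means adding a rule rather than editing a monolithic checker.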

  20. A Shellcode Detection Method Based on Full Native API Sequence and Support Vector Machine

    NASA Astrophysics Data System (ADS)

    Cheng, Yixuan; Fan, Wenqing; Huang, Wei; An, Jing

    2017-09-01

    Dynamically monitoring the behavior of a program is widely used to discriminate between benign programs and malware. The judgment is usually based on dynamic characteristics of the program, such as its API call sequence or API call frequency. The key innovation of this paper is to consider the full Native API sequence and use the support vector machine to detect the shellcode. We also use the Markov chain to extract and digitize Native API sequence features. Our experimental results show that the method proposed in this paper has high accuracy and low detection rate.
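
One plausible reading of "using the Markov chain to extract and digitize" an API sequence is to estimate first-order transition probabilities between consecutive calls and flatten them into a numeric vector for a classifier. The sketch below assumes that reading; the function names and the fixed-vocabulary layout are illustrative and not taken from the paper.

```python
from collections import defaultdict


def transition_probabilities(calls):
    """Estimate first-order Markov transition probabilities from a call trace."""
    counts = defaultdict(lambda: defaultdict(int))
    for current, following in zip(calls, calls[1:]):
        counts[current][following] += 1
    probs = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probs[current] = {nxt: n / total for nxt, n in followers.items()}
    return probs


def feature_vector(calls, vocabulary):
    """Flatten the transition matrix row by row over a fixed call vocabulary."""
    probs = transition_probabilities(calls)
    return [probs.get(a, {}).get(b, 0.0) for a in vocabulary for b in vocabulary]
```

The resulting vector has a fixed length (the vocabulary size squared) regardless of trace length, which is what a support vector machine needs as input.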

  1. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data.

    PubMed

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths.

  2. Evaluation of Nine Somatic Variant Callers for Detection of Somatic Mutations in Exome and Targeted Deep Sequencing Data

    PubMed Central

    Krøigård, Anne Bruun; Thomassen, Mads; Lænkholm, Anne-Vibeke; Kruse, Torben A.; Larsen, Martin Jakob

    2016-01-01

    Next generation sequencing is extensively applied to catalogue somatic mutations in cancer, in research settings and increasingly in clinical settings for molecular diagnostics, guiding therapy decisions. Somatic variant callers perform paired comparisons of sequencing data from cancer tissue and matched normal tissue in order to detect somatic mutations. The advent of many new somatic variant callers creates a need for comparison and validation of the tools, as no de facto standard for detection of somatic mutations exists and only limited comparisons have been reported. We have performed a comprehensive evaluation using exome sequencing and targeted deep sequencing data of paired tumor-normal samples from five breast cancer patients to evaluate the performance of nine publicly available somatic variant callers: EBCall, Mutect, Seurat, Shimmer, Indelocator, Somatic Sniper, Strelka, VarScan 2 and Virmid for the detection of single nucleotide mutations and small deletions and insertions. We report a large variation in the number of calls from the nine somatic variant callers on the same sequencing data and highly variable agreement. Sequencing depth had markedly diverse impact on individual callers, as for some callers, increased sequencing depth highly improved sensitivity. For SNV calling, we report EBCall, Mutect, Virmid and Strelka to be the most reliable somatic variant callers for both exome sequencing and targeted deep sequencing. For indel calling, EBCall is superior due to high sensitivity and robustness to changes in sequencing depths. PMID:27002637

  3. VirVarSeq: a low-frequency virus variant detection pipeline for Illumina sequencing using adaptive base-calling accuracy filtering.

    PubMed

    Verbist, Bie M P; Thys, Kim; Reumers, Joke; Wetzels, Yves; Van der Borght, Koen; Talloen, Willem; Aerssens, Jeroen; Clement, Lieven; Thas, Olivier

    2015-01-01

    In virology, massively parallel sequencing (MPS) opens many opportunities for studying viral quasi-species, e.g. in HIV-1- and HCV-infected patients. This is essential for understanding pathways to resistance, which can substantially improve treatment. Although MPS platforms allow in-depth characterization of sequence variation, their measurements still involve substantial technical noise. For Illumina sequencing, single base substitutions are the main error source and impede powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores (Qs) that are useful for differentiating errors from the real low-frequency mutations. A variant calling tool, Q-cpileup, is proposed, which exploits the Qs of nucleotides in a filtering strategy to increase specificity. The tool is imbedded in an open-source pipeline, VirVarSeq, which allows variant calling starting from fastq files. Using both plasmid mixtures and clinical samples, we show that Q-cpileup is able to reduce the number of false-positive findings. The filtering strategy is adaptive and provides an optimized threshold for individual samples in each sequencing run. Additionally, linkage information is kept between single-nucleotide polymorphisms as variants are called at the codon level. This enables virologists to have an immediate biological interpretation of the reported variants with respect to their antiviral drug responses. A comparison with existing SNP caller tools reveals that calling variants at the codon level with Q-cpileup results in an outstanding sensitivity while maintaining a good specificity for variants with frequencies down to 0.5%. The VirVarSeq is available, together with a user's guide and test data, at sourceforge: http://sourceforge.net/projects/virtools/?source=directory. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
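
Quality-score filtering can be made concrete with the standard Phred relation, in which a score Q corresponds to an error probability of 10^(-Q/10). The sketch below applies a fixed threshold to individual bases at one position; this is a simplification, since the abstract describes an adaptive, per-sample threshold and codon-level calling, and the function names and the default of 30 are assumptions.

```python
def phred_to_error_probability(q: float) -> float:
    """Convert a Phred quality score to the probability that the base call is wrong."""
    return 10 ** (-q / 10)


def filtered_frequencies(bases, quals, min_q=30):
    """Frequency of each base at one position after dropping low-quality calls.

    `bases` and `quals` are parallel sequences for the reads covering the
    position; `min_q` is a fixed cutoff used here only for illustration.
    """
    kept = [b for b, q in zip(bases, quals) if q >= min_q]
    if not kept:
        return {}
    return {b: kept.count(b) / len(kept) for b in set(kept)}
```

Dropping calls below the cutoff removes likely sequencing errors before frequencies are computed, which is why a single low-quality mismatch does not register as a low-frequency variant.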

  4. snpAD: An ancient DNA genotype caller.

    PubMed

    Prüfer, Kay

    2018-06-21

    The study of ancient genomes can elucidate the evolutionary past. However, analyses are complicated by base-modifications in ancient DNA molecules that result in errors in DNA sequences. These errors are particularly common near the ends of sequences and pose a challenge for genotype calling. I describe an iterative method that estimates genotype frequencies and errors along sequences to allow for accurate genotype calling from ancient sequences. The implementation of this method, called snpAD, performs well on high-coverage ancient data, as shown by simulations and by subsampling the data of a high-coverage Neandertal genome. Although estimates for low-coverage genomes are less accurate, I am able to derive approximate estimates of heterozygosity from several low-coverage Neandertals. These estimates show that low heterozygosity, compared to modern humans, was common among Neandertals. The C++ code of snpAD is freely available at http://bioinf.eva.mpg.de/snpAD/. Supplementary data are available at Bioinformatics online.

  5. Allele-specific copy-number discovery from whole-genome and whole-exome sequencing

    PubMed Central

    Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J.; Szatkiewicz, Jin P.

    2015-01-01

    Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/. PMID:25883151

  6. Antimicrobial activity predictors benchmarking analysis using shuffled and designed synthetic peptides.

    PubMed

    Porto, William F; Pires, Állan S; Franco, Octavio L

    2017-08-07

    Antimicrobial activity prediction tools aim to aid the discovery of novel antimicrobial peptide (AMP) sequences, utilizing machine learning methods. Such approaches have gained increasing importance in the generation of novel synthetic peptides by means of rational design techniques. This study focused on the ability of such approaches to predict the antimicrobial activity of sequences that were previously characterized at the protein level by in vitro studies. Using four web servers and one standalone software package, we evaluated 78 sequences generated by the so-called linguistic model, comprising 40 designed and 38 shuffled sequences, with ∼60 and ∼25% identity to AMPs, respectively. The ab initio molecular modelling of such sequences indicated that the structure does not affect the predictions, as both sets present similar structures. Overall, the systems failed to predict shuffled versions of designed peptides, as these are identical to AMPs in composition, which results in accuracies below 30%. The prediction accuracy is negatively affected by the low specificity of all systems evaluated here, although they reached 100% sensitivity. Our results suggest that complementary approaches with high specificity, not necessarily high accuracy, should be developed to be used together with the current systems, overcoming their limitations. Copyright © 2017 Elsevier Ltd. All rights reserved.

  7. EGenBio: A Data Management System for Evolutionary Genomics and Biodiversity

    PubMed Central

    Nahum, Laila A; Reynolds, Matthew T; Wang, Zhengyuan O; Faith, Jeremiah J; Jonna, Rahul; Jiang, Zhi J; Meyer, Thomas J; Pollock, David D

    2006-01-01

    Background Evolutionary genomics requires management and filtering of large numbers of diverse genomic sequences for accurate analysis and inference on evolutionary processes of genomic and functional change. We developed Evolutionary Genomics and Biodiversity (EGenBio; ) to begin to address this. Description EGenBio is a system for manipulation and filtering of large numbers of sequences, integrating curated sequence alignments and phylogenetic trees, managing evolutionary analyses, and visualizing their output. EGenBio is organized into three conceptual divisions, Evolution, Genomics, and Biodiversity. The Genomics division includes tools for selecting pre-aligned sequences from different genes and species, and for modifying and filtering these alignments for further analysis. Species searches are handled through queries that can be modified based on a tree-based navigation system and saved. The Biodiversity division contains tools for analyzing individual sequences or sequence alignments, whereas the Evolution division contains tools involving phylogenetic trees. Alignments are annotated with analytical results and modification history using our PRAED format. A miscellaneous Tools section and Help framework are also available. EGenBio was developed around our comparative genomic research and a prototype database of mtDNA genomes. It utilizes MySQL-relational databases and dynamic page generation, and calls numerous custom programs. Conclusion EGenBio was designed to serve as a platform for tools and resources to ease combined analysis in evolution, genomics, and biodiversity. PMID:17118150

  8. A pluggable framework for parallel pairwise sequence search.

    PubMed

    Archuleta, Jeremy; Feng, Wu-chun; Tilevich, Eli

    2007-01-01

    The current and near future of the computing industry is one of multi-core and multi-processor technology. Most existing sequence-search tools have been designed with a focus on single-core, single-processor systems. This discrepancy between software design and hardware architecture substantially hinders sequence-search performance by not allowing full utilization of the hardware. This paper presents a novel framework that will aid the conversion of serial sequence-search tools into a parallel version that can take full advantage of the available hardware. The framework, which is based on a software architecture called mixin layers with refined roles, enables modules to be plugged into the framework with minimal effort. The inherent modular design improves maintenance and extensibility, thus opening up a plethora of opportunities for advanced algorithmic features to be developed and incorporated while routine maintenance of the codebase persists.

  9. Coverage Bias and Sensitivity of Variant Calling for Four Whole-genome Sequencing Technologies

    PubMed Central

    Lasitschka, Bärbel; Jones, David; Northcott, Paul; Hutter, Barbara; Jäger, Natalie; Kool, Marcel; Taylor, Michael; Lichter, Peter; Pfister, Stefan; Wolf, Stephan; Brors, Benedikt; Eils, Roland

    2013-01-01

    The emergence of high-throughput, next-generation sequencing technologies has dramatically altered the way we assess genomes in population genetics and in cancer genomics. Currently, there are four commonly used whole-genome sequencing platforms on the market: Illumina’s HiSeq2000, Life Technologies’ SOLiD 4 and its completely redesigned 5500xl SOLiD, and Complete Genomics’ technology. A number of earlier studies have compared a subset of those sequencing platforms or compared those platforms with Sanger sequencing, which is prohibitively expensive for whole genome studies. Here we present a detailed comparison of the performance of all currently available whole genome sequencing platforms, especially regarding their ability to call SNVs and to evenly cover the genome and specific genomic regions. Unlike earlier studies, we base our comparison on four different samples, allowing us to assess the between-sample variation of the platforms. We find a pronounced GC bias in GC-rich regions for Life Technologies’ platforms, with Complete Genomics performing best here, while we see the least bias in GC-poor regions for HiSeq2000 and 5500xl. HiSeq2000 gives the most uniform coverage and displays the least sample-to-sample variation. In contrast, Complete Genomics exhibits by far the smallest fraction of bases not covered, while the SOLiD platforms reveal remarkable shortcomings, especially in covering CpG islands. When comparing the performance of the four platforms for calling SNPs, HiSeq2000 and Complete Genomics achieve the highest sensitivity, while the SOLiD platforms show the lowest false positive rate. Finally, we find that integrating sequencing data from different platforms offers the potential to combine the strengths of different technologies. In summary, our results detail the strengths and weaknesses of all four whole-genome sequencing platforms. They indicate application areas that call for a specific sequencing platform and disallow other platforms. This helps to identify the proper sequencing platform for whole genome studies with different application scopes. PMID:23776689

  10. Muver, a computational framework for accurately calling accumulated mutations.

    PubMed

    Burkholder, Adam B; Lujan, Scott A; Lavender, Christopher A; Grimm, Sara A; Kunkel, Thomas A; Fargo, David C

    2018-05-09

    Identification of mutations from next-generation sequencing data typically requires a balance between sensitivity and accuracy. This is particularly true of DNA insertions and deletions (indels), which can impart significant phenotypic consequences on cells but are harder to call than substitution mutations from whole genome mutation accumulation experiments. To overcome these difficulties, we present muver, a computational framework that integrates established bioinformatics tools with novel analytical methods to generate mutation calls with the extremely low false positive rates and high sensitivity required for accurate mutation rate determination and comparison. Muver uses statistical comparison of ancestral and descendant allelic frequencies to identify variant loci and assigns genotypes with models that include per-sample assessments of sequencing errors by mutation type and repeat context. Muver identifies maximally parsimonious mutation pathways that connect these genotypes, differentiating potential allelic conversion events and delineating ambiguities in mutation location, type, and size. Benchmarking with a human gold standard father-son pair demonstrates muver's sensitivity and low false positive rates. In DNA mismatch repair (MMR) deficient Saccharomyces cerevisiae, muver detects multi-base deletions in homopolymers longer than the replicative polymerase footprint at rates greater than predicted for sequential single-base deletions, implying a novel multi-repeat-unit slippage mechanism. Benchmarking results demonstrate the high accuracy and sensitivity achieved with muver, particularly for indels, relative to available tools. Applied to an MMR-deficient Saccharomyces cerevisiae system, muver mutation calls facilitate mechanistic insights into DNA replication fidelity.

  11. The effect of input DNA copy number on genotype call and characterising SNP markers in the humpback whale genome using a nanofluidic array.

    PubMed

    Bhat, Somanath; Polanowski, Andrea M; Double, Mike C; Jarman, Simon N; Emslie, Kerry R

    2012-01-01

    Recent advances in nanofluidic technologies have enabled the use of Integrated Fluidic Circuits (IFCs) for high-throughput Single Nucleotide Polymorphism (SNP) genotyping (GT). In this study, we implemented and validated a relatively low cost nanofluidic system for SNP-GT with and without Specific Target Amplification (STA). As proof of principle, we first validated the effect of input DNA copy number on genotype call rate using well characterised, digital PCR (dPCR) quantified human genomic DNA samples and then implemented the validated method to genotype 45 SNPs in the humpback whale, Megaptera novaeangliae, nuclear genome. When STA was not incorporated, for a homozygous human DNA sample, reaction chambers containing, on average, 9 to 97 copies showed 100% call rate and accuracy. Below 9 copies, the call rate decreased, and at one copy it was 40%. For a heterozygous human DNA sample, the call rate decreased from 100% to 21% when predicted copies per reaction chamber decreased from 38 copies to one copy. The tightness of genotype clusters on a scatter plot also decreased. In contrast, when the same samples were subjected to STA prior to genotyping, a call rate and a call accuracy of 100% were achieved. Our results demonstrate that low input DNA copy number affects the quality of data generated, in particular for a heterozygous sample. Similar to human genomic DNA, a call rate and a call accuracy of 100% were achieved with whale genomic DNA samples following multiplex STA using either 15 or 45 SNP-GT assays. These calls were 100% concordant with their true genotypes determined by an independent method, suggesting that the nanofluidic system is a reliable platform for executing call rates with high accuracy and concordance in genomic sequences derived from biological tissue.

  12. Time-optimal excitation of maximum quantum coherence: Physical limits and pulse sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Köcher, S. S.; Institute of Energy and Climate Research; Heydenreich, T.

    Here we study the optimum efficiency of the excitation of maximum quantum (MaxQ) coherence using analytical and numerical methods based on optimal control theory. The theoretical limit of the achievable MaxQ amplitude and the minimum time to achieve this limit are explored for a set of model systems consisting of up to five coupled spins. In addition to arbitrary pulse shapes, two simple pulse sequence families of practical interest are considered in the optimizations. Compared to conventional approaches, substantial gains were found both in terms of the achieved MaxQ amplitude and in pulse sequence durations. For a model system, theoretically predicted gains of a factor of three compared to the conventional pulse sequence were experimentally demonstrated. Motivated by the numerical results, also two novel analytical transfer schemes were found: Compared to conventional approaches based on non-selective pulses and delays, double-quantum coherence in two-spin systems can be created twice as fast using isotropic mixing and hard spin-selective pulses. Also it is proved that in a chain of three weakly coupled spins with the same coupling constants, triple-quantum coherence can be created in a time-optimal fashion using so-called geodesic pulses.

  13. Reasoning about real-time systems with temporal interval logic constraints on multi-state automata

    NASA Technical Reports Server (NTRS)

    Gabrielian, Armen

    1991-01-01

    Models of real-time systems using a single paradigm often turn out to be inadequate, whether the paradigm is based on states, rules, event sequences, or logic. A model-based approach to reasoning about real-time systems is presented in which a temporal interval logic called TIL is employed to define constraints on a new type of high level automata. The combination, called hierarchical multi-state (HMS) machines, can be used to model formally a real-time system, a dynamic set of requirements, the environment, heuristic knowledge about planning-related problem solving, and the computational states of the reasoning mechanism. In this framework, mathematical techniques were developed for: (1) proving the correctness of a representation; (2) planning of concurrent tasks to achieve goals; and (3) scheduling of plans to satisfy complex temporal constraints. HMS machines allow reasoning about a real-time system from a model of how truth arises instead of merely depending on what is true in a system.

  14. Multiplexed capillary electrophoresis system

    DOEpatents

    Yeung, Edward S.; Li, Qingbo; Lu, Xiandan

    1998-04-21

    The invention provides a side-entry optical excitation geometry for use in a multiplexed capillary electrophoresis system. A charge-injection device is optically coupled to capillaries in the array such that the interior of a capillary is imaged onto only one pixel. In Sanger-type 4-label DNA sequencing reactions, nucleotide identification ("base calling") is improved by using two long-pass filters to split fluorescence emission into two emission channels. A binary poly(ethyleneoxide) matrix is used in the electrophoretic separations.

  15. Multiplexed capillary electrophoresis system

    DOEpatents

    Yeung, Edward S.; Chang, Huan-Tsang; Fung, Eliza N.; Li, Qingbo; Lu, Xiandan

    1996-12-10

    The invention provides a side-entry optical excitation geometry for use in a multiplexed capillary electrophoresis system. A charge-injection device is optically coupled to capillaries in the array such that the interior of a capillary is imaged onto only one pixel. In Sanger-type 4-label DNA sequencing reactions, nucleotide identification ("base calling") is improved by using two long-pass filters to split fluorescence emission into two emission channels. A binary poly(ethyleneoxide) matrix is used in the electrophoretic separations.

  16. Multiplexed capillary electrophoresis system

    DOEpatents

    Yeung, E.S.; Li, Q.; Lu, X.

    1998-04-21

    The invention provides a side-entry optical excitation geometry for use in a multiplexed capillary electrophoresis system. A charge-injection device is optically coupled to capillaries in the array such that the interior of a capillary is imaged onto only one pixel. In Sanger-type 4-label DNA sequencing reactions, nucleotide identification ("base calling") is improved by using two long-pass filters to split fluorescence emission into two emission channels. A binary poly(ethyleneoxide) matrix is used in the electrophoretic separations. 19 figs.

  17. Multiplexed capillary electrophoresis system

    DOEpatents

    Yeung, E.S.; Chang, H.T.; Fung, E.N.; Li, Q.; Lu, X.

    1996-12-10

    The invention provides a side-entry optical excitation geometry for use in a multiplexed capillary electrophoresis system. A charge-injection device is optically coupled to capillaries in the array such that the interior of a capillary is imaged onto only one pixel. In Sanger-type 4-label DNA sequencing reactions, nucleotide identification ("base calling") is improved by using two long-pass filters to split fluorescence emission into two emission channels. A binary poly(ethyleneoxide) matrix is used in the electrophoretic separations. 19 figs.

  18. A reference bacterial genome dataset generated on the MinION™ portable single-molecule nanopore sequencer.

    PubMed

    Quick, Joshua; Quinlan, Aaron R; Loman, Nicholas J

    2014-01-01

    The MinION™ is a new, portable single-molecule sequencer developed by Oxford Nanopore Technologies. It measures four inches in length and is powered from the USB 3.0 port of a laptop computer. The MinION™ measures the change in current resulting from DNA strands interacting with a charged protein nanopore. These measurements can then be used to deduce the underlying nucleotide sequence. We present a read dataset from whole-genome shotgun sequencing of the model organism Escherichia coli K-12 substr. MG1655 generated on a MinION™ device during the early-access MinION™ Access Program (MAP). Sequencing runs of the MinION™ are presented, one generated using R7 chemistry (released in July 2014) and one using R7.3 (released in September 2014). Base-called sequence data are provided to demonstrate the nature of data produced by the MinION™ platform and to encourage the development of customised methods for alignment, consensus and variant calling, de novo assembly and scaffolding. FAST5 files containing event data within the HDF5 container format are provided to assist with the development of improved base-calling methods.

  19. A short review of variants calling for single-cell-sequencing data with applications.

    PubMed

    Wei, Zhuohui; Shu, Chang; Zhang, Changsheng; Huang, Jingying; Cai, Hongmin

    2017-11-01

    The field of single-cell sequencing is rapidly expanding, and many techniques have been developed in the past decade. With this technology, biologists can study not only the heterogeneity between two adjacent cells in the same tissue or organ, but also the evolutionary relationships and degenerative processes in a single cell. Calling variants is the main purpose in analyzing single cell sequencing (SCS) data. Currently, some popular methods used for bulk-cell-sequencing data analysis are tailored directly to be applied in dealing with SCS data. However, SCS requires an extra step of genome amplification to accumulate enough quantity for satisfying sequencing needs. The amplification yields large biases and thus raises challenges for using the bulk-cell-sequencing methods. In order to provide guidance for the development of specialized analysis methods as well as using currently developed tools for SCS, this paper aims to bridge the gap. In this paper, we first introduced two popular genome amplification methods and compared their capabilities. Then we introduced a few popular models for calling single-nucleotide polymorphisms and copy-number variations. Finally, break-through applications of SCS were summarized to demonstrate its potential in researching cell evolution. Copyright © 2017 Elsevier Ltd. All rights reserved.

  20. Use of the Fluidigm C1 platform for RNA sequencing of single mouse pancreatic islet cells.

    PubMed

    Xin, Yurong; Kim, Jinrang; Ni, Min; Wei, Yi; Okamoto, Haruka; Lee, Joseph; Adler, Christina; Cavino, Katie; Murphy, Andrew J; Yancopoulos, George D; Lin, Hsin Chieh; Gromada, Jesper

    2016-03-22

    This study provides an assessment of the Fluidigm C1 platform for RNA sequencing of single mouse pancreatic islet cells. The system combines microfluidic technology and nanoliter-scale reactions. We sequenced 622 cells, allowing identification of 341 islet cells with high-quality gene expression profiles. The cells clustered into populations of α-cells (5%), β-cells (92%), δ-cells (1%), and pancreatic polypeptide cells (2%). We identified cell-type-specific transcription factors and pathways primarily involved in nutrient sensing and oxidation and cell signaling. Unexpectedly, 281 cells had to be removed from the analysis due to low viability, low sequencing quality, or contamination resulting in the detection of more than one islet hormone. Collectively, we provide a resource for identification of high-quality gene expression datasets to help expand insights into genes and pathways characterizing islet cell types. We reveal limitations in the C1 Fluidigm cell capture process resulting in contaminated cells with altered gene expression patterns. This calls for caution when interpreting single-cell transcriptomics data using the C1 Fluidigm system.

  1. DNA Base-Calling from a Nanopore Using a Viterbi Algorithm

    PubMed Central

    Timp, Winston; Comer, Jeffrey; Aksimentiev, Aleksei

    2012-01-01

    Nanopore-based DNA sequencing is the most promising third-generation sequencing method. It has superior read length, speed, and sample requirements compared with state-of-the-art second-generation methods. However, base-calling still presents substantial difficulty because the resolution of the technique is limited compared with the measured signal/noise ratio. Here we demonstrate a method to decode 3-bp-resolution nanopore electrical measurements into a DNA sequence using a Hidden Markov model. This method shows tremendous potential for accuracy (∼98%), even with a poor signal/noise ratio. PMID:22677395
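
    As a rough illustration of the decoding step named in the abstract above, Viterbi decoding of a Hidden Markov model can be sketched as follows. This is a toy sketch, not the authors' implementation: the four single-base states, the discrete current levels l0 to l3, and every probability are assumptions chosen for readability, whereas the paper works with 3-bp-resolution measurements.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return the most probable hidden-state path for the observations."""
    if not observations:
        return []
    # best[t][s]: log-probability of the best path that ends in state s at step t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    # back[t][s]: predecessor of s on that best path
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev = max(states, key=lambda p: best[t - 1][p] + log_trans[p][s])
            best[t][s] = (best[t - 1][prev] + log_trans[prev][s]
                          + log_emit[s][observations[t]])
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical toy model: each base prefers one discrete current level.
STATES = ["A", "C", "G", "T"]
LEVELS = ["l0", "l1", "l2", "l3"]
log_start = {s: math.log(0.25) for s in STATES}
log_trans = {p: {s: math.log(0.25) for s in STATES} for p in STATES}
log_emit = {
    s: {lvl: math.log(0.7 if i == j else 0.1) for j, lvl in enumerate(LEVELS)}
    for i, s in enumerate(STATES)
}

decoded = viterbi(["l0", "l2", "l1", "l3"], STATES, log_start, log_trans, log_emit)
print("".join(decoded))
```

    With the uniform transitions assumed here, the decoder simply follows the most likely emission at each step; a more realistic model would presumably encode which overlapping k-mers can follow one another.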

  2. Allele-specific copy-number discovery from whole-genome and whole-exome sequencing.

    PubMed

    Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J; Szatkiewicz, Jin P

    2015-08-18

    Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol is freely available at https://sourceforge.net/projects/asgenseng/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  3. Advanced Transport Operating System (ATOPS) control display unit software description

    NASA Technical Reports Server (NTRS)

    Slominski, Christopher J.; Parks, Mark A.; Debure, Kelly R.; Heaphy, William J.

    1992-01-01

    The software created for the Control Display Units (CDUs), used for the Advanced Transport Operating Systems (ATOPS) project, on the Transport Systems Research Vehicle (TSRV) is described. Module descriptions are presented in a standardized format which contains module purpose, calling sequence, a detailed description, and global references. The global reference section includes subroutines, functions, and common variables referenced by a particular module. The CDUs, one for the pilot and one for the copilot, are used for flight management purposes. Operations performed with the CDU affect the aircraft's guidance, navigation, and display software.

  4. PyParse: a semiautomated system for scoring spoken recall data.

    PubMed

    Solway, Alec; Geller, Aaron S; Sederberg, Per B; Kahana, Michael J

    2010-02-01

    Studies of human memory often generate data on the sequence and timing of recalled items, but scoring such data using conventional methods is difficult or impossible. We describe a Python-based semiautomated system that greatly simplifies this task. This software, called PyParse, can easily be used in conjunction with many common experiment authoring systems. Scored data is output in a simple ASCII format and can be accessed with the programming language of choice, allowing for the identification of features such as correct responses, prior-list intrusions, extra-list intrusions, and repetitions.

  5. VIPER: a web application for rapid expert review of variant calls.

    PubMed

    Wöste, Marius; Dugas, Martin

    2018-06-01

    With the rapid development in next-generation sequencing, cost and time requirements for genomic sequencing are decreasing, enabling applications in many areas such as cancer research. Many tools have been developed to analyze genomic variation ranging from single nucleotide variants to whole chromosomal aberrations. As sequencing throughput increases, the number of variants called by such tools also grows. The often-employed manual inspection of such calls is thus becoming a time-consuming procedure. We developed the Variant InsPector and Expert Rating tool (VIPER) to speed up this process by integrating the Integrative Genomics Viewer into a web application. Analysts can then quickly iterate through variants, apply filters and make decisions based on the generated images and variant metadata. VIPER was successfully employed in analyses with manual inspection of more than 10 000 calls. VIPER is implemented in Java and Javascript and is freely available at https://github.com/MarWoes/viper. Contact: marius.woeste@uni-muenster.de. Supplementary data are available at Bioinformatics online.

  6. One Size Doesn't Fit All - RefEditor: Building Personalized Diploid Reference Genome to Improve Read Mapping and Genotype Calling in Next Generation Sequencing Studies

    PubMed Central

    Yuan, Shuai; Johnston, H. Richard; Zhang, Guosheng; Li, Yun; Hu, Yi-Juan; Qin, Zhaohui S.

    2015-01-01

    With the rapid decline of sequencing costs, researchers today rush to embrace whole genome sequencing (WGS) or whole exome sequencing (WES) as the next powerful tool for relating genetic variants to human diseases and phenotypes. A fundamental step in analyzing WGS and WES data is mapping short sequencing reads back to the reference genome. This is an important issue because incorrectly mapped reads affect the downstream variant discovery, genotype calling and association analysis. Although many read mapping algorithms have been developed, the majority of them use the universal reference genome and do not take sequence variants into consideration. Given that genetic variants are ubiquitous, it is highly desirable if they can be factored into the read mapping procedure. In this work, we developed a novel strategy that utilizes genotypes obtained a priori to customize the universal haploid reference genome into a personalized diploid reference genome. The new strategy is implemented in a program named RefEditor. When applying RefEditor to real data, we achieved encouraging improvements in read mapping, variant discovery and genotype calling. Compared to standard approaches, RefEditor can significantly increase genotype calling consistency (from 43% to 61% at 4X coverage; from 82% to 92% at 20X coverage) and reduce Mendelian inconsistency across various sequencing depths. Because many WGS and WES studies are conducted on cohorts that have been genotyped using array-based genotyping platforms previously or concurrently, we believe the proposed strategy will be of high value in practice, which can also be applied to the scenario where multiple NGS experiments are conducted on the same cohort. The RefEditor sources are available at https://github.com/superyuan/refeditor. PMID:26267278

  7. Inexpensive and Highly Reproducible Cloud-Based Variant Calling of 2,535 Human Genomes

    PubMed Central

    Shringarpure, Suyash S.; Carroll, Andrew; De La Vega, Francisco M.; Bustamante, Carlos D.

    2015-01-01

    Population scale sequencing of whole human genomes is becoming economically feasible; however, data management and analysis remains a formidable challenge for many research groups. Large sequencing studies, like the 1000 Genomes Project, have improved our understanding of human demography and the effect of rare genetic variation in disease. Variant calling on datasets of hundreds or thousands of genomes is time-consuming, expensive, and not easily reproducible given the myriad components of a variant calling pipeline. Here, we describe a cloud-based pipeline for joint variant calling in large samples using the Real Time Genomics population caller. We deployed the population caller on the Amazon cloud with the DNAnexus platform in order to achieve low-cost variant calling. Using our pipeline, we were able to identify 68.3 million variants in 2,535 samples from Phase 3 of the 1000 Genomes Project. By performing the variant calling in a parallel manner, the data was processed within 5 days at a compute cost of $7.33 per sample (a total cost of $18,590 for completed jobs and $21,805 for all jobs). Analysis of cost dependence and running time on the data size suggests that, given near linear scalability, cloud computing can be a cheap and efficient platform for analyzing even larger sequencing studies in the future. PMID:26110529
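
    The cost figures quoted in the abstract above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the abstract; the variable names are illustrative.

```python
# Figures taken from the abstract above (USD).
samples = 2535
total_completed_jobs = 18590
total_all_jobs = 21805

# Per-sample cost for completed jobs; the abstract reports $7.33.
per_sample_completed = total_completed_jobs / samples

# Per-sample cost when failed or repeated jobs are included.
per_sample_all = total_all_jobs / samples

# Spend attributable to jobs that did not complete.
overhead = total_all_jobs - total_completed_jobs

print(f"completed jobs: ${per_sample_completed:.2f} per sample")
print(f"all jobs:       ${per_sample_all:.2f} per sample")
print(f"overhead:       ${overhead}")
```

    The per-sample figure for completed jobs rounds to the $7.33 reported; counting all jobs raises it to roughly $8.60 per sample.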

  8. Observations and Bayesian location methodology of transient acoustic signals (likely blue whales) in the Indian Ocean, using a hydrophone triplet.

    PubMed

    Le Bras, Ronan J; Kuzma, Heidi; Sucic, Victor; Bokelmann, Götz

    2016-05-01

    A notable sequence of calls was encountered, spanning several days in January 2003, in the central part of the Indian Ocean on a hydrophone triplet recording acoustic data at a 250 Hz sampling rate. This paper presents signal processing methods applied to the waveform data to detect, group, extract amplitude and bearing estimates for the recorded signals. An approximate location for the source of the sequence of calls is inferred from extracting the features from the waveform. As the source approaches the hydrophone triplet, the source level (SL) of the calls is estimated at 187 ± 6 dB re: 1 μPa-1 m in the 15-60 Hz frequency range. The calls are attributed to a subgroup of blue whales, Balaenoptera musculus, with a characteristic acoustic signature. A Bayesian location method using probabilistic models for bearing and amplitude is demonstrated on the calls sequence. The method is applied to the case of detection at a single triad of hydrophones and results in a probability distribution map for the origin of the calls. It can be extended to detections at multiple triads and because of the Bayesian formulation, additional modeling complexity can be built-in as needed.

  9. VaDiR: an integrated approach to Variant Detection in RNA.

    PubMed

    Neums, Lisa; Suenaga, Seiji; Beyerlein, Peter; Anders, Sara; Koestler, Devin; Mariani, Andrea; Chien, Jeremy

    2018-02-01

    Advances in next-generation DNA sequencing technologies are now enabling detailed characterization of sequence variations in cancer genomes. With whole-genome sequencing, variations in coding and non-coding sequences can be discovered. But the cost associated with it is currently limiting its general use in research. Whole-exome sequencing is used to characterize sequence variations in coding regions, but the cost associated with capture reagents and biases in capture rate limit its full use in research. Additional limitations include uncertainty in assigning the functional significance of the mutations when these mutations are observed in the non-coding region or in genes that are not expressed in cancer tissue. We investigated the feasibility of uncovering mutations from expressed genes using RNA sequencing datasets with a method called Variant Detection in RNA (VaDiR) that integrates 3 variant callers, namely: SNPiR, RVBoost, and MuTect2. The combination of all 3 methods, which we called Tier 1 variants, produced the highest precision with true positive mutations from RNA-seq that could be validated at the DNA level. We also found that the integration of Tier 1 variants with those called by MuTect2 and SNPiR produced the highest recall with acceptable precision. Finally, we observed a higher rate of mutation discovery in genes that are expressed at higher levels. Our method, VaDiR, provides a possibility of uncovering mutations from RNA sequencing datasets that could be useful in further functional analysis. In addition, our approach allows orthogonal validation of DNA-based mutation discovery by providing complementary sequence variation analysis from paired RNA/DNA sequencing datasets.
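
    The consensus idea described in the abstract above, where agreement among callers is traded against recall, can be sketched with plain set operations. This is a minimal sketch under stated assumptions, not the VaDiR code: the variant tuples are invented toy data, and treating the higher-recall set as the variants shared by MuTect2 and SNPiR is one possible reading of the abstract.

```python
def precision_recall(called, truth):
    """Precision and recall of a call set against a validated truth set."""
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall

# Hypothetical variants as (chromosome, position, ref, alt); not real data.
v1, v2, v3, v4 = ("1", 100, "A", "G"), ("1", 250, "C", "T"), ("2", 75, "G", "A"), ("3", 10, "T", "C")
v5, v6, v7 = ("4", 500, "A", "T"), ("5", 42, "G", "C"), ("6", 7, "C", "A")

truth = {v1, v2, v3, v4}          # e.g. variants validated at the DNA level
snpir = {v1, v2, v3, v5}
rvboost = {v1, v2, v6}
mutect2 = {v1, v2, v3, v5, v7}

# Tier 1: variants reported by all three callers.
tier1 = snpir & rvboost & mutect2

# Assumed relaxed set: variants reported by both MuTect2 and SNPiR.
relaxed = mutect2 & snpir

print("Tier 1 :", precision_recall(tier1, truth))
print("Relaxed:", precision_recall(relaxed, truth))
```

    In this toy data the three-caller intersection is perfectly precise but recovers only half of the true variants, while the relaxed set recovers more at the cost of one false positive.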

  10. Efficient theory of dipolar recoupling in solid-state nuclear magnetic resonance of rotating solids using Floquet-Magnus expansion: application on BABA and C7 radiofrequency pulse sequences.

    PubMed

    Mananga, Eugene S; Reid, Alicia E; Charpentier, Thibault

    2012-02-01

    This article describes the use of an alternative expansion scheme called Floquet-Magnus expansion (FME) to study the dynamics of spin system in solid-state NMR. The main tool used to describe the effect of time-dependent interactions in NMR is the average Hamiltonian theory (AHT). However, some NMR experiments, such as sample rotation and pulse crafting, seem to be more conveniently described using the Floquet theory (FT). Here, we present the first report highlighting the basics of the Floquet-Magnus expansion (FME) scheme and hint at its application on recoupling sequences that excite more efficiently double-quantum coherences, namely BABA and C7 radiofrequency pulse sequences. The use of Λ(n)(t) functions available only in the FME scheme, allows the comparison of the efficiency of BABA and C7 sequences. Copyright © 2011 Elsevier Inc. All rights reserved.

  11. Efficient theory of dipolar recoupling in solid-state nuclear magnetic resonance of rotating solids using Floquet-Magnus expansion: Application on BABA and C7 radiofrequency pulse sequences

    PubMed Central

    Reid, Alicia E.; Charpentier, Thibault

    2013-01-01

    This article describes the use of an alternative expansion scheme called Floquet-Magnus expansion (FME) to study the dynamics of spin system in solid-state NMR. The main tool used to describe the effect of time-dependent interactions in NMR is the average Hamiltonian theory (AHT). However, some NMR experiments, such as sample rotation and pulse crafting, seem to be more conveniently described using the Floquet theory (FT). Here, we present the first report highlighting the basics of the Floquet-Magnus expansion (FME) scheme and hint at its application on recoupling sequences that excite more efficiently double-quantum coherences, namely BABA and C7 radiofrequency pulse sequences. The use of Λn(t) functions available only in the FME scheme, allows the comparison of the efficiency of BABA and C7 sequences. PMID:22197191

  12. Pure Perceptual-Based Sequence Learning: A Role for Visuospatial Attention

    ERIC Educational Resources Information Center

    Remillard, Gilbert

    2009-01-01

    Learning the structure of a sequence of target locations when target location is not the response dimension and the sequence of target locations is uncorrelated with the sequence of responses is called pure perceptual-based sequence learning. The paradigm introduced by G. Remillard (2003) was used to determine whether orienting of visuospatial…

  13. The Functional Human C-Terminome

    PubMed Central

    Hedden, Michael; Lyon, Kenneth F.; Brooks, Steven B.; David, Roxanne P.; Limtong, Justin; Newsome, Jacklyn M.; Novakovic, Nemanja; Rajasekaran, Sanguthevar; Thapar, Vishal; Williams, Sean R.; Schiller, Martin R.

    2016-01-01

    All translated proteins end with a carboxylic acid commonly called the C-terminus. Many short functional sequences (minimotifs) are located on or immediately proximal to the C-terminus. However, information about the function of protein C-termini has not been consolidated into a single source. Here, we built a new “C-terminome” database and web system focused on human proteins. Approximately 3,600 C-termini in the human proteome have a minimotif with an established molecular function. To help evaluate the function of the remaining C-termini in the human proteome, we inferred minimotifs identified by experimentation in rodent cells, predicted minimotifs based upon consensus sequence matches, and predicted novel highly repetitive sequences in C-termini. Predictions can be ranked by enrichment scores or Gene Evolutionary Rate Profiling (GERP) scores, a measurement of evolutionary constraint. By searching for new anchored sequences on the last 10 amino acids of proteins in the human proteome with lengths between 3–10 residues and up to 5 degenerate positions in the consensus sequences, we have identified new consensus sequences that predict instances in the majority of human genes. All of this information is consolidated into a database that can be accessed through a C-terminome web system with search and browse functions for minimotifs and human proteins. A known consensus sequence-based predicted function is assigned to nearly half the proteins in the human proteome. Weblink: http://cterminome.bio-toolkit.com. PMID:27050421
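
    The anchored search over the last 10 amino acids described in the abstract above can be illustrated with a regular expression tied to the end of the sequence. This is a hedged sketch rather than the database's actual method: the function name, the consensus pattern, and the two protein sequences are hypothetical examples.

```python
import re

def c_terminal_match(sequence, consensus, window=10):
    """Return the part of the last `window` residues that matches `consensus`
    when the pattern is anchored at the C-terminus, or None if there is no match."""
    tail = sequence[-window:]
    match = re.search(f"(?:{consensus})$", tail)
    return match.group(0) if match else None

# Illustrative consensus with one degenerate position ('.'), written as a regex.
CONSENSUS = "[ST].[ACVILF]"

# Hypothetical sequences, not real proteins.
proteins = {
    "toy_protein_1": "MKTAYIAKQRETSV",
    "toy_protein_2": "MKTAYIAKQRGGGG",
}

hits = {name: c_terminal_match(seq, CONSENSUS) for name, seq in proteins.items()}
print(hits)
```

    Anchoring with `$` keeps the match at the terminus itself; motifs that sit a few residues upstream would need a pattern that allows trailing residues.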

  14. Capillaries for use in a multiplexed capillary electrophoresis system

    DOEpatents

    Yeung, Edward S.; Chang, Huan-Tsang; Fung, Eliza N.

    1997-12-09

    The invention provides a side-entry optical excitation geometry for use in a multiplexed capillary electrophoresis system. A charge-injection device is optically coupled to capillaries in the array such that the interior of a capillary is imaged onto only one pixel. In Sanger-type 4-label DNA sequencing reactions, nucleotide identification ("base calling") is improved by using two long-pass filters to split fluorescence emission into two emission channels. A binary poly(ethyleneoxide) matrix is used in the electrophoretic separations.

  15. Capillaries for use in a multiplexed capillary electrophoresis system

    DOEpatents

    Yeung, E.S.; Chang, H.T.; Fung, E.N.

    1997-12-09

    The invention provides a side-entry optical excitation geometry for use in a multiplexed capillary electrophoresis system. A charge-injection device is optically coupled to capillaries in the array such that the interior of a capillary is imaged onto only one pixel. In Sanger-type 4-label DNA sequencing reactions, nucleotide identification ("base calling") is improved by using two long-pass filters to split fluorescence emission into two emission channels. A binary poly(ethyleneoxide) matrix is used in the electrophoretic separations. 19 figs.

  16. Program Synthesizes UML Sequence Diagrams

    NASA Technical Reports Server (NTRS)

    Barry, Matthew R.; Osborne, Richard N.

    2006-01-01

    A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas the construction of sequence diagrams was previously a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.

  17. Petri net modeling of high-order genetic systems using grammatical evolution.

    PubMed

    Moore, Jason H; Hahn, Lance W

    2003-11-01

    Understanding how DNA sequence variations impact human health through a hierarchy of biochemical and physiological systems is expected to improve the diagnosis, prevention, and treatment of common, complex human diseases. We have previously developed a hierarchical dynamic systems approach based on Petri nets for generating biochemical network models that are consistent with genetic models of disease susceptibility. This modeling approach uses an evolutionary computation approach called grammatical evolution as a search strategy for optimal Petri net models. We have previously demonstrated that this approach routinely identifies biochemical network models that are consistent with a variety of genetic models in which disease susceptibility is determined by nonlinear interactions between two DNA sequence variations. In the present study, we evaluate whether the Petri net approach is capable of identifying biochemical networks that are consistent with disease susceptibility due to higher order nonlinear interactions between three DNA sequence variations. The results indicate that our model-building approach is capable of routinely identifying good, but not perfect, Petri net models. Ideas for improving the algorithm for this high-dimensional problem are presented.

  18. From sequencer to supercomputer: an automatic pipeline for managing and processing next generation sequencing data.

    PubMed

    Camerlengo, Terry; Ozer, Hatice Gulcin; Onti-Srinivasan, Raghuram; Yan, Pearlly; Huang, Tim; Parvin, Jeffrey; Huang, Kun

    2012-01-01

    Next Generation Sequencing is highly resource intensive. NGS tasks related to data processing, management and analysis require high-end computing servers or even clusters. Additionally, processing NGS experiments requires suitable storage space and significant manual interaction. At The Ohio State University's Biomedical Informatics Shared Resource, we designed and implemented a scalable architecture to address the challenges associated with the resource intensive nature of NGS secondary analysis built around Illumina Genome Analyzer II sequencers and Illumina's Gerald data processing pipeline. The software infrastructure includes a distributed computing platform consisting of a LIMS called QUEST (http://bisr.osumc.edu), an Automation Server, a computer cluster for processing NGS pipelines, and a network attached storage device expandable up to 40TB. The system has been architected to scale to multiple sequencers without requiring additional computing or labor resources. This platform demonstrates how to manage and automate NGS experiments in an institutional or core facility setting.

  19. Augmented brain function by coordinated reset stimulation with slowly varying sequences.

    PubMed

    Zeitler, Magteld; Tass, Peter A

    2015-01-01

    Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS.

  20. Augmented brain function by coordinated reset stimulation with slowly varying sequences

    PubMed Central

    Zeitler, Magteld; Tass, Peter A.

    2015-01-01

    Several brain disorders are characterized by abnormally strong neuronal synchrony. Coordinated Reset (CR) stimulation was developed to selectively counteract abnormal neuronal synchrony by desynchronization. For this, phase resetting stimuli are delivered to different subpopulations in a timely coordinated way. In neural networks with spike timing-dependent plasticity CR stimulation may eventually lead to an anti-kindling, i.e., an unlearning of abnormal synaptic connectivity and abnormal synchrony. The spatiotemporal sequence by which all stimulation sites are stimulated exactly once is called the stimulation site sequence, or briefly sequence. So far, in simulations, pre-clinical and clinical applications CR was applied either with fixed sequences or rapidly varying sequences (RVS). In this computational study we show that appropriate repetition of the sequence with occasional random switching to the next sequence may significantly improve the anti-kindling effect of CR. To this end, a sequence is applied many times before randomly switching to the next sequence. This new method is called SVS CR stimulation, i.e., CR with slowly varying sequences. In a neuronal network with strong short-range excitatory and weak long-range inhibitory dynamic couplings SVS CR stimulation turns out to be superior to CR stimulation with fixed sequences or RVS. PMID:25873867

  1. Multistage Spectral Relaxation Method for Solving the Hyperchaotic Complex Systems

    PubMed Central

    Saberi Nik, Hassan; Rebelo, Paulo

    2014-01-01

    We present an application of a pseudospectral method for solving hyperchaotic complex systems. The proposed method, called the multistage spectral relaxation method (MSRM), is based on a technique of extending Gauss-Seidel type relaxation ideas to systems of nonlinear differential equations and using the Chebyshev pseudospectral methods to solve the resulting system on a sequence of multiple intervals. In this new application, the MSRM is used to solve famous hyperchaotic complex systems such as the hyperchaotic complex Lorenz system and the complex permanent magnet synchronous motor. We compare this approach to the Runge-Kutta based ode45 solver to show that the MSRM gives accurate results. PMID:25386624

  2. In-depth comparison of somatic point mutation callers based on different tumor next-generation sequencing depth data

    NASA Astrophysics Data System (ADS)

    Cai, Lei; Yuan, Wei; Zhang, Zhou; He, Lin; Chou, Kuo-Chen

    2016-11-01

    Four popular somatic single nucleotide variant (SNV) calling methods (Varscan, SomaticSniper, Strelka and MuTect2) were carefully evaluated on the real whole exome sequencing (WES, depth of ~50X) and ultra-deep targeted sequencing (UDT-Seq, depth of ~370X) data. The four tools returned poor consensus on candidates (only 20% of calls were with multiple hits by the callers). For both WES and UDT-Seq, MuTect2 and Strelka obtained the largest proportion of COSMIC entries as well as the lowest rate of dbSNP presence and high-alternative-alleles-in-control calls, demonstrating their superior sensitivity and accuracy. Combining different callers does increase reliability of candidates, but narrows the list down to very limited range of tumor read depth and variant allele frequency. Calling SNV on UDT-Seq data, which were of much higher read-depth, discovered additional true-positive variations, despite an even more tremendous growth in false positive predictions. Our findings not only provide valuable benchmark for state-of-the-art SNV calling methods, but also shed light on the access to more accurate SNV identification in the future.
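
    The consensus figure quoted above (the share of calls reported by more than one caller) can be computed along the following lines. This is an illustrative sketch only, not the authors' pipeline: the function name, the caller labels and the variant tuples are hypothetical, and variants are assumed to be comparable as (chromosome, position, ref, alt) tuples.

```python
from collections import Counter

def multi_hit_fraction(calls_by_caller):
    """Fraction of distinct variants reported by at least two callers.

    calls_by_caller: dict mapping a caller name to an iterable of
    variants, each variant being a hashable key such as
    (chrom, pos, ref, alt). Hypothetical input layout.
    """
    # Count, for every distinct variant, how many callers reported it.
    hits = Counter(v for calls in calls_by_caller.values() for v in set(calls))
    if not hits:
        return 0.0
    return sum(1 for n in hits.values() if n >= 2) / len(hits)

# Toy data (made up for illustration, not from the study).
toy_calls = {
    "caller_a": [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")],
    "caller_b": [("chr1", 200, "G", "C"), ("chr2", 50, "T", "G")],
    "caller_c": [("chr1", 200, "G", "C"), ("chr1", 100, "A", "T"), ("chr3", 7, "C", "A")],
}
toy_fraction = multi_hit_fraction(toy_calls)
```

    Intersecting callers in this way raises confidence in each retained candidate but shrinks the list, which is the trade-off the abstract describes.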

  3. In Vivo-Induced Genes in Pseudomonas aeruginosa

    PubMed Central

    Handfield, Martin; Lehoux, Dario E.; Sanschagrin, François; Mahan, Michael J.; Woods, Donald E.; Levesque, Roger C.

    2000-01-01

    In vivo expression technology was used for testing Pseudomonas aeruginosa in the rat lung model of chronic infection and in a mouse model of systemic infection. Three of the eight ivi proteins found showed sequence identity to known virulence factors involved in iron acquisition via an open reading frame (called pvdI) implicated in pyoverdine biosynthesis, membrane biogenesis (FtsY), and adhesion (Hag2). PMID:10722644

  4. The grief map

    NASA Astrophysics Data System (ADS)

    Monteiro, L. H. A.

    2014-12-01

    Grieving is a natural human reaction to a significant loss. According to a psychiatric model, this process is characterized by a typical sequence of psychological changes. Here, I propose a discrete-time dynamical system, called the grief map, in order to represent the grieving process. The corresponding bifurcation diagram, which exhibits stationary, periodic, and chaotic behavior, is related to the stages of this sorrowful journey occurring during about 12 months post-loss.

  5. Transcriptomic SNP discovery for custom genotyping arrays: impacts of sequence data, SNP calling method and genotyping technology on the probability of validation success.

    PubMed

    Humble, Emily; Thorne, Michael A S; Forcada, Jaume; Hoffman, Joseph I

    2016-08-26

    Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of 'putative' SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays. Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0 % of SNPs being discovered by more than one method and 13.5 % being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays. Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. 
    They also highlight possible differences between alternative genotyping technologies that could be explored in future studies of non-model organisms.

  6. A quasi-intermittency

    NASA Astrophysics Data System (ADS)

    He, Da-Ren; Wang, Xu-Ming; Wang, Ying-Mei; Wang, Wen-Xiu; Chen, He-Sheng

    2002-03-01

    A class of discontinuous, noninvertible area-preserving maps can behave like a dissipative one, so such a map may be called a "quasi-dissipative system" [1]. In a quasi-dissipative system, some elliptic periodic orbits and the elliptic islands around them can be observed to disappear via a collision with the discontinuous border of the system function. A chaotic quasi-attractor dominates the behavior of the system after the disappearance of the elliptic periodic orbit and a sequence of transition elliptic periodic orbits. When the chaotic quasi-attractor first appears, the chaotic time sequence shows random alternation between laminar and turbulent phases. All of this is very similar to the properties of type V intermittency in a dissipative system, so we may call the phenomenon "type V quasi-intermittency". However, only some remnants of the last disappeared transition elliptic island can remain, rather than its "ghost"; therefore type V quasi-intermittency does not obey the characteristic scaling laws of type V intermittency. [1] J. Wang et al., Phys. Rev. E 64 (2001) 026202.

  7. An extension of command shaping methods for controlling residual vibration using frequency sampling

    NASA Technical Reports Server (NTRS)

    Singer, Neil C.; Seering, Warren P.

    1992-01-01

    The authors present an extension to the impulse shaping technique for commanding machines to move with reduced residual vibration. The extension, called frequency sampling, is a method for generating constraints that are used to obtain shaping sequences which minimize residual vibration in systems such as robots whose resonant frequencies change during motion. The authors present a review of impulse shaping methods, a development of the proposed extension, and a comparison of results of tests conducted on a simple model of the space shuttle robot arm. Frequency sampling provides a method for minimizing the impulse sequence duration required to give the desired insensitivity.
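
    As background for the impulse shaping idea, the sketch below builds the classic two-impulse zero-vibration (ZV) shaper and evaluates the residual vibration it leaves at a given frequency. It is not the frequency sampling method itself, whose constraint equations the abstract does not give; it only shows how residual vibration can be evaluated at sampled frequencies. Function names and the example numbers are assumptions.

```python
import math

def zv_shaper(freq_hz, zeta):
    """Two-impulse ZV shaper as a list of (time, amplitude) pairs.

    freq_hz is the undamped natural frequency, zeta the damping ratio.
    """
    wd = 2 * math.pi * freq_hz * math.sqrt(1 - zeta ** 2)  # damped frequency
    k = math.exp(-zeta * math.pi / math.sqrt(1 - zeta ** 2))
    return [(0.0, 1 / (1 + k)), (math.pi / wd, k / (1 + k))]

def residual_vibration(shaper, freq_hz, zeta):
    """Residual vibration amplitude, relative to an unshaped unit impulse,
    for a second-order mode with the given frequency and damping."""
    w = 2 * math.pi * freq_hz
    wd = w * math.sqrt(1 - zeta ** 2)
    c = sum(a * math.exp(zeta * w * t) * math.cos(wd * t) for t, a in shaper)
    s = sum(a * math.exp(zeta * w * t) * math.sin(wd * t) for t, a in shaper)
    return math.exp(-zeta * w * shaper[-1][0]) * math.hypot(c, s)

# Design for a 1 Hz mode, then sample the residual at nearby frequencies
# to see how quickly the cancellation degrades (illustrative values).
shaper = zv_shaper(1.0, 0.05)
sampled = {f: residual_vibration(shaper, f, 0.05) for f in (0.8, 1.0, 1.2)}
```

    The residual is essentially zero at the design frequency and grows as the true frequency drifts, which is why systems whose resonances change during motion need additional constraints at other frequencies.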

  8. Optimized, unequal pulse spacing in multiple echo sequences improves refocusing in magnetic resonance.

    PubMed

    Jenista, Elizabeth R; Stokes, Ashley M; Branca, Rosa Tamara; Warren, Warren S

    2009-11-28

    A recent quantum computing paper (G. S. Uhrig, Phys. Rev. Lett. 98, 100504 (2007)) analytically derived optimal pulse spacings for a multiple spin echo sequence designed to remove decoherence in a two-level system coupled to a bath. The spacings in what has been called a "Uhrig dynamic decoupling (UDD) sequence" differ dramatically from the conventional, equal pulse spacing of a Carr-Purcell-Meiboom-Gill (CPMG) multiple spin echo sequence. The UDD sequence was derived for a model that is unrelated to magnetic resonance, but was recently shown theoretically to be more general. Here we show that the UDD sequence has theoretical advantages for magnetic resonance imaging of structured materials such as tissue, where diffusion in compartmentalized and microstructured environments leads to fluctuating fields on a range of different time scales. We also show experimentally, both in excised tissue and in a live mouse tumor model, that optimal UDD sequences produce different T2-weighted contrast than do CPMG sequences with the same number of pulses and total delay, with substantial enhancements in most regions. This permits improved characterization of low-frequency spectral density functions in a wide range of applications.
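
    The difference between the two pulse timings can be made concrete with a short sketch. It uses the closed form from the cited Uhrig paper, t_j = T sin^2(pi j / (2n + 2)), against the equally spaced CPMG timing t_j = T (2j - 1) / (2n); the function names are illustrative and nothing here models the imaging experiments.

```python
import math

def udd_times(n, total):
    """Times of the n refocusing pulses of a UDD sequence of duration `total`."""
    return [total * math.sin(math.pi * j / (2 * n + 2)) ** 2 for j in range(1, n + 1)]

def cpmg_times(n, total):
    """Times of the n refocusing pulses of a CPMG sequence of duration `total`."""
    return [total * (2 * j - 1) / (2 * n) for j in range(1, n + 1)]

# Same pulse count and total delay, as in the comparison described above.
udd = udd_times(8, 1.0)
cpmg = cpmg_times(8, 1.0)
```

    For one or two pulses the two schedules coincide; from three pulses on, UDD packs pulses more densely toward the start and end of the interval.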

  9. BioVLAB-MMIA-NGS: microRNA-mRNA integrated analysis using high-throughput sequencing data.

    PubMed

    Chae, Heejoon; Rhee, Sungmin; Nephew, Kenneth P; Kim, Sun

    2015-01-15

    It is now well established that microRNAs (miRNAs) play a critical role in regulating gene expression in a sequence-specific manner, and genome-wide efforts are underway to predict known and novel miRNA targets. However, the integrated miRNA-mRNA analysis remains a major computational challenge, requiring powerful informatics systems and bioinformatics expertise. The objective of this study was to modify our widely recognized Web server for the integrated mRNA-miRNA analysis (MMIA) and its subsequent deployment on the Amazon cloud (BioVLAB-MMIA) to be compatible with high-throughput platforms, including next-generation sequencing (NGS) data (e.g. RNA-seq). We developed a new version called the BioVLAB-MMIA-NGS, deployed on both Amazon cloud and on a high-performance publicly available server called MAHA. By using NGS data and integrating various bioinformatics tools and databases, BioVLAB-MMIA-NGS offers several advantages. First, sequencing data is more accurate than array-based methods for determining miRNA expression levels. Second, potential novel miRNAs can be detected by using various computational methods for characterizing miRNAs. Third, because miRNA-mediated gene regulation is due to hybridization of an miRNA to its target mRNA, sequencing data can be used to identify many-to-many relationship between miRNAs and target genes with high accuracy. http://epigenomics.snu.ac.kr/biovlab_mmia_ngs/. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  10. Address the Major Societal Challenges

    NASA Astrophysics Data System (ADS)

    Laubichler, Manfred

    In his famous historical account about the origins of molecular biology Gunther Stent introduced a three phase sequence that turns out to be characteristic for many newly emerging paradigms within science. New ideas, according to Stent, follow a sequence of romantic, dogmatic, and academic phases. One can easily see that complex systems science followed this path. The question now is whether we are in an extended academic phase of gradually expanding both theoretical and practical knowledge, or whether we are entering a new transformation of complex systems science that might well bring about a new romantic phase. I would argue that complexity science, indeed, is at the dawn of a new period - let's call it complexity 3.0. The last academic phase has seen the application of complex systems ideas and methods in a variety of different domains. It has been to a large extent business as usual...

  11. Connecting the dots between genes, biochemistry, and disease susceptibility: systems biology modeling in human genetics.

    PubMed

    Moore, Jason H; Boczko, Erik M; Summar, Marshall L

    2005-02-01

    Understanding how DNA sequence variations impact human health through a hierarchy of biochemical and physiological systems is expected to improve the diagnosis, prevention, and treatment of common, complex human diseases. We have previously developed a hierarchical dynamic systems approach based on Petri nets for generating biochemical network models that are consistent with genetic models of disease susceptibility. This modeling approach uses an evolutionary computation approach called grammatical evolution as a search strategy for optimal Petri net models. We have previously demonstrated that this approach routinely identifies biochemical network models that are consistent with a variety of genetic models in which disease susceptibility is determined by nonlinear interactions between two or more DNA sequence variations. We review here this approach and then discuss how it can be used to model biochemical and metabolic data in the context of genetic studies of human disease susceptibility.

  12. Physical mode of bacteria and virus coevolution

    NASA Astrophysics Data System (ADS)

    Han, Pu; Niestemski, Liang; Deem, Michael

    2013-03-01

    Single-cell hosts such as bacteria or archaea possess an adaptive, heritable immune system that protects them from viral invasion. This system, known as the CRISPR-Cas system, allows the host to recognize and incorporate short foreign DNA or RNA sequences from viruses or plasmids. The sequences form what are called "spacers" in the CRISPR. Spacers in the CRISPR loci provide a record of the host and predator coevolution history. We develop a physical model to study the dynamics of this coevolution due to immune pressure. Hosts and viruses reproduce, die, and evolve due to viral infection pressure, host immune pressure, and mutation. We will discuss the differing effects of point mutation and recombination on CRISPR evolution. We will also discuss the effect of different spacer deletion mechanisms. We will describe population structure of hosts and viruses, how spacer diversity depends on position within CRISPR, and match of the CRISPR spacers to the virus population.

  13. Forecasting drought risks for a water supply storage system using bootstrap position analysis

    USGS Publications Warehouse

    Tasker, Gary; Dunne, Paul

    1997-01-01

    Forecasting the likelihood of drought conditions is an integral part of managing a water supply storage and delivery system. Position analysis uses a large number of possible flow sequences as inputs to a simulation of a water supply storage and delivery system. For a given set of operating rules and water use requirements, water managers can use such a model to forecast the likelihood of specified outcomes such as reservoir levels falling below a specified level or streamflows falling below statutory passing flows a few months ahead conditioned on the current reservoir levels and streamflows. The large number of possible flow sequences are generated using a stochastic streamflow model with a random resampling of innovations. The advantages of this resampling scheme, called bootstrap position analysis, are that it does not rely on the unverifiable assumption of normality and it allows incorporation of long-range weather forecasts into the analysis.
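
    A minimal sketch of the resampling idea follows. It assumes, purely for illustration, an AR(1) model on log flows and a single reservoir with constant demand; the abstract does not specify the authors' streamflow model or operating rules, and all names and numbers below are hypothetical.

```python
import math
import random

def fit_ar1(x):
    """Least-squares AR(1) fit; returns mean, coefficient and residuals."""
    mu = sum(x) / len(x)
    d = [v - mu for v in x]
    phi = sum(d[t] * d[t - 1] for t in range(1, len(d))) / sum(v * v for v in d[:-1])
    resid = [d[t] - phi * d[t - 1] for t in range(1, len(d))]
    return mu, phi, resid

def risk_below(log_flows, storage0, capacity, demand, threshold,
               horizon, n_traces=2000, seed=0):
    """Estimated probability that storage drops below `threshold` within
    `horizon` steps, conditioned on the latest flow and current storage."""
    rng = random.Random(seed)
    mu, phi, resid = fit_ar1(log_flows)
    hits = 0
    for _ in range(n_traces):
        x, s = log_flows[-1], storage0
        for _ in range(horizon):
            # Innovations are resampled from the fitted residuals rather
            # than drawn from a normal distribution.
            x = mu + phi * (x - mu) + rng.choice(resid)
            s = min(capacity, max(0.0, s + math.exp(x) - demand))
            if s < threshold:
                hits += 1
                break
    return hits / n_traces

# Made-up monthly flows and reservoir figures, for illustration only.
history = [math.log(v) for v in (10, 12, 9, 14, 11, 8, 13, 10)]
risk = risk_below(history, storage0=50.0, capacity=100.0, demand=12.0,
                  threshold=20.0, horizon=6)
```

    Because the innovations come from the empirical residuals, no normality assumption is needed, which is the advantage the abstract highlights.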

  14. Sequence Diversity Diagram for comparative analysis of multiple sequence alignments.

    PubMed

    Sakai, Ryo; Aerts, Jan

    2014-01-01

    The sequence logo is a graphical representation of a set of aligned sequences, commonly used to depict conservation of amino acid or nucleotide sequences. Although it effectively communicates the amount of information present at every position, this visual representation falls short when the domain task is to compare between two or more sets of aligned sequences. We present a new visual presentation called a Sequence Diversity Diagram and validate our design choices with a case study. Our software was developed using the open-source program called Processing. It loads multiple sequence alignment FASTA files and a configuration file, which can be modified as needed to change the visualization. The redesigned figure improves on the visual comparison of two or more sets, and it additionally encodes information on sequential position conservation. In our case study of the adenylate kinase lid domain, the Sequence Diversity Diagram reveals unexpected patterns and new insights, for example the identification of subgroups within the protein subfamily. Our future work will integrate this visual encoding into interactive visualization tools to support higher level data exploration tasks.
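
    The "amount of information present at every position" that a sequence logo displays is conventionally the maximum entropy minus the observed Shannon entropy of each alignment column. The sketch below computes that quantity for equal-length aligned sequences; it omits the small-sample correction and gap handling, and it is not the authors' Processing software.

```python
import math
from collections import Counter

def information_content(alignment, alphabet_size=4):
    """Per-column information in bits: log2(alphabet_size) - entropy.

    alignment: list of equal-length strings (4 letters for DNA,
    use alphabet_size=20 for proteins).
    """
    bits = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        bits.append(math.log2(alphabet_size) - entropy)
    return bits

# Toy alignment (made up): two conserved columns, then two variable ones.
toy_bits = information_content(["ACGT", "ACGA", "ACCA", "ACTA"])
```

    Comparing two sets of aligned sequences then amounts to comparing two such profiles (and the underlying residue frequencies), which is the task the abstract says a single logo handles poorly.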

  15. Does similarity in call structure or foraging ecology explain interspecific information transfer in wild Myotis bats?

    PubMed

    Hügel, Theresa; van Meir, Vincent; Muñoz-Meneses, Amanda; Clarin, B-Markus; Siemers, Björn M; Goerlitz, Holger R

    2017-01-01

    Animals can gain important information by attending to the signals and cues of other animals in their environment, with acoustic information playing a major role in many taxa. Echolocation call sequences of bats contain information about the identity and behaviour of the sender which is perceptible to close-by receivers. Increasing evidence supports the communicative function of echolocation within species, yet data about its role for interspecific information transfer is scarce. Here, we asked which information bats extract from heterospecific echolocation calls during foraging. In three linked playback experiments, we tested in the flight room and field if foraging Myotis bats approached the foraging call sequences of conspecifics and four heterospecifics that were similar in acoustic call structure only (acoustic similarity hypothesis), in foraging ecology only (foraging similarity hypothesis), both, or none. Compared to the natural prey capture rate of 1.3 buzzes per minute of bat activity, our playbacks of foraging sequences with 23-40 buzzes/min simulated foraging patches with significantly higher profitability. In the flight room, M. capaccinii only approached call sequences of conspecifics and of the heterospecific M. daubentonii with similar acoustics and foraging ecology. In the field, M. capaccinii and M. daubentonii only showed a weak positive response to those two species. Our results confirm information transfer across species boundaries and highlight the importance of context on the studied behaviour, but cannot resolve whether information transfer in trawling Myotis is based on acoustic similarity only or on a combination of similarity in acoustics and foraging ecology. Animals transfer information, both voluntarily and inadvertently, and within and across species boundaries. 
    In echolocating bats, acoustic call structure and foraging ecology are linked, making echolocation calls a rich source of information about species identity, ecology and activity of the sender, which receivers might exploit to find profitable foraging grounds. We tested in three lab and field experiments if information transfer occurs between bat species and if bats obtain information about ecology from echolocation calls. Myotis capaccinii/daubentonii bats approached call playbacks, but only those from con- and heterospecifics with similar call structure and foraging ecology, confirming interspecific information transfer. Reactions differed between lab and field, emphasising situation-dependent differences in animal behaviour, the importance of field research, and the need for further studies on the underlying mechanism of information transfer and the relative contributions of acoustic and ecological similarity.

  16. Verifying Digital Components of Physical Systems: Experimental Evaluation of Test Quality

    NASA Astrophysics Data System (ADS)

    Laputenko, A. V.; López, J. E.; Yevtushenko, N. V.

    2018-03-01

    This paper continues the study of high quality test derivation for verifying digital components which are used in various physical systems, such as sensors and data transfer components. For experimental evaluation we have used logic circuits b01-b010 of the package of ITC'99 benchmarks (Second Release) which, as stated before, describe digital components of physical systems designed for various applications. Test sequences are derived for detecting the most known faults of the reference logic circuit using three different approaches to test derivation. Three widely used fault types, namely stuck-at faults, bridges, and faults which slightly modify the behavior of one gate, are considered as possible faults of the reference behavior. The most interesting test sequences are short test sequences that can provide appropriate guarantees after testing, and thus, we experimentally study various approaches to the derivation of the so-called complete test suites which detect all fault types. In the first series of experiments, we compare two approaches for deriving complete test suites. In the first approach, a shortest test sequence is derived for testing each fault. In the second approach, a test sequence is pseudo-randomly generated using appropriate software for logic synthesis and verification (the ABC system in our study) and thus can be longer. However, after deleting sequences detecting the same set of faults, a test suite returned by the second approach is shorter. The latter underlines the fact that in many cases it is useless to spend time and effort on deriving a shortest distinguishing sequence; it is better to use test minimization afterwards. The performed experiments also show that the use of only randomly generated test sequences is not very efficient since such sequences do not detect all the faults of any type. After reaching a fault coverage of around 70%, saturation is observed, and the fault coverage cannot be increased anymore. For deriving high-quality short test suites, combining randomly generated sequences with sequences aimed at detecting the faults missed by random tests achieves good fault coverage using the shortest test sequences.
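
    The minimization step mentioned above (deleting sequences that detect the same set of faults) can be sketched as follows. The data layout, function names and toy fault identifiers are assumptions made for illustration; fault simulation itself, which the authors performed with dedicated tools, is taken as given.

```python
def drop_redundant(tests):
    """Keep one test sequence per distinct set of detected faults.

    tests: dict mapping a test sequence (tuple of input vectors) to the
    set of fault ids it detects. Among sequences detecting the same set,
    the shortest one is kept.
    """
    best = {}
    for seq, faults in tests.items():
        key = frozenset(faults)
        if key not in best or len(seq) < len(best[key]):
            best[key] = seq
    return {seq: set(key) for key, seq in best.items()}

def fault_coverage(tests, all_faults):
    """Fraction of `all_faults` detected by at least one test sequence."""
    detected = set()
    for faults in tests.values():
        detected |= set(faults)
    return len(detected & set(all_faults)) / len(all_faults)

# Toy suite (made up): the first two sequences detect the same faults.
toy_suite = {
    ("01", "10"): {1, 2},
    ("11",): {1, 2},
    ("00", "01", "11"): {3},
}
toy_minimized = drop_redundant(toy_suite)
```

    Minimization leaves the fault coverage unchanged while shrinking the suite, which is why it can be applied after cheap pseudo-random generation instead of deriving a shortest sequence per fault.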

  17. DNA base-calling from a nanopore using a Viterbi algorithm.

    PubMed

    Timp, Winston; Comer, Jeffrey; Aksimentiev, Aleksei

    2012-05-16

    Nanopore-based DNA sequencing is the most promising third-generation sequencing method. It has superior read length, speed, and sample requirements compared with state-of-the-art second-generation methods. However, base-calling still presents substantial difficulty because the resolution of the technique is limited compared with the measured signal/noise ratio. Here we demonstrate a method to decode 3-bp-resolution nanopore electrical measurements into a DNA sequence using a Hidden Markov model. This method shows tremendous potential for accuracy (~98%), even with a poor signal/noise ratio. Copyright © 2012 Biophysical Society. Published by Elsevier Inc. All rights reserved.
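
    For readers unfamiliar with the decoding step, the sketch below shows a generic log-space Viterbi decoder for a hidden Markov model. The toy model (two single-base states, two discretized current levels, and the probabilities) is entirely hypothetical; the paper's model works on 3-bp-resolution measurements and its parameters are not given in the abstract.

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Most probable hidden state path for the observation sequence `obs`."""
    score = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []
    for o in obs[1:]:
        prev, cur, ptr = score[-1], {}, {}
        for s in states:
            # Best predecessor of state s at this step.
            p = max(states, key=lambda q: prev[q] + log_trans[q][s])
            cur[s] = prev[p] + log_trans[p][s] + log_emit[s][o]
            ptr[s] = p
        score.append(cur)
        back.append(ptr)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: score[-1][s])]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return path[::-1]

# Hypothetical toy model: base A mostly gives a low current, C a high one.
states = ["A", "C"]
log_start = {s: math.log(0.5) for s in states}
log_trans = {p: {s: math.log(0.5) for s in states} for p in states}
log_emit = {
    "A": {"lo": math.log(0.9), "hi": math.log(0.1)},
    "C": {"lo": math.log(0.1), "hi": math.log(0.9)},
}
toy_path = viterbi(["lo", "hi", "hi", "lo"], states, log_start, log_trans, log_emit)
```

    Working with log probabilities avoids numerical underflow on long reads; a realistic model would also use non-uniform transitions to encode how the strand moves through the pore.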

  18. BALSA: integrated secondary analysis for whole-genome and whole-exome sequencing, accelerated by GPU.

    PubMed

    Luo, Ruibang; Wong, Yiu-Lun; Law, Wai-Chun; Lee, Lap-Kei; Cheung, Jeanno; Liu, Chi-Man; Lam, Tak-Wah

    2014-01-01

    This paper reports an integrated solution, called BALSA, for the secondary analysis of next generation sequencing data; it exploits the computational power of GPU and an intricate memory management to give a fast and accurate analysis. From raw reads to variants (including SNPs and Indels), BALSA, using just a single computing node with a commodity GPU board, takes 5.5 h to process 50-fold whole genome sequencing (∼750 million 100 bp paired-end reads), or just 25 min for 210-fold whole exome sequencing. BALSA's speed is rooted at its parallel algorithms to effectively exploit a GPU to speed up processes like alignment, realignment and statistical testing. BALSA incorporates a 16-genotype model to support the calling of SNPs and Indels and achieves competitive variant calling accuracy and sensitivity when compared to the ensemble of six popular variant callers. BALSA also supports efficient identification of somatic SNVs and CNVs; experiments showed that BALSA recovers all the previously validated somatic SNVs and CNVs, and it is more sensitive for somatic Indel detection. BALSA outputs variants in VCF format. A pileup-like SNAPSHOT format, while maintaining the same fidelity as BAM in variant calling, enables efficient storage and indexing, and facilitates the App development of downstream analyses. BALSA is available at: http://sourceforge.net/p/balsa.

  19. Using cellular automata to generate image representation for biological sequences.

    PubMed

    Xiao, X; Shao, S; Ding, Y; Huang, Z; Chen, X; Chou, K-C

    2005-02-01

    A novel approach to visualize biological sequences is developed based on cellular automata (Wolfram, S. Nature 1984, 311, 419-424), a set of discrete dynamical systems in which space and time are discrete. By transforming the symbolic sequence codes into the digital codes, and using some optimal space-time evolvement rules of cellular automata, a biological sequence can be represented by a unique image, the so-called cellular automata image. Many important features, which are originally hidden in a long and complicated biological sequence, can be clearly revealed through its cellular automata image. With biological sequences entering into databanks rapidly increasing in the post-genomic era, it is anticipated that the cellular automata image will become a very useful vehicle for investigation into their key features, identification of their function, as well as revelation of their "fingerprint". It is anticipated that by using the concept of the pseudo amino acid composition (Chou, K.C. Proteins: Structure, Function, and Genetics, 2001, 43, 246-255), the cellular automata image approach can also be used to improve the quality of predicting protein attributes, such as structural class and subcellular location.

  20. Use of sequence-independent-single-primer-amplification (SISPA) for whole genome sequencing using illumina MiSeq platform for avian influenza virus, Newcastle disease virus, and infectious bronchitis virus

    USDA-ARS's Scientific Manuscript database

    Over the past decade, Next Generation Sequencing (NGS) technologies, also called deep sequencing, have continued to evolve, increasing capacity and lowering the cost necessary for large genome sequencing projects. One of the advantages of NGS platforms is the possibility to sequence the samples with...

  1. A tale of three next generation sequencing platforms: comparison of Ion Torrent, Pacific Biosciences and Illumina MiSeq sequencers.

    PubMed

    Quail, Michael A; Smith, Miriam; Coupland, Paul; Otto, Thomas D; Harris, Simon R; Connor, Thomas R; Bertoni, Anna; Swerdlow, Harold P; Gu, Yong

    2012-07-24

    Next generation sequencing (NGS) technology has revolutionized genomic and genetic research. The pace of change in this area is rapid with three major new sequencing platforms having been released in 2011: Ion Torrent's PGM, Pacific Biosciences' RS and the Illumina MiSeq. Here we compare the results obtained with those platforms to the performance of the Illumina HiSeq, the current market leader. In order to compare these platforms, and get sufficient coverage depth to allow meaningful analysis, we have sequenced a set of 4 microbial genomes with mean GC content ranging from 19.3 to 67.7%. Together, these represent a comprehensive range of genome content. Here we report our analysis of that sequence data in terms of coverage distribution, bias, GC distribution, variant detection and accuracy. Sequence generated by Ion Torrent, MiSeq and Pacific Biosciences technologies displays near perfect coverage behaviour on GC-rich, neutral and moderately AT-rich genomes, but a profound bias was observed upon sequencing the extremely AT-rich genome of Plasmodium falciparum on the PGM, resulting in no coverage for approximately 30% of the genome. We analysed the ability to call variants from each platform and found that we could call slightly more variants from Ion Torrent data compared to MiSeq data, but at the expense of a higher false positive rate. Variant calling from Pacific Biosciences data was possible but higher coverage depth was required. Context specific errors were observed in both PGM and MiSeq data, but not in that from the Pacific Biosciences platform. All three fast turnaround sequencers evaluated here were able to generate usable sequence. However there are key differences between the quality of that data and the applications it will support.

  2. Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

    NASA Technical Reports Server (NTRS)

    Wallace, G. R.; Weathers, G. D.; Graf, E. R.

    1973-01-01

    The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
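    As a rough illustration of the construction described above, the following Python sketch builds two maximum-length sequences from linear recurrences over GF(2) and forms their modulo-two sum. The function names, the chosen polynomials (x^3 + x + 1 and x^4 + x + 1) and the seeds are assumptions made for this example and do not come from the report; the filtering step and the statistical quality factor are not reproduced.

```python
def m_sequence(taps, seed, length):
    """Return `length` bits of the recurrence s[n] = XOR of s[n - t] for t in taps.

    `seed` supplies the first max(taps) bits and must not be all zeros.
    """
    bits = list(seed)
    while len(bits) < length:
        nxt = 0
        for t in taps:
            nxt ^= bits[-t]  # bits[-t] is s[n - t] for the bit being generated
        bits.append(nxt)
    return bits[:length]


def hybrid_sum(*sequences):
    """Modulo-two (XOR) sum of equally long bit sequences, element by element."""
    return [sum(column) % 2 for column in zip(*sequences)]


# Hypothetical example polynomials (assumed, not taken from the report):
# x^3 + x + 1 gives s[n] = s[n-2] ^ s[n-3], period 7;
# x^4 + x + 1 gives s[n] = s[n-3] ^ s[n-4], period 15.
N = 210
a = m_sequence(taps=(2, 3), seed=(1, 0, 0), length=N)
b = m_sequence(taps=(3, 4), seed=(1, 0, 0, 0), length=N)
h = hybrid_sum(a, b)
```

    Because the two component periods (7 and 15) share no common factor, the summed sequence in this example repeats after 105 bits.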

  3. SeqMule: automated pipeline for analysis of human exome/genome sequencing data.

    PubMed

    Guo, Yunfei; Ding, Xiaolei; Shen, Yufeng; Lyon, Gholson J; Wang, Kai

    2015-09-18

    Next-generation sequencing (NGS) technology has greatly helped us identify disease-contributory variants for Mendelian diseases. However, users are often faced with issues such as software compatibility, complicated configuration, and no access to high-performance computing facility. Discrepancies exist among aligners and variant callers. We developed a computational pipeline, SeqMule, to perform automated variant calling from NGS data on human genomes and exomes. SeqMule integrates computational-cluster-free parallelization capability built on top of the variant callers, and facilitates normalization/intersection of variant calls to generate consensus set with high confidence. SeqMule integrates 5 alignment tools, 5 variant calling algorithms and accepts various combinations all by one-line command, therefore allowing highly flexible yet fully automated variant calling. In a modern machine (2 Intel Xeon X5650 CPUs, 48 GB memory), when fast turn-around is needed, SeqMule generates annotated VCF files in a day from a 30X whole-genome sequencing data set; when more accurate calling is needed, SeqMule generates consensus call set that improves over single callers, as measured by both Mendelian error rate and consistency. SeqMule supports Sun Grid Engine for parallel processing, offers turn-key solution for deployment on Amazon Web Services, allows quality check, Mendelian error check, consistency evaluation, HTML-based reports. SeqMule is available at http://seqmule.openbioinformatics.org.
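    The intersection of variant calls mentioned above can be pictured with a small Python sketch. This is not SeqMule's code: the function name, the (chrom, pos, ref, alt) tuple representation and the `min_callers` parameter are assumptions for illustration, and the variants are assumed to have been normalized beforehand so that identical variants compare equal.

```python
def consensus_calls(call_sets, min_callers=None):
    """Return variants reported by at least `min_callers` callers.

    Each call set is an iterable of (chrom, pos, ref, alt) tuples. With
    min_callers=None the result is the strict intersection of all sets.
    """
    sets = [set(calls) for calls in call_sets]
    if min_callers is None:
        min_callers = len(sets)
    support = {}
    for calls in sets:
        for variant in calls:
            support[variant] = support.get(variant, 0) + 1
    return sorted(v for v, n in support.items() if n >= min_callers)


# Hypothetical call sets from three callers (made-up coordinates).
caller_a = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]
caller_b = [("chr1", 100, "A", "G"), ("chr2", 50, "G", "A")]
caller_c = [("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")]

strict = consensus_calls([caller_a, caller_b, caller_c])
majority = consensus_calls([caller_a, caller_b, caller_c], min_callers=2)
```

    The strict intersection keeps only variants that every caller reports, while a lower `min_callers` trades some confidence for sensitivity.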

  4. OTG-snpcaller: An Optimized Pipeline Based on TMAP and GATK for SNP Calling from Ion Torrent Data

    PubMed Central

    Huang, Wenpan; Xi, Feng; Lin, Lin; Zhi, Qihuan; Zhang, Wenwei; Tang, Y. Tom; Geng, Chunyu; Lu, Zhiyuan; Xu, Xun

    2014-01-01

    Because the new Proton platform from Life Technologies produced markedly different data from those of the Illumina platform, the conventional Illumina data analysis pipeline could not be used directly. We developed an optimized SNP calling method using TMAP and GATK (OTG-snpcaller). This method combined our own optimized processes, Remove Duplicates According to AS Tag (RDAST) and Alignment Optimize Structure (AOS), together with TMAP and GATK, to call SNPs from Proton data. We sequenced four sets of exomes captured by Agilent SureSelect and NimbleGen SeqCap EZ Kit, using Life Technology’s Ion Proton sequencer. Then we applied OTG-snpcaller and compared our results with the results from Torrent Variants Caller. The results indicated that OTG-snpcaller can reduce both false positive and false negative rates. Moreover, we compared our results with Illumina results generated by GATK best practices, and we found that the results of these two platforms were comparable. The good performance in variant calling using GATK best practices can be primarily attributed to the high quality of the Illumina sequences. PMID:24824529

  5. OTG-snpcaller: an optimized pipeline based on TMAP and GATK for SNP calling from ion torrent data.

    PubMed

    Zhu, Pengyuan; He, Lingyu; Li, Yaqiao; Huang, Wenpan; Xi, Feng; Lin, Lin; Zhi, Qihuan; Zhang, Wenwei; Tang, Y Tom; Geng, Chunyu; Lu, Zhiyuan; Xu, Xun

    2014-01-01

    Because the new Proton platform from Life Technologies produced markedly different data from those of the Illumina platform, the conventional Illumina data analysis pipeline could not be used directly. We developed an optimized SNP calling method using TMAP and GATK (OTG-snpcaller). This method combined our own optimized processes, Remove Duplicates According to AS Tag (RDAST) and Alignment Optimize Structure (AOS), together with TMAP and GATK, to call SNPs from Proton data. We sequenced four sets of exomes captured by Agilent SureSelect and NimbleGen SeqCap EZ Kit, using Life Technology's Ion Proton sequencer. Then we applied OTG-snpcaller and compared our results with the results from Torrent Variants Caller. The results indicated that OTG-snpcaller can reduce both false positive and false negative rates. Moreover, we compared our results with Illumina results generated by GATK best practices, and we found that the results of these two platforms were comparable. The good performance in variant calling using GATK best practices can be primarily attributed to the high quality of the Illumina sequences.

  6. Computing Lives And Reliabilities Of Turboprop Transmissions

    NASA Technical Reports Server (NTRS)

    Coy, J. J.; Savage, M.; Radil, K. C.; Lewicki, D. G.

    1991-01-01

    Computer program PSHFT calculates lifetimes of variety of aircraft transmissions. Consists of main program, series of subroutines applying to specific configurations, generic subroutines for analysis of properties of components, subroutines for analysis of system, and common block. Main program selects routines used in analysis and causes them to operate in desired sequence. Series of configuration-specific subroutines put in configuration data, perform force and life analyses for components (with help of generic component-property-analysis subroutines), fill property array, call up system-analysis routines, and finally print out results of analysis for system and components. Written in FORTRAN 77(IV).

  7. Isolation and molecular typing of Naegleria fowleri from the brain of a cow that died of primary amebic meningoencephalitis.

    PubMed

    Visvesvara, Govinda S; De Jonckheere, Johan F; Sriram, Rama; Daft, Barbara

    2005-08-01

    Naegleria fowleri causes an acute and rapidly fatal central nervous system infection called primary amebic meningoencephalitis (PAM) in healthy children and young adults. We describe here the identification of N. fowleri isolated from the brain of one of several cows that died of PAM based on sequencing of the internal transcribed spacers, including the 5.8S rRNA genes.

  8. Mobile Genome Express (MGE): A comprehensive automatic genetic analyses pipeline with a mobile device.

    PubMed

    Yoon, Jun-Hee; Kim, Thomas W; Mendez, Pedro; Jablons, David M; Kim, Il-Jin

    2017-01-01

    The development of next-generation sequencing (NGS) technology allows whole exomes or genomes to be sequenced. However, data analysis is still the biggest bottleneck for its wide implementation. Most laboratories still depend on manual procedures for data handling and analyses, which translates into a delay and decreased efficiency in the delivery of NGS results to doctors and patients. Thus, there is high demand for developing an automatic and easy-to-use NGS data analysis system. We developed a comprehensive, automatic genetic analysis controller named Mobile Genome Express (MGE) that works on smartphones or other mobile devices. MGE can handle all the steps for genetic analyses, such as: sample information submission, sequencing run quality check from the sequencer, secured data transfer and results review. We sequenced an Actrometrix control DNA containing multiple proven human mutations using a targeted sequencing panel, and the whole analysis was managed by MGE and its data reviewing program, called ELECTRO. All steps were processed automatically except for the final sequencing review procedure with ELECTRO to confirm mutations. The data analysis process was completed within several hours. We confirmed that the mutations we identified were consistent with our previous results obtained by using multi-step, manual pipelines.

  9. Detailed temporal structure of communication networks in groups of songbirds.

    PubMed

    Stowell, Dan; Gill, Lisa; Clayton, David

    2016-06-01

    Animals in groups often exchange calls, in patterns whose temporal structure may be influenced by contextual factors such as physical location and the social network structure of the group. We introduce a model-based analysis for temporal patterns of animal call timing, originally developed for networks of firing neurons. This has advantages over cross-correlation analysis in that it can correctly handle common-cause confounds and provides a generative model of call patterns with explicit parameters for the influences between individuals. It also has advantages over standard Markovian analysis in that it incorporates detailed temporal interactions which affect timing as well as sequencing of calls. Further, a fitted model can be used to generate novel synthetic call sequences. We apply the method to calls recorded from groups of domesticated zebra finch (Taeniopygia guttata) individuals. We find that the communication network in these groups has stable structure that persists from one day to the next, and that 'kernels' reflecting the temporal range of influence have a characteristic structure for a calling individual's effect on itself, its partner and on others in the group. We further find characteristic patterns of influences by call type as well as by individual. © 2016 The Authors.

  10. With whom to dine? Ravens' responses to food-associated calls depend on individual characteristics of the caller

    PubMed Central

    Szipl, Georgine; Boeckle, Markus; Wascher, Claudia A.F.; Spreafico, Michela; Bugnyar, Thomas

    2015-01-01

    Upon discovering food, common ravens, Corvus corax, produce far-reaching ‘haa’ calls or yells, which are individually distinct and signal food availability to conspecifics. Here, we investigated whether ravens respond differently to ‘haa’ calls of known and unknown individuals. In a paired playback design, we tested responses to ‘haa’ call sequences in a group containing individually marked free-ranging ravens. We simultaneously played call sequences of a male and a female raven in two different locations and varied familiarity (known or unknown to the local group). Ravens responded strongest to dyads containing familiar females, performing more scan flights above and by perching in trees near the respective speaker. Acoustic analysis of the calls used as stimuli showed no sex-, age- or familiarity-specific acoustic cues, but highly significant classification results at the individual level. Taken together, our findings indicate that ravens respond to individual characteristics in ‘haa’ calls, and choose whom to approach for feeding, i.e. join social allies and avoid dominant conspecifics. This is the first study to investigate responses to ‘haa’ calls under natural conditions in a wild population containing individually marked ravens. PMID:25598542

  11. CRISPR-Cas9-Edited Site Sequencing (CRES-Seq): An Efficient and High-Throughput Method for the Selection of CRISPR-Cas9-Edited Clones.

    PubMed

    Veeranagouda, Yaligara; Debono-Lagneaux, Delphine; Fournet, Hamida; Thill, Gilbert; Didier, Michel

    2018-01-16

    The emergence of clustered regularly interspaced short palindromic repeats-Cas9 (CRISPR-Cas9) gene editing systems has enabled the creation of specific mutants at low cost, in a short time and with high efficiency, in eukaryotic cells. Since a CRISPR-Cas9 system typically creates an array of mutations in targeted sites, a successful gene editing project requires careful selection of edited clones. This process can be very challenging, especially when working with multiallelic genes and/or polyploid cells (such as cancer and plant cells). Here we describe a next-generation sequencing method called CRISPR-Cas9 Edited Site Sequencing (CRES-Seq) for the efficient and high-throughput screening of CRISPR-Cas9-edited clones. CRES-Seq facilitates the precise genotyping of up to 96 CRISPR-Cas9-edited sites (CRES) in a single MiniSeq (Illumina) run with an approximate sequencing cost of $6/clone. CRES-Seq is particularly useful when multiple genes are simultaneously targeted by CRISPR-Cas9, and also for screening of clones generated from multiallelic genes/polyploid cells. Copyright © 2018 John Wiley & Sons, Inc.

  12. Evolutionary conservation of sequence and secondary structures in CRISPR repeats

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Kunin, Victor; Sorek, Rotem; Hugenholtz, Philip

    Clustered Regularly Interspaced Palindromic Repeats (CRISPRs) are a novel class of direct repeats, separated by unique spacer sequences of similar length, that are present in ~40% of bacterial and all archaeal genomes analyzed to date. More than 40 gene families, called CRISPR-associated sequences (CAS), appear in conjunction with these repeats and are thought to be involved in the propagation and functioning of CRISPRs. It has been proposed that the CRISPR/CAS system samples, maintains a record of, and inactivates invasive DNA that the cell has encountered, and therefore constitutes a prokaryotic analog of an immune system. Here we analyze CRISPR repeats identified in 195 microbial genomes and show that they can be organized into multiple clusters based on sequence similarity. All individual repeats in any given cluster were inferred to form characteristic RNA secondary structure, ranging from non-existent to pronounced. Stable secondary structures included G:U base pairs and exhibited multiple compensatory base changes in the stem region, indicating evolutionary conservation and functional importance. We also show that the repeat-based classification corresponds to, and expands upon, a previously reported CAS gene-based classification including specific relationships between CRISPR and CAS subtypes.

  13. Simultaneous detection of transgenic DNA by surface plasmon resonance imaging with potential application to gene doping detection.

    PubMed

    Scarano, Simona; Ermini, Maria Laura; Spiriti, Maria Michela; Mascini, Marco; Bogani, Patrizia; Minunni, Maria

    2011-08-15

    Surface plasmon resonance imaging (SPRi) was used as the transduction principle for the development of optical-based sensing for transgenes detection in human cell lines. The objective was to develop a multianalyte, label-free, and real-time approach for DNA sequences that are identified as markers of transgenosis events. The strategy exploits SPRi sensing to detect the transgenic event by targeting selected marker sequences, which are present on shuttle vector backbone used to carry out the transfection of human embryonic kidney (HEK) cell lines. Here, we identified DNA sequences belonging to the Cytomegalovirus promoter and the Enhanced Green Fluorescent Protein gene. System development is discussed in terms of probe efficiency and influence of secondary structures on biorecognition reaction on sensor; moreover, optimization of PCR samples pretreatment was carried out to allow hybridization on biosensor, together with an approach to increase SPRi signals by in situ mass enhancement. Real-time PCR was also employed as reference technique for marker sequences detection on human HEK cells. We can foresee that the developed system may have potential applications in the field of antidoping research focused on the so-called gene doping.

  14. Identification and correction of systematic error in high-throughput sequence data

    PubMed Central

    2011-01-01

    Background A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed "next-gen" sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error prone than previous technologies. Both position specific (depending on the location in the read) and sequence specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technology sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. Results We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that they are highly replicable across experiments. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq), and can be used with single-end datasets. Conclusions Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments. PMID:22099972

  15. Characterization of minimal sequences associated with self-similar interval exchange maps

    NASA Astrophysics Data System (ADS)

    Cobo, Milton; Gutiérrez-Romo, Rodolfo; Maass, Alejandro

    2018-04-01

    The construction of affine interval exchange maps (IEMs) with wandering intervals that are semi-conjugate to a given self-similar IEM is strongly related to the existence of the so-called minimal sequences associated with local potentials, which are certain elements of the substitution subshift arising from the given IEM. In this article, under the condition called unique representation property, we characterize such minimal sequences for potentials coming from non-real eigenvalues of the substitution matrix. We also give conditions on the slopes of the affine extensions of a self-similar IEM that determine whether it exhibits a wandering interval or not.

  16. Development of RT-components for the M-3 Strawberry Harvesting Robot

    NASA Astrophysics Data System (ADS)

    Yamashita, Tomoki; Tanaka, Motomasa; Yamamoto, Satoshi; Hayashi, Shigehiko; Saito, Sadafumi; Sugano, Shigeki

    We are developing a prototype strawberry-harvesting robot system called “M-3” under the 4th urgent project of MAFF. In order to develop the control software of the M-3 robot more efficiently, we introduced the RT-middleware “OpenRTM-aist” software platform. In this system, we developed 9 kinds of RT-Components (RTC): Robot task sequence player RTC, Proxy RTC for image processing software, DC motor controller RTC, Arm kinematics RTC, and so on. In this paper, we discuss the advantages of an RT-middleware development system and the problems of operating the RTC-configured robotic system by end-users.

  17. A non-parametric peak calling algorithm for DamID-Seq.

    PubMed

    Li, Renhua; Hempel, Leonie U; Jiang, Tingbo

    2015-01-01

    Protein-DNA interactions play a significant role in gene regulation and expression. In order to identify transcription factor binding sites (TFBS) of doublesex (DSX), an important transcription factor in sex determination, we applied the DNA adenine methylation identification (DamID) technology to the fat body tissue of Drosophila, followed by deep sequencing (DamID-Seq). One feature of DamID-Seq data is that induced adenine methylation signals are not assured to be symmetrically distributed at TFBS, which renders the existing peak calling algorithms for ChIP-Seq, including SPP and MACS, inappropriate for DamID-Seq data. This challenged us to develop a new algorithm for peak calling. A challenge in peak calling based on sequence data is estimating the averaged behavior of background signals. We applied a bootstrap resampling method to short sequence reads in the control (Dam only). After data quality check and mapping reads to a reference genome, the peak calling procedure comprises the following steps: 1) reads resampling; 2) reads scaling (normalization) and computing signal-to-noise fold changes; 3) filtering; 4) calling peaks based on a statistically significant threshold. This is a non-parametric method for peak calling (NPPC). We also used irreproducible discovery rate (IDR) analysis, as well as ChIP-Seq data to compare the peaks called by the NPPC. We identified approximately 6,000 peaks for DSX, which point to 1,225 genes related to the fat body tissue difference between female and male Drosophila. Statistical evidence from IDR analysis indicated that these peaks are reproducible across biological replicates. In addition, these peaks are comparable to those identified by use of ChIP-Seq on S2 cells, in terms of peak number, location, and peak width.
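    The four steps listed in this abstract can be pictured with a short Python sketch. It is a simplified stand-in, not the authors' NPPC implementation: the fixed-width bins, the library-size scaling, the pseudocount, and the fixed `min_fold` cutoff (used here in place of a statistically derived threshold) are all assumptions, as are the function and parameter names.

```python
import random


def bin_counts(positions, n_bins, bin_size):
    """Count mapped read start positions per fixed-width genomic bin."""
    counts = [0] * n_bins
    for p in positions:
        idx = p // bin_size
        if 0 <= idx < n_bins:
            counts[idx] += 1
    return counts


def bootstrap_background(control, n_bins, bin_size, n_boot=200, seed=0):
    """Step 1: resample control reads with replacement and average the bin counts."""
    rng = random.Random(seed)
    totals = [0.0] * n_bins
    n = len(control)
    for _ in range(n_boot):
        sample = [control[rng.randrange(n)] for _ in range(n)]
        for i, c in enumerate(bin_counts(sample, n_bins, bin_size)):
            totals[i] += c
    return [t / n_boot for t in totals]


def call_peaks(treat, control, n_bins, bin_size,
               min_reads=5, min_fold=3.0, pseudo=1.0):
    """Steps 2-4: scale, compute fold changes, filter, and apply a cutoff."""
    signal = bin_counts(treat, n_bins, bin_size)
    background = bootstrap_background(control, n_bins, bin_size)
    scale = len(treat) / max(len(control), 1)  # library-size normalization (assumed)
    peaks = []
    for i in range(n_bins):
        fold = (signal[i] + pseudo) / (background[i] * scale + pseudo)
        # Filter low-count bins, then keep bins above the (assumed) fold cutoff.
        if signal[i] >= min_reads and fold >= min_fold:
            peaks.append((i * bin_size, (i + 1) * bin_size, fold))
    return peaks


# Hypothetical toy data: uniform control reads, plus 60 extra treatment reads in one bin.
control_reads = list(range(0, 1000, 10))
treat_reads = control_reads + [450] * 60
peaks = call_peaks(treat_reads, control_reads, n_bins=10, bin_size=100)
```

    With this toy input, only the bin covering positions 400 to 499 passes both the read-count filter and the fold-change cutoff.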

  18. Effects of weather conditions on emergency ambulance calls for acute coronary syndromes

    NASA Astrophysics Data System (ADS)

    Vencloviene, Jone; Babarskiene, Ruta; Dobozinskas, Paulius; Siurkaite, Viktorija

    2015-08-01

    The aim of this study was to evaluate the relationship between weather conditions and daily emergency ambulance calls for acute coronary syndromes (ACS). The study included data on 3631 patients who called the ambulance for chest pain and were admitted to the department of cardiology as patients with ACS. We investigated the effect of daily air temperature (T), barometric pressure (BP), relative humidity, and wind speed (WS) to detect the risk areas for low and high daily volume (DV) of emergency calls. We used the classification and regression tree method as well as cluster analysis. The clusters were created by applying the k-means cluster algorithm using the standardized daily weather variables. The analysis was performed separately during cold (October-April) and warm (May-September) seasons. During the cold period, the greatest DV was observed on days of low T during the 3-day sequence, on cold and windy days, and on days of low BP and high WS during the 3-day sequence; low DV was associated with high BP and decreased WS on the previous day. During June-September, a lower DV was associated with low BP, windless days, and high BP and low WS during the 3-day sequence. During the warm period, the greatest DV was associated with increased BP and changing WS during the 3-day sequence. These results suggest that daily T, BP, and WS on the day of the ambulance call and on the two previous days may be prognostic variables for the risk of ACS.

  19. ViVaMBC: estimating viral sequence variation in complex populations from illumina deep-sequencing data using model-based clustering.

    PubMed

    Verbist, Bie; Clement, Lieven; Reumers, Joke; Thys, Kim; Vapirev, Alexander; Talloen, Willem; Wetzels, Yves; Meys, Joris; Aerssens, Jeroen; Bijnens, Luc; Thas, Olivier

    2015-02-22

    Deep-sequencing allows for an in-depth characterization of sequence variation in complex populations. However, technology associated errors may impede a powerful assessment of low-frequency mutations. Fortunately, base calls are complemented with quality scores which are derived from a quadruplet of intensities, one channel for each nucleotide type for Illumina sequencing. The highest intensity of the four channels determines the base that is called. Mismatch bases can often be corrected by the second best base, i.e. the base with the second highest intensity in the quadruplet. A virus variant model-based clustering method, ViVaMBC, is presented that explores quality scores and second best base calls for identifying and quantifying viral variants. ViVaMBC is optimized to call variants at the codon level (nucleotide triplets) which enables immediate biological interpretation of the variants with respect to their antiviral drug responses. Using mixtures of HCV plasmids we show that our method accurately estimates frequencies down to 0.5%. The estimates are unbiased when average coverages of 25,000 are reached. A comparison with the SNP-callers V-Phaser2, ShoRAH, and LoFreq shows that ViVaMBC has a superb sensitivity and specificity for variants with frequencies above 0.4%. Unlike the competitors, ViVaMBC reports a higher number of false-positive findings with frequencies below 0.4% which might partially originate from picking up artificial variants introduced by errors in the sample and library preparation step. ViVaMBC is the first method to call viral variants directly at the codon level. The strength of the approach lies in modeling the error probabilities based on the quality scores. Although the use of second best base calls appeared very promising in our data exploration phase, their utility was limited. They provided a slight increase in sensitivity, which however does not warrant the additional computational cost of running the offline base caller. 
    Apparently a lot of information is already contained in the quality scores enabling the model based clustering procedure to adjust the majority of the sequencing errors. Overall the sensitivity of ViVaMBC is such that technical constraints like PCR errors start to form the bottleneck for low frequency variant detection.
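    The rule that the highest of the four channel intensities determines the called base, with the second highest giving the second best base, can be sketched in a few lines of Python. The channel order A, C, G, T, the function name and the example intensities are assumptions for illustration; this is not ViVaMBC or Illumina code.

```python
CHANNELS = "ACGT"  # assumed channel order for this sketch


def call_bases(intensities):
    """Return (best_base, second_best_base) from a quadruplet of intensities."""
    if len(intensities) != 4:
        raise ValueError("expected one intensity per nucleotide channel")
    # Rank channel indices from highest to lowest intensity.
    order = sorted(range(4), key=lambda i: intensities[i], reverse=True)
    return CHANNELS[order[0]], CHANNELS[order[1]]


# Hypothetical intensity quadruplet (A, C, G, T).
best, second = call_bases((0.10, 0.70, 0.15, 0.05))
```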

  20. Identification of Clinical Coryneform Bacterial Isolates: Comparison of Biochemical Methods and Sequence Analysis of 16S rRNA and rpoB Genes

    PubMed Central

    Adderson, Elisabeth E.; Boudreaux, Jan W.; Cummings, Jessica R.; Pounds, Stanley; Wilson, Deborah A.; Procop, Gary W.; Hayden, Randall T.

    2008-01-01

    We compared the relative levels of effectiveness of three commercial identification kits and three nucleic acid amplification tests for the identification of coryneform bacteria by testing 50 diverse isolates, including 12 well-characterized control strains and 38 organisms obtained from pediatric oncology patients at our institution. Between 33.3 and 75.0% of control strains were correctly identified to the species level by phenotypic systems or nucleic acid amplification assays. The most sensitive tests were the API Coryne system and amplification and sequencing of the 16S rRNA gene using primers optimized for coryneform bacteria, which correctly identified 9 of 12 control isolates to the species level, and all strains with a high-confidence call were correctly identified. Organisms not correctly identified were species not included in the test kit databases or not producing a pattern of reactions included in kit databases or which could not be differentiated among several genospecies based on reaction patterns. Nucleic acid amplification assays had limited abilities to identify some bacteria to the species level, and comparison of sequence homologies was complicated by the inclusion of allele sequences obtained from uncultivated and uncharacterized strains in databases. The utility of rpoB genotyping was limited by the small number of representative gene sequences that are currently available for comparison. The correlation between identifications produced by different classification systems was poor, particularly for clinical isolates. PMID:18160450

  1. A Robust and Engineerable Self-Assembling Protein Template for the Synthesis and Patterning of Ordered Nanoparticle Arrays

    NASA Technical Reports Server (NTRS)

    McMillan, R. Andrew; Howard, Jeanie; Zaluzec, Nestor J.; Kagawa, Hiromi K.; Li, Yi-Fen; Paavola, Chad D.; Trent, Jonathan D.

    2004-01-01

    Self-assembling biomolecules that form highly ordered structures have attracted interest as potential alternatives to conventional lithographic processes for patterning materials. Here we introduce a general technique for patterning materials on the nanoscale using genetically modified protein cage structures called chaperonins that self-assemble into crystalline templates. Constrained chemical synthesis of transition metal nanoparticles is specific to templates genetically functionalized with poly-Histidine sequences. These arrays of materials are ordered by the nanoscale structure of the crystallized protein. This system may be easily adapted to pattern a variety of materials given the rapidly growing list of peptide sequences selected by screening for specificity for inorganic materials.

  2. Ends-in Vs. Ends-Out Recombination in Yeast

    PubMed Central

    Hastings, P. J.; McGill, C.; Shafer, B.; Strathern, J. N.

    1993-01-01

    Integration of linearized plasmids into yeast chromosomes has been used as a model system for the study of recombination initiated by double-strand breaks. The linearized plasmid DNA recombines efficiently into sequences homologous to the ends of the DNA. This efficient recombination occurs both for the configuration in which the break is in a contiguous region of homology (herein called the ends-in configuration) and for "omega" insertions in which plasmid sequences interrupt a linear region of homology (herein called the ends-out configuration). The requirements for integration of these two configurations are expected to be different. We compared these two processes in a yeast strain containing an ends-in target and an ends-out target for the same cut plasmid. Recovery of ends-in events exceeds ends-out events by two- to threefold. Possible causes for the origin of this small bias are discussed. The lack of an extreme difference in frequency implies that cooperativity between the two ends does not contribute to the efficiency with which cut circular plasmids are integrated. This may also be true for the repair of chromosomal double-strand breaks. PMID:8307337

  3. Advanced Transport Operating System (ATOPS) color displays software description microprocessor system

    NASA Technical Reports Server (NTRS)

    Slominski, Christopher J.; Plyler, Valerie E.; Dickson, Richard W.

    1992-01-01

    This document describes the software created for the Sperry Microprocessor Color Display System used for the Advanced Transport Operating Systems (ATOPS) project on the Transport Systems Research Vehicle (TSRV). The software delivery described in this document is the one known as the 'baseline display system'. Throughout this publication, module descriptions are presented in a standardized format which contains module purpose, calling sequence, detailed description, and global references. The global reference section includes procedures and common variables referenced by a particular module. The system described supports the Research Flight Deck (RFD) of the TSRV. The RFD contains eight cathode ray tubes (CRTs) which depict a Primary Flight Display, Navigation Display, System Warning Display, Takeoff Performance Monitoring System Display, and Engine Display.

  4. Methods For Self-Organizing Software

    DOEpatents

    Bouchard, Ann M.; Osbourn, Gordon C.

    2005-10-18

    A method for dynamically self-assembling and executing software is provided, containing machines that self-assemble execution sequences and data structures. In addition to ordered function calls (found commonly in other software methods), mutual selective bonding between bonding sites of machines actuates one or more of the bonding machines. Two or more machines can be virtually isolated by a construct, called an encapsulant, containing a population of machines and potentially other encapsulants that can only bond with each other. A hierarchical software structure can be created using nested encapsulants. Multi-threading is implemented by populations of machines in different encapsulants that are interacting concurrently. Machines and encapsulants can move in and out of other encapsulants, thereby changing the functionality. Bonding between machines' sites can be deterministic or stochastic with bonding triggering a sequence of actions that can be implemented by each machine. A self-assembled execution sequence occurs as a sequence of stochastic binding between machines followed by their deterministic actuation. It is the sequence of bonding of machines that determines the execution sequence, so that the sequence of instructions need not be contiguous in memory.

  5. CRISPRFinder: a web tool to identify clustered regularly interspaced short palindromic repeats.

    PubMed

    Grissa, Ibtissem; Vergnaud, Gilles; Pourcel, Christine

    2007-07-01

    Clustered regularly interspaced short palindromic repeats (CRISPRs) constitute a particular family of tandem repeats found in a wide range of prokaryotic genomes (half of eubacteria and almost all archaea). They consist of a succession of highly conserved regions (DR) varying in size from 23 to 47 bp, separated by similarly sized unique sequences (spacer) of usually viral origin. A CRISPR cluster is flanked on one side by an AT-rich sequence called the leader and assumed to be a transcriptional promoter. Recent studies suggest that this structure represents a putative RNA-interference-based immune system. Here we describe CRISPRFinder, a web service offering tools to (i) detect CRISPRs including the shortest ones (one or two motifs); (ii) define DRs and extract spacers; (iii) get the flanking sequences to determine the leader; (iv) blast spacers against Genbank database and (v) check if the DR is found elsewhere in prokaryotic sequenced genomes. CRISPRFinder is freely accessible at http://crispr.u-psud.fr/Server/CRISPRfinder.php.

  6. Detection of Spoofed MAC Addresses in 802.11 Wireless Networks

    NASA Astrophysics Data System (ADS)

    Tao, Kai; Li, Jing; Sampalli, Srinivas

    Medium Access Control (MAC) address spoofing is considered as an important first step in a hacker's attempt to launch a variety of attacks on 802.11 wireless networks. Unfortunately, MAC address spoofing is hard to detect. Most current spoofing detection systems mainly use the sequence number (SN) tracking technique, which has drawbacks. Firstly, it may lead to an increase in the number of false positives. Secondly, such techniques cannot be used in systems with wireless cards that do not follow standard 802.11 sequence number patterns. Thirdly, attackers can forge sequence numbers, thereby causing the attacks to go undetected. We present a new architecture called WISE GUARD (Wireless Security Guard) for detection of MAC address spoofing on 802.11 wireless LANs. It integrates three detection techniques - SN tracking, Operating System (OS) fingerprinting & tracking and Received Signal Strength (RSS) fingerprinting & tracking. It also includes the fingerprinting of Access Point (AP) parameters as an extension to the OS fingerprinting for detection of AP address spoofing. We have implemented WISE GUARD on a test bed using off-the-shelf wireless devices and open source drivers. Experimental results show that the new design enhances the detection effectiveness and reduces the number of false positives in comparison with current approaches.
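
    The sequence number (SN) tracking idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the WISE GUARD implementation: the function names, the `max_gap` tolerance, and the sample capture are hypothetical and chosen only for illustration. The one fixed fact it relies on is that 802.11 sequence numbers are 12 bits wide, so gaps are computed modulo 4096.

```python
SEQ_MODULUS = 4096  # 802.11 sequence numbers are 12 bits wide and wrap around


def sequence_gap(previous_sn: int, current_sn: int) -> int:
    """Forward distance from previous_sn to current_sn, modulo 4096."""
    return (current_sn - previous_sn) % SEQ_MODULUS


def find_suspicious_frames(frames, max_gap=3):
    """Return indices of frames whose SN gap looks abnormal for their MAC.

    frames: iterable of (mac_address, sequence_number) tuples in capture order.
    max_gap: hypothetical tolerance for lost frames, chosen for illustration.
    """
    last_seen = {}
    suspicious = []
    for index, (mac, sn) in enumerate(frames):
        if mac in last_seen:
            gap = sequence_gap(last_seen[mac], sn)
            # A gap of 0 (repeated SN) or a small positive gap is treated as normal.
            if gap > max_gap:
                suspicious.append(index)
        last_seen[mac] = sn
    return suspicious


# Made-up capture: one MAC address, with a second transmitter interleaved.
capture = [
    ("aa:bb", 4094), ("aa:bb", 4095), ("aa:bb", 0),  # normal wrap-around
    ("aa:bb", 2000),                                  # sudden jump
    ("aa:bb", 1),                                     # original counter resumes
]
flagged = find_suspicious_frames(capture)
print(flagged)
```

    A fixed gap threshold like this is exactly where the drawbacks listed in the abstract show up: frame loss inflates false positives, and an attacker who forges plausible sequence numbers passes the check.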

  7. Exome sequencing reveals novel genetic loci influencing obesity-related traits in Hispanic children

    USDA-ARS's Scientific Manuscript database

    To perform whole exome sequencing in 928 Hispanic children and identify variants and genes associated with childhood obesity. Single-nucleotide variants (SNVs) were identified from Illumina whole exome sequencing data using integrated read mapping, variant calling, and an annotation pipeline (Mercury...

  8. DNA sequence chromatogram browsing using JAVA and CORBA.

    PubMed

    Parsons, J D; Buehler, E; Hillier, L

    1999-03-01

    DNA sequence chromatograms (traces) are the primary data source for all large-scale genomic and expressed sequence tags (ESTs) sequencing projects. Access to the sequencing trace assists many later analyses, for example contig assembly and polymorphism detection, but obtaining and using traces is problematic. Traces are not collected and published centrally, they are much larger than the base calls derived from them, and viewing them requires the interactivity of a local graphical client with local data. To provide efficient global access to DNA traces, we developed a client/server system based on flexible Java components integrated into other applications including an applet for use in a WWW browser and a stand-alone trace viewer. Client/server interaction is facilitated by CORBA middleware which provides a well-defined interface, a naming service, and location independence. [The software is packaged as a Jar file available from the following URL: http://www.ebi.ac.uk/jparsons. Links to working examples of the trace viewers can be found at http://corba.ebi.ac.uk/EST. All the Washington University mouse EST traces are available for browsing at the same URL.]

  9. ICO amplicon NGS data analysis: a Web tool for variant detection in common high-risk hereditary cancer genes analyzed by amplicon GS Junior next-generation sequencing.

    PubMed

    Lopez-Doriga, Adriana; Feliubadaló, Lídia; Menéndez, Mireia; Lopez-Doriga, Sergio; Morón-Duran, Francisco D; del Valle, Jesús; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Campos, Olga; Gómez, Carolina; Pineda, Marta; González, Sara; Moreno, Victor; Capellá, Gabriel; Lázaro, Conxi

    2014-03-01

    Next-generation sequencing (NGS) has revolutionized genomic research and is set to have a major impact on genetic diagnostics thanks to the advent of benchtop sequencers and flexible kits for targeted libraries. Among the main hurdles in NGS are the difficulty of performing bioinformatic analysis of the huge volume of data generated and the high number of false positive calls that could be obtained, depending on the NGS technology and the analysis pipeline. Here, we present the development of a free and user-friendly Web data analysis tool that detects and filters sequence variants, provides coverage information, and allows the user to customize some basic parameters. The tool has been developed to provide accurate genetic analysis of targeted sequencing of common high-risk hereditary cancer genes using amplicon libraries run in a GS Junior System. The Web resource is linked to our own mutation database, to assist in the clinical classification of identified variants. We believe that this tool will greatly facilitate the use of the NGS approach in routine laboratories.

  10. Whole Genome Sequence of Two Wild-Derived Mus musculus domesticus Inbred Strains, LEWES/EiJ and ZALENDE/EiJ, with Different Diploid Numbers

    PubMed Central

    Morgan, Andrew P.; Didion, John P.; Doran, Anthony G.; Holt, James M.; McMillan, Leonard; Keane, Thomas M.; de Villena, Fernando Pardo-Manuel

    2016-01-01

    Wild-derived mouse inbred strains are becoming increasingly popular for complex traits analysis, evolutionary studies, and systems genetics. Here, we report the whole-genome sequencing of two wild-derived mouse inbred strains, LEWES/EiJ and ZALENDE/EiJ, of Mus musculus domesticus origin. These two inbred strains were selected based on their geographic origin, karyotype, and use in ongoing research. We generated 14× and 18× coverage sequence, respectively, and discovered over 1.1 million novel variants, most of which are private to one of these strains. This report expands the number of wild-derived inbred genomes in the Mus genus from six to eight. The sequence variation can be accessed via an online query tool; variant calls (VCF format) and alignments (BAM format) are available for download from a dedicated ftp site. Finally, the sequencing data have also been stored in a lossless, compressed, and indexed format using the multi-string Burrows-Wheeler transform. All data can be used without restriction. PMID:27765810

  11. The Role Of Rejuvenation In Shaping The High-Mass End Of The Main Sequence

    NASA Astrophysics Data System (ADS)

    Mancini, Chiara

    2017-06-01

    We investigate the nature of star forming galaxies with reduced specific SFRs and high stellar masses, those that seemingly cause the so-called bending of the main sequence. The fact that such objects host large bulges recently led some to suggest that the internal formation of the bulges, via compaction or disk instabilities, was the late event that induced sSFRs of massive galaxies to drop in a slow downfall and thus the main sequence to bend. We have studied in detail a sample of 16 galaxies at 0.5

  12. Isolation and Molecular Typing of Naegleria fowleri from the Brain of a Cow That Died of Primary Amebic Meningoencephalitis

    PubMed Central

    Visvesvara, Govinda S.; De Jonckheere, Johan F.; Sriram, Rama; Daft, Barbara

    2005-01-01

    Naegleria fowleri causes an acute and rapidly fatal central nervous system infection called primary amebic meningoencephalitis (PAM) in healthy children and young adults. We describe here the identification of N. fowleri isolated from the brain of one of several cows that died of PAM based on sequencing of the internal transcribed spacers, including the 5.8S rRNA genes. PMID:16081978

  13. Tracking blue whales in the eastern tropical Pacific with an ocean-bottom seismometer and hydrophone array.

    PubMed

    Dunn, Robert A; Hernandez, Olga

    2009-09-01

    Low frequency northeastern Pacific blue whale calls were recorded near the northern East Pacific Rise (9 degrees N latitude) on 25 ocean-bottom-mounted hydrophones and three-component seismometers during a 5-day period (November 22-26, 1997). Call types A, B, C, and D were identified; the most common pattern being approximately 130-135 s repetitions of the AB sequence that, for any individual whale, persisted for hours. Up to eight individual blue whales were recorded near enough to the instruments to determine their locations and were tracked call-by-call using the B components of the calls and a Bayesian inversion procedure. For four of these eight whales, the entire call sequences and swim tracks were determined for 20-26-h periods; the other whales were tracked for much shorter periods. The eight whales moved into the area during a period of airgun activity conducted by the academic seismic ship R/V Maurice Ewing. The authors examined the whales' locations and call characteristics with respect to the periods of airgun activity. Although the data do not permit a thorough investigation of behavioral responses, no correlation in vocalization or movement with airgun activity was observed.

  14. Detecting very low allele fraction variants using targeted DNA sequencing and a novel molecular barcode-aware variant caller.

    PubMed

    Xu, Chang; Nezami Ranjbar, Mohammad R; Wu, Zhong; DiCarlo, John; Wang, Yexun

    2017-01-03

    Detection of DNA mutations at very low allele fractions with high accuracy will significantly improve the effectiveness of precision medicine for cancer patients. To achieve this goal through next generation sequencing, researchers need a detection method that 1) captures rare mutation-containing DNA fragments efficiently in the mix of abundant wild-type DNA; 2) sequences the DNA library extensively to deep coverage; and 3) distinguishes low level true variants from amplification and sequencing errors with high accuracy. Targeted enrichment using PCR primers provides researchers with a convenient way to achieve deep sequencing for a small, yet most relevant region using benchtop sequencers. Molecular barcoding (or indexing) provides a unique solution for reducing sequencing artifacts analytically. Although different molecular barcoding schemes have been reported in recent literature, most variant calling has been done on limited targets, using simple custom scripts. The analytical performance of barcode-aware variant calling can be significantly improved by incorporating advanced statistical models. We present here a highly efficient, simple and scalable enrichment protocol that integrates molecular barcodes in multiplex PCR amplification. In addition, we developed smCounter, an open source, generic, barcode-aware variant caller based on a Bayesian probabilistic model. smCounter was optimized and benchmarked on two independent read sets with SNVs and indels at 5 and 1% allele fractions. Variants were called with very good sensitivity and specificity within coding regions. We demonstrated that we can accurately detect somatic mutations with allele fractions as low as 1% in coding regions using our enrichment protocol and variant caller.
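
    The way molecular barcodes suppress sequencing artifacts can be sketched as follows. This is a deliberately simplified majority-vote consensus, not the Bayesian model used by smCounter; the function names, the `min_agreement` threshold, and the toy reads are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_agreement=0.75):
    """Collapse reads sharing a barcode into one consensus base per molecule.

    reads: iterable of (barcode, base) pairs observed at one genomic position.
    min_agreement: hypothetical threshold; molecules below it are dropped.
    """
    by_barcode = defaultdict(list)
    for barcode, base in reads:
        by_barcode[barcode].append(base)

    consensus = {}
    for barcode, bases in by_barcode.items():
        base, count = Counter(bases).most_common(1)[0]
        if count / len(bases) >= min_agreement:
            consensus[barcode] = base
    return consensus


def allele_fraction(consensus, alt_base):
    """Fraction of consensus molecules (not raw reads) carrying alt_base."""
    if not consensus:
        return 0.0
    return sum(1 for b in consensus.values() if b == alt_base) / len(consensus)


# Made-up reads at a single position; reference base A, candidate variant T.
reads = [
    ("UMI1", "A"), ("UMI1", "A"), ("UMI1", "A"), ("UMI1", "A"), ("UMI1", "T"),
    ("UMI2", "T"), ("UMI2", "T"), ("UMI2", "T"),
    ("UMI3", "A"), ("UMI3", "A"),
    ("UMI4", "A"), ("UMI4", "T"),
]
molecules = consensus_by_barcode(reads)
print(molecules, allele_fraction(molecules, "T"))
```

    In this toy input the single T read inside UMI1 is outvoted and UMI4 is discarded as ambiguous, so the variant is counted once per original molecule rather than once per read.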

  15. Processing of Natural Echolocation Sequences in the Inferior Colliculus of Seba’s Fruit Eating Bat, Carollia perspicillata

    PubMed Central

    Kordes, Sebastian; Kössl, Manfred

    2017-01-01

    For the purpose of orientation, echolocating bats emit highly repetitive and spatially directed sonar calls. Echoes arising from call reflections are used to create an acoustic image of the environment. The inferior colliculus (IC) represents an important auditory stage for initial processing of echolocation signals. The present study addresses the following questions: (1) how does the temporal context of an echolocation sequence mimicking an approach flight of an animal affect neuronal processing of distance information to echo delays? (2) how does the IC process complex echolocation sequences containing echo information from multiple objects (multiobject sequence)? Here, we conducted neurophysiological recordings from the IC of ketamine-anaesthetized bats of the species Carollia perspicillata and compared the results from the IC with the ones from the auditory cortex (AC). Neuronal responses to an echolocation sequence were suppressed when compared to the responses to temporally isolated and randomized segments of the sequence. The neuronal suppression was weaker in the IC than in the AC. In contrast to the cortex, the time course of the acoustic events is reflected by IC activity. In the IC, suppression sharpens the neuronal tuning to specific call-echo elements and increases the signal-to-noise ratio in the units’ responses. When presenting multiple-object sequences, despite collicular suppression, the neurons responded to each object-specific echo. The latter allows parallel processing of multiple echolocation streams at the IC level. Altogether, our data suggest that temporally-precise neuronal responses in the IC could allow fast and parallel processing of multiple acoustic streams. PMID:29242823

  16. Processing of Natural Echolocation Sequences in the Inferior Colliculus of Seba's Fruit Eating Bat, Carollia perspicillata.

    PubMed

    Beetz, M Jerome; Kordes, Sebastian; García-Rosales, Francisco; Kössl, Manfred; Hechavarría, Julio C

    2017-01-01

    For the purpose of orientation, echolocating bats emit highly repetitive and spatially directed sonar calls. Echoes arising from call reflections are used to create an acoustic image of the environment. The inferior colliculus (IC) represents an important auditory stage for initial processing of echolocation signals. The present study addresses the following questions: (1) how does the temporal context of an echolocation sequence mimicking an approach flight of an animal affect neuronal processing of distance information to echo delays? (2) how does the IC process complex echolocation sequences containing echo information from multiple objects (multiobject sequence)? Here, we conducted neurophysiological recordings from the IC of ketamine-anaesthetized bats of the species Carollia perspicillata and compared the results from the IC with the ones from the auditory cortex (AC). Neuronal responses to an echolocation sequence were suppressed when compared to the responses to temporally isolated and randomized segments of the sequence. The neuronal suppression was weaker in the IC than in the AC. In contrast to the cortex, the time course of the acoustic events is reflected by IC activity. In the IC, suppression sharpens the neuronal tuning to specific call-echo elements and increases the signal-to-noise ratio in the units' responses. When presenting multiple-object sequences, despite collicular suppression, the neurons responded to each object-specific echo. The latter allows parallel processing of multiple echolocation streams at the IC level. Altogether, our data suggest that temporally-precise neuronal responses in the IC could allow fast and parallel processing of multiple acoustic streams.

  17. Discrete sequence prediction and its applications

    NASA Technical Reports Server (NTRS)

    Laird, Philip

    1992-01-01

    Learning from experience to predict sequences of discrete symbols is a fundamental problem in machine learning with many applications. We apply sequence prediction using a simple and practical sequence-prediction algorithm, called TDAG. The TDAG algorithm is first tested by comparing its performance with some common data compression algorithms. Then it is adapted to the detailed requirements of dynamic program optimization, with excellent results.
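
    As an illustration of what discrete sequence prediction involves, here is a minimal sketch of a fixed-order context predictor with back-off. It is not the TDAG algorithm, whose details are not given in this abstract; the class name, the `order` parameter, and the training string are hypothetical.

```python
from collections import Counter, defaultdict


class ContextPredictor:
    """Predict the next symbol from counts of what followed each recent context."""

    def __init__(self, order=2):
        self.order = order
        self.counts = defaultdict(Counter)

    def train(self, sequence):
        for i in range(len(sequence)):
            # Record the symbol under every context length from 0 up to `order`.
            for k in range(min(self.order, i) + 1):
                context = tuple(sequence[i - k:i])
                self.counts[context][sequence[i]] += 1

    def predict(self, history):
        # Back off from the longest usable context to the empty context.
        for k in range(min(self.order, len(history)), -1, -1):
            context = tuple(history[len(history) - k:])
            if context in self.counts:
                return self.counts[context].most_common(1)[0][0]
        return None  # nothing has been learned yet


predictor = ContextPredictor(order=2)
predictor.train("abcabcabd")
print(predictor.predict("ab"))  # context ('a', 'b') was followed by c, c, d
print(predictor.predict("zb"))  # unseen pair, backs off to context ('b',)
```

    The same count tables can drive a compressor, since a symbol that is predicted with high probability can be encoded in fewer bits, which is why comparing a predictor against data compression algorithms is a natural test.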

  18. The yeast two hybrid system in a screen for proteins interacting with axolotl (Ambystoma mexicanum) Msx1 during early limb regeneration.

    PubMed

    Abuqarn, Mehtap; Allmeling, Christina; Amshoff, Inga; Menger, Bjoern; Nasser, Inas; Vogt, Peter M; Reimers, Kerstin

    2011-07-01

    Urodele amphibians are exceptional in their ability to regenerate complex body structures such as limbs. Limb regeneration depends on a process called dedifferentiation. Under an inductive wound epidermis, terminally differentiated cells transform to pluripotent progenitor cells that coordinately proliferate and eventually redifferentiate to form the new appendage. Recent studies have developed molecular models integrating a set of genes that might have important functions in the control of regenerative cellular plasticity. Among them is Msx1, which induced dedifferentiation in mammalian myotubes in vitro. Herein, we screened for interaction partners of axolotl Msx1 using a yeast two hybrid system. A two hybrid cDNA library of 5-day-old wound epidermis and underlying tissue containing more than 2×10⁶ cDNAs was constructed and used in the screen. 34 resulting cDNA clones were isolated and sequenced. We then compared sequences of the isolated clones to annotated EST contigs of the Salamander EST database (BLASTn) to identify presumptive orthologs. We subsequently searched all no-hit clone sequences against non-redundant NCBI sequence databases using BLASTx. This is the first time that the yeast two hybrid system has been adapted to the axolotl animal model and successfully used in a screen for proteins interacting with Msx1 in the context of amphibian limb regeneration.

  19. Plant Genome Resources at the National Center for Biotechnology Information

    PubMed Central

    Wheeler, David L.; Smith-White, Brian; Chetvernin, Vyacheslav; Resenchuk, Sergei; Dombrowski, Susan M.; Pechous, Steven W.; Tatusova, Tatiana; Ostell, James

    2005-01-01

    The National Center for Biotechnology Information (NCBI) integrates data from more than 20 biological databases through a flexible search and retrieval system called Entrez. A core Entrez database, Entrez Nucleotide, includes GenBank and is tightly linked to the NCBI Taxonomy database, the Entrez Protein database, and the scientific literature in PubMed. A suite of more specialized databases for genomes, genes, gene families, gene expression, gene variation, and protein domains dovetails with the core databases to make Entrez a powerful system for genomic research. Linked to the full range of Entrez databases is the NCBI Map Viewer, which displays aligned genetic, physical, and sequence maps for eukaryotic genomes including those of many plants. A specialized plant query page allows maps from all plant genomes covered by the Map Viewer to be searched in tandem to produce a display of aligned maps from several species. PlantBLAST searches against the sequences shown in the Map Viewer allow BLAST alignments to be viewed within a genomic context. In addition, precomputed sequence similarities, such as those for proteins offered by BLAST Link, enable fluid navigation from unannotated to annotated sequences, quickening the pace of discovery. NCBI Web pages for plants, such as Plant Genome Central, complete the system by providing centralized access to NCBI's genomic resources as well as links to organism-specific Web pages beyond NCBI. PMID:16010002

  20. BS-virus-finder: virus integration calling using bisulfite sequencing data.

    PubMed

    Gao, Shengjie; Hu, Xuesong; Xu, Fengping; Gao, Changduo; Xiong, Kai; Zhao, Xiao; Chen, Haixiao; Zhao, Shancen; Wang, Mengyao; Fu, Dongke; Zhao, Xiaohui; Bai, Jie; Mao, Likai; Li, Bo; Wu, Song; Wang, Jian; Li, Shengbin; Yang, Huangming; Bolund, Lars; Pedersen, Christian N S

    2018-01-01

    DNA methylation plays a key role in the regulation of gene expression and carcinogenesis. Bisulfite sequencing studies mainly focus on calling single nucleotide polymorphisms, identifying differentially methylated regions, and finding allele-specific DNA methylation. Until now, only a few software tools have focused on virus integration using bisulfite sequencing data. We have developed a new and easy-to-use software tool, named BS-virus-finder (BSVF, RRID:SCR_015727), to detect viral integration breakpoints in whole human genomes. The tool is hosted at https://github.com/BGI-SZ/BSVF. BS-virus-finder demonstrates high sensitivity and specificity. It is useful in epigenetic studies and to reveal the relationship between viral integration and DNA methylation. BS-virus-finder is the first software tool to detect virus integration loci by using bisulfite sequencing data.

  1. Comparative performance of high-density oligonucleotide sequencing and dideoxynucleotide sequencing of HIV type 1 pol from clinical samples.

    PubMed

    Günthard, H F; Wong, J K; Ignacio, C C; Havlir, D V; Richman, D D

    1998-07-01

    The performance of the high-density oligonucleotide array methodology (GeneChip) in detecting drug resistance mutations in HIV-1 pol was compared with that of automated dideoxynucleotide sequencing (ABI) of clinical samples, viral stocks, and plasmid-derived NL4-3 clones. Sequences from 29 clinical samples (plasma RNA, n = 17; lymph node RNA, n = 5; lymph node DNA, n = 7) from 12 patients, from 6 viral stock RNA samples, and from 13 NL4-3 clones were generated by both methods. Editing was done independently by a different investigator for each method before comparing the sequences. In addition, NL4-3 wild type (WT) and mutants were mixed in varying concentrations and sequenced by both methods. Overall, a concordance of 99.1% was found for a total of 30,865 bases compared. The comparison of clinical samples (plasma RNA and lymph node RNA and DNA) showed a slightly lower match of base calls, 98.8% for 19,831 nucleotides compared (protease region, 99.5%, n = 8272; RT region, 98.3%, n = 11,316), than for viral stocks and NL4-3 clones (protease region, 99.8%; RT region, 99.5%). Artificial mixing experiments showed a bias toward calling wild-type bases by GeneChip. Discordant base calls are most likely due to differential detection of mixtures. The concordance between GeneChip and ABI was high and appeared dependent on the nature of the templates (directly amplified versus cloned) and the complexity of mixes.
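
    The concordance figures quoted in this abstract are percentages of matching base calls over the bases compared. A minimal sketch of that calculation follows, assuming the two call sets are already aligned position by position; the function name and the example sequences are made up. In the example, R is the IUPAC code for an A/G mixture, so a mixture called by one method and a plain base called by the other counts as discordant.

```python
def concordance(calls_a, calls_b):
    """Percentage of positions at which two aligned base-call strings agree."""
    if len(calls_a) != len(calls_b):
        raise ValueError("sequences must be aligned to the same length")
    if not calls_a:
        raise ValueError("no bases to compare")
    matches = sum(1 for a, b in zip(calls_a, calls_b) if a == b)
    return 100.0 * matches / len(calls_a)


# Made-up calls from two methods; position 5 is a mixture (R) in one of them.
method_one = "ACGTRACGTA"
method_two = "ACGTAACGTA"
percent = concordance(method_one, method_two)
print(percent)
```

    Counting a mixture call as a plain mismatch is the simplest possible choice; a stricter comparison could score partial agreement between an ambiguity code and one of its constituent bases.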

  2. MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification, Validation and Annotation in Human Molecular Genetics.

    PubMed

    Pandey, Ram Vinay; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas

    2016-01-01

    Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid.

  3. MutAid: Sanger and NGS Based Integrated Pipeline for Mutation Identification, Validation and Annotation in Human Molecular Genetics

    PubMed Central

    Pandey, Ram Vinay; Pabinger, Stephan; Kriegner, Albert; Weinhäusel, Andreas

    2016-01-01

    Traditional Sanger sequencing as well as Next-Generation Sequencing have been used for the identification of disease causing mutations in human molecular research. The majority of currently available tools are developed for research and explorative purposes and often do not provide a complete, efficient, one-stop solution. As the focus of currently developed tools is mainly on NGS data analysis, no integrative solution for the analysis of Sanger data is provided and consequently a one-stop solution to analyze reads from both sequencing platforms is not available. We have therefore developed a new pipeline called MutAid to analyze and interpret raw sequencing data produced by Sanger or several NGS sequencing platforms. It performs format conversion, base calling, quality trimming, filtering, read mapping, variant calling, variant annotation and analysis of Sanger and NGS data under a single platform. It is capable of analyzing reads from multiple patients in a single run to create a list of potential disease causing base substitutions as well as insertions and deletions. MutAid has been developed for expert and non-expert users and supports four sequencing platforms including Sanger, Illumina, 454 and Ion Torrent. Furthermore, for NGS data analysis, five read mappers including BWA, TMAP, Bowtie, Bowtie2 and GSNAP and four variant callers including GATK-HaplotypeCaller, SAMTOOLS, Freebayes and VarScan2 pipelines are supported. MutAid is freely available at https://sourceforge.net/projects/mutaid. PMID:26840129

  4. NGSANE: a lightweight production informatics framework for high-throughput data analysis.

    PubMed

    Buske, Fabian A; French, Hugh J; Smith, Martin A; Clark, Susan J; Bauer, Denis C

    2014-05-15

    The initial steps in the analysis of next-generation sequencing data can be automated by way of software 'pipelines'. However, individual components depreciate rapidly because of the evolving technology and analysis methods, often rendering entire versions of production informatics pipelines obsolete. Constructing pipelines from Linux bash commands enables the use of hot swappable modular components as opposed to the more rigid program call wrapping by higher level languages, as implemented in comparable published pipelining systems. Here we present Next Generation Sequencing ANalysis for Enterprises (NGSANE), a Linux-based, high-performance-computing-enabled framework that minimizes overhead for set up and processing of new projects, yet maintains full flexibility of custom scripting when processing raw sequence data. Ngsane is implemented in bash and publicly available under BSD (3-Clause) licence via GitHub at https://github.com/BauerLab/ngsane.

  5. Web Navigation Sequences Automation in Modern Websites

    NASA Astrophysics Data System (ADS)

    Montoto, Paula; Pan, Alberto; Raposo, Juan; Bellas, Fernando; López, Javier

    Most of today’s web sources are designed to be used by humans, but they do not provide suitable interfaces for software programs. That is why a growing interest has arisen in so-called web automation applications, which are widely used for purposes such as B2B integration, automated testing of web applications, or technology and business watch. Previous proposals assume models for generating and reproducing navigation sequences that cannot correctly deal with new websites using technologies such as AJAX: on the one hand, existing systems only allow recording simple navigation actions and, on the other hand, they are unable to detect the end of the effects caused by a user action. In this paper, we propose a set of new techniques to record and execute web navigation sequences able to deal with all the complexity existing in AJAX-based web sites. We also present an exhaustive evaluation of the proposed techniques that shows very promising results.

  6. BioVLAB-mCpG-SNP-EXPRESS: A system for multi-level and multi-perspective analysis and exploration of DNA methylation, sequence variation (SNPs), and gene expression from multi-omics data.

    PubMed

    Chae, Heejoon; Lee, Sangseon; Seo, Seokjun; Jung, Daekyoung; Chang, Hyeonsook; Nephew, Kenneth P; Kim, Sun

    2016-12-01

    Measuring gene expression, DNA sequence variation, and DNA methylation status is routinely done using high throughput sequencing technologies. To analyze such multi-omics data and explore relationships, reliable bioinformatics systems are much needed. Existing systems are either for exploring curated data or for processing omics data in the form of a library such as R. Thus scientists have much difficulty in investigating relationships among gene expression, DNA sequence variation, and DNA methylation using multi-omics data. In this study, we report a system called BioVLAB-mCpG-SNP-EXPRESS for the integrated analysis of DNA methylation, sequence variation (SNPs), and gene expression for distinguishing cellular phenotypes at the pairwise and multiple phenotype levels. The system can be deployed on either the Amazon cloud or a publicly available high-performance computing node, and the data analysis and exploration of the analysis result can be conveniently done using a web-based interface. In order to alleviate analysis complexity, all the processes are fully automated, and a graphical workflow system is integrated to represent real-time analysis progression. The BioVLAB-mCpG-SNP-EXPRESS system works in three stages. First, it processes and analyzes multi-omics data as input in the form of the raw data, i.e., FastQ files. Second, various integrated analyses such as methylation vs. gene expression and mutation vs. methylation are performed. Finally, the analysis result can be explored in a number of ways through a web interface for the multi-level, multi-perspective exploration. Multi-level interpretation can be done at either the gene, gene set, pathway or network level, and multi-perspective exploration can be done from either the gene expression, DNA methylation, sequence variation, or their relationship perspective. The utility of the system is demonstrated by performing analysis of a data set of 30 phenotypically distinct breast cancer cell lines. BioVLAB-mCpG-SNP-EXPRESS is available at http://biohealth.snu.ac.kr/software/biovlab_mcpg_snp_express/. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. The writer independent online handwriting recognition system frog on hand and cluster generative statistical dynamic time warping.

    PubMed

    Bahlmann, Claus; Burkhardt, Hans

    2004-03-01

    In this paper, we give a comprehensive description of our writer-independent online handwriting recognition system frog on hand. The focus of this work concerns the presentation of the classification/training approach, which we call cluster generative statistical dynamic time warping (CSDTW). CSDTW is a general, scalable, HMM-based method for variable-sized, sequential data that holistically combines cluster analysis and statistical sequence modeling. It can handle general classification problems that rely on this sequential type of data, e.g., speech recognition, genome processing, robotics, etc. Contrary to previous attempts, clustering and statistical sequence modeling are embedded in a single feature space and use a closely related distance measure. We show character recognition experiments of frog on hand using CSDTW on the UNIPEN online handwriting database. The recognition accuracy is significantly higher than reported results of other handwriting recognition systems. Finally, we describe the real-time implementation of frog on hand on a Linux Compaq iPAQ embedded device.

  8. Programming Native CRISPR Arrays for the Generation of Targeted Immunity.

    PubMed

    Hynes, Alexander P; Labrie, Simon J; Moineau, Sylvain

    2016-05-03

    The adaptive immune system of prokaryotes, called CRISPR-Cas (clustered regularly interspaced short palindromic repeats and CRISPR-associated genes), results in specific cleavage of invading nucleic acid sequences recognized by the cell's "memory" of past encounters. Here, we exploited the properties of native CRISPR-Cas systems to program the natural "memorization" process, efficiently generating immunity not only to a bacteriophage or plasmid but to any specifically chosen DNA sequence. CRISPR-Cas systems have entered the public consciousness as genome editing tools due to their readily programmable nature. In industrial settings, natural CRISPR-Cas immunity is already exploited to generate strains resistant to potentially disruptive viruses. However, the natural process by which bacteria acquire new target specificities (adaptation) is difficult to study and manipulate. The target against which immunity is conferred is selected stochastically. By biasing the immunization process, we offer a means to generate customized immunity, as well as provide a new tool to study adaptation. Copyright © 2016 Hynes et al.

  9. Virus Database and Online Inquiry System Based on Natural Vectors.

    PubMed

    Dong, Rui; Zheng, Hui; Tian, Kun; Yau, Shek-Chung; Mao, Weiguang; Yu, Wenping; Yin, Changchuan; Yu, Chenglong; He, Rong Lucy; Yang, Jie; Yau, Stephen St

    2017-01-01

    We construct a virus database called VirusDB (http://yaulab.math.tsinghua.edu.cn/VirusDB/) and an online inquiry system to serve people who are interested in viral classification and prediction. The database stores all viral genomes, their corresponding natural vectors, and the classification information of the single/multiple-segmented viral reference sequences downloaded from the National Center for Biotechnology Information. The online inquiry system serves the purpose of computing natural vectors and their distances based on submitted genomes, providing an online interface for accessing and using the database for viral classification and prediction, and back-end processes for automatic and manual updating of database content to synchronize with GenBank. Submitted genome data in FASTA format will be processed, and the prediction results, with the 5 closest neighbors and their classifications, will be returned by email. Considering the one-to-one correspondence between sequence and natural vector, time efficiency, and high accuracy, the natural vector is a significant advance compared with alignment methods, which makes VirusDB a useful database for further research.
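
    The natural-vector lookup described above can be sketched as follows. This is a minimal illustration, not the VirusDB implementation: the abstract does not define the natural vector, so the sketch assumes the commonly used 12-dimensional form (per nucleotide: count, mean position, and normalized second central moment) and a Euclidean distance. The function names and the toy reference set are hypothetical.

```python
from math import sqrt

BASES = "ACGT"

def natural_vector(seq):
    """Assumed 12-dimensional natural vector: for each of A, C, G, T,
    the count, the mean 1-based position, and the second central moment
    of the positions divided by (count * sequence length)."""
    seq = seq.upper()
    n = len(seq)
    counts, means, moments = [], [], []
    for base in BASES:
        positions = [i + 1 for i, ch in enumerate(seq) if ch == base]
        n_k = len(positions)
        mu_k = sum(positions) / n_k if n_k else 0.0
        d2_k = sum((p - mu_k) ** 2 for p in positions) / (n_k * n) if n_k else 0.0
        counts.append(float(n_k))
        means.append(mu_k)
        moments.append(d2_k)
    return counts + means + moments

def distance(u, v):
    """Euclidean distance between two natural vectors (an assumed metric)."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def nearest_neighbors(query, references, k=5):
    """Return the names of the k reference sequences closest to the query.
    `references` maps a (hypothetical) name to a nucleotide sequence."""
    qv = natural_vector(query)
    ranked = sorted(references,
                    key=lambda name: distance(qv, natural_vector(references[name])))
    return ranked[:k]
```

    Truncating at the second moment keeps the sketch short; the one-to-one correspondence mentioned in the abstract would need the higher-order moments as well.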

  10. Code-Time Diversity for Direct Sequence Spread Spectrum Systems

    PubMed Central

    Hassan, A. Y.

    2014-01-01

    Time diversity is achieved in direct sequence spread spectrum by receiving different faded delayed copies of the transmitted symbols from different uncorrelated channel paths when the transmission signal bandwidth is greater than the coherence bandwidth of the channel. In this paper, a new time diversity scheme is proposed for spread spectrum systems. It is called code-time diversity. In this new scheme, N spreading codes are used to transmit one data symbol over N successive symbol intervals. The diversity order in the proposed scheme equals the number of spreading codes used, N, multiplied by the number of uncorrelated paths of the channel, L. The paper presents the transmitted signal model. Two demodulator structures are proposed based on the received signal models from Rayleigh flat and frequency selective fading channels. The probability of error in the proposed diversity scheme is also calculated for the same two fading channels. Finally, simulation results are presented and compared with those of maximal ratio combiner (MRC) and multiple-input and multiple-output (MIMO) systems. PMID:24982925

  11. The PAZAR database of gene regulatory information coupled to the ORCA toolkit for the study of regulatory sequences

    PubMed Central

    Portales-Casamar, Elodie; Arenillas, David; Lim, Jonathan; Swanson, Magdalena I.; Jiang, Steven; McCallum, Anthony; Kirov, Stefan; Wasserman, Wyeth W.

    2009-01-01

    The PAZAR database unites independently created and maintained data collections of transcription factor and regulatory sequence annotation. The flexible PAZAR schema permits the representation of diverse information derived from experiments ranging from biochemical protein–DNA binding to cellular reporter gene assays. Data collections can be made available to the public, or restricted to specific system users. The data ‘boutiques’ within the shopping-mall-inspired system facilitate the analysis of genomics data and the creation of predictive models of gene regulation. Since its initial release, PAZAR has grown in terms of data, features and through the addition of an associated package of software tools called the ORCA toolkit (ORCAtk). ORCAtk allows users to rapidly develop analyses based on the information stored in the PAZAR system. PAZAR is available at http://www.pazar.info. ORCAtk can be accessed through convenient buttons located in the PAZAR pages or via our website at http://www.cisreg.ca/ORCAtk. PMID:18971253

  12. DIALOG: An executive computer program for linking independent programs

    NASA Technical Reports Server (NTRS)

    Glatt, C. R.; Hague, D. S.; Watson, D. A.

    1973-01-01

    A very large scale computer programming procedure called the DIALOG executive system was developed for the CDC 6000 series computers. The executive computer program, DIALOG, controls the sequence of execution and data management function for a library of independent computer programs. Communication of common information is accomplished by DIALOG through a dynamically constructed and maintained data base of common information. Each computer program maintains its individual identity and is unaware of its contribution to the large scale program. This feature makes any computer program a candidate for use with the DIALOG executive system. The installation and uses of the DIALOG executive system are described.

  13. Gene calling and bacterial genome annotation with BG7.

    PubMed

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and not perfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences that are the elements directly related with biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. This tool is sequence-error tolerant, maintaining its capabilities for the annotation of highly fragmented genomes or for annotating mixed sequences coming from several genomes (such as those obtained from metagenomics samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  14. PANGEA: pipeline for analysis of next generation amplicons

    PubMed Central

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz FW; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-01-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including preprocessing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the χ2 step, are joined into one program called the ‘backbone’. PMID:20182525

  15. PANGEA: pipeline for analysis of next generation amplicons.

    PubMed

    Giongo, Adriana; Crabb, David B; Davis-Richardson, Austin G; Chauliac, Diane; Mobberley, Jennifer M; Gano, Kelsey A; Mukherjee, Nabanita; Casella, George; Roesch, Luiz F W; Walts, Brandon; Riva, Alberto; King, Gary; Triplett, Eric W

    2010-07-01

    High-throughput DNA sequencing can identify organisms and describe population structures in many environmental and clinical samples. Current technologies generate millions of reads in a single run, requiring extensive computational strategies to organize, analyze and interpret those sequences. A series of bioinformatics tools for high-throughput sequencing analysis, including pre-processing, clustering, database matching and classification, have been compiled into a pipeline called PANGEA. The PANGEA pipeline was written in Perl and can be run on Mac OSX, Windows or Linux. With PANGEA, sequences obtained directly from the sequencer can be processed quickly to provide the files needed for sequence identification by BLAST and for comparison of microbial communities. Two different sets of bacterial 16S rRNA sequences were used to show the efficiency of this workflow. The first set of 16S rRNA sequences is derived from various soils from Hawaii Volcanoes National Park. The second set is derived from stool samples collected from diabetes-resistant and diabetes-prone rats. The workflow described here allows the investigator to quickly assess libraries of sequences on personal computers with customized databases. PANGEA is provided for users as individual scripts for each step in the process or as a single script where all processes, except the chi(2) step, are joined into one program called the 'backbone'.

  16. Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB.

    PubMed

    Xu, Qifang; Dunbrack, Roland L

    2012-11-01

    Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM-HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly.
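
    The greedy step with partial overlaps mentioned above can be sketched as follows. This is only an illustration of the general idea, not the PDBfam procedure: the hit tuple layout, the family names, and the `max_overlap` threshold of 10 residues are assumptions made for the example.

```python
def greedy_assign(hits, max_overlap=10):
    """Greedily pick domain hits, strongest E-value first.

    `hits` is a list of (family, start, end, evalue) tuples with inclusive
    residue coordinates (a hypothetical layout). A hit is accepted only if
    it overlaps every already accepted hit by at most `max_overlap`
    residues (an assumed threshold).
    """
    accepted = []
    for hit in sorted(hits, key=lambda h: h[3]):  # lowest E-value first
        _, start, end, _ = hit
        fits = all(
            min(end, a_end) - max(start, a_start) + 1 <= max_overlap
            for _, a_start, a_end, _ in accepted
        )
        if fits:
            accepted.append(hit)
    # Report the assignments in sequence order.
    return sorted(accepted, key=lambda h: h[1])
```

    For example, with hits `("FamA", 1, 100, 1e-30)`, `("FamB", 95, 200, 1e-20)` and `("FamC", 50, 150, 1e-10)`, FamA and FamB are kept (they share only 6 residues) and FamC is rejected because it overlaps FamA by 51 residues.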

  17. Identification of missing variants by combining multiple analytic pipelines.

    PubMed

    Ren, Yingxue; Reddy, Joseph S; Pottier, Cyril; Sarangi, Vivekananda; Tian, Shulan; Sinnwell, Jason P; McDonnell, Shannon K; Biernacka, Joanna M; Carrasquillo, Minerva M; Ross, Owen A; Ertekin-Taner, Nilüfer; Rademakers, Rosa; Hudson, Matthew; Mainzer, Liudmila Sergeevna; Asmann, Yan W

    2018-04-16

    After decades of identifying risk factors using array-based genome-wide association studies (GWAS), genetic research of complex diseases has shifted to sequencing-based rare variants discovery. This requires large sample sizes for statistical power and has brought up questions about whether the current variant calling practices are adequate for large cohorts. It is well-known that there are discrepancies between variants called by different pipelines, and that using a single pipeline always misses true variants exclusively identifiable by other pipelines. Nonetheless, it is common practice today to call variants by one pipeline due to computational cost and assume that false negative calls are a small percent of total. We analyzed 10,000 exomes from the Alzheimer's Disease Sequencing Project (ADSP) using multiple analytic pipelines consisting of different read aligners and variant calling strategies. We compared variants identified by using two aligners in 50, 100, 200, 500, 1000, and 1952 samples; and compared variants identified by adding single-sample genotyping to the default multi-sample joint genotyping in 50, 100, 500, 2000, 5000 and 10,000 samples. We found that using a single pipeline missed increasing numbers of high-quality variants correlated with sample sizes. By combining two read aligners and two variant calling strategies, we rescued 30% of pass-QC variants at sample size of 2000, and 56% at 10,000 samples. The rescued variants had higher proportions of low frequency (minor allele frequency [MAF] 1-5%) and rare (MAF < 1%) variants, which are the very type of variants of interest. In 660 Alzheimer's disease cases with earlier onset ages of ≤65, 4 out of 13 (31%) previously-published rare pathogenic and protective mutations in APP, PSEN1, and PSEN2 genes were undetected by the default one-pipeline approach but recovered by the multi-pipeline approach. Identification of the complete variant set from sequencing data is the prerequisite of genetic association analyses. The current analytic practice of calling genetic variants from sequencing data using a single bioinformatics pipeline is no longer adequate with the increasingly large projects. The number and percentage of quality variants that passed quality filters but are missed by the one-pipeline approach rapidly increased with sample size.
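
    The idea of rescuing variants by combining call sets can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' pipeline: variants are represented as hypothetical (chromosome, position, ref, alt) tuples, and the rescued fraction is computed relative to the combined set, which may differ from the denominator used in the study.

```python
def combine_call_sets(default_calls, other_calls):
    """Merge two variant call sets and report what the default pipeline missed.

    Each call set is an iterable of (chrom, pos, ref, alt) tuples
    (an assumed representation). Returns the union, the variants found
    only by the other pipeline, and their share of the union.
    """
    default_set = set(default_calls)
    union = default_set | set(other_calls)
    rescued = union - default_set
    fraction = len(rescued) / len(union) if union else 0.0
    return union, rescued, fraction
```

    In practice both call sets would first pass the same quality filters, so that a rescued variant is a pass-QC call rather than noise from the second pipeline.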

  18. Development of use of an Operational Procedure Information System (OPIS) for future space missions

    NASA Technical Reports Server (NTRS)

    Illmer, N.; Mies, L.; Schoen, A.; Jain, A.

    1994-01-01

    An MS-Windows-based electronic procedure system, called OPIS (Operational Procedure Information System), was developed. The system consists of two parts: the editor, for 'writing' the procedure, and the notepad application, for use of the procedures by the crew during training and flight. The system is based on a standardized, structured procedure format and language. It allows the embedding of sketches, photos, animated graphics and video sequences, and access to off-nominal procedures by linkage to an appropriate database. The system facilitates work with procedures of different degrees of detail, depending on the training status of the crew. The development of a 'language module' for the automatic translation of the procedures, for example into Russian, is planned.

  19. MCTP system model based on linear programming optimization of apertures obtained from sequencing patient image data maps.

    PubMed

    Ureba, A; Salguero, F J; Barbeiro, A R; Jimenez-Ortega, E; Baeza, J A; Miras, H; Linares, R; Perucha, M; Leal, A

    2014-08-01

    The authors present a hybrid direct multileaf collimator (MLC) aperture optimization model exclusively based on sequencing of patient imaging data to be implemented on a Monte Carlo treatment planning system (MC-TPS) to allow the explicit radiation transport simulation of advanced radiotherapy treatments with optimal results in efficient times for clinical practice. The planning system (called CARMEN) is a full MC-TPS, controlled through a MATLAB interface, which is based on the sequencing of a novel map, called "biophysical" map, which is generated from enhanced image data of patients to achieve a set of segments actually deliverable. In order to reduce the required computation time, the conventional fluence map has been replaced by the biophysical map which is sequenced to provide direct apertures that will later be weighted by means of an optimization algorithm based on linear programming. A ray-casting algorithm throughout the patient CT assembles information about the found structures, the mass thickness crossed, as well as PET values. Data are recorded to generate a biophysical map for each gantry angle. These maps are the input files for a home-made sequencer developed to take into account the interactions of photons and electrons with the MLC. For each linac (Axesse of Elekta and Primus of Siemens) and energy beam studied (6, 9, 12, 15 MeV and 6 MV), phase space files were simulated with the EGSnrc/BEAMnrc code. The dose calculation in patient was carried out with the BEAMDOSE code. This code is a modified version of EGSnrc/DOSXYZnrc able to calculate the beamlet dose in order to combine them with different weights during the optimization process. Three complex radiotherapy treatments were selected to check the reliability of CARMEN in situations where the MC calculation can offer an added value: A head-and-neck case (Case I) with three targets delineated on PET/CT images and a demanding dose-escalation; a partial breast irradiation case (Case II) solved with photon and electron modulated beams (IMRT + MERT); and a prostatic bed case (Case III) with a pronounced concave-shaped PTV by using volumetric modulated arc therapy. In the three cases, the required target prescription doses and constraints on organs at risk were fulfilled in a short enough time to allow routine clinical implementation. The quality assurance protocol followed to check CARMEN system showed a high agreement with the experimental measurements. A Monte Carlo treatment planning model exclusively based on maps performed from patient imaging data has been presented. The sequencing of these maps allows obtaining deliverable apertures which are weighted for modulation under a linear programming formulation. The model is able to solve complex radiotherapy treatments with high accuracy in an efficient computation time.

  20. High speed clinical data retrieval system with event time sequence feature: with 10 years of clinical data of Hamamatsu University Hospital CPOE.

    PubMed

    Kimura, M; Tani, S; Watanabe, H; Naito, Y; Sakusabe, T; Watanabe, H; Nakaya, J; Sasaki, F; Numano, T; Furuta, T; Furuta, T

    2008-01-01

    This paper illustrates a high-speed clinical data retrieval system that draws on 10 years of data from an operating hospital information system (HIS) for the purposes of research, evidence creation, patient safety, etc., and even incorporates the time sequence of causal relations. A total of 73,709,298 records covering 10 years at Hamamatsu University Hospital (as of June 2008) are sent from the HIS to the retrieval system in HL7 v2.5 format. A hierarchical variable-length database is used to store them. A search for "listing patients who were prescribed Pravastatin (Mevalotin and generic drugs, any titer)" took 1.92 seconds. "Pravastatin (any) prescribed and recorded AST >150 within two weeks" took 112.22 seconds. Search conditions can be made more complex by connecting them with the Boolean operators and/or. This system, called D*D, has been in operation at Hamamatsu University Hospital since August 2002. It has been used 48,518 times (a monthly average of 703 searches). Neither searching nor background export of data from the HIS delayed the routinely operating CPOE. A search database kept outside the routinely operating CPOE, with daily export of order data in HL7 v2.5 format, proved to provide an excellent search environment without causing trouble. Hierarchical representation gives a high-speed search response, especially with time sequences of events.
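
    An event-time-sequence query of the kind quoted above ("Pravastatin prescribed and recorded AST >150 within two weeks") can be sketched in a few lines. This is a plain in-memory illustration, not the D*D hierarchical database: the record layouts and function name are hypothetical, and "within two weeks" is assumed to mean a lab result dated 0 to 14 days after a prescription.

```python
from datetime import date, timedelta

def patients_with_event_sequence(prescriptions, labs, drug, test,
                                 threshold, window_days=14):
    """Find patients with a lab value above `threshold` shortly after a drug order.

    prescriptions: list of (patient_id, drug_name, date) tuples (assumed layout)
    labs:          list of (patient_id, test_name, value, date) tuples (assumed layout)
    """
    window = timedelta(days=window_days)

    # Index prescription dates of the drug of interest by patient.
    rx_dates = {}
    for patient_id, name, when in prescriptions:
        if name == drug:
            rx_dates.setdefault(patient_id, []).append(when)

    matched = set()
    for patient_id, name, value, when in labs:
        if name != test or value <= threshold:
            continue
        # Keep the patient if any prescription precedes the result by <= window.
        if any(timedelta(0) <= when - rx <= window
               for rx in rx_dates.get(patient_id, [])):
            matched.add(patient_id)
    return sorted(matched)
```

    Indexing prescriptions by patient first keeps each lab record's check local to one patient, which loosely mirrors why a per-patient hierarchical layout suits time-sequence searches.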

  1. Using Next-Generation Sequencing to Explore Genetics and Race in the High School Classroom

    ERIC Educational Resources Information Center

    Yang, Xinmiao; Hartman, Mark R.; Harrington, Kristin T.; Etson, Candice M.; Fierman, Matthew B.; Slonim, Donna K.; Walt, David R.

    2017-01-01

    With the development of new sequencing and bioinformatics technologies, concepts relating to personal genomics play an increasingly important role in our society. To promote interest and understanding of sequencing and bioinformatics in the high school classroom, we developed and implemented a laboratory-based teaching module called "The…

  2. Best Practices and Joint Calling of the HumanExome BeadChip: The CHARGE Consortium

    PubMed Central

    Grove, Megan L.; Yu, Bing; Cochran, Barbara J.; Haritunians, Talin; Bis, Joshua C.; Taylor, Kent D.; Hansen, Mark; Borecki, Ingrid B.; Cupples, L. Adrienne; Fornage, Myriam; Gudnason, Vilmundur; Harris, Tamara B.; Kathiresan, Sekar; Kraaij, Robert; Launer, Lenore J.; Levy, Daniel; Liu, Yongmei; Mosley, Thomas; Peloso, Gina M.; Psaty, Bruce M.; Rich, Stephen S.; Rivadeneira, Fernando; Siscovick, David S.; Smith, Albert V.; Uitterlinden, Andre; van Duijn, Cornelia M.; Wilson, James G.; O’Donnell, Christopher J.; Rotter, Jerome I.; Boerwinkle, Eric

    2013-01-01

    Genotyping arrays are a cost effective approach when typing previously-identified genetic polymorphisms in large numbers of samples. One limitation of genotyping arrays with rare variants (e.g., minor allele frequency [MAF] <0.01) is the difficulty that automated clustering algorithms have to accurately detect and assign genotype calls. Combining intensity data from large numbers of samples may increase the ability to accurately call the genotypes of rare variants. Approximately 62,000 ethnically diverse samples from eleven Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium cohorts were genotyped with the Illumina HumanExome BeadChip across seven genotyping centers. The raw data files for the samples were assembled into a single project for joint calling. To assess the quality of the joint calling, concordance of genotypes in a subset of individuals having both exome chip and exome sequence data was analyzed. After exclusion of low performing SNPs on the exome chip and non-overlap of SNPs derived from sequence data, genotypes of 185,119 variants (11,356 were monomorphic) were compared in 530 individuals that had whole exome sequence data. A total of 98,113,070 pairs of genotypes were tested and 99.77% were concordant, 0.14% had missing data, and 0.09% were discordant. We report that joint calling allows the ability to accurately genotype rare variation using array technology when large sample sizes are available and best practices are followed. The cluster file from this experiment is available at www.chargeconsortium.com/main/exomechip. PMID:23874508
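
    The concordance check described above can be sketched as follows. This is a minimal illustration, not the CHARGE analysis code: the dictionary layout keyed by (variant, sample) and the use of `None` for a missing call are assumptions. The pair count in the abstract is consistent with its other figures: 185,119 variants × 530 individuals = 98,113,070 genotype pairs.

```python
def genotype_concordance(array_calls, sequence_calls):
    """Classify genotype pairs as concordant, missing, or discordant.

    Both arguments map (variant_id, sample_id) to a genotype string, or to
    None when the call is missing (an assumed representation). Returns the
    fraction of pairs in each class.
    """
    counts = {"concordant": 0, "missing": 0, "discordant": 0}
    for key, array_genotype in array_calls.items():
        sequence_genotype = sequence_calls.get(key)
        if array_genotype is None or sequence_genotype is None:
            counts["missing"] += 1
        elif array_genotype == sequence_genotype:
            counts["concordant"] += 1
        else:
            counts["discordant"] += 1
    total = sum(counts.values())
    return {name: (n / total if total else 0.0) for name, n in counts.items()}

# Sanity check on the pair count reported in the abstract.
assert 185_119 * 530 == 98_113_070
```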

  3. Automated Virtual Machine Introspection for Host-Based Intrusion Detection

    DTIC Science & Technology

    2009-03-01

    boxes represent the code and data sections of each process in memory with arrows representing hooks planted by malware to jump to the malware code...a useful indication of intrusion, it is also susceptible to mimicry and concurrency attacks [Pro03,Wat07]. Additionally, most research abstracts away...sequence of system calls that accomplishes his or her intent [WS02]. This “mimicry attack” takes advantage of the fact that many HIDS discard the pa

  4. MUSI: an integrated system for identifying multiple specificity from very large peptide or nucleic acid data sets.

    PubMed

    Kim, Taehyung; Tyndel, Marc S; Huang, Haiming; Sidhu, Sachdev S; Bader, Gary D; Gfeller, David; Kim, Philip M

    2012-03-01

    Peptide recognition domains and transcription factors play crucial roles in cellular signaling. They bind linear stretches of amino acids or nucleotides, respectively, with high specificity. Experimental techniques that assess the binding specificity of these domains, such as microarrays or phage display, can retrieve thousands of distinct ligands, providing detailed insight into binding specificity. In particular, the advent of next-generation sequencing has recently increased the throughput of such methods by several orders of magnitude. These advances have helped reveal the presence of distinct binding specificity classes that co-exist within a set of ligands interacting with the same target. Here, we introduce a software system called MUSI that can rapidly analyze very large data sets of binding sequences to determine the relevant binding specificity patterns. Our pipeline provides two major advances. First, it can detect previously unrecognized multiple specificity patterns in any data set. Second, it offers integrated processing of very large data sets from next-generation sequencing machines. The results are visualized as multiple sequence logos describing the different binding preferences of the protein under investigation. We demonstrate the performance of MUSI by analyzing recent phage display data for human SH3 domains as well as microarray data for mouse transcription factors.

  5. High Order Non-Stationary Markov Models and Anomaly Propagation Analysis in Intrusion Detection System (IDS)

    DTIC Science & Technology

    2007-02-01

    almost identical system call sequences and triggering the same alarm at different hosts. The alarm propagation effect can be used to distinguish “true alarms” from “false positives”. At the host level, a new anomaly...[equation defining the statistic W; garbled in extraction and not recoverable] where - marginal observed

  6. Conditions for extinction events in chemical reaction networks with discrete state spaces.

    PubMed

    Johnston, Matthew D; Anderson, David F; Craciun, Gheorghe; Brijder, Robert

    2018-05-01

    We study chemical reaction networks with discrete state spaces and present sufficient conditions on the structure of the network that guarantee the system exhibits an extinction event. The conditions we derive involve creating a modified chemical reaction network called a domination-expanded reaction network and then checking properties of this network. Unlike previous results, our analysis allows algorithmic implementation via systems of equalities and inequalities and suggests sequences of reactions which may lead to extinction events. We apply the results to several networks including an EnvZ-OmpR signaling pathway in Escherichia coli.

  7. Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver.

    PubMed

    Wymant, Chris; Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ong, Swee Hoe; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Berkhout, Ben; Cornelissen, Marion; Kellam, Paul; Reiss, Peter; Fraser, Christophe

    2018-01-01

    Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user's choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver's constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered. 
    We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver.

  8. Towards Clinical Molecular Diagnosis of Inherited Cardiac Conditions: A Comparison of Bench-Top Genome DNA Sequencers

    PubMed Central

    Wilkinson, Samuel L.; John, Shibu; Walsh, Roddy; Novotny, Tomas; Valaskova, Iveta; Gupta, Manu; Game, Laurence; Barton, Paul J R.; Cook, Stuart A.; Ware, James S.

    2013-01-01

    Background Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations. Methodology/Principal Findings We evaluated two next-generation sequencing (NGS) platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm), and sequenced on the MiSeq (Illumina) and Ion Torrent PGM (Life Technologies). Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%). At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs), with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS. Conclusions/Significance MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. 
    Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias, these NGS approaches are faster, less expensive, and yet more comprehensive. PMID:23861798

  9. Factors influencing success of clinical genome sequencing across a broad spectrum of disorders

    PubMed Central

    Lise, Stefano; Broxholme, John; Cazier, Jean-Baptiste; Rimmer, Andy; Kanapin, Alexander; Lunter, Gerton; Fiddy, Simon; Allan, Chris; Aricescu, A. Radu; Attar, Moustafa; Babbs, Christian; Becq, Jennifer; Beeson, David; Bento, Celeste; Bignell, Patricia; Blair, Edward; Buckle, Veronica J; Bull, Katherine; Cais, Ondrej; Cario, Holger; Chapel, Helen; Copley, Richard R; Cornall, Richard; Craft, Jude; Dahan, Karin; Davenport, Emma E; Dendrou, Calliope; Devuyst, Olivier; Fenwick, Aimée L; Flint, Jonathan; Fugger, Lars; Gilbert, Rodney D; Goriely, Anne; Green, Angie; Greger, Ingo H.; Grocock, Russell; Gruszczyk, Anja V; Hastings, Robert; Hatton, Edouard; Higgs, Doug; Hill, Adrian; Holmes, Chris; Howard, Malcolm; Hughes, Linda; Humburg, Peter; Johnson, David; Karpe, Fredrik; Kingsbury, Zoya; Kini, Usha; Knight, Julian C; Krohn, Jonathan; Lamble, Sarah; Langman, Craig; Lonie, Lorne; Luck, Joshua; McCarthy, Davis; McGowan, Simon J; McMullin, Mary Frances; Miller, Kerry A; Murray, Lisa; Németh, Andrea H; Nesbit, M Andrew; Nutt, David; Ormondroyd, Elizabeth; Oturai, Annette Bang; Pagnamenta, Alistair; Patel, Smita Y; Percy, Melanie; Petousi, Nayia; Piazza, Paolo; Piret, Sian E; Polanco-Echeverry, Guadalupe; Popitsch, Niko; Powrie, Fiona; Pugh, Chris; Quek, Lynn; Robbins, Peter A; Robson, Kathryn; Russo, Alexandra; Sahgal, Natasha; van Schouwenburg, Pauline A; Schuh, Anna; Silverman, Earl; Simmons, Alison; Sørensen, Per Soelberg; Sweeney, Elizabeth; Taylor, John; Thakker, Rajesh V; Tomlinson, Ian; Trebes, Amy; Twigg, Stephen RF; Uhlig, Holm H; Vyas, Paresh; Vyse, Tim; Wall, Steven A; Watkins, Hugh; Whyte, Michael P; Witty, Lorna; Wright, Ben; Yau, Chris; Buck, David; Humphray, Sean; Ratcliffe, Peter J; Bell, John I; Wilkie, Andrew OM; Bentley, David; Donnelly, Peter; McVean, Gilean

    2015-01-01

    To assess factors influencing the success of whole genome sequencing for mainstream clinical diagnosis, we sequenced 217 individuals from 156 independent cases across a broad spectrum of disorders in whom prior screening had identified no pathogenic variants. We quantified the number of candidate variants identified using different strategies for variant calling, filtering, annotation and prioritisation. We found that jointly calling variants across samples, filtering against both local and external databases, deploying multiple annotation tools and using familial transmission above biological plausibility contributed to accuracy. Overall, we identified disease causing variants in 21% of cases, rising to 34% (23/68) for Mendelian disorders and 57% (8/14) in trios. We also discovered 32 potentially clinically actionable variants in 18 genes unrelated to the referral disorder, though only four were ultimately considered reportable. Our results demonstrate the value of genome sequencing for routine clinical diagnosis, but also highlight many outstanding challenges. PMID:25985138

  10. Mapping Base Modifications in DNA by Transverse-Current Sequencing

    NASA Astrophysics Data System (ADS)

    Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.

    2018-02-01

    Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.

  11. Advanced Transport Operating System (ATOPS) color displays software description: MicroVAX system

    NASA Technical Reports Server (NTRS)

    Slominski, Christopher J.; Plyler, Valerie E.; Dickson, Richard W.

    1992-01-01

    This document describes the software created for the Display MicroVAX computer used for the Advanced Transport Operating Systems (ATOPS) project on the Transport Systems Research Vehicle (TSRV). The software delivery of February 27, 1991, known as the 'baseline display system', is the one described in this document. Throughout this publication, module descriptions are presented in a standardized format which contains module purpose, calling sequence, detailed description, and global references. The global references section includes subroutines, functions, and common variables referenced by a particular module. The system described supports the Research Flight Deck (RFD) of the TSRV. The RFD contains eight Cathode Ray Tubes (CRTs) which depict a Primary Flight Display, Navigation Display, System Warning Display, Takeoff Performance Monitoring System Display, and Engine Display.

  12. The engineering design integration (EDIN) system. [digital computer program complex

    NASA Technical Reports Server (NTRS)

    Glatt, C. R.; Hirsch, G. N.; Alford, G. E.; Colquitt, W. N.; Reiners, S. J.

    1974-01-01

    A digital computer program complex for the evaluation of aerospace vehicle preliminary designs is described. The system consists of a Univac 1100 series computer and peripherals using the Exec 8 operating system, a set of demand access terminals of the alphanumeric and graphics types, and a library of independent computer programs. Modification of the partial run streams, data base maintenance and construction, and control of program sequencing are provided by a data manipulation program called the DLG processor. The executive control of library program execution is performed by the Univac Exec 8 operating system through a user established run stream. A combination of demand and batch operations is employed in the evaluation of preliminary designs. Applications accomplished with the EDIN system are described.

  13. Measure-valued solutions to the complete Euler system revisited

    NASA Astrophysics Data System (ADS)

    Březina, Jan; Feireisl, Eduard

    2018-06-01

    We consider the complete Euler system describing the time evolution of a general inviscid compressible fluid. We introduce a new concept of measure-valued solution based on the total energy balance and entropy inequality for the physical entropy without any renormalization. This class of so-called dissipative measure-valued solutions is large enough to include the vanishing dissipation limits of the Navier-Stokes-Fourier system. Our main result states that any sequence of weak solutions to the Navier-Stokes-Fourier system with vanishing viscosity and heat conductivity coefficients generates a dissipative measure-valued solution of the Euler system under some physically grounded constitutive relations. Finally, we discuss the same asymptotic limit for the bi-velocity fluid model introduced by H.Brenner.

  14. Schema vs. primitive perceptual grouping: the relative weighting of sequential vs. spatial cues during an auditory grouping task in frogs.

    PubMed

    Farris, Hamilton E; Ryan, Michael J

    2017-03-01

    Perceptually, grouping sounds based on their sources is critical for communication. This is especially true in túngara frog breeding aggregations, where multiple males produce overlapping calls that consist of an FM 'whine' followed by harmonic bursts called 'chucks'. Phonotactic females use at least two cues to group whines and chucks: whine-chuck spatial separation and sequence. Spatial separation is a primitive cue, whereas sequence is schema-based, as chuck production is morphologically constrained to follow whines, meaning that males cannot produce the components simultaneously. When one cue is available, females perceptually group whines and chucks using relative comparisons: components with the smallest spatial separation or those closest to the natural sequence are more likely grouped. By simultaneously varying the temporal sequence and spatial separation of a single whine and two chucks, this study measured between-cue perceptual weighting during a specific grouping task. Results show that whine-chuck spatial separation is a stronger grouping cue than temporal sequence, as grouping is more likely for stimuli with smaller spatial separation and non-natural sequence than those with larger spatial separation and natural sequence. Compared to the schema-based whine-chuck sequence, we propose that spatial cues have less variance, potentially explaining their preferred use when grouping during directional behavioral responses.

  15. Behavioral Context of Call Production by Eastern North Pacific Blue Whales

    DTIC Science & Technology

    2007-01-25

    pairs occurring in a repeated song sequence; B calls from a different blue whale are also evident; spectrogram parameters: fast Fourier transform (FFT...Acoustic data were viewed in spectrogram form (fast Fourier transform [FFT] length 1 s, 80% overlap, Hanning window) to determine the presence of calls...duration to song A and B units (Table 2), but the intermittent timing clearly distinguishes them from song. Whales producing singular calls were

  16. Genotype calling from next-generation sequencing data using haplotype information of reads

    PubMed Central

    Zhi, Degui; Wu, Jihua; Liu, Nianjun; Zhang, Kui

    2012-01-01

    Motivation: Low coverage sequencing provides an economic strategy for whole genome sequencing. When sequencing a set of individuals, genotype calling can be challenging due to low sequencing coverage. Linkage disequilibrium (LD) based refinement of genotyping calling is essential to improve the accuracy. Current LD-based methods use read counts or genotype likelihoods at individual potential polymorphic sites (PPSs). Reads that span multiple PPSs (jumping reads) can provide additional haplotype information overlooked by current methods. Results: In this article, we introduce a new Hidden Markov Model (HMM)-based method that can take into account jumping reads information across adjacent PPSs and implement it in the HapSeq program. Our method extends the HMM in Thunder and explicitly models jumping reads information as emission probabilities conditional on the states of adjacent PPSs. Our simulation results show that, compared to Thunder, HapSeq reduces the genotyping error rate by 30%, from 0.86% to 0.60%. The results from the 1000 Genomes Project show that HapSeq reduces the genotyping error rate by 12 and 9%, from 2.24% and 2.76% to 1.97% and 2.50% for individuals with European and African ancestry, respectively. We expect our program can improve genotyping qualities of the large number of ongoing and planned whole genome sequencing projects. Contact: dzhi@ms.soph.uab.edu; kzhang@ms.soph.uab.edu Availability: The software package HapSeq and its manual can be found and downloaded at www.ssg.uab.edu/hapseq/. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:22285565
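
    The percentage reductions quoted above are relative reductions of the error rate, not differences in percentage points. A minimal sketch that recomputes them from the error rates given in the abstract follows; the function name and the dictionary labels are illustrative and are not part of HapSeq.

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction (as a fraction) when a rate drops from `before` to `after`."""
    return (before - after) / before


# Genotyping error rates in percent, copied from the abstract (Thunder, HapSeq).
reported_rates = {
    "simulation": (0.86, 0.60),
    "1000 Genomes, European ancestry": (2.24, 1.97),
    "1000 Genomes, African ancestry": (2.76, 2.50),
}

for label, (thunder, hapseq) in reported_rates.items():
    print(f"{label}: {relative_reduction(thunder, hapseq):.1%} relative reduction")
```

    Rounded to whole percentages, the three values agree with the 30%, 12% and 9% stated in the abstract.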

  17. Number of 24-Hour Diet Recalls Needed to Estimate Energy Intake

    PubMed Central

    MA, Yunsheng; Olendzki, Barbara C.; Pagoto, Sherry L.; Hurley, Thomas G.; Magner, Robert P.; Ockene, Ira S.; Schneider, Kristin L.; Merriam, Philip A.; Hébert, James R.

    2009-01-01

    Purpose Twenty-four-hour diet recall interviews (24HRs) are used to assess diet and to validate other diet assessment instruments. Therefore it is important to know how many 24HRs are required to describe an individual's intake. Method Seventy-nine middle-aged white women completed seven 24HRs over a 14-day period, during which energy expenditure (EE) was determined by the doubly labeled water method (DLW). Mean daily intakes were compared to DLW-derived EE using paired t tests. Linear mixed models were used to evaluate the effect of call sequence and day of the week on 24HR-derived energy intake while adjusting for education, relative body weight, social desirability, and an interaction between call sequence and social desirability. Results Mean EE from DLW was 2115 kcal/day. Adjusted 24HR-derived energy intake was lowest at call 1 (1501 kcal/day); significantly higher energy intake was observed at calls 2 and 3 (2246 and 2315 kcal/day, respectively). Energy intake on Friday was significantly lower than on Sunday. Averaging energy intake from the first two calls better approximated true energy expenditure than did the first call, and averaging the first three calls further improved the estimate (p = 0.02 for both comparisons). Additional calls did not improve estimation. Conclusions Energy intake is underreported on the first 24HR. Three 24HRs appear optimal for estimating energy intake. PMID:19576535
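
    The effect of averaging successive recalls can be illustrated with the three adjusted means quoted above. This is only a sketch: it averages the reported group-level values, whereas the study itself fitted linear mixed models to individual data, and the function name is hypothetical.

```python
from statistics import mean


def running_average_gaps(intakes, expenditure):
    """For n = 1..len(intakes), return (n, mean of first n intakes, mean minus expenditure)."""
    results = []
    for n in range(1, len(intakes) + 1):
        average = mean(intakes[:n])
        results.append((n, average, average - expenditure))
    return results


# Values in kcal/day, copied from the abstract.
dlw_expenditure = 2115
adjusted_intake_by_call = [1501, 2246, 2315]  # calls 1, 2 and 3

for n, average, gap in running_average_gaps(adjusted_intake_by_call, dlw_expenditure):
    print(f"first {n} call(s): mean intake {average:.1f} kcal/day, gap {gap:+.1f} kcal/day")
```

    On these reported means, the gap to DLW-derived expenditure narrows with each added call, which is the direction of the effect the abstract describes.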

  18. Number of 24-hour diet recalls needed to estimate energy intake.

    PubMed

    Ma, Yunsheng; Olendzki, Barbara C; Pagoto, Sherry L; Hurley, Thomas G; Magner, Robert P; Ockene, Ira S; Schneider, Kristin L; Merriam, Philip A; Hébert, James R

    2009-08-01

    Twenty-four-hour diet recall interviews (24HRs) are used to assess diet and to validate other diet assessment instruments. Therefore it is important to know how many 24HRs are required to describe an individual's intake. Seventy-nine middle-aged white women completed seven 24HRs over a 14-day period, during which energy expenditure (EE) was determined by the doubly labeled water method (DLW). Mean daily intakes were compared to DLW-derived EE using paired t tests. Linear mixed models were used to evaluate the effect of call sequence and day of the week on 24HR-derived energy intake while adjusting for education, relative body weight, social desirability, and an interaction between call sequence and social desirability. Mean EE from DLW was 2115 kcal/day. Adjusted 24HR-derived energy intake was lowest at call 1 (1501 kcal/day); significantly higher energy intake was observed at calls 2 and 3 (2246 and 2315 kcal/day, respectively). Energy intake on Friday was significantly lower than on Sunday. Averaging energy intake from the first two calls better approximated true energy expenditure than did the first call, and averaging the first three calls further improved the estimate (p=0.02 for both comparisons). Additional calls did not improve estimation. Energy intake is underreported on the first 24HR. Three 24HRs appear optimal for estimating energy intake.

  19. Tribonacci-Like Sequences and Generalized Pascal's Pyramids

    ERIC Educational Resources Information Center

    Anatriello, Giuseppina; Vincenzi, Giovanni

    2014-01-01

    A well-known result of Feinberg and Shannon states that the tribonacci sequence can be detected by the so-called "Pascal's pyramid." Here we will show that any tribonacci-like sequence can be obtained by the diagonals of the "Feinberg's triangle" associated to a suitable "generalized Pascal's pyramid."…
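
    The classical link between tribonacci numbers and Pascal's pyramid can be checked numerically. The sketch below assumes the following standard identity: summing the trinomial coefficients (i+j+k)!/(i! j! k!) over all non-negative i, j, k with i + 2j + 3k = n gives the n-th term of the sequence 1, 1, 2, 4, 7, 13, ... Whether this indexing matches the diagonals and the "Feinberg's triangle" used in the paper is an assumption, and the function names are illustrative.

```python
from math import factorial


def trinomial(i: int, j: int, k: int) -> int:
    """Entry of Pascal's pyramid: the coefficient of x^i y^j z^k in (x + y + z)^(i+j+k)."""
    return factorial(i + j + k) // (factorial(i) * factorial(j) * factorial(k))


def tribonacci_from_pyramid(n: int) -> int:
    """Sum the pyramid entries over all (i, j, k) with i + 2j + 3k == n."""
    total = 0
    for k in range(n // 3 + 1):
        for j in range((n - 3 * k) // 2 + 1):
            i = n - 2 * j - 3 * k
            total += trinomial(i, j, k)
    return total


def tribonacci_by_recurrence(count: int) -> list:
    """First `count` terms of 1, 1, 2, 4, ... using t(n) = t(n-1) + t(n-2) + t(n-3)."""
    terms = [1, 1, 2]
    while len(terms) < count:
        terms.append(terms[-1] + terms[-2] + terms[-3])
    return terms[:count]


print([tribonacci_from_pyramid(n) for n in range(10)])
print(tribonacci_by_recurrence(10))
```

    Both lines print the same ten terms, since each triple (i, j, k) counts the ways to write n as an ordered sum of i ones, j twos and k threes.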

  20. The tomato genome

    USDA-ARS's Scientific Manuscript database

    The tomato genome sequence was undertaken at a time when state-of-the-art sequencing methodologies were undergoing a transition to so-called next generation methodologies. The result was an international consortium undertaking a strategy merging both old and new approaches. Because biologists were...

  1. QQ-SNV: single nucleotide variant detection at low frequency by comparing the quality quantiles.

    PubMed

    Van der Borght, Koen; Thys, Kim; Wetzels, Yves; Clement, Lieven; Verbist, Bie; Reumers, Joke; van Vlijmen, Herman; Aerssens, Jeroen

    2015-11-10

    Next generation sequencing enables studying heterogeneous populations of viral infections. When the sequencing is done at high coverage depth ("deep sequencing"), low frequency variants can be detected. Here we present QQ-SNV (http://sourceforge.net/projects/qqsnv), a logistic regression classifier model developed for the Illumina sequencing platforms that uses the quantiles of the quality scores to distinguish true single nucleotide variants from sequencing errors based on the estimated SNV probability. To train the model, we created a dataset of an in silico mixture of five HIV-1 plasmids. Testing of our method in comparison to the existing methods LoFreq, ShoRAH, and V-Phaser 2 was performed on two HIV and four HCV plasmid mixture datasets and one influenza H1N1 clinical dataset. For default application of QQ-SNV, variants were called using a SNV probability cutoff of 0.5 (QQ-SNV(D)). To improve the sensitivity we used a SNV probability cutoff of 0.0001 (QQ-SNV(HS)). To also increase specificity, SNVs called were overruled when their frequency was below the 80th percentile calculated on the distribution of error frequencies (QQ-SNV(HS-P80)). When comparing QQ-SNV versus the other methods on the plasmid mixture test sets, QQ-SNV(D) performed similarly to the existing approaches. QQ-SNV(HS) was more sensitive on all test sets but with more false positives. QQ-SNV(HS-P80) was found to be the most accurate method over all test sets by balancing sensitivity and specificity. When applied to a paired-end HCV sequencing study, with lowest spiked-in true frequency of 0.5%, QQ-SNV(HS-P80) revealed a sensitivity of 100% (vs. 40-60% for the existing methods) and a specificity of 100% (vs. 98.0-99.7% for the existing methods). In addition, QQ-SNV required the least overall computation time to process the test sets.
    Finally, when testing on a clinical sample, four putative true variants with frequency below 0.5% were consistently detected by QQ-SNV(HS-P80) from different generations of Illumina sequencers. We developed and successfully evaluated a novel method, called QQ-SNV, for highly efficient single nucleotide variant calling on Illumina deep sequencing virology data.
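
    The three calling modes described above differ only in the probability cutoff and in an optional frequency filter. The sketch below illustrates that decision rule and nothing more; it is not the QQ-SNV implementation. The function and parameter names are hypothetical, treating the cutoff as inclusive is an assumption, and the 80th-percentile error frequency is passed in as a precomputed value because the abstract does not say how the error-frequency distribution is estimated.

```python
def call_snv(snv_probability, frequency, mode="D", p80_error_frequency=None):
    """Return True if a candidate variant is called under the given mode.

    mode "D":      call when the SNV probability reaches 0.5.
    mode "HS":     call when the SNV probability reaches 0.0001.
    mode "HS-P80": as "HS", but overrule calls whose frequency lies below the
                   80th percentile of the error-frequency distribution.
    """
    cutoffs = {"D": 0.5, "HS": 0.0001, "HS-P80": 0.0001}
    if mode not in cutoffs:
        raise ValueError(f"unknown mode: {mode!r}")

    # Assumption: the cutoff is inclusive (>=); the abstract does not specify this.
    called = snv_probability >= cutoffs[mode]

    if mode == "HS-P80":
        if p80_error_frequency is None:
            raise ValueError("mode 'HS-P80' needs p80_error_frequency")
        called = called and frequency >= p80_error_frequency
    return called


# A low-probability candidate observed at 1% frequency (made-up numbers).
print(call_snv(0.3, 0.01, mode="D"))
print(call_snv(0.3, 0.01, mode="HS"))
print(call_snv(0.3, 0.01, mode="HS-P80", p80_error_frequency=0.02))
```

    Taking the percentile as an input keeps the sketch independent of any particular way of estimating error frequencies.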

  2. Probabilistic simple sticker systems

    NASA Astrophysics Data System (ADS)

    Selvarajoo, Mathuri; Heng, Fong Wan; Sarmin, Nor Haniza; Turaev, Sherzod

    2017-04-01

    A model for DNA computing using the recombination behavior of DNA molecules, known as a sticker system, was introduced by L. Kari, G. Paun, G. Rozenberg, A. Salomaa, and S. Yu in the paper "DNA computing, sticker systems and universality" (Acta Informatica, vol. 35, pp. 401-420, 1998). A sticker system uses the Watson-Crick complementarity of DNA molecules: it starts from incomplete double-stranded sequences and iteratively applies sticking operations until a complete double-stranded sequence is obtained. It is known that sticker systems with finite sets of axioms and sticker rules generate only regular languages. Hence, different types of restrictions have been considered to increase the computational power of sticker systems. Recently, a variant of restricted sticker systems, called probabilistic sticker systems, has been introduced [4]. In this variant, probabilities are initially associated with the axioms, and the probability of a generated string is computed by multiplying the probabilities of all occurrences of the initial strings in the computation of the string. Strings for the language are selected according to some probabilistic requirements. In this paper, we study fundamental properties of probabilistic simple sticker systems. We prove that the probabilistic enhancement increases the computational power of simple sticker systems.

  3. A Simple Symmetric Algorithm Using a Likeness with Introns Behavior in RNA Sequences

    NASA Astrophysics Data System (ADS)

    Regoli, Massimo

    2009-02-01

    The RNA-Crypto System (RCS for short) is a symmetric-key algorithm for enciphering data. The idea for this new algorithm comes from the observation of nature, in particular of RNA behavior and some of its properties. RNA sequences have some sections called introns. Introns, a term derived from "intragenic regions", are non-coding sections of precursor mRNA (pre-mRNA) or other RNAs that are removed (spliced out of the RNA) before the mature RNA is formed. Once the introns have been spliced out of a pre-mRNA, the resulting mRNA sequence is ready to be translated into a protein. The corresponding parts of a gene are known as introns as well. The nature and role of introns in pre-mRNA are not clear and are under extensive study by biologists but, in our case, we use the presence of introns in the RNA-Crypto System output as a strong method to add chaotic non-coding information and an unnecessary behaviour in the access to the secret key used to code the messages. In the RNA-Crypto System algorithm, the introns are sections of the ciphered message carrying non-coding information, just as in the precursor mRNA.

  4. Sequence modelling and an extensible data model for genomic database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Peter Wei-Der

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  5. Sequence modelling and an extensible data model for genomic database

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Peter Wei-Der

    1992-01-01

    The Human Genome Project (HGP) plans to sequence the human genome by the beginning of the next century. It will generate DNA sequences of more than 10 billion bases and complex marker sequences (maps) of more than 100 million markers. All of this information will be stored in database management systems (DBMSs). However, existing data models do not have the abstraction mechanism for modelling sequences and existing DBMSs do not have operations for complex sequences. This work addresses the problem of sequence modelling in the context of the HGP and the more general problem of an extensible object data model that can incorporate the sequence model as well as existing and future data constructs and operators. First, we proposed a general sequence model that is application and implementation independent. This model is used to capture the sequence information found in the HGP at the conceptual level. In addition, abstract and biological sequence operators are defined for manipulating the modelled sequences. Second, we combined many features of semantic and object oriented data models into an extensible framework, which we called the "Extensible Object Model", to address the need of a modelling framework for incorporating the sequence data model with other types of data constructs and operators. This framework is based on the conceptual separation between constructors and constraints. We then used this modelling framework to integrate the constructs for the conceptual sequence model. The Extensible Object Model is also defined with a graphical representation, which is useful as a tool for database designers. Finally, we defined a query language to support this model and implemented the query processor to demonstrate the feasibility of the extensible framework and the usefulness of the conceptual sequence model.

  6. Innovative production technology in aircraft construction: CIAM Forming 'made by MBB' - A highly productive example

    NASA Astrophysics Data System (ADS)

    A novel production technology in aircraft construction was developed for manufacturing parts of shapes and dimensions that involve only small quantities for one machine. The process, called computerized integrated and automated manufacturing (CIAM), makes it possible to make ready-to-install sheet-metal parts for all types of aircraft. All of the system's job sequences, which include milling the flat sheet-metal parts in stacks, deburring, heat treatment, and forming under the high-pressure rubber-pad press, are automated. The CIAM production center, called CIAM Forming, fulfills the prerequisites for the cost-effective production of sheet-metal parts made of aluminum alloys, titanium, or steel. The CIAM procedure results in negligible material loss through computerizing both component-contour nesting of the sheet-metal parts and contour milling.

  7. HmtDB 2016: data update, a better performing query system and human mitochondrial DNA haplogroup predictor

    PubMed Central

    Clima, Rosanna; Preste, Roberto; Calabrese, Claudia; Diroma, Maria Angela; Santorsola, Mariangela; Scioscia, Gaetano; Simone, Domenico; Shen, Lishuang; Gasparre, Giuseppe; Attimonelli, Marcella

    2017-01-01

    The HmtDB resource hosts a database of human mitochondrial genome sequences from individuals with healthy and disease phenotypes. The database is intended to support both population geneticists and clinicians undertaking the task of assessing the pathogenicity of specific mtDNA mutations. The wide application of next-generation sequencing (NGS) has provided an enormous volume of high-resolution data at a low price, increasing the availability of human mitochondrial sequencing data, which called for a cogent and significant expansion of HmtDB data content that has more than tripled in the current release. We here describe additional novel features, including: (i) a complete, user-friendly restyling of the web interface, (ii) links to the command-line stand-alone and web versions of the MToolBox package, an up-to-date tool to reconstruct and analyze human mitochondrial DNA from NGS data and (iii) the implementation of the Reconstructed Sapiens Reference Sequence (RSRS) as mitochondrial reference sequence. The overall update renders HmtDB an even more handy and useful resource as it enables more rapid data access, processing and analysis. HmtDB is accessible at http://www.hmtdb.uniba.it/. PMID:27899581

  8. Clinical sequencing in leukemia with the assistance of artificial intelligence.

    PubMed

    Tojo, Arinobu

    2017-01-01

    Next generation sequencing (NGS) of cancer genomes is now becoming a prerequisite for accurate diagnosis and proper treatment in clinical oncology. Because the genomic regions for NGS expand from a certain set of genes to the whole exome or whole genome, the resulting sequence data becomes incredibly enormous and makes it quite laborious to translate the genomic data into medicine, so-called annotation and curation. We organized a clinical sequencing team and established a bidirectional (bed-to-bench and bench-to-bed) system to integrate clinical and genomic data for hematological malignancies. We also started a collaborative research project with IBM Japan to adopt the artificial intelligence Watson for Genomics (WfG) to the pipeline of medical informatics. Genomic DNA was prepared from malignant as well as normal tissues in each patient and subjected to NGS. Sequence data was analyzed using an in-house semi-automated pipeline in combination with WfG, which was used to identify candidate driver mutations and relevant pathways from which applicable drug information was deduced. Currently, we have analyzed more than 150 patients with hematological disorders, including AML and ALL, and obtained many informative findings. In this presentation, I will introduce some of the achievements we have made so far.

  9. ABI Base Recall: Automatic Correction and Ends Trimming of DNA Sequences.

    PubMed

    Elyazghi, Zakaria; Yazouli, Loubna El; Sadki, Khalid; Radouani, Fouzia

    2017-12-01

    Automated DNA sequencers produce chromatogram files in ABI format. When viewing chromatograms, some ambiguities are shown at various sites along the DNA sequences, because the program implemented in the sequencing machine and used to call bases cannot always precisely determine the right nucleotide, especially when it is represented by either a broad peak or a set of overlaying peaks. In such cases, a letter other than A, C, G, or T is recorded, most commonly N. Thus, DNA sequencing chromatograms need manual examination: checking for mis-calls and truncating the sequence when errors become too frequent. The purpose of this paper is to develop a program allowing the automatic correction of these ambiguities. This application is a Web-based program powered by Shiny that runs on the R platform for ease of use. As part of the interface, we added an automatic end-clipping option, alignment against reference sequences, and BLAST. To develop and test our tool, we collected several bacterial DNA sequences from different laboratories within Institut Pasteur du Maroc and performed both manual and automatic correction. We then compared the two methods and found that our program, ABI Base Recall, corrects ambiguities with high accuracy. Indeed, it increases the rate of identity and coverage and minimizes the number of mismatches and gaps; it thus provides a solution to sequencing ambiguities and saves biologists' time and labor.
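
    The end-truncation step described in this record can be illustrated with a small sliding-window sketch. This is not the ABI Base Recall implementation (which is an R/Shiny program); the function name `trim_ends`, the window size, and the ambiguity threshold are all assumptions chosen for illustration.

```python
def trim_ends(seq, window=10, max_ambiguous=2):
    """Clip both ends of a base-called read (hypothetical helper).

    The read is cut at the first and last window that contains at most
    `max_ambiguous` letters other than A, C, G, or T. Returns "" when no
    window passes.
    """
    def is_clean(chunk):
        return sum(base not in "ACGT" for base in chunk) <= max_ambiguous

    seq = seq.upper()
    if len(seq) < window:
        return seq if is_clean(seq) else ""

    # Slide from the 5' end until a window passes.
    start = 0
    while start + window <= len(seq) and not is_clean(seq[start:start + window]):
        start += 1
    if start + window > len(seq):
        return ""  # every window was too ambiguous

    # Slide from the 3' end; this stops at the latest when it reaches
    # the window already accepted at `start`.
    end = len(seq)
    while not is_clean(seq[end - window:end]):
        end -= 1
    return seq[start:end]


core = "ACGTACGTACACGTACGTAC"
trimmed = trim_ends("NNNNN" + core + "NNNNN", window=5, max_ambiguous=0)
```

    With a tolerance above zero, the kept region may still begin or end with an ambiguous letter; a stricter variant would strip those as well.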

  10. An investigation of the uniform random number generator

    NASA Technical Reports Server (NTRS)

    Temple, E. C.

    1982-01-01

    Most random number generators that are in use today are of the congruential form X(i+1) = (AX(i) + C) mod M, where A, C, and M are nonnegative integers. If C = 0, the generator is called the multiplicative type, and those for which C ≠ 0 are called mixed congruential generators. It is easy to see that congruential generators will repeat a sequence of numbers after a maximum of M values have been generated. The number of numbers that a procedure generates before restarting the sequence is called the length or the period of the generator. Generally, it is desirable to make the period as long as possible. A detailed discussion of congruential generators is given. Also, several promising procedures that differ from the multiplicative and mixed procedure are discussed.
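
    As a minimal sketch of the congruential form described in this record, the following shows a generator and a brute-force period count. The parameter values (A = 5, C = 3 or 0, M = 16) are toy values picked for illustration and are not taken from the report.

```python
from itertools import islice


def lcg(seed, a, c, m):
    """Yield successive values X(i+1) = (a * X(i) + c) mod m."""
    x = seed % m
    while True:
        x = (a * x + c) % m
        yield x


def period(seed, a, c, m):
    """Length of the cycle the generator eventually enters.

    There are only m possible states, so a state must repeat after at
    most m steps and the loop always terminates.
    """
    first_seen = {}
    x = seed % m
    step = 0
    while x not in first_seen:
        first_seen[x] = step
        x = (a * x + c) % m
        step += 1
    return step - first_seen[x]


first_values = list(islice(lcg(seed=1, a=5, c=3, m=16), 5))  # [8, 11, 10, 5, 12]
mixed_period = period(seed=1, a=5, c=3, m=16)  # mixed generator (C != 0): 16
mult_period = period(seed=1, a=5, c=0, m=16)   # multiplicative generator (C == 0): 4
```

    With these toy values the mixed generator reaches the maximum period M = 16, while the multiplicative one started from the same seed cycles after only 4 values.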

  11. Analysis of ParB-centromere interactions by multiplex SPR imaging reveals specific patterns for binding ParB in six centromeres of Burkholderiales chromosomes and plasmids

    PubMed Central

    Pillet, Flavien; Passot, Fanny Marie; Pasta, Franck; Anton Leberre, Véronique; Bouet, Jean-Yves

    2017-01-01

    Bacterial centromeres, also called parS, are cis-acting DNA sequences which, together with the proteins ParA and ParB, are involved in the segregation of chromosomes and plasmids. The specific binding of ParB to parS nucleates the assembly of a large ParB/DNA complex from which ParA, the motor protein, segregates the sister replicons. Closely related families of partition systems, called Bsr, were identified on the chromosomes and large plasmids of the multi-chromosomal bacterium Burkholderia cenocepacia and other species from the order Burkholderiales. The centromeres of the Bsr partition families are 16 bp palindromes, displaying similar base compositions, notably a central CG dinucleotide. Although centromeres bind the cognate ParB with a narrow specificity, weak non-cognate ParB-parS interactions were nevertheless detected between a few Bsr partition systems of replicons not belonging to the same genome. These observations suggested that Bsr partition systems could have a common ancestry but that evolution mostly erased the possibilities of cross-reactions between them, in particular to prevent replicon incompatibility. To detect novel similarities between Bsr partition systems, we have analyzed the binding of six Bsr parS sequences, and a wide collection of modified derivatives, to their cognate ParB. The study was carried out by Surface Plasmon Resonance imaging (SPRi) multiplex analysis, enabling a systematic survey of each nucleotide position within the centromere. We found that in each parS some positions could be changed while maintaining binding to ParB. Each centromere displays its own pattern of changes, but some positions are shared more or less widely. In addition, from these changes we could speculate on evolutionary links between these centromeres. PMID:28562673

  12. Analysis of ParB-centromere interactions by multiplex SPR imaging reveals specific patterns for binding ParB in six centromeres of Burkholderiales chromosomes and plasmids.

    PubMed

    Pillet, Flavien; Passot, Fanny Marie; Pasta, Franck; Anton Leberre, Véronique; Bouet, Jean-Yves

    2017-01-01

    Bacterial centromeres, also called parS, are cis-acting DNA sequences which, together with the proteins ParA and ParB, are involved in the segregation of chromosomes and plasmids. The specific binding of ParB to parS nucleates the assembly of a large ParB/DNA complex from which ParA, the motor protein, segregates the sister replicons. Closely related families of partition systems, called Bsr, were identified on the chromosomes and large plasmids of the multi-chromosomal bacterium Burkholderia cenocepacia and other species from the order Burkholderiales. The centromeres of the Bsr partition families are 16 bp palindromes, displaying similar base compositions, notably a central CG dinucleotide. Although centromeres bind the cognate ParB with a narrow specificity, weak non-cognate ParB-parS interactions were nevertheless detected between a few Bsr partition systems of replicons not belonging to the same genome. These observations suggested that Bsr partition systems could have a common ancestry but that evolution mostly erased the possibilities of cross-reactions between them, in particular to prevent replicon incompatibility. To detect novel similarities between Bsr partition systems, we have analyzed the binding of six Bsr parS sequences, and a wide collection of modified derivatives, to their cognate ParB. The study was carried out by Surface Plasmon Resonance imaging (SPRi) multiplex analysis, enabling a systematic survey of each nucleotide position within the centromere. We found that in each parS some positions could be changed while maintaining binding to ParB. Each centromere displays its own pattern of changes, but some positions are shared more or less widely. In addition, from these changes we could speculate on evolutionary links between these centromeres.

  13. Universal Frequency Domain Baseband Receiver Structure for Future Military Software Defined Radios

    DTIC Science & Technology

    2010-09-01

    selective channels, i.e., it may have a poor performance at good conditions [4]. Military systems may require a direct sequence (DS) component for...frequency bins using a spreading code. This is called the MC-CDMA signal. Note that spreading does not need to cover all the subcarriers but just a few, like...preambles with appropriate frequency domain properties. A DS component can be added as usually. The FDP block then includes this code as a reference

  14. Vecuum: identification and filtration of false somatic variants caused by recombinant vector contamination.

    PubMed

    Kim, Junho; Maeng, Ju Heon; Lim, Jae Seok; Son, Hyeonju; Lee, Junehawk; Lee, Jeong Ho; Kim, Sangwoo

    2016-10-15

    Advances in sequencing technologies have remarkably lowered the detection limit of somatic variants to a low frequency. However, calling mutations at this range is still confounded by many factors including environmental contamination. Vector contamination is a continuously occurring issue and is especially problematic since vector inserts are hardly distinguishable from the sample sequences. Such inserts, which may harbor polymorphisms and engineered functional mutations, can result in calling false variants at corresponding sites. Numerous vector-screening methods have been developed, but none could handle contamination from inserts because they focus on vector backbone sequences alone. We developed a novel method, Vecuum, that identifies vector-originated reads and resultant false variants. Since vector inserts are generally constructed from intron-less cDNAs, Vecuum identifies vector-originated reads by inspecting the clipping patterns at exon junctions. False variant calls are further detected based on the biased distribution of mutant alleles to vector-originated reads. Tests on simulated and spike-in experimental data validated that Vecuum could detect 93% of vector contaminants and could remove up to 87% of variant-like false calls with 100% precision. Application to public sequence datasets demonstrated the utility of Vecuum in detecting false variants resulting from various types of external contamination. A Java-based implementation of the method is available at http://vecuum.sourceforge.net/. Contact: swkim@yuhs.ac. Supplementary information: Supplementary data are available at Bioinformatics online.

  15. GibbsCluster: unsupervised clustering and alignment of peptide sequences.

    PubMed

    Andreatta, Massimo; Alvarez, Bruno; Nielsen, Morten

    2017-07-03

    Receptor interactions with short linear peptide fragments (ligands) are at the base of many biological signaling processes. Conserved and information-rich amino acid patterns, commonly called sequence motifs, shape and regulate these interactions. Because of the properties of a receptor-ligand system or of the assay used to interrogate it, experimental data often contain multiple sequence motifs. GibbsCluster is a powerful tool for unsupervised motif discovery because it can simultaneously cluster and align peptide data. The GibbsCluster 2.0 presented here is an improved version incorporating insertions and deletions, accounting for variations in motif length in the peptide input. In basic terms, the program takes as input a set of peptide sequences and clusters them into meaningful groups. It returns the optimal number of clusters it identified, together with the sequence alignment and sequence motif characterizing each cluster. Several parameters are available to customize cluster analysis, including adjustable penalties for small clusters and overlapping groups and a trash cluster to remove outliers. As an example application, we used the server to deconvolute multiple specificities in large-scale peptidome data generated by mass spectrometry. The server is available at http://www.cbs.dtu.dk/services/GibbsCluster-2.0.

  16. G4RNA: an RNA G-quadruplex database

    PubMed Central

    Garant, Jean-Michel; Luce, Mikael J.; Scott, Michelle S.

    2015-01-01

    G-quadruplexes (G4) are tetrahelical structures formed from planar arrangement of guanines in nucleic acids. A simple, regular motif was originally proposed to describe G4-forming sequences. More recently, however, formation of G4 was discovered to depend, at least in part, on the contextual backdrop of neighboring sequences. Prediction of G4 folding is thus becoming more challenging as G4 outlier structures, not described by the originally proposed motif, are increasingly reported. Recent observations thus call for a comprehensive tool, capable of consolidating the expanding information on tested G4s, in order to conduct systematic comparative analyses of G4-promoting sequences. The G4RNA Database we propose was designed to help meet the need for easily-retrievable data on known RNA G4s. A user-friendly, flexible query system allows for data retrieval on experimentally tested sequences, from many separate genes, to assess G4-folding potential. Query output sorts data according to sequence position, G4 likelihood, experimental outcomes and associated bibliographical references. G4RNA also provides an ideal foundation to collect and store additional sequence and experimental data, considering the growing interest G4s currently generate. Database URL: scottgroup.med.usherbrooke.ca/G4RNA PMID:26200754
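
    The "simple, regular motif" mentioned in this record is commonly written as four runs of at least three guanines separated by loops of one to seven nucleotides. The sketch below assumes that form; the record itself does not spell the motif out, and the function name `find_canonical_g4` is hypothetical. As the abstract notes, such a pattern misses outlier G4 structures.

```python
import re

# Assumed canonical motif: G{3+} N{1-7} G{3+} N{1-7} G{3+} N{1-7} G{3+}.
# Loops are matched lazily so that each G-run starts as early as possible.
G4_MOTIF = re.compile(r"G{3,}(?:[ACGU]{1,7}?G{3,}){3}")


def find_canonical_g4(sequence):
    """Return (start, matched_text) for each non-overlapping motif hit.

    DNA input is accepted by converting T to U first.
    """
    rna = sequence.upper().replace("T", "U")
    return [(m.start(), m.group()) for m in G4_MOTIF.finditer(rna)]


hits = find_canonical_g4("AAGGGAGGGAGGGAGGGAA")
no_hits = find_canonical_g4("AAGGAGGAGGAGGAA")  # G-runs of only two guanines
```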

  17. A massive parallel sequencing workflow for diagnostic genetic testing of mismatch repair genes

    PubMed Central

    Hansen, Maren F; Neckmann, Ulrike; Lavik, Liss A S; Vold, Trine; Gilde, Bodil; Toft, Ragnhild K; Sjursen, Wenche

    2014-01-01

    The purpose of this study was to develop a massive parallel sequencing (MPS) workflow for diagnostic analysis of mismatch repair (MMR) genes using the GS Junior system (Roche). A pathogenic variant in one of four MMR genes (MLH1, PMS2, MSH6, and MSH2) is the cause of Lynch Syndrome (LS), which mainly predisposes to colorectal cancer. We used an amplicon-based sequencing method allowing specific and preferential amplification of the MMR genes including PMS2, of which several pseudogenes exist. The amplicons were pooled at different ratios to obtain coverage uniformity and maximize the throughput of a single GS Junior run. In total, 60 previously identified and distinct variants (substitutions and indels) were sequenced by MPS and successfully detected. The heterozygote detection range was from 19% to 63% and dependent on sequence context and coverage. We were able to distinguish between false-positive and true-positive calls in homopolymeric regions by cross-sample comparison and evaluation of flow signal distributions. In addition, we filtered variants according to a predefined status, which facilitated variant annotation. Our study shows that implementation of MPS in routine diagnostics of LS can accelerate sample throughput and reduce costs without compromising sensitivity, compared to Sanger sequencing. PMID:24689082

  18. ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection

    PubMed Central

    Alachiotis, Nikolaos; Vogiatzi, Emmanouella; Pavlidis, Pavlos; Stamatakis, Alexandros

    2013-01-01

    Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To provide users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors. PMID:24688709

  19. ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection.

    PubMed

    Alachiotis, Nikolaos; Vogiatzi, Emmanouella; Pavlidis, Pavlos; Stamatakis, Alexandros

    2013-01-01

    Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis becomes an error-prone, time-consuming, and tedious process. Here, we introduce ChromatoGate (CG), an open-source software that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To provide users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors.

  20. Automated synthesis and composition of taskblocks for control of manufacturing systems.

    PubMed

    Holloway, L E; Guan, X; Sundaravadivelu, R; Ashley, J R

    2000-01-01

    Automated control synthesis methods for discrete-event systems promise to reduce the time required to develop, debug, and modify control software. Such methods must be able to translate high-level control goals into detailed sequences of actuation and sensing signals. In this paper, we present such a technique. It relies on analysis of a system model, defined as a set of interacting components, each represented as a form of condition system Petri net. Control logic modules, called taskblocks, are synthesized from these individual models. These then interact hierarchically and sequentially to drive the system through specified control goals. The resulting controller is automatically converted to executable control code. The paper concludes with a discussion of a set of software tools developed to demonstrate the techniques on a small manufacturing system.

  1. DIVWAG Model Documentation. Volume II. Programmer/Analyst Manual. Part 3. Chapter 9 Through 12.

    DTIC Science & Technology

    1976-07-01

    entered through a routine, NAM2, that calls the segment controlling routine NBARAS. (4) Segment 3, controlled by the routine NFIRE, simulates round...nuclear fire, NAM calls in sequence the routines NFIRE (segment 3), ASUNIT (segment 2), SASSMT (segment 4), and NFIRE (segment 3). These calls simulate...this is a call to NFIRE (ISEG equals one or two), control goes to block L2. (2) Block 2. If this is to assess a unit passing through a nuclear barrier

  2. Assignment of protein sequences to existing domain and family classification systems: Pfam and the PDB

    PubMed Central

    Dunbrack, Roland L.

    2012-01-01

    Motivation: Automating the assignment of existing domain and protein family classifications to new sets of sequences is an important task. Current methods often miss assignments because remote relationships fail to achieve statistical significance. Some assignments are not as long as the actual domain definitions because local alignment methods often cut alignments short. Long insertions in query sequences often erroneously result in two copies of the domain assigned to the query. Divergent repeat sequences in proteins are often missed. Results: We have developed a multilevel procedure to produce nearly complete assignments of protein families of an existing classification system to a large set of sequences. We apply this to the task of assigning Pfam domains to sequences and structures in the Protein Data Bank (PDB). We found that HHsearch alignments frequently scored more remotely related Pfams in Pfam clans higher than closely related Pfams, thus, leading to erroneous assignment at the Pfam family level. A greedy algorithm allowing for partial overlaps was, thus, applied first to sequence/HMM alignments, then HMM–HMM alignments and then structure alignments, taking care to join partial alignments split by large insertions into single-domain assignments. Additional assignment of repeat Pfams with weaker E-values was allowed after stronger assignments of the repeat HMM. Our database of assignments, presented in a database called PDBfam, contains Pfams for 99.4% of chains >50 residues. Availability: The Pfam assignment data in PDBfam are available at http://dunbrack2.fccc.edu/ProtCid/PDBfam, which can be searched by PDB codes and Pfam identifiers. They will be updated regularly. Contact: Roland.Dunbracks@fccc.edu PMID:22942020
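
    A greedy assignment that tolerates partial overlaps, as mentioned in this record, might look roughly like the sketch below. It is not the PDBfam procedure: the hit tuple layout, the function name `greedy_assign`, the 10-residue overlap allowance, and the example hits are assumptions for illustration only.

```python
def greedy_assign(hits, max_overlap=10):
    """Pick domain hits greedily by E-value (hypothetical sketch).

    `hits` is a list of (name, start, end, evalue) tuples with inclusive
    residue coordinates. Hits are visited from the strongest (lowest)
    E-value; a hit is kept only if it overlaps every already accepted
    hit by at most `max_overlap` residues.
    """
    accepted = []
    for hit in sorted(hits, key=lambda h: h[3]):
        _, start, end, _ = hit
        fits = all(
            min(end, a_end) - max(start, a_start) + 1 <= max_overlap
            for _, a_start, a_end, _ in accepted
        )
        if fits:
            accepted.append(hit)
    # Report the assignment in sequence order.
    return sorted(accepted, key=lambda h: h[1])


example_hits = [
    ("FamA", 1, 100, 1e-50),
    ("FamB", 95, 200, 1e-30),   # overlaps FamA by 6 residues: kept
    ("FamC", 50, 150, 1e-10),   # overlaps FamA by 51 residues: rejected
]
assignment = greedy_assign(example_hits)
```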

  3. An efficient, versatile and scalable pattern growth approach to mine frequent patterns in unaligned protein sequences.

    PubMed

    Ye, Kai; Kosters, Walter A; Ijzerman, Adriaan P

    2007-03-15

    Pattern discovery in protein sequences is often based on multiple sequence alignments (MSA). The procedure can be computationally intensive and often requires manual adjustment, which may be particularly difficult for a set of deviating sequences. In contrast, two algorithms, PRATT2 (http://www.ebi.ac.uk/pratt/) and TEIRESIAS (http://cbcsrv.watson.ibm.com/) are used to directly identify frequent patterns from unaligned biological sequences without an attempt to align them. Here we propose a new algorithm with more efficiency and more functionality than both PRATT2 and TEIRESIAS, and discuss some of its applications to G protein-coupled receptors, a protein family of important drug targets. In this study, we designed and implemented six algorithms to mine three different pattern types from either one or two datasets using a pattern growth approach. We compared our approach to PRATT2 and TEIRESIAS in efficiency, completeness and the diversity of pattern types. Compared to PRATT2, our approach is faster, capable of processing large datasets and able to identify the so-called type III patterns. Our approach is comparable to TEIRESIAS in the discovery of the so-called type I patterns but has additional functionality such as mining the so-called type II and type III patterns and finding discriminating patterns between two datasets. The source code for pattern growth algorithms and their pseudo-code are available at http://www.liacs.nl/home/kosters/pg/.

  4. CNNdel: Calling Structural Variations on Low Coverage Data Based on Convolutional Neural Networks

    PubMed Central

    2017-01-01

    Many structural variation (SV) detection methods have been proposed due to the popularization of next-generation sequencing (NGS). These SV calling methods use different SV-property-dependent features; however, they all suffer from poor accuracy when running on low coverage sequences. The union of results from these tools achieves fairly high sensitivity but still produces low accuracy on low coverage sequence data. That is, the results of these methods contain many false positives. In this paper, we present CNNdel, an approach for calling deletions from paired-end reads. CNNdel gathers SV candidates reported by multiple tools and then extracts features from aligned BAM files at the positions of candidates. With labeled feature-expressed candidates as a training set, CNNdel trains convolutional neural networks (CNNs) to distinguish true unlabeled candidates from false ones. Results show that CNNdel works well with NGS reads from 26 low coverage genomes of the 1000 Genomes Project. The paper demonstrates that convolutional neural networks can automatically assign the priority of SV features and reduce the false positives efficaciously. PMID:28630866

  5. A WorkFlow Engine Oriented Modeling System for Hydrologic Sciences

    NASA Astrophysics Data System (ADS)

    Lu, B.; Piasecki, M.

    2009-12-01

    In recent years the use of workflow engines for carrying out modeling and data analysis tasks has gained increased attention in the science and engineering communities. Tasks like processing raw data coming from sensors and passing these raw data streams to filters for QA/QC procedures may require multiple complicated steps that need to be repeated over and over again. A workflow sequence that carries out a number of steps of varying complexity is an ideal approach to deal with these tasks because the sequence can be stored, called up and repeated again and again. This has several advantages: for one, it ensures repeatability of processing steps and with that provenance, an issue that is increasingly important in the science and engineering communities. It also permits the hand-off of lengthy, time-consuming and error-prone tasks to a chain of processing actions that are carried out automatically, thus reducing the chance of error and freeing up time for other tasks. This paper presents the development of a workflow-engine-embedded modeling system that allows users to build up working sequences for carrying out numerical modeling tasks in hydrologic science. Trident, which facilitates creating, running and sharing scientific data analysis workflows, is taken as the central working engine of the modeling system. Existing functionalities of the modeling system include digital watershed processing, online data retrieval, hydrologic simulation and post-event analysis. They are stored as sequences or modules, respectively. The sequences can be invoked to carry out their preset tasks in order, for example, triangulating a watershed from a raw DEM. The modules, which encapsulate certain functions, can be selected and connected through a GUI workboard to form sequences. This modeling system is demonstrated by setting up a new sequence for simulating rainfall-runoff processes, which involves an embedded Penn State Integrated Hydrologic Model (PIHM) module for hydrologic simulation as a kernel, a DEM processing sub-sequence that prepares geospatial data for PIHM, a data retrieval module that accesses time series data from an online data repository via web services or from a local database, and a post-data management module that stores, visualizes and analyzes model outputs.

  6. Sequence and batch language programs and alarm related C Programs for the 242-A MCS

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berger, J.F.

    1996-04-15

    A Distributive Process Control system was purchased by Project B-534, 242-A Evaporator/Crystallizer Upgrades. This control system, called the Monitor and Control system (MCS), was installed in the 242-A evaporator located in the 200 East Area. The purpose of the MCS is to monitor and control the Evaporator and monitor a number of alarms and other signals from various Tank Farm facilities. Applications software for the MCS was developed by the Waste Treatment Systems Engineering (WTSE) group of Westinghouse. The standard displays and alarm scheme provide for control and monitoring, but do not directly indicate the signal location or depict the overall process. To do this, WTSE developed a second alarm scheme.

  7. Improved detection of CXCR4-using HIV by V3 genotyping: application of population-based and "deep" sequencing to plasma RNA and proviral DNA.

    PubMed

    Swenson, Luke C; Moores, Andrew; Low, Andrew J; Thielen, Alexander; Dong, Winnie; Woods, Conan; Jensen, Mark A; Wynhoven, Brian; Chan, Dennison; Glascock, Christopher; Harrigan, P Richard

    2010-08-01

    Tropism testing should rule out CXCR4-using HIV before treatment with CCR5 antagonists. Currently, the recombinant phenotypic Trofile assay (Monogram) is most widely utilized; however, genotypic tests may represent alternative methods. Independent triplicate amplifications of the HIV gp120 V3 region were made from either plasma HIV RNA or proviral DNA. These underwent standard, population-based sequencing with an ABI3730 (RNA n = 63; DNA n = 40), or "deep" sequencing with a Roche/454 Genome Sequencer-FLX (RNA n = 12; DNA n = 12). Position-specific scoring matrices (PSSMX4/R5) (-6.96 cutoff) and geno2pheno[coreceptor] (5% false-positive rate) inferred tropism from V3 sequence. These methods were then independently validated with a separate, blinded dataset (n = 278) of screening samples from the maraviroc MOTIVATE trials. Standard sequencing of HIV RNA with PSSM yielded 69% sensitivity and 91% specificity, relative to Trofile. The validation dataset gave 75% sensitivity and 83% specificity. Proviral DNA plus PSSM gave 77% sensitivity and 71% specificity. "Deep" sequencing of HIV RNA detected >2% inferred-CXCR4-using virus in 8/8 samples called non-R5 by Trofile, and <2% in 4/4 samples called R5. Triplicate analyses of V3 standard sequence data detect greater proportions of CXCR4-using samples than previously achieved. Sequencing proviral DNA and "deep" V3 sequencing may also be useful tools for assessing tropism.

  8. Evolution of the Students' Conceptual Understanding in the Case of a Teaching Sequence in Mechanics: Concept of Interaction

    ERIC Educational Resources Information Center

    Küçüközer, Asuman

    2006-01-01

    This study aims to better understand the construction of the meaning of physics concepts in mechanics during a teaching sequence at the upper secondary school level. In the teaching sessions, students were introduced to the concepts of interaction and force. During this teaching sequence the models called "interactions" and "laws of…

  9. Formulaic Sequences Used by Native English Speaking Teachers in a Thai Primary School

    ERIC Educational Resources Information Center

    Steyn, Sunee; Jaroongkhongdach, Woravut

    2016-01-01

    The use of formulaic sequences in English as a Foreign Language (EFL) lessons plays an integral role in language teaching and learning, but it seems still widely neglected in the Thai school context. To call attention to this issue, this study aims at identifying formulaic sequences used in a Thai primary school. The data were taken from three…

  10. Sequencing Conservation Actions Through Threat Assessments in the Southeastern United States

    Treesearch

    Robert D. Sutter; Christopher C. Szell

    2006-01-01

    The identification of conservation priorities is one of the leading issues in conservation biology. We present a project of The Nature Conservancy, called Sequencing Conservation Actions, which prioritizes conservation areas and identifies foci for crosscutting strategies at various geographic scales. We use the term “Sequencing” to mean an ordering of actions over...

  11. Ribosomal DNA replication fork barrier and HOT1 recombination hot spot: shared sequences but independent activities.

    PubMed

    Ward, T R; Hoang, M L; Prusty, R; Lau, C K; Keil, R L; Fangman, W L; Brewer, B J

    2000-07-01

    In the ribosomal DNA of Saccharomyces cerevisiae, sequences in the nontranscribed spacer 3' of the 35S ribosomal RNA gene are important to the polar arrest of replication forks at a site called the replication fork barrier (RFB) and also to the cis-acting, mitotic hyperrecombination site called HOT1. We have found that the RFB and HOT1 activity share some but not all of their essential sequences. Many of the mutations that reduce HOT1 recombination also decrease or eliminate fork arrest at one of two closely spaced RFB sites, RFB1 and RFB2. A simple model for the juxtaposition of RFB and HOT1 sequences is that the breakage of strands in replication forks arrested at RFB stimulates recombination. Contrary to this model, we show here that HOT1-stimulated recombination does not require the arrest of forks at the RFB. Therefore, while HOT1 activity is independent of replication fork arrest, HOT1 and RFB require some common sequences, suggesting the existence of a common trans-acting factor(s).

  12. Are special read alignment strategies necessary and cost-effective when handling sequencing reads from patient-derived tumor xenografts?

    PubMed

    Tso, Kai-Yuen; Lee, Sau Dan; Lo, Kwok-Wai; Yip, Kevin Y

    2014-12-23

    Patient-derived tumor xenografts in mice are widely used in cancer research and have become important in developing personalized therapies. When these xenografts are subject to DNA sequencing, the samples could contain various amounts of mouse DNA. It has been unclear how the mouse reads would affect data analyses. We conducted comprehensive simulations to compare three alignment strategies at different mutation rates, read lengths, sequencing error rates, human-mouse mixing ratios and sequenced regions. We also sequenced a nasopharyngeal carcinoma xenograft and a cell line to test how the strategies work on real data. We found the "filtering" and "combined reference" strategies performed better than aligning reads directly to human reference in terms of alignment and variant calling accuracies. The combined reference strategy was particularly good at reducing false negative variants calls without significantly increasing the false positive rate. In some scenarios the performance gain of these two special handling strategies was too small for special handling to be cost-effective, but it was found crucial when false non-synonymous SNVs should be minimized, especially in exome sequencing. Our study systematically analyzes the effects of mouse contamination in the sequencing data of human-in-mouse xenografts. Our findings provide information for designing data analysis pipelines for these data.

  13. "First generation" automated DNA sequencing technology.

    PubMed

    Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

    2011-10-01

    Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.

  14. A family of Nikishin systems with periodic recurrence coefficients

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Delvaux, Steven; Lopez, Abey; Lopez, Guillermo L

    2013-01-31

    Suppose we have a Nikishin system of $p$ measures with the $k$th generating measure of the Nikishin system supported on an interval $\Delta_k \subset \mathbb{R}$ with $\Delta_k \cap \Delta_{k+1} = \emptyset$ for all $k$. It is well known that the corresponding staircase sequence of multiple orthogonal polynomials satisfies a $(p+2)$-term recurrence relation whose recurrence coefficients, under appropriate assumptions on the generating measures, have periodic limits of period $p$. (The limit values depend only on the positions of the intervals $\Delta_k$.) Taking these periodic limit values as the coefficients of a new $(p+2)$-term recurrence relation, we construct a canonical sequence of monic polynomials $\{P_n\}_{n=0}^{\infty}$, the so-called Chebyshev-Nikishin polynomials. We show that the polynomials $P_n$ themselves form a sequence of multiple orthogonal polynomials with respect to some Nikishin system of measures, with the $k$th generating measure being absolutely continuous on $\Delta_k$. In this way we generalize a result of the third author and Rocha [22] for the case $p=2$. The proof uses the connection with block Toeplitz matrices, and with a certain Riemann surface of genus zero. We also obtain strong asymptotics and an exact Widom-type formula for functions of the second kind of the Nikishin system for $\{P_n\}_{n=0}^{\infty}$. Bibliography: 27 titles.

  15. PVDaCS - A prototype knowledge-based expert system for certification of spacecraft data

    NASA Technical Reports Server (NTRS)

    Wharton, Cathleen; Shiroma, Patricia J.; Simmons, Karen E.

    1989-01-01

    On-line data management techniques to certify spacecraft information are mandated by increasing telemetry rates. Knowledge-based expert systems offer the ability to certify data electronically without the need for time-consuming human interaction. Issues of automatic certification are explored by designing a knowledge-based expert system to certify data from a scientific instrument, the Orbiter Ultraviolet Spectrometer, on an operating NASA planetary spacecraft, Pioneer Venus. The resulting rule-based system, called PVDaCS (Pioneer Venus Data Certification System), is a functional prototype demonstrating the concepts of a larger system design. A key element of the system design is the representation of an expert's knowledge through the usage of well ordered sequences. PVDaCS produces a certification value derived from expert knowledge and an analysis of the instrument's operation. Results of system performance are presented.

  16. Optimization of European call options considering physical delivery network and reservoir operation rules

    NASA Astrophysics Data System (ADS)

    Cheng, Wei-Chen; Hsu, Nien-Sheng; Cheng, Wen-Ming; Yeh, William W.-G.

    2011-10-01

    This paper develops alternative strategies for European call options for water purchase under hydrological uncertainties that can be used by water resources managers for decision making. Each alternative strategy maximizes its own objective over a selected sequence of future hydrology that is characterized by exceedance probability. Water trade provides flexibility and enhances water distribution system reliability. However, water trade between two parties in a regional water distribution system involves many issues, such as delivery network, reservoir operation rules, storage space, demand, water availability, uncertainty, and any existing contracts. An option is a security giving the right to buy or sell an asset; in our case, the asset is water. We extend a flow path-based water distribution model to include reservoir operation rules. The model simultaneously considers both the physical distribution network and the relationships between water sellers and buyers. We first test the model extension. Then we apply the proposed optimization model for European call options to the Tainan water distribution system in southern Taiwan. The formulation lends itself to a mixed integer linear programming model. We use the weighting method to formulate a composite function for a multiobjective problem. The proposed methodology provides water resources managers with an overall picture of water trade strategies and the consequence of each strategy. The results from the case study indicate that the strategy associated with a streamflow exceedance probability of 50% or smaller should be adopted as the reference strategy for the Tainan water distribution system.

  17. Effective normalization for copy number variation detection from whole genome sequencing.

    PubMed

    Janevski, Angel; Varadan, Vinay; Kamalakaran, Sitharthan; Banerjee, Nilanjana; Dimitrova, Nevenka

    2012-01-01

    Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools, while validated, also include a number of parameters that are configurable to genome data being analyzed. These algorithms allow for normalization to account for individual and population-specific effects on individual genome CNV estimates but the impact of these changes on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies in two CNV algorithms FREEC and CNV-seq using whole genome sequencing data from 8 individuals spanning four populations. We apply FREEC and CNV-seq to a sequencing data set consisting of 8 genomes. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC, and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are: GC content, mappability and control genome. We further stratify the concordance analysis within genic, non-genic, and a collection of validated variant regions. The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Mappability normalization yields Jaccard indices in the 0.07 - 0.3 range, whereas using a control genome normalization yields Jaccard index values around 0.4 with normalization based on GC content. The most critical impact of using mappability as a normalization factor is substantial reduction of deletion CNV calls. 
The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles, and substantial agreement in variable gene and CNV region calls. Choice of read-count normalization methodology has a substantial effect on CNV calls and the use of genomic mappability or an appropriately chosen control genome can optimize the output of CNV analysis.
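
    The Jaccard index used above to quantify concordance between CNV call sets can be illustrated with a minimal sketch. The abstract does not say whether the index was computed over bases, regions or genes, so the base-pair-level definition below (shared bases divided by bases covered by either call set) is an assumption, and the function names and the example coordinates are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping half-open (start, end) intervals on one chromosome."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def total_length(intervals):
    """Number of bases covered by a list of non-overlapping intervals."""
    return sum(end - start for start, end in intervals)


def intersection_length(a, b):
    """Bases covered by both merged interval lists (two-pointer sweep)."""
    i = j = covered = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            covered += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return covered


def jaccard_index(calls_a, calls_b):
    """Assumed definition: shared bases / bases covered by either call set."""
    a = merge_intervals(calls_a)
    b = merge_intervals(calls_b)
    shared = intersection_length(a, b)
    union = total_length(a) + total_length(b) - shared
    return shared / union if union else 0.0


# Hypothetical CNV calls from two configurations on the same chromosome.
config_1 = [(100, 200), (300, 400)]
config_2 = [(150, 250), (300, 350)]
print(jaccard_index(config_1, config_2))
```

    In this toy case the two call sets share 100 bases out of 250 covered by either set. A region-level or gene-level variant of the index would count overlapping regions instead of bases and can give quite different values.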

  18. Analysis of the type II-A CRISPR-Cas system of Streptococcus agalactiae reveals distinctive features according to genetic lineages

    PubMed Central

    Lier, Clément; Baticle, Elodie; Horvath, Philippe; Haguenoer, Eve; Valentin, Anne-Sophie; Glaser, Philippe; Mereghetti, Laurent; Lanotte, Philippe

    2015-01-01

    CRISPR-Cas systems (clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins) are found in 90% of archaea and about 40% of bacteria. In this original system, CRISPR arrays comprise short, almost unique sequences called spacers that are interspersed with conserved palindromic repeats. These systems play a role in adaptive immunity and help fight non-self DNA such as integrative and conjugative elements, plasmids, and phages. In Streptococcus agalactiae, a bacterium implicated in colonization and infections in humans since the 1960s, two CRISPR-Cas systems have been described. A type II-A system, characterized by proteins Cas9, Cas1, Cas2, and Csn2, is ubiquitous, and a type I–C system, with the Cas8c signature protein, is present in about 20% of the isolates. Unlike type I–C, which appears to be non-functional, type II-A appears fully functional. Here we studied type II-A CRISPR-cas loci from 126 human isolates of S. agalactiae belonging to different clonal complexes that represent the diversity of the species and that have been implicated in colonization or infection. The CRISPR-cas locus was analyzed both at spacer and repeat levels. Major distinctive features were identified according to the phylogenetic lineages previously defined by multilocus sequence typing, especially for the sequence type (ST) 17, which is considered hypervirulent. Among other idiosyncrasies, ST-17 shows a significantly lower number of spacers in comparison with other lineages. This characteristic could reflect the peculiar virulence or colonization specificities of this lineage. PMID:26124774

  19. GAMSOR: Gamma Source Preparation and DIF3D Flux Solution

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Smith, M. A.; Lee, C. H.; Hill, R. N.

    2017-06-28

    Nuclear reactors that rely upon the fission reaction have two modes of thermal energy deposition in the reactor system: neutron absorption and gamma absorption. The gamma rays are typically generated by neutron capture reactions or during the fission process which means the primary driver of energy production is of course the neutron interaction. In conventional reactor physics methods, the gamma heating component is ignored such that the gamma absorption is forced to occur at the gamma emission site. For experimental reactor systems like EBR-II and FFTF, the placement of structural pins and assemblies internal to the core leads to problems with power heating predictions because there is no fission power source internal to the assembly to dictate a spatial distribution of the power. As part of the EBR-II support work in the 1980s, the GAMSOR code was developed to assist analysts in calculating the gamma heating. The GAMSOR code is a modified version of DIF3D and actually functions within a sequence of DIF3D calculations. The gamma flux in a conventional fission reactor system does not perturb the neutron flux and thus the gamma flux calculation can be cast as a fixed source problem given a solution to the steady state neutron flux equation. This leads to a sequence of DIF3D calculations, called the GAMSOR sequence, which involves solving the neutron flux, then the gamma flux, and then combining the results to do a summary edit. In this manuscript, we go over the GAMSOR code and detail how it is put together and functions. We also discuss how to set up the GAMSOR sequence and input for each DIF3D calculation in the GAMSOR sequence.

  20. Sptrace

    NASA Technical Reports Server (NTRS)

    Burleigh, Scott C.

    2011-01-01

    Sptrace is a general-purpose space utilization tracing system that is conceptually similar to the commercial Purify product used to detect leaks and other memory usage errors. It is designed to monitor space utilization in any sort of heap, i.e., a region of data storage on some device (nominally memory; possibly shared and possibly persistent) with a flat address space. This software can trace usage of shared and/or non-volatile storage in addition to private RAM (random access memory). Sptrace is implemented as a set of C function calls that are invoked from within the software that is being examined. The function calls fall into two broad classes: (1) functions that are embedded within the heap management software [e.g., JPL's SDR (Simple Data Recorder) and PSM (Personal Space Management) systems] to enable heap usage analysis by populating a virtual time-sequenced log of usage activity, and (2) reporting functions that are embedded within the application program whose behavior is suspect. For ease of use, these functions may be wrapped privately inside public functions offered by the heap management software. Sptrace can be used for VxWorks or RTEMS realtime systems as easily as for Linux or OS/X systems.

  1. A Security Monitoring Framework For Virtualization Based HEP Infrastructures

    NASA Astrophysics Data System (ADS)

    Gomez Ramirez, A.; Martinez Pedreira, M.; Grigoras, C.; Betev, L.; Lara, C.; Kebschull, U.; ALICE Collaboration

    2017-10-01

    High Energy Physics (HEP) distributed computing infrastructures require automatic tools to monitor, analyze and react to potential security incidents. These tools should collect and inspect data such as resource consumption, logs and sequence of system calls for detecting anomalies that indicate the presence of a malicious agent. They should also be able to perform automated reactions to attacks without administrator intervention. We describe a novel framework that accomplishes these requirements, with a proof of concept implementation for the ALICE experiment at CERN. We show how we achieve a fully virtualized environment that improves the security by isolating services and Jobs without a significant performance impact. We also describe a collected dataset for Machine Learning based Intrusion Prevention and Detection Systems on Grid computing. This dataset is composed of resource consumption measurements (such as CPU, RAM and network traffic), logfiles from operating system services, and system call data collected from production Jobs running in an ALICE Grid test site and a big set of malware samples. This malware set was collected from security research sites. Based on this dataset, we will proceed to develop Machine Learning algorithms able to detect malicious Jobs.

  2. Estimating genotype error rates from high-coverage next-generation sequence data.

    PubMed

    Wall, Jeffrey D; Tang, Ling Fung; Zerbe, Brandon; Kvale, Mark N; Kwok, Pui-Yan; Schaefer, Catherine; Risch, Neil

    2014-11-01

    Exome and whole-genome sequencing studies are becoming increasingly common, but little is known about the accuracy of the genotype calls made by the commonly used platforms. Here we use replicate high-coverage sequencing of blood and saliva DNA samples from four European-American individuals to estimate lower bounds on the error rates of Complete Genomics and Illumina HiSeq whole-genome and whole-exome sequencing. Error rates for nonreference genotype calls range from 0.1% to 0.6%, depending on the platform and the depth of coverage. Additionally, we found (1) no difference in the error profiles or rates between blood and saliva samples; (2) Complete Genomics sequences had substantially higher error rates than Illumina sequences had; (3) error rates were higher (up to 6%) for rare or unique variants; (4) error rates generally declined with genotype quality (GQ) score, but in a nonlinear fashion for the Illumina data, likely due to loss of specificity of GQ scores greater than 60; and (5) error rates increased with increasing depth of coverage for the Illumina data. These findings, especially (3)-(5), suggest that caution should be taken in interpreting the results of next-generation sequencing-based association studies, and even more so in clinical application of this technology in the absence of validation by other more robust sequencing or genotyping methods. © 2014 Wall et al.; Published by Cold Spring Harbor Laboratory Press.

  3. Detection of microRNAs in color space.

    PubMed

    Marco, Antonio; Griffiths-Jones, Sam

    2012-02-01

    Deep sequencing provides inexpensive opportunities to characterize the transcriptional diversity of known genomes. The AB SOLiD technology generates millions of short sequencing reads in color-space; that is, the raw data is a sequence of colors, where each color represents 2 nt and each nucleotide is represented by two consecutive colors. This strategy is purported to have several advantages, including increased ability to distinguish sequencing errors from polymorphisms. Several programs have been developed to map short reads to genomes in color space. However, a number of previously unexplored technical issues arise when using SOLiD technology to characterize microRNAs. Here we explore these technical difficulties. First, since the sequenced reads are longer than the biological sequences, every read is expected to contain linker fragments. The color-calling error rate increases toward the 3' end of the read such that recognizing the linker sequence for removal becomes problematic. Second, mapping in color space may lead to the loss of the first nucleotide of each read. We propose a sequential trimming and mapping approach to map small RNAs. Using our strategy, we reanalyze three published insect small RNA deep sequencing datasets and characterize 22 new microRNAs. A bash shell script to perform the sequential trimming and mapping procedure, called SeqTrimMap, is available at: http://www.mirbase.org/tools/seqtrimmap/. Contact: antonio.marco@manchester.ac.uk. Supplementary data are available at Bioinformatics online.
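
    The color-space representation described above can be sketched in a few lines. The sketch assumes the standard SOLiD di-base code, in which each color is the XOR of the 2-bit codes of two adjacent nucleotides (A=0, C=1, G=2, T=3) and a read is written as a known first base followed by colors. The function names are hypothetical and are not taken from SeqTrimMap.

```python
# 2-bit nucleotide codes; a color is the XOR of two adjacent codes.
_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
_BASE = "ACGT"


def encode_color_space(seq):
    """Return the first base followed by one color per adjacent base pair."""
    colors = [str(_CODE[x] ^ _CODE[y]) for x, y in zip(seq, seq[1:])]
    return seq[0] + "".join(colors)


def decode_color_space(read):
    """Rebuild the nucleotide sequence from a first base and its colors."""
    bases = [read[0]]
    for color in read[1:]:
        # Each base is recovered from the previous base and the current color.
        bases.append(_BASE[_CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


reference = "ATGCA"
print(encode_color_space(reference))      # color-space form of the reference
print(encode_color_space("ATCCA"))        # a SNP changes two adjacent colors
print(decode_color_space("A3031"))        # one miscalled color corrupts all later bases
```

    A true single-nucleotide change alters two consecutive colors, whereas an isolated color-calling error alters only one, which is the basis for separating errors from polymorphisms. Because decoding chains from the first base onward, a single wrong color changes every downstream nucleotide, so reads are usually mapped in color space rather than decoded first.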

  4. Pan-cancer analysis reveals technical artifacts in TCGA germline variant calls.

    PubMed

    Buckley, Alexandra R; Standish, Kristopher A; Bhutani, Kunal; Ideker, Trey; Lasken, Roger S; Carter, Hannah; Harismendy, Olivier; Schork, Nicholas J

    2017-06-12

    Cancer research to date has largely focused on somatically acquired genetic aberrations. In contrast, the degree to which germline, or inherited, variation contributes to tumorigenesis remains unclear, possibly due to a lack of accessible germline variant data. Here we called germline variants on 9618 cases from The Cancer Genome Atlas (TCGA) database representing 31 cancer types. We identified batch effects affecting loss of function (LOF) variant calls that can be traced back to differences in the way the sequence data were generated both within and across cancer types. Overall, LOF indel calls were more sensitive to technical artifacts than LOF Single Nucleotide Variant (SNV) calls. In particular, whole genome amplification of DNA prior to sequencing led to an artificially increased burden of LOF indel calls, which confounded association analyses relating germline variants to tumor type despite stringent indel filtering strategies. The samples affected by these technical artifacts include all acute myeloid leukemia and practically all ovarian cancer samples. We demonstrate how technical artifacts induced by whole genome amplification of DNA can lead to false positive germline-tumor type associations and suggest TCGA whole genome amplified samples be used with caution. This study draws attention to the need to be sensitive to problems associated with a lack of uniformity in data generation in TCGA data.

  5. Simulation system of arrhythmia using ActiveX control.

    PubMed

    Takeuchi, Akihiro; Hirose, Minoru; Hamada, Atsushi; Ikeda, Noriaki

    2005-07-01

    A simulation system for arrhythmias has been developed using Windows-based software technology, ActiveX control. The cardiac module consists of six cells, the sinus, atrium, AV node, ventricle, and ectopic foci. The physiological properties of the cells, the automaticity and conduction delay, were modelled, respectively, by the phase response curve and the excitability recovery curve. Cell functions were implemented in the ActiveX control and incorporated into the cardiac module. The system draws the ECG sequence as a ladder diagram in real time. The system interactively shows diverse arrhythmias for various user settings of the cell function and bidirectional conduction between the cells. Users are able to experiment virtually by setting up a so-called electrophysiological stimulation. This system is useful for learning and for teaching the interaction between the cells and arrhythmias.

  6. A remark on copy number variation detection methods.

    PubMed

    Li, Shuo; Dou, Xialiang; Gao, Ruiqi; Ge, Xinzhou; Qian, Minping; Wan, Lin

    2018-01-01

    Copy number variations (CNVs) are gains and losses of DNA sequence in a genome. High-throughput platforms such as microarrays and next-generation sequencing (NGS) technologies have been applied to detect genome-wide copy number losses. Although progress has been made in both approaches, the accuracy and consistency of CNV calling from the two platforms remain in dispute. In this study, we perform a deep analysis of copy number losses in 254 human DNA samples, which have both SNP microarray data and NGS data publicly available from the HapMap Project and the 1000 Genomes Project, respectively. We show that the copy number losses reported by the HapMap Project and the 1000 Genomes Project have < 30% overlap, even though state-of-the-art calling methods were employed and both projects required their reports to have cross-platform (e.g. PCR, microarray and high-throughput sequencing) experimental support. On the other hand, almost all of the copy number losses found directly from HapMap microarray data by an accurate algorithm, CNVhac, have lower read mapping depth in NGS data; furthermore, 88% of them are supported by sequences containing breakpoints in NGS data. Our results suggest that microarrays are capable of calling CNVs and that the unnecessary requirement of additional cross-platform support may introduce false negatives. The inconsistency between the CNV reports from the HapMap Project and the 1000 Genomes Project might result from the inadequate information contained in microarray data, inconsistent detection criteria, or the filtering effect of cross-platform support. Statistical tests on the CNVs called by CNVhac show that microarray data can offer reliable CNV reports, and that the majority of CNV candidates can be confirmed by raw sequences. Therefore, the CNV candidates given by a good caller could be highly reliable without cross-platform support, so additional experimental information should be applied when needed rather than as a requirement.

  7. Not all transmembrane helices are born equal: Towards the extension of the sequence homology concept to membrane proteins

    PubMed Central

    2011-01-01

    Background Sequence homology considerations widely used to transfer functional annotation to uncharacterized protein sequences require special precautions in the case of non-globular sequence segments including membrane-spanning stretches composed of non-polar residues. Simple, quantitative criteria are desirable for identifying transmembrane helices (TMs) that must be included into or should be excluded from start sequence segments in similarity searches aimed at finding distant homologues. Results We found that there are two types of TMs in membrane-associated proteins. On the one hand, there are so-called simple TMs with elevated hydrophobicity, low sequence complexity and extraordinary enrichment in long aliphatic residues. They merely serve as membrane-anchoring device. In contrast, so-called complex TMs have lower hydrophobicity, higher sequence complexity and some functional residues. These TMs have additional roles besides membrane anchoring such as intra-membrane complex formation, ligand binding or a catalytic role. Simple and complex TMs can occur both in single- and multi-membrane-spanning proteins essentially in any type of topology. Whereas simple TMs have the potential to confuse searches for sequence homologues and to generate unrelated hits with seemingly convincing statistical significance, complex TMs contain essential evolutionary information. Conclusion For extending the homology concept onto membrane proteins, we provide a necessary quantitative criterion to distinguish simple TMs (and a sufficient criterion for complex TMs) in query sequences prior to their usage in homology searches based on assessment of hydrophobicity and sequence complexity of the TM sequence segments. Reviewers This article was reviewed by Shamil Sunyaev, L. Aravind and Arcady Mushegian. PMID:22024092

  8. Comparison of illumina and 454 deep sequencing in participants failing raltegravir-based antiretroviral therapy.

    PubMed

    Li, Jonathan Z; Chapman, Brad; Charlebois, Patrick; Hofmann, Oliver; Weiner, Brian; Porter, Alyssa J; Samuel, Reshmi; Vardhanabhuti, Saran; Zheng, Lu; Eron, Joseph; Taiwo, Babafemi; Zody, Michael C; Henn, Matthew R; Kuritzkes, Daniel R; Hide, Winston; Wilson, Cara C; Berzins, Baiba I; Acosta, Edward P; Bastow, Barbara; Kim, Peter S; Read, Sarah W; Janik, Jennifer; Meres, Debra S; Lederman, Michael M; Mong-Kryspin, Lori; Shaw, Karl E; Zimmerman, Louis G; Leavitt, Randi; De La Rosa, Guy; Jennings, Amy

    2014-01-01

    The impact of raltegravir-resistant HIV-1 minority variants (MVs) on raltegravir treatment failure is unknown. Illumina sequencing offers greater throughput than 454, but sequence analysis tools for viral sequencing are needed. We evaluated Illumina and 454 for the detection of HIV-1 raltegravir-resistant MVs. A5262 was a single-arm study of raltegravir and darunavir/ritonavir in treatment-naïve patients. Pre-treatment plasma was obtained from 5 participants with raltegravir resistance at the time of virologic failure. A control library was created by pooling integrase clones at predefined proportions. Multiplexed sequencing was performed with Illumina and 454 platforms at comparable costs. Illumina sequence analysis was performed with the novel snp-assess tool and 454 sequencing was analyzed with V-Phaser. Illumina sequencing resulted in significantly higher sequence coverage and a 0.095% limit of detection. Illumina accurately detected all MVs in the control library at ≥0.5% and 7/10 MVs expected at 0.1%. 454 sequencing failed to detect any MVs at 0.1% with 5 false positive calls. For MVs detected in the patient samples by both 454 and Illumina, the correlation in the detected variant frequencies was high (R² = 0.92, P<0.001). Illumina sequencing detected 2.4-fold more nucleotide MVs and 2.9-fold more amino acid MVs compared to 454. The only raltegravir-resistant MV detected was an E138K mutation in one participant by Illumina sequencing, but not by 454. In participants of A5262 with raltegravir resistance at virologic failure, baseline raltegravir-resistant MVs were rarely detected. At comparable costs to 454 sequencing, Illumina demonstrated greater depth of coverage, increased sensitivity for detecting HIV MVs, and fewer false positive variant calls.

  9. Hidden symmetries in N-layer dielectric stacks

    NASA Astrophysics Data System (ADS)

    Liu, Haihao; Shoufie Ukhtary, M.; Saito, Riichiro

    2017-11-01

    The optical properties of a multilayer system with arbitrary N layers of dielectric media are investigated. Each layer is one of two dielectric media, with a thickness one-quarter the wavelength of light in that medium, corresponding to a central frequency f_0. Using the transfer matrix method, the transmittance T is calculated for all possible 2^N sequences for small N. Unexpectedly, it is found that instead of 2^N different values of T at f_0 (T_0), there are only (N/2 + 1) discrete values of T_0 for even N, and (N + 1) for odd N. We explain this high degeneracy in T_0 values by finding symmetry operations on the sequences that do not change T_0. Analytical formulae were derived for the T_0 values and their degeneracies as functions of N and an integer parameter for each sequence we call ‘charge’. Additionally, the bandwidth at f_0 and filter response of the transmission spectra are investigated, revealing asymptotic behavior at large N.
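
    The counting result described above can be explored numerically with a minimal transfer-matrix sketch. It assumes normal incidence, lossless layers, and air (index 1.0) on both sides of the stack; the two refractive indices 1.5 and 2.5 and all function names are illustrative choices, not values taken from the record.

```python
from itertools import product


def layer_matrix(n):
    """Characteristic matrix of a quarter-wave layer of index n at the
    central frequency: the phase thickness is pi/2, so cos = 0, sin = 1."""
    return ((0j, 1j / n), (1j * n, 0j))


def matmul(a, b):
    """Product of two 2x2 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2))
        for i in range(2)
    )


def transmittance_f0(indices, n_in=1.0, n_out=1.0):
    """Transmittance at the central frequency for a stack given as a
    sequence of refractive indices (assumed surrounding media: air)."""
    m = ((1 + 0j, 0j), (0j, 1 + 0j))
    for n in indices:
        m = matmul(m, layer_matrix(n))
    (a, b), (c, d) = m
    t = 2 * n_in / (n_in * a + n_in * n_out * b + c + n_out * d)
    return (n_out / n_in) * abs(t) ** 2


def distinct_t0(num_layers, n1=1.5, n2=2.5):
    """Sorted distinct transmittance values over all 2**num_layers
    sequences; rounding absorbs floating-point noise."""
    values = {
        round(transmittance_f0(seq), 9)
        for seq in product((n1, n2), repeat=num_layers)
    }
    return sorted(values)


if __name__ == "__main__":
    for num_layers in range(1, 7):
        print(num_layers, len(distinct_t0(num_layers)))
```

    Under these assumptions the printed counts can be compared directly with the (N/2 + 1) and (N + 1) values quoted in the abstract; other surrounding media or absorbing layers may change the degeneracy.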

  10. Analysis of an "off-ladder" allele at the Penta D short tandem repeat locus.

    PubMed

    Yang, Y L; Wang, J G; Wang, D X; Zhang, W Y; Liu, X J; Cao, J; Yang, S L

    2015-11-25

    Kinship testing of a father and his son from Guangxi, China, the location of the Zhuang minority people, was performed using the PowerPlex® 18D System with a short tandem repeat typing kit. The results indicated that both the father and his son had an off-ladder allele at the Penta D locus, with a genetic size larger than that of the maximal standard allelic ladder. To further identify this locus, monogenic amplification, gene cloning, and genetic sequencing were performed. Sequencing analysis demonstrated that the fragment size of the Penta D-OL locus was 469 bp and the core sequence was [AAAGA]21, also called Penta D-21. The rare Penta D-21 allele was found to be distributed among the Zhuang population from the Guangxi Zhuang Autonomous Region of China; therefore, this study improved the range of DNA data available for this locus and enhanced our ability for individual identification of gene loci.
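
    The notation [AAAGA]21 denotes 21 consecutive copies of the pentanucleotide motif. As a small illustration, the sketch below counts the longest uninterrupted run of a motif in a sequence string. The function name and the example sequence are made up for the illustration; real allele calling works from fragment sizes or sequencing reads and handles interrupted repeats, which this sketch does not.

```python
import re


def longest_tandem_run(sequence, motif="AAAGA"):
    """Return the largest number of back-to-back copies of `motif`
    found anywhere in `sequence` (0 if the motif never occurs)."""
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [
        len(match.group(0)) // len(motif)
        for match in pattern.finditer(sequence.upper())
    ]
    return max(runs, default=0)


if __name__ == "__main__":
    # Synthetic example: arbitrary flanks around a 21-repeat core.
    example = "GCTAGC" + "AAAGA" * 21 + "CCTA"
    print(longest_tandem_run(example))
```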

  11. An analytical study of composite laminate lay-up using search algorithms for maximization of flexural stiffness and minimization of springback angle

    NASA Astrophysics Data System (ADS)

    Singh, Ranjan Kumar; Rinawa, Moti Lal

    2018-04-01

    The residual stresses arising in fiber-reinforced laminates during their curing in closed molds lead to dimensional changes in the composites after their removal from the molds and cooling. One of these dimensional changes, in angle sections, is called springback. Parameters such as lay-up, stacking sequence, material system, cure temperature, thickness, etc. play an important role in it. The present work attempts to optimize lay-up and stacking sequence for maximization of flexural stiffness and minimization of springback angle. Search algorithms are employed to obtain the best sequence through repair strategies such as swap. A new search algorithm, termed the lay-up search algorithm (LSA), is also proposed, which is an extension of the permutation search algorithm (PSA). The efficacy of PSA and LSA is tested on laminates with a range of lay-ups. A computer code implementing the above schemes is developed in MATLAB. Strategies for multi-objective optimization using search algorithms are also suggested and tested.
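
    To make the idea of a swap-based stacking-sequence search concrete, the sketch below runs a generic pairwise-swap hill climb that maximizes only the bending stiffness term D11 from classical lamination theory. It is not the PSA or LSA of the record, it ignores springback entirely, and the ply properties, ply thickness, and function names are hypothetical. Python is used here for illustration although the record mentions MATLAB.

```python
import math
from itertools import combinations

# Hypothetical unidirectional ply properties (Pa) and ply thickness (m).
E1, E2, G12, NU12 = 140e9, 10e9, 5e9, 0.3
PLY_T = 0.125e-3


def q11_bar(theta_deg):
    """Transformed reduced stiffness Q11-bar of a ply at angle theta."""
    nu21 = NU12 * E2 / E1
    denom = 1.0 - NU12 * nu21
    q11, q22, q12, q66 = E1 / denom, E2 / denom, NU12 * E2 / denom, G12
    c = math.cos(math.radians(theta_deg))
    s = math.sin(math.radians(theta_deg))
    return q11 * c**4 + 2.0 * (q12 + 2.0 * q66) * s**2 * c**2 + q22 * s**4


def d11(stack):
    """Bending stiffness D11 = (1/3) * sum(Q11-bar_k * (z_k^3 - z_{k-1}^3)),
    with z measured from the laminate mid-plane."""
    z_bottom = -len(stack) * PLY_T / 2.0
    total = 0.0
    for k, theta in enumerate(stack):
        z_lo = z_bottom + k * PLY_T
        z_hi = z_lo + PLY_T
        total += q11_bar(theta) * (z_hi**3 - z_lo**3)
    return total / 3.0


def swap_search(stack):
    """Swap pairs of plies while any swap strictly improves D11."""
    best = list(stack)
    current = d11(best)
    improved = True
    while improved:
        improved = False
        for i, j in combinations(range(len(best)), 2):
            cand = best[:]
            cand[i], cand[j] = cand[j], cand[i]
            value = d11(cand)
            if value > current * (1.0 + 1e-12):  # tolerance avoids cycling on ties
                best, current, improved = cand, value, True
    return best


if __name__ == "__main__":
    start = [90, 45, 0, -45, 0, 90, 45, -45]
    print(swap_search(start))
```

    Because the ply set is only permuted, the lay-up (number of plies per angle) is preserved; a multi-objective version would need a springback model, which is outside this sketch.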

  12. DNA typing by microbead arrays and PCR-SSP: apparent false-negative or -positive hybridization or amplification signals disclose new HLA-B and -DRB1 alleles.

    PubMed

    Rahal, M; Kervaire, B; Villard, J; Tiercy, J-M

    2008-03-01

    Human leukocyte antigen (HLA) typing by polymerase chain reaction-sequence-specific oligonucleotide (PCR-SSO) hybridization on solid phase (microbead assay) or polymerase chain reaction-sequence-specific primers (PCR-SSP) requires interpretation software to detect all possible allele combinations. These programs propose allele calls by taking into account false-positive or false-negative signal(s). The laboratory has the option to validate typing results in the presence of strongly cross-reacting or apparent false-negative signals. Alternatively, these seemingly aberrant signals may disclose novel variants. We report here four new HLA-B (B*5620 and B*5716) and HLA-DRB1 alleles (DRB1*110107 and DRB1*1474) that were detected by apparent false-negative or -positive hybridization or amplification patterns, and ultimately resolved by sequencing. To avoid allele misassignments, a comprehensive evaluation of acquired data as documented in a quality assurance system is therefore required to confirm unambiguous typing interpretation.

  13. SIBIS: a Bayesian model for inconsistent protein sequence estimation.

    PubMed

    Khenoussi, Walyd; Vanhoutrève, Renaud; Poch, Olivier; Thompson, Julie D

    2014-09-01

    The prediction of protein coding genes is a major challenge that depends on the quality of genome sequencing, the accuracy of the model used to elucidate the exonic structure of the genes and the complexity of the gene splicing process leading to different protein variants. As a consequence, today's protein databases contain a huge amount of inconsistency, due to both natural variants and sequence prediction errors. We have developed a new method, called SIBIS, to detect such inconsistencies based on the evolutionary information in multiple sequence alignments. A Bayesian framework, combined with Dirichlet mixture models, is used to estimate the probability of observing specific amino acids and to detect inconsistent or erroneous sequence segments. We evaluated the performance of SIBIS on a reference set of protein sequences with experimentally validated errors and showed that the sensitivity is significantly higher than previous methods, with only a small loss of specificity. We also assessed a large set of human sequences from the UniProt database and found evidence of inconsistency in 48% of the previously uncharacterized sequences. We conclude that the integration of quality control methods like SIBIS in automatic analysis pipelines will be critical for the robust inference of structural, functional and phylogenetic information from these sequences. Source code, implemented in C on a linux system, and the datasets of protein sequences are freely available for download at http://www.lbgi.fr/~julie/SIBIS. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  14. A snapshot of the emerging tomato genome sequence

    USDA-ARS's Scientific Manuscript database

    The genome of tomato (Solanum lycopersicum) is being sequenced by an international consortium of 10 countries (Korea, China, the United Kingdom, India, the Netherlands, France, Japan, Spain, Italy and the United States) as part of a larger initiative called the ‘International Solanaceae Genome Proje...

  15. FaStore - a space-saving solution for raw sequencing data.

    PubMed

    Roguski, Lukasz; Ochoa, Idoia; Hernaez, Mikel; Deorowicz, Sebastian

    2018-03-29

    The affordability of DNA sequencing has led to the generation of unprecedented volumes of raw sequencing data. These data must be stored, processed, and transmitted, which poses significant challenges. To facilitate this effort, we introduce FaStore, a specialized compressor for FASTQ files. FaStore does not use any reference sequences for compression, and permits the user to choose from several lossy modes to improve the overall compression ratio, depending on the specific needs. FaStore in the lossless mode achieves a significant improvement in compression ratio with respect to previously proposed algorithms. We perform an analysis on the effect that the different lossy modes have on variant calling, the most widely used application for clinical decision making, especially important in the era of precision medicine. We show that lossy compression can offer significant compression gains, while preserving the essential genomic information and without affecting the variant calling performance. FaStore can be downloaded from https://github.com/refresh-bio/FaStore. Contact: sebastian.deorowicz@polsl.pl. Supplementary data are available at Bioinformatics online.

  16. Abnormal plasma DNA profiles in early ovarian cancer using a non-invasive prenatal testing platform: implications for cancer screening.

    PubMed

    Cohen, Paul A; Flowers, Nicola; Tong, Stephen; Hannan, Natalie; Pertile, Mark D; Hui, Lisa

    2016-08-24

    Non-invasive prenatal testing (NIPT) identifies fetal aneuploidy by sequencing cell-free DNA in the maternal plasma. Pre-symptomatic maternal malignancies have been incidentally detected during NIPT based on abnormal genomic profiles. This low coverage sequencing approach could have potential for ovarian cancer screening in the non-pregnant population. Our objective was to investigate whether plasma DNA sequencing with a clinical whole genome NIPT platform can detect early- and late-stage high-grade serous ovarian carcinomas (HGSOC). This is a case control study of prospectively-collected biobank samples comprising preoperative plasma from 32 women with HGSOC (16 'early cancer' (FIGO I-II) and 16 'advanced cancer' (FIGO III-IV)) and 32 benign controls. Plasma DNA from cases and controls were sequenced using a commercial NIPT platform and chromosome dosage measured. Sequencing data were blindly analyzed with two methods: (1) Subchromosomal changes were called using an open source algorithm WISECONDOR (WIthin-SamplE COpy Number aberration DetectOR). Genomic gains or losses ≥ 15 Mb were prespecified as "screen positive" calls, and mapped to recurrent copy number variations reported in an ovarian cancer genome atlas. (2) Selected whole chromosome gains or losses were reported using the routine NIPT pipeline for fetal aneuploidy. We detected 13/32 cancer cases using the subchromosomal analysis (sensitivity 40.6 %, 95 % CI, 23.7-59.4 %), including 6/16 early and 7/16 advanced HGSOC cases. Two of 32 benign controls had subchromosomal gains ≥ 15 Mb (specificity 93.8 %, 95 % CI, 79.2-99.2 %). Twelve of the 13 true positive cancer cases exhibited specific recurrent changes reported in HGSOC tumors. The NIPT pipeline resulted in one "monosomy 18" call from the cancer group, and two "monosomy X" calls in the controls. Low coverage plasma DNA sequencing used for prenatal testing detected 40.6 % of all HGSOC, including 38 % of early stage cases. 
Our findings demonstrate the potential of a high throughput sequencing platform to screen for early HGSOC in plasma based on characteristic multiple segmental chromosome gains and losses. The performance of this approach may be further improved by refining bioinformatics algorithms and targeting selected cancer copy number variations.
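
    The sensitivity and specificity figures above follow from simple counts (13 of 32 cases detected, 30 of 32 controls negative). The sketch below recomputes them and attaches an exact binomial (Clopper-Pearson) 95% interval. The abstract does not name its interval method, so the choice of Clopper-Pearson is an assumption, and the function names are illustrative.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k + 1))


def _solve_decreasing(func, target, iterations=60):
    """Bisection on [0, 1] for func(p) = target, with func decreasing in p."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if func(mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion k/n."""
    lower = 0.0 if k == 0 else _solve_decreasing(
        lambda p: binom_cdf(k - 1, n, p), 1.0 - alpha / 2.0
    )
    upper = 1.0 if k == n else _solve_decreasing(
        lambda p: binom_cdf(k, n, p), alpha / 2.0
    )
    return lower, upper


if __name__ == "__main__":
    for label, k, n in (("sensitivity", 13, 32), ("specificity", 30, 32)):
        lo, hi = clopper_pearson(k, n)
        print(f"{label}: {100 * k / n:.1f} % (95 % CI {100 * lo:.1f}-{100 * hi:.1f} %)")
```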

  17. Template-Directed Copolymerization, Random Walks along Disordered Tracks, and Fractals

    NASA Astrophysics Data System (ADS)

    Gaspard, Pierre

    2016-12-01

    In biology, template-directed copolymerization is the fundamental mechanism responsible for the synthesis of DNA, RNA, and proteins. More than 50 years have passed since the discovery of DNA structure and its role in coding genetic information. Yet, the kinetics and thermodynamics of information processing in DNA replication, transcription, and translation remain poorly understood. Challenging issues are the facts that DNA or RNA sequences constitute disordered media for the motion of polymerases or ribosomes while errors occur in copying the template. Here, it is shown that these issues can be addressed and sequence heterogeneity effects can be quantitatively understood within a framework revealing universal aspects of information processing at the molecular scale. In steady growth regimes, the local velocities of polymerases or ribosomes along the template are distributed as the continuous or fractal invariant set of a so-called iterated function system, which determines the copying error probabilities. The growth may become sublinear in time with a scaling exponent that can also be deduced from the iterated function system.

  18. Utilization of multi-body trajectories in the Sun-Earth-Moon system

    NASA Technical Reports Server (NTRS)

    Farquhar, R. W.

    1980-01-01

    An overview of three uncommon trajectory concepts for space missions in the Sun-Earth-Moon System is presented. One concept uses a special class of libration-point orbits called 'halo orbits.' It is shown that members of this orbit family are advantageous for monitoring the solar wind input to the Earth's magnetosphere, and could also be used to establish a continuous communications link between the Earth and the far side of the Moon. The second concept employs pretzel-like trajectories to explore the Earth's geomagnetic tail. These trajectories are formed by using the Moon to carry out a prescribed sequence of gravity-assist maneuvers. Finally, there is the 'boomerang' trajectory technique for multiple-encounter missions to comets and asteroids. In this plan, Earth-swingby maneuvers are used to retarget the original spacecraft trajectory. The boomerang method could be used to produce a triple-encounter sequence which includes flybys of comets Halley and Tempel-2 as well as the asteroid Geographos.

  19. The Arabidopsis Information Resource (TAIR): a comprehensive database and web-based information retrieval, analysis, and visualization system for a model plant

    PubMed Central

    Huala, Eva; Dickerman, Allan W.; Garcia-Hernandez, Margarita; Weems, Danforth; Reiser, Leonore; LaFond, Frank; Hanley, David; Kiphart, Donald; Zhuang, Mingzhe; Huang, Wen; Mueller, Lukas A.; Bhattacharyya, Debika; Bhaya, Devaki; Sobral, Bruno W.; Beavis, William; Meinke, David W.; Town, Christopher D.; Somerville, Chris; Rhee, Seung Yon

    2001-01-01

    Arabidopsis thaliana, a small annual plant belonging to the mustard family, is the subject of study by an estimated 7000 researchers around the world. In addition to the large body of genetic, physiological and biochemical data gathered for this plant, it will be the first higher plant genome to be completely sequenced, with completion expected at the end of the year 2000. The sequencing effort has been coordinated by an international collaboration, the Arabidopsis Genome Initiative (AGI). The rationale for intensive investigation of Arabidopsis is that it is an excellent model for higher plants. In order to maximize use of the knowledge gained about this plant, there is a need for a comprehensive database and information retrieval and analysis system that will provide user-friendly access to Arabidopsis information. This paper describes the initial steps we have taken toward realizing these goals in a project called The Arabidopsis Information Resource (TAIR) (www.arabidopsis.org). PMID:11125061

  20. Mapping Ribonucleotides Incorporated into DNA by Hydrolytic End-Sequencing.

    PubMed

    Orebaugh, Clinton D; Lujan, Scott A; Burkholder, Adam B; Clausen, Anders R; Kunkel, Thomas A

    2018-01-01

    Ribonucleotides embedded within DNA render the DNA sensitive to the formation of single-stranded breaks under alkali conditions. Here, we describe a next-generation sequencing method called hydrolytic end sequencing (HydEn-seq) to map ribonucleotides inserted into the genome of Saccharomyces cerevisiae strains deficient in ribonucleotide excision repair. We use this method to map several genomic features in wild-type and replicase variant yeast strains.

  1. Easy and accurate reconstruction of whole HIV genomes from short-read sequence data with shiver

    PubMed Central

    Blanquart, François; Golubchik, Tanya; Gall, Astrid; Bakker, Margreet; Bezemer, Daniela; Croucher, Nicholas J; Hall, Matthew; Hillebregt, Mariska; Ratmann, Oliver; Albert, Jan; Bannert, Norbert; Fellay, Jacques; Fransen, Katrien; Gourlay, Annabelle; Grabowski, M Kate; Gunsenheimer-Bartmeyer, Barbara; Günthard, Huldrych F; Kivelä, Pia; Kouyos, Roger; Laeyendecker, Oliver; Liitsola, Kirsi; Meyer, Laurence; Porter, Kholoud; Ristola, Matti; van Sighem, Ard; Cornelissen, Marion; Kellam, Paul; Reiss, Peter

    2018-01-01

    Studying the evolution of viruses and their molecular epidemiology relies on accurate viral sequence data, so that small differences between similar viruses can be meaningfully interpreted. Despite its higher throughput and more detailed minority variant data, next-generation sequencing has yet to be widely adopted for HIV. The difficulty of accurately reconstructing the consensus sequence of a quasispecies from reads (short fragments of DNA) in the presence of large between- and within-host diversity, including frequent indels, may have presented a barrier. In particular, mapping (aligning) reads to a reference sequence leads to biased loss of information; this bias can distort epidemiological and evolutionary conclusions. De novo assembly avoids this bias by aligning the reads to themselves, producing a set of sequences called contigs. However, contigs provide only a partial summary of the reads, misassembly may result in their having an incorrect structure, and no information is available at parts of the genome where contigs could not be assembled. To address these problems we developed the tool shiver to pre-process reads for quality and contamination, then map them to a reference tailored to the sample using corrected contigs supplemented with the user’s choice of existing reference sequences. Run with two commands per sample, it can easily be used for large heterogeneous data sets. We used shiver to reconstruct the consensus sequence and minority variant information from paired-end short-read whole-genome data produced with the Illumina platform, for sixty-five existing publicly available samples and fifty new samples. We show the systematic superiority of mapping to shiver’s constructed reference compared with mapping the same reads to the closest of 3,249 real references: median values of 13 bases called differently and more accurately, 0 bases called differently and less accurately, and 205 bases of missing sequence recovered.
We also successfully applied shiver to whole-genome samples of Hepatitis C Virus and Respiratory Syncytial Virus. shiver is publicly available from https://github.com/ChrisHIV/shiver. PMID:29876136
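
    The notions of a consensus sequence and minority variants can be illustrated with a toy pileup over reads that are already aligned to the same coordinates. This is a generic sketch and not shiver's pipeline or interface; the function name, the 5% reporting threshold, and the input format are assumptions made for the example.

```python
from collections import Counter


def consensus_and_minorities(aligned_reads, min_freq=0.05):
    """Call the most common base per column and report other bases whose
    frequency is at least `min_freq`.

    `aligned_reads` is a list of equal-coordinate strings; characters
    other than A, C, G, T (for example '-' or 'N') are ignored.
    """
    length = max(len(read) for read in aligned_reads)
    consensus = []
    minorities = {}
    for pos in range(length):
        counts = Counter(
            read[pos]
            for read in aligned_reads
            if pos < len(read) and read[pos] in "ACGT"
        )
        depth = sum(counts.values())
        if depth == 0:
            consensus.append("N")  # no coverage at this position
            continue
        major_base, _ = counts.most_common(1)[0]
        consensus.append(major_base)
        minor = {
            base: count / depth
            for base, count in counts.items()
            if base != major_base and count / depth >= min_freq
        }
        if minor:
            minorities[pos] = minor
    return "".join(consensus), minorities


if __name__ == "__main__":
    reads = ["ACGT"] * 9 + ["ACAT"]
    print(consensus_and_minorities(reads))
```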

  2. MALDI Top-Down sequencing: calling N- and C-terminal protein sequences with high confidence and speed.

    PubMed

    Suckau, Detlev; Resemann, Anja

    2009-12-01

    The ability to match Top-Down protein sequencing (TDS) results by MALDI-TOF to protein sequences by classical protein database searching was evaluated in this work. These analyses yielded the protein identity, the simultaneous assignment of the N- and C-termini, and protein sequences of up to 70 residues from either terminus. In combination with de novo sequencing using the MALDI-TDS data, even fusion proteins were assigned and the detailed sequence around the fusion site was elucidated. MALDI-TDS made it possible to match protein sequences quickly and efficiently and to validate recombinant protein structures, in particular protein termini, at the level of undigested proteins.

  3. Abundant and diverse clustered regularly interspaced short palindromic repeat spacers in Clostridium difficile strains and prophages target multiple phage types within this pathogen.

    PubMed

    Hargreaves, Katherine R; Flores, Cesar O; Lawley, Trevor D; Clokie, Martha R J

    2014-08-26

    Clostridium difficile is an important human-pathogenic bacterium causing antibiotic-associated nosocomial infections worldwide. Mobile genetic elements and bacteriophages have helped shape C. difficile genome evolution. In many bacteria, phage infection may be controlled by a form of bacterial immunity called the clustered regularly interspaced short palindromic repeats/CRISPR-associated (CRISPR/Cas) system. This uses acquired short nucleotide sequences (spacers) to target homologous sequences (protospacers) in phage genomes. C. difficile carries multiple CRISPR arrays, and in this paper we examine the relationships between the host- and phage-carried elements of the system. We detected multiple matches between spacers and regions in 31 C. difficile phage and prophage genomes. A subset of the spacers was located in prophage-carried CRISPR arrays. The CRISPR spacer profiles generated suggest that related phages would have similar host ranges. Furthermore, we show that C. difficile strains of the same ribotype could either have similar or divergent CRISPR contents. Both synonymous and nonsynonymous mutations in the protospacer sequences were identified, as well as differences in the protospacer adjacent motif (PAM), which could explain how phages escape this system. This paper illustrates how the distribution and diversity of CRISPR spacers in C. difficile, and its prophages, could modulate phage predation for this pathogen and impact upon its evolution and pathogenicity. Clostridium difficile is a significant bacterial human pathogen which undergoes continual genome evolution, resulting in the emergence of new virulent strains. Phages are major facilitators of genome evolution in other bacterial species, and we use sequence analysis-based approaches in order to examine whether the CRISPR/Cas system could control these interactions across divergent C. difficile strains. 
The presence of spacer sequences in prophages that are homologous to phage genomes raises an extra level of complexity in this predator-prey microbial system. Our results demonstrate that the impact of phage infection in this system is widespread and that the CRISPR/Cas system is likely to be an important aspect of the evolutionary dynamics in C. difficile. Copyright © 2014 Hargreaves et al.
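
    Matching spacers to protospacers amounts to searching phage genomes for each spacer on both strands. The sketch below does this with exact string matching only. The record does not state the matching criteria used, and practical analyses usually tolerate mismatches and inspect the adjacent PAM, so this is an illustration with hypothetical names rather than the authors' procedure.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_protospacers(spacer, genome):
    """Return sorted (position, strand) pairs for every exact occurrence
    of the spacer in the genome, on the forward or reverse strand.
    Positions are 0-based offsets on the forward strand."""
    hits = []
    for strand, query in (("+", spacer), ("-", reverse_complement(spacer))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            start = genome.find(query, start + 1)
    return sorted(hits)


if __name__ == "__main__":
    spacer = "ACGTTGCAAG"
    genome = "TTT" + spacer + "GGG" + reverse_complement(spacer) + "AA"
    print(find_protospacers(spacer, genome))
```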

  4. Large-scale synchronized activity during vocal deviance detection in the zebra finch auditory forebrain.

    PubMed

    Beckers, Gabriël J L; Gahr, Manfred

    2012-08-01

    Auditory systems bias responses to sounds that are unexpected on the basis of recent stimulus history, a phenomenon that has been widely studied using sequences of unmodulated tones (mismatch negativity; stimulus-specific adaptation). Such a paradigm, however, does not directly reflect problems that neural systems normally solve for adaptive behavior. We recorded multiunit responses in the caudomedial auditory forebrain of anesthetized zebra finches (Taeniopygia guttata) at 32 sites simultaneously, to contact calls that recur probabilistically at a rate that is used in communication. Neurons in secondary, but not primary, auditory areas respond preferentially to calls when they are unexpected (deviant) compared with the same calls when they are expected (standard). This response bias is predominantly due to sites more often not responding to standard events than to deviant events. When two call stimuli alternate between standard and deviant roles, most sites exhibit a response bias to deviant events of both stimuli. This suggests that biases are not based on a use-dependent decrease in response strength but involve a more complex mechanism that is sensitive to auditory deviance per se. Furthermore, between many secondary sites, responses are tightly synchronized, a phenomenon that is driven by internal neuronal interactions rather than by the timing of stimulus acoustic features. We hypothesize that this deviance-sensitive, internally synchronized network of neurons is involved in the involuntary capturing of attention by unexpected and behaviorally potentially relevant events in natural auditory scenes.

  5. MultiGeMS: detection of SNVs from multiple samples using model selection on high-throughput sequencing data.

    PubMed

    Murillo, Gabriel H; You, Na; Su, Xiaoquan; Cui, Wei; Reilly, Muredach P; Li, Mingyao; Ning, Kang; Cui, Xinping

    2016-05-15

    Single nucleotide variant (SNV) detection procedures are being utilized as never before to analyze the recent abundance of high-throughput DNA sequencing data, both on single and multiple sample datasets. Building on previously published work with the single sample SNV caller genotype model selection (GeMS), a multiple sample version of GeMS (MultiGeMS) is introduced. Unlike other popular multiple sample SNV callers, the MultiGeMS statistical model accounts for enzymatic substitution sequencing errors. It also addresses the multiple testing problem endemic to multiple sample SNV calling and utilizes high performance computing (HPC) techniques. A simulation study demonstrates that MultiGeMS ranks highest in precision among a selection of popular multiple sample SNV callers, while showing exceptional recall in calling common SNVs. Further, both simulation studies and real data analyses indicate that MultiGeMS is robust to low-quality data. We also demonstrate that accounting for enzymatic substitution sequencing errors not only improves SNV call precision at low mapping quality regions, but also improves recall at reference allele-dominated sites with high mapping quality. The MultiGeMS package can be downloaded from https://github.com/cui-lab/multigems. Contact: xinping.cui@ucr.edu. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  6. UGbS-Flex, a novel bioinformatics pipeline for imputation-free SNP discovery in polyploids without a reference genome: finger millet as a case study.

    PubMed

    Qi, Peng; Gimode, Davis; Saha, Dipnarayan; Schröder, Stephan; Chakraborty, Debkanta; Wang, Xuewen; Dida, Mathews M; Malmberg, Russell L; Devos, Katrien M

    2018-06-15

    Research on orphan crops is often hindered by a lack of genomic resources. With the advent of affordable sequencing technologies, genotyping an entire genome or, for large-genome species, a representative fraction of the genome has become feasible for any crop. Nevertheless, most genotyping-by-sequencing (GBS) methods are geared towards obtaining large numbers of markers at low sequence depth, which excludes their application in heterozygous individuals. Furthermore, bioinformatics pipelines often lack the flexibility to deal with paired-end reads or to be applied in polyploid species. UGbS-Flex combines publicly available software with in-house python and perl scripts to efficiently call SNPs from genotyping-by-sequencing reads irrespective of the species' ploidy level, breeding system and availability of a reference genome. Noteworthy features of the UGbS-Flex pipeline are an ability to use paired-end reads as input, an effective approach to cluster reads across samples with enhanced outputs, and maximization of SNP calling. We demonstrate use of the pipeline for the identification of several thousand high-confidence SNPs with high representation across samples in an F3-derived F2 population in the allotetraploid finger millet. Robust high-density genetic maps were constructed using the time-tested mapping program MAPMAKER which we upgraded to run efficiently and in a semi-automated manner in a Windows Command Prompt Environment. We exploited comparative GBS with one of the diploid ancestors of finger millet to assign linkage groups to subgenomes and demonstrate the presence of chromosomal rearrangements. The paper combines GBS protocol modifications, a novel flexible GBS analysis pipeline, UGbS-Flex, recommendations to maximize SNP identification, updated genetic mapping software, and the first high-density maps of finger millet.
The modules used in the UGbS-Flex pipeline and for genetic mapping were applied to finger millet, an allotetraploid selfing species without a reference genome, as a case study. The UGbS-Flex modules, which can be run independently, are easily transferable to species with other breeding systems or ploidy levels.

  7. The Planning, Implementation, and Movement of an Academic Library Collection.

    ERIC Educational Resources Information Center

    Kurkul, Donna Lee

    1983-01-01

    Discusses methodology, logistics, and time/cost study of planning, implementation, and relocation of 682,810 volume Smith College Library collection into its newly constructed and renovated facility. Call number sequence location, collection movement phasing and formulas for sequence distribution, and personnel requirements are noted. Elementary…

  8. Quantum Point Contact Single-Nucleotide Conductance for DNA and RNA Sequence Identification.

    PubMed

    Afsari, Sepideh; Korshoj, Lee E; Abel, Gary R; Khan, Sajida; Chatterjee, Anushree; Nagpal, Prashant

    2017-11-28

    Several nanoscale electronic methods have been proposed for high-throughput single-molecule nucleic acid sequence identification. While many studies display a large ensemble of measurements as "electronic fingerprints" with some promise for distinguishing the DNA and RNA nucleobases (adenine, guanine, cytosine, thymine, and uracil), important metrics such as accuracy and confidence of base calling fall well below the current genomic methods. Issues such as unreliable metal-molecule junction formation, variation of nucleotide conformations, insufficient differences between the molecular orbitals responsible for single-nucleotide conduction, and lack of rigorous base calling algorithms lead to overlapping nanoelectronic measurements and poor nucleotide discrimination, especially at low coverage on single molecules. Here, we demonstrate a technique for reproducible conductance measurements on conformation-constrained single nucleotides and an advanced algorithmic approach for distinguishing the nucleobases. Our quantum point contact single-nucleotide conductance sequencing (QPICS) method uses combed and electrostatically bound single DNA and RNA nucleotides on a self-assembled monolayer of cysteamine molecules. We demonstrate that by varying the applied bias and pH conditions, molecular conductance can be switched ON and OFF, leading to reversible nucleotide perturbation for electronic recognition (NPER). We utilize NPER as a method to achieve >99.7% accuracy for DNA and RNA base calling at low molecular coverage (∼12×) using unbiased single measurements on DNA/RNA nucleotides, which represents a significant advance compared to existing sequencing methods. These results demonstrate the potential for utilizing simple surface modifications and existing biochemical moieties in individual nucleobases for a reliable, direct, single-molecule, nanoelectronic DNA and RNA nucleotide identification method for sequencing.

  9. Physical model of the immune response of bacteria against bacteriophage through the adaptive CRISPR-Cas immune system

    NASA Astrophysics Data System (ADS)

    Han, Pu; Niestemski, Liang Ren; Barrick, Jeffrey E.; Deem, Michael W.

    2013-04-01

    Bacteria and archaea have evolved an adaptive, heritable immune system that recognizes and protects against viruses or plasmids. This system, known as the CRISPR-Cas system, allows the host to recognize and incorporate short foreign DNA or RNA sequences, called ‘spacers’ into its CRISPR system. Spacers in the CRISPR system provide a record of the history of bacteria and phage coevolution. We use a physical model to study the dynamics of this coevolution as it evolves stochastically over time. We focus on the impact of mutation and recombination on bacteria and phage evolution and evasion. We discuss the effect of different spacer deletion mechanisms on the coevolutionary dynamics. We make predictions about bacteria and phage population growth, spacer diversity within the CRISPR locus, and spacer protection against the phage population.

  10. A stage is a stage is a stage: a direct comparison of two scoring systems.

    PubMed

    Dawson, Theo L

    2003-09-01

    L. Kohlberg (1969) argued that his moral stages captured a developmental sequence specific to the moral domain. To explore that contention, the author compared stage assignments obtained with the Standard Issue Scoring System (A. Colby & L. Kohlberg, 1987a, 1987b) and those obtained with a generalized content-independent stage-scoring system called the Hierarchical Complexity Scoring System (T. L. Dawson, 2002a), on 637 moral judgment interviews (participants' ages ranged from 5 to 86 years). The correlation between stage scores produced with the 2 systems was .88. Although standard issue scoring and hierarchical complexity scoring often awarded different scores up to Kohlberg's Moral Stage 2/3, from his Moral Stage 3 onward, scores awarded with the two systems predominantly agreed. The author explores the implications for developmental research.

  11. A transmission imaging spectrograph and microfabricated channel system for DNA analysis.

    PubMed

    Simpson, J W; Ruiz-Martinez, M C; Mulhern, G T; Berka, J; Latimer, D R; Ball, J A; Rothberg, J M; Went, G T

    2000-01-01

    In this paper we present the development of a DNA analysis system using a microfabricated channel device and a novel transmission imaging spectrograph which can be efficiently incorporated into a high throughput genomics facility for both sizing and sequencing of DNA fragments. The device contains 48 channels etched on a glass substrate. The channels are sealed with a flat glass plate which also provides a series of apertures for sample loading and contact with buffer reservoirs. Samples can be easily loaded in volumes up to 640 nL without band broadening because of an efficient electrokinetic stacking at the electrophoresis channel entrance. The system uses a dual laser excitation source and a highly sensitive charge-coupled device (CCD) detector allowing for simultaneous detection of many fluorescent dyes. The sieving matrices for the separation of single-stranded DNA fragments are polymerized in situ in denaturing buffer systems. Examples of separation of single-stranded DNA fragments up to 500 bases in length are shown, including accurate sizing of GeneCalling fragments, and sequencing samples prepared with a reduced amount of dye terminators. An increase in sample throughput has been achieved by color multiplexing.

  12. Learning of spatio-temporal codes in a coupled oscillator system.

    PubMed

    Orosz, Gábor; Ashwin, Peter; Townley, Stuart

    2009-07-01

    In this paper, we consider a learning strategy that allows one to transmit information between two coupled phase oscillator systems (called teaching and learning systems) via frequency adaptation. The dynamics of these systems can be modeled with reference to a number of partially synchronized cluster states and transitions between them. Forcing the teaching system by steady but spatially nonhomogeneous inputs produces cyclic sequences of transitions between the cluster states, that is, information about inputs is encoded via a "winnerless competition" process into spatio-temporal codes. The large variety of codes can be learned by the learning system that adapts its frequencies to those of the teaching system. We visualize the dynamics using "weighted order parameters (WOPs)" that are analogous to "local field potentials" in neural systems. Since spatio-temporal coding is a mechanism that appears in olfactory systems, the developed learning rules may help to extract information from these neural ensembles.

  13. AutoGen Version 5.0

    NASA Technical Reports Server (NTRS)

    Gladden, Roy E.; Khanampornpan, Teerapat; Fisher, Forest W.

    2010-01-01

    Version 5.0 of the AutoGen software has been released. Previous versions, variously denoted Autogen and autogen, were reported in two articles: Automated Sequence Generation Process and Software (NPO-30746), Software Tech Briefs (Special Supplement to NASA Tech Briefs), September 2007, page 30, and Autogen Version 2.0 (NPO-41501), NASA Tech Briefs, Vol. 31, No. 10 (October 2007), page 58. To recapitulate: AutoGen (now signifying automatic sequence generation) automates the generation of sequences of commands in a standard format for uplink to spacecraft. AutoGen requires fewer workers than are needed for older manual sequence-generation processes, and greatly reduces sequence-generation times. The sequences are embodied in spacecraft activity sequence files (SASFs). AutoGen automates generation of SASFs by use of another previously reported program called APGEN. AutoGen encodes knowledge of different mission phases and of how the resultant commands must differ among the phases. AutoGen also provides means for customizing sequences through use of configuration files. The approach followed in developing AutoGen has involved encoding the behaviors of a system into a model and encoding algorithms for context-sensitive customizations of the modeled behaviors. This version of AutoGen addressed the MRO (Mars Reconnaissance Orbiter) primary science phase (PSP) mission phase. On previous Mars missions this phase has more commonly been referred to as mapping phase. This version addressed the unique aspects of sequencing orbital operations and specifically the mission-specific adaptation of orbital operations for MRO. This version also includes capabilities for MRO's role in Mars relay support for UHF relay communications with the MER rovers and the Phoenix lander.

  14. CRISPR-like sequences in Helicobacter pylori and application in genotyping.

    PubMed

    Bangpanwimon, Khotchawan; Sottisuporn, Jaksin; Mittraparp-Arthorn, Pimonsri; Ueaphatthanaphanich, Warattaya; Rattanasupar, Attapon; Pourcel, Christine; Vuddhakul, Varaporn

    2017-01-01

    Many bacteria and archaea possess a defense system called clustered regularly interspaced short palindromic repeats (CRISPR) associated proteins (CRISPR-Cas system) against invaders such as phages or plasmids. This system has not been demonstrated in Helicobacter pylori. The number of spacers in a CRISPR array differs among bacterial strains and can be used as a genetic marker for bacterial typing. A total of 36 H. pylori isolates were collected from patients in three hospitals located in the central (PBH) and southern (SKH) regions of Thailand. It is of interest that CRISPR-like sequences of this bacterium were detected in vlpC, which encodes VacA-like protein C. Virulence genes were investigated and the most pathogenic genotype (cagA vacA s1m1) was detected in 17 out of 29 (58.6%) isolates from PBH and 5 out of 7 (71.4%) from SKH. The vapD gene was identified in one isolate each from PBH and SKH. CRISPR-like sequences and virulence genes of 20 isolates of H. pylori obtained in this study were analyzed and CRISPR-virulence typing was constructed and compared to profiles obtained by the random amplification of polymorphic DNA (RAPD) technique. The discriminatory power (DI) of CRISPR-virulence typing was not different from RAPD typing. CRISPR-virulence typing in H. pylori is easy and reliable for epidemiology and can be used for inter-laboratory interpretation.

  15. Abundant and Diverse Clustered Regularly Interspaced Short Palindromic Repeat Spacers in Clostridium difficile Strains and Prophages Target Multiple Phage Types within This Pathogen

    PubMed Central

    Hargreaves, Katherine R.; Flores, Cesar O.; Lawley, Trevor D.

    2014-01-01

    ABSTRACT Clostridium difficile is an important human-pathogenic bacterium causing antibiotic-associated nosocomial infections worldwide. Mobile genetic elements and bacteriophages have helped shape C. difficile genome evolution. In many bacteria, phage infection may be controlled by a form of bacterial immunity called the clustered regularly interspaced short palindromic repeats/CRISPR-associated (CRISPR/Cas) system. This uses acquired short nucleotide sequences (spacers) to target homologous sequences (protospacers) in phage genomes. C. difficile carries multiple CRISPR arrays, and in this paper we examine the relationships between the host- and phage-carried elements of the system. We detected multiple matches between spacers and regions in 31 C. difficile phage and prophage genomes. A subset of the spacers was located in prophage-carried CRISPR arrays. The CRISPR spacer profiles generated suggest that related phages would have similar host ranges. Furthermore, we show that C. difficile strains of the same ribotype could either have similar or divergent CRISPR contents. Both synonymous and nonsynonymous mutations in the protospacer sequences were identified, as well as differences in the protospacer adjacent motif (PAM), which could explain how phages escape this system. This paper illustrates how the distribution and diversity of CRISPR spacers in C. difficile, and its prophages, could modulate phage predation for this pathogen and impact upon its evolution and pathogenicity. PMID:25161187

  16. DIALOG: An executive computer program for linking independent programs

    NASA Technical Reports Server (NTRS)

    Glatt, C. R.; Hague, D. S.; Watson, D. A.

    1973-01-01

    A very large scale computer programming procedure called the DIALOG Executive System has been developed for the Univac 1100 series computers. The executive computer program, DIALOG, controls the sequence of execution and data management function for a library of independent computer programs. Communication of common information is accomplished by DIALOG through a dynamically constructed and maintained data base of common information. The unique feature of the DIALOG Executive System is the manner in which computer programs are linked. Each program maintains its individual identity and as such is unaware of its contribution to the large scale program. This feature makes any computer program a candidate for use with the DIALOG Executive System. The installation and use of the DIALOG Executive System are described at Johnson Space Center.

  17. HmtDB 2016: data update, a better performing query system and human mitochondrial DNA haplogroup predictor.

    PubMed

    Clima, Rosanna; Preste, Roberto; Calabrese, Claudia; Diroma, Maria Angela; Santorsola, Mariangela; Scioscia, Gaetano; Simone, Domenico; Shen, Lishuang; Gasparre, Giuseppe; Attimonelli, Marcella

    2017-01-04

    The HmtDB resource hosts a database of human mitochondrial genome sequences from individuals with healthy and disease phenotypes. The database is intended to support both population geneticists as well as clinicians undertaking the task to assess the pathogenicity of specific mtDNA mutations. The wide application of next-generation sequencing (NGS) has provided an enormous volume of high-resolution data at a low price, increasing the availability of human mitochondrial sequencing data, which called for a cogent and significant expansion of HmtDB data content that has more than tripled in the current release. We here describe additional novel features, including: (i) a complete, user-friendly restyling of the web interface, (ii) links to the command-line stand-alone and web versions of the MToolBox package, an up-to-date tool to reconstruct and analyze human mitochondrial DNA from NGS data and (iii) the implementation of the Reconstructed Sapiens Reference Sequence (RSRS) as mitochondrial reference sequence. The overall update renders HmtDB an even more handy and useful resource as it enables a more rapid data access, processing and analysis. HmtDB is accessible at http://www.hmtdb.uniba.it/. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  18. In silico segmentations of lentivirus envelope sequences

    PubMed Central

    Boissin-Quillon, Aurélia; Piau, Didier; Leroux, Caroline

    2007-01-01

    Background The gene encoding the envelope of lentiviruses exhibits a considerable plasticity, particularly the region which encodes the surface (SU) glycoprotein. Interestingly, mutations do not appear uniformly along the sequence of SU, but they are clustered in restricted areas, called variable (V) regions, which are interspersed with relatively more stable regions, called constant (C) regions. We look for specific signatures of C/V regions, using hidden Markov models constructed with SU sequences of the equine, human, small ruminant and simian lentiviruses. Results Our models yield clear and accurate delimitations of the C/V regions, when the test set and the training set were made up of sequences of the same lentivirus, but also when they were made up of sequences of different lentiviruses. Interestingly, the models predicted the different regions of lentiviruses such as the bovine and feline lentiviruses, not used in the training set. Models based on composite training sets produce accurate segmentations of sequences of all these lentiviruses. Conclusion Our results suggest that each C/V region has a specific statistical oligonucleotide composition, and that the C (respectively V) regions of one of these lentiviruses are statistically more similar to the C (respectively V) regions of the other lentiviruses, than to the V (respectively C) regions of the same lentivirus. PMID:17376229

  19. Using Next Generation Sequencing for Multiplexed Trait-Linked Markers in Wheat

    PubMed Central

    Bernardo, Amy; Wang, Shan; St. Amand, Paul; Bai, Guihua

    2015-01-01

    With the advent of next generation sequencing (NGS) technologies, single nucleotide polymorphisms (SNPs) have become the major type of marker for genotyping in many crops. However, the availability of SNP markers for important traits of bread wheat ( Triticum aestivum L.) that can be effectively used in marker-assisted selection (MAS) is still limited and SNP assays for MAS are usually uniplex. A shift from uniplex to multiplex assays will allow the simultaneous analysis of multiple markers and increase MAS efficiency. We designed 33 locus-specific markers from SNP or indel-based marker sequences that linked to 20 different quantitative trait loci (QTL) or genes of agronomic importance in wheat and analyzed the amplicon sequences using an Ion Torrent Proton Sequencer and a custom allele detection pipeline to determine the genotypes of 24 selected germplasm accessions. Among the 33 markers, 27 were successfully multiplexed and 23 had 100% SNP call rates. Results from analysis of "kompetitive allele-specific PCR" (KASP) and sequence tagged site (STS) markers developed from the same loci fully verified the genotype calls of 23 markers. The NGS-based multiplexed assay developed in this study is suitable for rapid and high-throughput screening of SNPs and some indel-based markers in wheat. PMID:26625271

  20. AntiClustal: Multiple Sequence Alignment by antipole clustering and linear approximate 1-median computation.

    PubMed

    Di Pietro, C; Di Pietro, V; Emmanuele, G; Ferro, A; Maugeri, T; Modica, E; Pigola, G; Pulvirenti, A; Purrello, M; Ragusa, M; Scalia, M; Shasha, D; Travali, S; Zimmitti, V

    2003-01-01

    In this paper we present a new Multiple Sequence Alignment (MSA) algorithm called AntiClusAl. The method makes use of the commonly used idea of aligning homologous sequences belonging to classes generated by some clustering algorithm, and then continuing the alignment process in a bottom-up way along a suitable tree structure. The final result is then read at the root of the tree. Multiple sequence alignment in each cluster makes use of the progressive alignment with the 1-median (center) of the cluster. The 1-median of a set S of sequences is the element of S which minimizes the average distance from any other sequence in S. Its exact computation requires quadratic time. The basic idea of our proposed algorithm is to make use of a simple and natural algorithmic technique based on randomized tournaments which has been successfully applied to large size search problems in general metric spaces. In particular a clustering algorithm called Antipole tree and an approximate linear 1-median computation are used. Our algorithm, compared with Clustal W, a widely used tool for MSA, shows better running time results with fully comparable alignment quality. A successful biological application showing high amino acid conservation during evolution of Xenopus laevis SOD2 is also cited.
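
    The 1-median definition above, and the idea of approximating it with randomized tournaments, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy Hamming distance, and the group size of 3 are assumptions chosen only for illustration.

```python
import random


def hamming(a, b):
    """Toy distance (an assumption): mismatched positions of equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def exact_one_median(seqs, dist=hamming):
    """Element with the smallest summed distance to all others.

    Needs a quadratic number of distance evaluations in len(seqs).
    """
    return min(seqs, key=lambda s: sum(dist(s, t) for t in seqs))


def tournament_one_median(seqs, dist=hamming, group_size=3, rng=None):
    """Approximate 1-median via randomized tournaments (hypothetical sketch).

    Each round shuffles the pool, splits it into small groups, and keeps the
    exact 1-median of every group. For a fixed group size the total number of
    distance evaluations grows linearly with len(seqs).
    """
    rng = rng or random.Random(0)
    pool = list(seqs)
    while len(pool) > group_size:
        rng.shuffle(pool)
        pool = [
            exact_one_median(pool[i:i + group_size], dist)
            for i in range(0, len(pool), group_size)
        ]
    return exact_one_median(pool, dist)


seqs = ["AAAA", "AAAT", "AATT", "TTTT", "AAAC"]
center = exact_one_median(seqs)
approx = tournament_one_median(seqs)
```

    The tournament result is always a member of the input set, but it is only an approximation and may differ from the exact 1-median.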

  1. Empirical Validation of Pooled Whole Genome Population Re-Sequencing in Drosophila melanogaster

    PubMed Central

    Zhu, Yuan; Bergland, Alan O.; González, Josefa; Petrov, Dmitri A.

    2012-01-01

    The sequencing of pooled non-barcoded individuals is an inexpensive and efficient means of assessing genome-wide population allele frequencies, yet its accuracy has not been thoroughly tested. We assessed the accuracy of this approach on whole, complex eukaryotic genomes by resequencing pools of largely isogenic, individually sequenced Drosophila melanogaster strains. We called SNPs in the pooled data and estimated false positive and false negative rates using the SNPs called in individual strains as a reference. We also estimated allele frequency of the SNPs using “pooled” data and compared them with “true” frequencies taken from the estimates in the individual strains. We demonstrate that pooled sequencing provides a faithful estimate of population allele frequency with the error well approximated by binomial sampling, and is a reliable means of novel SNP discovery with low false positive rates. However, a sufficient number of strains should be used in the pooling because variation in the amount of DNA derived from individual strains is a substantial source of noise when the number of pooled strains is low. Our results and analysis confirm that pooled sequencing is a very powerful and cost-effective technique for assessing patterns of sequence variation in populations on genome-wide scales, and is applicable to any dataset where sequencing individuals or individual cells is impossible, difficult, time consuming, or expensive. PMID:22848651
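
    As a rough illustration of what "error well approximated by binomial sampling" means, the sketch below computes a pooled allele-frequency estimate and its binomial standard error at a given read depth. The function names are hypothetical, and treating read depth as the only sampling stage is a simplifying assumption, not the paper's full error model.

```python
import math


def pooled_frequency(alt_reads, total_reads):
    """Allele-frequency estimate from pooled reads covering one site."""
    return alt_reads / total_reads


def binomial_se(p, n):
    """Standard error of a frequency estimate under binomial sampling of n draws."""
    return math.sqrt(p * (1.0 - p) / n)


p_hat = pooled_frequency(30, 100)   # 30 alternate-allele reads out of 100
se = binomial_se(p_hat, 100)        # sampling error expected at 100x depth
```

    Under this simplification, quadrupling the read depth halves the expected sampling error.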

  2. Sam2bam: High-Performance Framework for NGS Data Preprocessing Tools

    PubMed Central

    Cheng, Yinhe; Tzeng, Tzy-Hwa Kathy

    2016-01-01

    This paper introduces a high-throughput software tool framework called sam2bam that enables users to significantly speed up pre-processing for next-generation sequencing data. The sam2bam is especially efficient on single-node multi-core large-memory systems. It can reduce the runtime of data pre-processing in marking duplicate reads on a single node system by 156–186x compared with de facto standard tools. The sam2bam consists of parallel software components that can fully utilize multiple processors, available memory, high-bandwidth storage, and hardware compression accelerators, if available. The sam2bam provides file format conversion between well-known genome file formats, from SAM to BAM, as a basic feature. Additional features such as analyzing, filtering, and converting input data are provided by using plug-in tools, e.g., duplicate marking, which can be attached to sam2bam at runtime. We demonstrated that sam2bam could significantly reduce the runtime of next generation sequencing (NGS) data pre-processing from about two hours to about one minute for a whole-exome data set on a 16-core single-node system using up to 130 GB of memory. The sam2bam could reduce the runtime of NGS data pre-processing from about 20 hours to about nine minutes for a whole-genome sequencing data set on the same system using up to 711 GB of memory. PMID:27861637

  3. Fast imputation using medium- or low-coverage sequence data

    USDA-ARS's Scientific Manuscript database

    Direct imputation from raw sequence reads can be more accurate than calling genotypes first and then imputing, especially if read depth is low or error rates high, but different imputation strategies are required than those used for data from genotyping chips. A fast algorithm to impute from lower t...

  4. High-throughput interpretation of gene structure changes in human and nonhuman resequencing data, using ACE

    USDA-ARS's Scientific Manuscript database

    We describe a suite of software tools for identifying possible functional changes in gene structure that may result from sequence variants. ACE (“Assessing Changes to Exons”) converts phased genotype calls to a collection of explicit haplotype sequences, maps transcript annotations onto them, detect...

  5. DNAApp: a mobile application for sequencing data analysis

    PubMed Central

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-01-01

    Summary: There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and Mac platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. Availability and implementation: The Android version of DNAApp is available in Google Play Store as ‘DNAApp’, and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. Contact: samuelg@bii.a-star.edu.sg PMID:25095882
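
    Reverse complementation, one of the in-built analysis tools mentioned above, can be sketched in a few lines. This is a generic illustration and not DNAApp's code; the function name is an assumption, and only the four unambiguous DNA bases are handled.

```python
# Translation table for the four unambiguous DNA bases (upper and lower case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement each base, then reverse the strand."""
    return seq.translate(COMPLEMENT)[::-1]


rc = reverse_complement("ATGC")
```

    Applying the function twice returns the original sequence, which is a convenient sanity check.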

  6. DNAApp: a mobile application for sequencing data analysis.

    PubMed

    Nguyen, Phi-Vu; Verma, Chandra Shekhar; Gan, Samuel Ken-En

    2014-11-15

    There have been numerous applications developed for decoding and visualization of ab1 DNA sequencing files for Windows and Mac platforms, yet none exists for the increasingly popular smartphone operating systems. The ability to decode sequencing files cannot easily be carried out using browser accessed Web tools. To overcome this hurdle, we have developed a new native app called DNAApp that can decode and display ab1 sequencing files on Android and iOS. In addition to in-built analysis tools such as reverse complementation, protein translation and searching for specific sequences, we have incorporated convenient functions that would facilitate the harnessing of online Web tools for a full range of analysis. Given the high usage of Android/iOS tablets and smartphones, such bioinformatics apps would raise productivity and facilitate the high demand for analyzing sequencing data in biomedical research. The Android version of DNAApp is available in Google Play Store as 'DNAApp', and the iOS version is available in the App Store. More details on the app can be found at www.facebook.com/APDLab; www.bii.a-star.edu.sg/research/trd/apd.php The DNAApp user guide is available at http://tinyurl.com/DNAAppuser, and a video tutorial is available on Google Play Store and App Store, as well as on the Facebook page. samuelg@bii.a-star.edu.sg. © The Author 2014. Published by Oxford University Press.

  7. A new species of Pseudopaludicola (Anura, Leiuperinae) from Espírito Santo, Brazil

    PubMed Central

    Baldo, Diego; Pupin, Nadya; Gasparini, João Luiz; Baptista Haddad, Célio F.

    2018-01-01

    We describe a new anuran species of the genus Pseudopaludicola that inhabits sandy areas in restinga associated with the Atlantic Forest biome in the state of Espírito Santo, Brazil. The new species is characterized by: SVL 11.7–14.6 mm in males, 14.0–16.7 mm in females; body slender; fingertips knobbed, with a central groove; hindlimbs short; abdominal fold complete; arytenoid cartilages wide; prepollex with base and two segments; prehallux with base and one segment; frontoparietal fontanelle partially exposed; advertisement call with one note composed of two isolated pulses per call; call dominant frequency ranging 4,380–4,884 Hz; diploid chromosome number 22; and Ag-NORs on 8q subterminal. In addition, its 16S rDNA sequence shows high genetic distances when compared to sequences of related species, which provides strong evidence that the new species is an independent lineage. PMID:29785347

  8. Lazy evaluation of FP programs: A data-flow approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Wei, Y.H.; Gaudiot, J.L.

    1988-12-31

    This paper presents a lazy evaluation system for the list-based functional language, Backus' FP, in a data-driven environment. A superset language of FP, called DFP (Demand-driven FP), is introduced. FP eager programs are transformed into DFP lazy programs which contain the notions of demands. The data-driven execution of DFP programs has the same effect as lazy evaluation. DFP lazy programs have the property of always evaluating a sufficient and necessary result. The infinite sequence generator is used to demonstrate the eager-lazy program transformation and the execution of the lazy programs.
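
    The notion of an infinite sequence generator that is evaluated only on demand can be illustrated with a Python generator. This is an analogy, not DFP: the `squares` function and the choice of squared integers are assumptions made for the example.

```python
from itertools import count, islice


def squares():
    """Infinite sequence generator: each value is computed only when demanded."""
    for n in count(1):
        yield n * n


# An eager evaluator would never finish building this sequence;
# the consumer demands five elements, so exactly five are computed.
first_five = list(islice(squares(), 5))
```

    Here the consumer (`islice`) plays the role of the demand: nothing beyond the requested prefix is ever evaluated, which mirrors the "sufficient and necessary result" property described above.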

  9. Leveraging Call Center Logs for Customer Behavior Prediction

    NASA Astrophysics Data System (ADS)

    Parvathy, Anju G.; Vasudevan, Bintu G.; Kumar, Abhishek; Balakrishnan, Rajesh

    Most major businesses use business process outsourcing for performing a process or a part of a process including financial services like mortgage processing, loan origination, finance and accounting and transaction processing. Call centers are used for the purpose of receiving and transmitting a large volume of requests through outbound and inbound calls to customers on behalf of a business. In this paper we deal specifically with call center notes from banks. Banks as financial institutions provide loans to non-financial businesses and individuals. Their call centers act as the nuclei of their client service operations and log the transactions between the customer and the bank. This crucial conversation or information can be exploited for predicting a customer’s behavior which will in turn help these businesses to decide on the next action to be taken. Thus the banks save considerable time and effort in tracking delinquent customers to ensure minimum subsequent defaulters. Most of the time the call center notes are very concise and brief, and often the notes are misspelled and use many domain-specific acronyms. In this paper we introduce a novel domain-specific spelling correction algorithm which corrects the misspelled words in the call center logs to meaningful ones. We also discuss a procedure that builds the behavioral history sequences for the customers by categorizing the logs into one of the predefined behavioral states. We then describe a pattern-based predictive algorithm that uses temporal behavioral patterns mined from these sequences to predict the customer’s next behavioral state.

  10. Sharing programming resources between Bio* projects through remote procedure call and native call stack strategies.

    PubMed

    Prins, Pjotr; Goto, Naohisa; Yates, Andrew; Gautier, Laurent; Willis, Scooter; Fields, Christopher; Katayama, Toshiaki

    2012-01-01

    Open-source software (OSS) encourages computer programmers to reuse software components written by others. In evolutionary bioinformatics, OSS comes in a broad range of programming languages, including C/C++, Perl, Python, Ruby, Java, and R. To avoid writing the same functionality multiple times for different languages, it is possible to share components by bridging computer languages and Bio* projects, such as BioPerl, Biopython, BioRuby, BioJava, and R/Bioconductor. In this chapter, we compare the two principal approaches for sharing software between different programming languages: either by remote procedure call (RPC) or by sharing a local call stack. RPC provides a language-independent protocol over a network interface; examples are RSOAP and Rserve. The local call stack provides a between-language mapping not over the network interface, but directly in computer memory; examples are R bindings, RPy, and languages sharing the Java Virtual Machine stack. This functionality provides strategies for sharing of software between Bio* projects, which can be exploited more often. Here, we present cross-language examples for sequence translation, and measure throughput of the different options. We compare calling into R through native R, RSOAP, Rserve, and RPy interfaces, with the performance of native BioPerl, Biopython, BioJava, and BioRuby implementations, and with call stack bindings to BioJava and the European Molecular Biology Open Software Suite. In general, call stack approaches outperform native Bio* implementations and these, in turn, outperform RPC-based approaches. To test and compare strategies, we provide a downloadable BioNode image with all examples, tools, and libraries included. The BioNode image can be run on VirtualBox-supported operating systems, including Windows, OSX, and Linux.

  11. Robustness of Massively Parallel Sequencing Platforms

    PubMed Central

    Kavak, Pınar; Yüksel, Bayram; Aksu, Soner; Kulekci, M. Oguzhan; Güngör, Tunga; Hach, Faraz; Şahinalp, S. Cenk; Alkan, Can; Sağıroğlu, Mahmut Şamil

    2015-01-01

    The improvements in high throughput sequencing technologies (HTS) made clinical sequencing projects such as ClinSeq and Genomics England feasible. Although there are significant improvements in accuracy and reproducibility of HTS based analyses, the usability of these types of data for diagnostic and prognostic applications necessitates a near perfect data generation. To assess the usability of a widely used HTS platform for accurate and reproducible clinical applications in terms of robustness, we generated whole genome shotgun (WGS) sequence data from the genomes of two human individuals in two different genome sequencing centers. After analyzing the data to characterize SNPs and indels using the same tools (BWA, SAMtools, and GATK), we observed a significant number of discrepancies in the call sets. As expected, most of the disagreements between the call sets were found within genomic regions containing common repeats and segmental duplications, albeit only a small fraction of the discordant variants were within the exons and other functionally relevant regions such as promoters. We conclude that although HTS platforms are sufficiently powerful for providing data for first-pass clinical tests, the variant predictions still need to be confirmed using orthogonal methods before using in clinical applications. PMID:26382624
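
    Discrepancies between two call sets can be counted with plain set operations once each variant is reduced to a comparable key. The sketch below is a hypothetical illustration, not the study's pipeline: the `(chrom, pos, ref, alt)` key, the function name, and the example variants are all assumptions.

```python
def compare_call_sets(calls_a, calls_b):
    """Summarize agreement between two sets of (chrom, pos, ref, alt) variant keys."""
    shared = calls_a & calls_b
    union = calls_a | calls_b
    return {
        "shared": len(shared),
        "only_a": len(calls_a - calls_b),
        "only_b": len(calls_b - calls_a),
        # Fraction of all called variants that both centers agree on.
        "concordance": len(shared) / len(union) if union else 1.0,
    }


# Made-up example variants for the same individual from two centers.
center_1 = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"), ("chr2", 75, "G", "A")}
center_2 = {("chr1", 100, "A", "G"), ("chr2", 75, "G", "A"), ("chr3", 10, "T", "C")}
summary = compare_call_sets(center_1, center_2)
```

    A real comparison would also need to normalize indel representations before keying variants, which this sketch does not attempt.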

  12. Memory Palaces

    ERIC Educational Resources Information Center

    Wood, Marianne

    2007-01-01

    This article presents a lesson called Memory Palaces. A memory palace is a memory tool used to remember information, usually as visual images, in a sequence that is logical to the person remembering it. In his book, "In the Palaces of Memory", George Johnson calls them "...structure(s) for arranging knowledge. Lots of connections to language arts,…

  13. Short-read, high-throughput sequencing technology for STR genotyping

    PubMed Central

    Bornman, Daniel M.; Hester, Mark E.; Schuetter, Jared M.; Kasoji, Manjula D.; Minard-Smith, Angela; Barden, Curt A.; Nelson, Scott C.; Godbold, Gene D.; Baker, Christine H.; Yang, Boyu; Walther, Jacquelyn E.; Tornes, Ivan E.; Yan, Pearlly S.; Rodriguez, Benjamin; Bundschuh, Ralf; Dickens, Michael L.; Young, Brian A.; Faith, Seth A.

    2013-01-01

    DNA-based methods for human identification principally rely upon genotyping of short tandem repeat (STR) loci. Electrophoretic-based techniques for variable-length classification of STRs are universally utilized, but are limited in that they have relatively low throughput and do not yield nucleotide sequence information. High-throughput sequencing technology may provide a more powerful instrument for human identification, but is not currently validated for forensic casework. Here, we present a systematic method to perform high-throughput genotyping analysis of the Combined DNA Index System (CODIS) STR loci using short-read (150 bp) massively parallel sequencing technology. Open source reference alignment tools were optimized to evaluate PCR-amplified STR loci using a custom designed STR genome reference. Evaluation of this approach demonstrated that the 13 CODIS STR loci and amelogenin (AMEL) locus could be accurately called from individual and mixture samples. Sensitivity analysis showed that as few as 18,500 reads, aligned to an in silico referenced genome, were required to genotype an individual (>99% confidence) for the CODIS loci. The power of this technology was further demonstrated by identification of variant alleles containing single nucleotide polymorphisms (SNPs) and the development of quantitative measurements (reads) for resolving mixed samples. PMID:25621315
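
    To make the idea of calling an STR allele from sequence concrete, here is a deliberately simplified sketch that counts the longest uninterrupted run of a repeat motif in a single read. It is not the authors' method, which aligns reads to a custom STR reference; the function name, the motif, and the example read are hypothetical.

```python
import re


def longest_repeat_run(read: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of `motif` in `read`.

    Assumes `motif` is a non-empty string; returns 0 if it never occurs.
    """
    pattern = re.compile(f"(?:{re.escape(motif)})+")
    runs = [len(match.group(0)) // len(motif) for match in pattern.finditer(read)]
    return max(runs, default=0)


if __name__ == "__main__":
    # Flanking bases around three tandem copies of a tetranucleotide motif.
    print(longest_repeat_run("TTGATAGATAGATACC", "GATA"))
```

    Unlike length-based electrophoretic sizing, working on the sequence itself would also expose any base substitution inside the repeat, which is the advantage the abstract points to.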

  14. Mutation detection in the human HSP70B′ gene by denaturing high-performance liquid chromatography

    PubMed Central

    Hecker, Karl H.; Asea, Alexzander; Kobayashi, Kaoru; Green, Stacy; Tang, Dan; Calderwood, Stuart K.

    2000-01-01

    Variances, particularly single nucleotide polymorphisms (SNP), in the genomic sequence of individuals are the primary key to understanding gene function as it relates to differences in the susceptibility to disease, environmental influences, and therapy. In this report, the HSP70B′ gene is the target sequence for mutation detection in biopsy samples from human prostate cancer patients undergoing combined hyperthermia and radiation therapy at the Dana-Farber Cancer Institute, using temperature-modulated heteroduplex analysis (TMHA). The underlying principles of TMHA for mutation detection using DHPLC technology are discussed. The procedures involved in amplicon design for mutation analysis by DHPLC are detailed. The melting behavior of the complete coding sequence of the target gene is characterized using WAVEMAKER™ software. Four overlapping amplicons, which span the complete coding region of the HSP70B′ gene, amenable to mutation detection by DHPLC were identified based on the software-predicted melting profile of the target sequence. TMHA was performed on PCR products of individual amplicons of the HSP70B′ gene on the WAVE® Nucleic Acid Fragment Analysis System. The criteria for mutation calling by comparing wild-type and mutant chromatographic patterns are discussed. PMID:11189446

  15. Mutation detection in the human HSP70B' gene by denaturing high-performance liquid chromatography.

    PubMed

    Hecker, K H; Asea, A; Kobayashi, K; Green, S; Tang, D; Calderwood, S K

    2000-11-01

    Variances, particularly single nucleotide polymorphisms (SNP), in the genomic sequence of individuals are the primary key to understanding gene function as it relates to differences in the susceptibility to disease, environmental influences, and therapy. In this report, the HSP70B' gene is the target sequence for mutation detection in biopsy samples from human prostate cancer patients undergoing combined hyperthermia and radiation therapy at the Dana-Farber Cancer Institute, using temperature-modulated heteroduplex analysis (TMHA). The underlying principles of TMHA for mutation detection using DHPLC technology are discussed. The procedures involved in amplicon design for mutation analysis by DHPLC are detailed. The melting behavior of the complete coding sequence of the target gene is characterized using WAVEMAKER software. Four overlapping amplicons, which span the complete coding region of the HSP70B' gene, amenable to mutation detection by DHPLC were identified based on the software-predicted melting profile of the target sequence. TMHA was performed on PCR products of individual amplicons of the HSP70B' gene on the WAVE Nucleic Acid Fragment Analysis System. The criteria for mutation calling by comparing wild-type and mutant chromatographic patterns are discussed.

  16. Impact of genotyping errors on statistical power of association tests in genomic analyses: A case study

    PubMed Central

    Hou, Lin; Sun, Ning; Mane, Shrikant; Sayward, Fred; Rajeevan, Nallakkandi; Cheung, Kei-Hoi; Cho, Kelly; Pyarajan, Saiju; Aslan, Mihaela; Miller, Perry; Harvey, Philip D.; Gaziano, J. Michael; Concato, John; Zhao, Hongyu

    2017-01-01

    A key step in genomic studies is to assess high throughput measurements across millions of markers for each participant’s DNA, either using microarrays or sequencing techniques. Accurate genotype calling is essential for downstream statistical analysis of genotype-phenotype associations, and next generation sequencing (NGS) has recently become a more common approach in genomic studies. How the accuracy of variant calling in NGS-based studies affects downstream association analysis has not, however, been studied using empirical data in which both microarrays and NGS were available. In this article, we investigate the impact of variant calling errors on the statistical power to identify associations between single nucleotides and disease, and on associations between multiple rare variants and disease. Both differential and nondifferential genotyping errors are considered. Our results show that the power of burden tests for rare variants is strongly influenced by the specificity in variant calling, but is rather robust with regard to sensitivity. By using the variant calling accuracies estimated from a substudy of a Cooperative Studies Program project conducted by the Department of Veterans Affairs, we show that the power of association tests is mostly retained with commonly adopted variant calling pipelines. An R package, GWAS.PC, is provided to accommodate power analysis that takes account of genotyping errors (http://zhaocenter.org/software/). PMID:28019059
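
    One way to see why specificity matters so much for rare variants is the standard misclassification identity for an observed rate. The sketch below is not the GWAS.PC package and does not reproduce the paper's power calculations; the function names and the numeric inputs are illustrative assumptions.

```python
def observed_carrier_rate(true_rate: float, sensitivity: float, specificity: float) -> float:
    """Expected fraction of samples called as carriers.

    True carriers are detected with probability `sensitivity`;
    non-carriers are miscalled with probability 1 - `specificity`.
    """
    return sensitivity * true_rate + (1.0 - specificity) * (1.0 - true_rate)


def false_call_fraction(true_rate: float, sensitivity: float, specificity: float) -> float:
    """Fraction of called carriers that are actually false positives."""
    observed = observed_carrier_rate(true_rate, sensitivity, specificity)
    false_positives = (1.0 - specificity) * (1.0 - true_rate)
    return false_positives / observed if observed else 0.0


if __name__ == "__main__":
    rare = 0.01  # illustrative true carrier frequency
    # Imperfect sensitivity only shrinks the signal.
    print(observed_carrier_rate(rare, sensitivity=0.90, specificity=1.00))
    # Imperfect specificity floods a rare signal with false carriers.
    print(false_call_fraction(rare, sensitivity=1.00, specificity=0.99))
```

    With a carrier frequency of 1%, a 1% false-positive rate makes roughly half of all called carriers spurious, whereas a 10% loss of sensitivity removes only a tenth of the true carriers; this is consistent with the qualitative conclusion stated in the abstract.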

  17. Limiting parental interaction during vocal development affects acoustic call structure in marmoset monkeys

    PubMed Central

    2018-01-01

    Human vocal development is dependent on learning by imitation through social feedback between infants and caregivers. Recent studies have revealed that vocal development is also influenced by parental feedback in marmoset monkeys, suggesting vocal learning mechanisms in nonhuman primates. Marmoset infants that experience more contingent vocal feedback than their littermates develop vocalizations more rapidly, and infant marmosets with limited parental interaction exhibit immature vocal behavior beyond infancy. However, it is yet unclear whether direct parental interaction is an obligate requirement for proper vocal development because all monkeys in the aforementioned studies were able to produce the adult call repertoire after infancy. Using quantitative measures to compare distinct call parameters and vocal sequence structure, we show that social interaction has a direct impact not only on the maturation of the vocal behavior but also on acoustic call structures during vocal development. Monkeys with limited parental interaction during development show systematic differences in call entropy, a measure for maturity, compared with their normally raised siblings. In addition, different call types were occasionally uttered in motif-like sequences similar to those exhibited by vocal learners, such as birds and humans, in early vocal development. These results indicate that a lack of parental interaction leads to long-term disturbances in the acoustic structure of marmoset vocalizations, suggesting an imperative role for social interaction in proper primate vocal development. PMID:29651461

  18. Limiting parental interaction during vocal development affects acoustic call structure in marmoset monkeys.

    PubMed

    Gultekin, Yasemin B; Hage, Steffen R

    2018-04-01

    Human vocal development is dependent on learning by imitation through social feedback between infants and caregivers. Recent studies have revealed that vocal development is also influenced by parental feedback in marmoset monkeys, suggesting vocal learning mechanisms in nonhuman primates. Marmoset infants that experience more contingent vocal feedback than their littermates develop vocalizations more rapidly, and infant marmosets with limited parental interaction exhibit immature vocal behavior beyond infancy. However, it is yet unclear whether direct parental interaction is an obligate requirement for proper vocal development because all monkeys in the aforementioned studies were able to produce the adult call repertoire after infancy. Using quantitative measures to compare distinct call parameters and vocal sequence structure, we show that social interaction has a direct impact not only on the maturation of the vocal behavior but also on acoustic call structures during vocal development. Monkeys with limited parental interaction during development show systematic differences in call entropy, a measure for maturity, compared with their normally raised siblings. In addition, different call types were occasionally uttered in motif-like sequences similar to those exhibited by vocal learners, such as birds and humans, in early vocal development. These results indicate that a lack of parental interaction leads to long-term disturbances in the acoustic structure of marmoset vocalizations, suggesting an imperative role for social interaction in proper primate vocal development.

  19. RNA-ID, a highly sensitive and robust method to identify cis-regulatory sequences using superfolder GFP and a fluorescence-based assay.

    PubMed

    Dean, Kimberly M; Grayhack, Elizabeth J

    2012-12-01

    We have developed a robust and sensitive method, called RNA-ID, to screen for cis-regulatory sequences in RNA using fluorescence-activated cell sorting (FACS) of yeast cells bearing a reporter in which expression of both superfolder green fluorescent protein (GFP) and yeast codon-optimized mCherry red fluorescent protein (RFP) is driven by the bidirectional GAL1,10 promoter. This method recapitulates previously reported progressive inhibition of translation mediated by increasing numbers of CGA codon pairs, and restoration of expression by introduction of a tRNA with an anticodon that base pairs exactly with the CGA codon. This method also reproduces effects of paromomycin and context on stop codon read-through. Five key features of this method contribute to its effectiveness as a selection for regulatory sequences: The system exhibits greater than a 250-fold dynamic range, a quantitative and dose-dependent response to known inhibitory sequences, exquisite resolution that allows nearly complete physical separation of distinct populations, and a reproducible signal between different cells transformed with the identical reporter, all of which are coupled with simple methods involving ligation-independent cloning, to create large libraries. Moreover, we provide evidence that there are sequences within a 9-nt library that cause reduced GFP fluorescence, suggesting that there are novel cis-regulatory sequences to be found even in this short sequence space. This method is widely applicable to the study of both RNA-mediated and codon-mediated effects on expression.

  20. Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq)

    PubMed Central

    Langley, Alexander R.; Gräf, Stefan; Smith, James C.; Krude, Torsten

    2016-01-01

    Next-generation sequencing has enabled the genome-wide identification of human DNA replication origins. However, different approaches to mapping replication origins, namely (i) sequencing isolated small nascent DNA strands (SNS-seq); (ii) sequencing replication bubbles (bubble-seq) and (iii) sequencing Okazaki fragments (OK-seq), show only limited concordance. To address this controversy, we describe here an independent high-resolution origin mapping technique that we call initiation site sequencing (ini-seq). In this approach, newly replicated DNA is directly labelled with digoxigenin-dUTP near the sites of its initiation in a cell-free system. The labelled DNA is then immunoprecipitated and genomic locations are determined by DNA sequencing. Using this technique we identify >25,000 discrete origin sites at sub-kilobase resolution on the human genome, with high concordance between biological replicates. Most activated origins identified by ini-seq are found at transcriptional start sites and contain G-quadruplex (G4) motifs. They tend to cluster in early-replicating domains, providing a correlation between early replication timing and local density of activated origins. Origins identified by ini-seq show highest concordance with sites identified by SNS-seq, followed by OK-seq and bubble-seq. Furthermore, germline origins identified by positive nucleotide distribution skew jumps overlap with origins identified by ini-seq and OK-seq more frequently and more specifically than do sites identified by either SNS-seq or bubble-seq. PMID:27587586

  1. Genome-wide identification and characterisation of human DNA replication origins by initiation site sequencing (ini-seq).

    PubMed

    Langley, Alexander R; Gräf, Stefan; Smith, James C; Krude, Torsten

    2016-12-01

    Next-generation sequencing has enabled the genome-wide identification of human DNA replication origins. However, different approaches to mapping replication origins, namely (i) sequencing isolated small nascent DNA strands (SNS-seq); (ii) sequencing replication bubbles (bubble-seq) and (iii) sequencing Okazaki fragments (OK-seq), show only limited concordance. To address this controversy, we describe here an independent high-resolution origin mapping technique that we call initiation site sequencing (ini-seq). In this approach, newly replicated DNA is directly labelled with digoxigenin-dUTP near the sites of its initiation in a cell-free system. The labelled DNA is then immunoprecipitated and genomic locations are determined by DNA sequencing. Using this technique we identify >25,000 discrete origin sites at sub-kilobase resolution on the human genome, with high concordance between biological replicates. Most activated origins identified by ini-seq are found at transcriptional start sites and contain G-quadruplex (G4) motifs. They tend to cluster in early-replicating domains, providing a correlation between early replication timing and local density of activated origins. Origins identified by ini-seq show highest concordance with sites identified by SNS-seq, followed by OK-seq and bubble-seq. Furthermore, germline origins identified by positive nucleotide distribution skew jumps overlap with origins identified by ini-seq and OK-seq more frequently and more specifically than do sites identified by either SNS-seq or bubble-seq. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.

  2. From non-preemptive to preemptive scheduling using synchronization synthesis.

    PubMed

    Černý, Pavol; Clarke, Edmund M; Henzinger, Thomas A; Radhakrishna, Arjun; Ryzhyk, Leonid; Samanta, Roopsha; Tarrach, Thorsten

    2017-01-01

    We present a computer-aided programming approach to concurrency. The approach allows programmers to program assuming a friendly, non-preemptive scheduler, and our synthesis procedure inserts synchronization to ensure that the final program works even with a preemptive scheduler. The correctness specification is implicit, inferred from the non-preemptive behavior. Let us consider sequences of calls that the program makes to an external interface. The specification requires that any such sequence produced under a preemptive scheduler should be included in the set of sequences produced under a non-preemptive scheduler. We guarantee that our synthesis does not introduce deadlocks and that the synchronization inserted is optimal w.r.t. a given objective function. The solution is based on a finitary abstraction, an algorithm for bounded language inclusion modulo an independence relation, and generation of a set of global constraints over synchronization placements. Each model of the global constraints set corresponds to a correctness-ensuring synchronization placement. The placement that is optimal w.r.t. the given objective function is chosen as the synchronization solution. We apply the approach to device-driver programming, where the driver threads call the software interface of the device and the API provided by the operating system. Our experiments demonstrate that our synthesis method is precise and efficient. The implicit specification helped us find one concurrency bug previously missed when model-checking using an explicit, user-provided specification. We implemented objective functions for coarse-grained and fine-grained locking and observed that different synchronization placements are produced for our experiments, favoring a minimal number of synchronization operations or maximum concurrency, respectively.
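
    The implicit specification can be illustrated on a toy example with two threads, each given as a fixed sequence of calls to an external interface. This is a brute-force sketch, not the paper's algorithm: it ignores the independence relation and the finitary abstraction, assumes threads never yield under the non-preemptive scheduler, and uses hypothetical call names.

```python
def interleavings(a, b):
    """Yield every merge of tuples `a` and `b` that keeps each one's order."""
    if not a:
        yield tuple(b)
        return
    if not b:
        yield tuple(a)
        return
    for rest in interleavings(a[1:], b):
        yield (a[0],) + rest
    for rest in interleavings(a, b[1:]):
        yield (b[0],) + rest


def violating_sequences(thread_1, thread_2):
    """Call sequences possible with preemption but not without it."""
    t1, t2 = tuple(thread_1), tuple(thread_2)
    # Non-preemptive and no yields: one thread runs to completion, then the other.
    non_preemptive = {t1 + t2, t2 + t1}
    preemptive = set(interleavings(t1, t2))
    return preemptive - non_preemptive


if __name__ == "__main__":
    # Hypothetical driver threads calling a device interface.
    bad = violating_sequences(["dev_lock", "dev_write"], ["dev_read"])
    print(sorted(bad))
```

    Every sequence returned is one that synchronization would have to rule out; wrapping the first thread's two calls in a lock that the second thread also takes leaves the set empty, which is the kind of placement the synthesis procedure searches for.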

  3. MetaStorm: A Public Resource for Customizable Metagenomics Annotation

    PubMed Central

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S.; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution. PMID:27632579

  4. MetaStorm: A Public Resource for Customizable Metagenomics Annotation.

    PubMed

    Arango-Argoty, Gustavo; Singh, Gargi; Heath, Lenwood S; Pruden, Amy; Xiao, Weidong; Zhang, Liqing

    2016-01-01

    Metagenomics is a trending research area, calling for the need to analyze large quantities of data generated from next generation DNA sequencing technologies. The need to store, retrieve, analyze, share, and visualize such data challenges current online computational systems. Interpretation and annotation of specific information is especially a challenge for metagenomic data sets derived from environmental samples, because current annotation systems only offer broad classification of microbial diversity and function. Moreover, existing resources are not configured to readily address common questions relevant to environmental systems. Here we developed a new online user-friendly metagenomic analysis server called MetaStorm (http://bench.cs.vt.edu/MetaStorm/), which facilitates customization of computational analysis for metagenomic data sets. Users can upload their own reference databases to tailor the metagenomics annotation to focus on various taxonomic and functional gene markers of interest. MetaStorm offers two major analysis pipelines: an assembly-based annotation pipeline and the standard read annotation pipeline used by existing web servers. These pipelines can be selected individually or together. Overall, MetaStorm provides enhanced interactive visualization to allow researchers to explore and manipulate taxonomy and functional annotation at various levels of resolution.

  5. A method for the real-time construction of a full parallax light field

    NASA Astrophysics Data System (ADS)

    Tanaka, Kenji; Aoki, Soko

    2006-02-01

    We designed and implemented a light field acquisition and reproduction system for dynamic objects called LiveDimension, which serves as a 3D live video system for multiple viewers. The acquisition unit consists of circularly arranged NTSC cameras surrounding an object. The display consists of circularly arranged projectors and a rotating screen. The projectors are constantly projecting images captured by the corresponding cameras onto the screen. The screen rotates around an in-plane vertical axis at a sufficient speed so that it faces each of the projectors in sequence. Since the Lambertian surfaces of the screens are covered by light-collimating plastic films with vertical louver patterns that are used for the selection of appropriate light rays, viewers can only observe images from a projector located in the same direction as the viewer. Thus, the dynamic view of an object is dependent on the viewer's head position. We evaluated the system by projecting both objects and human figures and confirmed that the entire system can reproduce light fields with a horizontal parallax to display video sequences of 430x770 pixels at a frame rate of 45 fps. Applications of this system include product design reviews, sales promotion, art exhibits, fashion shows, and sports training with form checking.

  6. GFAST Software Demonstration

    NASA Image and Video Library

    2017-03-17

    NASA engineers and test directors gather in Firing Room 3 in the Launch Control Center at NASA's Kennedy Space Center in Florida, to watch a demonstration of the automated command and control software for the agency's Space Launch System (SLS) and Orion spacecraft. The software is called the Ground Launch Sequencer. It will be responsible for nearly all of the launch commit criteria during the final phases of launch countdowns. The Ground and Flight Application Software Team (GFAST) demonstrated the software. It was developed by the Command, Control and Communications team in the Ground Systems Development and Operations (GSDO) Program. GSDO is helping to prepare the center for the first test flight of Orion atop the SLS on Exploration Mission 1.

  7. CP decomposition approach to blind separation for DS-CDMA system using a new performance index

    NASA Astrophysics Data System (ADS)

    Rouijel, Awatif; Minaoui, Khalid; Comon, Pierre; Aboutajdine, Driss

    2014-12-01

    In this paper, we present a canonical polyadic (CP) tensor decomposition isolating the scaling matrix. This has two major implications: (i) the problem conditioning shows up explicitly and could be controlled through a constraint on the so-called coherences and (ii) a performance criterion concerning the factor matrices can be exactly calculated and is more realistic than performance metrics used in the literature. Two new algorithms optimizing the CP decomposition based on gradient descent are proposed. This decomposition is illustrated by an application to direct-sequence code division multiple access (DS-CDMA) systems; computer simulations are provided and demonstrate the good behavior of these algorithms, compared to others in the literature.

  8. From monkey-like action recognition to human language: an evolutionary framework for neurolinguistics.

    PubMed

    Arbib, Michael A

    2005-04-01

    The article analyzes the neural and functional grounding of language skills as well as their emergence in hominid evolution, hypothesizing stages leading from abilities known to exist in monkeys and apes and presumed to exist in our hominid ancestors right through to modern spoken and signed languages. The starting point is the observation that both premotor area F5 in monkeys and Broca's area in humans contain a "mirror system" active for both execution and observation of manual actions, and that F5 and Broca's area are homologous brain regions. This grounded the mirror system hypothesis of Rizzolatti and Arbib (1998) which offers the mirror system for grasping as a key neural "missing link" between the abilities of our nonhuman ancestors of 20 million years ago and modern human language, with manual gestures rather than a system for vocal communication providing the initial seed for this evolutionary process. The present article, however, goes "beyond the mirror" to offer hypotheses on evolutionary changes within and outside the mirror systems which may have occurred to equip Homo sapiens with a language-ready brain. Crucial to the early stages of this progression is the mirror system for grasping and its extension to permit imitation. Imitation is seen as evolving via a so-called simple system such as that found in chimpanzees (which allows imitation of complex "object-oriented" sequences but only as the result of extensive practice) to a so-called complex system found in humans (which allows rapid imitation even of complex sequences, under appropriate conditions) which supports pantomime. This is hypothesized to have provided the substrate for the development of protosign, a combinatorially open repertoire of manual gestures, which then provides the scaffolding for the emergence of protospeech (which thus owes little to nonhuman vocalizations), with protosign and protospeech then developing in an expanding spiral. 
    It is argued that these stages involve biological evolution of both brain and body. By contrast, it is argued that the progression from protosign and protospeech to languages with full-blown syntax and compositional semantics was a historical phenomenon in the development of Homo sapiens, involving few if any further biological changes.

  9. Governor Bush makes first phone call to KSC using new area code

    NASA Technical Reports Server (NTRS)

    1999-01-01

    In the videoconference room at Headquarters, key representatives of KSC contractors, along with KSC directorates, fill the room during an early morning phone call from Florida Governor Jeb Bush (seen on the video screen) in Tallahassee, Fla. The call is to inaugurate the change of KSC's area code from 407 to 321, effective today. Deputy Director for Business Operations Jim Jennings (fourth from right) received the call. Next to Jennings (at his right) is seated Robert Osband, Florida Space Institute, who suggested the 3-2-1 sequence to reflect the importance of the space industry to Florida's space coast.

  10. Robot Sequencing and Visualization Program (RSVP)

    NASA Technical Reports Server (NTRS)

    Cooper, Brian K.; Maxwell, Scott A.; Hartman, Frank R.; Wright, John R.; Yen, Jeng; Toole, Nicholas T.; Gorjian, Zareh; Morrison, Jack C.

    2013-01-01

    The Robot Sequencing and Visualization Program (RSVP) is being used in the Mars Science Laboratory (MSL) mission for downlink data visualization and command sequence generation. RSVP reads and writes downlink data products from the operations data server (ODS) and writes uplink data products to the ODS. The primary users of RSVP are members of the Rover Planner team (part of the Integrated Planning and Execution Team (IPE)), who use it to perform traversability/articulation analyses, take activity plan input from the Science and Mission Planning teams, and create a set of rover sequences to be sent to the rover every sol. The primary inputs to RSVP are downlink data products and activity plans in the ODS database. The primary outputs are command sequences to be placed in the ODS for further processing prior to uplink to each rover. RSVP is composed of two main subsystems. The first, called the Robot Sequence Editor (RoSE), understands the MSL activity and command dictionaries and takes care of converting incoming activity level inputs into command sequences. The Rover Planners use the RoSE component of RSVP to put together command sequences and to view and manage command level resources like time, power, temperature, etc. (via a transparent realtime connection to SEQGEN). The second component of RSVP is called HyperDrive, a set of high-fidelity computer graphics displays of the Martian surface in 3D and in stereo. The Rover Planners can explore the environment around the rover, create commands related to motion of all kinds, and see the simulated result of those commands via its underlying tight coupling with flight navigation, motor, and arm software. This software is the evolutionary replacement for the Rover Sequencing and Visualization software used to create command sequences (and visualize the Martian surface) for the Mars Exploration Rover mission.

  11. Comparative high-throughput transcriptome sequencing and development of SiESTa, the Silene EST annotation database

    PubMed Central

    2011-01-01

    Background The genus Silene is widely used as a model system for addressing ecological and evolutionary questions in plants, but advances in using the genus as a model system are impeded by the lack of available resources for studying its genome. Massively parallel sequencing cDNA has recently developed into an efficient method for characterizing the transcriptomes of non-model organisms, generating massive amounts of data that enable the study of multiple species in a comparative framework. The sequences generated provide an excellent resource for identifying expressed genes, characterizing functional variation and developing molecular markers, thereby laying the foundations for future studies on gene sequence and gene expression divergence. Here, we report the results of a comparative transcriptome sequencing study of eight individuals representing four Silene and one Dianthus species as outgroup. All sequences and annotations have been deposited in a newly developed and publicly available database called SiESTa, the Silene EST annotation database. Results A total of 1,041,122 EST reads were generated in two runs on a Roche GS-FLX 454 pyrosequencing platform. EST reads were analyzed separately for all eight individuals sequenced and were assembled into contigs using TGICL. These were annotated with results from BLASTX searches and Gene Ontology (GO) terms, and thousands of single-nucleotide polymorphisms (SNPs) were characterized. Unassembled reads were kept as singletons and together with the contigs contributed to the unigenes characterized in each individual. The high quality of unigenes is evidenced by the proportion (49%) that have significant hits in similarity searches with the A. thaliana proteome. The SiESTa database is accessible at http://www.siesta.ethz.ch. 
    Conclusion The sequence collections established in the present study provide an important genomic resource for four Silene and one Dianthus species and will help to further develop Silene as a plant model system. The genes characterized will be useful for future research not only in the species included in the present study, but also in related species for which no genomic resources are yet available. Our results demonstrate the efficiency of massively parallel transcriptome sequencing in a comparative framework as an approach for developing genomic resources in diverse groups of non-model organisms. PMID:21791039

  12. Challenges imposed by minor reference alleles on the identification and reporting of clinical variants from exome data.

    PubMed

    Koko, Mahmoud; Abdallah, Mohammed O E; Amin, Mutaz; Ibrahim, Muntaser

    2018-01-15

    The conventional variant calling of pathogenic alleles in exome and genome sequencing requires the presence of the non-pathogenic alleles as genome references. This hinders the correct identification of variants with minor and/or pathogenic reference alleles warranting additional approaches for variant calling. More than 26,000 Exome Aggregation Consortium (ExAC) variants have a minor reference allele including variants with known ClinVar disease alleles. For instance, in a number of variants related to clotting disorders, the phenotype-associated allele is a human genome reference allele (rs6025, rs6003, rs1799983, and rs2227564 using the assembly hg19). We highlighted how the current variant calling standards miss homozygous reference disease variants in these sites and provided a bioinformatic panel that can be used to screen these variants using commonly available variant callers. We present exome sequencing results from an individual with venous thrombosis to emphasize how pathogenic alleles in clinically relevant variants escape variant calling while non-pathogenic alleles are detected. This article highlights the importance of specialized variant calling strategies in clinical variants with minor reference alleles especially in the context of personal genomes and exomes. We provide here a simple strategy to screen potential disease-causing variants when present in homozygous reference state.
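    The screening idea in this abstract can be illustrated with a minimal sketch. It is not the authors' bioinformatic panel: the function name, the genotype dictionary and the example genotypes are hypothetical, and only the four rsIDs come from the abstract. The sketch assumes genotypes are available as VCF-style strings for every covered site, including homozygous reference ones.

```python
# rsIDs quoted in the abstract as having a phenotype-associated reference allele (hg19).
REFERENCE_RISK_SITES = {"rs6025", "rs6003", "rs1799983", "rs2227564"}


def screen_hom_ref(genotypes, risk_sites=REFERENCE_RISK_SITES):
    """Return (flagged, unknown) lists of rsIDs.

    `genotypes` maps rsID -> VCF-style genotype string ("0/0", "0/1", "1|1", ...).
    flagged: risk sites that are homozygous for the reference allele.
    unknown: risk sites with no genotype record. A variant-only call set cannot
    tell "homozygous reference" apart from "not covered", so these need review.
    """
    flagged, unknown = [], []
    for rsid in sorted(risk_sites):
        gt = genotypes.get(rsid)
        if gt is None:
            unknown.append(rsid)
            continue
        alleles = gt.replace("|", "/").split("/")
        if all(allele == "0" for allele in alleles):
            flagged.append(rsid)
    return flagged, unknown


# Hypothetical genotypes for one individual (illustration only).
example = {"rs6025": "0/0", "rs6003": "0/1", "rs1799983": "0|0"}
flagged, unknown = screen_hom_ref(example)
print(flagged)  # ['rs1799983', 'rs6025']
print(unknown)  # ['rs2227564']
```

    Reporting missing sites separately is a deliberate choice here: a site absent from the calls may be homozygous reference or simply uncovered, which is the ambiguity the abstract describes.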

  13. Composeable Chat over Low-Bandwidth Intermittent Communication Links

    DTIC Science & Technology

    2007-04-01

    Compression (STC), introduced in this report, is a data compression algorithm intended to compress alphanumeric... Ziv-Lempel coding, the grandfather of most modern general-purpose file compression programs, watches for input symbol sequences that have previously... data. This section applies these techniques to create a new compression algorithm called Small Text Compression. Various sequence compression

  14. Heterozygous mapping strategy (HetMapps) for high resolution genotyping-by-sequencing markers: a case study in grapevine

    USDA-ARS's Scientific Manuscript database

    Genotyping by sequencing (GBS) provides opportunities to generate high-resolution genetic maps at a low per-sample genotyping cost, but missing data and under-calling of heterozygotes complicate the creation of GBS linkage maps for highly heterozygous species. To overcome these issues, we developed ...

  15. The Babushka Concept--An Instructional Sequence to Enhance Laboratory Learning in Science Education

    ERIC Educational Resources Information Center

    Gårdebjer, Sofie; Larsson, Anette; Adawi, Tom

    2017-01-01

    This paper deals with a novel method for improving the traditional "verification" laboratory in science education. Drawing on the idea of integrated instructional units, we describe an instructional sequence which we call the Babushka concept. This concept consists of three integrated instructional units: a start-up lecture, a laboratory…

  16. JGI Plant Genomics Gene Annotation Pipeline

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Shu, Shengqiang; Rokhsar, Dan; Goodstein, David

    2014-07-14

    Plant genomes vary in size and are highly complex, with a high proportion of repeats, genome duplication and tandem duplication. Genes encode a wealth of information useful in studying an organism, and it is critical to have high-quality and stable gene annotation. Thanks to advances in sequencing technology, the genomes and transcriptomes of many plant species have been sequenced. To use these very large amounts of sequence data for gene annotation or re-annotation in a timely fashion, an automatic pipeline is needed. The JGI plant genomics gene annotation pipeline, called integrated gene call (IGC), is our effort toward this aim, with the aid of an RNA-seq transcriptome assembly pipeline. It utilizes several gene predictors based on homolog peptides and transcript ORFs. See Methods for detail. Here we present genome annotations of JGI flagship green plants produced by this pipeline, plus Arabidopsis and rice, except for chlamy, which was done by a third party. The genome annotations of these species and others are used in our gene family build pipeline and are accessible via the JGI Phytozome portal, whose URL and front page snapshot are shown below.

  17. Exploiting three kinds of interface propensities to identify protein binding sites.

    PubMed

    Liu, Bin; Wang, Xiaolong; Lin, Lei; Dong, Qiwen; Wang, Xuan

    2009-08-01

    Predicting the binding sites between two interacting proteins provides important clues to the function of a protein. In this study, we present a building block of proteins called order profiles, which uses the evolutionary information in protein sequence frequency profiles, and apply this building block to produce a class of propensities called order profile interface propensities. For comparison, we revisit the use of residue interface propensities and binary profile interface propensities for protein binding site prediction. Each kind of propensity, combined with sequence profiles and accessible surface areas, is used as input to an SVM. When tested on four types of complexes (hetero-permanent complexes, hetero-transient complexes, homo-permanent complexes and homo-transient complexes), experimental results show that the order profile interface propensities are better than residue interface propensities and binary profile interface propensities. Therefore, the order profile is a suitable profile-level building block of protein sequences and can be widely used in many tasks of computational biology, such as sequence alignment, the prediction of domain boundaries, the design of knowledge-based potentials and protein remote homology detection.

  18. Analysis of the origin of predictability in human communications

    NASA Astrophysics Data System (ADS)

    Zhang, Lin; Liu, Yani; Wu, Ye; Xiao, Jinghua

    2014-01-01

    Human behaviors in daily life can be traced by their communications via electronic devices. E-mails, short messages and cell-phone calls can be used to investigate the predictability of communication partners’ patterns, because these three are the most representative and common behaviors in daily communications. In this paper, we show that all the three manners have apparent predictability in partners’ patterns, and moreover, the short message users’ sequences have the highest predictability among the three. We also reveal that people with fewer communication partners have higher predictability. Finally, we investigate the origin of predictability, which comes from two aspects: one is the intrinsic pattern in the partners sequence, that is, people have the preference of communicating with a fixed partner after another fixed one. The other aspect is the burst, which is communicating with the same partner several times in a row. The high burst in short message communication pattern is one of the main reasons for its high predictability, the intrinsic pattern in e-mail partners sequence is the main reason for its predictability, and the predictability of cell-phone call partners sequence comes from both aspects.
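    The two sources of predictability named in this abstract can be made concrete with a small sketch. The measures below (fraction of repeated consecutive partners, and first-order conditional entropy of the next partner given the current one) are simple stand-ins chosen for illustration; the abstract does not state which estimators the authors used, and the function names and the example sequence are hypothetical.

```python
from collections import Counter, defaultdict
from math import log2


def burst_fraction(partners):
    """Fraction of consecutive contacts that repeat the previous partner."""
    if len(partners) < 2:
        return 0.0
    repeats = sum(1 for a, b in zip(partners, partners[1:]) if a == b)
    return repeats / (len(partners) - 1)


def conditional_entropy(partners):
    """First-order conditional entropy H(next | current) in bits.

    Lower values mean the next partner is easier to guess from the current
    one, i.e. a stronger intrinsic pattern in the partner sequence.
    """
    transitions = defaultdict(Counter)
    for a, b in zip(partners, partners[1:]):
        transitions[a][b] += 1
    total = len(partners) - 1
    h = 0.0
    for next_counts in transitions.values():
        n = sum(next_counts.values())
        for c in next_counts.values():
            h -= (n / total) * (c / n) * log2(c / n)
    return h


# Hypothetical partner sequence for one user.
seq = ["A", "A", "B", "A", "A", "B"]
print(burst_fraction(seq))       # 0.4
print(conditional_entropy(seq))  # 0.8
```

    In this toy sequence, two of the five transitions repeat the previous partner (the burst), while B is always followed by A (a fixed-partner pattern), which lowers the conditional entropy.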

  19. Integrated Safety Risk Reduction Approach to Enhancing Human-Rated Spaceflight Safety

    NASA Astrophysics Data System (ADS)

    Mikula, J. F. Kip

    2005-12-01

    This paper explores and defines the current accepted concept and philosophy of safety improvement based on a Reliability enhancement (called here Reliability Enhancement Based Safety Theory [REBST]). In this theory a Reliability calculation is used as a measure of the safety achieved on the program. This calculation may be based on a math model or a Fault Tree Analysis (FTA) of the system, or on an Event Tree Analysis (ETA) of the system's operational mission sequence. In each case, the numbers used in this calculation are hardware failure rates gleaned from past similar programs. As part of this paper, a fictional but representative case study is provided that helps to illustrate the problems and inaccuracies of this approach to safety determination. Then a safety determination and enhancement approach based on hazard, worst case analysis, and safety risk determination (called here Worst Case Based Safety Theory [WCBST]) is included. This approach is defined and detailed using the same example case study as shown in the REBST case study. In the end it is concluded that an approach combining the two theories works best to reduce Safety Risk.

  20. DeNovoGUI: An Open Source Graphical User Interface for de Novo Sequencing of Tandem Mass Spectra

    PubMed Central

    2013-01-01

    De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissive Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com. PMID:24295440

  1. DeNovoGUI: an open source graphical user interface for de novo sequencing of tandem mass spectra.

    PubMed

    Muth, Thilo; Weilnböck, Lisa; Rapp, Erdmann; Huber, Christian G; Martens, Lennart; Vaudel, Marc; Barsnes, Harald

    2014-02-07

    De novo sequencing is a popular technique in proteomics for identifying peptides from tandem mass spectra without having to rely on a protein sequence database. Despite the strong potential of de novo sequencing algorithms, their adoption threshold remains quite high. We here present a user-friendly and lightweight graphical user interface called DeNovoGUI for running parallelized versions of the freely available de novo sequencing software PepNovo+, greatly simplifying the use of de novo sequencing in proteomics. Our platform-independent software is freely available under the permissive Apache2 open source license. Source code, binaries, and additional documentation are available at http://denovogui.googlecode.com.

  2. Integrating multi-omic features exploiting Chromosome Conformation Capture data.

    PubMed

    Merelli, Ivan; Tordini, Fabio; Drocco, Maurizio; Aldinucci, Marco; Liò, Pietro; Milanesi, Luciano

    2015-01-01

    The representation, integration, and interpretation of omic data is a complex task, in particular considering the huge amount of information that is produced daily in molecular biology laboratories all around the world. The reason is that sequencing data regarding expression profiles, methylation patterns, and chromatin domains is difficult to harmonize in a systems biology view, since genome browsers only allow coordinate-based representations, discarding functional clusters created by the spatial conformation of the DNA in the nucleus. In this context, recent progress in high-throughput molecular biology techniques and bioinformatics has provided insights into chromatin interactions on a larger scale and offers formidable support for the interpretation of multi-omic data. In particular, a novel sequencing technique called Chromosome Conformation Capture allows the analysis of the chromosome organization in the cell's natural state. When performed genome-wide, this technique is usually called Hi-C. Inspired by service applications such as Google Maps, we developed NuChart, an R package that integrates Hi-C data to describe the chromosomal neighborhood starting from the information about gene positions, with the possibility of mapping genomic features such as methylation patterns and histone modifications, along with expression profiles, onto the resulting graphs. In this paper we show the importance of the NuChart application for the integration of multi-omic data in a systems biology fashion, with particular interest in cytogenetic applications of these techniques. Moreover, we demonstrate how the integration of multi-omic data can provide useful information in understanding why genes are in certain specific positions inside the nucleus and how epigenetic patterns correlate with their expression.

  3. Identification of genomic indels and structural variations using split reads

    PubMed Central

    2011-01-01

    Background Recent studies have demonstrated the genetic significance of insertions, deletions, and other more complex structural variants (SVs) in the human population. With the development of the next-generation sequencing technologies, high-throughput surveys of SVs on the whole-genome level have become possible. Here we present split-read identification, calibrated (SRiC), a sequence-based method for SV detection. Results We start by mapping each read to the reference genome in standard fashion using gapped alignment. Then to identify SVs, we score each of the many initial mappings with an assessment strategy designed to take into account both sequencing and alignment errors (e.g. scoring more highly events gapped in the center of a read). All current SV calling methods have multilevel biases in their identifications due to both experimental and computational limitations (e.g. calling more deletions than insertions). A key aspect of our approach is that we calibrate all our calls against synthetic data sets generated from simulations of high-throughput sequencing (with realistic error models). This allows us to calculate sensitivity and the positive predictive value under different parameter-value scenarios and for different classes of events (e.g. long deletions vs. short insertions). We run our calculations on representative data from the 1000 Genomes Project. Coupling the observed numbers of events on chromosome 1 with the calibrations gleaned from the simulations (for different length events) allows us to construct a relatively unbiased estimate for the total number of SVs in the human genome across a wide range of length scales. We estimate in particular that an individual genome contains ~670,000 indels/SVs. 
    Conclusions Compared with the existing read-depth and read-pair approaches for SV identification, our method can pinpoint the exact breakpoints of SV events, reveal the actual sequence content of insertions, and cover the whole size spectrum for deletions. Moreover, with the advent of the third-generation sequencing technologies that produce longer reads, we expect our method to be even more useful. PMID:21787423
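    The calibration step described in this abstract (scoring calls against simulated data to obtain sensitivity and positive predictive value, then using those figures to correct observed counts) might be sketched as follows. This is an illustration under stated assumptions, not the SRiC implementation: the function names, the event representation as (type, position) tuples and the correction formula are hypothetical choices.

```python
def calibrate(true_events, called_events):
    """Return (sensitivity, ppv) of a call set against a simulated truth set."""
    true_set, called_set = set(true_events), set(called_events)
    true_positives = len(true_set & called_set)
    sensitivity = true_positives / len(true_set) if true_set else 0.0
    ppv = true_positives / len(called_set) if called_set else 0.0
    return sensitivity, ppv


def corrected_count(observed_calls, sensitivity, ppv):
    """Estimate the true number of events from an observed call count.

    observed * ppv keeps only the calls expected to be real; dividing by
    sensitivity then accounts for real events that were missed.
    """
    return observed_calls * ppv / sensitivity


# Hypothetical simulated truth and calls for one event class.
simulated = {("del", 100), ("del", 250), ("ins", 400), ("ins", 900)}
called = {("del", 100), ("del", 250), ("ins", 400), ("del", 700)}
sens, ppv = calibrate(simulated, called)
print(sens, ppv)                        # 0.75 0.75
print(corrected_count(30, sens, ppv))   # 30.0
```

    In practice the calibration would be repeated per event class and length range (for example long deletions versus short insertions), since the abstract notes that biases differ between classes.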

  4. First insight into the viral community of the cnidarian model metaorganism Aiptasia using RNA-Seq data

    PubMed Central

    Brüwer, Jan D.

    2018-01-01

    Current research posits that all multicellular organisms live in symbioses with associated microorganisms and form so-called metaorganisms or holobionts. Cnidarian metaorganisms are of specific interest given that stony corals provide the foundation of the globally threatened coral reef ecosystems. To gain first insight into viruses associated with the coral model system Aiptasia (sensu Exaiptasia pallida), we analyzed an existing RNA-Seq dataset of aposymbiotic, partially populated, and fully symbiotic Aiptasia CC7 anemones with Symbiodinium. Our approach included the selective removal of anemone host and algal endosymbiont sequences and subsequent microbial sequence annotation. Of a total of 297 million raw sequence reads, 8.6 million (∼3%) remained after host and endosymbiont sequence removal. Of these, 3,293 sequences could be assigned as of viral origin. Taxonomic annotation of these sequences suggests that Aiptasia is associated with a diverse viral community, comprising 116 viral taxa covering 40 families. The viral assemblage was dominated by viruses from the families Herpesviridae (12.00%), Partitiviridae (9.93%), and Picornaviridae (9.87%). Despite an overall stable viral assemblage, we found that some viral taxa exhibited significant changes in their relative abundance when Aiptasia engaged in a symbiotic relationship with Symbiodinium. Elucidation of viral taxa consistently present across all conditions revealed a core virome of 15 viral taxa from 11 viral families, encompassing many viruses previously reported as members of coral viromes. Despite the non-random selection of viral genetic material due to the nature of the sequencing data analyzed, our study provides a first insight into the viral community associated with Aiptasia. Similarities of the Aiptasia viral community with those of corals corroborate the application of Aiptasia as a model system to study coral holobionts. 
    Further, the change in abundance of certain viral taxa across different symbiotic states suggests a role of viruses in the algal endosymbiosis, but the functional significance of this remains to be determined. PMID:29507840

  5. BPSK Demodulation Using Digital Signal Processing

    NASA Technical Reports Server (NTRS)

    Garcia, Thomas R.

    1996-01-01

    A digital communications signal is a sinusoidal waveform that is modified by a binary (digital) information signal. The sinusoidal waveform is called the carrier. The carrier may be modified in amplitude, frequency, phase, or a combination of these. In this project a binary phase shift keyed (BPSK) signal is the communication signal. In a BPSK signal the phase of the carrier is set to one of two states, 180 degrees apart, by a binary (i.e., 1 or 0) information signal. A digital signal is a sampled version of a "real world" time continuous signal. The digital signal is generated by sampling the continuous signal at discrete points in time. The rate at which the signal is sampled is called the sampling rate (f(s)). The device that performs this operation is called an analog-to-digital (A/D) converter or a digitizer. The digital signal is composed of the sequence of individual values of the sampled BPSK signal. Digital signal processing (DSP) is the modification of the digital signal by mathematical operations. A device that performs this processing is called a digital signal processor. After processing, the digital signal may then be converted back to an analog signal using a digital-to-analog (D/A) converter. The goal of this project is to develop a system that will recover the digital information from a BPSK signal using DSP techniques. The project is broken down into the following steps: (1) Development of the algorithms required to demodulate the BPSK signal; (2) Simulation of the system; and (3) Implementation a BPSK receiver using digital signal processing hardware.
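    As an illustration of the signal model in this abstract, the sketch below generates a sampled BPSK signal and recovers the bits by correlating each bit period with a reference carrier. It is not the report's receiver: it assumes a noiseless channel and perfect carrier and bit timing, which a practical receiver would have to recover, and the function names and parameter values are hypothetical.

```python
import math


def bpsk_modulate(bits, samples_per_bit=20, cycles_per_bit=2):
    """Sampled BPSK: bit 1 -> carrier phase 0, bit 0 -> carrier phase 180 degrees."""
    samples = []
    for bit in bits:
        phase = 0.0 if bit else math.pi
        for n in range(samples_per_bit):
            angle = 2 * math.pi * cycles_per_bit * n / samples_per_bit
            samples.append(math.cos(angle + phase))
    return samples


def bpsk_demodulate(samples, samples_per_bit=20, cycles_per_bit=2):
    """Coherent detection: correlate each bit period with the reference carrier.

    A positive correlation means the received phase matches the reference
    (bit 1); a negative correlation means it is inverted (bit 0).
    """
    bits = []
    for start in range(0, len(samples) - samples_per_bit + 1, samples_per_bit):
        corr = sum(
            samples[start + n]
            * math.cos(2 * math.pi * cycles_per_bit * n / samples_per_bit)
            for n in range(samples_per_bit)
        )
        bits.append(1 if corr > 0 else 0)
    return bits


message = [1, 0, 1, 1, 0, 0, 1]
recovered = bpsk_demodulate(bpsk_modulate(message))
print(recovered == message)  # True
```

    Here the sampling rate is implied by samples_per_bit: with 2 carrier cycles per bit and 20 samples per bit, the carrier is sampled 10 times per cycle.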

  6. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Ureba, A.; Salguero, F. J.; Barbeiro, A. R.

    Purpose: The authors present a hybrid direct multileaf collimator (MLC) aperture optimization model exclusively based on sequencing of patient imaging data to be implemented on a Monte Carlo treatment planning system (MC-TPS) to allow the explicit radiation transport simulation of advanced radiotherapy treatments with optimal results in efficient times for clinical practice. Methods: The planning system (called CARMEN) is a full MC-TPS, controlled through a MATLAB interface, which is based on the sequencing of a novel map, called “biophysical” map, which is generated from enhanced image data of patients to achieve a set of segments actually deliverable. In order to reduce the required computation time, the conventional fluence map has been replaced by the biophysical map which is sequenced to provide direct apertures that will later be weighted by means of an optimization algorithm based on linear programming. A ray-casting algorithm throughout the patient CT assembles information about the found structures, the mass thickness crossed, as well as PET values. Data are recorded to generate a biophysical map for each gantry angle. These maps are the input files for a home-made sequencer developed to take into account the interactions of photons and electrons with the MLC. For each linac (Axesse of Elekta and Primus of Siemens) and energy beam studied (6, 9, 12, 15 MeV and 6 MV), phase space files were simulated with the EGSnrc/BEAMnrc code. The dose calculation in patient was carried out with the BEAMDOSE code. This code is a modified version of EGSnrc/DOSXYZnrc able to calculate the beamlet dose in order to combine them with different weights during the optimization process.
    Results: Three complex radiotherapy treatments were selected to check the reliability of CARMEN in situations where the MC calculation can offer an added value: A head-and-neck case (Case I) with three targets delineated on PET/CT images and a demanding dose-escalation; a partial breast irradiation case (Case II) solved with photon and electron modulated beams (IMRT + MERT); and a prostatic bed case (Case III) with a pronounced concave-shaped PTV by using volumetric modulated arc therapy. In the three cases, the required target prescription doses and constraints on organs at risk were fulfilled in a short enough time to allow routine clinical implementation. The quality assurance protocol followed to check CARMEN system showed a high agreement with the experimental measurements. Conclusions: A Monte Carlo treatment planning model exclusively based on maps performed from patient imaging data has been presented. The sequencing of these maps allows obtaining deliverable apertures which are weighted for modulation under a linear programming formulation. The model is able to solve complex radiotherapy treatments with high accuracy in an efficient computation time.

  7. Fault trees and sequence dependencies

    NASA Technical Reports Server (NTRS)

    Dugan, Joanne Bechta; Boyd, Mark A.; Bavuso, Salvatore J.

    1990-01-01

    One of the frequently cited shortcomings of fault-tree models, their inability to model so-called sequence dependencies, is discussed. Several sources of such sequence dependencies are discussed, and new fault-tree gates to capture this behavior are defined. These complex behaviors can be included in present fault-tree models because they utilize a Markov solution. The utility of the new gates is demonstrated by presenting several models of the fault-tolerant parallel processor, which include both hot and cold spares.

  8. ACTG: novel peptide mapping onto gene models.

    PubMed

    Choi, Seunghyuk; Kim, Hyunwoo; Paek, Eunok

    2017-04-15

    In many proteogenomic applications, mapping peptide sequences onto genome sequences can be very useful, because it allows us to understand origins of the gene products. Existing software tools either take the genomic position of a peptide start site as an input or assume that the peptide sequence exactly matches the coding sequence of a given gene model. In case of novel peptides resulting from genomic variations, especially structural variations such as alternative splicing, these existing tools cannot be directly applied unless users supply information about the variant, either its genomic position or its transcription model. Mapping potentially novel peptides to genome sequences, while allowing certain genomic variations, requires introducing novel gene models when aligning peptide sequences to gene structures. We have developed a new tool called ACTG (Amino aCids To Genome), which maps peptides to genome, assuming all possible single exon skipping, junction variation allowing three edit distances from the original splice sites, exon extension and frame shift. In addition, it can also consider SNVs (single nucleotide variations) during mapping phase if a user provides the VCF (variant call format) file as an input. Available at http://prix.hanyang.ac.kr/ACTG/search.jsp. Contact: eunokpaek@hanyang.ac.kr. Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.
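    For contrast with the variation-tolerant mapping this abstract describes, the sketch below shows the baseline it improves on: exact-match mapping of a peptide onto a DNA sequence by translating the three forward reading frames. It is not the ACTG algorithm and handles no exon skipping, junction variation, frame shifts or SNVs; the function names and the example sequence are hypothetical, and only the standard genetic code is assumed.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate DNA from its first base; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def map_peptide(peptide, dna):
    """Return (frame, nucleotide_start) for each exact forward-strand match."""
    hits = []
    for frame in range(3):
        protein = translate(dna[frame:])
        pos = protein.find(peptide)
        while pos != -1:
            hits.append((frame, frame + 3 * pos))
            pos = protein.find(peptide, pos + 1)
    return hits


# Hypothetical sequence: ATG GCT AAA TGA (M A K stop) starts at index 2.
print(map_peptide("MAK", "CCATGGCTAAATGA"))  # [(2, 2)]
```

    A peptide that spans a splice junction or carries a variant would return no hit here, which is the limitation that motivates searching over alternative gene models.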

  9. Development of a New Marker System for Identification of Spirodela polyrhiza and Landoltia punctata

    PubMed Central

    Feng, Bo; Fang, Yang; Xu, Zhibin; Xiang, Chao; Zhou, Chunhong; Jiang, Fei; Wang, Tao

    2017-01-01

    Lemnaceae (commonly called duckweed) is a family of aquatic plants ideal for quantitative analysis in plant sciences. Several species of this family represent the smallest and fastest growing flowering plants. Different ecotypes of the same species vary in their biochemical and physiological properties. Thus, selecting desirable ecotypes of a species is very important. Here, we developed a simple and rapid molecular identification system for Spirodela polyrhiza and Landoltia punctata based on sequence polymorphism. First, several pairs of primers were designed and three markers were selected as good for identification. After PCR amplification, DNA fragments (the combination of three PCR products) in different duckweeds were detected using capillary electrophoresis. The high-resolution capillary electrophoresis displayed high identity to the sequencing results. The combination of the PCR products containing several DNA fragments highly improved the identification frequency. These results indicate that this method is not only good for interspecies identification but also ideal for intraspecies discrimination. Meanwhile, 11 haplotypes were found in both the S. polyrhiza and L. punctata ecotypes. The results suggest that this marker system is useful for large-scale identification of duckweed and for the screening of desirable ecotypes to improve the diverse usage in duckweed utilization. PMID:28168191

  10. Hunting for low abundant redox proteins in plant plasma membranes.

    PubMed

    Lüthje, Sabine; Hopff, David; Schmitt, Anna; Meisrimler, Claudia-Nicole; Menckhoff, Ljiljana

    2009-04-13

    Nowadays electron transport (redox) systems in plasma membranes appear well established. Members of the flavocytochrome b family have been identified by their nucleotide acid sequences and characterized on the transcriptional level. For their gene products functions have been demonstrated in iron uptake and oxidative stress including biotic interactions, abiotic stress factors and plant development. In addition, NAD(P)H-dependent oxidoreductases and b-type cytochromes have been purified and characterized from plasma membranes. Several of these proteins seem to belong to the group of hypothetical or unknown proteins. Low abundance and the lack of amino acid sequence data for these proteins still hamper their functional analysis. Consequently, little is known about the physiological function and regulation of these enzymes. In recent years evidence has been presented for the existence of microdomains (so-called lipid rafts) in plasma membranes and their interaction with specific membrane proteins. The identification of redox systems in detergent insoluble membranes supports the idea that redox systems may have important functions in signal transduction, stress responses, cell wall metabolism, and transport processes. This review summarizes our present knowledge on plasma membrane redox proteins and discusses alternative strategies to investigate the function and regulation of these enzymes.

  11. Hot subdwarfs in (eclipsing) binaries with brown dwarf or low-mass main-sequence companions

    NASA Astrophysics Data System (ADS)

    Schaffenroth, Veronika; Geier, Stephan; Heber, Uli

    2014-09-01

    The formation of hot subdwarf stars (sdBs), which are core helium-burning stars located on the extended horizontal branch, is not yet understood. Many of the known hot subdwarf stars reside in close binary systems with short orbital periods of between a few hours and a few days, with either M-star or white-dwarf companions. Common-envelope ejection is the most probable formation channel. Among these, eclipsing systems are of special importance because it is possible to constrain the parameters of both components tightly by combining spectroscopic and light-curve analyses. They are called HW Virginis systems. Soker (1998) proposed that planetary or brown-dwarf companions could cause the mass loss necessary to form an sdB. Substellar objects with masses greater than 10 M_J were predicted to survive the common-envelope phase and end up in a close orbit around the stellar remnant, while planets with lower masses would entirely evaporate. This raises the question of whether planets can affect stellar evolution. Here we report on newly discovered eclipsing and non-eclipsing hot subdwarf binaries with brown-dwarf or low-mass main-sequence companions and their spectral and photometric analysis to determine the fundamental parameters of both components.

  12. The era of immunogenomics/immunopharmacogenomics.

    PubMed

    Zewde, Makda; Kiyotani, Kazuma; Park, Jae-Hyun; Fang, Hua; Yap, Kai Lee; Yew, Poh Yin; Alachkar, Houda; Kato, Taigo; Mai, Tu H; Ikeda, Yuji; Matsuda, Tatsuo; Liu, Xiao; Ren, Lili; Deng, Boya; Harada, Makiko; Nakamura, Yusuke

    2018-05-21

    Although germline alterations and somatic mutations in disease cells have been extensively analyzed, molecular changes in immune cells associated with disease conditions have not been characterized in depth. It is clear that our immune system has a critical role in various biological and pathological conditions, such as infectious diseases, autoimmune diseases, drug-induced skin and liver toxicity, food allergy, and rejection of transplanted organs. The recent development of cancer immunotherapies, particularly drugs modulating the immune checkpoint molecules, has clearly demonstrated the importance of host immune cells in cancer treatments. However, the molecular mechanisms by which these new therapies kill tumor cells are still not fully understood. In this regard, we have begun to explore the role of newly developed tools such as next-generation sequencing in the genetic characterization of both cancer cells and host immune cells, a field that is called immunogenomics/immunopharmacogenomics. This new field has enormous potential to help us better understand changes in our immune system during the course of various disease conditions. Here we report the potential of deep sequencing of T-cell and B-cell receptors in capturing the molecular contribution of the immune system, which we believe plays critical roles in the pathogenesis of various human diseases.

  13. Microbe-ID: an open source toolbox for microbial genotyping and species identification.

    PubMed

    Tabima, Javier F; Everhart, Sydney E; Larsen, Meredith M; Weisberg, Alexandra J; Kamvar, Zhian N; Tancos, Matthew A; Smart, Christine D; Chang, Jeff H; Grünwald, Niklaus J

    2016-01-01

    Development of tools to identify species, genotypes, or novel strains of invasive organisms is critical for monitoring emergence and implementing rapid response measures. Molecular markers, although critical to identifying species or genotypes, require bioinformatic tools for analysis. However, user-friendly analytical tools for fast identification are not readily available. To address this need, we created a web-based set of applications called Microbe-ID that allow for customizing a toolbox for rapid species identification and strain genotyping using any genetic markers of choice. Two components of Microbe-ID, named Sequence-ID and Genotype-ID, implement species and genotype identification, respectively. Sequence-ID allows identification of species by using BLAST to query sequences for any locus of interest against a custom reference sequence database. Genotype-ID allows placement of an unknown multilocus marker in either a minimum spanning network or dendrogram with bootstrap support from a user-created reference database. Microbe-ID can be used for identification of any organism based on nucleotide sequences or any molecular marker type and several examples are provided. We created a public website for demonstration purposes called Microbe-ID (microbe-id.org) and provided a working implementation for the genus Phytophthora (phytophthora-id.org). In Phytophthora-ID, the Sequence-ID application allows identification based on ITS or cox spacer sequences. Genotype-ID groups individuals into clonal lineages based on simple sequence repeat (SSR) markers for the two invasive plant pathogen species P. infestans and P. ramorum. All code is open source and available on github and CRAN. Instructions for installation and use are provided at https://github.com/grunwaldlab/Microbe-ID.

  14. Innate Immune Complexity in the Purple Sea Urchin: Diversity of the Sp185/333 System

    PubMed Central

    Smith, L. Courtney

    2012-01-01

    The California purple sea urchin, Strongylocentrotus purpuratus, is a long-lived echinoderm with a complex and sophisticated innate immune system. There are several large gene families that function in immunity in this species including the Sp185/333 gene family that has ∼50 (±10) members. The family shows intriguing sequence diversity and encodes a broad array of diverse yet similar proteins. The genes have two exons of which the second encodes the mature protein and has repeats and blocks of sequence called elements. Mosaics of element patterns plus single nucleotide polymorphism-based variants of the elements result in significant sequence diversity among the genes yet maintain similar structure among the members of the family. Sequence of a bacterial artificial chromosome insert shows a cluster of six, tightly linked Sp185/333 genes that are flanked by GA microsatellites. The sequences between the GA microsatellites, in which the Sp185/333 genes and flanking regions are located, are much more similar to each other than are the sequences outside the microsatellites, suggesting processes such as gene conversion, recombination, or duplication. However, close linkage does not correspond with greater sequence similarity compared to randomly cloned and sequenced genes that are unlikely to be linked. There are three segmental duplications that are bounded by GAT microsatellites and include three almost identical genes plus flanking regions. RNA editing is detectable throughout the mRNAs based on comparisons to the genes, which, in combination with putative post-translational modifications to the proteins, results in broad arrays of Sp185/333 proteins that differ among individuals. The mature proteins have an N-terminal glycine-rich region, a central RGD motif, and a C-terminal histidine-rich region. The Sp185/333 proteins are localized to the cell surface and are found within vesicles in subsets of polygonal and small phagocytes. The coelomocyte proteome shows full-length and truncated proteins, including some with missense sequence. Current results suggest that both native Sp185/333 proteins and a recombinant protein bind bacteria and are likely important in sea urchin innate immunity. PMID:22566951

  15. GenomeFingerprinter: the genome fingerprint and the universal genome fingerprint analysis for systematic comparative genomics.

    PubMed

    Ai, Yuncan; Ai, Hannan; Meng, Fanmei; Zhao, Lei

    2013-01-01

    No attention has been paid to comparing a set of genome sequences crossing genetic components and biological categories with far divergence over a large size range. We define it as the systematic comparative genomics and aim to develop the methodology. First, we create a method, GenomeFingerprinter, to unambiguously produce a set of three-dimensional coordinates from a sequence, followed by one three-dimensional plot and six two-dimensional trajectory projections, to illustrate the genome fingerprint of a given genome sequence. Second, we develop a set of concepts and tools, and thereby establish a method called the universal genome fingerprint analysis (UGFA). Particularly, we define the total genetic component configuration (TGCC) (including chromosome, plasmid, and phage) for describing a strain as a systematic unit, the universal genome fingerprint map (UGFM) of TGCC for differentiating strains as a universal system, and the systematic comparative genomics (SCG) for comparing a set of genomes crossing genetic components and biological categories. Third, we construct a method of quantitative analysis to compare two genomes by using the outcome dataset of genome fingerprint analysis. Specifically, we define the geometric center and its geometric mean for a given genome fingerprint map, followed by the Euclidean distance, the differentiate rate, and the weighted differentiate rate to quantitatively describe the difference between two genomes of comparison. Moreover, we demonstrate the applications through case studies on various genome sequences, giving tremendous insights into the critical issues in microbial genomics and taxonomy. We have created a method, GenomeFingerprinter, for rapidly computing, geometrically visualizing, and intuitively comparing a set of genomes at genome fingerprint level, and hence established a method called the universal genome fingerprint analysis, as well as developed a method of quantitative analysis of the outcome dataset. These have set up the methodology of systematic comparative genomics based on the genome fingerprint analysis.
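
    As a rough illustration of the quantitative comparison step, the sketch below takes two sets of three-dimensional fingerprint coordinates, computes a geometric center for each, and reports the Euclidean distance between the centers. The abstract does not give the exact definitions, so treating the geometric center as the arithmetic mean of the points is an assumption, and the function names and toy coordinates are hypothetical.

```python
import math

def geometric_center(points):
    """Arithmetic mean of 3D points (assumed reading of 'geometric center')."""
    n = len(points)
    return tuple(sum(p[axis] for p in points) / n for axis in range(3))

def center_distance(points_a, points_b):
    """Euclidean distance between the geometric centers of two point sets."""
    return math.dist(geometric_center(points_a), geometric_center(points_b))

# Toy coordinates, not derived from any real genome.
fingerprint_a = [(0, 0, 0), (2, 0, 0), (1, 3, 0)]
fingerprint_b = [(1, 1, 4), (1, 1, 2)]
distance = center_distance(fingerprint_a, fingerprint_b)
```

    The differentiate rate and weighted differentiate rate named in the abstract are not sketched here because their formulas are not stated.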

  16. ToTem: a tool for variant calling pipeline optimization.

    PubMed

    Tom, Nikola; Tom, Ondrej; Malcikova, Jitka; Pavlova, Sarka; Kubesova, Blanka; Rausch, Tobias; Kolarik, Miroslav; Benes, Vladimir; Bystry, Vojtech; Pospisilova, Sarka

    2018-06-26

    High-throughput bioinformatics analyses of next generation sequencing (NGS) data often require challenging pipeline optimization. The key problem is choosing appropriate tools and selecting the best parameters for optimal precision and recall. Here we introduce ToTem, a tool for automated pipeline optimization. ToTem is a stand-alone web application with a comprehensive graphical user interface (GUI). ToTem is written in Java and PHP with an underlying connection to a MySQL database. Its primary role is to automatically generate, execute and benchmark different variant calling pipeline settings. Our tool allows an analysis to be started from any level of the process and with the possibility of plugging almost any tool or code. To prevent an over-fitting of pipeline parameters, ToTem ensures the reproducibility of these by using cross validation techniques that penalize the final precision, recall and F-measure. The results are interpreted as interactive graphs and tables allowing an optimal pipeline to be selected, based on the user's priorities. Using ToTem, we were able to optimize somatic variant calling from ultra-deep targeted gene sequencing (TGS) data and germline variant detection in whole genome sequencing (WGS) data. ToTem is a tool for automated pipeline optimization which is freely available as a web application at https://totem.software.
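
    The benchmarking metrics named above can be illustrated with a minimal sketch that compares a set of called variants with a truth set. This is not ToTem's code: the function name, the tuple representation of a variant, and the toy data are assumptions, and the cross-validation penalty the abstract mentions is not modelled.

```python
def benchmark_calls(called, truth):
    """Return (precision, recall, f_measure) for called vs. true variants."""
    called, truth = set(called), set(truth)
    true_pos = len(called & truth)    # called and present in the truth set
    false_pos = len(called - truth)   # called but absent from the truth set
    false_neg = len(truth - called)   # in the truth set but not called
    precision = true_pos / (true_pos + false_pos) if called else 0.0
    recall = true_pos / (true_pos + false_neg) if truth else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical variants as (chromosome, position, ref, alt) tuples.
truth = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
         ("chr2", 40, "G", "A"), ("chr2", 90, "T", "C")}
called = {("chr1", 100, "A", "G"), ("chr1", 250, "C", "T"),
          ("chr2", 41, "G", "A")}
precision, recall, f_measure = benchmark_calls(called, truth)
```

    In this toy case two of three calls are correct and two of four true variants are found, so a pipeline setting would be ranked on precision 2/3, recall 1/2 and F-measure 4/7.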

  17. A multilevel ant colony optimization algorithm for classical and isothermic DNA sequencing by hybridization with multiplicity information available.

    PubMed

    Kwarciak, Kamil; Radom, Marcin; Formanowicz, Piotr

    2016-04-01

    Classical sequencing by hybridization takes into account binary information about sequence composition. A given element from an oligonucleotide library is or is not a part of the target sequence. However, DNA chip technology has been developed and it makes it possible to obtain partial information about the multiplicity of each oligonucleotide that the analyzed sequence consists of. Currently, it is not possible to obtain exact data of this type, but even partial information should be very useful. Two realistic multiplicity information models are taken into consideration in this paper. The first one, called "one and many", assumes that it is possible to obtain information on whether a given oligonucleotide occurs in a reconstructed sequence once or more than once. According to the second model, called "one, two and many", one is able to learn from a biochemical experiment whether a given oligonucleotide is present in an analyzed sequence once, twice or at least three times. An ant colony optimization algorithm has been implemented to verify the above models and to compare with existing algorithms for sequencing by hybridization which utilize the additional information. The proposed algorithm solves the problem with any kind of hybridization errors. Computational experiment results confirm that using even the partial information about multiplicity leads to increased quality of reconstructed sequences. Moreover, they also show that the more precise model yields better solutions and that the ant colony optimization algorithm outperforms the existing ones. Test data sets and the proposed ant colony optimization algorithm are available at: http://bioserver.cs.put.poznan.pl/download/ACO4mSBH.zip. Copyright © 2016 Elsevier Ltd. All rights reserved.
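
    The two multiplicity models can be made concrete with a small sketch that builds an error-free spectrum from a known sequence and labels each oligonucleotide the way each model would report it. This only illustrates the input information, not the ant colony algorithm; the function name and the example sequence are hypothetical.

```python
from collections import Counter

def multiplicity_spectrum(sequence, k, model):
    """Label every k-mer of `sequence` under the chosen multiplicity model."""
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    if model == "one and many":
        label = lambda n: "one" if n == 1 else "many"
    elif model == "one, two and many":
        label = lambda n: {1: "one", 2: "two"}.get(n, "many")
    else:
        raise ValueError(f"unknown model: {model}")
    return {oligo: label(n) for oligo, n in counts.items()}

# ACG occurs three times, CGA and GAC twice, CGT once.
target = "ACGACGACGT"
coarse = multiplicity_spectrum(target, 3, "one and many")
fine = multiplicity_spectrum(target, 3, "one, two and many")
```

    Under "one and many" the oligonucleotides CGA and ACG are indistinguishable (both "many"), while "one, two and many" separates them, which is the extra precision the abstract credits with better reconstructions.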

  18. Decoding the genome beyond sequencing: the new phase of genomic research.

    PubMed

    Heng, Henry H Q; Liu, Guo; Stevens, Joshua B; Bremer, Steven W; Ye, Karen J; Abdallah, Batoul Y; Horne, Steven D; Ye, Christine J

    2011-10-01

    While our understanding of gene-based biology has greatly improved, it is clear that the function of the genome and most diseases cannot be fully explained by genes and other regulatory elements. Genes and the genome represent distinct levels of genetic organization with their own coding systems; Genes code parts like protein and RNA, but the genome codes the structure of genetic networks, which are defined by the whole set of genes, chromosomes and their topological interactions within a cell. Accordingly, the genetic code of DNA offers limited understanding of genome functions. In this perspective, we introduce the genome theory which calls for the departure of gene-centric genomic research. To make this transition for the next phase of genomic research, it is essential to acknowledge the importance of new genome-based biological concepts and to establish new technology platforms to decode the genome beyond sequencing. Copyright © 2011 Elsevier Inc. All rights reserved.

  19. The ASLOTS concept: An interactive, adaptive decision support concept for Final Approach Spacing of Aircraft (FASA). FAA-NASA Joint University Program

    NASA Technical Reports Server (NTRS)

    Simpson, Robert W.

    1993-01-01

    This presentation outlines a concept for an adaptive, interactive decision support system to assist controllers at a busy airport in achieving efficient use of multiple runways. The concept is being implemented as a computer code called FASA (Final Approach Spacing for Aircraft), and will be tested and demonstrated in ATCSIM, a high fidelity simulation of terminal area airspace and airport surface operations. Objectives are: (1) to provide automated cues to assist controllers in the sequencing and spacing of landing and takeoff aircraft; (2) to provide the controller with a limited ability to modify the sequence and spacings between aircraft, and to insert takeoffs and missed approach aircraft in the landing flows; (3) to increase spacing accuracy using more complex and precise separation criteria while reducing controller workload; and (4) achieve higher operational takeoff and landing rates on multiple runways in poor visibility.

  20. Deep Sequencing to Identify the Causes of Viral Encephalitis

    PubMed Central

    Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.

    2014-01-01

    Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691

  1. Synthetic internal control sequences to increase negative call veracity in multiplexed, quantitative PCR assays for Phakopsora pachyrhizi

    USDA-ARS's Scientific Manuscript database

    Quantitative PCR (Q-PCR) utilizing specific primer sequences and a fluorogenic, 5’-exonuclease linear hydrolysis probe is well established as a detection and identification method for Phakopsora pachyrhizi, the soybean rust pathogen. Because of the extreme sensitivity of Q-PCR, the DNA of a single u...

  2. Comment on "Protein sequences from mastodon and Tyrannosaurus rex revealed by mass spectrometry".

    PubMed

    Pevzner, Pavel A; Kim, Sangtae; Ng, Julio

    2008-08-22

    Asara et al. (Reports, 13 April 2007, p. 280) reported sequencing of Tyrannosaurus rex proteins and used them to establish the evolutionary relationships between birds and dinosaurs. We argue that the reported T. rex peptides may represent statistical artifacts and call for complete data release to enable experimental and computational verification of their findings.

  3. Physics First: An Informational Guide for Teachers, School Administrators, Parents, Scientists, and the Public

    ERIC Educational Resources Information Center

    American Association of Physics Teachers (NJ1), 2009

    2009-01-01

    Physics First represents an organizational alternative to the traditional high school science sequence. It calls for a re-sequencing of high school courses so that students study physics before chemistry and biology. The purpose of this pamphlet is to provide: (1) Basic information and rationale for the Physics First curriculum; (2) Strategies for…

  4. How to Help Students Conceptualize the Rigorous Definition of the Limit of a Sequence

    ERIC Educational Resources Information Center

    Roh, Kyeong Hah

    2010-01-01

    This article suggests an activity, called the epsilon-strip activity, as an instructional method for conceptualization of the rigorous definition of the limit of a sequence via visualization. The article also describes the learning objectives of each instructional step of the activity, and then provides detailed instructional methods to guide…
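
    The definition behind the activity can be explored numerically: a sequence converges to L exactly when, for every epsilon > 0, only finitely many terms fall outside the strip from L - epsilon to L + epsilon. The sketch below lists which of the first terms lie outside such a strip. Reading the "epsilon strip" as this band is an assumption, since the abstract does not spell it out, and the function name is hypothetical.

```python
def terms_outside_strip(sequence, limit, epsilon, n_terms):
    """Indices n (1-based) among the first n_terms with |a_n - limit| >= epsilon."""
    return [n for n in range(1, n_terms + 1)
            if abs(sequence(n) - limit) >= epsilon]

# a_n = 1/n: only the first few terms miss the strip around 0.
outside_harmonic = terms_outside_strip(lambda n: 1 / n, 0.0, 0.15, 1000)

# a_n = (-1)^n: every term misses the strip around 0, so 0 is not its limit.
outside_alternating = terms_outside_strip(lambda n: (-1) ** n, 0.0, 0.15, 1000)
```

    A finite check like this can only suggest convergence; the definition itself quantifies over all epsilon and all sufficiently large n.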

  5. Lindamood Phonemic Sequencing (LiPS) [R]. What Works Clearinghouse Intervention Report

    ERIC Educational Resources Information Center

    What Works Clearinghouse, 2008

    2008-01-01

    The Lindamood Phonemic Sequencing (LiPS)[R] program (formerly called the Auditory Discrimination in Depth[R] [ADD] program) is designed to teach students skills to decode words and to identify individual sounds and blends in words. The program is individualized to meet student needs and is often used with students who have learning disabilities or…

  6. Coval: Improving Alignment Quality and Variant Calling Accuracy for Next-Generation Sequencing Data

    PubMed Central

    Kosugi, Shunichi; Natsume, Satoshi; Yoshida, Kentaro; MacLean, Daniel; Cano, Liliana; Kamoun, Sophien; Terauchi, Ryohei

    2013-01-01

    Accurate identification of DNA polymorphisms using next-generation sequencing technology is challenging because of a high rate of sequencing error and incorrect mapping of reads to reference genomes. Currently available short read aligners and DNA variant callers suffer from these problems. We developed the Coval software to improve the quality of short read alignments. Coval is designed to minimize the incidence of spurious alignment of short reads, by filtering mismatched reads that remained in alignments after local realignment and error correction of mismatched reads. The error correction is executed based on the base quality and allele frequency at the non-reference positions for an individual or pooled sample. We demonstrated the utility of Coval by applying it to simulated genomes and experimentally obtained short-read data of rice, nematode, and mouse. Moreover, we found an unexpectedly large number of incorrectly mapped reads in ‘targeted’ alignments, where the whole genome sequencing reads had been aligned to a local genomic segment, and showed that Coval effectively eliminated such spurious alignments. We conclude that Coval significantly improves the quality of short-read sequence alignments, thereby increasing the calling accuracy of currently available tools for SNP and indel identification. Coval is available at http://sourceforge.net/projects/coval105/. PMID:24116042

  7. A fully redundant double difference algorithm for obtaining minimum variance estimates from GPS observations

    NASA Technical Reports Server (NTRS)

    Melbourne, William G.

    1986-01-01

    In double differencing a regression system obtained from concurrent Global Positioning System (GPS) observation sequences, one either undersamples the system to avoid introducing colored measurement statistics, or one fully samples the system incurring the resulting non-diagonal covariance matrix for the differenced measurement errors. A suboptimal estimation result will be obtained in the undersampling case and will also be obtained in the fully sampled case unless the color noise statistics are taken into account. The latter approach requires a least squares weighting matrix derived from inversion of a non-diagonal covariance matrix for the differenced measurement errors instead of inversion of the customary diagonal one associated with white noise processes. Presented is the so-called fully redundant double differencing algorithm for generating a weighted double differenced regression system that yields equivalent estimation results, but features for certain cases a diagonal weighting matrix even though the differenced measurement error statistics are highly colored.
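
    Why differencing produces colored (correlated) measurement errors can be seen in a small sketch. If the raw observations have independent errors of variance sigma^2 and the differenced observations are d = D y, the differenced errors have covariance sigma^2 D D^T, which is generally not diagonal. The example uses single differences against a common reference rather than the full double-difference scheme, and the matrix and function name are illustrative assumptions, not the report's algorithm.

```python
def difference_covariance(D, sigma2=1.0):
    """Covariance sigma2 * D D^T of differenced errors, for white raw errors."""
    return [[sigma2 * sum(a * b for a, b in zip(row_i, row_j)) for row_j in D]
            for row_i in D]

# Three raw observations y1, y2, y3; differences taken against y1.
D = [[-1, 1, 0],   # d1 = y2 - y1
     [-1, 0, 1]]   # d2 = y3 - y1
cov = difference_covariance(D)
```

    The off-diagonal 1.0 entries arise because both differences share y1; a least squares weighting that ignores them treats the differenced errors as white, which is the suboptimal case the abstract describes.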

  8. MethylViewer: computational analysis and editing for bisulfite sequencing and methyltransferase accessibility protocol for individual templates (MAPit) projects.

    PubMed

    Pardo, Carolina E; Carr, Ian M; Hoffman, Christopher J; Darst, Russell P; Markham, Alexander F; Bonthron, David T; Kladde, Michael P

    2011-01-01

    Bisulfite sequencing is a widely-used technique for examining cytosine DNA methylation at nucleotide resolution along single DNA strands. Probing with cytosine DNA methyltransferases followed by bisulfite sequencing (MAPit) is an effective technique for mapping protein-DNA interactions. Here, MAPit methylation footprinting with M.CviPI, a GC methyltransferase we previously cloned and characterized, was used to probe hMLH1 chromatin in HCT116 and RKO colorectal cancer cells. Because M.CviPI-probed samples contain both CG and GC methylation, we developed a versatile, visually-intuitive program, called MethylViewer, for evaluating the bisulfite sequencing results. Uniquely, MethylViewer can simultaneously query cytosine methylation status in bisulfite-converted sequences at as many as four different user-defined motifs, e.g. CG, GC, etc., including motifs with degenerate bases. Data can also be exported for statistical analysis and as publication-quality images. Analysis of hMLH1 MAPit data with MethylViewer showed that endogenous CG methylation and accessible GC sites were both mapped on single molecules at high resolution. Disruption of positioned nucleosomes on single molecules of the PHO5 promoter was detected in budding yeast using M.CviPII, increasing the number of enzymes available for probing protein-DNA interactions. MethylViewer provides an integrated solution for primer design and rapid, accurate and detailed analysis of bisulfite sequencing or MAPit datasets from virtually any biological or biochemical system.
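
    The per-motif query described above rests on a simple rule: after bisulfite conversion an unmethylated cytosine reads as T, while a methylated cytosine still reads as C. The sketch below applies that rule at CG and GC sites of one aligned molecule. It is not MethylViewer's code: the function name and sequences are hypothetical, degenerate bases are not supported, and sites where the two motifs overlap (GCG) are not treated specially.

```python
def methylation_calls(reference, read, motif, c_offset):
    """Call methylation at each motif occurrence in `reference`.

    `read` is a bisulfite-converted sequence aligned base-for-base to
    `reference` (same strand, no gaps); `c_offset` is the index of the
    queried cytosine within `motif` (0 for CG, 1 for GC).
    """
    calls = {}
    start = reference.find(motif)
    while start != -1:
        pos = start + c_offset
        if read[pos] == "C":
            calls[pos] = "methylated"
        elif read[pos] == "T":
            calls[pos] = "unmethylated"
        else:
            calls[pos] = "ambiguous"
        start = reference.find(motif, start + 1)
    return calls

reference = "ACGTGCATCGGC"
read = "ACGTGTATTGGC"   # cytosines at positions 5 and 8 were converted to T
cg_calls = methylation_calls(reference, read, "CG", 0)
gc_calls = methylation_calls(reference, read, "GC", 1)
```

    In a MAPit experiment the CG calls would reflect endogenous methylation and the GC calls would reflect accessibility to the GC methyltransferase probe on the same molecule.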

  9. Positional bias in variant calls against draft reference assemblies.

    PubMed

    Briskine, Roman V; Shimizu, Kentaro K

    2017-03-28

    Whole genome resequencing projects may implement variant calling using draft reference genomes assembled de novo from short-read libraries. Despite the lower quality of such assemblies, they have allowed researchers to extend a wide range of population genetic and genome-wide association analyses to non-model species. As the variant calling pipelines are complex and involve many software packages, it is important to understand inherent biases and limitations at each step of the analysis. In this article, we report a positional bias present in variant calling performed against draft reference assemblies constructed from de Bruijn or string overlap graphs. We assessed how frequently variants appeared at each position counted from ends of a contig or scaffold sequence, and discovered an unexpectedly high number of variants at the positions related to the length of either k-mers or reads used for the assembly. We detected the bias in both publicly available draft assemblies from the Assemblathon 2 competition as well as in the assemblies we generated from our simulated short-read data. Simulations confirmed that the bias-causing variants are predominantly false positives induced by reads from spatially distant repeated sequences. The bias is particularly strong in contig assemblies. Scaffolding does not eliminate the bias but tends to mitigate it because of the changes in variants' relative positions and alterations in read alignments. The bias can be effectively reduced by filtering out the variants that reside in repetitive elements. Draft genome sequences generated by several popular assemblers appear to be susceptible to the positional bias potentially affecting many resequencing projects in non-model species. The bias is inherent to the assembly algorithms and arises from their particular handling of repeated sequences. It is recommended to reduce the bias by filtering especially if a higher-quality genome assembly cannot be achieved. Our findings can help other researchers to improve the quality of their variant data sets and reduce artefactual findings in downstream analyses.
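
    The positional tally described in the abstract can be sketched as a histogram of each variant's position counted from the nearest contig end. The function name, the 1-based position convention, and the toy data are assumptions for illustration; real input would come from a VCF file and the assembly's sequence lengths.

```python
from collections import Counter

def end_distance_histogram(variants, contig_lengths):
    """Count variants by 1-based position measured from the nearest contig end."""
    histogram = Counter()
    for contig, pos in variants:
        length = contig_lengths[contig]
        from_left = pos                 # 1-based position from the left end
        from_right = length - pos + 1   # 1-based position from the right end
        histogram[min(from_left, from_right)] += 1
    return histogram

contig_lengths = {"contig1": 1000, "contig2": 500}
variants = [("contig1", 31), ("contig1", 970), ("contig2", 31),
            ("contig2", 250), ("contig1", 500)]
histogram = end_distance_histogram(variants, contig_lengths)
```

    A spike at a position matching the assembly k-mer length (31 in this made-up data) is the kind of signal that would prompt filtering of variants in repetitive elements.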

  10. Algorithm for Video Summarization of Bronchoscopy Procedures

    PubMed Central

    2011-01-01

    Background The duration of bronchoscopy examinations varies considerably depending on the diagnostic and therapeutic procedures used. It can last more than 20 minutes if a complex diagnostic work-up is included. With wide access to videobronchoscopy, the whole procedure can be recorded as a video sequence. Common practice relies on an active attitude of the bronchoscopist who initiates the recording process and usually chooses to archive only selected views and sequences. However, it may be important to record the full bronchoscopy procedure as documentation when liability issues are at stake. Furthermore, an automatic recording of the whole procedure enables the bronchoscopist to focus solely on the performed procedures. Video recordings registered during bronchoscopies include a considerable number of frames of poor quality due to blurry or unfocused images. It seems that such frames are unavoidable due to the relatively tight endobronchial space, rapid movements of the respiratory tract due to breathing or coughing, and secretions which occur commonly in the bronchi, especially in patients suffering from pulmonary disorders. Methods The use of recorded bronchoscopy video sequences for diagnostic, reference and educational purposes could be considerably extended with efficient, flexible summarization algorithms. Thus, the authors developed a prototype system to create shortcuts (called summaries or abstracts) of bronchoscopy video recordings. Such a system, based on models described in previously published papers, employs image analysis methods to exclude frames or sequences of limited diagnostic or educational value. Results The algorithm for the selection or exclusion of specific frames or shots from video sequences recorded during bronchoscopy procedures is based on several criteria, including automatic detection of "non-informative" frames, frames showing the branching of the airways and frames including pathological lesions. Conclusions The paper focuses on the challenge of generating summaries of bronchoscopy video recordings. PMID:22185344

  11. Single sea urchin phagocytes express messages of a single sequence from the diverse Sp185/333 gene family in response to bacterial challenge.

    PubMed

    Majeske, Audrey J; Oren, Matan; Sacchi, Sandro; Smith, L Courtney

    2014-12-01

    Immune systems in animals rely on fast and efficient responses to a wide variety of pathogens. The Sp185/333 gene family in the purple sea urchin, Strongylocentrotus purpuratus, consists of an estimated 50 (±10) members per genome that share a basic gene structure but show high sequence diversity, primarily due to the mosaic appearance of short blocks of sequence called elements. The genes show significantly elevated expression in three subpopulations of phagocytes responding to marine bacteria. The encoded Sp185/333 proteins are highly diverse and have central effector functions in the immune system. In this study we report the Sp185/333 gene expression in single sea urchin phagocytes. Sea urchins challenged with heat-killed marine bacteria resulted in a typical increase in coelomocyte concentration within 24 h, which included an increased proportion of phagocytes expressing Sp185/333 proteins. Phagocyte fractions enriched from coelomocytes were used in limiting dilutions to obtain samples of single cells that were evaluated for Sp185/333 gene expression by nested RT-PCR. Amplicon sequences showed identical or nearly identical Sp185/333 amplicon sequences in single phagocytes with matches to six known Sp185/333 element patterns, including both common and rare element patterns. This suggested that single phagocytes show restricted expression from the Sp185/333 gene family and infers a diverse, flexible, and efficient response to pathogens. This type of expression pattern from a family of immune response genes in single cells has not been identified previously in other invertebrates. Copyright © 2014 by The American Association of Immunologists, Inc.

  12. Scheduling with genetic algorithms

    NASA Technical Reports Server (NTRS)

    Fennel, Theron R.; Underbrink, A. J., Jr.; Williams, George P. W., Jr.

    1994-01-01

    In many domains, scheduling a sequence of jobs is an important function contributing to the overall efficiency of the operation. At Boeing, we develop schedules for many different domains, including assembly of military and commercial aircraft, weapons systems, and space vehicles. Boeing is under contract to develop scheduling systems for the Space Station Payload Planning System (PPS) and Payload Operations and Integration Center (POIC). These applications require that we respect certain sequencing restrictions among the jobs to be scheduled while at the same time assigning resources to the jobs. We call this general problem scheduling and resource allocation. Genetic algorithms (GA's) offer a search method that uses a population of solutions and benefits from intrinsic parallelism to search the problem space rapidly, producing near-optimal solutions. Good intermediate solutions are probabilistically recombined to produce better offspring (based upon some application specific measure of solution fitness, e.g., minimum flowtime, or schedule completeness). Also, at any point in the search, any intermediate solution can be accepted as a final solution; allowing the search to proceed longer usually produces a better solution while terminating the search at virtually any time may yield an acceptable solution. Many processes are constrained by restrictions of sequence among the individual jobs. For a specific job, other jobs must be completed beforehand. While there are obviously many other constraints on processes, it is these on which we focussed for this research: how to allocate crews to jobs while satisfying job precedence requirements and personnel, and tooling and fixture (or, more generally, resource) requirements.
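
    One way to picture the fitness evaluation inside such a search is a function that scores a candidate job order by total flowtime and rejects orders that break precedence. The sketch below is an assumption-laden simplification, not the report's system: it covers a single crew working jobs one after another, it omits resource allocation and the recombination step, and all names and durations are hypothetical.

```python
import math

def satisfies_precedence(order, predecessors):
    """True if every job appears after all of its required predecessors."""
    position = {job: index for index, job in enumerate(order)}
    return all(position[pred] < position[job]
               for job, preds in predecessors.items() for pred in preds)

def total_flowtime(order, durations):
    """Sum of job completion times when jobs run back to back in `order`."""
    elapsed, total = 0, 0
    for job in order:
        elapsed += durations[job]
        total += elapsed
    return total

def fitness(order, durations, predecessors):
    """Lower is better; infeasible orders get an infinite score."""
    if not satisfies_precedence(order, predecessors):
        return math.inf
    return total_flowtime(order, durations)

durations = {"frame": 3, "wiring": 2, "paint": 1}
predecessors = {"wiring": ["frame"], "paint": ["frame"]}
score_a = fitness(["frame", "wiring", "paint"], durations, predecessors)
score_b = fitness(["frame", "paint", "wiring"], durations, predecessors)
score_c = fitness(["paint", "frame", "wiring"], durations, predecessors)
```

    A genetic algorithm would keep a population of such orders and preferentially recombine the ones with lower scores; because any feasible order is usable, the search can be stopped at any point.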

  13. Computational power and generative capacity of genetic systems.

    PubMed

    Igamberdiev, Abir U; Shklovskiy-Kordi, Nikita E

    2016-01-01

    Semiotic characteristics of genetic sequences are based on the general principles of linguistics formulated by Ferdinand de Saussure, such as the arbitrariness of sign and the linear nature of the signifier. Besides these semiotic features that are attributable to the basic structure of the genetic code, the principle of generativity of genetic language is important for understanding biological transformations. The problem of generativity in genetic systems arises to a possibility of different interpretations of genetic texts, and corresponds to what Alexander von Humboldt called "the infinite use of finite means". These interpretations appear in the individual development as the spatiotemporal sequences of realizations of different textual meanings, as well as the emergence of hyper-textual statements about the text itself, which underlies the process of biological evolution. These interpretations are accomplished at the level of the readout of genetic texts by the structures defined by Efim Liberman as "the molecular computer of cell", which includes DNA, RNA and the corresponding enzymes operating with molecular addresses. The molecular computer performs physically manifested mathematical operations and possesses both reading and writing capacities. Generativity paradoxically resides in the biological computational system as a possibility to incorporate meta-statements about the system, and thus establishes the internal capacity for its evolution. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  14. Selection of an Aptamer Antidote to the Anticoagulant Drug Bivalirudin

    PubMed Central

    Martin, Jennifer A.; Parekh, Parag; Kim, Youngmi; Morey, Timothy E.; Sefah, Kwame; Gravenstein, Nikolaus; Dennis, Donn M.; Tan, Weihong

    2013-01-01

    Adverse drug reactions, including severe patient bleeding, may occur following the administration of anticoagulant drugs. Bivalirudin is a synthetic anticoagulant drug sometimes employed as a substitute for heparin, a commonly used anticoagulant that can cause a condition called heparin-induced thrombocytopenia (HIT). Although bivalirudin has the advantage of not causing HIT, a major concern is lack of an antidote for this drug. In contrast, medical professionals can quickly reverse the effects of heparin using protamine. This report details the selection of an aptamer to bivalirudin that functions as an antidote in buffer. This was accomplished by immobilizing the drug on a monolithic column to partition binding sequences from nonbinding sequences using a low-pressure chromatography system and salt gradient elution. The elution profile of binding sequences was compared to that of a blank column (no drug), and fractions with a chromatographic difference were analyzed via real-time PCR (polymerase chain reaction) and used for further selection. Sequences were identified by 454 sequencing and demonstrated low micromolar dissociation constants through fluorescence anisotropy after only two rounds of selection. One aptamer, JPB5, displayed a dose-dependent reduction of the clotting time in buffer, with a 20 µM aptamer achieving a nearly complete antidote effect. This work is expected to result in a superior safety profile for bivalirudin, resulting in enhanced patient care. PMID:23483901

  15. Phlugis ocraceovittata and its ultrasonic calling song (Orthoptera, Tettigoniidae, Phlugidini).

    PubMed

    Chamorro-Rengifo, Juliana; Braun, Holger

    2016-05-03

    Some observations on the small predatory katydid Phlugis ocraceovittata Piza 1960 from southern Brazil are presented. A male was calling both day and night, producing long uniformly structured sequences with maximum energy between 40 and 60 kHz. According to anecdotal and indirect evidence the species is not exclusively predacious and can live partly also on vegetable food.

  16. Image encryption using random sequence generated from generalized information domain

    NASA Astrophysics Data System (ADS)

    Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu

    2016-05-01

    A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.

  17. GRIDSS: sensitive and specific genomic rearrangement detection using positional de Bruijn graph assembly

    PubMed Central

    Do, Hongdo; Molania, Ramyar

    2017-01-01

    The identification of genomic rearrangements with high sensitivity and specificity using massively parallel sequencing remains a major challenge, particularly in precision medicine and cancer research. Here, we describe a new method for detecting rearrangements, GRIDSS (Genome Rearrangement IDentification Software Suite). GRIDSS is a multithreaded structural variant (SV) caller that performs efficient genome-wide break-end assembly prior to variant calling using a novel positional de Bruijn graph-based assembler. By combining assembly, split read, and read pair evidence using a probabilistic scoring, GRIDSS achieves high sensitivity and specificity on simulated, cell line, and patient tumor data, recently winning SV subchallenge #5 of the ICGC-TCGA DREAM8.5 Somatic Mutation Calling Challenge. On human cell line data, GRIDSS halves the false discovery rate compared to other recent methods while matching or exceeding their sensitivity. GRIDSS identifies nontemplate sequence insertions, microhomologies, and large imperfect homologies, estimates a quality score for each breakpoint, stratifies calls into high or low confidence, and supports multisample analysis. PMID:29097403

  18. Evaluation of Advanced Microwave Landing System Procedures in the New York Terminal Area

    DTIC Science & Technology

    1991-03-01

    sector controller called the CAMRN sector who must then sequence that traffic with multiple feeders from the south before handing off to the final...Right (13R) were all being used by landing traffic, the final controller handled the runway 22 arrivals and the CAMRN controller handled the runway 13R...Feeder Fix AAL678 DC10 H 00:09:00 AAL68 B767 H 00:23:00 AAL588 A300 H 00:27:00 PAA224 A300 H 01:20:00 4/ TWAll L101 H 01:34:00 CAMRN Feeder Fix DAL144

  19. Intelligent monitoring and control of semiconductor manufacturing equipment

    NASA Technical Reports Server (NTRS)

    Murdock, Janet L.; Hayes-Roth, Barbara

    1991-01-01

    The use of AI methods to monitor and control semiconductor fabrication in a state-of-the-art manufacturing environment called the Rapid Thermal Multiprocessor is described. Semiconductor fabrication involves many complex processing steps with limited opportunities to measure process and product properties. By applying additional process and product knowledge to that limited data, AI methods augment classical control methods by detecting abnormalities and trends, predicting failures, diagnosing, planning corrective action sequences, explaining diagnoses or predictions, and reacting to anomalous conditions that classical control systems typically would not correct. Research methodology and issues are discussed, and two diagnosis scenarios are examined.

  20. Coordinating complex problem-solving among distributed intelligent agents

    NASA Technical Reports Server (NTRS)

    Adler, Richard M.

    1992-01-01

    A process-oriented control model is described for distributed problem solving. The model coordinates the transfer and manipulation of information across independent networked applications, both intelligent and conventional. The model was implemented using SOCIAL, a set of object-oriented tools for distributed computing. Complex sequences of distributed tasks are specified in terms of high level scripts. Scripts are executed by SOCIAL objects called Manager Agents, which realize an intelligent coordination model that routes individual tasks to suitable server applications across the network. These tools are illustrated in a prototype distributed system for decision support of ground operations for NASA's Space Shuttle fleet.

  1. Merlon-type density waves in a compartmentalized conveyor system

    NASA Astrophysics Data System (ADS)

    Kanellopoulos, G.; van der Weele, K.

    2016-09-01

    Multi-particle flow through a cyclic array of K connected compartments with a preferential direction is known to be able to organize itself in the form of density waves [Kanellopoulos, Van der Meer, and Van der Weele, Phys. Rev. E 92, 022205 (2015)]. In this brief note we focus on the intriguing shape these waves take when K is even, in which case they travel through alternatingly dense and diluted compartments. We call them "merlon waves", since the sequence of high and low densities is reminiscent of the merlons and crenels on the battlements of medieval castles.

  2. Evaluation of new spectral bands for multi-spectral imaging: SMIRR aircraft test results

    USGS Publications Warehouse

    Goetz, Alexander F.H.; Rowan, Lawrence C.; Barringer, Anthony R.

    1980-01-01

    A 10-channel radiometer called the Shuttle Multispectral Infrared Radiometer (SMIRR) is scheduled to take data from orbit on the second shuttle orbital flight test. As part of the instrument test sequence, a series of aircraft flights was carried out over 10 test areas in Utah and Nevada. Apart from vegetation, the materials exposed at the surface were volcanic sequences ranging from tuffs to basalts, areas of hydrothermally altered volcanic rocks, sedimentary sequences of sandstone and carbonate rocks, and alluvial cover.

  3. A DNA sequence analysis package for the IBM personal computer.

    PubMed Central

    Lagrimini, L M; Brentano, S T; Donelson, J E

    1984-01-01

    We present here a collection of DNA sequence analysis programs, called "PC Sequence" (PCS), which are designed to run on the IBM Personal Computer (PC). These programs are written in IBM PC compiled BASIC and take full advantage of the IBM PC's speed, error handling, and graphics capabilities. For a modest initial expense in hardware any laboratory can use these programs to quickly perform computer analysis on DNA sequences. They are written with the novice user in mind and require very little training or previous experience with computers. Also provided are a text editing program for creating and modifying DNA sequence files and a communications program which enables the PC to communicate with and collect information from mainframe computers and DNA sequence databases. PMID:6546433

  4. An Exploration of Rhythmic Grouping of Speech Sequences by French- and German-Learning Infants

    PubMed Central

    Abboub, Nawal; Boll-Avetisyan, Natalie; Bhatara, Anjali; Höhle, Barbara; Nazzi, Thierry

    2016-01-01

    Rhythm in music and speech can be characterized by a constellation of several acoustic cues. Individually, these cues have different effects on rhythmic perception: sequences of sounds alternating in duration are perceived as short-long pairs (weak-strong/iambic pattern), whereas sequences of sounds alternating in intensity or pitch are perceived as loud-soft, or high-low pairs (strong-weak/trochaic pattern). This perceptual bias, called the Iambic-Trochaic Law (ITL), has been claimed to be a universal property of the auditory system applying in both the music and the language domains. Recent studies have shown that language experience can modulate the effects of the ITL on rhythmic perception of both speech and non-speech sequences in adults, and of non-speech sequences in 7.5-month-old infants. The goal of the present study was to explore whether language experience also modulates infants’ grouping of speech. To do so, we presented sequences of syllables to monolingual French- and German-learning 7.5-month-olds. Using the Headturn Preference Procedure (HPP), we examined whether they were able to perceive a rhythmic structure in sequences of syllables that alternated in duration, pitch, or intensity. Our findings show that both French- and German-learning infants perceived a rhythmic structure when it was cued by duration or pitch but not intensity. Our findings also show differences in how these infants use duration and pitch cues to group syllable sequences, suggesting that pitch cues were the easier ones to use. Moreover, performance did not differ across languages, failing to reveal early language effects on rhythmic perception. These results contribute to our understanding of the origin of rhythmic perception and perceptual mechanisms shared across music and speech, which may bootstrap language acquisition. PMID:27378887

  5. Homozygous and hemizygous CNV detection from exome sequencing data in a Mendelian disease cohort

    PubMed Central

    Gambin, Tomasz; Akdemir, Zeynep C.; Yuan, Bo; Gu, Shen; Chiang, Theodore; Carvalho, Claudia M.B.; Shaw, Chad; Jhangiani, Shalini; Boone, Philip M.; Eldomery, Mohammad K.; Karaca, Ender; Bayram, Yavuz; Stray-Pedersen, Asbjørg; Muzny, Donna; Charng, Wu-Lin; Bahrambeigi, Vahid; Belmont, John W.; Boerwinkle, Eric; Beaudet, Arthur L.; Gibbs, Richard A.

    2017-01-01

    We developed an algorithm, HMZDelFinder, that uses whole exome sequencing (WES) data to identify rare and intragenic homozygous and hemizygous (HMZ) deletions that may represent complete loss-of-function of the indicated gene. HMZDelFinder was applied to 4866 samples in the Baylor–Hopkins Center for Mendelian Genomics (BHCMG) cohort and detected 773 HMZ deletion calls (567 homozygous and 206 hemizygous) with an estimated sensitivity of 86.5% (82% for single-exonic and 88% for multi-exonic calls) and precision of 78% (53% for single-exonic and 96% for multi-exonic calls). Out of 773 HMZDelFinder-detected deletion calls, 82 were subjected to array comparative genomic hybridization (aCGH) and/or breakpoint PCR and 64 were confirmed. These include 18 single-exon deletions out of which 8 were exclusively detected by HMZDelFinder and not by any of seven other CNV detection tools examined. Further investigation of the 64 validated deletion calls revealed at least 15 pathogenic HMZ deletions. Of those, 7 accounted for 17–50% of pathogenic CNVs in different disease cohorts where 7.1–11% of the molecular diagnosis solved rate was attributed to CNVs. In summary, we present an algorithm to detect rare, intragenic, single-exon deletion CNVs using WES data; this tool can be useful for disease gene discovery efforts and clinical WES analyses. PMID:27980096
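
    The validation figures quoted in this abstract can be checked with a few lines of arithmetic. The sketch below is only an illustration, not part of HMZDelFinder: the function name `precision` and the variable names are hypothetical, and it assumes the reported 78% precision corresponds to the 64 confirmed calls out of the 82 calls that were tested.

```python
# Counts quoted in the abstract (variable names are illustrative only).
validated_calls = 82   # deletion calls tested by aCGH and/or breakpoint PCR
confirmed_calls = 64   # tested calls that were confirmed
homozygous, hemizygous = 567, 206


def precision(confirmed: int, tested: int) -> float:
    """Fraction of tested calls that were confirmed: TP / (TP + FP)."""
    if tested <= 0:
        raise ValueError("tested must be positive")
    return confirmed / tested


total_calls = homozygous + hemizygous

print(f"precision among validated calls: {precision(confirmed_calls, validated_calls):.1%}")
print(f"total HMZ deletion calls: {total_calls}")
```

    Under that assumption the ratio rounds to the 78% stated above, and the two call types sum to the 773 calls reported.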

  6. Dynamical decoupling of unbounded Hamiltonians

    NASA Astrophysics Data System (ADS)

    Arenz, Christian; Burgarth, Daniel; Facchi, Paolo; Hillier, Robin

    2018-03-01

    We investigate the possibility to suppress interactions between a finite dimensional system and an infinite dimensional environment through a fast sequence of unitary kicks on the finite dimensional system. This method, called dynamical decoupling, is known to work for bounded interactions, but physical environments such as bosonic heat baths are usually modeled with unbounded interactions; hence, here, we initiate a systematic study of dynamical decoupling for unbounded operators. We develop a sufficient decoupling criterion for arbitrary Hamiltonians and a necessary decoupling criterion for semibounded Hamiltonians. We give examples for unbounded Hamiltonians where decoupling works and the limiting evolution as well as the convergence speed can be explicitly computed. We show that decoupling does not always work for unbounded interactions and we provide both physically and mathematically motivated examples.

  7. An application of computer aided requirements analysis to a real time deep space system

    NASA Technical Reports Server (NTRS)

    Farny, A. M.; Morris, R. V.; Hartsough, C.; Callender, E. D.; Teichroew, D.; Chikofsky, E.

    1981-01-01

    The entire procedure of incorporating the requirements and goals of a space flight project into integrated, time ordered sequences of spacecraft commands, is called the uplink process. The Uplink Process Control Task (UPCT) was created to examine the uplink process and determine ways to improve it. The Problem Statement Language/Problem Statement Analyzer (PSL/PSA) designed to assist the designer/analyst/engineer in the preparation of specifications of an information system is used as a supporting tool to aid in the analysis. Attention is given to a definition of the uplink process, the definition of PSL/PSA, the construction of a PSA database, the value of analysis to the study of the uplink process, and the PSL/PSA lessons learned.

  8. Overexpression and Purification of C-terminal Fragment of the Passenger Domain of Hap Protein from Nontypeable Haemophilus influenzae in a Highly Optimized Escherichia coli Expression System

    PubMed Central

    Tabatabaee, Akram; Siadat, Seyed Davar; Moosavi, Seyed Fazllolah; Aghasadeghi, Mohammad Reza; Memarnejadian, Arash; Pouriayevali, Mohammad Hassan; Yavari, Neda

    2013-01-01

    Background Nontypeable Haemophilus influenzae (NTHi) is a common cause of respiratory tract disease and initiates infection by colonization of the nasopharynx. The Haemophilus influenzae (H. influenzae) Hap adhesin is an autotransporter protein that promotes initial interaction with human epithelial cells. Hap protein contains a 110 kDa internal passenger domain called “HapS” and a 45 kDa C-terminal translocator domain called “Hapβ”. Hap adhesive activity has been recently reported to be connected to its Cell Binding Domain (CBD) which resides within the 311 C-terminal residues of the internal passenger domain of the protein. Furthermore, immunization with this CBD protein has been shown to prevent bacterial nasopharynx colonization in animal models. Methods To provide sufficient amounts of pure HapS protein for vaccine studies, we sought to develop a highly optimized system to overexpress and purify the protein in large quantities. To this end, pET24a-cbd plasmid harboring cbd sequence from NTHi ATCC49766 was constructed and its expression was optimized by testing various expression parameters such as growth media, induction temperature, IPTG inducer concentration, induction stage and duration. SDS-PAGE and Western-blotting were used for protein analysis and confirmation and eventually the expressed protein was easily purified via immobilized metal affinity chromatography (IMAC) using Ni-NTA columns. Results The highest expression level of target protein was achieved when CBD expressing E. coli BL21 (DE3) cells were grown at 37°C in 2xTY medium with 1.0 mM IPTG at mid-log phase (OD600 nm equal to 0.6) for 5 hrs. Amino acid sequence alignment showed that the expressed CBD protein was more than 97% identical to 3 previously published CBD amino acid sequences, and antigenicity plot analysis further revealed 9 antigenic domains which appeared to be well conserved among the different analyzed CBD sequences. Conclusion Due to the high similarity among CBD from NTHi ATCC49766 and other NTHi strains, the CBD protein expressed here appears theoretically ideal as a universal candidate for vaccine studies against NTHi strains of various geographical areas. Further investigations to corroborate the potency of this protein as a vaccine candidate are in progress. PMID:23919121

  9. Geographic Variation in Advertisement Calls in a Tree Frog Species: Gene Flow and Selection Hypotheses

    PubMed Central

    Jang, Yikweon; Hahm, Eun Hye; Lee, Hyun-Jung; Park, Soyeon; Won, Yong-Jin; Choe, Jae C.

    2011-01-01

    Background In a species with a large distribution relative to its dispersal capacity, geographic variation in traits may be explained by gene flow, selection, or the combined effects of both. Studies of genetic diversity using neutral molecular markers show that patterns of isolation by distance (IBD) or barrier effect may be evident for geographic variation at the molecular level in amphibian species. However, selective factors such as habitat, predator, or interspecific interactions may be critical for geographic variation in sexual traits. We studied geographic variation in advertisement calls in the tree frog Hyla japonica to understand patterns of variation in these traits across Korea and provide clues about the underlying forces for variation. Methodology We recorded calls of H. japonica in three breeding seasons from 17 localities including localities in remote Jeju Island. Call characters analyzed were note repetition rate (NRR), note duration (ND), and dominant frequency (DF), along with snout-to-vent length. Results The findings of a barrier effect on DF and a longitudinal variation in NRR seemed to suggest that an open sea between the mainland and Jeju Island and mountain ranges dominated by the north-south Taebaek Mountains were related to geographic variation in call characters. Furthermore, there was a pattern of IBD in mitochondrial DNA sequences. However, no comparable pattern of IBD was found between geographic distance and call characters. We also failed to detect any effects of habitat or interspecific interaction on call characters. Conclusions Geographic variations in call characters as well as mitochondrial DNA sequences were largely stratified by geographic factors such as distance and barriers in Korean populations of H. japonica. Although we did not detect effects of habitat or interspecific interaction, some other selective factors such as sexual selection might still be operating on call characters in conjunction with restricted gene flow. PMID:21858061

  10. Defiant: (DMRs: easy, fast, identification and ANnoTation) identifies differentially Methylated regions from iron-deficient rat hippocampus.

    PubMed

    Condon, David E; Tran, Phu V; Lien, Yu-Chin; Schug, Jonathan; Georgieff, Michael K; Simmons, Rebecca A; Won, Kyoung-Jae

    2018-02-05

    Identification of differentially methylated regions (DMRs) is the initial step towards the study of DNA methylation-mediated gene regulation. Previous approaches to call DMRs suffer from false prediction, use extreme resources, and/or require library installation and input conversion. We developed a new approach called Defiant to identify DMRs. Employing Weighted Welch Expansion (WWE), Defiant showed superior performance to other predictors in the series of benchmarking tests on artificial and real data. Defiant was subsequently used to investigate DNA methylation changes in iron-deficient rat hippocampus. Defiant identified DMRs close to genes associated with neuronal development and plasticity, which were not identified by its competitor. Importantly, Defiant runs between 5 and 479 times faster than currently available software packages. Also, Defiant accepts 10 different input formats widely used for DNA methylation data. Defiant effectively identifies DMRs for whole-genome bisulfite sequencing (WGBS), reduced-representation bisulfite sequencing (RRBS), Tet-assisted bisulfite sequencing (TAB-seq), and HpaII tiny fragment enrichment by ligation-mediated PCR-tag (HELP) assays.

  11. 11,670 whole-genome sequences representative of the Han Chinese population from the CONVERGE project.

    PubMed

    Cai, Na; Bigdeli, Tim B; Kretzschmar, Warren W; Li, Yihan; Liang, Jieqin; Hu, Jingchu; Peterson, Roseann E; Bacanu, Silviu; Webb, Bradley Todd; Riley, Brien; Li, Qibin; Marchini, Jonathan; Mott, Richard; Kendler, Kenneth S; Flint, Jonathan

    2017-02-14

    The China, Oxford and Virginia Commonwealth University Experimental Research on Genetic Epidemiology (CONVERGE) project on Major Depressive Disorder (MDD) sequenced 11,670 female Han Chinese at low-coverage (1.7X), providing the first large-scale whole genome sequencing resource representative of the largest ethnic group in the world. Samples are collected from 58 hospitals from 23 provinces around China. We are able to call 22 million high quality single nucleotide polymorphisms (SNP) from the nuclear genome, representing the largest SNP call set from an East Asian population to date. We use these variants for imputation of genotypes across all samples, and this has allowed us to perform a successful genome wide association study (GWAS) on MDD. The utility of these data can be extended to studies of genetic ancestry in the Han Chinese and evolutionary genetics when integrated with data from other populations. Molecular phenotypes, such as copy number variations and structural variations can be detected, quantified and analysed in similar ways.

  12. DROMPA: easy-to-handle peak calling and visualization software for the computational analysis and validation of ChIP-seq data.

    PubMed

    Nakato, Ryuichiro; Itoh, Tahehiko; Shirahige, Katsuhiko

    2013-07-01

    Chromatin immunoprecipitation with high-throughput sequencing (ChIP-seq) can identify genomic regions that bind proteins involved in various chromosomal functions. Although the development of next-generation sequencers offers the technology needed to identify these protein-binding sites, the analysis can be computationally challenging because sequencing data sometimes consist of >100 million reads/sample. Herein, we describe a cost-effective and time-efficient protocol that is generally applicable to ChIP-seq analysis; this protocol uses a novel peak-calling program termed DROMPA to identify peaks and an additional program, parse2wig, to preprocess read-map files. This two-step procedure drastically reduces computational time and memory requirements compared with other programs. DROMPA enables the identification of protein localization sites in repetitive sequences and efficiently identifies both broad and sharp protein localization peaks. Specifically, DROMPA outputs a protein-binding profile map in pdf or png format, which can be easily manipulated by users who have a limited background in bioinformatics. © 2013 The Authors Genes to Cells © 2013 by the Molecular Biology Society of Japan and Wiley Publishing Asia Pty Ltd.

  13. VARiD: a variation detection framework for color-space and letter-space platforms.

    PubMed

    Dalca, Adrian V; Rumble, Stephen M; Levy, Samuel; Brudno, Michael

    2010-06-15

    High-throughput sequencing (HTS) technologies are transforming the study of genomic variation. The various HTS technologies have different sequencing biases and error rates, and while most HTS technologies sequence the residues of the genome directly, generating base calls for each position, the Applied Biosystem's SOLiD platform generates dibase-coded (color space) sequences. While combining data from the various platforms should increase the accuracy of variation detection, to date there are only a few tools that can identify variants from color space data, and none that can analyze color space and regular (letter space) data together. We present VARiD--a probabilistic method for variation detection from both letter- and color-space reads simultaneously. VARiD is based on a hidden Markov model and uses the forward-backward algorithm to accurately identify heterozygous, homozygous and tri-allelic SNPs, as well as micro-indels. Our analysis shows that VARiD performs better than the AB SOLiD toolset at detecting variants from color-space data alone, and improves the calls dramatically when letter- and color-space reads are combined. The toolset is freely available at http://compbio.cs.utoronto.ca/varid.
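
    The dibase (color space) coding mentioned in this abstract can be illustrated with a short sketch. It is not taken from VARiD; the function names are hypothetical, and it assumes the commonly described SOLiD convention in which each adjacent pair of bases maps to one of four colors, equivalent to XOR-ing 2-bit base codes with A=0, C=1, G=2, T=3.

```python
# Assumed 2-bit base codes; the color of a dinucleotide is the XOR of the two codes.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_BASE = {code: base for base, code in BASE_CODE.items()}


def to_color_space(seq: str) -> str:
    """Encode letter space as the first base followed by one color per adjacent base pair."""
    seq = seq.upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    colors = [str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(colors)


def to_letter_space(encoded: str) -> str:
    """Decode by walking from the known first base and applying each color in turn."""
    if not encoded:
        raise ValueError("encoded read must not be empty")
    bases = [encoded[0].upper()]
    for color in encoded[1:]:
        bases.append(CODE_BASE[BASE_CODE[bases[-1]] ^ int(color)])
    return "".join(bases)


reference = to_color_space("ATGGC")
variant = to_color_space("ATCGC")  # single-base substitution at the third position
print(reference, variant)
print(to_letter_space(reference))
```

    In this encoding a single-base substitution changes two adjacent colors, whereas an isolated color error alters every downstream base when decoded, which is one reason color-space and letter-space reads are awkward to analyze together.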

  14. Microbe-ID: an open source toolbox for microbial genotyping and species identification

    PubMed Central

    Tabima, Javier F.; Everhart, Sydney E.; Larsen, Meredith M.; Weisberg, Alexandra J.; Kamvar, Zhian N.; Tancos, Matthew A.; Smart, Christine D.; Chang, Jeff H.

    2016-01-01

    Development of tools to identify species, genotypes, or novel strains of invasive organisms is critical for monitoring emergence and implementing rapid response measures. Molecular markers, although critical to identifying species or genotypes, require bioinformatic tools for analysis. However, user-friendly analytical tools for fast identification are not readily available. To address this need, we created a web-based set of applications called Microbe-ID that allow for customizing a toolbox for rapid species identification and strain genotyping using any genetic markers of choice. Two components of Microbe-ID, named Sequence-ID and Genotype-ID, implement species and genotype identification, respectively. Sequence-ID allows identification of species by using BLAST to query sequences for any locus of interest against a custom reference sequence database. Genotype-ID allows placement of an unknown multilocus marker in either a minimum spanning network or dendrogram with bootstrap support from a user-created reference database. Microbe-ID can be used for identification of any organism based on nucleotide sequences or any molecular marker type and several examples are provided. We created a public website for demonstration purposes called Microbe-ID (microbe-id.org) and provided a working implementation for the genus Phytophthora (phytophthora-id.org). In Phytophthora-ID, the Sequence-ID application allows identification based on ITS or cox spacer sequences. Genotype-ID groups individuals into clonal lineages based on simple sequence repeat (SSR) markers for the two invasive plant pathogen species P. infestans and P. ramorum. All code is open source and available on github and CRAN. Instructions for installation and use are provided at https://github.com/grunwaldlab/Microbe-ID. PMID:27602267

  15. Mutation Scanning in Wheat by Exon Capture and Next-Generation Sequencing.

    PubMed

    King, Robert; Bird, Nicholas; Ramirez-Gonzalez, Ricardo; Coghill, Jane A; Patil, Archana; Hassani-Pak, Keywan; Uauy, Cristobal; Phillips, Andrew L

    2015-01-01

    Targeted Induced Local Lesions in Genomes (TILLING) is a reverse genetics approach to identify novel sequence variation in genomes, with the aims of investigating gene function and/or developing useful alleles for breeding. Despite recent advances in wheat genomics, most current TILLING methods are low to medium in throughput, being based on PCR amplification of the target genes. We performed a pilot-scale evaluation of TILLING in wheat by next-generation sequencing through exon capture. An oligonucleotide-based enrichment array covering ~2 Mbp of wheat coding sequence was used to carry out exon capture and sequencing on three mutagenised lines of wheat containing previously-identified mutations in the TaGA20ox1 homoeologous genes. After testing different mapping algorithms and settings, candidate SNPs were identified by mapping to the IWGSC wheat Chromosome Survey Sequences. Where sequence data for all three homoeologues were found in the reference, mutant calls were unambiguous; however, where the reference lacked one or two of the homoeologues, captured reads from these genes were mis-mapped to other homoeologues, resulting either in dilution of the variant allele frequency or assignment of mutations to the wrong homoeologue. Competitive PCR assays were used to validate the putative SNPs and estimate cut-off levels for SNP filtering. At least 464 high-confidence SNPs were detected across the three mutagenized lines, including the three known alleles in TaGA20ox1, indicating a mutation rate of ~35 SNPs per Mb, similar to that estimated by PCR-based TILLING. This demonstrates the feasibility of using exon capture for genome re-sequencing as a method of mutation detection in polyploid wheat, but accurate mutation calling will require an improved genomic reference with more comprehensive coverage of homoeologues.

  16. Sanger Confirmation Is Required to Achieve Optimal Sensitivity and Specificity in Next-Generation Sequencing Panel Testing.

    PubMed

    Mu, Wenbo; Lu, Hsiao-Mei; Chen, Jefferey; Li, Shuwei; Elliott, Aaron M

    2016-11-01

    Next-generation sequencing (NGS) has rapidly replaced Sanger sequencing as the method of choice for diagnostic gene-panel testing. For hereditary-cancer testing, the technical sensitivity and specificity of the assay are paramount as clinicians use results to make important clinical management and treatment decisions. There is significant debate within the diagnostics community regarding the necessity of confirming NGS variant calls by Sanger sequencing, considering that numerous laboratories report having 100% specificity from the NGS data alone. Here we report our results from 20,000 hereditary-cancer NGS panels spanning 47 genes, in which all 7845 nonpolymorphic variants were Sanger-sequenced. Of these, 98.7% were concordant between NGS and Sanger sequencing and 1.3% were identified as NGS false-positives, located mainly in complex genomic regions (A/T-rich regions, G/C-rich regions, homopolymer stretches, and pseudogene regions). Simulating a false-positive rate of zero by adjusting the variant-calling quality-score thresholds decreased the sensitivity of the assay from 100% to 97.8%, resulting in the missed detection of 176 Sanger-confirmed variants, the majority in complex genomic regions (n = 114) and mosaic mutations (n = 7). The data illustrate the importance of setting quality thresholds for panel testing only after thousands of samples have been processed and the necessity of Sanger confirmation of NGS variants to maintain the highest possible sensitivity. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
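
    The sensitivity trade-off described in this abstract can be reproduced approximately from the quoted counts. The sketch below is only a worked example: the variable and function names are hypothetical, and the choice of all 7845 Sanger-sequenced variants as the denominator is an assumption, since the abstract does not state which denominator was used.

```python
# Figures quoted in the abstract (names are illustrative only).
total_variants = 7845            # nonpolymorphic variants that were Sanger-sequenced
missed_at_zero_fp = 176          # Sanger-confirmed variants lost with stricter thresholds
ngs_false_positive_rate = 0.013  # share of variants identified as NGS false positives


def sensitivity(detected: int, total: int) -> float:
    """Fraction of variants that are still detected."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total


# Assumption: the denominator is every Sanger-sequenced variant.
sens = sensitivity(total_variants - missed_at_zero_fp, total_variants)
false_positives = round(ngs_false_positive_rate * total_variants)

print(f"sensitivity with thresholds tuned for zero false positives: {sens:.1%}")
print(f"approximate number of NGS false positives: {false_positives}")
```

    With that denominator the result rounds to the 97.8% reported above; the false-positive count is an estimate derived from the rounded 1.3% figure, not a number given in the abstract.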

  17. TreeShrink: fast and accurate detection of outlier long branches in collections of phylogenetic trees.

    PubMed

    Mai, Uyen; Mirarab, Siavash

    2018-05-08

    Sequence data used in reconstructing phylogenetic trees may include various sources of error. Typically errors are detected at the sequence level, but when missed, the erroneous sequences often appear as unexpectedly long branches in the inferred phylogeny. We propose an automatic method to detect such errors. We build a phylogeny including all the data then detect sequences that artificially inflate the tree diameter. We formulate an optimization problem, called the k-shrink problem, that seeks to find k leaves that could be removed to maximally reduce the tree diameter. We present an algorithm to find the exact solution for this problem in polynomial time. We then use several statistical tests to find outlier species that have an unexpectedly high impact on the tree diameter. These tests can use a single tree or a set of related gene trees and can also adjust to species-specific patterns of branch length. The resulting method is called TreeShrink. We test our method on six phylogenomic biological datasets and an HIV dataset and show that the method successfully detects and removes long branches. TreeShrink removes sequences more conservatively than rogue taxon removal and often reduces gene tree discordance more than rogue taxon removal once the amount of filtering is controlled. TreeShrink is an effective method for detecting sequences that lead to unrealistically long branch lengths in phylogenetic trees. The tool is publicly available at https://github.com/uym2/TreeShrink .
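
    The k-shrink problem stated in this abstract (find k leaves whose removal maximally reduces the tree diameter) can be illustrated on a toy tree. The sketch below is a brute-force enumeration, not the polynomial-time algorithm the authors describe and not TreeShrink's code; the edge-list format, the function names, and the example tree are all assumptions made for illustration.

```python
from itertools import combinations


def leaf_distances(edges):
    """Return (leaves, dist), where dist[a][b] is the path length between leaves a and b."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    leaves = sorted(node for node, nbrs in adj.items() if len(nbrs) == 1)
    dist = {}
    for src in leaves:
        seen = {src: 0.0}
        stack = [src]
        while stack:  # iterative traversal; paths in a tree are unique
            node = stack.pop()
            for nxt, w in adj[node]:
                if nxt not in seen:
                    seen[nxt] = seen[node] + w
                    stack.append(nxt)
        dist[src] = {leaf: seen[leaf] for leaf in leaves}
    return leaves, dist


def diameter(kept, dist):
    """Largest pairwise distance among the kept leaves (0.0 if fewer than two remain)."""
    return max((dist[a][b] for a, b in combinations(kept, 2)), default=0.0)


def k_shrink_bruteforce(edges, k):
    """Try every set of k leaves and return (removed_leaves, resulting_diameter)."""
    leaves, dist = leaf_distances(edges)
    best = None
    for removed in combinations(leaves, k):
        kept = [leaf for leaf in leaves if leaf not in removed]
        d = diameter(kept, dist)
        if best is None or d < best[1]:
            best = (removed, d)
    return best


# Hypothetical tree: leaves A, B, C, D; internal nodes x, y; D sits on a long branch.
toy_edges = [("A", "x", 1), ("B", "x", 1), ("x", "y", 1), ("C", "y", 1), ("D", "y", 10)]
print(k_shrink_bruteforce(toy_edges, 1))
```

    The enumeration grows combinatorially with the number of leaves, so it is only useful for seeing what the objective means on small examples; here the single leaf on the long branch is the one whose removal shrinks the diameter most.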

  18. Site-Specific Cassette Exchange Systems in the Aedes aegypti Mosquito and the Plutella xylostella Moth

    PubMed Central

    Haghighat-Khah, Roya Elaine; Scaife, Sarah; Martins, Sara; St John, Oliver; Matzen, Kelly Jean; Morrison, Neil; Alphey, Luke

    2015-01-01

    Genetically engineered insects are being evaluated as potential tools to decrease the economic and public health burden of mosquitoes and agricultural pest insects. Here we describe a new tool for the reliable and targeted genome manipulation of pest insects for research and field release using recombinase mediated cassette exchange (RMCE) mechanisms. We successfully demonstrated the established ΦC31-RMCE method in the yellow fever mosquito, Aedes aegypti, which is the first report of RMCE in mosquitoes. A new variant of this RMCE system, called iRMCE, combines the ΦC31-att integration system and Cre or FLP-mediated excision to remove extraneous sequences introduced as part of the site-specific integration process. Complete iRMCE was achieved in two important insect pests, Aedes aegypti and the diamondback moth, Plutella xylostella, demonstrating the transferability of the system across a wide phylogenetic range of insect pests. PMID:25830287

  19. Gene and translation initiation site prediction in metagenomic sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hyatt, Philip Douglas; LoCascio, Philip F; Hauser, Loren John

    2012-01-01

    Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.

  20. What can we learn about lyssavirus genomes using 454 sequencing?

    PubMed

    Höper, Dirk; Finke, Stefan; Freuling, Conrad M; Hoffmann, Bernd; Beer, Martin

    2012-01-01

    The main task of individual project number four, "Whole genome sequencing, virus-host adaptation, and molecular epidemiological analyses of lyssaviruses", within the network "Lyssaviruses--a potential re-emerging public health threat", is to provide high-quality complete genome sequences from lyssaviruses. These sequences are analysed in depth with regard to the diversity of the viral populations, covering both quasi-species and so-called defective interfering RNAs. Moreover, the sequence data will facilitate further epidemiological analyses, will provide insight into the evolution of lyssaviruses and will be the basis for the design of novel nucleic acid based diagnostics. The first results presented here indicate that not only can high-quality full-length lyssavirus genome sequences be generated, but efficient analysis of the viral population also becomes feasible.

  1. New Stopping Criteria for Segmenting DNA Sequences

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Li, Wentian

    2001-06-18

    We propose a solution to the problem of choosing a stopping criterion when segmenting inhomogeneous DNA sequences with complex statistical patterns. This new stopping criterion is based on the Bayesian information criterion in the model selection framework. When this criterion is applied to a telomere of S. cerevisiae and the complete sequence of E. coli, borders of biologically meaningful units were identified, and a more reasonable number of domains was obtained. We also introduce a measure called segmentation strength which can be used to control the delineation of large domains. The relationship between the average domain size and the threshold of segmentation strength is determined for several genome sequences.

  2. LongISLND: in silico sequencing of lengthy and noisy datatypes

    PubMed Central

    Lau, Bayo; Mohiyuddin, Marghoob; Mu, John C.; Fang, Li Tai; Bani Asadi, Narges; Dallett, Carolina; Lam, Hugo Y. K.

    2016-01-01

    Summary: LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. Availability and Implementation: LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd Contact: hugo.lam@roche.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:27667791

  3. Whole exome sequencing for familial bicuspid aortic valve identifies putative variants.

    PubMed

    Martin, Lisa J; Pilipenko, Valentina; Kaufman, Kenneth M; Cripe, Linda; Kottyan, Leah C; Keddache, Mehdi; Dexheimer, Phillip; Weirauch, Matthew T; Benson, D Woodrow

    2014-10-01

    Bicuspid aortic valve (BAV) is the most common congenital cardiovascular malformation. Although highly heritable, few causal variants have been identified. The purpose of this study was to identify genetic variants underlying BAV by whole exome sequencing a multiplex BAV kindred. Whole exome sequencing was performed on 17 individuals from a single family (BAV=3; other cardiovascular malformation, 3). Postvariant calling error control metrics were established after examining the relationship between Mendelian inheritance error rate and coverage, quality score, and call rate. To determine the most effective approach to identifying susceptibility variants from among 54 674 variants passing error control metrics, we evaluated 3 variant selection strategies frequently used in whole exome sequencing studies plus extended family linkage. No putative rare, high-effect variants were identified in all affected but no unaffected individuals. Eight high-effect variants were identified by ≥2 of the commonly used selection strategies; however, these were either common in the general population (>10%) or present in the majority of the unaffected family members. However, using extended family linkage, 3 synonymous variants were identified; all 3 variants were identified by at least one other strategy. These results suggest that traditional whole exome sequencing approaches, which assume causal variants alter coding sense, may be insufficient for BAV and other complex traits. Identification of disease-associated variants is facilitated by the use of segregation within families. © 2014 American Heart Association, Inc.

  4. Identification of DNA-binding proteins by combining auto-cross covariance transformation and ensemble learning.

    PubMed

    Liu, Bin; Wang, Shanyi; Dong, Qiwen; Li, Shumin; Liu, Xuan

    2016-04-20

    DNA-binding proteins play a pivotal role in various intra- and extra-cellular activities ranging from DNA replication to gene expression control. With the rapid development of next-generation sequencing techniques, the number of protein sequences is increasing at an unprecedented rate. Thus it is necessary to develop computational methods to identify DNA-binding proteins based only on protein sequence information. In this study, a novel method called iDNA-KACC is presented, which combines the Support Vector Machine (SVM) and the auto-cross covariance transformation. The protein sequences are first converted into a profile-based protein representation, and then converted into a series of fixed-length vectors by the auto-cross covariance transformation with Kmer composition. The sequence order effect can be effectively captured by this scheme. These vectors are then fed into the SVM to discriminate the DNA-binding proteins from the non-DNA-binding ones. iDNA-KACC achieves an overall accuracy of 75.16% and a Matthews correlation coefficient of 0.5 in a rigorous jackknife test. Its performance is further improved by employing an ensemble learning approach, and the improved predictor is called iDNA-KACC-EL. Experimental results on an independent dataset show that iDNA-KACC-EL outperforms all the other state-of-the-art predictors, indicating that it would be a useful computational tool for DNA-binding protein identification.
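
    The fixed-length property mentioned in this abstract can be illustrated with a minimal sketch of a generic auto-cross covariance transformation. This is not the iDNA-KACC implementation: the function name `acc_features`, the plain list-of-rows profile layout and the `max_lag` parameter are assumptions made for illustration, and the Kmer composition step is omitted.

```python
def acc_features(profile, max_lag):
    """Generic auto-cross covariance over a per-residue profile.

    profile: list of rows, one per sequence position, each row holding
             n_cols numeric scores (hypothetical layout).
    Returns max_lag * n_cols * n_cols values regardless of sequence length.
    """
    length = len(profile)
    n_cols = len(profile[0])
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the sequence length")
    # Column means taken over the whole sequence.
    means = [sum(row[c] for row in profile) / length for c in range(n_cols)]
    features = []
    for lag in range(1, max_lag + 1):
        span = length - lag
        for c1 in range(n_cols):
            for c2 in range(n_cols):
                # c1 == c2 gives the auto covariance, c1 != c2 the cross covariance.
                total = sum(
                    (profile[j][c1] - means[c1]) * (profile[j + lag][c2] - means[c2])
                    for j in range(span)
                )
                features.append(total / span)
    return features
```

    Under these assumptions, two proteins of different lengths yield vectors of the same size, which is what allows them to be passed to a classifier that expects fixed-length input.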

  5. Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic

    PubMed Central

    Derkach, Andriy; Chiang, Theodore; Gong, Jiafen; Addis, Laura; Dobbins, Sara; Tomlinson, Ian; Houlston, Richard; Pal, Deb K.; Strug, Lisa J.

    2014-01-01

    Motivation: Sufficiently powered case–control studies with next-generation sequence (NGS) data remain prohibitively expensive for many investigators. If feasible, a more efficient strategy would be to include publicly available sequenced controls. However, these studies can be confounded by differences in sequencing platform; alignment, single nucleotide polymorphism and variant calling algorithms; read depth; and selection thresholds. Assuming one can match cases and controls on the basis of ethnicity and other potential confounding factors, and one has access to the aligned reads in both groups, we investigate the effect of systematic differences in read depth and selection threshold when comparing allele frequencies between cases and controls. We propose a novel likelihood-based method, the robust variance score (RVS), that substitutes genotype calls by their expected values given observed sequence data. Results: We show theoretically that the RVS eliminates read depth bias in the estimation of minor allele frequency. We also demonstrate that, using simulated and real NGS data, the RVS method controls Type I error and has comparable power to the ‘gold standard’ analysis with the true underlying genotypes for both common and rare variants. Availability and implementation: An RVS R script and instructions can be found at strug.research.sickkids.ca, and at https://github.com/strug-lab/RVS. Contact: lisa.strug@utoronto.ca Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24733292

  6. On the joint spectral density of bivariate random sequences. Thesis Technical Report No. 21

    NASA Technical Reports Server (NTRS)

    Aalfs, David D.

    1995-01-01

    For univariate random sequences, the power spectral density acts like a probability density function of the frequencies present in the sequence. This dissertation extends that concept to bivariate random sequences. For this purpose, a function called the joint spectral density is defined that represents a joint probability weighing of the frequency content of pairs of random sequences. Given a pair of random sequences, the joint spectral density is not uniquely determined in the absence of any constraints. Two approaches to constraining the sequences are suggested: (1) assume the sequences are the margins of some stationary random field, (2) assume the sequences conform to a particular model that is linked to the joint spectral density. For both approaches, the properties of the resulting sequences are investigated in some detail, and simulation is used to corroborate theoretical results. It is concluded that under either of these two constraints, the joint spectral density can be computed from the non-stationary cross-correlation.
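
    The univariate statement in this abstract, that the power spectral density acts like a probability density over frequencies, can be sketched by normalizing a periodogram so that it sums to one. This is only an estimate from a single realization, not the joint spectral density defined in the thesis; the function name `normalized_periodogram` and the choice of a one-sided, mean-removed periodogram are assumptions.

```python
import cmath
import math


def normalized_periodogram(x):
    """One-sided periodogram of x, scaled so the values sum to 1."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    power = []
    for k in range(n // 2 + 1):
        # Naive discrete Fourier transform coefficient at frequency bin k.
        coeff = sum(
            centered[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0:
        raise ValueError("sequence has no variation, so no spectrum to normalize")
    return [p / total for p in power]
```

    For a pure sinusoid, almost all of the probability mass falls on the single bin at the sinusoid's frequency.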

  7. Tandem Mass Spectrum Sequencing: An Alternative to Database Search Engines in Shotgun Proteomics.

    PubMed

    Muth, Thilo; Rapp, Erdmann; Berven, Frode S; Barsnes, Harald; Vaudel, Marc

    2016-01-01

    Protein identification via database searches has become the gold standard in mass spectrometry based shotgun proteomics. However, as the quality of tandem mass spectra improves, direct mass spectrum sequencing gains interest as a database-independent alternative. In this chapter, the general principle of this so-called de novo sequencing is introduced along with pitfalls and challenges of the technique. The main tools available are presented with a focus on user friendly open source software which can be directly applied in everyday proteomic workflows.

  8. Clustered regularly interspaced short palindromic repeats (CRISPRs) for the genotyping of bacterial pathogens.

    PubMed

    Grissa, Ibtissem; Vergnaud, Gilles; Pourcel, Christine

    2009-01-01

    Clustered regularly interspaced short palindromic repeats (CRISPRs) are DNA sequences composed of a succession of repeats (23- to 47-bp long) separated by unique sequences called spacers. Polymorphism can be observed in different strains of a species and may be used for genotyping. We describe protocols and bioinformatics tools that allow the identification of CRISPRs from sequenced genomes, their comparison, and their component determination (the direct repeats and the spacers). A schematic representation of the spacer organization can be produced, allowing an easy comparison between strains.
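
    The repeat/spacer structure described here can be sketched for the simplest case, in which the direct repeat is already known and occurs without mismatches. The function name `extract_spacers` and the toy sequences in the usage note are hypothetical; the tools the abstract refers to must also discover the repeat and tolerate degenerate copies, which this sketch does not attempt.

```python
def extract_spacers(array_sequence, repeat):
    """Return the spacers lying between exact copies of a known direct repeat."""
    if not repeat:
        raise ValueError("repeat must be a non-empty string")
    pieces = array_sequence.split(repeat)
    # The first and last pieces are the flanking sequences, not spacers.
    return pieces[1:-1]
```

    Comparing the ordered spacer lists from two strains then gives a simple basis for the genotyping comparison mentioned in the abstract, for example by checking which spacers are shared.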

  9. An algebraic hypothesis about the primeval genetic code architecture.

    PubMed

    Sánchez, Robersy; Grau, Ricardo

    2009-09-01

    A plausible architecture of an ancient genetic code is derived from an extended base triplet vector space over the Galois field of the extended base alphabet {D,A,C,G,U}, where symbol D represents one or more hypothetical bases with unspecific pairings. We hypothesized that the high degeneration of a primeval genetic code with five bases and the gradual origin and improvement of a primeval DNA repair system could make possible the transition from ancient to modern genetic codes. Our results suggest that the Watson-Crick base pairings G≡C and A=U and the non-specific base pairing of the hypothetical ancestral base D, used to define the sum and product operations, are enough features to determine the coding constraints of the primeval and the modern genetic code, as well as the transition from the former to the latter. Geometrical and algebraic properties of this vector space reveal that the present codon assignment of the standard genetic code could be induced from a primeval codon assignment. Besides, the Fourier spectrum of the extended DNA genome sequences derived from the multiple sequence alignment suggests that the so-called period-3 property of the present coding DNA sequences could also exist in the ancient coding DNA sequences. The phylogenetic analyses achieved with metrics defined in the N-dimensional vector space (B^3)^N of DNA sequences and with the new evolutionary model presented here also suggest that an ancient DNA coding sequence with five or more bases does not contradict the expected evolutionary history.

  10. Evaluation of genomic high-throughput sequencing data generated on Illumina HiSeq and Genome Analyzer systems

    PubMed Central

    2011-01-01

    Background The generation and analysis of high-throughput sequencing data are becoming a major component of many studies in molecular biology and medical research. Illumina's Genome Analyzer (GA) and HiSeq instruments are currently the most widely used sequencing devices. Here, we comprehensively evaluate properties of genomic HiSeq and GAIIx data derived from two plant genomes and one virus, with read lengths of 95 to 150 bases. Results We provide quantifications and evidence for GC bias, error rates, error sequence context, effects of quality filtering, and the reliability of quality values. By combining different filtering criteria we reduced error rates 7-fold at the expense of discarding 12.5% of alignable bases. While overall error rates are low in HiSeq data we observed regions of accumulated wrong base calls. Only 3% of all error positions accounted for 24.7% of all substitution errors. Analyzing the forward and reverse strands separately revealed error rates of up to 18.7%. Insertions and deletions occurred at very low rates on average but increased to up to 2% in homopolymers. A positive correlation between read coverage and GC content was found depending on the GC content range. Conclusions The errors and biases we report have implications for the use and the interpretation of Illumina sequencing data. GAIIx and HiSeq data sets show slightly different error profiles. Quality filtering is essential to minimize downstream analysis artifacts. Supporting previous recommendations, the strand-specificity provides a criterion to distinguish sequencing errors from low abundance polymorphisms. PMID:22067484

  11. Optimized distributed systems achieve significant performance improvement on sorted merging of massive VCF files.

    PubMed

    Sun, Xiaobo; Gao, Jingjing; Jin, Peng; Eng, Celeste; Burchard, Esteban G; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen; Wang, Fusheng; Qin, Zhaohui S

    2018-06-01

    Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)-based high-performance computing (HPC) implementation, and the popular VCFTools. Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems.
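
    For orientation, the single-machine multiway merge that this abstract uses as a baseline can be sketched with a heap-based k-way merge. This is not one of the Hadoop, HBase or Spark schemas from the paper. The record layout `(chrom, pos, sample, genotype)`, the `CHROM_ORDER` table and the function names are assumptions for illustration, and real VCF parsing is left out.

```python
import heapq
from itertools import groupby

# Assumed chromosome ordering; a real pipeline would take it from the VCF header.
CHROM_ORDER = {
    name: rank
    for rank, name in enumerate([str(c) for c in range(1, 23)] + ["X", "Y", "MT"])
}


def locus_key(record):
    """Sort key: chromosome rank first, then position."""
    chrom, pos = record[0], record[1]
    return (CHROM_ORDER[chrom], pos)


def sorted_merge(streams):
    """Merge per-sample record streams, each already sorted by locus.

    Yields (chrom, pos, {sample: genotype}) with one row per locus.
    """
    merged = heapq.merge(*streams, key=locus_key)
    for (_, pos), group in groupby(merged, key=locus_key):
        records = list(group)
        yield records[0][0], pos, {sample: gt for _, _, sample, gt in records}
```

    Because `heapq.merge` is lazy, memory use stays proportional to the number of input streams rather than to their total size. It still runs on one machine, which is the kind of bottleneck the distributed schemas are meant to remove.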

  12. Optimized distributed systems achieve significant performance improvement on sorted merging of massive VCF files

    PubMed Central

    Gao, Jingjing; Jin, Peng; Eng, Celeste; Burchard, Esteban G; Beaty, Terri H; Ruczinski, Ingo; Mathias, Rasika A; Barnes, Kathleen; Wang, Fusheng

    2018-01-01

    Abstract Background Sorted merging of genomic data is a common data operation necessary in many sequencing-based studies. It involves sorting and merging genomic data from different subjects by their genomic locations. In particular, merging a large number of variant call format (VCF) files is frequently required in large-scale whole-genome sequencing or whole-exome sequencing projects. Traditional single-machine based methods become increasingly inefficient when processing large numbers of files due to the excessive computation time and Input/Output bottleneck. Distributed systems and more recent cloud-based systems offer an attractive solution. However, carefully designed and optimized workflow patterns and execution plans (schemas) are required to take full advantage of the increased computing power while overcoming bottlenecks to achieve high performance. Findings In this study, we custom-design optimized schemas for three Apache big data platforms, Hadoop (MapReduce), HBase, and Spark, to perform sorted merging of a large number of VCF files. These schemas all adopt the divide-and-conquer strategy to split the merging job into sequential phases/stages consisting of subtasks that are conquered in an ordered, parallel, and bottleneck-free way. In two illustrating examples, we test the performance of our schemas on merging multiple VCF files into either a single TPED or a single VCF file, which are benchmarked with the traditional single/parallel multiway-merge methods, message passing interface (MPI)–based high-performance computing (HPC) implementation, and the popular VCFTools. Conclusions Our experiments suggest all three schemas either deliver a significant improvement in efficiency or render much better strong and weak scalabilities over traditional methods. Our findings provide generalized scalable schemas for performing sorted merging on genetics and genomics data using these Apache distributed systems. PMID:29762754

  13. GraphTeams: a method for discovering spatial gene clusters in Hi-C sequencing data.

    PubMed

    Schulz, Tizian; Stoye, Jens; Doerr, Daniel

    2018-05-08

    Hi-C sequencing offers novel, cost-effective means to study the spatial conformation of chromosomes. We use data obtained from Hi-C experiments to provide new evidence for the existence of spatial gene clusters. These are sets of genes with associated functionality that exhibit close proximity to each other in the spatial conformation of chromosomes across several related species. We present the first gene cluster model capable of handling spatial data. Our model generalizes a popular computational model for gene cluster prediction, called δ-teams, from sequences to graphs. Following previous lines of research, we subsequently extend our model to allow for several vertices being associated with the same label. The model, called δ-teams with families, is particularly suitable for our application as it enables handling of gene duplicates. We develop algorithmic solutions for both models. We implemented the algorithm for discovering δ-teams with families and integrated it into a fully automated workflow for discovering gene clusters in Hi-C data, called GraphTeams. We applied it to human and mouse data to find intra- and interchromosomal gene cluster candidates. The results include intrachromosomal clusters that seem to exhibit a closer proximity in space than on their chromosomal DNA sequence. We further discovered interchromosomal gene clusters that contain genes from different chromosomes within the human genome, but are located on a single chromosome in mouse. By identifying δ-teams with families, we provide a flexible model to discover gene cluster candidates in Hi-C data. Our analysis of Hi-C data from human and mouse reveals several known gene clusters (thus validating our approach), but also a few sparsely studied or possibly unknown gene cluster candidates that could be the source of further experimental investigations.

  14. Light-emitting diode street lights reduce last-ditch evasive manoeuvres by moths to bat echolocation calls

    PubMed Central

    Wakefield, Andrew; Stone, Emma L.; Jones, Gareth; Harris, Stephen

    2015-01-01

    The light-emitting diode (LED) street light market is expanding globally, and it is important to understand how LED lights affect wildlife populations. We compared evasive flight responses of moths to bat echolocation calls experimentally under LED-lit and -unlit conditions. Significantly fewer moths performed ‘powerdive’ flight manoeuvres in response to bat calls (feeding buzz sequences from Nyctalus spp.) under an LED street light than in the dark. LED street lights reduce the anti-predator behaviour of moths, shifting the balance in favour of their predators, aerial hawking bats. PMID:26361558

  15. Random trinomial tree models and vanilla options

    NASA Astrophysics Data System (ADS)

    Ganikhodjaev, Nasir; Bayram, Kamola

    2013-09-01

    In this paper we introduce and study the random trinomial model. The usual trinomial model is prescribed by a triple of numbers (u, d, m). We call the triple (u, d, m) an environment of the trinomial model. A triple (Un, Dn, Mn), where {Un}, {Dn} and {Mn} are sequences of independent, identically distributed random variables with 0 < Dn < 1 < Un and Mn = 1 for all n, is called a random environment, and a trinomial tree model with a random environment is called a random trinomial model. The random trinomial model is considered to produce more accurate results than the random binomial model or the usual trinomial model.
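
    A minimal simulation sketch of a single price path under a random environment may make the definition concrete. Only the constraints 0 < Dn < 1 < Un and Mn = 1 come from the abstract. The uniform distributions for Un and Dn, the branch probabilities `p_up` and `p_down`, and the function name are assumptions made for illustration.

```python
import random


def simulate_random_trinomial_path(s0, n_steps, rng, p_up=1 / 3, p_down=1 / 3):
    """Simulate one path of a trinomial tree whose factors are redrawn each step."""
    prices = [s0]
    for _ in range(n_steps):
        u_n = 1.0 + rng.uniform(0.01, 0.20)  # assumed distribution, keeps U_n > 1
        d_n = 1.0 - rng.uniform(0.01, 0.20)  # assumed distribution, keeps 0 < D_n < 1
        m_n = 1.0                            # M_n = 1 for all n, as in the abstract
        draw = rng.random()
        if draw < p_up:
            factor = u_n
        elif draw < p_up + p_down:
            factor = d_n
        else:
            factor = m_n
        prices.append(prices[-1] * factor)
    return prices
```

    Pricing a vanilla option would additionally require a risk-neutral choice of the branch probabilities, which the abstract does not specify and this sketch does not attempt.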

  16. Large Scale Comparative Visualisation of Regulatory Networks with TRNDiff

    DOE PAGES

    Chua, Xin-Yi; Buckingham, Lawrence; Hogan, James M.; ...

    2015-06-01

    The advent of Next Generation Sequencing (NGS) technologies has seen explosive growth in genomic datasets, and dense coverage of related organisms, supporting study of subtle, strain-specific variations as a determinant of function. Such data collections present fresh and complex challenges for bioinformatics, those of comparing models of complex relationships across hundreds and even thousands of sequences. Transcriptional Regulatory Network (TRN) structures document the influence of regulatory proteins called Transcription Factors (TFs) on associated Target Genes (TGs). TRNs are routinely inferred from model systems or iterative search, and analysis at these scales requires simultaneous displays of multiple networks well beyond those of existing network visualisation tools [1]. In this paper we describe TRNDiff, an open source system supporting the comparative analysis and visualization of TRNs (and similarly structured data) from many genomes, allowing rapid identification of functional variations within species. The approach is demonstrated through a small scale multiple TRN analysis of the Fur iron-uptake system of Yersinia, suggesting a number of candidate virulence factors; and through a larger study exploiting integration with the RegPrecise database (http://regprecise.lbl.gov; [2]) - a collection of hundreds of manually curated and predicted transcription factor regulons drawn from across the entire spectrum of prokaryotic organisms.

  17. Pre-mRNA Introns as a Model for Cryptographic Algorithm: Theory and Experiments

    NASA Astrophysics Data System (ADS)

    Regoli, Massimo

    2010-01-01

    The RNA-Crypto System (RCS for short) is a symmetric-key algorithm for enciphering data. The idea for this new algorithm comes from the observation of nature, in particular of RNA behavior and some of its properties. RNA sequences have sections called introns. Introns, a term derived from "intragenic regions", are non-coding sections of precursor mRNA (pre-mRNA) or other RNAs that are removed (spliced out of the RNA) before the mature RNA is formed. Once the introns have been spliced out of a pre-mRNA, the resulting mRNA sequence is ready to be translated into a protein. The corresponding parts of a gene are known as introns as well. The nature and role of introns in pre-mRNA are not clear and are the subject of extensive research by biologists; in our case, however, we use the presence of introns in the RNA-Crypto System output as a strong method to add chaotic non-coding information and an unnecessary behaviour in the access to the secret key used to code the messages. In the RNA-Crypto System algorithm, the introns are sections of the ciphered message carrying non-coding information, just as in the precursor mRNA.

  18. Bio—Cryptography: A Possible Coding Role for RNA Redundancy

    NASA Astrophysics Data System (ADS)

    Regoli, M.

    2009-03-01

    The RNA-Crypto System (RCS for short) is a symmetric-key algorithm for enciphering data. The idea for this new algorithm comes from the observation of nature, in particular of RNA behavior and some of its properties. RNA sequences have sections called introns. Introns, a term derived from "intragenic regions," are non-coding sections of precursor mRNA (pre-mRNA) or other RNAs that are removed (spliced out of the RNA) before the mature RNA is formed. Once the introns have been spliced out of a pre-mRNA, the resulting mRNA sequence is ready to be translated into a protein. The corresponding parts of a gene are known as introns as well. The nature and role of introns in pre-mRNA are not clear and are the subject of extensive research by biologists; in our case, however, we use the presence of introns in the RNA-Crypto System output as a strong method to add chaotic non-coding information and an unnecessary behavior in the access to the secret key used to code the messages. In the RNA-Crypto System algorithm, the introns are sections of the ciphered message carrying non-coding information, just as in the precursor mRNA.

  19. Design and Evaluation of the Terminal Area Precision Scheduling and Spacing System

    NASA Technical Reports Server (NTRS)

    Swenson, Harry N.; Thipphavong, Jane; Sadovsky, Alex; Chen, Liang; Sullivan, Chris; Martin, Lynne

    2011-01-01

    This paper describes the design, development and results from a high-fidelity human-in-the-loop simulation of an integrated set of trajectory-based automation tools providing precision scheduling, sequencing and controller merging and spacing functions. These integrated functions are combined into a system called the Terminal Area Precision Scheduling and Spacing (TAPSS) system. It is a strategic and tactical planning tool that provides Traffic Management Coordinators, En Route and Terminal Radar Approach Control air traffic controllers the ability to efficiently optimize the arrival capacity of a demand-impacted airport while simultaneously enabling fuel-efficient descent procedures. The TAPSS system consists of four-dimensional trajectory prediction, arrival runway balancing, aircraft separation constraint-based scheduling, traffic flow visualization and trajectory-based advisories to assist controllers in efficient metering, sequencing and spacing. The TAPSS system was evaluated and compared to today's ATC operation through an extensive series of human-in-the-loop simulations for arrival flows into the Los Angeles International Airport. The test conditions included the variation of aircraft demand from a baseline of today's capacity-constrained periods through 5%, 10% and 20% increases. Performance data were collected for engineering and human factor analysis and compared with similar operations both with and without the TAPSS system. The engineering data indicate operations with the TAPSS show up to a 10% increase in airport throughput during capacity-constrained periods while maintaining fuel-efficient aircraft descent profiles from cruise to landing.

  20. Probing the potential of CnaB-type domains for the design of tag/catcher systems

    PubMed Central

    Pröschel, Marlene; Kraner, Max E.; Horn, Anselm H. C.; Schäfer, Lena; Sonnewald, Uwe

    2017-01-01

    Building proteins into larger, post-translational assemblies in a defined and stable way is still a challenging task. A promising approach relies on so-called tag/catcher systems that are fused to the proteins of interest and allow a durable linkage via covalent intermolecular bonds. Tags and catchers are generated by splitting protein domains that contain intramolecular isopeptide or ester bonds that form autocatalytically under physiological conditions. There are already numerous biotechnological and medical applications that demonstrate the usefulness of covalent linkages mediated by these systems. Additional covalent tag/catcher systems would allow creating more complex and ultra-stable protein architectures and networks. Two of the presently available tag/catcher systems were derived from closely related CnaB-domains of Streptococcus pyogenes and Streptococcus dysgalactiae proteins. However, it is unclear whether domain splitting is generally tolerated within the CnaB-family or only by a small subset of these domains. To address this point, we have selected a set of four CnaB domains of low sequence similarity and characterized the resulting tag/catcher systems by computational and experimental methods. Experimental testing for intermolecular isopeptide bond formation demonstrated two of the four systems to be functional. For these two systems length and sequence variations of the peptide tags were investigated revealing only a relatively small effect on the efficiency of the reaction. Our study suggests that splitting into tag and catcher moieties is tolerated by a significant portion of the naturally occurring CnaB-domains, thus providing a large reservoir for the design of novel tag/catcher systems. PMID:28654665

  1. Homozygous and hemizygous CNV detection from exome sequencing data in a Mendelian disease cohort.

    PubMed

    Gambin, Tomasz; Akdemir, Zeynep C; Yuan, Bo; Gu, Shen; Chiang, Theodore; Carvalho, Claudia M B; Shaw, Chad; Jhangiani, Shalini; Boone, Philip M; Eldomery, Mohammad K; Karaca, Ender; Bayram, Yavuz; Stray-Pedersen, Asbjørg; Muzny, Donna; Charng, Wu-Lin; Bahrambeigi, Vahid; Belmont, John W; Boerwinkle, Eric; Beaudet, Arthur L; Gibbs, Richard A; Lupski, James R

    2017-02-28

    We developed an algorithm, HMZDelFinder, that uses whole exome sequencing (WES) data to identify rare and intragenic homozygous and hemizygous (HMZ) deletions that may represent complete loss-of-function of the indicated gene. HMZDelFinder was applied to 4866 samples in the Baylor-Hopkins Center for Mendelian Genomics (BHCMG) cohort and detected 773 HMZ deletion calls (567 homozygous or 206 hemizygous) with an estimated sensitivity of 86.5% (82% for single-exonic and 88% for multi-exonic calls) and precision of 78% (53% single-exonic and 96% for multi-exonic calls). Out of 773 HMZDelFinder-detected deletion calls, 82 were subjected to array comparative genomic hybridization (aCGH) and/or breakpoint PCR and 64 were confirmed. These include 18 single-exon deletions out of which 8 were exclusively detected by HMZDelFinder and not by any of seven other CNV detection tools examined. Further investigation of the 64 validated deletion calls revealed at least 15 pathogenic HMZ deletions. Of those, 7 accounted for 17-50% of pathogenic CNVs in different disease cohorts where 7.1-11% of the molecular diagnosis solved rate was attributed to CNVs. In summary, we present an algorithm to detect rare, intragenic, single-exon deletion CNVs using WES data; this tool can be useful for disease gene discovery efforts and clinical WES analyses. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
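
    The counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below assumes (this is not stated in the record) that the reported 78% precision is simply the fraction of experimentally tested calls that were confirmed; the variable names are illustrative only.

```python
# Counts quoted in the abstract above.
homozygous_calls = 567
hemizygous_calls = 206
total_calls = homozygous_calls + hemizygous_calls  # expected to equal 773

tested_calls = 82     # calls subjected to aCGH and/or breakpoint PCR
confirmed_calls = 64  # calls confirmed by those assays

# Assumption: precision = confirmed calls / tested calls.
precision = confirmed_calls / tested_calls

print(f"total HMZ deletion calls: {total_calls}")
print(f"validation-based precision: {precision:.1%}")
```

    Under that assumption the ratio comes out at about 78%, which is consistent with the precision figure given in the abstract.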

  2. Single stock dynamics on high-frequency data: from a compressed coding perspective.

    PubMed

    Fushing, Hsieh; Chen, Shu-Chun; Hwang, Chii-Ruey

    2014-01-01

    High-frequency return, trading volume and transaction number are digitally coded via a nonparametric computing algorithm, called hierarchical factor segmentation (HFS), and then are coupled together to reveal a single stock dynamics without global state-space structural assumptions. The base-8 digital coding sequence, which is capable of revealing contrasting aggregation against sparsity of extreme events, is further compressed into a shortened sequence of state transitions. This compressed digital code sequence vividly demonstrates that the aggregation of large absolute returns is the primary driving force for stimulating both the aggregations of large trading volumes and transaction numbers. The state of system-wise synchrony is manifested with very frequent recurrence in the stock dynamics. And this data-driven dynamic mechanism is seen to correspondingly vary as the global market transiting in and out of contraction-expansion cycles. These results not only elaborate the stock dynamics of interest to a fuller extent, but also contradict some classical theories in finance. Overall this version of stock dynamics is potentially more coherent and realistic, especially when the current financial market is increasingly powered by high-frequency trading via computer algorithms, rather than by individual investors.

  3. VarDetect: a nucleotide sequence variation exploratory tool

    PubMed Central

    Ngamphiw, Chumpol; Kulawonganunchai, Supasak; Assawamakin, Anunchai; Jenwitheesuk, Ekachai; Tongsima, Sissades

    2008-01-01

    Background Single nucleotide polymorphisms (SNPs) are the most commonly studied units of genetic variation. The discovery of such variation may help to identify causative gene mutations in monogenic diseases and SNPs associated with predisposing genes in complex diseases. Accurate detection of SNPs requires software that can correctly interpret chromatogram signals to nucleotides. Results We present VarDetect, a stand-alone nucleotide variation exploratory tool that automatically detects nucleotide variation from fluorescence based chromatogram traces. Accurate SNP base-calling is achieved using pre-calculated peak content ratios, and is enhanced by rules which account for common sequence reading artifacts. The proposed software tool is benchmarked against four other well-known SNP discovery software tools (PolyPhred, novoSNP, Genalys and Mutation Surveyor) using fluorescence based chromatograms from 15 human genes. These chromatograms were obtained from sequencing 16 two-pooled DNA samples; a total of 32 individual DNA samples. In this comparison of automatic SNP detection tools, VarDetect achieved the highest detection efficiency. Availability VarDetect is compatible with most major operating systems such as Microsoft Windows, Linux, and Mac OSX. The current version of VarDetect is freely available at . PMID:19091032

  4. Single Stock Dynamics on High-Frequency Data: From a Compressed Coding Perspective

    PubMed Central

    Fushing, Hsieh; Chen, Shu-Chun; Hwang, Chii-Ruey

    2014-01-01

    High-frequency return, trading volume and transaction number are digitally coded via a nonparametric computing algorithm, called hierarchical factor segmentation (HFS), and then are coupled together to reveal a single stock dynamics without global state-space structural assumptions. The base-8 digital coding sequence, which is capable of revealing contrasting aggregation against sparsity of extreme events, is further compressed into a shortened sequence of state transitions. This compressed digital code sequence vividly demonstrates that the aggregation of large absolute returns is the primary driving force for stimulating both the aggregations of large trading volumes and transaction numbers. The state of system-wise synchrony is manifested with very frequent recurrence in the stock dynamics. And this data-driven dynamic mechanism is seen to correspondingly vary as the global market transiting in and out of contraction-expansion cycles. These results not only elaborate the stock dynamics of interest to a fuller extent, but also contradict some classical theories in finance. Overall this version of stock dynamics is potentially more coherent and realistic, especially when the current financial market is increasingly powered by high-frequency trading via computer algorithms, rather than by individual investors. PMID:24586235

  5. FANCJ promotes DNA synthesis through G-quadruplex structures

    PubMed Central

    Castillo Bosch, Pau; Segura-Bayona, Sandra; Koole, Wouter; van Heteren, Jane T; Dewar, James M; Tijsterman, Marcel; Knipscheer, Puck

    2014-01-01

    Our genome contains many G-rich sequences, which have the propensity to fold into stable secondary DNA structures called G4 or G-quadruplex structures. These structures have been implicated in cellular processes such as gene regulation and telomere maintenance. However, G4 sequences are prone to mutations particularly upon replication stress or in the absence of specific helicases. To investigate how G-quadruplex structures are resolved during DNA replication, we developed a model system using ssDNA templates and Xenopus egg extracts that recapitulates eukaryotic G4 replication. Here, we show that G-quadruplex structures form a barrier for DNA replication. Nascent strand synthesis is blocked at one or two nucleotides from the G4. After transient stalling, G-quadruplexes are efficiently unwound and replicated. In contrast, depletion of the FANCJ/BRIP1 helicase causes persistent replication stalling at G-quadruplex structures, demonstrating a vital role for this helicase in resolving these structures. FANCJ performs this function independently of the classical Fanconi anemia pathway. These data provide evidence that the G4 sequence instability in FANCJ−/− cells and Fancj/dog1 deficient C. elegans is caused by replication stalling at G-quadruplexes. PMID:25193968

  6. Validation of Genotyping-By-Sequencing Analysis in Populations of Tetraploid Alfalfa by 454 Sequencing

    PubMed Central

    Rocher, Solen; Jean, Martine; Castonguay, Yves; Belzile, François

    2015-01-01

    Genotyping-by-sequencing (GBS) is a relatively low-cost high throughput genotyping technology based on next generation sequencing and is applicable to orphan species with no reference genome. A combination of genome complexity reduction and multiplexing with DNA barcoding provides a simple and affordable way to resolve allelic variation between plant samples or populations. GBS was performed on ApeKI libraries using DNA from 48 genotypes each of two heterogeneous populations of tetraploid alfalfa (Medicago sativa spp. sativa): the synthetic cultivar Apica (ATF0) and a derived population (ATF5) obtained after five cycles of recurrent selection for superior tolerance to freezing (TF). Nearly 400 million reads were obtained from two lanes of an Illumina HiSeq 2000 sequencer and analyzed with the Universal Network-Enabled Analysis Kit (UNEAK) pipeline designed for species with no reference genome. Following the application of whole dataset-level filters, 11,694 single nucleotide polymorphism (SNP) loci were obtained. About 60% had a significant match on the Medicago truncatula syntenic genome. The accuracy of allelic ratios and genotype calls based on GBS data was directly assessed using 454 sequencing on a subset of SNP loci scored in eight plant samples. Sequencing depth in this study was not sufficient for accurate tetraploid allelic dosage, but reliable genotype calls based on diploid allelic dosage were obtained when using additional quality filtering. Principal Component Analysis of SNP loci in plant samples revealed that a small proportion (<5%) of the genetic variability assessed by GBS is able to differentiate ATF0 and ATF5. Our results confirm that analysis of GBS data using UNEAK is a reliable approach for genome-wide discovery of SNP loci in outcrossed polyploids. PMID:26115486

  7. Validation and optimization of the Ion Torrent S5 XL sequencer and Oncomine workflow for BRCA1 and BRCA2 genetic testing.

    PubMed

    Shin, Saeam; Kim, Yoonjung; Chul Oh, Seoung; Yu, Nae; Lee, Seung-Tae; Rak Choi, Jong; Lee, Kyung-A

    2017-05-23

    In this study, we validated the analytical performance of BRCA1/2 sequencing using Ion Torrent's new bench-top sequencer with amplicon panel with optimized bioinformatics pipelines. Using 43 samples that were previously validated by Illumina's MiSeq platform and/or by Sanger sequencing/multiplex ligation-dependent probe amplification, we amplified the target with the Oncomine™ BRCA Research Assay and sequenced on Ion Torrent S5 XL (Thermo Fisher Scientific, Waltham, MA, USA). We compared two bioinformatics pipelines for optimal processing of S5 XL sequence data: the Torrent Suite with a plug-in Torrent Variant Caller (Thermo Fisher Scientific), and commercial NextGENe software (Softgenetics, State College, PA, USA). All expected 681 single nucleotide variants, 15 small indels, and three copy number variants were correctly called, except one common variant adjacent to a rare variant on the primer-binding site. The sensitivity, specificity, false positive rate, and accuracy for detection of single nucleotide variant and small indels of S5 XL sequencing were 99.85%, 100%, 0%, and 99.99% for the Torrent Variant Caller and 99.85%, 99.99%, 0.14%, and 99.99% for NextGENe, respectively. The reproducibility of variant calling was 100%, and the precision of variant frequency also showed good performance with coefficients of variation between 0.32 and 5.29%. We obtained highly accurate data through uniform and sufficient coverage depth over all target regions and through optimization of the bioinformatics pipeline. We confirmed that our platform is accurate and practical for diagnostic BRCA1/2 testing in a clinical laboratory.
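
    The performance figures in this abstract follow the usual confusion-matrix definitions. As a rough check, the sketch below assumes that the 681 single nucleotide variants and 15 small indels form the set of expected positives and that the one miscalled common variant is the only false negative; both the function name and that counting convention are assumptions, not details taken from the study.

```python
def sensitivity(true_positives: int, false_negatives: int) -> float:
    """Fraction of expected variants that were actually called."""
    return true_positives / (true_positives + false_negatives)


# Assumed counting convention (not confirmed by the record):
expected_variants = 681 + 15  # single nucleotide variants + small indels
missed_variants = 1           # the one variant next to the primer-binding-site variant

sens = sensitivity(expected_variants - missed_variants, missed_variants)
print(f"sensitivity: {sens:.4%}")
```

    With these assumed counts the result is just under 99.86%, close to the reported 99.85%; the small gap may reflect truncation rather than rounding, or a different counting convention in the paper.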

  8. Sequence capture of ultraconserved elements from bird museum specimens.

    PubMed

    McCormack, John E; Tsai, Whitney L E; Faircloth, Brant C

    2016-09-01

    New DNA sequencing technologies are allowing researchers to explore the genomes of the millions of natural history specimens collected prior to the molecular era. Yet, we know little about how well specific next-generation sequencing (NGS) techniques work with the degraded DNA typically extracted from museum specimens. Here, we use one type of NGS approach, sequence capture of ultraconserved elements (UCEs), to collect data from bird museum specimens as old as 120 years. We targeted 5060 UCE loci in 27 western scrub-jays (Aphelocoma californica) representing three evolutionary lineages that could be species, and we collected an average of 3749 UCE loci containing 4460 single nucleotide polymorphisms (SNPs). Despite older specimens producing fewer and shorter loci in general, we collected thousands of markers from even the oldest specimens. More sequencing reads per individual helped to boost the number of UCE loci we recovered from older specimens, but more sequencing was not as successful at increasing the length of loci. We detected contamination in some samples and determined that contamination was more prevalent in older samples that were subject to less sequencing. For the phylogeny generated from concatenated UCE loci, contamination led to incorrect placement of some individuals. In contrast, a species tree constructed from SNPs called within UCE loci correctly placed individuals into three monophyletic groups, perhaps because of the stricter analytical procedures used for SNP calling. This study and other recent studies on the genomics of museum specimens have profound implications for natural history collections, where millions of older specimens should now be considered genomic resources. © 2015 The Authors. Molecular Ecology Resources Published by John Wiley & Sons Ltd.

  9. AmpliVar: mutation detection in high-throughput sequence from amplicon-based libraries.

    PubMed

    Hsu, Arthur L; Kondrashova, Olga; Lunke, Sebastian; Love, Clare J; Meldrum, Cliff; Marquis-Nicholson, Renate; Corboy, Greg; Pham, Kym; Wakefield, Matthew; Waring, Paul M; Taylor, Graham R

    2015-04-01

    Conventional means of identifying variants in high-throughput sequencing align each read against a reference sequence, and then call variants at each position. Here, we demonstrate an orthogonal means of identifying sequence variation by grouping the reads as amplicons prior to any alignment. We used AmpliVar to make key-value hashes of sequence reads and group reads as individual amplicons using a table of flanking sequences. Low-abundance reads were removed according to a selectable threshold, and reads above this threshold were aligned as groups, rather than as individual reads, permitting the use of sensitive alignment tools. We show that this approach is more sensitive, more specific, and more computationally efficient than comparable methods for the analysis of amplicon-based high-throughput sequencing data. The method can be extended to enable alignment-free confirmation of variants seen in hybridization capture target-enrichment data. © 2015 WILEY PERIODICALS, INC.
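
    The grouping step described above (hash identical reads, assign them to amplicons by flanking sequence, drop low-abundance reads) can be sketched as follows. This is an illustrative toy under stated assumptions, not AmpliVar's actual implementation: the function name, the exact-match flank test, the `min_count` parameter and the example sequences are all hypothetical.

```python
from collections import Counter, defaultdict


def group_reads_by_amplicon(reads, flanks, min_count=2):
    """Group identical reads per amplicon and drop low-abundance sequences.

    reads  : iterable of read sequences (strings)
    flanks : dict mapping amplicon name -> (left_flank, right_flank)
    """
    read_counts = Counter(reads)  # key-value hash: sequence -> number of reads
    grouped = defaultdict(dict)
    for sequence, count in read_counts.items():
        if count < min_count:
            continue  # below the selectable abundance threshold
        for name, (left, right) in flanks.items():
            # Assumption: an exact prefix/suffix match identifies the amplicon.
            if sequence.startswith(left) and sequence.endswith(right):
                grouped[name][sequence] = count
                break
    return dict(grouped)


# Hypothetical flanking-sequence table and reads.
flanks = {"ampA": ("ACGT", "TTGA"), "ampB": ("GGCC", "AATT")}
reads = (
    ["ACGTCCCCTTGA"] * 5    # ampA, reference-like
    + ["ACGTCCTCTTGA"] * 3  # ampA, candidate variant
    + ["ACGTCACCTTGA"]      # ampA, singleton removed by the threshold
    + ["GGCCTTTTAATT"] * 4  # ampB
    + ["TTTTTTTT"] * 6      # matches no flank pair, so it is discarded
)

groups = group_reads_by_amplicon(reads, flanks, min_count=2)
for name, sequences in groups.items():
    print(name, sequences)
```

    Only the surviving unique sequences in each group would then need to be aligned, which is the property the abstract credits for allowing slower, more sensitive alignment tools.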

  10. Evidence of automatic processing in sequence learning using process-dissociation

    PubMed Central

    Mong, Heather M.; McCabe, David P.; Clegg, Benjamin A.

    2012-01-01

    This paper proposes a way to apply process-dissociation to sequence learning in addition and extension to the approach used by Destrebecqz and Cleeremans (2001). Participants were trained on two sequences separated from each other by a short break. Following training, participants self-reported their knowledge of the sequences. A recognition test was then performed which required discrimination of two trained sequences, either under the instructions to call any sequence encountered in the experiment “old” (the inclusion condition), or only sequence fragments from one half of the experiment “old” (the exclusion condition). The recognition test elicited automatic and controlled process estimates using the process dissociation procedure, and suggested both processes were involved. Examining the underlying processes supporting performance may provide more information on the fundamental aspects of the implicit and explicit constructs than has been attainable through awareness testing. PMID:22679465
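
    For readers unfamiliar with the process dissociation procedure, its standard estimating equations are: inclusion rate = C + A(1 - C) and exclusion rate = A(1 - C), where C is the controlled and A the automatic contribution. The sketch below solves these two equations; the response rates are hypothetical, and the paper's own estimation details may differ.

```python
def process_dissociation(inclusion_rate: float, exclusion_rate: float):
    """Return (controlled, automatic) estimates from 'old' response rates.

    Standard equations:
        inclusion = C + A * (1 - C)
        exclusion = A * (1 - C)
    """
    controlled = inclusion_rate - exclusion_rate
    if controlled >= 1.0:
        raise ValueError("automatic estimate is undefined when C >= 1")
    automatic = exclusion_rate / (1.0 - controlled)
    return controlled, automatic


# Hypothetical rates of calling a to-be-excluded trained sequence "old".
controlled, automatic = process_dissociation(inclusion_rate=0.70, exclusion_rate=0.30)
print(f"controlled = {controlled:.2f}, automatic = {automatic:.2f}")
```

    A nonzero automatic estimate alongside a nonzero controlled estimate is the pattern the abstract describes as both processes being involved.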

  11. Amietia angolensis and A. fuscigula (Anura: Pyxicephalidae) in southern Africa: a cold case reheated.

    PubMed

    Channing, Alan; Baptista, Ninda

    2013-01-01

    A study combining DNA sequences of the mitochondrial 16S rRNA gene, advertisement calls and morphology of some southern African river frogs confirms Amietia vandijki (Visser & Channing, 1997) as a good species. The form presently referred to as Amietia angolensis in southern Africa is shown to comprise two species: Amietia angolensis (Bocage, 1866) known from Angola, and Amietia quecketti (Boulenger, 1895) known from South Africa, Zimbabwe and Lesotho. Junior synonyms of A. quecketti include Rana theileri Mocquard, 1906 and Afrana dracomontana Channing, 1978. The form presently known as Amietia fuscigula is shown to consist of two distantly related taxa: Amietia fuscigula (Duméril & Bibron, 1841) from the south-western Cape and an undescribed species that we here name Amietia poyntoni sp. nov. Channing & Baptista, known from the rest of South Africa and Namibia. These five species have large differences in 16S sequences, as well as differences in morphology and advertisement call. Call and molecular data are both diagnostic, while morphology shows some overlap between taxa. An extended study of the genus across Africa is in preparation.

  12. Systems configured to distribute a telephone call, communication systems, communication methods and methods of routing a telephone call to a service representative

    DOEpatents

    Harris, Scott H.; Johnson, Joel A.; Neiswanger, Jeffery R.; Twitchell, Kevin E.

    2004-03-09

    The present invention includes systems configured to distribute a telephone call, communication systems, communication methods and methods of routing a telephone call to a customer service representative. In one embodiment of the invention, a system configured to distribute a telephone call within a network includes a distributor adapted to connect with a telephone system, the distributor being configured to connect a telephone call using the telephone system and output the telephone call and associated data of the telephone call; and a plurality of customer service representative terminals connected with the distributor and a selected customer service representative terminal being configured to receive the telephone call and the associated data, the distributor and the selected customer service representative terminal being configured to synchronize application of the telephone call and associated data from the distributor to the selected customer service representative terminal.

  13. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Anderson, Mark A.; Bigelow, Matthew; Gilkey, Jeff C.

    The Super Strypi Navigation, Guidance & Control Software is a real-time implementation of the navigation, guidance and control algorithms designed to deliver a payload to a desired orbit for the rail launched Super Strypi launch vehicle. The software contains all flight control algorithms required from pre-launch until orbital insertion. The flight sequencer module calls the NG&C functions at the appropriate times of flight. Additional functionality includes all the low level drivers and I/O for communicating to other systems within the launch vehicle and to the ground support equipment. The software is designed such that the launch location and desired orbit can be changed without recompiling the code.

  14. GFAST Software Demonstration

    NASA Image and Video Library

    2017-03-17

    NASA engineers and test directors gather in Firing Room 3 in the Launch Control Center at NASA's Kennedy Space Center in Florida, to watch a demonstration of the automated command and control software for the agency's Space Launch System (SLS) and Orion spacecraft. In front, far right, is Charlie Blackwell-Thompson, launch director for Exploration Mission 1 (EM-1). The software is called the Ground Launch Sequencer. It will be responsible for nearly all of the launch commit criteria during the final phases of launch countdowns. The Ground and Flight Application Software Team (GFAST) demonstrated the software. It was developed by the Command, Control and Communications team in the Ground Systems Development and Operations (GSDO) Program. GSDO is helping to prepare the center for the first test flight of Orion atop the SLS on EM-1.

  15. Synthesis and Properties of Size-expanded DNAs: Toward Designed, Functional Genetic Systems

    PubMed Central

    Krueger, Andrew T.; Lu, Haige; Lee, Alex H. F.; Kool, Eric T.

    2008-01-01

    We describe the design, synthesis, and properties of DNA-like molecules in which the base pairs are expanded by benzo homologation. The resulting size-expanded genetic helices are called xDNA (“expanded DNA”) and yDNA (“wide DNA”). The large component bases are fluorescent, and they display high stacking affinity. When singly substituted into natural DNA, they are destabilizing because the benzo-expanded base pair size is too large for the natural helix. However, when all base pairs are expanded, xDNA and yDNA form highly stable, sequence-selective double helices. The size-expanded DNAs are candidates for components of new, functioning genetic systems. In addition, the fluorescence of expanded DNA bases makes them potentially useful in probing nucleic acids. PMID:17309194

  16. From seconds to months: an overview of multi-scale dynamics of mobile telephone calls

    NASA Astrophysics Data System (ADS)

    Saramäki, Jari; Moro, Esteban

    2015-06-01

    Big Data on electronic records of social interactions allow approaching human behaviour and sociality from a quantitative point of view with unforeseen statistical power. Mobile telephone Call Detail Records (CDRs), automatically collected by telecom operators for billing purposes, have proven especially fruitful for understanding one-to-one communication patterns as well as the dynamics of social networks that are reflected in such patterns. We present an overview of empirical results on the multi-scale dynamics of social dynamics and networks inferred from mobile telephone calls. We begin with the shortest timescales and fastest dynamics, such as burstiness of call sequences between individuals, and "zoom out" towards longer temporal and larger structural scales, from temporal motifs formed by correlated calls between multiple individuals to long-term dynamics of social groups. We conclude this overview with a future outlook.

  17. SMITH: a LIMS for handling next-generation sequencing workflows

    PubMed Central

    2014-01-01

    Background Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, to maintain a high quality standard, to reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). Methods SMITH is a web application with a MySQL server at the backend. Wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project developed SMITH. The data base schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description to each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. Results SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized making it easier for biologists and analysts to navigate the data. Automation also helps saving time. 
    The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. Conclusions SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis. PMID:25471934
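
    The attribute-value table mentioned in this abstract can be illustrated with a small schema. The sketch below uses Python's built-in sqlite3 module in place of the MySQL backend that SMITH actually uses, and every table name, column name and sample value is hypothetical; it only shows how free-form key-value annotations make samples searchable.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL backend (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE sample (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    -- Attribute-value table: any number of free-form annotations per sample.
    CREATE TABLE sample_attribute (
        sample_id  INTEGER NOT NULL REFERENCES sample(id),
        attr_key   TEXT NOT NULL,
        attr_value TEXT NOT NULL
    );
    """
)
conn.executemany(
    "INSERT INTO sample (id, name) VALUES (?, ?)",
    [(1, "sample_A"), (2, "sample_B")],
)
conn.executemany(
    "INSERT INTO sample_attribute (sample_id, attr_key, attr_value) VALUES (?, ?, ?)",
    [
        (1, "application", "ChIP-Seq"),
        (1, "antibody", "H3K4me3"),
        (2, "application", "RNA-Seq"),
    ],
)


def find_samples(connection, key, value):
    """Return the names of samples annotated with the given key-value pair."""
    rows = connection.execute(
        "SELECT s.name FROM sample s "
        "JOIN sample_attribute a ON a.sample_id = s.id "
        "WHERE a.attr_key = ? AND a.attr_value = ? "
        "ORDER BY s.name",
        (key, value),
    )
    return [name for (name,) in rows]


chip_samples = find_samples(conn, "application", "ChIP-Seq")
print(chip_samples)
```

    The trade-off of this layout is flexibility over strictness: new annotation keys need no schema change, but the database cannot enforce types or allowed values for them.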

  18. SMITH: a LIMS for handling next-generation sequencing workflows.

    PubMed

    Venco, Francesco; Vaskin, Yuriy; Ceol, Arnaud; Muller, Heiko

    2014-01-01

    Life-science laboratories make increasing use of Next Generation Sequencing (NGS) for studying bio-macromolecules and their interactions. Array-based methods for measuring gene expression or protein-DNA interactions are being replaced by RNA-Seq and ChIP-Seq. Sequencing is generally performed by specialized facilities that have to keep track of sequencing requests, trace samples, ensure quality and make data available according to predefined privileges. An integrated tool helps to troubleshoot problems, to maintain a high quality standard, to reduce time and costs. Commercial and non-commercial tools called LIMS (Laboratory Information Management Systems) are available for this purpose. However, they often come at prohibitive cost and/or lack the flexibility and scalability needed to adjust seamlessly to the frequently changing protocols employed. In order to manage the flow of sequencing data produced at the Genomic Unit of the Italian Institute of Technology (IIT), we developed SMITH (Sequencing Machine Information Tracking and Handling). SMITH is a web application with a MySQL server at the backend. Wet-lab scientists of the Centre for Genomic Science and database experts from the Politecnico of Milan in the context of a Genomic Data Model Project developed SMITH. The data base schema stores all the information of an NGS experiment, including the descriptions of all protocols and algorithms used in the process. Notably, an attribute-value table allows associating an unconstrained textual description to each sample and all the data produced afterwards. This method permits the creation of metadata that can be used to search the database for specific files as well as for statistical analyses. SMITH runs automatically and limits direct human interaction mainly to administrative tasks. SMITH data-delivery procedures were standardized making it easier for biologists and analysts to navigate the data. Automation also helps saving time. 
    The workflows are available through an API provided by the workflow management system. The parameters and input data are passed to the workflow engine that performs de-multiplexing, quality control, alignments, etc. SMITH standardizes, automates, and speeds up sequencing workflows. Annotation of data with key-value pairs facilitates meta-analysis.

  19. Developmental validation of the MiSeq FGx Forensic Genomics System for Targeted Next Generation Sequencing in Forensic DNA Casework and Database Laboratories.

    PubMed

    Jäger, Anne C; Alvarez, Michelle L; Davis, Carey P; Guzmán, Ernesto; Han, Yonmee; Way, Lisa; Walichiewicz, Paulina; Silva, David; Pham, Nguyen; Caves, Glorianna; Bruand, Jocelyne; Schlesinger, Felix; Pond, Stephanie J K; Varlaro, Joe; Stephens, Kathryn M; Holt, Cydne L

    2017-05-01

    Human DNA profiling using PCR at polymorphic short tandem repeat (STR) loci followed by capillary electrophoresis (CE) size separation and length-based allele typing has been the standard in the forensic community for over 20 years. Over the last decade, Next-Generation Sequencing (NGS) matured rapidly, bringing modern advantages to forensic DNA analysis. The MiSeq FGx™ Forensic Genomics System, comprised of the ForenSeq™ DNA Signature Prep Kit, MiSeq FGx™ Reagent Kit, MiSeq FGx™ instrument and ForenSeq™ Universal Analysis Software, uses PCR to simultaneously amplify up to 231 forensic loci in a single multiplex reaction. Targeted loci include Amelogenin, 27 common, forensic autosomal STRs, 24 Y-STRs, 7 X-STRs and three classes of single nucleotide polymorphisms (SNPs). The ForenSeq™ kit includes two primer sets: Amelogenin, 58 STRs and 94 identity informative SNPs (iiSNPs) are amplified using DNA Primer Set A (DPMA; 153 loci); if a laboratory chooses to generate investigative leads using DNA Primer Set B, amplification is targeted to the 153 loci in DPMA plus 22 phenotypic informative (piSNPs) and 56 biogeographical ancestry SNPs (aiSNPs). High-resolution genotypes, including detection of intra-STR sequence variants, are semi-automatically generated with the ForenSeq™ software. This system was subjected to developmental validation studies according to the 2012 Revised SWGDAM Validation Guidelines. A two-step PCR first amplifies the target forensic STR and SNP loci (PCR1); unique, sample-specific indexed adapters or "barcodes" are attached in PCR2. Approximately 1736 ForenSeq™ reactions were analyzed. Studies include DNA substrate testing (cotton swabs, FTA cards, filter paper), species studies from a range of nonhuman organisms, DNA input sensitivity studies from 1ng down to 7.8pg, two-person human DNA mixture testing with three genotype combinations, stability analysis of partially degraded DNA, and effects of five commonly encountered PCR inhibitors. 
    Calculations from ForenSeq™ STR and SNP repeatability and reproducibility studies (1ng template) indicate 100.0% accuracy of the MiSeq FGx™ System in allele calling relative to CE for STRs (1260 samples), and >99.1% accuracy relative to bead array typing for SNPs (1260 samples for iiSNPs, 310 samples for aiSNPs and piSNPs), with >99.0% and >97.8% precision, respectively. Call rates of >99.0% were observed for all STRs and SNPs amplified with both ForenSeq™ primer mixes. Limitations of the MiSeq FGx™ System are discussed. Results described here demonstrate that the MiSeq FGx™ System meets forensic DNA quality assurance guidelines with robust, reliable, and reproducible performance on samples of various quantities and qualities. Copyright © 2017 The Authors. Published by Elsevier B.V. All rights reserved.
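
    The locus counts given in this abstract add up as stated; the short check below only restates numbers from the record, with illustrative variable names.

```python
# Locus counts quoted in the abstract.
amelogenin = 1
autosomal_strs, y_strs, x_strs = 27, 24, 7
identity_snps = 94    # iiSNPs
phenotypic_snps = 22  # piSNPs
ancestry_snps = 56    # aiSNPs

strs = autosomal_strs + y_strs + x_strs                        # 58 STRs
primer_set_a = amelogenin + strs + identity_snps               # DNA Primer Set A
primer_set_b = primer_set_a + phenotypic_snps + ancestry_snps  # DNA Primer Set B

print(f"STRs: {strs}, Primer Set A loci: {primer_set_a}, Primer Set B loci: {primer_set_b}")
```

    The totals match the 58 STRs, 153 loci and 231 loci quoted for the two primer sets.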

  20. Inverse statistical physics of protein sequences: a key issues review.

    PubMed

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  1. Inverse statistical physics of protein sequences: a key issues review

    NASA Astrophysics Data System (ADS)

    Cocco, Simona; Feinauer, Christoph; Figliuzzi, Matteo; Monasson, Rémi; Weigt, Martin

    2018-03-01

    In the course of evolution, proteins undergo important changes in their amino acid sequences, while their three-dimensional folded structure and their biological function remain remarkably conserved. Thanks to modern sequencing techniques, sequence data accumulate at unprecedented pace. This provides large sets of so-called homologous, i.e. evolutionarily related protein sequences, to which methods of inverse statistical physics can be applied. Using sequence data as the basis for the inference of Boltzmann distributions from samples of microscopic configurations or observables, it is possible to extract information about evolutionary constraints and thus protein function and structure. Here we give an overview over some biologically important questions, and how statistical-mechanics inspired modeling approaches can help to answer them. Finally, we discuss some open questions, which we expect to be addressed over the next years.

  2. Governor Bush makes first phone call to KSC using new area code

    NASA Technical Reports Server (NTRS)

    1999-01-01

    At 8 a.m. in the videoconference room at Headquarters, Deputy Director for Business Operations Jim Jennings (center) makes the connection for a phone call from Florida Governor Jeb Bush and Center Director Roy Bridges in Tallahassee, Fla. The call is to inaugurate the change of KSC's area code from 407 to 321, effective today. Key representatives of KSC contractors, along with KSC directorates, fill the room where the phone call is being received. Seated next to Jennings are Robert Osband (left), Florida Space Institute, and Col. Stephan Duresky (right), vice commander, 45th Space Wing. Osband is the one who suggested the 3-2-1 sequence to reflect the importance of the space industry to Florida's space coast.

  3. Governor Bush makes first phone call to KSC using new area code

    NASA Technical Reports Server (NTRS)

    1999-01-01

    At 8 a.m. in the videoconference room at Headquarters, Deputy Director for Business Operations Jim Jennings (center) waits for a phone call from Florida Governor Jeb Bush and Center Director Roy Bridges in Tallahassee, Fla. The call is to inaugurate the change of KSC's area code from 407 to 321, effective today. Key representatives of KSC contractors, along with KSC directorates, fill the room where the phone call is being received. Seated next to Jennings are Robert Osband (left), Florida Space Institute, and Col. Stephan Duresky (right), vice commander, 45th Space Wing. Osband is the one who suggested the 3-2-1 sequence, to reflect the importance of the space industry to Florida's space coast.

  4. Nanopore sequencing of drug-resistance-associated genes in malaria parasites, Plasmodium falciparum.

    PubMed

    Runtuwene, Lucky R; Tuda, Josef S B; Mongan, Arthur E; Makalowski, Wojciech; Frith, Martin C; Imwong, Mallika; Srisutham, Suttipat; Nguyen Thi, Lan Anh; Tuan, Nghia Nguyen; Eshita, Yuki; Maeda, Ryuichiro; Yamagishi, Junya; Suzuki, Yutaka

    2018-05-29

    Here, we report the application of a portable sequencer, MinION, for genotyping the malaria parasite Plasmodium falciparum. In the present study, an amplicon mixture of nine representative genes causing resistance to anti-malaria drugs is diagnosed. First, we developed the procedure for four laboratory strains (3D7, Dd2, 7G8, and K1), and then applied the developed procedure to ten clinical samples. We sequenced and re-sequenced the samples using the obsolete flow cell R7.3 and the most recent flow cell R9.4. Although the average base-call accuracy of the MinION sequencer was 74.3%, performing >50 reads at a given position improves the accuracy of the SNP call, yielding a precision and recall rate of 0.92 and 0.8, respectively, with flow cell R7.3. These numbers increased significantly with flow cell R9.4, in which the precision and recall are 1 and 0.97, respectively. Based on the SNP information, the drug resistance status in ten clinical samples was inferred. We also analyzed K13 gene mutations from 54 additional clinical samples as a proof of concept. We found that a novel amino-acid changing variation is dominant in this area. In addition, we performed a small population-based analysis using 3 and 5 cases (K13) and 10 and 5 cases (PfCRT) from Thailand and Vietnam, respectively. We identified distinct genotypes from the respective regions. This approach will change the standard methodology for the sequencing diagnosis of malaria parasites, especially in developing countries.
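
    The precision and recall figures quoted above follow the usual definitions for comparing a set of called SNPs against a reference set. The sketch below only illustrates that arithmetic; the function name and the example calls are hypothetical and are not taken from the study.

```python
def precision_recall(called, truth):
    """Return (precision, recall) for called SNPs measured against a truth set."""
    called, truth = set(called), set(truth)
    true_positives = len(called & truth)
    precision = true_positives / len(called) if called else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical (gene, position, alternative base) records, invented for illustration.
truth_snps = {("g1", 10, "A"), ("g1", 52, "T"), ("g2", 7, "C"),
              ("g2", 88, "G"), ("g3", 140, "T")}
called_snps = {("g1", 10, "A"), ("g1", 52, "T"), ("g2", 7, "C"),
               ("g3", 99, "A")}

precision, recall = precision_recall(called_snps, truth_snps)
print(precision, recall)
```

    Here three of the four calls are correct and three of the five true SNPs are recovered, so precision and recall differ; requiring more reads per position, as the abstract describes, is one way to raise both.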

  5. SNP Discovery in the Transcriptome of White Pacific Shrimp Litopenaeus vannamei by Next Generation Sequencing

    PubMed Central

    Yu, Yang; Wei, Jiankai; Zhang, Xiaojun; Liu, Jingwen; Liu, Chengzhang; Li, Fuhua; Xiang, Jianhai

    2014-01-01

    The application of next generation sequencing technology has greatly facilitated high throughput single nucleotide polymorphism (SNP) discovery and genotyping in genetic research. In the present study, SNPs were discovered based on two transcriptomes of Litopenaeus vannamei (L. vannamei) generated from Illumina sequencing platform HiSeq 2000. One transcriptome of L. vannamei was obtained through sequencing on the RNA from larvae at mysis stage and its reference sequence was de novo assembled. The data from another transcriptome were downloaded from NCBI and the reads of the two transcriptomes were mapped separately to the assembled reference by BWA. SNP calling was performed using SAMtools. A total of 58,717 and 36,277 SNPs with high quality were predicted from the two transcriptomes, respectively. SNP calling was also performed using the reads of two transcriptomes together, and a total of 96,040 SNPs with high quality were predicted. Among these 96,040 SNPs, 5,242 and 29,129 were predicted as non-synonymous and synonymous SNPs respectively. Characterization analysis of the predicted SNPs in L. vannamei showed that the estimated SNP frequency was 0.21% (one SNP per 476 bp) and the estimated ratio for transition to transversion was 2.0. Fifty SNPs were randomly selected for validation by Sanger sequencing after PCR amplification and 76% of SNPs were confirmed, which indicated that the SNPs predicted in this study were reliable. These SNPs will be very useful for genetic study in L. vannamei, especially for the high density linkage map construction and genome-wide association studies. PMID:24498047
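
    The two summary statistics in this abstract (SNP frequency and transition/transversion ratio) are simple to compute. The sketch below is illustrative only: the helper names and the six example substitutions are assumptions, not data from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transition(ref, alt):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions for (ref, alt) pairs with ref != alt."""
    transitions = sum(1 for ref, alt in snps if is_transition(ref, alt))
    transversions = len(snps) - transitions
    return transitions / transversions if transversions else float("inf")


def bp_per_snp(frequency_percent):
    """Convert a SNP frequency in percent into 'one SNP per N bp'."""
    return 100.0 / frequency_percent


# Hypothetical substitutions: four transitions and two transversions.
example_snps = [("A", "G"), ("C", "T"), ("G", "A"), ("T", "C"), ("A", "C"), ("G", "T")]
print(ts_tv_ratio(example_snps))
print(round(bp_per_snp(0.21)))
```

    A frequency of 0.21% corresponds to roughly one SNP per 476 bp, which matches the figure reported in the abstract.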

  6. A Control Chart Approach for Representing and Mining Data Streams with Shape Based Similarity

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Omitaomu, Olufemi A

    The mining of data streams for online condition monitoring is a challenging task in several domains, including (electric) power grid systems, intelligent manufacturing, and consumer science. Consider a power grid application in which thousands of sensors, called phasor measurement units, are deployed on the power grid network to continuously collect streams of digital data for real-time situational awareness and system management. Depending on design, each sensor could stream between ten and sixty data samples per second. The myriad of sensory data captured could convey deeper insights about the sequence of events in real time, before major damage is done. However, the timely processing and analysis of these high-velocity and high-volume data streams is a challenge. Hence, a new data processing and transformation approach, based on the concept of control charts, for representing sequences of data streams from sensors is proposed. In addition, an application of the proposed approach for enhancing data mining tasks such as clustering using real-world power grid data streams is presented. The results indicate that the proposed approach is very efficient for data stream storage and manipulation.
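
    The abstract does not spell out the transformation it proposes, so the following is only a generic Shewhart-style sketch of how control limits can compress a sensor stream into a coarse symbolic sequence. The three-sigma limits, the function names, and the sample readings are all assumptions made for illustration.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits at k standard deviations from the mean."""
    center = mean(baseline)
    spread = stdev(baseline)
    return center - k * spread, center, center + k * spread


def encode_stream(samples, lower, upper):
    """Map each sample to -1 (below limits), 0 (in control) or +1 (above limits)."""
    return [-1 if x < lower else 1 if x > upper else 0 for x in samples]


# Hypothetical frequency readings (Hz), invented for this example.
baseline = [60.0, 60.1, 59.9, 60.0, 60.05, 59.95]
lower, center, upper = control_limits(baseline)
print(encode_stream([60.0, 60.3, 59.7, 60.1], lower, upper))
```

    The encoded sequence needs far less storage than the raw samples and can be compared by shape, which is the kind of benefit the abstract claims for its own representation.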

  7. Programmable removal of bacterial strains by use of genome-targeting CRISPR-Cas systems.

    PubMed

    Gomaa, Ahmed A; Klumpe, Heidi E; Luo, Michelle L; Selle, Kurt; Barrangou, Rodolphe; Beisel, Chase L

    2014-01-28

    CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated) systems in bacteria and archaea employ CRISPR RNAs to specifically recognize the complementary DNA of foreign invaders, leading to sequence-specific cleavage or degradation of the target DNA. Recent work has shown that the accidental or intentional targeting of the bacterial genome is cytotoxic and can lead to cell death. Here, we have demonstrated that genome targeting with CRISPR-Cas systems can be employed for the sequence-specific and titratable removal of individual bacterial strains and species. Using the type I-E CRISPR-Cas system in Escherichia coli as a model, we found that this effect could be elicited using native or imported systems and was similarly potent regardless of the genomic location, strand, or transcriptional activity of the target sequence. Furthermore, the specificity of targeting with CRISPR RNAs could readily distinguish between even highly similar strains in pure or mixed cultures. Finally, varying the collection of delivered CRISPR RNAs could quantitatively control the relative number of individual strains within a mixed culture. Critically, the observed selectivity and programmability of bacterial removal would be virtually impossible with traditional antibiotics, bacteriophages, selectable markers, or tailored growth conditions. Once delivery challenges are addressed, we envision that this approach could offer a novel means to quantitatively control the composition of environmental and industrial microbial consortia and may open new avenues for the development of "smart" antibiotics that circumvent multidrug resistance and differentiate between pathogenic and beneficial microorganisms. Controlling the composition of microbial populations is a critical aspect in medicine, biotechnology, and environmental cycles. 
While different antimicrobial strategies, such as antibiotics, antimicrobial peptides, and lytic bacteriophages, offer partial solutions, what remains elusive is a generalized and programmable strategy that can distinguish between even closely related microorganisms and that allows for fine control over the composition of a microbial population. This study demonstrates that RNA-directed immune systems in bacteria and archaea called CRISPR-Cas systems can provide such a strategy. These systems can be employed to selectively and quantitatively remove individual bacterial strains based purely on sequence information, creating opportunities in the treatment of multidrug-resistant infections, the control of industrial fermentations, and the study of microbial consortia.

  8. Population-scale whole genome sequencing identifies 271 highly polymorphic short tandem repeats from Japanese population.

    PubMed

    Hirata, Satoshi; Kojima, Kaname; Misawa, Kazuharu; Gervais, Olivier; Kawai, Yosuke; Nagasaki, Masao

    2018-05-01

    Forensic DNA typing is widely used to identify missing persons and plays a central role in forensic profiling. DNA typing usually uses capillary electrophoresis fragment analysis of PCR amplification products to detect the length of short tandem repeat (STR) markers. Here, we analyzed whole genome data from 1,070 Japanese individuals generated using massively parallel short-read sequencing of 162 paired-end bases. We have analyzed 843,473 STR loci with two to six basepair repeat units and cataloged highly polymorphic STR loci in the Japanese population. To evaluate the performance of the cataloged STR loci, we compared 23 STR loci, widely used in forensic DNA typing, with capillary electrophoresis based STR genotyping results in the Japanese population. Seventeen loci had high correlations and high call rates. The other six loci had low call rates or low correlations due to either the limitations of short-read sequencing technology, the bioinformatics tool used, or the complexity of repeat patterns. With these analyses, we have also purified the suitable 218 STR loci with four basepair repeat units and 53 loci with five basepair repeat units both for short read sequencing and PCR based technologies, which would be candidates to the actual forensic DNA typing in Japanese population.

  9. Metagenome assembly through clustering of next-generation sequencing data using protein sequences.

    PubMed

    Sim, Mikang; Kim, Jaebum

    2015-02-01

    The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be solved by investigating metagenomes. However, the difficulty of metagenomes, such as the combination of multiple microbes and different species abundance, makes metagenome assembly tasks more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.

  10. ɛ-connectedness, finite approximations, shape theory and coarse graining in hyperspaces

    NASA Astrophysics Data System (ADS)

    Alonso-Morón, Manuel; Cuchillo-Ibanez, Eduardo; Luzón, Ana

    2008-12-01

    We use upper semifinite hyperspaces of compacta to describe ε-connectedness and to compute homology from finite approximations. We find a new connection between ε-connectedness and the so-called Shape Theory. We construct a geodesically complete R-tree, by means of ε-components at different resolutions, whose behavior at infinity captures the topological structure of the space of components of a given compact metric space. We also construct inverse sequences of finite spaces using internal finite approximations of compact metric spaces. These sequences can be converted into inverse sequences of polyhedra and simplicial maps by means of what we call the Alexandroff-McCord correspondence. This correspondence allows us to relate upper semifinite hyperspaces of finite approximation with the Vietoris-Rips complexes of such approximations at different resolutions. Two motivating examples are included in the introduction. We propose this procedure as a different mathematical foundation for problems on data analysis. This process is intrinsically related to the methodology of shape theory. This paper reinforces Robins’s idea of using methods from shape theory to compute homology from finite approximations.

  11. Sequence Segmentation with changeptGUI.

    PubMed

    Tasker, Edward; Keith, Jonathan M

    2017-01-01

    Many biological sequences have a segmental structure that can provide valuable clues to their content, structure, and function. The program changept is a tool for investigating the segmental structure of a sequence, and can also be applied to multiple sequences in parallel to identify a common segmental structure, thus providing a method for integrating multiple data types to identify functional elements in genomes. In the previous edition of this book, a command line interface for changept is described. Here we present a graphical user interface for this package, called changeptGUI. This interface also includes tools for pre- and post-processing of data and results to facilitate investigation of the number and characteristics of segment classes.

  12. A simple algorithm for quantifying DNA methylation levels on multiple independent CpG sites in bisulfite genomic sequencing electropherograms.

    PubMed

    Leakey, Tatiana I; Zielinski, Jerzy; Siegfried, Rachel N; Siegel, Eric R; Fan, Chun-Yang; Cooney, Craig A

    2008-06-01

    DNA methylation at cytosines is a widely studied epigenetic modification. Methylation is commonly detected using bisulfite modification of DNA followed by PCR and additional techniques such as restriction digestion or sequencing. These additional techniques are either laborious, require specialized equipment, or are not quantitative. Here we describe a simple algorithm that yields quantitative results from analysis of conventional four-dye-trace sequencing. We call this method Mquant and we compare it with the established laboratory method of combined bisulfite restriction assay (COBRA). This analysis of sequencing electropherograms provides a simple, easily applied method to quantify DNA methylation at specific CpG sites.
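
    The abstract does not give the Mquant formula. The sketch below shows only the general principle behind quantifying methylation from a sequencing trace: after bisulfite conversion an unmethylated cytosine reads as T while a methylated one still reads as C, so the C share of the combined C and T signal at a CpG site estimates the methylated fraction. The function name and peak heights are hypothetical, and this ratio should not be read as the published algorithm.

```python
def methylation_fraction(c_peak, t_peak):
    """Estimate the methylated fraction at one CpG site from C and T peak heights."""
    total = c_peak + t_peak
    if total <= 0:
        raise ValueError("peak heights must sum to a positive value")
    return c_peak / total


# Hypothetical peak heights (arbitrary units) at three CpG sites.
sites = {"CpG1": (300, 100), "CpG2": (50, 450), "CpG3": (0, 500)}
for name, (c_peak, t_peak) in sites.items():
    print(name, methylation_fraction(c_peak, t_peak))
```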

  13. LongISLND: in silico sequencing of lengthy and noisy datatypes.

    PubMed

    Lau, Bayo; Mohiyuddin, Marghoob; Mu, John C; Fang, Li Tai; Bani Asadi, Narges; Dallett, Carolina; Lam, Hugo Y K

    2016-12-15

    LongISLND is a software package designed to simulate sequencing data according to the characteristics of third generation, single-molecule sequencing technologies. The general software architecture is easily extendable, as demonstrated by the emulation of Pacific Biosciences (PacBio) multi-pass sequencing with P5 and P6 chemistries, producing data in FASTQ, H5, and the latest PacBio BAM format. We demonstrate its utility by downstream processing with consensus building and variant calling. LongISLND is implemented in Java and available at http://bioinform.github.io/longislnd Contact: hugo.lam@roche.com Supplementary information: Supplementary data are available at Bioinformatics online. © The Author 2016. Published by Oxford University Press.

  14. The distributed annotation system.

    PubMed

    Dowell, R D; Jokerst, R M; Day, A; Eddy, S R; Stein, L

    2001-01-01

    Currently, most genome annotation is curated by centralized groups with limited resources. Efforts to share annotations transparently among multiple groups have not yet been satisfactory. Here we introduce a concept called the Distributed Annotation System (DAS). DAS allows sequence annotations to be decentralized among multiple third-party annotators and integrated on an as-needed basis by client-side software. The communication between client and servers in DAS is defined by the DAS XML specification. Annotations are displayed in layers, one per server. Any client or server adhering to the DAS XML specification can participate in the system; we describe a simple prototype client and server example. The DAS specification is being used experimentally by Ensembl, WormBase, and the Berkeley Drosophila Genome Project. Continued success will depend on the readiness of the research community to adopt DAS and provide annotations. All components are freely available from the project website http://www.biodas.org/.

  15. A UML-based meta-framework for system design in public health informatics.

    PubMed

    Orlova, Anna O; Lehmann, Harold

    2002-01-01

    The National Agenda for Public Health Informatics calls for standards in data and knowledge representation within public health, which requires a multi-level framework that links all aspects of public health. The literature of public health informatics and public health informatics application were reviewed. A UML-based systems analysis was performed. Face validity of results was evaluated in analyzing the public health domain of lead poisoning. The core class of the UML-based system of public health is the Public Health Domain, which is associated with multiple Problems, for which Actors provide Perspectives. Actors take Actions that define, generate, utilize and/or evaluate Data Sources. The life cycle of the domain is a sequence of activities attributed to its problems that spirals through multiple iterations and realizations within a domain. The proposed Public Health Informatics Meta-Framework broadens efforts in applying informatics principles to the field of public health.

  16. A cascade reaction network mimicking the basic functional steps of adaptive immune response

    NASA Astrophysics Data System (ADS)

    Han, Da; Wu, Cuichen; You, Mingxu; Zhang, Tao; Wan, Shuo; Chen, Tao; Qiu, Liping; Zheng, Zheng; Liang, Hao; Tan, Weihong

    2015-10-01

    Biological systems use complex ‘information-processing cores’ composed of molecular networks to coordinate their external environment and internal states. An example of this is the acquired, or adaptive, immune system (AIS), which is composed of both humoral and cell-mediated components. Here we report the step-by-step construction of a prototype mimic of the AIS that we call an adaptive immune response simulator (AIRS). DNA and enzymes are used as simple artificial analogues of the components of the AIS to create a system that responds to specific molecular stimuli in vitro. We show that this network of reactions can function in a manner that is superficially similar to the most basic responses of the vertebrate AIS, including reaction sequences that mimic both humoral and cellular responses. As such, AIRS provides guidelines for the design and engineering of artificial reaction networks and molecular devices.

  17. Iterative Repair Planning for Spacecraft Operations Using the Aspen System

    NASA Technical Reports Server (NTRS)

    Rabideau, G.; Knight, R.; Chien, S.; Fukunaga, A.; Govindjee, A.

    2000-01-01

    This paper describes the Automated Scheduling and Planning Environment (ASPEN). ASPEN encodes complex spacecraft knowledge of operability constraints, flight rules, spacecraft hardware, science experiments and operations procedures to allow for automated generation of low level spacecraft sequences. Using a technique called iterative repair, ASPEN classifies constraint violations (i.e., conflicts) and attempts to repair each by performing a planning or scheduling operation. It must reason about which conflict to resolve first and what repair method to try for the given conflict. ASPEN is currently being utilized in the development of automated planner/scheduler systems for several spacecraft, including the UFO-1 naval communications satellite and the Citizen Explorer (CX1) satellite, as well as for planetary rover operations and antenna ground systems automation. This paper focuses on the algorithm and search strategies employed by ASPEN to resolve spacecraft operations constraints, as well as the data structures for representing these constraints.
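
    Iterative repair, as described above, alternates between detecting conflicts and applying a repair until none remain. The toy scheduler below is a sketch of that loop only: it assumes a single shared resource, one conflict type (overlapping intervals) and one repair method (delaying the later activity), whereas ASPEN reasons over many constraint types and repair choices. All names and the example plan are hypothetical.

```python
def find_conflicts(schedule):
    """Return pairs of activities whose intervals overlap on one shared resource."""
    names = sorted(schedule)
    conflicts = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            a_start, a_dur = schedule[a]
            b_start, b_dur = schedule[b]
            if a_start < b_start + b_dur and b_start < a_start + a_dur:
                conflicts.append((a, b))
    return conflicts


def iterative_repair(schedule, max_iterations=100):
    """Resolve one conflict per iteration by delaying the later-starting activity."""
    schedule = dict(schedule)  # work on a copy of {name: (start, duration)}
    for _ in range(max_iterations):
        conflicts = find_conflicts(schedule)
        if not conflicts:
            return schedule
        a, b = conflicts[0]  # choose which conflict to resolve first
        first, second = sorted((a, b), key=lambda n: (schedule[n][0], n))
        first_start, first_dur = schedule[first]
        # Repair method: start the second activity when the first one ends.
        schedule[second] = (first_start + first_dur, schedule[second][1])
    raise RuntimeError("no conflict-free schedule found within the iteration limit")


# Hypothetical plan: activity name -> (start time, duration).
plan = {"downlink": (0, 4), "image": (2, 3), "slew": (3, 2)}
print(iterative_repair(plan))
```

    The choice of which conflict to take first and which repair to apply is hard-coded here; the abstract notes that ASPEN must reason about both.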

  18. Nonclassicality of Temporal Correlations.

    PubMed

    Brierley, Stephen; Kosowski, Adrian; Markiewicz, Marcin; Paterek, Tomasz; Przysiężna, Anna

    2015-09-18

    The results of spacelike separated measurements are independent of distant measurement settings, a property one might call two-way no-signaling. In contrast, timelike separated measurements are only one-way no-signaling since the past is independent of the future but not vice versa. For this reason some temporal correlations that are formally identical to nonclassical spatial correlations can still be modeled classically. We propose a new formulation of Bell's theorem for temporal correlations; namely, we define nonclassical temporal correlations as the ones which cannot be simulated by propagating in time the classical information content of a quantum system given by the Holevo bound. We first show that temporal correlations between results of any projective quantum measurements on a qubit can be simulated classically. Then we present a sequence of general measurements on a single m-level quantum system that cannot be explained by propagating in time an m-level classical system and using classical computers with unlimited memory.

  19. Identification of co-occurrence in a patient with Dent's disease and ADA2-deficiency by exome sequencing.

    PubMed

    Günthner, Roman; Wagner, Matias; Thurm, Tobias; Ponsel, Sabine; Höfele, Julia; Lange-Sperandio, Bärbel

    2018-04-05

    Patients with co-occurrence of two independent pathologies pose a challenge for clinicians as the phenotype often presents as an unclear syndrome. In these cases, exome sequencing serves as a powerful instrument to determine the underlying genetic causes. Here, we present the case of a 4-year old boy with proteinuria, microhematuria, hypercalciuria, nephrocalcinosis, livedo-like rash, recurrent abdominal pain, anemia and continuously elevated CRP. Single exome sequencing revealed the pathogenic nonsense mutation p.(Arg98*) in the CLCN5 gene causing the X-linked inherited, renal tubular disorder Dent's disease. Furthermore, the two pathogenic and compound heterozygous missense variants p.(Gly47Ala) and p.(Pro251Leu) in the CECR1 gene could be identified. Mutations in the CECR1 gene are associated with a hereditary form of polyarteritis nodosa, called ADA2-deficiency. Both parents were carriers of a single heterozygous variant in CECR1 and the mother was carrier of the CLCN5 variant. This case evidently demonstrates the advantage of whole exome sequencing compared to single gene testing as the pathology in the CECR1 gene might have only been diagnosed after the occurrence of signs of systemic vasculitis like strokes or hemorrhages. Therefore, treatment and prevention can now start early to improve the outcome of these patients. Copyright © 2018 Elsevier B.V. All rights reserved.

  20. Model annotation for synthetic biology: automating model to nucleotide sequence conversion

    PubMed Central

    Misirli, Goksel; Hallinan, Jennifer S.; Yu, Tommy; Lawson, James R.; Wimalaratne, Sarala M.; Cooling, Michael T.; Wipat, Anil

    2011-01-01

    Motivation: The need for the automated computational design of genetic circuits is becoming increasingly apparent with the advent of ever more complex and ambitious synthetic biology projects. Currently, most circuits are designed through the assembly of models of individual parts such as promoters, ribosome binding sites and coding sequences. These low level models are combined to produce a dynamic model of a larger device that exhibits a desired behaviour. The larger model then acts as a blueprint for physical implementation at the DNA level. However, the conversion of models of complex genetic circuits into DNA sequences is a non-trivial undertaking due to the complexity of mapping the model parts to their physical manifestation. Automating this process is further hampered by the lack of computationally tractable information in most models. Results: We describe a method for automatically generating DNA sequences from dynamic models implemented in CellML and Systems Biology Markup Language (SBML). We also identify the metadata needed to annotate models to facilitate automated conversion, and propose and demonstrate a method for the markup of these models using RDF. Our algorithm has been implemented in a software tool called MoSeC. Availability: The software is available from the authors' web site http://research.ncl.ac.uk/synthetic_biology/downloads.html. Contact: anil.wipat@ncl.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online. PMID:21296753

  1. A hybrid computational strategy to address WGS variant analysis in >5000 samples.

    PubMed

    Huang, Zhuoyi; Rustagi, Navin; Veeraraghavan, Narayanan; Carroll, Andrew; Gibbs, Richard; Boerwinkle, Eric; Venkata, Manjunath Gorentla; Yu, Fuli

    2016-09-10

    The decreasing costs of sequencing are driving the need for cost effective and real time variant calling of whole genome sequencing data. The scale of these projects is far beyond the capacity of typical computing resources available to most research labs. Other infrastructures like the cloud AWS environment and supercomputers also have limitations due to which large scale joint variant calling becomes infeasible, and infrastructure specific variant calling strategies either fail to scale up to large datasets or abandon joint calling strategies. We present a high throughput framework including multiple variant callers for single nucleotide variant (SNV) calling, which leverages hybrid computing infrastructure consisting of cloud AWS, supercomputers and local high performance computing infrastructures. We present a novel binning approach for large scale joint variant calling and imputation which can scale up to over 10,000 samples while producing SNV callsets with high sensitivity and specificity. As a proof of principle, we present results of analysis on the Cohorts for Heart And Aging Research in Genomic Epidemiology (CHARGE) WGS freeze 3 dataset, in which joint calling, imputation and phasing of over 5300 whole genome samples were completed in under 6 weeks using four state-of-the-art callers. The callers used were SNPTools, GATK-HaplotypeCaller, GATK-UnifiedGenotyper and GotCloud. We used Amazon AWS, a 4000-core in-house cluster at Baylor College of Medicine, IBM power PC Blue BioU at Rice and Rhea at Oak Ridge National Laboratory (ORNL) for the computation. AWS was used for joint calling of 180 TB of BAM files, and ORNL and Rice supercomputers were used for the imputation and phasing step. All other steps were carried out on the local compute cluster. The entire operation used 5.2 million core hours and only transferred a total of 6 TB of data across the platforms. 
Even with increasing sizes of whole genome datasets, ensemble joint calling of SNVs for low coverage data can be accomplished in a scalable, cost effective and fast manner by using heterogeneous computing platforms without compromising on the quality of variants.
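
    The abstract does not describe how its binning approach divides the genome. As a rough illustration of why binning helps joint calling scale, the sketch below splits chromosomes into fixed-size windows that could each be dispatched as an independent job. The fixed window size, the function name, and the chromosome lengths are assumptions for the example, not details of the published framework.

```python
def make_bins(chrom_lengths, bin_size):
    """Split each chromosome into consecutive 1-based, inclusive (chrom, start, end) bins."""
    bins = []
    for chrom, length in chrom_lengths.items():
        for start in range(1, length + 1, bin_size):
            end = min(start + bin_size - 1, length)
            bins.append((chrom, start, end))
    return bins


# Hypothetical, tiny chromosome lengths chosen so the output is easy to read.
toy_genome = {"chr1": 2500, "chr2": 1000}
for region in make_bins(toy_genome, 1000):
    print(region)
```

    Each region can be processed across all samples at once, so the number of concurrent jobs grows with genome size rather than with cohort size.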

  2. Approaching the taxonomic affiliation of unidentified sequences in public databases--an example from the mycorrhizal fungi.

    PubMed

    Nilsson, R Henrik; Kristiansson, Erik; Ryberg, Martin; Larsson, Karl-Henrik

    2005-07-18

    During the last few years, DNA sequence analysis has become one of the primary means of taxonomic identification of species, particularly so for species that are minute or otherwise lack distinct, readily obtainable morphological characters. Although the number of sequences available for comparison in public databases such as GenBank increases exponentially, only a minuscule fraction of all organisms have been sequenced, leaving taxon sampling a momentous problem for sequence-based taxonomic identification. When querying GenBank with a set of unidentified sequences, a considerable proportion typically lack fully identified matches, forming an ever-mounting pile of sequences that the researcher will have to monitor manually in the hope that new, clarifying sequences have been submitted by other researchers. To alleviate these concerns, a project to automatically monitor select unidentified sequences in GenBank for taxonomic progress through repeated local BLAST searches was initiated. Mycorrhizal fungi--a field where species identification often is prohibitively complex--and the much used ITS locus were chosen as test bed. A Perl script package called emerencia is presented. On a regular basis, it downloads select sequences from GenBank, separates the identified sequences from those insufficiently identified, and performs BLAST searches between these two datasets, storing all results in an SQL database. On the accompanying web-service http://emerencia.math.chalmers.se, users can monitor the taxonomic progress of insufficiently identified sequences over time, either through active searches or by signing up for e-mail notification upon disclosure of better matches. Other search categories, such as listing all insufficiently identified sequences (and their present best fully identified matches) publication-wise, are also available. 
The ever-increasing use of DNA sequences for identification purposes largely falls back on the assumption that public sequence databases contain a thorough sampling of taxonomically well-annotated sequences. Taxonomy, held by some to be an old-fashioned trade, has accordingly never been more important. emerencia does not automate the taxonomic process, but it does allow researchers to focus their efforts elsewhere than countless manual BLAST runs and arduous sieving of BLAST hit lists. The emerencia system is available on an open source basis for local installation with any organism and gene group as targets.

  3. PipeOnline 2.0: automated EST processing and functional data sorting.

    PubMed

    Ayoubi, Patricia; Jin, Xiaojing; Leite, Saul; Liu, Xianghui; Martajaja, Jeson; Abduraham, Abdurashid; Wan, Qiaolan; Yan, Wei; Misawa, Eduardo; Prade, Rolf A

    2002-11-01

    Expressed sequence tags (ESTs) are generated and deposited in the public domain, as redundant, unannotated, single-pass reactions, with virtually no biological content. PipeOnline automatically analyses and transforms large collections of raw DNA-sequence data from chromatograms or FASTA files by calling the quality of bases, screening and removing vector sequences, assembling and rewriting consensus sequences of redundant input files into a unigene EST data set and finally through translation, amino acid sequence similarity searches, annotation of public databases and functional data. PipeOnline generates an annotated database, retaining the processed unigene sequence, clone/file history, alignments with similar sequences, and proposed functional classification, if available. Functional annotation is automatic and based on a novel method that relies on homology of amino acid sequence multiplicity within GenBank records. Records are examined through a function ordered browser or keyword queries with automated export of results. PipeOnline offers customization for individual projects (MyPipeOnline), automated updating and alert service. PipeOnline is available at http://stress-genomics.org.

  4. Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications.

    PubMed

    Huang, Lei; Ma, Fei; Chapman, Alec; Lu, Sijia; Xie, Xiaoliang Sunney

    2015-01-01

    We present a survey of single-cell whole-genome amplification (WGA) methods, including degenerate oligonucleotide-primed polymerase chain reaction (DOP-PCR), multiple displacement amplification (MDA), and multiple annealing and looping-based amplification cycles (MALBAC). The key parameters to characterize the performance of these methods are defined, including genome coverage, uniformity, reproducibility, unmappable rates, chimera rates, allele dropout rates, false positive rates for calling single-nucleotide variations, and ability to call copy-number variations. Using these parameters, we compare five commercial WGA kits by performing deep sequencing of multiple single cells. We also discuss several major applications of single-cell genomics, including studies of whole-genome de novo mutation rates, the early evolution of cancer genomes, circulating tumor cells (CTCs), meiotic recombination of germ cells, preimplantation genetic diagnosis (PGD), and preimplantation genomic screening (PGS) for in vitro-fertilized embryos.

  5. Auditory responses in the amygdala to social vocalizations

    NASA Astrophysics Data System (ADS)

    Gadziola, Marie A.

    The underlying goal of this dissertation is to understand how the amygdala, a brain region involved in establishing the emotional significance of sensory input, contributes to the processing of complex sounds. The general hypothesis is that communication calls of big brown bats (Eptesicus fuscus) transmit relevant information about social context that is reflected in the activity of amygdalar neurons. The first specific aim analyzed social vocalizations emitted under a variety of behavioral contexts, and related vocalizations to an objective measure of internal physiological state by monitoring the heart rate of vocalizing bats. These experiments revealed a complex acoustic communication system among big brown bats in which acoustic cues and call structure signal the emotional state of a sender. The second specific aim characterized the responsiveness of single neurons in the basolateral amygdala to a range of social syllables. Neurons typically respond to the majority of tested syllables, but effectively discriminate among vocalizations by varying the response duration. This novel coding strategy underscores the importance of persistent firing in the general functioning of the amygdala. The third specific aim examined the influence of acoustic context by characterizing both the behavioral and neurophysiological responses to natural vocal sequences. Vocal sequences differentially modify the internal affective state of a listening bat, with lower aggression vocalizations evoking the greatest change in heart rate. Amygdalar neurons employ two different coding strategies: low background neurons respond selectively to very few stimuli, whereas high background neurons respond broadly to stimuli but demonstrate variation in response magnitude and timing. Neurons appear to discriminate the valence of stimuli, with aggression sequences evoking robust population-level responses across all sound levels. 
Further, vocal sequences show improved discrimination among stimuli compared to isolated syllables, and this improved discrimination is expressed in part by the timing of action potentials. Taken together, these data support the hypothesis that big brown bat social vocalizations transmit relevant information about the social context that is encoded within the discharge pattern of amygdalar neurons ultimately responsible for coordinating appropriate social behaviors. I further propose that vocalization-evoked amygdalar activity will have significant impact on subsequent sensory processing and plasticity.

  6. Association analysis using next-generation sequence data from publicly available control groups: the robust variance score statistic.

    PubMed

    Derkach, Andriy; Chiang, Theodore; Gong, Jiafen; Addis, Laura; Dobbins, Sara; Tomlinson, Ian; Houlston, Richard; Pal, Deb K; Strug, Lisa J

    2014-08-01

    Sufficiently powered case-control studies with next-generation sequence (NGS) data remain prohibitively expensive for many investigators. If feasible, a more efficient strategy would be to include publicly available sequenced controls. However, these studies can be confounded by differences in sequencing platform; alignment, single nucleotide polymorphism and variant calling algorithms; read depth; and selection thresholds. Assuming one can match cases and controls on the basis of ethnicity and other potential confounding factors, and one has access to the aligned reads in both groups, we investigate the effect of systematic differences in read depth and selection threshold when comparing allele frequencies between cases and controls. We propose a novel likelihood-based method, the robust variance score (RVS), that substitutes genotype calls by their expected values given observed sequence data. We show theoretically that the RVS eliminates read depth bias in the estimation of minor allele frequency. We also demonstrate that, using simulated and real NGS data, the RVS method controls Type I error and has comparable power to the 'gold standard' analysis with the true underlying genotypes for both common and rare variants. An RVS R script and instructions can be found at strug.research.sickkids.ca, and at https://github.com/strug-lab/RVS. Supplementary data are available at Bioinformatics online.
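
    The substitution of a hard genotype call by its expected value can be illustrated with a minimal Python sketch. It is not the authors' R script: the function name is hypothetical, and the Hardy-Weinberg prior built from the minor allele frequency is an assumption made here for illustration.

```python
def expected_genotype(likelihoods, maf):
    """Return E[G | D] for a biallelic site, where G counts minor alleles (0, 1, 2).

    likelihoods: (P(D | G=0), P(D | G=1), P(D | G=2)) from the aligned reads.
    maf: minor allele frequency; the Hardy-Weinberg prior below is an assumption.
    """
    prior = ((1 - maf) ** 2, 2 * maf * (1 - maf), maf ** 2)
    joint = [lik * pri for lik, pri in zip(likelihoods, prior)]
    total = sum(joint)
    if total == 0:
        raise ValueError("all genotypes have zero posterior weight")
    # Posterior-weighted mean of the genotype values 0, 1 and 2.
    return sum(g * weight for g, weight in enumerate(joint)) / total


# Uninformative reads fall back on the prior mean, 2 * maf.
print(expected_genotype((1.0, 1.0, 1.0), 0.5))
```

    At low read depth the likelihoods are flat, so the expected value stays close to the prior mean rather than snapping to 0, 1 or 2 as a hard call would.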

  7. A multi-model approach to nucleic acid-based drug development.

    PubMed

    Gautherot, Isabelle; Sodoyer, Regís

    2004-01-01

    With the advent of functional genomics and the shift of interest towards sequence-based therapeutics, the past decades have witnessed intense research efforts on nucleic acid-mediated gene regulation technologies. Today, RNA interference is emerging as a groundbreaking discovery, holding promise for development of genetic modulators of unprecedented potency. Twenty-five years after the discovery of antisense RNA and ribozymes, gene control therapeutics are still facing developmental difficulties, with only one US FDA-approved antisense drug currently available in the clinic. Limited predictability of target site selection models is recognized as one major stumbling block that is shared by all of the so-called complementary technologies, slowing the progress towards a commercial product. Currently employed in vitro systems for target site selection include RNAse H-based mapping, antisense oligonucleotide microarrays, and functional screening approaches using libraries of catalysts with randomized target-binding arms to identify optimal ribozyme/DNAzyme cleavage sites. Individually, each strategy has its drawbacks from a drug development perspective. Utilization of message-modulating sequences as therapeutic agents requires that their action on a given target transcript meets criteria of potency and selectivity in the natural physiological environment. In addition to sequence-dependent characteristics, other factors will influence annealing reactions and duplex stability, as well as nucleic acid-mediated catalysis. Parallel consideration of physiological selection systems thus appears essential for screening for nucleic acid compounds proposed for therapeutic applications. Cellular message-targeting studies face issues relating to efficient nucleic acid delivery and appropriate analysis of response. 
For reliability and simplicity, prokaryotic systems can provide a rapid and cost-effective means of studying message targeting under pseudo-cellular conditions, but such approaches also have limitations. To streamline nucleic acid drug discovery, we propose a multi-model strategy integrating high-throughput-adapted bacterial screening, followed by reporter-based and/or natural cellular models and potentially also in vitro assays for characterization of the most promising candidate sequences, before final in vivo testing.

  8. On the definition of adapted audio/video profiles for high-quality video calling services over LTE/4G

    NASA Astrophysics Data System (ADS)

    Ndiaye, Maty; Quinquis, Catherine; Larabi, Mohamed Chaker; Le Lay, Gwenael; Saadane, Hakim; Perrine, Clency

    2014-01-01

    During the last decade, the important advances and widespread availability of mobile technology (operating systems, GPUs, terminal resolution and so on) have encouraged the rapid development of voice and video services like video-calling. While multimedia services have largely grown on mobile devices, the resulting increase in data consumption is leading to the saturation of mobile networks. In order to provide data with high bit-rates and maintain performance as close as possible to traditional networks, the 3GPP (The 3rd Generation Partnership Project) worked on a high performance standard for mobile called Long Term Evolution (LTE). In this paper, we make recommendations on audio and video media profiles (selection of audio and video codecs, bit-rates, frame-rates, audio and video formats) for typical video-calling services over LTE/4G mobile networks. These profiles are defined according to targeted devices (smartphones, tablets), so as to ensure the best possible quality of experience (QoE). Obtained results indicate that for a CIF format (352 x 288 pixels), which is usually used for smartphones, the VP8 codec provides a better image quality than the H.264 codec for low bitrates (from 128 to 384 kbps). However, for sequences with high motion, H.264 in slow mode is preferred. Regarding audio, better results are globally achieved using wideband codecs offering good quality, except for the Opus codec (at 12.2 kbps).

  9. Accurate and exact CNV identification from targeted high-throughput sequence data.

    PubMed

    Nord, Alex S; Lee, Ming; King, Mary-Claire; Walsh, Tom

    2011-04-12

    Massively parallel sequencing of barcoded DNA samples significantly increases screening efficiency for clinically important genes. Short read aligners are well suited to single nucleotide and indel detection. However, methods for CNV detection from targeted enrichment are lacking. We present a method combining coverage with map information for the identification of deletions and duplications in targeted sequence data. Sequencing data is first scanned for gains and losses using a comparison of normalized coverage data between samples. CNV calls are confirmed by testing for a signature of sequences that span the CNV breakpoint. With our method, CNVs can be identified regardless of whether breakpoints are within regions targeted for sequencing. For CNVs where at least one breakpoint is within targeted sequence, exact CNV breakpoints can be identified. In a test data set of 96 subjects sequenced across ~1 Mb genomic sequence using multiplexing technology, our method detected mutations as small as 31 bp, predicted quantitative copy count, and had a low false-positive rate. Application of this method allows for identification of gains and losses in targeted sequence data, providing comprehensive mutation screening when combined with a short read aligner.
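
    The first step described above, scanning for gains and losses by comparing normalized coverage between samples, might be sketched as follows. This is an illustrative Python sketch only: the function names, the total-coverage normalization, the median reference and the 0.7 / 1.3 ratio thresholds are assumptions, not the published method, and the breakpoint confirmation step is not shown.

```python
from statistics import median


def normalize(coverage):
    """Scale per-target read counts so that they sum to 1 for the sample."""
    total = sum(coverage)
    return [count / total for count in coverage]


def call_gains_losses(samples, test_name, loss=0.7, gain=1.3):
    """Flag targets whose normalized coverage departs from the other samples.

    samples: dict mapping sample name to a list of per-target read counts.
    loss, gain: hypothetical ratio thresholds chosen for illustration.
    """
    norm = {name: normalize(cov) for name, cov in samples.items()}
    calls = []
    for i, value in enumerate(norm[test_name]):
        # Reference level: median of the same target in all other samples.
        reference = median(v[i] for n, v in norm.items() if n != test_name)
        if reference == 0:
            continue  # target not covered in the reference samples
        ratio = value / reference
        if ratio <= loss:
            calls.append((i, "loss", ratio))
        elif ratio >= gain:
            calls.append((i, "gain", ratio))
    return calls


samples = {
    "s1": [100, 100, 100, 100],
    "s2": [100, 100, 100, 100],
    "s3": [100, 50, 100, 100],  # half coverage at target 1
}
print(call_gains_losses(samples, "s3"))
```

    Normalizing by total coverage removes differences in sequencing depth between barcoded samples, so only the relative shape of the coverage profile is compared.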

  10. Real-time tracking of respiratory-induced tumor motion by dose-rate regulation

    NASA Astrophysics Data System (ADS)

    Han-Oh, Yeonju Sarah

    We have developed a novel real-time tumor-tracking technology, called Dose-Rate-Regulated Tracking (DRRT), to compensate for tumor motion caused by breathing. Unlike other previously proposed tumor-tracking methods, this new method uses a preprogrammed dynamic multileaf collimator (MLC) sequence in combination with real-time dose-rate control. This new scheme circumvents the technical challenge in MLC-based tumor tracking, namely controlling the MLC motion in real time based on real-time detected tumor motion. The preprogrammed MLC sequence describes the movement of the tumor as a function of breathing phase, amplitude, or tidal volume. The irregularity of tumor motion during treatment is handled by real-time regulation of the dose rate, which effectively speeds up or slows down the delivery of radiation as needed. This method is based on the fact that all of the parameters in dynamic radiation delivery, including MLC motion, are enslaved to the cumulative dose, which, in turn, can be accelerated or decelerated by varying the dose rate. Because commercially available MLC systems do not allow the MLC delivery sequence to be modified in real time based on the patient's breathing signal, previously proposed tumor-tracking techniques using an MLC cannot be readily implemented in the clinic today. By using a preprogrammed MLC sequence to handle the required motion, the task for real-time control is greatly simplified. We have developed and tested the preprogrammed MLC sequence and the dose-rate regulation algorithm using lung-cancer patients' breathing signals. It has been shown that DRRT can track the tumor with an accuracy of less than 2 mm for a latency of the DRRT system of less than 0.35 s. We also have evaluated the usefulness of guided breathing for DRRT. Since DRRT by its very nature can compensate for breathing-period changes, guided breathing was shown to be unnecessary for real-time tracking when using DRRT. 
Finally, DRRT uses the existing dose-rate control system that is provided for current linear accelerators. Therefore, DRRT can be achieved with minimal modification of existing technology, and this can shorten substantially the time necessary to establish DRRT in clinical practice.

  11. The CRISPRdb database and tools to display CRISPRs and to generate dictionaries of spacers and repeats

    PubMed Central

    Grissa, Ibtissem; Vergnaud, Gilles; Pourcel, Christine

    2007-01-01

    Background In Archaea and Bacteria, the repeated elements called CRISPRs for "clustered regularly interspaced short palindromic repeats" are believed to participate in the defence against viruses. Short sequences called spacers are stored in-between repeated elements. In the current model, motifs comprising spacers and repeats may target an invading DNA and lead to its degradation through a proposed mechanism similar to RNA interference. Analysis of intra-species polymorphism shows that new motifs (one spacer and one repeated element) are added in a polarised fashion. Although their principal characteristics have been described, a lot remains to be discovered on the way CRISPRs are created and evolve. As new genome sequences become available it appears necessary to develop automated scanning tools to make available CRISPRs related information and to facilitate additional investigations. Description We have produced a program, CRISPRFinder, which identifies CRISPRs and extracts the repeated and unique sequences. Using this software, a database is constructed which is automatically updated monthly from newly released genome sequences. Additional tools were created to allow the alignment of flanking sequences in search for similarities between different loci and to build dictionaries of unique sequences. To date, almost six hundred CRISPRs have been identified in 475 published genomes. Two Archaea out of thirty-seven and about half of Bacteria do not possess a CRISPR. Fine analysis of repeated sequences strongly supports the current view that new motifs are added at one end of the CRISPR adjacent to the putative promoter. Conclusion It is hoped that availability of a public database, regularly updated and which can be queried on the web will help in further dissecting and understanding CRISPR structure and flanking sequences evolution. Subsequent analyses of the intra-species CRISPR polymorphism will be facilitated by CRISPRFinder and the dictionary creator. 
CRISPRdb is accessible at PMID:17521438

  12. Synthetic Spike-in Standards Improve Run-Specific Systematic Error Analysis for DNA and RNA Sequencing

    PubMed Central

    Zook, Justin M.; Samarov, Daniel; McDaniel, Jennifer; Sen, Shurjo K.; Salit, Marc

    2012-01-01

    While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has less dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration. PMID:22859977
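
    The "Phred-scaled quality score units" quoted above follow the standard definition Q = -10 log10(p), where p is the probability that a base call is wrong. The short Python sketch below shows the conversion, plus an empirical quality computed from mismatches at positions whose true base is known (as with reads mapped to spike-in standards). The function names and the add-one pseudocount are assumptions for illustration, not GATK's implementation.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to the probability of a wrong base call."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Convert an error probability (0 < p <= 1) to a Phred quality score."""
    return -10 * math.log10(p)


def empirical_quality(mismatches, observations):
    """Phred score implied by the observed mismatch rate at known positions.

    The +1 / +2 pseudocount keeps the score finite when no mismatch is seen;
    it is an assumption made for this sketch.
    """
    return error_prob_to_phred((mismatches + 1) / (observations + 2))


print(phred_to_error_prob(30))
print(empirical_quality(0, 98))
```

    On this scale, a shift of 5 units corresponds to a change in error probability by a factor of 10^0.5, roughly 3.2.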

  13. The effect of call libraries and acoustic filters on the identification of bat echolocation.

    PubMed

    Clement, Matthew J; Murray, Kevin L; Solick, Donald I; Gruver, Jeffrey C

    2014-09-01

    Quantitative methods for species identification are commonly used in acoustic surveys for animals. While various identification models have been studied extensively, there has been little study of methods for selecting calls prior to modeling or methods for validating results after modeling. We obtained two call libraries with a combined 1556 pulse sequences from 11 North American bat species. We used four acoustic filters to automatically select and quantify bat calls from the combined library. For each filter, we trained a species identification model (a quadratic discriminant function analysis) and compared the classification ability of the models. In a separate analysis, we trained a classification model using just one call library. We then compared a conventional model assessment that used the training library against an alternative approach that used the second library. We found that filters differed in the share of known pulse sequences that were selected (68 to 96%), the share of non-bat noises that were excluded (37 to 100%), their measurement of various pulse parameters, and their overall correct classification rate (41% to 85%). Although the top two filters did not differ significantly in overall correct classification rate (85% and 83%), rates differed significantly for some bat species. In our assessment of call libraries, overall correct classification rates were significantly lower (15% to 23% lower) when tested on the second call library instead of the training library. Well-designed filters obviated the need for subjective and time-consuming manual selection of pulses. Accordingly, researchers should carefully design and test filters and include adequate descriptions in publications. Our results also indicate that it may not be possible to extend inferences about model accuracy beyond the training library. 
If so, the accuracy of acoustic-only surveys may be lower than commonly reported, which could affect ecological understanding or management decisions based on acoustic surveys.

  14. The effect of call libraries and acoustic filters on the identification of bat echolocation

    PubMed Central

    Clement, Matthew J; Murray, Kevin L; Solick, Donald I; Gruver, Jeffrey C

    2014-01-01

    Quantitative methods for species identification are commonly used in acoustic surveys for animals. While various identification models have been studied extensively, there has been little study of methods for selecting calls prior to modeling or methods for validating results after modeling. We obtained two call libraries with a combined 1556 pulse sequences from 11 North American bat species. We used four acoustic filters to automatically select and quantify bat calls from the combined library. For each filter, we trained a species identification model (a quadratic discriminant function analysis) and compared the classification ability of the models. In a separate analysis, we trained a classification model using just one call library. We then compared a conventional model assessment that used the training library against an alternative approach that used the second library. We found that filters differed in the share of known pulse sequences that were selected (68 to 96%), the share of non-bat noises that were excluded (37 to 100%), their measurement of various pulse parameters, and their overall correct classification rate (41% to 85%). Although the top two filters did not differ significantly in overall correct classification rate (85% and 83%), rates differed significantly for some bat species. In our assessment of call libraries, overall correct classification rates were significantly lower (15% to 23% lower) when tested on the second call library instead of the training library. Well-designed filters obviated the need for subjective and time-consuming manual selection of pulses. Accordingly, researchers should carefully design and test filters and include adequate descriptions in publications. Our results also indicate that it may not be possible to extend inferences about model accuracy beyond the training library. 
If so, the accuracy of acoustic-only surveys may be lower than commonly reported, which could affect ecological understanding or management decisions based on acoustic surveys. PMID:25535563

  15. The effect of call libraries and acoustic filters on the identification of bat echolocation

    USGS Publications Warehouse

    Clement, Matthew; Murray, Kevin L; Solick, Donald I; Gruver, Jeffrey C

    2014-01-01

    Quantitative methods for species identification are commonly used in acoustic surveys for animals. While various identification models have been studied extensively, there has been little study of methods for selecting calls prior to modeling or methods for validating results after modeling. We obtained two call libraries with a combined 1556 pulse sequences from 11 North American bat species. We used four acoustic filters to automatically select and quantify bat calls from the combined library. For each filter, we trained a species identification model (a quadratic discriminant function analysis) and compared the classification ability of the models. In a separate analysis, we trained a classification model using just one call library. We then compared a conventional model assessment that used the training library against an alternative approach that used the second library. We found that filters differed in the share of known pulse sequences that were selected (68 to 96%), the share of non-bat noises that were excluded (37 to 100%), their measurement of various pulse parameters, and their overall correct classification rate (41% to 85%). Although the top two filters did not differ significantly in overall correct classification rate (85% and 83%), rates differed significantly for some bat species. In our assessment of call libraries, overall correct classification rates were significantly lower (15% to 23% lower) when tested on the second call library instead of the training library. Well-designed filters obviated the need for subjective and time-consuming manual selection of pulses. Accordingly, researchers should carefully design and test filters and include adequate descriptions in publications. Our results also indicate that it may not be possible to extend inferences about model accuracy beyond the training library. 
If so, the accuracy of acoustic-only surveys may be lower than commonly reported, which could affect ecological understanding or management decisions based on acoustic surveys.

  16. High time for a roll call: gene duplication and phylogenetic relationships of TCP-like genes in monocots

    PubMed Central

    Mondragón-Palomino, Mariana; Trontin, Charlotte

    2011-01-01

    Background and Aims The TCP family is an ancient group of plant developmental transcription factors that regulate cell division in vegetative and reproductive structures and are essential in the establishment of flower zygomorphy. In-depth research on eudicot TCPs has documented their evolutionary and developmental role. This has not happened to the same extent in monocots, although zygomorphy has been critical for the diversification of Orchidaceae and Poaceae, the largest families of this group. Investigating the evolution and function of TCP-like genes in a wider group of monocots requires a detailed phylogenetic analysis of all available sequence information and a system that facilitates comparing genetic and functional information. Methods The phylogenetic relationships of TCP-like genes in monocots were investigated by analysing sequences from the genomes of Zea mays, Brachypodium distachyon, Oryza sativa and Sorghum bicolor, as well as EST data from several other monocot species. Key Results All available monocot TCP-like sequences are associated in 20 major groups with an average identity ≥64 % and most correspond to well-supported clades of the phylogeny. Their sequence motifs and relationships of orthology were documented and it was found that 67 % of the TCP-like genes of Sorghum, Oryza, Zea and Brachypodium are in microsyntenic regions. This analysis suggests that two rounds of whole genome duplication drove the expansion of TCP-like genes in these species. Conclusions A system of classification is proposed where putative or recognized monocot TCP-like genes are assigned to a specific clade of PCF-, CIN- or CYC/tb1-like genes. Specific biases in sequence data of this family that must be tackled when studying its molecular evolution and phylogeny are documented. Finally, the significant retention of duplicated TCP genes from Zea mays is considered in the context of balanced gene drive. PMID:21444336

  17. The 'dark matter' in the plant genomes: non-coding and unannotated DNA sequences associated with open chromatin.

    PubMed

    Jiang, Jiming

    2015-04-01

    Sequencing of complete plant genomes has become increasingly more routine since the advent of the next-generation sequencing technology. Identification and annotation of large amounts of noncoding but functional DNA sequences, including cis-regulatory DNA elements (CREs), have become a new frontier in plant genome research. Genomic regions containing active CREs bound to regulatory proteins are hypersensitive to DNase I digestion and are called DNase I hypersensitive sites (DHSs). Several recent DHS studies in plants illustrate that DHS datasets produced by DNase I digestion followed by next-generation sequencing (DNase-seq) are highly valuable for the identification and characterization of CREs associated with plant development and responses to environmental cues. DHS-based genomic profiling has opened a door to identify and annotate the 'dark matter' in sequenced plant genomes.

  18. Sequence analysis of the canine mitochondrial DNA control region from shed hair samples in criminal investigations.

    PubMed

    Berger, C; Berger, B; Parson, W

    2012-01-01

    In recent years, evidence from domestic dogs has increasingly been analyzed by forensic DNA testing. Especially, canine hairs have proved most suitable and practical due to the high rate of hair transfer occurring between dogs and humans. Starting with the description of a contamination-free sample handling procedure, we give a detailed workflow for sequencing hypervariable segments (HVS) of the mtDNA control region from canine evidence. After the hair material is lysed and the DNA extracted by Phenol/Chloroform, the amplification and sequencing strategy comprises the HVS I and II of the canine control region and is optimized for DNA of medium-to-low quality and quantity. The sequencing procedure is based on the Sanger Big-dye deoxy-terminator method and the separation of the sequencing reaction products is performed on a conventional multicolor fluorescence detection capillary electrophoresis platform. Finally, software-aided base calling and sequence interpretation are addressed exemplarily.

  19. Planning Assembly Of Large Truss Structures In Outer Space

    NASA Technical Reports Server (NTRS)

    De Mello, Luiz S. Homem; Desai, Rajiv S.

    1992-01-01

    Report discusses developmental algorithm used in systematic planning of sequences of operations in which large truss structures assembled in outer space. Assembly sequence represented by directed graph called "assembly graph", in which each arc represents joining of two parts or subassemblies. Algorithm generates assembly graph, working backward from state of complete assembly to initial state, in which all parts disassembled. Working backward more efficient than working forward because it avoids intermediate dead ends.
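
    The backward strategy can be illustrated with a small Python sketch. It is not the algorithm of the report: it returns a single assembly sequence rather than the full assembly graph, all names are hypothetical, and the only feasibility test is that each subassembly is connected through the given links (geometric and mechanical constraints are ignored).

```python
from itertools import combinations


def is_connected(parts, links):
    """True if the parts form one connected piece using only links among them."""
    parts = set(parts)
    start = next(iter(parts))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for a, b in links:
            for u, v in ((a, b), (b, a)):
                if u == node and v in parts and v not in seen:
                    seen.add(v)
                    stack.append(v)
    return seen == parts


def splits(assembly, links):
    """Yield each cut of a subassembly into two connected, linked halves."""
    items = sorted(assembly)
    anchor, rest = items[0], items[1:]
    for size in range(len(rest)):  # the right half is never empty
        for chosen in combinations(rest, size):
            left = frozenset((anchor, *chosen))
            right = frozenset(assembly) - left
            joined = any((a in left and b in right) or (a in right and b in left)
                         for a, b in links)
            if joined and is_connected(left, links) and is_connected(right, links):
                yield left, right


def plan_backward(assembly, links):
    """Disassemble recursively, then list the joins in forward (assembly) order."""
    assembly = frozenset(assembly)
    if len(assembly) == 1:
        return []
    for left, right in splits(assembly, links):
        # Each cut found while working backward is one join when read forward.
        return plan_backward(left, links) + plan_backward(right, links) + [(left, right)]
    raise ValueError("assembly cannot be split into connected subassemblies")


for left, right in plan_backward({"A", "B", "C"}, [("A", "B"), ("B", "C")]):
    print(sorted(left), "+", sorted(right))
```

    Because every state reached by disassembling the finished structure can, by construction, be assembled back into it, the search never wanders into partial assemblies that cannot be completed.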

  20. Degree counting and Shadow system for Toda system of rank two: One bubbling

    NASA Astrophysics Data System (ADS)

    Lee, Youngae; Lin, Chang-Shou; Wei, Juncheng; Yang, Wen

    2018-04-01

    We initiate the program for computing the Leray-Schauder topological degree for Toda systems of rank two. This program still contains a lot of challenging problems for analysts. As the first step, we prove that if a sequence of solutions $(u_{1k}, u_{2k})$ blows up, then one of $h_j e^{u_{jk}} / \int_M h_j e^{u_{jk}} \, dv_g$, $j = 1, 2$, tends to a sum of Dirac measures. This is the so-called phenomenon of weak concentration. Our purposes in this article are (i) to introduce the shadow system due to the bubbling phenomena when one of the parameters $\rho_i$ crosses $4\pi$ and $\rho_j \notin 4\pi\mathbb{N}$, where $1 \le i \ne j \le 2$; (ii) to show how to calculate the topological degree of Toda systems by computing the topological degree of the general shadow systems; (iii) to calculate the topological degree of the shadow system for one point blow up. We believe that the degree counting formula for the shadow system would be useful in other problems.
