generation sequencing methodology: Topics by Science.gov

Sample records for generation sequencing methodology

Ancient DNA studies: new perspectives on old samples

PubMed Central

2012-01-01

In spite of past controversies, the field of ancient DNA is now a reliable research area due to recent methodological improvements. A series of recent large-scale studies have revealed the true potential of ancient DNA samples to study the processes of evolution and to test models and assumptions commonly used to reconstruct patterns of evolution and to analyze population genetics and palaeoecological changes. Recent advances in DNA technologies, such as next-generation sequencing, make it possible to recover DNA information from archaeological and paleontological remains, allowing us to go back in time and study the genetic relationships between extinct organisms and their contemporary relatives. With the next-generation sequencing methodologies, DNA sequences can be retrieved even from samples (for example human remains) for which the technical pitfalls of classical methodologies required stringent criteria to guarantee the reliability of the results. In this paper, we review the methodologies applied to ancient DNA analysis and the perspectives that next-generation sequencing applications provide in this field. PMID:22697611
The tomato genome

USDA-ARS's Scientific Manuscript database

The tomato genome sequence was undertaken at a time when state-of-the-art sequencing methodologies were undergoing a transition to so-called next generation methodologies. The result was an international consortium undertaking a strategy merging both old and new approaches. Because biologists were...
Generating Models of Surgical Procedures using UMLS Concepts and Multiple Sequence Alignment

PubMed Central

Meng, Frank; D’Avolio, Leonard W.; Chen, Andrew A.; Taira, Ricky K.; Kangarloo, Hooshang

2005-01-01

Surgical procedures can be viewed as a process composed of a sequence of steps performed on, by, or with the patient’s anatomy. This sequence is typically the pattern followed by surgeons when generating surgical report narratives for documenting surgical procedures. This paper describes a methodology for semi-automatically deriving a model of conducted surgeries, utilizing a sequence of derived Unified Medical Language System (UMLS) concepts for representing surgical procedures. A multiple sequence alignment was computed from a collection of such sequences and was used for generating the model. These models have the potential of being useful in a variety of informatics applications such as information retrieval and automatic document generation. PMID:16779094
A vertebrate case study of the quality of assemblies derived from next-generation sequences

PubMed Central

2011-01-01

The unparalleled efficiency of next-generation sequencing (NGS) has prompted widespread adoption, but significant problems remain in the use of NGS data for whole genome assembly. We explore the advantages and disadvantages of chicken genome assemblies generated using a variety of sequencing and assembly methodologies. NGS assemblies are equivalent in some ways to a Sanger-based assembly yet deficient in others. Nonetheless, these assemblies are sufficient for the identification of the majority of genes and can reveal novel sequences when compared to existing assembly references. PMID:21453517
SNP discovery through de novo deep sequencing using the next generation of DNA sequencers

USDA-ARS's Scientific Manuscript database

The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....
Whole-genome sequencing for comparative genomics and de novo genome assembly.

PubMed

Benjak, Andrej; Sala, Claudia; Hartkoorn, Ruben C

2015-01-01

Next-generation sequencing technologies for whole-genome sequencing of mycobacteria are rapidly becoming an attractive alternative to more traditional sequencing methods. In particular this technology is proving useful for genome-wide identification of mutations in mycobacteria (comparative genomics) as well as for de novo assembly of whole genomes. Next-generation sequencing however generates a vast quantity of data that can only be transformed into a usable and comprehensible form using bioinformatics. Here we describe the methodology one would use to prepare libraries for whole-genome sequencing, and the basic bioinformatics to identify mutations in a genome following Illumina HiSeq or MiSeq sequencing, as well as de novo genome assembly following sequencing using Pacific Biosciences (PacBio).
Delay test generation for synchronous sequential circuits

NASA Astrophysics Data System (ADS)

Devadas, Srinivas

1989-05-01

We address the problem of generating tests for delay faults in non-scan synchronous sequential circuits. Delay test generation for sequential circuits is a considerably more difficult problem than delay testing of combinational circuits and has received much less attention. In this paper, we present a method for generating test sequences to detect delay faults in sequential circuits using the stuck-at fault sequential test generator STALLION. The method is complete in that it will generate a delay test sequence for a targeted fault given sufficient CPU time, if such a sequence exists. We term faults for which no delay test sequence exists, under our test methodology, sequentially delay redundant. We describe means of eliminating sequential delay redundancies in logic circuits. We present a partial-scan methodology for enhancing the testability of difficult-to-test or untestable sequential circuits, wherein a small number of flip-flops are selected and made controllable/observable. The selection process guarantees the elimination of all sequential delay redundancies. We show that an intimate relationship exists between state assignment and delay testability of a sequential machine. We describe a state assignment algorithm for the synthesis of sequential machines with maximal delay fault testability. Preliminary experimental results using the test generation, partial-scan and synthesis algorithm are presented.
MetaCAA: A clustering-aided methodology for efficient assembly of metagenomic datasets.

PubMed

Reddy, Rachamalla Maheedhar; Mohammed, Monzoorul Haque; Mande, Sharmila S

2014-01-01

A key challenge in analyzing metagenomics data pertains to assembly of sequenced DNA fragments (i.e. reads) originating from various microbes in a given environmental sample. Several existing methodologies can assemble reads originating from a single genome. However, these methodologies cannot be applied for efficient assembly of metagenomic sequence datasets. In this study, we present MetaCAA - a clustering-aided methodology which helps in improving the quality of metagenomic sequence assembly. MetaCAA initially groups sequences constituting a given metagenome into smaller clusters. Subsequently, sequences in each cluster are independently assembled using CAP3, an existing single genome assembly program. Contigs formed in each of the clusters along with the unassembled reads are then subjected to another round of assembly for generating the final set of contigs. Validation using simulated and real-world metagenomic datasets indicates that MetaCAA aids in improving the overall quality of assembly. A software implementation of MetaCAA is available at https://metagenomics.atc.tcs.com/MetaCAA. Copyright © 2014 Elsevier Inc. All rights reserved.
Transcriptome analysis of blueberry using 454 EST sequencing

USDA-ARS's Scientific Manuscript database

Blueberry (Vaccinium corymbosum) is a major berry crop in the United States, and one that has great nutritional and economical value. Next generation sequencing methodologies, such as 454, have been demonstrated to be successful and efficient in producing a snap-shot of transcriptional activities du...
Introduction of the hybcell-based compact sequencing technology and comparison to state-of-the-art methodologies for KRAS mutation detection.

PubMed

Zopf, Agnes; Raim, Roman; Danzer, Martin; Niklas, Norbert; Spilka, Rita; Pröll, Johannes; Gabriel, Christian; Nechansky, Andreas; Roucka, Markus

2015-03-01

The detection of KRAS mutations in codons 12 and 13 is critical for anti-EGFR therapy strategies; however, only those methodologies with high sensitivity, specificity, and accuracy as well as the best cost and turnaround balance are suitable for routine daily testing. Here we compared the performance of compact sequencing using the novel hybcell technology with 454 next-generation sequencing (454-NGS), Sanger sequencing, and pyrosequencing, using an evaluation panel of 35 specimens. A total of 32 mutations and 10 wild-type cases were reported using 454-NGS as the reference method. Specificity ranged from 100% for Sanger sequencing to 80% for pyrosequencing. Sanger sequencing and hybcell-based compact sequencing achieved a sensitivity of 96%, whereas pyrosequencing had a sensitivity of 88%. Accuracy was 97% for Sanger sequencing, 85% for pyrosequencing, and 94% for hybcell-based compact sequencing. Quantitative results were obtained for 454-NGS and hybcell-based compact sequencing data, resulting in a significant correlation (r = 0.914). Whereas pyrosequencing and Sanger sequencing were not able to detect multiple mutated cell clones within one tumor specimen, 454-NGS and the hybcell-based compact sequencing detected multiple mutations in two specimens. Our comparison shows that the hybcell-based compact sequencing is a valuable alternative to state-of-the-art methodologies used for detection of clinically relevant point mutations.
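The sensitivity, specificity, and accuracy figures quoted above are ratios taken from a confusion matrix against the reference method. The following is a minimal sketch of that arithmetic in Python. The function name and the counts in the usage lines are hypothetical and are not taken from the study.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts.

    tp/fn: reference-positive cases that the test called positive/negative.
    tn/fp: reference-negative cases that the test called negative/positive.
    """
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one reference-positive and one reference-negative case")
    sensitivity = tp / (tp + fn)   # fraction of true mutations detected
    specificity = tn / (tn + fp)   # fraction of wild-type cases called wild-type
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return sensitivity, specificity, accuracy


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
sens, spec, acc = diagnostic_metrics(tp=45, fp=10, tn=40, fn=5)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")
```

Because accuracy pools both classes, it depends on how many mutated and wild-type specimens are in the panel, which is why all three values are usually reported together.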
Molecular testing for familial hypercholesterolaemia-associated mutations in a UK-based cohort: development of an NGS-based method and comparison with multiplex polymerase chain reaction and oligonucleotide arrays.

PubMed

Reiman, Anne; Pandey, Sarojini; Lloyd, Kate L; Dyer, Nigel; Khan, Mike; Crockard, Martin; Latten, Mark J; Watson, Tracey L; Cree, Ian A; Grammatopoulos, Dimitris K

2016-11-01

Background: Detection of disease-associated mutations in patients with familial hypercholesterolaemia is crucial for early interventions to reduce risk of cardiovascular disease. Screening for these mutations represents a methodological challenge since more than 1200 different causal mutations in the low-density lipoprotein receptor have been identified. A number of methodological approaches have been developed for screening by clinical diagnostic laboratories. Methods: Using primers targeting the low-density lipoprotein receptor, apolipoprotein B, and proprotein convertase subtilisin/kexin type 9, we developed a novel Ion Torrent-based targeted re-sequencing method. We validated this in a small West Midlands (UK) cohort of 58 patients screened in parallel with other mutation-targeting methods, such as multiplex polymerase chain reaction (Elucigene FH20), oligonucleotide arrays (Randox familial hypercholesterolaemia array) or the Illumina next-generation sequencing platform. Results: In this small cohort, the next-generation sequencing method achieved excellent analytical performance characteristics and showed 100% and 89% concordance with the Randox array and the Elucigene FH20 assay, respectively. Investigation of the discrepant results identified two cases of mutation misclassification by the Elucigene FH20 multiplex polymerase chain reaction assay. A number of novel mutations not previously reported were also identified by the next-generation sequencing method. Conclusions: Ion Torrent-based next-generation sequencing can deliver a suitable alternative for the molecular investigation of familial hypercholesterolaemia patients, especially when comprehensive mutation screening for rare or unknown mutations is required.
Spatio-temporal alignment of pedobarographic image sequences.

PubMed

Oliveira, Francisco P M; Sousa, Andreia; Santos, Rubim; Tavares, João Manuel R S

2011-07-01

This article presents a methodology to align plantar pressure image sequences simultaneously in time and space. The spatial position and orientation of a foot in a sequence are changed to match the foot represented in a second sequence. Simultaneously with the spatial alignment, the temporal scale of the first sequence is transformed with the aim of synchronizing the two input footsteps. Consequently, the spatial correspondence of the foot regions along the sequences as well as the temporal synchronizing is automatically attained, making the study easier and more straightforward. In terms of spatial alignment, the methodology can use one of four possible geometric transformation models: rigid, similarity, affine, or projective. In the temporal alignment, a polynomial transformation up to the 4th degree can be adopted in order to model linear and curved time behaviors. Suitable geometric and temporal transformations are found by minimizing the mean squared error (MSE) between the input sequences. The methodology was tested on a set of real image sequences acquired from a common pedobarographic device. When used in experimental cases generated by applying geometric and temporal control transformations, the methodology revealed high accuracy. In addition, the intra-subject alignment tests from real plantar pressure image sequences showed that the curved temporal models produced better MSE results (P < 0.001) than the linear temporal model. This article represents an important step forward in the alignment of pedobarographic image data, since previous methods can only be applied on static images.
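The temporal part of the alignment described above can be illustrated with a small sketch: warp the time axis of one sequence with a polynomial, resample it, and score the result by MSE against the other sequence. This is a simplified illustration, not the authors' implementation. It omits the spatial transformation and the optimizer entirely, and all function names, the flat-list frame representation, and the normalised time axis are assumptions made for the example.

```python
def warp_time(t, coeffs):
    """Polynomial time transform on normalised time t in [0, 1].

    coeffs are given lowest degree first, so [0.0, 1.0] is the identity.
    The result is clamped to [0, 1] so it always maps to a valid frame.
    """
    warped = sum(c * t ** i for i, c in enumerate(coeffs))
    return min(max(warped, 0.0), 1.0)


def resample(sequence, coeffs, n_frames):
    """Resample a sequence of frames (flat lists of pressures) under a time warp,
    using linear interpolation between neighbouring frames."""
    last = len(sequence) - 1
    frames = []
    for k in range(n_frames):
        t = k / (n_frames - 1) if n_frames > 1 else 0.0
        pos = warp_time(t, coeffs) * last
        lo = int(pos)
        hi = min(lo + 1, last)
        w = pos - lo
        frames.append([(1 - w) * a + w * b for a, b in zip(sequence[lo], sequence[hi])])
    return frames


def mse(seq_a, seq_b):
    """Mean squared error over all pixels of all paired frames."""
    diffs = [(a - b) ** 2 for fa, fb in zip(seq_a, seq_b) for a, b in zip(fa, fb)]
    return sum(diffs) / len(diffs)


# Hypothetical three-frame, two-pixel footstep.
step = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]]
print(mse(step, resample(step, [0.0, 1.0], 3)))       # identity warp
print(mse(step, resample(step, [0.0, 0.0, 1.0], 3)))  # quadratic (curved) warp
```

In a full alignment, an optimizer would search over the polynomial coefficients (and the geometric transformation parameters) for the values that minimize this MSE.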
Capillary electrophoresis of Big-Dye terminator sequencing reactions for human mtDNA Control Region haplotyping in the identification of human remains.

PubMed

Montesino, Marta; Prieto, Lourdes

2012-01-01

Cycle sequencing reaction with Big-Dye terminators provides the methodology to analyze mtDNA Control Region amplicons by means of capillary electrophoresis. DNA sequencing with ddNTPs or terminators was developed by (1). The progressive automation of the method by combining the use of fluorescent-dye terminators with cycle sequencing has made it possible to increase the sensitivity and efficiency of the method and hence has allowed its introduction into the forensic field. PCR-generated mitochondrial DNA products are the templates for sequencing reactions. Different sets of primers can be used to generate amplicons with different sizes according to the quality and quantity of the DNA extract, providing sequence data for different ranges inside the Control Region.
Multi-Time Step Service Restoration for Advanced Distribution Systems and Microgrids

DOE PAGES

Chen, Bo; Chen, Chen; Wang, Jianhui; ...

2017-07-07

Modern power systems are facing increased risk of disasters that can cause extended outages. The presence of remote control switches (RCSs), distributed generators (DGs), and energy storage systems (ESS) provides both challenges and opportunities for developing post-fault service restoration methodologies. Inter-temporal constraints of DGs, ESS, and loads under cold load pickup (CLPU) conditions impose extra complexity on problem formulation and solution. In this paper, a multi-time step service restoration methodology is proposed to optimally generate a sequence of control actions for controllable switches, ESSs, and dispatchable DGs to assist the system operator with decision making. The restoration sequence is determined to minimize the unserved customers by energizing the system step by step without violating operational constraints at each time step. The proposed methodology is formulated as a mixed-integer linear programming (MILP) model and can adapt to various operation conditions. Furthermore, the proposed method is validated through several case studies that are performed on modified IEEE 13-node and IEEE 123-node test feeders.
Multi-Time Step Service Restoration for Advanced Distribution Systems and Microgrids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Bo; Chen, Chen; Wang, Jianhui

Modern power systems are facing increased risk of disasters that can cause extended outages. The presence of remote control switches (RCSs), distributed generators (DGs), and energy storage systems (ESS) provides both challenges and opportunities for developing post-fault service restoration methodologies. Inter-temporal constraints of DGs, ESS, and loads under cold load pickup (CLPU) conditions impose extra complexity on problem formulation and solution. In this paper, a multi-time step service restoration methodology is proposed to optimally generate a sequence of control actions for controllable switches, ESSs, and dispatchable DGs to assist the system operator with decision making. The restoration sequence is determined to minimize the unserved customers by energizing the system step by step without violating operational constraints at each time step. The proposed methodology is formulated as a mixed-integer linear programming (MILP) model and can adapt to various operation conditions. Furthermore, the proposed method is validated through several case studies that are performed on modified IEEE 13-node and IEEE 123-node test feeders.
Chemical genomic profiling via barcode sequencing to predict compound mode of action

PubMed Central

Piotrowski, Jeff S.; Simpkins, Scott W.; Li, Sheena C.; Deshpande, Raamesh; McIlwain, Sean; Ong, Irene; Myers, Chad L.; Boone, Charlie; Andersen, Raymond J.

2015-01-01

Summary Chemical genomics is an unbiased, whole-cell approach to characterizing novel compounds to determine mode of action and cellular target. Our version of this technique is built upon barcoded deletion mutants of Saccharomyces cerevisiae and has been adapted to a high-throughput methodology using next-generation sequencing. Here we describe the steps to generate a chemical genomic profile from a compound of interest, and how to use this information to predict molecular mechanism and targets of bioactive compounds. PMID:25618354
Methodological reporting of randomized clinical trials in respiratory research in 2010.

PubMed

Lu, Yi; Yao, Qiuju; Gu, Jie; Shen, Ce

2013-09-01

Although randomized controlled trials (RCTs) are considered the highest level of evidence, they are also subject to bias, due to a lack of adequately reported randomization, and therefore the reporting should be as explicit as possible for readers to determine the significance of the contents. We evaluated the methodological quality of RCTs in respiratory research in high ranking clinical journals, published in 2010. We assessed the methodological quality, including generation of the allocation sequence, allocation concealment, double-blinding, sample-size calculation, intention-to-treat analysis, flow diagrams, number of medical centers involved, diseases, funding sources, types of interventions, trial registration, number of times the papers have been cited, journal impact factor, journal type, and journal endorsement of the CONSORT (Consolidated Standards of Reporting Trials) rules, in RCTs published in 12 top ranking clinical respiratory journals and 5 top ranking general medical journals. We included 176 trials, of which 93 (53%) reported adequate generation of the allocation sequence, 66 (38%) reported adequate allocation concealment, 79 (45%) were double-blind, 123 (70%) reported adequate sample-size calculation, 88 (50%) reported intention-to-treat analysis, and 122 (69%) included a flow diagram. Multivariate logistic regression analysis revealed that journal impact factor ≥ 5 was the only variable that significantly influenced adequate allocation sequence generation. Trial registration and journal impact factor ≥ 5 significantly influenced adequate allocation concealment. Medical interventions, trial registration, and journal endorsement of the CONSORT statement influenced adequate double-blinding. Publication in one of the general medical journals influenced adequate sample-size calculation. The methodological quality of RCTs in respiratory research needs improvement. Stricter enforcement of the CONSORT statement should enhance the quality of RCTs.
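To make "generation of the allocation sequence" concrete, here is a minimal sketch of one commonly used approach, permuted-block randomization. The abstract does not say which methods the reviewed trials used, so this is an illustration only. The function name, arm labels, block size, and seed are assumptions.

```python
import random


def permuted_block_sequence(n_participants, arms=("A", "B"), block_size=4, seed=None):
    """Generate an allocation sequence using permuted blocks.

    Each block contains every arm equally often in a random order, so group
    sizes stay balanced after every complete block.
    """
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)  # a fixed seed makes the sequence reproducible and auditable
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * (block_size // len(arms))
        rng.shuffle(block)
        sequence.extend(block)
    return sequence[:n_participants]


# Hypothetical usage: 10 participants, two arms, blocks of four.
print(permuted_block_sequence(10, seed=2010))
```

A trial report would normally state the method (for example, computer-generated permuted blocks), the block size, and who generated the sequence.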
Effects of the Ion PGM™ Hi-Q™ sequencing chemistry on sequence data quality.

PubMed

Churchill, Jennifer D; King, Jonathan L; Chakraborty, Ranajit; Budowle, Bruce

2016-09-01

Massively parallel sequencing (MPS) offers substantial improvements over current forensic DNA typing methodologies such as increased resolution, scalability, and throughput. The Ion PGM™ is a promising MPS platform for analysis of forensic biological evidence. The system employs a sequencing-by-synthesis chemistry on a semiconductor chip that measures a pH change due to the release of hydrogen ions as nucleotides are incorporated into the growing DNA strands. However, implementation of MPS into forensic laboratories requires a robust chemistry. Ion Torrent's Hi-Q™ Sequencing Chemistry was evaluated to determine if it could improve on the quality of the generated sequence data in association with selected genetic marker targets. The whole mitochondrial genome and the HID-Ion STR 10-plex panel were sequenced on the Ion PGM™ system with the Ion PGM™ Sequencing 400 Kit and the Ion PGM™ Hi-Q™ Sequencing Kit. Concordance, coverage, strand balance, noise, and deletion ratios were assessed in evaluating the performance of the Ion PGM™ Hi-Q™ Sequencing Kit. The results indicate that reliable, accurate data are generated and that sequencing through homopolymeric regions can be improved with the use of Ion Torrent's Hi-Q™ Sequencing Chemistry. Overall, the quality of the generated sequencing data supports the potential for use of the Ion PGM™ in forensic genetic laboratories.
Inaugural Genomics Automation Congress and the coming deluge of sequencing data.

PubMed

Creighton, Chad J

2010-10-01

Presentations at Select Biosciences's first 'Genomics Automation Congress' (Boston, MA, USA) in 2010 focused on next-generation sequencing and the platforms and methodology around them. The meeting provided an overview of sequencing technologies, both new and emerging. Speakers shared their recent work on applying sequencing to profile cells for various levels of biomolecular complexity, including DNA sequences, DNA copy, DNA methylation, mRNA and microRNA. With sequencing time and costs continuing to drop dramatically, a virtual explosion of very large sequencing datasets is at hand, which will probably present challenges and opportunities for high-level data analysis and interpretation, as well as for information technology infrastructure.
NGS tools for traceability in candies as high processed food products: Ion Torrent PGM versus conventional PCR-cloning.

PubMed

Muñoz-Colmenero, Marta; Martínez, Jose Luis; Roca, Agustín; Garcia-Vazquez, Eva

2017-01-01

Next Generation Sequencing methodologies are considered the next step within DNA-based methods, and their applicability in different fields is being evaluated. Here, we tested the usefulness of the Ion Torrent Personal Genome Machine (PGM) in food traceability, analyzing candies as a model of highly processed foods, and compared the results with those obtained by PCR-cloning-sequencing (PCR-CS). The majority of samples exhibited consistency between methodologies, with the PGM platform yielding more information and species per product than PCR-CS. Significantly higher AT-content in sequences of the same species was also obtained from PGM. This, together with some taxonomic discrepancies between methodologies, suggests that the PGM platform is still premature for use in food traceability of complex highly processed products. It could be a good option for analysis of less complex food, saving time and cost per sample. Copyright © 2016 Elsevier Ltd. All rights reserved.

Integration, warehousing, and analysis strategies of Omics data.

PubMed

Gedela, Srinubabu

2011-01-01

"-Omics" is a current suffix for numerous types of large-scale biological data generation procedures, which naturally demand the development of novel algorithms for data storage and analysis. With next generation genome sequencing burgeoning, it is pivotal to decipher a coding site on the genome, a gene's function, and information on transcripts next to the pure availability of sequence information. To explore a genome and downstream molecular processes, we need umpteen results at the various levels of cellular organization by utilizing different experimental designs, data analysis strategies and methodologies. Here comes the need for controlled vocabularies and data integration to annotate, store, and update the flow of experimental data. This chapter explores key methodologies to merge Omics data by semantic data carriers, discusses controlled vocabularies as eXtensible Markup Languages (XML), and provides practical guidance, databases, and software links supporting the integration of Omics data.
Reverse Genetics and High Throughput Sequencing Methodologies for Plant Functional Genomics

PubMed Central

Ben-Amar, Anis; Daldoul, Samia; Reustle, Götz M.; Krczal, Gabriele; Mliki, Ahmed

2016-01-01

In the post-genomic era, increasingly sophisticated genetic tools are being developed with the long-term goal of understanding how the coordinated activity of genes gives rise to a complex organism. With the advent of next generation sequencing combined with effective computational approaches, a wide variety of plant species have been fully sequenced, giving a wealth of sequence data on the structure and organization of plant genomes. Since thousands of gene sequences are already known, recently developed functional genomics approaches provide powerful tools to analyze plant gene functions through various gene manipulation technologies. Integration of different omics platforms along with gene annotation and computational analysis may provide a complete view at the systems biology level. Extensive investigations on reverse genetics methodologies were deployed for assigning biological function to a specific gene or gene product. We provide here an updated overview of these high-throughput strategies, highlighting recent advances in the knowledge of functional genomics in plants. PMID:28217003
DNA-based random number generation in security circuitry.

PubMed

Gearheart, Christy M; Arazi, Benjamin; Rouchka, Eric C

2010-06-01

DNA-based circuit design is an area of research in which traditional silicon-based technologies are replaced by naturally occurring phenomena taken from biochemistry and molecular biology. This research focuses on further developing DNA-based methodologies to mimic digital data manipulation. While exhibiting fundamental principles, this work was done in conjunction with the vision that DNA-based circuitry, when the technology matures, will form the basis for a tamper-proof security module, revolutionizing the meaning and concept of tamper-proofing and possibly preventing it altogether based on accurate scientific observations. A paramount part of such a solution would be self-generation of random numbers. A novel prototype schema employs solid phase synthesis of oligonucleotides for random construction of DNA sequences; temporary storage and retrieval is achieved through plasmid vectors. A discussion of how to evaluate sequence randomness is included, as well as how these techniques are applied to a simulation of the random number generation circuitry. Simulation results show generated sequences successfully pass three selected NIST random number generation tests specified for security applications.
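The abstract does not name the three NIST tests that were applied. As an illustration of how sequence randomness can be evaluated, the sketch below encodes a DNA sequence as bits and applies the frequency (monobit) test from NIST SP 800-22. The 2-bit base encoding and the function names are assumptions made for this example, not details from the paper.

```python
import math

# Assumed 2-bit encoding of nucleotides; the source does not specify one.
BASE_TO_BITS = {"A": "00", "C": "01", "G": "10", "T": "11"}


def dna_to_bits(sequence):
    """Map each nucleotide to two bits and concatenate the result."""
    return "".join(BASE_TO_BITS[base] for base in sequence.upper())


def monobit_p_value(bits):
    """Frequency (monobit) test p-value as defined in NIST SP 800-22.

    Ones count as +1 and zeros as -1; a random sequence should sum to
    roughly zero, giving a p-value well above the 0.01 threshold.
    """
    n = len(bits)
    if n == 0:
        raise ValueError("bit string must not be empty")
    s_n = sum(1 if bit == "1" else -1 for bit in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))


# Hypothetical usage on a short, made-up oligonucleotide.
bits = dna_to_bits("ACGTTGCAAGCT")
p = monobit_p_value(bits)
print(bits, round(p, 4), "pass" if p >= 0.01 else "fail")
```

The monobit test only checks the balance of ones and zeros. A practical evaluation would use much longer sequences and several tests, since a strictly alternating pattern would pass this one.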
SU-F-T-350: Continuous Leaf Optimization (CLO) for IMRT Leaf Sequencing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Long, T; Chen, M; Jiang, S

Purpose: To study a new step-and-shoot IMRT leaf sequencing model that avoids the two main pitfalls of conventional leaf sequencing: (1) target fluence being stratified into a fixed number of discrete levels and/or (2) aperture leaf positions being restricted to a discrete set of locations. These assumptions induce error into the sequence or reduce the feasible region of potential plans, respectively. Methods: We develop a one-dimensional (single leaf pair) methodology that does not make assumptions (1) or (2) that can be easily extended to a multi-row model. The proposed continuous leaf optimization (CLO) methodology takes in an existing set of apertures and associated intensities, or solution “seed,” and improves the plan without the restrictiveness of (1) or (2). It then uses a first-order descent algorithm to converge onto a locally optimal solution. A seed solution can come from models that assume (1) and (2), thus allowing the CLO model to improve upon existing leaf sequencing methodologies. Results: The CLO model was applied to 208 generated target fluence maps in one dimension. In all cases for all tested sequencing strategies, the CLO model made improvements on the starting seed objective function. The CLO model also was able to keep MUs low. Conclusion: The CLO model can improve upon existing leaf sequencing methods by avoiding the restrictions of (1) and (2). By allowing for more flexible leaf positioning, error can be reduced when matching some target fluence. This study lays the foundation for future models and solution methodologies that can incorporate continuous leaf positions explicitly into the IMRT treatment planning model. Supported by Cancer Prevention & Research Institute of Texas (CPRIT) - ID RP150485.
Improved methods of DNA extraction from human spermatozoa that mitigate experimentally-induced oxidative DNA damage.

PubMed

Xavier, Miguel J; Nixon, Brett; Roman, Shaun D; Aitken, Robert John

2018-01-01

Current approaches for DNA extraction and fragmentation from mammalian spermatozoa provide several challenges for the investigation of the oxidative stress burden carried in the genome of male gametes. Indeed, the potential introduction of oxidative DNA damage induced by reactive oxygen species, reducing agents (dithiothreitol or beta-mercaptoethanol), and DNA shearing techniques used in the preparation of samples for chromatin immunoprecipitation and next-generation sequencing serves to confound the reliability and accuracy of the results obtained. Here we report optimised methodology that minimises, or completely eliminates, exposure to DNA damaging compounds during extraction and fragmentation procedures. Specifically, we show that Micrococcal nuclease (MNase) digestion prior to cellular lysis generates a greater DNA yield with minimal collateral oxidation while randomly fragmenting the entire paternal genome. This modified methodology represents a significant improvement over traditional fragmentation achieved via sonication in the preparation of genomic DNA from human spermatozoa for downstream applications, such as next-generation sequencing. We also present a redesigned bioinformatic pipeline framework adjusted to correctly analyse this form of data and detect statistically relevant targets of oxidation.
Application of genetic algorithm in integrated setup planning and operation sequencing

NASA Astrophysics Data System (ADS)

Kafashi, Sajad; Shakeri, Mohsen

2011-01-01

Process planning is an essential component for linking design and manufacturing process. Setup planning and operation sequencing are two main tasks in process planning. Many studies have solved these two problems separately. Considering the fact that the two functions are complementary, it is necessary to integrate them more tightly so that performance of a manufacturing system can be improved economically and competitively. This paper presents a generative system and genetic algorithm (GA) approach to process planning for a given part. The proposed approach and optimization methodology analyse the TAD (tool approach direction), tolerance relations between features, and feature precedence relations to generate all possible setups and operations using a workshop resource database. Based on these technological constraints, the GA approach, which adopts a feature-based representation, optimizes the setup plan and sequence of operations using cost indices. A case study shows that the developed system can generate satisfactory results in optimizing setup planning and operation sequencing simultaneously under feasible conditions.
Next Generation Sequencing Technology and Genomewide Data Analysis: Perspectives for Retinal Research

PubMed Central

Chaitankar, Vijender; Karakülah, Gökhan; Ratnapriya, Rinki; Giuste, Felipe O.; Brooks, Matthew J.; Swaroop, Anand

2016-01-01

The advent of high throughput next generation sequencing (NGS) has accelerated the pace of discovery of disease-associated genetic variants and genomewide profiling of expressed sequences and epigenetic marks, thereby permitting systems-based analyses of ocular development and disease. Rapid evolution of NGS and associated methodologies presents significant challenges in acquisition, management, and analysis of large data sets and for extracting biologically or clinically relevant information. Here we illustrate the basic design of commonly used NGS-based methods, specifically whole exome sequencing, transcriptome, and epigenome profiling, and provide recommendations for data analyses. We briefly discuss systems biology approaches for integrating multiple data sets to elucidate gene regulatory or disease networks. While we provide examples from the retina, the NGS guidelines reviewed here are applicable to other tissues/cell types as well. PMID:27297499
Accurate RNA consensus sequencing for high-fidelity detection of transcriptional mutagenesis-induced epimutations.

PubMed

Reid-Bayliss, Kate S; Loeb, Lawrence A

2017-08-29

Transcriptional mutagenesis (TM) due to misincorporation during RNA transcription can result in mutant RNAs, or epimutations, that generate proteins with altered properties. TM has long been hypothesized to play a role in aging, cancer, and viral and bacterial evolution. However, inadequate methodologies have limited progress in elucidating a causal association. We present a high-throughput, highly accurate RNA sequencing method to measure epimutations with single-molecule sensitivity. Accurate RNA consensus sequencing (ARC-seq) uniquely combines RNA barcoding and generation of multiple cDNA copies per RNA molecule to eliminate errors introduced during cDNA synthesis, PCR, and sequencing. The stringency of ARC-seq can be scaled to accommodate the quality of input RNAs. We apply ARC-seq to directly assess transcriptome-wide epimutations resulting from RNA polymerase mutants and oxidative stress.
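The general principle of barcode-based consensus calling can be sketched in a few lines. This is an illustrative simplification and not the ARC-seq pipeline: the function name, the thresholds `min_family_size` and `min_agreement`, and the assumption that reads in a family are already aligned and of equal length are all hypothetical.

```python
from collections import Counter, defaultdict


def consensus_by_barcode(reads, min_family_size=3, min_agreement=0.9):
    """Collapse reads that share a barcode into one consensus sequence.

    reads: iterable of (barcode, sequence) pairs; sequences within a family
    are assumed to be pre-aligned and of equal length.
    Positions where the most common base falls below `min_agreement`
    are reported as 'N'.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few copies to separate real variants from errors
        if any(len(s) != len(seqs[0]) for s in seqs):
            continue  # toy example: skip families that are not aligned
        bases = []
        for column in zip(*seqs):
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus


example_reads = [
    ("AAA", "ACGU"),
    ("AAA", "ACGU"),
    ("AAA", "ACGA"),  # last base disagrees with the other two copies
    ("CCC", "GGGG"),  # singleton family, dropped
]
print(consensus_by_barcode(example_reads))
```

Raising `min_family_size` or `min_agreement` makes the call more stringent at the price of discarding more molecules, which mirrors the abstract's remark that stringency can be scaled to the quality of the input.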
Association between funding source, methodological quality and research outcomes in randomized controlled trials of synbiotics, probiotics and prebiotics added to infant formula: A Systematic Review

PubMed Central

2013-01-01

Background: There is little or no information available on the impact of funding by the food industry on trial outcomes and methodological quality of synbiotics, probiotics and prebiotics research in infants. The objective of this study was to compare the methodological quality and outcomes of food industry sponsored trials versus non-industry sponsored trials, with regard to supplementation of synbiotics, probiotics and prebiotics in infant formula. Methods: A comprehensive search was conducted to identify published and unpublished randomized clinical trials (RCTs). Cochrane methodology was used to assess the risk of bias of included RCTs in the following domains: 1) sequence generation; 2) allocation concealment; 3) blinding; 4) incomplete outcome data; 5) selective outcome reporting; and 6) other bias. Clinical outcomes and authors’ conclusions were reported in frequencies and percentages. The association between source of funding, risk of bias, clinical outcomes and conclusions was assessed using Pearson’s Chi-square test and Fisher’s exact test. A p-value < 0.05 was considered statistically significant. Results: Sixty-seven completed and 3 on-going RCTs were included. Forty (59.7%) were funded by food industry, 11 (16.4%) by non-industry entities and 16 (23.9%) did not specify source of funding. Several risk of bias domains, especially sequence generation, allocation concealment and blinding, were not adequately reported. There was no significant association between the source of funding and sequence generation, allocation concealment, blinding and selective reporting, the majority of reported clinical outcomes or authors’ conclusions. On the other hand, source of funding was significantly associated with the domains of incomplete outcome data and free of other bias, as well as with reported antibiotic use and conclusions on weight gain.
Conclusion: In RCTs on infants fed infant formula containing probiotics, prebiotics or synbiotics, the source of funding did not influence the majority of outcomes in favour of the sponsors’ products. More non-industry funded research is needed to further assess the impact of funding on methodological quality, reported clinical outcomes and authors’ conclusions. PMID:24219082
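As a small worked illustration of the statistics named in the Methods, the sketch below first re-derives the reported funding percentages from the counts given in the abstract, then implements a two-sided Fisher's exact test for a 2x2 table with the standard library only. The 2x2 counts in `example_table` are hypothetical and are not data from the review; the function name is likewise an assumption.

```python
from math import comb

# Counts reported in the abstract: 67 completed RCTs by funding source.
funding = {"industry": 40, "non-industry": 11, "not specified": 16}
total = sum(funding.values())
percentages = {k: round(100 * v / total, 1) for k, v in funding.items()}
print(percentages)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


# Hypothetical table: rows = funding source, columns = domain adequate / not.
example_table = (3, 1, 1, 3)
print(fisher_exact_two_sided(*example_table))
```

Fisher's exact test is typically preferred over Pearson's Chi-square test when expected cell counts are small, which is plausible here given that only 11 trials were funded by non-industry entities.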
Generation of a novel next-generation sequencing-based method for the isolation of new human papillomavirus types.

PubMed

Brancaccio, Rosario N; Robitaille, Alexis; Dutta, Sankhadeep; Cuenin, Cyrille; Santare, Daiga; Skenders, Girts; Leja, Marcis; Fischer, Nicole; Giuliano, Anna R; Rollison, Dana E; Grundhoff, Adam; Tommasino, Massimo; Gheit, Tarik

2018-05-07

With the advent of new molecular tools, the discovery of new papillomaviruses (PVs) has accelerated during the past decade, enabling the expansion of knowledge about the viral populations that inhabit the human body. Human PVs (HPVs) are etiologically linked to benign or malignant lesions of the skin and mucosa. The detection of HPV types can vary widely, depending mainly on the methodology and the quality of the biological sample. Next-generation sequencing is one of the most powerful tools, enabling the discovery of novel viruses in a wide range of biological material. Here, we report a novel protocol for the detection of known and unknown HPV types in human skin and oral gargle samples using improved PCR protocols combined with next-generation sequencing. We identified 105 putative new PV types in addition to 296 known types, thus providing important information about the viral distribution in the oral cavity and skin. Copyright © 2018. Published by Elsevier Inc.
Antibiotic Resistome: Improving Detection and Quantification Accuracy for Comparative Metagenomics.

PubMed

Elbehery, Ali H A; Aziz, Ramy K; Siam, Rania

2016-04-01

The unprecedented rise of life-threatening antibiotic resistance (AR), combined with the unparalleled advances in DNA sequencing of genomes and metagenomes, has pushed the need for in silico detection of the resistance potential of clinical and environmental metagenomic samples through the quantification of AR genes (i.e., genes conferring antibiotic resistance). Therefore, determining an optimal methodology to quantitatively and accurately assess AR genes in a given environment is pivotal. Here, we optimized and improved existing AR detection methodologies from metagenomic datasets to properly consider AR-generating mutations in antibiotic target genes. Through comparative metagenomic analysis of previously published AR gene abundance in three publicly available metagenomes, we illustrate how mutation-generated resistance genes are either falsely assigned or neglected, which alters the detection and quantitation of the antibiotic resistome. In addition, we inspected factors influencing the outcome of AR gene quantification using metagenome simulation experiments, and identified that genome size, AR gene length, total number of metagenomics reads and selected sequencing platforms had pronounced effects on the level of detected AR. In conclusion, our proposed improvements in the current methodologies for accurate AR detection and resistome assessment show reliable results when tested on real and simulated metagenomic datasets.
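Because the abstract reports that AR gene length and the total number of reads affect the detected AR level, a length- and depth-normalized abundance is one common way to make samples comparable. The sketch below uses a generic reads-per-kilobase-per-million measure; it is an assumption for illustration, not the normalization the authors propose, and the function name and example numbers are hypothetical.

```python
def normalized_abundance(mapped_reads, gene_length_bp, total_reads):
    """Reads per kilobase of gene per million sequenced reads (RPKM-style)."""
    if gene_length_bp <= 0 or total_reads <= 0:
        raise ValueError("gene length and total read count must be positive")
    return mapped_reads / (gene_length_bp / 1_000) / (total_reads / 1_000_000)


# Hypothetical example: the same raw count means different things for
# genes of different length in metagenomes of different depth.
short_gene = normalized_abundance(mapped_reads=100, gene_length_bp=1_000, total_reads=1_000_000)
long_gene = normalized_abundance(mapped_reads=100, gene_length_bp=2_000, total_reads=1_000_000)
print(short_gene, long_gene)
```

Normalization of this kind does not address the abstract's other point, that resistance conferred by mutations in antibiotic target genes needs separate handling from resistance conferred by gene presence.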
A Methodology for the Integration of a Mechanistic Source Term Analysis in a Probabilistic Framework for Advanced Reactors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grabaskas, Dave; Brunett, Acacia J.; Bucknor, Matthew

GE Hitachi Nuclear Energy (GEH) and Argonne National Laboratory are currently engaged in a joint effort to modernize and develop probabilistic risk assessment (PRA) techniques for advanced non-light water reactors. At a high level, the primary outcome of this project will be the development of next-generation PRA methodologies that will enable risk-informed prioritization of safety- and reliability-focused research and development, while also identifying gaps that may be resolved through additional research. A subset of this effort is the development of PRA methodologies to conduct a mechanistic source term (MST) analysis for event sequences that could result in the release of radionuclides. The MST analysis seeks to realistically model and assess the transport, retention, and release of radionuclides from the reactor to the environment. The MST methods developed during this project seek to satisfy the requirements of the Mechanistic Source Term element of the ASME/ANS Non-LWR PRA standard. The MST methodology consists of separate analysis approaches for risk-significant and non-risk significant event sequences that may result in the release of radionuclides from the reactor. For risk-significant event sequences, the methodology focuses on a detailed assessment, using mechanistic models, of radionuclide release from the fuel, transport through and release from the primary system, transport in the containment, and finally release to the environment. The analysis approach for non-risk significant event sequences examines the possibility of large radionuclide releases due to events such as re-criticality or the complete loss of radionuclide barriers. This paper provides details on the MST methodology, including the interface between the MST analysis and other elements of the PRA, and provides a simplified example MST calculation for a sodium fast reactor.
An optimized methodology for whole genome sequencing of RNA respiratory viruses from nasopharyngeal aspirates.

PubMed

Goya, Stephanie; Valinotto, Laura E; Tittarelli, Estefania; Rojo, Gabriel L; Nabaes Jodar, Mercedes S; Greninger, Alexander L; Zaiat, Jonathan J; Marti, Marcelo A; Mistchenko, Alicia S; Viegas, Mariana

2018-01-01

Over the last decade, the number of viral genome sequences deposited in available databases has grown exponentially. However, sequencing methodologies vary widely and many published works have relied on viral enrichment by viral culture or nucleic acid amplification with specific primers rather than through unbiased techniques such as metagenomics. The genome of RNA viruses is highly variable and these enrichment methodologies may be difficult to achieve or may bias the results. In order to obtain genomic sequences of human respiratory syncytial virus (HRSV) from positive nasopharyngeal aspirates, diverse methodologies were evaluated and compared. A total of 29 nearly complete and complete viral genomes were obtained. The best performance was achieved with a DNase I treatment of the RNA directly extracted from the nasopharyngeal aspirate (NPA), sequence-independent single-primer amplification (SISPA) and library preparation performed with Nextera XT DNA Library Prep Kit with manual normalization. An average of 633,789 and 1,674,845 filtered reads per library were obtained with MiSeq and NextSeq 500 platforms, respectively. The higher output of NextSeq 500 was accompanied by an increase in the percentage of duplicated reads generated during SISPA (from an average of 1.5% duplicated viral reads in MiSeq to an average of 74% in NextSeq 500). HRSV genome recovery was not affected by the presence or absence of duplicated reads but the computational demand during the analysis was increased. Considering that only samples with viral load ≥ E+06 copies/ml NPA were tested, no correlation between sample viral loads and number of total filtered reads was observed, nor with the mapped viral reads. The HRSV genomes showed a mean coverage of 98.46% with the best methodology. In addition, genomes of human metapneumovirus (HMPV), human rhinovirus (HRV) and human parainfluenza virus types 1-3 (HPIV1-3) were also obtained with the selected optimal methodology.
De Novo Peptide Sequencing: Deep Mining of High-Resolution Mass Spectrometry Data.

PubMed

Islam, Mohammad Tawhidul; Mohamedali, Abidali; Fernandes, Criselda Santan; Baker, Mark S; Ranganathan, Shoba

2017-01-01

High resolution mass spectrometry has revolutionized proteomics over the past decade, resulting in tremendous amounts of data in the form of mass spectra, being generated in a relatively short span of time. The mining of this spectral data for analysis and interpretation though has lagged behind such that potentially valuable data is being overlooked because it does not fit into the mold of traditional database searching methodologies. Although the analysis of spectra by de novo sequences removes such biases and has been available for a long period of time, its uptake has been slow or almost nonexistent within the scientific community. In this chapter, we propose a methodology to integrate de novo peptide sequencing using three commonly available software solutions in tandem, complemented by homology searching, and manual validation of spectra. This simplified method would allow greater use of de novo sequencing approaches and potentially greatly increase proteome coverage leading to the unearthing of valuable insights into protein biology, especially of organisms whose genomes have been recently sequenced or are poorly annotated.
Epigenetic Mechanisms Underlie Genome Development

ERIC Educational Resources Information Center

Lamm, Ehud

2013-01-01

Technological and methodological advances, in particular next-generation sequencing and chromatin profiling, have led to a deluge of data on epigenetic mechanisms and processes. Epigenetic regulation in the brain is no exception. In this commentary, Ehud Lamm writes that extending existing frameworks for thinking about psychological development to…
Next-generation sequencing: the future of molecular genetics in poultry production and food safety.

PubMed

Diaz-Sanchez, S; Hanning, I; Pendleton, Sean; D'Souza, Doris

2013-02-01

The era of molecular biology and automation of the Sanger chain-terminator sequencing method has led to discovery and advances in diagnostics and biotechnology. The Sanger methodology dominated research for over 2 decades, leading to significant accomplishments and technological improvements in DNA sequencing. Next-generation high-throughput sequencing (HT-NGS) technologies were developed subsequently to overcome the limitations of this first-generation technology, offering higher speed, less labor, and lower cost. Various platforms developed include sequencing-by-synthesis 454 Life Sciences, Illumina (Solexa) sequencing, SOLiD sequencing (among others), and the Ion Torrent semiconductor sequencing technologies that use different detection principles. As technology advances, progress toward third-generation sequencing technologies is being reported, including Nanopore Sequencing and real-time monitoring of PCR activity through fluorescent resonant energy transfer. The advantages of these technologies include scalability and simplicity, with increasing DNA polymerase performance and yields, lower error rates, and greater economic feasibility, with the eventual goal of obtaining real-time results. These technologies can be directly applied to improve poultry production and enhance food safety. For example, sequence-based (determination of the gut microbial community, genes for metabolic pathways, or presence of plasmids) and function-based (screening for function such as antibiotic resistance, or vitamin production) metagenomic analysis can be carried out. Gut microbial flora/communities of poultry can be sequenced to determine the changes that affect health and disease along with efficacy of methods to control pathogenic growth. Thus, the purpose of this review is to provide an overview of the principles of these current technologies and their potential application to improve poultry production and food safety as well as public health.
Rational design of DNA sequences for nanotechnology, microarrays and molecular computers using Eulerian graphs.

PubMed

Pancoska, Petr; Moravek, Zdenek; Moll, Ute M

2004-01-01

Nucleic acids are molecules of choice for both established and emerging nanoscale technologies. These technologies benefit from large functional densities of 'DNA processing elements' that can be readily manufactured. To achieve the desired functionality, polynucleotide sequences are currently designed by a process that involves tedious and laborious filtering of potential candidates against a series of requirements and parameters. Here, we present a completely novel methodology for the rapid rational design of large sets of DNA sequences. This method allows for the direct implementation of very complex and detailed requirements for the generated sequences, thus avoiding 'brute force' filtering. At the same time, these sequences have narrow distributions of melting temperatures. The molecular part of the design process can be done without computer assistance, using an efficient 'human engineering' approach by drawing a single blueprint graph that represents all generated sequences. Moreover, the method eliminates the necessity for extensive thermodynamic calculations. Melting temperature can be calculated only once (or not at all). In addition, the isostability of the sequences is independent of the selection of a particular set of thermodynamic parameters. Applications are presented for DNA sequence designs for microarrays, universal microarray zip sequences and electron transfer experiments.
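To show how an Eulerian walk on a graph can generate a sequence by construction rather than by filtering, the sketch below builds a de Bruijn graph over the DNA alphabet and traverses every edge once with Hierholzer's algorithm. This is a textbook construction chosen for illustration; it is not the blueprint-graph method of the paper, and the function name and parameters are assumptions.

```python
from itertools import product


def de_bruijn_sequence(alphabet="ACGT", k=3):
    """Return a linear sequence containing every k-mer over `alphabet` exactly once.

    Nodes are (k-1)-mers and each edge is a k-mer; an Eulerian circuit
    (Hierholzer's algorithm) visits every edge, i.e. every k-mer, once.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    nodes = ["".join(p) for p in product(alphabet, repeat=k - 1)]
    # Every node has one outgoing edge per letter, so the graph is Eulerian.
    unused = {node: list(alphabet) for node in nodes}

    stack, circuit = [nodes[0]], []
    while stack:
        node = stack[-1]
        if unused[node]:
            letter = unused[node].pop()
            stack.append((node + letter)[1:])  # follow the edge node -> next node
        else:
            circuit.append(stack.pop())  # dead end: node is final in the circuit
    circuit.reverse()

    # Spell the walk: first node, then the last letter of each following node.
    return circuit[0] + "".join(node[-1] for node in circuit[1:])


sequence = de_bruijn_sequence("ACGT", 3)
kmers = [sequence[i:i + 3] for i in range(len(sequence) - 2)]
print(len(sequence), len(set(kmers)))
```

Every sequence produced this way has the same k-mer composition by construction, which is the kind of property that lets a graph-based design constrain a whole set of sequences at once instead of testing candidates one by one.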
Monitoring and Surveillance of Marine Invasive Species in Californian Waters by DNA Barcoding: Methodological and Analytical Solutions

NASA Astrophysics Data System (ADS)

Campbell, T. L.; Geller, J. B.; Heller, P.; Ruiz, G.; Chang, A.; McCann, L.; Ceballos, L.; Marraffini, M.; Ashton, G.; Larson, K.; Havard, S.; Meagher, K.; Wheelock, M.; Drake, C.; Rhett, G.

2016-02-01

The Ballast Water Management Act, the Marine Invasive Species Act, and the Coastal Ecosystem Protection Act require the California Department of Fish and Wildlife to monitor and evaluate the extent of biological invasions in the state's marine and estuarine waters. This has been performed statewide, using a variety of methodologies. Conventional sample collection and processing is laborious, slow and costly, and may require considerable taxonomic expertise requiring detailed time-consuming microscopic study of multiple specimens. These factors limit the volume of biomass that can be searched for introduced species. New technologies continue to reduce the cost and increase the throughput of genetic analyses, which become efficient alternatives to traditional morphological analysis for identification, monitoring and surveillance of marine invasive species. Using next-generation sequencing of mitochondrial Cytochrome c oxidase subunit I (COI) and nuclear large subunit ribosomal RNA (LSU), we analyzed over 15,000 individual marine invertebrates collected in Californian waters. We have created sequence databases of California native and non-native species to assist in molecular identification and surveillance in North American waters. Metagenetics, the next-generation sequencing of environmental samples with comparison to DNA sequence databases, is a faster and cost-effective alternative to individual sample analysis. We have sequenced from biomass collected from whole settlement plates and plankton in California harbors, and used our introduced species database to create species lists. We can combine these species lists for individual marinas with collected environmental data, such as temperature, salinity, and dissolved oxygen to understand the ecology of marine invasions. Here we discuss high throughput sampling, sequencing, and COASTLINE, our data analysis answer to challenges working with hundreds of millions of sequencing reads from tens of thousands of specimens.
Y and W Chromosome Assemblies: Approaches and Discoveries.

PubMed

Tomaszkiewicz, Marta; Medvedev, Paul; Makova, Kateryna D

2017-04-01

Hundreds of vertebrate genomes have been sequenced and assembled to date. However, most sequencing projects have ignored the sex chromosomes unique to the heterogametic sex - Y and W - that are known as sex-limited chromosomes (SLCs). Indeed, haploid and repetitive Y chromosomes in species with male heterogamety (XY), and W chromosomes in species with female heterogamety (ZW), are difficult to sequence and assemble. Nevertheless, obtaining their sequences is important for understanding the intricacies of vertebrate genome function and evolution. Recent progress has been made towards the adaptation of next-generation sequencing (NGS) techniques to deciphering SLC sequences. We review here currently available methodology and results with regard to SLC sequencing and assembly. We focus on vertebrates, but bring in some examples from other taxa. Copyright © 2017 Elsevier Ltd. All rights reserved.
Methodological reporting of randomized trials in five leading Chinese nursing journals.

PubMed

Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu

2014-01-01

Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting complying with CONSORT and explore associated trial level variables in the Chinese nursing care field. In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured through the methods section of the CONSORT checklist and the overall CONSORT methodological items score was calculated and expressed as a percentage. Meanwhile, we hypothesized that some general and methodological characteristics were associated with reporting quality and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34 ± 0.97 (Mean ± SD). No RCT reported descriptions and changes in "trial design," changes in "outcomes" and "implementation," or descriptions of the similarity of interventions for "blinding." Poor reporting was found in detailing the "settings of participants" (13.1%), "type of randomization sequence generation" (1.8%), calculation methods of "sample size" (0.4%), explanation of any interim analyses and stopping guidelines for "sample size" (0.3%), "allocation concealment mechanism" (0.3%), additional analyses in "statistical methods" (2.1%), and targeted subjects and methods of "blinding" (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of "participants," "interventions," and definitions of the "outcomes" and "statistical methods." The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score. 
The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods.

Shotgun protein sequencing: assembly of peptide tandem mass spectra from mixtures of modified proteins.

PubMed

Bandeira, Nuno; Clauser, Karl R; Pevzner, Pavel A

2007-07-01

Despite significant advances in the identification of known proteins, the analysis of unknown proteins by MS/MS still remains a challenging open problem. Although Klaus Biemann recognized the potential of MS/MS for sequencing of unknown proteins in the 1980s, low throughput Edman degradation followed by cloning still remains the main method to sequence unknown proteins. The automated interpretation of MS/MS spectra has been limited by a focus on individual spectra and has not capitalized on the information contained in spectra of overlapping peptides. Indeed the powerful shotgun DNA sequencing strategies have not been extended to automated protein sequencing. We demonstrate, for the first time, the feasibility of automated shotgun protein sequencing of protein mixtures by utilizing MS/MS spectra of overlapping and possibly modified peptides generated via multiple proteases of different specificities. We validate this approach by generating highly accurate de novo reconstructions of multiple regions of various proteins in western diamondback rattlesnake venom. We further argue that shotgun protein sequencing has the potential to overcome the limitations of current protein sequencing approaches and thus catalyze the otherwise impractical applications of proteomics methodologies in studies of unknown proteins.
Rapid in vivo apparent diffusion coefficient mapping of hyperpolarized 13C metabolites.

PubMed

Koelsch, Bertram L; Reed, Galen D; Keshari, Kayvan R; Chaumeil, Myriam M; Bok, Robert; Ronen, Sabrina M; Vigneron, Daniel B; Kurhanewicz, John; Larson, Peder E Z

2015-09-01

Hyperpolarized 13C magnetic resonance allows for the study of real-time metabolism in vivo, including significant hyperpolarized 13C lactate production in many tumors. Other studies have shown that aggressive and highly metastatic tumors rapidly transport lactate out of cells. Thus, the ability to not only measure the production of hyperpolarized 13C lactate but also understand its compartmentalization using diffusion-weighted MR will provide unique information for improved tumor characterization. We used a bipolar, pulsed-gradient, double spin echo imaging sequence to rapidly generate diffusion-weighted images of hyperpolarized 13C metabolites. Our methodology included a simultaneously acquired B1 map to improve apparent diffusion coefficient (ADC) accuracy and a diffusion-compensated variable flip angle scheme to improve ADC precision. We validated this sequence and methodology in hyperpolarized 13C phantoms. Next, we generated ADC maps of several hyperpolarized 13C metabolites in a normal rat, rat brain tumor, and prostate cancer mouse model using both preclinical and clinical trial-ready hardware. ADC maps of hyperpolarized 13C metabolites provide information about the localization of these molecules in the tissue microenvironment. The methodology presented here allows for further studies to investigate ADC changes due to disease state that may provide unique information about cancer aggressiveness and metastatic potential. © 2014 Wiley Periodicals, Inc.
REDItools: high-throughput RNA editing detection made easy.

PubMed

Picardi, Ernesto; Pesole, Graziano

2013-07-15

The reliable detection of RNA editing sites from massive sequencing data remains challenging and, although several methodologies have been proposed, no computational tools have been released to date. Here, we introduce REDItools, a suite of Python scripts to perform high-throughput investigation of RNA editing using next-generation sequencing data. REDItools are written in the Python programming language and freely available at http://code.google.com/p/reditools/. Contact: ernesto.picardi@uniba.it or graziano.pesole@uniba.it. Supplementary data are available at Bioinformatics online.
Building toy models of proteins using coevolutionary information

NASA Astrophysics Data System (ADS)

Cheng, Ryan; Raghunathan, Mohit; Onuchic, Jose

2015-03-01

Recent developments in global statistical methodologies have advanced the analysis of large collections of protein sequences for coevolutionary information. Coevolution between amino acids in a protein arises from compensatory mutations that are needed to maintain the stability or function of a protein over the course of evolution. This gives rise to quantifiable correlations between amino acid positions within the multiple sequence alignment of a protein family. Here, we use Direct Coupling Analysis (DCA) to infer a Potts model Hamiltonian governing the correlated mutations in a protein family to obtain the sequence-dependent interaction energies of a toy protein model. We demonstrate that this methodology predicts residue-residue interaction energies that are consistent with experimental mutational changes in protein stabilities as well as other computational methodologies. Furthermore, we demonstrate with several examples that DCA could be used to construct a structure-based model that quantitatively agrees with experimental data on folding mechanisms. This work serves as a potential framework for generating models of proteins that are enriched by evolutionary data that can potentially be used to engineer key functional motions and interactions in protein systems. This research has been supported by the NSF INSPIRE award MCB-1241332 and by the CTBP sponsored by the NSF (Grant PHY-1427654).
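For orientation, the Potts model that DCA infers is commonly written in the form below. The notation is an assumption added here and does not come from the abstract: L is the length of the multiple sequence alignment, a_i is the amino acid at position i, h_i are single-site fields, J_ij are pairwise couplings, and Z is the normalizing partition function. Sign conventions vary between papers.

```latex
H(a_1,\dots,a_L) = -\sum_{i=1}^{L} h_i(a_i) \;-\; \sum_{1 \le i < j \le L} J_{ij}(a_i, a_j),
\qquad
P(a_1,\dots,a_L) = \frac{e^{-H(a_1,\dots,a_L)}}{Z}
```

Under this convention, a large positive coupling J_ij(a, b) lowers the energy of sequences carrying residues a and b at positions i and j, which is how correlated mutations in the alignment translate into residue-residue interaction energies.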
Metasecretome-selective phage display approach for mining the functional potential of a rumen microbial community.

PubMed

Ciric, Milica; Moon, Christina D; Leahy, Sinead C; Creevey, Christopher J; Altermann, Eric; Attwood, Graeme T; Rakonjac, Jasna; Gagic, Dragana

2014-05-12

In silico, secretome proteins can be predicted from completely sequenced genomes using various available algorithms that identify membrane-targeting sequences. For metasecretome (collection of surface, secreted and transmembrane proteins from environmental microbial communities) this approach is impractical, considering that the metasecretome open reading frames (ORFs) comprise only 10% to 30% of total metagenome, and are poorly represented in the dataset due to overall low coverage of metagenomic gene pool, even in large-scale projects. By combining secretome-selective phage display and next-generation sequencing, we focused the sequence analysis of complex rumen microbial community on the metasecretome component of the metagenome. This approach achieved high enrichment (29 fold) of secreted fibrolytic enzymes from the plant-adherent microbial community of the bovine rumen. In particular, we identified hundreds of heretofore rare modules belonging to cellulosomes, cell-surface complexes specialised for recognition and degradation of the plant fibre. As a method, metasecretome phage display combined with next-generation sequencing has a power to sample the diversity of low-abundance surface and secreted proteins that would otherwise require exceptionally large metagenomic sequencing projects. As a resource, metasecretome display library backed by the dataset obtained by next-generation sequencing is ready for i) affinity selection by standard phage display methodology and ii) easy purification of displayed proteins as part of the virion for individual functional analysis.
A New Method for Generating Probability Tables in the Unresolved Resonance Region

DOE PAGES

Holcomb, Andrew M.; Leal, Luiz C.; Rahnema, Farzad; ...

2017-04-18

One new method for constructing probability tables in the unresolved resonance region (URR) has been developed. This new methodology is an extensive modification of the single-level Breit-Wigner (SLBW) pseudo-resonance pair sequence method commonly used to generate probability tables in the URR. The new method uses a Monte Carlo process to generate many pseudo-resonance sequences by first sampling the average resonance parameter data in the URR and then converting the sampled resonance parameters to the more robust R-matrix limited (RML) format. Furthermore, for each sampled set of pseudo-resonance sequences, the temperature-dependent cross sections are reconstructed on a small grid around the energy of reference using the Reich-Moore formalism and the Leal-Hwang Doppler broadening methodology. We then use the effective cross sections calculated at the energies of reference to construct probability tables in the URR. The RML cross-section reconstruction algorithm has been rigorously tested for a variety of isotopes, including 16O, 19F, 35Cl, 56Fe, 63Cu, and 65Cu. The new URR method also produced normalized cross-section factor probability tables for 238U that were found to be in agreement with current standards. The modified 238U probability tables were shown to produce results in excellent agreement with several standard benchmarks, including the IEU-MET-FAST-007 (BIG TEN), IEU-MET-FAST-003, and IEU-COMP-FAST-004 benchmarks.
Targeted 'next-generation' sequencing in anophthalmia and microphthalmia patients confirms SOX2, OTX2 and FOXE3 mutations.

PubMed

Jimenez, Nelson Lopez; Flannick, Jason; Yahyavi, Mani; Li, Jiang; Bardakjian, Tanya; Tonkin, Leath; Schneider, Adele; Sherr, Elliott H; Slavotinek, Anne M

2011-12-28

Anophthalmia/microphthalmia (A/M) is caused by mutations in several different transcription factors, but mutations in each causative gene are relatively rare, emphasizing the need for a testing approach that screens multiple genes simultaneously. We used next-generation sequencing to screen 15 A/M patients for mutations in 9 pathogenic genes to evaluate this technology for screening in A/M. We used a pooled sequencing design, together with custom single nucleotide polymorphism (SNP) calling software. We verified predicted sequence alterations using Sanger sequencing. We verified three mutations - c.542delC in SOX2, resulting in p.Pro181Argfs*22, p.Glu105X in OTX2 and p.Cys240X in FOXE3. We found several novel sequence alterations and SNPs that were likely to be non-pathogenic - p.Glu42Lys in CRYBA4, p.Val201Met in FOXE3 and p.Asp291Asn in VSX2. Our analysis methodology gave one false positive result comprising a mutation in PAX6 (c.1268A > T, predicting p.X423LeuextX*15) that was not verified by Sanger sequencing. We also failed to detect one 20 base pair (bp) deletion and one 3 bp duplication in SOX2. Our results demonstrated the power of next-generation sequencing with pooled sample groups for the rapid screening of candidate genes for A/M as we were correctly able to identify disease-causing mutations. However, next-generation sequencing was less useful for small, intragenic deletions and duplications. We did not find mutations in 10/15 patients and conclude that there is a need for further gene discovery in A/M.
Targeted 'Next-Generation' sequencing in anophthalmia and microphthalmia patients confirms SOX2, OTX2 and FOXE3 mutations

PubMed Central

2011-01-01

Background Anophthalmia/microphthalmia (A/M) is caused by mutations in several different transcription factors, but mutations in each causative gene are relatively rare, emphasizing the need for a testing approach that screens multiple genes simultaneously. We used next-generation sequencing to screen 15 A/M patients for mutations in 9 pathogenic genes to evaluate this technology for screening in A/M. Methods We used a pooled sequencing design, together with custom single nucleotide polymorphism (SNP) calling software. We verified predicted sequence alterations using Sanger sequencing. Results We verified three mutations - c.542delC in SOX2, resulting in p.Pro181Argfs*22, p.Glu105X in OTX2 and p.Cys240X in FOXE3. We found several novel sequence alterations and SNPs that were likely to be non-pathogenic - p.Glu42Lys in CRYBA4, p.Val201Met in FOXE3 and p.Asp291Asn in VSX2. Our analysis methodology gave one false positive result comprising a mutation in PAX6 (c.1268A > T, predicting p.X423LeuextX*15) that was not verified by Sanger sequencing. We also failed to detect one 20 base pair (bp) deletion and one 3 bp duplication in SOX2. Conclusions Our results demonstrated the power of next-generation sequencing with pooled sample groups for the rapid screening of candidate genes for A/M as we were correctly able to identify disease-causing mutations. However, next-generation sequencing was less useful for small, intragenic deletions and duplications. We did not find mutations in 10/15 patients and conclude that there is a need for further gene discovery in A/M. PMID:22204637
Construction of a high-density genetic map for grape using next generation restriction-site associated DNA sequencing

PubMed Central

2012-01-01

Background Genetic mapping and QTL detection are powerful methodologies in plant improvement and breeding. Construction of a high-density and high-quality genetic map would be of great benefit in the production of superior grapes to meet human demand. High throughput and low cost of the recently developed next generation sequencing (NGS) technology have resulted in its wide application in genome research. Sequencing restriction-site associated DNA (RAD) might be an efficient strategy to simplify genotyping. Combining NGS with RAD has proven to be powerful for single nucleotide polymorphism (SNP) marker development. Results An F1 population of 100 individual plants was developed. In-silico digestion-site prediction was used to select an appropriate restriction enzyme for construction of a RAD sequencing library. Next generation RAD sequencing was applied to genotype the F1 population and its parents. Applying a cluster strategy for SNP modulation, a total of 1,814 high-quality SNP markers were developed: 1,121 of these were mapped to the female genetic map, 759 to the male map, and 1,646 to the integrated map. A comparison of the genetic maps to the published Vitis vinifera genome revealed both conservation and variations. Conclusions The applicability of next generation RAD sequencing for genotyping a grape F1 population was demonstrated, leading to the successful development of a genetic map with high density and quality using our designed SNP markers. Detailed analysis revealed that this newly developed genetic map can be used for a variety of genome investigations, such as QTL detection, sequence assembly and genome comparison. PMID:22908993
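The in-silico digestion-site prediction mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names are hypothetical, the toy sequence is invented, and the two enzymes (EcoRI, site GAATTC; MseI, site TTAA) are chosen only as familiar examples rather than taken from the paper. Both sites are palindromic, so scanning the forward strand alone is sufficient here.

```python
def find_sites(genome, site):
    """Return every 0-based position where the recognition site occurs."""
    positions = []
    pos = genome.find(site)
    while pos != -1:
        positions.append(pos)
        pos = genome.find(site, pos + 1)  # step by 1 to allow overlapping sites
    return positions


def rank_enzymes(genome, enzymes):
    """Rank candidate enzymes by how many sites they cut in the genome."""
    counts = [(name, len(find_sites(genome, site))) for name, site in enzymes.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)


# Hypothetical toy input; a real run would read a reference genome instead.
toy_genome = "AAGAATTCTTTTAAGAATTC"
candidates = {"EcoRI": "GAATTC", "MseI": "TTAA"}
ranking = rank_enzymes(toy_genome, candidates)
```

In practice the choice of enzyme would also depend on the resulting fragment-size distribution, which this sketch does not model.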
HSA: a heuristic splice alignment tool.

PubMed

Bu, Jingde; Chi, Xuebin; Jin, Zhong

2013-01-01

RNA-Seq is a revolutionary transcriptome sequencing methodology and a representative application of next-generation sequencing (NGS). With the high throughput of RNA-Seq, much more information, such as differential expression and novel splice variants, can be acquired through deep sequence analysis and data mining. However, the short read length poses a great challenge to alignment, especially when reads span two or more exons. A two-step heuristic splice alignment tool was developed in this investigation. First, raw reads are mapped to the reference with an unspliced aligner (BWA); second, the initially unmapped reads are split into three equal short reads (seeds), each seed is aligned to the reference, hits are filtered, the possible split position of the read is searched and hits are extended to a complete match. Compared with other splice alignment tools such as SOAPsplice and Tophat2, HSA performs better in call rate and efficiency, but its results are, to some extent, not as accurate as those of the other software. HSA is an effective spliced aligner for RNA-Seq read mapping, and is available at https://github.com/vlcc/HSA.
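The second step described above (seed splitting, seed alignment, split-position search) can be sketched as follows. This is a simplified illustration under stated assumptions, not HSA's actual code: seeds are matched exactly with plain string search instead of an aligner, only the first and last seeds are used as anchors, a single splice junction is assumed, and all function names are hypothetical.

```python
def split_into_seeds(read, n_seeds=3):
    """Split a read into n_seeds contiguous seeds; return (offset, seed) pairs."""
    base, extra = divmod(len(read), n_seeds)
    seeds, start = [], 0
    for i in range(n_seeds):
        end = start + base + (1 if i < extra else 0)
        seeds.append((start, read[start:end]))
        start = end
    return seeds


def find_exact_hits(seed, reference):
    """Return all 0-based reference positions where the seed matches exactly."""
    hits, pos = [], reference.find(seed)
    while pos != -1:
        hits.append(pos)
        pos = reference.find(seed, pos + 1)
    return hits


def spliced_align(read, reference):
    """Return (exon1_start, split_position, exon2_start) or None.

    The first seed anchors the left part of the read, the last seed anchors
    the right part; both anchors are extended until they meet.
    """
    seeds = split_into_seeds(read)
    _, first_seed = seeds[0]
    last_off, last_seed = seeds[-1]
    for left in find_exact_hits(first_seed, reference):
        for hit in find_exact_hits(last_seed, reference):
            right = hit - last_off  # reference position paired with read[0] on the right anchor
            if right <= left:
                continue
            # Extend the left anchor rightwards while bases agree.
            k = 0
            while k < len(read) and left + k < len(reference) and reference[left + k] == read[k]:
                k += 1
            # Extend the right anchor leftwards while bases agree.
            j = len(read)
            while j > 0 and reference[right + j - 1] == read[j - 1]:
                j -= 1
            if j <= k:  # the two extensions overlap, so a split at k explains the whole read
                return left, k, right + k
    return None


# Hypothetical toy reference: exon 1, a GT...AG intron, exon 2, with flanking bases.
exon1, intron, exon2 = "GATTACAGCA", "GTAAAAAAAAAG", "TCCGGAACTG"
reference = "CC" + exon1 + intron + exon2 + "CC"
read = exon1[-6:] + exon2[:6]  # a read spanning the exon-exon junction
result = spliced_align(read, reference)
```

When several split positions are compatible with both anchors, the sketch simply returns the rightmost one; a real aligner would typically also score splice-site signals and tolerate mismatches.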
Comparing microarrays and next-generation sequencing technologies for microbial ecology research.

PubMed

Roh, Seong Woon; Abell, Guy C J; Kim, Kyoung-Ho; Nam, Young-Do; Bae, Jin-Woo

2010-06-01

Recent advances in molecular biology have resulted in the application of DNA microarrays and next-generation sequencing (NGS) technologies to the field of microbial ecology. This review aims to examine the strengths and weaknesses of each of the methodologies, including depth and ease of analysis, throughput and cost-effectiveness. It also intends to highlight the optimal application of each of the individual technologies toward the study of a particular environment and identify potential synergies between the two main technologies, whereby both sample number and coverage can be maximized. We suggest that the efficient use of microarray and NGS technologies will allow researchers to advance the field of microbial ecology, and importantly, improve our understanding of the role of microorganisms in their various environments.
Hybrid analysis for indicating patients with breast cancer using temperature time series.

PubMed

Silva, Lincoln F; Santos, Alair Augusto S M D; Bravo, Renato S; Silva, Aristófanes C; Muchaluat-Saade, Débora C; Conci, Aura

2016-07-01

Breast cancer is the most common cancer among women worldwide. Diagnosis and treatment in early stages increase cure chances. The temperature of cancerous tissue is generally higher than that of healthy surrounding tissues, making thermography an option to be considered in screening strategies of this cancer type. This paper proposes a hybrid methodology for analyzing dynamic infrared thermography in order to indicate patients at risk of breast cancer, using unsupervised and supervised machine learning techniques, which characterizes the methodology as hybrid. The dynamic infrared thermography monitors or quantitatively measures temperature changes on the examined surface, after a thermal stress. In the dynamic infrared thermography execution, a sequence of breast thermograms is generated. In the proposed methodology, this sequence is processed and analyzed by several techniques. First, the region of the breasts is segmented and the thermograms of the sequence are registered. Then, temperature time series are built and the k-means algorithm is applied on these series using various values of k. Clustering formed by k-means algorithm, for each k value, is evaluated using clustering validation indices, generating values treated as features in the classification model construction step. A data mining tool was used to solve the combined algorithm selection and hyperparameter optimization (CASH) problem in classification tasks. Besides the classification algorithm recommended by the data mining tool, classifiers based on Bayesian networks, neural networks, decision rules and decision tree were executed on the data set used for evaluation. Test results support that the proposed analysis methodology is able to indicate patients with breast cancer. Among 39 tested classification algorithms, K-Star and Bayes Net presented 100% classification accuracy. Furthermore, among the Bayes Net, multi-layer perceptron, decision table and random forest classification algorithms, an average accuracy of 95.38% was obtained. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Implementation of next-generation sequencing for molecular diagnosis of hereditary breast and ovarian cancer highlights its genetic heterogeneity.

PubMed

Pinto, Pedro; Paulo, Paula; Santos, Catarina; Rocha, Patrícia; Pinto, Carla; Veiga, Isabel; Pinheiro, Manuela; Peixoto, Ana; Teixeira, Manuel R

2016-09-01

Molecular diagnosis of hereditary breast and ovarian cancer (HBOC) by standard methodologies has been limited to the BRCA1 and BRCA2 genes. With the recent development of new sequencing methodologies, the speed and efficiency of DNA testing have dramatically improved. The aim of this work was to validate the use of next-generation sequencing (NGS) for the detection of BRCA1/BRCA2 point mutations in a diagnostic setting and to study the role of other genes associated with HBOC in Portuguese families. A cohort of 94 high-risk families was included in the study, and they were initially screened for the two common founder mutations with variant-specific methods. Fourteen index patients were shown to carry the Portuguese founder mutation BRCA2 c.156_157insAlu, and the remaining 80 were analyzed in parallel by Sanger sequencing for the BRCA1/BRCA2 genes and by NGS for a panel of 17 genes that have been described as involved in predisposition to breast and/or ovarian cancer. A total of 506 variants in the BRCA1/BRCA2 genes were detected by both methodologies, with a 100 % concordance between them. This strategy allowed the detection of a total of 39 deleterious mutations in the 94 index patients, namely 10 in BRCA1 (25.6 %), 21 in BRCA2 (53.8 %), four in PALB2 (10.3 %), two in ATM (5.1 %), one in CHEK2 (2.6 %), and one in TP53 (2.6 %), with 20.5 % of the deleterious mutations being found in genes other than BRCA1/BRCA2. These results demonstrate the efficiency of NGS for the detection of BRCA1/BRCA2 point mutations and highlight the genetic heterogeneity of HBOC.
FRAGS: estimation of coding sequence substitution rates from fragmentary data

PubMed Central

Swart, Estienne C; Hide, Winston A; Seoighe, Cathal

2004-01-01

Background Rates of substitution in protein-coding sequences can provide important insights into evolutionary processes that are of biomedical and theoretical interest. Increased availability of coding sequence data has enabled researchers to estimate more accurately the coding sequence divergence of pairs of organisms. However the use of different data sources, alignment protocols and methods to estimate substitution rates leads to widely varying estimates of key parameters that define the coding sequence divergence of orthologous genes. Although complete genome sequence data are not available for all organisms, fragmentary sequence data can provide accurate estimates of substitution rates provided that an appropriate and consistent methodology is used and that differences in the estimates obtainable from different data sources are taken into account. Results We have developed FRAGS, an application framework that uses existing, freely available software components to construct in-frame alignments and estimate coding substitution rates from fragmentary sequence data. Coding sequence substitution estimates for human and chimpanzee sequences, generated by FRAGS, reveal that methodological differences can give rise to significantly different estimates of important substitution parameters. The estimated substitution rates were also used to infer upper-bounds on the amount of sequencing error in the datasets that we have analysed. Conclusion We have developed a system that performs robust estimation of substitution rates for orthologous sequences from a pair of organisms. Our system can be used when fragmentary genomic or transcript data is available from one of the organisms and the other is a completely sequenced genome within the Ensembl database. As well as estimating substitution statistics our system enables the user to manage and query alignment and substitution data. PMID:15005802
Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs

PubMed Central

2014-01-01

Background We introduce Sequence Bundles--a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of the existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features, which would otherwise remain hidden. Methods For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a 2-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicates the level of conservation and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results We have developed a software demonstration application for generating a Sequence Bundles visualisation of MSAs provided for the BioVis 2013 redesign contest. A subsequent exploration of the visualised line patterns allowed for the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions Sequence Bundles is a novel method for visualisation of MSAs and the discovery of sequence motifs. It can aid in generating new insight and hypothesis making. Sequence Bundles is well disposed for future implementation as an interactive visual analytics software, which can complement existing visualisation tools. PMID:25237395
Improved PCR-Based Detection of Soil Transmitted Helminth Infections Using a Next-Generation Sequencing Approach to Assay Design.

PubMed

Pilotte, Nils; Papaiakovou, Marina; Grant, Jessica R; Bierwert, Lou Ann; Llewellyn, Stacey; McCarthy, James S; Williams, Steven A

2016-03-01

The soil transmitted helminths are a group of parasitic worms responsible for extensive morbidity in many of the world's most economically depressed locations. With growing emphasis on disease mapping and eradication, the availability of accurate and cost-effective diagnostic measures is of paramount importance to global control and elimination efforts. While real-time PCR-based molecular detection assays have shown great promise, to date, these assays have utilized sub-optimal targets. By performing next-generation sequencing-based repeat analyses, we have identified high copy-number, non-coding DNA sequences from a series of soil transmitted pathogens. We have used these repetitive DNA elements as targets in the development of novel, multi-parallel, PCR-based diagnostic assays. Utilizing next-generation sequencing and the Galaxy-based RepeatExplorer web server, we performed repeat DNA analysis on five species of soil transmitted helminths (Necator americanus, Ancylostoma duodenale, Trichuris trichiura, Ascaris lumbricoides, and Strongyloides stercoralis). Employing high copy-number, non-coding repeat DNA sequences as targets, novel real-time PCR assays were designed, and assays were tested against established molecular detection methods. Each assay provided consistent detection of genomic DNA at quantities of 2 fg or less, demonstrated species-specificity, and showed an improved limit of detection over the existing, proven PCR-based assay. The utilization of next-generation sequencing-based repeat DNA analysis methodologies for the identification of molecular diagnostic targets has the ability to improve assay species-specificity and limits of detection. By exploiting such high copy-number repeat sequences, the assays described here will facilitate soil transmitted helminth diagnostic efforts. We recommend similar analyses when designing PCR-based diagnostic tests for the detection of other eukaryotic pathogens.
Development of Scoring Functions for Antibody Sequence Assessment and Optimization

PubMed Central

Seeliger, Daniel

2013-01-01

Antibody development is still associated with substantial risks and difficulties, as single mutations can radically change molecule properties like thermodynamic stability, solubility or viscosity. Since antibody generation methodologies cannot select and optimize for molecule properties which are important for biotechnological applications, careful sequence analysis and optimization is necessary to develop antibodies that fulfil the ambitious requirements of future drugs. While efforts to grasp the physical principles of undesired molecule properties from the ground up are becoming increasingly powerful, the wealth of publicly available antibody sequences provides an alternative way to develop early assessment strategies for antibodies using a statistical approach, which is the objective of this paper. Here, publicly available sequences were used to develop heuristic potentials for the framework regions of heavy and light chains of antibodies of human and murine origin. The potentials take into account position-dependent probabilities of individual amino acids but also conditional probabilities, which are indispensable for sequence assessment and optimization. It is shown that the potentials derived from human sequences clearly distinguish between human sequences and sequences from mice and, hence, can be used as a measure of humanness which compares a given sequence with the phenotypic pool of human sequences instead of comparing sequence identities to germline genes. Following this line, it is demonstrated that, using the developed potentials, humanization of an antibody can be described as a simple mathematical optimization problem and that the in-silico generated framework variants closely resemble native sequences in terms of predicted immunogenicity. PMID:24204701
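A position-dependent statistical potential of the kind described here can be sketched in a few lines. This is only an illustration under stated assumptions: it covers the single-position term and omits the conditional (pairwise) probabilities the abstract also mentions, it assumes ungapped sequences of equal length, the pseudocount value is arbitrary, and the toy "human-like" sequences and function names are hypothetical rather than taken from the paper.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def position_log_probs(aligned, pseudocount=1.0):
    """Build one log-probability table per alignment column."""
    tables = []
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    for i in range(len(aligned[0])):
        counts = Counter(seq[i] for seq in aligned)
        # Counter returns 0 for unseen residues, so the pseudocount keeps logs finite.
        tables.append({aa: math.log((counts[aa] + pseudocount) / total) for aa in AMINO_ACIDS})
    return tables


def score(seq, tables):
    """Sum of per-position log-probabilities; higher means closer to the pool."""
    return sum(tables[i][aa] for i, aa in enumerate(seq))


# Hypothetical toy pool standing in for a set of aligned human framework sequences.
human_pool = ["EVQLV", "EVQLL", "QVQLV"]
tables = position_log_probs(human_pool)
native_score = score("EVQLV", tables)
foreign_score = score("DIKMT", tables)
```

With such a score in hand, humanization could be framed as searching for substitutions that raise the score, which is the optimization view the abstract describes.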
Effective normalization for copy number variation detection from whole genome sequencing.

PubMed

Janevski, Angel; Varadan, Vinay; Kamalakaran, Sitharthan; Banerjee, Nilanjana; Dimitrova, Nevenka

2012-01-01

Whole genome sequencing enables a high resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. There have been a number of tools to infer copy number variation in the genome. These tools, while validated, also include a number of parameters that are configurable to genome data being analyzed. These algorithms allow for normalization to account for individual and population-specific effects on individual genome CNV estimates but the impact of these changes on the estimated CNVs is not well characterized. We evaluate in detail the effect of normalization methodologies in two CNV algorithms, FREEC and CNV-seq, using whole genome sequencing data from 8 individuals spanning four populations. We apply FREEC and CNV-seq to a sequencing data set consisting of 8 genomes. We use multiple configurations corresponding to different read-count normalization methodologies in FREEC, and statistically characterize the concordance of the CNV calls between FREEC configurations and the analogous output from CNV-seq. The normalization methodologies evaluated in FREEC are: GC content, mappability and control genome. We further stratify the concordance analysis within genic, non-genic, and a collection of validated variant regions. The GC content normalization methodology generates the highest number of altered copy number regions. Both mappability and control genome normalization reduce the total number and length of copy number regions. Mappability normalization yields Jaccard indices in the 0.07 - 0.3 range, whereas using a control genome normalization yields Jaccard index values around 0.4 with normalization based on GC content. The most critical impact of using mappability as a normalization factor is substantial reduction of deletion CNV calls. The output of another method based on control genome normalization, CNV-seq, resulted in comparable CNV call profiles, and substantial agreement in variable gene and CNV region calls. Choice of read-count normalization methodology has a substantial effect on CNV calls and the use of genomic mappability or an appropriately chosen control genome can optimize the output of CNV analysis.
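The Jaccard index used above to compare CNV call sets is the size of the intersection divided by the size of the union. A minimal sketch follows, assuming that each call set is a list of half-open (start, end) intervals on a single chromosome and that overlap is measured in base pairs; the abstract does not state the exact unit of comparison, so this is one plausible reading, and the function names and toy intervals are hypothetical.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def total_length(intervals):
    """Total number of bases covered by already-merged intervals."""
    return sum(end - start for start, end in intervals)


def jaccard(calls_a, calls_b):
    """Base-pair Jaccard index: |A intersect B| / |A union B|."""
    len_a = total_length(merge_intervals(calls_a))
    len_b = total_length(merge_intervals(calls_b))
    union = total_length(merge_intervals(list(calls_a) + list(calls_b)))
    intersection = len_a + len_b - union  # inclusion-exclusion
    return intersection / union if union else 0.0


# Hypothetical toy call sets from two normalization configurations.
gc_calls = [(0, 100), (200, 300)]
mappability_calls = [(50, 150), (250, 260)]
concordance = jaccard(gc_calls, mappability_calls)
```

Here the two sets share 60 bases out of a 250-base union, giving 0.24, which falls inside the 0.07 - 0.3 range reported for mappability normalization purely by construction of the toy data.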
Identification of expressed sequences in the coffee genome potentially associated with somatic embryogenesis.

PubMed

Silva, A T; Paiva, L V; Andrade, A C; Barduche, D

2013-05-21

Brazil possesses the most modern and productive coffee growing farms in the world, but technological development is desired to cope with the increasing world demand. One way to increase Brazilian coffee growing productivity is wide scale production of clones with superior genotypes, which can be obtained with in vitro propagation technique, or from tissue culture. These procedures can generate thousands of clones. However, the methodologies for in vitro cultivation are genotype-dependent, which leads to an almost empirical development of specific protocols for each species. Therefore, molecular markers linked to the biochemical events of somatic embryogenesis would greatly facilitate the development of such protocols. In this context, sequences potentially involved in embryogenesis processes in the coffee plant were identified in silico from libraries generated by the Brazilian Coffee Genome Project. Through these in silico analyses, we identified 15 EST-contigs related to the embryogenesis process. Among these, 5 EST-contigs (3605, 9850, 13686, 17240, and 17265) could readily be associated with plant embryogenesis. Sequence analysis of EST-contig 3605, 9850, and 17265 revealed similarity to a polygalacturonase, to a cysteine-proteinase, and to an allergenine, respectively. Results also show that EST-contig 17265 sequences presented similarity to an expansin. Finally, analysis of EST-contig 17240 revealed similarity to a protein of unknown function, but it grouped in the similarity dendrogram with the WUSCHEL transcription factor. The data suggest that these EST-contigs are related to the embryogenic process and have potential as molecular markers to increase methodological efficiency in obtaining coffee plant embryogenic materials.
Genomics and metagenomics in medical microbiology.

PubMed

Padmanabhan, Roshan; Mishra, Ajay Kumar; Raoult, Didier; Fournier, Pierre-Edouard

2013-12-01

Over the last two decades, sequencing tools have evolved from laborious time-consuming methodologies to real-time detection and deciphering of genomic DNA. Genome sequencing, especially using next generation sequencing (NGS) has revolutionized the landscape of microbiology and infectious disease. This deluge of sequencing data has not only enabled advances in fundamental biology but also helped improve diagnosis, typing of pathogen, virulence and antibiotic resistance detection, and development of new vaccines and culture media. In addition, NGS also enabled efficient analysis of complex human micro-floras, both commensal, and pathological, through metagenomic methods, thus helping the comprehension and management of human diseases such as obesity. This review summarizes technological advances in genomics and metagenomics relevant to the field of medical microbiology. Copyright © 2013 Elsevier B.V. All rights reserved.

A Framework for the Evaluation of Biosecurity, Commercial, Regulatory, and Scientific Impacts of Plant Viruses and Viroids Identified by NGS Technologies

PubMed Central

Massart, Sebastien; Candresse, Thierry; Gil, José; Lacomme, Christophe; Predajna, Lukas; Ravnikar, Maja; Reynard, Jean-Sébastien; Rumbou, Artemis; Saldarelli, Pasquale; Škorić, Dijana; Vainio, Eeva J.; Valkonen, Jari P. T.; Vanderschuren, Hervé; Varveri, Christina; Wetzel, Thierry

2017-01-01

Recent advances in high-throughput sequencing technologies and bioinformatics have generated huge new opportunities for discovering and diagnosing plant viruses and viroids. Plant virology has undoubtedly benefited from these new methodologies but, at the same time, now faces substantial bottlenecks, namely the biological characterization of the newly discovered viruses and the analysis of their impact at the biosecurity, commercial, regulatory, and scientific levels. This paper proposes a scaled and progressive scientific framework for efficient biological characterization and risk assessment when a previously known or a new plant virus is detected by next generation sequencing (NGS) technologies. Four case studies are also presented to illustrate the need for such a framework, and to discuss the scenarios. PMID:28174561
Novel Bioinformatics-Based Approach for Proteomic Biomarkers Prediction of Calpain-2 & Caspase-3 Protease Fragmentation: Application to βII-Spectrin Protein

NASA Astrophysics Data System (ADS)

El-Assaad, Atlal; Dawy, Zaher; Nemer, Georges; Kobeissy, Firas

2017-01-01

The crucial biological role of proteases has become evident with the development of the degradomics discipline, which is concerned with determining the protease/substrate pairs that yield breakdown products (BDPs) usable as putative biomarkers of differing biological and clinical significance. In the field of cancer biology, matrix metalloproteinases (MMPs) have been shown to generate protein BDPs that are indicative of malignant growth, while in the field of neural injury, the calpain-2 and caspase-3 proteases generate BDP fragments that are indicative of different neural cell death mechanisms in different injury scenarios. Advanced proteomic techniques have shown remarkable progress in identifying these BDPs experimentally. In this work, we present a bioinformatics-based prediction method that identifies protease-associated BDPs with high precision and efficiency. The method utilizes state-of-the-art sequence matching and alignment algorithms. It starts by locating consensus sequence occurrences and their variants in any set of protein substrates, generating all fragments resulting from cleavage. The method requires O(mn) space and O(Nmn) time, where N, m, and n are the number of protein sequences, the length of the consensus sequence, and the length per protein sequence, respectively. Finally, the proposed methodology is validated against βII-spectrin protein, a brain injury validated biomarker.
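The core step of locating consensus-sequence occurrences and deriving the resulting fragments can be sketched as below. This is a simplified stand-in, not the authors' algorithm: it uses a regular expression instead of the matching and alignment algorithms they describe, it assumes complete digestion at every matched site, and the motif pattern (the caspase-3 DEVD motif with one illustrative variant), the cut offset, the toy substrate and the function names are all assumptions made for the example.

```python
import re


def cleavage_sites(sequence, motif_pattern, cut_offset):
    """Return sorted cut positions; a cut falls cut_offset residues after each motif start."""
    # The lookahead lets overlapping motif occurrences all be reported.
    lookahead = re.compile("(?=(" + motif_pattern + "))")
    return sorted({m.start() + cut_offset for m in lookahead.finditer(sequence)})


def fragments(sequence, motif_pattern, cut_offset):
    """Cut the sequence at every site and return the resulting fragments."""
    pieces, previous = [], 0
    for cut in cleavage_sites(sequence, motif_pattern, cut_offset):
        if previous < cut < len(sequence):
            pieces.append(sequence[previous:cut])
            previous = cut
    pieces.append(sequence[previous:])
    return pieces


# Hypothetical toy substrate; the motif allows V or T at the third position.
substrate = "MKDEVDGSAAADETDLLK"
sites = cleavage_sites(substrate, "DE[VT]D", cut_offset=4)
products = fragments(substrate, "DE[VT]D", cut_offset=4)
```

Applying this to N substrates is a simple loop over the sequences, which is where the factor N in the stated time complexity would come from.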
Human Y chromosome copy number variation in the next generation sequencing era and beyond.

PubMed

Massaia, Andrea; Xue, Yali

2017-05-01

The human Y chromosome provides a fertile ground for structural rearrangements owing to its haploidy and high content of repeated sequences. The methodologies used for copy number variation (CNV) studies have developed over the years. Low-throughput techniques based on direct observation of rearrangements were developed early on, and are still used, often to complement array-based or sequencing approaches which have limited power in regions with high repeat content and specifically in the presence of long, identical repeats, such as those found in human sex chromosomes. Some specific rearrangements have been investigated for decades; because of their effects on fertility, or their outstanding evolutionary features, the interest in these has not diminished. However, following the flourishing of large-scale genomics, several studies have investigated CNVs across the whole chromosome. These studies sometimes employ data generated within large genomic projects such as the DDD study or the 1000 Genomes Project, and often survey large samples of healthy individuals without any prior selection. Novel technologies based on sequencing long molecules and combinations of technologies, promise to stimulate the study of Y-CNVs in the immediate future.
Genomic variation in macrophage-cultured European porcine reproductive and respiratory syndrome virus Olot/91 revealed using ultra-deep next generation sequencing.

PubMed

Lu, Zen H; Brown, Alexander; Wilson, Alison D; Calvert, Jay G; Balasch, Monica; Fuentes-Utrilla, Pablo; Loecherbach, Julia; Turner, Frances; Talbot, Richard; Archibald, Alan L; Ait-Ali, Tahar

2014-03-04

Porcine Reproductive and Respiratory Syndrome (PRRS) is a disease of major economic impact worldwide. The etiologic agent of this disease is the PRRS virus (PRRSV). Increasing evidence suggests that microevolution within a coexisting quasispecies population can give rise to high sequence heterogeneity in PRRSV. We developed a pipeline based on the ultra-deep next generation sequencing approach to first construct the complete genome of a European PRRSV, strain Olot/91, cultured on macrophages and then capture the rare variants representative of the mixed quasispecies population. Olot/91 differs from the reference Lelystad strain by about 5% and a total of 88 variants, with frequencies as low as 1%, were detected in the mixed population. These variants included 16 non-synonymous variants concentrated in the genes encoding structural and nonstructural proteins, including Glycoprotein 2a and 5. Using an ultra-deep sequencing methodology, the complete genome of Olot/91 was constructed without any prior knowledge of the sequence. Rare variants that constitute minor fractions of the heterogeneous PRRSV population could successfully be detected to allow further exploration of microevolutionary events.
Applications of the rep-PCR DNA fingerprinting technique to study microbial diversity, ecology and evolution.

PubMed

Ishii, Satoshi; Sadowsky, Michael J

2009-04-01

A large number of repetitive DNA sequences are found in multiple sites in the genomes of numerous bacteria, archaea and eukarya. While the functions of many of these repetitive sequence elements are unknown, they have proven to be useful as the basis of several powerful tools for use in molecular diagnostics, medical microbiology, epidemiological analyses and environmental microbiology. The repetitive sequence-based PCR or rep-PCR DNA fingerprint technique uses primers targeting several of these repetitive elements and PCR to generate unique DNA profiles or 'fingerprints' of individual microbial strains. Although this technique has been extensively used to examine diversity among a variety of prokaryotic microorganisms, rep-PCR DNA fingerprinting can also be applied to microbial ecology and microbial evolution studies since it has the power to distinguish microbes at the strain or isolate level. Recent advances in rep-PCR methodology have resulted in increased accuracy, reproducibility and throughput. In this minireview, we summarize recent improvements in rep-PCR DNA fingerprinting methodology, and discuss its applications to address fundamentally important questions in microbial ecology and evolution.
Investigation of a Canine Parvovirus Outbreak using Next Generation Sequencing.

PubMed

Parker, Jayme; Murphy, Molly; Hueffer, Karsten; Chen, Jack

2017-08-29

Canine parvovirus (CPV) outbreaks can have a devastating effect in communities with dense dog populations. The interior region of Alaska experienced a CPV outbreak in the winter of 2016 leading to the further investigation of the virus due to reports of increased morbidity and mortality occurring at dog mushing kennels in the area. Twelve rectal-swab specimens from dogs displaying clinical signs consistent with parvoviral-associated disease were processed using next-generation sequencing (NGS) methodologies by targeting RNA transcripts, and therefore detecting only replicating virus. All twelve specimens demonstrated the presence of the CPV transcriptome, with read depths ranging from 2.2X - 12,381X, genome coverage ranging from 44.8-96.5%, and representation of CPV sequencing reads to those of the metagenome background ranging from 0.0015-6.7%. Using the data generated by NGS, the presence of newly evolved, yet known, strains of both CPV-2a and CPV-2b were identified and grouped geographically. Deep-sequencing data provided additional diagnostic information in terms of investigating novel CPV in this outbreak. NGS data in addition to limited serological data provided strong diagnostic evidence that this outbreak most likely arose from unvaccinated or under-vaccinated canines, not from a novel CPV strain incapable of being neutralized by current vaccination efforts.
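The read-depth and genome-coverage figures quoted in this abstract can be illustrated with a small calculation. The following is a minimal Python sketch, assuming that "read depth" means the mean number of reads per reference position and that "genome coverage" means the percentage of positions covered by at least one read; the function name `coverage_summary` and the toy depth values are hypothetical and not taken from the study.

```python
def coverage_summary(depths, min_depth=1):
    """Summarise per-position read depths for one specimen.

    depths    -- list of read counts, one entry per reference position
    min_depth -- assumed cut-off for calling a position "covered"
    """
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return {
        "mean_depth": sum(depths) / len(depths),       # average reads per position
        "breadth_pct": 100.0 * covered / len(depths),  # share of positions covered
    }


# Toy example: five reference positions, two of them without any reads.
summary = coverage_summary([0, 0, 4, 8, 8])
print(summary)
```

Raising `min_depth` makes the breadth figure stricter, which is one reason coverage percentages from different studies are not always directly comparable.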
Molecular Typing of Lung Adenocarcinoma on Cytological Samples Using a Multigene Next Generation Sequencing Panel

PubMed Central

Fassan, Matteo; Rachiglio, Anna Maria; Cappellesso, Rocco; Antonello, Davide; Amato, Eliana; Mafficini, Andrea; Lambiase, Matilde; Esposito, Claudia; Bria, Emilio; Simonato, Francesca; Scardoni, Maria; Turri, Giona; Chilosi, Marco; Tortora, Giampaolo; Fassina, Ambrogio; Normanno, Nicola

2013-01-01

Identification of driver mutations in lung adenocarcinoma has led to development of targeted agents that are already approved for clinical use or are in clinical trials. Therefore, the number of biomarkers that will need to be assessed is expected to increase rapidly. This calls for the implementation of methods probing the mutational status of multiple genes for inoperable cases, for which limited cytological or bioptic material is available. Cytology specimens from 38 lung adenocarcinomas were subjected to the simultaneous assessment of 504 mutational hotspots of 22 lung cancer-associated genes using 10 nanograms of DNA and Ion Torrent PGM next-generation sequencing. Thirty-six cases were successfully sequenced (95%). In 24/36 cases (67%) at least one mutated gene was observed, including EGFR, KRAS, PIK3CA, BRAF, TP53, PTEN, MET, SMAD4, FGFR3, STK11, MAP2K1. EGFR and KRAS mutations, respectively found in 6/36 (16%) and 10/36 (28%) cases, were mutually exclusive. Nine samples (25%) showed concurrent alterations in different genes. The next-generation sequencing test used is superior to current standard methodologies, as it interrogates multiple genes and requires limited amounts of DNA. Its applicability to routine cytology samples might allow a significant increase in the fraction of lung cancer patients eligible for personalized therapy. PMID:24236184
Targeted Next-generation Sequencing and Bioinformatics Pipeline to Evaluate Genetic Determinants of Constitutional Disease.

PubMed

Dilliott, Allison A; Farhan, Sali M K; Ghani, Mahdi; Sato, Christine; Liang, Eric; Zhang, Ming; McIntyre, Adam D; Cao, Henian; Racacho, Lemuel; Robinson, John F; Strong, Michael J; Masellis, Mario; Bulman, Dennis E; Rogaeva, Ekaterina; Lang, Anthony; Tartaglia, Carmela; Finger, Elizabeth; Zinman, Lorne; Turnbull, John; Freedman, Morris; Swartz, Rick; Black, Sandra E; Hegele, Robert A

2018-04-04

Next-generation sequencing (NGS) is quickly revolutionizing how research into the genetic determinants of constitutional disease is performed. The technique is highly efficient with millions of sequencing reads being produced in a short time span and at relatively low cost. Specifically, targeted NGS is able to focus investigations to genomic regions of particular interest based on the disease of study. Not only does this further reduce costs and increase the speed of the process, but it lessens the computational burden that often accompanies NGS. Although targeted NGS is restricted to certain regions of the genome, preventing identification of potential novel loci of interest, it can be an excellent technique when faced with a phenotypically and genetically heterogeneous disease, for which there are previously known genetic associations. Because of the complex nature of the sequencing technique, it is important to closely adhere to protocols and methodologies in order to achieve sequencing reads of high coverage and quality. Further, once sequencing reads are obtained, a sophisticated bioinformatics workflow is utilized to accurately map reads to a reference genome, to call variants, and to ensure the variants pass quality metrics. Variants must also be annotated and curated based on their clinical significance, which can be standardized by applying the American College of Medical Genetics and Genomics Pathogenicity Guidelines. The methods presented herein will display the steps involved in generating and analyzing NGS data from a targeted sequencing panel, using the ONDRISeq neurodegenerative disease panel as a model, to identify variants that may be of clinical significance.
Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies

PubMed Central

2014-01-01

Background The size and complexity of conifer genomes has, until now, prevented full genome sequencing and assembly. The large research community and economic importance of loblolly pine, Pinus taeda L., made it an early candidate for reference sequence determination. Results We develop a novel strategy to sequence the genome of loblolly pine that combines unique aspects of pine reproductive biology and genome assembly methodology. We use a whole genome shotgun approach relying primarily on next generation sequence generated from a single haploid seed megagametophyte from a loblolly pine tree, 20-1010, that has been used in industrial forest tree breeding. The resulting sequence and assembly was used to generate a draft genome spanning 23.2 Gbp and containing 20.1 Gbp with an N50 scaffold size of 66.9 kbp, making it a significant improvement over available conifer genomes. The long scaffold lengths allow the annotation of 50,172 gene models with intron lengths averaging over 2.7 kbp and sometimes exceeding 100 kbp in length. Analysis of orthologous gene sets identifies gene families that may be unique to conifers. We further characterize and expand the existing repeat library based on the de novo analysis of the repetitive content, estimated to encompass 82% of the genome. Conclusions In addition to its value as a resource for researchers and breeders, the loblolly pine genome sequence and assembly reported here demonstrates a novel approach to sequencing the large and complex genomes of this important group of plants that can now be widely applied. PMID:24647006
A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data.

PubMed

Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G

2013-02-26

The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data.
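The overdispersion argument in this abstract can be made concrete with a small simulation. The following is a minimal Python sketch, not the authors' implementation: it draws coverage counts from a plain Poisson model and from a Poisson-Gamma mixture with the same mean, then compares the variance-to-mean ratio of each. The function names, the mean depth of 20 and the Gamma shape of 5 are illustrative assumptions.

```python
import math
import random
from statistics import mean, pvariance


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) count using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def sample_poisson_gamma(mean_depth, shape, rng):
    """Draw a count whose Poisson rate is itself Gamma distributed."""
    rate = rng.gammavariate(shape, mean_depth / shape)  # E[rate] = mean_depth
    return sample_poisson(rate, rng)


def dispersion_index(depths):
    """Variance-to-mean ratio: about 1 for Poisson data, above 1 if overdispersed."""
    return pvariance(depths) / mean(depths)


rng = random.Random(0)  # fixed seed so the sketch is repeatable
poisson_depths = [sample_poisson(20, rng) for _ in range(5000)]
mixture_depths = [sample_poisson_gamma(20, 5, rng) for _ in range(5000)]
print("Poisson dispersion index:      ", dispersion_index(poisson_depths))
print("Poisson-Gamma dispersion index:", dispersion_index(mixture_depths))
```

Under these assumptions the mixture has variance equal to the mean plus the squared mean divided by the shape, so its index should be well above 1. That is the kind of departure from the plain Poisson model the abstract reports for empirical coverage.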
Evaluation of the Bacterial Diversity in the Human Tongue Coating Based on Genus-Specific Primers for 16S rRNA Sequencing.

PubMed

Sun, Beili; Zhou, Dongrui; Tu, Jing; Lu, Zuhong

2017-01-01

The characteristics of tongue coating are very important symbols for disease diagnosis in traditional Chinese medicine (TCM) theory. As a habitat of oral microbiota, bacteria on the tongue dorsum have been proved to be the cause of many oral diseases. The high-throughput next-generation sequencing (NGS) platforms have been widely applied in the analysis of bacterial 16S rRNA gene. We developed a methodology based on genus-specific multiprimer amplification and ligation-based sequencing for microbiota analysis. In order to validate the efficiency of the approach, we thoroughly analyzed six tongue coating samples from lung cancer patients with different TCM types, and more than 600 genera of bacteria were detected by this platform. The results showed that ligation-based parallel sequencing combined with enzyme digestion and multiamplification could expand the effective length of sequencing reads and could be applied in the microbiota analysis.
A universal protocol to generate consensus level genome sequences for foot-and-mouth disease virus and other positive-sense polyadenylated RNA viruses using the Illumina MiSeq.

PubMed

Logan, Grace; Freimanis, Graham L; King, David J; Valdazo-González, Begoña; Bachanek-Bankowska, Katarzyna; Sanderson, Nicholas D; Knowles, Nick J; King, Donald P; Cottam, Eleanor M

2014-09-30

Next-Generation Sequencing (NGS) is revolutionizing molecular epidemiology by providing new approaches to undertake whole genome sequencing (WGS) in diagnostic settings for a variety of human and veterinary pathogens. Previous sequencing protocols have been subject to biases such as those encountered during PCR amplification and cell culture, or are restricted by the need for large quantities of starting material. We describe here a simple and robust methodology for the generation of whole genome sequences on the Illumina MiSeq. This protocol is specific for foot-and-mouth disease virus (FMDV) or other polyadenylated RNA viruses and circumvents both the use of PCR and the requirement for large amounts of initial template. The protocol was successfully validated using five FMDV positive clinical samples from the 2001 epidemic in the United Kingdom, as well as a panel of representative viruses from all seven serotypes. In addition, this protocol was successfully used to recover 94% of an FMDV genome that had previously been identified as cell culture negative. Genome sequences from three other non-FMDV polyadenylated RNA viruses (EMCV, ERAV, VESV) were also obtained with minor protocol amendments. We calculated that a minimum coverage depth of 22 reads was required to produce an accurate consensus sequence for FMDV O. This was achieved in 5 FMDV/O/UKG isolates and the type O FMDV from the serotype panel with the exception of the 5' genomic termini and area immediately flanking the poly(C) region. We have developed a universal WGS method for FMDV and other polyadenylated RNA viruses. This method works successfully from a limited quantity of starting material and eliminates the requirement for genome-specific PCR amplification. This protocol has the potential to generate consensus-level sequences within a routine high-throughput diagnostic environment.
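The role of a minimum coverage depth in consensus calling can be sketched in a few lines. This is a hedged Python illustration rather than the published protocol: it assumes a simple majority rule per position and masks any position whose depth falls below the threshold. The value 22 is the depth the abstract reports for FMDV O; the function name `consensus` and the toy pileup are hypothetical.

```python
from collections import Counter

MIN_DEPTH = 22  # depth reported in the abstract for FMDV O; illustrative here


def consensus(pileup, min_depth=MIN_DEPTH):
    """Build a consensus string from per-position base observations.

    pileup -- list of strings; each string holds the bases that the reads
              show at one reference position (an assumed input format)
    """
    called = []
    for column in pileup:
        counts = Counter(column)
        depth = sum(counts.values())
        if depth < min_depth:
            called.append("N")  # too few reads to trust a call
        else:
            # Majority base; ties go to the base seen first in the column.
            called.append(counts.most_common(1)[0][0])
    return "".join(called)


# Toy pileup: depth 30, depth 25 with a minor variant, and depth 10.
print(consensus(["A" * 30, "C" * 20 + "T" * 5, "G" * 10]))
```

Masking low-depth positions rather than guessing matches the abstract's observation that the 5' termini and the region flanking the poly(C) tract did not reach the required depth.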
Simplifier: a web tool to eliminate redundant NGS contigs.

PubMed

Ramos, Rommel Thiago Jucá; Carneiro, Adriana Ribeiro; Azevedo, Vasco; Schneider, Maria Paula; Barh, Debmalya; Silva, Artur

2012-01-01

Modern genomic sequencing technologies produce a large amount of data with reduced cost per base; however, this data consists of short reads. This reduction in the size of the reads, compared to those obtained with previous methodologies, presents new challenges, including a need for efficient algorithms for the assembly of genomes from short reads and for resolving repetitions. Additionally, after ab initio assembly, curation of the hundreds or thousands of contigs generated by assemblers demands considerable time and computational resources. We developed Simplifier, a stand-alone software that selectively eliminates redundant sequences from the collection of contigs generated by ab initio assembly of genomes. Application of Simplifier to data generated by assembly of the genome of Corynebacterium pseudotuberculosis strain 258 reduced the number of contigs generated by ab initio methods from 8,004 to 5,272, a reduction of 34.14%; in addition, N50 increased from 1 kb to 1.5 kb. Processing the contigs of Escherichia coli DH10B with Simplifier reduced the mate-paired library 17.47% and the fragment library 23.91%. Simplifier removed redundant sequences from datasets produced by assemblers, thereby reducing the effort required for finalization of genome assembly in tests with data from prokaryotic organisms. Simplifier is available at http://www.genoma.ufpa.br/rramos/softwares/simplifier.xhtml. It requires Sun jdk 6 or higher.
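Two quantities in this abstract, N50 and the removal of redundant contigs, are easy to illustrate. The following minimal Python sketch makes two assumptions: N50 is the contig length at which the running total of lengths, sorted from longest to shortest, reaches half of the assembly size, and "redundant" means a contig that occurs verbatim inside a longer one. The abstract does not describe Simplifier's actual criteria, so `drop_contained` is a naive stand-in, and both function names are hypothetical.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0  # empty input


def drop_contained(contigs):
    """Remove duplicates and contigs found verbatim inside a longer contig.

    Reverse complements and partial overlaps are deliberately ignored here.
    """
    kept = []
    for contig in sorted(set(contigs), key=len, reverse=True):
        if not any(contig in longer for longer in kept):
            kept.append(contig)
    return kept


contigs = ["ACGTACGT", "CGTA", "TTTT", "ACGTACGT"]
filtered = drop_contained(contigs)
print(filtered)
print(n50([len(c) for c in contigs]), n50([len(c) for c in filtered]))
```

Removing short contained contigs lowers the total assembly length faster than it removes long contigs, which is why N50 can rise after redundancy filtering, as the abstract reports.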
Modeling of plug-in electric vehicle travel patterns and charging load based on trip chain generation

NASA Astrophysics Data System (ADS)

Wang, Dai; Gao, Junyu; Li, Pan; Wang, Bin; Zhang, Cong; Saxena, Samveg

2017-08-01

Modeling PEV travel and charging behavior is key to estimating charging demand and further exploring the potential of providing grid services. This paper presents a stochastic simulation methodology to generate itineraries and charging load profiles for a population of PEVs based on real-world vehicle driving data. In order to describe the sequence of daily travel activities, we use the trip chain model which contains the detailed information of each trip, namely start time, end time, trip distance, start location and end location. A trip chain generation method is developed based on the Naive Bayes model to generate a large number of trips which are temporally and spatially coupled. We apply the proposed methodology to investigate the multi-location charging loads in three different scenarios. Simulation results show that home charging can meet the energy demand of the majority of PEVs in an average condition. In addition, we calculate the lower bound of charging load peak on the premise of lowest charging cost. The results are instructive for the design and construction of charging facilities to avoid excessive infrastructure.
Accounting for Errors in Low Coverage High-Throughput Sequencing Data When Constructing Genetic Maps Using Biparental Outcrossed Populations

PubMed Central

Bilton, Timothy P.; Schofield, Matthew R.; Black, Michael A.; Chagné, David; Wilcox, Phillip L.; Dodds, Ken G.

2018-01-01

Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species’ genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology (e.g., genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander–Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model. PMID:29487138
Accounting for Errors in Low Coverage High-Throughput Sequencing Data When Constructing Genetic Maps Using Biparental Outcrossed Populations.

PubMed

Bilton, Timothy P; Schofield, Matthew R; Black, Michael A; Chagné, David; Wilcox, Phillip L; Dodds, Ken G

2018-05-01

Next-generation sequencing is an efficient method that allows for substantially more markers than previous technologies, providing opportunities for building high-density genetic linkage maps, which facilitate the development of nonmodel species' genomic assemblies and the investigation of their genes. However, constructing genetic maps using data generated via high-throughput sequencing technology (e.g., genotyping-by-sequencing) is complicated by the presence of sequencing errors and genotyping errors resulting from missing parental alleles due to low sequencing depth. If unaccounted for, these errors lead to inflated genetic maps. In addition, map construction in many species is performed using full-sibling family populations derived from the outcrossing of two individuals, where unknown parental phase and varying segregation types further complicate construction. We present a new methodology for modeling low coverage sequencing data in the construction of genetic linkage maps using full-sibling populations of diploid species, implemented in a package called GUSMap. Our model is based on the Lander-Green hidden Markov model but extended to account for errors present in sequencing data. We were able to obtain accurate estimates of the recombination fractions and overall map distance using GUSMap, while most existing mapping packages produced inflated genetic maps in the presence of errors. Our results demonstrate the feasibility of using low coverage sequencing data to produce genetic maps without requiring extensive filtering of potentially erroneous genotypes, provided that the associated errors are correctly accounted for in the model. Copyright © 2018 Bilton et al.
Making sense of deep sequencing

PubMed Central

Goldman, D.; Domschke, K.

2016-01-01

This review, the first of an occasional series, tries to make sense of the concepts and uses of deep sequencing of polynucleic acids (DNA and RNA). Deep sequencing, synonymous with next-generation sequencing, high-throughput sequencing and massively parallel sequencing, includes whole genome sequencing but is more often and diversely applied to specific parts of the genome captured in different ways, for example the highly expressed portion of the genome known as the exome and portions of the genome that are epigenetically marked either by DNA methylation, the binding of proteins including histones, or that are in different configurations and thus more or less accessible to enzymes that cleave DNA. Deep sequencing of RNA (RNASeq) reverse-transcribed to complementary DNA is invaluable for measuring RNA expression and detecting changes in RNA structure. Important concepts in deep sequencing include the length and depth of sequence reads, mapping and assembly of reads, sequencing error, haplotypes, and the propensity of deep sequencing, as with other types of ‘big data’, to generate large numbers of errors, requiring monitoring for methodologic biases and strategies for replication and validation. Deep sequencing yields a unique genetic fingerprint that can be used to identify a person, and a trove of predictors of genetic medical diseases. Deep sequencing to identify epigenetic events including changes in DNA methylation and RNA expression can reveal the history and impact of environmental exposures. Because of the power of sequencing to identify and deliver biomedically significant information about a person and their blood relatives, it creates ethical dilemmas and practical challenges in research and clinical care, for example the decision and procedures to report incidental findings that will increasingly and frequently be discovered. PMID:24925306
Physical layer one-time-pad data encryption through synchronized semiconductor laser networks

NASA Astrophysics Data System (ADS)

Argyris, Apostolos; Pikasis, Evangelos; Syvridis, Dimitris

2016-02-01

Semiconductor lasers (SL) have been proven to be a key device in the generation of ultrafast true random bit streams. Their potential to emit chaotic signals with desirable statistics under suitable conditions establishes them as a low cost solution to cover various needs, from large volume key generation to real-time encrypted communications. Usually, only undemanding post-processing is needed to convert the acquired analog timeseries to digital sequences that pass all established tests of randomness. A novel architecture that can generate and exploit these true random sequences is a fiber network in which the nodes are semiconductor lasers that are coupled and synchronized to a central hub laser. In this work we show experimentally that laser nodes in such a star network topology can synchronize with each other through complex broadband signals that are the seed to true random bit sequences (TRBS) generated at several Gb/s. Because each node can access, through the fiber optic network, random bit streams that are generated in real time and synchronized with those of the other nodes, it is possible to implement a one-time-pad encryption protocol that mixes the synchronized true random bit sequence with real data at Gb/s rates. Forward-error correction methods are used to reduce the errors in the TRBS and the final error rate at the data decoding level. An appropriate selection of the sampling methodology and properties, as well as of the physical properties of the chaotic seed signal through which the network locks into synchronization, allows error-free performance.
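The one-time-pad step mentioned in this abstract reduces, at the bit level, to an XOR between the data and a shared random sequence of at least the same length. The following minimal Python sketch shows that operation only. It assumes that both ends already hold an identical pad and uses `secrets.token_bytes` as a software stand-in for the laser-generated true random bit sequence; the function name `xor_with_pad` is hypothetical, and the synchronization and forward-error-correction stages are not modelled.

```python
import secrets


def xor_with_pad(data, pad):
    """XOR each data byte with the matching pad byte.

    The same call both encrypts and decrypts, because XOR is its own inverse.
    """
    if len(pad) < len(data):
        raise ValueError("pad must be at least as long as the data")
    return bytes(d ^ p for d, p in zip(data, pad))


message = b"real data at Gb/s rates"
pad = secrets.token_bytes(len(message))   # stand-in for the shared random bits
ciphertext = xor_with_pad(message, pad)   # sender side
recovered = xor_with_pad(ciphertext, pad) # receiver side, same pad
print(recovered == message)
```

A pad must never be reused for a second message. In the architecture described above, any mismatch between the two nodes' bit sequences would appear directly as bit errors after decryption, which is consistent with the abstract's use of forward-error correction.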
ProDeGe: A computational protocol for fully automated decontamination of genomes

DOE PAGES

Tennessen, Kristin; Andersen, Evan; Clingenpeel, Scott; ...

2015-06-09

Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). Lastly, the procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.
ProDeGe: A computational protocol for fully automated decontamination of genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tennessen, Kristin; Andersen, Evan; Clingenpeel, Scott

Single amplified genomes and genomes assembled from metagenomes have enabled the exploration of uncultured microorganisms at an unprecedented scale. However, both these types of products are plagued by contamination. Since these genomes are now being generated in a high-throughput manner and sequences from them are propagating into public databases to drive novel scientific discoveries, rigorous quality controls and decontamination protocols are urgently needed. Here, we present ProDeGe (Protocol for fully automated Decontamination of Genomes), the first computational protocol for fully automated decontamination of draft genomes. ProDeGe classifies sequences into two classes—clean and contaminant—using a combination of homology and feature-based methodologies. On average, 84% of sequence from the non-target organism is removed from the data set (specificity) and 84% of the sequence from the target organism is retained (sensitivity). Lastly, the procedure operates successfully at a rate of ~0.30 CPU core hours per megabase of sequence and can be applied to any type of genome sequence.

Decoding the Heart through Next Generation Sequencing Approaches.

PubMed

Pawlak, Michal; Niescierowicz, Katarzyna; Winata, Cecilia Lanny

2018-06-07

Vertebrate organs develop through a complex process which involves interaction between multiple signaling pathways at the molecular, cell, and tissue levels. Heart development is an example of such a complex process which, when disrupted, results in congenital heart disease (CHD). This complexity necessitates a holistic approach which allows the visualization of genome-wide interaction networks, as opposed to assessment of limited subsets of factors. Genomics offers a powerful solution to address the problem of biological complexity by enabling the observation of molecular processes at a genome-wide scale. The emergence of next generation sequencing (NGS) technology has facilitated the expansion of genomics, increasing its output capacity and applicability in various biological disciplines. The application of NGS in various aspects of heart biology has resulted in new discoveries, generating novel insights into this field of study. Here we review the contributions of NGS technology to the understanding of heart development and its disruption reflected in CHD and discuss how emerging NGS based methodologies can contribute to the further understanding of heart repair.
Comparison of next generation sequencing technologies for transcriptome characterization

PubMed Central

2009-01-01

Background We have developed a simulation approach to help determine the optimal mixture of sequencing methods for most complete and cost effective transcriptome sequencing. We compared simulation results for traditional capillary sequencing with "Next Generation" (NG) ultra high-throughput technologies. The simulation model was parameterized using mappings of 130,000 cDNA sequence reads to the Arabidopsis genome (NCBI Accession SRA008180.19). We also generated 454-GS20 sequences and de novo assemblies for the basal eudicot California poppy (Eschscholzia californica) and the magnoliid avocado (Persea americana) using a variety of methods for cDNA synthesis. Results The Arabidopsis reads tagged more than 15,000 genes, including new splice variants and extended UTR regions. Of the total 134,791 reads (13.8 MB), 119,518 (88.7%) mapped exactly to known exons, while 1,117 (0.8%) mapped to introns, 11,524 (8.6%) spanned annotated intron/exon boundaries, and 3,066 (2.3%) extended beyond the end of annotated UTRs. Sequence-based inference of relative gene expression levels correlated significantly with microarray data. As expected, NG sequencing of normalized libraries tagged more genes than non-normalized libraries, although non-normalized libraries yielded more full-length cDNA sequences. The Arabidopsis data were used to simulate additional rounds of NG and traditional EST sequencing, and various combinations of each. Our simulations suggest a combination of FLX and Solexa sequencing for optimal transcriptome coverage at modest cost. We have also developed ESTcalc http://fgp.huck.psu.edu/NG_Sims/ngsim.pl, an online webtool, which allows users to explore the results of this study by specifying individualized costs and sequencing characteristics. Conclusion NG sequencing technologies are a highly flexible set of platforms that can be scaled to suit different project goals. 
In terms of sequence coverage alone, the NG sequencing is a dramatic advance over capillary-based sequencing, but NG sequencing also presents significant challenges in assembly and sequence accuracy due to short read lengths, method-specific sequencing errors, and the absence of physical clones. These problems may be overcome by hybrid sequencing strategies using a mixture of sequencing methodologies, by new assemblers, and by sequencing more deeply. Sequencing and microarray outcomes from multiple experiments suggest that our simulator will be useful for guiding NG transcriptome sequencing projects in a wide range of organisms. PMID:19646272
Universal Verification Methodology Based Register Test Automation Flow.

PubMed

Woo, Jae Hun; Cho, Yong Kwan; Park, Sun Kyu

2016-05-01

In today's SoC design, the number of registers has increased along with the complexity of hardware blocks. Register validation is a time-consuming and error-prone task. Therefore, we need an efficient way to perform verification with less effort in a shorter time. In this work, we suggest a register test automation flow based on UVM (Universal Verification Methodology). UVM provides a standard methodology, called a register model, to facilitate stimulus generation and functional checking of registers. However, it is not easy for designers to create register models for their functional blocks or integrate models in a test-bench environment because it requires knowledge of SystemVerilog and UVM libraries. For the creation of register models, many commercial tools support register model generation from a register specification described in IP-XACT, but it is time-consuming to describe a register specification in IP-XACT format. For easy creation of register models, we propose a spreadsheet-based register template which is translated to an IP-XACT description, from which register models can be easily generated using commercial tools. We also automate all the steps involved in integrating the test-bench and generating test-cases, so that designers may use the register model without detailed knowledge of UVM or SystemVerilog. This automation flow involves generating and connecting test-bench components (e.g., driver, checker, bus adaptor, etc.) and writing a test sequence for each type of register test-case. With the proposed flow, designers can save a considerable amount of time when verifying the functionality of registers.
CLAST: CUDA implemented large-scale alignment search tool.

PubMed

Yano, Masahiro; Mori, Hiroshi; Akiyama, Yutaka; Yamada, Takuji; Kurokawa, Ken

2014-12-11

Metagenomics is a powerful methodology to study microbial communities, but it is highly dependent on nucleotide sequence similarity searching against sequence databases. Metagenomic analyses with next-generation sequencing technologies produce enormous numbers of reads from microbial communities, and many reads are derived from microbes whose genomes have not yet been sequenced, limiting the usefulness of existing sequence similarity search tools. Therefore, there is a clear need for a sequence similarity search tool that can rapidly detect weak similarity in large datasets. We developed a tool, which we named CLAST (CUDA implemented large-scale alignment search tool), that enables analyses of millions of reads and thousands of reference genome sequences, and runs on NVIDIA Fermi architecture graphics processing units. CLAST has four main advantages over existing alignment tools. First, CLAST was capable of identifying sequence similarities ~80.8 times faster than BLAST and 9.6 times faster than BLAT. Second, CLAST executes global alignment as the default (local alignment is also an option), enabling CLAST to assign reads to taxonomic and functional groups based on evolutionarily distant nucleotide sequences with high accuracy. Third, CLAST does not need a preprocessed sequence database like Burrows-Wheeler Transform-based tools, and this enables CLAST to incorporate large, frequently updated sequence databases. Fourth, CLAST requires <2 GB of main memory, making it possible to run CLAST on a standard desktop computer or server node. CLAST achieved very high speed (similar to the Burrows-Wheeler Transform-based Bowtie 2 for long reads) and sensitivity (equal to BLAST, BLAT, and FR-HIT) without the need for extensive database preprocessing or a specialized computing platform. 
Our results demonstrate that CLAST has the potential to be one of the most powerful and realistic approaches to analyze the massive amount of sequence data from next-generation sequencing technologies.
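The abstract above states that CLAST performs global alignment by default but does not describe its GPU implementation. As a rough illustration of what a global alignment score is, the following minimal sketch uses the textbook Needleman-Wunsch recurrence with a linear gap penalty. The function name and the scoring values (match, mismatch, gap) are assumptions chosen for illustration and are not taken from CLAST.

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score with a linear gap penalty.

    Scoring values are illustrative assumptions, not CLAST's parameters.
    Only the previous row of the dynamic-programming table is kept.
    """
    # Row 0: aligning an empty prefix of `a` against prefixes of `b`.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of `a` against the empty string
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in `b`
            left = curr[j - 1] + gap  # gap in `a`
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(global_alignment_score("ACGT", "AGT"))
```

Because both sequences are aligned end to end, a read that matches a reference only partially is penalised for the unmatched remainder, which is the behaviour that distinguishes global from local alignment.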
The first complete mitochondrial genome of Bactrocera tsuneonis (Miyake) (Diptera: Tephritidae) by next-generation sequencing and its phylogenetic implications.

PubMed

Zhang, Yue; Feng, Shiqian; Zeng, Yiying; Ning, Hong; Liu, Lijun; Zhao, Zihua; Jiang, Fan; Li, Zhihong

2018-06-23

Bactrocera tsuneonis (Miyake), generally known as the Japanese orange fly, is considered to be a major pest of commercial citrus crops. It has a limited distribution in China, Japan and Vietnam, but it has the potential to invade areas outside of Asia. More genetic information of B. tsuneonis should be obtained in order to develop effective methodologies for rapid and accurate molecular identification due to the difficulty of distinguishing it from Bactrocera minax based on morphological features. We report here the whole mitochondrial genome of B. tsuneonis sequenced by next-generation sequencing. This mitogenome sequence had a total length of 15,865 bp, a typical circular molecule comprising 13 protein-coding genes, 2 rRNA genes, 22 tRNA genes and a non-coding region (A + T-rich control region). The structure and organization of the molecule were typical and similar compared with the published homologous sequences of other fruit flies in Tephritidae. The phylogenetic analyses based on the mitochondrial genome data presented a close genetic relationship between B. tsuneonis and B. minax. This is the first report of the complete mitochondrial genome of B. tsuneonis, and it can be used in further studies of species diagnosis, evolutionary biology, prevention and control. Copyright © 2018. Published by Elsevier B.V.
Randomized clinical trials in dentistry: Risks of bias, risks of random errors, reporting quality, and methodologic quality over the years 1955–2013

PubMed Central

Armijo-Olivo, Susan; Cummings, Greta G.; Amin, Maryam; Flores-Mir, Carlos

2017-01-01

Objectives To examine the risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions and the development of these aspects over time. Methods We included 540 randomized clinical trials from 64 selected systematic reviews. We extracted, in duplicate, details from each of the selected randomized clinical trials with respect to publication and trial characteristics, reporting and methodologic characteristics, and Cochrane risk of bias domains. We analyzed data using logistic regression and Chi-square statistics. Results Sequence generation was assessed to be inadequate (at unclear or high risk of bias) in 68% (n = 367) of the trials, while allocation concealment was inadequate in the majority of trials (n = 464; 85.9%). Blinding of participants and blinding of the outcome assessment were judged to be inadequate in 28.5% (n = 154) and 40.5% (n = 219) of the trials, respectively. A sample size calculation before the initiation of the study was not performed/reported in 79.1% (n = 427) of the trials, while the sample size was assessed as adequate in only 17.6% (n = 95) of the trials. Two thirds of the trials were not described as double blinded (n = 358; 66.3%), while the method of blinding was appropriate in 53% (n = 286) of the trials. We identified a significant decrease over time (1955–2013) in the proportion of trials assessed as having inadequately addressed methodological quality items (P < 0.05) in 30 out of the 40 quality criteria, or as being inadequate (at high or unclear risk of bias) in five domains of the Cochrane risk of bias tool: sequence generation, allocation concealment, incomplete outcome data, other sources of bias, and overall risk of bias. 
Conclusions The risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions have improved over time; however, further efforts that contribute to the development of more stringent methodology and detailed reporting of trials are still needed. PMID:29272315
Randomized clinical trials in dentistry: Risks of bias, risks of random errors, reporting quality, and methodologic quality over the years 1955-2013.

PubMed

Saltaji, Humam; Armijo-Olivo, Susan; Cummings, Greta G; Amin, Maryam; Flores-Mir, Carlos

2017-01-01

To examine the risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions and the development of these aspects over time. We included 540 randomized clinical trials from 64 selected systematic reviews. We extracted, in duplicate, details from each of the selected randomized clinical trials with respect to publication and trial characteristics, reporting and methodologic characteristics, and Cochrane risk of bias domains. We analyzed data using logistic regression and Chi-square statistics. Sequence generation was assessed to be inadequate (at unclear or high risk of bias) in 68% (n = 367) of the trials, while allocation concealment was inadequate in the majority of trials (n = 464; 85.9%). Blinding of participants and blinding of the outcome assessment were judged to be inadequate in 28.5% (n = 154) and 40.5% (n = 219) of the trials, respectively. A sample size calculation before the initiation of the study was not performed/reported in 79.1% (n = 427) of the trials, while the sample size was assessed as adequate in only 17.6% (n = 95) of the trials. Two thirds of the trials were not described as double blinded (n = 358; 66.3%), while the method of blinding was appropriate in 53% (n = 286) of the trials. We identified a significant decrease over time (1955-2013) in the proportion of trials assessed as having inadequately addressed methodological quality items (P < 0.05) in 30 out of the 40 quality criteria, or as being inadequate (at high or unclear risk of bias) in five domains of the Cochrane risk of bias tool: sequence generation, allocation concealment, incomplete outcome data, other sources of bias, and overall risk of bias. 
The risks of bias, risks of random errors, reporting quality, and methodological quality of randomized clinical trials of oral health interventions have improved over time; however, further efforts that contribute to the development of more stringent methodology and detailed reporting of trials are still needed.
DNA-encoded chemistry: enabling the deeper sampling of chemical space.

PubMed

Goodnow, Robert A; Dumelin, Christoph E; Keefe, Anthony D

2017-02-01

DNA-encoded chemical library technologies are increasingly being adopted in drug discovery for hit and lead generation. DNA-encoded chemistry enables the exploration of chemical spaces four to five orders of magnitude more deeply than is achievable by traditional high-throughput screening methods. Operation of this technology requires developing a range of capabilities including aqueous synthetic chemistry, building block acquisition, oligonucleotide conjugation, large-scale molecular biological transformations, selection methodologies, PCR, sequencing, sequence data analysis and the analysis of large chemistry spaces. This Review provides an overview of the development and applications of DNA-encoded chemistry, highlighting the challenges and future directions for the use of this technology.
Incorporating microbiota data into epidemiologic models: examples from vaginal microbiota research.

PubMed

van de Wijgert, Janneke H; Jespers, Vicky

2016-05-01

Next generation sequencing and quantitative polymerase chain reaction technologies are now widely available, and research incorporating these methods is growing exponentially. In the vaginal microbiota (VMB) field, most research to date has been descriptive. The purpose of this article is to provide an overview of different ways in which next generation sequencing and quantitative polymerase chain reaction data can be used to answer clinical epidemiologic research questions using examples from VMB research. We reviewed relevant methodological literature and VMB articles (published between 2008 and 2015) that incorporated these methodologies. VMB data have been analyzed using ecologic methods, methods that compare the presence or relative abundance of individual taxa or community compositions between different groups of women or sampling time points, and methods that first reduce the complexity of the data into a few variables followed by the incorporation of these variables into traditional biostatistical models. To make future VMB research more clinically relevant (such as studying associations between VMB compositions and clinical outcomes and the effects of interventions on the VMB), it is important that these methods are integrated with rigorous epidemiologic methods (such as appropriate study designs, sampling strategies, and adjustment for confounding). Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
[Methodological quality and reporting quality evaluation of randomized controlled trials published in China Journal of Chinese Materia Medica].

PubMed

Yu, Dan-Dan; Xie, Yan-Ming; Liao, Xing; Zhi, Ying-Jie; Jiang, Jun-Jie; Chen, Wei

2018-02-01

To evaluate the methodological quality and reporting quality of randomized controlled trials (RCTs) published in China Journal of Chinese Materia Medica, we searched CNKI and the China Journal of Chinese Materia Medica webpage to collect RCTs published since the establishment of the journal. The Cochrane risk of bias assessment tool was used to evaluate the methodological quality of RCTs. The CONSORT 2010 checklist was adopted as the reporting quality evaluation tool. Finally, 184 RCTs were included and evaluated methodologically, of which 97 RCTs were evaluated for reporting quality. For the methodological evaluation, 62 trials (33.70%) reported the random sequence generation; 9 trials (4.89%) reported the allocation concealment; 25 trials (13.59%) adopted the method of blinding; 30 trials (16.30%) reported the number of patients withdrawing, dropping out and those lost to follow-up; 2 trials (1.09%) reported trial registration and none of the trials reported the trial protocol; only 8 trials (4.35%) reported the sample size estimation in detail. For the reporting quality appraisal, 3 of the 25 reporting items were evaluated as high quality, including abstract, participant eligibility criteria, and statistical methods; 4 reporting items were of medium quality, including purpose, intervention, random sequence method, and data collection sites and locations; 9 reporting items were of low quality, including title, background, random sequence types, allocation concealment, blinding, recruitment of subjects, baseline data, harms, and funding; the rest of the items were of extremely low quality (compliance rate of reporting item <10%). On the whole, the methodological and reporting quality of RCTs published in the journal is generally low. Further improvement in both methodological and reporting quality for RCTs of traditional Chinese medicine is warranted. It is recommended that the international standards and procedures for RCT design should be strictly followed to conduct high-quality trials. 
At the same time, in order to improve the reporting quality of randomized controlled trials, CONSORT standards should be adopted in the preparation of research reports and submissions. Copyright© by the Chinese Pharmaceutical Association.
Analysis of Litopenaeus vannamei Transcriptome Using the Next-Generation DNA Sequencing Technique

PubMed Central

Li, Chaozheng; Weng, Shaoping; Chen, Yonggui; Yu, Xiaoqiang; Lü, Ling; Zhang, Haiqing; He, Jianguo; Xu, Xiaopeng

2012-01-01

Background Pacific white shrimp (Litopenaeus vannamei), the major species of farmed shrimp in the world, has attracted extensive study, which requires more and more genomic background knowledge. The currently available transcriptome data of L. vannamei are insufficient for research requirements, and have not been adequately assembled and annotated. Methodology/Principal Findings This is the first study that used a next-generation high-throughput DNA sequencing technique, the Solexa/Illumina GA II method, to analyze the transcriptome from whole bodies of L. vannamei larvae. More than 2.4 Gb of raw data were generated, and 109,169 unigenes with a mean length of 396 bp were assembled using the SOAPdenovo software. A total of 73,505 unigenes (>200 bp) with good quality sequences were selected and subjected to annotation analysis, among which 37.80% could be matched in the NCBI Nr database, 37.3% in Swissprot, and 44.1% in TrEMBL. Using BLAST and BLAST2Go software, 11,153 unigenes were classified into 25 Clusters of Orthologous Groups of proteins (COG) categories, 8171 unigenes were assigned to 51 Gene Ontology (GO) functional groups, and 18,154 unigenes were divided into 220 Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways. To preliminarily verify part of the assembly and annotation results, 12 assembled unigenes that are homologous to many embryo development-related genes were chosen and subjected to RT-PCR for electrophoresis and Sanger sequencing analyses, and to real-time PCR for expression profile analyses during embryo development. Conclusions/Significance The L. vannamei transcriptome analyzed using the next-generation sequencing technique enriches the information on L. vannamei genes, which will facilitate our understanding of the genome background of crustaceans and promote studies on L. vannamei. PMID:23071809
Next-generation sequencing of the Trichinella murrelli mitochondrial genome allows comprehensive comparison of its divergence from the principal agent of human trichinellosis, Trichinella spiralis.

PubMed

Webb, Kristen M; Rosenthal, Benjamin M

2011-01-01

The mitochondrial genome's non-recombinant mode of inheritance and relatively rapid rate of evolution has promoted its use as a marker for studying the biogeographic history and evolutionary interrelationships among many metazoan species. A modest portion of the mitochondrial genome has been defined for 12 species and genotypes of parasites in the genus Trichinella, but its adequacy in representing the mitochondrial genome as a whole remains unclear, as the complete coding sequence has been characterized only for Trichinella spiralis. Here, we sought to comprehensively describe the extent and nature of divergence between the mitochondrial genomes of T. spiralis (which poses the most appreciable zoonotic risk owing to its capacity to establish persistent infections in domestic pigs) and Trichinella murrelli (which is the most prevalent species in North American wildlife hosts, but which poses relatively little risk to the safety of pork). Next generation sequencing methodologies and scaffold and de novo assembly strategies were employed. The entire protein-coding region was sequenced (13,917 bp), along with a portion of the highly repetitive non-coding region (1524 bp) of the mitochondrial genome of T. murrelli with a combined average read depth of 250 reads. The accuracy of base calling, estimated from coding region sequence was found to exceed 99.3%. Genome content and gene order was not found to be significantly different from that of T. spiralis. An overall inter-species sequence divergence of 9.5% was estimated. Significant variation was identified when the amount of variation between species at each gene is compared to the average amount of variation between species across the coding region. Next generation sequencing is a highly effective means to obtain previously unknown mitochondrial genome sequence. 
Particular to parasites, the extremely deep coverage achieved through this method allows for the detection of sequence heterogeneity between the multiple individuals that necessarily comprise such templates. Copyright © 2010 Elsevier B.V. All rights reserved.
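The abstract above reports an overall inter-species divergence of 9.5% without stating the estimator used. One common and simple measure is the uncorrected p-distance, the fraction of compared alignment columns at which two sequences differ. The sketch below assumes pre-aligned sequences of equal length and skips columns containing a gap or an ambiguous base; the function name and these conventions are illustrative assumptions, not the authors' procedure.

```python
def p_distance(seq_a, seq_b, skip="-N"):
    """Uncorrected p-distance between two aligned sequences.

    Assumes the inputs are already aligned (equal length). Columns where
    either sequence has a gap or ambiguous base (characters in `skip`)
    are ignored. This is an illustrative estimator only.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_a.upper(), seq_b.upper()):
        if x in skip or y in skip:
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return differing / compared


if __name__ == "__main__":
    # One difference in ten compared columns.
    print(p_distance("ACGTACGTAC", "ACGTTCGTAC"))
```

Model-based corrections (for example Jukes-Cantor) would give somewhat larger values at this level of divergence because they account for multiple substitutions at the same site.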
Statistical Ring Opening Metathesis Copolymerization of Norbornene and Cyclopentene by Grubbs' 1st-Generation Catalyst.

PubMed

Nikovia, Christiana; Maroudas, Andreas-Philippos; Goulis, Panagiotis; Tzimis, Dionysios; Paraskevopoulou, Patrina; Pitsikalis, Marinos

2015-08-27

Statistical copolymers of norbornene (NBE) with cyclopentene (CP) were prepared by ring-opening metathesis polymerization, employing the 1st-generation Grubbs' catalyst, in the presence or absence of triphenylphosphine, PPh₃. The reactivity ratios were estimated using the Finemann-Ross, inverted Finemann-Ross, and Kelen-Tüdos graphical methods, along with the computer program COPOINT, which evaluates the parameters of binary copolymerizations from comonomer/copolymer composition data by integrating a given copolymerization equation in its differential form. Structural parameters of the copolymers were obtained by calculating the dyad sequence fractions and the mean sequence length, which were derived using the monomer reactivity ratios. The kinetics of thermal decomposition of the copolymers along with the respective homopolymers was studied by thermogravimetric analysis within the framework of the Ozawa-Flynn-Wall and Kissinger methodologies. Finally, the effect of triphenylphosphine on the kinetics of copolymerization, the reactivity ratios, and the kinetics of thermal decomposition were examined.
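For readers unfamiliar with the graphical methods named above, the Fineman-Ross linearization rewrites the Mayo-Lewis copolymer equation as G = r1·H − r2, where x is the molar feed ratio [M1]/[M2], y is the molar ratio of the two monomers in the copolymer, G = x(y − 1)/y and H = x²/y. A straight-line fit of G against H then gives r1 as the slope and r2 as the negative intercept. The sketch below checks this on synthetic data generated from the Mayo-Lewis equation; the reactivity ratios 0.8 and 0.3 are made-up values for illustration and are not results from the study.

```python
def mayo_lewis(x, r1, r2):
    """Instantaneous copolymer composition ratio y for feed ratio x."""
    return x * (r1 * x + 1.0) / (x + r2)


def fineman_ross(feed_ratios, copolymer_ratios):
    """Estimate (r1, r2) by ordinary least squares on G = r1*H - r2."""
    G = [x * (y - 1.0) / y for x, y in zip(feed_ratios, copolymer_ratios)]
    H = [x * x / y for x, y in zip(feed_ratios, copolymer_ratios)]
    n = len(H)
    mean_h = sum(H) / n
    mean_g = sum(G) / n
    slope = (sum((h - mean_h) * (g - mean_g) for h, g in zip(H, G))
             / sum((h - mean_h) ** 2 for h in H))
    intercept = mean_g - slope * mean_h
    return slope, -intercept  # r1 is the slope, r2 the negative intercept


if __name__ == "__main__":
    # Hypothetical reactivity ratios used only to generate test data.
    true_r1, true_r2 = 0.8, 0.3
    feeds = [0.25, 0.5, 1.0, 2.0, 4.0]
    compositions = [mayo_lewis(x, true_r1, true_r2) for x in feeds]
    print(fineman_ross(feeds, compositions))
```

With real, noisy composition data the fit is sensitive to points at extreme feed ratios, which is one reason the abstract also lists the inverted form and the Kelen-Tüdos method.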
Connecting the Human Variome Project to nutrigenomics.

PubMed

Kaput, Jim; Evelo, Chris T; Perozzi, Giuditta; van Ommen, Ben; Cotton, Richard

2010-12-01

Nutrigenomics is the science of analyzing and understanding gene-nutrient interactions, which because of the genetic heterogeneity, varying degrees of interaction among gene products, and the environmental diversity is a complex science. Although much knowledge of human diversity has been accumulated, estimates suggest that ~90% of genetic variation has not yet been characterized. Identification of the DNA sequence variants that contribute to nutrition-related disease risk is essential for developing a better understanding of the complex causes of disease in humans, including nutrition-related disease. The Human Variome Project (HVP; http://www.humanvariomeproject.org/) is an international effort to systematically identify genes, their mutations, and their variants associated with phenotypic variability and indications of human disease or phenotype. Since nutrigenomic research uses genetic information in the design and analysis of experiments, the HVP is an essential collaborator for ongoing studies of gene-nutrient interactions. With the advent of next generation sequencing methodologies and the understanding of the undiscovered variation in human genomes, the nutrigenomic community will be generating novel sequence data and results. The guidelines and practices of the HVP can guide and harmonize these efforts.
Connecting the Human Variome Project to nutrigenomics

PubMed Central

Evelo, Chris T.; Perozzi, Giuditta; van Ommen, Ben; Cotton, Richard

2010-01-01

Nutrigenomics is the science of analyzing and understanding gene–nutrient interactions, which because of the genetic heterogeneity, varying degrees of interaction among gene products, and the environmental diversity is a complex science. Although much knowledge of human diversity has been accumulated, estimates suggest that ~90% of genetic variation has not yet been characterized. Identification of the DNA sequence variants that contribute to nutrition-related disease risk is essential for developing a better understanding of the complex causes of disease in humans, including nutrition-related disease. The Human Variome Project (HVP; http://www.humanvariomeproject.org/) is an international effort to systematically identify genes, their mutations, and their variants associated with phenotypic variability and indications of human disease or phenotype. Since nutrigenomic research uses genetic information in the design and analysis of experiments, the HVP is an essential collaborator for ongoing studies of gene–nutrient interactions. With the advent of next generation sequencing methodologies and the understanding of the undiscovered variation in human genomes, the nutrigenomic community will be generating novel sequence data and results. The guidelines and practices of the HVP can guide and harmonize these efforts. PMID:28300226
Diversity of Babesia bovis merozoite surface antigen genes in the Philippines.

PubMed

Tattiyapong, Muncharee; Sivakumar, Thillaiampalam; Ybanez, Adrian Patalinghug; Ybanez, Rochelle Haidee Daclan; Perez, Zandro Obligado; Guswanto, Azirwan; Igarashi, Ikuo; Yokoyama, Naoaki

2014-02-01

Babesia bovis is the causative agent of fatal babesiosis in cattle. In the present study, we investigated the genetic diversity of B. bovis among Philippine cattle, based on the genes that encode merozoite surface antigens (MSAs). Forty-one B. bovis-positive blood DNA samples from cattle were used to amplify the msa-1, msa-2b, and msa-2c genes. In phylogenetic analyses, the msa-1, msa-2b, and msa-2c gene sequences generated from Philippine B. bovis-positive DNA samples were found in six, three, and four different clades, respectively. All of the msa-1 and most of the msa-2b sequences were found in clades that were formed only by Philippine msa sequences in the respective phylograms. While all the msa-1 sequences from the Philippines showed similarity to those formed by Australian msa-1 sequences, the msa-2b sequences showed similarity to either Australian or Mexican msa-2b sequences. In contrast, msa-2c sequences from the Philippines were distributed across all the clades of the phylogram, although one clade was formed exclusively by Philippine msa-2c sequences. Similarities among the deduced amino acid sequences of MSA-1, MSA-2b, and MSA-2c from the Philippines were 62.2-100, 73.1-100, and 67.3-100%, respectively. The present findings demonstrate that B. bovis populations are genetically diverse in the Philippines. This information will provide a good foundation for the future design and implementation of improved immunological preventive methodologies against bovine babesiosis in the Philippines. The study has also generated a set of data that will be useful for further understanding of the global genetic diversity of this important parasite. © 2013.
Organotin speciation in environmental matrices by automated on-line hydride generation-programmed temperature vaporization-capillary gas chromatography-mass spectrometry detection.

PubMed

Serra, H; Nogueira, J M F

2005-11-11

In the present contribution, a new automated on-line hydride generation methodology was developed for dibutyltin and tributyltin speciation at the trace level, using a programmable temperature-vaporizing inlet followed by capillary gas chromatography coupled to mass spectrometry in the selected ion-monitoring acquisition mode (PTV-GC/MS(SIM)). The methodology involves a sequence defined by two running methods, the first one configured for hydride generation with sodium tetrahydroborate as derivatising agent and the second configured for speciation purposes, using a conventional autosampler and data acquisition controlled by the instrument's software. From the method-development experiments, it was established that injector configuration has a great effect on the speciation performance of the methodology, particularly the initial inlet temperature (-20 °C; He: 150 ml/min), injection volume (2 µl) and solvent characteristics using the solvent venting mode. Under optimized conditions, remarkable instrumental performance was obtained, including very good precision (RSD < 4%), an excellent linear dynamic range (up to 50 µg/ml) and limits of detection of 0.12 µg/ml and 9 ng/ml for dibutyltin and tributyltin, respectively. The feasibility of the present methodology was validated through assays on in-house spiked water (2 ng/ml) and a certified reference sediment matrix (Community Bureau of Reference, CRM 462, Nr. 330; dibutyltin: 68 ± 12 ng/g; tributyltin: 54 ± 15 ng/g on a dry mass basis), using liquid-liquid extraction (LLE) and solid-phase extraction (SPE) sample enrichment and multiple injections (2 x 5 µl) for sensitivity enhancement. The methodology showed high reproducibility, is easy to work up, is sensitive, and proved to be a suitable alternative to the currently dedicated analytical systems for organotin speciation in environmental matrices at the trace level.
Using next-generation sequencing for high resolution multiplex analysis of copy number variation from nanogram quantities of DNA from formalin-fixed paraffin-embedded specimens.

PubMed

Wood, Henry M; Belvedere, Ornella; Conway, Caroline; Daly, Catherine; Chalkley, Rebecca; Bickerdike, Melissa; McKinley, Claire; Egan, Phil; Ross, Lisa; Hayward, Bruce; Morgan, Joanne; Davidson, Leslie; MacLennan, Ken; Ong, Thian K; Papagiannopoulos, Kostas; Cook, Ian; Adams, David J; Taylor, Graham R; Rabbitts, Pamela

2010-08-01

The use of next-generation sequencing technologies to produce genomic copy number data has recently been described. Most approaches, however, rely on optimal starting DNA, and are therefore unsuitable for the analysis of formalin-fixed paraffin-embedded (FFPE) samples, which largely precludes the analysis of many tumour series. We have sought to challenge the limits of this technique with regard to quality and quantity of starting material and the depth of sequencing required. We confirm that the technique can be used to interrogate DNA from cell lines, fresh frozen material and FFPE samples to assess copy number variation. We show that as little as 5 ng of DNA is needed to generate a copy number karyogram, and follow this up with data from a series of FFPE biopsies and surgical samples. We have used various levels of sample multiplexing to demonstrate the adjustable resolution of the methodology, depending on the number of samples and available resources. We also demonstrate reproducibility by use of replicate samples and comparison with microarray-based comparative genomic hybridization (aCGH) and digital PCR. This technique can be valuable in both the analysis of routine diagnostic samples and in examining large repositories of fixed archival material.
A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

PubMed Central

2013-01-01

Background The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Results Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. Conclusions In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. PMID:23442253
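The overdispersion claim in the abstract above can be made concrete with the variance-to-mean ratio (dispersion index) of per-site coverage: a pure Poisson model gives a ratio near 1, while a Poisson-Gamma mixture gives a ratio above 1. The sketch below simulates both cases using only the standard library. The mean depth, the Gamma shape parameter, the seed and all function names are illustrative assumptions; this is not the authors' model-fitting code and uses no real coverage data.

```python
import math
import random
from statistics import mean, pvariance


def sample_poisson(lam, rng):
    """Draw one Poisson(lam) value using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def simulate_coverage(n_sites, mean_depth, shape=None, seed=0):
    """Simulate per-site read depth.

    shape=None: plain Poisson with rate mean_depth.
    shape=k:    Poisson-Gamma mixture; each site's rate is drawn from
                Gamma(shape=k, scale=mean_depth / k), so the mean is
                unchanged while the variance grows to mean + mean**2 / k.
    """
    rng = random.Random(seed)
    depths = []
    for _ in range(n_sites):
        if shape is None:
            rate = mean_depth
        else:
            rate = rng.gammavariate(shape, mean_depth / shape)
        depths.append(sample_poisson(rate, rng))
    return depths


def dispersion_index(depths):
    """Variance-to-mean ratio; close to 1 under a Poisson model."""
    return pvariance(depths) / mean(depths)


if __name__ == "__main__":
    print(dispersion_index(simulate_coverage(5000, 20.0)))
    print(dispersion_index(simulate_coverage(5000, 20.0, shape=5.0)))
```

A CNV caller that assumes a plain Poisson model would treat the extra spread in the second case as signal, which is the kind of false positive that a hierarchical model is intended to reduce.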
Methodology Development for Passive Component Reliability Modeling in a Multi-Physics Simulation Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aldemir, Tunc; Denning, Richard; Catalyurek, Umit

Reduction in safety margin can be expected as passive structures and components undergo degradation with time. Limitations in the traditional probabilistic risk assessment (PRA) methodology constrain its value as an effective tool to address the impact of aging effects on risk and for quantifying the impact of aging management strategies in maintaining safety margins. A methodology has been developed to address multiple aging mechanisms involving large numbers of components (with possibly statistically dependent failures) within the PRA framework in a computationally feasible manner when the sequencing of events is conditioned on the physical conditions predicted in a simulation environment, such as the New Generation System Code (NGSC) concept. Both epistemic and aleatory uncertainties can be accounted for within the same phenomenological framework and maintenance can be accounted for in a coherent fashion. The framework accommodates the prospective impacts of various intervention strategies such as testing, maintenance, and refurbishment. The methodology is illustrated with several examples.

BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.

PubMed

Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins

2018-06-05

With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular from de novo sequencing, has been rapidly produced at relatively low cost. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on feature extraction from complex network measurements. The method initially transforms the sequences and represents them as complex networks. Then it extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and not biased by the organism. The proposed methodology is implemented as open source in the R language and is freely available for download at https://cran.r-project.org/package=BASiNET.
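The general idea of turning a sequence into a network and reading off topological features can be sketched as follows. This is a simplified illustration in Python, not the BASiNET package (which is written in R), and the specific choices here are assumptions: nodes are overlapping words of length 3, an undirected edge joins words that occur consecutively, and the feature vector holds only a few basic measures.

```python
from collections import defaultdict


def sequence_to_network(sequence, word_size=3, step=1):
    """Build an undirected word-adjacency network from a sequence.

    Nodes are the distinct words of length `word_size`; an edge links two
    different words that appear consecutively, weighted by how often.
    Word size and step are illustrative assumptions.
    """
    last_start = len(sequence) - word_size
    words = [sequence[i:i + word_size] for i in range(0, last_start + 1, step)]
    edges = defaultdict(int)
    for left, right in zip(words, words[1:]):
        if left != right:  # ignore self-loops
            edges[frozenset((left, right))] += 1
    return set(words), dict(edges)


def topological_features(nodes, edges):
    """Compute a small feature vector of basic topological measures."""
    degree = {node: 0 for node in nodes}
    for pair in edges:
        for node in pair:
            degree[node] += 1
    n = len(nodes)
    return {
        "nodes": n,
        "edges": len(edges),
        "average_degree": sum(degree.values()) / n if n else 0.0,
        "max_degree": max(degree.values(), default=0),
    }


if __name__ == "__main__":
    nodes, edges = sequence_to_network("ACGACG")
    print(topological_features(nodes, edges))
```

Feature dictionaries like this one, computed per sequence, could then be passed to any standard classifier; the choice of measures and classifier in the actual tool should be taken from its documentation rather than from this sketch.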
DNA barcodes for ecology, evolution, and conservation.

PubMed

Kress, W John; García-Robledo, Carlos; Uriarte, Maria; Erickson, David L

2015-01-01

The use of DNA barcodes, which are short gene sequences taken from a standardized portion of the genome and used to identify species, is entering a new phase of application as more and more investigations employ these genetic markers to address questions relating to the ecology and evolution of natural systems. The suite of DNA barcode markers now applied to specific taxonomic groups of organisms are proving invaluable for understanding species boundaries, community ecology, functional trait evolution, trophic interactions, and the conservation of biodiversity. The application of next-generation sequencing (NGS) technology will greatly expand the versatility of DNA barcodes across the Tree of Life, habitats, and geographies as new methodologies are explored and developed. Published by Elsevier Ltd.
Knowledge-based computational intelligence development for predicting protein secondary structures from sequences.

PubMed

Shen, Hong-Bin; Yi, Dong-Liang; Yao, Li-Xiu; Yang, Jie; Chou, Kuo-Chen

2008-10-01

In the postgenomic age, with the avalanche of protein sequences generated and relatively slow progress in determining their structures by experiments, it is important to develop automated methods to predict the structure of a protein from its sequence. The membrane proteins are a special group in the protein family that accounts for approximately 30% of all proteins; however, solved membrane protein structures only represent less than 1% of known protein structures to date. Although a great success has been achieved for developing computational intelligence techniques to predict secondary structures in both globular and membrane proteins, there is still much challenging work in this regard. In this review article, we firstly summarize the recent progress of automation methodology development in predicting protein secondary structures, especially in membrane proteins; we will then give some future directions in this research field.
Methodologic European external quality assurance for DNA sequencing: the EQUALseq program.

PubMed

Ahmad-Nejad, Parviz; Dorn-Beineke, Alexandra; Pfeiffer, Ulrike; Brade, Joachim; Geilenkeuser, Wolf-Jochen; Ramsden, Simon; Pazzagli, Mario; Neumaier, Michael

2006-04-01

DNA sequencing is a key technique in molecular diagnostics, but to date no comprehensive methodologic external quality assessment (EQA) programs have been instituted. Between 2003 and 2005, the European Union funded, as specific support actions, the EQUAL initiative to develop methodologic EQA schemes for genotyping (EQUALqual), quantitative PCR (EQUALquant), and sequencing (EQUALseq). Here we report on the results of the EQUALseq program. The participating laboratories received a 4-sample set comprising 2 DNA plasmids, a PCR product, and a finished sequencing reaction to be analyzed. Data and information from detailed questionnaires were uploaded online and evaluated by use of a scoring system for technical skills and proficiency of data interpretation. Sixty laboratories from 21 European countries registered, and 43 participants (72%) returned data and samples. Capillary electrophoresis was the predominant platform (n = 39; 91%). The median contiguous correct sequence stretch was 527 nucleotides with considerable variation in quality of both primary data and data evaluation. The association between laboratory performance and the number of sequencing assays/year was statistically significant (P <0.05). Interestingly, more than 30% of participants neither added comments to their data nor made efforts to identify the gene sequences or mutational positions. Considerable variations exist even in a highly standardized methodology such as DNA sequencing. Methodologic EQAs are appropriate tools to uncover strengths and weaknesses in both technique and proficiency, and our results emphasize the need for mandatory EQAs. The results of EQUALseq should help improve the overall quality of molecular genetics findings obtained by DNA sequencing.
Tips and tricks for the assembly of a Corynebacterium pseudotuberculosis genome using a semiconductor sequencer.

PubMed

Ramos, Rommel Thiago Jucá; Carneiro, Adriana Ribeiro; Soares, Siomar de Castro; dos Santos, Anderson Rodrigues; Almeida, Sintia; Guimarães, Luis; Figueira, Flávia; Barbosa, Eudes; Tauch, Andreas; Azevedo, Vasco; Silva, Artur

2013-03-01

New sequencing platforms have enabled rapid decoding of complete prokaryotic genomes at relatively low cost. The Ion Torrent platform is an example of these technologies, characterized by lower coverage, generating challenges for the genome assembly. One particular problem is the lack of genomes that enable reference-based assembly, such as the one used in the present study, Corynebacterium pseudotuberculosis biovar equi, which causes high economic losses in the US equine industry. The quality treatment strategy incorporated into the assembly pipeline enabled a 16-fold greater use of the sequencing data obtained compared with traditional quality filter approaches. Data preprocessing prior to the de novo assembly enabled the use of known methodologies in the next-generation sequencing data assembly. Moreover, manual curation was proved to be essential for ensuring a quality assembly, which was validated by comparative genomics with other species of the genus Corynebacterium. The present study presents a modus operandi that enables a greater and better use of data obtained from semiconductor sequencing for obtaining the complete genome from a prokaryotic microorganism, C. pseudotuberculosis, which is not a traditional biological model such as Escherichia coli. © 2012 The Authors. Published by Society for Applied Microbiology and Blackwell Publishing Ltd. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
The case for multimodal analysis of atypical interaction: questions, answers and gaze in play involving a child with autism.

PubMed

Muskett, Tom; Body, Richard

2013-01-01

Conversation analysis (CA) continues to accrue interest within clinical linguistics as a methodology that can enable elucidation of structural and sequential orderliness in interactions involving participants who produce ostensibly disordered communication behaviours. However, it can be challenging to apply CA to re-examine clinical phenomena that have initially been defined in terms of linguistics, as a logical starting point for analysis may be to focus primarily on the organisation of language ("talk") in such interactions. In this article, we argue that CA's methodological power can only be fully exploited in this research context when a multimodal analytic orientation is adopted, where due consideration is given to participants' co-ordinated use of multiple semiotic resources including, but not limited to, talk (e.g., gaze, embodied action, object use and so forth). To evidence this argument, a two-layered analysis of unusual question-answer sequences in a play episode involving a child with autism is presented. It is thereby demonstrated that only when the scope of enquiry is broadened to include gaze and other embodied action can an account be generated of orderliness within these sequences. This finding has important implications for CA's application as a research methodology within clinical linguistics.
SU-F-J-105: Towards a Novel Treatment Planning Pipeline Delivering Pareto-Optimal Plans While Enabling Inter- and Intrafraction Plan Adaptation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kontaxis, C; Bol, G; Lagendijk, J

2016-06-15

Purpose: To develop a new IMRT treatment planning methodology suitable for the new generation of MR-linear accelerator machines. The pipeline is able to deliver Pareto-optimal plans and can be utilized for conventional treatments as well as for inter- and intrafraction plan adaptation based on real-time MR-data. Methods: A Pareto-optimal plan is generated using the automated multicriterial optimization approach Erasmus-iCycle. The resulting dose distribution is used as input to the second part of the pipeline, an iterative process which generates deliverable segments that target the latest anatomical state and gradually converges to the prescribed dose. This process continues until a certain percentage of the dose has been delivered. Under a conventional treatment, a Segment Weight Optimization (SWO) is then performed to ensure convergence to the prescribed dose. In the case of inter- and intrafraction adaptation, post-processing steps like SWO cannot be employed due to the changing anatomy. This is instead addressed by transferring the missing/excess dose to the input of the subsequent fraction. In this work, the resulting plans were delivered on a Delta4 phantom as a final Quality Assurance test. Results: A conventional static SWO IMRT plan was generated for two prostate cases. The sequencer faithfully reproduced the input dose for all volumes of interest. For the two cases the mean relative dose difference of the PTV between the ideal input and sequenced dose was 0.1% and −0.02% respectively. Both plans were delivered on a Delta4 phantom and passed the clinical Quality Assurance procedures by achieving 100% pass rate at a 3%/3mm gamma analysis. Conclusion: We have developed a new sequencing methodology capable of online plan adaptation. In this work, we extended the pipeline to support Pareto-optimal input and clinically validated that it can accurately achieve these ideal distributions, while its flexible design enables inter- and intrafraction plan adaptation. This research is financially supported by Elekta AB, Stockholm, Sweden.
De Novo Generation and Characterization of New Zika Virus Isolate Using Sequence Data from a Microcephaly Case

PubMed Central

Setoh, Yin Xiang; Prow, Natalie A.; Peng, Nias; Hugo, Leon E.; Devine, Gregor; Hazlewood, Jessamine E.

2017-01-01

ABSTRACT Zika virus (ZIKV) has recently emerged and is the etiological agent of congenital Zika syndrome (CZS), a spectrum of congenital abnormalities arising from neural tissue infections in utero. Herein, we describe the de novo generation of a new ZIKV isolate, ZIKVNatal, using a modified circular polymerase extension reaction protocol and sequence data obtained from a ZIKV-infected fetus with microcephaly. ZIKVNatal thus has no laboratory passage history and is unequivocally associated with CZS. ZIKVNatal could be used to establish a fetal brain infection model in IFNAR−/− mice (including intrauterine growth restriction) without causing symptomatic infections in dams. ZIKVNatal was also able to be transmitted by Aedes aegypti mosquitoes. ZIKVNatal thus retains key aspects of circulating pathogenic ZIKVs and illustrates a novel methodology for obtaining an authentic functional viral isolate by using data from deep sequencing of infected tissues. IMPORTANCE The major complications of an ongoing Zika virus outbreak in the Americas and Asia are congenital defects caused by the virus’s ability to cross the placenta and infect the fetal brain. The ability to generate molecular tools to analyze viral isolates from the current outbreak is essential for furthering our understanding of how these viruses cause congenital defects. The majority of existing viral isolates and infectious cDNA clones generated from them have undergone various numbers of passages in cell culture and/or suckling mice, which is likely to result in the accumulation of adaptive mutations that may affect viral properties. The approach described herein allows rapid generation of new, fully functional Zika virus isolates directly from deep sequencing data from virus-infected tissues without the need for prior virus passaging and for the generation and propagation of full-length cDNA clones. 
The approach should be applicable to other medically important flaviviruses and perhaps other positive-strand RNA viruses. PMID:28529976
De Novo Generation and Characterization of New Zika Virus Isolate Using Sequence Data from a Microcephaly Case.

PubMed

Setoh, Yin Xiang; Prow, Natalie A; Peng, Nias; Hugo, Leon E; Devine, Gregor; Hazlewood, Jessamine E; Suhrbier, Andreas; Khromykh, Alexander A

2017-01-01

Zika virus (ZIKV) has recently emerged and is the etiological agent of congenital Zika syndrome (CZS), a spectrum of congenital abnormalities arising from neural tissue infections in utero. Herein, we describe the de novo generation of a new ZIKV isolate, ZIKVNatal, using a modified circular polymerase extension reaction protocol and sequence data obtained from a ZIKV-infected fetus with microcephaly. ZIKVNatal thus has no laboratory passage history and is unequivocally associated with CZS. ZIKVNatal could be used to establish a fetal brain infection model in IFNAR−/− mice (including intrauterine growth restriction) without causing symptomatic infections in dams. ZIKVNatal was also able to be transmitted by Aedes aegypti mosquitoes. ZIKVNatal thus retains key aspects of circulating pathogenic ZIKVs and illustrates a novel methodology for obtaining an authentic functional viral isolate by using data from deep sequencing of infected tissues. IMPORTANCE The major complications of an ongoing Zika virus outbreak in the Americas and Asia are congenital defects caused by the virus's ability to cross the placenta and infect the fetal brain. The ability to generate molecular tools to analyze viral isolates from the current outbreak is essential for furthering our understanding of how these viruses cause congenital defects. The majority of existing viral isolates and infectious cDNA clones generated from them have undergone various numbers of passages in cell culture and/or suckling mice, which is likely to result in the accumulation of adaptive mutations that may affect viral properties. The approach described herein allows rapid generation of new, fully functional Zika virus isolates directly from deep sequencing data from virus-infected tissues without the need for prior virus passaging and for the generation and propagation of full-length cDNA clones. The approach should be applicable to other medically important flaviviruses and perhaps other positive-strand RNA viruses.
Whole genome sequencing distinguishes between relapse and reinfection in recurrent leprosy cases

PubMed Central

Bührer-Sékula, Samira; Benjak, Andrej; Loiseau, Chloé; Singh, Pushpendra; Pontes, Maria A. A.; Gonçalves, Heitor S.; Hungria, Emerith M.; Busso, Philippe; Piton, Jérémie; Silveira, Maria I. S.; Cruz, Rossilene; Schetinni, Antônio; Costa, Maurício B.; Virmond, Marcos C. L.; Diorio, Suzana M.; Dias-Baptista, Ida M. F.; Rosa, Patricia S.; Matsuoka, Masanori; Penna, Maria L. F.; Cole, Stewart T.; Penna, Gerson O.

2017-01-01

Background Since leprosy is both treated and controlled by multidrug therapy (MDT) it is important to monitor recurrent cases for drug resistance and to distinguish between relapse and reinfection as a means of assessing therapeutic efficacy. All three objectives can be reached with single nucleotide resolution using next generation sequencing and bioinformatics analysis of Mycobacterium leprae DNA present in human skin. Methodology DNA was isolated by means of optimized extraction and enrichment methods from samples from three recurrent cases in leprosy patients participating in an open-label, randomized, controlled clinical trial of uniform MDT in Brazil (U-MDT/CT-BR). Genome-wide sequencing of M. leprae was performed and the resultant sequence assemblies analyzed in silico. Principal findings In all three cases, no mutations responsible for resistance to rifampicin, dapsone and ofloxacin were found, thus eliminating drug resistance as a possible cause of disease recurrence. However, sequence differences were detected between the strains from the first and second disease episodes in all three patients. In one case, clear evidence was obtained for reinfection with an unrelated strain whereas in the other two cases, relapse appeared more probable. Conclusions/Significance This is the first report of using M. leprae whole genome sequencing to reveal that treated and cured leprosy patients who remain in endemic areas can be reinfected by another strain. Next generation sequencing can be applied reliably to M. leprae DNA extracted from biopsies to discriminate between cases of relapse and reinfection, thereby providing a powerful tool for evaluating different outcomes of therapeutic regimens and for following disease transmission. PMID:28617800
Multi-targeted priming for genome-wide gene expression assays.

PubMed

Adomas, Aleksandra B; Lopez-Giraldez, Francesc; Clark, Travis A; Wang, Zheng; Townsend, Jeffrey P

2010-08-17

Complementary approaches to assaying global gene expression are needed to assess gene expression in regions that are poorly assayed by current methodologies. A key component of nearly all gene expression assays is the reverse transcription of transcribed sequences that has traditionally been performed by priming the poly-A tails on many of the transcribed genes in eukaryotes with oligo-dT, or by priming RNA indiscriminately with random hexamers. We designed an algorithm to find common sequence motifs that were present within most protein-coding genes of Saccharomyces cerevisiae and of Neurospora crassa, but that were not present within their ribosomal RNA or transfer RNA genes. We then experimentally tested whether degenerately priming these motifs with multi-targeted primers improved the accuracy and completeness of transcriptomic assays. We discovered two multi-targeted primers that would prime a preponderance of genes in the genomes of Saccharomyces cerevisiae and Neurospora crassa while avoiding priming ribosomal RNA or transfer RNA. Examining the response of Saccharomyces cerevisiae to nitrogen deficiency and profiling Neurospora crassa early sexual development, we demonstrated that using multi-targeted primers in reverse transcription led to superior performance of microarray profiling and next-generation RNA tag sequencing. Priming with multi-targeted primers in addition to oligo-dT resulted in higher sensitivity, a larger number of well-measured genes and greater power to detect differences in gene expression. Our results provide the most complete and detailed expression profiles of the yeast nitrogen starvation response and N. crassa early sexual development to date. Furthermore, our multi-targeting priming methodology for genome-wide gene expression assays provides selective targeting of multiple sequences and counter-selection against undesirable sequences, facilitating a more complete and precise assay of the transcribed sequences within the genome.
The Present and Future of Whole Genome Sequencing (WGS) and Whole Metagenome Sequencing (WMS) for Surveillance of Antimicrobial Resistant Microorganisms and Antimicrobial Resistance Genes across the Food Chain

PubMed Central

Oniciuc, Elena A.; Likotrafiti, Eleni; Alvarez-Molina, Adrián; Alvarez-Ordóñez, Avelino

2018-01-01

Antimicrobial resistance (AMR) surveillance is a critical step within risk assessment schemes, as it is the basis for informing global strategies, monitoring the effectiveness of public health interventions, and detecting new trends and emerging threats linked to food. Surveillance of AMR is currently based on the isolation of indicator microorganisms and the phenotypic characterization of clinical, environmental and food strains isolated. However, this approach provides very limited information on the mechanisms driving AMR or on the presence or spread of AMR genes throughout the food chain. Whole-genome sequencing (WGS) of bacterial pathogens has shown potential for epidemiological surveillance, outbreak detection, and infection control. In addition, whole metagenome sequencing (WMS) allows for the culture-independent analysis of complex microbial communities, providing useful information on AMR genes occurrence. Both technologies can assist the tracking of AMR genes and mobile genetic elements, providing the necessary information for the implementation of quantitative risk assessments and allowing for the identification of hotspots and routes of transmission of AMR across the food chain. This review article summarizes the information currently available on the use of WGS and WMS for surveillance of AMR in foodborne pathogenic bacteria and food-related samples and discusses future needs that will have to be considered for the routine implementation of these next-generation sequencing methodologies with this aim. In particular, methodological constraints that impede the use at a global scale of these high-throughput sequencing (HTS) technologies are identified, and the standardization of methods and protocols is suggested as a measure to upgrade HTS-based AMR surveillance schemes. PMID:29789467
Fast and Efficient Drosophila melanogaster Gene Knock-Ins Using MiMIC Transposons

PubMed Central

Vilain, Sven; Vanhauwaert, Roeland; Maes, Ine; Schoovaerts, Nils; Zhou, Lujia; Soukup, Sandra; da Cunha, Raquel; Lauwers, Elsa; Fiers, Mark; Verstreken, Patrik

2014-01-01

Modern molecular genetics studies necessitate the manipulation of genes in their endogenous locus, but most of the current methodologies require an inefficient donor-dependent homologous recombination step to locally modify the genome. Here we describe a methodology to efficiently generate Drosophila knock-in alleles by capitalizing on the availability of numerous genomic MiMIC transposon insertions carrying recombinogenic attP sites. Our methodology entails the efficient PhiC31-mediated integration of a recombination cassette flanked by unique I-SceI and/or I-CreI restriction enzyme sites into an attP-site. These restriction enzyme sites allow for double-strand break−mediated removal of unwanted flanking transposon sequences, while leaving the desired genomic modifications or recombination cassettes. As a proof-of-principle, we mutated LRRK, tau, and sky by using different MiMIC elements. We replaced 6 kb of genomic DNA encompassing the tau locus and 35 kb encompassing the sky locus with a recombination cassette that permits easy integration of DNA at these loci and we also generated a functional LRRKHA knock in allele. Given that ~92% of the Drosophila genes are located within the vicinity (<35 kb) of a MiMIC element, our methodology enables the efficient manipulation of nearly every locus in the fruit fly genome without the need for inefficient donor-dependent homologous recombination events. PMID:25298537
Fast and efficient Drosophila melanogaster gene knock-ins using MiMIC transposons.

PubMed

Vilain, Sven; Vanhauwaert, Roeland; Maes, Ine; Schoovaerts, Nils; Zhou, Lujia; Soukup, Sandra; da Cunha, Raquel; Lauwers, Elsa; Fiers, Mark; Verstreken, Patrik

2014-10-08

Modern molecular genetics studies necessitate the manipulation of genes in their endogenous locus, but most of the current methodologies require an inefficient donor-dependent homologous recombination step to locally modify the genome. Here we describe a methodology to efficiently generate Drosophila knock-in alleles by capitalizing on the availability of numerous genomic MiMIC transposon insertions carrying recombinogenic attP sites. Our methodology entails the efficient PhiC31-mediated integration of a recombination cassette flanked by unique I-SceI and/or I-CreI restriction enzyme sites into an attP-site. These restriction enzyme sites allow for double-strand break-mediated removal of unwanted flanking transposon sequences, while leaving the desired genomic modifications or recombination cassettes. As a proof-of-principle, we mutated LRRK, tau, and sky by using different MiMIC elements. We replaced 6 kb of genomic DNA encompassing the tau locus and 35 kb encompassing the sky locus with a recombination cassette that permits easy integration of DNA at these loci and we also generated a functional LRRK(HA) knock in allele. Given that ~92% of the Drosophila genes are located within the vicinity (<35 kb) of a MiMIC element, our methodology enables the efficient manipulation of nearly every locus in the fruit fly genome without the need for inefficient donor-dependent homologous recombination events. Copyright © 2014 Vilain et al.
Design of a High Density SNP Genotyping Assay in the Pig Using SNPs Identified and Characterized by Next Generation Sequencing Technology

PubMed Central

Ramos, Antonio M.; Crooijmans, Richard P. M. A.; Affara, Nabeel A.; Amaral, Andreia J.; Archibald, Alan L.; Beever, Jonathan E.; Bendixen, Christian; Churcher, Carol; Clark, Richard; Dehais, Patrick; Hansen, Mark S.; Hedegaard, Jakob; Hu, Zhi-Liang; Kerstens, Hindrik H.; Law, Andy S.; Megens, Hendrik-Jan; Milan, Denis; Nonneman, Danny J.; Rohrer, Gary A.; Rothschild, Max F.; Smith, Tim P. L.; Schnabel, Robert D.; Van Tassell, Curt P.; Taylor, Jeremy F.; Wiedmann, Ralph T.; Schook, Lawrence B.; Groenen, Martien A. M.

2009-01-01

Background The dissection of complex traits of economic importance to the pig industry requires the availability of a significant number of genetic markers, such as single nucleotide polymorphisms (SNPs). This study was conducted to discover several hundreds of thousands of porcine SNPs using next generation sequencing technologies and use these SNPs, as well as others from different public sources, to design a high-density SNP genotyping assay. Methodology/Principal Findings A total of 19 reduced representation libraries derived from four swine breeds (Duroc, Landrace, Large White, Pietrain) and a Wild Boar population and three restriction enzymes (AluI, HaeIII and MspI) were sequenced using Illumina's Genome Analyzer (GA). The SNP discovery effort resulted in the de novo identification of over 372K SNPs. More than 549K SNPs were used to design the Illumina Porcine 60K+SNP iSelect Beadchip, now commercially available as the PorcineSNP60. A total of 64,232 SNPs were included on the Beadchip. Results from genotyping the 158 individuals used for sequencing showed a high overall SNP call rate (97.5%). Of the 62,621 loci that could be reliably scored, 58,994 were polymorphic yielding a SNP conversion success rate of 94%. The average minor allele frequency (MAF) for all scorable SNPs was 0.274. Conclusions/Significance Overall, the results of this study indicate the utility of using next generation sequencing technologies to identify large numbers of reliable SNPs. In addition, the validation of the PorcineSNP60 Beadchip demonstrated that the assay is an excellent tool that will likely be used in a variety of future studies in pigs. PMID:19654876
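The conversion success rate quoted above follows directly from the two locus counts in the abstract. As a quick check, assuming the rate is simply polymorphic loci divided by reliably scored loci (the helper name is hypothetical):

```python
def conversion_rate(polymorphic, scorable):
    """Percentage of reliably scored loci that turned out to be polymorphic."""
    return 100.0 * polymorphic / scorable


# Counts taken from the abstract: 58,994 polymorphic out of 62,621 scorable loci.
rate = conversion_rate(58_994, 62_621)
print(f"SNP conversion success rate: {rate:.1f}%")
```

The result rounds to the 94% reported in the abstract.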
Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

PubMed Central

Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

2013-01-01

Background Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only a few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) unigenes showed similarity to the sequences in the NCBI database. Out of 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the database of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 primer pairs randomly chosen, 81 markers have amplicons and 20 are polymorphic for genotype analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799
Human papillomavirus genotyping by Linear Array and Next-Generation Sequencing in cervical samples from Western Mexico.

PubMed

Flores-Miramontes, María Guadalupe; Torres-Reyes, Luis Alberto; Alvarado-Ruíz, Liliana; Romero-Martínez, Salvador Angel; Ramírez-Rodríguez, Verenice; Balderas-Peña, Luz María Adriana; Vallejo-Ruíz, Verónica; Piña-Sánchez, Patricia; Cortés-Gutiérrez, Elva Irene; Jave-Suárez, Luis Felipe; Aguilar-Lemarroy, Adriana

2015-10-06

The Linear Array® (LA) genotyping test is one of the most used methodologies for Human papillomavirus (HPV) genotyping, in that it is able to detect 37 HPV genotypes and co-infections in the same sample. However, the assay is limited to a restricted number of HPV genotypes, and sequence variations in the detection region of the HPV probes could give false-negative results. Recently, 454 Next-Generation sequencing (NGS) technology has also been used efficiently for HPV genotyping; this methodology is based on massive sequencing of HPV fragments and is expected to be highly specific and sensitive. In this work, we studied HPV prevalence in cervixes of women in Western Mexico by LA and confirmed the genotypes found by NGS. Two hundred thirty-three cervical samples from women without cervical lesions (WCL, n = 48), with cervical intraepithelial neoplasia grade 1 (CIN I, n = 98), or with cervical cancer (CC, n = 87) were collected, DNA was extracted, and HPV positivity was determined by PCR amplification using PGMY09/11 primers. All HPV-positive samples were genotyped individually by LA. Additionally, pools of amplicons from the PGMY-PCR products were sequenced using 454 NGS technology. Results obtained by NGS were compared with those of LA for each group of samples. We identified 35 HPV genotypes, among which 30 were identified by both technologies; in addition, the HPV genotypes 32, 44, 74, 102 and 114 were detected by NGS. These latter genotypes, to our knowledge, have not been previously reported in the Mexican population. Furthermore, we found that LA did not detect, in some diagnosis groups, certain HPV genotypes included in the test, such as 6, 11, 16, 26, 35, 51, 58, 68, 73, and 89, which indicates possible variations at the species level. There are HPV genotypes in the Mexican population that cannot be detected by LA, which is, at present, the most complete commercial genotyping test. More studies are necessary to determine the impact of HPV-44, 74, 102 and 114 on the risk of developing CC. A greater number of samples must be analyzed by NGS for the most accurate determination of Mexican HPV variants.
The dynamics of genome replication using deep sequencing

PubMed Central

Müller, Carolin A.; Hawkins, Michelle; Retkute, Renata; Malla, Sunir; Wilson, Ray; Blythe, Martin J.; Nakato, Ryuichiro; Komata, Makiko; Shirahige, Katsuhiko; de Moura, Alessandro P.S.; Nieduszynski, Conrad A.

2014-01-01

Eukaryotic genomes are replicated from multiple DNA replication origins. We present complementary deep sequencing approaches to measure origin location and activity in Saccharomyces cerevisiae. Measuring the increase in DNA copy number during a synchronous S-phase allowed the precise determination of genome replication. To map origin locations, replication forks were stalled close to their initiation sites; therefore, copy number enrichment was limited to origins. Replication timing profiles were generated from asynchronous cultures using fluorescence-activated cell sorting. Applying this technique we show that the replication profiles of haploid and diploid cells are indistinguishable, indicating that both cell types use the same cohort of origins with the same activities. Finally, increasing sequencing depth allowed the direct measure of replication dynamics from an exponentially growing culture. This is the first time this approach, called marker frequency analysis, has been successfully applied to a eukaryote. These data provide a high-resolution resource and methodological framework for studying genome biology. PMID:24089142
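The copy-number comparison described above can be illustrated with a small sketch. It assumes, beyond what the abstract states, that read counts per genomic window are available for a replicating and a non-replicating sample and that each sample is normalized by its total read count. The function name and inputs are hypothetical, and this is not the authors' pipeline.

```python
def relative_copy_number(replicating_counts, reference_counts):
    """Per-window ratio of normalized read depth (replicating / reference).

    Windows with no reference reads are reported as None because the
    ratio is undefined there. Higher ratios suggest earlier replication.
    """
    total_rep = sum(replicating_counts)
    total_ref = sum(reference_counts)
    if total_rep == 0 or total_ref == 0:
        raise ValueError("both samples need at least one mapped read")
    ratios = []
    for rep, ref in zip(replicating_counts, reference_counts):
        if ref == 0:
            ratios.append(None)
        else:
            ratios.append((rep / total_rep) / (ref / total_ref))
    return ratios


# Toy example: the first window has twice the raw depth in the replicating sample.
profile = relative_copy_number([200, 100, 100], [100, 100, 100])
```

In practice such ratios would be smoothed and rescaled before being read as a replication timing profile; those steps are left out here.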
Space debris detection in optical image sequences.

PubMed

Xi, Jiangbo; Wen, Desheng; Ersoy, Okan K; Yi, Hongwei; Yao, Dalei; Song, Zongxi; Xi, Shaobo

2016-10-01

We present a high-accuracy, low false-alarm rate, and low computational-cost methodology for removing stars and noise and detecting space debris with low signal-to-noise ratio (SNR) in optical image sequences. First, time-index filtering and bright star intensity enhancement are implemented to remove stars and noise effectively. Then, a multistage quasi-hypothesis-testing method is proposed to detect the pieces of space debris with continuous and discontinuous trajectories. For this purpose, a time-index image is defined and generated. Experimental results show that the proposed method can detect space debris effectively without any false alarms. When the SNR is higher than or equal to 1.5, the detection probability can reach 100%, and when the SNR is as low as 1.3, 1.2, and 1, it can still achieve 99%, 97%, and 85% detection probabilities, respectively. Additionally, two large sets of image sequences are tested to show that the proposed method performs stably and effectively.
Microsatellite DNA capture from enriched libraries.

PubMed

Gonzalez, Elena G; Zardoya, Rafael

2013-01-01

Microsatellites are DNA sequences of tandem repeats of one to six nucleotides, which are highly polymorphic, and thus the molecular markers of choice in many kinship, population genetic, and conservation studies. There have been significant technical improvements since the early methods for microsatellite isolation were developed, and today the most common procedures take advantage of the hybrid capture methods of enriched-targeted microsatellite DNA. Furthermore, recent advances in sequencing technologies (i.e., next-generation sequencing, NGS) have fostered the mining of microsatellite markers in non-model organisms, affording a cost-effective way of obtaining a large amount of sequence data potentially useful for loci characterization. The rapid improvements of NGS platforms together with the increase in available microsatellite information open new avenues to the understanding of the evolutionary forces that shape genetic structuring in wild populations. Here, we provide detailed methodological procedures for microsatellite isolation based on the screening of GT microsatellite-enriched libraries, either by cloning and Sanger sequencing of positive clones or by direct NGS. Guides for designing new species-specific primers and basic genotyping are also given.
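As a rough illustration of what screening reads for GT microsatellites can look like, the sketch below scans a sequence for uninterrupted GT tandem repeats with a regular expression. The minimum repeat count of 5, the function name `find_gt_repeats`, and the example read are assumptions chosen for illustration; the record does not specify a threshold or tool.

```python
import re


def find_gt_repeats(sequence, min_repeats=5):
    """Return (start, end, repeat_count) for each uninterrupted GT run.

    Coordinates are 0-based and end-exclusive. Runs shorter than
    min_repeats are ignored.
    """
    pattern = re.compile(r"(?:GT){%d,}" % min_repeats)
    hits = []
    for match in pattern.finditer(sequence.upper()):
        length = match.end() - match.start()
        hits.append((match.start(), match.end(), length // 2))
    return hits


# Hypothetical read: one run of 8 GT units and one run of only 3.
read = "ACCTA" + "GT" * 8 + "CCATG" + "GT" * 3 + "A"
hits = find_gt_repeats(read)
```

Only the longer run passes the threshold here; flanking sequence on either side of a hit is what primer design would then target.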

Genomic Repeat Abundances Contain Phylogenetic Signal

PubMed Central

Dodsworth, Steven; Chase, Mark W.; Kelly, Laura J.; Leitch, Ilia J.; Macas, Jiří; Novák, Petr; Piednoël, Mathieu; Weiss-Schneeweiss, Hanna; Leitch, Andrew R.

2015-01-01

A large proportion of genomic information, particularly repetitive elements, is usually ignored when researchers are using next-generation sequencing. Here we demonstrate the usefulness of this repetitive fraction in phylogenetic analyses, utilizing comparative graph-based clustering of next-generation sequence reads, which results in abundance estimates of different classes of genomic repeats. Phylogenetic trees are then inferred based on the genome-wide abundance of different repeat types treated as continuously varying characters; such repeats are scattered across chromosomes and in angiosperms can constitute a majority of nuclear genomic DNA. In six diverse examples, five angiosperms and one insect, this method provides generally well-supported relationships at interspecific and intergeneric levels that agree with results from more standard phylogenetic analyses of commonly used markers. We propose that this methodology may prove especially useful in groups where there is little genetic differentiation in standard phylogenetic markers. At the same time as providing data for phylogenetic inference, this method additionally yields a wealth of data for comparative studies of genome evolution. PMID:25261464
Specific identification of Bacillus anthracis strains

NASA Astrophysics Data System (ADS)

Krishnamurthy, Thaiya; Deshpande, Samir; Hewel, Johannes; Liu, Hongbin; Wick, Charles H.; Yates, John R., III

2007-01-01

Accurate identification of human pathogens is the initial vital step in treating civilian terrorism victims and military personnel afflicted in biological threat situations. We have applied a powerful multi-dimensional protein identification technology (MudPIT) along with newly generated software termed Profiler to identify the sequences of specific proteins observed for a few strains of Bacillus anthracis, a human pathogen. Profiler was created to initially screen the MudPIT data of B. anthracis strains and establish the observed proteins specific for its strains. A database was also generated using Profiler containing marker proteins of B. anthracis and its strains, which in turn could be used for detecting the organism and its corresponding strains in samples. Analysis of the unknowns by our methodology, combining MudPIT and Profiler, led to the accurate identification of the anthracis strains present in samples. Thus, a new approach for the identification of B. anthracis strains in unknown samples, based on the molecular mass and sequences of marker proteins, has been ascertained.
Species classifier choice is a key consideration when analysing low-complexity food microbiome data.

PubMed

Walsh, Aaron M; Crispie, Fiona; O'Sullivan, Orla; Finnegan, Laura; Claesson, Marcus J; Cotter, Paul D

2018-03-20

The use of shotgun metagenomics to analyse low-complexity microbial communities in foods has the potential to be of considerable fundamental and applied value. However, there is currently no consensus with respect to choice of species classification tool, platform, or sequencing depth. Here, we benchmarked the performances of three high-throughput short-read sequencing platforms, the Illumina MiSeq, NextSeq 500, and Ion Proton, for shotgun metagenomics of food microbiota. Briefly, we sequenced six kefir DNA samples and a mock community DNA sample, the latter constructed by evenly mixing genomic DNA from 13 food-related bacterial species. A variety of bioinformatic tools were used to analyse the data generated, and the effects of sequencing depth on these analyses were tested by randomly subsampling reads. Compositional analysis results were consistent between the platforms at divergent sequencing depths. However, we observed pronounced differences in the predictions from species classification tools. Indeed, PERMANOVA indicated that there were no significant differences between the compositional results generated by the different sequencers (p = 0.693, R² = 0.011), but there was a significant difference between the results predicted by the species classifiers (p = 0.01, R² = 0.127). The relative abundances predicted by the classifiers, apart from MetaPhlAn2, were apparently biased by reference genome sizes. Additionally, we observed varying false-positive rates among the classifiers. MetaPhlAn2 had the lowest false-positive rate, whereas SLIMM had the greatest false-positive rate. Strain-level analysis results were also similar across platforms. Each platform correctly identified the strains present in the mock community, but accuracy was improved slightly with greater sequencing depth. Notably, PanPhlAn detected the dominant strains in each kefir sample above 500,000 reads per sample.
Again, the outputs from functional profiling analysis using SUPER-FOCUS were generally accordant between the platforms at different sequencing depths. Finally, and expectedly, metagenome assembly completeness was significantly lower on the MiSeq than either on the NextSeq (p = 0.03) or the Proton (p = 0.011), and it improved with increased sequencing depth. Our results demonstrate a remarkable similarity in the results generated by the three sequencing platforms at different sequencing depths, and, in fact, the choice of bioinformatics methodology had a more evident impact on results than the choice of sequencer did.
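The depth experiment described in this record rests on randomly subsampling reads. A minimal sketch of that step is shown below, assuming reads are held in memory as a list and that sampling is without replacement with a fixed seed for reproducibility; the function name `subsample_reads` and the read identifiers are hypothetical, and the record does not state which tool the authors used.

```python
import random


def subsample_reads(reads, depth, seed=0):
    """Draw `depth` reads at random, without replacement.

    A fixed seed makes the subsample reproducible between runs.
    """
    if depth > len(reads):
        raise ValueError("requested depth exceeds the number of reads")
    rng = random.Random(seed)
    return rng.sample(reads, depth)


# Hypothetical read identifiers standing in for a sequencing run.
reads = [f"read_{i}" for i in range(1000)]
shallow = subsample_reads(reads, 100)
deeper = subsample_reads(reads, 500)
```

Running the same downstream analysis on `shallow` and `deeper` is one way to see how sensitive a result is to sequencing depth.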
Massively parallel pyrosequencing of the mitochondrial genome with the 454 methodology in forensic genetics.

PubMed

Mikkelsen, Martin; Frank-Hansen, Rune; Hansen, Anders J; Morling, Niels

2014-09-01

Results of sequencing of whole mitochondrial genome, HV1 and HV2 DNA with the second generation system (SGS) Roche 454 GS Junior were compared with results of Sanger sequencing and SNP typing with SNaPshot single base extension detected with MALDI-TOF and capillary electrophoresis. We investigated the performance of the software analysis of the data, reproducibility, ability to sequence homopolymeric regions, detection of mixtures and heteroplasmy as well as the implications of the depth of coverage. We found full reproducibility between samples sequenced twice with SGS. We found close to full concordance between the mtDNA sequences of 26 samples obtained with (1) the 454 SGS method using a depth of coverage above 100 and (2) Sanger sequencing and SNP typing. The discrepancies were primarily observed in homopolymeric regions. The 454 SGS method was able to sequence 95% of the reads correctly in homopolymers up to 4 bases, and up to 6 bases could be sequenced with similar success if the results were carefully, visually inspected. The 454 technology was able to detect mixtures or heteroplasmy of approximately 10%. We detected previously unreported heteroplasmy in the GM9947A component of the NIST human mitochondrial DNA SRM-2392 standard reference material. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Evaluation of next generation sequencing for the analysis of Eimeria communities in wildlife.

PubMed

Vermeulen, Elke T; Lott, Matthew J; Eldridge, Mark D B; Power, Michelle L

2016-05-01

Next-generation sequencing (NGS) techniques are well-established for studying bacterial communities but not yet for microbial eukaryotes. Parasite communities remain poorly studied, due in part to the lack of reliable and accessible molecular methods to analyse eukaryotic communities. We aimed to develop and evaluate a methodology to analyse communities of the protozoan parasite Eimeria from populations of the Australian marsupial Petrogale penicillata (brush-tailed rock-wallaby) using NGS. An oocyst purification method for small sample sizes and polymerase chain reaction (PCR) protocol for the 18S rRNA locus targeting Eimeria was developed and optimised prior to sequencing on the Illumina MiSeq platform. A data analysis approach was developed by modifying methods from bacterial metagenomics and utilising existing Eimeria sequences in GenBank. Operational taxonomic unit (OTU) assignment at a high similarity threshold (97%) was more accurate at assigning Eimeria contigs into Eimeria OTUs but at a lower threshold (95%) there was greater resolution between OTU consensus sequences. The assessment of two amplification PCR methods prior to Illumina MiSeq, single and nested PCR, determined that single PCR was more sensitive to Eimeria as more Eimeria OTUs were detected in single amplicons. We have developed a simple and cost-effective approach to a data analysis pipeline for community analysis of eukaryotic organisms using Eimeria communities as a model. The pipeline provides a basis for evaluation using other eukaryotic organisms and potential for diverse community analysis studies. Copyright © 2016 Elsevier B.V. All rights reserved.
Long-read whole genome sequencing and comparative analysis of six strains of the human pathogen Orientia tsutsugamushi.

PubMed

Batty, Elizabeth M; Chaemchuen, Suwittra; Blacksell, Stuart; Richards, Allen L; Paris, Daniel; Bowden, Rory; Chan, Caroline; Lachumanan, Ramkumar; Day, Nicholas; Donnelly, Peter; Chen, Swaine; Salje, Jeanne

2018-06-01

Orientia tsutsugamushi is a clinically important but neglected obligate intracellular bacterial pathogen of the Rickettsiaceae family that causes the potentially life-threatening human disease scrub typhus. In contrast to the genome reduction seen in many obligate intracellular bacteria, early genetic studies of Orientia have revealed one of the most repetitive bacterial genomes sequenced to date. The dramatic expansion of mobile elements has hampered efforts to generate complete genome sequences using short read sequencing methodologies, and consequently there have been few studies of the comparative genomics of this neglected species. We report new high-quality genomes of O. tsutsugamushi, generated using PacBio single molecule long read sequencing, for six strains: Karp, Kato, Gilliam, TA686, UT76 and UT176. In comparative genomics analyses of these strains together with existing reference genomes from Ikeda and Boryong strains, we identify a relatively small core genome of 657 genes, grouped into core gene islands and separated by repeat regions, and use the core genes to infer the first whole-genome phylogeny of Orientia. Complete assemblies of multiple Orientia genomes verify initial suggestions that these are remarkable organisms. They have larger genomes compared with most other Rickettsiaceae, with widespread amplification of repeat elements and massive chromosomal rearrangements between strains. At the gene level, Orientia has a relatively small set of universally conserved genes, similar to other obligate intracellular bacteria, and the relative expansion in genome size can be accounted for by gene duplication and repeat amplification. Our study demonstrates the utility of long read sequencing to investigate complex bacterial genomes and characterise genomic variation.
SSPACE-LongRead: scaffolding bacterial draft genomes using long read sequence information

PubMed Central

2014-01-01

Background The recent introduction of the Pacific Biosciences RS single molecule sequencing technology has opened new doors to scaffolding genome assemblies in a cost-effective manner. The long read sequence information promises to enhance the quality of incomplete and inaccurate draft assemblies constructed from Next Generation Sequencing (NGS) data. Results Here we propose a novel hybrid assembly methodology that aims to scaffold pre-assembled contigs in an iterative manner using PacBio RS long read information as a backbone. On a test set comprising six bacterial draft genomes, assembled using either a single Illumina MiSeq or Roche 454 library, we show that even a 50× coverage of uncorrected PacBio RS long reads is sufficient to drastically reduce the number of contigs. Comparisons to the AHA scaffolder indicate our strategy is better capable of producing (nearly) complete bacterial genomes. Conclusions The current work describes our SSPACE-LongRead software which is designed to upgrade incomplete draft genomes using single molecule sequences. We conclude that the recent advances of the PacBio sequencing technology and chemistry, in combination with the limited computational resources required to run our program, allow genomes to be scaffolded in a fast and reliable manner. PMID:24950923
2b-RAD genotyping for population genomic studies of Chagas disease vectors: Rhodnius ecuadoriensis in Ecuador

PubMed Central

Villacís, Anita G.; Andersson, Björn; Costales, Jaime A.; De Noia, Michele; Ocaña-Mayorga, Sofía; Yumiseva, Cesar A.; Grijalva, Mario J.; Llewellyn, Martin S.

2017-01-01

Background Rhodnius ecuadoriensis is the main triatomine vector of Chagas disease, American trypanosomiasis, in Southern Ecuador and Northern Peru. Genomic approaches and next generation sequencing technologies have become powerful tools for investigating population diversity and structure, which is a key consideration for vector control. Here we assess the effectiveness of three different 2b restriction site-associated DNA (2b-RAD) genotyping strategies in R. ecuadoriensis to provide sufficient genomic resolution to tease apart microevolutionary processes and undertake some pilot population genomic analyses. Methodology/Principal findings The 2b-RAD protocol was carried out in-house at a non-specialized laboratory using 20 R. ecuadoriensis adults collected from the central coast and southern Andean region of Ecuador, from June 2006 to July 2013. 2b-RAD sequencing was performed on an Illumina MiSeq instrument and the data were analyzed with the STACKS de novo pipeline for loci assembly and Single Nucleotide Polymorphism (SNP) discovery. Preliminary population genomic analyses (global AMOVA and Bayesian clustering) were implemented. Our results showed that the 2b-RAD genotyping protocol is effective for R. ecuadoriensis and likely for other triatomine species. However, only BcgI and CspCI restriction enzymes provided a number of markers suitable for population genomic analysis at the read depth we generated. Our preliminary genomic analyses detected a signal of genetic structuring across the study area. Conclusions/Significance Our findings suggest that 2b-RAD genotyping is both a cost effective and methodologically simple approach for generating high resolution genomic data for Chagas disease vectors with the power to distinguish between different vector populations at epidemiologically relevant scales. As such, 2b-RAD represents a powerful tool in the hands of medical entomologists with limited access to specialized molecular biological equipment. PMID:28723901
Automation of route identification and optimisation based on data-mining and chemical intuition.

PubMed

Lapkin, A A; Heer, P K; Jacob, P-M; Hutchby, M; Cunningham, W; Bull, S D; Davidson, M G

2017-09-21

Data-mining of Reaxys and network analysis of the combined literature and in-house reactions set were used to generate multiple possible reaction routes to convert a bio-waste feedstock, limonene, into a pharmaceutical API, paracetamol. The network analysis of data provides a rich knowledge-base for generation of the initial reaction screening and development programme. Based on the literature and the in-house data, an overall flowsheet for the conversion of limonene to paracetamol was proposed. Each individual reaction-separation step in the sequence was simulated as a combination of the continuous flow and batch steps. The linear model generation methodology allowed us to identify the reaction steps requiring further chemical optimisation. The generated model can be used for global optimisation and generation of environmental and other performance indicators, such as cost indicators. However, the identified further challenge is to automate model generation to evolve optimal multi-step chemical routes and optimal process configurations.
HASP server: a database and structural visualization platform for comparative models of influenza A hemagglutinin proteins.

PubMed

Ambroggio, Xavier I; Dommer, Jennifer; Gopalan, Vivek; Dunham, Eleca J; Taubenberger, Jeffery K; Hurt, Darrell E

2013-06-18

Influenza A viruses possess RNA genomes that mutate frequently in response to immune pressures. The mutations in the hemagglutinin genes are particularly significant, as the hemagglutinin proteins mediate attachment and fusion to host cells, thereby influencing viral pathogenicity and species specificity. Large-scale influenza A genome sequencing efforts have been ongoing to understand past epidemics and pandemics and anticipate future outbreaks. Sequencing efforts thus far have generated nearly 9,000 distinct hemagglutinin amino acid sequences. Comparative models for all publicly available influenza A hemagglutinin protein sequences (8,769 to date) were generated using the Rosetta modeling suite. The C-alpha root mean square deviations between a randomly chosen test set of models and their crystallographic templates were less than 2 Å, suggesting that the modeling protocols yielded high-quality results. The models were compiled into an online resource, the Hemagglutinin Structure Prediction (HASP) server. The HASP server was designed as a scientific tool for researchers to visualize hemagglutinin protein sequences of interest in a three-dimensional context. With a built-in molecular viewer, hemagglutinin models can be compared side-by-side and navigated by a corresponding sequence alignment. The models and alignments can be downloaded for offline use and further analysis. The modeling protocols used in the HASP server scale well for large amounts of sequences and will keep pace with expanded sequencing efforts. The conservative approach to modeling and the intuitive search and visualization interfaces allow researchers to quickly analyze hemagglutinin sequences of interest in the context of the most highly related experimental structures, and allow them to directly compare hemagglutinin sequences to each other simultaneously in their two- and three-dimensional contexts. 
The models and methodology have shown utility in current research efforts and the ongoing aim of the HASP server is to continue to accelerate influenza A research and have a positive impact on global public health.
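The C-alpha root mean square deviation quoted in this record is a simple quantity to compute once two structures are paired atom by atom. The sketch below assumes the model and template have already been superimposed and that their C-alpha atoms are listed in the same order; it performs no optimal alignment, and the function name `ca_rmsd` and the coordinates are illustrative only.

```python
import math


def ca_rmsd(model, template):
    """RMSD between two equally long lists of (x, y, z) coordinates.

    Assumes the structures are already superimposed; no rotation or
    translation is fitted here.
    """
    if not model or len(model) != len(template):
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, template)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


# Hypothetical coordinates in angstroms: every atom is offset by 1 Å in z.
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
template = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
rmsd = ca_rmsd(model, template)
```

With every atom displaced by the same 1 Å, the RMSD is 1 Å, which would sit below the 2 Å figure mentioned in the abstract.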
The Porcelain Crab Transcriptome and PCAD, the Porcelain Crab Microarray and Sequence Database

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tagmount, Abderrahmane; Wang, Mei; Lindquist, Erika

2010-01-27

Background: With the emergence of a completed genome sequence of the freshwater crustacean Daphnia pulex, construction of genomic-scale sequence databases for additional crustacean sequences is important for comparative genomics and annotation. Porcelain crabs, genus Petrolisthes, have been powerful crustacean models for environmental and evolutionary physiology with respect to thermal adaptation and understanding responses of marine organisms to climate change. Here, we present a large-scale EST sequencing and cDNA microarray database project for the porcelain crab Petrolisthes cinctipes. Methodology/Principal Findings: A set of ~30K unique sequences (UniSeqs) representing ~19K clusters were generated from ~98K high quality ESTs from a set of tissue specific non-normalized and mixed-tissue normalized cDNA libraries from the porcelain crab Petrolisthes cinctipes. Homology for each UniSeq was assessed using BLAST, InterProScan, GO and KEGG database searches. Approximately 66 percent of the UniSeqs had homology in at least one of the databases. All EST and UniSeq sequences along with annotation results and coordinated cDNA microarray datasets have been made publicly accessible at the Porcelain Crab Array Database (PCAD), a feature-enriched version of the Stanford and Longhorn Array Databases. Conclusions/Significance: The EST project presented here represents the third largest sequencing effort for any crustacean, and the largest effort for any crab species. Our assembly and clustering results suggest that our porcelain crab EST data set is equally diverse to the much larger EST set generated in the Daphnia pulex genome sequencing project, and thus will be an important resource to the Daphnia research community. Our homology results support the pancrustacea hypothesis and suggest that Malacostraca may be ancestral to Branchiopoda and Hexapoda.
Our results also suggest that our cDNA microarrays cover as much of the transcriptome as can reasonably be captured in EST library sequencing approaches, and thus represent a rich resource for studies of environmental genomics.
Omics Metadata Management Software (OMMS).

PubMed

Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo

2015-01-01

Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically-dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. The OMMS can be obtained at http://omms.sandia.gov.
De Novo Transcriptome of the Hemimetabolous German Cockroach (Blattella germanica)

PubMed Central

Zhou, Xiaojie; Qian, Kun; Tong, Ying; Zhu, Junwei Jerry; Qiu, Xinghui; Zeng, Xiaopeng

2014-01-01

Background The German cockroach, Blattella germanica, is an important insect pest that transmits various pathogens mechanically and causes severe allergic diseases. This insect has long served as a model system for studies of insect biology, physiology and ecology. However, the lack of genome or transcriptome information heavily hinders further understanding of the German cockroach at a molecular level and on a genome-wide scale. To explore the transcriptome and identify unique sequences of interest, we subjected the B. germanica transcriptome to massively parallel pyrosequencing and generated the first reference transcriptome for B. germanica. Methodology/Principal Findings A total of 1,365,609 raw reads with an average length of 529 bp were generated via pyrosequencing the mixed cDNA library from different life stages of German cockroach including maturing oothecae, nymphs, adult females and males. The raw reads were de novo assembled to 48,800 contigs and 3,961 singletons with high-quality unique sequences. These sequences were annotated and classified functionally in terms of BLAST, GO and KEGG, and the genes putatively coding detoxification enzyme systems, insecticide targets, key components in systematic RNA interference, immunity and chemoreception pathways were identified. A total of 3,601 SSR (Simple Sequence Repeat) loci were also predicted. Conclusions/Significance The whole transcriptome pyrosequencing data from this study provide a usable genetic resource for future identification of potential functional genes involved in various biological processes. PMID:25265537
Omics Metadata Management Software (OMMS)

PubMed Central

Perez-Arriaga, Martha O; Wilson, Susan; Williams, Kelly P; Schoeniger, Joseph; Waymire, Russel L; Powell, Amy Jo

2015-01-01

Next-generation sequencing projects have underappreciated information management tasks requiring detailed attention to specimen curation, nucleic acid sample preparation and sequence production methods required for downstream data processing, comparison, interpretation, sharing and reuse. The few existing metadata management tools for genome-based studies provide weak curatorial frameworks for experimentalists to store and manage idiosyncratic, project-specific information, typically offering no automation supporting unified naming and numbering conventions for sequencing production environments that routinely deal with hundreds, if not thousands, of samples at a time. Moreover, existing tools are not readily interfaced with bioinformatics executables (e.g., BLAST, Bowtie2, custom pipelines). Our application, the Omics Metadata Management Software (OMMS), answers both needs, empowering experimentalists to generate intuitive, consistent metadata, and perform analyses and information management tasks via an intuitive web-based interface. Several use cases with short-read sequence datasets are provided to validate installation and integrated function, and suggest possible methodological road maps for prospective users. Provided examples highlight possible OMMS workflows for metadata curation, multistep analyses, and results management and downloading. The OMMS can be implemented as a stand-alone package for individual laboratories, or can be configured for web-based deployment supporting geographically-dispersed projects. The OMMS was developed using an open-source software base, is flexible, extensible and easily installed and executed. Availability The OMMS can be obtained at http://omms.sandia.gov. PMID:26124554
IDP-ASE: haplotyping and quantifying allele-specific expression at the gene and gene isoform level by hybrid sequencing

PubMed Central

Deonovic, Benjamin; Wang, Yunhao; Weirather, Jason; Wang, Xiu-Jie; Au, Kin Fai

2017-01-01

Allele-specific expression (ASE) is a fundamental problem in studying gene regulation and diploid transcriptome profiles, with two key challenges: (i) haplotyping and (ii) estimation of ASE at the gene isoform level. Existing ASE analysis methods are limited by a dependence on haplotyping from laborious experiments or extra genome/family trio data. In addition, there is a lack of methods for gene isoform level ASE analysis. We developed a tool, IDP-ASE, for full ASE analysis. By innovative integration of Third Generation Sequencing (TGS) long reads with Second Generation Sequencing (SGS) short reads, the accuracy of haplotyping and ASE quantification at the gene and gene isoform level was greatly improved, as demonstrated by the gold standard GM12878 data and semi-simulation data. In addition to methodology development, applications of IDP-ASE to human embryonic stem cells and breast cancer cells indicate that the imbalance of ASE and non-uniformity of gene isoform ASE are widespread, including in tumorigenesis relevant genes and pluripotency markers. These results show that gene isoform expression and allele-specific expression cooperate to provide high diversity and complexity of gene regulation and expression, highlighting the importance of studying ASE at the gene isoform level. Our study provides a robust bioinformatics solution to understand ASE using RNA sequencing data only. PMID:27899656
Comparing memory-efficient genome assemblers on stand-alone and cloud infrastructures.

PubMed

Kleftogiannis, Dimitrios; Kalnis, Panos; Bajic, Vladimir B

2013-01-01

A fundamental problem in bioinformatics is genome assembly. Next-generation sequencing (NGS) technologies produce large volumes of fragmented genome reads, which require large amounts of memory to assemble the complete genome efficiently. With recent improvements in DNA sequencing technologies, it is expected that the memory footprint required for the assembly process will increase dramatically and will emerge as a limiting factor in processing widely available NGS-generated reads. In this report, we compare current memory-efficient techniques for genome assembly with respect to quality, memory consumption and execution time. Our experiments prove that it is possible to generate draft assemblies of reasonable quality on conventional multi-purpose computers with very limited available memory by choosing suitable assembly methods. Our study reveals the minimum memory requirements for different assembly programs even when data volume exceeds memory capacity by orders of magnitude. By combining existing methodologies, we propose two general assembly strategies that can improve short-read assembly approaches and result in reduction of the memory footprint. Finally, we discuss the possibility of utilizing cloud infrastructures for genome assembly and we comment on some findings regarding suitable computational resources for assembly.
Enhanced Modeling of First-Order Plant Equations of Motion for Aeroelastic and Aeroservoelastic Applications

NASA Technical Reports Server (NTRS)

Pototzky, Anthony S.

2010-01-01

A methodology is described for generating first-order plant equations of motion for aeroelastic and aeroservoelastic applications. The description begins with the process of generating data files representing specialized mode-shapes, such as rigid-body and control surface modes, using both PATRAN and NASTRAN analysis. NASTRAN executes the 146 solution sequence using numerous Direct Matrix Abstraction Program (DMAP) calls to import the mode-shape files and to perform the aeroelastic response analysis. The aeroelastic response analysis calculates and extracts structural frequencies, generalized masses, frequency-dependent generalized aerodynamic force (GAF) coefficients, sensor deflections and load coefficients data as text-formatted data files. The data files are then re-sequenced and re-formatted using a custom written FORTRAN program. The text-formatted data files are stored and coefficients for s-plane equations are fitted to the frequency-dependent GAF coefficients using two Interactions of Structures, Aerodynamics and Controls (ISAC) programs. With tabular files from stored data created by ISAC, MATLAB generates the first-order aeroservoelastic plant equations of motion. These equations include control-surface actuator, turbulence, sensor and load modeling. Altitude varying root-locus plot and PSD plot results for a model of the F-18 aircraft are presented to demonstrate the capability.
Methodological Reporting of Randomized Trials in Five Leading Chinese Nursing Journals

PubMed Central

Shi, Chunhu; Tian, Jinhui; Ren, Dan; Wei, Hongli; Zhang, Lihuan; Wang, Quan; Yang, Kehu

2014-01-01

Background Randomized controlled trials (RCTs) are not always well reported, especially in terms of their methodological descriptions. This study aimed to investigate the adherence of methodological reporting complying with CONSORT and explore associated trial level variables in the Chinese nursing care field. Methods In June 2012, we identified RCTs published in five leading Chinese nursing journals and included trials with details of randomized methods. The quality of methodological reporting was measured through the methods section of the CONSORT checklist and the overall CONSORT methodological items score was calculated and expressed as a percentage. Meanwhile, we hypothesized that some general and methodological characteristics were associated with reporting quality and conducted a regression with these data to explore the correlation. The descriptive and regression statistics were calculated via SPSS 13.0. Results In total, 680 RCTs were included. The overall CONSORT methodological items score was 6.34±0.97 (Mean ± SD). No RCT reported descriptions and changes in “trial design,” changes in “outcomes” and “implementation,” or descriptions of the similarity of interventions for “blinding.” Poor reporting was found in detailing the “settings of participants” (13.1%), “type of randomization sequence generation” (1.8%), calculation methods of “sample size” (0.4%), explanation of any interim analyses and stopping guidelines for “sample size” (0.3%), “allocation concealment mechanism” (0.3%), additional analyses in “statistical methods” (2.1%), and targeted subjects and methods of “blinding” (5.9%). More than 50% of trials described randomization sequence generation, the eligibility criteria of “participants,” “interventions,” and definitions of the “outcomes” and “statistical methods.” The regression analysis found that publication year and ITT analysis were weakly associated with CONSORT score. 
Conclusions The completeness of methodological reporting of RCTs in the Chinese nursing care field is poor, especially with regard to the reporting of trial design, changes in outcomes, sample size calculation, allocation concealment, blinding, and statistical methods. PMID:25415382
RAS testing in metastatic colorectal cancer: advances in Europe.

PubMed

Van Krieken, J Han J M; Rouleau, Etienne; Ligtenberg, Marjolijn J L; Normanno, Nicola; Patterson, Scott D; Jung, Andreas

2016-04-01

Personalized medicine shows promise for maximizing efficacy and minimizing toxicity of anti-cancer treatment. KRAS exon 2 mutations are predictive of resistance to epidermal growth factor receptor-directed monoclonal antibodies in patients with metastatic colorectal cancer. Recent studies have shown that broader RAS testing (KRAS and NRAS) is needed to select patients for treatment. While Sanger sequencing is still used, approaches based on various methodologies are available. Few CE-approved kits, however, detect the full spectrum of RAS mutations. More recently, "next-generation" sequencing has been developed for research use, including parallel semiconductor sequencing and reversible termination. These techniques have high technical sensitivities for detecting mutations, although the ideal threshold is currently unknown. Finally, liquid biopsy has the potential to become an additional tool to assess tumor-derived DNA. For accurate and timely RAS testing, appropriate sampling and prompt delivery of material is critical. Processes to ensure efficient turnaround from sample request to RAS evaluation must be implemented so that patients receive the most appropriate treatment. Given the variety of methodologies, external quality assurance programs are important to ensure a high standard of RAS testing. Here, we review technical and practical aspects of RAS testing for pathologists working with metastatic colorectal cancer tumor samples. The extension of markers from KRAS to RAS testing is the new paradigm for biomarker testing in colorectal cancer.
Sequence analysis of cultivated strawberry (Fragaria × ananassa Duch.) using microdissected single somatic chromosomes.

PubMed

Yanagi, Tomohiro; Shirasawa, Kenta; Terachi, Mayuko; Isobe, Sachiko

2017-01-01

Cultivated strawberry (Fragaria × ananassa Duch.) has homoeologous chromosomes because of allo-octoploidy. For example, two homoeologous chromosomes that belong to different sub-genomes of allopolyploids have similar base sequences. Thus, when conducting de novo assembly of DNA sequences, it is difficult to determine whether these sequences are derived from the same chromosome. To avoid the difficulties associated with homoeologous chromosomes and demonstrate the possibility of sequencing allopolyploids using single chromosomes, we conducted sequence analysis using microdissected single somatic chromosomes of cultivated strawberry. Three hundred and ten somatic chromosomes of the Japanese octoploid strawberry 'Reiko' were individually selected under a light microscope using a microdissection system. DNA from 288 of the dissected chromosomes was successfully amplified using a DNA amplification kit. Using next-generation sequencing, we decoded the base sequences of the amplified DNA segments, and on the basis of mapping, we identified DNA sequences from 144 samples that were best matched to the reference genomes of the octoploid strawberry, F. × ananassa, and the diploid strawberry, F. vesca. The 144 samples were classified into seven pseudo-molecules of F. vesca. The coverage rates of the DNA sequences from the single chromosome onto all pseudo-molecular sequences varied from 3 to 29.9%. We demonstrated an efficient method for sequence analysis of allopolyploid plants using microdissected single chromosomes. On the basis of our results, we believe that whole-genome analysis of allopolyploid plants can be enhanced using methodology that employs microdissected single chromosomes.

Generation and Analysis of a Large-Scale Expressed Sequence Tag Database from a Full-Length Enriched cDNA Library of Developing Leaves of Gossypium hirsutum L

PubMed Central

Pang, Chaoyou; Fan, Shuli; Song, Meizhen; Yu, Shuxun

2013-01-01

Background Cotton (Gossypium hirsutum L.) is one of the world’s most economically important crops. However, its entire genome has not been sequenced, and limited resources are available in GenBank for understanding the molecular mechanisms underlying leaf development and senescence. Methodology/Principal Findings In this study, 9,874 high-quality ESTs were generated from a normalized, full-length cDNA library derived from pooled RNA isolated from throughout leaf development during the plant blooming stage. After clustering and assembly of these ESTs, 5,191 unique sequences, representing 1,652 contigs and 3,539 singletons, were obtained. The average unique sequence length was 682 bp. Annotation of these unique sequences revealed that 84.4% showed significant homology to sequences in the NCBI non-redundant protein database, and 57.3% had significant hits to known proteins in the Swiss-Prot database. Comparative analysis indicated that our library added 2,400 ESTs and 991 unique sequences to those known for cotton. The unigenes were functionally characterized by gene ontology annotation. We identified 1,339 and 200 unigenes as potential leaf senescence-related genes and transcription factors, respectively. Moreover, nine genes related to leaf senescence and eleven MYB transcription factors were randomly selected for quantitative real-time PCR (qRT-PCR), which revealed that these genes were regulated differentially during senescence. The qRT-PCR for three GhYLSs revealed that these genes are expressed preferentially in senescent leaves. Conclusions/Significance These EST resources will provide valuable sequence information for gene expression profiling analyses and functional genomics studies to elucidate their roles, as well as for studying the mechanisms of leaf development and senescence in cotton and discovering candidate genes related to important agronomic traits of cotton.
These data will also facilitate future whole-genome sequence assembly and annotation in G. hirsutum and comparative genomics among Gossypium species. PMID:24146870
RFLP and sequence analysis of the cytochrome b gene of selected animals and man: methodology and forensic application.

PubMed

Zehner, R; Zimmermann, S; Mebs, D

1998-01-01

To identify common animal species by analysis of the cytochrome b gene, a method has been developed to obtain PCR products of a large domain of the cytochrome b gene (981 bp out of 1140 bp) in humans, selected mammals and birds using the same specifically designed primers. Species-specific RFLP patterns are generated by co-restriction with the restriction endonucleases AluI and NcoI. The RFLP patterns obtained are conclusive even in mixtures of two or more species. The results were confirmed by sequence analysis, which in addition explained intraspecies variations in the RFLP patterns. The method has been applied to forensic casework studies where the origin of roasted meat, stomach contents and a bone sample has been successfully identified.
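The co-restriction step described above can be illustrated with a minimal in-silico sketch. It assumes the standard recognition sites of the two enzymes (AluI cuts AG^CT, NcoI cuts C^CATGG) and a linear amplicon; the function names and the toy sequence are hypothetical and are not taken from the paper, and real RFLP patterns would be read from a gel rather than computed.

```python
# Recognition site and top-strand cut offset within the site:
# AluI cuts AG^CT (blunt), NcoI cuts C^CATGG.
ENZYMES = {"AluI": ("AGCT", 2), "NcoI": ("CCATGG", 1)}


def cut_positions(seq, site, offset):
    """Return every top-strand cut position of one enzyme in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + offset)
        start = seq.find(site, start + 1)
    return positions


def co_digest(seq, enzymes=ENZYMES):
    """Return fragment lengths (longest first) after cutting with all enzymes at once."""
    seq = seq.upper()
    cuts = set()
    for site, offset in enzymes.values():
        cuts.update(cut_positions(seq, site, offset))
    bounds = [0] + sorted(cuts) + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Toy amplicon (hypothetical, not a real cytochrome b sequence):
# one AluI site and one NcoI site separated by spacers.
toy = "TTTT" + "AGCT" + "GGGGGG" + "CCATGG" + "AAAAAA"
print(co_digest(toy))
```

Comparing the resulting fragment-length lists between species is the computational analogue of comparing banding patterns; a mixture of species would show the union of the individual patterns.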
Detection of generalized synchronization using echo state networks

NASA Astrophysics Data System (ADS)

Ibáñez-Soria, D.; Garcia-Ojalvo, J.; Soria-Frisch, A.; Ruffini, G.

2018-03-01

Generalized synchronization between coupled dynamical systems is a phenomenon of relevance in applications that range from secure communications to physiological modelling. Here, we test the capabilities of reservoir computing and, in particular, echo state networks for the detection of generalized synchronization. A nonlinear dynamical system consisting of two coupled Rössler chaotic attractors is used to generate temporal series consisting of time-locked generalized synchronized sequences interleaved with unsynchronized ones. Correctly tuned, echo state networks are able to efficiently discriminate between unsynchronized and synchronized sequences even in the presence of relatively high levels of noise. Compared to other state-of-the-art techniques of synchronization detection, the online capabilities of the proposed Echo State Network based methodology make it a promising choice for real-time applications aiming to monitor dynamical synchronization changes in continuous signals.
Real time simulation of computer-assisted sequencing of terminal area operations

NASA Technical Reports Server (NTRS)

Dear, R. G.

1981-01-01

A simulation was developed to investigate the utilization of computer-assisted decision making for the task of sequencing and scheduling aircraft in a high-density terminal area. The simulation incorporates a decision methodology termed Constrained Position Shifting. This methodology accounts for aircraft velocity profiles, routes, and weight classes in dynamically sequencing and scheduling arriving aircraft. A sample demonstration of Constrained Position Shifting is presented where six aircraft types (including both light and heavy aircraft) are sequenced to land at Denver's Stapleton International Airport. A graphical display is utilized, and Constrained Position Shifting with a maximum shift of four positions (rearward or forward) is compared to first come, first served with respect to arrival at the runway. The implementation of computer-assisted sequencing and scheduling methodologies is investigated. A time-based control concept will be required, and design considerations for such a system are discussed.
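The idea behind Constrained Position Shifting can be sketched in a few lines, under strong simplifying assumptions: aircraft land back to back, only weight class matters, and the separation values below are hypothetical illustrations rather than figures from the report. The sketch enumerates every landing order in which no aircraft moves more than a fixed number of positions from its first-come, first-served (FCFS) slot and keeps the order that finishes earliest.

```python
from itertools import permutations

# Hypothetical minimum separations in seconds, keyed by
# (leader weight class, follower weight class). Illustrative values only.
SEPARATION = {
    ("heavy", "heavy"): 96, ("heavy", "light"): 200,
    ("light", "heavy"): 72, ("light", "light"): 80,
}


def completion_time(order, classes):
    """Total time from the first to the last landing for a given order."""
    return sum(SEPARATION[(classes[a], classes[b])]
               for a, b in zip(order, order[1:]))


def constrained_position_shift(classes, max_shift):
    """Return (cost, order) for the best order within max_shift of FCFS."""
    best = None
    for order in permutations(range(len(classes))):
        # order[pos] is the FCFS index of the aircraft landing at position pos.
        if any(abs(pos - idx) > max_shift for pos, idx in enumerate(order)):
            continue
        cost = completion_time(order, classes)
        if best is None or cost < best[0]:
            best = (cost, order)
    return best


# Six arrivals in FCFS order, alternating weight classes.
fcfs = ["heavy", "light", "heavy", "light", "heavy", "light"]
fcfs_cost = completion_time(range(len(fcfs)), fcfs)
cps_cost, cps_order = constrained_position_shift(fcfs, max_shift=4)
print(fcfs_cost, cps_cost, cps_order)
```

Brute-force enumeration is only practical for a handful of aircraft; it is used here to keep the constraint visible, not as a suggestion of how the simulation itself was implemented.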
ProDaMa: an open source Python library to generate protein structure datasets.

PubMed

Armano, Giuliano; Manconi, Andrea

2009-10-02

The huge difference between the number of known sequences and known tertiary structures has justified the use of automated methods for protein analysis. Although a general methodology to solve these problems has not yet been devised, researchers are engaged in developing more accurate techniques and algorithms whose training plays a relevant role in determining their performance. From this perspective, particular importance is given to the training data used in experiments, and researchers are often engaged in the generation of specialized datasets that meet their requirements. To facilitate the task of generating specialized datasets, we devised and implemented ProDaMa, an open source Python library that provides classes for retrieving, organizing, updating, analyzing, and filtering protein data. ProDaMa has been used to generate specialized datasets useful for secondary structure prediction and to develop a collaborative web application aimed at generating and sharing protein structure datasets. The library, the related database, and the documentation are freely available at the URL http://iasc.diee.unica.it/prodama.
Next-Generation Sequencing: The Translational Medicine Approach from “Bench to Bedside to Population”

PubMed Central

Beigh, Mohammad Muzafar

2016-01-01

Humans have predicted the relationship between heredity and diseases for a long time. Only at the beginning of the last century did scientists begin to discover the connections between different genes and disease phenotypes. Recent trends in next-generation sequencing (NGS) technologies have brought great momentum to biomedical research, which in turn has remarkably augmented our basic understanding of human biology and its associated diseases. State-of-the-art next-generation biotechnologies have started making huge strides in our current understanding of the mechanisms of various chronic illnesses such as cancers, metabolic disorders, neurodegenerative anomalies, etc. We are experiencing a renaissance in biomedical research primarily driven by next-generation biotechnologies such as genomics, transcriptomics, proteomics, metabolomics, lipidomics, etc. Although genomic discoveries are at the forefront of next-generation omics technologies, their implementation in the clinical arena has been painstakingly slow, mainly because of high reaction costs and the unavailability of requisite computational tools for large-scale data analysis. However, rapid innovations and the steadily falling cost of sequence-based chemistries, along with the development of advanced bioinformatics tools, have lately prompted the launch and implementation of large-scale massively parallel genome sequencing programs in fields ranging from medical genetics and infectious biology to agricultural sciences. Recent advances in large-scale omics technologies are bringing healthcare research beyond the traditional “bench to bedside” approach to more of a continuum that will include improvements in public healthcare and will be primarily based on a predictive, preventive, personalized, and participatory (P4) medicine approach.
Recent large-scale research projects in genetic and infectious disease biology have indicated that massively parallel whole-genome/whole-exome sequencing, transcriptome analysis, and other functional genomic tools can reveal a large number of unique functional elements and/or markers that would otherwise go undetected by traditional sequencing methodologies. Therefore, the latest trends in biomedical research are giving birth to a new branch of medicine commonly referred to as personalized and/or precision medicine. Developments in the post-genomic era are expected to completely restructure the present clinical pattern of disease prevention and treatment as well as methods of diagnosis and prognosis. The next important step in the direction of the precision/personalized medicine approach should be its early adoption in clinics for future medical interventions. Consequently, in the coming years, next-generation biotechnologies will reorient medical practice more towards disease prediction and prevention rather than curing diseases at later stages of their development and progression, even at the wider population level for the general public healthcare system. PMID:28930123
Prescreening of microbial populations for the assessment of sequencing potential.

PubMed

Hanning, Irene B; Ricke, Steven C

2011-01-01

Next-generation sequencing (NGS) is a powerful tool that can be utilized to profile and compare microbial populations. By amplifying a target gene present in all bacteria and subsequently sequencing amplicons, the bacterial genera present in the populations can be identified and compared. In some scenarios, little to no difference may exist among the microbial populations being compared, in which case a prescreening method would be practical to determine which microbial populations would be suitable for further analysis by NGS. Denaturing gradient gel electrophoresis (DGGE) is cheaper than NGS, and the data comparing microbial populations are ready to be viewed immediately after electrophoresis. DGGE follows essentially the same initial methodology as NGS by targeting and amplifying the 16S rRNA gene. However, as opposed to sequencing amplicons, DGGE amplicons are analyzed by electrophoresis. By prescreening microbial populations with DGGE, more efficient use of NGS methods can be accomplished. In this chapter, we outline the protocol for DGGE targeting the same gene (16S rRNA) that would be targeted for NGS to compare and determine differences in microbial populations from a wide range of ecosystems.
Methodology challenges in studying human gut microbiota - effects of collection, storage, DNA extraction and next generation sequencing technologies.

PubMed

Panek, Marina; Čipčić Paljetak, Hana; Barešić, Anja; Perić, Mihaela; Matijašić, Mario; Lojkić, Ivana; Vranešić Bender, Darija; Krznarić, Željko; Verbanac, Donatella

2018-03-23

The information on microbiota composition in the human gastrointestinal tract predominantly originates from the analyses of human faeces by application of next generation sequencing (NGS). However, the detected composition of the faecal bacterial community can be affected by various factors including experimental design and procedures. This study evaluated the performance of different protocols for collection and storage of faecal samples (native and OMNIgene.GUT system) and bacterial DNA extraction (MP Biomedicals, QIAGEN and MO BIO kits), using two NGS platforms for 16S rRNA gene sequencing (Illumina MiSeq and Ion Torrent PGM). OMNIgene.GUT proved to be a reliable and convenient system for collection and storage of faecal samples, although it favoured the Sutterella genus. MP provided superior DNA yield and quality, MO BIO depleted Gram-positive organisms, while using QIAGEN with OMNIgene.GUT resulted in the greatest variability compared to the other two kits. The MiSeq and Ion Torrent (IT) platforms in their supplier-recommended setups provided comparable reproducibility of donor faecal microbiota. The differences included higher diversity observed with MiSeq and increased capacity of MiSeq to detect Akkermansia muciniphila, [Odoribacteraceae], Erysipelotrichaceae and Ruminococcaceae (primarily Faecalibacterium prausnitzii). The results of our study could assist investigators using NGS technologies in making informed decisions on appropriate tools for their experimental pipelines.
Development and preliminary evaluation of a multiplexed amplification and next generation sequencing method for viral hemorrhagic fever diagnostics

PubMed Central

Radonić, Aleksandar; Kocak Tufan, Zeliha; Domingo, Cristina

2017-01-01

Background We describe the development and evaluation of a novel method for targeted amplification and Next Generation Sequencing (NGS)-based identification of viral hemorrhagic fever (VHF) agents and assess the feasibility of this approach in diagnostics. Methodology An ultrahigh-multiplex panel was designed with primers to amplify all known variants of VHF-associated viruses and relevant controls. The performance of the panel was evaluated via serially quantified nucleic acids from Yellow fever virus, Rift Valley fever virus, Crimean-Congo hemorrhagic fever (CCHF) virus, Ebola virus, Junin virus and Chikungunya virus in a semiconductor-based sequencing platform. A comparison of direct NGS and targeted amplification-NGS was performed. The panel was further tested via a real-time nanopore sequencing-based platform, using clinical specimens from CCHF patients. Principal findings The multiplex primer panel comprises two pools of 285 and 256 primer pairs for the identification of 46 virus species causing hemorrhagic fevers, encompassing 6,130 genetic variants of the strains involved. In silico validation revealed that the panel detected over 97% of all known genetic variants of the targeted virus species. High levels of specificity and sensitivity were observed for the tested virus strains. Targeted amplification ensured viral read detection in specimens with the lowest virus concentration (1–10 genome equivalents) and enabled significant increases in specific reads over background for all viruses investigated. In clinical specimens, the panel enabled detection of the causative agent and its characterization within 10 minutes of sequencing, with sample-to-result time of less than 3.5 hours. Conclusions Virus enrichment via targeted amplification followed by NGS is an applicable strategy for the diagnosis of VHFs which can be adapted for high-throughput or nanopore sequencing platforms and employed for surveillance or outbreak monitoring. PMID:29155823
Rediscovery of Good-Turing estimators via Bayesian nonparametrics.

PubMed

Favaro, Stefano; Nipoti, Bernardo; Teh, Yee Whye

2016-03-01

The problem of estimating discovery probabilities originated in the context of statistical ecology, and in recent years it has become popular due to its frequent appearance in challenging applications arising in genetics, bioinformatics, linguistics, designs of experiments, machine learning, etc. A full range of statistical approaches, parametric and nonparametric as well as frequentist and Bayesian, has been proposed for estimating discovery probabilities. In this article, we investigate the relationships between the celebrated Good-Turing approach, which is a frequentist nonparametric approach developed in the 1940s, and a Bayesian nonparametric approach recently introduced in the literature. Specifically, under the assumption of a two parameter Poisson-Dirichlet prior, we show that Bayesian nonparametric estimators of discovery probabilities are asymptotically equivalent, for a large sample size, to suitably smoothed Good-Turing estimators. As a by-product of this result, we introduce and investigate a methodology for deriving exact and asymptotic credible intervals to be associated with the Bayesian nonparametric estimators of discovery probabilities. The proposed methodology is illustrated through a comprehensive simulation study and the analysis of Expressed Sequence Tags data generated by sequencing a benchmark complementary DNA library. © 2015, The International Biometric Society.
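For orientation, the classical (unsmoothed) Good-Turing estimator that this abstract takes as its starting point can be written in a few lines. This is a minimal sketch of the textbook frequentist estimator only, assuming a non-empty sample of species labels; it is not the smoothed or Bayesian nonparametric estimator studied in the paper, and the function name and toy data are hypothetical.

```python
from collections import Counter


def good_turing_discovery(sample):
    """Unsmoothed Good-Turing estimates from a non-empty list of species labels.

    Returns a dict mapping r to the estimated probability that the next
    observation belongs to a species seen exactly r times so far.
    The entry for r = 0 is the probability of discovering a new species.
    """
    n = len(sample)
    species_counts = Counter(sample)                  # species -> frequency
    freq_of_freq = Counter(species_counts.values())   # r -> number of species seen r times
    max_r = max(freq_of_freq)
    return {r: (r + 1) * freq_of_freq.get(r + 1, 0) / n
            for r in range(max_r + 1)}


# Toy sample: species "a" seen 3 times, "b" twice, "c" and "d" once each.
estimates = good_turing_discovery(["a", "a", "a", "b", "b", "c", "d"])
print(estimates)
```

Without smoothing, the estimate for the most frequent class is always zero because no species has a higher count; this is one reason smoothed variants are used in practice.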
Dynamic Loads Generation for Multi-Point Vibration Excitation Problems

NASA Technical Reports Server (NTRS)

Shen, Lawrence

2011-01-01

A random-force method has been developed to predict dynamic loads produced by rocket-engine random vibrations for new rocket-engine designs. The method develops random forces at multiple excitation points based on random vibration environments scaled from accelerometer data obtained during hot-fire tests of existing rocket engines. This random-force method applies random forces to the model and creates expected dynamic response in a manner that simulates the way the operating engine applies self-generated random vibration forces (random pressure acting on an area) with the resulting responses that we measure with accelerometers. This innovation includes the methodology (implementation sequence), the computer code, two methods to generate the random-force vibration spectra, and two methods to reduce some of the inherent conservatism in the dynamic loads. This methodology would be implemented to generate the random-force spectra at excitation nodes without requiring the use of artificial boundary conditions in a finite element model. More accurate random dynamic loads than those predicted by current industry methods can then be generated using the random force spectra. The scaling method used to develop the initial power spectral density (PSD) environments for deriving the random forces for the rocket engine case is based on the Barrett Criteria developed at Marshall Space Flight Center in 1963. This invention approach can be applied in the aerospace, automotive, and other industries to obtain reliable dynamic loads and responses from a finite element model for any structure subject to multipoint random vibration excitations.
Homopolymer tail-mediated ligation PCR: a streamlined and highly efficient method for DNA cloning and library construction.

PubMed

Lazinski, David W; Camilli, Andrew

2013-01-01

The amplification of DNA fragments, cloned between user-defined 5' and 3' end sequences, is a prerequisite step in the use of many current applications including massively parallel sequencing (MPS). Here we describe an improved method, called homopolymer tail-mediated ligation PCR (HTML-PCR), that requires very little starting template, minimal hands-on effort, is cost-effective, and is suited for use in high-throughput and robotic methodologies. HTML-PCR starts with the addition of homopolymer tails of controlled lengths to the 3' termini of a double-stranded genomic template. The homopolymer tails enable the annealing-assisted ligation of a hybrid oligonucleotide to the template's recessed 5' ends. The hybrid oligonucleotide has a user-defined sequence at its 5' end. This primer, together with a second primer composed of a longer region complementary to the homopolymer tail and fused to a second 5' user-defined sequence, are used in a PCR reaction to generate the final product. The user-defined sequences can be varied to enable compatibility with a wide variety of downstream applications. We demonstrate our new method by constructing MPS libraries starting from nanogram and sub-nanogram quantities of Vibrio cholerae and Streptococcus pneumoniae genomic DNA.
SNP Discovery by Illumina-Based Transcriptome Sequencing of the Olive and the Genetic Characterization of Turkish Olive Genotypes Revealed by AFLP, SSR and SNP Markers

PubMed Central

Kaya, Hilal Betul; Cetin, Oznur; Kaya, Hulya; Sahin, Mustafa; Sefer, Filiz; Kahraman, Abdullah; Tanyolac, Bahattin

2013-01-01

Background The olive tree (Olea europaea L.) is a diploid (2n = 2x = 46) outcrossing species mainly grown in the Mediterranean area, where it is the most important oil-producing crop. Because of its economic, cultural and ecological importance, various DNA markers have been used in the olive to characterize and elucidate homonyms, synonyms and unknown accessions. However, a comprehensive characterization and a full sequence of its transcriptome are unavailable, leading to the importance of an efficient large-scale single nucleotide polymorphism (SNP) discovery in olive. The objectives of this study were (1) to discover olive SNPs using next-generation sequencing and to identify SNP primers for cultivar identification and (2) to characterize 96 olive genotypes originating from different regions of Turkey. Methodology/Principal Findings Next-generation sequencing technology was used with five distinct olive genotypes and generated cDNA, producing 126,542,413 reads using an Illumina Genome Analyzer IIx. Following quality and size trimming, the high-quality reads were assembled into 22,052 contigs with an average length of 1,321 bases and 45 singletons. The SNPs were filtered and 2,987 high-quality putative SNP primers were identified. The assembled sequences and singletons were subjected to BLAST similarity searches and annotated with a Gene Ontology identifier. To identify the 96 olive genotypes, these SNP primers were applied to the genotypes in combination with amplified fragment length polymorphism (AFLP) and simple sequence repeats (SSR) markers. Conclusions/Significance This study marks the highest number of SNP markers discovered to date from olive genotypes using transcriptome sequencing. The developed SNP markers will provide a useful source for molecular genetic studies, such as genetic diversity and characterization, high density quantitative trait locus (QTL) analysis, association mapping and map-based gene cloning in the olive. 
High levels of genetic variation among Turkish olive genotypes revealed by SNPs, AFLPs and SSRs allowed us to characterize the Turkish olive genotype. PMID:24058483
A New Omics Data Resource of Pleurocybella porrigens for Gene Discovery

PubMed Central

Dohra, Hideo; Someya, Takumi; Takano, Tomoyuki; Harada, Kiyonori; Omae, Saori; Hirai, Hirofumi; Yano, Kentaro; Kawagishi, Hirokazu

2013-01-01

Background Pleurocybella porrigens is a mushroom-forming fungus, which has been consumed as a traditional food in Japan. In 2004, 55 people were poisoned by eating the mushroom and 17 people among them died of acute encephalopathy. Since then, the Japanese government has been alerting Japanese people to take precautions against eating the P. porrigens mushroom. Unfortunately, despite efforts, the molecular mechanism of the encephalopathy remains elusive. The genome and transcriptome sequence data of P. porrigens and the related species, however, are not stored in the public database. To gain the omics data in P. porrigens, we sequenced genome and transcriptome of its fruiting bodies and mycelia by next generation sequencing. Methodology/Principal Findings Short read sequences of genomic DNAs and mRNAs in P. porrigens were generated by Illumina Genome Analyzer. Genome short reads were de novo assembled into scaffolds using Velvet. Comparisons of genome signatures among Agaricales showed that P. porrigens has a unique genome signature. Transcriptome sequences were assembled into contigs (unigenes). Biological functions of unigenes were predicted by Gene Ontology and KEGG pathway analyses. The majority of unigenes would be novel genes without significant counterparts in the public omics databases. Conclusions Functional analyses of unigenes present the existence of numerous novel genes in the basidiomycetes division. The results mean that the omics information such as genome, transcriptome and metabolome in basidiomycetes is short in the current databases. The large-scale omics information on P. porrigens, provided from this research, will give a new data resource for gene discovery in basidiomycetes. PMID:23936076
Generation of “LYmph Node Derived Antibody Libraries” (LYNDAL) for selecting fully human antibody fragments with therapeutic potential

PubMed Central

Diebolder, Philipp; Keller, Armin; Haase, Stephanie; Schlegelmilch, Anne; Kiefer, Jonathan D; Karimi, Tamana; Weber, Tobias; Moldenhauer, Gerhard; Kehm, Roland; Eis-Hübinger, Anna M; Jäger, Dirk; Federspil, Philippe A; Herold-Mende, Christel; Dyckhoff, Gerhard; Kontermann, Roland E; Arndt, Michaela AE; Krauss, Jürgen

2014-01-01

The development of efficient strategies for generating fully human monoclonal antibodies with unique functional properties that are exploitable for tailored therapeutic interventions remains a major challenge in the antibody technology field. Here, we present a methodology for recovering such antibodies from antigen-encountered human B cell repertoires. As the source for variable antibody genes, we cloned immunoglobulin G (IgG)-derived B cell repertoires from lymph nodes of 20 individuals undergoing surgery for head and neck cancer. Sequence analysis of unselected “LYmph Node Derived Antibody Libraries” (LYNDAL) revealed a naturally occurring distribution pattern of rearranged antibody sequences, representing all known variable gene families and most functional germline sequences. To demonstrate the feasibility for selecting antibodies with therapeutic potential from these repertoires, seven LYNDAL from donors with high serum titers against herpes simplex virus (HSV) were panned on recombinant glycoprotein B of HSV-1. Screening for specific binders delivered 34 single-chain variable fragments (scFvs) with unique sequences. Sequence analysis revealed extensive somatic hypermutation of enriched clones as a result of affinity maturation. Binding of scFvs to common glycoprotein B variants from HSV-1 and HSV-2 strains was highly specific, and the majority of analyzed antibody fragments bound to the target antigen with nanomolar affinity. From eight scFvs with HSV-neutralizing capacity in vitro, the most potent antibody neutralized 50% HSV-2 at 4.5 nM as a dimeric (scFv)2. We anticipate our approach to be useful for recovering fully human antibodies with therapeutic potential. PMID:24256717
Goodbye genome paper, hello genome report: the increasing popularity of 'genome announcements' and their impact on science.

PubMed

Smith, David Roy

2017-05-01

Next-generation sequencing technologies have revolutionized genomics and altered the scientific publication landscape. Life-science journals abound with genome papers, that is, peer-reviewed descriptions of newly sequenced chromosomes. Although they once filled the pages of Nature and Science, genome papers are now mostly relegated to journals with low impact factors. Some have forecast the death of the genome paper and argued that they are using up valuable resources and not advancing science. However, the publication rate of genome papers is on the rise. This increase is largely because some journals have created a new category of manuscript called genome reports, which are short, fast-tracked papers describing a chromosome sequence(s), its GenBank accession number and little else. In 2015, for example, more than 2000 genome reports were published, and 2016 is poised to bring even more. Here, I highlight the growing popularity of genome reports and discuss their merits, drawbacks and impact on science and the academic publication infrastructure. Genome reports can be excellent assets for the research community, but they are also being used as quick and easy routes to a publication, and in some instances they are not peer reviewed. One of the best arguments for genome reports is that they are a citable, user-generated genomic resource providing essential methodological and biological information, which may not be present in the sequence database. But they are expensive and time-consuming avenues for achieving such a goal. © The Author 2016. Published by Oxford University Press.
Generation of “LYmph Node Derived Antibody Libraries” (LYNDAL) for selecting fully human antibody fragments with therapeutic potential.

PubMed

Diebolder, Philipp; Keller, Armin; Haase, Stephanie; Schlegelmilch, Anne; Kiefer, Jonathan D; Karimi, Tamana; Weber, Tobias; Moldenhauer, Gerhard; Kehm, Roland; Eis-Hübinger, Anna M; Jäger, Dirk; Federspil, Philippe A; Herold-Mende, Christel; Dyckhoff, Gerhard; Kontermann, Roland E; Arndt, Michaela A E; Krauss, Jürgen

2014-01-01

The development of efficient strategies for generating fully human monoclonal antibodies with unique functional properties that are exploitable for tailored therapeutic interventions remains a major challenge in the antibody technology field. Here, we present a methodology for recovering such antibodies from antigen-encountered human B cell repertoires. As the source for variable antibody genes, we cloned immunoglobulin G (IgG)-derived B cell repertoires from lymph nodes of 20 individuals undergoing surgery for head and neck cancer. Sequence analysis of unselected “LYmph Node Derived Antibody Libraries” (LYNDAL) revealed a naturally occurring distribution pattern of rearranged antibody sequences, representing all known variable gene families and most functional germline sequences. To demonstrate the feasibility for selecting antibodies with therapeutic potential from these repertoires, seven LYNDAL from donors with high serum titers against herpes simplex virus (HSV) were panned on recombinant glycoprotein B of HSV-1. Screening for specific binders delivered 34 single-chain variable fragments (scFvs) with unique sequences. Sequence analysis revealed extensive somatic hypermutation of enriched clones as a result of affinity maturation. Binding of scFvs to common glycoprotein B variants from HSV-1 and HSV-2 strains was highly specific, and the majority of analyzed antibody fragments bound to the target antigen with nanomolar affinity. From eight scFvs with HSV-neutralizing capacity in vitro, the most potent antibody neutralized 50% HSV-2 at 4.5 nM as a dimeric (scFv)2. We anticipate our approach to be useful for recovering fully human antibodies with therapeutic potential.
kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

PubMed Central

Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

2013-01-01

Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147
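The k-mer feature step mentioned in the abstract above can be illustrated with a minimal Python sketch. It only shows how a DNA sequence might be turned into a fixed-length vector of k-mer frequencies that a support vector machine could consume. The function names `kmer_counts` and `kmer_vector` are hypothetical, the sketch ignores reverse complements and ambiguous bases, and it is not the kmer-SVM server's own implementation.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count every overlapping k-mer in an upper-cased sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def kmer_vector(seq, k, alphabet="ACGT"):
    """Return k-mer frequencies in a fixed lexicographic order.

    Assumption: k-mers containing letters outside `alphabet` are ignored.
    """
    counts = kmer_counts(seq, k)
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    total = sum(counts[w] for w in vocab) or 1  # avoid division by zero
    return [counts[w] / total for w in vocab]


if __name__ == "__main__":
    vec = kmer_vector("ACGTAC", 2)
    print(len(vec), vec[1])  # 16 features; index 1 is the k-mer "AC"
```

Vectors built this way for positive (bound) and negative (background) regions could then be passed to any SVM library; that training step is left out here to keep the sketch dependency-free.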
kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.

PubMed

Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S; Beer, Michael A

2013-07-01

Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.
Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data

PubMed Central

2010-01-01

Background In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations both for low and high complexity patterns in the framework of homogeneous or heterogeneous Markov models. Results The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also permits to avoid any product of convolution of the pattern distribution in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in the complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms and their relative interest in comparison with existing ones were then tested and discussed on a toy-example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists in concatenating the sequences in the data set into a single sequence. 
Conclusions Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures for example). In addition, these exact algorithms allow us to avoid the edge effect observed under the single sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model displays a slow convergence toward the stationary distribution. We end up with a discussion on our method and on its potential improvements. PMID:20205909
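As a rough illustration of the problem setting in the abstract above, the following Python sketch computes the exact count distribution of a single word in a set of independent sequences under a homogeneous order-1 Markov model. It tracks (last letter, automaton state, count) with a KMP-style automaton and then convolves the per-sequence distributions. This is the straightforward convolution baseline, not any of the paper's three algorithms, and the function names, the dictionary-based model format, and the restriction to a single word with overlapping occurrences are all assumptions made for the example.

```python
def build_dfa(pattern, alphabet):
    """delta[q][a] = length of the longest prefix of `pattern` that is a
    suffix of (pattern[:q] + a); state len(pattern) means a match."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            s = pattern[:q] + a
            k = min(m, len(s))
            while k > 0 and s[-k:] != pattern[:k]:
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def count_distribution(pattern, length, init, trans):
    """Exact distribution of the number of overlapping occurrences of
    `pattern` in one sequence of `length` >= 1 letters.
    init[a] is the first-letter probability, trans[a][b] = P(b | a)."""
    alphabet = sorted(init)
    m = len(pattern)
    delta = build_dfa(pattern, alphabet)
    cur = {}  # (last letter, automaton state, count) -> probability
    for a in alphabet:
        q = delta[0][a]
        key = (a, q, int(q == m))
        cur[key] = cur.get(key, 0.0) + init[a]
    for _ in range(length - 1):
        nxt = {}
        for (a, q, c), p in cur.items():
            for b in alphabet:
                q2 = delta[q][b]
                key = (b, q2, c + int(q2 == m))
                nxt[key] = nxt.get(key, 0.0) + p * trans[a][b]
        cur = nxt
    dist = [0.0] * (max(c for (_, _, c) in cur) + 1)
    for (_, _, c), p in cur.items():
        dist[c] += p
    return dist


def convolve(p, q):
    """Distribution of the sum of two independent counts."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, x in enumerate(p):
        for j, y in enumerate(q):
            out[i + j] += x * y
    return out


def set_distribution(pattern, lengths, init, trans):
    """Count distribution over a set of independent sequences."""
    total = [1.0]
    for n in lengths:
        total = convolve(total, count_distribution(pattern, n, init, trans))
    return total
```

Because each sequence restarts from the initial distribution, this treatment does not concatenate the sequences, which is the approximation the abstract warns against.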

Exact distribution of a pattern in a set of random sequences generated by a Markov source: applications to biological data.

PubMed

Nuel, Gregory; Regad, Leslie; Martin, Juliette; Camproux, Anne-Claude

2010-01-26

In bioinformatics it is common to search for a pattern of interest in a potentially large set of rather short sequences (upstream gene regions, proteins, exons, etc.). Although many methodological approaches allow practitioners to compute the distribution of a pattern count in a random sequence generated by a Markov source, no specific developments have taken into account the counting of occurrences in a set of independent sequences. We aim to address this problem by deriving efficient approaches and algorithms to perform these computations both for low and high complexity patterns in the framework of homogeneous or heterogeneous Markov models. The latest advances in the field allowed us to use a technique of optimal Markov chain embedding based on deterministic finite automata to introduce three innovative algorithms. Algorithm 1 is the only one able to deal with heterogeneous models. It also permits to avoid any product of convolution of the pattern distribution in individual sequences. When working with homogeneous models, Algorithm 2 yields a dramatic reduction in the complexity by taking advantage of previous computations to obtain moment generating functions efficiently. In the particular case of low or moderate complexity patterns, Algorithm 3 exploits power computation and binary decomposition to further reduce the time complexity to a logarithmic scale. All these algorithms and their relative interest in comparison with existing ones were then tested and discussed on a toy-example and three biological data sets: structural patterns in protein loop structures, PROSITE signatures in a bacterial proteome, and transcription factors in upstream gene regions. On these data sets, we also compared our exact approaches to the tempting approximation that consists in concatenating the sequences in the data set into a single sequence. 
Our algorithms prove to be effective and able to handle real data sets with multiple sequences, as well as biological patterns of interest, even when the latter display a high complexity (PROSITE signatures for example). In addition, these exact algorithms allow us to avoid the edge effect observed under the single sequence approximation, which leads to erroneous results, especially when the marginal distribution of the model displays a slow convergence toward the stationary distribution. We end up with a discussion on our method and on its potential improvements.
Systematic review of the methodological quality of controlled trials evaluating Chinese herbal medicine in patients with rheumatoid arthritis

PubMed Central

Pan, Xin; Lopez-Olivo, Maria A; Song, Juhee; Pratt, Gregory; Suarez-Almazor, Maria E

2017-01-01

Objectives We appraised the methodological and reporting quality of randomised controlled clinical trials (RCTs) evaluating the efficacy and safety of Chinese herbal medicine (CHM) in patients with rheumatoid arthritis (RA). Design For this systematic review, electronic databases were searched from inception until June 2015. The search was limited to humans and non-case report studies, but was not limited by language, year of publication or type of publication. Two independent reviewers selected RCTs, evaluating CHM in RA (herbals and decoctions). Descriptive statistics were used to report on risk of bias and their adherence to reporting standards. Multivariable logistic regression analysis was performed to determine study characteristics associated with high or unclear risk of bias. Results Out of 2342 unique citations, we selected 119 RCTs including 18 919 patients: 10 108 patients received CHM alone and 6550 received one of 11 treatment combinations. A high risk of bias was observed across all domains: 21% had a high risk for selection bias (11% from sequence generation and 30% from allocation concealment), 85% for performance bias, 89% for detection bias, 4% for attrition bias and 40% for reporting bias. In multivariable analysis, fewer authors were associated with selection bias (allocation concealment), performance bias and attrition bias, and earlier year of publication and funding source not reported or disclosed were associated with selection bias (sequence generation). Studies published in non-English language were associated with reporting bias. Poor adherence to recommended reporting standards (<60% of the studies not providing sufficient information) was observed in 11 of the 23 sections evaluated. Limitations Study quality and data extraction were performed by one reviewer and cross-checked by a second reviewer. Translation to English was performed by one reviewer in 85% of the included studies. 
Conclusions Studies evaluating CHM often fail to meet expected methodological criteria, and high-quality evidence is lacking. PMID:28249848
Ecological and evolutionary genomics of marine photosynthetic organisms.

PubMed

Coelho, Susana M; Simon, Nathalie; Ahmed, Sophia; Cock, J Mark; Partensky, Frédéric

2013-02-01

Environmental (ecological) genomics aims to understand the genetic basis of relationships between organisms and their abiotic and biotic environments. It is a rapidly progressing field of research largely due to recent advances in the speed and volume of genomic data being produced by next generation sequencing (NGS) technologies. Building on information generated by NGS-based approaches, functional genomic methodologies are being applied to identify and characterize genes and gene systems of both environmental and evolutionary relevance. Marine photosynthetic organisms (MPOs) were poorly represented amongst the early genomic models, but this situation is changing rapidly. Here we provide an overview of the recent advances in the application of ecological genomic approaches to both prokaryotic and eukaryotic MPOs. We describe how these approaches are being used to explore the biology and ecology of marine cyanobacteria and algae, particularly with regard to their functions in a broad range of marine ecosystems. Specifically, we review the ecological and evolutionary insights gained from whole genome and transcriptome sequencing projects applied to MPOs and illustrate how their genomes are yielding information on the specific features of these organisms. © 2012 Blackwell Publishing Ltd.
Numerical study on the sequential Bayesian approach for radioactive materials detection

NASA Astrophysics Data System (ADS)

Qingpei, Xiang; Dongfeng, Tian; Jianyu, Zhu; Fanhua, Hao; Ge, Ding; Jun, Zeng

2013-01-01

A new detection method, based on the sequential Bayesian approach proposed by Candy et al., offers new horizons for the research of radioactive detection. Compared with the commonly adopted detection methods incorporated with statistical theory, the sequential Bayesian approach offers the advantages of shorter verification time during the analysis of spectra that contain low total counts, especially in complex radionuclide components. In this paper, a simulation experiment platform implanted with the methodology of sequential Bayesian approach was developed. Events sequences of γ-rays associating with the true parameters of a LaBr3(Ce) detector were obtained based on an events sequence generator using Monte Carlo sampling theory to study the performance of the sequential Bayesian approach. The numerical experimental results are in accordance with those of Candy. Moreover, the relationship between the detection model and the event generator, respectively represented by the expected detection rate (Am) and the tested detection rate (Gm) parameters, is investigated. To achieve an optimal performance for this processor, the interval of the tested detection rate as a function of the expected detection rate is also presented.
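The idea of deciding event by event, rather than after a fixed counting time, can be sketched in Python with a generic Wald-style sequential likelihood-ratio test on photon inter-arrival times. This is a simplified stand-in and not the processor of Candy et al. or the simulation platform described above: it assumes exponentially distributed inter-arrival times, uses arrival times only (no energy information), and the names `sprt`, `rate_bg`, `rate_src`, `alpha` and `beta` are hypothetical.

```python
import math


def sprt(intervals, rate_bg, rate_src, alpha=0.01, beta=0.01):
    """Sequentially test 'background only' (rate_bg) against
    'source present' (rate_src) on inter-arrival times in seconds.

    Returns (decision, number_of_events_used).
    alpha and beta are the target false-alarm and miss probabilities.
    """
    upper = math.log((1 - beta) / alpha)   # accept 'source' above this
    lower = math.log(beta / (1 - alpha))   # accept 'background' below this
    llr = 0.0
    n = 0
    for n, dt in enumerate(intervals, start=1):
        # Log-likelihood ratio of one exponential inter-arrival time.
        llr += math.log(rate_src / rate_bg) - (rate_src - rate_bg) * dt
        if llr >= upper:
            return "source", n
        if llr <= lower:
            return "background", n
    return "undecided", n


if __name__ == "__main__":
    # Short intervals favour the higher-rate (source) hypothesis.
    print(sprt([0.1] * 10, rate_bg=1.0, rate_src=5.0))
```

The test stops as soon as the accumulated evidence crosses either threshold, which is why sequential schemes of this kind can reach a decision with few events when the two rates differ strongly.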
Functional Analysis With a Barcoder Yeast Gene Overexpression System

PubMed Central

Douglas, Alison C.; Smith, Andrew M.; Sharifpoor, Sara; Yan, Zhun; Durbic, Tanja; Heisler, Lawrence E.; Lee, Anna Y.; Ryan, Owen; Göttert, Hendrikje; Surendra, Anu; van Dyk, Dewald; Giaever, Guri; Boone, Charles; Nislow, Corey; Andrews, Brenda J.

2012-01-01

Systematic analysis of gene overexpression phenotypes provides an insight into gene function, enzyme targets, and biological pathways. Here, we describe a novel functional genomics platform that enables a highly parallel and systematic assessment of overexpression phenotypes in pooled cultures. First, we constructed a genome-level collection of ~5100 yeast barcoder strains, each of which carries a unique barcode, enabling pooled fitness assays with a barcode microarray or sequencing readout. Second, we constructed a yeast open reading frame (ORF) galactose-induced overexpression array by generating a genome-wide set of yeast transformants, each of which carries an individual plasmid-born and sequence-verified ORF derived from the Saccharomyces cerevisiae full-length EXpression-ready (FLEX) collection. We combined these collections genetically using synthetic genetic array methodology, generating ~5100 strains, each of which is barcoded and overexpresses a specific ORF, a set we termed “barFLEX.” Additional synthetic genetic array allows the barFLEX collection to be moved into different genetic backgrounds. As a proof-of-principle, we describe the properties of the barFLEX overexpression collection and its application in synthetic dosage lethality studies under different environmental conditions. PMID:23050238
Design and FPGA Implementation of a Universal Chaotic Signal Generator Based on the Verilog HDL Fixed-Point Algorithm and State Machine Control

NASA Astrophysics Data System (ADS)

Qiu, Mo; Yu, Simin; Wen, Yuqiong; Lü, Jinhu; He, Jianbin; Lin, Zhuosheng

In this paper, a novel design methodology and its FPGA hardware implementation for a universal chaotic signal generator is proposed via the Verilog HDL fixed-point algorithm and state machine control. According to continuous-time or discrete-time chaotic equations, a Verilog HDL fixed-point algorithm and its corresponding digital system are first designed. In the FPGA hardware platform, each operation step of Verilog HDL fixed-point algorithm is then controlled by a state machine. The generality of this method is that, for any given chaotic equation, it can be decomposed into four basic operation procedures, i.e. nonlinear function calculation, iterative sequence operation, iterative values right shifting and ceiling, and chaotic iterative sequences output, each of which corresponds to only a state via state machine control. Compared with the Verilog HDL floating-point algorithm, the Verilog HDL fixed-point algorithm can save the FPGA hardware resources and improve the operation efficiency. FPGA-based hardware experimental results validate the feasibility and reliability of the proposed approach.
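The fixed-point iteration described above can be mimicked in software with integer arithmetic. The Python sketch below iterates the logistic map x(n+1) = r·x(n)·(1 − x(n)) using a hypothetical 16-bit fractional format and follows the same broad steps: nonlinear function calculation, iteration, right shifting, and output. The choice of map, the `FRAC_BITS` value, and all function names are assumptions for illustration, and the right shift here truncates rather than applying the ceiling step mentioned in the abstract. It is not the authors' Verilog HDL design.

```python
FRAC_BITS = 16            # assumed number of fractional bits
ONE = 1 << FRAC_BITS      # fixed-point representation of 1.0


def to_fixed(x):
    """Convert a float to the fixed-point integer format."""
    return int(round(x * ONE))


def to_float(q):
    """Convert a fixed-point integer back to a float."""
    return q / ONE


def fixed_mul(a, b):
    """Multiply two fixed-point values; the right shift rescales the
    double-width product and truncates the extra fractional bits."""
    return (a * b) >> FRAC_BITS


def logistic_fixed(x0, r, steps):
    """Iterate the logistic map in fixed point and return raw integers."""
    x = to_fixed(x0)
    rq = to_fixed(r)
    out = []
    for _ in range(steps):
        nonlinear = fixed_mul(x, ONE - x)   # nonlinear term x * (1 - x)
        x = fixed_mul(rq, nonlinear)        # iterative sequence operation
        out.append(x)                       # sequence output
    return out


if __name__ == "__main__":
    print([to_float(q) for q in logistic_fixed(0.3, 3.9, 5)])
```

Using integers with explicit shifts makes the rounding behaviour visible, which is the main point of comparison with a floating-point version of the same map.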
The Relevance of HLA Sequencing in Population Genetics Studies

PubMed Central

Sanchez-Mazas, Alicia

2014-01-01

Next generation sequencing (NGS) is currently being adapted by different biotechnological platforms to the standard typing method for HLA polymorphism, the huge diversity of which makes this initiative particularly challenging. Boosting the molecular characterization of the HLA genes through efficient, rapid, and low-cost technologies is expected to amplify the success of tissue transplantation by enabling us to find donor-recipient matching for rare phenotypes. But the application of NGS technologies to the molecular mapping of the MHC region also anticipates essential changes in population genetic studies. Huge amounts of HLA sequence data will be available in the next years for different populations, with the potential to change our understanding of HLA variation in humans. In this review, we first explain how HLA sequencing allows a better assessment of the HLA diversity in human populations, taking also into account the methodological difficulties it introduces at the statistical level; secondly, we show how analyzing HLA sequence variation may improve our comprehension of population genetic relationships by facilitating the identification of demographic events that marked human evolution; finally, we discuss the interest of both HLA and genome-wide sequencing and genotyping in detecting functionally significant SNPs in the MHC region, the latter having also contributed to the makeup of the HLA molecular diversity observed today. PMID:25126587
The relevance of HLA sequencing in population genetics studies.

PubMed

Sanchez-Mazas, Alicia; Meyer, Diogo

2014-01-01

Next generation sequencing (NGS) is currently being adapted by different biotechnological platforms to the standard typing method for HLA polymorphism, the huge diversity of which makes this initiative particularly challenging. Boosting the molecular characterization of the HLA genes through efficient, rapid, and low-cost technologies is expected to amplify the success of tissue transplantation by enabling us to find donor-recipient matching for rare phenotypes. But the application of NGS technologies to the molecular mapping of the MHC region also anticipates essential changes in population genetic studies. Huge amounts of HLA sequence data will be available in the next years for different populations, with the potential to change our understanding of HLA variation in humans. In this review, we first explain how HLA sequencing allows a better assessment of the HLA diversity in human populations, taking also into account the methodological difficulties it introduces at the statistical level; secondly, we show how analyzing HLA sequence variation may improve our comprehension of population genetic relationships by facilitating the identification of demographic events that marked human evolution; finally, we discuss the interest of both HLA and genome-wide sequencing and genotyping in detecting functionally significant SNPs in the MHC region, the latter having also contributed to the makeup of the HLA molecular diversity observed today.
Experimental and Theoretical Studies on the Nazarov Cyclization/Wagner-Meerwein Rearrangement Sequence

PubMed Central

Lebœuf, David; Ciesielski, Jennifer

2012-01-01

Highly functionalized cyclopentenones can be generated stereospecifically by a chemoselective copper(II)-mediated Nazarov/Wagner-Meerwein rearrangement sequence of divinyl ketones. A detailed investigation of this sequence is described including a study of substrate scope and limitations. After the initial 4π electrocyclization, this reaction proceeds via two different sequential [1,2]-shifts, with selectivity that depends upon either migratory ability or the steric bulkiness of the substituents at C1 and C5. This methodology allows the creation of vicinal stereogenic centers, including adjacent quaternary centers. This sequence can also be achieved by using a catalytic amount of copper(II) in combination with NaBAr4f, a weak Lewis acid. During the study of the scope of the reaction, a partial or complete E/Z isomerization of the enone moiety was observed in some cases prior to the cyclization, which resulted in a mixture of diastereomeric products. Use of a Cu(II)-bisoxazoline complex prevented the isomerization, allowing high diastereoselectivity to be obtained in all substrate types. In addition, the reaction sequence was studied by DFT computations at the UB3LYP/6-31G(d,p) level, which are consistent with the proposed sequences observed, including E/Z isomerizations and chemoselective Wagner-Meerwein shifts. PMID:22471833
Opinion: Clarifying Two Controversies about Information Mapping's Method.

ERIC Educational Resources Information Center

Horn, Robert E.

1992-01-01

Describes Information Mapping, a methodology for the analysis, organization, sequencing, and presentation of information and explains three major parts of the method: (1) content analysis, (2) project life-cycle synthesis and integration of the content analysis, and (3) sequencing and formatting. Major criticisms of the methodology are addressed.…
Generation and characterization of antibodies specific for caspase-cleaved neo-epitopes: a novel approach

PubMed Central

Ai, X; Butts, B; Vora, K; Li, W; Tache-Talmadge, C; Fridman, A; Mehmet, H

2011-01-01

Apoptosis research has been significantly aided by the generation of antibodies against caspase-cleaved peptide neo-epitopes. However, most of these antibodies recognize the N-terminal fragment and are specific for the protein in question. The aim of this project was to create antibodies, which could identify caspase-cleaved proteins without a priori knowledge of the cleavage sites or even the proteins themselves. We hypothesized that many caspase-cleavage products might have a common antigenic shape, given that they must all fit into the same active site of caspases. Rabbits were immunized with the eight most prevalent exposed C-terminal tetrapeptide sequences following caspase cleavage. After purification of the antibodies we demonstrated (1) their specificity for exposed C-terminal (but not internal) peptides, (2) their ability to detect known caspase-cleaved proteins from apoptotic cell lysates or supernatants from apoptotic cell culture and (3) their ability to detect a caspase-cleaved protein whose tetrapeptide sequence differs from the eight tetrapeptides used to generate the antibodies. These antibodies have the potential to identify novel neo-epitopes produced by caspase cleavage and so can be used to identify pathway-specific caspase cleavage events in a specific cell type. Additionally this methodology may be applied to generate antibodies against products of other proteases, which have a well-defined and non-promiscuous cleavage activity. PMID:21881607
DNA capture and next-generation sequencing can recover whole mitochondrial genomes from highly degraded samples for human identification

PubMed Central

2013-01-01

Background Mitochondrial DNA (mtDNA) typing can be a useful aid for identifying people from compromised samples when nuclear DNA is too damaged, degraded or below detection thresholds for routine short tandem repeat (STR)-based analysis. Standard mtDNA typing, focused on PCR amplicon sequencing of the control region (HVS I and HVS II), is limited by the resolving power of this short sequence, which misses up to 70% of the variation present in the mtDNA genome. Methods We used in-solution hybridisation-based DNA capture (using DNA capture probes prepared from modern human mtDNA) to recover mtDNA from post-mortem human remains in which the majority of DNA is both highly fragmented (<100 base pairs in length) and chemically damaged. The method ‘immortalises’ the finite quantities of DNA in valuable extracts as DNA libraries, which is followed by the targeted enrichment of endogenous mtDNA sequences and characterisation by next-generation sequencing (NGS). Results We sequenced whole mitochondrial genomes for human identification from samples where standard nuclear STR typing produced only partial profiles or demonstrably failed and/or where standard mtDNA hypervariable region sequences lacked resolving power. Multiple rounds of enrichment can substantially improve coverage and sequencing depth of mtDNA genomes from highly degraded samples. The application of this method has led to the reliable mitochondrial sequencing of human skeletal remains from unidentified World War Two (WWII) casualties approximately 70 years old and from archaeological remains (up to 2,500 years old). Conclusions This approach has potential applications in forensic science, historical human identification cases, archived medical samples, kinship analysis and population studies. In particular the methodology can be applied to any case, involving human or non-human species, where whole mitochondrial genome sequences are required to provide the highest level of maternal lineage discrimination. 
Multiple rounds of in-solution hybridisation-based DNA capture can retrieve whole mitochondrial genome sequences from even the most challenging samples. PMID:24289217
Use of Dried Blood Spots to Elucidate Full-Length Transmitted/Founder HIV-1 Genomes

PubMed Central

Salazar-Gonzalez, Jesus F.; Salazar, Maria G.; Tully, Damien C.; Ogilvie, Colin B.; Learn, Gerald H.; Allen, Todd M.; Heath, Sonya L.; Goepfert, Paul; Bar, Katharine J.

2016-01-01

Background Identification of HIV-1 genomes responsible for establishing clinical infection in newly infected individuals is fundamental to prevention and pathogenesis research. Processing, storage, and transportation of the clinical samples required to perform these virologic assays in resource-limited settings requires challenging venipuncture and cold chain logistics. Here, we validate the use of dried-blood spots (DBS) as a simple and convenient alternative to collecting and storing frozen plasma. Methods We performed parallel nucleic acid extraction, single genome amplification (SGA), next generation sequencing (NGS), and phylogenetic analyses on plasma and DBS. Results We demonstrated the capacity to extract viral RNA from DBS and perform SGA to infer the complete nucleotide sequence of the transmitted/founder (TF) HIV-1 envelope gene and full-length genome in two acutely infected individuals. Using both SGA and NGS methodologies, we showed that sequences generated from DBS and plasma display comparable phylogenetic patterns in both acute and chronic infection. SGA was successful on samples with a range of plasma viremia, including samples as low as 1,700 copies/ml and an estimated ∼50 viral copies per blood spot. Further, we demonstrated reproducible efficiency in gp160 env sequencing in DBS stored at ambient temperature for up to three weeks or at -20°C for up to five months. Conclusions These findings support the use of DBS as a practical and cost-effective alternative to frozen plasma for clinical trials and translational research conducted in resource-limited settings. PMID:27819061
Next-generation Sequencing (NGS) Analysis on Single Circulating Tumor Cells (CTCs) with No Need of Whole-genome Amplification (WGA).

PubMed

Palmirotta, Raffaele; Lovero, Domenica; Silvestris, Erica; Felici, Claudia; Quaresmini, Davide; Cafforio, Paola; Silvestris, Franco

2017-01-01

Isolation and genotyping of circulating tumor cells (CTCs) is gaining increasing interest among clinical researchers in oncology not only for investigative purposes, but also for concrete application in clinical practice in terms of diagnosis, prognosis and treatment decisions involving targeted therapies. For the mutational analysis of single CTCs, the most advanced biotechnology methodology currently available includes the combination of whole genome amplification (WGA) followed by next-generation sequencing (NGS). However, the sequence of these molecular techniques is time-consuming and may also favor operator-dependent errors, related to the procedures themselves that, as in the case of the WGA technique, might affect downstream molecular analyses. We performed a preliminary molecular analysis by NGS on a CTC model without a prior WGA step. We set up an artificial sample obtained by spiking the SK-MEL-28 melanoma cell line in normal donor peripheral whole blood. Melanoma cells were first enriched using an AutoMACS® (Miltenyi) cell separator and then isolated as single and pooled CTCs by DEPArray™ System (Silicon Biosystems). NGS analysis, using the Ion AmpliSeq™ Cancer Hotspot Panel v2 (Life Technologies) with the Ion Torrent PGM™ system (Life Technologies), was performed on the SK-MEL-28 cell pellet, a single CTC previously processed with WGA and on 1, 2, 4 and 8 recovered CTCs without WGA pre-amplification. NGS directly carried out on CTCs without WGA showed the same mutations identified in SK-MEL-28 cell line pellet, with a considerable efficiency and avoiding the errors induced by the WGA procedure. We identified a cost-effective, time-saving and reliable methodological approach that could improve the analytical accuracy of the liquid biopsy and appears promising in studying CTCs from cancer patients for both research and clinical purposes. Copyright© 2017, International Institute of Anticancer Research (Dr. George J. Delinasios), All rights reserved.
Analyzing the requirements for mass production of small wind turbine generators

NASA Astrophysics Data System (ADS)

Anuskiewicz, T.; Asmussen, J.; Frankenfield, O.

Mass producibility of small wind turbine generators to give manufacturers design and cost data for profitable production operations is discussed. A 15 kW wind turbine generator for production in annual volumes from 1,000 to 50,000 units is discussed. Methodology to cost the systems effectively is explained. The process estimate sequence followed is outlined with emphasis on the process estimate sheets compiled for each component and subsystem. These data enabled analysts to develop cost breakdown profiles crucial in manufacturing decision-making. The appraisal also led to various design recommendations including replacement of aluminum towers with cost effective carbon steel towers. Extensive cost information is supplied in tables covering subassemblies, capital requirements, and levelized energy costs. The physical layout of the plant is depicted to guide manufacturers in taking advantage of the growing business opportunity now offered in conjunction with the national need for energy development.
A weighted sampling algorithm for the design of RNA sequences with targeted secondary structure and nucleotide distribution.

PubMed

Reinharz, Vladimir; Ponty, Yann; Waldispühl, Jérôme

2013-07-01

The design of RNA sequences folding into predefined secondary structures is a milestone for many synthetic biology and gene therapy studies. Most of the current software uses similar local search strategies (i.e. a random seed is progressively adapted to acquire the desired folding properties) and more importantly do not allow the user to control explicitly the nucleotide distribution such as the GC-content in their sequences. However, the latter is an important criterion for large-scale applications as it could presumably be used to design sequences with better transcription rates and/or structural plasticity. In this article, we introduce IncaRNAtion, a novel algorithm to design RNA sequences folding into target secondary structures with a predefined nucleotide distribution. IncaRNAtion uses a global sampling approach and weighted sampling techniques. We show that our approach is fast (i.e. running time comparable or better than local search methods), seedless (we remove the bias of the seed in local search heuristics) and successfully generates high-quality sequences (i.e. thermodynamically stable) for any GC-content. To complete this study, we develop a hybrid method combining our global sampling approach with local search strategies. Remarkably, our glocal methodology overcomes both local and global approaches for sampling sequences with a specific GC-content and target structure. IncaRNAtion is available at csb.cs.mcgill.ca/incarnation/. Supplementary data are available at Bioinformatics online.
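The weighted-sampling idea in the abstract above can be illustrated with a much simpler toy. The sketch below is not the IncaRNAtion algorithm: it only draws sequences that are compatible with a dot-bracket target structure and biases the nucleotide choice with a single GC weight. The function names, the `gc_weight` parameter and the weighting scheme are all assumptions made for illustration.

```python
import random

# Canonical and wobble pairs allowed at paired positions.
PAIRS = [("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")]
BASES = ["A", "C", "G", "U"]


def pair_table(structure):
    """Map each '(' index to its matching ')' index in a dot-bracket string."""
    stack, partner = [], {}
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            partner[stack.pop()] = i
        elif ch != ".":
            raise ValueError("unexpected character: " + ch)
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def gc_count(s):
    """Number of G or C characters in a string."""
    return sum(ch in "GC" for ch in s)


def gc_content(seq):
    """Fraction of G or C nucleotides in a sequence."""
    return gc_count(seq) / len(seq)


def sample_sequence(structure, gc_weight=1.0, rng=random):
    """Sample one sequence compatible with `structure`.

    Each base or base pair is drawn with probability proportional to
    gc_weight ** (number of G/C it contains), so gc_weight > 1 favours GC
    (hypothetical weighting, chosen only for this sketch).
    """
    partner = pair_table(structure)
    seq = [None] * len(structure)
    pair_w = [gc_weight ** gc_count(a + b) for a, b in PAIRS]
    base_w = [gc_weight ** gc_count(b) for b in BASES]
    for i, ch in enumerate(structure):
        if ch == ".":
            seq[i] = rng.choices(BASES, weights=base_w)[0]
        elif ch == "(":
            a, b = rng.choices(PAIRS, weights=pair_w)[0]
            seq[i], seq[partner[i]] = a, b
    return "".join(seq)
```

Raising `gc_weight` shifts the average GC-content of the sampled sequences upward. Unlike the method described in the abstract, this toy ignores thermodynamic stability entirely.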
Evolution of poor reporting and inadequate methods over time in 20 920 randomised controlled trials included in Cochrane reviews: research on research study.

PubMed

Dechartres, Agnes; Trinquart, Ludovic; Atal, Ignacio; Moher, David; Dickersin, Kay; Boutron, Isabelle; Perrodeau, Elodie; Altman, Douglas G; Ravaud, Philippe

2017-06-08

Objective To examine how poor reporting and inadequate methods for key methodological features in randomised controlled trials (RCTs) have changed over the past three decades. Design Mapping of trials included in Cochrane reviews. Data sources Data from RCTs included in all Cochrane reviews published between March 2011 and September 2014 reporting an evaluation of the Cochrane risk of bias items: sequence generation, allocation concealment, blinding, and incomplete outcome data. Data extraction For each RCT, we extracted consensus on risk of bias made by the review authors and identified the primary reference to extract publication year and journal. We matched journal names with Journal Citation Reports to get 2014 impact factors. Main outcomes measures We considered the proportions of trials rated by review authors at unclear and high risk of bias as surrogates for poor reporting and inadequate methods, respectively. Results We analysed 20 920 RCTs (from 2001 reviews) published in 3136 journals. The proportion of trials with unclear risk of bias was 48.7% for sequence generation and 57.5% for allocation concealment; the proportion of those with high risk of bias was 4.0% and 7.2%, respectively. For blinding and incomplete outcome data, 30.6% and 24.7% of trials were at unclear risk and 33.1% and 17.1% were at high risk, respectively. Higher journal impact factor was associated with a lower proportion of trials at unclear or high risk of bias. The proportion of trials at unclear risk of bias decreased over time, especially for sequence generation, which fell from 69.1% in 1986-1990 to 31.2% in 2011-14 and for allocation concealment (70.1% to 44.6%). After excluding trials at unclear risk of bias, use of inadequate methods also decreased over time: from 14.8% to 4.6% for sequence generation and from 32.7% to 11.6% for allocation concealment. 
Conclusions Poor reporting and inadequate methods have decreased over time, especially for sequence generation and allocation concealment. But more could be done, especially in lower impact factor journals. Published by the BMJ Publishing Group Limited.
Teaching Research Methodology Using a Project-Based Three Course Sequence Critical Reflections on Practice

ERIC Educational Resources Information Center

Braguglia, Kay H.; Jackson, Kanata A.

2012-01-01

This article presents a reflective analysis of teaching research methodology through a three course sequence using a project-based approach. The authors reflect critically on their experiences in teaching research methods courses in an undergraduate business management program. The introduction of a range of specific techniques including student…
Fragmentation of contaminant and endogenous DNA in ancient samples determined by shotgun sequencing; prospects for human palaeogenomics.

PubMed

García-Garcerà, Marc; Gigli, Elena; Sanchez-Quinto, Federico; Ramirez, Oscar; Calafell, Francesc; Civit, Sergi; Lalueza-Fox, Carles

2011-01-01

Despite the successful retrieval of genomes from past remains, the prospects for human palaeogenomics remain unclear because of the difficulty of distinguishing contaminant from endogenous DNA sequences. Previous sequence data generated on high-throughput sequencing platforms indicate that fragmentation of ancient DNA sequences is a characteristic trait primarily arising due to depurination processes that create abasic sites leading to DNA breaks. METHODOLOGY/PRINCIPAL FINDINGS: To investigate whether this pattern is present in ancient remains from a temperate environment, we have 454-FLX pyrosequenced different samples dated between 5,500 and 49,000 years ago: a bone from an extinct goat (Myotragus balearicus) that was treated with a depurinating agent (bleach), an Iberian lynx bone not subjected to any treatment, a human Neolithic sample from Barcelona (Spain), and a Neandertal sample from the El Sidrón site (Asturias, Spain). The efficiency of retrieval of endogenous sequences is below 1% in all cases. We have used the non-human samples to identify human sequences (0.35 and 1.4%, respectively), that we positively know are contaminants. We observed that bleach treatment appears to create a depurination-associated fragmentation pattern in resulting contaminant sequences that is indistinguishable from previously described endogenous sequences. Furthermore, the nucleotide composition pattern observed in 5' and 3' ends of contaminant sequences is much more complex than the flat pattern previously described in some Neandertal contaminants. Although much research on samples with known contaminant histories is needed, our results suggest that endogenous and contaminant sequences cannot be distinguished by the fragmentation pattern alone.
HomoSAR: bridging comparative protein modeling with quantitative structural activity relationship to design new peptides.

PubMed

Borkar, Mahesh R; Pissurlenkar, Raghuvir R S; Coutinho, Evans C

2013-11-15

Peptides play significant roles in the biological world. To optimize activity for a specific therapeutic target, peptide library synthesis is inevitable, which is time-consuming and expensive. Computational approaches provide a promising way to simply elucidate the structural basis in the design of new peptides. Earlier, we proposed a novel methodology termed HomoSAR to gain insight into the structure activity relationships underlying peptides. Based on an integrated approach, HomoSAR uses the principles of homology modeling in conjunction with the quantitative structural activity relationship formalism to predict and design new peptide sequences with the optimum activity. In the present study, we establish that the HomoSAR methodology can be universally applied to all classes of peptides irrespective of sequence length by studying HomoSAR on three peptide datasets viz., angiotensin-converting enzyme inhibitory peptides, CAMEL-s antibiotic peptides, and hAmphiphysin-1 SH3 domain binding peptides, using a set of descriptors related to the hydrophobic, steric, and electronic properties of the 20 natural amino acids. Models generated for all three datasets have statistically significant correlation coefficients (r²), predictive r² (r²_pred) and cross-validated coefficients (q²_LOO). The elegance of this technique lies in its simplicity and ability to extract all the information contained in the peptides to elucidate the underlying structure activity relationships. The difficulties of correlating both sequence diversity and variation in length of the peptides with their biological activity can be addressed. The study has been able to identify the preferred or detrimental nature of amino acids at specific positions in the peptide sequences. Copyright © 2013 Wiley Periodicals, Inc.
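For readers unfamiliar with the statistics quoted above, the following sketch shows one common way to compute a fitted r² and a leave-one-out cross-validated q² for a model with a single descriptor. It is not the HomoSAR code. The one-descriptor linear model, the function names, and the convention q² = 1 - PRESS / SS_tot with the full-data mean are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single descriptor."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(xs, ys):
    """Coefficient of determination of the model fitted on all points."""
    a, b = fit_line(xs, ys)
    my = sum(ys) / len(ys)
    ss_res = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """Leave-one-out q2: each point is predicted by a model fitted without it."""
    my = sum(ys) / len(ys)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        a, b = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (a + b * xs[i])) ** 2
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - press / ss_tot
```

Both functions expect plain Python lists. For a least-squares fit q² can never exceed r², so a large gap between the two is usually read as a sign of overfitting.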

Ion Torrent sequencing as a tool for mutation discovery in the flax (Linum usitatissimum L.) genome.

PubMed

Galindo-González, Leonardo; Pinzón-Latorre, David; Bergen, Erik A; Jensen, Dustin C; Deyholos, Michael K

2015-01-01

Detection of induced mutations is valuable for inferring gene function and for developing novel germplasm for crop improvement. Many reverse genetics approaches have been developed to identify mutations in genes of interest within a mutagenized population, including some approaches that rely on next-generation sequencing (e.g. exome capture, whole genome resequencing). As an alternative to these genome or exome-scale methods, we sought to develop a scalable and efficient method for detection of induced mutations that could be applied to a small number of target genes, using Ion Torrent technology. We developed this method in flax (Linum usitatissimum), to demonstrate its utility in a crop species. We used an amplicon-based approach in which DNA samples from an ethyl methanesulfonate (EMS)-mutagenized population were pooled and used as template in PCR reactions to amplify a region of each gene of interest. Barcodes were incorporated during PCR, and the pooled amplicons were sequenced using an Ion Torrent PGM. A pilot experiment with known SNPs showed that they could be detected at a frequency > 0.3% within the pools. We then selected eight genes for which we wanted to discover novel mutations, and applied our approach to screen 768 individuals from the EMS population, using either the Ion 314 or Ion 316 chips. Out of 29 potential mutations identified after processing the NGS reads, 16 mutations were confirmed using Sanger sequencing. The methodology presented here demonstrates the utility of Ion Torrent technology in detecting mutation variants in specific genome regions for large populations of a species such as flax. The methodology could be scaled-up to test >100 genes using the higher capacity chips now available from Ion Torrent.
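The detection limit reported above (frequency > 0.3% within a pool) constrains how many individuals can share a pool. The sketch below works through that arithmetic under assumptions the abstract does not state: diploid individuals, a single heterozygous mutant per pool, equal DNA contribution from each individual, and unbiased amplification. The function names are hypothetical.

```python
def expected_allele_frequency(pool_size, ploidy=2, mutant_copies=1):
    """Expected mutant allele frequency in a pool of `pool_size` individuals."""
    return mutant_copies / (ploidy * pool_size)


def max_pool_size(detection_threshold, ploidy=2, mutant_copies=1):
    """Largest pool whose expected mutant frequency is strictly above the threshold."""
    n = int(mutant_copies / (ploidy * detection_threshold))
    # Step down in case rounding left the frequency at or below the threshold.
    while n > 0 and expected_allele_frequency(n, ploidy, mutant_copies) <= detection_threshold:
        n -= 1
    return n
```

Under these assumptions a 0.3% threshold corresponds to pools of at most 166 individuals. In practice, sequencing error and uneven pooling would call for considerably smaller pools.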
Songbirds and humans apply different strategies in a sound sequence discrimination task.

PubMed

Seki, Yoshimasa; Suzuki, Kenta; Osawa, Ayumi M; Okanoya, Kazuo

2013-01-01

The abilities of animals and humans to extract rules from sound sequences have previously been compared using observation of spontaneous responses and conditioning techniques. However, the results were inconsistently interpreted across studies possibly due to methodological and/or species differences. Therefore, we examined the strategies for discrimination of sound sequences in Bengalese finches and humans using the same protocol. Birds were trained on a GO/NOGO task to discriminate between two categories of sound stimulus generated based on an "AAB" or "ABB" rule. The sound elements used were taken from a variety of male (M) and female (F) calls, such that the sequences could be represented as MMF and MFF. In test sessions, FFM and FMM sequences, which were never presented in the training sessions but conformed to the rule, were presented as probe stimuli. The results suggested two discriminative strategies were being applied: (1) memorizing sound patterns of either GO or NOGO stimuli and generating the appropriate responses for only those sounds; and (2) using the repeated element as a cue. There was no evidence that the birds successfully extracted the abstract rule (i.e., AAB and ABB); MMF-GO subjects did not produce a GO response for FFM and vice versa. Next we examined whether those strategies were also applicable for human participants on the same task. The results and questionnaires revealed that participants extracted the abstract rule, and most of them employed it to discriminate the sequences. This strategy was never observed in bird subjects, although some participants used strategies similar to the birds when responding to the probe stimuli. Our results showed that the human participants applied the abstract rule in the task even without instruction but Bengalese finches did not, thereby reconfirming that humans have an ability to extract abstract rules from sound sequences that is distinct from that of non-human animals.
pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree

PubMed Central

2010-01-01

Background Likelihood-based phylogenetic inference is generally considered to be the most reliable classification method for unknown sequences. However, traditional likelihood-based phylogenetic methods cannot be applied to large volumes of short reads from next-generation sequencing due to computational complexity issues and lack of phylogenetic signal. "Phylogenetic placement," where a reference tree is fixed and the unknown query sequences are placed onto the tree via a reference alignment, is a way to bring the inferential power offered by likelihood-based approaches to large data sets. Results This paper introduces pplacer, a software package for phylogenetic placement and subsequent visualization. The algorithm can place twenty thousand short reads on a reference tree of one thousand taxa per hour per processor, has essentially linear time and memory complexity in the number of reference taxa, and is easy to run in parallel. Pplacer features calculation of the posterior probability of a placement on an edge, which is a statistically rigorous way of quantifying uncertainty on an edge-by-edge basis. It also can inform the user of the positional uncertainty for query sequences by calculating expected distance between placement locations, which is crucial in the estimation of uncertainty with a well-sampled reference tree. The software provides visualizations using branch thickness and color to represent number of placements and their uncertainty. A simulation study using reads generated from 631 COG alignments shows a high level of accuracy for phylogenetic placement over a wide range of alignment diversity, and the power of edge uncertainty estimates to measure placement confidence. Conclusions Pplacer enables efficient phylogenetic placement and subsequent visualization, making likelihood-based phylogenetics methodology practical for large collections of reads; it is freely available as source code, binaries, and a web service. PMID:21034504
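Two of the uncertainty measures mentioned above can be sketched in a few lines. This is not pplacer's implementation: the first function simply normalises per-edge maximum-likelihood scores into weights (a likelihood-weight-ratio style quantity, rather than the posterior probability the abstract describes), and the second computes a weighted mean distance between candidate placements. Function names, the input format and the absence of any normalisation (for example by total tree length) are assumptions.

```python
import math


def likelihood_weight_ratios(log_likelihoods):
    """Normalise per-edge log-likelihoods into weights that sum to 1.

    `log_likelihoods` maps an edge identifier to the log-likelihood of the
    best placement of the query on that edge (hypothetical input format).
    """
    m = max(log_likelihoods.values())  # subtract the max for numerical stability
    w = {e: math.exp(ll - m) for e, ll in log_likelihoods.items()}
    total = sum(w.values())
    return {e: v / total for e, v in w.items()}


def expected_distance(weights, distance):
    """Weighted mean tree distance between pairs of candidate placements.

    `distance(a, b)` must return the path length between the placements on
    edges a and b, and 0 when a == b.
    """
    edges = list(weights)
    return sum(weights[a] * weights[b] * distance(a, b)
               for a in edges for b in edges)
```

A query whose weight is spread over neighbouring edges gets a small expected distance, while one split between distant parts of the tree gets a large one, which is why the two numbers are informative together.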
Application of high-throughput sequencing to whole rabies viral genome characterisation and its use for phylogenetic re-evaluation of a raccoon strain incursion into the province of Ontario.

PubMed

Nadin-Davis, Susan A; Colville, Adam; Trewby, Hannah; Biek, Roman; Real, Leslie

2017-03-15

Raccoon rabies remains a serious public health problem throughout much of the eastern seaboard of North America due to the urban nature of the reservoir host and the many challenges inherent in multi-jurisdictional efforts to administer co-ordinated and comprehensive wildlife rabies control programmes. Better understanding of the mechanisms of spread of rabies virus can play a significant role in guiding such control efforts. To facilitate a detailed molecular epidemiological study of raccoon rabies virus movements across eastern North America, we developed a methodology to efficiently determine whole genome sequences of hundreds of viral samples. The workflow combines the generation of a limited number of overlapping amplicons covering the complete viral genome and use of high throughput sequencing technology. The value of this approach is demonstrated through a retrospective phylogenetic analysis of an outbreak of raccoon rabies which occurred in the province of Ontario between 1999 and 2005. As demonstrated by the number of single nucleotide polymorphisms detected, whole genome sequence data were far more effective than single gene sequences in discriminating between samples and this facilitated the generation of more robust and informative phylogenies that yielded insights into the spatio-temporal pattern of viral spread. With minor modification this approach could be applied to other rabies virus variants thereby facilitating greatly improved phylogenetic inference and thus better understanding of the spread of this serious zoonotic disease. Such information will inform the most appropriate strategies for rabies control in wildlife reservoirs. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.
Methodology for the nuclear design validation of an Alternate Emergency Management Centre (CAGE)

NASA Astrophysics Data System (ADS)

Hueso, César; Fabbri, Marco; de la Fuente, Cristina; Janés, Albert; Massuet, Joan; Zamora, Imanol; Gasca, Cristina; Hernández, Héctor; Vega, J. Ángel

2017-09-01

The methodology is devised by coupling different codes. Site weather data are analysed with ARCON96 to determine the relative concentrations of radionuclides in the air. The activity in the air is characterized with the RADTRAD code, depending on the source and release sequence specified in NUREG-1465; this provides the inner cloud source term contribution. Once the activities are known, energy spectra are inferred using ORIGEN-S and used as input for the models of the outer cloud, filters and containment generated with MCNP5. The sum of the different contributions must meet the habitability conditions specified by the CSN (Spanish Nuclear Regulatory Body) (TEDE <50 mSv and equivalent dose to the thyroid <500 mSv within 30 days following the accident), so the dose is optimized by varying parameters such as CAGE location, filtering flow, need for recirculation, thicknesses and compositions of the walls, etc. The results for the most penalizing area meet the established criteria, and therefore the CAGE building design based on the methodology presented is radiologically validated.
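The final acceptance step described above, summing the dose contributions and comparing them with the two habitability limits, can be written as a short check. Only the two limits come from the abstract. The function names, the input format and the pathway labels are hypothetical, and the real contributions would come from the coupled codes rather than from hand-entered values.

```python
# Habitability limits quoted in the abstract (30 days following the accident).
TEDE_LIMIT_MSV = 50.0
THYROID_LIMIT_MSV = 500.0


def total_dose(contributions):
    """Sum per-pathway doses.

    `contributions` maps a pathway label (e.g. inner cloud, outer cloud,
    filters) to a (tede_msv, thyroid_msv) pair; the format is an assumption.
    """
    tede = sum(c[0] for c in contributions.values())
    thyroid = sum(c[1] for c in contributions.values())
    return tede, thyroid


def meets_habitability(contributions):
    """True when both summed doses are strictly below their limits."""
    tede, thyroid = total_dose(contributions)
    return tede < TEDE_LIMIT_MSV and thyroid < THYROID_LIMIT_MSV
```

In an optimisation loop of the kind the abstract outlines, a check like this would be re-evaluated each time a design parameter such as wall thickness or filtering flow is varied.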
Development of a State Machine Sequencer for the Keck Interferometer: Evolution, Development and Lessons Learned using a CASE Tool Approach

NASA Technical Reports Server (NTRS)

Reder, Leonard J.; Booth, Andrew; Hsieh, Jonathan; Summers, Kellee

2004-01-01

This paper presents a discussion of the evolution of a sequencer from a simple EPICS (Experimental Physics and Industrial Control System) based sequencer into a complex implementation designed utilizing UML (Unified Modeling Language) methodologies and a CASE (Computer Aided Software Engineering) tool approach. The main purpose of the sequencer (called the IF Sequencer) is to provide overall control of the Keck Interferometer to enable science operations to be carried out by a single operator (and/or observer). The interferometer links the two 10m telescopes of the W. M. Keck Observatory at Mauna Kea, Hawaii. The IF Sequencer is a high-level, multi-threaded, Harel finite state machine software program designed to orchestrate several lower-level hardware and software hard real-time subsystems that must perform their work in a specific and sequential order. The sequencing need not be done in hard real-time. Each state machine thread commands either a high-speed real-time multiple mode embedded controller via CORBA, or slower controllers via EPICS Channel Access interfaces. The overall operation of the system is simplified by the automation. The UML is discussed and our use of it to implement the sequencer is presented. The decision to use the Rhapsody product as our CASE tool is explained and reflected upon. Most importantly, a section on lessons learned is presented and the difficulty of integrating CASE tool automatically generated C++ code into a large control system consisting of multiple infrastructures is presented.
Development of a state machine sequencer for the Keck Interferometer: evolution, development, and lessons learned using a CASE tool approach

NASA Astrophysics Data System (ADS)

Reder, Leonard J.; Booth, Andrew; Hsieh, Jonathan; Summers, Kellee R.

2004-09-01

This paper presents a discussion of the evolution of a sequencer from a simple Experimental Physics and Industrial Control System (EPICS) based sequencer into a complex implementation designed utilizing UML (Unified Modeling Language) methodologies and a Computer Aided Software Engineering (CASE) tool approach. The main purpose of the Interferometer Sequencer (called the IF Sequencer) is to provide overall control of the Keck Interferometer to enable science operations to be carried out by a single operator (and/or observer). The interferometer links the two 10m telescopes of the W. M. Keck Observatory at Mauna Kea, Hawaii. The IF Sequencer is a high-level, multi-threaded, Harel finite state machine software program designed to orchestrate several lower-level hardware and software hard real-time subsystems that must perform their work in a specific and sequential order. The sequencing need not be done in hard real-time. Each state machine thread commands either a high-speed real-time multiple mode embedded controller via CORBA, or slower controllers via EPICS Channel Access interfaces. The overall operation of the system is simplified by the automation. The UML is discussed and our use of it to implement the sequencer is presented. The decision to use the Rhapsody product as our CASE tool is explained and reflected upon. Most importantly, a section on lessons learned is presented and the difficulty of integrating CASE tool automatically generated C++ code into a large control system consisting of multiple infrastructures is presented.
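To make the idea of a sequencing state machine concrete, here is a minimal table-driven sketch. It is not the IF Sequencer: that system is described as a multi-threaded Harel state machine generated as C++ from UML, whereas this is a flat, single-threaded Python toy. The state names, event names and the `on_enter` hook are all hypothetical.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    ACQUIRING = auto()
    TRACKING = auto()
    FAULT = auto()


# Allowed transitions: (current state, event) -> next state.
TRANSITIONS = {
    (State.IDLE, "start"): State.ACQUIRING,
    (State.ACQUIRING, "locked"): State.TRACKING,
    (State.TRACKING, "stop"): State.IDLE,
    (State.FAULT, "reset"): State.IDLE,
}


class Sequencer:
    """Runs subsystem commands in a fixed order by reacting to events."""

    def __init__(self, on_enter=None):
        self.state = State.IDLE
        self.history = [State.IDLE]
        # Optional callbacks standing in for commands sent to subsystems.
        self.on_enter = on_enter or {}

    def dispatch(self, event):
        """Handle one event; return True if a transition was taken."""
        if event == "error":
            nxt = State.FAULT  # an error is accepted from any state
        else:
            nxt = TRANSITIONS.get((self.state, event))
            if nxt is None:
                return False  # event is not valid in the current state
        self.state = nxt
        self.history.append(nxt)
        action = self.on_enter.get(nxt)
        if action is not None:
            action()
        return True
```

Keeping the transitions in a table makes the permitted ordering explicit: a `locked` event is ignored unless acquisition has already been started, which is the kind of sequential constraint the abstract refers to.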
Anticipation measures of sequence learning: manual versus oculomotor versions of the serial reaction time task.

PubMed

Vakil, Eli; Bloch, Ayala; Cohen, Haggar

2017-03-01

The serial reaction time (SRT) task has generated a very large amount of research. Nevertheless the debate continues as to the exact cognitive processes underlying implicit sequence learning. Thus, the first goal of this study is to elucidate the underlying cognitive processes enabling sequence acquisition. We therefore compared reaction time (RT) in sequence learning in a standard manual activated (MA) to that in an ocular activated (OA) version of the task, within a single experimental setting. The second goal is to use eye movement measures to compare anticipation, as an additional indication of sequence learning, between the two versions of the SRT. Performance of the group given the MA version of the task (n = 29) was compared with that of the group given the OA version (n = 30). The results showed that although overall, RT was faster for the OA group, the rate of sequence learning was similar to that of the MA group performing the standard version of the SRT. Because the stimulus-response association is automatic and exists prior to training in the OA task, the decreased reaction time in this version of the task reflects a purer measure of the sequence learning that occurs in the SRT task. The results of this study show that eye tracking anticipation can be measured directly and can serve as a direct measure of sequence learning. Finally, using the OA version of the SRT to study sequence learning presents a significant methodological contribution by making sequence learning studies possible among populations that struggle to perform manual responses.
Versatile approach for functional analysis of human proteins and efficient stable cell line generation using FLP-mediated recombination system

PubMed Central

Szczesny, Roman J.; Kowalska, Katarzyna; Klosowska-Kosicka, Kamila; Chlebowski, Aleksander; Owczarek, Ewelina P.; Warkocki, Zbigniew; Kulinski, Tomasz M.; Adamska, Dorota; Affek, Kamila; Jedroszkowiak, Agata; Kotrys, Anna V.; Tomecki, Rafal; Krawczyk, Pawel S.; Borowski, Lukasz S.; Dziembowski, Andrzej

2018-01-01

Deciphering the function of a given protein requires investigating various biological aspects. Usually, the protein of interest is expressed with a fusion tag that aids or allows subsequent analyses. Additionally, downregulation or inactivation of the studied gene enables functional studies. Development of the CRISPR/Cas9 methodology opened many possibilities but in many cases it is restricted to non-essential genes. Recombinase-dependent gene integration methods, like the Flp-In system, are very good alternatives. The system is widely used in different research areas, which calls for the existence of compatible vectors and efficient protocols that ensure straightforward DNA cloning and generation of stable cell lines. We have created and validated a robust series of 52 vectors for streamlined generation of stable mammalian cell lines using the FLP recombinase-based methodology. Using the sequence-independent DNA cloning method all constructs for a given coding sequence can be made with just three universal PCR primers. Our collection allows tetracycline-inducible expression of proteins with various tags suitable for protein localization, FRET, bimolecular fluorescence complementation (BiFC), protein dynamics studies (FRAP), co-immunoprecipitation, the RNA tethering assay and cell sorting. Some of the vectors contain a bidirectional promoter for concomitant expression of miRNA and mRNA, so that a gene can be silenced and its product replaced by a mutated miRNA-insensitive version. Our toolkit and protocols have allowed us to create more than 500 constructs with ease. We demonstrate the efficacy of our vectors by creating stable cell lines with various tagged proteins (numatrin, fibrillarin, coilin, centrin, THOC5, PCNA). We have analysed transgene expression over time to provide a guideline for future experiments and compared the effectiveness of commonly used inducers for tetracycline-responsive promoters. As proof of concept we examined the role of the exoribonuclease XRN2 in transcription termination by RNAseq. PMID:29590189
Analysis of accident sequences and source terms at waste treatment and storage facilities for waste generated by U.S. Department of Energy Waste Management Operations, Volume 3: Appendixes C-H

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mueller, C.; Nabelssi, B.; Roglans-Ribas, J.

1995-04-01

This report contains the Appendices for the Analysis of Accident Sequences and Source Terms at Waste Treatment and Storage Facilities for Waste Generated by the U.S. Department of Energy Waste Management Operations. The main report documents the methodology, computational framework, and results of facility accident analyses performed as a part of the U.S. Department of Energy (DOE) Waste Management Programmatic Environmental Impact Statement (WM PEIS). The accident sequences potentially important to human health risk are specified, their frequencies are assessed, and the resultant radiological and chemical source terms are evaluated. A personal computer-based computational framework and database have been developed that provide these results as input to the WM PEIS for calculation of human health risk impacts. This report summarizes the accident analyses and aggregates the key results for each of the waste streams. Source terms are estimated and results are presented for each of the major DOE sites and facilities by WM PEIS alternative for each waste stream. The appendices identify the potential atmospheric release of each toxic chemical or radionuclide for each accident scenario studied. They also provide discussion of specific accident analysis data and guidance used or consulted in this report.
Viruses and Multiple Sclerosis

PubMed Central

Virtanen, Jussi Oskari; Jacobson, Steve

2016-01-01

Multiple sclerosis (MS) is a heterogeneous disease that develops as an interplay between the immune system and environmental stimuli in genetically susceptible individuals. There is increasing evidence that viruses may play a role in MS pathogenesis acting as these environmental triggers. However, it is not known if any single virus is causal, or rather several viruses can act as triggers in disease development. Here, we review the association of different viruses to MS with an emphasis on two herpesviruses, Epstein-Barr virus (EBV) and human herpesvirus 6 (HHV-6). These two agents have generated the most impact during recent years as possible co-factors in MS disease development. The strongest argument for association of EBV with MS comes from the link between symptomatic infectious mononucleosis and MS and from seroepidemiological studies. In contrast to EBV, HHV-6 has been found significantly more often in MS plaques than in MS normal appearing white matter or non-MS brains and HHV-6 re-activation has been reported during MS clinical relapses. In this review we also suggest new strategies, including the development of new infectious animal models of MS and antiviral MS clinical trials, to elucidate roles of different viruses in the pathogenesis of this disease. Furthermore, we introduce the idea of using unbiased sequence-independent pathogen discovery methodologies, such as next generation sequencing, to study MS brain tissue or body fluids for detection of known viral sequences or potential novel viral agents. PMID:22583435
Homopolymer tail-mediated ligation PCR: a streamlined and highly efficient method for DNA cloning and library construction

PubMed Central

Lazinski, David W.; Camilli, Andrew

2013-01-01

The amplification of DNA fragments, cloned between user-defined 5′ and 3′ end sequences, is a prerequisite step in the use of many current applications including massively parallel sequencing (MPS). Here we describe an improved method, called homopolymer tail-mediated ligation PCR (HTML-PCR), that requires very little starting template, minimal hands-on effort, is cost-effective, and is suited for use in high-throughput and robotic methodologies. HTML-PCR starts with the addition of homopolymer tails of controlled lengths to the 3′ termini of a double-stranded genomic template. The homopolymer tails enable the annealing-assisted ligation of a hybrid oligonucleotide to the template's recessed 5′ ends. The hybrid oligonucleotide has a user-defined sequence at its 5′ end. This primer, together with a second primer composed of a longer region complementary to the homopolymer tail and fused to a second 5′ user-defined sequence, are used in a PCR reaction to generate the final product. The user-defined sequences can be varied to enable compatibility with a wide variety of downstream applications. We demonstrate our new method by constructing MPS libraries starting from nanogram and sub-nanogram quantities of Vibrio cholerae and Streptococcus pneumoniae genomic DNA. PMID:23311318
Allele-specific copy-number discovery from whole-genome and whole-exome sequencing

PubMed Central

Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J.; Szatkiewicz, Jin P.

2015-01-01

Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol are freely available at https://sourceforge.net/projects/asgenseng/. PMID:25883151
A Next-Generation Sequencing Primer—How Does It Work and What Can It Do?

PubMed Central

Alekseyev, Yuriy O.; Fazeli, Roghayeh; Yang, Shi; Basran, Raveen; Miller, Nancy S.

2018-01-01

Next-generation sequencing refers to a high-throughput technology that determines the nucleic acid sequences and identifies variants in a sample. The technology has been introduced into clinical laboratory testing and produces test results for precision medicine. Since next-generation sequencing is relatively new, graduate students, medical students, pathology residents, and other physicians may benefit from a primer to provide a foundation about basic next-generation sequencing methods and applications, as well as specific examples where it has had diagnostic and prognostic utility. Next-generation sequencing technology grew out of advances in multiple fields to produce a sophisticated laboratory test with tremendous potential. Next-generation sequencing may be used in the clinical setting to look for specific genetic alterations in patients with cancer, diagnose inherited conditions such as cystic fibrosis, and detect and profile microbial organisms. This primer will review DNA sequencing technology, the commercialization of next-generation sequencing, and clinical uses of next-generation sequencing. Specific applications where next-generation sequencing has demonstrated utility in oncology are provided. PMID:29761157
Clinical trials in palliative care: a systematic review of their methodological characteristics and of the quality of their reporting.

PubMed

Bouça-Machado, Raquel; Rosário, Madalena; Alarcão, Joana; Correia-Guedes, Leonor; Abreu, Daisy; Ferreira, Joaquim J

2017-01-25

Over the past decades there has been a significant increase in the number of published clinical trials in palliative care. However, empirical evidence suggests that there are methodological problems in the design and conduct of studies, which raises questions about the validity and generalisability of the results and of the strength of the available evidence. We sought to evaluate the methodological characteristics and assess the quality of reporting of clinical trials in palliative care. We performed a systematic review of published clinical trials assessing therapeutic interventions in palliative care. Trials were identified using MEDLINE (from its inception to February 2015). We assessed methodological characteristics and described the quality of reporting using the Cochrane Risk of Bias tool. We retrieved 107 studies. The most common medical field studied was oncology, and 43.9% of trials evaluated pharmacological interventions. Symptom control and physical dimensions (e.g. intervention on pain, breathlessness, nausea) were the palliative care-specific issues most studied. We found under-reporting of key information, in particular on random sequence generation, allocation concealment, and blinding. While the number of clinical trials in palliative care has increased over time, methodological quality remains suboptimal. This compromises the quality of studies. Therefore, a greater effort is needed to enable the appropriate performance of future studies and increase the robustness of evidence-based medicine in this important field.
Highly sensitive detection of mutations in CHO cell recombinant DNA using multi-parallel single molecule real-time DNA sequencing.

PubMed

Cartwright, Joseph F; Anderson, Karin; Longworth, Joseph; Lobb, Philip; James, David C

2018-06-01

High-fidelity replication of biologic-encoding recombinant DNA sequences by engineered mammalian cell cultures is an essential pre-requisite for the development of stable cell lines for the production of biotherapeutics. However, immortalized mammalian cells characteristically exhibit an increased point mutation frequency compared to mammalian cells in vivo, both across their genomes and at specific loci (hotspots). Thus unforeseen mutations in recombinant DNA sequences can arise and be maintained within producer cell populations. These may both affect the stability of recombinant gene expression and give rise to protein sequence variants with variable bioactivity and immunogenicity. Rigorous quantitative assessment of recombinant DNA integrity should therefore form part of the cell line development process and be an essential quality assurance metric for instances where synthetic/multi-component assemblies are utilized to engineer mammalian cells, such as the assessment of recombinant DNA fidelity or the mutability of single-site integration target loci. Based on Pacific Biosciences (Menlo Park, CA) single molecule real-time (SMRT™) circular consensus sequencing (CCS) technology we developed a rDNA sequence analysis tool to process the multi-parallel sequencing of ∼40,000 single recombinant DNA molecules. After statistical filtering of raw sequencing data, we show that this analytical method is capable of detecting single point mutations in rDNA to a minimum single mutation frequency of 0.0042% (<1/24,000 bases). Using a stable CHO transfectant pool harboring a randomly integrated 5 kb plasmid construct encoding GFP we found that 28% of recombinant plasmid copies contained at least one low frequency (<0.3%) point mutation. These mutations were predominantly found in GC base pairs (85%), and there was no positional bias in mutation across the plasmid sequence. There was no discernible difference between the mutation frequencies of coding and non-coding DNA. The putative ratio of non-synonymous and synonymous changes within the open reading frames (ORFs) in the plasmid sequence indicates that natural selection does not impact upon the prevalence of these mutations. Here we have demonstrated the abundance of mutations that fall outside of the reported range of detection of next generation sequencing (NGS) and second generation sequencing (SGS) platforms, providing a methodology capable of being utilized in cell line development platforms to identify the fidelity of recombinant genes throughout the production process. © 2018 Wiley Periodicals, Inc.
BMPOS: a Flexible and User-Friendly Tool Sets for Microbiome Studies.

PubMed

Pylro, Victor S; Morais, Daniel K; de Oliveira, Francislon S; Dos Santos, Fausto G; Lemos, Leandro N; Oliveira, Guilherme; Roesch, Luiz F W

2016-08-01

Recent advances in science and technology are leading to a revision and re-orientation of methodologies, addressing old and current issues under a new perspective. Advances in next generation sequencing (NGS) are allowing comparative analysis of the abundance and diversity of whole microbial communities, generating a large amount of data and findings at a systems level. The current limitation for biologists has been the increasing demand for computational power and training required for processing of NGS data. Here, we describe the deployment of the Brazilian Microbiome Project Operating System (BMPOS), a flexible and user-friendly Linux distribution dedicated to microbiome studies. The Brazilian Microbiome Project (BMP) has developed data analysis pipelines for metagenomic studies (phylogenetic marker genes), conducted using the two main high-throughput sequencing platforms (Ion Torrent and Illumina MiSeq). The BMPOS is freely available and includes all the bioinformatics packages and databases required to run the pipelines suggested by the BMP team. The BMPOS may be used as a bootable live USB stick or installed in any computer with at least 1 GHz CPU and 512 MB RAM, independent of the operating system previously installed. The BMPOS has proved to be effective for sequence processing, sequence clustering, alignment, taxonomic annotation, statistical analysis, and plotting of metagenomic data. The BMPOS has been used during several metagenomic analysis courses, being valuable as a tool for training, and an excellent starting point for anyone interested in performing metagenomic studies. The BMPOS and its documentation are available at http://www.brmicrobiome.org.
A Simple and Efficient Method for Assembling TALE Protein Based on Plasmid Library

PubMed Central

Xu, Huarong; Xin, Ying; Zhang, Tingting; Ma, Lixia; Wang, Xin; Chen, Zhilong; Zhang, Zhiying

2013-01-01

The DNA binding domain of the transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied in a variety of species for genome engineering. However, assembling long TALE tandem repeats remains a big challenge precluding wide use of this technology. Although several new methodologies for efficiently assembling TALE repeats have been recently reported, all of them require either sophisticated facilities or skilled technicians to carry them out. Here, we describe a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on a TALE repeat tetramer library. A tetramer library consisting of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed for amplification of these tetramers. PCR products were assembled by one step of digestion/ligation reaction. 12 TALE constructs including 4 TALEN pairs targeted to mouse Gt(ROSA)26Sor gene and mouse Mstn gene sequences as well as 4 TALE-TF constructs targeted to mouse Oct4, c-Myc, Klf4 and Sox2 gene promoter sequences were generated by using our method. The construction routines took 3 days and parallel constructions were available. The rate of positive clones during colony PCR verification was 64% on average. Sequencing results suggested that all TALE constructs were constructed with a high success rate. This is a rapid and cost-efficient method using the most common enzymes and facilities with a high success rate. PMID:23840477
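The library size quoted in this abstract follows from simple counting: with four bases per position and four positions, there are 4^4 = 256 possible tetramers. The Python sketch below enumerates them as an illustration only. The function names are hypothetical, and the tiling of a target into consecutive 4-base blocks is an assumption made for the example, not a description of the authors' protocol.

```python
from itertools import product

BASES = "ACGT"  # the four DNA bases


def all_tetramers():
    """Enumerate every 4-base combination (4**4 = 256 entries)."""
    return ["".join(combo) for combo in product(BASES, repeat=4)]


def split_into_tetramers(target):
    """Split a target into consecutive 4-base blocks.

    Hypothetical helper: assumes the target length is a multiple of 4,
    which the abstract does not state.
    """
    if len(target) % 4 != 0:
        raise ValueError("target length must be a multiple of 4")
    return [target[i:i + 4] for i in range(0, len(target), 4)]


if __name__ == "__main__":
    library = set(all_tetramers())
    blocks = split_into_tetramers("ACGTTTGACCAT")
    # Every block of a valid DNA target is present in the full library.
    print(len(library), blocks, all(b in library for b in blocks))
```

Because the library is exhaustive, any target made only of A, C, G and T can be covered by library members under this tiling assumption.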
A simple and efficient method for assembling TALE protein based on plasmid library.

PubMed

Zhang, Zhiqiang; Li, Duo; Xu, Huarong; Xin, Ying; Zhang, Tingting; Ma, Lixia; Wang, Xin; Chen, Zhilong; Zhang, Zhiying

2013-01-01

The DNA binding domain of the transcription activator-like effectors (TALEs) from Xanthomonas sp. consists of tandem repeats that can be rearranged according to a simple cipher to target new DNA sequences with high DNA-binding specificity. This technology has been successfully applied in a variety of species for genome engineering. However, assembling long TALE tandem repeats remains a big challenge precluding wide use of this technology. Although several new methodologies for efficiently assembling TALE repeats have been recently reported, all of them require either sophisticated facilities or skilled technicians to carry them out. Here, we describe a simple and efficient method for generating customized TALE nucleases (TALENs) and TALE transcription factors (TALE-TFs) based on a TALE repeat tetramer library. A tetramer library consisting of 256 tetramers covers all possible combinations of 4 base pairs. A set of unique primers was designed for amplification of these tetramers. PCR products were assembled by one step of digestion/ligation reaction. 12 TALE constructs including 4 TALEN pairs targeted to mouse Gt(ROSA)26Sor gene and mouse Mstn gene sequences as well as 4 TALE-TF constructs targeted to mouse Oct4, c-Myc, Klf4 and Sox2 gene promoter sequences were generated by using our method. The construction routines took 3 days and parallel constructions were available. The rate of positive clones during colony PCR verification was 64% on average. Sequencing results suggested that all TALE constructs were constructed with a high success rate. This is a rapid and cost-efficient method using the most common enzymes and facilities with a high success rate.
Metatranscriptomics and Pyrosequencing Facilitate Discovery of Potential Viral Natural Enemies of the Invasive Caribbean Crazy Ant, Nylanderia pubens

PubMed Central

Valles, Steven M.; Oi, David H.; Yu, Fahong; Tan, Xin-Xing; Buss, Eileen A.

2012-01-01

Background: Nylanderia pubens (Forel) is an invasive ant species that in recent years has developed into a serious nuisance problem in the Caribbean and United States. A rapidly expanding range, explosive localized population growth, and control difficulties have elevated this ant to pest status. Professional entomologists and the pest control industry in the United States are urgently trying to understand its biology and develop effective control methods. Currently, no known biological-based control agents are available for use in controlling N. pubens. Methodology and Principal Findings: Metagenomics and pyrosequencing techniques were employed to examine the transcriptome of field-collected N. pubens colonies in an effort to identify virus infections with potential to serve as control agents against this pest ant. Pyrosequencing (454-platform) of a non-normalized N. pubens expression library generated 1,306,177 raw sequence reads comprising 450 Mbp. Assembly resulted in generation of 59,017 non-redundant sequences, including 27,348 contigs and 31,669 singlets. BLAST analysis of these non-redundant sequences identified 51 of potential viral origin. Additional analyses winnowed this list of potential viruses to three that appear to replicate in N. pubens. Conclusions: Pyrosequencing the transcriptome of field-collected samples of N. pubens has identified at least three sequences that are likely of viral origin and for which N. pubens serves as host. In addition, the N. pubens transcriptome provides a genetic resource for the scientific community, which is especially important at this early stage of developing a knowledge base for this new pest. PMID:22384082

KRAS mutation detection in colorectal cancer by a commercially available gene chip array compares well with Sanger sequencing.

PubMed

French, Deborah; Smith, Andrew; Powers, Martin P; Wu, Alan H B

2011-08-17

Binding of a ligand to the epidermal growth factor receptor (EGFR) stimulates various intracellular signaling pathways resulting in cell cycle progression, proliferation, angiogenesis and apoptosis inhibition. KRAS is involved in signaling pathways including RAF/MAPK and PI3K and mutations in this gene result in constitutive activation of these pathways, independent of EGFR activation. Seven mutations in codons 12 and 13 of KRAS comprise around 95% of the observed human mutations, rendering monoclonal antibodies against EGFR (e.g. cetuximab and panitumumab) useless in treatment of colorectal cancer. KRAS mutation testing by two different methodologies, Sanger sequencing and the AutoGenomics INFINITI® assay, was compared on DNA extracted from colorectal cancers. Out of 29 colorectal tumor samples tested, 28 were concordant between the two methodologies for the KRAS mutations that were detected in both assays, with the INFINITI® assay detecting a mutation in one sample that was indeterminate by Sanger sequencing and by a third methodology, single nucleotide primer extension. This study indicates the utility of the AutoGenomics INFINITI® methodology in a clinical laboratory setting where technical expertise or access to equipment for DNA sequencing does not exist. Copyright © 2011 Elsevier B.V. All rights reserved.
The genetic architecture of long QT syndrome: A critical reappraisal.

PubMed

Giudicessi, John R; Wilde, Arthur A M; Ackerman, Michael J

2018-03-30

Collectively, the completion of the Human Genome Project and subsequent development of high-throughput next-generation sequencing methodologies have revolutionized genomic research. However, the rapid sequencing and analysis of thousands upon thousands of human exomes and genomes has taught us that most genes, including those known to cause heritable cardiovascular disorders such as long QT syndrome, harbor an unexpected background rate of rare, and presumably innocuous, non-synonymous genetic variation. In this Review, we aim to reappraise the genetic architecture underlying both the acquired and congenital forms of long QT syndrome by examining how the clinical phenotype associated with and background genetic variation in long QT syndrome-susceptibility genes impacts the clinical validity of existing gene-disease associations and the variant classification and reporting strategies that serve as the foundation for diagnostic long QT syndrome genetic testing. Copyright © 2018 Elsevier Inc. All rights reserved.
Development of a diaphragmatic motion-based elastography framework for assessment of liver stiffness

NASA Astrophysics Data System (ADS)

Weis, Jared A.; Johnsen, Allison M.; Wile, Geoffrey E.; Yankeelov, Thomas E.; Abramson, Richard G.; Miga, Michael I.

2015-03-01

Evaluation of mechanical stiffness imaging biomarkers, through magnetic resonance elastography (MRE), has shown considerable promise for non-invasive assessment of liver stiffness to monitor hepatic fibrosis. MRE typically requires specialized externally-applied vibratory excitation and scanner-specific motion-sensitive pulse sequences. In this work, we have developed an elasticity imaging approach that utilizes natural diaphragmatic respiratory motion to induce deformation and eliminates the need for external deformation excitation hardware and specialized pulse sequences. Our approach uses clinically-available standard of care volumetric imaging acquisitions, combined with offline model-based post-processing to generate volumetric estimates of stiffness within the liver and surrounding tissue structures. We have previously developed a novel methodology for non-invasive elasticity imaging which utilizes a model-based elasticity reconstruction algorithm and MR image volumes acquired under different states of deformation. In prior work, deformation was externally applied through inflation of an air bladder placed within the MR radiofrequency coil. In this work, we extend the methodology with the goal of determining the feasibility of assessing liver mechanical stiffness using diaphragmatic respiratory motion between end-inspiration and end-expiration breath-holds as a source of deformation. We present initial investigations towards applying this methodology to assess liver stiffness in healthy volunteers and cirrhotic patients. Our preliminary results suggest that this method is capable of non-invasive image-based assessment of liver stiffness using natural diaphragmatic respiratory motion and motivate extension of our approach towards monitoring liver stiffness in cirrhotic patients with limited impact on the standard-of-care clinical imaging acquisition workflow.
Enhanced spatio-temporal alignment of plantar pressure image sequences using B-splines.

PubMed

Oliveira, Francisco P M; Tavares, João Manuel R S

2013-03-01

This article presents an enhanced methodology to align plantar pressure image sequences simultaneously in time and space. The temporal alignment of the sequences is accomplished using B-splines in the time modeling, and the spatial alignment can be attained using several geometric transformation models. The methodology was tested on a dataset of 156 real plantar pressure image sequences (3 sequences for each foot of the 26 subjects) that was acquired using a common commercial plate during barefoot walking. In the alignment of image sequences that were synthetically deformed both in time and space, an outstanding accuracy was achieved with the cubic B-splines. This accuracy was significantly better (p < 0.001) than the one obtained using the best solution proposed in our previous work. When applied to align real image sequences with unknown transformation involved, the alignment based on cubic B-splines also achieved superior results than our previous methodology (p < 0.001). The consequences of the temporal alignment on the dynamic center of pressure (COP) displacement was also assessed by computing the intraclass correlation coefficients (ICC) before and after the temporal alignment of the three image sequence trials of each foot of the associated subject at six time instants. The results showed that, generally, the ICCs related to the medio-lateral COP displacement were greater when the sequences were temporally aligned than the ICCs of the original sequences. Based on the experimental findings, one can conclude that the cubic B-splines are a remarkable solution for the temporal alignment of plantar pressure image sequences. These findings also show that the temporal alignment can increase the consistency of the COP displacement on related acquired plantar pressure image sequences.
Small studies may overestimate the effect sizes in critical care meta-analyses: a meta-epidemiological study

PubMed Central

2013-01-01

Introduction: Small-study effects refer to the fact that trials with limited sample sizes are more likely to report larger beneficial effects than large trials. However, this has never been investigated in critical care medicine. Thus, the present study aimed to examine the presence and extent of small-study effects in critical care medicine. Methods: Critical care meta-analyses that involved randomized controlled trials and reported mortality as an outcome measure were considered eligible for the study. Component trials were classified as large (≥100 patients per arm) and small (<100 patients per arm) according to their sample sizes. The ratio of odds ratios (ROR) was calculated for each meta-analysis and then RORs were combined using a meta-analytic approach. ROR<1 indicated a larger beneficial effect in small trials. Small and large trials were compared in methodological qualities including sequence generation, blinding, allocation concealment, intention to treat and sample size calculation. Results: A total of 27 critical care meta-analyses involving 317 trials were included. Of them, five meta-analyses showed statistically significant RORs <1, and the other meta-analyses did not reach statistical significance. Overall, the pooled ROR was 0.60 (95% CI: 0.53 to 0.68); the heterogeneity was moderate with an I² of 50.3% (chi-squared = 52.30; P = 0.002). Large trials showed significantly better reporting quality than small trials in terms of sequence generation, allocation concealment, blinding, intention to treat, sample size calculation and incomplete follow-up data. Conclusions: Small trials are more likely to report larger beneficial effects than large trials in critical care medicine, which could be partly explained by the lower methodological quality in small trials. Caution should be exercised in the interpretation of meta-analyses involving small trials. PMID:23302257
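The ratio of odds ratios described in this abstract can be illustrated with a short calculation. The Python sketch below assumes that ROR is the pooled odds ratio of small trials divided by that of large trials, so that ROR<1 means a larger apparent benefit in small trials when an odds ratio below 1 is beneficial. It pools log RORs with fixed-effect inverse-variance weights. The abstract does not specify the pooling model, so that choice, the function names and the input numbers are assumptions for illustration, not data from the study.

```python
import math


def log_ror(or_small, se_small, or_large, se_large):
    """Log ratio of odds ratios (small / large) and its standard error.

    se_small and se_large are standard errors of the log odds ratios;
    the two estimates are assumed to be independent.
    """
    estimate = math.log(or_small) - math.log(or_large)
    std_err = math.sqrt(se_small ** 2 + se_large ** 2)
    return estimate, std_err


def pool_fixed_effect(log_estimates):
    """Inverse-variance (fixed-effect) pooling of (estimate, se) pairs."""
    weights = [1.0 / se ** 2 for _, se in log_estimates]
    pooled = sum(w * est for w, (est, _) in zip(weights, log_estimates))
    pooled /= sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


if __name__ == "__main__":
    # Illustrative inputs only: (OR small, SE, OR large, SE) per meta-analysis.
    inputs = [(0.50, 0.30, 0.90, 0.15), (0.70, 0.25, 0.95, 0.10)]
    per_meta = [log_ror(*row) for row in inputs]
    pooled, pooled_se = pool_fixed_effect(per_meta)
    low = math.exp(pooled - 1.96 * pooled_se)
    high = math.exp(pooled + 1.96 * pooled_se)
    print(f"pooled ROR = {math.exp(pooled):.2f} (95% CI {low:.2f} to {high:.2f})")
```

Working on the log scale keeps the ratio symmetric around 1 and makes the confidence interval a simple back-transformation of the pooled estimate.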
RBT-GA: a novel metaheuristic for solving the Multiple Sequence Alignment problem.

PubMed

Taheri, Javid; Zomaya, Albert Y

2009-07-07

Multiple Sequence Alignment (MSA) has always been an active area of research in Bioinformatics. MSA is mainly focused on discovering biologically meaningful relationships among different sequences or proteins in order to investigate the underlying main characteristics/functions. This information is also used to generate phylogenetic trees. This paper presents a novel approach, namely RBT-GA, to solve the MSA problem using a hybrid solution methodology combining the Rubber Band Technique (RBT) and the Genetic Algorithm (GA) metaheuristic. RBT is inspired by the behavior of an elastic Rubber Band (RB) on a plate with several poles, which are analogous to locations in the input sequences that could potentially be biologically related. A GA attempts to mimic the evolutionary processes of life in order to locate optimal solutions in an often very complex landscape. RBT-GA is a population-based optimization algorithm designed to find the optimal alignment for a set of input protein sequences. In this novel technique, each alignment answer is modeled as a chromosome consisting of several poles in the RBT framework. These poles resemble locations in the input sequences that are most likely to be correlated and/or biologically related. A GA-based optimization process improves these chromosomes gradually, yielding a set of mostly optimal answers for the MSA problem. RBT-GA is tested with one of the well-known benchmark suites (BALiBASE 2.0) in this area. The obtained results show the superiority of the proposed technique even in the case of formidable sequences.
Bidirectional Retroviral Integration Site PCR Methodology and Quantitative Data Analysis Workflow.

PubMed

Suryawanshi, Gajendra W; Xu, Song; Xie, Yiming; Chou, Tom; Kim, Namshin; Chen, Irvin S Y; Kim, Sanggu

2017-06-14

Integration Site (IS) assays are a critical component of the study of retroviral integration sites and their biological significance. In recent retroviral gene therapy studies, IS assays, in combination with next-generation sequencing, have been used as a cell-tracking tool to characterize clonal stem cell populations sharing the same IS. For the accurate comparison of repopulating stem cell clones within and across different samples, the detection sensitivity, data reproducibility, and high-throughput capacity of the assay are among the most important assay qualities. This work provides a detailed protocol and data analysis workflow for bidirectional IS analysis. The bidirectional assay can simultaneously sequence both upstream and downstream vector-host junctions. Compared to conventional unidirectional IS sequencing approaches, the bidirectional approach significantly improves IS detection rates and the characterization of integration events at both ends of the target DNA. The data analysis pipeline described here accurately identifies and enumerates identical IS sequences through multiple steps of comparison that map IS sequences onto the reference genome and determine sequencing errors. Using an optimized assay procedure, we have recently published the detailed repopulation patterns of thousands of Hematopoietic Stem Cell (HSC) clones following transplant in rhesus macaques, demonstrating for the first time the precise time point of HSC repopulation and the functional heterogeneity of HSCs in the primate system. The following protocol describes the step-by-step experimental procedure and data analysis workflow that accurately identifies and quantifies identical IS sequences.
The ChIP-exo Method: Identifying Protein-DNA Interactions with Near Base Pair Precision.

PubMed

Perreault, Andrea A; Venters, Bryan J

2016-12-23

Chromatin immunoprecipitation (ChIP) is an indispensable tool in the fields of epigenetics and gene regulation that isolates specific protein-DNA interactions. ChIP coupled to high throughput sequencing (ChIP-seq) is commonly used to determine the genomic location of proteins that interact with chromatin. However, ChIP-seq is hampered by relatively low mapping resolution of several hundred base pairs and high background signal. The ChIP-exo method is a refined version of ChIP-seq that substantially improves upon both resolution and noise. The key distinction of the ChIP-exo methodology is the incorporation of lambda exonuclease digestion in the library preparation workflow to effectively footprint the left and right 5' DNA borders of the protein-DNA crosslink site. The ChIP-exo libraries are then subjected to high throughput sequencing. The resulting data can be leveraged to provide unique and ultra-high resolution insights into the functional organization of the genome. Here, we describe the ChIP-exo method that we have optimized and streamlined for mammalian systems and next-generation sequencing-by-synthesis platforms.
Construction of sequences of exact analytical solutions for heat diffusion in graded heterogeneous materials by the Darboux transformation method. Examples for half-space

NASA Astrophysics Data System (ADS)

Krapez, J.-C.

2016-09-01

The Darboux transformation is a differential transformation which, like other related methods (supersymmetry quantum mechanics-SUSYQM, factorization method), allows generating sequences of solvable potentials for the stationary 1D Schrödinger equation. It was recently shown that the heat equation in graded heterogeneous media, after a Liouville transformation, reduces to a pair of Schrödinger equations sharing the same potential function, one for the transformed temperature and one for the square root of effusivity. Repeated joint PROperty and Field Darboux Transformations (PROFIDT method) then yield two sequences of solutions: one of new solvable effusivity profiles and one of the corresponding temperature fields. In this paper we present and discuss the outcome in the case of a graded half-space domain. The interest in this methodology is that it provides closed-form solutions based on elementary functions. They are thus easily amenable to an implementation in an inversion process aimed, for example, at retrieving a subsurface effusivity profile from a modulated or transient surface temperature measurement (photothermal characterization).
CLIP-related methodologies and their application to retrovirology.

PubMed

Bieniasz, Paul D; Kutluay, Sebla B

2018-05-02

Virtually every step of HIV-1 replication and numerous cellular antiviral defense mechanisms are regulated by the binding of a viral or cellular RNA-binding protein (RBP) to distinct sequence or structural elements on HIV-1 RNAs. Until recently, these protein-RNA interactions were studied largely by in vitro binding assays complemented with genetics approaches. However, these methods are highly limited in the identification of the relevant targets of RBPs in physiologically relevant settings. Development of crosslinking-immunoprecipitation sequencing (CLIP) methodology has revolutionized the analysis of protein-nucleic acid complexes. CLIP combines immunoprecipitation of covalently crosslinked protein-RNA complexes with high-throughput sequencing, providing a global account of RNA sequences bound by an RBP of interest in cells (or virions) at near-nucleotide resolution. Numerous variants of the CLIP protocol have recently been developed, some with major improvements over the original. Herein, we briefly review these methodologies and give examples of how CLIP has been successfully applied to retrovirology research.
SIMBA: a web tool for managing bacterial genome assembly generated by Ion PGM sequencing technology.

PubMed

Mariano, Diego C B; Pereira, Felipe L; Aguiar, Edgar L; Oliveira, Letícia C; Benevides, Leandro; Guimarães, Luís C; Folador, Edson L; Sousa, Thiago J; Ghosh, Preetam; Barh, Debmalya; Figueiredo, Henrique C P; Silva, Artur; Ramos, Rommel T J; Azevedo, Vasco A C

2016-12-15

The evolution of Next-Generation Sequencing (NGS) has considerably reduced the cost per sequenced base, allowing a significant rise in sequencing projects, mainly in prokaryotes. However, the range of available NGS platforms requires different strategies and software to correctly assemble genomes. Different strategies are necessary to properly complete an assembly project, in addition to the installation or modification of various software. This requires users to have significant expertise in these programs and command-line scripting experience on Unix platforms, besides possessing basic expertise in methodologies and techniques for genome assembly. These difficulties often delay the completion of genome assembly projects. In order to overcome this, we developed SIMBA (SImple Manager for Bacterial Assemblies), a freely available web tool that integrates several component tools for assembling and finishing bacterial genomes. SIMBA provides a friendly and intuitive user interface so bioinformaticians, even with low computational expertise, can work under a centralized administrative control system of assemblies managed by the assembly center head. SIMBA guides users through the assembly process via simple and interactive pages. The SIMBA workflow is divided into three modules: (i) projects: gives an overview of genome sequencing projects, in addition to data quality analysis and data format conversions; (ii) assemblies: allows de novo assemblies with the software Mira, Minia, Newbler and SPAdes, as well as assembly quality validation using QUAST; and (iii) curation: presents methods for finishing assemblies through tools for scaffolding contigs and closing gaps. We also present a case study that validated the efficacy of SIMBA in managing bacterial assembly projects sequenced using Ion Torrent PGM. 
Besides being a web tool for genome assembly, SIMBA is a complete genome assembly project management system, which can be useful for managing several projects in laboratories. SIMBA source code is available to download and install on local web servers at http://ufmg-simba.sourceforge.net .
A population genetics view of animal domestication.

PubMed

Larson, Greger; Burger, Joachim

2013-04-01

The fundamental shift associated with the domestication of plants and animals allowed for a dramatic increase in human population sizes and the emergence of modern society. Despite its importance and the decades of research devoted to studying it, questions regarding the origins and processes of domestication remain. Here, we review recent theoretical advances and present a perspective that underscores the crucial role that population admixture has played in influencing the genomes of domestic animals over the past 10000 years. We then discuss novel approaches to generating and analysing genetic data, emphasising the importance of an explicit hypothesis-testing approach for the inference of the origins and subsequent evolution and demography of domestic animals. By applying next-generation sequencing technology alongside appropriate biostatistical methodologies, a substantially deeper understanding of domestication is on the horizon. Copyright © 2013 Elsevier Ltd. All rights reserved.
Transgenic mice in developmental toxicology

DOE Office of Scientific and Technical Information (OSTI.GOV)

Woychik, R.P.

1992-12-31

Advances in molecular biology and embryology are being utilized for the generation of transgenic mice, animals that contain specific additions, deletions, or modifications of genes or sequences in their DNA. Mouse embryonic stem cells and homologous recombination procedures have made it possible to target specific DNA structural alterations to highly localized regions in the host chromosomes. The majority of the DNA structural rearrangements in transgenic mice can be passed through the germ line and used to establish new genetic traits in the carrier animals. Since the use of transgenic mice is having such an enormous impact on so many areas of mammalian biological research, including developmental toxicology, the objective of this review is to briefly describe the fundamental methodologies for generating transgenic mice and to describe one particular application that has direct relevance to the field of genetic toxicology.
Transgenic mice in developmental toxicology

DOE Office of Scientific and Technical Information (OSTI.GOV)

Woychik, R.P.

1992-01-01

Advances in molecular biology and embryology are being utilized for the generation of transgenic mice, animals that contain specific additions, deletions, or modifications of genes or sequences in their DNA. Mouse embryonic stem cells and homologous recombination procedures have made it possible to target specific DNA structural alterations to highly localized regions in the host chromosomes. The majority of the DNA structural rearrangements in transgenic mice can be passed through the germ line and used to establish new genetic traits in the carrier animals. Since the use of transgenic mice is having such an enormous impact on so many areas of mammalian biological research, including developmental toxicology, the objective of this review is to briefly describe the fundamental methodologies for generating transgenic mice and to describe one particular application that has direct relevance to the field of genetic toxicology.
From Ramachandran Maps to Tertiary Structures of Proteins.

PubMed

DasGupta, Debarati; Kaushik, Rahul; Jayaram, B

2015-08-27

Sequence to structure of proteins is an unsolved problem. A possible coarse grained resolution to this entails specification of all the torsional (Φ, Ψ) angles along the backbone of the polypeptide chain. The Ramachandran map quite elegantly depicts the allowed conformational (Φ, Ψ) space of proteins which is still very large for the purposes of accurate structure generation. We have divided the allowed (Φ, Ψ) space in Ramachandran maps into 27 distinct conformations sufficient to regenerate a structure to within 5 Å from the native, at least for small proteins, thus reducing the structure prediction problem to a specification of an alphanumeric string, i.e., the amino acid sequence together with one of the 27 conformations preferred by each amino acid residue. This still theoretically results in 27^n conformations for a protein comprising "n" amino acids. We then investigated the spatial correlations at the two-residue (dipeptide) and three-residue (tripeptide) levels in what may be described as higher order Ramachandran maps, with the premise that the allowed conformational space starts to shrink as we introduce neighborhood effects. We found, for instance, for a tripeptide which potentially can exist in any of the 27^3 "allowed" conformations, three-fourths of these conformations are redundant to the 95% confidence level, suggesting sequence context dependent preferred conformations. We then created a look-up table of preferred conformations at the tripeptide level and correlated them with energetically favorable conformations. We found in particular that Boltzmann probabilities calculated from van der Waals energies for each conformation of tripeptides correlate well with the observed populations in the structural database (the average correlation coefficient is ∼0.8). 
An alpha-numeric string and hence the tertiary structure can be generated for any sequence from the look-up table within minutes on a single processor and to a higher level of accuracy if secondary structure can be specified. We tested the methodology on 100 small proteins, and in 90% of the cases, a structure within 5 Å is recovered. We thus believe that the method presented here provides the missing link between Ramachandran maps and tertiary structures of proteins. A Web server to convert a tertiary structure to an alphanumeric string and to predict the tertiary structure from the sequence of a protein using the above methodology is created and made freely accessible at http://www.scfbio-iitd.res.in/software/proteomics/rm2ts.jsp.
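As a rough illustration of the counting argument in this abstract, the Python sketch below computes how many conformation strings exist when each residue takes one of 27 discrete (Φ, Ψ) states. The function name is hypothetical, and reading "three-fourths redundant" as "about one quarter retained" is an assumption made only for the arithmetic; none of this is the authors' code.

```python
N_STATES = 27  # discrete (phi, psi) conformations per residue, as stated in the abstract


def conformation_count(n_residues: int) -> int:
    """Number of possible conformation strings for a chain of n_residues."""
    if n_residues < 0:
        raise ValueError("n_residues must be non-negative")
    return N_STATES ** n_residues


tripeptide_total = conformation_count(3)
# Assumption: "three-fourths redundant" is read as roughly one quarter retained.
tripeptide_retained = tripeptide_total // 4

print(f"tripeptide conformations: {tripeptide_total}")
print(f"retained under the one-quarter assumption: {tripeptide_retained}")
print(f"10-residue peptide: {conformation_count(10):.3e} strings")
```

The exponential growth with chain length is what motivates pruning at the tripeptide level before assembling a full-length string.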
Identification of unannotated exons of low abundance transcripts in Drosophila melanogaster and cloning of a new serine protease gene upregulated upon injury.

PubMed

Maia, Rafaela M; Valente, Valeria; Cunha, Marco A V; Sousa, Josane F; Araujo, Daniela D; Silva, Wilson A; Zago, Marco A; Dias-Neto, Emmanuel; Souza, Sandro J; Simpson, Andrew J G; Monesi, Nadia; Ramos, Ricardo G P; Espreafico, Enilza M; Paçó-Larson, Maria L

2007-07-24

The sequencing of the D. melanogaster genome revealed an unexpectedly small number of genes (~ 14,000), indicating that mechanisms acting on generation of transcript diversity must have played a major role in the evolution of complex metazoans. Among the most extensively used mechanisms that account for this diversity is alternative splicing. It is estimated that over 40% of Drosophila protein-coding genes contain one or more alternative exons. A recent transcription map of Drosophila embryogenesis indicates that 30% of the transcribed regions are unannotated, and that 1/3 of this is estimated as missed or alternative exons of previously characterized protein-coding genes. Therefore, the identification of the variety of expressed transcripts depends on experimental data for its final validation and is continuously being performed using different approaches. We applied the Open Reading Frame Expressed Sequence Tags (ORESTES) methodology, which is capable of generating cDNA data from the central portion of rare transcripts, in order to investigate the presence of hitherto unannotated regions of the Drosophila transcriptome. Bioinformatic analysis of 1,303 Drosophila ORESTES clusters identified 68 sequences derived from unannotated regions in the current Drosophila genome version (4.3). Of these, a set of 38 was analysed by polyA+ northern blot hybridization, validating 17 (50%) new exons of low abundance transcripts. For one of these ESTs, we obtained the cDNA encompassing the complete coding sequence of a new serine protease, named SP212. The SP212 gene is part of a serine protease gene cluster located in the chromosome region 88A12-B1. This cluster includes the predicted genes CG9631, CG9649 and CG31326, which were previously identified as up-regulated after immune challenges in genomic-scale microarray analysis. 
In agreement with the proposal that this locus is co-regulated in response to microbial infection, we show here that SP212 is also up-regulated upon injury. Using the ORESTES methodology we identified 17 novel exons from low abundance Drosophila transcripts, and through a PCR approach the complete CDS of one of these transcripts was defined. Our results show that computational identification and manual inspection are not sufficient to annotate a genome in the absence of experimentally derived data.
Identification of unannotated exons of low abundance transcripts in Drosophila melanogaster and cloning of a new serine protease gene upregulated upon injury

PubMed Central

Maia, Rafaela M; Valente, Valeria; Cunha, Marco AV; Sousa, Josane F; Araujo, Daniela D; Silva, Wilson A; Zago, Marco A; Dias-Neto, Emmanuel; Souza, Sandro J; Simpson, Andrew JG; Monesi, Nadia; Ramos, Ricardo GP; Espreafico, Enilza M; Paçó-Larson, Maria L

2007-01-01

Background The sequencing of the D. melanogaster genome revealed an unexpectedly small number of genes (~ 14,000), indicating that mechanisms acting on generation of transcript diversity must have played a major role in the evolution of complex metazoans. Among the most extensively used mechanisms that account for this diversity is alternative splicing. It is estimated that over 40% of Drosophila protein-coding genes contain one or more alternative exons. A recent transcription map of Drosophila embryogenesis indicates that 30% of the transcribed regions are unannotated, and that 1/3 of this is estimated as missed or alternative exons of previously characterized protein-coding genes. Therefore, the identification of the variety of expressed transcripts depends on experimental data for its final validation and is continuously being performed using different approaches. We applied the Open Reading Frame Expressed Sequence Tags (ORESTES) methodology, which is capable of generating cDNA data from the central portion of rare transcripts, in order to investigate the presence of hitherto unannotated regions of the Drosophila transcriptome. Results Bioinformatic analysis of 1,303 Drosophila ORESTES clusters identified 68 sequences derived from unannotated regions in the current Drosophila genome version (4.3). Of these, a set of 38 was analysed by polyA+ northern blot hybridization, validating 17 (50%) new exons of low abundance transcripts. For one of these ESTs, we obtained the cDNA encompassing the complete coding sequence of a new serine protease, named SP212. The SP212 gene is part of a serine protease gene cluster located in the chromosome region 88A12-B1. This cluster includes the predicted genes CG9631, CG9649 and CG31326, which were previously identified as up-regulated after immune challenges in genomic-scale microarray analysis. 
In agreement with the proposal that this locus is co-regulated in response to microbial infection, we show here that SP212 is also up-regulated upon injury. Conclusion Using the ORESTES methodology we identified 17 novel exons from low abundance Drosophila transcripts, and through a PCR approach the complete CDS of one of these transcripts was defined. Our results show that computational identification and manual inspection are not sufficient to annotate a genome in the absence of experimentally derived data. PMID:17650329
High-Resolution Melting Analysis for Rapid Detection of Sequence Type 131 Escherichia coli.

PubMed

Harrison, Lucas B; Hanson, Nancy D

2017-06-01

Escherichia coli isolates belonging to the sequence type 131 (ST131) clonal complex have been associated with the global distribution of fluoroquinolone and β-lactam resistance. Whole-genome sequencing and multilocus sequence typing identify sequence type but are expensive when evaluating large numbers of samples. This study was designed to develop a cost-effective screening tool using high-resolution melting (HRM) analysis to differentiate ST131 from non-ST131 E. coli in large sample populations in the absence of sequence analysis. The method was optimized using DNA from 12 E. coli isolates. Singleplex PCR was performed using 10 ng of DNA, Type-it HRM buffer, and multilocus sequence typing primers and was followed by multiplex PCR. The amplicon sizes ranged from 630 to 737 bp. Melt temperature peaks were determined by performing HRM analysis at 0.1°C resolution from 50 to 95°C on a Rotor-Gene Q 5-plex HRM system. Derivative melt curves were compared between sequence types and analyzed by principal component analysis. A blinded study of 191 E. coli isolates of ST131 and unknown sequence types validated this methodology. This methodology returned 99.2% specificity (124 true negatives and 1 false positive) and 100% sensitivity (66 true positives and 0 false negatives). This HRM methodology distinguishes ST131 from non-ST131 E. coli without sequence analysis. The analysis can be accomplished in about 3 h in any laboratory with an HRM-capable instrument and principal component analysis software. Therefore, this assay is a fast and cost-effective alternative to sequencing-based ST131 identification. Copyright © 2017 Harrison and Hanson.
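The sensitivity and specificity quoted in this abstract follow directly from the reported confusion-matrix counts. The short Python sketch below reproduces that arithmetic; the function names are illustrative and not taken from the study.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of actual positives (ST131) that the assay calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of actual negatives (non-ST131) that the assay calls negative."""
    return true_neg / (true_neg + false_pos)


# Counts reported for the blinded study of 191 isolates.
tp, fn, tn, fp = 66, 0, 124, 1

print(f"isolates:    {tp + fn + tn + fp}")
print(f"sensitivity: {sensitivity(tp, fn):.1%}")
print(f"specificity: {specificity(tn, fp):.1%}")
```

The four counts sum to the 191 isolates in the blinded study, which is a quick consistency check on the reported figures.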
Complete mitochondrial genome sequences of the northern spotted owl (Strix occidentalis caurina) and the barred owl (Strix varia; Aves: Strigiformes: Strigidae) confirm the presence of a duplicated control region

PubMed Central

Henderson, James B.; Sellas, Anna B.; Fuchs, Jérôme; Bowie, Rauri C.K.; Dumbacher, John P.

2017-01-01

We report here the successful assembly of the complete mitochondrial genomes of the northern spotted owl (Strix occidentalis caurina) and the barred owl (S. varia). We utilized sequence data from two sequencing methodologies, Illumina paired-end sequence data with insert lengths ranging from approximately 250 nucleotides (nt) to 9,600 nt and read lengths from 100–375 nt and Sanger-derived sequences. We employed multiple assemblers and alignment methods to generate the final assemblies. The circular genomes of S. o. caurina and S. varia are comprised of 19,948 nt and 18,975 nt, respectively. Both code for two rRNAs, twenty-two tRNAs, and thirteen polypeptides. They both have duplicated control region sequences with complex repeat structures. We were not able to assemble the control regions solely using Illumina paired-end sequence data. By fully spanning the control regions, Sanger-derived sequences enabled accurate and complete assembly of these mitochondrial genomes. These are the first complete mitochondrial genome sequences of owls (Aves: Strigiformes) possessing duplicated control regions. We searched the nuclear genome of S. o. caurina for copies of mitochondrial genes and found at least nine separate stretches of nuclear copies of gene sequences originating in the mitochondrial genome (Numts). The Numts ranged from 226–19,522 nt in length and included copies of all mitochondrial genes except tRNAPro, ND6, and tRNAGlu. Strix occidentalis caurina and S. varia exhibited an average of 10.74% (8.68% uncorrected p-distance) divergence across the non-tRNA mitochondrial genes. PMID:29038757
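The uncorrected p-distance mentioned in this abstract is the proportion of aligned sites at which two sequences differ. The Python sketch below is a minimal illustration under stated assumptions: the sequences are already aligned to equal length, and sites containing a gap (`-`) or an ambiguous `N` are skipped. The study's own alignment and distance software is not described here, so this is not a reconstruction of it.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two pre-aligned sequences.

    Assumption: sites with a gap ('-') or 'N' in either sequence are ignored.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip sites that cannot be compared
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


# Toy example: one mismatch in eight comparable sites.
print(p_distance("ACGTACGT", "ACGTTCGT"))
```

Because the p-distance ignores multiple substitutions at the same site, it is expected to be smaller than a model-corrected divergence, in line with the 8.68% versus 10.74% figures quoted above.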
Automated Sequence Generation Process and Software

NASA Technical Reports Server (NTRS)

Gladden, Roy

2007-01-01

"Automated sequence generation" (autogen) signifies both a process and software used to automatically generate sequences of commands to operate various spacecraft. The autogen software comprises the autogen script plus the Activity Plan Generator (APGEN) program. APGEN can be used for planning missions and command sequences.

Transcriptome Assembly and Analysis of Tibetan Hulless Barley (Hordeum vulgare L. var. nudum) Developing Grains, with Emphasis on Quality Properties

PubMed Central

Chen, Xin; Long, Hai; Gao, Ping; Deng, Guangbing; Pan, Zhifen; Liang, Junjun; Tang, Yawei; Tashi, Nyima; Yu, Maoqun

2014-01-01

Background Hulless barley is attracting increasing attention due to its unique nutritional value and potential health benefits. However, the molecular biology of barley grain development and nutrient storage is not well understood. Furthermore, the genetic potential of hulless barley has not been fully tapped for breeding. Methodology/Principal Findings In the present study, we investigated the transcriptome features during hulless barley grain development. Using Illumina paired-end RNA-Sequencing, we generated two data sets of the developing grain transcriptomes from two hulless barley landraces. A total of 13.1 and 12.9 million paired-end reads with lengths of 90 bp were generated from the two varieties and were assembled into 48,863 and 45,788 unigenes, respectively. A combined dataset of 46,485 All-Unigenes with an average length of 542 bp was generated from the two transcriptomes, and 36,278 of these were annotated with gene descriptions, conserved protein domains or gene ontology terms. Furthermore, sequences and expression levels of genes related to the biosynthesis of storage reserve compounds (starch, protein, and β-glucan) were analyzed, and their temporal and spatial patterns were deduced from the transcriptome data of cultivated barley Morex. Conclusions/Significance We established an integrated database of sequences and functional annotations and examined the expression profiles of the developing grains of Tibetan hulless barley. The characterization of genes encoding storage proteins and enzymes of starch synthesis and (1–3;1–4)-β-D-glucan synthesis provided an overview of changes in gene expression associated with grain nutrition and health properties. Furthermore, the characterization of these genes provides a gene reservoir, which helps in quality improvement of hulless barley. PMID:24871534
Systematic review of the methodological quality of controlled trials evaluating Chinese herbal medicine in patients with rheumatoid arthritis.

PubMed

Pan, Xin; Lopez-Olivo, Maria A; Song, Juhee; Pratt, Gregory; Suarez-Almazor, Maria E

2017-03-01

We appraised the methodological and reporting quality of randomised controlled clinical trials (RCTs) evaluating the efficacy and safety of Chinese herbal medicine (CHM) in patients with rheumatoid arthritis (RA). For this systematic review, electronic databases were searched from inception until June 2015. The search was limited to humans and non-case report studies, but was not limited by language, year of publication or type of publication. Two independent reviewers selected RCTs, evaluating CHM in RA (herbals and decoctions). Descriptive statistics were used to report on risk of bias and their adherence to reporting standards. Multivariable logistic regression analysis was performed to determine study characteristics associated with high or unclear risk of bias. Out of 2342 unique citations, we selected 119 RCTs including 18 919 patients: 10 108 patients received CHM alone and 6550 received one of 11 treatment combinations. A high risk of bias was observed across all domains: 21% had a high risk for selection bias (11% from sequence generation and 30% from allocation concealment), 85% for performance bias, 89% for detection bias, 4% for attrition bias and 40% for reporting bias. In multivariable analysis, fewer authors were associated with selection bias (allocation concealment), performance bias and attrition bias, and earlier year of publication and funding source not reported or disclosed were associated with selection bias (sequence generation). Studies published in non-English language were associated with reporting bias. Poor adherence to recommended reporting standards (<60% of the studies not providing sufficient information) was observed in 11 of the 23 sections evaluated. Study quality and data extraction were performed by one reviewer and cross-checked by a second reviewer. Translation to English was performed by one reviewer in 85% of the included studies. 
Studies evaluating CHM often fail to meet expected methodological criteria, and high-quality evidence is lacking. Published by the BMJ Publishing Group Limited.
GIS-based planning system for managing the flow of construction and demolition waste in Brazil.

PubMed

Paz, Diogo Henrique Fernandes da; Lafayette, Kalinny Patrícia Vaz; Sobral, Maria do Carmo

2018-05-01

The objective of this article was to plan a network for municipal management of construction and demolition waste in Brazil with the assistance of a geographic information system, using the city of Recife as a case study. The methodology was carried out in three stages. The first was to map the illegal construction and demolition waste disposal points across Recife and classify the waste according to its recyclability. Next, a method for indicating suitable areas for the installation of voluntary delivery points, intended for small waste generators, is presented. Finally, a method for indicating suitable areas for the installation of trans-shipment and waste sorting areas, developed for large generators, is presented. The results show that a geographic information system is an essential tool in the planning of municipal construction and demolition waste management, facilitating spatial analysis and control of the generation, sorting, collection, transportation, and final destination of construction and demolition waste, and increasing the rate of recovery and recycling of materials.
Deterministic nonlinear phase gates induced by a single qubit

NASA Astrophysics Data System (ADS)

Park, Kimin; Marek, Petr; Filip, Radim

2018-05-01

We propose deterministic realizations of nonlinear phase gates by repeating a finite sequence of non-commuting Rabi interactions between a harmonic oscillator and only a single two-level ancillary qubit. We show explicitly that the key nonclassical features of the ideal cubic phase gate and the quartic phase gate are generated in the harmonic oscillator faithfully by our method. We numerically analyzed the performance of our scheme under realistic imperfections of the oscillator and the two-level system. The methodology is extended further to higher-order nonlinear phase gates. This theoretical proposal completes the set of operations required for continuous-variable quantum computation.
A genomic regulatory network for development

NASA Technical Reports Server (NTRS)

Davidson, Eric H.; Rast, Jonathan P.; Oliveri, Paola; Ransick, Andrew; Calestani, Cristina; Yuh, Chiou-Hwa; Minokawa, Takuya; Amore, Gabriele; Hinman, Veronica; Arenas-Mena, Cesar;

2002-01-01

Development of the body plan is controlled by large networks of regulatory genes. A gene regulatory network that controls the specification of endoderm and mesoderm in the sea urchin embryo is summarized here. The network was derived from large-scale perturbation analyses, in combination with computational methodologies, genomic data, cis-regulatory analysis, and molecular embryology. The network contains over 40 genes at present, and each node can be directly verified at the DNA sequence level by cis-regulatory analysis. Its architecture reveals specific and general aspects of development, such as how given cells generate their ordained fates in the embryo and why the process moves inexorably forward in developmental time.

Assaying gene function by growth competition experiment.

PubMed

Merritt, Joshua; Edwards, Jeremy S

2004-07-01

High-throughput screening and analysis is one of the emerging paradigms in biotechnology. In particular, high-throughput methods are essential in the field of functional genomics because of the vast amount of data generated in recent and ongoing genome sequencing efforts. In this report we discuss integrated functional analysis methodologies which incorporate both a growth competition component and a highly parallel assay used to quantify results of the growth competition. Several applications of the two most widely used technologies in the field, i.e., transposon mutagenesis and deletion strain library growth competition, and individual applications of several developing or less widely reported technologies are presented.
PIMMS (Pragmatic Insertional Mutation Mapping System) Laboratory Methodology a Readily Accessible Tool for Identification of Essential Genes in Streptococcus

PubMed Central

Blanchard, Adam M.; Egan, Sharon A.; Emes, Richard D.; Warry, Andrew; Leigh, James A.

2016-01-01

The Pragmatic Insertional Mutation Mapping (PIMMS) laboratory protocol was developed alongside various bioinformatics packages (Blanchard et al., 2015) to enable detection of essential and conditionally essential genes in Streptococcus and related bacteria. This extended the methodology commonly used to locate insertional mutations in individual mutants to the analysis of mutations in populations of bacteria. In Streptococcus uberis, a pyogenic Streptococcus associated with intramammary infection and mastitis in ruminants, the mutagen pGhost9:ISS1 was shown to integrate across the entire genome. Analysis of >80,000 mutations revealed 196 coding sequences that could not be mutated and a further 67 where mutation only occurred beyond the 90th percentile of the coding sequence. These sequences showed good concordance with sequences within the database of essential genes and typically matched sequences known to be associated with basic cellular functions. Due to the broad utility of this mutagen and the simplicity of the methodology, it is anticipated that PIMMS will be of value to a wide range of laboratories in functional genomic analysis of a wide range of Gram-positive bacteria (Streptococcus, Enterococcus, and Lactococcus) of medical, veterinary, and industrial significance. PMID:27826289
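The two gene categories described in this abstract (coding sequences with no insertions, and those hit only beyond the 90th percentile of their length) can be sketched as a simple classification over mapped insertion coordinates. The Python below is a hypothetical illustration, not the PIMMS software: the `Gene` record, the function names, the 1-based inclusive coordinates, and the use of `>=` at the 0.9 cutoff are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Gene:
    name: str
    start: int   # 1-based, inclusive (assumed convention)
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"


def relative_position(gene: Gene, pos: int) -> float:
    """Fractional distance of an insertion from the 5' end of the coding sequence."""
    length = gene.end - gene.start + 1
    offset = pos - gene.start if gene.strand == "+" else gene.end - pos
    return offset / length


def classify(gene: Gene, insertions: List[int], cutoff: float = 0.9) -> str:
    """Label a gene by where insertions fall within its coding sequence."""
    hits = [relative_position(gene, p) for p in insertions
            if gene.start <= p <= gene.end]
    if not hits:
        return "no_insertions"   # candidate essential gene
    if all(h >= cutoff for h in hits):
        return "distal_only"     # hit only near the 3' end of the CDS
    return "mutated"


gene = Gene("example_gene", 101, 200, "+")
print(classify(gene, []))          # no insertions inside the CDS
print(classify(gene, [195]))       # only a 3'-terminal insertion
print(classify(gene, [150, 195]))  # insertion within the body of the CDS
```

Measuring the position from the 5' end on either strand matters because an insertion near the 3' end may leave a largely functional product, which is why such genes are reported separately from those with no insertions at all.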
A conditional stochastic weather generator for seasonal to multi-decadal simulations

NASA Astrophysics Data System (ADS)

Verdin, Andrew; Rajagopalan, Balaji; Kleiber, William; Podestá, Guillermo; Bert, Federico

2018-01-01

We present the application of a parametric stochastic weather generator within a nonstationary context, enabling simulations of weather sequences conditioned on interannual and multi-decadal trends. The generalized linear model framework of the weather generator allows any number of covariates to be included, such as large-scale climate indices, local climate information, seasonal precipitation and temperature, among others. Here we focus on the Salado A basin of the Argentine Pampas as a case study, but the methodology is portable to any region. We include domain-averaged (i.e., areal) seasonal total precipitation and mean maximum and minimum temperatures as covariates for conditional simulation. Areal covariates are motivated by a principal component analysis that indicates the seasonal spatial average is the dominant mode of variability across the domain. We find this modification to be effective in capturing the nonstationarity prevalent in interseasonal precipitation and temperature data. We further illustrate the ability of this weather generator to act as a spatiotemporal downscaler of seasonal forecasts and multidecadal projections, both of which are generally of coarse resolution.
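As a rough illustration of the conditioning idea described in this abstract, the sketch below draws daily wet/dry occurrence from a logistic generalized linear model whose linear predictor includes a seasonal areal precipitation covariate. The coefficient values, the covariate scale, and the function names are hypothetical and are not taken from the paper; the actual generator also models precipitation amounts and temperatures, which are omitted here.

```python
import math
import random


def wet_day_probability(prev_wet, seasonal_precip_anomaly, coef):
    """Logistic GLM: P(wet today) given yesterday's state and a seasonal covariate."""
    eta = (coef["intercept"]
           + coef["prev_wet"] * prev_wet
           + coef["seasonal"] * seasonal_precip_anomaly)
    return 1.0 / (1.0 + math.exp(-eta))


def simulate_occurrence(n_days, seasonal_precip_anomaly, coef, seed=0):
    """Simulate a 0/1 wet-day sequence conditioned on one seasonal covariate value."""
    rng = random.Random(seed)
    wet, sequence = 0, []
    for _ in range(n_days):
        p = wet_day_probability(wet, seasonal_precip_anomaly, coef)
        wet = 1 if rng.random() < p else 0
        sequence.append(wet)
    return sequence


# Hypothetical coefficients, chosen only for illustration.
COEF = {"intercept": -1.0, "prev_wet": 0.8, "seasonal": 0.5}
dry_season = simulate_occurrence(90, -1.0, COEF)  # below-average seasonal total
wet_season = simulate_occurrence(90, +1.0, COEF)  # above-average seasonal total
```

Because the seasonal covariate enters the linear predictor directly, a forecast or projection of the seasonal total shifts the daily wet-day probability, which is the sense in which such a generator can act as a downscaler.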
Whole-Genome Sequencing and Assembly with High-Throughput, Short-Read Technologies

PubMed Central

Sundquist, Andreas; Ronaghi, Mostafa; Tang, Haixu; Pevzner, Pavel; Batzoglou, Serafim

2007-01-01

While recently developed short-read sequencing technologies may dramatically reduce the sequencing cost and eventually achieve the $1000 goal for re-sequencing, their limitations prevent the de novo sequencing of eukaryotic genomes with the standard shotgun sequencing protocol. We present SHRAP (SHort Read Assembly Protocol), a sequencing protocol and assembly methodology that utilizes high-throughput short-read technologies. We describe a variation on hierarchical sequencing with two crucial differences: (1) we select a clone library from the genome randomly rather than as a tiling path and (2) we sample clones from the genome at high coverage and reads from the clones at low coverage. We assume that 200 bp read lengths with a 1% error rate and inexpensive random fragment cloning on whole mammalian genomes are feasible. Our assembly methodology is based on first ordering the clones and subsequently performing read assembly in three stages: (1) local assemblies of regions significantly smaller than a clone size, (2) clone-sized assemblies of the results of stage 1, and (3) chromosome-sized assemblies. By aggressively localizing the assembly problem during the first stage, our method succeeds in assembling short, unpaired reads sampled from repetitive genomes. We tested our assembler using simulated reads from D. melanogaster and human chromosomes 1, 11, and 21, and produced assemblies with large sets of contiguous sequence and a misassembly rate comparable to other draft assemblies. Tested on D. melanogaster and the entire human genome, our clone-ordering method produces accurate maps, thereby localizing fragment assembly and enabling the parallelization of the subsequent steps of our pipeline. Thus, we have demonstrated that truly inexpensive de novo sequencing of mammalian genomes will soon be possible with high-throughput, short-read technologies using our methodology. PMID:17534434
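The two-level sampling scheme in this abstract (clones at high coverage, reads per clone at low coverage) can be made concrete with simple arithmetic: if clones cover the genome at depth c and reads cover each clone at depth r, the expected read depth over the genome is c × r. The sketch below is only that arithmetic. Apart from the 200 bp read length stated in the abstract, every number in the usage lines (genome size, clone length, both coverages) is a hypothetical placeholder, not a value from the paper.

```python
def reads_per_clone(clone_length_bp, read_coverage, read_length_bp=200):
    """Number of reads needed to cover one clone at the given depth."""
    return round(clone_length_bp * read_coverage / read_length_bp)


def num_clones(genome_length_bp, clone_coverage, clone_length_bp):
    """Number of randomly selected clones needed for the given clone depth."""
    return round(genome_length_bp * clone_coverage / clone_length_bp)


def genome_read_depth(clone_coverage, read_coverage):
    """Expected read depth over the genome under two-level sampling."""
    return clone_coverage * read_coverage


# Hypothetical inputs for illustration only.
clones = num_clones(3_000_000_000, clone_coverage=7.5, clone_length_bp=150_000)
reads = reads_per_clone(150_000, read_coverage=1.5)
depth = genome_read_depth(7.5, 1.5)
```

The point of the split is that each clone alone is too thinly sampled to assemble, but overlapping clones together supply enough reads for any local region.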
A Six Nuclear Gene Phylogeny of Citrus (Rutaceae) Taking into Account Hybridization and Lineage Sorting

PubMed Central

Keremane, Manjunath L.; Lee, Richard F.; Maureira-Butler, Ivan J.; Roose, Mikeal L.

2013-01-01

Background: Genus Citrus (Rutaceae) comprises many important cultivated species that generally hybridize easily. Phylogenetic study of a group showing extensive hybridization is challenging. Since the genus Citrus has diverged recently (4–12 Ma), incomplete lineage sorting of ancestral polymorphisms is also likely to cause discrepancies among genes in phylogenetic inferences. Incongruence of gene trees is observed and it is essential to unravel the processes that cause inconsistencies in order to understand the phylogenetic relationships among the species. Methodology and Principal Findings: (1) We generated phylogenetic trees using haplotype sequences of six low copy nuclear genes. (2) Published simple sequence repeat data were re-analyzed to study population structure and the results were compared with the phylogenetic trees constructed using sequence data and coalescence simulations. (3) To distinguish between hybridization and incomplete lineage sorting, we developed and utilized a coalescence simulation approach. In other studies, species trees have been inferred despite the possibility of hybridization having occurred and used to generate null distributions of the effect of lineage sorting alone (by coalescent simulation). Since this is problematic, we instead generate these distributions directly from observed gene trees. Of the six trees generated, we used the most resolved three to detect hybrids. We found that 11 of 33 samples appear to be affected by historical hybridization. Analysis of the remaining three genes supported the conclusions from the hybrid detection test. Conclusions: We have identified or confirmed probable hybrid origins for several Citrus cultivars using three different approaches: gene phylogenies, population structure analysis and coalescence simulation. Hybridization and incomplete lineage sorting were identified primarily based on differences among gene phylogenies with reference to null expectations via coalescence simulations. We conclude that identifying hybridization as a frequent cause of incongruence among gene trees is critical to correctly infer the phylogeny among species of Citrus. PMID:23874615
A cis-regulatory logic simulator.

PubMed

Zeigler, Robert D; Gertz, Jason; Cohen, Barak A

2007-07-27

A major goal of computational studies of gene regulation is to accurately predict the expression of genes based on the cis-regulatory content of their promoters. The development of computational methods to decode the interactions among cis-regulatory elements has been slow, in part, because it is difficult to know, without extensive experimental validation, whether a particular method identifies the correct cis-regulatory interactions that underlie a given set of expression data. There is an urgent need for test expression data in which the interactions among cis-regulatory sites that produce the data are known. The ability to rapidly generate such data sets would facilitate the development and comparison of computational methods that predict gene expression patterns from promoter sequence. We developed a gene expression simulator which generates expression data using user-defined interactions between cis-regulatory sites. The simulator can incorporate additive, cooperative, competitive, and synergistic interactions between regulatory elements. Constraints on the spacing, distance, and orientation of regulatory elements and their interactions may also be defined and Gaussian noise can be added to the expression values. The simulator allows for a data transformation that simulates the sigmoid shape of expression levels from real promoters. We found good agreement between sets of simulated promoters and predicted regulatory modules from real expression data. We present several data sets that may be useful for testing new methodologies for predicting gene expression from promoter sequence. We developed a flexible gene expression simulator that rapidly generates large numbers of simulated promoters and their corresponding transcriptional output based on specified interactions between cis-regulatory sites. When appropriate rule sets are used, the data generated by our simulator faithfully reproduce experimentally derived data sets. We anticipate that using simulated gene expression data sets will facilitate the direct comparison of computational strategies to predict gene expression from promoter sequence. The source code is available online and as additional material. The test sets are available as additional material.
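To give a feel for the kind of rule-based simulation this abstract describes, the following minimal sketch combines additive site contributions, a cooperative bonus for pairs of sites that co-occur, a sigmoid transformation, and optional Gaussian noise. It is not the authors' simulator: the function name, the rule representation, the site labels, and the choice to add noise after the sigmoid are all assumptions made for illustration, and spacing, orientation, competitive, and synergistic rules are left out.

```python
import math
import random


def simulate_expression(sites, weights, cooperative_pairs, noise_sd=0.0, rng=None):
    """Toy expression value for one promoter.

    sites: iterable of site labels present in the promoter (duplicates collapse).
    weights: additive contribution of each site label.
    cooperative_pairs: {(site_a, site_b): bonus} applied when both are present.
    """
    rng = rng or random.Random(0)
    present = set(sites)
    # Additive rule: each site present contributes its own weight.
    score = sum(weights.get(s, 0.0) for s in present)
    # Cooperative rule: an extra bonus when both partners are present.
    for (a, b), bonus in cooperative_pairs.items():
        if a in present and b in present:
            score += bonus
    # Sigmoid transformation of the summed score.
    expression = 1.0 / (1.0 + math.exp(-score))
    # Optional Gaussian noise on the transformed value (ordering is an assumption).
    if noise_sd > 0:
        expression += rng.gauss(0.0, noise_sd)
    return expression


# Hypothetical rule set with two site types.
WEIGHTS = {"A": 1.0, "B": 0.5}
COOPERATIVE = {("A", "B"): 1.5}
with_both = simulate_expression(["A", "B"], WEIGHTS, COOPERATIVE)
with_a_only = simulate_expression(["A"], WEIGHTS, COOPERATIVE)
```

Because the generating rules are known, a prediction method run on such data can be scored against the true interactions rather than against an experimental proxy.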
Initial steps towards a production platform for DNA sequence analysis on the grid.

PubMed

Luyf, Angela C M; van Schaik, Barbera D C; de Vries, Michel; Baas, Frank; van Kampen, Antoine H C; Olabarriaga, Silvia D

2010-12-14

Bioinformatics is confronted with a new data explosion due to the availability of high throughput DNA sequencers. Data storage and analysis become a problem on local servers, and it is therefore necessary to switch to other IT infrastructures. Grid and workflow technology can help to handle the data more efficiently, as well as facilitate collaborations. However, interfaces to grids are often unfriendly to novice users. In this study we reused a platform that was developed in the VL-e project for the analysis of medical images. Data transfer, workflow execution and job monitoring are operated from one graphical interface. We developed workflows for two sequence alignment tools (BLAST and BLAT) as a proof of concept. The analysis time was significantly reduced. All workflows and executables are available for the members of the Dutch Life Science Grid and the VL-e Medical virtual organizations. All components are open source and can be transported to other grid infrastructures. The availability of in-house expertise and tools facilitates the usage of grid resources by new users. Our first results indicate that this is a practical, powerful and scalable solution to address the capacity and collaboration issues raised by the deployment of next generation sequencers. We currently adopt this methodology on a daily basis for DNA sequencing and other applications. More information and source code are available via http://www.bioinformaticslaboratory.nl/
Allele-specific copy-number discovery from whole-genome and whole-exome sequencing.

PubMed

Wang, WeiBo; Wang, Wei; Sun, Wei; Crowley, James J; Szatkiewicz, Jin P

2015-08-18

Copy-number variants (CNVs) are a major form of genetic variation and a risk factor for various human diseases, so it is crucial to accurately detect and characterize them. It is conceivable that allele-specific reads from high-throughput sequencing data could be leveraged to both enhance CNV detection and produce allele-specific copy number (ASCN) calls. Although statistical methods have been developed to detect CNVs using whole-genome sequence (WGS) and/or whole-exome sequence (WES) data, information from allele-specific read counts has not yet been adequately exploited. In this paper, we develop an integrated method, called AS-GENSENG, which incorporates allele-specific read counts in CNV detection and estimates ASCN using either WGS or WES data. To evaluate the performance of AS-GENSENG, we conducted extensive simulations, generated empirical data using existing WGS and WES data sets and validated predicted CNVs using an independent methodology. We conclude that AS-GENSENG not only predicts accurate ASCN calls but also improves the accuracy of total copy number calls, owing to its unique ability to exploit information from both total and allele-specific read counts while accounting for various experimental biases in sequence data. Our novel, user-friendly and computationally efficient method and a complete analytic protocol are freely available at https://sourceforge.net/projects/asgenseng/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Temporal variation of aftershocks by means of multifractal characterization of their inter-event time and cluster analysis

NASA Astrophysics Data System (ADS)

Figueroa-Soto, A.; Zuñiga, R.; Marquez-Ramirez, V.; Monterrubio-Velasco, M.

2017-12-01

The inter-event time characteristics of seismic aftershock sequences can provide important information to discern stages in the aftershock generation process. In order to investigate whether separate dynamic stages can be identified, (1) we analyzed aftershock series following selected earthquake mainshocks that took place in similar tectonic regimes, selecting two well-defined aftershock sequences from New Zealand and one from Mexico; (2) we analyzed the fractal behavior of the logarithm of inter-event times (also called waiting times) of aftershocks by means of Hölder's exponent; and (3) we analyzed their magnitude and spatial location based on a methodology proposed by Zaliapin and Ben-Zion [2011], which accounts for the clustering properties of the sequence. In general, more than two coherent process stages can be identified following the main rupture, evidencing a type of "cascade" process which precludes implying a single generalized power law even though the temporal rate and average fractal character appear to be unique (as in a single Omori's p value). We found that aftershock processes indeed show multi-fractal characteristics, which may be related to different stages in the process of diffusion, as seen in the spatio-temporal distribution of aftershocks. Our method provides a way of defining the onset of the return to seismic background activity and the end of the main aftershock sequence.
Inferring gene expression from ribosomal promoter sequences, a crowdsourcing approach

PubMed Central

Meyer, Pablo; Siwo, Geoffrey; Zeevi, Danny; Sharon, Eilon; Norel, Raquel; Segal, Eran; Stolovitzky, Gustavo; Siwo, Geoffrey; Rider, Andrew K.; Tan, Asako; Pinapati, Richard S.; Emrich, Scott; Chawla, Nitesh; Ferdig, Michael T.; Tung, Yi-An; Chen, Yong-Syuan; Chen, Mei-Ju May; Chen, Chien-Yu; Knight, Jason M.; Sahraeian, Sayed Mohammad Ebrahim; Esfahani, Mohammad Shahrokh; Dreos, Rene; Bucher, Philipp; Maier, Ezekiel; Saeys, Yvan; Szczurek, Ewa; Myšičková, Alena; Vingron, Martin; Klein, Holger; Kiełbasa, Szymon M.; Knisley, Jeff; Bonnell, Jeff; Knisley, Debra; Kursa, Miron B.; Rudnicki, Witold R.; Bhattacharjee, Madhuchhanda; Sillanpää, Mikko J.; Yeung, James; Meysman, Pieter; Rodríguez, Aminael Sánchez; Engelen, Kristof; Marchal, Kathleen; Huang, Yezhou; Mordelet, Fantine; Hartemink, Alexander; Pinello, Luca; Yuan, Guo-Cheng

2013-01-01

The Gene Promoter Expression Prediction challenge consisted of predicting gene expression from promoter sequences in a previously unknown experimentally generated data set. The challenge was presented to the community in the framework of the sixth Dialogue for Reverse Engineering Assessments and Methods (DREAM6), a community effort to evaluate the status of systems biology modeling methodologies. Nucleotide-specific promoter activity was obtained by measuring fluorescence from promoter sequences fused upstream of a gene for yellow fluorescent protein and inserted in the same genomic site of the yeast Saccharomyces cerevisiae. Twenty-one teams submitted results predicting the expression levels of 53 different promoters from yeast ribosomal protein genes. Analysis of participant predictions shows that accurate values for low-expressed and mutated promoters were difficult to obtain, although in the latter case, only when the mutation induced a large change in promoter activity compared to the wild-type sequence. As in previous DREAM challenges, we found that aggregation of participant predictions provided robust results, but did not fare better than the three best algorithms. Finally, this study not only provides a benchmark for the assessment of methods predicting activity of a specific set of promoters from their sequence, but it also shows that the top performing algorithm, which used machine-learning approaches, can be improved by the addition of biological features such as transcription factor binding sites. PMID:23950146
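The abstract notes that aggregating participant predictions gave robust results. How the challenge actually aggregated and scored submissions is not described here, so the sketch below only shows one simple form of aggregation, a per-promoter mean across teams, on invented toy data. The function names, promoter labels, and values are all hypothetical.

```python
from statistics import mean


def aggregate_predictions(team_predictions):
    """Average each promoter's predicted activity across teams."""
    promoters = team_predictions[0].keys()
    return {p: mean(team[p] for team in team_predictions) for p in promoters}


def mean_squared_error(predicted, truth):
    """Mean squared difference between predicted and measured activity."""
    return mean((predicted[p] - truth[p]) ** 2 for p in truth)


# Toy data: two teams whose errors point in opposite directions.
truth = {"promoter_1": 1.0, "promoter_2": 2.0}
team_a = {"promoter_1": 0.5, "promoter_2": 2.5}
team_b = {"promoter_1": 1.5, "promoter_2": 1.5}
consensus = aggregate_predictions([team_a, team_b])
```

In this contrived case the individual errors cancel exactly, so the consensus beats both teams. With real submissions the cancellation is only partial, which is consistent with the abstract's remark that aggregation was robust yet did not beat the best individual algorithms.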
Describing the diversity of Ag specific receptors in vertebrates: Contribution of repertoire deep sequencing.

PubMed

Castro, Rosario; Navelsaker, Sofie; Krasnov, Aleksei; Du Pasquier, Louis; Boudinot, Pierre

2017-10-01

During the last decades, gene and cDNA cloning identified TCR and Ig genes across vertebrates; genome sequencing of TCR and Ig loci in many species revealed the different organizations selected during evolution under the pressure of generating diverse repertoires of Ag receptors. By detecting clonotypes over a wide range of frequency, deep sequencing of Ig and TCR transcripts provides a new way to compare the structure of expressed repertoires in species of various sizes, at different stages of development, with different physiologies, and displaying multiple adaptations to the environment. In this review, we provide a short overview of the technologies currently used to produce global description of immune repertoires, describe how they have already been used in comparative immunology, and we discuss the future potential of such approaches. The development of these methodologies in new species holds promise for new discoveries concerning particular adaptations. As an example, understanding the development of adaptive immunity across metamorphosis in frogs has been made possible by such approaches. Repertoire sequencing is now widely used, not only in basic research but also in the context of immunotherapy and vaccination. Analysis of fish responses to pathogens and vaccines has already benefited from these methods. Finally, we also discuss potential advances based on repertoire sequencing of multigene families of immune sensors and effectors in invertebrates. Copyright © 2017 Elsevier Ltd. All rights reserved.
Next-Generation Immune Repertoire Sequencing as a Clue to Elucidate the Landscape of Immune Modulation by Host-Gut Microbiome Interactions.

PubMed

Ichinohe, Tatsuo; Miyama, Takahiko; Kawase, Takakazu; Honjo, Yasuko; Kitaura, Kazutaka; Sato, Hiroyuki; Shin-I, Tadasu; Suzuki, Ryuji

2018-01-01

The human immune system is a fine network consisting of innumerable functional cells that balance immunity and tolerance against various endogenous and environmental challenges. Although advances in modern immunology have revealed a role of many unique immune cell subsets, technologies that enable us to capture the whole landscape of immune responses against specific antigens have not been available to date. Acquired immunity against various microorganisms including the host microbiome is principally founded on T cell and B cell populations, each of which expresses antigen-specific receptors that define a unique clonotype. Over the past several years, high-throughput next-generation sequencing has been developed as a powerful tool to profile T- and B-cell receptor repertoires in a given individual at the single-cell level. Sophisticated immuno-bioinformatic analyses using this innovative methodology have already been implemented in clinical development of antibody engineering, vaccine design, and cellular immunotherapy. In this article, we aim to discuss the possible application of high-throughput immune receptor sequencing in the field of nutritional and intestinal immunology. Although there are still unsolved caveats, this emerging technology combined with single-cell transcriptomics/proteomics provides a critical tool to unveil the previously unrecognized principle of host-microbiome immune homeostasis. Accumulation of such knowledge will lead to the development of effective ways for personalized immune modulation through deeper understanding of the mechanisms by which the intestinal environment affects our immune ecosystem.
Systematic Review of Synthetic Computed Tomography Generation Methodologies for Use in Magnetic Resonance Imaging-Only Radiation Therapy.

PubMed

Johnstone, Emily; Wyatt, Jonathan J; Henry, Ann M; Short, Susan C; Sebag-Montefiore, David; Murray, Louise; Kelly, Charles G; McCallum, Hazel M; Speight, Richard

2018-01-01

Magnetic resonance imaging (MRI) offers superior soft-tissue contrast as compared with computed tomography (CT), which is conventionally used for radiation therapy treatment planning (RTP) and patient positioning verification, resulting in improved target definition. The 2 modalities are co-registered for RTP; however, this introduces a systematic error. Implementing an MRI-only radiation therapy workflow would be advantageous because this error would be eliminated, the patient pathway simplified, and patient dose reduced. Unlike CT, in MRI there is no direct relationship between signal intensity and electron density; however, various methodologies for MRI-only RTP have been reported. A systematic review of these methods was undertaken. The PRISMA guidelines were followed. Embase and Medline databases were searched (1996 to March 2017) for studies that generated synthetic CT scans (sCTs) for MRI-only radiation therapy. Sixty-one articles met the inclusion criteria. This review showed that MRI-only RTP techniques could be grouped into 3 categories: (1) bulk density override; (2) atlas-based; and (3) voxel-based techniques, which all produce an sCT scan from MR images. Bulk density override techniques either used a single homogeneous or multiple tissue override. The former produced large dosimetric errors (>2%) in some cases and the latter frequently required manual bone contouring. Atlas-based techniques used both single and multiple atlases and included methods incorporating pattern recognition techniques. Clinically acceptable sCTs were reported, but atypical anatomy led to erroneous results in some cases. Voxel-based techniques included methods using routine and specialized MRI sequences, namely ultra-short echo time imaging. High-quality sCTs were produced; however, use of multiple sequences led to long scanning times increasing the chances of patient movement. Using nonroutine sequences would currently be problematic in most radiation therapy centers. Atlas-based and voxel-based techniques were found to be the most clinically useful methods, with some studies reporting dosimetric differences of <1% between planning on the sCT and CT and <1-mm deviations when using sCTs for positional verification. Copyright © 2017 Elsevier Inc. All rights reserved.
Multiple templates-based homology modeling enhances structure quality of AT1 receptor: validation by molecular dynamics and antagonist docking.

PubMed

Sokkar, Pandian; Mohandass, Shylajanaciyar; Ramachandran, Murugesan

2011-07-01

We present a comparative account of 3D structures of the human type-1 receptor (AT1) for angiotensin II (AngII), modeled using three different methodologies. AngII activates a wide spectrum of signaling responses via the AT1 receptor that mediates physiological control of blood pressure and diverse pathological actions in cardiovascular, renal, and other cell types. Availability of a 3D model of the AT1 receptor would significantly enhance the development of new drugs for cardiovascular diseases. However, templates of AT1 receptor with low sequence similarity increase the complexity in straightforward homology modeling, and hence there is a need to evaluate different modeling methodologies in order to use the models for sensitive applications such as rational drug design. Three models were generated for AT1 receptor by (1) homology modeling with bovine rhodopsin as template, (2) homology modeling with multiple templates and (3) threading using I-TASSER web server. Molecular dynamics (MD) simulation (15 ns) of models in explicit membrane-water system, Ramachandran plot analysis and molecular docking with antagonists led to the conclusion that multiple template-based homology modeling outweighs other methodologies for AT1 modeling.
Comparative performance of high-density oligonucleotide sequencing and dideoxynucleotide sequencing of HIV type 1 pol from clinical samples.

PubMed

Günthard, H F; Wong, J K; Ignacio, C C; Havlir, D V; Richman, D D

1998-07-01

The performance of the high-density oligonucleotide array methodology (GeneChip) in detecting drug resistance mutations in HIV-1 pol was compared with that of automated dideoxynucleotide sequencing (ABI) of clinical samples, viral stocks, and plasmid-derived NL4-3 clones. Sequences from 29 clinical samples (plasma RNA, n = 17; lymph node RNA, n = 5; lymph node DNA, n = 7) from 12 patients, from 6 viral stock RNA samples, and from 13 NL4-3 clones were generated by both methods. Editing was done independently by a different investigator for each method before comparing the sequences. In addition, NL4-3 wild type (WT) and mutants were mixed in varying concentrations and sequenced by both methods. Overall, a concordance of 99.1% was found for a total of 30,865 bases compared. The comparison of clinical samples (plasma RNA and lymph node RNA and DNA) showed a slightly lower match of base calls, 98.8% for 19,831 nucleotides compared (protease region, 99.5%, n = 8272; RT region, 98.3%, n = 11,316), than for viral stocks and NL4-3 clones (protease region, 99.8%; RT region, 99.5%). Artificial mixing experiments showed a bias toward calling wild-type bases by GeneChip. Discordant base calls are most likely due to differential detection of mixtures. The concordance between GeneChip and ABI was high and appeared dependent on the nature of the templates (directly amplified versus cloned) and the complexity of mixes.
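The concordance figures in this abstract are fractions of compared positions at which the two methods made the same base call. A minimal sketch of that calculation follows; it assumes the two call strings are already aligned and of equal length, and it counts a mixture code called by only one method (here the IUPAC code R) as discordant. The function name and the example sequences are hypothetical, and the study's own handling of mixtures may differ.

```python
def concordance(calls_a, calls_b):
    """Fraction of aligned positions where two methods give the same base call."""
    if len(calls_a) != len(calls_b) or not calls_a:
        raise ValueError("call strings must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)


# Hypothetical calls: one method reports a mixture (R = A/G) at position 5,
# the other reports only A there.
method_1_calls = "ACGTRA"
method_2_calls = "ACGTAA"
agreement = concordance(method_1_calls, method_2_calls)  # 5 of 6 positions match
```

Scoring mixtures as mismatches is one reason concordance can be lower for directly amplified clinical samples than for clones, in line with the abstract's observation that discordant calls mostly reflect differential detection of mixtures.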

Monitoring of facial stress during space flight: Optical computer recognition combining discriminative and generative methods

NASA Astrophysics Data System (ADS)

Dinges, David F.; Venkataraman, Sundara; McGlinchey, Eleanor L.; Metaxas, Dimitris N.

2007-02-01

Astronauts are required to perform mission-critical tasks at a high level of functional capability throughout spaceflight. Stressors can compromise their ability to do so, making early objective detection of neurobehavioral problems in spaceflight a priority. Computer optical approaches offer a completely unobtrusive way to detect distress during critical operations in space flight. A methodology was developed and a study completed to determine whether optical computer recognition algorithms could be used to discriminate facial expressions during stress induced by performance demands. Stress recognition from a facial image sequence is a subject that has not received much attention although it is an important problem for many applications beyond space flight (security, human-computer interaction, etc.). This paper proposes a comprehensive method to detect stress from facial image sequences by using a model-based tracker. The image sequences were captured as subjects underwent a battery of psychological tests under high- and low-stress conditions. A cue integration-based tracking system accurately captured the rigid and non-rigid parameters of different parts of the face (eyebrows, lips). The labeled sequences were used to train the recognition system, which consisted of generative (hidden Markov model) and discriminative (support vector machine) parts that yield results superior to using either approach individually. The current optical algorithm methods performed at a 68% accuracy rate in an experimental study of 60 healthy adults undergoing periods of high-stress versus low-stress performance demands. Accuracy and practical feasibility of the technique are being improved further with automatic multi-resolution selection for the discretization of the mask, and automated face detection and mask initialization algorithms.
Bias-Corrected Targeted Next-Generation Sequencing for Rapid, Multiplexed Detection of Actionable Alterations in Cell-Free DNA from Advanced Lung Cancer Patients.

PubMed

Paweletz, Cloud P; Sacher, Adrian G; Raymond, Chris K; Alden, Ryan S; O'Connell, Allison; Mach, Stacy L; Kuang, Yanan; Gandhi, Leena; Kirschmeier, Paul; English, Jessie M; Lim, Lee P; Jänne, Pasi A; Oxnard, Geoffrey R

2016-02-15

Tumor genotyping is a powerful tool for guiding non-small cell lung cancer (NSCLC) care; however, comprehensive tumor genotyping can be logistically cumbersome. To facilitate genotyping, we developed a next-generation sequencing (NGS) assay using a desktop sequencer to detect actionable mutations and rearrangements in cell-free plasma DNA (cfDNA). An NGS panel was developed targeting 11 driver oncogenes found in NSCLC. Targeted NGS was performed using a novel methodology that maximizes on-target reads, and minimizes artifact, and was validated on DNA dilutions derived from cell lines. Plasma NGS was then blindly performed on 48 patients with advanced, progressive NSCLC and a known tumor genotype, and explored in two patients with incomplete tumor genotyping. NGS could identify mutations present in DNA dilutions at ≥ 0.4% allelic frequency with 100% sensitivity/specificity. Plasma NGS detected a broad range of driver and resistance mutations, including ALK, ROS1, and RET rearrangements, HER2 insertions, and MET amplification, with 100% specificity. Sensitivity was 77% across 62 known driver and resistance mutations from the 48 cases; in 29 cases with common EGFR and KRAS mutations, sensitivity was similar to droplet digital PCR. In two cases with incomplete tumor genotyping, plasma NGS rapidly identified a novel EGFR exon 19 deletion and a missed case of MET amplification. Blinded to tumor genotype, this plasma NGS approach detected a broad range of targetable genomic alterations in NSCLC with no false positives including complex mutations like rearrangements and unexpected resistance mutations such as EGFR C797S. Through use of widely available vacutainers and a desktop sequencing platform, this assay has the potential to be implemented broadly for patient care and translational research. ©2015 American Association for Cancer Research.
Bias-corrected targeted next-generation sequencing for rapid, multiplexed detection of actionable alterations in cell-free DNA from advanced lung cancer patients

PubMed Central

Paweletz, Cloud P.; Sacher, Adrian G.; Raymond, Chris K.; Alden, Ryan S.; O'Connell, Allison; Mach, Stacy L.; Kuang, Yanan; Gandhi, Leena; Kirschmeier, Paul; English, Jessie M.; Lim, Lee P.; Jänne, Pasi A.; Oxnard, Geoffrey R.

2015-01-01

Purpose: Tumor genotyping is a powerful tool for guiding non-small cell lung cancer (NSCLC) care; however, comprehensive tumor genotyping can be logistically cumbersome. To facilitate genotyping, we developed a next-generation sequencing (NGS) assay using a desktop sequencer to detect actionable mutations and rearrangements in cell-free plasma DNA (cfDNA). Experimental Design: An NGS panel was developed targeting 11 driver oncogenes found in NSCLC. Targeted NGS was performed using a novel methodology that maximizes on-target reads, and minimizes artifact, and was validated on DNA dilutions derived from cell lines. Plasma NGS was then blindly performed on 48 patients with advanced, progressive NSCLC and a known tumor genotype, and explored in two patients with incomplete tumor genotyping. Results: NGS could identify mutations present in DNA dilutions at ≥0.4% allelic frequency with 100% sensitivity/specificity. Plasma NGS detected a broad range of driver and resistance mutations, including ALK, ROS1, and RET rearrangements, HER2 insertions, and MET amplification, with 100% specificity. Sensitivity was 77% across 62 known driver and resistance mutations from the 48 cases; in 29 cases with common EGFR and KRAS mutations, sensitivity was similar to droplet digital PCR. In two cases with incomplete tumor genotyping, plasma NGS rapidly identified a novel EGFR exon 19 deletion and a missed case of MET amplification. Conclusion: Blinded to tumor genotype, this plasma NGS approach detected a broad range of targetable genomic alterations in NSCLC with no false positives including complex mutations like rearrangements and unexpected resistance mutations such as EGFR C797S. Through use of widely available vacutainers and a desktop sequencing platform, this assay has the potential to be implemented broadly for patient care and translational research. PMID:26459174
Transcriptomic Responses to Salinity Stress in the Pacific Oyster Crassostrea gigas

PubMed Central

Zhao, Xuelin; Yu, Hong; Kong, Lingfeng; Li, Qi

2012-01-01

Background Low salinity is one of the main factors limiting the distribution and survival of marine species. As a euryhaline species, the Pacific oyster Crassostrea gigas is considered to be tolerant to relatively low salinity. The genes that regulate C. gigas responses to osmotic stress were monitored using next-generation sequencing of the whole transcriptome with samples taken from gills. By RNAseq technology, transcript catalogs of up- and down-regulated genes were generated from the oysters exposed to low and optimal salinity seawater. Methodology/Principal Findings Through Illumina sequencing, we reported 1665 up-regulated transcripts and 1815 down-regulated transcripts. A total of 45771 protein-coding contigs were identified from two groups based on sequence similarities with known proteins. As determined by GO annotation and KEGG pathway mapping, functional annotation of the genes recovered diverse biological functions and processes. The genes that changed expression significantly were highly represented in cellular process and regulation of biological process, intracellular and cell, binding and protein binding according to GO annotation. The results highlighted genes related to osmoregulation, signaling and interactions of osmotic stress response, anti-apoptotic reactions as well as immune response, cell adhesion and communication, cytoskeleton and cell cycle. Conclusions/Significance Through more than 1.5 million sequence reads and the expression data of the two libraries, the study provided some useful insights into signal transduction pathways in oysters and offered a number of candidate genes as potential markers of tolerance to hypoosmotic stress for oysters. In addition, the characterization of the C. gigas transcriptome will not only provide a better understanding of the molecular mechanisms underlying the oysters' response to osmotic stress, but also facilitate research into biological processes to find underlying physiological adaptations to hypoosmotic shock for marine invertebrates. PMID:23029449
Elucidation of Peptide-Directed Palladium Surface Structure for Biologically Tunable Nanocatalysts

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bedford, Nicholas M.; Ramezani-Dakhel, Hadi; Slocik, Joseph M.

Peptide-enabled synthesis of inorganic nanostructures represents an avenue to access catalytic materials with tunable and optimized properties. This is achieved via peptide complexity and programmability that is missing in traditional ligands for catalytic nanomaterials. Unfortunately, there is limited information available to correlate peptide sequence to particle structure and catalytic activity to date. As such, the application of peptide-enabled nanocatalysts remains limited to trial and error approaches. In this paper, a hybrid experimental and computational approach is introduced to systematically elucidate biomolecule-dependent structure/function relationships for peptide-capped Pd nanocatalysts. Synchrotron X-ray techniques were used to uncover substantial particle surface structural disorder, which was dependent upon the amino acid sequence of the peptide capping ligand. Nanocatalyst configurations were then determined directly from experimental data using reverse Monte Carlo methods and further refined using molecular dynamics simulation, obtaining thermodynamically stable peptide-Pd nanoparticle configurations. Sequence-dependent catalytic property differences for C-C coupling and olefin hydrogenation were then elucidated by identification of the catalytic active sites at the atomic level and quantitative prediction of relative reaction rates. This hybrid methodology provides a clear route to determine peptide-dependent structure/function relationships, enabling the generation of guidelines for catalyst design through rational tailoring of peptide sequences.
UV Decontamination of MDA Reagents for Single Cell Genomics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lee, Janey; Tighe, Damon; Sczyrba, Alexander

2011-03-18

Single cell genomics, the amplification and sequencing of genomes from single cells, can provide a glimpse into the genetic make-up and thus life style of the vast majority of uncultured microbial cells, making it an immensely powerful and increasingly popular tool. This is accomplished by use of multiple displacement amplification (MDA), which can generate billions of copies of a single bacterial genome producing microgram-range DNA required for shotgun sequencing. Here, we address a key challenge inherent to this approach and propose a solution for the improved recovery of single cell genomes. While DNA-free reagents for the amplification of a single cell genome are a prerequisite for successful single cell sequencing and analysis, DNA contamination has been detected in various reagents, which poses a considerable challenge. Our study demonstrates the effect of UV irradiation in efficient elimination of exogenous contaminant DNA found in MDA reagents, while maintaining Phi29 activity. Consequently, we also find that increased UV exposure to Phi29 does not adversely affect genome coverage of MDA amplified single cells. While additional challenges in single cell genomics remain to be resolved, the proposed methodology is relatively quick and simple and we believe that its application will be of high value for future single cell sequencing projects.
Elucidation of peptide-directed palladium surface structure for biologically tunable nanocatalysts.

PubMed

Bedford, Nicholas M; Ramezani-Dakhel, Hadi; Slocik, Joseph M; Briggs, Beverly D; Ren, Yang; Frenkel, Anatoly I; Petkov, Valeri; Heinz, Hendrik; Naik, Rajesh R; Knecht, Marc R

2015-05-26

Peptide-enabled synthesis of inorganic nanostructures represents an avenue to access catalytic materials with tunable and optimized properties. This is achieved via peptide complexity and programmability that is missing in traditional ligands for catalytic nanomaterials. Unfortunately, there is limited information available to correlate peptide sequence to particle structure and catalytic activity to date. As such, the application of peptide-enabled nanocatalysts remains limited to trial and error approaches. In this paper, a hybrid experimental and computational approach is introduced to systematically elucidate biomolecule-dependent structure/function relationships for peptide-capped Pd nanocatalysts. Synchrotron X-ray techniques were used to uncover substantial particle surface structural disorder, which was dependent upon the amino acid sequence of the peptide capping ligand. Nanocatalyst configurations were then determined directly from experimental data using reverse Monte Carlo methods and further refined using molecular dynamics simulation, obtaining thermodynamically stable peptide-Pd nanoparticle configurations. Sequence-dependent catalytic property differences for C-C coupling and olefin hydrogenation were then elucidated by identification of the catalytic active sites at the atomic level and quantitative prediction of relative reaction rates. This hybrid methodology provides a clear route to determine peptide-dependent structure/function relationships, enabling the generation of guidelines for catalyst design through rational tailoring of peptide sequences.
The Transposon Galileo Generates Natural Chromosomal Inversions in Drosophila by Ectopic Recombination

PubMed Central

Delprat, Alejandra; Ruiz, Alfredo

2009-01-01

Background Transposable elements (TEs) are responsible for the generation of chromosomal inversions in several groups of organisms. However, in Drosophila and other Dipterans, where inversions are abundant both as intraspecific polymorphisms and interspecific fixed differences, the evidence for a role of TEs is scarce. Previous work revealed that the transposon Galileo was involved in the generation of two polymorphic inversions of Drosophila buzzatii. Methodology/Principal Findings To assess the impact of TEs in Drosophila chromosomal evolution and shed light on the mechanism involved, we isolated and sequenced the two breakpoints of another widespread polymorphic inversion from D. buzzatii, 2z³. In the non-inverted chromosome, the 2z³ distal breakpoint was located between genes CG2046 and CG10326 whereas the proximal breakpoint lay between two novel genes that we have named Dlh and Mdp. In the inverted chromosome, the analysis of the breakpoint sequences revealed relatively large insertions (2,870-bp and 4,786-bp long) including two copies of the transposon Galileo (subfamily Newton), one at each breakpoint, plus several other TEs. The two Galileo copies: (i) are inserted in opposite orientation; (ii) present exchanged target site duplications; and (iii) are both chimeric. Conclusions/Significance Our observations provide the best evidence gathered so far for the role of TEs in the generation of Drosophila inversions. In addition, they show unequivocally that ectopic recombination is the causative mechanism. The fact that the three polymorphic D. buzzatii inversions investigated so far were generated by the same transposon family is remarkable and is conceivably due to Galileo's unusual structure and current (or recent) transpositional activity. PMID:19936241
The Alveolate Perkinsus marinus: Biological Insights from EST Gene Discovery

PubMed Central

2010-01-01

Background Perkinsus marinus, a protozoan parasite of the eastern oyster Crassostrea virginica, has devastated natural and farmed oyster populations along the Atlantic and Gulf coasts of the United States. It is classified as a member of the Perkinsozoa, a recently established phylum considered close to the ancestor of ciliates, dinoflagellates, and apicomplexans, and a key taxon for understanding unique adaptations (e.g. parasitism) within the Alveolata. Despite intense parasite pressure, no disease-resistant oysters have been identified and no effective therapies have been developed to date. Results To gain insight into the biological basis of the parasite's virulence and pathogenesis mechanisms, and to identify genes encoding potential targets for intervention, we generated >31,000 5' expressed sequence tags (ESTs) derived from four trophozoite libraries generated from two P. marinus strains. Trimming and clustering of the sequence tags yielded 7,863 unique sequences, some of which carry a spliced leader. Similarity searches revealed that 55% of these had hits in protein sequence databases, of which 1,729 had their best hit with proteins from the chromalveolates (E-value ≤ 1e-5). Some sequences are similar to those proven to be targets for effective intervention in other protozoan parasites, and include not only proteases, antioxidant enzymes, and heat shock proteins, but also those associated with relict plastids, such as acetyl-CoA carboxylase and methylerythritol phosphate pathway components, and those involved in glycan assembly, protein folding/secretion, and parasite-host interactions. Conclusions Our transcriptome analysis of P. marinus, the first for any member of the Perkinsozoa, contributes new insight into its biology and taxonomic position. It provides a very informative, albeit preliminary, glimpse into the expression of genes encoding functionally relevant proteins as potential targets for chemotherapy, and evidence for the presence of a relict plastid. Further, although P. marinus sequences display significant similarity to those from both apicomplexans and dinoflagellates, the presence of trans-spliced transcripts confirms the previously established affinities with the latter. The EST analysis reported herein, together with the recently completed sequence of the P. marinus genome and the development of transfection methodology, should result in improved intervention strategies against dermo disease. PMID:20374649
Phylogenomics of plant genomes: a methodology for genome-wide searches for orthologs in plants

PubMed Central

Conte, Matthieu G; Gaillard, Sylvain; Droc, Gaetan; Perin, Christophe

2008-01-01

Background Gene ortholog identification is now a major objective for mining the increasing amount of sequence data generated by complete or partial genome sequencing projects. Comparative and functional genomics urgently need a method for ortholog detection to reduce gene function inference and to aid in the identification of conserved or divergent genetic pathways between several species. As gene functions change during evolution, reconstructing the evolutionary history of genes should be a more accurate way to differentiate orthologs from paralogs. Phylogenomics takes into account phylogenetic information from high-throughput genome annotation and is the most straightforward way to infer orthologs. However, procedures for automatic detection of orthologs are still scarce and suffer from several limitations. Results We developed a procedure for ortholog prediction between Oryza sativa and Arabidopsis thaliana. Firstly, we established an efficient method to cluster A. thaliana and O. sativa full proteomes into gene families. Then, we developed an optimized phylogenomics pipeline for ortholog inference. We validated the full procedure using test sets of orthologs and paralogs to demonstrate that our method outperforms pairwise methods for ortholog predictions. Conclusion Our procedure achieved a high level of accuracy in predicting ortholog and paralog relationships. Phylogenomic predictions for all validated gene families in both species were easily achieved and we can conclude that our methodology outperforms similarly based methods. PMID:18426584
Next generation sequencing techniques in liquid biopsy: focus on non-small cell lung cancer patients.

PubMed

Malapelle, Umberto; Pisapia, Pasquale; Rocco, Danilo; Smeraglio, Riccardo; di Spirito, Maria; Bellevicine, Claudio; Troncone, Giancarlo

2016-10-01

The advent of genomic based personalized medicine has led to multiple advances in the molecular characterization of many tumor types, such as non-small cell lung cancer (NSCLC). NSCLC is diagnosed in most cases on small tissue samples that may not always be sufficient for EGFR mutational assessment to select patients for first- and second-generation tyrosine kinase inhibitor (TKI) therapy. In patients without tissue availability at presentation, the analysis of cell free DNA (cfDNA) derived from liquid biopsy samples, in particular from plasma, represents an established alternative to provide EGFR mutational testing for treatment decision making. In addition, a new paradigm for TKI resistance management was recently approved by the Food and Drug Administration, supporting liquid biopsy based genotyping prior to tissue based genotyping for the detection of the T790M mutation to select patients for third generation TKIs. In these settings, real time PCR (RT-PCR) and digital PCR 'targeted' methods, which detect known mutations by specific probes, have extensively been adopted. Taking into account the restricted reference range and the limited multiplexing power of these targeted methods, the performance of liquid biopsy analyses may be further improved by next generation sequencing (NGS). While most tissue based NGS genotyping is well established, liquid biopsy NGS application is challenging, requiring a careful validation of the whole process, from blood collection to variant calling. Here we review this evolving field, highlighting those methodological points that are crucial to accurately select NSCLC patients for TKI treatment by NGS on cfDNA.
RBT-GA: a novel metaheuristic for solving the multiple sequence alignment problem

PubMed Central

Taheri, Javid; Zomaya, Albert Y

2009-01-01

Background Multiple Sequence Alignment (MSA) has always been an active area of research in Bioinformatics. MSA is mainly focused on discovering biologically meaningful relationships among different sequences or proteins in order to investigate the underlying main characteristics/functions. This information is also used to generate phylogenetic trees. Results This paper presents a novel approach, namely RBT-GA, to solve the MSA problem using a hybrid solution methodology combining the Rubber Band Technique (RBT) and the Genetic Algorithm (GA) metaheuristic. RBT is inspired by the behavior of an elastic Rubber Band (RB) on a plate with several poles, which is analogous to locations in the input sequences that could potentially be biologically related. A GA attempts to mimic the evolutionary processes of life in order to locate optimal solutions in an often very complex landscape. RBT-GA is a population based optimization algorithm designed to find the optimal alignment for a set of input protein sequences. In this novel technique, each alignment answer is modeled as a chromosome consisting of several poles in the RBT framework. These poles resemble locations in the input sequences that are most likely to be correlated and/or biologically related. A GA-based optimization process improves these chromosomes gradually, yielding a set of mostly optimal answers for the MSA problem. Conclusion RBT-GA is tested with one of the well-known benchmark suites (BALiBASE 2.0) in this area. The obtained results show the superiority of the proposed technique even in the case of formidable sequences. PMID:19594869
Technical variations in low-input RNA-seq methodologies.

PubMed

Bhargava, Vipul; Head, Steven R; Ordoukhanian, Phillip; Mercola, Mark; Subramaniam, Shankar

2014-01-14

Recent advances in RNA-seq methodologies from limiting amounts of mRNA have facilitated the characterization of rare cell-types in various biological systems. So far, however, technical variations in these methods have not been adequately characterized, vis-à-vis sensitivity, starting with reduced levels of mRNA. Here, we generated sequencing libraries from limiting amounts of mRNA using three amplification-based methods, viz. Smart-seq, DP-seq and CEL-seq, and demonstrated significant technical variations in these libraries. Reduction in mRNA levels led to inefficient amplification of the majority of low to moderately expressed transcripts. Furthermore, noise in primer hybridization and/or enzyme incorporation was magnified during the amplification step resulting in significant distortions in fold changes of the transcripts. Consequently, the majority of the differentially expressed transcripts identified were either high-expressed and/or exhibited high fold changes. High technical variations ultimately masked subtle biological differences mandating the development of improved amplification-based strategies for quantitative transcriptomics from limiting amounts of mRNA.
Rapid access to compound libraries through flow technology: fully automated synthesis of a 3-aminoindolizine library via orthogonal diversification.

PubMed

Lange, Paul P; James, Keith

2012-10-08

A novel methodology for the synthesis of druglike heterocycle libraries has been developed through the use of flow reactor technology. The strategy employs orthogonal modification of a heterocyclic core, which is generated in situ, and was used to construct both a 25-membered library of druglike 3-aminoindolizines, and selected examples of a 100-member virtual library. This general protocol allows a broad range of acylation, alkylation and sulfonamidation reactions to be performed in conjunction with a tandem Sonogashira coupling/cycloisomerization sequence. All three synthetic steps were conducted under full automation in the flow reactor, with no handling or isolation of intermediates, to afford the desired products in good yields. This fully automated, multistep flow approach opens the way to highly efficient generation of druglike heterocyclic systems as part of a lead discovery strategy or within a lead optimization program.
Bhageerath-H: A homology/ab initio hybrid server for predicting tertiary structures of monomeric soluble proteins

PubMed Central

2014-01-01

Background The advent of the human genome sequencing project has led to a spurt in the number of protein sequences in the databanks. Success of structure based drug discovery severely hinges on the availability of structures. Despite significant progress in the area of experimental protein structure determination, the sequence-structure gap is continually widening. Data driven homology based computational methods have proved successful in predicting tertiary structures for sequences sharing medium to high sequence similarities. With dwindling similarities of query sequences, advanced homology/ab initio hybrid approaches are being explored to solve the structure prediction problem. Here we describe Bhageerath-H, a homology/ab initio hybrid software/server for predicting protein tertiary structures with advancing drug design attempts as one of the goals. Results The Bhageerath-H web server was validated on 75 CASP10 targets, which showed TM-scores ≥0.5 in 91% of the cases and Cα RMSDs ≤5Å from the native in 58% of the targets, which is well above the CASP10 water mark. Comparison with some leading servers demonstrated the uniqueness of the hybrid methodology in effectively sampling conformational space, scoring best decoys and refining low resolution models to high and medium resolution. Conclusion The Bhageerath-H methodology is web enabled for the scientific community as a freely accessible web server. The methodology is fielded in the on-going CASP11 experiment. PMID:25521245
Variable speed wind turbine generator with zero-sequence filter

DOEpatents

Muljadi, Eduard

1998-01-01

A variable speed wind turbine generator system to convert mechanical power into electrical power or energy and to recover the electrical power or energy in the form of three phase alternating current and return the power or energy to a utility or other load with single phase sinusoidal waveform at sixty (60) hertz and unity power factor includes an excitation controller for generating three phase commanded current, a generator, and a zero sequence filter. Each commanded current signal includes two components: a positive sequence variable frequency current signal to provide the balanced three phase excitation currents required in the stator windings of the generator to generate the rotating magnetic field needed to recover an optimum level of real power from the generator; and a zero frequency sixty (60) hertz current signal to allow the real power generated by the generator to be supplied to the utility. The positive sequence current signals are balanced three phase signals and are prevented from entering the utility by the zero sequence filter. The zero sequence current signals have zero phase displacement from each other and are prevented from entering the generator by the star connected stator windings. The zero sequence filter allows the zero sequence current signals to pass through to deliver power to the utility.
Variable Speed Wind Turbine Generator with Zero-sequence Filter

DOEpatents

Muljadi, Eduard

1998-08-25

A variable speed wind turbine generator system to convert mechanical power into electrical power or energy and to recover the electrical power or energy in the form of three phase alternating current and return the power or energy to a utility or other load with single phase sinusoidal waveform at sixty (60) hertz and unity power factor includes an excitation controller for generating three phase commanded current, a generator, and a zero sequence filter. Each commanded current signal includes two components: a positive sequence variable frequency current signal to provide the balanced three phase excitation currents required in the stator windings of the generator to generate the rotating magnetic field needed to recover an optimum level of real power from the generator; and a zero frequency sixty (60) hertz current signal to allow the real power generated by the generator to be supplied to the utility. The positive sequence current signals are balanced three phase signals and are prevented from entering the utility by the zero sequence filter. The zero sequence current signals have zero phase displacement from each other and are prevented from entering the generator by the star connected stator windings. The zero sequence filter allows the zero sequence current signals to pass through to deliver power to the utility.
Variable speed wind turbine generator with zero-sequence filter

DOEpatents

Muljadi, E.

1998-08-25

A variable speed wind turbine generator system to convert mechanical power into electrical power or energy and to recover the electrical power or energy in the form of three phase alternating current and return the power or energy to a utility or other load with single phase sinusoidal waveform at sixty (60) hertz and unity power factor includes an excitation controller for generating three phase commanded current, a generator, and a zero sequence filter. Each commanded current signal includes two components: a positive sequence variable frequency current signal to provide the balanced three phase excitation currents required in the stator windings of the generator to generate the rotating magnetic field needed to recover an optimum level of real power from the generator; and a zero frequency sixty (60) hertz current signal to allow the real power generated by the generator to be supplied to the utility. The positive sequence current signals are balanced three phase signals and are prevented from entering the utility by the zero sequence filter. The zero sequence current signals have zero phase displacement from each other and are prevented from entering the generator by the star connected stator windings. The zero sequence filter allows the zero sequence current signals to pass through to deliver power to the utility. 14 figs.
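The separation of positive-sequence and zero-sequence currents described in the record above can be illustrated with the standard symmetrical-component (Fortescue) transform. The sketch below is not taken from the patent: the function name `sequence_components` and the amplitudes 10 and 4 are illustrative assumptions, and it treats all currents as phasors at a single frequency, whereas the abstract describes a variable-frequency positive-sequence signal and a 60 Hz zero-sequence signal.

```python
import cmath
import math

# 120-degree rotation operator used in symmetrical-component analysis.
A = cmath.exp(2j * math.pi / 3)


def sequence_components(ia, ib, ic):
    """Return (zero, positive, negative) sequence phasors of three phase currents."""
    zero = (ia + ib + ic) / 3
    positive = (ia + A * ib + A * A * ic) / 3
    negative = (ia + A * A * ib + A * ic) / 3
    return zero, positive, negative


# Hypothetical commanded currents: a balanced positive-sequence set of
# amplitude 10 plus an identical (zero phase displacement) component of 4
# added to every phase.
ia = 10 + 4
ib = 10 * A * A + 4
ic = 10 * A + 4

zero, positive, negative = sequence_components(ia, ib, ic)
print("zero:", abs(zero), "positive:", abs(positive), "negative:", abs(negative))

# The balanced part cancels in the sum of the three phases; only the
# zero-sequence part survives, which is why a filter acting on the common
# (neutral) path can pass it while blocking the positive-sequence signal.
print("sum of phase currents:", ia + ib + ic)
```

Under these assumptions the transform recovers the two superimposed components separately, up to floating-point rounding, and the sum of the three phase currents equals three times the zero-sequence phasor.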
Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

NASA Technical Reports Server (NTRS)

Wallace, G. R.; Weathers, G. D.; Graf, E. R.

1973-01-01

The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
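As a minimal sketch of the construction named in the record above, the following generates two maximum-length sequences with small linear feedback shift registers and forms their modulo-two sum. The function names, tap choices, and register lengths are illustrative assumptions, not taken from the report, and the filtering step and statistical quality factor are not reproduced here.

```python
from itertools import cycle, islice
from math import lcm  # requires Python 3.9+


def lfsr_msequence(taps, nbits):
    """Return one period (2**nbits - 1 bits) of a Fibonacci LFSR output.

    taps: 1-indexed register stages XORed together to form the feedback bit.
    The period is maximal only if the taps correspond to a primitive polynomial.
    """
    state = [1] * nbits  # any nonzero seed works
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(state[-1])           # output is the last stage
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]
        state = [feedback] + state[:-1]  # shift, inserting the feedback bit
    return out


def hybrid_sum(*sequences):
    """Modulo-two (XOR) sum of periodic sequences over their common period."""
    period = lcm(*(len(s) for s in sequences))
    total = [0] * period
    for s in sequences:
        for i, bit in enumerate(islice(cycle(s), period)):
            total[i] ^= bit
    return total


m3 = lfsr_msequence((3, 2), 3)  # 3-stage register, period 7
m4 = lfsr_msequence((4, 3), 4)  # 4-stage register, period 15
hybrid = hybrid_sum(m3, m4)     # common period lcm(7, 15) = 105

print(len(m3), len(m4), len(hybrid))
print("fraction of ones in hybrid sequence:", sum(hybrid) / len(hybrid))
```

Combining registers with coprime periods gives a sequence whose period is the product of the component periods, which is one reason such sums are attractive when a long pseudorandom sequence is needed from short registers.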
Autogen Version 2.0

NASA Technical Reports Server (NTRS)

Gladden, Roy

2007-01-01

Version 2.0 of the autogen software has been released. "Autogen" (automated sequence generation) signifies both a process and software used to implement the process of automated generation of sequences of commands in a standard format for uplink to spacecraft. Autogen requires fewer workers than are needed for older manual sequence-generation processes and reduces sequence-generation times from weeks to minutes.

Periodic, On-Demand, and User-Specified Information Reconciliation

NASA Technical Reports Server (NTRS)

Kolano, Paul

2007-01-01

Automated sequence generation (autogen) signifies both a process and software used to automatically generate sequences of commands to operate various spacecraft. Autogen requires fewer workers than are needed for older manual sequence-generation processes and reduces sequence-generation times from weeks to minutes. The autogen software comprises the autogen script plus the Activity Plan Generator (APGEN) program. APGEN can be used for planning missions and command sequences. APGEN includes a graphical user interface that facilitates scheduling of activities on a time line and affords a capability to automatically expand, decompose, and schedule activities.
The Methodology of Clinical Studies Used by the FDA for Approval of High-Risk Orthopaedic Devices.

PubMed

Barker, Jordan P; Simon, Stephen D; Dubin, Jonathan

2017-05-03

The purpose of this investigation was to examine the methodology of clinical trials used by the U.S. Food and Drug Administration (FDA) to determine the safety and effectiveness of high-risk orthopaedic devices approved between 2001 and 2015. Utilizing the FDA's online public database, this systematic review audited study design and methodological variables intended to minimize bias and confounding. An additional analysis of blinding as well as the Checklist to Evaluate a Report of a Nonpharmacological Trial (CLEAR NPT) was applied to the randomized controlled trials (RCTs). Of the 49 studies, 46 (94%) were prospective and 37 (76%) were randomized. Forty-seven (96%) of the studies were controlled in some form. Of 35 studies that reported it, blinding was utilized in 21 (60%), of which 8 (38%) were reported as single-blinded and 13 (62%) were reported as double-blinded. Of the 37 RCTs, outcome assessors were clearly blinded in 6 (16%), whereas 15 (41%) were deemed impossible to blind as implants could be readily discerned on imaging. When the CLEAR NPT was applied to the 37 RCTs, >70% of studies were deemed "unclear" in describing generation of allocation sequences, treatment allocation concealment, and adequate blinding of participants and outcome assessors. This study manifests the highly variable reporting and strength of clinical research methodology accepted by the FDA to approve high-risk orthopaedic devices.
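The percentages in the record above can be checked against the reported counts with simple arithmetic. The sketch below is a reader's consistency check, not part of the study; the dictionary keys are shorthand labels introduced here for illustration.

```python
# (count, denominator, percentage as reported in the abstract)
reported = {
    "prospective": (46, 49, 94),
    "randomized": (37, 49, 76),
    "controlled": (47, 49, 96),
    "blinding_used": (21, 35, 60),
    "single_blinded": (8, 21, 38),
    "double_blinded": (13, 21, 62),
    "assessors_blinded": (6, 37, 16),
    "impossible_to_blind": (15, 37, 41),
}


def percent(count, denominator):
    """Percentage rounded to the nearest whole number."""
    return round(100 * count / denominator)


mismatches = [
    name
    for name, (count, denominator, stated) in reported.items()
    if percent(count, denominator) != stated
]
print("mismatches:", mismatches)
```

Each stated percentage matches its count and denominator when rounded to the nearest whole number, so the list of mismatches is empty.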
Reanalysis of RNA-Sequencing Data Reveals Several Additional Fusion Genes with Multiple Isoforms

PubMed Central

Kangaspeska, Sara; Hultsch, Susanne; Edgren, Henrik; Nicorici, Daniel; Murumägi, Astrid; Kallioniemi, Olli

2012-01-01

RNA-sequencing and tailored bioinformatic methodologies have paved the way for identification of expressed fusion genes from the chaotic genomes of solid tumors. We have recently successfully exploited RNA-sequencing for the discovery of 24 novel fusion genes in breast cancer. Here, we demonstrate the importance of continuous optimization of the bioinformatic methodology for this purpose, and report the discovery and experimental validation of 13 additional fusion genes from the same samples. Integration of copy number profiling with the RNA-sequencing results revealed that the majority of the gene fusions were promoter-donating events that occurred at copy number transition points or involved high-level DNA-amplifications. Sequencing of genomic fusion break points confirmed that DNA-level rearrangements underlie selected fusion transcripts. Furthermore, a significant portion (>60%) of the fusion genes were alternatively spliced. This illustrates the importance of reanalyzing sequencing data as gene definitions change and bioinformatic methods improve, and highlights the previously unforeseen isoform diversity among fusion transcripts. PMID:23119097
Reanalysis of RNA-sequencing data reveals several additional fusion genes with multiple isoforms.

PubMed

Kangaspeska, Sara; Hultsch, Susanne; Edgren, Henrik; Nicorici, Daniel; Murumägi, Astrid; Kallioniemi, Olli

2012-01-01

RNA-sequencing and tailored bioinformatic methodologies have paved the way for identification of expressed fusion genes from the chaotic genomes of solid tumors. We have recently successfully exploited RNA-sequencing for the discovery of 24 novel fusion genes in breast cancer. Here, we demonstrate the importance of continuous optimization of the bioinformatic methodology for this purpose, and report the discovery and experimental validation of 13 additional fusion genes from the same samples. Integration of copy number profiling with the RNA-sequencing results revealed that the majority of the gene fusions were promoter-donating events that occurred at copy number transition points or involved high-level DNA-amplifications. Sequencing of genomic fusion break points confirmed that DNA-level rearrangements underlie selected fusion transcripts. Furthermore, a significant portion (>60%) of the fusion genes were alternatively spliced. This illustrates the importance of reanalyzing sequencing data as gene definitions change and bioinformatic methods improve, and highlights the previously unforeseen isoform diversity among fusion transcripts.
Evaluating and Redesigning Teaching Learning Sequences at the Introductory Physics Level

ERIC Educational Resources Information Center

Guisasola, Jenaro; Zuza, Kristina; Ametller, Jaume; Gutierrez-Berraondo, José

2017-01-01

In this paper we put forward a proposal for the design and evaluation of teaching and learning sequences in upper secondary school and university. We will connect our proposal with relevant contributions on the design of teaching sequences, ground it on the design-based research methodology, and discuss how teaching and learning sequences designed…
Unraveling the genetic diversity and phylogeny of Leishmania RNA virus 1 strains of infected Leishmania isolates circulating in French Guiana

PubMed Central

Caballero, Ignacio S.; Bouchier, Christiane; Lavergne, Anne; Bourreau, Eliane; Mosnier, Emilie; Vantilcke, Vincent; Couppié, Pierre; Prevot, Ghislaine

2017-01-01

Introduction Leishmania RNA virus type 1 (LRV1) is an endosymbiont of some Leishmania (Viannia) species in South America. Presence of LRV1 in parasites exacerbates disease severity in animal models and humans, related to a disproportionate innate immune response, and is correlated with drug treatment failures in humans. Although the virus was identified decades ago, its genomic diversity has been overlooked until now. Methodology/Principal findings We subjected LRV1 strains from 19 L. (V.) guyanensis and one L. (V.) braziliensis isolates obtained from cutaneous leishmaniasis samples identified throughout French Guiana to next-generation sequencing and de novo sequence assembly. We generated and analyzed 24 unique LRV1 sequences over their full-length coding regions. Multiple alignment of these new sequences revealed variability (0.5%–23.5%) across the entire sequence except for highly conserved motifs within the 5’ untranslated region. Phylogenetic analyses showed that viral genomes of L. (V.) guyanensis grouped into five distinct clusters. They further showed a species-dependent clustering between viral genomes of L. (V.) guyanensis and L. (V.) braziliensis, confirming a long-term co-evolutionary history. Notably, we identified cases of multiple LRV1 infections in three of the 20 Leishmania isolates. Conclusions/Significance Here, we present the first-ever estimate of LRV1 genomic diversity that exists in Leishmania (V.) guyanensis parasites. Genetic characterization and phylogenetic analyses of these viruses have shed light on their evolutionary relationships. To our knowledge, this study is also the first to report cases of multiple LRV1 infections in some parasites. Finally, this work has made it possible to develop molecular tools for adequate identification and genotyping of LRV1 strains for diagnostic purposes. Given the suspected worsening role of LRV1 infection in the pathogenesis of human leishmaniasis, these data have a major impact from a clinical viewpoint and for the management of Leishmania-infected patients. PMID:28715422
Profiling pneumococcal type 3-derived oligosaccharides by high resolution liquid chromatography-tandem mass spectrometry

PubMed Central

Li, Guoyun; Li, Lingyun; Xue, Changhu; Middleton, Dustin; Linhardt, Robert J.; Avci, Fikri Y.

2015-01-01

Pneumococcal type-3 polysaccharide (Pn3P) is considered a major target for the development of a human vaccine to protect against Streptococcus pneumoniae infection. Thus, it is critical to develop methods for the preparation and analysis of Pn3P-derived oligosaccharides to better understand its immunological properties. In this paper, we profile oligosaccharides, generated by the free radical depolymerization of Pn3P, using liquid chromatography (LC)-tandem mass spectrometry (MS/MS). Hydrophilic interaction liquid chromatography (HILIC)-mass spectrometry (MS) revealed a series of oligosaccharides with even and odd numbers of saccharide residues, ranging from monosaccharide (degree of polymerization 1, dp1) to large oligosaccharides up to dp20, generated by free radical depolymerization. Isomers of oligosaccharides with an even number of sugar residues were easily separated on a HILIC column, and their sequences could be distinguished by comparing MS/MS of these oligosaccharides and their reduced alditols. Fluorescent labeling with 2-aminoacridone (AMAC) followed by reversed phase (RP)-LC-MS/MS was applied to analyze and sequence poorly separated product mixtures, as RP-LC affords higher resolution of AMAC-labeled oligosaccharides than does HILIC-based separation. The present methodology can potentially be applied to profiling other capsular polysaccharides. PMID:25913329
Metaproteomics Provides Functional Insight into Activated Sludge Wastewater Treatment

PubMed Central

Wilmes, Paul; Wexler, Margaret; Bond, Philip L.

2008-01-01

Background Through identification of highly expressed proteins from a mixed culture activated sludge system this study provides functional evidence of microbial transformations important for enhanced biological phosphorus removal (EBPR). Methodology/Principal Findings A laboratory-scale sequencing batch reactor was successfully operated for different levels of EBPR, removing around 25, 40 and 55 mg/l P. The microbial communities were dominated by the uncultured polyphosphate-accumulating organism “Candidatus Accumulibacter phosphatis”. When EBPR failed, the sludge was dominated by tetrad-forming α-Proteobacteria. Representative and reproducible 2D gel protein separations were obtained for all sludge samples. 638 protein spots were matched across gels generated from the phosphate removing sludges. 111 of these were excised and 46 proteins were identified using recently available sludge metagenomic sequences. Many of these closely match proteins from “Candidatus Accumulibacter phosphatis” and could be directly linked to the EBPR process. They included enzymes involved in energy generation, polyhydroxyalkanoate synthesis, glycolysis, gluconeogenesis, glycogen synthesis, glyoxylate/TCA cycle, fatty acid β oxidation, fatty acid synthesis and phosphate transport. Several proteins involved in cellular stress response were detected. Conclusions/Significance Importantly, this study provides direct evidence linking the metabolic activities of “Accumulibacter” to the chemical transformations observed in EBPR. Finally, the results are discussed in relation to current EBPR metabolic models. PMID:18392150
Smoking Gun or Circumstantial Evidence? Comparison of Statistical Learning Methods using Functional Annotations for Prioritizing Risk Variants.

PubMed

Gagliano, Sarah A; Ravji, Reena; Barnes, Michael R; Weale, Michael E; Knight, Jo

2015-08-24

Although technology has triumphed in facilitating routine genome sequencing, new challenges have been created for the data analyst. Genome-scale surveys of human variation generate volumes of data that far exceed capabilities for laboratory characterization. By incorporating functional annotations as predictors, statistical learning has been widely investigated for prioritizing genetic variants likely to be associated with complex disease. We compared three published prioritization procedures, which use different statistical learning algorithms and different predictors with regard to the quantity, type and coding. We also explored different combinations of algorithm and annotation set. As an application, we tested which methodology performed best for prioritizing variants using data from a large schizophrenia meta-analysis by the Psychiatric Genomics Consortium. Results suggest that all methods have considerable (and similar) predictive accuracies (AUCs 0.64-0.71) in test set data, but there is more variability in the application to the schizophrenia GWAS. In conclusion, a variety of algorithms and annotations seem to have a similar potential to effectively enrich true risk variants in genome-scale datasets; however, none offers more than incremental improvement in prediction. We discuss how methods might be evolved for risk variant prediction to address the impending bottleneck of the new generation of genome re-sequencing studies.
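For readers unfamiliar with the AUC figures quoted above, the following Python sketch shows one standard way to compute an area under the ROC curve: as the fraction of (risk variant, non-risk variant) pairs in which the risk variant receives the higher score, with ties counted as half. The function name and the scores are hypothetical and are not data from the study.

```python
def auc(scores_pos, scores_neg):
    """Rank-based AUC: probability that a positive outscores a negative.

    Ties contribute 0.5, which makes this equal to the normalized
    Mann-Whitney U statistic.
    """
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical prioritization scores (illustration only).
risk_variant_scores = [0.9, 0.4]
other_variant_scores = [0.5, 0.3]
example_auc = auc(risk_variant_scores, other_variant_scores)
print(example_auc)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is why values of 0.64-0.71 indicate useful but modest enrichment.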
Revealing stable processing products from ribosome-associated small RNAs by deep-sequencing data analysis.

PubMed

Zywicki, Marek; Bakowska-Zywicka, Kamilla; Polacek, Norbert

2012-05-01

The exploration of the non-protein-coding RNA (ncRNA) transcriptome is currently focused on profiling of microRNA expression and detection of novel ncRNA transcription units. However, recent studies suggest that RNA processing can be a multi-layer process leading to the generation of ncRNAs of diverse functions from a single primary transcript. To date, no methodology has been presented to distinguish stable functional RNA species from rapidly degraded side products of nucleases. Thus the correct assessment of widespread RNA processing events is one of the major obstacles in transcriptome research. Here, we present a novel automated computational pipeline, named APART, providing a complete workflow for the reliable detection of RNA processing products from next-generation-sequencing data. The major features include efficient handling of non-unique reads, detection of novel stable ncRNA transcripts and processing products and annotation of known transcripts based on multiple sources of information. To demonstrate the potential of APART, we have analyzed a cDNA library derived from small ribosome-associated RNAs in Saccharomyces cerevisiae. By employing the APART pipeline, we were able to detect and confirm by independent experimental methods multiple novel stable RNA molecules differentially processed from well-known ncRNAs, like rRNAs, tRNAs or snoRNAs, in a stress-dependent manner.
Profiling pneumococcal type 3-derived oligosaccharides by high resolution liquid chromatography-tandem mass spectrometry.

PubMed

Li, Guoyun; Li, Lingyun; Xue, Changhu; Middleton, Dustin; Linhardt, Robert J; Avci, Fikri Y

2015-06-05

Pneumococcal type-3 polysaccharide (Pn3P) is considered a major target for the development of a human vaccine to protect against Streptococcus pneumoniae infection. Thus, it is critical to develop methods for the preparation and analysis of Pn3P-derived oligosaccharides to better understand its immunological properties. In this paper, we profile oligosaccharides, generated by the free radical depolymerization of Pn3P, using liquid chromatography (LC)-tandem mass spectrometry (MS/MS). Hydrophilic interaction liquid chromatography (HILIC)-mass spectrometry (MS) revealed a series of oligosaccharides with even and odd numbers of saccharide residues, ranging from monosaccharide (degree of polymerization 1, dp1) to large oligosaccharides up to dp20, generated by free radical depolymerization. Isomers of oligosaccharides with an even number of sugar residues were easily separated on a HILIC column, and their sequences could be distinguished by comparing MS/MS of these oligosaccharides and their reduced alditols. Fluorescent labeling with 2-aminoacridone (AMAC) followed by reversed phase (RP)-LC-MS/MS was applied to analyze and sequence poorly separated product mixtures, as RP-LC affords higher resolution of AMAC-labeled oligosaccharides than does HILIC-based separation. The present methodology can potentially be applied to profiling other capsular polysaccharides. Copyright © 2015 Elsevier B.V. All rights reserved.
Image encryption using random sequence generated from generalized information domain

NASA Astrophysics Data System (ADS)

Xia-Yan, Zhang; Guo-Ji, Zhang; Xuan, Li; Ya-Zhou, Ren; Jie-Hua, Wu

2016-05-01

A novel image encryption method based on the random sequence generated from the generalized information domain and permutation-diffusion architecture is proposed. The random sequence is generated by reconstruction from the generalized information file and discrete trajectory extraction from the data stream. The trajectory address sequence is used to generate a P-box to shuffle the plain image while random sequences are treated as keystreams. A new factor called drift factor is employed to accelerate and enhance the performance of the random sequence generator. An initial value is introduced to make the encryption method an approximately one-time pad. Experimental results show that the random sequences pass the NIST statistical test with a high ratio and extensive analysis demonstrates that the new encryption scheme has superior security.
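The permutation-diffusion architecture mentioned above can be illustrated with a toy Python sketch. This is not the authors' scheme: a seeded `random.Random` stands in for their generalized-information-domain sequence generator (it is not cryptographically secure), the drift factor is omitted, and the names `encrypt`, `decrypt`, `seed` and `iv` are assumptions made for illustration. The sketch only shows the two stages: a P-box shuffles pixel positions, then each shuffled pixel is XORed with a keystream byte and the previous cipher byte.

```python
import random


def _pbox_and_keystream(n, seed):
    """Derive a position permutation (P-box) and a keystream from one seed.

    random.Random is only a stand-in for the paper's sequence generator.
    """
    rng = random.Random(seed)
    pbox = list(range(n))
    rng.shuffle(pbox)
    keystream = [rng.randrange(256) for _ in range(n)]
    return pbox, keystream


def encrypt(pixels, seed, iv=0):
    pbox, keystream = _pbox_and_keystream(len(pixels), seed)
    shuffled = [pixels[i] for i in pbox]      # permutation stage
    cipher, prev = [], iv
    for p, k in zip(shuffled, keystream):     # diffusion stage
        prev = p ^ k ^ prev                   # chain in the previous cipher byte
        cipher.append(prev)
    return cipher


def decrypt(cipher, seed, iv=0):
    pbox, keystream = _pbox_and_keystream(len(cipher), seed)
    shuffled, prev = [], iv
    for c, k in zip(cipher, keystream):       # undo diffusion
        shuffled.append(c ^ k ^ prev)
        prev = c
    plain = [0] * len(cipher)
    for position, source in enumerate(pbox):  # undo permutation
        plain[source] = shuffled[position]
    return plain


image = [12, 200, 7, 7, 255, 0, 33, 90]       # hypothetical 8-pixel grayscale image
recovered = decrypt(encrypt(image, seed=2016, iv=171), seed=2016, iv=171)
print(recovered == image)
```

Chaining each cipher byte into the next is what spreads a single-pixel change across the rest of the ciphertext; the permutation alone would only move pixels without changing their values.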
Community Profiling of Fusarium in Combination with Other Plant-Associated Fungi in Different Crop Species Using SMRT Sequencing.

PubMed

Walder, Florian; Schlaeppi, Klaus; Wittwer, Raphaël; Held, Alain Y; Vogelgsang, Susanne; van der Heijden, Marcel G A

2017-01-01

Fusarium head blight, caused by fungi from the genus Fusarium, is one of the most harmful cereal diseases, resulting not only in severe yield losses but also in mycotoxin-contaminated and health-threatening grains. Fusarium head blight is caused by a diverse set of species that have different host ranges, mycotoxin profiles and responses to agricultural practices. Thus, understanding the composition of Fusarium communities in the field is crucial for estimating their impact and also for the development of effective control measures. Up to now, most molecular tools that monitor Fusarium communities on plants are limited to certain species and do not distinguish other plant-associated fungi. To close these gaps, we developed a sequencing-based community profiling methodology for crop-associated fungi with a focus on the genus Fusarium. By analyzing a 1600 bp long amplicon spanning the highly variable segments ITS and D1-D3 of the ribosomal operon by PacBio SMRT sequencing, we were able to robustly quantify Fusarium down to species level through clustering against reference sequences. The newly developed methodology was successfully validated in mock communities and provided results similar to the culture-based assessment of Fusarium communities by seed health tests in grain samples from different crop species. Finally, we exemplified the newly developed methodology in a field experiment with a wheat-maize crop sequence under different cover crop and tillage regimes. We analyzed wheat straw residues, cover crop shoots and maize grains, and found that the cover crop hairy vetch (Vicia villosa) acts as a potent alternative host for Fusarium (OTU F.ave/tri), showing an eightfold higher relative abundance compared with other cover crop treatments. Moreover, as the newly developed methodology also allows other crop-associated fungi to be traced, we found that vetch and green fallow hosted further fungal plant pathogens including Zymoseptoria tritici. Thus, besides their beneficial traits, cover crops can also entail phytopathological risks by acting as alternative hosts for Fusarium and other noxious plant pathogens. The newly developed sequencing-based methodology is a powerful diagnostic tool to trace Fusarium in combination with other fungi associated with different crop species.
Community Profiling of Fusarium in Combination with Other Plant-Associated Fungi in Different Crop Species Using SMRT Sequencing

PubMed Central

Walder, Florian; Schlaeppi, Klaus; Wittwer, Raphaël; Held, Alain Y.; Vogelgsang, Susanne; van der Heijden, Marcel G. A.

2017-01-01

Fusarium head blight, caused by fungi from the genus Fusarium, is one of the most harmful cereal diseases, resulting not only in severe yield losses but also in mycotoxin-contaminated and health-threatening grains. Fusarium head blight is caused by a diverse set of species that have different host ranges, mycotoxin profiles and responses to agricultural practices. Thus, understanding the composition of Fusarium communities in the field is crucial for estimating their impact and also for the development of effective control measures. Up to now, most molecular tools that monitor Fusarium communities on plants are limited to certain species and do not distinguish other plant-associated fungi. To close these gaps, we developed a sequencing-based community profiling methodology for crop-associated fungi with a focus on the genus Fusarium. By analyzing a 1600 bp long amplicon spanning the highly variable segments ITS and D1–D3 of the ribosomal operon by PacBio SMRT sequencing, we were able to robustly quantify Fusarium down to species level through clustering against reference sequences. The newly developed methodology was successfully validated in mock communities and provided results similar to the culture-based assessment of Fusarium communities by seed health tests in grain samples from different crop species. Finally, we exemplified the newly developed methodology in a field experiment with a wheat-maize crop sequence under different cover crop and tillage regimes. We analyzed wheat straw residues, cover crop shoots and maize grains, and found that the cover crop hairy vetch (Vicia villosa) acts as a potent alternative host for Fusarium (OTU F.ave/tri), showing an eightfold higher relative abundance compared with other cover crop treatments. Moreover, as the newly developed methodology also allows other crop-associated fungi to be traced, we found that vetch and green fallow hosted further fungal plant pathogens including Zymoseptoria tritici. Thus, besides their beneficial traits, cover crops can also entail phytopathological risks by acting as alternative hosts for Fusarium and other noxious plant pathogens. The newly developed sequencing-based methodology is a powerful diagnostic tool to trace Fusarium in combination with other fungi associated with different crop species. PMID:29234337
Advances for Studying Clonal Evolution in Cancer

PubMed Central

Raphael, Benjamin J.; Chen, Feng; Wendl, Michael C.

2013-01-01

The “clonal evolution” model of cancer emerged and “evolved” amid ongoing advances in technology, especially in recent years during which next generation sequencing instruments have provided ever higher resolution pictures of the genetic changes in cancer cells and heterogeneity in tumors. It has become increasingly clear that clonal evolution is not a single sequential process, but instead frequently involves simultaneous evolution of multiple subclones that co-exist because they are of similar fitness or are spatially separated. Co-evolution of subclones also occurs when they complement each other’s survival advantages. Recent studies have also shown that clonal evolution is highly heterogeneous: different individual tumors of the same type may undergo very different paths of clonal evolution. New methodological advancements, including deep digital sequencing of a mixed tumor population, single cell sequencing, and the development of more sophisticated computational tools, will continue to shape and reshape the models of clonal evolution. In turn, these will provide both an improved framework for the understanding of cancer progression and a guide for treatment strategies aimed at the elimination of all, rather than just some, of the cancer cells within a patient. PMID:23353056
Advances for studying clonal evolution in cancer.

PubMed

Ding, Li; Raphael, Benjamin J; Chen, Feng; Wendl, Michael C

2013-11-01

The "clonal evolution" model of cancer emerged and "evolved" amid ongoing advances in technology, especially in recent years during which next generation sequencing instruments have provided ever higher resolution pictures of the genetic changes in cancer cells and heterogeneity in tumors. It has become increasingly clear that clonal evolution is not a single sequential process, but instead frequently involves simultaneous evolution of multiple subclones that co-exist because they are of similar fitness or are spatially separated. Co-evolution of subclones also occurs when they complement each other's survival advantages. Recent studies have also shown that clonal evolution is highly heterogeneous: different individual tumors of the same type may undergo very different paths of clonal evolution. New methodological advancements, including deep digital sequencing of a mixed tumor population, single cell sequencing, and the development of more sophisticated computational tools, will continue to shape and reshape the models of clonal evolution. In turn, these will provide both an improved framework for the understanding of cancer progression and a guide for treatment strategies aimed at the elimination of all, rather than just some, of the cancer cells within a patient. Copyright © 2013 Elsevier Ireland Ltd. All rights reserved.
Fine-tuning gene networks using simple sequence repeats

PubMed Central

Egbert, Robert G.; Klavins, Eric

2012-01-01

The parameters in a complex synthetic gene network must be extensively tuned before the network functions as designed. Here, we introduce a simple and general approach to rapidly tune gene networks in Escherichia coli using hypermutable simple sequence repeats embedded in the spacer region of the ribosome binding site. By varying repeat length, we generated expression libraries that incrementally and predictably sample gene expression levels over a 1,000-fold range. We demonstrate the utility of the approach by creating a bistable switch library that programmatically samples the expression space to balance the two states of the switch, and we illustrate the need for tuning by showing that the switch’s behavior is sensitive to host context. Further, we show that mutation rates of the repeats are controllable in vivo for stability or for targeted mutagenesis—suggesting a new approach to optimizing gene networks via directed evolution. This tuning methodology should accelerate the process of engineering functionally complex gene networks. PMID:22927382
Identification of Milk Component in Ancient Food Residue by Proteomics

PubMed Central

Hong, Chuan; Jiang, Hongen; Lü, Enguo; Wu, Yunfei; Guo, Lihai; Xie, Yongming; Wang, Changsui; Yang, Yimin

2012-01-01

Background Proteomic approaches based on mass spectrometry have recently been used in archaeological and art research, generating promising results for protein identification. Little is known about the eastward spread and eastern limits of prehistoric milking in eastern Eurasia. Methodology/Principal Finding In this paper, an ancient visible food remain from the Subeixi Cemeteries (cal. 500 to 300 years BC) of the Turpan Basin in Xinjiang, China, preliminarily determined by ELISA to contain 0.432 mg/kg cattle casein, was analyzed using an improved method based on liquid chromatography (LC) coupled with MALDI-TOF/TOF-MS to further identify protein origin. The specific sequence of bovine casein and the homologous sequence of goat/sheep casein were identified. Conclusions/Significance The existence of a milk component in ancient food implies goat/sheep and cattle milking in the ancient Subeixi region, the easternmost location of prehistoric milking in the Old World known to date. It is envisioned that this work provides a new approach for ancient residue analysis and other archaeometry fields. PMID:22615887
Specifics of the methodological approach to the study of nanoparticle impact on human health in the production of non-metallic nanomaterials for construction purposes

NASA Astrophysics Data System (ADS)

Ayzenshtadt, A. M.; Frolova, M. A.; Makhova, T. A.; Danilov, V. E.; Gupta, Piyush K.; Verma, Rama S.

2018-01-01

Mineral samples of mixed-genesis rocks in a finely dispersed state were obtained and studied, namely sand deposit (Kholmogory district) and basalt (Myandukha deposit, Plesetsk district) in Arkhangelsk region. The paper provides the chemical composition data used to calculate the specific mass atomization energy of the rocks. The energy parameters of the micro and nano systems of the rock samples, free surface energy and surface activity, were calculated. For toxicological evaluation of the materials obtained, next-generation sequencing (NGS) was used to perform metagenomic analysis, which made it possible to determine the species diversity of microorganisms in the samples under study. It was shown that the sequencing method and metagenomic analysis are applicable and provide good reproducibility for the analysis of the toxicological properties of selected rock samples. A correlation was observed between the surface activity of finely dispersed rock systems and the species diversity of cultivated microorganisms on the raw material.
Verification of nonlinear dynamic structural test results by combined image processing and acoustic analysis

NASA Astrophysics Data System (ADS)

Tene, Yair; Tene, Noam; Tene, G.

1993-08-01

An interactive data fusion methodology of video, audio, and nonlinear structural dynamic analysis for potential application in forensic engineering is presented. The methodology was developed and successfully demonstrated in the analysis of the collapse of a heavy transportable bridge during preparation for testing. Multiple bridge element failures were identified after the collapse, including fracture, cracks and rupture of high performance structural materials. A videotape recording made with a hand-held camcorder was the only source of information about the collapse sequence. The interactive data fusion methodology resulted in extracting relevant information from the videotape and from dynamic nonlinear structural analysis, leading to a full account of the sequence of events during the bridge collapse.

Nanopore Technology: A Simple, Inexpensive, Futuristic Technology for DNA Sequencing.

PubMed

Gupta, P D

2016-10-01

In health care, the importance of DNA sequencing has been fully established. Sanger's capillary electrophoresis DNA sequencing methodology is time consuming and cumbersome, and hence more expensive. Lately, because of its versatility, DNA sequencing has become a household name, and there is therefore an urgent need for a simple, fast, inexpensive DNA sequencing technology. At the beginning of this century efforts were made, and nanopore DNA sequencing technology was developed; it is still in its infancy, but it is nevertheless the futuristic technology.
Ancient DNA sequence revealed by error-correcting codes.

PubMed

Brandão, Marcelo M; Spoladore, Larissa; Faria, Luzinete C B; Rocha, Andréa S L; Silva-Filho, Marcio C; Palazzo, Reginaldo

2015-07-10

A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code.
Ancient DNA sequence revealed by error-correcting codes

PubMed Central

Brandão, Marcelo M.; Spoladore, Larissa; Faria, Luzinete C. B.; Rocha, Andréa S. L.; Silva-Filho, Marcio C.; Palazzo, Reginaldo

2015-01-01

A previously described DNA sequence generator algorithm (DNA-SGA) using error-correcting codes has been employed as a computational tool to address the evolutionary pathway of the genetic code. The code-generated sequence alignment demonstrated that a residue mutation revealed by the code can be found in the same position in sequences of distantly related taxa. Furthermore, the code-generated sequences do not promote amino acid changes in the deviant genomes through codon reassignment. A Bayesian evolutionary analysis of both code-generated and homologous sequences of the Arabidopsis thaliana malate dehydrogenase gene indicates an approximately 1 MYA divergence time from the MDH code-generated sequence node to its paralogous sequences. The DNA-SGA helps to determine the plesiomorphic state of DNA sequences because a single nucleotide alteration often occurs in distantly related taxa and can be found in the alternative codon patterns of noncanonical genetic codes. As a consequence, the algorithm may reveal an earlier stage of the evolution of the standard code. PMID:26159228
Methodology for the analysis of pollutant emissions from a city bus

NASA Astrophysics Data System (ADS)

Armas, Octavio; Lapuerta, Magín; Mata, Carmen

2012-04-01

In this work a methodology is proposed for measurement and analysis of gaseous emissions and particle size distributions emitted by a diesel city bus during its typical operation under urban driving conditions. As the test circuit, a passenger transportation line in a Spanish city was used. Different ways of processing and representing the data were studied and, derived from this work, a new approach is proposed. The methodology was useful for detecting the most important uncertainties arising during registration and processing of data derived from a measurement campaign devoted to determining the main pollutant emissions. A HORIBA OBS-1300 gas analyzer and a TSI engine exhaust particle spectrometer were used with 1 Hz frequency data recording. The methodology proposed allows for the comparison of results (in mean values) derived from the analysis of either complete cycles or specific categories (or sequences). The analysis by categories is demonstrated to be a robust and helpful tool to isolate the effect of the main vehicle parameters (relative fuel-air ratio and velocity) on pollutant emissions. It was shown that acceleration sequences have the highest contribution to the total emissions, whereas deceleration sequences have the least.
Methodological reporting of randomized controlled trials in major hepato-gastroenterology journals in 2008 and 1998: a comparative study

PubMed Central

2011-01-01

Background It was still unclear whether the methodological reporting quality of randomized controlled trials (RCTs) in major hepato-gastroenterology journals improved after the Consolidated Standards of Reporting Trials (CONSORT) Statement was revised in 2001. Methods RCTs in five major hepato-gastroenterology journals published in 1998 or 2008 were retrieved from MEDLINE using a high sensitivity search method and the reporting quality of their methodological details was evaluated based on the CONSORT Statement and the Cochrane Handbook for Systematic Reviews of Interventions. Changes in methodological reporting quality between 2008 and 1998 were calculated as risk ratios with 95% confidence intervals. Results A total of 107 RCTs published in 2008 and 99 RCTs published in 1998 were found. Compared to those in 1998, the proportion of RCTs that reported sequence generation (RR, 5.70; 95%CI 3.11-10.42), allocation concealment (RR, 4.08; 95%CI 2.25-7.39), sample size calculation (RR, 3.83; 95%CI 2.10-6.98), addressing of incomplete outcome data (RR, 1.81; 95%CI, 1.03-3.17), and intention-to-treat analyses (RR, 3.04; 95%CI 1.72-5.39) increased in 2008. Blinding and intent-to-treat analysis were reported better in multi-center trials than in single-center trials. The reporting of allocation concealment and blinding was better in industry-sponsored trials than in public-funded trials. Compared with historical studies, the methodological reporting quality improved with time. Conclusion Although the reporting of several important methodological aspects improved in 2008 compared with 1998, which may indicate that researchers had increased awareness of and compliance with the revised CONSORT statement, some items were still reported badly. There is much room for future improvement. PMID:21801429
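The abstract above reports risk ratios with 95% confidence intervals but does not give the underlying event counts. The Python sketch below shows one common way such an interval is obtained (the log-scale Wald, or Katz, method); whether the authors used this exact method is an assumption, and the event counts in the example are hypothetical. Only the group sizes (107 and 99) come from the abstract.

```python
import math


def risk_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Risk ratio of group A versus group B with a log-scale Wald interval.

    z=1.96 gives an approximate 95% confidence interval.
    """
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of log(RR) for two independent binomial proportions.
    se = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se)
    upper = math.exp(math.log(rr) + z * se)
    return rr, lower, upper


# Hypothetical counts: 60 of 107 trials in 2008 versus 10 of 99 in 1998
# reporting some item. These event counts are NOT from the study.
rr, lower, upper = risk_ratio(60, 107, 10, 99)
print(f"RR {rr:.2f}; 95% CI {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1, as all the intervals quoted in the abstract do, indicates that the change in reporting between the two years is unlikely to be due to chance alone.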
Exome sequencing is an efficient tool for variant late-infantile neuronal ceroid lipofuscinosis molecular diagnosis.

PubMed

Patiño, Liliana Catherine; Battu, Rajani; Ortega-Recalde, Oscar; Nallathambi, Jeyabalan; Anandula, Venkata Ramana; Renukaradhya, Umashankar; Laissue, Paul

2014-01-01

The neuronal ceroid-lipofuscinoses (NCL) are a group of neurodegenerative disorders characterized by epilepsy, visual failure, progressive mental and motor deterioration, myoclonus, dementia and reduced life expectancy. Classically, NCL-affected individuals have been classified into six categories, which have been mainly defined regarding the clinical onset of symptoms. However, some patients cannot be easily included in a specific group because of significant variation in the age of onset and disease progression. Molecular genetics has emerged in recent years as a useful tool for enhancing NCL subtype classification. Fourteen NCL genetic forms (CLN1 to CLN14) have been described to date. The variant late-infantile form of the disease has been linked to CLN5, CLN6, CLN7 (MFSD8) and CLN8 mutations. Despite advances in the diagnosis of neurodegenerative disorders, mutations in these genes may cause similar phenotypes, which makes accurate candidate gene selection for direct sequencing difficult. Three siblings who were affected by variant late-infantile NCL are reported in the present study. We used whole-exome sequencing, direct sequencing and in silico approaches to identify the molecular basis of the disease. We identified the novel c.1219T>C (p.Trp407Arg) and c.1361T>C (p.Met454Thr) MFSD8 pathogenic mutations. Our results highlighted next generation sequencing as a novel and powerful methodological approach for the rapid determination of the molecular diagnosis of NCL. They also provide information regarding the phenotypic and molecular spectrum of CLN7 disease.
Diagnosis of Sepsis with Cell-free DNA by Next-Generation Sequencing Technology in ICU Patients.

PubMed

Long, Yun; Zhang, Yinxin; Gong, Yanping; Sun, Ruixue; Su, Longxiang; Lin, Xin; Shen, Ao; Zhou, Jiali; Caiji, Zhuoma; Wang, Xinying; Li, Dongfang; Wu, Honglong; Tan, Hongdong

2016-07-01

Bacteremia is a common serious manifestation of disease in the intensive care unit (ICU), which requires quick and accurate determinations of pathogens to select the appropriate antibiotic treatment. To overcome the shortcomings of traditional bacterial culture (BC), we have adapted next-generation sequencing (NGS) technology to identify pathogens from cell-free plasma DNA. In this study, 78 plasma samples from ICU patients were analyzed by both NGS and BC methods and verified by PCR amplification/Sanger sequencing, and ten plasma samples from healthy volunteers were analyzed by NGS as negative controls to define or calibrate the threshold of the NGS methodology. Overall, 15 of the 78 suspected patient samples were found to contain bacteria or fungi by NGS, whereas ten patients were diagnosed by BC. Seven samples were diagnosed with bacterial or fungal infection both by NGS and BC. Among them, two samples were diagnosed with two types of bacteria by NGS, whereas one sample was diagnosed with two types of bacteria by BC, which increased the detectability of bacteria or fungi from 11 with BC to 17 with NGS. Most interestingly, 14 specimens were also diagnosed with viral infection by NGS. The overall diagnostic sensitivity was significantly increased from 12.82% (10/78) by BC alone to 30.77% (24/78) by NGS alone for ICU patients, which provides more useful information for establishing patient treatment plans. NGS technology can be applied to detect bacteria in clinical blood samples as an emerging diagnostic tool rich in information to determine the appropriate treatment of septic patients. Copyright © 2016 IMSS. Published by Elsevier Inc. All rights reserved.
Methodology for Designing Fault-Protection Software

NASA Technical Reports Server (NTRS)

Barltrop, Kevin; Levison, Jeffrey; Kan, Edwin

2006-01-01

A document describes a methodology for designing fault-protection (FP) software for autonomous spacecraft. The methodology embodies and extends established engineering practices in the technical discipline of Fault Detection, Diagnosis, Mitigation, and Recovery; and has been successfully implemented in the Deep Impact Spacecraft, a NASA Discovery mission. Based on established concepts of Fault Monitors and Responses, this FP methodology extends the notion of Opinion, Symptom, Alarm (aka Fault), and Response with numerous new notions, sub-notions, software constructs, and logic and timing gates. For example, Monitor generates a RawOpinion, which graduates into Opinion, categorized into no-opinion, acceptable, or unacceptable opinion. RaiseSymptom, ForceSymptom, and ClearSymptom govern the establishment and then mapping to an Alarm (aka Fault). Local Response is distinguished from FP System Response. A 1-to-n and n-to-1 mapping is established among Monitors, Symptoms, and Responses. Responses are categorized by device versus by function. Responses operate in tiers, where the early tiers attempt to resolve the Fault in a localized step-by-step fashion, relegating more system-level response to later tier(s). Recovery actions are gated by epoch recovery timing, enabling strategy, urgency, MaxRetry gate, hardware availability, hazardous versus ordinary fault, and many other priority gates. This methodology is systematic, logical, and uses multiple linked tables, parameter files, and recovery command sequences. The credibility of the FP design is proven via a fault-tree analysis "top-down" approach, and a functional fault-mode-effects-and-analysis via "bottoms-up" approach. Via this process, the mitigation and recovery strategy(s) per Fault Containment Region scope (width versus depth) the FP architecture.
The challenge of recreation planning: methodology and factors to consider

Treesearch

Ronald B. Uleck

1971-01-01

The proposed methodology of planning is a description, explanation, and justification of the methods or techniques that a planner should use in preparing outdoor recreation development plans. The sequence of steps required is described
Relationships between palaeogeography and opal occurrence in Australia: A data-mining approach

NASA Astrophysics Data System (ADS)

Landgrebe, T. C. W.; Merdith, A.; Dutkiewicz, A.; Müller, R. D.

2013-07-01

Age-coded multi-layered geological datasets are becoming increasingly prevalent with the surge in open-access geodata, yet there are few methodologies for extracting geological information and knowledge from these data. We present a novel methodology, based on the open-source GPlates software in which age-coded digital palaeogeographic maps are used to “data-mine” spatio-temporal patterns related to the occurrence of Australian opal. Our aim is to test the concept that only a particular sequence of depositional/erosional environments may lead to conditions suitable for the formation of gem quality sedimentary opal. Time-varying geographic environment properties are extracted from a digital palaeogeographic dataset of the eastern Australian Great Artesian Basin (GAB) at 1036 opal localities. We obtain a total of 52 independent ordinal sequences sampling 19 time slices from the Early Cretaceous to the present-day. We find that 95% of the known opal deposits are tied to only 27 sequences all comprising fluvial and shallow marine depositional sequences followed by a prolonged phase of erosion. We then map the total area of the GAB that matches these 27 opal-specific sequences, resulting in an opal-prospective region of only about 10% of the total area of the basin. The key patterns underlying this association involve only a small number of key environmental transitions. We demonstrate that these key associations are generally absent at arbitrary locations in the basin. This new methodology allows for the simplification of a complex time-varying geological dataset into a single map view, enabling straightforward application for opal exploration and for future co-assessment with other datasets/geological criteria. This approach may help unravel the poorly understood opal formation process using an empirical spatio-temporal data-mining methodology and readily available datasets to aid hypothesis testing.
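The idea of tying most localities to a small set of environment sequences can be sketched as a simple frequency count. This is only an illustration under stated assumptions: the function name, the environment labels, and the toy data are hypothetical, and the actual study works on GPlates palaeogeographic maps rather than hand-written tuples.

```python
from collections import Counter

def sequences_covering(locality_sequences, coverage=0.95):
    """Pick the most frequent sequences until the requested share of localities is covered.

    locality_sequences: one ordered list of environment labels per locality
    Returns the selected sequences and the fraction of localities they cover.
    """
    counts = Counter(tuple(seq) for seq in locality_sequences)
    needed = coverage * len(locality_sequences)
    selected, covered = [], 0
    for seq, n in counts.most_common():
        if covered >= needed:
            break
        selected.append(seq)
        covered += n
    return selected, covered / len(locality_sequences)

# Toy data: 40 localities sampled over three time slices (labels are made up).
localities = (
    [["fluvial", "shallow_marine", "erosion"]] * 25
    + [["shallow_marine", "fluvial", "erosion"]] * 14
    + [["deep_marine", "deep_marine", "erosion"]] * 1
)
selected, fraction = sequences_covering(localities, coverage=0.95)
```

The selected sequences could then be matched against every grid cell of the basin to outline a prospective region, which is the mapping step the abstract describes.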
Hypothesis testing on the fractal structure of behavioral sequences: the Bayesian assessment of scaling methodology.

PubMed

Moscoso del Prado Martín, Fermín

2013-12-01

I introduce the Bayesian assessment of scaling (BAS), a simple but powerful Bayesian hypothesis contrast methodology that can be used to test hypotheses on the scaling regime exhibited by a sequence of behavioral data. Rather than comparing parametric models, as typically done in previous approaches, the BAS offers a direct, nonparametric way to test whether a time series exhibits fractal scaling. The BAS provides a simpler and faster test than do previous methods, and the code for making the required computations is provided. The method also enables testing of finely specified hypotheses on the scaling indices, something that was not possible with the previously available methods. I then present 4 simulation studies showing that the BAS methodology outperforms the other methods used in the psychological literature. I conclude with a discussion of methodological issues on fractal analyses in experimental psychology. PsycINFO Database Record (c) 2014 APA, all rights reserved.
JVM: Java Visual Mapping tool for next generation sequencing read.

PubMed

Yang, Ye; Liu, Juan

2015-01-01

We developed JVM (Java Visual Mapping), a program for mapping next generation sequencing reads to a reference sequence. The program is implemented in Java and is designed to deal with millions of short reads generated by Illumina sequencing technology. It employs a seed index strategy and octal encoding operations for sequence alignment. JVM is useful for DNA-Seq and RNA-Seq when dealing with single-end resequencing. JVM is a desktop application that supports read files from 1 MB to 10 GB.
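A seed index with octal encoding might look roughly like the sketch below. This is not JVM's implementation (which is written in Java and not described in detail here); the base-to-digit mapping, function names, and mismatch threshold are all assumptions chosen to illustrate the general seed-and-verify idea.

```python
from collections import defaultdict

# Assumed mapping: one octal digit (3 bits) per base, with room for N.
OCTAL_CODE = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}

def encode(seq):
    """Pack a sequence into an integer, one octal digit per base."""
    value = 0
    for base in seq:
        value = (value << 3) | OCTAL_CODE[base]
    return value

def build_seed_index(reference, k):
    """Map every encoded k-mer of the reference to its start positions."""
    index = defaultdict(list)
    for pos in range(len(reference) - k + 1):
        index[encode(reference[pos:pos + k])].append(pos)
    return index

def map_read(read, reference, index, k, max_mismatches=1):
    """Look up the read's first k bases, then verify each candidate position."""
    hits = []
    for pos in index.get(encode(read[:k]), []):
        window = reference[pos:pos + len(read)]
        if len(window) < len(read):
            continue  # read would run past the end of the reference
        mismatches = sum(a != b for a, b in zip(read, window))
        if mismatches <= max_mismatches:
            hits.append((pos, mismatches))
    return hits

reference = "ACGTACGTTAGC"
index = build_seed_index(reference, k=4)
hits = map_read("ACGTTAGC", reference, index, k=4)
```

The seed lookup narrows the search to two candidate positions, and the verification step keeps only the one where the full read matches.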
System, method and apparatus for generating phrases from a database

NASA Technical Reports Server (NTRS)

McGreevy, Michael W. (Inventor)

2004-01-01

Phrase generation is a method of generating sequences of terms, such as phrases, that may occur within a database of subsets containing sequences of terms, such as text. A database is provided and a relational model of the database is created. A query is then input. The query includes a term or a sequence of terms or multiple individual terms or multiple sequences of terms or combinations thereof. Next, several sequences of terms that are contextually related to the query are assembled from contextual relations in the model of the database. The sequences of terms are then sorted and output. Phrase generation can also be an iterative process used to produce sequences of terms from a relational model of a database.
Optimizing the Determination of Roughness Parameters for Model Urban Canopies

NASA Astrophysics Data System (ADS)

Huq, Pablo; Rahman, Auvi

2018-05-01

We present an objective optimization procedure to determine the roughness parameters for very rough boundary-layer flow over model urban canopies. For neutral stratification the mean velocity profile above a model urban canopy is described by the logarithmic law together with the set of roughness parameters of displacement height d, roughness length z_0, and friction velocity u_*. Traditionally, values of these roughness parameters are obtained by fitting the logarithmic law through (all) the data points comprising the velocity profile. The new procedure generates unique velocity profiles from subsets or combinations of the data points of the original velocity profile, after which all possible profiles are examined. Each of the generated profiles is fitted to the logarithmic law for a sequence of values of d, with the representative value of d obtained from the minima of the summed least-squares errors for all the generated profiles. The representative values of z_0 and u_* are identified by the peak in the bivariate histogram of z_0 and u_*. The methodology has been verified against laboratory datasets of flow above model urban canopies.
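The fitting step can be sketched as follows, assuming the usual form of the logarithmic law, U(z) = (u_*/κ) ln((z − d)/z_0), with κ = 0.4. For a fixed d the law is linear in ln(z − d), so each sub-profile is fitted by ordinary least squares and the errors are summed over all sub-profiles. Function names and the synthetic profile are hypothetical, and as a simplification the final z_0 and u_* are taken from a fit to the full profile rather than from a bivariate histogram.

```python
import math
from itertools import combinations

KAPPA = 0.4  # von Karman constant (commonly used value)

def fit_log_law(z, u, d):
    """Least-squares fit of u = (u_star / KAPPA) * ln((z - d) / z0) for a fixed d."""
    x = [math.log(zi - d) for zi in z]
    n = len(x)
    mean_x, mean_u = sum(x) / n, sum(u) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxu = sum((xi - mean_x) * (ui - mean_u) for xi, ui in zip(x, u))
    slope = sxu / sxx
    intercept = mean_u - slope * mean_x
    sse = sum((ui - (slope * xi + intercept)) ** 2 for xi, ui in zip(x, u))
    return KAPPA * slope, math.exp(-intercept / slope), sse  # u_star, z0, error

def summed_error(z, u, d, min_points=3):
    """Sum the fit errors over every sub-profile with at least min_points points."""
    total = 0.0
    for size in range(min_points, len(z) + 1):
        for idx in combinations(range(len(z)), size):
            total += fit_log_law([z[i] for i in idx], [u[i] for i in idx], d)[2]
    return total

def representative_d(z, u, d_candidates):
    """Return the candidate d that minimises the summed least-squares error."""
    return min(d_candidates, key=lambda d: summed_error(z, u, d))

# Synthetic, noise-free profile generated from known parameters.
true_u_star, true_z0, true_d = 0.5, 0.01, 0.05
heights = [0.10, 0.15, 0.20, 0.30, 0.40, 0.60]
speeds = [true_u_star / KAPPA * math.log((z - true_d) / true_z0) for z in heights]
d_best = representative_d(heights, speeds, [i / 100 for i in range(10)])
u_star, z0, _ = fit_log_law(heights, speeds, d_best)
```

With noise-free synthetic data the scan recovers the parameters used to generate the profile; measured profiles would show a broader minimum.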
Predicting Functions of Proteins in Mouse Based on Weighted Protein-Protein Interaction Network and Protein Hybrid Properties

PubMed Central

Shi, Xiaohe; Lu, Wen-Cong; Cai, Yu-Dong; Chou, Kuo-Chen

2011-01-01

Background With the huge amount of uncharacterized protein sequences generated in the post-genomic age, it is highly desirable to develop effective computational methods for quickly and accurately predicting their functions. The information thus obtained would be very useful for both basic research and drug development in a timely manner. Methodology/Principal Findings Although many efforts have been made in this regard, most of them were based on either sequence similarity or protein-protein interaction (PPI) information. However, the former often fails to work if a query protein has no or very little sequence similarity to any function-known proteins, while the latter has a similar problem if the relevant PPI information is not available. In view of this, a new approach is proposed by hybridizing the PPI information and the biochemical/physicochemical features of protein sequences. The overall first-order success rates by the new predictor for the functions of mouse proteins on training set and test set were 69.1% and 70.2%, respectively, and the success rate covered by the results of the top-4 order from a total of 24 orders was 65.2%. Conclusions/Significance The results indicate that the new approach is quite promising and may open a new avenue or direction for addressing the difficult and complicated problem. PMID:21283518
Evaluating and redesigning teaching learning sequences at the introductory physics level

NASA Astrophysics Data System (ADS)

Guisasola, Jenaro; Zuza, Kristina; Ametller, Jaume; Gutierrez-Berraondo, José

2017-12-01

In this paper we put forward a proposal for the design and evaluation of teaching and learning sequences in upper secondary school and university. We will connect our proposal with relevant contributions on the design of teaching sequences, ground it on the design-based research methodology, and discuss how teaching and learning sequences designed according to our proposal relate to learning progressions. An iterative methodology for evaluating and redesigning the teaching and learning sequence (TLS) is presented. The proposed assessment strategy focuses on three aspects: (a) evaluation of the activities of the TLS, (b) evaluation of learning achieved by students in relation to the intended objectives, and (c) a document for gathering the difficulties found when implementing the TLS to serve as a guide to teachers. Discussion of this guide with external teachers provides feedback used for the TLS redesign. The context of our implementation and evaluation is an innovative calculus-based physics course for first-year engineering and science degree students at the University of the Basque Country.
Repair Sequences in Dysarthric Conversational Speech: A Study in Interactional Phonetics

ERIC Educational Resources Information Center

Rutter, Ben

2009-01-01

This paper presents some findings from a case study of repair sequences in conversations between a dysarthric speaker, Chris, and her interactional partners. It adopts the methodology of interactional phonetics, where turn design, sequence organization, and variation in phonetic parameters are analysed in unison. The analysis focused on the use of…
Critical appraisal of clinical trials in multiple system atrophy: Toward better quality.

PubMed

Castro Caldas, Ana; Levin, Johannes; Djaldetti, Ruth; Rascol, Olivier; Wenning, Gregor; Ferreira, Joaquim J

2017-10-01

Multiple system atrophy (MSA) is a rare neurodegenerative disease of undetermined cause. Although many clinical trials have been conducted, there is still no treatment that cures the disease or slows its progression. We sought to assess the clinical trials, methodology, and quality of reporting of clinical trials conducted in MSA patients. We conducted a systematic review of all trials with at least 1 MSA patient subject to any pharmacological/nonpharmacological interventions. Two independent reviewers evaluated the methodological characteristics and quality of reporting of trials. A total of 60 clinical trials were identified, including 1375 MSA patients. Of the trials, 51% (n = 31) were single-arm studies. A total of 28% (n = 17) had a parallel design, half of which (n = 13) were placebo controlled. Of the studies, 8 (13.3%) were conducted in a multicenter setting, 3 of which were responsible for 49.3% (n = 678) of the total included MSA patients. The description of primary outcomes was unclear in 60% (n = 40) of trials. Only 10 (16.7%) clinical trials clearly described the randomization process. Blinding of the participants, personnel, and outcome assessments were at high risk of bias in the majority of studies. The number of dropouts/withdrawals was high (n = 326, 23.4% among the included patients). Overall, the design and quality of reporting of the reviewed studies is unsatisfactory. The most frequent clinical trials were small and single centered. Inadequate reporting was related to the information on the randomization process, sequence generation, allocation concealment, blinding of participants, and sample size calculations. Although improved during the recent years, methodological quality and trial design need to be optimized to generate more informative results. © 2017 International Parkinson and Movement Disorder Society.
Rapid identification of kidney cyst mutations by whole exome sequencing in zebrafish

PubMed Central

Ryan, Sean; Willer, Jason; Marjoram, Lindsay; Bagwell, Jennifer; Mankiewicz, Jamie; Leshchiner, Ignaty; Goessling, Wolfram; Bagnat, Michel; Katsanis, Nicholas

2013-01-01

Forward genetic approaches in zebrafish have provided invaluable information about developmental processes. However, the relative difficulty of mapping and isolating mutations has limited the number of new genetic screens. Recent improvements in the annotation of the zebrafish genome coupled to a reduction in sequencing costs prompted the development of whole genome and RNA sequencing approaches for gene discovery. Here we describe a whole exome sequencing (WES) approach that allows rapid and cost-effective identification of mutations. We used our WES methodology to isolate four mutations that cause kidney cysts; we identified novel alleles in two ciliary genes as well as two novel mutants. The WES approach described here does not require specialized infrastructure or training and is therefore widely accessible. This methodology should thus help facilitate genetic screens and expedite the identification of mutants that can inform basic biological processes and the causality of genetic disorders in humans. PMID:24130329
Is a Genome a Codeword of an Error-Correcting Code?

PubMed Central

Kleinschmidt, João H.; Silva-Filho, Márcio C.; Bim, Edson; Herai, Roberto H.; Yamagishi, Michel E. B.; Palazzo, Reginaldo

2012-01-01

Since a genome is a discrete sequence, the elements of which belong to a set of four letters, the question as to whether or not there is an error-correcting code underlying DNA sequences is unavoidable. The most common approach to answering this question is to propose a methodology to verify the existence of such a code. However, none of the methodologies proposed so far, although quite clever, has achieved that goal. In a recent work, we showed that DNA sequences can be identified as codewords in a class of cyclic error-correcting codes known as Hamming codes. In this paper, we show that a complete intron-exon gene, and even a plasmid genome, can be identified as a Hamming code codeword as well. Although this does not constitute a definitive proof that there is an error-correcting code underlying DNA sequences, it is the first evidence in this direction. PMID:22649495
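To make the notion of "being a codeword" concrete, here is a minimal sketch of a syndrome check for the binary Hamming(7,4) code. This is a deliberately simplified stand-in: the paper works with Hamming codes over alphabets suited to the four DNA letters, and the mapping from nucleotides to code symbols is not reproduced here. The function names are hypothetical.

```python
def syndrome(bits):
    """Syndrome of a 7-bit word for the binary Hamming(7,4) code.

    The parity-check matrix has the binary form of 1..7 as its columns, so
    the syndrome is the XOR of the (1-based) positions that hold a 1.
    """
    s = 0
    for position, bit in enumerate(bits, start=1):
        if bit:
            s ^= position
    return s

def is_codeword(bits):
    """A word is a codeword exactly when its syndrome is zero."""
    return syndrome(bits) == 0

word = [1, 1, 1, 0, 0, 0, 0]        # ones at positions 1, 2, 3
corrupted = [1, 1, 1, 0, 1, 0, 0]   # same word with position 5 flipped
```

For a single flipped bit the syndrome equals the position of the error, which is what lets a Hamming code correct it.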

Mapping Base Modifications in DNA by Transverse-Current Sequencing

NASA Astrophysics Data System (ADS)

Alvarez, Jose R.; Skachkov, Dmitry; Massey, Steven E.; Kalitsov, Alan; Velev, Julian P.

2018-02-01

Sequencing DNA modifications and lesions, such as methylation of cytosine and oxidation of guanine, is even more important and challenging than sequencing the genome itself. The traditional methods for detecting DNA modifications are either insensitive to these modifications or require additional processing steps to identify a particular type of modification. Transverse-current sequencing in nanopores can potentially identify the canonical bases and base modifications in the same run. In this work, we demonstrate that the most common DNA epigenetic modifications and lesions can be detected with any predefined accuracy based on their tunneling current signature. Our results are based on simulations of the nanopore tunneling current through DNA molecules, calculated using nonequilibrium electron-transport methodology within an effective multiorbital model derived from first-principles calculations, followed by a base-calling algorithm accounting for neighbor current-current correlations. This methodology can be integrated with existing experimental techniques to improve base-calling fidelity.
Design methodology and projects for space engineering

NASA Technical Reports Server (NTRS)

Nichols, S.; Kleespies, H.; Wood, K.; Crawford, R.

1993-01-01

NASA/USRA is an ongoing sponsor of space design projects in the senior design course of the Mechanical Engineering Department at The University of Texas at Austin. This paper describes the UT senior design sequence, consisting of a design methodology course and a capstone design course. The philosophical basis of this sequence is briefly summarized. A history of the Department's activities in the Advanced Design Program is then presented. The paper concludes with a description of the projects completed during the 1991-92 academic year and the ongoing projects for the Fall 1992 semester.
A metagenomic approach to characterization of the vaginal microbiome signature in pregnancy.

PubMed

Aagaard, Kjersti; Riehle, Kevin; Ma, Jun; Segata, Nicola; Mistretta, Toni-Ann; Coarfa, Cristian; Raza, Sabeen; Rosenbaum, Sean; Van den Veyver, Ignatia; Milosavljevic, Aleksandar; Gevers, Dirk; Huttenhower, Curtis; Petrosino, Joseph; Versalovic, James

2012-01-01

While current major national research efforts (i.e., the NIH Human Microbiome Project) will enable comprehensive metagenomic characterization of the adult human microbiota, how and when these diverse microbial communities take up residence in the host and during reproductive life are unexplored at a population level. Because microbial abundance and diversity might differ in pregnancy, we sought to generate comparative metagenomic signatures across gestational age strata. DNA was isolated from the vagina (introitus, posterior fornix, midvagina) and the V5V3 region of bacterial 16S rRNA genes was sequenced (454FLX Titanium platform). Sixty-eight samples from 24 healthy gravidae (18 to 40 confirmed weeks) were compared with 301 non-pregnant controls (60 subjects). Generated sequence data were quality filtered, taxonomically binned, normalized, and organized by phylogeny and into operational taxonomic units (OTU); principal coordinates analysis (PCoA) of the resultant beta diversity measures was used for visualization and analysis in association with sample clinical metadata. Altogether, 1.4 gigabytes of data containing >2.5 million reads (averaging 6,837 sequences/sample of 493 nt in length) were generated for computational analyses. Although gravidae were not excluded by virtue of a posterior fornix pH >4.5 at the time of screening, a unique vaginal microbiome signature encompassing several specific OTUs and higher-level clades was nevertheless observed and confirmed using a combination of phylogenetic, non-phylogenetic, supervised, and unsupervised approaches. Both overall diversity and richness were reduced in pregnancy, with dominance of Lactobacillus species (L. iners, crispatus, jensenii and johnsonii), and the orders Lactobacillales (and Lactobacillaceae family), Clostridiales, Bacteroidales, and Actinomycetales.
This intergroup comparison using rigorous standardized sampling protocols and analytical methodologies provides robust initial evidence that the vaginal microbial 16S rRNA gene catalogue uniquely differs in pregnancy, with variance of taxa across vaginal subsite and gestational age.
Segmentation and Recognition of Continuous Human Activity

DTIC Science & Technology

2001-01-01

This paper presents a methodology for automatic segmentation and recognition of continuous human activity. We segment a continuous human activity into...commencement or termination. We use single action sequences for the training data set. The test sequences, on the other hand, are continuous sequences of human ... activity that consist of three or more actions in succession. The system has been tested on continuous activity sequences containing actions such as
Short RNA indicator sequences are not completely degraded by autoclaving

PubMed Central

Unnithan, Veena V.; Unc, Adrian; Joe, Valerisa; Smith, Geoffrey B.

2014-01-01

Short indicator RNA sequences (<100 bp) persist after autoclaving and are recovered intact by molecular amplification. Primers targeting longer sequences are most likely to produce false positives due to amplification errors, which are easily verified by melting curve analyses. If short indicator RNA sequences are used for virus identification and quantification, then post-autoclave RNA degradation methodology should be employed, which may include further autoclaving. PMID:24518856
Complete Genome Sequence of a Streptococcus pyogenes Serotype M12 Scarlet Fever Outbreak Isolate from China, Compiled Using Oxford Nanopore and Illumina Sequencing

PubMed Central

You, Yuanhai; Kou, Yongjun; Niu, Longfei; Jia, Qiong; Liu, Yahui; Walker, Mark J.; Zhu, Jiaqiang

2018-01-01

ABSTRACT The incidence of scarlet fever cases remains high in China. Here, we report the complete genome sequence of a Streptococcus pyogenes isolate of serotype M12, which has been confirmed as the predominant serotype in recent outbreaks. Genome sequencing was achieved by a combination of Oxford Nanopore MinION and Illumina methodologies. PMID:29724853
Acoustic sequences in non-human animals: a tutorial review and prospectus.

PubMed

Kershenbaum, Arik; Blumstein, Daniel T; Roch, Marie A; Akçay, Çağlar; Backus, Gregory; Bee, Mark A; Bohn, Kirsten; Cao, Yan; Carter, Gerald; Cäsar, Cristiane; Coen, Michael; DeRuiter, Stacy L; Doyle, Laurance; Edelman, Shimon; Ferrer-i-Cancho, Ramon; Freeberg, Todd M; Garland, Ellen C; Gustison, Morgan; Harley, Heidi E; Huetz, Chloé; Hughes, Melissa; Hyland Bruno, Julia; Ilany, Amiyaal; Jin, Dezhe Z; Johnson, Michael; Ju, Chenghui; Karnowski, Jeremy; Lohr, Bernard; Manser, Marta B; McCowan, Brenda; Mercado, Eduardo; Narins, Peter M; Piel, Alex; Rice, Megan; Salmi, Roberta; Sasahara, Kazutoshi; Sayigh, Laela; Shiu, Yu; Taylor, Charles; Vallejo, Edgar E; Waller, Sara; Zamora-Gutierrez, Veronica

2016-02-01

Animal acoustic communication often takes the form of complex sequences, made up of multiple distinct acoustic units. Apart from the well-known example of birdsong, other animals such as insects, amphibians, and mammals (including bats, rodents, primates, and cetaceans) also generate complex acoustic sequences. Occasionally, such as with birdsong, the adaptive role of these sequences seems clear (e.g. mate attraction and territorial defence). More often however, researchers have only begun to characterise - let alone understand - the significance and meaning of acoustic sequences. Hypotheses abound, but there is little agreement as to how sequences should be defined and analysed. Our review aims to outline suitable methods for testing these hypotheses, and to describe the major limitations to our current and near-future knowledge on questions of acoustic sequences. This review and prospectus is the result of a collaborative effort between 43 scientists from the fields of animal behaviour, ecology and evolution, signal processing, machine learning, quantitative linguistics, and information theory, who gathered for a 2013 workshop entitled, 'Analysing vocal sequences in animals'. Our goal is to present not just a review of the state of the art, but to propose a methodological framework that summarises what we suggest are the best practices for research in this field, across taxa and across disciplines. We also provide a tutorial-style introduction to some of the most promising algorithmic approaches for analysing sequences. We divide our review into three sections: identifying the distinct units of an acoustic sequence, describing the different ways that information can be contained within a sequence, and analysing the structure of that sequence. Each of these sections is further subdivided to address the key questions and approaches in that area. 
We propose a uniform, systematic, and comprehensive approach to studying sequences, with the goal of clarifying research terms used in different fields, and facilitating collaboration and comparative studies. Allowing greater interdisciplinary collaboration will facilitate the investigation of many important questions in the evolution of communication and sociality. © 2014 Cambridge Philosophical Society.
Acoustic sequences in non-human animals: a tutorial review and prospectus

PubMed Central

Kershenbaum, Arik; Blumstein, Daniel T.; Roch, Marie A.; Akçay, Çağlar; Backus, Gregory; Bee, Mark A.; Bohn, Kirsten; Cao, Yan; Carter, Gerald; Cäsar, Cristiane; Coen, Michael; DeRuiter, Stacy L.; Doyle, Laurance; Edelman, Shimon; Ferrer-i-Cancho, Ramon; Freeberg, Todd M.; Garland, Ellen C.; Gustison, Morgan; Harley, Heidi E.; Huetz, Chloé; Hughes, Melissa; Bruno, Julia Hyland; Ilany, Amiyaal; Jin, Dezhe Z.; Johnson, Michael; Ju, Chenghui; Karnowski, Jeremy; Lohr, Bernard; Manser, Marta B.; McCowan, Brenda; Mercado, Eduardo; Narins, Peter M.; Piel, Alex; Rice, Megan; Salmi, Roberta; Sasahara, Kazutoshi; Sayigh, Laela; Shiu, Yu; Taylor, Charles; Vallejo, Edgar E.; Waller, Sara; Zamora-Gutierrez, Veronica

2015-01-01

Animal acoustic communication often takes the form of complex sequences, made up of multiple distinct acoustic units. Apart from the well-known example of birdsong, other animals such as insects, amphibians, and mammals (including bats, rodents, primates, and cetaceans) also generate complex acoustic sequences. Occasionally, such as with birdsong, the adaptive role of these sequences seems clear (e.g. mate attraction and territorial defence). More often however, researchers have only begun to characterise – let alone understand – the significance and meaning of acoustic sequences. Hypotheses abound, but there is little agreement as to how sequences should be defined and analysed. Our review aims to outline suitable methods for testing these hypotheses, and to describe the major limitations to our current and near-future knowledge on questions of acoustic sequences. This review and prospectus is the result of a collaborative effort between 43 scientists from the fields of animal behaviour, ecology and evolution, signal processing, machine learning, quantitative linguistics, and information theory, who gathered for a 2013 workshop entitled, “Analysing vocal sequences in animals”. Our goal is to present not just a review of the state of the art, but to propose a methodological framework that summarises what we suggest are the best practices for research in this field, across taxa and across disciplines. We also provide a tutorial-style introduction to some of the most promising algorithmic approaches for analysing sequences. We divide our review into three sections: identifying the distinct units of an acoustic sequence, describing the different ways that information can be contained within a sequence, and analysing the structure of that sequence. Each of these sections is further subdivided to address the key questions and approaches in that area. 
We propose a uniform, systematic, and comprehensive approach to studying sequences, with the goal of clarifying research terms used in different fields, and facilitating collaboration and comparative studies. Allowing greater interdisciplinary collaboration will facilitate the investigation of many important questions in the evolution of communication and sociality. PMID:25428267
From Conventional to Next Generation Sequencing of Epstein-Barr Virus Genomes.

PubMed

Kwok, Hin; Chiang, Alan Kwok Shing

2016-02-24

Genomic sequences of Epstein-Barr virus (EBV) have been of interest because the virus is associated with cancers, such as nasopharyngeal carcinoma, and conditions such as infectious mononucleosis. The progress of whole-genome EBV sequencing has been limited by the inefficiency and cost of first-generation sequencing technology. With the advancement of next-generation sequencing (NGS) and target enrichment strategies, an increasing number of EBV genomes have been published. These genomes were sequenced using different approaches, either with or without EBV DNA enrichment. This review provides an overview of the EBV genomes published to date, and a description of the sequencing technology and bioinformatic analyses employed in generating these sequences. We further explore ways in which the quality of sequencing data can be improved, such as using DNA oligos for capture hybridization, and longer insert size and read length in the sequencing runs. These advances will enable large-scale genomic sequencing of EBV, which will facilitate a better understanding of the genetic variations of EBV in different geographic regions and the discovery of potentially pathogenic variants in specific diseases.
Towards Clinical Molecular Diagnosis of Inherited Cardiac Conditions: A Comparison of Bench-Top Genome DNA Sequencers

PubMed Central

Wilkinson, Samuel L.; John, Shibu; Walsh, Roddy; Novotny, Tomas; Valaskova, Iveta; Gupta, Manu; Game, Laurence; Barton, Paul J R.; Cook, Stuart A.; Ware, James S.

2013-01-01

Background Molecular genetic testing is recommended for diagnosis of inherited cardiac disease, to guide prognosis and treatment, but access is often limited by cost and availability. Recently introduced high-throughput bench-top DNA sequencing platforms have the potential to overcome these limitations. Methodology/Principal Findings We evaluated two next-generation sequencing (NGS) platforms for molecular diagnostics. The protein-coding regions of six genes associated with inherited arrhythmia syndromes were amplified from 15 human samples using parallelised multiplex PCR (Access Array, Fluidigm), and sequenced on the MiSeq (Illumina) and Ion Torrent PGM (Life Technologies). Overall, 97.9% of the target was sequenced adequately for variant calling on the MiSeq, and 96.8% on the Ion Torrent PGM. Regions missed tended to be of high GC-content, and most were problematic for both platforms. Variant calling was assessed using 107 variants detected using Sanger sequencing: within adequately sequenced regions, variant calling on both platforms was highly accurate (Sensitivity: MiSeq 100%, PGM 99.1%. Positive predictive value: MiSeq 95.9%, PGM 95.5%). At the time of the study the Ion Torrent PGM had a lower capital cost and individual runs were cheaper and faster. The MiSeq had a higher capacity (requiring fewer runs), with reduced hands-on time and simpler laboratory workflows. Both provide significant cost and time savings over conventional methods, even allowing for adjunct Sanger sequencing to validate findings and sequence exons missed by NGS. Conclusions/Significance MiSeq and Ion Torrent PGM both provide accurate variant detection as part of a PCR-based molecular diagnostic workflow, and provide alternative platforms for molecular diagnosis of inherited cardiac conditions. Though there were performance differences at this throughput, platforms differed primarily in terms of cost, scalability, protocol stability and ease of use. 
Compared with current molecular genetic diagnostic tests for inherited cardiac arrhythmias, these NGS approaches are faster, less expensive, and yet more comprehensive. PMID:23861798
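The sensitivity and positive predictive value figures quoted in this abstract follow from simple confusion-matrix counts. A minimal Python sketch is given here, assuming illustrative counts: the abstract reports the 107 Sanger-confirmed variants and the resulting percentages, but not the underlying true-positive and false-positive tallies, so the numbers passed in below are hypothetical.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of real (reference) variants that the platform called."""
    return true_pos / (true_pos + false_neg)


def positive_predictive_value(true_pos: int, false_pos: int) -> float:
    """Fraction of the platform's calls that are real variants."""
    return true_pos / (true_pos + false_pos)


# Illustrative counts only (an assumption, not taken from the paper):
# detecting 106 of 107 reference variants rounds to 99.1% sensitivity.
pgm_sensitivity = sensitivity(true_pos=106, false_neg=1)
print(f"{pgm_sensitivity:.1%}")
```

Sensitivity is computed against the Sanger reference set, whereas positive predictive value depends on how many extra calls the platform makes, which is why the two figures can differ substantially for the same platform.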
Program Synthesizes UML Sequence Diagrams

NASA Technical Reports Server (NTRS)

Barry, Matthew R.; Osborne, Richard N.

2006-01-01

A computer program called "Rational Sequence" generates Unified Modeling Language (UML) sequence diagrams of a target Java program running on a Java virtual machine (JVM). Rational Sequence thereby performs a reverse engineering function that aids in the design documentation of the target Java program. Whereas the construction of sequence diagrams was previously a tedious manual process, Rational Sequence generates UML sequence diagrams automatically from the running Java code.
Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo) genome assembly and analysis

USDA-ARS?s Scientific Manuscript database

Next-generation sequencing technologies were used to rapidly and efficiently sequence the genome of the domestic turkey (Meleagris gallopavo). The current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to chromosomes. Innate heterozygosity of the sequenced bird allowed discovery of...
Understanding phylogenetic incongruence: lessons from phyllostomid bats

PubMed Central

Dávalos, Liliana M; Cirranello, Andrea L; Geisler, Jonathan H; Simmons, Nancy B

2012-01-01

All characters and trait systems in an organism share a common evolutionary history that can be estimated using phylogenetic methods. However, differential rates of change and the evolutionary mechanisms driving those rates result in pervasive phylogenetic conflict. These drivers need to be uncovered because mismatches between evolutionary processes and phylogenetic models can lead to high confidence in incorrect hypotheses. Incongruence between phylogenies derived from morphological versus molecular analyses, and between trees based on different subsets of molecular sequences has become pervasive as datasets have expanded rapidly in both characters and species. For more than a decade, evolutionary relationships among members of the New World bat family Phyllostomidae inferred from morphological and molecular data have been in conflict. Here, we develop and apply methods to minimize systematic biases, uncover the biological mechanisms underlying phylogenetic conflict, and outline data requirements for future phylogenomic and morphological data collection. We introduce new morphological data for phyllostomids and outgroups and expand previous molecular analyses to eliminate methodological sources of phylogenetic conflict such as taxonomic sampling, sparse character sampling, or use of different algorithms to estimate the phylogeny. We also evaluate the impact of biological sources of conflict: saturation in morphological changes and molecular substitutions, and other processes that result in incongruent trees, including convergent morphological and molecular evolution. Methodological sources of incongruence play some role in generating phylogenetic conflict, and are relatively easy to eliminate by matching taxa, collecting more characters, and applying the same algorithms to optimize phylogeny. 
The evolutionary patterns uncovered are consistent with multiple biological sources of conflict, including saturation in morphological and molecular changes, adaptive morphological convergence among nectar-feeding lineages, and incongruent gene trees. Applying methods to account for nucleotide sequence saturation reduces, but does not completely eliminate, phylogenetic conflict. We ruled out paralogy, lateral gene transfer, and poor taxon sampling and outgroup choices among the processes leading to incongruent gene trees in phyllostomid bats. Uncovering and countering the possible effects of introgression and lineage sorting of ancestral polymorphism on gene trees will require great leaps in genomic and allelic sequencing in this species-rich mammalian family. We also found evidence for adaptive molecular evolution leading to convergence in mitochondrial proteins among nectar-feeding lineages. In conclusion, the biological processes that generate phylogenetic conflict are ubiquitous, and overcoming incongruence requires better models and more data than have been collected even in well-studied organisms such as phyllostomid bats. PMID:22891620
Applications of nanotechnology, next generation sequencing and microarrays in biomedical research.

PubMed

Elingaramil, Sauli; Li, Xiaolong; He, Nongyue

2013-07-01

Next-generation sequencing technologies, microarrays and advances in bionanotechnology have had an enormous impact on research within a short time frame. This impact appears certain to increase further as many biomedical institutions are now acquiring these prevailing new technologies. Beyond conventional sampling of genome content, wide-ranging applications are rapidly evolving for next-generation sequencing, microarrays and nanotechnology. To date, these technologies have been applied in a variety of contexts, including whole-genome sequencing, targeted resequencing and discovery of transcription factor binding sites, noncoding RNA expression profiling and molecular diagnostics. This paper thus discusses current applications of nanotechnology, next-generation sequencing technologies and microarrays in biomedical research and highlights the transforming potential these technologies offer.
Advanced Endoscopic Navigation: Surgical Big Data, Methodology, and Applications.

PubMed

Luo, Xiongbiao; Mori, Kensaku; Peters, Terry M

2018-06-04

Interventional endoscopy (e.g., bronchoscopy, colonoscopy, laparoscopy, cystoscopy) is a widely performed procedure that involves either diagnosis of suspicious lesions or guidance for minimally invasive surgery in a variety of organs within the body cavity. Endoscopy may also be used to guide the introduction of certain items (e.g., stents) into the body. Endoscopic navigation systems seek to integrate big data with multimodal information (e.g., computed tomography, magnetic resonance images, endoscopic video sequences, ultrasound images, external trackers) relative to the patient's anatomy, control the movement of medical endoscopes and surgical tools, and guide the surgeon's actions during endoscopic interventions. Nevertheless, it remains challenging to realize the next generation of context-aware navigated endoscopy. This review presents a broad survey of various aspects of endoscopic navigation, particularly with respect to the development of endoscopic navigation techniques. First, we investigate big data with multimodal information involved in endoscopic navigation. Next, we focus on numerous methodologies used for endoscopic navigation. We then review different endoscopic procedures in clinical applications. Finally, we discuss novel techniques and promising directions for the development of endoscopic navigation.
Removing technical variability in RNA-seq data using conditional quantile normalization.

PubMed

Hansen, Kasper D; Irizarry, Rafael A; Wu, Zhijin

2012-04-01

The ability to measure gene expression on a genome-wide scale is one of the most promising accomplishments in molecular biology. Microarrays, the technology that first permitted this, were riddled with problems due to unwanted sources of variability. Many of these problems are now mitigated, after a decade's worth of statistical methodology development. The recently developed RNA sequencing (RNA-seq) technology has generated much excitement in part due to claims of reduced variability in comparison to microarrays. However, we show that RNA-seq data demonstrate unwanted and obscuring variability similar to what was first observed in microarrays. In particular, we find guanine-cytosine content (GC-content) has a strong sample-specific effect on gene expression measurements that, if left uncorrected, leads to false positives in downstream results. We also report on commonly observed data distortions that demonstrate the need for data normalization. Here, we describe a statistical methodology that improves precision by 42% without loss of accuracy. Our resulting conditional quantile normalization algorithm combines robust generalized regression to remove systematic bias introduced by deterministic features such as GC-content and quantile normalization to correct for global distortions.
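To make the quantile normalization step mentioned in this abstract concrete, here is a minimal NumPy sketch of plain quantile normalization. It is not the authors' conditional quantile normalization: the GC-content regression step is omitted, and the function name and the toy matrix are assumptions chosen for illustration.

```python
import numpy as np


def quantile_normalize(counts: np.ndarray) -> np.ndarray:
    """Force every column (sample) to share the same empirical distribution.

    counts: genes x samples matrix. Ties are broken by order of appearance,
    a simplification; fuller implementations usually average tied ranks.
    """
    order = np.argsort(counts, axis=0, kind="stable")
    ranks = np.argsort(order, axis=0, kind="stable")  # rank of each entry within its column
    reference = np.sort(counts, axis=0).mean(axis=1)  # mean of the k-th smallest values across samples
    return reference[ranks]


# Toy expression matrix: 4 genes (rows) x 3 samples (columns).
expr = np.array([
    [5.0, 4.0, 3.0],
    [2.0, 1.0, 4.0],
    [3.0, 7.0, 6.0],
    [4.0, 2.0, 8.0],
])
normalized = quantile_normalize(expr)
```

After normalization each sample contains exactly the same set of values, and only their assignment to genes differs, which is what removes global distributional differences between samples.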
Cell assembly sequences arising from spike threshold adaptation keep track of time in the hippocampus

PubMed Central

Itskov, Vladimir; Curto, Carina; Pastalkova, Eva; Buzsáki, György

2011-01-01

Hippocampal neurons can display reliable and long-lasting sequences of transient firing patterns, even in the absence of changing external stimuli. We suggest that time-keeping is an important function of these sequences, and propose a network mechanism for their generation. We show that sequences of neuronal assemblies recorded from rat hippocampal CA1 pyramidal cells can reliably predict elapsed time (15-20 sec) during wheel running with a precision of 0.5 sec. In addition, we demonstrate the generation of multiple reliable, long-lasting sequences in a recurrent network model. These sequences are generated in the presence of noisy, unstructured inputs to the network, mimicking stationary sensory input. Identical initial conditions generate similar sequences, whereas different initial conditions give rise to distinct sequences. The key ingredients responsible for sequence generation in the model are threshold-adaptation and a Mexican-hat-like pattern of connectivity among pyramidal cells. This pattern may arise from recurrent systems such as the hippocampal CA3 region or the entorhinal cortex. We hypothesize that mechanisms that evolved for spatial navigation also support tracking of elapsed time in behaviorally relevant contexts. PMID:21414904
Value-based genomics.

PubMed

Gong, Jun; Pan, Kathy; Fakih, Marwan; Pal, Sumanta; Salgia, Ravi

2018-03-20

Advancements in next-generation sequencing have greatly enhanced the development of biomarker-driven cancer therapies. The affordability and availability of next-generation sequencers have allowed for the commercialization of next-generation sequencing platforms that have found widespread use for clinical-decision making and research purposes. Despite the greater availability of tumor molecular profiling by next-generation sequencing at our doorsteps, the achievement of value-based care, or improving patient outcomes while reducing overall costs or risks, in the era of precision oncology remains a looming challenge. In this review, we highlight available data through a pre-established and conceptualized framework for evaluating value-based medicine to assess the cost (efficiency), clinical benefit (effectiveness), and toxicity (safety) of genomic profiling in cancer care. We also provide perspectives on future directions of next-generation sequencing from targeted panels to whole-exome or whole-genome sequencing and describe potential strategies needed to attain value-based genomics.
Value-based genomics

PubMed Central

Gong, Jun; Pan, Kathy; Fakih, Marwan; Pal, Sumanta; Salgia, Ravi

2018-01-01

Advancements in next-generation sequencing have greatly enhanced the development of biomarker-driven cancer therapies. The affordability and availability of next-generation sequencers have allowed for the commercialization of next-generation sequencing platforms that have found widespread use for clinical-decision making and research purposes. Despite the greater availability of tumor molecular profiling by next-generation sequencing at our doorsteps, the achievement of value-based care, or improving patient outcomes while reducing overall costs or risks, in the era of precision oncology remains a looming challenge. In this review, we highlight available data through a pre-established and conceptualized framework for evaluating value-based medicine to assess the cost (efficiency), clinical benefit (effectiveness), and toxicity (safety) of genomic profiling in cancer care. We also provide perspectives on future directions of next-generation sequencing from targeted panels to whole-exome or whole-genome sequencing and describe potential strategies needed to attain value-based genomics. PMID:29644010
The Evolution of the Observed Hubble Sequence over the past 6 Gyr

NASA Astrophysics Data System (ADS)

Delgado-Serrano, R.; Hammer, F.; Yang, Y. B.; Puech, M.; Flores, H.; Rodrigues, M.

2011-10-01

In recent years we have confronted serious methodological problems concerning the morphological and kinematic classification of distant galaxies. This has forced us to create a new, simple and effective morphological classification methodology, in order to guarantee a morpho-kinematic correlation, make reproducibility easier and restrict classification subjectivity. Given the characteristics of our morphological classification, we have been able to apply the same methodology, using equivalent observations, to representative samples of local and distant galaxies. This has allowed us to derive, for the first time, the distant Hubble sequence (~6 Gyr ago), and to determine the morphological evolution of galaxies over the past 6 Gyr. Our results strongly suggest that more than half of the present-day spirals had peculiar morphologies 6 Gyr ago.

Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers

USDA-ARS?s Scientific Manuscript database

Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome

PubMed Central

Morita, Kei-ichi; Naruto, Takuya; Tanimoto, Kousuke; Yasukawa, Chisato; Oikawa, Yu; Masuda, Kiyoshi; Imoto, Issei; Inazawa, Johji; Omura, Ken; Harada, Hiroyuki

2015-01-01

Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and is caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations are unidentifiable in some patients; the failure to detect mutations is presumably due to mutations in other causative genes or outside the analyzed regions of PTCH1, or to copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through targeted resequencing of all coding exons, as well as simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears to be useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions. PMID:26544948
The global prevalence of HFE and non-HFE hemochromatosis estimated from analysis of next-generation sequencing data.

PubMed

Wallace, Daniel F; Subramaniam, V Nathan

2016-06-01

The prevalence of HFE-related hereditary hemochromatosis (HH) among European populations has been well studied. There are no prevalence data for atypical forms of HH caused by mutations in HFE2, HAMP, TFR2, or SLC40A1. The purpose of this study was to estimate the population prevalence of these non-HFE forms of HH. A list of HH pathogenic variants in publicly available next-generation sequence (NGS) databases was compiled and allele frequencies were determined. Of 161 variants previously associated with HH, 43 were represented among the NGS data sets; an additional 40 unreported functional variants also were identified. The predicted prevalence of HFE HH and the p.Cys282Tyr mutation closely matched previous estimates from similar populations. Of the non-HFE forms of iron overload, TFR2-, HFE2-, and HAMP-related forms are predicted to be rare, with pathogenic allele frequencies in the range of 0.00007 to 0.0005. Significantly, SLC40A1 variants that have been previously associated with autosomal-dominant ferroportin disease were identified in several populations (pathogenic allele frequency 0.0004), being most prevalent among Africans. We have, for the first time, estimated the population prevalence of non-HFE HH. This methodology could be applied to estimate the population prevalence of a wide variety of genetic disorders. Genet Med 18(6), 618-626.
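One common way to turn a pathogenic allele frequency into an expected genotype frequency is the Hardy-Weinberg relationship. The abstract does not state which model the authors used, so the Python sketch below is an assumption for illustration only: it presumes Hardy-Weinberg equilibrium and full penetrance, and the function names are hypothetical.

```python
def recessive_genotype_frequency(q: float) -> float:
    """Expected frequency of individuals carrying two pathogenic alleles (q^2)."""
    return q ** 2


def dominant_genotype_frequency(q: float) -> float:
    """Expected frequency of individuals carrying at least one pathogenic allele (2pq + q^2)."""
    p = 1.0 - q
    return 2 * p * q + q ** 2


# Allele frequencies quoted in the abstract; the model applied to them is assumed.
rare_recessive = recessive_genotype_frequency(0.0005)  # upper end of the TFR2/HFE2/HAMP range
ferroportin = dominant_genotype_frequency(0.0004)      # SLC40A1, autosomal dominant

print(f"recessive: about 1 in {1 / rare_recessive:,.0f}")
print(f"dominant:  about 1 in {1 / ferroportin:,.0f}")
```

Under these assumptions a recessive condition scales with the square of the allele frequency, so it is far rarer than a dominant condition with a similar allele frequency. Actual disease prevalence would be lower still if penetrance is incomplete.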
Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome.

PubMed

Morita, Kei-ichi; Naruto, Takuya; Tanimoto, Kousuke; Yasukawa, Chisato; Oikawa, Yu; Masuda, Kiyoshi; Imoto, Issei; Inazawa, Johji; Omura, Ken; Harada, Hiroyuki

2015-01-01

Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and is caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations are unidentifiable in some patients; the failure to detect mutations is presumably due to mutations in other causative genes or outside the analyzed regions of PTCH1, or to copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through targeted resequencing of all coding exons, as well as simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears to be useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions.
Generating Artificial Snort Alerts and Implementing SELK: The Snort-Elasticsearch-Logstash-Kibana Stack

DTIC Science & Technology

2017-09-01

analyzing Snort alerts. The first section covers the Snort alert-generation program, the methodology involved in developing it, and how it accelerates...guide on system setup. The methodologies described can be translated to the setup and use of the ELK stack for storing and visualizing any data...
Vector Design Tour de Force: Integrating Combinatorial and Rational Approaches to Derive Novel Adeno-associated Virus Variants

PubMed Central

Marsic, Damien; Govindasamy, Lakshmanan; Currlin, Seth; Markusic, David M; Tseng, Yu-Shan; Herzog, Roland W; Agbandje-McKenna, Mavis; Zolotukhin, Sergei

2014-01-01

Methodologies to improve existing adeno-associated virus (AAV) vectors for gene therapy include either rational approaches or directed evolution to derive capsid variants characterized by superior transduction efficiencies in targeted tissues. Here, we integrated both approaches in one unified design strategy of “virtual family shuffling” to derive a combinatorial capsid library whereby only variable regions on the surface of the capsid are modified. Individual sublibraries were first assembled in order to preselect compatible amino acid residues within restricted surface-exposed regions to minimize the generation of dead-end variants. Subsequently, the successful families were interbred to derive a combined library of ~8 × 10^5 complexity. Next-generation sequencing of the packaged viral DNA revealed capsid surface areas susceptible to directed evolution, thus providing guidance for future designs. We demonstrated the utility of the library by deriving an AAV2-based vector characterized by a 20-fold higher transduction efficiency in murine liver, now equivalent to that of AAV8. PMID:25048217
Sequential Service Restoration for Unbalanced Distribution Systems and Microgrids

DOE PAGES

Chen, Bo; Chen, Chen; Wang, Jianhui; ...

2017-07-07

The resilience and reliability of modern power systems are threatened by increasingly severe weather events and cyber-physical security events. An effective restoration methodology is desired to optimally integrate emerging smart grid technologies and pave the way for developing self-healing smart grids. In this paper, a sequential service restoration (SSR) framework is proposed to generate restoration solutions for distribution systems and microgrids in the event of large-scale power outages. The restoration solution contains a sequence of control actions that properly coordinate switches, distributed generators, and switchable loads to form multiple isolated microgrids. The SSR can be applied for three-phase unbalanced distribution systems and microgrids and can adapt to various operation conditions. Mathematical models are introduced for three-phase unbalanced power flow, voltage regulators, transformers, and loads. Furthermore, the SSR problem is formulated as a mixed-integer linear programming model, and its effectiveness is evaluated via the modified IEEE 123 node test feeder.
Sequential Service Restoration for Unbalanced Distribution Systems and Microgrids

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Bo; Chen, Chen; Wang, Jianhui

The resilience and reliability of modern power systems are threatened by increasingly severe weather events and cyber-physical security events. An effective restoration methodology is desired to optimally integrate emerging smart grid technologies and pave the way for developing self-healing smart grids. In this paper, a sequential service restoration (SSR) framework is proposed to generate restoration solutions for distribution systems and microgrids in the event of large-scale power outages. The restoration solution contains a sequence of control actions that properly coordinate switches, distributed generators, and switchable loads to form multiple isolated microgrids. The SSR can be applied for three-phase unbalanced distribution systems and microgrids and can adapt to various operation conditions. Mathematical models are introduced for three-phase unbalanced power flow, voltage regulators, transformers, and loads. Furthermore, the SSR problem is formulated as a mixed-integer linear programming model, and its effectiveness is evaluated via the modified IEEE 123 node test feeder.
Moving Object Detection Using a Parallax Shift Vector Algorithm

NASA Astrophysics Data System (ADS)

Gural, Peter S.; Otto, Paul R.; Tedesco, Edward F.

2018-07-01

There are various algorithms currently in use to detect asteroids from ground-based observatories, but they are generally restricted to linear or mildly curved movement of the target object across the field of view. Space-based sensors in high inclination, low Earth orbits can induce significant parallax in a collected sequence of images, especially for objects at the typical distances of asteroids in the inner solar system. This results in a highly nonlinear motion pattern of the asteroid across the sensor, which requires a more sophisticated search pattern for detection processing. Both the classical pattern matching used in ground-based asteroid search and the more sensitive matched filtering and synthetic tracking techniques can be adapted to account for highly complex parallax motion. A new shift vector generation methodology is discussed along with its impacts on commonly used detection algorithms, processing load, and responsiveness to asteroid track reporting. The matched filter, template generator, and pattern matcher source code for the software described herein are available via GitHub.
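The general idea behind synthetic tracking, co-adding frames after undoing a hypothesized per-frame shift, can be sketched in a few lines of NumPy. This is not the authors' algorithm or their GitHub code: the function name, the integer-pixel shifts, and the quadratic track used below are assumptions for illustration, and the parallax modelling that generates real shift vectors is not attempted.

```python
import numpy as np


def shift_and_stack(frames: np.ndarray, shifts: np.ndarray) -> np.ndarray:
    """Co-add frames after undoing a per-frame (row, col) pixel shift.

    frames: (n_frames, height, width) image cube.
    shifts: (n_frames, 2) integer offsets of the target relative to frame 0.
    np.roll wraps around the edges, which a real pipeline would mask instead.
    """
    stacked = np.zeros(frames.shape[1:], dtype=float)
    for frame, (d_row, d_col) in zip(frames, shifts):
        stacked += np.roll(frame, shift=(-int(d_row), -int(d_col)), axis=(0, 1))
    return stacked


# Hypothetical nonlinear track: the column offset grows quadratically with frame index.
shifts = np.array([[k, k * k] for k in range(4)])
frames = np.zeros((4, 32, 32))
for k, (d_row, d_col) in enumerate(shifts):
    frames[k, 5 + d_row, 5 + d_col] = 1.0  # unit-brightness target in an empty field

stacked = shift_and_stack(frames, shifts)
```

When the hypothesized shift vectors match the true motion, the target's flux piles up in a single pixel while uncorrelated noise averages down. A linear-motion hypothesis would smear the same target along a curve, which is why a nonlinear track calls for a dedicated shift vector set.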
Complete Genome Sequence of a Streptococcus pyogenes Serotype M12 Scarlet Fever Outbreak Isolate from China, Compiled Using Oxford Nanopore and Illumina Sequencing.

PubMed

You, Yuanhai; Kou, Yongjun; Niu, Longfei; Jia, Qiong; Liu, Yahui; Davies, Mark R; Walker, Mark J; Zhu, Jiaqiang; Zhang, Jianzhong

2018-05-03

The incidence of scarlet fever cases remains high in China. Here, we report the complete genome sequence of a Streptococcus pyogenes isolate of serotype M12, which has been confirmed as the predominant serotype in recent outbreaks. Genome sequencing was achieved by a combination of Oxford Nanopore MinION and Illumina methodologies. Copyright © 2018 You et al.
Rapid Evolution of Virulence and Drug Resistance in the Emerging Zoonotic Pathogen Streptococcus suis

PubMed Central

Holden, Matthew T. G.; Hauser, Heidi; Sanders, Mandy; Ngo, Thi Hoa; Cherevach, Inna; Cronin, Ann; Goodhead, Ian; Mungall, Karen; Quail, Michael A.; Price, Claire; Rabbinowitsch, Ester; Sharp, Sarah; Croucher, Nicholas J.; Chieu, Tran Bich; Thi Hoang Mai, Nguyen; Diep, To Song; Chinh, Nguyen Tran; Kehoe, Michael; Leigh, James A.; Ward, Philip N.; Dowson, Christopher G.; Whatmore, Adrian M.; Chanter, Neil; Iversen, Pernille; Gottschalk, Marcelo; Slater, Josh D.; Smith, Hilde E.; Spratt, Brian G.; Xu, Jianguo; Ye, Changyun; Bentley, Stephen; Barrell, Barclay G.; Schultsz, Constance; Maskell, Duncan J.; Parkhill, Julian

2009-01-01

Background Streptococcus suis is a zoonotic pathogen that infects pigs and can occasionally cause serious infections in humans. S. suis infections occur sporadically in humans in Europe and North America, but a recent major outbreak has been described in China with high levels of mortality. The mechanisms of S. suis pathogenesis in humans and pigs are poorly understood. Methodology/Principal Findings The sequencing of whole genomes of S. suis isolates provides opportunities to investigate the genetic basis of infection. Here we describe whole genome sequences of three S. suis strains from the same lineage: one from European pigs, and two from human cases from China and Vietnam. Comparative genomic analysis was used to investigate the variability of these strains. S. suis is phylogenetically distinct from other Streptococcus species for which genome sequences are currently available. Accordingly, ∼40% of the ∼2 Mb genome is unique in comparison to other Streptococcus species. Finer genomic comparisons within the species showed a high level of sequence conservation; virtually all of the genome is common to the S. suis strains. The only exceptions are three ∼90 kb regions, present in the two isolates from humans, composed of integrative conjugative elements and transposons. Carried in these regions are coding sequences associated with drug resistance. In addition, small-scale sequence variation has generated pseudogenes in putative virulence and colonization factors. Conclusions/Significance The genomic inventories of genetically related S. suis strains, isolated from distinct hosts and diseases, exhibit high levels of conservation. However, the genomes provide evidence that horizontal gene transfer has contributed to the evolution of drug resistance. PMID:19603075
A Window Into Clinical Next-Generation Sequencing-Based Oncology Testing Practices.

PubMed

Nagarajan, Rakesh; Bartley, Angela N; Bridge, Julia A; Jennings, Lawrence J; Kamel-Reid, Suzanne; Kim, Annette; Lazar, Alexander J; Lindeman, Neal I; Moncur, Joel; Rai, Alex J; Routbort, Mark J; Vasalos, Patricia; Merker, Jason D

2017-12-01

Detection of acquired variants in cancer is a paradigm of precision medicine, yet little has been reported about clinical laboratory practices across a broad range of laboratories. To use College of American Pathologists proficiency testing survey results to report on the results from surveys on next-generation sequencing-based oncology testing practices. College of American Pathologists proficiency testing survey results from more than 250 laboratories currently performing molecular oncology testing were used to determine laboratory trends in next-generation sequencing-based oncology testing. These presented data provide key information about the number of laboratories that currently offer or are planning to offer next-generation sequencing-based oncology testing. Furthermore, we present data from 60 laboratories performing next-generation sequencing-based oncology testing regarding specimen requirements and assay characteristics. The findings indicate that most laboratories are performing tumor-only targeted sequencing to detect single-nucleotide variants and small insertions and deletions, using desktop sequencers and predesigned commercial kits. Despite these trends, a diversity of approaches to testing exists. This information should be useful to further inform a variety of topics, including national discussions involving clinical laboratory quality systems, regulation and oversight of next-generation sequencing-based oncology testing, and precision oncology efforts in a data-driven manner.
Mobile Eye Tracking Methodology in Informal E-Learning in Social Groups in Technology-Enhanced Science Centres

ERIC Educational Resources Information Center

Magnussen, Rikke; Zachariassen, Maria; Kharlamov, Nikita; Larsen, Birger

2017-01-01

This paper presents a methodological discussion of the potential and challenges of involving mobile eye tracking technology in studies of knowledge generation and learning in a science centre context. The methodological exploration is based on eye-tracking studies of audience interaction and knowledge generation in the technology-enhanced health…
The Essential Component in DNA-Based Information Storage System: Robust Error-Tolerating Module

PubMed Central

Yim, Aldrin Kay-Yuen; Yu, Allen Chi-Shing; Li, Jing-Woei; Wong, Ada In-Chun; Loo, Jacky F. C.; Chan, King Ming; Kong, S. K.; Yip, Kevin Y.; Chan, Ting-Fung

2014-01-01

The size of digital data is ever increasing and is expected to grow to 40,000 EB by 2020, yet the estimated global information storage capacity in 2011 was <300 EB, indicating that most of the data are transient. DNA, as a very stable nano-molecule, is an ideal massive storage device for long-term data archive. The two most notable illustrations are from Church et al. and Goldman et al., whose approaches are well-optimized for most sequencing platforms – short synthesized DNA fragments without homopolymer. Here, we suggested improvements on error handling methodology that could enable the integration of DNA-based computational process, e.g., algorithms based on self-assembly of DNA. As a proof of concept, a picture of size 438 bytes was encoded to DNA with low-density parity-check error-correction code. We salvaged a significant portion of sequencing reads with mutations generated during DNA synthesis and sequencing and successfully reconstructed the entire picture. A modular-based programming framework – DNAcodec with an eXtensible Markup Language-based data format was also introduced. Our experiments demonstrated the practicability of long DNA message recovery with high error tolerance, which opens the field to biocomputing and synthetic biology. PMID:25414846
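The homopolymer-free encoding mentioned in the abstract above can be illustrated with a small sketch. This is not the DNAcodec framework and it contains no low-density parity-check code; it is a simplified rotating ternary code in the spirit of the Goldman et al. approach, assuming a fixed six trits per byte instead of a Huffman code. The function names `encode` and `decode` and the `start` parameter are hypothetical.

```python
# Illustrative sketch only: a rotating base-3 code that never emits the same
# nucleotide twice in a row. No error correction is included.
BASES = "ACGT"
TRITS_PER_BYTE = 6  # 3**6 = 729 >= 256, so six trits are enough for one byte


def bytes_to_trits(data):
    """Convert each byte into six base-3 digits, most significant first."""
    trits = []
    for byte in data:
        digits = []
        for _ in range(TRITS_PER_BYTE):
            digits.append(byte % 3)
            byte //= 3
        trits.extend(reversed(digits))
    return trits


def encode(data, start="A"):
    """Map trits to bases; each trit picks one of the 3 bases != previous base."""
    prev, out = start, []
    for trit in bytes_to_trits(data):
        choices = [b for b in BASES if b != prev]
        prev = choices[trit]
        out.append(prev)
    return "".join(out)


def decode(dna, start="A"):
    """Invert encode(): recover trits from base transitions, then bytes."""
    prev, trits = start, []
    for base in dna:
        choices = [b for b in BASES if b != prev]
        trits.append(choices.index(base))
        prev = base
    out = bytearray()
    for i in range(0, len(trits), TRITS_PER_BYTE):
        value = 0
        for trit in trits[i:i + TRITS_PER_BYTE]:
            value = value * 3 + trit
        out.append(value)
    return bytes(out)
```

Because every base is chosen from the three nucleotides that differ from its predecessor, the output cannot contain a homopolymer run; a real system would layer an error-correcting code on top of such a mapping.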
MDC-Analyzer: a novel degenerate primer design tool for the construction of intelligent mutagenesis libraries with contiguous sites.

PubMed

Tang, Lixia; Wang, Xiong; Ru, Beibei; Sun, Hengfei; Huang, Jian; Gao, Hui

2014-06-01

Recent computational and bioinformatics advances have enabled the efficient creation of novel biocatalysts by reducing amino acid variability at hot spot regions. To further expand the utility of this strategy, we present here a tool called Multi-site Degenerate Codon Analyzer (MDC-Analyzer) for the automated design of intelligent mutagenesis libraries that can completely cover user-defined randomized sequences, especially when multiple contiguous and/or adjacent sites are targeted. By initially defining an objective function, the possible optimal degenerate PCR primer profiles could be automatically explored using the heuristic approach of Greedy Best-First-Search. Compared to the previously developed DC-Analyzer, MDC-Analyzer allows for the existence of a small amount of undesired sequences as a tradeoff between the number of degenerate primers and the encoded library size while still providing all the benefits of DC-Analyzer with the ability to randomize multiple contiguous sites. MDC-Analyzer was validated using a series of randomly generated mutation schemes and experimental case studies on the evolution of halohydrin dehalogenase, which proved that the MDC methodology is more efficient than other methods and is particularly well-suited to exploring the sequence space of proteins using data-driven protein engineering strategies.
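To make the notion of a degenerate codon in the abstract above concrete, the following sketch expands an IUPAC degenerate codon into the codons and amino acids it encodes under the standard genetic code. It is not the MDC-Analyzer algorithm (no objective function or Greedy Best-First-Search is implemented), and the function names `expand` and `library_profile` are hypothetical.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

# Standard genetic code, codons ordered T, C, A, G at each position.
_ORDER = "TCAG"
_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AAS[16 * i + 4 * j + k]
    for i, a in enumerate(_ORDER)
    for j, b in enumerate(_ORDER)
    for k, c in enumerate(_ORDER)
}


def expand(degenerate_codon):
    """List every concrete codon matched by a three-letter degenerate codon."""
    pools = (IUPAC[base] for base in degenerate_codon.upper())
    return ["".join(codon) for codon in product(*pools)]


def library_profile(degenerate_codon):
    """Count how many codons encode each amino acid ('*' marks a stop)."""
    counts = {}
    for codon in expand(degenerate_codon):
        residue = CODON_TABLE[codon]
        counts[residue] = counts.get(residue, 0) + 1
    return counts
```

For example, `library_profile("NNK")` covers all 20 amino acids with 32 codons and a single stop codon, whereas `library_profile("NDT")` yields 12 amino acids with no stop, which is the kind of library-size tradeoff a primer design tool has to weigh.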
Identification and phylogeny of Arabian snakes: Comparison of venom chromatographic profiles versus 16S rRNA gene sequences.

PubMed

Al Asmari, Abdulrahman; Manthiri, Rajamohammed Abbas; Khan, Haseeb Ahmad

2014-11-01

Identification of snake species is important for various reasons including the emergency treatment of snake bite victims. We present a simple method for identification of six snake species using the gel filtration chromatographic profiles of their venoms. The venoms of Echis coloratus, Echis pyramidum, Cerastes gasperettii, Bitis arietans, Naja arabica, and Walterinnesia aegyptia were milked, lyophilized, diluted and centrifuged to separate the mucus from the venom. The clear supernatants were filtered and chromatographed on fast protein liquid chromatography (FPLC). We obtained the 16S rRNA gene sequences of the above species and performed phylogenetic analysis using the neighbor-joining method. The chromatograms of venoms from different snake species showed peculiar patterns based on the number and location of peaks. The dendrograms generated from a similarity matrix based on the presence/absence of particular chromatographic peaks clearly differentiated Elapids from Viperids. Molecular cladistics using 16S rRNA gene sequences resulted in jumping clades while separating the members of these two families. These findings suggest that chromatographic profiles of snake venoms may provide a simple and reproducible chemical fingerprinting method for quick identification of snake species. However, the validation of this methodology requires further studies on a large number of specimens from within and across species.
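A similarity matrix built from the presence or absence of chromatographic peaks, as described in the abstract above, can be sketched as follows. The abstract does not state which similarity coefficient was used, so the Jaccard index here is an assumption, and the sample labels and peak identifiers in the usage note are hypothetical placeholders, not data from the study.

```python
def jaccard(peaks_a, peaks_b):
    """Jaccard similarity of two peak sets: shared peaks / all observed peaks."""
    a, b = set(peaks_a), set(peaks_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similarity_matrix(profiles):
    """Pairwise Jaccard similarities for a dict of {sample name: peak set}."""
    names = sorted(profiles)
    return {
        (x, y): jaccard(profiles[x], profiles[y])
        for x in names
        for y in names
    }
```

Calling `similarity_matrix({"sample_1": {1, 2, 3}, "sample_2": {2, 3, 4}})` gives 0.5 for the off-diagonal pair; such a matrix could then be passed to any hierarchical clustering routine to draw a dendrogram.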
Fragment analysis represents a suitable approach for the detection of hotspot c.7541_7542delCT NOTCH1 mutation in chronic lymphocytic leukemia.

PubMed

Vavrova, Eva; Kantorova, Barbara; Vonkova, Barbara; Kabathova, Jitka; Skuhrova-Francova, Hana; Diviskova, Eva; Letocha, Ondrej; Kotaskova, Jana; Brychtova, Yvona; Doubek, Michael; Mayer, Jiri; Pospisilova, Sarka

2017-09-01

The hotspot c.7541_7542delCT NOTCH1 mutation has been proven to have a negative clinical impact in chronic lymphocytic leukemia (CLL). However, an optimal method for its detection has not yet been specified. The aim of our study was to examine the presence of the NOTCH1 mutation in CLL using three commonly used molecular methods. Sanger sequencing, fragment analysis and allele-specific PCR were compared in the detection of the c.7541_7542delCT NOTCH1 mutation in 201 CLL patients. In 7 patients with inconclusive mutational analysis results, the presence of the NOTCH1 mutation was also confirmed using ultra-deep next generation sequencing. The NOTCH1 mutation was detected in 15% (30/201) of examined patients. Only fragment analysis was able to identify all 30 NOTCH1-mutated patients. Sanger sequencing and allele-specific PCR showed a lower detection efficiency, determining 93% (28/30) and 80% (24/30) of the present NOTCH1 mutations, respectively. Considering these three most commonly used methodologies for c.7541_7542delCT NOTCH1 mutation screening in CLL, we defined fragment analysis as the most suitable approach for detecting the hotspot NOTCH1 mutation. Copyright © 2017 Elsevier Ltd. All rights reserved.
De Novo Deep Transcriptome Analysis of Medicinal Plants for Gene Discovery in Biosynthesis of Plant Natural Products.

PubMed

Han, R; Rai, A; Nakamura, M; Suzuki, H; Takahashi, H; Yamazaki, M; Saito, K

2016-01-01

The study of the transcriptome, the entire pool of transcripts in an organism or single cells at a certain physiological or pathological stage, is indispensable in unraveling the connection and regulation between DNA and protein. Before the advent of deep sequencing, microarray was the main approach to handle transcripts. Despite obvious shortcomings, including limited dynamic range and difficulty in comparing the results from distinct experiments, microarray was widely applied. During the past decade, next-generation sequencing (NGS) has revolutionized our understanding of genomics in a fast, high-throughput, cost-effective, and tractable manner. By adopting NGS, efficiency and fruitful outcomes concerning the efforts to elucidate genes responsible for producing active compounds in medicinal plants were profoundly enhanced. The whole process involves several steps, from plant material sampling, to cDNA library preparation, to deep sequencing, and then bioinformatics takes over to assemble enormous yet fragmentary data from which to comb and extract information. The unprecedentedly rapid development of such technologies provides so many choices to facilitate the task that it can cause confusion when choosing a suitable methodology for specific purposes. Here, we review the general approaches for deep transcriptome analysis and then focus on their application in discovering biosynthetic pathways of medicinal plants that produce important secondary metabolites. © 2016 Elsevier Inc. All rights reserved.
Extensive Conserved Synteny of Genes between the Karyotypes of Manduca sexta and Bombyx mori Revealed by BAC-FISH Mapping

PubMed Central

Tanaka-Okuyama, Makiko; Shibata, Fukashi; Yoshido, Atsuo; Marec, František; Wu, Chengcang; Zhang, Hongbin; Goldsmith, Marian R.

2009-01-01

Background: Genome sequencing projects have been completed for several species representing four highly diverged holometabolous insect orders, Diptera, Hymenoptera, Coleoptera, and Lepidoptera. The striking evolutionary diversity of insects argues a need for efficient methods to apply genome information from such models to genetically uncharacterized species. Constructing conserved synteny maps plays a crucial role in this task. Here, we demonstrate the use of fluorescence in situ hybridization with bacterial artificial chromosome probes as a powerful tool for physical mapping of genes and comparative genome analysis in Lepidoptera, which have numerous and morphologically uniform holokinetic chromosomes. Methodology/Principal Findings: We isolated 214 clones containing 159 orthologs of well conserved single-copy genes of a sequenced lepidopteran model, the silkworm, Bombyx mori, from a BAC library of a sphingid with an unexplored genome, the tobacco hornworm, Manduca sexta. We then constructed a BAC-FISH karyotype identifying all 28 chromosomes of M. sexta by mapping 124 loci using the corresponding BAC clones. BAC probes from three M. sexta chromosomes also generated clear signals on the corresponding chromosomes of the convolvulus hawk moth, Agrius convolvuli, which belongs to the same subfamily, Sphinginae, as M. sexta. Conclusions/Significance: Comparison of the M. sexta BAC physical map with the linkage map and genome sequence of B. mori pointed to extensive conserved synteny including conserved gene order in most chromosomes. Only a few rearrangements, including three inversions, three translocations, and two fission/fusion events were estimated to have occurred after the divergence of Bombycidae and Sphingidae. These results add to accumulating evidence for the stability of lepidopteran genomes. Generating signals on A. convolvuli chromosomes using heterologous M. sexta probes demonstrated that BAC-FISH with orthologous sequences can be used for karyotyping a wide range of related and genetically uncharacterized species, significantly extending the ability to develop synteny maps for comparative and functional genomics. PMID:19829706

Automatic Command Sequence Generation

NASA Technical Reports Server (NTRS)

Fisher, Forest; Gladden, Roy; Khanampornpan, Teerapat

2007-01-01

Automatic Sequence Generator (Autogen) Version 3.0 software automatically generates command sequences for the Mars Reconnaissance Orbiter (MRO) and several other JPL spacecraft operated by the multi-mission support team. Autogen uses standard JPL sequencing tools like APGEN, ASP, SEQGEN, and the DOM database to automate the generation of uplink command products, Spacecraft Command Message Format (SCMF) files, and the corresponding ground command products, DSN Keywords Files (DKF). Autogen supports all the major multi-mission mission phases including the cruise, aerobraking, mapping/science, and relay mission phases. Autogen is a Perl script, which functions within the mission operations UNIX environment. It consists of two parts: a set of model files and the autogen Perl script. Autogen encodes the behaviors of the system into a model and encodes algorithms for context-sensitive customizations of the modeled behaviors. The model includes knowledge of different mission phases and how the resultant command products must differ for these phases. The executable software portion of Autogen automates the setup and use of APGEN for constructing a spacecraft activity sequence file (SASF). The setup includes file retrieval through the DOM (Distributed Object Manager), an object database used to store project files. This step retrieves all the needed input files for generating the command products. Depending on the mission phase, Autogen also uses the ASP (Automated Sequence Processor) and SEQGEN to generate the command product sent to the spacecraft. Autogen also provides the means for customizing sequences through the use of configuration files. By automating the majority of the sequence generation process, Autogen eliminates many sequence generation errors commonly introduced by manually constructing spacecraft command sequences. 
Through the layering of commands into the sequence by a series of scheduling algorithms, users are able to rapidly and reliably construct the desired uplink command products. With the aid of Autogen, sequences may be produced in a matter of hours instead of weeks, with a significant reduction in the number of people on the sequence team. As a result, the uplink product generation process is significantly streamlined and mission risk is significantly reduced. Autogen is used for operations of MRO, Mars Global Surveyor (MGS), Mars Exploration Rover (MER), Mars Odyssey, and will be used for operations of Phoenix. Autogen Version 3.0 is the operational version of Autogen including the MRO adaptation for the cruise mission phase, and was also used for development of the aerobraking and mapping mission phases for MRO.
An approach for identification of unknown viruses using sequencing-by-hybridization.

PubMed

Katoski, Sarah E; Meyer, Hermann; Ibrahim, Sofi

2015-09-01

Accurate identification of biological threat agents, especially RNA viruses, in clinical or environmental samples can be challenging because the concentration of viral genomic material in a given sample is usually low, viral genomic RNA is liable to degradation, and RNA viruses are extremely diverse. A two-tiered approach was used for initial identification, then full genomic characterization of 199 RNA viruses belonging to virus families Arenaviridae, Bunyaviridae, Filoviridae, Flaviviridae, and Togaviridae. A sequencing-by-hybridization (SBH) microarray was used to tentatively identify a viral pathogen; the identity was then confirmed by guided next-generation sequencing (NGS). After optimization and evaluation of the SBH and NGS methodologies with various virus species and strains, the approach was used to test the ability to identify viruses in blinded samples. The SBH correctly identified two Ebola viruses in the blinded samples within 24 hr, and by using guided amplicon sequencing with 454 GS FLX, the identities of the viruses in both samples were confirmed. SBH provides relatively low-cost screening of biological samples against a panel of viral pathogens that can be custom-designed on a microarray. Once the identity of the virus is deduced from the highest hybridization signal on the SBH microarray, guided (amplicon) NGS sequencing can be used not only to confirm the identity of the virus but also to provide further information about the strain or isolate, including a potential genetic manipulation. This approach can be useful in situations where natural or deliberate biological threat incidents might occur and a rapid response is required. © 2015 Wiley Periodicals, Inc.
Comparison of large-insert, small-insert and pyrosequencing libraries for metagenomic analysis.

PubMed

Danhorn, Thomas; Young, Curtis R; DeLong, Edward F

2012-11-01

The development of DNA sequencing methods for characterizing microbial communities has evolved rapidly over the past decades. To evaluate more traditional, as well as newer methodologies for DNA library preparation and sequencing, we compared fosmid, short-insert shotgun and 454 pyrosequencing libraries prepared from the same metagenomic DNA samples. GC content was elevated in all fosmid libraries, compared with shotgun and 454 libraries. Taxonomic composition of the different libraries suggested that this was caused by a relative underrepresentation of dominant taxonomic groups with low GC content, notably Prochlorales and the SAR11 cluster, in fosmid libraries. While these abundant taxa had a large impact on library representation, we also observed a positive correlation between taxon GC content and fosmid library representation in other low-GC taxa, suggesting a general trend. Analysis of gene category representation in different libraries indicated that the functional composition of a library was largely a reflection of its taxonomic composition, and no additional systematic biases against particular functional categories were detected at the level of sequencing depth in our samples. Another important but less predictable factor influencing the apparent taxonomic and functional library composition was the read length afforded by the different sequencing technologies. Our comparisons and analyses provide a detailed perspective on the influence of library type on the recovery of microbial taxa in metagenomic libraries and underscore the different uses and utilities of more traditional, as well as contemporary 'next-generation' DNA library construction and sequencing technologies for exploring the genomics of the natural microbial world.
Exome Sequencing Is an Efficient Tool for Variant Late-Infantile Neuronal Ceroid Lipofuscinosis Molecular Diagnosis

PubMed Central

Ortega-Recalde, Oscar; Nallathambi, Jeyabalan; Anandula, Venkata Ramana; Renukaradhya, Umashankar; Laissue, Paul

2014-01-01

The neuronal ceroid-lipofuscinoses (NCL) are a group of neurodegenerative disorders characterized by epilepsy, visual failure, progressive mental and motor deterioration, myoclonus, dementia and reduced life expectancy. Classically, NCL-affected individuals have been classified into six categories, which have been mainly defined regarding the clinical onset of symptoms. However, some patients cannot be easily included in a specific group because of significant variation in the age of onset and disease progression. Molecular genetics has emerged in recent years as a useful tool for enhancing NCL subtype classification. Fourteen NCL genetic forms (CLN1 to CLN14) have been described to date. The variant late-infantile form of the disease has been linked to CLN5, CLN6, CLN7 (MFSD8) and CLN8 mutations. Despite advances in the diagnosis of neurodegenerative disorders, mutations in these genes may cause similar phenotypes, which makes accurate candidate gene selection for direct sequencing difficult. Three siblings who were affected by variant late-infantile NCL are reported in the present study. We used whole-exome sequencing, direct sequencing and in silico approaches to identify the molecular basis of the disease. We identified the novel c.1219T>C (p.Trp407Arg) and c.1361T>C (p.Met454Thr) MFSD8 pathogenic mutations. Our results highlighted next generation sequencing as a novel and powerful methodological approach for the rapid determination of the molecular diagnosis of NCL. They also provide information regarding the phenotypic and molecular spectrum of CLN7 disease. PMID:25333361
Comparison between two widely used laboratory methods in BRAF V600 mutation detection in a large cohort of clinical samples of cutaneous melanoma metastases to the lymph nodes.

PubMed

Jurkowska, Monika; Gos, Aleksandra; Ptaszyński, Konrad; Michej, Wanda; Tysarowski, Andrzej; Zub, Renata; Siedlecki, Janusz A; Rutkowski, Piotr

2015-01-01

The study compares detection rates of oncogenic BRAF mutations in a homogenous group of 236 FFPE cutaneous melanoma lymph node metastases, collected in one cancer center. BRAF mutational status was verified by two independent in-house PCR/Sanger sequencing tests, and the Cobas® 4800 BRAF V600 Mutation Test. The best of two sequencing approaches returned results for 230/236 samples. In 140 (60.9%), the mutation in codon 600 of BRAF was found. 91.4% of all mutated cases (128 samples) represented p.V600E. Both Sanger-based tests gave reproducible results although they differed significantly in the percentage of amplifiable samples: 230/236 to 109/143. Cobas generated results in all 236 cases, mutations changing codon V600 were detected in 144 of them (61.0%), including 5 not amplifiable and 5 negative in the standard sequencing. However, 6 cases positive in sequencing turned out to be negative in Cobas. Both tests provided us with the same BRAF V600 mutational status in 219 out of 230 cases with valid results (95.2%). The total BRAF V600 mutation detection rate didn't differ significantly between the two methodological approaches (60.9% vs. 61.0%). Sequencing was a reproducible method of V600 mutation detection and more powerful to detect mutations other than p.V600E, while Cobas test proved to be less susceptible to the poor DNA quality or investigator's bias. The study underlined an important role of pathologists in quality assurance of molecular diagnostics.
Illumina GA IIx & HiSeq 2000 Production Sequencing and QC Analysis Pipelines at the DOE Joint Genome Institute

DOE Office of Scientific and Technical Information (OSTI.GOV)

Daum, Christopher; Zane, Matthew; Han, James

2011-01-31

The U.S. Department of Energy (DOE) Joint Genome Institute's (JGI) Production Sequencing group is committed to the generation of high-quality genomic DNA sequence to support the mission areas of renewable energy generation, global carbon management, and environmental characterization and clean-up. Within the JGI's Production Sequencing group, a robust Illumina Genome Analyzer and HiSeq pipeline has been established. Optimization of the sequencer pipelines has been ongoing with the aim of continual process improvement of the laboratory workflow, reducing operational costs and project cycle times to increase sample throughput, and improving the overall quality of the sequence generated. A sequence QC analysis pipeline has been implemented to automatically generate read and assembly level quality metrics. The foremost of these optimization projects, along with sequencing and operational strategies, throughput numbers, and sequencing quality results will be presented.
Temporal Code-Driven Stimulation: Definition and Application to Electric Fish Signaling

PubMed Central

Lareo, Angel; Forlim, Caroline G.; Pinto, Reynaldo D.; Varona, Pablo; Rodriguez, Francisco de Borja

2016-01-01

Closed-loop activity-dependent stimulation is a powerful methodology to assess information processing in biological systems. In this context, the development of novel protocols, their implementation in bioinformatics toolboxes and their application to different description levels open up a wide range of possibilities in the study of biological systems. We developed a methodology for studying biological signals representing them as temporal sequences of binary events. A specific sequence of these events (code) is chosen to deliver a predefined stimulation in a closed-loop manner. The response to this code-driven stimulation can be used to characterize the system. This methodology was implemented in a real time toolbox and tested in the context of electric fish signaling. We show that while there are codes that evoke a response that cannot be distinguished from a control recording without stimulation, other codes evoke a characteristic distinct response. We also compare the code-driven response to open-loop stimulation. The discussed experiments validate the proposed methodology and the software toolbox. PMID:27766078
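The idea in the abstract above of representing a signal as a temporal sequence of binary events and triggering stimulation on a chosen code can be sketched offline as follows. This is not the authors' real-time toolbox; the function names, the integer millisecond time base, and the bin width are all assumptions made for illustration.

```python
def binarize(event_times_ms, bin_width_ms, duration_ms):
    """Turn event timestamps into bits: 1 if a bin holds at least one event."""
    n_bins = duration_ms // bin_width_ms
    bits = [0] * n_bins
    for t in event_times_ms:
        idx = t // bin_width_ms
        if 0 <= idx < n_bins:
            bits[idx] = 1
    return bits


def detect_code(bits, code):
    """Return the bin indices at which the most recent bits equal `code`.

    In a closed-loop setup each returned index is a moment at which the
    predefined stimulus would be delivered.
    """
    code = list(code)
    window, triggers = [], []
    for i, bit in enumerate(bits):
        window.append(bit)
        if len(window) > len(code):
            window.pop(0)  # keep only the last len(code) bits
        if window == code:
            triggers.append(i)
    return triggers
```

With events at 5, 25, 32 and 61 ms, 10 ms bins and an 80 ms recording, `binarize` returns `[1, 0, 1, 1, 0, 0, 1, 0]`, and `detect_code` finds the code `[1, 0, 1]` completing at bin 2. A real-time implementation would process bits as they arrive instead of scanning a stored list.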
A safe and easy method for building consensus HIV sequences from 454 massively parallel sequencing data.

PubMed

Fernández-Caballero Rico, Jose Ángel; Chueca Porcuna, Natalia; Álvarez Estévez, Marta; Mosquera Gutiérrez, María Del Mar; Marcos Maeso, María Ángeles; García, Federico

2018-02-01

To show how to generate a consensus sequence from massively parallel sequencing data obtained from routine HIV anti-retroviral resistance studies, one that may be suitable for molecular epidemiology studies. Paired Sanger (Trugene-Siemens) and next-generation sequencing (NGS) (454 GSJunior-Roche) HIV RT and protease sequences from 62 patients were studied. NGS consensus sequences were generated using Mesquite, using 10%, 15%, and 20% thresholds. Molecular evolutionary genetics analysis (MEGA) was used for phylogenetic studies. At a 10% threshold, NGS-Sanger sequences from 17/62 patients were phylogenetically related, with a median bootstrap value of 88% (IQR 83.5-95.5). Association increased to 36/62 sequences, with a median bootstrap value of 94% (IQR 85.5-98), using a 15% threshold. Maximum association was at the 20% threshold, with 61/62 sequences associated, and a median bootstrap value of 99% (IQR 98-100). A safe method is presented to generate consensus sequences from HIV-NGS data at a 20% threshold, which will prove useful for molecular epidemiological studies. Copyright © 2016 Elsevier España, S.L.U. and Sociedad Española de Enfermedades Infecciosas y Microbiología Clínica. All rights reserved.
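One plausible reading of threshold-based consensus building, as used in the abstract above, is that any base whose frequency at a position reaches the threshold is kept, and positions with more than one kept base are written as an IUPAC ambiguity code. The sketch below follows that reading; it is not the Mesquite procedure used in the study, it assumes reads that are already aligned to equal length, and the function name `consensus` is hypothetical.

```python
from collections import Counter

# IUPAC ambiguity codes keyed by the set of bases they represent.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(aligned_reads, threshold):
    """Build a consensus keeping every base with frequency >= threshold."""
    length = len(aligned_reads[0])
    out = []
    for pos in range(length):
        # Ignore gaps and any non-ACGT symbols at this column.
        column = [read[pos] for read in aligned_reads if read[pos] in "ACGT"]
        if not column:
            out.append("N")
            continue
        counts = Counter(column)
        kept = frozenset(
            base for base, n in counts.items() if n / len(column) >= threshold
        )
        if not kept:  # threshold above every frequency: fall back to majority
            kept = frozenset(counts.most_common(1)[0][0])
        out.append(IUPAC[kept])
    return "".join(out)
```

Under this reading a lower threshold admits more minority variants and therefore more ambiguity codes, which is consistent with the abstract's finding that the 20% consensus tracked the Sanger sequences most closely.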
Improved annotation with de novo transcriptome assembly in four social amoeba species.

PubMed

Singh, Reema; Lawal, Hajara M; Schilde, Christina; Glöckner, Gernot; Barton, Geoffrey J; Schaap, Pauline; Cole, Christian

2017-01-31

Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models including changes to hundreds or thousands of protein products. Putative novel genes were also identified and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. In taking a whole transcriptome approach to genome annotation with empirical data we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy of follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects.
Emerging Concepts of Data Integration in Pathogen Phylodynamics.

PubMed

Baele, Guy; Suchard, Marc A; Rambaut, Andrew; Lemey, Philippe

2017-01-01

Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. 
[Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.]
Big Data Application in Biomedical Research and Health Care: A Literature Review.

PubMed

Luo, Jake; Wu, Min; Gopukumar, Deepika; Zhao, Yiqing

2016-01-01

Big data technologies are increasingly used for biomedical and health-care informatics research. Large amounts of biological and clinical data have been generated and collected at an unprecedented speed and scale. For example, the new generation of sequencing technologies enables the processing of billions of DNA sequence data per day, and the application of electronic health records (EHRs) is documenting large amounts of patient data. The cost of acquiring and analyzing biomedical data is expected to decrease dramatically with the help of technology upgrades, such as the emergence of new sequencing machines, the development of novel hardware and software for parallel computing, and the extensive expansion of EHRs. Big data applications present new opportunities to discover new knowledge and create novel methods to improve the quality of health care. The application of big data in health care is a fast-growing field, with many new discoveries and methodologies published in the last five years. In this paper, we review and discuss big data application in four major biomedical subdisciplines: (1) bioinformatics, (2) clinical informatics, (3) imaging informatics, and (4) public health informatics. Specifically, in bioinformatics, high-throughput experiments facilitate the research of new genome-wide association studies of diseases, and with clinical informatics, the clinical field benefits from the vast amount of collected patient data for making intelligent decisions. Imaging informatics is now more rapidly integrated with cloud platforms to share medical image data and workflows, and public health informatics leverages big data techniques for predicting and monitoring infectious disease outbreaks, such as Ebola. In this paper, we review the recent progress and breakthroughs of big data applications in these health-care domains and summarize the challenges, gaps, and opportunities to improve and advance big data applications in health care.
Emerging Concepts of Data Integration in Pathogen Phylodynamics

PubMed Central

Baele, Guy; Suchard, Marc A.; Rambaut, Andrew; Lemey, Philippe

2017-01-01

Phylodynamics has become an increasingly popular statistical framework to extract evolutionary and epidemiological information from pathogen genomes. By harnessing such information, epidemiologists aim to shed light on the spatio-temporal patterns of spread and to test hypotheses about the underlying interaction of evolutionary and ecological dynamics in pathogen populations. Although the field has witnessed a rich development of statistical inference tools with increasing levels of sophistication, these tools initially focused on sequences as their sole primary data source. Integrating various sources of information, however, promises to deliver more precise insights in infectious diseases and to increase opportunities for statistical hypothesis testing. Here, we review how the emerging concept of data integration is stimulating new advances in Bayesian evolutionary inference methodology which formalize a marriage of statistical thinking and evolutionary biology. These approaches include connecting sequence to trait evolution, such as for host, phenotypic and geographic sampling information, but also the incorporation of covariates of evolutionary and epidemic processes in the reconstruction procedures. We highlight how a full Bayesian approach to covariate modeling and testing can generate further insights into sequence evolution, trait evolution, and population dynamics in pathogen populations. Specific examples demonstrate how such approaches can be used to test the impact of host on rabies and HIV evolutionary rates, to identify the drivers of influenza dispersal as well as the determinants of rabies cross-species transmissions, and to quantify the evolutionary dynamics of influenza antigenicity. Finally, we briefly discuss how data integration is now also permeating through the inference of transmission dynamics, leading to novel insights into tree-generative processes and detailed reconstructions of transmission trees. 
[Bayesian inference; birth–death models; coalescent models; continuous trait evolution; covariates; data integration; discrete trait evolution; pathogen phylodynamics.] PMID:28173504
Big Data Application in Biomedical Research and Health Care: A Literature Review

PubMed Central

Luo, Jake; Wu, Min; Gopukumar, Deepika; Zhao, Yiqing

2016-01-01

Big data technologies are increasingly used for biomedical and health-care informatics research. Large amounts of biological and clinical data have been generated and collected at an unprecedented speed and scale. For example, the new generation of sequencing technologies enables the processing of billions of DNA sequence data per day, and the application of electronic health records (EHRs) is documenting large amounts of patient data. The cost of acquiring and analyzing biomedical data is expected to decrease dramatically with the help of technology upgrades, such as the emergence of new sequencing machines, the development of novel hardware and software for parallel computing, and the extensive expansion of EHRs. Big data applications present new opportunities to discover new knowledge and create novel methods to improve the quality of health care. The application of big data in health care is a fast-growing field, with many new discoveries and methodologies published in the last five years. In this paper, we review and discuss big data application in four major biomedical subdisciplines: (1) bioinformatics, (2) clinical informatics, (3) imaging informatics, and (4) public health informatics. Specifically, in bioinformatics, high-throughput experiments facilitate the research of new genome-wide association studies of diseases, and with clinical informatics, the clinical field benefits from the vast amount of collected patient data for making intelligent decisions. Imaging informatics is now more rapidly integrated with cloud platforms to share medical image data and workflows, and public health informatics leverages big data techniques for predicting and monitoring infectious disease outbreaks, such as Ebola. In this paper, we review the recent progress and breakthroughs of big data applications in these health-care domains and summarize the challenges, gaps, and opportunities to improve and advance big data applications in health care. PMID:26843812
A Methodology for Calculating EGS Electricity Generation Potential Based on the Gringarten Model for Heat Extraction From Fractured Rock

DOE Office of Scientific and Technical Information (OSTI.GOV)

Augustine, Chad

Existing methodologies for estimating the electricity generation potential of Enhanced Geothermal Systems (EGS) assume thermal recovery factors of 5% or less, resulting in relatively low volumetric electricity generation potentials for EGS reservoirs. This study proposes and develops a methodology for calculating EGS electricity generation potential based on the Gringarten conceptual model and analytical solution for heat extraction from fractured rock. The electricity generation potential of a cubic kilometer of rock as a function of temperature is calculated assuming limits on the allowed produced water temperature decline and reservoir lifetime based on surface power plant constraints. The resulting estimates of EGS electricity generation potential can be one to nearly two orders of magnitude larger than those from existing methodologies. The flow per unit fracture surface area from the Gringarten solution is found to be a key term in describing the conceptual reservoir behavior. The methodology can be applied to aid in the design of EGS reservoirs by giving minimum reservoir volume, fracture spacing, number of fractures, and flow requirements for a target reservoir power output. Limitations of the idealized model compared to actual reservoir performance and the implications on reservoir design are discussed.
Nonmedical prescription opioids and pathways of drug involvement in the US: Generational differences.

PubMed

Wall, Melanie; Cheslack-Postava, Keely; Hu, Mei-Chen; Feng, Tianshu; Griesler, Pamela; Kandel, Denise B

2018-01-01

This study sought to specify (1) the position of nonmedical prescription opioids (NMPO) in drug initiation sequences among Millennials (1979-96), Generation X (1964-79), and Baby Boomers (1949-64) and (2) gender and racial/ethnic differences in sequences among Millennials. Data are from the 2013-2014 National Surveys on Drug Use and Health (n = 73,026). We identified statistically significant drug initiation sequences involving alcohol/cigarettes, marijuana, NMPO, cocaine, and heroin using a novel method distinguishing significant sequences from patterns expected only due to correlations induced by common liability among drugs. Alcohol/cigarettes followed by marijuana was the most common sequence. NMPO or cocaine use after marijuana, and heroin use after NMPO or cocaine, differed by generation. Among successively younger generations, NMPO after marijuana and heroin after NMPO increased. Millennials were more likely to initiate NMPO than cocaine after marijuana; Generation X and Baby Boomers were less likely (odds ratios = 1.4;0.3;0.2). Millennials were more likely than Generation X and Baby Boomers to use heroin after NMPO (hazards ratios = 7.1;3.4;2.5). In each generation, heroin users were far more likely to start heroin after both NMPO and cocaine than either alone. Sequences were similar by gender. Fewer paths were significant among African-Americans. NMPOs play a more prominent role in drug initiation sequences among Millennials than prior generations. Among Millennials, NMPO use is more likely than cocaine to follow marijuana use. In all generations, transition to heroin from NMPO significantly occurs only when both NMPO and cocaine have been used. Delineation of drug sequences suggests optimal points in development for prevention and treatment efforts. Copyright © 2017 Elsevier B.V. All rights reserved.
Sequence information gain based motif analysis.

PubMed

Maynou, Joan; Pairó, Erola; Marco, Santiago; Perera, Alexandre

2015-11-09

The detection of regulatory regions in candidate sequences is essential for the understanding of the regulation of a particular gene and the mechanisms involved. This paper proposes a novel methodology based on information theoretic metrics for finding regulatory sequences in promoter regions. This methodology (SIGMA) has been tested on genomic sequence data for Homo sapiens and Mus musculus. SIGMA has been compared with different publicly available alternatives for motif detection, such as MEME/MAST, Biostrings (Bioconductor package), MotifRegressor, and previous work such as Qresiduals projections or information theoretic based detectors. Comparative results, in the form of Receiver Operating Characteristic curves, show that, in 70% of the studied Transcription Factor Binding Sites, the SIGMA detector performs better and behaves more robustly than the methods compared, while having a similar computational time. The performance of SIGMA can be explained by its parametric simplicity in the modelling of the non-linear co-variability in the binding motif positions. Sequence Information Gain based Motif Analysis is a generalisation of a non-linear model of the cis-regulatory sequences detection based on Information Theory. This generalisation allows us to detect transcription factor binding sites with maximum performance disregarding the covariability observed in the positions of the training set of sequences. SIGMA is freely available to the public at http://b2slab.upc.edu.
Novel Method for High-Throughput Full-Length IGHV-D-J Sequencing of the Immune Repertoire from Bulk B-Cells with Single-Cell Resolution.

PubMed

Vergani, Stefano; Korsunsky, Ilya; Mazzarello, Andrea Nicola; Ferrer, Gerardo; Chiorazzi, Nicholas; Bagnara, Davide

2017-01-01

Efficient and accurate high-throughput DNA sequencing of the adaptive immune receptor repertoire (AIRR) is necessary to study immune diversity in healthy subjects and disease-related conditions. The high complexity and diversity of the AIRR, coupled with the limited amount of starting material, which can compromise identification of the full biological diversity, make such sequencing particularly challenging. AIRR sequencing protocols often fail to fully capture the sampled AIRR diversity, especially for samples containing restricted numbers of B lymphocytes. Here, we describe a library preparation method for immunoglobulin sequencing that results in an exhaustive full-length repertoire where virtually every sampled B-cell is sequenced. This maximizes the likelihood of identifying and quantifying the entire IGHV-D-J repertoire of a sample, including the detection of rearrangements present in only one cell in the starting population. The methodology establishes the importance of circumventing genetic material dilution in the preamplification phases and incorporates the use of certain described concepts: (1) balancing the starting material amount and depth of sequencing, (2) avoiding IGHV gene-specific amplification, and (3) using Unique Molecular Identifiers. Together, this methodology is highly efficient, in particular for detecting rare rearrangements in the sampled population and when only a limited amount of starting material is available.
Concatenated shift registers generating maximally spaced phase shifts of PN-sequences

NASA Technical Reports Server (NTRS)

Hurd, W. J.; Welch, L. R.

1977-01-01

A large class of linearly concatenated shift registers is shown to generate approximately maximally spaced phase shifts of pn-sequences, for use in pseudorandom number generation. A constructive method is presented for finding members of this class, for almost all degrees for which primitive trinomials exist. The sequences which result are not normally characterized by trinomial recursions, which is desirable since trinomial sequences can have some undesirable randomness properties.
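As background for the terms used in this abstract, the following is a minimal sketch of a PN-sequence and a phase shift of it. It assumes a plain degree-5 shift register driven by the primitive trinomial x^5 + x^2 + 1; it does not implement the concatenated-register construction of Hurd and Welch, and the function names are hypothetical.

```python
def lfsr_bits(state, n):
    """Return n output bits of the recurrence s[k+5] = s[k+2] XOR s[k].

    The recurrence corresponds to the primitive trinomial x^5 + x^2 + 1,
    chosen here only for illustration.
    """
    state = list(state)
    out = []
    for _ in range(n):
        out.append(state[0])
        # Shift left by one and append the feedback bit.
        state = state[1:] + [state[0] ^ state[2]]
    return out


def phase_shifted(state, shift, n):
    """Return n bits of the same sequence, delayed by `shift` steps."""
    return lfsr_bits(state, shift + n)[shift:]


PERIOD = 2**5 - 1          # 31 for a degree-5 primitive polynomial
seed = [1, 0, 0, 0, 0]     # any non-zero seed works
base = lfsr_bits(seed, PERIOD)
print("".join(map(str, base)))
print("".join(map(str, phase_shifted(seed, 10, PERIOD))))
```

A phase shift is simply the same periodic sequence read from a different starting point; spacing such starting points far apart is what makes parallel streams usable for pseudorandom number generation.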
Stimulus novelty, task relevance and the visual evoked potential in man

NASA Technical Reports Server (NTRS)

Courchesne, E.; Hillyard, S. A.; Galambos, R.

1975-01-01

The effect of task relevance on P3 (waveform of human evoked potential) waves and the methodologies used to deal with them are outlined. Visual evoked potentials (VEPs) were recorded from normal adult subjects performing in a visual discrimination task. Subjects counted the number of presentations of the numeral 4 which was interposed rarely and randomly within a sequence of tachistoscopically flashed background stimuli. Intrusive, task-irrelevant (not counted) stimuli were also interspersed rarely and randomly in the sequence of 2s; these stimuli were of two types: simples, which were easily recognizable, and novels, which were completely unrecognizable. It was found that the simples and the counted 4s evoked posteriorly distributed P3 waves while the irrelevant novels evoked large, frontally distributed P3 waves. These large, frontal P3 waves to novels were also found to be preceded by large N2 waves. These findings indicate that the P3 wave is not a unitary phenomenon but should be considered in terms of a family of waves, differing in their brain generators and in their psychological correlates.

Neural Sequence Generation Using Spatiotemporal Patterns of Inhibition.

PubMed

Cannon, Jonathan; Kopell, Nancy; Gardner, Timothy; Markowitz, Jeffrey

2015-11-01

Stereotyped sequences of neural activity are thought to underlie reproducible behaviors and cognitive processes ranging from memory recall to arm movement. One of the most prominent theoretical models of neural sequence generation is the synfire chain, in which pulses of synchronized spiking activity propagate robustly along a chain of cells connected by highly redundant feedforward excitation. But recent experimental observations in the avian song production pathway during song generation have shown excitatory activity interacting strongly with the firing patterns of inhibitory neurons, suggesting a process of sequence generation more complex than feedforward excitation. Here we propose a model of sequence generation inspired by these observations in which a pulse travels along a spatially recurrent excitatory chain, passing repeatedly through zones of local feedback inhibition. In this model, synchrony and robust timing are maintained not through redundant excitatory connections, but rather through the interaction between the pulse and the spatiotemporal pattern of inhibition that it creates as it circulates the network. These results suggest that spatially and temporally structured inhibition may play a key role in sequence generation.
WebLogo: A Sequence Logo Generator

PubMed Central

Crooks, Gavin E.; Hon, Gary; Chandonia, John-Marc; Brenner, Steven E.

2004-01-01

WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization. PMID:15173120
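The stack-height rule described above can be sketched as follows. This is a minimal illustration assuming a DNA alphabet of four symbols and no small-sample correction; `stack_heights` is a hypothetical helper and is not part of the WebLogo code base.

```python
from collections import Counter
from math import log2


def stack_heights(column, alphabet_size=4):
    """Return (total stack height in bits, per-symbol heights) for one column.

    Total height = log2(alphabet_size) - Shannon entropy of the column.
    Each symbol's height is its relative frequency times the total height.
    """
    counts = Counter(column)
    n = len(column)
    freqs = {symbol: count / n for symbol, count in counts.items()}
    entropy = -sum(p * log2(p) for p in freqs.values())
    info = log2(alphabet_size) - entropy
    return info, {symbol: p * info for symbol, p in freqs.items()}


# Toy alignment (assumed data): one stack per alignment column.
alignment = ["ACGT", "ACGA", "ACTA", "AGTA"]
for position, col in enumerate(zip(*alignment), start=1):
    info, heights = stack_heights("".join(col))
    print(position, round(info, 3), heights)
```

A fully conserved column reaches the 2-bit maximum for DNA, while a column with all four bases equally frequent has zero height.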
Pseudorandom number generation using chaotic true orbits of the Bernoulli map

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saito, Asaki, E-mail: saito@fun.ac.jp; Yamaguchi, Akihiro

We devise a pseudorandom number generator that exactly computes chaotic true orbits of the Bernoulli map on quadratic algebraic integers. Moreover, we describe a way to select the initial points (seeds) for generating multiple pseudorandom binary sequences. This selection method distributes the initial points almost uniformly (equidistantly) in the unit interval, and latter parts of the generated sequences are guaranteed not to coincide. We also demonstrate through statistical testing that the generated sequences possess good randomness properties.
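To illustrate why exact computation matters here, the sketch below iterates the Bernoulli map x -> 2x mod 1 in two ways. It assumes the starting point x = a + b*sqrt(d) with integers a, b > 0 and non-square d, stored as an integer pair; this representation and the function names are illustrative assumptions, not the authors' generator or seed-selection method.

```python
from math import isqrt


def exact_bits(a, b, d, n):
    """Return n bits of the true orbit of x = a + b*sqrt(d), with 0 <= x < 1.

    Each step doubles x and emits floor(2x) as the output bit. Because
    b*sqrt(d) is irrational, floor(b*sqrt(d)) equals isqrt(b*b*d) exactly.
    """
    bits = []
    for _ in range(n):
        a, b = 2 * a, 2 * b
        bit = a + isqrt(b * b * d)   # floor(2x), either 0 or 1
        bits.append(bit)
        a -= bit                     # reduce modulo 1
    return bits


def float_orbit(x, n):
    """Apply the same map n times in double-precision arithmetic."""
    for _ in range(n):
        x = (2.0 * x) % 1.0
    return x


print(exact_bits(-1, 1, 2, 16))          # bits of sqrt(2) - 1
print(float_orbit(2**0.5 - 1, 64))       # the floating-point orbit collapses to 0.0
```

In floating point every doubling shifts one mantissa bit out, so the orbit reaches zero after a few dozen steps; the exact integer representation never does, at the cost of integers that grow by one bit per step in this naive form.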
Application of discrete Fourier inter-coefficient difference for assessing genetic sequence similarity.

PubMed

King, Brian R; Aburdene, Maurice; Thompson, Alex; Warres, Zach

2014-01-01

Digital signal processing (DSP) techniques for biological sequence analysis continue to grow in popularity due to the inherent digital nature of these sequences. DSP methods have demonstrated early success for detection of coding regions in a gene. Recently, these methods are being used to establish DNA gene similarity. We present the inter-coefficient difference (ICD) transformation, a novel extension of the discrete Fourier transformation, which can be applied to any DNA sequence. The ICD method is a mathematical, alignment-free DNA comparison method that generates a genetic signature for any DNA sequence that is used to generate relative measures of similarity among DNA sequences. We demonstrate our method on a set of insulin genes obtained from an evolutionarily wide range of species, and on a set of avian influenza viral sequences, which represents a set of highly similar sequences. We compare phylogenetic trees generated using our technique against trees generated using traditional alignment techniques for similarity and demonstrate that the ICD method produces a highly accurate tree without requiring an alignment prior to establishing sequence similarity.
Proceedings of the Conference on Moments and Signal Proceedings Held in Monterey, California on 30-31 March 1992

DTIC Science & Technology

1992-09-21

describe systematic methodologies for selecting nonlinear transformations for blind equalization algorithms, and thus new types of cumulants...nonlinearity is inside the adaptive filter, i.e., the nonlinear filter or neural network. We describe methodologies for selecting nonlinear...which do not require any known training sequence during the startup period. The paper describes systematic methodologies for selecting the
A methodology for eliciting, representing, and analysing stakeholder knowledge for decision making on complex socio-ecological systems: from cognitive maps to agent-based models.

PubMed

Elsawah, Sondoss; Guillaume, Joseph H A; Filatova, Tatiana; Rook, Josefine; Jakeman, Anthony J

2015-03-15

This paper aims to contribute to developing better ways for incorporating essential human elements in decision making processes for modelling of complex socio-ecological systems. It presents a step-wise methodology for integrating perceptions of stakeholders (qualitative) into formal simulation models (quantitative) with the ultimate goal of improving understanding and communication about decision making in complex socio-ecological systems. The methodology integrates cognitive mapping and agent based modelling. It cascades through a sequence of qualitative/soft and numerical methods comprising: (1) Interviews to elicit mental models; (2) Cognitive maps to represent and analyse individual and group mental models; (3) Time-sequence diagrams to chronologically structure the decision making process; (4) All-encompassing conceptual model of decision making, and (5) computational (in this case agent-based) Model. We apply the proposed methodology (labelled ICTAM) in a case study of viticulture irrigation in South Australia. Finally, we use strengths-weakness-opportunities-threats (SWOT) analysis to reflect on the methodology. Results show that the methodology leverages the use of cognitive mapping to capture the richness of decision making and mental models, and provides a combination of divergent and convergent analysis methods leading to the construction of an Agent Based Model. Copyright © 2014 Elsevier Ltd. All rights reserved.
FOUNTAIN: A JAVA open-source package to assist large sequencing projects

PubMed Central

Buerstedde, Jean-Marie; Prill, Florian

2001-01-01

Background: Better automation, lower cost per reaction and a heightened interest in comparative genomics have led to a dramatic increase in DNA sequencing activities. Although the large sequencing projects of specialized centers are supported by in-house bioinformatics groups, many smaller laboratories face difficulties managing the appropriate processing and storage of their sequencing output. The challenges include documentation of clones, templates and sequencing reactions, and the storage, annotation and analysis of the large number of generated sequences. Results: We describe here a new program, named FOUNTAIN, for the management of large sequencing projects. FOUNTAIN uses the JAVA computer language and data storage in a relational database. Starting with a collection of sequencing objects (clones), the program generates and stores information related to the different stages of the sequencing project using a web browser interface for user input. The generated sequences are subsequently imported and annotated based on BLAST searches against the public databases. In addition, simple algorithms to cluster sequences and determine putative polymorphic positions are implemented. Conclusions: A simple, but flexible and scalable software package is presented to facilitate data generation and storage for large sequencing projects. FOUNTAIN is open source and largely platform and database independent, and we hope it will be improved and extended in a community effort. PMID:11591214
A Simple and Efficient Methodology To Improve Geometric Accuracy in Gamma Knife Radiation Surgery: Implementation in Multiple Brain Metastases

DOE Office of Scientific and Technical Information (OSTI.GOV)

Karaiskos, Pantelis, E-mail: pkaraisk@med.uoa.gr; Gamma Knife Department, Hygeia Hospital, Athens; Moutsatsos, Argyris

Purpose: To propose, verify, and implement a simple and efficient methodology for the improvement of total geometric accuracy in multiple brain metastases gamma knife (GK) radiation surgery. Methods and Materials: The proposed methodology exploits the directional dependence of magnetic resonance imaging (MRI)-related spatial distortions stemming from background field inhomogeneities, also known as sequence-dependent distortions, with respect to the read-gradient polarity during MRI acquisition. First, an extra MRI pulse sequence is acquired with the same imaging parameters as those used for routine patient imaging, aside from a reversal in the read-gradient polarity. Then, “average” image data are compounded from data acquired from the 2 MRI sequences and are used for treatment planning purposes. The method was applied and verified in a polymer gel phantom irradiated with multiple shots in an extended region of the GK stereotactic space. Its clinical impact in dose delivery accuracy was assessed in 15 patients with a total of 96 relatively small (<2 cm) metastases treated with GK radiation surgery. Results: Phantom study results showed that use of average MR images eliminates the effect of sequence-dependent distortions, leading to a total spatial uncertainty of less than 0.3 mm, attributed mainly to gradient nonlinearities. In brain metastases patients, non-eliminated sequence-dependent distortions lead to target localization uncertainties of up to 1.3 mm (mean: 0.51 ± 0.37 mm) with respect to the corresponding target locations in the “average” MRI series. Due to these uncertainties, a considerable underdosage (5%-32% of the prescription dose) was found in 33% of the studied targets. Conclusions: The proposed methodology is simple and straightforward in its implementation. Regarding multiple brain metastases applications, the suggested approach may substantially improve total GK dose delivery accuracy in smaller, outlying targets.
Sequence Data for Clostridium autoethanogenum using Three Generations of Sequencing Technologies

DOE PAGES

Utturkar, Sagar M.; Klingeman, Dawn Marie; Bruno-Barcena, José M.; ...

2015-04-14

During the past decade, DNA sequencing output has been mostly dominated by second generation sequencing platforms, such as Illumina, which are characterized by low cost, high throughput and shorter read lengths. The emergence and development of so-called third generation sequencing platforms such as PacBio has permitted exceptionally long reads (over 20 kb) to be generated. Due to read length increases, algorithm improvements and hybrid assembly approaches, the concept of one chromosome, one contig and automated finishing of microbial genomes is now a realistic and achievable task for many microbial laboratories. In this paper, we describe high quality sequence datasets which span three generations of sequencing technologies, containing six types of data from four NGS platforms and originating from a single microorganism, Clostridium autoethanogenum. The dataset reported here will be useful for the scientific community to evaluate upcoming NGS platforms, enabling comparison of existing and novel bioinformatics approaches and will encourage interest in the development of innovative experimental and computational methods for NGS data.
Spherical: an iterative workflow for assembling metagenomic datasets.

PubMed

Hitch, Thomas C A; Creevey, Christopher J

2018-01-24

The consensus emerging from the study of microbiomes is that they are far more complex than previously thought, requiring better assemblies and increasingly deeper sequencing. However, current metagenomic assembly techniques regularly fail to incorporate all, or even the majority in some cases, of the sequence information generated for many microbiomes, negating this effort. This can especially bias the information gathered and the perceived importance of the minor taxa in a microbiome. We propose a simple but effective approach, implemented in Python, to address this problem. Based on an iterative methodology, our workflow (called Spherical) carries out successive rounds of assemblies with the sequencing reads not yet utilised. This approach also allows the user to reduce the resources required for very large datasets, by assembling random subsets of the whole in a "divide and conquer" manner. We demonstrate the accuracy of Spherical using simulated data based on completely sequenced genomes and the effectiveness of the workflow at retrieving lost information for taxa in three published metagenomics studies of varying sizes. Our results show that Spherical increased the number of reads utilized in the assembly by up to 109% compared to the base assembly. The additional contigs assembled by the Spherical workflow resulted in significant (P < 0.05) changes in the predicted taxonomic profile of all datasets analysed. Spherical is implemented in Python 2.7 and freely available for use under the MIT license. Source code and documentation are hosted publicly at: https://github.com/thh32/Spherical.
De Novo Assembly, Characterization and Functional Annotation of Pineapple Fruit Transcriptome through Massively Parallel Sequencing

PubMed Central

Ong, Wen Dee; Voo, Lok-Yung Christopher; Kumar, Vijay Subbiah

2012-01-01

Background: Pineapple (Ananas comosus var. comosus) is an important tropical non-climacteric fruit with high commercial potential. Understanding the mechanism and processes underlying fruit ripening would enable scientists to enhance the improvement of quality traits such as flavor, texture, appearance and fruit sweetness. Although the pineapple is an important fruit, there is insufficient transcriptomic or genomic information available in public databases. Application of high throughput transcriptome sequencing to profile the pineapple fruit transcripts is therefore needed. Methodology/Principal Findings: To facilitate this, we have performed transcriptome sequencing of ripe yellow pineapple fruit flesh using Illumina technology. About 4.7 million Illumina paired-end reads were generated and assembled using the Velvet de novo assembler. The assembly produced 28,728 unique transcripts with a mean length of approximately 200 bp. Sequence similarity search against non-redundant NCBI database identified a total of 16,932 unique transcripts (58.93%) with significant hits. Out of these, 15,507 unique transcripts were assigned to gene ontology terms. Functional annotation against Kyoto Encyclopedia of Genes and Genomes pathway database identified 13,598 unique transcripts (47.33%) which were mapped to 126 pathways. The assembly revealed many transcripts that were previously unknown. Conclusions: The unique transcripts derived from this work have rapidly increased the number of pineapple fruit mRNA transcripts now available in public databases. This information can be further utilized in gene expression, genomics and other functional genomics studies in pineapple. PMID:23091603
Sequencing CYP2D6 for the detection of poor-metabolizers in post-mortem blood samples with tramadol.

PubMed

Fonseca, Suzana; Amorim, António; Costa, Heloísa Afonso; Franco, João; Porto, Maria João; Santos, Jorge Costa; Dias, Mário

2016-08-01

Tramadol concentrations and analgesic effect are dependent on the CYP2D6 enzymatic activity. It is well known that some genetic polymorphisms are responsible for the variability in the expression of this enzyme and in the individual drug response. The detection of allelic variants described as non-functional can be useful to explain some circumstances of death in the study of post-mortem cases with tramadol. A Sanger sequencing methodology was developed for the detection of genetic variants that cause absent or reduced CYP2D6 activity, such as *3, *4, *6, *8, *10 and *12 alleles. This methodology, as well as the GC/MS method for the detection and quantification of tramadol and its main metabolites in blood samples, was fully validated in accordance with international guidelines. Both methodologies were successfully applied to 100 post-mortem blood samples and the relation between toxicological and genetic results was evaluated. Tramadol metabolism, expressed as its metabolites concentration ratio (N-desmethyltramadol/O-desmethyltramadol), has been shown to be correlated with the poor-metabolizer phenotype based on genetic characterization. The importance of identifying enzyme inhibitors in toxicological analysis was also demonstrated. To our knowledge, this is the first study in Portugal in which a CYP2D6 sequencing methodology is validated and applied to post-mortem samples. The developed methodology allows the data collection of post-mortem cases, which is of primordial importance to enhance the application of these genetic tools to forensic toxicology and pathology. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Information capacity of nucleotide sequences and its applications.

PubMed

Sadovsky, M G

2006-05-01

The information capacity of nucleotide sequences is defined through the specific entropy of frequency dictionary of a sequence determined with respect to another one containing the most probable continuations of shorter strings. This measure distinguishes a sequence both from a random one, and from ordered entity. A comparison of sequences based on their information capacity is studied. An order within the genetic entities is found at the length scale ranged from 3 to 8. Some other applications of the developed methodology to genetics, bioinformatics, and molecular biology are discussed.
NullSeq: A Tool for Generating Random Coding Sequences with Desired Amino Acid and GC Contents.

PubMed

Liu, Sophia S; Hockenberry, Adam J; Lancichinetti, Andrea; Jewett, Michael C; Amaral, Luís A N

2016-11-01

The existence of over- and under-represented sequence motifs in genomes provides evidence of selective evolutionary pressures on biological mechanisms such as transcription, translation, ligand-substrate binding, and host immunity. In order to accurately identify motifs and other genome-scale patterns of interest, it is essential to be able to generate accurate null models that are appropriate for the sequences under study. While many tools have been developed to create random nucleotide sequences, protein coding sequences are subject to a unique set of constraints that complicates the process of generating appropriate null models. There are currently no tools available that allow users to create random coding sequences with specified amino acid composition and GC content for the purpose of hypothesis testing. Using the principle of maximum entropy, we developed a method that generates unbiased random sequences with pre-specified amino acid and GC content, which we have developed into a python package. Our method is the simplest way to obtain maximally unbiased random sequences that are subject to GC usage and primary amino acid sequence constraints. Furthermore, this approach can easily be expanded to create unbiased random sequences that incorporate more complicated constraints such as individual nucleotide usage or even di-nucleotide frequencies. The ability to generate correctly specified null models will allow researchers to accurately identify sequence motifs which will lead to a better understanding of biological processes as well as more effective engineering of biological systems.
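The maximum-entropy idea in this abstract can be illustrated with a minimal sketch. It is not the NullSeq package or its API: every name below (`codon_probs`, `solve_lambda`, `random_coding_sequence`) is hypothetical, the standard genetic code is assumed, and the GC target is matched only in expectation. For a fixed protein, each synonymous codon is drawn with probability proportional to exp(lambda * GC count), and lambda is found by bisection.

```python
import itertools
import math
import random

# Standard genetic code in TCAG order (assumption: no alternative code tables).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a
               for c, a in zip(itertools.product(BASES, repeat=3), AMINO)}
SYNONYMS = {}
for codon, aa in CODON_TABLE.items():
    SYNONYMS.setdefault(aa, []).append(codon)

def gc(codon):
    """Number of G or C bases in a codon."""
    return sum(base in "GC" for base in codon)

def codon_probs(aa, lam):
    """Maximum-entropy codon distribution for one amino acid at tilt lam."""
    codons = SYNONYMS[aa]
    weights = [math.exp(lam * gc(c)) for c in codons]
    total = sum(weights)
    return codons, [w / total for w in weights]

def expected_gc(protein, lam):
    """Expected GC fraction of a coding sequence for this protein."""
    total = 0.0
    for aa in protein:
        codons, probs = codon_probs(aa, lam)
        total += sum(p * gc(c) for c, p in zip(codons, probs))
    return total / (3 * len(protein))

def solve_lambda(protein, target_gc, lo=-20.0, hi=20.0, iters=100):
    """Bisection on lam; expected GC increases monotonically with lam."""
    for _ in range(iters):
        mid = (lo + hi) / 2
        if expected_gc(protein, mid) < target_gc:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def random_coding_sequence(protein, target_gc, seed=None):
    """Draw one random coding sequence that encodes `protein`."""
    rng = random.Random(seed)
    lam = solve_lambda(protein, target_gc)
    picks = []
    for aa in protein:
        codons, probs = codon_probs(aa, lam)
        picks.append(rng.choices(codons, weights=probs)[0])
    return "".join(picks)
```

A call such as `random_coding_sequence("MKLV", 0.4, seed=0)` returns a 12-nucleotide sequence that translates back to MKLV; the target must lie within the GC range the protein's codons can actually reach.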
Simulations Using Random-Generated DNA and RNA Sequences

ERIC Educational Resources Information Center

Bryce, C. F. A.

1977-01-01

Using a very simple computer program written in BASIC, a very large number of random-generated DNA or RNA sequences are obtained. Students use these sequences to predict complementary sequences and translational products, evaluate base compositions, determine frequencies of particular triplet codons, and suggest possible secondary structures.…
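The classroom exercises listed in this abstract can be sketched in a few lines. The original program was written in BASIC and is not reproduced here; the Python function names below are illustrative assumptions only.

```python
import random
from collections import Counter

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def random_dna(length, seed=None):
    """Random DNA string with each base equally likely."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(length))

def reverse_complement(seq):
    """Complementary strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def base_composition(seq):
    """Fraction of each base in the sequence."""
    counts = Counter(seq)
    return {base: counts.get(base, 0) / len(seq) for base in "ACGT"}

def codon_counts(seq):
    """Counts of complete triplets in reading frame 0."""
    return Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
```

Students could, for instance, compare `base_composition(random_dna(300))` across several seeds to see how far small samples drift from the expected 25% per base.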
Novel methods to optimize genotypic imputation for low-coverage, next-generation sequence data in crop plants

USDA-ARS?s Scientific Manuscript database

Next-generation sequencing technology such as genotyping-by-sequencing (GBS) made low-cost, but often low-coverage, whole-genome sequencing widely available. Extensive inbreeding in crop plants provides an untapped, high quality source of phased haplotypes for imputing missing genotypes. We introduc...
Applications and Case Studies of the Next-Generation Sequencing Technologies in Food, Nutrition and Agriculture.

USDA-ARS?s Scientific Manuscript database

Next-generation sequencing technologies are able to produce high-throughput short sequence reads in a cost-effective fashion. The emergence of these technologies has not only facilitated genome sequencing but also changed the landscape of life sciences. Here I survey their major applications ranging...
Next generation sequencers: methods and applications in food-borne pathogens

USDA-ARS?s Scientific Manuscript database

Next generation sequencers are able to produce millions of short sequence reads in a high-throughput, low-cost way. The emergence of these technologies has not only facilitated genome sequencing but also started to change the landscape of life sciences. This chapter will survey their methods and app...
Functional identification of spike-processing neural circuits.

PubMed

Lazar, Aurel A; Slutskiy, Yevgeniy B

2014-02-01

We introduce a novel approach for a complete functional identification of biophysical spike-processing neural circuits. The circuits considered accept multidimensional spike trains as their input and comprise a multitude of temporal receptive fields and conductance-based models of action potential generation. Each temporal receptive field describes the spatiotemporal contribution of all synapses between any two neurons and incorporates the (passive) processing carried out by the dendritic tree. The aggregate dendritic current produced by a multitude of temporal receptive fields is encoded into a sequence of action potentials by a spike generator modeled as a nonlinear dynamical system. Our approach builds on the observation that during any experiment, an entire neural circuit, including its receptive fields and biophysical spike generators, is projected onto the space of stimuli used to identify the circuit. Employing the reproducing kernel Hilbert space (RKHS) of trigonometric polynomials to describe input stimuli, we quantitatively describe the relationship between underlying circuit parameters and their projections. We also derive experimental conditions under which these projections converge to the true parameters. In doing so, we achieve the mathematical tractability needed to characterize the biophysical spike generator and identify the multitude of receptive fields. The algorithms obviate the need to repeat experiments in order to compute the neurons' rate of response, rendering our methodology of interest to both experimental and theoretical neuroscientists.
Comprehensive evaluation of AmpliSeq transcriptome, a novel targeted whole transcriptome RNA sequencing methodology for global gene expression analysis.

PubMed

Li, Wenli; Turner, Amy; Aggarwal, Praful; Matter, Andrea; Storvick, Erin; Arnett, Donna K; Broeckel, Ulrich

2015-12-16

Whole transcriptome sequencing (RNA-seq) represents a powerful approach for whole transcriptome gene expression analysis. However, RNA-seq carries a few limitations, e.g., the requirement of a significant amount of input RNA and complications caused by non-specific mapping of short reads. The Ion AmpliSeq Transcriptome Human Gene Expression Kit (AmpliSeq) was recently introduced by Life Technologies as a whole-transcriptome, targeted gene quantification kit to overcome these limitations of RNA-seq. To assess the performance of this new methodology, we performed a comprehensive comparison of AmpliSeq with RNA-seq using two well-established next-generation sequencing platforms (Illumina HiSeq and Ion Torrent Proton). We analyzed standard reference RNA samples and RNA samples obtained from human induced pluripotent stem cell derived cardiomyocytes (hiPSC-CMs). Using published data from two standard RNA reference samples, we observed a strong concordance of log2 fold change for all genes when comparing AmpliSeq to Illumina HiSeq (Pearson's r = 0.92) and Ion Torrent Proton (Pearson's r = 0.92). We used ROC, the Matthews correlation coefficient and RMSD to determine the overall performance characteristics. All three statistical methods demonstrate AmpliSeq as a highly accurate method for differential gene expression analysis. Additionally, for genes with high abundance, AmpliSeq outperforms the two RNA-seq methods. When analyzing four closely related hiPSC-CM lines, we show that both AmpliSeq and RNA-seq capture similar global gene expression patterns consistent with known sources of variation. Our study indicates that AmpliSeq excels in the limiting areas of RNA-seq for gene expression quantification analysis. Thus, AmpliSeq stands as a very sensitive and cost-effective approach for very large scale gene expression analysis and mRNA marker screening with high accuracy.
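Two of the summary statistics named in this abstract have simple closed forms, sketched below with their textbook definitions. How the study derived its confusion-matrix counts or paired fold-change vectors is not stated in the abstract, so the inputs here are assumptions.

```python
import math

def matthews_corrcoef(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a marginal total is zero
    return (tp * tn - fp * fn) / denom

def rmsd(x, y):
    """Root-mean-square deviation between two equal-length vectors,
    e.g. log2 fold changes reported by two platforms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)) / len(x))
```

The coefficient ranges from -1 (total disagreement) through 0 (no better than chance) to 1 (perfect agreement), which makes it less sensitive to class imbalance than raw accuracy.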

Recommended design and fabrication sequence of AMTEC test assembly

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schock, A.; Kumar, V.; Noravian, H.

1998-01-01

A series of previous OSC papers described: 1) a novel methodology for the coupled thermal, fluid flow, and electrical analysis of multitube AMTEC (Alkali Metal Thermal-to-Electric Conversion) cells; 2) the application of that methodology to determine the effect of numerous design variations on the cell's performance, leading to selection and performance characterization of an OSC-recommended cell design; and 3) the design, analysis, and characterization of an OSC-generated power system design combining sixteen of the above AMTEC cells with two or three GPHS (General Purpose Heat Source) radioisotope heat source modules, and the applicability of those power systems to future space missions (e.g. Pluto Express and Europa Orbiter) under consideration by NASA. The OSC system design studies demonstrated the critical importance of the thermal insulation subsystem, and culminated in a design in which the eight AMTEC cells on each end of the heat source stack are embedded in Min-K fibrous insulation, and the Min-K and the GPHS modules are surrounded by graded-length Mo multifoil insulation. The present paper depicts the OSC-recommended AMTEC cell and generator designs, and identifies the need for an electrically heated (scaled-down but otherwise prototypic) test assembly for the experimental validation of the generator's system performance predictions. It then describes the design of an OSC-recommended test assembly consisting of an electrical heater enclosed in a graphite box to simulate the radioisotope heat source, four series-connected prototypic AMTEC cells of the OSC-recommended configuration, and a prototypic hybrid insulation package consisting of Min-K and graded-length Mo multifoils. Finally, the paper describes and illustrates an OSC-recommended detailed fabrication sequence and procedure for the above cell and test assembly. That fabrication procedure is being implemented by AMPS, Inc. with the support of DOE's Oak Ridge and Mound Laboratories, and the Air Force Phillips Laboratory (AFPL) will test the performance of the assembly over a range of input thermal powers and output voltages. The experimentally measured performance will be compared with the results of OSC analyses of the same insulated test assembly over the same range of operating parameters. © 1998 American Institute of Physics.
Arbitrary digital pulse sequence generator with delay-loop timing

NASA Astrophysics Data System (ADS)

Hošák, Radim; Ježek, Miroslav

2018-04-01

We propose an electronic multi-channel arbitrary digital sequence generator with temporal granularity equal to two clock cycles. We implement the generator with 32 channels using a low-cost ARM microcontroller and demonstrate its capability to produce temporal delays ranging from tens of nanoseconds to hundreds of seconds, with 24 ns timing granularity and linear scaling of delay with respect to the number of delay loop iterations. The generator is optionally synchronized with an external clock source to provide 100 ps jitter and overall sequence repeatability within the whole temporal range. The generator is fully programmable and able to produce digital sequences of high complexity. The concept of the generator can be implemented using different microcontrollers and applied to control various optical, atomic, and nuclear physics measurement setups.
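The linear relation between loop count and delay stated in this abstract can be written as a small helper. This is only an arithmetic sketch: the 24 ns step comes from the abstract, while the function names and the `overhead_ns` fixed offset are assumptions, not details of the authors' firmware.

```python
def loop_iterations(delay_ns, granularity_ns=24, overhead_ns=0):
    """Nearest whole number of delay-loop iterations for a requested delay.
    overhead_ns is a hypothetical fixed setup cost, zero by default."""
    if delay_ns < overhead_ns:
        raise ValueError("requested delay is shorter than the fixed overhead")
    return round((delay_ns - overhead_ns) / granularity_ns)

def achieved_delay_ns(iterations, granularity_ns=24, overhead_ns=0):
    """Delay produced by a given iteration count (linear scaling)."""
    return overhead_ns + iterations * granularity_ns
```

Because delays are quantized, a requested 250 ns would map to 10 iterations and an achieved delay of 240 ns under these assumptions.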
Development and evaluation of clicker methodology for introductory physics courses

NASA Astrophysics Data System (ADS)

Lee, Albert H.

Many educators understand that lectures are cost effective but not learning efficient, so they continue to search for ways to increase active student participation in this traditionally passive learning environment. In-class polling systems, or "clickers", are inexpensive and reliable tools allowing students to actively participate in lectures by answering multiple-choice questions. Students assess their learning in real time by observing instant polling summaries displayed in front of them. This in turn motivates additional discussions which increase the opportunity for active learning. We wanted to develop a comprehensive clicker methodology that creates an active lecture environment for a broad spectrum of students taking introductory physics courses. We wanted our methodology to incorporate many findings of contemporary learning science. It is recognized that learning requires active construction; students need to be actively involved in their own learning process. Learning also depends on preexisting knowledge; students construct new knowledge and understandings based on what they already know and believe. Learning is context dependent; students who have learned to apply a concept in one context may not be able to recognize and apply the same concept in a different context, even when both contexts are considered to be isomorphic by experts. On this basis, we developed question sequences, each involving the same concept but having different contexts. Answer choices are designed to address students' preexisting knowledge. These sequences are used with the clickers to promote active discussions and multiple assessments. We have created, validated, and evaluated sequences sufficient in number to populate all introductory physics courses. Our research has found that using clickers with our question sequences significantly improved student conceptual understanding. Our research has also found how to best measure student conceptual gain using research-based instruments. Finally, we discovered that students need to have full access to the question sequences after lectures to reap the maximum benefit. Chapter 1 provides an introduction to our research. Chapter 2 provides a literature review relevant for our research. Chapter 3 discusses the creation of the clicker question sequences. Chapter 4 provides a picture of the validation process involving both physics experts and the introductory physics students. Chapter 5 describes how the sequences have been used with clickers in lectures. Chapter 6 provides the evaluation of the effectiveness of the clicker methodology. Chapter 7 contains a brief summary of research results and conclusions.
Evaluation and Selection of Best Priority Sequencing Rule in Job Shop Scheduling using Hybrid MCDM Technique

NASA Astrophysics Data System (ADS)

Kiran Kumar, Kalla; Nagaraju, Dega; Gayathri, S.; Narayanan, S.

2017-05-01

Priority sequencing rules provide guidance on the order in which jobs are to be processed at a workstation. Applying different priority rules in job shop scheduling gives different scheduling orders, and considerable experimentation is needed before the best priority sequencing rule can be chosen. Hence, a comprehensive method of selecting the right rule is essential from a managerial decision-making perspective. This paper considers seven different priority sequencing rules in job shop scheduling. For evaluation and selection of the best priority sequencing rule, a set of eight criteria is considered. The aim of this work is to demonstrate the methodology of evaluating and selecting the best priority sequencing rule by using a hybrid multi-criteria decision making (MCDM) technique, i.e., the analytical hierarchy process (AHP) with the technique for order preference by similarity to ideal solution (TOPSIS). The criteria weights are calculated by using AHP, whereas the relative closeness values of all priority sequencing rules are computed based on TOPSIS with the help of data acquired from the shop floor of a manufacturing firm. Finally, from the findings of this work, the priority sequencing rules are ranked from most important to least important. The comprehensive methodology presented in this paper is essential for the management of a workstation to choose the best priority sequencing rule among the available alternatives for processing the jobs with maximum benefit.
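The TOPSIS step described in this abstract, computing a relative closeness value for each alternative, can be sketched as follows. This is the textbook procedure with vector normalization, not the authors' code; the weights are assumed to come from a separate AHP step, and the function and parameter names are illustrative.

```python
import math

def topsis(matrix, weights, benefit):
    """Relative closeness of each alternative (row) to the ideal solution.

    matrix  : rows are alternatives, columns are criteria (no all-zero column)
    weights : criterion weights, e.g. obtained from AHP
    benefit : True where larger is better, False where smaller is better
    """
    n_crit = len(weights)
    # Vector-normalize each column, then apply the criterion weight.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_crit)]
    v = [[weights[j] * row[j] / norms[j] for j in range(n_crit)] for row in matrix]
    columns = list(zip(*v))
    ideal = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    anti = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    scores = []
    for row in v:
        d_pos = math.dist(row, ideal)  # distance to the ideal solution
        d_neg = math.dist(row, anti)   # distance to the anti-ideal solution
        total = d_pos + d_neg
        scores.append(d_neg / total if total else 0.0)
    return scores
```

Alternatives are then ranked by descending score; a score of 1 means the alternative coincides with the ideal on every criterion.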
Future generations, environmental ethics, and global environmental change

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tonn, B.E.

1994-12-31

The elements of a methodology to be employed by the global community to investigate the consequences of global environmental change upon future generations and global ecosystems are outlined in this paper. The methodology is comprised of two major components: a possible future worlds model; and a formal, citizen-oriented process to judge whether the possible future worlds potentially inheritable by future generations meet obligational standards. A broad array of descriptors of future worlds can be encompassed within this framework, including survival of ecosystems and other species and satisfaction of human concerns. The methodology expresses fundamental psychological motivations and human myths (journey, renewal, mother earth, and being-in-nature) and incorporates several viewpoints on obligations to future generations: maintaining options, fairness, humility, and the cause of humanity. The methodology overcomes several severe drawbacks of the economic-based methods most commonly used for global environmental policy analysis.
Efficient generation of cavitation bubbles and reactive oxygen species using triggered high-intensity focused ultrasound sequence for sonodynamic treatment

NASA Astrophysics Data System (ADS)

Yasuda, Jun; Yoshizawa, Shin; Umemura, Shin-ichiro

2016-07-01

Sonodynamic treatment is a method of treating cancer using reactive oxygen species (ROS) generated by cavitation bubbles in collaboration with a sonosensitizer at a target tissue. In this treatment method, both localized ROS generation and ROS generation with high efficiency are important. In this study, a triggered high-intensity focused ultrasound (HIFU) sequence, which consists of a short, extremely high intensity pulse immediately followed by a long, moderate-intensity burst, was employed for the efficient generation of ROS. In experiments, a solution sealed in a chamber was exposed to a triggered HIFU sequence. Then, the distribution of generated ROS was observed by the luminol reaction, and the amount of generated ROS was quantified using the KI method. As a result, localized ROS generation was demonstrated by light emission from the luminol reaction. Moreover, both the KI method and the luminol reaction emission showed that the triggered HIFU sequence generates ROS with higher efficiency.
Sequence information signal processor

DOEpatents

Peterson, John C.; Chow, Edward T.; Waterman, Michael S.; Hunkapillar, Timothy J.

1999-01-01

An electronic circuit is used to compare two sequences, such as genetic sequences, to determine which alignment of the sequences produces the greatest similarity. The circuit includes a linear array of series-connected processors, each of which stores a single element from one of the sequences and compares that element with each successive element in the other sequence. For each comparison, the processor generates a scoring parameter that indicates which segment ending at those two elements produces the greatest degree of similarity between the sequences. The processor uses the scoring parameter to generate a similar scoring parameter for a comparison between the stored element and the next successive element from the other sequence. The processor also delivers the scoring parameter to the next processor in the array for use in generating a similar scoring parameter for another pair of elements. The electronic circuit determines which processor and alignment of the sequences produce the scoring parameter with the highest value.
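The per-cell scoring that each processor performs can be illustrated in software. The sketch below uses a Smith-Waterman-style local alignment recurrence with a linear gap penalty; that choice, the scoring values and the function name are assumptions for illustration, since the abstract does not specify the scoring scheme, and a sequential loop stands in for the parallel processor array.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-1):
    """Best local alignment score and the (i, j) cell where it ends.

    Each cell holds the best score of any aligned segment ending at
    a[i-1] and b[j-1]; scores are floored at zero so that a segment
    can start anywhere.
    """
    prev = [0] * (len(b) + 1)
    best, best_pos = 0, (0, 0)
    for i, x in enumerate(a, 1):
        curr = [0]
        for j, y in enumerate(b, 1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            if score > best:
                best, best_pos = score, (i, j)
        prev = curr
    return best, best_pos
```

Each cell depends only on its left, upper and upper-left neighbours, which is why the computation maps naturally onto a linear chain of processors that pass scores to one another.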
Next-Generation Sequencing Platforms

NASA Astrophysics Data System (ADS)

Mardis, Elaine R.

2013-06-01

Automated DNA sequencing instruments embody an elegant interplay among chemistry, engineering, software, and molecular biology and have built upon Sanger's founding discovery of dideoxynucleotide sequencing to perform once-unfathomable tasks. Combined with innovative physical mapping approaches that helped to establish long-range relationships between cloned stretches of genomic DNA, fluorescent DNA sequencers produced reference genome sequences for model organisms and for the reference human genome. New types of sequencing instruments that permit amazing acceleration of data-collection rates for DNA sequencing have been developed. The ability to generate genome-scale data sets is now transforming the nature of biological inquiry. Here, I provide an historical perspective of the field, focusing on the fundamental developments that predated the advent of next-generation sequencing instruments and providing information about how these instruments work, their application to biological research, and the newest types of sequencers that can extract data from single DNA molecules.
Rice Molecular Breeding Laboratories in the Genomics Era: Current Status and Future Considerations

PubMed Central

Collard, Bert C. Y.; Vera Cruz, Casiana M.; McNally, Kenneth L.; Virk, Parminder S.; Mackill, David J.

2008-01-01

Using DNA markers in plant breeding with marker-assisted selection (MAS) could greatly improve the precision and efficiency of selection, leading to the accelerated development of new crop varieties. The numerous examples of MAS in rice have prompted many breeding institutes to establish molecular breeding labs. The last decade has produced an enormous amount of genomics research in rice, including the identification of thousands of QTLs for agronomically important traits, the generation of large amounts of gene expression data, and cloning and characterization of new genes, including the detection of single nucleotide polymorphisms. The pinnacle of genomics research has been the completion and annotation of genome sequences for indica and japonica rice. This information—coupled with the development of new genotyping methodologies and platforms, and the development of bioinformatics databases and software tools—provides even more exciting opportunities for rice molecular breeding in the 21st century. However, the great challenge for molecular breeders is to apply genomics data in actual breeding programs. Here, we review the current status of MAS in rice, current genomics projects and promising new genotyping methodologies, and evaluate the probable impact of genomics research. We also identify critical research areas to “bridge the application gap” between QTL identification and applied breeding that need to be addressed to realize the full potential of MAS, and propose ideas and guidelines for establishing rice molecular breeding labs in the postgenome sequence era to integrate molecular breeding within the context of overall rice breeding and research programs. PMID:18528527
A better sequence-read simulator program for metagenomics.

PubMed

Johnson, Stephen; Trost, Brett; Long, Jeffrey R; Pittet, Vanessa; Kusalik, Anthony

2014-01-01

There are many programs available for generating simulated whole-genome shotgun sequence reads. The data generated by many of these programs follow predefined models, which limits their use to the authors' original intentions. For example, many models assume that read lengths follow a uniform or normal distribution. Other programs generate models from actual sequencing data, but are limited to reads from single-genome studies. To our knowledge, there are no programs that allow a user to generate simulated data following non-parametric read-length distributions and quality profiles based on empirically-derived information from metagenomics sequencing data. We present BEAR (Better Emulation for Artificial Reads), a program that uses a machine-learning approach to generate reads with lengths and quality values that closely match empirically-derived distributions. BEAR can emulate reads from various sequencing platforms, including Illumina, 454, and Ion Torrent. BEAR requires minimal user input, as it automatically determines appropriate parameter settings from user-supplied data. BEAR also uses a unique method for deriving run-specific error rates, and extracts useful statistics from the metagenomic data itself, such as quality-error models. Many existing simulators are specific to a particular sequencing technology; however, BEAR is not restricted in this way. Because of its flexibility, BEAR is particularly useful for emulating the behaviour of technologies like Ion Torrent, for which no dedicated sequencing simulators are currently available. BEAR is also the first metagenomic sequencing simulator program that automates the process of generating abundances, which can be an arduous task. BEAR is useful for evaluating data processing tools in genomics. It has many advantages over existing comparable software, such as generating more realistic reads and being independent of sequencing technology, and has features particularly useful for metagenomics work.
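The contrast the abstract draws between parametric and non-parametric read-length models can be shown with a minimal sketch. This is plain resampling from observed lengths, not BEAR's machine-learning method, and the function name is hypothetical.

```python
import random

def sample_read_lengths(observed_lengths, n, seed=None):
    """Draw n read lengths from the empirical distribution by resampling
    observed values with replacement (no uniform or normal assumption)."""
    rng = random.Random(seed)
    return [rng.choice(observed_lengths) for _ in range(n)]
```

Because values are drawn directly from the observed data, multimodal or skewed length distributions are preserved without fitting any parameters.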
enoLOGOS: a versatile web tool for energy normalized sequence logos

PubMed Central

Workman, Christopher T.; Yin, Yutong; Corcoran, David L.; Ideker, Trey; Stormo, Gary D.; Benos, Panayiotis V.

2005-01-01

enoLOGOS is a web-based tool that generates sequence logos from various input sources. Sequence logos have become a popular way to graphically represent DNA and amino acid sequence patterns from a set of aligned sequences. Each position of the alignment is represented by a column of stacked symbols with its total height reflecting the information content in this position. Currently, the available web servers are able to create logo images from a set of aligned sequences, but none of them generates weighted sequence logos directly from energy measurements or other sources. With the advent of high-throughput technologies for estimating the contact energy of different DNA sequences, tools that can create logos directly from binding affinity data are useful to researchers. enoLOGOS generates sequence logos from a variety of input data, including energy measurements, probability matrices, alignment matrices, count matrices and aligned sequences. Furthermore, enoLOGOS can represent the mutual information of different positions of the consensus sequence, a unique feature of this tool. Another web interface for our software, C2H2-enoLOGOS, generates logos for the DNA-binding preferences of the C2H2 zinc-finger transcription factor family members. enoLOGOS and C2H2-enoLOGOS are accessible over the web at . PMID:15980495
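The column height in a conventional sequence logo is the information content of that alignment position. The sketch below computes it for aligned DNA sequences using the standard formula (maximum entropy minus observed entropy, in bits). It omits small-sample correction and the energy normalization that enoLOGOS provides, and the function name is an assumption.

```python
import math
from collections import Counter

def column_information(aligned, alphabet="ACGT"):
    """Information content in bits for each column of an alignment."""
    max_bits = math.log2(len(alphabet))
    info = []
    for pos in range(len(aligned[0])):
        counts = Counter(seq[pos] for seq in aligned)
        total = sum(counts[s] for s in alphabet)
        entropy = -sum((counts[s] / total) * math.log2(counts[s] / total)
                       for s in alphabet if counts[s])
        info.append(max_bits - entropy)
    return info
```

A fully conserved DNA column scores 2 bits and a column with all four bases equally frequent scores 0 bits.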
Randomized controlled trials of simulation-based interventions in Emergency Medicine: a methodological review.

PubMed

Chauvin, Anthony; Truchot, Jennifer; Bafeta, Aida; Pateron, Dominique; Plaisance, Patrick; Yordanov, Youri

2018-04-01

The number of trials assessing Simulation-Based Medical Education (SBME) interventions has rapidly expanded. Many studies show that potential flaws in design, conduct and reporting of randomized controlled trials (RCTs) can bias their results. We conducted a methodological review of RCTs assessing an SBME intervention in Emergency Medicine (EM) and examined their methodological characteristics. We searched MEDLINE via PubMed for RCTs that assessed a simulation intervention in EM, published in 6 general and internal medicine journals and in the top 10 EM journals. The Cochrane Collaboration Risk of Bias tool was used to assess risk of bias, intervention reporting was evaluated based on the "template for intervention description and replication" checklist, and methodological quality was evaluated by the Medical Education Research Study Quality Instrument (MERSQI). Report selection and data extraction were done by 2 independent researchers. From 1394 RCTs screened, 68 trials assessed an SBME intervention. They represent one quarter of our sample. Cardiopulmonary resuscitation (CPR) is the most frequent topic (81%). Random sequence generation and allocation concealment were performed correctly in 66 and 49% of trials. Blinding of participants and assessors was performed correctly in 19 and 68%. Risk of attrition bias was low in three-quarters of the studies (n = 51). Risk of selective reporting bias was unclear in nearly all studies. The mean MERSQI score was 13.4/18. Four percent of the reports provided a description allowing replication of the intervention. Trials assessing simulation represent one quarter of RCTs in EM. Their quality remains unclear, and reproducing the interventions appears challenging due to reporting issues.
Solar photoelectro-Fenton degradation of the herbicide 4-chloro-2-methylphenoxyacetic acid optimized by response surface methodology.

PubMed

Garcia-Segura, Sergi; Almeida, Lucio Cesar; Bocchi, Nerilso; Brillas, Enric

2011-10-30

A central composite rotatable design and response surface methodology (RSM) were used to optimize the experimental variables of the solar photoelectro-Fenton (SPEF) treatment of the herbicide 4-chloro-2-methylphenoxyacetic acid (MCPA). The experiments were made with a flow plant containing a Pt/air-diffusion reactor coupled to a solar compound parabolic collector (CPC) under recirculation of 10 L of 186 mg L(-1) MCPA solutions in 0.05 M Na(2)SO(4) at a liquid flow rate of 180 L h(-1) with an average UV irradiation intensity of about 32 W m(-2). The optimum variables found for the SPEF process were 5.0 A, 1.0 mM Fe(2+) and pH 3.0 after 120 min of electrolysis. Under these conditions, 75% of mineralization with 71% of current efficiency and 87.7 kWh kg(-1) TOC of energy consumption were obtained. MCPA decayed under the attack of generated hydroxyl radicals following pseudo-first-order kinetics. Hydroxyl radicals also destroyed 4-chloro-2-methylphenol, methylhydroquinone and methyl-p-benzoquinone detected as aromatic by-products. Glycolic, maleic, fumaric, malic, succinic, tartronic, oxalic and formic acids were identified as generated carboxylic acids, which form Fe(III) complexes that are quickly photodecarboxylated by the UV irradiation of sunlight at the CPC photoreactor. A reaction sequence for the SPEF degradation of MCPA was proposed. Copyright © 2011 Elsevier B.V. All rights reserved.
Enabling efficient vertical takeoff/landing and forward flight of unmanned aerial vehicles: Design and control of tandem wing-tip mounted rotor mechanisms

NASA Astrophysics Data System (ADS)

Mancuso, Peter Timothy

Fixed-wing unmanned aerial vehicles (UAVs) that offer vertical takeoff and landing (VTOL) and forward flight capability suffer from sub-par performance in both flight modes. Achieving the next generation of efficient hybrid aircraft requires innovations in: (i) power management, (ii) efficient structures, and (iii) control methodologies. Existing hybrid UAVs generally utilize one of three transitioning mechanisms: an external power mechanism to tilt the rotor-propulsion pod, separate propulsion units and rotors during hover and forward flight, or tilt body craft (smaller scale). Thus, hybrid concepts require more energy compared to dedicated fixed-wing or rotorcraft UAVs. Moreover, design trade-offs to reinforce the wing structure (typically to accommodate the propulsion systems and enable hover, i.e. tilt-rotor concepts) adversely impact the aerodynamics, controllability and efficiency of the aircraft in both hover and forward flight modes. The goal of this research is to develop more efficient VTOL/hover and forward flight UAVs. In doing so, the transition sequence, transition mechanism, and actuator performance are heavily considered. A design and control methodology was implemented to address these issues through a series of computer simulations and prototype benchtop tests to verify the proposed solution. Finally, preliminary field testing with a first-generation prototype was conducted. The methods used in this research offer guidelines and a new dual-arm rotor UAV concept for designing more efficient hybrid UAVs in both hover and forward flight.
Initial retrieval sequence and blending strategy

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pemwell, D.L.; Grenard, C.E.

1996-09-01

This report documents the initial retrieval sequence and the methodology used to select it. Waste retrieval, storage, pretreatment and vitrification were modeled for candidate single-shell tank retrieval sequences. Performance of the sequences was measured by a set of metrics (for example, high-level waste glass volume, relative risk and schedule). Computer models were used to evaluate estimated glass volumes, process rates, retrieval dates, and blending strategy effects. The models were based on estimates of component inventories and concentrations, sludge wash factors and timing, retrieval annex limitations, etc.
The Planning, Implementation, and Movement of an Academic Library Collection.

ERIC Educational Resources Information Center

Kurkul, Donna Lee

1983-01-01

Discusses methodology, logistics, and time/cost study of planning, implementation, and relocation of 682,810 volume Smith College Library collection into its newly constructed and renovated facility. Call number sequence location, collection movement phasing and formulas for sequence distribution, and personnel requirements are noted. Elementary…
Single-Concept Clicker Question Sequences

ERIC Educational Resources Information Center

Lee, Albert; Ding, Lin; Reay, Neville W.; Bao, Lei

2011-01-01

Students typically use electronic polling systems, or clickers, to answer individual questions. Differing from this tradition, we have developed a new clicker methodology in which multiple clicker questions targeting the same underlying concept but with different surface features are grouped into a sequence. Here we present the creation,…
Risk of bias and methodological issues in randomised controlled trials of acupuncture for knee osteoarthritis: a cross-sectional study

PubMed Central

Lee, Andy H; Zhou, Xu; Kang, Deying; Luo, Yanan; Liu, Jiali; Sun, Xin

2018-01-01

Objective To assess risk of bias and to investigate methodological issues concerning the design, conduct and analysis of randomised controlled trials (RCTs) testing acupuncture for knee osteoarthritis (KOA). Methods PubMed, EMBASE, Cochrane Central Register of Controlled Trials and four major Chinese databases were searched for RCTs that investigated the effect of acupuncture for KOA. The Cochrane tool was used to examine the risk of bias of eligible RCTs. Their methodological details were examined using a standardised and pilot-tested questionnaire of 48 items, together with the association between four predefined factors and important methodological quality indicators. Results A total of 248 RCTs were eligible, of which 39 (15.7%) used computer-generated randomisation sequence. Of the 31 (12.5%) trials that stated the allocation concealment, only one used central randomisation. Twenty-five (10.1%) trials mentioned that their acupuncture procedures were standardised, but only 18 (7.3%) specified how the standardisation was achieved. The great majority of trials (n=233, 94%) stated that blinding was in place, but 204 (87.6%) did not clarify who was blinded. Only 27 (10.9%) trials specified the primary outcome, for which 7 used intention-to-treat analysis. Only 17 (6.9%) trials included details on sample size calculation; none preplanned an interim analysis and associated stopping rule. In total, 46 (18.5%) trials explicitly stated that loss to follow-up occurred, but only 6 (2.4%) provided some information to deal with the issue. No trials prespecified, conducted or reported any subgroup or adjusted analysis for the primary outcome. Conclusion The overall risk of bias was high among published RCTs testing acupuncture for KOA. Methodological limitations were present in many important aspects of design, conduct and analyses. These findings inform the development of evidence-based methodological guidance for future trials assessing the effect of acupuncture for KOA. 
PMID:29511016
Next Generation Sequencing at the University of Chicago Genomics Core

DOE Office of Scientific and Technical Information (OSTI.GOV)

Faber, Pieter

2013-04-24

The University of Chicago Genomics Core provides University of Chicago investigators (and external clients) access to State-of-the-Art genomics capabilities: next generation sequencing, Sanger sequencing / genotyping and micro-arrays (gene expression, genotyping, and methylation). The current presentation will highlight our capabilities in the area of ultra-high throughput sequencing analysis.
Defining the healthy "core microbiome" of oral microbial communities

PubMed Central

2009-01-01

Background Most studies examining the commensal human oral microbiome are focused on disease or are limited in methodology. In order to diagnose and treat diseases at an early and reversible stage an in-depth definition of health is indispensable. The aim of this study therefore was to define the healthy oral microbiome using recent advances in sequencing technology (454 pyrosequencing). Results We sampled and sequenced microbiomes from several intraoral niches (dental surfaces, cheek, hard palate, tongue and saliva) in three healthy individuals. Within an individual oral cavity, we found over 3600 unique sequences, over 500 different OTUs or "species-level" phylotypes (sequences that clustered at 3% genetic difference) and 88 - 104 higher taxa (genus or more inclusive taxon). The predominant taxa belonged to Firmicutes (genus Streptococcus, family Veillonellaceae, genus Granulicatella), Proteobacteria (genus Neisseria, Haemophilus), Actinobacteria (genus Corynebacterium, Rothia, Actinomyces), Bacteroidetes (genus Prevotella, Capnocytophaga, Porphyromonas) and Fusobacteria (genus Fusobacterium). Each individual sample harboured on average 266 "species-level" phylotypes (SD 67; range 123 - 326) with cheek samples being the least diverse and the dental samples from approximal surfaces showing the highest diversity. Principal component analysis discriminated the profiles of the samples originating from shedding surfaces (mucosa of tongue, cheek and palate) from the samples that were obtained from solid surfaces (teeth). There was a large overlap in the higher taxa, "species-level" phylotypes and unique sequences among the three microbiomes: 84% of the higher taxa, 75% of the OTUs and 65% of the unique sequences were present in at least two of the three microbiomes. The three individuals shared 1660 of 6315 unique sequences. These 1660 sequences (the "core microbiome") contributed 66% of the reads.
The overlapping OTUs contributed to 94% of the reads, while nearly all reads (99.8%) belonged to the shared higher taxa. Conclusions We obtained the first insight into the diversity and uniqueness of individual oral microbiomes at a resolution of next-generation sequencing. We showed that a major proportion of bacterial sequences of unrelated healthy individuals is identical, supporting the concept of a core microbiome at health. PMID:20003481
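The overlap figures in this record (sequences shared by all three individuals, or present in at least two of three) come down to set operations on per-individual collections of unique sequences. A minimal Python sketch of that bookkeeping follows. The sequence identifiers are made up and the helper name `shared_in_at_least` is hypothetical; this is not the study's pipeline or data.

```python
from collections import Counter

def shared_in_at_least(samples, k):
    """Return the items that occur in at least k of the given sets."""
    counts = Counter(item for s in samples for item in set(s))
    return {item for item, n in counts.items() if n >= k}

# Toy data: hypothetical identifiers, one set of unique sequences per individual.
individuals = [
    {"seqA", "seqB", "seqC", "seqD"},
    {"seqA", "seqB", "seqE"},
    {"seqA", "seqC", "seqF"},
]

core = shared_in_at_least(individuals, k=3)    # shared by all three
common = shared_in_at_least(individuals, k=2)  # present in at least two
total = set().union(*individuals)              # every unique sequence seen

print(sorted(core))
print(f"{len(common)} of {len(total)} unique sequences occur in >= 2 individuals")
```

Weighting each shared sequence by its read count, rather than counting it once, would give the "fraction of reads" style of figure that the abstract also reports.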

Concept For Generation Of Long Pseudorandom Sequences

NASA Technical Reports Server (NTRS)

Wang, C. C.

1990-01-01

Conceptual very-large-scale integrated (VLSI) digital circuit performs exponentiation in finite field. Algorithm that generates unusually long sequences of pseudorandom numbers executed by digital processor that includes such circuits. Concepts particularly advantageous for such applications as spread-spectrum communications, cryptography, and generation of ranging codes, synthetic noise, and test data, where usually desirable to make pseudorandom sequences as long as possible.
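The record does not say which field or recurrence the circuit uses, so the following is only an illustration of the underlying operation: exponentiation in a finite field by square-and-multiply, used to list successive powers of a generator. The prime field GF(31) and the generator 3 are toy assumptions chosen for readability; a practical design would use a far larger field.

```python
def mod_pow(base, exponent, modulus):
    """Square-and-multiply exponentiation in the prime field GF(modulus)."""
    result = 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:                 # current low bit set: multiply it in
            result = (result * base) % modulus
        base = (base * base) % modulus   # square for the next bit
        exponent >>= 1
    return result

def power_sequence(generator, modulus, length):
    """Yield generator**1, generator**2, ... reduced modulo `modulus`."""
    for n in range(1, length + 1):
        yield mod_pow(generator, n, modulus)

# Assumed toy parameters: 3 generates the whole multiplicative group of GF(31),
# so the sequence visits all 30 nonzero elements before repeating.
P, G = 31, 3
seq = list(power_sequence(G, P, P - 1))
print(seq[:5])
print(len(set(seq)))
```

The period of such a sequence equals the multiplicative order of the generator, which is why longer sequences call for larger fields.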
Coinfection of Fusobacterium nucleatum and Actinomyces israelii in Mastoiditis Diagnosed by Next-Generation DNA Sequencing

PubMed Central

Hoogestraat, Daniel R.; Abbott, April N.; SenGupta, Dhruba J.; Cummings, Lisa A.; Butler-Wu, Susan M.; Stephens, Karen; Cookson, Brad T.; Hoffman, Noah G.

2014-01-01

Some bacterial infections involve potentially complex mixtures of species that can now be distinguished using next-generation DNA sequencing. We present a case of mastoiditis where Gram stain, culture, and molecular diagnosis were nondiagnostic or discrepant. Next-generation sequencing implicated coinfection of Fusobacterium nucleatum and Actinomyces israelii, resolving these diagnostic discrepancies. PMID:24574281
Multi-Stage Target Tracking with Drift Correction and Position Prediction

NASA Astrophysics Data System (ADS)

Chen, Xin; Ren, Keyan; Hou, Yibin

2018-04-01

Most existing tracking methods struggle to combine accuracy and performance, and do not consider the shift between clarity and blur that often occurs. In this paper, we propose a multi-stage tracking framework with two dedicated modules: position prediction and corrective measure. We conduct tracking based on a correlation filter with a corrective measure module to increase both performance and accuracy. Specifically, a convolutional network is used to address the blur problem in realistic scenes; it is trained on a dataset that includes blurred images generated by three blur algorithms. Then, we propose a position prediction module to reduce the computation cost and make the tracker more capable of handling fast motion. Experimental results show that our tracking method is more robust than others and more accurate on the benchmark sequences.
Universal Influenza B Virus Genomic Amplification Facilitates Sequencing, Diagnostics, and Reverse Genetics

PubMed Central

Zhou, Bin; Lin, Xudong; Wang, Wei; Halpin, Rebecca A.; Bera, Jayati; Stockwell, Timothy B.; Barr, Ian G.

2014-01-01

Although human influenza B virus (IBV) is a significant human pathogen, its great genetic diversity has limited our ability to universally amplify the entire genome for subsequent sequencing or vaccine production. The generation of sequence data via next-generation approaches and the rapid cloning of viral genes are critical for basic research, diagnostics, antiviral drugs, and vaccines to combat IBV. To overcome the difficulty of amplifying the diverse and ever-changing IBV genome, we developed and optimized techniques that amplify the complete segmented negative-sense RNA genome from any IBV strain in a single tube/well (IBV genomic amplification [IBV-GA]). Amplicons for >1,000 diverse IBV genomes from different sample types (e.g., clinical specimens) were generated and sequenced using this robust technology. These approaches are sensitive, robust, and sequence independent (i.e., universally amplify past, present, and future IBVs), which facilitates next-generation sequencing and advanced genomic diagnostics. Importantly, special terminal sequences engineered into the optimized IBV-GA2 products also enable ligation-free cloning to rapidly generate reverse-genetics plasmids, which can be used for the rescue of recombinant viruses and/or the creation of vaccine seed stock. PMID:24501036
AutoGen Version 5.0

NASA Technical Reports Server (NTRS)

Gladden, Roy E.; Khanampornpan, Teerapat; Fisher, Forest W.

2010-01-01

Version 5.0 of the AutoGen software has been released. Previous versions, variously denoted Autogen and autogen, were reported in two articles: Automated Sequence Generation Process and Software (NPO-30746), Software Tech Briefs (Special Supplement to NASA Tech Briefs), September 2007, page 30, and Autogen Version 2.0 (NPO-41501), NASA Tech Briefs, Vol. 31, No. 10 (October 2007), page 58. To recapitulate: AutoGen (now signifying automatic sequence generation) automates the generation of sequences of commands in a standard format for uplink to spacecraft. AutoGen requires fewer workers than are needed for older manual sequence-generation processes, and greatly reduces sequence-generation times. The sequences are embodied in spacecraft activity sequence files (SASFs). AutoGen automates generation of SASFs by use of another previously reported program called APGEN. AutoGen encodes knowledge of different mission phases and of how the resultant commands must differ among the phases. AutoGen also provides means for customizing sequences through use of configuration files. The approach followed in developing AutoGen has involved encoding the behaviors of a system into a model and encoding algorithms for context-sensitive customizations of the modeled behaviors. This version of AutoGen addressed the MRO (Mars Reconnaissance Orbiter) primary science phase (PSP) mission phase. On previous Mars missions this phase has more commonly been referred to as the mapping phase. This version addressed the unique aspects of sequencing orbital operations and specifically the mission-specific adaptation of orbital operations for MRO. This version also includes capabilities for MRO's role in Mars relay support for UHF relay communications with the MER rovers and the Phoenix lander.
454 next generation-sequencing outperforms allele-specific PCR, Sanger sequencing, and pyrosequencing for routine KRAS mutation analysis of formalin-fixed, paraffin-embedded samples

PubMed Central

Altimari, Annalisa; de Biase, Dario; De Maglio, Giovanna; Gruppioni, Elisa; Capizzi, Elisa; Degiovanni, Alessio; D’Errico, Antonia; Pession, Annalisa; Pizzolitto, Stefano; Fiorentino, Michelangelo; Tallini, Giovanni

2013-01-01

Detection of KRAS mutations in archival pathology samples is critical for therapeutic appropriateness of anti-EGFR monoclonal antibodies in colorectal cancer. We compared the sensitivity, specificity, and accuracy of Sanger sequencing, ARMS-Scorpion (TheraScreen®) real-time polymerase chain reaction (PCR), pyrosequencing, chip array hybridization, and 454 next-generation sequencing to assess KRAS codon 12 and 13 mutations in 60 nonconsecutive selected cases of colorectal cancer. Twenty of the 60 cases were detected as wild-type KRAS by all methods with 100% specificity. Among the 40 mutated cases, 13 were discrepant with at least one method. The sensitivity was 85%, 90%, 93%, and 92%, and the accuracy was 90%, 93%, 95%, and 95% for Sanger sequencing, TheraScreen real-time PCR, pyrosequencing, and chip array hybridization, respectively. The main limitation of Sanger sequencing was its low analytical sensitivity, whereas TheraScreen real-time PCR, pyrosequencing, and chip array hybridization showed higher sensitivity but suffered from the limitations of predesigned assays. Concordance between the methods was k = 0.79 for Sanger sequencing and k > 0.85 for the other techniques. Tumor cell enrichment correlated significantly with the abundance of KRAS-mutated deoxyribonucleic acid (DNA), evaluated as ΔCt for TheraScreen real-time PCR (P = 0.03), percentage of mutation for pyrosequencing (P = 0.001), ratio for chip array hybridization (P = 0.003), and percentage of mutation for 454 next-generation sequencing (P = 0.004). Also, 454 next-generation sequencing showed the best cross correlation for quantification of mutation abundance compared with all the other methods (P < 0.001). Our comparison showed the superiority of next-generation sequencing over the other techniques in terms of sensitivity and specificity. Next-generation sequencing will replace Sanger sequencing as the reference technique for diagnostic detection of KRAS mutation in archival tumor tissues. 
PMID:23950653
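The sensitivity, specificity and accuracy figures in this record follow from a 2x2 table of method calls against the reference status. A short Python sketch of the arithmetic follows. The counts used (34 of 40 mutated cases detected, 20 of 20 wild-type cases called wild-type) are an assumption inferred from the reported 85%, 100% and 90% for Sanger sequencing; the abstract does not state them.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy) from 2x2 table counts."""
    sensitivity = tp / (tp + fn)            # mutated cases correctly detected
    specificity = tn / (tn + fp)            # wild-type cases correctly called
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    return sensitivity, specificity, accuracy

# Assumed counts, chosen to be consistent with the percentages reported
# for Sanger sequencing on 40 mutated and 20 wild-type cases.
sens, spec, acc = diagnostic_metrics(tp=34, fn=6, tn=20, fp=0)
print(f"sensitivity={sens:.0%} specificity={spec:.0%} accuracy={acc:.0%}")
```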
Effect of Next-Generation Exome Sequencing Depth for Discovery of Diagnostic Variants.

PubMed

Kim, Kyung; Seong, Moon-Woo; Chung, Won-Hyong; Park, Sung Sup; Leem, Sangseob; Park, Won; Kim, Jihyun; Lee, KiYoung; Park, Rae Woong; Kim, Namshin

2015-06-01

Sequencing depth, which is directly related to the cost and time required for the generation, processing, and maintenance of next-generation sequencing data, is an important factor in the practical utilization of such data in clinical fields. Unfortunately, identifying an exome sequencing depth adequate for clinical use is a challenge that has not been addressed extensively. Here, we investigate the effect of exome sequencing depth on the discovery of sequence variants for clinical use. Toward this, we sequenced ten germ-line blood samples from breast cancer patients on the Illumina platform GAII(x) at a high depth of ~200×. We observed that most function-related diverse variants in the human exonic regions could be detected at a sequencing depth of 120×. Furthermore, investigation using a diagnostic gene set showed that the number of clinical variants identified using exome sequencing reached a plateau at an average sequencing depth of about 120×. Moreover, the phenomena were consistent across the breast cancer samples.
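To show how depth relates to sequencing effort, mean depth can be approximated as the number of reads times the read length, divided by the target size. The Python sketch below applies that relation. The 50 Mb target and 100 bp read length are illustrative assumptions, not values from the abstract, and the estimate ignores duplicates, off-target reads and uneven capture.

```python
import math

def mean_depth(n_reads, read_length, target_size):
    """Approximate mean depth: total sequenced bases over target bases."""
    return n_reads * read_length / target_size

def reads_for_depth(depth, read_length, target_size):
    """Smallest number of reads giving at least `depth` on average."""
    return math.ceil(depth * target_size / read_length)

# Assumed values for illustration only.
TARGET_BP = 50_000_000   # hypothetical exome target size
READ_LEN = 100           # hypothetical read length in bases

for depth in (120, 200):
    n = reads_for_depth(depth, READ_LEN, TARGET_BP)
    print(f"{depth}x needs about {n:,} reads")
```

Under these assumptions, stopping at the plateau depth instead of the full experimental depth reduces the required read count in direct proportion.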
Microbes, metagenomes and marine mammals: enabling the next generation of scientist to enter the genomic era

PubMed Central

2013-01-01

Background The revolution in DNA sequencing technology continues unabated, and is affecting all aspects of the biological and medical sciences. The training and recruitment of the next generation of researchers who are able to use and exploit the new technology is severely lacking and potentially negatively influencing research and development efforts to advance genome biology. Here we present a cross-disciplinary course that provides undergraduate students with practical experience in running a next generation sequencing instrument through to the analysis and annotation of the generated DNA sequences. Results Many labs across the world are installing next generation sequencing technology and we show that the undergraduate students produce quality sequence data and were excited to participate in cutting edge research. The students conducted the work flow from DNA extraction, library preparation, running the sequencing instrument, to the extraction and analysis of the data. They sequenced microbes, metagenomes, and a marine mammal, the Californian sea lion, Zalophus californianus. The students met sequencing quality controls, had no detectable contamination in the targeted DNA sequences, provided publication quality data, and became part of an international collaboration to investigate carcinomas in carnivores. Conclusions Students learned important skills for their future education and career opportunities, and a perceived increase in students’ ability to conduct independent scientific research was measured. DNA sequencing is rapidly expanding in the life sciences. Teaching undergraduates to use the latest technology to sequence genomic DNA ensures they are ready to meet the challenges of the genomic era and allows them to participate in annotating the tree of life. PMID:24007365
2b-RAD genotyping for population genomic studies of Chagas disease vectors: Rhodnius ecuadoriensis in Ecuador.

PubMed

Hernandez-Castro, Luis E; Paterno, Marta; Villacís, Anita G; Andersson, Björn; Costales, Jaime A; De Noia, Michele; Ocaña-Mayorga, Sofía; Yumiseva, Cesar A; Grijalva, Mario J; Llewellyn, Martin S

2017-07-01

Rhodnius ecuadoriensis is the main triatomine vector of Chagas disease, American trypanosomiasis, in Southern Ecuador and Northern Peru. Genomic approaches and next generation sequencing technologies have become powerful tools for investigating population diversity and structure which is a key consideration for vector control. Here we assess the effectiveness of three different 2b restriction site-associated DNA (2b-RAD) genotyping strategies in R. ecuadoriensis to provide sufficient genomic resolution to tease apart microevolutionary processes and undertake some pilot population genomic analyses. The 2b-RAD protocol was carried out in-house at a non-specialized laboratory using 20 R. ecuadoriensis adults collected from the central coast and southern Andean region of Ecuador, from June 2006 to July 2013. 2b-RAD sequencing was performed on an Illumina MiSeq instrument and the data were analyzed with the STACKS de novo pipeline for loci assembly and Single Nucleotide Polymorphism (SNP) discovery. Preliminary population genomic analyses (global AMOVA and Bayesian clustering) were implemented. Our results showed that the 2b-RAD genotyping protocol is effective for R. ecuadoriensis and likely for other triatomine species. However, only BcgI and CspCI restriction enzymes provided a number of markers suitable for population genomic analysis at the read depth we generated. Our preliminary genomic analyses detected a signal of genetic structuring across the study area. Our findings suggest that 2b-RAD genotyping is both a cost effective and methodologically simple approach for generating high resolution genomic data for Chagas disease vectors with the power to distinguish between different vector populations at epidemiologically relevant scales. As such, 2b-RAD represents a powerful tool in the hands of medical entomologists with limited access to specialized molecular biological equipment.
Structural phylogeny by profile extraction and multiple superimposition using electrostatic congruence as a discriminator

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chakraborty, Sandeep; Rao, Basuthkar J.; Baker, Nathan A.

2013-04-01

Phylogenetic analysis of proteins using multiple sequence alignment (MSA) assumes an underlying evolutionary relationship in these proteins which occasionally remains undetected due to considerable sequence divergence. Structural alignment programs have been developed to unravel such fuzzy relationships. However, none of these structure based methods have used electrostatic properties to discriminate between spatially equivalent residues. We present a methodology for MSA of a set of related proteins with known structures using electrostatic properties as an additional discriminator (STEEP). STEEP first extracts a profile, then generates a multiple structural superimposition providing a consolidated spatial framework for comparing residues and finally emits the MSA. Residues that are aligned differently by including or excluding electrostatic properties can be targeted by directed evolution experiments to transform the enzymatic properties of one protein into another. We have compared STEEP results to those obtained from a MSA program (ClustalW) and a structural alignment method (MUSTANG) for chymotrypsin serine proteases. Subsequently, we used PhyML to generate phylogenetic trees for the serine and metallo-β-lactamase superfamilies from the STEEP generated MSA, and corroborated the accepted relationships in these superfamilies. We have observed that STEEP acts as a functional classifier when electrostatic congruence is used as a discriminator, and thus identifies potential targets for directed evolution experiments. In summary, STEEP is unique among phylogenetic methods for its ability to use electrostatic congruence to specify mutations that might be the source of the functional divergence in a protein family. Based on our results, we also hypothesize that the active site and its close vicinity contains enough information to infer the correct phylogeny for related proteins.
Next-Generation Sequencing in the Mycology Lab.

PubMed

Zoll, Jan; Snelders, Eveline; Verweij, Paul E; Melchers, Willem J G

New state-of-the-art techniques in sequencing offer valuable tools both in the detection of mycobiota and in understanding the molecular mechanisms of resistance against antifungal compounds and of virulence. The introduction of new sequencing platforms with enhanced capacity and reduced costs for sequence analysis provides a potentially powerful tool in mycological diagnosis and research. In this review, we summarize the applications of next-generation sequencing techniques in mycology.
Rapid Sequencing of the Bamboo Mitochondrial Genome Using Illumina Technology and Parallel Episodic Evolution of Organelle Genomes in Grasses

PubMed Central

Ma, Peng-Fei; Guo, Zhen-Hua; Li, De-Zhu

2012-01-01

Background Compared to their counterparts in animals, the mitochondrial (mt) genomes of angiosperms exhibit a number of unique features. However, unravelling their evolution is hindered by the few completed genomes, of which are essentially Sanger sequenced. While next-generation sequencing technologies have revolutionized chloroplast genome sequencing, they are just beginning to be applied to angiosperm mt genomes. Chloroplast genomes of grasses (Poaceae) have undergone episodic evolution and the evolutionary rate was suggested to be correlated between chloroplast and mt genomes in Poaceae. It is interesting to investigate whether correlated rate change also occurred in grass mt genomes as expected under lineage effects. A time-calibrated phylogenetic tree is needed to examine rate change. Methodology/Principal Findings We determined a largely completed mt genome from a bamboo, Ferrocalamus rimosivaginus (Poaceae), through Illumina sequencing of total DNA. With combination of de novo and reference-guided assembly, 39.5-fold coverage Illumina reads were finally assembled into scaffolds totalling 432,839 bp. The assembled genome contains nearly the same genes as the completed mt genomes in Poaceae. For examining evolutionary rate in grass mt genomes, we reconstructed a phylogenetic tree including 22 taxa based on 31 mt genes. The topology of the well-resolved tree was almost identical to that inferred from chloroplast genome with only minor difference. The inconsistency possibly derived from long branch attraction in mtDNA tree. By calculating absolute substitution rates, we found significant rate change (∼4-fold) in mt genome before and after the diversification of Poaceae both in synonymous and nonsynonymous terms. Furthermore, the rate change was correlated with that of chloroplast genomes in grasses. 
Conclusions/Significance Our result demonstrates that it is a rapid and efficient approach to obtain angiosperm mt genome sequences using Illumina sequencing technology. The parallel episodic evolution of mt and chloroplast genomes in grasses is consistent with lineage effects. PMID:22272330
The effect of different control point sampling sequences on convergence of VMAT inverse planning

NASA Astrophysics Data System (ADS)

Pardo Montero, Juan; Fenwick, John D.

2011-04-01

A key component of some volumetric-modulated arc therapy (VMAT) optimization algorithms is the progressive addition of control points to the optimization. This idea was introduced in Otto's seminal VMAT paper, in which a coarse sampling of control points was used at the beginning of the optimization and new control points were progressively added one at a time. A different form of the methodology is also present in the RapidArc optimizer, which adds new control points in groups called 'multiresolution levels', each doubling the number of control points in the optimization. This progressive sampling accelerates convergence, improving the results obtained, and has similarities with the ordered subset algorithm used to accelerate iterative image reconstruction. In this work we have used a VMAT optimizer developed in-house to study the performance of optimization algorithms which use different control point sampling sequences, most of which fall into three different classes: doubling sequences, which add new control points in groups such that the number of control points in the optimization is (roughly) doubled; Otto-like progressive sampling which adds one control point at a time, and equi-length sequences which contain several multiresolution levels each with the same number of control points. Results are presented in this study for two clinical geometries, prostate and head-and-neck treatments. A dependence of the quality of the final solution on the number of starting control points has been observed, in agreement with previous works. We have found that some sequences, especially E20 and E30 (equi-length sequences with 20 and 30 multiresolution levels, respectively), generate better results than a 5 multiresolution level RapidArc-like sequence. 
The final value of the cost function is reduced up to 20%, such reductions leading to small improvements in dosimetric parameters characterizing the treatments—slightly more homogeneous target doses and better sparing of the organs at risk.
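The three classes of control point sampling sequence described in this record can be sketched as schedules of cumulative control point counts, one entry per multiresolution level. The Python sketch below is an interpretation, not the authors' optimizer: the function names, the starting count of 12 and the total of 180 control points are assumptions, and "equi-length" is read here as each level adding the same number of control points.

```python
def doubling_schedule(start, total):
    """Cumulative counts that roughly double at each level, capped at total."""
    counts = [start]
    while counts[-1] < total:
        counts.append(min(2 * counts[-1], total))
    return counts

def one_at_a_time_schedule(start, total):
    """Otto-like progressive sampling: one new control point per step."""
    return list(range(start, total + 1))

def equi_length_schedule(levels, total):
    """Cumulative counts when every level adds (nearly) the same number."""
    return [round(total * (i + 1) / levels) for i in range(levels)]

TOTAL = 180  # assumed final number of control points
print(doubling_schedule(12, TOTAL))            # a 5-level doubling sequence
print(equi_length_schedule(20, TOTAL)[:4])     # first levels of an E20-like sequence
print(len(one_at_a_time_schedule(12, TOTAL)))  # number of steps, adding one at a time
```

In an optimizer, each entry would mark the point at which new control points are interpolated into the arc before the next block of iterations starts.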
Whole Genome Sequences of Three Treponema pallidum ssp. pertenue Strains: Yaws and Syphilis Treponemes Differ in Less than 0.2% of the Genome Sequence

PubMed Central

Chen, Lei; Pospíšilová, Petra; Strouhal, Michal; Qin, Xiang; Mikalová, Lenka; Norris, Steven J.; Muzny, Donna M.; Gibbs, Richard A.; Fulton, Lucinda L.; Sodergren, Erica; Weinstock, George M.; Šmajs, David

2012-01-01

Background The yaws treponemes, Treponema pallidum ssp. pertenue (TPE) strains, are closely related to syphilis causing strains of Treponema pallidum ssp. pallidum (TPA). Both yaws and syphilis are distinguished on the basis of epidemiological characteristics, clinical symptoms, and several genetic signatures of the corresponding causative agents. Methodology/Principal Findings To precisely define genetic differences between TPA and TPE, high-quality whole genome sequences of three TPE strains (Samoa D, CDC-2, Gauthier) were determined using next-generation sequencing techniques. TPE genome sequences were compared to four genomes of TPA strains (Nichols, DAL-1, SS14, Chicago). The genome structure was identical in all three TPE strains with similar length ranging between 1,139,330 bp and 1,139,744 bp. No major genome rearrangements were found when compared to the four TPA genomes. The whole genome nucleotide divergence (dA) between TPA and TPE subspecies was 4.7 and 4.8 times higher than the observed nucleotide diversity (π) among TPA and TPE strains, respectively, corresponding to 99.8% identity between TPA and TPE genomes. A set of 97 (9.9%) TPE genes encoded proteins containing two or more amino acid replacements or other major sequence changes. The TPE divergent genes were mostly from the group encoding potential virulence factors and genes encoding proteins with unknown function. Conclusions/Significance Hypothetical genes, with genetic differences, consistently found between TPE and TPA strains are candidates for syphilitic treponemes virulence factors. Seventeen TPE genes were predicted under positive selection, and eleven of them coded either for predicted exported proteins or membrane proteins suggesting their possible association with the cell surface. Sequence changes between TPE and TPA strains and changes specific to individual strains represent suitable targets for subspecies- and strain-specific molecular diagnostics. PMID:22292095
Novel Degenerate PCR Method for Whole-Genome Amplification Applied to Peru Margin (ODP Leg 201) Subsurface Samples

PubMed Central

Martino, Amanda J.; Rhodes, Matthew E.; Biddle, Jennifer F.; Brandt, Leah D.; Tomsho, Lynn P.; House, Christopher H.

2011-01-01

A degenerate polymerase chain reaction (PCR)-based method of whole-genome amplification, designed to work fluidly with 454 sequencing technology, was developed and tested for use on deep marine subsurface DNA samples. While optimized here for use with Roche 454 technology, the general framework presented may be applicable to other next generation sequencing systems as well (e.g., Illumina, Ion Torrent). The method, which we have called random amplification metagenomic PCR (RAMP), involves the use of specific primers from Roche 454 amplicon sequencing, modified by the addition of a degenerate region at the 3′ end. It utilizes a PCR reaction, which resulted in no amplification from blanks, even after 50 cycles of PCR. After efforts to optimize experimental conditions, the method was tested with DNA extracted from cultured E. coli cells, and genome coverage was estimated after sequencing on three different occasions. Coverage did not vary greatly with the different experimental conditions tested, and was around 62% with a sequencing effort equivalent to a theoretical genome coverage of 14.10×. The GC content of the sequenced amplification product was within 2% of the predicted values for this strain of E. coli. The method was also applied to DNA extracted from marine subsurface samples from ODP Leg 201 site 1229 (Peru Margin), and results of a taxonomic analysis revealed microbial communities dominated by Proteobacteria, Chloroflexi, Firmicutes, Euryarchaeota, and Crenarchaeota, among others. These results were similar to those obtained previously for those samples; however, variations in the proportions of taxa identified illustrate well the generally accepted view that community analysis is sensitive to both the amplification technique used and the method of assigning sequences to taxonomic groups. Overall, we find that RAMP represents a valid methodology for amplifying metagenomes from low-biomass samples. PMID:22319519
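The gap between 62% observed coverage and a theoretical 14.10× effort can be put in context with the standard Poisson (Lander-Waterman style) approximation, under which uniformly placed reads at mean depth c cover a fraction 1 - exp(-c) of the genome. The Python sketch below applies that textbook model. It is an assumption added here for illustration, not an analysis from the abstract, and it ignores read-length edge effects.

```python
import math

def expected_fraction_covered(mean_depth):
    """Poisson approximation: fraction of bases hit by at least one read."""
    return 1.0 - math.exp(-mean_depth)

def equivalent_uniform_depth(fraction_covered):
    """Mean depth at which uniform sampling would cover the given fraction."""
    return -math.log(1.0 - fraction_covered)

# Values taken from the abstract: 14.10x theoretical effort, about 62% observed.
print(f"expected at 14.10x: {expected_fraction_covered(14.10):.6f}")
print(f"uniform depth matching 62%: {equivalent_uniform_depth(0.62):.2f}x")
```

Under uniform sampling, 14.10× would cover essentially the whole genome, so the observed figure is consistent with strongly uneven amplification, which is a common property of whole-genome amplification methods.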
Aftershock Forecasting: Recent Developments and Lessons from the 2016 M5.8 Pawnee, Oklahoma, Earthquake

NASA Astrophysics Data System (ADS)

Michael, A. J.; Field, E. H.; Hardebeck, J.; Llenos, A. L.; Milner, K. R.; Page, M. T.; Perry, S. C.; van der Elst, N.; Wein, A. M.

2016-12-01

After the Mw 5.8 Pawnee, Oklahoma, earthquake of September 3, 2016 the USGS issued a series of aftershock forecasts for the next month and year. These forecasts were aimed at the emergency response community, those making decisions about well operations in the affected region, and the general public. The forecasts were generated manually using methods planned for automatically released Operational Aftershock Forecasts. The underlying method is from Reasenberg and Jones (Science, 1989) with improvements recently published in Page et al. (BSSA, 2016), implemented in a JAVA Graphical User Interface and presented in a template that is under development. The methodological improvements include initial models based on the tectonic regime as defined by Garcia et al. (BSSA, 2012) and the inclusion of both uncertainty in the clustering parameters and natural random variability. We did not utilize the time-dependent magnitude of completeness model from Page et al. because it applies only to teleseismic events recorded by NEIC. The parameters for Garcia's Generic Active Continental Region underestimated the modified-Omori decay parameter and underestimated the aftershock rate by a factor of 2. And the sequence following the Mw 5.7 Prague, Oklahoma, earthquake of November 6, 2011 was about 3 to 4 times more productive than the Pawnee sequence. The high productivity for these potentially induced sequences is consistent with an increase in productivity in Oklahoma since 2009 (Llenos and Michael, BSSA, 2013) and makes a general tectonic model inapplicable to sequences in this region. Soon after the mainshock occurred, the forecasts relied on the sequence specific parameters. After one month, the Omori decay parameter p is less than one, implying a very long-lived sequence. However, the decay parameter is known to be biased low at early times due to secondary aftershock triggering, and the p-value determined early in the sequence may be inaccurate for long-term forecasting.
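The Reasenberg and Jones rate model cited above can be sketched as follows. The rate of aftershocks with magnitude at least M at time t after a mainshock of magnitude Mm is 10^(a + b(Mm − M)) · (t + c)^(−p). The parameter values below are the generic California values commonly quoted for that model and are used only as illustrative assumptions; they are not the Oklahoma sequence-specific values discussed in the abstract.

```python
import math

def rj_rate(t, mag_min, mainshock_mag, a, b, c, p):
    """Reasenberg-Jones rate (events/day) of aftershocks with M >= mag_min at t days."""
    return 10 ** (a + b * (mainshock_mag - mag_min)) * (t + c) ** (-p)

def rj_expected_count(t1, t2, mag_min, mainshock_mag, a, b, c, p):
    """Closed-form integral of rj_rate from t1 to t2 (days)."""
    k = 10 ** (a + b * (mainshock_mag - mag_min))
    if abs(p - 1.0) < 1e-12:
        return k * math.log((t2 + c) / (t1 + c))
    return k * ((t2 + c) ** (1 - p) - (t1 + c) ** (1 - p)) / (1 - p)

def prob_one_or_more(expected_count):
    """Poisson probability of at least one event given the expected count."""
    return 1.0 - math.exp(-expected_count)

# Illustrative (assumed) parameters, not the fitted Pawnee values.
generic = dict(a=-1.67, b=0.91, c=0.05)
for p in (0.9, 1.08):
    n = rj_expected_count(1.0, 366.0, mag_min=3.0, mainshock_mag=5.8, p=p, **generic)
    print(f"p={p}: expected M>=3 count in days 1-366 = {n:.1f}")
```

Comparing the two p values shows why a decay parameter below one implies a long-lived sequence: the expected count keeps growing noticeably at late times.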
DOE Office of Scientific and Technical Information (OSTI.GOV)

Curry, J J; Gallagher, D W; Modarres, M

Appendices are presented concerning isolation condenser makeup; vapor suppression system; station air system; reactor building closed cooling water system; turbine building secondary closed water system; service water system; emergency service water system; fire protection system; emergency ac power; dc power system; event probability estimation; methodology of accident sequence quantification; and assignment of dominant sequences to release categories.
Testing Extension Services through AKAP Models

ERIC Educational Resources Information Center

De Rosa, Marcello; Bartoli, Luca; La Rocca, Giuseppe

2014-01-01

Purpose: The aim of the paper is to analyse the attitude of Italian farms in gaining access to agricultural extension services (AES). Design/methodology/approach: The ways Italian farms use AES are described through the AKAP (Awareness, Knowledge, Adoption, Product) sequence. This article investigated the AKAP sequence by submitting a…
A Contextualized, Differential Sequence Mining Method to Derive Students' Learning Behavior Patterns

ERIC Educational Resources Information Center

Kinnebrew, John S.; Loretz, Kirk M.; Biswas, Gautam

2013-01-01

Computer-based learning environments can produce a wealth of data on student learning interactions. This paper presents an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs a novel combination of sequence mining techniques to identify differentially…
Analysis of Illumina Microbial Assemblies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clum, Alicia; Foster, Brian; Froula, Jeff

2010-05-28

Since the emergence of second generation sequencing technologies, the evaluation of different sequencing approaches and their assembly strategies for different types of genomes has become an important undertaking. Next generation sequencing technologies dramatically increase sequence throughput while decreasing cost, making them an attractive tool for whole genome shotgun sequencing. To compare different approaches for de-novo whole genome assembly, appropriate tools and a solid understanding of both quantity and quality of the underlying sequence data are crucial. Here, we performed an in-depth analysis of short-read Illumina sequence assembly strategies for bacterial and archaeal genomes. Different types of Illumina libraries as well as different trim parameters and assemblers were evaluated. Results of the comparative analysis and sequencing platforms will be presented. The goal of this analysis is to develop a cost-effective approach for the increased throughput of the generation of high quality microbial genomes.

Detection of a divergent variant of grapevine virus F by next-generation sequencing.

PubMed

Molenaar, Nicholas; Burger, Johan T; Maree, Hans J

2015-08-01

The complete genome sequence of a South African isolate of grapevine virus F (GVF) is presented. It was first detected by metagenomic next-generation sequencing of field samples and validated through direct Sanger sequencing. The genome sequence of GVF isolate V5 consists of 7539 nucleotides and contains a poly(A) tail. It has a typical vitivirus genome arrangement that comprises five open reading frames (ORFs), which share only 88.96 % nucleotide sequence identity with the existing complete GVF genome sequence (JX105428).
Self-Organizing Hidden Markov Model Map (SOHMMM): Biological Sequence Clustering and Cluster Visualization.

PubMed

Ferles, Christos; Beaufort, William-Scott; Ferle, Vanessa

2017-01-01

The present study devises mapping methodologies and projection techniques that visualize and demonstrate biological sequence data clustering results. The Sequence Data Density Display (SDDD) and Sequence Likelihood Projection (SLP) visualizations represent the input symbolical sequences in a lower-dimensional space in such a way that the clusters and relations of data elements are depicted graphically. Both operate in combination/synergy with the Self-Organizing Hidden Markov Model Map (SOHMMM). The resulting unified framework is in position to analyze automatically and directly raw sequence data. This analysis is carried out with little, or even complete absence of, prior information/domain knowledge.
A Bidding Methodology by Nash Equilibrium for Finite Generators Participating in Imperfect Electricity Markets

NASA Astrophysics Data System (ADS)

Satyaramesh, P. V.

2014-01-01

This paper presents an application of finite n-person non-cooperative game theory for analyzing bidding strategies of generators in a deregulated energy marketplace with Pool Bilateral contracts so as to maximize their net profits. A new methodology for building bidding strategies for generators participating in an oligopoly electricity market is proposed. It is assumed that each generator bids a supply function. The methodology finds the coefficients in the supply function of each generator that maximize its benefits in an environment of competing rival bidders. A natural choice for developing strategies is the Nash Equilibrium (NE) model incorporating mixed strategies, which is used to solve the bidding problem of the electricity market. Associated optimal profits are evaluated for each combination of pure bidding strategies of the generators, and a payoff matrix is constructed. The optimal payoff is calculated by using NE. An attempt has also been made to minimize the gap between the optimal payoff and the payoff obtained by a possible mixed strategies combination. The algorithm is coded in MATLAB. A numerical example is used to illustrate the essential features of the approach, and the results are shown to be the optimal values.
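The payoff-matrix step can be illustrated with a minimal sketch that searches a two-generator payoff matrix for pure-strategy Nash equilibria by checking best responses. This is a simplification: the paper works with supply-function coefficients, mixed strategies, and MATLAB, whereas the payoff numbers and function name here are hypothetical and the code is Python.

```python
from itertools import product

def pure_nash_equilibria(payoff_a, payoff_b):
    """Return (i, j) cells where neither player gains by deviating alone.

    payoff_a[i][j] / payoff_b[i][j]: profit of generator A / B when A plays
    strategy i (row) and B plays strategy j (column).
    """
    n_rows, n_cols = len(payoff_a), len(payoff_a[0])
    equilibria = []
    for i, j in product(range(n_rows), range(n_cols)):
        best_for_a = max(payoff_a[r][j] for r in range(n_rows))  # A's best reply to j
        best_for_b = max(payoff_b[i][c] for c in range(n_cols))  # B's best reply to i
        if payoff_a[i][j] == best_for_a and payoff_b[i][j] == best_for_b:
            equilibria.append((i, j))
    return equilibria

# Hypothetical profits: strategy 0 = bid low, strategy 1 = bid high.
profit_a = [[4, 7], [2, 6]]
profit_b = [[4, 2], [7, 6]]
print(pure_nash_equilibria(profit_a, profit_b))
```

Some games have no pure-strategy equilibrium at all (the function then returns an empty list), which is one reason to consider mixed strategies as the abstract does.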
Economic evaluation of genomic sequencing in the paediatric population: a critical review.

PubMed

Alam, Khurshid; Schofield, Deborah

2018-05-24

Systematic evidence is critical to the formulation of national health policy to provide public funding for the integration of genomic sequencing into routine clinical care. The purpose of this review is to present systematic evidence on the economic evaluation of genomic sequencing conducted for paediatric patients in clinical care, and to identify any gaps in the methodology of economic evaluations. We undertook a critical review of the empirical evidence from economic evaluations of genomic sequencing among paediatric patients searching five electronic databases. Our inclusion criteria were limited to literature published in the English language between 2010 and 2017 in OECD countries. Articles that met our inclusion criteria were assessed using a recognised checklist for a well-designed economic evaluation. We found 11 full-text articles that met our inclusion criteria. Our analysis found that genomic sequencing markedly increased the diagnostic rate to 16-79%, but lowered the cost by 11-64% compared to the standard diagnostic pathway. Only five recent studies in paediatric clinical cohorts met most of the criteria for a well-designed economic evaluation and demonstrated cost-effectiveness of genomic sequencing in paediatric clinical cohorts of patients. Our review identified the need for improvement in the rigour of the methodologies used to provide robust evidence for the formulation of health policy on public funding to integrate genomic sequencing into routine clinical care. Nonetheless, there is emerging evidence of the cost-effectiveness of genomic sequencing over usual care for paediatric patients.
Statistical Knowledge for Teaching: Exploring it in the Classroom

ERIC Educational Resources Information Center

Burgess, Tim

2009-01-01

This paper first reports on the methodology of a study of teacher knowledge for statistics, conducted in a classroom at the primary school level. The methodology included videotaping of a sequence of lessons that involved students in investigating multivariate data sets, followed up by audiotaped interviews with each teacher. These stimulated…
Autonomously generating operations sequences for a Mars Rover using AI-based planning

NASA Technical Reports Server (NTRS)

Sherwood, Rob; Mishkin, Andrew; Estlin, Tara; Chien, Steve; Backes, Paul; Cooper, Brian; Maxwell, Scott; Rabideau, Gregg

2001-01-01

This paper discusses a proof-of-concept prototype for ground-based automatic generation of validated rover command sequences from high-level science and engineering activities. This prototype is based on ASPEN, the Automated Scheduling and Planning Environment. This Artificial Intelligence (AI) based planning and scheduling system will automatically generate a command sequence that will execute within resource constraints and satisfy flight rules.
Finding Sequences for over 270 Orphan Enzymes

PubMed Central

Shearer, Alexander G.; Altman, Tomer; Rhee, Christine D.

2014-01-01

Despite advances in sequencing technology, there are still significant numbers of well-characterized enzymatic activities for which there are no known associated sequences. These ‘orphan enzymes’ represent glaring holes in our biological understanding, and it is a top priority to reunite them with their coding sequences. Here we report a methodology for resolving orphan enzymes through a combination of database search and literature review. Using this method we were able to reconnect over 270 orphan enzymes with their corresponding sequence. This success points toward how we can systematically eliminate the remaining orphan enzymes and prevent the introduction of future orphan enzymes. PMID:24826896
De Novo Assembly of the Japanese Flounder (Paralichthys olivaceus) Spleen Transcriptome to Identify Putative Genes Involved in Immunity

PubMed Central

Huang, Lin; Li, Guiyang; Mo, Zhaolan; Xiao, Peng; Li, Jie; Huang, Jie

2015-01-01

Background Japanese flounder (Paralichthys olivaceus) is an economically important marine fish in Asia and has suffered from disease outbreaks caused by various pathogens, which requires more information for immune relevant genes on genome background. However, genomic and transcriptomic data for Japanese flounder remain scarce, which limits studies on the immune system of this species. In this study, we characterized the Japanese flounder spleen transcriptome using an Illumina paired-end sequencing platform to identify putative genes involved in immunity. Methodology/Principal Findings A cDNA library from the spleen of P. olivaceus was constructed and randomly sequenced using an Illumina technique. The removal of low quality reads generated 12,196,968 trimmed reads, which assembled into 96,627 unigenes. A total of 21,391 unigenes (22.14%) were annotated in the NCBI Nr database, and only 1.1% of the BLASTx top-hits matched P. olivaceus protein sequences. Approximately 12,503 (58.45%) unigenes were categorized into three Gene Ontology groups, 19,547 (91.38%) were classified into 26 Cluster of Orthologous Groups, and 10,649 (49.78%) were assigned to six Kyoto Encyclopedia of Genes and Genomes pathways. Furthermore, 40,928 putative simple sequence repeats and 47,362 putative single nucleotide polymorphisms were identified. Importantly, we identified 1,563 putative immune-associated unigenes that mapped to 15 immune signaling pathways. Conclusions/Significance The P. olivaceus transcriptome data provides a rich source to discover and identify new genes, and the immune-relevant sequences identified here will facilitate our understanding of the mechanisms involved in the immune response. Furthermore, the plentiful potential SSRs and SNPs found in this study are important resources with respect to future development of a linkage map or marker assisted breeding programs for the flounder. PMID:25723398
Application of Metagenomic Sequencing to Food Safety: Detection of Shiga Toxin-Producing Escherichia coli on Fresh Bagged Spinach

PubMed Central

Leonard, Susan R.; Mammel, Mark K.; Lacher, David W.

2015-01-01

Culture-independent diagnostics reduce the reliance on traditional (and slower) culture-based methodologies. Here we capitalize on advances in next-generation sequencing (NGS) to apply this approach to food pathogen detection utilizing NGS as an analytical tool. In this study, spiking spinach with Shiga toxin-producing Escherichia coli (STEC) following an established FDA culture-based protocol was used in conjunction with shotgun metagenomic sequencing to determine the limits of detection, sensitivity, and specificity levels and to obtain information on the microbiology of the protocol. We show that an expected level of contamination (∼10 CFU/100 g) could be adequately detected (including key virulence determinants and strain-level specificity) within 8 h of enrichment at a sequencing depth of 10,000,000 reads. We also rationalize the relative benefit of static versus shaking culture conditions and the addition of selected antimicrobial agents, thereby validating the long-standing culture-based parameters behind such protocols. Moreover, the shotgun metagenomic approach was informative regarding the dynamics of microbial communities during the enrichment process, including initial surveys of the microbial loads associated with bagged spinach; the microbes found included key genera such as Pseudomonas, Pantoea, and Exiguobacterium. Collectively, our metagenomic study highlights and considers various parameters required for transitioning to such sequencing-based diagnostics for food safety and the potential to develop better enrichment processes in a high-throughput manner not previously possible. Future studies will investigate new species-specific DNA signature target regimens, rational design of medium components in concert with judicious use of additives, such as antibiotics, and alterations in the sample processing protocol to enhance detection. PMID:26386062
A DNA Barcoding Method to Discriminate between the Model Plant Brachypodium distachyon and Its Close Relatives B. stacei and B. hybridum (Poaceae)

PubMed Central

López-Alvarez, Diana; López-Herranz, Maria Luisa; Betekhtin, Alexander; Catalán, Pilar

2012-01-01

Background Brachypodium distachyon s. l. has been widely investigated across the world as a model plant for temperate cereals and biofuel grasses. However, this annual plant shows three cytotypes that have been recently recognized as three independent species, the diploids B. distachyon (2n = 10) and B. stacei (2n = 20) and their derived allotetraploid B. hybridum (2n = 30). Methodology/Principal Findings We propose a DNA barcoding approach that consists of a rapid, accurate and automatable species identification method using the standard DNA sequences of complementary plastid (trnLF) and nuclear (ITS, GI) loci. The highly homogenous but largely divergent B. distachyon and B. stacei diploids could be easily distinguished (100% identification success) using direct trnLF (2.4%), ITS (5.5%) or GI (3.8%) sequence divergence. By contrast, B. hybridum could only be unambiguously identified through the use of combined trnLF+ITS sequences (90% of identification success) or by cloned GI sequences (96.7%) that showed 5.4% (ITS) and 4% (GI) rate divergence between the two parental sequences found in the allopolyploid. Conclusion/Significance Our data provide an unbiased and effective barcode to differentiate these three closely-related species from one another. This procedure overcomes the taxonomic uncertainty generated from methods based on morphology or flow cytometry identifications that have resulted in some misclassifications of the model plant and its allies. Our study also demonstrates that the allotetraploid B. hybridum has resulted from bi-directional crosses of B. distachyon and B. stacei plants acting either as maternal or paternal parents. PMID:23240000
Metagenome assembly through clustering of next-generation sequencing data using protein sequences.

PubMed

Sim, Mikang; Kim, Jaebum

2015-02-01

The study of environmental microbial communities, called metagenomics, has gained a lot of attention because of the recent advances in next-generation sequencing (NGS) technologies. Microbes play a critical role in changing their environments, and the mode of their effect can be resolved by investigating metagenomes. However, the complexity of metagenomes, such as the combination of multiple microbes and different species abundance, makes metagenome assembly tasks more challenging. In this paper, we developed a new metagenome assembly method by utilizing protein sequences, in addition to the NGS read sequences. Our method (i) builds read clusters by using mapping information against available protein sequences, and (ii) creates contig sequences by finding consensus sequences through probabilistic choices from the read clusters. By using simulated NGS read sequences from real microbial genome sequences, we evaluated our method in comparison with four existing assembly programs. We found that our method could generate relatively long and accurate metagenome assemblies, indicating that the idea of using protein sequences, as a guide for the assembly, is promising. Copyright © 2015 Elsevier B.V. All rights reserved.
A normative price for energy from an electricity generation system: An Owner-dependent Methodology for Energy Generation (system) Assessment (OMEGA). Volume 1: Summary

NASA Technical Reports Server (NTRS)

Chamberlain, R. G.; Mcmaster, K. M.

1981-01-01

The utility owned solar electric system methodology is generalized and updated. The net present value of the system is determined by consideration of all financial benefits and costs (including a specified return on investment). Life cycle costs, life cycle revenues, and residual system values are obtained. Break even values of system parameters are estimated by setting the net present value to zero. While the model was designed for photovoltaic generators with a possible thermal energy byproduct, its applicability is not limited to such systems. The resulting owner-dependent methodology for energy generation system assessment consists of a few equations that can be evaluated without the aid of a high-speed computer.
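The break-even idea, setting net present value to zero and solving for a system parameter, can be sketched for the simplest case of a constant energy price. The cash-flow structure below (one up-front capital cost, constant yearly energy and operating cost, no residual value or taxes) is a deliberately reduced assumption and not the OMEGA equations themselves; all names and numbers are hypothetical.

```python
def net_present_value(cash_flows, discount_rate):
    """NPV of yearly cash flows; cash_flows[0] occurs now (year 0)."""
    return sum(cf / (1 + discount_rate) ** year for year, cf in enumerate(cash_flows))

def break_even_price(capital_cost, annual_energy_kwh, annual_om_cost, years, discount_rate):
    """Energy price per kWh at which NPV is zero.

    NPV is linear in the price, so the zero can be found in closed form.
    """
    annuity = sum(1 / (1 + discount_rate) ** y for y in range(1, years + 1))
    return (capital_cost + annual_om_cost * annuity) / (annual_energy_kwh * annuity)

# Hypothetical system: all figures are illustrative only.
price = break_even_price(capital_cost=10_000, annual_energy_kwh=15_000,
                         annual_om_cost=200, years=20, discount_rate=0.08)
flows = [-10_000] + [price * 15_000 - 200] * 20
print("break-even price per kWh:", price)
print("NPV at that price:", net_present_value(flows, 0.08))
```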
Advanced Applications of Next-Generation Sequencing Technologies to Orchid Biology.

PubMed

Yeh, Chuan-Ming; Liu, Zhong-Jian; Tsai, Wen-Chieh

2018-01-01

Next-generation sequencing technologies are revolutionizing biology by permitting transcriptome sequencing, whole-genome sequencing and resequencing, and genome-wide single nucleotide polymorphism profiling. Orchid research has benefited from this breakthrough, and a few orchid genomes are now available; new biological questions can be approached and new breeding strategies can be designed. The first part of this review describes the unique features of orchid biology. The second part provides an overview of the current next-generation sequencing platforms, many of which are already used in plant laboratories. The third part summarizes the state of orchid transcriptome and genome sequencing and illustrates current achievements. The genetic sequences currently obtained will not only provide a broad scope for the study of orchid biology, but also serve as a starting point for uncovering the mystery of orchid evolution.
An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads

PubMed Central

2013-01-01

Background Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. Results Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. Conclusions Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. PMID:24564333
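A single level of graph coarsening can be sketched as below. The abstract does not say which coarsening rule the authors use; heavy-edge matching, a common heuristic, is assumed here, and the read names and overlap weights are hypothetical.

```python
def coarsen_once(edges):
    """One heavy-edge-matching pass over an overlap graph.

    edges: dict mapping frozenset({u, v}) -> overlap weight.
    Returns (mapping from node to supernode, coarsened edge dict).
    """
    matched = {}
    # Visit edges from heaviest to lightest; merge endpoints that are still free.
    for pair, _weight in sorted(edges.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
        u, v = sorted(pair)
        if u not in matched and v not in matched:
            matched[u] = matched[v] = f"{u}+{v}"
    nodes = {n for pair in edges for n in pair}
    mapping = {n: matched.get(n, n) for n in nodes}
    coarse = {}
    for pair, weight in edges.items():
        u, v = tuple(pair)
        cu, cv = mapping[u], mapping[v]
        if cu != cv:  # edges inside a supernode disappear
            key = frozenset((cu, cv))
            coarse[key] = coarse.get(key, 0) + weight
    return mapping, coarse

# Hypothetical chain of four reads with overlap lengths as weights.
overlaps = {
    frozenset(("r1", "r2")): 80,
    frozenset(("r2", "r3")): 30,
    frozenset(("r3", "r4")): 75,
}
mapping, coarse = coarsen_once(overlaps)
print(mapping)
print(coarse)
```

Applying the function repeatedly to its own output yields the series of progressively smaller graphs that the abstract describes as a spectrum of information granularity.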
Heterogeneous Suppression of Sequential Effects in Random Sequence Generation, but Not in Operant Learning.

PubMed

Shteingart, Hanan; Loewenstein, Yonatan

2016-01-01

There is a long history of experiments in which participants are instructed to generate a long sequence of binary random numbers. The scope of this line of research has shifted over the years from identifying the basic psychological principles and/or the heuristics that lead to deviations from randomness, to one of predicting future choices. In this paper, we used generalized linear regression and the framework of Reinforcement Learning in order to address both points. In particular, we used logistic regression analysis in order to characterize the temporal sequence of participants' choices. Surprisingly, a population analysis indicated that the contribution of the most recent trial has only a weak effect on behavior, compared to more preceding trials, a result that seems irreconcilable with standard sequential effects that decay monotonically with the delay. However, when considering each participant separately, we found that the magnitudes of the sequential effect are a monotonically decreasing function of the delay, yet these individual sequential effects are largely averaged out in a population analysis because of heterogeneity. The substantial behavioral heterogeneity in this task is further demonstrated quantitatively by considering the predictive power of the model. We show that a heterogeneous model of sequential dependencies captures the structure available in random sequence generation. Finally, we show that the results of the logistic regression analysis can be interpreted in the framework of reinforcement learning, allowing us to compare the sequential effects in the random sequence generation task to those in an operant learning task. We show that in contrast to the random sequence generation task, sequential effects in operant learning are far more homogenous across the population. These results suggest that in the random sequence generation task, different participants adopt different cognitive strategies to suppress sequential dependencies when generating the "random" sequences.
A high-speed on-chip pseudo-random binary sequence generator for multi-tone phase calibration

NASA Astrophysics Data System (ADS)

Gommé, Liesbeth; Vandersteen, Gerd; Rolain, Yves

2011-07-01

An on-chip reference generator is conceived by adopting the technique of decimating a pseudo-random binary sequence (PRBS) signal in parallel sequences. This is of great benefit when high-speed generation of PRBS and PRBS-derived signals is the objective. The design is implemented with standard CMOS logic that is available in commercial libraries to provide the logic functions for the generator. The design allows the user to select the periodicity of the PRBS and the PRBS-derived signals. The characterization of the on-chip generator marks its performance and reveals promising specifications.
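The property that makes parallel decimation attractive can be shown in software: taking every second bit of a maximal-length PRBS gives the same sequence again, only phase shifted, so several half-rate streams can be interleaved into one full-rate PRBS. The sketch below uses a PRBS7 linear feedback shift register (polynomial x^7 + x^6 + 1) as an assumed example; the record does not state which polynomial or architecture the chip uses.

```python
def prbs7(n_bits, seed=0x7F):
    """Generate n_bits of PRBS7 from a 7-bit Fibonacci LFSR (x^7 + x^6 + 1)."""
    if seed & 0x7F == 0:
        raise ValueError("seed must be non-zero")
    state = seed & 0x7F
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 6) ^ (state >> 5)) & 1  # XOR of the two tap bits
        state = ((state << 1) | new_bit) & 0x7F
        out.append(new_bit)
    return out

def decimate(period_bits, factor, phase=0):
    """Take every factor-th bit of a periodic sequence, wrapping around."""
    n = len(period_bits)
    return [period_bits[(factor * k + phase) % n] for k in range(n)]

def is_cyclic_shift(a, b):
    """True if b is a rotation of a."""
    return len(a) == len(b) and any(a[s:] + a[:s] == b for s in range(len(a)))

one_period = prbs7(127)  # the period of PRBS7 is 2**7 - 1 = 127 bits
even_stream = decimate(one_period, 2, phase=0)
odd_stream = decimate(one_period, 2, phase=1)
print(is_cyclic_shift(one_period, even_stream), is_cyclic_shift(one_period, odd_stream))
```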
High-throughput genotyping of hop (Humulus lupulus L.) utilising diversity arrays technology (DArT).

PubMed

Howard, E L; Whittock, S P; Jakše, J; Carling, J; Matthews, P D; Probasco, G; Henning, J A; Darby, P; Cerenak, A; Javornik, B; Kilian, A; Koutoulis, A

2011-05-01

Implementation of molecular methods in hop (Humulus lupulus L.) breeding is dependent on the availability of sizeable numbers of polymorphic markers and a comprehensive understanding of genetic variation. However, use of molecular marker technology is limited due to expense, time inefficiency, laborious methodology and dependence on DNA sequence information. Diversity arrays technology (DArT) is a high-throughput cost-effective method for the discovery of large numbers of quality polymorphic markers without reliance on DNA sequence information. This study is the first to utilise DArT for hop genotyping, identifying 730 polymorphic markers from 92 hop accessions. The marker quality was high and similar to the quality of DArT markers previously generated for other species; although percentage polymorphism and polymorphism information content (PIC) were lower than in previous studies deploying other marker systems in hop. Genetic relationships in hop illustrated by DArT in this study coincide with knowledge generated using alternate methods. Several statistical analyses separated the hop accessions into genetically differentiated North American and European groupings, with hybrids between the two groups clearly distinguishable. Levels of genetic diversity were similar in the North American and European groups, but higher in the hybrid group. The markers produced from this time and cost-efficient genotyping tool will be a valuable resource for numerous applications in hop breeding and genetics studies, such as mapping, marker-assisted selection, genetic identity testing, guidance in the maintenance of genetic diversity and the directed breeding of superior cultivars.
Determination of Spoilage Microbiota of Pacific White Shrimp During Ambient and Cold Storage Using Next-Generation Sequencing and Culture-Dependent Method.

PubMed

Yang, Sheng-Ping; Xie, Jing; Qian, Yun-Fang

2017-05-01

This study was conducted to determine the initial and spoilage microbiota of Pacific white shrimp during ambient and cold storage using next-generation sequencing (NGS) and a culture-dependent method. The quality changes were also evaluated by sensory analysis and total volatile basic nitrogen (TVB-N) values. After 1 d of storage, the psychrotrophic bacteria were only 5.97 log CFU/g, accounting for 1.1% of the mesophilic bacteria counts (7.94 log CFU/g). The psychrotrophic bacteria counts exceeded the counts of mesophilic bacteria for shrimp stored at 4 °C after 6 d of storage, indicating that psychrotrophic bacteria became predominant. The NGS was used to identify the bacterial species in samples stored at 25 and 4 °C. The results showed that the dominant microorganisms were Vibrio at 25 °C, and Acinetobacter, Psychrobacter, and Shewanella at 4 °C. The culture-dependent method, based on the 16S rRNA gene and the VITEK®2 CompactA system, showed that the dominant microorganisms were Proteus spp. at 25 °C, and Shewanella putrefaciens, Acinetobacter johnsonii, and Aeromonas sobria at 4 °C. In conclusion, differences were observed between the microbiota results of the culture-dependent and culture-independent approaches. The combination of both methodologies may provide more comprehensive information about the dominant spoilage microbiota in Pacific white shrimp during ambient and cold storage. © 2017 Institute of Food Technologists®.
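The 1.1% figure follows directly from the two log counts, because a difference of log10 values is the log10 of a ratio. A short worked check, with an illustrative function name:

```python
def count_ratio_from_logs(log_count_a, log_count_b):
    """Ratio of two plate counts given their log10 CFU/g values."""
    return 10 ** (log_count_a - log_count_b)

# Psychrotrophic (5.97 log CFU/g) relative to mesophilic (7.94 log CFU/g).
ratio = count_ratio_from_logs(5.97, 7.94)
print(f"{ratio * 100:.2f}% of the mesophilic count")
```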
Real-time fast physical random number generator with a photonic integrated circuit.

PubMed

Ugajin, Kazusa; Terashima, Yuta; Iwakawa, Kento; Uchida, Atsushi; Harayama, Takahisa; Yoshimura, Kazuyuki; Inubushi, Masanobu

2017-03-20

Random number generators are essential for applications in information security and numerical simulations. Most optical-chaos-based random number generators produce random bit sequences by offline post-processing with large optical components. We demonstrate a real-time hardware implementation of a fast physical random number generator with a photonic integrated circuit and a field programmable gate array (FPGA) electronic board. We generate 1-Tbit random bit sequences and evaluate their statistical randomness using NIST Special Publication 800-22 and TestU01. All of the BigCrush tests in TestU01 are passed using 410-Gbit random bit sequences. A maximum real-time generation rate of 21.1 Gb/s is achieved for random bit sequences in binary format stored in a computer, which can be directly used for applications involving secret keys in cryptography and random seeds in large-scale numerical simulations.
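As a small illustration of the kind of check contained in NIST Special Publication 800-22, the sketch below implements its simplest test, the frequency (monobit) test. The full suite and TestU01 contain many more tests; this is not the authors' evaluation code, and the software generator used as input is only a stand-in for hardware output.

```python
import math
import random

def monobit_p_value(bits):
    """NIST SP 800-22 frequency (monobit) test.

    Maps bits to +1/-1, sums them, and returns
    erfc(|S_n| / sqrt(n) / sqrt(2)); small p-values indicate bias.
    """
    n = len(bits)
    s_n = sum(1 if b else -1 for b in bits)
    s_obs = abs(s_n) / math.sqrt(n)
    return math.erfc(s_obs / math.sqrt(2))

# Stand-in data: a seeded software generator instead of a physical source.
rng = random.Random(12345)
sample = [rng.getrandbits(1) for _ in range(100_000)]
print("p-value for software-generated bits:", monobit_p_value(sample))
print("p-value for a stuck-at-one source:", monobit_p_value([1] * 1000))
```

NIST's suggested decision rule treats a sequence as failing the test when the p-value falls below 0.01.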
Evaluation of massively parallel sequencing for forensic DNA methylation profiling.

PubMed

Richards, Rebecca; Patel, Jayshree; Stevenson, Kate; Harbison, SallyAnn

2018-05-11

Epigenetics is an emerging area of interest in forensic science. DNA methylation, a type of epigenetic modification, can be applied to chronological age estimation, identical twin differentiation and body fluid identification. However, there is not yet an agreed, established methodology for targeted detection and analysis of DNA methylation markers in forensic research. Recently a massively parallel sequencing-based approach has been suggested. The use of massively parallel sequencing is well established in clinical epigenetics and is emerging as a new technology in the forensic field. This review investigates the potential benefits, limitations and considerations of this technique for the analysis of DNA methylation in a forensic context. The importance of a robust protocol, regardless of the methodology used, that minimises potential sources of bias is highlighted. This article is protected by copyright. All rights reserved.

Detecting atypical examples of known domain types by sequence similarity searching: the SBASE domain library approach.

PubMed

Dhir, Somdutta; Pacurar, Mircea; Franklin, Dino; Gáspári, Zoltán; Kertész-Farkas, Attila; Kocsor, András; Eisenhaber, Frank; Pongor, Sándor

2010-11-01

SBASE is a project initiated to detect known domain types and predict domain architectures using sequence similarity searching (Simon et al., Protein Seq Data Anal, 5: 39-42, 1992; Pongor et al., Nucl. Acids. Res. 21:3111-3115, 1992). The current approach uses a curated collection of domain sequences - the SBASE domain library - and standard similarity search algorithms, followed by postprocessing based on simple statistics of the domain similarity network (http://hydra.icgeb.trieste.it/sbase/). It is especially useful in detecting rare, atypical examples of known domain types which are sometimes missed even by more sophisticated methodologies. This approach does not require multiple alignment or machine learning techniques, and can be a useful complement to other domain detection methodologies. This article gives an overview of the project history as well as of the concepts and principles developed within the project.
The Application of Next-Generation Sequencing for Mutation Detection in Autosomal-Dominant Hereditary Hearing Impairment.

PubMed

Gürtler, Nicolas; Röthlisberger, Benno; Ludin, Katja; Schlegel, Christoph; Lalwani, Anil K

2017-07-01

The aim was to identify the causative mutation in autosomal-dominant hereditary hearing impairment using next-generation sequencing, because mutation analysis by classic genetic methods is hindered by the high heterogeneity of the disease. The study included two Swiss families with autosomal-dominant hereditary hearing impairment. Amplified DNA libraries for next-generation sequencing were constructed from extracted genomic DNA, derived from peripheral blood, and enriched by a custom-made sequence capture library. Validated, pooled libraries were sequenced on an Illumina MiSeq instrument, 300 cycles and paired-end sequencing. Technical data analysis was performed with SeqMonk, variant analysis with GeneTalk or VariantStudio. The detection of mutations in genes related to hearing loss by next-generation sequencing was subsequently confirmed using specific polymerase-chain-reaction and Sanger sequencing. The outcome of interest was mutation detection in hearing-loss-related genes. The first family harbored the mutation c.5383+5delGTGA in the TECTA-gene. In the second family, a novel mutation c.2614-2625delCATGGCGCCGTG in the WFS1-gene and a second mutation TCOF1-c.1028G>A were identified. Next-generation sequencing successfully identified the causative mutation in families with autosomal-dominant hereditary hearing impairment. The results helped to clarify the pathogenic role of a known mutation and led to the detection of a novel one. NGS represents a feasible approach with great future potential in the diagnostics of hereditary hearing impairment, even in smaller labs.
[Target gene sequence capture and next generation sequencing technology to diagnose four children with Alagille syndrome].

PubMed

Gao, M L; Zhong, X M; Ma, X; Ning, H J; Zhu, D; Zou, J Z

2016-06-02

To make a genetic diagnosis of Alagille syndrome (ALGS) in patients using target gene sequence capture and next generation sequencing technology. Target gene sequence capture and next generation sequencing were used to analyze ALGS genes in 4 patients who were hospitalized at the Affiliated Hospital, Capital Institute of Pediatrics between January 2014 and December 2015; 2 had a clinical diagnosis of typical ALGS and 2 of atypical ALGS. Blood samples were collected from patients and their parents and genomic DNA was extracted from lymphocytes. Target gene sequence capture and next generation sequencing were performed. Sanger sequencing was used to confirm the results in the patients and their parents. Cholestasis, heart defects, inverted triangular face and butterfly vertebrae were the main clinical features in the 4 male patients. Age at first hospital visit ranged from 3 months and 14 days to 3 years and 1 month. The age of onset ranged from 3 days to 42 days (median 23 days). According to the clinical diagnostic criteria of ALGS, patient 1 and patient 2 were considered typical ALGS. The other 2 patients were considered atypical ALGS. Four Jagged 1 (JAG1) pathogenic mutations were detected. Three different missense mutations were detected in patients 1 to 3 with ALGS (c.839C>T (p.W280X), c.703G>A (p.R235X), c.1720C>T (p.V574M)). The JAG1 mutation of patient 3 was reported for the first time. Patient 4 had one novel insertion mutation (c.1779_1780insA (p.Ile594AsnfsTer23)). Parental analysis verified that the JAG1 missense mutations of 3 patients were de novo. The results of Sanger sequencing were consistent with the results of next generation sequencing. Target gene sequence capture combined with next generation sequencing can detect two pathogenic genes in ALGS and simultaneously test genes of other related diseases in infantile cholestatic diseases, offering high throughput, high efficiency and low cost. It may support molecular diagnosis and treatment decisions for clinicians and has good prospects for clinical application.
Repetitive DNA in the pea (Pisum sativum L.) genome: comprehensive characterization using 454 sequencing and comparison to soybean and Medicago truncatula

PubMed Central

Macas, Jiří; Neumann, Pavel; Navrátilová, Alice

2007-01-01

Background: Extraordinary size variation of higher plant nuclear genomes is in large part caused by differences in accumulation of repetitive DNA. This makes repetitive DNA of great interest for studying the molecular mechanisms shaping architecture and function of complex plant genomes. However, due to methodological constraints of conventional cloning and sequencing, a global description of repeat composition is available for only a very limited number of higher plants. In order to provide further data required for investigating evolutionary patterns of repeated DNA within and between species, we used a novel approach based on massive parallel sequencing which allowed a comprehensive repeat characterization in our model species, garden pea (Pisum sativum). Results: Analysis of 33.3 Mb sequence data resulted in quantification and partial sequence reconstruction of major repeat families occurring in the pea genome with at least thousands of copies. Our results showed that the pea genome is dominated by LTR-retrotransposons, estimated at 140,000 copies/1C. Ty3/gypsy elements are less diverse and accumulated to higher copy numbers than Ty1/copia. This is in part due to a large population of Ogre-like retrotransposons which alone make up over 20% of the genome. In addition to numerous types of mobile elements, we have discovered a set of novel satellite repeats and two additional variants of telomeric sequences. Comparative genome analysis revealed that there are only a few repeat sequences conserved between pea and soybean genomes. On the other hand, all major families of pea mobile elements are well represented in M. truncatula. Conclusion: We have demonstrated that even in a species with a relatively large genome like pea, where a single 454-sequencing run provided only 0.77% coverage, the generated sequences were sufficient to reconstruct and analyze major repeat families corresponding to a total of 35–48% of the genome. These data provide a starting point for further investigations of legume plant genomes based on their global comparative analysis and for the development of more sophisticated approaches for data mining. PMID:18031571
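The figures quoted in this abstract allow a quick back-of-the-envelope check. The sketch below derives the genome size implied by 33.3 Mb of sequence at 0.77% coverage; the abstract does not state the genome size, so the result is an inference from those two numbers, and the variable names are chosen for this example only.

```python
# Figures taken from the abstract.
sequenced_mb = 33.3          # sequence data analysed, in Mb
coverage_fraction = 0.0077   # 0.77% coverage from a single 454 run

# Coverage = sequenced / genome size, so genome size = sequenced / coverage.
implied_genome_mb = sequenced_mb / coverage_fraction

# Ogre-like retrotransposons make up "over 20%" of the genome.
ogre_min_mb = 0.20 * implied_genome_mb

# Reconstructed repeat families correspond to 35-48% of the genome.
repeat_range_mb = (0.35 * implied_genome_mb, 0.48 * implied_genome_mb)

print(f"implied genome size: about {implied_genome_mb:.0f} Mb")
print(f"Ogre-like elements: more than {ogre_min_mb:.0f} Mb")
print(f"analysed repeats: {repeat_range_mb[0]:.0f} to {repeat_range_mb[1]:.0f} Mb")
```

The implied genome size is a little over 4,300 Mb, which is why such low coverage still samples high-copy repeats many times over.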
A long PCR–based approach for DNA enrichment prior to next-generation sequencing for systematic studies

PubMed Central

Uribe-Convers, Simon; Duke, Justin R.; Moore, Michael J.; Tank, David C.

2014-01-01

• Premise of the study: We present an alternative approach for molecular systematic studies that combines long PCR and next-generation sequencing. Our approach can be used to generate templates from any DNA source for next-generation sequencing. Here we test our approach by amplifying complete chloroplast genomes, and we present a set of 58 potentially universal primers for angiosperms to do so. Additionally, this approach is likely to be particularly useful for nuclear and mitochondrial regions. • Methods and Results: Chloroplast genomes of 30 species across angiosperms were amplified to test our approach. Amplification success varied depending on whether PCR conditions were optimized for a given taxon. To further test our approach, some amplicons were sequenced on an Illumina HiSeq 2000. • Conclusions: Although here we tested this approach by sequencing plastomes, long PCR amplicons could be generated using DNA from any genome, expanding the possibilities of this approach for molecular systematic studies. PMID:25202592
An evolution based biosensor receptor DNA sequence generation algorithm.

PubMed

Kim, Eungyeong; Lee, Malrey; Gatton, Thomas M; Lee, Jaewan; Zang, Yupeng

2010-01-01

A biosensor is composed of a bioreceptor, an associated recognition molecule, and a signal transducer that can selectively detect target substances for analysis. DNA based biosensors utilize receptor molecules that allow hybridization with the target analyte. However, most DNA biosensor research uses oligonucleotides as the target analytes and does not address the potential problems of real samples. The identification of recognition molecules suitable for real target analyte samples is an important step towards further development of DNA biosensors. This study examines the characteristics of DNA used as bioreceptors and proposes a hybrid evolution-based DNA sequence generating algorithm, based on DNA computing, to identify suitable DNA bioreceptor recognition molecules for stable hybridization with real target substances. The Traveling Salesman Problem (TSP) approach is applied in the proposed algorithm to evaluate the safety and fitness of the generated DNA sequences. This approach improves efficiency and stability for enhanced and variable-length DNA sequence generation and allows extension to generation of variable-length DNA sequences with diverse receptor recognition requirements.
Generation of novel motor sequences: the neural correlates of musical improvisation.

PubMed

Berkowitz, Aaron L; Ansari, Daniel

2008-06-01

While some motor behavior is instinctive and stereotyped or learned and re-executed, much action is a spontaneous response to a novel set of environmental conditions. The neural correlates of both pre-learned and cued motor sequences have been previously studied, but novel motor behavior has thus far not been examined through brain imaging. In this paper, we report a study of musical improvisation in trained pianists with functional magnetic resonance imaging (fMRI), using improvisation as a case study of novel action generation. We demonstrate that both rhythmic (temporal) and melodic (ordinal) motor sequence creation modulate activity in a network of brain regions comprised of the dorsal premotor cortex, the rostral cingulate zone of the anterior cingulate cortex, and the inferior frontal gyrus. These findings are consistent with a role for the dorsal premotor cortex in movement coordination, the rostral cingulate zone in voluntary selection, and the inferior frontal gyrus in sequence generation. Thus, the invention of novel motor sequences in musical improvisation recruits a network of brain regions coordinated to generate possible sequences, select among them, and execute the decided-upon sequence.
Existing and emerging detection technologies for DNA (Deoxyribonucleic Acid) finger printing, sequencing, bio- and analytical chips: a multidisciplinary development unifying molecular biology, chemical and electronics engineering.

PubMed

Kumar Khanna, Vinod

2007-01-01

The current status and research trends of detection techniques for DNA-based analysis such as DNA finger printing, sequencing, biochips and allied fields are examined. An overview of main detectors is presented vis-à-vis these DNA operations. The biochip method is explained, the role of micro- and nanoelectronic technologies in biochip realization is highlighted, various optical and electrical detection principles employed in biochips are indicated, and the operational mechanisms of these detection devices are described. Although a diversity of biochips for diagnostic and therapeutic applications has been demonstrated in research laboratories worldwide, only some of these chips have entered the clinical market, and more chips are awaiting commercialization. The necessity of tagging is eliminated in refractive-index change based devices, but the basic flaw of indirect nature of most detection methodologies can only be overcome by generic and/or reagentless DNA sensors such as the conductance-based approach and the DNA-single electron transistor (DNA-SET) structure. Devices of the electrical detection-based category are expected to pave the pathway for the next-generation DNA chips. The review provides a comprehensive coverage of the detection technologies for DNA finger printing, sequencing and related techniques, encompassing a variety of methods from the primitive art to the state-of-the-art scenario as well as promising methods for the future.
Late-onset Bartter syndrome type II.

PubMed

Gollasch, Benjamin; Anistan, Yoland-Marie; Canaan-Kühl, Sima; Gollasch, Maik

2017-10-01

Mutations in the ROMK1 potassium channel gene (KCNJ1) cause antenatal/neonatal Bartter syndrome type II (aBS II), a renal disorder that begins in utero, accounting for the polyhydramnios and premature delivery that is typical in affected infants, who develop massive renal salt wasting, hypokalaemic metabolic alkalosis, secondary hyperreninaemic hyperaldosteronism, hypercalciuria and nephrocalcinosis. This BS type is believed to be a disorder of infancy, not of adulthood. We herein describe a female patient with a remarkably late-onset and mild clinical manifestation of BS II with compound heterozygous KCNJ1 missense mutations, consisting of a novel c.197T>A (p.I66N) and a previously reported c.875G>A (p.R292Q) KCNJ1 mutation. We implemented and evaluated the performance of two different bioinformatics-based approaches of targeted massively parallel sequencing [next generation sequencing (NGS)] in defining the molecular diagnosis. Our results demonstrate that aBS II may be suspected in patients with a late-onset phenotype. Our experimental approach of NGS-based mutation screening combined with Sanger sequencing proved to be a reliable molecular approach for defining the clinical diagnosis in our patient, and results in important differential diagnostic and therapeutic implications for patients with BS. Our results could have a significant impact on the diagnosis and methodological approaches of genetic testing in other patients with clinically unclassified phenotypes of nephrocalcinosis and congenital renal electrolyte abnormalities.
Whole genome sequencing options for bacterial strain typing and epidemiologic analysis based on single nucleotide polymorphism versus gene-by-gene-based approaches.

PubMed

Schürch, A C; Arredondo-Alonso, S; Willems, R J L; Goering, R V

2018-04-01

Whole genome sequence (WGS)-based strain typing finds increasing use in the epidemiologic analysis of bacterial pathogens in both public health and more localized infection control settings. This minireview describes methodologic approaches that have been explored for WGS-based epidemiologic analysis and considers the challenges and pitfalls of data interpretation. The sources are a personal collection of relevant publications. When applying WGS to study the molecular epidemiology of bacterial pathogens, genomic variability between strains is translated into measures of distance by determining single nucleotide polymorphisms in core genome alignments or by indexing allelic variation in hundreds to thousands of core genes, assigning types to unique allelic profiles. Interpreting isolate relatedness from these distances is highly organism specific, and attempts to establish species-specific cutoffs are unlikely to be generally applicable. In cases where single nucleotide polymorphism or core gene typing does not provide the resolution necessary for accurate assessment of the epidemiology of bacterial pathogens, inclusion of accessory gene or plasmid sequences may provide the additional required discrimination. As with all epidemiologic analysis, realizing the full potential of the revolutionary advances in WGS-based approaches requires understanding and dealing with issues related to the fundamental steps of data generation and interpretation. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
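The two distance measures described in this abstract can be sketched in a few lines. The example below is a minimal illustration, not a typing pipeline: it assumes sequences are already aligned to equal length, treats `N` and `-` as uninformative positions, and compares only loci called in both allelic profiles. All function names, locus names and data are hypothetical.

```python
def snp_distance(seq_a, seq_b, ignore="N-"):
    """Count differing positions between two aligned core-genome sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    return sum(
        1
        for a, b in zip(seq_a, seq_b)
        # Skip positions where either isolate has an ambiguous base or gap.
        if a != b and a not in ignore and b not in ignore
    )


def allelic_distance(profile_a, profile_b):
    """Count core loci whose allele numbers differ (gene-by-gene approach)."""
    shared = profile_a.keys() & profile_b.keys()
    return sum(1 for locus in shared if profile_a[locus] != profile_b[locus])


if __name__ == "__main__":
    # Toy data for illustration only.
    print(snp_distance("ACGTACGT", "ACGTTCGA"))
    isolate_1 = {"locus1": 1, "locus2": 4, "locus3": 2}
    isolate_2 = {"locus1": 1, "locus2": 5, "locus4": 9}
    print(allelic_distance(isolate_1, isolate_2))
```

As the abstract stresses, any cutoff applied to such distances is organism specific, so the numbers are only meaningful alongside epidemiologic context.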
Making new genetic diagnoses with old data: iterative reanalysis and reporting from genome-wide data in 1,133 families with developmental disorders.

PubMed

Wright, Caroline F; McRae, Jeremy F; Clayton, Stephen; Gallone, Giuseppe; Aitken, Stuart; FitzGerald, Tomas W; Jones, Philip; Prigmore, Elena; Rajan, Diana; Lord, Jenny; Sifrim, Alejandro; Kelsell, Rosemary; Parker, Michael J; Barrett, Jeffrey C; Hurles, Matthew E; FitzPatrick, David R; Firth, Helen V

2018-01-11

Purpose: Given the rapid pace of discovery in rare disease genomics, it is likely that improvements in diagnostic yield can be made by systematically reanalyzing previously generated genomic sequence data in light of new knowledge. Methods: We tested this hypothesis in the United Kingdom-wide Deciphering Developmental Disorders study, where in 2014 we reported a diagnostic yield of 27% through whole-exome sequencing of 1,133 children with severe developmental disorders and their parents. We reanalyzed existing data using improved variant calling methodologies, novel variant detection algorithms, updated variant annotation, evidence-based filtering strategies, and newly discovered disease-associated genes. Results: We are now able to diagnose an additional 182 individuals, taking our overall diagnostic yield to 454/1,133 (40%), and another 43 (4%) have a finding of uncertain clinical significance. The majority of these new diagnoses are due to novel developmental disorder-associated genes discovered since our original publication. Conclusion: This study highlights the importance of coupling large-scale research with clinical practice, and of discussing the possibility of iterative reanalysis and recontact with patients and health professionals at an early stage. We estimate that implementing parent-offspring whole-exome sequencing as a first-line diagnostic test for developmental disorders would diagnose >50% of patients. GENETICS in MEDICINE advance online publication, 11 January 2018; doi:10.1038/gim.2017.246.
A generic Transcriptomics Reporting Framework (TRF) for 'omics data processing and analysis.

PubMed

Gant, Timothy W; Sauer, Ursula G; Zhang, Shu-Dong; Chorley, Brian N; Hackermüller, Jörg; Perdichizzi, Stefania; Tollefsen, Knut E; van Ravenzwaay, Ben; Yauk, Carole; Tong, Weida; Poole, Alan

2017-12-01

A generic Transcriptomics Reporting Framework (TRF) is presented that lists parameters that should be reported in 'omics studies used in a regulatory context. The TRF encompasses the processes of transcriptome profiling, from data generation to a processed list of differentially expressed genes (DEGs) ready for interpretation. Included within the TRF is a reference baseline analysis (RBA) that encompasses raw data selection; data normalisation; recognition of outliers; and statistical analysis. The TRF itself does not dictate the methodology for data processing, but deals with what should be reported. Its principles are also applicable to sequencing data and other 'omics. In contrast, the RBA specifies a simple data processing and analysis methodology that is designed to provide a comparison point for other approaches and is exemplified here by a case study. By providing transparency on the steps applied during 'omics data processing and analysis, the TRF will increase confidence in the processing of 'omics data and in its regulatory use. Applicability of the TRF is ensured by its simplicity and generality. The TRF can be applied to all types of regulatory 'omics studies, and it can be executed using different commonly available software tools. Crown Copyright © 2017. Published by Elsevier Inc. All rights reserved.
Rigid Dipeptide Mimics: Synthesis of Enantiopure 5- and 7-Benzyl and 5,7-Dibenzyl Indolizidinone Amino Acids via Enolization and Alkylation of delta-Oxo alpha,omega-Di-[N-(9-(9-phenylfluorenyl))amino]azelate Esters.

PubMed

Polyak, Felix; Lubell, William D.

1998-08-21

Azabicyclo[X.Y.0]alkane amino acids are tools for constructing mimics of peptide structure and templates for generating combinatorial libraries for drug discovery. Our methodology for synthesizing these conformationally rigid dipeptides has been elaborated such that alkyl groups can be appended onto the heterocycle to generate mimics of peptide backbone and side-chain structure. Inexpensive glutamic acid was employed as chiral educt in a Claisen condensation/ketone alkylation/reductive amination/lactam cyclization sequence that furnished alkyl-branched azabicyclo[4.3.0]alkane amino acid. Enantiopure 5-benzyl-, 7-benzyl-, and 5,7-dibenzylindolizidinone amino acids 2-4 were stereoselectively synthesized via efficient reaction sequences featuring the alkylation of di-tert-butyl alpha,omega-di-[N-(PhF)amino]azelate delta-ketone 5. A variety of alkyl halides were readily added to the enolate of ketone 5 to provide mono- and dialkylated ketones 6 and 7. Hydride additions to 6 and 7, methanesulfonations, and intramolecular S(N)2 displacements by the PhF amine gave 5-alkylprolines that were converted by lactam cyclizations into 7- and 5-benzyl-, as well as 5,7-dibenzyl-2-oxo-3-N-(BOC)amino-1-azabicyclo[4.3.0]nonane-9-carboxylate methyl esters 10, 11, and 14. Epimerization of the alkyl-branched stereocenter via an iminium-enaminium equilibrium proved effective for controlling diastereoselectivity in reductive aminations with 6 and 7 in order to furnish 5-alkylprolines that were similarly converted to 7-benzyl- and 5,7-dibenzylindolizidinone N-(BOC)amino esters 10 and 14. Ester hydrolysis with hydroxide ion and potassium trimethylsilanolate then gave enantiopure indolizidinone amino acids 2-4. Epimerization at C-9 of benzylindolizidinone amino esters was also used to provide alternative diastereomers of 10, 11, and 14.
This practical methodology for introducing side-chain groups onto the heterocycle with regioselective and diastereoselective control is designed to enhance the use of alkyl-branched azabicycloalkane amino acids for the exploration of conformation-activity relationships of various biologically active peptides.
Targeted next generation sequencing for the detection of ciprofloxacin resistance markers using molecular inversion probes

DTIC Science & Technology

2016-07-06

Targeted next-generation sequencing for the detection of ciprofloxacin resistance markers using molecular inversion probes Christopher P... development and evaluation of a panel of 44 single-stranded molecular inversion probes (MIPs) coupled to next-generation sequencing (NGS) for the... padlock and molecular inversion probes as upfront enrichment steps for use with NGS showed the specificity and multiplexability of these techniques
Detection of Babesia canis rossi, B. canis vogeli, and Hepatozoon canis in Dogs in a Village of Eastern Sudan by Using a Screening PCR and Sequencing Methodologies

PubMed Central

Oyamada, Maremichi; Davoust, Bernard; Boni, Mickaël; Dereure, Jacques; Bucheton, Bruno; Hammad, Awad; Itamoto, Kazuhito; Okuda, Masaru; Inokuma, Hisashi

2005-01-01

Babesia and Hepatozoon infections of dogs in a village of eastern Sudan were analyzed by using a single PCR and sequencing. Among 78 dogs, 5 were infected with Babesia canis rossi and 2 others were infected with B. canis vogeli. Thirty-three dogs were positive for Hepatozoon. Hepatozoon canis was detected by sequence analysis. PMID:16275954
Detection of Babesia canis rossi, B. canis vogeli, and Hepatozoon canis in dogs in a village of eastern Sudan by using a screening PCR and sequencing methodologies.

PubMed

Oyamada, Maremichi; Davoust, Bernard; Boni, Mickaël; Dereure, Jacques; Bucheton, Bruno; Hammad, Awad; Itamoto, Kazuhito; Okuda, Masaru; Inokuma, Hisashi

2005-11-01

Babesia and Hepatozoon infections of dogs in a village of eastern Sudan were analyzed by using a single PCR and sequencing. Among 78 dogs, 5 were infected with Babesia canis rossi and 2 others were infected with B. canis vogeli. Thirty-three dogs were positive for Hepatozoon. Hepatozoon canis was detected by sequence analysis.
An ultra-sparse code underlies the generation of neural sequences in a songbird

NASA Astrophysics Data System (ADS)

Hahnloser, Richard H. R.; Kozhevnikov, Alexay A.; Fee, Michale S.

2002-09-01

Sequences of motor activity are encoded in many vertebrate brains by complex spatio-temporal patterns of neural activity; however, the neural circuit mechanisms underlying the generation of these pre-motor patterns are poorly understood. In songbirds, one prominent site of pre-motor activity is the forebrain robust nucleus of the archistriatum (RA), which generates stereotyped sequences of spike bursts during song and recapitulates these sequences during sleep. We show that the stereotyped sequences in RA are driven from nucleus HVC (high vocal centre), the principal pre-motor input to RA. Recordings of identified HVC neurons in sleeping and singing birds show that individual HVC neurons projecting onto RA neurons produce bursts sparsely, at a single, precise time during the RA sequence. These HVC neurons burst sequentially with respect to one another. We suggest that at each time in the RA sequence, the ensemble of active RA neurons is driven by a subpopulation of RA-projecting HVC neurons that is active only at that time. As a population, these HVC neurons may form an explicit representation of time in the sequence. Such a sparse representation, a temporal analogue of the 'grandmother cell' concept for object recognition, eliminates the problem of temporal interference during sequence generation and learning attributed to more distributed representations.
Bit error rate tester using fast parallel generation of linear recurring sequences

DOEpatents

Pierson, Lyndon G.; Witzke, Edward L.; Maestas, Joseph H.

2003-05-06

A fast method for generating linear recurring sequences by parallel linear recurring sequence generators (LRSGs) with a feedback circuit optimized to balance minimum propagation delay against maximal sequence period. Parallel generation of linear recurring sequences requires decimating the sequence (creating small contiguous sections of the sequence in each LRSG). A companion matrix form is selected depending on whether the LFSR is right-shifting or left-shifting. The companion matrix is completed by selecting a primitive irreducible polynomial with 1's most closely grouped in a corner of the companion matrix. A decimation matrix is created by raising the companion matrix to the (n*k)th power, where k is the number of parallel LRSGs and n is the number of bits to be generated at a time by each LRSG. Companion matrices with 1's closely grouped in a corner will yield sparse decimation matrices. A feedback circuit comprised of XOR logic gates implements the decimation matrix in hardware. Sparse decimation matrices can be implemented with minimum number of XOR gates, and therefore a minimum propagation delay through the feedback circuit. The LRSG of the invention is particularly well suited to use as a bit error rate tester on high speed communication lines because it permits the receiver to synchronize to the transmitted pattern within 2n bits.
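The relationship between the companion matrix and the decimation matrix can be illustrated in software. The sketch below is not the patented circuit: it assumes a 4-bit register with the recurrence s[t+4] = s[t+1] XOR s[t] (characteristic polynomial x^4 + x + 1), and the values k = 3 and n = 4 are arbitrary choices for the example. It checks that k generators, each jumping ahead by the companion matrix raised to the (n*k)th power, jointly reproduce the serial sequence.

```python
# Companion (state-transition) matrix over GF(2) for s[t+4] = s[t+1] ^ s[t].
# State vector is (s[t], s[t+1], s[t+2], s[t+3]); this polynomial is assumed.
A = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 1],
    [1, 1, 0, 0],
]


def mat_mul(a, b):
    """Multiply two square 0/1 matrices modulo 2."""
    size = len(a)
    return [[sum(a[i][m] & b[m][j] for m in range(size)) % 2
             for j in range(size)] for i in range(size)]


def mat_vec(a, v):
    """Multiply a 0/1 matrix by a 0/1 vector modulo 2."""
    return [sum(x & y for x, y in zip(row, v)) % 2 for row in a]


def mat_pow(a, e):
    """Raise a 0/1 matrix to the power e modulo 2 by repeated squaring."""
    size = len(a)
    result = [[int(i == j) for j in range(size)] for i in range(size)]
    while e:
        if e & 1:
            result = mat_mul(result, a)
        a = mat_mul(a, a)
        e >>= 1
    return result


def serial_bits(state, count):
    """Generate bits one at a time by stepping the register."""
    out = []
    for _ in range(count):
        out.append(state[0])
        state = mat_vec(A, state)
    return out


def parallel_bits(state, k, n, rounds):
    """Emulate k generators that each emit n contiguous bits per round."""
    decimation = mat_pow(A, n * k)  # jump-ahead by n*k steps
    # Generator j starts j*n steps ahead of the seed state.
    states = [mat_vec(mat_pow(A, j * n), state) for j in range(k)]
    out = []
    for _ in range(rounds):
        for j in range(k):
            # In hardware these n bits would be produced in parallel.
            out.extend(serial_bits(states[j], n))
            states[j] = mat_vec(decimation, states[j])
    return out


if __name__ == "__main__":
    seed = [1, 0, 0, 0]
    print(parallel_bits(seed, k=3, n=4, rounds=5) == serial_bits(seed, 60))
```

A sparser decimation matrix means fewer XOR terms per output bit, which is the property the abstract ties to propagation delay.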
Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment.

PubMed

Ferragina, Paolo; Giancarlo, Raffaele; Greco, Valentina; Manzini, Giovanni; Valiente, Gabriel

2007-07-13

Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel and striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness are tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets.
The first one, performed via ROC (Receiver Operating Characteristic) analysis, aims at assessing the intrinsic ability of the methodology to discriminate and classify biological sequences and structures. A second set of experiments aims at assessing how well two commonly available classification algorithms, UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor Joining), can use the methodology to perform their task, their performance being evaluated against gold standards and with the use of well known statistical indexes, i.e., the F-measure and the partition distance. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of USM on biological data. The main ones are reported next. UCD and NCD are indistinguishable, i.e., they yield nearly the same values of the statistical indexes we have used, across experiments and data sets, while CD is almost always worse than both. UPGMA seems to yield better classification results with respect to NJ, i.e., better values of the statistical indexes (10% difference or above), on a substantial fraction of experiments, compressors and USM approximation choices. The compression program PPMd, based on PPM (Prediction by Partial Matching), for generic data and Gencompress for DNA, are the best performers among the compression algorithms we have used, although the difference in performance, as measured by statistical indexes, between them and the other algorithms depends critically on the data set and may not be as large as expected. PPMd used with UCD or NCD and UPGMA on sequence data is very close in performance to the alignment methods, although worse (less than 2% difference on the F-measure). Yet, it scales well with data set size and it can work on data other than sequences.
In summary, our quantitative analysis naturally complements the rich theory behind USM and supports the conclusion that the methodology is worth using because of its robustness, flexibility, scalability, and competitiveness with existing techniques. In particular, the methodology applies to all biological data in textual format. The software and data sets are available under the GNU GPL at the supplementary material web page.
Compression-based classification of biological sequences and structures via the Universal Similarity Metric: experimental assessment

PubMed Central

Ferragina, Paolo; Giancarlo, Raffaele; Greco, Valentina; Manzini, Giovanni; Valiente, Gabriel

2007-01-01

Background Similarity of sequences is a key mathematical notion for Classification and Phylogenetic studies in Biology. It is currently primarily handled using alignments. However, the alignment methods seem inadequate for post-genomic studies since they do not scale well with data set size and they seem to be confined only to genomic and proteomic sequences. Therefore, alignment-free similarity measures are actively pursued. Among those, USM (Universal Similarity Metric) has gained prominence. It is based on the deep theory of Kolmogorov Complexity and universality is its most novel striking feature. Since it can only be approximated via data compression, USM is a methodology rather than a formula quantifying the similarity of two strings. Three approximations of USM are available, namely UCD (Universal Compression Dissimilarity), NCD (Normalized Compression Dissimilarity) and CD (Compression Dissimilarity). Their applicability and robustness is tested on various data sets yielding a first massive quantitative estimate that the USM methodology and its approximations are of value. Despite the rich theory developed around USM, its experimental assessment has limitations: only a few data compressors have been tested in conjunction with USM and mostly at a qualitative level, no comparison among UCD, NCD and CD is available and no comparison of USM with existing methods, both based on alignments and not, seems to be available. Results We experimentally test the USM methodology by using 25 compressors, all three of its known approximations and six data sets of relevance to Molecular Biology. This offers the first systematic and quantitative experimental assessment of this methodology, that naturally complements the many theoretical and the preliminary experimental results available. Moreover, we compare the USM methodology both with methods based on alignments and not. We may group our experiments into two sets. 
The first one, performed via ROC (Receiver Operating Curve) analysis, aims at assessing the intrinsic ability of the methodology to discriminate and classify biological sequences and structures. A second set of experiments aims at assessing how well two commonly available classification algorithms, UPGMA (Unweighted Pair Group Method with Arithmetic Mean) and NJ (Neighbor Joining), can use the methodology to perform their task, their performance being evaluated against gold standards and with the use of well known statistical indexes, i.e., the F-measure and the partition distance. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of USM on biological data. The main ones are reported next. Conclusion UCD and NCD are indistinguishable, i.e., they yield nearly the same values of the statistical indexes we have used, across experiments and data sets, while CD is almost always worse than both. UPGMA seems to yield better classification results with respect to NJ, i.e., better values of the statistical indexes (10% difference or above), on a substantial fraction of experiments, compressors and USM approximation choices. The compression program PPMd, based on PPM (Prediction by Partial Matching), for generic data and Gencompress for DNA, are the best performers among the compression algorithms we have used, although the difference in performance, as measured by statistical indexes, between them and the other algorithms depends critically on the data set and may not be as large as expected. PPMd used with UCD or NCD and UPGMA, on sequence data is very close, although worse, in performance with the alignment methods (less than 2% difference on the F-measure). Yet, it scales well with data set size and it can work on data other than sequences.
In summary, our quantitative analysis naturally complements the rich theory behind USM and supports the conclusion that the methodology is worth using because of its robustness, flexibility, scalability, and competitiveness with existing techniques. In particular, the methodology applies to all biological data in textual format. The software and data sets are available under the GNU GPL at the supplementary material web page. PMID:17629909
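
As a rough illustration of how a compression-based dissimilarity can be computed, the sketch below uses the commonly cited definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length. It is a minimal sketch under stated assumptions: Python's standard `zlib` stands in for the compressors named in the abstract (PPMd, Gencompress), the function names `compressed_size` and `ncd` are hypothetical, and the sequences are synthetic, so the values say nothing about the paper's results.

```python
import random
import zlib


def compressed_size(data: bytes) -> int:
    """Length in bytes of the zlib-compressed input (a stand-in for C(x))."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Compression-based dissimilarity of x and y using the formula in the lead-in."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


# Synthetic data (assumption): one random DNA string, a lightly mutated copy,
# and an unrelated random DNA string of the same length.
rng = random.Random(0)
seq_a = "".join(rng.choice("ACGT") for _ in range(2000))

mutated = list(seq_a)
for i in rng.sample(range(len(mutated)), 20):
    mutated[i] = rng.choice("ACGT")  # up to 20 point substitutions
seq_b = "".join(mutated)

seq_c = "".join(rng.choice("ACGT") for _ in range(2000))

d_ab = ncd(seq_a.encode(), seq_b.encode())
d_ac = ncd(seq_a.encode(), seq_c.encode())
print(f"NCD(a, mutated copy) = {d_ab:.3f}")
print(f"NCD(a, unrelated)    = {d_ac:.3f}")
```

A matrix of such pairwise values could then be handed to a clustering method such as UPGMA or NJ, which is the role the abstract describes for those algorithms.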

Probe-Directed Degradation (PDD) for Flexible Removal of Unwanted cDNA Sequences from RNA-Seq Libraries.

PubMed

Archer, Stuart K; Shirokikh, Nikolay E; Preiss, Thomas

2015-04-01

Most applications for RNA-seq require the depletion of abundant transcripts to gain greater coverage of the underlying transcriptome. The sequences to be targeted for depletion depend on application and species and in many cases may not be supported by commercial depletion kits. This unit describes a method for generating RNA-seq libraries that incorporates probe-directed degradation (PDD), which can deplete any unwanted sequence set, with the low-bias split-adapter method of library generation (although many other library generation methods are in principle compatible). The overall strategy is suitable for applications requiring customized sequence depletion or where faithful representation of fragment ends and lack of sequence bias is paramount. We provide guidelines to rapidly design specific probes against the target sequence, and a detailed protocol for library generation using the split-adapter method including several strategies for streamlining the technique and reducing adapter dimer content. Copyright © 2015 John Wiley & Sons, Inc.
A statistical method for the detection of variants from next-generation resequencing of DNA pools.

PubMed

Bansal, Vikas

2010-06-15

Next-generation sequencing technologies have enabled the sequencing of several human genomes in their entirety. However, the routine resequencing of complete genomes remains infeasible. The massive capacity of next-generation sequencers can be harnessed for sequencing specific genomic regions in hundreds to thousands of individuals. Sequencing-based association studies are currently limited by the low level of multiplexing offered by sequencing platforms. Pooled sequencing represents a cost-effective approach for studying rare variants in large populations. To utilize the power of DNA pooling, it is important to accurately identify sequence variants from pooled sequencing data. Detection of rare variants from pooled sequencing represents a different challenge than detection of variants from individual sequencing. We describe a novel statistical approach, CRISP [Comprehensive Read analysis for Identification of Single Nucleotide Polymorphisms (SNPs) from Pooled sequencing] that is able to identify both rare and common variants by using two approaches: (i) comparing the distribution of allele counts across multiple pools using contingency tables and (ii) evaluating the probability of observing multiple non-reference base calls due to sequencing errors alone. Information about the distribution of reads between the forward and reverse strands and the size of the pools is also incorporated within this framework to filter out false variants. Validation of CRISP on two separate pooled sequencing datasets generated using the Illumina Genome Analyzer demonstrates that it can detect 80-85% of SNPs identified using individual sequencing while achieving a low false discovery rate (3-5%). Comparison with previous methods for pooled SNP detection demonstrates the significantly lower false positive and false negative rates for CRISP. Implementation of this method is available at http://polymorphism.scripps.edu/~vbansal/software/CRISP/.
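
The second idea in the abstract, asking how likely it is that several non-reference base calls arise from sequencing errors alone, can be illustrated with a binomial tail probability. This is a simplified sketch, not CRISP's actual model: it assumes independent errors at a single uniform per-base rate, and the function name `prob_at_least_k_errors` and the example numbers are hypothetical.

```python
from math import comb


def prob_at_least_k_errors(n_reads: int, k_alt: int, error_rate: float) -> float:
    """P(X >= k_alt) for X ~ Binomial(n_reads, error_rate).

    Interpreted here as the chance of seeing k_alt or more non-reference
    calls at a site covered by n_reads if every such call were an error.
    """
    below = sum(
        comb(n_reads, i) * error_rate**i * (1 - error_rate) ** (n_reads - i)
        for i in range(k_alt)
    )
    # Clamp to guard against tiny negative values from floating-point rounding.
    return max(0.0, 1.0 - below)


# Hypothetical site: 500 reads in a pool, 12 non-reference calls, 1% error rate.
p_value = prob_at_least_k_errors(n_reads=500, k_alt=12, error_rate=0.01)
print(f"P(>= 12 errors in 500 reads) = {p_value:.4g}")
```

A small tail probability suggests the non-reference calls are unlikely to be explained by errors alone; a real caller would also weigh strand distribution and pool size, as the abstract notes.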
"First generation" automated DNA sequencing technology.

PubMed

Slatko, Barton E; Kieleczawa, Jan; Ju, Jingyue; Gardner, Andrew F; Hendrickson, Cynthia L; Ausubel, Frederick M

2011-10-01

Beginning in the 1980s, automation of DNA sequencing has greatly increased throughput, reduced costs, and enabled large projects to be completed more easily. The development of automation technology paralleled the development of other aspects of DNA sequencing: better enzymes and chemistry, separation and imaging technology, sequencing protocols, robotics, and computational advancements (including base-calling algorithms with quality scores, database developments, and sequence analysis programs). Despite the emergence of high-throughput sequencing platforms, automated Sanger sequencing technology remains useful for many applications. This unit provides background and a description of the "First-Generation" automated DNA sequencing technology. It also includes protocols for using the current Applied Biosystems (ABI) automated DNA sequencing machines. © 2011 by John Wiley & Sons, Inc.
HLA genotyping by next-generation sequencing of complementary DNA.

PubMed

Segawa, Hidenobu; Kukita, Yoji; Kato, Kikuya

2017-11-28

Genotyping of the human leucocyte antigen (HLA) is indispensable for various medical treatments. However, unambiguous genotyping is technically challenging due to high polymorphism of the corresponding genomic region. Next-generation sequencing is changing the landscape of genotyping. In addition to high throughput of data, its additional advantage is that DNA templates are derived from single molecules, which is a strong merit for the phasing problem. Although most currently developed technologies use genomic DNA, use of cDNA could enable genotyping with reduced costs in data production and analysis. We thus developed an HLA genotyping system based on next-generation sequencing of cDNA. Each HLA gene was divided into 3 or 4 target regions subjected to PCR amplification and subsequent sequencing with Ion Torrent PGM. The sequence data were then subjected to an automated analysis. The principle of the analysis was to construct candidate sequences generated from all possible combinations of variable bases and arrange them in decreasing order of the number of reads. Upon collecting candidate sequences from all target regions, 2 haplotypes were usually assigned. Cases not assigned 2 haplotypes were forwarded to 4 additional processes: selection of candidate sequences applying more stringent criteria, removal of artificial haplotypes, selection of candidate sequences with a relaxed threshold for sequence matching, and countermeasure for incomplete sequences in the HLA database. The genotyping system was evaluated using 30 samples; the overall accuracy was 97.0% at the field 3 level and 98.3% at the G group level. With one sample, genotyping of DPB1 was not completed due to short read size. We then developed a method for complete sequencing of individual molecules of the DPB1 gene, using the molecular barcode technology. The performance of the automatic genotyping system was comparable to that of systems developed in previous studies. 
Thus, next-generation sequencing of cDNA is a viable option for HLA genotyping.
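
The analysis principle described above (build candidate sequences from all combinations of variable bases, then order them by read support) can be sketched as follows. This is a toy illustration under strong assumptions, not the authors' pipeline: reads are taken to be pre-aligned, equal-length strings covering one target region, support is counted by exact match, and the function name `rank_candidates` is hypothetical.

```python
from collections import Counter
from itertools import product


def rank_candidates(reads):
    """Enumerate candidates from variable positions and rank them by read count."""
    length = len(reads[0])
    # Bases observed at each position; positions with more than one base are variable.
    columns = [sorted({read[i] for read in reads}) for i in range(length)]
    read_counts = Counter(reads)
    # Every combination of observed bases across positions is a candidate.
    candidates = ["".join(bases) for bases in product(*columns)]
    # Decreasing read count, with ties broken alphabetically for determinism.
    ranked = sorted(candidates, key=lambda seq: (-read_counts[seq], seq))
    return [(seq, read_counts[seq]) for seq in ranked]


# Hypothetical reads: two well-supported sequences plus one likely artefact.
reads = ["ACGT"] * 5 + ["ATGA"] * 4 + ["ACGA"] * 1
ranking = rank_candidates(reads)
for seq, count in ranking:
    print(seq, count)
```

In this toy case the two best-supported candidates would correspond to the two haplotypes; the number of candidates grows exponentially with the number of variable positions, which is one reason a real system would restrict the enumeration to short target regions.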
Applications of Gene Targeting Technology to Mental Retardation and Developmental Disability Research

ERIC Educational Resources Information Center

Pimenta, Aurea F.; Levitt, Pat

2005-01-01

The human and mouse genome projects elucidated the sequence and position map of innumerous genes expressed in the central nervous system (CNS), advancing our ability to manipulate these sequences and create models to investigate regulation of gene expression and function. In this article, we reviewed gene targeting methodologies with emphasis on…
Sequenced Integration and the Identification of a Problem-Solving Approach through a Learning Process

ERIC Educational Resources Information Center

Cormas, Peter C.

2016-01-01

Preservice teachers (N = 27) in two sections of a sequenced, methodological and process integrated mathematics/science course solved a levers problem with three similar learning processes and a problem-solving approach, and identified a problem-solving approach through one different learning process. Similar learning processes used included:…
Transition to Science Teacher Educator: Tensions Experienced While Learning to Teach Lesson Sequencing

ERIC Educational Resources Information Center

Wiebke, Heidi; Park Rogers, Meredith

2014-01-01

This self-study investigated the tensions that I (Heidi) encountered when teaching elementary preservice teachers how to develop a coherent sequence of five science lessons. Four lesson planning components guided me in developing a series of lessons to support the preservice teachers with this exercise. Employing self-study methodology, data…
Gene sequence analyses and other DNA-based methods for yeast species recognition

USDA-ARS's Scientific Manuscript database

DNA sequence analyses, as well as other DNA-based methodologies, have transformed the way in which yeasts are identified. The focus of this chapter will be on the resolution of species using various types of DNA comparisons. In other chapters in this book, Rozpedowska, Piškur and Wolfe discuss mul...
Exploiting genotyping by sequencing to characterize the genomic structure of the American cranberry through high-density linkage mapping

USDA-ARS's Scientific Manuscript database

The application of genotyping by sequencing (GBS) approaches, combined with data imputation methodologies, is narrowing the genetic knowledge gap between major and understudied, minor crops. GBS is an excellent tool to characterize the genomic structure of recently domesticated (~200 years) and unde...
Identifying Learning Behaviors by Contextualizing Differential Sequence Mining with Action Features and Performance Evolution

ERIC Educational Resources Information Center

Kinnebrew, John S.; Biswas, Gautam

2012-01-01

Our learning-by-teaching environment, Betty's Brain, captures a wealth of data on students' learning interactions as they teach a virtual agent. This paper extends an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs sequence mining techniques to…
Genome Sequencing and Assembly by Long Reads in Plants

PubMed Central

Li, Changsheng; Lin, Feng; An, Dong; Huang, Ruidong

2017-01-01

Plant genomes generated by Sanger and Next Generation Sequencing (NGS) have provided insight into species diversity and evolution. However, Sanger sequencing is limited in its applications due to high cost, labor intensity, and low throughput, while NGS reads are too short to resolve abundant repeats and polyploidy, leading to incomplete or ambiguous assemblies. The advent and improvement of long-read sequencing by Third Generation Sequencing (TGS) methods such as PacBio and Nanopore have shown promise in producing high-quality assemblies for complex genomes. Here, we review the development of sequencing, introducing the application as well as considerations of experimental design in TGS of plant genomes. We also introduce recent revolutionary scaffolding technologies including BioNano, Hi-C, and 10× Genomics. We expect that the informative guidance for genome sequencing and assembly by long reads will benefit the initiation of scientists’ projects. PMID:29283420
Metabarcoding avian diets at airports: implications for birdstrike hazard management planning

PubMed Central

2013-01-01

Background Wildlife collisions with aircraft cost the airline industry billions of dollars per annum and represent a public safety risk. Clearly, adapting aerodrome habitats to become less attractive to hazardous wildlife will reduce the incidence of collisions. Formulating effective habitat management strategies relies on accurate species identification of high-risk species. This can be successfully achieved for all strikes either through morphology and/or DNA-based identifications. Beyond species identification, dietary analysis of birdstrike gut contents can provide valuable intelligence for airport hazard management practices in regards to what food is attracting which species to aerodromes. Here, we present birdstrike identification and dietary data from Perth Airport, Western Australia, an aerodrome that saw approximately 140,000 aircraft movements in 2012. Next-generation high throughput DNA sequencing was employed to investigate 77 carcasses from 16 bird species collected over a 12-month period. Five DNA markers, which broadly characterize vertebrates, invertebrates and plants, were used to target three animal mitochondrial genes (12S rRNA, 16S rRNA, and COI) and a plastid gene (trnL) from DNA extracted from birdstrike carcass gastrointestinal tracts. Results Over 151,000 DNA sequences were generated, filtered and analyzed by a fusion-tag amplicon sequencing approach. Across the 77 carcasses, the most commonly identified vertebrate was Mus musculus (house mouse). Acrididae (grasshoppers) was the most common invertebrate family identified, and Poaceae (grasses) the most commonly identified plant family. The DNA-based dietary data has the potential to provide some key insights into feeding ecologies within and around the aerodrome. 
Conclusions The data generated here, together with the methodological approach, will greatly assist in the development of hazard management plans and, in combination with existing observational studies, provide an improved way to monitor the effectiveness of mitigation strategies (for example, netting of water, grass type, insecticides and so on) at aerodromes. It is hoped that with the insights provided by dietary data, airports will be able to allocate financial resources to the areas that will achieve the best outcomes for birdstrike reduction. PMID:24330620
Metabarcoding avian diets at airports: implications for birdstrike hazard management planning.

PubMed

Coghlan, Megan L; White, Nicole E; Murray, Dáithí C; Houston, Jayne; Rutherford, William; Bellgard, Matthew I; Haile, James; Bunce, Michael

2013-12-11

Wildlife collisions with aircraft cost the airline industry billions of dollars per annum and represent a public safety risk. Clearly, adapting aerodrome habitats to become less attractive to hazardous wildlife will reduce the incidence of collisions. Formulating effective habitat management strategies relies on accurate species identification of high-risk species. This can be successfully achieved for all strikes either through morphology and/or DNA-based identifications. Beyond species identification, dietary analysis of birdstrike gut contents can provide valuable intelligence for airport hazard management practices in regards to what food is attracting which species to aerodromes. Here, we present birdstrike identification and dietary data from Perth Airport, Western Australia, an aerodrome that saw approximately 140,000 aircraft movements in 2012. Next-generation high throughput DNA sequencing was employed to investigate 77 carcasses from 16 bird species collected over a 12-month period. Five DNA markers, which broadly characterize vertebrates, invertebrates and plants, were used to target three animal mitochondrial genes (12S rRNA, 16S rRNA, and COI) and a plastid gene (trnL) from DNA extracted from birdstrike carcass gastrointestinal tracts. Over 151,000 DNA sequences were generated, filtered and analyzed by a fusion-tag amplicon sequencing approach. Across the 77 carcasses, the most commonly identified vertebrate was Mus musculus (house mouse). Acrididae (grasshoppers) was the most common invertebrate family identified, and Poaceae (grasses) the most commonly identified plant family. The DNA-based dietary data has the potential to provide some key insights into feeding ecologies within and around the aerodrome. 
The data generated here, together with the methodological approach, will greatly assist in the development of hazard management plans and, in combination with existing observational studies, provide an improved way to monitor the effectiveness of mitigation strategies (for example, netting of water, grass type, insecticides and so on) at aerodromes. It is hoped that with the insights provided by dietary data, airports will be able to allocate financial resources to the areas that will achieve the best outcomes for birdstrike reduction.
Next-generation sequencing for targeted discovery of rare mutations in rice

USDA-ARS's Scientific Manuscript database

Advances in DNA sequencing (i.e., next-generation sequencing, NGS) have greatly increased the power and efficiency of detecting rare mutations in large mutant populations. Targeting Induced Local Lesions in Genomes (TILLING) is a reverse genetics approach for identifying gene mutations resulting fro...
Exploring the Inner and Outer Cultural Landscapes of Counseling Candidates towards Diverse Students and Families through Self-Reflection

ERIC Educational Resources Information Center

Montes, Adonay A.; Rodriguez-Valls, Fernando; Schroeder, Laurie

2014-01-01

This article presents an interpersonal methodology designed to increase the cultural awareness of counselor candidates. This methodology was implemented through a sequence of activities, which was part of a multicultural course in the counseling credential program in a university located in Southern California. The goal was to enrich future…
Transcriptome Sequencing Revealed Significant Alteration of Cortical Promoter Usage and Splicing in Schizophrenia

PubMed Central

Wu, Jing Qin; Wang, Xi; Beveridge, Natalie J.; Tooney, Paul A.; Scott, Rodney J.; Carr, Vaughan J.; Cairns, Murray J.

2012-01-01

Background While hybridization based analysis of the cortical transcriptome has provided important insight into the neuropathology of schizophrenia, it represents a restricted view of disease-associated gene activity based on predetermined probes. By contrast, sequencing technology can provide un-biased analysis of transcription at nucleotide resolution. Here we use this approach to investigate schizophrenia-associated cortical gene expression. Methodology/Principal Findings The data was generated from 76 bp reads of RNA-Seq, aligned to the reference genome and assembled into transcripts for quantification of exons, splice variants and alternative promoters in postmortem superior temporal gyrus (STG/BA22) from 9 male subjects with schizophrenia and 9 matched non-psychiatric controls. Differentially expressed genes were then subjected to further sequence and functional group analysis. The output, amounting to more than 38 Gb of sequence, revealed significant alteration of gene expression including many previously shown to be associated with schizophrenia. Gene ontology enrichment analysis followed by functional map construction identified three functional clusters highly relevant to schizophrenia including neurotransmission related functions, synaptic vesicle trafficking, and neural development. Significantly, more than 2000 genes displayed schizophrenia-associated alternative promoter usage and more than 1000 genes showed differential splicing (FDR<0.05). Both types of transcriptional isoforms were exemplified by reads aligned to the neurodevelopmentally significant doublecortin-like kinase 1 (DCLK1) gene. Conclusions This study provided the first deep and un-biased analysis of schizophrenia-associated transcriptional diversity within the STG, and revealed variants with important implications for the complex pathophysiology of schizophrenia. PMID:22558445
A Chromosome 7 Pericentric Inversion Defined at Single-Nucleotide Resolution Using Diagnostic Whole Genome Sequencing in a Patient with Hand-Foot-Genital Syndrome.

PubMed

Watson, Christopher M; Crinnion, Laura A; Harrison, Sally M; Lascelles, Carolina; Antanaviciute, Agne; Carr, Ian M; Bonthron, David T; Sheridan, Eamonn

2016-01-01

Next generation sequencing methodologies are facilitating the rapid characterisation of novel structural variants at nucleotide resolution. These approaches are particularly applicable to variants initially identified using alternative molecular methods. We report a child born with bilateral postaxial syndactyly of the feet and bilateral fifth finger clinodactyly. This was presumed to be an autosomal recessive syndrome, due to the family history of consanguinity. Karyotype analysis revealed a homozygous pericentric inversion of chromosome 7 (46,XX,inv(7)(p15q21)x2) which was confirmed to be heterozygous in both unaffected parents. Since the resolution of the karyotype was insufficient to identify any putatively causative gene, we undertook medium-coverage whole genome sequencing using paired-end reads, in order to elucidate the molecular breakpoints. In a two-step analysis, we first narrowed down the region by identifying discordant read-pairs, and then determined the precise molecular breakpoint by analysing the mapping locations of "soft-clipped" breakpoint-spanning reads. PCR and Sanger sequencing confirmed the identified breakpoints, both of which were located in intergenic regions. Significantly, the 7p15 breakpoint was located 523 kb upstream of HOXA13, the locus for hand-foot-genital syndrome. By inference from studies of HOXA locus control in the mouse, we suggest that the inversion has delocalised a HOXA13 enhancer to produce the phenotype observed in our patient. This study demonstrates how a modern genetic diagnostic approach can characterise structural variants at nucleotide resolution and provide potential insights into functional regulation.
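
The second analysis step, locating a breakpoint from the mapping positions of soft-clipped reads, can be illustrated by reading the CIGAR string of a SAM alignment record. This is a minimal sketch, not the authors' workflow: the function name `soft_clip_breakpoints` and the `min_clip` threshold are hypothetical, and a real analysis would aggregate many reads and check that the clipped bases map elsewhere.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def soft_clip_breakpoints(pos: int, cigar: str, min_clip: int = 10):
    """Return candidate breakpoint coordinates implied by soft clips.

    pos is the 1-based leftmost mapped position (SAM POS field). A leading
    soft clip places a candidate breakpoint at the first aligned base; a
    trailing soft clip places one at the last aligned base.
    """
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    core = [(n, op) for n, op in ops if op != "H"]  # hard clips may flank soft clips
    ref_len = sum(n for n, op in core if op in REF_CONSUMING)
    found = []
    if core and core[0][1] == "S" and core[0][0] >= min_clip:
        found.append(("left", pos))
    if len(core) > 1 and core[-1][1] == "S" and core[-1][0] >= min_clip:
        found.append(("right", pos + ref_len - 1))
    return found


# Hypothetical alignments at POS 1000 with different clipping patterns.
for cigar in ("30S70M", "60M40S", "100M", "5S95M"):
    print(cigar, soft_clip_breakpoints(1000, cigar))
```

Many reads reporting the same coordinate from independent start positions would support a precise breakpoint, which could then be confirmed by PCR and Sanger sequencing as the abstract describes.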
Methodological reporting quality of randomized controlled trials: A survey of seven core journals of orthopaedics from Mainland China over 5 years following the CONSORT statement.

PubMed

Zhang, J; Chen, X; Zhu, Q; Cui, J; Cao, L; Su, J

2016-11-01

In recent years, the number of randomized controlled trials (RCTs) in the field of orthopaedics has been increasing in Mainland China. However, RCTs are prone to bias if they lack methodological quality. Therefore, we surveyed RCTs to assess: (1) the quality of RCTs in the field of orthopaedics in Mainland China, and (2) whether there is a difference between the core Chinese orthopaedic journals and Orthopaedics Traumatology Surgery & Research (OTSR). This research aimed to evaluate the methodological reporting quality, according to the CONSORT statement, of RCTs in seven key orthopaedic journals published in Mainland China over 5 years from 2010 to 2014. All of the articles were hand-searched in the Chongqing VIP database between 2010 and 2014. Studies were considered eligible if the words "random", "randomly", "randomization", or "randomized" were used to describe the method of allocation. Trials including animals or cadavers, trials published as abstracts or case reports, trials dealing with subgroup analysis, and trials without outcomes were excluded. In addition, eight articles selected from OTSR between 2010 and 2014 were included in this study for comparison. The identified RCTs were analyzed using a modified version of the Consolidated Standards of Reporting Trials (CONSORT), including the sample size calculation, allocation sequence generation, allocation concealment, blinding and handling of dropouts. A total of 222 RCTs were identified in seven core orthopaedic journals. No trials reported adequate sample size calculation, 74 (33.4%) reported adequate allocation generation, 8 (3.7%) trials reported adequate allocation concealment, 18 (8.1%) trials reported adequate blinding and 16 (7.2%) trials reported handling of dropouts.
In OTSR, 1 (12.5%) trial reported adequate sample size calculation, 4 (50.0%) reported adequate allocation generation, 1 (12.5%) trial reported adequate allocation concealment, 2 (25.0%) trials reported adequate blinding and 5 (62.5%) trials reported handling of dropouts. There were statistically significant differences in sample size calculation and handling of dropouts between papers from Mainland China and OTSR (P<0.05). The findings of this study show that the methodological reporting quality of RCTs in seven core orthopaedic journals from Mainland China is far from satisfactory and needs further improvement to meet the standards of the CONSORT statement. Level III case control. Copyright © 2016 Elsevier Masson SAS. All rights reserved.
Twenty-one genome sequences from Pseudomonas species and 19 genome sequences from diverse bacteria isolated from the rhizosphere and endosphere of Populus deltoides.

PubMed

Brown, Steven D; Utturkar, Sagar M; Klingeman, Dawn M; Johnson, Courtney M; Martin, Stanton L; Land, Miriam L; Lu, Tse-Yuan S; Schadt, Christopher W; Doktycz, Mitchel J; Pelletier, Dale A

2012-11-01

To aid in the investigation of the Populus deltoides microbiome, we generated draft genome sequences for 21 Pseudomonas strains and 19 other diverse bacteria isolated from Populus deltoides roots. Genome sequences for isolates similar to Acidovorax, Bradyrhizobium, Brevibacillus, Caulobacter, Chryseobacterium, Flavobacterium, Herbaspirillum, Novosphingobium, Pantoea, Phyllobacterium, Polaromonas, Rhizobium, Sphingobium, and Variovorax were generated.
DNA sequence similarity recognition by hybridization to short oligomers

DOEpatents

Milosavljevic, Aleksandar

1999-01-01

Methods are disclosed for the comparison of nucleic acid sequences. Data is generated by hybridizing sets of oligomers with target nucleic acids. The data thus generated is manipulated simultaneously with respect to both (i) matching between oligomers and (ii) matching between oligomers and putative reference sequences available in databases. Using data compression methods to manipulate this mutual information, sequences for the target can be constructed.

Implementation of a quantum random number generator based on the optimal clustering of photocounts

NASA Astrophysics Data System (ADS)

Balygin, K. A.; Zaitsev, V. I.; Klimov, A. N.; Kulik, S. P.; Molotkov, S. N.

2017-10-01

To implement quantum random number generators, it is fundamentally important to have a mathematically provable and experimentally testable process of measurements of a system from which an initial random sequence is generated. This makes sure that randomness indeed has a quantum nature. A quantum random number generator has been implemented with the use of the detection of quasi-single-photon radiation by a silicon photomultiplier (SiPM) matrix, which makes it possible to reliably reach the Poisson statistics of photocounts. The choice and use of the optimal clustering of photocounts for the initial sequence of photodetection events and a method of extraction of a random sequence of 0's and 1's, which is polynomial in the length of the sequence, have made it possible to reach a yield rate of 64 Mbit/s of the output certainly random sequence.
An Adapting Auditory-motor Feedback Loop Can Contribute to Generating Vocal Repetition

PubMed Central

Brainard, Michael S.; Jin, Dezhe Z.

2015-01-01

Consecutive repetition of actions is common in behavioral sequences. Although integration of sensory feedback with internal motor programs is important for sequence generation, if and how feedback contributes to repetitive actions is poorly understood. Here we study how auditory feedback contributes to generating repetitive syllable sequences in songbirds. We propose that auditory signals provide positive feedback to ongoing motor commands, but this influence decays as feedback weakens from response adaptation during syllable repetitions. Computational models show that this mechanism explains repeat distributions observed in Bengalese finch song. We experimentally confirmed two predictions of this mechanism in Bengalese finches: removal of auditory feedback by deafening reduces syllable repetitions; and neural responses to auditory playback of repeated syllable sequences gradually adapt in sensory-motor nucleus HVC. Together, our results implicate a positive auditory-feedback loop with adaptation in generating repetitive vocalizations, and suggest sensory adaptation is important for feedback control of motor sequences. PMID:26448054
Gene transfer and expression in plants.

PubMed

Lorence, Argelia; Verpoorte, Robert

2004-01-01

Until recently, agriculture and plant breeding relied solely on the accumulated experience of generations of farmers and breeders, that is, on the sexual transfer of genes between plant species. However, recent developments in plant molecular biology and genomics now give us access to knowledge and understanding of plant genomes and the possibility of modifying them. This chapter presents an updated overview of the two most powerful technologies for transferring genetic material (DNA) into plants: Agrobacterium-mediated transformation and microparticle bombardment (biolistics). Some of the topics that are discussed in detail are the main variables controlling the transformation efficiency that can be achieved using each one of these approaches; the advantages and limitations of each methodology; transient versus stable transformation approaches; the potential of some in planta transformation systems; alternatives to developing transgenic plants without selection markers; the availability of diverse genetic tools generated as part of the genome sequencing of different plant species; transgene expression, gene silencing, and their association with regulatory elements; and prospects and ways to possibly overcome some transgene expression difficulties, in particular the use of matrix-attachment regions (MARs).
Decomposing Oncogenic Transcriptional Signatures to Generate Maps of Divergent Cellular States.

PubMed

Kim, Jong Wook; Abudayyeh, Omar O; Yeerna, Huwate; Yeang, Chen-Hsiang; Stewart, Michelle; Jenkins, Russell W; Kitajima, Shunsuke; Konieczkowski, David J; Medetgul-Ernar, Kate; Cavazos, Taylor; Mah, Clarence; Ting, Stephanie; Van Allen, Eliezer M; Cohen, Ofir; Mcdermott, John; Damato, Emily; Aguirre, Andrew J; Liang, Jonathan; Liberzon, Arthur; Alexe, Gabriella; Doench, John; Ghandi, Mahmoud; Vazquez, Francisca; Weir, Barbara A; Tsherniak, Aviad; Subramanian, Aravind; Meneses-Cime, Karina; Park, Jason; Clemons, Paul; Garraway, Levi A; Thomas, David; Boehm, Jesse S; Barbie, David A; Hahn, William C; Mesirov, Jill P; Tamayo, Pablo

2017-08-23

The systematic sequencing of the cancer genome has led to the identification of numerous genetic alterations in cancer. However, a deeper understanding of the functional consequences of these alterations is necessary to guide appropriate therapeutic strategies. Here, we describe Onco-GPS (OncoGenic Positioning System), a data-driven analysis framework to organize individual tumor samples with shared oncogenic alterations onto a reference map defined by their underlying cellular states. We applied the methodology to the RAS pathway and identified nine distinct components that reflect transcriptional activities downstream of RAS, and we defined several functional states associated with patterns of transcriptional component activation that associate with genomic hallmarks and response to genetic and pharmacological perturbations. These results show that the Onco-GPS is an effective approach to explore the complex landscape of oncogenic cellular states across cancers, and an analytic framework to summarize knowledge, establish relationships, and generate more effective disease models for research or as part of individualized precision medicine paradigms. Copyright © 2017 Elsevier Inc. All rights reserved.
Rank-order-selective neurons form a temporal basis set for the generation of motor sequences.

PubMed

Salinas, Emilio

2009-04-08

Many behaviors are composed of a series of elementary motor actions that must occur in a specific order, but the neuronal mechanisms by which such motor sequences are generated are poorly understood. In particular, if a sequence consists of a few motor actions, a primate can learn to replicate it from memory after practicing it for just a few trials. How do the motor and premotor areas of the brain assemble motor sequences so fast? The network model presented here reveals part of the solution to this problem. The model is based on experiments showing that, during the performance of motor sequences, some cortical neurons are always activated at specific times, regardless of which motor action is being executed. In the model, a population of such rank-order-selective (ROS) cells drives a layer of downstream motor neurons so that these generate specific movements at different times in different sequences. A key ingredient of the model is that the amplitude of the ROS responses must be modulated by sequence identity. Because of this modulation, which is consistent with experimental reports, the network is able not only to produce multiple sequences accurately but also to learn a new sequence with minimal changes in connectivity. The ROS neurons modulated by sequence identity thus serve as a basis set for constructing arbitrary sequences of motor responses downstream. The underlying mechanism is analogous to the mechanism described in parietal areas for generating coordinate transformations in the spatial domain.
RANK-ORDER-SELECTIVE NEURONS FORM A TEMPORAL BASIS SET FOR THE GENERATION OF MOTOR SEQUENCES

PubMed Central

Salinas, Emilio

2009-01-01

Many behaviors are composed of a series of elementary motor actions that must occur in a specific order, but the neuronal mechanisms by which such motor sequences are generated are poorly understood. In particular, if a sequence consists of a few motor actions, a primate can learn to replicate it from memory after practicing it for just a few trials. How do the motor and premotor areas of the brain assemble motor sequences so fast? The network model presented here reveals part of the solution to this problem. The model is based on experiments showing that, during the performance of motor sequences, some cortical neurons are always activated at specific times, regardless of which motor action is being executed. In the model, a population of such rank-order-selective (ROS) cells drives a layer of downstream motor neurons so that these generate specific movements at different times in different sequences. A key ingredient of the model is that the amplitude of the ROS responses must be modulated by sequence identity. Because of this modulation, which is consistent with experimental reports, the network is able not only to produce multiple sequences accurately but also to learn a new sequence with minimal changes in connectivity. The ROS neurons modulated by sequence identity thus serve as a basis set for constructing arbitrary sequences of motor responses downstream. The underlying mechanism is analogous to the mechanism described in parietal areas for generating coordinate transformations in the spatial domain. PMID:19357265
GENESUS: a two-step sequence design program for DNA nanostructure self-assembly.

PubMed

Tsutsumi, Takanobu; Asakawa, Takeshi; Kanegami, Akemi; Okada, Takao; Tahira, Tomoko; Hayashi, Kenshi

2014-01-01

DNA has been recognized as an ideal material for bottom-up construction of nanometer scale structures by self-assembly. The generation of sequences optimized for unique self-assembly (GENESUS) program reported here is a straightforward method for generating sets of strand sequences optimized for self-assembly of arbitrarily designed DNA nanostructures by a generate-candidates-and-choose-the-best strategy. A scalable procedure to prepare single-stranded DNA having arbitrary sequences is also presented. Strands for the assembly of various structures were designed and successfully constructed, validating both the program and the procedure.
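The generate-candidates-and-choose-the-best strategy mentioned above can be sketched as follows. The record does not describe GENESUS's actual scoring, so this sketch assumes a simple crosstalk score (the longest run a candidate shares with any already chosen strand or its reverse complement); the function names, candidate count, and seed are all hypothetical.

```python
import random

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[b] for b in reversed(seq))


def longest_shared_run(a, b):
    """Length of the longest common substring of a and b (dynamic programming)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                cur[j] = prev[j - 1] + 1
                best = max(best, cur[j])
        prev = cur
    return best


def crosstalk(candidate, chosen):
    """Assumed score: worst unintended similarity to the strands chosen so far."""
    score = 0
    for s in chosen:
        score = max(score,
                    longest_shared_run(candidate, s),
                    longest_shared_run(candidate, reverse_complement(s)))
    return score


def design_strands(n_strands, length, n_candidates=50, seed=0):
    """Greedy design: for each strand, generate candidates and keep the best."""
    rng = random.Random(seed)
    chosen = []
    for _ in range(n_strands):
        candidates = ["".join(rng.choice("ACGT") for _ in range(length))
                      for _ in range(n_candidates)]
        chosen.append(min(candidates, key=lambda c: crosstalk(c, chosen)))
    return chosen


strands = design_strands(n_strands=4, length=12)
```

A greedy loop keeps the sketch short; a real design tool would also need to score the intended pairings dictated by the target nanostructure, which this example leaves out.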
Preparation of next-generation sequencing libraries using Nextera™ technology: simultaneous DNA fragmentation and adaptor tagging by in vitro transposition.

PubMed

Caruccio, Nicholas

2011-01-01

DNA library preparation is a common entry point and bottleneck for next-generation sequencing. Current methods generally consist of distinct steps that often involve significant sample loss and hands-on time: DNA fragmentation, end-polishing, and adaptor-ligation. In vitro transposition with Nextera™ Transposomes simultaneously fragments and covalently tags the target DNA, thereby combining these three distinct steps into a single reaction. Platform-specific sequencing adaptors can be added, and the sample can be enriched and bar-coded using limited-cycle PCR to prepare di-tagged DNA fragment libraries. Nextera technology offers a streamlined, efficient, and high-throughput method for generating bar-coded libraries compatible with multiple next-generation sequencing platforms.
Dipolar recoupling in solid state NMR by phase alternating pulse sequences

PubMed Central

Lin, J.; Bayro, M.; Griffin, R. G.; Khaneja, N.

2009-01-01

We describe some new developments in the methodology of making heteronuclear and homonuclear recoupling experiments in solid state NMR insensitive to rf-inhomogeneity by phase alternating the irradiation on the spin system every rotor period. By incorporating delays of half rotor periods in the pulse sequences, these phase alternating experiments can be made γ encoded. The proposed methodology is conceptually different from the standard methods of making recoupling experiments robust by the use of ramps and adiabatic pulses in the recoupling periods. We show how the concept of phase alternation can be incorporated in the design of homonuclear recoupling experiments that are both insensitive to chemical-shift dispersion and rf-inhomogeneity. PMID:19157931
Relative Panoramic Camera Position Estimation for Image-Based Virtual Reality Networks in Indoor Environments

NASA Astrophysics Data System (ADS)

Nakagawa, M.; Akano, K.; Kobayashi, T.; Sekiguchi, Y.

2017-09-01

Image-based virtual reality (VR) is a virtual space generated with panoramic images projected onto a primitive model. In image-based VR, realistic VR scenes can be generated with lower rendering cost, and network data can be described as relationships among VR scenes. The camera network data are generated manually or by an automated procedure using camera position and rotation data. When panoramic images are acquired in indoor environments, network data should be generated without Global Navigation Satellite Systems (GNSS) positioning data. Thus, we focused on image-based VR generation using a panoramic camera in indoor environments. We propose a methodology to automate network data generation using panoramic images for an image-based VR space. We verified and evaluated our methodology through five experiments in indoor environments, including a corridor, elevator hall, room, and stairs. We confirmed that our methodology can automatically reconstruct network data using panoramic images for image-based VR in indoor environments without GNSS position data.
Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

ScienceCinema

Campbell, Catherine

2018-01-22

Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Campbell, Catherine

Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Next generation sequencing provides rapid access to the genome of wheat stripe rust

USDA-ARS's Scientific Manuscript database

Background: The wheat stripe rust fungus (Puccinia striiformis f. sp. tritici, PST) is responsible for significant yield losses in wheat production worldwide. In spite of its economic importance, the PST genomic sequence is not currently available. Fortunately Next Generation Sequencing (NGS) has ra...
Characterization of Microbial Population Structures in Recreational Waters and Primary Sources of Fecal Pollution with a Next-Generation Sequencing Approach

EPA Science Inventory

The invention of new approaches to DNA sequencing commonly referred to as next generation sequencing technologies is revolutionizing the study of microbial diversity. In this chapter, we discuss the characterization of microbial population structures in recreational waters and p...
Use of low-coverage, large-insert, short-read data for rapid and accurate generation of enhanced-quality draft Pseudomonas genome sequences.

PubMed

O'Brien, Heath E; Gong, Yunchen; Fung, Pauline; Wang, Pauline W; Guttman, David S

2011-01-01

Next-generation genomic technology has both greatly accelerated the pace of genome research and increased our reliance on draft genome sequences. While groups such as the Genomics Standards Consortium have made strong efforts to promote genome standards, there is still a general lack of uniformity among published draft genomes, leading to challenges for downstream comparative analyses. This lack of uniformity is a particular problem when using standard draft genomes that frequently have large numbers of low-quality sequencing tracts. Here we present a proposal for an "enhanced-quality draft" genome that identifies at least 95% of the coding sequences, thereby effectively providing a full accounting of the genic component of the genome. Enhanced-quality draft genomes are easily attainable through a combination of small- and large-insert next-generation, paired-end sequencing. We illustrate the generation of an enhanced-quality draft genome by re-sequencing the plant pathogenic bacterium Pseudomonas syringae pv. phaseolicola 1448A (Pph 1448A), which has a published, closed genome sequence of 5.93 Mbp. We use a combination of Illumina paired-end and mate-pair sequencing, and surprisingly find that de novo assemblies with 100x paired-end coverage and mate-pair sequencing with as low as 2-5x coverage are substantially better than assemblies based on higher coverage. The rapid and low-cost generation of large numbers of enhanced-quality draft genome sequences will be of particular value for microbial diagnostics and biosecurity, which rely on precise discrimination of potentially dangerous clones from closely related benign strains.
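To put the coverage figures above in concrete terms, mean depth of coverage is the number of reads times the read length divided by the genome size. The short calculation below uses the 5.93 Mbp genome size from the abstract; the 100 bp read length is an assumption made only for illustration, since the record does not state it.

```python
import math


def coverage(n_reads, read_length, genome_size):
    """Mean depth of coverage: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_for_coverage(target, genome_size, read_length):
    """Smallest number of reads that reaches the target mean coverage."""
    return math.ceil(target * genome_size / read_length)


GENOME_SIZE = 5_930_000  # bp, Pph 1448A genome size quoted in the abstract
READ_LENGTH = 100        # bp, assumed read length (not given in the record)

paired_end_reads = reads_for_coverage(100, GENOME_SIZE, READ_LENGTH)
mate_pair_reads = [reads_for_coverage(c, GENOME_SIZE, READ_LENGTH) for c in (2, 5)]
```

Under the assumed read length, 100x coverage corresponds to about 5.93 million reads, while the 2-5x mate-pair range corresponds to roughly 119,000 to 297,000 reads.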
Diagnosis of local hepatic tuberculosis through next-generation sequencing: Smarter, faster and better.

PubMed

Ai, Jing-Wen; Li, Yang; Cheng, Qi; Cui, Peng; Wu, Hong-Long; Xu, Bin; Zhang, Wen-Hong

2018-06-01

A 45-year-old man who complained of continuous fever and multiple hepatic masses was admitted to our hospital. Repeated MRI manifestations were similar, while the radiological reports suggested contradictory diagnoses, pointing to infection or malignancy, respectively. Pathologic examination of the liver tissue showed no direct evidence of either infection or tumor. We performed next-generation sequencing on the liver tissue and peripheral blood to further investigate the possible etiology. High-throughput sequencing was performed on the liver lesion tissues using the BGISEQ-100 platform, and data were mapped to the Microbial Genome Databases after filtering low-quality data and human reads. We identified a total of 299 sequencing reads of Mycobacterium tuberculosis (M. tuberculosis) complex sequences from the liver tissue, including 8,229 of 4,424,435 of the M. tuberculosis nucleotide sequences, and Mycobacterium africanum, Mycobacterium bovis, and Mycobacterium canettii were also detected owing to the 99.9% sequence identity among these strains. No specific Mycobacterium tuberculosis nucleotide sequence was detected in the sample of peripheral blood. The patient's symptoms resolved quickly after anti-tuberculosis treatment, and repeated Ziehl-Neelsen staining of the liver tissue finally identified small numbers of positive bacilli. The diagnosis of this patient was difficult to establish before next-generation sequencing because of contradictory radiological results and negative pathological findings. More sensitive diagnostic methods are urgently needed. This is the first case report of hepatic tuberculosis confirmed by next-generation sequencing, and it marks the promising potential of next-generation sequencing in the diagnosis of hepatic lesions with unknown etiology. Copyright © 2018 Elsevier Masson SAS. All rights reserved.
Vertical decomposition with Genetic Algorithm for Multiple Sequence Alignment

PubMed Central

2011-01-01

Background: Many Bioinformatics studies begin with a multiple sequence alignment as the foundation for their research. This is because multiple sequence alignment can be a useful technique for studying molecular evolution and analyzing sequence structure relationships. Results: In this paper, we have proposed a Vertical Decomposition with Genetic Algorithm (VDGA) for Multiple Sequence Alignment (MSA). In VDGA, we divide the sequences vertically into two or more subsequences, and then solve them individually using a guide tree approach. Finally, we combine all the subsequences to generate a new multiple sequence alignment. This technique is applied on the solutions of the initial generation and of each child generation within VDGA. We have used two mechanisms to generate an initial population in this research: the first mechanism is to generate guide trees with randomly selected sequences and the second is shuffling the sequences inside such trees. Two different genetic operators have been implemented with VDGA. To test the performance of our algorithm, we have compared it with existing well-known methods, namely PRRP, CLUSTALX, DIALIGN, HMMT, SB_PIMA, ML_PIMA, MULTALIGN, and PILEUP8, and also other methods, based on Genetic Algorithms (GA), such as SAGA, MSA-GA and RBT-GA, by solving a number of benchmark datasets from BAliBase 2.0. Conclusions: The experimental results showed that the VDGA with three vertical divisions was the most successful variant for most of the test cases in comparison to other divisions considered with VDGA. The experimental results also confirmed that VDGA outperformed the other methods considered in this research. PMID:21867510
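The vertical decomposition step described above (cut every sequence into column blocks, align each block separately, then concatenate the aligned blocks) can be sketched as below. The per-block aligner here is a deliberately trivial placeholder that right-pads with gaps; the paper uses a guide tree approach inside a genetic algorithm, which this sketch does not attempt to reproduce. All function names are hypothetical.

```python
def vertical_split(seqs, k):
    """Cut every sequence into k contiguous parts and group parts by position.

    Returns k blocks; block i holds the i-th part of every sequence.
    """
    rows = []
    for s in seqs:
        bounds = [i * len(s) // k for i in range(k + 1)]
        rows.append([s[bounds[i]:bounds[i + 1]] for i in range(k)])
    return [[row[i] for row in rows] for i in range(k)]


def pad_align(block):
    """Placeholder aligner (an assumption): right-pad with gaps to equal width."""
    width = max(len(s) for s in block)
    return [s.ljust(width, "-") for s in block]


def align_by_vertical_decomposition(seqs, k, align=pad_align):
    """Align each vertical block independently, then concatenate row by row."""
    blocks = [align(block) for block in vertical_split(seqs, k)]
    return ["".join(block[r] for block in blocks) for r in range(len(seqs))]


sequences = ["ACGTAC", "ACGAC", "AGTC"]
alignment = align_by_vertical_decomposition(sequences, k=2)
```

Passing a different `align` callable is how a real per-block method would be plugged in; the only contract assumed here is that it returns equal-length rows in the same order as its input.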
Design space pruning heuristics and global optimization method for conceptual design of low-thrust asteroid tour missions

NASA Astrophysics Data System (ADS)

Alemany, Kristina

Electric propulsion has recently become a viable technology for spacecraft, enabling shorter flight times, fewer required planetary gravity assists, larger payloads, and/or smaller launch vehicles. With the maturation of this technology, however, comes a new set of challenges in the area of trajectory design. Because low-thrust trajectory optimization has historically required long run-times and significant user-manipulation, mission design has relied on expert-based knowledge for selecting departure and arrival dates, times of flight, and/or target bodies and gravitational swing-bys. These choices are generally based on known configurations that have worked well in previous analyses or simply on trial and error. At the conceptual design level, however, the ability to explore the full extent of the design space is imperative to locating the best solutions in terms of mass and/or flight times. Beginning in 2005, the Global Trajectory Optimization Competition posed a series of difficult mission design problems, all requiring low-thrust propulsion and visiting one or more asteroids. These problems all had large ranges on the continuous variables (launch date, time of flight, and asteroid stay times, when applicable) and were characterized by millions or even billions of possible asteroid sequences. Even with recent advances in low-thrust trajectory optimization, full enumeration of these problems was not possible within the stringent time limits of the competition. This investigation develops a systematic methodology for determining a broad suite of good solutions to the combinatorial, low-thrust, asteroid tour problem. The target application is for conceptual design, where broad exploration of the design space is critical, with the goal being to rapidly identify a reasonable number of promising solutions for future analysis. The proposed methodology has two steps.
The first step applies a three-level heuristic sequence developed from the physics of the problem, which allows for efficient pruning of the design space. The second phase applies a global optimization scheme to locate a broad suite of good solutions to the reduced problem. The global optimization scheme developed combines a novel branch-and-bound algorithm with a genetic algorithm and an industry-standard low-thrust trajectory optimization program to solve for the following design variables: asteroid sequence, launch date, times of flight, and asteroid stay times. The methodology is developed based on a small sample problem, which is enumerated and solved so that all possible discretized solutions are known. The methodology is then validated by applying it to a larger intermediate sample problem, which also has a known solution. Next, the methodology is applied to several larger combinatorial asteroid rendezvous problems, using previously identified good solutions as validation benchmarks. These problems include the 2nd and 3rd Global Trajectory Optimization Competition problems. The methodology is shown to be capable of achieving a reduction in the number of asteroid sequences of 6-7 orders of magnitude, in terms of the number of sequences that require low-thrust optimization as compared to the number of sequences in the original problem. More than 70% of the previously known good solutions are identified, along with several new solutions that were not previously reported by any of the competitors. Overall, the methodology developed in this investigation provides an organized search technique for the low-thrust mission design of asteroid rendezvous problems.
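The idea of pruning asteroid sequences before running an expensive optimizer can be illustrated with a generic bounded depth-first search. This is not the three-level heuristic or the branch-and-bound scheme from the thesis, whose details the abstract does not give; it assumes a hypothetical table of non-negative leg costs and simply abandons any partial tour whose accumulated cost already exceeds a budget.

```python
def pruned_tours(bodies, start, n_visits, cost, budget):
    """Enumerate tours of n_visits distinct bodies, pruning on accumulated cost.

    Because every leg cost is non-negative, a partial tour that is already
    over budget can never recover, so its whole subtree is skipped.
    """
    found = []

    def extend(seq, spent):
        if len(seq) == n_visits + 1:
            found.append((tuple(seq), spent))
            return
        for body in bodies:
            if body in seq:
                continue
            step = cost[seq[-1]][body]
            if spent + step > budget:
                continue  # prune: no completion of this branch can fit
            extend(seq + [body], spent + step)

    extend([start], 0)
    return sorted(found, key=lambda item: item[1])


# Hypothetical leg costs (arbitrary units) between Earth "E" and asteroids A-C.
COST = {
    "E": {"A": 3, "B": 5, "C": 9},
    "A": {"B": 2, "C": 6, "E": 3},
    "B": {"A": 2, "C": 4, "E": 5},
    "C": {"A": 6, "B": 4, "E": 9},
}

tours = pruned_tours(["A", "B", "C"], "E", n_visits=2, cost=COST, budget=8)
```

In this toy case the branch starting with asteroid C is discarded after a single lookup, which is the same kind of saving, on a much smaller scale, that lets the reduced problem be handed to a global optimizer.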
Assessment of a stochastic downscaling methodology in generating an ensemble of hourly future climate time series

NASA Astrophysics Data System (ADS)

Fatichi, S.; Ivanov, V. Y.; Caporali, E.

2013-04-01

This study extends a stochastic downscaling methodology to the generation of an ensemble of hourly time series of meteorological variables that express possible future climate conditions at a point-scale. The stochastic downscaling uses general circulation model (GCM) realizations and an hourly weather generator, the Advanced WEather GENerator (AWE-GEN). Marginal distributions of factors of change are computed for several climate statistics using a Bayesian methodology that can weight GCM realizations based on each model's relative performance with respect to a historical climate and the degree of disagreement in projecting future conditions. A Monte Carlo technique is used to sample the factors of change from their respective marginal distributions. As a comparison with traditional approaches, factors of change are also estimated by averaging GCM realizations. With either approach, the derived factors of change are applied to the climate statistics inferred from historical observations to re-evaluate parameters of the weather generator. The re-parameterized generator yields hourly time series of meteorological variables that can be considered to be representative of future climate conditions. In this study, the time series are generated in an ensemble mode to fully reflect the uncertainty of GCM projections, climate stochasticity, as well as uncertainties of the downscaling procedure. Applications of the methodology in reproducing future climate conditions for the periods of 2000-2009, 2046-2065 and 2081-2100, using the period of 1962-1992 as the historical baseline, are discussed for the location of Firenze (Italy). The inferences of the methodology for the period of 2000-2009 are tested against observations to assess reliability of the stochastic downscaling procedure in reproducing statistics of meteorological variables at different time scales.
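The two ways of obtaining factors of change described above (Monte Carlo sampling from a weighted distribution versus plain averaging across GCMs) can be contrasted in a few lines. This sketch replaces the Bayesian marginal distributions with simple weighted resampling of per-model factors and assumes multiplicative factors, as is common for precipitation-like statistics; the factor values, weights, and historical statistic are invented for illustration.

```python
import random


def sample_factors(factors, weights, n, seed=0):
    """Monte Carlo draw of n factors of change, weighted by model skill."""
    rng = random.Random(seed)
    return rng.choices(factors, weights=weights, k=n)


def mean_factor(factors):
    """Traditional alternative: a single factor from averaging the GCMs."""
    return sum(factors) / len(factors)


def future_statistic(historical_stat, factor):
    """Apply a multiplicative factor of change to a historical statistic."""
    return historical_stat * factor


# Hypothetical inputs: three GCM-derived factors and their weights.
FACTORS = [0.9, 1.0, 1.1]
WEIGHTS = [0.2, 0.5, 0.3]
HISTORICAL_ANNUAL_PRECIP = 800.0  # mm, illustrative value only

ensemble = [future_statistic(HISTORICAL_ANNUAL_PRECIP, f)
            for f in sample_factors(FACTORS, WEIGHTS, n=100)]
single_estimate = future_statistic(HISTORICAL_ANNUAL_PRECIP, mean_factor(FACTORS))
```

Each ensemble member would then be used to re-parameterize the weather generator, so the spread of the ensemble carries the GCM uncertainty that the single averaged estimate hides.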
13C NMR study of the generation of C2- and C3-deuterated lactic acid by tumoral pancreatic islet cells exposed to D-[1-13C]-, D-[2-13C]- and D-[6-13C]-glucose in 2H2O.

PubMed

Willem, R; Biesemans, M; Kayser, F; Malaisse, W J

1994-03-01

Tumoral pancreatic islet cells of the RIN5mF line were incubated for 120 min in media prepared in 2H2O and containing D-[1-13C]glucose, D-[2-13C]glucose, and D-[6-13C]glucose. The generation of C2- and C3-deuterated lactic acid was assessed by 13C NMR. The interpretation of experimental results suggests that a) the efficiency of deuteration on the C1 of D-fructose 6-phosphate does not exceed about 47% and 4% in the phosphoglucoisomerase and phosphomannoisomerase reactions, respectively; b) approximately 38% of the molecules of D-glyceraldehyde 3-phosphate generated from D-glucose escape deuteration in the sequence of reactions catalyzed by triose phosphate isomerase and aldolase; and c) about 41% of the molecules of pyruvate generated by glycolysis are immediately converted to lactate, the remaining 59% of pyruvate molecules undergoing first a single or double back-and-forth interconversion with L-alanine. It is proposed that this methodological approach, based on high resolution 13C NMR spectroscopy, may provide novel information on the regulation of back-and-forth interconversion of glycolytic intermediates in intact cells as modulated, for instance, by enzyme-to-enzyme tunneling.
