sequence analysis sasa: Topics by Science.gov

Sample records for sequence analysis sasa

Analysis of loss of decay-heat-removal sequences at Browns Ferry Unit One

DOE Office of Scientific and Technical Information (OSTI.GOV)

Harrington, R.M.

1983-01-01

This paper summarizes the Oak Ridge National Laboratory (ORNL) report Loss of DHR Sequences at Browns Ferry Unit One - Accident Sequence Analysis (NUREG/CR-2973). The Loss of DHR investigation is the third in a series of accident studies concerning the BWR 4 - MK I containment plant design. These studies, sponsored by the Nuclear Regulatory Commission Severe Accident Sequence Analysis (SASA) program, have been conducted at ORNL with the full cooperation of the Tennessee Valley Authority (TVA). The purpose of the SASA studies is to predetermine the probable course of postulated severe accidents so as to establish the timing and the sequence of events. The SASA studies also produce recommendations concerning the implementation of better system design and better emergency operating instructions and operator training. The ORNL studies also include a detailed, best-estimate calculation of the release and transport of radioactive fission products following postulated severe accidents.
Redescription of 13 holotypes of Rheocricotopus Brundin, 1956 (Diptera: Chironomidae) from the Sino-Indian Region.

PubMed

Fu, Yue; Huang, Jingli; Liu, Wenbin; Fang, Xiangliang; Wang, Xinhua

2016-05-24

Thirteen holotypes of the orthoclad genus Rheocricotopus from the Sino-Indian Region: R. (Psilocricotopus) hidakadeeus Sasa & Suzuki, R. (P.) isigadeeus Sasa & Suzuki, R. (P.) kurocedeus Sasa, R. (P.) tokarakeleus Sasa & Suzuki, R. (P.) tobatervicesimus Kikuchi & Sasa, R. (Rheocricotopus) inaquereus Sasa, Kitami & Suzuki, R. (R.) inaxeyeus Sasa, Kitami & Suzuki, R. (R.) shoufukusecundus Sasa, R. (R.) tamahumeralis Sasa, R. (R.) tatequintus Sasa, R. (R.) tedorisecundus Sasa, R. (R.) togapeniculus Sasa & Okazawa and R. (R.) yakulemeus Sasa & Suzuki are re-examined and illustrated. Some additional descriptions, corrections and a key to these thirteen holotypes are given.
Social Anxiety Scale for Adolescents (SAS-A): measuring social anxiety among Finnish adolescents.

PubMed

Ranta, Klaus; Junttila, Niina; Laakkonen, Eero; Uhmavaara, Anni; La Greca, Annette M; Niemi, Päivi M

2012-08-01

The aim of this study was to investigate symptoms of social anxiety and the psychometric properties of the Social Anxiety Scale for Adolescents (SAS-A) among Finnish adolescents, 13-16 years of age. Study 1 (n = 867) examined the distribution of SAS-A scores according to gender and age, and the internal consistency and factor structure of the SAS-A. In a subsample (n = 563; Study 2) concurrent and discriminant validity of the SAS-A were examined relative to the Social Phobia Inventory and the Beck Depression Inventory. Test-retest stability was examined over a 30-month period by repeated measures every 6 months in another subsample (n = 377; Study 3). Results mostly revealed no gender differences in social anxiety, except that boys reported more general social avoidance and distress than girls. Older adolescents (14-16-year-olds) reported higher social anxiety than younger adolescents (12-13-year-olds). Internal consistency for the SAS-A was acceptable for both genders and for all three SAS-A subscales. Confirmatory factor analysis replicated the original 18-item three-factor structure of the SAS-A, accounting for 61% of the variance between items. Evidence for concurrent and discriminant validity was found. Test-retest stability over 6 months was satisfactory. Results support the reliability and validity of the Finnish adaptation of the SAS-A, and further indicate that gender differences in adolescents' social anxiety may vary across Western countries.
Station blackout transient at the Browns Ferry Unit 1 Plant: a severe accident sequence analysis (SASA) program study

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schultz, R.R.

1982-01-01

Operating plant transients are of great interest for many reasons, not the least of which is the potential for a mild transient to degenerate to a severe transient yielding core damage. Using the Browns Ferry (BF) Unit-1 plant as a basis of study, the station blackout sequence was investigated by the Severe Accident Sequence Analysis (SASA) Program in support of the Nuclear Regulatory Commission's Unresolved Safety Issue A-44: Station Blackout. A station blackout transient occurs when the plant's AC power from a commercial power grid is lost and cannot be restored by the diesel generators. Under normal operating conditions, if a loss of offsite power (LOSP) occurs (i.e., a complete severance of the BF plants from the Tennessee Valley Authority (TVA) power grid), the eight diesel generators at the three BF units would quickly start and power the emergency AC buses. Of the eight diesel generators, only six are needed to safely shut down all three units. Examination of BF-specific data shows that LOSP frequency is low at Unit 1. The station blackout frequency is even lower (5.7 × 10^-4 events per year) and hinges on whether the diesel generators start. The frequency of diesel generator failure is dictated in large measure by the emergency equipment cooling water (EECW) system that cools the diesel generators.
Psychometric properties and clinical cut-off scores of the Spanish version of the Social Anxiety Scale for Adolescents.

PubMed

Garcia-Lopez, Luis J; Inglés, Cándido J; García-Fernández, José M; Hidalgo, María D; Bermejo, Rosa; Puklek Levpušček, Melita

2011-01-01

This study examined the reliability and validity evidence drawn from the scores of the Spanish version of the Slovenian-developed Social Anxiety Scale for Adolescents (SASA; Puklek, 1997; Puklek & Vidmar, 2000) using a community sample (Study 1) and a clinical sample (Study 2). Confirmatory factor analysis in Study 1 replicated the 2-factor structure found by the original authors in a sample of Slovenian adolescents. Test-retest reliability was adequate. Furthermore, the SASA correlated significantly with other social anxiety scales, supporting concurrent validity evidence in Spanish adolescents. The results of Study 2 confirmed the correlations between the SASA and other social anxiety measures in a clinical sample. In addition, findings revealed that the SASA can effectively discriminate between adolescents with a clinical diagnosis of social anxiety disorder (SAD) and those without this disorder. Finally, cut-off scores for the SASA are provided for Spanish adolescents.
Comprehensive analysis of MHC class I genes from the U-, S-, and Z-lineages in Atlantic salmon.

PubMed

Lukacs, Morten F; Harstad, Håvard; Bakke, Hege G; Beetz-Sargent, Marianne; McKinnel, Linda; Lubieniecki, Krzysztof P; Koop, Ben F; Grimholt, Unni

2010-03-05

We have previously sequenced more than 500 kb of the duplicated MHC class I regions in Atlantic salmon. In the IA region we identified the loci for the MHC class I gene Sasa-UBA in addition to a soluble MHC class I molecule, Sasa-ULA. A pseudolocus for Sasa-UCA was identified in the nonclassical IB region. Both regions contained genes for antigen presentation, as well as orthologues to other genes residing in the human MHC region. The genomic localisation of two MHC class I lineages (Z and S) has been resolved. Seven BACs were sequenced using a combination of standard Sanger and 454 sequencing. The new sequence data extended the IA region with 150 kb identifying the location of one Z-lineage locus, ZAA. The IB region was extended with 350 kb including three new Z-lineage loci, ZBA, ZCA and ZDA in addition to a UGA locus. An allelic version of the IB region contained a functional UDA locus in addition to the UCA pseudolocus. Additionally a BAC harbouring two MHC class I genes (UHA) was placed on linkage group 14, while a BAC containing the S-lineage locus SAA (previously known as UAA) was placed on LG10. Gene expression studies showed limited expression range for all class I genes with the exception of UBA being dominantly expressed in gut, spleen and gills, and ZAA with high expression in blood. Here we describe the genomic organization of MHC class I loci from the U-, Z-, and S-lineages in Atlantic salmon. Nine of the described class I genes are located in the extension of the duplicated IA and IB regions, while three class I genes are found on two separate linkage groups. The gene organization of the two regions indicates that the IB region is evolving at a different pace than the IA region. Expression profiling, polymorphic content, peptide binding properties and phylogenetic relationship show that Atlantic salmon has only one MHC class Ia gene (UBA), in addition to a multitude of nonclassical MHC class I genes from the U-, S- and Z-lineages.
A rapid solvent accessible surface area estimator for coarse grained molecular simulations.

PubMed

Wei, Shuai; Brooks, Charles L; Frank, Aaron T

2017-06-05

The rapid and accurate calculation of solvent accessible surface area (SASA) is extremely useful in the energetic analysis of biomolecules. For example, SASA models can be used to estimate the transfer free energy associated with biophysical processes, and when combined with coarse-grained simulations, can be particularly useful for accounting for solvation effects within the framework of implicit solvent models. In such cases, a fast and accurate, residue-wise SASA predictor is highly desirable. Here, we develop a predictive model that estimates SASAs based on Cα-only protein structures. Through an extensive comparison between this method and a comparable method, POPS-R, we demonstrate that our new method, Protein-Cα Solvent Accessibilities or PCASA, shows better performance, especially for unfolded conformations of proteins. We anticipate that this model will be quite useful in the efficient inclusion of SASA-based solvent free energy estimations in coarse-grained protein folding simulations. PCASA is made freely available to the academic community at https://github.com/atfrank/PCASA. © 2017 Wiley Periodicals, Inc.
'SASA! is the medicine that treats violence'. Qualitative findings on how a community mobilisation intervention to prevent violence against women created change in Kampala, Uganda.

PubMed

Kyegombe, Nambusi; Starmann, Elizabeth; Devries, Karen M; Michau, Lori; Nakuti, Janet; Musuya, Tina; Watts, Charlotte; Heise, Lori

2014-01-01

Intimate partner violence (IPV) violates women's human rights and is a serious public health concern. Historically strategies to prevent IPV have focussed on individuals and their relationships without addressing the context under which IPV occurs. Primary prevention of IPV is a relatively new focus of international efforts and what SASA!, a phased community mobilisation intervention, seeks to achieve. Conducted in Kampala, Uganda, between 2007 and 2012, the SASA! Study is a cluster randomised controlled trial to assess the community-level impact of SASA! This nested qualitative study explores pathways of individual- and community-level change as a result of SASA! Forty in-depth interviews with community members (20 women, 20 men) were conducted at follow-up, audio recorded, transcribed verbatim and analysed using thematic analysis complemented by constant comparative methods. SASA! influenced the dynamics of relationships and broader community norms. At the relationship level, SASA! is helping partners to explore the benefits of mutually supportive gender roles; improve communication on a variety of issues; increase levels of joint decision-making and highlight non-violent ways to deal with anger or disagreement. Not all relationships experienced the same breadth and depth of change. At the community level, SASA! has helped foster a climate of non-tolerance of violence by reducing the acceptability of violence against women and increasing individuals' skills, willingness, and sense of responsibility to act to prevent it. It has also developed and strengthened community-based structures to catalyse and support on-going activism to prevent IPV. This paper provides evidence of the ways in which community-based violence prevention interventions may reduce IPV in low-income settings. It offers important implications for community mobilisation approaches and for prevention of IPV against women. 
This research has demonstrated the potential of social norm change interventions at the community level to achieve meaningful impact within project timeframes.
‘SASA! is the medicine that treats violence’. Qualitative findings on how a community mobilisation intervention to prevent violence against women created change in Kampala, Uganda

PubMed Central

Kyegombe, Nambusi; Starmann, Elizabeth; Devries, Karen M.; Michau, Lori; Nakuti, Janet; Musuya, Tina; Watts, Charlotte; Heise, Lori

2014-01-01

Background: Intimate partner violence (IPV) violates women's human rights and is a serious public health concern. Historically strategies to prevent IPV have focussed on individuals and their relationships without addressing the context under which IPV occurs. Primary prevention of IPV is a relatively new focus of international efforts and what SASA!, a phased community mobilisation intervention, seeks to achieve. Methods: Conducted in Kampala, Uganda, between 2007 and 2012, the SASA! Study is a cluster randomised controlled trial to assess the community-level impact of SASA! This nested qualitative study explores pathways of individual- and community-level change as a result of SASA! Forty in-depth interviews with community members (20 women, 20 men) were conducted at follow-up, audio recorded, transcribed verbatim and analysed using thematic analysis complemented by constant comparative methods. Results: SASA! influenced the dynamics of relationships and broader community norms. At the relationship level, SASA! is helping partners to explore the benefits of mutually supportive gender roles; improve communication on a variety of issues; increase levels of joint decision-making and highlight non-violent ways to deal with anger or disagreement. Not all relationships experienced the same breadth and depth of change. At the community level, SASA! has helped foster a climate of non-tolerance of violence by reducing the acceptability of violence against women and increasing individuals’ skills, willingness, and sense of responsibility to act to prevent it. It has also developed and strengthened community-based structures to catalyse and support on-going activism to prevent IPV. Discussion: This paper provides evidence of the ways in which community-based violence prevention interventions may reduce IPV in low-income settings. It offers important implications for community mobilisation approaches and for prevention of IPV against women.
This research has demonstrated the potential of social norm change interventions at the community level to achieve meaningful impact within project timeframes. PMID:25226421
Peptides design based on transmembrane Escherichia coli's OmpA protein through molecular dynamics simulations in water-dodecane interfaces.

PubMed

Aguilera-Segura, Sonia M; Núñez Vélez, Vanessa; Achenie, Luke; Álvarez Solano, Oscar; Torres, Rodrigo; González Barrios, Andrés Fernando

2016-07-01

Recent research efforts have focused on the production of environmentally nonthreatening products, including identifying biosurfactants that can replace conventional surfactants. In order to utilize biosurfactants in different industries such as cosmetic, food or petroleum, it is necessary to understand the underpinnings behind the interactions that could take place for biosurfactants which display potential for interface activity. This work aimed to use molecular dynamics simulations to understand the interactions of rationally obtained peptide sequences from the original sequence of the OmpA gene in Escherichia coli, based on the free energy change (ΔG) during peptide insertion at the water-dodecane interface. Seventeen OmpA-based peptide sequences were selected and analyzed based on their hydropathy index profiles. We found that free energy change due to Coulombic interactions and SASA (ΔGCoul/SASA), total free energy change and MW (ΔG/MW), and free energy change due to Coulombic and van der Waals interactions (ΔGCoul/ΔGvdW) ratios could provide a better understanding of the contribution of the free energy decrease at the interface. The results indicated that the peptide sequences GKNHDTGVSPVFA and THENQLGAGAFG display biosurfactant potential based on low ΔG per square nanometer, high ΔGCoul/ΔGvdW ratio, clearly defined moieties along its hydrophobic surface and sequence, and the presence of charged residues in the polar head. Clearly defined moieties and SASA were determinant for electrostatic interactions between oil-water interfaces. Experimental validations exhibited that the emulsions prepared remained stable between 3 and 27 h, respectively. Even though the peptide GKNHDTGVSPVFA displays strong interactions at the interface, stabilization times showed that the peptide THENQLGAGAFG exhibited the best performance suggesting that the stability can be better described by kinetic rather than thermodynamic criteria once the emulsion is formed.
Copyright © 2016 Elsevier Inc. All rights reserved.
RAMONA-3B application to Browns Ferry ATWS

DOE Office of Scientific and Technical Information (OSTI.GOV)

Slovik, G.C.; Neymotin, L.Y.; Saha, P.

1985-01-01

The Anticipated Transient Without Scram (ATWS) is known to be a dominant accident sequence for possible core melt in a Boiling Water Reactor (BWR). A recent Probabilistic Risk Assessment (PRA) analysis for the Browns Ferry nuclear power plant indicates that ATWS is the second most dominant transient for core melt in BWR/4 with Mark I containment, the most dominant sequence being the failure of the long-term decay heat removal function of the Residual Heat Removal (RHR) system. Of all the various ATWS scenarios, the Main Steam Isolation Valve (MSIV) closure ATWS sequence was chosen for present analysis because of its relatively high frequency of occurrence and its challenge to the residual heat removal system and containment integrity. The objective of this paper is to discuss four MSIV closure ATWS calculations using the RAMONA-3B code. The paper is a summary of a report being prepared for the USNRC Severe Accident Sequence Analysis (SASA) program which should be referred to for details. 10 refs., 20 figs., 3 tabs.
Sasa health exerts a protective effect on Her2/NeuN mammary tumorigenesis.

PubMed

Ren, Mingqiang; Reilly, R Todd; Sacchi, Nicoletta

2004-01-01

Bamboo grass leaves of different Sasa species have been widely used in food and medicine in Eastern Asia for hundreds of years. Of special interest are Kumazasa (Sasa senanensis Rehder) leaves used to prepare an alkaline extract known as Sasa Health. This extract was reported to inhibit both the development and growth of mammary tumors in a mammary tumor strain of virgin SHN mice (1). We found that Sasa Health exerts a significant protective effect on spontaneous mammary tumorigenesis in another mouse model of human breast cancer, the transgenic FVB-Her2/NeuN mouse model. Two cohorts of Her2/NeuN female mice of different age (eleven-week-old and twenty-four-week-old) chronically treated with Sasa Health in drinking water showed both a delay in the development of tumors and reduced tumor multiplicity. Sasa Health also induced inhibition of mammary duct branching and side bud development in association with reduced angiogenesis. Altogether these findings indicate that Sasa Health contains phytochemicals that can effectively retard spontaneous mammary tumorigenesis.
Reliability and Validity of the Chinese Version of the Social Anxiety Scale for Adolescents

ERIC Educational Resources Information Center

Zhou, Xinyue; Xu, Qian; Ingles, Candido J.; Hidalgo, Maria D.; La Greca, Annette M.

2008-01-01

This study evaluated the psychometric properties of the Chinese version of the Social Anxiety Scale for Adolescents (SAS-A) in a sample of 296 adolescents (49% boys) in Grades 7, 8, 9, 10, and 12 with a mean age of 15.52 years. Confirmatory factor analysis replicated the three-factor structure of the SAS-A in the Chinese sample: Fear of Negative…
Parachironomus Lenz from China and Japan (Diptera, Chironomidae).

PubMed

Yan, Chun-Cai; Yan, Jiao; Jiang, Li; Guo, Qin; Liu, Ting; Ge, Xin-Yu; Wang, Xin-Hua; Pan, Bao-Ping

2015-01-01

Members of the genus Parachironomus Lenz known from China and Japan are revised, and a key to their male adults is given. Parachironomus poyangensis sp. n. is described in this life stage. Parachironomus frequens (Johannsen) and Parachironomus monochromus (van der Wulp) are recorded from China for the first time, thus are redescribed from Chinese specimens. Parachironomus kamaabeus Sasa & Tanaka and Parachironomus toneabeus Sasa & Tanaka are new junior synonyms of Parachironomus frequens. Three Chinese or Japanese species formerly placed in Parachironomus are transferred to other genera, resulting in the new combinations Cryptochironomus inafegeus (Sasa, Kitami & Suzuki), Demicryptochironomus (Irmakia) lobus (Yan, Sæther, Jin & Wang), and Microchironomus lacteipennis (Kieffer). Chironomus sauteri Kieffer, Parachironomus kisobilobalis Sasa & Kondo and Parachironomus kuramaexpandus Sasa are removed from Parachironomus; the last of these three denotes a valid species of uncertain generic placement, the first two are nomina dubia.
Ecological pathways to prevention: How does the SASA! community mobilisation model work to prevent physical intimate partner violence against women?

PubMed

Abramsky, Tanya; Devries, Karen M; Michau, Lori; Nakuti, Janet; Musuya, Tina; Kiss, Ligia; Kyegombe, Nambusi; Watts, Charlotte

2016-04-16

Intimate partner violence (IPV) against women is a global public health concern. While community-level gender norms and attitudes to IPV are recognised drivers of IPV risk, there is little evidence on how interventions might tackle these drivers to prevent IPV at the community-level. This secondary analysis of data from the SASA! study explores the pathways through which SASA!, a community mobilisation intervention to prevent violence against women, achieved community-wide reductions in physical IPV. From 2007 to 2012 a cluster randomised controlled trial (CRT) was conducted in eight communities in Kampala, Uganda. Cross-sectional surveys of a random sample of community members, aged 18-49, were undertaken at baseline (n = 1583) and 4 years post intervention implementation (n = 2532). We used cluster-level intention to treat analysis to estimate SASA!'s community-level impact on women's past year experience of physical IPV and men's past year perpetration of IPV. The mediating roles of community-, relationship- and individual-level factors in intervention effect on past year physical IPV experience (women)/perpetration (men) were explored using modified Poisson regression models. SASA! was associated with reductions in women's past year experience of physical IPV (0.48, 95 % CI 0.16-1.39), as well as men's perpetration of IPV (0.39, 95 % CI 0.20-0.73). Community-level normative attitudes were the most important mediators of intervention impact on physical IPV risk, with norms around the acceptability of IPV explaining 70 % of the intervention effect on women's experience of IPV and 95 % of the effect on men's perpetration. The strongest relationship-level mediators were men's reduced suspicion of partner infidelity (explaining 22 % of effect on men's perpetration), and improved communication around sex (explaining 16 % of effect on women's experience). 
Reduced acceptability of IPV among men was the most important individual-level mediator (explaining 42 % of effect on men's perpetration). These results highlight the important role of community-level norm-change in achieving community-wide reductions in IPV risk. They lend strong support for the more widespread adoption of community-level approaches to preventing violence. ClinicalTrials.gov, NCT00790959 . Registered 13th November 2008. The study protocol is available at: http://www.trialsjournal.com/content/13/1/96.
Social Anxiety Scale for Adolescents (SAS-A): Measuring Social Anxiety among Finnish Adolescents

ERIC Educational Resources Information Center

Ranta, Klaus; Junttila, Niina; Laakkonen, Eero; Uhmavaara, Anni; La Greca, Annette M.; Niemi, Paivi M.

2012-01-01

The aim of this study was to investigate symptoms of social anxiety and the psychometric properties of the "Social Anxiety Scale for Adolescents" (SAS-A) among Finnish adolescents, 13-16 years of age. Study 1 (n = 867) examined the distribution of SAS-A scores according to gender and age, and the internal consistency and factor structure…
Exploring Couples' Processes of Change in the Context of SASA!, a Violence Against Women and HIV Prevention Intervention in Uganda.

PubMed

Starmann, Elizabeth; Collumbien, Martine; Kyegombe, Nambusi; Devries, Karen; Michau, Lori; Musuya, Tina; Watts, Charlotte; Heise, Lori

2017-02-01

There is now a growing body of research indicating that prevention interventions can reduce intimate partner violence (IPV); much less is known, however, about how couples exposed to these interventions experience the change process, particularly in low-income countries. Understanding the dynamic process that brings about the cessation of IPV is essential for understanding how interventions work (or don't) to reduce IPV. This study aimed to provide a better understanding of how couples' involvement with SASA!, a violence against women and HIV-related community mobilisation intervention developed by Raising Voices in Uganda, influenced processes of change in relationships. Qualitative data were collected from each partner in separate in-depth interviews following the intervention. Dyadic analysis was conducted using framework analysis methods. Study findings suggest that engagement with SASA! contributed to varied experiences and degrees of change at the individual and relationship levels. Reflection around healthy relationships and communication skills learned through SASA! activities or community activists led to more positive interaction among many couples, which reduced conflict and IPV. This nurtured a growing trust and respect between many partners, facilitating change in longstanding conflicts and generating greater intimacy and love as well as increased partnership among couples to manage economic challenges. This study draws attention to the value of researching and working with women, men and couples to prevent IPV and suggests IPV prevention interventions may benefit from the inclusion of relationship skills building and support within the context of community mobilisation interventions.
Social Anxiety Scale for Adolescents and School Anxiety Inventory: Psychometric properties in French adolescents.

PubMed

Delgado, Beatriz; García-Fernández, José M; Martínez-Monteagudo, María C; Inglés, Cándido J; Marzo, Juan C; La Greca, Annette M; Hugon, Mandarine

2018-06-02

School and social anxiety are common problems and have a significant impact on youths' development. Nevertheless, the questionnaires to assess these anxious symptoms in French adolescents have limitations. The aim of this study is to provide a French version of the Social Anxiety Scale for Adolescents (SAS-A) and the School Anxiety Inventory (SAI), analysing their psychometric properties by the factor structure, internal consistency, and convergent validity. The SAS-A and the SAI were collectively administered in a sample of 1011 French adolescents (48.5% boys) ranging in age from 11 to 18 years. Confirmatory factor analyses replicated the previously identified correlated three-factor structure of the SAS-A and the correlated four-factor structure of the SAI. Acceptable internal consistency indexes were found for SAS-A and SAI scores. Correlations supported the convergent validity of the questionnaires' subscales. Overall, results supported the internal consistency and validity of the French versions of the SAS-A and SAI.
Implicit Solvation Parameters Derived from Explicit Water Forces in Large-Scale Molecular Dynamics Simulations

PubMed Central

2012-01-01

Implicit solvation is a mean force approach to model solvent forces acting on a solute molecule. It is frequently used in molecular simulations to reduce the computational cost of solvent treatment. In the first instance, the free energy of solvation and the associated solvent–solute forces can be approximated by a function of the solvent-accessible surface area (SASA) of the solute and differentiated by an atom-specific solvation parameter σ_i^SASA. A procedure for the determination of values for the σ_i^SASA parameters through matching of explicit and implicit solvation forces is proposed. Using the results of Molecular Dynamics simulations of 188 topologically diverse protein structures in water and in implicit solvent, values for the σ_i^SASA parameters for atom types i of the standard amino acids in the GROMOS force field have been determined. A simplified representation based on groups of atom types σ_g^SASA was obtained via partitioning of the atom-type σ_i^SASA distributions by dynamic programming. Three groups of atom types with well separated parameter ranges were obtained, and their performance in implicit versus explicit simulations was assessed. The solvent forces are available at http://mathbio.nimr.mrc.ac.uk/wiki/Solvent_Forces. PMID:23180979
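As an illustration of the SASA-based approximation described in the record above, the sketch below assumes the commonly used linear form, in which the solvation free energy is the sum over atoms of an atom-specific parameter times that atom's accessible area. The abstract does not state the functional form, so the linear sum, the function name, and the numeric values are assumptions for illustration only and are not taken from the paper.

```python
def sasa_solvation_energy(areas, sigmas):
    """Estimate a solvation free energy as sum_i sigma_i * SASA_i.

    areas  -- per-atom solvent-accessible surface areas (e.g. in nm^2)
    sigmas -- per-atom solvation parameters (e.g. in kJ mol^-1 nm^-2)
    Both the linear form and the units are assumptions of this sketch.
    """
    if len(areas) != len(sigmas):
        raise ValueError("areas and sigmas must have the same length")
    return sum(sigma * area for sigma, area in zip(sigmas, areas))


if __name__ == "__main__":
    # Hypothetical values: one atom type favouring burial, one favouring exposure.
    example_areas = [0.10, 0.25, 0.05]
    example_sigmas = [5.0, -20.0, 5.0]
    print(sasa_solvation_energy(example_areas, example_sigmas))
```

In a simulation, the corresponding implicit solvent force on each atom would be the negative gradient of this energy with respect to the atomic coordinates, which requires differentiable per-atom areas and is outside the scope of this sketch.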
Social Anxiety Scale for Adolescents (SAS-A) Short Form.

PubMed

Nelemans, Stefanie A; Meeus, Wim H J; Branje, Susan J T; Van Leeuwen, Karla; Colpin, Hilde; Verschueren, Karine; Goossens, Luc

2017-01-01

In this study, we examined the longitudinal measurement invariance of a 12-item short version of the Social Anxiety Scale for Adolescents (SAS-A) in two 4-year longitudinal community samples (N_sample1 = 815, M_age,T1 = 13.38 years; N_sample2 = 551, M_age,T1 = 14.82 years). Using confirmatory factor analyses, we found strict longitudinal measurement invariance for the three-factor structure of the SAS-A across adolescence, across samples, and across gender. Some developmental changes in social anxiety were found from early to mid-adolescence, as well as gender differences across adolescence. These findings suggest that the short version of the SAS-A is a developmentally appropriate instrument that can be used effectively to examine adolescent social anxiety development.

Sasa-Satsuma higher-order nonlinear Schrödinger equation and its bilinearization and multisoliton solutions.

PubMed

Gilson, C; Hietarinta, J; Nimmo, J; Ohta, Y

2003-07-01

Higher-order and multicomponent generalizations of the nonlinear Schrödinger equation are important in various applications, e.g., in optics. One of these equations, the integrable Sasa-Satsuma equation, has particularly interesting soliton solutions. Unfortunately, the construction of multisoliton solutions to this equation presents difficulties due to its complicated bilinearization. We discuss briefly some previous attempts and then give the correct bilinearization based on the interpretation of the Sasa-Satsuma equation as a reduction of the three-component Kadomtsev-Petviashvili hierarchy. In the process, we also get bilinearizations and multisoliton formulas for a two-component generalization of the Sasa-Satsuma equation (the Yajima-Oikawa-Tasgal-Potasek model), and for a (2+1)-dimensional generalization.
Role of STAT3 in Angiotensin II-Induced Hypertension and Cardiac Remodeling Revealed by Mice Lacking STAT3 Serine 727 Phosphorylation

PubMed Central

Zouein, Fouad A.; Zgheib, Carlos; Hamza, Shereen; Fuseler, John W.; Hall, John E.; Soljancic, Andrea; Lopez-Ruiz, Arnaldo; Kurdi, Mazen; Booz, George W.

2013-01-01

STAT3 is involved in protection of the heart provided by ischemic preconditioning. However, the role of this transcription factor in the heart in chronic stresses such as hypertension has not been defined. We assessed whether STAT3 is important in hypertension-induced cardiac remodeling using mice with reduced STAT3 activity due to a S727A mutation (SA/SA). Wild type (WT) and SA/SA mice received angiotensin (ANG) II or saline for 17 days. ANG II increased mean arterial and systolic pressure in SA/SA and WT mice, but cardiac levels of cytokines associated with heart failure were increased less in SA/SA mice. Unlike WT mice, hearts of SA/SA mice showed signs of developing systolic dysfunction as evidenced by reduction in ejection fraction and fractional shortening. In the left ventricle of both WT and SA/SA mice, ANG II induced fibrosis. However, fibrosis in SA/SA mice appeared more extensive and was associated with loss of myocytes. Cardiac hypertrophy as indexed by heart to body weight ratio and left ventricular anterior wall dimension during diastole was greater in WT mice. In WT+ANG II mice there was an increase in the mass of individual myofibrils. In contrast, cardiac myocytes of SA/SA+ANG II mice showed a loss in myofibrils and myofibrillar mass density was decreased during ANG II infusion. Our findings reveal that STAT3 transcriptional activity is important for normal cardiac myocyte myofibril morphology. Loss of STAT3 may impair cardiac function in the hypertensive heart due to defective myofibrillar structure and remodeling that may lead to heart failure. PMID:23364341
FreeSASA: An open source C library for solvent accessible surface area calculations.

PubMed

Mitternacht, Simon

2016-01-01

Calculating solvent accessible surface areas (SASA) is a run-of-the-mill calculation in structural biology. Although there are many programs available for this calculation, there are no free-standing, open-source tools designed for easy tool-chain integration. FreeSASA is an open source C library for SASA calculations that provides both command-line and Python interfaces in addition to its C API. The library implements both Lee and Richards' and Shrake and Rupley's approximations, and is highly configurable to allow the user to control molecular parameters, accuracy and output granularity. It only depends on standard C libraries and should therefore be easy to compile and install on any platform. The library is well-documented, stable and efficient. The command-line interface can easily replace closed source legacy programs, with comparable or better accuracy and speed, and with some added functionality.
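To make the Shrake and Rupley approximation mentioned above concrete, here is a minimal pure-Python sketch. It is not FreeSASA code and does not use the FreeSASA API; the function names, the 1.4 Å probe radius default and the 200 test points per atom are illustrative assumptions.

```python
import math


def sphere_points(n):
    """Return n roughly evenly spaced unit vectors (golden-spiral layout)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for k in range(n):
        z = 1.0 - (2.0 * k + 1.0) / n
        ring = math.sqrt(max(0.0, 1.0 - z * z))
        phi = k * golden_angle
        points.append((ring * math.cos(phi), ring * math.sin(phi), z))
    return points


def shrake_rupley(atoms, probe=1.4, n_points=200):
    """Per-atom SASA in square angstroms.

    atoms: list of (x, y, z, radius) tuples in angstroms (hypothetical input
    format chosen for this sketch).
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        # Keep only atoms whose probe-expanded spheres can overlap atom i's.
        neighbours = []
        for j, (xj, yj, zj, rj) in enumerate(atoms):
            if j == i:
                continue
            cutoff = big_r + rj + probe
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
            if d2 < cutoff * cutoff:
                neighbours.append((xj, yj, zj, (rj + probe) ** 2))
        # Count test points that lie outside every neighbouring sphere.
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = any(
                (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < r2
                for xj, yj, zj, r2 in neighbours
            )
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r ** 2 * exposed / n_points)
    return areas


if __name__ == "__main__":
    # Two carbon-like atoms 2 angstroms apart (made-up coordinates).
    print(shrake_rupley([(0.0, 0.0, 0.0, 1.8), (0.0, 0.0, 2.0, 1.8)]))
```

Accuracy in this scheme is controlled by the number of test points per atom, which is the kind of accuracy setting the abstract describes as configurable.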
The impact of SASA!, a community mobilisation intervention, on women's experiences of intimate partner violence: secondary findings from a cluster randomised trial in Kampala, Uganda

PubMed Central

Abramsky, Tanya; Devries, Karen M; Michau, Lori; Nakuti, Janet; Musuya, Tina; Kyegombe, Nambusi; Watts, Charlotte

2016-01-01

Background Intimate partner violence (IPV) is a global public health and human rights concern, though there is limited evidence on how to prevent it. This secondary analysis of data from the SASA! Study assesses the potential of a community mobilisation IPV prevention intervention to reduce overall prevalence of IPV, new onset of abuse (primary prevention) and continuation of prior abuse (secondary prevention). Methods A pair-matched cluster randomised controlled trial was conducted in 8 communities (4 intervention, 4 control) in Kampala, Uganda (2007–2012). Cross-sectional surveys of community members, 18–49 years old, were undertaken at baseline (n=1583) and 4 years postintervention implementation (n=2532). Outcomes relate to women's past year experiences of physical and sexual IPV, emotional aggression, controlling behaviours and fear of partner. An adjusted cluster-level intention-to-treat analysis compared outcomes in intervention and control communities at follow-up. Results At follow-up, all types of IPV (including severe forms of each) were lower in intervention communities compared with control communities. SASA! was associated with lower onset of abuse and lower continuation of prior abuse. Statistically significant effects were observed for continued physical IPV (adjusted risk ratio 0.42, 95% CI 0.18 to 0.96); continued sexual IPV (0.68, 0.53 to 0.87); continued emotional aggression (0.68, 0.52 to 0.89); continued fear of partner (0.67, 0.51 to 0.89); and new onset of controlling behaviours (0.38, 0.23 to 0.62). Conclusions Community mobilisation is an effective means for both primary and secondary prevention of IPV. Further support should be given to the replication and scale up of SASA! and other similar interventions. Trial registration number NCT00790959 PMID:26873948
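As a reading aid for the adjusted risk ratios reported above, the following sketch converts each ratio into an implied percentage reduction and checks whether its 95% confidence interval excludes the null value of 1. The numbers are copied from the abstract; the function names are hypothetical, and the sketch does not reproduce the trial's cluster-level adjustment.

```python
def percent_reduction(risk_ratio):
    """Relative reduction implied by a risk ratio, in percent."""
    return (1.0 - risk_ratio) * 100.0


def excludes_null(ci_low, ci_high, null=1.0):
    """True when the confidence interval does not contain the null value."""
    return not (ci_low <= null <= ci_high)


# (adjusted risk ratio, 95% CI lower bound, 95% CI upper bound), as reported.
REPORTED = {
    "continued physical IPV": (0.42, 0.18, 0.96),
    "continued sexual IPV": (0.68, 0.53, 0.87),
    "continued emotional aggression": (0.68, 0.52, 0.89),
    "continued fear of partner": (0.67, 0.51, 0.89),
    "new onset of controlling behaviours": (0.38, 0.23, 0.62),
}


def summarise(estimates):
    """Return (outcome, rounded % reduction, CI excludes 1) for each estimate."""
    rows = []
    for outcome, (rr, low, high) in estimates.items():
        rows.append((outcome, round(percent_reduction(rr)), excludes_null(low, high)))
    return rows


if __name__ == "__main__":
    for outcome, reduction, significant in summarise(REPORTED):
        print(f"{outcome}: about {reduction}% lower, CI excludes 1: {significant}")
```

A risk ratio of 0.42, for example, corresponds to a risk about 58% lower in intervention communities than in control communities.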
The taxonomic implication of frontal tubercles in Polypedilum subgenera diagnoses, with re-description of Polypedilum isigabeceum Sasa & Suzuki (Diptera, Chironomidae).

PubMed

Yamamoto, Nao; Yamamoto, Masaru

2016-11-15

Polypedilum isigabeceum Sasa et Suzuki, 2000 was described as belonging to subgenus Polypedilum s. str. However, if we accept the conclusion of Sæther et al. (2010), the species might be placed into Kribionympha with P. unagiquartum Sasa, 1985 because of the presence of distinct frontal tubercles in the adult males. However, other taxonomic characters do not support their treatment. P. isigabeceum is re-described and reconfirmed to be assigned to the subgenus Polypedilum s. str. The taxonomic meaning of frontal tubercles is discussed for defining the subgeneric rankings within genus Polypedilum.
Examining diffusion to understand the how of SASA!, a violence against women and HIV prevention intervention in Uganda.

PubMed

Starmann, Elizabeth; Heise, Lori; Kyegombe, Nambusi; Devries, Karen; Abramsky, Tanya; Michau, Lori; Musuya, Tina; Watts, Charlotte; Collumbien, Martine

2018-05-11

A growing number of complex public health interventions combine mass media with community-based "change agents" and/or mobilisation efforts acting at multiple levels. While impact evaluations are important, there is a paucity of research into the more nuanced roles intervention and social network factors may play in achieving intervention outcomes, making it difficult to understand how different aspects of the intervention worked (or did not). This study applied aspects of diffusion of innovations theory to explore how SASA!, a community mobilisation approach for preventing HIV and violence against women, diffused within intervention communities and the factors that influenced the uptake of new ideas and behaviours around intimate partner relationships and violence. This paper is based on a qualitative study of couples living in SASA! communities and secondary analysis of endline quantitative data collected as part of a cluster randomised control trial designed to evaluate the impact of the SASA! intervention. The primary trial was conducted in eight communities in Kampala, Uganda between 2007 and 2012. The secondary analysis of follow up survey data used multivariate logistic regression to examine associations between intervention exposure and interpersonal communication, and relationship change (n = 928). The qualitative study used in-depth interviews (n = 20) and framework analysis methods to explore the intervention attributes that facilitated engagement with the intervention and uptake of new ideas and behaviours in intimate relationships. We found communication materials and mid media channels generated awareness and knowledge, while the concurrent influence from interpersonal communication with community-based change agents and social network members more frequently facilitated changes in behaviour.
The results indicate combining community mobilisation components, programme content that reflects people's lives and direct support through local change agents can facilitate diffusion and powerful collective change processes in communities. This study makes clear the value of applying diffusion of innovations theory to illuminate how complex public health intervention evaluations effect change. It also contributes to our knowledge of partner violence prevention in a low-income, urban East African context. ClinicalTrials.gov #NCT00790959. Registered 13th November 2008.
Seasonal Variations of the Antioxidant Composition in Ground Bamboo Sasa argenteastriatus Leaves

PubMed Central

Ni, Qinxue; Xu, Guangzhi; Wang, Zhiqiang; Gao, Qianxin; Wang, Shu; Zhang, Youzuo

2012-01-01

Sasa argenteastriatus, with abundant active compounds and high antioxidant activity in leaves, is a new leafy bamboo grove suitable for exploitation. To utilize it more effectively and scientifically, we investigate the seasonal variations of antioxidant composition in its leaves and antioxidant activity. The leaves of Sasa argenteastriatus were collected on the 5th day of each month in three same-sized sample plots from May 2009 to May 2011. The total flavonoids (TF), phenolics (TP) and triterpenoids (TT) of bamboo leaves were extracted and the contents analyzed by UV-spectrophotometer. Our data showed that all exhibited variations with the changing seasons, with the highest levels appearing in November to March. Antioxidant activity was measured using DPPH and FRAP methods. The highest antioxidant activity appeared in December with the lowest in May. Correlation analyses demonstrated that TP and TF exhibited high correlation with bamboo antioxidant activity. Eight bamboo characteristic compounds (orientin, isoorientin, vitexin, homovitexin and p-coumaric acid, chlorogenic acid, caffeic acid, ferulic acid) were determined by RP-HPLC synchronously. We found that chlorogenic acid, isoorientin and vitexin are the main compounds in Sasa argenteastriatus leaves and the content of isovitexin and chlorogenic acid showed a similar seasonal variation with the TF, TP and TT. Our results suggested that the optimum season for harvesting Sasa argenteastriatus leaves is between autumn and winter. PMID:22408451
ATWS analysis for Browns Ferry Nuclear Plant Unit 1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dallman, R.J.; Jouse, W.C.

1985-01-01

Analyses of postulated Anticipated Transients Without Scram (ATWS) were performed at the Idaho National Engineering Laboratory (INEL). The Browns Ferry Nuclear Plant Unit 1 (BFNP1) was selected as the subject of this work because of the cooperation of the Tennessee Valley Authority (TVA). The work is part of the Severe Accident Sequence Analysis (SASA) Program of the US Nuclear Regulatory Commission (NRC). A Main Steamline Isolation Valve (MSIV) closure served as the transient initiator for these analyses and was followed by a complete failure to scram. Results from the analyses indicate that operator mitigative actions are required to prevent overpressurization of the primary containment. Uncertainties remain concerning the effectiveness of key mitigative actions. The effectiveness of level control as a power reduction procedure is limited. The power level resulting from level control only reduces the Pressure Suppression Pool (PSP) heatup rate from 6 to 4 °F/min.
Dark soliton solution of Sasa-Satsuma equation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ohta, Y.

2010-03-08

The Sasa-Satsuma equation is a higher order nonlinear Schroedinger type equation which admits bright soliton solutions with internal freedom. We present the dark soliton solutions for the equation by using Gram type determinant. The dark solitons have no internal freedom and exist for both defocusing and focusing equations.
Probing protein surface with a solvent mimetic carbene coupled to detection by mass spectrometry.

PubMed

Gómez, Gabriela E; Mundo, Mariana R; Craig, Patricio O; Delfino, José M

2012-01-01

Much insight into protein folding, ligand binding, and complex formation can be derived from the examination of the nature and size of the accessible surface area (SASA) of the polypeptide chain, a key parameter in protein science not directly measurable in an experimental fashion. To this end, an ideal chemical approach should aim at exerting solvent mimicry and achieving minimal selectivity to probe the protein surface regardless of its chemical nature. The choice of the photoreagent diazirine to fulfill these goals arises from its size comparable to water and from being a convenient source of the extremely reactive methylene carbene (:CH₂). The ensuing methylation depends primarily on the solvent accessibility of the polypeptide chain, turning it into a valuable signal to address experimentally the measurement of SASA in proteins. The superb sensitivity and high resolution of modern mass spectrometry techniques allow us to derive a quantitative signal proportional to the extent of modification (EM) of the sample. Thus, diazirine labeling coupled to electrospray mass spectrometry (ESI-MS) detection can shed light on conformational features of the native as well as non-native states, not easily addressable by other methods. Enzymatic fragmentation of the polypeptide chain at the level of small peptides allows us to locate the covalent tag along the amino acid sequence, therefore enabling the construction of a map of solvent accessibility. Moreover, by subsequent MS/MS analysis of peptides, we demonstrate here the feasibility of attaining amino acid resolution in defining the target sites. © American Society for Mass Spectrometry, 2011
Certain bright soliton interactions of the Sasa-Satsuma equation in a monomode optical fiber

NASA Astrophysics Data System (ADS)

Liu, Lei; Tian, Bo; Chai, Han-Peng; Yuan, Yu-Qiang

2017-03-01

Under investigation in this paper is the Sasa-Satsuma equation, which describes the propagation of ultrashort pulses in a monomode fiber with the third-order dispersion, self-steepening, and stimulated Raman scattering effects. Based on the known bilinear forms, through the modified expanded formulas and symbolic computation, we construct the bright two-soliton solutions. Through classifying the interactions under different parameter conditions, we reveal six cases of interactions between the two solitons via an asymptotic analysis. With the help of the analytic and graphic analysis, we find that such interactions are different from those of the nonlinear Schrödinger equation and Hirota equation. When those solitons interact with each other, the singular-I soliton is shape-preserving, while the singular-II and nonsingular solitons may be shape preserving or shape changing. Such elastic and inelastic interaction phenomena in a scalar equation might enrich the knowledge of soliton behavior, which could be expected to be experimentally observed.
Twisted rogue-wave pairs in the Sasa-Satsuma equation.

PubMed

Chen, Shihua

2013-08-01

Exact explicit rogue wave solutions of the Sasa-Satsuma equation are obtained by use of a Darboux transformation. In addition to the double-peak structure and an analog of the Peregrine soliton, the rogue wave can exhibit an intriguing twisted rogue-wave pair that involves four well-defined zero-amplitude points. This exotic structure may enrich our understanding on the nature of rogue waves.
Applications of the RELAP5 code to the station blackout transients at the Browns Ferry Unit One Plant

DOE Office of Scientific and Technical Information (OSTI.GOV)

Schultz, R.R.; Wagoner, S.R.

1983-01-01

As a part of the charter of the Severe Accident Sequence Analysis (SASA) Program, station blackout transients have been analyzed using a RELAP5 model of the Browns Ferry Unit 1 Plant. The task was conducted as a partial fulfillment of the needs of the US Nuclear Regulatory Commission in examining the Unresolved Safety Issue A-44: Station Blackout. The station blackout transients were examined (a) to define the equipment needed to maintain a well cooled core, (b) to determine when core uncovery would occur given equipment failure, and (c) to characterize the behavior of the vessel thermal-hydraulics during the station blackout transients (in part as the plant operator would see it). These items are discussed in the paper. Conclusions and observations specific to the station blackout are presented.
Structural, molecular motions, and free-energy landscape of Leishmania sterol-14α-demethylase wild type and drug resistant mutant: a comparative molecular dynamics study.

PubMed

Vijayakumar, Saravanan; Das, Pradeep

2018-04-18

Sterol-14α-demethylase (CYP51) is an ergosterol pathway enzyme crucial for the survival of the infectious Leishmania parasite. A recent high-throughput metabolomics and whole genome sequencing study revealed that amphotericin B resistance in Leishmania is indeed due to a mutation in CYP51. The residue of mutation (asparagine 176) is conserved across the kinetoplastidae and not in yeast or humans, portraying its functional significance. In order to understand the possible cause for the resistance, knowledge of structural changes due to mutation is of high importance. To shed light on the structural changes of wild and mutant CYP51, we conducted a comparative molecular dynamics simulation study. The active site, substrate binding cavity, substrate channel entrance (SCE), and cavity involving the mutated site were studied based on basic parameters and large concerted molecular motions derived from essential dynamics analyses of 100 ns simulation. Results indicated that mutant CYP51 is stable and less compact than the wild type. Correspondingly, the solvent accessible surface area (SASA) of the mutant was found to be increased, especially in active site and cavities not involving the mutation site. Free-energy landscape analysis disclosed the mutant to have a richer conformational diversity than the wild type, with various free-energy conformations of mutant having SASA greater than wild type with SCE open. More residues were found to interact with the mutant CYP51 upon docking of substrate to both the wild and mutant CYP51. These results indicate that, relative to wild type, the N176I mutation of CYP51 in Leishmania mexicana could possibly favor increased substrate binding efficiency.
A metagenetic approach to determine the diversity and distribution of cyst nematodes at the level of the country, the field and the individual.

PubMed

Eves-van den Akker, Sebastian; Lilley, Catherine J; Reid, Alex; Pickup, Jon; Anderson, Eric; Cock, Peter J A; Blaxter, Mark; Urwin, Peter E; Jones, John T; Blok, Vivian C

2015-12-01

Distinct populations of the potato cyst nematode (PCN) Globodera pallida exist in the UK that differ in their ability to overcome various sources of resistance. An efficient method for distinguishing between populations would enable pathogen-informed cultivar choice in the field. Science and Advice for Scottish Agriculture (SASA) annually undertake national DNA diagnostic tests to determine the presence of PCN in potato seed and ware land by extracting DNA from soil floats. These DNA samples provide a unique resource for monitoring the distribution of PCN and further interrogation of the diversity within species. We identify a region of mitochondrial DNA descriptive of three main groups of G. pallida present in the UK and adopt a metagenetic approach to the sequencing and analysis of all SASA samples simultaneously. Using this approach, we describe the distribution of G. pallida mitotypes across Scotland with field-scale resolution. Most fields contain a single mitotype, one-fifth contain a mix of mitotypes, and less than 3% contain all three mitotypes. Within mixed fields, we were able to quantify the relative abundance of each mitotype across an order of magnitude. Local areas within mixed fields are dominated by certain mitotypes and indicate towards a complex underlying 'pathoscape'. Finally, we assess mitotype distribution at the level of the individual cyst and provide evidence of 'hybrids'. This study provides a method for accurate, quantitative and high-throughput typing of up to one thousand fields simultaneously, while revealing novel insights into the national genetic variability of an economically important plant parasite. © 2015 The Authors. Molecular Ecology Published by John Wiley & Sons Ltd.
What is the potential for interventions designed to prevent violence against women to reduce children's exposure to violence? Findings from the SASA! study, Kampala, Uganda.

PubMed

Kyegombe, Nambusi; Abramsky, Tanya; Devries, Karen M; Michau, Lori; Nakuti, Janet; Starmann, Elizabeth; Musuya, Tina; Heise, Lori; Watts, Charlotte

2015-12-01

Intimate partner violence (IPV) and child maltreatment often co-occur in households and lead to negative outcomes for children. This article explores the extent to which SASA!, an intervention to prevent violence against women, impacted children's exposure to violence. Between 2007 and 2012 a cluster randomized controlled trial was conducted in Kampala, Uganda. An adjusted cluster-level intention to treat analysis compares secondary outcomes in intervention and control communities at follow-up. Under the qualitative evaluation, 82 in-depth interviews were audio recorded at follow-up, transcribed verbatim, and analyzed using thematic analysis complemented by constant comparative methods. This mixed-methods article draws mainly on the qualitative data. The findings suggest that SASA! impacted on children's experience of violence in three main ways. First, quantitative data suggest that children's exposure to IPV was reduced. We estimate that reductions in IPV, combined with reduced witnessing by children when IPV did occur, led to a 64% reduction in prevalence of children witnessing IPV in their home (aRR 0.36, 95% CI 0.06-2.20). Second, among couples who experienced reduced IPV, qualitative data suggest parenting and discipline practices sometimes also changed, improving parent-child relationships and, for a few parents, resulting in the complete rejection of corporal punishment as a disciplinary method. Third, some participants reported intervening to prevent violence against children. The findings suggest that interventions to prevent IPV may also impact on children's exposure to violence, and improve parent-child relationships. They also point to potential synergies for violence prevention, an area meriting further exploration. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.
A community mobilisation intervention to prevent violence against women and reduce HIV/AIDS risk in Kampala, Uganda (the SASA! Study): study protocol for a cluster randomised controlled trial.

PubMed

Abramsky, Tanya; Devries, Karen; Kiss, Ligia; Francisco, Leilani; Nakuti, Janet; Musuya, Tina; Kyegombe, Nambusi; Starmann, Elizabeth; Kaye, Dan; Michau, Lori; Watts, Charlotte

2012-06-29

Gender based violence, including violence by an intimate partner, is a major global human rights and public health problem, with important connections with HIV risk. Indeed, the elimination of sexual and gender based violence is a core pillar of HIV prevention for UNAIDS. Integrated strategies to address the gender norms, relations and inequities that underlie both violence against women and HIV/AIDS are needed. However there is limited evidence about the potential impact of different intervention models. This protocol describes the SASA! Study: an evaluation of a community mobilisation intervention to prevent violence against women and reduce HIV/AIDS risk in Kampala, Uganda. The SASA! Study is a pair-matched cluster randomised controlled trial being conducted in eight communities in Kampala. It is designed to assess the community-level impact of the SASA! intervention on the following six primary outcomes: attitudes towards the acceptability of violence against women and the acceptability of a woman refusing sex (among male and female community members); past year experience of physical intimate partner violence and sexual intimate partner violence (among females); community responses to women experiencing violence (among women reporting past year physical/sexual partner violence); and past year concurrency of sexual partners (among males). 1583 women and men (aged 18-49 years) were surveyed in intervention and control communities prior to intervention implementation in 2007/8. A follow-up cross-sectional survey of community members will take place in 2012. The primary analysis will be an adjusted cluster-level intention to treat analysis, comparing outcomes in intervention and control communities at follow-up. Complementary monitoring and evaluation and qualitative research will be used to explore and describe the process of intervention implementation and the pathways through which change is achieved.
This is one of few cluster randomised trials globally to assess the impact of a gender-focused community mobilisation intervention. The multi-disciplinary research approach will enable us to address questions of intervention impact and mechanisms of action, as well as its feasibility, acceptability and transferability to other contexts. The results will be of importance to researchers, policy makers and those working on the front line to prevent violence against women and HIV. ClinicalTrials.Gov NCT00790959.
A community mobilisation intervention to prevent violence against women and reduce HIV/AIDS risk in Kampala, Uganda (the SASA! Study): study protocol for a cluster randomised controlled trial

PubMed Central

2012-01-01

Background Gender based violence, including violence by an intimate partner, is a major global human rights and public health problem, with important connections with HIV risk. Indeed, the elimination of sexual and gender based violence is a core pillar of HIV prevention for UNAIDS. Integrated strategies to address the gender norms, relations and inequities that underlie both violence against women and HIV/AIDS are needed. However there is limited evidence about the potential impact of different intervention models. This protocol describes the SASA! Study: an evaluation of a community mobilisation intervention to prevent violence against women and reduce HIV/AIDS risk in Kampala, Uganda. Methods/Design The SASA! Study is a pair-matched cluster randomised controlled trial being conducted in eight communities in Kampala. It is designed to assess the community-level impact of the SASA! intervention on the following six primary outcomes: attitudes towards the acceptability of violence against women and the acceptability of a woman refusing sex (among male and female community members); past year experience of physical intimate partner violence and sexual intimate partner violence (among females); community responses to women experiencing violence (among women reporting past year physical/sexual partner violence); and past year concurrency of sexual partners (among males). 1583 women and men (aged 18–49 years) were surveyed in intervention and control communities prior to intervention implementation in 2007/8. A follow-up cross-sectional survey of community members will take place in 2012. The primary analysis will be an adjusted cluster-level intention to treat analysis, comparing outcomes in intervention and control communities at follow-up. Complementary monitoring and evaluation and qualitative research will be used to explore and describe the process of intervention implementation and the pathways through which change is achieved. 
Discussion This is one of few cluster randomised trials globally to assess the impact of a gender-focused community mobilisation intervention. The multi-disciplinary research approach will enable us to address questions of intervention impact and mechanisms of action, as well as its feasibility, acceptability and transferability to other contexts. The results will be of importance to researchers, policy makers and those working on the front line to prevent violence against women and HIV. Trial registration ClinicalTrials.Gov NCT00790959 PMID:22747846
DBAC: A simple prediction method for protein binding hot spots based on burial levels and deeply buried atomic contacts

PubMed Central

2011-01-01

Background A protein binding hot spot is a cluster of residues in the interface that are energetically important for the binding of the protein with its interaction partner. Identifying protein binding hot spots can give useful information to protein engineering and drug design, and can also deepen our understanding of protein-protein interaction. These residues are usually buried inside the interface with very low solvent accessible surface area (SASA). Thus SASA is widely used as an outstanding feature in hot spot prediction by many computational methods. However, SASA is not capable of distinguishing slightly buried residues, of which most are non hot spots, and deeply buried ones that are usually inside a hot spot. Results We propose a new descriptor called “burial level” for characterizing residues, atoms and atomic contacts. Specifically, burial level captures the depth the residues are buried. We identify different kinds of deeply buried atomic contacts (DBAC) at different burial levels that are directly broken in alanine substitution. We use their numbers as input for SVM to classify between hot spot or non hot spot residues. We achieve F measure of 0.6237 under the leave-one-out cross-validation on a data set containing 258 mutations. This performance is better than other computational methods. Conclusions Our results show that hot spot residues tend to be deeply buried in the interface, not just having a low SASA value. This indicates that a high burial level is not only a necessary but also a more sufficient condition than a low SASA for a residue to be a hot spot residue. We find that those deeply buried atoms become increasingly more important when their burial levels rise up. This work also confirms the contribution of deeply buried interfacial atomic contacts to the energy of protein binding hot spot. PMID:21689480
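The evaluation protocol named above (F measure under leave-one-out cross-validation) can be sketched as follows. This is not the DBAC method: the paper uses an SVM on counts of deeply buried atomic contacts, whereas this sketch swaps in a toy midpoint-threshold classifier on a single made-up feature purely to show how the protocol works. All names and data here are hypothetical.

```python
def f_measure(true_labels, predicted_labels, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = fp = fn = 0
    for truth, pred in zip(true_labels, predicted_labels):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2.0 * precision * recall / (precision + recall)


def leave_one_out(samples, labels, fit, predict):
    """Predict each sample with a model trained on all the other samples."""
    predictions = []
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, samples[i]))
    return predictions


def fit_threshold(values, labels):
    """Toy stand-in for the SVM: threshold midway between the class means."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0


def predict_threshold(threshold, value):
    return 1 if value > threshold else 0


if __name__ == "__main__":
    # Made-up feature values (e.g. a contact count) and hot-spot labels.
    counts = [0, 1, 2, 6, 7, 8]
    is_hot_spot = [0, 0, 0, 1, 1, 1]
    preds = leave_one_out(counts, is_hot_spot, fit_threshold, predict_threshold)
    print(preds, f_measure(is_hot_spot, preds))
```

Leave-one-out suits small data sets such as the 258 mutations mentioned above, because every sample is used for testing exactly once while the model is trained on all the others.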

Social Anxiety Scale for Adolescents: factorial invariance across gender and age in Hispanic American adolescents.

PubMed

La Greca, Annette M; Ingles, Candido J; Lai, Betty S; Marzo, Juan C

2015-04-01

Social anxiety is a common psychological disorder that often emerges during adolescence and is associated with significant impairment. Efforts to prevent social anxiety disorder require sound assessment measures for identifying anxious youth, especially those from minority backgrounds. We examined the factorial invariance and latent mean differences of the Social Anxiety Scale for Adolescents (SAS-A) across gender and age groups in Hispanic American adolescents (N = 1,191; 56% girls; 15-18 years) using multigroup confirmatory factor analyses. Results indicated that the factorial configuration of the correlated three-factor model of the SAS-A was invariant across gender and age. Analyses of latent mean differences revealed that boys exhibited higher structured means than girls on the Social Avoidance and Distress-General (SAD-General) subscale. On all SAS-A subscales, Fear of Negative Evaluation, Social Avoidance and Distress-New, and SAD-General, estimates of the structured means decreased with adolescent age. Implications for further research and clinical practice are discussed. © The Author(s) 2014.
The Social Anxiety Scale for Adolescents: Measurement Invariance and Psychometric Properties Among a School Sample of Portuguese Youths.

PubMed

Pechorro, Pedro; Ayala-Nunes, Lara; Nunes, Cristina; Marôco, João; Gonçalves, Rui Abrunhosa

2016-12-01

Over the last decades there has been an increased interest in assessing social anxiety in adolescents. This study aims to validate the Social Anxiety Scale for Adolescents (SAS-A) to Portuguese youth, and to examine its invariance across gender as well as its psychometric properties. The participants were 782 Portuguese youths (371 males, 411 females), with an average age of 15.87 years (SD = 1.72). The results support the original three-factor structure of the SAS-A, with measurement invariance being found across gender, with females scoring higher than males on two subscales. High levels of internal consistency were found. Positive associations with empathy demonstrated that high socially anxious adolescents have elevated empathy tendencies. Mostly null or low negative associations were found with measures of psychopathic traits, callous-unemotional traits and aggression. Study findings provide evidence that the SAS-A is a psychometrically sound instrument that shows measurement invariance between genders, good reliability and positive correlations with empathy.
Breathers, quasi-periodic and travelling waves for a generalized (3+1)-dimensional Yu-Toda-Sasa-Fukayama equation in fluids

NASA Astrophysics Data System (ADS)

Hu, Wen-Qiang; Gao, Yi-Tian; Zhao, Chen; Jia, Shu-Liang; Lan, Zhong-Zhou

2017-07-01

Under investigation in this paper is a generalized (3+1)-dimensional Yu-Toda-Sasa-Fukayama equation for the interfacial wave in a two-layer fluid or the elastic quasi-plane wave in a liquid lattice. By virtue of the binary Bell polynomials, bilinear form of this equation is obtained. With the help of the bilinear form, N-soliton solutions are obtained via the Hirota method, and a bilinear Bäcklund transformation is derived to verify the integrability. Homoclinic breather waves are obtained according to the homoclinic test approach, which is not only the space-periodic breather but also the time-periodic breather via the graphic analysis. Via the Riemann theta function, quasi one-periodic waves are constructed, which can be viewed as a superposition of the overlapping solitary waves, placed one period apart. Finally, soliton-like, periodical triangle-type, rational-type and solitary bell-type travelling waves are obtained by means of the polynomial expansion method.
Station blackout calculations for Browns Ferry

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ott, L.J.; Weber, C.F.; Hyman, C.R.

1985-01-01

This paper presents the results of calculations performed with the ORNL SASA code suite for the Station Blackout Severe Accident Sequence at Browns Ferry. The accident is initiated by a loss of offsite power combined with failure of all onsite emergency diesel generators to start and load. The Station Blackout is assumed to persist beyond the point of battery exhaustion (at six hours) and without DC power, cooling water could no longer be injected into the reactor vessel. Calculations are continued through the period of core degradation and melting, reactor vessel failure, and the subsequent containment failure. An estimate of the magnitude and timing of the concomitant fission product releases is also provided.
To systematics of the genus Saetheria Jackson (Diptera, Chironomidae) from the Russian Far East.

PubMed

Orel, Oksana V

2014-05-23

The genus Saetheria Jackson from the Russian Far East is reviewed. The males of S. reissi Jackson, 1977, S. tamanipparai (Sasa, 1983) and S. tylus (Townes, 1945) are redescribed and figured. The pupa of S. reissi is redescribed and illustrated. The larva of S. reissi Jackson is described for the first time. Comments on the systematics and distribution of each species are provided. Paracladopelma kisopediformis Sasa, Kondo, 1993 is designated a new junior synonym of S. reissi Jackson, 1977. Keys to the males, pupae and larvae of the Russian Saetheria are given.
Detection of Orientia tsutsugamushi (Rickettsiales: Rickettsiaceae) in unengorged chiggers (Acari: Trombiculidae) from Oita Prefecture, Japan, by nested polymerase chain reaction.

PubMed

Pham, X D; Otsuka, Y; Suzuki, H; Takaoka, H

2001-03-01

The current study surveyed the 56-kDa type-specific antigen (TSA) gene DNAs of Orientia tsutsugamushi (Hayashi) in approximately 4,000 unengorged chiggers obtained from the soil or ground surface in an endemic and a nonendemic area of the Tsutsugamushi disease in Oita Prefecture, southwestern Japan, by nested polymerase chain reaction (PCR). Serotypes of O. tsutsugamushi were identified by restriction fragment-length polymorphism (RFLP) analysis. In the endemic area, 242 pools from five species [234 pools of Leptotrombidium scutellare (Nagayo, Miyagawa, Mitamura, Tamiya and Tenjin), two L. pallidum (Nagayo, Miyagawa, Mitamura and Tamiya), four L. kitasatoi (Fukuzumi & Obata), one L. fuji (Kuwata, Berge and Philip), and one Neotrombicula japonica (Tanaka, Kaiwa, Teramura and Kagaya)] were tested, and eight (seven pools of L. scutellare and one N. japonica) were positive for O. tsutsugamushi. Among the seven positive pools of L. scutellare, the distribution of serotypes was as follows: Kuroki (4), Gilliam (1), Karp (1), and Kawasaki (1). The first two serotypes (Kuroki and Gilliam) were identified for the first time in this species. In the nonendemic area, 128 pools from eight species were tested, and 13 were positive for O. tsutsugamushi. The positive rate was as follows: L. pallidum (4/41), L. kitasatoi (1/18), Gahrliepia saduski Womersley (2/10), L. fuji (4/50), L. himizu (Sasa, Kumada, Hayashi, Enomoto, Fukuzumi and Obata) (1/2), and Miyatrombicula kochiensis (Sasa, Kawashima and Egashira) (1/3). The latter three species were shown for the first time to harbor O. tsutsugamushi. All of the positive pools were Kuroki, except for two pools (one L. pallidum and one L. fuji), which were Gilliam (this serotype was also detected for the first time in L. pallidum). Further analysis revealed no differences in the nucleotide sequences (125 bp of variable domain 1 of TSA gene) of the same serotypes (i.e., Kuroki and Gilliam) among the positive samples. These data indicate that O. tsutsugamushi was widely distributed in various trombiculid species, even in the nonendemic area. The data are also suggestive of a possible horizontal transmission of O. tsutsugamushi among trombiculid species.
Findings from the SASA! Study: a cluster randomized controlled trial to assess the impact of a community mobilization intervention to prevent violence against women and reduce HIV risk in Kampala, Uganda.

PubMed

Abramsky, Tanya; Devries, Karen; Kiss, Ligia; Nakuti, Janet; Kyegombe, Nambusi; Starmann, Elizabeth; Cundill, Bonnie; Francisco, Leilani; Kaye, Dan; Musuya, Tina; Michau, Lori; Watts, Charlotte

2014-07-31

Intimate partner violence (IPV) and HIV are important and interconnected public health concerns. While it is recognized that they share common social drivers, there is limited evidence surrounding the potential of community interventions to reduce violence and HIV risk at the community level. The SASA! study assessed the community-level impact of SASA!, a community mobilization intervention to prevent violence and reduce HIV-risk behaviors. From 2007 to 2012 a pair-matched cluster randomized controlled trial (CRT) was conducted in eight communities (four intervention and four control) in Kampala, Uganda. Cross-sectional surveys of a random sample of community members, 18- to 49-years old, were undertaken at baseline (n = 1,583) and four years post intervention implementation (n = 2,532). Six violence and HIV-related primary outcomes were defined a priori. An adjusted cluster-level intention-to-treat analysis compared outcomes in intervention and control communities at follow-up. The intervention was associated with significantly lower social acceptance of IPV among women (adjusted risk ratio 0.54, 95% confidence interval (CI) 0.38 to 0.79) and lower acceptance among men (0.13, 95% CI 0.01 to 1.15); significantly greater acceptance that a woman can refuse sex among women (1.28, 95% CI 1.07 to 1.52) and men (1.31, 95% CI 1.00 to 1.70); 52% lower past year experience of physical IPV among women (0.48, 95% CI 0.16 to 1.39); and lower levels of past year experience of sexual IPV (0.76, 95% CI 0.33 to 1.72). Women experiencing violence in intervention communities were more likely to receive supportive community responses. Reported past year sexual concurrency by men was significantly lower in intervention compared to control communities (0.57, 95% CI 0.36 to 0.91). This is the first CRT in sub-Saharan Africa to assess the community impact of a mobilization program on the social acceptability of IPV, the past year prevalence of IPV and levels of sexual concurrency. SASA! achieved important community impacts, and is now being delivered in control communities and replicated in 15 countries. ClinicalTrials.gov #NCT00790959.
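The risk ratios quoted in the SASA! abstract above can be read with a little arithmetic. The following is a minimal sketch, not the trial's analysis: the function names are hypothetical, the numbers are copied from the abstract, and the check only asks whether a reported 95% CI contains the null value of 1. It does not reproduce the adjusted cluster-level model.

```python
def percent_reduction(risk_ratio):
    """Relative reduction implied by a risk ratio, in percent (RR 0.48 -> 52%)."""
    return round((1.0 - risk_ratio) * 100.0, 1)


def ci_excludes_null(lower, upper, null_value=1.0):
    """True when the confidence interval does not contain the null value."""
    return not (lower <= null_value <= upper)


# (risk ratio, CI lower, CI upper) as reported in the abstract.
reported = {
    "acceptance of IPV, women": (0.54, 0.38, 0.79),
    "acceptance of IPV, men": (0.13, 0.01, 1.15),
    "physical IPV, women": (0.48, 0.16, 1.39),
    "sexual concurrency, men": (0.57, 0.36, 0.91),
}

for outcome, (rr, lo, hi) in reported.items():
    print(outcome, percent_reduction(rr), ci_excludes_null(lo, hi))
```

Under this reading, the 52% figure for physical IPV is simply 1 minus the risk ratio of 0.48, and its interval (0.16 to 1.39) contains 1, which is consistent with the abstract not calling that result significant.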
Exploring opportunities and challenges for establishing a South American Space Agency

NASA Astrophysics Data System (ADS)

Silva-Martinez, Jackelynne P.; Aguilar, Andrés D.; Sarli, Bruno V.; Pardo Spiess, Monika Johanna; Sorice, Andreia F.; Genaro, Gino; Ojeda, Oscar I.

2018-06-01

The idea of establishing a South American Space Agency (SASA) is not new. There have been many discussions about this topic for a couple of decades, including an agreement by the Union of South American Nations to create such a space agency. Roughly 10 years ago, Argentina was the first to propose this collaboration with a military orientation. As the ideas progressed, Brazil was proposed to host its headquarters. However, not much support from the South American region has been given, either financially or logistically. To this day, a South American Space Agency or a similar concept has not yet been established in the region. The Space Generation Advisory Council (SGAC) hosted the first South American Space Generation Workshop in Argentina in 2015, where one of the working groups was tasked to further investigate the feasibility, advantages and challenges of implementing SASA. This paper presents an extension of the main findings from this working group where South American students and young professionals study and present a rationale in favor of SASA, outlining possible solutions and a structure that could be taken into account for its implementation. This paper pays particular attention to the question: Is it possible for countries in South America to establish the kind of cooperation necessary to stimulate the development and application of capabilities in the space sector, which would then enable undertaking missions far beyond the scope of what any single country in South America could do on its own? The existence of SASA would allow access to a common representative agency, which would lower costs, be accessible to all participating countries, and allow engagement with other emerging and established space agencies around the world.
Evaluation of various molecular parameters as predictors of bioconcentration in fish

DOE Office of Scientific and Technical Information (OSTI.GOV)

Connell, D.W.; Schueuermann, G.

1988-06-01

A reliable set of data on the bioconcentration factors (KB) of a diverse range of compounds in fish was selected from the literature. Using the structures of these compounds, the following molecular parameters were calculated: molecular weight (MW), solvent accessible molecular surface area (SASA), solvent accessible molecular volume (SAV), molar refraction (MR), largest principal moment of inertia (LPMI) and several molecular connectivity indices of the Randic type (1 chi, 2 chi, 3 chi, 1 chi vr, 3 chi c). The relationships between these parameters and log KB were evaluated for all compounds and the following subgroups: chlorinated hydrocarbons (CHC), polyaromatic hydrocarbons (PAH), and CHC and PAH combined. These relationships indicated that SASA, SAV, and MR were good predictors of log KB for the CHC and PAH combined or alone and the other parameters were less satisfactory with these groups. In addition with the CHC, the log of these parameters displayed an improved correlation with log KB due to apparent nonlinearity in the log to linear relationship. Thus, with these groups of compounds, calculated values of SASA, SAV, and MR provide a satisfactory means of estimating log KB without measured data.
Explicit and exact nontraveling wave solutions of the (3+1)-dimensional potential Yu-Toda-Sasa-Fukuyama equation

NASA Astrophysics Data System (ADS)

Yuan, Na

2018-04-01

With the aid of symbolic computation, we present an improved (G′/G)-expansion method, which can be applied to seek more types of exact solutions for certain nonlinear evolution equations. In illustration, we choose the (3+1)-dimensional potential Yu-Toda-Sasa-Fukuyama equation to demonstrate the validity and advantages of the method. As a result, abundant explicit and exact nontraveling wave solutions are obtained including two solitary waves solutions, nontraveling wave solutions and dromion soliton solutions. Some particular localized excitations and the interactions between two solitary waves are researched. The method can be also applied to other nonlinear partial differential equations.
Forensic Spoorology: Seeing and Understanding Human Behavior through Observation, Classification and Interpretation of Spoor Evidence

DTIC Science & Technology

2011-06-10

SASA Stride and Step Analysis SCS Spoor-Chain Signature SID Self-Imposed Distortion SMS Steadfast-Mind State SSA Secondary Spoor Area TEC Track...because there was no self-regulating body of professional trackers whom by education, training, and experience had come together to establish an...registered in soft ground or in snow by isolated tracks. This visibility gap in knowledge leaves an examiner’s eyes outside the realm of linearity
A Combined Computational and Genetic Approach Uncovers Network Interactions of the Cyanobacterial Circadian Clock.

PubMed

Boyd, Joseph S; Cheng, Ryan R; Paddock, Mark L; Sancar, Cigdem; Morcos, Faruck; Golden, Susan S

2016-09-15

Two-component systems (TCS) that employ histidine kinases (HK) and response regulators (RR) are critical mediators of cellular signaling in bacteria. In the model cyanobacterium Synechococcus elongatus PCC 7942, TCSs control global rhythms of transcription that reflect an integration of time information from the circadian clock with a variety of cellular and environmental inputs. The HK CikA and the SasA/RpaA TCS transduce time information from the circadian oscillator to modulate downstream cellular processes. Despite immense progress in understanding of the circadian clock itself, many of the connections between the clock and other cellular signaling systems have remained enigmatic. To narrow the search for additional TCS components that connect to the clock, we utilized direct-coupling analysis (DCA), a statistical analysis of covariant residues among related amino acid sequences, to infer coevolution of new and known clock TCS components. DCA revealed a high degree of interaction specificity between SasA and CikA with RpaA, as expected, but also with the phosphate-responsive response regulator SphR. Coevolutionary analysis also predicted strong specificity between RpaA and a previously undescribed kinase, HK0480 (herein CikB). A knockout of the gene for CikB (cikB) in a sasA cikA null background eliminated the RpaA phosphorylation and RpaA-controlled transcription that is otherwise present in that background and suppressed cell elongation, supporting the notion that CikB is an interactor with RpaA and the clock network. This study demonstrates the power of DCA to identify subnetworks and key interactions in signaling pathways and of combinatorial mutagenesis to explore the phenotypic consequences. Such a combined strategy is broadly applicable to other prokaryotic systems. Signaling networks are complex and extensive, comprising multiple integrated pathways that respond to cellular and environmental cues. A TCS interaction model, based on DCA, independently confirmed known interactions and revealed a core set of subnetworks within the larger HK-RR set. We validated high-scoring candidate proteins via combinatorial genetics, demonstrating that DCA can be utilized to reduce the search space of complex protein networks and to infer undiscovered specific interactions for signaling proteins in vivo. Significantly, new interactions that link circadian response to cell division and fitness in a light/dark cycle were uncovered. The combined analysis also uncovered a more basic core clock, illustrating the synergy and applicability of a combined computational and genetic approach for investigating prokaryotic signaling networks. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Socio-environmental factors associated with self-rated oral health in South Africa: a multilevel effects model.

PubMed

Olutola, Bukola G; Ayo-Yusuf, Olalekan A

2012-10-02

This study examined the influence of the social context in which people live on self-ratings of their oral health. This study involved a representative sample of 2,907 South African adults (≥16 years) who participated in the 2007 South African Social Attitude Survey (SASAS). We used the 2005 General Household Survey (n = 107,987 persons from 28,129 households) to obtain living environment characteristics of SASAS participants, including sources of water and energy, and household cell-phone ownership (a proxy measure for the social network available to them). Information obtained from SASAS included socio-demographic data, respondents' level of trust in people, oral health behaviors and self-rated oral health. Of the respondents, 76.3% self-rated their oral health as good. Social context influenced women's self-rated oral health differently from that of men. Good self-rated oral health was significantly higher among non-smokers, employed respondents and women living in areas with higher household cell-phone ownership. Furthermore, trust and higher social position were associated with good self-rated oral health among men and women respectively. Overall, 55.1% and 18.3% of the variance in self-rated oral health were explained by factors operating at the individual and community levels respectively. The findings highlight the potential role of social capital in improving the population's oral health.
Molecular modeling of class I and II alleles of the major histocompatibility complex in Salmo salar.

PubMed

Cárdenas, Constanza; Bidon-Chanal, Axel; Conejeros, Pablo; Arenas, Gloria; Marshall, Sergio; Luque, F Javier

2010-12-01

Knowledge of the 3D structure of the binding groove of major histocompatibility (MHC) molecules, which play a central role in the immune response, is crucial to shed light into the details of peptide recognition and polymorphism. This work reports molecular modeling studies aimed at providing 3D models for two class I and two class II MHC alleles from Salmo salar (Sasa), as the lack of experimental structures of fish MHC molecules represents a serious limitation to understand the specific preferences for peptide binding. The reliability of the structural models built up using bioinformatic tools was explored by means of molecular dynamics simulations of their complexes with representative peptides, and the energetics of the MHC-peptide interaction was determined by combining molecular mechanics interaction energies and implicit continuum solvation calculations. The structural models revealed the occurrence of notable differences in the nature of residues at specific positions in the binding groove not only between human and Sasa MHC proteins, but also between different Sasa alleles. Those differences lead to distinct trends in the structural features that mediate the binding of peptides to both class I and II MHC molecules, which are qualitatively reflected in the relative binding affinities. Overall, the structural models presented here are a valuable starting point to explore the interactions between MHC receptors and pathogen-specific interactions and to design vaccines against viral pathogens.
Constraints on quantum information field and “human gain medium” making possible functioning of social laser

NASA Astrophysics Data System (ADS)

Khrennikov, Andrei

2017-08-01

Starting with the quantum-like paradigm on application of quantum information and probability outside of physics we proceed to the social laser model describing Stimulated Amplification of Social Actions (SASA). The basic components of social laser are the quantum information field carrying information excitations and the human gain medium. The aim of this note is to analyze constraints on these components making possible SASA. The social laser model can be used to explain the recent wave of color revolutions as well as such “unpredictable events” as Brexit and election of Donald Trump as the president of the United States of America. The presented quantum-like model is not only descriptive. We shall list explicitly conditions for creation of social laser.
Inferring energy dissipation from violation of the fluctuation-dissipation theorem

NASA Astrophysics Data System (ADS)

Wang, Shou-Wen

2018-05-01

The Harada-Sasa equality elegantly connects the energy dissipation rate of a moving object with its measurable violation of the Fluctuation-Dissipation Theorem (FDT). Although proven for Langevin processes, its validity remains unclear for discrete Markov systems whose forward and backward transition rates respond asymmetrically to external perturbation. A typical example is a motor protein called kinesin. Here we show generally that the FDT violation persists surprisingly in the high-frequency limit due to the asymmetry, resulting in a divergent FDT violation integral and thus a complete breakdown of the Harada-Sasa equality. A renormalized FDT violation integral still well predicts the dissipation rate when each discrete transition produces a small entropy in the environment. Our study also suggests a way to infer this perturbation asymmetry based on the measurable high-frequency-limit FDT violation.
Dispersive optical soliton solutions for higher order nonlinear Sasa-Satsuma equation in mono mode fibers via new auxiliary equation method

NASA Astrophysics Data System (ADS)

Khater, Mostafa M. A.; Seadawy, Aly R.; Lu, Dianchen

2018-01-01

In this research, we apply a new technique to the higher-order nonlinear Schrödinger equation, which represents the propagation of short light pulses in monomode optical fibers and the evolution of slowly varying packets of quasi-monochromatic waves in weakly nonlinear dispersive media. The nonlinear Schrödinger equation is one of the basic models in fiber optics. We apply a new auxiliary equation method to the nonlinear Sasa-Satsuma equation to obtain new optical forms of solitary traveling wave solutions. Exact and solitary traveling wave solutions are obtained in different forms, such as trigonometric, hyperbolic, exponential and rational functions. The forms of solutions presented in this research prove the superiority of our new technique over almost thirteen powerful methods. The main merit of this method over the other methods is that it gives more general solutions with some free parameters.
Integrable aspects and rogue wave solution of Sasa-Satsuma equation with variable coefficients in the inhomogeneous fiber

NASA Astrophysics Data System (ADS)

Zhang, Yu-Ping; Yu, Lan; Wei, Guang-Mei

2018-02-01

Under investigation in this paper, with symbolic computation, is a variable-coefficient Sasa-Satsuma equation (SSE), which can describe ultrashort pulses in optical fiber communications and the propagation of deep ocean waves. By virtue of the extended Ablowitz-Kaup-Newell-Segur system, a Lax pair for the model is directly constructed. Based on the obtained Lax pair, an auto-Bäcklund transformation is provided, and the explicit one-soliton solution is then obtained. Meanwhile, an infinite number of conservation laws in explicit recursion forms are derived to indicate its integrability in the Liouville sense. Furthermore, an exact explicit rogue wave (RW) solution is presented by use of a Darboux transformation. In addition to the double-peak structure and an analog of the Peregrine soliton, the RW can exhibit graphically an intriguing twisted rogue-wave (TRW) pair that involves four well-defined zero-amplitude points.
Squared eigenfunctions for the Sasa-Satsuma equation

NASA Astrophysics Data System (ADS)

Yang, Jianke; Kaup, D. J.

2009-02-01

Squared eigenfunctions are quadratic combinations of Jost functions and adjoint Jost functions which satisfy the linearized equation of an integrable equation. They are needed for various studies related to integrable equations, such as the development of its soliton perturbation theory. In this article, squared eigenfunctions are derived for the Sasa-Satsuma equation whose spectral operator is a 3×3 system, while its linearized operator is a 2×2 system. It is shown that these squared eigenfunctions are sums of two terms, where each term is a product of a Jost function and an adjoint Jost function. The procedure of this derivation consists of two steps: First is to calculate the variations of the potentials via variations of the scattering data by the Riemann-Hilbert method. The second one is to calculate the variations of the scattering data via the variations of the potentials through elementary calculations. While this procedure has been used before on other integrable equations, it is shown here, for the first time, that for a general integrable equation, the functions appearing in these variation relations are precisely the squared eigenfunctions and adjoint squared eigenfunctions satisfying, respectively, the linearized equation and the adjoint linearized equation of the integrable system. This proof clarifies this procedure and provides a unified explanation for previous results of squared eigenfunctions on individual integrable equations. This procedure uses primarily the spectral operator of the Lax pair. Thus two equations in the same integrable hierarchy will share the same squared eigenfunctions (except for a time-dependent factor). In the Appendix, the squared eigenfunctions are presented for the Manakov equations whose spectral operator is closely related to that of the Sasa-Satsuma equation.
Stability analysis solutions and optical solitons in extended nonlinear Schrödinger equation with higher-order odd and even terms

NASA Astrophysics Data System (ADS)

Peng, Wei-Qi; Tian, Shou-Fu; Zou, Li; Zhang, Tian-Tian

2018-01-01

In this paper, the extended nonlinear Schrödinger equation with higher-order odd (third order) and even (fourth order) terms is investigated, whose particular cases are the Hirota equation, the Sasa-Satsuma equation and Lakshmanan-Porsezian-Daniel equation by selecting some specific values on the parameters of higher-order terms. We first study the stability analysis of the equation. Then, using the ansatz method, we derive its bright, dark solitons and some constraint conditions which can guarantee the existence of solitons. Moreover, the Riccati equation extension method is employed to derive some exact singular solutions. The outstanding characteristics of these solitons are analyzed via several diverting graphics.

vmdICE: a plug-in for rapid evaluation of molecular dynamics simulations using VMD.

PubMed

Knapp, Bernhard; Lederer, Nadja; Omasits, Ulrich; Schreiner, Wolfgang

2010-12-01

Molecular dynamics (MD) is a powerful in silico method to investigate the interactions between biomolecules. It solves Newton's equations of motion for atoms over a specified period of time and yields a trajectory file, containing the different spatial arrangements of atoms during the simulation. The movements and energies of each single atom are recorded. For evaluating these simulation trajectories with regard to biomedical implications, several methods are available. Three well-known ones are the root mean square deviation (RMSD), the root mean square fluctuation (RMSF) and solvent accessible surface area (SASA). Herein, we present a novel plug-in for the software "visual molecular dynamics" (VMD) that allows an interactive 3D representation of RMSD, RMSF, and SASA, directly on the molecule. On the one hand, our plug-in is easy to handle for inexperienced users, and on the other hand, it provides a fast and flexible graphical impression of the spatial dynamics of a system for experts in the field. © 2010 Wiley Periodicals, Inc.
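Two of the trajectory measures named in the vmdICE abstract above, RMSD and RMSF, have short textbook definitions. The sketch below illustrates those definitions in plain Python; it is not the plug-in's code and does not use VMD. The function names and the toy two-atom, two-frame trajectory are assumptions made for illustration, and the frames are assumed to be already superimposed (no alignment step is performed).

```python
import math


def rmsd(frame, reference):
    """Root mean square deviation between two frames with identical atom order."""
    squared = sum(
        (a - b) ** 2
        for atom, ref_atom in zip(frame, reference)
        for a, b in zip(atom, ref_atom)
    )
    return math.sqrt(squared / len(frame))


def rmsf(trajectory):
    """Per-atom root mean square fluctuation around each atom's mean position."""
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    fluctuations = []
    for i in range(n_atoms):
        # Mean position of atom i over all frames.
        mean = [sum(frame[i][k] for frame in trajectory) / n_frames for k in range(3)]
        # Mean squared displacement of atom i from that mean position.
        msf = sum(
            sum((frame[i][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        fluctuations.append(math.sqrt(msf))
    return fluctuations


# Toy trajectory: 2 frames, 2 atoms, (x, y, z) coordinates.
trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 2.0, 0.0)],
]

print(rmsd(trajectory[1], trajectory[0]))
print(rmsf(trajectory))
```

RMSD gives one number per frame (how far the whole structure has moved from a reference), while RMSF gives one number per atom (how much that atom moves over the run), which is why a per-atom coloring on the molecule is a natural display for RMSF.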
Major histocompatibility complex loci are associated with susceptibility of Atlantic salmon to infectious hematopoietic necrosis virus

USGS Publications Warehouse

Miller, Kristina M.; Winton, James R.; Schulze, Angela D.; Purcell, Maureen K.; Ming, Tobi J.

2004-01-01

Infectious hematopoietic necrosis virus (IHNV) is one of the most significant viral pathogens of salmonids and is a leading cause of death among cultured juvenile fish. Although several vaccine strategies have been developed, some of which are highly protective, the delivery systems are still too costly for general use by the aquaculture industry. More cost effective methods could come from the identification of genes associated with IHNV resistance for use in selective breeding. Further, identification of susceptibility genes may lead to an improved understanding of viral pathogenesis and may therefore aid in the development of preventive and therapeutic measures. Genes of the major histocompatibility complex (MHC), involved in the primary recognition of foreign pathogens in the acquired immune response, are associated with resistance to a variety of diseases in vertebrate organisms. We conducted a preliminary analysis of MHC disease association in which an aquaculture strain of Atlantic salmon was challenged with IHNV at three different doses and individual fish were genotyped at three MHC loci using denaturing gradient gel electrophoresis (PCR-DGGE), followed by sequencing of all differentiated alleles. Nine to fourteen alleles per exon-locus were resolved, and alleles potentially associated with resistance or susceptibility were identified. One allele (Sasa-B-04) from a potentially non-classical class I locus was highly associated with resistance to infectious hematopoietic necrosis (p < 0.01). This information can be used to design crosses of specific haplotypes for family analysis of disease associations.
Leptocorticium (Corticiaceae s.l., Basidiomycota): new species and combinations

Treesearch

Karen K. Nakasone

2005-01-01

The genus Leptocorticium is redescribed, and a key to the species is provided. A new taxon, Leptocorticium tenellum, is described, and two new combinations, L. sasae and L. utribasidiatum, are proposed. Dentocorticium nephrolepidis is determined to be conspecific with L. cyatheae. All four species are described and illustrated.
Anti-Halitosis Effect of Toothpaste Supplemented with Alkaline Extract of the Leaves of Sasa senanensis Rehder.

PubMed

Sakagami, Hiroshi; Sheng, Hong; Ono, Koki; Komine, Yusuke; Miyadai, Tomoharu; Terada, Yuji; Nakada, Daisuke; Tanaka, Shoji; Matsumoto, Masaru; Yasui, Toshikazu; Watanabe, Koichi; Junye, Jia; Natori, Takenori; Suguro-Kitajima, Madoka; Oizumi, Hiroshi; Oizumi, Takaaki

2016-01-01

Previous studies have shown activity against viruses, bacteria, inflammation and oral lichenoid dysplasia of alkaline extract of the leaves of Sasa senanensis Rehder (SE), suggesting its possible application to oral diseases. In the present study, we performed a small-scale clinical test to investigate whether SE is effective against halitosis and in oral bacterial reduction. A total of 12 volunteers participated in this study. They brushed their teeth immediately after meals three times each day with SE-containing toothpaste (SETP) or placebo toothpaste. Halitosis in the breath and bacterial number on the tongue were measured by commercially available portable apparatuses at a specified time in the morning. Some relationship was observed between halitosis and bacterial number from each individual, especially when those with severe halitosis were included. Repeated experiments demonstrated that SETP significantly reduced halitosis but not the bacterial number on the tongue. The present study provides for the first time the basis for anti-halitosis activity of SE. Copyright © 2016 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved.
Kinetic Reaction Mechanism of Sinapic Acid Scavenging NO2 and OH Radicals: A Theoretical Study

PubMed Central

Lu, Yang; Wang, AiHua; Shi, Peng; Zhang, Hui; Li, ZeSheng

2016-01-01

The mechanism and kinetics underlying reactions between the naturally occurring antioxidant sinapic acid (SA) and the very damaging ·NO2 and ·OH radicals were investigated using density functional theory (DFT). The two most probable reaction mechanisms were studied: hydrogen atom transfer (HAT) and radical adduct formation (RAF). Different reaction channels of neutral and anionic sinapic acid (SA⁻) scavenging radicals in both atmosphere and water medium were traced independently, and the thermodynamic and kinetic parameters were calculated. We find that the most active site of SA/SA⁻ for scavenging ·NO2 and ·OH is the –OH group on the benzene ring via the HAT mechanism, while the RAF mechanism for SA/SA⁻ scavenging ·NO2 seems thermodynamically unfavorable. In the water phase, at 298 K, the total rate constants of SA eliminating ·NO2 and ·OH are 1.30×10⁸ and 9.20×10⁹ M⁻¹ s⁻¹, respectively, indicating that sinapic acid is an efficient scavenger for both ·NO2 and ·OH. PMID:27622460
Solvent-accessible surface area: How well can be applied to hot-spot detection?

PubMed

Martins, João M; Ramos, Rui M; Pimenta, António C; Moreira, Irina S

2014-03-01

A detailed comprehension of protein-based interfaces is essential for rational drug development. One of the key features of these interfaces is their solvent accessible surface area profile. With that in mind, we tested a group of 12 SASA-based features for their ability to correlate and differentiate hot- and null-spots. These were tested in three different data sets: explicit water MD, implicit water MD, and static PDB structure. We found no discernible improvement with the use of more comprehensive data sets obtained from molecular dynamics. The features tested were shown to be capable of discerning between hot- and null-spots, while presenting low correlations. Residue standardization, such as rel SASA_i or rel/res SASA_i, improved the features as a tool to predict ΔΔG_binding values. A new method using support vector machine learning algorithms was developed: SBHD (Sasa-Based Hot-spot Detection). This method presents a precision, recall, and F1 score of 0.72, 0.81, and 0.76 for the training set and 0.91, 0.73, and 0.81 for an independent test set. Copyright © 2013 Wiley Periodicals, Inc.
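The F1 scores quoted above follow from the stated precision and recall, since F1 is their harmonic mean. A minimal Python sketch that checks this arithmetic; the function name `f1_score` is illustrative and not taken from the paper:

```python
def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall values quoted in the abstract.
train_f1 = f1_score(0.72, 0.81)
test_f1 = f1_score(0.91, 0.73)

print(round(train_f1, 2), round(test_f1, 2))
```

Rounded to two decimals, both values agree with the F1 scores reported for the training and independent test sets.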
Low nutation-rate dampers

NASA Technical Reports Server (NTRS)

Tossman, B. E.

1971-01-01

Mission requirements plus spacecraft weight and power constraints often reduce the excitation frequency of a nutation damper below 1 cpm. Since attitude stability is determined by damper performance, maximum effectiveness at low rates is demanded. Presented are design considerations that low-frequency dampers require, along with descriptions of two low-frequency systems: the Direct Measurement Explorer 1 and the Small Astronomy Satellite A (SAS-A).
Parental Partnerships in the Governance of Schools in the Black Townships of Port Elizabeth

ERIC Educational Resources Information Center

Mbokodi, Sindiswa Madgie; Singh, Prakash

2011-01-01

This article focuses on the functionality of school-governing bodies (SGBs) as the voice of parents in the governance of schools. After nearly sixteen years since the South African Schools Act 84 of 1996 (SASA) came into effect, the question that still raises many concerns among stakeholders in education is whether Black parents through their SGBs…
Rational Solutions and Lump Solutions of the Potential YTSF Equation

NASA Astrophysics Data System (ADS)

Sun, Hong-Qian; Chen, Ai-Hua

2017-07-01

By using the bilinear form, rational solutions and lump solutions of the potential Yu-Toda-Sasa-Fukuyama (YTSF) equation are derived. Dynamics of the fundamental lump solution, n1-order lump solutions, and N-lump solutions are studied for some special cases. We also find some interaction behaviours of solitary waves and one lump of rational solutions.
Multiple biological complex of alkaline extract of the leaves of Sasa senanensis Rehder.

PubMed

Sakagami, Hiroshi; Zhou, Li; Kawano, Michiyo; Thet, May Maw; Tanaka, Shoji; Machino, Mamoru; Amano, Shigeru; Kuroshita, Reina; Watanabe, Shigeru; Chu, Qing; Wang, Qin-Tao; Kanamoto, Taisei; Terakubo, Shigemi; Nakashima, Hideki; Sekine, Keisuke; Shirataki, Yoshiaki; Zhang, Chang-Hao; Uesawa, Yoshihiro; Mohri, Kiminori; Kitajima, Madoka; Oizumi, Hiroshi; Oizumi, Takaaki

2010-01-01

Previous studies have shown anti-inflammatory potential of alkaline extract of the leaves of Sasa senanensis Rehder (SE). The aim of the present study was to clarify the molecular entity of SE, using various fractionation methods. SE inhibited the production of nitric oxide (NO), but not tumour necrosis factor-α by lipopolysaccharide (LPS)-stimulated mouse macrophage-like cells. Lignin carbohydrate complex prepared from SE inhibited the NO production to a comparable extent with SE, whereas chlorophyllin was more active. On successive extraction with organic solvents, nearly 90% of SE components, including chlorophyllin, were recovered from the aqueous layer. Anti-HIV activity of SE was comparable with that of lignin-carbohydrate complex, and much higher than that of chlorophyllin and n-butanol extract fractions. The CYP3A inhibitory activity of SE was significantly lower than that of grapefruit juice and chlorophyllin. Oral administration of SE slightly reduced the number of oral bacteria. When SE was applied to HPLC, nearly 70% of SE components were eluted as a single peak. These data suggest that multiple components of SE may be associated with each other in the native state or after extraction with alkaline solution.
Binding mode prediction and MD/MMPBSA-based free energy ranking for agonists of REV-ERBα/NCoR.

PubMed

Westermaier, Yvonne; Ruiz-Carmona, Sergio; Theret, Isabelle; Perron-Sierra, Françoise; Poissonnet, Guillaume; Dacquet, Catherine; Boutin, Jean A; Ducrot, Pierre; Barril, Xavier

2017-08-01

Knowledge of the free energy of binding of small molecules to a macromolecular target is crucial in drug design, as is the ability to predict the functional consequences of binding. We highlight how a molecular dynamics (MD)-based approach can be used to predict the free energy of small molecules, and to provide priorities for the synthesis and the validation via in vitro tests. Here, we study the dynamics and energetics of the nuclear receptor REV-ERBα with its co-repressor NCoR and 35 novel agonists. Our in silico approach combines molecular docking, molecular dynamics (MD), solvent-accessible surface area (SASA) and molecular mechanics Poisson-Boltzmann surface area (MMPBSA) calculations. While docking yielded initial hints on the binding modes, their stability was assessed by MD. The SASA calculations revealed that the presence of the ligand led to a higher exposure of hydrophobic REV-ERB residues for NCoR recruitment. MMPBSA was very successful in ranking ligands by potency in a retrospective and prospective manner. In particular, the prospective MMPBSA ranking-based validations for four compounds, three predicted to be active and one weakly active, were confirmed experimentally.
Periodicity of microfilariae of human filariasis analysed by a trigonometric method (Aikat and Das).

PubMed

Tanaka, H

1981-04-01

The microfilarial periodicity of human filariae was characterized statistically by fitting the observed change of microfilaria (mf) counts to the formula of a simple harmonic wave using two parameters, the peak hour (K) and periodicity index (D) (Sasa & Tanaka, 1972, 1974). Later, Aikat and Das (1976) proposed a simple calculation method using trigonometry (A-D method) to determine the peak hour (K) and periodicity index (P). All data of microfilarial periodicity analysed previously by the method of Sasa and Tanaka (S-T method) were calculated again by the A-D method in the present study to evaluate the latter method. The results of calculations showed that P was not proportional to D and the ratios of P/D were mostly smaller than expected, especially when P or D was small in less periodic forms. The peak hour calculated by the A-D method did not differ much from that calculated by the S-T method. Goodness of fit was improved slightly by the A-D method in two thirds of analysed data. The classification of human filariae in respect of the type of periodicity was, however, changed little by the results calculated by the A-D method.
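Fitting counts to a simple harmonic wave, as described above, can be sketched in Python. This sketch assumes a wave of the form y = m + a·cos(15°·(h − K)) with a 24-hour period, counts sampled at evenly spaced hours covering one full day, and a periodicity index defined as D = 100·a/m. The abstract does not spell out these forms, so they are assumptions for illustration, and `fit_harmonic` is a hypothetical helper name.

```python
import math


def fit_harmonic(hours, counts):
    """Least-squares fit of y = m + a*cos(w*(h - K)) with w = 2*pi/24.

    Assumes the sampling hours are evenly spaced over a full 24 h cycle
    (at least three samples), so the cosine and sine terms are orthogonal.
    Returns (m, a, K, D), where D = 100 * a / m is an assumed definition
    of the periodicity index.
    """
    n = len(hours)
    w = 2 * math.pi / 24
    m = sum(counts) / n  # mean count
    b = 2.0 / n * sum(y * math.cos(w * h) for h, y in zip(hours, counts))
    c = 2.0 / n * sum(y * math.sin(w * h) for h, y in zip(hours, counts))
    a = math.hypot(b, c)  # amplitude of the wave
    k = (math.atan2(c, b) / w) % 24  # peak hour in [0, 24)
    d = 100 * a / m  # periodicity index (assumed form)
    return m, a, k, d


# Synthetic counts every 2 h with a known peak at 22:00 (not real data).
hours = list(range(0, 24, 2))
counts = [10 + 5 * math.cos(2 * math.pi * (h - 22) / 24) for h in hours]
m, a, k, d = fit_harmonic(hours, counts)
print(round(m, 3), round(a, 3), round(k, 3), round(d, 3))
```

On the synthetic series the fit recovers the mean, amplitude, and peak hour that were used to generate it, which is a basic sanity check before applying such a fit to observed counts.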
Lid opening and conformational stability of T1 Lipase is mediated by increasing chain length polar solvents

PubMed Central

Mohamad Ali, Mohd Shukuri; Salleh, Abu Bakar; Rahman, Raja Noor Zaliha Raja Abd; Normi, Yahaya M.; Mohd Shariff, Fairolniza

2017-01-01

The dynamics and conformational landscape of proteins in organic solvents are events of potential interest in nonaqueous process catalysis. Conformational changes, folding transitions, and stability often correspond to structural rearrangements that alter contacts between solvent molecules and amino acid residues. However, in nonaqueous enzymology, organic solvents limit stability and further application of proteins. In the present study, molecular dynamics (MD) of a thermostable Geobacillus zalihae T1 lipase was performed in different chain length polar organic solvents (methanol, ethanol, propanol, butanol, and pentanol) and water mixture systems to a concentration of 50%. On the basis of the MD results, the structural deviations of the backbone atoms elucidated the dynamic effects of water/organic solvent mixtures on the equilibrium state of the protein simulations in decreasing solvent polarity. The results show that the solvent mixture gives rise to deviations in enzyme structure from the native one simulated in water. The drop in the flexibility in H2O, MtOH, EtOH and PrOH simulation mixtures shows that greater motions of residues were influenced in BtOH and PtOH simulation mixtures. Comparing the root mean square fluctuations value with the accessible solvent area (SASA) for every residue showed an almost correspondingly high SASA value of residues to high flexibility and low SASA value to low flexibility. The study further revealed that the organic solvents influenced the formation of more hydrogen bonds in MtOH, EtOH and PrOH and thus, it is assumed that increased intraprotein hydrogen bonding is ultimately correlated to the stability of the protein. However, the solvent accessibility analysis showed that in all solvent systems, hydrophobic residues were exposed and polar residues tended to be buried away from the solvent. 
Distance variation of the tetrahedral intermediate packing of the active pocket was not conserved in organic solvent systems, which could lead to weaknesses in the catalytic H-bond network and most likely a drop in catalytic activity. The conformational variation of the lid domain caused by the solvent molecules influenced its gradual opening. Formation of additional hydrogen bonds and hydrophobic interactions indicates that the contribution of the cooperative network of interactions could retain the stability of the protein in some solvent systems. Time-correlated atomic motions were used to characterize the correlations between the motions of the atoms from atomic coordinates. The resulting cross-correlation map revealed that the organic solvent mixtures performed functional, concerted, correlated motions in regions of residues of the lid domain to other residues. These observations suggest that varying lengths of polar organic solvents play a significant role in introducing dynamic conformational diversity in proteins in a decreasing order of polarity. PMID:28533982
Two species of Nilothauma Kieffer (Diptera, Chironomidae) from Japan, with description of a new species.

PubMed

Niitsuma, Hiromi

2016-02-16

The male and female adults and pupa of Nilothauma niidaense n. sp. are described and illustrated on the basis of the material collected from a fontal stream in Fukushima Prefecture, Japan. N. sasai Adam & Sæther is treated as a junior synonym of N. hibaratertium Sasa, of which the male is redescribed. The Adam & Sæther key to Nilothauma males is revised.
Molecular dynamics simulation analysis of Focal Adhesive Kinase (FAK) docked with solanesol as an anti-cancer agent

PubMed Central

Daneial, Betty; Joseph, Jacob Paul Vazhappilly; Ramakrishna, Guruprasad

2017-01-01

Focal adhesion kinase (FAK) plays a primary role in regulating the activity of many signaling molecules. Increased FAK expression has been associated with a series of cellular processes such as cell migration and survival. FAK inhibition by an anti-cancer agent is critical. Therefore, it is of interest to identify, modify, design, improve and develop molecules to inhibit FAK. Solanesol is known to have inhibitory activity towards FAK. However, the molecular principles of its binding with FAK are unknown. Solanesol is a highly flexible ligand (25 rotatable bonds). Hence, ligand-protein docking was completed using AutoDock with a modified contact-based scoring function. The FAK-solanesol complex model was further energy minimized and simulated with the GROMOS96 (53a6) force field, followed by post-simulation analysis such as root mean square deviation (RMSD), root mean square fluctuation (RMSF) and solvent accessible surface area (SASA) calculations to explain solanesol-FAK binding. PMID:29081606
Molecular dynamics simulation analysis of Focal Adhesive Kinase (FAK) docked with solanesol as an anti-cancer agent.

PubMed

Daneial, Betty; Joseph, Jacob Paul Vazhappilly; Ramakrishna, Guruprasad

2017-01-01

Focal adhesion kinase (FAK) plays a primary role in regulating the activity of many signaling molecules. Increased FAK expression has been associated with a series of cellular processes such as cell migration and survival. FAK inhibition by an anti-cancer agent is critical. Therefore, it is of interest to identify, modify, design, improve and develop molecules to inhibit FAK. Solanesol is known to have inhibitory activity towards FAK. However, the molecular principles of its binding with FAK are unknown. Solanesol is a highly flexible ligand (25 rotatable bonds). Hence, ligand-protein docking was completed using AutoDock with a modified contact-based scoring function. The FAK-solanesol complex model was further energy minimized and simulated with the GROMOS96 (53a6) force field, followed by post-simulation analysis such as root mean square deviation (RMSD), root mean square fluctuation (RMSF) and solvent accessible surface area (SASA) calculations to explain solanesol-FAK binding.
Topological T-duality via Lie algebroids and Q-flux in Poisson-generalized geometry

NASA Astrophysics Data System (ADS)

Asakawa, Tsuguhiko; Muraki, Hisayoshi; Watamura, Satoshi

2015-10-01

It is known that the topological T-duality exchanges H- and F-fluxes. In this paper, we reformulate the topological T-duality as an exchange of two Lie algebroids in the generalized tangent bundle. Then, we apply the same formulation to the Poisson-generalized geometry, which is introduced [T. Asakawa, H. Muraki, S. Sasa and S. Watamura, Int. J. Mod. Phys. A 30, 1550097 (2015), arXiv:1408.2649 [hep-th
Conochironomus (Diptera: Chironomidae) in Asia: new and redescribed species and vouchering issues.

PubMed

Cranston, Peter S

2016-05-09

The presence of the Afro-Australian genus Conochironomus Freeman, 1961 (Diptera: Chironomidae) in Asia has been recognised only informally. An unpublished thesis included Conochironomus from Singapore, and the genus has been keyed from Malaysia without named species. Here, the Sumatran Conochironomus tobaterdecimus (Kikuchi & Sasa, 1980) comb. n. is recorded from Singapore and Thailand. The species is transferred from Sumatendipes Kikuchi & Sasa, 1980, rendering the latter a junior synonym (syn. n.) of Conochironomus Freeman. Conochironomus nuengthai sp. n. and Conochironomus sawngthai sp. n. are described as new to science, based on adult males from Chiang Mai, Thailand. All species conform to existing generic diagnoses for all life stages, with features from male and female genitalia, pupal cephalic tubercles and posterolateral 'spurs' of tergite VIII providing evidence for species distinction. Some larvae are linked to C. tobaterdecimus through molecular barcoding. Variation in other larvae, which clearly belong to Conochironomus and are common throughout Thailand, means that they cannot be segregated to species. Larval habitats include pools in river beds, urban storage reservoirs, drains with moderately high nutrient loadings, and peat swamps. Endochironomus effusus Dutta, 1994 from north-eastern India may be a congener but may differ in adult morphology, thereby precluding formal new combination until discrepancies can be reconciled. Many problems with vouchering taxonomic and molecular material are identified that need to be rectified in the future.
Mechanically Reconfigurable Single-Arm Spiral Antenna Array for Generation of Broadband Circularly Polarized Orbital Angular Momentum Vortex Waves.

PubMed

Li, Long; Zhou, Xiaoxiao

2018-03-23

In this paper, a mechanically reconfigurable circular array with single-arm spiral antennas (SASAs) is designed, fabricated, and experimentally demonstrated to generate broadband circularly polarized orbital angular momentum (OAM) vortex waves in the radio frequency domain. With the symmetrical and broadband properties of single-arm spiral antennas, vortex waves with different OAM modes can be generated by mechanical reconfiguration over a wide band from 3.4 GHz to 4.7 GHz. A prototype of the circular array is designed and fabricated to validate the theoretical analysis. The simulated and experimental results verify that different OAM modes can be effectively generated by rotating the spiral arms of the single-arm spiral antennas by corresponding degrees, which greatly simplifies the feeding network. The proposed method paves a reconfigurable way to generate multiple OAM vortex waves with spin angular momentum (SAM) in radio and microwave satellite communication applications.
DMG-α--a computational geometry library for multimolecular systems.

PubMed

Szczelina, Robert; Murzyn, Krzysztof

2014-11-24

The DMG-α library grants researchers in the fields of computational biology, chemistry, and biophysics access to open-source, easy-to-use, and intuitive software for performing fine-grained geometric analysis of molecular systems. The library is capable of computing power diagrams (weighted Voronoi diagrams) in three dimensions with 3D periodic boundary conditions, computing approximate projective 2D Voronoi diagrams on arbitrarily defined surfaces, performing shape property recognition using α-shape theory, and computing exact Solvent Accessible Surface Area (SASA). The software is written mainly as a template-based C++ library for greater performance, but a rich Python interface (pydmga) is provided as a convenient way to manipulate the DMG-α routines. To illustrate possible applications of the DMG-α library, we present results of sample analyses which allowed us to determine nontrivial geometric properties of two Escherichia coli-specific lipids as emerging from molecular dynamics simulations of relevant model bilayers.
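For orientation, SASA is often estimated numerically rather than computed exactly. The Python sketch below uses Shrake-Rupley style point sampling, which is a different, approximate technique and not the exact algorithm that DMG-α implements. It does not use pydmga; the function names, the 1.4 Å probe radius, and the number of sample points are illustrative assumptions.

```python
import math


def sphere_points(n):
    """Return n roughly uniform points on the unit sphere (golden spiral)."""
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    points = []
    for i in range(n):
        z = 1.0 - 2.0 * (i + 0.5) / n
        r = math.sqrt(max(0.0, 1.0 - z * z))
        phi = i * golden_angle
        points.append((r * math.cos(phi), r * math.sin(phi), z))
    return points


def shrake_rupley(atoms, probe=1.4, n_points=960):
    """Approximate per-atom SASA for atoms given as (x, y, z, radius).

    Each atom's probe-expanded sphere is sampled with n_points points;
    a point counts as exposed if it lies outside every other atom's
    probe-expanded sphere. Cost is O(atoms^2 * n_points), so this is
    only suitable for small illustrative systems.
    """
    unit = sphere_points(n_points)
    areas = []
    for i, (xi, yi, zi, ri) in enumerate(atoms):
        big_r = ri + probe
        exposed = 0
        for ux, uy, uz in unit:
            px, py, pz = xi + big_r * ux, yi + big_r * uy, zi + big_r * uz
            buried = False
            for j, (xj, yj, zj, rj) in enumerate(atoms):
                if j == i:
                    continue
                cut = rj + probe
                if (px - xj) ** 2 + (py - yj) ** 2 + (pz - zj) ** 2 < cut * cut:
                    buried = True
                    break
            if not buried:
                exposed += 1
        areas.append(4.0 * math.pi * big_r * big_r * exposed / n_points)
    return areas


# Two hypothetical atoms of radius 1.5 placed 2.0 apart (same length unit).
print(shrake_rupley([(0.0, 0.0, 0.0, 1.5), (2.0, 0.0, 0.0, 1.5)]))
```

An isolated atom returns the full area of its probe-expanded sphere, while overlapping atoms each lose the part of that sphere buried inside the neighbour; increasing `n_points` tightens the estimate at higher cost.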

Notes on the genus Conchapelopia Fittkau (Diptera: Chironomidae: Tanypodinae) from southern China, with description of a new species.

PubMed

Niitsuma, Hiromi; Tang, Hongqu

2017-02-22

Two interesting species, Conchapelopia togamaculosa Sasa & Okazawa and a new species, Conchapelopia brachiata sp. n., were collected from southern China. The male, pupa and larva of the new species are described, and new distributions of the former species are noted. Although the male of the new species is very distinct from that of the former in the hypopygial median volsella, the pupa and larva stunningly resemble those of the former.
Chigger mites (Acari: Trombiculidae) from Makalu region in Nepal Himalaya, with a description of three new species.

PubMed

Daniel, M; Stekol'nikov, A A

2009-07-01

Three new species of chigger mites, Neotrombicula kounickyi sp. n., Leptotrombidium angkamii sp. n., and Doloisia vlastae sp. n., are described from two species of small mammals collected in the Barun Glacier Valley, Makalu region, Nepal Himalaya. Two species, Trombiculindus mehtai Fernandes et Kulkarni, 2003 and Cheladonta ikaoensis (Sasa et al., 1951) are recorded for the first time in Nepal. Data on altitude distribution of chiggers and their host preferences are given.
Role of Glycans in Cholesteryl Ester Transfer Protein revealed by MD simulation.

PubMed

Hao, Dongxiao; Yang, Zhiwei; Gao, Teng; Tian, Zhiqi; Zhang, Lei; Zhang, Shengli

2018-05-03

Current cholesteryl ester transfer protein (CETP) inhibitors are designed based on the unglycosylated crystal structure, and most of them have failed to cure cardiovascular disease (CVD). It is particularly important to investigate the glycosylated structure of CETP (CETP-G) and the effect of glycans on the structure and function of CETP. Here, we used a total of 3.0 μs of molecular dynamics trajectories of the nascent structure of CETP (CETP-N) and CETP-G to study their structural differences and to shed new light on CETP-mediated lipid exchange. In accordance with our simulations and previous mutation studies, relative to CETP-N, CETP-G adopts a more stretched shape with higher hydrophobic and hydrophilic SASA of the N-terminal oscillating with larger amplitude, in which Glycan88 provides partial assistance for CEs through the N-terminal. Glycan341 reduces the flexibility of the neck flap, interfering with the passage of CEs through the neck region. In addition, Glycan240 reduces the flexibility of Helix-X, interfering with CE transfer. Glycan396 decreases the flexibility and increases the hydrophobic SASA of the C-terminal. Overall, these glycans affect the dynamics and structure of CETP by forming H-bonds with surrounding residues, and the sampled conformations of each glycan are in turn affected by its surrounding residues. Thus, glycans are an integral part of CETP; further studies on CETP inhibition and the treatment of CVD should fully consider the effect of glycans. This article is protected by copyright. All rights reserved. © 2018 Wiley Periodicals, Inc.
Sasa quelpaertensis leaf extract regulates microbial dysbiosis by modulating the composition and diversity of the microbiota in dextran sulfate sodium-induced colitis mice.

PubMed

Yeom, Yiseul; Kim, Bong-Soo; Kim, Se-Jae; Kim, Yuri

2016-11-25

Inflammatory bowel diseases (IBD) are related to a dysfunction of the mucosal immune system and they result from complex interactions between genetics and environmental factors, including lifestyle, diet, and the gut microbiome. Therefore, the effect of Sasa quelpaertensis leaf extract (SQE) on gut microbiota in a dextran sulfate sodium (DSS)-induced colitis mouse model was investigated with pyrosequencing of fecal samples. Three groups of animals were examined: i) a control group, ii) a group that received 2.5% DSS in their drinking water for 7 days, followed by 7 days of untreated water, and then another 7 days of 2.5% DSS in their drinking water, and iii) a group that was presupplemented with SQE (300 mg/kg body weight) by gavage for two weeks prior to the same DSS treatment schedule described in ii. SQE supplementation alleviated disease activity scores and shortened colon length compared to the other two groups. In the DSS group, the proportion of Bacteroidetes increased, whereas the proportion of Firmicutes decreased compared to the control group. SQE supplementation recovered the proportions of Firmicutes and Bacteroidetes back to control levels. Moreover, the diversity of microbiota in the SQE supplementation group was higher than that of the DSS group. SQE was found to protect mice from microbial dysbiosis associated with colitis by modulating the microbial composition and diversity of the microbiota present. These results provide valuable insight into microbiota-food component interactions in IBD.
Structural and Kinetic Properties of Graphite Intercalation Compounds

DTIC Science & Technology

1982-08-21

the case of FeCl3 , and Dowel2 for Br2, HNO3 and PdCl2 have investigated rates of intercalation to determine diffusion coefficients. Bardhan et al.18...Chim. 21, 1312 (1954). 17. T. Sasa, Y. Takahashi and T. Mukaibo, Carbon 9, 407 (1971). 18. K. K. Bardhan and D. D. L. Chung, Carbon 18, 313 (1980). 19...S. H. Anderson and D. D. L. Chung, Ext. Abst. Program -- Bienn. Conf. Carbon 15, 361 (1981). 20. K. K. Bardhan and D. D. L. Chung, Carbon 18, 303
A revised subgeneric position for Polypedilum (Probolum) simantokeleum, with description of a new Uresipedilum species in Japan (Diptera: Chironomidae).

PubMed

Yamamoto, Nao; Yamamoto, Masaru

2015-08-11

Two Japanese Polypedilum species including a new species are redescribed and described based on the males. Polypedilum (Probolum) simantokeleum, Sasa, Suzuki et Sakai, 1998, is transferred to the subgenus Uresipedilum. Polypedilum (Uresipedilum) dissimilum sp. nov. is easily distinguished from other members of Uresipedilum by having a T-shaped tergal band. Definition of the subgenus Probolum is briefly discussed: we suggest Probolum should be defined as the species with the superior volsella bearing inner lobe pending adequate larval information.
A contribution to the systematics of Australasian Tanytarsini (Diptera: Chironomidae): first descriptions from New Caledonia.

PubMed

Giłka, Wojciech; Dobosz, Roland

2015-06-26

First specific records of chironomids of the tribe Tanytarsini from New Caledonia based on detailed descriptions of new species are presented. Cladotanytarsus (Cladotanytarsus) stylifer sp. nov. and its closest relatives, i.a. Cladotanytarsus (C.) isigacedeus (Sasa et Suzuki, 2000), comb. nov., known from males bearing extraordinarily elongate hypopygial anal points are diagnosed. Paratanytarsus mirificus sp. nov. is described as adult male with unique structure of its hypopygium and shortened antennae. Diagnostic description of Tanytarsus fuscithorax Skuse, 1889 is also complemented.
HBCU Equipment for AFOSR Project 13RSL012: The Mechanism by which ADP Regulates the Structure and Function of the Protein KaiC

DTIC Science & Technology

2015-05-18

4 , as well as increases the risk of obesity 5-7 , diabetes 8, 9 , heart disease 10 , and cancer 11, 12 . Our lab studies the circadian clock of a...2013) Two Antagonistic Clock-Regulated Histidine Kinases Time the Activation of Circadian Gene Expression. Mol. Cell 50, 288-294. 10.1016/j.molcel...Circadian Clock-associated Histidine Kinase SasA. J. Mol. Biol. 342, 9-17. 10.1016/j.jmb.2004.07.010. 19. Smith R. M., Williams S. B. (2006) Circadian
Impact of point mutation P29S in RAC1 on tumorigenesis.

PubMed

Rajendran, Vidya; Gopalakrishnan, Chandrasekhar; Purohit, Rituraj

2016-11-01

A point mutation (P29S) in the RAS-related C3 botulinum toxin substrate 1 (RAC1) was considered to be a trigger for melanoma, a form of skin cancer with the highest mortality rate. In this study, we have investigated the pathogenic role of P29S based on the conformational behavior of RAC1 protein toward guanosine triphosphate (GTP). Molecular interaction, molecular dynamics trajectory analysis (RMSD, RMSF, Rg, SASA, DSSP, and PCA), and shape analysis of the binding pocket were performed to analyze the interaction energy and the dynamic behavior of native and mutant RAC1 at the atomic level. Due to this mutation, the RAC1 switch I region acquired more flexibility and, to compensate, the switch II region became rigid in its conformational space, as a result of which the interaction energy of the protein for GTP increased. The overall results strongly implied that the changes in atomic conformation of the switch I and II regions in mutant RAC1 protein were a significant reason for its malignant transformation and tumorigenesis. Our findings give researchers an opportunity to design possible therapeutic molecules.
Conformation Analysis of T1 Lipase on Alcohols Solvent using Molecular Dynamics Simulation

NASA Astrophysics Data System (ADS)

Putri, A. M.; Sumaryada, T.; Wahyudi, S. T.

2017-07-01

Biodiesel is usually produced commercially via a transesterification reaction of vegetable oil with alcohol and an alkali catalyst. The alkali catalyst has some drawbacks, such as soap formation during the reaction. T1 lipase enzyme is known as a thermostable biocatalyst that is able to produce biodiesel through a cleaner process. In this paper, the performance of T1 lipase enzyme as a catalyst for the transesterification reaction in pure ethanol, methanol, and water solvents was studied using molecular dynamics (MD) simulation at a temperature of 300 K for 10 nanoseconds. The results have shown that, in general, the conformation of T1 lipase enzyme in methanol is more dynamic, as shown by the values of root mean square deviation (RMSD), root mean square fluctuation (RMSF), and radius of gyration. The highest total solvent accessible surface area (SASA) was also found in methanol, due to the contribution of non-polar amino acids in the interior of the protein. Analysis of the MD simulation has also revealed that the enzyme structure tends to be more rigid in an ethanol environment. The analysis of electrostatic interactions has shown that the Glu359-Arg270 salt-bridge pair might hold the key to the thermostability of T1 lipase enzyme, as shown by its strong and stable binding in all three solvents.
Effect of Sasa veitchii extract on immunostimulating activity of β-glucan (SCG) from culinary-medicinal mushroom Sparassis crispa Wulf.:Fr. (higher Basidiomycetes).

PubMed

Yoshida, Mia; Hida, Toshie H; Takeshita, Kazuo; Tsuboi, Masamichi; Kanamori, Masato; Akachi, Natsuko; Miura, Noriko N; Adachi, Yoshiyuki; Ohno, Naohito

2012-01-01

Fungal β-glucan is a representative pathogen-associated microbial pattern (PAMP) from mushroom, yeast, and fungi, and stimulates innate as well as acquired immune systems. It is a widely used functional food to enhance immunity. Such plant extracts have been known as folk medicines and reported to show various biological activities beneficial to human health, such as anti-tumor, anti-allergic, and anti-inflammatory activities. In the present study, the cooperative effect of bamboo water-soluble methanol precipitation (BWMP), a macromolecular fraction of the hot-water extract of Sasa veitchii (Japanese folk medicine Kumazasa), and the β-glucan from the medicinal mushroom Sparassis crispa (SCG) was analyzed in vitro using DBA/2 mice. The splenocytes from male DBA/2 mice were cultured with BWMP in the presence of SCG, and the responses were assessed by measuring cytokines. BWMP suppressed IFN-γ and GM-CSF production by SCG, but not TNF-α production. To analyze the specificity of the reaction, similar experiments were conducted with BWMP in the presence of bacterial lipopolysaccharide (LPS); however, none of the cytokines were inhibited. Cytokine production of splenocytes by SCG was suggested to be largely dependent on the binding of lymphocytes with dendritic cells. Functions of BWMP were also analyzed by mixed lymphocyte reaction, and IFN-γ production was suppressed. These findings suggested that BWMP modulated the cell-to-cell contact induced by SCG and inhibited cytokine production. It is strongly suggested that the plant extracts modulate the immunostimulating effects of medicinal mushrooms. Cooperative effects of plants and mushrooms would be an important issue for functional foods.
Efficacy of surface disinfectant cleaners against emerging highly resistant gram-negative bacteria

PubMed Central

2014-01-01

Background: Worldwide, the emergence of multidrug-resistant gram-negative bacteria is a clinical problem. Surface disinfectant cleaners (SDCs) that are effective against these bacteria are needed for use in high risk areas around patients and on multi-touch surfaces. We determined the efficacy of several SDCs against clinically relevant bacterial species with and without common types of multidrug resistance. Methods: Bacteria species used were ATCC strains; clinical isolates classified as antibiotic-susceptible; and multi-resistant clinical isolates from Klebsiella oxytoca, Klebsiella pneumoniae, and Serratia marcescens (all OXA-48 and KPC-2); Acinetobacter baumannii (OXA-23); Pseudomonas aeruginosa (VIM-1); and Achromobacter xylosoxidans (ATCC strain). Experiments were carried out according to EN 13727:2012 in quadruplicate under dirty conditions. The five evaluated SDCs were based on alcohol and an amphoteric substance (AAS), an oxygen-releaser (OR), surface-active substances (SAS), or surface-active-substances plus aldehydes (SASA; two formulations). Bactericidal concentrations of SDCs were determined at two different contact times. Efficacy was defined as a log10 ≥ 5 reduction in bacterial cell count. Results: SDCs based on AAS, OR, and SAS were effective against all six species irrespective of the degree of multi-resistance. The SASA formulations were effective against the bacteria irrespective of degree of multi-resistance except for one of the four P. aeruginosa isolates (VIM-1). We found no general correlation between SDC efficacy and degree of antibiotic resistance. Conclusions: SDCs were generally effective against gram-negative bacteria with and without multidrug resistance. SDCs are therefore suitable for surface disinfection in the immediate proximity of patients. Single bacterial isolates, however, might have reduced susceptibility to selected biocidal agents. PMID:24885029
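The efficacy criterion stated above (a log10 reduction of at least 5 in bacterial cell count) can be expressed as a short calculation. A minimal Python sketch, assuming cell counts are given in the same units before and after exposure; the function names and the sample counts are illustrative and not taken from the study:

```python
import math


def log10_reduction(count_before: float, count_after: float) -> float:
    """Return log10(count_before / count_after) for positive counts."""
    if count_before <= 0 or count_after <= 0:
        raise ValueError("counts must be positive")
    return math.log10(count_before / count_after)


def is_effective(count_before: float, count_after: float,
                 threshold: float = 5.0) -> bool:
    """Apply the 'log10 reduction >= threshold' efficacy criterion."""
    return log10_reduction(count_before, count_after) >= threshold


# Hypothetical counts: 1e9 cells reduced to 1e3 is a 6-log reduction.
print(is_effective(1e9, 1e3), is_effective(1e8, 1e5))
```

A count of zero survivors cannot be passed to this helper directly; in practice a detection limit would be substituted, which is outside the scope of this sketch.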
Impact of Moist Physics Complexity on Tropical Cyclone Simulations from the Hurricane Weather Research and Forecast System

NASA Astrophysics Data System (ADS)

Kalina, E. A.; Biswas, M.; Newman, K.; Grell, E. D.; Bernardet, L.; Frimel, J.; Carson, L.

2017-12-01

The parameterization of moist physics in numerical weather prediction models plays an important role in modulating tropical cyclone structure, intensity, and evolution. The Hurricane Weather Research and Forecast system (HWRF), the National Oceanic and Atmospheric Administration's operational model for tropical cyclone prediction, uses the Scale-Aware Simplified Arakawa-Schubert (SASAS) cumulus scheme and a modified version of the Ferrier-Aligo (FA) microphysics scheme to parameterize moist physics. The FA scheme contains a number of simplifications that allow it to run efficiently in an operational setting, which includes prescribing values for hydrometeor number concentrations (i.e., single-moment microphysics) and advecting the total condensate rather than the individual hydrometeor species. To investigate the impact of these simplifying assumptions on the HWRF forecast, the FA scheme was replaced with the more complex double-moment Thompson microphysics scheme, which individually advects cloud ice, cloud water, rain, snow, and graupel. Retrospective HWRF forecasts of tropical cyclones that occurred in the Atlantic and eastern Pacific ocean basins from 2015-2017 were then simulated and compared to those produced by the operational HWRF configuration. Both traditional model verification metrics (i.e., tropical cyclone track and intensity) and process-oriented metrics (e.g., storm size, precipitation structure, and heating rates from the microphysics scheme) will be presented and compared. The sensitivity of these results to the cumulus scheme used (i.e., the operational SASAS versus the Grell-Freitas scheme) also will be examined. Finally, the merits of replacing the moist physics schemes that are used operationally with the alternatives tested here will be discussed from a standpoint of forecast accuracy versus computational resources.
The mediating role of Internet addiction in depression, social anxiety, and psychosocial well-being among adolescents in six Asian countries: a structural equation modelling approach.

PubMed

Lai, C M; Mak, K K; Watanabe, H; Jeong, J; Kim, D; Bahar, N; Ramos, M; Chen, S H; Cheng, C

2015-09-01

This study examines the associations of Internet addiction with social anxiety, depression, and psychosocial well-being among Asian adolescents. A self-medication model conceptualizing Internet addiction as a mediator relating depression and social anxiety to negative psychosocial well-being was tested in a cross-sectional survey. In the Asian Adolescent Risk Behavior Survey (AARBS), 5366 adolescents aged 12-18 years from six Asian countries (China, Hong Kong, Japan, South Korea, Malaysia, and the Philippines) completed a questionnaire with items of the Internet Addiction Test (IAT), Social Anxiety Scale for Adolescents (SAS-A), Center for Epidemiological Studies Depression Scale (CESD), and Self-Rated Health of the Nation Outcome Scales for Children and Adolescents (HoNOSCA-SR) in the 2012-2013 school year. Structural equation modelling was used to examine the mediating role of Internet addiction in depression, social anxiety, and subjective psychosocial well-being. Significant differences in the scores of IAT, SAS-A, CESD, and HoNOSCA-SR across the six countries were found. The proposed self-medication model of Internet addiction showed satisfactory goodness-of-fit with the data of all countries. After the path from social anxiety to Internet addiction had been discarded in the revised model, there was a significant improvement of the goodness-of-fit in the models for Japan, South Korea, and the Philippines. Depression and social anxiety reciprocally influenced each other, whereas depression was associated with poorer psychosocial well-being directly and indirectly through Internet addiction in all six countries. Internet addiction mediated the association between social anxiety and poor psychosocial well-being in China, Hong Kong, and Malaysia. Copyright © 2015 The Royal Society for Public Health. Published by Elsevier Ltd. All rights reserved.
Extended forms of the second law for general time-dependent stochastic processes.

PubMed

Ge, Hao

2009-08-01

The second law of thermodynamics represents a universal principle applicable to all natural processes, physical systems, and engineering devices. Hatano and Sasa have recently put forward an extended form of the second law for transitions between nonequilibrium stationary states [Phys. Rev. Lett. 86, 3463 (2001)]. In this paper we further extend this form to an instantaneous interpretation, which is satisfied by quite general time-dependent stochastic processes, including master-equation models and Langevin dynamics, without requiring stationarity of the initial and final states. The theory is applied to several thermodynamic processes, and its consistency with classical thermodynamics is shown.
Vibrational Stark effect spectroscopy at the interface of Ras and Rap1A bound to the Ras binding domain of RalGDS reveals an electrostatic mechanism for protein-protein interaction.

PubMed

Stafford, Amy J; Ensign, Daniel L; Webb, Lauren J

2010-11-25

Electrostatic fields at the interface of the Ras binding domain of the protein Ral guanine nucleotide dissociation stimulator (RalGDS) with the structurally analogous GTPases Ras and Rap1A were measured with vibrational Stark effect (VSE) spectroscopy. Eleven residues on the surface of RalGDS that participate in this protein-protein interaction were systematically mutated to cysteine and subsequently converted to cyanocysteine in order to introduce a nitrile VSE probe in the form of the thiocyanate (SCN) functional group. The measured SCN absorption energy on the monomeric protein was compared with solvent-accessible surface area (SASA) calculations and solutions to the Poisson-Boltzmann equation using Boltzmann-weighted structural snapshots from molecular dynamics simulations. We found a weak negative correlation between SASA and measured absorption energy, indicating that water exposure of protein surface amino acids can be estimated from experimental measurement of the magnitude of the thiocyanate absorption energy. We found no correlation between calculated field and measured absorption energy. These results highlight the complex structural and electrostatic nature of the protein-water interface. The SCN-labeled RalGDS was incubated with either wild-type Ras or wild-type Rap1A, and the formation of the docked complex was confirmed by measurement of the dissociation constant of the interaction. The change in absorption energy of the thiocyanate functional group due to complex formation was related to the change in electrostatic field experienced by the nitrile functional group when the protein-protein interface forms. At some locations, the nitrile experiences the same shift in field when bound to Ras and Rap1A, but at others, the change in field is dramatically different. These differences identify residues on the surface of RalGDS that direct the specificity of RalGDS binding to its in vivo binding partner, Rap1A, through an electrostatic mechanism.
Universal ideal behavior and macroscopic work relation of linear irreversible stochastic thermodynamics

NASA Astrophysics Data System (ADS)

Ma, Yi-An; Qian, Hong

2015-06-01

We revisit the Ornstein-Uhlenbeck (OU) process as the fundamental mathematical description of linear irreversible phenomena, with fluctuations, near an equilibrium. By identifying the underlying circulating dynamics in a stationary process as the natural generalization of classical conservative mechanics, a bridge between a family of OU processes with equilibrium fluctuations and thermodynamics is established through the celebrated Helmholtz theorem. The Helmholtz theorem provides an emergent macroscopic ‘equation of state’ of the entire system, which exhibits a universal ideal thermodynamic behavior. Fluctuating macroscopic quantities are studied from the stochastic thermodynamic point of view and a non-equilibrium work relation is obtained in the macroscopic picture, which may facilitate experimental study and application of the equalities due to Jarzynski, Crooks, and Hatano and Sasa.
DNA Repair Gene (XRCC1) Polymorphism (Arg399Gln) Associated with Schizophrenia in South Indian Population: A Genotypic and Molecular Dynamics Study

PubMed Central

Sujitha, S. P.; Kumar, D. Thirumal; Doss, C. George Priya; Aavula, K.; Ramesh, R.; Lakshmanan, S.; Gunasekaran, S.; Anilkumar, G.

2016-01-01

This paper depicts the first report from an Indian population on the association between the variant Arg399Gln of XRCC1 locus in the DNA repair system and schizophrenia, the debilitating disease that affects 1% of the world population. Genotypic analysis of a total of 523 subjects (260 patients and 263 controls) revealed an overwhelming presence of Gln399Gln in the case subjects against the controls (P < 0.0068), indicating significant level of association of this nsSNP with schizophrenia; the Gln399 allele frequency was also perceptibly more in cases than in controls (p < 0.003; OR = 1.448). The results of the genotypic studies were further validated using pathogenicity and stability prediction analysis employing computational tools [I-Mutant Suite, iStable, PolyPhen2, SNAP, and PROVEAN], with a view to assess the magnitude of deleteriousness of the mutation. The pathogenicity analysis reveals that the nsSNP could be deleterious inasmuch as it could affect the functionality of the gene, and interfere with protein function. Molecular dynamics simulation of 60 ns was performed using GROMACS to analyse structural change due to a mutation (Arg399Gln) that was never examined before. RMSD, RMSF, hydrogen bonds, radius of gyration and SASA analysis showed the existence of a significant difference between the native and the mutant protein. The present study gives a strong indication that the XRCC1 locus deserves serious attention, as it could be a potential candidate contributing to the etio-pathogenesis of the disease. PMID:26824244
DNA Repair Gene (XRCC1) Polymorphism (Arg399Gln) Associated with Schizophrenia in South Indian Population: A Genotypic and Molecular Dynamics Study.

PubMed

Sujitha, S P; Kumar, D Thirumal; Doss, C George Priya; Aavula, K; Ramesh, R; Lakshmanan, S; Gunasekaran, S; Anilkumar, G

2016-01-01

This paper depicts the first report from an Indian population on the association between the variant Arg399Gln of XRCC1 locus in the DNA repair system and schizophrenia, the debilitating disease that affects 1% of the world population. Genotypic analysis of a total of 523 subjects (260 patients and 263 controls) revealed an overwhelming presence of Gln399Gln in the case subjects against the controls (P < 0.0068), indicating significant level of association of this nsSNP with schizophrenia; the Gln399 allele frequency was also perceptibly more in cases than in controls (p < 0.003; OR = 1.448). The results of the genotypic studies were further validated using pathogenicity and stability prediction analysis employing computational tools [I-Mutant Suite, iStable, PolyPhen2, SNAP, and PROVEAN], with a view to assess the magnitude of deleteriousness of the mutation. The pathogenicity analysis reveals that the nsSNP could be deleterious inasmuch as it could affect the functionality of the gene, and interfere with protein function. Molecular dynamics simulation of 60 ns was performed using GROMACS to analyse structural change due to a mutation (Arg399Gln) that was never examined before. RMSD, RMSF, hydrogen bonds, radius of gyration and SASA analysis showed the existence of a significant difference between the native and the mutant protein. The present study gives a strong indication that the XRCC1 locus deserves serious attention, as it could be a potential candidate contributing to the etio-pathogenesis of the disease.
Cathepsin B Cleavage of vcMMAE-Based Antibody-Drug Conjugate Is Not Drug Location or Monoclonal Antibody Carrier Specific.

PubMed

Gikanga, Benson; Adeniji, Nia S; Patapoff, Thomas W; Chih, Hung-Wei; Yi, Li

2016-04-20

Antibody-drug conjugates (ADCs) require thorough characterization and understanding of product quality attributes. The framework of many ADCs comprises one molecule of antibody that is usually conjugated with multiple drug molecules at various locations. It is unknown whether the drug release rate from the ADC is dependent on drug location, and/or local environment, dictated by the sequence and structure of the antibody carrier. This study addresses these issues with valine-citrulline-monomethylauristatin E (vc-MMAE)-based ADC molecules conjugated at reduced disulfide bonds, by evaluating the cathepsin B catalyzed drug release rate of ADC molecules with different drug distributions or antibody carriers. MMAE drug release rates at different locations on ADC I were compared to evaluate the impact of drug location. No difference in rates was observed for drug released from the V(H), V(L), or C(H)2 domains of ADC I. Furthermore, four vc-MMAE ADC molecules were chosen as substrates for cathepsin B for evaluation of Michaelis-Menten parameters. There was no significant difference in K(M) or k(cat) values, suggesting that different sequences of the antibody carrier do not result in different drug release rates. Comparison between ADCs and small molecules containing vc-MMAE moieties as substrates for cathepsin B suggests that the presence of IgG1 antibody carrier, regardless of its bulkiness, does not impact drug release rate. Finally, a molecular dynamics simulation on ADC II revealed that the val-cit moiety at each of the eight possible conjugation sites was, on average, solvent accessible over 50% of its maximum solvent accessible surface area (SASA) during a 500 ns trajectory. Combined, these results suggest that the cathepsin cleavage sites for conjugated drugs are exposed enough for the enzyme to access and that the drug release rate is rather independent of drug location or monoclonal antibody carrier. Therefore, the distribution of drug conjugation at different sites is not a critical parameter to control in manufacturing of the vc-MMAE-based ADC conjugated at reduced disulfide bonds.
Factors contributing to deep supercooling capability and cold survival in dwarf bamboo (Sasa senanensis) leaf blades.

PubMed

Ishikawa, Masaya; Oda, Asuka; Fukami, Reiko; Kuriyama, Akira

2014-01-01

Wintering Sasa senanensis, dwarf bamboo, is known to employ deep supercooling as the mechanism of cold hardiness in most of its tissues from leaves to rhizomes. The breakdown of supercooling in leaf blades has been shown to proceed in a random and scattered manner with a small piece of tissue surrounded by longitudinal and transverse veins serving as the unit of freezing. The unique cold hardiness mechanism of this plant was further characterized using current year leaf blades. Cold hardiness levels (LT20: the lethal temperature at which 20% of the leaf blades are injured) seasonally increased from August (-11°C) to December (-20°C). This coincided with the increases in supercooling capability of the leaf blades as expressed by the initiation temperature of low temperature exotherms (LTE) detected in differential thermal analyses (DTA). When leaf blades were stored at -5°C for 1-14 days, there was no nucleation of the supercooled tissue units either in summer or winter. However, only summer leaf blades suffered significant injury after prolonged supercooling of the tissue units. This may be a novel type of low temperature-induced injury in supercooled state at subfreezing temperatures. When winter leaf blades were maintained at the threshold temperature (-20°C), a longer storage period (1-7 days) increased lethal freezing of the supercooled tissue units. Within a wintering shoot, the second or third leaf blade from the top was most cold hardy and leaf blades at lower positions tended to suffer more injury due to lethal freezing of the supercooled units. LTE were shifted to higher temperatures (2-5°C) after a lethal freeze-thaw cycle. The results demonstrate that the tissue unit compartmentalized with longitudinal and transverse veins serves as the unit of supercooling and temperature- and time-dependent freezing of the units is lethal both in laboratory freeze tests and in the field. To establish such supercooling in the unit, structural ice barriers such as development of sclerenchyma and biochemical mechanisms to increase the stability of supercooling are considered important. These mechanisms are discussed in regard to ecological and physiological significance in winter survival.
Effects of a windthrow disturbance on the carbon balance of a broadleaf deciduous forest in Hokkaido, Japan

NASA Astrophysics Data System (ADS)

Yamanoi, K.; Mizoguchi, Y.; Utsugi, H.

2015-12-01

Forests play an important role in the terrestrial carbon balance, with most being in a carbon sequestration stage. The net carbon releases that occur result from forest disturbance, and windthrow is a typical disturbance event affecting the forest carbon balance in eastern Asia. The CO2 flux has been measured using the eddy covariance method in a deciduous broadleaf forest (Japanese white birch, Japanese oak, and castor aralia) in Hokkaido, where incidental damage by the strong Typhoon Songda in 2004 occurred. We also used the biometrical method to demonstrate the CO2 flux within the forest in detail. Damaged trees amounted to 40% of all trees, and they remained on site where they were not extracted by forest management. Gross primary production (GPP), ecosystem respiration (Re), and net ecosystem production were 1350, 975, and 375 g C m⁻² yr⁻¹ before the disturbance and 1262, 1359, and -97 g C m⁻² yr⁻¹ 2 years after the disturbance, respectively. Before the disturbance, the forest was an evident carbon sink, and it subsequently transformed into a net carbon source. Because of increased light intensity at the forest floor, the leaf area index and biomass of the undergrowth (Sasa kurilensis and S. senanensis) increased by factors of 2.4 and 1.7, respectively, in 3 years subsequent to the disturbance. The photosynthesis of Sasa increased rapidly and contributed to the total GPP after the disturbance. The annual GPP only decreased by 6% just after the disturbance. On the other hand, the annual Re increased by 39% mainly because of the decomposition of residual coarse-wood debris. The carbon balance after the disturbance was controlled by the new growth and the decomposition of residues. The forest management, which resulted in the dead trees remaining at the study site, strongly affected the carbon balance over the years. When comparing the carbon uptake efficiency at the study site with that at others, including those with various kinds of disturbances, we emphasized the importance of forest management as well as disturbance type in the carbon balance.
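The flux figures quoted in this abstract are mutually consistent if net ecosystem production is taken as gross primary production minus ecosystem respiration. The short check below is only an arithmetic sketch of that assumed relation; the variable names are illustrative and the values are those given in the abstract (g C per square metre per year).

```python
def net_ecosystem_production(gpp: float, re: float) -> float:
    """Assumed relation: NEP = GPP - Re (positive means a carbon sink)."""
    return gpp - re

# Values quoted in the abstract.
gpp_before, re_before = 1350.0, 975.0   # before the windthrow
gpp_after, re_after = 1262.0, 1359.0    # 2 years after the windthrow

nep_before = net_ecosystem_production(gpp_before, re_before)
nep_after = net_ecosystem_production(gpp_after, re_after)

# Relative changes, to compare with the roughly 6% and 39% quoted.
gpp_change_pct = (gpp_after - gpp_before) / gpp_before * 100
re_change_pct = (re_after - re_before) / re_before * 100

print(nep_before, nep_after)
print(round(gpp_change_pct, 1), round(re_change_pct, 1))
```

The sign change of NEP from positive to negative corresponds to the shift from carbon sink to net carbon source described in the abstract.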
Computational insights of K1444N substitution in GAP-related domain of NF1 gene associated with neurofibromatosis type 1 disease: a molecular modeling and dynamics approach.

PubMed

Agrahari, Ashish Kumar; Muskan, Meghana; George Priya Doss, C; Siva, R; Zayed, Hatem

2018-05-27

The NF1 gene encodes the neurofibromin protein, which is ubiquitously expressed, but most highly in the central nervous system. Non-synonymous SNPs (nsSNPs) in the NF1 gene were found to be associated with Neurofibromatosis Type 1 disease, which is characterized by the growth of tumors along nerves in the skin, brain, and other parts of the body. In this study, we used several in silico prediction tools to analyze 16 nsSNPs in the RAS-GAP domain of neurofibromin; the K1444N (K1423N) mutation was predicted to be the most pathogenic. The comparative molecular dynamic simulation (MDS; 50 ns) between the wild type and the K1444N (K1423N) mutant suggested a significant change in the electrostatic potential. In addition, the RMSD, RMSF, Rg, hydrogen bonds, and PCA analysis confirmed the loss of flexibility and increase in compactness of the mutant protein. Further, SASA analysis revealed exchange between hydrophobic and hydrophilic residues from the core of the RAS-GAP domain to the surface of the mutant domain, consistent with the secondary structure analysis that showed significant alteration in the mutant protein conformation. Our data indicate that the K1444N (K1423N) mutation increases the rigidity and compactness of the protein. This study provides evidence of the benefits of computational tools in predicting the pathogenicity of genetic mutations and suggests the application of MDS and different in silico prediction tools for variant assessment and classification in genetic clinics.
Simulation Based Investigation of Deleterious nsSNPs in ATXN2 Gene and Its Structural Consequence Toward Spinocerebellar Ataxia.

PubMed

Sinha, Siddharth; Verma, Sharad; Singh, Aditi; Somvanshi, Pallavi; Grover, Abhinav

2018-01-01

Spinocerebellar degeneration, termed ataxia, is a neurological disorder of the central nervous system, characterized by limb incoordination and a progressive gait. The patient also demonstrates specific symptoms of muscle weakness, slurring of speech, and decreased vibration senses. Expansion of polyglutamine trinucleotide (CAG) within the ATXN2 gene with 35 or more repeats results in spinocerebellar ataxia type-2. Protein ataxin-2, coded by the ATXN2 gene, has been reported to have a crucial role in translation of the genetic information through sequestering the histone acetyl transferases (HAT), resulting in a state of hypo-acetylation. In the present study, we have evaluated the outcome for 122 non-synonymous single nucleotide polymorphisms (nsSNPs) reported within the ATXN2 gene through computational tools such as SIFT, PolyPhen 2.0, PANTHER, I-mutant 2.0, Phd-SNP, Pmut, and MutPred. The apo and mutant (L305V and Q339L) forms of the ataxin-2 protein structure were modeled to gain insights into the 3D spatial arrangement. Further, molecular dynamics simulations and structural analysis were performed to observe the brunt of disease-associated nsSNPs on the strength and secondary properties of the ataxin-2 protein structure. Our results showed that L305V is a highly deleterious and disease-causing point substitution. Analysis based on RMSD, RMSF, Rg, SASA, number of hydrogen bonds (NH bonds), covariance matrix trace, and projection analysis for eigenvectors demonstrated a significant instability and conformation along with a rise in mutant flexibility values in comparison to the apo form of the ataxin-2 protein. The study provides a blueprint of computational methodologies to examine the ataxin-blend SNPs. J. Cell. Biochem. 119: 499-510, 2018. © 2017 Wiley Periodicals, Inc.
Host-Seeking Behavior of Trombiculid Mites on Vegetation in Relation to Sika Deer.

PubMed

Tsunoda, Takashi; Takahashi, Mamoru

2015-03-01

We collected larval trombiculid mites on vegetation monthly from October 1997 to February 2000, and from the heads of sika deer culled in March 2003 in Boso Peninsula, central Japan. Two species of trombiculid mites, Neotrombicula nogamii Takahashi, Takano, Misumi, and Kikuchi and Leptotrombidium scutellare Nagayo, Miyagawa, Mitamura, and Tamiya, occurred on vegetation. Peak numbers of N. nogamii were found in January, and L. scutellare numbers peaked in November. Both species were collected predominantly on the top of Sasa bamboo stems, where they formed clusters, though N. nogamii preferred heights of 40-50 cm. Furthermore, N. nogamii and Walchia masoni (Asanuma and Saito) were collected from deer. These findings indicate that vegetation is an important substrate for some trombiculid mites awaiting hosts. © The Authors 2015. Published by Oxford University Press on behalf of Entomological Society of America. All rights reserved. For Permissions, please email: journals.permissions@oup.com.
Cladotanytarsus Kieffer (Diptera: Chironomidae): several distinctive species reviewed on the basis of records from Canada and USA.

PubMed

Puchalski, Mateusz; Giłka, Wojciech

2017-03-10

Two species of the genus Cladotanytarsus Kieffer, 1921 are described as adult males, both peculiar in having distinctively elongated hypopygial anal points. The male of Cladotanytarsus bilyji Giłka et Puchalski, sp. nov. (Canada, Manitoba; USA, Ohio) is presumed to be a close relative of C. nigrovittatus (Goetghebuer, 1922). Another unknown Cladotanytarsus species (USA, Illinois and Louisiana) keys with the European C. donmcbeani Langton et McBean, 2010. The intraspecific variability of the male C. acornutus Jacobsen et Bilyj, 2007 is also presented on the basis of new records (Canada, Ontario; USA, South Carolina). Cladotanytarsus males with similarly structured elongate anal points are reviewed, including C. tobaquardecimus Kikuchi et Sasa, 1990, considered a junior synonym (syn. nov.) of C. conversus (Johannsen, 1932). As a compilation of this study, a key to the identification of the adult males of 14 Cladotanytarsus species is provided.
Comparison of volume and surface area nonpolar solvation free energy terms for implicit solvent simulations.

PubMed

Lee, Michael S; Olson, Mark A

2013-07-28

Implicit solvent models for molecular dynamics simulations are often composed of polar and nonpolar terms. Typically, the nonpolar solvation free energy is approximated by the solvent-accessible-surface area times a constant factor. More sophisticated approaches incorporate an estimate of the attractive dispersion forces of the solvent and∕or a solvent-accessible volume cavitation term. In this work, we confirm that a single volume-based nonpolar term most closely fits the dispersion and cavitation forces obtained from benchmark explicit solvent simulations of fixed protein conformations. Next, we incorporated the volume term into molecular dynamics simulations and find the term is not universally suitable for folding up small proteins. We surmise that while mean-field cavitation terms such as volume and SASA often tilt the energy landscape towards native-like folds, they also may sporadically introduce bottlenecks into the folding pathway that hinder the progression towards the native state.
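The two nonpolar approximations compared in this abstract can be sketched as simple linear terms: a surface-area term (SASA times a constant factor) and a volume-based term. The function names and all coefficient values below are placeholders chosen for illustration; they are not the parameters used by the authors.

```python
def nonpolar_energy_sasa(sasa: float, gamma: float = 0.005) -> float:
    """Surface-area term: free energy = gamma * SASA.

    sasa  : solvent-accessible surface area in square angstroms
    gamma : constant factor in kcal/mol per square angstrom (placeholder value)
    """
    return gamma * sasa

def nonpolar_energy_volume(volume: float, coeff: float = 0.03) -> float:
    """Volume term: free energy = coeff * solvent-accessible volume.

    volume : solvent-accessible volume in cubic angstroms
    coeff  : constant factor in kcal/mol per cubic angstrom (placeholder value)
    """
    return coeff * volume

# Hypothetical geometry for one small protein conformation.
print(nonpolar_energy_sasa(1000.0))    # surface-area estimate
print(nonpolar_energy_volume(2000.0))  # volume-based estimate
```

In practice either term would be evaluated for each conformation along a trajectory and added to the polar solvation term; the sketch only shows the functional form of each approximation.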
Comparison of volume and surface area nonpolar solvation free energy terms for implicit solvent simulations

NASA Astrophysics Data System (ADS)

Lee, Michael S.; Olson, Mark A.

2013-07-01

Implicit solvent models for molecular dynamics simulations are often composed of polar and nonpolar terms. Typically, the nonpolar solvation free energy is approximated by the solvent-accessible-surface area times a constant factor. More sophisticated approaches incorporate an estimate of the attractive dispersion forces of the solvent and/or a solvent-accessible volume cavitation term. In this work, we confirm that a single volume-based nonpolar term most closely fits the dispersion and cavitation forces obtained from benchmark explicit solvent simulations of fixed protein conformations. Next, we incorporated the volume term into molecular dynamics simulations and find the term is not universally suitable for folding up small proteins. We surmise that while mean-field cavitation terms such as volume and SASA often tilt the energy landscape towards native-like folds, they also may sporadically introduce bottlenecks into the folding pathway that hinder the progression towards the native state.
[ASSOCIATION BETWEEN FOUR SEROTONIC GENES POLYMORPHISM (5HTTL, 5HT1A, 5HT2A, AND MAOA) AND PERSONALITY TRAITS IN WRESTLERS AND CONTROL GROUP].

PubMed

Butovskaya, P R; Lazebnij, O E; Fekhretdionva, D I; Vasil'ev, V A; Prosikova, E A; Lysenko, V V; Udina, I G; Butovskaya, M L

2015-01-01

This study presents the data on the polymorphisms of the serotonin system genes (5-HTTL, 5-HT1A, 5-HT2A, and MAOA) in male and female wrestlers and in the control group. The population genetics analysis of the 5HTTL gene showed the highest frequency of the SS genotype 5-HTTLPR in sportsmen (p = 0.04), as well as the trend toward higher frequency of united genotypes of the locus of 5-HTTLPR VNTR and SNP rs25531--SASA (p = 0.06) in comparison with the control group. As for the polymorphisms for other genes 5-HT1A (rs6295), 5-HT2A (rs6311), and MAOA (VNTR), we found no significant differences between the groups tested. Using the NEO PI-R questionnaire we analyzed the possible correlations between the genotypes and the psychological traits in our samples. It was demonstrated that the athletic success in elite sportsmen was associated with lower openness to experience and higher conscientiousness. The interaction effect of the gender and 5-HT2A on the self-rating for openness to experience, interaction effect of the level of the sport success and 5-HT2A, and the interaction effect of the gender and 5-HT1A genotype on self-reported conscientiousness were observed as a trend.
A comprehensive computational study on pathogenic mis-sense mutations spanning the RING2 and REP domains of Parkin protein.

PubMed

Biswas, Ria; Bagchi, Angshuman

2017-04-30

Various mutations in the PARK2 gene, which encodes the protein parkin, are significantly associated with the onset of autosomal recessive juvenile Parkinson (ARJP) in neuronal cells. Parkin is a multi-domain protein; the N-terminal part contains the Ubl and the C-terminal part consists of four zinc coordinating domains, viz., RING0, RING1, in between ring (IBR) and RING2. Disease mutations are spread over all the domains of Parkin, although mutations in some regions may affect the functionality of Parkin more adversely. The mutations in the RING2 domain are seen to abolish the neuroprotective E3 ligase activity of Parkin. In this current work, we carried out detailed in silico analysis to study the extent of pathogenicity of mutations spanning the Parkin RING2 domain and the adjoining REP region by SIFT, Mutation Assessor, PolyPhen2, SNPs and GO, GV/GD and I-mutant. To study the structural and functional implications of these mutations on the RING2-REP domain of Parkin, we studied the solvent accessibility (SASA/RSA), hydrophobicity, intra-molecular hydrogen bonding profile and domain analysis by various computational tools. Finally, we analysed the interaction energy profiles of the mutants and compared them to the wild type protein using Discovery Studio 2.5. By comparing the various analyses it could be safely concluded that, except for the P437L and A379V mutations, all other mutations were potentially deleterious, affecting various structural aspects of the RING2 domain architecture. This study is based purely on a computational approach, which has the potential to identify disease mutations, and the information could further be used in treatment of diseases and prognosis. Copyright © 2017 Elsevier B.V. All rights reserved.
Response surface optimization of extraction protocols to obtain phenolic rich antioxidant from sea buckthorn and their potential application into model meat system.

PubMed

Wagh, Rajesh V; Chatli, Manish K

2017-05-01

In the present study, processing parameters for the extraction of phenolic rich sea buckthorn seed (SBTE) extract were optimised using response surface method and subjected for in vitro efficacy viz. total phenolic, ABTS, DPPH and SASA activity. The optimised model depicted MeOH as a solvent at 60% concentration level with a reaction time of 20 min and extracting temperature of 55 °C for the highest yield and total phenolic content. The efficacy of different concentration of obtained SBT was evaluated in raw ground pork as a model meat system on the basis of various physico-chemical, microbiological, sensory quality characteristics. Addition of 0.3% SBTE significantly reduced the lipid peroxidation (PV, TBARS and FFA) and improved instrumental colour ( L* , a*, b* ) attributes of raw ground pork during refrigerated storage of 9 days. Results concluded that SBTE at 0.3% level can successfully improve the oxidative stability, microbial, sensory quality attributes in the meat model system.
Phylogenetic variation of phytolith carbon sequestration in bamboos

PubMed Central

Li, Beilei; Song, Zhaoliang; Li, Zimin; Wang, Hailong; Gui, Renyi; Song, Ruisheng

2014-01-01

Phytoliths, the amorphous silica deposited in plant tissues, can occlude organic carbon (phytolith-occluded carbon, PhytOC) during their formation and play a significant role in the global carbon balance. This study explored phylogenetic variation of phytolith carbon sequestration in bamboos. The phytolith content in bamboo varied substantially from 4.28% to 16.42%, with the highest content in Sasa and the lowest in Chimonobambusa, Indocalamus and Acidosasa. The mean PhytOC production flux and rate in China's bamboo forests were 62.83 kg CO2 ha⁻¹ y⁻¹ and 4.5 × 10⁸ kg CO2 y⁻¹, respectively. This implies that 1.4 × 10⁹ kg CO2 would be sequestered in world's bamboo phytoliths because the global bamboo distribution area is about three to four times higher than China's bamboo. Therefore, both increasing the bamboo area and selecting high phytolith-content bamboo species would increase the sequestration of atmospheric CO2 within bamboo phytoliths. PMID:24736571
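The figures in the abstract above can be cross-checked with a little arithmetic. The sketch below is only an illustration: it reads the stated rate as 4.5 × 10⁸ kg CO2 per year and the global figure as 1.4 × 10⁹ kg CO2 per year, and the variable and function names are hypothetical. It derives the forest area implied by the flux and the rate, and the global-to-China ratio implied by the two totals.

```python
# Values quoted in the abstract (the exponents are read as 10^8 and 10^9).
FLUX_KG_PER_HA_YR = 62.83   # mean PhytOC production flux, kg CO2 per ha per year
CHINA_RATE_KG_YR = 4.5e8    # PhytOC production rate for China's bamboo forests
WORLD_RATE_KG_YR = 1.4e9    # stated global sequestration estimate


def implied_area_ha(rate_kg_yr: float, flux_kg_per_ha_yr: float) -> float:
    """Area (ha) that would produce `rate_kg_yr` at the given per-hectare flux."""
    return rate_kg_yr / flux_kg_per_ha_yr


# Area of bamboo forest in China implied by rate / flux.
china_area_ha = implied_area_ha(CHINA_RATE_KG_YR, FLUX_KG_PER_HA_YR)

# Ratio between the global and the Chinese totals; the abstract says the global
# bamboo area is "about three to four times" China's.
world_to_china_ratio = WORLD_RATE_KG_YR / CHINA_RATE_KG_YR

print(f"Implied bamboo area in China: {china_area_ha:.3e} ha")
print(f"Global / China ratio: {world_to_china_ratio:.2f}")
```

Under these readings the ratio falls inside the three-to-four range given in the abstract, which supports the interpretation of the exponents.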
Molecular variation in the potato cyst nematode, Globodera pallida, in relation to virulence.

PubMed

Blok, V C; Pylypenko, L; Phillips, M S

2006-01-01

The potato cyst nematode Globodera pallida poses a challenge for potato growers. The potato cyst nematodes (PCN) Globodera rostochiensis and G. pallida cause damage valued at over pounds 50m per annum in the U.K. and problems in controlling PCN are growing due to the increase in populations and spread of G. pallida, the lack of many commercially attractive cultivars with resistance to this species and the pressure to reduce nematicide use. Over 60% of potato fields in the U.K. are infected with G. pallida (Minnis et al. 2000). The Scottish Agricultural Science Agency (SASA) figures show that the incidence of both species of PCN on Scottish seed potato land, though low, has been increasing. The proportion of potato land in ware production in Scotland is also increasing and now represents 50% of the potato growing area. This situation potentially increases the risk of the spread of PCN unless it is very carefully monitored and managed.
In silico approaches and proportional odds model towards identifying selective ADAM17 inhibitors from anti-inflammatory natural molecules.

PubMed

Borah, Pallab Kumar; Chakraborty, Sourav; Jha, Anupam N; Rajkhowa, Sanchaita; Duary, Raj Kumar

2016-11-01

ADAM metallopeptidase domain 17 (ADAM17) is an attractive target for the development of new anti-inflammatory drugs. We aimed to identify selective inhibitors of ADAM17 against matrix metalloproteinase enzymes (MMP-1, MMP-2, MMP-3, MMP-7, MMP-8, MMP-9, MMP-13, and MMP-16) which have substantial structural similarity. Target proteins were docked with 29 anti-inflammatory natural molecule ligands and a known selective inhibitor, IK682. The ligands were screened based on Lipinski rules and interaction with the ADAM17 active site cavity, and then ranked using the proportional odds model multinomial logistic regression. Silymarin was the most selective inhibitor of ADAM17, exhibiting H-bonding with Glu 406, Gly 349, Glu 398, Asn 447, Tyr 433, and Lys 432. Molecular dynamics simulations were carried out for 10 ns. The root mean square deviation (RMSD), root mean square fluctuations (RMSF), radius of gyration (Rg), solvent accessible surface area (SASA), and H-bonding indicated the induced metastability. A comparison of the principal component analysis revealed that the silymarin complex also explored a smaller region than the IK682 complex. A control study on ADAM17 protein (2OI0) is included. These observations present silymarin (widely present in plants such as milk thistle (Silybum marianum), wild artichokes (Cynara cardunculus), turmeric (Curcuma longa) roots, coriander (Coriandrum sativum) seeds, etc.) as a promising natural template for development of ADAM17 selective drugs. Copyright © 2016 Elsevier Inc. All rights reserved.
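For readers unfamiliar with two of the trajectory metrics named above, the following is a minimal sketch of how RMSD and the radius of gyration are commonly defined. It is not the authors' workflow: the function names are hypothetical, coordinates are plain (x, y, z) tuples, and the RMSD is computed without the structural superposition that MD packages normally perform first.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally sized coordinate lists.

    Assumes the structures are already superimposed (no fitting is done here).
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(
        (a - b) ** 2
        for point_a, point_b in zip(coords_a, coords_b)
        for a, b in zip(point_a, point_b)
    )
    return math.sqrt(squared / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; unit masses are assumed if none given."""
    if masses is None:
        masses = [1.0] * len(coords)
    total_mass = sum(masses)
    # Centre of mass along each axis.
    centre = [
        sum(m * p[axis] for m, p in zip(masses, coords)) / total_mass
        for axis in range(3)
    ]
    weighted_sq = sum(
        m * sum((p[axis] - centre[axis]) ** 2 for axis in range(3))
        for m, p in zip(masses, coords)
    )
    return math.sqrt(weighted_sq / total_mass)


# Toy example: a four-atom structure and a copy shifted by 1 unit along x.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
shifted = [(x + 1.0, y, z) for x, y, z in reference]
print(rmsd(reference, shifted))
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```

A rigid shift of every atom by 1 unit gives an RMSD of 1, which is why real analyses superimpose each frame on the reference before measuring deviation.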
The acidic pH-induced structural changes in apo-CP43 by spectral methodologies and molecular dynamics simulations

NASA Astrophysics Data System (ADS)

Wang, Wang; Li, Xue; Wang, Qiuying; Zhu, Xixi; Zhang, Qingyan; Du, Linfang

2018-01-01

CP43 is closely associated with photosystem II and exists in the plant thylakoid membranes. The acidic pH-induced structural changes were investigated by fluorescence spectrum, ANS spectrum, RLS spectrum, energy transfer experiment, acrylamide fluorescence quenching assay and MD simulation. The fluorescence spectrum indicated that the structural changes in the acidic pH-induced process followed a four-state model, comprising the native state (N), partial unfolding state (PU), refolding state (R), and molten-globule state (M). Analysis of the ANS spectrum illustrated that the inner hydrophobic core was partially exposed to the surface below pH 2.0, and also suggested that the molten-globule state existed. The RLS spectrum showed the aggregation of apo-CP43 around the pI (pH 4.5-4.0). The alterations of apo-CP43 secondary structure with different acidic treatments were confirmed by FTIR spectrum. The energy transfer experiment and quenching research demonstrated that the structure at pH 4.0 was loosest. The RMSF suggested that the two terminals played an important role in the acidic denaturation process. The distance between the two terminals showed a slight difference in the acidic pH-induced process. During the unfolding process, both the N-terminal and the C-terminal played the dominant role. However, the N-terminal accounted for the main part in the refolding process. All kinds of SASA values corresponded to the spectral results. The tertiary and secondary structure from the MD simulation indicated that part of the transmembrane α-helix was destroyed at low pH.
Effects of green tea or Sasa quelpaertensis bamboo leaves on plasma and liver lipids, erythrocyte Na efflux, and platelet aggregation in ovariectomized rats

PubMed Central

Ryou, Sung Hee; Kang, Min Sook; Kim, Kyu Il; Kang, Young Hee

2012-01-01

This study was conducted to investigate the effects of Sasa quelpaertensis bamboo and green tea on plasma and liver lipids, platelet aggregation, and erythrocyte membrane Na channels in ovariectomized (OVX) rats. Thirty female rats were OVX, and ten female rats were sham-operated at the age of 6 weeks. The rats were divided into four groups at the age of 10 weeks and fed the experiment diets: sham-control, OVX-control, OVX-bamboo leaves (10%), or OVX-green tea leaves (10%) for four weeks. Final body weight increased significantly in the OVX groups compared with that in the sham-control, whereas body weight in the OVX-green tea group decreased significantly compared with that in the OVX-control (P < 0.01). High density lipoprotein (HDL)-cholesterol level decreased in all OVX groups compared with that in the sham-control rats (P < 0.05) but without a difference in plasma total cholesterol. Plasma triglycerides in the OVX-green tea group were significantly lower than those in the sham-control or OVX-control group (P < 0.05). Liver triglycerides increased significantly in the OVX-control compared with those in the sham-control (P < 0.01) but decreased significantly in the OVX-green tea group compared with those in the OVX-control or OVX-bamboo group (P < 0.01). Platelet aggregation in both maximum and initial slope tended to be lower in all OVX rats compared with that in the sham-control rats but was not significantly different. Na-K ATPase tended to increase and Na-K cotransport tended to decrease following ovariectomy. Na-K ATPase decreased significantly in the OVX-green tea group compared with that in the OVX-control group (P < 0.01), and Na-K cotransport increased significantly in the OVX-bamboo and OVX-green tea groups compared with that in the OVX-control (P < 0.05). Femoral bone mineral density tended to be lower in OVX rats than that in the sham-control, whereas the green tea and bamboo leaves groups recovered bone density to some extent. 
The results show that ovariectomy caused an increase in body weight and liver triglycerides, and that green tea was effective for lowering body weight and triglycerides in OVX rats. Ovariectomy induced an increase in Na efflux via Na-K ATPase and a decrease in Na efflux via Na-K cotransport. Furthermore, consumption of green tea and bamboo leaves affected Na efflux channels, controlling electrolyte and body water balance. PMID:22586498
Detection of upward and downward Solar-induced chlorophyll fluorescence emissions at the forest floor in a cool-temperate deciduous broadleaf forest in Japan

NASA Astrophysics Data System (ADS)

Kato, T.; Tsujimoto, K.; Nasahara, K. N.; Akitsu, T.; Murayama, S.; Noda, H.; Muraoka, H.

2016-12-01

Strong representation of ecosystem-level photosynthetic activity by Sun-Induced Fluorescence (SIF) has been confirmed by satellite studies [Frankenberg et al., 2011; Joiner et al., 2013] and by field studies [Porcar-Castell, 2011; Yang et al., 2015]. However, neglecting SIF emission below the top of the tree canopy may underestimate the contribution of sub-canopy and understory species to total ecosystem CO2 dynamics. To examine the potential contribution of SIF emission from the lower part of the tree ecosystem to total ecosystem SIF emission, the downward SIF from the tree canopy and the upward SIF from the understory were calculated from spectral data in a cool-temperate forest in central Japan (36°08'N, 137°25'E, 1420 masl), as well as the upward SIF from the canopy top, and the fractional ratios among them were compared on half-hourly and daily bases from 2006 to 2007. The top canopy is dominated by oak and birches, and the sub-canopy and shrub layers are dominated by Acer, Hydrangea and Viburnum species. The understory is dominated by an evergreen dwarf bamboo, Sasa senanensis, and covered partially by seedlings of oak and maple and by herbaceous species [Muraoka and Koizumi, 2005]. The SIF was estimated from the spectra of downward and upward irradiances measured at two heights, 18 m and 2 m above ground, by the HemiSpherical Spectro-Radiometer, consisting of a spectroradiometer (MS700, Eko inc., Tokyo, Japan) with a FWHM of 10 nm and a wavelength interval of 3.3 nm. The SIF around 760 nm (O2-A band) was calculated according to the Fraunhofer Line Depth principle with additional arrangements. Our preliminary results show that the SIF emission intensity followed the order canopy upward > canopy downward > understory upward for most of the growing season, except for a short period in spring between snow melt and canopy greening, because of the evergreen Sasa bamboo grass at the forest floor. 
On the other hand, the relative intensities among the three SIF emissions seem to change diurnally and seasonally. The temporal changes in these relative SIF emissions will be shown in this presentation to clarify the contributions of the ecosystem's vertical layers to total SIF emissions, of which only the top-layer emission has been considered by satellite and field observations in previous studies, and to ecosystem photosynthesis (GPP).
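The Fraunhofer Line Depth (FLD) principle mentioned above can be illustrated with its standard two-band form. This is a sketch under stated assumptions, not the authors' processing chain (their "additional arrangements" are not described in the abstract): reflectance and fluorescence are taken to be equal inside and just outside the absorption band, and the function and argument names are hypothetical.

```python
def fld_fluorescence(e_in, e_out, l_in, l_out):
    """Standard FLD estimate of fluorescence near an absorption band.

    e_in, e_out: incoming irradiance inside / outside the band (e.g. O2-A, ~760 nm)
    l_in, l_out: signal from the target inside / outside the band
    Assumes the target signal is L = r * E + F with the same r and F at both
    wavelengths, which gives F = (e_out * l_in - e_in * l_out) / (e_out - e_in).
    """
    if e_out == e_in:
        raise ValueError("FLD needs a band depth: e_out must differ from e_in")
    return (e_out * l_in - e_in * l_out) / (e_out - e_in)


# Synthetic check with made-up numbers: build the target signal from a known
# reflectance and fluorescence, then recover the fluorescence.
true_reflectance = 0.4
true_fluorescence = 1.5
e_out, e_in = 1000.0, 300.0            # irradiance outside and inside the band
l_out = true_reflectance * e_out + true_fluorescence
l_in = true_reflectance * e_in + true_fluorescence

recovered = fld_fluorescence(e_in, e_out, l_in, l_out)
print(recovered)
```

The method works because the reflected component scales with the incoming light while fluorescence does not, so the fluorescence fills in a larger share of the signal inside the dark absorption band.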
Cluster analysis applied to the spatial and temporal variability of monthly rainfall in Mato Grosso do Sul State, Brazil

NASA Astrophysics Data System (ADS)

Teodoro, Paulo Eduardo; de Oliveira-Júnior, José Francisco; da Cunha, Elias Rodrigues; Correa, Caio Cezar Guedes; Torres, Francisco Eduardo; Bacani, Vitor Matheus; Gois, Givanildo; Ribeiro, Larissa Pereira

2016-04-01

The State of Mato Grosso do Sul (MS), located in the Brazilian Midwest, lacks climatological studies, mainly on the characterization of the rainfall regime and of the meteorological systems that produce or inhibit rain. This state has different soil and climatic characteristics distributed among three biomes: Cerrado, Atlantic Forest and Pantanal. This study aimed to apply cluster analysis using Ward's algorithm and to identify the meteorological systems that affect the rainfall regime in the biomes. The rainfall data of 32 stations (sites) in MS State were obtained from the Agência Nacional de Águas (ANA) database, collected from 1954 to 2013. For each of the 384 monthly rainfall time series, the average was calculated, and Ward's algorithm was applied to identify the spatial and temporal variability of rainfall. Bartlett's test revealed homogeneous variance at all sites only in January. The run test showed that there was no increasing or decreasing trend in monthly rainfall. Cluster analysis identified five homogeneous rainfall regions in MS State, followed by three seasons (rainy, transitional and dry). The rainy season occurs during the months of November, December, January, February and March. The transitional season covers the months of April and May, September and October. The dry season occurs in June, July and August. The groups G1, G4 and G5 are influenced by the South Atlantic Subtropical Anticyclone (SASA), Chaco's Low (CL), Bolivia's High (BH), Low Levels Jet (LLJ), South Atlantic Convergence Zone (SACZ) and Madden-Julian Oscillation (MJO). Group G2 is influenced by the Upper Tropospheric Cyclonic Vortex (UTCV) and Front Systems (FS). Group G3 is affected by UTCV, FS and SACZ. The interaction of the meteorological systems that operate in each biome, together with the altitude, causes the spatial and temporal diversity of rainfall in MS State.
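To make the clustering step concrete, here is a small pure-Python sketch of agglomerative clustering with Ward's minimum-variance criterion. It is an illustration only: the station values are invented, the function name is hypothetical, and the study itself would have used 32 stations with 12 monthly averages each (32 × 12 = 384 series) rather than the toy data below.

```python
def ward_clusters(points, n_clusters):
    """Merge clusters greedily, always choosing the pair whose merge gives the
    smallest increase in within-cluster sum of squares (Ward's criterion)."""
    clusters = [{"members": [i], "centroid": list(p)} for i, p in enumerate(points)]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                n_a, n_b = len(a["members"]), len(b["members"])
                dist_sq = sum((x - y) ** 2 for x, y in zip(a["centroid"], b["centroid"]))
                # Increase in total within-cluster variance if a and b merge.
                cost = n_a * n_b / (n_a + n_b) * dist_sq
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        _, i, j = best
        a, b = clusters[i], clusters[j]
        n_a, n_b = len(a["members"]), len(b["members"])
        merged = {
            "members": a["members"] + b["members"],
            "centroid": [
                (n_a * x + n_b * y) / (n_a + n_b)
                for x, y in zip(a["centroid"], b["centroid"])
            ],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c["members"]) for c in clusters]


# Hypothetical mean rainfall (mm) for three months at four stations.
station_means = [
    [200.0, 180.0, 20.0],
    [210.0, 170.0, 25.0],
    [90.0, 80.0, 60.0],
    [95.0, 85.0, 55.0],
]
groups = ward_clusters(station_means, n_clusters=2)
print(sorted(groups))
```

With the invented values, stations 0 and 1 (wet, strongly seasonal) end up in one group and stations 2 and 3 in the other, which mirrors how homogeneous rainfall regions are delineated.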
Investigating the Structural Impacts of I64T and P311S Mutations in APE1-DNA Complex: A Molecular Dynamics Approach

PubMed Central

Doss, C. George Priya; NagaSundaram, N.

2012-01-01

Background Elucidating the molecular dynamic behavior of Protein-DNA complexes upon mutation is crucial in current genomics. The molecular dynamics approach reveals the changes, on incorporation of variants, that dictate the structure and function of Protein-DNA complexes. Deleterious mutations in the APE1 protein modify the physicochemical properties of amino acids that affect the protein stability and dynamic behavior. Further, these mutations disrupt the binding sites and prevent the protein from forming complexes with its interacting DNA. Principal Findings In this study, we developed a rapid and cost-effective method to analyze variants in the APE1 gene that are associated with disease susceptibility and evaluated their impacts on the dynamic behavior of the APE1-DNA complex. Initially, two different in silico approaches were used to identify deleterious variants in the APE1 gene. Deleterious scores that overlapped in these approaches were taken into consideration and, based on them, two nsSNPs with IDs rs61730854 (I64T) and rs1803120 (P311S) were taken further for structural analysis. Significance Different parameters such as RMSD, RMSF, salt bridges, H-bonds and SASA applied in the molecular dynamics study reveal that the predicted deleterious variants I64T and P311S alter the structure as well as affect the stability of APE1-DNA interacting functions. This study addresses such new methods for validating functional polymorphisms of human APE1, which is critically involved in causing a deficit in repair capacity, which in turn leads to genetic instability and carcinogenesis. PMID:22384055
Structural and dynamical insight into thermally induced functional inactivation of firefly luciferase

PubMed Central

Jazayeri, Fatemeh S.; Hosseinkhani, Saman

2017-01-01

Luciferase is the key component of light production in the bioluminescence process. Extensive and advantageous application of this enzyme in biotechnology is restricted due to its low thermal stability. Here we report the effect of heating above Tm on the structure and dynamical properties of the luciferase enzyme compared to those at 298 K. In this way we demonstrate that the number of hydrogen bonds between the N- and C-domains is increased for the free enzyme at 325 K. The increase of three inter-domain hydrogen bonds at 325 K suggests that inter-domain contact is strengthened. The appearance of a simultaneous strong salt bridge and hydrogen bond between K529 and D422, and an increased existence probability between R533 and E389, could mechanistically explain the stronger contact between the N- and C-domains. Mutagenesis studies demonstrated the importance of K529 and D422 experimentally. A significant reduction in SASA was also observed for the experimentally important residues K529, D422 and T343, which are involved in the active site region. Principal component analysis (PCA) in our study shows that the dynamical behavior of the enzyme changes upon heating, which mainly originates from the change of motion modes and the associated extent of those motions with respect to 298 K. These findings could explain why heating of the enzyme or thermal fluctuation of the protein conformation reduces luciferase activity over time, as a possible mechanism of thermal functional inactivation. According to these results, we propose two strategies to improve the thermal stability of functional luciferase. PMID:28672033

Molecular dynamic simulation of Trastuzumab F(ab’)2 structure in corporation with HER2 as a theranostic agent of breast cancer

NASA Astrophysics Data System (ADS)

Hermanto, S.; Yusuf, M.; Mutalib, A.; Hudiyono, S.

2017-05-01

Trastuzumab as an intact IgG is well researched as a theranostic agent in HER2-overexpressing breast cancer. However, because of its relatively large molecular size, it moves slowly and penetrates the target cells weakly. Fragmentation of trastuzumab by pepsin cleavage has been developed to obtain the F(ab’)2 fragments. To observe the stability and accessibility of the F(ab’)2 structure in corporation with HER2 (human epidermal growth factor receptor-2), an antibody model structure was developed with 1IGT as a template. Molecular dynamics (MD) simulation of the F(ab’)2 structure was performed in the aqueous phase with AMBER trajectories for 20 ns. Computational visualization with VMD (Visual Molecular Dynamics) was applied to identify binding site interaction details between trastuzumab F(ab’)2 and the HER2 receptor. The results of the MD simulations indicated that the fragmentation of trastuzumab did not change the structure and conformation of F(ab’)2 as a whole, especially in the CDR (Complementarity Determining Region) area. SASA (solvent accessibility surface area) analysis on lysine residues showed that formation of the DOTA-F(ab’)2 conjugate is predicted to occur outside the CDR regions, so it does not interfere with binding affinity for the HER2 receptor. The molecular dynamics simulation of DOTA-F(ab’)2 with the HER2 receptor in an aqueous system yielded a higher ΔGbinding (15.5066 kcal/mol) than the positive control HER2-Fab (-45.1446 kcal/mol).
Whartonacarus floridensis sp. nov. (Acari: Trombiculidae), with a taxonomic review and the first record of Whartonacarus chiggers in the continental United States.

PubMed

Mertins, James W; Hanson, Britta A; Corn, Joseph L

2009-11-01

Among several unusual species collected during surveillance of ectoparasites on wildlife hosts in the southeastern United States and Caribbean Region, the larvae of a new species of Whartonacarus were encountered in 2003 on a cattle egret, Bubulcus ibis (L.), in the Florida Keys. This is the first record for a member of Whartonacarus in the continental United States. The mite is described and named as Whartonacarus floridensis Mertins, and the possible significance of this discovery with respect to the "tropical bont tick," Amblyomma variegatum (F.), is discussed. A brief taxonomic review of Whartonacarus raises questions about the putative synonymy of Whartonacarus nativitatis (Hoffmann) and Whartonacarus thompsoni (Brennan) and suggests that Whartonacarus shiraii (Sasa et al.) may include two distinct taxa. Whartonacarus is redefined, and a revised key to the known taxa is provided. Toritrombicula oceanica Brennan & Amerson is placed in the genus Whartonacarus. Also, Whartonacarus palenquensis (Hoffman) is rejected as a member of this genus and placed in its own new genus, Longisetacarus Mertins.
Support of selected X-ray studies to be performed using data from the Uhuru (SAS-A) satellite

NASA Technical Reports Server (NTRS)

Garmire, G. P.

1976-01-01

A new measurement of the diffuse X-ray emission sets more stringent upper limits on the fluctuations of the background and on the number counts of X-ray sources with absolute value of b greater than 20 deg than previous measurements. A random sample of background data from the Uhuru satellite gives a relative fluctuation in excess of statistics of 2.0% between 2.4 and 6.9 keV. The hypothesis that the relative fluctuation exceeds 2.9% can be rejected at the 90% confidence level. No discernible energy dependence is evident in the fluctuations in the pulse height data when separated into three energy channels of nearly equal width from 1.8 to 10.0 keV. The probability distribution of fluctuations was convolved with the photon noise and cosmic ray background deviation (obtained from the earth-viewing data) to yield the differential source count distribution for high latitude sources. Results imply that a maximum of 160 sources could be between 1.7 and 5.1 x 10 to the -11 power ergs/sq cm/sec (1-3 Uhuru counts).
Seasonal changes of the mineral contents in the rumen of wild Yeso sika deer (Cervus nippon yesoensis).

PubMed

Hayashida, Maki; Souma, Kousaku; Hanagata, Osamu; Okamoto, Masayo; Masuko, Takayoshi

2012-03-01

The rumen contents were collected from 36 wild Yeso sika deer (Cervus nippon yesoensis) captured by deer culling or by hunting in the spring, summer, autumn and winter in Hokkaido, Japan. Botanical classification was conducted, and the contents of mineral (calcium (Ca), phosphorus (P), potassium (K), sodium (Na), iron (Fe), copper (Cu) and zinc (Zn)) were measured. The animals were captured around pastures or fallow field areas in the Kushiro area. The rumen contents consisted of grasses and Sasa sp. leaves regardless of the season. Leaves and bark were ingested in the spring, autumn and winter. The macro-mineral contents in the rumen showed seasonal changes. In the summer, the Ca, K and P contents were high, and the Na content was low. There were no seasonal changes in the Fe content. The P, Na and Fe contents were higher than the animals' requirements. In a future survey, it is needed to determine the mineral contents of the food ingested by wild Yeso sika deer. © 2011 The Authors. Animal Science Journal © 2011 Japanese Society of Animal Science.
Cratering Equations for Zinc Orthotitanate Coated Aluminum

NASA Technical Reports Server (NTRS)

Hyde, James; Christiansen, Eric; Liou, Jer-Chyi; Ryan, Shannon

2009-01-01

The final STS-125 servicing mission (SM4) to the Hubble Space Telescope (HST) in May of 2009 saw the return of the 2nd Wide Field Planetary Camera (WFPC2) aboard the shuttle Discovery. This hardware had been in service on HST since it was installed during the SM1 mission in December of 1993, yielding one of the longest low Earth orbit exposure times (15.4 years) of any returned space hardware. The WFPC2 is equipped with a 0.8 x 2.2 m radiator for thermal control of the camera electronics (Figure 1). The space-facing surface of the 4.1 mm thick aluminum radiator is coated with Z93 zinc orthotitanate thermal control paint with a nominal thickness of 0.1-0.2 mm. Post flight inspections of the radiator panel revealed hundreds of micrometeoroid/orbital debris (MMOD) impact craters ranging in size from less than 300 to nearly 1000 microns in diameter. The Z93 paint exhibited large spall areas around the larger impact sites (Figure 2) and the craters observed in the 6061-T651 aluminum had a different shape than those observed in uncoated aluminum. Typical hypervelocity impact craters in aluminum have raised lips around the impact site. The craters in the HST radiator panel had suppressed crater lips, and in some cases multiple craters were present instead of a single individual crater. Humes and Kinard observed similar behavior after the WFPC1 post flight inspection and assumed the Z93 coating was acting like a bumper in a Whipple shield. Similar paint behavior (spall) was also observed by Bland during post flight inspection of the International Space Station (ISS) S-Band Antenna Structural Assembly (SASA) in 2008. The SASA, with similar Z93 coated aluminum, was inspected after nearly 4 years of exposure on the ISS. The multi-crater phenomena could be a function of the density, composition, or impact obliquity angle of the impacting particle. 
For instance, a micrometeoroid particle consisting of loosely bound grains of material could be responsible for creating the multiple craters. Samples were obtained from the HST largest craters for examination by electron microscope equipped with x-ray spectrometers to determine impactor source (micrometeoroid or orbital debris). In an attempt to estimate the MMOD particle diameters that produced these craters, this paper will present equations for spall diameter, crater depth and crater diameter in Z93 coated aluminum. The equations will be based on hypervelocity impact tests of Z93 painted aluminum at the NASA White Sands Test Facility. Equations inputs for velocities beyond the testable regime are expected from hydrocode simulations of Z93 coated aluminum using CTH and ANSYS AUTODYN.
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis.

PubMed

Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

2012-01-01

RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, sequence-specific information such as the total number of nucleotide bases and ATGC base contents along with their respective percentages, and a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation and provides a user-friendly environment for sequence analysis. It is freely available. http://www.cemb.edu.pk/sw.html RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language.
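As background for the algorithm named above, the following is a sketch of the textbook Nussinov recurrence applied to DNA, together with a reverse-complement helper. It is not RDNAnalyzer's code (the tool is written in C# and is said to extend the algorithm); the function names, the Watson-Crick-only pairing rule and the optional min_loop parameter are assumptions made for illustration.

```python
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def nussinov_max_pairs(seq, min_loop=0):
    """Maximum number of nested base pairs, by Nussinov dynamic programming.

    dp[i][j] holds the best pair count for seq[i..j]; a pair (i, j) is allowed
    only if it is Watson-Crick and encloses more than `min_loop` positions' gap
    (j - i > min_loop).
    """
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])      # i or j left unpaired
            if (seq[i], seq[j]) in WATSON_CRICK and j - i > min_loop:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i, j):                       # split into two halves
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


def reverse_complement(seq):
    """Complement each base (A<->T, G<->C) and reverse the strand."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


print(nussinov_max_pairs("GGGAAACCC"))
print(reverse_complement("ATGC"))
```

The triple loop makes the running time cubic in sequence length, which is acceptable for short sequences but is one reason practical tools add heuristics or energy models on top of the basic recurrence.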
Disruption of redox catalytic functions of peroxiredoxin-thioredoxin complex in Mycobacterium tuberculosis H37Rv using small interface binding molecules.

PubMed

Gurung, Arun Bahadur; Das, Amit Kumar; Bhattacharjee, Atanu

2017-04-01

Mycobacterium tuberculosis has distinctive ability to detoxify various microbicidal superoxides and hydroperoxides via a redox catalytic cycle involving thiol reductants of peroxiredoxin (Prx) and thioredoxin (Trx) systems which has conferred on it resistance against oxidative killing and survivability within host. We have used computational approach to disrupt catalytic functions of Prx-Trx complex which can possibly render the pathogen vulnerable to oxidative killing in the host. Using protein-protein docking method, we have successfully constructed the Prx-Trx complex. Statistics of interface region revealed contact area of each monomer less than 1500 Å² and enriched in polar amino acids indicating transient interaction between Prx and Trx. We have identified ZINC40139449 as a potent interface binding molecule through virtual screening of drug-like compounds from ZINC database. Molecular dynamics (MD) simulation studies showed differences in structural properties of Prx-Trx complex both in apo and ligand bound states with regard to root mean square deviation (RMSD), radius of gyration (Rg), root mean square fluctuations (RMSF), solvent accessible surface area (SASA) and number of hydrogen bonds (NHBs). Interestingly, we found stability of two conserved catalytic residues Cys61 and Cys174 of Prx and conserved catalytic motif, WCXXC of Trx upon binding of ZINC40139449. The time dependent displacement study reveals that the compound is quite stable in the interface binding region till 30 ns of MD simulation. The structural properties were further validated by principal component analysis (PCA). We report ZINC40139449 as promising lead which can be further evaluated by in vitro or in vivo enzyme inhibition assays. Copyright © 2016 Elsevier Ltd. All rights reserved.
Addressing gender inequality and intimate partner violence as critical barriers to an effective HIV response in sub-Saharan Africa.

PubMed

Watts, Charlotte; Seeley, Janet

2014-01-01

In Africa, women and girls represent 57% of people living with HIV, with gender inequality and violence being an important structural determinant of their vulnerability. This commentary draws out lessons for a more effective combination response to the HIV epidemic from three papers recently published in JIAS. Hatcher and colleagues present qualitative data from women attending ante-natal clinics in Johannesburg, describing how HIV diagnosis during pregnancy and subsequent partner disclosure are common triggers for violence within relationships. The authors describe the challenges women face in adhering to medication or using services. Kyegombe and colleagues present a secondary analysis of a randomized controlled trial in Uganda of SASA! - a community violence prevention programme. Along with promising community impacts on physical partner violence, significantly lower levels of sexual concurrency, condom use and HIV testing were reported by men in intervention communities. Remme and her colleagues present a systematic review of evidence on the costs and cost-effectiveness of gender-responsive HIV interventions. The review identified an ever-growing evidence base, but a paucity of accompanying economic analyses, making it difficult to assess the costs or value for money of gender-focused programmes. There is a need to continue to accumulate evidence on the effectiveness and costs of different approaches to addressing gender inequality and violence as part of a combination HIV response. A clearer HIV-specific and broader synergistic vision of financing and programming needs to be developed, to ensure that the potential synergies between HIV-specific and broader gender-focused development investments can be used to best effect to address vulnerability of women and girls to both violence and HIV.
Addressing gender inequality and intimate partner violence as critical barriers to an effective HIV response in sub-Saharan Africa

PubMed Central

Watts, Charlotte; Seeley, Janet

2014-01-01

Introduction In Africa, women and girls represent 57% of people living with HIV, with gender inequality and violence being an important structural determinant of their vulnerability. This commentary draws out lessons for a more effective combination response to the HIV epidemic from three papers recently published in JIAS. Discussion Hatcher and colleagues present qualitative data from women attending ante-natal clinics in Johannesburg, describing how HIV diagnosis during pregnancy and subsequent partner disclosure are common triggers for violence within relationships. The authors describe the challenges women face in adhering to medication or using services. Kyegombe and colleagues present a secondary analysis of a randomized controlled trial in Uganda of SASA! – a community violence prevention programme. Along with promising community impacts on physical partner violence, significantly lower levels of sexual concurrency, condom use and HIV testing were reported by men in intervention communities. Remme and her colleagues present a systematic review of evidence on the costs and cost-effectiveness of gender-responsive HIV interventions. The review identified an ever-growing evidence base, but a paucity of accompanying economic analyses, making it difficult to assess the costs or value for money of gender-focused programmes. Conclusions There is a need to continue to accumulate evidence on the effectiveness and costs of different approaches to addressing gender inequality and violence as part of a combination HIV response. A clearer HIV-specific and broader synergistic vision of financing and programming needs to be developed, to ensure that the potential synergies between HIV-specific and broader gender-focused development investments can be used to best effect to address vulnerability of women and girls to both violence and HIV. PMID:25499456
Computational screening and molecular dynamics simulation of disease associated nsSNPs in CENP-E.

PubMed

Kumar, Ambuj; Purohit, Rituraj

2012-01-01

Aneuploidy and chromosomal instability (CIN) are hallmarks of most solid tumors. Mutations in centromere proteins have been observed to promote aneuploidy and tumorigenesis. Recent studies reported that Centromere-associated protein-E (CENP-E) is involved in inducing cancers. In this study we investigated the pathogenic effect of 132 nsSNPs reported in CENP-E using a computational platform. The Y63H point mutation was found to be associated with cancer using the SIFT, Polyphen, PhD-SNP, MutPred, CanPredict and Dr. Cancer tools. We further investigated the binding affinity of the ATP molecule to the CENP-E motor domain. Complementarity scores obtained from docking studies showed a significant loss in ATP binding affinity of the mutant structure. Molecular dynamics simulation was carried out to examine the structural consequences of the Y63H mutation. Root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (R(g)), solvent accessible surface area (SASA), energy value, hydrogen bond (NH bond), eigenvector projection, trace of covariance matrix and atom density analysis results showed a notable loss in stability for the mutant structure. The Y63H mutation was also shown to disrupt the native conformation of the ATP binding region in the CENP-E motor domain. Docking studies for the remaining 18 mutations at the 63rd residue position, as well as the two other computationally predicted disease-associated mutations S22L and P69S, were also carried out to investigate their effect on the ATP binding affinity of the CENP-E motor domain. Our study provides a promising computational methodology for studying the tumorigenic consequences of nsSNPs that have not been characterized, and a clear clue for wet-lab scientists. Copyright © 2012 Elsevier B.V. All rights reserved.
SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.

PubMed

Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue

2018-05-02

Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. Then, the series of complex numbers formed are transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely clustering and classification, we compared SSAW against state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.
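The pipeline described in this abstract (k-mers, then complex numbers, then a stationary wavelet transform) can be illustrated with a minimal Python sketch. It is not the SSAW implementation: the base-to-complex mapping, the position weighting in `kmer_to_complex`, and the choice of a single-level Haar wavelet with circular boundaries are all assumptions made for illustration, since the abstract does not specify them.

```python
import math

# Hypothetical mapping of bases to the complex plane (an assumption;
# the abstract does not say which mapping SSAW uses).
BASE_TO_COMPLEX = {"A": 1 + 0j, "C": 0 + 1j, "G": -1 + 0j, "T": 0 - 1j}


def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_to_complex(kmer):
    """Map one k-mer to a complex number (illustrative position-weighted mean)."""
    return sum(BASE_TO_COMPLEX[b] * (i + 1) for i, b in enumerate(kmer)) / len(kmer)


def haar_swt_level1(signal):
    """One level of an undecimated (stationary) Haar transform.

    Unlike the ordinary DWT there is no downsampling, so both outputs
    have the same length as the input. Boundaries wrap around.
    """
    n = len(signal)
    s = math.sqrt(2)
    approx = [(signal[i] + signal[(i + 1) % n]) / s for i in range(n)]
    detail = [(signal[i] - signal[(i + 1) % n]) / s for i in range(n)]
    return approx, detail


def ssaw_like_features(seq, k=3):
    """Turn a DNA string into a numeric feature vector (magnitudes only)."""
    signal = [kmer_to_complex(km) for km in kmers(seq.upper(), k)]
    approx, detail = haar_swt_level1(signal)
    return [abs(c) for c in approx] + [abs(c) for c in detail]
```

In this sketch the feature vector length depends on the sequence length, so comparing sequences of different lengths would need an extra summarising step that is not shown here.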
RDNAnalyzer: A tool for DNA secondary structure prediction and sequence analysis

PubMed Central

Afzal, Muhammad; Shahid, Ahmad Ali; Shehzadi, Abida; Nadeem, Shahid; Husnain, Tayyab

2012-01-01

RDNAnalyzer is an innovative computer-based tool designed for DNA secondary structure prediction and sequence analysis. It can randomly generate a DNA sequence, or users can upload sequences of their own interest in RAW format. It uses and extends the Nussinov dynamic programming algorithm and has various applications for sequence analysis. It predicts the DNA secondary structure and base pairings. It also provides tools for sequence analyses routinely performed by biological scientists, such as DNA replication, reverse complement generation, transcription, translation, and sequence-specific information such as the total number of nucleotide bases and ATGC base contents along with their respective percentages, as well as a sequence cleaner. RDNAnalyzer is a unique tool developed in Microsoft Visual Studio 2008 using Microsoft Visual C# and Windows Presentation Foundation, and it provides a user-friendly environment for sequence analysis. It is freely available. Availability: http://www.cemb.edu.pk/sw.html Abbreviations: RDNAnalyzer - Random DNA Analyser, GUI - Graphical user interface, XAML - Extensible Application Markup Language. PMID:23055611
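For readers unfamiliar with the Nussinov algorithm mentioned above, the following Python sketch shows the textbook recurrence for maximising the number of nested base pairs. It is not RDNAnalyzer's code (the tool is written in C# and extends the algorithm in ways the abstract does not describe); the Watson-Crick pairing set for DNA and the `min_loop` parameter are assumptions of this sketch.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in seq (textbook Nussinov DP).

    min_loop is the minimum number of unpaired bases required between
    two paired positions (an illustrative parameter, assumed here).
    """
    pairs = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with some k
                if (seq[k], seq[j]) in pairs:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]


print(nussinov_max_pairs("GGGAAATCCC"))  # prints 3
```

The table is filled by increasing subsequence length, so every smaller subproblem is available when it is needed; recovering the actual pairings would require a traceback over `dp`, which is omitted here.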
Molecular Basis for Structural Heterogeneity of an Intrinsically Disordered Protein Bound to a Partner by Combined ESI-IM-MS and Modeling

NASA Astrophysics Data System (ADS)

D'Urzo, Annalisa; Konijnenberg, Albert; Rossetti, Giulia; Habchi, Johnny; Li, Jinyu; Carloni, Paolo; Sobott, Frank; Longhi, Sonia; Grandori, Rita

2015-03-01

Intrinsically disordered proteins (IDPs) form biologically active complexes that can retain a high degree of conformational disorder, escaping structural characterization by conventional approaches. An example is offered by the complex between the intrinsically disordered NTAIL domain and the phosphoprotein X domain (PXD) from measles virus (MeV). Here, distinct conformers of the complex are detected by electrospray ionization-mass spectrometry (ESI-MS) and ion mobility (IM) techniques yielding estimates for the solvent-accessible surface area (SASA) in solution and the average collision cross-section (CCS) in the gas phase. Computational modeling of the complex in solution, based on experimental constraints, provides atomic-resolution structural models featuring different levels of compactness. The resulting models indicate high structural heterogeneity. The intermolecular interactions are predominantly hydrophobic, not only in the ordered core of the complex, but also in the dynamic, disordered regions. Electrostatic interactions become involved in the more compact states. This system represents an illustrative example of a hydrophobic complex that could be directly detected in the gas phase by native mass spectrometry. This work represents the first attempt at modeling the entire NTAIL domain bound to PXD at atomic resolution.
The Anne Frank Haven in an Israeli Kibbutz.

PubMed

Dror, Y

1995-01-01

The Anne Frank Haven, founded in 1956, in the Israeli Kibbutz Sasa provides a unique educational program for coping with multicultural and integration problems. It is a holistic, regional junior and senior high school system within the holistic community of three kibbutzim. The Haven has been the subject of much research into "Moral Development," carried out by Wolins (1969, 1971), and mainly by Kohlberg (1971), his doctoral students Reimer and Snarey and other colleagues. In the seventies and eighties they used the Kibbutz example as a model for the "Just Community" approach. In the early nineties, an Israeli group evaluated the success of the program and its rationale, taking into consideration all the "educational factors" of the community, in the Haven, and in the kibbutzim around it. This article offers a comprehensive picture of the Kohlbergian moral-developmental research at the Anne Frank Haven, including all the relevant references and evaluations of the Haven as a part of the "Just Community" approach. It concludes with a suggestion for another approach--"Community Education" research in the same Haven--as an example of present and future studies in the area of "Moral" and "Values" education.
An isolate of Potato Virus X capsid protein from N. benthamiana: Insights from homology modeling and molecular dynamics simulation.

PubMed

Esfandiari, Neda; Sefidbakht, Yahya

2018-05-17

Since Potato Virus X (PVX) is easily transmitted mechanically between its hosts, its control is difficult. We have previously reported a new isolate of this virus (PVX-Iran, GenBank Accession number FJ461343). However, the molecular basis of resistance breaking activity and its relation to capsid protein structure are still not well understood. SDS-PAGE, ELISA, Western blot and RT-PCR molecular examinations were performed on the inoculated Nicotiana benthamiana plants. The pathological symptoms were related to the PVX isolate. The capsid protein (CP) structure was modeled based on homology and subjected to three independent 80 ns molecular dynamics minimization (GROMACS, OPLS force field) in the SPC water box. The RMSD, RMSF, SASA, and electrostatic properties were retrieved from the trajectories. Flexibility and hydrophilic nature of the N-terminal residues (1-34) of solvated CP could be observed in conformational changes upon minimization. The obtained structure was then docked with NbPCIP1 using ClusPro 2.0. The strong binding affinity of these two proteins (≈ -16.0 kcal mol⁻¹) represents the formation of inclusion body and hence appearance of the symptoms. Copyright © 2018 Elsevier B.V. All rights reserved.
Subgrouping Automata: automatic sequence subgrouping using phylogenetic tree-based optimum subgrouping algorithm.

PubMed

Seo, Joo-Hyun; Park, Jihyang; Kim, Eun-Mi; Kim, Juhan; Joo, Keehyoung; Lee, Jooyoung; Kim, Byung-Gee

2014-02-01

Sequence subgrouping for a given sequence set can enable various informative tasks such as the functional discrimination of sequence subsets and the functional inference of unknown sequences. Because an identity threshold for sequence subgrouping may vary according to the given sequence set, it is highly desirable to construct a robust subgrouping algorithm which automatically identifies an optimal identity threshold and generates subgroups for a given sequence set. To meet this end, an automatic sequence subgrouping method, named 'Subgrouping Automata' was constructed. Firstly, tree analysis module analyzes the structure of tree and calculates the all possible subgroups in each node. Sequence similarity analysis module calculates average sequence similarity for all subgroups in each node. Representative sequence generation module finds a representative sequence using profile analysis and self-scoring for each subgroup. For all nodes, average sequence similarities are calculated and 'Subgrouping Automata' searches a node showing statistically maximum sequence similarity increase using Student's t-value. A node showing the maximum t-value, which gives the most significant differences in average sequence similarity between two adjacent nodes, is determined as an optimum subgrouping node in the phylogenetic tree. Further analysis showed that the optimum subgrouping node from SA prevents under-subgrouping and over-subgrouping. Copyright © 2013. Published by Elsevier Ltd.
A combination of LongSAGE with Solexa sequencing is well suited to explore the depth and the complexity of transcriptome

PubMed Central

Hanriot, Lucie; Keime, Céline; Gay, Nadine; Faure, Claudine; Dossat, Carole; Wincker, Patrick; Scoté-Blachon, Céline; Peyron, Christelle; Gandrillon, Olivier

2008-01-01

Background: "Open" transcriptome analysis methods allow the study of gene expression without a priori knowledge of the transcript sequences. As of now, SAGE (Serial Analysis of Gene Expression), LongSAGE and MPSS (Massively Parallel Signature Sequencing) are the most widely used methods for "open" transcriptome analysis. Both LongSAGE and MPSS rely on the isolation of 21 bp tag sequences from each transcript. In contrast to LongSAGE, the high-throughput sequencing method used in MPSS enables the rapid sequencing of very large libraries containing several million tags, allowing deep transcriptome analysis. However, a bias in the complexity of the transcriptome representation obtained by MPSS was recently uncovered. Results: In order to perform a deep analysis of the mouse hypothalamus transcriptome while avoiding the limitation introduced by MPSS, we combined LongSAGE with the Solexa sequencing technology and obtained a library of more than 11 million tags. We then compared it to a LongSAGE library of mouse hypothalamus sequenced with the Sanger method. Conclusion: We found that Solexa sequencing technology combined with LongSAGE is perfectly suited for deep transcriptome analysis. In contrast to MPSS, it gives a complex representation of the transcriptome as reliable as a LongSAGE library sequenced by the Sanger method. PMID:18796152
MerCat: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from metagenomic and/or metatranscriptomic sequencing data

DOE Office of Scientific and Technical Information (OSTI.GOV)

White, Richard A.; Panyala, Ajay R.; Glass, Kevin A.

MerCat is a parallel, highly scalable and modular property software package for robust analysis of features in next-generation sequencing data. MerCat inputs include assembled contigs and raw sequence reads from any platform resulting in feature abundance counts tables. MerCat allows for direct analysis of data properties without reference sequence database dependency commonly used by search tools such as BLAST and/or DIAMOND for compositional analysis of whole community shotgun sequencing (e.g. metagenomes and metatranscriptomes).
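Database-independent k-mer counting, the core operation this record describes, can be sketched in a few lines of Python. This is a single-threaded illustration and not MerCat's code or interface; the function name `count_kmers` and the rule of skipping k-mers that contain characters other than A, C, G, T are assumptions of the sketch.

```python
from collections import Counter


def count_kmers(sequences, k):
    """Count every overlapping k-mer across an iterable of DNA strings.

    k-mers containing characters other than A, C, G, T (for example N)
    are skipped; that filtering rule is an assumption of this sketch.
    """
    counts = Counter()
    valid = set("ACGT")
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= valid:
                counts[kmer] += 1
    return counts


reads = ["ACGTAC", "ACNGT"]
table = count_kmers(reads, 2)
print(table["AC"])  # prints 3
```

The result is a feature-abundance table keyed by k-mer, which needs no reference database; scaling this to real sequencing data would require the parallelism the abstract attributes to MerCat.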
High-Resolution Melting Analysis for Rapid Detection of Sequence Type 131 Escherichia coli.

PubMed

Harrison, Lucas B; Hanson, Nancy D

2017-06-01

Escherichia coli isolates belonging to the sequence type 131 (ST131) clonal complex have been associated with the global distribution of fluoroquinolone and β-lactam resistance. Whole-genome sequencing and multilocus sequence typing identify sequence type but are expensive when evaluating large numbers of samples. This study was designed to develop a cost-effective screening tool using high-resolution melting (HRM) analysis to differentiate ST131 from non-ST131 E. coli in large sample populations in the absence of sequence analysis. The method was optimized using DNA from 12 E. coli isolates. Singleplex PCR was performed using 10 ng of DNA, Type-it HRM buffer, and multilocus sequence typing primers and was followed by multiplex PCR. The amplicon sizes ranged from 630 to 737 bp. Melt temperature peaks were determined by performing HRM analysis at 0.1°C resolution from 50 to 95°C on a Rotor-Gene Q 5-plex HRM system. Derivative melt curves were compared between sequence types and analyzed by principal component analysis. A blinded study of 191 E. coli isolates of ST131 and unknown sequence types validated this methodology. This methodology returned 99.2% specificity (124 true negatives and 1 false positive) and 100% sensitivity (66 true positives and 0 false negatives). This HRM methodology distinguishes ST131 from non-ST131 E. coli without sequence analysis. The analysis can be accomplished in about 3 h in any laboratory with an HRM-capable instrument and principal component analysis software. Therefore, this assay is a fast and cost-effective alternative to sequencing-based ST131 identification. Copyright © 2017 Harrison and Hanson.
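The specificity and sensitivity figures quoted in this abstract follow directly from the reported counts, as the short Python check below shows. The function names are illustrative; the four counts are the ones stated in the abstract.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of actual positives that the assay calls positive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of actual negatives that the assay calls negative."""
    return true_neg / (true_neg + false_pos)


# Counts reported in the abstract for the blinded study of 191 isolates.
tp, fn, tn, fp = 66, 0, 124, 1

print(f"{sensitivity(tp, fn):.1%}")  # prints 100.0%
print(f"{specificity(tn, fp):.1%}")  # prints 99.2%
```

The four counts sum to 191, matching the number of isolates in the blinded study.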
Analysis of Illumina Microbial Assemblies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clum, Alicia; Foster, Brian; Froula, Jeff

2010-05-28

Since the emergence of second-generation sequencing technologies, the evaluation of different sequencing approaches and their assembly strategies for different types of genomes has become an important undertaking. Next generation sequencing technologies dramatically increase sequence throughput while decreasing cost, making them an attractive tool for whole genome shotgun sequencing. To compare different approaches for de novo whole genome assembly, appropriate tools and a solid understanding of both quantity and quality of the underlying sequence data are crucial. Here, we performed an in-depth analysis of short-read Illumina sequence assembly strategies for bacterial and archaeal genomes. Different types of Illumina libraries as well as different trim parameters and assemblers were evaluated. Results of the comparative analysis and sequencing platforms will be presented. The goal of this analysis is to develop a cost-effective approach for the increased throughput of the generation of high quality microbial genomes.

Novel primer specific false terminations during DNA sequencing reactions: danger of inaccuracy of mutation analysis in molecular diagnostics

PubMed Central

Anwar, R; Booth, A; Churchill, A J; Markham, A F

1996-01-01

The determination of nucleotide sequence is fundamental to the identification and molecular analysis of genes. Direct sequencing of PCR products is now becoming a commonplace procedure for haplotype analysis, and for defining mutations and polymorphism within genes, particularly for diagnostic purposes. A previously unrecognised phenomenon, primer related variability, observed in sequence data generated using Taq cycle sequencing and T7 Sequenase sequencing, is reported. This suggests that caution is necessary when interpreting DNA sequence data. This is particularly important in situations where treatment may be dependent on the accuracy of the molecular diagnosis. PMID:16696096
Piscine reovirus: Genomic and molecular phylogenetic analysis from farmed and wild salmonids collected on the Canada/US Pacific Coast

USGS Publications Warehouse

Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul S.; Richmond, Zina; Purcell, Maureen K.; Johns, Robert; Johnson, Stewart C.; Saksida, Sonja M.

2015-01-01

Piscine reovirus (PRV) is a double stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogenous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13 year period.
Piscine Reovirus: Genomic and Molecular Phylogenetic Analysis from Farmed and Wild Salmonids Collected on the Canada/US Pacific Coast

PubMed Central

Siah, Ahmed; Morrison, Diane B.; Fringuelli, Elena; Savage, Paul; Richmond, Zina; Johns, Robert; Purcell, Maureen K.; Johnson, Stewart C.; Saksida, Sonja M.

2015-01-01

Piscine reovirus (PRV) is a double stranded non-enveloped RNA virus detected in farmed and wild salmonids. This study examined the phylogenetic relationships among different PRV sequence types present in samples from salmonids in Western Canada and the US, including Alaska (US), British Columbia (Canada) and Washington State (US). Tissues testing positive for PRV were partially sequenced for segment S1, producing 71 sequences that grouped into 10 unique sequence types. Sequence analysis revealed no identifiable geographical or temporal variation among the sequence types. Identical sequence types were found in fish sampled in 2001, 2005 and 2014. In addition, PRV positive samples from fish derived from Alaska, British Columbia and Washington State share identical sequence types. Comparative analysis of the phylogenetic tree indicated that Canada/US Pacific Northwest sequences formed a subgroup with some Norwegian sequence types (group II), distinct from other Norwegian and Chilean sequences (groups I, III and IV). Representative PRV positive samples from farmed and wild fish in British Columbia and Washington State were subjected to genome sequencing using next generation sequencing methods. Individual analysis of each of the 10 partial segments indicated that the Canadian and US PRV sequence types clustered separately from available whole genome sequences of some Norwegian and Chilean sequences for all segments except the segment S4. In summary, PRV was genetically homogenous over a large geographic distance (Alaska to Washington State), and the sequence types were relatively stable over a 13 year period. PMID:26536673
NexGen Production - Sequencing and Analysis

ScienceCinema

Muzny, Donna

2018-01-16

Donna Muzny of the Baylor College of Medicine Human Genome Sequencing Center discusses next generation sequencing platforms and evaluating pipeline performance on June 2, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
IDENTIFICATION OF AVIAN-SPECIFIC FECAL METAGENOMIC SEQUENCES USING GENOME FRAGMENT ENRICHMENTS

EPA Science Inventory

Sequence analysis of microbial genomes has provided biologists the opportunity to compare genetic differences between closely related microorganisms. While random sequencing has also been used to study natural microbial communities, metagenomic comparisons via sequencing analysis...
Quantitative phenotyping via deep barcode sequencing.

PubMed

Smith, Andrew M; Heisler, Lawrence E; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J; Chee, Mark; Roth, Frederick P; Giaever, Guri; Nislow, Corey

2009-10-01

Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or "Bar-seq," outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that approximately 20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene-environment interactions on a genome-wide scale.
Whole genome sequence analysis of unidentified genetically modified papaya for development of a specific detection method.

PubMed

Nakamura, Kosuke; Kondo, Kazunari; Akiyama, Hiroshi; Ishigaki, Takumi; Noguchi, Akio; Katsumata, Hiroshi; Takasaki, Kazuto; Futo, Satoshi; Sakata, Kozue; Fukuda, Nozomi; Mano, Junichi; Kitta, Kazumi; Tanaka, Hidenori; Akashi, Ryo; Nishimaki-Mogami, Tomoko

2016-08-15

Identification of transgenic sequences in an unknown genetically modified (GM) papaya (Carica papaya L.) by whole genome sequence analysis was demonstrated. Whole genome sequence data were generated for a GM-positive fresh papaya fruit commodity detected in monitoring using real-time polymerase chain reaction (PCR). The sequences obtained were mapped against an open database for papaya genome sequence. Transgenic construct- and event-specific sequences were identified as a GM papaya developed to resist infection from a Papaya ringspot virus. Based on the transgenic sequences, a specific real-time PCR detection method for GM papaya applicable to various food commodities was developed. Whole genome sequence analysis enabled identifying unknown transgenic construct- and event-specific sequences in GM papaya and development of a reliable method for detecting them in papaya food commodities. Copyright © 2016 Elsevier Ltd. All rights reserved.
Modern Computational Techniques for the HMMER Sequence Analysis

PubMed Central

2013-01-01

This paper focuses on the latest research and critical reviews on modern computing architectures and on software- and hardware-accelerated algorithms for bioinformatics data analysis, with an emphasis on one of the most important sequence analysis applications: hidden Markov models (HMM). We show a detailed performance comparison of sequence analysis tools on various computing platforms recently developed in the bioinformatics community. The characteristics of sequence analysis, such as its data- and compute-intensive nature, make it very attractive to optimize and parallelize using both traditional software approaches and innovative hardware acceleration technologies. PMID:25937944
REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

PubMed Central

Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.

2009-01-01

The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
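The relabelling step described above, in which each FASTA header is replaced by a short label built from species name and accession number, can be sketched in Python as follows. This is not the REFGEN code: the assumed header layout (accession first, species in trailing square brackets, as in many NCBI protein records), the `Species_accession` label format, and the function names are all hypothetical.

```python
import re

# Assumed header layout: ">ACCESSION free text [Species name]".
HEADER_RE = re.compile(r"^>(\S+).*\[(.+?)\]\s*$")


def short_label(header):
    """Build a label like 'Homo_sapiens_NP_000509.1' from a FASTA header.

    Falls back to the first whitespace-delimited token when the header
    does not match the assumed layout.
    """
    match = HEADER_RE.match(header)
    if match:
        accession, species = match.groups()
        return f"{species.replace(' ', '_')}_{accession}"
    parts = header[1:].split()
    return parts[0] if parts else ""


def rename_fasta(text):
    """Return FASTA text with every header line replaced by its short label."""
    out = []
    for line in text.splitlines():
        out.append(">" + short_label(line) if line.startswith(">") else line)
    return "\n".join(out)


example = ">NP_000509.1 hemoglobin subunit beta [Homo sapiens]\nMVHLTPEEK"
print(rename_fasta(example))
```

Keeping the accession in the label means a tree file produced from the renamed alignment can later be mapped back to the original database records.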
Identification of Bacillus Probiotics Isolated from Soil Rhizosphere Using 16S rRNA, recA, rpoB Gene Sequencing and RAPD-PCR.

PubMed

Mohkam, Milad; Nezafat, Navid; Berenjian, Aydin; Mobasher, Mohammad Ali; Ghasemi, Younes

2016-03-01

Some Bacillus species, especially Bacillus subtilis and Bacillus pumilus groups, have highly similar 16S rRNA gene sequences, which are hard to identify based on 16S rDNA sequence analysis. To overcome this drawback, rpoB and recA sequence analysis along with randomly amplified polymorphic DNA (RAPD) fingerprinting was examined as an alternative method for differentiating Bacillus species. The 16S rRNA, rpoB and recA genes were amplified via a polymerase chain reaction using their specific primers. The resulting PCR amplicons were sequenced, and phylogenetic analysis was performed with MEGA 6 software. Identification based on 16S rRNA gene sequencing was underpinned by rpoB and recA gene sequencing as well as the RAPD-PCR technique. Subsequently, concatenation and phylogenetic analysis showed that the extent of diversity and similarity was better resolved by rpoB and recA primers, which was also reinforced by RAPD-PCR methods. However, these approaches failed to identify one isolate; combining them with the phenotypical method resolved this issue. Overall, RAPD fingerprinting, rpoB and recA along with concatenated gene sequence analysis discriminated closely related Bacillus species, which highlights the significance of the multigenic method in more precisely distinguishing Bacillus strains. This research emphasizes the benefit of RAPD fingerprinting and rpoB and recA sequence analysis over 16S rRNA gene sequence analysis for suitable and effective identification of Bacillus species as recommended for probiotic products.
Investigation of the protein osteocalcin of Camelops hesternus: Sequence, structure and phylogenetic implications

NASA Astrophysics Data System (ADS)

Humpula, James F.; Ostrom, Peggy H.; Gandhi, Hasand; Strahler, John R.; Walker, Angela K.; Stafford, Thomas W.; Smith, James J.; Voorhies, Michael R.; George Corner, R.; Andrews, Phillip C.

2007-12-01

Ancient DNA sequences offer an extraordinary opportunity to unravel the evolutionary history of ancient organisms. Protein sequences offer another reservoir of genetic information that has recently become tractable through the application of mass spectrometric techniques. The extent to which ancient protein sequences resolve phylogenetic relationships, however, has not been explored. We determined the osteocalcin amino acid sequence from the bone of an extinct Camelid (21 ka, Camelops hesternus) excavated from Isleta Cave, New Mexico and three bones of extant camelids: bactrian camel (Camelus bactrianus); dromedary camel (Camelus dromedarius) and guanaco (Llama guanacoe) for a diagenetic and phylogenetic assessment. There was no difference in sequence among the four taxa. Structural attributes observed in both modern and ancient osteocalcin include a post-translation modification, Hyp 9, deamidation of Gln 35 and Gln 39, and oxidation of Met 36. Carbamylation of the N-terminus in ancient osteocalcin may result in blockage and explain previous difficulties in sequencing ancient proteins via Edman degradation. A phylogenetic analysis using osteocalcin sequences of 25 vertebrate taxa was conducted to explore osteocalcin protein evolution and the utility of osteocalcin sequences for delineating phylogenetic relationships. The maximum likelihood tree closely reflected generally recognized taxonomic relationships. For example, maximum likelihood analysis recovered rodents, birds and, within hominins, the Homo-Pan-Gorilla trichotomy. Within Artiodactyla, character state analysis showed that a substitution of Pro 4 for His 4 defines the Capra-Ovis clade within Artiodactyla. Homoplasy in our analysis indicated that osteocalcin evolution is not a perfect indicator of species evolution. Limited sequence availability prevented assigning functional significance to sequence changes.
Our preliminary analysis of osteocalcin evolution represents an initial step towards a complete character analysis aimed at determining the evolutionary history of this functionally significant protein. We emphasize that ancient protein sequencing and phylogenetic analyses using amino acid sequences must pay close attention to post-translational modifications, amino acid substitutions due to diagenetic alteration and the impacts of isobaric amino acids on mass shifts and sequence alignments.
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering.

PubMed

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor; Essex, M

2015-05-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice.
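The sliding-window design mentioned in this abstract amounts to cutting an alignment into overlapping, fixed-length coordinate ranges. The Python sketch below generates such ranges; the step size is left as a free parameter because the abstract gives only the window lengths and window counts, so no attempt is made here to reproduce the study's exact 99 and 45 windows.

```python
def sliding_windows(seq_len, window, step):
    """Return (start, end) pairs of half-open windows over [0, seq_len).

    Only full-length windows are produced; a trailing partial window is
    dropped. The step size is a free parameter of this sketch.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    return [(s, s + window) for s in range(0, seq_len - window + 1, step)]


print(sliding_windows(10, 4, 2))  # prints [(0, 4), (2, 6), (4, 8), (6, 10)]
```

Each pair can be used to slice every aligned sequence, for example `aligned_seq[start:end]`, before running the clustering analysis on that sub-alignment; for a fixed alignment length and step, longer windows yield fewer ranges.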
Importance of Viral Sequence Length and Number of Variable and Informative Sites in Analysis of HIV Clustering

PubMed Central

Novitsky, Vlad; Moyo, Sikhulile; Lei, Quanhong; DeGruttola, Victor

2015-01-01

To improve the methodology of HIV cluster analysis, we addressed how analysis of HIV clustering is associated with parameters that can affect the outcome of viral clustering. The extent of HIV clustering and tree certainty was compared between 401 HIV-1C near full-length genome sequences and subgenomic regions retrieved from the LANL HIV Database. Sliding window analysis was based on 99 windows of 1,000 bp and 45 windows of 2,000 bp. Potential associations between the extent of HIV clustering and sequence length and the number of variable and informative sites were evaluated. The near full-length genome HIV sequences showed the highest extent of HIV clustering and the highest tree certainty. At the bootstrap threshold of 0.80 in maximum likelihood (ML) analysis, 58.9% of near full-length HIV-1C sequences but only 15.5% of partial pol sequences (ViroSeq) were found in clusters. Among HIV-1 structural genes, pol showed the highest extent of clustering (38.9% at a bootstrap threshold of 0.80), although it was significantly lower than in the near full-length genome sequences. The extent of HIV clustering was significantly higher for sliding windows of 2,000 bp than 1,000 bp. We found a strong association between the sequence length and proportion of HIV sequences in clusters, and a moderate association between the number of variable and informative sites and the proportion of HIV sequences in clusters. In HIV cluster analysis, the extent of detectable HIV clustering is directly associated with the length of viral sequences used, as well as the number of variable and informative sites. Near full-length genome sequences could provide the most informative HIV cluster analysis. Selected subgenomic regions with a high extent of HIV clustering and high tree certainty could also be considered as a second choice. PMID:25560745
Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis.

PubMed

Jakupciak, John P; Wells, Jeffrey M; Karalus, Richard J; Pawlowski, David R; Lin, Jeffrey S; Feldman, Andrew B

2013-01-01

Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations.
Population-Sequencing as a Biomarker of Burkholderia mallei and Burkholderia pseudomallei Evolution through Microbial Forensic Analysis

PubMed Central

Jakupciak, John P.; Wells, Jeffrey M.; Karalus, Richard J.; Pawlowski, David R.; Lin, Jeffrey S.; Feldman, Andrew B.

2013-01-01

Large-scale genomics projects are identifying biomarkers to detect human disease. B. pseudomallei and B. mallei are two closely related select agents that cause melioidosis and glanders. Accurate characterization of metagenomic samples is dependent on accurate measurements of genetic variation between isolates with resolution down to strain level. Often single biomarker sensitivity is augmented by use of multiple or panels of biomarkers. In parallel with single biomarker validation, advances in DNA sequencing enable analysis of entire genomes in a single run: population-sequencing. Potentially, direct sequencing could be used to analyze an entire genome to serve as the biomarker for genome identification. However, genome variation and population diversity complicate use of direct sequencing, as well as differences caused by sample preparation protocols including sequencing artifacts and mistakes. As part of a Department of Homeland Security program in bacterial forensics, we examined how to implement whole genome sequencing (WGS) analysis as a judicially defensible forensic method for attributing microbial sample relatedness; and also to determine the strengths and limitations of whole genome sequence analysis in a forensics context. Herein, we demonstrate use of sequencing to provide genetic characterization of populations: direct sequencing of populations. PMID:24455204
A Rapid Whole Genome Sequencing and Analysis System Supporting Genomic Epidemiology (7th Annual SFAF Meeting, 2012)

DOE Office of Scientific and Technical Information (OSTI.GOV)

FitzGerald, Michael

2012-06-01

Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
[Development of laboratory sequence analysis software based on WWW and UNIX].

PubMed

Huang, Y; Gu, J R

2001-01-01

Sequence analysis tools based on WWW and UNIX were developed in our laboratory to meet the needs of molecular genetics research in our laboratory. General principles of computer analysis of DNA and protein sequences were also briefly discussed in this paper.
A Rapid Whole Genome Sequencing and Analysis System Supporting Genomic Epidemiology (7th Annual SFAF Meeting, 2012)

ScienceCinema

FitzGerald, Michael

2018-01-11

Michael FitzGerald on "A rapid whole genome sequencing and analysis system supporting genomic epidemiology" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
RSAT 2015: Regulatory Sequence Analysis Tools

PubMed Central

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A.; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M.; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-01-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. PMID:25904632
Genomic DNA sequence and cytosine methylation changes of adult rice leaves after seeds space flight

NASA Astrophysics Data System (ADS)

Shi, Jinming

In this study, cytosine methylation at CCGG sites and genomic DNA sequence changes in adult leaves of rice after seed space flight were detected by the methylation-sensitive amplification polymorphism (MSAP) and amplified fragment length polymorphism (AFLP) techniques, respectively. Rice seeds were planted in the trial field after a 4-day space flight on the Shenzhou-6 spaceship of China. Adult leaves of space-treated rice, including 8 plants chosen randomly and 2 plants with phenotypic mutation, were used for AFLP and MSAP analysis. Polymorphisms of both DNA sequence and cytosine methylation were detected. For MSAP analysis, the average polymorphic frequencies of the on-ground controls, space-treated plants and mutants are 1.3%, 3.1% and 11%, respectively. For AFLP analysis, the average polymorphic frequencies are 1.4%, 2.9% and 8%, respectively. In total, 27 and 22 polymorphic fragments were cloned and sequenced from MSAP and AFLP analysis, respectively. Nine of the 27 fragments from MSAP analysis show homology to coding sequence. Of the 22 polymorphic fragments from AFLP analysis, none shows homology to mRNA sequence and eight show homology to repeat regions or retrotransposon sequences. These results suggest that although both genomic DNA sequence and cytosine methylation status can be affected by space flight, the genomic regions homologous to the fragments from the genomic DNA and cytosine methylation analyses were different.

Universal sequence map (USM) of arbitrary discrete sequences

PubMed Central

2002-01-01

Background For over a decade the idea of representing biological sequences in a continuous coordinate space has maintained its appeal but not been fully realized. The basic idea is that any sequence of symbols may define trajectories in the continuous space conserving all its statistical properties. Ideally, such a representation would allow scale independent sequence analysis – without the context of fixed memory length. A simple example would consist of being able to infer the homology between two sequences solely by comparing the coordinates of any two homologous units. Results We have successfully identified such an iterative function for bijective mapping ψ of discrete sequences into objects of continuous state space that enable scale-independent sequence analysis. The technique, named Universal Sequence Mapping (USM), is applicable to sequences with an arbitrary length and arbitrary number of unique units and generates a representation where map distance estimates sequence similarity. The novel USM procedure is based on earlier work by these and other authors on the properties of Chaos Game Representation (CGR). The latter enables the representation of 4 unit type sequences (like DNA) as an order free Markov Chain transition table. The properties of USM are illustrated with test data and can be verified for other data by using the accompanying web-based tool: http://bioinformatics.musc.edu/~jonas/usm/. Conclusions USM is shown to enable a statistical mechanics approach to sequence analysis. The scale independent representation frees sequence analysis from the need to assume a memory length in the investigation of syntactic rules. PMID:11895567
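The Chaos Game Representation that USM builds on can be sketched in a few lines of Python. This is a minimal illustration of classic CGR for a 4-letter DNA alphabet, not the authors' USM procedure; the corner assignment and the starting point at the centre of the unit square are conventions assumed here, and the function name is hypothetical.

```python
# Assumed corner convention for the unit square; other assignments are equally valid.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_coordinates(sequence, start=(0.5, 0.5)):
    """Return the CGR point reached after each symbol of the sequence."""
    x, y = start
    points = []
    for symbol in sequence:
        cx, cy = CORNERS[symbol]
        # Iterative step: move halfway from the current point to the symbol's corner.
        x = (x + cx) / 2.0
        y = (y + cy) / 2.0
        points.append((x, y))
    return points

# Two sequences sharing the 5-symbol suffix "TTGCA" end close together on the map.
p = cgr_coordinates("ACGTTGCA")[-1]
q = cgr_coordinates("TTTTTGCA")[-1]
print(p, q)
```

Because each step halves the distance to a corner, two sequences that share their last k symbols end within 2^-k of each other in each coordinate, which is the sense in which map distance reflects sequence similarity without fixing a memory length in advance.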
High Throughput Plasmid Sequencing with Illumina and CLC Bio (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

ScienceCinema

Athavale, Ajay

2018-01-04

Ajay Athavale (Monsanto) presents "High Throughput Plasmid Sequencing with Illumina and CLC Bio" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
Lactobacillus heilongjiangensis sp. nov., isolated from Chinese pickle.

PubMed

Gu, Chun Tao; Li, Chun Yan; Yang, Li Jie; Huo, Gui Cheng

2013-11-01

A Gram-stain-positive bacterial strain, S4-3(T), was isolated from traditional pickle in Heilongjiang Province, China. The bacterium was characterized by a polyphasic approach, including 16S rRNA gene sequence analysis, pheS gene sequence analysis, rpoA gene sequence analysis, dnaK gene sequence analysis, fatty acid methyl ester (FAME) analysis, determination of DNA G+C content, DNA-DNA hybridization and an analysis of phenotypic features. Strain S4-3(T) showed 97.9-98.7 % 16S rRNA gene sequence similarities, 84.4-94.1 % pheS gene sequence similarities and 94.4-96.9 % rpoA gene sequence similarities to the type strains of Lactobacillus nantensis, Lactobacillus mindensis, Lactobacillus crustorum, Lactobacillus futsaii, Lactobacillus farciminis and Lactobacillus kimchiensis. dnaK gene sequence similarities between S4-3(T) and Lactobacillus nantensis LMG 23510(T), Lactobacillus mindensis LMG 21932(T), Lactobacillus crustorum LMG 23699(T), Lactobacillus futsaii JCM 17355(T) and Lactobacillus farciminis LMG 9200(T) were 95.4, 91.5, 90.4, 91.7 and 93.1 %, respectively. Based upon the data obtained in the present study, a novel species, Lactobacillus heilongjiangensis sp. nov., is proposed and the type strain is S4-3(T) ( = LMG 26166(T) = NCIMB 14701(T)).
RIEMS: a software pipeline for sensitive and comprehensive taxonomic classification of reads from metagenomics datasets.

PubMed

Scheuch, Matthias; Höper, Dirk; Beer, Martin

2015-03-03

Fuelled by the advent and subsequent development of next generation sequencing technologies, metagenomics became a powerful tool for the analysis of microbial communities both scientifically and diagnostically. The biggest challenge is the extraction of relevant information from the huge sequence datasets generated for metagenomics studies. Although a plethora of tools are available, data analysis is still a bottleneck. To overcome the bottleneck of data analysis, we developed an automated computational workflow called RIEMS - Reliable Information Extraction from Metagenomic Sequence datasets. RIEMS assigns every individual read sequence within a dataset taxonomically by cascading different sequence analyses with decreasing stringency of the assignments using various software applications. After completion of the analyses, the results are summarised in a clearly structured result protocol organised taxonomically. The high accuracy and performance of RIEMS analyses were proven in comparison with other tools for metagenomics data analysis using simulated sequencing read datasets. RIEMS has the potential to fill the gap that still exists with regard to data analysis for metagenomics studies. The usefulness and power of RIEMS for the analysis of genuine sequencing datasets was demonstrated with an early version of RIEMS in 2011 when it was used to detect the orthobunyavirus sequences leading to the discovery of Schmallenberg virus.
Quantitative phenotyping via deep barcode sequencing

PubMed Central

Smith, Andrew M.; Heisler, Lawrence E.; Mellor, Joseph; Kaper, Fiona; Thompson, Michael J.; Chee, Mark; Roth, Frederick P.; Giaever, Guri; Nislow, Corey

2009-01-01

Next-generation DNA sequencing technologies have revolutionized diverse genomics applications, including de novo genome sequencing, SNP detection, chromatin immunoprecipitation, and transcriptome analysis. Here we apply deep sequencing to genome-scale fitness profiling to evaluate yeast strain collections in parallel. This method, Barcode analysis by Sequencing, or “Bar-seq,” outperforms the current benchmark barcode microarray assay in terms of both dynamic range and throughput. When applied to a complex chemogenomic assay, Bar-seq quantitatively identifies drug targets, with performance superior to the benchmark microarray assay. We also show that Bar-seq is well-suited for a multiplex format. We completely re-sequenced and re-annotated the yeast deletion collection using deep sequencing, found that ∼20% of the barcodes and common priming sequences varied from expectation, and used this revised list of barcode sequences to improve data quality. Together, this new assay and analysis routine provide a deep-sequencing-based toolkit for identifying gene–environment interactions on a genome-wide scale. PMID:19622793
Asymmetry of perceived key movement in chorale sequences: converging evidence from a probe-tone analysis.

PubMed

Cuddy, L L; Thompson, W F

1992-01-01

In a probe-tone experiment, two groups of listeners--one trained, the other untrained, in traditional music theory--rated the goodness of fit of each of the 12 notes of the chromatic scale to four-voice harmonic sequences. Sequences were 12 simplified excerpts from Bach chorales, 4 nonmodulating, and 8 modulating. Modulations occurred either one or two steps in either the clockwise or the counterclockwise direction on the cycle of fifths. A consistent pattern of probe-tone ratings was obtained for each sequence, with no significant differences between listener groups. Two methods of analysis (Fourier analysis and regression analysis) revealed a directional asymmetry in the perceived key movement conveyed by modulating sequences. For a given modulation distance, modulations in the counterclockwise direction effected a clearer shift in tonal organization toward the final key than did clockwise modulations. The nature of the directional asymmetry was consistent with results reported for identification and rating of key change in the sequences (Thompson & Cuddy, 1989a). Further, according to the multiple-regression analysis, probe-tone ratings did not merely reflect the distribution of tones in the sequence. Rather, ratings were sensitive to the temporal structure of the tonal organization in the sequence.
String Mining in Bioinformatics

NASA Astrophysics Data System (ADS)

Abouelhoda, Mohamed; Ghanem, Moustafa

Sequence analysis is a major area in bioinformatics encompassing the methods and techniques for studying the biological sequences, DNA, RNA, and proteins, on the linear structure level. The focus of this area is generally on the identification of intra- and inter-molecular similarities. Identifying intra-molecular similarities boils down to detecting repeated segments within a given sequence, while identifying inter-molecular similarities amounts to spotting common segments among two or multiple sequences. From a data mining point of view, sequence analysis is nothing but string or pattern mining specific to biological strings. For a long time, however, this point of view has been explicitly embraced neither in the data mining nor in the sequence analysis textbooks, which may be attributed to the co-evolution of the two apparently independent fields. In other words, although the word "data-mining" is almost missing in the sequence analysis literature, its basic concepts have been implicitly applied. Interestingly, recent research in biological sequence analysis introduced efficient solutions to many problems in data mining, such as querying and analyzing time series [49,53], extracting information from web pages [20], fighting spam mails [50], detecting plagiarism [22], and spotting duplications in software systems [14].
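The two tasks named in this abstract, detecting repeated segments within one sequence and spotting common segments between sequences, can be illustrated with a minimal Python sketch. It assumes fixed-length segments (k-mers) and a simple hash-based index; this is a swapped-in simplification for illustration, not a method taken from the chapter, and the function names are hypothetical.

```python
from collections import defaultdict

def repeated_kmers(sequence, k):
    """Intra-molecular similarity: k-mers occurring more than once, with 0-based positions."""
    positions = defaultdict(list)
    for i in range(len(sequence) - k + 1):
        positions[sequence[i:i + k]].append(i)
    return {kmer: pos for kmer, pos in positions.items() if len(pos) > 1}

def shared_kmers(a, b, k):
    """Inter-molecular similarity: k-mers present in both sequences."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    return kmers_a & kmers_b

print(repeated_kmers("ACGTACGA", 3))
print(shared_kmers("ACGT", "CGTA", 3))
```

A fixed k keeps the sketch short; finding repeats of arbitrary length efficiently generally calls for index structures such as suffix trees or suffix arrays.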
Bacterial diversity in permanently cold and alkaline ikaite columns from Greenland.

PubMed

Schmidt, Mariane; Priemé, Anders; Stougaard, Peter

2006-12-01

Bacterial diversity in alkaline (pH 10.4) and permanently cold (4 degrees C) ikaite tufa columns from the Ikka Fjord, SW Greenland, was investigated using growth characterization of cultured bacterial isolates with Terminal-restriction fragment length polymorphism (T-RFLP) and sequence analysis of bacterial 16S rRNA gene fragments. More than 200 bacterial isolates were characterized with respect to pH and temperature tolerance, and it was shown that the majority were cold-active alkaliphiles. T-RFLP analysis revealed distinct bacterial communities in different fractions of three ikaite columns, and, along with sequence analysis, it showed the presence of rich and diverse bacterial communities. Rarefaction analysis showed that the 109 sequenced clones in the 16S rRNA gene library represented between 25 and 65% of the predicted species richness in the three ikaite columns investigated. Phylogenetic analysis of the 16S rRNA gene sequences revealed many sequences with similarity to alkaliphilic or psychrophilic bacteria, and showed that 33% of the cloned sequences and 33% of the cultured bacteria showed less than 97% sequence identity to known sequences in databases, and may therefore represent yet unknown species.
Joint Sequence Analysis: Association and Clustering

ERIC Educational Resources Information Center

Piccarreta, Raffaella

2017-01-01

In its standard formulation, sequence analysis aims at finding typical patterns in a set of life courses represented as sequences. Recently, some proposals have been introduced to jointly analyze sequences defined on different domains (e.g., work career, partnership, and parental histories). We introduce measures to evaluate whether a set of…
An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets.

PubMed

Hosseini, Parsa; Tremblay, Arianne; Matthews, Benjamin F; Alkharouf, Nadim W

2010-07-02

An Illumina flow cell with all eight lanes occupied produces well over a terabyte of images, with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. One can very easily be flooded with such a great volume of textual, unannotated data, irrespective of read quality or size. CASAVA, an optional analysis tool for Illumina sequencing experiments, makes it possible to understand INDEL detection, SNP information, and allele calling. Extracting from such analysis a measure of gene expression in the form of tag-counts, and furthermore annotating such reads, is therefore of significant value. We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using the jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag-counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA-build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag-counting and annotation. The end result is output containing the homology-based functional annotation and the respective gene expression measure, signifying how many times sequenced reads were found within the genomic ranges of functional annotations. TASE is a powerful tool to facilitate the process of annotating a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved in very efficient times, allowing researchers to delve deep into a given CASAVA-build and maximize information extraction from a sequencing dataset.
TASE is specially designed to translate sequence data in a CASAVA-build into functional annotations while producing corresponding gene expression measurements. Such analysis is executed in an ultrafast and highly efficient manner, whether for a single-read or a paired-end sequencing experiment. TASE is a user-friendly and freely available application, allowing rapid analysis and annotation of any given Illumina Solexa sequencing dataset with ease.
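The tag-counting idea described above, counting how many reads fall within the genomic range of each functional annotation, can be sketched as follows. This is a minimal Python illustration and not TASE itself, which the abstract describes as a Java application with a SQL Server backend; the data layout, the inclusive coordinates, the non-overlapping annotations, and all names are assumptions made for the example.

```python
from bisect import bisect_right

def count_tags(read_positions, annotations):
    """Count reads per annotation.

    annotations: list of (start, end, name) tuples with inclusive coordinates,
    assumed not to overlap one another.
    """
    ordered = sorted(annotations)
    starts = [start for start, _, _ in ordered]
    counts = {name: 0 for _, _, name in ordered}
    for pos in read_positions:
        # Rightmost annotation whose start is not beyond this read position.
        i = bisect_right(starts, pos) - 1
        if i >= 0 and pos <= ordered[i][1]:
            counts[ordered[i][2]] += 1
    return counts

annotations = [(100, 199, "geneA"), (300, 399, "geneB")]  # hypothetical ranges
reads = [50, 100, 150, 199, 200, 350, 400]                # hypothetical read starts
print(count_tags(reads, annotations))
```

Sorting the annotations once and using binary search keeps each lookup logarithmic in the number of annotations, which matters when the read count runs into the millions.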
'DNA Strider': a 'C' program for the fast analysis of DNA and protein sequences on the Apple Macintosh family of computers.

PubMed Central

Marck, C

1988-01-01

DNA Strider is a new integrated DNA and Protein sequence analysis program written in the C language for the Macintosh Plus, SE and II computers. It has been designed as an easy to learn and use program as well as a fast and efficient tool for the day-to-day sequence analysis work. The program consists of a multi-window sequence editor and of various DNA and Protein analysis functions. The editor may use 4 different types of sequences (DNA, degenerate DNA, RNA and one-letter coded protein) and can handle simultaneously 6 sequences of any type up to 32.5 kB each. Negative numbering of the bases is allowed for DNA sequences. All classical restriction and translation analysis functions are present and can be performed in any order on any open sequence or part of a sequence. The main feature of the program is that the same analysis function can be repeated several times on different sequences, thus generating multiple windows on the screen. Many graphic capabilities have been incorporated such as graphic restriction map, hydrophobicity profile and the CAI (codon adaptation index) plot according to Sharp and Li. The restriction sites search uses a newly designed fast hexamer look-ahead algorithm. Typical runtime for the search of all sites with a library of 130 restriction endonucleases is 1 second per 10,000 bases. The circular graphic restriction map of the pBR322 plasmid can therefore be computed from its sequence and displayed on the Macintosh Plus screen within 2 seconds and its multiline restriction map obtained in a scrolling window within 5 seconds. PMID:2832831
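A restriction site search of the kind described here can be illustrated with a naive Python sketch. It does not reproduce the program's hexamer look-ahead algorithm, whose details are not given in the abstract; it simply scans for each recognition sequence in turn. The three-enzyme library and the function name are assumptions for the example, and reported positions mark the first base of the recognition site rather than the cut position.

```python
# Small illustrative library of palindromic hexamer recognition sequences.
ENZYMES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

def find_sites(sequence, enzymes=ENZYMES, circular=False):
    """Return {enzyme: [1-based positions of each recognition site]}."""
    seq = sequence.upper()
    results = {}
    for name, site in enzymes.items():
        # For a circular molecule, append the first bases so sites spanning the origin are found.
        text = seq + seq[:len(site) - 1] if circular else seq
        hits = []
        start = text.find(site)
        while start != -1:
            hits.append(start + 1)  # convert 0-based index to 1-based position
            start = text.find(site, start + 1)
        results[name] = hits
    return results

print(find_sites("TTGAATTCAAGGATCCGAATTC"))
print(find_sites("ATTCGGGA", circular=True))
```

The `circular` flag matters for plasmids such as pBR322, where a site can straddle the arbitrary start of the stored sequence.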
Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

ScienceCinema

Campbell, Catherine

2018-01-22

Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Scalable Kernel Methods and Algorithms for General Sequence Analysis

ERIC Educational Resources Information Center

Kuksa, Pavel

2011-01-01

Analysis of large-scale sequential data has become an important task in machine learning and pattern recognition, inspired in part by numerous scientific and technological applications such as the document and text classification or the analysis of biological sequences. However, current computational methods for sequence comparison still lack…
Finishing and Special Motifs: Lessons Learned from CRISPR Analysis Using Next-Generation Draft Sequences (7th Annual SFAF Meeting, 2012)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Campbell, Catherine

Catherine Campbell on "Finishing and Special Motifs: Lessons learned from CRISPR analysis using next-generation draft sequences" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Analysis of DNA Sequences by an Optical Time-Integrating Correlator: Proof-of-Concept Experiments.

DTIC Science & Technology

1992-05-01

DNA ANALYSIS STRATEGY 4 2.1 Representation of DNA Bases 4 2.2 DNA Analysis Strategy 6 3.0 CUSTOM GENERATORS FOR DNA SEQUENCES 10 3.1 Hardware Design 10...of the DNA bases where each base is represented by a 7-bits long pseudorandom sequence. 5 Figure 4: Coarse analysis of a DNA sequence. 7 Figure 5: Fine...a 20-bases long database. 32 xiii LIST OF TABLES PAGE Table 1: Short representations of the DNA bases where each base is represented by 7-bits long
PFAAT version 2.0: a tool for editing, annotating, and analyzing multiple sequence alignments.

PubMed

Caffrey, Daniel R; Dana, Paul H; Mathur, Vidhya; Ocano, Marco; Hong, Eun-Jong; Wang, Yaoyu E; Somaroo, Shyamal; Caffrey, Brian E; Potluri, Shobha; Huang, Enoch S

2007-10-11

By virtue of their shared ancestry, homologous sequences are similar in their structure and function. Consequently, multiple sequence alignments are routinely used to identify trends that relate to function. This type of analysis is particularly productive when it is combined with structural and phylogenetic analysis. Here we describe the release of PFAAT version 2.0, a tool for editing, analyzing, and annotating multiple sequence alignments. Support for multiple annotations is a key component of this release as it provides a framework for most of the new functionalities. The sequence annotations are accessible from the alignment and tree, where they are typically used to label sequences or hyperlink them to related databases. Sequence annotations can be created manually or extracted automatically from UniProt entries. Once a multiple sequence alignment is populated with sequence annotations, sequences can be easily selected and sorted through a sophisticated search dialog. The selected sequences can be further analyzed using statistical methods that explicitly model relationships between the sequence annotations and residue properties. Residue annotations are accessible from the alignment viewer and are typically used to designate binding sites or properties for a particular residue. Residue annotations are also searchable, and allow one to quickly select alignment columns for further sequence analysis, e.g. computing percent identities. Other features include: novel algorithms to compute sequence conservation, mapping conservation scores to a 3D structure in Jmol, displaying secondary structure elements, and sorting sequences by residue composition. PFAAT provides a framework whereby end-users can specify knowledge for a protein family in the form of annotation. The annotations can be combined with sophisticated analysis to test hypotheses that relate to sequence, structure and function.
CRITICA: coding region identification tool invoking comparative analysis

NASA Technical Reports Server (NTRS)

Badger, J. H.; Olsen, G. J.; Woese, C. R. (Principal Investigator)

1999-01-01

Gene recognition is essential to understanding existing and future DNA sequence data. CRITICA (Coding Region Identification Tool Invoking Comparative Analysis) is a suite of programs for identifying likely protein-coding sequences in DNA by combining comparative analysis of DNA sequences with more common noncomparative methods. In the comparative component of the analysis, regions of DNA are aligned with related sequences from the DNA databases; if the translation of the aligned sequences has greater amino acid identity than expected for the observed percentage nucleotide identity, this is interpreted as evidence for coding. CRITICA also incorporates noncomparative information derived from the relative frequencies of hexanucleotides in coding frames versus other contexts (i.e., dicodon bias). The dicodon usage information is derived by iterative analysis of the data, such that CRITICA is not dependent on the existence or accuracy of coding sequence annotations in the databases. This independence makes the method particularly well suited for the analysis of novel genomes. CRITICA was tested by analyzing the available Salmonella typhimurium DNA sequences. Its predictions were compared with the DNA sequence annotations and with the predictions of GenMark. CRITICA proved to be more accurate than GenMark, and moreover, many of its predictions that would seem to be errors instead reflect problems in the sequence databases. The source code of CRITICA is freely available by anonymous FTP (rdp.life.uiuc.edu in /pub/critica) and on the World Wide Web (http://rdpwww.life.uiuc.edu).
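The comparative signal described above rests on two numbers for an aligned pair of DNA regions: the nucleotide identity and the identity of their translations. The following Python sketch computes both for a gap-free, in-frame pair under the standard genetic code. It is a simplified illustration, not CRITICA's scoring, which compares the amino acid identity against a statistical expectation; the function names and example sequences are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered with T, C, A, G at each position.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def translate(dna):
    """Translate an in-frame DNA string; unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))

def identity(a, b):
    """Fraction of matching positions between two equal-length strings."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def identities(seq_a, seq_b):
    """Return (nucleotide identity, amino acid identity) for an aligned pair."""
    return identity(seq_a, seq_b), identity(translate(seq_a), translate(seq_b))

# All three differences fall on synonymous third codon positions.
print(identities("ATGGCTAAAGGT", "ATGGCCAAGGGC"))
```

In the example the nucleotide identity is 0.75 while the translations are identical, the pattern of synonymous substitution that the abstract interprets as evidence for coding.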
The influence of changes in soil moisture in association with geomorphic change on the formation of a subalpine coniferous forest on Mt. Akita-Komagatake, northern Japan

NASA Astrophysics Data System (ADS)

Konno, A.

2015-12-01

The coniferous forest (largely composed of Abies mariesii) is presently the typical vegetation of the subalpine zone in Japan. Pollen analysis revealed that few A. mariesii were present during the last glacial period, and the species began to expand to the subalpine zone during the Holocene (Morita, 1992). However, on Mt. Akita-Komagatake in northern Japan, the expected predominance of A. mariesii is not extensively observed, and the predominant vegetation is instead the dwarf bamboo (Sasa kurilensis). It is unknown why the area under coniferous forest is small in this region. Therefore, I examined this issue from the perspectives of (1) distribution of vegetation, (2) geomorphology, (3) soil moisture, and (4) vegetation history. (1) Precise digital elevation model data and photographic interpretation showed that this coniferous forest was densely distributed in a flat segment considered to be formed by a landslide; (2) this landslide is thought to have occurred up to 3,699 ± 26 yr BP because a boring-core specimen from the landslide included the AK-3 tephra layer (2,300-2,800 yr BP: Wachi et al, 1997) and the radiocarbon date of the lowermost humic soil layer was 3,699 ± 26 yr BP; (3) the soil in the forest area had higher volumetric water content than that in the non-forest area; and (4) phytolith analysis revealed that the main species in the study site was initially dwarf bamboo, but coniferous forest replaced it after the Towada-a tephra (1035 cal. BP, Machida and Arai, 1992) layer fell. These results suggest that soil water conditions changed because of the formation of the flat segment by the landslide, and the coniferous forest was consequently established. However, the landslide only indirectly affected the formation of the coniferous forest, because the forest developed over several thousand years after the landslide occurred. In other words, more direct reasons for the establishment of the coniferous forest may involve changes in soil moisture. 
This unresolved issue warrants further investigation of the vegetation on the subalpine zone.
Statistical analysis of life history calendar data.

PubMed

Eerola, Mervi; Helske, Satu

2016-04-01

The life history calendar is a data-collection tool for obtaining reliable retrospective data about life events. To illustrate the analysis of such data, we compare the model-based probabilistic event history analysis and the model-free data mining method, sequence analysis. In event history analysis, we estimate instead of transition hazards the cumulative prediction probabilities of life events in the entire trajectory. In sequence analysis, we compare several dissimilarity metrics and contrast data-driven and user-defined substitution costs. As an example, we study young adults' transition to adulthood as a sequence of events in three life domains. The events define the multistate event history model and the parallel life domains in multidimensional sequence analysis. The relationship between life trajectories and excess depressive symptoms in middle age is further studied by their joint prediction in the multistate model and by regressing the symptom scores on individual-specific cluster indices. The two approaches complement each other in life course analysis; sequence analysis can effectively find typical and atypical life patterns while event history analysis is needed for causal inquiries. © The Author(s) 2012.

A meta-analysis of bacterial diversity in the feces of cattle

USDA-ARS's Scientific Manuscript database

In this study, we conducted a meta-analysis on 16S rRNA gene sequences of bovine fecal origin that are publicly available in the RDP database. A total of 13663 sequences including 603 isolate sequences were identified in the RDP database (Release 11, Update 1), where 13447 sequences were assigned t...
Molecular identification and phylogenetic analysis of Wuchereria bancrofti from human blood samples in Egypt.

PubMed

Abdel-Shafi, Iman R; Shoieb, Eman Y; Attia, Samar S; Rubio, José M; Ta-Tang, Thuy-Huong; El-Badry, Ayman A

2017-03-01

Lymphatic filariasis (LF) is a serious vector-borne health problem, and Wuchereria bancrofti (W.b) is the major cause of LF worldwide and is focally endemic in Egypt. Identification of filarial infection using traditional morphologic and immunological criteria can be difficult and lead to misdiagnosis. The aim of the present study was molecular detection of W.b in residents in endemic areas in Egypt, sequence variance analysis, and phylogenetic analysis of W.b DNA. Collected blood samples from residents in filariasis endemic areas in five governorates were subjected to semi-nested PCR targeting a repeated DNA sequence, for detection of W.b DNA. PCR products were sequenced; subsequently, a phylogenetic analysis of the obtained sequences was performed. Out of 300 blood samples, W.b DNA was identified in 48 (16%). Sequencing analysis confirmed PCR results identifying only W.b species. Sequence alignment and phylogenetic analysis indicated genetically distinct clusters of W.b among the study population. Study results demonstrated that semi-nested PCR is an effective diagnostic tool for accurate and rapid detection of W.b infections in nano-epidemics and is applicable for samples collected in the daytime as well as the night time. Sequencing of PCR products and phylogenetic analysis revealed three different nucleotide sequence variants. Further genetic studies of W.b in Egypt and other endemic areas are needed to distinguish related strains and the various ecological as well as drug effects exerted on them to support W.b elimination.
Suitability of partial 16S ribosomal RNA gene sequence analysis for the identification of dangerous bacterial pathogens.

PubMed

Ruppitsch, W; Stöger, A; Indra, A; Grif, K; Schabereiter-Gurtner, C; Hirschl, A; Allerberger, F

2007-03-01

In a bioterrorism event a rapid tool is needed to identify relevant dangerous bacteria. The aim of the study was to assess the usefulness of partial 16S rRNA gene sequence analysis and the suitability of diverse databases for identifying dangerous bacterial pathogens. For rapid identification purposes a 500-bp fragment of the 16S rRNA gene of 28 isolates comprising Bacillus anthracis, Brucella melitensis, Burkholderia mallei, Burkholderia pseudomallei, Francisella tularensis, Yersinia pestis, and eight genus-related and unrelated control strains was amplified and sequenced. The obtained sequence data were submitted to three public and two commercial sequence databases for species identification. The most frequent reason for incorrect identification was the lack of the respective 16S rRNA gene sequences in the database. Sequence analysis of a 500-bp 16S rDNA fragment allows the rapid identification of dangerous bacterial species. However, for discrimination of closely related species sequencing of the entire 16S rRNA gene, additional sequencing of the 23S rRNA gene or sequencing of the 16S-23S rRNA intergenic spacer is essential. This work provides comprehensive information on the suitability of partial 16S rDNA analysis and diverse databases for rapid and accurate identification of dangerous bacterial pathogens.
Noncoding sequence classification based on wavelet transform analysis: part I

NASA Astrophysics Data System (ADS)

Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

2017-09-01

DNA sequences in the human genome can be divided into coding and noncoding ones. Coding sequences are those that are read during transcription. The identification of coding sequences has been widely reported in the literature due to its much-studied periodicity. Noncoding sequences represent the majority of the human genome. They play an important role in gene regulation and differentiation among the cells. However, noncoding sequences do not exhibit periodicities that correlate to their functions. The ENCODE (Encyclopedia of DNA elements) and Epigenomic Roadmap projects have cataloged the human noncoding sequences into specific functions. We study characteristics of noncoding sequences with wavelet analysis of genomic signals.
Analysis of noise-induced temporal correlations in neuronal spike sequences

NASA Astrophysics Data System (ADS)

Reinoso, José A.; Torrent, M. C.; Masoller, Cristina

2016-11-01

We investigate temporal correlations in sequences of noise-induced neuronal spikes, using a symbolic method of time-series analysis. We focus on the sequence of time-intervals between consecutive spikes (inter-spike-intervals, ISIs). The analysis method, known as ordinal analysis, transforms the ISI sequence into a sequence of ordinal patterns (OPs), which are defined in terms of the relative ordering of consecutive ISIs. The ISI sequences are obtained from extensive simulations of two neuron models (FitzHugh-Nagumo, FHN, and integrate-and-fire, IF), with correlated noise. We find that, as the noise strength increases, temporal order gradually emerges, revealed by the existence of more frequent ordinal patterns in the ISI sequence. While in the FHN model the most frequent OP depends on the noise strength, in the IF model it is independent of the noise strength. In both models, the correlation time of the noise affects the OP probabilities but does not modify the most probable pattern.
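The ordinal transformation described in the abstract above can be sketched in a few lines. This is an illustration, not the authors' code: the function names, the default pattern length of 3, and the tie-breaking rule (equal values keep their time order) are assumptions.

```python
from collections import Counter


def ordinal_patterns(values, length=3):
    """Turn a series (e.g. inter-spike intervals) into ordinal patterns.

    Each window of `length` consecutive values is replaced by the tuple
    of within-window indices sorted by value, so (0, 1, 2) means the
    window is increasing. Ties keep their time order (an assumption).
    """
    patterns = []
    for i in range(len(values) - length + 1):
        window = values[i:i + length]
        order = sorted(range(length), key=lambda k: window[k])
        patterns.append(tuple(order))
    return patterns


def pattern_probabilities(values, length=3):
    """Relative frequency of each ordinal pattern seen in the series."""
    patterns = ordinal_patterns(values, length)
    counts = Counter(patterns)
    return {p: c / len(patterns) for p, c in counts.items()}


isis = [1.0, 2.0, 3.0, 2.5]        # made-up intervals for illustration
pats = ordinal_patterns(isis)       # [(0, 1, 2), (0, 2, 1)]
probs = pattern_probabilities(isis)
```

With uncorrelated intervals all 3! = 6 patterns of length 3 would be expected to appear with roughly equal frequency, so deviations from 1/6 are what signal temporal order.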
BIM (BCL-2 interacting mediator of cell death) SAHB (stabilized α helix of BCL2) not always convinces BAX (BCL-2-associated X protein) for apoptosis.

PubMed

Verma, Sharad; Goyal, Sukriti; Tyagi, Chetna; Jamal, Salma; Singh, Aditi; Grover, Abhinav

2016-06-01

The interaction of BAX (BCL-2-associated X protein) with BIM (BCL-2 interacting mediator of cell death) SAHB (stabilized α helix of BCL2) directly initiates BAX-mediated mitochondrial apoptosis. This molecular dynamics study reveals that BIM SAHB forms a stable complex with BAX but it remains in a non-functional conformation. The N terminus of BAX, which has been reported to be exposed in the functional monomer, folds towards the core. The α1-α2 loop, which has been reported in an open conformation in functional BAX, acquires a closed conformation during the simulation. BH3/α2 remains less exposed as compared to the initial structure. The hydrophobic residues of BIM are accommodated in the rear pocket of BAX during the simulation. A steep decrease in radius of gyration and solvent accessible surface area (SASA) indicates that the complex folds to acquire a more stable but inactive conformation. Further, the covariance matrix reveals that the backbone atoms' motions favour the inactive conformation of the complex. To the best of our knowledge, this is the first report on the non-functional BAX-BIM SAHB complex by molecular dynamics simulation. Copyright © 2016 Elsevier Inc. All rights reserved.
cyclostratigraphy, sequence stratigraphy and organic matter accumulation mechanism

NASA Astrophysics Data System (ADS)

Cong, F.; Li, J.

2016-12-01

The first member of the Maokou Formation of the Sichuan basin is composed of well preserved carbonate ramp couplets of limestone and marlstone/shale. It acts as one of the potential shale gas source rocks, and is suitable for time-series analysis. We conducted time-series analysis to identify high-frequency sequences, reconstruct high-resolution sedimentation rate, estimate detailed primary productivity for the first time in the study intervals and discuss the organic matter accumulation mechanism of source rock under a sequence stratigraphic framework. Using the theory of cyclostratigraphy and sequence stratigraphy, the high-frequency sequences of one outcrop profile and one drilling well are identified. Two third-order sequences and eight fourth-order sequences are distinguished on the outcrop profile based on the cycle stacking patterns. For the drilling well, the sequence boundary and four system tracts are distinguished by "integrated prediction error filter analysis" (INPEFA) of Gamma-ray logging data, and eight fourth-order sequences are identified by the 405 ka long eccentricity curve in the depth domain, which is quantified and filtered by integrated analysis of MTM spectral analysis, evolutive harmonic analysis (EHA), evolutive average spectral misfit (eASM) and band-pass filtering. It suggests that high-frequency sequences correlate well with Milankovitch orbital signals recorded in sediments, and it is applicable to use cyclostratigraphy theory in dividing high-frequency (4-6 orders) sequence stratigraphy. High-resolution sedimentation rate is reconstructed through the study interval by tracking the highly statistically significant short eccentricity component (123 ka) revealed by EHA. Based on sedimentation rate, measured TOC and density data, the burial flux, delivery flux and primary productivity of organic carbon were estimated.
By integrating redox proxies, we can discuss the controls on organic matter accumulation by primary production and preservation under the high-resolution sequence stratigraphic framework. Results show that high average organic carbon contents in the study interval are mainly attributed to high primary production. The results also show a good correlation between high organic carbon accumulation and intervals of transgression.
RSAT 2015: Regulatory Sequence Analysis Tools.

PubMed

Medina-Rivera, Alejandra; Defrance, Matthieu; Sand, Olivier; Herrmann, Carl; Castro-Mondragon, Jaime A; Delerce, Jeremy; Jaeger, Sébastien; Blanchet, Christophe; Vincens, Pierre; Caron, Christophe; Staines, Daniel M; Contreras-Moreira, Bruno; Artufel, Marie; Charbonnier-Khamvongsa, Lucie; Hernandez, Céline; Thieffry, Denis; Thomas-Chollier, Morgane; van Helden, Jacques

2015-07-01

RSAT (Regulatory Sequence Analysis Tools) is a modular software suite for the analysis of cis-regulatory elements in genome sequences. Its main applications are (i) motif discovery, appropriate to genome-wide data sets like ChIP-seq, (ii) transcription factor binding motif analysis (quality assessment, comparisons and clustering), (iii) comparative genomics and (iv) analysis of regulatory variations. Nine new programs have been added to the 43 described in the 2011 NAR Web Software Issue, including a tool to extract sequences from a list of coordinates (fetch-sequences from UCSC), novel programs dedicated to the analysis of regulatory variants from GWAS or population genomics (retrieve-variation-seq and variation-scan), a program to cluster motifs and visualize the similarities as trees (matrix-clustering). To deal with the drastic increase of sequenced genomes, RSAT public sites have been reorganized into taxon-specific servers. The suite is well-documented with tutorials and published protocols. The software suite is available through Web sites, SOAP/WSDL Web services, virtual machines and stand-alone programs at http://www.rsat.eu/. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
Analyzing Immunoglobulin Repertoires

PubMed Central

Chaudhary, Neha; Wesemann, Duane R.

2018-01-01

Somatic assembly of T cell receptor and B cell receptor (BCR) genes produces a vast diversity of lymphocyte antigen recognition capacity. The advent of efficient high-throughput sequencing of lymphocyte antigen receptor genes has recently generated unprecedented opportunities for exploration of adaptive immune responses. With these opportunities have come significant challenges in understanding the analysis techniques that most accurately reflect underlying biological phenomena. In this regard, sample preparation and sequence analysis techniques, which have largely been borrowed and adapted from other fields, continue to evolve. Here, we review current methods and challenges of library preparation, sequencing and statistical analysis of lymphocyte receptor repertoire studies. We discuss the general steps in the process of immune repertoire generation including sample preparation, platforms available for sequencing, processing of sequencing data, measurable features of the immune repertoire, and the statistical tools that can be used for analysis and interpretation of the data. Because BCR analysis harbors additional complexities, such as immunoglobulin (Ig) (i.e., antibody) gene somatic hypermutation and class switch recombination, the emphasis of this review is on Ig/BCR sequence analysis. PMID:29593723
Sequence analysis of cultivated strawberry (Fragaria × ananassa Duch.) using microdissected single somatic chromosomes.

PubMed

Yanagi, Tomohiro; Shirasawa, Kenta; Terachi, Mayuko; Isobe, Sachiko

2017-01-01

Cultivated strawberry (Fragaria × ananassa Duch.) has homoeologous chromosomes because of allo-octoploidy. For example, two homoeologous chromosomes that belong to different sub-genomes of allopolyploids have similar base sequences. Thus, when conducting de novo assembly of DNA sequences, it is difficult to determine whether these sequences are derived from the same chromosome. To avoid the difficulties associated with homoeologous chromosomes and demonstrate the possibility of sequencing allopolyploids using single chromosomes, we conducted sequence analysis using microdissected single somatic chromosomes of cultivated strawberry. Three hundred and ten somatic chromosomes of the Japanese octoploid strawberry 'Reiko' were individually selected under a light microscope using a microdissection system. DNA from 288 of the dissected chromosomes was successfully amplified using a DNA amplification kit. Using next-generation sequencing, we decoded the base sequences of the amplified DNA segments, and on the basis of mapping, we identified DNA sequences from 144 samples that were best matched to the reference genomes of the octoploid strawberry, F. × ananassa, and the diploid strawberry, F. vesca. The 144 samples were classified into seven pseudo-molecules of F. vesca. The coverage rates of the DNA sequences from the single chromosome onto all pseudo-molecular sequences varied from 3 to 29.9%. We demonstrated an efficient method for sequence analysis of allopolyploid plants using microdissected single chromosomes. On the basis of our results, we believe that whole-genome analysis of allopolyploid plants can be enhanced using methodology that employs microdissected single chromosomes.
mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences.

PubMed

Links, Matthew G; Chaban, Bonnie; Hemmingsen, Sean M; Muirhead, Kevin; Hill, Janet E

2013-08-15

Formation of operational taxonomic units (OTU) is a common approach to data aggregation in microbial ecology studies based on amplification and sequencing of individual gene targets. The de novo assembly of OTU sequences has been recently demonstrated as an alternative to widely used clustering methods, providing robust information from experimental data alone, without any reliance on an external reference database. Here we introduce mPUMA (microbial Profiling Using Metagenomic Assembly, http://mpuma.sourceforge.net), a software package for identification and analysis of protein-coding barcode sequence data. It was developed originally for Cpn60 universal target sequences (also known as GroEL or Hsp60). Using an unattended process that is independent of external reference sequences, mPUMA forms OTUs by DNA sequence assembly and is capable of tracking OTU abundance. mPUMA processes microbial profiles both in terms of the direct DNA sequence as well as in the translated amino acid sequence for protein coding barcodes. By forming OTUs and calculating abundance through an assembly approach, mPUMA is capable of generating inputs for several popular microbiota analysis tools. Using SFF data from sequencing of a synthetic community of Cpn60 sequences derived from the human vaginal microbiome, we demonstrate that mPUMA can faithfully reconstruct all expected OTU sequences and produce compositional profiles consistent with actual community structure. mPUMA enables analysis of microbial communities while empowering the discovery of novel organisms through OTU assembly.
Analysis of sequence repeats of proteins in the PDB.

PubMed

Mary Rajathei, David; Selvaraj, Samuel

2013-12-01

Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20-40% and that the superimposed structures of most of the sequence repeats maintain similar overall folding. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.
MIPS: a database for genomes and protein sequences.

PubMed Central

Mewes, H W; Heumann, K; Kaps, A; Mayer, K; Pfeiffer, F; Stocker, S; Frishman, D

1999-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF), Martinsried near Munich, Germany, develops and maintains genome oriented databases. It is commonplace that the amount of sequence data available increases rapidly, but not the capacity of qualified manual annotation at the sequence databases. Therefore, our strategy aims to cope with the data stream by the comprehensive application of analysis tools to sequences of complete genomes, the systematic classification of protein sequences and the active support of sequence analysis and functional genomics projects. This report describes the systematic and up-to-date analysis of genomes (PEDANT), a comprehensive database of the yeast genome (MYGD), a database reflecting the progress in sequencing the Arabidopsis thaliana genome (MATD), the database of assembled, annotated human EST clusters (MEST), and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). MIPS provides access through its WWW server (http://www.mips.biochem.mpg.de) to a spectrum of generic databases, including the above mentioned as well as a database of protein families (PROTFAM), the MITOP database, and the all-against-all FASTA database. PMID:9847138
Genetic Code Analysis Toolkit: A novel tool to explore the coding properties of the genetic code and DNA sequences

NASA Astrophysics Data System (ADS)

Kraljić, K.; Strüngmann, L.; Fimmel, E.; Gumbel, M.

2018-01-01

The genetic code is degenerate and it is assumed that redundancy provides error detection and correction mechanisms in the translation process. However, the biological meaning of the code's structure is still under investigation. This paper presents a Genetic Code Analysis Toolkit (GCAT) which provides workflows and algorithms for the analysis of the structure of nucleotide sequences. In particular, sets or sequences of codons can be transformed and tested for circularity, comma-freeness, dichotomic partitions and others. GCAT comes with a fertile editor custom-built to work with the genetic code and a batch mode for multi-sequence processing. With the ability to read FASTA files or load sequences from GenBank, the tool can be used for the mathematical and statistical analysis of existing sequence data. GCAT is Java-based and provides a plug-in concept for extensibility. Availability: open source. Homepage: http://www.gcat.bio/
Statistical properties of filtered pseudorandom digital sequences formed from the sum of maximum-length sequences

NASA Technical Reports Server (NTRS)

Wallace, G. R.; Weathers, G. D.; Graf, E. R.

1973-01-01

The statistics of filtered pseudorandom digital sequences called hybrid-sum sequences, formed from the modulo-two sum of several maximum-length sequences, are analyzed. The results indicate that a relation exists between the statistics of the filtered sequence and the characteristic polynomials of the component maximum length sequences. An analysis procedure is developed for identifying a large group of sequences with good statistical properties for applications requiring the generation of analog pseudorandom noise. By use of the analysis approach, the filtering process is approximated by the convolution of the sequence with a sum of unit step functions. A parameter reflecting the overall statistical properties of filtered pseudorandom sequences is derived. This parameter is called the statistical quality factor. A computer algorithm to calculate the statistical quality factor for the filtered sequences is presented, and the results for two examples of sequence combinations are included. The analysis reveals that the statistics of the signals generated with the hybrid-sum generator are potentially superior to the statistics of signals generated with maximum-length generators. Furthermore, fewer calculations are required to evaluate the statistics of a large group of hybrid-sum generators than are required to evaluate the statistics of the same size group of approximately equivalent maximum-length sequences.
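To make the construction in the abstract above concrete, the sketch below generates two short maximum-length sequences from linear recurrences and forms their modulo-two sum. It is an illustration only: the polynomials x^3 + x + 1 and x^4 + x + 1 are small primitive polynomials picked for the example, not ones taken from the report, the function names are hypothetical, and the filtering step and the statistical quality factor are not reproduced.

```python
def m_sequence(taps, degree, length, seed=None):
    """Bits from the recurrence s[n + degree] = XOR of s[n + t], t in taps.

    With taps (0, 1) this corresponds to x^degree + x + 1. The default
    seed is 1 followed by zeros; any non-zero seed works.
    """
    bits = list(seed) if seed is not None else [1] + [0] * (degree - 1)
    while len(bits) < length:
        n = len(bits) - degree
        bits.append(sum(bits[n + t] for t in taps) % 2)
    return bits[:length]


def hybrid_sum(*sequences):
    """Modulo-two (XOR) sum of several equal-length bit sequences."""
    return [sum(column) % 2 for column in zip(*sequences)]


a = m_sequence((0, 1), degree=3, length=210)  # repeats every 7 bits
b = m_sequence((0, 1), degree=4, length=210)  # repeats every 15 bits
h = hybrid_sum(a, b)                          # repeats every 105 bits
```

Because 7 and 15 share no common factor, the summed sequence only repeats after 105 bits, which is longer than either component alone.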
Computer-aided visualization and analysis system for sequence evaluation

DOEpatents

Chee, M.S.

1998-08-18

A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device. 27 figs.
Computer-aided visualization and analysis system for sequence evaluation

DOEpatents

Chee, Mark S.; Wang, Chunwei; Jevons, Luis C.; Bernhart, Derek H.; Lipshutz, Robert J.

2004-05-11

A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.
Computer-aided visualization and analysis system for sequence evaluation

DOEpatents

Chee, Mark S.

1998-08-18

A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments are improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.
Computer-aided visualization and analysis system for sequence evaluation

DOEpatents

Chee, Mark S.

2003-08-19

A computer system for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area and sample sequences in another area on a display device.
Method and apparatus for enhanced sequencing of complex molecules using surface-induced dissociation in conjunction with mass spectrometric analysis

DOEpatents

Laskin, Julia [Richland, WA]; Futrell, Jean H [Richland, WA]

2008-04-29

The invention relates to a method and apparatus for enhanced sequencing of complex molecules using surface-induced dissociation (SID) in conjunction with mass spectrometric analysis. Results demonstrate formation of a wide distribution of structure-specific fragments having wide sequence coverage useful for sequencing and identifying the complex molecules.

A DNA sequence analysis package for the IBM personal computer.

PubMed Central

Lagrimini, L M; Brentano, S T; Donelson, J E

1984-01-01

We present here a collection of DNA sequence analysis programs, called "PC Sequence" (PCS), which are designed to run on the IBM Personal Computer (PC). These programs are written in IBM PC compiled BASIC and take full advantage of the IBM PC's speed, error handling, and graphics capabilities. For a modest initial expense in hardware any laboratory can use these programs to quickly perform computer analysis on DNA sequences. They are written with the novice user in mind and require very little training or previous experience with computers. Also provided are a text editing program for creating and modifying DNA sequence files and a communications program which enables the PC to communicate with and collect information from mainframe computers and DNA sequence databases. PMID:6546433
Regulatory sequence analysis tools.

PubMed

van Helden, Jacques

2003-07-01

The web resource Regulatory Sequence Analysis Tools (RSAT) (http://rsat.ulb.ac.be/rsat) offers a collection of software tools dedicated to the prediction of regulatory sites in non-coding DNA sequences. These tools include sequence retrieval, pattern discovery, pattern matching, genome-scale pattern matching, feature-map drawing, random sequence generation and other utilities. Alternative formats are supported for the representation of regulatory motifs (strings or position-specific scoring matrices) and several algorithms are proposed for pattern discovery. RSAT currently holds >100 fully sequenced genomes and these data are regularly updated from GenBank.
An efficient annotation and gene-expression derivation tool for Illumina Solexa datasets

PubMed Central

2010-01-01

Background: An Illumina flow cell with all eight lanes occupied produces well over a terabyte of images, with gigabytes of reads following sequence alignment. The ability to translate such reads into meaningful annotation is therefore of great concern and importance. One can very easily be flooded with such a great volume of textual, unannotated data irrespective of read quality or size. CASAVA, an optional analysis tool for Illumina sequencing experiments, provides INDEL detection, SNP information, and allele calling. Not only extracting from such analysis a measure of gene expression in the form of tag-counts, but also annotating such reads, is therefore of significant value. Findings: We developed TASE (Tag counting and Analysis of Solexa Experiments), a rapid tag-counting and annotation software tool specifically designed for Illumina CASAVA sequencing datasets. Developed in Java and deployed using the jTDS JDBC driver and a SQL Server backend, TASE provides an extremely fast means of calculating gene expression through tag-counts while annotating sequenced reads with the gene's presumed function, from any given CASAVA-build. Such a build is generated for both DNA and RNA sequencing. Analysis is broken into two distinct components: DNA sequence or read concatenation, followed by tag-counting and annotation. The end result is output containing the homology-based functional annotation and the respective gene expression measure, signifying how many times sequenced reads were found within the genomic ranges of functional annotations. Conclusions: TASE is a powerful tool to facilitate the process of annotating a given Illumina Solexa sequencing dataset. Our results indicate that both homology-based annotation and tag-count analysis are achieved in very efficient times, allowing researchers to delve deep into a given CASAVA-build and maximize information extraction from a sequencing dataset.
TASE is specially designed to translate sequence data in a CASAVA-build into functional annotations while producing corresponding gene expression measurements. Achieving such analysis is executed in an ultrafast and highly efficient manner, whether the analysis be a single-read or paired-end sequencing experiment. TASE is a user-friendly and freely available application, allowing rapid analysis and annotation of any given Illumina Solexa sequencing dataset with ease. PMID:20598141
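The tag-counting idea in this abstract, counting how many sequenced reads fall within the genomic range of each functional annotation, can be illustrated with a minimal Python sketch. This is not TASE's code (the abstract describes a Java application with a SQL Server backend). The record layout, the gene names, and the function `count_tags` are hypothetical, and the sketch assumes that a read counts only when it lies entirely inside an annotation's range.

```python
def count_tags(reads, annotations):
    """Count reads lying fully inside each annotated range.

    reads: list of (chrom, start, end) tuples, 1-based inclusive.
    annotations: list of (name, chrom, start, end) tuples, same convention.
    Both layouts are assumptions made for this illustration.
    """
    counts = {name: 0 for name, *_ in annotations}
    for chrom, r_start, r_end in reads:
        for name, a_chrom, a_start, a_end in annotations:
            # A read is a "tag" for an annotation only if it is contained in it.
            if chrom == a_chrom and a_start <= r_start and r_end <= a_end:
                counts[name] += 1
    return counts


# Hypothetical toy data, not taken from any CASAVA build.
annotations = [("geneA", "chr1", 100, 500), ("geneB", "chr1", 800, 1200)]
reads = [
    ("chr1", 120, 155),   # inside geneA
    ("chr1", 480, 515),   # overhangs the end of geneA, so not counted
    ("chr1", 900, 935),   # inside geneB
    ("chr2", 120, 155),   # different chromosome
]
tag_counts = count_tags(reads, annotations)
```

The nested loop is quadratic and only suitable for toy inputs; a database index or an interval tree would be the usual choice at sequencing scale.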
DNA sequence analysis of ARS elements from chromosome III of Saccharomyces cerevisiae: identification of a new conserved sequence.

PubMed Central

Palzkill, T G; Oliver, S G; Newlon, C S

1986-01-01

Four fragments of Saccharomyces cerevisiae chromosome III DNA which carry ARS elements have been sequenced. Each fragment contains multiple copies of sequences that have at least 10 out of 11 bases of homology to a previously reported 11 bp core consensus sequence. A survey of these new ARS sequences and previously reported sequences revealed the presence of an additional 11 bp conserved element located on the 3' side of the T-rich strand of the core consensus. Subcloning analysis as well as deletion and transposon insertion mutagenesis of ARS fragments support a role for 3' conserved sequence in promoting ARS activity. PMID:3529036
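The "at least 10 out of 11 bases" criterion above amounts to a sliding-window scan that tolerates one mismatch. The Python sketch below shows one way to express it. The abstract does not print the consensus, so the string `WTTTAYRTTTW` (the commonly cited ARS core consensus in IUPAC notation) is an assumption here, as are the function name `near_matches` and the example sequence. Only the given strand is scanned.

```python
# IUPAC codes needed for the assumed consensus (W = A/T, Y = C/T, R = A/G).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "W": "AT", "Y": "CT", "R": "AG", "N": "ACGT"}


def near_matches(seq, consensus, min_matches):
    """Return (position, window, score) for every window of len(consensus)
    that agrees with the consensus at min_matches or more positions."""
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        score = sum(base in IUPAC[code] for base, code in zip(window, consensus))
        if score >= min_matches:
            hits.append((i, window, score))
    return hits


CORE = "WTTTAYRTTTW"                           # assumed, not given in the abstract
example = "GGCATTTATGTTTAGGCTTTTACATTGACC"     # hypothetical sequence
hits = near_matches(example, CORE, min_matches=10)
```

Scanning the reverse complement as well would be needed to find matches on the other strand.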
mESAdb: microRNA Expression and Sequence Analysis Database

PubMed Central

Kaya, Koray D.; Karakülah, Gökhan; Yakıcıer, Cengiz M.; Acar, Aybar C.; Konu, Özlen

2011-01-01

microRNA expression and sequence analysis database (http://konulab.fen.bilkent.edu.tr/mirna/) (mESAdb) is a regularly updated database for the multivariate analysis of sequences and expression of microRNAs from multiple taxa. mESAdb is modular and has a user interface implemented in PHP and JavaScript and coupled with statistical analysis and visualization packages written for the R language. The database primarily comprises mature microRNA sequences and their target data, along with selected human, mouse and zebrafish expression data sets. mESAdb analysis modules allow (i) mining of microRNA expression data sets for subsets of microRNAs selected manually or by motif; (ii) pair-wise multivariate analysis of expression data sets within and between taxa; and (iii) association of microRNA subsets with annotation databases, HUGE Navigator, KEGG and GO. The use of existing and customized R packages facilitates future addition of data sets and analysis tools. Furthermore, the ability to upload and analyze user-specified data sets makes mESAdb an interactive and expandable analysis tool for microRNA sequence and expression data. PMID:21177657
VDJServer: A Cloud-Based Analysis Portal and Data Commons for Immune Repertoire Sequences and Rearrangements.

PubMed

Christley, Scott; Scarborough, Walter; Salinas, Eddie; Rounds, William H; Toby, Inimary T; Fonner, John M; Levin, Mikhail K; Kim, Min; Mock, Stephen A; Jordan, Christopher; Ostmeyer, Jared; Buntzman, Adam; Rubelt, Florian; Davila, Marco L; Monson, Nancy L; Scheuermann, Richard H; Cowell, Lindsay G

2018-01-01

Recent technological advances in immune repertoire sequencing have created tremendous potential for advancing our understanding of adaptive immune response dynamics in various states of health and disease. Immune repertoire sequencing produces large, highly complex data sets, however, which require specialized methods and software tools for their effective analysis and interpretation. VDJServer is a cloud-based analysis portal for immune repertoire sequence data that provide access to a suite of tools for a complete analysis workflow, including modules for preprocessing and quality control of sequence reads, V(D)J gene segment assignment, repertoire characterization, and repertoire comparison. VDJServer also provides sophisticated visualizations for exploratory analysis. It is accessible through a standard web browser via a graphical user interface designed for use by immunologists, clinicians, and bioinformatics researchers. VDJServer provides a data commons for public sharing of repertoire sequencing data, as well as private sharing of data between users. We describe the main functionality and architecture of VDJServer and demonstrate its capabilities with use cases from cancer immunology and autoimmunity. VDJServer provides a complete analysis suite for human and mouse T-cell and B-cell receptor repertoire sequencing data. The combination of its user-friendly interface and high-performance computing allows large immune repertoire sequencing projects to be analyzed with no programming or software installation required. VDJServer is a web-accessible cloud platform that provides access through a graphical user interface to a data management infrastructure, a collection of analysis tools covering all steps in an analysis, and an infrastructure for sharing data along with workflows, results, and computational provenance. VDJServer is a free, publicly available, and open-source licensed resource.
Computer-aided visualization and analysis system for sequence evaluation

DOEpatents

Chee, Mark S.

1999-10-26

A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).
Computer-aided visualization and analysis system for sequence evaluation

DOEpatents

Chee, Mark S.

2001-06-05

A computer system (1) for analyzing nucleic acid sequences is provided. The computer system is used to perform multiple methods for determining unknown bases by analyzing the fluorescence intensities of hybridized nucleic acid probes. The results of individual experiments may be improved by processing nucleic acid sequences together. Comparative analysis of multiple experiments is also provided by displaying reference sequences in one area (814) and sample sequences in another area (816) on a display device (3).
Genome-wide comparative analysis of four Indian Drosophila species.

PubMed

Mohanty, Sujata; Khanna, Radhika

2017-12-01

Comparative analysis of multiple genomes of closely or distantly related Drosophila species undoubtedly creates excitement among evolutionary biologists in exploring the genomic changes with an ecology and evolutionary perspective. We present herewith the de novo assembled whole genome sequences of four Drosophila species, D. bipectinata, D. takahashii, D. biarmipes and D. nasuta of Indian origin using Next Generation Sequencing technology on an Illumina platform along with their detailed assembly statistics. The comparative genomics analysis, e.g. gene predictions and annotations, functional and orthogroup analysis of coding sequences and genome wide SNP distribution were performed. The whole genome of Zaprionus indianus of Indian origin published earlier by us and the genome sequences of previously sequenced 12 Drosophila species available in the NCBI database were included in the analysis. The present work is a part of our ongoing genomics project of Indian Drosophila species.
Glutenite bodies sequence division of the upper Es4 in northern Minfeng zone of Dongying Sag, Bohai Bay Basin, China

NASA Astrophysics Data System (ADS)

Shao, Xupeng

2017-04-01

Glutenite bodies are widely developed in the northern Minfeng zone of Dongying Sag, but their litho-electric relationship is not clear. In addition, the conventional sequence stratigraphic research method has the drawback of involving too many subjective human factors, which has limited the deepening of regional sequence stratigraphic research. Compared with the conventional methods, the wavelet transform technique based on logging data and the time-frequency analysis technique based on seismic data have the advantage of dividing sequence stratigraphy quantitatively. Building on the conventional sequence research method, this paper used the above techniques to divide the fourth-order sequences of the upper Es4 in the northern Minfeng zone of Dongying Sag. The research shows that the wavelet transform technique based on logging data and the time-frequency analysis technique based on seismic data are essentially consistent, both dividing sequence stratigraphy quantitatively in the frequency domain. The wavelet transform technique has high resolution and is suitable for areas with wells. The seismic time-frequency analysis technique has wide applicability but low resolution. The two techniques should be combined. The upper Es4 in the northern Minfeng zone of Dongying Sag is a complete third-order sequence, which can be further subdivided into 5 fourth-order sequences that have the depositional characteristics of a fining-upward sequence in granularity. Key words: Dongying sag, northern Minfeng zone, wavelet transform technique, time-frequency analysis technique, the upper Es4, sequence stratigraphy
Single-Exome sequencing identified a novel RP2 mutation in a child with X-linked retinitis pigmentosa.

PubMed

Lim, Hassol; Park, Young-Mi; Lee, Jong-Keuk; Taek Lim, Hyun

2016-10-01

To present an efficient and successful application of a single-exome sequencing study in a family clinically diagnosed with X-linked retinitis pigmentosa. Exome sequencing study based on clinical examination data. An 8-year-old proband and his family. The proband and his family members underwent comprehensive ophthalmologic examinations. Exome sequencing was undertaken in the proband using Agilent SureSelect Human All Exon Kit and Illumina HiSeq 2000 platform. Bioinformatic analysis used Illumina pipeline with Burrows-Wheeler Aligner-Genome Analysis Toolkit (BWA-GATK), followed by ANNOVAR to perform variant functional annotation. All variants passing filter criteria were validated by Sanger sequencing to confirm familial segregation. Analysis of exome sequence data identified a novel frameshift mutation in RP2 gene resulting in a premature stop codon (c.665delC, p.Pro222fsTer237). Sanger sequencing revealed this mutation co-segregated with the disease phenotype in the child's family. We identified a novel causative mutation in RP2 from a single proband's exome sequence data analysis. This study highlights the effectiveness of the whole-exome sequencing in the genetic diagnosis of X-linked retinitis pigmentosa, over the conventional sequencing methods. Even using a single exome, exome sequencing technology would be able to pinpoint pathogenic variant(s) for X-linked retinitis pigmentosa, when properly applied with aid of adequate variant filtering strategy. Copyright © 2016 Canadian Ophthalmological Society. Published by Elsevier Inc. All rights reserved.
Purification, developmental expression, and in silico characterization of α-amylase inhibitor from Echinochloa frumentacea.

PubMed

Panwar, Priyankar; Verma, A K; Dubey, Ashutosh

2018-05-01

Barnyard (Echinochloa frumentacea) and finger (Eleusine coracana) millet growing in the northwestern Himalaya were explored for the α-amylase inhibitor (α-AI). The mature seeds of barnyard millet variety PRJ1 had the maximum α-AI activity, which increased across developmental stages. α-AI was purified up to 22.25-fold from barnyard millet variety PRJ1. Semi-quantitative PCR of different developmental stages of barnyard millet seeds showed increased levels of the transcript from 7 to 28 days. Sequence analysis revealed that it contained a 315 bp nucleotide sequence encoding a 104-amino-acid sequence with a molecular weight of 10.72 kDa. The predicted 3D structure of α-AI was 86.73% similar to a bifunctional inhibitor of ragi. In silico analysis of 71 α-AI protein sequences was carried out for biochemical features, homology search, multiple sequence alignment, phylogenetic tree construction, motif, and superfamily distribution of protein sequences. Analysis of the multiple sequence alignment revealed the existence of the conserved region NPLP[S/G]CRWYVV[S/Q][Q/R]TCG[V/I] throughout the sequences. Superfam analysis revealed that α-AI protein sequences were distributed among seven different superfamilies.
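The bracketed conserved region quoted in this abstract maps directly onto a regular-expression character class, which the Python sketch below uses to locate the motif in a protein sequence. The helper `motif_to_regex` and the example protein are hypothetical; the protein was constructed to contain the motif and is not a real α-AI sequence. The sketch also checks the length arithmetic, assuming the 315 bp include one stop codon.

```python
import re


def motif_to_regex(motif):
    """Turn the abstract's notation, e.g. '[S/G]', into a regex class '[SG]'."""
    return re.compile(motif.replace("/", ""))


MOTIF = "NPLP[S/G]CRWYVV[S/Q][Q/R]TCG[V/I]"
pattern = motif_to_regex(MOTIF)

# Hypothetical protein fragment built to contain the motif.
protein = "MKAANPLPGCRWYVVQRTCGIAAK"
match = pattern.search(protein)
span = match.span() if match else None   # 0-based, end-exclusive coordinates

# Length check, assuming one of the codons is a stop codon.
codons = 315 // 3          # 105 codons in 315 bp
residues = codons - 1      # 104 amino acids once the stop codon is excluded
```

Applying `pattern.search` to each sequence in a FASTA file would give a quick count of how many sequences carry the conserved region.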
EventThread: Visual Summarization and Stage Analysis of Event Sequence Data.

PubMed

Guo, Shunan; Xu, Ke; Zhao, Rongwen; Gotz, David; Zha, Hongyuan; Cao, Nan

2018-01-01

Event sequence data such as electronic health records, a person's academic records, or car service records, are ordered series of events which have occurred over a period of time. Analyzing collections of event sequences can reveal common or semantically important sequential patterns. For example, event sequence analysis might reveal frequently used care plans for treating a disease, typical publishing patterns of professors, and the patterns of service that result in a well-maintained car. It is challenging, however, to visually explore large numbers of event sequences, or sequences with large numbers of event types. Existing methods focus on extracting explicitly matching patterns of events using statistical analysis to create stages of event progression over time. However, these methods fail to capture latent clusters of similar but not identical evolutions of event sequences. In this paper, we introduce a novel visualization system named EventThread which clusters event sequences into threads based on tensor analysis and visualizes the latent stage categories and evolution patterns by interactively grouping the threads by similarity into time-specific clusters. We demonstrate the effectiveness of EventThread through usage scenarios in three different application domains and via interviews with an expert user.
Sequencing, Assembly and Analysis of Human Microbial Communities

ScienceCinema

Petrosino, Joe

2018-02-02

Joe Petrosino of Baylor College of Medicine discusses using next generation sequencing technologies to study human microbial communities associated with health and disease on June 4, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
galaxie--CGI scripts for sequence identification through automated phylogenetic analysis.

PubMed

Nilsson, R Henrik; Larsson, Karl-Henrik; Ursing, Björn M

2004-06-12

The prevalent use of similarity searches like BLAST to identify sequences and species implicitly assumes the reference database to be of extensive sequence sampling. This is often not the case, restraining the correctness of the outcome as a basis for sequence identification. Phylogenetic inference outperforms similarity searches in retrieving correct phylogenies and consequently sequence identities, and a project was initiated to design a freely available script package for sequence identification through automated Web-based phylogenetic analysis. Three CGI scripts were designed to facilitate qualified sequence identification from a Web interface. Query sequences are aligned to pre-made alignments or to alignments made by ClustalW with entries retrieved from a BLAST search. The subsequent phylogenetic analysis is based on the PHYLIP package for inferring neighbor-joining and parsimony trees. The scripts are highly configurable. A service installation and a version for local use are found at http://andromeda.botany.gu.se/galaxiewelcome.html and http://galaxie.cgb.ki.se
Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

PubMed

Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

2004-03-01

One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.
Impact of cultivation on characterisation of species composition of soil bacterial communities.

PubMed

McCaig, A E.; Grayston, S J.; Prosser, J I.; Glover, L A.

2001-03-01

The species composition of culturable bacteria in Scottish grassland soils was investigated using a combination of Biolog and 16S rDNA analysis for characterisation of isolates. The inclusion of a molecular approach allowed direct comparison of sequences from culturable bacteria with sequences obtained during analysis of DNA extracted directly from the same soil samples. Bacterial strains were isolated on Pseudomonas isolation agar (PIA), a selective medium, and on tryptone soya agar (TSA), a general laboratory medium. In total, 12 and 21 morphologically different bacterial cultures were isolated on PIA and TSA, respectively. Biolog and sequencing placed PIA isolates in the same taxonomic groups, the majority of cultures belonging to the Pseudomonas (sensu stricto) group. However, analysis of 16S rDNA sequences proved more efficient than Biolog for characterising TSA isolates due to limitations of the Microlog database for identifying environmental bacteria. In general, 16S rDNA sequences from TSA isolates showed high similarities to cultured species represented in sequence databases, although TSA-8 showed only 92.5% similarity to the nearest relative, Bacillus insolitus. In general, there was very little overlap between the culturable and uncultured bacterial communities, although two sequences, PIA-2 and TSA-13, showed >99% similarity to soil clones. A cloning step was included prior to sequence analysis of two isolates, TSA-5 and TSA-14, and analysis of several clones confirmed that these cultures comprised at least four and three sequence types, respectively. All isolate clones were most closely related to uncultured bacteria, with clone TSA-5.1 showing 99.8% similarity to a sequence amplified directly from the same soil sample. Interestingly, one clone, TSA-5.4, clustered within a novel group comprising only uncultured sequences. 
This group, which is associated with the novel, deep-branching Acidobacterium capsulatum lineage, also included clones isolated during direct analysis of the same soil and from a wide range of other sample types studied elsewhere. The study demonstrates the value of fine-scale molecular analysis for identification of laboratory isolates and indicates the culturability of approximately 1% of the total population but under a restricted range of media and cultivation conditions.
Oligo Design: a computer program for development of probes for oligonucleotide microarrays.

PubMed

Herold, Keith E; Rasooly, Avraham

2003-12-01

Oligonucleotide microarrays have demonstrated potential for the analysis of gene expression, genotyping, and mutational analysis. Our work focuses primarily on the detection and identification of bacteria based on known short sequences of DNA. Oligo Design, the software described here, automates several design aspects that enable the improved selection of oligonucleotides for use with microarrays for these applications. Two major features of the program are: (i) a tiling algorithm for the design of short overlapping temperature-matched oligonucleotides of variable length, which are useful for the analysis of single nucleotide polymorphisms and (ii) a set of tools for the analysis of multiple alignments of gene families and related short DNA sequences, which allow for the identification of conserved DNA sequences for PCR primer selection and variable DNA sequences for the selection of unique probes for identification. Note that the program does not address the full genome perspective but, instead, is focused on the genetic analysis of short segments of DNA. The program is Internet-enabled and includes a built-in browser and the automated ability to download sequences from GenBank by specifying the GI number. The program also includes several utilities, including audio recital of a DNA sequence (useful for verifying sequences against a written document), a random sequence generator that provides insight into the relationship between melting temperature and GC content, and a PCR calculator.
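The relationship between melting temperature and GC content, and the idea of tiling a sequence with overlapping, temperature-matched oligonucleotides of variable length, can be sketched in Python as follows. The abstract does not say which melting-temperature model Oligo Design uses; this sketch assumes the simple Wallace rule, Tm = 2(A+T) + 4(G+C) in °C, which is only a rough approximation for short oligonucleotides. The function names, target temperature, step, and length limits are all hypothetical.

```python
def wallace_tm(oligo):
    """Wallace-rule melting temperature in degrees C (assumed model)."""
    at = sum(oligo.count(b) for b in "AT")
    gc = sum(oligo.count(b) for b in "GC")
    return 2 * at + 4 * gc


def tile_oligos(seq, target_tm, step=5, min_len=8, max_len=30):
    """At every `step` bases, grow an oligo until its Tm reaches target_tm.

    GC-rich tiles end up shorter than AT-rich ones, so the tiles vary in
    length but have similar melting temperatures.
    """
    tiles = []
    for start in range(0, len(seq) - min_len + 1, step):
        for length in range(min_len, max_len + 1):
            oligo = seq[start:start + length]
            if len(oligo) < length:
                break                      # ran off the end of the sequence
            tm = wallace_tm(oligo)
            if tm >= target_tm:
                tiles.append((start, oligo, tm))
                break
    return tiles


seq = "ATGCGTACGTTAGCCGATATTTAAACGCGGCTAGCTAGGATCC"   # hypothetical sequence
tiles = tile_oligos(seq, target_tm=50)
```

Because each added base raises the Wallace Tm by 2 or 4, every tile that is kept lands within a few degrees of the target, which is the sense in which the tiles are temperature-matched.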
[The use of 16S rDNA sequencing in species diversity analysis for sputum of patients with ventilator-associated pneumonia].

PubMed

Yang, Xiaojun; Wang, Xiaohong; Liang, Zhijuan; Zhang, Xiaoya; Wang, Yanbo; Wang, Zhenhai

2014-05-01

To study the species and amount of bacteria in sputum of patients with ventilator-associated pneumonia (VAP) by using 16S rDNA sequencing analysis, and to explore a new method for etiologic diagnosis of VAP. Bronchoalveolar lavage sputum samples were collected from 31 patients with VAP. Bacterial DNA of the samples was extracted and identified by polymerase chain reaction (PCR). At the same time, sputum specimens were processed for routine bacterial culture. High-throughput sequencing was conducted on PCR-positive samples with 16S rDNA metagenomic sequencing technology, sequencing results were analyzed using bioinformatics, and the results of sequencing and bacterial culture were then compared. (1) 550 bp of specific DNA sequences were amplified in sputum specimens from 27 of the 31 patients with VAP, and they were used for sequencing analysis. 103 856 sequences were obtained from those sputum specimens using 16S rDNA sequencing, yielding approximately 39 Mb of raw data. Tag sequencing was able to inform genus level in all 27 samples. (2) Alpha-diversity analysis showed that sputum samples of patients with VAP had significantly higher variability and richness in bacterial species (Shannon index values 1.20, Simpson index values 0.48). Rarefaction curve analysis showed that there were more species that were not detected by sequencing from some VAP sputum samples. (3) Analysis of 27 sputum samples with VAP by using 16S rDNA sequences yielded four phyla, namely Actinobacteria, Bacteroidetes, Firmicutes, and Proteobacteria. With genus as a classification, it was found that the dominant species included Streptococcus 88.9% (24/27), Limnohabitans 77.8% (21/27), Acinetobacter 70.4% (19/27), Sphingomonas 63.0% (17/27), Prevotella 63.0% (17/27), Klebsiella 55.6% (15/27), Pseudomonas 55.6% (15/27), Aquabacterium 55.6% (15/27), and Corynebacterium 55.6% (15/27).
(4) Pyrosequencing discovered that Prevotella, Limnohabitans, Aquabacterium, and Sphingomonas might not be detected by routine bacterial culture. Among seven species which were identified by both methods, pyrosequencing yielded a higher positive rate than ordinary bacterial culture [Streptococcus: 88.9% (24/27) vs. 18.5% (5/27), Klebsiella: 55.6% (15/27) vs. 18.5% (5/27), Acinetobacter: 70.4% (19/27) vs. 37.0% (10/27), Corynebacterium: 55.6% (15/27) vs. 7.4% (2/27), P<0.05 or P<0.01]. Sequencing also gave a higher positive rate than culture for Pseudomonas [55.6% (15/27) vs. 25.9% (7/27), P=0.050]. No significant differences were observed between sequencing and ordinary bacterial culture for detection of the genera Staphylococcus [7.4% (2/27) vs. 11.1% (3/27)] and Neisseria [18.5% (5/27) vs. 3.7% (1/27), both P>0.05]. 16S rDNA sequencing analysis confirmed that pathogenic bacteria in sputum of VAP were complicated with multiple drug resistant strains. Compared with routine bacterial culture, pyrosequencing had a higher positive rate in detecting pathogens. 16S rDNA gene sequencing technology may become a new method for etiological diagnosis of VAP.
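The alpha-diversity measures reported in this abstract (Shannon index 1.20, Simpson index 0.48) are computed from per-taxon read counts. The Python sketch below shows the standard definitions, H = -Σ p_i ln p_i for Shannon and D = Σ p_i² for Simpson dominance. The abstract does not say whether its Simpson value is D or 1 - D, or which logarithm base was used, so both Simpson forms are computed and the natural logarithm is an assumption. The genus counts are hypothetical and are not the study's data.

```python
import math


def shannon_index(counts):
    """Shannon index H = -sum(p * ln p) over taxa with non-zero counts."""
    counts = list(counts)
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_dominance(counts):
    """Simpson dominance D = sum(p^2); 1 - D is the Gini-Simpson diversity."""
    counts = list(counts)
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)


# Hypothetical read counts per genus for one sample.
genus_counts = {"Streptococcus": 50, "Acinetobacter": 30, "Klebsiella": 20}
h = shannon_index(genus_counts.values())
d = simpson_dominance(genus_counts.values())
gini_simpson = 1 - d
```

A sample dominated by one genus gives H near 0 and D near 1, while an even community gives a larger H and a smaller D.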

Automated Sanger Analysis Pipeline (ASAP): A Tool for Rapidly Analyzing Sanger Sequencing Data with Minimum User Interference.

PubMed

Singh, Aditya; Bhatia, Prateek

2016-12-01

Sanger sequencing platforms, such as Applied Biosystems instruments, generate chromatogram files. Generally, one region of a sequence is sequenced with both forward and reverse primers, which gives 2 sequences that need to be aligned and a consensus generated before mutation detection studies. This work is cumbersome and takes time, especially if the gene is large with many exons. Hence, we devised a rapid automated command system to filter, build, and align consensus sequences and also optionally extract exonic regions, translate them in all frames, and perform an amino acid alignment starting from raw sequence data within a very short time. At its full capability, the Automated Sanger Analysis Pipeline (ASAP) is able to read "*.ab1" chromatogram files through a command line interface, convert them to the FASTQ format, trim the low-quality regions, reverse-complement the reverse sequence, create a consensus sequence, extract the exonic regions using a reference exonic sequence, translate the sequence in all frames, and align the nucleic acid and amino acid sequences to reference nucleic acid and amino acid sequences, respectively. All files are created and can be used for further analysis. ASAP is available as a Python 3.x executable at https://github.com/aditya-88/ASAP. The version described in this paper is 0.28.
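Two of the pipeline steps listed in this abstract, reverse-complementing the reverse read and translating a sequence in all frames, are simple enough to sketch in plain Python. This is not ASAP's code; the function names and the example read are hypothetical. The sketch uses the standard genetic code and covers only the three forward frames (a six-frame translation would repeat the call on the reverse complement).

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(seq):
    """Complement each base, then reverse the string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate_frames(seq):
    """Translate the three forward reading frames; unknown codons become 'X'."""
    frames = []
    for offset in range(3):
        codons = (seq[i:i + 3] for i in range(offset, len(seq) - 2, 3))
        frames.append("".join(CODON_TABLE.get(c, "X") for c in codons))
    return frames


reverse_read = "TTACGGCCGCAT"                 # hypothetical reverse-primer read
forward = reverse_complement(reverse_read)    # now in forward orientation
frames = translate_frames(forward)
```

Once the reverse read is in forward orientation it can be aligned against the forward read to build the consensus that the abstract describes.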
Sequencing, Analysis, and Annotation of Expressed Sequence Tags for Camelus dromedarius

PubMed Central

Al-Swailem, Abdulaziz M.; Shehata, Maher M.; Abu-Duhier, Faisel M.; Al-Yamani, Essam J.; Al-Busadah, Khalid A.; Al-Arawi, Mohammed S.; Al-Khider, Ali Y.; Al-Muhaimeed, Abdullah N.; Al-Qahtani, Fahad H.; Manee, Manee M.; Al-Shomrani, Badr M.; Al-Qhtani, Saad M.; Al-Harthi, Amer S.; Akdemir, Kadir C.; Otu, Hasan H.

2010-01-01

Despite its economical, cultural, and biological importance, there has not been a large scale sequencing project to date for Camelus dromedarius. With the goal of sequencing complete DNA of the organism, we first established and sequenced camel EST libraries, generating 70,272 reads. Following trimming, chimera check, repeat masking, cluster and assembly, we obtained 23,602 putative gene sequences, out of which over 4,500 potentially novel or fast evolving gene sequences do not carry any homology to other available genomes. Functional annotation of sequences with similarities in nucleotide and protein databases has been obtained using Gene Ontology classification. Comparison to available full length cDNA sequences and Open Reading Frame (ORF) analysis of camel sequences that exhibit homology to known genes show more than 80% of the contigs with an ORF>300 bp and ∼40% hits extending to the start codons of full length cDNAs suggesting successful characterization of camel genes. Similarity analyses are done separately for different organisms including human, mouse, bovine, and rat. Accompanying web portal, CAGBASE (http://camel.kacst.edu.sa/), hosts a relational database containing annotated EST sequences and analysis tools with possibility to add sequences from public domain. We anticipate our results to provide a home base for genomic studies of camel and other comparative studies enabling a starting point for whole genome sequencing of the organism. PMID:20502665
Analysis of the Macaca mulatta transcriptome and the sequence divergence between Macaca and human.

PubMed

Magness, Charles L; Fellin, P Campion; Thomas, Matthew J; Korth, Marcus J; Agy, Michael B; Proll, Sean C; Fitzgibbon, Matthew; Scherer, Christina A; Miner, Douglas G; Katze, Michael G; Iadonato, Shawn P

2005-01-01

We report the initial sequencing and comparative analysis of the Macaca mulatta transcriptome. Cloned sequences from 11 tissues, nine animals, and three species (M. mulatta, M. fascicularis, and M. nemestrina) were sampled, resulting in the generation of 48,642 sequence reads. These data represent an initial sampling of the putative rhesus orthologs for 6,216 human genes. Mean nucleotide diversity within M. mulatta and sequence divergence among M. fascicularis, M. nemestrina, and M. mulatta are also reported.
The complete sequence of Cymbidium mosaic virus from Vanilla fragrans in Hainan, China.

PubMed

He, Zhen; Jiang, Dongmei; Liu, Aiqin; Sang, Liwei; Li, Wenfeng; Li, Shifang

2011-06-01

The complete nucleotide sequence of Cymbidium mosaic virus (CymMV) isolated from vanilla in Hainan province, China was determined for the first time. It comprised 6,224 nucleotides; sequence analysis suggested that the isolate we obtained was a member of the genus Potexvirus, and its sequence shared 86.67-96.61% identities with previously reported sequences. Phylogenetic analysis suggested that CymMV from vanilla fragrans was clustered into subgroup A and the isolates in this subgroup displayed little regional difference.
Using PATIMDB to Create Bacterial Transposon Insertion Mutant Libraries

PubMed Central

Urbach, Jonathan M.; Wei, Tao; Liberati, Nicole; Grenfell-Lee, Daniel; Villanueva, Jacinto; Wu, Gang; Ausubel, Frederick M.

2015-01-01

PATIMDB is a software package for facilitating the generation of transposon mutant insertion libraries. The software has two main functions: process tracking and automated sequence analysis. The process tracking function specifically includes recording the status and fates of multiwell plates and samples in various stages of library construction. Automated sequence analysis refers specifically to the pipeline of sequence analysis starting with ABI files from a sequencing facility and ending with insertion location identifications. The protocols in this unit describe installation and use of PATIMDB software. PMID:19343706
Identification of genes in anonymous DNA sequences. Annual performance report, February 1, 1991--January 31, 1992

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fields, C.A.

1996-06-01

The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler), has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0, has now been completed, and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryotic gene sequences are also underway. This progress report also contains papers coming out of the project, including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distributed highly variable repeat family; Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.
Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

ScienceCinema

Patel, Kamlesh D.

2018-01-22

Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June 2012 in Santa Fe, NM.
Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Patel, Kamlesh D.

2012-06-01

Kamlesh (Ken) Patel from Sandia National Laboratories (Livermore, California) presents "Preparation of Nucleic Acid Libraries for Personalized Sequencing Systems Using an Integrated Microfluidic Hub Technology" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June 2012 in Santa Fe, NM.
Categorizing accident sequences in the external radiotherapy for risk analysis

PubMed Central

2013-01-01

Purpose: This study identifies accident sequences from past accidents in order to support the application of risk analysis to external radiotherapy. Materials and Methods: This study reviews 59 accidental cases from two retrospective safety analyses that have extensively collected incidents in external radiotherapy. The two accident analysis reports are investigated to identify accident sequences, including initiating events, failures of safety measures, and consequences. This study classifies the accidents by the treatment stages and sources of errors for initiating events, the types of failures in the safety measures, and the types of undesirable consequences and the number of affected patients. Then, the accident sequences are grouped into several categories on the basis of similarity of progression. As a result, these cases can be categorized into 14 groups of accident sequences. Results: The results indicate that risk analysis needs to pay attention not only to the planning stage but also to the calibration stage that is carried out prior to the main treatment process. They also show that human error is the largest contributor to initiating events as well as to the failure of safety measures. This study also illustrates an event tree analysis for an accident sequence initiated in the calibration stage. Conclusion: This study is expected to provide insights into accident sequences for prospective risk analysis through the review of experiences. PMID:23865005
Enterobacter xiangfangensis sp. nov., isolated from Chinese traditional sourdough, and reclassification of Enterobacter sacchari Zhu et al. 2013 as Kosakonia sacchari comb. nov.

PubMed

Gu, Chun Tao; Li, Chun Yan; Yang, Li Jie; Huo, Gui Cheng

2014-08-01

A Gram-stain-negative bacterial strain, 10-17(T), was isolated from traditional sourdough in Heilongjiang Province, China. The bacterium was characterized by a polyphasic approach, including 16S rRNA gene sequence analysis, RNA polymerase β subunit (rpoB) gene sequence analysis, DNA gyrase (gyrB) gene sequence analysis, translation initiation factor 2 (infB) gene sequence analysis, ATP synthase β subunit (atpD) gene sequence analysis, fatty acid methyl ester analysis, determination of DNA G+C content, DNA-DNA hybridization and an analysis of phenotypic features. Strain 10-17(T) was phylogenetically related to Enterobacter hormaechei CIP 103441(T), Enterobacter cancerogenus LMG 2693(T), Enterobacter asburiae JCM 6051(T), Enterobacter mori LMG 25706(T), Enterobacter ludwigii EN-119(T) and Leclercia adecarboxylata LMG 2803(T), having 99.5%, 99.3%, 98.7%, 98.5%, 98.4% and 98.4% 16S rRNA gene sequence similarity, respectively. On the basis of polyphasic characterization data obtained in the present study, a novel species, Enterobacter xiangfangensis sp. nov., is proposed and the type strain is 10-17(T) (= LMG 27195(T) = NCIMB 14836(T) = CCUG 62994(T)). Enterobacter sacchari Zhu et al. 2013 was reclassified as Kosakonia sacchari comb. nov. on the basis of 16S rRNA, rpoB, gyrB, infB and atpD gene sequence analysis and the type strain is strain SP1(T) (= CGMCC 1.12102(T) = LMG 26783(T)). © 2014 IUMS.
Patome: a database server for biological sequence annotation and analysis in issued patents and published patent applications.

PubMed

Lee, Byungwook; Kim, Taehyung; Kim, Seon-Kyu; Lee, Kwang H; Lee, Doheon

2007-01-01

With the advent of automated and high-throughput techniques, the number of patent applications containing biological sequences has been increasing rapidly. However, they have attracted relatively little attention compared to other sequence resources. We have built a database server called Patome, which contains biological sequence data disclosed in patents and published applications, as well as their analysis information. The analysis is divided into two steps. The first is an annotation step in which the disclosed sequences were annotated with RefSeq database. The second is an association step where the sequences were linked to Entrez Gene, OMIM and GO databases, and their results were saved as a gene-patent table. From the analysis, we found that 55% of human genes were associated with patenting. The gene-patent table can be used to identify whether a particular gene or disease is related to patenting. Patome is available at http://www.patome.org/; the information is updated bimonthly.
Patome: a database server for biological sequence annotation and analysis in issued patents and published patent applications

PubMed Central

Lee, Byungwook; Kim, Taehyung; Kim, Seon-Kyu; Lee, Kwang H.; Lee, Doheon

2007-01-01

With the advent of automated and high-throughput techniques, the number of patent applications containing biological sequences has been increasing rapidly. However, they have attracted relatively little attention compared to other sequence resources. We have built a database server called Patome, which contains biological sequence data disclosed in patents and published applications, as well as their analysis information. The analysis is divided into two steps. The first is an annotation step in which the disclosed sequences were annotated with the RefSeq database. The second is an association step where the sequences were linked to Entrez Gene, OMIM and GO databases, and their results were saved as a gene–patent table. From the analysis, we found that 55% of human genes were associated with patenting. The gene–patent table can be used to identify whether a particular gene or disease is related to patenting. Patome is available at http://www.patome.org/; the information is updated bimonthly. PMID:17085479
Analysis of xylem formation in pine by cDNA sequencing

NASA Technical Reports Server (NTRS)

Allona, I.; Quinn, M.; Shoop, E.; Swope, K.; St Cyr, S.; Carlis, J.; Riedl, J.; Retzel, E.; Campbell, M. M.; Sederoff, R.;

1998-01-01

Secondary xylem (wood) formation is likely to involve some genes expressed rarely or not at all in herbaceous plants. Moreover, environmental and developmental stimuli influence secondary xylem differentiation, producing morphological and chemical changes in wood. To increase our understanding of xylem formation, and to provide material for comparative analysis of gymnosperm and angiosperm sequences, ESTs were obtained from immature xylem of loblolly pine (Pinus taeda L.). A total of 1,097 single-pass sequences were obtained from 5' ends of cDNAs made from gravistimulated tissue from bent trees. Cluster analysis detected 107 groups of similar sequences, ranging in size from 2 to 20 sequences. A total of 361 sequences fell into these groups, whereas 736 sequences were unique. About 55% of the pine EST sequences show similarity to previously described sequences in public databases. About 10% of the recognized genes encode factors involved in cell wall formation. Sequences similar to cell wall proteins, most known lignin biosynthetic enzymes, and several enzymes of carbohydrate metabolism were found. A number of putative regulatory proteins also are represented. Expression patterns of several of these genes were studied in various tissues and organs of pine. Sequencing novel genes expressed during xylem formation will provide a powerful means of identifying mechanisms controlling this important differentiation pathway.
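The cluster counts quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below only restates numbers from the abstract; the variable names are illustrative, and the derived ratios are not figures reported by the authors.

```python
# Counts quoted in the abstract (variable names are illustrative).
total_ests = 1097       # single-pass 5' EST sequences
clustered_ests = 361    # sequences that fell into a group of similar sequences
unique_ests = 736       # sequences with no similar partner
n_clusters = 107        # groups of similar sequences

# Clustered and unique sequences should account for every EST.
assert clustered_ests + unique_ests == total_ests

# Derived values (not reported in the abstract).
fraction_clustered = clustered_ests / total_ests
mean_cluster_size = clustered_ests / n_clusters

print(f"{fraction_clustered:.1%} of ESTs fell into clusters")
print(f"{mean_cluster_size:.2f} sequences per cluster on average")
```

The mean cluster size lands between 3 and 4, which is consistent with the stated range of 2 to 20 sequences per group being dominated by small clusters.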

Performance evaluation of Sanger sequencing for the diagnosis of primary hyperoxaluria and comparison with targeted next generation sequencing

PubMed Central

Williams, Emma L; Bagg, Eleanor A L; Mueller, Michael; Vandrovcova, Jana; Aitman, Timothy J; Rumsby, Gill

2015-01-01

Definitive diagnosis of primary hyperoxaluria (PH) currently utilizes sequential Sanger sequencing of the AGXT, GRHPR, and HOGA1 genes, but efficacy is unproven. This analysis is time-consuming and relatively expensive, and delays in diagnosis and inappropriate treatment can occur if it is not pursued early in the diagnostic work-up. We reviewed testing outcomes of Sanger sequencing in 200 consecutive patient samples referred for analysis. In addition, the Illumina Truseq custom amplicon system was evaluated for parallel next-generation sequencing (NGS) of AGXT, GRHPR, and HOGA1 in 90 known PH patients. AGXT sequencing was requested in all patients, permitting a diagnosis of PH1 in 50%. All remaining patients underwent targeted exon sequencing of GRHPR and HOGA1, with 8% diagnosed with PH2 and 8% with PH3. Complete sequencing of both GRHPR and HOGA1 was not requested in 25% of patients referred, leaving their diagnosis in doubt. NGS analysis showed 98% agreement with Sanger sequencing, and both approaches had 100% diagnostic specificity. Diagnostic sensitivity of Sanger sequencing was 98% and for NGS it was 97%. NGS has comparable diagnostic performance to Sanger sequencing for the diagnosis of PH and, if implemented, would screen for all forms of PH simultaneously, ensuring prompt diagnosis at decreased cost. PMID:25629080
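For readers less familiar with the two metrics quoted above, the following minimal sketch shows the standard definitions of diagnostic sensitivity and specificity. The counts in `example` are hypothetical and chosen only to illustrate the formulas; they are not the study's data.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of affected patients that the test correctly identifies."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of unaffected patients that the test correctly clears."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts, NOT taken from the study.
example = {"true_pos": 49, "false_neg": 1, "true_neg": 40, "false_pos": 0}

sens = sensitivity(example["true_pos"], example["false_neg"])
spec = specificity(example["true_neg"], example["false_pos"])
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

A test with no false positives has 100% specificity regardless of how many true cases it misses, which is why the abstract reports the two values separately.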
Sequence quality analysis tool for HIV type 1 protease and reverse transcriptase.

PubMed

Delong, Allison K; Wu, Mingham; Bennett, Diane; Parkin, Neil; Wu, Zhijin; Hogan, Joseph W; Kantor, Rami

2012-08-01

Access to antiretroviral therapy is increasing globally and drug resistance evolution is anticipated. Currently, protease (PR) and reverse transcriptase (RT) sequence generation is increasing, including the use of in-house sequencing assays, and quality assessment prior to sequence analysis is essential. We created a computational HIV PR/RT Sequence Quality Analysis Tool (SQUAT) that runs in the R statistical environment. Sequence quality thresholds are calculated from a large dataset (46,802 PR and 44,432 RT sequences) from the published literature (http://hivdb.Stanford.edu). Nucleic acid sequences are read into SQUAT, identified, aligned, and translated. Nucleic acid sequences are flagged if they have >five 1-2-base insertions; >one 3-base insertion; >one deletion; >six PR or >18 RT ambiguous bases; >three consecutive PR or >four RT nucleic acid mutations; >zero stop codons; >three PR or >six RT ambiguous amino acids; >three consecutive PR or >four RT amino acid mutations; >zero unique amino acids; or <0.5% or >15% genetic distance from another submitted sequence. Thresholds are user modifiable. SQUAT output includes a summary report with detailed comments for troubleshooting of flagged sequences, histograms of pairwise genetic distances, neighbor joining phylogenetic trees, and aligned nucleic and amino acid sequences. SQUAT is a stand-alone, free, web-independent tool to ensure use of high-quality HIV PR/RT sequences in interpretation and reporting of drug resistance, while increasing awareness and expertise and facilitating troubleshooting of potentially problematic sequences.
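The threshold-based flagging described in this abstract can be illustrated with a small sketch. This is not SQUAT's code (SQUAT runs in R); it is a Python illustration that assumes per-sequence counts have already been computed. The key names in `LIMITS` are hypothetical, only a subset of the listed criteria is covered, and the genetic-distance check is left out.

```python
# Upper limits quoted in the abstract; a sequence is flagged when a count
# EXCEEDS its limit. Key names are hypothetical, chosen for this sketch.
LIMITS = {
    "PR": {
        "short_insertions": 5,       # 1-2-base insertions
        "codon_insertions": 1,       # 3-base insertions
        "deletions": 1,
        "ambiguous_bases": 6,
        "stop_codons": 0,
        "ambiguous_amino_acids": 3,
    },
    "RT": {
        "short_insertions": 5,
        "codon_insertions": 1,
        "deletions": 1,
        "ambiguous_bases": 18,
        "stop_codons": 0,
        "ambiguous_amino_acids": 6,
    },
}


def flag_sequence(region, counts, limits=LIMITS):
    """Return the sorted names of criteria whose count exceeds the limit.

    `region` is "PR" or "RT"; `counts` maps criterion name to an observed
    count, and missing criteria are treated as zero.
    """
    region_limits = limits[region]
    return sorted(
        name for name, limit in region_limits.items()
        if counts.get(name, 0) > limit
    )


# Seven ambiguous bases exceed the PR limit (6) but not the RT limit (18).
print(flag_sequence("PR", {"ambiguous_bases": 7}))
print(flag_sequence("RT", {"ambiguous_bases": 7}))
```

Passing `limits` as a parameter mirrors the abstract's note that thresholds are user modifiable.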
Bound Water at Protein-Protein Interfaces: Partners, Roles and Hydrophobic Bubbles as a Conserved Motif

PubMed Central

Ahmed, Mostafa H.; Spyrakis, Francesca; Cozzini, Pietro; Tripathi, Parijat K.; Mozzarelli, Andrea; Scarsdale, J. Neel; Safo, Martin A.; Kellogg, Glen E.

2011-01-01

Background: There is a great interest in understanding and exploiting protein-protein associations as new routes for treating human disease. However, these associations are difficult to structurally characterize or model although the number of X-ray structures for protein-protein complexes is expanding. One feature of these complexes that has received little attention is the role of water molecules in the interfacial region. Methodology: A data set of 4741 water molecules abstracted from 179 high-resolution (≤ 2.30 Å) X-ray crystal structures of protein-protein complexes was analyzed with a suite of modeling tools based on the HINT forcefield and hydrogen-bonding geometry. A metric termed Relevance was used to classify the general roles of the water molecules. Results: The water molecules were found to be involved in: a) (bridging) interactions with both proteins (21%), b) favorable interactions with only one protein (53%), and c) no interactions with either protein (26%). This trend is shown to be independent of the crystallographic resolution. Interactions with residue backbones are consistent for all classes and account for 21.5% of all interactions. Interactions with polar residues are significantly more common for the first group and interactions with non-polar residues dominate the last group. Waters interacting with both proteins stabilize on average the proteins' interaction (−0.46 kcal mol⁻¹), but the overall average contribution of a single water to the protein-protein interaction energy is unfavorable (+0.03 kcal mol⁻¹). Analysis of the waters without favorable interactions with either protein suggests that this is a conserved phenomenon: 42% of these waters have SASA ≤ 10 Å² and are thus largely buried, and 69% of these are within predominantly hydrophobic environments or “hydrophobic bubbles”. Such water molecules may have an important biological purpose in mediating protein-protein interactions. PMID:21961043
Docking-based Screening of Ficus religiosa Phytochemicals as Inhibitors of Human Histamine H2 Receptor.

PubMed

Chaudhary, Amit; Yadav, Birendra Singh; Singh, Swati; Maurya, Pramod Kumar; Mishra, Alok; Srivastva, Shweta; Varadwaj, Pritish Kumar; Singh, Nand Kumar; Mani, Ashutosh

2017-10-01

Ficus religiosa L. is generally known as Peepal and belongs to the family Moraceae. The tree is a source of many compounds having high medicinal value. In the gastrointestinal tract, histamine H2 receptors have a key role in histamine-stimulated gastric acid secretion. Their overstimulation causes excessive acid production, which is responsible for gastric ulcer. This study aims to screen the range of phytochemicals present in F. religiosa for binding with the human histamine H2 receptor and to identify therapeutics for gastric ulcer from the plant. In this work, a 3D structure of the human histamine H2 receptor was modeled by homology modeling and the predicted model was validated using PROCHECK. Docking studies were also performed to assess binding affinities between the modeled receptor and 34 compounds. Molecular dynamics simulations were done to identify the most stable receptor-ligand complexes. Absorption, distribution, metabolism, excretion, and toxicity (ADMET) screening was done to evaluate pharmacokinetic properties of the compounds. The results suggest that seven ligands, namely, germacrene, bergaptol, lanosterol, Ergost-5-en-3beta-ol, α-amyrin acetate, bergapten, and γ-cadinene, showed better binding affinities. Among the seven phytochemicals, lanosterol and α-amyrin acetate were found to have greater stability during simulation studies. These two compounds may be suitable therapeutic agents against the histamine H2 receptor. This study was performed to screen antiulcer compounds from F. religiosa. Molecular modeling, molecular docking and MD simulation studies were performed with selected phytochemicals from F. religiosa. The analysis suggests that lanosterol and α-amyrin may be suitable therapeutic agents against the histamine H2 receptor. This study facilitates initiation of the herbal drug discovery process for antiulcer activity.
Abbreviations used: ADMET: Absorption, distribution, metabolism, excretion and toxicity, DOPE: Discrete Optimized Potential Energy, OPLS: Optimized potential for liquid simulations, RMSD: Root-mean-square deviation, HOA: Human oral absorption, MW: Molecular weight, SP: Standard-precision, XP: Extra-precision, GPCRs: G protein-coupled receptors, SASA: Solvent accessible surface area, Rg: Radius of gyration, NHB: Number of hydrogen bond.
Complete genome sequence analysis of novel human bocavirus reveals genetic recombination between human bocavirus 2 and human bocavirus 4.

PubMed

Khamrin, Pattara; Okitsu, Shoko; Ushijima, Hiroshi; Maneekarn, Niwat

2013-07-01

Epidemiological surveillance of human bocavirus (HBoV) was conducted on fecal specimens collected from hospitalized children with diarrhea in Chiang Mai, Thailand in 2011. By partial sequence analysis of the VP1 gene, an unusual strain of HBoV (CMH-S011-11) was initially identified as HBoV4. Complete genome sequencing of CMH-S011-11 was performed and the sequence was analyzed further to clarify whether it was a recombinant strain or a new HBoV variant. Analysis of the complete genome sequence revealed that the coding sequence starting from NS1, NP1 to VP1/VP2 was 4795 nucleotides long. Interestingly, the nucleotide sequence of the NS1 gene of CMH-S011-11 was most closely related to the HBoV2 reference strains detected in Pakistan, which contradicted the initial genotyping result of the partial VP1 region in the previous study. In addition, comparison of the NP1 nucleotide sequence of CMH-S011-11 with those of other HBoV1-4 reference strains also revealed a high level of sequence identity with HBoV2. On the other hand, the nucleotide sequence of the VP1/VP2 gene of CMH-S011-11 was most closely related to those of HBoV4 reference strains detected in Nigeria. The overall full-length sequence analysis revealed that CMH-S011-11 was grouped within the HBoV4 species, but located in a separate branch from other HBoV4 prototype strains. Recombination analysis revealed that CMH-S011-11 was the result of recombination between HBoV2 and HBoV4 strains with the break point located near the start codon of VP2. Copyright © 2013 Elsevier B.V. All rights reserved.
CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing.

PubMed

Angiuoli, Samuel V; Matalka, Malcolm; Gussman, Aaron; Galens, Kevin; Vangala, Mahesh; Riley, David R; Arze, Cesar; White, James R; White, Owen; Fricke, W Florian

2011-08-30

Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing.
Single-Cell RNA-Sequencing: Assessment of Differential Expression Analysis Methods.

PubMed

Dal Molin, Alessandra; Baruzzo, Giacomo; Di Camillo, Barbara

2017-01-01

The sequencing of the transcriptomes of single cells, or single-cell RNA-sequencing, has now become the dominant technology for the identification of novel cell types and for the study of stochastic gene expression. In recent years, various tools for analyzing single-cell RNA-sequencing data have been proposed, many of them with the purpose of performing differential expression analysis. In this work, we compare four different tools for single-cell RNA-sequencing differential expression, together with two popular methods originally developed for the analysis of bulk RNA-sequencing data, but largely applied to single-cell data. We discuss results obtained on two real datasets and one synthetic dataset, along with considerations about the perspectives of single-cell differential expression analysis. In particular, we explore the methods' performance in four different scenarios, mimicking different unimodal or bimodal distributions of the data, as characteristic of single-cell transcriptomics. We observed marked differences between the selected methods in terms of precision and recall, the number of detected differentially expressed genes and the overall performance. Globally, the results obtained in our study suggest that it is difficult to identify a best-performing tool and that efforts are needed to improve the methodologies for single-cell RNA-sequencing data analysis and gain better accuracy of results.

CloVR-ITS: Automated internal transcribed spacer amplicon sequence analysis pipeline for the characterization of fungal microbiota

PubMed Central

2013-01-01

Background: Besides the development of comprehensive tools for high-throughput 16S ribosomal RNA amplicon sequence analysis, there exists a growing need for protocols emphasizing alternative phylogenetic markers such as those representing eukaryotic organisms. Results: Here we introduce CloVR-ITS, an automated pipeline for comparative analysis of internal transcribed spacer (ITS) pyrosequences amplified from metagenomic DNA isolates and representing fungal species. This pipeline performs a variety of steps similar to those commonly used for 16S rRNA amplicon sequence analysis, including preprocessing for quality, chimera detection, clustering of sequences into operational taxonomic units (OTUs), taxonomic assignment (at class, order, family, genus, and species levels) and statistical analysis of sample groups of interest based on user-provided information. Using ITS amplicon pyrosequencing data from a previous human gastric fluid study, we demonstrate the utility of CloVR-ITS for fungal microbiota analysis and provide runtime and cost examples, including analysis of extremely large datasets on the cloud. We show that the largest fractions of reads from the stomach fluid samples were assigned to Dothideomycetes, Saccharomycetes, Agaricomycetes and Sordariomycetes but that all samples were dominated by sequences that could not be taxonomically classified. Representatives of the Candida genus were identified in all samples, most notably C. quercitrusa, while sequence reads assigned to the Aspergillus genus were only identified in a subset of samples. CloVR-ITS is made available as a pre-installed, automated, and portable software pipeline for cloud-friendly execution as part of the CloVR virtual machine package (http://clovr.org).
Conclusion: The CloVR-ITS pipeline provides fungal microbiota analysis that can be complementary to bacterial 16S rRNA and total metagenome sequence analysis, allowing for more comprehensive studies of environmental and host-associated microbial communities. PMID:24451270
GATA: A graphic alignment tool for comparative sequence analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nix, David A.; Eisen, Michael B.

2005-01-01

Several problems exist with current methods used to align DNA sequences for comparative sequence analysis. Most dynamic programming algorithms assume that conserved sequence elements are collinear. This assumption appears valid when comparing orthologous protein coding sequences. Functional constraints on proteins provide strong selective pressure against sequence inversions, and minimize sequence duplications and feature shuffling. For non-coding sequences this collinearity assumption is often invalid. For example, enhancers contain clusters of transcription factor binding sites that change in number, orientation, and spacing during evolution, yet the enhancer retains its activity. Dotplot analysis is often used to estimate non-coding sequence relatedness. Yet dot plots do not actually align sequences and thus cannot account well for base insertions or deletions. Moreover, they lack an adequate statistical framework for comparing sequence relatedness and are limited to pairwise comparisons. Lastly, dot plots and dynamic programming text outputs fail to provide an intuitive means for visualizing DNA alignments.
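To make the dot plot limitation concrete, here is a minimal sketch of a word-match dot plot in Python. It is a generic illustration of the dot plot idea discussed in the abstract, not GATA's algorithm; the function name and the `word` parameter are assumptions made for this example.

```python
def dotplot_matches(seq_a, seq_b, word=4):
    """Return (i, j) pairs where the word starting at seq_a[i]
    exactly matches the word starting at seq_b[j]."""
    # Index every word in seq_b by its start positions.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)

    # Look up each word of seq_a in that index.
    matches = []
    for i in range(len(seq_a) - word + 1):
        for j in index.get(seq_a[i:i + word], []):
            matches.append((i, j))
    return matches


# The shared word "ACGT" starts at position 0 in the first sequence
# and at position 2 in the second.
print(dotplot_matches("ACGTAC", "TTACGT", word=4))
```

Each match is an isolated point; nothing in this procedure links neighboring points across an insertion or deletion, and no score is attached to a run of points, which is the gap the abstract describes.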
Deep Sequencing to Identify the Causes of Viral Encephalitis

PubMed Central

Chan, Benjamin K.; Wilson, Theodore; Fischer, Kael F.; Kriesel, John D.

2014-01-01

Deep sequencing allows for a rapid, accurate characterization of microbial DNA and RNA sequences in many types of samples. Deep sequencing (also called next generation sequencing or NGS) is being developed to assist with the diagnosis of a wide variety of infectious diseases. In this study, seven frozen brain samples from deceased subjects with recent encephalitis were investigated. RNA from each sample was extracted, randomly reverse transcribed and sequenced. The sequence analysis was performed in a blinded fashion and confirmed with pathogen-specific PCR. This analysis successfully identified measles virus sequences in two brain samples and herpes simplex virus type-1 sequences in three brain samples. No pathogen was identified in the other two brain specimens. These results were concordant with pathogen-specific PCR and partially concordant with prior neuropathological examinations, demonstrating that deep sequencing can accurately identify viral infections in frozen brain tissue. PMID:24699691
Method for phosphorothioate antisense DNA sequencing by capillary electrophoresis with UV detection.

PubMed

Froim, D; Hopkins, C E; Belenky, A; Cohen, A S

1997-11-01

The progress of antisense DNA therapy demands development of reliable and convenient methods for sequencing short single-stranded oligonucleotides. A method of phosphorothioate antisense DNA sequencing analysis using UV detection coupled to capillary electrophoresis (CE) has been developed based on a modified chain termination sequencing method. The proposed method reduces the sequencing cost since it uses affordable CE-UV instrumentation and requires no labeling with minimal sample processing before analysis. Cycle sequencing with ThermoSequenase generates quantities of sequencing products that are readily detectable by UV. Discrimination of undesired components from sequencing products in the reaction mixture, previously accomplished by fluorescent or radioactive labeling, is now achieved by bringing concentrations of undesired components below the UV detection range which yields a 'clean', well defined sequence. UV detection coupled with CE offers additional conveniences for sequencing since it can be accomplished with commercially available CE-UV equipment and is readily amenable to automation.
Method for phosphorothioate antisense DNA sequencing by capillary electrophoresis with UV detection.

PubMed Central

Froim, D; Hopkins, C E; Belenky, A; Cohen, A S

1997-01-01

The progress of antisense DNA therapy demands development of reliable and convenient methods for sequencing short single-stranded oligonucleotides. A method of phosphorothioate antisense DNA sequencing analysis using UV detection coupled to capillary electrophoresis (CE) has been developed based on a modified chain termination sequencing method. The proposed method reduces the sequencing cost since it uses affordable CE-UV instrumentation and requires no labeling with minimal sample processing before analysis. Cycle sequencing with ThermoSequenase generates quantities of sequencing products that are readily detectable by UV. Discrimination of undesired components from sequencing products in the reaction mixture, previously accomplished by fluorescent or radioactive labeling, is now achieved by bringing concentrations of undesired components below the UV detection range which yields a 'clean', well defined sequence. UV detection coupled with CE offers additional conveniences for sequencing since it can be accomplished with commercially available CE-UV equipment and is readily amenable to automation. PMID:9336449
Analysis of sequences from field samples reveals the presence of the recently described pepper vein yellows virus (genus Polerovirus) in six additional countries.

PubMed

Knierim, Dennis; Tsai, Wen-Shi; Kenyon, Lawrence

2013-06-01

Polerovirus infection was detected by reverse transcription polymerase chain reaction (RT-PCR) in 29 pepper plants (Capsicum spp.) and one black nightshade plant (Solanum nigrum) sample collected from fields in India, Indonesia, Mali, Philippines, Thailand and Taiwan. At least two representative samples for each country were selected to generate a general polerovirus RT-PCR product of 1.4 kb length for sequencing. Sequence analysis of the partial genome sequences revealed the presence of pepper vein yellows virus (PeVYV) in all 13 samples. A 1990 Australian herbarium sample of pepper described by serological means as infected with capsicum yellows virus (CYV) was identified by sequence analysis of a partial CP sequence as probably infected with a potato leaf roll virus (PLRV) isolate.
VISA--Vector Integration Site Analysis server: a web-based server to rapidly identify retroviral integration sites from next-generation sequencing.

PubMed

Hocum, Jonah D; Battrell, Logan R; Maynard, Ryan; Adair, Jennifer E; Beard, Brian C; Rawlings, David J; Kiem, Hans-Peter; Miller, Daniel G; Trobridge, Grant D

2015-07-07

Analyzing the integration profile of retroviral vectors is a vital step in determining their potential genotoxic effects and developing safer vectors for therapeutic use. Identifying retroviral vector integration sites is also important for retroviral mutagenesis screens. We developed VISA, a vector integration site analysis server, to analyze next-generation sequencing data for retroviral vector integration sites. Sequence reads that contain a provirus are mapped to the human genome, sequence reads that cannot be localized to a unique location in the genome are filtered out, and then unique retroviral vector integration sites are determined based on the alignment scores of the remaining sequence reads. VISA offers a simple web interface to upload sequence files and results are returned in a concise tabular format to allow rapid analysis of retroviral vector integration sites.
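The filtering logic summarized above (discard reads that cannot be placed uniquely, then collapse the rest into distinct sites) can be sketched as follows. This is not VISA's implementation. It assumes each read comes with a list of candidate alignments, and it treats a read as uniquely placed when its best alignment score is strictly higher than the runner-up, which is a simplification chosen for this example. All names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One candidate alignment: (chromosome, position, strand, alignment score).
Hit = Tuple[str, int, str, int]


def unique_integration_sites(
    read_hits: Dict[str, List[Hit]]
) -> Dict[Tuple[str, int, str], int]:
    """Map each uniquely placed integration site to its supporting read count."""
    sites = Counter()
    for hits in read_hits.values():
        if not hits:
            continue  # read did not align anywhere
        ranked = sorted(hits, key=lambda h: h[3], reverse=True)
        if len(ranked) > 1 and ranked[1][3] == ranked[0][3]:
            continue  # tie for best score: location is ambiguous, drop the read
        chrom, pos, strand, _score = ranked[0]
        sites[(chrom, pos, strand)] += 1
    return dict(sites)


reads = {
    "r1": [("chr1", 1000, "+", 60)],
    "r2": [("chr1", 1000, "+", 55), ("chr5", 20, "-", 30)],
    "r3": [("chr2", 500, "-", 40), ("chr9", 77, "+", 40)],  # ambiguous
    "r4": [],                                               # unaligned
}
print(unique_integration_sites(reads))
```

Here `r1` and `r2` support the same site on chr1, while `r3` is dropped because its two best alignments tie.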
Laser Desorption Mass Spectrometry for DNA Sequencing and Analysis

NASA Astrophysics Data System (ADS)

Chen, C. H. Winston; Taranenko, N. I.; Golovlev, V. V.; Isola, N. R.; Allman, S. L.

1998-03-01

Rapid DNA sequencing and/or analysis is critically important for biomedical research. In the past, gel electrophoresis has been the primary tool to achieve DNA analysis and sequencing. However, gel electrophoresis is a time-consuming and labor-intensive process. Recently, we have developed and used laser desorption mass spectrometry (LDMS) to achieve sequencing of ss-DNA longer than 100 nucleotides. With LDMS, we succeeded in sequencing DNA in seconds instead of the hours or days required by gel electrophoresis. In addition to sequencing, we also applied LDMS to the detection of DNA probes for hybridization. LDMS was also used to detect short tandem repeats for forensic applications. Clinical applications for disease diagnosis, such as cystic fibrosis caused by base deletion and point mutation, have also been demonstrated. Experimental details will be presented at the meeting.
Association mining of dependency between time series

NASA Astrophysics Data System (ADS)

Hafez, Alaaeldin

2001-03-01

Time series analysis is considered a crucial component of strategic control across a broad variety of disciplines in business, science and engineering. Time series data is a sequence of observations collected over intervals of time. Each time series describes a phenomenon as a function of time. Analysis of time series data includes discovering trends (or patterns) in a time series sequence. In the last few years, data mining has emerged and been recognized as a new technology for data analysis. Data mining is the process of discovering potentially valuable patterns, associations, trends, sequences and dependencies in data. Data mining techniques can discover information that many traditional business analysis and statistical techniques fail to deliver. In this paper, we adapt and innovate data mining techniques to analyze time series data. By using data mining techniques, maximal frequent patterns are discovered and used in predicting future sequences or trends, where trends describe the behavior of a sequence. In order to include different types of time series (e.g. irregular and non-systematic), we consider past frequent patterns of the same time sequences (local patterns) and of other dependent time sequences (global patterns). We use the word 'dependent' instead of the word 'similar' to emphasize real-life time series where two time series sequences could be completely different (in values, shapes, etc.) but still react to the same conditions in a dependent way. In this paper, we propose the Dependence Mining Technique, which could be used in predicting time series sequences. The proposed technique consists of three phases: (a) for all time series sequences, generate their trend sequences; (b) discover maximal frequent trend patterns and generate pattern vectors (to keep information of frequent trend patterns); (c) use trend pattern vectors to predict future time series sequences.
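The three phases named in this abstract can be illustrated with a small sketch. It is a simplification under stated assumptions, not the authors' Dependence Mining Technique: the U/D/S trend alphabet is invented here, patterns have a fixed length rather than being maximal, and only local patterns from a single series are used.

```python
from collections import Counter


def trend_sequence(values):
    """Phase (a): encode each step as U (up), D (down), or S (same)."""
    symbols = []
    for prev, cur in zip(values, values[1:]):
        symbols.append("U" if cur > prev else "D" if cur < prev else "S")
    return "".join(symbols)


def frequent_patterns(trend, length, min_support):
    """Phase (b), simplified: fixed-length patterns seen at least min_support times."""
    counts = Counter(trend[i:i + length] for i in range(len(trend) - length + 1))
    return {p: c for p, c in counts.items() if c >= min_support}


def predict_next(trend, patterns, length):
    """Phase (c), simplified: extend the current suffix with the best-supported pattern."""
    suffix = trend[-(length - 1):] if length > 1 else ""
    candidates = {p: c for p, c in patterns.items() if p[:-1] == suffix}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)[-1]


series = [1, 2, 3, 2, 3, 4, 3, 4, 5, 4]
trend = trend_sequence(series)                     # "UUDUUDUUD"
patterns = frequent_patterns(trend, length=3, min_support=2)
print(predict_next(trend, patterns, length=3))     # prints "U"
```

Handling dependent series (the abstract's global patterns) would require mining patterns across several trend sequences at once, which this sketch does not attempt.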
Novel methodologies for spectral classification of exon and intron sequences

NASA Astrophysics Data System (ADS)

Kwan, Hon Keung; Kwan, Benjamin Y. M.; Kwan, Jennifer Y. Y.

2012-12-01

Digital processing of a nucleotide sequence requires it to be mapped to a numerical sequence in which the choice of nucleotide to numeric mapping affects how well its biological properties can be preserved and reflected from nucleotide domain to numerical domain. Digital spectral analysis of nucleotide sequences unfolds a period-3 power spectral value which is more prominent in an exon sequence as compared to that of an intron sequence. The success of a period-3 based exon and intron classification depends on the choice of a threshold value. The main purposes of this article are to introduce novel codes for 1-sequence numerical representations for spectral analysis and compare them to existing codes to determine appropriate representation, and to introduce novel thresholding methods for more accurate period-3 based exon and intron classification of an unknown sequence. The main findings of this study are summarized as follows: Among sixteen 1-sequence numerical representations, the K-Quaternary Code I offers an attractive performance. A windowed 1-sequence numerical representation (with window length of 9, 15, and 24 bases) offers a possible speed gain over non-windowed 4-sequence Voss representation which increases as sequence length increases. A winner threshold value (chosen from the best among two defined threshold values and one other threshold value) offers a top precision for classifying an unknown sequence of specified fixed lengths. An interpolated winner threshold value applicable to an unknown and arbitrary length sequence can be estimated from the winner threshold values of fixed length sequences with a comparable performance. In general, precision increases as sequence length increases. The study contributes an effective spectral analysis of nucleotide sequences to better reveal embedded properties, and has potential applications in improved genome annotation.
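To make the period-3 idea concrete, the sketch below computes the power at frequency index N/3 using the four-sequence Voss (binary indicator) representation mentioned in the abstract. Normalizing by the mean spectral power is an assumption made here for illustration; the article's 1-sequence codes and its thresholding methods are not reproduced.

```python
import cmath


def voss_power(seq, k):
    """DFT power at index k, summed over the four Voss indicator sequences."""
    n = len(seq)
    total = 0.0
    for base in "ACGT":
        coeff = sum(
            cmath.exp(-2j * cmath.pi * k * i / n)
            for i, s in enumerate(seq)
            if s == base
        )
        total += abs(coeff) ** 2
    return total


def period3_ratio(seq):
    """Power at N/3 divided by the mean power over k = 1..N/2 (assumed normalization)."""
    seq = seq.upper()
    n = len(seq) - len(seq) % 3      # trim so that N is a multiple of 3
    if n < 6:
        raise ValueError("need at least 6 bases")
    seq = seq[:n]
    peak = voss_power(seq, n // 3)
    half = n // 2
    mean = sum(voss_power(seq, k) for k in range(1, half + 1)) / half
    return peak / mean if mean > 1e-9 else 0.0
```

A sequence with strong codon-like periodicity, such as `"ATG" * 20`, gives a large ratio, while a period-4 repeat such as `"ATGC" * 15` gives a ratio near zero. Classification would then compare the ratio against a threshold, and the abstract notes that the choice of threshold is what determines success.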
Noncoding sequence classification based on wavelet transform analysis: part II

NASA Astrophysics Data System (ADS)

Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez-Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.

2017-09-01

DNA sequences in the human genome can be divided into coding and noncoding ones. We hypothesize that the characteristic periodicities of the noncoding sequences are related to their function. We describe the procedure to identify these characteristic periodicities using wavelet analysis. Our results show that three groups of noncoding sequences, each with a different biological function, may be differentiated by their wavelet coefficients within a specific frequency range.
A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection

PubMed Central

Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike

2018-01-01

ABSTRACT Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank.
IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection. PMID:29564396
A Reference Viral Database (RVDB) To Enhance Bioinformatics Analysis of High-Throughput Sequencing for Novel Virus Detection.

PubMed

Goodacre, Norman; Aljanahi, Aisha; Nandakumar, Subhiksha; Mikailov, Mike; Khan, Arifa S

2018-01-01

Detection of distantly related viruses by high-throughput sequencing (HTS) is bioinformatically challenging because of the lack of a public database containing all viral sequences, without abundant nonviral sequences, which can extend runtime and obscure viral hits. Our reference viral database (RVDB) includes all viral, virus-related, and virus-like nucleotide sequences (excluding bacterial viruses), regardless of length, and with overall reduced cellular sequences. Semantic selection criteria (SEM-I) were used to select viral sequences from GenBank, resulting in a first-generation viral database (VDB). This database was manually and computationally reviewed, resulting in refined, semantic selection criteria (SEM-R), which were applied to a new download of updated GenBank sequences to create a second-generation VDB. Viral entries in the latter were clustered at 98% by CD-HIT-EST to reduce redundancy while retaining high viral sequence diversity. The viral identity of the clustered representative sequences (creps) was confirmed by BLAST searches in NCBI databases and HMMER searches in PFAM and DFAM databases. The resulting RVDB contained a broad representation of viral families, sequence diversity, and a reduced cellular content; it includes full-length and partial sequences and endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Testing of RVDBv10.2 with an in-house HTS transcriptomic data set indicated a significantly faster run for virus detection than interrogating the entirety of the NCBI nonredundant nucleotide database, which contains all viral sequences but also nonviral sequences. RVDB is publicly available for facilitating HTS analysis, particularly for novel virus detection. It is meant to be updated on a regular basis to include new viral sequences added to GenBank.
IMPORTANCE To facilitate bioinformatics analysis of high-throughput sequencing (HTS) data for the detection of both known and novel viruses, we have developed a new reference viral database (RVDB) that provides a broad representation of different virus species from eukaryotes by including all viral, virus-like, and virus-related sequences (excluding bacteriophages), regardless of their size. In particular, RVDB contains endogenous nonretroviral elements, endogenous retroviruses, and retrotransposons. Sequences were clustered to reduce redundancy while retaining high viral sequence diversity. A particularly useful feature of RVDB is the reduction of cellular sequences, which can enhance the run efficiency of large transcriptomic and genomic data analysis and increase the specificity of virus detection.
PHASTpep: Analysis Software for Discovery of Cell-Selective Peptides via Phage Display and Next-Generation Sequencing

PubMed Central

Dasa, Siva Sai Krishna; Kelly, Kimberly A.

2016-01-01

Next-generation sequencing has enhanced the phage display process, allowing for the quantification of millions of sequences resulting from the biopanning process. In response, many valuable analysis programs focused on specificity and finding targeted motifs or consensus sequences were developed. For targeted drug delivery and molecular imaging, it is also necessary to find peptides that are selective—targeting only the cell type or tissue of interest. We present a new analysis strategy and accompanying software, PHage Analysis for Selective Targeted PEPtides (PHASTpep), which identifies highly specific and selective peptides. Using this process, we discovered and validated, both in vitro and in vivo in mice, two sequences (HTTIPKV and APPIMSV) targeted to pancreatic cancer-associated fibroblasts that escaped identification using previously existing software. Our selectivity analysis makes it possible to discover peptides that target a specific cell type and avoid other cell types, enhancing clinical translatability by circumventing complications with systemic use. PMID:27186887
Illuminator, a desktop program for mutation detection using short-read clonal sequencing.

PubMed

Carr, Ian M; Morgan, Joanne E; Diggle, Christine P; Sheridan, Eamonn; Markham, Alexander F; Logan, Clare V; Inglehearn, Chris F; Taylor, Graham R; Bonthron, David T

2011-10-01

Current methods for sequencing clonal populations of DNA molecules yield several gigabases of data per day, typically comprising reads of < 100 nt. Such datasets permit widespread genome resequencing and transcriptome analysis or other quantitative tasks. However, this huge capacity can also be harnessed for the resequencing of smaller (gene-sized) target regions, through the simultaneous parallel analysis of multiple subjects, using sample "tagging" or "indexing". These methods promise to have a huge impact on diagnostic mutation analysis and candidate gene testing. Here we describe a software package developed for such studies, offering the ability to resolve pooled samples carrying barcode tags and to align reads to a reference sequence using a mutation-tolerant process. The program, Illuminator, can identify rare sequence variants, including insertions and deletions, and permits interactive data analysis on standard desktop computers. It facilitates the effective analysis of targeted clonal sequencer data without dedicated computational infrastructure or specialized training. Copyright © 2011 Elsevier Inc. All rights reserved.
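Resolving pooled samples by barcode tag, as described above, might look like the following sketch. It assumes the tag is a fixed-length prefix at the 5' end of each read and tolerates a small number of mismatches; the function names and the `max_mismatches` parameter are hypothetical and are not taken from Illuminator.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Assign reads to samples by their leading barcode tag.

    barcodes maps sample name -> tag. Reads that match no tag, or more
    than one tag, are collected under the key None with the tag untrimmed.
    """
    bins = {name: [] for name in barcodes}
    bins[None] = []
    for read in reads:
        hits = [
            name
            for name, tag in barcodes.items()
            if len(read) >= len(tag)
            and hamming(read[:len(tag)], tag) <= max_mismatches
        ]
        if len(hits) == 1:
            tag = barcodes[hits[0]]
            bins[hits[0]].append(read[len(tag):])  # strip the tag before alignment
        else:
            bins[None].append(read)
    return bins
```

Rejecting ambiguous reads rather than guessing keeps cross-sample contamination out of the downstream variant calls, at the cost of discarding some data.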
A Web-Hosted R Workflow to Simplify and Automate the Analysis of 16S NGS Data

EPA Science Inventory

Next-Generation Sequencing (NGS) produces large data sets that include tens-of-thousands of sequence reads per sample. For analysis of bacterial diversity, 16S NGS sequences are typically analyzed in a workflow containing best-of-breed bioinformatics packages that may levera...
Detecting and Estimating Contamination of Human DNA Samples in Sequencing and Array-Based Genotype Data

PubMed Central

Jun, Goo; Flickinger, Matthew; Hetrick, Kurt N.; Romm, Jane M.; Doheny, Kimberly F.; Abecasis, Gonçalo R.; Boehnke, Michael; Kang, Hyun Min

2012-01-01

DNA sample contamination is a serious problem in DNA sequencing studies and may result in systematic genotype misclassification and false positive associations. Although methods exist to detect and filter out cross-species contamination, few methods to detect within-species sample contamination are available. In this paper, we describe methods to identify within-species DNA sample contamination based on (1) a combination of sequencing reads and array-based genotype data, (2) sequence reads alone, and (3) array-based genotype data alone. Analysis of sequencing reads allows contamination detection after sequence data is generated but prior to variant calling; analysis of array-based genotype data allows contamination detection prior to generation of costly sequence data. Through a combination of analysis of in silico and experimentally contaminated samples, we show that our methods can reliably detect and estimate levels of contamination as low as 1%. We evaluate the impact of DNA contamination on genotype accuracy and propose effective strategies to screen for and prevent DNA contamination in sequencing studies. PMID:23103226
Recursive sequences in first-year calculus

NASA Astrophysics Data System (ADS)

Krainer, Thomas

2016-02-01

This article provides ready-to-use supplementary material on recursive sequences for a second-semester calculus class. It equips first-year calculus students with a basic methodical procedure based on which they can conduct a rigorous convergence or divergence analysis of many simple recursive sequences on their own without the need to invoke inductive arguments as is typically required in calculus textbooks. The sequences that are accessible to this kind of analysis are predominantly (eventually) monotonic, but also certain recursive sequences that alternate around their limit point as they converge can be considered.
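The two kinds of behavior mentioned in this abstract can be checked numerically with a short script. The two recursions below are standard textbook examples chosen here for illustration, not necessarily ones used in the article, and a numerical check is of course no substitute for the rigorous analysis the article teaches.

```python
import math


def iterate(f, a0, n):
    """Return the terms a_0, a_1, ..., a_n of the recursion a_{k+1} = f(a_k)."""
    terms = [a0]
    for _ in range(n):
        terms.append(f(terms[-1]))
    return terms


def is_monotonic(terms):
    """True if the terms are non-decreasing or non-increasing throughout."""
    pairs = list(zip(terms, terms[1:]))
    return all(b >= a for a, b in pairs) or all(b <= a for a, b in pairs)


# Monotonic example: a_{n+1} = sqrt(2 + a_n), a_0 = 0, increases toward 2.
mono = iterate(lambda a: math.sqrt(2 + a), 0.0, 30)

# Alternating example: a_{n+1} = 1 / (1 + a_n), a_0 = 1, oscillates around
# the fixed point L = (sqrt(5) - 1) / 2, which solves L = 1 / (1 + L).
alt = iterate(lambda a: 1 / (1 + a), 1.0, 40)
limit = (math.sqrt(5) - 1) / 2

print(is_monotonic(mono), abs(mono[-1] - 2) < 1e-9)      # True True
print(is_monotonic(alt), abs(alt[-1] - limit) < 1e-9)    # False True
```

In both cases the candidate limit comes from solving the fixed-point equation L = f(L); the script only confirms that the iterates approach it.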
Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase).

PubMed

Odronitz, Florian; Kollmar, Martin

2006-11-29

Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationships. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein.
Design and Analysis of Single-Cell Sequencing Experiments.

PubMed

Grün, Dominic; van Oudenaarden, Alexander

2015-11-05

Recent advances in single-cell sequencing hold great potential for exploring biological systems with unprecedented resolution. Sequencing the genome of individual cells can reveal somatic mutations and allows the investigation of clonal dynamics. Single-cell transcriptome sequencing can elucidate the cell type composition of a sample. However, single-cell sequencing comes with major technical challenges and yields complex data output. In this Primer, we provide an overview of available methods and discuss experimental design and single-cell data analysis. We hope that these guidelines will enable a growing number of researchers to leverage the power of single-cell sequencing. Copyright © 2015 Elsevier Inc. All rights reserved.

Partial sequencing of sodA gene and its application to identification of Streptococcus dysgalactiae subsp. dysgalactiae isolated from farmed fish.

PubMed

Nomoto, R; Kagawa, H; Yoshida, T

2008-01-01

The aim was to investigate the difference between Lancefield group C Streptococcus dysgalactiae (GCSD) strains isolated from diseased fish and animals by sequencing and phylogenetic analysis of the sodA gene. The sodA genes of Strep. dysgalactiae strains isolated from fish and animals were amplified and their nucleotide sequences were determined. Although 100% sequence identity was observed among fish GCSD strains, the sequences determined from animal isolates showed variations relative to the fish isolate sequences. Thus, all fish GCSD strains were clearly separated from GCSD strains of other origins by phylogenetic tree analysis. In addition, an original primer set was designed based on the determined sequences to specifically amplify the sodA gene of fish GCSD strains. The primer set yielded amplification products only from fish GCSD strains. Sequencing analysis of the sodA gene demonstrated the genetic divergence between Strep. dysgalactiae strains isolated from fish and mammals. Moreover, an original oligonucleotide primer set that can simply detect the genotype of fish GCSD strains was designed. This study shows that Strep. dysgalactiae isolated from diseased fish can be distinguished from conventional GCSD strains by the difference in the sequence of the sodA gene.
Integrated databanks access and sequence/structure analysis services at the PBIL.

PubMed

Perrière, Guy; Combet, Christophe; Penel, Simon; Blanchet, Christophe; Thioulouse, Jean; Geourjon, Christophe; Grassot, Julien; Charavay, Céline; Gouy, Manolo; Duret, Laurent; Deléage, Gilbert

2003-07-01

The World Wide Web server of the PBIL (Pôle Bioinformatique Lyonnais) provides on-line access to sequence databanks and to many tools for nucleic acid and protein sequence analysis. This server allows users to query nucleotide sequence banks in the EMBL and GenBank formats and protein sequence banks in the SWISS-PROT and PIR formats. The query engine on which our data bank access is based is the ACNUC system. It makes it possible to build complex queries to access functional zones of biological interest and to retrieve large sequence sets. Of special interest are the unique features provided by this system to query the data banks of gene families developed at the PBIL. The server also provides access to a wide range of sequence analysis methods: similarity search programs, multiple alignments, protein structure prediction and multivariate statistics. An original feature of this server is the integration of these two aspects: sequence retrieval and sequence analysis. Indeed, thanks to the introduction of re-usable lists, it is possible to perform treatments on large sets of data. The PBIL server can be reached at: http://pbil.univ-lyon1.fr.
Quantiprot - a Python package for quantitative analysis of protein sequences.

PubMed

Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

2017-07-17

The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
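Two of the measures named in this abstract, the n-gram distribution and a Zipf's law coefficient, can be illustrated in plain Python. The sketch deliberately does not use the Quantiprot API, whose function names are not given here; the helper names are hypothetical, and estimating the coefficient as a least-squares slope on the log-log rank-frequency plot is one common convention, assumed rather than taken from the package.

```python
import math
from collections import Counter


def ngram_counts(seq, n=2):
    """Count overlapping n-grams in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def zipf_slope(counts):
    """Least-squares slope of log(frequency) against log(rank).

    A slope near -1 corresponds to classic Zipfian behavior.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


counts = ngram_counts("MKVLAAGIVGLLLAAGKVLA", n=2)
print(counts.most_common(3), zipf_slope(counts))
```

Because both values are computed from the sequence alone, they can serve as coordinates in the alignment-free feature space the abstract describes.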
A proteomic analysis of leaf sheaths from rice.

PubMed

Shen, Shihua; Matsubae, Masami; Takao, Toshifumi; Tanaka, Naoki; Komatsu, Setsuko

2002-10-01

The proteins extracted from the leaf sheaths of rice seedlings were separated by 2-D PAGE, and analyzed by Edman sequencing and mass spectrometry, followed by database searching. Image analysis revealed 352 protein spots on 2-D PAGE after staining with Coomassie Brilliant Blue. The amino acid sequences of 44 of 84 proteins were determined; for 31 of these proteins, a clear function could be assigned, whereas for 12 proteins, no function could be assigned. Forty proteins did not yield amino acid sequence information, because they were N-terminally blocked, or the obtained sequences were too short and/or did not give unambiguous results. Fifty-nine proteins were analyzed by mass spectrometry; all of these proteins were identified by matching to the protein database. The amino acid sequences of 19 of 27 proteins analyzed by mass spectrometry were similar to the results of Edman sequencing. These results suggest that 2-D PAGE combined with Edman sequencing and mass spectrometry analysis can be effectively used to identify plant proteins.
A comprehensive assessment of somatic mutation detection in cancer using whole-genome sequencing

PubMed Central

Alioto, Tyler S.; Buchhalter, Ivo; Derdak, Sophia; Hutter, Barbara; Eldridge, Matthew D.; Hovig, Eivind; Heisler, Lawrence E.; Beck, Timothy A.; Simpson, Jared T.; Tonon, Laurie; Sertier, Anne-Sophie; Patch, Ann-Marie; Jäger, Natalie; Ginsbach, Philip; Drews, Ruben; Paramasivam, Nagarajan; Kabbe, Rolf; Chotewutmontri, Sasithorn; Diessl, Nicolle; Previti, Christopher; Schmidt, Sabine; Brors, Benedikt; Feuerbach, Lars; Heinold, Michael; Gröbner, Susanne; Korshunov, Andrey; Tarpey, Patrick S.; Butler, Adam P.; Hinton, Jonathan; Jones, David; Menzies, Andrew; Raine, Keiran; Shepherd, Rebecca; Stebbings, Lucy; Teague, Jon W.; Ribeca, Paolo; Giner, Francesc Castro; Beltran, Sergi; Raineri, Emanuele; Dabad, Marc; Heath, Simon C.; Gut, Marta; Denroche, Robert E.; Harding, Nicholas J.; Yamaguchi, Takafumi N.; Fujimoto, Akihiro; Nakagawa, Hidewaki; Quesada, Víctor; Valdés-Mas, Rafael; Nakken, Sigve; Vodák, Daniel; Bower, Lawrence; Lynch, Andrew G.; Anderson, Charlotte L.; Waddell, Nicola; Pearson, John V.; Grimmond, Sean M.; Peto, Myron; Spellman, Paul; He, Minghui; Kandoth, Cyriac; Lee, Semin; Zhang, John; Létourneau, Louis; Ma, Singer; Seth, Sahil; Torrents, David; Xi, Liu; Wheeler, David A.; López-Otín, Carlos; Campo, Elías; Campbell, Peter J.; Boutros, Paul C.; Puente, Xose S.; Gerhard, Daniela S.; Pfister, Stefan M.; McPherson, John D.; Hudson, Thomas J.; Schlesner, Matthias; Lichter, Peter; Eils, Roland; Jones, David T. W.; Gut, Ivo G.

2015-01-01

As whole-genome sequencing for cancer genome analysis becomes a clinical tool, a full understanding of the variables affecting sequencing analysis output is required. Here using tumour-normal sample pairs from two different types of cancer, chronic lymphocytic leukaemia and medulloblastoma, we conduct a benchmarking exercise within the context of the International Cancer Genome Consortium. We compare sequencing methods, analysis pipelines and validation methods. We show that using PCR-free methods and increasing sequencing depth to ∼100 × shows benefits, as long as the tumour:control coverage ratio remains balanced. We observe widely varying mutation call rates and low concordance among analysis pipelines, reflecting the artefact-prone nature of the raw data and lack of standards for dealing with the artefacts. However, we show that, using the benchmark mutation set we have created, many issues are in fact easy to remedy and have an immediate positive impact on mutation detection accuracy. PMID:26647970
Influenza Virus Database (IVDB): an integrated information resource and analysis platform for influenza virus research.

PubMed

Chang, Suhua; Zhang, Jiajie; Liao, Xiaoyun; Zhu, Xinxing; Wang, Dahai; Zhu, Jiang; Feng, Tao; Zhu, Baoli; Gao, George F; Wang, Jian; Yang, Huanming; Yu, Jun; Wang, Jing

2007-01-01

Frequent outbreaks of highly pathogenic avian influenza and the increasing data available for comparative analysis require a central database specialized in influenza viruses (IVs). We have established the Influenza Virus Database (IVDB) to integrate information and create an analysis platform for genetic, genomic, and phylogenetic studies of the virus. IVDB hosts complete genome sequences of influenza A virus generated by Beijing Institute of Genomics (BIG) and curates all other published IV sequences after expert annotation. Our Q-Filter system classifies and ranks all nucleotide sequences into seven categories according to sequence content and integrity. IVDB provides a series of tools and viewers for comparative analysis of the viral genomes, genes, genetic polymorphisms and phylogenetic relationships. A search system has been developed for users to retrieve a combination of different data types by setting search options. To facilitate analysis of global viral transmission and evolution, the IV Sequence Distribution Tool (IVDT) has been developed to display the worldwide geographic distribution of chosen viral genotypes and to couple genomic data with epidemiological data. The BLAST, multiple sequence alignment and phylogenetic analysis tools were integrated for online data analysis. Furthermore, IVDB offers instant access to pre-computed alignments and polymorphisms of IV genes and proteins, and presents the results as SNP distribution plots and minor allele distributions. IVDB is publicly available at http://influenza.genomics.org.cn.
SUGAR: graphical user interface-based data refiner for high-throughput DNA sequencing.

PubMed

Sato, Yukuto; Kojima, Kaname; Nariai, Naoki; Yamaguchi-Kabata, Yumi; Kawai, Yosuke; Takahashi, Mamoru; Mimori, Takahiro; Nagasaki, Masao

2014-08-08

Next-generation sequencers (NGSs) have become one of the main tools for current biology. To obtain useful insights from the NGS data, it is essential to control low-quality portions of the data affected by technical errors such as air bubbles in sequencing fluidics. We developed SUGAR (subtile-based GUI-assisted refiner), software that can handle ultra-high-throughput data with a user-friendly graphical user interface (GUI) and interactive analysis capability. SUGAR generates high-resolution quality heatmaps of the flowcell, enabling users to find possible signals of technical errors during the sequencing. The sequencing data generated from the error-affected regions of a flowcell can be selectively removed by automated analysis or GUI-assisted operations implemented in SUGAR. The automated data-cleaning function based on sequence read quality (Phred) scores was applied to public whole human genome sequencing data, and we proved that the overall mapping quality was improved. The detailed data evaluation and cleaning enabled by SUGAR would reduce technical problems in sequence read mapping, improving subsequent variant analyses that require high-quality sequence data and mapping results. Therefore, the software will be especially useful for controlling the quality of variant calls for low-population cells, e.g., cancers, in a sample with technical errors of sequencing procedures.
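A rough sketch of Phred-based cleaning by flowcell region is given below. It works at whole-tile resolution and uses a mean-quality cutoff, both simplifying assumptions: SUGAR itself operates on finer subtiles, and its actual criteria are not stated in the abstract. The header layout assumed is the colon-separated Illumina style in which lane and tile are the fourth and fifth fields, and qualities are assumed to be Phred+33 encoded.

```python
from collections import defaultdict


def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into Phred scores (Phred+33 assumed)."""
    return [ord(c) - offset for c in quality]


def tile_of(header):
    """Extract (lane, tile) from a header such as '@M1:1:FC:1:1101:100:200 1:N:0:1'."""
    fields = header.lstrip("@").split(" ")[0].split(":")
    return fields[3], fields[4]


def low_quality_tiles(records, min_mean_q=20.0):
    """Return the set of (lane, tile) whose mean base quality is below the cutoff.

    records is an iterable of (header, quality_string) pairs.
    """
    sums = defaultdict(lambda: [0, 0])   # tile -> [total score, base count]
    for header, quality in records:
        scores = phred_scores(quality)
        acc = sums[tile_of(header)]
        acc[0] += sum(scores)
        acc[1] += len(scores)
    return {tile for tile, (total, n) in sums.items() if n and total / n < min_mean_q}
```

Reads whose tile appears in the returned set could then be dropped before mapping, which mirrors the selective removal of error-affected regions described above.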
Geoseq: a tool for dissecting deep-sequencing datasets.

PubMed

Gurtowski, James; Cancio, Anthony; Shah, Hardik; Levovitz, Chaya; George, Ajish; Homann, Robert; Sachidanandam, Ravi

2010-10-12

Datasets generated on deep-sequencing platforms have been deposited in various public repositories such as the Gene Expression Omnibus (GEO), Sequence Read Archive (SRA) hosted by the NCBI, or the DNA Data Bank of Japan (DDBJ). Despite being rich data sources, they have not been used much due to the difficulty in locating and analyzing datasets of interest. Geoseq (http://geoseq.mssm.edu) provides a new method of analyzing short reads from deep sequencing experiments. Instead of mapping the reads to reference genomes or sequences, Geoseq maps a reference sequence against the sequencing data. It is web-based, and holds pre-computed data from public libraries. The analysis reduces the input sequence to tiles and measures the coverage of each tile in a sequence library through the use of suffix arrays. The user can upload custom target sequences or use gene/miRNA names for the search and get back results as plots and spreadsheet files. Geoseq organizes the public sequencing data using a controlled vocabulary, allowing identification of relevant libraries by organism, tissue and type of experiment. Analysis of small sets of sequences against deep-sequencing datasets, as well as identification of public datasets of interest, is simplified by Geoseq. We applied Geoseq to (a) identify differential isoform expression in mRNA-seq datasets, (b) identify miRNAs (microRNAs) in libraries and identify mature and star sequences in miRNAs, and (c) identify potentially mis-annotated miRNAs. The ease of using Geoseq for these analyses suggests its utility and uniqueness as an analysis tool.
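The tile-and-coverage idea can be sketched as follows. Note the substitution: Geoseq uses suffix arrays over the library, whereas this illustration builds a plain hash-based k-mer index, which is simpler to show but less memory-efficient. Overlapping tiles with a step of one base and the `tile_len` parameter are assumptions, since the abstract does not give the tiling scheme.

```python
from collections import Counter


def tile_coverage(reference, reads, tile_len=4):
    """Count, for each tile of the reference, how often it occurs in the reads.

    Returns a list of (tile_start, count) pairs in reference order.
    """
    index = Counter()
    for read in reads:
        for i in range(len(read) - tile_len + 1):
            index[read[i:i + tile_len]] += 1
    return [
        (start, index[reference[start:start + tile_len]])
        for start in range(len(reference) - tile_len + 1)
    ]


print(tile_coverage("ACGTAC", ["ACGT", "CGTA", "ACGT"]))
# prints [(0, 2), (1, 1), (2, 0)]
```

Plotting the counts against tile start position gives a coverage profile of the reference across the library, analogous to the plots the tool is described as returning.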
Initial sequencing and comparative analysis of the mouse genome

DOE Office of Scientific and Technical Information (OSTI.GOV)

Waterston, Robert H.; Lindblad-Toh, Kerstin; Birney, Ewan

2002-12-15

The sequence of the mouse genome is a key informational tool for understanding the contents of the human genome and a key experimental tool for biomedical research. Here, we report the results of an international collaboration to produce a high-quality draft sequence of the mouse genome. We also present an initial comparative analysis of the mouse and human genomes, describing some of the insights that can be gleaned from the two sequences. We discuss topics including the analysis of the evolutionary forces shaping the size, structure and sequence of the genomes; the conservation of large-scale synteny across most of the genomes; the much lower extent of sequence orthology covering less than half of the genomes; the proportions of the genomes under selection; the number of protein-coding genes; the expansion of gene families related to reproduction and immunity; the evolution of proteins; and the identification of intraspecies polymorphism.
ReSeqTools: an integrated toolkit for large-scale next-generation sequencing based resequencing analysis.

PubMed

He, W; Zhao, S; Liu, X; Dong, S; Lv, J; Liu, D; Wang, J; Meng, Z

2013-12-04

Large-scale next-generation sequencing (NGS)-based resequencing detects sequence variations, constructs evolutionary histories, and identifies phenotype-related genotypes. However, NGS-based resequencing studies generate extraordinarily large amounts of data, making computations difficult. Effective use and analysis of these data for NGS-based resequencing studies remains a difficult task for individual researchers. Here, we introduce ReSeqTools, a full-featured toolkit for NGS (Illumina sequencing)-based resequencing analysis, which processes raw data, interprets mapping results, and identifies and annotates sequence variations. ReSeqTools provides abundant scalable functions for routine resequencing analysis in different modules to facilitate customization of the analysis pipeline. ReSeqTools is designed to use compressed data files as input or output to save storage space and facilitates faster and more computationally efficient large-scale resequencing studies in a user-friendly manner. It offers abundant practical functions and generates useful statistics during the analysis pipeline, which significantly simplifies resequencing analysis. Its integrated algorithms and abundant sub-functions provide a solid foundation for special demands in resequencing projects. Users can combine these functions to construct their own pipelines for other purposes.
Implication of the cause of differences in 3D structures of proteins with high sequence identity based on analyses of amino acid sequences and 3D structures.

PubMed

Matsuoka, Masanari; Sugita, Masatake; Kikuchi, Takeshi

2014-09-18

Proteins that share a high sequence homology while exhibiting drastically different 3D structures are investigated in this study. Recently, artificial proteins related to the sequences of the GA and IgG binding GB domains of human serum albumin have been designed. These artificial proteins, referred to as GA and GB, share 98% amino acid sequence identity but exhibit different 3D structures, namely, a 3α bundle versus a 4β + α structure. Discriminating between their 3D structures based on their amino acid sequences is a very difficult problem. In the present work, in addition to using bioinformatics techniques, an analysis based on inter-residue average distance statistics is used to address this problem. It was hard to distinguish which structure a given sequence would take only with the results of ordinary analyses like BLAST and conservation analyses. However, in addition to these analyses, with the analysis based on the inter-residue average distance statistics and our sequence tendency analysis, we could infer which part would play an important role in its structural formation. The results suggest possible determinants of the different 3D structures for sequences with high sequence identity. The possibility of discriminating between the 3D structures based on the given sequences is also discussed.
Sequencing at sea: challenges and experiences in Ion Torrent PGM sequencing during the 2013 Southern Line Islands Research Expedition

PubMed Central

Lim, Yan Wei; Cuevas, Daniel A.; Silva, Genivaldo Gueiros Z.; Aguinaldo, Kristen; Dinsdale, Elizabeth A.; Haas, Andreas F.; Hatay, Mark; Sanchez, Savannah E.; Wegley-Kelly, Linda; Dutilh, Bas E.; Harkins, Timothy T.; Lee, Clarence C.; Tom, Warren; Sandin, Stuart A.; Smith, Jennifer E.; Zgliczynski, Brian; Vermeij, Mark J.A.; Rohwer, Forest

2014-01-01

Genomics and metagenomics have revolutionized our understanding of marine microbial ecology and the importance of microbes in global geochemical cycles. However, the process of DNA sequencing has always been an abstract extension of the research expedition, completed once the samples were returned to the laboratory. During the 2013 Southern Line Islands Research Expedition, we started the first effort to bring next generation sequencing to some of the most remote locations on our planet. We successfully sequenced twenty six marine microbial genomes, and two marine microbial metagenomes using the Ion Torrent PGM platform on the Merchant Yacht Hanse Explorer. Onboard sequence assembly, annotation, and analysis enabled us to investigate the role of the microbes in the coral reef ecology of these islands and atolls. This analysis identified phosphonate as an important phosphorous source for microbes growing in the Line Islands and reinforced the importance of L-serine in marine microbial ecosystems. Sequencing in the field allowed us to propose hypotheses and conduct experiments and further sampling based on the sequences generated. By eliminating the delay between sampling and sequencing, we enhanced the productivity of the research expedition. By overcoming the hurdles associated with sequencing on a boat in the middle of the Pacific Ocean we proved the flexibility of the sequencing, annotation, and analysis pipelines. PMID:25177534
Sequencing at sea: challenges and experiences in Ion Torrent PGM sequencing during the 2013 Southern Line Islands Research Expedition.

PubMed

Lim, Yan Wei; Cuevas, Daniel A; Silva, Genivaldo Gueiros Z; Aguinaldo, Kristen; Dinsdale, Elizabeth A; Haas, Andreas F; Hatay, Mark; Sanchez, Savannah E; Wegley-Kelly, Linda; Dutilh, Bas E; Harkins, Timothy T; Lee, Clarence C; Tom, Warren; Sandin, Stuart A; Smith, Jennifer E; Zgliczynski, Brian; Vermeij, Mark J A; Rohwer, Forest; Edwards, Robert A

2014-01-01

Genomics and metagenomics have revolutionized our understanding of marine microbial ecology and the importance of microbes in global geochemical cycles. However, the process of DNA sequencing has always been an abstract extension of the research expedition, completed once the samples were returned to the laboratory. During the 2013 Southern Line Islands Research Expedition, we started the first effort to bring next generation sequencing to some of the most remote locations on our planet. We successfully sequenced twenty six marine microbial genomes, and two marine microbial metagenomes using the Ion Torrent PGM platform on the Merchant Yacht Hanse Explorer. Onboard sequence assembly, annotation, and analysis enabled us to investigate the role of the microbes in the coral reef ecology of these islands and atolls. This analysis identified phosphonate as an important phosphorous source for microbes growing in the Line Islands and reinforced the importance of L-serine in marine microbial ecosystems. Sequencing in the field allowed us to propose hypotheses and conduct experiments and further sampling based on the sequences generated. By eliminating the delay between sampling and sequencing, we enhanced the productivity of the research expedition. By overcoming the hurdles associated with sequencing on a boat in the middle of the Pacific Ocean we proved the flexibility of the sequencing, annotation, and analysis pipelines.
Genetic Diversity of Crimean Congo Hemorrhagic Fever Virus Strains from Iran

PubMed Central

Chinikar, Sadegh; Bouzari, Saeid; Shokrgozar, Mohammad Ali; Mostafavi, Ehsan; Jalali, Tahmineh; Khakifirouz, Sahar; Nowotny, Norbert; Fooks, Anthony R.; Shah-Hosseini, Nariman

2016-01-01

Background: Crimean Congo hemorrhagic fever virus (CCHFV) is a member of the Bunyaviridae family and Nairovirus genus. It has a negative-sense, single stranded RNA genome approximately 19.2 kb, containing the Small, Medium, and Large segments. CCHFVs are relatively divergent in their genome sequence and grouped in seven distinct clades based on S-segment sequence analysis and six clades based on M-segment sequences. Our aim was to obtain new insights into the molecular epidemiology of CCHFV in Iran. Methods: We analyzed partial and complete nucleotide sequences of the S and M segments derived from 50 Iranian patients. The extracted RNA was amplified using one-step RT-PCR and then sequenced. The sequences were analyzed using Mega5 software. Results: Phylogenetic analysis of partial S segment sequences demonstrated that clade IV-(Asia 1), clade IV-(Asia 2) and clade V-(Europe) accounted for 80 %, 4 % and 14 % of the circulating genomic variants of CCHFV in Iran respectively. However, one of the Iranian strains (Iran-Kerman/22) was associated with none of other sequences and formed a new clade (VII). The phylogenetic analysis of complete S-segment nucleotide sequences from selected Iranian CCHFV strains complemented with representative strains from GenBank revealed similar topology as partial sequences with eight major clusters. A partial M segment phylogeny positioned the Iranian strains in either association with clade III (Asia-Africa) or clade V (Europe). Conclusion: The phylogenetic analysis revealed subtle links between distant geographic locations, which we propose might originate either from international livestock trade or from long-distance carriage of CCHFV by infected ticks via bird migration. PMID:27308271
Sequence analysis of serum albumins reveals the molecular evolution of ligand recognition properties.

PubMed

Fanali, Gabriella; Ascenzi, Paolo; Bernardi, Giorgio; Fasano, Mauro

2012-01-01

Serum albumin (SA) is a circulating protein providing a depot and carrier for many endogenous and exogenous compounds. At least seven major binding sites have been identified by structural and functional investigations mainly in human SA. SA is conserved in vertebrates, with at least 49 entries in protein sequence databases. The multiple sequence analysis of this set of entries leads to the definition of a cladistic tree for the molecular evolution of SA orthologs in vertebrates, thus showing the clustering of the considered species, with lamprey SAs (Lethenteron japonicum and Petromyzon marinus) in a separate outgroup. Sequence analysis aimed at searching conserved domains revealed that most SA sequences are made up by three repeated domains (about 600 residues), as extensively characterized for human SA. On the contrary, lamprey SAs are giant proteins (about 1400 residues) comprising seven repeated domains. The phylogenetic analysis of the SA family reveals a stringent correlation with the taxonomic classification of the species available in sequence databases. A focused inspection of the sequences of ligand binding sites in SA revealed that in all sites most residues involved in ligand binding are conserved, although the versatility towards different ligands could be peculiar of higher organisms. Moreover, the analysis of molecular links between the different sites suggests that allosteric modulation mechanisms could be restricted to higher vertebrates.
CloVR: A virtual machine for automated and portable sequence analysis from the desktop using cloud computing

PubMed Central

2011-01-01

Background: Next-generation sequencing technologies have decentralized sequence acquisition, increasing the demand for new bioinformatics tools that are easy to use, portable across multiple platforms, and scalable for high-throughput applications. Cloud computing platforms provide on-demand access to computing infrastructure over the Internet and can be used in combination with custom-built virtual machines to distribute pre-packaged, pre-configured software. Results: We describe the Cloud Virtual Resource, CloVR, a new desktop application for push-button automated sequence analysis that can utilize cloud computing resources. CloVR is implemented as a single portable virtual machine (VM) that provides several automated analysis pipelines for microbial genomics, including 16S, whole genome and metagenome sequence analysis. The CloVR VM runs on a personal computer, utilizes local computer resources and requires minimal installation, addressing key challenges in deploying bioinformatics workflows. In addition, CloVR supports use of remote cloud computing resources to improve performance for large-scale sequence processing. In a case study, we demonstrate the use of CloVR to automatically process next-generation sequencing data on multiple cloud computing platforms. Conclusion: The CloVR VM and associated architecture lower the barrier of entry for utilizing complex analysis protocols on both local single- and multi-core computers and cloud systems for high-throughput data processing. PMID:21878105
Automated sequence analysis and editing software for HIV drug resistance testing.

PubMed

Struck, Daniel; Wallis, Carole L; Denisov, Gennady; Lambert, Christine; Servais, Jean-Yves; Viana, Raquel V; Letsoalo, Esrom; Bronze, Michelle; Aitken, Sue C; Schuurman, Rob; Stevens, Wendy; Schmit, Jean Claude; Rinke de Wit, Tobias; Perez Bercoff, Danielle

2012-05-01

Access to antiretroviral treatment in resource-limited settings is inevitably paralleled by the emergence of HIV drug resistance. Monitoring treatment efficacy and HIV drug resistance testing are therefore of increasing importance in resource-limited settings. Yet low-cost technologies and procedures suited to the particular context and constraints of such settings are still lacking. The ART-A (Affordable Resistance Testing for Africa) consortium brought together public and private partners to address this issue. The objective was to develop automated sequence analysis and editing software to support high-throughput automated sequencing. The ART-A Software was designed to automatically process and edit ABI chromatograms or FASTA files from HIV-1 isolates. The ART-A Software performs the basecalling, assigns quality values, aligns query sequences against a set reference, infers a consensus sequence, identifies the HIV type and subtype, translates the nucleotide sequence to amino acids and reports insertions/deletions, premature stop codons, ambiguities and mixed calls. The results can be automatically exported to Excel to identify mutations. Automated analysis was compared to manual analysis using a panel of 1624 PR-RT sequences generated in 3 different laboratories. Discrepancies between manual and automated sequence analysis were 0.69% at the nucleotide level and 0.57% at the amino acid level (668,047 AA analyzed), and discordances at major resistance mutations were recorded in 62 cases (4.83% of differences, 0.04% of all AA) for PR and 171 (6.18% of differences, 0.03% of all AA) cases for RT. The ART-A Software is a time-sparing tool for pre-analyzing HIV and viral quasispecies sequences in high-throughput laboratories and highlighting positions requiring attention. Copyright © 2012 Elsevier B.V. All rights reserved.
Company profile: Complete Genomics Inc.

PubMed

Reid, Clifford

2011-02-01

Complete Genomics Inc. is a life sciences company that focuses on complete human genome sequencing. It is taking a completely different approach to DNA sequencing than other companies in the industry. Rather than building a general-purpose platform for sequencing all organisms and all applications, it has focused on a single application - complete human genome sequencing. The company's Complete Genomics Analysis Platform (CGA™ Platform) comprises an integrated package of biochemistry, instrumentation and software that sequences human genomes at the highest quality, lowest cost and largest scale available. Complete Genomics offers a turnkey service that enables customers to outsource their human genome sequencing to the company's genome sequencing center in Mountain View, CA, USA. Customers send in their DNA samples, the company does all the library preparation, DNA sequencing, assembly and variant analysis, and customers receive research-ready data that they can use for biological discovery.
Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer

DTIC Science & Technology

2016-09-01

AWARD NUMBER: W81XWH-14-1-0080 TITLE: Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. PRINCIPAL INVESTIGATOR...PREPARED FOR: U.S. Army Medical Research and Materiel Command Fort Detrick, Maryland 21702-5012 DISTRIBUTION STATEMENT: Approved for Public Release...SUBTITLE Total RNA Sequencing Analysis of DCIS Progressing to Invasive Breast Cancer. 5a. CONTRACT NUMBER 5b. GRANT NUMBER W81XWH-14-1-0080 GRANT11489
Dynamic Assessment of Microbial Ecology (DAME): A web app for interactive analysis and visualization of microbial sequencing data

USDA-ARS's Scientific Manuscript database

Dynamic Assessment of Microbial Ecology (DAME) is a shiny-based web application for interactive analysis and visualization of microbial sequencing data. DAME provides researchers not familiar with R programming the ability to access the most current R functions utilized for ecology and gene sequenci...

Genomics sequence analysis of the United States infectious laryngotracheitis vaccine strains chicken embryo origin (CEO) and tissue culture origin (TCO)

USDA-ARS's Scientific Manuscript database

The genomic sequences of low and high passages of the United States infectious laryngotracheitis (ILT) vaccine strains CEO and TCO were determined using hybrid next generation sequencing in order to define genomic changes associated with attenuation and reversion to virulence. Phylogenetic analysis ...
Complete Genome Sequence of Porcine Parvovirus 2 Recovered from Swine Sera

PubMed Central

Kluge, M.; Franco, A. C.; Giongo, A.; Valdez, F. P.; Saddi, T. M.; Brito, W. M. E. D.; Roehe, P. M.

2016-01-01

A complete genomic sequence of porcine parvovirus 2 (PPV-2) was detected by viral metagenome analysis on swine sera. A phylogenetic analysis of this genome reveals that it is highly similar to previously reported North American PPV-2 genomes. The complete PPV-2 sequence is 5,426 nucleotides long. PMID:26823583
High Throughput Sequence Analysis for Disease Resistance in Maize

USDA-ARS's Scientific Manuscript database

Preliminary results of a computational analysis of high throughput sequencing data from Zea mays and the fungus Aspergillus are reported. The Illumina Genome Analyzer was used to sequence RNA samples from two strains of Z. mays (Va35 and Mp313) collected over a time course as well as several specie...
Toward a method for tracking virus evolutionary trajectory applied to the pandemic H1N1 2009 influenza virus.

PubMed

Squires, R Burke; Pickett, Brett E; Das, Sajal; Scheuermann, Richard H

2014-12-01

In 2009 a novel pandemic H1N1 influenza virus (H1N1pdm09) emerged as the first official influenza pandemic of the 21st century. Early genomic sequence analysis pointed to the swine origin of the virus. Here we report a novel computational approach to determine the evolutionary trajectory of viral sequences that uses data-driven estimations of nucleotide substitution rates to track the gradual accumulation of observed sequence alterations over time. Phylogenetic analysis and multiple sequence alignments show that sequences belonging to the resulting evolutionary trajectory of the H1N1pdm09 lineage exhibit a gradual accumulation of sequence variations and tight temporal correlations in the topological structure of the phylogenetic trees. These results suggest that our evolutionary trajectory analysis (ETA) can more effectively pinpoint the evolutionary history of viruses, including the host and geographical location traversed by each segment, when compared against either BLAST or traditional phylogenetic analysis alone. Copyright © 2014 Elsevier B.V. All rights reserved.
Fungal Genomics for Energy and Environment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grigoriev, Igor V.

2013-03-11

Genomes of fungi relevant to energy and environment are in focus of the Fungal Genomic Program at the US Department of Energy Joint Genome Institute (JGI). One of its projects, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts) by means of genome sequencing and analysis. New chapters of the Encyclopedia can be opened with user proposals to the JGI Community Sequencing Program (CSP). Another JGI project, the 1000 fungal genomes, explores fungal diversity on genome level at scale and is open for users to nominate new species for sequencing. Over 200 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supported by functional genomics leads to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such parts suggested by comparative genomics and functional analysis in these areas are presented here.
Interuser Interference Analysis for Direct-Sequence Spread-Spectrum Systems Part I: Partial-Period Cross-Correlation

NASA Technical Reports Server (NTRS)

Ni, Jianjun (David)

2012-01-01

This presentation discusses an analysis approach to evaluate the interuser interference for Direct-Sequence Spread-Spectrum (DSSS) Systems for Space Network (SN) Users. Part I of this analysis shows that the correlation property of pseudo noise (PN) sequences is the critical factor which determines the interuser interference performance of the DSSS system. For non-standard DSSS systems in which the PN sequence's period is much larger than one data symbol duration, it is the partial-period cross-correlation that determines the system performance. This study reveals through an example that a well-designed PN sequence set (e.g. Gold Sequence, in which the cross-correlation for a whole-period is well controlled) may have non-controlled partial-period cross-correlation which could cause severe interuser interference for a DSSS system. Since the analytical derivation of performance metric (bit error rate or signal-to-noise ratio) based on partial-period cross-correlation is prohibitive, the performance degradation due to partial-period cross-correlation will be evaluated using simulation in Part II of this analysis in the future.
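The distinction between whole-period and partial-period cross-correlation can be made concrete with a small Python sketch. It is a toy example, not the Space Network PN codes or the presentation's simulation: it builds two length-31 m-sequences from the degree-5 recurrences for x^5 + x^2 + 1 and x^5 + x^4 + x^3 + x^2 + 1 (illustrative choices), and the function names and the 8-chip window are hypothetical.

```python
def lfsr_sequence(taps, degree, length):
    """Bits a[k] with a[k + degree] = XOR of a[k + t] for t in taps, seeded with ones."""
    bits = [1] * degree
    while len(bits) < length:
        k = len(bits) - degree
        new_bit = 0
        for t in taps:
            new_bit ^= bits[k + t]
        bits.append(new_bit)
    return bits[:length]

def to_bipolar(bits):
    """Map bit 0 to chip +1 and bit 1 to chip -1."""
    return [1 - 2 * b for b in bits]

def correlation(a, b, shift, window):
    """Sum of a[i] * b[(i + shift) mod N] over the first `window` chips."""
    n = len(b)
    return sum(a[i] * b[(i + shift) % n] for i in range(window))

N = 31  # period of a degree-5 m-sequence
u = to_bipolar(lfsr_sequence((0, 2), 5, N))        # x^5 + x^2 + 1
v = to_bipolar(lfsr_sequence((0, 2, 3, 4), 5, N))  # x^5 + x^4 + x^3 + x^2 + 1

# Whole-period versus partial-period (8-chip) cross-correlation, normalized
# by the window length so the two are comparable.
full_peak = max(abs(correlation(u, v, s, N)) for s in range(N)) / N
partial_peak = max(abs(correlation(u, v, s, 8)) for s in range(N)) / 8
```

Comparing `partial_peak` with `full_peak` shows how much larger the normalized interference can be when a data symbol spans only part of the code period. A fuller study would also vary the starting chip of the window, not only the relative shift.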
Identification of Y-Chromosome Sequences in Turner Syndrome.

PubMed

Silva-Grecco, Roseane Lopes da; Trovó-Marqui, Alessandra Bernadete; Sousa, Tiago Alves de; Croce, Lilian Da; Balarin, Marly Aparecida Spadotto

2016-05-01

To investigate the presence of Y-chromosome sequences and determine their frequency in patients with Turner syndrome. The study included 23 patients with Turner syndrome from Brazil, who gave written informed consent for participating in the study. Cytogenetic analyses were performed in peripheral blood lymphocytes, with 100 metaphases per patient. Genomic DNA was also extracted from peripheral blood lymphocytes, and gene sequences DYZ1, DYZ3, ZFY and SRY were amplified by Polymerase Chain Reaction. The cytogenetic analysis showed a 45,X karyotype in 9 patients (39.2 %) and a mosaic pattern in 14 (60.8 %). In 8.7 % (2 out of 23) of the patients, Y-chromosome sequences were found. This prevalence is very similar to those reported previously. The initial karyotype analysis of these patients did not reveal Y-chromosome material, but they were found positive for Y-specific sequences in the lymphocyte DNA analysis. The PCR technique showed that 2 (8.7 %) of the patients with Turner syndrome had Y-chromosome sequences, both presenting marker chromosomes on cytogenetic analysis.
Mercury: Next-gen Data Analysis and Annotation Pipeline (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

ScienceCinema

Sexton, David

2018-01-22

David Sexton (Baylor) gives a talk titled "Mercury: Next-gen Data Analysis and Annotation Pipeline" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
Mercury: Next-gen Data Analysis and Annotation Pipeline (Seventh Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting 2012)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sexton, David

2012-06-01

David Sexton (Baylor) gives a talk titled "Mercury: Next-gen Data Analysis and Annotation Pipeline" at the 7th Annual Sequencing, Finishing, Analysis in the Future (SFAF) Meeting held in June, 2012 in Santa Fe, NM.
Auditory sequence analysis and phonological skill

PubMed Central

Grube, Manon; Kumar, Sukhbinder; Cooper, Freya E.; Turton, Stuart; Griffiths, Timothy D.

2012-01-01

This work tests the relationship between auditory and phonological skill in a non-selected cohort of 238 school students (age 11) with the specific hypothesis that sound-sequence analysis would be more relevant to phonological skill than the analysis of basic, single sounds. Auditory processing was assessed across the domains of pitch, time and timbre; a combination of six standard tests of literacy and language ability was used to assess phonological skill. A significant correlation between general auditory and phonological skill was demonstrated, plus a significant, specific correlation between measures of phonological skill and the auditory analysis of short sequences in pitch and time. The data support a limited but significant link between auditory and phonological ability with a specific role for sound-sequence analysis, and provide a possible new focus for auditory training strategies to aid language development in early adolescence. PMID:22951739
Evaluation of microbial community in hydrothermal field by direct DNA sequencing

NASA Astrophysics Data System (ADS)

Kawarabayasi, Y.; Maruyama, A.

2002-12-01

Many extremophiles have been discovered from terrestrial and marine hydrothermal fields. Some thermophiles can grow beyond 90°C in culture, while direct microscopic analysis occasionally indicates that microbes may survive in much hotter hydrothermal fluids. However, it is very difficult to isolate and cultivate such microbes from the environments, i.e., over 99% of total microbes remain undiscovered. Based on experiences of entire microbial genome analysis (Y.K.) and microbial community analysis (A.M.), we started to search for unique microbes/genes in hydrothermal fields through direct sequencing of environmental DNA fragments. At first, shotgun plasmid libraries were directly constructed with the DNA molecules prepared from mixed microbes collected by an in situ filtration system from low-temperature fluids at RM24 in the Southern East Pacific Rise (S-EPR). A gene amplification (PCR) technique was not used, to prevent mutation in the process. The nucleotide sequences of 285 clones indicated that no sequence had identical data in public databases. Among the 27 clones whose entire sequences were determined, no ORF was identified on 14 clones, as with introns in eukaryotes. On four clones, tetra-nucleotide-long multiple tandem repetitive sequences were identified. This type of sequence has been identified in some familial diseases in humans. The result indicates that living/dead materials with eukaryotic features may exist in this low-temperature field. Secondly, shotgun plasmid libraries were constructed from the environmental DNA prepared from Beppu hot springs. Among 143 randomly selected clones used for sequencing, no known sequence was identified. Unlike the clones in the S-EPR library, clear ORFs were identified on all nine clones whose entire sequences were determined. It was found that one clone, H4052, contained the complete Aspartyl-tRNA synthetase.
Phylogenetic analysis using amino acid sequences of this gene indicated that this gene was separated from other Euryarchaea before the differentiation of species. Thus, some novel archaeal species are expected to be in this field. The present direct cloning and sequencing technique is now opening a window to the new world in hydrothermal microbial community analysis.
Genomics dataset of unidentified disclosed isolates.

PubMed

Rekadwad, Bhagwan N

2016-09-01

Analysis of DNA sequences is necessary for higher hierarchical classification of organisms. It gives clues about the characteristics of organisms and their taxonomic position. This dataset was chosen to find complexities in the unidentified DNA in the disclosed patents. A total of 17 unidentified DNA sequences were thoroughly analyzed. Quick response (QR) codes were generated, and the AT/GC content of the DNA sequences was analyzed. The QR codes are helpful for quick identification of isolates. AT/GC content is helpful for studying their stability at different temperatures. Additionally, a dataset on cleavage codes and enzyme codes from the restriction digestion study, which is helpful for performing studies using short DNA sequences, is reported. The dataset disclosed here provides new data for the exploration of unique DNA sequences for evaluation, identification, comparison and analysis.
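The AT/GC content mentioned in this record is a simple base-composition ratio. A minimal Python sketch follows, assuming that ambiguous symbols such as N are excluded from the denominator; the function name and that convention are illustrative choices, not taken from the dataset.

```python
def gc_content(sequence):
    """Return the fraction of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)

def at_content(sequence):
    """Return the fraction of A and T among the unambiguous bases."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "AT" for b in bases) / len(bases)

example_gc = gc_content("atgcGGCCn")
```

Because G:C pairs form three hydrogen bonds and A:T pairs form two, a higher GC fraction is generally associated with a higher duplex melting temperature, which is the stability link the abstract refers to.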
miRanalyzer: a microRNA detection and analysis tool for next-generation sequencing experiments.

PubMed

Hackenberg, Michael; Sturm, Martin; Langenberger, David; Falcón-Pérez, Juan Manuel; Aransay, Ana M

2009-07-01

Next-generation sequencing now allows the sequencing of small RNA molecules and the estimation of their expression levels. Consequently, there will be a high demand for bioinformatics tools to cope with the several gigabytes of sequence data generated in each single deep-sequencing experiment. Given this scene, we developed miRanalyzer, a web server tool for the analysis of deep-sequencing experiments for small RNAs. The web server tool requires a simple input file containing a list of unique reads and their copy numbers (expression levels). Using these data, miRanalyzer (i) detects all known microRNA sequences annotated in miRBase, (ii) finds all perfect matches against other libraries of transcribed sequences and (iii) predicts new microRNAs. The prediction of new microRNAs is an especially important point as there are many species with very few known microRNAs. Therefore, we implemented a highly accurate machine learning algorithm for the prediction of new microRNAs that reaches AUC values of 97.9% and recall values of up to 75% on unseen data. The web tool summarizes all the described steps in a single output page, which provides a comprehensive overview of the analysis, adding links to more detailed output pages for each analysis module. miRanalyzer is available at http://web.bioinformatics.cicbiogune.es/microRNA/.
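The input described above, a list of unique reads with their copy numbers, can be produced from raw reads with a short Python sketch. The tab-separated layout and the function names are assumptions made for illustration; the exact file format expected by miRanalyzer should be checked against its documentation.

```python
from collections import Counter

def collapse_reads(reads):
    """Collapse raw reads into (sequence, copy number) pairs, most abundant first."""
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    # Sort by descending count, then alphabetically for a stable order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

def format_read_counts(pairs):
    """Render the pairs as tab-separated lines (assumed layout: sequence, count)."""
    return "\n".join(f"{seq}\t{count}" for seq, count in pairs)

raw_reads = ["TGAGGTAGTAGG", "tgaggtagtagg", "ACTGGCCTTGGA", "", "TGAGGTAGTAGG"]
unique_reads = collapse_reads(raw_reads)
table = format_read_counts(unique_reads)
```

Collapsing identical reads before upload keeps the input file small, since a deep-sequencing library typically contains many copies of the most abundant small RNAs.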
Quasispecies Analyses of the HIV-1 Near-full-length Genome With Illumina MiSeq

PubMed Central

Ode, Hirotaka; Matsuda, Masakazu; Matsuoka, Kazuhiro; Hachiya, Atsuko; Hattori, Junko; Kito, Yumiko; Yokomaku, Yoshiyuki; Iwatani, Yasumasa; Sugiura, Wataru

2015-01-01

Human immunodeficiency virus type-1 (HIV-1) exhibits high between-host genetic diversity and within-host heterogeneity, recognized as quasispecies. Because HIV-1 quasispecies fluctuate in terms of multiple factors, such as antiretroviral exposure and host immunity, analyzing the HIV-1 genome is critical for selecting effective antiretroviral therapy and understanding within-host viral coevolution mechanisms. Here, to obtain HIV-1 genome sequence information that includes minority variants, we sought to develop a method for evaluating quasispecies throughout the HIV-1 near-full-length genome using the Illumina MiSeq benchtop deep sequencer. To ensure the reliability of minority mutation detection, we applied an analysis method of sequence read mapping onto a consensus sequence derived from de novo assembly followed by iterative mapping and subsequent unique error correction. Deep sequencing analyses of an HIV-1 clone showed that the analysis method reduced erroneous base prevalence below 1% in each sequence position and discarded only < 1% of all collected nucleotides, maximizing the usage of the collected genome sequences. Further, we designed primer sets to amplify the HIV-1 near-full-length genome from clinical plasma samples. Deep sequencing of 92 samples in combination with the primer sets and our analysis method provided sufficient coverage to identify >1%-frequency sequences throughout the genome. When we evaluated sequences of pol genes from 18 treatment-naïve patients' samples, the deep sequencing results were in agreement with Sanger sequencing and identified numerous additional minority mutations. The results suggest that our deep sequencing method would be suitable for identifying within-host viral population dynamics throughout the genome. PMID:26617593
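The 1% frequency threshold for minority variants can be illustrated with a minimal Python sketch that operates on the bases observed at a single genome position. It is not the authors' pipeline, which also involves de novo assembly, iterative mapping and error correction; the function name and the toy pileup are hypothetical.

```python
from collections import Counter

def minority_variants(column_bases, min_freq=0.01):
    """Return {base: frequency} for non-majority bases at or above min_freq.

    column_bases holds the bases that mapped reads show at one position.
    """
    counts = Counter(column_bases)
    depth = sum(counts.values())
    if depth == 0:
        return {}
    major_base, _ = counts.most_common(1)[0]
    return {
        base: count / depth
        for base, count in counts.items()
        if base != major_base and count / depth >= min_freq
    }

# Toy pileup at depth 200: G is seen in 2% of reads, T in 0.5%.
pileup = "A" * 195 + "G" * 4 + "T" * 1
variants = minority_variants(pileup)
```

Here only G passes the threshold. A threshold this low is meaningful only when the residual per-position error rate is below it, which is why the abstract stresses reducing erroneous base prevalence to under 1%.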
Sequence determination and analysis of the NSs genes of two tospoviruses.

PubMed

Hallwass, Mariana; Leastro, Mikhail O; Lima, Mirtes F; Inoue-Nagata, Alice K; Resende, Renato O

2012-03-01

The tospoviruses groundnut ringspot virus (GRSV) and zucchini lethal chlorosis virus (ZLCV) cause severe losses in many crops, especially in solanaceous and cucurbit species. In this study, the non-structural NSs gene and the 5'UTRs of these two biologically distinct tospoviruses were cloned and sequenced. The NSs sequences of GRSV and ZLCV were both 1,404 nucleotides long. Pairwise comparison showed that the NSs amino acid sequence of GRSV shared 69.6% identity with that of ZLCV and 75.9% identity with that of tomato spotted wilt virus (TSWV), while the NSs sequences of ZLCV and TSWV shared 67.9% identity. Phylogenetic analysis based on NSs sequences confirmed that these viruses cluster in the American clade.
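Pairwise identity figures such as those quoted above depend on how identity is defined. The sketch below shows one common convention, assuming the two sequences are already aligned to equal length and that columns containing a gap are excluded from the denominator. The function name and the gap convention are assumptions, not details taken from the study.

```python
def percent_identity(a, b, gap="-"):
    """Percent identity between two pre-aligned sequences.

    Assumption: columns where either sequence has a gap are skipped.
    Other conventions divide by the full alignment length or by the
    shorter sequence, and give different numbers.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


# Toy aligned peptides: 4 matches over 5 gap-free columns.
print(percent_identity("MKV-LA", "MKIQLA"))
```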
First full-length genome sequence of the polerovirus luffa aphid-borne yellows virus (LABYV) reveals the presence of at least two consensus sequences in an isolate from Thailand.

PubMed

Knierim, Dennis; Maiss, Edgar; Kenyon, Lawrence; Winter, Stephan; Menzel, Wulf

2015-10-01

Luffa aphid-borne yellows virus (LABYV) was proposed as the name for a previously undescribed polerovirus based on partial genome sequences obtained from samples of cucurbit plants collected in Thailand between 2008 and 2013. In this study, we determined the first full-length genome sequence of LABYV. Based on phylogenetic analysis and genome properties, it is clear that this virus represents a distinct species in the genus Polerovirus. Analysis of sequences from sample TH24, which was collected in 2010 from a luffa plant in Thailand, reveals the presence of two different full-length genome consensus sequences.
Dipeptide Sequence Determination: Analyzing Phenylthiohydantoin Amino Acids by HPLC

NASA Astrophysics Data System (ADS)

Barton, Janice S.; Tang, Chung-Fei; Reed, Steven S.

2000-02-01

Amino acid composition and sequence determination, important techniques for characterizing peptides and proteins, are essential for predicting conformation and studying sequence alignment. This experiment presents improved, fundamental methods of sequence analysis for an upper-division biochemistry laboratory. Working in pairs, students use the Edman reagent to prepare phenylthiohydantoin derivatives of amino acids for determination of the sequence of an unknown dipeptide. With a single HPLC technique, students identify both the N-terminal amino acid and the composition of the dipeptide. This method yields good precision of retention times and allows use of a broad range of amino acids as components of the dipeptide. Students learn fundamental principles and techniques of sequence analysis and HPLC.
Pfarao: a web application for protein family analysis customized for cytoskeletal and motor proteins (CyMoBase)

PubMed Central

Odronitz, Florian; Kollmar, Martin

2006-01-01

Background: Annotation of protein sequences of eukaryotic organisms is crucial for the understanding of their function in the cell. Manual annotation is still by far the most accurate way to correctly predict genes. The classification of protein sequences, their phylogenetic relation and the assignment of function involves information from various sources. This often leads to a collection of heterogeneous data, which is hard to track. Cytoskeletal and motor proteins consist of large and diverse superfamilies comprising up to several dozen members per organism. To date, there is no integrated tool available to assist in the manual large-scale comparative genomic analysis of protein families. Description: Pfarao (Protein Family Application for Retrieval, Analysis and Organisation) is a database-driven online working environment for the analysis of manually annotated protein sequences and their relationship. Currently, the system can store and interrelate a wide range of information about protein sequences, species, phylogenetic relations and sequencing projects as well as links to literature and domain predictions. Sequences can be imported from multiple sequence alignments that are generated during the annotation process. A web interface allows users to conveniently browse the database and to compile tabular and graphical summaries of its content. Conclusion: We implemented a protein sequence-centric web application to store, organize, interrelate, and present heterogeneous data that is generated in manual genome annotation and comparative genomics. The application has been developed for the analysis of cytoskeletal and motor proteins (CyMoBase) but can easily be adapted for any protein. PMID:17134497
CSReport: A New Computational Tool Designed for Automatic Analysis of Class Switch Recombination Junctions Sequenced by High-Throughput Sequencing.

PubMed

Boyer, François; Boutouil, Hend; Dalloul, Iman; Dalloul, Zeinab; Cook-Moreau, Jeanne; Aldigier, Jean-Claude; Carrion, Claire; Herve, Bastien; Scaon, Erwan; Cogné, Michel; Péron, Sophie

2017-05-15

B cells ensure humoral immune responses due to the production of Ag-specific memory B cells and Ab-secreting plasma cells. In secondary lymphoid organs, Ag-driven B cell activation induces terminal maturation and Ig isotype class switch (class switch recombination [CSR]). CSR creates a virtually unique IgH locus in every B cell clone by intrachromosomal recombination between two switch (S) regions upstream of each C region gene. Amount and structural features of CSR junctions reveal valuable information about the CSR mechanism, and analysis of CSR junctions is useful in basic and clinical research studies of B cell functions. To provide an automated tool able to analyze large data sets of CSR junction sequences produced by high-throughput sequencing (HTS), we designed CSReport, a software program dedicated to support analysis of CSR recombination junctions sequenced with a HTS-based protocol (Ion Torrent technology). CSReport was assessed using simulated data sets of CSR junctions and then used for analysis of Sμ-Sα and Sμ-Sγ1 junctions from CH12F3 cells and primary murine B cells, respectively. CSReport identifies junction segment breakpoints on reference sequences and junction structure (blunt-ended junctions or junctions with insertions or microhomology). Besides the ability to analyze unprecedentedly large libraries of junction sequences, CSReport will provide a unified framework for CSR junction studies. Our results show that CSReport is an accurate tool for analysis of sequences from our HTS-based protocol for CSR junctions, thereby facilitating and accelerating their study. Copyright © 2017 by The American Association of Immunologists, Inc.
Bacterial community comparisons by taxonomy-supervised analysis independent of sequence alignment and clustering

PubMed Central

Sul, Woo Jun; Cole, James R.; Jesus, Ederson da C.; Wang, Qiong; Farris, Ryan J.; Fish, Jordan A.; Tiedje, James M.

2011-01-01

High-throughput sequencing of 16S rRNA genes has increased our understanding of microbial community structure, but now even higher-throughput methods to the Illumina scale allow the creation of much larger datasets with more samples and orders-of-magnitude more sequences that swamp current analytic methods. We developed a method capable of handling these larger datasets on the basis of assignment of sequences into an existing taxonomy using a supervised learning approach (taxonomy-supervised analysis). We compared this method with a commonly used clustering approach based on sequence similarity (taxonomy-unsupervised analysis). We sampled 211 different bacterial communities from various habitats and obtained ∼1.3 million 16S rRNA sequences spanning the V4 hypervariable region by pyrosequencing. Both methodologies gave similar ecological conclusions in that β-diversity measures calculated by using these two types of matrices were significantly correlated to each other, as were the ordination configurations and hierarchical clustering dendrograms. In addition, our taxonomy-supervised analyses were also highly correlated with phylogenetic methods, such as UniFrac. The taxonomy-supervised analysis has the advantages that it is not limited by the exhaustive computation required for the alignment and clustering necessary for the taxonomy-unsupervised analysis, is more tolerant of sequencing errors, and allows comparisons when sequences are from different regions of the 16S rRNA gene. With the tremendous expansion in 16S rRNA data acquisition underway, the taxonomy-supervised approach offers the potential to provide more rapid and extensive community comparisons across habitats and samples. PMID:21873204
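The taxonomy-supervised idea, binning sequences into existing taxa and comparing communities from the resulting count tables without alignment or clustering, can be sketched as follows. This is an illustration only: the taxon labels are assumed to come from some classifier that is not modeled here, and Bray-Curtis dissimilarity is used as one representative β-diversity measure, not necessarily the one used by the authors.

```python
from collections import Counter


def bray_curtis(counts_a, counts_b):
    """Bray-Curtis dissimilarity between two taxon-count tables.

    0.0 means identical composition, 1.0 means no shared taxa.
    """
    total = sum(counts_a.values()) + sum(counts_b.values())
    if total == 0:
        return 0.0
    taxa = set(counts_a) | set(counts_b)
    shared = sum(min(counts_a.get(t, 0), counts_b.get(t, 0)) for t in taxa)
    return 1.0 - 2.0 * shared / total


# Hypothetical per-read genus assignments for two samples.
soil = Counter(["Bacillus", "Bacillus", "Bacillus", "Pseudomonas"])
water = Counter(["Bacillus", "Pseudomonas", "Pseudomonas", "Pseudomonas"])
print(bray_curtis(soil, water))
```

Because each read is only counted into a bin, the cost grows linearly with the number of reads, which is the scaling advantage the abstract points to.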

Comparative Analysis of the Orphan CRISPR2 Locus in 242 Enterococcus faecalis Strains

PubMed Central

Hullahalli, Karthik; Rodrigues, Marinelle; Schmidt, Brendan D.; Li, Xiang; Bhardwaj, Pooja; Palmer, Kelli L.

2015-01-01

Clustered, Regularly Interspaced Short Palindromic Repeats and their associated Cas proteins (CRISPR-Cas) provide prokaryotes with a mechanism for defense against mobile genetic elements (MGEs). A CRISPR locus is a molecular memory of MGE encounters. It contains an array of short sequences, called spacers, that generally have sequence identity to MGEs. Three different CRISPR loci have been identified among strains of the opportunistic pathogen Enterococcus faecalis. CRISPR1 and CRISPR3 are associated with the cas genes necessary for blocking MGEs, but these loci are present in only a subset of E. faecalis strains. The orphan CRISPR2 lacks cas genes and is ubiquitous in E. faecalis, although its spacer content varies from strain to strain. Because CRISPR2 is a variable locus occurring in all E. faecalis, comparative analysis of CRISPR2 sequences may provide information about the clonality of E. faecalis strains. We examined CRISPR2 sequences from 228 E. faecalis genomes in relationship to subspecies phylogenetic lineages (sequence types; STs) determined by multilocus sequence typing (MLST), and to a genome phylogeny generated for a representative 71 genomes. We found that specific CRISPR2 sequences are associated with specific STs and with specific branches on the genome tree. To explore possible applications of CRISPR2 analysis, we evaluated 14 E. faecalis bloodstream isolates using CRISPR2 analysis and MLST. CRISPR2 analysis identified two groups of clonal strains among the 14 isolates, an assessment that was confirmed by MLST. CRISPR2 analysis was also used to accurately predict the ST of a subset of isolates. We conclude that CRISPR2 analysis, while not a replacement for MLST, is an inexpensive method to assess clonality among E. faecalis isolates, and can be used in conjunction with MLST to identify recombination events occurring between STs. PMID:26398194
The Transcriptome Analysis and Comparison Explorer--T-ACE: a platform-independent, graphical tool to process large RNAseq datasets of non-model organisms.

PubMed

Philipp, E E R; Kraemer, L; Mountfort, D; Schilhabel, M; Schreiber, S; Rosenstiel, P

2012-03-15

Next generation sequencing (NGS) technologies allow a rapid and cost-effective compilation of large RNA sequence datasets in model and non-model organisms. However, the storage and analysis of transcriptome information from different NGS platforms is still a significant bottleneck, leading to a delay in data dissemination and subsequent biological understanding. Especially database interfaces with transcriptome analysis modules going beyond mere read counts are missing. Here, we present the Transcriptome Analysis and Comparison Explorer (T-ACE), a tool designed for the organization and analysis of large sequence datasets, and especially suited for transcriptome projects of non-model organisms with little or no a priori sequence information. T-ACE offers a TCL-based interface, which accesses a PostgreSQL database via a php-script. Within T-ACE, information belonging to single sequences or contigs, such as annotation or read coverage, is linked to the respective sequence and immediately accessible. Sequences and assigned information can be searched via keyword- or BLAST-search. Additionally, T-ACE provides within and between transcriptome analysis modules on the level of expression, GO terms, KEGG pathways and protein domains. Results are visualized and can be easily exported for external analysis. We developed T-ACE for laboratory environments, which have only a limited amount of bioinformatics support, and for collaborative projects in which different partners work on the same dataset from different locations or platforms (Windows/Linux/MacOS). For laboratories with some experience in bioinformatics and programming, the low complexity of the database structure and open-source code provides a framework that can be customized according to the different needs of the user and transcriptome project.
Ion Torrent Semiconductor Sequencing Allows Rapid, Low Cost Sequencing of the Human Exome (7th Annual SFAF Meeting, 2012)

ScienceCinema

Jenkins, David

2018-01-10

David Jenkins on "Ion Torrent semiconductor sequencing allows rapid, low-cost sequencing of the human exome" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Ion Torrent Semiconductor Sequencing Allows Rapid, Low Cost Sequencing of the Human Exome (7th Annual SFAF Meeting, 2012)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jenkins, David

David Jenkins on "Ion Torrent semiconductor sequencing allows rapid, low-cost sequencing of the human exome" at the 2012 Sequencing, Finishing, Analysis in the Future Meeting held June 5-7, 2012 in Santa Fe, New Mexico.
Relationships among genera of the Saccharomycotina from multigene sequence analysis

USDA-ARS's Scientific Manuscript database

Most known species of the subphylum Saccharomycotina (budding ascomycetous yeasts) have now been placed in phylogenetically defined clades following multigene sequence analysis. Terminal clades, which are usually well supported from bootstrap analysis, are viewed as phylogenetically circumscribed ge...
Forensic massively parallel sequencing data analysis tool: Implementation of MyFLq as a standalone web- and Illumina BaseSpace(®)-application.

PubMed

Van Neste, Christophe; Gansemans, Yannick; De Coninck, Dieter; Van Hoofstat, David; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

2015-03-01

Routine use of massively parallel sequencing (MPS) for forensic genomics is on the horizon. In the last few years, several algorithms and workflows have been developed to analyze forensic MPS data. However, none have yet been tailored to the needs of the forensic analyst who does not possess an extensive bioinformatics background. We developed our previously published forensic MPS data analysis framework MyFLq (My-Forensic-Loci-queries) into an open-source, user-friendly, web-based application. It can be installed as a standalone web application, or run directly from the Illumina BaseSpace environment. In the former, laboratories can keep their data on-site, while in the latter, data from forensic samples that are sequenced on an Illumina sequencer can be uploaded to BaseSpace during acquisition, and can subsequently be analyzed using the published MyFLq BaseSpace application. Additional features were implemented such as an interactive graphical report of the results, an interactive threshold selection bar, and an allele length-based analysis in addition to the sequence-based analysis. Practical use of the application is demonstrated through the analysis of four 16-plex short tandem repeat (STR) samples, showing the complementarity between the sequence- and length-based analysis of the same MPS data. Copyright © 2014 The Authors. Published by Elsevier Ireland Ltd. All rights reserved.
Sequence-structure mapping errors in the PDB: OB-fold domains

PubMed Central

Venclovas, Česlovas; Ginalski, Krzysztof; Kang, Chulhee

2004-01-01

The Protein Data Bank (PDB) is the single most important repository of structural data for proteins and other biologically relevant molecules. Therefore, it is critically important to keep the PDB data, as much as possible, error-free. In this study, we have analyzed PDB crystal structures possessing oligonucleotide/oligosaccharide binding (OB)-fold, one of the highly populated folds, for the presence of sequence-structure mapping errors. Using energy-based structure quality assessment coupled with sequence analyses, we have found that there are at least five OB-structures in the PDB that have regions where sequences have been incorrectly mapped onto the structure. We have demonstrated that the combination of these computation techniques is effective not only in detecting sequence-structure mapping errors, but also in providing guidance to correct them. Namely, we have used results of computational analysis to direct a revision of X-ray data for one of the PDB entries containing a fairly inconspicuous sequence-structure mapping error. The revised structure has been deposited with the PDB. We suggest use of computational energy assessment and sequence analysis techniques to facilitate structure determination when homologs having known structure are available to use as a reference. Such computational analysis may be useful in either guiding the sequence-structure assignment process or verifying the sequence mapping within poorly defined regions. PMID:15133161
Studies of a biochemical factory: tomato trichome deep expressed sequence tag sequencing and proteomics.

PubMed

Schilmiller, Anthony L; Miner, Dennis P; Larson, Matthew; McDowell, Eric; Gang, David R; Wilkerson, Curtis; Last, Robert L

2010-07-01

Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces beta-caryophyllene and alpha-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells.
Studies of a Biochemical Factory: Tomato Trichome Deep Expressed Sequence Tag Sequencing and Proteomics

PubMed Central

Schilmiller, Anthony L.; Miner, Dennis P.; Larson, Matthew; McDowell, Eric; Gang, David R.; Wilkerson, Curtis; Last, Robert L.

2010-01-01

Shotgun proteomics analysis allows hundreds of proteins to be identified and quantified from a single sample at relatively low cost. Extensive DNA sequence information is a prerequisite for shotgun proteomics, and it is ideal to have sequence for the organism being studied rather than from related species or accessions. While this requirement has limited the set of organisms that are candidates for this approach, next generation sequencing technologies make it feasible to obtain deep DNA sequence coverage from any organism. As part of our studies of specialized (secondary) metabolism in tomato (Solanum lycopersicum) trichomes, 454 sequencing of cDNA was combined with shotgun proteomics analyses to obtain in-depth profiles of genes and proteins expressed in leaf and stem glandular trichomes of 3-week-old plants. The expressed sequence tag and proteomics data sets combined with metabolite analysis led to the discovery and characterization of a sesquiterpene synthase that produces β-caryophyllene and α-humulene from E,E-farnesyl diphosphate in trichomes of leaf but not of stem. This analysis demonstrates the utility of combining high-throughput cDNA sequencing with proteomics experiments in a target tissue. These data can be used for dissection of other biochemical processes in these specialized epidermal cells. PMID:20431087
BlockLogo: visualization of peptide and sequence motif conservation

PubMed Central

Olsen, Lars Rønn; Kudahl, Ulrich Johan; Simon, Christian; Sun, Jing; Schönbach, Christian; Reinherz, Ellis L.; Zhang, Guang Lan; Brusic, Vladimir

2013-01-01

BlockLogo is a web-server application for visualization of protein and nucleotide fragments, continuous protein sequence motifs, and discontinuous sequence motifs using calculation of block entropy from multiple sequence alignments. The user input consists of a multiple sequence alignment, selection of motif positions, type of sequence, and output format definition. The output has BlockLogo along with the sequence logo, and a table of motif frequencies. We deployed BlockLogo as an online application and have demonstrated its utility through examples that show visualization of T-cell epitopes and B-cell epitopes (both continuous and discontinuous). Our additional example shows a visualization and analysis of structural motifs that determine specificity of peptide binding to HLA-DR molecules. The BlockLogo server also employs selected experimentally validated prediction algorithms to enable on-the-fly prediction of MHC binding affinity to 15 common HLA class I and class II alleles as well as visual analysis of discontinuous epitopes from multiple sequence alignments. It enables the visualization and analysis of structural and functional motifs that are usually described as regular expressions. It provides a compact view of discontinuous motifs composed of distant positions within biological sequences. BlockLogo is available at: http://research4.dfci.harvard.edu/cvc/blocklogo/ and http://methilab.bu.edu/blocklogo/ PMID:24001880
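The block entropy calculation mentioned above can be approximated with a short sketch. It assumes block entropy means the Shannon entropy (in bits) of the distribution of distinct residue blocks observed at the selected alignment columns; the function name and this exact definition are assumptions and may differ from what the BlockLogo server computes.

```python
import math
from collections import Counter


def block_entropy(alignment, positions):
    """Shannon entropy (bits) of residue blocks at the given columns.

    `alignment` is a list of equal-length aligned sequences and
    `positions` are 0-based column indices, which need not be contiguous
    (this is how a discontinuous motif is expressed here).
    """
    blocks = Counter("".join(seq[p] for p in positions) for seq in alignment)
    n = sum(blocks.values())
    # p * log2(1/p) summed over the observed blocks.
    return sum((c / n) * math.log2(n / c) for c in blocks.values())


alignment = ["ACDE", "ACDF", "GCDE", "GCDF"]
print(block_entropy(alignment, [0, 3]))  # four distinct blocks
print(block_entropy(alignment, [1, 2]))  # one conserved block
```

A fully conserved block has entropy 0, and the value rises as more distinct blocks share the positions.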
A Ser75-to-Asp phospho-mimicking mutation in Src accelerates ageing-related loss of retinal ganglion cells in mice.

PubMed

Kashiwagi, Kenji; Ito, Sadahiro; Maeda, Shuichiro; Kato, Goro

2017-12-01

Src knockout mice show no detectable abnormalities in central nervous system (CNS) post-mitotic neurons, likely reflecting functional compensation by other Src family kinases. Cdk1- or Cdk5-dependent Ser75 phosphorylation in the amino-terminal Unique domain of Src, which shares no homology with other Src family kinases, regulates the stability of active Src. To clarify the roles of Src Ser75 phosphorylation in CNS neurons, we established two types of mutant mice with mutations in Src: phospho-mimicking Ser75Asp (SD) and non-phosphorylatable Ser75Ala (SA). In ageing SD/SD mice, retinal ganglion cell (RGC) number in whole retinas was significantly lower than that in young SD/SD mice in the absence of inflammation and elevated intraocular pressure, resembling the pathogenesis of progressive optic neuropathy. By contrast, SA/SA mice and wild-type (WT) mice exhibited no age-related RGC loss. The age-related retinal RGC number reduction was greater in the peripheral rather than the mid-peripheral region of the retina in SD/SD mice. Furthermore, Rho-associated kinase activity in whole retinas of ageing SD/SD mice was significantly higher than that in young SD/SD mice. These results suggest that Src regulates RGC survival during ageing in a manner that depends on Ser75 phosphorylation.
Adsorption of soft and hard proteins onto OTCEs under the influence of an external electric field.

PubMed

Benavidez, Tomás E; Torrente, Daniel; Marucho, Marcelo; Garcia, Carlos D

2015-03-03

The adsorption behavior of hard and soft proteins under the effect of an external electric field was investigated by a combination of spectroscopic ellipsometry and molecular dynamics (MD) simulations. Optically transparent carbon electrodes (OTCE) were used as conductive, sorbent substrates. Lysozyme (LSZ) and ribonuclease A (RNase A) were selected as representative hard proteins, whereas myoglobin (Mb), α-lactalbumin (α-LAC), bovine serum albumin (BSA), glucose oxidase (GOx), and immunoglobulin G (IgG) were selected to represent soft proteins. In line with recent publications from our group, the experimental results revealed that while the adsorption of all investigated proteins can be enhanced by the potential applied to the electrode, the effect is more pronounced for hard proteins. In contrast with the incomplete monolayers formed at open-circuit potential, the application of +800 mV to the sorbent surface induced the formation of multiple layers of protein. These results suggest that this effect can be related to the intrinsic polarizability of the protein (induction of dipoles), the resulting surface accessible solvent area (SASA), and structural rearrangements induced upon the incorporation on the protein layer. The described experiments are critical to understand the relationship between the structure of proteins and their tendency to form (under electric stimulation) layers with thicknesses that greatly surpass those obtained at open-circuit conditions.
Effects of different force fields on the structural character of α synuclein β-hairpin peptide (35-56) in aqueous environment.

PubMed

Kundu, Sangeeta

2018-02-01

The hallmark of Parkinson's disease (PD) is the intracellular protein aggregation forming Lewy bodies (LB) and Lewy neurites, which consist mostly of the protein alpha synuclein (α-syn). Molecular dynamics (MD) simulation methods can augment experimental techniques to understand misfolding and aggregation pathways with atomistic resolution. The quality of MD simulations for proteins and peptides depends greatly on the accuracy of empirical force fields. The aim of this work is to investigate the effects of different force fields on the structural character of the β-hairpin fragment of α-syn (residues 35-56) in aqueous solution. Six independent MD simulations are done in explicit solvent using the AMBER03, AMBER99SB, GROMOS96 43A1, GROMOS96 53A6, OPLS-AA, and CHARMM27 force fields with CMAP corrections. The performance of each force field is assessed from several structural parameters such as root mean square deviation (RMSD), root mean square fluctuation (RMSF), radius of gyration (Rg), solvent accessible surface area (SASA), formation of β-turn, the stability of folded β-hairpin structure, and the favourable conformations obtained for different force fields. In this study, CMAP correction of CHARMM27 force field is found to overestimate the helical conformation, while GROMOS96 53A6 is found to most successfully capture the conformational dynamics of α-syn β-hairpin fragment as elicited from NMR.
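Two of the structural parameters listed above, RMSD and radius of gyration, have simple definitions that a short sketch can make concrete. The code assumes coordinates are given as (x, y, z) tuples, that the two conformations are already superimposed (no fitting step is performed), and that all atoms have equal mass. These are simplifications, and trajectory analysis packages handle superposition and mass weighting themselves.

```python
import math


def rmsd(ref, conf):
    """Root mean square deviation between two coordinate sets.

    Assumption: both sets list the same atoms in the same order and are
    already superimposed.
    """
    if len(ref) != len(conf) or not ref:
        raise ValueError("coordinate sets must be non-empty and equal in size")
    sq = sum((a - b) ** 2 for p, q in zip(ref, conf) for a, b in zip(p, q))
    return math.sqrt(sq / len(ref))


def radius_of_gyration(coords):
    """Radius of gyration about the geometric center (equal masses assumed)."""
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    sq = sum((c[i] - center[i]) ** 2 for c in coords for i in range(3))
    return math.sqrt(sq / n)


frame0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame1 = [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)]
print(rmsd(frame0, frame1))
print(radius_of_gyration([(-1.0, 0.0, 0.0), (1.0, 0.0, 0.0)]))
```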
Adsorption of Soft and Hard Proteins onto OTCEs under the influence of an External Electric Field

PubMed Central

Benavidez, Tomás E.; Torrente, Daniel; Marucho, Marcelo; Garcia, Carlos D.

2015-01-01

The adsorption behavior of hard and soft proteins under the effect of an external electric field was investigated by a combination of spectroscopic ellipsometry and molecular dynamics (MD) simulations. Optically transparent carbon electrodes (OTCE) were used as conductive, sorbent substrates. Lysozyme (LSZ) and ribonuclease A (RNase A) were selected as representative hard proteins, whereas myoglobin (Mb), α-lactalbumin (α-LAC), bovine serum albumin (BSA), glucose oxidase (GOx), and immunoglobulin G (IgG) were selected to represent soft proteins. In line with recent publications from our group, the experimental results revealed that while the adsorption of all investigated proteins can be enhanced by the potential applied to the electrode, the effect is more pronounced for hard proteins. In contrast with the incomplete monolayers formed at open-circuit potential, the application of +800 mV to the sorbent surface induced the formation of multiple layers of protein. These results also suggest that this effect can be related to the intrinsic polarizability of the protein (induction of dipoles), the resulting surface accessible solvent area (SASA), and structural rearrangements induced upon the incorporation on the protein layer. The described experiments are critical to understand the relationship between the structure of proteins and their tendency to form (under electric stimulation) layers with thicknesses that greatly surpass those obtained at open-circuit conditions. PMID:25658387
Markov model plus k-word distributions: a synergy that produces novel statistical measures for sequence comparison.

PubMed

Dai, Qi; Yang, Yanchun; Wang, Tianming

2008-10-15

Many proposed statistical measures can efficiently compare biological sequences to further infer their structures, functions and evolutionary information. They are related in spirit because all the ideas for sequence comparison try to use the information on the k-word distributions, a Markov model, or both. Motivated by adding k-word distributions to a Markov model directly, we investigated two novel statistical measures for sequence comparison, called wre.k.r and S2.k.r. The proposed measures were tested by similarity search, evaluation on functionally related regulatory sequences and phylogenetic analysis. This offers a systematic and quantitative experimental assessment of our measures. Moreover, we compared our results with those of alignment-based and alignment-free measures. We grouped our experiments into two sets. The first one, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of our statistical measures to search for similar sequences from a database and discriminate functionally related regulatory sequences from unrelated sequences. The second one aims at assessing how well our statistical measures perform in phylogenetic analysis. The experimental assessment demonstrates that our similarity measures, which incorporate k-word distributions into a Markov model, are more efficient.
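The abstract does not define wre.k.r or S2.k.r, so they are not reproduced here. As background, the sketch below shows the basic ingredient such alignment-free measures build on: a k-word frequency profile per sequence and a distance between two profiles. The Euclidean distance and the function names are illustrative choices, not the authors' measures.

```python
import math
from collections import Counter


def kword_freqs(seq, k):
    """Relative frequencies of overlapping k-words in `seq`."""
    n = len(seq) - k + 1
    if n <= 0:
        return {}
    counts = Counter(seq[i:i + k] for i in range(n))
    return {word: c / n for word, c in counts.items()}


def kword_distance(seq1, seq2, k):
    """Euclidean distance between two k-word frequency profiles."""
    f1, f2 = kword_freqs(seq1, k), kword_freqs(seq2, k)
    words = set(f1) | set(f2)
    return math.sqrt(sum((f1.get(w, 0.0) - f2.get(w, 0.0)) ** 2 for w in words))


print(kword_freqs("ACGACG", 2))
print(kword_distance("AAAA", "CCCC", 2))
```

No alignment is computed at any point, which is why profile-based comparisons scale well to database searches.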
A computational proposal for designing structured RNA pools for in vitro selection of RNAs.

PubMed

Kim, Namhee; Gan, Hin Hark; Schlick, Tamar

2007-04-01

Although in vitro selection technology is a versatile experimental tool for discovering novel synthetic RNA molecules, finding complex RNA molecules is difficult because most RNAs identified from random sequence pools are simple motifs, consistent with recent computational analysis of such sequence pools. Thus, enriching in vitro selection pools with complex structures could increase the probability of discovering novel RNAs. Here we develop an approach for engineering sequence pools that links RNA sequence space regions with corresponding structural distributions via a "mixing matrix" approach combined with a graph theory analysis. We define five classes of mixing matrices motivated by covariance mutations in RNA; these constructs define nucleotide transition rates and are applied to chosen starting sequences to yield specific nonrandom pools. We examine the coverage of sequence space as a function of the mixing matrix and starting sequence via clustering analysis. We show that, in contrast to random sequences, which are associated only with a local region of sequence space, our designed pools, including a structured pool for GTP aptamers, can target specific motifs. It follows that experimental synthesis of designed pools can benefit from using optimized starting sequences, mixing matrices, and pool fractions associated with each of our constructed pools as a guide. Automation of our approach could provide practical tools for pool design applications for in vitro selection of RNAs and related problems.
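The idea of applying a mixing matrix of nucleotide transition rates to a starting sequence can be sketched as below. The matrix values, the function name `generate_pool` and the per-position independent sampling are all assumptions made for illustration. The abstract does not give the five matrix classes, and they are not reproduced here.

```python
import random

BASES = "ACGU"

# Hypothetical mixing matrix: row = starting base, columns = probability
# of emitting A, C, G, U at that position. Each row sums to 1.
EXAMPLE_MATRIX = {
    "A": [0.85, 0.05, 0.05, 0.05],
    "C": [0.05, 0.85, 0.05, 0.05],
    "G": [0.05, 0.05, 0.85, 0.05],
    "U": [0.05, 0.05, 0.05, 0.85],
}

# Identity matrix: every pool member equals the starting sequence.
IDENTITY = {b: [1.0 if c == b else 0.0 for c in BASES] for b in BASES}


def generate_pool(start, mixing, size, seed=0):
    """Sample `size` sequences by mutating `start` position by position."""
    rng = random.Random(seed)  # seeded so that a run is reproducible
    pool = []
    for _ in range(size):
        seq = "".join(rng.choices(BASES, weights=mixing[b])[0] for b in start)
        pool.append(seq)
    return pool


for member in generate_pool("GGAUCC", EXAMPLE_MATRIX, 5):
    print(member)
```

A matrix close to the identity keeps the pool near the starting sequence, while flatter rows approach a random pool.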
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

PubMed

Cao, Yinhe; Tung, Wen-Wen; Gao, J B

2004-01-01

With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important classes of structure in a DNA sequence is repeat-related. Such repeats often have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale on a PC.
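One simple reading of a recurrence time, assumed here because the abstract does not give the definition, is the distance between successive occurrences of the same k-letter window. Under that assumption, a histogram of recurrence times shows a peak at the period of any repeat:

```python
from collections import Counter


def recurrence_times(seq, k):
    """Histogram of gaps between successive occurrences of each k-word.

    Assumed definition for illustration; the paper's statistic may
    differ in detail.
    """
    last_seen = {}
    histogram = Counter()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            histogram[i - last_seen[word]] += 1
        last_seen[word] = i
    return histogram


if __name__ == "__main__":
    # A perfect period-3 repeat puts all recurrences at time 3.
    print(recurrence_times("ACG" * 5, 3))
```

The pass over the sequence is linear in its length, which is in keeping with the abstract's remark that whole genomes can be analysed on a PC.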
Identification of a Heterozygous SPG11 Mutation by Clinical Exome Sequencing in a Patient With Hereditary Spastic Paraplegia: A Case Report.

PubMed

Oh, Ja-Young; Do, Hyun Jung; Lee, Seungok; Jang, Ja-Hyun; Cho, Eun-Hae; Jang, Dae-Hyun

2016-12-01

Next-generation sequencing methods, such as whole-genome sequencing, whole-exome sequencing, and targeted panel sequencing, have been applied for diagnosis of many genetic diseases, and are in the process of replacing the traditional methods of genetic analysis. Clinical exome sequencing (CES), which provides not only sequence variation data but also clinical interpretation, aids in reaching a final conclusion with regard to genetic diagnosis. Sequencing of genes with clinical relevance rather than whole exome sequencing might be more suitable for the diagnosis of known hereditary disease with genetic heterogeneity. Here, we present the clinical usefulness of CES for the diagnosis of hereditary spastic paraplegia (HSP). We report the case of a patient who was strongly suspected of having HSP based on her clinical manifestations. HSP is a disease with high genetic heterogeneity, with 72 different loci and 59 genes identified so far. Therefore, the traditional approach to diagnosis of HSP by genetic analysis is very challenging and time-consuming. CES with the TruSight One Sequencing Panel, which enriches about 4,800 genes with clinical relevance, revealed compound heterozygous mutations in SPG11. One workflow and one procedure can provide the results of genetic analysis, and CES with enrichment of clinically relevant genes is a cost-effective and time-saving diagnostic tool for diseases with genetic heterogeneity, including HSP.
Sequence Alignment to Predict Across Species Susceptibility ...

EPA Pesticide Factsheets

Conservation of a molecular target across species can be used as a line-of-evidence to predict the likelihood of chemical susceptibility. The web-based Sequence Alignment to Predict Across Species Susceptibility (SeqAPASS) tool was developed to simplify, streamline, and quantitatively assess protein sequence/structural similarity across taxonomic groups as a means to predict relative intrinsic susceptibility. The intent of the tool is to allow for evaluation of any potential protein target, so it is amenable to variable degrees of protein characterization, depending on available information about the chemical/protein interaction and the molecular target itself. To allow for flexibility in the analysis, a layered strategy was adopted for the tool. The first level of the SeqAPASS analysis compares primary amino acid sequences to a query sequence, calculating a metric for sequence similarity (including detection of candidate orthologs), the second level evaluates sequence similarity within selected domains (e.g., ligand-binding domain, DNA binding domain), and the third level of analysis compares individual amino acid residue positions identified as being of importance for protein conformation and/or ligand binding upon chemical perturbation. Each level of the SeqAPASS analysis provides increasing evidence to apply toward rapid, screening-level assessments of probable cross species susceptibility. Such analyses can support prioritization of chemicals for further ev
Sequence-dependent modelling of local DNA bending phenomena: curvature prediction and vibrational analysis.

PubMed

Vlahovicek, K; Munteanu, M G; Pongor, S

1999-01-01

Bending is a local conformational micropolymorphism of DNA in which the original B-DNA structure is only distorted but not extensively modified. Bending can be predicted by simple static geometry models as well as by a recently developed elastic model that incorporates sequence-dependent anisotropic bendability (SDAB). The SDAB model qualitatively explains phenomena including affinity of protein binding, kinking, as well as sequence-dependent vibrational properties of DNA. The vibrational properties of DNA segments can be studied by finite element analysis of a model subjected to an initial bending moment. The frequency spectrum is obtained by applying Fourier analysis to the displacement values in the time domain. This analysis shows that the spectrum of the bending vibrations depends quite sensitively on the sequence; for example, the spectrum of a curved sequence is characteristically different from the spectrum of straight sequence motifs of identical basepair composition. Curvature distributions are genome-specific, and pronounced differences are found between protein-coding and regulatory regions; that is, sites of extreme curvature and/or bendability are less frequent in protein-coding regions. A WWW server is set up for the prediction of curvature and generation of 3D models from DNA sequences (http://www.icgeb.trieste.it/dna).

Molecular characterization of the vitamin D receptor (VDR) gene in Holstein cows.

PubMed

Ali, Mayar O; El-Adl, Mohamed A; Ibrahim, Hussam M M; Elseedy, Youssef Y; Rizk, Mohamed A; El-Khodery, Sabry A

2018-06-01

Vitamin D plays a vital role in calcium homeostasis, growth, and immunoregulation. Because little is known about the vitamin D receptor (VDR) gene in cattle, the aim of the present investigation was to present the molecular characterization of exons 5 and 6 of the VDR gene in Holstein cows. DNA extraction, genomic sequencing, phylogenetic analysis, synteny mapping and single nucleotide gene polymorphism analysis of the VDR gene were performed to assess blood samples collected from 50 clinically healthy Holstein cows. The results revealed the presence of a 450-base pair (bp) nucleotide sequence that resembled exons 5 and 6 with intron 5 enclosed between these exons. Sequence alignment and phylogenetic analysis revealed a close relationship between the sequenced VDR region and that found in Hereford cattle. A close association between this region and the corresponding region in small ruminants was also documented. Moreover, a single nucleotide polymorphism (SNP) that caused the replacement of a glutamate with an arginine in the deduced amino acid sequence was detected at position 7 of exon 5. In conclusion, Holstein and Hereford cattle differ with respect to exon 5 of the VDR gene. Phylogenetic analysis of the VDR gene based on nucleotide sequence produced different results from prior analyses based on amino acid sequence. Copyright © 2018 Elsevier Ltd. All rights reserved.
A streamlined method for analysing genome-wide DNA methylation patterns from low amounts of FFPE DNA.

PubMed

Ludgate, Jackie L; Wright, James; Stockwell, Peter A; Morison, Ian M; Eccles, Michael R; Chatterjee, Aniruddha

2017-08-31

Formalin fixed paraffin embedded (FFPE) tumor samples are a major source of DNA from patients in cancer research. However, FFPE is a challenging material to work with due to macromolecular fragmentation and nucleic acid crosslinking. FFPE tissue poses particular challenges for methylation analysis and for preparing sequencing-based libraries relying on bisulfite conversion. Successful bisulfite conversion is a key requirement for sequencing-based methylation analysis. Here we describe a complete and streamlined workflow for preparing next generation sequencing libraries for methylation analysis from FFPE tissues. This includes counting cells from FFPE blocks and extracting DNA from FFPE slides, testing bisulfite conversion efficiency with a polymerase chain reaction (PCR) based test, preparing reduced representation bisulfite sequencing libraries and massively parallel sequencing. The main features and advantages of this protocol are: (1) an optimized method for extracting good quality DNA from FFPE tissues; (2) an efficient bisulfite conversion and next generation sequencing library preparation protocol that uses 50 ng DNA from FFPE tissue; and (3) incorporation of a PCR-based test to assess bisulfite conversion efficiency prior to sequencing. We provide a complete workflow and an integrated protocol for performing DNA methylation analysis at the genome-scale and we believe this will facilitate clinical epigenetic research that involves the use of FFPE tissue.
Complete Genome Sequence of Porcine Parvovirus 2 Recovered from Swine Sera.

PubMed

Campos, F S; Kluge, M; Franco, A C; Giongo, A; Valdez, F P; Saddi, T M; Brito, W M E D; Roehe, P M

2016-01-28

A complete genomic sequence of porcine parvovirus 2 (PPV-2) was detected by viral metagenome analysis on swine sera. A phylogenetic analysis of this genome reveals that it is highly similar to previously reported North American PPV-2 genomes. The complete PPV-2 sequence is 5,426 nucleotides long. Copyright © 2016 Campos et al.
Student Initiatives and Missed Learning Opportunities in an IRF Sequence: A Single Case Analysis

ERIC Educational Resources Information Center

Li, Houxiang

2013-01-01

Most conversation analysis (CA) studies of the initiation-response-feedback (IRF; Sinclair & Coulthard, 1975) sequence have focused on teacher actions in the feedback move. In this article, I use CA to analyze student initiatives (Waring, 2011) within an IRF sequence in one excerpt from a Chinese as a foreign language class. The excerpt…
ICO amplicon NGS data analysis: a Web tool for variant detection in common high-risk hereditary cancer genes analyzed by amplicon GS Junior next-generation sequencing.

PubMed

Lopez-Doriga, Adriana; Feliubadaló, Lídia; Menéndez, Mireia; Lopez-Doriga, Sergio; Morón-Duran, Francisco D; del Valle, Jesús; Tornero, Eva; Montes, Eva; Cuesta, Raquel; Campos, Olga; Gómez, Carolina; Pineda, Marta; González, Sara; Moreno, Victor; Capellá, Gabriel; Lázaro, Conxi

2014-03-01

Next-generation sequencing (NGS) has revolutionized genomic research and is set to have a major impact on genetic diagnostics thanks to the advent of benchtop sequencers and flexible kits for targeted libraries. Among the main hurdles in NGS are the difficulty of performing bioinformatic analysis of the huge volume of data generated and the high number of false positive calls that could be obtained, depending on the NGS technology and the analysis pipeline. Here, we present the development of a free and user-friendly Web data analysis tool that detects and filters sequence variants, provides coverage information, and allows the user to customize some basic parameters. The tool has been developed to provide accurate genetic analysis of targeted sequencing of common high-risk hereditary cancer genes using amplicon libraries run in a GS Junior System. The Web resource is linked to our own mutation database, to assist in the clinical classification of identified variants. We believe that this tool will greatly facilitate the use of the NGS approach in routine laboratories.
Effects of Sequences of Cognitions on Group Performance Over Time

PubMed Central

Molenaar, Inge; Chiu, Ming Ming

2017-01-01

Extending past research showing that sequences of low cognitions (low-level processing of information) and high cognitions (high-level processing of information through questions and elaborations) influence the likelihoods of subsequent high and low cognitions, this study examines whether sequences of cognitions are related to group performance over time; 54 primary school students (18 triads) discussed and wrote an essay about living in another country (32,375 turns of talk). Content analysis and statistical discourse analysis showed that within each lesson, groups with more low cognitions or more sequences of low cognition followed by high cognition added more essay words. Groups with more high cognitions, sequences of low cognition followed by low cognition, or sequences of high cognition followed by an action followed by low cognition, showed different words and sequences, suggestive of new ideas. The links between cognition sequences and group performance over time can inform facilitation and assessment of student discussions. PMID:28490854
Effects of Sequences of Cognitions on Group Performance Over Time.

PubMed

Molenaar, Inge; Chiu, Ming Ming

2017-04-01

Extending past research showing that sequences of low cognitions (low-level processing of information) and high cognitions (high-level processing of information through questions and elaborations) influence the likelihoods of subsequent high and low cognitions, this study examines whether sequences of cognitions are related to group performance over time; 54 primary school students (18 triads) discussed and wrote an essay about living in another country (32,375 turns of talk). Content analysis and statistical discourse analysis showed that within each lesson, groups with more low cognitions or more sequences of low cognition followed by high cognition added more essay words. Groups with more high cognitions, sequences of low cognition followed by low cognition, or sequences of high cognition followed by an action followed by low cognition, showed different words and sequences, suggestive of new ideas. The links between cognition sequences and group performance over time can inform facilitation and assessment of student discussions.
BiQ Analyzer HT: locus-specific analysis of DNA methylation by high-throughput bisulfite sequencing

PubMed Central

Lutsik, Pavlo; Feuerbach, Lars; Arand, Julia; Lengauer, Thomas; Walter, Jörn; Bock, Christoph

2011-01-01

Bisulfite sequencing is a widely used method for measuring DNA methylation in eukaryotic genomes. The assay provides single-base pair resolution and, given sufficient sequencing depth, its quantitative accuracy is excellent. High-throughput sequencing of bisulfite-converted DNA can be applied either genome wide or targeted to a defined set of genomic loci (e.g. using locus-specific PCR primers or DNA capture probes). Here, we describe BiQ Analyzer HT (http://biq-analyzer-ht.bioinf.mpi-inf.mpg.de/), a user-friendly software tool that supports locus-specific analysis and visualization of high-throughput bisulfite sequencing data. The software facilitates the shift from time-consuming clonal bisulfite sequencing to the more quantitative and cost-efficient use of high-throughput sequencing for studying locus-specific DNA methylation patterns. In addition, it is useful for locus-specific visualization of genome-wide bisulfite sequencing data. PMID:21565797
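The principle behind bisulfite-based methylation calls can be sketched briefly: bisulfite treatment converts unmethylated cytosine so that it reads as T, while methylated cytosine still reads as C. The function below is a hypothetical illustration for one forward-strand read already aligned base-for-base to its reference; it is not BiQ Analyzer HT's implementation.

```python
def cpg_methylation_calls(reference, read):
    """Call methylation at each CpG of an aligned bisulfite read.

    Returns {position: True (methylated), False (unmethylated),
    or None (uninformative base)}. Assumes a gap-free alignment
    on the forward strand, purely for illustration.
    """
    if len(reference) != len(read):
        raise ValueError("reference and read must be the same length")
    calls = {}
    for i in range(len(reference) - 1):
        if reference[i:i + 2] == "CG":
            if read[i] == "C":
                calls[i] = True      # protected from conversion
            elif read[i] == "T":
                calls[i] = False     # converted, so unmethylated
            else:
                calls[i] = None      # sequencing error or mismatch
    return calls


if __name__ == "__main__":
    print(cpg_methylation_calls("ACGTCGA", "ACGTTGA"))
```

Averaging such calls over many reads at one locus gives the quantitative methylation level that the abstract attributes to sufficient sequencing depth.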
Molecular characterization of Taenia multiceps isolates from Gansu Province, China by sequencing of mitochondrial cytochrome C oxidase subunit 1.

PubMed

Li, Wen Hui; Jia, Wan Zhong; Qu, Zi Gang; Xie, Zhi Zhou; Luo, Jian Xun; Yin, Hong; Sun, Xiao Lin; Blaga, Radu; Fu, Bao Quan

2013-04-01

A total of 16 Taenia multiceps isolates collected from naturally infected sheep or goats in Gansu Province, China were characterized by sequences of mitochondrial cytochrome c oxidase subunit 1 (cox1) gene. The complete cox1 gene was amplified for individual T. multiceps isolates by PCR, ligated to pMD18T vector, and sequenced. Sequence analysis indicated that out of 16 T. multiceps isolates 10 unique cox1 gene sequences of 1,623 bp were obtained with sequence variation of 0.12-0.68%. The results showed that the cox1 gene sequences were highly conserved among the examined T. multiceps isolates. However, they were quite different from those of the other Taenia species. Phylogenetic analysis based on complete cox1 gene sequences revealed that T. multiceps isolates were composed of 3 genotypes and distinguished from the other Taenia species.
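The abstract does not state how sequence variation was computed. Assuming it is the percentage of mismatched sites between two aligned sequences of equal length, the sketch below shows the calculation and what the reported range would correspond to for a 1,623 bp gene; the mapping to site counts is an inference from arithmetic, not a figure given in the record.

```python
def percent_difference(seq_a, seq_b):
    """Percentage of aligned positions at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    mismatches = sum(a != b for a, b in zip(seq_a, seq_b))
    return 100.0 * mismatches / len(seq_a)


COX1_LENGTH = 1623  # gene length reported in the abstract

if __name__ == "__main__":
    # Under the stated assumption, 2 and 11 differing sites give
    # 0.12% and 0.68% respectively.
    for sites in (2, 11):
        print(sites, round(100.0 * sites / COX1_LENGTH, 2))
```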
Molecular Characterization of Taenia multiceps Isolates from Gansu Province, China by Sequencing of Mitochondrial Cytochrome C Oxidase Subunit 1

PubMed Central

Li, Wen Hui; Jia, Wan Zhong; Qu, Zi Gang; Xie, Zhi Zhou; Luo, Jian Xun; Yin, Hong; Sun, Xiao Lin; Blaga, Radu

2013-01-01

A total of 16 Taenia multiceps isolates collected from naturally infected sheep or goats in Gansu Province, China were characterized by sequences of mitochondrial cytochrome c oxidase subunit 1 (cox1) gene. The complete cox1 gene was amplified for individual T. multiceps isolates by PCR, ligated to pMD18T vector, and sequenced. Sequence analysis indicated that out of 16 T. multiceps isolates 10 unique cox1 gene sequences of 1,623 bp were obtained with sequence variation of 0.12-0.68%. The results showed that the cox1 gene sequences were highly conserved among the examined T. multiceps isolates. However, they were quite different from those of the other Taenia species. Phylogenetic analysis based on complete cox1 gene sequences revealed that T. multiceps isolates were composed of 3 genotypes and distinguished from the other Taenia species. PMID:23710087
Sequence Complexity of Amyloidogenic Regions in Intrinsically Disordered Human Proteins

PubMed Central

Das, Swagata; Pal, Uttam; Das, Supriya; Bagga, Khyati; Roy, Anupam; Mrigwani, Arpita; Maiti, Nakul C.

2014-01-01

An amyloidogenic region (AR) in a protein sequence plays a significant role in protein aggregation and amyloid formation. We have investigated the sequence complexity of AR that is present in intrinsically disordered human proteins. More than 80% human proteins in the disordered protein databases (DisProt+IDEAL) contained one or more ARs. With decrease of protein disorder, AR content in the protein sequence was decreased. A probability density distribution analysis and discrete analysis of AR sequences showed that ∼8% residue in a protein sequence was in AR and the region was in average 8 residues long. The residues in the AR were high in sequence complexity and it seldom overlapped with low complexity regions (LCR), which was largely abundant in disorder proteins. The sequences in the AR showed mixed conformational adaptability towards α-helix, β-sheet/strand and coil conformations. PMID:24594841
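Sequence complexity is commonly quantified as the Shannon entropy of the residue composition within a window. The abstract does not say which complexity measure the authors used, so the following is a generic sketch under that assumption, with an illustrative window length of 8 to match the average AR length reported above.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits) of the residue composition of a window."""
    n = len(window)
    counts = Counter(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def sliding_entropy(sequence, window_size=8):
    """Entropy of every window; low values flag low-complexity regions."""
    return [
        shannon_entropy(sequence[i:i + window_size])
        for i in range(len(sequence) - window_size + 1)
    ]


if __name__ == "__main__":
    # A homopolymer stretch scores 0; mixed segments score higher.
    print(sliding_entropy("QQQQQQQQKLVFFAED"))
```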
Model-based quality assessment and base-calling for second-generation sequencing data.

PubMed

Bravo, Héctor Corrada; Irizarry, Rafael A

2010-09-01

Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, making it capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1000 Genomes Project, plans to fully sequence the genomes of approximately 1200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads (strings of A, C, G, or T's, between 30 and 100 characters long), which are the result of complex processing of noisy continuous fluorescence intensity measurements known as base-calling. The complexity of the base-calling discretization process results in reads of widely varying quality within and across sequence samples. This variation in processing quality results in infrequent but systematic errors that we have found to mislead downstream analysis of the discretized sequence read data. For instance, a central goal of the 1000 Genomes Project is to quantify across-sample variation at the single nucleotide level. At this resolution, small error rates in sequencing prove significant, especially for rare variants. Sec-gen sequencing is a relatively new technology for which potential biases and sources of obscuring variation are not yet fully understood. Therefore, modeling and quantifying the uncertainty inherent in the generation of sequence reads is of utmost importance. In this article, we present a simple model to capture uncertainty arising in the base-calling procedure of the Illumina/Solexa GA platform. Model parameters have a straightforward interpretation in terms of the chemistry of base-calling allowing for informative and easily interpretable metrics that capture the variability in sequencing quality. Our model provides these informative estimates readily usable in quality assessment tools while significantly improving base-calling performance. © 2009, The International Biometric Society.
Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo) genome assembly and analysis

USDA-ARS's Scientific Manuscript database

Next-generation sequencing technologies were used to rapidly and efficiently sequence the genome of the domestic turkey (Meleagris gallopavo). The current genome assembly (~1.1 Gb) includes 917 Mb of sequence assigned to chromosomes. Innate heterozygosity of the sequenced bird allowed discovery of...
Bellerophon: A program to detect chimeric sequences in multiple sequence alignments

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huber, Thomas; Faulkner, Geoffrey; Hugenholtz, Philip

2003-12-23

Bellerophon is a program for detecting chimeric sequences in multiple sequence datasets by an adaption of partial treeing analysis. Bellerophon was specifically developed to detect 16S rRNA gene chimeras in PCR-clone libraries of environmental samples but can be applied to other nucleotide sequence alignments.
Can natural proteins designed with 'inverted' peptide sequences adopt native-like protein folds?

PubMed

Sridhar, Settu; Guruprasad, Kunchur

2014-01-01

We have carried out a systematic computational analysis on a representative dataset of proteins of known three-dimensional structure, in order to evaluate whether it would be possible to 'swap' certain short peptide sequences in naturally occurring proteins with their corresponding 'inverted' peptides and generate 'artificial' proteins that are predicted to retain a native-like protein fold. The analysis of 3,967 representative proteins from the Protein Data Bank revealed 102,677 unique identical inverted peptide sequence pairs that vary in sequence length between 5-12 and 18 amino acid residues. Our analysis illustrates with examples that such 'artificial' proteins may be generated by identifying peptides with 'similar structural environment' and by using comparative protein modeling and validation studies. Our analysis suggests that natural proteins may be tolerant to accommodating such peptides.
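The search for inverted peptide pairs can be sketched as an index lookup: collect every peptide of a given length across the dataset, then report those whose reversed sequence also occurs. The function name, the toy sequences, and the decision to skip palindromic peptides are assumptions for illustration; the abstract does not describe the authors' exact procedure.

```python
def inverted_peptide_pairs(proteins, length):
    """Find peptides whose reversed sequence also occurs in the dataset.

    proteins maps a protein name to its amino acid sequence.
    Returns a set of (peptide, inverted_peptide) tuples, each pair
    listed once in sorted order. Palindromes are skipped (assumption).
    """
    seen = set()
    for seq in proteins.values():
        for i in range(len(seq) - length + 1):
            seen.add(seq[i:i + length])
    pairs = set()
    for peptide in seen:
        inverted = peptide[::-1]
        if inverted != peptide and inverted in seen:
            pairs.add(tuple(sorted((peptide, inverted))))
    return pairs


if __name__ == "__main__":
    toy = {"p1": "MKLVAGT", "p2": "QTGAVLE"}  # made-up sequences
    print(inverted_peptide_pairs(toy, 5))
```

Running this for each length of interest would enumerate candidate pairs; the structural-environment comparison and modeling steps described in the abstract are separate and are not covered here.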
The MIGenAS integrated bioinformatics toolkit for web-based sequence analysis

PubMed Central

Rampp, Markus; Soddemann, Thomas; Lederer, Hermann

2006-01-01

We describe a versatile and extensible integrated bioinformatics toolkit for the analysis of biological sequences over the Internet. The web portal offers convenient interactive access to a growing pool of chainable bioinformatics software tools and databases that are centrally installed and maintained by the RZG. Currently, supported tasks comprise sequence similarity searches in public or user-supplied databases, computation and validation of multiple sequence alignments, phylogenetic analysis and protein–structure prediction. Individual tools can be seamlessly chained into pipelines allowing the user to conveniently process complex workflows without the necessity to take care of any format conversions or tedious parsing of intermediate results. The toolkit is part of the Max-Planck Integrated Gene Analysis System (MIGenAS) of the Max Planck Society available at (click ‘Start Toolkit’). PMID:16844980
Analysis of beta-carotene hydroxylase gene cDNA isolated from the American oil-palm (Elaeis oleifera) mesocarp tissue cDNA library

PubMed Central

Bhore, Subhash J; Kassim, Amelia; Loh, Chye Ying; Shah, Farida H

2010-01-01

It is well known that the nutritional quality of the American oil-palm (Elaeis oleifera) mesocarp oil is superior to that of African oil-palm (Elaeis guineensis Jacq. Tenera) mesocarp oil. Therefore, it is important to identify the genetic features responsible for its superior value. This could be achieved through the genome sequencing of the oil-palm. However, the genome sequence is not available in the public domain due to commercial secrecy. Hence, we constructed a cDNA library and generated expressed sequence tags (3,205) from the mesocarp tissue of the American oil-palm. We continued to annotate each of these cDNAs after submitting to GenBank/DDBJ/EMBL. A rough analysis turned our attention to the beta-carotene hydroxylase (Chyb) enzyme encoding cDNA. Then, we completed the full sequencing of the cDNA clone on both strands using M13 forward and reverse primers. The full nucleotide and protein sequence was further analyzed and annotated using various bioinformatics tools. The analysis results showed the presence of a fatty acid hydroxylase superfamily domain in the protein sequence. The multiple sequence alignment of selected Chyb amino acid sequences from other plant species and algal members with E. oleifera Chyb using ClustalW and its phylogenetic analysis suggest that Chyb from the monocotyledonous plant species Lilium hybrid, Crocus sativus and Zea mays are the most evolutionarily related to E. oleifera Chyb. This study reports the annotation of E. oleifera Chyb. Abbreviations: ESTs - expressed sequence tags, EoChyb - Elaeis oleifera beta-carotene hydroxylase, MC - main cluster. PMID:21364789
Bidirectional Retroviral Integration Site PCR Methodology and Quantitative Data Analysis Workflow.

PubMed

Suryawanshi, Gajendra W; Xu, Song; Xie, Yiming; Chou, Tom; Kim, Namshin; Chen, Irvin S Y; Kim, Sanggu

2017-06-14

Integration Site (IS) assays are a critical component of the study of retroviral integration sites and their biological significance. In recent retroviral gene therapy studies, IS assays, in combination with next-generation sequencing, have been used as a cell-tracking tool to characterize clonal stem cell populations sharing the same IS. For the accurate comparison of repopulating stem cell clones within and across different samples, the detection sensitivity, data reproducibility, and high-throughput capacity of the assay are among the most important assay qualities. This work provides a detailed protocol and data analysis workflow for bidirectional IS analysis. The bidirectional assay can simultaneously sequence both upstream and downstream vector-host junctions. Compared to conventional unidirectional IS sequencing approaches, the bidirectional approach significantly improves IS detection rates and the characterization of integration events at both ends of the target DNA. The data analysis pipeline described here accurately identifies and enumerates identical IS sequences through multiple steps of comparison that map IS sequences onto the reference genome and determine sequencing errors. Using an optimized assay procedure, we have recently published the detailed repopulation patterns of thousands of Hematopoietic Stem Cell (HSC) clones following transplant in rhesus macaques, demonstrating for the first time the precise time point of HSC repopulation and the functional heterogeneity of HSCs in the primate system. The following protocol describes the step-by-step experimental procedure and data analysis workflow that accurately identifies and quantifies identical IS sequences.
Coupling detrended fluctuation analysis for multiple warehouse-out behavioral sequences

NASA Astrophysics Data System (ADS)

Yao, Can-Zhong; Lin, Ji-Nan; Zheng, Xu-Zhou

2017-01-01

Interaction patterns among different warehouses could make the warehouse-out behavioral sequences less predictable. We first apply coupling detrended fluctuation analysis to the warehouse-out quantity, and find that the multivariate sequences exhibit significant coupling multifractal characteristics regardless of the types of steel products. Secondly, we track the sources of multifractal warehouse-out sequences by shuffling and surrogating the original ones, and we find that the fat-tail distribution contributes more to multifractal features than the long-term memory, regardless of the types of steel products. From the perspective of warehouse contribution, some warehouses steadily contribute more to multifractal features than other warehouses. Finally, based on multiscale multifractal analysis, we propose a Hurst surface structure to investigate coupling multifractality, and show that multiple behavioral sequences exhibit significant coupling multifractal features that emerge in, and are usually restricted to, relatively large time-scale intervals.
Analysis of DNA Sequences by An Optical Time-Integrating Correlator: Proof-Of-Concept Experiments.

DTIC Science & Technology

1992-05-01

TABLES xv LIST OF ABBREVIATIONS xvii 1.0 INTRODUCTION 1 2.0 DNA ANALYSIS STRATEGY 4 2.1 Representation of DNA Bases 4 2.2 DNA Analysis Strategy 6 3.0...Zehnder architecture. 3 Figure 3: Short representations of the DNA bases where each base is represented by a 7-bits long pseudorandom sequence. 5... DNA bases where each base is represented by 7-bits long pseudorandom sequences. 4 Table 2: Long representations of the DNA bases with 255-bits maximum

Cloning and sequence analysis of Hemonchus contortus HC58cDNA.

PubMed

Muleke, Charles I; Ruofeng, Yan; Lixin, Xu; Xinwen, Bo; Xiangrui, Li

2007-06-01

The complete coding sequence of Hemonchus contortus HC58cDNA was generated by rapid amplification of cDNA ends and polymerase chain reaction using primers based on the 5' and 3' ends of the parasite mRNA, accession no. AF305964. The HC58cDNA gene was 851 bp long, with open reading frame of 717 bp, precursors to 239 amino acids coding for approximately 27 kDa protein. Analysis of amino acid sequence revealed conserved residues of cysteine, histidine, asparagine, occluding loop pattern, hemoglobinase motif and glutamine of the oxyanion hole characteristic of cathepsin B like proteases (CBL). Comparison of the predicted amino acid sequences showed the protein shared 33.5-58.7% identity to cathepsin B homologues in the papain clan CA family (family C1). Phylogenetic analysis revealed close evolutionary proximity of the protein sequence to counterpart sequences in the CBL, suggesting that HC58cDNA was a member of the papain family.
Phylogenetic Analysis of Ruminant Theileria spp. from China Based on 28S Ribosomal RNA Gene

PubMed Central

Gou, Huitian; Guan, Guiquan; Ma, Miling; Liu, Aihong; Liu, Zhijie; Xu, Zongke; Ren, Qiaoyun; Li, Youquan; Yang, Jifei; Chen, Ze

2013-01-01

Species identification using DNA sequences is the basis for DNA taxonomy. In this study, we sequenced the ribosomal large-subunit RNA gene sequences (3,037-3,061 bp) in length of 13 Chinese Theileria stocks that were infective to cattle and sheep. The complete 28S rRNA gene is relatively difficult to amplify and its conserved region is not important for phylogenetic study. Therefore, we selected the D2-D3 region from the complete 28S rRNA sequences for phylogenetic analysis. Our analyses of 28S rRNA gene sequences showed that the 28S rRNA was useful as a phylogenetic marker for analyzing the relationships among Theileria spp. in ruminants. In addition, the D2-D3 region was a short segment that could be used instead of the whole 28S rRNA sequence during the phylogenetic analysis of Theileria, and it may be an ideal DNA barcode. PMID:24327775
Phylogenetic analysis of ruminant Theileria spp. from China based on 28S ribosomal RNA gene.

PubMed

Gou, Huitian; Guan, Guiquan; Ma, Miling; Liu, Aihong; Liu, Zhijie; Xu, Zongke; Ren, Qiaoyun; Li, Youquan; Yang, Jifei; Chen, Ze; Yin, Hong; Luo, Jianxun

2013-10-01

Species identification using DNA sequences is the basis for DNA taxonomy. In this study, we sequenced the ribosomal large-subunit RNA gene (3,037-3,061 bp in length) of 13 Chinese Theileria stocks that were infective to cattle and sheep. The complete 28S rRNA gene is relatively difficult to amplify and its conserved region is not important for phylogenetic study. Therefore, we selected the D2-D3 region from the complete 28S rRNA sequences for phylogenetic analysis. Our analyses of 28S rRNA gene sequences showed that the 28S rRNA was useful as a phylogenetic marker for analyzing the relationships among Theileria spp. in ruminants. In addition, the D2-D3 region was a short segment that could be used instead of the whole 28S rRNA sequence during the phylogenetic analysis of Theileria, and it may be an ideal DNA barcode.
Clonal architecture of secondary acute myeloid leukemia defined by single-cell sequencing.

PubMed

Hughes, Andrew E O; Magrini, Vincent; Demeter, Ryan; Miller, Christopher A; Fulton, Robert; Fulton, Lucinda L; Eades, William C; Elliott, Kevin; Heath, Sharon; Westervelt, Peter; Ding, Li; Conrad, Donald F; White, Brian S; Shao, Jin; Link, Daniel C; DiPersio, John F; Mardis, Elaine R; Wilson, Richard K; Ley, Timothy J; Walter, Matthew J; Graubert, Timothy A

2014-07-01

Next-generation sequencing has been used to infer the clonality of heterogeneous tumor samples. These analyses yield specific predictions-the population frequency of individual clones, their genetic composition, and their evolutionary relationships-which we set out to test by sequencing individual cells from three subjects diagnosed with secondary acute myeloid leukemia, each of whom had been previously characterized by whole genome sequencing of unfractionated tumor samples. Single-cell mutation profiling strongly supported the clonal architecture implied by the analysis of bulk material. In addition, it resolved the clonal assignment of single nucleotide variants that had been initially ambiguous and identified areas of previously unappreciated complexity. Accordingly, we find that many of the key assumptions underlying the analysis of tumor clonality by deep sequencing of unfractionated material are valid. Furthermore, we illustrate a single-cell sequencing strategy for interrogating the clonal relationships among known variants that is cost-effective, scalable, and adaptable to the analysis of both hematopoietic and solid tumors, or any heterogeneous population of cells.
Harnessing Whole Genome Sequencing in Medical Mycology.

PubMed

Cuomo, Christina A

2017-01-01

Comparative genome sequencing studies of human fungal pathogens enable identification of genes and variants associated with virulence and drug resistance. This review describes current approaches, resources, and advances in applying whole genome sequencing to study clinically important fungal pathogens. Genomes for some important fungal pathogens were only recently assembled, revealing gene family expansions in many species and extreme gene loss in one obligate species. The scale and scope of species sequenced is rapidly expanding, leveraging technological advances to assemble and annotate genomes with higher precision. By using iteratively improved reference assemblies or those generated de novo for new species, recent studies have compared the sequence of isolates representing populations or clinical cohorts. Whole genome approaches provide the resolution necessary for comparison of closely related isolates, for example, in the analysis of outbreaks or sampled across time within a single host. Genomic analysis of fungal pathogens has enabled both basic research and diagnostic studies. The increased scale of sequencing can be applied across populations, and new metagenomic methods allow direct analysis of complex samples.
Evaluation of 16S rRNA amplicon sequencing using two next-generation sequencing technologies for phylogenetic analysis of the rumen bacterial community in steers

USDA-ARS's Scientific Manuscript database

Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
Prefrontal neural correlates of memory for sequences.

PubMed

Averbeck, Bruno B; Lee, Daeyeol

2007-02-28

The sequence of actions appropriate to solve a problem often needs to be discovered by trial and error and recalled in the future when faced with the same problem. Here, we show that when monkeys had to discover and then remember a sequence of decisions across trials, ensembles of prefrontal cortex neurons reflected the sequence of decisions the animal would make throughout the interval between trials. This signal could reflect either an explicit memory process or a sequence-planning process that begins far in advance of the actual sequence execution. This finding extended to error trials such that, when the neural activity during the intertrial interval specified the wrong sequence, the animal also attempted to execute an incorrect sequence. More specifically, we used a decoding analysis to predict the sequence the monkey was planning to execute at the end of the fore-period, just before sequence execution. When this analysis was applied to error trials, we were able to predict where in the sequence the error would occur, up to three movements into the future. This suggests that prefrontal neural activity can retain information about sequences between trials, and that regardless of whether information is remembered correctly or incorrectly, the prefrontal activity veridically reflects the animal's action plan.
Application of representational difference analysis to identify genomic differences between Bradyrhizobium elkanii and B. japonicum species.

PubMed

Soares, René Arderius; Passaglia, Luciane Maria Pereira

2010-10-01

Bradyrhizobium elkanii is successfully used in the formulation of commercial inoculants and, together with B. japonicum, it fully supplies the plant nitrogen demands. Despite the similarity between B. japonicum and B. elkanii species, several works demonstrated genetic and physiological differences between them. In this work Representational Difference Analysis (RDA) was used for genomic comparison between B. elkanii SEMIA 587, a crop inoculant strain, and B. japonicum USDA 110, a reference strain. Two hundred sequences were obtained. From these, 46 sequences belonged exclusively to the genome of B. elkanii strain, and 154 showed similarity to sequences from B. japonicum genome. From the 46 sequences with no similarity to sequences from B. japonicum, 39 showed no similarity to sequences in public databases and seven showed similarity to sequences of genes coding for known proteins. These seven sequences were divided in three groups: similar to sequences from other Bradyrhizobium strains, similar to sequences from other nitrogen-fixing bacteria, and similar to sequences from non nitrogen-fixing bacteria. These new sequences could be used as DNA markers in order to investigate the rates of genetic material gain and loss in natural Bradyrhizobium strains.
Molecular Phylogenetic Analysis of Archaeal Intron-Containing Genes Coding for rRNA Obtained from a Deep-Subsurface Geothermal Water Pool

PubMed Central

Takai, Ken; Horikoshi, Koki

1999-01-01

Molecular phylogenetic analysis of a naturally occurring microbial community in a deep-subsurface geothermal environment indicated that the phylogenetic diversity of the microbial population in the environment was extremely limited and that only hyperthermophilic archaeal members closely related to Pyrobaculum were present. All archaeal ribosomal DNA sequences contained intron-like sequences, some of which had open reading frames with repeated homing-endonuclease motifs. The sequence similarity analysis and the phylogenetic analysis of these homing endonucleases suggested the possible phylogenetic relationship among archaeal rRNA-encoded homing endonucleases. PMID:10584021
Interactive computer programs for the graphic analysis of nucleotide sequence data.

PubMed Central

Luckow, V A; Littlewood, R K; Rownd, R H

1984-01-01

A group of interactive computer programs have been developed which aid in the collection and graphical analysis of nucleotide and protein sequence data. The programs perform the following basic functions: a) enter, edit, list, and rearrange sequence data; b) permit automatic entry of nucleotide sequence data directly from an autoradiograph into the computer; c) search for restriction sites or other specified patterns and plot a linear or circular restriction map, or print their locations; d) plot base composition; e) analyze homology between sequences by plotting a two-dimensional graphic matrix; and f) aid in plotting predicted secondary structures of RNA molecules. PMID:6546437
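The two-dimensional homology matrix mentioned in item (e) can be illustrated with a minimal dot-matrix sketch. This is not the authors' program: the function name `dot_matrix`, the window length, and the match threshold are all assumptions chosen for illustration.

```python
def dot_matrix(seq_a, seq_b, window=3, min_match=3):
    """Return (i, j) pairs where windows of seq_a and seq_b are similar.

    A dot is recorded when the window starting at seq_a[i] and the window
    starting at seq_b[j] agree in at least `min_match` positions.
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical residues at aligned positions in the two windows.
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= min_match:
                dots.add((i, j))
    return dots


if __name__ == "__main__":
    # Dots on a common diagonal indicate a shared segment ("TTAC" here).
    print(sorted(dot_matrix("GATTACA", "TTAC")))
```

Plotting the returned pairs as points gives the familiar diagonal runs wherever the two sequences share a segment; lowering `min_match` below `window` tolerates mismatches at the cost of more background dots.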
A disruptive sequencer meets disruptive publishing.

PubMed

Loman, Nick; Goodwin, Sarah; Jansen, Hans; Loose, Matt

2015-01-01

Nanopore sequencing was recently made available to users in the form of the Oxford Nanopore MinION. Released to users through an early access programme, the MinION is made unique by its tiny form factor and ability to generate very long sequences from single DNA molecules. The platform is undergoing rapid evolution with three distinct nanopore types and five updates to library preparation chemistry in the last 18 months. To keep pace with the rapid evolution of this sequencing platform, and to provide a space where new analysis methods can be openly discussed, we present a new F1000Research channel devoted to updates to and analysis of nanopore sequence data.
Quantitative analysis of the anti-noise performance of an m-sequence in an electromagnetic method

NASA Astrophysics Data System (ADS)

Yuan, Zhe; Zhang, Yiming; Zheng, Qijia

2018-02-01

An electromagnetic method with a transmitted waveform coded by an m-sequence achieved better anti-noise performance compared to the conventional manner with a square-wave. The anti-noise performance of the m-sequence varied with multiple coding parameters; hence, a quantitative analysis of the anti-noise performance for m-sequences with different coding parameters was required to optimize them. This paper proposes the concept of an identification system, with the identified Earth impulse response obtained by measuring the system output with the input of the voltage response. A quantitative analysis of the anti-noise performance of the m-sequence was achieved by analyzing the amplitude-frequency response of the corresponding identification system. The effects of the coding parameters on the anti-noise performance are summarized by numerical simulation, and their optimization is further discussed in our conclusions; the validity of the conclusions is further verified by field experiment. The quantitative analysis method proposed in this paper provides a new insight into the anti-noise mechanism of the m-sequence, and could be used to evaluate the anti-noise performance of artificial sources in other time-domain exploration methods, such as the seismic method.
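For readers unfamiliar with m-sequences, the following is a minimal sketch of how a maximal-length binary sequence can be generated with a linear feedback shift register, and of the flat off-peak autocorrelation that makes such codes attractive as source waveforms. It is not the authors' code: the register length (4) and tap positions (4, 3) are illustrative assumptions and are unrelated to the coding parameters studied in the paper.

```python
def m_sequence(taps=(4, 3), nbits=4, seed=1):
    """Return one period (2**nbits - 1 bits) of a Fibonacci LFSR output.

    taps: 1-based register positions XORed to form the feedback bit.
    The default taps correspond to x^4 + x^3 + 1, a primitive polynomial,
    so the output is a maximal-length sequence of period 15.
    """
    state = [(seed >> i) & 1 for i in range(nbits)]
    if not any(state):
        raise ValueError("seed must give a non-zero register state")
    out = []
    for _ in range(2 ** nbits - 1):
        out.append(state[-1])          # output the last register cell
        feedback = 0
        for t in taps:
            feedback ^= state[t - 1]   # XOR of the tapped cells
        state = [feedback] + state[:-1]  # shift right, insert feedback
    return out


def circular_autocorrelation(bits, shift):
    """Autocorrelation of the bipolar (+1/-1) version of `bits` at `shift`."""
    n = len(bits)
    bipolar = [1 if b else -1 for b in bits]
    return sum(bipolar[i] * bipolar[(i + shift) % n] for i in range(n))


if __name__ == "__main__":
    seq = m_sequence()
    print(seq)
    print([circular_autocorrelation(seq, s) for s in range(len(seq))])
```

The autocorrelation equals the sequence length at zero shift and -1 at every other shift, which is the property that lets a receiver separate the coded signal from uncorrelated noise.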
The complete and fully assembled genome sequence of Aeromonas salmonicida subsp. pectinolytica and its comparative analysis with other Aeromonas species: investigation of the mobilome in environmental and pathogenic strains.

PubMed

Pfeiffer, Friedhelm; Zamora-Lagos, Maria-Antonia; Blettinger, Martin; Yeroslaviz, Assa; Dahl, Andreas; Gruber, Stephan; Habermann, Bianca H

2018-01-05

Due to the predominant usage of short-read sequencing to date, most bacterial genome sequences reported in the last years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. Here we report the finalized genome sequence of the environmental strain Aeromonas salmonicida subsp. pectinolytica 34mel, for which only a draft genome with 253 contigs is currently available. Successful completion of the transposon-rich genome critically depended on the PacBio long read sequencing technology. Using finalized genome sequences of A. salmonicida subsp. pectinolytica and other Aeromonads, we report the detailed analysis of the transposon composition of these bacterial species. Mobilome evolution is exemplified by a complex transposon, which has shifted from pathogenicity-related to environmental-related gene content in A. salmonicida subsp. pectinolytica 34mel. Obtaining the complete, circular genome of A. salmonicida subsp. pectinolytica allowed us to perform an in-depth analysis of its mobilome. We demonstrate the mobilome-dependent evolution of this strain's genetic profile from pathogenic to environmental.
Evaluating the protein coding potential of exonized transposable element sequences

PubMed Central

Piriyapongsa, Jittima; Rutledge, Mark T; Patel, Sanil; Borodovsky, Mark; Jordan, I King

2007-01-01

Background: Transposable element (TE) sequences, once thought to be merely selfish or parasitic members of the genomic community, have been shown to contribute a wide variety of functional sequences to their host genomes. Analysis of complete genome sequences has turned up numerous cases where TE sequences have been incorporated as exons into mRNAs, and it is widely assumed that such 'exonized' TEs encode protein sequences. However, the extent to which TE-derived sequences actually encode proteins is unknown and a matter of some controversy. We have tried to address this outstanding issue from two perspectives: (i) by evaluating ascertainment biases related to the search methods used to uncover TE-derived protein coding sequences (CDS) and (ii) through a probabilistic codon-frequency based analysis of the protein coding potential of TE-derived exons. Results: We compared the ability of three classes of sequence similarity search methods to detect TE-derived sequences among data sets of experimentally characterized proteins: (1) a profile-based hidden Markov model (HMM) approach, (2) BLAST methods and (3) RepeatMasker. Profile-based methods are more sensitive and more selective than the other methods evaluated. However, the application of profile-based search methods to the detection of TE-derived sequences among well-curated experimentally characterized protein data sets did not turn up many more cases than had been previously detected and nowhere near as many cases as recent genome-wide searches have. We observed that the different search methods used were complementary in the sense that they yielded largely non-overlapping sets of hits and differed in their ability to recover known cases of TE-derived CDS. The probabilistic analysis of TE-derived exon sequences indicates that these sequences have low protein coding potential on average. In particular, non-autonomous TEs that do not encode protein sequences, such as Alu elements, are frequently exonized but unlikely to encode protein sequences. Conclusion: The exaptation of the numerous TE sequences found in exons as bona fide protein coding sequences may prove to be far less common than has been suggested by the analysis of complete genomes. We hypothesize that many exonized TE sequences actually function as post-transcriptional regulators of gene expression, rather than coding sequences, which may act through a variety of double stranded RNA related regulatory pathways. Indeed, their relatively high copy numbers and similarity to sequences dispersed throughout the genome suggest that exonized TE sequences could serve as master regulators with a wide scope of regulatory influence. Reviewers: This article was reviewed by Itai Yanai, Kateryna D. Makova, Melissa Wilson (nominated by Kateryna D. Makova) and Cedric Feschotte (nominated by John M. Logsdon Jr.). PMID:18036258
Use of Sequence-independent, single-primer amplification (SISPA) with NGS platform for detection of RNA viruses in clinical samples

USDA-ARS's Scientific Manuscript database

Current technologies for next generation sequencing (NGS) have revolutionized metagenomics analysis of clinical samples. One advantage of the NGS platform is the possibility to sequence the genetic material in samples without any prior knowledge of the sequence contained within. Sequence-Independent...
Evaluation of targeted exome sequencing for 28 protein-based blood group systems, including the homologous gene systems, for blood group genotyping.

PubMed

Schoeman, Elizna M; Lopez, Genghis H; McGowan, Eunike C; Millard, Glenda M; O'Brien, Helen; Roulis, Eileen V; Liew, Yew-Wah; Martin, Jacqueline R; McGrath, Kelli A; Powley, Tanya; Flower, Robert L; Hyland, Catherine A

2017-04-01

Blood group single nucleotide polymorphism genotyping probes for a limited range of polymorphisms. This study investigated whether massively parallel sequencing (also known as next-generation sequencing), with a targeted exome strategy, provides an extended blood group genotype and the extent to which massively parallel sequencing correctly genotypes in homologous gene systems, such as RH and MNS. Donor samples (n = 28) that were extensively phenotyped and genotyped using single nucleotide polymorphism typing, were analyzed using the TruSight One Sequencing Panel and MiSeq platform. Genes for 28 protein-based blood group systems, GATA1, and KLF1 were analyzed. Copy number variation analysis was used to characterize complex structural variants in the GYPC and RH systems. The average sequencing depth per target region was 66.2 ± 39.8. Each sample harbored on average 43 ± 9 variants, of which 10 ± 3 were used for genotyping. For the 28 samples, massively parallel sequencing variant sequences correctly matched expected sequences based on single nucleotide polymorphism genotyping data. Copy number variation analysis defined the Rh C/c alleles and complex RHD hybrids. Hybrid RHD*D-CE-D variants were correctly identified, but copy number variation analysis did not confidently distinguish between D and CE exon deletion versus rearrangement. The targeted exome sequencing strategy employed extended the range of blood group genotypes detected compared with single nucleotide polymorphism typing. This single-test format included detection of complex MNS hybrid cases and, with copy number variation analysis, defined RH hybrid genes along with the RHCE*C allele hitherto difficult to resolve by variant detection. The approach is economical compared with whole-genome sequencing and is suitable for a red blood cell reference laboratory setting. © 2017 AABB.
Conservation and variability of West Nile virus proteins.

PubMed

Koo, Qi Ying; Khan, Asif M; Jung, Keun-Ok; Ramdas, Shweta; Miotto, Olivo; Tan, Tin Wee; Brusic, Vladimir; Salmon, Jerome; August, J Thomas

2009-01-01

West Nile virus (WNV) has emerged globally as an increasingly important pathogen for humans and domestic animals. Studies of the evolutionary diversity of the virus over its known history will help to elucidate conserved sites, and characterize their correspondence to other pathogens and their relevance to the immune system. We describe a large-scale analysis of the entire WNV proteome, aimed at identifying and characterizing evolutionarily conserved amino acid sequences. This study, which used 2,746 WNV protein sequences collected from the NCBI GenPept database, focused on analysis of peptides of length 9 amino acids or more, which are immunologically relevant as potential T-cell epitopes. Entropy-based analysis of the diversity of WNV sequences revealed the presence of numerous evolutionarily stable nonamer positions across the proteome (entropy value ≤ 1). The representation (frequency) of nonamers variant to the predominant peptide at these stable positions was generally low (≤ 10% of the WNV sequences analyzed). Eighty-eight fragments of length 9-29 amino acids, representing approximately 34% of the WNV polyprotein length, were identified to be identical and evolutionarily stable in all analyzed WNV sequences. Of the 88 completely conserved sequences, 67 are also present in other flaviviruses, and several have been associated with the functional and structural properties of viral proteins. Immunoinformatic analysis revealed that the majority (78/88) of conserved sequences are potentially immunogenic, while 44 contained experimentally confirmed human T-cell epitopes. This study identified a comprehensive catalogue of completely conserved WNV sequences, many of which are shared by other flaviviruses, and the majority of which are potential epitopes. The complete conservation of these immunologically relevant sequences through the entire recorded WNV history suggests they will be valuable as components of peptide-specific vaccines or other therapeutic applications, for sequence-specific diagnosis of a wide range of Flavivirus infections, and for studies of homologous sequences among other flaviviruses.
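The entropy-based screen for stable nonamer positions described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes plain Shannon entropy in bits, a gap-free alignment of equal-length sequences, and a hypothetical helper name `nonamer_entropy`; any sample-size correction the study may have applied is not reproduced.

```python
from collections import Counter
from math import log2


def nonamer_entropy(aligned, k=9):
    """Shannon entropy (bits) of the k-mers starting at each alignment column.

    aligned: list of equal-length protein sequences (assumed gap-free).
    Returns one entropy value per k-mer start position.
    """
    length = len(aligned[0])
    entropies = []
    for start in range(length - k + 1):
        # Tally the distinct k-mers observed at this position.
        counts = Counter(seq[start:start + k] for seq in aligned)
        total = sum(counts.values())
        entropy = sum((c / total) * log2(total / c) for c in counts.values())
        entropies.append(entropy)
    return entropies


if __name__ == "__main__":
    toy = ["ABCDEFGHIJ", "ABCDEFGHIJ", "ABCDEFGHIK", "ABCDEFGHIK"]
    # Position 0 is fully conserved; position 1 has two equally common variants.
    print(nonamer_entropy(toy))
```

Positions whose entropy is at or below a chosen cutoff (the abstract uses 1) would then be treated as evolutionarily stable, and an entropy of exactly 0 marks a nonamer that is identical in every sequence.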
Targeted Analysis of Whole Genome Sequence Data to Diagnose Genetic Cardiomyopathy

DOE PAGES

Golbus, Jessica R.; Puckelwartz, Megan J.; Dellefave-Castillo, Lisa; ...

2014-09-01

Background: Cardiomyopathy is highly heritable but genetically diverse. At present, genetic testing for cardiomyopathy uses targeted sequencing to simultaneously assess the coding regions of more than 50 genes. New genes are routinely added to panels to improve the diagnostic yield. With the anticipated $1000 genome, it is expected that genetic testing will shift towards comprehensive genome sequencing accompanied by targeted gene analysis. Therefore, we assessed the reliability of whole genome sequencing and targeted analysis to identify cardiomyopathy variants in 11 subjects with cardiomyopathy. Methods and Results: Whole genome sequencing with an average of 37× coverage was combined with targeted analysis focused on 204 genes linked to cardiomyopathy. Genetic variants were scored using multiple prediction algorithms combined with frequency data from public databases. This pipeline yielded 1-14 potentially pathogenic variants per individual. Variants were further analyzed using clinical criteria and/or segregation analysis. Three of three previously identified primary mutations were detected by this analysis. In six subjects for whom the primary mutation was previously unknown, we identified mutations that segregated with disease, had clinical correlates, and/or had additional pathological correlation to provide evidence for causality. For two subjects with previously known primary mutations, we identified additional variants that may act as modifiers of disease severity. In total, we identified the likely pathological mutation in 9 of 11 (82%) subjects. We conclude that these pilot data demonstrate that ~30-40× coverage whole genome sequencing combined with targeted analysis is feasible and sensitive to identify rare variants in cardiomyopathy-associated genes.
Multifractal analysis of 2001 Mw 7.7 Bhuj earthquake sequence in Gujarat, Western India

NASA Astrophysics Data System (ADS)

Aggarwal, Sandeep Kumar; Pastén, Denisse; Khan, Prosanta Kumar

2017-12-01

The 2001 Mw 7.7 Bhuj mainshock seismic sequence in the Kachchh area, occurring during 2001 to 2012, has been analyzed using mono-fractal and multi-fractal dimension spectrum analysis techniques. This region was characterized by frequent moderate shocks of Mw ≥ 5.0 for more than a decade after the 2001 Bhuj earthquake. The present study is therefore important for precursory analysis using this sequence. The selected long sequence has been investigated for the first time for a completeness magnitude of Mc 3.0 using the maximum curvature method. Multi-fractal Dq spectrum (Dq ∼ q) analysis was carried out using an effective window length of 200 earthquakes with a moving window of 20 events overlapped by 180 events. The robustness of the analysis has been tested by applying a magnitude completeness correction term of 0.2 to Mc 3.0, giving Mc 3.2, and we have tested the error in the calculation of Dq for each magnitude threshold. In addition, the stability of the analysis has been investigated down to the minimum magnitude of Mw ≥ 2.6 in the sequence. The analysis shows that the multi-fractal dimension spectrum Dq decreases with increasing clustering of events with time before a moderate-magnitude earthquake in the sequence, which accounts for non-randomness in the spatial distribution of epicenters and its self-organized criticality. Similar behavior is ubiquitous elsewhere around the globe, and warns of the proximity of a damaging seismic event in an area.
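The windowing scheme in this abstract (200 events per window, advanced 20 events at a time so that consecutive windows share 180 events) can be sketched as below. The helper name `sliding_windows` and the toy catalogue are assumptions for illustration only; the computation of Dq within each window is not shown.

```python
def sliding_windows(events, size=200, step=20):
    """Split a time-ordered event list into overlapping windows.

    Each window holds `size` events and starts `step` events after the
    previous one, so consecutive windows share `size - step` events.
    """
    last_start = len(events) - size
    return [events[i:i + size] for i in range(0, last_start + 1, step)]


if __name__ == "__main__":
    catalogue = list(range(260))  # stand-in for 260 time-ordered events
    windows = sliding_windows(catalogue)
    shared = len(set(windows[0]) & set(windows[1]))
    print(len(windows), shared)
```

A fractal measure evaluated on each window in turn then yields a time series that can be inspected for changes ahead of the larger events.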

JGI Fungal Genomics Program

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grigoriev, Igor V.

2011-03-14

Genomes of energy and environment fungi are in focus of the Fungal Genomic Program at the US Department of Energy Joint Genome Institute (JGI). Its key project, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts), and explores fungal diversity by means of genome sequencing and analysis. Over 50 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supported by functional genomics leads to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such 'parts' suggested by comparative genomics and functional analysis in these areas are presented here.
Uncommonly isolated clinical Pseudomonas: identification and phylogenetic assignation.

PubMed

Mulet, M; Gomila, M; Ramírez, A; Cardew, S; Moore, E R B; Lalucat, J; García-Valdés, E

2017-02-01

Fifty-two Pseudomonas strains that were difficult to identify at the species level in the phenotypic routine characterizations employed by clinical microbiology laboratories were selected for genotypic-based analysis. Species level identifications were done initially by partial sequencing of the DNA dependent RNA polymerase sub-unit D gene (rpoD). Two other gene sequences, for the small sub-unit ribosomal RNA (16S rRNA) and for DNA gyrase sub-unit B (gyrB), were added in a multilocus sequence analysis (MLSA) study to confirm the species identifications. These sequences were analyzed with a collection of reference sequences from the type strains of 161 Pseudomonas species within an in-house multi-locus sequence analysis database. Whole-cell matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry (MALDI-TOF MS) analyses of these strains complemented the DNA sequence-based phylogenetic analyses and were observed to be in accordance with the results of the sequence data. Twenty-three out of 52 strains were assigned to 12 recognized species not commonly detected in clinical specimens and 29 (56 %) were considered representatives of at least ten putative new species. Most strains were distributed within the P. fluorescens and P. aeruginosa lineages. The value of rpoD sequences in species-level identifications for Pseudomonas is emphasized. The correct species identification of clinical strains is essential for establishing the intrinsic antibiotic resistance patterns and improved treatment plans.
Isolation and determination of the primary structure of a lectin protein from the serum of the American alligator (Alligator mississippiensis).

PubMed

Darville, Lancia N F; Merchant, Mark E; Maccha, Venkata; Siddavarapu, Vivekananda Reddy; Hasan, Azeem; Murray, Kermit K

2012-02-01

Mass spectrometry in conjunction with de novo sequencing was used to determine the amino acid sequence of a 35kDa lectin protein isolated from the serum of the American alligator that exhibits binding to mannose. The protein N-terminal sequence was determined using Edman degradation and enzymatic digestion with different proteases was used to generate peptide fragments for analysis by liquid chromatography tandem mass spectrometry (LC MS/MS). Separate analysis of the protein digests with multiple enzymes enhanced the protein sequence coverage. De novo sequencing was accomplished using MASCOT Distiller and PEAKS software and the sequences were searched against the NCBI database using MASCOT and BLAST to identify homologous peptides. MS analysis of the intact protein indicated that it is present primarily as monomer and dimer in vitro. The isolated 35kDa protein was ~98% sequenced and found to have 313 amino acids and nine cysteine residues and was identified as an alligator lectin. The alligator lectin sequence was aligned with other lectin sequences using DIALIGN and ClustalW software and was found to exhibit 58% and 59% similarity to both human and mouse intelectin-1. The alligator lectin exhibited strong binding affinities toward mannan and mannose as compared to other tested carbohydrates. Copyright © 2011 Elsevier Inc. All rights reserved.
Genome-wide identification of conserved intronic non-coding sequences using a Bayesian segmentation approach.

PubMed

Algama, Manjula; Tasker, Edward; Williams, Caitlin; Parslow, Adam C; Bryson-Richardson, Robert J; Keith, Jonathan M

2017-03-27

Computational identification of non-coding RNAs (ncRNAs) is a challenging problem. We describe a genome-wide analysis using Bayesian segmentation to identify intronic elements highly conserved between three evolutionarily distant vertebrate species: human, mouse and zebrafish. We investigate the extent to which these elements include ncRNAs (or conserved domains of ncRNAs) and regulatory sequences. We identified 655 deeply conserved intronic sequences in a genome-wide analysis. We also performed a pathway-focussed analysis on genes involved in muscle development, detecting 27 intronic elements, of which 22 were not detected in the genome-wide analysis. At least 87% of the genome-wide and 70% of the pathway-focussed elements have existing annotations indicative of conserved RNA secondary structure. The expression of 26 of the pathway-focused elements was examined using RT-PCR, providing confirmation that they include expressed ncRNAs. Consistent with previous studies, these elements are significantly over-represented in the introns of transcription factors. This study demonstrates a novel, highly effective, Bayesian approach to identifying conserved non-coding sequences. Our results complement previous findings that these sequences are enriched in transcription factors. However, in contrast to previous studies which suggest the majority of conserved sequences are regulatory factor binding sites, the majority of conserved sequences identified using our approach contain evidence of conserved RNA secondary structures, and our laboratory results suggest most are expressed. Functional roles at DNA and RNA levels are not mutually exclusive, and many of our elements possess evidence of both. Moreover, ncRNAs play roles in transcriptional and post-transcriptional regulation, and this may contribute to the over-representation of these elements in introns of transcription factors. 
We attribute the higher sensitivity of the pathway-focussed analysis compared to the genome-wide analysis to improved alignment quality, suggesting that enhanced genomic alignments may reveal many more conserved intronic sequences.
Sensitivity of BRCA1/2 testing in high-risk breast/ovarian/male breast cancer families: little contribution of comprehensive RNA/NGS panel testing.

PubMed

Byers, Helen; Wallis, Yvonne; van Veen, Elke M; Lalloo, Fiona; Reay, Kim; Smith, Philip; Wallace, Andrew J; Bowers, Naomi; Newman, William G; Evans, D Gareth

2016-11-01

The sensitivity of testing BRCA1 and BRCA2 remains unresolved as the frequency of deep intronic splicing variants has not been defined in high-risk familial breast/ovarian cancer families. This variant category is reported at significant frequency in other tumour predisposition genes, including NF1 and MSH2. We carried out comprehensive whole gene RNA analysis on 45 high-risk breast/ovary and male breast cancer families with no identified pathogenic variant on exonic sequencing and copy number analysis of BRCA1/2. In addition, we undertook variant screening of a 10-gene high/moderate risk breast/ovarian cancer panel by next-generation sequencing. DNA testing identified the causative variant in 50/56 (89%) breast/ovarian/male breast cancer families with Manchester scores of ≥50 with two variants being confirmed to affect splicing on RNA analysis. RNA sequencing of BRCA1/BRCA2 on 45 individuals from high-risk families identified no deep intronic variants and did not suggest loss of RNA expression as a cause of lost sensitivity. Panel testing in 42 samples identified a known RAD51D variant, a high-risk ATM variant in another breast ovary family and a truncating CHEK2 mutation. Current exonic sequencing and copy number analysis variant detection methods of BRCA1/2 have high sensitivity in high-risk breast/ovarian cancer families. Sequence analysis of RNA does not identify any variants undetected by current analysis of BRCA1/2. However, RNA analysis clarified the pathogenicity of variants of unknown significance detected by current methods. The low diagnostic uplift achieved through sequence analysis of the other known breast/ovarian cancer susceptibility genes indicates that further high-risk genes remain to be identified.
Information theory applications for biological sequence analysis.

PubMed

Vinga, Susana

2014-05-01

Information theory (IT) addresses the analysis of communication systems and has been widely applied in molecular biology. In particular, alignment-free sequence analysis and comparison greatly benefited from concepts derived from IT, such as entropy and mutual information. This review covers several aspects of IT applications, ranging from genome global analysis and comparison, including block-entropy estimation and resolution-free metrics based on iterative maps, to local analysis, comprising the classification of motifs, prediction of transcription factor binding sites and sequence characterization based on linguistic complexity and entropic profiles. IT has also been applied to high-level correlations that combine DNA, RNA or protein features with sequence-independent properties, such as gene mapping and phenotype analysis, and has also provided models based on communication systems theory to describe information transmission channels at the cell level and also during evolutionary processes. While not exhaustive, this review attempts to categorize existing methods and to indicate their relation with broader transversal topics such as genomic signatures, data compression and complexity, time series analysis and phylogenetic classification, providing a resource for future developments in this promising area.
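As a rough illustration of the entropy concept this review applies to alignment-free analysis, the sketch below computes the Shannon entropy of a sequence's overlapping k-mer distribution. The function name `kmer_entropy` and the use of overlapping k-mers are assumptions chosen for illustration; the review does not prescribe a specific estimator.

```python
from collections import Counter
from math import log2


def kmer_entropy(seq: str, k: int = 1) -> float:
    """Shannon entropy (in bits) of the overlapping k-mer distribution of seq.

    Hypothetical helper for illustration; not taken from the review.
    """
    seq = seq.upper()
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    counts = Counter(kmers)
    total = len(kmers)
    # H = -sum(p * log2(p)) over the observed k-mers
    return -sum((c / total) * log2(c / total) for c in counts.values())


if __name__ == "__main__":
    # A uniform four-letter sequence carries 2 bits per symbol,
    # while a homopolymer carries none.
    print(kmer_entropy("ACGT"))
    print(kmer_entropy("AAAA"))
```

Low entropy flags repetitive or compositionally biased regions; larger values of k capture longer-range structure but need longer sequences for a stable estimate.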
Plastome Sequence Determination and Comparative Analysis for Members of the Lolium-Festuca Grass Species Complex

PubMed Central

Hand, Melanie L.; Spangenberg, German C.; Forster, John W.; Cogan, Noel O. I.

2013-01-01

Chloroplast genome sequences are of broad significance in plant biology, due to frequent use in molecular phylogenetics, comparative genomics, population genetics, and genetic modification studies. The present study used a second-generation sequencing approach to determine and assemble the plastid genomes (plastomes) of four representatives from the agriculturally important Lolium-Festuca species complex of pasture grasses (Lolium multiflorum, Festuca pratensis, Festuca altissima, and Festuca ovina). Total cellular DNA was extracted from either roots or leaves, was sequenced, and the output was filtered for plastome-related reads. A comparison between sources revealed fewer plastome-related reads from root-derived template but an increase in incidental bacterium-derived sequences. Plastome assembly and annotation indicated high levels of sequence identity and a conserved organization and gene content between species. However, frequent deletions within the F. ovina plastome appeared to contribute to a smaller plastid genome size. Comparative analysis with complete plastome sequences from other members of the Poaceae confirmed conservation of most grass-specific features. Detailed analysis of the rbcL–psaI intergenic region, however, revealed a “hot-spot” of variation characterized by independent deletion events. The evolutionary implications of this observation are discussed. The complete plastome sequences are anticipated to provide the basis for potential organelle-specific genetic modification of pasture grasses. PMID:23550121
Generation and analysis of expressed sequence tags from a cDNA library of the fruiting body of Ganoderma lucidum

PubMed Central

2010-01-01

Background: Little genomic or transcriptomic information on Ganoderma lucidum (Lingzhi) is known. This study aims to discover the transcripts involved in secondary metabolite biosynthesis and developmental regulation of G. lucidum using an expressed sequence tag (EST) library. Methods: A cDNA library was constructed from the G. lucidum fruiting body. Its high-quality ESTs were assembled into unique sequences with contigs and singletons. The unique sequences were annotated according to sequence similarities to genes or proteins available in public databases. The detection of simple sequence repeats (SSRs) was performed by online analysis. Results: A total of 1,023 clones were randomly selected from the G. lucidum library and sequenced, yielding 879 high-quality ESTs. These ESTs showed similarities to a diverse range of genes. The sequences encoding squalene epoxidase (SE) and farnesyl-diphosphate synthase (FPS) were identified in this EST collection. Several candidate genes, such as hydrophobin, MOB2, profilin and PHO84, were detected for the first time in G. lucidum. Thirteen (13) potential SSR-motif microsatellite loci were also identified. Conclusion: The present study demonstrates a successful application of EST analysis in the discovery of transcripts involved in the secondary metabolite biosynthesis and the developmental regulation of G. lucidum. PMID:20230644
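The abstract does not name the online tool or the thresholds used for SSR detection, so the following is only a minimal sketch of how simple sequence repeats can be located with a regular expression. The function name `find_ssrs` and the defaults (motifs of 2 to 6 bases, at least 5 tandem copies) are assumptions, not the study's settings.

```python
import re


def find_ssrs(seq, min_repeats=5, min_motif=2, max_motif=6):
    """Return (start, motif, copies) for each tandem repeat found in seq.

    Illustrative thresholds only; real SSR tools use motif-specific cutoffs.
    """
    seq = seq.upper()
    # Lazy motif match so the shortest repeating unit is reported;
    # the back-reference \1 requires at least (min_repeats - 1) further copies.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs reported as longer motifs
        copies = len(match.group(0)) // len(motif)
        hits.append((match.start(), motif, copies))
    return hits


if __name__ == "__main__":
    est = "GGC" + "AT" * 5 + "CC"
    print(find_ssrs(est))  # [(3, 'AT', 5)]
```

Applied to each unique EST sequence, a scan of this kind yields candidate microsatellite loci that would still need manual review before primer design.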
The European Classical Swine Fever Virus Database: Blueprint for a Pathogen-Specific Sequence Database with Integrated Sequence Analysis Tools

PubMed Central

Postel, Alexander; Schmeiser, Stefanie; Zimmermann, Bernd; Becher, Paul

2016-01-01

Molecular epidemiology has become an indispensable tool in the diagnosis of diseases and in tracing the infection routes of pathogens. Due to advances in conventional sequencing and the development of high throughput technologies, the field of sequence determination is in the process of being revolutionized. Platforms for sharing sequence information and providing standardized tools for phylogenetic analyses are becoming increasingly important. The database (DB) of the European Union (EU) and World Organisation for Animal Health (OIE) Reference Laboratory for classical swine fever offers one of the world’s largest semi-public virus-specific sequence collections combined with a module for phylogenetic analysis. The classical swine fever (CSF) DB (CSF-DB) became a valuable tool for supporting diagnosis and epidemiological investigations of this highly contagious disease in pigs with high socio-economic impacts worldwide. The DB has been re-designed and now allows for the storage and analysis of traditionally used, well established genomic regions and of larger genomic regions including complete viral genomes. We present an application example for the analysis of highly similar viral sequences obtained in an endemic disease situation and introduce the new geographic “CSF Maps” tool. The concept of this standardized and easy-to-use DB with an integrated genetic typing module is suited to serve as a blueprint for similar platforms for other human or animal viruses. PMID:27827988
Genome analysis and identification of gelatinase encoded gene in Enterobacter aerogenes

NASA Astrophysics Data System (ADS)

Shahimi, Safiyyah; Mutalib, Sahilah Abdul; Khalid, Rozida Abdul; Repin, Rul Aisyah Mat; Lamri, Mohd Fadly; Bakar, Mohd Faizal Abu; Isa, Mohd Noor Mat

2016-11-01

In this study, bioinformatic analysis of the genome sequence of E. aerogenes was performed to identify genes encoding gelatinase. Enterobacter aerogenes was isolated from hot spring water and is a gelatinase species-specific bacterium for porcine and fish gelatin. This bacterium offers the possibility of producing enzymes that are specific to the gelatine of each species. The Enterobacter aerogenes genome was partially sequenced, giving a total sequence size of 5.0 mega base pairs (Mbp). The pre-processing pipeline yielded 87.6 Mbp of total reads and 68.8 Mbp of high-quality reads, a high-quality percentage of 78.58%. Genome assembly produced 120 contigs, with 67.5% of contigs over 1 kilo base pair (kbp), an N50 contig length of 124,856 bp and a GC content of 55.17%. About 4,705 protein-coding genes were identified by protein prediction analysis. The two candidate genes selected had the highest percentage identity to gelatinase enzymes available in the Swiss-Prot and NCBI online databases. They were NODE_9_length_26866_cov_148.013245_12, containing a 1029 base pair (bp) sequence encoding 342 amino acids, and NODE_24_length_155103_cov_177.082458_62, containing a 717 bp sequence encoding 238 amino acids. Two pairs of primers (forward and reverse) were then designed based on the open reading frames (ORFs) of the selected genes. Genome analysis of E. aerogenes thus identified genes encoding gelatinase.
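The assembly statistics quoted in this abstract (N50 contig length and GC percentage) follow standard definitions, which the sketch below implements for a list of contigs. The helper names `n50` and `gc_percent` are hypothetical, and the toy inputs are not data from the study.

```python
def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no contigs supplied")


def gc_percent(contigs):
    """Percentage of G and C bases across all contig sequences."""
    total = sum(len(s) for s in contigs)
    if total == 0:
        raise ValueError("no bases supplied")
    gc = sum(s.upper().count("G") + s.upper().count("C") for s in contigs)
    return 100.0 * gc / total


if __name__ == "__main__":
    # Toy contig lengths: total 20, and 6 + 5 = 11 reaches half, so N50 is 5.
    print(n50([2, 3, 4, 5, 6]))          # 5
    print(gc_percent(["GGCC", "AATT"]))  # 50.0
```

An N50 of 124,856 bp therefore means that contigs at least that long account for half or more of the assembled bases.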
RY-Coding and Non-Homogeneous Models Can Ameliorate the Maximum-Likelihood Inferences From Nucleotide Sequence Data with Parallel Compositional Heterogeneity.

PubMed

Ishikawa, Sohta A; Inagaki, Yuji; Hashimoto, Tetsuo

2012-01-01

In phylogenetic analyses of nucleotide sequences, 'homogeneous' substitution models, which assume the stationarity of base composition across a tree, are widely used, albeit individual sequences may bear distinctive base frequencies. In the worst-case scenario, a homogeneous model-based analysis can yield an artifactual union of two distantly related sequences that achieved similar base frequencies in parallel. Such potential difficulty can be countered by two approaches, 'RY-coding' and 'non-homogeneous' models. The former approach converts four bases into purine and pyrimidine to normalize base frequencies across a tree, while the heterogeneity in base frequency is explicitly incorporated in the latter approach. The two approaches have been applied to real-world sequence data; however, their basic properties have not been fully examined by pioneering simulation studies. Here, we assessed the performances of the maximum-likelihood analyses incorporating RY-coding and a non-homogeneous model (RY-coding and non-homogeneous analyses) on simulated data with parallel convergence to similar base composition. Both RY-coding and non-homogeneous analyses showed superior performances compared with homogeneous model-based analyses. Curiously, the performance of RY-coding analysis appeared to be significantly affected by a setting of the substitution process for sequence simulation relative to that of non-homogeneous analysis. The performance of a non-homogeneous analysis was also validated by analyzing a real-world sequence data set with significant base heterogeneity.
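The RY-coding step described above, which collapses the four bases into purines (R) and pyrimidines (Y), can be sketched as a simple character translation. The function name `ry_code` is an assumption for illustration; gaps and ambiguity codes are left unchanged here, which is one possible convention rather than the authors' stated one.

```python
# Purines (A, G) map to R; pyrimidines (C, T, and U for RNA) map to Y.
RY_TABLE = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y", "U": "Y"})


def ry_code(seq: str) -> str:
    """Recode a nucleotide sequence into the two-state R/Y alphabet."""
    return seq.upper().translate(RY_TABLE)


if __name__ == "__main__":
    # A GC-rich and an AT-rich sequence become identical after recoding,
    # which is how RY-coding removes GC-content differences between taxa.
    print(ry_code("GGGCCC"))  # RRRYYY
    print(ry_code("AAATTT"))  # RRRYYY
```

The cost of this normalization is that all transitions (A/G and C/T changes) become invisible, so only transversions inform the tree.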
New splicing mutation in the choline kinase beta (CHKB) gene causing a muscular dystrophy detected by whole-exome sequencing.

PubMed

Oliveira, Jorge; Negrão, Luís; Fineza, Isabel; Taipa, Ricardo; Melo-Pires, Manuel; Fortuna, Ana Maria; Gonçalves, Ana Rita; Froufe, Hugo; Egas, Conceição; Santos, Rosário; Sousa, Mário

2015-06-01

Muscular dystrophies (MDs) are a group of hereditary muscle disorders that include two particularly heterogeneous subgroups: limb-girdle MD and congenital MD, linked to 52 different genes (seven common to both subgroups). Massive parallel sequencing technology may avoid the usual stepwise gene-by-gene analysis. We report the whole-exome sequencing (WES) analysis of a patient with childhood-onset progressive MD, also presenting mental retardation and dilated cardiomyopathy. Conventional sequencing had excluded eight candidate genes. WES of the trio (patient and parents) was performed using the ion proton sequencing system. Data analysis, which relied on filtering steps using the GEMINI software, revealed a novel silent variant in the choline kinase beta (CHKB) gene. Inspection of sequence alignments ultimately identified the causal variant (CHKB:c.1031+3G>C). This splice site mutation was confirmed using Sanger sequencing and its effect was further evaluated with gene expression analysis. On reassessment of the muscle biopsy, typical abnormal mitochondrial oxidative changes were observed. Mutations in CHKB have been shown to cause phosphatidylcholine deficiency in myofibers, causing a rare form of CMD (only 21 patients reported). Notwithstanding interpretative difficulties that need to be overcome before the integration of WES in the diagnostic workflow, this work corroborates its utility in solving cases from highly heterogeneous groups of diseases, in which conventional diagnostic approaches fail to provide a definitive diagnosis.
The Papillomavirus Episteme: a central resource for papillomavirus sequence data and analysis.

PubMed

Van Doorslaer, Koenraad; Tan, Qina; Xirasagar, Sandhya; Bandaru, Sandya; Gopalan, Vivek; Mohamoud, Yasmin; Huyen, Yentram; McBride, Alison A

2013-01-01

The goal of the Papillomavirus Episteme (PaVE) is to provide an integrated resource for the analysis of papillomavirus (PV) genome sequences and related information. The PaVE is a freely accessible, web-based tool (http://pave.niaid.nih.gov) created around a relational database, which enables storage, analysis and exchange of sequence information. From a design perspective, the PaVE adopts an Open Source software approach and stresses the integration and reuse of existing tools. Reference PV genome sequences have been extracted from publicly available databases and reannotated using a custom-created tool. To date, the PaVE contains 241 annotated PV genomes, 2245 genes and regions, 2004 protein sequences and 47 protein structures, which users can explore, analyze or download. The PaVE provides scientists with the data and tools needed to accelerate scientific progress for the study and treatment of diseases caused by PVs.
Determination and analysis of the complete genome sequence of Paralichthys olivaceus rhabdovirus (PORV).

PubMed

Zhu, Ruo-Lin; Zhang, Qi-Ya

2014-04-01

Paralichthys olivaceus rhabdovirus (PORV), which is associated with high mortality rates in flounder, was isolated in China in 2005. Here, we provide an annotated sequence record of PORV, the genome of which comprises 11,182 nucleotides and contains six genes in the order 3'-N-P-M-G-NV-L-5'. Phylogenetic analysis based on glycoprotein sequences of PORV and other rhabdoviruses showed that PORV clusters with viral haemorrhagic septicemia virus (VHSV), genus Novirhabdovirus, family Rhabdoviridae. Further phylogenetic analysis of the combined amino acid sequences of six proteins of PORV and VHSV strains showed that PORV clusters with Korean strains and is closely related to Asian strains, all of which were isolated from flounder. In a comparison in which the sequences of the six proteins were combined, PORV shared the highest identity (98.3 %) with VHSV strain KJ2008 from Korea.
Evaluation of the Bacterial Diversity in the Human Tongue Coating Based on Genus-Specific Primers for 16S rRNA Sequencing.

PubMed

Sun, Beili; Zhou, Dongrui; Tu, Jing; Lu, Zuhong

2017-01-01

The characteristics of tongue coating are very important symbols for disease diagnosis in traditional Chinese medicine (TCM) theory. As a habitat of oral microbiota, bacteria on the tongue dorsum have been proved to be the cause of many oral diseases. The high-throughput next-generation sequencing (NGS) platforms have been widely applied in the analysis of bacterial 16S rRNA gene. We developed a methodology based on genus-specific multiprimer amplification and ligation-based sequencing for microbiota analysis. In order to validate the efficiency of the approach, we thoroughly analyzed six tongue coating samples from lung cancer patients with different TCM types, and more than 600 genera of bacteria were detected by this platform. The results showed that ligation-based parallel sequencing combined with enzyme digestion and multiamplification could expand the effective length of sequencing reads and could be applied in the microbiota analysis.
DNA sequence-based comparative studies between non-extremophile and extremophile organisms with implications in exobiology

NASA Astrophysics Data System (ADS)

Holden, Todd; Marchese, P.; Tremberger, G., Jr.; Cheung, E.; Subramaniam, R.; Sullivan, R.; Schneider, P.; Flamholz, A.; Lieberman, D.; Cheung, T.

2008-08-01

We have characterized function related DNA sequences of various organisms using informatics techniques, including fractal dimension calculation, nucleotide and multi-nucleotide statistics, and sequence fluctuation analysis. Our analysis shows trends which differentiate extremophile from non-extremophile organisms, which could be reproduced in extraterrestrial life. Among the systems studied are radiation repair genes, genes involved in thermal shocks, and genes involved in drug resistance. We also evaluate sequence level changes that have occurred during short term evolution (several thousand generations) under extreme conditions.
Processing and population genetic analysis of multigenic datasets with ProSeq3 software.

PubMed

Filatov, Dmitry A

2009-12-01

The current tendency in molecular population genetics is to use increasing numbers of genes in the analysis. Here I describe a program for handling and population genetic analysis of DNA polymorphism data collected from multiple genes. The program includes a sequence/alignment editor and an internal relational database that simplify the preparation and manipulation of multigenic DNA polymorphism datasets. The most commonly used DNA polymorphism analyses are implemented in ProSeq3, facilitating population genetic analysis of large multigenic datasets. Extensive input/output options make ProSeq3 a convenient hub for sequence data processing and analysis. The program is available free of charge from http://dps.plants.ox.ac.uk/sequencing/proseq.htm.
Analysis and Functional Annotation of an Expressed Sequence Tag Collection for Tropical Crop Sugarcane

PubMed Central

Vettore, André L.; da Silva, Felipe R.; Kemper, Edson L.; Souza, Glaucia M.; da Silva, Aline M.; Ferro, Maria Inês T.; Henrique-Silva, Flavio; Giglioti, Éder A.; Lemos, Manoel V.F.; Coutinho, Luiz L.; Nobrega, Marina P.; Carrer, Helaine; França, Suzelei C.; Bacci, Maurício; Goldman, Maria Helena S.; Gomes, Suely L.; Nunes, Luiz R.; Camargo, Luis E.A.; Siqueira, Walter J.; Van Sluys, Marie-Anne; Thiemann, Otavio H.; Kuramae, Eiko E.; Santelli, Roberto V.; Marino, Celso L.; Targon, Maria L.P.N.; Ferro, Jesus A.; Silveira, Henrique C.S.; Marini, Danyelle C.; Lemos, Eliana G.M.; Monteiro-Vitorello, Claudia B.; Tambor, José H.M.; Carraro, Dirce M.; Roberto, Patrícia G.; Martins, Vanderlei G.; Goldman, Gustavo H.; de Oliveira, Regina C.; Truffi, Daniela; Colombo, Carlos A.; Rossi, Magdalena; de Araujo, Paula G.; Sculaccio, Susana A.; Angella, Aline; Lima, Marleide M.A.; de Rosa, Vicente E.; Siviero, Fábio; Coscrato, Virginia E.; Machado, Marcos A.; Grivet, Laurent; Di Mauro, Sonia M.Z.; Nobrega, Francisco G.; Menck, Carlos F.M.; Braga, Marilia D.V.; Telles, Guilherme P.; Cara, Frank A.A.; Pedrosa, Guilherme; Meidanis, João; Arruda, Paulo

2003-01-01

To contribute to our understanding of the genome complexity of sugarcane, we undertook a large-scale expressed sequence tag (EST) program. More than 260,000 cDNA clones were partially sequenced from 26 standard cDNA libraries generated from different sugarcane tissues. After the processing of the sequences, 237,954 high-quality ESTs were identified. These ESTs were assembled into 43,141 putative transcripts. Of the assembled sequences, 35.6% presented no matches with existing sequences in public databases. A global analysis of the whole SUCEST data set indicated that 14,409 assembled sequences (33% of the total) contained at least one cDNA clone with a full-length insert. Annotation of the 43,141 assembled sequences associated almost 50% of the putative identified sugarcane genes with protein metabolism, cellular communication/signal transduction, bioenergetics, and stress responses. Inspection of the translated assembled sequences for conserved protein domains revealed 40,821 amino acid sequences with 1415 Pfam domains. Reassembling the consensus sequences of the 43,141 transcripts revealed a 22% redundancy in the first assembling. This indicated that possibly 33,620 unique genes had been identified and indicated that >90% of the sugarcane expressed genes were tagged. PMID:14613979
in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhou, Xiaofan; Peris, David; Kominek, Jacek

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.
in silico Whole Genome Sequencer & Analyzer (iWGS): A Computational Pipeline to Guide the Design and Analysis of de novo Genome Sequencing Studies

DOE PAGES

Zhou, Xiaofan; Peris, David; Kominek, Jacek; ...

2016-09-16

The availability of genomes across the tree of life is highly biased toward vertebrates, pathogens, human disease models, and organisms with relatively small and simple genomes. Recent progress in genomics has enabled the de novo decoding of the genome of virtually any organism, greatly expanding its potential for understanding the biology and evolution of the full spectrum of biodiversity. The increasing diversity of sequencing technologies, assays, and de novo assembly algorithms has augmented the complexity of de novo genome sequencing projects in nonmodel organisms. To reduce the costs and challenges in de novo genome sequencing projects and streamline their experimental design and analysis, we developed iWGS (in silico Whole Genome Sequencer and Analyzer), an automated pipeline for guiding the choice of appropriate sequencing strategy and assembly protocols. iWGS seamlessly integrates the four key steps of a de novo genome sequencing project: data generation (through simulation), data quality control, de novo assembly, and assembly evaluation and validation. The last three steps can also be applied to the analysis of real data. iWGS is designed to enable the user to have great flexibility in testing the range of experimental designs available for genome sequencing projects, and supports all major sequencing technologies and popular assembly tools. Three case studies illustrate how iWGS can guide the design of de novo genome sequencing projects, and evaluate the performance of a wide variety of user-specified sequencing strategies and assembly protocols on genomes of differing architectures. iWGS, along with detailed documentation, is freely available at https://github.com/zhouxiaofan1983/iWGS.

Error Analysis of Deep Sequencing of Phage Libraries: Peptides Censored in Sequencing

PubMed Central

Matochko, Wadim L.; Derda, Ratmir

2013-01-01

Next-generation sequencing techniques empower selection of ligands from phage-display libraries because they can detect low abundant clones and quantify changes in the copy numbers of clones without excessive selection rounds. Identification of errors in deep sequencing data is the most critical step in this process because these techniques have error rates >1%. Mechanisms that yield errors in Illumina and other techniques have been proposed, but no reports to date describe error analysis in phage libraries. Our paper focuses on error analysis of 7-mer peptide libraries sequenced by the Illumina method. The low theoretical complexity of this phage library, as compared to the complexity of long genetic reads and genomes, allowed us to describe this library using a convenient linear vector and operator framework. We describe a phage library as an $N \times 1$ frequency vector $n = \|n_i\|$, where $n_i$ is the copy number of the $i$th sequence and $N$ is the theoretical diversity, that is, the total number of all possible sequences. Any manipulation to the library is an operator acting on $n$. Selection, amplification, or sequencing could be described as a product of an $N \times N$ matrix and a stochastic sampling operator ($Sa$). The latter is a random diagonal matrix that describes sampling of a library. In this paper, we focus on the properties of $Sa$ and use them to define the sequencing operator ($Seq$). Sequencing without any bias and errors is $Seq = Sa\,I_N$, where $I_N$ is an $N \times N$ unity matrix. Any bias in sequencing changes $I_N$ to a nonunity matrix. We identified a diagonal censorship matrix ($CEN$), which describes elimination, or statistically significant downsampling, of specific reads during the sequencing process. PMID:24416071
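To make the operator picture above concrete, the sketch below treats a library as a list of copy numbers and models biased sequencing as a diagonal censorship weighting followed by random sampling of reads. This is a loose illustration under stated assumptions, not the paper's definition: the function name `sequence_library`, the multinomial draw used for sampling, and the toy values are all hypothetical.

```python
import random
from collections import Counter


def sequence_library(n, cen, depth, rng):
    """Draw `depth` reads from library n after applying diagonal weights cen.

    n     : copy number of each clone (the frequency vector)
    cen   : diagonal entries of a censorship matrix, each in [0, 1];
            all ones corresponds to the unity matrix (unbiased sequencing)
    depth : number of reads sampled (assumed sampling model: multinomial)
    """
    if len(n) != len(cen):
        raise ValueError("n and cen must have the same length")
    weights = [c * k for c, k in zip(cen, n)]  # diagonal matrix times vector
    if sum(weights) <= 0:
        raise ValueError("nothing left to sample")
    reads = rng.choices(range(len(n)), weights=weights, k=depth)
    counts = Counter(reads)
    return [counts.get(i, 0) for i in range(len(n))]


if __name__ == "__main__":
    rng = random.Random(0)
    library = [100, 100, 100, 100]
    censored = sequence_library(library, [1, 1, 0, 1], 1000, rng)
    print(censored)  # the third clone is never observed
```

Comparing observed counts against the all-ones case is one way to spot clones that are eliminated or downsampled during sequencing.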
A survey of tools for variant analysis of next-generation genome sequencing data

PubMed Central

Pabinger, Stephan; Dander, Andreas; Fischer, Maria; Snajder, Rene; Sperk, Michael; Efremova, Mirjana; Krabichler, Birgit; Speicher, Michael R.; Zschocke, Johannes

2014-01-01

Recent advances in genome sequencing technologies provide unprecedented opportunities to characterize individual genomic landscapes and identify mutations relevant for diagnosis and therapy. Specifically, whole-exome sequencing using next-generation sequencing (NGS) technologies is gaining popularity in the human genetics community due to the moderate costs, manageable data amounts and straightforward interpretation of analysis results. While whole-exome and, in the near future, whole-genome sequencing are becoming commodities, data analysis still poses significant challenges and led to the development of a plethora of tools supporting specific parts of the analysis workflow or providing a complete solution. Here, we surveyed 205 tools for whole-genome/whole-exome sequencing data analysis supporting five distinct analytical steps: quality assessment, alignment, variant identification, variant annotation and visualization. We report an overview of the functionality, features and specific requirements of the individual tools. We then selected 32 programs for variant identification, variant annotation and visualization, which were subjected to hands-on evaluation using four data sets: one set of exome data from two patients with a rare disease for testing identification of germline mutations, two cancer data sets for testing variant callers for somatic mutations, copy number variations and structural variations, and one semi-synthetic data set for testing identification of copy number variations. Our comprehensive survey and evaluation of NGS tools provides a valuable guideline for human geneticists working on Mendelian disorders, complex diseases and cancers. PMID:23341494
Regularized rare variant enrichment analysis for case-control exome sequencing data.

PubMed

Larson, Nicholas B; Schaid, Daniel J

2014-02-01

Rare variants have recently garnered an immense amount of attention in genetic association analysis. However, unlike methods traditionally used for single marker analysis in GWAS, rare variant analysis often requires some method of aggregation, since single marker approaches are poorly powered for typical sequencing study sample sizes. Advancements in sequencing technologies have rendered next-generation sequencing platforms a realistic alternative to traditional genotyping arrays. Exome sequencing in particular not only provides base-level resolution of genetic coding regions, but also a natural paradigm for aggregation via genes and exons. Here, we propose the use of penalized regression in combination with variant aggregation measures to identify rare variant enrichment in exome sequencing data. In contrast to marginal gene-level testing, we simultaneously evaluate the effects of rare variants in multiple genes, focusing on gene-based least absolute shrinkage and selection operator (LASSO) and exon-based sparse group LASSO models. By using gene membership as a grouping variable, the sparse group LASSO can be used as a gene-centric analysis of rare variants while also providing a penalized approach toward identifying specific regions of interest. We apply extensive simulations to evaluate the performance of these approaches with respect to specificity and sensitivity, comparing these results to multiple competing marginal testing methods. Finally, we discuss our findings and outline future research. © 2013 WILEY PERIODICALS, INC.
Proteomics analysis of "Rovabiot Excel", a secreted protein cocktail from the filamentous fungus Penicillium funiculosum grown under industrial process fermentation.

PubMed

Guais, Olivier; Borderies, Gisèle; Pichereaux, Carole; Maestracci, Marc; Neugnot, Virginie; Rossignol, Michel; François, Jean Marie

2008-12-01

MS/MS techniques are well customized now for proteomic analysis, even for non-sequenced organisms, since peptide sequences obtained by these methods can be matched with those found in databases from closely related sequenced organisms. We used this approach to characterize the protein content of the "Rovabio Excel", an enzymatic cocktail produced by Penicillium funiculosum that is used as feed additive in animal nutrition. Protein separation by bi-dimensional electrophoresis yielded more than 100 spots, from which 37 proteins were unambiguously assigned from peptide sequences. By one-dimensional SDS-gel electrophoresis, 34 proteins were identified, among which 8 were not found in the 2-DE analysis. A third method, termed 'peptidic shotgun', which consists of a direct treatment of the cocktail by trypsin followed by separation of the peptides on two-dimensional liquid chromatography, resulted in the identification of two additional proteins not found by the two other methods. Altogether, more than 50 proteins, among which several glycosylhydrolytic, hemicellulolytic and proteolytic enzymes, were identified by combining three separation methods in this enzymatic cocktail. This work confirmed the power of proteome analysis to explore the genome expression of a non-sequenced fungus by taking advantage of sequences from phylogenetically related filamentous fungi and paves the way for further functional analysis of P. funiculosum.
Phylogenetic analysis of HIV-1 reverse transcriptase sequences from 382 patients recruited in JJ Hospital of Mumbai, India, between 2002 and 2008.

PubMed

Deshpande, Alaka; Jauvin, Valerie; Pinson, Patricia; Jeannot, Anne Cecile; Fleury, Herve J

2009-06-01

Analysis of reverse transcriptase (RT) sequences of 382 HIV-1 isolates from untreated and treated patients recruited in JJ Hospital (Mumbai, India) between 2002 and 2008 shows that subtype C is largely predominant (98%) and that non-C sequences cluster with A1, B, CRF01_AE, and CRF06_cpx.
Identification of the sequence motif of glycoside hydrolase 13 family members

PubMed Central

Kumar, Vikash

2011-01-01

A bioinformatics analysis of the sequences of glycoside hydrolase (GH) 13 family members such as α-amylase, cyclodextrin glycosyltransferase (CGTase), branching enzyme and cyclomaltodextrinase was carried out using a hidden Markov model (HMM) profile, in order to find the sequence motifs that govern the reaction specificities of these enzymes. This analysis suggests the existence of such sequence motifs, with residues of these motifs constituting the −1 to +3 catalytic subsites of the enzyme. Hence, by introducing mutations in the residues of these four subsites, one can change the reaction specificities of the enzymes. In general, the α-amylase sequence motifs were observed to have lower sequence conservation than the other motifs of the GH13 family members. PMID:21544166
Sequencing and phylogenetic analysis of tobacco virus 2, a polerovirus from Nicotiana tabacum.

PubMed

Zhou, Benguo; Wang, Fang; Zhang, Xuesong; Zhang, Lina; Lin, Huafeng

2017-07-01

The complete genome sequence of a new virus, provisionally named tobacco virus 2 (TV2), was determined and identified from leaves of tobacco (Nicotiana tabacum) exhibiting leaf mosaic, yellowing, and deformity, in Anhui Province, China. The genome sequence of TV2 comprises 5,979 nucleotides, with 87% nucleotide sequence identity to potato leafroll virus (PLRV). Its genome organization is similar to that of PLRV, containing six open reading frames (ORFs) that potentially encode proteins with putative functions in cell-to-cell movement and suppression of RNA silencing. Phylogenetic analysis of the nucleotide sequence placed TV2 alongside members of the genus Polerovirus in the family Luteoviridae. To the best of our knowledge, this study is the first report of a complete genome sequence of a new polerovirus identified in tobacco.
Library preparation and data analysis packages for rapid genome sequencing.

PubMed

Pomraning, Kyle R; Smith, Kristina M; Bredeweg, Erin L; Connolly, Lanelle R; Phatale, Pallavi A; Freitag, Michael

2012-01-01

High-throughput sequencing (HTS) has quickly become a valuable tool for comparative genetics and genomics and is now regularly carried out in laboratories that are not connected to large sequencing centers. Here we describe an updated version of our protocol for constructing single- and paired-end Illumina sequencing libraries, beginning with purified genomic DNA. The present protocol can also be used for "multiplexing," i.e. the analysis of several samples in a single flowcell lane by generating "barcoded" or "indexed" Illumina sequencing libraries in a way that is independent from Illumina-supported methods. To analyze sequencing results, we suggest several independent approaches but end users should be aware that this is a quickly evolving field and that currently many alignment (or "mapping") and counting algorithms are being developed and tested.
Sequence determination and analysis of S-adenosyl-L-homocysteine hydrolase from yellow lupine (Lupinus luteus).

PubMed

Brzeziński, K; Janowski, R; Podkowiński, J; Jaskólski, M

2001-01-01

The coding sequences of two S-adenosyl-L-homocysteine hydrolases (SAHases) were identified in yellow lupine by screening of a cDNA library. One of them, corresponding to the complete protein, was sequenced and compared with 52 other SAHase sequences. Phylogenetic analysis of these proteins identified three groups of the enzymes. Group A comprises only bacterial sequences. Group B is subdivided into two subgroups, one of which (B1) is formed by animal sequences. Subgroup B2 consists of two distinct clusters, B2a and B2b. Cluster B2b comprises all known plant sequences, including the yellow lupine enzyme, which are distinguished by a 50-residue insert. Group C is heterogeneous and contains SAHases from Archaea as well as a new class of animal enzymes, distinctly different from those in group B1.
MIPS: a database for protein sequences and complete genomes.

PubMed Central

Mewes, H W; Hani, J; Pfeiffer, F; Frishman, D

1998-01-01

The MIPS group [Munich Information Center for Protein Sequences of the German National Center for Environment and Health (GSF)] at the Max-Planck-Institute for Biochemistry, Martinsried near Munich, Germany, is involved in a number of data collection activities, including a comprehensive database of the yeast genome, a database reflecting the progress in sequencing the Arabidopsis thaliana genome, the systematic analysis of other small genomes and the collection of protein sequence data within the framework of the PIR-International Protein Sequence Database (described elsewhere in this volume). Through its WWW server (http://www.mips.biochem.mpg.de) MIPS provides access to a variety of generic databases, including a database of protein families as well as automatically generated data by the systematic application of sequence analysis algorithms. The yeast genome sequence and its related information were also compiled on CD-ROM to provide dynamic interactive access to the 16 chromosomes of the first eukaryotic genome unraveled. PMID:9399795
An Optimal Bahadur-Efficient Method in Detection of Sparse Signals with Applications to Pathway Analysis in Sequencing Association Studies.

PubMed

Dai, Hongying; Wu, Guodong; Wu, Michael; Zhi, Degui

2016-01-01

Next-generation sequencing data pose a severe curse of dimensionality, complicating traditional "single marker-single trait" analysis. We propose a two-stage combined p-value method for pathway analysis. The first stage is at the gene level, where we integrate effects within a gene using the Sequence Kernel Association Test (SKAT). The second stage is at the pathway level, where we perform a correlated Lancaster procedure to detect joint effects from multiple genes within a pathway. We show that the Lancaster procedure is optimal in Bahadur efficiency among all combined p-value methods. The Bahadur efficiency, [Formula: see text], compares sample sizes among different statistical tests when signals become sparse in sequencing data, i.e. ε → 0. The optimal Bahadur efficiency ensures that the Lancaster procedure asymptotically requires a minimal sample size to detect sparse signals ([Formula: see text]). The Lancaster procedure can also be applied to meta-analysis. Extensive empirical assessments of exome sequencing data show that the proposed method outperforms Gene Set Enrichment Analysis (GSEA). We applied the competitive Lancaster procedure to meta-analysis data generated by the Global Lipids Genetics Consortium to identify pathways significantly associated with high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglycerides, and total cholesterol.
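To illustrate the second stage, here is a minimal sketch of combining gene-level p-values into one pathway-level p-value. It is not the correlated Lancaster procedure of the abstract: it implements only Fisher's method, the special case of Lancaster's procedure in which every gene gets weight 2, and it assumes the gene-level p-values are independent. The function name is an assumption made for this example.

```python
import math

def fisher_combined_p(p_values):
    """Combine independent p-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of freedom
    the upper tail has the closed form exp(-X/2) * sum_{i<k} (X/2)^i / i!.
    """
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # this is X / 2
    term = 1.0   # (X/2)^0 / 0!
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return math.exp(-half_stat) * total

# Hypothetical gene-level p-values (for example from SKAT) within one pathway.
pathway_p = fisher_combined_p([0.05, 0.05])
```

Lancaster's generalization replaces the fixed weight of 2 with gene-specific degrees of freedom, which requires an inverse chi-square function and, for correlated genes, an adjustment that this sketch leaves out.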
The complete chloroplast genome sequence of the medicinal plant Salvia miltiorrhiza.

PubMed

Qian, Jun; Song, Jingyuan; Gao, Huanhuan; Zhu, Yingjie; Xu, Jiang; Pang, Xiaohui; Yao, Hui; Sun, Chao; Li, Xian'en; Li, Chuyuan; Liu, Juyan; Xu, Haibin; Chen, Shilin

2013-01-01

Salvia miltiorrhiza is an important medicinal plant with great economic and medicinal value. The complete chloroplast (cp) genome sequence of Salvia miltiorrhiza, the first sequenced member of the Lamiaceae family, is reported here. The genome is 151,328 bp in length and exhibits a typical quadripartite structure of the large (LSC, 82,695 bp) and small (SSC, 17,555 bp) single-copy regions, separated by a pair of inverted repeats (IRs, 25,539 bp). It contains 114 unique genes, including 80 protein-coding genes, 30 tRNAs and four rRNAs. The genome structure, gene order, GC content and codon usage are similar to the typical angiosperm cp genomes. Four forward, three inverted and seven tandem repeats were detected in the Salvia miltiorrhiza cp genome. Simple sequence repeat (SSR) analysis among the 30 asterid cp genomes revealed that most SSRs are AT-rich, which contribute to the overall AT richness of these cp genomes. Additionally, fewer SSRs are distributed in the protein-coding sequences compared to the non-coding regions, indicating an uneven distribution of SSRs within the cp genomes. Entire cp genome comparison of Salvia miltiorrhiza and three other Lamiales cp genomes showed a high degree of sequence similarity and a relatively high divergence of intergenic spacers. Sequence divergence analysis discovered the ten most divergent and ten most conserved genes as well as their length variation, which will be helpful for phylogenetic studies in asterids. Our analysis also supports that both regional and functional constraints affect gene sequence evolution. Further, phylogenetic analysis demonstrated a sister relationship between Salvia miltiorrhiza and Sesamum indicum. The complete cp genome sequence of Salvia miltiorrhiza reported in this paper will facilitate population, phylogenetic and cp genetic engineering studies of this medicinal plant.
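The abstract does not say which tool or thresholds were used for the SSR analysis. As a rough sketch only, simple sequence repeats can be located with a back-referencing regular expression; the motif length limit and the minimum repeat count below are illustrative assumptions, and dedicated SSR tools usually apply a different threshold per motif length.

```python
import re

def find_ssrs(seq, max_motif=3, min_repeats=4):
    """Return (start, motif, repeat_count) for each tandem repeat found.

    The lazy quantifier makes the regex try the shortest motif first,
    so 'AAAAAA' is reported as motif 'A' rather than 'AA' or 'AAA'.
    """
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        motif = match.group(1)
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

ssrs = find_ssrs("GGATATATATCC")
```

Counting A and T in the reported motifs over a whole chloroplast genome would give a quick check of the AT-richness of SSRs that the abstract reports.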
Quantitative statistical analysis of cis-regulatory sequences in ABA/VP1- and CBF/DREB1-regulated genes of Arabidopsis.

PubMed

Suzuki, Masaharu; Ketterling, Matthew G; McCarty, Donald R

2005-09-01

We have developed a simple quantitative computational approach for objective analysis of cis-regulatory sequences in promoters of coregulated genes. The program, designated MotifFinder, identifies oligo sequences that are overrepresented in promoters of coregulated genes. We used this approach to analyze promoter sequences of Viviparous1 (VP1)/abscisic acid (ABA)-regulated genes and cold-regulated genes, respectively, of Arabidopsis (Arabidopsis thaliana). We detected significantly enriched sequences in up-regulated genes but not in down-regulated genes. This result suggests that gene activation but not repression is mediated by specific and common sequence elements in promoters. The enriched motifs include several known cis-regulatory sequences as well as previously unidentified motifs. With respect to known cis-elements, we dissected the flanking nucleotides of the core sequences of Sph element, ABA response elements (ABREs), and the C repeat/dehydration-responsive element. This analysis identified the motif variants that may correlate with qualitative and quantitative differences in gene expression. While both VP1 and cold responses are mediated in part by ABA signaling via ABREs, these responses correlate with unique ABRE variants distinguished by nucleotides flanking the ACGT core. ABRE and Sph motifs are tightly associated uniquely in the coregulated set of genes showing a strict dependence on VP1 and ABA signaling. Finally, analysis of distribution of the enriched sequences revealed a striking concentration of enriched motifs in a proximal 200-base region of VP1/ABA and cold-regulated promoters. Overall, each class of coregulated genes possesses a discrete set of the enriched motifs with unique distributions in their promoters that may account for the specificity of gene regulation.
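The core idea, finding oligo sequences that are overrepresented in promoters of coregulated genes, can be sketched as below. This is not the MotifFinder program itself: the scoring (ratio of promoter fractions with add-one smoothing on the background) and all names are assumptions for illustration, and the abstract does not describe the statistic the authors used.

```python
from collections import Counter

def kmer_presence(promoters, k):
    """Count, for each k-mer, how many promoters contain it at least once."""
    counts = Counter()
    for seq in promoters:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts

def kmer_enrichment(target, background, k):
    """Ratio of the fraction of target promoters to background promoters containing each k-mer."""
    in_target = kmer_presence(target, k)
    in_background = kmer_presence(background, k)
    scores = {}
    for kmer, n in in_target.items():
        target_fraction = n / len(target)
        # Add-one smoothing avoids division by zero for k-mers absent from the background.
        background_fraction = (in_background[kmer] + 1) / (len(background) + 1)
        scores[kmer] = target_fraction / background_fraction
    return scores

# Toy promoters: both "coregulated" sequences carry the ACGT core flanked by a G.
scores = kmer_enrichment(["AACGTGA", "TACGTGC"], ["AAAAAAA", "TTTTTTT", "CCCCCCC"], k=5)
```

Running the same scan with several values of `k` and comparing variants that share the ACGT core is one simple way to look at flanking-nucleotide preferences of the kind the abstract discusses.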
Mucosal and Cutaneous Human Papillomaviruses Detected in Raw Sewages

PubMed Central

La Rosa, Giuseppina; Fratini, Marta; Accardi, Luisa; D'Oro, Graziana; Della Libera, Simonetta; Muscillo, Michele; Di Bonito, Paola

2013-01-01

Epitheliotropic viruses can find their way into sewage. The aim of the present study was to investigate the occurrence, distribution, and genetic diversity of Human Papillomaviruses (HPVs) in urban wastewaters. Sewage samples were collected from treatment plants distributed throughout Italy. The DNA extracted from these samples was analyzed by PCR using five PV-specific sets of primers targeting the L1 (GP5/GP6, MY09/MY11, FAP59/64, SKF/SKR) and E1 regions (PM-A/PM-B), according to the protocols previously validated for the detection of mucosal and cutaneous HPV genotypes. PCR products underwent sequencing analysis and the sequences were aligned to reference genomes from the Papillomavirus Episteme database. Phylogenetic analysis was then performed to assess the genetic relationships among the different sequences and between the sequences of the samples and those of the prototype strains. A broad spectrum of sequences related to mucosal and cutaneous HPV types was detected in 81% of the sewage samples analyzed. Surprisingly, sequences related to the anogenital HPV6 and 11 were detected in 19% of the samples, and sequences related to the “high risk” oncogenic HPV16 were identified in two samples. Sequences related to HPV9, HPV20, HPV25, HPV76, HPV80, HPV104, HPV110, HPV111, HPV120 and HPV145 beta Papillomaviruses were detected in 76% of the samples. In addition, similarity searches and phylogenetic analysis of some sequences suggest that they could belong to putative new genotypes of the beta genus. In this study, for the first time, the presence of HPV viruses strongly related to human cancer is reported in sewage samples. Our data increases the knowledge of HPV genomic diversity and suggests that virological analysis of urban sewage can provide key information useful in supporting epidemiological studies. PMID:23341898
Opinion: Clarifying Two Controversies about Information Mapping's Method.

ERIC Educational Resources Information Center

Horn, Robert E.

1992-01-01

Describes Information Mapping, a methodology for the analysis, organization, sequencing, and presentation of information and explains three major parts of the method: (1) content analysis, (2) project life-cycle synthesis and integration of the content analysis, and (3) sequencing and formatting. Major criticisms of the methodology are addressed.…
Examining inter-family differences in intra-family (parent-adolescent) dynamics using grid-sequence analysis.

PubMed

Brinberg, Miriam; Fosco, Gregory M; Ram, Nilam

2017-12-01

Family systems theorists have forwarded a set of theoretical principles meant to guide family scientists and practitioners in their conceptualization of patterns of family interaction (intra-family dynamics) that, over time, give rise to family and individual dysfunction and/or adaptation. In this article, we present an analytic approach that merges state space grid methods adapted from the dynamic systems literature with sequence analysis methods adapted from molecular biology into a "grid-sequence" method for studying inter-family differences in intra-family dynamics. Using dyadic data from 86 parent-adolescent dyads who provided up to 21 daily reports about connectedness, we illustrate how grid-sequence analysis can be used to identify a typology of intrafamily dynamics and to inform theory about how specific types of intrafamily dynamics contribute to adolescent behavior problems and family members' mental health. Methodologically, grid-sequence analysis extends the toolbox of techniques for analysis of family experience sampling and daily diary data. Substantively, we identify patterns of family level microdynamics that may serve as new markers of risk/protective factors and potential points for intervention in families. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
Genetic diversity assessment of anoxygenic photosynthetic bacteria by distance-based grouping analysis of pufM sequences.

PubMed

Zeng, Y H; Chen, X H; Jiao, N Z

2007-12-01

To assess how completely the diversity of anoxygenic phototrophic bacteria (APB) was sampled in natural environments. All nucleotide sequences of the APB marker gene pufM from cultures and environmental clones were retrieved from the GenBank database. A set of cutoff values (sequence distances 0.06, 0.15 and 0.48 for species, genus, and (sub)phylum levels, respectively) was established using a distance-based grouping program. Analysis of the environmental clones revealed that current efforts on APB isolation and sampling in natural environments are largely inadequate. Analysis of the average distance between each identified genus and an uncultured environmental pufM sequence indicated that the majority of cultured APB genera lack environmental representatives. The distance-based grouping method is fast and efficient for bulk functional gene sequences analysis. The results clearly show that we are at a relatively early stage in sampling the global richness of APB species. Periodical assessment will undoubtedly facilitate in-depth analysis of potential biogeographical distribution pattern of APB. This is the first attempt to assess the present understanding of APB diversity in natural environments. The method used is also useful for assessing the diversity of other functional genes.
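A distance-based grouping step of this kind can be sketched as follows. The abstract gives the cutoffs (0.06, 0.15 and 0.48) but not the distance measure or the linkage rule, so this example assumes pre-aligned sequences, an uncorrected p-distance and single-linkage grouping; the function names are hypothetical.

```python
def p_distance(a, b):
    """Proportion of differing positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def group_by_cutoff(seqs, cutoff):
    """Single-linkage grouping: sequences within `cutoff` of each other share a group."""
    parent = list(range(len(seqs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(seqs)):
        for j in range(i + 1, len(seqs)):
            if p_distance(seqs[i], seqs[j]) <= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(len(seqs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

toy = ["ACGTACGTAC", "ACGTACGTAA", "TTTTTTTTTT"]
genus_level = group_by_cutoff(toy, 0.15)    # first two sequences differ at 1 of 10 sites
species_level = group_by_cutoff(toy, 0.06)
```

Applying the three cutoffs in turn to the same distance matrix yields nested species, genus and (sub)phylum level groups, and groups containing only environmental clones point to lineages without cultured representatives.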
Investigating DNA Binding and Conformational Variation in Temperature Sensitive p53 Cancer Mutants Using QM-MM Simulations

PubMed Central

Koulgi, Shruti; Achalere, Archana; Sonavane, Uddhavesh; Joshi, Rajendra

2015-01-01

The tp53 gene is found to be mutated in 50% of all cancers. The p53 protein, a product of the tp53 gene, is a multi-domain protein. It consists of a core DNA binding domain (DBD) which is responsible for its binding and transcription of downstream target genes. The mutations in the p53 protein are responsible for creating cancerous conditions and are found to occur at a high frequency in the DBD region of p53. Some of these mutations are also known to be temperature sensitive (ts) in nature. They are known to exhibit partial or strong binding with DNA in the temperature range 298–306 K. At 310 K and above, however, they show a complete loss of binding. We have analyzed the changes in binding and conformational behavior at 300 K and 310 K for three of the ts-mutants, viz. V143A, R249S and R175H. QM-MM simulations have been performed on the wild type and the above mentioned ts-mutants for 30 ns each. The optimal estimate of the free energy of binding for a particular number of interface hydrogen bonds was calculated using the maximum likelihood method as described by Chodera et al. (2007). This parameter has been observed to be able to mimic the binding affinity of the p53 ts-mutants at 300 K and 310 K. Thus the correlation between the MM-GBSA free energy of binding and the hydrogen bonds formed by the interface residues between p53 and DNA has revealed the temperature dependent nature of these mutants. The role of main chain dihedrals was obtained by performing dihedral principal component analysis (PCA). This analysis suggests that the conformational variations in the main chain dihedrals (ϕ and ψ) of the p53 ts-mutants may have caused a reduction in the overall stability of the protein. The solvent exposure of the side chains of the interface residues was found to hamper the binding of p53 to the DNA. Solvent Accessible Surface Area (SASA) also proved to be a crucial property in distinguishing the conformers obtained at 300 K and 310 K for the three ts-mutants from the wild type at 300 K. PMID:26579714
Mobile Genome Express (MGE): A comprehensive automatic genetic analyses pipeline with a mobile device.

PubMed

Yoon, Jun-Hee; Kim, Thomas W; Mendez, Pedro; Jablons, David M; Kim, Il-Jin

2017-01-01

The development of next-generation sequencing (NGS) technology allows whole exomes or genomes to be sequenced. However, data analysis is still the biggest bottleneck for its wide implementation. Most laboratories still depend on manual procedures for data handling and analyses, which translates into a delay and decreased efficiency in the delivery of NGS results to doctors and patients. Thus, there is high demand for an automatic and easy-to-use NGS data analysis system. We developed a comprehensive, automatic genetic analysis controller named Mobile Genome Express (MGE) that works on smartphones or other mobile devices. MGE can handle all the steps of genetic analysis, such as sample information submission, sequencing run quality check from the sequencer, secured data transfer and results review. We sequenced an Actrometrix control DNA containing multiple proven human mutations using a targeted sequencing panel, and the whole analysis was managed by MGE and its data reviewing program, called ELECTRO. All steps were processed automatically except for the final sequencing review procedure with ELECTRO to confirm mutations. The data analysis process was completed within several hours. We confirmed that the mutations we identified were consistent with our previous results obtained by using multi-step, manual pipelines.
PMS2 gene mutational analysis: direct cDNA sequencing to circumvent pseudogene interference.

PubMed

Wimmer, Katharina; Wernstedt, Annekatrin

2014-01-01

The presence of highly homologous pseudocopies can compromise the mutation analysis of a gene of interest. In particular, when using PCR-based strategies, pseudogene co-amplification has to be effectively prevented. This is often achieved by using primers designed to be parental gene specific according to the reference sequence and by applying stringent PCR conditions. However, there are cases in which this approach is of limited utility. For example, it has been shown that the PMS2 gene exchanges sequences with one of its pseudogenes, named PMS2CL. This results in functional PMS2 alleles containing pseudogene-derived sequences at their 3'-end and in nonfunctional PMS2CL pseudogene alleles that contain gene-derived sequences. Hence, the paralogues cannot be distinguished according to the reference sequence. This shortcoming can be effectively circumvented by using direct cDNA sequencing. This approach is based on the selective amplification of PMS2 transcripts in two overlapping 1.6-kb RT-PCR products. In addition to avoiding pseudogene co-amplification and allele dropout, this method also has the advantage of effectively identifying deletions, splice mutations, and de novo retrotransposon insertions that escape the detection of most DNA-based mutation analysis protocols.

PAQ: Partition Analysis of Quasispecies.

PubMed

Baccam, P; Thompson, R J; Fedrigo, O; Carpenter, S; Cornette, J L

2001-01-01

The complexities of genetic data may not be accurately described by any single analytical tool. Phylogenetic analysis is often used to study the genetic relationship among different sequences. Evolutionary models and assumptions are invoked to reconstruct trees that describe the phylogenetic relationship among sequences. Genetic databases are rapidly accumulating large amounts of sequences. Newly acquired sequences, which have not yet been characterized, may require preliminary genetic exploration in order to build models describing the evolutionary relationship among sequences. There are clustering techniques that rely less on models of evolution, and thus may provide nice exploratory tools for identifying genetic similarities. Some of the more commonly used clustering methods perform better when data can be grouped into mutually exclusive groups. Genetic data from viral quasispecies, which consist of closely related variants that differ by small changes, however, may best be partitioned by overlapping groups. We have developed an intuitive exploratory program, Partition Analysis of Quasispecies (PAQ), which utilizes a non-hierarchical technique to partition sequences that are genetically similar. PAQ was used to analyze a data set of human immunodeficiency virus type 1 (HIV-1) envelope sequences isolated from different regions of the brain and another data set consisting of the equine infectious anemia virus (EIAV) regulatory gene rev. Analysis of the HIV-1 data set by PAQ was consistent with phylogenetic analysis of the same data, and the EIAV rev variants were partitioned into two overlapping groups. PAQ provides an additional tool which can be used to glean information from genetic data and can be used in conjunction with other tools to study genetic similarities and genetic evolution of viral quasispecies.
Indigenous and introduced potyviruses of legumes and Passiflora spp. from Australia: biological properties and comparison of coat protein sequences

USDA-ARS's Scientific Manuscript database

Coat protein sequences of 33 Potyvirus isolates from legume and Passiflora spp. were sequenced to determine the identity of infecting viruses. Phylogenetic analysis of the sequences revealed the presence of seven distinct virus species....
Single nucleotide polymorphisms from Theobroma cacao expressed sequence tags associated with witches' broom disease in cacao.

PubMed

Lima, L S; Gramacho, K P; Carels, N; Novais, R; Gaiotto, F A; Lopes, U V; Gesteira, A S; Zaidan, H A; Cascardo, J C M; Pires, J L; Micheli, F

2009-07-14

In order to increase the efficiency of cacao tree resistance to witches' broom disease, which is caused by Moniliophthora perniciosa (Tricholomataceae), we looked for molecular markers that could help in the selection of resistant cacao genotypes. Among the different markers useful for developing marker-assisted selection, single nucleotide polymorphisms (SNPs) constitute the most common type of sequence difference between alleles and can be easily detected by in silico analysis from expressed sequence tag libraries. We report the first detection and analysis of SNPs from cacao-M. perniciosa interaction expressed sequence tags, using bioinformatics. Selection based on analysis of these SNPs should be useful for developing cacao varieties resistant to this devastating disease.
[Isolation and identification of specific sequences correlated to cytoplasmic male sterility and fertile maintenance in cauliflower (Brassica oleracea var. botrytis)].

PubMed

Wang, Chun Guo; Chen, Xiao Qiang; Li, Hui; Zhao, Qian Cheng; Sun, De Ling; Song, Wen Qin

2008-02-01

Analysis of ISSR (Inter-Simple Sequence Repeat) and DDRT-PCR (Differential Display Reverse Transcriptase Polymerase Chain Reaction) was performed between the cytoplasmic male sterility cauliflower line ogura-A and its corresponding maintainer line ogura-B. In total, 306 detectable bands were obtained by ISSR using thirty oligonucleotide primers. Typically, six to twelve bands were produced per primer. Among all these primers, only the amplification of primer ISSR3 was polymorphic: an 1100 bp specific band, named ISSR3(1100), was detected only in the maintainer line. Analysis of this sequence indicated that ISSR3(1100) was highly homologous to the corresponding sequences of the mitochondrial genomes of Brassica napus and Arabidopsis thaliana, which suggested that ISSR3(1100) may derive from the mitochondrial genome of cauliflower. To carry out DDRT-PCR analysis, three anchor primers and fifteen random primers were combined. In total, 1122 bands from 1000 bp to 50 bp were detected. However, only four bands, named ogura-A205, ogura-A383, ogura-B307 and ogura-B352, were confirmed to be differentially displayed between the two lines. This result was further confirmed by reverse Northern dot blotting analysis. Among these four bands, ogura-A205 and ogura-A383 were expressed only in the cytoplasmic male sterility line, while ogura-B307 and ogura-B352 were detected only in the maintainer line. Analysis of these sequences indicated that this was the first time these four sequences were reported in cauliflower. Interestingly, ogura-A205 and ogura-B307 did not exhibit any similarity to reported sequences in other species; more investigation is required to obtain further information. ogura-A383 and ogura-B352 were also new sequences, and they showed high similarity to the corresponding chloroplast sequences of Arabidopsis thaliana and Brassica rapa subsp. pekinensis. We therefore speculated that these two sequences may derive from the chloroplast genome. All these results offer new and significant information for investigating the molecular mechanism of cytoplasmic male sterility and fertile maintenance in cauliflower.
Complete sequence determination of a novel reptile iridovirus isolated from soft-shelled turtle and evolutionary analysis of Iridoviridae

PubMed Central

Huang, Youhua; Huang, Xiaohong; Liu, Hong; Gong, Jie; Ouyang, Zhengliang; Cui, Huachun; Cao, Jianhao; Zhao, Yingtao; Wang, Xiujie; Jiang, Yulin; Qin, Qiwei

2009-01-01

Background Soft-shelled turtle iridovirus (STIV) is the causative agent of severe systemic diseases in cultured soft-shelled turtles (Trionyx sinensis). To our knowledge, the only molecular information available on STIV mainly concerns the highly conserved STIV major capsid protein. The complete sequence of the STIV genome is not yet available. Therefore, determining the genome sequence of STIV and providing a detailed bioinformatic analysis of its genome content and evolution status will facilitate further understanding of the taxonomic elements of STIV and the molecular mechanisms of reptile iridovirus pathogenesis. Results We determined the complete nucleotide sequence of the STIV genome using 454 Life Science sequencing technology. The STIV genome is 105 890 bp in length with a base composition of 55.1% G+C. Computer assisted analysis revealed that the STIV genome contains 105 potential open reading frames (ORFs), which encode polypeptides ranging from 40 to 1,294 amino acids and 20 microRNA candidates. Among the putative proteins, 20 share homology with the ancestral proteins of the nuclear and cytoplasmic large DNA viruses (NCLDVs). Comparative genomic analysis showed that STIV has the highest degree of sequence conservation and a colinear arrangement of genes with frog virus 3 (FV3), followed by Tiger frog virus (TFV), Ambystoma tigrinum virus (ATV), Singapore grouper iridovirus (SGIV), Grouper iridovirus (GIV) and other iridovirus isolates. Phylogenetic analysis based on conserved core genes and complete genome sequence of STIV with other virus genomes was performed. Moreover, analysis of the gene gain-and-loss events in the family Iridoviridae suggested that the genes encoded by iridoviruses have evolved for favoring adaptation to different natural host species. Conclusion This study has provided the complete genome sequence of STIV. 
Phylogenetic analysis suggested that STIV and FV3 are strains of the same viral species belonging to the Ranavirus genus in the Iridoviridae family. Given virus-host co-evolution and the phylogenetic relationship among vertebrates from fish to reptiles, we propose that iridovirus might transmit between reptiles and amphibians and that STIV and FV3 are strains of the same viral species in the Ranavirus genus. PMID:19439104
Microbial community analysis of the hypersaline water of the Dead Sea using high-throughput amplicon sequencing.

PubMed

Jacob, Jacob H; Hussein, Emad I; Shakhatreh, Muhamad Ali K; Cornelison, Christopher T

2017-10-01

Amplicon sequencing using next-generation technology (bTEFAP®) has been utilized in describing the diversity of Dead Sea microbiota. The investigated area is a well-known salt lake in the western part of Jordan found in the lowest geographical location in the world (more than 420 m below sea level) and characterized by extreme salinity (approximately 34%) in addition to other extreme conditions (low pH, unique ionic composition different from sea water). DNA was extracted from Dead Sea water. A total of 314,310 small subunit rRNA (SSU rRNA) sequences were parsed, and 288,452 sequences were then clustered. For alpha diversity analysis, the sample was rarefied to 3,000 sequences. The Shannon-Wiener index curve plot reached a plateau at approximately 3,000 sequences, indicating that sequencing depth was sufficient to capture the full scope of microbial diversity. Archaea were found to dominate the sequences (52%), whereas Bacteria constituted 45% of the sequences. Altogether, prokaryotic sequences (which constitute 97% of all sequences) were found to predominate. The findings expand on previous studies by using high-throughput amplicon sequencing to describe the microbial community in an environment which in recent years has been shown to hide some interesting diversity. © 2017 The Authors. MicrobiologyOpen published by John Wiley & Sons Ltd.
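The rarefaction and Shannon-Wiener steps mentioned in the abstract can be sketched as follows. This is an illustration only, assuming each sequence has already been assigned a taxon or OTU label; the function names, the natural-log base and the toy counts are assumptions, and the study's actual pipeline is not described here.

```python
import math
import random
from collections import Counter

def shannon_index(labels):
    """Shannon-Wiener index H = -sum(p_i * ln p_i) over taxon proportions."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def rarefy(labels, depth, seed=0):
    """Subsample `depth` sequences without replacement (seeded for repeatability)."""
    return random.Random(seed).sample(labels, depth)

# Toy community of 100 labelled sequences; proportions chosen for illustration.
community = ["Archaea"] * 52 + ["Bacteria"] * 45 + ["Other"] * 3
curve = [(depth, shannon_index(rarefy(community, depth))) for depth in (10, 50, 100)]
```

Plotting the index against subsampling depth gives the kind of curve described in the abstract, where a plateau suggests that additional sequencing would add little new diversity.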
Sedimentary sequence evolution in a Foredeep basin: Eastern Venezuela

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bejarano, C.; Funes, D.; Sarzalho, S.

1996-08-01

Well log-seismic sequence stratigraphy analysis in the Eastern Venezuela Foreland Basin leads to study of the evolution of sedimentary sequences onto the Cretaceous-Paleocene passive margin. This basin comprises two different foredeep sub-basins: the older Guarico sub-basin to the west and the younger Maturin sub-basin to the east. A foredeep switching between these two sub-basins is observed at 12.5 m.y. Seismic interpretation and well log sections across the study area show sedimentary sequences with transgressive sands and coastal onlaps to the east-southeast for the Guarico sub-basin, as well as truncations below the switching sequence (12.5 m.y.), and the Maturin sub-basin shows apparent coastal onlaps to the west-northwest, as well as a marine onlap (deeper water) in the west, where it starts to establish. Sequence stratigraphy analysis of these sequences with well logs allowed the study of the evolution of the stratigraphic section from Paleocene to middle Miocene (68.0-12.0 m.y.). On the basis of well log patterns, the sequences were divided into regressive-transgressive-regressive sedimentary cycles caused by changes in relative sea level. Facies distributions were analyzed and the sequences were divided into simple sequences or subsequences of higher frequency than third-order depositional sequences.
RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences

PubMed Central

Sperber, Göran; Lövgren, Anders; Eriksson, Nils-Einar; Benachenhou, Farid; Blomberg, Jonas

2009-01-01

Background The rapid accumulation of genomic information in databases necessitates rapid and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences (ERVs), constitute at least 5% of vertebrate genomes. After infecting the host, these retroviruses have integrated in germ line cells, and have then been carried in genomes for at least several hundred million years. A better understanding of structure and function of these sequences can have profound biological and medical consequences. Methods RetroTector© (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe (RetroTector online; ROL), which does not require specific installation procedures, is provided via the World Wide Web. Results ROL was implemented under the Batchelor web interface (A Lövgren et al). It accepts sequences (5 to 10,000 kilobases) by GenBank accession number, file, or FASTA cut-and-paste. Up to ten submissions can be done simultaneously, allowing batch analysis of up to 100 megabases. Jobs are shown in an IP-number specific list. Results are text files, and can be viewed with the program RetroTectorViewer.jar (at the same site), which has the full graphical capabilities of the basic ReTe program. A detailed analysis of any retroviral sequences found in the submitted sequence is graphically presented, exportable in standard formats. With the current server, a complete analysis of a 1 megabase sequence takes 10 minutes. It is possible to mask nonretroviral repetitive sequences in the submitted sequence, using host genome specific "brooms", which increase specificity.
Discussion Proviral sequences can be hard to recognize, especially if the integration occurred many million years ago. Precise delineation of LTR, gag, pro, pol and env can be difficult, requiring manual work. ROL is a way of simplifying these tasks. Conclusion ROL provides 1. annotation and presentation of known retroviral sequences, 2. detection of proviral chains in unknown genomic sequences, with up to 100 Mbase per submission. PMID:19534753
RetroTector online, a rational tool for analysis of retroviral elements in small and medium size vertebrate genomic sequences.

PubMed

Sperber, Göran; Lövgren, Anders; Eriksson, Nils-Einar; Benachenhou, Farid; Blomberg, Jonas

2009-06-16

The rapid accumulation of genomic information in databases necessitates rapid and specific algorithms for extracting biologically meaningful information. More or less complete retroviral sequences, also called proviral or endogenous retroviral sequences (ERVs), constitute at least 5% of vertebrate genomes. After infecting the host, these retroviruses have integrated in germ line cells, and have then been carried in genomes for at least several hundred million years. A better understanding of structure and function of these sequences can have profound biological and medical consequences. RetroTector (ReTe) is a platform-independent Java program for identification and characterization of proviral sequences in vertebrate genomes. The full ReTe requires a local installation with a MySQL database. Although not overly complicated, the installation may take some time. A "light" version of ReTe (RetroTector online; ROL), which does not require specific installation procedures, is provided via the World Wide Web. ROL http://www.fysiologi.neuro.uu.se/jbgs/ was implemented under the Batchelor web interface (A Lövgren et al). It accepts sequences (5 to 10,000 kilobases) by GenBank accession number, file, or FASTA cut-and-paste. Up to ten submissions can be done simultaneously, allowing batch analysis of
Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

PubMed

Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

2015-12-01

The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
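The Ramanujan Fourier transform expands a signal on Ramanujan sums c_q(n) rather than on complex exponentials. The sketch below computes c_q(n) from its definition and a finite-length estimate of one RFT coefficient. It is an illustration under stated assumptions, not the fast algorithm of the paper: the function names are hypothetical, the coefficient is the plain finite-N average (1 / (phi(q) N)) * sum x(n) c_q(n), and the input signal is synthetic rather than a numerically encoded protein.

```python
import cmath
from math import gcd

def ramanujan_sum(q, n):
    """c_q(n): sum of exp(2*pi*i*k*n/q) over 1 <= k <= q with gcd(k, q) == 1.
    The sum is always an integer, so the real part is rounded."""
    total = sum(cmath.exp(2j * cmath.pi * k * n / q)
                for k in range(1, q + 1) if gcd(k, q) == 1)
    return round(total.real)

def rft_coefficient(signal, q):
    """Finite-length estimate of the q-th RFT coefficient of signal[0..N-1],
    where signal[0] is taken as the sample at n = 1 (an assumed convention)."""
    phi_q = ramanujan_sum(q, 0)  # c_q(0) equals Euler's totient phi(q)
    acc = sum(x * ramanujan_sum(q, n) for n, x in enumerate(signal, start=1))
    return acc / (phi_q * len(signal))

# Synthetic test signal: the period-3 Ramanujan sum itself over 30 samples.
signal = [ramanujan_sum(3, n) for n in range(1, 31)]
print(rft_coefficient(signal, 3), rft_coefficient(signal, 2))
```

Because c_q(n) is integer-valued, the projection needs only integer multiplications, which is one reason such transforms are attractive for low-frequency analysis of symbolic sequences.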
Improved serial analysis of V1 ribosomal sequence tags (SARST-V1) provides a rapid, comprehensive, sequence-based characterization of bacterial diversity and community composition.

PubMed

Yu, Zhongtang; Yu, Marie; Morrison, Mark

2006-04-01

Serial analysis of ribosomal sequence tags (SARST) is a recently developed technology that can generate large 16S rRNA gene (rrs) sequence data sets from microbiomes, but there are numerous enzymatic and purification steps required to construct the ribosomal sequence tag (RST) clone libraries. We report here an improved SARST method, which still targets the V1 hypervariable region of rrs genes, but reduces the number of enzymes, oligonucleotides, reagents, and technical steps needed to produce the RST clone libraries. The new method, hereafter referred to as SARST-V1, was used to examine the eubacterial diversity present in community DNA recovered from the microbiome resident in the ovine rumen. The 190 sequenced clones contained 1055 RSTs and no fewer than 236 unique phylotypes (based on ≥95% sequence identity) that were assigned to eight different eubacterial phyla. Rarefaction and monomolecular curve analyses predicted that the complete RST clone library contains 99% of the 353 unique phylotypes predicted to exist in this microbiome. When compared with ribosomal intergenic spacer analysis (RISA) of the same community DNA sample, as well as a compilation of nine previously published conventional rrs clone libraries prepared from the same type of samples, the RST clone library provided a more comprehensive characterization of the eubacterial diversity present in rumen microbiomes. As such, SARST-V1 should be a useful tool applicable to comprehensive examination of diversity and composition in microbiomes and offers an affordable, sequence-based method for diversity analysis.
Genetic characterization of the non-structural protein-3 gene of bluetongue virus serotype-2 isolate from India.

PubMed

Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu

2017-03-01

Sequence analysis and phylogenetic studies based on non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Analysis of the NS3 gene indicated that Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of BTV2 isolate is different from other global isolates, the deduced amino acid sequence of NS3 protein demonstrated high molecular stability.
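Percent identity figures such as those quoted in this abstract are typically computed column by column over an alignment. The following is a minimal sketch of one common definition; the function name and the choice to skip gapped columns are assumptions, since the abstract does not state how gaps were handled or which software was used.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned columns in which both sequences carry the same residue.
    Inputs must already be aligned to equal length; columns with a gap ('-')
    in either sequence are skipped (an assumed convention)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    columns = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
               if a != "-" and b != "-"]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)

# Hypothetical aligned fragments, invented for this example.
print(round(percent_identity("ATGCAA", "ATGTAA"), 1))
```

Other conventions divide by the full alignment length or by the shorter sequence, so reported identities are only comparable when the same definition is used throughout.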
Genetic characterization of the non-structural protein-3 gene of bluetongue virus serotype-2 isolate from India

PubMed Central

Pudupakam, Raghavendra Sumanth; Raghunath, Shobana; Pudupakam, Meghanath; Daggupati, Sreenivasulu

2017-01-01

Aim: Sequence analysis and phylogenetic studies based on non-structural protein-3 (NS3) gene are important in understanding the evolution and epidemiology of bluetongue virus (BTV). This study was aimed at characterizing the NS3 gene sequence of Indian BTV serotype-2 (BTV2) to elucidate its genetic relationship to global BTV isolates. Materials and Methods: The NS3 gene of BTV2 was amplified from infected BHK-21 cell cultures, cloned and subjected to sequence analysis. The generated NS3 gene sequence was compared with the corresponding sequences of different BTV serotypes across the world, and a phylogenetic relationship was established. Results: The NS3 gene of BTV2 showed moderate levels of variability in comparison to different BTV serotypes, with nucleotide sequence identities ranging from 81% to 98%. The region showed high sequence homology of 93-99% at amino acid level with various BTV serotypes. The PPXY/PTAP late domain motifs, glycosylation sites, hydrophobic domains, and the amino acid residues critical for virus-host interactions were conserved in NS3 protein. Phylogenetic analysis revealed that BTV isolates segregate into four topotypes and that the Indian BTV2 in subclade IA is closely related to Asian and Australian origin strains. Conclusion: Analysis of the NS3 gene indicated that Indian BTV2 isolate is closely related to strains from Asia and Australia, suggesting a common origin of infection. Although the pattern of evolution of BTV2 isolate is different from other global isolates, the deduced amino acid sequence of NS3 protein demonstrated high molecular stability. PMID:28435199
Primary and secondary structural analyses of glutathione S-transferase pi from human placenta.

PubMed

Ahmad, H; Wilson, D E; Fritz, R R; Singh, S V; Medh, R D; Nagle, G T; Awasthi, Y C; Kurosky, A

1990-05-01

The primary structure of glutathione S-transferase (GST) pi from a single human placenta was determined. The structure was established by chemical characterization of tryptic and cyanogen bromide peptides as well as automated sequence analysis of the intact enzyme. The structural analysis indicated that the protein is comprised of 209 amino acid residues and gave no evidence of post-translational modifications. The amino acid sequence differed from that of the deduced amino acid sequence determined by nucleotide sequence analysis of a cDNA clone (Kano, T., Sakai, M., and Muramatsu, M., 1987, Cancer Res. 47, 5626-5630) at position 104 which contained both valine and isoleucine whereas the deduced sequence from nucleotide sequence analysis identified only isoleucine at this position. These results demonstrated that in the one individual placenta studied at least two GST pi genes are coexpressed, probably as a result of allelomorphism. Computer assisted consensus sequence evaluation identified a hydrophobic region in GST pi (residues 155-181) that was predicted to be either a buried transmembrane helical region or a signal sequence region. The significance of this hydrophobic region was interpreted in relation to the mode of action of the enzyme especially in regard to the potential involvement of a histidine in the active site mechanism. A comparison of the chemical similarity of five known human GST complete enzyme structures, one of pi, one of mu, two of alpha, and one microsomal, gave evidence that all five enzymes have evolved by a divergent evolutionary process after gene duplication, with the microsomal enzyme representing the most divergent form.
Analysis of Pre-Analytic Factors Affecting the Success of Clinical Next-Generation Sequencing of Solid Organ Malignancies.

PubMed

Chen, Hui; Luthra, Rajyalakshmi; Goswami, Rashmi S; Singh, Rajesh R; Roy-Chowdhuri, Sinchita

2015-08-28

Application of next-generation sequencing (NGS) technology to routine clinical practice has enabled characterization of personalized cancer genomes to identify patients likely to have a response to targeted therapy. The proper selection of tumor sample for downstream NGS based mutational analysis is critical to generate accurate results and to guide therapeutic intervention. However, multiple pre-analytic factors come into play in determining the success of NGS testing. In this review, we discuss pre-analytic requirements for AmpliSeq PCR-based sequencing using Ion Torrent Personal Genome Machine (PGM) (Life Technologies), a NGS sequencing platform that is often used by clinical laboratories for sequencing solid tumors because of its low input DNA requirement from formalin fixed and paraffin embedded tissue. The success of NGS mutational analysis is affected not only by the input DNA quantity but also by several other factors, including the specimen type, the DNA quality, and the tumor cellularity. Here, we review tissue requirements for solid tumor NGS based mutational analysis, including procedure types, tissue types, tumor volume and fraction, decalcification, and treatment effects.
Fueling the Future with Fungal Genomes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grigoriev, Igor V.

2014-10-27

Genomes of fungi relevant to energy and environment are in focus of the JGI Fungal Genomic Program. One of its projects, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts and pathogens) and biorefinery processes (cellulose degradation and sugar fermentation) by means of genome sequencing and analysis. New chapters of the Encyclopedia can be opened with user proposals to the JGI Community Science Program (CSP). Another JGI project, the 1000 fungal genomes, explores fungal diversity on genome level at scale and is open for users to nominate new species for sequencing. Over 400 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supported by functional genomics will lead to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such ‘parts’ suggested by comparative genomics and functional analysis in these areas are presented here.
Fungal Genomics Program

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grigoriev, Igor

The JGI Fungal Genomics Program aims to scale up sequencing and analysis of fungal genomes to explore the diversity of fungi important for energy and the environment, and to promote functional studies on a system level. Combining new sequencing technologies and comparative genomics tools, JGI is now leading the world in fungal genome sequencing and analysis. Over 120 sequenced fungal genomes with analytical tools are available via MycoCosm (www.jgi.doe.gov/fungi), a web-portal for fungal biologists. Our model of interacting with user communities, unique among other sequencing centers, helps organize these communities, improves genome annotation and analysis work, and facilitates new larger-scale genomic projects. This resulted in 20 high-profile papers published in 2011 alone and contributing to the Genomics Encyclopedia of Fungi, which targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts). Our next grand challenges include larger scale exploration of fungal diversity (1000 fungal genomes), developing molecular tools for DOE-relevant model organisms, and analysis of complex systems and metagenomes.
Resources and costs for microbial sequence analysis evaluated using virtual machines and cloud computing.

PubMed

Angiuoli, Samuel V; White, James R; Matalka, Malcolm; White, Owen; Fricke, W Florian

2011-01-01

The widespread popularity of genomic applications is threatened by the "bioinformatics bottleneck" resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. 
Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers.
Sequence analysis of the L protein of the Ebola 2014 outbreak: Insight into conserved regions and mutations.

PubMed

Ayub, Gohar; Waheed, Yasir

2016-06-01

The 2014 Ebola outbreak was one of the largest that have occurred; it started in Guinea and spread to Nigeria, Liberia and Sierra Leone. Phylogenetic analysis of the current virus species indicated that this outbreak is the result of a divergent lineage of the Zaire ebolavirus. The L protein of Ebola virus (EBOV) is the catalytic subunit of the RNA‑dependent RNA polymerase complex, which, with VP35, is key for the replication and transcription of viral RNA. Earlier sequence analysis demonstrated that the L protein of all non‑segmented negative‑sense (NNS) RNA viruses consists of six domains containing conserved functional motifs. The aim of the present study was to analyze the presence of these motifs in 2014 EBOV isolates, highlight their function and how they may contribute to the overall pathogenicity of the isolates. For this purpose, 81 2014 EBOV L protein sequences were aligned with 475 other NNS RNA viruses, including Paramyxoviridae and Rhabdoviridae viruses. Phylogenetic analysis of all EBOV outbreak L protein sequences was also performed. Analysis of the amino acid substitutions in the 2014 EBOV outbreak was conducted using sequence analysis. The alignment demonstrated the presence of previously conserved motifs in the 2014 EBOV isolates and novel residues. Notably, all the mutations identified in the 2014 EBOV isolates were tolerant, they were pathogenic with certain examples occurring within previously determined functional conserved motifs, possibly altering viral pathogenicity, replication and virulence. The phylogenetic analysis demonstrated that all sequences with the exception of the 2014 EBOV sequences were clustered together. The 2014 EBOV outbreak has acquired a great number of mutations, which may explain the reasons behind this unprecedented outbreak. Certain residues critical to the function of the polymerase remain conserved and may be targets for the development of antiviral therapeutic agents.
Resources and Costs for Microbial Sequence Analysis Evaluated Using Virtual Machines and Cloud Computing

PubMed Central

Angiuoli, Samuel V.; White, James R.; Matalka, Malcolm; White, Owen; Fricke, W. Florian

2011-01-01

Background The widespread popularity of genomic applications is threatened by the “bioinformatics bottleneck” resulting from uncertainty about the cost and infrastructure needed to meet increasing demands for next-generation sequence analysis. Cloud computing services have been discussed as potential new bioinformatics support systems but have not been evaluated thoroughly. Results We present benchmark costs and runtimes for common microbial genomics applications, including 16S rRNA analysis, microbial whole-genome shotgun (WGS) sequence assembly and annotation, WGS metagenomics and large-scale BLAST. Sequence dataset types and sizes were selected to correspond to outputs typically generated by small- to midsize facilities equipped with 454 and Illumina platforms, except for WGS metagenomics where sampling of Illumina data was used. Automated analysis pipelines, as implemented in the CloVR virtual machine, were used in order to guarantee transparency, reproducibility and portability across different operating systems, including the commercial Amazon Elastic Compute Cloud (EC2), which was used to attach real dollar costs to each analysis type. We found considerable differences in computational requirements, runtimes and costs associated with different microbial genomics applications. While all 16S analyses completed on a single-CPU desktop in under three hours, microbial genome and metagenome analyses utilized multi-CPU support of up to 120 CPUs on Amazon EC2, where each analysis completed in under 24 hours for less than $60. Representative datasets were used to estimate maximum data throughput on different cluster sizes and to compare costs between EC2 and comparable local grid servers. 
Conclusions Although bioinformatics requirements for microbial genomics depend on dataset characteristics and the analysis protocols applied, our results suggest that smaller sequencing facilities (up to three Roche/454 or one Illumina GAIIx sequencer) invested in 16S rRNA amplicon sequencing, microbial single-genome and metagenomics WGS projects can achieve cost-efficient bioinformatics support using CloVR in combination with Amazon EC2 as an alternative to local computing centers. PMID:22028928

Molecular characterization of a novel Nucleorhabdovirus from black currant identified by high-throughput sequencing

USDA-ARS's Scientific Manuscript database

Contigs with sequence similarities to several nucleorhabdoviruses were identified by high-throughput sequencing analysis from a black currant (Ribes nigrum L.) cultivar. The complete genomic sequence of this new nucleorhabdovirus is 14,432 nucleotides. Its genomic organization is typical of nucleorh...
Comparison of sequencing-based methods to profile DNA methylation and identification of monoallelic epigenetic modifications

USDA-ARS's Scientific Manuscript database

Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylat...
SNP discovery through de novo deep sequencing using the next generation of DNA sequencers

USDA-ARS's Scientific Manuscript database

The production of high volumes of DNA sequence data using new technologies has permitted more efficient identification of single nucleotide polymorphisms in vertebrate genomes. This chapter presented practical methodology for production and analysis of DNA sequence data for SNP discovery....
Use of Sequence-Independent, Single-Primer-Amplification (SISPA) for rapid detection, identification, and characterization of avian RNA viruses

USDA-ARS's Scientific Manuscript database

Current technologies with next generation sequencing have revolutionized metagenomics analysis of clinical samples. To achieve the non-selective amplification and recovery of low abundance genetic sequences, a simplified Sequence-Independent, Single-Primer Amplification (SISPA) technique in combinat...
Software for Analyzing Sequences of Flow-Related Images

NASA Technical Reports Server (NTRS)

Klimek, Robert; Wright, Ted

2004-01-01

Spotlight is a computer program for analysis of sequences of images generated in combustion and fluid physics experiments. Spotlight can perform analysis of a single image in an interactive mode or a sequence of images in an automated fashion. The primary type of analysis is tracking of positions of objects over sequences of frames. Features and objects that are typically tracked include flame fronts, particles, droplets, and fluid interfaces. Spotlight automates the analysis of object parameters, such as centroid position, velocity, acceleration, size, shape, intensity, and color. Images can be processed to enhance them before statistical and measurement operations are performed. An unlimited number of objects can be analyzed simultaneously. Spotlight saves results of analyses in a text file that can be exported to other programs for graphing or further analysis. Spotlight is a graphical-user-interface-based program that at present can be executed on Microsoft Windows and Linux operating systems. A version that runs on Macintosh computers is being considered.
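Tracking an object's centroid across frames and differencing the positions to obtain velocity, as described in this abstract, can be sketched in a few lines. This is not Spotlight's implementation: the function names, the intensity-threshold segmentation, and the toy frames are all assumptions made for illustration.

```python
def centroid(frame, threshold):
    """Intensity-weighted centroid (row, col) of pixels at or above threshold.
    Returns None when no pixel qualifies."""
    total = row_sum = col_sum = 0.0
    for r, row in enumerate(frame):
        for c, value in enumerate(row):
            if value >= threshold:
                total += value
                row_sum += r * value
                col_sum += c * value
    if total == 0:
        return None
    return (row_sum / total, col_sum / total)

def velocities(positions, dt):
    """Finite-difference velocity (pixels per unit time) between consecutive positions."""
    return [((r1 - r0) / dt, (c1 - c0) / dt)
            for (r0, c0), (r1, c1) in zip(positions, positions[1:])]

# Hypothetical two-frame sequence: one bright pixel moving one column per frame.
frames = [
    [[0, 9, 0, 0], [0, 0, 0, 0]],
    [[0, 0, 9, 0], [0, 0, 0, 0]],
]
track = [centroid(f, threshold=5) for f in frames]
print(track, velocities(track, dt=0.5))
```

Acceleration follows by applying the same finite difference to the velocity list; real image sequences would usually need noise filtering and a way to keep several objects apart before this step.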
Tracking B-Cell Repertoires and Clonal Histories in Normal and Malignant Lymphocytes.

PubMed

Weston-Bell, Nicola J; Cowan, Graeme; Sahota, Surinder S

2017-01-01

Methods for tracking B-cell repertoires and clonal history in normal and malignant B-cells based on immunoglobulin variable region (IGV) gene analysis have developed rapidly with the advent of massive parallel next-generation sequencing (mpNGS) protocols. mpNGS permits a depth of analysis of IGV genes not hitherto feasible, and presents challenges of bioinformatics analysis, which can be readily met by current pipelines. This strategy offers a potential resolution of B-cell usage at a depth that may capture fully the natural state, in a given biological setting. Conventional methods based on RT-PCR amplification and Sanger sequencing are also available where mpNGS is not accessible. Each method offers distinct advantages. Conventional methods for IGV gene sequencing are readily adaptable to most laboratories and provide an ease of analysis to capture salient features of B-cell use. This chapter describes two methods in detail for analysis of IGV genes, mpNGS and conventional RT-PCR with Sanger sequencing.
Analysis of Metagenomic Sequences: From Megabases to Terabases

ScienceCinema

Kyrpides, Nikos

2018-05-04

Nikos Kyrpides of the DOE Joint Genome Institute discusses metagenomics and the challenge of dealing with terabases of data on June 4, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
Molecular phylogeny of grey mullets (Teleostei: Mugilidae) in Greece: evidence from sequence analysis of mtDNA segments.

PubMed

Papasotiropoulos, Vasilis; Klossa-Kilia, Elena; Alahiotis, Stamatis N; Kilias, George

2007-08-01

Mitochondrial DNA sequence analysis has been used to explore genetic differentiation and phylogenetic relationships among five species of the Mugilidae family, Mugil cephalus, Chelon labrosus, Liza aurata, Liza ramada, and Liza saliens. DNA was isolated from samples originating from the Messolongi Lagoon in Greece. Three mtDNA segments (12S rRNA, 16S rRNA, and CO I) were PCR amplified and sequenced. Sequence analysis revealed the greatest genetic differentiation between M. cephalus and all the other species studied, while C. labrosus and L. aurata were the closest taxa. Dendrograms obtained by the neighbor-joining method and Bayesian inference analysis exhibited the same topology. According to this topology, M. cephalus is the most distinct species and the remaining taxa are clustered together, with C. labrosus and L. aurata forming a single group. The latter result brings into question the monophyletic origin of the genus Liza.
Genomic Encyclopedia of Fungi

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grigoriev, Igor

Genomes of fungi relevant to energy and environment are in focus of the Fungal Genomic Program at the US Department of Energy Joint Genome Institute (JGI). Its key project, the Genomics Encyclopedia of Fungi, targets fungi related to plant health (symbionts, pathogens, and biocontrol agents) and biorefinery processes (cellulose degradation, sugar fermentation, industrial hosts), and explores fungal diversity by means of genome sequencing and analysis. Over 150 fungal genomes have been sequenced by JGI to date and released through MycoCosm (www.jgi.doe.gov/fungi), a fungal web-portal, which integrates sequence and functional data with genome analysis tools for user community. Sequence analysis supported by functional genomics leads to developing parts list for complex systems ranging from ecosystems of biofuel crops to biorefineries. Recent examples of such parts suggested by comparative genomics and functional analysis in these areas are presented here.
Genomic Diversity and Evolution of the Lyssaviruses

PubMed Central

Delmas, Olivier; Holmes, Edward C.; Talbi, Chiraz; Larrous, Florence; Dacheux, Laurent; Bouchier, Christiane; Bourhy, Hervé

2008-01-01

Lyssaviruses are RNA viruses with single-strand, negative-sense genomes responsible for rabies-like diseases in mammals. To date, genomic and evolutionary studies have most often utilized partial genome sequences, particularly of the nucleoprotein and glycoprotein genes, with little consideration of genome-scale evolution. Herein, we report the first genomic and evolutionary analysis using complete genome sequences of all recognised lyssavirus genotypes, including 14 new complete genomes of field isolates from 6 genotypes and one genotype that is completely sequenced for the first time. In doing so we significantly increase the extent of genome sequence data available for these important viruses. Our analysis of these genome sequence data reveals that all lyssaviruses have the same genomic organization. A phylogenetic analysis reveals strong geographical structuring, with the greatest genetic diversity in Africa, and an independent origin for the two known genotypes that infect European bats. We also suggest that multiple genotypes may exist within the diversity of viruses currently classified as ‘Lagos Bat’. In sum, we show that rigorous phylogenetic techniques based on full length genome sequence provide the best discriminatory power for genotype classification within the lyssaviruses. PMID:18446239
Analysis of plant microbe interactions in the era of next generation sequencing technologies

PubMed Central

Knief, Claudia

2014-01-01

Next generation sequencing (NGS) technologies have impressively accelerated research in biological science in recent years by enabling the production of large volumes of sequence data at a drastically lower price per base, compared to traditional sequencing methods. The recent and ongoing developments in the field allow addressing research questions in plant-microbe biology that were not conceivable just a few years ago. The present review provides an overview of NGS technologies and their usefulness for the analysis of microorganisms that live in association with plants. Possible limitations of the different sequencing systems, in particular sources of errors and bias, are critically discussed, and methods that help to overcome these shortcomings are described. A focus will be on the application of NGS methods in metagenomic studies, including the analysis of microbial communities by amplicon sequencing, which can be considered as a targeted metagenomic approach. Different applications of NGS technologies are exemplified by selected research articles that address the biology of the plant associated microbiota to demonstrate the worth of the new methods. PMID:24904612
Pediatric Glioblastoma Therapies Based on Patient-Derived Stem Cell Resources

DTIC Science & Technology

2014-11-01

genomic DNA and then subjected to Illumina high-throughput sequencing. In this analysis, shRNAs lost in the GSC population represent candidate gene...and genomic DNA and then subjected to Illumina high-throughput sequencing. In this analysis, shRNAs lost in the GSC population represent candidate...PRISM 7900 Sequence Detection System (Genomics Resource, FHCRC). Relative transcript abundance was analyzed using the 2−ΔΔCt method. TRIzol (Invitrogen
Early Detection of NSCLC Using Stromal Markers in Peripheral Blood

DTIC Science & Technology

2016-09-01

circulating myeloid cells, flow cytometry, RNA-sequencing, expression profiling. 3. ACCOMPLISHMENTS: What were the major goals of the project...Subtask 2: Flow cytometry sorting of circulating myeloid cells. Subtask 3: RNA-Sequencing Subtask 4: RNA-seq data analysis Subtask 5: Feasible RT-PCR...accomplished the patient recruitment, flow cytometry sorting of circulating myeloid cells, RNA-sequencing of the samples. During the RNA-seq data analysis, we
Meta sequence analysis of human blood peptides and their parent proteins.

PubMed

Bowden, Peter; Pendrak, Voitek; Zhu, Peihong; Marshall, John G

2010-04-18

Sequence analysis of the blood peptides and their qualities will be key to understanding the mechanisms that contribute to error in LC-ESI-MS/MS. Analysis of peptides and their proteins at the level of sequences is much more direct and informative than the comparison of disparate accession numbers. A portable database of all blood peptide and protein sequences with descriptor fields and gene ontology terms might be useful for designing immunological or MRM assays from human blood. The results of twelve studies of human blood peptides and/or proteins identified by LC-MS/MS and correlated against a disparate array of genetic libraries were parsed and matched to proteins from the human ENSEMBL, SwissProt and RefSeq databases by SQL. The reported peptide and protein sequences were organized into an SQL database with full protein sequences and up to five unique peptides in order of prevalence along with the peptide count for each protein. Structured query language or BLAST was used to acquire descriptive information in current databases. Sampling error at the level of peptides is the largest source of disparity between groups. Chi Square analysis of peptide to protein distributions confirmed the significant agreement between groups on identified proteins. Copyright 2010. Published by Elsevier B.V.
MIPS: a database for genomes and protein sequences

PubMed Central

Mewes, H. W.; Frishman, D.; Güldener, U.; Mannhaupt, G.; Mayer, K.; Mokrejs, M.; Morgenstern, B.; Münsterkötter, M.; Rudd, S.; Weil, B.

2002-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz–Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91–93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155–158; Barker et al. (2001) Nucleic Acids Res., 29, 29–32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de). PMID:11752246
MIPS: a database for genomes and protein sequences.

PubMed

Mewes, H W; Frishman, D; Güldener, U; Mannhaupt, G; Mayer, K; Mokrejs, M; Morgenstern, B; Münsterkötter, M; Rudd, S; Weil, B

2002-01-01

The Munich Information Center for Protein Sequences (MIPS-GSF, Neuherberg, Germany) continues to provide genome-related information in a systematic way. MIPS supports both national and European sequencing and functional analysis projects, develops and maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences, and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the databases for the comprehensive set of genomes (PEDANT genomes), the database of annotated human EST clusters (HIB), the database of complete cDNAs from the DHGP (German Human Genome Project), as well as the project specific databases for the GABI (Genome Analysis in Plants) and HNB (Helmholtz-Netzwerk Bioinformatik) networks. The Arabidopsis thaliana database (MATDB), the database of mitochondrial proteins (MITOP) and our contribution to the PIR International Protein Sequence Database have been described elsewhere [Schoof et al. (2002) Nucleic Acids Res., 30, 91-93; Scharfe et al. (2000) Nucleic Acids Res., 28, 155-158; Barker et al. (2001) Nucleic Acids Res., 29, 29-32]. All databases described, the protein analysis tools provided and the detailed descriptions of our projects can be accessed through the MIPS World Wide Web server (http://mips.gsf.de).
Metagenomic and near full-length 16S rRNA sequence data in support of the phylogenetic analysis of the rumen bacterial community in steers

USDA-ARS's Scientific Manuscript database

Next generation sequencing technologies have vastly changed the approach of sequencing of the 16S rRNA gene for studies in microbial ecology. Three distinct technologies are available for large-scale 16S sequencing. All three are subject to biases introduced by sequencing error rates, amplificatio...
Median network analysis of defectively sequenced entire mitochondrial genomes from early and contemporary disease studies.

PubMed

Bandelt, Hans-Jürgen; Yao, Yong-Gang; Bravi, Claudio M; Salas, Antonio; Kivisild, Toomas

2009-03-01

Sequence analysis of the mitochondrial genome has become a routine method in the study of mitochondrial diseases. Quite often, the sequencing efforts in the search of pathogenic or disease-associated mutations are affected by technical and interpretive problems, caused by sample mix-up, contamination, biochemical problems, incomplete sequencing, misdocumentation and insufficient reference to previously published data. To assess data quality in case studies of mitochondrial diseases, it is recommended to compare any mtDNA sequence under consideration to its phylogenetically closest lineages available on the Web. The median network method has proven useful for visualizing potential problems with the data. We contrast some early reports of complete mtDNA sequences to more recent total mtDNA sequencing efforts in studies of various mitochondrial diseases. We conclude that the quality of complete mtDNA sequences generated in the medical field in the past few years is somewhat unsatisfactory and may even fall behind that of pioneer manual sequencing in the early nineties. Our study provides a paradigm for an a posteriori evaluation of sequence quality and for detection of potential problems with inferring a pathogenic status of a particular mutation.
Sequence investigation of 34 forensic autosomal STRs with massively parallel sequencing.

PubMed

Zhang, Suhua; Niu, Yong; Bian, Yingnan; Dong, Rixia; Liu, Xiling; Bao, Yun; Jin, Chao; Zheng, Hancheng; Li, Chengtao

2018-05-01

STRs vary not only in the length of the repeat units and the number of repeats but also in the region with which they conform to an incremental repeat pattern. Massively parallel sequencing (MPS) offers new possibilities in the analysis of STRs since it can simultaneously sequence multiple targets in a single reaction and capture potential internal sequence variations. Here, we sequenced 34 STRs applied in the forensic community of China with a custom-designed panel. MPS performance was evaluated by sequencing read analysis, concordance study and sensitivity testing. High-coverage sequencing data were obtained to determine the constitute ratios and heterozygous balance. No actual inconsistent genotypes were observed between capillary electrophoresis (CE) and MPS, demonstrating the reliability of the panel and the MPS technology. With the sequencing data from the 200 investigated individuals, 346 and 418 alleles were obtained via CE and MPS technologies, respectively, at the 34 STRs, indicating that MPS technology provides higher discrimination than CE detection. The whole study demonstrated that STR genotyping with the custom panel and MPS technology has the potential not only to reveal length and sequence variations but also to satisfy the demands of high throughput and high multiplexing with acceptable sensitivity.
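The gap between the CE and MPS allele counts reported above follows from what each method observes: CE distinguishes alleles by fragment length only, while MPS also sees internal sequence differences. A minimal sketch of that distinction, using made-up repeat sequences (the function name and the example alleles are hypothetical and not taken from the study):

```python
def count_alleles(sequences):
    """Return (length-based count, sequence-based count) for one STR locus.

    Length-based counting mimics what CE can resolve; sequence-based
    counting mimics what MPS can resolve.
    """
    by_length = {len(seq) for seq in sequences}   # CE: only fragment length
    by_sequence = set(sequences)                  # MPS: full repeat sequence
    return len(by_length), len(by_sequence)


# Hypothetical alleles: the first two have the same length (10 repeat units)
# but differ by one internal repeat unit (TCTG instead of TCTA).
observed = [
    "TCTA" * 10,
    "TCTA" * 4 + "TCTG" + "TCTA" * 5,
    "TCTA" * 11,
]
ce_count, mps_count = count_alleles(observed)
print(ce_count, mps_count)
```

In this toy locus the two isoalleles of equal length collapse into one CE allele, so the sequence-based count is higher than the length-based count.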
Analysis of BAC end sequences in oak, a keystone forest tree species, providing insight into the composition of its genome

PubMed Central

2011-01-01

Background One of the key goals of oak genomics research is to identify genes of adaptive significance. This information may help to improve the conservation of adaptive genetic variation and the management of forests to increase their health and productivity. Deep-coverage large-insert genomic libraries are a crucial tool for attaining this objective. We report herein the construction of a BAC library for Quercus robur, its characterization and an analysis of BAC end sequences. Results The EcoRI library generated consisted of 92,160 clones, 7% of which had no insert. Levels of chloroplast and mitochondrial contamination were below 3% and 1%, respectively. Mean clone insert size was estimated at 135 kb. The library represents 12 haploid genome equivalents, and the likelihood of finding a particular oak sequence of interest is greater than 99%. Genome coverage was confirmed by PCR screening of the library with 60 unique genetic loci sampled from the genetic linkage map. In total, about 20,000 high-quality BAC end sequences (BESs) were generated by sequencing 15,000 clones. Roughly 5.88% of the combined BAC end sequence length corresponded to known retroelements while ab initio repeat detection methods identified 41 additional repeats. Collectively, characterized and novel repeats account for roughly 8.94% of the genome. Further analysis of the BESs revealed 1,823 putative genes suggesting at least 29,340 genes in the oak genome. BESs were aligned with the genome sequences of Arabidopsis thaliana, Vitis vinifera and Populus trichocarpa. One putative collinear microsyntenic region encoding an alcohol acyl transferase protein was observed between oak and chromosome 2 of V. vinifera. Conclusions This BAC library provides a new resource for genomic studies, including SSR marker development, physical mapping, comparative genomics and genome sequencing. BES analysis provided insight into the structure of the oak genome. These sequences will be used in the assembly of a future genome sequence for oak. PMID:21645357
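The statement that 12 haploid genome equivalents give a better than 99% chance of finding a given sequence can be checked with the Clarke-Carbon relation P = 1 - (1 - f)^N, where f is the insert size as a fraction of the genome and N is the number of clones. The abstract does not say which formula the authors used, so the sketch below is an assumption; the genome size is not reported in the abstract and is back-calculated here from the stated 12-fold coverage purely for illustration.

```python
import math


def clarke_carbon_probability(n_clones, insert_bp, genome_bp):
    """Probability that a given sequence is in the library: 1 - (1 - f)^N."""
    f = insert_bp / genome_bp
    return 1.0 - (1.0 - f) ** n_clones


def probability_from_coverage(coverage):
    """Poisson approximation of the same quantity: 1 - exp(-coverage)."""
    return 1.0 - math.exp(-coverage)


# Figures from the abstract: 92,160 clones, 7% without insert, 135 kb inserts.
clones_with_insert = round(92_160 * 0.93)
insert_bp = 135_000
coverage = 12  # haploid genome equivalents, as stated in the abstract

# Assumption: genome size implied by the stated coverage (not a reported value).
implied_genome_bp = clones_with_insert * insert_bp / coverage

p_exact = clarke_carbon_probability(clones_with_insert, insert_bp, implied_genome_bp)
p_poisson = probability_from_coverage(coverage)
print(f"{p_exact:.6f} {p_poisson:.6f}")
```

Both forms give a probability well above 0.99 at 12-fold coverage, consistent with the figure quoted in the abstract.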

Prediction and phylogenetic analysis of mammalian short interspersed elements (SINEs).

PubMed

Rogozin, I B; Mayorov, V I; Lavrentieva, M V; Milanesi, L; Adkison, L R

2000-09-01

The presence of repetitive elements can create serious problems for sequence analysis, especially in the case of homology searches in nucleotide sequence databases. Repetitive elements should be treated carefully by using special programs and databases. In this paper, various aspects of SINE (short interspersed repetitive element) identification, analysis and evolution are discussed.
Applications and Extensions of pClust to Big Microbial Proteomic Data

ERIC Educational Resources Information Center

Lockwood, Svetlana

2016-01-01

The goal of biological sciences is to understand the biomolecular mechanics of living organisms. Proteins serve as the foundation for organisms' functional analysis, and sequence analysis has been shown to be invaluable in answering questions about individual organisms. The first step in any sequence analysis is alignment and it is common that even…
TaxI: a software tool for DNA barcoding using distance methods

PubMed Central

Steinke, Dirk; Vences, Miguel; Salzburger, Walter; Meyer, Axel

2005-01-01

DNA barcoding is a promising approach to the diagnosis of biological diversity in which DNA sequences serve as the primary key for information retrieval. Most existing software for evolutionary analysis of DNA sequences was designed for phylogenetic analyses and, hence, those algorithms do not offer appropriate solutions for the rapid, but precise analyses needed for DNA barcoding, and are also unable to process the often large comparative datasets. We developed a flexible software tool for DNA taxonomy, named TaxI. This program calculates sequence divergences between a query sequence (taxon to be barcoded) and each sequence of a dataset of reference sequences defined by the user. Because the analysis is based on separate pairwise alignments, this software is also able to work with sequences characterized by multiple insertions and deletions that are difficult to align in large sequence sets (i.e. thousands of sequences) by multiple alignment algorithms because of computational restrictions. Here, we demonstrate the utility of this approach with two datasets of fish larvae and juveniles from Lake Constance and juvenile land snails under different models of sequence evolution. Sets of ribosomal 16S rRNA sequences, characterized by multiple indels, performed as well as or better than cox1 sequence sets in assigning sequences to species, demonstrating the suitability of rRNA genes for DNA barcoding. PMID:16214755
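The idea of scoring a query against every reference through separate pairwise alignments can be sketched as follows. This is not TaxI's code: the Needleman-Wunsch scoring values, the uncorrected p-distance, and all function names are assumptions chosen for a compact illustration, and the abstract notes that TaxI supports several models of sequence evolution.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def p_distance(a, b):
    """Uncorrected divergence over aligned columns that contain no gap."""
    x, y = global_align(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != "-" and q != "-"]
    if not pairs:
        raise ValueError("no aligned positions to compare")
    return sum(p != q for p, q in pairs) / len(pairs)


def closest_reference(query, references):
    """Name of the reference sequence with the smallest divergence to the query."""
    return min(references, key=lambda name: p_distance(query, references[name]))


# Hypothetical reference set; names and sequences are made up.
refs = {"species_a": "ACGTACGA", "species_b": "TTTTACGT"}
print(closest_reference("ACGTACGT", refs))
```

Because each query-reference pair is aligned on its own, no multiple alignment of the whole reference set is needed, which is the property the abstract highlights for indel-rich rRNA sequences.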
An Ambystoma mexicanum EST sequencing project: analysis of 17,352 expressed sequence tags from embryonic and regenerating blastema cDNA libraries

PubMed Central

Habermann, Bianca; Bebin, Anne-Gaelle; Herklotz, Stephan; Volkmer, Michael; Eckelt, Kay; Pehlke, Kerstin; Epperlein, Hans Henning; Schackert, Hans Konrad; Wiebe, Glenis; Tanaka, Elly M

2004-01-01

Background The ambystomatid salamander, Ambystoma mexicanum (axolotl), is an important model organism in evolutionary and regeneration research but relatively little sequence information has so far been available. This is a major limitation for molecular studies on caudate development, regeneration and evolution. To address this lack of sequence information we have generated an expressed sequence tag (EST) database for A. mexicanum. Results Two cDNA libraries, one made from stage 18-22 embryos and the other from day-6 regenerating tail blastemas, generated 17,352 sequences. From the sequenced ESTs, 6,377 contigs were assembled that probably represent 25% of the expressed genes in this organism. Sequence comparison revealed significant homology to entries in the NCBI non-redundant database. Further examination of this gene set revealed the presence of genes involved in important cell and developmental processes, including cell proliferation, cell differentiation and cell-cell communication. On the basis of these data, we have performed phylogenetic analysis of key cell-cycle regulators. Interestingly, while cell-cycle proteins such as the cyclin B family display expected evolutionary relationships, the cyclin-dependent kinase inhibitor 1 gene family shows an unusual evolutionary behavior among the amphibians. Conclusions Our analysis reveals the importance of a comprehensive sequence set from a representative of the Caudata and illustrates that the EST sequence database is a rich source for molecular, developmental and regeneration studies. To aid in data mining, the ESTs have been organized into an easily searchable database that is freely available online. PMID:15345051
Statistical Features of the 2010 Beni-Ilmane, Algeria, Aftershock Sequence

NASA Astrophysics Data System (ADS)

Hamdache, M.; Peláez, J. A.; Gospodinov, D.; Henares, J.

2018-03-01

The aftershock sequence of the 2010 Beni-Ilmane (M_W 5.5) earthquake is studied in depth to analyze the spatial and temporal variability of seismicity parameters of the relationships modeling the sequence. The b value of the frequency-magnitude distribution is examined rigorously. A threshold magnitude of completeness equal to 2.1, using the maximum curvature procedure or the changing point algorithm, and a b value equal to 0.96 ± 0.03 have been obtained for the entire sequence. Two clusters have been identified and characterized by their faulting type, exhibiting b values equal to 0.99 ± 0.05 and 1.04 ± 0.05. Additionally, the temporal decay of the aftershock sequence was examined using a stochastic point process. The analysis was done through the restricted epidemic-type aftershock sequence (RETAS) stochastic model, which allows the possibility to recognize the prevailing clustering pattern of the relaxation process in the examined area. The analysis selected the epidemic-type aftershock sequence (ETAS) model to offer the most appropriate description of the temporal distribution, which presumes that all events in the sequence can cause secondary aftershocks. Finally, the fractal dimensions are estimated using the integral correlation. The obtained D_2 values are 2.15 ± 0.01, 2.23 ± 0.01 and 2.17 ± 0.02 for the entire sequence, and for the first and second cluster, respectively. An analysis of the temporal evolution of the fractal dimensions D_-2, D_0, D_2 and the spectral slope has been also performed to derive and characterize the different clusters included in the sequence.
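A b value above a completeness magnitude is commonly estimated with the Aki-Utsu maximum-likelihood formula, b = log10(e) / (mean(M) - (Mc - ΔM/2)). The abstract does not name the estimator the authors used, so the sketch below is an assumption, and the short magnitude list is invented only to exercise the function; the 0.1 bin width is likewise assumed.

```python
import math


def b_value_aki_utsu(magnitudes, mc, bin_width=0.1):
    """Maximum-likelihood b value for events with magnitude >= mc.

    Returns (b, sigma), where sigma = b / sqrt(N) is Aki's first-order
    uncertainty for N events.
    """
    selected = [m for m in magnitudes if m >= mc - 1e-9]
    if len(selected) < 2:
        raise ValueError("need at least two events above the completeness magnitude")
    mean_mag = sum(selected) / len(selected)
    b = math.log10(math.e) / (mean_mag - (mc - bin_width / 2.0))
    sigma = b / math.sqrt(len(selected))
    return b, sigma


# Hypothetical catalogue; only Mc = 2.1 is taken from the abstract.
catalogue = [1.8, 2.1, 2.5, 2.9]
b, sigma = b_value_aki_utsu(catalogue, mc=2.1)
print(round(b, 3), round(sigma, 3))
```

The event below the completeness magnitude is discarded before the mean is taken, which is why the choice of Mc matters as much as the estimator itself.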
DNA Translator and Aligner: HyperCard utilities to aid phylogenetic analysis of molecules.

PubMed

Eernisse, D J

1992-04-01

DNA Translator and Aligner are molecular phylogenetics HyperCard stacks for Macintosh computers. They manipulate sequence data to provide graphical gene mapping, conversions, translations and manual multiple-sequence alignment editing. DNA Translator is able to convert documented GenBank or EMBL sequences into linearized, rescalable gene maps whose gene sequences are extractable by clicking on the corresponding map button or by selection from a scrolling list. Provided gene maps, complete with extractable sequences, consist of nine metazoan, one yeast, and one ciliate mitochondrial DNAs and three green plant chloroplast DNAs. Single or multiple sequences can be manipulated to aid in phylogenetic analysis. Sequences can be translated between nucleic acids and proteins in either direction with flexible support of alternate genetic codes and ambiguous nucleotide symbols. Multiple aligned sequence output from diverse sources can be converted to Nexus, Hennig86 or PHYLIP format for subsequent phylogenetic analysis. Input or output alignments can be examined with Aligner, a convenient accessory stack included in the DNA Translator package. Aligner is an editor for the manual alignment of up to 100 sequences that toggles between display of matched characters and normal unmatched sequences. DNA Translator also generates graphic displays of amino acid coding and codon usage frequency relative to all other, or only synonymous, codons for approximately 70 select organism-organelle combinations. Codon usage data is compatible with spreadsheet or UWGCG formats for incorporation of additional molecules of interest. The complete package is available via anonymous ftp and is free for non-commercial uses.
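Translation under alternate genetic codes, as described above, comes down to swapping a few entries in the codon table. The sketch below is not derived from the HyperCard stacks; it builds the standard code and the vertebrate mitochondrial code (which differs at AGA, AGG, ATA and TGA) and maps any codon containing an ambiguous symbol to X, a simplification of the ambiguity handling the abstract mentions. All names are chosen for illustration.

```python
from itertools import product

BASES = "TCAG"
# Amino acids of the standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

STANDARD_CODE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}
# Vertebrate mitochondrial code: four codons are reassigned.
VERTEBRATE_MITO_CODE = {**STANDARD_CODE, "AGA": "*", "AGG": "*", "ATA": "M", "TGA": "W"}


def translate(dna, code=STANDARD_CODE):
    """Translate frame 1 of a DNA or RNA string; unknown codons become 'X'."""
    dna = dna.upper().replace("U", "T")
    protein = []
    for i in range(0, len(dna) - 2, 3):
        protein.append(code.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGATATGA"))                        # standard code
print(translate("ATGATATGA", VERTEBRATE_MITO_CODE))  # vertebrate mitochondrial code
```

The same nine bases give different peptides under the two tables, which is why the choice of genetic code matters for the mitochondrial gene maps listed in the abstract.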
An automated genotyping tool for enteroviruses and noroviruses.

PubMed

Kroneman, A; Vennema, H; Deforche, K; v d Avoort, H; Peñaranda, S; Oberste, M S; Vinjé, J; Koopmans, M

2011-06-01

Molecular techniques are established as routine in virological laboratories and virus typing through (partial) sequence analysis is increasingly common. Quality assurance for the use of typing data requires harmonization of genotype nomenclature, and agreement on target genes, depending on the level of resolution required, and robustness of methods. To develop and validate web-based open-access typing-tools for enteroviruses and noroviruses. An automated web-based typing algorithm was developed, starting with BLAST analysis of the query sequence against a reference set of sequences from viruses in the family Picornaviridae or Caliciviridae. The second step is phylogenetic analysis of the query sequence and a sub-set of the reference sequences, to assign the enterovirus type or norovirus genotype and/or variant, with profile alignment, construction of phylogenetic trees and bootstrap validation. Typing is performed on VP1 sequences of Human enterovirus A to D, and ORF1 and ORF2 sequences of genogroup I and II noroviruses. For validation, we used the tools to automatically type sequences in the RIVM and CDC enterovirus databases and the FBVE norovirus database. Using the typing-tools, 785 (99%) of 795 Enterovirus VP1 sequences, and 8154 (98.5%) of 8342 norovirus sequences were typed in accordance with previously used methods. Subtyping into variants was achieved for 4439 (78.4%) of 5838 NoV GII.4 sequences. The online typing-tools reliably assign genotypes for enteroviruses and noroviruses. The use of phylogenetic methods makes these tools robust to ongoing evolution. This should facilitate standardized genotyping and nomenclature in clinical and public health laboratories, thus supporting inter-laboratory comparisons. Copyright © 2011 Elsevier B.V. All rights reserved.
Comparative Genome Sequence Analysis of the Bpa/Str Region in Mouse and Man

PubMed Central

Mallon, A.-M.; Platzer, M.; Bate, R.; Gloeckner, G.; Botcherby, M.R.M.; Nordsiek, G.; Strivens, M.A.; Kioschis, P.; Dangel, A.; Cunningham, D.; Straw, R.N.A.; Weston, P.; Gilbert, M.; Fernando, S.; Goodall, K.; Hunter, G.; Greystrong, J.S.; Clarke, D.; Kimberley, C.; Goerdes, M.; Blechschmidt, K.; Rump, A.; Hinzmann, B.; Mundy, C.R.; Miller, W.; Poustka, A.; Herman, G.E.; Rhodes, M.; Denny, P.; Rosenthal, A.; Brown, S.D.M.

2000-01-01

The progress of human and mouse genome sequencing programs presages the possibility of systematic cross-species comparison of the two genomes as a powerful tool for gene and regulatory element identification. As the opportunities to perform comparative sequence analysis emerge, it is important to develop parameters for such analyses and to examine the outcomes of cross-species comparison. Our analysis used gene prediction and a database search of 430 kb of genomic sequence covering the Bpa/Str region of the mouse X chromosome, and 745 kb of genomic sequence from the homologous human X chromosome region. We identified 11 genes in mouse and 13 genes and two pseudogenes in human. In addition, we compared the mouse and human sequences using pairwise alignment and searches for evolutionary conserved regions (ECRs) exceeding a defined threshold of sequence identity. This approach aided the identification of at least four further putative conserved genes in the region. Comparative sequencing revealed that this region is a mosaic in evolutionary terms, with considerably more rearrangement between the two species than realized previously from comparative mapping studies. Surprisingly, this region showed an extremely high LINE and low SINE content, low G+C content, and yet a relatively high gene density, in contrast to the low gene density usually associated with such regions. [The sequence data described in this paper have been submitted to EMBL under the following accession nos.: Mouse Genomic Sequence: Mouse contig A (AL021127), Mouse contig B (AL049866), BAC41M10 (AL136328), PAC303O11(AL136329). Human Genomic Sequence: Human contig 1 (U82671, U82670), Human contig 2 (U82695).] PMID:10854409
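A search for evolutionary conserved regions above an identity threshold can be sketched as a sliding window over an existing pairwise alignment. The abstract does not give the window length or the threshold used, so both are hypothetical parameters here, and the function name and toy alignment are made up for illustration.

```python
def conserved_windows(aln_a, aln_b, window=100, min_identity=0.7):
    """Return (start, end) alignment coordinates of windows meeting the threshold.

    aln_a and aln_b are two rows of a pairwise alignment ('-' marks a gap).
    Identity is the fraction of window columns with identical, non-gap residues.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    hits = []
    for start in range(0, len(aln_a) - window + 1):
        end = start + window
        matches = sum(
            1 for x, y in zip(aln_a[start:end], aln_b[start:end])
            if x == y and x != "-"
        )
        if matches / window >= min_identity:
            hits.append((start, end))
    return hits


# Toy alignment rows (hypothetical), scanned with a very short window.
mouse_row = "ACGTACGTAC"
human_row = "ACGTTTTTAC"
print(conserved_windows(mouse_row, human_row, window=4, min_identity=1.0))
```

Overlapping hits would normally be merged into longer regions before being compared with gene predictions; that step is left out to keep the sketch short.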
BCM Search Launcher--an integrated interface to molecular biology data base search and analysis services available on the World Wide Web.

PubMed

Smith, R F; Wiese, B A; Wojzynski, M K; Davison, D B; Worley, K C

1996-05-01

The BCM Search Launcher is an integrated set of World Wide Web (WWW) pages that organize molecular biology-related search and analysis services available on the WWW by function, and provide a single point of entry for related searches. The Protein Sequence Search Page, for example, provides a single sequence entry form for submitting sequences to WWW servers that offer remote access to a variety of different protein sequence search tools, including BLAST, FASTA, Smith-Waterman, BEAUTY, PROSITE, and BLOCKS searches. Other Launch pages provide access to (1) nucleic acid sequence searches, (2) multiple and pair-wise sequence alignments, (3) gene feature searches, (4) protein secondary structure prediction, and (5) miscellaneous sequence utilities (e.g., six-frame translation). The BCM Search Launcher also provides a mechanism to extend the utility of other WWW services by adding supplementary hypertext links to results returned by remote servers. For example, links to the NCBI's Entrez data base and to the Sequence Retrieval System (SRS) are added to search results returned by the NCBI's WWW BLAST server. These links provide easy access to auxiliary information, such as Medline abstracts, that can be extremely helpful when analyzing BLAST data base hits. For new or infrequent users of sequence data base search tools, we have preset the default search parameters to provide the most informative first-pass sequence analysis possible. We have also developed a batch client interface for Unix and Macintosh computers that allows multiple input sequences to be searched automatically as a background task, with the results returned as individual HTML documents directly to the user's system. The BCM Search Launcher and batch client are available on the WWW at URL http://gc.bcm.tmc.edu:8088/search-launcher.html.
Sequencing the Unrearranged Human Immunoglobulin

DOE Office of Scientific and Technical Information (OSTI.GOV)

Warren, Rene

2010-06-03

Rene Warren from Canada's Michael Smith Genome Sciences Centre discusses sequencing and finishing the IgH heavy chain locus on June 3, 2010 at the "Sequencing, Finishing, Analysis in the Future" meeting in Santa Fe, NM.
The Application of Next-Generation Sequencing for Mutation Detection in Autosomal-Dominant Hereditary Hearing Impairment.

PubMed

Gürtler, Nicolas; Röthlisberger, Benno; Ludin, Katja; Schlegel, Christoph; Lalwani, Anil K

2017-07-01

Identification of the causative mutation using next-generation sequencing in autosomal-dominant hereditary hearing impairment, as mutation analysis in hereditary hearing impairment by classic genetic methods, is hindered by the high heterogeneity of the disease. Two Swiss families with autosomal-dominant hereditary hearing impairment. Amplified DNA libraries for next-generation sequencing were constructed from extracted genomic DNA, derived from peripheral blood, and enriched by a custom-made sequence capture library. Validated, pooled libraries were sequenced on an Illumina MiSeq instrument, 300 cycles and paired-end sequencing. Technical data analysis was performed with SeqMonk, variant analysis with GeneTalk or VariantStudio. The detection of mutations in genes related to hearing loss by next-generation sequencing was subsequently confirmed using specific polymerase-chain-reaction and Sanger sequencing. Mutation detection in hearing-loss-related genes. The first family harbored the mutation c.5383+5delGTGA in the TECTA-gene. In the second family, a novel mutation c.2614-2625delCATGGCGCCGTG in the WFS1-gene and a second mutation TCOF1-c.1028G>A were identified. Next-generation sequencing successfully identified the causative mutation in families with autosomal-dominant hereditary hearing impairment. The results helped to clarify the pathogenic role of a known mutation and led to the detection of a novel one. NGS represents a feasible approach with great potential future in the diagnostics of hereditary hearing impairment, even in smaller labs.
Molecular Analysis of Dehalococcoides 16S Ribosomal DNA from Chloroethene-Contaminated Sites throughout North America and Europe

PubMed Central

Hendrickson, Edwin R.; Payne, Jo Ann; Young, Roslyn M.; Starr, Mark G.; Perry, Michael P.; Fahnestock, Stephen; Ellis, David E.; Ebersole, Richard C.

2002-01-01

The environmental distribution of Dehalococcoides group organisms and their association with chloroethene-contaminated sites were examined. Samples from 24 chloroethene-dechlorinating sites scattered throughout North America and Europe were tested for the presence of members of the Dehalococcoides group by using a PCR assay developed to detect Dehalococcoides 16S rRNA gene (rDNA) sequences. Sequences identified by sequence analysis as sequences of members of the Dehalococcoides group were detected at 21 sites. Full dechlorination of chloroethenes to ethene occurred at these sites. Dehalococcoides sequences were not detected in samples from three sites at which partial dechlorination of chloroethenes occurred, where dechlorination appeared to stop at 1,2-cis-dichloroethene. Phylogenetic analysis of the 16S rDNA amplicons confirmed that Dehalococcoides sequences formed a unique 16S rDNA group. These 16S rDNA sequences were divided into three subgroups based on specific base substitution patterns in variable regions 2 and 6 of the Dehalococcoides 16S rDNA sequence. Analyses also demonstrated that specific base substitution patterns were signature patterns. The specific base substitutions distinguished the three sequence subgroups phylogenetically. These results demonstrated that members of the Dehalococcoides group are widely distributed in nature and can be found in a variety of geological formations and in different climatic zones. Furthermore, the association of these organisms with full dechlorination of chloroethenes suggests that they are promising candidates for engineered bioremediation and may be important contributors to natural attenuation of chloroethenes. PMID:11823182
Molecular characterization of a novel Luteovirus from peach identified by high-throughput sequencing

USDA-ARS's Scientific Manuscript database

Contigs with sequence homologies to Cherry-associated luteovirus were identified by high-throughput sequencing analysis of two peach accessions undergoing quarantine testing. The complete genomic sequences of the two isolates of this virus are 5,819 and 5,814 nucleotides. Their genome organization i...
Molecular Cloning and Sequencing of Hemoglobin-Beta Gene of Channel Catfish, Ictalurus Punctatus Rafinesque

USDA-ARS's Scientific Manuscript database

Hemoglobin-beta gene of channel catfish, Ictalurus punctatus, was cloned and sequenced. Total RNA from head kidneys was isolated, reverse transcribed and amplified. The sequence of the channel catfish hemoglobin-beta gene consists of 600 nucleotides. Analysis of the nucleotide sequence reveals one o...
Complete genome sequence of the plant pathogen Erwinia amylovora strain ATCC 49946

USDA-ARS's Scientific Manuscript database

Erwinia amylovora causes the economically important disease fire blight that affects rosaceous plants, especially pear and apple. Here we report the complete genome sequence and annotation of strain ATCC 49946. The analysis of the sequence and its comparison with sequenced genomes of closely related...
Sequence analysis reveals genomic factors affecting EST-SSR primer performance and polymorphism

USDA-ARS's Scientific Manuscript database

Search for simple sequence repeat (SSR) motifs and design of flanking primers in expressed sequence tag (EST) sequences can be easily done at a large scale using bioinformatics programs. However, failed amplification and/or detection, along with lack of polymorphism, is often seen among randomly sel...
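The kind of SSR motif search mentioned in this record can be illustrated with a minimal sketch. It is not the program used in the study (the abstract does not name one); the function name `find_ssrs` and the thresholds (unit length 2 to 6, at least 4 copies) are assumptions chosen only for illustration, and only perfect repeats are reported.

```python
import re

def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=4):
    """Return (start, unit, copies) for each perfect tandem repeat in seq.

    Hypothetical helper: thresholds are illustrative, not taken from the record.
    """
    seq = seq.upper()
    # Group 1 lazily captures the shortest candidate unit; \1{n,} then
    # requires at least (min_repeats - 1) further copies directly after it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # Skip single-base runs that merely look like a longer unit (e.g. "AA").
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits

example = "GGT" + "AC" * 6 + "TTG"
print(find_ssrs(example))
```

Primer design around each hit would then use the flanking sequence on either side of the reported start position and repeat length.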
Sequence analysis to assess labour market participation following vocational rehabilitation: an observational study among patients sick-listed with low back pain from a randomised clinical trial in Denmark

PubMed Central

Lindholdt, Louise; Labriola, Merete; Nielsen, Claus Vinther; Horsbøl, Trine Allerslev; Lund, Thomas

2017-01-01

Introduction The return-to-work (RTW) process after long-term sickness absence is often complex and long and implies multiple shifts between different labour market states for the absentee. Standard methods for examining RTW research typically rely on the analysis of one outcome measure at a time, which will not capture the many possible states and transitions the absentee can go through. The purpose of this study was to explore the potential added value of sequence analysis in supplement to standard regression analysis of a multidisciplinary RTW intervention among patients with low back pain (LBP). Methods The study population consisted of 160 patients randomly allocated to either a hospital-based brief or a multidisciplinary intervention. Data on labour market participation following intervention were obtained from a national register and analysed in two ways: as a binary outcome expressed as active or passive relief at a 1-year follow-up and as four different categories for labour market participation. Logistic regression and sequence analysis were performed. Results The logistic regression analysis showed no difference in labour market participation for patients in the two groups after 1 year. Applying sequence analysis showed differences in subsequent labour market participation after 2 years after baseline in favour of the brief intervention group versus the multidisciplinary intervention group. Conclusion The study indicated that sequence analysis could provide added analytical value as a supplement to traditional regression analysis in prospective studies of RTW among patients with LBP. PMID:28729315
TRAP: automated classification, quantification and annotation of tandemly repeated sequences.

PubMed

Sobreira, Tiago José P; Durham, Alan M; Gruber, Arthur

2006-02-01

TRAP, the Tandem Repeats Analysis Program, is a Perl program that provides a unified set of analyses for the selection, classification, quantification and automated annotation of tandemly repeated sequences. TRAP uses the results of the Tandem Repeats Finder program to perform a global analysis of the satellite content of DNA sequences, permitting researchers to easily assess the tandem repeat content for both individual sequences and whole genomes. The results can be generated in convenient formats such as HTML and comma-separated values. TRAP can also be used to automatically generate annotation data in the format of feature table and GFF files.
EUGENE'HOM: A generic similarity-based gene finder using multiple homologous sequences.

PubMed

Foissac, Sylvain; Bardou, Philippe; Moisan, Annick; Cros, Marie-Josée; Schiex, Thomas

2003-07-01

EUGENE'HOM is a gene prediction software for eukaryotic organisms based on comparative analysis. EUGENE'HOM is able to take into account multiple homologous sequences from more or less closely related organisms. It integrates the results of TBLASTX analysis, splice site and start codon prediction and a robust coding/non-coding probabilistic model which allows EUGENE'HOM to handle sequences from a variety of organisms. The current target of EUGENE'HOM is plant sequences. The EUGENE'HOM web site is available at http://genopole.toulouse.inra.fr/bioinfo/eugene/EuGeneHom/cgi-bin/EuGeneHom.pl.
Instances of erroneous DNA barcoding of metazoan invertebrates: Are universal cox1 gene primers too "universal"?

PubMed

Mioduchowska, Monika; Czyż, Michał Jan; Gołdyn, Bartłomiej; Kur, Jarosław; Sell, Jerzy

2018-01-01

The cytochrome c oxidase subunit I (cox1) gene is the main mitochondrial molecular marker playing a pivotal role in phylogenetic research and is a crucial barcode sequence. Folmer's "universal" primers designed to amplify this gene in metazoan invertebrates allowed quick and easy barcode and phylogenetic analysis. On the other hand, the increase in the number of studies on barcoding leads to more frequent publishing of incorrect sequences, due to amplification of non-target taxa, and insufficient analysis of the obtained sequences. Consequently, some sequences deposited in genetic databases are incorrectly described as obtained from invertebrates, while being in fact bacterial sequences. In our study, in which we used Folmer's primers to amplify COI sequences of the crustacean fairy shrimp Branchipus schaefferi (Fischer 1834), we also obtained COI sequences of microbial contaminants from Aeromonas sp. However, when we searched the GenBank database for sequences closely matching these contaminations we found entries described as representatives of Gastrotricha and Mollusca. When these entries were compared with other sequences bearing the same names in the database, the genetic distance between the incorrect and correct sequences amplified from the same species was ca. 65%. Although the responsibility for the correct molecular identification of species rests on researchers, the errors found in already published sequence data have not been re-evaluated so far. On the basis of the standard sampling technique we have estimated with 95% probability that the chances of finding incorrectly described metazoan sequences in the GenBank depend on the systematic group, and vary from less than 1% (Mollusca and Arthropoda) up to 6.9% (Gastrotricha). Consequently, the increasing popularity of DNA barcoding and metabarcoding analysis may lead to overestimation of species diversity. Finally, the study also discusses the sources of the problems with amplification of non-target sequences.
HLA genotyping by next-generation sequencing of complementary DNA.

PubMed

Segawa, Hidenobu; Kukita, Yoji; Kato, Kikuya

2017-11-28

Genotyping of the human leucocyte antigen (HLA) is indispensable for various medical treatments. However, unambiguous genotyping is technically challenging due to high polymorphism of the corresponding genomic region. Next-generation sequencing is changing the landscape of genotyping. In addition to high throughput of data, its additional advantage is that DNA templates are derived from single molecules, which is a strong merit for the phasing problem. Although most currently developed technologies use genomic DNA, use of cDNA could enable genotyping with reduced costs in data production and analysis. We thus developed an HLA genotyping system based on next-generation sequencing of cDNA. Each HLA gene was divided into 3 or 4 target regions subjected to PCR amplification and subsequent sequencing with Ion Torrent PGM. The sequence data were then subjected to an automated analysis. The principle of the analysis was to construct candidate sequences generated from all possible combinations of variable bases and arrange them in decreasing order of the number of reads. Upon collecting candidate sequences from all target regions, 2 haplotypes were usually assigned. Cases not assigned 2 haplotypes were forwarded to 4 additional processes: selection of candidate sequences applying more stringent criteria, removal of artificial haplotypes, selection of candidate sequences with a relaxed threshold for sequence matching, and countermeasure for incomplete sequences in the HLA database. The genotyping system was evaluated using 30 samples; the overall accuracy was 97.0% at the field 3 level and 98.3% at the G group level. With one sample, genotyping of DPB1 was not completed due to short read size. We then developed a method for complete sequencing of individual molecules of the DPB1 gene, using the molecular barcode technology. The performance of the automatic genotyping system was comparable to that of systems developed in previous studies. Thus, next-generation sequencing of cDNA is a viable option for HLA genotyping.
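The stated principle of the analysis (build candidates from all combinations of variable bases, then order them by read count) can be sketched roughly as follows. This is a toy illustration and not the authors' pipeline: the function name `rank_candidates`, the assumption that reads are already aligned to equal length, and the idea of passing the variable positions in as a list are all hypothetical.

```python
from collections import Counter
from itertools import product

def rank_candidates(reads, variable_positions):
    """Rank every combination of observed bases by the number of supporting reads.

    Assumes (for illustration only) equal-length, pre-aligned reads.
    """
    # Bases seen at each variable position, e.g. [['C', 'T'], ['A', 'T']].
    observed = [sorted({read[p] for read in reads}) for p in variable_positions]
    # How many reads carry each actual combination of bases.
    support = Counter(tuple(read[p] for p in variable_positions) for read in reads)
    # Every possible combination becomes a candidate, including unseen ones (count 0).
    candidates = [(combo, support[combo]) for combo in product(*observed)]
    # Decreasing order of read count; ties keep enumeration order.
    return sorted(candidates, key=lambda item: -item[1])

reads = ["ACGT", "ACGT", "ATGA", "ATGA", "ATGA", "ACGA"]
for combo, count in rank_candidates(reads, [1, 3]):
    print("".join(combo), count)
```

In this toy input the two best-supported combinations would be the natural candidates for the two haplotypes, while low-count combinations resemble the artificial haplotypes the abstract says are filtered in later steps.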
ANCAC: amino acid, nucleotide, and codon analysis of COGs--a tool for sequence bias analysis in microbial orthologs.

PubMed

Meiler, Arno; Klinger, Claudia; Kaufmann, Michael

2012-09-08

The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC's NUCOCOG dataset as the largest one available for that purpose thus far. Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills.
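The quantities ANCAC reports (codon frequencies and GC-content) are simple to compute for a single coding sequence. The sketch below shows only that arithmetic; it does not reproduce ANCAC, its MySQL schema, or the NUCOCOG dataset, and the function names `gc_content` and `codon_frequencies` are assumptions made for illustration.

```python
from collections import Counter

def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def codon_frequencies(cds):
    """Relative frequency of each codon in an in-frame coding sequence.

    Trailing bases that do not fill a complete codon are ignored.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    return {codon: n / total for codon, n in counts.items()} if total else {}

cds = "ATGGCCGCCTAA"
print(gc_content(cds))
print(codon_frequencies(cds))
```

Aggregating these per-gene values over all members of a COG, or over all genes of a species, gives the species- or function-specific view the abstract describes.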
ANCAC: amino acid, nucleotide, and codon analysis of COGs – a tool for sequence bias analysis in microbial orthologs

PubMed Central

2012-01-01

Background The COG database is the most popular collection of orthologous proteins from many different completely sequenced microbial genomes. Per definition, a cluster of orthologous groups (COG) within this database exclusively contains proteins that most likely achieve the same cellular function. Recently, the COG database was extended by assigning to every protein both the corresponding amino acid and its encoding nucleotide sequence resulting in the NUCOCOG database. This extended version of the COG database is a valuable resource connecting sequence features with the functionality of the respective proteins. Results Here we present ANCAC, a web tool and MySQL database for the analysis of amino acid, nucleotide, and codon frequencies in COGs on the basis of freely definable phylogenetic patterns. We demonstrate the usefulness of ANCAC by analyzing amino acid frequencies, codon usage, and GC-content in a species- or function-specific context. With respect to amino acids we, at least in part, confirm the cognate bias hypothesis by using ANCAC’s NUCOCOG dataset as the largest one available for that purpose thus far. Conclusions Using the NUCOCOG datasets, ANCAC connects taxonomic, amino acid, and nucleotide sequence information with the functional classification via COGs and provides a GUI for flexible mining for sequence-bias. Thereby, to our knowledge, it is the only tool for the analysis of sequence composition in the light of physiological roles and phylogenetic context without requirement of substantial programming-skills. PMID:22958836
Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome

PubMed Central

Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

2014-01-01

Usher syndrome is an autosomal recessive disorder characterized both by deafness and blindness. For the three clinical subtypes of Usher syndrome causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is predestined for a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach compared to Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield. PMID:25333064
Screening for single nucleotide variants, small indels and exon deletions with a next-generation sequencing based gene panel approach for Usher syndrome.

PubMed

Krawitz, Peter M; Schiska, Daniela; Krüger, Ulrike; Appelt, Sandra; Heinrich, Verena; Parkhomchuk, Dmitri; Timmermann, Bernd; Millan, Jose M; Robinson, Peter N; Mundlos, Stefan; Hecht, Jochen; Gross, Manfred

2014-09-01

Usher syndrome is an autosomal recessive disorder characterized both by deafness and blindness. For the three clinical subtypes of Usher syndrome causal mutations in altogether 12 genes and a modifier gene have been identified. Due to the genetic heterogeneity of Usher syndrome, the molecular analysis is predestined for a comprehensive and parallelized analysis of all known genes by next-generation sequencing (NGS) approaches. We describe here the targeted enrichment and deep sequencing for exons of Usher genes and compare the costs and workload of this approach compared to Sanger sequencing. We also present a bioinformatics analysis pipeline that allows us to detect single-nucleotide variants, short insertions and deletions, as well as copy number variations of one or more exons on the same sequence data. Additionally, we present a flexible in silico gene panel for the analysis of sequence variants, in which newly identified genes can easily be included. We applied this approach to a cohort of 44 Usher patients and detected biallelic pathogenic mutations in 35 individuals and monoallelic mutations in eight individuals of our cohort. Thirty-nine of the sequence variants, including two heterozygous deletions comprising several exons of USH2A, have not been reported so far. Our NGS-based approach allowed us to assess single-nucleotide variants, small indels, and whole exon deletions in a single test. The described diagnostic approach is fast and cost-effective with a high molecular diagnostic yield.
SPAR: small RNA-seq portal for analysis of sequencing experiments.

PubMed

Kuksa, Pavel P; Amlie-Wolf, Alexandre; Katanic, Živadin; Valladares, Otto; Wang, Li-San; Leung, Yuk Yee

2018-05-04

The introduction of new high-throughput small RNA sequencing protocols that generate large-scale genomics datasets along with increasing evidence of the significant regulatory roles of small non-coding RNAs (sncRNAs) have highlighted the urgent need for tools to analyze and interpret large amounts of small RNA sequencing data. However, it remains challenging to systematically and comprehensively discover and characterize sncRNA genes and specifically-processed sncRNA products from these datasets. To fill this gap, we present Small RNA-seq Portal for Analysis of sequencing expeRiments (SPAR), a user-friendly web server for interactive processing, analysis, annotation and visualization of small RNA sequencing data. SPAR supports sequencing data generated from various experimental protocols, including smRNA-seq, short total RNA sequencing, microRNA-seq, and single-cell small RNA-seq. Additionally, SPAR includes publicly available reference sncRNA datasets from our DASHR database and from ENCODE across 185 human tissues and cell types to produce highly informative small RNA annotations across all major small RNA types and other features such as co-localization with various genomic features, precursor transcript cleavage patterns, and conservation. SPAR allows the user to compare the input experiment against reference ENCODE/DASHR datasets. SPAR currently supports analyses of human (hg19, hg38) and mouse (mm10) sequencing data. SPAR is freely available at https://www.lisanwanglab.org/SPAR.
SACCHARIS: an automated pipeline to streamline discovery of carbohydrate active enzyme activities within polyspecific families and de novo sequence datasets.

PubMed

Jones, Darryl R; Thomas, Dallas; Alger, Nicholas; Ghavidel, Ata; Inglis, G Douglas; Abbott, D Wade

2018-01-01

Deposition of new genetic sequences in online databases is expanding at an unprecedented rate. As a result, sequence identification continues to outpace functional characterization of carbohydrate active enzymes (CAZymes). In this paradigm, the discovery of enzymes with novel functions is often hindered by high volumes of uncharacterized sequences particularly when the enzyme sequence belongs to a family that exhibits diverse functional specificities (i.e., polyspecificity). Therefore, to direct sequence-based discovery and characterization of new enzyme activities we have developed an automated in silico pipeline entitled: Sequence Analysis and Clustering of CarboHydrate Active enzymes for Rapid Informed prediction of Specificity (SACCHARIS). This pipeline streamlines the selection of uncharacterized sequences for discovery of new CAZyme or CBM specificity from families currently maintained on the CAZy website or within user-defined datasets. SACCHARIS was used to generate a phylogenetic tree of GH43, a CAZyme family with defined subfamily designations. This analysis confirmed that large datasets can be organized into sequence clusters of manageable sizes that possess related functions. Seeding this tree with a GH43 sequence from Bacteroides dorei DSM 17855 (BdGH43b) revealed it partitioned as a single sequence within the tree. This pattern was consistent with it possessing a unique enzyme activity for GH43 as BdGH43b is the first α-glucanase described for this family. The capacity of SACCHARIS to extract and cluster characterized carbohydrate binding module sequences was demonstrated using family 6 CBMs (i.e., CBM6s). This CBM family displays a polyspecific ligand binding profile and contains many structurally determined members. Using SACCHARIS to identify a cluster of divergent sequences, a CBM6 sequence from a unique clade was demonstrated to bind yeast mannan, which represents the first description of an α-mannan binding CBM. Additionally, we have performed a CAZome analysis of an in-house sequenced bacterial genome and a comparative analysis of B. thetaiotaomicron VPI-5482 and B. thetaiotaomicron 7330, to demonstrate that SACCHARIS can generate "CAZome fingerprints", which differentiate between the saccharolytic potential of two related strains in silico. Establishing sequence-function and sequence-structure relationships in polyspecific CAZyme families are promising approaches for streamlining enzyme discovery. SACCHARIS facilitates this process by embedding CAZyme and CBM family trees generated from biochemically to structurally characterized sequences, with protein sequences that have unknown functions. In addition, these trees can be integrated with user-defined datasets (e.g., genomics, metagenomics, and transcriptomics) to inform experimental characterization of new CAZymes or CBMs not currently curated, and for researchers to compare differential sequence patterns between entire CAZomes. In this light, SACCHARIS provides an in silico tool that can be tailored for enzyme bioprospecting in datasets of increasing complexity and for diverse applications in glycobiotechnology.
Sequence analysis by iterated maps, a review.

PubMed

Almeida, Jonas S

2014-05-01

Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
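For readers unfamiliar with the Chaos Game Representation named in this review, the iterated map itself is short enough to sketch. The corner assignment below (A, C, G, T on the unit square) is one common convention and is an assumption here, as are the function name `cgr_points` and the choice to skip ambiguous bases; the review abstract does not specify any of these.

```python
# Assumed corner layout of the unit square; other assignments are in use.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq, start=(0.5, 0.5)):
    """Return the CGR coordinate reached after each base of seq.

    Each step moves halfway from the current point toward the corner of the
    current base, so a point's position encodes the preceding subsequence.
    """
    x, y = start
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            # Ambiguous symbols (e.g. N) are skipped in this sketch.
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points

print(cgr_points("AG"))
```

Because each step halves the distance to a corner, sequences sharing a suffix of length k land in the same sub-square of side 2^-k, which is the property behind the scale-free behaviour the abstract refers to.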
Mutation Analysis in Classical Phenylketonuria Patients Followed by Detecting Haplotypes Linked to Some PAH Mutations.

PubMed

Dehghanian, Fatemeh; Silawi, Mohammad; Tabei, Seyed M B

2017-02-01

Deficiency of phenylalanine hydroxylase (PAH) enzyme and elevation of phenylalanine in body fluids cause phenylketonuria (PKU). The gold standard for confirming PKU and PAH deficiency is detecting causal mutations by direct sequencing of the coding exons and splicing involved sequences of the PAH gene. Furthermore, haplotype analysis could be considered as an auxiliary approach for detecting PKU causative mutations before direct sequencing of the PAH gene by making comparisons between prior detected mutation linked-haplotypes and new PKU case haplotypes with undetermined mutations. In this study, 13 unrelated classical PKU patients took part in the study detecting causative mutations. Mutations were identified by polymerase chain reaction (PCR) and direct sequencing in all patients. After that, haplotype analysis was performed by studying VNTR and PAHSTR markers (linked genetic markers of the PAH gene) through application of PCR and capillary electrophoresis (CE). Mutation analysis was performed successfully and the detected mutations were as follows: c.782G>A, c.754C>T, c.842C>G, c.113-115delTCT, c.688G>A, and c.696A>G. Additionally, PAHSTR/VNTR haplotypes were detected to discover haplotypes linked to each mutation. Mutation detection is the best approach for confirming PAH enzyme deficiency in PKU patients. Due to the relatively large size of the PAH gene and high cost of the direct sequencing in developing countries, haplotype analysis could be used before DNA sequencing and mutation detection for a faster and cheaper way via identifying probable mutated exons.
Using information content and base frequencies to distinguish mutations from genetic polymorphisms in splice junction recognition sites.

PubMed

Rogan, P K; Schneider, T D

1995-01-01

Predicting the effects of nucleotide substitutions in human splice sites has been based on analysis of consensus sequences. We used a graphic representation of sequence conservation and base frequency, the sequence logo, to demonstrate that a change in a splice acceptor of hMSH2 (a gene associated with familial nonpolyposis colon cancer) probably does not reduce splicing efficiency. This confirms a population genetic study that suggested that this substitution is a genetic polymorphism. The information theory-based sequence logo is quantitative and more sensitive than the corresponding splice acceptor consensus sequence for detection of true mutations. Information analysis may potentially be used to distinguish polymorphisms from mutations in other types of transcriptional, translational, or protein-coding motifs.
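The per-position information that a sequence logo displays can be computed from aligned sites as 2 bits minus the Shannon entropy of the base frequencies at that position. The sketch below shows only this basic calculation; it omits the small-sample correction used in the information-theory literature, and the function name `information_content` and the toy sites are assumptions for illustration, not data from the study.

```python
import math
from collections import Counter

def information_content(aligned_sites):
    """Information (bits) at each position of equal-length aligned DNA sites.

    Uses R_i = 2 - H_i with H_i the Shannon entropy of column i;
    no small-sample correction is applied in this sketch.
    """
    length = len(aligned_sites[0])
    bits = []
    for i in range(length):
        column = Counter(site[i].upper() for site in aligned_sites)
        n = sum(column.values())
        entropy = -sum((c / n) * math.log2(c / n) for c in column.values())
        bits.append(2.0 - entropy)
    return bits

sites = ["AG", "AT", "AC", "AA"]
print(information_content(sites))
```

A fully conserved position scores 2 bits and a position with all four bases equally frequent scores 0 bits, which is what makes the measure quantitative where a consensus sequence only records the most common base.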
RNA-Seq for Bacterial Gene Expression.

PubMed

Poulsen, Line Dahl; Vinther, Jeppe

2018-06-01

RNA sequencing (RNA-seq) has become the preferred method for global quantification of bacterial gene expression. With the continued improvements in sequencing technology and data analysis tools, the most labor-intensive and expensive part of an RNA-seq experiment is the preparation of sequencing libraries, which is also essential for the quality of the data obtained. Here, we present a straightforward and inexpensive basic protocol for preparation of strand-specific RNA-seq libraries from bacterial RNA as well as a computational pipeline for the data analysis of sequencing reads. The protocol is based on the Illumina platform and allows easy multiplexing of samples and the removal of sequencing reads that are PCR duplicates. © 2018 by John Wiley & Sons, Inc.
Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing

PubMed Central

2012-01-01

Background RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Results Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. Conclusions This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates. PMID:22985019
Efficient experimental design and analysis strategies for the detection of differential expression using RNA-Sequencing.

PubMed

Robles, José A; Qureshi, Sumaira E; Stephen, Stuart J; Wilson, Susan R; Burden, Conrad J; Taylor, Jennifer M

2012-09-17

RNA sequencing (RNA-Seq) has emerged as a powerful approach for the detection of differential gene expression with both high-throughput and high resolution capabilities possible depending upon the experimental design chosen. Multiplex experimental designs are now readily available, these can be utilised to increase the numbers of samples or replicates profiled at the cost of decreased sequencing depth generated per sample. These strategies impact on the power of the approach to accurately identify differential expression. This study presents a detailed analysis of the power to detect differential expression in a range of scenarios including simulated null and differential expression distributions with varying numbers of biological or technical replicates, sequencing depths and analysis methods. Differential and non-differential expression datasets were simulated using a combination of negative binomial and exponential distributions derived from real RNA-Seq data. These datasets were used to evaluate the performance of three commonly used differential expression analysis algorithms and to quantify the changes in power with respect to true and false positive rates when simulating variations in sequencing depth, biological replication and multiplex experimental design choices. This work quantitatively explores comparisons between contemporary analysis tools and experimental design choices for the detection of differential expression using RNA-Seq. We found that the DESeq algorithm performs more conservatively than edgeR and NBPSeq. With regard to testing of various experimental designs, this work strongly suggests that greater power is gained through the use of biological replicates relative to library (technical) replicates and sequencing depth. Strikingly, sequencing depth could be reduced as low as 15% without substantial impacts on false positive or true positive rates.
Analysis of whole genome sequences of 16 strains of rubella virus from the United States, 1961-2009.

PubMed

Abernathy, Emily; Chen, Min-hsin; Bera, Jayati; Shrivastava, Susmita; Kirkness, Ewen; Zheng, Qi; Bellini, William; Icenogle, Joseph

2013-01-25

Rubella virus is the causative agent of rubella, a mild rash illness, and a potent teratogenic agent when contracted by a pregnant woman. Global rubella control programs target the reduction and elimination of congenital rubella syndrome. Phylogenetic analysis of partial sequences of rubella viruses has contributed to virus surveillance efforts and played an important role in demonstrating that indigenous rubella viruses have been eliminated in the United States. Sixteen wild-type rubella viruses were chosen for whole genome sequencing. All 16 viruses were collected in the United States from 1961 to 2009 and are from 8 of the 13 known rubella genotypes. Phylogenetic analysis of 30 whole genome sequences produced a maximum likelihood tree giving high bootstrap values for all genotypes except provisional genotype 1a. Comparison of the 16 new complete sequences and 14 previously sequenced wild-type viruses found regions with clusters of variable amino acids. The 5' 250 nucleotides of the genome are more conserved than any other part of the genome. Genotype specific deletions in the untranslated region between the non-structural and structural open reading frames were observed for genotypes 2B and genotype 1G. No evidence was seen for recombination events among the 30 viruses. The analysis presented here is consistent with previous reports on the genetic characterization of rubella virus genomes. Conserved and variable regions were identified and additional evidence for genotype specific nucleotide deletions in the intergenic region was found. Phylogenetic analysis confirmed genotype groupings originally based on structural protein coding region sequences, which provides support for the WHO nomenclature for genetic characterization of wild-type rubella viruses.
Plasmid Flux in Escherichia coli ST131 Sublineages, Analyzed by Plasmid Constellation Network (PLACNET), a New Method for Plasmid Reconstruction from Whole Genome Sequences

PubMed Central

Garcillán-Barcia, M. Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M.; de la Cruz, Fernando

2014-01-01

Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ-proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages. PMID:25522143
Plasmid flux in Escherichia coli ST131 sublineages, analyzed by plasmid constellation network (PLACNET), a new method for plasmid reconstruction from whole genome sequences.

PubMed

Lanza, Val F; de Toro, María; Garcillán-Barcia, M Pilar; Mora, Azucena; Blanco, Jorge; Coque, Teresa M; de la Cruz, Fernando

2014-12-01

Bacterial whole genome sequence (WGS) methods are rapidly overtaking classical sequence analysis. Many bacterial sequencing projects focus on mobilome changes, since macroevolutionary events, such as the acquisition or loss of mobile genetic elements, mainly plasmids, play essential roles in adaptive evolution. Existing WGS analysis protocols do not assort contigs between plasmids and the main chromosome, thus hampering full analysis of plasmid sequences. We developed a method (called plasmid constellation networks or PLACNET) that identifies, visualizes and analyzes plasmids in WGS projects by creating a network of contig interactions, thus allowing comprehensive plasmid analysis within WGS datasets. The workflow of the method is based on three types of data: assembly information (including scaffold links and coverage), comparison to reference sequences and plasmid-diagnostic sequence features. The resulting network is pruned by expert analysis, to eliminate confounding data, and implemented in a Cytoscape-based graphic representation. To demonstrate PLACNET sensitivity and efficacy, the plasmidome of the Escherichia coli lineage ST131 was analyzed. ST131 is a globally spread clonal group of extraintestinal pathogenic E. coli (ExPEC), comprising different sublineages with ability to acquire and spread antibiotic resistance and virulence genes via plasmids. Results show that plasmids flux in the evolution of this lineage, which is wide open for plasmid exchange. MOBF12/IncF plasmids were pervasive, adding just by themselves more than 350 protein families to the ST131 pangenome. Nearly 50% of the most frequent γ-proteobacterial plasmid groups were found to be present in our limited sample of ten analyzed ST131 genomes, which represent the main ST131 sublineages.
ReadXplorer—visualization and analysis of mapped sequences

PubMed Central

Hilker, Rolf; Stadermann, Kai Bernd; Doppmeier, Daniel; Kalinowski, Jörn; Stoye, Jens; Straube, Jasmin; Winnebald, Jörn; Goesmann, Alexander

2014-01-01

Motivation: Fast algorithms and well-arranged visualizations are required for the comprehensive analysis of the ever-growing size of genomic and transcriptomic next-generation sequencing data. Results: ReadXplorer is a software offering straightforward visualization and extensive analysis functions for genomic and transcriptomic DNA sequences mapped on a reference. A unique specialty of ReadXplorer is the quality classification of the read mappings. It is incorporated in all analysis functions and displayed in ReadXplorer's various synchronized data viewers for (i) the reference sequence, its base coverage as (ii) normalizable plot and (iii) histogram, (iv) read alignments and (v) read pairs. ReadXplorer's analysis capability covers RNA secondary structure prediction, single nucleotide polymorphism and deletion–insertion polymorphism detection, genomic feature and general coverage analysis. Especially for RNA-Seq data, it offers differential gene expression analysis, transcription start site and operon detection as well as RPKM value and read count calculations. Furthermore, ReadXplorer can combine or superimpose coverage of different datasets. Availability and implementation: ReadXplorer is available as open-source software at http://www.readxplorer.org along with a detailed manual. Contact: rhilker@mikrobio.med.uni-giessen.de Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24790157
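
The RPKM value mentioned in this abstract is conventionally defined as reads per kilobase of feature per million mapped reads. The following is a minimal sketch of that standard formula only; the function name and the example numbers are hypothetical, and it is not taken from ReadXplorer's own code.

```python
def rpkm(read_count: int, feature_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase of feature per million mapped reads."""
    if feature_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("feature length and total mapped reads must be positive")
    # 1e9 = 1e3 (bases per kilobase) * 1e6 (reads per million)
    return read_count * 1e9 / (feature_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2,000 bp feature, 10 million mapped reads in total.
example_rpkm = rpkm(read_count=500, feature_length_bp=2_000, total_mapped_reads=10_000_000)
print(f"RPKM = {example_rpkm:.1f}")
```

Normalizing by both feature length and library size is what makes values comparable between long and short genes and between datasets of different sequencing depth.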
In silico Analysis of 2085 Clones from a Normalized Rat Vestibular Periphery 3′ cDNA Library

PubMed Central

Roche, Joseph P.; Cioffi, Joseph A.; Kwitek, Anne E.; Erbe, Christy B.; Popper, Paul

2005-01-01

The inserts from 2400 cDNA clones isolated from a normalized Rattus norvegicus vestibular periphery cDNA library were sequenced and characterized. The Wackym-Soares vestibular 3′ cDNA library was constructed from the saccular and utricular maculae, the ampullae of all three semicircular canals and Scarpa's ganglia containing the somata of the primary afferent neurons, microdissected from 104 male and female rats. The inserts from 2400 randomly selected clones were sequenced from the 5′ end. Each sequence was analyzed using the BLAST algorithm compared to the GenBank nonredundant, rat genome, mouse genome and human genome databases to search for high homology alignments. Of the initial 2400 clones, 315 (13%) were found to be of poor quality and did not yield useful information, and therefore were eliminated from the analysis. Of the remaining 2085 sequences, 918 (44%) were found to represent 758 unique genes having useful annotations that were identified in databases within the public domain or in the published literature; these sequences were designated as known characterized sequences. 1141 sequences (55%) aligned with 1011 unique sequences that had no useful annotations and were designated as known but uncharacterized sequences. Of the remaining 26 sequences (1%), 24 aligned with rat genomic sequences, but none matched previously described rat expressed sequence tags or mRNAs. No significant alignment to the rat or human genomic sequences could be found for the remaining 2 sequences. Of the 2085 sequences analyzed, 86% were singletons. The known, characterized sequences were analyzed with the FatiGO online data-mining tool (http://fatigo.bioinfo.cnio.es/) to identify level 5 biological process gene ontology (GO) terms for each alignment and to group alignments with similar or identical GO terms. Numerous genes were identified that have not been previously shown to be expressed in the vestibular system. Further characterization of the novel cDNA sequences may lead to the identification of genes with vestibular-specific functions. Continued analysis of the rat vestibular periphery transcriptome should provide new insights into vestibular function and generate new hypotheses. Physiological studies are necessary to further elucidate the roles of the identified genes and novel sequences in vestibular function. PMID:16103642
Chromosome arm-specific BAC end sequences permit comparative analysis of homoeologous chromosomes and genomes of polyploid wheat

PubMed Central

2012-01-01

Background: Bread wheat, one of the world’s staple food crops, has the largest, highly repetitive and polyploid genome among the cereal crops. The wheat genome holds the key to crop genetic improvement against challenges such as climate change, environmental degradation, and water scarcity. To unravel the complex wheat genome, the International Wheat Genome Sequencing Consortium (IWGSC) is pursuing a chromosome- and chromosome arm-based approach to physical mapping and sequencing. Here we report on the use of a BAC library made from flow-sorted telosomic chromosome 3A short arm (t3AS) for marker development and analysis of sequence composition and comparative evolution of homoeologous genomes of hexaploid wheat. Results: The end-sequencing of 9,984 random BACs from a chromosome arm 3AS-specific library (TaaCsp3AShA) generated 11,014,359 bp of high quality sequence from 17,591 BAC-ends with an average length of 626 bp. The sequence represents 3.2% of t3AS with an average DNA sequence read every 19 kb. Overall, 79% of the sequence consisted of repetitive elements, 1.38% as coding regions (estimated 2,850 genes) and another 19% of unknown origin. Comparative sequence analysis suggested that 70-77% of the genes present in both 3A and 3B were syntenic with model species. Among the transposable elements, gypsy/sabrina (12.4%) was the most abundant repeat and was significantly more frequent in 3A compared to homoeologous chromosome 3B. Twenty novel repetitive sequences were also identified using de novo repeat identification. BESs were screened to identify simple sequence repeats (SSR) and transposable element junctions. A total of 1,057 SSRs were identified with a density of one per 10.4 kb, and 7,928 junctions between transposable elements (TE) and other sequences were identified with a density of one per 1.39 kb. With the objective of enhancing the marker density of chromosome 3AS, oligonucleotide primers were successfully designed from 758 SSRs and 695 Insertion Site Based Polymorphisms (ISBPs). Of the 96 ISBP primer pairs tested, 28 (29%) were 3A-specific, compared to 17 (18%) of the 96 SSR primer pairs. Conclusion: This work reports on the use of a wheat chromosome arm 3AS-specific BAC library for the targeted generation of sequence data from a particular region of the huge genome of wheat. A large number of sequences were generated from the A genome of hexaploid wheat for comparative genome analysis with homoeologous B and D genomes and other model grass genomes. Hundreds of molecular markers were developed from the 3AS arm-specific sequences; these and other sequences will be useful in gene discovery and physical mapping. PMID:22559868
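
The read-length, marker-density and specificity figures quoted in this abstract can be reproduced with simple arithmetic. The sketch below only re-derives them from the counts stated in the abstract; the variable names are illustrative and no additional data are assumed.

```python
# Counts taken from the abstract.
total_bp = 11_014_359      # high quality BAC-end sequence
bac_ends = 17_591          # number of BAC-end reads
ssr_count = 1_057          # simple sequence repeats found
junction_count = 7_928     # TE junctions found

mean_read_bp = total_bp / bac_ends                     # average read length in bp
ssr_spacing_kb = total_bp / ssr_count / 1000           # one SSR per this many kb
junction_spacing_kb = total_bp / junction_count / 1000  # one junction per this many kb

# Share of tested primer pairs that were 3A-specific (96 pairs tested per marker type).
isbp_specific_pct = 100 * 28 / 96
ssr_specific_pct = 100 * 17 / 96

print(f"mean read length: {mean_read_bp:.0f} bp")
print(f"SSR density: one per {ssr_spacing_kb:.1f} kb")
print(f"junction density: one per {junction_spacing_kb:.2f} kb")
print(f"3A-specific: ISBP {isbp_specific_pct:.0f}%, SSR {ssr_specific_pct:.0f}%")
```

The derived values agree with the rounded figures reported in the abstract (626 bp, 10.4 kb, 1.39 kb, 29% and 18%).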
HTSstation: a web application and open-access libraries for high-throughput sequencing data analysis.

PubMed

David, Fabrice P A; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

2014-01-01

The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch.

HTSstation: A Web Application and Open-Access Libraries for High-Throughput Sequencing Data Analysis

PubMed Central

David, Fabrice P. A.; Delafontaine, Julien; Carat, Solenne; Ross, Frederick J.; Lefebvre, Gregory; Jarosz, Yohan; Sinclair, Lucas; Noordermeer, Daan; Rougemont, Jacques; Leleu, Marion

2014-01-01

The HTSstation analysis portal is a suite of simple web forms coupled to modular analysis pipelines for various applications of High-Throughput Sequencing including ChIP-seq, RNA-seq, 4C-seq and re-sequencing. HTSstation offers biologists the possibility to rapidly investigate their HTS data using an intuitive web application with heuristically pre-defined parameters. A number of open-source software components have been implemented and can be used to build, configure and run HTS analysis pipelines reactively. Besides, our programming framework empowers developers with the possibility to design their own workflows and integrate additional third-party software. The HTSstation web application is accessible at http://htsstation.epfl.ch. PMID:24475057
Analysis of 16S-23S intergenic spacer regions of the rRNA operons in Edwardsiella ictaluri and Edwardsiella tarda isolates from fish.

PubMed

Panangala, V S; van Santen, V L; Shoemaker, C A; Klesius, P H

2005-01-01

To analyse interspecies and intraspecies differences based on the 16S-23S rRNA intergenic spacer region (ISR) sequences of the fish pathogens Edwardsiella ictaluri and Edwardsiella tarda. The 16S-23S rRNA spacer regions of 19 Edw. ictaluri and four Edw. tarda isolates from four geographical regions were amplified by PCR with primers complementary to conserved sequences within the flanking 16S-23S rRNA coding sequences. Two products were generated from all isolates, without interspecies or intraspecific size polymorphisms. Sequence analysis of the amplified fragments revealed a smaller ISR of 350 bp, which contained a gene for tRNA(Glu), and a larger ISR of 441 bp, which contained genes for tRNA(Ile) and tRNA(Ala). The sequences of the smaller ISR of different Edw. ictaluri isolates were essentially identical to each other. Partial sequences of larger ISR from several Edw. ictaluri isolates also revealed no differences from the one complete Edw. ictaluri large ISR sequence obtained. The sequences of the smaller ISR of Edw. tarda were 97% identical to the Edw. ictaluri smaller ISR and the larger ISR were 96-98% identical to the Edw. ictaluri larger ISR sequence. The Edw. tarda isolates displayed limited ISR sequence heterogeneity, with ≥97% sequence identity among isolates for both small and large ISR. There is a high degree of size and sequence similarity of 16S-23S ISR both among isolates within Edw. ictaluri and Edw. tarda species and between the two species. Our results confirm a close genetic relationship between Edw. ictaluri and Edw. tarda and the relative homogeneity of Edw. ictaluri isolates compared with Edw. tarda isolates. Because no differences were found in ISR sequences among Edw. ictaluri isolates, sequence analysis of the ISR will not be useful to distinguish isolates of Edw. ictaluri. However, we identified restriction sites that differ between ISR sequences of Edw. ictaluri and Edw. tarda, which will be useful in distinguishing the two species.
Data Analysis of Sequences and qPCR for Microbial Communities during Algal Blooms

EPA Pesticide Factsheets

A training opportunity is open to a student who is highly motivated in microbial research to conduct sequence analysis, explore novel genes and metabolic pathways, validate the resulting findings using qPCR/RT-qPCR, and summarize the findings.
Memory Efficient Sequence Analysis Using Compressed Data Structures (Metagenomics Informatics Challenges Workshop: 10K Genomes at a Time)

ScienceCinema

Simpson, Jared

2018-01-24

Wellcome Trust Sanger Institute's Jared Simpson on Memory efficient sequence analysis using compressed data structures at the Metagenomics Informatics Challenges Workshop held at the DOE JGI on October 12-13, 2011.
Interim reliability-evaluation program: analysis of the Browns Ferry, Unit 1, nuclear plant. Appendix C - sequence quantification

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mays, S.E.; Poloski, J.P.; Sullivan, W.H.

1982-07-01

This report describes a risk study of the Browns Ferry, Unit 1, nuclear plant. The study is one of four such studies sponsored by the NRC Office of Research, Division of Risk Assessment, as part of its Interim Reliability Evaluation Program (IREP), Phase II. This report is contained in four volumes: a main report and three appendixes. Appendix C generally describes the methods used to estimate accident sequence frequency values. Information is presented concerning the approach, example collection, failure data, candidate dominant sequences, uncertainty analysis, and sensitivity analysis.
Functional organization of a single nif cluster in the mesophilic archaeon Methanosarcina mazei strain Gö1

PubMed Central

Ehlers, Claudia; Veit, Katharina; Gottschalk, Gerhard; Schmitz, Ruth A.

2002-01-01

The mesophilic methanogenic archaeon Methanosarcina mazei strain Gö1 is able to utilize molecular nitrogen (N2) as its sole nitrogen source. We have identified and characterized a single nitrogen fixation (nif) gene cluster in M. mazei Gö1 with an approximate length of 9 kbp. Sequence analysis revealed seven genes with sequence similarities to nifH, nifI1, nifI2, nifD, nifK, nifE and nifN, similar to other diazotrophic methanogens and certain bacteria such as Clostridium acetobutylicum, with the two glnB-like genes (nifI1 and nifI2) located between nifH and nifD. Phylogenetic analysis of deduced amino acid sequences for the nitrogenase structural genes of M. mazei Gö1 showed that they are most closely related to Methanosarcina barkeri nif2 genes, and also closely resemble those for the corresponding nif products of the gram-positive bacterium C. acetobutylicum. Northern blot analysis and reverse transcription PCR analysis demonstrated that the M. mazei nif genes constitute an operon transcribed only under nitrogen starvation as a single 8 kb transcript. Sequence analysis revealed a palindromic sequence at the transcriptional start site in front of the M. mazei nifH gene, which may have a function in transcriptional regulation of the nif operon. PMID:15803652
Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus

PubMed Central

Labudde, Dirk

2015-01-01

The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations. PMID:26180540
Evolutionary Influenced Interaction Pattern as Indicator for the Investigation of Natural Variants Causing Nephrogenic Diabetes Insipidus.

PubMed

Grunert, Steffen; Labudde, Dirk

2015-01-01

The importance of short membrane sequence motifs has been shown in many works and emphasizes the related sequence motif analysis. Together with specific transmembrane helix-helix interactions, the analysis of interacting sequence parts is helpful for understanding the process during membrane protein folding and in retaining the three-dimensional fold. Here we present a simple high-throughput analysis method for deriving mutational information of interacting sequence parts. Applied on aquaporin water channel proteins, our approach supports the analysis of mutational variants within different interacting subsequences and finally the investigation of natural variants which cause diseases like, for example, nephrogenic diabetes insipidus. In this work we demonstrate a simple method for massive membrane protein data analysis. As shown, the presented in silico analyses provide information about interacting sequence parts which are constrained by protein evolution. We present a simple graphical visualization medium for the representation of evolutionary influenced interaction pattern pairs (EIPPs) adapted to mutagen investigations of aquaporin-2, a protein whose mutants are involved in the rare endocrine disorder known as nephrogenic diabetes insipidus, and membrane proteins in general. Furthermore, we present a new method to derive new evolutionary variations within EIPPs which can be used for further mutagen laboratory investigations.
MinION Analysis and Reference Consortium: Phase 1 data release and analysis

PubMed Central

Eccles, David A.; Zalunin, Vadim; Urban, John M.; Piazza, Paolo; Bowden, Rory J.; Paten, Benedict; Mwaigwisya, Solomon; Batty, Elizabeth M.; Simpson, Jared T.; Snutch, Terrance P.

2015-01-01

The advent of a miniaturized DNA sequencing device with a high-throughput contextual sequencing capability embodies the next generation of large scale sequencing tools. The MinION™ Access Programme (MAP) was initiated by Oxford Nanopore Technologies™ in April 2014, giving public access to their USB-attached miniature sequencing device. The MinION Analysis and Reference Consortium (MARC) was formed by a subset of MAP participants, with the aim of evaluating and providing standard protocols and reference data to the community. Envisaged as a multi-phased project, this study provides the global community with the Phase 1 data from MARC, where the reproducibility of the performance of the MinION was evaluated at multiple sites. Five laboratories on two continents generated data using a control strain of Escherichia coli K-12, preparing and sequencing samples according to a revised ONT protocol. Here, we provide the details of the protocol used, along with a preliminary analysis of the characteristics of typical runs including the consistency, rate, volume and quality of data produced. Further analysis of the Phase 1 data presented here, and additional experiments in Phase 2 of E. coli from MARC are already underway to identify ways to improve and enhance MinION performance. PMID:26834992
A sequential analysis of classroom discourse in Italian primary schools: the many faces of the IRF pattern.

PubMed

Molinari, Luisa; Mameli, Consuelo; Gnisci, Augusto

2013-09-01

A sequential analysis of classroom discourse is needed to investigate the conditions under which the triadic initiation-response-feedback (IRF) pattern may host different teaching orientations. The purpose of the study is twofold: first, to describe the characteristics of classroom discourse and, second, to identify and explore the different interactive sequences that can be captured with a sequential statistical analysis. Twelve whole-class activities were video recorded in three Italian primary schools. We observed classroom interaction as it occurs naturally on an everyday basis. In total, we collected 587 min of video recordings. Subsequently, 828 triadic IRF patterns were extracted from this material and analysed with the programme Generalized Sequential Query (GSEQ). The results indicate that classroom discourse may unfold in different ways. In particular, we identified and described four types of sequences. Dialogic sequences were triggered by authentic questions, and continued through further relaunches. Monologic sequences were directed to fulfil the teachers' pre-determined didactic purposes. Co-constructive sequences fostered deduction, reasoning, and thinking. Scaffolding sequences helped and sustained children with difficulties. The application of sequential analyses allowed us to show that interactive sequences may account for a variety of meanings, thus making a significant contribution to the literature and research practice in classroom discourse. © 2012 The British Psychological Society.
DendroBLAST: approximate phylogenetic trees in the absence of multiple sequence alignments.

PubMed

Kelly, Steven; Maini, Philip K

2013-01-01

The rapidly growing availability of genome information has created considerable demand for both fast and accurate phylogenetic inference algorithms. We present a novel method called DendroBLAST for reconstructing phylogenetic dendrograms/trees from protein sequences using BLAST. This method differs from other methods by incorporating a simple model of sequence evolution to test the effect of introducing sequence changes on the reliability of the bipartitions in the inferred tree. Using realistic simulated sequence data we demonstrate that this method produces phylogenetic trees that are more accurate than other commonly-used distance based methods though not as accurate as maximum likelihood methods from good quality multiple sequence alignments. In addition to tests on simulated data, we use DendroBLAST to generate input trees for a supertree reconstruction of the phylogeny of the Archaea. This independent analysis produces an approximate phylogeny of the Archaea that has both high precision and recall when compared to previously published analysis of the same dataset using conventional methods. Taken together these results demonstrate that approximate phylogenetic trees can be produced in the absence of multiple sequence alignments, and we propose that these trees will provide a platform for improving and informing downstream bioinformatic analysis. A web implementation of the DendroBLAST method is freely available for use at http://www.dendroblast.com/.
G-quadruplex prediction in E. coli genome reveals a conserved putative G-quadruplex-Hairpin-Duplex switch.

PubMed

Kaplan, Oktay I; Berber, Burak; Hekim, Nezih; Doluca, Osman

2016-11-02

Many studies show that short non-coding sequences are widely conserved among regulatory elements. More and more conserved sequences are being discovered since the development of next generation sequencing technology. A common approach to identify conserved sequences with regulatory roles relies on topological changes such as hairpin formation at the DNA or RNA level. G-quadruplexes, non-canonical nucleic acid topologies with little established biological roles, are increasingly considered for conserved regulatory element discovery. Since the tertiary structure of G-quadruplexes is strongly dependent on the loop sequence which is disregarded by the generally accepted algorithm, we hypothesized that G-quadruplexes with similar topology and, indirectly, similar interaction patterns, can be determined using phylogenetic clustering based on differences in the loop sequences. Phylogenetic analysis of 52 G-quadruplex forming sequences in the Escherichia coli genome revealed two conserved G-quadruplex motifs with a potential regulatory role. Further analysis revealed that both motifs tend to form hairpins and G quadruplexes, as supported by circular dichroism studies. The phylogenetic analysis as described in this work can greatly improve the discovery of functional G-quadruplex structures and may explain unknown regulatory patterns. © The Author(s) 2016. Published by Oxford University Press on behalf of Nucleic Acids Research.
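
The abstract does not name the "generally accepted algorithm", so the following is a sketch under an assumption: it uses the widely used pattern-matching rule for putative G-quadruplexes, four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The function name and the test sequence are hypothetical, and only the given strand is scanned.

```python
import re

# Assumed motif: G{3+} loop{1-7} G{3+} loop{1-7} G{3+} loop{1-7} G{3+}
G4_PATTERN = re.compile(r"G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}[ACGT]{1,7}G{3,}")


def find_g4_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched text) for non-overlapping putative G4 motifs."""
    seq = sequence.upper()
    return [(m.start(), m.group(0)) for m in G4_PATTERN.finditer(seq)]


# Hypothetical test sequence containing one telomere-like motif.
hits = find_g4_motifs("ATGGGTTAGGGTTAGGGTTAGGGCA")
for start, motif in hits:
    print(start, motif)
```

Note that this rule constrains only loop length, not loop content, which is the limitation the abstract points to when it proposes clustering candidates by their loop sequences.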
Illumina MiSeq Sequencing for Preliminary Analysis of Microbiome Causing Primary Endodontic Infections in Egypt

PubMed Central

Azab, Marwa Mohamed; Fayyad, Dalia Mukhtar

2018-01-01

The use of high throughput next generation technologies has allowed more comprehensive analysis than traditional Sanger sequencing. The specific aim of this study was to investigate the microbial diversity of primary endodontic infections using Illumina MiSeq sequencing platform in Egyptian patients. Samples were collected from 19 patients in Suez Canal University Hospital (Endodontic Department) using sterile # 15K file and paper points. DNA was extracted using Mo Bio power soil DNA isolation extraction kit followed by PCR amplification and agarose gel electrophoresis. The microbiome was characterized on the basis of the V3 and V4 hypervariable region of the 16S rRNA gene by using paired-end sequencing on Illumina MiSeq device. MOTHUR software was used in sequence filtration and analysis of sequenced data. A total of 1858 operational taxonomic units at 97% similarity were assigned to 26 phyla, 245 families, and 705 genera. Four main phyla Firmicutes, Bacteroidetes, Proteobacteria, and Synergistetes were predominant in all samples. At genus level, Prevotella, Bacillus, Porphyromonas, Streptococcus, and Bacteroides were the most abundant. Illumina MiSeq platform sequencing can be used to investigate oral microbiome composition of endodontic infections. Elucidating the ecology of endodontic infections is a necessary step in developing effective intracanal antimicrobials. PMID:29849646
Exploring the temporal structure of heterochronous sequences using TempEst (formerly Path-O-Gen).

PubMed

Rambaut, Andrew; Lam, Tommy T; Max Carvalho, Luiz; Pybus, Oliver G

2016-01-01

Gene sequences sampled at different points in time can be used to infer molecular phylogenies on a natural timescale of months or years, provided that the sequences in question undergo measurable amounts of evolutionary change between sampling times. Data sets with this property are termed heterochronous and have become increasingly common in several fields of biology, most notably the molecular epidemiology of rapidly evolving viruses. Here we introduce the cross-platform software tool, TempEst (formerly known as Path-O-Gen), for the visualization and analysis of temporally sampled sequence data. Given a molecular phylogeny and the dates of sampling for each sequence, TempEst uses an interactive regression approach to explore the association between genetic divergence through time and sampling dates. TempEst can be used to (1) assess whether there is sufficient temporal signal in the data to proceed with phylogenetic molecular clock analysis, and (2) identify sequences whose genetic divergence and sampling date are incongruent. Examination of the latter can help identify data quality problems, including errors in data annotation, sample contamination, sequence recombination, or alignment error. We recommend that all users of the molecular clock models implemented in BEAST first check their data using TempEst prior to analysis.
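
The regression idea described in this abstract can be illustrated with an ordinary least-squares fit of root-to-tip divergence against sampling date. This is a minimal sketch, not TempEst's implementation: the function name and the data are hypothetical, and reading the slope as a rate and the x-intercept as the date of the root is the usual interpretation of such a plot rather than something stated in the abstract.

```python
from statistics import mean


def root_to_tip_regression(dates, divergences):
    """Least-squares fit of divergence on date.

    Returns (slope, intercept, r_squared, residuals).
    """
    if len(dates) != len(divergences) or len(dates) < 2:
        raise ValueError("need at least two (date, divergence) pairs of equal length")
    x_bar, y_bar = mean(dates), mean(divergences)
    sxx = sum((x - x_bar) ** 2 for x in dates)
    syy = sum((y - y_bar) ** 2 for y in divergences)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(dates, divergences))
    if sxx == 0:
        raise ValueError("all sampling dates are identical; no temporal signal can be fitted")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = (sxy * sxy) / (sxx * syy) if syy > 0 else 0.0
    residuals = [y - (intercept + slope * x) for x, y in zip(dates, divergences)]
    return slope, intercept, r_squared, residuals


# Hypothetical data: sampling years and root-to-tip distances (substitutions per site).
sample_dates = [2000, 2002, 2004, 2006]
sample_divergences = [0.010, 0.012, 0.014, 0.016]

slope, intercept, r_squared, residuals = root_to_tip_regression(sample_dates, sample_divergences)
root_date = -intercept / slope  # date at which fitted divergence is zero
print(f"rate = {slope:.4f} per year, R^2 = {r_squared:.3f}, root date = {root_date:.1f}")
```

A low R² suggests weak temporal signal, and sequences with large residuals are the candidates for the annotation, contamination, recombination or alignment problems listed in the abstract.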
Effects of informed consent for individual genome sequencing on relevant knowledge.

PubMed

Kaphingst, K A; Facio, F M; Cheng, M-R; Brooks, S; Eidem, H; Linn, A; Biesecker, B B; Biesecker, L G

2012-11-01

Increasing availability of individual genomic information suggests that patients will need knowledge about genome sequencing to make informed decisions, but prior research is limited. In this study, we examined genome sequencing knowledge before and after informed consent among 311 participants enrolled in the ClinSeq™ sequencing study. An exploratory factor analysis of knowledge items yielded two factors (sequencing limitations knowledge; sequencing benefits knowledge). In multivariable analysis, high pre-consent sequencing limitations knowledge scores were significantly related to education [odds ratio (OR): 8.7, 95% confidence interval (CI): 2.45-31.10 for post-graduate education, and OR: 3.9; 95% CI: 1.05, 14.61 for college degree compared with less than college degree] and race/ethnicity (OR: 2.4, 95% CI: 1.09, 5.38 for non-Hispanic Whites compared with other racial/ethnic groups). Mean values increased significantly between pre- and post-consent for the sequencing limitations knowledge subscale (6.9-7.7, p < 0.0001) and sequencing benefits knowledge subscale (7.0-7.5, p < 0.0001); increase in knowledge did not differ by sociodemographic characteristics. This study highlights gaps in genome sequencing knowledge and underscores the need to target educational efforts toward participants with less education or from minority racial/ethnic groups. The informed consent process improved genome sequencing knowledge. Future studies could examine how genome sequencing knowledge influences informed decision making. © 2012 John Wiley & Sons A/S.
Next-Generation Sequencing of the Chrysanthemum nankingense (Asteraceae) Transcriptome Permits Large-Scale Unigene Assembly and SSR Marker Discovery

PubMed Central

Wang, Haibin; Jiang, Jiafu; Chen, Sumei; Qi, Xiangyu; Peng, Hui; Li, Pirui; Song, Aiping; Guan, Zhiyong; Fang, Weimin; Liao, Yuan; Chen, Fadi

2013-01-01

Background: Simple sequence repeats (SSRs) are ubiquitous in eukaryotic genomes. Chrysanthemum is one of the largest genera in the Asteraceae family. Only a few Chrysanthemum expressed sequence tag (EST) sequences have been acquired to date, so the number of available EST-SSR markers is very low. Methodology/Principal Findings: Illumina paired-end sequencing technology produced over 53 million sequencing reads from C. nankingense mRNA. The subsequent de novo assembly yielded 70,895 unigenes, of which 45,789 (64.59%) showed similarity to sequences in the NCBI database. Of these 45,789 sequences, 107 have hits to the Chrysanthemum Nr protein database; 679 and 277 sequences have hits to the databases of Helianthus and Lactuca species, respectively. MISA software identified a large number of putative EST-SSRs, allowing 1,788 primer pairs to be designed from the de novo transcriptome sequence and a further 363 from archival EST sequence. Among 100 randomly chosen primer pairs, 81 yielded amplicons and 20 were polymorphic in genotype analysis in Chrysanthemum. The results showed that most (but not all) of the assays were transferable across species and that they exposed a significant amount of allelic diversity. Conclusions/Significance: SSR markers acquired by transcriptome sequencing are potentially useful for marker-assisted breeding and genetic analysis in the genus Chrysanthemum and its related genera. PMID:23626799
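As a rough illustration of how putative SSRs can be located in assembled unigenes, the sketch below scans a nucleotide sequence for perfect tandem repeats with regular expressions. It is not the MISA program used in the study: the function names and the minimum repeat counts in `DEFAULT_MIN_REPEATS` are assumptions chosen for illustration, and compound or imperfect repeats are not handled.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
DEFAULT_MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_ssrs(sequence, min_repeats=None):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs."""
    thresholds = DEFAULT_MIN_REPEATS if min_repeats is None else min_repeats
    seq = sequence.upper()
    hits = []
    for unit_len, min_n in sorted(thresholds.items()):
        # One captured unit followed by at least (min_n - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if not is_primitive(motif):
                continue  # e.g. skip 'AA' repeats already reported as 'A'
            repeats = len(match.group(0)) // unit_len
            hits.append((match.start(), motif, repeats))
    return sorted(hits)


if __name__ == "__main__":
    demo = "GG" + "AT" * 6 + "CC" + "CAG" * 5 + "T"
    for start, motif, repeats in find_ssrs(demo):
        print(start, motif, repeats)
```

Start positions are 0-based. Primer design around each reported repeat, as described in the abstract, would be a separate step.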
Isolation of prolactin and growth hormone from the pituitary of the holostean fish Amia calva.

PubMed

Dores, R M; Noso, T; Rand-Weaver, M; Kawauchi, H

1993-06-01

Pituitaries from adult male and female Amia calva (Order Holostei) were acid extracted and fractionated by gel filtration column chromatography and reversed-phase high performance liquid chromatography. This two-step isolation procedure yielded homogeneous pools of Amia prolactin (PRL) and growth hormone (GH). The amino acid composition of both purified polypeptides was determined. Primary sequence analysis of the first 22 positions at the N-terminal of Amia PRL revealed that this region has 63% sequence identity with eel PRL-1. The N-terminal region of Amia PRL lacks the disulfide bridge which is characteristic of tetrapod PRLs. Primary sequence analysis of the first 24 positions at the N-terminal of Amia GH revealed that this region has 62% sequence identity with eel GH and 54% sequence identity with both blue shark GH and sea turtle GH. Based on N-terminal analysis, it appears that Amia PRL and GH are more closely related to teleost PRLs and GHs than they are to tetrapod PRLs and GHs.
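Percent sequence identity of the kind quoted above can be computed from two pre-aligned sequences as in the following sketch. The function name `percent_identity` is hypothetical, the demo strings are toy values rather than the Amia or eel sequences, and the convention of skipping any column that contains a gap is an assumption; other conventions give slightly different percentages.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over ungapped columns of an alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # assumed convention: ignore columns containing a gap
        compared += 1
        if a == b:
            matches += 1

    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy peptides for illustration only.
    print(round(percent_identity("ACDEFG", "ACDQFG"), 1))
```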
Using RSAT oligo-analysis and dyad-analysis tools to discover regulatory signals in nucleic sequences.

PubMed

Defrance, Matthieu; Janky, Rekin's; Sand, Olivier; van Helden, Jacques

2008-01-01

This protocol explains how to discover functional signals in genomic sequences by detecting over- or under-represented oligonucleotides (words) or spaced pairs thereof (dyads) with the Regulatory Sequence Analysis Tools (http://rsat.ulb.ac.be/rsat/). Two typical applications are presented: (i) predicting transcription factor-binding motifs in promoters of coregulated genes and (ii) discovering phylogenetic footprints in promoters of orthologous genes. The steps of this protocol include purging genomic sequences to discard redundant fragments, discovering over-represented patterns and assembling them to obtain degenerate motifs, scanning sequences and drawing feature maps. The main strength of the method is its statistical ground: the binomial significance provides an efficient control on the rate of false positives. In contrast with optimization-based pattern discovery algorithms, the method supports the detection of under- as well as over-represented motifs. Computation times vary from seconds (gene clusters) to minutes (whole genomes). The execution of the whole protocol should take approximately 1 h.
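The binomial test for over-represented words mentioned in this abstract can be sketched as follows. This is a simplified illustration and not the RSAT oligo-analysis code: it assumes a uniform background (each nucleotide with probability 0.25), counts overlapping occurrences on a single strand, and ignores the corrections for self-overlapping words and for Markov background models that a real tool would apply. The names `binomial_tail` and `over_represented_words` are hypothetical.

```python
from collections import Counter
from math import exp, lgamma, log


def binomial_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    if p <= 0.0:
        return 0.0
    if p >= 1.0:
        return 1.0
    total = 0.0
    for i in range(k, n + 1):
        log_term = (
            lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
            + i * log(p) + (n - i) * log(1.0 - p)
        )
        total += exp(log_term)
    return min(total, 1.0)


def over_represented_words(sequences, k):
    """Rank k-mers by the binomial P-value of their observed count.

    Returns (word, observed, expected, p_value) tuples, most significant first.
    """
    counts = Counter()
    positions = 0
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            if set(word) <= set("ACGT"):  # skip words with ambiguous bases
                counts[word] += 1
                positions += 1

    prior = 0.25 ** k  # assumed uniform background probability of any word
    results = [
        (word, obs, positions * prior, binomial_tail(positions, obs, prior))
        for word, obs in counts.items()
    ]
    return sorted(results, key=lambda r: (r[3], r[0]))


if __name__ == "__main__":
    for row in over_represented_words(["ACGTACGTACGT"], 4)[:3]:
        print(row)
```

Multiplying each P-value by the number of distinct words tested gives an expected number of false positives, which is one common way to control the false-positive rate the abstract refers to.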
Deep sequencing reveals double mutations in cis of MPL exon 10 in myeloproliferative neoplasms.

PubMed

Pietra, Daniela; Brisci, Angela; Rumi, Elisa; Boggi, Sabrina; Elena, Chiara; Pietrelli, Alessandro; Bordoni, Roberta; Ferrari, Maurizio; Passamonti, Francesco; De Bellis, Gianluca; Cremonesi, Laura; Cazzola, Mario

2011-04-01

Somatic mutations of MPL exon 10, mainly involving a W515 substitution, have been described in JAK2 (V617F)-negative patients with essential thrombocythemia and primary myelofibrosis. We used direct sequencing and high-resolution melt analysis to identify mutations of MPL exon 10 in 570 patients with myeloproliferative neoplasms, and allele specific PCR and deep sequencing to further characterize a subset of mutated patients. Somatic mutations were detected in 33 of 221 patients (15%) with JAK2 (V617F)-negative essential thrombocythemia or primary myelofibrosis. Only one patient with essential thrombocythemia carried both JAK2 (V617F) and MPL (W515L). High-resolution melt analysis identified abnormal patterns in all the MPL mutated cases, while direct sequencing did not detect the mutant MPL in one fifth of them. In 3 cases carrying double MPL mutations, deep sequencing analysis showed identical load and location in cis of the paired lesions, indicating their simultaneous occurrence on the same chromosome.
Complete annotated genome sequence of Mycobacterium tuberculosis (Zopf) Lehmann and Neumann (ATCC35812) (Kurono).

PubMed

Miyoshi-Akiyama, Tohru; Satou, Kazuhito; Kato, Masako; Shiroma, Akino; Matsumura, Kazunori; Tamotsu, Hinako; Iwai, Hiroki; Teruya, Kuniko; Funatogawa, Keiji; Hirano, Takashi; Kirikae, Teruo

2015-01-01

We report the completely annotated genome sequence of Mycobacterium tuberculosis (Zopf) Lehmann and Neumann (ATCC35812) (Kurono), which is used for virulence and/or immunization studies. The complete genome sequence of M. tuberculosis Kurono was determined with a length of 4,415,078 bp and a G+C content of 65.60%. The chromosome was shown to contain a total of 4,340 protein-coding genes, 53 tRNA genes, one transfer messenger RNA for all amino acids, and 1 rrn operon. Lineage analysis based on large sequence polymorphisms indicated that M. tuberculosis Kurono belongs to the Euro-American lineage (lineage 4). Phylogenetic analysis using whole genome sequences of M. tuberculosis Kurono and 22 M. tuberculosis complex strains indicated that H37Rv is the closest relative of Kurono. These findings provide a basis for research using M. tuberculosis Kurono, especially in animal models. Copyright © 2014 Elsevier Ltd. All rights reserved.
