Poltev, V I; Anisimov, V M; Sanchez, C; Deriabina, A; Gonzalez, E; Garcia, D; Rivas, F; Polteva, N A
2016-01-01
It is generally accepted that the important characteristic features of the Watson-Crick duplex originate from the molecular structure of its subunits. However, it remains to be elucidated which properties of each subunit are responsible for the significant characteristic features of DNA structure. Computations of deoxydinucleoside monophosphate complexes with Na ions using density functional theory revealed the pivotal role of the conformational properties of minimal single-chain DNA fragments in the development of the unique features of the Watson-Crick duplex. We found that the directionality of the sugar-phosphate backbone and the preferable ranges of its torsion angles, combined with the difference in ring structure between purine and pyrimidine bases, define the dependence of the three-dimensional structure of the Watson-Crick duplex on nucleotide base sequence. In this work, we extended these density functional theory computations to the minimal fragments of the DNA duplex: complementary deoxydinucleoside monophosphate complexes with Na ions. Using several computational methods and various functionals, we searched for energy minima of the BI conformation of complementary deoxydinucleoside monophosphate complexes with different nucleoside sequences. Two sequences were optimized ab initio at the MP2/6-31++G** level of theory. The analysis of torsion angles, sugar ring puckering, and mutual base positions of the optimized structures demonstrates that the conformational characteristics of complementary deoxydinucleoside monophosphate complexes with Na ions remain within BI ranges and become closer to the corresponding characteristics of Watson-Crick duplex crystals.
Qualitatively, the main characteristic features of each studied complementary deoxydinucleoside monophosphate complex remain invariant across the computational methods used, although the quantitative values of some conformational parameters vary within the limits typical of the corresponding family. We observe that popular density functional theory functionals overestimate the distances between base pairs, whereas MP2 computations and the newer, more complex functionals produce structures with overly close atom-atom contacts. A detailed study of some complementary deoxydinucleoside monophosphate complexes with Na ions reveals several energy minima corresponding to BI conformations; in other words, the potential energy surface of these complexes has a complex relief. This accounts for the variability of conformational parameters among duplex fragments with the same base sequence. The popular molecular mechanics force fields AMBER and CHARMM reproduce most of the conformational characteristics of deoxydinucleoside monophosphates and their complementary complexes with Na ions but fail to reproduce some details of the sequence dependence of the Watson-Crick duplex conformation.
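The torsion-angle analysis described in this record reduces to computing a dihedral angle from four atomic positions. The following is a minimal Python sketch, not the authors' code: it uses the standard projection formula with the IUPAC sign convention, and the coordinates in the tests are made up.

```python
import math

Vec = tuple[float, float, float]

def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

def _dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

def _cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def _scale(a: Vec, k: float) -> Vec:
    return (a[0] * k, a[1] * k, a[2] * k)

def torsion_angle(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle p0-p1-p2-p3 in degrees, in (-180, 180]."""
    b0 = _sub(p0, p1)                      # first bond, pointing back to p0
    b1 = _sub(p2, p1)                      # central bond (rotation axis)
    b2 = _sub(p3, p2)                      # last bond
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))
    # Project the outer bonds onto the plane perpendicular to the axis.
    v = _sub(b0, _scale(b1, _dot(b0, b1)))
    w = _sub(b2, _scale(b1, _dot(b2, b1)))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))
```

For a backbone angle such as α, the four atoms would be O3'(i-1), P, O5' and C5', taken from the optimized structure.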
Noronha, Jyothi M; Liu, Mengya; Squires, R Burke; Pickett, Brett E; Hale, Benjamin G; Air, Gillian M; Galloway, Summer E; Takimoto, Toru; Schmolke, Mirco; Hunt, Victoria; Klem, Edward; García-Sastre, Adolfo; McGee, Monnie; Scheuermann, Richard H
2012-05-01
Genetic drift of influenza virus genomic sequences occurs through the combined effects of sequence alterations introduced by a low-fidelity polymerase and the varying selective pressures experienced as the virus migrates through different host environments. While traditional phylogenetic analysis is useful in tracking the evolutionary heritage of these viruses, the specific genetic determinants that dictate important phenotypic characteristics are often difficult to discern within the complex genetic background arising through evolution. Here we describe a novel influenza virus sequence feature variant type (Flu-SFVT) approach, made available through the public Influenza Research Database resource (www.fludb.org), in which variant types (VTs) identified in defined influenza virus protein sequence features (SFs) are used for genotype-phenotype association studies. Since SFs have been defined for all influenza virus proteins based on known structural, functional, and immune epitope recognition properties, the Flu-SFVT approach allows the rapid identification of the molecular genetic determinants of important influenza virus characteristics and their connection to underlying biological functions. We demonstrate the use of the SFVT approach to obtain statistical evidence for effects of NS1 protein sequence variations in dictating influenza virus host range restriction.
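The core SFVT idea, reading the residues at the positions that define a sequence feature, naming each distinct combination as a variant type, and tabulating variant types against a phenotype, can be sketched in a few lines of Python. This is a hypothetical illustration, not the Influenza Research Database implementation; the feature positions, sequences and phenotype labels are invented.

```python
from collections import Counter, defaultdict

def variant_type(aligned_seq: str, feature_positions: list[int]) -> str:
    """Residues at the 1-based alignment positions that make up one sequence feature."""
    return "".join(aligned_seq[p - 1] for p in feature_positions)

def label_variant_types(vts: list[str]) -> dict[str, str]:
    """Name distinct variant types VT-1, VT-2, ... by decreasing frequency."""
    counts = Counter(vts)
    ordered = sorted(counts, key=lambda vt: (-counts[vt], vt))
    return {vt: f"VT-{i}" for i, vt in enumerate(ordered, start=1)}

def vt_phenotype_table(records: list[tuple[str, str]],
                       feature_positions: list[int]) -> dict[str, Counter]:
    """Contingency table: variant type -> counts of phenotype labels."""
    table: dict[str, Counter] = defaultdict(Counter)
    for seq, phenotype in records:
        table[variant_type(seq, feature_positions)][phenotype] += 1
    return dict(table)
```

A table of this shape could then be passed to a standard association test to look for genotype-phenotype links.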
Quantiprot - a Python package for quantitative analysis of protein sequences.
Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold
2017-07-17
The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of this approach is that quantitative properties define a multidimensional solution space, in which sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a Python software package that provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can calculate dozens of characteristics, either directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates the distribution of n-grams, and computes the Zipf's law coefficient. We propose three main fields of application for the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared with actually observed sequences.
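Two of the measures named in this record, the n-gram distribution and a Zipf's law coefficient, can be illustrated with plain Python. This sketch does not use or reproduce Quantiprot's API. It assumes the Zipf coefficient is estimated as the least-squares slope of log frequency against log rank; Quantiprot's exact estimator may differ.

```python
import math
from collections import Counter

def ngram_distribution(seq: str, n: int) -> Counter:
    """Count overlapping n-grams in a sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))

def zipf_coefficient(counts: Counter) -> float:
    """Least-squares slope of log(frequency) versus log(rank).

    If frequencies follow f(r) ~ r**(-s), the slope is about -s.
    """
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct n-grams")
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

For example, `zipf_coefficient(ngram_distribution(seq, 2))` gives one number per sequence that could serve as one axis of the feature space described above.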
Graph pyramids for protein function prediction
Sandhan, Tushar; Yoo, Youngjun; Choi, Jin; Kim, Sun
2015-01-01
Background Uncovering the hidden organizational characteristics and regularities among biological sequences is key to a detailed understanding of the underlying biological phenomena. Pattern recognition from nucleic acid sequences is therefore an important task for protein function prediction. As proteins from the same family exhibit similar characteristics, homology-based approaches predict protein functions via protein classification. However, conventional classification approaches mostly rely on global features, considering only strong protein similarity matches, which leads to a significant loss of prediction accuracy. Methods Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers local as well as global features by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph-based features at various pyramid levels. Results Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using the graph pyramid helps improve computational efficiency as well as protein classification accuracy. Quantitatively, among 14,086 test sequences, the proposed method misclassified only 21.1 sequences on average, whereas the baseline method based on global matching of BLAST scores misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. As a result, the method achieved more than 96% protein classification accuracy using only 20% of the training data per class. PMID:26044522
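The contrast between relying only on the strongest match and letting many weak matches vote can be shown with a toy similarity network. The Python below is a deliberately simplified stand-in, not the graph-pyramid algorithm. It sums similarity weights per family over all training proteins above a threshold, and the protein IDs, scores and family labels are hypothetical.

```python
from collections import defaultdict

def best_hit(similarities: dict[str, float], labels: dict[str, str]) -> str:
    """Global-feature baseline: family of the single most similar protein."""
    top = max((p for p in similarities if p in labels), key=lambda p: similarities[p])
    return labels[top]

def vote_family(similarities: dict[str, float], labels: dict[str, str],
                min_sim: float = 0.0) -> str:
    """Similarity-weighted vote over all neighbours above min_sim, weak ones included."""
    scores: dict[str, float] = defaultdict(float)
    for prot, sim in similarities.items():
        if sim > min_sim and prot in labels:
            scores[labels[prot]] += sim
    if not scores:
        raise ValueError("no neighbours above threshold")
    # Highest total wins; ties broken alphabetically for determinism.
    return min(scores, key=lambda fam: (-scores[fam], fam))
```

With one strong famA neighbour (0.9) and three weaker famB neighbours (0.4, 0.4, 0.35), the best-hit rule picks famA while the vote picks famB. An incremental-learning step in the spirit of the abstract would add each confidently classified query to `labels`.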
Cluster compression algorithm: A joint clustering/data compression concept
NASA Technical Reports Server (NTRS)
Hilbert, E. E.
1977-01-01
The Cluster Compression Algorithm (CCA), which was developed to reduce the costs associated with transmitting, storing, distributing, and interpreting LANDSAT multispectral image data, is described. The CCA is a preprocessing algorithm that uses feature extraction and data compression to represent the information in the image data more efficiently. The format of the preprocessed data enables simple look-up-table decoding and direct use of the extracted features, reducing user computation for either image reconstruction or computer interpretation of the image data. Basically, the CCA uses spatially local clustering to extract features from the image data that describe the spectral characteristics of the data set. In addition, the features may be used to form a sequence of scalar numbers that define each picture element in terms of the cluster features. This sequence, called the feature map, is then efficiently represented using source encoding concepts. Various forms of the CCA are defined, and experimental results are presented to show the trade-offs and characteristics of the various implementations. Examples are provided that demonstrate the application of the cluster compression concept to multispectral images from LANDSAT and other sources.
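The clustering and look-up-table decoding steps of this record can be sketched with plain k-means on one spatial block of multispectral pixels. The Python below is an illustrative reconstruction, not the original CCA. It assumes k-means with a deterministic "first k distinct pixels" initialization and omits the source-encoding stage applied to the feature map.

```python
Pixel = tuple[float, ...]

def _dist2(a: Pixel, b: Pixel) -> float:
    """Squared Euclidean distance between two spectral vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def _nearest(p: Pixel, centroids: list[Pixel]) -> int:
    return min(range(len(centroids)), key=lambda j: _dist2(p, centroids[j]))

def cluster_block(pixels: list[Pixel], k: int,
                  iters: int = 20) -> tuple[list[Pixel], list[int]]:
    """k-means on one block; returns (cluster features, feature map)."""
    centroids: list[Pixel] = []
    for p in pixels:                        # init: first k distinct pixels
        if p not in centroids:
            centroids.append(p)
        if len(centroids) == k:
            break
    for _ in range(iters):
        labels = [_nearest(p, centroids) for p in pixels]
        updated = []
        for j, c in enumerate(centroids):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            updated.append(tuple(sum(ch) / len(members) for ch in zip(*members))
                           if members else c)
        if updated == centroids:            # converged
            break
        centroids = updated
    return centroids, [_nearest(p, centroids) for p in pixels]

def decode_block(centroids: list[Pixel], feature_map: list[int]) -> list[Pixel]:
    """Look-up-table decoding: each pixel becomes its cluster's mean spectrum."""
    return [centroids[i] for i in feature_map]
```

The feature map (one small integer per pixel) is what a source encoder would then compress; the centroids are the per-block spectral features.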
Qin, Jiang-Bo; Liu, Zhenyu; Zhang, Hui; Shen, Chen; Wang, Xiao-Chun; Tan, Yan; Wang, Shuo; Wu, Xiao-Feng; Tian, Jie
2017-05-07
BACKGROUND Gliomas are the most common primary brain neoplasms. Glioma grading is prone to misdiagnosis because the conventional MRI manifestations of different grades overlap. The aim of the present study was to evaluate the diagnostic power of radiomic features based on multiple MRI sequences, namely T2-Weighted-Imaging-FLAIR (FLAIR), T1-Weighted-Imaging-Contrast-Enhanced (T1-CE), and the Apparent Diffusion Coefficient (ADC) map, in glioma grading, and to improve the power of glioma grading by combining features. MATERIAL AND METHODS Sixty-six patients with histopathologically proven gliomas underwent T2-FLAIR and T1WI-CE scanning, and 63 of them also underwent DWI scanning. A total of 114 radiomic features were derived using in-house software. All radiomic features were compared between high-grade gliomas (HGGs) and low-grade gliomas (LGGs). Features with statistically significant differences were selected for receiver operating characteristic (ROC) curve analysis. The relationships between the significantly different radiomic features and glial fibrillary acidic protein (GFAP) expression were evaluated. RESULTS A total of 8 radiomic features from 3 MRI sequences differed significantly between LGGs and HGGs. FLAIR GLCM Cluster Shade, T1-CE GLCM Entropy, and ADC GLCM Homogeneity were the best features for differentiating LGGs from HGGs within their respective MRI sequences. The combined feature differentiated LGGs from HGGs best, improving the accuracy of glioma grading compared with each of these single-sequence features. A significant correlation was found between GFAP and T1-CE GLCM Entropy, as well as between GFAP and ADC GLCM Homogeneity. CONCLUSIONS The combined radiomic feature had the highest efficacy in distinguishing LGGs from HGGs.
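The ROC analysis in this record ranks each feature by its area under the curve (AUC). A small Python sketch, independent of the study's in-house software, computes the AUC through its rank interpretation: the probability that a randomly chosen positive case (here, hypothetically, an HGG) has a higher feature value than a randomly chosen negative case (an LGG), with ties counted as one half. The feature values in the tests are made up.

```python
def roc_auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as P(random positive scores above random negative); ties count 1/2."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))
```

An AUC near 1.0 means the feature separates the two grades well, and 0.5 means no separation. A combined feature would first merge several features into one score, for example with a logistic model, before computing its AUC in the same way.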
Woo, Kevin L; Rieucau, Guillaume
2008-07-01
The increasing use of the video playback technique in behavioural ecology reveals a growing need to ensure better control of the visual stimuli that focal animals experience. Technological advances now allow researchers to develop computer-generated animations instead of using video sequences of live-acting demonstrators. However, care must be taken to match the motion characteristics (speed and velocity) of the animation to the original video source. Here, we present a tool based on an optic flow analysis program that measures how closely the motion characteristics of computer-generated animations resemble those of videos of live-acting animals. We examined three distinct displays (tail-flick (TF), push-up body rock (PUBR), and slow arm wave (SAW)) exhibited by animations of Jacky dragons (Amphibolurus muricatus) and compared them with the original video sequences of live lizards. We found no significant differences between the motion characteristics of videos and animations across all three displays. Our results show that our animations reproduce the speed and velocity features of each display. Researchers need to ensure that animation and video stimuli have similar motion characteristics; this is a critical component in the future success of the video playback technique.
Meinicke, Peter; Tech, Maike; Morgenstern, Burkhard; Merkl, Rainer
2004-01-01
Background Kernel-based learning algorithms are among the most advanced machine learning methods and have been successfully applied to a variety of sequence classification tasks within the field of bioinformatics. Conventional kernels used so far do not provide an easy interpretation of the learnt representations in terms of positional and compositional variability of the underlying biological signals. Results We propose a kernel-based approach to datamining on biological sequences. With our method it is possible to model and analyze positional variability of oligomers of any length in a natural way. On one hand, this is achieved by mapping the sequences to an intuitive but high-dimensional feature space, well-suited for interpretation of the learnt models. On the other hand, by means of the kernel trick we can provide a general learning algorithm for that high-dimensional representation, because all required statistics can be computed without performing an explicit feature space mapping of the sequences. By introducing a kernel parameter that controls the degree of position-dependency, our feature space representation can be tailored to the characteristics of the biological problem at hand. A regularized learning scheme enables application even to biological problems for which only small sets of example sequences are available. Our approach includes a visualization method for transparent representation of characteristic sequence features. In this way, the importance of features can be measured in terms of discriminative strength with respect to classification of the underlying sequences. To demonstrate and validate our concept on a biochemically well-defined case, we analyze E. coli translation initiation sites in order to show that we can find biologically relevant signals. For that case, our results clearly show that the Shine-Dalgarno sequence is the most important signal upstream of the start codon.
The variability in position and composition we found for that signal is in accordance with previous biological knowledge. We also find evidence for signals downstream of the start codon, previously introduced as transcriptional enhancers. These signals are mainly characterized by occurrences of adenine in a region of about 4 nucleotides next to the start codon. Conclusions We showed that the oligo kernel can provide a valuable tool for the analysis of relevant signals in biological sequences. In the case of translation initiation sites we could clearly deduce the most discriminative motifs and their positional variation from example sequences. Attractive features of our approach are its flexibility with respect to oligomer length and position conservation. By means of these two parameters oligo kernels can easily be adapted to different biological problems. PMID:15511290
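The position-dependency parameter described in this record can be illustrated by a kernel in which each k-mer occurrence is smoothed by a Gaussian of width sigma, so shared oligomers contribute more the closer their positions are. The Python below is a sketch of such an oligo-style kernel, written for illustration rather than taken from the authors. It omits the constant normalization factor, and the sequences in the tests are made up.

```python
import math
from collections import defaultdict

def oligo_positions(seq: str, k: int) -> dict[str, list[int]]:
    """Map each k-mer to the list of 0-based positions where it starts."""
    pos: dict[str, list[int]] = defaultdict(list)
    for i in range(len(seq) - k + 1):
        pos[seq[i:i + k]].append(i)
    return pos

def oligo_kernel(s: str, t: str, k: int, sigma: float) -> float:
    """Sum over shared k-mers of exp(-(p - q)^2 / (4 sigma^2)) for all position pairs.

    Small sigma -> strongly position-specific; large sigma -> approaches a
    pure composition (k-mer co-occurrence count) kernel.
    """
    ps, pt = oligo_positions(s, k), oligo_positions(t, k)
    total = 0.0
    for oligo, positions_s in ps.items():
        for p in positions_s:
            for q in pt.get(oligo, []):
                total += math.exp(-((p - q) ** 2) / (4.0 * sigma ** 2))
    return total
```

Shifting "ACGT" by one base (to "TACGT") lowers the kernel from 3 to 3·exp(-1/4) at sigma = 1, while a very large sigma makes the shift irrelevant.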
A matter of emphasis: Linguistic stress habits modulate serial recall.
Taylor, John C; Macken, Bill; Jones, Dylan M
2015-04-01
Models of short-term memory for sequential information rely on item-level, feature-based descriptions to account for errors in serial recall. Transposition errors within alternating similar/dissimilar letter sequences are thought to derive from interactions between overlapping features. However, in two experiments, we demonstrated that the characteristics of the sequence, rather than the properties ascribed to the items themselves, determine the fates of items. Performance in alternating sequences is determined by the way that the sequences themselves induce particular prosodic rehearsal patterns, and not by the nature of the items per se. In a serial recall task, the shapes of the canonical "saw-tooth" serial position curves and transposition error probabilities at successive input-output distances were modulated by subvocal rehearsal strategies, despite all item-based parameters being held constant. We replicated this finding using nonalternating lists, thus demonstrating that transpositions are substantially influenced by prosodic features, such as stress, that emerge during subvocal rehearsal.
van den Akker, Jeroen; Mishne, Gilad; Zimmer, Anjali D; Zhou, Alicia Y
2018-04-17
Next generation sequencing (NGS) has become a common technology for clinical genetic tests. The quality of NGS calls varies widely and is influenced by features such as reference sequence characteristics, read depth, and mapping accuracy. With recent advances in NGS technology and software tools, the majority of variants called using NGS alone are in fact accurate and reliable. However, a small subset of difficult-to-call variants still requires orthogonal confirmation. For this reason, many clinical laboratories confirm NGS results using orthogonal technologies such as Sanger sequencing. Here, we report the development of a deterministic machine-learning-based model to differentiate between two types of variant calls: those that do not require confirmation using an orthogonal technology (high confidence), and those that require additional quality testing (low confidence). This approach allows reliable NGS-based calling in a clinical setting by identifying the few important variant calls that require orthogonal confirmation. We developed and tested the model using a set of 7179 variants identified by a targeted NGS panel and re-tested by Sanger sequencing. The model incorporated several signals of sequence characteristics and call quality to determine whether a variant was identified with high or low confidence. The model was tuned to eliminate false positives, defined as variants that were called by NGS but not confirmed by Sanger sequencing. The model achieved very high accuracy: 99.4% (95% confidence interval: ±0.03%). It categorized 92.2% (6622/7179) of the variants as high confidence, and 100% of these were confirmed to be present by Sanger sequencing. Among the variants categorized as low confidence, defined as NGS calls of low quality that are likely to be artifacts, 92.1% (513/557) were found to be absent by Sanger sequencing.
This work shows that NGS data contains sufficient characteristics for a machine-learning-based model to differentiate low from high confidence variants. Additionally, it reveals the importance of incorporating site-specific features as well as variant call features in such a model.
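"Tuned to eliminate false positives" can be read as choosing a decision threshold on the model's output so that no Sanger-unconfirmed call lands in the high-confidence bin. The Python sketch below illustrates that idea with a single hypothetical per-call score. It is not the authors' model, which combines several sequence and call-quality signals. The block also recomputes the proportions reported in the abstract from the stated counts.

```python
def tune_threshold(calls: list[tuple[float, bool]]) -> float:
    """Lowest score threshold such that every call scoring >= it was Sanger-confirmed.

    calls: (hypothetical model score, confirmed_by_sanger) pairs.
    """
    unconfirmed = [score for score, confirmed in calls if not confirmed]
    if not unconfirmed:
        return min(score for score, _ in calls)
    worst = max(unconfirmed)                      # highest-scoring false positive
    safe = [score for score, _ in calls if score > worst]
    return min(safe) if safe else float("inf")

def is_high_confidence(score: float, threshold: float) -> bool:
    return score >= threshold

# Proportions reported in the abstract, recomputed from the stated counts.
HIGH_CONFIDENCE_SHARE = 6622 / 7179   # share of all variants called high confidence
LOW_CONF_ABSENT_SHARE = 513 / 557     # share of low-confidence calls absent by Sanger
```

Choosing the threshold just above the highest-scoring false positive keeps the high-confidence bin free of false positives on the training data, at the cost of sending some true variants to confirmation.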
Assessing Diversity of DNA Structure-Related Sequence Features in Prokaryotic Genomes
Huang, Yongjie; Mrázek, Jan
2014-01-01
Prokaryotic genomes are diverse in terms of their nucleotide and oligonucleotide composition as well as presence of various sequence features that can affect physical properties of the DNA molecule. We present a survey of local sequence patterns which have a potential to promote non-canonical DNA conformations (i.e. different from standard B-DNA double helix) and interpret the results in terms of relationships with organisms' habitats, phylogenetic classifications, and other characteristics. Our present work differs from earlier similar surveys not only by investigating a wider range of sequence patterns in a large number of genomes but also by using a more realistic null model to assess significant deviations. Our results show that simple sequence repeats and Z-DNA-promoting patterns are generally suppressed in prokaryotic genomes, whereas palindromes and inverted repeats are over-represented. Representation of patterns that promote Z-DNA and intrinsic DNA curvature increases with increasing optimal growth temperature (OGT), and decreases with increasing oxygen requirement. Additionally, representations of close direct repeats, palindromes and inverted repeats exhibit clear negative trends with increasing OGT. The observed relationships with environmental characteristics, particularly OGT, suggest possible evolutionary scenarios of structural adaptation of DNA to particular environmental niches. PMID:24408877
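As a concrete example of one surveyed pattern class, the Python below counts reverse-complement palindromes (such as GAATTC) and compares the count with the expectation under a naive null model in which bases are independent with the observed composition. The survey explicitly uses a more realistic null model, so the expectation here is only a baseline illustration, and the sequences in the tests are made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]

def find_palindromes(seq: str, length: int) -> list[int]:
    """0-based start positions of windows equal to their own reverse complement."""
    if length % 2:
        raise ValueError("reverse-complement palindromes have even length")
    seq = seq.upper()
    return [i for i in range(len(seq) - length + 1)
            if seq[i:i + length] == reverse_complement(seq[i:i + length])]

def expected_palindromes(seq: str, length: int) -> float:
    """Expected count if bases were independent with the observed composition."""
    seq = seq.upper()
    n = len(seq)
    freq = {b: seq.count(b) / n for b in "ACGT"}
    # Each mirrored pair of positions must hold complementary bases.
    pair = 2 * freq["A"] * freq["T"] + 2 * freq["C"] * freq["G"]
    return max(n - length + 1, 0) * pair ** (length // 2)
```

The ratio of observed to expected counts is one way to express over- or under-representation; a better null model would change only the expectation.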
Coupling detrended fluctuation analysis for multiple warehouse-out behavioral sequences
NASA Astrophysics Data System (ADS)
Yao, Can-Zhong; Lin, Ji-Nan; Zheng, Xu-Zhou
2017-01-01
Interaction patterns among different warehouses could make warehouse-out behavioral sequences less predictable. First, we apply coupling detrended fluctuation analysis to the warehouse-out quantities and find that the multivariate sequences exhibit significant coupling multifractal characteristics regardless of the type of steel product. Second, we trace the sources of multifractality in the warehouse-out sequences by shuffling and surrogating the original series, and find that the fat-tailed distribution contributes more to the multifractal features than long-term memory does, again regardless of the type of steel product. From the perspective of individual warehouses, some warehouses consistently contribute more to multifractality than others. Finally, based on multiscale multifractal analysis, we propose a Hurst surface structure to investigate coupling multifractality, and show that multiple behavioral sequences exhibit significant coupling multifractal features that emerge, and are usually restricted, within relatively large time-scale intervals.
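The fluctuation function at the heart of coupling (cross-correlation) detrended fluctuation analysis can be sketched for two series as follows. This is a simplified Python illustration, not the authors' code. It uses non-overlapping segments from the start of the series only, linear detrending, the absolute detrended covariance per segment (one of several variants in the literature), and positive q only. The scaling exponent would be the slope of log F_q(s) against log s.

```python
def _profile(series: list[float]) -> list[float]:
    """Cumulative sum of deviations from the mean (the 'profile')."""
    mean = sum(series) / len(series)
    profile, acc = [], 0.0
    for value in series:
        acc += value - mean
        profile.append(acc)
    return profile

def _detrended(segment: list[float]) -> list[float]:
    """Residuals of a least-squares straight-line fit within one segment."""
    n = len(segment)
    t_mean = (n - 1) / 2
    y_mean = sum(segment) / n
    s_tt = sum((t - t_mean) ** 2 for t in range(n))
    slope = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(segment)) / s_tt
    return [y - (y_mean + slope * (t - t_mean)) for t, y in enumerate(segment)]

def dcca_fluctuation(x: list[float], y: list[float], s: int, q: float = 2.0) -> float:
    """q-th order detrended cross-correlation fluctuation F_q(s), for q > 0."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    if s < 3 or len(x) < s:
        raise ValueError("need 3 <= s <= len(series)")
    if q <= 0:
        raise ValueError("this sketch only handles q > 0")
    px, py = _profile(x), _profile(y)
    n_seg = len(px) // s
    local = []
    for v in range(n_seg):
        rx = _detrended(px[v * s:(v + 1) * s])
        ry = _detrended(py[v * s:(v + 1) * s])
        # Absolute detrended covariance of the two profiles in segment v.
        local.append(abs(sum(a * b for a, b in zip(rx, ry)) / s))
    return (sum(f ** (q / 2) for f in local) / n_seg) ** (1 / q)
```

With x equal to y and q = 2 this reduces to ordinary detrended fluctuation analysis. Varying q and reading off how the scaling exponent changes is what reveals multifractality.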
Protein classification using modified n-grams and skip-grams.
Islam, S M Ashiqul; Heil, Benjamin J; Kearney, Christopher Michel; Baker, Erich J
2018-05-01
Classification by supervised machine learning greatly facilitates the annotation of protein characteristics from their primary sequence. However, the feature generation step in this process requires detailed knowledge of the attributes used to classify the proteins. Lack of this knowledge risks the selection of irrelevant features, resulting in a faulty model. In this study, we introduce a supervised protein classification method with a novel means of automating the work-intensive feature generation step via a Natural Language Processing (NLP)-dependent model, using a modified combination of n-grams and skip-grams (m-NGSG). A meta-comparison of cross-validation accuracy on twelve training datasets from nine different published studies demonstrates a consistent increase in the accuracy of m-NGSG compared with contemporary classification and feature generation models. We expect this model to accelerate the classification of proteins from primary sequence data and to make protein characteristic prediction accessible to a broader range of scientists. m-NGSG is freely available at Bitbucket: https://bitbucket.org/sm_islam/mngsg/src. A web server is available at watson.ecs.baylor.edu/ngsg. Contact: erich_baker@baylor.edu. Supplementary data are available at Bioinformatics online.
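To show what n-gram and skip-gram features look like for a protein sequence, the Python below counts contiguous n-grams and residue pairs separated by a fixed gap. It is a generic sketch and does not reproduce the specific modifications of m-NGSG. The feature-name format ("ng:..." and "sg<gap>:...") and the default `max_n` and `max_skip` values are arbitrary choices for this illustration.

```python
from collections import Counter

def ngram_skipgram_features(seq: str, max_n: int = 2, max_skip: int = 2) -> Counter:
    """Bag of contiguous n-grams (n <= max_n) and gapped residue pairs (gap <= max_skip)."""
    feats: Counter = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(seq) - n + 1):
            feats[f"ng:{seq[i:i + n]}"] += 1
    for gap in range(1, max_skip + 1):
        # Pair residue i with residue i + gap + 1, skipping `gap` residues.
        for i in range(len(seq) - gap - 1):
            feats[f"sg{gap}:{seq[i]}_{seq[i + gap + 1]}"] += 1
    return feats
```

Such sparse count vectors can be fed to any standard supervised classifier, which is what removes the need for hand-crafted, domain-specific features.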
What’s in a URL? Genre Classification from URLs
2012-01-01
webpages with access to the content of a document and feature extraction from URLs alone. Feature Extraction from Webpages Stylistic and structural...2010). Character n-grams (sequences of n characters) are attractive because of their simplicity and because they encapsulate both lexical and stylistic ...report might be stylistic. Feature Extraction from URLs The syntactic characteristics of URLs have been fairly stable over the years. URL terms are
Uversky, Vladimir N
2015-03-01
Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) are functional proteins or regions that do not have unique 3D structures under functional conditions. Lacking a stable 3D structure, IDPs/IDPRs are therefore inherently unstable. Just as the structure and function of ordered globular proteins are determined by their amino acid sequences, the lack of unique 3D structure in IDPs/IDPRs and their disorder-based functionality are also encoded in their amino acid sequences. Because of their specific sequence features and distinctive conformational behavior, these intrinsically unstable proteins or regions have several applications in biotechnology. This review introduces some of the most characteristic features of IDPs/IDPRs (such as the peculiarities of their amino acid sequences, their major structural features, and their peculiar responses to changes in their environment) and describes how these features can be used in biotechnology, for example for the proteome-wide analysis of the abundance of extended IDPs, for recombinant protein isolation and purification, as polypeptide nanoparticles for drug delivery, as solubilization tools, and as thermally sensitive carriers of active peptides and proteins. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
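One classic way of turning the sequence peculiarities of IDPs into a number is the charge-hydropathy comparison: disordered sequences tend to combine low mean hydropathy with high mean net charge. The Python below is a simplified sketch. It assumes the Kyte-Doolittle scale rescaled to 0-1, averages over the whole sequence instead of using a sliding window, and applies the commonly quoted boundary ⟨H⟩ = (⟨R⟩ + 1.151)/2.785. These constants are quoted from memory and should be checked against the primary literature before being relied on. This record does not itself describe the method.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def charge_hydropathy(seq: str) -> tuple[float, float]:
    """Mean normalized hydropathy <H> (0..1) and mean absolute net charge <R>."""
    seq = seq.upper()                    # non-standard residues raise KeyError
    n = len(seq)
    mean_h = sum((KYTE_DOOLITTLE[aa] + 4.5) / 9.0 for aa in seq) / n
    net = seq.count("K") + seq.count("R") - seq.count("D") - seq.count("E")
    return mean_h, abs(net) / n

def predicted_disordered(seq: str) -> bool:
    """True if the sequence falls on the 'disordered' side of the boundary line."""
    mean_h, mean_r = charge_hydropathy(seq)
    return mean_h < (mean_r + 1.151) / 2.785
```

A highly charged, polar stretch such as poly-Glu falls on the disordered side, while a hydrophobic stretch such as poly-Ile does not.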
Abstract feature codes: The building blocks of the implicit learning system.
Eberhardt, Katharina; Esser, Sarah; Haider, Hilde
2017-07-01
According to the Theory of Event Coding (TEC; Hommel, Müsseler, Aschersleben, & Prinz, 2001), action and perception are represented in a shared format in the cognitive system by means of feature codes. In implicit sequence learning research, it is still common to draw a conceptual distinction between independent motor and perceptual sequences. This supposedly independent learning takes place in encapsulated modules (Keele, Ivry, Mayr, Hazeltine, & Heuer, 2003) that process information along single dimensions. These dimensions have remained underspecified so far. In particular, it is unclear whether stimulus and response characteristics are processed in separate modules. Here, we suggest that feature dimensions as described in the TEC should be viewed as the basic content of modules of implicit learning. This means that the modules process all stimulus and response information related to certain feature dimensions of the perceptual environment. In 3 experiments using a serial reaction time task, we investigated the nature of the basic units of implicit learning. As a test case, we used stimulus location sequence learning. The results show that a stimulus location sequence and a response location sequence cannot be learned without interference (Experiment 2) unless one of the sequences can be coded via an alternative, nonspatial dimension (Experiment 3). These results support the notion that spatial location is one module of the implicit learning system and, consequently, that there are no separate processing units for stimulus versus response locations. (PsycINFO Database Record (c) 2017 APA, all rights reserved).
A Shellcode Detection Method Based on Full Native API Sequence and Support Vector Machine
NASA Astrophysics Data System (ADS)
Cheng, Yixuan; Fan, Wenqing; Huang, Wei; An, Jing
2017-09-01
Dynamic monitoring of program behavior is widely used to discriminate between benign programs and malware. Such monitoring is usually based on the dynamic characteristics of a program, such as its API call sequence or API call frequency. The key innovation of this paper is to consider the full Native API sequence and to use a support vector machine to detect shellcode. We also use a Markov chain to extract and digitize Native API sequence features. Our experimental results show that the proposed method has high accuracy and a low detection rate.
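One plausible reading of "use the Markov chain to extract and digitize Native API sequence features" is to estimate a first-order transition-probability matrix over a fixed API vocabulary and flatten it into a fixed-length vector. The Python below implements that reading as a hypothetical illustration, not the paper's exact feature definition. The API names in the tests are real ntdll exports but were chosen arbitrarily. The resulting vectors could then be used to train an SVM classifier, for example scikit-learn's `sklearn.svm.SVC`.

```python
def markov_features(calls: list[str], vocabulary: list[str]) -> list[float]:
    """Flattened first-order transition-probability matrix (row-major, len(vocab)**2)."""
    index = {api: i for i, api in enumerate(vocabulary)}
    k = len(vocabulary)
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(calls, calls[1:]):
        if a in index and b in index:        # calls outside the vocabulary are ignored
            counts[index[a]][index[b]] += 1
    features: list[float] = []
    for row in counts:
        total = sum(row)
        # Row-normalize; rows with no outgoing transitions stay all zero.
        features.extend(c / total if total else 0.0 for c in row)
    return features
```

A fixed vocabulary keeps every trace's vector the same length, which a standard SVM requires.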
Mori, Kazuki; Kadooka, Chihiro; Masuda, Chika; Muto, Ai; Okutsu, Kayu; Yoshizaki, Yumiko; Takamine, Kazunori; Futagami, Taiki; Tamaki, Hisanori
2017-10-12
Here, we report a draft genome sequence of Saccharomyces cerevisiae strain Kagoshima no. 2, which is used for brewing shōchū, a traditional distilled spirit in Japan. The genome data will facilitate an understanding of the evolutionary traits and genetic background related to the characteristic features of strain Kagoshima no. 2. Copyright © 2017 Mori et al.
Mori, Kazuki; Kadooka, Chihiro; Masuda, Chika; Muto, Ai; Okutsu, Kayu; Yoshizaki, Yumiko; Takamine, Kazunori; Tamaki, Hisanori
2017-01-01
Here, we report a draft genome sequence of Saccharomyces cerevisiae strain Kagoshima no. 2, which is used for brewing shōchū, a traditional distilled spirit in Japan. The genome data will facilitate an understanding of the evolutionary traits and genetic background related to the characteristic features of strain Kagoshima no. 2. PMID:29025949
Codebook-based electrooculography data analysis towards cognitive activity recognition.
Lagodzinski, P; Shirahama, K; Grzegorzek, M
2018-04-01
With advances in mobile/wearable technology, people have started to use a variety of sensing devices to track their daily activities as well as their health and fitness, in order to improve their quality of life. This work addresses eye movement analysis, which, owing to its strong correlation with cognitive tasks, can be successfully used for activity recognition. Eye movements are recorded using an electrooculography (EOG) system built into the frames of glasses, which can be worn more unobtrusively and comfortably than other devices. Since the obtained information is low-level sensor data expressed as a sequence of values sampled at constant intervals (100 Hz), the cognitive activity recognition problem is formulated as sequence classification. However, it is unclear what kinds of features are useful for accurate cognitive activity recognition. Therefore, a codebook approach is applied which, instead of focusing on feature engineering, uses the distribution of characteristic subsequences (codewords) to describe sequences of recorded EOG data; the codewords are obtained by clustering a large number of subsequences. Further, statistical analysis of the codeword distribution reveals features that are characteristic of a certain activity class. Experimental results demonstrate good accuracy of codebook-based cognitive activity recognition, reflecting the effective use of the codewords. Copyright © 2017 Elsevier Ltd. All rights reserved.
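The encoding step of the codebook approach is straightforward: slide a window over the signal, assign each subsequence to its nearest codeword, and describe the whole recording by the normalized histogram of assignments. The Python below illustrates this. It assumes the codebook has already been learned (for example by k-means over many subsequences, not shown). The window width, stride and toy signal are hypothetical, and the sketch is not the authors' implementation.

```python
def subsequences(signal: list[float], width: int, stride: int) -> list[tuple[float, ...]]:
    """Fixed-width windows taken every `stride` samples."""
    return [tuple(signal[i:i + width]) for i in range(0, len(signal) - width + 1, stride)]

def codeword_histogram(signal: list[float], codebook: list[tuple[float, ...]],
                       width: int, stride: int) -> list[float]:
    """Relative frequency of each codeword among the signal's subsequences."""
    subs = subsequences(signal, width, stride)
    hist = [0.0] * len(codebook)
    for sub in subs:
        # Nearest codeword by squared Euclidean distance.
        j = min(range(len(codebook)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(sub, codebook[c])))
        hist[j] += 1.0
    return [h / len(subs) for h in hist] if subs else hist
```

The histogram has the same length for every recording, so it can be passed directly to a standard classifier, and codewords whose frequency differs between activity classes point to the characteristic features mentioned in the abstract.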
O'Rahilly, R; Müller, F; Hutchins, G M; Moore, G W
1987-09-01
The sequence of events in the development of the brain in human embryos, already published for stages 8-15, is here continued for stages 16 and 17. With the aid of a computerized bubble-sort algorithm, 71 individual embryos were ranked in ascending order of the features present. Whereas these numbered 100 in the previous study, the increasing structural complexity gave 27 new features in the two stages now under investigation. The chief characteristics of stage 16 (approximately 37 postovulatory days) are protruding basal nuclei, the caudal olfactory elevation (olfactory tubercle), the tectobulbar tracts, and ascending fibers to the cerebellum. The main features of stage 17 (approximately 41 postovulatory days) are the cortical nucleus of the amygdaloid body, an intermediate layer in the tectum mesencephali, the posterior commissure, and the habenulo-interpeduncular tract. In addition, a typical feature at stage 17 is the crescentic shape of the lens cavity.
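The ranking step can be sketched as a bubble sort, assuming each embryo is scored simply by how many of the catalogued features it shows. The published procedure may have used a more elaborate ordering criterion. The data layout and the function name are hypothetical.

```python
def bubble_sort_by_features(embryos):
    """Rank embryos in ascending order of the number of features present.

    `embryos` maps an embryo identifier to the set of features observed in it.
    Returns identifiers from least to most developed. The sort is stable, so
    embryos with equal counts keep their input order.
    """
    order = list(embryos)
    n = len(order)
    for end in range(n - 1, 0, -1):
        swapped = False
        for i in range(end):
            if len(embryos[order[i]]) > len(embryos[order[i + 1]]):
                order[i], order[i + 1] = order[i + 1], order[i]
                swapped = True
        if not swapped:  # no swaps in a full pass: already sorted
            break
    return order
```

With `{"A": {"f1", "f2", "f3"}, "B": {"f1"}, "C": {"f1", "f2"}}` the function returns `["B", "C", "A"]`.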
Complete mitochondrial genome of the larch hawk moth, Sphinx morio (Lepidoptera: Sphingidae).
Kim, Min Jee; Choi, Sei-Woong; Kim, Iksoo
2013-12-01
The larch hawk moth, Sphinx morio, belongs to the lepidopteran family Sphingidae, which has long been studied as a family of model insects in diverse fields. In this study, we describe the complete mitochondrial genome (mitogenome) sequence of this species in terms of general genomic features and characteristic short repetitive sequences found in the A + T-rich region. The 15,299-bp-long genome consisted of a typical set of genes (13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes) and one major non-coding A + T-rich region, with the typical arrangement found in Lepidoptera. The 316-bp-long A + T-rich region located between srRNA and tRNA(Met) harbored the conserved sequence blocks that are typically found in lepidopteran insects. Additionally, the A + T-rich region of S. morio contained three kinds of characteristic repeat sequences that are rarely found in Lepidoptera: two identical 12-bp repeats, three identical 5-bp-long tandem repeats, and six nearly identical 5-6-bp-long repeats.
Centromere-Like Regions in the Budding Yeast Genome
Lefrançois, Philippe; Auerbach, Raymond K.; Yellman, Christopher M.; Roeder, G. Shirleen; Snyder, Michael
2013-01-01
Accurate chromosome segregation requires centromeres (CENs), the DNA sequences where kinetochores form, to attach chromosomes to microtubules. In contrast to most eukaryotes, which have broad centromeres, Saccharomyces cerevisiae possesses sequence-defined point CENs. Chromatin immunoprecipitation followed by sequencing (ChIP–Seq) reveals colocalization of four kinetochore proteins at novel, discrete, non-centromeric regions, especially when levels of the centromeric histone H3 variant, Cse4 (a.k.a. CENP-A or CenH3), are elevated. These regions of overlapping protein binding enhance the segregation of plasmids and chromosomes and have thus been termed Centromere-Like Regions (CLRs). CLRs form in close proximity to S. cerevisiae CENs and share characteristics typical of both point and regional CENs. CLR sequences are conserved among related budding yeasts. Many genomic features characteristic of CLRs are also associated with these conserved homologous sequences from closely related budding yeasts. These studies provide general and important insights into the origin and evolution of centromeres. PMID:23349633
RISSC: a novel database for ribosomal 16S–23S RNA genes spacer regions
García-Martínez, Jesús; Bescós, Ignacio; Rodríguez-Sala, Jesús Javier; Rodríguez-Valera, Francisco
2001-01-01
A novel database, under the acronym RISSC (Ribosomal Intergenic Spacer Sequence Collection), has been created. It compiles more than 1600 entries of edited DNA sequence data from the 16S–23S ribosomal spacers present in most prokaryotes and organelles (e.g. mitochondria and chloroplasts) and is accessible through the Internet (http://ulises.umh.es/RISSC), where systematic searches for specific words can be conducted, as well as BLAST-type sequence searches. Additionally, a characteristic feature of this region, the presence/absence and nature of tRNA genes within the spacer, is included in all the entries, even when not previously indicated in the original database. All these combined features could provide a useful documentation tool for studies on evolution, identification, typing and strain characterization, among others. PMID:11125084
Characteristics of arachnoids from Magellan data
NASA Technical Reports Server (NTRS)
Dawson, C. B.; Crumpler, L. S.
1993-01-01
Current high-resolution Magellan data enable more detailed geological study of arachnoids, which were first identified by Barsukov et al. as features characterized by a combination of radar-bright concentric rings and radiating lineations, and were named 'arachnoids' for their spider- and web-like appearance. Identification of arachnoids in Magellan data has been based on SAR images, in keeping with the original definition. However, the features that different workers identify as arachnoids, coronae (predominantly bright rings), and novae (predominantly radiating lineations) overlap to some extent, as all of these features share some common characteristics. Features used in this survey were chosen based on their classification as arachnoids in Head et al.'s catalog and on SAR characteristics matching Barsukov et al.'s original definition. All 259 arachnoids currently identified on Venus were considered in this study. Fifteen arachnoids from different regions, chosen for their 'type' characteristics and lack of deformation by other regional processes, were studied in depth, using SAR and altimetric data to map and profile these arachnoids in an attempt to better determine their geologic and altimetric characteristics and possible formation sequences.
Rotation invariant features for wear particle classification
NASA Astrophysics Data System (ADS)
Arof, Hamzah; Deravi, Farzin
1997-09-01
This paper investigates the ability of a set of rotation-invariant features to classify images of wear particles found in used lubricating oil from machinery. The rotation invariance of the features derives from the property that the magnitudes of Fourier transform coefficients do not change under a (circular) shift of the input elements. By analyzing individual circular neighborhoods centered at every pixel in an image, local and global texture characteristics of the image can be described. A number of input sequences are formed from the intensities of pixels on concentric rings of various radii measured from the center of each neighborhood. Fourier transforming these sequences generates coefficients whose magnitudes are invariant to rotation. Rotation-invariant features extracted from these coefficients were used to classify wear particle images obtained from a number of different particles captured at different orientations. In an experiment involving images of 6 classes, the circular neighborhood features achieved a 91% recognition rate, which compares favorably with the 76% rate achieved by features of a 6 by 6 co-occurrence matrix.
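The invariance argument can be checked with a short sketch. Sample intensities on a ring, take DFT magnitudes, and observe that circularly shifting the ring sequence leaves the magnitudes unchanged. This is an assumption-laden illustration: nearest-pixel sampling and a direct O(n^2) DFT are used for clarity, and the function names are hypothetical. A rotation by a multiple of 360°/`n_points` corresponds to a circular shift of the sampled sequence. Other rotation angles are only approximately a shift.

```python
import cmath
import math


def dft_magnitudes(samples):
    """|X_k| of the discrete Fourier transform, computed directly (O(n^2))."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n)
    ]


def ring_sequence(image, cy, cx, radius, n_points):
    """Nearest-pixel intensities at n_points equally spaced angles on a ring.

    `image` is a list of rows; the ring must lie inside the image bounds.
    """
    seq = []
    for p in range(n_points):
        theta = 2 * math.pi * p / n_points
        y = round(cy + radius * math.sin(theta))
        x = round(cx + radius * math.cos(theta))
        seq.append(image[y][x])
    return seq
```

For example, `dft_magnitudes([3, 1, 4, 1, 5, 9, 2, 6])` and `dft_magnitudes([1, 5, 9, 2, 6, 3, 1, 4])` (the same ring shifted by three positions) agree to floating-point precision. Features built from these magnitudes therefore do not depend on the starting angle.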
Hsing, Michael; Cherkasov, Artem
2008-06-25
Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.
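As a rough sketch of the first step, locating indel sites in a pairwise alignment of two similar proteins, the function below reports each run of gap characters. It assumes gapped strings that use `-` as the gap symbol. The function name and output format are hypothetical, and the sketch is not Indel PDB's actual pipeline.

```python
def indel_sites(aligned_a, aligned_b):
    """Return (start, end, inserted_in) for each gap run in a pairwise alignment.

    Coordinates are 0-based alignment columns, end exclusive. `inserted_in`
    names the sequence ("a" or "b") that carries residues opposite the gap.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    sites = []
    n = len(aligned_a)
    i = 0
    while i < n:
        if aligned_a[i] == "-" or aligned_b[i] == "-":
            gapped = aligned_a if aligned_a[i] == "-" else aligned_b
            inserted_in = "b" if gapped is aligned_a else "a"
            start = i
            while i < n and gapped[i] == "-":  # extend over the whole gap run
                i += 1
            sites.append((start, i, inserted_in))
        else:
            i += 1
    return sites
```

For `"MKT--LV"` against `"MKTAQLV"`, this reports a single two-residue insertion in the second sequence, `(3, 5, "b")`. Structural annotation (secondary structure, solvent accessibility) would then be attached per site.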
NASA Technical Reports Server (NTRS)
Winker, S.; Woese, C. R.
1991-01-01
The number of small subunit rRNA sequences is now great enough that the three domains Archaea, Bacteria and Eucarya (Woese et al., 1990) can be reliably defined in terms of their sequence "signatures". Approximately 50 homologous positions (or nucleotide pairs) in the small subunit rRNA characterize and distinguish among the three. In addition, the three can be recognized by a variety of nonhomologous rRNA characters, either individual positions and/or higher-order structural features. The Crenarchaeota and the Euryarchaeota, the two archaeal kingdoms, can also be defined and distinguished by their characteristic compositions at approximately fifteen positions in the small subunit rRNA molecule.
Plastid-targeting peptides from the chlorarachniophyte Bigelowiella natans.
Rogers, Matthew B; Archibald, John M; Field, Matthew A; Li, Catherine; Striepen, Boris; Keeling, Patrick J
2004-01-01
Chlorarachniophytes are marine amoeboflagellate protists that have acquired their plastid (chloroplast) through secondary endosymbiosis with a green alga. As in other algae, most of the proteins necessary for plastid function are encoded in the nuclear genome of the secondary host. These proteins are targeted to the organelle using a bipartite leader sequence consisting of a signal peptide (allowing entry into the endomembrane system) and a chloroplast transit peptide (for transport across the chloroplast envelope membranes). We have examined the leader sequences from 45 full-length predicted plastid-targeted proteins from the chlorarachniophyte Bigelowiella natans with the goal of understanding important features of these sequences and possible conserved motifs. The chemical characteristics of these sequences were compared with those of a set of 10 B. natans endomembrane-targeted proteins and 38 cytosolic or nuclear proteins. The comparison shows that the signal peptides are similar to those of most other eukaryotes, while the transit peptides differ from those of other algae in some characteristics. Consistent with this, the leader sequence from one B. natans protein was tested for function in the apicomplexan parasite Toxoplasma gondii and shown to direct the secretion of the protein.
Chen, Peng; Li, Jinyan; Wong, Limsoon; Kuwahara, Hiroyuki; Huang, Jianhua Z; Gao, Xin
2013-08-01
Hot spot residues of proteins are fundamental interface residues that help proteins perform their functions. Detecting hot spots by experimental methods is costly and time-consuming. Sequence and structural information has been widely used in the computational prediction of hot spots. However, structural information is not always available. In this article, we investigated the problem of identifying hot spots using only physicochemical characteristics extracted from amino acid sequences. We first extracted 132 relatively independent physicochemical features from the set of 544 properties in AAindex1, an amino acid index database. Each feature was used to train a classification model for hot spot prediction with a novel encoding schema and the IBk algorithm, an extension of the k-nearest neighbor algorithm. Combinations of the individual classifiers were explored, and the classifiers that appeared frequently in the top-performing combinations were selected. The hot spot predictor was built as an ensemble of these classifiers that makes its prediction by voting. Experimental results demonstrated that our method effectively exploited the feature space and allowed flexible weights of features for different queries. On the commonly used hot spot benchmark sets, our method significantly outperformed other machine learning algorithms and state-of-the-art hot spot predictors. The program is available at http://sfb.kaust.edu.sa/pages/software.aspx. Copyright © 2013 Wiley Periodicals, Inc.
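The ensemble structure (one nearest-neighbour classifier per feature, combined by voting) can be sketched as below. This is a simplified stand-in under stated assumptions. Each per-feature model is a plain k-NN on a single scalar value rather than IBk with the paper's encoding schema, the vote is an unweighted majority, and the classifier-selection step is omitted. The function names are hypothetical.

```python
from collections import Counter


def knn_predict(train_x, train_y, x, k=3):
    """k-nearest-neighbour majority label using one scalar feature."""
    ranked = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    votes = Counter(train_y[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]


def ensemble_predict(train_rows, train_y, row, feature_ids, k=3):
    """Train one k-NN model per selected feature; return the majority label."""
    votes = Counter(
        knn_predict([r[f] for r in train_rows], train_y, row[f], k)
        for f in feature_ids
    )
    return votes.most_common(1)[0][0]
```

Using an odd number of voters (and an odd `k` for two classes) avoids ties. With ties, `Counter.most_common` falls back to the order in which labels were first counted.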
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilson, Andrew; Haass, Michael; Rintoul, Mark Daniel
GazeAppraise advances the state of the art of gaze pattern analysis using methods that simultaneously analyze spatial and temporal characteristics of gaze patterns. GazeAppraise enables novel research in visual perception and cognition; for example, using shape features as distinguishing elements to assess individual differences in visual search strategy. Given a set of point-to-point gaze sequences, hereafter referred to as scanpaths, the method constructs multiple descriptive features for each scanpath. Once the scanpath features have been calculated, they are used to form a multidimensional vector representing each scanpath, and cluster analysis is performed on the set of vectors from all scanpaths. An additional benefit of this method is the identification of causal or correlated characteristics of the stimuli, subjects, and visual task through statistical analysis of descriptive metadata distributions within and across clusters.
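To make the "scanpath to feature vector" step concrete, here is a sketch with four simple geometric descriptors. These descriptors are hypothetical: GazeAppraise's actual feature set is not described in this record. The resulting vectors, typically standardized per feature, could then be passed to any standard clustering method such as k-means.

```python
import math


def scanpath_features(points):
    """Hypothetical descriptors for one scanpath given as (x, y) gaze points.

    Returns [path length, straightness, bounding-box area, number of points],
    where straightness = net displacement / path length (1.0 for a straight path).
    """
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    path_length = sum(steps)
    displacement = math.dist(points[0], points[-1])
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return [
        path_length,
        displacement / path_length if path_length else 0.0,
        (max(xs) - min(xs)) * (max(ys) - min(ys)),
        len(points),
    ]
```

For the three-point scanpath `[(0, 0), (3, 4), (3, 8)]`, the path length is 9, the bounding-box area is 24, and the straightness is √73 / 9. `math.dist` requires Python 3.8 or later.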
NASA Astrophysics Data System (ADS)
Ulomov, V. I.; Danilova, T. I.; Medvedeva, N. S.; Polyakova, T. P.
2006-07-01
The Scythian-Turan platform, together with the Alpine Iran-Caucasus-Anatolia and Hercynian Central Tien Shan orogenic structures adjacent to it, represents a coherent seismogeodynamic system responsible for regional seismicity features in the territory under consideration. Investigations of the spatiotemporal and energy evolution of seismogeodynamic processes along the main lineament structures of the orogen reveal characteristic features directly related to the prediction of seismic hazard in this region, as well as in southern European Russia. These characteristics primarily include kinematic features in the sequences of seismic events of various magnitudes and an ordered migration of seismic activation, enabling the more or less reliable determination of the occurrence time intervals (years) and areas of forthcoming large earthquakes (magnitudes of 7.0 ± 0.2, 7.5 ± 0.2, and 8.0 ± 0.2).
The 11 Micron Emissions of Carbon Stars
NASA Technical Reports Server (NTRS)
Goebel, J. H.; Cheeseman, P.; Gerbault, F.
1995-01-01
A new classification scheme of the IRAS LRS carbon stars is presented. It comprises the separation of 718 probable carbon stars into 12 distinct self-similar spectral groupings. Continuum temperatures are assigned and range from 470 to 5000 K. Three distinct dust species are identifiable: SiC, alpha:C-H, and MgS. In addition to the narrow 11+ micron emission feature that is commonly attributed to SiC, a broad 11+ micron emission feature, which is correlated with the 8.5 and 7.7 micron features, is found and attributed to alpha:C-H. SiC and alpha:C-H band strengths are found to correlate with the temperature progression among the Classes. We find a spectral sequence of Classes that reflects the carbon star evolutionary sequence of spectral types or, alternatively, developmental sequences of grain condensation in carbon-rich circumstellar shells. If decreasing temperature corresponds to increasing evolution, then decreasing temperature corresponds to increasing C/O, resulting in increasing amounts of carbon-rich dust, namely alpha:C-H. If decreasing temperature corresponds to a grain condensation sequence, then heterogeneous, or induced, nucleation scenarios are supported. SiC grains precede alpha:C-H and form the nuclei for the condensation of the latter material. At still lower temperatures, MgS appears to be quite prevalent. No 11.3 micron PAH features are identified in any of the 718 carbon stars. However, one of the coldest objects, IRAS 15048-5702, and a few others display an 11.9 micron emission feature characteristic of laboratory samples of coronene. That feature corresponds to the C-H out-of-plane deformation mode of aromatic hydrocarbons. This band indicates the presence of unsaturated, sp3, hydrocarbon bonds that may subsequently evolve into saturated bonds, sp2, if and when the star enters the planetary nebula phase of stellar evolution.
The effusion of hydrogen from the hydrocarbon grain results in the evolution in wavelength of this 11.9 micron emission feature to the 11.3 micron feature.
2011-01-01
Background Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction is increasingly important for understanding the essence of protein interactions and for narrowing the search space in drug design. Many computational methods based on different features have been developed. However, both a comparative assessment of these features and more effective and accurate methods are still urgently needed. Results In this study, we first comprehensively collect features that discriminate hot spots from non-hot spots and analyze their distributions. We find that hot spots have lower relASA and a larger relative change in ASA, suggesting that hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy do not differ significantly between hot spots and non-hot spots in the Ab+ dataset (all complexes). In the Ab- dataset (antigen-antibody complexes excluded), however, these two features differ significantly between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of feature combinations using support vector machines (SVMs). The results indicate that sequence-based features outperform the other feature combinations with reasonable accuracy: a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on the independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method by predicting the hot spots of two protein complexes.
Conclusion Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots. PMID:21798070
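The reported test-set figures can be checked for internal consistency. F1 is the harmonic mean of precision and recall, so a precision of 0.69 and a recall of 0.68 should give an F1 of about 0.68. The sketch below uses hypothetical helper names and also shows how the three metrics follow from confusion-matrix counts.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, f1_from(precision, recall)


# Consistency check on the reported precision (0.69) and recall (0.68):
print(round(f1_from(0.69, 0.68), 2))  # 0.68
```

Here 2 × 0.69 × 0.68 / 1.37 ≈ 0.685, which matches the reported F1 of 0.68.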
Saravanan, Konda Mani; Dunker, A Keith; Krishnaswamy, Sankaran
2017-12-27
More than 60 prediction methods for intrinsically disordered proteins (IDPs) have been developed over the years, many of which are accessible on the World Wide Web. Nearly all of these predictors give balanced accuracies in the ~65% to ~80% range. Since predictors are not perfect, further studies are required to uncover the role of amino acid residues in native IDP regions as compared to predicted IDP regions. In the present work, we make use of sequences of 100% predicted IDP regions, false positive disorder predictions, and experimentally determined IDP regions to distinguish the characteristics of native versus predicted IDP regions. A higher occurrence of asparagine is observed in sequences of native IDP regions but not in sequences of false positive predictions of IDP regions. The occurrences of certain combinations of amino acids at the pentapeptide level provide a distinguishing feature of IDPs with respect to globular proteins. The distinguishing features presented in this paper provide insights into the sequence fingerprints of amino acid residues in experimentally determined as compared to predicted IDP regions. These observations, and additional work along these lines, should enable improvements in the accuracy of disorder prediction algorithms.
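Pentapeptide-level comparisons start from overlapping 5-mer counts. The following is a minimal sketch under the assumption that sequences are plain one-letter amino acid strings. It compares pooled 5-mer frequencies between two sets, for example native IDP regions versus globular proteins. The function names and the simple frequency-difference score are illustrative choices, not the authors' statistic.

```python
from collections import Counter


def kmer_counts(sequence, k=5):
    """Count overlapping k-mers (pentapeptides when k=5) in one sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def kmer_frequencies(sequences, k=5):
    """Pool k-mer counts over a set of sequences and normalise to frequencies."""
    total = Counter()
    for seq in sequences:
        total.update(kmer_counts(seq, k))
    n = sum(total.values())
    return {kmer: count / n for kmer, count in total.items()} if n else {}


def frequency_difference(set_a, set_b, k=5):
    """Per-k-mer frequency in set_a minus that in set_b (positive: enriched in set_a)."""
    fa, fb = kmer_frequencies(set_a, k), kmer_frequencies(set_b, k)
    return {kmer: fa.get(kmer, 0.0) - fb.get(kmer, 0.0) for kmer in fa.keys() | fb.keys()}
```

In practice, a significance test or a pseudocount-smoothed log ratio would be more robust than a raw difference, because most of the 20^5 possible pentapeptides are rare.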
Angart, Phillip A.; Carlson, Rebecca J.; Adu-Berchie, Kwasi
2016-01-01
Efficient short interfering RNA (siRNA)-mediated gene silencing requires selection of a sequence that is complementary to the intended target and possesses sequence and structural features that encourage favorable functional interactions with the RNA interference (RNAi) pathway proteins. In this study, we investigated how terminal sequence and structural characteristics of siRNAs contribute to siRNA strand loading and silencing activity and how these characteristics ultimately result in a functionally asymmetric duplex in cultured HeLa cells. Our results reiterate that the most important characteristic in determining siRNA activity is the 5′ terminal nucleotide identity. Our findings further suggest that siRNA loading is controlled principally by the hybridization stability of the 5′ terminus (Nucleotides: 1–2) of each siRNA strand, independent of the opposing terminus. Postloading, RNA-induced silencing complex (RISC)–specific activity was found to be improved by lower hybridization stability in the 5′ terminus (Nucleotides: 3–4) of the loaded siRNA strand and greater hybridization stability toward the 3′ terminus (Nucleotides: 17–18). Concomitantly, specific recognition of the 5′ terminal nucleotide sequence by human Argonaute 2 (Ago2) improves RISC half-life. These findings indicate that careful selection of siRNA sequences can maximize both the loading and the specific activity of the intended guide strand. PMID:27399870
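The loading rule described above (the strand whose 5′ terminus, nucleotides 1-2, is less stably hybridized is preferentially loaded) can be illustrated with a crude proxy. The sketch assumes a perfectly complementary duplex with both strands written 5′→3′. It uses the G/C count at each 5′ end as a stand-in for hybridization stability. A real design tool would use nearest-neighbour thermodynamic parameters instead. The function names are hypothetical, and the proxy ignores the Ago2 5′-nucleotide preference the study also reports.

```python
def terminal_gc(strand, n=2):
    """Count G/C among the first n nucleotides at the 5' end (strand written 5'->3')."""
    return sum(base in "GC" for base in strand[:n].upper())


def likely_loaded_strand(strand_a, strand_b, n=2):
    """Predict which strand ("a" or "b") is preferentially loaded.

    Proxy: fewer G/C pairs at a strand's 5' terminus means weaker pairing there,
    which favours loading of that strand. Returns None when the proxy ties.
    """
    gc_a, gc_b = terminal_gc(strand_a, n), terminal_gc(strand_b, n)
    if gc_a == gc_b:
        return None
    return "a" if gc_a < gc_b else "b"
```

Because the duplex is assumed to be complementary, the G/C count of a strand's own 5′-terminal bases equals the number of G·C pairs at that end of the duplex. That is why the sketch only needs to inspect each strand's first nucleotides.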
NASA Astrophysics Data System (ADS)
Ávila-Barrientos, L.; Zúñiga, F. R.; Rodríguez-Pérez, Q.; Guzmán-Speziale, M.
2015-11-01
Aftershock sequences along the Mexican subduction margin (between 110°W and 91°W) were analyzed by means of the p value from the Omori-Utsu relation and the b value from the Gutenberg-Richter relation. We focused on recent medium to large (Mw > 5.6) events considered capable of generating aftershock sequences suitable for analysis. The main goal was to find a possible correlation between aftershock parameters and plate characteristics, such as displacement rate, age and segmentation. The Mexican subduction regime is one of the most active regions in the world, with a high frequency of medium to large events, and plate characteristics change along the subduction margin. Previous studies have observed differences in seismic source characteristics along the subduction regime, which may indicate differences in rheology and possible segmentation. The analysis of the aftershock sequences indicates a slight tendency for p values to decrease from west to east with increasing plate age, although statistical significance is undermined by the small number of aftershocks in the sequences, a feature that distinguishes this region from other subduction regimes worldwide. The b values show the opposite trend, increasing towards the east, although the statistical significance is not sufficient to validate this trend. A linear regression between the two parameters provides additional support for the inverse relation. Moreover, we calculated the seismic coupling coefficient, which shows a direct relation with the p and b values. While we cannot unequivocally confirm the hypothesis that aftershock generation depends on certain tectonic characteristics (age, thickness, temperature), our results do not reject it, thus encouraging further study of this question.
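For reference, the two relations can be written down directly. The Gutenberg-Richter law is log10 N = a - bM, and the Omori-Utsu law gives the aftershock rate as n(t) = K / (c + t)^p. The sketch below shows the widely used Aki (1965) maximum-likelihood b-value estimator and the Omori-Utsu rate. The record does not say which estimators the authors used, so treat both the method choice and the function names as assumptions. Fitting p itself, for example by maximum likelihood over the event times, is not shown.

```python
import math


def aki_b_value(magnitudes, completeness):
    """Maximum-likelihood b-value (Aki, 1965) from events with M >= completeness.

    Assumes continuous magnitudes. For magnitudes binned at width dM, the usual
    correction uses (completeness - dM / 2) in the denominator instead.
    """
    above = [m for m in magnitudes if m >= completeness]
    if not above:
        raise ValueError("no events at or above the completeness magnitude")
    excess = sum(above) / len(above) - completeness
    if excess <= 0:
        raise ValueError("mean magnitude must exceed the completeness magnitude")
    return math.log10(math.e) / excess


def omori_utsu_rate(t, K, c, p):
    """Aftershock rate n(t) = K / (c + t)**p at time t after the mainshock."""
    return K / (c + t) ** p
```

A larger p means the aftershock rate decays faster. A larger b means small events are relatively more numerous, which is why the two parameters are compared along the margin.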
Jeong, Hyeonsoo; Kim, Kwondo; Caetano-Anollés, Kelsey; Kim, Heebal; Kim, Byung-Ki; Yi, Jun-Koo; Ha, Jae-Jung; Cho, Seoae; Oh, Dong Yep
2016-05-24
Chicken, Gallus gallus, is a valuable species both as a food source and as a model organism for scientific research. Here, we sequenced the genome of Gyeongbuk Araucana, a rare chicken breed with unique phenotypic characteristics, including flight ability, large body size, and blue-shelled eggs, to identify its genomic features. We generated genomes of Gyeongbuk Araucana, Leghorn, and Korean Native Chicken at coverage depths of 33.5×, 35.82×, and 33.23×, respectively. Together with the genomes of 12 Chinese breeds, we identified genomic variants comprising 16.3 million SNVs and 2.3 million InDels in mapped regions. Additionally, through assembly of unmapped reads and selective sweep analysis, we identified candidate genes in the heart, vasculature and muscle development and body growth categories, which provided insight into Gyeongbuk Araucana's phenotypic traits. Finally, genetic variation based on the transposable element insertion pattern was investigated to elucidate the features of transposable elements related to blue eggshell formation. This study presents the results of the first genomic study of the Gyeongbuk Araucana breed; it has the potential to serve as an invaluable resource for future research on the genomic characteristics of this chicken breed as well as others.
Herget, Stephan; Toukach, Philip V; Ranzinger, René; Hull, William E; Knirel, Yuriy A; von der Lieth, Claus-Wilhelm
2008-01-01
Background There are considerable differences between bacterial and mammalian glycans. In contrast to most eukaryotic carbohydrates, bacterial glycans are often composed of repeating units with diverse functions ranging from structural reinforcement to adhesion, colonization and camouflage. Since bacterial glycans are typically displayed at the cell surface, they can interact with the environment and, therefore, have significant biomedical importance. Results The sequence characteristics of glycans (monosaccharide composition, modifications, and linkage patterns) for the higher bacterial taxonomic classes have been examined and compared with the data for mammals, with both similarities and unique features becoming evident. Compared to mammalian glycans, the bacterial glycans deposited in the current databases have a more than ten-fold greater diversity at the monosaccharide level, and the disaccharide pattern space is approximately nine times larger. Specific bacterial subclasses exhibit characteristic glycans which can be distinguished on the basis of distinctive structural features or sequence properties. Conclusion For the first time a systematic database analysis of the bacterial glycome has been performed. This study summarizes the current knowledge of bacterial glycan architecture and diversity and reveals putative targets for the rational design and development of therapeutic intervention strategies by comparing bacterial and mammalian glycans. PMID:18694500
Song, Jiangning; Li, Fuyi; Takemoto, Kazuhiro; Haffari, Gholamreza; Akutsu, Tatsuya; Chou, Kuo-Chen; Webb, Geoffrey I
2018-04-14
Determining the catalytic residues in an enzyme is critical to our understanding of the relationship between protein sequence, structure and function, and to enhancing our ability to design novel enzymes and their inhibitors. Although many enzymes have been sequenced and their primary and tertiary structures determined, experimental methods for enzyme functional characterization lag behind. Because experimental methods for identifying catalytic residues are resource- and labor-intensive, computational approaches have considerable value and are highly desirable for their ability to complement experimental studies in identifying catalytic residues and to help bridge the sequence-structure-function gap. In this study, we describe a new computational method called PREvaIL for predicting enzyme catalytic residues. This method was developed by leveraging a comprehensive set of informative features extracted at multiple levels, including sequence, structure, and residue-contact network, in a random forest machine-learning framework. Extensive benchmarking experiments on eight different datasets based on 10-fold cross-validation and independent tests, as well as side-by-side performance comparisons with seven modern sequence- and structure-based methods, showed that PREvaIL achieved competitive predictive performance, with areas under the receiver operating characteristic curve and under the precision-recall curve ranging from 0.896 to 0.973 and from 0.294 to 0.523, respectively. We demonstrated that this method was able to capture useful signals arising from different levels, leveraging these different but complementary types of features and allowing us to significantly improve the performance of catalytic residue prediction. We believe that this new method can be used as a valuable tool both for understanding the complex sequence-structure-function relationships of proteins and for facilitating the characterization of novel enzymes lacking functional annotations.
Copyright © 2018 Elsevier Ltd. All rights reserved.
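The ROC AUC figures quoted above have a simple probabilistic reading. The AUC is the chance that a randomly chosen catalytic residue receives a higher score than a randomly chosen non-catalytic one, with ties counted as one half. The sketch below computes it that way (the Mann-Whitney formulation) for illustration only; it is not PREvaIL's evaluation code. The much lower precision-recall AUCs reported alongside reflect how rare catalytic residues are among all residues.

```python
def roc_auc(scores, labels):
    """ROC AUC via pairwise comparison (Mann-Whitney U / (n_pos * n_neg)).

    `labels` are 1 for positives (e.g. catalytic residues) and 0 for negatives.
    O(n_pos * n_neg); fine for small illustrative sets.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

For scores `[0.9, 0.2, 0.5, 0.1]` with labels `[1, 1, 0, 0]`, three of the four positive-negative pairs are ordered correctly, so the AUC is 0.75.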
Michael, Todd P; Bryant, Douglas; Gutierrez, Ryan; Borisjuk, Nikolai; Chu, Philomena; Zhang, Hanzhong; Xia, Jing; Zhou, Junfei; Peng, Hai; El Baidouri, Moaine; Ten Hallers, Boudewijn; Hastie, Alex R; Liang, Tiffany; Acosta, Kenneth; Gilbert, Sarah; McEntee, Connor; Jackson, Scott A; Mockler, Todd C; Zhang, Weixiong; Lam, Eric
2017-02-01
Spirodela polyrhiza is a fast-growing aquatic monocot with highly reduced morphology, genome size and number of protein-coding genes. Considering these biological features of Spirodela and its basal position in the monocot lineage, understanding its genome architecture could shed light on plant adaptation and genome evolution. Like many draft genomes, however, the 158-Mb Spirodela genome sequence has not been resolved to chromosomes, and important genome characteristics have not been defined. Here we deployed rapid genome-wide physical maps combined with high-coverage short-read sequencing to resolve the 20 chromosomes of Spirodela and to empirically delineate its genome features. Our data revealed a dramatic reduction in the number of rDNA repeat units in Spirodela to fewer than 100, which is even fewer than that reported for yeast. Consistent with its unique phylogenetic position, small RNA sequencing revealed 29 Spirodela-specific microRNAs, with only two being shared with Elaeis guineensis (oil palm) and Musa balbisiana (banana). Combining DNA methylation data and small RNA sequencing enabled the accurate prediction of long terminal repeats (LTRs), yielding an LTR content of 20.5% that doubled the previous estimate, and revealed a high Solo:Intact LTR ratio of 8.2. Interestingly, we found that Spirodela has the lowest global DNA methylation level (9%) of any plant species tested. Taken together, our results reveal a genome that has undergone reduction, likely through eliminating non-essential protein-coding genes, rDNA and LTRs. In addition to delineating the genome features of this unique plant, the methodologies described and large-scale genome resources from this work will enable future evolutionary and functional studies of this basal monocot family. © 2016 The Authors The Plant Journal © 2016 John Wiley & Sons Ltd.
Joffret, Marie-Line; Jégouic, Sophie; Bessaud, Maël; Balanant, Jean; Tran, Coralie; Caro, Valerie; Holmblat, Barbara; Razafindratsimandresy, Richter; Reynes, Jean-Marc; Rakoto-Andrianarivelo, Mala; Delpeyroux, Francis
2012-05-01
Five cases of poliomyelitis due to type 2 or 3 recombinant vaccine-derived polioviruses (VDPVs) were reported in the Toliara province of Madagascar in 2005. We sequenced the genome of the VDPVs isolated from the patients and from 12 healthy children and characterized phenotypic aspects, including pathogenicity, in mice transgenic for the poliovirus receptor. We identified 6 highly complex mosaic recombinant lineages composed of sequences derived from different vaccine polioviruses and other species C human enteroviruses (HEV-Cs). Most had some recombinant genome features in common and contained nucleotide sequences closely related to certain cocirculating coxsackie A virus isolates. However, they differed in terms of their recombinant characteristics or nucleotide substitutions and phenotypic features. All VDPVs were neurovirulent in mice. This study confirms the genetic relationship between type 2 and 3 VDPVs, indicating that both types can be involved in a single outbreak of disease. Our results highlight the various ways in which a vaccine-derived poliovirus may become pathogenic in complex viral ecosystems, through frequent recombination events and mutations. Intertypic recombination between cocirculating HEV-Cs (including polioviruses) appears to be a common mechanism of genetic plasticity underlying transverse genetic variability.
O'Rahilly, R; Müller, F; Hutchins, G M; Moore, G W
1988-08-01
The sequence of events in the development of the brain in human embryos, already published for stages 8-17, is here continued for stages 18 and 19. With the aid of a computerized bubble-sort algorithm, 58 individual embryos were ranked in ascending order of the features present. The increasing structural complexity provided 40 new features in these two stages. The chief characteristics of stage 18 (approximately 44 postovulatory days) are rapidly growing basal nuclei; appearance of the extraventricular bulge of the cerebellum (flocculus), of the superior cerebellar peduncle, and of follicles in the epiphysis cerebri; and the presence of vomeronasal organ and ganglion, of the bucconasal membrane, and of isolated semicircular ducts. The main features of stage 19 (approximately 48 days) are the cochlear nuclei, the ganglion of the nervus terminalis, nuclei of the prosencephalic septum, the appearance of the subcommissural organ, the presence of villi in the choroid plexuses of the fourth and lateral ventricles, and the stria medullaris thalami.
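The ranking step can be illustrated with a short bubble sort. The Python sketch below rests on an illustrative assumption: each embryo is represented by the set of features it shows and is ranked by how many of those features it has. The authors' actual ordering criterion and data are not given here. The embryo IDs and feature names are hypothetical.

```python
def rank_embryos(embryos):
    """Bubble sort (embryo_id, features) pairs into ascending order of feature count."""
    ranked = list(embryos)  # copy so the caller's list is not reordered
    n = len(ranked)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):
            # Swap neighbours that are out of order; equal counts keep their input order.
            if len(ranked[j][1]) > len(ranked[j + 1][1]):
                ranked[j], ranked[j + 1] = ranked[j + 1], ranked[j]
                swapped = True
        if not swapped:  # no swaps in a full pass: already sorted, stop early
            break
    return [embryo_id for embryo_id, _ in ranked]


# Hypothetical embryos and features.
embryos = [
    ("E3", {"flocculus", "cochlear nuclei", "septal nuclei"}),
    ("E1", {"flocculus"}),
    ("E2", {"flocculus", "vomeronasal organ"}),
]
print(rank_embryos(embryos))  # ['E1', 'E2', 'E3']
```

Bubble sort is stable, so embryos with equal feature counts stay in the order in which they were entered.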
Mizianty, Marcin J; Kurgan, Lukasz
2009-12-13
Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, ranging between 80% and 96.7% (83.5% for the twilight-zone datasets) depending on the dataset. This translates into 19% and 8% error rate reductions compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions, at 58% accuracy, for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. 
The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.
2009-01-01
Background Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes, including conservation of residues and the arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy, ranging between 80% and 96.7% (83.5% for the twilight-zone datasets) depending on the dataset. This translates into 19% and 8% error rate reductions compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions, at 58% accuracy, for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes. 
Conclusions The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/. PMID:20003388
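MODAS describes features built from the number and arrangement of helix/strand segments in a predicted secondary structure. The Python sketch below shows how such segment-level descriptors can be derived from a 3-state string (H = helix, E = strand, C = coil). It assumes a secondary-structure prediction is already available. The function name and the exact feature set are hypothetical and are not MODAS's published feature definitions.

```python
from itertools import groupby


def segment_features(ss):
    """Segment-level features from a 3-state secondary structure string (H, E, C)."""
    runs = [(state, len(list(group))) for state, group in groupby(ss)]
    helices = [length for state, length in runs if state == "H"]
    strands = [length for state, length in runs if state == "E"]
    order = [state for state, _ in runs if state in ("H", "E")]  # segment order, coil ignored
    n = len(ss)
    return {
        "n_helix_segments": len(helices),
        "n_strand_segments": len(strands),
        "helix_fraction": sum(helices) / n if n else 0.0,
        "strand_fraction": sum(strands) / n if n else 0.0,
        # Number of helix<->strand switches between consecutive segments (an "arrangement" signal).
        "helix_strand_alternations": sum(1 for a, b in zip(order, order[1:]) if a != b),
    }


features = segment_features("CCHHHHCCEEEECCHHHCC")
```

Features like these could then be concatenated with evolutionary-profile features before training per-class classifiers.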
Using expected sequence features to improve basecalling accuracy of amplicon pyrosequencing data.
Rask, Thomas S; Petersen, Bent; Chen, Donald S; Day, Karen P; Pedersen, Anders Gorm
2016-04-22
Amplicon pyrosequencing targets a known genetic region and thus inherently produces reads highly anticipated to have certain features, such as conserved nucleotide sequence and, in the case of protein-coding DNA, an open reading frame. Pyrosequencing errors, consisting mainly of nucleotide insertions and deletions, are on the other hand likely to disrupt open reading frames. Such an inverse relationship between errors and expectation based on prior knowledge can be used advantageously to guide the process known as basecalling, i.e. the inference of nucleotide sequence from raw sequencing data. The new basecalling method described here, named Multipass, implements a probabilistic framework for working with the raw flowgrams obtained by pyrosequencing. For each sequence variant, Multipass calculates the likelihood and nucleotide sequence of the several most likely sequences given the flowgram data. This probabilistic approach enables integration of basecalling into a larger model where other parameters can be incorporated, such as the likelihood for observing a full-length open reading frame at the targeted region. We apply the method to 454 amplicon pyrosequencing data obtained from a malaria virulence gene family, where Multipass generates 20% more error-free sequences than current state-of-the-art methods, and provides sequence characteristics that allow generation of a set of high-confidence error-free sequences. This novel method can be used to increase accuracy of existing and future amplicon sequencing data, particularly where extensive prior knowledge is available about the obtained sequences, for example in analysis of the immunoglobulin VDJ region, where Multipass can be combined with a model for the known recombining germline genes. Multipass is available for Roche 454 data at http://www.cbs.dtu.dk/services/MultiPass-1.0 , and the concept can potentially be implemented for other sequencing technologies as well.
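Multipass folds the open-reading-frame expectation into a probabilistic model over flowgrams. The Python sketch below shows only the deterministic intuition behind that prior: a single-base insertion or deletion, typical of homopolymer miscalls, moves a coding read out of frame. It is not Multipass's implementation and computes no flowgram likelihood. It assumes the amplicon lies entirely inside a coding sequence and starts in the given frame. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def keeps_reading_frame(read, frame=0):
    """Heuristic: is a read spanning a coding region still an open reading frame?

    Assumes the amplicon lies inside a coding sequence, so an error-free read has a
    length that is a multiple of three (after the frame offset) and no in-frame stop
    codon before its last codon.
    """
    read = read.upper()
    if (len(read) - frame) % 3 != 0:
        return False  # a 1- or 2-base indel shifts the length out of frame
    codons = [read[i:i + 3] for i in range(frame, len(read), 3)]
    return not any(codon in STOP_CODONS for codon in codons[:-1])


reference = "ATGGCTGCAAAATTTGGC"          # hypothetical 6-codon amplicon
homopolymer_deletion = "ATGGCTGCAAATTTGGC"  # one A lost from the AAAA run
```

A probabilistic basecaller would down-weight such reads rather than discard them outright. This hard filter is only the simplest version of that idea.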
Zhao, Jiaduo; Gong, Weiguo; Tang, Yuzhen; Li, Weihong
2016-01-20
In this paper, we propose an effective human and nonhuman pyroelectric infrared (PIR) signal recognition method to reduce PIR detector false alarms. First, using the mathematical model of the PIR detector, we analyze the physical characteristics of the human and nonhuman PIR signals; second, based on the analysis results, we propose an empirical mode decomposition (EMD)-based symbolic dynamic analysis method for the recognition of human and nonhuman PIR signals. In the proposed method, we first extract the detailed features of a PIR signal into five symbol sequences using an EMD-based symbolization method; we then generate five feature descriptors for each PIR signal by constructing five probabilistic finite state automata from the symbol sequences. Finally, we use a weighted voting classification strategy to classify the PIR signals with their feature descriptors. Comparative experiments show that the proposed method can effectively classify the human and nonhuman PIR signals and reduce PIR detector false alarms.
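The symbolization, automaton, and voting steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation:

- The EMD step is omitted. The input is assumed to be one already-extracted component of a PIR signal.
- Symbolization uses equal-width bins.
- The automaton is a depth-1 PFSA with one state per symbol.

The published method may use a different partitioning scheme and automaton depth. All function names, the sample values, and the vote weights are hypothetical.

```python
def symbolize(signal, n_symbols):
    """Map each sample to a symbol 0..n_symbols-1 using equal-width bins over the signal range."""
    lo, hi = min(signal), max(signal)
    width = (hi - lo) / n_symbols or 1.0  # guard against a constant signal
    return [min(int((x - lo) / width), n_symbols - 1) for x in signal]


def pfsa_descriptor(symbols, n_symbols):
    """Depth-1 PFSA: one state per symbol; returns (transition matrix, state occupancy)."""
    counts = [[0] * n_symbols for _ in range(n_symbols)]
    for current, nxt in zip(symbols, symbols[1:]):
        counts[current][nxt] += 1
    transitions = []
    for row in counts:
        total = sum(row)
        transitions.append([c / total if total else 0.0 for c in row])
    occupancy = [symbols.count(s) / len(symbols) for s in range(n_symbols)]
    return transitions, occupancy


def weighted_vote(labels, weights):
    """Combine per-descriptor class decisions; each vote counts with its weight."""
    tally = {}
    for label, weight in zip(labels, weights):
        tally[label] = tally.get(label, 0.0) + weight
    return max(tally, key=tally.get)


symbols = symbolize([0.0, 1.0, 2.0, 3.0], n_symbols=2)
transitions, occupancy = pfsa_descriptor(symbols, n_symbols=2)
decision = weighted_vote(["human", "nonhuman", "human"], [0.2, 0.5, 0.4])
```

In a full pipeline, each of the five symbol sequences would yield its own descriptor and class decision before the weighted vote.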
Venet, Sophie; Ravn, Ulla; Buatois, Vanessa; Gueneau, Franck; Calloud, Sébastien; Kosco-Vilbois, Marie; Fischer, Nicolas
2012-01-01
Antibody repertoires are characterized by diversity as they vary not only amongst individuals and after antigen exposure but also differ significantly between vertebrate species. Such plasticity can be exploited to generate human antibody libraries featuring hallmarks of these diverse repertoires. In this study, the focus was to capture CDRH3 sequences, as this region generally accounts for most of the interaction energy with antigen. Sequences from human as well as non-human sources were successfully integrated into human antibody libraries. Next generation sequencing of these libraries proved that the CDRH3 lengths and amino acid composition corresponded to the species of origin. Specific CDRH3 sequences, biased towards the recognition of a model antigen either by immunizing mice or by selecting with phage display, were then integrated into another set of libraries. From these antigen-biased libraries, highly potent antibodies were more frequently isolated, indicating that the characteristics of an immune repertoire are transferable via CDRH3 sequences into a human antibody library. Taken together, these data demonstrate that the properties of naturally or experimentally biased repertoires can be effectively harnessed for the generation of targeted human antibody libraries, substantially increasing the probability of isolating antibodies suitable for therapeutic and diagnostic applications. PMID:22937053
Chætognath transcriptome reveals ancestral and unique features among bilaterians
Marlétaz, Ferdinand; Gilles, André; Caubit, Xavier; Perez, Yvan; Dossat, Carole; Samain, Sylvie; Gyapay, Gabor; Wincker, Patrick; Le Parco, Yannick
2008-01-01
Background The chætognaths (arrow worms) have puzzled zoologists for years because of their astonishing morphological and developmental characteristics. Despite their deuterostome-like development, phylogenomic studies recently positioned the chætognath phylum in protostomes, most likely in an early branching. This key phylogenetic position and the peculiar characteristics of chætognaths prompted further investigation of their genomic features. Results Transcriptomic and genomic data were collected from the chætognath Spadella cephaloptera through the sequencing of expressed sequence tags and genomic bacterial artificial chromosome clones. Transcript comparisons at various taxonomic scales emphasized the conservation of a core gene set and phylogenomic analysis confirmed the basal position of chætognaths among protostomes. A detailed survey of transcript diversity and individual genotyping revealed a past genome duplication event in the chætognath lineage, which was, surprisingly, followed by a high retention rate of duplicated genes. Moreover, striking genetic heterogeneity was detected within the sampled population at the nuclear and mitochondrial levels but cannot be explained by cryptic speciation. Finally, we found evidence for trans-splicing maturation of transcripts through spliced-leader addition in the chætognath phylum and we further report that this processing is associated with operonic transcription. Conclusion These findings reveal both shared ancestral and unique derived characteristics of the chætognath genome, which suggests that this genome is likely the product of a very original evolutionary history. These features promote chætognaths as a pivotal model for comparative genomics, which could provide new clues for the investigation of the evolution of animal genomes. PMID:18533022
Dynamic Encoding of Speech Sequence Probability in Human Temporal Cortex
Leonard, Matthew K.; Bouchard, Kristofer E.; Tang, Claire
2015-01-01
Sensory processing involves identification of stimulus features, but also integration with the surrounding sensory and cognitive context. Previous work in animals and humans has shown fine-scale sensitivity to context in the form of learned knowledge about the statistics of the sensory environment, including relative probabilities of discrete units in a stream of sequential auditory input. These statistics are a defining characteristic of one of the most important sequential signals humans encounter: speech. For speech, extensive exposure to a language tunes listeners to the statistics of sound sequences. To address how speech sequence statistics are neurally encoded, we used high-resolution direct cortical recordings from human lateral superior temporal cortex as subjects listened to words and nonwords with varying transition probabilities between sound segments. In addition to their sensitivity to acoustic features (including contextual features, such as coarticulation), we found that neural responses dynamically encoded the language-level probability of both preceding and upcoming speech sounds. Transition probability first negatively modulated neural responses, followed by positive modulation of neural responses, consistent with coordinated predictive and retrospective recognition processes, respectively. Furthermore, transition probability encoding was different for real English words compared with nonwords, providing evidence for online interactions with high-order linguistic knowledge. These results demonstrate that sensory processing of deeply learned stimuli involves integrating physical stimulus features with their contextual sequential structure. Despite not being consciously aware of phoneme sequence statistics, listeners use this information to process spoken input and to link low-level acoustic representations with linguistic information about word identity and meaning. PMID:25948269
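Transition probabilities between speech sounds can be estimated in two directions. The forward direction, P(next | current), relates to predicting upcoming sounds. The backward direction, P(previous | current), relates to retrospective processing. The Python sketch below computes both from phoneme sequences. The study derived its probabilities from language-level statistics. The toy lexicon and the unweighted counts used here are assumptions, and the function name is hypothetical.

```python
from collections import Counter


def transition_probabilities(words):
    """Forward P(b follows | a) and backward P(a precedes | b) from phoneme sequences."""
    pairs = Counter()
    as_first = Counter()   # how often each phoneme starts a transition
    as_second = Counter()  # how often each phoneme ends a transition
    for phones in words:
        for a, b in zip(phones, phones[1:]):
            pairs[(a, b)] += 1
            as_first[a] += 1
            as_second[b] += 1
    forward = {(a, b): n / as_first[a] for (a, b), n in pairs.items()}
    backward = {(a, b): n / as_second[b] for (a, b), n in pairs.items()}
    return forward, backward


# Hypothetical toy lexicon ("cat", "cap", "kit" in rough phoneme notation).
lexicon = [["k", "ae", "t"], ["k", "ae", "p"], ["k", "ih", "t"]]
forward, backward = transition_probabilities(lexicon)
```

Weighting each word by its corpus frequency would move these estimates closer to the language-level statistics the study describes.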
Huang, Chuen-Der; Lin, Chin-Teng; Pal, Nikhil Ranjan
2003-12-01
The structure classification of proteins plays a very important role in bioinformatics, since the relationships and characteristics among known proteins can be exploited to predict the structure of new proteins. The success of a classification system depends heavily on two things: the tools being used and the features considered. In bioinformatics applications, the role of appropriate features has not received adequate attention. In this investigation we use three novel ideas for multiclass protein fold classification. First, we use the gating neural network, where each input node is associated with a gate. This network can select important features in an online manner as learning proceeds. At the beginning of the training, all gates are almost closed, i.e., no feature is allowed to enter the network. Through the training, gates corresponding to good features are completely opened while gates corresponding to bad features are closed more tightly, and some gates may be partially open. The second novel idea is to use a hierarchical learning architecture (HLA). The classifier in the first level of HLA classifies the protein features into four major classes: all alpha, all beta, alpha + beta, and alpha/beta. In the next level, another set of classifiers further classifies the protein features into 27 folds. The third novel idea is to induce the indirect coding features from the amino-acid composition sequence of proteins based on the N-gram concept. This provides us with more representative and discriminative new local features of protein sequences for multiclass protein fold classification. The proposed HLA with new indirect coding features increases the protein fold classification accuracy by about 12%. Moreover, the gating neural network is found to reduce the number of features drastically. 
Using only half of the original features, as selected by the gating neural network, achieves test accuracy comparable to that obtained using all the original features. The gating mechanism also helps us to get a better insight into the folding process of proteins. For example, by tracking the evolution of different gates we can find which characteristics (features) of the data are more important for the folding process. It also reduces the computation time.
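N-gram features of the kind mentioned above can be computed as relative frequencies of every length-n amino-acid word. The Python sketch below uses n = 2, giving 400 features. It is a generic N-gram composition, not necessarily the exact "indirect coding" used by the authors, and the function name is hypothetical. Windows that contain non-standard residue codes are simply skipped, which is an assumption.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def ngram_composition(sequence, n=2):
    """Relative frequency of every length-n amino-acid word (20**n features, fixed order)."""
    words = ["".join(p) for p in product(AMINO_ACIDS, repeat=n)]
    counts = dict.fromkeys(words, 0)
    total = 0
    for i in range(len(sequence) - n + 1):
        word = sequence[i:i + n]
        if word in counts:  # skip windows containing non-standard residues
            counts[word] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in words]


vector = ngram_composition("ACDAC", n=2)
```

The fixed word order means every protein maps to a vector of the same length. A feature-selecting network such as the gating network described above needs that fixed-length input.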
Yu, Yi; Duan, Lian; Zhang, Qi; Liao, Rijing; Ding, Ying; Pan, Haixue; Wendt-Pienkowski, Evelyn; Tang, Gongli; Shen, Ben; Liu, Wen
2009-01-01
Nosiheptide (NOS), belonging to the e series of thiopeptide antibiotics that exhibit potent activity against various bacterial pathogens, bears a unique indole side ring system and regiospecific hydroxyl groups on the characteristic macrocyclic core. Here, cloning, sequencing and characterization of the nos gene cluster from Streptomyces actuosus ATCC 25421 as a model for this series of thiopeptides has unveiled new insights into their biosynthesis. Bioinformatics-based sequence analysis and in vivo investigation into the gene functions show that NOS biosynthesis shares a common strategy with recently characterized b or c series thiopeptides for forming the characteristic macrocyclic core, which features a ribosomally synthesized precursor peptide with conserved posttranslational modifications. However, it apparently proceeds via a different route for tailoring the thiopeptide framework, allowing the final product to exhibit the distinct structural characteristics of e series thiopeptides, such as the indole side ring system. Chemical complementation supports the notion that the S-adenosylmethionine (AdoMet)-dependent protein NosL may play a central role in converting Trp to the key 3-methylindole moiety by an unusual carbon side chain rearrangement, most likely via a radical-initiated mechanism. Characterization of the indole side ring-opened analog of NOS from the nosN mutant strain is consistent with the proposed methyltransferase activity of its encoded protein, shedding light on the timing of the individual steps for indole side ring biosynthesis. These results also suggest the feasibility of engineering novel thiopeptides for drug discovery by manipulating the NOS biosynthetic machinery. PMID:19678698
Vingron, Martin
2016-01-01
Non-methylated islands (NMIs) of DNA are genomic regions that are important for gene regulation and development. A recent study of genome-wide non-methylation data in vertebrates by Long et al. (eLife 2013;2:e00348) has shown that many experimentally identified non-methylated regions do not overlap with classically defined CpG islands, which are computationally predicted using simple DNA sequence features. This is especially true in cold-blooded vertebrates such as Danio rerio (zebrafish). In order to investigate how predictive DNA sequence is of a region’s methylation status, we applied a supervised learning approach using a spectrum kernel support vector machine, to see if a more complex model and supervised learning can be used to improve non-methylated island prediction and to understand the sequence properties of these regions. We demonstrate that DNA sequence is highly predictive of methylation status, and that in contrast to existing CpG island prediction methods our method is able to provide more useful predictions of NMIs genome-wide in all vertebrate organisms that were studied. Our results also show that in cold-blooded vertebrates (Anolis carolinensis, Xenopus tropicalis and Danio rerio), where genome-wide classical CpG island predictions consist primarily of false positives, longer, primarily AT-rich DNA sequence features are able to identify these regions much more accurately. PMID:27984582
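The spectrum kernel named above compares two DNA sequences through their k-mer counts. The kernel value is the inner product of the two k-mer count vectors. A minimal Python sketch follows. The choice of k and the cosine normalization are assumptions, since the authors' settings are not stated here. A matrix of such values can be passed to an SVM that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.

```python
from collections import Counter
from math import sqrt


def spectrum(seq, k):
    """Counts of every length-k substring (k-mer) in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k=3, normalize=True):
    """k-spectrum kernel: inner product of the k-mer count vectors of x and y."""
    sx, sy = spectrum(x, k), spectrum(y, k)
    value = sum(count * sy[kmer] for kmer, count in sx.items())  # Counter returns 0 for absent k-mers
    if not normalize:
        return value
    norm = sqrt(sum(c * c for c in sx.values()) * sum(c * c for c in sy.values()))
    return value / norm if norm else 0.0
```

Normalization keeps long genomic windows from dominating the kernel simply because they contain more k-mers.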
VarMod: modelling the functional effects of non-synonymous variants
Pappalardo, Morena; Wass, Mark N.
2014-01-01
Unravelling the genotype–phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs (nsSNVs) present in proteins, the difficulties of the problem lie in first identifying those nsSNVs that result in a functional change in the protein among the many non-functional variants, and then linking this functional change to phenotype. Here we present VarMod (Variant Modeller), a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod builds on recent observations that functional nsSNVs are enriched at protein–protein interfaces and protein–ligand binding sites and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs, VarMod performance is comparable to that of an existing state-of-the-art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions, including visualisation of protein models and complexes via an interactive JSmol molecular viewer. VarMod is available for use at http://www.wasslab.org/varmod. PMID:24906884
Syndromes associated with Homo sapiens pol II regulatory genes.
Bina, M; Demmon, S; Pares-Matos, E I
2000-01-01
The molecular basis of human characteristics is an intriguing but unresolved problem. Human characteristics cover a broad spectrum, from the obvious to the abstract. Obvious characteristics may include morphological features such as height, shape, and facial form. Abstract characteristics may be hidden in processes that are controlled by hormones and the human brain. In this review we examine exaggerated characteristics presented as syndromes. Specifically, we focus on human genes that encode transcription factors to examine morphological, immunological, and hormonal anomalies that result from deletion, insertion, or mutation of genes that regulate transcription by RNA polymerase II (the Pol II genes). A close analysis of abnormal phenotypes can give clues into how sequence variations in regulatory genes and changes in transcriptional control may give rise to characteristics defined as complex traits.
Archaebacterial rhodopsin sequences: Implications for evolution
NASA Technical Reports Server (NTRS)
Lanyi, J. K.
1991-01-01
It was proposed over 10 years ago that the archaebacteria represent a separate kingdom which diverged very early from the eubacteria and eukaryotes. It follows that investigations of archaebacterial characteristics might reveal features of early evolution. So far, two genes, one for bacteriorhodopsin and another for halorhodopsin, both from Halobacterium halobium, have been sequenced. We cloned and sequenced the gene coding for the polypeptide of another one of these rhodopsins, a halorhodopsin in Natronobacterium pharaonis. Peptide sequencing of cyanogen bromide fragments, and immuno-reactions of the protein and synthetic peptides derived from the C-terminal gene sequence, confirmed that the open reading frame was the structural gene for the pharaonis halorhodopsin polypeptide. The flanking DNA sequences of this gene, as well as those of other bacterial rhodopsins, were compared to previously proposed archaebacterial consensus sequences. In pairwise comparisons of the open reading frame with DNA sequences for bacterio-opsin and halo-opsin from Halobacterium halobium, silent divergences were calculated. These indicate very considerable evolutionary distance between each pair of genes, even in the same organism. In spite of this, the three protein sequences show extensive similarities, indicating strong selective pressures.
Evolutionary and comparative analyses of the soybean genome
Cannon, Steven B.; Shoemaker, Randy C.
2012-01-01
The soybean genome assembly has been available since the end of 2008. Significant features of the genome include large, gene-poor, repeat-dense pericentromeric regions, spanning roughly 57% of the genome sequence; a relatively large genome size of ~1.15 billion bases; remnants of a genome duplication that occurred ~13 million years ago (Mya); and fainter remnants of older polyploidies that occurred ~58 Mya and >130 Mya. The genome sequence has been used to identify the genetic basis for numerous traits, including disease resistance, nutritional characteristics, and developmental features. The genome sequence has provided a scaffold for placement of many genomic feature elements, both from within soybean and from related species. These may be accessed at several websites, including http://www.phytozome.net, http://soybase.org, http://comparative-legumes.org, and http://www.legumebase.brc.miyazaki-u.ac.jp. The taxonomic position of soybean in the Phaseoleae tribe of the legumes means that there are approximately two dozen other beans and relatives that have undergone independent domestication, and which may have traits that will be useful for transfer to soybean. Methods of translating information between species in the Phaseoleae range from design of markers for marker assisted selection, to transformation with Agrobacterium or with other experimental transformation methods. PMID:23136483
NASA Astrophysics Data System (ADS)
DeForest, Craig; Seaton, Daniel B.; Darnell, John A.
2017-08-01
I present and demonstrate a new, general-purpose post-processing technique, "3D noise gating", that can reduce image noise by an order of magnitude or more without effective loss of spatial or temporal resolution in typical solar applications. Nearly all scientific images are, ultimately, limited by noise. Noise can be direct Poisson "shot noise" from photon counting effects, or introduced by other means such as detector read noise. Noise is typically represented as a random variable (perhaps with location- or image-dependent characteristics) that is sampled once per pixel or once per resolution element of an image sequence. Noise limits many aspects of image analysis, including photometry, spatiotemporal resolution, feature identification, morphology extraction, and background modeling and separation. Identifying and separating noise from image signal is difficult. The common practice of blurring in space and/or time works because most image "signal" is concentrated in the low Fourier components of an image, while noise is evenly distributed. Blurring in space and/or time attenuates the high spatial and temporal frequencies, reducing noise at the expense of also attenuating image detail. Noise gating exploits the same property that we use to identify features in images ("coherence") to separate image features from noise. Processing image sequences through 3-D noise gating results in spectacular (more than 10x) improvements in signal-to-noise ratio, while not blurring bright, resolved features in either space or time. 
This improves most types of image analysis, including feature identification, time sequence extraction, absolute and relative photometry (including differential emission measure analysis), feature tracking, computer vision, correlation tracking, background modeling, cross-scale analysis, visual display/presentation, and image compression. I will introduce noise gating, describe the method, and show examples from several instruments (including SDO/AIA, SDO/HMI, STEREO/SECCHI, and GOES-R/SUVI) that explore the benefits and limits of the technique.
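The core gating idea can be shown in one dimension. Coherent features concentrate their power in a few Fourier components, while white noise spreads evenly across all of them. So components whose magnitude falls below a multiple of the expected noise magnitude can be zeroed with little loss of signal. The Python sketch below makes several simplifications compared with the described method, which works in 3D (x, y, t) on local space-time windows with an image-dependent noise model:

- It is 1D and applies a single global hard threshold.
- It assumes a known white-noise level `noise_sigma`.
- It uses a naive DFT for self-containment.

The parameter names and the threshold `factor` are assumptions.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(coeffs):
    """Inverse of dft()."""
    n = len(coeffs)
    return [sum(coeffs[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def noise_gate(signal, noise_sigma, factor=3.0):
    """Zero every Fourier component weaker than factor * the expected noise magnitude.

    For white noise with per-sample standard deviation noise_sigma, each DFT
    coefficient has an rms magnitude of sqrt(n) * noise_sigma.
    """
    n = len(signal)
    threshold = factor * math.sqrt(n) * noise_sigma
    gated = [c if abs(c) >= threshold else 0j for c in dft(signal)]
    # |X_k| == |X_{n-k}| for real input, so gating is symmetric and the result stays real.
    return [value.real for value in idft(gated)]


n = 16
clean = [math.cos(2 * math.pi * 2 * t / n) for t in range(n)]
# Weak, spectrally spread "contamination": a small offset plus a faint k = 5 sinusoid.
noisy = [c + 0.01 + 0.01 * math.sin(2 * math.pi * 5 * t / n) for t, c in enumerate(clean)]
restored = noise_gate(noisy, noise_sigma=0.02)
```

Unlike blurring, the gate keeps the strong component at its original frequency, so no temporal resolution is lost for the bright, coherent feature.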
Analysis of swallowing sounds using hidden Markov models.
Aboofazeli, Mohammad; Moussavi, Zahra
2008-04-01
In recent years, acoustical analysis of the swallowing mechanism has received considerable attention due to its diagnostic potential. This paper presents a hidden Markov model (HMM) based method for swallowing sound segmentation and classification. Swallowing sound signals of 15 healthy and 11 dysphagic subjects were studied. The signals were divided into sequences of 25 ms segments, each of which was represented by seven features. The sequences of features were modeled by HMMs. Trained HMMs were used for segmentation of the swallowing sounds into three distinct phases, i.e., initial quiet period, initial discrete sounds (IDS) and bolus transit sounds (BTS). Among the seven features, accuracy of segmentation by the HMM based on multi-scale product of wavelet coefficients was higher than that of the other HMMs, and the linear prediction coefficient (LPC)-based HMM showed the weakest performance. In addition, HMMs were used for classification of the swallowing sounds of healthy subjects and dysphagic patients. Classification accuracy of different HMM configurations was investigated. When we increased the number of states of the HMMs from 4 to 8, the classification error gradually decreased. In most cases, classification error for N=9 was higher than that of N=8. Among the seven features used, root mean square (RMS) and waveform fractal dimension (WFD) showed the best performance in the HMM-based classification of swallowing sounds. When the sequences of the features of the IDS segment were modeled separately, the accuracy reached up to 85.5%. As a second-stage classification, a screening algorithm was used, which correctly classified all the subjects but one healthy subject when RMS was used as the characteristic feature of the swallowing sounds and the number of states was set to N=8.
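HMM-based classification of this kind typically scores a feature sequence under each class model and picks the model with the highest likelihood. The Python sketch below implements that decision rule with the scaled forward algorithm. To stay self-contained, it assumes discrete observations, for example quantized RMS levels per 25 ms segment. The study's emission model and its N = 4 to 9 state configurations are not reproduced. The model parameters below are hypothetical and not fitted to any data.

```python
import math


def forward_log_likelihood(obs, start, trans, emit):
    """log P(obs | model) for a discrete-emission HMM, using the scaled forward algorithm."""
    n_states = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n_states)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * trans[i][j] for i in range(n_states)) * emit[j][obs[t]]
                for j in range(n_states)
            ]
        scale = sum(alpha)
        if scale == 0.0:
            return float("-inf")  # sequence impossible under this model
        alpha = [a / scale for a in alpha]  # rescale to avoid numerical underflow
        log_lik += math.log(scale)
    return log_lik


def classify(obs, models):
    """Return the name of the model (start, trans, emit) that best explains obs."""
    return max(models, key=lambda name: forward_log_likelihood(obs, *models[name]))


# Hypothetical 2-state models over 2 quantized feature levels (0 = low RMS, 1 = high RMS).
models = {
    "healthy": ([0.9, 0.1], [[0.8, 0.2], [0.3, 0.7]], [[0.9, 0.1], [0.2, 0.8]]),
    "dysphagic": ([0.5, 0.5], [[0.5, 0.5], [0.5, 0.5]], [[0.5, 0.5], [0.5, 0.5]]),
}
```

In practice, each class model would be trained on labeled recordings, for example with Baum-Welch, before being used in `classify`.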
McDougall, Carmel; Woodcroft, Ben J.
2016-01-01
In nature, numerous mechanisms have evolved by which organisms fabricate biological structures with an impressive array of physical characteristics. Some examples of metazoan biological materials include the highly elastic byssal threads by which bivalves attach themselves to rocks, biomineralized structures that form the skeletons of various animals, and spider silks that are renowned for their exceptional strength and elasticity. The remarkable properties of silks, which are perhaps the best-studied biological materials, are the result of the highly repetitive, modular, and biased amino acid composition of the proteins that compose them. Interestingly, similar levels of modularity/repetitiveness and similar bias in amino acid compositions have been reported in proteins that are components of structural materials in other organisms; however, the exact nature and extent of this similarity, and its functional and evolutionary relevance, are unknown. Here, we investigate this similarity and use sequence features common to silks and other known structural proteins to develop a bioinformatics-based method to identify similar proteins from large-scale transcriptome and whole-genome datasets. We show that a large number of proteins identified using this method have roles in biological material formation throughout the animal kingdom. Despite the similarity in sequence characteristics, most of the silk-like structural proteins (SLSPs) identified in this study appear to have evolved independently and are restricted to a particular animal lineage. Although the exact function of many of these SLSPs is unknown, the apparent independent evolution of proteins with similar sequence characteristics in divergent lineages suggests that these features are important for the assembly of biological materials. 
The identification of these characteristics enables the generation of testable hypotheses regarding the mechanisms by which these proteins assemble and direct the construction of biological materials with diverse morphologies. The SilkSlider predictor software developed here is available at https://github.com/wwood/SilkSlider. PMID:27415783
Lee, Hae-Won; Kim, Dae-Won; Lee, Mi-Hwa; Kim, Byung-Yong; Cho, Yong-Joon; Yim, Kyung June; Song, Hye Seon; Rhee, Jin-Kyu; Seo, Myung-Ji; Choi, Hak-Jong; Choi, Jong-Soon; Lee, Dong-Gi; Yoon, Changmann; Nam, Young-Do; Roh, Seong Woon
2015-01-01
An extremely halophilic archaeon, Haladaptatus cibarius D43(T), was isolated from traditional Korean salt-rich fermented seafood. Strain D43(T) shows the highest 16S rRNA gene sequence similarity (98.7 %) to Haladaptatus litoreus RO1-28(T), stains Gram-negative, and is motile and extremely halophilic. Despite the potential industrial applications of extremely halophilic archaea, their genome characteristics remain obscure. Here, we describe the whole genome sequence and annotated features of strain D43(T). The 3,926,724 bp genome includes 4,092 protein-coding and 57 RNA genes (including 6 rRNA and 49 tRNA genes) with an average G + C content of 57.76 %.
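Genome-level G + C content, as reported above, is the share of G and C among all bases. A minimal sketch, assuming that ambiguous bases (such as N) are simply excluded from the denominator; annotation pipelines may handle them differently.

```python
def gc_content(seq):
    """Percentage of G and C among the unambiguous bases (A, C, G, T) of a DNA sequence."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    total = gc + at  # ambiguous symbols such as N are ignored (an assumption)
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return 100.0 * gc / total


print(round(gc_content("ATGCGC"), 2))  # prints: 66.67
```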
Oggioni, M R; Claverys, J P
1999-10-01
A survey of all Streptococcus pneumoniae GenBank/EMBL DNA sequence entries and of the public domain sequence (representing more than 90% of the genome) of an S. pneumoniae type 4 strain allowed identification of 108 copies of a 107-bp-long highly repeated intergenic element called RUP (for repeat unit of pneumococcus). Several features of the element, revealed in this study, led to the proposal that RUP is an insertion sequence (IS)-derivative that could still be mobile. Among these features are: (1) a highly significant homology between the terminal inverted repeats (IRs) of RUPs and of IS630-Spn1, a new putative IS of S. pneumoniae; and (2) insertion at a TA dinucleotide, a characteristic target of several members of the IS630 family. Trans-mobilization of RUP is therefore proposed to be mediated by the transposase of IS630-Spn1. To account for the observation that RUPs are distributed among four subtypes which exhibit different degrees of sequence homogeneity, a scenario is invoked based on successive stages of RUP mobility and non-mobility, depending on whether an active transposase is present or absent. In the latter situation, an active transposase could be reintroduced into the species through natural transformation. Examination of sequences flanking RUP revealed a preferential association with ISs. It also provided evidence that RUPs promote sequence rearrangements, thereby contributing to genome flexibility. The possibility that RUP preferentially targets transforming DNA of foreign origin and subsequently favours disruption/rearrangement of exogenous sequences is discussed.
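Two of the sequence features used above, terminal inverted repeats and insertion at a TA dinucleotide, are easy to check computationally. A minimal sketch, assuming exact matching; natural inverted repeats are often imperfect, so a real survey would tolerate mismatches. The function names are illustrative only.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (non-ACGT characters pass through)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def ta_sites(seq):
    """0-based positions of every TA dinucleotide in seq (candidate target sites)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "TA"]


def has_terminal_inverted_repeats(element, ir_len):
    """True if the first ir_len bases equal the reverse complement of the last
    ir_len bases (exact match only)."""
    if ir_len <= 0 or 2 * ir_len > len(element):
        raise ValueError("invalid inverted-repeat length")
    return element[:ir_len].upper() == reverse_complement(element[-ir_len:])


print(ta_sites("GTATA"))                               # prints: [1, 3]
print(has_terminal_inverted_repeats("ACGTTTTTACGT", 4))  # prints: True
```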
Marck, Christian; Grosjean, Henri
2002-01-01
From 50 genomes of the three domains of life (7 eukarya, 13 archaea, and 30 bacteria), we extracted, analyzed, and compared over 4,000 sequences corresponding to cytoplasmic, nonorganellar tRNAs. For each genome, the complete set of tRNAs required to read the 61 sense codons was identified, which revealed three major anticodon-sparing strategies. Other features and sequence peculiarities analyzed are the following: (1) fit to the standard cloverleaf structure, (2) characteristic consensus sequences for elongator and initiator tDNAs, (3) frequencies of bases at each sequence position, (4) type and frequencies of conserved 2D and 3D base pairs, (5) anticodon/tDNA usages and anticodon-sparing strategies, (6) identification of the tRNA-Ile with anticodon CAU reading AUA, (7) size of variable arm, (8) occurrence and location of introns, (9) occurrence of 3'-CCA and 5'-extra G encoded at the tDNA level, and (10) distribution of the tRNA genes in genomes and their mode of transcription. Among all tRNA isoacceptors, we found that initiator tDNA-iMet is the most conserved across the three domains, yet domain-specific signatures exist. Also, depending on which tRNA feature is considered (5'-extra G encoded in tDNAs-His, AUA codon read by tRNA-Ile with anticodon CAU, presence of intron, absence of "two-out-of-three" reading mode and short V-arm in tDNA-Tyr), Archaea group with either Bacteria or Eukarya. No features shared by Eukarya and Bacteria but absent from Archaea could be found. Thus, from the tRNomic point of view, Archaea appear as an "intermediate domain" between Eukarya and Bacteria. PMID:12403461
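The count of 61 sense codons follows from the 64 possible triplets minus the three stop codons of the standard genetic code. The sketch below enumerates them and computes the anticodon expected under strict Watson-Crick pairing. It deliberately ignores wobble and base modification, so it illustrates why a tRNA-Ile with anticodon CAU reading AUA falls outside strict pairing (the strict anticodon for AUA would be UAU).

```python
from itertools import product

STOP_CODONS = {"UAA", "UAG", "UGA"}  # standard genetic code
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def sense_codons():
    """All 64 RNA triplets except the three standard stop codons."""
    codons = ("".join(c) for c in product("UCAG", repeat=3))
    return [c for c in codons if c not in STOP_CODONS]


def anticodon(codon):
    """Anticodon (written 5'->3') pairing with codon under strict Watson-Crick rules."""
    return codon.translate(RNA_COMPLEMENT)[::-1]


print(len(sense_codons()), anticodon("AUG"), anticodon("AUA"))  # prints: 61 CAU UAU
```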
Rehearsal dynamics in elementary school children.
Lehmann, Martin; Hasselhorn, Marcus
2012-03-01
Several studies on free recall suggest that processes responsible for recall are analogous to processes responsible for rehearsal. In children, the relationship between cumulative rehearsal and recall performance has been shown to be critical; however, the locus of the effect of rehearsal is not yet fully understood. To uncover the mechanisms that come into play in an overt-rehearsal free recall task, we assessed rehearsal and recall sequences in children between 8 and 10 years of age. These sequences give information about the context in which items are repeated and rearranged throughout the list and subsequently recalled. Rehearsal sequences consisted mainly of items from neighboring list positions in their original temporal order. The same characteristics were true for recall sequences. Qualitatively, order effects during study and recall did not differ across age groups. However, in older children, who used cumulative rehearsal more intensively, successive rehearsal and recall of items in their original order was more pronounced. Therefore, we suggest that a main feature of item rehearsal with regard to facilitating recall is the strengthening of interitem associations based on the temporal order within a list and that this characteristic develops with age. Copyright © 2011 Elsevier Inc. All rights reserved.
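One simple way to quantify "items from neighboring list positions in their original temporal order" is the share of successive outputs whose serial positions differ by exactly +1. This is a hypothetical measure chosen for illustration, not necessarily the one the authors used; the input is assumed to be a list of serial positions in output order.

```python
def forward_adjacent_rate(sequence):
    """Fraction of successive item pairs in a rehearsal or recall sequence whose
    serial positions differ by exactly +1 (the next item in the original order)."""
    pairs = list(zip(sequence, sequence[1:]))
    if not pairs:
        return 0.0
    return sum(1 for a, b in pairs if b - a == 1) / len(pairs)


print(forward_adjacent_rate([1, 2, 3, 5, 6]))  # prints: 0.75
```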
Vandenbol, M; Jauniaux, J C; Grenson, M
1989-11-15
The complete nucleotide (nt) sequence of the PUT4 gene, whose product is required for high-affinity proline active transport in the yeast Saccharomyces cerevisiae, is presented. The sequence contains a single long open reading frame of 1881 nt, encoding a polypeptide with a calculated Mr of 68,795. The predicted protein is strongly hydrophobic and exhibits six potential glycosylation sites. Its hydropathy profile suggests the presence of twelve membrane-spanning regions flanked by hydrophilic N- and C-terminal domains. The N terminus does not resemble signal sequences found in secreted proteins. These features are characteristic of integral membrane proteins catalyzing translocation of ligands across cellular membranes. Protein sequence comparisons indicate strong resemblance to the arginine and histidine permeases of S. cerevisiae, but no marked sequence similarity to the proline permease of Escherichia coli or to other known prokaryotic or eukaryotic transport proteins. The strong similarity between the three yeast amino acid permeases suggests a common ancestor for the three proteins.
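Locating a single long open reading frame, as described above, can be sketched as a scan of the three forward frames for ATG...stop stretches. A minimal sketch: it checks only the given strand, ignores ORFs without a stop codon, and reports coordinates that include the stop codon. The residue count is the codon count minus one if the stop is included; the abstract does not say which convention its 1881 nt figure uses.

```python
STOP = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Return (start, end) of the longest ATG...stop ORF on the given strand;
    end is exclusive and includes the stop codon. Returns (0, 0) if none."""
    seq = seq.upper()
    best = (0, 0)
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP:
                if i + 3 - start > best[1] - best[0]:
                    best = (start, i + 3)
                start = None
    return best


start, end = longest_orf("CCATGAAATTTTAGCC")
print(start, end, (end - start) // 3 - 1)  # prints: 2 14 3
```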
Sequence signatures of allosteric proteins towards rational design.
Namboodiri, Saritha; Verma, Chandra; Dhar, Pawan K; Giuliani, Alessandro; Nair, Achuthsankar S
2010-12-01
Allostery is the phenomenon of changes in the structure and activity of proteins that appear as a consequence of ligand binding at sites other than the active site. Studying the mechanistic basis of allostery, leading to protein design with predetermined functional endpoints, is an important unmet need of synthetic biology. Here, we screened the amino acid sequence landscape in search of sequence signatures of allostery using the Recurrence Quantification Analysis (RQA) method. A characteristic vector composed of 10 features extracted from RQA was defined for amino acid sequences. Using Principal Component Analysis, four factors were found to be important determinants of allosteric behavior. Our sequence-based predictor shows 82.6% accuracy, 85.7% sensitivity and 77.9% specificity on the current dataset. Further, we show that Laminarity-Mean-hydrophobicity, representing repeated hydrophobic patches, is the most crucial indicator of allostery. To the best of our knowledge, this is the first report that describes sequence determinants of allostery based on hydrophobicity. Building on these findings, we plan to explore the possibility of inducing allostery in proteins.
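To make the RQA features concrete, the sketch below turns a sequence into a hydrophobicity profile, builds a recurrence matrix, and computes two standard RQA measures, recurrence rate and laminarity. The Kyte-Doolittle scale, embedding dimension 3, delay 1, radius 1.0 and minimum line length 2 are all assumptions for illustration; the authors' exact parameters and their full set of 10 features are not reproduced here.

```python
import math

# Kyte-Doolittle hydropathy scale (an assumed choice of hydrophobicity scale).
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
      "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
      "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2}


def recurrence_matrix(series, dim=3, radius=1.0):
    """Binary recurrence matrix of a delay-1 embedding of series; the main
    diagonal (trivial self-recurrence) is set to 0."""
    vecs = [series[i:i + dim] for i in range(len(series) - dim + 1)]
    n = len(vecs)
    return [[1 if i != j and math.dist(vecs[i], vecs[j]) <= radius else 0
             for j in range(n)] for i in range(n)]


def recurrence_rate(R):
    """Fraction of off-diagonal cells that are recurrent."""
    n = len(R)
    return sum(map(sum, R)) / (n * (n - 1)) if n > 1 else 0.0


def laminarity(R, vmin=2):
    """Share of recurrent points lying on vertical lines of length >= vmin."""
    total = sum(map(sum, R))
    if total == 0:
        return 0.0
    n, on_lines = len(R), 0
    for j in range(n):
        run = 0
        for i in range(n + 1):  # the extra step closes a run that reaches the edge
            if i < n and R[i][j]:
                run += 1
            else:
                if run >= vmin:
                    on_lines += run
                run = 0
    return on_lines / total


profile = [KD[aa] for aa in "MKTAYIAKQRQISFVKSHFSRQ"]  # hypothetical peptide
R = recurrence_matrix(profile)
print(recurrence_rate(R), laminarity(R))
```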
Jiang, Rui; Yang, Hua; Zhou, Linqi; Kuo, C.-C. Jay; Sun, Fengzhu; Chen, Ting
2007-01-01
The increasing demand for the identification of genetic variation responsible for common diseases has translated into a need for sophisticated methods for effectively prioritizing mutations occurring in disease-associated genetic regions. In this article, we prioritize candidate nonsynonymous single-nucleotide polymorphisms (nsSNPs) through a bioinformatics approach that takes advantage of a set of improved numeric features derived from protein-sequence information and a new statistical learning model called “multiple selection rule voting” (MSRV). The sequence-based features can maximize the scope of applications of our approach, and the MSRV model can capture subtle characteristics of individual mutations. Systematic validation demonstrates that this approach is capable of prioritizing causal mutations for both simple monogenic diseases and complex polygenic diseases. Further studies of familial Alzheimer disease and diabetes show that the approach can enrich mutations underlying these polygenic diseases at the top of the candidate list. Application of this approach to unclassified mutations identifies 10 suspicious mutations likely to cause disease, a finding strongly supported by the literature. PMID:17668383
NASA Astrophysics Data System (ADS)
Baker, Edward N.; Proft, Thomas; Kang, Haejoo
Proteins displayed on the cell surfaces of pathogenic organisms are the front-line troops of bacterial attack, playing critical roles in colonization, infection and virulence. Although such proteins can often be recognized from genome sequence data through characteristic sequence motifs, their functions are often unknown. One such group of surface proteins is attached to the cell surface of Gram-positive pathogens through the action of sortase enzymes. Some of these proteins are now known to form pili: long filamentous structures that mediate attachment to human cells. Crystallographic analyses of these and other cell surface proteins have uncovered novel features in their structure, assembly and stability, including the presence of inter- and intramolecular isopeptide crosslinks. This improved understanding of structures on the bacterial cell surface offers opportunities for identifying new drug targets and for novel approaches to vaccine design.
Universal and idiosyncratic characteristic lengths in bacterial genomes
NASA Astrophysics Data System (ADS)
Junier, Ivan; Frémont, Paul; Rivoire, Olivier
2018-05-01
In condensed matter physics, simplified descriptions are obtained by coarse-graining the features of a system at a certain characteristic length, defined as the typical length beyond which some properties are no longer correlated. From a physics standpoint, in vitro DNA thus has a characteristic length of 300 base pairs (bp), the Kuhn length of the molecule beyond which correlations in its orientations are typically lost. From a biology standpoint, in vivo DNA has a characteristic length of 1000 bp, the typical length of genes. Since bacteria live in very different physico-chemical conditions and since their genomes lack translational invariance, whether larger, universal characteristic lengths exist is a non-trivial question. Here, we examine this problem by leveraging the large number of fully sequenced genomes available in public databases. By analyzing GC content correlations and the evolutionary conservation of gene contexts (synteny) in hundreds of bacterial chromosomes, we conclude that a fundamental characteristic length around 10–20 kb can be defined. This characteristic length reflects elementary structures involved in the coordination of gene expression, which are present all along the genome of nearly all bacteria. Technically, reaching this conclusion required us to implement methods that are insensitive to the presence of large idiosyncratic genomic features, which may coexist alongside these fundamental universal structures.
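A basic way to probe GC content correlations as a function of genomic distance is to compute GC fractions in fixed windows and correlate windows a given lag apart; with windows of w bp, lag k corresponds to a distance of k·w bp. This is a minimal sketch under that assumption only. It is not the authors' method, which also uses synteny and techniques designed to be insensitive to large idiosyncratic features.

```python
import math


def windowed_gc(seq, window):
    """GC fraction in consecutive non-overlapping windows of `window` bases."""
    seq = seq.upper()
    return [sum(c in "GC" for c in seq[i:i + window]) / window
            for i in range(0, len(seq) - window + 1, window)]


def lagged_correlation(values, lag):
    """Pearson correlation between window values `lag` windows apart."""
    if lag < 1:
        raise ValueError("lag must be at least 1")
    x, y = values[:-lag], values[lag:]
    n = len(x)
    if n < 2:
        raise ValueError("not enough windows for this lag")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a constant series
    return sxy / math.sqrt(sxx * syy)
```

Plotting `lagged_correlation` against lag·window for a real chromosome would show the distance over which GC content stays correlated.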
[Clinical features and COMP gene mutation in a family with a pseudoachondroplasia child].
Lu, Chun-Ting; Guo, Li; Zahng, Zhan-Hui; Lin, Wei-Xia; Song, Yuan-Zong; Feng, Lie
2013-11-01
This study aimed to report the clinical characteristics and COMP gene mutation of a family with pseudoachondroplasia (PSACH), a relatively rare spinal and epiphyseal dysplasia that is inherited as an autosomal dominant trait. Clinical information on a PSACH child aged 5 years 2 months and his parents was collected and analyzed. Diagnosis was confirmed by PCR amplification and direct sequencing of all 19 exons of the COMP gene and their flanking sequences, and the mutation was further ascertained by cloning analysis of exon 10. The child presented with short and stubby fingers, bowlegs, short-limbed dwarfism, metaphyseal broadening of the long bones, and lumbar lordosis. A mutation c.1048_1116del (p.Asn350_Asp372del) in exon 10 was detected in the child; it was inherited from his father, who did not show any phenotypic feature of PSACH. On the basis of the child's clinical and imaging features, PSACH was definitively diagnosed by COMP mutation analysis. Non-penetrance of a COMP mutation in PSACH is described here for the first time.
Clustering method for counting passengers getting in a bus with single camera
NASA Astrophysics Data System (ADS)
Yang, Tao; Zhang, Yanning; Shao, Dapei; Li, Ying
2010-03-01
Automatic counting of passengers is very important for both business and security applications. We present a single-camera-based vision system that is able to count passengers in highly crowded situations at the entrance of a bus. The distinctive characteristics of the proposed system are as follows. First, a novel passenger counting framework based on feature point tracking and online clustering performs much better than methods based on background modeling and foreground blob tracking. Second, a simple and highly accurate clustering algorithm projects the high-dimensional feature point trajectories into a 2-D feature space defined by their appearance and disappearance times and counts the number of people through online clustering. Finally, all test video sequences in the experiment were captured from a real bus in Shanghai, China. The results show that the system can process two 320×240 video sequences simultaneously at a frame rate of 25 fps and can count passengers reliably in various difficult scenarios with complex interaction and occlusion among people. The method achieves accuracy rates of up to 96.5%.
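The counting idea, projecting each trajectory onto its (appearance time, disappearance time) pair and grouping nearby points online, can be sketched with a simple sequential "leader" clustering. The distance threshold, the use of running centroids, and the processing order are assumptions for illustration; the abstract does not specify the authors' clustering rule.

```python
import math


def count_by_online_clustering(trajectories, radius):
    """Each trajectory is (appear_time, disappear_time). A point within `radius`
    of the nearest existing centroid joins that cluster; otherwise it starts a
    new cluster (one new passenger). Returns the number of clusters."""
    centroids, sizes = [], []
    for point in sorted(trajectories):  # process in order of appearance time
        best, best_d = None, radius
        for k, c in enumerate(centroids):
            d = math.dist(point, c)
            if d <= best_d:
                best, best_d = k, d
        if best is None:
            centroids.append(tuple(point))
            sizes.append(1)
        else:
            n = sizes[best]
            cx, cy = centroids[best]
            centroids[best] = ((cx * n + point[0]) / (n + 1), (cy * n + point[1]) / (n + 1))
            sizes[best] = n + 1
    return len(centroids)


tracks = [(0, 10), (0.5, 10.2), (1, 10.1), (20, 30), (20.3, 30.4)]  # seconds, hypothetical
print(count_by_online_clustering(tracks, radius=2.0))  # prints: 2
```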
Efficient iris recognition by characterizing key local variations.
Ma, Li; Tan, Tieniu; Wang, Yunhong; Zhang, Dexin
2004-06-01
Unlike other biometrics such as fingerprints and faces, the distinctiveness of the iris comes from randomly distributed features. This leads to its high reliability for personal identification but, at the same time, makes such details difficult to represent effectively in an image. This paper describes an efficient algorithm for iris recognition by characterizing key local variations. The basic idea is that local sharp variation points, denoting the appearance or disappearance of an important image structure, are used to represent the characteristics of the iris. Feature extraction involves two steps: 1) a set of one-dimensional intensity signals is constructed to effectively characterize the most important information of the original two-dimensional image; 2) using a particular class of wavelets, the sequence of positions of local sharp variation points in these signals is recorded as features. We also present a fast matching scheme based on the exclusive OR operation to compute the similarity between a pair of position sequences. Experimental results on 2255 iris images show that the performance of the proposed method is encouraging and comparable to that of the best iris recognition algorithm found in the current literature.
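An exclusive-OR matching scheme is fast because, once features are binary codes, the number of differing bits is a single XOR followed by a bit count. The sketch assumes each position sequence has already been converted to a fixed-length binary code by setting one bit per sharp-variation position; that conversion is our illustrative assumption, not necessarily the paper's encoding.

```python
def positions_to_code(positions, n_bits):
    """Encode feature positions as an n_bits-long binary code stored in an int
    (bit p is set when a sharp variation point occurs at position p)."""
    code = 0
    for p in positions:
        if not 0 <= p < n_bits:
            raise ValueError("position out of range")
        code |= 1 << p
    return code


def xor_dissimilarity(code_a, code_b, n_bits):
    """Normalized Hamming distance: fraction of bits that differ (XOR, then count ones)."""
    return bin(code_a ^ code_b).count("1") / n_bits


a = positions_to_code([1, 3, 6], 8)
b = positions_to_code([1, 3, 7], 8)
print(xor_dissimilarity(a, b, 8))  # prints: 0.25
```

A lower dissimilarity means a closer match; a decision threshold would be tuned on enrollment data.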
2009-01-01
Background: The characterisation, or binning, of metagenome fragments is an important first step to further downstream analysis of microbial consortia. Here, we propose a one-dimensional signature, OFDEG, derived from the oligonucleotide frequency profile of a DNA sequence, and show that it is possible to obtain a meaningful phylogenetic signal for relatively short DNA sequences. The one-dimensional signal is essentially a compact representation of higher dimensional feature spaces of greater complexity and is intended to improve on the tetranucleotide frequency feature space preferred by current compositional binning methods. Results: We compare the fidelity of OFDEG against tetranucleotide frequency in both an unsupervised and semi-supervised setting on simulated metagenome benchmark data. Four tests were conducted using assembler output of Arachne and phrap, and for each, performance was evaluated on contigs which are greater than or equal to 8 kbp in length and contigs which are composed of at least 10 reads. Using G-C content in conjunction with OFDEG gave an average accuracy of 96.75% (semi-supervised) and 95.19% (unsupervised), versus 94.25% (semi-supervised) and 82.35% (unsupervised) for tetranucleotide frequency. Conclusion: We have presented an observation of an alternative characteristic of DNA sequences. The proposed feature representation has proven to be more beneficial than the existing tetranucleotide frequency space for the metagenome binning problem. We do note, however, that our observation of OFDEG deserves further analysis and investigation. Unsupervised clustering revealed that OFDEG-related features performed better than standard tetranucleotide frequency in representing a relevant organism-specific signal. Further improvement in binning accuracy is given by semi-supervised classification using OFDEG. 
The emphasis on a feature-driven, bottom-up approach to the problem of binning reveals promising avenues for future development of techniques to characterise short environmental sequences without bias toward cultivable organisms. PMID:19958473
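The baseline that OFDEG is compared against, the tetranucleotide frequency feature space, is a 256-dimensional vector of overlapping 4-mer frequencies. The sketch below computes that baseline only; OFDEG itself is not reconstructed here. It assumes windows containing non-ACGT symbols are skipped and does not merge reverse-complement 4-mers, which some binning tools do.

```python
from itertools import product

TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]
INDEX = {k: i for i, k in enumerate(TETRAMERS)}


def tetranucleotide_frequencies(seq):
    """256-dimensional vector of overlapping tetranucleotide frequencies.
    Windows containing non-ACGT characters are skipped."""
    seq = seq.upper()
    counts = [0] * len(TETRAMERS)
    for i in range(len(seq) - 3):
        idx = INDEX.get(seq[i:i + 4])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts


v = tetranucleotide_frequencies("ACGTACGTAC")
print(len(v), v[INDEX["ACGT"]])  # prints: 256 0.2857142857142857
```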
Pal Choudhury, Pabitra
2017-01-01
Periplasmic c7 type cytochrome A (PpcA) protein is found in Geobacter sulfurreducens along with its four homologs (PpcB-E). Crystal structures show that PpcA can bind deoxycholate (DXCA), whereas its homologs do not, but the reason for this has not yet been established with certainty from primary protein sequence information. This study is primarily based on primary protein sequence analysis through the chemical basis of the constituent amino acids. First, we determine chemical-group-specific scores of amino acids. Along with this, we have developed a new methodology for phylogenetic analysis based on the chemical group dissimilarities of amino acids. This methodology is applied to the cytochrome c7 family members and pinpoints how a particular sequence differs from the others. Second, we build a graph-theoretic model from amino acid sequences, which is also applied to the cytochrome c7 family members, and some unique characteristics and their domains are highlighted. Third, we search for unique patterns as subsequences that are common to the group or specific to an individual member. In all cases, we identify distinct features of PpcA that set it apart from its homologs and may underlie its binding to deoxycholate. Similarly, some notable features of the structurally dissimilar protein PpcD compared with the other homologs are also brought out. Further, because the five members of the cytochrome family are homologous proteins, they must share some significant features, which are also enumerated in this study. PMID:28362850
Godkin, A; Friede, T; Davenport, M; Stevanovic, S; Willis, A; Jewell, D; Hill, A; Rammensee, H G
1997-06-01
HLA-DQ8 (A1*0301, B1*0302) and -DQ2 (A1*0501, B1*0201) are both associated with diseases such as insulin-dependent diabetes mellitus and coeliac disease. We used the technique of pool sequencing to examine the requirements for peptides binding to HLA-DQ8, and combined these data with naturally sequenced ligands and in vitro binding assays to describe a novel motif for HLA-DQ8. The motif, which has the same basic format as many HLA-DR molecules, consists of four or five anchor regions, in the positions from the N-terminus of the binding core of n, n + 3, n + 5/6 and n + 8, i.e. P1, P4, P6/7 and P9. P1 and P9 require negative or polar residues, with mainly aliphatic residues at P4 and P6/7. The features of the HLA-DQ8 motif were then compared to a pool sequence of peptides eluted from HLA-DQ2. A consensus motif for the binding of a common peptide which may be involved in disease pathogenesis is described. Neither of the disease-associated alleles HLA-DQ2 and -DQ8 has Asp at position 57 of the beta-chain. This Asp, if present, may form a salt bridge with an Arg at position 79 of the alpha-chain and so alter the binding specificity of P9. HLA-DQ2 and -DQ8 both appear to prefer negatively charged amino acids at P9. In contrast, HLA-DQ7 (A1*0301, B1*0301), which is not associated with diabetes, has Asp at beta 57, allowing positively charged amino acids at P9. This analysis of the sequence features of DQ-binding peptides suggests molecular characteristics which may be useful to predict epitopes involved in disease pathogenesis.
VarMod: modelling the functional effects of non-synonymous variants.
Pappalardo, Morena; Wass, Mark N
2014-07-01
Unravelling the genotype-phenotype relationship in humans remains a challenging task in genomics studies. Recent advances in sequencing technologies mean there are now thousands of sequenced human genomes, revealing millions of single nucleotide variants (SNVs). For non-synonymous SNVs (nsSNVs), the difficulty lies first in identifying those nsSNVs that result in a functional change in the protein among the many non-functional variants, and then in linking this functional change to phenotype. Here we present VarMod (Variant Modeller), a method that utilises both protein sequence and structural features to predict nsSNVs that alter protein function. VarMod builds on recent observations that functional nsSNVs are enriched at protein-protein interfaces and protein-ligand binding sites and uses these characteristics to make predictions. In benchmarking on a set of nearly 3000 nsSNVs, VarMod's performance is comparable to that of an existing state-of-the-art method. The VarMod web server provides extensive resources to investigate the sequence and structural features associated with the predictions, including visualisation of protein models and complexes via an interactive JSmol molecular viewer. VarMod is available for use at http://www.wasslab.org/varmod. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Celik, Nermin; Webb, Chaille T.; Leyton, Denisse L.; Holt, Kathryn E.; Heinz, Eva; Gorrell, Rebecca; Kwok, Terry; Naderer, Thomas; Strugnell, Richard A.; Speed, Terence P.; Teasdale, Rohan D.; Likić, Vladimir A.; Lithgow, Trevor
2012-01-01
Autotransporters are secreted proteins that are assembled into the outer membrane of bacterial cells. The passenger domains of autotransporters are crucial for bacterial pathogenesis, with some remaining attached to the bacterial surface while others are released by proteolysis. An enigma remains as to whether autotransporters should be considered a class of secretion system, or simply a class of substrate with peculiar requirements for their secretion. We sought to establish a sensitive search protocol that could identify and characterize diverse autotransporters from bacterial genome sequence data. The new sequence analysis pipeline identified more than 1500 autotransporter sequences from diverse bacteria, including numerous species of Chlamydiales and Fusobacteria as well as all classes of Proteobacteria. Interrogation of the proteins revealed that there are numerous classes of passenger domains beyond the known proteases, adhesins and esterases. In addition, the barrel domain (a characteristic feature of autotransporters) was found to be composed of seven conserved sequence segments that can be arranged in multiple ways in the tertiary structure of the assembled autotransporter. One of these conserved motifs overlays the targeting information required for autotransporters to reach the outer membrane. Another conserved and diagnostic motif maps to the linker region between the passenger domain and the barrel domain, indicating that it is an important feature in the assembly of autotransporters. PMID:22905239
Schilf, Paul; Peter, Annette; Hurek, Thomas; Stick, Reimer
2014-07-01
Lamin proteins are found in all metazoans. Most non-vertebrate genomes, including those of the closest relatives of vertebrates, the cephalochordates and tunicates, encode only a single lamin. In teleosts and tetrapods the number of lamin genes has quadrupled. They can be divided into four sub-types, lmnb1, lmnb2, LIII, and lmna, each characterized by particular features and functional differentiations. Little is known about when during vertebrate evolution these features emerged. Lampreys belong to the Agnatha, the sister group of the Gnathostomata, and were the first to split off within the vertebrate lineage. Analysis of the sea lamprey (Petromyzon marinus) lamin complement presented here identified three functional lamin genes, one encoding a lamin LIII, indicating that the characteristic gene structure of this subtype had been established prior to the agnathan/gnathostome split. Two other genes encode lamins for which orthology to gnathostome lamins cannot be designated. A search for lamin gene sequences in all vertebrate taxa for which sufficient sequence data are available reveals the evolutionary time frame in which specific features of the vertebrate lamins were established. Structural features characteristic of A-type lamins are not found in the lamprey genome. In contrast, lmna genes are present in all gnathostome lineages, suggesting that this gene evolved with the emergence of the gnathostomes. The analysis of lamin gene neighborhoods reveals noticeable similarities between the different vertebrate lamin genes, supporting the hypothesis that they emerged through two rounds of whole genome duplication, and makes clear that an orthologous relationship between a particular vertebrate paralog and lamins outside the vertebrate lineage cannot be established. Copyright © 2014 Elsevier GmbH. All rights reserved.
Deriving video content type from HEVC bitstream semantics
NASA Astrophysics Data System (ADS)
Nightingale, James; Wang, Qi; Grecos, Christos; Goma, Sergio R.
2014-05-01
As network service providers seek to improve customer satisfaction and retention levels, they are increasingly moving from traditional quality of service (QoS) driven delivery models to customer-centred quality of experience (QoE) delivery models. QoS models consider only metrics derived from the network; QoE models, however, also consider metrics derived from within the video sequence itself. Various spatial and temporal characteristics of a video sequence have been proposed, both individually and in combination, to derive methods of classifying video content either on a continuous scale or as a set of discrete classes. QoE models can be divided into three broad categories: full-reference, reduced-reference and no-reference models. Because the original video must be available at the client for comparison, full-reference metrics are of limited practical value in adaptive real-time video applications. Reduced-reference metrics often require metadata to be transmitted with the bitstream, while no-reference metrics typically operate in the decompressed domain at the client side and require significant processing to extract spatial and temporal features. This paper proposes a heuristic, no-reference approach to video content classification which is specific to HEVC encoded bitstreams. The HEVC encoder already makes use of spatial characteristics to determine partitioning of coding units and temporal characteristics to determine the splitting of prediction units. We derive a function which approximates the spatio-temporal characteristics of the video sequence by using the weighted averages of the depth at which the coding unit quadtree is split and the prediction mode decision made by the encoder to estimate spatial and temporal characteristics respectively. 
Since the video content type of a sequence is determined by using high level information parsed from the video stream, spatio-temporal characteristics are identified without the need for full decoding and can be used in a timely manner to aid decision making in QoE oriented adaptive real time streaming.
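The weighted-average idea can be sketched once per-CU split depths and prediction modes have been parsed from the bitstream (parsing is not shown). Several choices here are assumptions for illustration: a 64×64 CTU, CU area as the weight, and the area share of intra-coded CUs as the temporal proxy. The paper's actual weighting function is not given in the abstract.

```python
from dataclasses import dataclass


@dataclass
class CodingUnit:
    depth: int   # coding-quadtree split depth: 0 = 64x64 ... 3 = 8x8 (64x64 CTU assumed)
    intra: bool  # prediction mode chosen by the encoder for this CU

    @property
    def area(self):
        side = 64 >> self.depth  # CU side length in luma samples
        return side * side


def area_weighted_mean(cus, value):
    total = sum(cu.area for cu in cus)
    return sum(cu.area * value(cu) for cu in cus) / total


def spatio_temporal_features(cus):
    """(spatial, temporal) estimates: area-weighted mean split depth and the
    area-weighted share of intra-coded CUs (the latter is our assumed proxy)."""
    spatial = area_weighted_mean(cus, lambda cu: cu.depth)
    temporal = area_weighted_mean(cus, lambda cu: 1.0 if cu.intra else 0.0)
    return spatial, temporal


frame = [CodingUnit(0, False), CodingUnit(1, True)]  # hypothetical parsed CUs
print(spatio_temporal_features(frame))
```

Deeper average splits suggest more spatial detail; a larger intra share in inter-coded frames suggests more motion or scene change.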
Using distances between Top-n-gram and residue pairs for protein remote homology detection.
Liu, Bin; Xu, Jinghao; Zou, Quan; Xu, Ruifeng; Wang, Xiaolong; Chen, Qingcai
2014-01-01
Protein remote homology detection is one of the central problems in bioinformatics, important for both basic research and practical applications. Currently, discriminative methods based on Support Vector Machines (SVMs) achieve state-of-the-art performance. Exploring feature vectors that incorporate the position information of amino acids or other protein building blocks is a key step toward improving the performance of SVM-based methods. Two new methods for protein remote homology detection were proposed, called SVM-DR and SVM-DT. SVM-DR is a sequence-based method, in which the feature vector representation of a protein is based on the distances between residue pairs. SVM-DT is a profile-based method, which considers the distances between Top-n-gram pairs. A Top-n-gram can be viewed as a profile-based building block of proteins, which is calculated from the frequency profiles. These two methods are position-dependent approaches incorporating the sequence-order information of protein sequences. Various experiments were conducted on a benchmark dataset containing 54 families and 23 superfamilies. Experimental results showed that these two new methods are very promising. Compared with position-independent methods, the improvement in performance is clear. Furthermore, the proposed methods can also provide useful insights for studying the features of protein families. The better performance of the proposed methods demonstrates that position-dependent approaches are efficient for protein remote homology detection. Another advantage of our methods arises from the explicit feature space representation, which can be used to analyze the characteristic features of protein families. The source code of SVM-DT and SVM-DR is available at http://bioinformatics.hitsz.edu.cn/DistanceSVM/index.jsp.
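A distance-between-residue-pairs representation in the spirit of SVM-DR can be sketched as counts of residue a followed by residue b exactly d positions later, for d up to a chosen maximum. The maximum distance, the use of raw counts without normalization, and the dict layout are assumptions; the released SVM-DR code may differ, and the SVM step is not shown.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def residue_pair_distance_features(seq, max_distance=3):
    """Count residue a followed by residue b exactly d positions later,
    for d = 1..max_distance; returns a dict keyed by (a, b, d)."""
    seq = seq.upper()
    features = {(a, b, d): 0 for a, b in product(AMINO_ACIDS, repeat=2)
                for d in range(1, max_distance + 1)}
    for d in range(1, max_distance + 1):
        for i in range(len(seq) - d):
            key = (seq[i], seq[i + d], d)
            if key in features:  # skips non-standard residue letters
                features[key] += 1
    return features


f = residue_pair_distance_features("ACA", max_distance=2)
print(len(f), f[("A", "A", 2)])  # prints: 800 1
```

Ordering the dict values by key gives a fixed-length vector (400 × max_distance) suitable as SVM input.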
Space moving target detection using time domain feature
NASA Astrophysics Data System (ADS)
Wang, Min; Chen, Jin-yong; Gao, Feng; Zhao, Jin-yu
2018-01-01
Traditional space target detection methods mainly use the spatial characteristics of the star map to detect targets, and therefore cannot make full use of time domain information. This paper presents a new space moving target detection method based on time domain features. We first construct the time spectral data of the star map, then analyze the time domain features of the main objects in star maps (targets, stars and background), and finally detect moving targets using the single-pulse feature of the time domain signal. Experiments on real star maps show that the proposed method can effectively detect the trajectories of moving targets in star map sequences, with a detection probability of 99% at a false alarm rate of about 8 × 10⁻⁵, outperforming the compared algorithms.
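The single-pulse idea can be sketched on a registered frame stack. A star stays bright at the same pixel in every frame, whereas a moving target brightens a given pixel only briefly as it passes. The sketch below rests on several assumptions: pulses last one frame, the per-pixel temporal median serves as the background estimate, and `k` is a hypothetical threshold in median absolute deviations. The paper's actual time spectral construction and decision rule may differ.

```python
from statistics import median

def detect_single_pulses(frames, k=5.0):
    """frames: list of T registered frames, each an H x W list of lists.
    Returns (t, row, col) for every pixel whose time series shows an
    isolated one-frame pulse above its temporal median."""
    T = len(frames)
    H, W = len(frames[0]), len(frames[0][0])
    detections = []
    for r in range(H):
        for c in range(W):
            series = [frames[t][r][c] for t in range(T)]
            base = median(series)  # background or constant star level
            # Median absolute deviation as a robust noise scale;
            # fall back to 1.0 when the series is perfectly flat.
            mad = median(abs(v - base) for v in series) or 1.0
            above = [v - base > k * mad for v in series]
            for t in range(T):
                left = above[t - 1] if t > 0 else False
                right = above[t + 1] if t < T - 1 else False
                # A single isolated bright frame is treated as a pulse.
                if above[t] and not left and not right:
                    detections.append((t, r, c))
    return detections

# A 1 x 3 image over 4 frames: a constant star at column 0 and a target
# crossing column 1 in frame 1 and column 2 in frame 2.
frames = [[[100, 10, 10]], [[100, 200, 10]], [[100, 10, 200]], [[100, 10, 10]]]
print(detect_single_pulses(frames))  # [(1, 0, 1), (2, 0, 2)]
```

The star is rejected because its brightness equals its own temporal median. This is the kind of information that purely spatial detection on a single star map cannot exploit.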
Poltev, V; Anisimov, V M; Dominguez, V; Gonzalez, E; Deriabina, A; Garcia, D; Rivas, F; Polteva, N A
2018-02-01
Deciphering the mechanism by which DNA functions as the carrier of genetic information requires identifying the inherent factors that determine its structure and function. Following this path, our previous DFT studies attributed the origin of the unique conformational characteristics of right-handed Watson-Crick duplexes (WCDs) to the conformational profile of deoxydinucleoside monophosphates (dDMPs), which serve as the minimal repeating units of the DNA strand. According to those findings, the directionality of the sugar-phosphate chain and the characteristic ranges of dihedral angles of energy minima, combined with the geometric differences between purines and pyrimidines, determine the base-sequence dependence of the three-dimensional (3D) structure of WCDs. This work extends our computational study to complementary deoxydinucleoside monophosphates (cdDMPs) of non-standard conformation, including those of the Z-family, Hoogsteen duplexes, parallel-stranded structures, and duplexes with mispaired bases. For most of these systems, except those in Z-conformation, computations closely reproduce experimental data within the tolerance of the characteristic limits of dihedral parameters for each conformation family. Computation of cdDMPs with Z-conformation reveals that their experimental structures do not correspond to an internal energy minimum. This finding establishes the leading role of external factors in the formation of the Z-conformation. Energy minima of cdDMPs of non-Watson-Crick duplexes show sequence-dependence features different from those known for WCDs. The obtained results provide evidence that the biologically important regularities of 3D structure distinguish WCDs from duplexes having non-Watson-Crick nucleotide pairing.
Cho, Young Sun; Choi, Buyl Nim; Ha, En-Mi; Kim, Ki Hong; Kim, Sung Koo; Kim, Dong Soo; Nam, Yoon Kwon
2005-01-01
Novel metallothionein (MT) complementary DNA and genomic sequences were isolated from a cartilaginous shark species, Scyliorhinus torazame. The full-length open reading frame (ORF) of shark MT cDNA encoded 68 amino acids with a high cysteine content (29%). The genomic ORF sequence (932 bp) of shark MT isolated by polymerase chain reaction (PCR) comprised 3 exons with 2 intervening introns. Shark MT sequence shared many conserved features with other vertebrate MTs: overall amino acid identities of shark MT ranged from 47% to 57% with fish MTs, and 41% to 62% with mammalian MTs. However, in addition to these conserved characteristics, shark MT sequence exhibited some unique characteristics. It contained 4 extra amino acids (Lys-Ala-Gly-Arg) at the end of the beta-domain, which have not been reported in any other vertebrate MTs. The last amino acid residue at the C-terminus was Ser, which also has not been reported in fish and mammalian MTs. The MT messenger RNA levels in shark liver and kidney, assessed by semiquantitative reverse transcriptase PCR and RNA blot hybridization, were significantly affected by experimental exposures to heavy metals (cadmium, copper, and zinc). Generally, the transcriptional activation of the shark MT gene was dependent on the dose (0-10 mg/kg body weight for injection and 0-20 microM for immersion) and duration (1-10 days); zinc was a more potent inducer than copper and cadmium.
Agarwal, Pragati; Singh, Jyoti; Singh, R P
2017-05-01
Aspergillus niger PA2, a novel strain isolated from waste effluents of the food industry, is a potential extracellular tyrosinase producer. Enzyme activity and L-DOPA production were highest when glucose and peptone were used as the carbon and nitrogen sources, respectively, and increased notably when copper was supplemented, indicating the importance of copper for tyrosinase activity. The tyrosinase-encoding gene of the fungus was cloned; amplification yielded a 1127-bp DNA fragment encoding a 374-residue product, a predicted protein of 42.3 kDa with an isoelectric point of 4.8. Primary sequence analysis showed that A. niger PA2 tyrosinase shares approximately 99% identity with that of A. niger CBS 513.88, which was further confirmed by phylogenetic analysis. The inferred amino acid sequence of A. niger tyrosinase contained two putative copper-binding sites comprising six histidines, a characteristic feature of type-3 copper proteins, which were highly conserved in all tyrosinases throughout the Aspergillus species. When superimposed onto the tertiary structure of A. oryzae tyrosinase, the conserved residues from both organisms occupied the same spatial positions, forming a di-copper-binding peptide groove.
Genomic analyses of Clostridium perfringens isolates from five toxinotypes.
Hassan, Karl A; Elbourne, Liam D H; Tetu, Sasha G; Melville, Stephen B; Rood, Julian I; Paulsen, Ian T
2015-05-01
Clostridium perfringens can be isolated from a range of environments, including soil, marine and fresh water sediments, and the gastrointestinal tracts of animals and humans. Some C. perfringens strains have attractive industrial applications, e.g., in the degradation of waste products or the production of useful chemicals. However, C. perfringens has been most studied as the causative agent of a range of enteric and soft tissue infections of varying severities in humans and animals. Host preference and disease type in C. perfringens are intimately linked to the production of key extracellular toxins, and on this basis toxigenic C. perfringens strains have been classified into five toxinotypes (A-E). To date, twelve genome sequences have been generated for a diverse collection of C. perfringens isolates, including strains associated with human and animal infections, a human commensal strain, and a strain with potential industrial utility. Most of the sequenced strains are classified as toxinotype A. However, genome sequences of representative strains from each of the other four toxinotypes have also been determined. Analysis of this collection of sequences has highlighted a lack of features differentiating toxinotype A strains from the other isolates, indicating that the primary defining characteristic of toxinotype A strains is their lack of the key plasmid-encoded extracellular toxin genes associated with toxinotype B to E strains. The representative B-E strains sequenced to date each harbour many unique genes. Additional genome sequences are needed to determine if these genes are characteristic of their respective toxinotypes. Copyright © 2014. Published by Elsevier Masson SAS.
Influence of time and length size feature selections for human activity sequences recognition.
Fang, Hongqing; Chen, Long; Srinivasan, Raghavendiran
2014-01-01
In this paper, the Viterbi algorithm based on a hidden Markov model is applied to recognize activity sequences from observed sensor events. Alternative selections of the time feature values of sensor events and of the activity length size feature values are tested separately, and the resulting activity sequence recognition performance of the Viterbi algorithm is evaluated. The results show that selecting larger time feature values for sensor events and/or smaller activity length size feature values yields relatively better activity sequence recognition performance. © 2013 ISA Published by ISA All rights reserved.
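The decoding step named above, Viterbi over a hidden Markov model, can be sketched generically. In this sketch the hidden states are activities and the observations are sensor events. The state names, event names and probabilities in the usage example are invented for illustration, and the paper's time and length size features are not modelled here.

```python
import math

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state sequence for the observations,
    working in log space to avoid underflow on long sequences."""
    def log(p):
        return math.log(p) if p > 0 else float("-inf")

    # best[t][s]: log-probability of the best path ending in state s at step t
    best = [{s: log(start_p[s]) + log(emit_p[s].get(observations[0], 0))
             for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + log(emit_p[s].get(observations[t], 0))
            back[t][s] = prev
    # Trace back from the highest-scoring final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return path[::-1]

# Hypothetical activities and sensor events.
states = ["sleep", "cook"]
start = {"sleep": 0.6, "cook": 0.4}
trans = {"sleep": {"sleep": 0.8, "cook": 0.2}, "cook": {"sleep": 0.3, "cook": 0.7}}
emit = {"sleep": {"bed": 0.9, "stove": 0.1}, "cook": {"bed": 0.2, "stove": 0.8}}
print(viterbi(["bed", "bed", "stove", "stove"], states, start, trans, emit))
# ['sleep', 'sleep', 'cook', 'cook']
```

Using log probabilities turns products into sums, which keeps scores representable for long sensor-event streams.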
Cavalcanti, Sarah Desirée Barbosa; Vidal, Mônica Scarpelli Martinelli; Sousa, Maria da Glória Teixeira de; Del Negro, Gilda Maria Barbaro
2013-01-01
Coccidioidomycosis is an emerging fungal disease in Brazil; adequate maintenance and authentication of Coccidioides isolates are essential for research into the genetic diversity of the environmental organisms, as well as for understanding the human disease. Seventeen Coccidioides isolates maintained under mineral oil since 1975 in the Instituto de Medicina Tropical de São Paulo (IMTSP) culture collection, Brazil, were evaluated with respect to their viability, morphological characteristics and genetic features in order to authenticate these fungal cultures. Only five isolates were viable after almost 30 years, showing typical morphological characteristics, and sequencing analysis using Coi-F and Coi-R primers revealed 99% identity with the genus Coccidioides. These five isolates were then preserved in liquid nitrogen and sterile water, and remained viable after two years of storage under these conditions, maintaining the same features.
Robot acting on moving bodies (RAMBO): Preliminary results
NASA Technical Reports Server (NTRS)
Davis, Larry S.; Dementhon, Daniel; Bestul, Thor; Ziavras, Sotirios; Srinivasan, H. V.; Siddalingaiah, Madju; Harwood, David
1989-01-01
A robot system called RAMBO is being developed. Equipped with a camera and given a sequence of simple tasks, it can perform these tasks on a moving object. RAMBO is given a complete geometric model of the object. A low-level vision module extracts and groups characteristic features in images of the object. The positions of the object are determined in a sequence of images, and a motion estimate of the object is obtained. This motion estimate is used to plan trajectories of the robot tool to relative locations near the object that are sufficient for achieving the tasks. More specifically, low-level vision uses parallel algorithms for image enhancement by symmetric nearest neighbor filtering, edge detection by local gradient operators, and corner extraction by sector filtering. Object pose estimation uses a Hough transform method that accumulates position hypotheses obtained by matching triples of image features (corners) to triples of model features. To maximize computing speed, the position in space of a triple of features is estimated by decomposing its perspective view into a product of rotations and a scaled orthographic projection. This allows the use of 2-D lookup tables at each stage of the decomposition. The position hypotheses for each possible match of model feature triples and image feature triples are calculated in parallel. Trajectory planning combines heuristic and dynamic programming techniques. Trajectories are then created using parametric cubic splines between initial and goal trajectories. All the parallel algorithms run on a Connection Machine CM-2 with 16K processors.
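Symmetric nearest neighbor filtering, the enhancement step mentioned above, smooths noise while preserving edges. The sketch below uses one common formulation of the technique; it is not necessarily RAMBO's exact variant. For each of the four opposite-neighbor pairs in a 3 × 3 window, the neighbor closer in value to the centre pixel is kept, and the four kept values are averaged. It is a sequential illustration rather than the parallel CM-2 implementation, and copying border pixels unchanged is an assumption made for simplicity.

```python
def snn_filter(img):
    """Symmetric nearest neighbor smoothing over a 3x3 window.
    img is a list of lists of numbers; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    # Opposite neighbor pairs: N-S, W-E, NW-SE, NE-SW.
    pairs = [((-1, 0), (1, 0)), ((0, -1), (0, 1)),
             ((-1, -1), (1, 1)), ((-1, 1), (1, -1))]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            centre = img[r][c]
            kept = []
            for (dr1, dc1), (dr2, dc2) in pairs:
                a = img[r + dr1][c + dc1]
                b = img[r + dr2][c + dc2]
                # Keep whichever member of the pair is closer to the centre.
                kept.append(a if abs(a - centre) <= abs(b - centre) else b)
            out[r][c] = sum(kept) / len(kept)
    return out

edge = [[0, 10, 10],
        [0, 10, 10],
        [0, 10, 10]]
print(snn_filter(edge)[1][1])  # 10.0
```

In the example, a plain 3 × 3 mean would blur the centre pixel to about 6.67. The symmetric nearest neighbor rule keeps it at 10.0 because, from each opposite pair, it chooses the neighbor on the same side of the edge.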
Stargardt disease: clinical features, molecular genetics, animal models and therapeutic options
Tanna, Preena; Strauss, Rupert W; Fujinami, Kaoru; Michaelides, Michel
2017-01-01
Stargardt disease (STGD1; MIM 248200) is the most prevalent inherited macular dystrophy and is associated with disease-causing sequence variants in the gene ABCA4. Significant advances have been made over the last 10 years in our understanding of both the clinical and molecular features of STGD1, and also the underlying pathophysiology, which has culminated in ongoing and planned human clinical trials of novel therapies. The aims of this review are to describe the detailed phenotypic and genotypic characteristics of the disease, conventional and novel imaging findings, current knowledge of animal models and pathogenesis, and the multiple avenues of intervention being explored. PMID:27491360
Chen, Zhen; Zhao, Pei; Li, Fuyi; Leier, André; Marquez-Lago, Tatiana T; Wang, Yanan; Webb, Geoffrey I; Smith, A Ian; Daly, Roger J; Chou, Kuo-Chen; Song, Jiangning
2018-03-08
Structural and physicochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating training, analysis, and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. Availability: http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. Contact: jiangning.song@monash.edu; kcchou@gordonlifescience.org; roger.daly@monash.edu. Supplementary data are available at Bioinformatics online.
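As an illustration of what a numerical sequence encoding scheme looks like, the sketch below computes two simple composition descriptors: amino acid composition (20 values) and dipeptide composition (400 values). These are standalone functions written for this example. They do not use iFeature's code or command-line interface, and ignoring non-standard letters is an assumption.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def aac(sequence):
    """Amino acid composition: fraction of each of the 20 standard residues.
    Non-standard letters (e.g. X, B) are ignored."""
    residues = [ch for ch in sequence.upper() if ch in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard residues")
    counts = Counter(residues)
    return [counts[a] / len(residues) for a in AMINO_ACIDS]

def dpc(sequence):
    """Dipeptide composition: fraction of each of the 400 ordered pairs of
    adjacent residues; pairs touching a non-standard letter are skipped."""
    seq = sequence.upper()
    pairs = Counter((a, b) for a, b in zip(seq, seq[1:])
                    if a in AMINO_ACIDS and b in AMINO_ACIDS)
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("sequence contains no standard dipeptides")
    return [pairs[(a, b)] / total for a, b in product(AMINO_ACIDS, repeat=2)]

print(aac("AACD")[AMINO_ACIDS.index("A")])  # 0.5
print(len(dpc("AACD")))                     # 400
```

Both functions return fixed-length vectors regardless of sequence length. This property lets sequences of different lengths be fed to the same machine-learning model.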
Estep, Anne L; Tidyman, William E; Teitell, Michael A; Cotter, Philip D; Rauen, Katherine A
2006-01-01
Costello syndrome (CS) is a complex developmental disorder involving characteristic craniofacial features, failure to thrive, developmental delay, cardiac and skeletal anomalies, and a predisposition to develop neoplasia. Based on similarities with other cancer syndromes, we previously hypothesized that CS is likely due to activation of signal transduction through the Ras/MAPK pathway [Tartaglia et al., 2003]. In this study, the HRAS coding region was sequenced for mutations in a large, well-characterized cohort of 36 CS patients. Heterogeneous missense point mutations predicting an amino acid substitution were identified in 33/36 (92%) patients. The majority (91%) had a 34G --> A transition in codon 12. Less frequent mutations included 35G --> C (codon 12) and 37G --> T (codon 13). Parental samples did not have an HRAS mutation, supporting the hypothesis of de novo heterogeneous mutations. There is phenotypic variability among patients with a 34G --> A transition. The most consistent features included characteristic facies and skin, failure to thrive, developmental delay, musculoskeletal abnormalities, visual impairment, cardiac abnormalities, and generalized hyperpigmentation. The two patients with 35G --> C had cardiac arrhythmias, whereas one patient with a 37G --> T transversion had an enlarged aortic root. Of the patients with a clinical diagnosis of CS, neoplasia was the most consistent phenotypic feature for predicting an HRAS mutation. To gain an understanding of the relationship between constitutional HRAS mutations and malignancy, HRAS was sequenced in an advanced biphasic rhabdomyosarcoma/fibrosarcoma from an individual with a 34G --> A mutation. Loss of the wild-type HRAS allele was observed, suggesting that tumorigenesis in CS patients is accompanied by additional somatic changes affecting HRAS. Finally, due to phenotypic overlap between CS and cardio-facio-cutaneous (CFC) syndromes, the HRAS coding region was sequenced in a well-characterized CFC cohort. 
No mutations were found, which supports a distinct genetic etiology for CS and CFC syndromes. (c) 2005 Wiley-Liss, Inc.
NASA Astrophysics Data System (ADS)
Barnes, S. J.; Dering, G.
2016-12-01
Previous studies of large komatiite fields in Archean greenstone belts in Western Australia and elsewhere have led to the suggestion that komatiite lavas were emplaced by mechanisms similar to those of modern pahoehoe flows, notwithstanding the very low viscosities and sea-floor eruption setting of komatiites. We use UAV photogrammetry to identify and map inflation features characteristic of modern pahoehoe flows in Archean komatiites at the Gordon Sirdar Lake locality near Kalgoorlie. Komatiite lavas, forming part of the 2705 Ma plume-related bimodal volcanic sequence of the Eastern Goldfields Superterrane, Yilgarn Craton, were emplaced within a sequence of dacitic lava flows and semi-consolidated tuffs. The sequence was tilted to the vertical on the flanks of a regional isoclinal fold, and is exposed as partially weathered outcrop in the bed of a playa lake. Komatiite lava lobes form characteristic lenticular cross sections 1-6 m thick and up to 20 m long, in some cases with lower margins draped over pre-existing dacite flow tops, and in others showing invasive textures implying eruption onto or into wet sediment. Inflation features include tumuli, inflation clefts, breakouts, and terraced margins. Spinifex textures are preserved locally at flow tops and rarely at bases. The high temperatures (>1400 °C) and low viscosities (<50 Pa s) of komatiites evidently do not preclude inflation as an emplacement mechanism of individual flows. Flow-top morphology has been used to identify inflation of basaltic lava flows in Martian environments. We suggest these criteria may be extended to the possible recognition of Martian komatiites.
NGS Catalog: A Database of Next Generation Sequencing Studies in Humans
Xia, Junfeng; Wang, Qingguo; Jia, Peilin; Wang, Bing; Pao, William; Zhao, Zhongming
2015-01-01
Next generation sequencing (NGS) technologies have been rapidly applied in biomedical and biological research since their advent only a few years ago, and they are expected to advance at an unprecedented pace in the coming years. To provide the research community with a comprehensive NGS resource, we have developed the database Next Generation Sequencing Catalog (NGS Catalog, http://bioinfo.mc.vanderbilt.edu/NGS/index.html), a continually updated database that collects, curates and manages available human NGS data obtained from published literature. NGS Catalog deposits publication information of NGS studies and their mutation characteristics (SNVs, small insertions/deletions, copy number variations, and structural variants), as well as mutated genes and gene fusions detected by NGS. Other functions include user data upload, NGS general analysis pipelines, and NGS software. NGS Catalog is particularly useful for investigators who are new to NGS but would like to take advantage of these powerful technologies for their own research. Finally, based on the data deposited in NGS Catalog, we summarized features and findings from whole exome sequencing, whole genome sequencing, and transcriptome sequencing studies for human diseases or traits. PMID:22517761
Processing Dynamic Image Sequences from a Moving Sensor.
1984-02-01
De Lillo, Carlo; Kirby, Melissa; Poole, Daniel
2016-01-01
Immediate serial spatial recall measures the ability to retain sequences of locations in short-term memory and is considered the spatial equivalent of digit span. It is tested by requiring participants to reproduce sequences of movements performed by an experimenter or displayed on a monitor. Different organizational factors dramatically affect serial spatial recall, but they are often confounded or underspecified. Untangling them is crucial for the characterization of working-memory models and for establishing the contribution of structure and memory capacity to spatial span. We report five experiments assessing the relative role and independence of factors that have been reported in the literature. Experiment 1 disentangled the effects of spatial clustering and path length by manipulating the distance of items displayed on a touchscreen monitor. Long-path sequences segregated by spatial clusters were compared with short-path sequences not segregated by clusters. Recall was more accurate for sequences segregated by clusters, independently of path length. Experiment 2 featured conditions where temporal pauses were introduced between or within cluster boundaries during the presentation of sequences with the same paths. Thus, the temporal structure of the sequences was either consistent or inconsistent with a hierarchical representation based on segmentation by spatial clusters, but the effect of structure could not be confounded with effects of path characteristics. Pauses at cluster boundaries yielded more accurate recall, as predicted by a hierarchical model. In Experiment 3, the systematic manipulation of sequence structure, path length, and presence of path crossings showed that structure explained most of the variance, followed by the presence/absence of path crossings, and path length. 
Experiments 4 and 5 replicated the results of the previous experiments in immersive virtual reality navigation tasks where the viewpoint of the observer changed dynamically during encoding and recall. This suggested that the effects of structure in spatial span are not dependent on perceptual grouping processes induced by the aerial view of the stimulus array typically afforded by spatial recall tasks. These results demonstrate the independence of coding strategies based on structure from effects of path characteristics and perceptual grouping in immediate serial spatial recall. PMID:27891101
Filteau, Marie; Lagacé, Luc; LaPointe, Gisèle; Roy, Denis
2010-04-01
An arbitrarily primed community PCR fingerprinting technique based on capillary electrophoresis was developed to study maple sap microbial community characteristics among 19 production sites in Québec over the tapping season. Presumptive fragment identification was made with corresponding fingerprint profiles of bacterial isolate cultures. Maple sap microbial communities were subsequently compared using a representative subset of 13 16S rRNA gene clone libraries followed by gene sequence analysis. Results from both methods indicated that all maple sap production sites and flow periods shared common microbiota members, but distinctive features also existed. Changes over the season in the relative abundance of predominant populations showed evidence of a common pattern. Pseudomonas (64%) and Rahnella (8%) were the most abundantly and frequently represented genera of the 2239 sequences analyzed. Janthinobacterium, Leuconostoc, Lactococcus, Weissella, Epilithonimonas and Sphingomonas were revealed as occasional contaminants in maple sap. Maple sap microbiota showed a low level of deep diversity along with a high variation of similar 16S rRNA gene sequences within the Pseudomonas genus. Predominance of Pseudomonas is suggested as a typical feature of maple sap microbiota across geographical regions, production sites, and sap flow periods.
Huckabee, Maggie-Lee; Lamvik, Kristin; Jones, Richard
2014-08-15
Clinical data are submitted as documentation of a pathophysiologic feature of dysphagia termed pharyngeal mis-sequencing and to encourage clinicians and researchers to adopt more critical approaches to diagnosis and treatment planning. Recent clinical experience has identified a cohort of patients who present with an atypical dysphagia not specifically described in the literature: mis-sequenced constriction of the pharynx when swallowing. As a result, they are unable to coordinate streamlined bolus transfer from the pharynx into the esophagus. This mis-sequencing contributes to nasal redirection, aspiration, and, for some, the inability to safely tolerate an oral diet. Sixteen patients (8 females, 8 males), with a mean age of 44 years (range=25-78), had an average time post-onset of 23 months (range=2-72) at initiation of intensive rehabilitation. A 3-channel manometric catheter was used to measure pharyngeal pressure. The average peak-to-peak latency between nadir pressures at sensor-1 and sensor-2 was 15 ms (95% CI, -2 to 33 ms), compared to a normative mean latency of 239 ms (95% CI, 215 to 263 ms). Rehabilitative responses are summarized, along with a single detailed case report. It is unclear from these data whether pharyngeal mis-sequencing is (i) a pathological feature of impaired motor planning from brainstem damage or (ii) a maladaptive compensation developed in response to chronic dysphagia. Future investigation is needed to provide a full report of pharyngeal mis-sequencing and its implications for our understanding of the underlying neural control of swallowing. Copyright © 2014 Elsevier B.V. All rights reserved.
Chen, Yuantao; Xu, Weihong; Kuang, Fangjun; Gao, Shangbing
2013-01-01
Efficient target tracking algorithms have become a current research focus for intelligent robots. Target tracking on mobile robots faces environmental uncertainty: target states are very difficult to estimate, and illumination changes, target shape changes, complex backgrounds, occlusion, and other factors all affect tracking robustness. To further improve the accuracy and reliability of target tracking, we present a novel target tracking algorithm that uses visual saliency and an adaptive support vector machine (ASVM). The algorithm is based on the mixture saliency of image features, including color, brightness, and motion features. The algorithm uses these visual saliency features, and their common characteristics are expressed as the target's saliency. Numerous experiments demonstrate the effectiveness and timeliness of the proposed target tracking algorithm in video sequences where the target objects undergo large changes in pose, scale, and illumination.
Hooper, Paula; Knuiman, Matthew; Foster, Sarah; Giles-Corti, Billie
2015-11-01
Planning policy makers are requesting clearer guidance on the key design features required to build neighbourhoods that promote active living. Using a backwards stepwise elimination procedure (logistic regression with generalised estimating equations adjusting for demographic characteristics, self-selection factors, stage of construction and scale of development) this study identified specific design features (n=16) from an operational planning policy ("Liveable Neighbourhoods") that showed the strongest associations with walking behaviours (measured using the Neighbourhood Physical Activity Questionnaire). The interacting effects of design features on walking behaviours were also investigated. The urban design features identified were grouped into the "building blocks of a Liveable Neighbourhood", reflecting the scale, importance and sequencing of the design and implementation phases required to create walkable, pedestrian friendly developments. Copyright © 2015 Elsevier Ltd. All rights reserved.
McGillewie, Lara; Ramesh, Muthusamy; Soliman, Mahmoud E
2017-10-01
Aspartic proteases are a class of hydrolytic enzymes that have been implicated in a number of diseases such as HIV, malaria, cancer and Alzheimer's. The flap region is a unique structural feature of aspartic proteases and has been found to have a profound impact on overall protein structure, function and dynamics. Flap dynamics also plays a crucial role in drug binding and drug resistance. Therefore, understanding the structure and dynamic behavior of the flap region is crucial in the design of potent and selective inhibitors against aspartic proteases. Defining metrics that can describe flap motion and dynamics has been a challenging topic in the literature. This review is the first attempt to compile comprehensive information on the sequence, structure, motion and metrics used to assess the dynamics of the flap region of different aspartic proteases in "one pot". We believe that this review will be of critical importance to researchers from different scientific domains.
MatureP: prediction of secreted proteins with exclusive information from their mature regions.
Orfanoudaki, Georgia; Markaki, Maria; Chatzi, Katerina; Tsamardinos, Ioannis; Economou, Anastassios
2017-06-12
More than a third of the cellular proteome is non-cytoplasmic. Most secretory proteins use the Sec system for export and are targeted to membranes using signal peptides and mature domains. To specifically analyze bacterial mature domain features, we developed MatureP, a classifier that predicts secretory sequences through features computed exclusively from their mature domains. MatureP was trained using Just Add Data Bio, an automated machine learning tool. Mature domains are predicted efficiently, with ~92% success as measured by the Area Under the Receiver Operating Characteristic Curve (AUC). Predictions were validated using experimental datasets of mutated secretory proteins. The features selected by MatureP reveal prominent differences in amino acid content between secreted and cytoplasmic proteins. Amino-terminal mature domain sequences have enhanced disorder, more hydroxyl and polar residues, and fewer hydrophobic residues. Cytoplasmic proteins have prominent amino-terminal hydrophobic stretches and charged regions downstream. Presumably, secretory mature domains comprise a distinct protein class. They balance properties that promote the flexibility required to maintain non-folded states during targeting and secretion with the ability to fold after secretion. These findings provide novel insight into protein trafficking, sorting and folding mechanisms and may benefit protein secretion biotechnology.
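The AUC reported above can be read as the probability that a randomly chosen secretory protein receives a higher classifier score than a randomly chosen cytoplasmic one. The minimal sketch below computes AUC directly from that definition, counting ties as half. The labels and scores in the example are invented and are not MatureP outputs. The pairwise loop is quadratic, so a rank-based formula or a library routine would be preferable for large datasets.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive (label 1) outscores a
    random negative (label 0); tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical classifier scores: 1 = secreted, 0 = cytoplasmic.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```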
Identification and characterization of Burkholderia multivorans CCA53.
Akita, Hironaga; Kimura, Zen-Ichiro; Yusoff, Mohd Zulkhairi Mohd; Nakashima, Nobutaka; Hoshino, Tamotsu
2017-07-06
A lignin-degrading bacterium, Burkholderia sp. CCA53, was previously isolated from leaf soil. The purpose of this study was to determine phenotypic and biochemical features of Burkholderia sp. CCA53. Multilocus sequence typing (MLST) analysis based on fragments of the atpD, gltD, gyrB, lepA, recA and trpB gene sequences was performed to identify Burkholderia sp. CCA53. The MLST analysis revealed that Burkholderia sp. CCA53 was tightly clustered with B. multivorans ATCC BAA-247T. The quinone and cellular fatty acid profiles, carbon source utilization, growth temperature and pH were consistent with the characteristics of B. multivorans species. Burkholderia sp. CCA53 was therefore identified as B. multivorans CCA53.
Meng, Xiaohong; Li, Qiyou; Guo, Hong; Xu, Haiwei; Li, Shiying; Yin, Zhengqin
2017-01-01
To characterize the clinical and molecular genetic characteristics of a large, multigenerational Chinese family showing different phenotypes. A pedigree consisting of 56 individuals in 5 generations was recruited. Comprehensive ophthalmic examinations were performed in 16 affected family members. Mutation screening of CYP4V2 was performed by Sanger sequencing. Next-generation sequencing (NGS) was performed to capture and sequence all exons of 47 known retinal dystrophy-associated genes in two affected family members who had no mutations in CYP4V2. The variants detected by NGS were validated by Sanger sequencing in the family members. Two compound heterozygous CYP4V2 mutations (c.802-8_810del17insGC and c.992A>C) were detected in the proband, who presented typical clinical features of Bietti crystalline dystrophy (BCD). One missense mutation (c.1482C>T, p.T494M) in the PRPF3 gene was detected in 9 out of 22 affected family members who manifested classical clinical features of retinitis pigmentosa (RP). Our results showed that two compound heterozygous CYP4V2 mutations caused BCD, and one missense mutation in PRPF3 was responsible for autosomal dominant RP (adRP) in this large family. This study suggests that accurate phenotypic diagnosis, molecular diagnosis, and genetic counseling are necessary for patients with hereditary retinal degeneration in large multigenerational families.
Unique Trichomonas vaginalis gene sequences identified in multinational regions of Northwest China.
Liu, Jun; Feng, Meng; Wang, Xiaolan; Fu, Yongfeng; Ma, Cailing; Cheng, Xunjia
2017-07-24
Trichomonas vaginalis (T. vaginalis) is a flagellated protozoan parasite that infects humans worldwide. This study determined the sequence of the 18S ribosomal RNA gene of T. vaginalis infecting both females and males in Xinjiang, China. Samples from 73 females and 28 males were collected and confirmed to be infected with T. vaginalis; sequencing of the T. vaginalis 18S ribosomal RNA gene identified a total of 110 sequences. These sequences were used to construct a phylogenetic network. The rooted network comprised three large clades and several independent branches. Most of the Xinjiang sequences fell into one group. Preliminary results suggest that Xinjiang T. vaginalis isolates might be genetically unique, as indicated by the sequence of their 18S ribosomal RNA gene. The low migration rate of local people in this province may contribute to the genetic conservation of T. vaginalis. The unique genetic features of our isolates may suggest a different clinical presentation of trichomoniasis, including differences in metronidazole susceptibility and in T. vaginalis virus or Mycoplasma co-infection. The transmission and evolution of Xinjiang T. vaginalis isolates are of interest and should be studied further. More attention should be given to T. vaginalis infection in both females and males in Xinjiang.
Genotype and Phenotype of Echinococcus granulosus Derived from Wild Sheep (Ovis orientalis) in Iran.
Eslami, Ali; Meshgi, Behnam; Jalousian, Fatemeh; Rahmani, Shima; Salari, Mohammad Ali
2016-02-01
The aim of the present study was to determine the genotypic and phenotypic characteristics of Echinococcus granulosus derived from wild sheep and to compare them with those of E. granulosus sensu stricto (sheep-dog strain) and the E. granulosus camel strain (camel-dog) in Iran. In Khojir National Park, near Tehran, Iran, a fertile hydatid cyst was recently found in the liver of a dead wild sheep (Ovis orientalis). The number of protoscolices (n=6,000) was sufficient for experimental infection of a dog. Statistically, the characteristics of the large and small hooks of the metacestode matched the sensu stricto strain but not the camel strain (P=0.5). To determine the E. granulosus genotype, 20 adult worms were collected from the infected dog. The second internal transcribed spacer (ITS2) of the nuclear ribosomal DNA (rDNA) and the cytochrome c oxidase subunit 1 (COX1) gene of the mitochondrial DNA were amplified by PCR from individual adult worms. The PCR products were then sequenced by the Sanger method. The lengths of the ITS2 and COX1 sequences were 378 and 857 bp, respectively, for all sequenced samples. The amplified DNA sequences from both ribosomal and mitochondrial genes were highly similar (99% and 98%, respectively) to those of the ovine strain in the GenBank database. The results of the present study indicate that the morpho-molecular features and characteristics of E. granulosus in Iranian wild sheep are the same as those of the sheep-dog E. granulosus sensu stricto strain.
Liu, Bin; Liu, Fule; Fang, Longyun; Wang, Xiaolong; Chou, Kuo-Chen
2015-04-15
In order to develop powerful computational predictors for identifying the biological features or attributes of DNAs, one of the most challenging problems is to find a suitable approach to effectively represent DNA sequences. To facilitate studies of DNAs and nucleotides, we developed a Python package called representations of DNAs (repDNA) for generating widely used features that reflect the physicochemical properties and sequence-order effects of DNAs and nucleotides. There are three feature groups comprising 15 features. The first group calculates three nucleic acid composition features that describe local sequence information by means of k-mers; the second group calculates six autocorrelation features that describe the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties; the third group calculates six pseudo nucleotide composition features, which can be used to represent a DNA sequence with a discrete model or vector while still retaining considerable sequence-order information via the physicochemical properties of its constituent oligonucleotides. In addition, repDNA can easily calculate these features from both built-in and user-defined properties. The repDNA Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repDNA/. Contact: bliu@insun.hit.edu.cn or kcchou@gordonlifescience.org. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved.
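The first two feature groups can be illustrated with a short, dependency-free sketch. This is not repDNA's API. The function names are hypothetical, and the dinucleotide "property" used in the example is a toy G+C count rather than a published physicochemical scale. The autocovariance follows the usual dinucleotide-based definition: the mean of the products of deviations from the sequence mean at a given lag.

```python
from itertools import product


def kmer_composition(seq: str, k: int) -> dict[str, float]:
    """Normalized frequency of every possible k-mer over the alphabet ACGT."""
    seq = seq.upper()
    counts = {"".join(p): 0 for p in product("ACGT", repeat=k)}
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # windows containing N or other symbols are skipped
            counts[kmer] += 1
            total += 1
    return {kmer: (c / total if total else 0.0) for kmer, c in counts.items()}


def dinucleotide_autocovariance(seq: str, prop: dict[str, float], lag: int) -> float:
    """Autocovariance of one dinucleotide property between positions i and i + lag."""
    seq = seq.upper()
    values = [prop[seq[i:i + 2]] for i in range(len(seq) - 1)]  # one value per dinucleotide
    n = len(values) - lag
    if n <= 0:
        raise ValueError("lag must be shorter than the number of dinucleotides")
    mean = sum(values) / len(values)
    return sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n)) / n


# Toy property (hypothetical): number of G or C bases in each dinucleotide.
TOY_GC_PROPERTY = {
    a + b: float((a + b).count("G") + (a + b).count("C")) for a in "ACGT" for b in "ACGT"
}

if __name__ == "__main__":
    print(kmer_composition("ACGTTGCA", 2)["TG"])
    print(dinucleotide_autocovariance("GCGCAT", TOY_GC_PROPERTY, lag=1))
```

Swapping `TOY_GC_PROPERTY` for a real property table is the only change needed to compute a different autocovariance feature. That mirrors the abstract's point that the features work with both built-in and user-defined properties.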
Hydropathic self-organized criticality: a magic wand for protein physics.
Phillips, J C
2012-10-01
Self-organized criticality (SOC) is a popular concept that has been the subject of more than 3000 articles over the last 25 years. The characteristic signature of SOC is the appearance of self-similarity (power-law scaling) in observable properties. A characteristic observable protein property that describes protein-water interactions is the water-accessible (hydropathic) interfacial area of compacted globular protein networks. Here we show that hydropathic power-law (size- or length-scale-dependent) exponents derived from SOC enable theory to connect standard Web-based (BLAST) short-range amino acid (aa) sequence similarities to long-range aa sequence hydropathic roughening form factors that hierarchically describe evolutionary trends in water-membrane protein interactions. Our method uses hydropathic aa exponents that define a non-Euclidean metric realistically rooted in the atomic coordinates of 5526 protein segments. These hydropathic aa exponents thereby encapsulate universal (but previously only implicit) non-Euclidean long-range differential geometrical features of the Protein Data Bank. They readily organize small mutational aa sequence differences between human proteins and those of closely related species. For rhodopsin, the most studied transmembrane signaling protein and one associated with night vision, the analysis separates Euclidean short-range and non-Euclidean long-range aa sequence properties and shows that they correlate with 96% success for humans, monkeys, cats, mice and rabbits. Proper application of SOC using hydropathic aa exponents promises unprecedented simplifications of exponentially complex protein sequence-structure-function problems, both conceptual and practical.
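Power-law scaling, the SOC signature mentioned above, means an observable behaves as y = c·x^a, so the exponent a is the slope of log y against log x. The sketch below is a generic least-squares log-log fit. It is not the procedure used to derive the hydropathic exponents in this work, and the data in the usage line are synthetic.

```python
import math


def power_law_exponent(xs, ys):
    """Fit y = c * x**a by least squares on log(y) vs log(x); returns (a, c)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mx) ** 2 for u in lx)
    sxy = sum((u - mx) * (v - my) for u, v in zip(lx, ly))
    slope = sxy / sxx                      # the power-law exponent a
    intercept = my - slope * mx            # log of the prefactor c
    return slope, math.exp(intercept)


if __name__ == "__main__":
    # Synthetic, noise-free data with a hypothetical exponent of 0.16.
    lengths = [3, 5, 9, 17, 35]
    areas = [3.0 * L ** 0.16 for L in lengths]
    print(power_law_exponent(lengths, areas))
```

With real, noisy measurements the fitted slope is an estimate. Checking that the log-log plot is actually linear over the chosen range is part of claiming scaling at all.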
ITS2 data corroborate a monophyletic chlorophycean DO-group (Sphaeropleales)
2008-01-01
Background: Within the Chlorophyceae, the ITS2 secondary structure shows an unbranched helix I, except in the 'Hydrodictyon' and 'Scenedesmus' clades, which have a ramified first helix. The latter two are classified within the Sphaeropleales, which are characterised by directly opposed basal bodies in their flagellar apparatuses (DO-group). Previous studies could not unambiguously resolve the taxonomic position of the 'Sphaeroplea' clade within the Chlorophyceae, and two pivotal questions remain open: (1) Is the DO-group monophyletic, and (2) is a branched helix I an apomorphic feature of the DO-group? In the present study we analysed the secondary structure of three newly obtained ITS2 sequences classified within the 'Sphaeroplea' clade and resolved sphaeroplealean relationships by applying different phylogenetic approaches based on a combined sequence-structure alignment. Results: The newly obtained ITS2 sequences of Ankyra judayi, Atractomorpha porcata and Sphaeroplea annulina of the 'Sphaeroplea' clade do not show any branching in the secondary structure of their helix I. All applied phylogenetic methods strongly support the 'Sphaeroplea' clade as a sister group to the 'core Sphaeropleales'. Thus, the DO-group is monophyletic. Furthermore, characteristics of the sequence-structure alignment make it possible to distinguish distinct lineages within the green algae. Conclusion: In green algae, a branched helix I in the ITS2 secondary structure evolved after the divergence of the 'Sphaeroplea' clade. A branched helix I is an apomorphic characteristic within the monophyletic DO-group. Our results corroborate the fundamental relevance of including secondary structure in sequence analysis and phylogenetics. PMID:18655698
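Whether a helix is "branched" can be stated precisely once its secondary structure is written in dot-bracket notation. The helix is branched if some base pair closes a loop that directly contains two or more helices (a multiloop). The sketch below assumes helix I has already been predicted and cut out as a well-nested dot-bracket string. The function names and toy structures are hypothetical and not taken from the study.

```python
def pair_table(db: str) -> dict[int, int]:
    """Map each opening position to its closing partner for a dot-bracket string."""
    stack, pairs = [], {}
    for i, ch in enumerate(db):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs[stack.pop()] = i
        elif ch != ".":
            raise ValueError(f"unexpected symbol {ch!r}")
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def is_branched(db: str) -> bool:
    """True if any base pair directly encloses two or more helices."""
    pairs = pair_table(db)
    for i, j in pairs.items():
        children, k = 0, i + 1
        while k < j:
            if k in pairs:          # a nested helix starts here
                children += 1
                k = pairs[k] + 1    # skip over the whole nested helix
            else:
                k += 1              # unpaired base in this loop
        if children >= 2:
            return True
    return False


if __name__ == "__main__":
    print(is_branched("((..((...))..))"))       # stem with an internal loop
    print(is_branched("((((...))((...))))"))    # stem that splits in two
```

Internal loops and bulges keep a single nested helix, so they do not count as branching. Only a loop enclosing two or more helices does.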
Rai, Shesh N; Trainor, Patrick J; Khosravi, Farhad; Kloecker, Goetz; Panchapakesan, Balaji
2016-01-01
The development of biosensors that produce time series data will facilitate improvements in biomedical diagnostics and in personalized medicine. The time series produced by these devices often contain characteristic features arising from biochemical interactions between the sample and the sensor. Similarity-based classifiers can use such characteristic features to determine sample class. However, the construction of such classifiers is complicated by variability in the time domain of these series, which renders traditional distance metrics such as Euclidean distance ineffective at distinguishing between biological variance and time-domain variance. The dynamic time warping (DTW) algorithm is a sequence alignment algorithm that can align two or more series to facilitate quantifying their similarity. In this article, we evaluated the performance of DTW distance-based similarity classifiers for classifying time series that mimic electrical signals produced by nanotube biosensors. Simulation studies demonstrated the good performance of such classifiers in discriminating between time series containing characteristic features that are obscured by noise in the intensity and time domains. We then applied a DTW distance-based k-nearest neighbors classifier to distinguish the presence/absence of mesenchymal biomarker in cancer cells in buffy coats in a blinded test. Using a train-test approach, we found that the classifier had high sensitivity (90.9%) and specificity (81.8%) in differentiating between EpCAM-positive MCF7 cells spiked in buffy coats and plain buffy coats.
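A minimal DTW distance with a k-nearest-neighbour vote makes the idea concrete. DTW lets one series stretch or compress in time before points are compared, so a shifted peak still matches. This is a textbook O(nm) dynamic-programming sketch with an optional Sakoe-Chiba band. It is not the authors' implementation, and the toy series and labels are hypothetical.

```python
import math
from collections import Counter


def dtw_distance(a, b, window=None):
    """Dynamic time warping distance with absolute-difference local cost."""
    n, m = len(a), len(b)
    # Sakoe-Chiba band; widen it so the end cell (n, m) stays reachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: insertion, deletion, match
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def knn_predict(query, train_series, train_labels, k=1, window=None):
    """Majority label among the k training series closest to the query under DTW."""
    scored = sorted(
        ((dtw_distance(query, s, window), label) for s, label in zip(train_series, train_labels)),
        key=lambda t: t[0],
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    train = [[0, 0, 1, 5, 1, 0], [0, 1, 5, 1, 0, 0], [0, 0, 0, 0, 0, 0], [0, 0, 0, 1, 0, 0]]
    labels = ["peak", "peak", "flat", "flat"]
    print(knn_predict([0, 0, 0, 1, 5, 1], train, labels, k=1))
```

The query's peak sits one step later than in either "peak" training series. A point-by-point Euclidean comparison would penalize that shift, while DTW absorbs most of it.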
Palmisano, Aldo N.; Winton, James R.; Dickhoff, Walton W.
1999-01-01
We cloned and sequenced a chinook salmon Hsp90 cDNA; sequence analysis shows it to be Hsp90α. Phylogenetic analysis supports the hypothesis that the α and β paralogs of Hsp90 arose from a gene duplication event and diverged early in vertebrate evolution, before tetrapods separated from the teleost lineage. Among several differences distinguishing poikilothermic Hsp90α sequences from their bird and mammal orthologs, the teleost versions specifically lack a characteristic QTQDQP phosphorylation site near the N-terminus. We used the cDNA to develop an RNA (Northern) blot assay to quantify cellular Hsp90 mRNA levels. Chinook salmon embryonic (CHSE-214) cells responded to heat shock with a rapid rise in Hsp90 mRNA through 4 h, followed by a gradual decline over the next 20 h. Hsp90 mRNA levels may be useful as a stress indicator, especially in a laboratory setting or in response to acute heat stress.
A Parvovirus B19 synthetic genome: sequence features and functional competence.
Manaresi, Elisabetta; Conti, Ilaria; Bua, Gloria; Bonvicini, Francesca; Gallinella, Giorgio
2017-08-01
Central to genetic studies of Parvovirus B19 (B19V) is the availability of genomic clones that are functionally competent and able to generate infectious virus. In this study, we established a new model genetic system for Parvovirus B19. We followed a synthetic approach: we designed a reference genome sequence, generated a corresponding artificial construct, cloned it in complete and functional form, and set up an efficient strategy to generate infectious virus via transfection in UT7/EpoS1 cells and amplification in erythroid progenitor cells. The synthetic genome generated virus with biological properties paralleling those of native virus, and its infectious activity depended on preserving self-complementarity and sequence heterogeneity within the terminal regions. A virus of defined genome sequence, obtained under controlled cell culture conditions, can serve as a reference tool for investigating the structural and functional characteristics of the virus. Copyright © 2017 Elsevier Inc. All rights reserved.
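"Self-complementarity" of a terminal region means the region can fold back and base-pair with itself. That can be checked by comparing the region with its own reverse complement. The sketch below uses a deliberately simple measure: the fraction of positions i whose base pairs with position n-1-i. It ignores hairpin loops, mismatch tolerance and wobble pairs, so it is only an illustration and not how the B19V termini were analysed. Function names and test sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


def self_complementarity(seq: str) -> float:
    """Fraction of positions i where seq[i] Watson-Crick pairs with seq[-1 - i]."""
    if not seq:
        raise ValueError("empty sequence")
    rc = reverse_complement(seq)
    # rc[i] is the complement of seq[n - 1 - i], so equality means the two bases pair
    matches = sum(1 for x, y in zip(seq.upper(), rc.upper()) if x == y)
    return matches / len(seq)


if __name__ == "__main__":
    print(self_complementarity("GAATTC"))  # a perfect palindrome
    print(self_complementarity("AAAA"))    # no self-pairing
```

A score near 1.0 indicates a near-perfect inverted repeat. A mutation that breaks the pairing lowers the score, which is the kind of change the abstract reports as harmful to infectivity.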
Clustering evolving proteins into homologous families.
Chan, Cheong Xin; Mahbob, Maisarah; Ragan, Mark A
2013-04-08
Clustering sequences into groups of putative homologs (families) is a critical first step in many areas of comparative biology and bioinformatics. The performance of clustering approaches in delineating biologically meaningful families depends strongly on characteristics of the data, including content bias and degree of divergence. New, highly scalable methods have recently been introduced to cluster the very large datasets being generated by next-generation sequencing technologies. However, there has been little systematic investigation of how characteristics of the data impact the performance of these approaches. Using clusters from a manually curated dataset as reference, we examined the performance of a widely used graph-based Markov clustering algorithm (MCL) and a greedy heuristic approach (UCLUST) in delineating protein families coded by three sets of bacterial genomes of different G+C content. Both MCL and UCLUST generated clusters that are comparable to the reference sets at specific parameter settings, although UCLUST tends to under-cluster compositionally biased sequences (G+C content 33% and 66%). Using simulated data, we sought to assess the individual effects of sequence divergence, rate heterogeneity, and underlying G+C content. Performance decreased with increasing sequence divergence, decreasing among-site rate variation, and increasing G+C bias. Two MCL-based methods recovered the simulated families more accurately than did UCLUST. MCL using local alignment distances is more robust across the investigated range of sequence features than are greedy heuristics using distances based on global alignment. Our results demonstrate that sequence divergence, rate heterogeneity and content bias can individually and in combination affect the accuracy with which MCL and UCLUST can recover homologous protein families. 
For application to data that are more divergent, and exhibit higher among-site rate variation and/or content bias, MCL may often be the better choice, especially if computational resources are not limiting.
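For readers unfamiliar with MCL, the core loop alternates two steps on a column-stochastic similarity matrix. Expansion (a matrix power) spreads random-walk flow, and inflation (element-wise power followed by renormalization) strengthens strong flows and starves weak ones, so flow eventually stays within clusters. The dense NumPy sketch below is a simplified illustration with hypothetical parameter defaults. The reference `mcl` program adds pruning and other refinements not shown here.

```python
import numpy as np


def mcl(adjacency, expansion=2, inflation=2.0, max_iter=100, tol=1e-6):
    """Simplified Markov clustering; returns clusters as sorted tuples of node indices."""
    A = np.asarray(adjacency, dtype=float)
    M = A + np.eye(len(A))                     # self-loops stabilize the iteration
    M = M / M.sum(axis=0, keepdims=True)       # column-stochastic transition matrix
    for _ in range(max_iter):
        prev = M
        M = np.linalg.matrix_power(M, expansion)   # expansion: flow along longer paths
        M = M ** inflation                         # inflation: favor strong flows
        M = M / M.sum(axis=0, keepdims=True)       # renormalize columns
        if np.allclose(M, prev, atol=tol):
            break
    clusters = set()
    for row in M:                                  # rows with mass are attractors
        members = tuple(int(i) for i in np.nonzero(row > tol)[0])
        if members:
            clusters.add(members)
    return sorted(clusters)


if __name__ == "__main__":
    adj = np.zeros((6, 6))
    for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
        adj[i, j] = adj[j, i] = 1.0
    print(mcl(adj))
```

The inflation parameter controls granularity: higher values usually yield more, smaller clusters. This is the kind of parameter setting under which the study found MCL comparable to the reference families.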
Cancelable biometrics realization with multispace random projections.
Teoh, Andrew Beng Jin; Yuang, Chong Tze
2007-10-01
Biometric characteristics cannot be changed; therefore, if they are ever compromised, the loss of privacy is permanent. This paper presents a two-factor cancelable formulation in which the biometric data are distorted in a revocable but non-reversible manner. The raw biometric data are first transformed into a fixed-length feature vector, which is then projected onto a sequence of random subspaces derived from a user-specific pseudorandom number (PRN). This process is revocable and makes replacing biometrics as easy as replacing PRNs. The formulation was verified under a number of scenarios (normal, stolen-PRN, and compromised-biometrics scenarios) using 2400 face images from the Facial Recognition Technology (FERET) database. The diversity property is also examined.
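The two-factor idea can be sketched in a few lines. The user's PRN seeds a generator, each stage draws a random matrix and orthonormalizes it with a QR decomposition, and the feature vector is projected onto each resulting subspace. Issuing a new PRN yields an unrelated template. This is an assumption-laden illustration, not the paper's exact multispace construction: stage dimensions, seeding scheme and the absence of any quantization step are all hypothetical choices.

```python
import numpy as np


def cancelable_template(features, prn, dims=(40, 40, 40)):
    """Concatenate projections of a feature vector onto PRN-derived random subspaces."""
    x = np.asarray(features, dtype=float).ravel()
    parts = []
    for stage, d in enumerate(dims):
        if d > x.size:
            raise ValueError("subspace dimension cannot exceed feature length")
        rng = np.random.default_rng([prn, stage])       # stage-specific seed from the user's PRN
        basis, _ = np.linalg.qr(rng.standard_normal((x.size, d)))  # orthonormal columns
        parts.append(basis.T @ x)                       # coordinates in this random subspace
    return np.concatenate(parts)


if __name__ == "__main__":
    face_features = np.arange(100.0)  # stand-in for a real fixed-length feature vector
    enrolled = cancelable_template(face_features, prn=12345)
    reissued = cancelable_template(face_features, prn=999)  # revocation: new PRN, new template
    print(enrolled.shape, np.allclose(enrolled, reissued))
```

Because each stage maps to fewer dimensions than the input, a single stored template does not determine the original feature vector. Matching is then done between templates generated with the same PRN.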
Sánchez-Sanhueza, G; Bello-Toledo, H; González-Rocha, G; Gonçalves, A T; Valenzuela, V; Gallardo-Escárate, C
2018-05-22
This study aimed to determine, using next-generation sequencing (NGS), the bacterial microbiota in root canals associated with persistent apical periodontitis and its relationship with the clinical characteristics of patients. Bacterial samples were taken from root canals of teeth with persistent apical periodontitis in 24 patients undergoing root canal retreatment. Bacterial DNA was extracted, and the V3-V4 variable regions of the 16S rRNA gene were amplified. The amplicons were deep sequenced with Illumina technology to establish the metagenetic relationships among the bacterial species identified. The composition and diversity of microbial communities in the root canal and their relationships with clinical features were analysed. Parametric and nonparametric tests were used to analyse differences between patient characteristics and microbial data. A total of 86 different operational taxonomic units (OTUs) were identified, and Good's nonparametric coverage estimator indicated that 99.9 ± 0.00001% of diversity was recovered per sample. The largest number of bacteria belonged to the phylum Proteobacteria. According to the American Society of Anesthesiologists (ASA) classification of medical history, ASA II-III individuals had higher richness estimates and distinct phylogenetic relationships compared with ASA I individuals (P < 0.05). A periapical index (PAI) score of 5 was associated with greater microbiota diversity than a PAI score of 4, and this index was reduced in symptomatic patients. These findings suggest a close relationship between several clinical features and greater microbiota diversity in persistent endodontic infections. This work provides a better understanding of how microbial communities interact with their host and vice versa. © 2018 International Endodontic Journal. Published by John Wiley & Sons Ltd.
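Good's coverage estimate is C = 1 − F1/N, where F1 is the number of OTUs observed exactly once (singletons) and N is the total number of reads in the sample. A value close to 1 means few OTUs are represented by a single read, so further sequencing would be unlikely to reveal many new ones. The sketch below computes it from per-OTU read counts. The example counts are invented for illustration.

```python
def goods_coverage(otu_counts):
    """Good's coverage C = 1 - F1 / N from per-OTU read counts."""
    counts = [c for c in otu_counts if c > 0]
    n_reads = sum(counts)
    if n_reads == 0:
        raise ValueError("sample has no reads")
    singletons = sum(1 for c in counts if c == 1)  # F1: OTUs seen exactly once
    return 1.0 - singletons / n_reads


if __name__ == "__main__":
    # Hypothetical sample: 5 OTUs, 20 reads, 2 singletons -> C = 1 - 2/20
    print(goods_coverage([10, 5, 1, 1, 3]))
```

Deep Illumina runs produce very large N, which is why coverage values such as the 99.9% reported above are typical of amplicon studies.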
Characteristics of dune-paleosol-sequences in Fuerteventura. - What should be questioned?
NASA Astrophysics Data System (ADS)
Faust, Dominik; Willkommen, Tobias; Yanes, Yurena; Richter, David; Zöller, Ludwig
2013-04-01
Affiliations: Dominik Faust, TU Dresden, Germany; Tobias Willkommen, TU Dresden, Germany; Yurena Yanes, CSIC Granada, Spain / Cincinnati, USA; David Richter, TU Dresden, Germany; Ludwig Zöller, Uni Bayreuth, Germany. The northern part of Fuerteventura is characterized by large dune fields. We investigated dune-paleosol sequences in four pits to establish a robust stratigraphy and to propose a standard section. The interaction of processes such as dune formation, soil formation and the redeposition of soils and sand is most important for understanding the principles of landscape development in the study area. In our view, one process cycle seems to be important. First, climbing dunes are formed from sand of shelf origin. Then soil formation could have taken place. Soil and/or sand were then eroded and deposited at toe-slope positions. This material in turn is the source of new sand supply and dune formation. The described cycle may be repeated several times, so that this ping-pong process continues. The results are sections composed of interbedded dune layers, paleosols and colluvial material. Fundamental questions still remain unanswered: Is climate change responsible for changes in the process combination (e.g., from dune formation to soil formation)? Or are these features due to a divergence phenomenon, in which different effects or results (dunes and soils) may be linked to similar causes (here: climate)? Assuming that different features (soils and dunes) were formed under one climate, increasing soil-forming intensity could be mainly a function of decreasing sand supply. This in turn could be caused by reduced sand production (see Zech et al., accepted). However, geochemical data and mollusc assemblages point to changing environments in space and even to climate modifications in time.
FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.
Bolleman, Jerven T; Mungall, Christopher J; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J P; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J A
2016-06-13
Nucleotide and protein sequence feature annotations are essential for understanding biology at the genomic, transcriptomic, and proteomic levels. For querying biological annotations with Semantic Web technologies, there was no standard that described this potentially complex location information as subject-predicate-object triples. We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions independently of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe, and potentially merge, sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federated SPARQL queries against public SPARQL endpoints and/or local private triple stores.
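To show what "location as triples" looks like, the sketch below emits Turtle for one feature on the forward strand of a linear sequence. It uses FALDO terms `faldo:location`, `faldo:Region`, `faldo:begin`, `faldo:end`, `faldo:ExactPosition`, `faldo:ForwardStrandPosition`, `faldo:position` and `faldo:reference`. The `ex:` namespace, the identifiers, and the assumption of 1-based coordinates are hypothetical. Reverse-strand and fuzzy positions are deliberately left out.

```python
FALDO = "http://biohackathon.org/resource/faldo#"
EX = "http://example.org/"  # hypothetical namespace for the demo identifiers


def faldo_region_turtle(feature: str, reference: str, begin: int, end: int) -> str:
    """Turtle describing a forward-strand feature located between begin and end."""
    if begin < 1 or end < 1:
        raise ValueError("positions are assumed to be 1-based")
    return "\n".join([
        f"@prefix faldo: <{FALDO}> .",
        f"@prefix ex: <{EX}> .",
        f"ex:{feature} faldo:location ex:{feature}_region .",
        f"ex:{feature}_region a faldo:Region ;",
        f"    faldo:begin ex:{feature}_begin ;",
        f"    faldo:end ex:{feature}_end .",
        f"ex:{feature}_begin a faldo:ExactPosition, faldo:ForwardStrandPosition ;",
        f"    faldo:position {begin} ;",
        f"    faldo:reference ex:{reference} .",
        f"ex:{feature}_end a faldo:ExactPosition, faldo:ForwardStrandPosition ;",
        f"    faldo:position {end} ;",
        f"    faldo:reference ex:{reference} .",
    ])


if __name__ == "__main__":
    print(faldo_region_turtle("gene1", "chr1", 1000, 2500))
```

Because begin and end are separate position nodes that each point at a reference sequence, the same pattern serves nucleotide features, protein residues and other coordinate systems. That shared shape is what allows annotations from different sources to be merged in one triple store.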
Yadav, Saurabh; Kumari, Pragati; Kushwaha, Hemant Ritturaj
2013-01-01
Glutaredoxins are small, ubiquitous, glutathione-dependent enzymatic antioxidants classified under the thioredoxin-fold superfamily. Glutaredoxins are classified into two types: dithiol and monothiol. Monothiol glutaredoxins, which carry the signature "CGFS" as a redox-active motif, are known for their role in oxidative stress within the cell. In the present analysis, the 138-amino-acid monothiol glutaredoxin AgGRX1 from Ashbya gossypii was identified and analyzed. Multiple sequence alignment of the AgGRX1 protein sequence revealed the characteristic motif of typical monothiol glutaredoxins, as observed in various other organisms. The proposed structure of the AgGRX1 protein was used to analyze signature folds related to the thioredoxin superfamily. Further, the study highlighted the structural features pertaining to the complex mechanism of glutathione docking and the interacting residues.
Comparative Genetic Analyses of Human Rhinovirus C (HRV-C) Complete Genome from Malaysia.
Khaw, Yam Sim; Chan, Yoke Fun; Jafar, Faizatul Lela; Othman, Norlijah; Chee, Hui Yee
2016-01-01
Human rhinovirus-C (HRV-C) has been implicated in more severe illnesses than HRV-A and HRV-B; however, the limited number of complete HRV-C genomes (complete 5' and 3' non-coding region and open reading frame sequences) has hindered in-depth genetic study of this virus. This study aimed to sequence seven complete HRV-C genomes from Malaysia and compare their genetic characteristics with those of the 18 published HRV-Cs. Seven Malaysian HRV-C complete genomes were obtained with newly redesigned primers. The seven genomes were classified as HRV-C6, C12, C22, C23, C26, C42, and pat16 based on the VP4/VP2 and VP1 pairwise distance threshold classification. Five of the seven Malaysian isolates, namely 3430-MY-10/C22, 8713-MY-10/C23, 8097-MY-11/C26, 1570-MY-10/C42, and 7383-MY-10/pat16, are the first complete genomes sequenced for these HRV-C types. The genomes of all seven Malaysian isolates displayed nucleotide similarity of 63–81% among themselves and 63–96% with other HRV-Cs. Malaysian HRV-Cs had putative immunogenic sites, putative receptor utilization and potential antiviral sites similar to those of other HRV-Cs. The genomic features of the Malaysian isolates were similar to those of other HRV-Cs. Negative selection was frequently detected in the complete coding sequences of HRV-Cs, indicating that these sequences are under functional constraint. The present study showed that HRV-Cs from Malaysia have diverse genetic sequences but share conserved genomic features with other HRV-Cs. This genetic information could further aid the understanding of HRV-C infection. PMID:27199901
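Pairwise-distance threshold classification assigns a query to the closest reference type only if its uncorrected p-distance (the proportion of differing sites) falls at or below a cutoff. Nucleotide similarity, as quoted above, is simply 1 − p. The sketch below assumes sequences are already aligned to equal length and drops gap columns pairwise. The threshold value, reference names and sequences are hypothetical and are not the published VP4/VP2 or VP1 cutoffs.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gap columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue               # pairwise deletion of gapped sites
        compared += 1
        diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def assign_type(query: str, references: dict[str, str], threshold: float):
    """Closest reference type if within the distance threshold, else None (putative new type)."""
    name, dist = min(
        ((ref_name, p_distance(query, seq)) for ref_name, seq in references.items()),
        key=lambda t: t[1],
    )
    return name if dist <= threshold else None


if __name__ == "__main__":
    refs = {"C1": "ACGTACGTAA", "C2": "TTTTACGTAC"}  # toy aligned references
    print(assign_type("ACGTACGTAC", refs, threshold=0.13))
```

Returning `None` rather than the nearest type keeps sequences that exceed every threshold visible as candidate new types instead of silently merging them.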
Prevalence and genome characteristics of canine astrovirus in southwest China.
Li, Mingxiang; Yan, Nan; Ji, Conghui; Wang, Min; Zhang, Bin; Yue, Hua; Tang, Cheng
2018-05-30
The aim of this study was to investigate canine astrovirus (CaAstV) infection in southwest China. We collected 107 faecal samples from domestic dogs with obvious diarrhoea. Forty-two diarrhoeic samples (39.3%) were positive for CaAstV by RT-PCR, and 41/42 samples showed co-infection with canine coronavirus (CCoV), canine parvovirus-2 (CPV-2) and canine distemper virus (CDV). Phylogenetic analysis based on 26 CaAstV partial ORF1a and ORF1b sequences revealed that most CaAstV strains showed unique evolutionary features. Interestingly, putative recombination events were observed among four of the five complete ORF2 sequences cloned in this study, and three of the five complete ORF2 sequences formed a single unique group, suggesting that these strains could be a novel genotype. We successfully sequenced the complete genome of one CaAstV strain (designated 2017/44/CHN), which was 6628 nt in length. The features of this genome include putative recombination events in the ORF1a, ORF1b and ORF2 genes, and the ORF2 gene had a continuous insertion of 7 aa in region II compared with the other complete ORF2 sequences available in GenBank. Phylogenetic analysis showed that 2017/44/CHN formed a single group based on genome sequences, suggesting that this strain might be a novel genotype. The results of this study revealed that CaAstV circulates widely in diarrhoeic dogs in southwest China and exhibits unique evolutionary events. To the best of our knowledge, this is the first report of recombination events in CaAstV, and it contributes to further understanding of the genetic evolution of CaAstV.
CaMELS: In silico prediction of calmodulin binding proteins and their binding sites.
Abbasi, Wajid Arshad; Asif, Amina; Andleeb, Saiqa; Minhas, Fayyaz Ul Amir Afsar
2017-09-01
Due to Ca2+-dependent binding and the sequence diversity of Calmodulin (CaM) binding proteins, identifying CaM interactions and binding sites in the wet lab is tedious and costly. Therefore, computational methods for this purpose are crucial to the design of such wet-lab experiments. We present an algorithm suite called CaMELS (CalModulin intEraction Learning System) for predicting proteins that interact with CaM, as well as their binding sites, using sequence information alone. CaMELS offers state-of-the-art accuracy for both CaM interaction and binding site prediction and can aid biologists in studying CaM binding proteins. For CaM interaction prediction, CaMELS uses protein sequence features coupled with a large-margin classifier. CaMELS models binding site prediction as a multiple instance learning problem with a custom optimization algorithm, which allows more effective learning over imprecisely annotated CaM-binding sites during training. CaMELS has been extensively benchmarked using a variety of data sets, mutagenic studies, proteome-wide Gene Ontology enrichment analyses and protein structures. Our experiments indicate that CaMELS outperforms simple motif-based search and other existing methods for interaction and binding site prediction. We have also found that the whole sequence of a protein, rather than just its binding site, is important for predicting its interaction with CaM. Using the machine learning model in CaMELS, we have identified important features of protein sequences for CaM interaction prediction, as well as characteristic amino acid sub-sequences and their relative positions for identifying CaM binding sites. Python code for training and evaluating CaMELS, together with a webserver implementation, is available at the URL: http://faculty.pieas.edu.pk/fayyaz/software.html#camels. © 2017 Wiley Periodicals, Inc.
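Multiple instance learning suits imprecise site annotations because labels attach to a "bag" (here, a protein split into sliding windows) rather than to each window. A positive bag only needs at least one high-scoring window. The sketch below is a generic MI-SVM-style learner. It scores a bag as the maximum window score and applies hinge-loss subgradient updates to that "witness" window. It does not reproduce CaMELS's features or custom optimizer. Window width, learning rates and the toy "WWWWW" motif are hypothetical.

```python
import numpy as np

AMINO = "ACDEFGHIKLMNPQRSTVWY"


def window_features(seq: str, width: int = 5) -> np.ndarray:
    """One instance per sliding window: its amino-acid composition (20 values)."""
    idx = {a: i for i, a in enumerate(AMINO)}
    rows = []
    for start in range(len(seq) - width + 1):
        v = np.zeros(len(AMINO))
        for aa in seq[start:start + width]:
            if aa in idx:
                v[idx[aa]] += 1.0 / width
        rows.append(v)
    return np.array(rows)


def bag_score(w: np.ndarray, b: float, instances: np.ndarray) -> float:
    """A bag is as positive as its highest-scoring instance."""
    return float(np.max(instances @ w + b))


def train_mi_svm(bags, labels, epochs=200, lr=0.1, lam=1e-3):
    """Subgradient descent on the hinge loss of the max-instance (witness) score."""
    w, b = np.zeros(bags[0].shape[1]), 0.0
    for _ in range(epochs):
        for X, y in zip(bags, labels):
            scores = X @ w + b
            k = int(np.argmax(scores))             # witness instance for this bag
            if y * scores[k] < 1:                  # margin violated: move toward y
                w = (1 - lr * lam) * w + lr * y * X[k]
                b += lr * y
            else:
                w = (1 - lr * lam) * w             # only L2 shrinkage
    return w, b


if __name__ == "__main__":
    seqs = ["AAAAWWWWWAAAA", "AAAAAAAAAAAA", "GGGWWWWWGGG", "GGGGGGGGGGG"]
    labels = [1, -1, 1, -1]
    bags = [window_features(s) for s in seqs]
    w, b = train_mi_svm(bags, labels)
    print([round(bag_score(w, b, X), 2) for X in bags])
```

After training, the argmax window of each positive bag points to the putative binding region, even though only whole-sequence labels were supplied.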
Astakhova, L N; Zatsepina, O G; Przhiboro, A A; Evgen'ev, M B; Garbuz, D G
2013-06-01
The heat shock proteins belonging to the Hsp90 family (Hsp83 in Diptera) play a crucial role in the protection of cells due to their chaperoning functions. We sequenced hsp90 genes from three species of the family Stratiomyidae (Diptera) that live in thermally different habitats and are characterized by extraordinarily high thermotolerance. The sequence variation and structure of the hsp90 family genes were compared with previously described features of hsp70 copies isolated from the same species. Two functional hsp83 genes were found in the species studied; in at least one species, they are arranged in tandem orientation. This organization had not been described previously. Stratiomyidae hsp83 genes share a high level of identity with hsp83 of Drosophila, and the deduced protein possesses five conserved amino acid sequence motifs characteristic of the Hsp90 family as well as the C-terminal MEEVD sequence characteristic of the cytosolic isoform. A comparison of the hsp83 promoters of two Stratiomyidae species from thermally contrasting habitats demonstrated that while both species contain canonical heat shock elements in the same position, only one of the species contains functional GAF-binding elements. Our data indicate that, within the same species, hsp83 family genes evolve faster than hsp70 family genes. © 2013 Royal Entomological Society.
Temporality of Features in Near-Death Experience Narratives
Martial, Charlotte; Cassol, Héléna; Antonopoulos, Georgios; Charlier, Thomas; Heros, Julien; Donneau, Anne-Françoise; Charland-Verville, Vanessa; Laureys, Steven
2017-01-01
Background: After a Near-Death Experience (NDE), Near-Death Experiencers (NDErs) usually report extremely rich and detailed narratives. Phenomenologically, an NDE can be described as a set of distinguishable features. Some authors have proposed regular patterns of NDEs; however, the actual temporal sequence of NDE core features remains little explored. Objectives: The aim of the present study was to investigate the frequency distribution of these features (globally and according to the position of features in narratives) as well as the most frequently reported temporal sequences of features. Methods: We collected 154 freely expressed written NDE narratives in French (i.e., with a Greyson NDE scale total score ≥ 7/32). A text analysis was conducted on all narratives in order to infer the temporal ordering and frequency distribution of NDE features. Results: Our analyses highlighted the following most frequently reported sequence of consecutive NDE features: Out-of-Body Experience, Experiencing a tunnel, Seeing a bright light, Feeling of peace. Yet this sequence was reported by only a very small number of NDErs. Conclusion: These findings suggest that the temporal sequence of NDE features may vary across NDErs. Exploring associations and relationships among features encountered during NDEs may help complete the rigorous definition and scientific understanding of the phenomenon. PMID:28659779
Discriminative prediction of mammalian enhancers from DNA sequence
Lee, Dongwon; Karchin, Rachel; Beer, Michael A.
2011-01-01
Accurately predicting regulatory sequences and enhancers in entire genomes is an important but difficult problem, especially in large vertebrate genomes. With the advent of ChIP-seq technology, experimental detection of genome-wide EP300/CREBBP-bound regions provides a powerful platform to develop predictive tools for regulatory sequences and to study their sequence properties. Here, we develop a support vector machine (SVM) framework that can accurately identify EP300-bound enhancers using only genomic sequence and an unbiased set of general sequence features. Moreover, we find that the predictive sequence features identified by the SVM classifier reveal biologically relevant sequence elements enriched in the enhancers, but we also identify other features that are significantly depleted in enhancers. The predictive sequence features are evolutionarily conserved and spatially clustered, providing further support for their functional significance. Although our SVM is trained on experimental data, we also predict novel enhancers and show that these putative enhancers are significantly enriched in both ChIP-seq signal and DNase I hypersensitivity signal in the mouse brain and are located near relevant genes. Finally, we present results of comparisons across other EP300/CREBBP data sets using our SVM and uncover sequence elements enriched and/or depleted in the different classes of enhancers. Many of these sequence features play a role in specifying tissue-specific or developmental-stage-specific enhancer activity, but our results indicate that some features operate in a general or tissue-independent manner. In addition to providing a high-confidence list of enhancer targets for subsequent experimental investigation, these results contribute to our understanding of the general sequence structure of vertebrate enhancers. PMID:21875935
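The abstract does not spell out the "general sequence features". One common unbiased choice, assumed here, is the count of every k-mer, with each k-mer merged with its reverse complement so that both DNA strands are treated alike. The sketch below builds such count vectors and compares two sequences with a normalized linear (spectrum) kernel, the kind of similarity an SVM can be trained on. The value of k, the normalization and all function names are assumptions; this is not the authors' code.

```python
import math
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer: str) -> str:
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])


def kmer_features(seq: str, k: int = 6) -> dict[str, int]:
    """Count canonical k-mers, skipping windows that contain non-ACGT bases."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return dict(counts)


def spectrum_kernel(a: dict[str, int], b: dict[str, int]) -> float:
    """Cosine-normalized dot product of two k-mer count vectors."""
    dot = sum(v * b.get(key, 0) for key, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

For example, `kmer_features("ACGT", k=2)` gives `{"AC": 2, "CG": 1}`, because GT is the reverse complement of AC. A sequence and its reverse complement therefore receive identical vectors and a kernel value of 1.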
Heinz, Eva; Stubenrauch, Christopher J.; Grinter, Rhys; Croft, Nathan P.; Purcell, Anthony W.; Strugnell, Richard A.; Dougan, Gordon; Lithgow, Trevor
2016-01-01
The bacterial cell surface proteins intimin and invasin are virulence factors that share a common domain structure and bind selectively to host cell receptors in the course of bacterial pathogenesis. The β-barrel domains of intimin and invasin show significant sequence and structural similarities. Conversely, a variety of proteins with sometimes limited sequence similarity have also been annotated as “intimin-like” and “invasin” in genome datasets, while other recent work on apparently unrelated virulence-associated proteins ultimately revealed similarities to intimin and invasin. Here we characterize the sequence and structural relationships across this complex protein family. Surprisingly, intimins and invasins represent a very small minority of the sequence diversity in what has previously been called the “intimin/invasin protein family”. Analysis of the assembly pathway for expression of the classic intimin, EaeA, and of FdeC, a characteristic example of the most prevalent members of the group, revealed that dependence on the translocation and assembly module is a common feature of both proteins. While the majority of the sequences in the grouping are most similar to FdeC, a further widespread group comprises two-partner secretion systems that use the β-barrel domain as the delivery device for secretion of a variety of virulence factors. This comprehensive analysis supports the adoption of “inverse autotransporter protein family” as the most accurate nomenclature for the family, which in turn has important consequences for our overall understanding of the Type V secretion systems of bacterial pathogens. PMID:27190006
Organization and evolution of highly repeated satellite DNA sequences in plant chromosomes.
Sharma, S; Raina, S N
2005-01-01
A major component of the plant nuclear genome is constituted by different classes of repetitive DNA sequences. The structural, functional and evolutionary aspects of the satellite repetitive DNA families and their organization in the chromosomes are reviewed. The tandem satellite DNA sequences exhibit characteristic chromosomal locations, usually at subtelomeric and centromeric regions. The repetitive DNA family(ies) may be widely distributed in a taxonomic family or a genus, or may be specific for a species, genome or even a chromosome. They may acquire large-scale variations in their sequence and copy number over an evolutionary time-scale. These features underlie the extensive use of repetitive sequences in taxonomic and phylogenetic studies. Hybrid polyploids have proven to be especially good models for studying the evolution of repetitive DNA sequences. Recent studies explicitly show that some repetitive DNA families localized at the telomeres and centromeres have acquired important structural and functional significance. The repetitive elements are under different evolutionary constraints as compared to the genes. Satellite DNA families are thought to arise de novo as a consequence of molecular mechanisms such as unequal crossing over, rolling circle amplification, replication slippage and mutation that constitute "molecular drive". Copyright 2005 S. Karger AG, Basel.
Gupta, R S; Aitken, K; Falah, M; Singh, B
1994-01-01
The genes for two different 70-kDa heat shock protein (HSP70) homologs have been cloned and sequenced from the protozoan Giardia lamblia. On the basis of their sequence features, one of these genes corresponds to the cytoplasmic form of HSP70. The second gene, on the basis of its characteristic N-terminal hydrophobic signal sequence and C-terminal endoplasmic reticulum (ER) retention sequence (Lys-Asp-Glu-Leu), is the equivalent of the ER-resident GRP78 or BiP family of proteins. Phylogenetic trees based on HSP70 sequences show that the G. lamblia homologs exhibit the deepest divergence among eukaryotic species. The identification of a GRP78 or BiP homolog in G. lamblia strongly suggests the existence of ER in this ancient eukaryote. Detailed phylogenetic analyses of HSP70 sequences by bootstrap neighbor-joining and maximum-parsimony methods show that the cytoplasmic and ER homologs form distinct subfamilies that evolved from a common eukaryotic ancestor by a gene duplication that occurred very early in the evolution of eukaryotic cells. It is postulated that, because of the essential "molecular chaperone" function of these proteins in the translocation of other proteins across membranes, duplication of their genes accompanied the evolution of the ER or nucleus in the eukaryotic cell ancestor. The presence in all eukaryotic cytoplasmic HSP70 homologs (including the cognate, heat-induced, and ER forms) of a number of autapomorphic sequence signatures that are not present in any prokaryotic or organellar homologs provides strong evidence for the monophyletic nature of the eukaryotic lineage. Further, all eukaryotic HSP70 homologs share with the Gram-negative group of eubacteria a number of sequence features that are not present in any archaebacterium or Gram-positive bacterium, indicating their evolution from this group of organisms. Some implications of these findings regarding the evolution of eukaryotic cells and ER are discussed. PMID:8159675
NASA Technical Reports Server (NTRS)
Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.
1989-01-01
Computer vision systems employ a sequence of vision algorithms in which the output of one algorithm is the input of the next. The algorithms that make up such systems have very different computational characteristics and therefore require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge-based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on point correspondences between the images involved, which are a sequence of stereo image pairs. The researchers propose algorithms that obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which saves a considerable amount of computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.
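Step (5) can be carried out without iteration once correspondences are fixed. As a simplified illustration only, the sketch below recovers a planar rotation and translation from matched points in closed form by least squares: center both point sets, take the rotation angle from the summed cross and dot products, then solve for the translation. The paper works with 3D points from stereo pairs, so this 2D version and the function name `rigid_motion_2d` are assumptions, not the authors' algorithm.

```python
import math


def rigid_motion_2d(src, dst):
    """Closed-form least-squares rotation angle and translation mapping src onto dst.

    src and dst are equal-length lists of matched (x, y) points.
    Returns (theta, (tx, ty)) such that dst ≈ R(theta) @ src + t.
    """
    n = len(src)
    if n != len(dst) or n < 2:
        raise ValueError("need at least two matched point pairs")
    # Centroids of both point sets.
    sx = sum(p[0] for p in src) / n
    sy = sum(p[1] for p in src) / n
    dx = sum(q[0] for q in dst) / n
    dy = sum(q[1] for q in dst) / n
    # Accumulate dot and cross products of the centered correspondences.
    dot = cross = 0.0
    for (px, py), (qx, qy) in zip(src, dst):
        px, py, qx, qy = px - sx, py - sy, qx - dx, qy - dy
        dot += px * qx + py * qy
        cross += px * qy - py * qx
    theta = math.atan2(cross, dot)  # optimal angle, no iteration needed
    c, s = math.cos(theta), math.sin(theta)
    # Translation maps the rotated source centroid onto the target centroid.
    tx = dx - (c * sx - s * sy)
    ty = dy - (s * sx + c * sy)
    return theta, (tx, ty)
```

For points rotated by 90 degrees and shifted by (2, 3), `rigid_motion_2d([(0, 0), (1, 0), (0, 1)], [(2, 3), (2, 4), (1, 3)])` returns an angle of about π/2 and a translation of about (2, 3).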
Ultraviolet spectral morphology of the O stars. IV - The OB supergiant sequence
NASA Technical Reports Server (NTRS)
Walborn, Nolan R.; Nichols-Bohlin, Joy
1987-01-01
An atlas of 25 O3-B8 supergiant spectra in the wavelength ranges 1320-1580 Å and 1620-1880 Å is presented, based on high-resolution data from the IUE archives. The remarkably detailed relationship between the stellar-wind profiles and the optical spectral classifications throughout this sequence is emphasized. For instance, the Si IV/C IV ratio reverses between O4 and O6.5, and the B0, B0.5, and B0.7 Ia wind characteristics are each qualitatively unique and distinct from one another. The systematic behavior of nine stellar-wind features with ionization potentials ranging from 114 to 19 eV is summarized as a function of advancing spectral type.
Freyhult, Eva; Moulton, Vincent; Ardell, David H.
2006-01-01
Sequence logos are stacked bar graphs that generalize the notion of consensus sequence. They employ entropy statistics very effectively to display variation in a structural alignment of sequences of a common function, while emphasizing its over-represented features. Yet sequence logos cannot display features that distinguish functional subclasses within a structurally related superfamily, nor can they display under-represented features. We introduce two extensions to address these needs: function logos and inverse logos. Function logos display subfunctions that are over-represented among sequences carrying a specific feature. Inverse logos generalize both sequence logos and function logos by displaying under-represented, rather than over-represented, features or functions in structural alignments. To make inverse logos, a compositional inverse is applied to the feature or function frequency distributions before logo construction, where a compositional inverse is a mathematical transform that makes common features or functions rare and vice versa. We applied these methods to a database of structurally aligned bacterial tDNAs (tRNA genes) to create highly condensed, bird's-eye views of potentially all so-called identity determinants and antideterminants that confer specific amino acid charging or initiator function on tRNAs in bacteria. We recovered both known and a few potentially novel identity elements. Function logos and inverse logos are useful tools for exploratory bioinformatic analysis of structure–function relationships in sequence families and superfamilies. PMID:16473848
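For one alignment column, a sequence logo's stack height is the information content, log2 of the alphabet size minus the Shannon entropy of the symbol frequencies. The sketch below computes that height together with a compositional inverse. The inverse is assumed here to be the closure (renormalization) of the reciprocal frequencies, the perturbation inverse from compositional data analysis. A hypothetical pseudocount keeps every frequency positive so the reciprocals exist. The paper's small-sample corrections and its function logos are not reproduced, and the function names are illustrative.

```python
import math
from collections import Counter


def column_frequencies(column: str, alphabet: str, pseudocount: float = 0.5) -> dict[str, float]:
    """Symbol frequencies in one alignment column, with a pseudocount per symbol."""
    counts = Counter(column)
    n = sum(counts[a] for a in alphabet)  # ignore gaps and other symbols
    total = n + pseudocount * len(alphabet)
    return {a: (counts[a] + pseudocount) / total for a in alphabet}


def compositional_inverse(freqs: dict[str, float]) -> dict[str, float]:
    """Closure of reciprocal frequencies: common symbols become rare and vice versa."""
    reciprocals = {a: 1.0 / p for a, p in freqs.items()}  # requires every p > 0
    total = sum(reciprocals.values())
    return {a: r / total for a, r in reciprocals.items()}


def information_content(freqs: dict[str, float]) -> float:
    """Logo stack height in bits: log2(alphabet size) minus Shannon entropy."""
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    return math.log2(len(freqs)) - entropy
```

For the column `"AAAAAAAC"` over `"ACGT"`, the frequencies are A 0.75, C 0.15, G 0.05 and T 0.05. After the inverse, A drops to about 0.028 and G and T rise to about 0.417 each, so the inverse logo would highlight the symbols that are under-represented in the column.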
Du, Xiuquan; Hu, Changlin; Yao, Yu; Sun, Shiwei; Zhang, Yanping
2017-12-12
In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a satisfactory solution has yet to be found. In this study, given the limitations of machine learning algorithms that use either RNA-Seq data or genome sequences, a new feature set, called RS (RNA-Seq and sequence) features, was constructed. These features include RNA-Seq features derived from RNA-Seq data and sequence features derived from genome sequences. We propose a novel Rotation Forest classifier that predicts ES events with the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, a dataset from two human tissues was used; RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%. Compared with other available methods, the results indicate that RotaF-RSES is efficient and can predict ES events using RS features.
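Accuracy, sensitivity and specificity are all derived from the confusion matrix of predicted versus true ES events, and AUC can be read as the probability that a randomly chosen positive is scored above a randomly chosen negative. The sketch below shows these standard definitions. The function names are illustrative, and the counts and scores in the example are made up; they are not taken from the study.

```python
def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Standard confusion-matrix metrics for a binary classifier."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true-positive rate (recall)
        "specificity": tn / (tn + fp),  # true-negative rate
    }


def roc_auc(pos_scores: list[float], neg_scores: list[float]) -> float:
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count half)."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

With hypothetical counts, `binary_metrics(tp=90, fp=10, tn=80, fn=20)` gives an accuracy of 0.85, a sensitivity of 90/110 and a specificity of 80/90. The pairwise AUC is quadratic in the number of examples. That is fine for illustration, but a rank-based formula scales better on large test sets.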
Structural-functional diversity of the natural oligopeptides.
Zamyatnin, Alexander A
2018-03-01
Natural oligopeptides may regulate nearly all vital processes. To date, the chemical structures of many oligopeptides have been identified from >2000 organisms representing all the biological kingdoms. We have considered a number of mathematical (sequence length), chemical, physical, and biological features of an array of natural oligopeptides on the basis of data from the oligopeptide database EROP-Moscow (http://erop.inbi.ras.ru, 15,351 entries). These substances differ substantially from the polypeptide molecules of proteins in their physicochemical characteristics. These characteristics may be critical for understanding the molecular mechanisms by which oligopeptides produce physiological effects. Copyright © 2017 Elsevier Ltd. All rights reserved.
Saccharomyces cerevisiae Shuttle vectors.
Gnügge, Robert; Rudolf, Fabian
2017-05-01
Yeast shuttle vectors are indispensable tools in yeast research. They enable cloning of defined DNA sequences in Escherichia coli and their direct transfer into Saccharomyces cerevisiae cells. There are three types of commonly used yeast shuttle vectors: centromeric plasmids, episomal plasmids and integrating plasmids. In this review, we discuss the different plasmid systems and their characteristic features. We focus on their segregational stability and copy number and indicate how to modify these properties. Copyright © 2017 John Wiley & Sons, Ltd.
Detection of 1p36 deletion by clinical exome-first diagnostic approach.
Watanabe, Miki; Hayabuchi, Yasunobu; Ono, Akemi; Naruto, Takuya; Horikawa, Hideaki; Kohmoto, Tomohiro; Masuda, Kiyoshi; Nakagawa, Ryuji; Ito, Hiromichi; Kagami, Shoji; Imoto, Issei
2016-01-01
Although chromosome 1p36 deletion syndrome is considered clinically recognizable based on characteristic features, the clinical manifestations of patients during infancy are often not consistent with those observed later in life. We report a 4-month-old girl who showed multiple congenital anomalies and developmental delay but no clinical signs of the syndromic disease caused by her terminal deletion in 1p36.32-p36.33, which was first identified by targeted-exome sequencing for molecular diagnosis. PMID:28428889
Genome sequence and analysis of Lactobacillus helveticus
Cremonesi, Paola; Chessa, Stefania; Castiglioni, Bianca
2013-01-01
The microbiological characterization of lactobacilli is historically well developed, but their genomic analysis is recent. Because of the widespread use of Lactobacillus helveticus in cheese technology, information concerning the heterogeneity in this species is accumulating rapidly. Recently, the genomes of five L. helveticus strains were sequenced to completion and compared with those of other genomically characterized lactobacilli. The genomic analysis of the first sequenced strain, L. helveticus DPC 4571, isolated from cheese and selected for its rapid lysis and high proteolytic activity, has revealed a plethora of genes with industrial potential, including those responsible for key metabolic functions such as proteolysis, lipolysis, and cell lysis. These genes and their derived enzymes can facilitate the production of cheese and cheese derivatives with potential for use as ingredients in consumer foods. In addition, L. helveticus has the potential to produce peptides with a biological function, such as angiotensin converting enzyme (ACE) inhibitory activity, in fermented dairy products, demonstrating the therapeutic value of this species. A most intriguing feature of the genome of L. helveticus is the remarkable similarity in gene content with many intestinal lactobacilli. Comparative genomics has allowed the identification of key gene sets that facilitate a variety of lifestyles, including adaptation to food matrices or the gastrointestinal tract. As genome sequence and functional genomic information continue to accumulate rapidly, key features of the genomes of L. helveticus strains continue to be discovered, answering many questions but also raising many new ones. PMID:23335916
Sequence Analysis of the Genome of Carnation (Dianthus caryophyllus L.)
Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi
2014-01-01
The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. ‘Francesco’ was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568 887 315 bp, consisting of 45 088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16 644 bp and 60 737 bp, respectively, and the longest scaffold was 1 287 144 bp. The average GC content of the contig sequences was 36%. A total of 1050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43 266 complete and partial gene structures, excluding those in transposable elements, were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research on carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. PMID:24344172
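N50 is the length L such that contigs (or scaffolds) of length at least L together contain at least half of the assembled bases. GC content is the fraction of G and C among unambiguous bases. The sketch below computes both. It measures N50 against the total assembly length; some tools instead use an estimated genome size, such as the 622 Mb k-mer estimate above, which gives the related NG50. The inputs in the example are toy values, and the function names are illustrative.

```python
def n50(lengths: list[int]) -> int:
    """Length of the contig at which the cumulative length reaches half the total."""
    if not lengths:
        raise ValueError("no sequences")
    ordered = sorted(lengths, reverse=True)
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached for non-negative lengths


def gc_content(sequences: list[str]) -> float:
    """Fraction of G+C among A, C, G, T bases (ambiguous bases such as N are ignored)."""
    gc = acgt = 0
    for seq in sequences:
        seq = seq.upper()
        gc += seq.count("G") + seq.count("C")
        acgt += sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("no A/C/G/T bases")
    return gc / acgt
```

For contig lengths 2 to 10 (total 54), the three longest contigs (10 + 9 + 8 = 27) first reach half the total, so `n50([2, 3, 4, 5, 6, 7, 8, 9, 10])` returns 8.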
Shao, Zhiyong; Graf, Shannon; Chaga, Oleg Y; Lavrov, Dennis V
2006-10-15
The 16,937-nucleotide sequence of the linear mitochondrial DNA (mtDNA) molecule of the moon jelly Aurelia aurita (Cnidaria, Scyphozoa), the first mtDNA sequence from the class Scyphozoa and the first sequence of a linear mtDNA from Metazoa, has been determined. This sequence contains genes for 13 energy pathway proteins, small and large subunit rRNAs, and methionine and tryptophan tRNAs. In addition, two open reading frames of 324 and 969 base pairs in length have been found. The deduced amino-acid sequence of one of them, ORF969, displays extensive sequence similarity with the polymerase (but not the exonuclease) domain of family B DNA polymerases, and this ORF has been tentatively identified as dnab. This is the first report of dnab in animal mtDNA. The genes in A. aurita mtDNA are arranged in two clusters with opposite transcriptional polarities, with transcription proceeding toward the ends of the molecule. The determined sequences at the ends of the molecule are nearly identical but inverted, and they lack any obvious potential secondary structures or telomere-like repeat elements. The acquisition of mitochondrial genomic data for a second class of Cnidaria allows us to reconstruct characteristic features of mitochondrial evolution in this animal phylum.
Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang
2018-02-01
Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, predicting the function of ever larger and more diverse sets of sequences has relied heavily on machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach that combines multiple sequence alignment algorithms, an N-gram probabilistic language model and deep learning. The idea behind the proposed method is that if each group of sequences can be represented by one feature sequence composed of homologous sites, then adding a more closely related sequence to the group should cause less loss when the feature sequence is rebuilt. On this basis, deciding whether a query sequence belongs to a group of sequences can be recast as calculating the probability that the new feature sequence evolved from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCR sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are fed into a convolutional neural network to make a prediction. The experimental results show that the proposed method provides significant performance improvements: compared with current state-of-the-art methods, its classification error rate is reduced by at least 4.67% (family level I) and 5.75% (family level II). The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .
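The abstract recasts group membership as a probability under an N-gram model fitted to a group's sequences, and it also uses N-gram statistics as input vectors. The sketch below illustrates both ideas under stated assumptions. It fits a bigram (N = 2) model with add-one smoothing to a few sequences standing in for a sub-subfamily, scores a query by its mean log-probability per transition, and turns a sequence into a normalized bigram-count vector. N = 2, the smoothing scheme and all function names are hypothetical. The alignment step and the convolutional network are not shown.

```python
import math
from collections import Counter


def train_bigram(sequences: list[str], alphabet: str):
    """Fit an add-one-smoothed bigram model; returns prob(a, b) = P(next = b | current = a)."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            context_counts[a] += 1
    vocab = len(alphabet)

    def prob(a: str, b: str) -> float:
        return (pair_counts[(a, b)] + 1) / (context_counts[a] + vocab)

    return prob


def mean_log_prob(query: str, prob) -> float:
    """Average natural-log probability per transition of a query under a bigram model."""
    transitions = list(zip(query, query[1:]))
    if not transitions:
        raise ValueError("query needs at least two symbols")
    return sum(math.log(prob(a, b)) for a, b in transitions) / len(transitions)


def bigram_vector(seq: str, alphabet: str) -> list[float]:
    """Normalized bigram counts in a fixed alphabet-by-alphabet order (a possible model input)."""
    pairs = Counter(zip(seq, seq[1:]))
    n = max(len(seq) - 1, 1)
    return [pairs[(a, b)] / n for a in alphabet for b in alphabet]
```

A query would be assigned to the group whose model gives it the highest mean log-probability. For example, a model trained on `["ACACAC", "ACACAA"]` scores `"ACAC"` above `"GGGG"`.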
Colorimetric Sensor for Label Free Detection of Porcine PCR Product (ID: 18)
NASA Astrophysics Data System (ADS)
Ali, M. E.; Hashim, U.; Bari, M. F.; Dhahi, Th. S.
2011-05-01
This report describes the use of citrate-coated gold nanoparticles (GNPs), 40±5 nm in diameter, as a colorimetric sensor to visually detect the presence of a 17-base swine-specific conserved sequence and a nucleotide mismatch in mixed PCR products of pig, deer and shad cytochrome b genes. The PCR amplicons were 109 bp long and were amplified with a pair of common primers. In 2 mM PBS buffer, colloidal GNPs changed color from pinkish-red to purple-gray, losing their characteristic surface plasmon resonance peak at 530 nm and gaining new features between 620 and 800 nm in the absorption spectrum, which indicates strong aggregation. Upon adsorption of single-stranded DNA, the particles were stabilized against salt-induced aggregation and retained their spectral features and characteristic color. The PCR products, without any additional processing, were hybridized with a 17-nucleotide swine probe before exposure to GNPs. At a critical annealing temperature (55 °C) that differentiated between matched and mismatched pairing, the probe hybridized with the pig PCR product and dehybridized from the deer and shad products. The interaction of the dehybridized probe with GNPs protected them from salt-induced aggregation, so they retained their characteristic red color. The assay did not need any surface modification chemistry or labeling steps. The results were determined visually and validated by absorption spectroscopy. The entire assay (hybridization plus visual detection) was performed in less than 10 min. The assay obviated the need for complex RFLP, sequencing or blotting to differentiate PCR products of the same size. We expect the assay to find application in species assignment in food analysis, mismatch detection in genetic screening and homology studies among closely related species.
Lee, Chung-Ta; Chow, Nan-Haw; Su, Pei-Fang; Ho, Chung-Liang; Tsai, Hung-Wen; Chen, Yi-Lin; Lin, Shao-Chieh; Lin, Bo-Wen; Lin, Peng-Chan; Lee, Jenq-Chang
2017-01-01
Colorectal mucinous adenocarcinoma (MAC) and serrated adenocarcinoma (SAC) share many characteristics, including right-sided colon location, frequent mucin production, and various molecular features. This study examined the frequency of SAC morphology in MACs. We assessed the correlation of SAC morphology with clinicopathological parameters, molecular characteristics, and patient prognosis. Eighty-eight colorectal MACs were collected and reviewed for SAC morphology according to Makinen's criteria. We sequenced KRAS and BRAF, assessed CpG island methylator phenotype (CIMP) frequency, and analyzed DNA mismatch repair enzyme levels using immunohistochemistry in tumor samples. SAC morphology was observed in 38% of MACs and was associated with proximal location (P=0.001), BRAF mutation (P=0.042), CIMP-positive status (P=0.023), and contiguous traditional serrated adenoma (P=0.019). Multivariate analysis revealed that MACs lacking both SAC morphology and CIMP-positive status had a 3.955-fold greater risk of cancer relapse than MACs with both characteristics or either one (P=0.035). Our results show that two MAC groups with distinct features can be identified using Makinen's criteria, and suggest a favorable prognostic role for the serrated neoplastic pathway in colorectal MAC. PMID:28422723
Regulatory Features for Odorant Receptor Genes in the Mouse Genome.
Degl'Innocenti, Andrea; D'Errico, Anna
2017-01-01
The odorant receptor genes, seven-transmembrane receptor genes constituting the largest mammalian gene multifamily, are expressed monogenically and monoallelically in each sensory neuron in the olfactory epithelium. This characteristic, often referred to as the one neuron-one receptor rule, is driven by mostly uncharacterized molecular dynamics, generally named odorant receptor gene choice. Much attention has been paid by the scientific community to the identification of sequences regulating the expression of odorant receptor genes within their loci, where related genes are usually arranged in genomic clusters. A number of studies identified transcription factor binding sites on odorant receptor promoter sequences. Similar binding sites were also found on a number of enhancers that regulate odorant receptor transcription in cis but have been proposed to form interchromosomal networks. Odorant receptor gene choice seems to occur via the local removal of strongly repressive epigenetic marks that are put in place on each odorant receptor locus during the maturation of the sensory neuron. Here we review the fast-changing state of the art in the study of regulatory features of odorant receptor genes.
van Keulen, H; Gutell, R R; Campbell, S R; Erlandsen, S L; Jarroll, E L
1992-10-01
The total nucleotide sequence of the rDNA of Giardia muris, an intestinal protozoan parasite of rodents, has been determined. The repeat unit is 7668 basepairs (bp) in size and consists of a spacer of 3314 bp, a small-subunit rRNA (SSU-rRNA) gene of 1429 bp, and a large-subunit rRNA (LSU-rRNA) gene of 2698 bp. The spacer contains long direct repeats and is heterogeneous in size. The LSU-rRNA of G. muris was compared with those of the human intestinal parasite Giardia duodenalis, the bird parasite Giardia ardeae, and Escherichia coli. The LSU-rRNA is comparable in size to the 23S rRNA of E. coli but shows structural features typical of eukaryotes. Some variable regions are notably small, which accounts for the overall smaller size of this rRNA. The structure of the G. muris LSU-rRNA is similar to that of the other Giardia rRNAs, but each rRNA has characteristic features residing in a number of variable regions.
Sattley, W Matthew; Blankenship, Robert E
2010-06-01
The complete annotated genome sequence of Heliobacterium modesticaldum strain Ice1 provides our first glimpse into the genetic potential of the Heliobacteriaceae, a unique family of anoxygenic phototrophic bacteria. H. modesticaldum str. Ice1 is the first completely sequenced phototrophic representative of the Firmicutes, and heliobacteria are the only phototrophic members of this large bacterial phylum. The H. modesticaldum genome consists of a single 3.1-Mb circular chromosome with no plasmids. Of special interest are genomic features that provide insight into the physiology and ecology of heliobacteria, including the genetic inventory of the photosynthesis gene cluster. Genes involved in transport, photosynthesis, and central intermediary metabolism are described and catalogued. The obligately heterotrophic metabolism of heliobacteria is a key feature of the physiology and evolution of these phototrophs. The conspicuous absence of recognizable genes encoding the enzyme ATP-citrate lyase precludes autotrophic growth via the reverse citric acid cycle in heliobacteria and distinguishes them from green sulfur bacteria. The identities of the electron carriers that enable energy conservation by cyclic light-driven electron transfer remain in question.
Ibrahim, Wisam; Abadeh, Mohammad Saniee
2017-05-21
Protein fold recognition is an important problem in bioinformatics for predicting the three-dimensional structure of a protein. One of the most challenging tasks in protein fold recognition is the extraction of efficient features from amino-acid sequences to obtain better classifiers. In this paper, we propose six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework, PCA-DELM-LDA, to extract feature vectors from the amino-acid sequences. Principal Component Analysis (PCA) is used to reduce the number of extracted features. The extracted feature vectors are used together with the original features to improve the performance of the Deep Extreme Learning Machine (DELM) in the second stage. Four new features are extracted in the second stage and used in the third stage by Linear Discriminant Analysis (LDA) to classify the instances into 27 folds. The proposed framework is applied to the independent and combined feature sets of the SCOP datasets. The experimental results show that the feature vectors extracted in the first stage can improve the ability of DELM to extract useful new features in the second stage. Copyright © 2017 Elsevier Ltd. All rights reserved.
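The PCA step in the first stage can be illustrated with a small dependency-free sketch. It assumes that each protein is already represented as a numeric feature vector from the descriptors. It finds principal components by power iteration with deflation rather than with a library SVD. The abstract does not say how the authors computed PCA, so this is an illustration of the technique only.

```python
import math

def covariance(rows):
    """Sample covariance matrix and column means of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, means

def top_components(cov, k, iters=500):
    """Leading k eigenvectors of a symmetric matrix (power iteration + deflation)."""
    d = len(cov)
    m = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0] * d
        for _ in range(iters):
            w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w)) or 1.0
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue belonging to the component just found.
        lam = sum(v[i] * sum(m[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Deflate so the next pass converges to the next-largest component.
        m = [[m[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps

def pca_reduce(rows, k):
    """Project each (mean-centred) feature vector onto the top-k principal components."""
    cov, means = covariance(rows)
    comps = top_components(cov, k)
    return [[sum((r[j] - means[j]) * c[j] for j in range(len(r))) for c in comps]
            for r in rows]
```

In a pipeline like the one described, the reduced vectors would be concatenated with the original features before being passed to the second-stage learner.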
Aramaki, Yu; Haruno, Masahiko; Osu, Rieko; Sadato, Norihiro
2011-07-06
In periodic bimanual movements, anti-phase-coordinated patterns often change suddenly and involuntarily into in-phase patterns. Because behavior in the initial period of a sequence of cycles often shows no obvious errors, it is difficult to predict subsequent movement errors in the later period of the cyclical sequence. Here, we evaluated performance in the later period of the cyclical sequence of bimanual periodic movements using human brain activity measured with functional magnetic resonance imaging, as well as initial movement features. Eighteen subjects performed a 30-s bimanual finger-tapping task. We calculated differences in initiation-locked transient brain activity between antiphase and in-phase tapping conditions. Correlation analysis revealed that the difference in anterior putamen activity between the antiphase and in-phase tapping conditions was strongly correlated with future instability, measured as the mean absolute deviation of the left-hand intertap interval during antiphase movements relative to in-phase movements (r = 0.81). Among the initial movement features we measured, only the number of taps needed to establish the antiphase movement pattern exhibited a significant correlation. However, its correlation coefficient of 0.60 was not high enough to predict the characteristics of subsequent movement. There was no significant correlation between putamen activity and initial movement features. It is likely that initiating unskilled, difficult movements requires increased anterior putamen activity, and this increase may facilitate the initiation of movement via the basal ganglia-thalamocortical circuit. Our results suggest that initiation-locked transient activity of the anterior putamen can be used to predict future motor performance.
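The behavioural measures above can be sketched as follows: intertap intervals, their mean absolute deviation, and a Pearson correlation across subjects. The abstract says only that antiphase variability is taken "relative to" in-phase variability. The difference used in the hypothetical `instability_index` below is therefore an assumption, not necessarily the authors' definition.

```python
import math

def intertap_intervals(tap_times):
    """Intervals (s) between successive taps of one hand."""
    return [b - a for a, b in zip(tap_times, tap_times[1:])]

def mean_abs_deviation(values):
    """Mean absolute deviation of the values from their mean."""
    m = sum(values) / len(values)
    return sum(abs(v - m) for v in values) / len(values)

def instability_index(anti_taps, in_taps):
    """Hypothetical index: MAD of antiphase ITIs minus MAD of in-phase ITIs."""
    return (mean_abs_deviation(intertap_intervals(anti_taps))
            - mean_abs_deviation(intertap_intervals(in_taps)))

def pearson_r(x, y):
    """Pearson correlation, e.g. putamen activity difference vs instability across subjects."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

With one index per subject and one activity contrast per subject, `pearson_r` gives the kind of across-subject coefficient reported above.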
Robot Acting on Moving Bodies (RAMBO): Interaction with tumbling objects
NASA Technical Reports Server (NTRS)
Davis, Larry S.; Dementhon, Daniel; Bestul, Thor; Ziavras, Sotirios; Srinivasan, H. V.; Siddalingaiah, Madhu; Harwood, David
1989-01-01
Interaction with tumbling objects will become more common as human activities in space expand. When attempting to interact with a large, complex object that is translating and rotating in space, a human operator relying only on visual and mental capacities may be unable to estimate the object's motion, plan actions, or control those actions. A camera-equipped robot system (RAMBO) is being developed that, given a sequence of simple tasks, can perform these tasks on a tumbling object. RAMBO is given a complete geometric model of the object. A low-level vision module extracts and groups characteristic features in images of the object. The positions of the object are determined in a sequence of images, and a motion estimate of the object is obtained. This motion estimate is used to plan trajectories that bring the robot tool to locations near the object, defined relative to it, that are sufficient for achieving the tasks. More specifically, low-level vision uses parallel algorithms for image enhancement by symmetric nearest neighbor filtering, edge detection by local gradient operators, and corner extraction by sector filtering. The object pose is estimated with a Hough transform method that accumulates position hypotheses obtained by matching triples of image features (corners) to triples of model features. To maximize computing speed, the position in space of a triple of features is estimated by decomposing its perspective view into a product of rotations and a scaled orthographic projection. This allows the use of 2-D lookup tables at each stage of the decomposition. The position hypotheses for each possible match of model feature triples and image feature triples are calculated in parallel. Trajectory planning combines heuristic and dynamic programming techniques. Trajectories are then created using dynamic interpolations between initial and goal trajectories. All the parallel algorithms run on a Connection Machine CM-2 with 16K processors.
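Symmetric nearest neighbor filtering, named above as the image-enhancement step, can be sketched sequentially. The sketch uses one common formulation, in which each pair of neighbours that are symmetric about the centre pixel of a 3x3 window contributes whichever member is closer in value to the centre. The window size, the tie rule, and copying the border unchanged are assumptions, and the RAMBO version ran in parallel on the CM-2.

```python
def snn_filter(image):
    """Symmetric nearest neighbour filter over a 3x3 window.

    For each interior pixel, every pair of neighbours that are symmetric about
    the centre contributes the member whose value is closer to the centre
    (ties go to the first member); the output is the mean of those four picks.
    Border pixels are copied unchanged.
    """
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    pairs = [((-1, 0), (1, 0)), ((0, -1), (0, 1)),
             ((-1, -1), (1, 1)), ((-1, 1), (1, -1))]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            centre = image[r][c]
            picks = []
            for (dr1, dc1), (dr2, dc2) in pairs:
                a = image[r + dr1][c + dc1]
                b = image[r + dr2][c + dc2]
                picks.append(a if abs(a - centre) <= abs(b - centre) else b)
            out[r][c] = sum(picks) / len(picks)
    return out
```

Unlike a plain mean filter, this keeps a step edge sharp, because each side of the edge picks neighbours from its own side. That property suits the later edge and corner extraction.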
Noonan syndrome - a new survey.
Tafazoli, Alireza; Eshraghi, Peyman; Koleti, Zahra Kamel; Abbaszadegan, Mohammadreza
2017-02-01
Noonan syndrome (NS) is an autosomal dominant disorder with vast heterogeneity in clinical and genetic features. Various symptoms have been reported for this abnormality such as short stature, unusual facial characteristics, congenital heart abnormalities, developmental complications, and an elevated tumor incidence rate. Noonan syndrome shares clinical features with other rare conditions, including LEOPARD syndrome, cardio-facio-cutaneous syndrome, Noonan-like syndrome with loose anagen hair, and Costello syndrome. Germline mutations in the RAS-MAPK (mitogen-activated protein kinase) signal transduction pathway are responsible for NS and other related disorders. Noonan syndrome diagnosis is primarily based on clinical features, but molecular testing should be performed to confirm it in patients. Due to the high number of genes associated with NS and other RASopathy disorders, next-generation sequencing is the best choice for diagnostic testing. Patients with NS also have higher risk for leukemia and specific solid tumors. Age-specific guidelines for the management of NS are available.
Noonan syndrome – a new survey
Tafazoli, Alireza; Eshraghi, Peyman; Koleti, Zahra Kamel
2016-01-01
Noonan syndrome (NS) is an autosomal dominant disorder with vast heterogeneity in clinical and genetic features. Various symptoms have been reported for this abnormality such as short stature, unusual facial characteristics, congenital heart abnormalities, developmental complications, and an elevated tumor incidence rate. Noonan syndrome shares clinical features with other rare conditions, including LEOPARD syndrome, cardio-facio-cutaneous syndrome, Noonan-like syndrome with loose anagen hair, and Costello syndrome. Germline mutations in the RAS-MAPK (mitogen-activated protein kinase) signal transduction pathway are responsible for NS and other related disorders. Noonan syndrome diagnosis is primarily based on clinical features, but molecular testing should be performed to confirm it in patients. Due to the high number of genes associated with NS and other RASopathy disorders, next-generation sequencing is the best choice for diagnostic testing. Patients with NS also have higher risk for leukemia and specific solid tumors. Age-specific guidelines for the management of NS are available. PMID:28144274
Method of interplanetary trajectory optimization for the spacecraft with low thrust and swing-bys
NASA Astrophysics Data System (ADS)
Konstantinov, M. S.; Thein, M.
2017-07-01
The paper presents a method for optimizing interplanetary trajectories of a spacecraft with electric propulsion and a sequence of swing-bys that avoids the complexity of solving the multipoint boundary value problem. The method is based on solutions of a preliminary problem for impulsive trajectories. This preliminary problem, analyzed at the first stage of the study, is formulated so that the analysis and optimization of a particular flight path reduces to finding an unconstrained minimum in the space of selectable parameters. Existing methods solve this problem effectively, making it possible to identify rational flight paths (swing-by sequences) and to obtain an initial approximation for the main characteristics of the flight path (dates, values of the hyperbolic excess velocity, etc.). These characteristics can then be used to optimize the trajectory of the spacecraft with electric propulsion. A distinctive feature of the work is the introduction of a second (intermediate) stage of the research. At this stage, some characteristics of the analyzed flight path (e.g., swing-by dates) are fixed, and the problem is formulated so that the trajectory of the spacecraft with electric propulsion is optimized on selected segments of the flight path. End-to-end optimization is carried out at the third (final) stage, whose distinctive feature is the analysis of the full set of optimality conditions for the considered flight path. An analysis of the characteristics of optimal trajectories to Jupiter with Earth, Venus, and Mars swing-bys for a spacecraft with electric propulsion is presented.
The paper shows that a spacecraft with a mass of more than 7150 kg can be delivered to the vicinity of Jupiter along a trajectory with two Earth swing-bys using a space transportation system based on the "Angara A5" launch vehicle, the "KVTK" chemical upper stage, and an electric propulsion system with an input electrical power of 100 kW.
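Each swing-by in such a flight path is characterised by its hyperbolic excess velocity. As background rather than as part of the authors' method, the standard patched-conic relation for an unpowered flyby gives the angle through which the planet can rotate that velocity vector: sin(δ/2) = 1/e, where e = 1 + r_p·v∞²/μ. The sketch below evaluates this relation. The Earth gravitational parameter is a standard value supplied here, and periapsis altitude limits are left to the caller.

```python
import math

MU_EARTH = 398600.4418  # Earth's gravitational parameter, km^3/s^2

def swingby_turn_angle(v_inf, r_p, mu=MU_EARTH):
    """Angle (rad) by which an unpowered swing-by rotates the
    hyperbolic excess velocity vector (patched-conic approximation).

    v_inf : hyperbolic excess speed relative to the planet, km/s
    r_p   : periapsis radius of the flyby hyperbola, km
    mu    : gravitational parameter of the swing-by body, km^3/s^2
    """
    e = 1.0 + r_p * v_inf ** 2 / mu  # eccentricity of the flyby hyperbola
    return 2.0 * math.asin(1.0 / e)
```

For example, `math.degrees(swingby_turn_angle(5.0, 6678.0))` gives roughly 90 degrees for an Earth flyby at about 300 km altitude. Faster approaches bend less, which is one reason repeated Earth swing-bys can be useful.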
Prediction of Protein-Protein Interaction Sites by Random Forest Algorithm with mRMR and IFS
Li, Bi-Qing; Feng, Kai-Yan; Chen, Lei; Huang, Tao; Cai, Yu-Dong
2012-01-01
Prediction of protein-protein interaction (PPI) sites is one of the most challenging problems in computational biology. Although great progress has been made by employing various machine learning approaches with numerous characteristic features, the problem is still far from solved. In this study, we developed a novel predictor based on the Random Forest (RF) algorithm with the Minimum Redundancy Maximal Relevance (mRMR) method followed by incremental feature selection (IFS). We incorporated features of physicochemical/biochemical properties, sequence conservation, residue disorder, secondary structure, and solvent accessibility. We also included five 3D structural features to predict protein-protein interaction sites, achieving an overall accuracy of 0.672997 and a Matthews correlation coefficient (MCC) of 0.347977. Feature analysis showed that 3D structural features such as Depth Index (DPX) and surface curvature (SC) contributed most to the prediction of protein-protein interaction sites. Site-specific feature analysis also showed that the features of individual residues from PPI sites contribute most to the determination of protein-protein interaction sites. It is anticipated that our prediction method will become a useful tool for identifying PPI sites, and that the feature analysis described in this paper will provide useful insights into the mechanisms of interaction. PMID:22937126
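The mRMR-then-IFS procedure can be sketched for discretized features. The sketch uses the difference form of mRMR (relevance minus mean redundancy, both measured as mutual information). It then scores growing prefixes of the ranking. The `evaluate` callback is a hypothetical stand-in for training and validating the Random Forest on a feature subset. The exact mRMR variant and validation scheme used in the paper are not stated in the abstract.

```python
import math
from collections import Counter

def mutual_info(x, y):
    """Mutual information (nats) between two discrete sequences of equal length."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
               for (a, b), c in pxy.items())

def mrmr_rank(features, labels):
    """Greedy mRMR ordering (difference form: relevance minus mean redundancy).

    features: dict mapping feature name -> list of discretized values.
    """
    relevance = {f: mutual_info(v, labels) for f, v in features.items()}
    remaining, selected = set(features), []
    while remaining:
        def score(f):
            if not selected:
                return relevance[f]
            redundancy = sum(mutual_info(features[f], features[s])
                             for s in selected) / len(selected)
            return relevance[f] - redundancy
        best = max(sorted(remaining), key=score)  # sorted() makes ties deterministic
        selected.append(best)
        remaining.remove(best)
    return selected

def incremental_feature_selection(ranked, evaluate):
    """IFS: score the top-1, top-2, ... prefixes of the ranking; keep the best."""
    best_k, best_score = 0, float("-inf")
    for k in range(1, len(ranked) + 1):
        s = evaluate(ranked[:k])
        if s > best_score:
            best_k, best_score = k, s
    return ranked[:best_k], best_score
```

The redundancy term stops a near-duplicate of an already chosen feature from being ranked early, even when it is highly relevant on its own.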
Yang, Xiao; Richardson, Patricia A.; Hong, Chuanxue
2014-01-01
A novel Phytophthora species was frequently recovered from irrigation reservoirs at several ornamental plant production facilities in eastern Virginia. Initial sequencing of the internal transcribed spacer (ITS) region of this species generated unreadable sequences due to continual polymorphic positions. Cloning and sequencing of the ITS region, together with sequencing of the mitochondrially encoded cytochrome c oxidase 1 gene and the beta-tubulin gene, revealed that it is a hybrid between P. taxon PgChlamydo as its paternal parent and an unknown species genetically close to P. mississippiae as its maternal parent. This hybrid has some diagnostic morphological features of both P. taxon PgChlamydo and P. mississippiae. It produces catenulate hyphal swellings, characteristic of P. mississippiae, and chlamydospores, typical of P. taxon PgChlamydo. It also produces both ornamented and relatively smooth-walled oogonia. Ornamented oogonia are another important diagnostic character of P. mississippiae, whereas the relatively smooth-walled oogonia may reflect the oogonial character of P. taxon PgChlamydo. The new hybrid is described here as Phytophthora ×stagnum. PMID:25072374
Evolutionary and biophysical relationships among the papillomavirus E2 proteins.
Blakaj, Dukagjin M; Fernandez-Fuentes, Narcis; Chen, Zigui; Hegde, Rashmi; Fiser, Andras; Burk, Robert D; Brenowitz, Michael
2009-01-01
Infection by human papillomavirus (HPV) may result in clinical conditions ranging from benign warts to invasive cancer. The HPV E2 protein represses oncoprotein transcription and is required for viral replication. HPV E2 binds to palindromic DNA sites composed of two highly conserved four-base-pair sequences flanking a variable 'spacer' of fixed length. E2 proteins directly contact the conserved sequences but not the spacer DNA. Variation in naturally occurring spacer sequences results in differential protein affinity that depends on each protein's sensitivity to the unique conformational and/or dynamic properties of the spacer DNA. This article explores the biophysical character of this core viral protein with the goal of identifying characteristics associated with the risk of virally caused malignancy. The amino acid sequence, 3D structure, and electrostatic features of the E2 protein DNA-binding domain are highly conserved; specific interactions with DNA binding sites have also been conserved. In contrast, the E2 protein's transactivation domain does not have extensive surfaces of highly conserved residues; instead, regions of high conservation are localized to small surface patches. Implications for cancer biology are discussed.
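The site architecture described above, with conserved half-sites around a fixed-length spacer, can be scanned for with a regular expression. This sketch assumes the commonly cited consensus ACCG-N4-CGGT. Spacer A/T fraction is reported only as one simple descriptor of spacer composition. The abstract does not specify the consensus or the spacer properties the authors analysed.

```python
import re

# Assumed E2 binding-site consensus: conserved 4-bp half-sites ACCG and CGGT
# flanking a 4-bp variable spacer. The lookahead also reports overlapping hits.
E2_SITE = re.compile(r"(?=(ACCG([ACGT]{4})CGGT))")

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_e2_sites(seq):
    """Return (0-based start, spacer, spacer A/T fraction) for each consensus site."""
    hits = []
    for m in E2_SITE.finditer(seq.upper()):
        spacer = m.group(2)
        at_fraction = sum(base in "AT" for base in spacer) / len(spacer)
        hits.append((m.start(), spacer, at_fraction))
    return hits
```

Because `revcomp("ACCG")` is `"CGGT"`, the two half-sites make the whole site palindromic apart from the spacer. That matches the palindromic binding sites described in the abstract.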
Organization of nif gene cluster in Frankia sp. EuIK1 strain, a symbiont of Elaeagnus umbellata.
Oh, Chang Jae; Kim, Ho Bang; Kim, Jitae; Kim, Won Jin; Lee, Hyoungseok; An, Chung Sun
2012-01-01
The nucleotide sequence of a 20.5-kb genomic region harboring nif genes was determined and analyzed. The fragment was obtained from Frankia sp. EuIK1 strain, an indigenous symbiont of Elaeagnus umbellata. A total of 20 ORFs, including 12 nif genes, were identified and subjected to comparative analysis with the genome sequences of 3 Frankia strains representing diverse host plant specificities. The nucleotide and deduced amino acid sequences showed the highest levels of identity with orthologous genes from an Elaeagnus-infecting strain. The gene organization around the nif gene clusters was well conserved among all 4 Frankia strains. However, the location of the nifV gene showed characteristic features in each Frankia strain, depending on the type of host plant. Sequence analysis performed to determine the transcription units suggested that there could be an independent operon starting from the nifW gene in the EuIK1 strain. Considering the organization patterns and their total extent on the genome, we propose that the nif gene clusters have remained stable despite the genetic variation occurring in Frankia genomes.
Jakubec, David; Laskowski, Roman A.; Vondrasek, Jiri
2016-01-01
Decades of intensive experimental studies of the recognition of DNA sequences by proteins have provided us with a view of a diverse and complicated world in which few to no features are shared between individual DNA-binding protein families. The originally conceived direct readout of DNA residue sequences by amino acid side chains offers very limited capacity for sequence recognition, while the effects of the dynamic properties of the interacting partners remain difficult to quantify and almost impossible to generalise. In this work we investigated the energetic characteristics of all DNA residue–amino acid side chain combinations in the conformations found at the interaction interface in a very large set of protein–DNA complexes by means of empirical potential-based calculations. General specificity-defining criteria were derived and used to look beyond the binding motifs considered in previous studies. By linking energetic favourability to the observed geometrical preferences, our approach reveals several additional amino acid motifs that can distinguish between individual DNA bases. Our results remained valid in environments with various dielectric properties. PMID:27384774
NASA Astrophysics Data System (ADS)
Arteca, Gustavo A.; Tapia, O.
Using computer-simulated molecular dynamics, we study the effect of sequence mutation on the unfolding mechanism of a native fold. The system considered is the native fold of hen egg-white lysozyme, exposed to centrifugal unfolding in vacuo. This unfolding bias elicits configurational transitions that imitate the behaviour of anhydrous proteins diffusing after electrospraying from neutral-pH solutions. By changing the sequences threaded onto the native fold of lysozyme, we probe the role of disulfide bridges and the effect of a global mutation. We find that the initial denaturing steps share common characteristics for the tested sequences. Recurrent features are: (i) the presence of dumbbell conformers with significant residual secondary structure, (ii) the ubiquitous formation of hairpins and two-stranded β-sheets regardless of disulfide bridges, and (iii) an unfolding pattern where the reduction in folding complexity is highly correlated with the decrease in chain compactness. These findings appear to be intrinsic to the shape of the native fold, suggesting that similar unfolding pathways may be accessible to many protein sequences.
Sathyan, Naveen; Philip, Rosamma; Chaithanya, E R; Anil Kumar, P R; Sanjeevan, V N; Singh, I S Bright
2013-01-01
Antimicrobial peptides (AMPs) are humoral innate immune components of fishes that provide protection against pathogenic infections. Histone-derived antimicrobial peptides are reported to participate actively in the immune defenses of fishes. The present study deals with the identification of putative antimicrobial sequences from histone H2A of the sicklefin chimaera, Neoharriotta pinnata. A 52-residue peptide termed Harriottin-1, a 40-residue Harriottin-2, and a 21-residue Harriottin-3 were identified as possessing an antimicrobial sequence motif. The physicochemical properties and molecular structure of the Harriottins are in agreement with the characteristic features of antimicrobial peptides, indicating their potential role in the innate immunity of the sicklefin chimaera. The histone H2A sequence of the sicklefin chimaera was found to differ from previously reported histone H2A sequences. Phylogenetic analysis based on histone H2A and the cytochrome oxidase subunit 1 (CO1) gene placed N. pinnata in an intermediate position between invertebrates and vertebrates.
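Two of the simplest physicochemical descriptors used to judge whether a candidate peptide looks like an AMP can be sketched in a few lines: mean hydropathy and net charge. The sketch computes GRAVY with Kyte-Doolittle values and a crude net charge that counts K/R as +1 and D/E as -1 while ignoring His and the termini. The Harriottin sequences are not given in the abstract, and the paper's exact descriptors are not stated, so any input here is hypothetical.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KD = {"A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
      "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
      "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
      "Y": -1.3, "V": 4.2}

def gravy(seq):
    """Grand average of hydropathy: mean Kyte-Doolittle value over the sequence."""
    return sum(KD[aa] for aa in seq.upper()) / len(seq)

def approx_net_charge(seq):
    """Crude net charge near neutral pH: +1 per K/R, -1 per D/E (His and termini ignored)."""
    s = seq.upper()
    return sum(aa in "KR" for aa in s) - sum(aa in "DE" for aa in s)
```

A clearly positive net charge is the usual first check for histone-derived AMP candidates, which are typically lysine- and arginine-rich.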
Sathyan, Naveen; Philip, Rosamma; Chaithanya, E. R.; Anil Kumar, P. R.; Sanjeevan, V. N.; Singh, I. S. Bright
2013-01-01
Antimicrobial peptides (AMPs) are humoral innate immune components of fishes that provide protection against pathogenic infections. Histone-derived antimicrobial peptides are reported to participate actively in the immune defenses of fishes. The present study deals with the identification of putative antimicrobial sequences from histone H2A of the sicklefin chimaera, Neoharriotta pinnata. A 52-residue peptide termed Harriottin-1, a 40-residue Harriottin-2, and a 21-residue Harriottin-3 were identified as possessing an antimicrobial sequence motif. The physicochemical properties and molecular structure of the Harriottins are in agreement with the characteristic features of antimicrobial peptides, indicating their potential role in the innate immunity of the sicklefin chimaera. The histone H2A sequence of the sicklefin chimaera was found to differ from previously reported histone H2A sequences. Phylogenetic analysis based on histone H2A and the cytochrome oxidase subunit 1 (CO1) gene placed N. pinnata in an intermediate position between invertebrates and vertebrates. PMID:27398241
Adrian-Kalchhauser, Irene; Svensson, Ola; Kutschera, Verena E; Alm Rosenblad, Magnus; Pippel, Martin; Winkler, Sylke; Schloissnig, Siegfried; Blomberg, Anders; Burkhardt-Holm, Patricia
2017-02-16
Vertebrate mitochondrial genomes are optimized for fast replication and low cost of RNA expression. Accordingly, they are devoid of introns, are transcribed as polycistrons, and contain very little intergenic sequence. Vertebrate mitochondrial genomes usually measure between 16.5 and 17 kilobases (kb). During genome sequencing projects for two novel vertebrate models, the invasive round goby and the sand goby, we found that the sand goby mitochondrial genome is exceptionally small (16.4 kb), while the mitochondrial genome of the round goby is much larger than expected for a vertebrate. At 19 kb, it is one of the largest fish, and even vertebrate, mitochondrial genomes known to date. The expansion is attributable to a sequence insertion downstream of the putative transcriptional start site. This insertion carries traces of repeats from the control region but is mostly novel. To learn more about this phenomenon, we gathered all available mitochondrial genomes of Gobiidae and of nine gobioid species, performed phylogenetic analyses, analysed gene arrangements, and compared gobiid mitochondrial genome sizes, ecological information, and other species characteristics with respect to the mitochondrial phylogeny. Among other things, this allowed us to identify a unique arrangement of tRNAs among Ponto-Caspian gobies. Our results indicate that the round goby mitochondrial genome may contain novel features. Since mitochondrial genome organisation is tightly linked to energy metabolism, these features may be linked to its invasion success. Also, the unique tRNA arrangement among Ponto-Caspian gobies may be helpful in studying the evolution of this highly adaptive and invasive species group. Finally, we find that the phylogeny of gobiids can be further refined by the use of longer stretches of linked DNA sequence.
Berstein, R M; Schluter, S F; Shen, S; Marchalonis, J J
1996-04-16
All immunoglobulins and T-cell receptors throughout phylogeny share regions of highly conserved amino acid sequence. To identify possible primitive immunoglobulins and immunoglobulin-like molecules, we utilized 3' RACE (rapid amplification of cDNA ends) and a highly conserved constant region consensus amino acid sequence to isolate a new immunoglobulin class from the sandbar shark Carcharhinus plumbeus. The immunoglobulin, termed IgW, in its secreted form consists of 782 amino acids and is expressed in both the thymus and the spleen. The molecule overall most closely resembles mu chains of the skate and human and a new putative antigen binding molecule isolated from the nurse shark (NAR). The full-length IgW chain has a variable region resembling human and shark heavy-chain (VH) sequences and a novel joining segment containing the WGXGT motif characteristic of H chains. However, unlike any other H-chain-type molecule, it contains six constant (C) domains. The first C domain contains the cysteine residue characteristic of C mu1 that would allow dimerization with a light (L) chain. The fourth and sixth domains also contain comparable cysteines that would enable dimerization with other H chains or homodimerization. Comparison of the sequences of IgW V and C domains shows homology greater than that found in comparisons among VH and C mu or VL, or CL thereby suggesting that IgW may retain features of the primordial immunoglobulin in evolution.
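The joining-segment motif WGXGT mentioned above, where X is any residue, can be located with a short scan. The sketch below treats X as any uppercase letter. The example fragment is hypothetical and is not the IgW sequence, which the abstract does not give.

```python
import re

# WGXGT: X matches any amino-acid letter; the lookahead reports overlapping hits too.
WGXGT = re.compile(r"(?=(WG[A-Z]GT))")

def find_wgxgt(protein):
    """Return 0-based start positions of the WGXGT motif in a protein sequence."""
    return [m.start() for m in WGXGT.finditer(protein.upper())]
```

For instance, the hypothetical fragment `"FDYWGQGTLVTV"` yields `[3]`.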
Methods for determining the genetic affinity of microorganisms and viruses
NASA Technical Reports Server (NTRS)
Fox, George E. (Inventor); Willson, III, Richard C. (Inventor); Zhang, Zhengdong (Inventor)
2012-01-01
A method is provided for selecting which sub-sequences in a database of nucleic acid sequences, such as 16S rRNA, are highly characteristic of particular groupings of bacteria, microorganisms, fungi, etc. on a substantially phylogenetic tree. The method is also applicable to viruses comprising viral genomic RNA or DNA. A catalogue of highly characteristic sequences identified by this method is assembled to establish the genetic identity of an unknown organism. The characteristic sequences are used to design nucleic acid hybridization probes that include the characteristic sequence or its complement, or are derived from one or more characteristic sequences. A plurality of these characteristic sequences is used in hybridization to determine the phylogenetic tree position of the organism(s) in a sample. For target organisms represented in the original sequence database, sufficient characteristic sequences can provide identification to the species or subspecies level. Oligonucleotide arrays of many probes are especially preferred. A hybridization signal can comprise fluorescence, chemiluminescence, or isotopic labeling, etc.; alternatively, sequences in a sample can be detected by direct means, e.g. mass spectrometry. The method's characteristic sequences can also be used to design specific PCR primers. The method uniquely identifies the phylogenetic affinity of an unknown organism without requiring prior knowledge of what is present in the sample. Even if the organism has not been previously encountered, the method still provides useful information about which phylogenetic tree bifurcation nodes encompass the organism.
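The core idea of a "characteristic sequence" can be shown in a much simplified form: a substring found in every member of a target group and in no other sequence, paired with a probe complementary to it. This sketch uses fixed-length k-mers over the DNA alphabet and an exact presence/absence rule. The patented method works over a phylogenetic tree and is not reproduced here.

```python
def kmers(seq, k):
    """Set of all length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def characteristic_kmers(group, others, k):
    """k-mers present in every sequence of `group` and in none of `others`."""
    shared = set.intersection(*(kmers(s, k) for s in group))
    for s in others:
        shared -= kmers(s, k)
    return sorted(shared)

def probe_for(target):
    """Hybridization probe complementary to a DNA target (its reverse complement)."""
    return target.translate(str.maketrans("ACGT", "TGCA"))[::-1]
```

A real catalogue would use much longer k-mers and tolerate sequencing errors. The set operations, however, show why such probes separate one group from the rest of the database.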
Elouej, Sahar; Beleza-Meireles, Ana; Caswell, Richard; Colclough, Kevin; Ellard, Sian; Desvignes, Jean Pierre; Béroud, Christophe; Lévy, Nicolas; Mohammed, Shehla; De Sandre-Giovannoli, Annachiara
2017-06-01
Mandibular hypoplasia, deafness, progeroid features, and lipodystrophy syndrome (MDPL) is an autosomal dominant systemic disorder characterized by prominent loss of subcutaneous fat, a characteristic facial appearance, and metabolic abnormalities. This syndrome is caused by heterozygous de novo mutations in the POLD1 gene. To date, 19 patients with MDPL have been reported in the literature, 14 of whom have been characterized at the molecular level. Twelve unrelated patients carried a recurrent in-frame deletion of a single codon (p.Ser605del), and two other patients carried a novel heterozygous mutation in exon 13 (p.Arg507Cys). Interestingly, germline mutations of the same gene have also been implicated in familial polyposis and colorectal cancer (CRC) predisposition. We describe a male and a female patient with MDPL affected with mild and severe phenotypes, respectively. Both showed mandibular hypoplasia, a beaked nose with bird-like facies, prominent eyes, a small mouth, growth retardation, and muscle and skin atrophy, but the female patient showed such a severe and early phenotype that a first working diagnosis of Hutchinson-Gilford Progeria was made. The investigation was performed by direct sequencing of POLD1 exon 15 in the male patient, who had a classical MDPL phenotype, and by whole exome sequencing in the female patient and her unaffected parents. In the female patient, exome sequencing identified a previously undescribed de novo heterozygous mutation in the POLD1 gene (NM_002691.3: c.3209T>A), predicted to cause the missense change p.Ile1070Asn in the ZnF2 (Zinc Finger 2) domain of the protein. This mutation was not reported in the 1000 Genomes Project, dbSNP, or exome sequencing databases. Furthermore, the Isoleucine1070 residue of POLD1 is highly conserved among various species, suggesting that this substitution may cause a major impairment of POLD1 activity.
For the second patient, affected with a typical MDPL phenotype, direct sequencing of POLD1 exon 15 revealed the recurrent in-frame deletion (c.1812_1814del, p.S605del). Our work highlights that mutations in different POLD1 domains can lead to phenotypic variability, ranging from dominantly inherited cancer predisposition syndromes, to mild MDPL phenotypes without lifespan reduction, to very severe MDPL syndromes with major premature aging features. These results also suggest that POLD1 gene testing should be considered in patients presenting with severe progeroid features. Copyright © 2017 Elsevier Inc. All rights reserved.
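The link between the coding-DNA and protein notations above can be checked with a little arithmetic. The sketch assumes standard HGVS c. numbering, where c.1 is the A of the ATG start codon. Under that assumption c.3209 is the second base of codon 1070. The reference codon itself is not given in the abstract, so the sketch tests which isoleucine codons would give asparagine after a T>A change at that position.

```python
def codon_of(c_pos):
    """Map a coding-DNA (c.) position to (codon number, position within codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1

# Small excerpt of the standard genetic code, enough for this example.
CODONS = {"ATT": "Ile", "ATC": "Ile", "ATA": "Ile",
          "AAT": "Asn", "AAC": "Asn", "AAA": "Lys"}

def substitute(codon, pos, ref, alt):
    """Apply a single-base substitution at 1-based position `pos` of a codon."""
    if codon[pos - 1] != ref:
        raise ValueError("reference base does not match codon")
    return codon[:pos - 1] + alt + codon[pos:]

codon, pos = codon_of(3209)  # c.3209T>A
asn_codons = [c for c in ("ATT", "ATC", "ATA")
              if CODONS[substitute(c, pos, "T", "A")] == "Asn"]
```

Only ATT and ATC become Asn codons, since ATA would give AAA (Lys), so p.Ile1070Asn implies one of those two reference codons. Likewise, c.1812_1814del spans codons 604 and 605 and removes a multiple of three bases, consistent with an in-frame single-residue deletion.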
Prediction of enhancer-promoter interactions via natural language processing.
Zeng, Wanwen; Wu, Mengmeng; Jiang, Rui
2018-05-09
Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important for deciphering gene regulation, cell differentiation, and disease mechanisms. Distinguishing true interactions from nearby non-interacting ones is currently challenging, because traditional experimental methods are limited by low resolution or low throughput. We propose a novel computational framework, EP2vec, to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method from natural language processing. We then train a classifier to predict EPIs from the learned representations in a supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841 to 0.933 on different datasets, outperforming existing methods. We demonstrate the robustness of sequence embedding features by carrying out a sensitivity analysis. In addition, by adopting an attention mechanism to analyze the learned sequence embedding features, we identify motifs that represent cell line-specific information. Finally, we show that even better performance, with F1 scores of 0.889 to 0.940, can be achieved by combining sequence embedding features with experimental features. EP2vec sheds light on feature extraction for DNA sequences of arbitrary length and provides a powerful approach for EPI identification.
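NLP-style embedding methods need each DNA sequence turned into a "sentence" of "words", and results like those above are reported as F1 scores. This sketch shows both pieces. It assumes overlapping k-mers as the words. The k, stride, and embedding model used by EP2vec are not given in the abstract, and the embedding training step itself is omitted.

```python
def kmer_sentence(seq, k=6, stride=1):
    """Split a DNA sequence into k-mer 'words' taken every `stride` bases."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]

def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall from confusion-matrix counts."""
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

Because every sequence becomes a list of tokens, enhancers and promoters of any length can be passed to the same unsupervised document-embedding model, which then yields fixed-length vectors.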
Scarlatti, G; Leitner, T; Halapi, E; Wahlberg, J; Marchisio, P; Clerici-Schoeller, M A; Wigzell, H; Fenyö, E M; Albert, J; Uhlén, M
1993-01-01
We have compared the variable region 3 sequences from 10 human immunodeficiency virus type 1 (HIV-1)-infected infants to virus sequences from the corresponding mothers. The sequences were derived from DNA of uncultured peripheral blood mononuclear cells (PBMC), DNA of cultured PBMC, and RNA from serum collected at or shortly after delivery. The infected infants, in contrast to the mothers, harbored homogeneous virus populations. Comparison of sequences from the children and clones derived from DNA of the corresponding mothers showed that the transmitted virus represented either a minor or a major virus population of the mother. In contrast to an earlier study, we found no evidence of selection of minor virus variants during transmission. Furthermore, the transmitted virus variant did not show any characteristic molecular features. In some cases the transmitted virus was more related to the virus RNA population of the mother and in other cases it was more related to the virus DNA population. This suggests that either cell-free or cell-associated virus may be transmitted. These data will help AIDS researchers to understand the mechanism of transmission and to plan strategies for prevention of transmission. PMID:8446584
Genome-Wide Identification and Comparative Analysis of Albumin Family in Vertebrates
Li, Shugang; Cao, Yiping; Geng, Fang
2017-01-01
Albumins are among the best-known globular proteins, and their most typical representatives are the serum albumins. However, apart from human and bovine serum albumin, the albumin family has received little attention. To characterize the features of the albumin family, we mined all putative albumin proteins from the available genome sequences. The results showed that albumin is widely distributed in vertebrates but absent from bacteria and archaea. Phylogenetic analysis of the vertebrate albumin family implied an evolutionary relationship among serum albumin, α-fetoprotein, vitamin D–binding protein and afamin. In addition, a new member of the albumin family was identified: extracellular matrix protein 1. Structural analysis revealed that the motifs forming the internal disulfide bonds are highly conserved across the albumin family, despite low overall sequence identity. The domain arrangement of albumin proteins indicated that most vertebrate albumins contain 3 characteristic domains, arising from 2 evolutionary patterns. We also observed a significant trend for albumins of higher vertebrate species to possess more characteristic domains. This study provides fundamental information for a better understanding of albumin distribution, phylogenetic relationships, characteristic motifs and structure, as well as new insights into the evolutionary pattern of the family. PMID:28680266
Matsudate, Yoshihiro; Naruto, Takuya; Hayashi, Yumiko; Minami, Mitsuyoshi; Tohyama, Mikiko; Yokota, Kenji; Yamada, Daisuke; Imoto, Issei; Kubo, Yoshiaki
2017-06-01
Nevoid basal cell carcinoma syndrome (NBCCS) is an autosomal dominant disorder mainly caused by heterozygous mutations of PTCH1. In addition to characteristic clinical features, detection of a mutation in a causative gene is reliable for the diagnosis of NBCCS; however, no mutations have been identified in some patients using conventional methods. Our objective was to improve the method for the molecular diagnosis of NBCCS. We performed targeted exome sequencing (TES) analysis using a multi-gene panel, including PTCH1, PTCH2, SUFU and other sonic hedgehog signaling pathway-related genes, based on next-generation sequencing (NGS) technology, in 8 cases in whom possible causative mutations had not been detected by previously performed conventional analysis and in 2 recent cases of NBCCS. Gross deletions within or around PTCH1 detected by TES were subsequently analyzed using chromosomal microarray (CMA). Through TES analysis, specific single nucleotide variants or small indels of PTCH1 causing inferred amino acid changes were identified in 2 novel cases and 2 undiagnosed cases, whereas gross deletions within or around PTCH1, validated by CMA, were found in 3 undiagnosed cases. However, no mutations were detected even by TES in 3 cases. Among the 3 cases with gross deletions of PTCH1, deletions containing the entire PTCH1 and additional neighboring genes were detected in 2 cases, one of which exhibited atypical clinical features, such as severe mental retardation, likely associated in particular with genes located within the 4.3-Mb deleted region. TES-based simultaneous evaluation of sequences and copy number status in all targeted coding exons by NGS is likely to be more useful for the molecular diagnosis of NBCCS than conventional methods. CMA is recommended as a subsequent analysis for validation and detailed mapping of deleted regions, which may explain the atypical clinical features of NBCCS cases. Copyright © 2017 Japanese Society for Investigative Dermatology.
Published by Elsevier B.V. All rights reserved.
Zhang, Long; Jia, Lianyin; Ren, Yazhou
2017-01-01
Protein-protein interactions (PPIs) play crucial roles in almost all cellular processes. Although a large number of PPIs have been verified by high-throughput techniques over the past decades, the set of currently known PPI pairs is still far from complete. Furthermore, wet-lab experimental techniques for detecting PPIs are time-consuming and expensive. Hence, there is an urgent need for automatic computational methods that predict PPIs efficiently and accurately. In this paper, a sequence-based approach called DNN-LCTD is developed by combining deep neural networks (DNNs) with a novel local conjoint triad description (LCTD) feature representation. LCTD combines the advantages of local description and conjoint triad; thus, it can account for interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves, but also discover hierarchical representations of the data. On the PPI data of Saccharomyces cerevisiae, DNN-LCTD achieves superior performance, with an accuracy of 93.12%, precision of 93.75%, sensitivity of 93.83% and area under the receiver operating characteristic curve (AUC) of 97.92%, while requiring only 718 s. These results indicate that DNN-LCTD is very promising for predicting PPIs and can be a useful supplementary tool for future proteomics studies. PMID:29117139
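The conjoint triad part of the representation can be sketched as follows. This assumes the commonly used seven-class amino acid grouping (by dipole and side-chain volume) and scales triad counts by the largest count. The "local" variant, which simply splits a sequence into equal consecutive regions, is a hypothetical simplification rather than the exact LCTD region scheme, and the DNN itself is omitted; all function names are illustrative.

```python
from itertools import product

# Seven amino-acid classes commonly used for conjoint triads (grouped by dipole and side-chain volume).
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: i for i, group in enumerate(CLASSES) for aa in group}
# Map each (class, class, class) triad to a vector position: 7**3 = 343 triads.
TRIAD_INDEX = {t: i for i, t in enumerate(product(range(len(CLASSES)), repeat=3))}


def conjoint_triad(seq: str) -> list[float]:
    """343-dim vector of class-triad counts, scaled by the most frequent triad."""
    counts = [0] * len(TRIAD_INDEX)
    classes = [AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS]  # skip non-standard residues
    for i in range(len(classes) - 2):
        counts[TRIAD_INDEX[tuple(classes[i:i + 3])]] += 1
    peak = max(counts)
    return [c / peak for c in counts] if peak else [0.0] * len(counts)


def local_conjoint_triad(seq: str, n_regions: int = 4) -> list[float]:
    """Hypothetical 'local' variant: concatenate conjoint triads of consecutive regions."""
    size = max(1, len(seq) // n_regions)
    bounds = [i * size for i in range(n_regions)] + [len(seq)]
    features: list[float] = []
    for start, end in zip(bounds, bounds[1:]):
        features.extend(conjoint_triad(seq[start:end]))  # empty regions contribute zeros
    return features


def pair_features(protein_a: str, protein_b: str) -> list[float]:
    """Feature vector for a candidate interacting pair: both descriptors side by side."""
    return local_conjoint_triad(protein_a) + local_conjoint_triad(protein_b)
```

Every protein pair maps to a vector of the same length (here 2 × 4 × 343 values), which is what lets a fixed-input network consume pairs of sequences with different lengths.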
Ye, Wenwu; Wang, Yang; Shen, Danyu; Li, Delong; Pu, Tianhuizi; Jiang, Zide; Zhang, Zhengguang; Zheng, Xiaobo; Tyler, Brett M; Wang, Yuanchao
2016-07-01
On the basis of its downy mildew-like morphology, the litchi downy blight pathogen was previously named Peronophythora litchii. Recently, however, it was proposed that this pathogen be transferred to Phytophthora clade 4. To better characterize this unusual oomycete species and important fruit pathogen, we obtained the genome sequence of Phytophthora litchii and compared it with those of other oomycete species. P. litchii has a small genome with tightly spaced genes. A multilocus phylogenetic analysis strongly supports the placement of P. litchii in the genus Phytophthora. The predicted effector proteins included 245 RxLR, 30 necrosis-and-ethylene-inducing protein-like and 14 crinkler proteins. The motifs, phylogenies and activities of these effectors were typical of a Phytophthora species. However, like the analyzed downy mildews, P. litchii exhibited a streamlined genome with a relatively small number of genes in both core and species-specific protein families. The low GC content and slight codon preferences of P. litchii sequences were similar to those of the analyzed downy mildews and a subset of Phytophthora species. Taken together, these observations suggest that P. litchii is a Phytophthora pathogen in the process of acquiring downy mildew-like genomic and morphological features. P. litchii may therefore provide a novel model for investigating morphological development and genomic adaptation in oomycete pathogens.
Wang, Jun; Zhang, Long; Jia, Lianyin; Ren, Yazhou; Yu, Guoxian
2017-11-08
Protein-protein interactions (PPIs) play crucial roles in almost all cellular processes. Although a large number of PPIs have been verified by high-throughput techniques over the past decades, the set of currently known PPI pairs is still far from complete. Furthermore, wet-lab experimental techniques for detecting PPIs are time-consuming and expensive. Hence, there is an urgent need for automatic computational methods that predict PPIs efficiently and accurately. In this paper, a sequence-based approach called DNN-LCTD is developed by combining deep neural networks (DNNs) with a novel local conjoint triad description (LCTD) feature representation. LCTD combines the advantages of local description and conjoint triad; thus, it can account for interactions between residues in both continuous and discontinuous regions of amino acid sequences. DNNs can not only learn suitable features from the data by themselves, but also discover hierarchical representations of the data. On the PPI data of Saccharomyces cerevisiae, DNN-LCTD achieves superior performance, with an accuracy of 93.12%, precision of 93.75%, sensitivity of 93.83% and area under the receiver operating characteristic curve (AUC) of 97.92%, while requiring only 718 s. These results indicate that DNN-LCTD is very promising for predicting PPIs and can be a useful supplementary tool for future proteomics studies.
repRNA: a web server for generating various feature vectors of RNA sequences.
Liu, Bin; Liu, Fule; Fang, Longyun; Wang, Xiaolong; Chou, Kuo-Chen
2016-02-01
With the rapid growth in the number of RNA sequences generated in the postgenomic age, there is a strong need for a flexible method that can generate various kinds of vectors representing these sequences according to their different features. This is because nearly all existing machine-learning methods, such as SVM (support vector machine) and KNN (k-nearest neighbor), can only handle vectors, not sequences. To meet the increasing demand and speed up genome analyses, we have developed a new web server called "representations of RNA sequences" (repRNA). Compared with existing methods, repRNA is much more comprehensive, flexible and powerful, as reflected by the following facts: (1) it can generate 11 different modes of feature vectors, from which users can choose according to their investigation purposes; (2) it allows users to select features from 22 built-in physicochemical properties and even from properties they define themselves; (3) the resulting feature vectors and the secondary structures of the corresponding RNA sequences can be visualized. The repRNA web server is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repRNA/ .
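To illustrate why a fixed-length vector is needed, here is a sketch of one simple way to turn a variable-length RNA sequence into one: normalized k-mer composition. It does not call the repRNA server and does not reproduce its 11 modes or 22 physicochemical properties; the function name and defaults are assumptions.

```python
from itertools import product


def kmer_composition(seq: str, k: int = 2, alphabet: str = "ACGU") -> list[float]:
    """Normalized frequencies of all len(alphabet)**k k-mers, in lexicographic order."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    seq = seq.upper().replace("T", "U")  # accept DNA-style input
    counts = [0] * len(kmers)
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:  # skip windows containing ambiguous bases such as N
            counts[j] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else [0.0] * len(kmers)


# Two RNAs of different lengths map to vectors of the same length (16 for k = 2).
short = kmer_composition("GGCUA")
longer = kmer_composition("AUGGCUAGCUAGGCUUAA")
print(len(short), len(longer))  # 16 16
```

Vectors like these can be fed directly to an SVM or KNN classifier, which is the gap between sequences and vector-based learners that the abstract describes.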
Automated frame selection process for high-resolution microendoscopy
NASA Astrophysics Data System (ADS)
Ishijima, Ayumu; Schwarz, Richard A.; Shin, Dongsuk; Mondrik, Sharon; Vigneswaran, Nadarajah; Gillenwater, Ann M.; Anandasabapathy, Sharmila; Richards-Kortum, Rebecca
2015-04-01
We developed an automated frame selection algorithm for high-resolution microendoscopy video sequences. The algorithm rapidly selects a representative frame with minimal motion artifact from a short video sequence, enabling fully automated image analysis at the point-of-care. The algorithm was evaluated by quantitative comparison of diagnostically relevant image features and diagnostic classification results obtained using automated frame selection versus manual frame selection. A data set consisting of video sequences collected in vivo from 100 oral sites and 167 esophageal sites was used in the analysis. The area under the receiver operating characteristic curve was 0.78 (automated selection) versus 0.82 (manual selection) for oral sites, and 0.93 (automated selection) versus 0.92 (manual selection) for esophageal sites. The implementation of fully automated high-resolution microendoscopy at the point-of-care has the potential to reduce the number of biopsies needed for accurate diagnosis of precancer and cancer in low-resource settings where there may be limited infrastructure and personnel for standard histologic analysis.
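The abstract does not state how motion artifact is scored. A minimal sketch, assuming motion can be approximated by the mean absolute pixel difference between a frame and its immediate neighbors, picks the steadiest frame of a short clip. Frames are plain nested lists of grayscale intensities, and the scoring rule and function names are hypothetical stand-ins, not the published algorithm.

```python
Frame = list[list[float]]  # one grayscale frame as rows of pixel intensities


def mean_abs_diff(a: Frame, b: Frame) -> float:
    """Mean absolute pixel difference between two equally sized frames."""
    total = sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))
    return total / sum(len(row) for row in a)


def select_frame(frames: list[Frame]) -> int:
    """Index of the frame that differs least from its neighbours (a proxy for minimal motion)."""
    if len(frames) < 2:
        return 0
    best_idx, best_score = 0, float("inf")
    for i, frame in enumerate(frames):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < len(frames)]
        score = sum(mean_abs_diff(frame, frames[j]) for j in neighbours) / len(neighbours)
        if score < best_score:
            best_idx, best_score = i, score
    return best_idx
```

Averaging over the available neighbours lets the first and last frames compete on equal terms with interior frames.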
Systems properties of the Haemophilus influenzae Rd metabolic genotype.
Edwards, J S; Palsson, B O
1999-06-18
Haemophilus influenzae Rd was the first free-living organism for which the complete genomic sequence was established. The annotated sequence and known biochemical information were used to define the H. influenzae Rd metabolic genotype, which contains 488 metabolic reactions operating on 343 metabolites. The stoichiometric matrix was used to determine the systems characteristics of the metabolic genotype and to assess the metabolic capabilities of H. influenzae. The need to balance cofactor and biosynthetic precursor production during growth on mixed substrates led to the definition of six different optimal metabolic phenotypes arising from the same metabolic genotype, each with different constraining features. The effects of variations in the metabolic genotype were also studied, and the H. influenzae Rd metabolic genotype was shown to contain redundant functions under defined conditions. We thus show that in silico metabolic genotypes can be synthesized from annotated genome sequences and that systems analysis methods are available to analyze and interpret the phenotypic behavior of such genotypes.
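The stoichiometric matrix S has one row per metabolite and one column per reaction, and a flux vector v describes a steady state when S·v = 0. The sketch below builds S for a hypothetical four-reaction toy pathway (not part of the H. influenzae model) and checks that condition; choosing an optimal v, as in flux balance analysis, would additionally require a linear-programming solver.

```python
# Hypothetical toy pathway: uptake -> A -> B -> C -> secretion.
METABOLITES = ["A", "B", "C"]
REACTIONS = {
    "uptake_A": {"A": 1},
    "A_to_B": {"A": -1, "B": 1},
    "B_to_C": {"B": -1, "C": 1},
    "secrete_C": {"C": -1},
}


def stoichiometric_matrix(metabolites, reactions):
    """Rows = metabolites, columns = reactions; entries are stoichiometric coefficients."""
    return [[reactions[r].get(m, 0) for r in reactions] for m in metabolites]


def is_steady_state(S, v, tol=1e-9):
    """True if S·v = 0, i.e. no metabolite accumulates or is depleted."""
    return all(abs(sum(s * flux for s, flux in zip(row, v))) <= tol for row in S)


S = stoichiometric_matrix(METABOLITES, REACTIONS)
print(S)                                 # [[1, -1, 0, 0], [0, 1, -1, 0], [0, 0, 1, -1]]
print(is_steady_state(S, [2, 2, 2, 2]))  # True: equal flux through the linear pathway
print(is_steady_state(S, [2, 1, 1, 1]))  # False: A accumulates
```

At genome scale the same structure holds, only with a 343 × 488 matrix; the set of steady-state flux vectors (the null space of S) is what defines the metabolic capabilities of the genotype.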
Analysis of Nearly One Thousand Mammalian Mirtrons Reveals Novel Features of Dicer Substrates
Shenker, Sol; Mohammed, Jaaved; Lai, Eric C.
2015-01-01
Mirtrons are microRNA (miRNA) substrates that utilize the splicing machinery to bypass the necessity of Drosha cleavage for their biogenesis. Expanding our recent efforts for mammalian mirtron annotation, we use meta-analysis of aggregate datasets to identify ~500 novel mouse and human introns that confidently generate diced small RNA duplexes. These comprise nearly 1000 total loci distributed in four splicing-mediated biogenesis subclasses, with 5'-tailed mirtrons as, by far, the dominant subtype. Thus, mirtrons surprisingly comprise a substantial fraction of endogenous Dicer substrates in mammalian genomes. Although mirtron-derived small RNAs exhibit overall expression correlation with their host mRNAs, we observe a subset with substantial differences that suggest regulated processing or accumulation. We identify characteristic sequence, length, and structural features of mirtron loci that distinguish them from bulk introns, and find that mirtrons preferentially emerge from genes with larger numbers of introns. While mirtrons generate miRNA-class regulatory RNAs, we also find that mirtrons exhibit many features that distinguish them from canonical miRNAs. We observe that conventional mirtron hairpins are substantially longer than Drosha-generated pre-miRNAs, indicating that the characteristic length of canonical pre-miRNAs is not a general feature of Dicer substrate hairpins. In addition, mammalian mirtrons exhibit unique patterns of ordered 5' and 3' heterogeneity, which reveal hidden complexity in miRNA processing pathways. These include broad 3'-uridylation of mirtron hairpins, atypically heterogeneous 5' termini that may result from exonucleolytic processing, and occasionally robust decapitation of the 5' guanine (G) of mirtron-5p species defined by splicing. 
Altogether, this study reveals that this extensive class of non-canonical miRNA bears a multitude of characteristic properties, many of which raise general mechanistic questions regarding the processing of endogenous hairpin transcripts. PMID:26325366
NASA Astrophysics Data System (ADS)
Yang, Hongxin; Su, Fulin
2018-01-01
We propose a moving-target analysis algorithm for inverse synthetic aperture radar (ISAR) image sequences that uses speeded-up robust features (SURF) and regular moments. We first extract interest points from the ISAR image sequences with SURF. Unlike traditional feature point extraction methods, SURF-based feature points are invariant to scattering intensity, target rotation and image size. We then employ a bilateral feature registering model to match these feature points. The feature registering scheme can not only search for isotropic feature points to link the image sequences but also reduce erroneous matching pairs. After that, the target centroid is detected using regular moments. Finally, a cost function based on the correlation coefficient is adopted to analyze the motion information. Experimental results on simulated and real data validate the effectiveness and practicability of the proposed method.
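Regular (geometric) moments m_pq = Σ x^p y^q I(x, y) give the intensity-weighted centroid (m10/m00, m01/m00). The sketch below computes that centroid for a small image, plus a Pearson correlation coefficient between two frames of the kind a correlation-based cost function could use. The abstract does not specify the exact cost function, so this pairing is an assumption, and the SURF matching step is omitted.

```python
Image = list[list[float]]  # rows of non-negative scattering intensities


def raw_moment(img: Image, p: int, q: int) -> float:
    """Regular moment m_pq = sum over pixels of x**p * y**q * I(x, y), with x = column, y = row."""
    return sum((x ** p) * (y ** q) * value
               for y, row in enumerate(img) for x, value in enumerate(row))


def centroid(img: Image) -> tuple[float, float]:
    """Intensity-weighted centroid (x, y) = (m10 / m00, m01 / m00)."""
    m00 = raw_moment(img, 0, 0)
    if m00 == 0:
        raise ValueError("image has zero total intensity")
    return raw_moment(img, 1, 0) / m00, raw_moment(img, 0, 1) / m00


def correlation_coefficient(a: Image, b: Image) -> float:
    """Pearson correlation between two equally sized images, flattened pixel by pixel."""
    fa = [v for row in a for v in row]
    fb = [v for row in b for v in row]
    mean_a, mean_b = sum(fa) / len(fa), sum(fb) / len(fb)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(fa, fb))
    var_a = sum((x - mean_a) ** 2 for x in fa)
    var_b = sum((y - mean_b) ** 2 for y in fb)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation undefined for a constant image")
    return cov / (var_a * var_b) ** 0.5


print(centroid([[0, 0, 0], [0, 5, 0], [0, 0, 0]]))  # (1.0, 1.0)
```

Tracking the centroid from frame to frame gives a translation estimate, while the correlation coefficient measures how similar consecutive frames remain after that shift is accounted for.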
Cooper, David N.; Bacolla, Albino; Férec, Claude; Vasquez, Karen M.; Kehrer-Sawatzki, Hildegard; Chen, Jian-Min
2011-01-01
Different types of human gene mutation may vary in size, from structural variants (SVs) to single base-pair substitutions, but what they all have in common is that their nature, size and location are often determined either by specific characteristics of the local DNA sequence environment or by higher-order features of the genomic architecture. The human genome is now recognized to contain ‘pervasive architectural flaws’ in that certain DNA sequences are inherently mutation-prone by virtue of their base composition, sequence repetitivity and/or epigenetic modification. Here we explore how the nature, location and frequency of different types of mutation causing inherited disease are shaped in large part, and often in remarkably predictable ways, by the local DNA sequence environment. The mutability of a given gene or genomic region may also be influenced indirectly by a variety of non-canonical (non-B) secondary structures whose formation is facilitated by the underlying DNA sequence. Since these non-B DNA structures can interfere with subsequent DNA replication and repair, and may serve to increase mutation frequencies in generalized fashion (i.e. both in the context of subtle mutations and SVs), they have the potential to serve as a unifying concept in studies of mutational mechanisms underlying human inherited disease. PMID:21853507
Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).
Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi
2014-06-01
The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different next-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1,050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures, excluding those in transposable elements, were deduced. Gene coverage was ∼98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research on carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
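N50 is the length L such that contigs (or scaffolds) of length at least L together contain at least half of all assembled bases. A minimal sketch of that definition follows; the toy lengths are illustrative only, and the final line simply checks the 91% coverage figure quoted above from the stated assembly length and genome-size estimate.

```python
def n50(lengths: list[int]) -> int:
    """Length L such that sequences of length >= L contain at least half of all bases."""
    if not lengths:
        raise ValueError("no sequences given")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # reached half of the assembly
            break
    return length


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8: 10 + 9 + 8 = 27 is half of 54
# Assembly length relative to the k-mer genome-size estimate:
print(round(568_887_315 / 622_000_000 * 100))  # 91
```

Unlike the mean length, N50 is weighted toward the sequences holding most of the bases, which is why it is the usual headline measure of assembly contiguity.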
Walker, J; Tait, A
1997-11-01
A reverse-transcriptase polymerase chain reaction (PCR) procedure was used to isolate an Ostertagia circumcincta partial cDNA encoding a protein with general primary sequence features characteristic of members of the mitochondrial processing peptidase (MPP) subfamily of M16 metallopeptidases. The structural relationships of the predicted protein (Oc MPPX) with MPP subfamily proteins from other species (including the model free-living nematode Caenorhabditis elegans) were examined, and Northern analysis confirmed the expression of the Oc mppx gene in adult nematodes.
SCMPSP: Prediction and characterization of photosynthetic proteins based on a scoring card method.
Vasylenko, Tamara; Liou, Yi-Fan; Chen, Hong-An; Charoenkwan, Phasit; Huang, Hui-Ling; Ho, Shinn-Ying
2015-01-01
Photosynthetic proteins (PSPs) differ greatly in structure and function, as they are involved in numerous subprocesses that take place inside an organelle called the chloroplast. Few studies have predicted PSPs from sequence, owing to the high diversity of their sequences and structures. This work aims to predict and characterize PSPs by establishing datasets of PSP and non-PSP sequences and developing prediction methods. A novel bioinformatics method for predicting and characterizing PSPs based on the scoring card method (SCMPSP) was used. First, a dataset of 649 PSPs was established using the Gene Ontology term GO:0015979, together with 649 non-PSPs from the SwissProt database, with sequence identity <= 25%. Several prediction methods are presented, based on support vector machine (SVM), decision tree J48, Bayes, BLAST and SCM. The SVM method using dipeptide features performed well, yielding a test accuracy of 72.31%. The SCMPSP method uses the estimated propensity scores of 400 dipeptides for PSPs and has a test accuracy of 71.54%, comparable to that of the SVM method. The derived propensity scores of the 20 amino acids were further used to identify informative physicochemical properties for characterizing PSPs. The analytical results reveal four characteristics of PSPs: 1) PSPs favour amino acids with hydrophobic side chains; 2) PSPs are composed of amino acids prone to form helices in membrane environments; 3) PSPs have low interaction with water; and 4) PSPs prefer amino acids with electron-reactive side chains. The SCMPSP method not only estimates the propensity of a sequence to be a PSP but also reveals characteristics that improve our understanding of PSPs. The SCMPSP source code and the datasets used in this study are available at http://iclab.life.nctu.edu.tw/SCMPSP/.
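At prediction time, a scoring card reduces to averaging per-dipeptide propensity scores along a sequence and comparing the result with a threshold. A minimal sketch under that assumption follows; the card values and threshold are hypothetical, and the estimation and optimization of the real 400 scores are not reproduced.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # the 400 dipeptides


def scm_score(seq: str, card: dict[str, float]) -> float:
    """Average propensity score over all overlapping dipeptides (unknown ones score 0)."""
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    if not pairs:
        raise ValueError("sequence needs at least two residues")
    return sum(card.get(p, 0.0) for p in pairs) / len(pairs)


def predict_psp(seq: str, card: dict[str, float], threshold: float) -> bool:
    """Classify as PSP when the sequence score reaches the threshold."""
    return scm_score(seq, card) >= threshold


# Hypothetical scoring card: every dipeptide starts at 300, a few hydrophobic ones score higher.
card = {dp: 300.0 for dp in DIPEPTIDES}
card.update({"LL": 900.0, "LA": 700.0, "AL": 700.0})
print(predict_psp("LLAL", card, threshold=500.0))  # True  (mean of 900, 700, 700)
print(predict_psp("DEDE", card, threshold=500.0))  # False (mean 300)
```

Because the model is just a 400-entry table, the learned scores can be read directly, which is what allows them to be aggregated into per-amino-acid propensities and related to physicochemical properties.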
Rossi, Mari; El-Khechen, Dima; Black, Mary Helen; Farwell Hagman, Kelly D; Tang, Sha; Powis, Zöe
2017-05-01
Exome sequencing has recently been proven to be a successful diagnostic method for complex neurodevelopmental disorders. However, the diagnostic yield of exome sequencing for autism spectrum disorders has not yet been extensively evaluated in large cohorts. We performed diagnostic exome sequencing in a cohort of 163 individuals with autism spectrum disorder (66.3%) or autistic features (33.7%). The diagnostic yield observed in our cohort was 25.8% (42 of 163) for positive or likely positive findings in characterized disease genes, while a candidate genetic etiology was reported for an additional 3.3% (4 of 120) of patients. Among the positive findings in the patients with autism spectrum disorder or autistic features, 61.9% were the result of de novo mutations. Patients presenting with psychiatric conditions, or with ataxia or paraplegia, in addition to autism spectrum disorder or autistic features were significantly more likely to receive positive results than patients without these clinical features (95.6% vs 27.1%, P < 0.0001; 83.3% vs 21.2%, P < 0.0001, respectively). The majority of the positive findings were in recently identified autism spectrum disorder genes, supporting the importance of diagnostic exome sequencing for patients with autism spectrum disorder or autistic features, as the causative genes might evade traditional sequential or panel testing. These results suggest that diagnostic exome sequencing would be an efficient primary diagnostic method for patients with autism spectrum disorders or autistic features. Moreover, our data may help clinicians better determine which subset of patients with autism spectrum disorder and additional clinical features would benefit most from diagnostic exome sequencing. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
NASA Astrophysics Data System (ADS)
Poiata, Natalia; Vilotte, Jean-Pierre; Bernard, Pascal; Satriano, Claudio; Obara, Kazushige
2018-06-01
In this study, we demonstrate the capability of an automatic network-based detection and location method to extract and analyse different components of tectonic tremor activity, by analysing a 9-day energetic tectonic tremor sequence occurring at the downdip extension of the subducting slab in southwestern Japan. The applied method exploits the coherency of multiscale, frequency-selective characteristics of non-stationary signals recorded across the seismic network. Using different characteristic functions in the signal processing step of the method makes it possible to extract and locate the sources of short-duration impulsive signal transients associated with low-frequency earthquakes, as well as of longer-duration energy transients during the tectonic tremor sequence. Frequency-dependent characteristic functions, based on higher-order statistical properties of the seismic signals, are used for the detection and location of low-frequency earthquakes. This yields a more complete (~6.5 times more events) and time-resolved catalogue of low-frequency earthquakes than the routine catalogue provided by the Japan Meteorological Agency. This catalogue resolves the space-time evolution of low-frequency earthquake activity in great detail, unravelling spatial and temporal clustering, tidal modulation and different scales of space-time migration patterns. In the second part of the study, the detection and source location of longer-duration signal energy transients within the tectonic tremor sequence are performed using characteristic functions built from smoothed frequency-dependent energy envelopes. This leads to a catalogue of longer-duration energy sources during the tectonic tremor sequence, characterized by their durations and by 3-D spatial likelihood maps of the energy-release source regions.
The summary 3-D likelihood map for the 9-day tectonic tremor sequence, built from this catalogue, exhibits an along-strike spatial segmentation of the long-duration energy-release regions, matching the large-scale clustering features evidenced by the analysis of low-frequency earthquake activity. Further examination of the two catalogues showed that the extracted short-duration low-frequency earthquake activity coincides in space, within about 10-15 km, with the longer-duration energy sources during the tectonic tremor sequence. This observation provides a potential constraint on the size of the longer-duration energy-radiating source region in relation to the clustering of low-frequency earthquake activity during the analysed tectonic tremor sequence. We show that advanced statistical network-based methods offer new capabilities for automatic high-resolution detection, location and monitoring of different scale-components of tectonic tremor activity, enriching existing slow-earthquake catalogues. Systematic application of such methods to large continuous data sets will allow the slow transient seismic energy-release activity to be imaged at higher resolution and will therefore provide new insights into the underlying multiscale mechanisms of slow-earthquake generation.
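The two kinds of characteristic function described above can be sketched for a single trace. This is a minimal illustration, assuming a trailing-window excess kurtosis as the higher-order-statistics function for impulsive transients and a trailing moving average of squared amplitude as the smoothed energy envelope. It omits frequency filtering, network stacking and source location, and the window lengths and function names are hypothetical.

```python
def sliding_kurtosis(signal: list[float], window: int) -> list[float]:
    """Excess kurtosis of each trailing window; peaks flag impulsive transients."""
    out = [0.0] * len(signal)  # the first window - 1 samples are left at 0.0
    for end in range(window, len(signal) + 1):
        seg = signal[end - window:end]
        mean = sum(seg) / window
        var = sum((x - mean) ** 2 for x in seg) / window
        if var > 0:
            m4 = sum((x - mean) ** 4 for x in seg) / window
            out[end - 1] = m4 / var ** 2 - 3.0
    return out


def energy_envelope(signal: list[float], window: int) -> list[float]:
    """Trailing moving average of squared amplitude: a smoothed energy envelope."""
    squared = [x * x for x in signal]
    out, acc = [], 0.0
    for i, value in enumerate(squared):
        acc += value
        if i >= window:
            acc -= squared[i - window]  # drop the sample leaving the window
        out.append(acc / min(i + 1, window))
    return out


# Synthetic trace: alternating background noise with one impulsive spike at index 100.
trace = [1.0, -1.0] * 50 + [20.0] + [1.0, -1.0] * 50
kurt = sliding_kurtosis(trace, window=20)
print(kurt.index(max(kurt)))  # an index between 100 and 119, i.e. a window containing the spike
```

The kurtosis function reacts to short, peaked bursts while staying negative for steady oscillation, whereas the energy envelope rises for any sustained increase in amplitude, which is why the two suit short low-frequency earthquakes and longer-duration energy transients respectively.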
NASA Astrophysics Data System (ADS)
Poiata, Natalia; Vilotte, Jean-Pierre; Bernard, Pascal; Satriano, Claudio; Obara, Kazushige
2018-02-01
In this study, we demonstrate the capability of an automatic network-based detection and location method to extract and analyse different components of tectonic tremor activity, by analysing a 9-day energetic tectonic tremor sequence occurring at the down-dip extension of the subducting slab in southwestern Japan. The applied method exploits the coherency of multi-scale, frequency-selective characteristics of non-stationary signals recorded across the seismic network. Using different characteristic functions in the signal processing step of the method makes it possible to extract and locate the sources of short-duration impulsive signal transients associated with low-frequency earthquakes, as well as of longer-duration energy transients during the tectonic tremor sequence. Frequency-dependent characteristic functions, based on higher-order statistical properties of the seismic signals, are used for the detection and location of low-frequency earthquakes. This yields a more complete (~6.5 times more events) and time-resolved catalogue of low-frequency earthquakes than the routine catalogue provided by the Japan Meteorological Agency. This catalogue resolves the space-time evolution of low-frequency earthquake activity in great detail, unravelling spatial and temporal clustering, tidal modulation and different scales of space-time migration patterns. In the second part of the study, the detection and source location of longer-duration signal energy transients within the tectonic tremor sequence are performed using characteristic functions built from smoothed frequency-dependent energy envelopes. This leads to a catalogue of longer-duration energy sources during the tectonic tremor sequence, characterized by their durations and by 3-D spatial likelihood maps of the energy-release source regions.
The summary 3-D likelihood map for the 9-day tectonic tremor sequence, built from this catalogue, exhibits an along-strike spatial segmentation of the long-duration energy-release regions, matching the large-scale clustering features evidenced by the analysis of low-frequency earthquake activity. Further examination of the two catalogues showed that the extracted short-duration low-frequency earthquake activity coincides in space, within about 10-15 km, with the longer-duration energy sources during the tectonic tremor sequence. This observation provides a potential constraint on the size of the longer-duration energy-radiating source region in relation to the clustering of low-frequency earthquake activity during the analysed tectonic tremor sequence. We show that advanced statistical network-based methods offer new capabilities for automatic high-resolution detection, location and monitoring of different scale-components of tectonic tremor activity, enriching existing slow-earthquake catalogues. Systematic application of such methods to large continuous data sets will allow the slow transient seismic energy-release activity to be imaged at higher resolution and will therefore provide new insights into the underlying multi-scale mechanisms of slow-earthquake generation.
Computational Characterization of Exogenous MicroRNAs that Can Be Transferred into Human Circulation
Shu, Jiang; Chiang, Kevin; Zempleni, Janos; Cui, Juan
2015-01-01
MicroRNAs were long considered to be synthesized exclusively endogenously, until very recent discoveries showed that humans can absorb dietary microRNAs of animal and plant origin, although the mechanism remains unknown. Compelling evidence that microRNAs from rice, milk and honeysuckle are transported to human blood and tissues has generated considerable interest in the fundamental questions of which exogenous microRNAs can be transferred into human circulation, how this happens, and whether they can exert functions in humans. Here we present an integrated genomics and computational analysis of the features that may determine whether a microRNA is transportable. Specifically, we analyzed all publicly available microRNAs, a total of 34,612 from 194 species, with 1,102 features derived from microRNA sequence and structure. Through in-depth bioinformatics analysis, 8 groups of discriminative features were used to characterize human circulating microRNAs and infer the likelihood that a microRNA will be transferred into human circulation. For example, 345 dietary microRNAs were predicted to be highly transportable candidates, of which 117 have sequences identical to their human homologs and 73 are known to be associated with exosomes. Through a milk feeding experiment, we validated 9 cow-milk microRNAs in human plasma using microRNA-sequencing analysis, including top-ranked microRNAs such as bta-miR-487b, miR-181b and miR-421. Implications for health-related processes are illustrated by functional analysis. This work demonstrates that data-driven computational analysis is a highly promising way to study novel molecular characteristics of transportable microRNAs while bypassing complex mechanistic details. PMID:26528912
Systematic analysis and evolution of 5S ribosomal DNA in metazoans.
Vierna, J; Wehner, S; Höner zu Siederdissen, C; Martínez-Lage, A; Marz, M
2013-11-01
Several studies on 5S ribosomal DNA (5S rDNA) have focused on a subset of the following features, mostly in a single organism: number of copies, pseudogenes, secondary structure, promoter and terminator characteristics, genomic arrangements, types of non-transcribed spacers and evolution. In this work, we systematically analyzed 5S rDNA sequence diversity in available metazoan genomes and identified organism-specific and evolutionarily conserved features. Putatively functional sequences (12,766) from 97 organisms allowed us to identify general features of this multigene family in animals. Interestingly, we show that each mammalian species has a highly conserved (housekeeping) 5S rRNA type and many variable ones. The genomic organization of 5S rDNA is still under debate. Here, we report the occurrence of several paralogous 5S rRNA sequences in 58 of the examined species, and a flexible genome organization of 5S rDNA in animals. We found heterogeneous 5S rDNA clusters in several species, supporting the hypothesis that 5S rDNA is exchanged from one locus to another. A rather high degree of variation in upstream, internal and downstream putative regulatory regions appears to characterize metazoan 5S rDNA. We systematically studied the internal promoters and described three different types of termination signals, as well as variable distances between the coding region and the typical termination signal. Finally, we present a statistical method for detecting linkage among noncoding RNA (ncRNA) gene families. This method showed no evolutionarily conserved linkage between 5S rDNAs and any other ncRNA genes within Metazoa, even though we found 5S rDNA to be linked to various ncRNAs in several clades. PMID:23838690
Yamazaki, Tomohiro; Matsuo, Junji; Takahashi, Satoshi; Kumagai, Shouta; Shimoda, Tomoko; Abe, Kiyotaka; Minami, Kunihiro; Yamaguchi, Hiroyuki
2015-12-01
Although sexually transmitted disease due to Chlamydia trachomatis occurs similarly in men and women, the female urogenital tract differs anatomically and physiologically from that of males, possibly leading to specific polymorphisms in bacterial surface molecules. In the present study, we therefore characterized polymorphic features of a high-definition phylogenetic marker, polymorphic outer membrane protein (Pmp) F, in C. trachomatis strains isolated from male urogenital tracts in Japan (category Japan-males, n = 12), compared with strains isolated from female cervical ducts in Japan (Japan-females, n = 11), female cervical ducts in another country (Ref-females, n = 12) and the rectums of homosexual males in another country (Ref-males, n = 7), using general bioinformatics analysis tools including the MAFFT software. Phylogenetic reconstruction of the PmpF amino acid sequences yielded three distinct clusters (four when an outgroup was included), and the Japan-males strains fell only into clusters 1 and 2. Moreover, phylogenetic distance values for the PmpF passenger domain without the hinge region, but not for the full-length sequence, showed that the Japan-males strains were more stable and less diverse than those in the other categories, which was supported by sequence conservation features. Thus, the PmpF passenger domain is a useful phylogenetic marker, and the phylogenetic features indicate that C. trachomatis strains isolated from male urogenital tracts in Japan may be unique, suggesting an adaptation to selective pressure, such as the presence or absence of microbial flora, possibly connected to sexual differentiation. Copyright © 2015 Japanese Society of Chemotherapy and The Japanese Association for Infectious Diseases. Published by Elsevier Ltd. All rights reserved.
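The abstract compares "phylogenetic distance values" between categories but does not name the distance measure. A minimal sketch, assuming the simple uncorrected p-distance computed on an already aligned set of sequences (for example, MAFFT output), with gap columns skipped; the authors may well have used a model-corrected distance instead, and the function names here are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences, ignoring gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # skip columns with a gap in either sequence
        compared += 1
        diffs += x != y
    return diffs / compared if compared else 0.0


def mean_pairwise_distance(seqs: list[str]) -> float:
    """Mean p-distance over all pairs in a group: a simple within-category diversity summary."""
    pairs = [(i, j) for i in range(len(seqs)) for j in range(i + 1, len(seqs))]
    if not pairs:
        return 0.0  # fewer than two sequences: no diversity to measure
    return sum(p_distance(seqs[i], seqs[j]) for i, j in pairs) / len(pairs)


print(p_distance("MKV-LA", "MRV-LA"))  # 0.2
```

A lower mean pairwise distance within one category would correspond to the "less diversity" reported for the Japan-males strains.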
Protein location prediction using atomic composition and global features of the amino acid sequence
DOE Office of Scientific and Technical Information (OSTI.GOV)
Cherian, Betsy Sheena, E-mail: betsy.skb@gmail.com; Nair, Achuthsankar S.
2010-01-22
The subcellular location of a protein is useful information for determining its function, screening drug candidates, designing vaccines, annotating gene products and selecting relevant proteins for further study. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. To be more accurate, a computational localization prediction method should exploit all relevant biological features that contribute to subcellular localization. In this work, we extracted biological features from the full-length protein sequence to incorporate more biological information. A new biological feature, the distribution of atomic composition, is used effectively together with multiple physicochemical properties, amino acid composition, three-part amino acid composition and sequence similarity to predict the subcellular location of the protein. Support vector machines are designed for four modules, and the final prediction is made by a weighted voting system. Our system achieves accuracies of 100%, 82.47% and 88.81% in the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that predictions based on biological features derived from the full-length amino acid sequence are more accurate than those derived from the N-terminal region alone. Treating the features as a distribution over the entire sequence brings out the underlying property distribution in greater detail and enhances prediction accuracy.
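The abstract states that four SVM modules are combined "by a weighted voting system" but gives neither the weights nor the module names. A minimal sketch of weighted voting, assuming each module outputs one location label and carries one scalar weight; the module names and weight values below are hypothetical placeholders.

```python
from collections import defaultdict


def weighted_vote(module_predictions: dict[str, str], weights: dict[str, float]) -> str:
    """Return the location whose supporting modules have the largest total weight."""
    scores: dict[str, float] = defaultdict(float)
    for module, location in module_predictions.items():
        scores[location] += weights[module]
    # max() returns the first maximal item, so sorting first breaks ties alphabetically
    return max(sorted(scores), key=lambda loc: scores[loc])


# hypothetical module outputs and weights, for illustration only
predictions = {"atomic": "nucleus", "physchem": "cytoplasm", "aac": "cytoplasm", "similarity": "nucleus"}
weights = {"atomic": 0.9, "physchem": 0.6, "aac": 0.5, "similarity": 1.2}
print(weighted_vote(predictions, weights))  # nucleus
```

In practice the weights could be set from each module's validation accuracy; how the authors chose theirs is not stated in the abstract.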
Berdan, J.M.
1984-01-01
Leperditicopid ostracodes from the Ordovician formations of Kentucky occur in micritic to fine-grained carbonate rocks believed to represent shallow-water facies. They are found at widely separated horizons in the Middle Ordovician High Bridge Group, the Middle and Upper Ordovician Lexington Limestone, and the Upper Ordovician Ashlock, Bull Fork, and Drakes Formations. In this sequence, the leperditicopes are represented by two genera of leperditiids, Eoleperditia Swartz, 1949 and Bivia Berdan, 1976, and six isochilinid genera, Isochilina Jones, 1858, Teichochilina Swartz, 1949, Ceratoleperditia Harris, 1960, Parabriartina n. gen., Kenodontochilina n. gen., and Saffordellina Bassler and Kellett, 1934; the type species of the hitherto poorly known genus Saffordellina, S. muralis (Ulrich and Bassler, 1923), is redescribed and refigured. In all, 18 taxa, of which 2 are in open nomenclature, are described and illustrated. In addition, the family Isochilinidae Swartz, 1949 is redefined to include genera without marginal brims and with straight ventral contact margins. The morphological characteristics of leperditicopid genera are discussed, and a table listing described genera and their diagnostic features is included.
Lu, Shan-Shan; Ge, Song; Su, Chun-Qiu; Xie, Jun; Mao, Jian; Shi, Hai-Bin; Hong, Xun-Ning
2017-10-30
Intracranial plaque characteristics are associated with stroke events. Differences in plaque features may explain the disconnect between stenosis severity and the presence of ischemic stroke. We aimed to investigate the relationship between plaque characteristics and downstream perfusion changes, and their contribution to the occurrence of cerebral infarction beyond luminal stenosis. This case-control study included forty-six patients with symptomatic middle cerebral artery (MCA) stenosis (with acute cerebral infarction, n = 30; without acute cerebral infarction, n = 16), imaged at 3.0T with a 3D turbo spin echo sequence (3D-SPACE). Luminal stenosis grade and plaque features, including lesion T2 and T1 hyperintense components, plaque enhancement grade and plaque distribution, were assessed. Brain perfusion was evaluated on mean transit time maps using the Alberta Stroke Program Early CT score (MTT-ASPECTS). Plaque features, grade of luminal stenosis and MTT-ASPECTS were compared between the two groups. The association between plaque features and MTT-ASPECTS was assessed using Spearman's correlation analysis. Multivariate logistic regression and receiver operating characteristic (ROC) curves were used to assess the effect of significant variables, alone and in combination, in determining the occurrence of cerebral infarction. More strongly enhanced plaques were associated with lower downstream MTT-ASPECTS (P = 0.010). Plaque enhancement grade (P = 0.039, odds ratio [OR] 5.9, 95% confidence interval [CI] 1.1-32) and MTT-ASPECTS (P = 0.003, OR 2.6, 95% CI 1.4-4.7) were associated with a recent cerebral infarction, whereas luminal stenosis grade was not (P = 0.128). The combination of MTT-ASPECTS and plaque enhancement grade provided incremental information beyond luminal stenosis grade alone: the area under the ROC curve (AUC) improved from 0.535 to 0.921 (P < 0.05). Strongly enhanced plaques are associated with a higher likelihood of downstream perfusion impairment.
Plaque enhancement and perfusion evaluation may play a role complementary to luminal stenosis in determining the occurrence of acute cerebral infarction. Level of Evidence: 4. Technical Efficacy: Stage 2. J. Magn. Reson. Imaging 2017. © 2017 International Society for Magnetic Resonance in Medicine.
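The abstract reports an AUC rising from 0.535 to 0.921 when perfusion and plaque enhancement are added. As a hedged illustration of what that number measures, the sketch below computes AUC in its Mann-Whitney form: the probability that a randomly chosen positive case (here, a patient with infarction) receives a higher score than a randomly chosen negative case, with ties counted as one half. The scores are made-up values, not study data.

```python
def roc_auc(scores_pos: list[float], scores_neg: list[float]) -> float:
    """AUC as P(positive score > negative score), counting ties as 1/2 (Mann-Whitney form)."""
    total = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                total += 1.0
            elif p == n:
                total += 0.5
    return total / (len(scores_pos) * len(scores_neg))


# hypothetical model scores (e.g. fitted logistic-regression probabilities)
print(roc_auc([0.9, 0.8, 0.4], [0.5, 0.3]))  # 5 of 6 pairs ranked correctly -> 0.8333...
```

An AUC near 0.5, like the 0.535 reported for stenosis grade alone, means the score ranks cases barely better than chance.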
Shaw, D R; Richter, H; Giorda, R; Ohmachi, T; Ennis, H L
1989-09-01
A Dictyostelium discoideum repetitive element composed of long repeats of the codon (AAC) is found in developmentally regulated transcripts. The concentration of (AAC) sequences is low in mRNA from dormant spores and growing cells and increases markedly during spore germination and multicellular development. The sequence hybridizes to many differently sized Dictyostelium DNA restriction fragments, indicating that it is scattered throughout the genome. The four cDNA clones isolated contain (AAC) sequences in the deduced coding region. Interestingly, the (AAC)-rich sequences are present in all three reading frames in the deduced proteins, i.e., AAC (asparagine), ACA (threonine) and CAA (glutamine). Three of the clones contain only one of these in frame, so that the individual proteins carry asparagine, threonine or glutamine clusters, not mixtures. However, one clone is both glutamine- and asparagine-rich. The (AAC) portion of the transcripts is reiterated 300 times in the haploid genome, whereas the other portions of the cDNAs represent single-copy genes whose sequences show no similarity other than the (AAC) repeats. The repeated sequence is similar to the opa or M sequence found in the Drosophila melanogaster Notch and homeobox genes and in other developmentally regulated fly transcripts. The transcripts are present on polysomes, suggesting that they are translated. Although the function of these repeats is unknown, long amino acid repeats are a characteristic feature of extracellular proteins of lower eukaryotes.
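The observation that one (AAC) repeat encodes three different homopolymeric runs depending on reading frame can be checked directly. This small sketch translates an (AAC)n repeat in frames 0, 1 and 2 using only the three codons involved (AAC = Asn/N, ACA = Thr/T, CAA = Gln/Q); the helper name and the reduced codon table are illustrative, not a full genetic code.

```python
# Only the three codons produced by an (AAC)n repeat are needed for this demonstration.
CODONS = {"AAC": "N", "ACA": "T", "CAA": "Q"}


def translate_frame(dna: str, frame: int) -> str:
    """Translate dna from offset `frame` (0, 1 or 2), dropping any trailing partial codon."""
    out = []
    for i in range(frame, len(dna) - 2, 3):
        out.append(CODONS.get(dna[i:i + 3], "X"))  # X marks codons absent from this tiny table
    return "".join(out)


repeat = "AAC" * 5
for frame in range(3):
    print(frame, translate_frame(repeat, frame))
# 0 NNNNN  (asparagine)
# 1 TTTT   (threonine)
# 2 QQQQ   (glutamine)
```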
A Simple Model-Based Approach to Inferring and Visualizing Cancer Mutation Signatures
Shiraishi, Yuichi; Tremmel, Georg; Miyano, Satoru; Stephens, Matthew
2015-01-01
Recent advances in sequencing technologies have enabled the production of massive amounts of data on somatic mutations from cancer genomes. These data have led to the detection of characteristic patterns of somatic mutations or “mutation signatures” at an unprecedented resolution, with the potential for new insights into the causes and mechanisms of tumorigenesis. Here we present new methods for modelling, identifying and visualizing such mutation signatures. Our methods greatly simplify mutation signature models compared with existing approaches, reducing the number of parameters by orders of magnitude even while increasing the contextual factors (e.g. the number of flanking bases) that are accounted for. This improves both sensitivity and robustness of inferred signatures. We also provide a new intuitive way to visualize the signatures, analogous to the use of sequence logos to visualize transcription factor binding sites. We illustrate our new method on somatic mutation data from urothelial carcinoma of the upper urinary tract, and a larger dataset from 30 diverse cancer types. The results illustrate several important features of our methods, including the ability of our new visualization tool to clearly highlight the key features of each signature, the improved robustness of signature inferences from small sample sizes, and more detailed inference of signature characteristics such as strand biases and sequence context effects at the base two positions 5′ to the mutated site. The overall framework of our work is based on probabilistic models that are closely connected with “mixed-membership models” which are widely used in population genetic admixture analysis, and in machine learning for document clustering. We argue that recognizing these relationships should help improve understanding of mutation signature extraction problems, and suggests ways to further improve the statistical methods. 
Our methods are implemented in an R package pmsignature (https://github.com/friend1ws/pmsignature) and a web application available at https://friend1ws.shinyapps.io/pmsignature_shiny/. PMID:26630308
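The claim that the simplified model reduces "the number of parameters by orders of magnitude" can be made concrete with a parameter count. A minimal sketch, assuming (our reading, not a quote from the abstract) that the full model assigns one probability to every combination of 6 substitution types, 4 bases at each flanking position and optionally 2 strands, while the simplified model treats each of these as an independent categorical feature. This Python sketch is not the API of the pmsignature R package.

```python
def full_model_params(n_flank_each_side: int, strand: bool = False) -> int:
    """Free parameters per signature when every context combination gets its own probability."""
    categories = 6 * 4 ** (2 * n_flank_each_side) * (2 if strand else 1)
    return categories - 1  # probabilities over all categories must sum to one


def independent_model_params(n_flank_each_side: int, strand: bool = False) -> int:
    """Free parameters per signature when each feature has its own independent distribution."""
    params = 5 + 3 * (2 * n_flank_each_side)  # 6 substitution types; 4 bases per flanking position
    return params + (1 if strand else 0)  # strand adds one free parameter (2 categories)


for k in (1, 2):
    print(k, full_model_params(k), independent_model_params(k))
# 1 95 11
# 2 1535 17
```

With two flanking bases on each side the count drops from 1535 to 17, which is the kind of reduction that makes inference from small sample sizes more robust.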
Frequency of the first feature in action sequences influences feature binding.
Mattson, Paul S; Fournier, Lisa R; Behmer, Lawrence P
2012-10-01
We investigated whether binding among perception and action feature codes is a preliminary step toward creating a more durable memory trace of an action event. If so, increasing the frequency of a particular event (e.g., a stimulus requiring a movement with the left or right hand in an up or down direction) should increase the strength and speed of feature binding for this event. The results from two experiments, using a partial-repetition paradigm, confirmed that feature binding increased in strength and/or occurred earlier for a high-frequency (e.g., left hand moving up) than for a low-frequency (e.g., right hand moving down) event. Moreover, increasing the frequency of the first-specified feature in the action sequence alone (e.g., "left" hand) increased the strength and/or speed of action feature binding (e.g., between the "left" hand and movement in an "up" or "down" direction). The latter finding suggests an update to the theory of event coding, as not all features in the action sequence equally determine binding strength. We conclude that action planning involves serial binding of features in the order of action feature execution (i.e., associations among features are not bidirectional but are directional), which can lead to a more durable memory trace. This is consistent with physiological evidence suggesting that serial order is preserved in an action plan executed from memory and that the first feature in the action sequence may be critical in preserving this serial order.
Brzuszkiewicz, Elzbieta; Thürmer, Andrea; Schuldes, Jörg; Leimbach, Andreas; Liesegang, Heiko; Meyer, Frauke-Dorothee; Boelter, Jürgen; Petersen, Heiko; Gottschalk, Gerhard; Daniel, Rolf
2011-12-01
The genome sequences of two Escherichia coli O104:H4 strains derived from two different patients in the 2011 German E. coli outbreak were determined. The two analyzed strains were designated E. coli GOS1 and GOS2 (German outbreak strain). Both isolates comprise one chromosome of approximately 5.31 Mbp and two putative plasmids. Comparisons of the 5,217 (GOS1) and 5,224 (GOS2) predicted protein-encoding genes with various E. coli strains, together with a multilocus sequence typing analysis, revealed that the isolates were most similar to the entero-aggregative E. coli (EAEC) strain 55989. In addition, one of the putative plasmids of the outbreak strain is similar to the pAA-type plasmids of EAEC strains, which contain aggregative adhesion fimbrial operons. The second putative plasmid harbors genes for extended-spectrum β-lactamases. This type of plasmid is widely distributed in pathogenic E. coli strains. A significant difference between the E. coli GOS1 and GOS2 genomes and those of EAEC strains is the presence of a prophage encoding the Shiga toxin, which is characteristic of enterohemorrhagic E. coli (EHEC) strains. The unique combination of genomic features of the German outbreak strain, which contains characteristics of both the EAEC and EHEC pathotypes, suggested that it represents a new pathotype, Entero-Aggregative-Haemorrhagic Escherichia coli (EAHEC).
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ramli, N.
1986-01-01
The J sandstone is an important hydrocarbon-bearing reservoir in the southeastern part of the Malay basin. The lower and upper members of the J sandstone are composed of shoreface and offshore sediments. The shoreface sequence contains depositional structures characteristic of a barred wave- and storm-dominated shoreface. Each shoreface sequence is laterally associated with a series of stacked offshore bars. Offshore bars can be subdivided into proximal and distal types. Two types of proximal offshore bars have been identified: (1) proximal bars formed largely above fair-weather wave base (inner proximal bars), and (2) proximal bars formed below fair-weather wave base (outer proximal bars). The inner proximal bars are closely associated with the shoreface sequence and are similar to the middle and lower shoreface. The presence of poorly sorted, polymodal, very fine to very coarse-grained sandstone beneath well-sorted crestal sandstones of inner proximal bars suggests that these offshore bars may have been deposited rapidly by storms. The crests of the inner proximal offshore bars were subsequently reworked by fair-weather processes, and the crests of the outer proximal and distal offshore bars were reworked by waning storm currents and oscillatory waves. Thick marine shales overlying offshore bars contain isolated sheet sandstones. Each sheet sandstone exhibits features that may be characteristic of distal storm shelf deposits. 15 figures, 2 tables.
Graf, Hansjörg; Martirosian, Petros; Schick, Fritz; Grieser, Marco; Bellemann, Matthias E
2003-06-01
Inductively coupled solenoid coils fitting objects the size of mice or rats were developed to adapt modern whole-body MR scanners with sufficient gradient strength for animal examinations at high spatial resolution. Homogeneous receiver characteristics are achievable over almost the whole inner region of the solenoid coils. For examinations using the head coil as the connected receiver, the SNR can be increased by a factor of 2 to 6 with the adapting coils. Standard sequences on clinical 1.5 T scanners can be applied with adapted transmitter voltages. For example, an SNR of about 30 is achievable in a mouse liver after 10 minutes of measuring time using a 2-D spin echo imaging sequence with a voxel size of 0.3 x 0.3 x 0.8 mm³.
Hupfer, H; Swiatek, M; Hornung, S; Herrmann, R G; Maier, R M; Chiu, W L; Sears, B
2000-05-01
We describe the 159,443-bp [corrected] sequence of the plastid chromosome of Oenothera elata (evening primrose). The Oe. elata plastid chromosome represents type I of the five genetically distinguishable basic plastomes found in the subsection Euoenothera. The genus Oenothera provides an ideal system in which to address fundamental questions regarding the functional integration of the compartmentalised genetic system characteristic of the eukaryotic cell. Its highly developed taxonomy and genetics, together with a favourable combination of features in its genetic structure (interspecific fertility, stable heterozygous progeny, biparental transmission of organelles, and the phenomenon of complex heterozygosity), allow facile exchanges of nuclei, plastids and mitochondria, as well as individual chromosome pairs, between species. The resulting hybrids or cybrids are usually viable and fertile, but can display various forms of developmental disturbance.
Visualization of protein sequence features using JavaScript and SVG with pViz.js.
Mukhyala, Kiran; Masselot, Alexandre
2014-12-01
pViz.js is a visualization library for displaying protein sequence features in a Web browser. By simply providing a sequence and the locations of its features, this lightweight yet versatile JavaScript library renders an interactive view of the protein features. Interactive exploration of protein sequence features over the Web is a common need in bioinformatics. Although many Web sites have developed viewers to display these features, their implementations are usually focused on data from a specific source or use case. Some of these viewers can be adapted to fit other use cases but are not designed to be reusable. pViz makes it easy to display features as boxes aligned to a protein sequence, with zooming functionality, and also includes predefined renderings for secondary structure and post-translational modifications. The library is designed to allow further customization of this view. We demonstrate such applications of pViz using two examples: a proteomic data visualization tool with an embedded viewer for displaying features on protein structure, and a tool to visualize the results of the variant_effect_predictor tool from Ensembl. pViz.js is a JavaScript library, available on GitHub at https://github.com/Genentech/pviz. This site includes examples and functional applications, installation instructions and usage documentation. A Readme file, which explains how to use pViz with examples, is available as Supplementary Material A. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Adhikari, Badri; Hou, Jie; Cheng, Jianlin
2018-03-01
In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignments from which coevolution-based features are derived; these features are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, coevolution-based features alone improve the average precision from 28.4% to 41.6%, and machine learning integration of all the features further raises the precision to 56.3%, when the top L/5 predicted long-range contacts are evaluated. The correlation between the precision of contact prediction and the logarithm of the number of effective sequences in the alignments is 0.66. © 2017 Wiley Periodicals, Inc.
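The precision figures above are for "top L/5 long-range" contacts. A minimal sketch of that metric, assuming the usual CASP conventions (long-range means sequence separation of at least 24 residues; L is the target length; L/5 is taken with floor division here). The function name and toy inputs are hypothetical, and defining true contacts (typically Cβ-Cβ distance below 8 Å) is left to the caller.

```python
def top_l5_long_range_precision(predicted, true_contacts, length, min_sep=24):
    """Precision of the top-scoring L/5 long-range predictions.

    predicted: iterable of (i, j, score); true_contacts: set of (i, j) with i < j.
    """
    long_range = [(i, j, s) for i, j, s in predicted if abs(i - j) >= min_sep]
    long_range.sort(key=lambda t: t[2], reverse=True)  # highest confidence first
    k = max(1, length // 5)
    top = long_range[:k]
    if not top:
        return 0.0
    hits = sum((min(i, j), max(i, j)) in true_contacts for i, j, _ in top)
    return hits / len(top)


# toy example: the (5, 10) pair is excluded because its separation is only 5
pred = [(1, 30, 0.9), (2, 40, 0.8), (5, 10, 0.95), (3, 50, 0.7)]
print(top_l5_long_range_precision(pred, {(1, 30), (3, 50)}, length=100))  # 2 of 3 correct
```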
Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke
2008-05-01
Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds, and thus determining structural similarity without sequence similarity would be desirable for structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for datasets in which the sequence identity of any pair of sequences falls in the twilight zone. We propose the SCPRED method, which improves prediction accuracy for sequences that share twilight-zone pairwise similarity with the sequences used for prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as input to predict the structural classes. Based on an extensive design process that considered over 2300 index-, composition- and physicochemical properties-based features, along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed on datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior to over a dozen recent competing methods based on support vector machine, logistic regression, and ensemble-of-classifiers predictors. SCPRED can accurately find similar structures for sequences that share low identity with the sequences used for prediction.
The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that SCPRED's predictions can be successfully used as a post-processing filter to improve the performance of modern fold classification methods. PMID:18452616
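The abstract says 8 of SCPRED's inputs come from the PSI-PRED secondary structure but does not list them. As a hedged illustration of the kind of feature that can be read from a 3-state prediction string (H = helix, E = strand, C = coil), the sketch below computes secondary-structure content and the longest run of a given state. These two functions are hypothetical examples, not SCPRED's actual feature set.

```python
def ss_content(ss: str) -> dict[str, float]:
    """Fraction of residues predicted as helix (H), strand (E) and coil (C)."""
    n = len(ss)
    return {state: ss.count(state) / n for state in "HEC"}


def longest_segment(ss: str, state: str) -> int:
    """Length of the longest contiguous run of `state` in the prediction string."""
    best = run = 0
    for c in ss:
        run = run + 1 if c == state else 0
        best = max(best, run)
    return best


pred = "CCHHHHCCEEEC"  # toy 3-state prediction string
print(ss_content(pred))            # H = 4/12, E = 3/12, C = 5/12
print(longest_segment(pred, "H"))  # 4
```

Features such as helix and strand content map naturally onto the SCOP all-alpha, all-beta, alpha/beta and alpha+beta classes, which may explain why so few features suffice.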
Cheng, Yu-Wei; Tan, Christopher A; Minor, Agata; Arndt, Kelly; Wysinger, Latrice; Grange, Dorothy K; Kozel, Beth A; Robin, Nathaniel H; Waggoner, Darrel; Fitzpatrick, Carrie; Das, Soma; Del Gaudio, Daniela
2014-03-01
Cornelia de Lange syndrome (CdLS) is a genetically heterogeneous disorder characterized by growth retardation, intellectual disability, upper limb abnormalities, hirsutism, and characteristic facial features. In this study we explored the occurrence of intragenic NIPBL copy number variations (CNVs) in a cohort of 510 NIPBL sequence-negative patients with suspected CdLS. Copy number analysis was performed by custom exon-targeted oligonucleotide array comparative genomic hybridization and/or MLPA. A whole-genome SNP array was used to further characterize rearrangements extending beyond the NIPBL gene. We identified NIPBL CNVs in 13 patients (2.5%), including one intragenic duplication and one deletion in the mosaic state. Breakpoint sequences in two patients provided further evidence of a microhomology-mediated replicative mechanism as a potential predominant contributor to CNVs in NIPBL. Patients for whom clinical information was available share classical CdLS features, including craniofacial and limb defects. Our experience studying the frequency of NIPBL CNVs in the largest series of patients to date widens the mutational spectrum of NIPBL and emphasizes the clinical utility of performing NIPBL deletion/duplication analysis in patients with CdLS.
BioSAVE: display of scored annotation within a sequence context.
Pollock, Richard F; Adryan, Boris
2008-03-20
Visualization of sequence annotation is a common feature in many bioinformatics tools. For many applications it is desirable to restrict the display of such annotation according to a score cutoff, as biological interpretation can be difficult in the presence of the entire data. Unfortunately, many visualisation solutions are somewhat static in the way they handle such score cutoffs. We present BioSAVE, a sequence annotation viewer with on-the-fly selection of visualisation thresholds for each feature. BioSAVE is a versatile OS X program for visual display of scored features (annotation) within a sequence context. The program reads sequence and additional supplementary annotation data (e.g., position weight matrix matches, conservation scores, structural domains) from a variety of commonly used file formats and displays them graphically. Onscreen controls then allow for live customisation of these graphics, including on-the-fly selection of visualisation thresholds for each feature. Possible applications of the program include display of transcription factor binding sites in a genomic context or the visualisation of structural domain assignments in protein sequences and many more. The dynamic visualisation of these annotations is useful, e.g., for the determination of cutoff values of predicted features to match experimental data. Program, source code and exemplary files are freely available at the BioSAVE homepage.
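BioSAVE's central idea is a separate display cutoff for each feature type. The minimal sketch below shows that filtering step. The `Annotation` type and the `visible` function are hypothetical illustrations, not BioSAVE's code, and features without a cutoff are assumed to be shown unfiltered.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    feature: str   # feature type, e.g. a PWM name or a structural domain type
    start: int
    end: int
    score: float


def visible(annotations, cutoffs):
    """Keep annotations whose score meets the cutoff for their feature type.

    Feature types missing from `cutoffs` are kept regardless of score.
    """
    return [a for a in annotations
            if a.score >= cutoffs.get(a.feature, float("-inf"))]
```

The filter is cheap, so it can be re-run every time a slider moves. Re-running it on each change is what makes on-the-fly threshold selection practical.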
BioSAVE: Display of scored annotation within a sequence context
Pollock, Richard F; Adryan, Boris
2008-01-01
Background Visualization of sequence annotation is a common feature in many bioinformatics tools. For many applications it is desirable to restrict the display of such annotation according to a score cutoff, as biological interpretation can be difficult in the presence of the entire data. Unfortunately, many visualisation solutions are somewhat static in the way they handle such score cutoffs. Results We present BioSAVE, a sequence annotation viewer with on-the-fly selection of visualisation thresholds for each feature. BioSAVE is a versatile OS X program for visual display of scored features (annotation) within a sequence context. The program reads sequence and additional supplementary annotation data (e.g., position weight matrix matches, conservation scores, structural domains) from a variety of commonly used file formats and displays them graphically. Onscreen controls then allow for live customisation of these graphics, including on-the-fly selection of visualisation thresholds for each feature. Conclusion Possible applications of the program include display of transcription factor binding sites in a genomic context or the visualisation of structural domain assignments in protein sequences and many more. The dynamic visualisation of these annotations is useful, e.g., for the determination of cutoff values of predicted features to match experimental data. Program, source code and exemplary files are freely available at the BioSAVE homepage. PMID:18366701
Terrace aggradation during the 1978 flood on Powder River, Montana, USA
Moody, J.A.; Meade, R.H.
2008-01-01
Flood processes no longer actively increase the planform area of terraces. Instead, lateral erosion decreases the area. However, infrequent extreme floods continue episodic aggradation of terrace surfaces. We quantify this type of terrace evolution during an extreme flood in May 1978 on Powder River in southeastern Montana. Within an 89-km study reach of the river, we (1) determine a sediment budget for each geomorphic feature, (2) interpret the stratigraphy of the newly deposited sediment, and (3) discuss the essential role of vegetation in the depositional processes. Peak flood discharge was about 930 m³ s⁻¹, and the flood lasted about eight days. During this time, the flood transported 8.2 million tons of sediment into and 4.5 million tons out of the study reach. The masses of sediment transferred between features or eroded from one feature and redeposited on the same feature exceeded the mass transported out of the reach. The flood inundated the floodplain and some of the remnants of two terraces along the river. Lateral erosion decreased the planform area of the lower of the two terraces (~2.7 m above the riverbed) by 3.2% and that of the higher terrace (~3.5 m above the riverbed) by 4.1%. However, overbank aggradation, on average, raised the lower terrace by 0.16 m and the higher terrace by 0.063 m. Vegetation controlled the type, thickness, and stratigraphy of the aggradation on terrace surfaces. Two characteristic overbank deposits were common: coarsening-upward sequences and lee dunes. Grass caused the deposition of the coarsening-upward sequences, which had 0.02 to 0.07 m of mud at the base, and in some cases the deposits coarsened upwards to coarse sand on the top. Lee dunes, composed of fine and very fine sand, were deposited in the wake zone downstream from the trees. The characteristic morphology of the dunes can be used to estimate some flood variables such as suspended-sediment particle size, minimum depth, and critical shear velocity. 
Information about depositional processes during extreme floods is rare, and therefore, the results from this study aid in interpreting the record of terrace stratigraphy along other rivers.
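The reach-scale figures above imply a net storage term. The sketch below works it out under a simplifying assumption: the change in storage equals input minus output, with no other sources, sinks, or measurement error. This arithmetic is an illustration and is not a result reported by the authors.

```python
# Figures quoted in the abstract, in millions of tons
sediment_in = 8.2
sediment_out = 4.5

# Assumed two-term budget: change in storage within the reach = input - output
storage_change = sediment_in - sediment_out
retained_fraction = storage_change / sediment_in
print(f"Net storage: {storage_change:.1f} million tons ({retained_fraction:.0%} of input)")
```

Under this assumption, roughly 3.7 million tons, or about 45% of the sediment entering the 89-km reach, remained within it.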
Effective Feature Selection for Classification of Promoter Sequences.
K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish
2016-01-01
Exploring novel computational methods for making sense of biological data has been not only a necessity but also productive. Part of this trend is the search for more efficient in silico methods and tools for the analysis of promoters, the parts of DNA sequences involved in regulating the expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of short protein-binding regions called motifs. In fact, the regulatory nature of promoters seems to be largely driven by the selective presence and/or arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way for the functional analysis of promoters and for discovering functionally crucial motifs. We use Position Specific Motif Matrix (PSMM) features to explore the possibility of accurately classifying promoter sequences with some popular classification techniques. Classification results on the complete feature set are poor, perhaps due to the huge number of features. We propose two ways of reducing features, and our test results show improved classification after feature reduction. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and the ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some popular feature transformation methods such as PCA and SVD. The proposed methods are also as accurate as MRMR (a feature selection method) but much faster. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expression in complex eukaryotic species.
Demir-Hilton, Elif; Hutchins, David A; Czymmek, Kirk J; Coyne, Kathryn J
2012-10-01
Delaware's Inland Bays (DIB), USA, are subject to blooms of potentially harmful raphidophytes, including Heterosigma akashiwo. In 2004, a dense bloom was observed in a low salinity tributary of the DIB. Light microscopy initially suggested that the species was H. akashiwo; however, the cells were smaller than anticipated. 18S rDNA sequences of isolated cultures differed substantially from all raphidophyte sequences in GenBank. Phylogenetic analysis placed it approximately equidistant from Chattonella and Heterosigma with only ~96% sequence homology with either group. Here, we describe this marine raphidophyte as a novel genus and species, Viridilobus marinus (gen. et sp. nov.). We also compared this species with H. akashiwo, because both species are superficially similar with respect to morphology and their ecological niches overlap. V. marinus cells are ovoid to spherical (11.4 × 9.4 μm), and the average number of chloroplasts (4 per cell) is lower than in H. akashiwo (15 per cell). Pigment analysis of V. marinus revealed the presence of fucoxanthin, violaxanthin, and zeaxanthin, which are characteristic of marine raphidophytes within the family Chattonellaceae of the Raphidophyceae. TEM and confocal microscopy, however, revealed diagnostic microscopic and ultrastructural characteristics that distinguish it from other raphidophytes. Chloroplasts were in close association with the nucleus and thylakoids were arranged either parallel or perpendicular to the cell surface. Putative mucocysts were identified, but trichocysts were not observed. These features, along with DNA sequence data, distinguish this species from all other raphidophyte genera within the family Chattonellaceae of the Raphidophyceae. © 2012 Phycological Society of America.
Matrix metalloproteinases outside vertebrates.
Marino-Puertas, Laura; Goulas, Theodoros; Gomis-Rüth, F Xavier
2017-11-01
The matrix metalloproteinase (MMP) family belongs to the metzincin clan of zinc-dependent metallopeptidases. Due to their enormous implications in physiology and disease, MMPs have mainly been studied in vertebrates. They are engaged in extracellular protein processing and degradation, and present extensive paralogy, with 23 forms in humans. One characteristic of MMPs is a ~165-residue catalytic domain (CD), which has been structurally studied for 14 MMPs from human, mouse, rat, pig and the oral-microbiome bacterium Tannerella forsythia. These studies revealed close overall coincidence and characteristic structural features, which distinguish MMPs from other metzincins and give rise to a sequence pattern for their identification. Here, we reviewed the literature available on MMPs outside vertebrates and performed database searches for potential MMP CDs in invertebrates, plants, fungi, viruses, protists, archaea and bacteria. These and previous results revealed that MMPs are widely present in several copies in Eumetazoa and higher plants (Tracheophyta), but have just token presence in eukaryotic algae. A few dozen sequences were found in Ascomycota (within fungi) and in double-stranded DNA viruses infecting invertebrates (within viruses). In contrast, a few hundred sequences were found in archaea and >1000 in bacteria, with several copies for some species. Most of the archaeal and bacterial phyla containing potential MMPs are present in human oral and gut microbiomes. Overall, MMP-like sequences are present across all kingdoms of life, but their asymmetric distribution contradicts the vertical descent model from a eubacterial or archaeal ancestor. This article is part of a Special Issue entitled: Matrix Metalloproteinases edited by Rafael Fridman. Copyright © 2017 Elsevier B.V. All rights reserved.
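The MMP-specific sequence pattern mentioned above is not reproduced in this record. The sketch below therefore scans only for the broader extended zinc-binding consensus shared by the metzincin clan, HEXXHXXGXXH (X = any residue). Any hit is a candidate for closer inspection, not a confirmed MMP catalytic domain. The function name is a hypothetical illustration.

```python
import re

# Extended metzincin zinc-binding consensus HEXXHXXGXXH (X = any residue).
# The lookahead lets overlapping matches be reported as well.
METZINCIN = re.compile(r"(?=(HE..H..G..H))")


def find_zinc_motifs(seq: str):
    """Return 0-based start positions of candidate zinc-binding motifs."""
    return [m.start() for m in METZINCIN.finditer(seq.upper())]
```

A regular expression is only a first filter for database searches of the kind described here. Distinguishing MMPs from other metzincins would need the additional structural and sequence features the authors describe.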
An iris recognition algorithm based on DCT and GLCM
NASA Astrophysics Data System (ADS)
Feng, G.; Wu, Ye-qing
2008-04-01
As the range of human activity expands, reliable personal identification is becoming increasingly important, and many techniques have been proposed for this practical need. Conventional identification methods such as passwords and identification cards are not always reliable, so a wide variety of biometrics has been developed to meet this challenge. Among these biological characteristics, the iris pattern has attracted increasing attention for its stability, reliability, uniqueness, noninvasiveness and resistance to counterfeiting. These distinct merits make the iris highly reliable for personal identification, and iris recognition has become a popular research topic in the past several years. This paper presents an efficient algorithm for iris recognition using the gray-level co-occurrence matrix (GLCM) and the discrete cosine transform (DCT). To obtain more representative iris features, features are extracted from both the spatial domain and the DCT domain: GLCM and DCT are both applied to the iris image to form the feature sequence, and their combination makes the iris features more distinctive. The resulting iris feature vector reflects both spatial and frequency-domain characteristics. Experimental results show that the algorithm is effective and feasible for iris recognition.
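A minimal NumPy sketch of combining GLCM (spatial) and DCT (frequency) descriptors into one feature vector is shown below. Several details are assumptions, not the paper's settings: the input is an unwrapped 8-bit iris image, a single pixel offset is used, the gray levels are quantized to 8, and three GLCM statistics are combined with a 4×4 block of low-order DCT coefficients. Matching and classification are not shown.

```python
import numpy as np


def glcm(img, levels=8, dx=1, dy=0):
    """Normalized gray-level co-occurrence matrix for one non-negative offset (dx, dy)."""
    q = (img.astype(float) / 256 * levels).astype(int).clip(0, levels - 1)  # assumes 8-bit input
    a = q[: q.shape[0] - dy, : q.shape[1] - dx]   # reference pixels
    b = q[dy:, dx:]                               # neighbours at the given offset
    m = np.zeros((levels, levels))
    np.add.at(m, (a.ravel(), b.ravel()), 1)       # count level pairs
    return m / m.sum()


def glcm_features(p):
    i, j = np.indices(p.shape)
    return np.array([
        ((i - j) ** 2 * p).sum(),            # contrast
        (p ** 2).sum(),                      # energy
        (p / (1.0 + np.abs(i - j))).sum(),   # homogeneity
    ])


def dct_matrix(n):
    """Orthonormal DCT-II basis; row k holds frequency k."""
    k = np.arange(n)[:, None]
    x = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * x + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c


def dct2(img):
    h, w = img.shape
    return dct_matrix(h) @ img.astype(float) @ dct_matrix(w).T


def iris_features(img, keep=4):
    """Concatenate GLCM statistics with the low-order keep x keep DCT coefficients."""
    low = dct2(img)[:keep, :keep].ravel()
    return np.concatenate([glcm_features(glcm(img)), low])
```

The low-order DCT block summarizes coarse frequency content, while the GLCM statistics capture local texture. Concatenating the two is one simple way to build the kind of combined spatial and frequency-domain vector the abstract describes.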
Speech motor planning and execution deficits in early childhood stuttering.
Walsh, Bridget; Mettel, Kathleen Marie; Smith, Anne
2015-01-01
Five to eight percent of preschool children develop stuttering, a speech disorder with clearly observable, hallmark symptoms: sound repetitions, prolongations, and blocks. While the speech motor processes underlying stuttering have been widely documented in adults, few studies to date have assessed the speech motor dynamics of stuttering near its onset. We assessed fundamental characteristics of speech movements in preschool children who stutter and their fluent peers to determine if atypical speech motor characteristics described for adults are early features of the disorder or arise later in the development of chronic stuttering. Orofacial movement data were recorded from 58 children who stutter and 43 children who do not stutter aged 4;0 to 5;11 (years; months) in a sentence production task. For single speech movements and multiple speech movement sequences, we computed displacement amplitude, velocity, and duration. For the phrase level movement sequence, we computed an index of articulation coordination consistency for repeated productions of the sentence. Boys who stutter, but not girls, produced speech with reduced amplitudes and velocities of articulatory movement. All children produced speech with similar durations. Boys, particularly the boys who stuttered, had more variable patterns of articulatory coordination compared to girls. This study is the first to demonstrate sex-specific differences in speech motor control processes between preschool boys and girls who are stuttering. The sex-specific lag in speech motor development in many boys who stutter likely has significant implications for the dramatically different recovery rates between male and female preschoolers who stutter. Further, our findings document that atypical speech motor development is an early feature of stuttering.
Wan, Cen; Lees, Jonathan G; Minneci, Federico; Orengo, Christine A; Jones, David T
2017-10-01
Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.
Yamamoto, Eiji; Ito, Toshihiro; Ito, Hiroshi
2016-11-01
The nucleotide sequences of the nucleocapsid protein (N), phosphoprotein (P), matrix protein (M), hemagglutinin-neuraminidase (HN), and large polymerase protein (L) genes, the 3'-end leader, the 5'-end trailer and the intergenic regions of the avian paramyxovirus (APMV) strain goose/Shimane/67/2000 (APMV/Shimane67) were determined. Together with previously reported data on the fusion protein (F) gene sequence [46], this completes the determination of the genome sequence of APMV/Shimane67. The genome of APMV/Shimane67 is 16,146 nucleotides in length and contains six genes in the order 3'-N-P-M-F-HN-L-5'. The features of the APMV/Shimane67 genome (e.g., nucleotide length of the whole genome and of each of the six genes, and predicted amino acid length of each of the six genes) were distinct from those of other APMV serotypes. Phylogenetic analysis indicated that although APMV/Shimane67 was grouped with APMV-1, -9 and -12, the evolutionary distance between APMV/Shimane67 and these viruses was longer than that observed between intra-serotype viruses. These results show that the genome sequence of APMV/Shimane67 has specific characteristics and is distinguishable from other types of APMV.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kuppuswamy, M.N.; Hoffmann, J.W.; Spitzer, S.G.
1991-02-15
In this report, the authors describe an approach to detect the presence of abnormal alleles in those genetic diseases in which the same mutation occurs at high frequency (e.g., hemophilia B). Initially, from each subject, the DNA fragment containing the putative mutation site is amplified by the polymerase chain reaction. For each fragment two reaction mixtures are then prepared. Each contains the amplified fragment, a primer (18-mer or longer) whose sequence is identical to the coding sequence of the normal gene immediately flanking the 5′ end of the mutation site, and either an α-³²P-labeled nucleotide corresponding to the normal coding sequence at the mutation site or an α-³²P-labeled nucleotide corresponding to the mutant sequence. An essential feature of the present methodology is that the base immediately 3′ to the template-bound primer is one of those altered in the mutant, since in this way extension of the primer by a single base gives an extended molecule characteristic of either the mutant or the wild type. The method is rapid and should be useful in carrier detection and prenatal diagnosis of every genetic disease with a known sequence variation.
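The logic of the single-base extension can be sketched in silico. The primer is identical to the coding strand, so it anneals to the complementary template strand, and a one-base extension incorporates the coding-strand base that immediately follows the primer. The sequences and function names below are hypothetical, and each call models a single allele. A carrier would carry both alleles and therefore give signal in both the normal-nucleotide and mutant-nucleotide reactions.

```python
def next_base(coding_seq: str, primer: str) -> str:
    """Base added when a coding-strand-identical primer is extended by one nucleotide."""
    i = coding_seq.find(primer)
    if i < 0 or i + len(primer) >= len(coding_seq):
        raise ValueError("primer does not flank a site in this sequence")
    return coding_seq[i + len(primer)]


def call_allele(amplicon: str, primer: str, normal_base: str, mutant_base: str) -> str:
    """Which labeled nucleotide would be incorporated: 'normal', 'mutant' or 'neither'."""
    b = next_base(amplicon, primer)
    if b == normal_base:
        return "normal"
    if b == mutant_base:
        return "mutant"
    return "neither"
```

Usage with a hypothetical 18-mer primer: if the normal allele continues with T and the mutant with C, `call_allele(normal, primer, "T", "C")` reports `"normal"` for the wild-type amplicon and `"mutant"` for the mutant one.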
Conflict Background Triggered Congruency Sequence Effects in Graphic Judgment Task
Zhao, Liang; Wang, Yonghui
2013-01-01
Congruency sequence effects refer to the reduction of congruency effects following an incongruent trial relative to following a congruent trial. The conflict monitoring account, one of the most influential explanations of this effect, assumes that the sequential modulations are evoked by response conflict. The present study aimed to explore congruency sequence effects in the absence of response conflict. We found that congruency sequence effects occurred in a graphic judgment task in which the conflict stimuli acted as irrelevant information. The findings reveal that processing task-irrelevant conflicting stimulus features can also induce sequential modulations of interference. The results do not support the conflict monitoring interpretation and instead favor a feature integration account, in which congruency sequence effects are attributed to repetitions of stimulus and response features. PMID:23372766
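The effect described above is conventionally quantified as the congruency effect after congruent trials minus the congruency effect after incongruent trials. A minimal sketch with mean reaction times is given below. The input format is an assumption, and real analyses usually also drop error trials and outliers, a step omitted here. The feature integration account is tested by additionally splitting trials on stimulus and response repetitions, which this sketch also omits.

```python
from statistics import mean


def congruency_sequence_effect(trials):
    """trials: ordered list of (congruent: bool, rt_ms: float) tuples.

    Returns (cI - cC) - (iI - iC), where the lowercase letter is the previous
    trial type and the uppercase letter the current one. A positive value means
    the congruency effect shrinks after incongruent trials.
    """
    cells = {}
    for (prev_c, _), (cur_c, rt) in zip(trials, trials[1:]):
        cells.setdefault((prev_c, cur_c), []).append(rt)
    m = {k: mean(v) for k, v in cells.items()}  # KeyError later if a cell is empty
    effect_after_congruent = m[(True, False)] - m[(True, True)]
    effect_after_incongruent = m[(False, False)] - m[(False, True)]
    return effect_after_congruent - effect_after_incongruent
```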
Tsai, Wen-Ting; Hassan, Ahmed; Sarkar, Purbasha; Correa, Joaquin; Metlagel, Zoltan; Jorgens, Danielle M.; Auer, Manfred
2014-01-01
Modern 3D electron microscopy approaches have recently allowed unprecedented insight into the 3D ultrastructural organization of cells and tissues, enabling the visualization of large macromolecular machines, such as adhesion complexes, as well as higher-order structures, such as the cytoskeleton and cellular organelles in their respective cell and tissue context. Given the inherent complexity of cellular volumes, it is essential to first extract the features of interest in order to allow visualization, quantification, and therefore comprehension of their 3D organization. Each data set is defined by distinct characteristics, e.g., signal-to-noise ratio, crispness (sharpness) of the data, heterogeneity of its features, crowdedness of features, presence or absence of characteristic shapes that allow for easy identification, and the percentage of the entire volume that a specific region of interest occupies. All these characteristics need to be considered when deciding on which approach to take for segmentation. The six different 3D ultrastructural data sets presented were obtained by three different imaging approaches: resin embedded stained electron tomography, focused ion beam- and serial block face- scanning electron microscopy (FIB-SEM, SBF-SEM) of mildly stained and heavily stained samples, respectively. For these data sets, four different segmentation approaches have been applied: (1) fully manual model building followed solely by visualization of the model, (2) manual tracing segmentation of the data followed by surface rendering, (3) semi-automated approaches followed by surface rendering, or (4) automated custom-designed segmentation algorithms followed by surface rendering and quantitative analysis. Depending on the combination of data set characteristics, it was found that typically one of these four categorical approaches outperforms the others, but depending on the exact sequence of criteria, more than one approach may be successful. 
Based on these data, we propose a triage scheme that categorizes both objective data set characteristics and subjective personal criteria for the analysis of the different data sets. PMID:25145678
NASA Astrophysics Data System (ADS)
Zhongqin, G.; Chen, Y.
2017-12-01
Quickly and automatically identifying the spatial distribution of landslides is essential for the prevention, mitigation and assessment of landslide hazards. This remains challenging owing to the complicated characteristics and vague boundaries of landslide areas in imagery. High-resolution remote sensing images are multi-scale, with complex spatial distributions and abundant features; object-oriented image classification methods can make full use of this information and thus effectively detect landslides after the hazard has occurred. In this research we present a new semi-supervised workflow that takes advantage of recent object-oriented image analysis and machine learning algorithms to quickly locate landslides of different origins in several areas of southwestern China. Besides a sequence of image segmentation, feature selection, object classification and error testing, this workflow combines feature selection and classifier selection in an ensemble. The features used in this study include normalized difference vegetation index (NDVI) change, textural features derived from gray-level co-occurrence matrices (GLCM), spectral features, and others. The results show that the algorithm removes redundant features and makes full use of the classifiers. Together, these improvements lead to higher accuracy in determining the shape of landslides in high-resolution remote sensing images, in particular through flexibility across different kinds of landslides.
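One of the listed object features, NDVI change, can be sketched per segmented object. NDVI is (NIR − Red)/(NIR + Red), and a landslide that strips vegetation would typically show a strong negative change. This sketch assumes co-registered pre- and post-event bands and an existing segmentation given as an integer label image. The function names and the small `eps` guard against division by zero are illustrative choices, not the authors' implementation.

```python
import numpy as np


def ndvi(nir, red, eps=1e-9):
    """Normalized difference vegetation index, (NIR - Red) / (NIR + Red)."""
    nir = nir.astype(float)
    red = red.astype(float)
    return (nir - red) / (nir + red + eps)


def object_ndvi_change(nir_pre, red_pre, nir_post, red_post, labels):
    """Mean NDVI change (post - pre) for each object id in the integer label image."""
    d = ndvi(nir_post, red_post) - ndvi(nir_pre, red_pre)
    return {int(i): float(d[labels == i].mean()) for i in np.unique(labels)}
```

Averaging over objects rather than pixels matches the object-oriented framing. Each segment gets one value that can sit alongside its texture and spectral features.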
Umarov, Ramzan Kh; Solovyev, Victor V
2017-01-01
Accurate computational identification of promoters remains a challenge, as these key DNA regulatory regions have variable structures composed of functional motifs that provide gene-specific initiation of transcription. In this paper we utilize Convolutional Neural Networks (CNN) to analyze sequence characteristics of prokaryotic and eukaryotic promoters and build their predictive models. We trained a similar CNN architecture on promoters of five distant organisms: human, mouse, plant (Arabidopsis), and two bacteria (Escherichia coli and Bacillus subtilis). We found that a CNN trained on the sigma70 subclass of Escherichia coli promoters gives an excellent classification of promoter and non-promoter sequences (Sn = 0.90, Sp = 0.96, CC = 0.84). The CNN model for Bacillus subtilis promoter identification achieves Sn = 0.91, Sp = 0.95, and CC = 0.86. For human, mouse and Arabidopsis promoters we employed CNNs for identification of two well-known promoter classes (TATA and non-TATA promoters). The CNN models recognize these complex functional regions well. For human promoters, Sn/Sp/CC of prediction reached 0.95/0.98/0.90 for TATA and 0.90/0.98/0.89 for non-TATA promoter sequences, respectively. For Arabidopsis we observed Sn/Sp/CC of 0.95/0.97/0.91 (TATA promoters) and 0.94/0.94/0.86 (non-TATA promoters). Thus, the developed CNN models, implemented in the CNNProm program, demonstrated the ability of the deep learning approach to grasp complex promoter sequence characteristics and achieve significantly higher accuracy than previously developed promoter prediction programs. We also propose a random substitution procedure to discover positionally conserved promoter functional elements. As the suggested approach does not require knowledge of any specific promoter features, it can be easily extended to identify promoters and other complex functional regions in many other genomes, especially newly sequenced ones. 
The CNNProm program is available to run at web server http://www.softberry.com.
Bystry, Vojtech; Agathangelidis, Andreas; Bikos, Vasilis; Sutton, Lesley Ann; Baliakas, Panagiotis; Hadzidimitriou, Anastasia; Stamatopoulos, Kostas; Darzentas, Nikos
2015-12-01
An ever-increasing body of evidence supports the importance of B cell receptor immunoglobulin (BcR IG) sequence restriction, alias stereotypy, in chronic lymphocytic leukemia (CLL). This phenomenon accounts for ∼30% of studied cases, one in eight of which belong to major subsets, and extends beyond restricted sequence patterns to shared biologic and clinical characteristics and, generally, outcome. Thus, the robust assignment of new cases to major CLL subsets is a critical, and yet unmet, requirement. We introduce a novel application, ARResT/AssignSubsets, which enables the robust assignment of BcR IG sequences from CLL patients to major stereotyped subsets. ARResT/AssignSubsets uniquely combines expert immunogenetic sequence annotation from IMGT/V-QUEST with curation to safeguard quality, statistical modeling of sequence features from more than 7500 CLL patients, and results from multiple perspectives to allow for both objective and subjective assessment. We validated our approach on the learning set, and evaluated its real-world applicability on a new representative dataset comprising 459 sequences from a single institution. ARResT/AssignSubsets is freely available on the web at http://bat.infspire.org/arrest/assignsubsets/. Contact: nikos.darzentas@gmail.com. Supplementary data are available at Bioinformatics online. © The Author 2015. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
Community detection in sequence similarity networks based on attribute clustering
Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.
2017-07-24
Networks are powerful tools for the presentation and analysis of interactions in multi-component systems. A commonly studied mesoscopic feature of networks is their community structure, which arises from grouping similar nodes into one community and dissimilar nodes into separate communities. In this paper, the community structure of protein sequence similarity networks is determined with a new method: Attribute Clustering Dependent Communities (ACDC). Sequence similarity has hitherto typically been quantified by the alignment score or its expectation value. However, pair alignments with the same score or expectation value cannot thus be differentiated. To overcome this deficiency, the method constructs, for pair alignments, an extended alignment metric, the link attribute vector, which includes the score and other alignment characteristics. Rescaling components of the attribute vectors qualitatively identifies a systematic variation of sequence similarity within protein superfamilies. The problem of community detection is then mapped to clustering the link attribute vectors, selection of an optimal subset of links, and community structure refinement based on the partition density of the network. ACDC-predicted communities are found to be in good agreement with gold standard sequence databases for which the "ground truth" community structures (or families) are known. ACDC is therefore a community detection method for sequence similarity networks based entirely on pair similarity information. A serial implementation of ACDC is available from https://cmb.ornl.gov/resources/developments
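The refinement step relies on a partition density. Assuming this is the link-community partition density of Ahn, Bagrow and Lehmann (2010), which the abstract does not state explicitly, it is D = (2/M) Σ_c m_c (m_c − (n_c − 1)) / ((n_c − 2)(n_c − 1)). Here M is the total number of links, and m_c and n_c are the numbers of links and nodes in community c. Communities with only two nodes contribute zero. A minimal sketch follows; it is not ACDC's code.

```python
def partition_density(link_communities):
    """Link-community partition density.

    link_communities: list of communities, each a list of (u, v) edges.
    """
    M = sum(len(c) for c in link_communities)
    total = 0.0
    for edges in link_communities:
        m = len(edges)                               # links in this community
        n = len({x for e in edges for x in e})       # nodes touched by those links
        if n > 2:                                    # two-node communities contribute 0
            total += m * (m - (n - 1)) / ((n - 2) * (n - 1))
    return 2.0 * total / M
```

Each term compares a community's link count with that of a tree (n − 1 links) on the same nodes and normalizes by the maximum possible excess. A fully connected triangle therefore scores 1 and a simple path scores 0, which is why maximizing D favours dense link groups.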
Community detection in sequence similarity networks based on attribute clustering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chowdhary, Janamejaya; Loeffler, Frank E.; Smith, Jeremy C.
Chakraborty, Sandipan; Chatterjee, Barnali; Basu, Soumalee
2012-07-01
A combined approach of sequence analysis, phylogenetic tree construction and in silico prediction of amyloidogenicity using bioinformatics tools has been used to correlate the observed species-specific variations in IAPP sequences with amyloid-forming propensity. Observed substitution patterns indicate that probable changes in local hydrophobicity are instrumental in altering the aggregation propensity of the peptide. In particular, residues at the 17th, 22nd and 23rd positions of the IAPP peptide are found to be crucial for amyloid formation. Proline25 primarily dictates the observed non-amyloidogenicity in rodents. Furthermore, an extensive molecular dynamics simulation of 0.24 μs has been carried out with the human IAPP (hIAPP) fragment 19-27, the portion showing maximum sequence variation across species, to understand the native folding characteristics of this region. Principal component analysis in combination with free energy landscape analysis reveals a four-residue turn spanning residues 22 to 25. The results provide structural insight into the intramolecular β-sheet structure of amylin, which is probably the template for nucleation of fibril formation and growth, a pathogenic feature of type II diabetes. Copyright © 2012 Elsevier B.V. All rights reserved.
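The combination of principal component analysis and a free energy landscape can be sketched as follows. Frames are projected onto the first two principal components, and the two-dimensional histogram is converted to G = −kT ln(P/P_max), so the most populated bin sits at zero. Several assumptions apply: the trajectory has already been fitted to a reference to remove overall rotation and translation, kT corresponds to 300 K in kJ/mol, and 30 bins are used. None of these values come from the paper. Empty bins are returned as infinity.

```python
import numpy as np


def pca_free_energy(coords, bins=30, kT=2.494):
    """coords: (n_frames, n_atoms * 3) fitted Cartesian coordinates.

    Returns PC1/PC2 projections, G = -kT ln(P / Pmax) on a 2-D grid (kJ/mol
    for the default kT at 300 K), and the bin edges.
    """
    x = coords - coords.mean(axis=0)                 # center each coordinate
    _, _, vt = np.linalg.svd(x, full_matrices=False) # rows of vt are principal axes
    proj = x @ vt[:2].T                              # project onto PC1 and PC2
    hist, xe, ye = np.histogram2d(proj[:, 0], proj[:, 1], bins=bins)
    p = hist / hist.sum()
    with np.errstate(divide="ignore"):
        g = -kT * np.log(p / p.max())                # empty bins become +inf
    return proj, g, xe, ye
```

Basins in `g` correspond to frequently visited conformations. In an analysis like the one above, the structures from the deepest basin would be inspected for features such as the turn at residues 22 to 25.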
REDIdb: the RNA editing database.
Picardi, Ernesto; Regina, Teresa Maria Rosaria; Brennicke, Axel; Quagliariello, Carla
2007-01-01
The RNA Editing Database (REDIdb) is an interactive, web-based database designed to collect RNA editing events, such as substitutions, insertions and deletions, occurring in a wide range of organisms. The database contains both fully and partially sequenced DNA molecules for which editing information is available either from experimental inspection (in vitro) or from computational detection (in silico). Each record of REDIdb is organized in a specific flat file containing a description of the main characteristics of the entry, a feature table with the editing events and related details, and a sequence zone with both the genomic sequence and the corresponding edited transcript. REDIdb is a relational database in which browsing and identification of editing sites have been simplified by two facilities that either graphically display genomic or cDNA sequences or show the corresponding alignment. In both cases, all editing sites are highlighted in colour and their relative positions are shown on mouse-over. New editing positions can be submitted directly to REDIdb after user registration, which provides authorized secure access. This first version of the REDIdb database stores 9964 editing events and can be freely queried at http://biologia.unical.it/py_script/search.html.
DOE Office of Scientific and Technical Information (OSTI.GOV)
White, Richard A.; Panyala, Ajay R.; Glass, Kevin A.
MerCat is a parallel, highly scalable and modular property software package for robust analysis of features in next-generation sequencing data. MerCat accepts assembled contigs and raw sequence reads from any platform as input and produces feature abundance count tables. MerCat allows direct analysis of data properties without the dependence on reference sequence databases that search tools such as BLAST and/or DIAMOND commonly require for compositional analysis of whole-community shotgun sequencing data (e.g. metagenomes and metatranscriptomes).
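The abstract does not name the features MerCat counts. The sketch below assumes they are k-mers (fixed-length substrings) and shows one way reads or contigs from several samples could be turned into a feature abundance count table. The function names, forward-strand-only counting and the skipping of ambiguous bases are illustrative choices, not MerCat's interface, and this serial toy ignores the parallelism the abstract describes.

```python
from collections import Counter

VALID = frozenset("ACGT")


def kmer_counts(seqs, k):
    """Count forward-strand k-mers across reads or contigs, skipping windows with non-ACGT characters."""
    counts = Counter()
    for seq in seqs:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if VALID.issuperset(kmer):
                counts[kmer] += 1
    return counts


def abundance_table(samples, k):
    """Feature-by-sample table: {kmer: [count in each sample, in sample order]}."""
    names = list(samples)
    per_sample = [kmer_counts(samples[name], k) for name in names]
    features = sorted(set().union(*per_sample))
    return names, {f: [c[f] for c in per_sample] for f in features}


names, table = abundance_table({"s1": ["ACGTAC"], "s2": ["ACGN", "gta"]}, k=3)
print(names)  # ['s1', 's2']
print(table)  # {'ACG': [1, 1], 'CGT': [1, 0], 'GTA': [1, 1], 'TAC': [1, 0]}
```

No reference database is consulted at any point, which is the property the abstract emphasizes.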
Stargardt disease: clinical features, molecular genetics, animal models and therapeutic options.
Tanna, Preena; Strauss, Rupert W; Fujinami, Kaoru; Michaelides, Michel
2017-01-01
Stargardt disease (STGD1; MIM 248200) is the most prevalent inherited macular dystrophy and is associated with disease-causing sequence variants in the gene ABCA4. Significant advances have been made over the last 10 years in our understanding of both the clinical and molecular features of STGD1 and its underlying pathophysiology, culminating in ongoing and planned human clinical trials of novel therapies. The aims of this review are to describe the detailed phenotypic and genotypic characteristics of the disease, conventional and novel imaging findings, current knowledge of animal models and pathogenesis, and the multiple avenues of intervention being explored. Published by the BMJ Publishing Group Limited.
Yang, Jian-Yi; Peng, Zhen-Ling; Yu, Zu-Guo; Zhang, Rui-Jie; Anh, Vo; Wang, Desheng
2009-04-21
In this paper, we predict protein structural classes (alpha, beta, alpha+beta, or alpha/beta) for low-homology data sets. Two widely used data sets were employed, 1189 (containing 1092 proteins) and 25PDB (containing 1673 proteins), with sequence homology of 40% and 25%, respectively. We propose to decompose the chaos game representation of proteins into two kinds of time series. A nonlinear analysis technique, recurrence quantification analysis (RQA), is then applied to these time series. For a given protein sequence, a total of 16 characteristic parameters can be calculated with RQA, and these are used as the feature representation of the sequence. Based on this feature representation, the structural class of each protein is predicted with Fisher's linear discriminant algorithm. The jackknife test is used to evaluate our method and compare it with other existing methods. The overall accuracies with the step-by-step procedure are 65.8% and 64.2% for the 1189 and 25PDB data sets, respectively. With the widely used one-against-others procedure, we compare our method with five other existing methods; in particular, the overall accuracies of our method are 6.3% and 4.1% higher for the two data sets, respectively. Furthermore, only 16 parameters are used in our method, fewer than are used by other methods. This suggests that the current method may play a complementary role to existing methods and is promising for the prediction of protein structural classes.
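RQA starts from a recurrence matrix that marks which pairs of states in a time series lie within a distance threshold of each other. The 16 parameters mentioned above are summary statistics of that matrix. A minimal sketch of two standard ones, recurrence rate (RR) and determinism (DET), for a one-dimensional series without time-delay embedding follows. The threshold eps, the minimum line length lmin = 2, the exclusion of the main diagonal and the function names are assumptions; the exact parameter set and settings used in the paper are not given here.

```python
def recurrence_matrix(series, eps):
    """R[i][j] = 1 if states i and j are within eps of each other (1-D series, no embedding)."""
    n = len(series)
    return [[1 if abs(series[i] - series[j]) <= eps else 0 for j in range(n)]
            for i in range(n)]


def rqa(series, eps, lmin=2):
    """Recurrence rate (RR) and determinism (DET) from the upper triangle of R.

    The main diagonal (each point compared with itself) is excluded. DET is the
    fraction of recurrent points lying on diagonal lines of length >= lmin.
    """
    r = recurrence_matrix(series, eps)
    n = len(series)
    recurrent = 0
    on_lines = 0
    for d in range(1, n):  # R is symmetric, so the diagonals above the main one suffice
        run = 0
        for i in range(n - d):
            if r[i][i + d]:
                recurrent += 1
                run += 1
            else:
                if run >= lmin:
                    on_lines += run
                run = 0
        if run >= lmin:  # close a line that reaches the matrix edge
            on_lines += run
    pairs = n * (n - 1) // 2
    rr = recurrent / pairs if pairs else 0.0
    det = on_lines / recurrent if recurrent else 0.0
    return rr, det


print(rqa([0, 1, 0, 1, 0, 1], eps=0.1))  # (0.4, 1.0): a periodic signal is fully deterministic
```

Each such statistic becomes one component of the per-protein feature vector that a linear discriminant can then classify.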
Prediction of type III secretion signals in genomes of gram-negative bacteria.
Löwer, Martin; Schneider, Gisbert
2009-06-15
Pathogenic bacteria infecting both animals and plants use various mechanisms to transport virulence factors across their cell membranes and channel these proteins into the infected host cell. The type III secretion system represents such a mechanism. Proteins transported via this pathway ("effector proteins") have to be distinguished from all other proteins that are not exported from the bacterial cell. Although a special targeting signal at the N-terminal end of effector proteins has been proposed in the literature, its exact characteristics remain unknown. In this study, we demonstrate that the signals encoded in the sequences of type III secretion system effectors can be consistently recognized and predicted by machine learning techniques. Known effector proteins were compiled from the literature and sequence databases and served as training data for artificial neural networks and support vector machine classifiers. Common sequence features were most pronounced in the first 30 amino acids of the effector sequences. Classification achieved a cross-validated Matthews correlation coefficient of 0.63 and allowed genome-wide prediction of potential type III secretion system effectors in 705 proteobacterial genomes (12% of proteins predicted as candidates), their chromosomes (11%) and plasmids (13%), as well as in 213 Firmicute genomes (7%). We present a signal prediction method together with a comprehensive survey of potential type III secretion system effectors extracted from 918 published bacterial genomes. Our study demonstrates that the analyzed signal features are common across a wide range of species, and provides a substantial basis for the identification of exported pathogenic proteins as targets for future therapeutic intervention. The prediction software is publicly accessible from our web server (www.modlab.org).
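The Matthews correlation coefficient reported above (0.63) summarizes a binary classifier's confusion matrix in a single number between -1 and 1. A short sketch of the standard definition follows, with effectors encoded as 1 and all other proteins as 0 (an encoding chosen here for illustration); the underlying counts from the study are not given in the abstract.

```python
import math


def confusion_counts(y_true, y_pred):
    """Counts (tp, tn, fp, fn) for binary labels where 1 = effector, 0 = non-effector."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)); 0 if any margin is empty."""
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0


print(matthews_corrcoef(*confusion_counts([1, 1, 0, 0], [1, 0, 0, 0])))  # about 0.577
```

Unlike raw accuracy, MCC stays informative when effectors are a small minority of all proteins, which is the situation in genome-wide screening.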
Predicting turns in proteins with a unified model.
Song, Qi; Li, Tonghua; Cong, Peisheng; Sun, Jiangming; Li, Dapeng; Tang, Shengnan
2012-01-01
Turns are a critical element of protein structure; they play a crucial role in loops, folds, and interactions. Current methods are well developed for predicting individual turn types, such as α-turns, β-turns and γ-turns. However, for further protein structure and function prediction, it is necessary to develop a unified model that can accurately predict all types of turns simultaneously. In this study, we present a novel approach, TurnP, which can investigate all the turns in a protein based on a unified model. The main characteristics of TurnP are: (i) using newly exploited features of structural evolution information (secondary structure and shape string of protein) based on structure homologies, (ii) considering all types of turns in a unified model, and (iii) the practical capability to predict all turns of a query accurately and simultaneously. TurnP uses predicted secondary structures and predicted shape strings, both obtained with high-accuracy methods previously developed by our group. Sequence and structural evolution features, namely sequence profiles, secondary structure profiles and shape string profiles, are then generated by sequence and structure alignment. When TurnP was validated on a non-redundant dataset (4,107 entries) by five-fold cross-validation, it achieved an accuracy of 88.8% and a sensitivity of 71.8%, exceeding most state-of-the-art predictors of individual turn types. Newly determined sequences and the EVA and CASP9 datasets were used as independent tests, and the results confirmed the good performance of TurnP for practical applications. PMID:23144872
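TurnP's performance is reported as per-residue accuracy and sensitivity under five-fold cross-validation. A minimal sketch of how these two numbers are usually defined for a binary turn/non-turn labelling follows, together with a simple round-robin five-fold split. The label alphabet ('T' for turn, anything else for non-turn), the fold assignment and the function names are assumptions; the abstract does not state how the folds were built.

```python
def accuracy_and_sensitivity(true_labels, pred_labels, positive="T"):
    """Per-residue accuracy and sensitivity; `positive` marks a turn, anything else is non-turn."""
    if len(true_labels) != len(pred_labels) or not true_labels:
        raise ValueError("need two non-empty label sequences of equal length")
    pairs = list(zip(true_labels, pred_labels))
    # A residue is correct when prediction and truth agree on turn vs non-turn
    correct = sum((t == positive) == (p == positive) for t, p in pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = correct / len(pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, sensitivity


def k_fold_splits(n_items, k=5):
    """Return k (train_indices, test_indices) pairs; item i goes to fold i % k."""
    folds = [list(range(f, n_items, k)) for f in range(k)]
    return [
        (sorted(i for other, fold in enumerate(folds) if other != f for i in fold), folds[f])
        for f in range(k)
    ]


print(accuracy_and_sensitivity("TTNNN", "TNNNT"))  # (0.6, 0.5)
```

Because turns are much rarer than non-turn residues, a high accuracy can coexist with a lower sensitivity, which is why both figures are quoted.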
Liu, Laura; Chen, Ho-Min; Tsai, Shawn; Chang, Tsong-Chi; Tsai, Tzu-Hsun; Yang, Chung-May; Chao, An-Ning; Chen, Kuan-Jen; Kao, Ling-Yuh; Yeung, Ling; Yeh, Lung-Kun; Hwang, Yih-Shiou; Wu, Wei-Chi; Lai, Chi-Chun
2015-01-01
Purpose To investigate the clinical characteristics of X-linked retinoschisis (XLRS) and identify genetic mutations in Taiwanese patients with XLRS. Methods This study included 23 affected males from 16 families with XLRS. Fundus photography, spectral domain optical coherence tomography (SD-OCT), fundus autofluorescence (FAF), and full-field electroretinography (ERG) were performed. The coding regions of the RS1 gene, which encodes retinoschisin, were sequenced. Results The median age at diagnosis was 18 years (range 4–58 years). Best-corrected visual acuity ranged from no light perception to 20/25. The typical spoke-wheel pattern in the macula was present in 61% of the patients (14/23), while peripheral retinoschisis was present in 43% (10/23). Four eyes presented with vitreous hemorrhage, and two eyes presented with leukocoria mimicking Coats’ disease. Macular schisis was identified with SD-OCT in 82% of the eyes (31/38), while foveal atrophy was present in 18% (7/38). A concentric area of high intensity was the most common FAF abnormality. Seven of 12 patients (58%) showed electronegative ERG findings. Sequencing of the RS1 gene identified nine mutations, six of which were novel. All mutations are located in exons 4–6 and comprise six missense mutations, two nonsense mutations, and one frameshift mutation caused by a deletion. Conclusions XLRS is a clinically heterogeneous disease with profound inter- and intrafamilial phenotypic variability. Genetic sequencing is valuable because it allows a definite diagnosis of XLRS even in the absence of the classical clinical features and ERG findings. This study showed the variety of clinical features of XLRS and reported novel mutations. PMID:25999676
Schwegmann, Alexander; Lindemann, Jens Peter; Egelhaaf, Martin
2014-01-01
Many flying insects, such as flies, wasps and bees, pursue a saccadic flight and gaze strategy. This behavioral strategy is thought to separate the translational and rotational components of self-motion and thereby reduce the computational effort needed to extract information about the environment from the retinal image flow. Because of the distinctive dynamic features of this active flight and gaze strategy, the present study systematically analyzes the spatiotemporal statistics of image sequences generated during saccades and intersaccadic intervals in cluttered natural environments. We show that, in general, rotational movements with saccade-like dynamics elicit fluctuations and overall changes in brightness, contrast and spatial frequency up to two orders of magnitude larger than those elicited by translational movements at velocities characteristic of insects. Distinct changes in image parameters during translations are caused only by nearby objects. Image analysis based on larger patches in the visual field reveals smaller fluctuations in brightness and spatial frequency composition than analysis based on small patches. The temporal structure and extent of these changes in image parameters define the temporal constraints imposed on signal processing by the insect visual system under behavioral conditions in natural environments. PMID:25340761
Robust k-mer frequency estimation using gapped k-mers
Ghandi, Mahmoud; Mohammad-Noori, Morteza
2013-01-01
Oligomers of fixed length, k, commonly known as k-mers, are often used as fundamental elements in the description of DNA sequence features of diverse biological function, or as intermediate elements in the construction of more complex descriptors of sequence features such as position weight matrices. k-mers are very useful as general sequence features because they constitute a complete and unbiased feature set, and do not require parameterization based on incomplete knowledge of biological mechanisms. However, a fundamental limitation in the use of k-mers as sequence features is that as k is increased, larger spatial correlations in DNA sequence elements can be described, but the frequency of observing any specific k-mer becomes very small, and rapidly approaches a sparse matrix of binary counts. Thus any statistical learning approach using k-mers will be susceptible to noisy estimation of k-mer frequencies once k becomes large. Because all molecular DNA interactions have limited spatial extent, gapped k-mers often carry the relevant biological signal. Here we use gapped k-mer counts to more robustly estimate the ungapped k-mer frequencies, by deriving an equation for the minimum norm estimate of k-mer frequencies given an observed set of gapped k-mer frequencies. We demonstrate that this approach provides a more accurate estimate of the k-mer frequencies in real biological sequences using a sample of CTCF binding sites in the human genome. PMID:23861010
Robust k-mer frequency estimation using gapped k-mers.
Ghandi, Mahmoud; Mohammad-Noori, Morteza; Beer, Michael A
2014-08-01
Oligomers of fixed length, k, commonly known as k-mers, are often used as fundamental elements in the description of DNA sequence features of diverse biological function, or as intermediate elements in the construction of more complex descriptors of sequence features such as position weight matrices. k-mers are very useful as general sequence features because they constitute a complete and unbiased feature set, and do not require parameterization based on incomplete knowledge of biological mechanisms. However, a fundamental limitation in the use of k-mers as sequence features is that as k is increased, larger spatial correlations in DNA sequence elements can be described, but the frequency of observing any specific k-mer becomes very small, and rapidly approaches a sparse matrix of binary counts. Thus any statistical learning approach using k-mers will be susceptible to noisy estimation of k-mer frequencies once k becomes large. Because all molecular DNA interactions have limited spatial extent, gapped k-mers often carry the relevant biological signal. Here we use gapped k-mer counts to more robustly estimate the ungapped k-mer frequencies, by deriving an equation for the minimum norm estimate of k-mer frequencies given an observed set of gapped k-mer frequencies. We demonstrate that this approach provides a more accurate estimate of the k-mer frequencies in real biological sequences using a sample of CTCF binding sites in the human genome.
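The relationship between gapped and ungapped counts can be illustrated with a small sketch. Here l is the full word length and k the number of informative (non-gap) positions; each gapped k-mer count is a sum of ungapped l-mer counts, so the gapped counts g satisfy A f = g for a 0/1 matrix A and the ungapped counts f. The sketch finds the minimum-norm f numerically with Kaczmarz iterations started from zero; this is a stand-in for, not a reproduction of, the closed-form equation derived by the authors. The (mask, letters) encoding of gapped k-mers and the function names are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations, product

ALPHABET = "ACGT"


def gapped_kmer_counts(seq, l, k):
    """Counts of gapped k-mers keyed by (mask, letters).

    Each length-l window contributes one count for every choice of k
    informative positions (the mask); the other l - k positions are gaps.
    """
    counts = Counter()
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        if any(c not in ALPHABET for c in window):
            continue
        for mask in combinations(range(l), k):
            counts[(mask, "".join(window[p] for p in mask))] += 1
    return counts


def min_norm_lmer_estimate(gapped, l, k, sweeps=20):
    """Minimum-norm l-mer counts f consistent with the gapped counts g (A f = g).

    Kaczmarz iterations started from f = 0 stay in the row space of A and, for
    a consistent system, converge to its minimum-norm solution.
    """
    words = ["".join(w) for w in product(ALPHABET, repeat=l)]
    # Each gapped k-mer is one row of A: the set of l-mers that match it.
    rows = {}
    for mask in combinations(range(l), k):
        for w in words:
            rows.setdefault((mask, "".join(w[p] for p in mask)), []).append(w)
    f = dict.fromkeys(words, 0.0)
    for _ in range(sweeps):
        for key, members in rows.items():
            # Project onto the hyperplane sum(f[w] for w in members) == g[key]
            step = (gapped.get(key, 0) - sum(f[w] for w in members)) / len(members)
            for w in members:
                f[w] += step
    return f


seq = "ACGTTGCAACGTAGCT"
est = min_norm_lmer_estimate(gapped_kmer_counts(seq, l=3, k=2), l=3, k=2)
print(round(sum(est.values()), 6))  # 14.0, the number of 3-mer windows in seq
```

Because every window matches exactly one gapped k-mer per mask, the estimate preserves the total number of windows, even though individual l-mer estimates are smoothed and may be fractional.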
NASA Astrophysics Data System (ADS)
Tu, Shiqi; Yuan, Guo-Cheng; Shao, Zhen
2017-01-01
Recently, long non-coding RNAs (lncRNAs) have emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape the epigenetic landscape through interactions with chromatin-modifying proteins. However, the mechanisms contributing to the specificity of such interactions remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to interact physically with Polycomb repressive complex 2 (PRC2) and systematically investigated their sequence features by developing a new computational pipeline for sequence composition analysis, in which each sequence is considered as a series of transitions between adjacent nucleotides. Using this pipeline, we found PRC2-binding lncRNAs to be associated with a set of distinctive and evolutionarily conserved sequence features, which can be used to distinguish them from other lncRNAs with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched in these sequence features, and found that they show strong PRC2-binding signals and are more highly conserved across species than the other parts, implying their functional importance.
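The pipeline above treats each sequence as a series of transitions between adjacent nucleotides. One common way to summarize such transitions, sketched here as an assumption rather than as the authors' pipeline, is a 4 x 4 first-order transition probability matrix whose rows give P(next nucleotide | current nucleotide). The function name, the U-to-T mapping and the skipping of ambiguous characters are illustrative choices.

```python
def transition_matrix(seq, alphabet="ACGT"):
    """First-order transition probabilities P(next = b | current = a).

    U is mapped to T so RNA and DNA spellings give the same result; pairs that
    involve other characters (e.g. N) are ignored. Rows with no observed
    transitions are all zeros.
    """
    seq = seq.upper().replace("U", "T")
    index = {c: i for i, c in enumerate(alphabet)}
    counts = [[0] * len(alphabet) for _ in alphabet]
    for a, b in zip(seq, seq[1:]):
        if a in index and b in index:
            counts[index[a]][index[b]] += 1
    probs = []
    for row in counts:
        total = sum(row)
        probs.append([c / total if total else 0.0 for c in row])
    return probs


p = transition_matrix("AACGU")
print(p[0])  # row for A: [0.5, 0.5, 0.0, 0.0]
```

Flattening the 16 entries gives a fixed-length vector per lncRNA, which is the kind of representation a classifier could use to separate PRC2-binding from other lncRNAs.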
[Biophysics of single molecules].
Serdiuk, I N; Deriusheva, E I
2011-01-01
Modern methods for studying biological molecules, whose application has led to the development of a new field of science, single-molecule biophysics, are reviewed. Measuring the characteristics of single molecules reveals their individual features, and for this reason much more information can be obtained from one molecule than from an entire ensemble of molecules. The high sensitivity of the methods considered in detail makes it possible to approach a solution to a basic problem of practical importance, namely, determining the nucleotide sequence of a single DNA molecule.
Bioinformatics and expressional analysis of cDNA clones from floral buds
NASA Astrophysics Data System (ADS)
Pawełkowicz, Magdalena Ewa; Skarzyńska, Agnieszka; Cebula, Justyna; Hincha, Dirck; Ziębska, Karolina; Pląder, Wojciech; Przybecki, Zbigniew
2017-08-01
The application of genomic approaches may serve as an initial step in understanding the complexity of the biochemical networks and cellular processes responsible for regulating and executing many developmental tasks. The molecular mechanism of sex expression in cucumber has still not been elucidated. A differential expression study was conducted to identify genes involved in sex determination and floral organ morphogenesis. Herein, we present expressed sequence tags (ESTs) obtained by differential hybridization (DH) and a subtraction technique (cDNA-DSC), together with their characteristic features, such as molecular function, involvement in biological processes, expression and mapping position on the genome.
Mutational Dynamics of Aroid Chloroplast Genomes
Ahmed, Ibrar; Biggs, Patrick J.; Matthews, Peter J.; Collins, Lesley J.; Hendy, Michael D.; Lockhart, Peter J.
2012-01-01
A characteristic feature of eukaryote and prokaryote genomes is the co-occurrence of nucleotide substitution and insertion/deletion (indel) mutations. Although similar observations have also been made for chloroplast DNA, genome-wide associations have not been reported. We determined the chloroplast genome sequences for two morphotypes of taro (Colocasia esculenta; family Araceae) and compared these with four publicly available aroid chloroplast genomes. Here, we report the extent of genome-wide association between direct and inverted repeats, indels, and substitutions in these aroid chloroplast genomes. We suggest that alternative but not mutually exclusive hypotheses explain the mutational dynamics of chloroplast genome evolution. PMID:23204304
Below, Jennifer E.; Earl, Dawn L.; Shively, Kathryn M.; McMillin, Margaret J.; Smith, Joshua D.; Turner, Emily H.; Stephan, Mark J.; Al-Gazali, Lihadh I.; Hertecant, Jozef L.; Chitayat, David; Unger, Sheila; Cohn, Daniel H.; Krakow, Deborah; Swanson, James M.; Faustman, Elaine M.; Shendure, Jay; Nickerson, Deborah A.; Bamshad, Michael J.
2013-01-01
Opsismodysplasia is a rare, autosomal-recessive skeletal dysplasia characterized by short stature, characteristic facial features, and in some cases severe renal phosphate wasting. We used linkage analysis and whole-genome sequencing of a consanguineous trio to discover that mutations in inositol polyphosphate phosphatase-like 1 (INPPL1) cause opsismodysplasia with or without renal phosphate wasting. Evaluation of 12 families with opsismodysplasia revealed that INPPL1 mutations explain ∼60% of cases overall, including both of the families in our cohort with more than one affected child and 50% of the simplex cases. PMID:23273567
Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs
2014-01-01
Background We introduce Sequence Bundles, a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features that would otherwise remain hidden. Methods For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a two-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicate the level of conservation, and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results We have developed a software demonstration application for generating Sequence Bundles visualisations of the MSAs provided for the BioVis 2013 redesign contest. Subsequent exploration of the visualised line patterns allowed the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions Sequence Bundles is a novel method for visualisation of MSAs and the discovery of sequence motifs. It can aid in generating new insights and hypotheses. Sequence Bundles is well suited to future implementation as interactive visual analytics software, which can complement existing visualisation tools. PMID:25237395
Evolutionary genomics and HIV restriction factors.
Pyndiah, Nitisha; Telenti, Amalio; Rausell, Antonio
2015-03-01
To provide updated insights into innate antiviral immunity and highlight prototypical evolutionary features of well-characterized HIV restriction factors. Recently, a new HIV restriction factor, Myxovirus resistance 2, has been discovered, and the region/residue responsible for its activity has been identified using an evolutionary approach. Furthermore, IFI16, an innate immunity protein known to sense several viruses, has been shown to contribute to the defense against HIV-1 by causing cell death upon sensing HIV-1 DNA. Restriction factors against HIV show characteristic signatures of positive selection. Different patterns of accelerated sequence evolution can distinguish antiviral strategies (offense or defense) as well as the level of specificity of the antiviral properties. Sequence analysis of primate orthologs of restriction factors serves to localize functional domains and sites responsible for antiviral action. We use recent discoveries to illustrate how evolutionary genomic analyses help identify new antiviral genes and their mechanisms of action.
Coherent manipulation of non-thermal spin order in optical nuclear polarization experiments
NASA Astrophysics Data System (ADS)
Buntkowsky, Gerd; Ivanov, Konstantin L.; Zimmermann, Herbert; Vieth, Hans-Martin
2017-03-01
Time resolved measurements of Optical Nuclear Polarization (ONP) have been performed on hyperpolarized triplet states in molecular crystals created by light excitation. Transfer of the initial electron polarization to nuclear spins has been studied in the presence of radiofrequency excitation; the experiments have been performed with different pulse sequences using different doped molecular systems. The experimental results clearly demonstrate the dominant role of coherent mechanisms of spin order transfer, which manifest themselves in well pronounced oscillations. These oscillations are of two types, precessions and nutations, having characteristic frequencies, which are the same for the different molecular systems and the pulse sequences applied. Hence, precessions and nutations constitute a general feature of polarization transfer in ONP experiments. In general, coherent manipulation of spin order transfer creates a powerful resource for improving the performance of the ONP method, which paves the way to strong signal enhancement in nuclear magnetic resonance.
Expanded complexity of unstable repeat diseases
Polak, Urszula; McIvor, Elizabeth; Dent, Sharon Y.R.; Wells, Robert D.; Napierala, Marek
2015-01-01
Unstable Repeat Diseases (URDs) share a common mutational phenomenon of changes in the copy number of short, tandemly repeated DNA sequences. More than 20 human neurological diseases are caused by instability, predominantly expansion, of microsatellite sequences. Changes in repeat size initiate a cascade of pathological processes, frequently characteristic of a single disease or a small subgroup of URDs. Understanding both the mechanism of repeat instability and the molecular consequences of repeat expansions is critical to developing successful therapies for these diseases. Recent technological breakthroughs in whole-genome, transcriptome and proteome analyses will almost certainly lead to new discoveries regarding the mechanisms of repeat instability and the pathogenesis of URDs, and will facilitate the development of novel therapeutic approaches. The aim of this review is to give a general overview of unstable repeat diseases, highlight their complexities, and present emerging discoveries in the field. PMID:23233240
Kobayashi, Takehito; Yagi, Yusuke; Nakamura, Takahiro
2016-01-01
The pentatricopeptide repeat (PPR) motif is a sequence-specific RNA/DNA-binding module. Elucidation of the RNA/DNA recognition mechanism has enabled engineering of PPR motifs as new RNA/DNA manipulation tools in living cells, including for genome editing. However, the biochemical characteristics of PPR proteins remain unknown, mostly due to the instability and/or unfolding propensities of PPR proteins in heterologous expression systems such as bacteria and yeast. To overcome this issue, we constructed reporter systems using cultured animal cells. The cell-based system has highly attractive features for PPR engineering: robust eukaryotic gene expression; availability of various vectors, reagents, and antibodies; highly efficient DNA delivery (>80%); and rapid, high-throughput data production. In this chapter, we introduce an example of such reporter systems: a PPR-based sequence-specific translational activation system. The cell-based reporter system can be applied to characterizing plant genes of interest and to PPR engineering.
Bouallaga, I; Massicard, S; Yaniv, M; Thierry, F
2000-11-01
Recent studies have reported new mechanisms that mediate the transcriptional synergy of strong tissue-specific enhancers, involving the cooperative assembly of higher-order nucleoprotein complexes called enhanceosomes. Here we show that the HPV18 enhancer, which controls the epithelial-specific transcription of the E6 and E7 transforming genes, exhibits characteristic features of these structures. We used deletion experiments to show that a core enhancer element cooperates, in a specific helical phasing, with distant essential factors binding to the ends of the enhancer. This core sequence, binding a Jun B/Fra-2 heterodimer, cooperatively recruits the architectural protein HMG-I(Y) in a nucleoprotein complex, where they interact with each other. Therefore, in HeLa cells, HPV18 transcription seems to depend upon the assembly of an enhanceosome containing multiple cellular factors recruited by a core sequence interacting with AP1 and HMG-I(Y).
Complete genome sequence of Conexibacter woesei type strain (ID131577T)
Pukall, Rüdiger; Lapidus, Alla; Glavina Del Rio, Tijana; Copeland, Alex; Tice, Hope; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Mavromatis, Konstantinos; Ivanova, Natalia; Ovchinnikova, Galina; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia D.; Chain, Patrick; Meincke, Linda; Sims, David; Brettin, Thomas; Detter, John C.; Rohde, Manfred; Göker, Markus; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Kyrpides, Nikos C.; Klenk, Hans-Peter; Hugenholtz, Philip
2010-01-01
The genus Conexibacter (Monciardini et al. 2003) represents the type genus of the family Conexibacteraceae (Stackebrandt 2005, emend. Zhi et al. 2009) with Conexibacter woesei as the type species of the genus. C. woesei is a representative of a deep evolutionary line of descent within the class Actinobacteria. Strain ID131577T was originally isolated from temperate forest soil in Gerenzano (Italy). Cells are small, short rods that are motile by peritrichous flagella. They may form aggregates after a longer period of growth and then, as a typical characteristic, an undulate structure is formed by self-aggregation of flagella with entangled bacterial cells. Here we describe the features of the organism, together with the complete genome sequence and annotation. The 6,359,369 bp long genome of C. woesei contains 5,950 protein-coding and 48 RNA genes and is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304704
Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao
2012-05-01
Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machinery of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information; these features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. The resulting 67-dimensional feature vectors are then fed to a support vector machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least be a useful supplementary tool to existing methods.
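The 930 features are not listed in the abstract. One common way to encode "interactions between residues a certain distance apart" is an auto covariance of a physicochemical property over sequence lags, and a sketch of that idea follows. The Kyte-Doolittle hydropathy scale is used only as an example property, the maximum lag is arbitrary, and concatenating the two proteins' descriptors is an assumed pair encoding; the sketch handles only the 20 standard residues, and the PCA and SVM steps are not shown.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(seq, scale, max_lag):
    """AC(lag) = mean over i of (P_i - mean)(P_{i+lag} - mean), for lag = 1..max_lag.

    P_i is the property value of residue i; lags not shorter than the sequence give 0.0.
    """
    values = [scale[residue] for residue in seq]
    n = len(values)
    mean = sum(values) / n
    features = []
    for lag in range(1, max_lag + 1):
        if lag >= n:
            features.append(0.0)
            continue
        s = sum((values[i] - mean) * (values[i + lag] - mean) for i in range(n - lag))
        features.append(s / (n - lag))
    return features


def pair_features(seq_a, seq_b, scale, max_lag):
    """Concatenate both proteins' descriptors into one vector for a pair classifier."""
    return auto_covariance(seq_a, scale, max_lag) + auto_covariance(seq_b, scale, max_lag)


# Usage with a toy two-letter property scale: alternating values anticorrelate at lag 1
print(auto_covariance("ABAB", {"A": 1.0, "B": -1.0}, 2))  # [-1.0, 1.0]
```

With several properties and lags, such descriptors quickly reach hundreds of dimensions, which is why a reduction step such as PCA before the SVM is useful.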
Goyal, K; Browne, J A; Burnell, A M; Tunnacliffe, A
2005-06-01
Accumulation of the non-reducing disaccharide trehalose is associated with desiccation tolerance during anhydrobiosis in a number of invertebrates, but there is little information on trehalose biosynthetic genes in these organisms. We have identified two trehalose-6-phosphate synthase (tps) genes in the anhydrobiotic nematode Aphelenchus avenae and determined full-length cDNA sequences for both; for comparison, full-length tps cDNAs from the model nematode Caenorhabditis elegans have also been obtained. The A. avenae genes encode very similar proteins containing the catalytic domain characteristic of the GT-20 family of glycosyltransferases and are most similar to tps-2 of C. elegans; no evidence was found for a gene in A. avenae corresponding to Ce-tps-1. Analysis of A. avenae tps cDNAs revealed several features of interest, including alternative trans-splicing of spliced leader sequences in Aav-tps-1, and four novel SL1-related trans-spliced leaders that differ from the canonical SL1 sequence found in all other nematodes studied. The latter observation suggests that A. avenae does not comply with the strict evolutionary conservation of SL1 sequences observed in other species. Unusual features were also noted in predicted nematode TPS proteins, which distinguish them from homologues in other higher eukaryotes (plants and insects) and in micro-organisms. Phylogenetic analysis confirmed their membership of the GT-20 glycosyltransferase family, but indicated an accelerated rate of molecular evolution. Furthermore, nematode TPS proteins possess N- and C-terminal domains that are unrelated to those of other eukaryotes: nematode C-terminal domains, for example, do not contain trehalose-6-phosphate phosphatase-like sequences, as seen in plant and insect homologues. During the onset of anhydrobiosis, both tps genes in A. avenae are upregulated, but exposure to cold or increased osmolarity also results in gene induction, although to a lesser extent. 
Trehalose seems likely therefore to play a role in a number of stress responses in nematodes.
Chromosomal features of Escherichia coli serotype O2:K2, an avian pathogenic E. coli.
Jørgensen, Steffen L; Kudirkiene, Egle; Li, Lili; Christensen, Jens P; Olsen, John E; Nolan, Lisa; Olsen, Rikke H
2017-01-01
Escherichia coli causing infections outside the gastrointestinal system are referred to as extra-intestinal pathogenic E. coli. Avian pathogenic E. coli is a subgroup of extra-intestinal pathogenic E. coli, and infections due to avian pathogenic E. coli have a major impact on poultry production economy and welfare worldwide. An almost defining characteristic of avian pathogenic E. coli is the carriage of plasmids, which may encode virulence factors and antibiotic resistance determinants. For this reason, plasmids of avian pathogenic E. coli have been intensively studied. However, genes encoded by the chromosome may also be important for disease manifestation and antimicrobial resistance. For the E. coli strain APEC_O2, the plasmids have been sequenced and analyzed in several studies, and E. coli APEC_O2 may therefore serve as a reference strain in future studies. Here we describe the chromosomal features of E. coli APEC_O2. E. coli APEC_O2 belongs to sequence type ST135 and has a chromosome of 4,908,820 bp (plasmid removed), comprising 4672 protein-coding genes, 110 RNA genes, and 156 pseudogenes, with an average G + C content of 50.69%. We identified 82 insertion sequences, 12 predicted genomic islands, three prophage-related sequences, and two clustered regularly interspaced short palindromic repeat (CRISPR) regions on the chromosome, suggesting the possible occurrence of horizontal gene transfer in this strain. The wild-type strain of E. coli APEC_O2 is resistant to multiple antimicrobials; however, no (complete) antibiotic resistance genes were present on the chromosome, although a number of genes associated with extra-intestinal disease were identified. Together, the information provided here on E. coli APEC_O2 will assist future studies of avian pathogenic E. coli strains, in particular those involving E. coli APEC_O2, and aid the general understanding of the pathogenesis of avian pathogenic E. coli.
Robust sensorimotor representation to physical interaction changes in humanoid motion learning.
Shimizu, Toshihiko; Saegusa, Ryo; Ikemoto, Shuhei; Ishiguro, Hiroshi; Metta, Giorgio
2015-05-01
This paper proposes a learning-from-demonstration system based on a motion feature called the phase transfer sequence. The system aims to synthesize the knowledge of humanoid whole-body motions learned during teacher-supported interactions, and to apply this knowledge during different physical interactions between a robot and its surroundings. The phase transfer sequence represents the temporal order of the changing points in multiple time sequences. It encodes the dynamical aspects of the sequences so as to absorb the gaps in timing and amplitude caused by interaction changes. The phase transfer sequence was evaluated in reinforcement learning of sitting-up and walking motions performed by a real humanoid robot and a compatible simulator. In both tasks, the robotic motions were less dependent on physical interactions when learned with the proposed feature than with conventional similarity measures. The phase transfer sequence also increased the convergence speed of motion learning. Our proposed feature is original primarily because it absorbs the gaps caused by changes in the originally acquired physical interactions, thereby enhancing the learning speed in subsequent interactions.
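The abstract does not specify how changing points are detected. The sketch below assumes, purely for illustration, that a changing point is a local extremum (a sign change in the first difference) and then orders the channel labels by the time of those events. Because only the order is kept, uniform amplitude changes, or timing shifts that preserve the order, leave the result unchanged. The function names and the joint labels are hypothetical.

```python
def change_points(signal):
    """Indices where the first difference changes sign (local extrema).

    Plateaus (zero differences) are ignored in this sketch.
    """
    points = []
    for t in range(1, len(signal) - 1):
        before = signal[t] - signal[t - 1]
        after = signal[t + 1] - signal[t]
        if before * after < 0:
            points.append(t)
    return points


def phase_transfer_sequence(channels):
    """Channel labels ordered by the time of their change points."""
    events = []
    for name, signal in channels.items():
        for t in change_points(signal):
            events.append((t, name))
    events.sort()  # by time; ties broken alphabetically by label
    return [name for _, name in events]


joints = {"hip": [0, 1, 2, 1, 0], "knee": [0, 1, 0, 1, 2]}
print(phase_transfer_sequence(joints))  # ['knee', 'hip', 'knee']
```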
WebLogo: A Sequence Logo Generator
Crooks, Gavin E.; Hon, Gary; Chandonia, John-Marc; Brenner, Steven E.
2004-01-01
WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization. PMID:15173120
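The stack-height rule described above can be sketched as follows: the height of a stack is the maximum possible entropy, log2 of the alphabet size (4 for nucleotides, 20 for amino acids), minus the Shannon entropy of the column, and each letter's height is its relative frequency times that stack height. WebLogo itself also corrects for small sample sizes, which this minimal sketch omits; the function names are illustrative and not part of WebLogo's API.

```python
import math
from collections import Counter


def column_conservation(column, alphabet_size=4):
    """Stack height in bits: log2(alphabet_size) minus the column's Shannon entropy."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(alphabet_size) - entropy


def letter_heights(column, alphabet_size=4):
    """Height of each symbol: its relative frequency times the stack height."""
    counts = Counter(column)
    total = len(column)
    stack = column_conservation(column, alphabet_size)
    return {symbol: (n / total) * stack for symbol, n in counts.items()}


print(column_conservation("AAAA"))  # 2.0 (fully conserved DNA column)
print(letter_heights("AACC"))       # {'A': 0.5, 'C': 0.5}
```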
Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics
Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas
2014-01-01
Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics are to store and manage biological data and to develop and analyze computational tools that enhance their understanding. The size of the data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for experimental methods. To reduce the gap between newly sequenced proteins and proteins with known functions, many computational techniques involving classification and clustering algorithms have been proposed. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of the large number of newly discovered proteins. Existing classification results are unsatisfactory because of the very large number of features produced by various feature encoding methods. In this work, a statistical metric-based feature selection technique is proposed to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727
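The abstract does not name the statistical metric used. As a hedged illustration of metric-based feature selection in general, the sketch below ranks features by a two-class Fisher score, (difference of class means)² divided by the sum of class variances, and keeps the top k. The Fisher score is a swapped-in example metric, not necessarily the one in the paper, and the helper names are hypothetical.

```python
from statistics import mean, pvariance


def fisher_scores(X, y):
    """Two-class Fisher score for each feature column of X."""
    labels = sorted(set(y))
    if len(labels) != 2:
        raise ValueError("this sketch handles exactly two classes")
    first = [row for row, lab in zip(X, y) if lab == labels[0]]
    second = [row for row, lab in zip(X, y) if lab == labels[1]]
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row in first]
        b = [row[j] for row in second]
        spread = pvariance(a) + pvariance(b)
        separation = (mean(a) - mean(b)) ** 2
        if spread > 0:
            scores.append(separation / spread)
        else:
            # No within-class spread: perfectly separating if the means differ.
            scores.append(float("inf") if separation > 0 else 0.0)
    return scores


def select_top_k(X, y, k):
    """Indices of the k best-scoring features and X restricted to them."""
    scores = fisher_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]
```

For example, `select_top_k(X, y, 2)` on a small labelled matrix returns the column indices of the two most class-separating features, which would then feed a classifier.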
Musumeci, Matias A.; Lozada, Mariana; Rial, Daniela V.; ...
2017-04-09
The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemoselectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. As a result, this work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.
Musumeci, Matías A.; Lozada, Mariana; Rial, Daniela V.; Mac Cormack, Walter P.; Jansson, Janet K.; Sjöling, Sara; Carroll, JoLynn; Dionisi, Hebe M.
2017-01-01
The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer–Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. This work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments. PMID:28397770
Koppelhus, U; Tranebjaerg, L; Esberg, G; Ramsing, M; Lodahl, M; Rendtorff, N D; Olesen, H V; Sommerlund, M
2011-03-01
Keratitis-ichthyosis-deafness (KID) syndrome is a rare congenital ectodermal disorder caused by heterozygous missense mutations in GJB2, which encodes the gap junction protein connexin 26. The commonest mutation is p.Asp50Asn, and only a few other mutations have been described to date. We aimed to report the fatal clinical course and to characterize the genetic background of a premature male neonate with the clinical and histological features of KID syndrome. Genomic DNA was extracted from peripheral blood and used for PCR amplification of the GJB2 gene. Direct sequencing was used for mutation analysis. The clinical features included hearing impairment, ichthyosiform erythroderma with hyperkeratotic plaques, palmoplantar keratoderma, alopecia of the scalp and eyelashes, and a thick vernix caseosa-like covering of the scalp. On histological analysis, features characteristic of KID syndrome, such as acanthosis and papillomatosis of the epidermis with basket-weave hyperkeratosis, were seen. The skin symptoms were treated successfully with acitretin 0.5 mg/kg. The boy developed intraventricular and intracerebral haemorrhage, leading to hydrocephalus. His condition was further complicated by septicaemia and meningitis caused by infection with extended-spectrum beta-lactamase-producing Klebsiella pneumoniae. Severe respiratory failure followed, and the child died at 46 weeks of gestational age (13 weeks postnatally). Sequencing of the GJB2 gene showed that the child was heterozygous for a novel nucleotide change, c.263C>T, in exon 2, leading to the substitution of valine for alanine at position 88 (p.Ala88Val). This study has identified a new heterozygous de novo mutation in the Cx26 gene (c.263C>T; p.Ala88Val) leading to KID syndrome. © The Author(s). CED © 2010 British Association of Dermatologists.
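The link between c.263C>T and p.Ala88Val is simple codon arithmetic. In HGVS coding-DNA numbering, c.1 is the A of the ATG start codon, so position 263 falls in codon ⌈263/3⌉ = 88, at its second base. Every alanine codon has the form GCN, so a C>T change at the second base gives GTN, which always encodes valine. The sketch below checks this for all four alanine codons; the actual reference codon at position 88 is not stated in the abstract, and the helper names are illustrative.

```python
ALANINE_CODONS = {"GCT", "GCC", "GCA", "GCG"}
VALINE_CODONS = {"GTT", "GTC", "GTA", "GTG"}


def locate_cdna_position(position):
    """Map a c. coding position (c.1 = A of ATG) to (codon number, position in codon)."""
    codon = (position - 1) // 3 + 1
    position_in_codon = (position - 1) % 3 + 1
    return codon, position_in_codon


def substitute(codon, position_in_codon, ref, alt):
    """Apply a single-nucleotide substitution ref>alt inside one codon."""
    i = position_in_codon - 1
    if codon[i] != ref:
        raise ValueError(f"expected {ref} at codon position {position_in_codon}")
    return codon[:i] + alt + codon[i + 1:]


codon_number, pos = locate_cdna_position(263)
print(codon_number, pos)  # 88 2
# Every alanine codon (GCN) has C at position 2; C>T there gives GTN, a valine codon.
print(all(substitute(c, pos, "C", "T") in VALINE_CODONS for c in ALANINE_CODONS))  # True
```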
Kim, Yoonjung; Lee, Myeongsang; Choi, Hyunsung; Baek, Inchul; Kim, Jae In; Na, Sungsoo
2018-04-01
Silk materials are receiving significant attention as base materials for various functional nanomaterials and nanodevices, owing to their exceptional mechanical properties, biocompatibility, and degradability. Although crystalline silk regions are composed of various repetitive motifs with differing amino acid sequences, it remains unclear how humidity affects each of these motifs and their structural characteristics. We report molecular dynamics (MD) simulations of silkworm fibroins composed of the major motifs (i.e., (GAGAGS)n, (GAGAGA)n, and (GAGAGY)n) at varying degrees of hydration. Using MD simulations, we reveal how each major motif of silk fibroin changes at each degree of hydration, and using steered molecular dynamics simulations, we examine their structural properties from a mechanical perspective. Our results explain what effects humidity can have on nanoscale materials and devices consisting of crystalline silk materials.
Illeghems, Koen; De Vuyst, Luc; Weckx, Stefan
2013-08-01
Acetobacter pasteurianus 386B, an acetic acid bacterium originating from a spontaneous cocoa bean heap fermentation, proved to be an ideal functional starter culture for cocoa bean fermentations. It is able to dominate the fermentation process, thereby resisting high acetic acid concentrations and temperatures. However, the molecular mechanisms underlying its metabolic capabilities and niche adaptations are unknown. In this study, whole-genome sequencing and comparative genome analysis were used to investigate this strain's mechanisms for dominating the cocoa bean fermentation process. The genome sequence of A. pasteurianus 386B is composed of a 2.8-Mb chromosome and seven plasmids. The annotation of 2875 protein-coding sequences revealed important characteristics, including several metabolic pathways, the occurrence of strain-specific genes such as an endopolygalacturonase, and the presence of mechanisms involved in tolerance towards various stress conditions. Furthermore, the low number of transposases in the genome and the absence of complete phage genomes indicate that this strain might be more genetically stable than other A. pasteurianus strains, which is an important advantage for the use of this strain as a functional starter culture. Comparative genome analysis with other members of the Acetobacteraceae confirmed the functional properties of A. pasteurianus 386B, such as its thermotolerant nature and unique genetic composition. Genome analysis of A. pasteurianus 386B provided detailed insights into the underlying mechanisms of its metabolic features, niche adaptations, and tolerance towards stress conditions. Combining these data with previous experimental knowledge enabled an integrated, global overview of the functional characteristics of this strain. This knowledge will enable improved fermentation strategies and the selection of appropriate acetic acid bacteria strains as functional starter cultures for cocoa bean fermentation processes.
Sharma, Ronesh; Bayarjargal, Maitsetseg; Tsunoda, Tatsuhiko; Patil, Ashwini; Sharma, Alok
2018-01-21
Intrinsically disordered proteins (IDPs) lack a stable tertiary structure, and they actively participate in various biological functions. These IDPs expose short binding regions called molecular recognition features (MoRFs) that permit interaction with structured protein regions. Upon interaction, they undergo a disorder-to-order transition, through which their functionality arises. Predicting these MoRFs in disordered protein sequences is a challenging task. In this study, we present MoRFpred-plus, an improved version of our previously proposed predictor, to identify MoRFs in disordered protein sequences. Two separate, independent propensity scores are computed by incorporating physicochemical properties and HMM profiles, and these scores are combined to predict the final MoRF propensity score for a given residue. The first score reflects the propensity of a query residue to be part of a MoRF region based on the composition and similarity of the assumed MoRF and flank regions. The second score reflects the propensity of a query residue to be part of a MoRF region based on the properties of the flanks surrounding the given residue in the query protein sequence. The propensity scores are processed, and common averaging is applied to generate the final prediction score of MoRFpred-plus. The performance of the proposed predictor is compared with that of the available MoRF predictors MoRFchibi, MoRFpred, and ANCHOR. On the previously collected training and test sets used to evaluate these predictors, the proposed predictor outperforms them and yields a lower false positive rate. In addition, MoRFpred-plus is a downloadable predictor, so its output can be used as input to other computational tools. It is available at https://github.com/roneshsharma/MoRFpred-plus/wiki/MoRFpred-plus:-Download. Copyright © 2017 Elsevier Ltd. All rights reserved.
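The final combination step described above, averaging two independent per-residue propensity scores, can be sketched as follows. How each score is computed and post-processed is not reproduced here, and the 0.5 threshold used to turn the averaged score into a MoRF call is a hypothetical choice for illustration, not a parameter of MoRFpred-plus.

```python
def combine_propensities(first, second):
    """Average two per-residue propensity score lists."""
    if len(first) != len(second):
        raise ValueError("score lists must cover the same residues")
    return [(a + b) / 2 for a, b in zip(first, second)]


def call_morf_residues(first, second, threshold=0.5):
    """Averaged scores plus a thresholded per-residue call (threshold is hypothetical)."""
    combined = combine_propensities(first, second)
    return combined, [score >= threshold for score in combined]


scores, calls = call_morf_residues([0.2, 0.8, 0.6], [0.4, 0.6, 0.2])
print(calls)  # [False, True, False]
```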
An insect-inspired model for visual binding II: functional analysis and visual attention.
Northcutt, Brandon D; Higgins, Charles M
2017-04-01
We have developed a neural network model capable of performing visual binding inspired by neuronal circuitry in the optic glomeruli of flies: a brain area that lies just downstream of the optic lobes where early visual processing is performed. This visual binding model is able to detect objects in dynamic image sequences and bind together their respective characteristic visual features, such as color, motion, and orientation, by taking advantage of their common temporal fluctuations. Visual binding is represented in the form of an inhibitory weight matrix which learns over time which features originate from a given visual object. In the present work, we show that information represented implicitly in this weight matrix can be used to explicitly count the number of objects present in the visual image, to enumerate their specific visual characteristics, and even to create an enhanced image in which one particular object is emphasized over others, thus implementing a simple form of visual attention. Further, we present a detailed analysis which reveals the function and theoretical limitations of the visual binding network and in this context describe a novel network learning rule which is optimized for visual binding.
Image correlation microscopy for uniform illumination.
Gaborski, T R; Sealander, M N; Ehrenberg, M; Waugh, R E; McGrath, J L
2010-01-01
Image cross-correlation microscopy is a technique that quantifies the motion of fluorescent features in an image by measuring the decay of the temporal autocorrelation function in a time-lapse image sequence. Image cross-correlation microscopy has traditionally employed laser-scanning microscopes because the technique emerged as an extension of laser-based fluorescence correlation spectroscopy. In this work, we show that image correlation can also be used to measure fluorescence dynamics in uniform illumination (wide-field) imaging systems, and we call our new approach uniform illumination image correlation microscopy. Wide-field microscopy is not only a simpler, less expensive imaging modality, but it also offers greater temporal resolution than laser-scanning systems. In traditional laser-scanning image cross-correlation microscopy, lateral mobility is calculated from the temporal decorrelation of an image, where the characteristic length is the width of the illuminating laser beam. In wide-field microscopy, the diffusion length is instead defined by the feature size, obtained from the spatial autocorrelation function. The correlation function decays in time as an object diffuses from its original position. Theoretical and simulated comparisons between Gaussian and uniform features indicate that the temporal autocorrelation function depends strongly on particle size but not on particle shape. In this report, we establish the relationships between the feature size from the spatial autocorrelation function, the characteristic time of the temporal autocorrelation function, and the diffusion coefficient for uniform illumination image correlation microscopy, using analytical and Monte Carlo methods and experimental validation with particle tracking algorithms. Additionally, we demonstrate uniform illumination image correlation microscopy analysis of adhesion molecule domain aggregation and diffusion on the surface of human neutrophils.
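The central quantity above, the temporal autocorrelation function of a time-lapse image sequence, can be sketched as g(τ) = ⟨δI(x, t) δI(x, t + τ)⟩ / ⟨I⟩², averaged over pixels x and start times t, where δI is the deviation of each pixel from its own time average. The per-pixel mean subtraction and the normalization by the squared mean intensity are common choices assumed here, not necessarily the exact estimator used in the paper, and fitting g(τ) to extract a diffusion coefficient is not shown.

```python
def temporal_acf(frames, max_lag):
    """Temporal autocorrelation g(tau) of pixel intensity fluctuations.

    frames: time-ordered list of frames, each a flat list of pixel intensities.
    Returns [g(0), g(1), ..., g(max_lag)], averaged over pixels and start times.
    """
    n_frames = len(frames)
    if not 0 <= max_lag < n_frames:
        raise ValueError("max_lag must be smaller than the number of frames")
    n_pixels = len(frames[0])
    # Time-averaged intensity of each pixel; subtracting it removes static background.
    pixel_means = [sum(frame[p] for frame in frames) / n_frames for p in range(n_pixels)]
    mean_intensity = sum(pixel_means) / n_pixels
    g = []
    for lag in range(max_lag + 1):
        total = 0.0
        pairs = 0
        for t in range(n_frames - lag):
            for p in range(n_pixels):
                d0 = frames[t][p] - pixel_means[p]
                d1 = frames[t + lag][p] - pixel_means[p]
                total += d0 * d1
                pairs += 1
        # Normalize by the squared mean intensity, a common image-correlation convention.
        g.append(total / pairs / mean_intensity ** 2)
    return g
```

For features diffusing out of their original positions, g(τ) decays toward zero as the lag grows; the rate of that decay is what links the characteristic time to the diffusion coefficient.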
Childhood apraxia of speech: A survey of praxis and typical speech characteristics.
Malmenholt, Ann; Lohmander, Anette; McAllister, Anita
2017-07-01
The purpose of this study was to investigate current knowledge of the diagnosis of childhood apraxia of speech (CAS) in Sweden and to compare speech characteristics and symptoms with those reported in earlier surveys of mainly English speakers. In a web-based questionnaire, 178 Swedish speech-language pathologists (SLPs) anonymously answered questions about their perception of typical speech characteristics of CAS. They graded their own assessment skills and estimated clinical occurrence. The seven top speech characteristics reported as typical for children with CAS were: inconsistent speech production (85%), sequencing difficulties (71%), oro-motor deficits (63%), vowel errors (62%), voicing errors (61%), consonant cluster deletions (54%), and prosodic disturbance (53%). Motor-programming deficits, described as lack of automatization of speech movements, were perceived by 82%. All listed characteristics were consistent with the American Speech-Language-Hearing Association (ASHA) consensus-based features, Strand's 10-point checklist, and the diagnostic model proposed by Ozanne. The mode for clinical occurrence was 5%. The number of suspected cases of CAS in the clinical caseload was approximately one new patient per year per SLP. The results support and add to findings from studies of CAS in English-speaking children, with similar speech characteristics regarded as typical. These findings could possibly contribute to a cross-linguistic consensus on CAS characteristics.
The dependence on morphology of the gas content in galactic disks
NASA Technical Reports Server (NTRS)
Hogg, D. E.; Roberts, M. S.
1993-01-01
The classification S0 was introduced by Hubble to serve as a description of galaxies whose morphological characteristics seemed to lie between the disk-dominated spirals and the spheroidal elliptical systems. Since then there has been extensive discussion as to whether this classification sequence is also an evolutionary sequence. Many studies have focussed on a particular feature such as the luminosity profile, the bulge-to-disk ratio, or the nature of the interstellar matter, but the question of the evolution remains contentious. Equally contentious is the question of the classification itself. For systems with well-developed disks there usually is no problem. Many spheroidal systems also are unambiguously classified as ellipticals in most catalogs. However, there are a number of early systems which have been reclassified following review using improved optical material. For example, Eder et al. (AJ, 102, 572, 1991) found that many of the S0 galaxies which are rich in neutral hydrogen have faint spiral features. The confusion about classification propagates into the discussion of the properties of early-type systems. Attempts to put the classification system on a quantitative basis have in general been unsuccessful. Recently Sandage (private communication) has reviewed the classification of early systems and has defined a set of sub-classes for these objects. The S0 galaxies are divided into three groups, depending on the prominence of the disk. There are six subdivisions of Sa galaxies, depending upon the relative prominence of knots and other arm-like characteristics. We have explored the total gas content in these objects to see if there is a dependence on the galaxy morphology, as denoted by these new subclasses.
Layered Deposits of Arabia Terra and Meridiani Planum: Keys to the Habitability of Ancient Mars
NASA Technical Reports Server (NTRS)
Allen, Carlton C.; Oehler, Dorothy Z.; Paris, Kristen N.; Venechuk, Elizabeth M.
2006-01-01
Understanding the habitability of ancient Mars is a key goal in the exploration of that planet. Evidence for conditions favorable to early life must be sought in ancient sedimentary rocks, such as those of Arabia Terra and Meridiani Planum. Arabia Terra, the northernmost extension of the ancient highlands, is dominated by cratered plains and minor ridged units. These plains extend south into the adjacent Meridiani Planum. The Opportunity rover landed in northern Meridiani, close to the border with Arabia. High resolution MOC images reveal extensive layered sequences across much of the Arabia and Meridiani region. These layers have been interpreted as eroded remnants of sedimentary rock deposits (Edgett, 2005). The layered sequences are concentrated in the SW quadrant of Arabia and in northern Meridiani. Preliminary mapping by Edgett (2005) distinguished four large scale layered sequences in the Arabia and Meridiani region. These have dimensions of hundreds to more than 1,000 km. MOLA altimetry shows that each of the sequences can attain a thickness of 200 to 400 m, with a total thickness greater than 1 km. The sequences are generally flat lying, with regional slopes of a few degrees. Much finer layering is evident within a number of craters. The plains and ridged units of the Arabia and Meridiani region were originally mapped as Noachian based on crater statistics, particularly the number of large craters (Scott and Carr, 1978). The layered sequences in the current study postdate many, but not all, of these large craters. The layered sequences have partially or totally filled a number of craters with diameters ranging from 20 to over 50 km. The topmost layered sequence, as well as the lower two sequences, have intermediate thermal inertia, as derived from THEMIS, indicative of moderate induration. The TES spectra from the lower sequences include features indicative of basalt. 
Some areas of the topmost sequence, which includes the Opportunity landing site, have TES spectra dominated by hematite. Just below this topmost sequence lies a sequence with higher thermal inertia, indicative of more indurated or coarser grained material. The TES spectra of this sequence lack distinctive mineral features, and the rocks may be obscured by a thin coating of dust. The layers have been extensively eroded. The uppermost sequences are characterized by deeply scalloped boundaries. Filled craters have been partially exhumed. Finely layered deposits within craters have been strongly dissected. Landforms uniquely attributable to wind erosion are rare, but erosive styles and geomorphology characteristic of water and possibly ice are present. The layered sequences in Arabia Terra and Meridiani Planum likely reflect an epoch when the planet was much more habitable than it is today. Several areas in these layered sequences are under intensive study as candidate landing sites for the 2009 Mars Science Laboratory.
Ma, Xin; Guo, Jing; Sun, Xiao
2015-01-01
The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, prediction accuracy is still insufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We combined conjoint triad features with three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features play important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved its best performance (86.62% accuracy and a Matthews correlation coefficient of 0.737). This high prediction accuracy suggests that our method can be a useful approach to identify RNA-binding proteins from sequence information.
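The mRMR ranking step can be illustrated with a minimal standard-library Python sketch: features are ranked greedily by relevance to the class label (mutual information) minus their average redundancy with features already selected. The function names and toy data below are hypothetical; the published predictor used conjoint-triad, BP, NBP and EIPP features with random forests, which this sketch does not reproduce, and it assumes features have already been discretized.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    mi = 0.0
    for (a, b), count in pxy.items():
        p_ab = count / n
        mi += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return mi


def mrmr_score(name, features, relevance, selected):
    """Relevance minus mean redundancy with the already selected features."""
    if not selected:
        return relevance[name]
    redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
    return relevance[name] - redundancy / len(selected)


def mrmr_rank(features, labels):
    """Return feature names in mRMR order; features maps name -> list of discrete values."""
    relevance = {f: mutual_information(values, labels) for f, values in features.items()}
    remaining, order = set(features), []
    while remaining:
        # sorted() makes tie-breaking deterministic (alphabetical).
        best = max(sorted(remaining),
                   key=lambda f: mrmr_score(f, features, relevance, order))
        order.append(best)
        remaining.remove(best)
    return order
```

An incremental feature selection pass would then train a classifier on the top-1, top-2, ... features of this ranking and keep the prefix with the best cross-validated score; that step is omitted here.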
Molecular Structure and Sequence in Complex Coacervates
NASA Astrophysics Data System (ADS)
Sing, Charles; Lytle, Tyler; Madinya, Jason; Radhakrishna, Mithun
Oppositely-charged polyelectrolytes in aqueous solution can undergo associative phase separation, in a process known as complex coacervation. This results in a polyelectrolyte-dense phase (coacervate) and polyelectrolyte-dilute phase (supernatant). There remain challenges in understanding this process, despite a long history in polymer physics. We use Monte Carlo simulation to demonstrate that molecular features (charge spacing, size) play a crucial role in governing the equilibrium in coacervates. We show how these molecular features give rise to strong monomer sequence effects, due to a combination of counterion condensation and correlation effects. We distinguish between structural and sequence-based correlations, which can be designed to tune the phase diagram of coacervation. Sequence effects further inform the physical understanding of coacervation, and provide the basis for new coacervation models that take monomer-level features into account.
Sensorineural Deafness, Distinctive Facial Features and Abnormal Cranial Bones
Gad, Alona; Laurino, Mercy; Maravilla, Kenneth R.; Matsushita, Mark; Raskind, Wendy H.
2008-01-01
The Waardenburg syndromes (WS) account for approximately 2% of congenital sensorineural deafness. This heterogeneous group of diseases currently can be categorized into four major subtypes (WS types 1-4) on the basis of characteristic clinical features. Multiple genes have been implicated in WS, and mutations in some genes can cause more than one WS subtype. In addition to eye, hair and skin pigmentary abnormalities, dystopia canthorum and broad nasal bridge are seen in WS type 1. Mutations in the PAX3 gene are responsible for the condition in the majority of these patients. In addition, mutations in PAX3 have been found in WS type 3, which is distinguished by musculoskeletal abnormalities, and in a family with a rare subtype of WS, craniofacial-deafness-hand syndrome (CDHS), characterized by dysmorphic facial features, hand abnormalities, and absent or hypoplastic nasal and wrist bones. Here we describe a woman who shares some, but not all, features of WS type 3 and CDHS, and who also has abnormal cranial bones. All sinuses were hypoplastic, and the cochleae were small. No sequence alteration in PAX3 was found. These observations broaden the clinical range of WS and suggest there may be genetic heterogeneity even within the CDHS subtype. PMID:18553554
Kawagoshi, Taiki; Nishida, Chizuko; Ota, Hidetoshi; Kumazawa, Yoshinori; Endo, Hideki; Matsuda, Yoichi
2008-01-01
Crocodilians have several unique karyotypic features, such as small diploid chromosome numbers (30-42) and the absence of dot-shaped microchromosomes. Of the extant crocodilian species, the Siamese crocodile (Crocodylus siamensis) has only 2n = 30 chromosomes, mostly bi-armed, with large centromeric heterochromatin blocks. To investigate the molecular structures of C-heterochromatin and genomic compartmentalization in this karyotype, which is characterized by the disappearance of tiny microchromosomes and a reduced chromosome number, we performed molecular cloning of centromeric repetitive sequences and chromosome mapping of the 18S-28S rDNA and telomeric (TTAGGG)n sequences. The centromeric heterochromatin was composed mainly of two repetitive sequence families whose characteristics were quite different. Two types of GC-rich CSI-HindIII family sequences, the 305 bp CSI-HindIII-S (G+C content, 61.3%) and 424 bp CSI-HindIII-M (63.1%), were localized to the intensely PI-stained centric regions of all chromosomes, except for chromosome 2 with PI-negative heterochromatin. The 94 bp CSI-DraI (G+C content, 48.9%) was tandem-arrayed satellite DNA and localized to chromosome 2 and four pairs of small-sized chromosomes. The chromosomal size-dependent genomic compartmentalization that is supposedly unique to the Archosauromorpha was probably lost in the crocodilian lineage with the disappearance of microchromosomes, followed by the homogenization of centromeric repetitive sequences between chromosomes, except for chromosome 2.
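G+C percentages like those quoted for the CSI-HindIII and CSI-DraI repeats are simple base-composition ratios. A minimal Python sketch, assuming ambiguous bases (such as N) are excluded from the denominator; the authors' exact counting convention is not stated and may differ.

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage.

    Only unambiguous A/C/G/T bases are counted in the denominator
    (an assumption; ambiguous codes like N are ignored).
    """
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt
```

Excluding ambiguous bases keeps gaps or unresolved positions in a cloned repeat from diluting the percentage.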
Smith, Joel; Davidson, Eric H.
2009-01-01
Design features that ensure reproducible and invariant embryonic processes are major characteristics of current gene regulatory network models. New cis-regulatory studies on a gene regulatory network subcircuit activated early in the development of the sea urchin embryo reveal a sequence of encoded “fail-safe” regulatory devices. These ensure the maintenance of fate separation between skeletogenic and nonskeletogenic mesoderm lineages. An unexpected consequence of the network design revealed in the course of these experiments is that it enables the embryo to “recover” from regulatory interference that has catastrophic effects if this feature is disarmed. A reengineered regulatory system inserted into the embryo was used to prove how this system operates in vivo. Genomically encoded backup control circuitry thus provides the mechanism underlying a specific example of the regulative development for which the sea urchin embryo has long been famous. PMID:19822764
Clinical features of multiple organ failure in the elderly.
Wang, S W; Fan, L
1990-09-01
Multiple organ failure (MOF) in the elderly is a new syndrome that evolves from chronic diseases of multiple organs on the basis of age-related multiple organ dysfunction. Its clinical characteristics differ from those of MOF due to serious trauma. 122 cases of MOF were analysed retrospectively and their clinical features discussed. MOF with a long course is the natural presentation in many elderly patients before death. Its main precipitating factors are pulmonary infection, metastatic carcinoma, cardiac attack, etc. Organs typically fail in the order heart, lung, kidney, liver, etc. The mortality is similar to that of MOF due to trauma. However, patients with failure of four organs can still survive, whereas renal failure is mostly fatal. More attention should be paid to the prevention of MOF in the elderly so as to shorten its developing course.
Exon–intron organization of genes in the slime mold Physarum polycephalum
Trzcinska-Danielewicz, Joanna; Fronk, Jan
2000-01-01
The slime mold Physarum polycephalum is a morphologically simple organism with a large and complex genome. The exon–intron organization of its genes exhibits features typical for protists and fungi as well as those characteristic for the evolutionarily more advanced species. This indicates that both the taxonomic position as well as the size of the genome shape the exon–intron organization of an organism. The average gene has 3.7 introns which are on average 138 bp, with a rather narrow size distribution. Introns are enriched in AT base pairs by 13% relative to exons. The consensus sequences at exon–intron boundaries resemble those found for other species, with minor differences between short and long introns. A unique feature of P. polycephalum introns is the strong preference for pyrimidines in the coding strand throughout their length, without a particular enrichment at the 3′-ends. PMID:10982858
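Whether a pyrimidine bias runs throughout an intron or is concentrated near its 3′ end can be checked with a sliding-window profile along the coding strand. A minimal Python sketch; the window and step sizes are illustrative assumptions, not parameters from the survey.

```python
def pyrimidine_profile(seq: str, window: int, step: int) -> list[float]:
    """Fraction of pyrimidines (C or T) in successive windows of a coding-strand sequence.

    A flat profile suggests a bias spread along the whole intron; a rise in the
    last windows would indicate enrichment at the 3' end.
    """
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    seq = seq.upper()
    profile = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        profile.append(sum(1 for base in chunk if base in "CT") / window)
    return profile
```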
SURVEY AND SUMMARY: exon-intron organization of genes in the slime mold Physarum polycephalum.
Trzcinska-Danielewicz, J; Fronk, J
2000-09-15
The slime mold Physarum polycephalum is a morphologically simple organism with a large and complex genome. The exon-intron organization of its genes exhibits features typical for protists and fungi as well as those characteristic for the evolutionarily more advanced species. This indicates that both the taxonomic position as well as the size of the genome shape the exon-intron organization of an organism. The average gene has 3.7 introns which are on average 138 bp, with a rather narrow size distribution. Introns are enriched in AT base pairs by 13% relative to exons. The consensus sequences at exon-intron boundaries resemble those found for other species, with minor differences between short and long introns. A unique feature of P. polycephalum introns is the strong preference for pyrimidines in the coding strand throughout their length, without a particular enrichment at the 3'-ends.
Attention capture without awareness in a non-spatial selection task.
Oriet, Chris; Pandey, Mamata; Kawahara, Jun-Ichiro
2017-02-01
Distractors presented prior to a critical target in a rapid sequence of visually-presented items induce a lag-dependent deficit in target identification, particularly when the distractor shares a task-relevant feature of the target. Presumably, such capture of central attention is important for bringing a target into awareness. The results of the present investigation suggest that greater capture of attention by a distractor is not accompanied by greater awareness of it. Moreover, awareness tends to be limited to superficial characteristics of the target such as colour. The findings are interpreted within the context of a model that assumes sudden increases in arousal trigger selection of information for consolidation in working memory. In this conceptualization, prolonged analysis of distractor items sharing task-relevant features leads to larger target identification deficits (i.e., greater capture) but no increase in awareness. Copyright © 2016 Elsevier Inc. All rights reserved.
Real, Fernando; Vidal, Ramon Oliveira; Carazzolle, Marcelo Falsarella; Mondego, Jorge Maurício Costa; Costa, Gustavo Gilson Lacerda; Herai, Roberto Hirochi; Würtele, Martin; de Carvalho, Lucas Miguel; Carmona e Ferreira, Renata; Mortara, Renato Arruda; Barbiéri, Clara Lucia; Mieczkowski, Piotr; da Silveira, José Franco; Briones, Marcelo Ribeiro da Silva; Pereira, Gonçalo Amarante Guimarães; Bahia, Diana
2013-12-01
We present the sequencing and annotation of the Leishmania (Leishmania) amazonensis genome, an etiological agent of human cutaneous leishmaniasis in the Amazon region of Brazil. L. (L.) amazonensis shares features with Leishmania (L.) mexicana but also exhibits unique characteristics regarding geographical distribution and clinical manifestations of cutaneous lesions (e.g. borderline disseminated cutaneous leishmaniasis). Predicted genes were scored for orthologous gene families and conserved domains in comparison with other human pathogenic Leishmania spp. Carboxypeptidase, aminotransferase, and 3'-nucleotidase genes and ATPase, thioredoxin, and chaperone-related domains were represented more abundantly in L. (L.) amazonensis and L. (L.) mexicana species. Phylogenetic analysis revealed that these two species share groups of amastin surface proteins unique to the genus that could be related to specific features of disease outcomes and host cell interactions. Additionally, we describe a hypothetical hybrid interactome of potentially secreted L. (L.) amazonensis proteins and host proteins under the assumption that parasite factors mimic their mammalian counterparts. The model predicts an interaction between an L. (L.) amazonensis heat-shock protein and mammalian Toll-like receptor 9, which is implicated in important immune responses such as cytokine and nitric oxide production. The analysis presented here represents valuable information for future studies of leishmaniasis pathogenicity and treatment.
Real, Fernando; Vidal, Ramon Oliveira; Carazzolle, Marcelo Falsarella; Mondego, Jorge Maurício Costa; Costa, Gustavo Gilson Lacerda; Herai, Roberto Hirochi; Würtele, Martin; de Carvalho, Lucas Miguel; e Ferreira, Renata Carmona; Mortara, Renato Arruda; Barbiéri, Clara Lucia; Mieczkowski, Piotr; da Silveira, José Franco; Briones, Marcelo Ribeiro da Silva; Pereira, Gonçalo Amarante Guimarães; Bahia, Diana
2013-01-01
We present the sequencing and annotation of the Leishmania (Leishmania) amazonensis genome, an etiological agent of human cutaneous leishmaniasis in the Amazon region of Brazil. L. (L.) amazonensis shares features with Leishmania (L.) mexicana but also exhibits unique characteristics regarding geographical distribution and clinical manifestations of cutaneous lesions (e.g. borderline disseminated cutaneous leishmaniasis). Predicted genes were scored for orthologous gene families and conserved domains in comparison with other human pathogenic Leishmania spp. Carboxypeptidase, aminotransferase, and 3′-nucleotidase genes and ATPase, thioredoxin, and chaperone-related domains were represented more abundantly in L. (L.) amazonensis and L. (L.) mexicana species. Phylogenetic analysis revealed that these two species share groups of amastin surface proteins unique to the genus that could be related to specific features of disease outcomes and host cell interactions. Additionally, we describe a hypothetical hybrid interactome of potentially secreted L. (L.) amazonensis proteins and host proteins under the assumption that parasite factors mimic their mammalian counterparts. The model predicts an interaction between an L. (L.) amazonensis heat-shock protein and mammalian Toll-like receptor 9, which is implicated in important immune responses such as cytokine and nitric oxide production. The analysis presented here represents valuable information for future studies of leishmaniasis pathogenicity and treatment. PMID:23857904
Hou, Xiao-qiang; Guo, Shun-xing
2014-09-01
Endophytic fungi with plant growth promoting effects were screened by co-culturing each endophytic fungus with seedlings of Dendrobium officinale. Anatomical features of the inoculated roots were studied by paraffin sectioning. Morphological characteristics and rDNA ITS1-5.8S-ITS2 sequences were used for the taxonomy of the endophytic fungi. The results showed that 8 strains inoculated into D. officinale seedlings greatly enhanced plant height, stem diameter, number of new roots, and biomass. The anatomical features of the inoculated roots showed that each fungus could infect the velamen of the seedlings. Hyphae or pelotons were present in the exodermal passage cells and cortex cells. The effective fungi did not infect the endodermis or vascular bundle sheath, whereas fungi that were harmful to the seedlings did. Combined with classic morphological classification, 2 effective strains were identified as belonging to Pestalotiopsis and Eurotium. Six fungal species without conidiophores were assigned to Pyrenochaeta, Coprinellus, Pholiota, Alternaria, and Helotiales by sequencing the PCR-amplified rDNA ITS1-5.8S-ITS2 regions. Co-culture of effective endophytic fungi with the plant can be applied to cultivating D. officinale seedlings. This approach may shorten the growth cycle of D. officinale and increase the supply of this Chinese medicinal herb.
Lee, Seonah
2013-10-01
This study aimed to organize the system features of decision support technologies targeted at nursing practice into assessment, problem identification, care plans, implementation, and outcome evaluation. It also aimed to identify the range of the five stage-related sequential decision supports that computerized clinical decision support systems provided. MEDLINE, CINAHL, and EMBASE were searched. A total of 27 studies were reviewed. The system features collected represented the characteristics of each category from patient assessment to outcome evaluation. Several features were common across the reviewed systems. For the sequential decision support, all of the reviewed systems provided decision support in sequence for patient assessment and care plans. Fewer than half of the systems included problem identification. There were only three systems operating in an implementation stage and four systems in outcome evaluation. Consequently, the key steps for sequential decision support functions were initial patient assessment, problem identification, care plan, and outcome evaluation. Providing decision support in such a full scope will effectively help nurses' clinical decision making. By organizing the system features, a comprehensive picture of nursing practice-oriented computerized decision support systems was obtained; however, the development of a guideline for better systems should go beyond the scope of a literature review.
Learning Behavior Characterization with Multi-Feature, Hierarchical Activity Sequences
ERIC Educational Resources Information Center
Ye, Cheng; Segedy, James R.; Kinnebrew, John S.; Biswas, Gautam
2015-01-01
This paper discusses Multi-Feature Hierarchical Sequential Pattern Mining, MFH-SPAM, a novel algorithm that efficiently extracts patterns from students' learning activity sequences. This algorithm extends an existing sequential pattern mining algorithm by dynamically selecting the level of specificity for hierarchically-defined features…
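MFH-SPAM itself is not specified in the truncated abstract, but the basic quantity any sequential pattern miner counts is support: the number of activity sequences that contain a pattern as an ordered, not necessarily contiguous, subsequence. A minimal Python sketch of support counting plus a brute-force search over length-2 patterns; the activity labels are hypothetical, and this is not MFH-SPAM, which the abstract says dynamically selects the specificity of hierarchically defined features.

```python
from itertools import product


def contains_subsequence(sequence, pattern):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    # `item in it` consumes the iterator up to the first match, preserving order.
    return all(item in it for item in pattern)


def support(sequences, pattern):
    """Number of sequences that contain pattern as a subsequence."""
    return sum(1 for s in sequences if contains_subsequence(s, pattern))


def frequent_patterns_len2(sequences, min_support):
    """Brute-force search for all length-2 patterns meeting min_support."""
    items = sorted({x for s in sequences for x in s})
    result = {}
    for a, b in product(items, repeat=2):
        count = support(sequences, [a, b])
        if count >= min_support:
            result[(a, b)] = count
    return result
```

Real miners avoid this brute force by growing only patterns whose prefixes are already frequent, since support can never increase when a pattern is extended.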
SeqDepot: streamlined database of biological sequences and precomputed features.
Ulrich, Luke E; Zhulin, Igor B
2014-01-15
Assembling and/or producing integrated knowledge of sequence features continues to be an onerous and redundant task despite a large number of existing resources. We have developed SeqDepot, a novel database that focuses solely on two primary goals: (i) assimilating known primary sequences with predicted feature data and (ii) providing the simplest and most straightforward means to procure and readily use this information. Access to >28.5 million sequences and 300 million features is provided through a well-documented and flexible RESTful interface that supports fetching specific data subsets, bulk queries, visualization and searching by MD5 digests or external database identifiers. We have also developed an HTML5/JavaScript web application exemplifying how to interact with SeqDepot, and Perl/Python scripts for use with local processing pipelines. Freely available on the web at http://seqdepot.net/. REST access via http://seqdepot.net/api/v1. Database files and scripts may be downloaded from http://seqdepot.net/download.
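Because SeqDepot supports searching by MD5 digest, a local pipeline first needs to compute digests for its sequences. A minimal Python sketch, assuming sequences are normalized by removing whitespace and uppercasing; SeqDepot's exact normalization and whether it expects the hex or the URL-safe base64 form of the digest are assumptions here, to be checked against its documentation before querying http://seqdepot.net/api/v1.

```python
import base64
import hashlib


def _normalize(sequence: str) -> bytes:
    # Assumed normalization: drop all whitespace and uppercase the residues.
    return "".join(sequence.split()).upper().encode("ascii")


def md5_hex(sequence: str) -> str:
    """32-character hexadecimal MD5 digest of a normalized sequence."""
    return hashlib.md5(_normalize(sequence)).hexdigest()


def md5_base64url(sequence: str) -> str:
    """22-character URL-safe base64 form of the same 16-byte digest, padding removed."""
    digest = hashlib.md5(_normalize(sequence)).digest()
    return base64.urlsafe_b64encode(digest).decode("ascii").rstrip("=")
```

Normalizing before hashing matters: the same protein written in lowercase or wrapped across lines would otherwise yield a different digest and miss the lookup.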
Recurrence time statistics: versatile tools for genomic DNA sequence analysis.
Cao, Yinhe; Tung, Wen-Wen; Gao, J B
2004-01-01
With the completion of the genomes of humans and a few model organisms, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools that can easily identify structures and extract features from DNA sequences. One of the more important classes of structure in a DNA sequence is repeat-related. Repeats often have to be masked before protein coding regions along a DNA sequence can be identified or before redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence-time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features in a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics; it has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene-finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient: it allows whole-genome analysis on a PC.
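The core quantity is simple to compute: at each position, the recurrence time is the distance back to the previous occurrence of the same length-k word. A minimal Python sketch under that definition; the published method builds further statistics (including its codon index) on top of these times, and those are not reproduced here.

```python
from collections import Counter


def recurrence_times(seq: str, k: int) -> list[int]:
    """Distances between successive occurrences of each length-k word, in scan order."""
    if k <= 0:
        raise ValueError("k must be positive")
    seq = seq.upper()
    last_seen = {}
    times = []
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            times.append(i - last_seen[word])
        last_seen[word] = i
    return times


def recurrence_histogram(seq: str, k: int) -> Counter:
    """Histogram of recurrence times; a dominant peak at T hints at a repeat of period T."""
    return Counter(recurrence_times(seq, k))
```

A single linear scan with a dictionary of last positions keeps the cost proportional to sequence length, which is what makes genome-scale runs practical on ordinary hardware.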
NASA Astrophysics Data System (ADS)
Karakatsanis, L. P.; Pavlos, G. P.; Iliopoulos, A. C.; Pavlos, E. G.; Clark, P. M.; Duke, J. L.; Monos, D. S.
2018-09-01
This study combines two independent domains of science, the high-throughput DNA sequencing capabilities of genomics and complexity theory from physics, to assess the information encoded by the different genomic segments of exonic, intronic and intergenic regions of the Major Histocompatibility Complex (MHC) and to identify possible interactive relationships. The dynamic and non-extensive statistical characteristics of two well-characterized MHC sequences from the homozygous cell lines PGF and COX, in addition to two other genomic regions of comparable size used as controls, have been studied using the reconstructed phase space theorem and the non-extensive statistical theory of Tsallis. The results reveal similar nonlinear dynamical behavior with respect to complexity and self-organization features. In particular, the low-dimensional deterministic nonlinear chaotic and non-extensive statistical character of the DNA sequences was verified, with strong multifractal characteristics and long-range correlations. The nonlinear indices repeatedly verified that MHC sequences, whether exonic, intronic or intergenic, include varying levels of information and reveal an interaction of the genes with intergenic regions, whereby the lower the number of genes in a region, the lower the complexity and information content of the intergenic region. Finally, we showed the significance of the intergenic region in the production of the DNA dynamics. The findings reveal interesting information content in all three genomic elements and interactive relationships of the genes with the intergenic regions. The results are most likely relevant to the whole genome and not only to the MHC. These findings are consistent with the ENCODE project, which has established that the non-coding regions of the genome are functionally important and play a significant role in the regulation of gene expression and the coordination of many biological processes in the cell.
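The non-extensive statistics referred to here rest on the Tsallis entropy, S_q = (1 - Σ p_i^q) / (q - 1), which reduces to the Shannon entropy as q approaches 1. A minimal Python sketch applying the formula to the k-mer frequency distribution of a DNA string; it covers only the entropy itself, not the phase-space reconstruction or multifractal analysis the study performed, and the choice of k and q is illustrative.

```python
import math
from collections import Counter


def kmer_probabilities(seq: str, k: int) -> list[float]:
    """Empirical probabilities of overlapping length-k words."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return [c / total for c in counts.values()]


def tsallis_entropy(seq: str, k: int = 1, q: float = 2.0) -> float:
    """Tsallis entropy S_q of the k-mer distribution; Shannon entropy (nats) when q == 1."""
    probs = kmer_probabilities(seq, k)
    if not probs:
        raise ValueError("sequence shorter than k")
    if abs(q - 1.0) < 1e-12:
        return -sum(p * math.log(p) for p in probs)  # q -> 1 limit
    return (1.0 - sum(p ** q for p in probs)) / (q - 1.0)
```

For a uniform distribution over the four bases with q = 2, the formula gives (1 - 4 × 1/16) / 1 = 0.75.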
Zhao, Chunqing; Feng, Xiaoyun; Tang, Tao; Qiu, Lihong
2015-01-01
Cytochrome P450 monooxygenases (CYPs) form an enzyme superfamily that is widely distributed in organisms and plays a vital role in the metabolism of exogenous and endogenous compounds through interaction with its obligatory redox partner, CYP reductase (CPR). A novel CYP gene (CYP9A11) and a CPR gene from the agricultural pest insect Spodoptera exigua were cloned and characterized. The complete cDNA sequences of SeCYP9A11 and SeCPR are 1,931 and 3,919 bp in length, respectively, and contain open reading frames of 1,593 and 2,070 nucleotides, respectively. Analysis of the putative protein sequences indicated that SeCYP9A11 contains a heme-binding domain and the unique characteristic sequence (SRFALCE) of the CYP9 family, in addition to an N-terminal signal peptide and transmembrane segment. Alignment analysis revealed that SeCYP9A11 shares the highest sequence similarity (66.54%) with CYP9A13 from Mamestra brassicae. The putative protein sequence of SeCPR has all of the classical CPR features, such as an N-terminal membrane anchor; three conserved domains, namely the flavin adenine dinucleotide (FAD), flavin mononucleotide (FMN), and nicotinamide adenine dinucleotide phosphate (NADPH) binding domains; and characteristic binding motifs. Phylogenetic analysis revealed that SeCPR shares the highest identity (95.21%) with HaCPR. The SeCYP9A11 and SeCPR genes were detected in the midgut, fat body, and cuticle tissues, and throughout all developmental stages of S. exigua. The mRNA levels of SeCYP9A11 and SeCPR decreased markedly after exposure to the plant secondary metabolites quercetin and tannin. These results on the SeCYP9A11 and SeCPR genes provide a foundation for further study of the S. exigua P450 system. PMID:26320261
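Open reading frames like those reported for SeCYP9A11 and SeCPR are found by scanning a cDNA for an ATG start codon followed in frame by a stop codon. A minimal Python sketch that returns the longest such ORF on the forward strand only, assuming the standard genetic code; real annotation would also scan the reverse complement and check the result against homologs.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def longest_orf(seq: str) -> str:
    """Longest ATG...stop open reading frame in the three forward frames ('' if none)."""
    seq = seq.upper()
    best = ""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG after the previous stop gives the longest ORF
            elif start is not None and codon in STOP_CODONS:
                orf = seq[start:i + 3]  # includes the stop codon
                if len(orf) > len(best):
                    best = orf
                start = None
    return best
```

Because the returned ORF includes its stop codon, the encoded protein length is len(orf) // 3 - 1 residues.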
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Yanfeng; Zheng, Yi; Qin, Ling
Beta-hydroxyacid dehydrogenase (β-HAD) genes have been identified in all sequenced genomes of eukaryotes and prokaryotes. Their gene products catalyze the NAD+- or NADP+-dependent oxidation of various β-hydroxy acid substrates into their corresponding semialdehydes. In many fungal and bacterial genomes, multiple β-HAD genes are observed, leading to the hypothesis that these gene products may have unique, uncharacterized metabolic roles specific to their species. The genomes of Geobacter sulfurreducens and Geobacter metallireducens each contain two potential β-HAD genes. The protein sequences of one pair of these genes, Gs-βHAD (Q74DE4) and Gm-βHAD (Q39R98), have 65% sequence identity and 77% sequence similarity with each other. Both proteins reduce succinic semialdehyde, a metabolite of the GABA shunt. To further explore the structural and functional characteristics of these two β-HADs with a potentially unique substrate specificity, crystal structures for Gs-βHAD and Gm-βHAD in complex with NADP+ were determined to resolutions of 1.89 Å and 2.07 Å, respectively. The structures of both proteins are similar, composed of 14 α-helices and nine β-strands organized into two domains. Domain One (1-165) adopts a typical Rossmann fold composed of two α/β units: a six-strand parallel β-sheet surrounded by six α-helices (α1 – α6) followed by a mixed three-strand β-sheet surrounded by two α-helices (α7 and α8). Domain Two (166-287) is composed of a bundle of seven α-helices (α9 – α14). Four functional regions conserved in all β-HADs are spatially located near each other at the interdomain cleft in both Gs-βHAD and Gm-βHAD, with a buried molecule of NADP+. The structural features of Gs-βHAD and Gm-βHAD are described in relation to the four conserved consensus sequences characteristic of β-HADs and the potential biochemical importance of these enzymes as an alternative pathway for the degradation of succinic semialdehyde.
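Percent identity figures such as the 65% quoted for Gs-βHAD and Gm-βHAD depend on an alignment and on how gaps are counted. A minimal Python sketch, assuming the two sequences are already aligned and that gap columns are excluded from the denominator; other conventions divide by alignment length or by the shorter sequence. Similarity additionally requires a substitution matrix or residue grouping and is not computed here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap or y == gap:
            continue  # gap columns are skipped (one of several conventions)
        compared += 1
        identical += x == y
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared
```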
Predicting Protein-Protein Interactions by Combing Various Sequence-Derived.
Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao
2011-09-20
Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machinery of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information; these features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. The resulting 67-dimensional feature vectors are then fed to a support vector machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least be a useful supplementary tool to existing methods.
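Descriptors that capture interactions between residues a fixed distance apart are often autocorrelation terms over a physicochemical scale. A minimal Python sketch using a normalized Moreau-Broto-style average product at lags 1 to max_lag over the Kyte-Doolittle hydropathy scale, with the two proteins' vectors concatenated into one pair vector. The scale, lags and concatenation are illustrative assumptions, not the 930 features of the paper, and the PCA and SVM stages are omitted.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation(seq, max_lag, scale=KYTE_DOOLITTLE):
    """AC(d) = mean of P(s_i) * P(s_{i+d}) over all residue pairs d apart, for d = 1..max_lag."""
    values = [scale[res] for res in seq.upper()]  # KeyError on non-standard residues
    n = len(values)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(seq)")
    return [
        sum(values[i] * values[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def pair_vector(seq_a, seq_b, max_lag):
    """Hypothetical pair representation: concatenate both proteins' descriptors."""
    return autocorrelation(seq_a, max_lag) + autocorrelation(seq_b, max_lag)
```

Many implementations standardize each property scale to zero mean and unit variance before computing the products, so that no single scale dominates when several are combined.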
Processing Translational Motion Sequences.
1982-10-01
the initial ROADSIGN image using a ∇²G mask with a width of 5 pixels. The distinctiveness values were computed using features which were 5x5 pixel... the initial step size of the local search quite large. 4. EXPERIMENTS The following experiments were performed using the roadsign and industrial... the initial image of the sequence. The third experiment involves processing the roadsign image sequence using the features extracted at the positions
Takeuchi, Fumihiko; Watanabe, Shinya; Baba, Tadashi; Yuzawa, Harumi; Ito, Teruyo; Morimoto, Yuh; Kuroda, Makoto; Cui, Longzhu; Takahashi, Mikio; Ankai, Akiho; Baba, Shin-ichi; Fukui, Shigehiro; Lee, Jean C; Hiramatsu, Keiichi
2005-11-01
Staphylococcus haemolyticus is an opportunistic bacterial pathogen that colonizes human skin and is remarkable for its highly antibiotic-resistant phenotype. We determined the complete genome sequence of S. haemolyticus to better understand its pathogenicity and evolutionary relatedness to the other staphylococcal species. A large proportion of the open reading frames in the genomes of S. haemolyticus, Staphylococcus aureus, and Staphylococcus epidermidis were conserved in their sequence and order on the chromosome. We identified a region of the bacterial chromosome just downstream of the origin of replication that showed little homology among the species but was conserved among strains within a species. This novel region, designated the "oriC environ," likely contributes to the evolution and differentiation of the staphylococcal species, since it was enriched for species-specific nonessential genes that contribute to the biological features of each staphylococcal species. A comparative analysis of the genomes of S. haemolyticus, S. aureus, and S. epidermidis elucidated differences in their biological and genetic characteristics and pathogenic potentials. We identified as many as 82 insertion sequences in the S. haemolyticus chromosome that probably mediated frequent genomic rearrangements, resulting in phenotypic diversification of the strain. Such rearrangements could have brought genomic plasticity to this species and contributed to its acquisition of antibiotic resistance.
A systematic approach to magnetic resonance imaging evaluation of epiphyseal lesions.
Thawait, Shrey K; Thawait, Gaurav K; Frassica, Frank J; Andreisek, Gustav; Carrino, John A; Chhabra, Avneesh
2013-04-01
Magnetic resonance imaging (MRI) is the modality of choice for imaging epiphyseal lesions. It provides excellent soft tissue resolution and depicts the extent of disease. A wide spectrum of tumor and tumor-like lesions can involve the epiphysis. Early and accurate diagnosis as well as appropriate management of epiphyseal lesions is critical, as these conditions may lead to disabling complications such as limb length discrepancy, angular or joint surface deformities and secondary osteoarthritis. In this article, we discuss the role of conventional sequences, such as T1W, fluid-sensitive T2W and intravenous (IV) gadolinium-enhanced sequences, as well as the additional value of problem-solving MRI sequences such as chemical shift and diffusion-weighted imaging. Based on the imaging findings on various MRI sequences and lesion characteristics, a systematic approach directed to the diagnosis of epiphyseal lesions is presented and discussed. MRI features of clinically and biopsy-proven examples of epiphyseal lesions, such as osteomyelitis, intra-osseous abscess, infiltrative malignancy, metastases, transient osteoporosis, subchondral insufficiency fracture, avascular necrosis, osteochondral fracture, osteochondritis dissecans, eosinophilic granuloma and geode, are demonstrated. Using this systematic approach, the reader will be able to better characterize epiphyseal lesions, with the potential to positively affect patient management. Copyright © 2013 Elsevier Inc. All rights reserved.
Boscaro, Vittorio; Fokin, Sergei I; Schrallhammer, Martina; Schweikert, Michael; Petroni, Giulio
2013-01-01
The genus Holospora (Rickettsiales) includes highly infectious nuclear symbionts of the ciliate Paramecium with unique morphology and life cycle. To date, nine species have been described, but a molecular characterization is lacking for most of them. In this study, we have characterized a novel Holospora-like bacterium (HLB) living in the macronuclei of a Paramecium jenningsi population. This bacterium was investigated morphologically and ultrastructurally in detail, and its life cycle and infection capabilities were described. We also obtained its 16S rRNA gene sequence and developed a specific probe for fluorescence in situ hybridization experiments. A new taxon, "Candidatus Gortzia infectiva", was established for this HLB according to its unique characteristics and the relatively low DNA sequence similarities shared with other bacteria. The phylogeny of the order Rickettsiales based on 16S rRNA gene sequences has been inferred, adding to the available data the sequence of the novel bacterium and those of two Holospora species (Holospora obtusa and Holospora undulata) characterized for the purpose. Our phylogenetic analysis provided molecular support for the monophyly of HLBs and showed a possible pattern of evolution for some of their features. We suggest classifying only HLBs within the family Holosporaceae, excluding other, more distantly related and phenotypically different Paramecium endosymbionts.
Dobinson, K F; Harris, R E; Hamer, J E
1993-01-01
The fungal phytopathogen Magnaporthe grisea parasitizes a wide variety of gramineous hosts. In the course of investigating the genetic relationship between pathogen genotype and host specificity, we identified a retroelement that is present in some strains of M. grisea that infect finger millet and goosegrass (members of the plant genus Eleusine). The element, designated grasshopper (grh), is present in multiple copies and dispersed throughout the genome. DNA sequence analysis showed that grasshopper contains 198-base-pair direct long terminal repeats (LTRs) with features characteristic of retroviral and retrotransposon LTRs. Within the element we identified an open reading frame with sequences homologous to the reverse transcriptase, RNaseH, and integrase domains of retroelement pol genes. Comparison of the open reading frame with sequences from other retroelements showed that grh is related to the gypsy family of retrotransposons. Comparisons of the distribution of the grasshopper element with other dispersed repeated DNA sequences in M. grisea indicated that grasshopper was present in a broadly dispersed subgroup of Eleusine pathogens, suggesting that the element was acquired subsequent to the evolution of this host-specific form. We present arguments that the amplification of different retroelements within populations of M. grisea is a consequence of the clonal organization of the fungal populations.
Anonymization of electronic medical records for validating genome-wide association studies
Loukides, Grigorios; Gkoulalas-Divanis, Aris; Malin, Bradley
2010-01-01
Genome-wide association studies (GWAS) facilitate the discovery of genotype–phenotype relations from population-based sequence databases, which is an integral facet of personalized medicine. The increasing adoption of electronic medical records allows large amounts of patients’ standardized clinical features to be combined with the genomic sequences of these patients and shared to support validation of GWAS findings and to enable novel discoveries. However, disseminating these data “as is” may lead to patient reidentification when genomic sequences are linked to resources that contain the corresponding patients’ identity information based on standardized clinical features. This work proposes an approach that provably prevents this type of data linkage and produces a result that helps support GWAS. Our approach automatically extracts potentially linkable clinical features and modifies them so that they can no longer be used to link a genomic sequence to a small number of patients, while preserving the associations between genomic sequences and specific sets of clinical features corresponding to GWAS-related diseases. Extensive experiments with real patient data derived from the Vanderbilt University Medical Center verify that our approach generates data that eliminate the threat of individual reidentification while supporting GWAS validation and clinical case analysis tasks. PMID:20385806
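The abstract does not spell out its algorithm. Its stated privacy goal is that no combination of clinical features should point to only a small number of patients, and a generic k-anonymity check over sets of diagnosis codes can illustrate that goal. The sketch below is not the authors' method. The codes, the `hierarchy` that generalizes them to parent categories, and the helper names are all hypothetical.

```python
from collections import Counter

# Hypothetical patients, each described by a set of ICD-9-style diagnosis codes.
patients = {
    "p1": {"250.01", "401.1"},
    "p2": {"250.02", "401.9"},
    "p3": {"250.01", "401.9"},
    "p4": {"250.02", "401.1"},
}

# Hypothetical generalization hierarchy: each specific code maps to a parent category.
hierarchy = {"250.01": "250", "250.02": "250", "401.1": "401", "401.9": "401"}


def min_group_size(records):
    """Size of the smallest group of patients sharing an identical code set."""
    counts = Counter(frozenset(codes) for codes in records.values())
    return min(counts.values())


def is_k_anonymous(records, k):
    """True if every code set is shared by at least k patients."""
    return min_group_size(records) >= k


def generalize(records, parents):
    """Replace each code by its parent; codes without a parent are kept as-is."""
    return {pid: {parents.get(c, c) for c in codes} for pid, codes in records.items()}


generalized = generalize(patients, hierarchy)
print(is_k_anonymous(patients, 2))     # False: every original code set is unique
print(is_k_anonymous(generalized, 4))  # True: all four patients now share {"250", "401"}
```

Generalizing codes trades clinical detail for privacy. The approach described above also aims to preserve the code sets tied to GWAS-related diseases, which this sketch does not model.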
Newell, Nicholas E
2011-12-15
The extraction of the set of features most relevant to function from classified biological sequence sets is still a challenging problem. A central issue is the determination of expected counts for higher-order features so that artifact features may be screened. Cascade detection (CD), a new algorithm for the extraction of localized features from sequence sets, is introduced. CD is a natural extension of the proportional modeling techniques used in contingency table analysis into the domain of feature detection. The algorithm is successfully tested on synthetic data and then applied to feature detection problems from two different domains to demonstrate its broad utility. An analysis of HIV-1 protease specificity reveals patterns of strong first-order features that group hydrophobic residues by side chain geometry and exhibit substantial symmetry about the cleavage site. Higher-order results suggest that favorable cooperativity is weak by comparison and broadly distributed, but indicate possible synergies between negative charge and hydrophobicity in the substrate. Structure-function results for the Schellman loop, a helix-capping motif in proteins, contain strong first-order features and also show statistically significant cooperativities that provide new insights into the design of the motif. These include a new 'hydrophobic staple' and multiple amphipathic and electrostatic pair features. CD should prove useful not only for sequence analysis but also for the detection of multifactor synergies in cross-classified data from clinical studies or other sources. Windows XP/7 application and data files are available at: https://sites.google.com/site/cascadedetect/home. Contact: nacnewell@comcast.net. Supplementary information is available at Bioinformatics online.
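A central issue named in this abstract is the expected count for a higher-order feature. The simplest baseline from contingency table analysis is the independence model. Under it, two features A and B that are present in n_A and n_B of N sequences are expected to co-occur n_A * n_B / N times. The sketch below computes only that baseline for one pair of (position, residue) features. It is not the cascade detection algorithm itself, and the toy sequences are hypothetical.

```python
def pair_counts(seqs, feat_a, feat_b):
    """Observed and independence-expected co-occurrence counts of two features.

    Each feature is a (position, residue) tuple; sequences are assumed aligned
    and of equal length.
    """
    n = len(seqs)
    has_a = [s[feat_a[0]] == feat_a[1] for s in seqs]
    has_b = [s[feat_b[0]] == feat_b[1] for s in seqs]
    observed = sum(a and b for a, b in zip(has_a, has_b))
    expected = sum(has_a) * sum(has_b) / n  # proportional (independence) model
    return observed, expected


# Hypothetical two-residue aligned fragments.
seqs = ["AL", "AL", "AV", "GL", "GV", "AL"]
obs, exp = pair_counts(seqs, (0, "A"), (1, "L"))
print(obs, round(exp, 2))  # 3 2.67
```

An observed count well above the expected one hints at cooperativity. A real analysis would also need a significance test. As the abstract stresses, it would also need expectations for higher-order features that account for the lower-order ones.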
Toward a model for lexical access based on acoustic landmarks and distinctive features
NASA Astrophysics Data System (ADS)
Stevens, Kenneth N.
2002-04-01
This article describes a model in which the acoustic speech signal is processed to yield a discrete representation of the speech stream in terms of a sequence of segments, each of which is described by a set (or bundle) of binary distinctive features. These distinctive features specify the phonemic contrasts that are used in the language, such that a change in the value of a feature can potentially generate a new word. This model is a part of a more general model that derives a word sequence from this feature representation, the words being represented in a lexicon by sequences of feature bundles. The processing of the signal proceeds in three steps: (1) Detection of peaks, valleys, and discontinuities in particular frequency ranges of the signal leads to identification of acoustic landmarks. The type of landmark provides evidence for a subset of distinctive features called articulator-free features (e.g., [vowel], [consonant], [continuant]). (2) Acoustic parameters are derived from the signal near the landmarks to provide evidence for the actions of particular articulators, and acoustic cues are extracted by sampling selected attributes of these parameters in these regions. The selection of cues that are extracted depends on the type of landmark and on the environment in which it occurs. (3) The cues obtained in step (2) are combined, taking context into account, to provide estimates of "articulator-bound" features associated with each landmark (e.g., [lips], [high], [nasal]). These articulator-bound features, combined with the articulator-free features in (1), constitute the sequence of feature bundles that forms the output of the model. 
Examples of cues that are used, and justification for this selection, are given, as well as examples of the process of inferring the underlying features for a segment when there is variability in the signal due to enhancement gestures (recruited by a speaker to make a contrast more salient) or due to overlap of gestures from neighboring segments.
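Step (1) of this model locates landmarks from peaks, valleys and discontinuities in band energy. The sketch below illustrates that idea under two assumptions: a precomputed per-frame energy contour in dB for a single frequency band, and a hypothetical 9 dB threshold for an abrupt change. The threshold, the labels and the function name are illustrative and are not taken from the article.

```python
def find_landmarks(energy_db, jump_db=9.0):
    """Return sorted (frame, kind) tuples from a per-frame band-energy contour in dB.

    Abrupt rises/falls are frame-to-frame changes of at least `jump_db`
    (a hypothetical threshold); peaks and valleys are local extrema.
    """
    landmarks = []
    for i in range(1, len(energy_db)):
        delta = energy_db[i] - energy_db[i - 1]
        if delta >= jump_db:
            landmarks.append((i, "abrupt rise"))
        elif delta <= -jump_db:
            landmarks.append((i, "abrupt fall"))
    for i in range(1, len(energy_db) - 1):
        prev, cur, nxt = energy_db[i - 1], energy_db[i], energy_db[i + 1]
        if cur > prev and cur >= nxt:
            landmarks.append((i, "peak"))
        elif cur < prev and cur <= nxt:
            landmarks.append((i, "valley"))
    return sorted(landmarks)


contour = [20, 21, 40, 45, 42, 30, 28, 35, 22]  # hypothetical energy track, dB
for frame, kind in find_landmarks(contour):
    print(frame, kind)
```

In the model, such landmarks are only the starting point. Steps (2) and (3) extract cues around each landmark and combine them into articulator-bound features.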
Alu expression in human cell lines and their retrotranspositional potential.
Oler, Andrew J; Traina-Dorge, Stephen; Derbes, Rebecca S; Canella, Donatella; Cairns, Brad R; Roy-Engel, Astrid M
2012-06-20
The vast majority of the 1.1 million Alu elements are retrotranspositionally inactive; only a few loci, referred to as 'source elements', can generate new Alu insertions. The first step in identifying the active Alu sources is to determine which loci are transcribed by RNA polymerase III (pol III). Previous genome-wide analyses of normal and transformed cell lines identified multiple Alu loci occupied by pol III factors, making them candidate source elements. Analysis of the data from these genome-wide studies determined that the majority of pol III-bound Alus belonged to the older subfamilies Alu S and Alu J, which accounted for 62.5% to 98.7% of the identified loci, depending on the cell line. The pol III-bound Alus were further scored for estimated retrotransposition potential (ERP) based on the absence or presence of selected sequence features associated with Alu retrotransposition capability. Our analyses indicate that most of the pol III-bound Alu candidate loci lack the sequence characteristics important for retrotransposition. These data suggest that Alu expression likely varies by cell type, growth conditions and transformation state. This variation could extend to the same cell line showing different Alu expression patterns in different laboratories. The vast majority of Alu loci potentially transcribed by RNA pol III lack important sequence features for retrotransposition, and the majority of potentially active Alu loci in the genome (those with high ERP scores) belong to young Alu subfamilies. Our observations suggest that in vivo, the contribution of Alu activity to somatic genetic damage may vary significantly between individuals and tissues.
Kulik, Natallia; Slámová, Kristýna; Ettrich, Rüdiger; Křen, Vladimír
2015-01-28
β-N-Acetylhexosaminidase (GH20) from the filamentous fungus Talaromyces flavus, previously identified as a prominent enzyme in the biosynthesis of modified glycosides, still lacks a high-resolution three-dimensional structure. Despite its high sequence identity to the previously reported Aspergillus oryzae and Penicillium oxalicum β-N-acetylhexosaminidases, this enzyme tolerates substrate modification significantly better. Understanding its key structural features, and predicting effective mutants and potential substrate characteristics prior to synthesis, are of general interest. Computational methods including homology modeling and molecular dynamics simulations were applied to shed light on the structure-activity relationship of the enzyme. Primary sequence analysis revealed several variable regions that may influence differences in substrate affinity among hexosaminidases. Moreover, docking combined with subsequent molecular dynamics simulations of C-6 modified glycosides enabled us to identify the structural features required for accommodating and processing these bulky substrates in the active site of the T. flavus hexosaminidase. To assess the reliability of predictions based on the reported model, all results were compared with available experimental data, which demonstrated the principal correctness of both the predictions and the model. The main variable regions in β-N-acetylhexosaminidases that determine differences in affinity for modified substrates are located close to the active site entrance and involve two loops. Differences in primary sequence and in the spatial arrangement of these loops, and their interplay with active-site amino acids, as reflected by interaction energies and dynamics, account for the different catalytic activities and substrate specificities of the various fungal and bacterial β-N-acetylhexosaminidases.
Our evolving understanding of aeolian bedforms, based on observation of dunes on different worlds
NASA Astrophysics Data System (ADS)
Diniega, Serina; Kreslavsky, Mikhail; Radebaugh, Jani; Silvestro, Simone; Telfer, Matt; Tirsch, Daniela
2017-06-01
Dunes, dune fields, and ripples are unique and useful records of the interaction between wind and granular materials - finding such features on a planetary surface immediately suggests certain information about climate and surface conditions (at least during the dunes' formation and evolution). Additionally, studies of dune characteristics under non-Earth conditions allow for "tests" of aeolian process models based primarily on observations of terrestrial features and dynamics, and refinement of the models to include consideration of a wider range of environmental and planetary conditions. To date, the planetary aeolian community has found and studied dune fields on Mars, Venus, and the Saturnian moon Titan. Additionally, we have observed candidate "aeolian bedforms" on Comet 67P/Churyumov-Gerasimenko, the Jovian moon Io, and - most recently - Pluto. In this paper, we hypothesize that the progression of investigations of aeolian bedforms and processes on a particular planetary body follows a consistent sequence - primarily set by the acquisition of data of particular types and resolutions, and by the maturation of knowledge about that planetary body. We define this sequence of generated knowledge and new questions (in seven investigation phases) and discuss examples from all of the studied bodies. The aim of such a sequence is to better define our past and current state of understanding about the aeolian bedforms of a particular body, to highlight the related assumptions that require re-analysis with data acquired during later investigations, and to use lessons learned from planetary and terrestrial aeolian studies to predict what types of investigations could be most fruitful in the future.
Lier, Clément; Baticle, Elodie; Horvath, Philippe; Haguenoer, Eve; Valentin, Anne-Sophie; Glaser, Philippe; Mereghetti, Laurent; Lanotte, Philippe
2015-01-01
CRISPR-Cas systems (clustered regularly interspaced short palindromic repeats/CRISPR-associated proteins) are found in 90% of archaea and about 40% of bacteria. In these systems, CRISPR arrays comprise short, almost unique sequences called spacers that are interspersed with conserved palindromic repeats. These systems play a role in adaptive immunity and help defend against non-self DNA such as integrative and conjugative elements, plasmids, and phages. In Streptococcus agalactiae, a bacterium implicated in colonization and infections in humans since the 1960s, two CRISPR-Cas systems have been described. A type II-A system, characterized by the proteins Cas9, Cas1, Cas2, and Csn2, is ubiquitous, and a type I-C system, with the Cas8c signature protein, is present in about 20% of the isolates. Unlike type I-C, which appears to be non-functional, type II-A appears fully functional. Here we studied type II-A CRISPR-cas loci from 126 human isolates of S. agalactiae that belong to different clonal complexes, represent the diversity of the species, and have been implicated in colonization or infection. The CRISPR-cas locus was analyzed at both the spacer and repeat levels. Major distinctive features were identified according to the phylogenetic lineages previously defined by multilocus sequence typing, especially for sequence type (ST) 17, which is considered hypervirulent. Among other idiosyncrasies, ST-17 shows a significantly lower number of spacers than other lineages. This characteristic could reflect the particular virulence or colonization specificities of this lineage. PMID:26124774
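Spacers sit between copies of a conserved repeat, so counting them, as in the ST-17 comparison, can be sketched by splitting an array on its repeat. This toy version assumes an exact, known repeat sequence and uses hypothetical leader, spacer and trailer sequences. Real arrays contain degenerate terminal repeats and are usually analyzed with dedicated CRISPR-detection tools.

```python
def extract_spacers(array_seq, repeat):
    """Return the spacers found between consecutive exact copies of `repeat`."""
    parts = array_seq.split(repeat)
    # parts[0] lies upstream of the first repeat (leader side) and parts[-1]
    # downstream of the last repeat (trailer side); spacers are in between.
    return [p for p in parts[1:-1] if p]


REPEAT = "GTTTTAGAGCTA"  # hypothetical repeat, for illustration only
SPACER_1 = "AAACCCGGGTTT"
SPACER_2 = "TTTGGGCCCAAA"
array = "ACGTACGT" + REPEAT + SPACER_1 + REPEAT + SPACER_2 + REPEAT + "CC"

spacers = extract_spacers(array, REPEAT)
print(len(spacers))  # 2
```

Exact string splitting keeps the example short, but a single mismatch in a repeat copy would merge two spacers. That is why practical tools tolerate repeat variants.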
The geochemistry and petrogenesis of an ophiolitic sequence from Pindos, Greece
NASA Astrophysics Data System (ADS)
Capedri, S.; Venturelli, G.; Bocchi, G.; Dostal, J.; Garuti, G.; Rossi, A.
1980-06-01
The ophiolites of Northern Pindos have been studied in a section close to the village of Perivoli (Grevena District). The section comprises cumulus rocks ranging from ultramafics to gabbros, overlain by dolerites (non-cumulus microgabbro) capped by thick, frequently pillowed lava flows. The sequence is cut by basaltic dykes. While the cumulus rocks and the dolerites are mostly fresh, the lavas and dykes are strongly altered. Major and trace element (Ni, Cr, Sc, Y, Zr, Nb, Sr, Ba, Zn, Cu, V, Li) data are presented for selected samples from the sequence. For some elements, the volcanic/subvolcanic rocks (flows, dykes, dolerites) exhibit wide chemical variations, which are considered to reflect mainly variations within the parent magmas. Some lavas appear closely comparable to present-day ocean-floor basalts, while other flows and most of the dykes are strongly depleted in some “incompatible” elements and are similar to some rocks from immature island arcs. The dolerites have transitional chemical features. The Pindos lavas differ from Western Mediterranean ophiolites in having lower Ti, P, Zr and Y, higher total Fe, and normally higher Ti/Zr ratios. The volcanic/subvolcanic rocks from Pindos were derived from separate magmas. Some lavas were possibly produced by variable partial melting of an already depleted mantle source, while the lavas with ocean-floor affinity were probably generated by partial melting of a less depleted source. The wide chemical variations of the Pindos lavas cannot easily be explained by an ocean-ridge system; an “island arc-marginal basin system” could better account for the observed chemical features.
Prediction of siRNA potency using sparse logistic regression.
Hu, Wei; Hu, John
2014-06-01
RNA interference (RNAi) can modulate gene expression at post-transcriptional as well as transcriptional levels. Short interfering RNA (siRNA) serves as a trigger for the RNAi gene inhibition mechanism and is therefore a crucial intermediate step in RNAi. There have been extensive studies to identify the sequence characteristics of potent siRNAs. One such study built a linear model using LASSO (Least Absolute Shrinkage and Selection Operator) to measure the contribution of each siRNA sequence feature. This model is simple and interpretable, but it requires a large number of nonzero weights. We have introduced a novel technique, sparse logistic regression, to build a linear model using single-position specific nucleotide compositions that has the same prediction accuracy as the LASSO-based linear model. The weights in our new model follow the same general trend as those in the previous model, but only 25 of a total of 84 weights are nonzero, a 54% reduction compared to the previous model. In contrast to the LASSO-based linear model, our model suggests that only a few positions influence siRNA efficacy: the 5' and 3' ends and the seed region of siRNA sequences. We also employed sparse logistic regression to build a linear model using dual-position specific nucleotide compositions, a task that LASSO cannot accomplish well because of the high dimensionality of this feature set. Our results demonstrate the superiority of sparse logistic regression over LASSO as a technique for both feature selection and regression in the context of siRNA design.
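The total of 84 weights matches one indicator feature per (position, nucleotide) pair for a 21-nt siRNA strand (21 × 4 = 84). That reading of the feature set is our assumption. The sketch below builds those single-position features and scores a sequence with a sparse logistic model. Its weights are hypothetical placeholders, not the published ones. One common way to fit such weights is with an L1 penalty, for example scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")`, but fitting is not shown here.

```python
import math

NUCLEOTIDES = "ACGU"


def one_hot_positions(sirna):
    """One 0/1 feature per (position, nucleotide) pair, position-major."""
    features = []
    for base in sirna:
        features.extend(1 if base == nt else 0 for nt in NUCLEOTIDES)
    return features


def feature_index(position, nucleotide):
    """Index of a (0-based position, nucleotide) feature in the vector above."""
    return 4 * position + NUCLEOTIDES.index(nucleotide)


def predict_potency(sirna, weights, bias=0.0):
    """Logistic score from a sparse {feature_index: weight} dictionary."""
    x = one_hot_positions(sirna)
    z = bias + sum(w * x[i] for i, w in weights.items())
    return 1.0 / (1.0 + math.exp(-z))


sirna = "AUGC" * 5 + "A"  # hypothetical 21-nt strand
weights = {feature_index(0, "A"): 2.0, feature_index(20, "U"): -1.0}  # hypothetical sparse weights
print(len(one_hot_positions(sirna)))  # 84
print(round(predict_potency(sirna, weights), 3))  # 0.881
```

With a sparse model, only features whose weight is nonzero enter the score. This is why a model with 25 nonzero weights points to a handful of influential positions.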
Zaidi, Syed H E; Meyer, Sascha; Peltekova, Vanya D; Lindinger, Angelika; Teebi, Ahmad S; Faiyaz-Ul-Haque, Muhammad
2009-07-01
Arterial tortuosity syndrome (ATS) is a rare autosomal recessive disorder in which patients display tortuosity of arteries in addition to hyperextensible skin, joint laxity, and other connective tissue features. This syndrome is caused by mutations in the SLC2A10 gene. In this article we describe a girl of Kurdish origin with ATS who, in addition to arterial tortuosity and connective tissue features, displays stomach displacement within the thorax and bilateral hip dislocation. Clinical details of this patient have been reported previously. Sequencing of the SLC2A10 gene identified a novel homozygous nonsense mutation, c.756C>A, in this patient's DNA. This mutation replaces a cysteine-encoding codon with a stop codon and is believed to cause premature truncation of the GLUT10 protein in this patient. We conclude that patients of Kurdish origin who display arterial tortuosity associated with skin hyperextensibility, joint hypermobility, and characteristic facial features may carry mutations in the SLC2A10 gene.
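The numbers in c.756C>A are enough to locate the change within its codon. We assume standard HGVS coding-DNA numbering, in which c.1 is the A of the ATG start codon. Under that assumption, position 756 is the third base of codon 252. The only cysteine codon with C in that position is TGC, so the substitution turns TGC into the stop codon TGA. The short sketch below reproduces that arithmetic. The protein-level change it implies (p.Cys252*) is our inference, not a statement from the abstract.

```python
# Subset of the standard genetic code needed here.
CODON_TABLE = {"TGT": "Cys", "TGC": "Cys", "TGA": "Stop", "TGG": "Trp"}


def locate(c_position):
    """Map a coding-DNA position to (codon number, position within codon), both 1-based."""
    return (c_position - 1) // 3 + 1, (c_position - 1) % 3 + 1


def substitute(codon, pos_in_codon, new_base):
    """Return `codon` with the base at 1-based `pos_in_codon` replaced."""
    i = pos_in_codon - 1
    return codon[:i] + new_base + codon[i + 1:]


codon_number, pos = locate(756)
# Cysteine codons carrying the reference C at this position that become a stop after C>A.
matches = [
    codon for codon, aa in CODON_TABLE.items()
    if aa == "Cys" and codon[pos - 1] == "C"
    and CODON_TABLE.get(substitute(codon, pos, "A")) == "Stop"
]
print(codon_number, pos, matches)  # 252 3 ['TGC']
```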
Experimental Demonstration of Coherent Control in Quantum Chaotic Systems
NASA Astrophysics Data System (ADS)
Bitter, M.; Milner, V.
2017-01-01
We experimentally demonstrate coherent control of a quantum system, whose dynamics is chaotic in the classical limit. Interaction of diatomic molecules with a periodic sequence of ultrashort laser pulses leads to the dynamical localization of the molecular angular momentum, a characteristic feature of the chaotic quantum kicked rotor. By changing the phases of the rotational states in the initially prepared coherent wave packet, we control the rotational distribution of the final localized state and its total energy. We demonstrate the anticipated sensitivity of control to the exact parameters of the kicking field, as well as its disappearance in the classical regime of excitation.
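In dimensionless units, the classical kicked rotor reduces to the Chirikov standard map: p_{n+1} = p_n + K sin(θ_n) and θ_{n+1} = θ_n + p_{n+1} (mod 2π). For strong kicks, the classical energy keeps growing roughly linearly with the number of kicks. Dynamical localization is the quantum suppression of that growth. The sketch below simulates only the classical side, for an ensemble of rotors. The kick strength, ensemble size and seed are arbitrary choices, and the quantum dynamics of the experiment are not modeled.

```python
import math
import random


def classical_rotor_energy(kick_strength, n_kicks, n_rotors=2000, seed=0):
    """Mean kinetic energy <p^2>/2 after each kick of the classical standard map."""
    rng = random.Random(seed)
    thetas = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_rotors)]
    moms = [0.0] * n_rotors  # all rotors start at rest
    energies = []
    for _ in range(n_kicks):
        for i in range(n_rotors):
            moms[i] += kick_strength * math.sin(thetas[i])       # kick
            thetas[i] = (thetas[i] + moms[i]) % (2.0 * math.pi)  # free rotation
        energies.append(sum(p * p for p in moms) / (2.0 * n_rotors))
    return energies


energies = classical_rotor_energy(kick_strength=5.0, n_kicks=100)
print(round(energies[9]), round(energies[99]))
```

For this strongly chaotic case, one would expect the energy after 100 kicks to be several times the energy after 10 kicks. In the quantum kicked rotor, the energy instead saturates once localization sets in.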
Characteristic features and biotechnological applications of cross-linked enzyme aggregates (CLEAs).
Sheldon, Roger A
2011-11-01
Cross-linked enzyme aggregates (CLEAs) have many economic and environmental benefits in the context of industrial biocatalysis. They are easily prepared from crude enzyme extracts, and the cost of (often expensive) carriers is avoided. They generally exhibit improved storage and operational stability towards denaturation by heat, organic solvents, and autoproteolysis, and are stable towards leaching in aqueous media. Furthermore, they have high catalyst productivities (kilograms of product per kilogram of biocatalyst) and are easy to recover and recycle. Yet another advantage derives from the possibility of co-immobilizing two or more enzymes to provide CLEAs that are capable of catalyzing multiple biotransformations, independently or in sequence as catalytic cascade processes.
Characterization of a new apscaviroid from American persimmon.
Ito, Takao; Suzaki, Koichi; Nakano, Masaaki; Sato, Akihiko
2013-12-01
A unique circular molecule of 358 nucleotides was detected in American persimmon (Diospyros virginiana L.). The molecule was graft-transmissible and had genetic characteristics of members of the genus Apscaviroid. It had the highest sequence similarity (72-73 %) to citrus viroid VI (CVd-VI) and formed a clade with CVd-VI, citrus dwarfing viroid, and apple dimple fruit viroid in a phylogenetic tree. The molecule was not detected in citrus, unlike CVd-VI, which infects citrus and persimmon, and it was genetically distant from persimmon latent viroid, which infects persimmon only. The genetic and biological features indicated that the molecule may be a member of a new apscaviroid species.
Cunningham, Danielle A; Lowe, Lisa H; Shao, Lei; Acosta, Natasha R
2016-08-01
Astroblastoma is a rare tumor of uncertain origin that most commonly presents in the cerebrum of children and young adults. The literature contains only case reports and small series regarding its radiologic features. This systematic review is the largest study of imaging findings of astroblastoma to date and serves to identify features that might differentiate it from other neoplasms. This study describes the imaging features of astroblastoma based on a systematic review of the literature and two new cases. We conducted a PubMed and Google Scholar database search that identified 59 publications containing 125 cases of pathology-confirmed astroblastoma, and we added two new cases from our own institution. Data collected included patient age, gender, tumor location, morphology, calcifications and calvarial changes. We recorded findings on CT, MRI, diffusion-weighted imaging (DWI), MR spectroscopy, positron emission tomography (PET) and catheter angiography. Age at diagnosis ranged from 0 to 70 years (mean 18 years; median 14 years). The female-to-male ratio was 8:1. Of 127 cases, 66 reported CT findings, 78 reported MRI findings and 47 reported both. Not all authors reported all features, but the reported tumor features included supratentorial location in 96% (122/127), superficial location in 72% (48/67), well-demarcated margins in 96% (79/82), mixed cystic-solid morphology in 93% (79/85), and enhancement in 99% (78/79). On CT, 84% (26/31) of astroblastomas were hyperattenuated, 73% (27/37) had calcifications and 7 cases showed adjacent calvarial erosion. On MRI, astroblastomas were hypointense on T1-W images in 58% (26/45) and on T2-W images in 50% (23/46) of cases. Peritumoral edema was present in 80% (40/50) of cases but was typically described as slight. Six cases included DWI findings, all of which showed restricted diffusion. On MR spectroscopy, 100% (5/5) showed nonspecific tumor spectra with elevated choline and decreased N-acetylaspartate (NAA). 
PET revealed nonspecific reduced uptake of [F-18] 2-fluoro-2-deoxyglucose ((18)F-FDG) and increased uptake of [11C]-Methionine in 100% (3/3) of cases. Catheter angiography findings (n=12) were variable, including hypervascularity in 67%, arteriovenous shunting in 33% and avascular areas in 25%. Astroblastomas occur most often in adolescent girls. Imaging often shows a supratentorial, superficial, well-defined, cystic-solid enhancing mass. On CT, most are hyperattenuated, have calcifications, and may remodel adjacent bone if superficial. MRI characteristically reveals a hypointense mass on T1-W and T2-W sequences with restricted diffusion. MR spectroscopy, PET and catheter angiography findings are nonspecific.
Boomsma, Wouter; Nielsen, Sofie V; Lindorff-Larsen, Kresten; Hartmann-Petersen, Rasmus; Ellgaard, Lars
2016-01-01
The ubiquitin-proteasome system targets misfolded proteins for degradation. Since the accumulation of such proteins is potentially harmful for the cell, their prompt removal is important. E3 ubiquitin-protein ligases mediate substrate ubiquitination by bringing together the substrate with an E2 ubiquitin-conjugating enzyme, which transfers ubiquitin to the substrate. For misfolded proteins, substrate recognition is generally delegated to molecular chaperones that subsequently interact with specific E3 ligases. An important exception is San1, a yeast E3 ligase. San1 harbors extensive regions of intrinsic disorder, which provide both conformational flexibility and sites for direct recognition of misfolded targets of vastly different conformations. So far, no mammalian ortholog of San1 is known, nor is it clear whether other E3 ligases utilize disordered regions for substrate recognition. Here, we conduct a bioinformatics analysis to examine >600 human and S. cerevisiae E3 ligases to identify enzymes that are similar to San1 in terms of function and/or mechanism of substrate recognition. An initial sequence-based database search was found to detect candidates primarily based on the homology of their ordered regions, and did not capture the unique disorder patterns that encode the functional mechanism of San1. However, by searching specifically for key features of the San1 sequence, such as long regions of intrinsic disorder embedded with short stretches predicted to be suitable for substrate interaction, we identified several E3 ligases with these characteristics. Our initial analysis revealed that another remarkable trait of San1 is shared with several candidate E3 ligases: long stretches of complete lysine suppression, which in San1 limits auto-ubiquitination. We encode these characteristic features into a San1 similarity-score, and present a set of proteins that are plausible candidates as San1 counterparts in humans. 
In conclusion, our work indicates that San1 is not a unique case, and that several other yeast and human E3 ligases have sequence properties that may allow them to recognize substrates by a mechanism similar to that of San1.
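Of the traits folded into the San1 similarity score, the lysine-suppression feature is simple enough to compute directly: it is the longest stretch of a protein sequence with no lysine (K). The abstract does not state how the score weights this feature against disorder predictions. The sketch below therefore covers only this one feature. It assumes uppercase one-letter residue codes and uses a hypothetical toy sequence.

```python
def longest_lysine_free_stretch(protein):
    """Return (length, start index) of the longest run of residues with no lysine (K)."""
    best_len, best_start = 0, 0
    run_start = 0
    for i, residue in enumerate(protein + "K"):  # trailing K closes the final run
        if residue == "K":
            run_len = i - run_start
            if run_len > best_len:
                best_len, best_start = run_len, run_start
            run_start = i + 1
    return best_len, best_start


print(longest_lysine_free_stretch("MKAAAAKSSSSSSK"))  # (6, 7)
```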
NASA Astrophysics Data System (ADS)
Chin, A.; O'Dowd, A. P.; Mendez, P. K.; Velasco, K. Z.; Leventhal, R. D.; Storesund, R.; Laurencio, L. R.
2014-12-01
Step-pools are important features in fluvial systems. Through energy dissipation, step-pools provide stability in high-energy environments that otherwise may erode and degrade. Although research has focused on geomorphological aspects of step-pool channels, the ecological significance of step-pool streams is increasingly recognized. Step-pool streams often contain higher density and diversity of benthic macroinvertebrates and are critical habitats for organisms such as salmonids and tailed frogs. Step-pools are therefore increasingly used to restore eroding channels and improve ecological conditions. This paper addresses a restoration reach of Wildcat Creek in Berkeley, California, where step-pools were installed in 2012. The design framework recognized step-pool formation as a self-organizing process that produces a rhythmic morphology. After step particles were placed at locations where step-pools are expected to form according to hydraulic theory, the self-organizing approach allowed fluvial processes to refine the rocks into adjusted sequences over time. In addition, a 30-meter "experimental" reach was created to explore the co-evolution of geomorphological and ecological characteristics. After a plane bed channel was constructed, boulders and cobbles piled at the upstream end allowed natural flows to mobilize and sort them into step-pool sequences. Ground surveys and LiDAR recorded the development of step-pool sequences over several seasons. Concurrent sampling of benthic macroinvertebrates documented the formation of biological communities in conjunction with habitat. Biological sampling in an upstream reference reach provided a comparison with the restored reach over time. Results to date show an emergent step-pool channel with steps that segment the plane bed into initial step and pool habitats. 
Biological communities are beginning to form, showing more distinction among habitat types during some seasons, although they do not yet approach reference values at this stage of development. Research over longer timeframes is needed to reveal how biological and physical characteristics may co-organize toward an equilibrium landscape. Such integrated understanding will assist development of innovative restoration designs.
Evaluating, Comparing, and Interpreting Protein Domain Hierarchies
2014-01-01
Arranging protein domain sequences hierarchically into evolutionarily divergent subgroups is important for investigating evolutionary history, for speeding up web-based similarity searches, for identifying sequence determinants of protein function, and for genome annotation. However, whether or not a particular hierarchy is optimal is often unclear, and independently constructed hierarchies for the same domain can often differ significantly. This article describes methods for statistically evaluating specific aspects of a hierarchy, for probing the criteria underlying its construction and for direct comparisons between hierarchies. Information theoretical notions are used to quantify the contributions of specific hierarchical features to the underlying statistical model. Such features include subhierarchies, sequence subgroups, individual sequences, and subgroup-associated signature patterns. Underlying properties are graphically displayed in plots of each specific feature's contributions, in heat maps of pattern residue conservation, in “contrast alignments,” and through cross-mapping of subgroups between hierarchies. Together, these approaches provide a deeper understanding of protein domain functional divergence, reveal uncertainties caused by inconsistent patterns of sequence conservation, and help resolve conflicts between competing hierarchies. PMID:24559108
Huang, Ying; Chen, Shi-Yi; Deng, Feilong
2016-01-01
In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly facilitated our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains challenging. Two main aspects of ab initio gene prediction are the values computed to describe sequence features and the algorithm used to train the discriminant function; various bioinformatic tools employ different combinations of the two. Herein, we briefly review these well-characterized sequence features in eukaryotic genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.
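The first aspect named here is computing values that describe sequence features. As an illustration, the sketch below derives two classic signals used in gene finding: GC content and the length of the longest open reading frame on the forward strand. The choice of features and the toy sequence are ours, not the review's. A real predictor would feed many such values, from both strands, into a trained discriminant function.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def gc_content(seq):
    """Fraction of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0


def longest_orf(seq):
    """Length in nt (stop codon included) of the longest ATG...stop ORF on the forward strand."""
    seq = seq.upper()
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None
    return best


seq = "CCATGAAATTTTAGCC"  # hypothetical toy sequence
print(gc_content(seq), longest_orf(seq))  # 0.375 12
```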
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hudson, Corey M.; Williams, Kelly P.
2014-11-05
We report that the transfer-messenger RNA (tmRNA) and its partner protein SmpB act together to resolve problems that arise when translating bacterial ribosomes reach the end of an mRNA with no stop codon. Their genes have been found in nearly all bacterial genomes and in some organelles. The tmRNA Website serves tmRNA sequences, alignments and feature annotations, and has recently moved to http://bioinformatics.sandia.gov/tmrna/. New features include the software used to find the sequences, an update raising the number of unique tmRNA sequences from 492 to 1716, and a database of SmpB sequences, which are served along with the tmRNA sequence from the same organism.
Applications of alignment-free methods in epigenomics.
Pinello, Luca; Lo Bosco, Giosuè; Yuan, Guo-Cheng
2014-05-01
Epigenetic mechanisms play an important role in the regulation of cell type-specific gene activities, yet how epigenetic patterns are established and maintained remains poorly understood. Recent studies have supported a role of DNA sequences in recruitment of epigenetic regulators. Alignment-free methods have been applied to identify distinct sequence features that are associated with epigenetic patterns and to predict epigenomic profiles. Here, we review recent advances in such applications, including the methods to map DNA sequence to feature space, sequence comparison and prediction models. Computational studies using these methods have provided important insights into the epigenetic regulatory mechanisms.
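A typical first step in alignment-free methods is to map each DNA sequence to a fixed-length k-mer count vector and then compare sequences in that feature space. The sketch below uses cosine similarity for the comparison. Both the value of k and the similarity measure are illustrative assumptions. The reviewed studies use a range of mappings and prediction models.

```python
import math
from collections import Counter
from itertools import product

def kmer_vector(seq, k=3):
    """Map a DNA sequence to a vector of k-mer counts with 4**k dimensions.

    k-mers containing characters other than A/C/G/T (e.g. N) are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]

def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0
```

Because the vector length depends only on k, sequences of different lengths can be compared, or fed to a prediction model, without aligning them first.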
Bokulich, Nicholas A; Kaehler, Benjamin D; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A; Gregory Caporaso, J
2018-05-17
Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit ( https://github.com/caporaso-lab/tax-credit-data ). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
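The naive Bayes approach to taxonomy classification scores a query sequence against per-taxon k-mer frequency profiles. The following self-contained sketch illustrates this general idea. It is not q2-feature-classifier's implementation, which builds on scikit-learn. The class name, the default k, the smoothing constant and the uniform class priors are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def kmers(seq, k):
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

class KmerNaiveBayes:
    """Hypothetical multinomial naive Bayes over k-mer counts, with uniform class priors."""

    def __init__(self, k=4, alpha=1.0):
        self.k, self.alpha = k, alpha

    def fit(self, sequences, labels):
        self.counts = defaultdict(Counter)  # taxon -> k-mer counts
        for seq, label in zip(sequences, labels):
            self.counts[label].update(kmers(seq.upper(), self.k))
        self.vocab = set()
        for c in self.counts.values():
            self.vocab.update(c)
        self.totals = {t: sum(c.values()) for t, c in self.counts.items()}
        return self

    def predict(self, seq):
        words = kmers(seq.upper(), self.k)
        v = len(self.vocab) + 1  # +1 reserves probability mass for unseen k-mers

        def log_likelihood(taxon):
            c, n = self.counts[taxon], self.totals[taxon]
            # Additive (Laplace-style) smoothing keeps unseen k-mers from zeroing the score
            return sum(math.log((c[w] + self.alpha) / (n + self.alpha * v)) for w in words)

        return max(self.counts, key=log_likelihood)

clf = KmerNaiveBayes(k=3).fit(
    ["AAAAAAAAAA", "AAAAAAAAAC", "CCCCCCCCCC", "CCCCCCCCCG"],
    ["taxonA", "taxonA", "taxonC", "taxonC"],
)
```

The smoothing constant plays the same role as the parameters whose tuning the abstract emphasizes. Changing it shifts how strongly rare k-mers influence the assignment.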
Association of a Novel Nonsense Mutation in KIAA1279 with Goldberg-Shprintzen Syndrome.
Salehpour, Shadab; Hashemi-Gorji, Feyzollah; Soltani, Ziba; Ghafouri-Fard, Soudeh; Miryounesi, Mohammad
2017-01-01
Goldberg-Shprintzen syndrome (OMIM 609460) (GOSHS) is an autosomal recessive multiple congenital anomaly syndrome characterized by intellectual disability, microcephaly, and dysmorphic facial features. Most affected individuals also have Hirschsprung disease and/or gyral abnormalities of the brain. This syndrome has been associated with mutations in the KIAA1279 gene at 10q22.1. Here we report a 16-year-old male patient who was referred to the Center for Comprehensive Genetic Services, Tehran, Iran, in 2015 with cardinal features of GOSHS in addition to refractory seizures. Whole-exome sequencing revealed a novel homozygous nonsense (stop-gain) mutation in the KIAA1279 gene (KIAA1279: NM_015634:exon6:c.C976T:p.Q326X). Given the wide range of phenotypic variation in GOSHS, relying on phenotypic characteristics to distinguish GOSHS from similar syndromes may lead to misdiagnosis. Molecular diagnostic tools can therefore help in the accurate diagnosis of such overlapping phenotypes.
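The variant notation c.C976T:p.Q326X can be checked with simple codon arithmetic. Coding (c.) positions are numbered from the A of the ATG start codon, so position 976 falls in codon 326, at its first base. The abstract does not say which glutamine codon (CAA or CAG) is present, so the sketch tries both. Either way, a C-to-T change at the first base produces a stop codon, which is consistent with p.Q326X.

```python
GLN_CODONS = ("CAA", "CAG")
STOP_CODONS = {"TAA", "TAG", "TGA"}

def locate_codon(c_position):
    """Return (codon number, position within codon) for a 1-based c. coordinate."""
    return (c_position - 1) // 3 + 1, (c_position - 1) % 3 + 1

def substitute(codon, position, new_base):
    """Replace the base at a 1-based position within a codon."""
    i = position - 1
    return codon[:i] + new_base + codon[i + 1:]

codon_number, pos_in_codon = locate_codon(976)          # codon 326, first base
mutants = [substitute(c, pos_in_codon, "T") for c in GLN_CODONS]
```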
Description of new genera and species of marine cyanobacteria from the Portuguese Atlantic coast.
Brito, Ângela; Ramos, Vitor; Mota, Rita; Lima, Steeve; Santos, Arlete; Vieira, Jorge; Vieira, Cristina P; Kaštovský, Jan; Vasconcelos, Vitor M; Tamagnini, Paula
2017-06-01
Aiming to increase knowledge of marine cyanobacteria from temperate regions, we previously isolated and characterized 60 strains from the Portuguese foreshore and evaluated their potential to produce secondary metabolites. About 15% of the 16S rRNA gene sequences obtained showed less than 97% similarity to sequences in the databases, revealing novel biodiversity. Herein, seven of these strains were extensively characterized and their classification was re-evaluated. The present study led to the proposal of five new taxa: three genera (Geminobacterium, Lusitaniella, and Calenema) and two species (Hyella patelloides and Jaaginema litorale). Geminobacterium atlanticum LEGE 07459 is a chroococcalean that shares morphological characteristics with other unicellular cyanobacterial genera but has a distinct phylogenetic position and particular ultrastructural features. The description of the Pleurocapsales Hyella patelloides LEGE 07179 includes novel molecular data for members of this genus. The filamentous isolates of Lusitaniella coriacea (LEGE 07167, 07157 and 06111) constitute a very distinct lineage and seem to be ubiquitous on the Portuguese coast. Jaaginema litorale LEGE 07176 has distinct characteristics compared to its marine counterparts, and our analysis indicates that this genus is polyphyletic. The Synechococcales Calenema singularis possesses wider trichomes than Leptolyngbya, and its phylogenetic position reinforces the establishment of this new genus. Copyright © 2017 Elsevier Inc. All rights reserved.
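The 97% similarity cut-off mentioned above is applied to pairwise identity between a query 16S rRNA sequence and its closest database hit. Identity definitions vary between tools, particularly in how gaps are treated. The sketch below assumes the two sequences are already aligned and ignores any column with a gap in either sequence. The function names and that gap convention are illustrative assumptions.

```python
def percent_identity(a, b):
    """Percent identity of two pre-aligned sequences, skipping columns with a gap ('-') in either."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)

def is_potentially_novel(query, best_hit, threshold=97.0):
    """Flag a sequence whose identity to its best database hit falls below the threshold."""
    return percent_identity(query, best_hit) < threshold
```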
Emergence and Spread of Epidemic Multidrug-Resistant Pseudomonas aeruginosa.
Miyoshi-Akiyama, Tohru; Tada, Tatsuya; Ohmagari, Norio; Viet Hung, Nguyen; Tharavichitkul, Prasit; Pokhrel, Bharat Mani; Gniadkowski, Marek; Shimojima, Masahiro; Kirikae, Teruo
2017-12-01
Pseudomonas aeruginosa (P. aeruginosa) is one of the most common nosocomial pathogens worldwide. Although the emergence of multidrug-resistant (MDR) P. aeruginosa is a critical problem in medical practice, the key features involved in the emergence and spread of MDR P. aeruginosa remain unknown. This study utilized whole genome sequence (WGS) analyses to define the population structure of 185 P. aeruginosa clinical isolates from several countries. Of these 185 isolates, 136 were categorized into sequence type (ST) 235, one of the most common types worldwide. Phylogenetic analysis showed that these isolates fell within seven subclades. Each subclade harbors characteristic drug resistance genes and a characteristic genetic background confined to a geographic location, suggesting that clonal expansion following antibiotic exposure is the driving force in generating the population structure of MDR P. aeruginosa. WGS analyses also showed that the substitution rate was markedly higher in ST235 MDR P. aeruginosa than in other strains. Notably, almost all ST235 isolates harbor the specific type IV secretion system and very few or none harbor the CRISPR/CAS system. These findings may help explain the mechanism underlying the emergence and spread of ST235 P. aeruginosa as the predominant MDR lineage. © The Author(s) 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Zarafeta, Dimitra; Moschidi, Danai; Ladoukakis, Efthymios; Gavrilov, Sergey; Chrysina, Evangelia D; Chatziioannou, Aristotelis; Kublanov, Ilya; Skretas, Georgios; Kolisis, Fragiskos N
2016-12-19
Biocatalysts exerting activity against ester bonds have a broad range of applications in modern biotechnology. Here, we have identified a new esterolytic enzyme by screening a metagenomic sample collected from a hot spring in Kamchatka, Russia. Biochemical characterization of the new esterase, termed EstDZ2, revealed that it is highly active against medium-chain fatty acid esters at temperatures between 25 and 60 °C and at pH 7-8. The new enzyme is moderately thermostable, with a half-life of more than six hours at 60 °C, but exhibits exquisite stability against high concentrations of organic solvents. Phylogenetic analysis indicated that EstDZ2 is likely an Acetothermia enzyme that belongs to a new family of bacterial esterases, for which we propose the index XV. One distinctive feature of this new family is the presence of a conserved GHSAG catalytic motif. Multiple sequence alignment, coupled with computational modelling of the three-dimensional structure of EstDZ2, revealed that the enzyme lacks the largest part of the "cap" domain, whose extended structure is characteristic of the closely related Family IV esterases. Thus, EstDZ2 appears to be distinct from known related esterolytic enzymes in terms of both sequence characteristics and three-dimensional structure.
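The GHSAG motif is a specific instance of the broader G-X-S-X-G pentapeptide around the catalytic serine that is widespread among esterases and lipases. The sketch below scans a protein sequence for both the exact motif and the generic pattern. It uses a regular-expression lookahead so that overlapping matches are reported, and it returns 1-based positions. The test sequence is made up for illustration.

```python
import re

def find_motifs(protein):
    """Return 1-based start positions of GHSAG and of the generic G-X-S-X-G pattern."""
    protein = protein.upper()
    exact = [m.start() + 1 for m in re.finditer(r"(?=GHSAG)", protein)]
    generic = [m.start() + 1 for m in re.finditer(r"(?=G.S.G)", protein)]
    return exact, generic

exact, generic = find_motifs("MKLGHSAGLLGDSLG")  # hypothetical sequence
```

A pattern match alone does not establish catalytic function. In the abstract, the motif is interpreted together with the alignment and the structural model.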
Weiss, Karin; Kruszka, Paul; Guillen Sacoto, Maria J; Addissie, Yonit A; Hadley, Donald W; Hadsall, Casey K; Stokes, Bethany; Hu, Ping; Roessler, Erich; Solomon, Beth; Wiggs, Edythe; Thurm, Audrey; Hufnagel, Robert B; Zein, Wadih M; Hahn, Jin S; Stashinko, Elaine; Levey, Eric; Baldwin, Debbie; Clegg, Nancy J; Delgado, Mauricio R; Muenke, Maximilian
2018-01-01
Purpose With improved medical care, some individuals with holoprosencephaly (HPE) are surviving into adulthood. We investigated the clinical manifestations of adolescents and adults with HPE and explored the underlying molecular causes. Methods Participants included 20 subjects 15 years of age and older. Clinical assessments included dysmorphology exams, cognitive testing, swallowing studies, ophthalmic examination, and brain magnetic resonance imaging. Genetic testing included chromosomal microarray, Sanger sequencing for SHH, ZIC2, SIX3, and TGIF, and whole-exome sequencing (WES) of 10 trios. Results Semilobar HPE was the most common subtype of HPE, seen in 50% of the participants. Neurodevelopmental disabilities were found to correlate with HPE subtype. Factors associated with long-term survival included a non-alobar HPE subtype, female gender, and nontypical facial features. Four participants had de novo pathogenic variants in ZIC2. WES analysis of 11 participants did not reveal plausible candidate genes, suggesting complex inheritance in these cases. Indeed, in two probands there was a history of uncontrolled maternal type 1 diabetes. Conclusion Individuals with various HPE subtypes can survive into adulthood, and their neurodevelopmental outcomes are variable. Based on the facial characteristics and molecular evaluations, we suggest that classic genetic causes of HPE may play a smaller role in this cohort.
Ko, Kwan Soo; Yeom, Joon-Sup; Lee, Mi Young; Peck, Kyong Ran
2008-01-01
In this study, we investigated the molecular characteristics of extended-spectrum β-lactamase (ESBL)-producing Klebsiella pneumoniae isolates recovered from an outbreak in a Korean hospital. A new multilocus sequence typing (MLST) scheme for K. pneumoniae based on five housekeeping genes was developed and evaluated for 43 ESBL-producing isolates from the outbreak, 38 surveillance isolates from Korea, and a reference strain. Overall, a total of 37 sequence types (STs) and six clonal complexes (CCs) were identified among the 82 K. pneumoniae isolates. The results of MLST analysis were concordant with those of pulsed-field gel electrophoresis. Most of the outbreak isolates belonged to a single clone (ST2) and produced SHV-1 and CTX-M-14 enzymes, which distinguished them from the K. pneumoniae isolates from other Korean hospitals (ST20 and SHV-12). We also found a different distribution of CCs between ESBL-producing and -nonproducing K. pneumoniae isolates. The MLST method developed in this study could provide unambiguous and well-resolved data for epidemiologic studies of K. pneumoniae. The outbreak isolates showed molecular characteristics different from those of K. pneumoniae isolates from other Korean hospitals. PMID:18303199
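In an MLST scheme, each isolate's allele numbers at the housekeeping loci (five in this scheme) form a profile that maps to a sequence type (ST). STs can then be grouped into clonal complexes. The sketch below assumes a hypothetical ST lookup table. It groups STs that share at least four of five alleles (single-locus variants), linking them transitively. This is one common convention, and the paper's exact CC definition is not stated in the abstract. Programs such as eBURST apply more elaborate rules.

```python
from itertools import combinations

def assign_st(profile, st_table):
    """Look up the ST for an allele profile; None means a new, unassigned profile."""
    return st_table.get(tuple(profile))

def clonal_complexes(st_table, min_shared=4):
    """Group STs linked (transitively) by sharing >= min_shared alleles; singletons are dropped."""
    parent = {st: st for st in st_table.values()}

    def find(x):  # union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (p1, s1), (p2, s2) in combinations(st_table.items(), 2):
        if sum(a == b for a, b in zip(p1, p2)) >= min_shared:
            parent[find(s1)] = find(s2)
    groups = {}
    for st in parent:
        groups.setdefault(find(st), set()).add(st)
    return [g for g in groups.values() if len(g) > 1]

# Hypothetical allele profiles -> ST numbers (not real K. pneumoniae data)
table = {(1, 1, 1, 1, 1): 1, (1, 1, 1, 1, 2): 2,
         (3, 3, 3, 3, 3): 3, (3, 3, 3, 3, 4): 4, (5, 6, 7, 8, 9): 5}
```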
NASA Astrophysics Data System (ADS)
Vye-Brown, C.; Self, S.; Barry, T. L.
2013-03-01
The physical features and morphologies of collections of lava bodies emplaced during single eruptions (known as flow fields) can be used to understand flood basalt emplacement mechanisms. The characteristics and internal features of lava lobes, and the morphologies of whole flow fields, result from the forward propagation, radial spread, and cooling of individual lobes, and they are used as a tool to understand the architecture of extensive flood basalt lavas. The features of three flood basalt flow fields from the Columbia River Basalt Group are presented. These include the Palouse Falls flow field, which is small by flood basalt standards (8,890 km², ~190 km³) and is visualized in three dimensions. The architecture of the Palouse Falls flow field is compared with the complex Ginkgo and the more extensive Sand Hollow flow fields to investigate the degree to which simple emplacement models represent the style, as well as the spatial and temporal development, of flow fields. Evidence from each flow field supports emplacement by inflation as the predominant mechanism producing thick lobes. Inflation enables existing lobes to transmit lava to form new lobes, thus extending the advance and spread of lava flow fields. Minimum emplacement timescales calculated for each flow field are 19.3 years for Palouse Falls, 8.3 years for Ginkgo, and 16.9 years for Sand Hollow. Simple flow fields can be traced from vent to distal areas and an emplacement sequence visualized, but those with multiple-layered lobes present a degree of complexity that makes lava pathways and emplacement sequences more difficult to identify.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saha, Malay C.; Brummer, E. Charles; Kaeppler, Shawn
Switchgrass (Panicum virgatum L.) is a C4 grass with high biomass yield potential and a model species for bioenergy feedstock development. Understanding the genetic basis of quantitative traits is essential to facilitate genome-enabled breeding programs. Nested association mapping (NAM) analysis combines the best features of both bi-parental and association analyses and can provide high power and high resolution in QTL detection, which is expected to enable significant improvements in biomass yield and quality. To develop a NAM population of switchgrass, 15 highly diverse genotypes with specific characteristics were selected from a diversity panel and crossed to a recurrent parent, AP13, a genotype selected for whole genome sequencing and a parent of a mapping population. Ten genotypes from each of the 15 F1 families were then chain crossed. Progenies from each family were randomly selected to develop the NAM population. The switchgrass NAM population consists of a total of 2000 genotypes from 15 families. All the progenies, founder parents, and F1 parents (n=2350) were evaluated in replicated field trials at Ardmore, OK, and Knoxville, TN. Phenotypic data on plant height, tillering ability, regrowth, flowering time, and biomass yield were collected. Dried biomass samples were also analyzed using NIRS prediction equations at the Noble Foundation, and for lignin content, S/G ratio, and sugar release characteristics at NREL. Genomic shotgun sequencing of the 15 switchgrass NAM founder parental genomes at JGI produced 28-66 Gb of high-quality sequence data. Alignment of these sequences with the reference genome, AP13 (v3.0), revealed that up to 99% of the genomic sequences mapped to the reference genome. A total of 2,149 individuals from the NAM populations were sequenced by exome capture, and two sets of 15 SNP matrices (one for each family) were generated. QTL associated with important traits have been identified and verified in breeding populations. 
The QTL detected and their associated markers can be used in molecular breeding programs to facilitate the development of improved switchgrass cultivars for biofuel production.
Rescaled earthquake recurrence time statistics: application to microrepeaters
NASA Astrophysics Data System (ADS)
Goltz, Christian; Turcotte, Donald L.; Abaimov, Sergey G.; Nadeau, Robert M.; Uchida, Naoki; Matsuzawa, Toru
2009-01-01
Slip on major faults primarily occurs during 'characteristic' earthquakes. The recurrence statistics of characteristic earthquakes play an important role in seismic hazard assessment. A major problem in determining applicable statistics is that only short sequences of characteristic earthquakes are available worldwide. In this paper, we introduce a rescaling technique by which sequences can be superimposed to obtain a larger number of data points. We consider the Weibull and log-normal distributions; in both cases we rescale the data using means and standard deviations. We test our approach using sequences of microrepeaters, micro-earthquakes that recur in the same location on a fault. It seems plausible to regard these earthquakes as miniature versions of classic characteristic earthquakes. Microrepeaters are much more frequent than major earthquakes, leading to longer sequences for analysis. In this paper, we present results for the analysis of recurrence times for several microrepeater sequences from Parkfield, CA, and NE Japan. We find that, once a sequence can be considered sufficiently stationary, its statistics can be well fitted by either a Weibull or a log-normal distribution. We clearly demonstrate this with our technique of rescaled combination. We conclude that the recurrence statistics of the microrepeater sequences we consider are similar to the recurrence statistics of characteristic earthquakes on major faults.
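The abstract does not give the exact rescaling transform. The sketch below assumes one simple interpretation: each sequence of recurrence intervals is divided by its own mean, and its coefficient of variation (CV = σ/μ) is recorded. For both the Weibull and the log-normal distribution, the CV depends only on the shape parameter, so sequences with similar CVs can plausibly be pooled into one larger sample. The pooling tolerance `max_cv_spread` and both function names are hypothetical.

```python
import statistics

def normalize(intervals):
    """Divide recurrence intervals by their mean; return (rescaled intervals, coefficient of variation)."""
    mu = statistics.mean(intervals)
    cv = statistics.stdev(intervals) / mu
    return [t / mu for t in intervals], cv

def pool(sequences, max_cv_spread=0.1):
    """Pool rescaled intervals from sequences whose CV lies within max_cv_spread of the median CV."""
    rescaled = [normalize(s) for s in sequences if len(s) >= 3]  # need a few intervals for an SD
    if not rescaled:
        return []
    median_cv = statistics.median(cv for _, cv in rescaled)
    pooled = []
    for values, cv in rescaled:
        if abs(cv - median_cv) <= max_cv_spread:
            pooled.extend(values)
    return pooled

pooled = pool([[10, 12, 8, 10], [100, 120, 80, 100], [1, 50, 2, 3]])
```

The pooled, unit-mean sample could then be fitted with a Weibull or log-normal distribution. In this toy input, the third sequence is excluded because its variability is far from that of the other two.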
Xu, Yi-Hua; Manoharan, Herbert T; Pitot, Henry C
2007-09-01
The bisulfite genomic sequencing technique is one of the most widely used techniques for studying sequence-specific DNA methylation because it can unambiguously reveal DNA methylation status at single-nucleotide resolution. One characteristic feature of the technique is that many sample sequence files are produced from a single DNA sample. The PCR products of bisulfite-treated DNA samples cannot be sequenced directly because they are heterogeneous; therefore, they must be cloned into suitable plasmids and then sequenced. This procedure generates an enormous number of sample DNA sequence files and also adds extra plasmid bases to the sequences, which causes problems in the final sequence comparison. Determining the methylation status of each CpG in each sample sequence is not a trivial task, so CpG PatternFinder was developed for this purpose. The main functions of CpG PatternFinder are: (i) to analyze the reference sequence to obtain position information for CpG and non-CpG C residues; (ii) to tailor sample sequence files (deleting insertions and marking deletions in the sample sequence files) based on a ClustalW multiple alignment; (iii) to align sample sequence files with a reference file to obtain bisulfite conversion efficiency and CpG methylation status; and (iv) to produce graphics, highlighted aligned sequence text, and a summary report that can easily be exported to the Microsoft Office suite. CpG PatternFinder is designed to operate cooperatively with BioEdit, a freeware program available on the Internet. It can handle up to 100 sample DNA sequence files simultaneously, and the entire CpG pattern analysis can be completed in minutes. CpG PatternFinder is an ideal software tool for DNA methylation studies that determine differential methylation patterns in a large number of individuals in a population. 
Previously we developed the CpG Analyzer program; CpG PatternFinder is our further effort to create software tools for DNA methylation studies.
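Steps (i) and (iii) above rest on a simple rule. Bisulfite converts unmethylated C to U (read as T after PCR), while methylated C, typically at CpG sites, stays C. Non-CpG Cs are expected to be unmethylated, so the fraction of them read as T estimates conversion efficiency. The sketch below applies this rule to one sample read. It assumes the read is already aligned to the reference with insertions removed, as step (ii) produces, so the reference itself contains no gaps. It is an illustration, not CpG PatternFinder's code.

```python
def analyze_bisulfite(reference, sample):
    """Call CpG methylation and estimate conversion efficiency for one aligned read.

    Returns ({1-based CpG position: call}, efficiency); efficiency is None if no non-CpG C is covered."""
    reference, sample = reference.upper(), sample.upper()
    if len(reference) != len(sample):
        raise ValueError("reference and sample must be aligned to the same length")
    cpg_calls = {}
    converted, non_cpg_c = 0, 0
    for i, (r, s) in enumerate(zip(reference, sample)):
        if r != "C" or s == "-":  # only reference Cs covered by the read are informative
            continue
        if i + 1 < len(reference) and reference[i + 1] == "G":
            # CpG: retained C = methylated (protected), T = unmethylated (converted)
            cpg_calls[i + 1] = {"C": "methylated", "T": "unmethylated"}.get(s, "ambiguous")
        else:
            non_cpg_c += 1
            if s == "T":
                converted += 1
    efficiency = converted / non_cpg_c if non_cpg_c else None
    return cpg_calls, efficiency

calls, efficiency = analyze_bisulfite("ACGTCCAG", "ACGTTTAG")
```

Running this over every cloned read gives the per-clone CpG patterns that the program summarizes across samples.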
Tamburini, Beth A; Phang, Tzu L; Fosmire, Susan P; Scott, Milcah C; Trapp, Susan C; Duckett, Megan M; Robinson, Sally R; Slansky, Jill E; Sharkey, Leslie C; Cutter, Gary R; Wojcieszyn, John W; Bellgrau, Donald; Gemmill, Robert M; Hunter, Lawrence E; Modiano, Jaime F
2010-11-09
The etiology of hemangiosarcoma remains incompletely understood. Its common occurrence in dogs suggests that predisposing factors favor its development in this species. These factors could represent a constellation of heritable characteristics that promote transformation events and/or facilitate the establishment of a microenvironment that is conducive to the survival of malignant blood vessel-forming cells. The hypothesis for this study was that characteristic molecular features distinguish hemangiosarcoma from non-malignant endothelial cells, and that such features are informative for the etiology of this disease. We first investigated mutations of VHL and Ras family genes that might drive hemangiosarcoma by sequencing tumor DNA and mRNA (cDNA). Protein expression was examined using immunostaining. Next, we evaluated genome-wide gene expression profiling using the Affymetrix Canine 2.0 platform as a global approach to test the hypothesis. Data were evaluated using routine bioinformatics, and validation was done using quantitative real-time RT-PCR. All 10 tumor and four non-tumor samples analyzed had wild-type sequences for these genes. At the genome-wide level, hemangiosarcoma cells clustered separately from non-malignant endothelial cells based on a robust signature that included genes involved in inflammation, angiogenesis, adhesion, invasion, metabolism, cell cycle, signaling, and patterning. This signature did not simply reflect a cancer-associated angiogenic phenotype, as it also distinguished hemangiosarcoma from non-endothelial, moderately to highly angiogenic bone marrow-derived tumors (lymphoma, leukemia, osteosarcoma). The data show that inflammation and angiogenesis are important processes in the pathogenesis of vascular tumors, but a definitive ontogeny of the cells that give rise to these tumors remains to be established. 
The data do not yet distinguish whether functional or ontogenetic plasticity creates this phenotype, although they suggest that cells which give rise to hemangiosarcoma modulate their microenvironment to promote tumor growth and survival. We propose that the frequent occurrence of canine hemangiosarcoma in defined dog breeds, as well as its similarity to homologous tumors in humans, offers unique models to solve the dilemma of stem cell plasticity and whether angiogenic endothelial cells and hematopoietic cells originate from a single cell or from distinct progenitor cells.
PMID:21062482
Yin, Changchuan
2015-04-01
To apply digital signal processing (DSP) methods to DNA sequences, the symbolic sequences must first be mapped to numerical sequences. Effective numerical mappings therefore play a key role in the performance of DSP-based methods such as exon prediction. Although numerous mappings of symbolic DNA sequences to numerical series have been proposed, existing methods do not incorporate the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC), in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal-to-noise ratio (SNR). The optimized GCC representation is then applied to exon and intron prediction using a Short-Time Fourier Transform (STFT) approach. The results show that the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein-coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
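For reference, the baseline that the GCC mapping is compared against can be computed as follows. Each base gets a binary indicator sequence (the 4D binary, or Voss, mapping). The total power spectrum is the sum of the squared DFT magnitudes of the four indicators, and the 3-periodicity SNR is taken here as the power at frequency index N/3 divided by the mean power over all indices. That SNR definition is an assumption, since variants exist. The optimized GCC values are not given in the abstract and are not implemented. A direct O(N²) DFT keeps the sketch dependency-free.

```python
import cmath

def period3_snr(seq):
    """Period-3 SNR of a DNA sequence under the 4-D binary (Voss) indicator mapping."""
    seq = seq.upper()
    n = len(seq)
    if n % 3:
        raise ValueError("length must be a multiple of 3 so that N/3 is an integer frequency index")

    def power(k):
        total = 0.0
        for base in "ACGT":
            # DFT of the indicator sequence u_base[i] = 1 if seq[i] == base else 0
            x = sum(cmath.exp(-2j * cmath.pi * k * i / n) for i, b in enumerate(seq) if b == base)
            total += abs(x) ** 2
        return total

    spectrum = [power(k) for k in range(n)]
    return spectrum[n // 3] / (sum(spectrum) / n)
```

For a perfectly periodic toy sequence such as `"ATG" * 20`, all non-zero power lies at indices 0, N/3 and 2N/3, and the SNR equals 20. In the STFT approach, the same quantity is computed in a sliding window along a genome, and windows with high values suggest coding regions.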
Detection of distorted frames in retinal video-sequences via machine learning
NASA Astrophysics Data System (ADS)
Kolar, Radim; Liberdova, Ivana; Odstrcilik, Jan; Hracho, Michal; Tornow, Ralf P.
2017-07-01
This paper describes the detection of distorted frames in retinal video sequences based on a set of global features extracted from each frame. The feature vector is then used in a classification step, in which three types of classifiers are tested. The best classification accuracy, 96%, was achieved with a support vector machine.
Ballif, Blake C.; Theisen, Aaron; Rosenfeld, Jill A.; Traylor, Ryan N.; Gastier-Foster, Julie; Thrush, Devon Lamb; Astbury, Caroline; Bartholomew, Dennis; McBride, Kim L.; Pyatt, Robert E.; Shane, Kate; Smith, Wendy E.; Banks, Valerie; Gallentine, William B.; Brock, Pamela; Rudd, M. Katharine; Adam, Margaret P.; Keene, Julia A.; Phillips, John A.; Pfotenhauer, Jean P.; Gowans, Gordon C.; Stankiewicz, Pawel; Bejjani, Bassem A.; Shaffer, Lisa G.
2010-01-01
Segmental duplications, which comprise ∼5%–10% of the human genome, are known to mediate medically relevant deletions, duplications, and inversions through nonallelic homologous recombination (NAHR) and have been suggested to be hot spots in chromosome evolution and human genomic instability. We report seven individuals with microdeletions at 17q23.1q23.2, identified by microarray-based comparative genomic hybridization (aCGH). Six of the seven deletions are ∼2.2 Mb in size and flanked by large segmental duplications of >98% sequence identity and in the same orientation. One of the deletions is ∼2.8 Mb in size and is flanked on the distal side by a segmental duplication, whereas the proximal breakpoint falls between segmental duplications. These characteristics suggest that NAHR mediated six out of seven of these rearrangements. These individuals have common features, including mild to moderate developmental delay (particularly speech delay), microcephaly, postnatal growth retardation, heart defects, and hand, foot, and limb abnormalities. Although all individuals had at least mild dysmorphic facial features, there was no characteristic constellation of features that would elicit clinical suspicion of a specific disorder. The identification of common clinical features suggests that microdeletions at 17q23.1q23.2 constitute a novel syndrome. Furthermore, the inclusion in the minimal deletion region of TBX2 and TBX4, transcription factors belonging to a family of genes implicated in a variety of developmental pathways including those of heart and limb, suggests that these genes may play an important role in the phenotype of this emerging syndrome. PMID:20206336
Grandi, Nicole; Cadeddu, Marta; Blomberg, Jonas; Tramontano, Enzo
2016-09-09
Human endogenous retroviruses (HERVs) are ancient sequences integrated into germ-line cells and vertically transmitted to offspring; they constitute about 8% of our genome. Over time, HERVs accumulated mutations that compromised their coding capacity. A prominent exception is the HERV-W locus at 7q21.2, which produces a functional Env protein (Syncytin-1) co-opted for placental syncytiotrophoblast formation. While expression of HERV-W sequences has been investigated for correlation with disease, an exhaustive description of the group's composition and characteristics is still not available, and current information on the HERV-W group derives from earlier studies that necessarily used the rough human genome assemblies available at the time. This hampers comparison and correlation with current human genome assemblies. In the present work, we identified 213 HERV-W elements and described their distribution and genetic composition in detail. The bioinformatics analysis led to the characterization of several previously unreported features and provided a phylogenetic classification into two main subgroups with different ages and structural characteristics. New findings on the genomic context of HERV-W insertions and their co-localization with sequences putatively involved in disease development are also reported. The present work is a detailed overview of the HERV-W contribution to the human genome and provides a robust genetic background for clarifying the role of HERV-W in pathologies of poorly understood etiology, representing, to our knowledge, the most complete and exhaustive HERV-W dataset to date.
Deterministic folding: The role of entropic forces and steric specificities
NASA Astrophysics Data System (ADS)
da Silva, Roosevelt A.; da Silva, M. A. A.; Caliri, A.
2001-03-01
The inverse folding problem of proteinlike macromolecules is studied by using a lattice Monte Carlo (MC) model in which steric specificities (nearest-neighbor constraints) are included and the hydrophobic effect is treated explicitly by considering interactions between the chain and solvent molecules. Chemical attributes and steric peculiarities of the residues are encoded in a 10-letter alphabet, and a corresponding "syntax" is provided in order to write suitable sequences for the specified target structures; twenty-four target configurations, chosen to cover all possible values of the average contact order χ (0.2381⩽χ⩽0.4947 for this system), were encoded and analyzed. The results, obtained by MC simulations, are strongly influenced by geometrical properties of the native configuration, namely χ and the relative number φ of crankshaft-type structures. For χ<0.35 the folding is deterministic, that is, the syntax is able to encode successful sequences: the system presents larger encodability, minimum sequence-target degeneracies, and a smaller characteristic folding time τf. For χ⩾0.35 these results are no longer reproduced: the folding success is severely reduced and shows a strong correlation with φ. Additionally, the existence of distinct characteristic folding times suggests that different mechanisms act simultaneously in the folding process. The results (all obtained from the same single model, under the same "physiological conditions") resemble some general features of the folding problem, supporting the premise that steric specificities, in association with entropic forces (the hydrophobic effect), are basic ingredients of the protein folding process.
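The abstract classifies targets by their average contact order χ. The exact normalization used for χ in this lattice model is not given here, so the sketch below assumes the relative contact order common in the protein folding literature: the mean chain separation |i - j| of non-bonded lattice contacts, divided by the chain length. The function name and the example conformations are hypothetical.

```python
from itertools import combinations


def relative_contact_order(coords):
    """Relative contact order of a lattice conformation (assumed definition).

    coords: integer (x, y, z) tuples, one per residue, in chain order.
    A contact is a pair of residues that are lattice nearest neighbours
    (Manhattan distance 1) but not bonded along the chain (|i - j| >= 2).
    Returns sum(|i - j|) / (chain_length * number_of_contacts), or 0.0
    if the conformation has no contacts.
    """
    n = len(coords)
    separations = []
    for i, j in combinations(range(n), 2):
        if j - i < 2:
            continue  # bonded neighbours are not counted as contacts
        distance = sum(abs(a - b) for a, b in zip(coords[i], coords[j]))
        if distance == 1:
            separations.append(j - i)
    if not separations:
        return 0.0
    return sum(separations) / (n * len(separations))


# Hypothetical 6-residue hairpin on a square lattice (z = 0).
hairpin = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (2, 1, 0), (1, 1, 0), (0, 1, 0)]
print(relative_contact_order(hairpin))
```

For this hairpin the two contacts (residues 1 and 6, residues 2 and 5, counting from 1) have separations 5 and 3, so the value is (5 + 3) / (6 · 2) ≈ 0.67. Low values indicate mostly local contacts, which is the regime the abstract associates with deterministic folding.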
Geology of the Devonian black shales of the Appalachian Basin
Roen, J.B.
1984-01-01
Black shales of Devonian age in the Appalachian Basin are a unique rock sequence. Their high content of organic matter, which imparts the characteristic lithology, has for years attracted considerable interest in the shales as a possible source of energy. The recent energy shortage prompted the U.S. Department of Energy, through the Eastern Gas Shales Project of the Morgantown Energy Technology Center, to underwrite a research program to determine the geologic, geochemical, and structural characteristics of the Devonian black shales in order to enhance the recovery of gas from the shales. Geologic studies by Federal and State agencies and academic institutions produced a regional stratigraphic network that correlates the 15-ft black shale sequence in Tennessee with 3,000 ft of interbedded black and gray shales in central New York. These studies correlate the classic Devonian black shale sequence in New York with the Ohio Shale of Ohio and Kentucky and the Chattanooga Shale of Tennessee and southwestern Virginia. Biostratigraphic and lithostratigraphic markers in conjunction with gamma-ray logs facilitated long-range correlations within the Appalachian Basin. Basinwide correlations, including the subsurface rocks, provided a basis for determining the areal distribution and thickness of the important black shale units. The organic carbon content of the dark shales generally increases from east to west across the basin and is sufficient to qualify them as hydrocarbon source rocks. Significant structural features that involve the black shales and their hydrocarbon potential are the Rome trough, the Kentucky River and Irvine-Paint Creek fault zone, and regional décollements and ramp zones. © 1984.
Draft genome of the reindeer (Rangifer tarandus).
Li, Zhipeng; Lin, Zeshan; Ba, Hengxing; Chen, Lei; Yang, Yongzhi; Wang, Kun; Qiu, Qiang; Wang, Wen; Li, Guangyu
2017-12-01
The reindeer (Rangifer tarandus) is the only fully domesticated species in the Cervidae family, and it is the only cervid with a circumpolar distribution. Unlike all other cervids, female reindeer, as well as males, regularly grow cranial appendages (antlers, the defining characteristic of cervids). Moreover, reindeer milk contains more protein and less lactose than the milk of bovids. A high-quality reference genome of this species will assist efforts to elucidate these and other important features of the reindeer. We obtained 615 Gb (gigabases) of usable sequence by filtering low-quality reads from the raw data generated on the Illumina HiSeq 4000 platform, and produced a 2.64-Gb final assembly, representing 95.7% of the estimated genome (2.76 Gb according to k-mer analysis) and including 92.6% of expected genes according to BUSCO analysis. The contig N50 and scaffold N50 sizes were 89.7 kilobases (kb) and 0.94 megabases (Mb), respectively. We annotated 21 555 protein-coding genes and 1.07 Gb of repetitive sequences by de novo and homology-based prediction. Homology-based searches detected 159 rRNA, 547 miRNA, 1339 snRNA, and 863 tRNA sequences in the genome of R. tarandus. The divergence time between R. tarandus and the ancestors of Bos taurus and Capra hircus is estimated at about 29.5 million years ago. Our results provide the first high-quality reference genome for the reindeer and a valuable resource for studying the evolution, domestication, and other unusual characteristics of the reindeer. © The Authors 2017. Published by Oxford University Press.
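The contig and scaffold N50 values quoted above are standard assembly statistics: the length L such that sequences of length L or longer together contain at least half of the assembled bases. A minimal sketch of the computation follows; the function name and the contig lengths are hypothetical and not taken from the reindeer assembly.

```python
def n50(lengths):
    """N50 of a collection of contig or scaffold lengths.

    Lengths are sorted from longest to shortest and summed; the N50 is the
    length at which the running total first reaches half of all bases.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe form of running >= total / 2
            return length


# Hypothetical contig lengths in kb.
contigs = [80, 70, 50, 40, 30, 20]
print(n50(contigs))
```

Here the total is 290 kb and the two longest contigs already reach 150 kb, so the N50 is 70 kb. The same function applies to scaffold lengths; the reported 95.7% coverage is simply the ratio 2.64 Gb / 2.76 Gb.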
A novel NHS mutation causes Nance-Horan Syndrome in a Chinese family.
Tian, Qi; Li, Yunping; Kousar, Rizwana; Guo, Hui; Peng, Fenglan; Zheng, Yu; Yang, Xiaohua; Long, Zhigao; Tian, Runyi; Xia, Kun; Lin, Haiying; Pan, Qian
2017-01-07
Nance-Horan Syndrome (NHS) (OMIM: 302350) is a rare X-linked developmental disorder characterized by bilateral congenital cataracts, with occasional dental anomalies, characteristic dysmorphic features, brachymetacarpia and mental retardation. Carrier females exhibit similar manifestations that are less severe than in affected males. Here, we report a four-generation Chinese family with multiple individuals affected by Nance-Horan Syndrome. Whole-exome sequencing combined with RT-PCR and Sanger sequencing was used to search for a genetic cause underlying the disease phenotype. Whole-exome sequencing identified in all affected individuals of the family a novel donor splicing site mutation (NM_198270: c.1045 + 2T > A) in intron 4 of the gene NHS, which maps to chromosome Xp22.13. The identified mutation results in an RNA processing defect causing a 416-nucleotide addition to exon 4 of the mRNA transcript, likely producing a truncated NHS protein. The donor splicing site mutation NM_198270: c.1045 + 2T > A of the NHS gene is the causative mutation in this Nance-Horan Syndrome family. This research broadens the spectrum of NHS gene mutations, contributing to our understanding of the molecular genetics of NHS.
kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets
Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.
2013-01-01
Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147
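kmer-SVM represents each sequence by counts of short k-mers before training the SVM. The sketch below shows only that feature-extraction step, under two assumptions not stated in the abstract: features are raw counts, and each k-mer is merged with its reverse complement (a common convention for double-stranded DNA). The function names are hypothetical and do not correspond to the web server's interface; the classifier itself is not shown.

```python
from collections import Counter
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def canonical(kmer):
    """Lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def kmer_features(seq, k):
    """Count canonical k-mers in seq, skipping windows with non-ACGT bases.

    Returns a dict over every canonical k-mer (zeros included), so that
    different sequences map to vectors over the same feature set.
    """
    seq = seq.upper()
    keys = sorted({canonical("".join(p)) for p in product("ACGT", repeat=k)})
    counts = Counter()
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if set(window) <= set("ACGT"):
            counts[canonical(window)] += 1
    return {key: counts[key] for key in keys}


features = kmer_features("ACGTTGCAACGT", 3)  # hypothetical enhancer fragment
print({kmer: n for kmer, n in features.items() if n})
```

Merging reverse complements makes the features strand-independent, which suits ChIP-seq or DNase-seq peaks whose binding-site orientation is unknown; the resulting fixed-length vectors are what a linear or kernel SVM would consume.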
Distribution of genotype network sizes in sequence-to-structure genotype-phenotype maps.
Manrubia, Susanna; Cuesta, José A
2017-04-01
An essential quantity to ensure evolvability of populations is the navigability of the genotype space. Navigability, understood as the ease with which alternative phenotypes are reached, relies on the existence of sufficiently large and mutually attainable genotype networks. The size of genotype networks (e.g. the number of RNA sequences folding into a particular secondary structure or the number of DNA sequences coding for the same protein structure) is astronomically large in all functional molecules investigated: an exhaustive experimental or computational study of all RNA folds or all protein structures becomes impossible even for moderately long sequences. Here, we analytically derive the distribution of genotype network sizes for a hierarchy of models which successively incorporate features of increasingly realistic sequence-to-structure genotype-phenotype maps. The key feature of these models is the characterization of each phenotype through a prototypical sequence whose sites admit a variable fraction of the letters of the alphabet. Our models interpolate between two limit distributions: a power-law distribution, when the ordering of sites in the prototypical sequence is strongly constrained, and a lognormal distribution, as suggested for RNA, when different orderings of the same set of sites yield different phenotypes. Our main result is the qualitative and quantitative identification of those features of sequence-to-structure maps that lead to different distributions of genotype network sizes. © 2017 The Author(s).
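The idea of a prototypical sequence whose sites admit a fraction of the alphabet can be made concrete with a short calculation. This is a simplified illustration that assumes sites are independent; it is not the authors' derivation, and the symbols A (alphabet size), L (sequence length) and f_i (fraction of letters admitted at site i) are introduced here only for the example.

```latex
S = \prod_{i=1}^{L} \left( f_i A \right) = A^{L} \prod_{i=1}^{L} f_i ,
\qquad
\ln S = L \ln A + \sum_{i=1}^{L} \ln f_i .

% Worked example: A = 4, L = 20, ten unconstrained sites (f_i = 1)
% and ten sites admitting two of the four letters (f_i = 1/2):
S = 4^{20} \cdot 2^{-10} = 2^{30} \approx 1.07 \times 10^{9} .
```

Because ln S is a sum over sites, independent and similarly distributed f_i across phenotypes would make ln S approximately normal by the central limit theorem, that is, S approximately lognormal. The power-law limit described above depends on constraints on site ordering, which this toy calculation ignores.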
kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.
Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S; Beer, Michael A
2013-07-01
Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.
Music-Elicited Emotion Identification Using Optical Flow Analysis of Human Face
NASA Astrophysics Data System (ADS)
Kniaz, V. V.; Smirnova, Z. N.
2015-05-01
Human emotion identification from image sequences is in high demand. Possible applications range from the automatic smile shutter function of consumer-grade digital cameras to Biofied Building technologies, which enable communication between a building space and its residents. The highly perceptual nature of human emotions makes their classification and identification complex. The main difficulty arises from the subjective quality of the emotional classification of events that elicit human emotions. A variety of methods for the formal classification of emotions have been developed in musical psychology. This work focuses on the identification of human emotions evoked by musical pieces using human face tracking and optical flow analysis. A facial feature tracking algorithm for estimating facial feature speed and position is presented. Facial features were extracted from each image sequence using human face tracking with local binary pattern (LBP) features. Accurate relative speeds of facial features were estimated using optical flow analysis. The obtained relative positions and speeds were used as the output facial emotion vector. The algorithm was tested using original software and recorded image sequences. The proposed technique is shown to give robust identification of human emotions elicited by musical pieces. The estimated models could be used for human emotion identification from image sequences in fields such as emotion-based musical backgrounds or mood-dependent radio.
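The face tracker relies on local binary pattern (LBP) features. The sketch below implements the basic 3 × 3 LBP operator, not the authors' tracker: each of the eight neighbours is compared with the centre pixel and contributes one bit when it is at least as bright. The clockwise bit order starting at the top-left neighbour is one common convention and is an assumption here; the function names and images are hypothetical.

```python
# Clockwise neighbour offsets (row, col), starting at the top-left pixel.
OFFSETS = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
           (1, 1), (1, 0), (1, -1), (0, -1)]


def lbp_code(image, r, c):
    """8-bit LBP code of interior pixel (r, c) in a 2-D list of grey levels.

    Neighbour k sets bit 2**k when its value is >= the centre value.
    """
    centre = image[r][c]
    code = 0
    for k, (dr, dc) in enumerate(OFFSETS):
        if image[r + dr][c + dc] >= centre:
            code |= 1 << k
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            hist[lbp_code(image, r, c)] += 1
    return hist


patch = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]  # hypothetical grey-level patch
print(lbp_code(patch, 1, 1))
```

Because each bit depends only on the sign of a local difference, LBP codes are insensitive to monotonic brightness changes, which is why histograms of them over face regions are a popular tracking descriptor. The optical flow stage is not shown.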
Maximum entropy methods for extracting the learned features of deep neural networks.
Finnegan, Alex; Song, Jun S
2017-10-01
New architectures of multilayer artificial neural networks and new methods for training them are rapidly revolutionizing the application of machine learning in diverse fields, including business, social science, physical sciences, and biology. Interpreting deep neural networks, however, currently remains elusive, and a critical challenge lies in understanding which meaningful features a network is actually learning. We present a general method for interpreting deep neural networks and extracting network-learned features from input data. We describe our algorithm in the context of biological sequence analysis. Our approach, based on ideas from statistical physics, samples from the maximum entropy distribution over possible sequences, anchored at an input sequence and subject to constraints implied by the empirical function learned by a network. Using our framework, we demonstrate that local transcription factor binding motifs can be identified from a network trained on ChIP-seq data and that nucleosome positioning signals are indeed learned by a network trained on chemical cleavage nucleosome maps. Imposing a further constraint on the maximum entropy distribution also allows us to probe whether a network is learning global sequence features, such as the high GC content in nucleosome-rich regions. This work thus provides valuable mathematical tools for interpreting and extracting learned features from feed-forward neural networks.
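The sampling idea can be illustrated with a Metropolis sampler. Assuming the network constraint is imposed as a fixed expected squared deviation of a score f from its value at the input sequence x0, the maximum-entropy distribution has the form p(x) ∝ exp(-λ (f(x) - f(x0))²). In the sketch, `gc_fraction` is a hypothetical stand-in for a trained network's output, and λ (`lam`) is an assumed constant rather than one fitted to a target constraint value as a full implementation would require; this is not the authors' code.

```python
import math
import random

ALPHABET = "ACGT"


def gc_fraction(seq):
    """Toy stand-in for a trained network's output: fraction of G/C bases."""
    return sum(base in "GC" for base in seq) / len(seq)


def maxent_sample(x0, score, lam, n_steps, seed=0):
    """Metropolis sampling from p(x) proportional to exp(-lam * (score(x) - score(x0))**2).

    Each proposal mutates one uniformly chosen site to one of the three other
    bases; this proposal is symmetric, so the plain Metropolis rule applies.
    Returns one sampled sequence per step.
    """
    rng = random.Random(seed)
    target = score(x0)
    current = list(x0)
    energy = 0.0  # the anchor sequence has zero deviation from itself
    samples = []
    for _ in range(n_steps):
        i = rng.randrange(len(current))
        old = current[i]
        current[i] = rng.choice([b for b in ALPHABET if b != old])
        new_energy = lam * (score("".join(current)) - target) ** 2
        if new_energy <= energy or rng.random() < math.exp(energy - new_energy):
            energy = new_energy  # accept the mutation
        else:
            current[i] = old  # reject and restore the previous base
        samples.append("".join(current))
    return samples


x0 = "GCGCATATGC"  # hypothetical input sequence with GC fraction 0.6
samples = maxent_sample(x0, gc_fraction, lam=200.0, n_steps=2000, seed=1)
print(sum(gc_fraction(s) for s in samples) / len(samples))
```

Larger values of `lam` keep samples closer to the input's score. Positions whose bases vary freely across samples are ones the score ignores, while conserved positions point to features the score depends on.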
Wang, ShaoPeng; Zhang, Yu-Hang; Huang, GuoHua; Chen, Lei; Cai, Yu-Dong
2017-01-01
Myristoylation is an important hydrophobic post-translational modification that is covalently bound to the amino group of Gly residues at the N-terminus of proteins. The many diverse functions of myristoylation on proteins, such as membrane targeting, signal pathway regulation and apoptosis, are largely due to the lipid modification, whereas abnormal or irregular myristoylation of proteins can lead to several pathological changes in the cell. To better understand the function of myristoylated sites and to correctly identify them in protein sequences, this study conducted a novel computational investigation of myristoylation site identification. A training dataset with 196 positive and 84 negative peptide segments was obtained. Four types of features derived from the peptide segments following the myristoylation sites were used to describe myristoylated and non-myristoylated sites. Then, feature selection methods, including maximum relevance minimum redundancy (mRMR) and incremental feature selection (IFS), together with a machine learning algorithm (the extreme learning machine), were adopted to extract the optimal features for identifying myristoylation sites in protein sequences, thereby building an optimal prediction model. As a result, 41 key features were extracted and used to build the optimal prediction model. The effectiveness of the optimal prediction model was further validated by its performance on a test dataset. Furthermore, detailed analyses were performed on the 41 extracted features to gain insight into the mechanism of myristoylation modification. This study provides a new computational method for identifying myristoylation sites in protein sequences, which we believe can be a useful prediction tool. Copyright © Bentham Science Publishers.
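The mRMR step can be sketched for discrete features: features are ranked greedily by their mutual information with the class label minus their mean mutual information with features already chosen (the difference form of the criterion). IFS would then train the classifier (an extreme learning machine in the study) on the top 1, 2, 3, ... ranked features and keep the best-performing prefix; that evaluation loop is omitted. The data and function names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(x, y):
    """Plug-in estimate of I(X; Y) in nats for two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    # Sum over observed pairs of p(a, b) * log(p(a, b) / (p(a) * p(b))).
    return sum((c / n) * math.log(c * n / (px[a] * py[b])) for (a, b), c in joint.items())


def mrmr_rank(features, labels):
    """Greedy mRMR ranking (difference form) of discrete feature columns.

    At each step, pick the unselected feature maximizing
    I(f; labels) - mean over selected s of I(f; s).
    """
    relevance = [mutual_information(f, labels) for f in features]
    remaining = list(range(len(features)))
    selected = []

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
        return relevance[j] - redundancy / len(selected)

    while remaining:
        best = max(remaining, key=score)  # max keeps the first of equal scores
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical binary features for 8 peptide segments; f_b duplicates f_a.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f_a = [0, 0, 0, 0, 1, 1, 1, 0]
f_b = list(f_a)
f_c = [0, 0, 1, 0, 1, 1, 1, 1]
print(mrmr_rank([f_a, f_b, f_c], labels))
```

The duplicated feature f_b ranks last even though its relevance equals that of f_a, because its redundancy with the already selected f_a outweighs that relevance. This is the behaviour that lets mRMR produce a compact ranked list for IFS to scan.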
Zhou, Ligang; Zhou, Meixian; Sun, Chaomin; Han, Jing; Lu, Qiuhe; Zhou, Jian; Xiang, Hua
2008-08-01
The precise nick site in the double-strand origin (DSO) of pZMX201, a 1,668-bp rolling-circle replication (RCR) plasmid from the haloarchaeon Natrinema sp. CX2021, was determined by electron microscopy and DSO mapping. In this plasmid, DSO nicking occurred between residues C404 and G405 within a heptanucleotide sequence (TCTC/GGC) located in the stem region of an imperfect hairpin structure. This nick site sequence was conserved among the haloarchaeal RCR plasmids, including pNB101, suggesting that the DSO nick site might be the same for all members of this plasmid family. Interestingly, the DSOs of pZMX201 and pNB101 were found to be cross-recognized in RCR initiation and termination in a hybrid plasmid system. Mutation analysis of the DSO from pZMX201 (DSO(Z)) in this hybrid plasmid system revealed that: (i) the nucleotides in the middle of the conserved TCTCGGC sequence play more important roles in the initiation and termination process; (ii) the left half of the hairpin structure is required for initiation but not for termination; and (iii) a 36-bp sequence containing TCTCGGC and the downstream sequence is essential and sufficient for termination. In conclusion, these haloarchaeal plasmids, with novel features that differ from the characteristics of both single-stranded DNA phages and bacterial RCR plasmids, might serve as a good model for studying the evolution of RCR replicons.
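The nick-site rule lends itself to a simple scan. The sketch below reports, for each exact occurrence of the conserved heptamer TCTCGGC, the 1-based positions of the two residues flanking the nick (TCTC/GGC), mirroring the C404/G405 notation. It checks only the given strand and ignores the hairpin context, so it flags candidate sites rather than confirmed DSOs; the sequence and function name are hypothetical.

```python
MOTIF = "TCTCGGC"
NICK_AFTER = 4  # the nick follows the 4th motif base: TCTC/GGC


def nick_sites(seq):
    """1-based (left, right) residue pairs flanking each predicted DSO nick.

    Scans the given strand only, for exact (possibly overlapping) matches
    of the conserved heptamer; matching is case-insensitive.
    """
    seq = seq.upper()
    sites = []
    start = seq.find(MOTIF)  # 0-based start of the first match, or -1
    while start != -1:
        left = start + NICK_AFTER  # 0-based start + 4 = 1-based index of the 4th base
        sites.append((left, left + 1))
        start = seq.find(MOTIF, start + 1)
    return sites


print(nick_sites("aatctcggctt"))  # hypothetical sequence
```

Under this convention, a motif whose first T sits at position 401 yields the pair (404, 405), matching the C404/G405 nick mapped in pZMX201. Scanning the reverse complement as well would be needed to find sites on the other strand.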
Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning
2014-01-01
X-ray crystallography is the primary approach to solving the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of its multi-step experimental procedure, including sequence cloning, protein material production, purification, crystallization and, ultimately, structure determination, to yield diffraction-quality crystals. Accordingly, predicting from the protein sequence the propensity of a protein to successfully undergo these experimental procedures may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge of the important determinants of the propensity of a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most existing methods perform worse when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as a benchmark and subsequently developed a new approach, termed 'PredPPCrys', using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance for the five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets.
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.
Emergence of biological organization through thermodynamic inversion.
Kompanichenko, Vladimir
2014-01-01
Biological organization arises under thermodynamic inversion in prebiotic systems in which the contributions of free energy and information prevail over the entropy contribution. The inversion might occur under specific far-from-equilibrium conditions in prebiotic systems oscillating around the bifurcation point. At the moment of inversion, (physical) information characteristic of non-biological systems acquires new features: functionality, purposefulness, and control over life processes, which transform it into biological information. Random sequences of amino acids and nucleotides, spontaneously synthesized in the prebiotic microsystem, re-assemble in the primary living unit (probiont) into functional sequences that become involved in bioinformation circulation through nucleoprotein interaction, resulting in the emergence of the genetic code. According to the proposed concept, oscillating three-dimensional prebiotic microsystems transformed into probionts in the changeable hydrothermal medium of the early Earth. The inversion concept states that spontaneous (accidental, random) transformations in prebiotic systems cannot produce life; only non-spontaneous (prospective, purposeful) transformations, which are the result of thermodynamic inversion, lead to the negentropy conversion of prebiotic systems into initial living units.
[Magnetic resonance for the study of osteosarcoma].
Spina, V; Romagnoli, R; Manfrini, M; Cerofolini, E; Capanna, R; Gaiani, L; Calandra Buonaura, P; Picci, P; Campanacci, M
1991-01-01
The authors report their experience with MR imaging in the study of osteosarcoma. Two main elements were evaluated: signal characteristics and loco-regional staging. Seventy-one patients were studied: 65 had central long-bone osteosarcoma and 6 had telangiectatic long-bone osteosarcoma. T1- and T2-weighted spin-echo sequences were employed, and all cases were scanned in 3 planes (sagittal, coronal, and axial). In 28 patients MR imaging was performed both before and after preoperative chemotherapy. The data obtained were compared with surgical and pathological findings. With the exception of the typical signal pattern of quite-osteoblastic osteosarcoma (low signal on both T1- and T2-weighted sequences), no particular signal features were observed that could help distinguish the different types of osteosarcoma. MR imaging is the method of choice for loco-regional staging because, in our series, it allowed rational and adequate surgical planning. For this purpose, at least a longitudinal T1-weighted image and an axial T2-weighted image are required.
Rajan, Arunkumar Chitteth; Rezapour, Mohammad Reza; Yun, Jeonghun; Cho, Yeonchoo; Cho, Woo Jong; Min, Seung Kyu; Lee, Geunsik; Kim, Kwang S
2014-02-25
Laser-driven molecular spectroscopy of low spatial resolution is widely used, while electronic current-driven molecular spectroscopy of atomic scale resolution has been limited because currents provide only minimal information. However, electron transmission of a graphene nanoribbon on which a molecule is adsorbed shows molecular fingerprints of Fano resonances, i.e., characteristic features of frontier orbitals and conformations of physisorbed molecules. Utilizing these resonance profiles, here we demonstrate two-dimensional molecular electronics spectroscopy (2D MES). The differential conductance with respect to bias and gate voltages not only distinguishes different types of nucleobases for DNA sequencing but also recognizes methylated nucleobases which could be related to cancerous cell growth. This 2D MES could open an exciting field to recognize single molecule signatures at atomic resolution. The advantages of the 2D MES over the one-dimensional (1D) current analysis can be comparable to those of 2D NMR over 1D NMR analysis.
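Fano resonances have a standard lineshape, F(ε) = (q + ε)² / (1 + ε²), with reduced energy ε = 2(E - E_r)/Γ, where E_r is the resonance energy, Γ its width and q the asymmetry parameter. A 2D map of differential conductance then amounts to differentiating a current grid I(V_gate, V_bias) along the bias axis. The sketch below shows both pieces with hypothetical names and toy data; it is not the authors' transport calculation.

```python
def fano(energy, e_res, width, q):
    """Fano lineshape (q + eps)**2 / (1 + eps**2) with eps = 2 * (E - E_res) / width."""
    eps = 2.0 * (energy - e_res) / width
    return (q + eps) ** 2 / (1.0 + eps ** 2)


def differential_conductance(current, dv):
    """Finite-difference dI/dV along the bias axis for each gate-voltage row.

    current[g][b] is the current at gate index g and bias index b on a uniform
    bias grid of spacing dv (at least two bias points). Central differences
    are used inside the grid and one-sided differences at its ends.
    """
    result = []
    for row in current:
        n = len(row)
        d = []
        for b in range(n):
            if b == 0:
                d.append((row[1] - row[0]) / dv)
            elif b == n - 1:
                d.append((row[-1] - row[-2]) / dv)
            else:
                d.append((row[b + 1] - row[b - 1]) / (2 * dv))
        result.append(d)
    return result


# Toy data: an ohmic current I = 2.0 * V for three gate voltages, dv = 0.1.
grid = [[2.0 * 0.1 * b for b in range(5)] for _ in range(3)]
print(differential_conductance(grid, 0.1))
print(fano(0.0, 0.0, 2.0, 1.5))
```

The Fano profile vanishes at ε = -q and peaks asymmetrically nearby, so the sign and size of q encode how a molecular orbital interferes with the ribbon's continuum; mapping such features against both bias and gate is what distinguishes a 2D map from a single I-V trace.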
Mechanism for Coordinated RNA Packaging and Genome Replication by Rotavirus Polymerase VP1
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lu, Xiaohui; McDonald, Sarah M.; Tortorici, M. Alejandra
2009-04-08
Rotavirus RNA-dependent RNA polymerase VP1 catalyzes RNA synthesis within a subviral particle. This activity depends on core shell protein VP2. A conserved sequence at the 3' end of plus-strand RNA templates is important for polymerase association and genome replication. We have determined the structure of VP1 at 2.9 Å resolution, as apoenzyme and in complex with RNA. The cage-like enzyme is similar to reovirus λ3, with four tunnels leading to or from a central, catalytic cavity. A distinguishing characteristic of VP1 is specific recognition, by conserved features of the template-entry channel, of four bases, UGUG, in the conserved 3' sequence. Well-defined interactions with these bases position the RNA so that its 3' end overshoots the initiating register, producing a stable but catalytically inactive complex. We propose that specific 3' end recognition selects rotavirus RNA for packaging and that VP2 activates the autoinhibited VP1/RNA complex to coordinate packaging and genome replication.
Genome Structure of the Legume, Lotus japonicus
Sato, Shusei; Nakamura, Yasukazu; Kaneko, Takakazu; Asamizu, Erika; Kato, Tomohiko; Nakao, Mitsuteru; Sasamoto, Shigemi; Watanabe, Akiko; Ono, Akiko; Kawashima, Kumiko; Fujishiro, Tsunakazu; Katoh, Midori; Kohara, Mitsuyo; Kishida, Yoshie; Minami, Chiharu; Nakayama, Shinobu; Nakazaki, Naomi; Shimizu, Yoshimi; Shinpo, Sayaka; Takahashi, Chika; Wada, Tsuyuko; Yamada, Manabu; Ohmido, Nobuko; Hayashi, Makoto; Fukui, Kiichi; Baba, Tomoya; Nakamichi, Tomoko; Mori, Hirotada; Tabata, Satoshi
2008-01-01
The legume Lotus japonicus has been widely used as a model system to investigate the genetic background of legume-specific phenomena such as symbiotic nitrogen fixation. Here, we report structural features of the L. japonicus genome. The 315.1-Mb sequences determined in this and previous studies correspond to 67% of the genome (472 Mb), and are likely to cover 91.3% of the gene space. Linkage mapping anchored 130-Mb sequences onto the six linkage groups. A total of 10 951 complete and 19 848 partial structures of protein-encoding genes were assigned to the genome. Comparative analysis of these genes revealed the expansion of several functional domains and gene families that are characteristic of L. japonicus. Synteny analysis detected traces of whole-genome duplication and the presence of synteny blocks with other plant genomes to various degrees. This study provides the first opportunity to look into the complex and unique genetic system of legumes. PMID:18511435
Neuhaus, H; Link, G
1987-01-01
The trnK gene encoding the tRNALys(UUU) has been located on mustard (Sinapis alba) chloroplast DNA, 263 bp upstream of the psbA gene on the same strand. The nucleotide sequence of the trnK gene and its flanking regions, as well as the putative transcription start and termination sites, are shown. The 5' end of the transcript lies 121 bp upstream of the 5' tRNA coding region and is preceded by prokaryotic-type "-10" and "-35" sequence elements, while the 3' end maps 2.77 kb downstream, to a DNA region with possible stem-loop secondary structure. The anticodon loop of the tRNALys is interrupted by a 2,574-bp intron containing a long open reading frame that codes for 524 amino acids. Based on conserved stem and loop structures, this intron has characteristic features of a class II intron. A region near the carboxyl terminus of the derived polypeptide appears structurally related to maturases.
Building Facade Modeling Under Line Feature Constraint Based on Close-Range Images
NASA Astrophysics Data System (ADS)
Liang, Y.; Sheng, Y. H.
2018-04-01
To solve existing problems in modeling building facades with point features alone from close-range images, a new method for modeling building facades under a line feature constraint is proposed in this paper. First, camera parameters and sparse spatial point cloud data were recovered using structure from motion (SfM), and 3D dense point clouds were generated with multi-view stereo (MVS). Second, line features were detected based on the gradient direction; the detected line features were fitted considering their directions and lengths, and then matched under multiple types of constraints and extracted from the multi-image sequence. Finally, the facade mesh of the building was triangulated from the point cloud and line features. The experiment shows that this method can effectively reconstruct the geometric facade of buildings by combining the advantages of point and line features of the close-range image sequence, especially in restoring the contour information of building facades.
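Detecting line features from the gradient direction usually starts by computing image gradients and grouping pixels whose gradient orientations agree. The sketch below computes Sobel gradients on a small greyscale image and quantizes orientation into bins, folding opposite directions together so that both sides of an edge share a bin; grouping same-bin neighbours into segments, fitting, and multi-image matching are not shown. This is a generic illustration under those assumptions, not the authors' detector, and the function name is hypothetical.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def gradient_orientation(image, n_bins=8, min_magnitude=1e-6):
    """Quantized gradient orientation for interior pixels of a 2-D grey image.

    Returns {(row, col): bin} for pixels whose Sobel gradient magnitude
    exceeds min_magnitude. Orientation is taken modulo pi, so opposite
    gradient directions fall into the same bin.
    """
    rows, cols = len(image), len(image[0])
    bins = {}
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            gx = sum(SOBEL_X[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            gy = sum(SOBEL_Y[i][j] * image[r + i - 1][c + j - 1]
                     for i in range(3) for j in range(3))
            if math.hypot(gx, gy) <= min_magnitude:
                continue  # flat region: no reliable direction
            angle = math.atan2(gy, gx) % math.pi  # fold into [0, pi)
            bins[(r, c)] = int(angle / math.pi * n_bins) % n_bins
    return bins


# Hypothetical 4 x 4 image with a vertical edge between columns 1 and 2.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
print(gradient_orientation(vertical_edge))
```

Along a vertical facade edge every interior pixel falls into the same orientation bin, so connected runs of same-bin pixels form the line-support regions from which segments can be fitted by direction and length.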
2013-01-01
Background: Galileo is a transposable element responsible for the generation of three chromosomal inversions in natural populations of Drosophila buzzatii. Although the most characteristic feature of Galileo is its long, internally repetitive terminal inverted repeats (TIRs), which resemble the Drosophila Foldback element, its transposase-coding sequence has led to its classification as a member of the P-element superfamily (Class II, subclass 1, TIR order). Furthermore, Galileo is widely distributed in the genus Drosophila, having been found in 6 of the 12 sequenced Drosophila genomes. Among these species, D. mojavensis, the one closest to D. buzzatii, presented the highest diversity in sequence and structure of Galileo elements. Results: In the present work, we carried out a thorough search and annotation of all the Galileo copies present in the sequenced D. mojavensis genome. In our set of 170 Galileo copies we detected 5 Galileo subfamilies (C, D, E, F, and X) with structures ranging from nearly complete copies to copies with only the two TIRs or a solo TIR. Finally, we explored the structural and length variation of the Galileo copies, which reveals relatively frequent rearrangements within and between Galileo elements. Different mechanisms responsible for these rearrangements are discussed. Conclusions: Although Galileo is a transposable element with an ancient history in the D. mojavensis genome, our data indicate recent transpositional activity. Furthermore, the dynamism in sequence and structure, mainly affecting the TIRs, suggests an active exchange of sequences among the copies. This exchange could lead to new subfamilies of the transposon, which could be crucial for the long-term survival of the element in the genome. PMID:23374229
Marzo, Mar; Bello, Xabier; Puig, Marta; Maside, Xulio; Ruiz, Alfredo
2013-02-04
Galileo is a transposable element responsible for the generation of three chromosomal inversions in natural populations of Drosophila buzzatii. Although the most characteristic feature of Galileo is its long, internally repetitive terminal inverted repeats (TIRs), which resemble the Drosophila Foldback element, its transposase-coding sequence has led to its classification as a member of the P-element superfamily (Class II, subclass 1, TIR order). Furthermore, Galileo is widely distributed in the genus Drosophila, having been found in 6 of the 12 sequenced Drosophila genomes. Among these species, D. mojavensis, the one closest to D. buzzatii, presented the highest diversity in sequence and structure of Galileo elements. In the present work, we carried out a thorough search and annotation of all the Galileo copies present in the sequenced D. mojavensis genome. In our set of 170 Galileo copies we detected 5 Galileo subfamilies (C, D, E, F, and X) with structures ranging from nearly complete copies to copies with only the two TIRs or a solo TIR. Finally, we explored the structural and length variation of the Galileo copies, which reveals relatively frequent rearrangements within and between Galileo elements. Different mechanisms responsible for these rearrangements are discussed. Although Galileo is a transposable element with an ancient history in the D. mojavensis genome, our data indicate recent transpositional activity. Furthermore, the dynamism in sequence and structure, mainly affecting the TIRs, suggests an active exchange of sequences among the copies. This exchange could lead to new subfamilies of the transposon, which could be crucial for the long-term survival of the element in the genome.
Johnston, J.W.; Thompson, T.A.; Wilcox, D.A.; Baedke, S.J.
2007-01-01
A common break was recognized in four Lake Superior strandplain sequences using geomorphic and sedimentologic characteristics. Strandplains were divided into lakeward and landward sets of beach ridges, using aerial photographs and topographic surveys to identify similar surficial features and core data to identify similar subsurface features. The change in cross-strandplain elevation trend, from a lowering towards the lake in the landward set of beach ridges to a rise or reduction of slope towards the lake in the lakeward set, indicates that the break is associated with an outlet change for Lake Superior. Correlation of this break between study sites and age model results for the strandplain sequences suggest that the outlet change occurred sometime after about 2,400 calendar years ago (after the Algoma phase). Age model results from one site (Grand Traverse Bay) suggest an alternate age closer to about 1,200 calendar years ago, but these age models need further investigation. The landward part of the strandplain was deposited when water levels were common to all three upper Great Lakes basins (Superior, Huron, and Michigan), which drained through the Port Huron/Sarnia outlet. The lakeward part was deposited after the Sault outlet started to help regulate water levels in the Lake Superior basin. Compared with the lakeward set, the landward beach ridges are commonly better defined, more continuous across the embayments, more numerous, larger in relief, and wider, and they have greater vegetation density; their intervening swales also contain more standing water and peat. Changes in drainage patterns, foreshore sediment thickness, and grain size help in identifying the break between sets in the strandplain sequences. Investigation of these breaks may help identify possible gaps in the record or missing ridges in strandplain sequences that may not be apparent from age distributions alone, and may justify the need for multiple age and glacial isostatic adjustment models.
© 2006 Springer Science+Business Media B.V.
Traeger, Stefanie; Altegoer, Florian; Freitag, Michael; Gabaldon, Toni; Kempken, Frank; Kumar, Abhishek; Marcet-Houben, Marina; Pöggeler, Stefanie; Stajich, Jason E.; Nowrousian, Minou
2013-01-01
Fungi are a large group of eukaryotes found in nearly all ecosystems. More than 250 fungal genomes have already been sequenced, greatly improving our understanding of fungal evolution, physiology, and development. However, for the Pezizomycetes, an early-diverging lineage of filamentous ascomycetes, there is so far only one genome available, namely that of the black truffle, Tuber melanosporum, a mycorrhizal species with unusual subterranean fruiting bodies. To help close the sequence gap among basal filamentous ascomycetes, and to allow conclusions about the evolution of fungal development, we sequenced the genome and assayed transcriptomes during development of Pyronema confluens, a saprobic Pezizomycete with a typical apothecium as fruiting body. With a size of 50 Mb and ∼13,400 protein-coding genes, the genome is more characteristic of higher filamentous ascomycetes than the large, repeat-rich truffle genome; however, some typical features are different in the P. confluens lineage, e.g., the genomic environment of the mating type genes, which is conserved in higher filamentous ascomycetes but only partly conserved in P. confluens. On the other hand, P. confluens has a full complement of fungal photoreceptors, and expression studies indicate that light perception might be similar to that in distantly related ascomycetes and, thus, represent a basic feature of filamentous ascomycetes. Analysis of spliced RNA-seq sequence reads allowed the detection of natural antisense transcripts for 281 genes. The P. confluens genome contains an unusually high number of predicted orphan genes, many of which are upregulated during sexual development, consistent with the idea of rapid evolution of sex-associated genes. Comparative transcriptomics identified the transcription factor gene pro44, which is upregulated during development in P. confluens and the Sordariomycete Sordaria macrospora. The P. confluens pro44 gene (PCON_06721) was used to complement the S. macrospora pro44 deletion mutant, showing functional conservation of this developmental regulator. PMID:24068976
Modeling Forest Understory Fires in an Eastern Amazonian Landscape
NASA Technical Reports Server (NTRS)
Alencar, A. A. C.; Solorzano, L. A.; Nepstad, D. C.
2004-01-01
Forest understory fires are an increasingly important cause of forest impoverishment in Amazonia, but little is known of the landscape characteristics and climatic phenomena that determine their occurrence. We developed empirical functions relating the occurrence of understory fires to landscape features near Paragominas, a 35-yr-old ranching and logging center in eastern Amazonia. A historical sequence of maps of forest understory fire was created based on field interviews with local farmers and Landsat TM images. Several landscape features that might explain spatial variations in the occurrence of understory fires were also mapped and co-registered for each of the sample dates, including forest fragment size and shape, forest impoverishment through logging and understory fires, sources of ignition (settlements and charcoal pits), roads, forest edges, and others. The spatial relationship between forest understory fire and each landscape characteristic was tested by regression analyses. Fire probability models were then developed for various combinations of landscape characteristics. The analyses were conducted separately for years of the El Niño Southern Oscillation (ENSO), which are associated with severe drought in eastern Amazonia, and for non-ENSO years. Most (91%) of the forest area that burned during the 10-yr sequence caught fire during ENSO years, when severe drought may have increased both forest flammability and the escape of agricultural management fires. Forest understory fires were associated with forest edges, as reported in previous studies from Amazonia. However, the strongest predictor of forest fire was the percentage of the forest fragment that had previously been logged or burned. Forest fragment size, distance to charcoal pits, distance to agricultural settlements, proximity to forest edge, and distance to roads were also correlated with forest understory fire.
Logistic regression models using information on fragment degradation and distance to ignition sources accurately predicted the location of more than 80% of the forest fires observed during the ENSO event of 1997-1998. In this Amazon landscape, forest understory fire is a complex function of several variables that influence both the flammability and ignition exposure of the forest.
Variants in SLC18A3, vesicular acetylcholine transporter, cause congenital myasthenic syndrome
O'Grady, Gina L.; Verschuuren, Corien; Yuen, Michaela; Webster, Richard; Menezes, Manoj; Fock, Johanna M.; Pride, Natalie; Best, Heather A.; Benavides Damm, Tatiana; Turner, Christian; Lek, Monkol; Engel, Andrew G.; North, Kathryn N.; Clarke, Nigel F.; MacArthur, Daniel G.; Kamsteeg, Erik-Jan
2016-01-01
Objective: To describe the clinical and genetic characteristics of presynaptic congenital myasthenic syndrome secondary to biallelic variants in SLC18A3. Methods: Individuals from 2 families were identified with biallelic variants in SLC18A3, the gene encoding the vesicular acetylcholine transporter (VAChT), through whole-exome sequencing. Results: The patients demonstrated features seen in presynaptic congenital myasthenic syndrome, including ptosis, ophthalmoplegia, fatigable weakness, apneic crises, and, in patient 1, deterioration of symptoms in cold water. Both patients showed moderate clinical improvement on pyridostigmine. Patient 1 had a broader phenotype, including learning difficulties and left ventricular dysfunction. Electrophysiologic studies were typical of a presynaptic defect. Both patients showed profound electrodecrement on low-frequency repetitive stimulation followed by a prolonged period of postactivation exhaustion. In patient 1, this was unmasked only after isometric contraction, a recognized feature of presynaptic disease, emphasizing the importance of activation procedures. Conclusions: VAChT is responsible for uptake of acetylcholine into presynaptic vesicles. The clinical and electrographic characteristics of the patients described are consistent with previously reported mouse models of VAChT deficiency. These findings make it very likely that defects in VAChT due to variants in SLC18A3 are a cause of congenital myasthenic syndrome in humans. PMID:27590285
Interevent time distributions of human multi-level activity in a virtual world
NASA Astrophysics Data System (ADS)
Mryglod, O.; Fuchs, B.; Szell, M.; Holovatch, Yu.; Thurner, S.
2015-02-01
Studying human behavior in virtual environments provides extraordinary opportunities for a quantitative analysis of social phenomena with levels of accuracy that approach those of the natural sciences. In this paper we use records of player activities in the massive multiplayer online game Pardus over 1238 consecutive days, and analyze dynamical features of sequences of players' actions. We build on previous work in which temporal structures of human actions of the same type were quantified, and provide an empirical understanding of human actions of different types. This study of multi-level human activity can be seen as a dynamic counterpart of static multiplex network analysis. We show that the interevent time distributions of actions in the Pardus universe follow highly non-trivial distribution functions, from which we extract action-type-specific characteristic 'decay constants'. We discuss characteristic features of interevent time distributions, including periodic patterns on different time scales, bursty dynamics, and various functional forms on different time scales. We comment on gender differences among players in emotional actions, and find that while males and females act similarly when performing some positive actions, females are slightly faster for negative actions. We also observe effects of the age of players: more experienced players are generally faster in making decisions about engaging in and terminating enmity and friendship, respectively.
Noncoding sequence classification based on wavelet transform analysis: part II
NASA Astrophysics Data System (ADS)
Paredes, O.; Strojnik, M.; Romo-Vázquez, R.; Vélez-Pérez, H.; Ranta, R.; Garcia-Torales, G.; Scholl, M. K.; Morales, J. A.
2017-09-01
DNA sequences in the human genome can be divided into coding and noncoding sequences. We hypothesize that the characteristic periodicities of noncoding sequences are related to their function. We describe a procedure to identify these characteristic periodicities using wavelet analysis. Our results show that three groups of noncoding sequences, each with a different biological function, may be differentiated by their wavelet coefficients within a specific frequency range.
Hammond, R W
2003-06-01
Isolates of Prunus necrotic ringspot virus (PNRSV) were examined to establish the level of naturally occurring sequence variation in the coat protein (CP) gene and to identify group-specific genome features that may prove valuable for the generation of diagnostic reagents. Phylogenetic analysis of a 452 bp sequence of 68 virus isolates, 20 obtained from the European Union Ilarvirus Ringtest held in October 1998, confirmed the clustering of the isolates into three distinct groups. Although no correlation was found between the sequence and host or geographic origin, there was a general trend for severe isolates to cluster into one group. Group-specific features have been identified for discrimination between virus strains.
Del Grande, Filippo; Subhawong, Ty; Weber, Kristy; Aro, Michael; Mugera, Charles; Fayad, Laura M
2014-05-01
To determine the added value of functional magnetic resonance (MR) sequences (dynamic contrast material-enhanced [DCE] and quantitative diffusion-weighted [DW] imaging with apparent diffusion coefficient [ADC] mapping) for the detection of recurrent soft-tissue sarcomas following surgical resection. This retrospective study was approved by the institutional review board. The requirement to obtain informed consent was waived. Thirty-seven patients referred for postoperative surveillance after resection of soft-tissue sarcoma (35 with high-grade sarcoma) were studied. Imaging at 3.0 T included conventional (T1-weighted, fluid-sensitive, and contrast-enhanced T1-weighted imaging) and functional (DCE MR imaging, DW imaging with ADC mapping) sequences. Recurrences were confirmed with biopsy or resection. A disease-free state was determined with at least 6 months of follow-up. Two readers independently recorded the signal and morphologic characteristics with conventional sequences, the presence or absence of arterial enhancement at DCE MR imaging, and ADCs of the surgical bed. The accuracy of conventional MR imaging in the detection of recurrence was compared with that with the addition of functional sequences. The Fisher exact and Wilcoxon rank sum tests were used to define the accuracy of imaging features, the Cohen κ and Lin interclass correlation were used to define interobserver variability, and receiver operating characteristic analysis was used to define a threshold to detect recurrence and assess reader confidence after the addition of functional imaging to conventional sequences. There were six histologically proved recurrences in 37 patients. 
Sensitivity and specificity of MR imaging in the detection of tumor recurrence were 100% (six of six patients) and 52% (16 of 31 patients), respectively, with conventional sequences, 100% (six of six patients) and 97% (30 of 31 patients) with the addition of DCE MR imaging, and 60% (three of five patients) and 97% (30 of 31 patients) with the addition of DW imaging and ADC mapping. The average ADC of recurrence (1.08 mm²/sec ± 0.19) was significantly different from those of postoperative scarring (0.9 mm²/sec ± 0.00) and hematomas (2.34 mm²/sec ± 0.72) (P = .03 for both). The addition of functional MR sequences to a routine MR protocol, in particular DCE MR imaging, offers a specificity of more than 95% for distinguishing recurrent sarcoma from postsurgical scarring.
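The reported percentages follow directly from the patient counts. The minimal Python sketch below recomputes them as a check; the helper names `sensitivity` and `specificity` are illustrative, not from the study, and the DW/ADC row uses the denominator of five recurrences exactly as stated above rather than the six reported overall.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of confirmed recurrences that imaging flagged as recurrence."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of disease-free patients that imaging correctly called negative."""
    return true_neg / (true_neg + false_pos)


# (detected recurrences, total recurrences, correct negatives, total disease-free)
# as reported in the abstract; the protocol labels are illustrative.
protocols = {
    "conventional": (6, 6, 16, 31),
    "conventional + DCE": (6, 6, 30, 31),
    "conventional + DW/ADC": (3, 5, 30, 31),
}

for name, (tp, n_pos, tn, n_neg) in protocols.items():
    sens = sensitivity(tp, n_pos - tp)
    spec = specificity(tn, n_neg - tn)
    print(f"{name}: sensitivity {sens:.0%}, specificity {spec:.0%}")
```

Rounded to whole percentages, 16/31 and 30/31 give the 52% and 97% specificities quoted above.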
Form drag in rivers due to small-scale natural topographic features: 2. Irregular sequences
Kean, J.W.; Smith, J.D.
2006-01-01
The size, shape, and spacing of small-scale topographic features found on the boundaries of natural streams, rivers, and floodplains can be quite variable. Consequently, a procedure for determining the form drag on irregular sequences of different-sized topographic features is essential for calculating near-boundary flows and sediment transport. A method for carrying out such calculations is developed in this paper. This method builds on the work of Kean and Smith (2006), which describes the flow field for the simpler case of a regular sequence of identical topographic features. Both approaches model topographic features as two-dimensional elements with Gaussian-shaped cross sections defined in terms of three parameters. Field measurements of bank topography are used to show that (1) the magnitude of these shape parameters can vary greatly between adjacent topographic features and (2) the variability of these shape parameters follows a lognormal distribution. Simulations using an irregular set of topographic roughness elements show that the drag on an individual element is primarily controlled by the size and shape of the feature immediately upstream and that the spatial average of the boundary shear stress over a large set of randomly ordered elements is relatively insensitive to the sequence of the elements. In addition, a method to transform the topography of irregular surfaces into an equivalently rough surface of regularly spaced, identical topographic elements also is given. The methods described in this paper can be used to improve predictions of flow resistance in rivers as well as quantify bank roughness.
Accurate prediction of RNA-binding protein residues with two discriminative structural descriptors.
Sun, Meijian; Wang, Xia; Zou, Chuanxin; He, Zenghui; Liu, Wei; Li, Honglin
2016-06-07
RNA-binding proteins participate in many important biological processes involving RNA-mediated gene regulation, and several computational methods have recently been developed to predict the protein-RNA interactions of RNA-binding proteins. Newly developed discriminative descriptors will help to improve the accuracy of these prediction methods and provide further meaningful information for researchers. In this work, we designed two structural features (residue electrostatic surface potential and triplet interface propensity); statistical and structural analysis of protein-RNA complexes showed that the two features were powerful for identifying RNA-binding protein residues. Using these two features together with other well-performing structure- and sequence-based features, a random forest classifier was constructed to predict RNA-binding residues. The area under the receiver operating characteristic curve (AUC) of five-fold cross-validation for our method on the training set RBP195 was 0.900, and when the method was applied to the test set RBP68, the prediction accuracy (ACC) was 0.868 and the F-score was 0.631. The good prediction performance of our method indicates that the two newly designed descriptors could be discriminative for inferring protein residues interacting with RNAs. To facilitate the use of our method, a web server called RNAProSite, which implements the proposed method, was constructed and is freely available at http://lilab.ecust.edu.cn/NABind .
Gad, Alona; Laurino, Mercy; Maravilla, Kenneth R; Matsushita, Mark; Raskind, Wendy H
2008-07-15
The Waardenburg syndromes (WS) account for approximately 2% of cases of congenital sensorineural deafness. This heterogeneous group of diseases can currently be categorized into four major subtypes (WS types 1-4) on the basis of characteristic clinical features. Multiple genes have been implicated in WS, and mutations in some genes can cause more than one WS subtype. In addition to eye, hair, and skin pigmentary abnormalities, dystopia canthorum and a broad nasal bridge are seen in WS type 1. Mutations in the PAX3 gene are responsible for the condition in the majority of these patients. In addition, mutations in PAX3 have been found in WS type 3, which is distinguished by musculoskeletal abnormalities, and in a family with a rare subtype of WS, craniofacial-deafness-hand syndrome (CDHS), characterized by dysmorphic facial features, hand abnormalities, and absent or hypoplastic nasal and wrist bones. Here we describe a woman who shares some, but not all, features of WS type 3 and CDHS, and who also has abnormal cranial bones. All sinuses were hypoplastic, and the cochleae were small. No sequence alteration in PAX3 was found. These observations broaden the clinical range of WS and suggest there may be genetic heterogeneity even within the CDHS subtype. © 2008 Wiley-Liss, Inc.
Templated sequence insertion polymorphisms in the human genome
NASA Astrophysics Data System (ADS)
Onozawa, Masahiro; Aplan, Peter
2016-11-01
Templated Sequence Insertion Polymorphism (TSIP) is a recently described form of polymorphism in the human genome, in which a sequence templated from a distant genomic region is inserted into the genome, seemingly at random. TSIPs can be grouped into two classes based on nucleotide sequence features at the insertion junctions. Class 1 TSIPs show features of insertions mediated by the LINE-1 ORF2 protein, including (1) target-site duplication (TSD), (2) polyadenylation 10-30 nucleotides downstream of a “cryptic” polyadenylation signal, and (3) a preference for insertion at a 5’-TTTT/A-3’ sequence. In contrast, Class 2 TSIPs show features consistent with repair of a DNA double-strand break via insertion of a DNA “patch” derived from a distant genomic region. A survey of a large number of normal human volunteers demonstrates that most individuals have 25-30 TSIPs, and that these TSIPs track with specific geographic regions. As with other forms of human polymorphism, we suspect that TSIPs may be important for the generation of human diversity and genetic diseases.
Generation of Tandem Direct Duplications by Reversed-Ends Transposition of Maize Ac Elements
Peterson, Thomas
2013-01-01
Tandem direct duplications are a common feature of the genomes of eukaryotes ranging from yeast to human, where they comprise a significant fraction of copy number variations. The prevailing model for the formation of tandem direct duplications is non-allelic homologous recombination (NAHR). Here we report the isolation of a series of duplications and reciprocal deletions isolated de novo from a maize allele containing two Class II Ac/Ds transposons. The duplication/deletion structures suggest that they were generated by alternative transposition reactions involving the termini of two nearby transposable elements. The deletion/duplication breakpoint junctions contain 8 bp target site duplications characteristic of Ac/Ds transposition events, confirming their formation directly by an alternative transposition mechanism. Tandem direct duplications and reciprocal deletions were generated at a relatively high frequency (∼0.5 to 1%) in the materials examined here in which transposons are positioned nearby each other in appropriate orientation; frequencies would likely be much lower in other genotypes. To test whether this mechanism may have contributed to maize genome evolution, we analyzed sequences flanking Ac/Ds and other hAT family transposons and identified three small tandem direct duplications with the structural features predicted by the alternative transposition mechanism. Together these results show that some class II transposons are capable of directly inducing tandem sequence duplications, and that this activity has contributed to the evolution of the maize genome. PMID:23966872
Preattentive binding of auditory and visual stimulus features.
Winkler, István; Czigler, István; Sussman, Elyse; Horváth, János; Balázs, Lászlo
2005-02-01
We investigated the role of attention in feature binding in the auditory and the visual modality. One auditory and one visual experiment used the mismatch negativity (MMN and vMMN, respectively) event-related potential to index the memory representations created from stimulus sequences, which were either task-relevant and therefore attended, or task-irrelevant and ignored. In the latter case, the primary task was a continuous, demanding within-modality task. The test sequences were composed of two frequently occurring stimuli (standards), which differed from each other in two stimulus features, and two infrequently occurring stimuli (deviants), each of which combined one feature from one standard stimulus with the other feature of the other standard stimulus. Deviant stimuli elicited MMN responses of similar parameters across the different attentional conditions. These results suggest that the memory representations involved in the MMN deviance detection response encoded the frequently occurring feature combinations whether or not the test sequences were attended. A possible alternative to the memory-based interpretation of the visual results, the elicitation of the McCollough color-contingent aftereffect, was ruled out by the results of our third experiment. The current results are compared with those supporting the attentive feature integration theory. We conclude that (1) with comparable stimulus paradigms, similar results have been obtained in the two modalities; (2) preattentive processes of feature binding exist; however, (3) conjoining features within rich arrays of objects under time pressure and/or long-term retention of the feature-conjoined memory representations may require attentive processes.
NASA Astrophysics Data System (ADS)
Merkel, Ronny; Gruhn, Stefan; Dittmann, Jana; Vielhauer, Claus; Bräutigam, Anja
2012-03-01
Determining the age of latent fingerprint traces found at crime scenes has been an unresolved research issue for decades. Solving this issue could provide criminal investigators with the specific time a fingerprint trace was left on a surface, and would therefore enable them to link potential suspects to the time a crime took place, to reconstruct the sequence of events, or to eliminate irrelevant fingerprints to respect privacy constraints. Transferring imaging techniques from different application areas, such as 3D image acquisition, surface measurement and chemical analysis, to the domain of lifting latent biometric fingerprint traces is an emerging trend in forensics. Such non-destructive sensor devices might help to solve the challenge of determining the age of a latent fingerprint trace, since they provide the opportunity to create time series and process them using pattern recognition techniques and statistical methods on digitized 2D, 3D and chemical data, unlike classical, contact-based capturing techniques, which alter the fingerprint trace and therefore make continuous scans impossible. In prior work, we suggested using a feature called binary pixel, which is a novel approach in the field of fingerprint age determination. The feature uses a Chromatic White Light (CWL) image sensor to continuously scan a fingerprint trace over time and retrieves a characteristic logarithmic aging tendency for 2D-intensity as well as 3D-topographic images from the sensor. In this paper, we propose to combine these two characteristic aging features with other 2D and 3D features from the domains of surface measurement, microscopy, photography and spectroscopy, to increase the accuracy and reliability of a potential future age determination scheme.
Discussing the feasibility of such a variety of sensor devices and possible aging features, we propose a general fusion approach, which might combine promising features into a joint age determination scheme in the future. We furthermore demonstrate the feasibility of the introduced approach by fusing, as an example, the binary pixel features based on 2D-intensity and 3D-topographic images from the CWL sensor. We conclude that a formula-based age determination approach requires very precise image data, which cannot be achieved at the moment, whereas a machine-learning-based classification approach seems feasible if an adequate number of features can be provided.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saunders, T.D.; Pemberton, A.G.; Ranger, M.J.
A well-exposed example of a regressive barrier island succession crops out in the Alberta badlands along the Red Deer River Valley. In the most landward (northwestern) corner of the study area, only shallow-water and subaerial deposits are represented, and these are dominated by tidal-inlet-related facies. Seaward (southeast), water depth increases and the succession is typified by open-marine beach to offshore-related facies arranged in a coarsening-upward progradational sequence. Detailed sedimentologic and ichnologic analyses of this sequence have allowed its division into three distinct environmental zones (lower, middle, and upper). The lower zone comprises a laterally diverse assemblage of storm-influenced, lower shoreface through offshore deposits. Outcrop in the northeast is dominated by thick beds of hummocky and/or swaley cross-stratified storm sand. In the southeast, storm events have only minor influence. This lower zone contains a wide diversity of well-preserved trace fossils whose distribution appears to have been influenced by gradients in wave energy, bottom stagnation, and the interplay of storm and fair-weather processes. The middle zone records deposition across an upper shoreface environment. Here, horizontal to low-angle bedding predominates, with interspersed sets of small- and large-scale cross-bedding increasing toward the top. A characteristic feature of the upper part of this zone is the lack of biogenic structures, suggesting deposition in an exposed, high-energy surf zone. The upper zone records intertidal to supratidal progradation of the shoreline complex. Planar-laminated sandstone forms a distinct foreshore interval, above which rhizoliths and organic material become increasingly abundant, marking the transition to the backshore. A significant feature of this zone is the occurrence of an intensely bioturbated interval toward the top of the foreshore.
Campens, Laurence; Callewaert, Bert; Muiño Mosquera, Laura; Renard, Marjolijn; Symoens, Sofie; De Paepe, Anne; Coucke, Paul; De Backer, Julie
2015-02-03
Heritable Thoracic Aortic Disorders (H-TAD) may present clinically as part of a syndromic entity or as an isolated (nonsyndromic) manifestation. About one dozen genes are now available for clinical molecular testing. Targeted single-gene testing is hampered by significant clinical overlap between syndromic H-TAD entities and the absence of discriminating features in isolated cases. Therefore, panel testing of multiple genes has emerged as the preferred approach. So far, no data on the mutation detection rate of this technique have been reported. We performed Next Generation Sequencing (NGS)-based screening of the seven currently most prevalent H-TAD-associated genes (FBN1, TGFBR1/2, TGFB2, SMAD3, ACTA2 and COL3A1) on 264 samples from unrelated probands referred for H-TAD and related entities. Patients fulfilling the criteria for Marfan syndrome (MFS) were only included if targeted FBN1 sequencing and MLPA analysis were negative. A mutation was identified in 34 patients (13%): 12 FBN1, one TGFBR1, two TGFBR2, three TGFB2, nine SMAD3, four ACTA2 and three COL3A1 mutations. We found mutations in FBN1 (N = 3), TGFBR2 (N = 1) and COL3A1 (N = 2) in patients without characteristic clinical features of syndromal H-TAD. Six TAD patients harboring a mutation in SMAD3 and one TAD patient with a TGFB2 mutation fulfilled the diagnostic criteria for MFS. NGS-based H-TAD panel testing efficiently reveals a mutation in 13% of patients. Our observations emphasize the clinical overlap between patients harboring mutations in syndromic and nonsyndromic H-TAD-related genes as well as within syndromic H-TAD entities, justifying a widespread application of this technique.
Wagner, Monica; Shafer, Valerie L.; Martin, Brett; Steinschneider, Mitchell
2013-01-01
The influence of native-language experience on sensory-obligatory auditory-evoked potentials (AEPs) was investigated in native-English and native-Polish listeners. AEPs were recorded to the first word in nonsense word pairs, while participants performed a syllable identification task on the second word in the pairs. Nonsense words contained phoneme sequence onsets (i.e., /pt/, /pət/, /st/ and /sət/) that occur in the Polish and English languages, with the exception that /pt/ at syllable onset is an illegal phonotactic form in English. P1–N1–P2 waveforms from fronto-central electrode sites were comparable in English and Polish listeners, even though these same English participants were unable to distinguish the nonsense words having /pt/ and /pət/ onsets. The P1–N1–P2 complex indexed the temporal characteristics of the word stimuli in the same manner for both language groups. Taken together, these findings suggest that the fronto-central P1–N1–P2 complex reflects acoustic feature processing of speech and is not significantly influenced by exposure to the phoneme sequences of the native language. In contrast, the T-complex from bilateral posterior temporal sites was found to index phonological as well as acoustic feature processing of the nonsense word stimuli. An enhanced negativity for the /pt/ cluster relative to its contrast sequence (i.e., /pət/) occurred only for the Polish listeners, suggesting that neural networks within non-primary auditory cortex may be involved in early cortical phonological processing. PMID:23643857
Glaciers and Ice Sheets As Analog Environments of Potentially Habitable Icy Worlds
Garcia-Lopez, Eva; Cid, Cristina
2017-01-01
Icy worlds in the solar system and beyond have attracted remarkable attention as possible habitats for life. Current thinking about whether life exists beyond Earth is based on our knowledge of life in terrestrial cold environments. On Earth, glaciers and ice sheets were long considered uninhabited because they seemed too hostile to harbor life. However, these environments are unique biomes dominated by microbial communities that maintain active biochemical routes. Thanks to techniques such as microscopy and, more recently, DNA sequencing methods, a great diversity of prokaryotic and eukaryotic microorganisms has been discovered. These microorganisms are adapted to a harsh environment, in which the most extreme features are the lack of liquid water, extremely cold temperatures, high solar radiation and nutrient shortage. Here we compare the environmental characteristics of icy worlds with those of terrestrial glaciers and ice sheets in order to address some interesting questions: (i) what characteristics of habitability are known for the frozen worlds, and which of them could be compatible with life, (ii) what environmental characteristics of terrestrial glaciers and ice sheets can be life-limiting, (iii) what communities of prokaryotic and eukaryotic microorganisms can live in them, and (iv) taking these observations into account, could any of these planets or satellites meet the conditions of habitability? In this review, the icy worlds are considered from the point of view of astrobiological exploration. With the aim of determining whether icy worlds could be potentially habitable, they have been compared with the environmental features of glaciers and ice sheets on Earth. We also review field and laboratory investigations of microorganisms that live in analog environments of icy worlds, where they are not only viable but also metabolically active. PMID:28804477
Kaga, Chiaki; Okochi, Mina; Tomita, Yasuyuki; Kato, Ryuji; Honda, Hiroyuki
2008-03-01
We developed a method of effective peptide screening that combines experiments and computational analysis. The method is based on the concept that screening efficiency can be enhanced, even from limited data, by using a model derived from computational analysis as a guide to screening and combining that model with subsequent repeated experiments. Here we focus on cell-adhesion peptides as a model application of this peptide-screening strategy. Cell-adhesion peptides were screened by use of a cell-based assay of a peptide array. Starting with the screening data obtained from a limited, random 5-mer library (643 sequences), a rule regarding structural characteristics of cell-adhesion peptides was extracted by fuzzy neural network (FNN) analysis. According to this rule, peptides with unfavored residues in certain positions that led to inefficient binding were eliminated from the random sequences. In the restricted, second random library (273 sequences), the yield of cell-adhesion peptides having an adhesion rate more than 1.5-fold that of the basal array support was significantly higher (31%) than in the unrestricted random library (20%). In the restricted third library (50 sequences), the yield of cell-adhesion peptides increased to 84%. We conclude that a repeated cycle of experiments screening limited numbers of peptides can be assisted by the rule-extracting feature of FNN.
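The screening loop described above, in which a rule extracted from one round restricts the next random library, can be sketched as follows. This is a hypothetical illustration: the actual FNN-derived rule is not given in the abstract, so the `UNFAVOURED` table is invented, and a simple per-position exclusion stands in for it. The names `passes_rule`, `restricted_library` and `hit_yield` are likewise illustrative.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# HYPOTHETICAL rule: position -> residues treated as unfavourable at that position.
# The real rule was extracted by fuzzy neural network analysis and is not reproduced here.
UNFAVOURED = {0: set("DE"), 2: set("P"), 4: set("DEW")}


def passes_rule(peptide, unfavoured=UNFAVOURED):
    """Return True if no position carries a residue the rule excludes."""
    return all(peptide[pos] not in bad for pos, bad in unfavoured.items())


def restricted_library(size, length=5, unfavoured=UNFAVOURED, seed=0):
    """Draw random peptides using, at each position, only residues the rule allows."""
    rng = random.Random(seed)
    allowed = [sorted(set(AMINO_ACIDS) - unfavoured.get(pos, set())) for pos in range(length)]
    return ["".join(rng.choice(choices) for choices in allowed) for _ in range(size)]


def hit_yield(adhesion_ratios, cutoff=1.5):
    """Fraction of peptides whose adhesion rate exceeds `cutoff` times the basal support."""
    return sum(r > cutoff for r in adhesion_ratios) / len(adhesion_ratios)
```

Sampling directly from the allowed residues at each position avoids the rejection step of drawing unrestricted peptides and discarding those that break the rule; both produce libraries that satisfy the same constraint.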
Takeuchi, Fumihiko; Watanabe, Shinya; Baba, Tadashi; Yuzawa, Harumi; Ito, Teruyo; Morimoto, Yuh; Kuroda, Makoto; Cui, Longzhu; Takahashi, Mikio; Ankai, Akiho; Baba, Shin-ichi; Fukui, Shigehiro; Lee, Jean C.; Hiramatsu, Keiichi
2005-01-01
Staphylococcus haemolyticus is an opportunistic bacterial pathogen that colonizes human skin and is remarkable for its highly antibiotic-resistant phenotype. We determined the complete genome sequence of S. haemolyticus to better understand its pathogenicity and evolutionary relatedness to the other staphylococcal species. A large proportion of the open reading frames in the genomes of S. haemolyticus, Staphylococcus aureus, and Staphylococcus epidermidis were conserved in their sequence and order on the chromosome. We identified a region of the bacterial chromosome just downstream of the origin of replication that showed little homology among the species but was conserved among strains within a species. This novel region, designated the “oriC environ,” likely contributes to the evolution and differentiation of the staphylococcal species, since it was enriched for species-specific nonessential genes that contribute to the biological features of each staphylococcal species. A comparative analysis of the genomes of S. haemolyticus, S. aureus, and S. epidermidis elucidated differences in their biological and genetic characteristics and pathogenic potentials. We identified as many as 82 insertion sequences in the S. haemolyticus chromosome that probably mediated frequent genomic rearrangements, resulting in phenotypic diversification of the strain. Such rearrangements could have brought genomic plasticity to this species and contributed to its acquisition of antibiotic resistance. PMID:16237012
Cvetkovska, Marina; Szyszka-Mroz, Beth; Possmayer, Marc; Pittock, Paula; Lajoie, Gilles; Smith, David R; Hüner, Norman P A
2018-05-08
The objective of this work was to characterize photosynthetic ferredoxin from the Antarctic green alga Chlamydomonas sp. UWO241, a key enzyme involved in distributing photosynthetic reducing power. We hypothesize that ferredoxin possesses characteristics typical of cold-adapted enzymes, namely increased structural flexibility and high activity at low temperatures, accompanied by low stability at moderate temperatures. To address this objective, we purified ferredoxin from UWO241 and characterized the temperature dependence of its enzymatic activity and protein conformation. The UWO241 ferredoxin protein, RNA, and DNA sequences were compared with homologous sequences from related organisms. We provide evidence for the duplication of the main ferredoxin gene in the UWO241 nuclear genome and the presence of two highly similar proteins. Ferredoxin from UWO241 has both high activity at low temperatures and high stability at moderate temperatures, representing a novel class of cold-adapted enzymes. Our study reveals novel insights into how photosynthesis functions in the cold. The presence of two distinct ferredoxin proteins in UWO241 could provide an adaptive advantage for survival at cold temperatures. The primary amino acid sequence of ferredoxin is highly conserved among photosynthetic species, and we suggest that subtle differences in sequence can lead to significant changes in activity at low temperatures. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.
Jancusova, Miroslava; Kovacik, Lubomir; Pereira, Antonio Batista; Dusinsky, Roman; Wilmotte, Annick
2016-07-01
The evolutionary relationships of 10 Antarctic cyanobacterial strains of the order Oscillatoriales isolated from King George and Deception Islands, South Shetland Islands were studied by a polyphasic approach (morphology, 16S rRNA and internal transcribed spacer sequences). The studied taxa are characteristic of coastal Antarctic biotopes, where they form distinct populations and ecologically delimited communities. They were isolated from terrestrial habitats: microbial mats in seepages; crusts on soil, rocks, bones and mosses; mud, sometimes close to bird colonies; and from guano. Based on major phenotypic features, the strains were divided into four distinct morphotypes: Leptolyngbya borchgrevinkii (A), Leptolyngbya frigida (B), Microcoleus sp. (C) and Wilmottia murrayi (D). This morphological identification was in agreement with the phylogenetic relationships. For the first time, the 16S rRNA gene sequence of a strain corresponding to the L. borchgrevinkii morphotype was determined. Morphotype B is most related to sequences assigned to L. frigida isolated from microbial mats of coastal lakes in East Antarctica. Morphotype C belongs to a cluster including strains with morphotypes corresponding to Microcoleus attenuatus, Microcoleus favosus and Microcoleus sp., which are from Antarctica and other continents. Morphotype D is grouped with sequences assigned to W. murrayi mostly isolated from Antarctica. © FEMS 2016. All rights reserved. For permissions, please e-mail: journals.permissions@oup.com.
Characteristics common to a cytokine family spanning five orders of insects.
Matsumoto, Hitoshi; Tsuzuki, Seiji; Date-Ito, Atsuko; Ohnishi, Atsushi; Hayakawa, Yoichi
2012-06-01
Growth-blocking peptide (GBP) is a member of an insect cytokine family with diverse functions including growth and immunity controls. Members of this cytokine family have been reported in 15 species of Lepidoptera, and we have recently identified GBP-like peptides in Diptera such as Lucilia cuprina and Drosophila melanogaster, indicating that this peptide family is not specific to Lepidoptera. In order to extend our knowledge of this peptide family, we purified the same family peptide from one of the tenebrionids, Zophobas atratus, isolated its cDNA, and sequenced it. The Z. atratus GBP sequence, together with reported sequence data of peptides from the same family, enabled us to perform BLAST searches against EST and genome databases of several insect species including Coleoptera, Diptera, Hymenoptera, and Hemiptera and to identify homologous peptide genes. Here we report conserved structural features in these sequence data: the peptides consist of 19-30 amino acid residues encoded at the C terminus of a 73-152 amino acid precursor and contain the motif C-x(2)-G-x(4,6)-G-x(1,2)-C-[KR], which shares a certain similarity with the motif in the mammalian EGF peptide family. These data indicate that these small cytokines belonging to one family are present in at least five insect orders. Copyright © 2012 Elsevier Ltd. All rights reserved.
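The conserved motif above is written in PROSITE-style pattern notation. As a hedged sketch, a small converter turns it into a Python regular expression that can scan precursor sequences. It covers only the element forms used here plus simple `{...}` exclusions, not the full PROSITE grammar; the function name `prosite_to_regex` and the test sequences are made up for illustration.

```python
import re


def prosite_to_regex(pattern):
    """Convert a simple PROSITE-style pattern (e.g. 'C-x(2)-G-[KR]') to a regex string."""
    parts = []
    for element in pattern.split("-"):
        # Element = x | single residue | [alternatives] | {exclusions}, optionally (n) or (n,m).
        m = re.fullmatch(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?", element)
        if m is None:
            raise ValueError(f"unsupported element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{"):
            core = "[^" + core[1:-1] + "]"      # any residue except those listed
        if low is not None:
            core += "{" + low + ("," + high if high else "") + "}"
        parts.append(core)
    return "".join(parts)


GBP_MOTIF = prosite_to_regex("C-x(2)-G-x(4,6)-G-x(1,2)-C-[KR]")
print(GBP_MOTIF)  # C.{2}G.{4,6}G.{1,2}C[KR]
```

A call such as `re.search(GBP_MOTIF, sequence)` then returns the first motif occurrence in a sequence, or `None` if there is none.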
Protein Solvent-Accessibility Prediction by a Stacked Deep Bidirectional Recurrent Neural Network.
Zhang, Buzhong; Li, Linqing; Lü, Qiang
2018-05-25
Residue solvent accessibility is closely related to the spatial arrangement and packing of residues. Predicting the solvent accessibility of a protein is an important step toward understanding its structure and function. In this work, we present a deep learning method to predict residue solvent accessibility, which is based on a stacked deep bidirectional recurrent neural network applied to sequence profiles. To capture more long-range sequence information, we proposed a merging operator for combining the bidirectional information from hidden nodes into the outputs. Three types of merging operators were used in our improved model, with a long short-term memory network serving as the hidden computing node. The training dataset was constructed from 7361 proteins extracted from the PISCES server using a cut-off of 25% sequence identity. Sequence-derived features including position-specific scoring matrix, physical properties, physicochemical characteristics, conservation score and protein coding were used to represent a residue. Using this method, we obtained continuous predicted values of relative solvent-accessible area, which were then transformed into binary states using predefined thresholds. Our experimental results showed that our deep learning method improved prediction quality relative to current methods, with mean absolute error and Pearson's correlation coefficient values of 8.8% and 74.8%, respectively, on the CB502 dataset and 8.2% and 78%, respectively, on the Manesh215 dataset.
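The post-processing step (continuous relative solvent accessibility mapped to binary buried/exposed states) and the two reported evaluation metrics can be sketched in plain Python. The 0.25 cutoff and the strict `>` comparison are assumptions for illustration; the abstract says only that predefined thresholds were used. The function names are hypothetical.

```python
import math


def binarize(rsa, threshold=0.25):
    """Exposed (1) if relative solvent accessibility exceeds threshold, else buried (0)."""
    # ASSUMPTION: single illustrative cutoff; the paper's predefined thresholds are not listed.
    return [1 if value > threshold else 0 for value in rsa]


def mean_absolute_error(predicted, observed):
    """Average absolute difference between predicted and observed RSA values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)
```

When RSA is expressed as a fraction, an MAE of 0.088 corresponds to the 8.8% figure quoted above.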
Belak, Zachery R; Ovsenek, Nicholas; Eskiw, Christopher H
2018-05-23
Yin-Yang 1 (YY1) is a highly conserved transcription factor possessing RNA-binding activity. A putative YY1 homologue (SpYY1) was previously identified in the developmental model organism Strongylocentrotus purpuratus (the purple sea urchin) by genomic sequencing. We identified a high degree of sequence similarity between SpYY1 and vertebrate YY1 homologues, including 100% protein sequence identity over the DNA- and RNA-binding zinc-finger region and high similarity in the N-terminal transcriptional activation domain. YY1 from Xenopus laevis and S. purpuratus demonstrated identical DNA- and RNA-binding characteristics, indicating that the protein maintains similar functional and biochemical properties across widely divergent deuterostome species. SpYY1 binds to the consensus YY1 DNA element and also to U-rich RNA sequences. Although we detected SpYY1 RNA-binding activity in ova lysates and observed cytoplasmic localization, SpYY1 was not associated with maternal mRNA in ova. SpYY1 expressed in Xenopus oocytes was excluded from the nucleus and associated with maternally expressed cytoplasmic mRNA molecules. These data demonstrate the existence of a YY1 homologue in S. purpuratus with structural and biochemical features similar to those of the well-studied vertebrate YY1; however, the data reveal major differences in the biological role of YY1 in the regulation of maternally expressed mRNA in the two species.
Domain-specific learning of grammatical structure in musical and phonological sequences.
Bly, Benjamin Martin; Carrión, Ricardo E; Rasch, Björn
2009-01-01
Artificial grammar learning depends on acquisition of abstract structural representations rather than on domain-specific representational constraints, or so many studies tell us. Using an artificial grammar task, we compared learning performance in two stimulus domains in which respondents have differing tacit prior knowledge. We found that despite grammatically identical sequence structures, learning was better for harmonically related chord sequences than for letter name sequences or harmonically unrelated chord sequences. We also found transfer effects within the musical and letter name tasks, but not across the domains. We conclude that knowledge acquired in implicit learning depends not only on abstract features of structured stimuli: the learning of regularities is in some respects domain-specific and strongly linked to particular features of the stimulus domain.
Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.
Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai
2015-12-01
The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance for identifying similarities between protein structures and inferring their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low-frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present the Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method is more efficient for protein comparison than the commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant features of specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis of protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
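The Ramanujan sum c_q(n) is the sum of cos(2πkn/q) over 1 ≤ k ≤ q with gcd(k, q) = 1. One common finite-length form of the RFT projects a numeric signal x(1..N) onto these sums with a 1/(φ(q)·N) normalisation, where φ is Euler's totient. The sketch below is a direct O(N·q) evaluation under that assumed definition, not the paper's fast algorithm. In the RRM setting the input would be the protein sequence after mapping each residue to a number (the RRM uses electron-ion interaction potential values); the function names are illustrative.

```python
import math


def euler_phi(q):
    """Number of integers in 1..q that are coprime to q."""
    return sum(1 for k in range(1, q + 1) if math.gcd(k, q) == 1)


def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1 (integer-valued)."""
    total = sum(math.cos(2 * math.pi * k * n / q) for k in range(1, q + 1) if math.gcd(k, q) == 1)
    return round(total)  # remove floating-point noise; Ramanujan sums are integers


def rft(signal, q_max):
    """Direct (slow) RFT coefficients X(1..q_max) of a numeric sequence x(1..N)."""
    n_len = len(signal)
    return [
        sum(x * ramanujan_sum(q, n) for n, x in enumerate(signal, start=1)) / (euler_phi(q) * n_len)
        for q in range(1, q_max + 1)
    ]
```

For example, the period-2 signal `[-1, 1, -1, 1, -1, 1]` equals c_2(n) on n = 1..6, so its coefficients for q = 1, 2, 3 are 0, 1 and 0: the energy sits entirely at q = 2, reflecting the orthogonality of Ramanujan sums over a common period.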
SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.
Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue
2018-05-02
Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. The resulting series of complex numbers is then transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely clustering and classification, we compared SSAW against state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These results make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.
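A toy version of the SSAW pipeline (k-mers, then complex numbers, then a stationary wavelet transform, then a fixed-length feature vector) is sketched below. Every concrete choice here is an assumption, because the abstract does not specify them: the nucleotide-to-complex table `NUCLEOTIDE`, the halving weights that combine a k-mer into one number, the single-level Haar filter with a periodic boundary, and the mean/standard-deviation summary of each band.

```python
import math

# HYPOTHETICAL nucleotide-to-complex mapping; SSAW's actual mapping is not given here.
NUCLEOTIDE = {"A": 1 + 1j, "C": -1 - 1j, "G": -1 + 1j, "T": 1 - 1j}


def kmers(seq, k):
    """All overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_to_complex(kmer):
    """Map a k-mer to one complex number; halving weights keep distinct k-mers distinct."""
    return sum(NUCLEOTIDE[base] * 0.5 ** i for i, base in enumerate(kmer))


def haar_swt_level1(x):
    """One level of an undecimated (stationary) Haar transform with periodic boundary."""
    n = len(x)
    s = 1 / math.sqrt(2)
    approx = [(x[i] + x[(i + 1) % n]) * s for i in range(n)]
    detail = [(x[i] - x[(i + 1) % n]) * s for i in range(n)]
    return approx, detail


def feature_vector(seq, k=3):
    """Summarise each wavelet band by the mean and std of coefficient magnitudes."""
    signal = [kmer_to_complex(m) for m in kmers(seq, k)]
    features = []
    for band in haar_swt_level1(signal):
        mags = [abs(z) for z in band]
        mean = sum(mags) / len(mags)
        std = math.sqrt(sum((m - mean) ** 2 for m in mags) / len(mags))
        features += [mean, std]
    return features
```

Unlike the decimated DWT, the stationary transform keeps every band at the input length, so it is shift-invariant; this redundancy is why the band energies sum to twice the input energy. Two sequences can then be compared with, for example, `math.dist(feature_vector(s1), feature_vector(s2))`.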
Prediction of phenotypes of missense mutations in human proteins from biological assemblies.
Wei, Qiong; Xu, Qifang; Dunbrack, Roland L
2013-02-01
Single nucleotide polymorphisms (SNPs) are the most frequent variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied significantly is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins. Copyright © 2012 Wiley Periodicals, Inc.
The geology and geophysics of Mars
NASA Technical Reports Server (NTRS)
Saunders, R. S.
1976-01-01
The current state of knowledge concerning the regional geology and geophysics of Mars is summarized. Telescopic observations of the planet are reviewed, pre-Mariner models of its interior are discussed, and progress achieved with the Mariner flybys, especially that of Mariner 9, is noted. A map of the Martian geological provinces is presented to provide a summary of the surface geology and morphology. The contrast between the northern and southern hemispheres is pointed out, and the characteristic features of the surface are described in detail. The global topography of the planet is examined along with its gravitational field, gravity anomalies, and moment of inertia. The general sequence of events in Martian geological history is briefly outlined.
Introduction to the Apollo collections: Part 2: Lunar breccias
NASA Technical Reports Server (NTRS)
Mcgee, P. E.; Simonds, C. H.; Warner, J. L.; Phinney, W. C.
1979-01-01
Basic petrographic, chemical and age data for a representative suite of lunar breccias are presented for students and potential lunar sample investigators. Emphasis is on sample description and data presentation. Samples are listed, together with a classification scheme based on matrix texture and mineralogy and on the nature and abundance of glass present both in the matrix and as clasts. A calculus of the classification scheme describes the characteristic features of each of the breccia groups. The cratering process, i.e. the sequence of events immediately following an impact event, is discussed, especially the thermal and material transport processes affecting the two major components of lunar breccias (clastic debris and fused material).
Layered classification techniques for remote sensing applications
NASA Technical Reports Server (NTRS)
Swain, P. H.; Wu, C. L.; Landgrebe, D. A.; Hauska, H.
1975-01-01
The single-stage method of pattern classification utilizes all available features in a single test which assigns the unknown to a category according to a specific decision strategy (such as the maximum likelihood strategy). The layered classifier classifies the unknown through a sequence of tests, each of which may be dependent on the outcome of previous tests. Although the layered classifier was originally investigated as a means of improving classification accuracy and efficiency, it was found that in the context of remote sensing data analysis, other advantages also accrue due to many of the special characteristics of both the data and the applications pursued. The layered classifier method and several of the diverse applications of this approach are discussed.
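The contrast between single-stage and layered classification can be illustrated with a small decision tree in which each node applies one test and its outcome selects the next test, so each layer can use only the features it needs. The features (`red`, `nir` reflectance), the NDVI cutoff and the class names are hypothetical and are not taken from the report.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class Node:
    """One layer of the classifier: a test whose outcome selects the next step."""
    test: Callable[[dict], bool]
    if_true: Union[Node, str]   # another test, or a final class label
    if_false: Union[Node, str]


def classify(node, sample):
    """Walk the sequence of tests until a class label (a plain string) is reached."""
    while isinstance(node, Node):
        node = node.if_true if node.test(sample) else node.if_false
    return node


def ndvi(s):
    """Normalized difference vegetation index from near-infrared and red reflectance."""
    return (s["nir"] - s["red"]) / (s["nir"] + s["red"])


# HYPOTHETICAL tree: the first layer separates vegetation; later layers refine each branch.
TREE = Node(
    test=lambda s: ndvi(s) > 0.3,
    if_true=Node(lambda s: s["nir"] > 0.5, "forest", "crop"),
    if_false=Node(lambda s: s["red"] > 0.2, "bare soil", "water"),
)
```

For instance, `classify(TREE, {"red": 0.05, "nir": 0.6})` passes the vegetation test and then the high-NIR test, returning `"forest"`. A single-stage classifier would instead evaluate all features at once in one decision rule.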
Predicting DNA hybridization kinetics from sequence
NASA Astrophysics Data System (ADS)
Zhang, Jinny X.; Fang, John Z.; Duan, Wei; Wu, Lucia R.; Zhang, Angela W.; Dalchau, Neil; Yordanov, Boyan; Petersen, Rasmus; Phillips, Andrew; Zhang, David Yu
2018-01-01
Hybridization is a key molecular process in biology and biotechnology, but so far there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here, we report a weighted neighbour voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on similar reactions with known rate constants. To construct this algorithm we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt sub-sequences of the CYCS and VEGF genes) at temperatures ranging from 28 to 55 °C. Automated feature selection and weighting optimization resulted in a final six-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ∼91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows the design of efficient probe sequences for genomics research.
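The idea of weighted neighbour voting, together with the leave-one-out "within a factor of 3" metric, can be sketched as follows. This is an assumed, simplified form rather than the published model: features are generic numeric vectors (the six selected features are not named here), per-feature weights scale a Euclidean distance, and the n nearest known reactions vote on log10(k) with inverse-distance weights. All function names are hypothetical.

```python
import math


def weighted_distance(a, b, feature_weights):
    """Euclidean distance with a per-feature weight (the tunable part of the model)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(feature_weights, a, b)))


def wnv_predict(query, known, feature_weights, n_neighbours=3, eps=1e-9):
    """Predict log10(k) for `query` from `known` = [(features, log10_k), ...]."""
    ranked = sorted(
        (weighted_distance(query, feats, feature_weights), log_k) for feats, log_k in known
    )
    # Closer reactions get larger votes; eps avoids division by zero for exact matches.
    votes = [(1.0 / (d + eps), log_k) for d, log_k in ranked[:n_neighbours]]
    return sum(w * log_k for w, log_k in votes) / sum(w for w, _ in votes)


def loo_accuracy(data, feature_weights, n_neighbours=3, fold=3.0):
    """Leave-one-out fraction of predictions within `fold`-fold of the measured rate."""
    tolerance = math.log10(fold)  # a factor of 3 in k is a difference of log10(3) in log10(k)
    hits = 0
    for i, (feats, log_k) in enumerate(data):
        rest = data[:i] + data[i + 1:]
        prediction = wnv_predict(feats, rest, feature_weights, n_neighbours)
        hits += abs(prediction - log_k) <= tolerance
    return hits / len(data)
```

Working in log10(k) makes "within a factor of 3" a symmetric tolerance of about 0.477; the feature-weight search mentioned in the abstract would tune `feature_weights` to maximise this leave-one-out score.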
Hepatitis B virus sequencing and liver fibrosis evaluation in HIV/HBV co-infected Nigerians.
Grant, Jennifer; Agbaji, Oche; Kramvis, Anna; Yousif, Mukhlid; Auwal, Mu'azu; Penugonda, Sudhir; Ugoagwu, Placid; Murphy, Robert; Hawkins, Claudia
2017-06-01
Molecular characteristics of hepatitis B virus (HBV), such as genotype and genomic mutations, may contribute to liver-related morbidity and mortality. The association of these characteristics with liver fibrosis severity in sub-Saharan Africa is uncertain. We aimed to characterise molecular HBV features in human immunodeficiency virus (HIV)/HBV co-infected Nigerians and evaluate associations between these characteristics and liver fibrosis severity before and after antiretroviral therapy (ART) initiation. HIV/HBV co-infected Nigerians underwent liver fibrosis estimation by transient elastography (TE) prior to and 36 months after ART initiation. Basal core promoter/precore (BCP/PC) and preS1/preS2/S regions of HBV were sequenced from baseline plasma samples. We evaluated associations between HBV mutations and liver fibrosis severity by univariate and multivariable regression. At baseline, 94 patients underwent TE with median liver stiffness of 6.4 (IQR 4.7-8.7) kPa. Patients were predominantly infected with HBV genotype E (45/46) and HBe-antigen negative (75/94, 79.8%). We identified BCP A1762T/G1764A in 15/35 (43%), PC G1896A in 20/35 (57%), 'a' determinant mutations in 12/45 (26.7%) and preS2 deletions in 6/16 (37.5%). PreS2 mutations were associated with advanced fibrosis in multivariable analysis. At follow-up, median liver stiffness was 5.2 (IQR 4.1-6.6) kPa. No HBV molecular characteristics were associated with lack of fibrosis regression, although HIV virologic control, body mass index (BMI) and baseline CD4+ T-cell count were associated with a decline in fibrosis stage. Frequent BCP/PC and preS1/preS2/S mutations were found in ART-naïve HIV/HBV co-infected Nigerians. Median liver stiffness declined after initiation of ART, regardless of pre-ART HBV mutational pattern or virologic characteristics. © 2017 John Wiley & Sons Ltd.
An Evolutionary Machine Learning Framework for Big Data Sequence Mining
ERIC Educational Resources Information Center
Kamath, Uday Krishna
2014-01-01
Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…
Possible ancient giant basin and related water enrichment in the Arabia Terra province, Mars
Dohm, J.M.; Barlow, N.G.; Anderson, R.C.; Williams, J.-P.; Miyamoto, H.; Ferris, J.C.; Strom, R.G.; Taylor, G.J.; Fairen, A.G.; Baker, V.R.; Boynton, W.V.; Keller, J.M.; Kerry, K.; Janes, D.; Rodriguez, J.A.P.; Hare, T.M.
2007-01-01
A circular albedo feature in the Arabia Terra province was first hypothesized as an ancient impact basin using Viking-era information. To test this unpublished hypothesis, we have analyzed the Viking-era information together with layers of new data derived from the Mars Global Surveyor (MGS) and Mars Odyssey (MO) missions. Our analysis indicates that Arabia Terra is an ancient geologic province of Mars with many distinct characteristics, including predominantly Noachian materials, a unique part of the highland-lowland boundary, a prominent paleotectonic history, the largest region of fretted terrain on the planet, outflow channels with no obvious origins, extensive exposures of eroded layered sedimentary deposits, and notable structural, albedo, thermal inertia, gravity, magnetic, and elemental signatures. The province is also marked by special impact crater morphologies, which suggest a persistent volatile-rich substrate. No single characteristic provides definitive answers to the dominant event(s) that shaped this unique province. Collectively, the characteristics reported here support the following hypothesized sequence of events in Arabia Terra: (1) an enormous basin, possibly of impact origin, formed early in martian history when the magnetic dynamo was active and the lithosphere was relatively thin, (2) sediments and other materials were deposited in the basin during high erosion rates while maintaining isostatic equilibrium, (3) sediments became water enriched during the Noachian Period, and (4) basin materials were uplifted in response to the growth of the Tharsis Bulge, resulting in differential erosion exposing ancient stratigraphic sequences. Parts of the ancient basin remain water-enriched to the present day. © 2007 Elsevier Inc. All rights reserved.
Federal Register 2010, 2011, 2012, 2013, 2014
2012-05-15
... (EPO) as the lead, to propose a revised standard for the filing of nucleotide and/or amino acid.... ST.25 uses a controlled vocabulary of feature keys to describe nucleic acid and amino acid sequences... patent data purposes. The XML standard also includes four qualifiers for amino acids. These feature keys...
Brown, Roger B; Madrid, Nathaniel J; Suzuki, Hideaki; Ness, Scott A
2017-01-01
RNA-sequencing (RNA-seq) has become the standard method for unbiased analysis of gene expression but also provides access to more complex transcriptome features, including alternative RNA splicing, RNA editing, and even detection of fusion transcripts formed through chromosomal translocations. However, differences in library methods can adversely affect the ability to recover these different types of transcriptome data. For example, some methods have bias for one end of transcripts or rely on low-efficiency steps that limit the complexity of the resulting library, making detection of rare transcripts less likely. We tested several commonly used methods of RNA-seq library preparation and found vast differences in the detection of advanced transcriptome features, such as alternatively spliced isoforms and RNA editing sites. By comparing several different protocols available for the Ion Proton sequencer and by utilizing detailed bioinformatics analysis tools, we were able to develop an optimized random primer based RNA-seq technique that is reliable at uncovering rare transcript isoforms and RNA editing features, as well as fusion reads from oncogenic chromosome rearrangements. The combination of optimized libraries and rapid Ion Proton sequencing provides a powerful platform for the transcriptome analysis of research and clinical samples.
Pombert, Jean-François; Lemieux, Claude; Turmel, Monique
2006-01-01
Background: The phylum Chlorophyta contains the majority of the green algae and is divided into four classes. The basal position of the Prasinophyceae has been well documented, but the divergence order of the Ulvophyceae, Trebouxiophyceae and Chlorophyceae is currently debated. The four complete chloroplast DNA (cpDNA) sequences presently available for representatives of these classes have revealed extensive variability in overall structure, gene content, intron composition and gene order. The chloroplast genome of Pseudendoclonium (Ulvophyceae), in particular, is characterized by an atypical quadripartite architecture that deviates from the ancestral type by a large inverted repeat (IR) featuring an inverted rRNA operon and a small single-copy (SSC) region containing 14 genes normally found in the large single-copy (LSC) region. To gain insights into the nature of the events that led to the reorganization of the chloroplast genome in the Ulvophyceae, we have determined the complete cpDNA sequence of Oltmannsiellopsis viridis, a representative of a distinct, early diverging lineage. Results: The 151,933 bp IR-containing genome of Oltmannsiellopsis differs considerably from Pseudendoclonium and other chlorophyte cpDNAs in intron content and gene order, but shares close similarities with its ulvophyte homologue at the levels of quadripartite architecture, gene content and gene density. Oltmannsiellopsis cpDNA encodes 105 genes, contains five group I introns, and features many short dispersed repeats. As in Pseudendoclonium cpDNA, the rRNA genes in the IR are transcribed toward the single copy region featuring the genes typically found in the ancestral LSC region, and the opposite single copy region harbours genes characteristic of both the ancestral SSC and LSC regions. The 52 genes that were transferred from the ancestral LSC to SSC region include 12 of those observed in Pseudendoclonium cpDNA. 
Surprisingly, the overall gene organization of Oltmannsiellopsis cpDNA more closely resembles that of Chlorella (Trebouxiophyceae) cpDNA. Conclusion: The chloroplast genome of the last common ancestor of Oltmannsiellopsis and Pseudendoclonium contained a minimum of 108 genes, carried only a few group I introns, and featured a distinctive quadripartite architecture. Numerous changes were experienced by the chloroplast genome in the lineages leading to Oltmannsiellopsis and Pseudendoclonium. Our comparative analyses of chlorophyte cpDNAs support the notion that the Ulvophyceae is sister to the Chlorophyceae. PMID:16472375
Dai, Hanjun; Umarov, Ramzan; Kuwahara, Hiroyuki; Li, Yu; Song, Le; Gao, Xin
2017-11-15
An accurate characterization of the transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have made it possible to build binding affinity prediction methods, accurately characterizing the TF-DNA binding affinity landscape remains a challenging problem. Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model that captures both position-specific information and long-range dependencies in the sequence. A cornerstone of our method is a novel message-passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses the embedded features to build a predictive model. Our method combines the strengths of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as state-of-the-art binding affinity prediction methods. Our program is freely available at https://github.com/ramzan1990/sequence2vec. Contact: xin.gao@kaust.edu.sa or lsong@cc.gatech.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.
[Prosody, speech input and language acquisition].
Jungheim, M; Miller, S; Kühn, D; Ptok, M
2014-04-01
In order to acquire language, children require speech input, and the prosody of that input plays an important role. In most cultures, adults modify their code when communicating with children; compared with ordinary speech, this code differs especially in prosody. For this review, a selective literature search in PubMed and Scopus was performed. Prosodic characteristics are a key feature of spoken language. By analysing prosodic features, children gain knowledge about underlying grammatical structures. Child-directed speech (CDS) is modified so that meaningful sequences are highlighted acoustically and important information can be extracted from the continuous speech flow more easily. CDS is said to enhance the representation of linguistic signs. Taking into consideration what has previously been described in the literature regarding the perception of suprasegmentals, CDS seems able to support language acquisition owing to the correspondence of prosodic and syntactic units. However, no findings have been reported showing that linguistically reduced CDS could hinder first language acquisition.
DNA replication origins—where do we begin?
Prioleau, Marie-Noëlle; MacAlpine, David M.
2016-01-01
For more than three decades, investigators have sought to identify the precise locations where DNA replication initiates in mammalian genomes. The development of molecular and biochemical approaches to identify start sites of DNA replication (origins) based on the presence of defining and characteristic replication intermediates at specific loci led to the identification of only a handful of mammalian replication origins. The limited number of identified origins prevented a comprehensive and exhaustive search for conserved genomic features that were capable of specifying origins of DNA replication. More recently, the adaptation of origin-mapping assays to genome-wide approaches has led to the identification of tens of thousands of replication origins throughout mammalian genomes, providing an unprecedented opportunity to identify both genetic and epigenetic features that define and regulate their distribution and utilization. Here we summarize recent advances in our understanding of how primary sequence, chromatin environment, and nuclear architecture contribute to the dynamic selection and activation of replication origins across diverse cell types and developmental stages. PMID:27542827
Leng, Yinglin; Liu, Yuhe; Fang, Xiaojing; Li, Yao; Yu, Lei; Yuan, Yun; Wang, Zhaoxia
2015-04-01
Mitochondrial encephalomyopathy with lactic acidosis and stroke-like episodes/Leigh (MELAS/LS) overlap syndrome is a mitochondrial disorder subtype with clinical and magnetic resonance imaging (MRI) features that are characteristic of both MELAS and Leigh syndrome (LS). Here, we report an MELAS/LS case presenting with cortical deafness and seizures. Cranial MRI revealed multiple lesions involving bilateral temporal lobes, the basal ganglia and the brainstem, which conformed to neuroimaging features of both MELAS and LS. Whole mitochondrial DNA (mtDNA) sequencing and PCR-RFLP revealed a de novo heteroplasmic m.10197 G > A mutation in the NADH dehydrogenase subunit 3 gene (ND3), which was predicted to cause an alanine to threonine substitution at amino acid 47. Although the mtDNA m.10197 G > A mutation has been reported in association with LS, Leber hereditary optic neuropathy and dystonia, it has never been linked with MELAS/LS overlap syndrome. Our patient therefore expands the phenotypic spectrum of the mtDNA m.10197 G > A mutation.
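As a quick consistency check on the reported coordinates, the sketch below converts an mtDNA position into a codon number within ND3. It assumes the revised Cambridge Reference Sequence (rCRS) annotation of MT-ND3 as positions 10059-10404 on the forward strand; that annotation is not stated in the abstract and should be verified against the reference actually used.

```python
# Assumed annotation (not from the abstract): MT-ND3 spans rCRS 10059-10404,
# read on the forward strand. Verify against your reference before relying on it.
ND3_START = 10059
ND3_END = 10404

ALANINE = {"GCT", "GCC", "GCA", "GCG"}    # GCN
THREONINE = {"ACT", "ACC", "ACA", "ACG"}  # ACN


def locate_in_cds(position, start=ND3_START, end=ND3_END):
    """Return (codon_number, base_within_codon), both 1-based, for a genomic
    coordinate inside a forward-strand coding sequence."""
    if not start <= position <= end:
        raise ValueError("position lies outside the coding sequence")
    offset = position - start  # 0-based offset into the CDS
    return offset // 3 + 1, offset % 3 + 1


def mutate_first_base(codon, new_base):
    """Replace the first base of a codon."""
    return new_base + codon[1:]


print(locate_in_cds(10197))  # (47, 1)
```

Under that assumption, m.10197 falls at the first base of codon 47, and a G>A change at that base turns any alanine codon (GCN) into a threonine codon (ACN), which agrees with the reported alanine-to-threonine substitution at amino acid 47.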
Direct Evidence of Intrinsic Blue Fluorescence from Oligomeric Interfaces of Human Serum Albumin.
Bhattacharya, Arpan; Bhowmik, Soumitra; Singh, Amit K; Kodgire, Prashant; Das, Apurba K; Mukherjee, Tushar Kanti
2017-10-10
The molecular origin of the concentration-dependent intrinsic blue fluorescence of human serum albumin (HSA) is not yet known. This unusual blue fluorescence is believed to be a characteristic feature of amyloid-like fibrils of proteins/peptides and to originate from the delocalization of peptide bond electrons through the extended hydrogen bond networks of cross-β-sheet structures. Herein, by combining the results of spectroscopy, size exclusion chromatography, native gel electrophoresis, and confocal microscopy, we have shown that the intrinsic blue fluorescence of HSA originates exclusively from oligomeric interfaces devoid of any amyloid-like fibrillar structure. Our study suggests that this low-energy fluorescence band is not due to any particular residue/sequence but is rather a common feature of self-assembled peptide bonds. The present findings of intrinsic blue fluorescence from oligomeric interfaces pave the way for future applications of this unique visual phenomenon to the early-stage detection of various human diseases related to protein aggregation.
Low, Karen J; Ansari, Morad; Abou Jamra, Rami; Clarke, Angus; El Chehadeh, Salima; FitzPatrick, David R; Greenslade, Mark; Henderson, Alex; Hurst, Jane; Keller, Kory; Kuentz, Paul; Prescott, Trine; Roessler, Franziska; Selmer, Kaja K; Schneider, Michael C; Stewart, Fiona; Tatton-Brown, Katrina; Thevenon, Julien; Vigeland, Magnus D; Vogt, Julie; Willems, Marjolaine; Zonana, Jonathan; Study, D D D; Smithson, Sarah F
2017-01-01
PUF60 encodes a nucleic acid-binding protein, a component of multimeric complexes regulating RNA splicing and transcription. In 2013, patients with microdeletions of chromosome 8q24.3 including PUF60 were found to have developmental delay, microcephaly, and craniofacial, renal and cardiac defects. Very similar phenotypes have been described in six patients with variants in PUF60, suggesting that it underlies the syndrome. We report 12 additional patients with PUF60 variants who were ascertained using exome sequencing: six through the Deciphering Developmental Disorders Study and six through similar projects. Detailed phenotypic analysis of all patients was undertaken. All 12 patients had de novo heterozygous PUF60 variants on exome analysis, each confirmed by Sanger sequencing: four frameshift variants resulting in premature stop codons, three missense variants that clustered within the RNA recognition motif of PUF60 and five essential splice-site (ESS) variants. Analysis of cDNA from a fibroblast cell line derived from one of the patients with an ESS variant revealed aberrant splicing. The consistent feature was developmental delay, and most patients had short stature. The phenotypic variability was striking; however, we observed similarities including spinal segmentation anomalies, congenital heart disease, ocular colobomata, hand anomalies and (in two patients) unilateral renal agenesis/horseshoe kidney. Characteristic facial features included micrognathia, a thin upper lip and long philtrum, narrow almond-shaped palpebral fissures, synophrys, flared eyebrows and facial hypertrichosis. Heterozygous loss-of-function variants in PUF60 cause a phenotype comprising growth/developmental delay and craniofacial, cardiac, renal, ocular and spinal anomalies, adding to the disorders of human development resulting from aberrant RNA processing/spliceosomal function. PMID:28327570
Marzoug, Douniazed; Rima, Mohamed; Boutiba, Zitouni; Georgieva, Simona; Kostadinova, Aneta; Pérez-del-Olmo, Ana
2014-02-01
A new hemiurid digenean, Saturnius gibsoni n. sp., is described from the stomach lining of Mugil cephalus L. off Oran, Mediterranean coast of Algeria. Characteristic morphological features of the new species include a small body comprising six pseudosegments, a small ventral sucker, a weakly developed mound-shaped flange at the level of the ventral sucker, and eggs that are large in relation to body size. Saturnius gibsoni n. sp. resembles S. minutus Blasco-Costa, Pankov, Gibson, Balbuena, Raga, Sarabeev & Kostadinova, 2006 and two unidentified Saturnius spp. in its small body size and most metrical features. However, despite the shared presence of five transverse septa resulting in six pseudosegments and the overlapping ranges of some metrical features, the ventral sucker in S. minutus is much larger, the ventral sucker muscular flange is more prominent, the last pseudosegment is narrower in relation to body width and more rounded, and the eggs are smaller (mean 21 × 10 vs 25 × 12 μm). Furthermore, the partial sequences of the 28S rRNA gene region (domains D1-D3; 1,195 nt) obtained from two isolates of S. gibsoni n. sp. differed by 11 nt (0.9%) from that of S. minutus. Both unidentified forms of Saturnius are clearly distinguishable from S. gibsoni n. sp. by the presence of six stout, transverse muscular septa forming seven pseudosegments (vs five septa forming six pseudosegments). Bayesian inference analysis of partial 28S rDNA sequences from a total of 15 species of the families Hemiuridae and Lecithasteridae recovered the Bunocotylinae Dollfus, 1950 as a strongly supported basal clade, with Bunocotyle progenetica (Markowski, 1936) as the closest sister taxon to Saturnius spp.
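The percentage divergence quoted above is an uncorrected p-distance: differing sites divided by sites compared. A minimal sketch, assuming two sequences already aligned to equal length; skipping positions where either sequence has a gap ('-') is a hypothetical convention, not something the abstract specifies.

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences.

    Returns (number_of_differences, proportion_different). Positions with a
    gap ('-') in either sequence are skipped (an assumed convention)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences, differences / compared


# The figures quoted above: 11 differing sites over 1,195 nt.
print(f"{11 / 1195:.1%}")  # 0.9%
```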
Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features
Mohammad-Noori, Morteza; Beer, Michael A.
2014-01-01
Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome-wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue-specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. PMID:25033408
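To make the idea of gapped k-mers concrete, the sketch below enumerates them explicitly: each word of length l contributes one feature for every choice of k informative positions, with the remaining l-k positions treated as gaps. The kernel here is simply the inner product of the two count vectors, with no normalization. This is an illustrative brute-force version under those assumptions; it is not the tree-based kernel computation or the robust frequency estimator described above.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l, k):
    """Count gapped k-mers: every l-mer yields one key per choice of k
    informative positions; the other l-k positions are written as '.'."""
    seq = seq.upper()
    counts = Counter()
    for start in range(len(seq) - l + 1):
        word = seq[start:start + l]
        for keep in combinations(range(l), k):
            key = "".join(word[i] if i in keep else "." for i in range(l))
            counts[key] += 1
    return counts


def gkm_kernel(seq_a, seq_b, l=4, k=3):
    """Unnormalized kernel: inner product of the gapped k-mer count vectors."""
    ca = gapped_kmer_counts(seq_a, l, k)
    cb = gapped_kmer_counts(seq_b, l, k)
    return sum(n * cb[key] for key, n in ca.items())


# One mismatch removes every shared ungapped 3-mer, but gapped features survive.
print(gkm_kernel("ACGT", "AGGT", l=3, k=3))  # 0
print(gkm_kernel("ACGT", "AGGT", l=3, k=2))  # 2
```

The example shows the motivation given in the abstract: exact longer words rarely match, whereas features with gaps still register partial similarity.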
Enhanced regulatory sequence prediction using gapped k-mer features.
Ghandi, Mahmoud; Lee, Dongwon; Mohammad-Noori, Morteza; Beer, Michael A
2014-07-01
Application of genetic algorithm in integrated setup planning and operation sequencing
NASA Astrophysics Data System (ADS)
Kafashi, Sajad; Shakeri, Mohsen
2011-01-01
Process planning is an essential component for linking design and manufacturing. Setup planning and operation sequencing are the two main tasks in process planning. Many studies have solved these two problems separately. Because the two functions are complementary, it is necessary to integrate them more tightly so that the performance of a manufacturing system can be improved economically and competitively. This paper presents a generative system and genetic algorithm (GA) approach to process planning for a given part. The proposed approach and optimization methodology analyse the TAD (tool approach direction), the tolerance relations between features and the feature precedence relations to generate all possible setups and operations using a workshop resource database. Based on these technological constraints, the GA, which adopts a feature-based representation, optimizes the setup plan and the sequence of operations using cost indices. A case study shows that the developed system can generate satisfactory results in optimizing setup planning and operation sequencing simultaneously under feasible conditions.
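A toy sketch of how a GA can optimize setup and operation sequencing together. Everything here is hypothetical: the five operations, their tool approach directions and tools, the precedence pairs and the cost indices are invented for illustration. It also swaps in a priority-based encoding (each individual assigns a priority to every operation, and a decoder builds a precedence-feasible sequence) rather than the feature-based representation used in the paper, so every decoded plan respects precedence by construction.

```python
import random

# Hypothetical part: operation -> (tool approach direction, tool).
OPS = {
    "face": ("+Z", "T1"),
    "drill1": ("+Z", "T2"),
    "tap1": ("+Z", "T3"),
    "drill2": ("+X", "T2"),
    "mill": ("+X", "T1"),
}
# Hypothetical precedence relations: (must come first, must come later).
PRECEDENCE = [("face", "drill1"), ("drill1", "tap1")]
SETUP_COST, TOOL_COST = 10.0, 1.0  # hypothetical cost indices


def decode(priorities):
    """Build a precedence-feasible sequence by repeatedly scheduling the
    ready operation with the highest priority value."""
    done, sequence = set(), []
    while len(sequence) < len(OPS):
        ready = [op for op in sorted(OPS) if op not in done
                 and all(first in done for first, later in PRECEDENCE if later == op)]
        nxt = max(ready, key=lambda op: priorities[op])
        sequence.append(nxt)
        done.add(nxt)
    return sequence


def cost(sequence):
    """Charge for every setup (approach direction) change and tool change."""
    total = 0.0
    for prev, cur in zip(sequence, sequence[1:]):
        if OPS[prev][0] != OPS[cur][0]:
            total += SETUP_COST
        if OPS[prev][1] != OPS[cur][1]:
            total += TOOL_COST
    return total


def evolve(pop_size=30, generations=60, mutation_rate=0.2, seed=0):
    rng = random.Random(seed)
    names = sorted(OPS)

    def fitness(individual):
        return cost(decode(individual))

    population = [{op: rng.random() for op in names} for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness)
        parents = population[: pop_size // 2]  # truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            # Uniform crossover on the priority values.
            child = {op: (p1 if rng.random() < 0.5 else p2)[op] for op in names}
            if rng.random() < mutation_rate:  # point mutation
                child[rng.choice(names)] = rng.random()
            children.append(child)
        population = parents + children
    best = min(population, key=fitness)
    return decode(best), fitness(best)


plan, plan_cost = evolve()
print(plan, plan_cost)
```

On this toy instance the lowest achievable cost is 13 (one unavoidable setup change plus three tool changes, e.g. drill2, mill, face, drill1, tap1); the population size, generation count and seed are arbitrary choices.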
Sherchan, Jatan Bahadur; Miyoshi-Akiyama, Tohru; Ohmagari, Norio; Kirikae, Teruo; Nagamatsu, Maki; Tojo, Masayoshi; Ohara, Hiroshi; Sherchand, Jeevan B.; Tandukar, Sarmila
2015-01-01
Recently, CTX-M-type extended-spectrum-β-lactamase (ESBL)-producing Escherichia coli strains have emerged worldwide. In particular, E. coli with O antigen type 25 (O25) and sequence type 131 (ST131), which is often associated with the CTX-M-15 ESBL, has been increasingly reported globally; however, epidemiological reports on ESBL-producing E. coli in Asia are limited. Patients with clinical isolates of ESBL-producing E. coli in the Tribhuvan University teaching hospital in Kathmandu, Nepal, were included in this study. Whole-genome sequencing of the isolates was conducted to analyze multilocus sequence types, phylotypes, virulence genotypes, O25b-ST131 clones, and the distribution of acquired drug resistance genes. During the study period, 105 patients from whom ESBL-producing E. coli was isolated were identified, and the majority (90%) of these isolates were CTX-M-15 positive. The most dominant ST was ST131 (n = 54; 51.4%), followed by ST648 (n = 15; 14.3%). All ST131 isolates were identified as O25b-ST131 clones, subclone H30-Rx. Three ST groups (ST131, ST648, and non-ST131/648) were compared in further analyses. ST648 isolates had proportionally higher resistance to non-β-lactam antibiotics and carried drug resistance genes more frequently than ST131 or non-ST131/648 isolates. ST131 possessed the most virulence genes, followed by ST648. The clinical characteristics were similar among groups. More than 38% of ESBL-producing E. coli isolates were from the outpatient clinic, and pregnant patients comprised 24% of ESBL-producing E. coli cases. We revealed that the high resistance of ESBL-producing E. coli to multiple classes of antibiotics in Nepal is driven mainly by CTX-M-producing ST131 and ST648. Their high prevalence in the community is a matter of great concern. PMID:25824221
Sequence features of viral and human Internal Ribosome Entry Sites predictive of their activity
Elias-Kirma, Shani; Nir, Ronit; Segal, Eran
2017-01-01
Translation of mRNAs through Internal Ribosome Entry Sites (IRESs) has emerged as a prominent mechanism of cellular and viral translation initiation. It supports cap-independent translation of select cellular genes under normal conditions, and in conditions when cap-dependent translation is inhibited. IRES structure and sequence are believed to be involved in this process. However, due to the small number of known IRESs, there have been no systematic investigations of the determinants of IRES activity. With the recent discovery of thousands of novel IRESs in human and viral genomes, the next challenge is to decipher the sequence determinants of IRES activity. We present the first in-depth computational analysis of a large body of IRESs, exploring RNA sequence features predictive of IRES activity. We identified predictive k-mer features resembling IRES trans-acting factor (ITAF) binding motifs across human and viral IRESs, and found that their effect on expression depends on their sequence, number and position. Our results also suggest that the architecture of retroviral IRESs differs from that of other viruses, presumably due to their exposure to the nuclear environment. Finally, we measured the IRES activity of synthetically designed sequences to confirm our prediction that activity increases as a function of the number of short IRES elements. PMID:28922394
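Because the effect of a k-mer is reported to depend on its number and position, a feature extractor has to record both. The sketch below is one simple way to do that, assuming overlapping matches and equal-width position bins along the sequence; the bin scheme and the example motif "CCU" are hypothetical choices, not the features used in the study.

```python
def kmer_position_features(seq, motif, n_bins=3):
    """Count overlapping occurrences of `motif` and tally them into `n_bins`
    equal-width windows along the sequence (5' to 3')."""
    seq, motif = seq.upper(), motif.upper()
    if not seq or not motif:
        raise ValueError("sequence and motif must be non-empty")
    positions = [i for i in range(len(seq) - len(motif) + 1)
                 if seq.startswith(motif, i)]
    bins = [0] * n_bins
    for i in positions:
        # Map a start position to its window; clamp to the last bin.
        bins[min(i * n_bins // len(seq), n_bins - 1)] += 1
    return {"count": len(positions), "bins": bins, "positions": positions}


# Hypothetical pyrimidine-rich motif in a made-up 18-nt sequence.
print(kmer_position_features("CCUAAAAAACCUAAACCU", "CCU"))
```

Feeding such count-plus-position vectors for many motifs into a regression or classification model is one plausible route to the kind of activity prediction described above.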
PrimerMapper: high throughput primer design and graphical assembly for PCR and SNP detection
O’Halloran, Damien M.
2016-01-01
Primer design is a widely employed technique in diverse molecular applications including PCR, sequencing, and probe hybridization. Variations of PCR, including primer walking, allele-specific PCR, and nested PCR, provide specialized validation and detection protocols for molecular analyses that often require screening large numbers of DNA fragments. In these cases, automated sequence retrieval and processing become important features, and a graphic that provides the user with a visual guide to the distribution of designed primers across targets is most helpful in quickly ascertaining primer coverage. To this end, I describe here PrimerMapper, which provides a comprehensive graphical user interface that designs robust primers from any number of input sequences while providing the user with both graphical maps of primer distribution for each input sequence and a global assembled map of all input sequences with designed primers. PrimerMapper also enables the visualization of graphical maps within a browser and allows the user to draw new primers directly onto the webpage. Other features of PrimerMapper include allele-specific design features for SNP genotyping, a remote BLAST window to NCBI databases, and remote sequence retrieval from GenBank and dbSNP. PrimerMapper is hosted at GitHub and freely available without restriction. PMID:26853558
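The basic checks behind "robust primers" can be illustrated with a few lines of code. This is a generic sketch, not PrimerMapper's implementation: it computes GC content, a Wallace-rule melting temperature (Tm ≈ 2 °C per A/T plus 4 °C per G/C, a rough estimate suited only to short oligonucleotides) and the reverse complement needed for a reverse primer. The acceptance ranges are hypothetical defaults.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_content(primer):
    """Fraction of G and C bases."""
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer):
    """Wallace rule: 2 degC per A/T plus 4 degC per G/C (short oligos only)."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def passes_basic_checks(primer, gc_range=(0.4, 0.6), tm_range=(52, 62)):
    """Hypothetical acceptance window on GC fraction and Wallace Tm."""
    gc = gc_content(primer)
    tm = wallace_tm(primer)
    return gc_range[0] <= gc <= gc_range[1] and tm_range[0] <= tm <= tm_range[1]


print(wallace_tm("ACGTACGTACGTACGTACGT"))  # 60
print(reverse_complement("AACG"))          # CGTT
```

Real design tools usually go further (nearest-neighbour Tm models, self-complementarity and primer-dimer checks), which is why the thresholds above should be read as placeholders.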
The imaging features of the meniscal roots on isotropic 3D MRI in young asymptomatic volunteers.
Wang, Ping; Zhang, Cheng-Zhou; Zhang, Di; Liu, Quan-Yuan; Zhong, Xiao-Fei; Yin, Zhi-Jie; Wang, Bin
2018-05-01
This study aimed to clearly describe the normal imaging features of the meniscal roots on magnetic resonance imaging (MRI) with a 3-dimensional (3D) proton density-weighted (PDW) sequence at 3T. A total of 60 knees of 31 young asymptomatic volunteers were examined using 3D MRI. The insertion patterns, constitution patterns, and MR signals of the meniscal roots were recorded. The anterior root of the medial meniscus (ARMM), the anterior root of the lateral meniscus (ARLM), and the posterior root of the medial meniscus (PRMM) each had a single insertion site, whereas the posterior root of the lateral meniscus (PRLM) could be divided into major and minor insertion sites. The ARLM and the PRMM usually consisted of multiple fiber bundles (≥3), whereas the ARMM and the PRLM often consisted of a single fiber bundle. The ARMM and the PRLM usually appeared hypointense, whereas the ARLM and the PRMM typically exhibited mixed signals. The meniscal roots can be complex and diverse, and several of their characteristic features could be observed on 3D MRI. Understanding the normal imaging features of the meniscal roots is highly beneficial for the diagnosis of root tears.
Bejiqi, Ramush; Retkoceri, Ragip; Bejiqi, Hana; Zeka, Naim
2015-01-01
First described independently in 1912 by Maurice Klippel and Andre Feil, Klippel-Feil syndrome (synonyms: cervical vertebra fusion syndrome, Klippel-Feil deformity, Klippel-Feil sequence disorder) is a bone disorder characterized by the abnormal joining (fusion) of two or more spinal bones in the neck (cervical vertebrae), which is present from birth. Three major features result from this abnormality: a short neck, a limited range of motion in the neck, and a low hairline at the back of the head. Most affected people have one or two of these characteristic features; fewer than half of all individuals with Klippel-Feil syndrome have all three classic features of this condition. Since Feil's first classification into three categories (I–III), other classification systems have been advocated to describe the anomalies, predict potential problems, and guide treatment decisions. Patients with Klippel-Feil syndrome usually present during childhood but may present later in life. The challenge for the clinician is to recognize the associated anomalies that can occur with Klippel-Feil syndrome and to perform the appropriate workup for diagnosis. PMID:27275209
NASA Technical Reports Server (NTRS)
Zhang, Zhengdong; Willson, Richard C.; Fox, George E.
2002-01-01
MOTIVATION: The phylogenetic structure of the bacterial world has been intensively studied by comparing sequences of 16S ribosomal RNA (16S rRNA). This database of sequences is now widely used to design probes for the detection of specific bacteria or groups of bacteria one at a time. The success of such methods reflects the fact that there are local sequence segments that are highly characteristic of particular organisms or groups of organisms. It is not clear, however, to what extent such signature sequences exist in the 16S rRNA dataset. A better understanding of the numbers and distribution of highly informative oligonucleotide sequences may facilitate the design of hybridization arrays that can characterize the phylogenetic position of an unknown organism or serve as the basis for the development of novel approaches for use in bacterial identification. RESULTS: A computer-based algorithm that characterizes the extent to which any individual oligonucleotide sequence in 16S rRNA is characteristic of any particular bacterial grouping was developed. A measure of signature quality, Q(s), was formulated and subsequently calculated for every individual oligonucleotide sequence in the size range of 5-11 nucleotides and for 15mers with reference to each cluster and subcluster in a 929-organism representative phylogenetic tree. Subsequently, the perfect signature sequences were compared to the full set of 7322 sequences to see how common false positives were. The work completed here establishes beyond any doubt that highly characteristic oligonucleotides exist in the bacterial 16S rRNA sequence dataset in large numbers. Over 16,000 15mers were identified that might be useful as signatures. Signature oligonucleotides are available for over 80% of the nodes in the representative tree.
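The notion of a "perfect" signature can be illustrated with plain set operations: an oligonucleotide present in every sequence of the target group and absent from every other sequence, followed by a count of hits in a larger collection to gauge false positives. This sketch does not implement the Q(s) measure, whose formula is not given here; the toy sequences are made up.

```python
def kmers(seq, k):
    """Set of all overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def perfect_signatures(in_group, out_group, k):
    """k-mers found in every in-group sequence and in no out-group sequence."""
    candidates = set.intersection(*(kmers(s, k) for s in in_group))
    for seq in out_group:
        candidates -= kmers(seq, k)
    return candidates


def false_positive_count(oligo, sequences):
    """Number of sequences (outside the target group) containing the oligo."""
    return sum(oligo in seq for seq in sequences)


in_group = ["ACGTTGCA", "TTACGTTG"]
out_group = ["GGACGTAA", "TTTTTTTT"]
print(sorted(perfect_signatures(in_group, out_group, 4)))  # ['CGTT', 'GTTG']
```

With thousands of sequences the same logic holds, but an indexed k-mer table is preferable to rebuilding sets for every comparison.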
Sequence Complexity of Chromosome 3 in Caenorhabditis elegans
Pierro, Gaetano
2012-01-01
The complexity of nucleotide sequences in chromosome 3 of Caenorhabditis elegans (C. elegans) is studied and compared with that of random sequences. Moreover, using complexity-related parameters such as fractal dimension and frequency, together with an indicator matrix, a first classification of C. elegans sequences is given. In particular, the sequences with the highest and lowest fractal values are singled out. It is shown that the low-fractal-dimension sequences intrinsically share many features with random sequences. PMID:22919380
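One common way to attach a fractal dimension to a DNA sequence is to build a binary indicator ("dot-plot") matrix, with entry (i, j) equal to 1 when positions i and j carry the same base, and then estimate its box-counting dimension. The sketch below does exactly that; it is an illustration under those assumptions, and the definitions used in the paper may differ. For simplicity the matrix side must be a power of two.

```python
import math


def indicator_matrix(seq):
    """Binary dot-plot: entry (i, j) is 1 when positions i and j hold the same base."""
    return [[1 if a == b else 0 for b in seq] for a in seq]


def box_counting_dimension(matrix):
    """Least-squares slope of log(occupied boxes) against log(1 / box size)."""
    n = len(matrix)
    if n < 4 or n & (n - 1):
        raise ValueError("side length must be a power of two, at least 4")
    xs, ys = [], []
    size = 1
    while size < n:
        occupied = sum(
            1
            for r in range(0, n, size)
            for c in range(0, n, size)
            if any(matrix[i][j] for i in range(r, r + size) for j in range(c, c + size))
        )
        xs.append(math.log(1.0 / size))
        ys.append(math.log(occupied))
        size *= 2
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


print(box_counting_dimension(indicator_matrix("A" * 16)))     # 2.0 (filled plane)
print(box_counting_dimension(indicator_matrix("ACGT" * 4)))   # about 1.3
```

A homopolymer fills the plane (dimension 2), while a strictly periodic sequence gives a sparser, lower-dimensional pattern; real genomic windows fall somewhere in between.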
Farkas, Szilvia L; Benko, Mária; Elo, Péter; Ursu, Krisztina; Dán, Adám; Ahne, Winfried; Harrach, Balázs
2002-10-01
Approximately 60% of the genome of an adenovirus isolated from a corn snake (Elaphe guttata) was cloned and sequenced. The results of homology searches showed that the genes of the corn snake adenovirus (SnAdV-1) were closest to their counterparts in members of the recently proposed new genus Atadenovirus. In phylogenetic analyses of the complete hexon and protease genes, SnAdV-1 indeed clustered together with the atadenoviruses. The characteristic features in the genome organization of SnAdV-1 included the presence of a gene homologous to that for protein p32K, the lack of structural proteins V and IX and the absence of homologues of the E1A and E3 regions. These characteristics are in accordance with the genus-defining markers of atadenoviruses. Comparison of the cleavage sites of the viral protease in core protein pVII also confirmed SnAdV-1 as a candidate member of the genus Atadenovirus. Thus, the hypothesis of a possible reptilian origin of atadenoviruses (Harrach, Acta Veterinaria Hungarica 48, 484-490, 2000) seems to be supported. However, the DNA sequence determined from the SnAdV-1 genome (>18 kb) showed a balanced GC content of 51%, which is unusual for an atadenovirus.
Mid-Permian Phosphoria Sea in Nevada and the Upwelling Model
Ketner, Keith B.
2009-01-01
The Phosphoria Sea extended at least 500 km westward and at least 700 km southwestward from its core area centered in southeastern Idaho. Throughout that extent it displayed many characteristic features of the core: the same fauna, the same unique sedimentary assemblage including phosphate in mostly pelletal form, chert composed mainly of sponge spicules, and an association with dolomite. Phosphoria-age sediments in Nevada display ample evidence of deposition in shallow water. The chief difference between the sediments in Nevada and those of the core area is the greater admixture of sandstone and conglomerate in Nevada. Evidence of the western margin of the Phosphoria Sea where the water deepened and began to lose its essential characteristics is located in the uppermost part of the Upper Devonian to Permian Havallah sequence, which has been displaced tectonically eastward an unknown distance. The relatively deep water in which the mid-Permian part of the Havallah was deposited was a sea of probably restricted east-west width and was floored by a very thick sequence of mainly terrigenous sedimentary rocks. The phosphate content of mid-Permian strata in western exposures tends to be relatively low as a percentage, but the thickness of those strata tends to be high. The core area in and near southeastern Idaho where the concentration of phosphate is highest was separated from any possible site of upwelling oceanic waters by a great expanse of shallow sea.
NASA Astrophysics Data System (ADS)
Wagner, Martin G.; Laeseke, Paul F.; Schubert, Tilman; Slagowski, Jordan M.; Speidel, Michael A.; Mistretta, Charles A.
2017-03-01
Fluoroscopic image guidance for minimally invasive procedures in the thorax and abdomen suffers from respiratory and cardiac motion, which can cause severe subtraction artifacts and inaccurate image guidance. This work proposes novel techniques for respiratory motion tracking in native fluoroscopic images as well as a model-based estimation of vessel deformation. This would allow compensation for respiratory motion during the procedure and therefore simplify the workflow for minimally invasive procedures such as liver embolization. The method first establishes dynamic motion models for both the contrast-enhanced vasculature and curvilinear background features based on a native (non-contrast) and a contrast-enhanced image sequence acquired prior to device manipulation, under free-breathing conditions. The model of vascular motion is generated by applying the diffeomorphic demons algorithm to an automatic segmentation of the subtraction sequence. The model of curvilinear background features is based on feature tracking in the native sequence. The two models establish the relationship between the respiratory state, which is inferred from curvilinear background features, and the vascular morphology during that same respiratory state. During subsequent fluoroscopy, curvilinear feature detection is applied to determine the appropriate vessel mask to display. The result is a dynamic motion-compensated vessel mask superimposed on the fluoroscopic image. Quantitative evaluation of the proposed methods was performed using a digital 4D CT-phantom (XCAT), which provides realistic human anatomy including sophisticated respiratory and cardiac motion models. Four groups of datasets were generated, where different parameters (cycle length, maximum diaphragm motion and maximum chest expansion) were modified within each image sequence. 
Each group contains 4 datasets consisting of the initial native and contrast-enhanced sequences as well as a sequence in which the respiratory motion is tracked. The respiratory motion tracking error was between 1.00 % and 1.09 %. The estimated dynamic vessel masks yielded a Sørensen-Dice coefficient between 0.94 and 0.96. Finally, the accuracy of the vessel contours was measured in terms of the 99th percentile of the error, which ranged between 0.64 and 0.96 mm. The presented results show that the approach is feasible for respiratory motion tracking and compensation and could therefore considerably improve the workflow of minimally invasive procedures in the thorax and abdomen.
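The Sørensen-Dice coefficient used above to score the vessel masks is 2|A∩B| / (|A| + |B|) for two binary masks A and B, so 1.0 means perfect overlap and 0.0 means none. A minimal sketch, assuming both masks have the same shape and are given as nested lists of 0/1 values; returning 1.0 when both masks are empty is an assumed convention.

```python
def dice_coefficient(mask_a, mask_b):
    """Sørensen-Dice overlap 2|A∩B| / (|A| + |B|) for two same-shaped binary masks."""
    a = [bool(v) for row in mask_a for v in row]
    b = [bool(v) for row in mask_b for v in row]
    if len(a) != len(b):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(x and y for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return 1.0 if total == 0 else 2 * intersection / total


estimated = [[1, 1, 0],
             [0, 1, 0]]
reference = [[1, 0, 0],
             [0, 1, 1]]
print(dice_coefficient(estimated, reference))  # 2*2 / (3+3), about 0.667
```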
Automatic Spatio-Temporal Flow Velocity Measurement in Small Rivers Using Thermal Image Sequences
NASA Astrophysics Data System (ADS)
Lin, D.; Eltner, A.; Sardemann, H.; Maas, H.-G.
2018-05-01
An automatic spatio-temporal flow velocity measurement approach using an uncooled thermal camera is proposed in this paper. The basic principle of the method is to track visible thermal features at the water surface in thermal camera image sequences. Radiometric and geometric calibrations are first performed to remove vignetting effects in the thermal imagery and to obtain the interior orientation parameters of the camera. An object-based unsupervised classification approach is then applied to detect the regions of interest for data referencing and thermal feature tracking. Subsequently, ground control points (GCPs) are extracted to orient the river image sequences, and local hot spots are identified as tracking features. Afterwards, accurate dense tracking outputs are obtained using the pyramidal Lucas-Kanade method. To validate the accuracy potential of the method, measurements obtained from thermal feature tracking are compared with reference measurements taken by a propeller gauge. Results show great potential for automatic flow velocity measurement in small rivers using imagery from a thermal camera.
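The abstract names the pyramidal Lucas-Kanade method but gives no implementation details. The sketch below shows the single-level least-squares step that the pyramidal version repeats from coarse to fine image levels. It assumes NumPy and replaces real thermal frames with a synthetic Gaussian "hot spot" shifted by 0.4 px. The window half-size `half` is a hypothetical choice.

```python
import numpy as np

def lucas_kanade_window(frame0, frame1, center, half=7):
    """Single-level Lucas-Kanade estimate of the (dx, dy) shift of one window.

    Linearizing frame1(x) ~ frame0(x - d) gives Ix*dx + Iy*dy = -It, which is
    solved in the least-squares sense over all pixels of the window.
    """
    f0 = np.asarray(frame0, dtype=float)
    f1 = np.asarray(frame1, dtype=float)
    Iy, Ix = np.gradient(f0)  # derivatives along rows (y) and columns (x)
    It = f1 - f0              # temporal difference between the two frames
    r, c = center
    win = (slice(r - half, r + half + 1), slice(c - half, c + half + 1))
    A = np.column_stack([Ix[win].ravel(), Iy[win].ravel()])
    d, *_ = np.linalg.lstsq(A, -It[win].ravel(), rcond=None)
    return d  # displacement in pixels: (dx, dy)

def gaussian_spot(x0, y0, size=41, sigma=3.0):
    """Synthetic stand-in for a warm surface feature centred at (x0, y0)."""
    yy, xx = np.mgrid[0:size, 0:size]
    return np.exp(-((xx - x0) ** 2 + (yy - y0) ** 2) / (2 * sigma ** 2))

dx, dy = lucas_kanade_window(gaussian_spot(20.0, 20.0), gaussian_spot(20.4, 20.0), center=(20, 20))
print(round(float(dx), 2), round(float(dy), 2))
```

To turn a pixel shift into a surface velocity, you would also need the image scale from the GCP-based orientation and the frame interval. This sketch does not cover that step.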
ERIC Educational Resources Information Center
Moeschler, Jacques
1981-01-01
Analyzes the strategies employed in terminating conversational exchanges, with particular attention to argumentative sequences. Examines the features that distinguish these sequences from those that have a transactional character, and discusses the patterns of verbal interaction attendant to negative responses. Societe Nouvelle Didier Erudition,…
Hand gesture recognition by analysis of codons
NASA Astrophysics Data System (ADS)
Ramachandra, Poornima; Shrikhande, Neelima
2007-09-01
The problem of recognizing gestures from images using computers can be approached by closely understanding how the human brain tackles it. A full-fledged gesture recognition system would completely replace the mouse and keyboard. Humans can recognize most gestures by looking at the characteristic external shape or silhouette of the fingers. Many previous techniques for recognizing gestures dealt with the motion and geometric features of hands. In this thesis, gestures are recognized by the Codon-list pattern extracted from the object contour. All edges of an image are described in terms of a sequence of Codons. The Codons are defined in terms of the relationship between maxima, minima and zeros of curvature encountered as one traverses the boundary of the object. We have concentrated on a catalog of 24 gesture images from the American Sign Language alphabet (letters J and Z are excluded because they are represented using motion) [2]. The query image given as input to the system is analyzed and tested against the Codon-lists, which are shape descriptors for the external parts of a hand gesture. To match the Codon-lists, we used the Weighted Frequency Indexing Transform (WFIT) approach, which is used in DNA sequence matching. The matching algorithm consists of two steps: 1) the query sequences are converted to short sequences and assigned weights, and 2) all sequences of the query gestures are pruned into match and mismatch subsequences by the frequency indexing tree, based on the weights of the subsequences. The Codon sequences with the most weight are used to determine the most precise match. Once a match is found, the identified gesture and its corresponding interpretation are shown as output.
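The sketch below shows how curvature extrema and zeros along a closed boundary can be turned into codon-like segments. It assumes, rather than takes from the thesis, that each segment runs from one curvature minimum to the next, which is one common way codons are cut. The thesis's exact Codon definitions and the WFIT matching step are not reproduced, and the event labels 'M', 'm' and '0' are hypothetical.

```python
import numpy as np

def curvature_events(kappa):
    """Label curvature extrema and zero crossings along a closed contour.

    kappa: curvature samples in traversal order; the contour is closed, so the
    sequence wraps around. Returns (label, index) pairs with labels 'M' (local
    maximum), 'm' (local minimum) and '0' (sign change between samples i and
    i+1). Zeros that fall exactly on a sample are not detected.
    """
    k = np.asarray(kappa, dtype=float)
    n = len(k)
    events = []
    for i in range(n):
        prev, cur, nxt = k[i - 1], k[i], k[(i + 1) % n]
        if cur > prev and cur >= nxt:
            events.append(("M", i))
        elif cur < prev and cur <= nxt:
            events.append(("m", i))
        if cur * nxt < 0:
            events.append(("0", i))
    return events

def codon_segments(events):
    """Split the cyclic event list into segments from one minimum to the next."""
    labels = [label for label, _ in events]
    mins = [i for i, label in enumerate(labels) if label == "m"]
    if not mins:
        return []  # e.g. constant curvature (a circle) has no minima
    n = len(labels)
    ends = mins[1:] + [mins[0] + n]  # the last segment wraps to the first minimum
    return ["".join(labels[j % n] for j in range(a, b + 1)) for a, b in zip(mins, ends)]

# Toy two-lobed contour: curvature cos(2t) sampled at 12 points.
t = np.arange(12) * np.pi / 6
codons = codon_segments(curvature_events(np.cos(2 * t)))
print(codons)  # ['m0M0m', 'm0M0m']
```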
2012-01-01
Background Chaos Game Representation (CGR) is an iterated function that bijectively maps discrete sequences into a continuous domain. As a result, discrete sequences can be the object of statistical and topological analyses otherwise reserved for numerical systems. Characteristically, the CGR coordinates of substrings sharing an L-long suffix are located within a distance of 2^-L of each other. In the two decades since its original proposal, CGR has been generalized beyond its original focus on genomic sequences and has been successfully applied to a wide range of problems in bioinformatics. This report explores the possibility that it can be further extended to approach algorithms that rely on discrete, graph-based representations. Results The exploratory analysis described here consisted of selecting foundational string problems and refactoring them using CGR-based algorithms. We found that CGR can take the role of suffix trees and emulate sophisticated string algorithms, efficiently solving exact and approximate string matching problems such as finding all palindromes and tandem repeats, and matching with mismatches. The common feature of these problems is that they use longest common extension (LCE) queries as subtasks of their procedures, which we show to have a constant-time solution with CGR. Additionally, we show that CGR can be used as a rolling hash function within the Rabin-Karp algorithm. Conclusions The analysis of biological sequences relies on algorithmic foundations facing mounting challenges, both logistical (performance) and analytical (lack of a unifying mathematical framework). CGR is found to provide the latter and to promise the former: graph-based data structures for sequence analysis operations are entailed by numerical-based data structures produced by CGR maps, providing a unifying analytical framework for a diversity of pattern matching problems. PMID:22551152
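The suffix property follows from the CGR update rule: each new symbol moves the current point halfway towards that symbol's corner of the unit square. Any difference between two points is therefore halved once per shared trailing symbol. Below is a minimal sketch for DNA. The corner assignment is an assumed convention, since several are in use.

```python
# Assumed corner convention for the unit square.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}

def cgr_points(seq):
    """Chaos Game Representation: move halfway from the current point to each base's corner."""
    x, y = 0.5, 0.5  # start at the centre of the square
    points = []
    for base in seq.upper():
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points

# Two sequences sharing the 4-long suffix "GATC": their final points must lie
# within 2**-4 of each other on each axis.
p = cgr_points("AAAAGATC")[-1]
q = cgr_points("TTTTGATC")[-1]
gap = max(abs(p[0] - q[0]), abs(p[1] - q[1]))
print(gap <= 2 ** -4)  # True
```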
Taxonomic evaluation of selected Ganoderma species and database sequence validation
Jargalmaa, Suldbold; Eimes, John A.; Park, Myung Soo; Park, Jae Young; Oh, Seung-Yoon
2017-01-01
Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Because species within Ganoderma share highly similar morphological features, species identification has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species. PMID:28761785
A Unified Dynamic Model for Learning, Replay, and Sharp-Wave/Ripples.
Jahnke, Sven; Timme, Marc; Memmesheimer, Raoul-Martin
2015-12-09
Hippocampal activity is fundamental for episodic memory formation and consolidation. During phases of rest and sleep, it exhibits sharp-wave/ripple (SPW/R) complexes, which are short episodes of increased activity with superimposed high-frequency oscillations. Simultaneously, spike sequences reflecting previous behavior, such as traversed trajectories in space, are replayed. Although these phenomena are thought to be crucial for the formation and consolidation of episodic memory, their neurophysiological mechanisms are not well understood. Here we present a unified model showing how experience may be stored and thereafter replayed in association with SPW/Rs. We propose that replay and SPW/Rs are tightly interconnected as they mutually generate and support each other. The underlying mechanism is based on the nonlinear dendritic computation attributable to dendritic sodium spikes that have been prominently found in the hippocampal regions CA1 and CA3, where SPW/Rs and replay are also generated. Besides assigning SPW/Rs a crucial role for replay and thus memory processing, the proposed mechanism also explains their characteristic features, such as the oscillation frequency and the overall wave form. The results shed new light on the dynamical aspects of hippocampal circuit learning. During phases of rest and sleep, the hippocampus, the "memory center" of the brain, generates intermittent patterns of strongly increased overall activity with high-frequency oscillations, the so-called sharp-wave/ripples. We investigate their role in learning and memory processing. They occur together with replay of activity sequences reflecting previous behavior. Developing a unifying computational model, we propose that both phenomena are tightly linked, mutually generating and supporting each other. The underlying mechanism depends on nonlinear amplification of synchronous inputs that has been prominently found in the hippocampus.
Besides assigning sharp-wave/ripples a crucial role for replay generation and thus memory processing, the proposed mechanism also explains their characteristic features, such as the oscillation frequency and the overall wave form. Copyright © 2015 the authors.
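The mechanism described above rests on nonlinear amplification of synchronous inputs. The toy function below illustrates only that idea. It is not the authors' neuron model, and the window, threshold and spike amplitude are hypothetical values chosen for the example. Inputs arriving within a short window are summed linearly, and a sum that reaches threshold is replaced by a larger, fixed "dendritic spike" response.

```python
def max_dendritic_response(times, weights, window=2.0, threshold=5.0, spike_amp=10.0):
    """Toy nonlinear dendrite (hypothetical parameters, times in ms).

    Inputs arriving within `window` of each other are summed linearly. If that
    sum reaches `threshold`, a dendritic spike replaces it with the fixed,
    larger response `spike_amp`; otherwise the response stays linear. Returns
    the largest response over all windows starting at an input spike.
    """
    events = sorted(zip(times, weights))
    best = 0.0
    for i, (t0, _) in enumerate(events):
        total = sum(w for t, w in events[i:] if t < t0 + window)
        response = spike_amp if total >= threshold else total
        best = max(best, response)
    return best

# Same six inputs, delivered synchronously or spread out in time.
sync = max_dendritic_response([0.0, 0.1, 0.2, 0.3, 0.4, 0.5], [1.0] * 6)
spread = max_dendritic_response([0.0, 10.0, 20.0, 30.0, 40.0, 50.0], [1.0] * 6)
print(sync, spread)  # 10.0 1.0
```

The contrast between the two calls shows what "amplification of synchronous inputs" means: the same total input produces a much larger response only when it arrives together.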
Schwarzhans, Jan-Philipp; Wibberg, Daniel; Winkler, Anika; Luttermann, Tobias; Kalinowski, Jörn; Friehs, Karl
2016-05-20
The classic AOX1 replacement approach is still one of the most frequently used techniques for expression of recombinant proteins in the methylotrophic yeast Pichia pastoris. Although this approach is largely successful, it frequently delivers clones with unpredictable production characteristics, and a labor-intensive screening process is required to find the strain with the desired productivity. In this project, 845 P. pastoris clones transformed with a GFP expression cassette were analyzed for their methanol-utilization (Mut) phenotypes, GFP gene expression levels and gene copy numbers. Several groups of strains with irregular features were identified. Such features include GFP expression that is markedly higher or lower than expected based on gene copy number, as well as strains that grew under selective conditions but in which the GFP gene cassette and its expression could not be detected. From these classes of strains, 31 characteristic clones were selected and their genomes sequenced. By correlating the assembled genome data with the experimental phenotypes, novel insights were obtained. These include a clear connection between productivity and cassette-to-cassette orientation in the genome, the occurrence of false-positive clones due to a secondary recombination event, and lower total productivity due to the presence of untransformed cells within the isolates. To address some of these problems, the original vector was optimized by replacing the AOX1 terminator, which prevents the occurrence of false-positive clones due to the secondary recombination event. Standard methods for transformation of P. pastoris led to a multitude of unintended and sometimes detrimental integration events, lowering total productivity. By documenting the connections between productivity and integration events, we obtained a deeper understanding of the genetics of mutation in P. pastoris.
These findings and the derived improved mutagenesis and transformation procedures and tools will help other scientists working on recombinant protein production in P. pastoris and similar non-conventional yeasts.
Web Apollo: a web-based genomic annotation editing platform.
Lee, Eduardo; Helt, Gregg A; Reese, Justin T; Munoz-Torres, Monica C; Childers, Chris P; Buels, Robert M; Stein, Lincoln; Holmes, Ian H; Elsik, Christine G; Lewis, Suzanna E
2013-08-30
Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world.
Web Apollo: a web-based genomic annotation editing platform
2013-01-01
Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world. PMID:24000942
Van Gool, Inge C; Ubachs, Jef E H; Stelloo, Ellen; de Kroon, Cor D; Goeman, Jelle J; Smit, Vincent T H B M; Creutzberg, Carien L; Bosse, Tjalling
2018-01-01
POLE exonuclease domain mutations identify a subset of endometrial cancer (EC) patients with an excellent prognosis. The use of this biomarker has been suggested to refine adjuvant treatment decisions, but the necessary sequencing is not widely performed and is relatively expensive. Therefore, we aimed to identify histopathological and immunohistochemical characteristics to aid in the detection of POLE-mutant ECs. Fifty-one POLE-mutant endometrioid, 67 POLE-wild-type endometrioid and 15 POLE-wild-type serous ECs were included (total N = 133). An expert gynaecopathologist, blinded to molecular features, evaluated each case (two or more slides) for 16 morphological characteristics. Immunohistochemistry was performed for p53, p16, MLH1, MSH2, MSH6, and PMS2. POLE-mutant ECs were characterised by a prominent immune infiltrate: 80% showed peritumoral lymphocytes and 59% showed tumour-infiltrating lymphocytes, as compared with 43% and 28% of POLE-wild-type endometrioid ECs, and 27% and 13% of their serous counterparts (P < 0.01, all comparisons). Of POLE-mutant ECs, 33% contained tumour giant cells; this proportion was significantly higher than that in POLE-wild-type endometrioid ECs (10%; P = 0.003), but not significantly different from that in serous ECs (53%). Serous-like features were as often (focally) present in POLE-mutant as in POLE-wild-type endometrioid ECs (6-24%, depending on the feature). The majority of POLE-mutant ECs showed wild-type p53 (86%), negative/focal p16 (82%) and normal mismatch repair protein expression (90%). A simple combination of morphological and immunohistochemical characteristics (tumour type, grade, peritumoral lymphocytes, MLH1, and p53 expression) can assist in prescreening for POLE exonuclease domain mutations in EC, increasing the probability of a mutation being detected from 7% to 33%. This facilitates the use of this important prognostic biomarker in routine pathology. © 2017 John Wiley & Sons Ltd.
iNuc-PhysChem: A Sequence-Based Predictor for Identifying Nucleosomes via Physicochemical Properties
Feng, Peng-Mian; Ding, Chen; Zuo, Yong-Chun; Chou, Kuo-Chen
2012-01-01
Nucleosome positioning has important roles in key cellular processes. Although intensive efforts have been made in this area, the rules defining nucleosome positioning are still elusive and debated. In this study, we carried out a systematic comparison of the profiles of twelve DNA physicochemical features between nucleosomal and linker sequences in the Saccharomyces cerevisiae genome. We found that nucleosomal sequences have some position-specific physicochemical features, which can be used for in-depth study of nucleosomes. Meanwhile, a new predictor, called iNuc-PhysChem, was developed for identification of nucleosomal sequences by incorporating these physicochemical properties into a 1788-D (dimensional) feature vector, which was further reduced to an 884-D vector via the IFS (incremental feature selection) procedure to optimize the feature set. In a cross-validation test on a benchmark dataset, iNuc-PhysChem achieved an overall success rate of over 96% in identifying nucleosomal or linker sequences. As a web-server, iNuc-PhysChem is freely accessible to the public at http://lin.uestc.edu.cn/server/iNuc-PhysChem. For the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics, which were presented only for the completeness of the predictor's development. Meanwhile, for those who prefer to run predictions on their own computers, the predictor's code can be easily downloaded from the web-server. It is anticipated that iNuc-PhysChem may become a useful high-throughput tool for both basic research and drug design. PMID:23144709
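Incremental feature selection (IFS), as generally used, ranks features first. It then evaluates nested subsets (the top 1, top 2, and so on) and keeps the subset size with the best score. The abstract does not state iNuc-PhysChem's ranking criterion or scoring classifier. In the sketch below, both the feature names and the scorer are hypothetical placeholders for, for example, cross-validated accuracy.

```python
def incremental_feature_selection(ranked_features, score_fn):
    """Evaluate nested subsets of a pre-ranked feature list and keep the best prefix.

    ranked_features: features ordered from most to least relevant.
    score_fn: callable returning a performance score (higher is better) for a
    list of features, e.g. cross-validated accuracy of some classifier.
    """
    best_k, best_score, curve = 0, float("-inf"), []
    for k in range(1, len(ranked_features) + 1):
        score = score_fn(ranked_features[:k])
        curve.append(score)  # the "IFS curve": score versus subset size
        if score > best_score:
            best_k, best_score = k, score
    return ranked_features[:best_k], best_score, curve

# Hypothetical scorer: +1 per informative feature, -0.5 per noisy one.
INFORMATIVE = {"f0", "f1", "f2"}

def toy_score(features):
    return sum(1.0 if f in INFORMATIVE else -0.5 for f in features)

selected, best, curve = incremental_feature_selection(["f0", "f1", "f2", "f3", "f4"], toy_score)
print(selected, best)  # ['f0', 'f1', 'f2'] 3.0
```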
Hayat, Maqsood; Khan, Asifullah
2011-02-21
Membrane proteins are a vital type of protein that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides valuable information for predicting the types of novel membrane proteins. However, classification of membrane protein types can be both time-consuming and error-prone because of the inherent similarity between membrane protein types. In this paper, a neural-network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence; it comprises seven feature sets: amino acid composition, sequence length, 2-gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM in the jackknife test. In the independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers also show improved performance on other measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor. Copyright © 2010 Elsevier Ltd. All rights reserved.
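The dimensionality-reduction step mentioned above is principal component analysis. Below is a minimal sketch via the singular value decomposition of the centred feature matrix, assuming NumPy. The toy feature vectors are hypothetical and are not CPSR features.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project rows of X onto the top principal components (computed by SVD)."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)  # centre each feature
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]  # principal axes, one per row
    explained = S[:n_components] ** 2 / np.sum(S ** 2)  # variance ratio per kept axis
    return Xc @ components.T, components, explained

# Toy feature vectors that vary along a single direction.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
Z, comps, ratio = pca_reduce(X, 1)
print(Z.shape, round(float(ratio[0]), 6))  # (4, 1) 1.0
```

In a real pipeline, the mean and components would be estimated on training data only and then reused to project test sequences.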
Tramontano, A; Macchiato, M F
1986-01-01
An algorithm to determine the probability that a reading frame codes for a protein is presented. It is based on the results of our previous studies on the thermodynamic characteristics of a translated reading frame. We also develop a prediction procedure to distinguish between coding and non-coding reading frames. The procedure is based on the characteristics of the putative product of the DNA sequence and not on periodicity characteristics of the sequence, so the prediction is not biased by the presence of overlapping translated reading frames or by the presence of translated reading frames on the complementary DNA strand. PMID:3753761
NASA Astrophysics Data System (ADS)
Wassmer, Patrick; Gomez, Christopher; Iskandasyah, T. Yan W. M.; Lavigne, Franck; Sartohadi, Junun
2015-07-01
One of the main concerns in deciphering tsunami sedimentary records along the seashore is to link the emplaced layers with marine high-energy events. Based on a combination of morphologic features, sedimentary structures, grain size characteristics, fossil content, microfossil assemblages, geochemical elements and the presence of heavy minerals, it is, in principle, possible to relate the sedimentary record to a tsunami event. However, experience shows that sometimes, because of a lack of any visible sedimentary features, it is hard to decide between a storm and a tsunami origin. To solve this issue, the authors used the Anisotropy of Magnetic Susceptibility (AMS) to reveal the sediment fabric. The validity of the method for reconstructing flow direction has been demonstrated on sediments deposited by a tsunami event whose behaviour was well documented, the 2004 Indian Ocean Tsunami (2004 IOT). We present herein an application of this method to a 56 cm thick paleo-deposit dated 4220 BP, lying beneath the soil covered by the 2004 IOT, SE of Banda Aceh, North Sumatra. We analysed this homogeneous deposit, which lacks any visible structure, using classic sedimentological methods to confirm the occurrence of a high-energy event. We then applied the AMS technique, which allowed the reconstruction of flow characteristics during sediment emplacement. We show that the entire sequence was emplaced by uprush phases and that the local topography played a role in re-orienting part of the uprush flow, creating a strong reverse current. This particular behaviour was reported by eyewitnesses during the 2004 IOT event.
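An AMS measurement is usually summarized by the eigen-decomposition of the symmetric susceptibility tensor. This yields the principal susceptibilities k1 ≥ k2 ≥ k3, the orientations of their axes, and shape ratios such as lineation L = k1/k2 and foliation F = k2/k3. The orientation of these axes is what flow-direction interpretations build on. The abstract does not describe the authors' processing, so the sketch below is generic. It assumes NumPy, a (north, east, down) frame, and a synthetic tensor.

```python
import numpy as np

def ams_summary(K):
    """Principal susceptibilities, axis orientations and shape ratios of an AMS tensor.

    K: symmetric 3x3 susceptibility tensor in a (north, east, down) frame.
    Returns (k, axes, ratios): k = [k1, k2, k3] with k1 >= k2 >= k3; axes holds
    (declination, inclination) in degrees for each principal axis; ratios holds
    lineation L = k1/k2, foliation F = k2/k3 and degree P = k1/k3.
    """
    vals, vecs = np.linalg.eigh(np.asarray(K, dtype=float))  # ascending eigenvalues
    order = np.argsort(vals)[::-1]
    k, v = vals[order], vecs[:, order]
    axes = []
    for i in range(3):
        n, e, d = v[:, i]
        if d < 0:  # an axis has no sign: report its downward-pointing end
            n, e, d = -n, -e, -d
        dec = np.degrees(np.arctan2(e, n)) % 360.0  # for horizontal axes, defined modulo 180
        inc = np.degrees(np.arcsin(np.clip(d, -1.0, 1.0)))
        axes.append((float(dec), float(inc)))
    ratios = {"L": k[0] / k[1], "F": k[1] / k[2], "P": k[0] / k[2]}
    return k, axes, ratios

# Synthetic fabric: maximum axis horizontal and trending 045 degrees.
a = np.radians(45.0)
R = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
K = R @ np.diag([1.05, 1.00, 0.95]) @ R.T
k, axes, ratios = ams_summary(K)
print(round(axes[0][0] % 180.0, 3), round(float(ratios["L"]), 3))
```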
Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning
2014-01-01
X-ray crystallography is the primary approach to solving the three-dimensional structure of a protein. However, a major bottleneck of this method is that the multi-step experimental pipeline, which includes sequence cloning, protein material production, purification, crystallization and, ultimately, structure determination, may fail to yield diffraction-quality crystals. Accordingly, predicting the propensity of a protein to successfully undergo these experimental procedures based on its sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge of the important determinants of the propensity of a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets.
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys. PMID:25148528
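The two-level design described above is a form of stacking. First-level models score each experimental step, and a second-level model is trained on those scores. The sketch below shows only the data flow. So that it runs with NumPy alone, it substitutes least-squares linear scorers for the SVMs, and all data are synthetic. In practice, the first-level outputs used to train the second level should be out-of-fold predictions to avoid leakage; this sketch skips that for brevity.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares linear scorer with a bias term (stand-in for an SVM)."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def score_linear(w, X):
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w

def fit_two_level(X, step_targets, final_target):
    """Level 1: one scorer per experimental step; level 2: a scorer on their outputs.

    NOTE: training level 2 on in-sample level-1 outputs is a simplification;
    out-of-fold level-1 predictions are the usual choice.
    """
    level1 = [fit_linear(X, step_targets[:, j]) for j in range(step_targets.shape[1])]
    Z = np.column_stack([score_linear(w, X) for w in level1])
    return level1, fit_linear(Z, final_target)

def predict_two_level(model, X):
    level1, level2 = model
    Z = np.column_stack([score_linear(w, X) for w in level1])
    return score_linear(level2, Z)

# Synthetic, exactly linear data: 30 "proteins", 5 features, 3 experimental steps.
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 5))
steps = X @ rng.normal(size=(5, 3)) + 0.3
final = steps @ rng.normal(size=3) - 1.0
model = fit_two_level(X, steps, final)
pred = predict_two_level(model, X)
```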
Molecular identification based on ITS sequences for Kappaphycus and Eucheuma cultivated in China
NASA Astrophysics Data System (ADS)
Zhao, Sufen; He, Peimin
2011-11-01
The systematic classification of the Eucheumatoideae is difficult because of their variable morphology and the difficulty of interpreting their reproductive structures. Kappaphycus and Eucheuma specimens cultivated on the Hainan and Fujian coasts of China were introduced from Vietnam, the Philippines and Indonesia. In combination with morphological characteristics, all cultivated Kappaphycus and Eucheuma strains were identified by internal transcribed spacer (ITS) sequences. Phylogenetic trees were constructed using neighbor-joining and maximum likelihood methods. The results indicate that ITS sequence lengths differed among genera and species. An obvious morphological difference between Kappaphycus and Eucheuma could be found in the shape of the protuberances: in Eucheuma they were thorn-like, whereas in Kappaphycus they were wart-like or papillate. The ITS sequences of the two genera differed significantly, with nucleotide variation rates of 58.55%-63.90%. All nucleotide variations occurred in the ITS1 and ITS2 regions, except for five nucleotide transversions in the 5.8S rDNA region. Among congeneric species, the difference lay in the branches: Kappaphycus sp. had branches with small buds, whereas K. alvarezii lacked this feature. The nucleotide variation rates ranged from 7.02% to 7.48% among species, and were <1.20% within the same species of each clade. Eucheumatoideae algae cultivated in China consisted of three clades: K. alvarezii, Kappaphycus sp., and E. denticulatum. The results indicate that ITS sequence analysis is an effective way to identify interspecific and intraspecific phylogenetic relationships and might provide a clue for the molecular identification of Eucheumatoideae algae.
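The abstract reports nucleotide variation rates between aligned ITS sequences but does not define them. A common choice is the uncorrected p-distance: the proportion of compared sites that differ. The sketch below assumes that definition and skips gapped sites. Both the definition and the gap handling are assumptions, and the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Proportion of differing sites between two aligned sequences, skipping gapped sites."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # assumption: sites with a gap in either sequence are ignored
        compared += 1
        differing += a != b
    return differing / compared if compared else 0.0

print(p_distance("ACGTACGTAC", "ACGTTCGTAA"))  # 0.2
print(p_distance("AC-T", "ACGA"))  # one difference in three compared sites
```

Multiplying the result by 100 gives a percentage comparable in form to the rates quoted above.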
A Feature-Based Approach to Modeling Protein–DNA Interactions
Segal, Eran
2008-01-01
Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF–DNA interactions, based on log-linear models. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding site data. We also developed a discriminative motif finder, which discovers de novo FMMs that are enriched in target sets of sequences compared to background sets. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. We then apply our algorithm to high-throughput TF ChIP data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs. Our FMM learning and motif finder software are available at http://genie.weizmann.ac.il/. PMID:18725950
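In a log-linear site model, a sequence's probability is proportional to exp(sum of the weights of the features it matches). A feature may constrain several positions jointly. The sketch below shows only this scoring form, by exhaustive normalization over a tiny site length. It is not the authors' FMM learning algorithm, and the features and weights are hypothetical. Its two features favour 'AT' and 'TA' jointly, which a position-independent PSSM cannot express: any product model satisfies P(AT)·P(TA) = P(AA)·P(TT).

```python
import math
from itertools import product

def feature_score(seq, features):
    """Sum the weights of all features whose (positions, bases) pattern occurs in seq."""
    return sum(w for positions, bases, w in features
               if all(seq[p] == b for p, b in zip(positions, bases)))

def site_distribution(length, features, alphabet="ACGT"):
    """Log-linear model: P(seq) is proportional to exp(sum of matched feature weights)."""
    seqs = ["".join(s) for s in product(alphabet, repeat=length)]
    unnorm = {s: math.exp(feature_score(s, features)) for s in seqs}
    z = sum(unnorm.values())  # partition function over all 4**length sites
    return {s: v / z for s, v in unnorm.items()}

# Hypothetical 2-bp site with two features, each spanning positions 0 and 1.
features = [((0, 1), ("A", "T"), 2.0), ((0, 1), ("T", "A"), 2.0)]
probs = site_distribution(2, features)
print(round(probs["AT"], 4), round(probs["AA"], 4))
```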
Logacheva, Maria D; Samigullin, Tahir H; Dhingra, Amit; Penin, Aleksey A
2008-01-01
Background Chloroplast genome sequences are extremely informative about species interrelationships owing to their non-meiotic and often uniparental inheritance over generations. The subject of our study, Fagopyrum esculentum, is a member of the family Polygonaceae, which belongs to the order Caryophyllales. Uncertainty remains regarding the affinity between Caryophyllales and the asterids, which could be due to undersampling of taxa. Against that background, access to the complete chloroplast genome sequence of Fagopyrum becomes quite pertinent. Results We report the complete chloroplast genome sequence of a wild ancestor of cultivated buckwheat, Fagopyrum esculentum ssp. ancestrale. The sequence was rapidly determined using a previously described PCR-based approach that employed universal primers designed on the scaffold of a multiple sequence alignment of chloroplast genomes. The gene content and order in the buckwheat chloroplast genome are similar to those of Spinacia oleracea. However, some unique structural differences exist: the presence of an intron in the rpl2 gene, a frameshift mutation in the rpl23 gene and extension of the inverted repeat region to include the ycf1 gene. Phylogenetic analysis of 61 protein-coding gene sequences from 44 complete plastid genomes provided strong support for the sister relationship of Caryophyllales (including Polygonaceae) to asterids. Further, our analysis also provided support for Amborella as sister to all other angiosperms, but, interestingly, in the Bayesian phylogenetic inference based on the first two codon positions, Amborella united with Nymphaeales. Conclusion Comparative genomics analyses revealed that the Fagopyrum chloroplast genome harbors the characteristic gene content and organization described for several other chloroplast genomes. However, it has some unique structural features distinct from previously reported complete chloroplast genome sequences.
Phylogenetic analysis of the dataset, including this new sequence from non-core Caryophyllales supports the sister relationship between Caryophyllales and asterids. PMID:18492277
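One analysis above is restricted to the first two codon positions, a common way to exclude the usually more variable third positions. Below is a minimal sketch of building such a partition from an in-frame coding sequence. The example sequence is hypothetical, and applying it column-wise to a codon-aligned matrix is left to the reader.

```python
def first_two_codon_positions(cds):
    """Keep codon positions 1 and 2 of an in-frame coding sequence, dropping position 3."""
    if len(cds) % 3:
        raise ValueError("length must be a multiple of 3 (in-frame codons)")
    return "".join(cds[i:i + 2] for i in range(0, len(cds), 3))

print(first_two_codon_positions("ATGGCCTAA"))  # ATGCTA
```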
Zook, Justin M.; Samarov, Daniel; McDaniel, Jennifer; Sen, Shurjo K.; Salit, Marc
2012-01-01
While the importance of random sequencing errors decreases at higher DNA or RNA sequencing depths, systematic sequencing errors (SSEs) dominate at high sequencing depths and can be difficult to distinguish from biological variants. These SSEs can cause base quality scores to underestimate the probability of error at certain genomic positions, resulting in false-positive variant calls, particularly in mixtures such as samples with RNA editing, tumors, circulating tumor cells, bacteria, mitochondrial heteroplasmy, or pooled DNA. Most algorithms proposed for correction of SSEs require a data set used to calculate the association of SSEs with various features in the reads and sequence context. This data set is typically either from a part of the data set being “recalibrated” (Genome Analysis ToolKit, or GATK) or from a separate data set with special characteristics (SysCall). Here, we combine the advantages of these approaches by adding synthetic RNA spike-in standards to human RNA, and use GATK to recalibrate base quality scores with reads mapped to the spike-in standards. Compared to conventional GATK recalibration that uses reads mapped to the genome, spike-ins improve the accuracy of Illumina base quality scores by a mean of 5 Phred-scaled quality score units, and by as much as 13 units at CpG sites. In addition, since the spike-in data used for recalibration are independent of the genome being sequenced, our method allows run-specific recalibration even for the many species without a comprehensive and accurate SNP database. We also use GATK with the spike-in standards to demonstrate that the Illumina RNA sequencing runs overestimate quality scores for AC, CC, GC, GG, and TC dinucleotides, while SOLiD has fewer dinucleotide SSEs but more SSEs for certain cycles. We conclude that using these DNA and RNA spike-in standards with GATK improves base quality score recalibration. PMID:22859977
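Phred-scaled quality is Q = -10·log10(p), where p is the error probability. A 5-unit improvement therefore corresponds to a factor of 10^0.5 ≈ 3.16 in error probability. The sketch below shows this arithmetic and a bare empirical quality for one bin of bases. Real recalibrators such as GATK add smoothing and priors that are not reproduced here, and the counts are hypothetical.

```python
import math

def phred_from_error(p):
    """Phred-scaled quality Q = -10 * log10(p) for an error probability p > 0."""
    return -10.0 * math.log10(p)

def error_from_phred(q):
    """Inverse mapping: error probability for a Phred quality q."""
    return 10.0 ** (-q / 10.0)

def empirical_quality(mismatches, observations):
    """Observed quality of one bin of bases (no smoothing; needs mismatches > 0)."""
    return phred_from_error(mismatches / observations)

print(round(empirical_quality(10, 10_000), 6))  # 30.0
print(round(error_from_phred(30) / error_from_phred(35), 3))  # 3.162
```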
Ma, Xin; Guo, Jing; Sun, Xiao
2016-01-01
DNA-binding proteins are fundamentally important in cellular processes. Several computational methods have been developed in recent years to improve the prediction of DNA-binding proteins. However, insufficient work has been done on predicting DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and the non-binding propensity of non-binding residues. Comparisons between the individual features demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggest that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.
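The four figures reported for DNABP are standard confusion-matrix metrics. The Matthews correlation coefficient is (TP·TN - FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below uses a hypothetical confusion matrix, not the DNABP results. Returning 0 when the MCC denominator vanishes is an assumed convention.

```python
import math

def binary_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity, specificity and Matthews correlation coefficient."""
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    sensitivity = tp / (tp + fn)  # true-positive rate
    specificity = tn / (tn + fp)  # true-negative rate
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # assumed: 0 when undefined
    return accuracy, sensitivity, specificity, mcc

# Hypothetical confusion matrix (not the DNABP results).
acc, sn, sp, mcc = binary_metrics(tp=90, fn=10, tn=80, fp=20)
print(acc, sn, sp, round(mcc, 4))  # 0.85 0.9 0.8 0.7035
```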
Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin
2007-12-01
Disulfide bonds are the primary covalent crosslinks between pairs of cysteine residues in proteins; they play critical roles in stabilizing protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, knowing the location of disulfide bonds can greatly reduce the search in conformational space. There is therefore a great need for computational methods that accurately predict disulfide connectivity patterns in proteins. We have developed a novel method to predict disulfide connectivity patterns from the protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and secondary structure predicted by the PSIPRED program. Using 4-fold cross-validation on a well-defined non-homologous dataset, our method achieved prediction accuracies of 74.4% at the protein level and 77.9% at the cysteine-pair level, averaged over proteins with two to five disulfide bridges. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. The encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure significantly improved prediction accuracy, enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to current algorithms that should be useful for computationally assigning disulfide connectivity patterns and for annotating protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide
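For 2n bonded cysteines there are (2n - 1)!! possible connectivity patterns (3 for two bonds, 945 for five), so exhaustive enumeration is feasible over the two-to-five-bridge range studied. The sketch below assumes pairwise bonding scores are already available (for example from a regressor such as SVR) and picks the pattern with the highest total score. This selection step, the function names and the example scores are illustrative assumptions, not necessarily how the authors convert SVR outputs into a connectivity pattern.

```python
from typing import Dict, Iterator, List, Tuple

Pair = Tuple[int, int]


def perfect_matchings(cysteines: List[int]) -> Iterator[List[Pair]]:
    """Yield every way of pairing an even number of cysteine positions.

    For 2n cysteines there are (2n - 1)!! patterns (3 for n = 2, 945 for n = 5).
    """
    if not cysteines:
        yield []
        return
    first, rest = cysteines[0], cysteines[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def best_connectivity(cysteines: List[int], score: Dict[Pair, float]) -> List[Pair]:
    """Return the connectivity pattern with the highest summed pair score.

    `score` maps (i, j) with i < j to a predicted bonding score.
    """
    ordered = sorted(cysteines)  # guarantees first < partner in every pair
    return max(perfect_matchings(ordered),
               key=lambda pattern: sum(score[pair] for pair in pattern))


# Hypothetical cysteine positions and pairwise scores.
scores = {(3, 20): 0.9, (41, 58): 0.8, (3, 41): 0.1,
          (20, 58): 0.2, (3, 58): 0.3, (20, 41): 0.4}
print(best_connectivity([3, 20, 41, 58], scores))
```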
Nevada Test Site craters used for astronaut training
NASA Technical Reports Server (NTRS)
Moore, H. J.
1977-01-01
Craters produced by chemical and nuclear explosives at the Nevada Test Site were used to train astronauts before their lunar missions. The craters have characteristics suitable for reconnaissance-type field investigations. The Schooner test produced a crater about 300 m across and excavated more than 72 m of stratigraphic section deposited in a fairly regular fashion so that systematic observations yield systematic results. Other features common on the moon, such as secondary craters and glass-coated rocks, are present at Schooner crater. Smaller explosive tests on Buckboard Mesa excavated rocks from three horizontal alteration zones within basalt flows so that the original sequence of the zones could be determined. One crater illustrated the characteristics of craters formed across vertical boundaries between rock units. Although the exercises at the Nevada Test Site were only a small part of the training of the astronauts, voice transcripts of Apollo missions 14, 16, and 17 show that the exercises contributed to astronaut performance on the moon.
Vancanneyt, M; Mengaud, J; Cleenwerck, I; Vanhonacker, K; Hoste, B; Dawyndt, P; Degivry, M C; Ringuet, D; Janssens, D; Swings, J
2004-03-01
Fourteen homofermentative lactic acid bacteria isolated from kefir grains and kefir fermented milks were assigned to either Lactobacillus kefiranofaciens or Lactobacillus kefirgranum on the basis of their characteristic morphotypes, phenotypic features and SDS-PAGE profiles of whole-cell proteins. Further genotypic analyses of representative strains from both taxa demonstrated that L. kefiranofaciens and L. kefirgranum share 100 % 16S rDNA sequence similarity and belong phylogenetically to the Lactobacillus acidophilus species group. DNA-DNA binding values of >79 % and similar DNA G+C contents of 37-38 mol% showed that the strains studied belong to a single species: L. kefirgranum is a later synonym of L. kefiranofaciens. An emended description is proposed for L. kefiranofaciens. Owing to the specific morphological and biochemical characteristics of these taxa in kefir grain formation, it is proposed that L. kefirgranum be reclassified as L. kefiranofaciens subsp. kefirgranum subsp. nov.
Rosman, Noor Hasyimah; Nor Anuar, Aznah; Chelliapan, Shreeshivadasan; Md Din, Mohd Fadhil; Ujang, Zaini
2014-06-01
The influence of hydraulic retention time (HRT; 24, 12, and 6 h) on the physical characteristics of granules and the performance of a sequencing batch reactor (SBR) treating rubber wastewater was investigated. Results showed that larger granules formed at an HRT of 6 h, with a mean size of 2.0 ± 0.1 mm, sludge volume index of 20.1 mL g⁻¹, settling velocity of 61 m h⁻¹, density of 78.2 g L⁻¹ and integrity coefficient of 9.54. Scanning electron microscopy revealed differences in microbial morphology and granule structure at the different HRTs. The results also demonstrated that up to 98.4% COD reduction was achieved when the reactor was operated at low HRT (6 h). Removal efficiencies of about 92.7% and 89.5% were observed for ammonia and total nitrogen, respectively, in the granular SBR system during treatment of rubber wastewater. Copyright © 2014 Elsevier Ltd. All rights reserved.
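The two quantities that frame this study, hydraulic retention time and percent removal, are simple ratios. The sketch below computes both; the reactor volume, flow rate and concentrations are hypothetical values chosen for illustration. For a reactor run in cycles, such as an SBR, "flow" is taken here to mean the average volume fed per hour (volume exchanged per cycle times cycles per hour).

```python
def removal_efficiency(influent: float, effluent: float) -> float:
    """Percent removal of a constituent (e.g. COD): 100 * (C_in - C_out) / C_in."""
    return 100.0 * (influent - effluent) / influent


def hydraulic_retention_time(volume_l: float, flow_l_per_h: float) -> float:
    """HRT in hours: reactor working volume divided by average influent flow rate."""
    return volume_l / flow_l_per_h


# Hypothetical values: a 6 L reactor fed 1 L/h, COD falling from 5000 to 80 mg/L.
print(f"HRT = {hydraulic_retention_time(6.0, 1.0):.1f} h")
print(f"COD removal = {removal_efficiency(5000.0, 80.0):.1f} %")
```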
Novel approaches in function-driven single-cell genomics.
Doud, Devin F R; Woyke, Tanja
2017-07-01
Deeper sequencing and improved bioinformatics in conjunction with single-cell and metagenomic approaches continue to illuminate undercharacterized environmental microbial communities. This has propelled the 'who is there, and what might they be doing' paradigm to the uncultivated and has already radically changed the topology of the tree of life and provided key insights into the microbial contribution to biogeochemistry. While characterization of 'who' based on marker genes can describe a large fraction of the community, answering 'what are they doing' remains the elusive pinnacle for microbiology. Function-driven single-cell genomics provides a solution by using a function-based screen to subsample complex microbial communities in a targeted manner for the isolation and genome sequencing of single cells. This enables single-cell sequencing to be focused on cells with specific phenotypic or metabolic characteristics of interest. Recovered genomes are conclusively implicated in both encoding and exhibiting the feature of interest, improving downstream annotation and revealing activity levels within that environment. This emerging approach has already improved our understanding of microbial community functioning and facilitated the experimental analysis of uncharacterized gene product space. Here we provide a comprehensive review of strategies that have been applied for function-driven single-cell genomics and the future directions we envision. © FEMS 2017.
Novel approaches in function-driven single-cell genomics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Doud, Devin F. R.; Woyke, Tanja
Deeper sequencing and improved bioinformatics in conjunction with single-cell and metagenomic approaches continue to illuminate undercharacterized environmental microbial communities. This has propelled the 'who is there, and what might they be doing' paradigm to the uncultivated and has already radically changed the topology of the tree of life and provided key insights into the microbial contribution to biogeochemistry. While characterization of 'who' based on marker genes can describe a large fraction of the community, answering 'what are they doing' remains the elusive pinnacle for microbiology. Function-driven single-cell genomics provides a solution by using a function-based screen to subsample complex microbial communities in a targeted manner for the isolation and genome sequencing of single cells. This enables single-cell sequencing to be focused on cells with specific phenotypic or metabolic characteristics of interest. Recovered genomes are conclusively implicated in both encoding and exhibiting the feature of interest, improving downstream annotation and revealing activity levels within that environment. This emerging approach has already improved our understanding of microbial community functioning and facilitated the experimental analysis of uncharacterized gene product space. Here we provide a comprehensive review of strategies that have been applied for function-driven single-cell genomics and the future directions we envision.
Le Bras, Ronan J; Kuzma, Heidi; Sucic, Victor; Bokelmann, Götz
2016-05-01
A notable sequence of calls, spanning several days in January 2003, was recorded in the central Indian Ocean by a hydrophone triplet sampling acoustic data at 250 Hz. This paper presents signal processing methods applied to the waveform data to detect and group the recorded signals and to extract their amplitude and bearing estimates. An approximate location for the source of the call sequence is inferred from the features extracted from the waveforms. As the source approaches the hydrophone triplet, the source level (SL) of the calls is estimated at 187 ± 6 dB re 1 μPa at 1 m in the 15-60 Hz frequency range. The calls are attributed to a subgroup of blue whales, Balaenoptera musculus, with a characteristic acoustic signature. A Bayesian location method using probabilistic models for bearing and amplitude is demonstrated on the call sequence. Applied to detections at a single hydrophone triad, the method yields a probability distribution map for the origin of the calls. It can be extended to detections at multiple triads and, because of the Bayesian formulation, additional modeling complexity can be built in as needed.
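A grid-based version of single-triad Bayesian location can be sketched as follows: each candidate cell is weighted by how well its bearing matches the measured bearing and how well its predicted received level matches the measured amplitude. The flat prior, the Gaussian error models, spherical spreading loss (TL = 20 log10 r), the assumed known source level and all example numbers are hypothetical simplifications, not the probabilistic models used in the paper.

```python
import math
from typing import Dict, Tuple


def wrap_angle(a: float) -> float:
    """Wrap an angle difference in radians into [-pi, pi)."""
    return (a + math.pi) % (2 * math.pi) - math.pi


def location_posterior(obs_bearing_deg: float, obs_level_db: float,
                       source_level_db: float, sigma_bearing_deg: float,
                       sigma_level_db: float, extent_km: float = 500.0,
                       step_km: float = 10.0) -> Dict[Tuple[float, float], float]:
    """Posterior over a square grid centred on the triad (x east, y north, km).

    Assumes a flat prior, Gaussian bearing and amplitude errors, and spherical
    spreading loss TL = 20 log10(r / 1 m).
    """
    n = int(round(extent_km / step_km))
    sigma_b = math.radians(sigma_bearing_deg)
    weights = {}
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x, y = i * step_km, j * step_km
            r_m = math.hypot(x, y) * 1000.0
            if r_m == 0.0:
                continue  # skip the sensor location itself
            bearing = math.atan2(x, y)  # clockwise from north
            d_bearing = wrap_angle(bearing - math.radians(obs_bearing_deg))
            predicted_level = source_level_db - 20.0 * math.log10(r_m)
            d_level = obs_level_db - predicted_level
            log_like = (-0.5 * (d_bearing / sigma_b) ** 2
                        - 0.5 * (d_level / sigma_level_db) ** 2)
            weights[(x, y)] = math.exp(log_like)
    total = sum(weights.values())
    return {cell: w / total for cell, w in weights.items()}


# Hypothetical detection: bearing 45 degrees, received level 87 dB, assumed SL 187 dB.
post = location_posterior(45.0, 87.0, 187.0, sigma_bearing_deg=2.0, sigma_level_db=3.0)
print(max(post, key=post.get))
```

Extending this to several triads would, under the same assumptions, multiply the per-triad likelihoods cell by cell before normalizing.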
NASA Astrophysics Data System (ADS)
Dmitriev, A. V.; Suvorova, A. V.
2012-08-01
Here, we present a case study of THEMIS and ground-based observations of the perturbed dayside magnetopause and geomagnetic field in relation to the interaction of an interplanetary directional discontinuity (DD) with the magnetosphere on 16 June 2007. The interaction resulted in a large-scale local distortion of the magnetopause in the form of an "expansion-compression-expansion" (ECE) sequence that lasted for ~15 min. The compression was caused by a very dense, cold, and fast high-β magnetosheath plasma flow, a so-called plasma jet, whose kinetic energy was approximately three times higher than that of the incident solar wind. The plasma jet resulted in effective penetration of magnetosheath plasma into the magnetosphere. A strong distortion of the Chapman-Ferraro current during the ECE sequence generated a tripolar "decrease-peak-decrease" (DPD) magnetic pulse that was observed at low and middle latitudes by some ground-based magnetometers of the INTERMAGNET network. The characteristics of the ECE sequence and the spatial-temporal dynamics of the DPD pulse were found to be very different from any previously reported patterns of DD interaction with the magnetosphere. The observed features only partially resembled structures such as flux transfer events (FTEs), hot flow anomalies, and transient density events. Thus, they are difficult to explain in the context of existing models.
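The comparison of jet and solar-wind kinetic energy rests on the bulk-flow kinetic energy density ½ n m_p v² (with dynamic pressure n m_p v²). The sketch below computes both for a proton-only plasma; the densities and speeds are hypothetical values chosen for illustration, not the THEMIS measurements, and alpha particles are ignored.

```python
PROTON_MASS_KG = 1.67262192e-27


def kinetic_energy_density(n_cm3: float, v_km_s: float) -> float:
    """Bulk-flow kinetic energy density 0.5 * n * m_p * v^2 in J/m^3 (protons only)."""
    n_m3 = n_cm3 * 1e6
    v_m_s = v_km_s * 1e3
    return 0.5 * n_m3 * PROTON_MASS_KG * v_m_s ** 2


def dynamic_pressure_npa(n_cm3: float, v_km_s: float) -> float:
    """Dynamic pressure n * m_p * v^2 expressed in nanopascals."""
    return 2.0 * kinetic_energy_density(n_cm3, v_km_s) * 1e9


# Hypothetical plasma parameters, not the THEMIS measurements.
solar_wind = kinetic_energy_density(n_cm3=5.0, v_km_s=400.0)
jet = kinetic_energy_density(n_cm3=20.0, v_km_s=350.0)
print(f"jet / solar wind kinetic energy ratio: {jet / solar_wind:.2f}")
print(f"solar wind dynamic pressure: {dynamic_pressure_npa(5.0, 400.0):.2f} nPa")
```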
Takai, Erina; Totoki, Yasushi; Nakamura, Hiromi; Kato, Mamoru; Shibata, Tatsuhiro; Yachida, Shinichi
2016-01-01
Pancreatic ductal adenocarcinoma (PDAC) remains one of the most lethal malignancies. The genomic landscape of the PDAC genome features four frequently mutated genes (KRAS, CDKN2A, TP53, and SMAD4) and dozens of candidate driver genes altered at low frequency, including potential clinical targets. Circulating cell-free DNA (cfDNA) is a promising resource to detect molecular characteristics of tumors, supporting the concept of "liquid biopsy". We determined the mutational status of KRAS in plasma cfDNA using multiplex droplet digital PCR in 259 patients with PDAC, retrospectively. Furthermore, we constructed a novel modified SureSelect-KAPA-Illumina platform and an original panel of 60 genes. We then performed targeted deep sequencing of cfDNA in 48 patients who had ≥1 % mutant allele frequencies of KRAS in plasma cfDNA. Droplet digital PCR detected KRAS mutations in plasma cfDNA in 63 of 107 (58.9 %) patients with inoperable tumors. Importantly, potentially targetable somatic mutations were identified in 14 of 48 patients (29.2 %) examined by cfDNA sequencing. Our two-step approach with plasma cfDNA, combining droplet digital PCR and targeted deep sequencing, is a feasible clinical approach. Assessment of mutations in plasma cfDNA may provide a new diagnostic tool, assisting decisions for optimal therapeutic strategies for PDAC patients.
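Because a droplet can hold more than one template molecule, digital PCR counts are usually Poisson-corrected: the mean copies per droplet is λ = -ln(1 - k/n) for k positive droplets out of n. The sketch below estimates a mutant allele fraction from per-channel positive-droplet counts and checks it against the ≥1% threshold used to select patients for sequencing. The droplet counts are hypothetical, and treating the mutant and wild-type channels independently (ignoring how double-positive droplets are handled) is a simplifying assumption.

```python
import math


def copies_per_droplet(positive: int, total: int) -> float:
    """Poisson-corrected mean target copies per droplet: lambda = -ln(1 - k/n)."""
    return -math.log(1.0 - positive / total)


def mutant_allele_fraction(mutant_positive: int, wildtype_positive: int,
                           total_droplets: int) -> float:
    """Mutant fraction from per-channel positive-droplet counts."""
    lam_mut = copies_per_droplet(mutant_positive, total_droplets)
    lam_wt = copies_per_droplet(wildtype_positive, total_droplets)
    return lam_mut / (lam_mut + lam_wt)


# Hypothetical droplet counts for one plasma sample.
maf = mutant_allele_fraction(mutant_positive=60, wildtype_positive=3000, total_droplets=15000)
print(f"MAF = {maf:.2%}; meets >=1% threshold: {maf >= 0.01}")
```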
Clinical utility of circulating tumor DNA for molecular assessment in pancreatic cancer.
Takai, Erina; Totoki, Yasushi; Nakamura, Hiromi; Morizane, Chigusa; Nara, Satoshi; Hama, Natsuko; Suzuki, Masami; Furukawa, Eisaku; Kato, Mamoru; Hayashi, Hideyuki; Kohno, Takashi; Ueno, Hideki; Shimada, Kazuaki; Okusaka, Takuji; Nakagama, Hitoshi; Shibata, Tatsuhiro; Yachida, Shinichi
2015-12-16
Pancreatic ductal adenocarcinoma (PDAC) remains one of the most lethal malignancies. The genomic landscape of the PDAC genome features four frequently mutated genes (KRAS, CDKN2A, TP53, and SMAD4) and dozens of candidate driver genes altered at low frequency, including potential clinical targets. Circulating cell-free DNA (cfDNA) is a promising resource to detect and monitor molecular characteristics of tumors. In the present study, we determined the mutational status of KRAS in plasma cfDNA using multiplex picoliter-droplet digital PCR in 259 patients with PDAC. We constructed a novel modified SureSelect-KAPA-Illumina platform and an original panel of 60 genes. We then performed targeted deep sequencing of cfDNA and matched germline DNA samples in 48 patients who had ≥1% mutant allele frequencies of KRAS in plasma cfDNA. Importantly, potentially targetable somatic mutations were identified in 14 of 48 patients (29.2%) examined by targeted deep sequencing of cfDNA. We also analyzed somatic copy number alterations based on the targeted sequencing data using our in-house algorithm, and potentially targetable amplifications were detected. Assessment of mutations and copy number alterations in plasma cfDNA may provide a prognostic and diagnostic tool to assist decisions regarding optimal therapeutic strategies for PDAC patients.
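The abstract does not describe the in-house copy-number algorithm. As a generic illustration only, a common starting point for targeted panels is a per-gene log2 ratio of cfDNA depth to matched germline depth, centred on the median so that most genes are treated as copy-neutral. The gene labels, depths and the amplification threshold below are hypothetical. In cfDNA, the tumor-derived fraction dilutes real amplifications, so observed ratios are usually smaller than in tumor tissue.

```python
import math
import statistics
from typing import Dict, List


def log2_copy_ratios(cfdna_depth: Dict[str, float],
                     germline_depth: Dict[str, float]) -> Dict[str, float]:
    """Median-normalized log2 ratio of cfDNA depth to matched germline depth per gene."""
    raw = {gene: cfdna_depth[gene] / germline_depth[gene] for gene in cfdna_depth}
    centre = statistics.median(raw.values())  # assumes most genes are copy-neutral
    return {gene: math.log2(ratio / centre) for gene, ratio in raw.items()}


def call_gains(ratios: Dict[str, float], threshold: float = 1.0) -> List[str]:
    """Genes whose log2 ratio exceeds a (hypothetical) amplification threshold."""
    return sorted(gene for gene, value in ratios.items() if value > threshold)


# Hypothetical mean depths per gene.
cfdna = {"gene_1": 1000, "gene_2": 1100, "gene_3": 950, "gene_4": 1000, "gene_5": 4000}
germline = {gene: 1000 for gene in cfdna}
ratios = log2_copy_ratios(cfdna, germline)
print(call_gains(ratios))
```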
Chen, Jing-Hua; Yu, Long-Jiang; Boussac, Alain; Wang-Otomo, Zheng-Yu; Kuang, Tingyun; Shen, Jian-Ren
2018-04-24
The thermophilic purple sulfur bacterium Thermochromatium tepidum possesses four main water-soluble redox proteins involved in electron transfer. Crystal structures have been determined for three of them: a high-potential iron-sulfur protein, cytochrome c', and one of the two low-potential cytochromes c552 (a flavocytochrome c). In this study, we purified the other low-potential cytochrome c552 (LPC), determined its N-terminal amino acid sequence and complete gene sequence, characterized it by absorption and electron paramagnetic resonance spectroscopy, and solved its high-resolution crystal structure. This novel cytochrome was found to contain five c-type hemes. The overall fold of LPC consists of two distinct domains: a five-heme domain and an Ig-like domain. LPC provides a representative example of the structure of a multiheme cytochrome containing an odd number of hemes, whereas structures of multiheme cytochromes with an even number of hemes are frequently seen in the PDB. Comparison of the sequence and structure of LPC with other proteins in the databases revealed several characteristic features that may be important for its function. Based on the results obtained, we discuss the possible intracellular function of this LPC in Tch. tepidum.
Kariminejad, Ariana; Ajeawung, Norbert Fonya; Bozorgmehr, Bita; Dionne-Laporte, Alexandre; Molidperee, Sirinart; Najafi, Kimia; Gibbs, Richard A; Lee, Brendan H; Hennekam, Raoul C; Campeau, Philippe M
2017-04-01
Kaufman oculo-cerebro-facial syndrome (KOS) is caused by recessive UBE3B mutations and presents with microcephaly, ocular abnormalities, distinctive facial morphology, low cholesterol levels and intellectual disability. We describe a child with microcephaly, brachycephaly, hearing loss, ptosis, blepharophimosis, hypertelorism, cleft palate, multiple renal cysts, absent nails, small or absent terminal phalanges, absent speech and intellectual disability. Syndromes initially considered included DOORS syndrome, Coffin-Siris syndrome and Dubowitz syndrome. Clinical investigations coupled with karyotype analysis, array comparative genomic hybridization, and exome and Sanger sequencing were performed to characterize the condition in this child. Sanger sequencing was negative for the DOORS syndrome gene TBC1D24, but exome sequencing identified a homozygous deletion in UBE3B (NM_183415:c.3139_3141del, p.1047_1047del) located within the terminal portion of the HECT domain. This finding, together with characteristic features such as brachycephaly, ptosis, blepharophimosis, hypertelorism, short palpebral fissures, cleft palate and developmental delay, allowed us to make a diagnosis of KOS. In conclusion, our findings highlight the importance of considering KOS in the differential diagnosis of patients under evaluation for DOORS syndrome, and they expand the KOS phenotype to include small or absent terminal phalanges and nails, hallux varus and multicystic dysplastic kidneys.
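In coding-DNA numbering (c.1 is the A of the ATG start codon), codon k spans positions 3k-2 to 3k. The variant c.3139_3141del therefore removes three nucleotides that start exactly at the first base of codon 1047: an in-frame deletion of a single whole codon, consistent with the reported protein change at residue 1047. The helper below (hypothetical names) makes that arithmetic explicit; a three-base deletion that is not codon-aligned would instead touch two codons.

```python
from typing import List, Tuple


def codon_number(c_pos: int) -> int:
    """Codon containing coding-DNA position c_pos (c.1 is the A of the ATG start codon)."""
    return (c_pos - 1) // 3 + 1


def describe_deletion(start: int, end: int) -> Tuple[int, bool, bool, List[int]]:
    """Length, in-frame status, codon alignment and codons touched by c.start_enddel."""
    length = end - start + 1
    in_frame = length % 3 == 0
    codon_aligned = in_frame and (start - 1) % 3 == 0
    codons = sorted({codon_number(p) for p in range(start, end + 1)})
    return length, in_frame, codon_aligned, codons


print(describe_deletion(3139, 3141))
```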
Statistical Methods for Identifying Sequence Motifs Affecting Point Mutations
Zhu, Yicheng; Neeman, Teresa; Yap, Von Bing; Huttley, Gavin A.
2017-01-01
Mutation processes differ between types of point mutation, genomic locations, cells, and biological species. For some point mutations, specific neighboring bases are known to be mechanistically influential. Beyond these cases, numerous questions remain unresolved, including: what are the sequence motifs that affect point mutations? How large are the motifs? Are they strand symmetric? And do they vary between samples? We present new log-linear models that allow explicit examination of these questions, along with sequence-logo-style visualization to enable identification of specific motifs. We demonstrate the performance of these methods by analyzing mutation processes in human germline and malignant melanoma. We recapitulate the known CpG effect, and identify novel motifs, including a highly significant motif associated with A→G mutations. We show that the major effects of neighbors on germline mutation lie within ±2 positions of the mutating base. Models are also presented for contrasting entire mutation spectra (the distributions of the different point mutations). We show that the spectra vary significantly between the autosomes and the X chromosome, with a difference in T→C transitions dominating. Analyses of malignant melanoma confirmed reported characteristic features of this cancer, including statistically significant strand asymmetry and markedly different neighboring influences. The methods we present are made freely available as a Python library https://bitbucket.org/pycogent3/mutationmotif. PMID:27974498
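The paper fits log-linear models; the sketch below is a much simpler stand-in and not those models. It counts the bases at offsets ±1 and ±2 around mutated positions and compares them, as a log2 ratio with a pseudocount, with the bases around unmutated positions carrying the same reference base. The toy sequence and positions are hypothetical and were chosen so that the "mutated" cytosines sit in CpG context, which shows up as strong enrichment of G at offset +1.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

OFFSETS = (-2, -1, 1, 2)
BASES = "ACGT"


def neighbour_counts(sequence: str, positions: Iterable[int]) -> Dict[int, Counter]:
    """Count bases at each offset around the given positions (out-of-range offsets skipped)."""
    counts = {k: Counter() for k in OFFSETS}
    for pos in positions:
        for k in OFFSETS:
            j = pos + k
            if 0 <= j < len(sequence):
                counts[k][sequence[j]] += 1
    return counts


def log2_enrichment(mut: Dict[int, Counter], ref: Dict[int, Counter],
                    pseudocount: float = 0.5) -> Dict[Tuple[int, str], float]:
    """log2 of the neighbour-base frequency at mutated vs. reference positions."""
    result = {}
    for k in OFFSETS:
        m_total = sum(mut[k].values()) + 4 * pseudocount
        r_total = sum(ref[k].values()) + 4 * pseudocount
        for b in BASES:
            m = (mut[k][b] + pseudocount) / m_total
            r = (ref[k][b] + pseudocount) / r_total
            result[(k, b)] = math.log2(m / r)
    return result


seq = "TCGATCGATCAATCAA"      # hypothetical toy sequence
mutated_c = [1, 5]            # cytosines followed by G (CpG)
unmutated_c = [9, 13]         # cytosines followed by A
enr = log2_enrichment(neighbour_counts(seq, mutated_c), neighbour_counts(seq, unmutated_c))
print(round(enr[(1, "G")], 2))
```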
Bernardinelli, Emanuele; Nofziger, Charity; Patsch, Wolfgang; Rasp, Gerd; Paulmichl, Markus; Dossena, Silvia
2018-01-01
The prevalence and spectrum of sequence alterations in the SLC26A4 gene, which codes for the anion exchanger pendrin, are population-specific and account for at least 50% of cases of non-syndromic hearing loss associated with an enlarged vestibular aqueduct. A cohort of nineteen patients from Austria with hearing loss and a radiological alteration of the vestibular aqueduct underwent Sanger sequencing of SLC26A4 and of GJB2, which codes for connexin 26. The pathogenicity of the detected sequence alterations was assessed by determining ion transport and molecular features of the corresponding SLC26A4 protein variants. In this group, four previously uncharacterized sequence alterations within the SLC26A4 coding region were found. Three of these lead to protein variants with abnormal functional and molecular features, while the fourth appears to have no pathogenic potential. Pathogenic SLC26A4 sequence alterations were found in only 12% of patients. SLC26A4 sequence alterations commonly found in other Caucasian populations were not detected. This survey represents the first study of the prevalence and spectrum of SLC26A4 sequence alterations in an Austrian cohort and further suggests that genetic testing should always be integrated with functional characterization and determination of the molecular features of protein variants in order to unequivocally identify or exclude a causal link between genotype and phenotype. PMID:29320412
Noise-robust speech recognition through auditory feature detection and spike sequence decoding.
Schafer, Phillip B; Jin, Dezhe Z
2014-03-01
Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences: one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter, words are recognized by comparing their spike sequences with template sequences obtained from clean training data, using a similarity measure based on the length of the longest common subsequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
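The template-matching step can be sketched as follows, assuming each spike sequence is represented as the ordered list of IDs of the neurons that fired. Similarity is the longest-common-subsequence length divided by the length of the longer sequence. That normalization, the function names and the toy templates are assumptions for illustration, not the authors' exact measure.

```python
from typing import Dict, List, Sequence


def lcs_length(a: Sequence, b: Sequence) -> int:
    """Length of the longest common subsequence (dynamic programming, O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a: Sequence, b: Sequence) -> float:
    """LCS length normalized by the longer sequence (an assumed normalization)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def recognize(spikes: Sequence[int], templates: Dict[str, List[List[int]]]) -> str:
    """Return the word whose best-matching template is most similar to `spikes`."""
    return max(templates,
               key=lambda word: max(similarity(spikes, t) for t in templates[word]))


# Hypothetical templates: neuron IDs in firing order from clean training utterances.
templates = {"one": [[1, 4, 2, 7]], "two": [[3, 5, 6, 8]]}
print(recognize([1, 9, 4, 2, 7], templates))  # noisy "one" with a spurious spike from neuron 9
```

Because the LCS ignores unmatched elements, a spurious or missing spike lowers the score only slightly, which is one plausible reason this kind of measure tolerates noise.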
TFBSshape: a motif database for DNA shape features of transcription factor binding sites.
Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W; Gordân, Raluca; Rohs, Remo
2014-01-01
Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone. PMID:24214955
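Sliding-window shape predictors commonly assign a structural value to the central base of each 5-bp window by looking it up in a precomputed pentamer table; averaging those values position by position over aligned binding sites gives one row of a shape heat map. The sketch below assumes that scheme. The table it builds is an arbitrary placeholder (6.0 minus 0.25 per A/T base), NOT real minor groove widths or any values used by TFBSshape, and all function names are hypothetical.

```python
from itertools import product
from typing import Dict, List, Optional

Table = Dict[str, float]


def placeholder_table() -> Table:
    """Toy pentamer table: 6.0 minus 0.25 per A/T. NOT real minor groove widths."""
    return {"".join(p): 6.0 - 0.25 * sum(b in "AT" for b in p)
            for p in product("ACGT", repeat=5)}


def shape_profile(sequence: str, table: Table) -> List[Optional[float]]:
    """Assign each pentamer's value to its central base; two bases at each end get None."""
    if len(sequence) < 5:
        return [None] * len(sequence)
    values: List[Optional[float]] = [None, None]
    for i in range(len(sequence) - 4):
        values.append(table[sequence[i:i + 5]])
    return values + [None, None]


def mean_profile(sites: List[str], table: Table) -> List[Optional[float]]:
    """Position-wise mean over aligned, equal-length binding sites (one heat-map row)."""
    profiles = [shape_profile(s, table) for s in sites]
    means: List[Optional[float]] = []
    for column in zip(*profiles):
        present = [v for v in column if v is not None]
        means.append(sum(present) / len(present) if present else None)
    return means


table = placeholder_table()
print(shape_profile("ACGTACGT", table))
```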
Qin, Lei; Snoussi, Hichem; Abdallah, Fahed
2014-01-01
We propose a novel approach for tracking an arbitrary object in video sequences for visual surveillance. The first contribution of this work is an automatic feature extraction method that extracts compact, discriminative features from a feature pool before the region covariance descriptor is computed. Because the feature extraction method adapts to the specific object of interest, we refer to the region covariance descriptor computed from the extracted features as the adaptive covariance descriptor. The second contribution is a weakly supervised method for updating the object appearance model during tracking. The method performs mean-shift clustering on the tracking result samples accumulated over a period of time and selects a group of reliable samples for updating the object appearance model. As a result, the appearance model is kept up to date and protected from contamination even when tracking errors occur. Comparative experiments on real-world video sequences confirmed the effectiveness of the proposed approaches. The tracking system that integrates the adaptive covariance descriptor and the clustering-based model updating method achieved stable object tracking on challenging video sequences. PMID:24865883
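A region covariance descriptor is the d × d sample covariance of per-pixel feature vectors inside a region; choosing which d features to use is where an adaptive selection step, such as the one described above, would plug in. The sketch below assumes a small hypothetical feature pool (pixel coordinates, intensity and absolute central-difference gradients) and a `selected` list standing in for whatever subset a selection method returns. It is not the authors' feature pool or selection method, and comparing descriptors (typically with a metric on covariance matrices) is not shown.

```python
from typing import Callable, Dict, List, Sequence

Image = List[List[float]]

# Hypothetical feature pool; each entry maps (image, x, y) to one scalar feature.
FEATURE_POOL: Dict[str, Callable[[Image, int, int], float]] = {
    "x": lambda img, x, y: float(x),
    "y": lambda img, x, y: float(y),
    "intensity": lambda img, x, y: float(img[y][x]),
    "abs_dx": lambda img, x, y: abs(img[y][x + 1] - img[y][x - 1]) / 2.0,
    "abs_dy": lambda img, x, y: abs(img[y + 1][x] - img[y - 1][x]) / 2.0,
}


def region_covariance(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample covariance (d x d) of the per-pixel feature vectors in a region."""
    n, d = len(features), len(features[0])
    mean = [sum(f[k] for f in features) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in features:
        for a in range(d):
            for b in range(d):
                cov[a][b] += (f[a] - mean[a]) * (f[b] - mean[b])
    return [[value / (n - 1) for value in row] for row in cov]


def covariance_descriptor(img: Image, x0: int, y0: int, w: int, h: int,
                          selected: Sequence[str]) -> List[List[float]]:
    """Descriptor for the w x h box at (x0, y0) using only the `selected` features.

    The box must not touch the image border, because of the central differences.
    """
    feats = [[FEATURE_POOL[name](img, x, y) for name in selected]
             for y in range(y0, y0 + h) for x in range(x0, x0 + w)]
    return region_covariance(feats)


image = [[float(5 * r + c) for c in range(5)] for r in range(5)]
print(covariance_descriptor(image, 1, 1, 3, 3, ["x", "y"]))
```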
Crabtree, Nathaniel M; Moore, Jason H; Bowyer, John F; George, Nysia I
2017-01-01
A computational evolution system (CES) is a knowledge discovery engine that can identify subtle, synergistic relationships in large datasets. Pareto optimization allows CESs to balance accuracy against model complexity when evolving classifiers. Using Pareto optimization, a CES can identify a very small number of features while maintaining high classification accuracy. A CES can be designed for various types of data, and the user can exploit expert knowledge about the classification problem to improve discrimination between classes. These characteristics give CES an advantage over other classification and feature selection algorithms, particularly when the goal is to identify a small number of highly relevant, non-redundant biomarkers. Previously, CESs had been developed only for binary-class datasets. In this study, we developed a multi-class CES. The multi-class CES was compared with three common feature selection and classification algorithms: support vector machine (SVM), random k-nearest neighbor (RKNN), and random forest (RF). The algorithms were evaluated on three distinct multi-class RNA sequencing datasets. The comparison criteria were run time, classification accuracy, number of selected features, and stability of the selected feature set (as measured by the Tanimoto distance). The performance of each algorithm was data-dependent. CES performed best on the dataset with the smallest sample size, suggesting that CES has a particular advantage in that setting, since the accuracy of most classification methods suffers when sample size is small. The multi-class extension of CES makes it more attractive for identifying important biomarkers and features in complex, multi-class datasets.
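Two ideas from this comparison can be made concrete: a Pareto front over (accuracy, number of features), where a model is kept unless another is at least as accurate with no more features and strictly better on one of the two; and the Tanimoto distance between two selected-feature sets, which for sets is 1 - |A ∩ B| / |A ∪ B|. The sketch below implements both; the model names, scores and feature sets are hypothetical, and it does not reproduce how the CES evolves its classifiers.

```python
from typing import Iterable, List, Tuple

Model = Tuple[str, float, int]  # (name, accuracy, number of selected features)


def pareto_front(models: List[Model]) -> List[Model]:
    """Models not dominated by any other (higher-or-equal accuracy with
    fewer-or-equal features, strictly better in at least one of the two)."""
    front = []
    for name, acc, k in models:
        dominated = any(
            a2 >= acc and k2 <= k and (a2 > acc or k2 < k)
            for _, a2, k2 in models
        )
        if not dominated:
            front.append((name, acc, k))
    return front


def tanimoto_distance(a: Iterable[str], b: Iterable[str]) -> float:
    """1 - |A & B| / |A | B| for two selected-feature sets (0 = identical)."""
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    return 1.0 - len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical candidate classifiers.
models = [("m1", 0.90, 3), ("m2", 0.92, 10), ("m3", 0.88, 5), ("m4", 0.90, 2)]
print(pareto_front(models))
print(tanimoto_distance({"g1", "g2", "g3"}, {"g2", "g3", "g4"}))
```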
Büssow, Konrad; Hoffmann, Steve; Sievert, Volker
2002-12-19
Functional genomics involves parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite for the cloning and recombinant expression of these proteins. A Java program, ORFer, was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include the sequence name, the organism and the completeness of the sequence. The program has a graphical user interface but can also be run in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks that it translates correctly. ORFer accepts user input in the form of single GenBank GI identifiers or accession numbers, or lists of them. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or exported as text files in FASTA or tab-delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. ORFer allows fast retrieval of DNA sequences, protein sequences, their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information management system (LIMS) with appropriate sequence information.
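The translation check that ORFer performs on extracted open reading frames can be illustrated with a short sketch. ORFer itself is a Java program and this is not its code: the sketch translates a coding sequence with the standard genetic code (NCBI translation table 1) and compares the result with the annotated protein, allowing one terminal stop codon. Alternative start codons, which GenBank translations render as M, and other translation tables are deliberately not handled.

```python
BASES = "TCAG"
# Standard genetic code (NCBI table 1), codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate complete codons; codons with ambiguous bases become 'X'."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    return "".join(CODON_TABLE.get(cds[i:i + 3], "X") for i in range(0, usable, 3))


def orf_matches_protein(cds: str, protein: str) -> bool:
    """True if the CDS length is a multiple of 3 and it translates to `protein`."""
    if len(cds) % 3:
        return False
    translated = translate(cds)
    if translated.endswith("*"):
        translated = translated[:-1]  # terminal stop codon is optional in the check
    return translated == protein.upper()


print(orf_matches_protein("ATGGCTTGGTAA", "MAW"))
```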
Characteristic features of injuries due to shark attacks: a review of 12 cases.
Ihama, Yoko; Ninomiya, Kenji; Noguchi, Masamichi; Fuke, Chiaki; Miyazaki, Tetsuji
2009-09-01
Shark attacks on humans might not occur as often as is believed and the characteristic features of shark injuries on corpses have not been extensively reviewed. We describe the characteristic features of shark injuries on 12 corpses. The analysis of these injuries might reveal the motivation behind the attacks and/or the shark species involved in the attack. Gouge marks on the bones are evidence of a shark attack, even if the corpse is decomposed. Severance of the body part at the joints without a fracture was found to be a characteristic feature of shark injuries.
NASA Astrophysics Data System (ADS)
Ashastina, Kseniia; Schirrmeister, Lutz; Fuchs, Margret; Kienast, Frank
2017-07-01
Syngenetic permafrost deposits formed extensively on and around the arising Beringian subcontinent during the Late Pleistocene sea level lowstands. Syngenetic deposition implies that all material, both mineral and organic, freezes parallel to sedimentation and remains frozen until degradation of the permafrost. Permafrost is therefore a unique archive of Late Pleistocene palaeoclimate. Most studied permafrost outcrops are situated in the coastal lowlands of northeastern Siberia; inland sections are, however, scarcely available. Here, we describe the stratigraphical, cryolithological, and geochronological characteristics of a permafrost sequence near Batagay in the Siberian Yana Highlands, the interior of the Sakha Republic (Yakutia), Russia, with focus on the Late Pleistocene Yedoma ice complex (YIC). The recently formed Batagay mega-thaw slump exposes permafrost deposits to a depth of up to 80 m and gives insight into a climate record close to Verkhoyansk, which has the most severe continental climate in the Northern Hemisphere. Geochronological dating (optically stimulated luminescence, OSL, and 14C ages) and stratigraphic implications delivered a temporal frame from the Middle Pleistocene to the Holocene for our sedimentological interpretations and also revealed interruptions in the deposition. The sequence of lithological units indicates a succession of several distinct climate phases: a Middle Pleistocene ice complex indicates cold stage climate. Then, ice wedge growth stopped due to highly increased sedimentation rates and eventually a rise in temperature. Full interglacial climate conditions existed during accumulation of an organic-rich layer - plant macrofossils reflected open forest vegetation existing under dry conditions during Marine Isotope Stage (MIS) 5e. The Late Pleistocene YIC (MIS 4-MIS 2) suggests severe cold-stage climate conditions. No alas deposits, potentially indicating thermokarst processes, were detected at the site. 
A detailed comparison of the permafrost deposits exposed in the Batagay thaw slump with well-studied permafrost sequences, both coastal and inland, is made to highlight common features and differences in their formation processes and palaeoclimatic histories. Fluvial and lacustrine influence is common during certain periods in the majority of permafrost exposures, but has to be excluded for the Batagay sequence. We interpret the characteristics of permafrost deposits at this location as a result of various climatically induced processes that are partly seasonally controlled. Nival deposition might have been dominant during winter time, whereas proluvial and aeolian deposition could have prevailed during the snowmelt period and the dry summer season.
Cenik, Can; Chua, Hon Nian; Singh, Guramrit; Akef, Abdalla; Snyder, Michael P; Palazzo, Alexander F; Moore, Melissa J; Roth, Frederick P
2017-03-01
Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5' proximal-intron-minus-like coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC. © 2017 Cenik et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.
Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.
Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin
2016-01-01
First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as experts can. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert-curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates that machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All data are available as described in the supplementary material.
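To illustrate what a pairwise feature vector for duplicate detection might look like, here is a minimal sketch touching the feature families the ablation study highlights (metadata and sequence similarity). The record keys and the four features are hypothetical, and difflib's SequenceMatcher ratio is a swapped-in, cheap proxy for a proper alignment-based identity; the study's actual 22 features are not reproduced here.

```python
from difflib import SequenceMatcher


def pair_features(rec_a, rec_b):
    """Build a small feature vector for a pair of sequence records.

    Records are dicts with hypothetical keys 'description', 'organism',
    'sequence'. Returns [description token overlap, same organism,
    length ratio, sequence similarity], each in [0, 1].
    """
    tokens_a = set(rec_a["description"].lower().split())
    tokens_b = set(rec_b["description"].lower().split())
    union = tokens_a | tokens_b
    desc_overlap = len(tokens_a & tokens_b) / len(union) if union else 1.0
    same_organism = 1.0 if rec_a["organism"] == rec_b["organism"] else 0.0
    seq_a, seq_b = rec_a["sequence"].upper(), rec_b["sequence"].upper()
    length_ratio = min(len(seq_a), len(seq_b)) / max(len(seq_a), len(seq_b), 1)
    # SequenceMatcher.ratio() = 2 * matched / total length; a proxy, not an alignment.
    seq_similarity = SequenceMatcher(None, seq_a, seq_b, autojunk=False).ratio()
    return [desc_overlap, same_organism, length_ratio, seq_similarity]


a = {"description": "Hypothetical protein X", "organism": "E. coli", "sequence": "ACGT"}
b = {"description": "hypothetical protein X", "organism": "E. coli", "sequence": "ACGA"}
print(pair_features(a, b))  # [1.0, 1.0, 1.0, 0.75]
```

Vectors like these, labelled with expert duplicate/non-duplicate decisions, would be the input to a binary or multi-class classifier.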
Durdu, Murat; Güran, Mümtaz; Kandemir, Hazal; Ilkit, Macit; Seyedmousavi, Seyedmojtaba
2016-02-01
Although some studies have investigated the epidemiological characteristics of Malassezia folliculitis (MF), little is known about the clinical features and laboratory characteristics of folliculitis caused by other fungi. In this prospective study, 158 patients with folliculitis were identified, and cytological and mycological examinations were performed. The positive fungal cultures were confirmed using conventional methods, ITS sequencing and HWP1 analysis. Additionally, an in vitro antifungal susceptibility test was performed. Of 158 patients with folliculitis, 65 (41.1 %) were found to have fungal folliculitis. The most common (90.8 %) fungal folliculitis was MF. Non-MF fungal folliculitis was detected in 6 (9.2 %) patients. Four patients were diagnosed with dermatophytic folliculitis (Trichophyton rubrum in three patients and Arthroderma vanbreuseghemii in one patient), and two patients were diagnosed with Candida albicans folliculitis. Although only 5 of the 6 samples were found to be positive via a potassium hydroxide test, all May-Grünwald-Giemsa-stained samples were positive. Both C. albicans isolates were susceptible to itraconazole, and all four dermatophytes were susceptible to terbinafine. All six patients completely recovered with systemic and topical treatment. This study revealed that dermatophytes and C. albicans are the primary causative agents of non-Malassezia fungal folliculitis. We compared our findings with published reports on fungal folliculitis.
Not all (possibly) “random” sequences are created equal
Pincus, Steve; Kalman, Rudolf E.
1997-01-01
The need to assess the randomness of a single sequence, especially a finite sequence, is ubiquitous, yet is unaddressed by axiomatic probability theory. Here, we assess randomness via approximate entropy (ApEn), a computable measure of sequential irregularity, applicable to single sequences of both (even very short) finite and infinite length. We indicate the novelty and facility of the multidimensional viewpoint taken by ApEn, in contrast to classical measures. Furthermore and notably, for finite-length, finite-state sequences, one can identify maximally irregular sequences, and then apply ApEn to quantify the extent to which given sequences differ from maximal irregularity, via a set of deficit (def_m) functions. The utility of these def_m functions, which we show allow one to considerably refine the notions of probabilistic independence and normality, is featured in several studies, including (i) digits of e, π, √2, and √3, both in base 2 and in base 10, and (ii) sequences given by fractional parts of multiples of irrationals. We prove companion analytic results, which also feature in a discussion of the role and validity of the almost sure properties from axiomatic probability theory insofar as they apply to specified sequences and sets of sequences (in the physical world). We conclude by relating the present results and perspective to both previous and subsequent studies. PMID:11038612
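Approximate entropy compares how often length-m patterns recur with how often they still match when extended to length m + 1: ApEn(m, r, N) = Φ^m(r) - Φ^(m+1)(r), where Φ^m(r) is the average log-fraction of length-m templates lying within distance r (max-norm, self-matches included) of each template. The sketch below is a direct O(N²) Python transcription of that definition; for finite-alphabet sequences such as binary digits it assumes exact matching (r = 0). The deficit functions def_m are not implemented.

```python
import math
import random


def _phi(u, m, r):
    """Phi^m(r): average log-frequency of template matches of length m."""
    n = len(u) - m + 1
    templates = [u[i:i + m] for i in range(n)]
    total = 0.0
    for x in templates:
        # count templates within distance r (max-norm), including x itself
        matches = sum(
            1 for y in templates if max(abs(p - q) for p, q in zip(x, y)) <= r
        )
        total += math.log(matches / n)
    return total / n


def apen(u, m=1, r=0.0):
    """Approximate entropy ApEn(m, r, N) = Phi^m(r) - Phi^(m+1)(r)."""
    if m < 1 or len(u) < m + 1:
        raise ValueError("need m >= 1 and a sequence of length at least m + 1")
    return _phi(u, m, r) - _phi(u, m + 1, r)


rng = random.Random(0)
coin = [rng.randint(0, 1) for _ in range(500)]
print(apen([0] * 50))     # 0.0 for a constant sequence
print(apen([0, 1] * 25))  # close to 0 for a periodic sequence
print(apen(coin))         # close to ln 2 (about 0.693) for fair coin flips
```

The contrast between the last two calls reflects the abstract's point: both sequences have equal proportions of 0s and 1s, yet ApEn separates the regular one from the irregular one.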
Pairwise Sequence Alignment Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jeff Daily, PNNL
2015-05-20
Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific computing. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, many others remain difficult to vectorize because of their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend toward wider vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: reference implementations of all known vectorized sequence alignment approaches; implementations of the Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms; implementations across all modern CPU instruction sets, including AVX2 and KNC; and language interfaces for C/C++ and Python.
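For readers unfamiliar with the algorithms parasail vectorizes, the following scalar Python sketch computes Needleman-Wunsch (global) and Smith-Waterman (local) alignment scores with a linear gap penalty. It is a plain reference implementation for illustration only: it does not use parasail's API, its striped or scan-based SIMD layouts, affine gaps, or the semi-global variant, and the scoring parameters are arbitrary assumptions.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment score with a linear gap penalty (one DP row kept)."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # diagonal (match/mismatch), up (gap in b), left (gap in a)
            cur[j] = max(prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def smith_waterman(a, b, match=1, mismatch=-1, gap=-1):
    """Local alignment score: same recurrence, but cells are floored at 0."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            cur[j] = max(0, prev[j - 1] + s, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(needleman_wunsch("ACGT", "AGT"))         # 2
print(smith_waterman("TTACGTTT", "GGACGTGG"))  # 4
```

The inner loop's dependence on cur[j - 1], the previous cell in the same row, is the kind of data dependency that makes these recurrences awkward to vectorize; striped and scan-based layouts are two ways of working around it.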
Wilkin, Justin; Kerr, Natalie C; Byrd, Kathryn W; Ward, Jewell C; Iannaccone, Alessandro
2016-06-01
To report longitudinal phenotypic findings in a patient with Sanfilippo syndrome type IIIA harboring SGSH mutations, one of which is novel. Heparan N-sulfatase enzyme function testing in skin fibroblasts and white blood cells and SGSH gene sequencing were obtained. Clinical office examinations, examinations under anesthesia, electroretinography, spectral domain optical coherence tomography (SD-OCT), and fundus photography were performed over a 5-year period. Fundus examination revealed a progressive breadcrumb-like pigmentary retinopathy with perifoveal pigmentary involvement. SD-OCT showed loss of normal neuroretinal lamination and cystic macular changes responsive to treatment with carbonic anhydrase inhibitors. Electroretinography exhibited complex characteristics indicative of a generalized retinal rod > cone dysfunction with significant ON > OFF postreceptoral response compromise. Sequencing revealed compound heterozygous mutations in the SGSH gene: the novel c.88G > C (p.A30P) change and a second, previously reported one (c.734G > A, p.R245H). In a patient with Sanfilippo syndrome type IIIA harboring a novel SGSH mutation, we have identified ocular features not previously known to occur in this disease, namely a progressive retinopathy with distinctive features, cystic macular changes responsive to carbonic anhydrase inhibitors, and complex electroretinographic abnormalities consistent with postreceptoral dysfunction. SD-OCT imaging revealed retinal lamination changes consistent with previously reported histologic studies. Both the SD-OCT and the electroretinogram changes appear attributable to intraretinal deposition of heparan sulfate.
Thomas, Anna C.; Williams, Hywel; Setó-Salvia, Núria; Bacchelli, Chiara; Jenkins, Dagan; O’Sullivan, Mary; Mengrelis, Konstantinos; Ishida, Miho; Ocaka, Louise; Chanudet, Estelle; James, Chela; Lescai, Francesco; Anderson, Glenn; Morrogh, Deborah; Ryten, Mina; Duncan, Andrew J.; Pai, Yun Jin; Saraiva, Jorge M.; Ramos, Fabiana; Farren, Bernadette; Saunders, Dawn; Vernay, Bertrand; Gissen, Paul; Straatmaan-Iwanowska, Anna; Baas, Frank; Wood, Nicholas W.; Hersheson, Joshua; Houlden, Henry; Hurst, Jane; Scott, Richard; Bitner-Glindzicz, Maria; Moore, Gudrun E.; Sousa, Sérgio B.; Stanier, Philip
2014-01-01
Intellectual disability and cerebellar atrophy occur together in a large number of genetic conditions and are frequently associated with microcephaly and/or epilepsy. Here we report the identification of causal mutations in Sorting Nexin 14 (SNX14) found in seven affected individuals from three unrelated consanguineous families who presented with recessively inherited moderate-severe intellectual disability, cerebellar ataxia, early-onset cerebellar atrophy, sensorineural hearing loss, and the distinctive association of progressively coarsening facial features, relative macrocephaly, and the absence of seizures. We used homozygosity mapping and whole-exome sequencing to identify a homozygous nonsense mutation and an in-frame multiexon deletion in two families. A homozygous splice site mutation was identified by Sanger sequencing of SNX14 in a third family, selected purely by phenotypic similarity. This discovery confirms that these characteristic features represent a distinct and recognizable syndrome. SNX14 encodes a cellular protein containing Phox (PX) and regulator of G protein signaling (RGS) domains. Weighted gene coexpression network analysis predicts that SNX14 is highly coexpressed with genes involved in cellular protein metabolism and vesicle-mediated transport. All three mutations either directly affected the PX domain or diminished SNX14 levels, implicating a loss of normal cellular function. This manifested as increased cytoplasmic vacuolation as observed in cultured fibroblasts. Our findings indicate an essential role for SNX14 in neural development and function, particularly in development and maturation of the cerebellum. PMID:25439728
Thomas, Anna C; Williams, Hywel; Setó-Salvia, Núria; Bacchelli, Chiara; Jenkins, Dagan; O'Sullivan, Mary; Mengrelis, Konstantinos; Ishida, Miho; Ocaka, Louise; Chanudet, Estelle; James, Chela; Lescai, Francesco; Anderson, Glenn; Morrogh, Deborah; Ryten, Mina; Duncan, Andrew J; Pai, Yun Jin; Saraiva, Jorge M; Ramos, Fabiana; Farren, Bernadette; Saunders, Dawn; Vernay, Bertrand; Gissen, Paul; Straatmaan-Iwanowska, Anna; Baas, Frank; Wood, Nicholas W; Hersheson, Joshua; Houlden, Henry; Hurst, Jane; Scott, Richard; Bitner-Glindzicz, Maria; Moore, Gudrun E; Sousa, Sérgio B; Stanier, Philip
2014-11-06
Intellectual disability and cerebellar atrophy occur together in a large number of genetic conditions and are frequently associated with microcephaly and/or epilepsy. Here we report the identification of causal mutations in Sorting Nexin 14 (SNX14) found in seven affected individuals from three unrelated consanguineous families who presented with recessively inherited moderate-severe intellectual disability, cerebellar ataxia, early-onset cerebellar atrophy, sensorineural hearing loss, and the distinctive association of progressively coarsening facial features, relative macrocephaly, and the absence of seizures. We used homozygosity mapping and whole-exome sequencing to identify a homozygous nonsense mutation and an in-frame multiexon deletion in two families. A homozygous splice site mutation was identified by Sanger sequencing of SNX14 in a third family, selected purely by phenotypic similarity. This discovery confirms that these characteristic features represent a distinct and recognizable syndrome. SNX14 encodes a cellular protein containing Phox (PX) and regulator of G protein signaling (RGS) domains. Weighted gene coexpression network analysis predicts that SNX14 is highly coexpressed with genes involved in cellular protein metabolism and vesicle-mediated transport. All three mutations either directly affected the PX domain or diminished SNX14 levels, implicating a loss of normal cellular function. This manifested as increased cytoplasmic vacuolation as observed in cultured fibroblasts. Our findings indicate an essential role for SNX14 in neural development and function, particularly in development and maturation of the cerebellum. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.
Epigenetic and genetic diagnosis of Silver-Russell syndrome.
Eggermann, Thomas; Spengler, Sabrina; Gogiel, Magdalena; Begemann, Matthias; Elbracht, Miriam
2012-06-01
Silver-Russell syndrome (SRS) is a congenital imprinting disorder characterized by intrauterine and postnatal growth restriction and further characteristic features. SRS is genetically heterogeneous: 7-10% of patients carry a maternal uniparental disomy of chromosome 7; >38% show hypomethylation of imprinting control region 1 in 11p15; and a further class of mutations consists of copy number variations affecting different chromosomes, but mainly 11p15 and 7. The diagnostic work-up should thus aim to detect these three molecular subtypes. Numerous techniques are currently applied in genetic SRS testing, but none of them covers all known (epi)mutations, and they should therefore be used in combination. In the future, next-generation sequencing approaches will allow a comprehensive analysis of all types of alterations in SRS.
Three-dimensional interactions and vortical flows with emphasis on high speeds
NASA Technical Reports Server (NTRS)
Peake, D. J.; Tobak, M.
1980-01-01
Diverse kinds of three-dimensional separation regions in laminar and turbulent boundary layers, which exist on lifting aerodynamic configurations immersed in flows from subsonic to hypersonic speeds, are discussed. In all cases of three-dimensional flow separation, the assumption of continuous vector fields of skin-friction lines and external-flow streamlines, coupled with simple topology laws, provides a flow grammar whose elemental constituents are the singular points: nodes, foci, and saddles. Adopting these notions enables one to create sequences of plausible flow structures, deduce mean flow characteristics, expose flow mechanisms, and aid theory and experiment where lack of resolution in numerical calculations or wind tunnel observation causes imprecision in diagnosing the three-dimensional flow features.
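As background (a standard result, not stated in this abstract), one of the simple topology laws for skin-friction patterns follows from the Poincaré-Hopf theorem: on a closed body topologically equivalent to a sphere, the indices of the singular points sum to the Euler characteristic 2, with nodes and foci each contributing +1 and saddles -1. Writing ΣN for the number of nodes and foci together and ΣS for the number of saddles:

```latex
\sum_{k} \operatorname{ind}(p_k) = \chi(S^2) = 2
\quad\Longrightarrow\quad
\sum N - \sum S = 2,
\qquad
\operatorname{ind}(\mathrm{node}) = \operatorname{ind}(\mathrm{focus}) = +1,
\quad
\operatorname{ind}(\mathrm{saddle}) = -1
```

A proposed skin-friction pattern whose singular points violate this count cannot be a continuous vector field on the body surface, which is what makes the rule useful for screening plausible flow structures.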
Salem, Nida’ M.; Miller, W. Allen; Rowhani, Adib; Golino, Deborah A.; Moyne, Anne-Laure; Falk, Bryce W.
2015-01-01
We determined the complete nucleotide sequence of the Rose spring dwarf-associated virus (RSDaV) genomic RNA (GenBank accession no. EU024678) and compared its predicted RNA structural characteristics affecting gene expression. A cDNA library was derived from RSDaV double-stranded RNAs (dsRNAs) purified from infected tissue. Nucleotide sequence analysis of the cloned cDNAs, together with clones generated by 5′- and 3′-RACE, showed the RSDaV genomic RNA to be 5,808 nucleotides long. The genomic RNA contains five major open reading frames (ORFs) and three small ORFs in the 3′-terminal 800 nucleotides, typical of viruses of the genus Luteovirus in the family Luteoviridae. Northern blot hybridization analysis revealed the genomic RNA and two prominent subgenomic RNAs of approximately 3 kb and 1 kb. Putative 5′ ends of the sgRNAs were predicted by identifying conserved sequences and secondary structures that resembled the Barley yellow dwarf virus (BYDV) genomic RNA 5′ end and subgenomic RNA promoter sequences. Secondary structures of the BYDV-like ribosomal frameshift elements and cap-independent translation elements, including long-distance base pairing spanning 4 kb, were identified. These show similarities to, but also informative differences from, the BYDV structures, including a strikingly different structure predicted for the 3′ cap-independent translation element. These analyses of the RSDaV genomic RNA reveal greater complexity in the RNA structural elements of members of the Luteoviridae. PMID:18329064
Salem, Nida' M; Miller, W Allen; Rowhani, Adib; Golino, Deborah A; Moyne, Anne-Laure; Falk, Bryce W
2008-06-05
We determined the complete nucleotide sequence of the Rose spring dwarf-associated virus (RSDaV) genomic RNA (GenBank accession no. EU024678) and compared its predicted RNA structural characteristics affecting gene expression. A cDNA library was derived from RSDaV double-stranded RNAs (dsRNAs) purified from infected tissue. Nucleotide sequence analysis of the cloned cDNAs, together with clones generated by 5'- and 3'-RACE, showed the RSDaV genomic RNA to be 5808 nucleotides long. The genomic RNA contains five major open reading frames (ORFs) and three small ORFs in the 3'-terminal 800 nucleotides, typical of viruses of the genus Luteovirus in the family Luteoviridae. Northern blot hybridization analysis revealed the genomic RNA and two prominent subgenomic RNAs of approximately 3 kb and 1 kb. Putative 5' ends of the sgRNAs were predicted by identifying conserved sequences and secondary structures that resembled the Barley yellow dwarf virus (BYDV) genomic RNA 5' end and subgenomic RNA promoter sequences. Secondary structures of the BYDV-like ribosomal frameshift elements and cap-independent translation elements, including long-distance base pairing spanning 4 kb, were identified. These show similarities to, but also informative differences from, the BYDV structures, including a strikingly different structure predicted for the 3' cap-independent translation element. These analyses of the RSDaV genomic RNA reveal greater complexity in the RNA structural elements of members of the Luteoviridae.
Di, Yanming; Schafer, Daniel W.; Wilhelm, Larry J.; Fox, Samuel E.; Sullivan, Christopher M.; Curzon, Aron D.; Carrington, James C.; Mockler, Todd C.; Chang, Jeff H.
2011-01-01
GENE-counter is a complete Perl-based computational pipeline for analyzing RNA-Sequencing (RNA-Seq) data for differential gene expression. In addition to its use in studying transcriptomes of eukaryotic model organisms, GENE-counter is applicable for prokaryotes and non-model organisms without an available genome reference sequence. For alignments, GENE-counter is configured for CASHX, Bowtie, and BWA, but an end user can use any Sequence Alignment/Map (SAM)-compliant program of preference. To analyze data for differential gene expression, GENE-counter can be run with any one of three statistics packages that are based on variations of the negative binomial distribution. The default method is a new and simple statistical test we developed based on an over-parameterized version of the negative binomial distribution. GENE-counter also includes three different methods for assessing differentially expressed features for enriched gene ontology (GO) terms. Results are transparent and data are systematically stored in a MySQL relational database to facilitate additional analyses as well as quality assessment. We used next generation sequencing to generate a small-scale RNA-Seq dataset derived from the heavily studied defense response of Arabidopsis thaliana and used GENE-counter to process the data. Collectively, the support from analysis of microarrays as well as the observed and substantial overlap in results from each of the three statistics packages demonstrates that GENE-counter is well suited for handling the unique characteristics of small sample sizes and high variability in gene counts. PMID:21998647
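Because GENE-counter accepts output from any SAM-compliant aligner, an early step in such a pipeline is tallying aligned reads per reference sequence. The Python sketch below is a hypothetical illustration of that counting step, not GENE-counter's Perl code: it assumes reads were aligned directly to gene or transcript sequences (so RNAME identifies the gene), counts only primary mapped alignments using the SAM FLAG bits 0x4, 0x100 and 0x800, and does no paired-end or multi-mapping handling.

```python
from collections import Counter

# SAM FLAG bits, as defined in the SAM format specification
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def count_reads_per_reference(sam_lines):
    """Count primary mapped reads per reference name (RNAME, column 3)."""
    counts = Counter()
    skip = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & skip or rname == "*":
            continue
        counts[rname] += 1
    return counts


sam = [
    "@HD\tVN:1.6",
    "r1\t0\tgeneA\t10\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r2\t16\tgeneA\t20\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTA\tIIIII",
    "r4\t256\tgeneB\t5\t0\t5M\t*\t0\t0\tACGTA\tIIIII",
    "r5\t0\tgeneB\t5\t60\t5M\t*\t0\t0\tACGTA\tIIIII",
]
print(count_reads_per_reference(sam))  # Counter({'geneA': 2, 'geneB': 1})
```

Per-gene counts of this kind are what a negative-binomial-based test then compares across conditions.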
Sequencing Events: Exploring Art and Art Jobs.
ERIC Educational Resources Information Center
Stephens, Pamela Geiger; Shaddix, Robin K.
2000-01-01
Presents an activity for upper-elementary students that correlates the actions of archaeologists, patrons, and artists with the sequencing of events in a logical order. Features ancient Egyptian art images. Discusses the preparation of materials, motivation, a pre-writing activity, and writing a story in sequence. (CMK)
NASA Technical Reports Server (NTRS)
Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Norris, Jeffrey S.; Morris, John R.
2011-01-01
MSLICE Sequencing is a graphical tool for writing sequences and integrating them into RML files, as well as for producing SCMF files for uplink. When operated in a testbed environment, it also supports uplinking these SCMF files to the testbed via Chill. This software features a free-form textual sequence editor with syntax coloring and automatic content assistance (including command and argument completion proposals), complete with types, value ranges, units, and descriptions from the command dictionary that appear as they are typed. The sequence editor also has a "field mode" that allows tabbing between arguments and displays type/range/units/description for each argument as it is edited. Color-coded error and warning annotations on problematic tokens are included, as well as indications of problems that are not visible in the current scroll range. "Quick Fix" suggestions are made for resolving problems, and all the features afforded by modern source editors are also included, such as copy/cut/paste, undo/redo, and a sophisticated find-and-replace system optionally using regular expressions. The software offers a full XML editor for RML files, which features syntax coloring, content assistance and problem annotations as above. There is a form-based "detail view" that allows structured editing of command arguments and sequence parameters when preferred. The "project view" shows the user's "workspace" as a tree of "resources" (projects, folders, and files) that can subsequently be opened in editors by double-clicking. Files can be added, deleted, dragged-dropped/copied-pasted between folders or projects, and these operations are undoable and redoable. A "problems view" contains a tabular list of all problems in the current workspace. Double-clicking on any row in the table opens an editor for the appropriate sequence, scrolling to the specific line with the problem and highlighting the problematic characters.
From there, one can invoke "quick fix" as described above to resolve the issue. Once resolved, saving the file causes the problem to be removed from the problem view.
Díaz, Martha Lucía; Leal, Sandra; Mantilla, Julio César; Molina-Berríos, Alfredo; López-Muñoz, Rodrigo; Solari, Aldo; Escobar, Patricia; González Rugeles, Clara Isabel
2015-11-26
Outbreaks of acute Chagas disease associated with oral transmission are now easily detected by trained health personnel in areas of low endemicity or in which vector transmission has been interrupted. Given the biological and genetic diversity of Trypanosoma cruzi, the high morbidity and mortality, and the observed therapeutic failure, new characteristics of these outbreaks need to be addressed at different levels, both in Trypanosoma cruzi and in the patient response. The aim of this work was to evaluate the patients' features in six outbreaks of acute Chagas disease that occurred in Santander, Colombia, and the characteristics of Trypanosoma cruzi clones isolated from these patients, in order to establish the potential relationship between features of the etiologic agent and host behavior. The clinical, pathological and epidemiological aspects of the outbreaks were analyzed. In addition, Trypanosoma cruzi clones were biologically characterized both in vitro and in vivo, and their susceptibility to the classical trypanocidal drugs nifurtimox and benznidazole was evaluated. Trypanosoma cruzi clones were genotyped by sequencing the mini-exon intergenic spacer and cytochrome b genes. All clones were DTU I and, based on the mini-exon intergenic spacer, belonged to two genotypes: G2, related to sub-urban outbreaks, and G11, related to rural outbreaks. Girón outbreak clones with higher susceptibility to drugs presented the G2 genotype and a C/T transition in Cyt b. The outbreaks mainly affected a young population (±25.9 years), and the mortality rate was 10 %. The cardiac tissue showed intense inflammatory infiltrate, myocardial necrosis and abundant amastigote nests. However, although the gastrointestinal tissue was congestive, no inflammation or parasites were observed.
Although all clones belong to DTU I, sequencing of the mini-exon intergenic spacer revealed two intra-DTU genotypes; however, there is no strict correlation between these genetic groups and either the cycles of the parasite or the clinical forms of the disease. Trypanosoma cruzi clones from Girón with higher sensitivity to nifurtimox presented a particular G2 genotype and a C/T transition in Cyt b. When the diagnosis was early, the patients responded well to antichagasic treatment, which highlights the importance of early diagnosis and treatment in preventing fatal outcomes associated with these acute episodes.
Distribution and Features of the Six Classes of Peroxiredoxins
Poole, Leslie B.; Nelson, Kimberly J.
2016-01-01
Peroxiredoxins are cysteine-dependent peroxide reductases that group into 6 different, structurally discernable classes. In 2011, our research team reported the application of a bioinformatic approach called active site profiling to extract active site-proximal sequence segments from the 29 distinct, structurally-characterized peroxiredoxins available at the time. These extracted sequences were then used to create unique profiles for the six groups which were subsequently used to search GenBank(nr), allowing identification of ∼3500 peroxiredoxin sequences and their respective subgroups. Summarized in this minireview are the features and phylogenetic distributions of each of these peroxiredoxin subgroups; an example is also provided illustrating the use of the web accessible, searchable database known as PREX to identify subfamily-specific peroxiredoxin sequences for the organism Vitis vinifera (grape). PMID:26810075
Hansen, Loren; Kim, Nak-Kyeong; Mariño-Ramírez, Leonardo; Landsman, David
2011-01-01
Meiotic recombination is not distributed uniformly throughout the genome. There are regions of high and low recombination rates called hot and cold spots, respectively. The recombination rate parallels the frequency of DNA double-strand breaks (DSBs) that initiate meiotic recombination. The aim is to identify biological features associated with DSB frequency. We constructed vectors representing various chromatin and sequence-based features for 1179 DSB hot spots and 1028 DSB cold spots. Using a feature selection approach, we have identified five features that distinguish hot from cold spots in Saccharomyces cerevisiae with high accuracy, namely the histone marks H3K4me3, H3K14ac, H3K36me3, and H3K79me3; and GC content. Previous studies have associated H3K4me3, H3K36me3, and GC content with areas of mitotic recombination. H3K14ac and H3K79me3 are novel predictions and thus represent good candidates for further experimental study. We also show nucleosome occupancy maps produced using next generation sequencing exhibit a bias at DSB hot spots and this bias is strong enough to obscure biologically relevant information. A computational approach using feature selection can productively be used to identify promising biological associations. H3K14ac and H3K79me3 are novel predictions of chromatin marks associated with meiotic DSBs. Next generation sequencing can exhibit a bias that is strong enough to lead to incorrect conclusions. Care must be taken when interpreting high throughput sequencing data where systematic biases have been documented. PMID:22242140
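The hot-spot versus cold-spot setup can be illustrated with a simple filter-style feature ranking. This sketch uses a swapped-in technique, not the authors' feature selection approach: it scores each feature by the absolute difference in means between hot and cold spots divided by the pooled spread, and computes GC content, one of the reported features, directly from sequence. The feature names and values are hypothetical.

```python
import statistics


def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def rank_features(hot, cold, names):
    """Rank features by a standardized mean difference between hot and cold spots.

    hot, cold: lists of equal-length feature vectors (one vector per spot).
    Returns feature names sorted from most to least discriminating.
    """
    scores = {}
    for k, name in enumerate(names):
        h = [v[k] for v in hot]
        c = [v[k] for v in cold]
        spread = (statistics.pvariance(h) + statistics.pvariance(c)) ** 0.5
        # small constant avoids division by zero for constant features
        scores[name] = abs(statistics.mean(h) - statistics.mean(c)) / (spread + 1e-9)
    return sorted(names, key=scores.get, reverse=True)


names = ["mark_A", "gc"]  # hypothetical chromatin mark and GC content
hot = [[1.0, gc_content("GGCCAT")], [2.0, gc_content("GGGCCC")]]
cold = [[1.2, gc_content("ATATGC")], [1.8, gc_content("AAAATT")]]
print(rank_features(hot, cold, names))  # ['gc', 'mark_A']
```

A univariate filter like this ignores interactions between features, so it is only a first pass compared with a full feature selection procedure.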
Lu, Emily; Elizondo-Riojas, Miguel-Angel; Chang, Jeffrey T; Volk, David E
2014-06-10
Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. We therefore sought to develop an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and applies Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length-error and noise-level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects.
Simulation of spatial and temporal properties of aftershocks by means of the fiber bundle model
NASA Astrophysics Data System (ADS)
Monterrubio-Velasco, Marisol; Zúñiga, F. R.; Márquez-Ramírez, Victor Hugo; Figueroa-Soto, Angel
2017-11-01
The rupture processes of any heterogeneous material constitute a complex physical problem. Earthquake aftershocks show temporal and spatial behaviors that are a consequence of the heterogeneous stress distribution and multiple rupturing following the main shock. This process is difficult to model deterministically because of the number of parameters and physical conditions involved, which are largely unknown. To shed light on the minimum requirements for the generation of aftershock clusters, in this study we simulate the main features of such a complex process by means of a fiber bundle (FB) type model. The FB model has been widely used to analyze the fracture process in heterogeneous materials. It is a simple but powerful tool that allows modeling of the main characteristics of a medium such as the brittle shallow crust of the Earth. In this work, we incorporate spatial properties, such as the Coulomb stress change pattern, which help simulate observed characteristics of aftershock sequences. In particular, we introduce a parameter (P) that controls the probability of the spatial distribution of initial loads. We also use a "conservation" parameter (π), which accounts for the load dissipation of the system, and demonstrate its influence on the simulated spatio-temporal patterns. Based on numerical results, we find that P has to be in the range 0.06 < P < 0.30, whilst π needs to be limited to a very narrow range (0.60 < π < 0.66) in order to reproduce aftershock pattern characteristics that resemble those of observed sequences. This means that the system requires a small difference in the spatial distribution of initial stress, and a very particular fraction of load transfer, in order to generate realistic aftershocks.
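To illustrate how a conservation parameter like π shapes a failure cascade, here is a toy local-load-sharing fiber bundle. It is not the authors' model. It assumes a one-dimensional ring of fibers and ignores both the Coulomb stress pattern and the parameter P. When a fiber fails, a fraction π of its load is split between its nearest intact neighbors and the remaining fraction 1 - π is dissipated. The function names are hypothetical.

```python
import random


def _nearest_intact(intact, i, step):
    """Walk around the ring from fiber i in direction `step` (+1 or -1)
    and return the first intact fiber, or None if none is left."""
    n = len(intact)
    j = (i + step) % n
    while j != i:
        if intact[j]:
            return j
        j = (j + step) % n
    return None


def relax(loads, thresholds, pi):
    """Run one failure cascade on a ring of fibers (hypothetical toy model).

    A fiber fails when its load reaches its threshold. A fraction `pi` of the
    failed fiber's load is split equally between its nearest intact neighbors;
    the remaining fraction (1 - pi) is dissipated.
    Returns (final loads, indices of failed fibers in order, dissipated load).
    """
    loads = list(loads)
    n = len(loads)
    intact = [True] * n
    dissipated = 0.0
    failures = []
    unstable = [i for i in range(n) if loads[i] >= thresholds[i]]
    while unstable:
        i = unstable.pop(0)
        if not intact[i]:
            continue  # this fiber has already failed
        intact[i] = False
        failures.append(i)
        load, loads[i] = loads[i], 0.0
        neighbors = {_nearest_intact(intact, i, -1), _nearest_intact(intact, i, +1)}
        neighbors.discard(None)
        transferred = 0.0
        for j in neighbors:
            share = pi * load / len(neighbors)
            loads[j] += share
            transferred += share
            if loads[j] >= thresholds[j]:
                unstable.append(j)
        dissipated += load - transferred  # load lost to the system when pi < 1
    return loads, failures, dissipated


if __name__ == "__main__":
    rng = random.Random(0)
    n = 200
    thresholds = [rng.random() for _ in range(n)]
    loads = [0.5 * rng.random() for _ in range(n)]
    for pi in (0.3, 0.63, 0.9):
        _, failures, lost = relax(loads, thresholds, pi)
        print(f"pi={pi}: {len(failures)} fibers failed, {lost:.2f} load dissipated")
```

Sweeping π while holding the thresholds fixed shows how much of each failure is passed on rather than lost. That is the qualitative role the abstract gives to the conservation parameter.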
Uropathogenic Escherichia coli ST131 in urinary tract infections in children.
Yun, Ki Wook; Lee, Mi-Kyung; Kim, Wonyong; Lim, In Seok
2017-07-01
Escherichia coli sequence type (ST) 131, a multidrug-resistant clone causing extraintestinal infections, has rapidly become prevalent worldwide. However, the epidemiological and clinical features of pediatric infections are poorly understood. We aimed to explore the characteristics of ST131 Escherichia coli isolated from Korean children with urinary tract infections. We examined 114 uropathogenic E. coli (UPEC) isolates from children hospitalized at Chung-Ang University Hospital between 2011 and 2014. Bacterial strains were classified into STs by partial sequencing of seven housekeeping genes (adk, fumC, gyrB, icd, mdh, purA, and recA). Clinical characteristics and antimicrobial susceptibility were compared between ST131 and non-ST131 UPEC isolates. Sixteen UPEC isolates (14.0%) were extended-spectrum β-lactamase (ESBL)-producers; 50.0% of ESBL-producers were ST131 isolates. Of all the isolates tested, 13.2% (15 of 114) were classified as ST131. There were no statistically significant associations between ST131 and age, sex, or clinical characteristics, including fever, white blood cell counts in urine and serum, C-reactive protein, radiologic abnormalities, and clinical outcome. However, ST131 isolates showed significantly lower rates of susceptibility to cefazolin (26.7%), cefotaxime (40.0%), cefepime (40.0%), and ciprofloxacin (53.3%) than non-ST131 isolates (65.7%, 91.9%, 92.9%, and 87.9%, respectively; P < 0.001 for all). ESBL was more frequently produced in ST131 (53.3%) than in non-ST131 (8.1%) isolates (P < 0.01). ST131 E. coli isolates were prevalent uropathogens in children at a single medical center in Korea between 2011 and 2014. Although ST131 isolates showed higher rates of antimicrobial resistance, clinical presentation and outcomes of patients were similar to those of patients infected with non-ST131 isolates.
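The reported ESBL percentages are internally consistent. This can be checked by back-calculating counts from the stated totals: 114 isolates, 15 ST131, and 16 ESBL producers, half of which were ST131. The short script below uses only numbers from the abstract. The helper name `pct` is hypothetical.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place, matching the abstract's reporting."""
    return round(100 * part / whole, 1)


total = 114                # UPEC isolates examined
st131 = 15                 # isolates typed as ST131
esbl = 16                  # ESBL producers
esbl_st131 = esbl // 2     # "50.0% of ESBL-producers were ST131 isolates"
non_st131 = total - st131  # 99 non-ST131 isolates
esbl_non_st131 = esbl - esbl_st131

print(pct(esbl, total))                # 14.0
print(pct(st131, total))               # 13.2
print(pct(esbl_st131, st131))          # 53.3
print(pct(esbl_non_st131, non_st131))  # 8.1
```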
Thomas, W. Kelley; Vida, J. T.; Frisse, Linda M.; Mundo, Manuel; Baldwin, James G.
1997-01-01
To effectively integrate DNA sequence analysis and classical nematode taxonomy, we must be able to obtain DNA sequences from formalin-fixed specimens. Microdissected sections of nematodes were removed from specimens fixed in formalin, using standard protocols and without destroying morphological features. The fixed sections provided sufficient template for multiple polymerase chain reaction-based DNA sequence analyses. PMID:19274156
Extracting DNA words based on the sequence features: non-uniform distribution and integrity.
Li, Zhi; Cao, Hongyan; Cui, Yuehua; Zhang, Yanbo
2016-01-25
DNA sequence can be viewed as an unknown language with words as its functional units. Most sequence alignment algorithms, such as motif discovery algorithms, depend on the quality of background information about sequences. It is therefore necessary to develop an ab initio algorithm that extracts the "words" based only on the DNA sequences. We considered non-uniform distribution and integrity to be two important features of a word, and on this basis we developed an ab initio algorithm to extract "DNA words" that have potential functional meaning. A Kolmogorov-Smirnov test was used to test whether the distribution of candidate words along the DNA sequences is consistent with a uniform distribution, and integrity was judged by sequence and position alignment. Two random base sequences were adopted as negative controls, and an English book was used as a positive control to verify the algorithm. We applied the algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to show the utility of the method. The results provide strong evidence that the algorithm is a promising tool for building a DNA dictionary ab initio. Our method provides a fast way to screen for important DNA elements on a large scale and offers potential insights into the understanding of a genome.
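The uniformity criterion can be illustrated with a one-sample Kolmogorov-Smirnov statistic computed on a candidate word's start positions. This minimal sketch makes several assumptions. Positions are scaled to [0, 1) by the sequence length and compared with the uniform distribution on [0, 1]. The integrity check is omitted. The function names and the large-sample 5% critical value 1.358/√n are illustrative choices, not the authors' implementation.

```python
import math


def find_positions(seq, word):
    """Start indices of all (possibly overlapping) occurrences of `word` in `seq`."""
    positions = []
    start = seq.find(word)
    while start != -1:
        positions.append(start)
        start = seq.find(word, start + 1)
    return positions


def ks_uniform_statistic(positions, seq_len):
    """One-sample KS statistic D between normalized positions and U(0, 1)."""
    xs = sorted(p / seq_len for p in positions)
    n = len(xs)
    if n == 0:
        raise ValueError("the word does not occur in the sequence")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Largest gap between the empirical CDF (just after / just before x) and F(x) = x.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


def is_non_uniform(positions, seq_len, c_alpha=1.358):
    """Reject uniformity if D > c_alpha / sqrt(n); 1.358 is the large-sample 5% value."""
    d = ks_uniform_statistic(positions, seq_len)
    return d > c_alpha / math.sqrt(len(positions))


clustered = "ACGT" * 5 + "T" * 80             # word packed into the first 20 bases
hits = find_positions(clustered, "ACGT")
print(hits)                                   # [0, 4, 8, 12, 16]
print(is_non_uniform(hits, len(clustered)))   # True
```

A word whose occurrences cluster in one region gets a large D and is flagged as non-uniform, while evenly spread occurrences are not. For small n, an exact KS p-value would be more appropriate than the asymptotic threshold.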
BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.
Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins
2018-06-05
With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular from de novo sequencing, has been produced rapidly at relatively low cost. In this context, computational tools are increasingly important for identifying the information needed to understand how organisms function. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on features extracted from complex network measurements. The method first transforms each sequence into a complex network. It then extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated on the classification of coding and non-coding RNAs from 13 species and compared with the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods for all organisms and datasets considered. BASiNET classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and not biased by organism. The proposed methodology is implemented as open-source R code and is freely available for download at https://cran.r-project.org/package=BASiNET.
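The sequence-to-network step can be sketched as follows. For illustration only, this assumes that nodes are overlapping k-mers and that an undirected edge links each k-mer to the next one along the sequence. BASiNET's actual construction and its set of topological measures may differ, and the function names are hypothetical. The resulting measures form a small feature vector that any classifier could consume.

```python
def sequence_to_network(seq, k=3):
    """Undirected graph as an adjacency dict: nodes are overlapping k-mers,
    edges join each k-mer to the next one along the sequence."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    adj = {}
    for kmer in kmers:
        adj.setdefault(kmer, set())
    for a, b in zip(kmers, kmers[1:]):
        if a != b:  # skip self-loops from runs such as "AAAA"
            adj[a].add(b)
            adj[b].add(a)
    return adj


def network_features(adj):
    """A few simple topological measures: nodes, edges, mean/max degree, density."""
    n = len(adj)
    degrees = [len(nbrs) for nbrs in adj.values()]
    m = sum(degrees) // 2
    mean_degree = sum(degrees) / n if n else 0.0
    max_degree = max(degrees, default=0)
    density = 2 * m / (n * (n - 1)) if n > 1 else 0.0
    return [n, m, mean_degree, max_degree, density]


print(network_features(sequence_to_network("ACGTACGT")))
# [4, 4, 2.0, 2, 0.6666666666666666]
```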
Texture analysis of common renal masses in multiple MR sequences for prediction of pathology
NASA Astrophysics Data System (ADS)
Hoang, Uyen N.; Malayeri, Ashkan A.; Lay, Nathan S.; Summers, Ronald M.; Yao, Jianhua
2017-03-01
This pilot study performs texture analysis on multiple magnetic resonance (MR) images of common renal masses to differentiate renal cell carcinoma (RCC). For feature extraction and classification, a bounding box is drawn around each mass on one axial slice of the T1 delayed sequence. All sequences (T1 delayed, venous, arterial, and pre-contrast phases, T2, and T2 fat-saturated) are co-registered, and texture features are extracted from each sequence simultaneously. Random forest is used to construct models that classify 96 normal regions, 87 clear cell RCCs, 8 papillary RCCs, and 21 renal oncocytomas; ground truths are verified through pathology reports. The random forest model performs best when data from all sequences are used in conjunction, achieving an overall classification accuracy of 83.7%. When data from a single sequence are used, the overall accuracies for the T1 delayed, venous, arterial, and pre-contrast phases, T2, and T2 fat-saturated sequences were 79.1%, 70.5%, 56.2%, 61.0%, 60.0%, and 44.8%, respectively. These results show the promise of using intensity information from multiple MR sequences for accurate classification of renal masses.
An RNAi-Enhanced Logic Circuit for Cancer Specific Detection and Destruction
2013-02-01
monomeric protein secreted by Corynebacterium diphtheriae, and pro-apoptotic members of the Bcl-2 family: mBax (Mus musculus), hBax (Homo sapiens), and its... Gata3 mStaple. Intron-feature sequences (donor site, branch point, polypyrimidine tract, and acceptor site) were selected based on previously... sequences found in literature our intron features were chosen according to SplicePort [4], an online analyzer that detects the likelihood of splicing to
Current State of an Intelligent System to Aid in Tephra Layer Correlation
NASA Astrophysics Data System (ADS)
Hanson-Hedgecock, S.; Bursik, M.; Rogova, G.
2007-12-01
We are developing a computer-based intelligent system to correlate tephra layers using the lithologic, mineralogic, and geochemical characteristics of field samples, to aid geologists in interpreting the eruption patterns of volcanic chains and fields. The intelligent system is used to define groups of tephra source vents from geochemical data, and to correlate tephra layers based on lithostratigraphic characteristics. Understanding the eruption history of a volcano from stratigraphic studies is important for forecasting future eruptive behavior and hazards. In volcanic chains and fields with a complex eruptive history and no central vent, determining spatio-temporal eruption patterns is difficult. Sedimentologic and chemical variability, together with sparse sampling, often result in relatively large variances and imprecision in the dataset. Lithostratigraphic and geochemical interpretation also depends on one's level of expertise and can be subjective. Lithostratigraphic features are processed by a hybrid classifier composed of supervised artificial neural networks (ANNs) combined within the framework of the Dempster-Shafer theory of evidence. Since lithostratigraphic features vary with distance from source, hypothetical vent locations are determined using expert domain knowledge and geostatistical methods. Geochemical data are processed by a suite of fuzzy k-means classifiers. Each fuzzy k-means classifier assigns observations to multiple clusters to varying degrees, expressed as membership coefficients. The assignment minimizes a function of the total distance between the cluster centers and the individual geochemical data patterns, weighted by the membership coefficients. Improved clustering of the geochemical data is achieved by fusing the individual clustering results with an evidential combination method.
Lithostratigraphic data from individual tephra beds of the North Mono eruption sequence are used to test the effectiveness of the intelligent system for tephra layer correlation. Geochemical data from tephra bedsets of the Mono and Inyo Craters, CA, are used to test the effectiveness of the intelligent system for eruption sequence correlation. The intelligent system aids correlation by showing matches and disparities between data patterns from different outcrops that may have been overlooked in initial interpretations. Initial results show that the lithostratigraphic classifier is able to accurately differentiate known layers 76% of the time. Output from the lithostratigraphic classifier can furthermore be plotted directly as isopleth maps that can aid in rapid recognition of tephra layers as well as determination of eruption characteristics, e.g. eruption volume, plume height, etc. The intelligent system produces a useful recognition result, while dealing with the uncertainty from sparse data and the imprecise description of layer characteristics.
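The fuzzy k-means step assigns each observation a membership coefficient per cluster and minimizes membership-weighted distances to the cluster centers. It can be illustrated with the textbook fuzzy c-means updates. This minimal sketch makes several assumptions. It uses Euclidean distance, a fuzzifier m = 2, a fixed number of iterations, and initial centers taken from the first k observations. It is not the authors' classifier suite and omits the evidential fusion of multiple clusterings. All function names are hypothetical.

```python
import math


def memberships(points, centers, m=2.0):
    """u[i][j] = 1 / sum_k (d_ij / d_ik)^(2/(m-1)); each row sums to 1."""
    u = []
    for p in points:
        d = [math.dist(p, c) for c in centers]
        zeros = [j for j, dj in enumerate(d) if dj == 0.0]
        if zeros:  # point coincides with a center: share membership among those centers
            row = [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(d))]
        else:
            e = 2.0 / (m - 1.0)
            row = [1.0 / sum((d[j] / dk) ** e for dk in d) for j in range(len(d))]
        u.append(row)
    return u


def update_centers(points, u, m=2.0):
    """Each center is the mean of all points weighted by u_ij^m."""
    dim = len(points[0])
    centers = []
    for j in range(len(u[0])):
        w = [row[j] ** m for row in u]
        total = sum(w)
        centers.append(tuple(sum(wi * p[t] for wi, p in zip(w, points)) / total
                             for t in range(dim)))
    return centers


def objective(points, centers, u, m=2.0):
    """J = sum_i sum_j u_ij^m * ||x_i - c_j||^2, which the alternating updates decrease."""
    return sum(u[i][j] ** m * math.dist(p, c) ** 2
               for i, p in enumerate(points) for j, c in enumerate(centers))


def fuzzy_c_means(points, k, m=2.0, iters=50):
    """Alternate membership and center updates, starting from the first k points (assumed distinct)."""
    centers = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        u = memberships(points, centers, m)
        centers = update_centers(points, u, m)
    return centers, memberships(points, centers, m)


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centers, u = fuzzy_c_means(pts, 2)
for p, row in zip(pts, u):
    print(p, [round(x, 3) for x in row])
```

Unlike hard k-means, every observation keeps a graded membership in every cluster. Those membership coefficients are what an evidential combination step could then fuse across several classifiers.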
Prediction of Nucleotide Binding Peptides Using Star Graph Topological Indices.
Liu, Yong; Munteanu, Cristian R; Fernández Blanco, Enrique; Tan, Zhiliang; Santos Del Riego, Antonino; Pazos, Alejandro
2015-11-01
Nucleotide-binding proteins are involved in many important cellular processes, such as the transmission of genetic information or energy transfer and storage. Therefore, screening new peptides for this biological function is an important research topic. The current study proposes a mixed methodology to obtain the first classification model able to predict new nucleotide-binding peptides using only the amino acid sequence. The methodology combines a Star graph molecular descriptor of the peptide sequences with machine learning techniques to find the best classifier. The best model is a Random Forest classifier based on two features of the embedded and non-embedded graphs. The performance of the model is excellent compared with similar models in the field, with an Area Under the Receiver Operating Characteristic Curve (AUROC) value of 0.938 and a true positive rate (TPR) of 0.886 (test subset). Predicting new nucleotide-binding peptides with this model could be useful for drug target studies in drug development. © 2015 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.
Gut microbial profile analysis by MiSeq sequencing of pancreatic carcinoma patients in China
Xie, Haiyang; Li, Ang; Lu, Haifeng; Xu, Shaoyan; Zhou, Lin; Zhang, Hua; Cui, Guangying; Chen, Xinhua; Liu, Yuanxing; Wu, Liming; Qin, Nan; Sun, Ranran; Wang, Wei; Li, Lanjuan; Wang, Weilin; Zheng, Shusen
2017-01-01
Pancreatic carcinoma (PC) is a lethal cancer. Gut microbiota is associated with some risk factors of PC, e.g. obesity and type II diabetes. However, the specific gut microbial profile in clinical PC in China has never been reported. This prospective study enrolled 85 PC patients and 57 matched healthy controls (HC) to analyze microbial characteristics by MiSeq sequencing. The results showed that gut microbial diversity was decreased in PC, with a unique microbial profile partly attributable to the decrease in alpha diversity. Microbial alterations in PC were characterized by an increase in certain pathogens and lipopolysaccharide-producing bacteria and a decrease in probiotics and butyrate-producing bacteria. The microbial community in obstruction cases was separated from that in unobstructed cases. Streptococcus was associated with the bile. Furthermore, 23 microbial functions, e.g. leucine and LPS biosynthesis, were enriched, while 13 functions were reduced in PC. Importantly, based on 40 genera associated with PC, microbial markers achieved a high classification power, with an AUC of 0.842. In conclusion, the gut microbial profile was unique in PC, providing a microbial marker for non-invasive PC diagnosis. PMID:29221120
Molecular and clinical studies of X-linked deafness among Pakistani families.
Waryah, Ali M; Ahmed, Zubair M; Bhinder, Munir A; Binder, Munir A; Choo, Daniel I; Sisk, Robert A; Shahzad, Mohsin; Khan, Shaheen N; Friedman, Thomas B; Riazuddin, Sheikh; Riazuddin, Saima
2011-07-01
There are 68 sex-linked syndromes that include hearing loss as one feature and five sex-linked nonsyndromic deafness loci listed in the OMIM database. The possibility of additional such sex-linked loci was explored by ascertaining three unrelated Pakistani families (PKDF536, PKDF1132 and PKDF740) segregating X-linked recessive deafness. Sequence analysis of POU3F4 (DFN3) in affected members of families PKDF536 and PKDF1132 revealed two novel nonsense mutations, p.Q136X and p.W114X, respectively. Family PKDF740 is segregating congenital blindness, mild-to-profound progressive hearing loss that is characteristic of Norrie disease (MIM#310600). Sequence analysis of NDP among affected members of this family revealed a novel single nucleotide deletion c.49delG causing a frameshift and premature truncation (p.V17fsX1) of the encoded protein. These mutations were not found in 150 normal DNA samples. Identification of pathogenic alleles causing X-linked recessive deafness will improve molecular diagnosis, genetic counseling and molecular epidemiology of hearing loss among Pakistanis.
Pi, Yiming
2017-01-01
The frequency of terahertz radar ranges from 0.1 THz to 10 THz, which is higher than that of microwaves. Multi-modal signals, including high-resolution range profile (HRRP) and Doppler signatures, can be acquired by the terahertz radar system. These two kinds of information are commonly used in automatic target recognition; however, dynamic gesture recognition is rarely discussed in the terahertz regime. In this paper, a dynamic gesture recognition system using a terahertz radar is proposed, based on multi-modal signals. The HRRP sequences and Doppler signatures were first obtained from the radar echoes. Considering the electromagnetic scattering characteristics, a feature extraction model is designed using location parameter estimation of scattering centers. Dynamic Time Warping (DTW) extended to multi-modal signals is used to accomplish the classifications. Ten types of gesture signals, collected from a terahertz radar, are applied to validate the analysis and the recognition system. The results of the experiment indicate that the recognition rate exceeds 91%. This research verifies the potential applications of dynamic gesture recognition using a terahertz radar. PMID:29267249
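The classification step pairs dynamic time warping with signals that have several components per time step. The sketch below makes assumptions for illustration. Each time step is a feature vector concatenating range-profile and Doppler values, the local cost is Euclidean distance, and a query takes the label of its nearest template (1-nearest-neighbor). The paper's exact DTW extension and its scattering-center features are not reproduced. The function names and gesture labels are hypothetical.

```python
import math


def dtw_distance(a, b):
    """Classic DTW between two sequences of equal-length feature vectors,
    using Euclidean distance as the local cost."""
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def classify(query, templates):
    """1-nearest-neighbor: label of the template with the smallest DTW distance."""
    label, _ = min(templates, key=lambda item: dtw_distance(query, item[1]))
    return label


# Hypothetical 2-D frames: (range-profile feature, Doppler feature) per time step.
templates = [
    ("gesture_a", [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]),
    ("gesture_b", [(2.0, -1.0), (1.0, -2.0), (0.0, -3.0)]),
]
query = [(0.0, 1.0), (0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]  # gesture_a, held longer at the start
print(classify(query, templates))  # gesture_a
```

Warping lets a gesture performed at a different speed still match its template at zero or low cost. Stacking both modalities into one vector is the simplest way to let a single alignment use both.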
Calcium-Dependent Protein Kinase Genes in Corn Roots
NASA Technical Reports Server (NTRS)
Takezawa, D.; Patil, S.; Bhatia, A.; Poovaiah, B. W.
1996-01-01
Two cDNAs encoding Ca2+-dependent protein kinases (CDPKs), Corn Root Protein Kinase 1 and 2 (CRPK1, CRPK2), were isolated from the root tip library of corn (Zea mays L., cv. Merit), and their nucleotide sequences were determined. The deduced amino acid sequences of both clones have features characteristic of plant CDPKs, including all 11 conserved serine/threonine kinase subdomains, a junction domain and a calmodulin-like domain with four Ca2+-binding sites. Northern analysis revealed that CRPK1 mRNA is preferentially expressed in roots, especially in the root tip, whereas the expression of CRPK2 mRNA was very low in all the tissues tested. In situ hybridization experiments revealed that CRPK1 mRNA is highly expressed in the root apex compared with other parts of the root. Partially purified CDPK from the root tip phosphorylates syntide-2, a common peptide substrate for plant CDPKs, and the phosphorylation was stimulated 7-fold by the addition of Ca2+. Our results show that two CDPK isoforms are expressed in corn roots and that they may be involved in the Ca2+-dependent signal transduction process.
Serrano-Ahumada, Ana Silvia; Cortes-González, Vianney; González-Huerta, Luz María; Cuevas, Sergio; Aguilar-Lozano, Luis; Villanueva-Mendoza, Cristina
2018-02-01
The aim of this study was to describe a case of severe keratitis-ichthyosis-deafness (KID) syndrome with ocular surface squamous neoplasia. The affected patient underwent complete ocular and systemic examinations. The molecular studies included polymerase chain reaction amplification and automated DNA sequencing of the complete gap junction beta-2 (GJB2) gene coding sequence. A 30-year-old man presented with generalized erythro-hyperkeratosis and deafness and complaints of decreased visual acuity, tearing, and photophobia. Ophthalmic examination showed corneal erosion, vascularization, and a gray gelatinous lesion partially covering the right cornea, suggestive of squamous neoplasia. The clinical features were characteristic of KID syndrome. This diagnosis was confirmed with a DNA analysis showing the pathogenic variant p.D50N in the GJB2 gene. Presumed squamous neoplasia was treated with topical interferon α2b. KID syndrome is a very rare disease that has been associated with an increased incidence of squamous cell carcinoma of the mucous membranes and skin (12%-15%). Here, we present a case of severe systemic KID syndrome with ocular surface squamous neoplasia.
Giessen, Tobias W
2016-10-01
Compartmentalization is one of the defining features of life. Cells use protein compartments to exert spatial control over their metabolism, store nutrients and create unique microenvironments needed for essential physiological processes. Encapsulins are a recently discovered class of protein nanocompartments found in bacteria and archaea that naturally encapsulate cargo proteins. A short C-terminal targeting sequence directs the highly specific encapsulation process in vivo. Here, I will initially discuss the properties, diversity and putative function of encapsulins. The unique characteristics and potential uses of the self-sorting cargo-packaging process found in encapsulin systems will then be highlighted. Examples for the application of encapsulins as cell-specific optical nanoprobes and targeted therapeutic delivery systems will be discussed with an emphasis on the ability to integrate multiple functionalities within a single nanodevice. By fusing targeting sequences to non-native proteins, encapsulins can also be used as specific nanocontainers and enzymatic nanoreactors in vivo. I will end by briefly discussing future avenues for encapsulin research related to both basic microbial metabolism and applications in biomedicine, catalysis and materials science. Copyright © 2016 Elsevier Ltd. All rights reserved.
Development and use of molecular markers: past and present.
Grover, Atul; Sharma, P C
2016-01-01
Molecular markers, owing to their stability, cost-effectiveness and ease of use, provide an immensely popular tool for a variety of applications, including genome mapping, gene tagging, genetic diversity analysis, phylogenetic analysis and forensic investigations. In the last three decades, a number of molecular marker techniques have been developed and exploited worldwide in different systems. However, only a handful of these techniques, namely RFLPs, RAPDs, AFLPs, ISSRs, SSRs and SNPs, have received global acceptance. A recent revolution in DNA sequencing techniques has taken the discovery and application of molecular markers to high-throughput and ultrahigh-throughput levels. Although the choice of marker will obviously depend on the targeted use, microsatellites, SNPs and genotyping by sequencing (GBS) largely fulfill most user requirements. Further, modern transcriptomic and functional markers, in combination with other high-throughput techniques, will drive high-density genetic map construction, QTL identification, and breeding and conservation strategies in the years to come. This review presents an overview of different marker technologies and their variants, with a comparative account of their characteristic features and applications.
Zhou, Zhi; Cao, Zongjie; Pi, Yiming
2017-12-21
The frequency of terahertz radar ranges from 0.1 THz to 10 THz, which is higher than that of microwaves. Multi-modal signals, including high-resolution range profile (HRRP) and Doppler signatures, can be acquired by the terahertz radar system. These two kinds of information are commonly used in automatic target recognition; however, dynamic gesture recognition is rarely discussed in the terahertz regime. In this paper, a dynamic gesture recognition system using a terahertz radar is proposed, based on multi-modal signals. The HRRP sequences and Doppler signatures were first obtained from the radar echoes. Considering the electromagnetic scattering characteristics, a feature extraction model is designed using location parameter estimation of scattering centers. Dynamic Time Warping (DTW) extended to multi-modal signals is used to accomplish the classifications. Ten types of gesture signals, collected from a terahertz radar, are applied to validate the analysis and the recognition system. The results of the experiment indicate that the recognition rate exceeds 91%. This research verifies the potential applications of dynamic gesture recognition using a terahertz radar.
An interdisciplinary analysis of ERTS data for Colorado mountain environments using ADP Techniques
NASA Technical Reports Server (NTRS)
Hoffer, R. M. (Principal Investigator)
1972-01-01
The author identified significant preliminary results from the Ouachita portion of the Texoma frame of data, which indicate considerable potential for the analysis and interpretation of ERTS data. One of the more significant aspects of this analysis sequence is believed to have been the investigation of a technique relating ERTS analysis to surface observation analysis. At present, a sequence involving (1) preliminary analysis based solely on the spectral characteristics of the data, followed by (2) a surface observation mission to obtain visual information and oblique photography of particular points of interest in the test site area, appears to provide an extremely efficient technique for obtaining particularly meaningful surface observation data. Following such a procedure permits concentration on particular points of interest in the entire ERTS frame, and thereby makes the surface observation data obtained particularly significant and meaningful. The analysis of the Texoma frame has also been significant in demonstrating a fast-turnaround analysis capability. Additionally, the analysis has shown the potential accuracy and the degree of complexity of features that can be identified and mapped using ERTS data.
Characteristics of the sequence effect in Parkinson's disease.
Kang, Suk Yun; Wasaka, Toshiaki; Shamim, Ejaz A; Auh, Sungyoung; Ueki, Yoshino; Lopez, Grisel J; Kida, Tetsuo; Jin, Seung-Hyun; Dang, Nguyet; Hallett, Mark
2010-10-15
The sequence effect (SE) in Parkinson's disease (PD) is progressive slowing of sequential movements. It is a feature of bradykinesia, but is separate from a general slowness without deterioration over time. It is commonly seen in PD, but its physiology is unclear. We measured general slowness and the SE separately with a computer-based, modified Purdue pegboard in 11 patients with advanced PD. We conducted a placebo-controlled, four-way crossover study to learn whether levodopa and repetitive transcranial magnetic stimulation (rTMS) could improve general slowness or the SE. We also examined the correlation between the SE and clinical fatigue. Levodopa alone and rTMS alone improved general slowness, but rTMS showed no additive effect on levodopa. Levodopa alone, rTMS alone, and their combination did not alleviate the SE. There was no correlation between the SE and fatigue. This study suggests that dopaminergic dysfunction and abnormal motor cortex excitability are not the relevant mechanisms for the SE. Additionally, the SE is not a component of clinical fatigue. Further work is needed to establish the physiology and clinical relevance of the SE. © 2010 Movement Disorder Society.
Molecular and Clinical Studies of X-linked Deafness Among Pakistani Families
Waryah, Ali M.; Ahmed, Zubair M.; Choo, Daniel I.; Sisk, Robert A.; Binder, Munir A.; Shahzad, Mohsin; Khan, Shaheen N.; Friedman, Thomas B.; Riazuddin, Sheikh; Riazuddin, Saima
2011-01-01
There are 68 sex-linked syndromes that include hearing loss as one feature and five sex-linked nonsyndromic deafness loci listed in the OMIM database. The possibility of additional such sex-linked loci was explored by ascertaining three unrelated Pakistani families (PKDF536, PKDF1132, PKDF740) segregating X-linked recessive deafness. Sequence analysis of POU3F4 (DFN3) in affected members of families PKDF536 and PKDF1132 revealed two novel nonsense mutations, p.Q136X and p.W114X, respectively. Family PKDF740 is segregating congenital blindness, mild to profound progressive hearing loss that is characteristic of Norrie disease (MIM#310600). Sequence analysis of NDP among affected members of this family revealed a novel single nucleotide deletion c.49delG causing a frameshift and premature truncation (p.V17fsX1) of the encoded protein. These mutations were not found in 150 normal DNA samples. Identification of pathogenic alleles causing X-linked recessive deafness will improve molecular diagnosis, genetic counseling, and molecular epidemiology of hearing loss among Pakistanis. PMID:21633365
Sieve analysis in HIV-1 vaccine efficacy trials
Edlefsen, Paul T.; Gilbert, Peter B.; Rolland, Morgane
2013-01-01
Purpose of review: The genetic characterization of HIV-1 breakthrough infections in vaccine and placebo recipients offers new ways to assess vaccine efficacy trials. Statistical and sequence analysis methods provide opportunities to mine the mechanisms behind the effect of an HIV vaccine. Recent findings: The release of results from two HIV-1 vaccine efficacy trials, Step/HVTN-502 and RV144, led to numerous studies in the last five years, including efforts to sequence HIV-1 breakthrough infections and compare viral characteristics between the vaccine and placebo groups. Novel genetic and statistical analysis methods uncovered features that distinguished founder viruses isolated from vaccinees from those isolated from placebo recipients, and identified HIV-1 genetic targets of vaccine-induced immune responses. Summary: Studies of HIV-1 breakthrough infections in vaccine efficacy trials can provide an independent confirmation to correlates of risk studies, as they take advantage of vaccine/placebo comparisons while correlates of risk analyses are limited to vaccine recipients. Through the identification of viral determinants impacted by vaccine-mediated host immune responses, sieve analyses can shed light on potential mechanisms of vaccine protection. PMID:23719202
Sieve analysis in HIV-1 vaccine efficacy trials.
Edlefsen, Paul T; Gilbert, Peter B; Rolland, Morgane
2013-09-01
The genetic characterization of HIV-1 breakthrough infections in vaccine and placebo recipients offers new ways to assess vaccine efficacy trials. Statistical and sequence analysis methods provide opportunities to mine the mechanisms behind the effect of an HIV vaccine. The release of results from two HIV-1 vaccine efficacy trials, Step/HVTN-502 (HIV Vaccine Trials Network-502) and RV144, led to numerous studies in the last 5 years, including efforts to sequence HIV-1 breakthrough infections and compare viral characteristics between the vaccine and placebo groups. Novel genetic and statistical analysis methods uncovered features that distinguished founder viruses isolated from vaccinees from those isolated from placebo recipients, and identified HIV-1 genetic targets of vaccine-induced immune responses. Studies of HIV-1 breakthrough infections in vaccine efficacy trials can provide an independent confirmation to correlates of risk studies, as they take advantage of vaccine/placebo comparisons, whereas correlates of risk analyses are limited to vaccine recipients. Through the identification of viral determinants impacted by vaccine-mediated host immune responses, sieve analyses can shed light on potential mechanisms of vaccine protection.
Methylotrophic Methylobacterium Bacteria Nodulate and Fix Nitrogen in Symbiosis with Legumes
Sy, Abdoulaye; Giraud, Eric; Jourand, Philippe; Garcia, Nelly; Willems, Anne; de Lajudie, Philippe; Prin, Yves; Neyra, Marc; Gillis, Monique; Boivin-Masson, Catherine; Dreyfus, Bernard
2001-01-01
Rhizobia described so far belong to three distinct phylogenetic branches within the α-2 subclass of Proteobacteria. Here we report the discovery of a fourth rhizobial branch involving bacteria of the Methylobacterium genus. Rhizobia isolated from Crotalaria legumes were assigned to a new species, “Methylobacterium nodulans,” within the Methylobacterium genus on the basis of 16S ribosomal DNA analyses. We demonstrated that these rhizobia facultatively grow on methanol, which is a characteristic of Methylobacterium spp. but a unique feature among rhizobia. Genes encoding two key enzymes of methylotrophy and nodulation were sequenced for the type strain, ORS2060: the mxaF gene, encoding the α subunit of the methanol dehydrogenase, and the nodA gene, encoding an acyltransferase involved in Nod factor biosynthesis. Plant tests and nodA amplification assays showed that “M. nodulans” is the only nodulating Methylobacterium sp. identified so far. Phylogenetic sequence analysis showed that “M. nodulans” NodA is closely related to Bradyrhizobium NodA, suggesting that this gene was acquired by horizontal gene transfer. PMID:11114919
A Point Rainfall Generator With Internal Storm Structure
NASA Astrophysics Data System (ADS)
Marien, J. L.; Vandewiele, G. L.
1986-04-01
A point rainfall generator is a probabilistic model for the time series of rainfall observed at a single geographical point. Its main purpose is to generate long synthetic rainfall sequences for simulation studies. The present generator is a continuous-time model based on 13.5 years of 10-min point rainfalls observed in Belgium and digitized with a resolution of 0.1 mm. It attempts to model, as accurately as possible, all features of the rainfall time series that are important for flood studies. The original aspects of the model are, on the one hand, the way in which storms are defined and, on the other hand, the theoretical model for the internal storm characteristics. The storm definition has the advantage that the important characteristics of successive storms are fully independent and can be modelled very precisely, even on time bases as small as 10 min. The model of the internal storm characteristics has a strong theoretical structure, which better justifies extrapolating it to severe storms, for which data are very sparse. This can be important when the model is used to simulate severe flood events.
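The abstract does not give the equations of the Marien and Vandewiele generator, so the sketch below is not their model. It is a much simpler, generic rectangular-pulse scheme, assumed here only to show what a point rainfall generator produces: exponentially distributed dry spells and storm durations, one constant (exponentially distributed) intensity per storm, aggregated to 10-min depths at 0.1 mm resolution. The function name and all parameter defaults are hypothetical.

```python
import random
from typing import Optional


def generate_point_rainfall(n_steps: int, step_min: float = 10.0,
                            mean_dry_min: float = 600.0,
                            mean_storm_min: float = 120.0,
                            mean_intensity_mm_h: float = 2.0,
                            seed: Optional[int] = None) -> list[float]:
    """Synthetic rainfall depths (mm) per time step at a single point.

    Hypothetical rectangular-pulse model, not the published generator:
    exponential dry spells, exponential storm durations and one constant
    exponential intensity per storm. Depths are rounded to 0.1 mm.
    """
    rng = random.Random(seed)
    horizon = n_steps * step_min
    depth = [0.0] * n_steps
    t = rng.expovariate(1.0 / mean_dry_min)  # start of the first storm (min)
    while t < horizon:
        duration = rng.expovariate(1.0 / mean_storm_min)
        intensity = rng.expovariate(1.0 / mean_intensity_mm_h) / 60.0  # mm/min
        start, end = t, min(t + duration, horizon)
        # Spread the storm's rain over every time step it overlaps.
        k = int(start // step_min)
        while k < n_steps and k * step_min < end:
            overlap = min(end, (k + 1) * step_min) - max(start, k * step_min)
            depth[k] += intensity * overlap
            k += 1
        # Next storm starts after this storm ends plus a new dry spell.
        t = t + duration + rng.expovariate(1.0 / mean_dry_min)
    return [round(x, 1) for x in depth]


one_day = generate_point_rainfall(144, seed=1)  # 144 ten-minute steps
print(sum(one_day))
```

Storms in this sketch have no internal structure; the published generator's contribution is precisely a theoretical model of that internal structure, which a constant-intensity pulse cannot capture.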
Software for pre-processing Illumina next-generation sequencing short read sequences
2014-01-01
Background When compared to Sanger sequencing technology, next-generation sequencing (NGS) technologies are hindered by shorter sequence read length, higher base-call error rate, non-uniform coverage, and platform-specific sequencing artifacts. These characteristics lower the quality of their downstream analyses, e.g. de novo and reference-based assembly, by introducing sequencing artifacts and errors that may contribute to incorrect interpretation of data. Although many tools have been developed for quality control and pre-processing of NGS data, none of them provide flexible and comprehensive trimming options in conjunction with parallel processing to expedite pre-processing of large NGS datasets. Methods We developed ngsShoRT (next-generation sequencing Short Reads Trimmer), a flexible and comprehensive open-source software package written in Perl that provides a set of algorithms commonly used for pre-processing NGS short read sequences. We compared the features and performance of ngsShoRT with existing tools: CutAdapt, NGS QC Toolkit and Trimmomatic. We also compared the effects of using pre-processed short read sequences generated by different algorithms on de novo and reference-based assembly for three different genomes: Caenorhabditis elegans, Saccharomyces cerevisiae S288c, and Escherichia coli O157:H7. Results Several combinations of ngsShoRT algorithms were tested on publicly available Illumina GA II, HiSeq 2000, and MiSeq eukaryotic and bacterial genomic short read sequences, focusing on removing sequencing artifacts and low-quality reads and/or bases. Our results show that across three organisms and three sequencing platforms, trimming improved the mean quality scores of trimmed sequences. Using trimmed sequences for de novo and reference-based assembly improved assembly quality as well as assembler performance.
In general, ngsShoRT outperformed comparable trimming tools in terms of trimming speed and improvement of de novo and reference-based assembly, as measured by assembly contiguity and correctness. Conclusions Trimming of short read sequences can improve the quality of de novo and reference-based assembly and assembler performance. The parallel processing capability of ngsShoRT reduces trimming time and improves memory efficiency when dealing with large datasets. We recommend combining sequencing-artifact removal with quality-score-based read filtering and base trimming as the most consistent method for improving sequence quality and downstream assemblies. ngsShoRT source code, user guide and tutorial are available at http://research.bioinformatics.udel.edu/genomics/ngsShoRT/. ngsShoRT can be incorporated as a pre-processing step in genome and transcriptome assembly projects. PMID:24955109
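ngsShoRT implements its own algorithms in Perl; the Python sketch below is not ngsShoRT code. It is a generic illustration of two of the operations the abstract recommends: quality-score-based base trimming (cutting low-quality bases from the 3' end) and quality-score-based read filtering (discarding reads that end up too short or keep too many low-quality bases). It assumes Phred+33 quality encoding (Illumina 1.8+); the function names and threshold defaults are hypothetical.

```python
from typing import Optional


def phred_scores(quality: str, offset: int = 33) -> list[int]:
    """Decode a FASTQ quality string, assuming Phred+33 encoding."""
    return [ord(ch) - offset for ch in quality]


def trim_3prime(seq: str, quality: str, min_q: int = 20) -> tuple[str, str]:
    """Remove trailing bases whose Phred score is below min_q."""
    scores = phred_scores(quality)
    end = len(scores)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


def filter_read(seq: str, quality: str, min_q: int = 20, min_len: int = 30,
                max_low_frac: float = 0.2) -> Optional[tuple[str, str]]:
    """Trim the read, then return it, or None if it fails the filters."""
    seq, quality = trim_3prime(seq, quality, min_q)
    if not seq or len(seq) < min_len:
        return None  # too short after trimming
    scores = phred_scores(quality)
    low = sum(1 for q in scores if q < min_q)
    if low / len(scores) > max_low_frac:
        return None  # too many low-quality bases remain inside the read
    return seq, quality


# 'I' encodes Phred 40 and '#' encodes Phred 2 under Phred+33.
print(filter_read("ACGTACGTAC", "IIIIIIII##", min_len=5))
# ('ACGTACGT', 'IIIIIIII')
```

Trimming before filtering means a read is judged on what would actually be passed to the assembler; artifact removal (e.g. adapter clipping) would normally run first and is not shown.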
Soil, vegetation and total organic carbon stock development in self-restoring abandoned vineyards
NASA Astrophysics Data System (ADS)
József Novák, Tibor; Incze, József; Spohn, Marie; Giani, Luise
2016-04-01
Soil and vegetation development in abandoned vineyards was studied on Tokaj Nagy-Hill, in one of the traditional wine-producing regions of Hungary, which has been declared a UNESCO World Heritage Site as a cultural landscape. The spatial distribution and pattern of vineyards changed over the last several hundred years; therefore, a significant proportion of abandoned vineyards underwent long-term spontaneous secondary succession of vegetation and self-restoration of soils in the absence of later cultivation. Two chronosequences of spontaneously regenerating abandoned vineyards, one on a south-facing slope (S-sequence) and one on a southwest-facing slope (SW-sequence), with differing times since abandonment (193, 142, 101, 63, 39 and 14 years), were compiled and studied. The S-sequence was 25-35% sloped and strongly eroded, and the SW-sequence was 17-25% sloped and moderately eroded. The sites were investigated with respect to vegetation characteristics, soil physico-chemical characteristics, total organic carbon stocks (TOC stocks), accumulation rates of total organic carbon (TOC accumulation rates), and soil profiles, which were classified according to the World Reference Base (WRB) 2014. In the S-sequence, vegetation developed into shrub-grassland mosaics, frequently supplemented by protected forb species, with forest development at the earliest abandonment; in the SW-sequence, it developed predominantly into forest, with trees absent only at the 63- and 14-year-old abandonment sites. At all sites, soils were classified at the level of WRB reference groups, and Cambisols, Regosols, Calcisols, Leptosols, Chernozems and Phaeozems were found. Soils of the S-sequence show shallow remnants of loess cover with colluvic and redeposited soil materials containing 15-65% skeletal fragments of weathered volcanic rock coated with secondary calcium carbonates. The SW-sequence profiles developed on deep loess or loess derivatives.
The calcium carbonate content was higher in profiles of the S-sequence (18.1±10.4%) than in the SW-sequence (6.7±2.7%); consequently, the pH of the topsoil was higher in the S-sequence, and it correlated significantly and negatively with the age of abandonment in both sequences (r=-0.893, p=0.01 in S; r=-0.739, p=0.05 in SW). TOC stocks of the top 6 cm of soil were higher in the S-sequence (1.82±0.71 kg m-2) than in the SW-sequence (0.95±0.49 kg m-2) and correlated significantly and positively with the duration of self-restoration. When calculated for the whole profile, TOC stocks were similar in the S- and SW-sequences (S: 8.21±3.31 kg m-2; SW: 8.24±6.01 kg m-2). The TOC accumulation rates of the top 6 cm of soil were 18.9±10.0 g C m-2 y-1 in the S-sequence and 7.0±4.2 g C m-2 y-1 in the SW-sequence. Sites with the same age of abandonment developed different vegetation and had different soil features in both chronosequences, indicating that the duration of self-restoration is only one of the factors driving soil development and carbon sequestration after the abandonment of viticulture on Tokaj Nagy-Hill; these processes were also significantly affected by lithology, slope steepness and exposition (aspect). Keywords: soil organic carbon stocks; soil organic carbon accumulation rates; vineyard abandonment; terraced soils; Tokaj,
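The abstract does not state how TOC stocks and accumulation rates were computed. A common approach, assumed here, multiplies the organic carbon concentration by bulk density and layer thickness, discounts the volume occupied by rock fragments (relevant for the skeletal S-sequence soils), and divides the stock gained by the years since abandonment. The function names, the coarse-fragment correction and the zero-baseline default are assumptions, and the numbers below are illustrative, not data from the study.

```python
def toc_stock_kg_m2(toc_percent: float, bulk_density_g_cm3: float,
                    depth_cm: float, coarse_fraction: float = 0.0) -> float:
    """Organic carbon stock of one soil layer in kg C per m^2.

    toc_percent: organic C of the fine earth (g C per 100 g soil).
    coarse_fraction: volumetric share of rock fragments (0-1), assumed C-free.
    """
    # g/cm^3 * cm * 1e4 cm^2/m^2 = g soil per m^2; * toc/100 = g C per m^2;
    # / 1000 = kg C per m^2. Together these factors reduce to 0.1.
    return toc_percent * bulk_density_g_cm3 * depth_cm * (1.0 - coarse_fraction) * 0.1


def toc_accumulation_rate_g_m2_y(stock_kg_m2: float, years: float,
                                 baseline_kg_m2: float = 0.0) -> float:
    """Mean accumulation rate in g C m^-2 y^-1 since abandonment.

    baseline_kg_m2 is the (assumed) stock at the time of abandonment.
    """
    return (stock_kg_m2 - baseline_kg_m2) * 1000.0 / years


stock = toc_stock_kg_m2(2.0, 1.2, 6.0)        # illustrative 6 cm layer
print(round(stock, 2))                         # 1.44
print(round(toc_accumulation_rate_g_m2_y(stock, 100.0), 1))  # 14.4
```

With a zero baseline the rate is simply stock divided by age; if a young abandonment site is used as the baseline instead, the rate reflects only the carbon gained since that reference state.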