Sample records for sequence features involved

  1. Domino effect in chemical accidents: main features and accident sequences.

    PubMed

    Darbra, R M; Palacios, Adriana; Casal, Joaquim

    2010-11-15

    The main features of domino accidents in process/storage plants and in the transportation of hazardous materials were studied through an analysis of 225 accidents involving this effect. Data on these accidents, which occurred after 1961, were taken from several sources. Aspects analyzed included the accident scenario, the type of accident, the materials involved, the causes and consequences and the most common accident sequences. The analysis showed that the most frequent causes are external events (31%) and mechanical failure (29%). Storage areas (35%) and process plants (28%) are by far the most common settings for domino accidents. Eighty-nine per cent of the accidents involved flammable materials, the most frequent of which was LPG. The domino effect sequences were analyzed using relative probability event trees. The most frequent sequences were explosion→fire (27.6%), fire→explosion (27.5%) and fire→fire (17.8%). Copyright © 2010 Elsevier B.V. All rights reserved.

  2. Processing Translational Motion Sequences.

    DTIC Science & Technology

    1982-10-01

    the initial ROADSIGN image using a (del)**2g mask with a width of 5 pixels. The distinctiveness values were computed using features which were 5x5 pixel...the initial step size of the local search quite large. 4. EXPERIMENTS The following experiments were performed using the roadsign and industrial...the initial image of the sequence. The third experiment involves processing the roadsign image sequence using the features extracted at the positions

  3. Classification of G-protein coupled receptors based on a rich generation of convolutional neural network, N-gram transformation and multiple sequence alignments.

    PubMed

    Li, Man; Ling, Cheng; Xu, Qi; Gao, Jingyang

    2018-02-01

    Sequence classification is crucial in predicting the function of newly discovered sequences. In recent years, the prediction of the incremental large-scale and diversity of sequences has heavily relied on the involvement of machine-learning algorithms. To improve prediction accuracy, these algorithms must confront the key challenge of extracting valuable features. In this work, we propose a feature-enhanced protein classification approach, considering the rich generation of multiple sequence alignment algorithms, N-gram probabilistic language model and the deep learning technique. The essence behind the proposed method is that if each group of sequences can be represented by one feature sequence, composed of homologous sites, there should be less loss when the sequence is rebuilt, when a more relevant sequence is added to the group. On the basis of this consideration, the prediction becomes whether a query sequence belonging to a group of sequences can be transferred to calculate the probability that the new feature sequence evolves from the original one. The proposed work focuses on the hierarchical classification of G-protein Coupled Receptors (GPCRs), which begins by extracting the feature sequences from the multiple sequence alignment results of the GPCRs sub-subfamilies. The N-gram model is then applied to construct the input vectors. Finally, these vectors are imported into a convolutional neural network to make a prediction. The experimental results elucidate that the proposed method provides significant performance improvements. The classification error rate of the proposed method is reduced by at least 4.67% (family level I) and 5.75% (family Level II), in comparison with the current state-of-the-art methods. The implementation program of the proposed work is freely available at: https://github.com/alanFchina/CNN .

  4. Efficient Feature Selection and Classification of Protein Sequence Data in Bioinformatics

    PubMed Central

    Faye, Ibrahima; Samir, Brahim Belhaouari; Md Said, Abas

    2014-01-01

    Bioinformatics has been an emerging area of research for the last three decades. The ultimate aims of bioinformatics were to store and manage the biological data, and develop and analyze computational tools to enhance their understanding. The size of data accumulated under various sequencing projects is increasing exponentially, which presents difficulties for the experimental methods. To reduce the gap between newly sequenced protein and proteins with known functions, many computational techniques involving classification and clustering algorithms were proposed in the past. The classification of protein sequences into existing superfamilies is helpful in predicting the structure and function of large amount of newly discovered proteins. The existing classification results are unsatisfactory due to a huge size of features obtained through various feature encoding methods. In this work, a statistical metric-based feature selection technique has been proposed in order to reduce the size of the extracted feature vector. The proposed method of protein classification shows significant improvement in terms of performance measure metrics: accuracy, sensitivity, specificity, recall, F-measure, and so forth. PMID:25045727

  5. The PRC2-binding long non-coding RNAs in human and mouse genomes are associated with predictive sequence features

    NASA Astrophysics Data System (ADS)

    Tu, Shiqi; Yuan, Guo-Cheng; Shao, Zhen

    2017-01-01

    Recently, long non-coding RNAs (lncRNAs) have emerged as an important class of molecules involved in many cellular processes. One of their primary functions is to shape epigenetic landscape through interactions with chromatin modifying proteins. However, mechanisms contributing to the specificity of such interactions remain poorly understood. Here we took the human and mouse lncRNAs that were experimentally determined to have physical interactions with Polycomb repressive complex 2 (PRC2), and systematically investigated the sequence features of these lncRNAs by developing a new computational pipeline for sequences composition analysis, in which each sequence is considered as a series of transitions between adjacent nucleotides. Through that, PRC2-binding lncRNAs were found to be associated with a set of distinctive and evolutionarily conserved sequence features, which can be utilized to distinguish them from the others with considerable accuracy. We further identified fragments of PRC2-binding lncRNAs that are enriched with these sequence features, and found they show strong PRC2-binding signals and are more highly conserved across species than the other parts, implying their functional importance.
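    The abstract above describes treating each sequence as a series of transitions between adjacent nucleotides. The record does not give the details of the authors' pipeline, so the following is only a minimal sketch of one plausible reading: first-order transition frequencies, P(next | current), computed from adjacent pairs. The function name `transition_frequencies`, the RNA alphabet, and the T-to-U normalization are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

# Assumed alphabet for this sketch (RNA); not specified in the record.
ALPHABET = "ACGU"

def transition_frequencies(seq):
    """Return a dict mapping each dinucleotide 'XY' to the estimated P(Y | X).

    Hypothetical helper: pairs containing symbols outside ALPHABET
    (for example 'N') are ignored.
    """
    seq = seq.upper().replace("T", "U")
    # Count every adjacent pair (current nucleotide, next nucleotide).
    pairs = Counter(zip(seq, seq[1:]))
    freqs = {}
    for a in ALPHABET:
        total = sum(pairs[(a, b)] for b in ALPHABET)
        for b in ALPHABET:
            # Rows with no observations are reported as 0.0.
            freqs[a + b] = pairs[(a, b)] / total if total else 0.0
    return freqs

features = transition_frequencies("ACGACG")
```

    The resulting 16 values could serve as a fixed-length feature vector for any classifier; whether the original study used exactly these features is not stated in the record.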

  6. The Genome sequences of four non-human/non-clinical Salmonella enterica serovar Kentucky ST198 isolates recovered between 1972 and 1973

    USDA-ARS's Scientific Manuscript database

    Salmonella Kentucky is a polyphyletic member of S. enterica subclade A1 with multiple sequence types that often colonize the same hosts but in different frequencies on different continents. To evaluate the genomic features involved in S. Kentucky host specificity we sequenced the genomes of four iso...

  7. Frequency of the first feature in action sequences influences feature binding.

    PubMed

    Mattson, Paul S; Fournier, Lisa R; Behmer, Lawrence P

    2012-10-01

    We investigated whether binding among perception and action feature codes is a preliminary step toward creating a more durable memory trace of an action event. If so, increasing the frequency of a particular event (e.g., a stimulus requiring a movement with the left or right hand in an up or down direction) should increase the strength and speed of feature binding for this event. The results from two experiments, using a partial-repetition paradigm, confirmed that feature binding increased in strength and/or occurred earlier for a high-frequency (e.g., left hand moving up) than for a low-frequency (e.g., right hand moving down) event. Moreover, increasing the frequency of the first-specified feature in the action sequence alone (e.g., "left" hand) increased the strength and/or speed of action feature binding (e.g., between the "left" hand and movement in an "up" or "down" direction). The latter finding suggests an update to the theory of event coding, as not all features in the action sequence equally determine binding strength. We conclude that action planning involves serial binding of features in the order of action feature execution (i.e., associations among features are not bidirectional but are directional), which can lead to a more durable memory trace. This is consistent with physiological evidence suggesting that serial order is preserved in an action plan executed from memory and that the first feature in the action sequence may be critical in preserving this serial order.

  8. Effective Feature Selection for Classification of Promoter Sequences.

    PubMed

    K, Kouser; P G, Lavanya; Rangarajan, Lalitha; K, Acharya Kshitish

    2016-01-01

    Exploring novel computational methods in making sense of biological data has not only been a necessity, but also productive. A part of this trend is the search for more efficient in silico methods/tools for analysis of promoters, which are parts of DNA sequences that are involved in regulation of expression of genes into other functional molecules. Promoter regions vary greatly in their function based on the sequence of nucleotides and the arrangement of protein-binding short-regions called motifs. In fact, the regulatory nature of the promoters seems to be largely driven by the selective presence and/or the arrangement of these motifs. Here, we explore computational classification of promoter sequences based on the pattern of motif distributions, as such classification can pave a new way of functional analysis of promoters and to discover the functionally crucial motifs. We make use of Position Specific Motif Matrix (PSMM) features for exploring the possibility of accurately classifying promoter sequences using some of the popular classification techniques. The classification results on the complete feature set are low, perhaps due to the huge number of features. We propose two ways of reducing features. Our test results show improvement in the classification output after the reduction of features. The results also show that decision trees outperform SVM (Support Vector Machine), KNN (K Nearest Neighbor) and ensemble classifier LibD3C, particularly with reduced features. The proposed feature selection methods outperform some of the popular feature transformation methods such as PCA and SVD. Also, the methods proposed are as accurate as MRMR (feature selection method) but much faster than MRMR. Such methods could be useful to categorize new promoters and explore regulatory mechanisms of gene expressions in complex eukaryotic species.

  9. Genome Sequence of the Hemolytic-Uremic Syndrome-Causing Strain Escherichia coli NCCP15647

    PubMed Central

    Jeong, Haeyoung; Zhao, Fumei; Igori, Davaajargal; Oh, Kyung-Hwan; Kim, Seon-Young; Kang, Sung Gyun; Kim, Byung Kwon; Kwon, Soon-Kyeong; Lee, Choong Hoon; Song, Ju Yeon; Yu, Dong Su; Park, Mi-Sun

    2012-01-01

    Enterohemorrhagic Escherichia coli (EHEC) causes a disease involving diarrhea, hemorrhagic colitis, and hemolytic-uremic syndrome (HUS). Here we present the draft genome sequence of NCCP15647, an EHEC isolate from an HUS patient. Its genome exhibits features of EHEC, such as genes for verotoxins, a type III secretion system, and prophages. PMID:22740672

  10. Dynamic Encoding of Speech Sequence Probability in Human Temporal Cortex

    PubMed Central

    Leonard, Matthew K.; Bouchard, Kristofer E.; Tang, Claire

    2015-01-01

    Sensory processing involves identification of stimulus features, but also integration with the surrounding sensory and cognitive context. Previous work in animals and humans has shown fine-scale sensitivity to context in the form of learned knowledge about the statistics of the sensory environment, including relative probabilities of discrete units in a stream of sequential auditory input. These statistics are a defining characteristic of one of the most important sequential signals humans encounter: speech. For speech, extensive exposure to a language tunes listeners to the statistics of sound sequences. To address how speech sequence statistics are neurally encoded, we used high-resolution direct cortical recordings from human lateral superior temporal cortex as subjects listened to words and nonwords with varying transition probabilities between sound segments. In addition to their sensitivity to acoustic features (including contextual features, such as coarticulation), we found that neural responses dynamically encoded the language-level probability of both preceding and upcoming speech sounds. Transition probability first negatively modulated neural responses, followed by positive modulation of neural responses, consistent with coordinated predictive and retrospective recognition processes, respectively. Furthermore, transition probability encoding was different for real English words compared with nonwords, providing evidence for online interactions with high-order linguistic knowledge. These results demonstrate that sensory processing of deeply learned stimuli involves integrating physical stimulus features with their contextual sequential structure. Despite not being consciously aware of phoneme sequence statistics, listeners use this information to process spoken input and to link low-level acoustic representations with linguistic information about word identity and meaning. PMID:25948269

  11. The Acquisition of Consonant Feature Sequences: Harmony, Metathesis, and Deletion Patterns in Phonological Development

    ERIC Educational Resources Information Center

    Gerlach, Sharon Ruth

    2010-01-01

    This dissertation examines three processes affecting consonants in child speech: harmony (long-distance assimilation) involving major place features as in "coat" [kouk]; long-distance metathesis as in "cup" [pʌk]; and initial consonant deletion as in "fish" [is]. These processes are unattested in adult phonology, leading to proposals for…

  12. Driving on the surface of Mars with the rover sequencing and visualization program

    NASA Technical Reports Server (NTRS)

    Wright, J.; Hartman, F.; Cooper, B.; Maxwell, S.; Yen, J.; Morrison, J.

    2005-01-01

    Operating a rover on Mars is not possible using teleoperations due to the distance involved and the bandwidth limitations. Operating these rovers requires sophisticated tools to make operators knowledgeable of the terrain, hazards, features of interest, and rover state and limitations, and to support building command sequences and rehearsing expected operations. This paper discusses how the Rover Sequencing and Visualization program and a small set of associated tools support this requirement.

  13. Draft genome sequences of two Streptococcus pyogenes strains involved in abnormal sharp raised scarlet fever in China, 2011.

    PubMed

    You, Yuanhai; Yang, Xianwei; Song, Yanyan; Yan, Xiaomei; Yuan, Yanting; Li, Dongfang; Yan, Yanfeng; Wang, Haibin; Tao, Xiaoxia; Li, Leilei; Jiang, Xihong; Zhou, Hao; Xiao, Di; Jin, Lianmei; Feng, Zijian; Yang, Ruifu; Luo, Fengji; Cui, Yujun; Zhang, Jianzhong

    2012-11-01

    A scarlet fever outbreak caused by Streptococcus pyogenes occurred in China in 2011. To determine the genomic features of the outbreak strains, we deciphered genomes of two strains isolated from the regions with the highest incidence rates. The sequences will provide valuable information for comprehensive study of mechanisms related to this outbreak.

  14. Unlocking hidden genomic sequence

    PubMed Central

    Keith, Jonathan M.; Cochran, Duncan A. E.; Lala, Gita H.; Adams, Peter; Bryant, Darryn; Mitchelson, Keith R.

    2004-01-01

    Despite the success of conventional Sanger sequencing, significant regions of many genomes still present major obstacles to sequencing. Here we propose a novel approach with the potential to alleviate a wide range of sequencing difficulties. The technique involves extracting target DNA sequence from variants generated by introduction of random mutations. The introduction of mutations does not destroy original sequence information, but distributes it amongst multiple variants. Some of these variants lack problematic features of the target and are more amenable to conventional sequencing. The technique has been successfully demonstrated with mutation levels up to an average 18% base substitution and has been used to read previously intractable poly(A), AT-rich and GC-rich motifs. PMID:14973330

  15. KBG syndrome involving a single-nucleotide duplication in ANKRD11

    PubMed Central

    Kleyner, Robert; Malcolmson, Janet; Tegay, David; Ward, Kenneth; Maughan, Annette; Maughan, Glenn; Nelson, Lesa; Wang, Kai; Robison, Reid; Lyon, Gholson J.

    2016-01-01

    KBG syndrome is a rare autosomal dominant genetic condition characterized by neurological involvement and distinct facial, hand, and skeletal features. More than 70 cases have been reported; however, it is likely that KBG syndrome is underdiagnosed because of lack of comprehensive characterization of the heterogeneous phenotypic features. We describe the clinical manifestations in a male currently 13 years of age, who exhibited symptoms including epilepsy, severe developmental delay, distinct facial features, and hand anomalies, without a positive genetic diagnosis. Subsequent exome sequencing identified a novel de novo heterozygous single base pair duplication (c.6015dupA) in ANKRD11, which was validated by Sanger sequencing. This single-nucleotide duplication is predicted to lead to a premature stop codon and loss of function in ANKRD11, thereby implicating it as contributing to the proband's symptoms and yielding a molecular diagnosis of KBG syndrome. Before molecular diagnosis, this syndrome was not recognized in the proband, as several key features of the disorder were mild and were not recognized by clinicians, further supporting the concept of variable expressivity in many disorders. Although a diagnosis of cerebral folate deficiency has also been given, its significance for the proband's condition remains uncertain. PMID:27900361

  16. Sequence features of viral and human Internal Ribosome Entry Sites predictive of their activity

    PubMed Central

    Elias-Kirma, Shani; Nir, Ronit; Segal, Eran

    2017-01-01

    Translation of mRNAs through Internal Ribosome Entry Sites (IRESs) has emerged as a prominent mechanism of cellular and viral initiation. It supports cap-independent translation of select cellular genes under normal conditions, and in conditions when cap-dependent translation is inhibited. IRES structure and sequence are believed to be involved in this process. However due to the small number of IRESs known, there have been no systematic investigations of the determinants of IRES activity. With the recent discovery of thousands of novel IRESs in human and viruses, the next challenge is to decipher the sequence determinants of IRES activity. We present the first in-depth computational analysis of a large body of IRESs, exploring RNA sequence features predictive of IRES activity. We identified predictive k-mer features resembling IRES trans-acting factor (ITAF) binding motifs across human and viral IRESs, and found that their effect on expression depends on their sequence, number and position. Our results also suggest that the architecture of retroviral IRESs differs from that of other viruses, presumably due to their exposure to the nuclear environment. Finally, we measured IRES activity of synthetically designed sequences to confirm our prediction of increasing activity as a function of the number of short IRES elements. PMID:28922394

  17. Draft Genome Sequences of Two Streptococcus pyogenes Strains Involved in Abnormal Sharp Raised Scarlet Fever in China, 2011

    PubMed Central

    You, Yuanhai; Yang, Xianwei; Song, Yanyan; Yan, Xiaomei; Yuan, Yanting; Li, Dongfang; Yan, Yanfeng; Wang, Haibin; Tao, Xiaoxia; Li, Leilei; Jiang, Xihong; Zhou, Hao; Xiao, Di; Jin, Lianmei; Feng, Zijian; Yang, Ruifu; Luo, Fengji

    2012-01-01

    A scarlet fever outbreak caused by Streptococcus pyogenes occurred in China in 2011. To determine the genomic features of the outbreak strains, we deciphered genomes of two strains isolated from the regions with the highest incidence rates. The sequences will provide valuable information for comprehensive study of mechanisms related to this outbreak. PMID:23045496

  18. Preattentive binding of auditory and visual stimulus features.

    PubMed

    Winkler, István; Czigler, István; Sussman, Elyse; Horváth, János; Balázs, Lászlo

    2005-02-01

    We investigated the role of attention in feature binding in the auditory and the visual modality. One auditory and one visual experiment used the mismatch negativity (MMN and vMMN, respectively) event-related potential to index the memory representations created from stimulus sequences, which were either task-relevant and, therefore, attended or task-irrelevant and ignored. In the latter case, the primary task was a continuous demanding within-modality task. The test sequences were composed of two frequently occurring stimuli, which differed from each other in two stimulus features (standard stimuli) and two infrequently occurring stimuli (deviants), which combined one feature from one standard stimulus with the other feature of the other standard stimulus. Deviant stimuli elicited MMN responses of similar parameters across the different attentional conditions. These results suggest that the memory representations involved in the MMN deviance detection response encoded the frequently occurring feature combinations whether or not the test sequences were attended. A possible alternative to the memory-based interpretation of the visual results, the elicitation of the McCollough color-contingent aftereffect, was ruled out by the results of our third experiment. The current results are compared with those supporting the attentive feature integration theory. We conclude that (1) with comparable stimulus paradigms, similar results have been obtained in the two modalities, (2) there exist preattentive processes of feature binding, however, (3) conjoining features within rich arrays of objects under time pressure and/or longterm retention of the feature-conjoined memory representations may require attentive processes.

  19. Mulibrey nanism: Two novel mutations in a child identified by Array CGH and DNA sequencing.

    PubMed

    Mozzillo, Enza; Cozzolino, Carla; Genesio, Rita; Melis, Daniela; Frisso, Giulia; Orrico, Ada; Lombardo, Barbara; Fattorusso, Valentina; Discepolo, Valentina; Della Casa, Roberto; Simonelli, Francesca; Nitsch, Lucio; Salvatore, Francesco; Franzese, Adriana

    2016-08-01

    In childhood, several rare genetic diseases have overlapping symptoms and signs, including those regarding growth alterations, thus the differential diagnosis is sometimes difficult. The proband, aged 3 years, was suspected to have Silver-Russel syndrome because of intrauterine growth retardation, postnatal growth retardation, typical facial dysmorphic features, macrocephaly, body asymmetry, and bilateral fifth finger clinodactyly. Other features were left atrial and ventricular enlargement and patent foramen ovale. Total X-ray skeleton showed hypoplasia of the twelfth rib bilaterally and of the coccyx, slender long bones with thick cortex, and narrow medullary channels. The genetic investigation did not confirm Silver-Russel syndrome. At the age of 5 the patient developed an additional sign: hepatomegaly. Array CGH revealed a 147 kb deletion (involving TRIM 37 and SKA2 genes) on one allele of chromosome 17, inherited from his mother. These results suggested Mulibrey nanism. The clinical features were found to fit this hypothesis. Sequencing of the TRIM 37 gene showed a single base change at a splicing locus, inherited from his father that provoked a truncated protein. The combined use of Array CGH and DNA sequencing confirmed diagnosis of Mulibrey nanism. The large deletion involving the SKA2 gene, along with the increased frequency of malignant tumours in mulibrey patients, suggests closed monitoring for cancer of our patient and his mother. Array CGH should be performed as first tier test in all infants with multiple anomalies. The clinician should reconsider the clinical features when the genetics suggests this. © 2016 Wiley Periodicals, Inc.

  20. Draft Genome Sequence of Plant Growth-Promoting Drought-Tolerant Bacillus sp. Strain CMAA 1363 Isolated from the Brazilian Caatinga Biome.

    PubMed

    Kavamura, Vanessa Nessner; Santos, Suikinai Nobre; Taketani, Rodrigo Gouvêa; Vasconcellos, Rafael Leandro Figueiredo; Melo, Itamar Soares

    2017-02-02

    The strain of Bacillus sp. CMAA 1363 was isolated from the Brazilian Caatinga biome and showed plant growth-promoting traits and ability to promote maize growth under drought stress. Sequencing revealed genes involved in stress response and plant growth promotion. These genomic features might aid in the protection of plants against the negative effects imposed by drought. Copyright © 2017 Kavamura et al.

  1. Draft Genome Sequence of Plant Growth-Promoting Drought-Tolerant Bacillus sp. Strain CMAA 1363 Isolated from the Brazilian Caatinga Biome

    PubMed Central

    Santos, Suikinai Nobre; Taketani, Rodrigo Gouvêa; Vasconcellos, Rafael Leandro Figueiredo; Melo, Itamar Soares

    2017-01-01

    The strain of Bacillus sp. CMAA 1363 was isolated from the Brazilian Caatinga biome and showed plant growth-promoting traits and ability to promote maize growth under drought stress. Sequencing revealed genes involved in stress response and plant growth promotion. These genomic features might aid in the protection of plants against the negative effects imposed by drought. PMID:28153893

  2. Rotation invariant features for wear particle classification

    NASA Astrophysics Data System (ADS)

    Arof, Hamzah; Deravi, Farzin

    1997-09-01

    This paper investigates the ability of a set of rotation invariant features to classify images of wear particles found in used lubricating oil of machinery. The rotation invariant attribute of the features is derived from the property of the magnitudes of Fourier transform coefficients that do not change with spatial shift of the input elements. By analyzing individual circular neighborhoods centered at every pixel in an image, local and global texture characteristics of an image can be described. A number of input sequences are formed by the intensities of pixels on concentric rings of various radii measured from the center of each neighborhood. Fourier transforming the sequences would generate coefficients whose magnitudes are invariant to rotation. Rotation invariant features extracted from these coefficients were utilized to classify wear particle images that were obtained from a number of different particles captured at different orientations. In an experiment involving images of 6 classes, the circular neighborhood features obtained a 91% recognition rate which compares favorably to a 76% rate achieved by features of a 6 by 6 co-occurrence matrix.
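    The invariance property the abstract relies on can be checked numerically: rotating the image around a pixel circularly shifts the intensities sampled on a ring, and a circular shift changes only the phases of the discrete Fourier transform coefficients, not their magnitudes. The sketch below illustrates that property alone, using a made-up ring of eight intensities; the helper names `dft_magnitudes` and `rotate` are hypothetical, and the paper's actual feature extraction is not reproduced here.

```python
import cmath

def dft_magnitudes(ring):
    """Magnitudes of the discrete Fourier transform of a 1-D sequence."""
    n = len(ring)
    mags = []
    for k in range(n):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * i / n)
            for i, x in enumerate(ring)
        )
        mags.append(abs(coeff))
    return mags

def rotate(ring, shift):
    """Circularly shift the sequence, mimicking a rotation of the ring."""
    shift %= len(ring)
    return ring[shift:] + ring[:shift]

# Illustrative pixel intensities sampled around one ring (made-up values).
ring = [10, 50, 30, 80, 20, 60, 40, 70]
original = dft_magnitudes(ring)
rotated = dft_magnitudes(rotate(ring, 3))

# The two magnitude spectra agree up to floating-point error.
max_difference = max(abs(a - b) for a, b in zip(original, rotated))
```

    A direct O(n^2) transform is used only to keep the sketch dependency-free; an FFT routine would give the same magnitudes for longer rings.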

  3. ORFer--retrieval of protein sequences and open reading frames from GenBank and storage into relational databases or text files.

    PubMed

    Büssow, Konrad; Hoffmann, Steve; Sievert, Volker

    2002-12-19

    Functional genomics involves the parallel experimentation with large sets of proteins. This requires management of large sets of open reading frames as a prerequisite of the cloning and recombinant expression of these proteins. A Java program was developed for retrieval of protein and nucleic acid sequences and annotations from NCBI GenBank, using the XML sequence format. Annotations retrieved by ORFer include sequence name, organism and also the completeness of the sequence. The program has a graphical user interface, although it can be used in a non-interactive mode. For protein sequences, the program also extracts the open reading frame sequence, if available, and checks its correct translation. ORFer accepts user input in the form of single or lists of GenBank GI identifiers or accession numbers. It can be used to extract complete sets of open reading frames and protein sequences from any kind of GenBank sequence entry, including complete genomes or chromosomes. Sequences are either stored with their features in a relational database or can be exported as text files in Fasta or tabulator delimited format. The ORFer program is freely available at http://www.proteinstrukturfabrik.de/orfer. The ORFer program allows for fast retrieval of DNA sequences, protein sequences and their open reading frames and sequence annotations from GenBank. Furthermore, storage of sequences and features in a relational database is supported. Such a database can supplement a laboratory information system (LIMS) with appropriate sequence information.

  4. Motor programming when sequencing multiple elements of the same duration.

    PubMed

    Magnuson, Curt E; Robin, Donald A; Wright, David L

    2008-11-01

    Motor programming at the self-select paradigm was adopted in 2 experiments to examine the processing demands of independent processes. One process (INT) is responsible for organizing the internal features of the individual elements in a movement (e.g., response duration). The 2nd process (SEQ) is responsible for placing the elements into the proper serial order before execution. Participants in Experiment 1 performed tasks involving 1 key press or sequences of 4 key presses of the same duration. Implementing INT and SEQ was more time consuming for key-pressing sequences than for single key-press tasks. Experiment 2 examined whether the INT costs resulting from the increase in sequence length observed in Experiment 1 resulted from independent planning of each sequence element or via a separate "multiplier" process that handled repetitions of elements of the same duration. Findings from Experiment 2, in which participants performed single key presses or double or triple key sequences of the same duration, suggested that INT is involved with the independent organization of each element contained in the sequence. Researchers offer an elaboration of the 2-process account of motor programming to incorporate the present findings and the findings from other recent sequence-learning research.

  5. The Conformational Stability and Biophysical Properties of the Eukaryotic Thioredoxins of Pisum Sativum Are Not Family-Conserved

    PubMed Central

    Aguado-Llera, David; Martínez-Gómez, Ana Isabel; Prieto, Jesús; Marenchino, Marco; Traverso, José Angel; Gómez, Javier; Chueca, Ana; Neira, José L.

    2011-01-01

    Thioredoxins (TRXs) are ubiquitous proteins involved in redox processes. About forty genes encode TRX or TRX-related proteins in plants, grouped in different families according to their subcellular localization. For instance, the h-type TRXs are located in cytoplasm or mitochondria, whereas f-type TRXs have a plastidial origin, although both types of proteins have an eukaryotic origin as opposed to other TRXs. Herein, we study the conformational and the biophysical features of TRXh1, TRXh2 and TRXf from Pisum sativum. The modelled structures of the three proteins show the well-known TRX fold. While sharing similar pH-denaturations features, the chemical and thermal stabilities are different, being PsTRXh1 (Pisum sativum thioredoxin h1) the most stable isoform; moreover, the three proteins follow a three-state denaturation model, during the chemical-denaturations. These differences in the thermal- and chemical-denaturations result from changes, in a broad sense, of the several ASAs (accessible surface areas) of the proteins. Thus, although a strong relationship can be found between the primary amino acid sequence and the structure among TRXs, that between the residue sequence and the conformational stability and biophysical properties is not. We discuss how these differences in the biophysical properties of TRXs determine their unique functions in pea, and we show how residues involved in the biophysical features described (pH-titrations, dimerizations and chemical-denaturations) belong to regions involved in interaction with other proteins. Our results suggest that the sequence demands of protein-protein function are relatively rigid, with different protein-binding pockets (some in common) for each of the three proteins, but the demands of structure and conformational stability per se (as long as there is a maintained core), are less so. PMID:21364950

  6. [Noonan syndrome can be diagnosed clinically and through molecular genetic analyses].

    PubMed

    Henningsen, Marie Krab; Jelsig, Anne Marie; Andersen, Helle; Brusgaard, Klaus; Ousager, Lilian Bomme; Hertz, Jens Michael

    2015-08-03

    Noonan syndrome is part of the group of RASopathies caused by germ line mutations in genes involved in the RAS/MAPK pathway. There is substantial phenotypic overlap among the RASopathies. Diagnosis of Noonan syndrome is often based on clinical features including dysmorphic facial features, short stature and congenital heart disease. Rapid advances in sequencing technology have made molecular genetic analyses a helpful tool in diagnosing and distinguishing Noonan syndrome from other RASopathies.

  7. Severe Craniofacial Involvement due to Amniotic Band Sequence.

    PubMed

    Becerra-Solano, Luis Eduardo; Castañeda-Cisneros, Gema; Corona-Rivera, Jorge Roman; Díaz-Rodríguez, Manuel; Figuera, Luis Eduardo; López-Muñoz, Eunice; Nastasi-Catanese, José Antonio; Toscano-Flores, José Jesús; Ramírez-Dueñas, María de Lourdes; García-Ortíz, José Elias

    2018-02-01

    Disruptive amniotic band sequence (DABS) is a sporadic, non-familial disorder with unclear etiology. Diagnosis is based on clinical features because there are currently no reliable laboratory diagnostic tests. We describe six cases of DABS with severe craniofacial deformations, three with and three without classical constrictive limb deformation. The craniofacial deformities were delimited by peripheral, sharply demarcated scarring. When a sharply demarcated linear disruptive craniofacial lesion is observed, DABS should be considered despite the absence of constrictive limb scarring.

  8. The complete genome sequencing of Prevotella intermedia strain OMA14 and a subsequent fine-scale, intra-species genomic comparison reveal an unusual amplification of conjugative and mobile transposons and identify a novel Prevotella-lineage-specific repeat

    PubMed Central

    Naito, Mariko; Ogura, Yoshitoshi; Itoh, Takehiko; Shoji, Mikio; Okamoto, Masaaki; Hayashi, Tetsuya; Nakayama, Koji

    2016-01-01

    Prevotella intermedia is a pathogenic bacterium involved in periodontal diseases. Here, we present the complete genome sequence of a clinical strain, OMA14, of this bacterium along with the results of comparative genome analysis with strain 17 of the same species, whose genome has also been sequenced but not yet fully analysed. The genomes of both strains consist of two circular chromosomes: the larger chromosomes are similar in size and exhibit a high overall linearity of gene organization, whereas the smaller chromosomes show a significant size variation and have undergone remarkable genome rearrangements. Unique features of the Pre. intermedia genomes are the presence of a remarkable number of essential genes on the second chromosomes and the abundance of conjugative and mobilizable transposons (CTns and MTns). The CTns/MTns are particularly abundant in the second chromosomes, are involved in their extensive genome rearrangement, and have introduced a number of strain-specific genes into each strain. We also found a novel 188-bp repeat sequence that has been highly amplified in Pre. intermedia and is specifically distributed among the Pre. intermedia-related species. These findings expand our understanding of the genetic features of Pre. intermedia and the roles of CTns and MTns in the evolution of bacteria. PMID:26645327

  9. Amelogenin Evolution and Tetrapod Enamel Structure

    PubMed Central

    Diekwisch, Thomas G.H.; Jin, Tianquan; Wang, Xinping; Ito, Yoshihiro; Schmidt, Marcella; Druzinsky, Robert; Yamane, Akira; Luan, Xianghong

    2009-01-01

    Amelogenins are the major proteins involved in tooth enamel formation. In the present study we have cloned and sequenced four novel amelogenins from three amphibian species in order to analyze similarities and differences between mammalian and non-mammalian amelogenins. The newly sequenced amphibian amelogenin sequences were from a Red-eyed tree frog (Litoria chloris) and a Mexican axolotl (Ambystoma mexicanum). We identified two amelogenin isoforms in the Eastern Red-backed Salamander (Plethodon cinereus). Sequence comparisons confirmed that non-mammalian amelogenins are overall shorter than their mammalian counterparts, contain less proline and less glutamine, and feature shorter polyproline tripeptide repeat stretches than mammalian amelogenins. We propose that unique sequence parameters of mammalian amelogenins might be a pre-requisite for complex mammalian enamel prism architecture. PMID:19828974

  10. PDNAsite: Identification of DNA-binding Site from Protein Sequence by Incorporating Spatial and Sequence Context

    PubMed Central

    Zhou, Jiyun; Xu, Ruifeng; He, Yulan; Lu, Qin; Wang, Hongpeng; Kong, Bing

    2016-01-01

    Protein-DNA interactions are involved in many fundamental biological processes essential for cellular function. Most of the existing computational approaches employed only the sequence context of the target residue for its prediction. In the present study, for each target residue, we applied both the spatial context and the sequence context to construct the feature space. Subsequently, Latent Semantic Analysis (LSA) was applied to remove the redundancies in the feature space. Finally, a predictor (PDNAsite) was developed through the integration of the support vector machine (SVM) classifier and ensemble learning. Results on the PDNA-62 and the PDNA-224 datasets demonstrate that features extracted from spatial context provide more information than those from sequence context, and that combining them yields a further performance gain. An analysis of the number of binding sites in the spatial context of the target site indicates that the interactions between binding sites next to each other are important for protein-DNA recognition and their binding ability. The comparison between our proposed PDNAsite method and the existing methods indicates that PDNAsite outperforms most of the existing methods and is a useful tool for DNA-binding site identification. A web server for our predictor (http://hlt.hitsz.edu.cn:8080/PDNAsite/) is freely accessible to the biological research community. PMID:27282833
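
    The "sequence context" of a target residue mentioned in this abstract can be illustrated with a fixed-width window around the residue. The following is only a minimal sketch, not the PDNAsite implementation: the window half-width, the padding character `X`, the one-hot encoding, and the function names are all assumptions made for illustration, and the spatial-context, LSA, and SVM steps are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def sequence_context(seq, center, half_width=3, pad="X"):
    """Return the residues within `half_width` positions of seq[center].

    Positions that fall off either end of the sequence are filled with
    the (hypothetical) padding character `pad`.
    """
    padded = pad * half_width + seq + pad * half_width
    # After padding, seq[center] sits at index center + half_width,
    # so the window starts at index `center`.
    return padded[center:center + 2 * half_width + 1]


def one_hot(window, alphabet=AMINO_ACIDS):
    """Flatten a window into a 0/1 vector; padding encodes as all zeros."""
    vec = []
    for residue in window:
        vec.extend(1.0 if residue == a else 0.0 for a in alphabet)
    return vec


window = sequence_context("MKTAYIAK", center=3, half_width=2)
features = one_hot(window)
print(window, len(features))
```

    In a full pipeline, one such vector per residue would be one row of the feature matrix handed to a classifier.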

  11. Multiple Repair Sequences in Everyday Conversations Involving People with Parkinson's Disease

    ERIC Educational Resources Information Center

    Griffiths, Sarah; Barnes, Rebecca; Britten, Nicky; Wilkinson, Ray

    2015-01-01

    Background: Features of dysarthria associated with Parkinson's disease (PD), such as low volume, variable rate of speech and increased pauses, impact speaker intelligibility. Those affected report restricted interactional participation, although this area is under explored. Aims: To examine naturally occurring instances of problems with…

  12. Playing Digital: Music Instruction for the Next Generation.

    ERIC Educational Resources Information Center

    Hardy, Lawrence

    2001-01-01

    Active involvement in music can yield significant intellectual and emotional benefits. A Washington-area high school features a digitally literate music teacher and a piano lab with 25 workstations allowing music-loving students to express their creativity. MIDI sequencers and synthesizers aid young composers' efforts. (MLH)

  13. Draft genome sequence of Kocuria sp. SM24M-10 isolated from coral mucus

    PubMed Central

    Palermo, Bruna Rafaella Z.; Castro, Daniel B.A.; Pereira, Letícia Bianca; Cauz, Ana Carolina G.; Magalhães, Beatriz L.; Carlos, Camila; da Costa, Fernanda L.P.; Scagion, Guilherme P.; Higa, Juliana S.; Almeida, Ludimila D.; das Neves, Meiriele da S.; Cordeiro, Melina Aparecida; do Prado, Paula F.V.; da Silva, Thiago M.; Balsalobre, Thiago Willian A.; Paulino, Luciana C.; Vicentini, Renato; Ferraz, Lúcio F.C.; Ottoboni, Laura M.M.

    2015-01-01

    Here, we describe the genomic features of the Actinobacteria Kocuria sp. SM24M-10 isolated from mucus of the Brazilian endemic coral Mussismilia hispida. The sequences are available under accession number LDNX01000000 (http://www.ncbi.nlm.nih.gov/nuccore/LDNX00000000). The genomic analysis revealed interesting information about the adaptation of bacteria to the marine environment (such as genes involved in osmotic and oxidative stress) and to the nutrient-rich environment provided by the coral mucus. PMID:26981384

  14. 40 CFR 1065.526 - Repeating void modes or test intervals.

    Code of Federal Regulations, 2011 CFR

    2011-07-01

    ... or test intervals in any circumstances that would be inconsistent with good engineering judgment. For... that include hybrid energy storage features or emission controls that involve physical or chemical... shut down, restart the engine. (2) Use good engineering judgment to restart the test sequence using the...

  15. 40 CFR 1065.526 - Repeating void modes or test intervals.

    Code of Federal Regulations, 2012 CFR

    2012-07-01

    ... or test intervals in any circumstances that would be inconsistent with good engineering judgment. For... that include hybrid energy storage features or emission controls that involve physical or chemical... shut down, restart the engine. (2) Use good engineering judgment to restart the test sequence using the...

  16. The complete genome sequencing of Prevotella intermedia strain OMA14 and a subsequent fine-scale, intra-species genomic comparison reveal an unusual amplification of conjugative and mobile transposons and identify a novel Prevotella-lineage-specific repeat.

    PubMed

    Naito, Mariko; Ogura, Yoshitoshi; Itoh, Takehiko; Shoji, Mikio; Okamoto, Masaaki; Hayashi, Tetsuya; Nakayama, Koji

    2016-02-01

    Prevotella intermedia is a pathogenic bacterium involved in periodontal diseases. Here, we present the complete genome sequence of a clinical strain, OMA14, of this bacterium along with the results of comparative genome analysis with strain 17 of the same species, whose genome has also been sequenced but not yet fully analysed. The genomes of both strains consist of two circular chromosomes: the larger chromosomes are similar in size and exhibit a high overall linearity of gene organization, whereas the smaller chromosomes show a significant size variation and have undergone remarkable genome rearrangements. Unique features of the Pre. intermedia genomes are the presence of a remarkable number of essential genes on the second chromosomes and the abundance of conjugative and mobilizable transposons (CTns and MTns). The CTns/MTns are particularly abundant in the second chromosomes, are involved in their extensive genome rearrangement, and have introduced a number of strain-specific genes into each strain. We also found a novel 188-bp repeat sequence that has been highly amplified in Pre. intermedia and is specifically distributed among the Pre. intermedia-related species. These findings expand our understanding of the genetic features of Pre. intermedia and the roles of CTns and MTns in the evolution of bacteria. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  17. Exploiting Amino Acid Composition for Predicting Protein-Protein Interactions

    PubMed Central

    Roy, Sushmita; Martinez, Diego; Platero, Harriett; Lane, Terran; Werner-Washburne, Margaret

    2009-01-01

    Background: Computational prediction of protein interactions typically uses protein domains as classifier features because they capture conserved information about interaction surfaces. However, approaches relying on domains as features cannot be applied to proteins without any domain information. In this paper, we explore the contribution of pure amino acid composition (AAC) to protein interaction prediction. This simple feature, which is based on normalized counts of single amino acids or pairs of amino acids, is applicable to proteins from any sequenced organism and can be used to compensate for the lack of domain information. Results: AAC performed on par with protein interaction prediction based on domains on three yeast protein interaction datasets. Similar behavior was obtained using different classifiers, indicating that our results are a function of features and not of classifiers. In addition to yeast datasets, AAC performed comparably on worm and fly datasets. Prediction of interactions for the entire yeast proteome identified a large number of novel interactions, the majority of which co-localized or participated in the same processes. Our high confidence interaction network included both well-studied and uncharacterized proteins. Proteins with known function were involved in actin assembly and cell budding. Uncharacterized proteins interacted with proteins involved in reproduction and cell budding, thus providing putative biological roles for the uncharacterized proteins. Conclusion: AAC is a simple yet powerful feature for predicting protein interactions, and can be used alone or in conjunction with protein domains to predict new interactions and validate existing ones. More importantly, AAC alone performs on par with existing, but more complex, features, indicating the presence of sequence-level information that is predictive of interaction but not necessarily restricted to domains. PMID:19936254
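
    The AAC feature described above ("normalized counts of single or pairs of amino acids") can be sketched as follows. This is an illustrative sketch rather than the authors' code: it assumes that "pairs" means adjacent residues (dipeptides), that non-standard residues are ignored, and that a protein pair is represented by concatenating the two proteins' vectors. The function names are hypothetical.

```python
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def aac_features(seq, k=1):
    """Normalized k-mer counts: k=1 for single residues, k=2 for adjacent pairs.

    Returns a vector of length 20**k whose entries sum to 1 (or all zeros
    when the sequence contains no k-mer made of standard residues).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[m] for m in kmers)  # k-mers with non-standard residues are skipped
    return [counts[m] / total if total else 0.0 for m in kmers]


def pair_features(seq_a, seq_b, k=1):
    """Assumed encoding of a candidate interaction: concatenated AAC vectors."""
    return aac_features(seq_a, k) + aac_features(seq_b, k)


single = aac_features("AACD")
pairs = aac_features("AACD", k=2)
print(len(single), len(pairs), len(pair_features("AACD", "MKT")))
```

    Because the vector depends only on the sequence, it can be computed for any protein, which is the property the abstract emphasizes for proteins lacking domain annotation.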

  18. Common and diverse features of cocirculating type 2 and 3 recombinant vaccine-derived polioviruses isolated from patients with poliomyelitis and healthy children.

    PubMed

    Joffret, Marie-Line; Jégouic, Sophie; Bessaud, Maël; Balanant, Jean; Tran, Coralie; Caro, Valerie; Holmblat, Barbara; Razafindratsimandresy, Richter; Reynes, Jean-Marc; Rakoto-Andrianarivelo, Mala; Delpeyroux, Francis

    2012-05-01

    Five cases of poliomyelitis due to type 2 or 3 recombinant vaccine-derived polioviruses (VDPVs) were reported in the Toliara province of Madagascar in 2005. We sequenced the genome of the VDPVs isolated from the patients and from 12 healthy children and characterized phenotypic aspects, including pathogenicity, in mice transgenic for the poliovirus receptor. We identified 6 highly complex mosaic recombinant lineages composed of sequences derived from different vaccine polioviruses and other species C human enteroviruses (HEV-Cs). Most had some recombinant genome features in common and contained nucleotide sequences closely related to certain cocirculating coxsackie A virus isolates. However, they differed in terms of their recombinant characteristics or nucleotide substitutions and phenotypic features. All VDPVs were neurovirulent in mice. This study confirms the genetic relationship between type 2 and 3 VDPVs, indicating that both types can be involved in a single outbreak of disease. Our results highlight the various ways in which a vaccine-derived poliovirus may become pathogenic in complex viral ecosystems, through frequent recombination events and mutations. Intertypic recombination between cocirculating HEV-Cs (including polioviruses) appears to be a common mechanism of genetic plasticity underlying transverse genetic variability.

  19. A Multistep Synthesis Featuring Classic Carbonyl Chemistry for the Advanced Organic Chemistry Laboratory

    ERIC Educational Resources Information Center

    Duff, David B.; Abbe, Tyler G.; Goess, Brian C.

    2012-01-01

    A multistep synthesis of 5-isopropyl-1,3-cyclohexanedione is carried out from three commodity chemicals. The sequence involves an aldol condensation, Dieckmann-type annulation, ester hydrolysis, and decarboxylation. No purification is required until after the final step, at which point gravity column chromatography provides the desired product in…

  20. Protein structure based prediction of catalytic residues.

    PubMed

    Fajardo, J Eduardo; Fiser, Andras

    2013-02-22

    Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues relative to the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. We found that several of the graph-based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile, sequence conservation remains by far the most influential feature in identifying functional residues. We also found that, due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases.
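
    The sensitivity, precision, and enrichment figures quoted in this abstract follow from the four counts it reports (9262 residues, 111 annotated catalytic residues, 411 returned, 70 of them catalytic). The short sketch below reproduces that arithmetic; the function and variable names are chosen here for illustration, and enrichment is assumed to mean precision divided by the background rate of catalytic residues.

```python
def enrichment_stats(total, positives, selected, hits):
    """Return (precision, sensitivity, fold enrichment) for a selected subset.

    total      -- all residues considered
    positives  -- annotated catalytic residues among them
    selected   -- residues returned by the method
    hits       -- returned residues that are annotated as catalytic
    """
    precision = hits / selected
    sensitivity = hits / positives
    baseline = positives / total  # background rate of catalytic residues
    return precision, sensitivity, precision / baseline


precision, sensitivity, fold = enrichment_stats(
    total=9262, positives=111, selected=411, hits=70
)
print(f"precision={precision:.2f} sensitivity={sensitivity:.2f} enrichment={fold:.1f}x")
# prints: precision=0.17 sensitivity=0.63 enrichment=14.2x
```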

  1. Moving object detection and tracking in videos through turbulent medium

    NASA Astrophysics Data System (ADS)

    Halder, Kalyan Kumar; Tahtali, Murat; Anavatti, Sreenatha G.

    2016-06-01

    This paper addresses the problem of identifying and tracking moving objects in a video sequence having a time-varying background. This is a fundamental task in many computer vision applications, though a very challenging one because of turbulence that causes blurring and spatiotemporal movements of the background images. Our proposed approach involves two major steps. In the first step, a moving object detection algorithm detects real motions by separating them from turbulence-induced motions using a two-level thresholding technique. In the second step, a feature-based generalized regression neural network is applied to track the detected objects throughout the frames in the video sequence. The proposed approach uses the centroid and area features of the moving objects and creates the reference regions instantly by selecting the objects within a circle. Simulation experiments are carried out on several turbulence-degraded video sequences, and comparisons with an earlier method confirm that the proposed approach provides more effective tracking of the targets.
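
    The centroid and area features named in this abstract can be computed from a binary detection mask as sketched below. This is a hedged illustration only: the mask representation (a list of rows of 0/1 values), the function names, and the reading of "selecting the objects within a circle" as a distance test on centroids are assumptions, and neither the two-level thresholding nor the regression network is shown.

```python
def centroid_and_area(mask):
    """Return ((row, col) centroid, area in pixels) of the nonzero pixels.

    `mask` is a list of rows holding 0/1 values for a single detected object.
    An empty mask yields (None, 0).
    """
    points = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    area = len(points)
    if area == 0:
        return None, 0
    mean_row = sum(r for r, _ in points) / area
    mean_col = sum(c for _, c in points) / area
    return (mean_row, mean_col), area


def within_circle(centroid, center, radius):
    """Assumed selection rule: keep an object whose centroid lies in the circle."""
    dr = centroid[0] - center[0]
    dc = centroid[1] - center[1]
    return dr * dr + dc * dc <= radius * radius


mask = [
    [0, 1, 1],
    [0, 1, 1],
    [0, 0, 0],
]
centroid, area = centroid_and_area(mask)
print(centroid, area, within_circle(centroid, center=(0, 0), radius=2))
```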

  2. RStrucFam: a web server to associate structure and cognate RNA for RNA-binding proteins from sequence information.

    PubMed

    Ghosh, Pritha; Mathew, Oommen K; Sowdhamini, Ramanathan

    2016-10-07

    RNA-binding proteins (RBPs) interact with their cognate RNA(s) to form large biomolecular assemblies. They are versatile in their functionality and are involved in a myriad of processes inside the cell. RBPs with similar structural features and common biological functions are grouped together into families and superfamilies. It will be useful to obtain an early understanding and association of the RNA-binding property of sequences of gene products. Here, we report a web server, RStrucFam, to predict the structure, type of cognate RNA(s) and function(s) of proteins, where possible, from mere sequence information. The web server employs Hidden Markov Model scan (hmmscan) to enable association with a back-end database of structural and sequence families. The database (HMMRBP) comprises 437 HMMs of RBP families of known structure, generated using structure-based sequence alignments, and 746 sequence-centric RBP family HMMs. The input protein sequence is associated with structural or sequence domain families if structure or sequence signatures exist. When the protein is associated with a family of known structures, output features such as a multiple structure-based sequence alignment (MSSA) of the query with all other members of that family are provided. Further, cognate RNA partner(s) for that protein, Gene Ontology (GO) annotations, if any, and a homology model of the protein can be obtained. Users can also browse the database for details pertaining to each family, protein or RNA and their related information based on keyword search or RNA motif search. RStrucFam is a web server that exploits structurally conserved features of RBPs, derived from known family members and imprinted in mathematical profiles, to predict putative RBPs from sequence information. Proteins that fail to associate with such structure-centric families are further queried against the sequence-centric RBP family HMMs in the HMMRBP database. Further, all other essential information pertaining to an RBP, such as overall function annotations, is provided. The web server can be accessed at the following link: http://caps.ncbs.res.in/rstrucfam .

  3. Repetitive sequences and epigenetic modification: inseparable partners play important roles in the evolution of plant sex chromosomes.

    PubMed

    Li, Shu-Fen; Zhang, Guo-Jun; Yuan, Jin-Hong; Deng, Chuan-Liang; Gao, Wu-Jun

    2016-05-01

    The present review discusses the roles that repetitive sequences play in plant sex chromosome evolution, and highlights epigenetic modification as a potential mechanism by which repetitive sequences are involved in sex chromosome evolution. Sex determination in plants is mostly based on sex chromosomes. Classic theory proposes that sex chromosomes evolve from a specific pair of autosomes with the emergence of a sex-determining gene(s). Subsequently, the newly formed sex chromosomes stop recombining in a small region around the sex-determining locus, and over time, the non-recombining region expands to almost all parts of the sex chromosomes. Accumulation of repetitive sequences, mostly transposable elements and tandem repeats, is a conspicuous feature of the non-recombining region of the Y chromosome, even in primitive ones. Repetitive sequences may play multiple roles in sex chromosome evolution, such as triggering heterochromatization and causing recombination suppression, leading to structural and morphological differentiation of sex chromosomes, and promoting Y chromosome degeneration and X chromosome dosage compensation. In this article, we review the current status of this field, and based on preliminary evidence, we posit that repetitive sequences are involved in sex chromosome evolution probably via epigenetic modification, such as DNA and histone methylation, with small interfering RNAs as the mediator.

  4. Early-Onset LMNA-Associated Muscular Dystrophy with Later Involvement of Contracture.

    PubMed

    Lee, Younggun; Lee, Jung Hwan; Park, Hyung Jun; Choi, Young Chul

    2017-10-01

    The early diagnosis of LMNA-associated muscular dystrophy is important for preventing sudden arrest related to cardiac conduction block. However, diagnosing early-onset Emery-Dreifuss muscular dystrophy (EDMD) with later involvement of contracture and limb-girdle muscular dystrophy type 1B is often delayed due to heterogeneous clinical presentations. We aimed to determine the clinical features that contribute to a delayed diagnosis. We reviewed four patients who were recently diagnosed with LMNA-associated muscular dystrophy by targeted exome sequencing and who were initially diagnosed with nonspecific or other types of muscular dystrophy. Certain clinical features such as delayed contracture involvement and calf hypertrophy were found to contribute to a delayed diagnosis. Muscle biopsies were not informative for the diagnosis in these patients. Genetic testing of single or multiple genes is useful for confirming a diagnosis of LMNA-associated muscular dystrophy. Even EDMD patients could experience the later involvement of contracture, so clinicians should consider early genetic testing for patients with undiagnosed muscular dystrophy or laminopathy. Copyright © 2017 Korean Neurological Association

  5. MACSIMS : multiple alignment of complete sequences information management system

    PubMed Central

    Thompson, Julie D; Muller, Arnaud; Waterhouse, Andrew; Procter, Jim; Barton, Geoffrey J; Plewniak, Frédéric; Poch, Olivier

    2006-01-01

    Background: In the post-genomic era, systems-level studies are being performed that seek to explain complex biological systems by integrating diverse resources from fields such as genomics, proteomics or transcriptomics. New information management systems are now needed for the collection, validation and analysis of the vast amount of heterogeneous data available. Multiple alignments of complete sequences provide an ideal environment for the integration of this information in the context of the protein family. Results: MACSIMS is a multiple alignment-based information management program that combines the advantages of both knowledge-based and ab initio sequence analysis methods. Structural and functional information is retrieved automatically from the public databases. In the multiple alignment, homologous regions are identified and the retrieved data is evaluated and propagated from known to unknown sequences within these reliable regions. In a large-scale evaluation, the specificity of the propagated sequence features is estimated to be >99%, i.e. very few false positive predictions are made. MACSIMS is then used to characterise mutations in a test set of 100 proteins that are known to be involved in human genetic diseases. The number of sequence features associated with these proteins was increased by 60%, compared to the features available in the public databases. An XML format output file allows automatic parsing of the MACSIMS results, while a graphical display using the JalView program allows manual analysis. Conclusion: MACSIMS is a new information management system that incorporates detailed analyses of protein families at the structural, functional and evolutionary levels. MACSIMS thus provides a unique environment that facilitates knowledge extraction and the presentation of the most pertinent information to the biologist. A web server and the source code are available at . PMID:16792820

  6. The kinetoplast DNA of the Australian trypanosome, Trypanosoma copemani, shares features with Trypanosoma cruzi and Trypanosoma lewisi.

    PubMed

    Botero, Adriana; Kapeller, Irit; Cooper, Crystal; Clode, Peta L; Shlomai, Joseph; Thompson, R C Andrew

    2018-05-17

    Kinetoplast DNA (kDNA) is the mitochondrial genome of trypanosomatids. It consists of a few dozen maxicircles and several thousand minicircles, all catenated topologically to form a two-dimensional DNA network. Minicircles are heterogeneous in size and sequence among species. They present one or several conserved regions that contain three highly conserved sequence blocks. CSB-1 (10 bp sequence) and CSB-2 (8 bp sequence) present lower interspecies homology, while CSB-3 (12 bp sequence), or the Universal Minicircle Sequence, is conserved within most trypanosomatids. The Universal Minicircle Sequence is located at the replication origin of the minicircles, and is the binding site for the UMS binding protein, a protein involved in trypanosomatid survival and virulence. Here, we describe the structure and organisation of the kDNA of Trypanosoma copemani, a parasite that has been shown to infect mammalian cells and has been associated with the drastic decline of the endangered Australian marsupial, the woylie (Bettongia penicillata). Deep genomic sequencing showed that T. copemani presents two classes of minicircles that share sequence identity and organisation in the conserved sequence blocks with those of Trypanosoma cruzi and Trypanosoma lewisi. A 19,257 bp partial region of the maxicircle of T. copemani that contained the entire coding region was obtained. Comparative analysis of the entire T. copemani maxicircle coding region with the coding regions of T. cruzi and T. lewisi showed that they share 71.05% and 71.28% identity, respectively. The shared features in the maxicircle/minicircle organisation and sequence between T. copemani and T. cruzi/T. lewisi suggest similarities in their process of kDNA replication, and are of significance in understanding the evolution of Australian trypanosomes. Copyright © 2018 The Authors. Published by Elsevier Ltd. All rights reserved.
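
    Percent identity figures such as those quoted above are computed from a pairwise alignment. The sketch below shows one common convention, offered as an assumption since the abstract does not state which definition was used: identical columns divided by the number of columns in which neither sequence has a gap. The function name and the toy sequences are illustrative only.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of gap-free alignment columns in which the two residues match.

    Both inputs must be rows of the same pairwise alignment (equal length).
    Other conventions divide by the full alignment length or by the shorter
    sequence length and give different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


identity = percent_identity("ACGT-A", "ACCTGA")
print(f"{identity:.1f}% identity")
```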

  7. Analysis and Visualization of ChIP-Seq and RNA-Seq Sequence Alignments Using ngs.plot.

    PubMed

    Loh, Yong-Hwee Eddie; Shen, Li

    2016-01-01

    The continual maturation and increasing applications of next-generation sequencing technology in scientific research have yielded ever-increasing amounts of data that need to be effectively and efficiently analyzed and innovatively mined for new biological insights. We have developed ngs.plot, a quick and easy-to-use bioinformatics tool that visualizes the spatial relationships between sequencing alignment enrichment and specific genomic features or regions. More importantly, ngs.plot is customizable beyond the use of standard genomic feature databases to allow the analysis and visualization of user-specified regions of interest generated by the user's own hypotheses. In this protocol, we demonstrate and explain the use of ngs.plot using command line executions, as well as a web-based workflow on the Galaxy framework. We replicate the underlying commands used in the analysis of a true biological dataset that we had reported and published earlier and demonstrate how ngs.plot can easily generate publication-ready figures. With ngs.plot, users can efficiently and innovatively mine their own datasets without having to be involved in the technical aspects of sequence coverage calculations and genomic databases.

  8. Metabolism and Genetics of Helicobacter pylori: the Genome Era

    PubMed Central

    Marais, Armelle; Mendz, George L.; Hazell, Stuart L.; Mégraud, Francis

    1999-01-01

    The publication of the complete sequence of Helicobacter pylori 26695 in 1997 and more recently that of strain J99 has provided new insight into the biology of this organism. In this review, we attempt to analyze and interpret the information provided by sequence annotations and to compare these data with those provided by experimental analyses. After a brief description of the general features of the genomes of the two sequenced strains, the principal metabolic pathways are analyzed. In particular, the enzymes encoded by H. pylori involved in fermentative and oxidative metabolism, lipopolysaccharide biosynthesis, nucleotide biosynthesis, aerobic and anaerobic respiration, and iron and nitrogen assimilation are described, and the areas of controversy between the experimental data and those provided by the sequence annotation are discussed. The role of urease, particularly in pH homeostasis, and other specialized mechanisms developed by the bacterium to maintain its internal pH are also considered. The replicational, transcriptional, and translational apparatuses are reviewed, as is the regulatory network. The numerous findings on the metabolism of the bacteria and the paucity of gene expression regulation systems are indicative of the high level of adaptation to the human gastric environment. Arguments in favor of the diversity of H. pylori and molecular data reflecting possible mechanisms involved in this diversity are presented. Finally, we compare the numerous experimental data on the colonization factors and those provided from the genome sequence annotation, in particular for genes involved in motility and adherence of the bacterium to the gastric tissue. PMID:10477311

  9. Identification of upstream and intragenic regulatory elements that confer cell-type-restricted and differentiation-specific expression on the muscle creatine kinase gene

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sternberg, E.A.; Spizz, G.; Perry, W.M.

    1988-07-01

    Terminal differentiation of skeletal myoblasts is accompanied by induction of a series of tissue-specific gene products, which includes the muscle isoenzyme of creatine kinase (MCK). To begin to define the sequences and signals involved in MCK regulation in developing muscle cells, the mouse MCK gene has been isolated. Sequence analysis of 4,147 bases of DNA surrounding the transcription initiation site revealed several interesting structural features, some of which are common to other muscle-specific genes and to cellular and viral enhancers.

  10. Statistical theory of combinatorial libraries of folding proteins: energetic discrimination of a target structure.

    PubMed

    Zou, J; Saven, J G

    2000-02-11

    A self-consistent theory is presented that can be used to estimate the number and composition of sequences satisfying a predetermined set of constraints. The theory is formulated so as to examine the features of sequences having a particular value of $\Delta = E_f - \langle E \rangle_u$, where $E_f$ is the energy of sequences when in a target structure and $\langle E \rangle_u$ is an average energy of non-target structures. The theory yields the probabilities $w_i(\alpha)$ that each position $i$ in the sequence is occupied by a particular monomer type $\alpha$. The theory is applied to a simple lattice model of proteins. Excellent agreement is observed between the theory and the results of exact enumerations. The theory provides a quantitative framework for the design and interpretation of combinatorial experiments involving proteins, where a library of amino acid sequences is searched for sequences that fold to a desired structure. Copyright 2000 Academic Press.

  11. High quality draft genome sequence of Olivibacter sitiensis type strain (AW-6T), a diphenol degrader with genes involved in the catechol pathway

    PubMed Central

    Ntougias, Spyridon; Lapidus, Alla; Han, James; Mavromatis, Konstantinos; Pati, Amrita; Chen, Amy; Klenk, Hans-Peter; Woyke, Tanja; Fasseas, Constantinos; Kyrpides, Nikos C.; Zervakis, Georgios I.

    2014-01-01

    Olivibacter sitiensis Ntougias et al. 2007 is a member of the family Sphingobacteriaceae, phylum Bacteroidetes. Members of the genus Olivibacter are phylogenetically diverse and of significant interest. They occur in diverse habitats, such as rhizosphere and contaminated soils, viscous wastes, composts, biofilter clean-up facilities on contaminated sites and cave environments, and they are involved in the degradation of complex and toxic compounds. Here we describe the features of O. sitiensis AW-6T, together with the permanent-draft genome sequence and annotation. The organism was sequenced under the Genomic Encyclopedia for Bacteria and Archaea (GEBA) project at the DOE Joint Genome Institute and is the first genome sequence of a species within the genus Olivibacter. The genome is 5,053,571 bp long and is comprised of 110 scaffolds with an average GC content of 44.61%. Of the 4,565 genes predicted, 4,501 were protein-coding genes and 64 were RNA genes. Most protein-coding genes (68.52%) were assigned to a putative function. The identification of 2-keto-4-pentenoate hydratase/2-oxohepta-3-ene-1,7-dioic acid hydratase-coding genes indicates involvement of this organism in the catechol catabolic pathway. In addition, genes encoding for β-1,4-xylanases and β-1,4-xylosidases reveal the xylanolytic action of O. sitiensis. PMID:25197463

  12. Genome Analysis of Streptococcus pyogenes Associated with Pharyngitis and Skin Infections

    PubMed Central

    Ibrahim, Joe; Eisen, Jonathan A.; Jospin, Guillaume; Coil, David A.; Khazen, Georges

    2016-01-01

    Streptococcus pyogenes is a very important human pathogen, commonly associated with skin or throat infections but can also cause life-threatening situations including sepsis, streptococcal toxic shock syndrome, and necrotizing fasciitis. Various studies involving typing and molecular characterization of S. pyogenes have been published to date; however next-generation sequencing (NGS) studies provide a comprehensive collection of an organism’s genetic variation. In this study, the genomes of nine S. pyogenes isolates associated with pharyngitis and skin infection were sequenced and studied for the presence of virulence genes, resistance elements, prophages, genomic recombination, and other genomic features. Additionally, a comparative phylogenetic analysis of the isolates with global clones highlighted their possible evolutionary lineage and their site of infection. The genomes were found to also house a multitude of features including gene regulation systems, virulence factors and antimicrobial resistance mechanisms. PMID:27977735

  13. The genome sequence of the model ascomycete fungus Podospora anserina.

    PubMed

    Espagne, Eric; Lespinet, Olivier; Malagnac, Fabienne; Da Silva, Corinne; Jaillon, Olivier; Porcel, Betina M; Couloux, Arnaud; Aury, Jean-Marc; Ségurens, Béatrice; Poulain, Julie; Anthouard, Véronique; Grossetete, Sandrine; Khalili, Hamid; Coppin, Evelyne; Déquard-Chablat, Michelle; Picard, Marguerite; Contamine, Véronique; Arnaise, Sylvie; Bourdais, Anne; Berteaux-Lecellier, Véronique; Gautheret, Daniel; de Vries, Ronald P; Battaglia, Evy; Coutinho, Pedro M; Danchin, Etienne Gj; Henrissat, Bernard; Khoury, Riyad El; Sainsard-Chanet, Annie; Boivin, Antoine; Pinan-Lucarré, Bérangère; Sellem, Carole H; Debuchy, Robert; Wincker, Patrick; Weissenbach, Jean; Silar, Philippe

    2008-01-01

    The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. We present a 10X draft sequence of P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with the one of its close relatives, Neurospora crassa, shows that synteny is poorly conserved, the main result of evolution being gene shuffling in the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved new genes by duplication since its separation from N. crassa, despite the presence of the repeat induced point mutation mechanism that mutates duplicated sequences. We also provide evidence that frequent gene loss took place in the lineages leading to P. anserina and N. crassa. P. anserina contains a large and highly specialized set of genes involved in utilization of natural carbon sources commonly found in its natural biotope. It includes genes potentially involved in lignin degradation and efficient cellulose breakdown. The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope.

  14. Formation of a functional maize centromere after loss of centromeric sequences and gain of ectopic sequences.

    PubMed

    Zhang, Bing; Lv, Zhenling; Pang, Junling; Liu, Yalin; Guo, Xiang; Fu, Shulan; Li, Jun; Dong, Qianhua; Wu, Hua-Jun; Gao, Zhi; Wang, Xiu-Jie; Han, Fangpu

    2013-06-01

    The maize (Zea mays) B centromere is composed of B centromere-specific repeats (ZmBs), centromere-specific satellite repeats (CentC), and centromeric retrotransposons of maize (CRM). Here we describe a newly formed B centromere in maize, which has lost CentC sequences and has dramatically reduced CRM and ZmBs sequences, but still retains the molecular features of functional centromeres, such as CENH3, H2A phosphorylation at Thr-133, H3 phosphorylation at Ser-10, and Thr-3 immunostaining signals. This new centromere is stable and can be transmitted to offspring through meiosis. Anti-CENH3 chromatin immunoprecipitation sequencing revealed that a 723-kb region from the short arm of chromosome 9 (9S) was involved in the formation of the new centromere. The 723-kb region, which is gene poor and enriched for transposons, contains two abundant DNA motifs. Genes in the new centromere region are still transcribed. The original 723-kb region showed a higher DNA methylation level compared with native centromeres but was not significantly changed when it was involved in new centromere formation. Our results indicate that functional centromeres may be formed without the known centromere-specific sequences, yet the maintenance of a high DNA methylation level seems to be crucial for the proper function of a new centromere.

  15. Formation of a Functional Maize Centromere after Loss of Centromeric Sequences and Gain of Ectopic Sequences

    PubMed Central

    Zhang, Bing; Lv, Zhenling; Pang, Junling; Liu, Yalin; Guo, Xiang; Fu, Shulan; Li, Jun; Dong, Qianhua; Wu, Hua-Jun; Gao, Zhi; Wang, Xiu-Jie; Han, Fangpu

    2013-01-01

    The maize (Zea mays) B centromere is composed of B centromere–specific repeats (ZmBs), centromere-specific satellite repeats (CentC), and centromeric retrotransposons of maize (CRM). Here we describe a newly formed B centromere in maize, which has lost CentC sequences and has dramatically reduced CRM and ZmBs sequences, but still retains the molecular features of functional centromeres, such as CENH3, H2A phosphorylation at Thr-133, H3 phosphorylation at Ser-10, and Thr-3 immunostaining signals. This new centromere is stable and can be transmitted to offspring through meiosis. Anti-CENH3 chromatin immunoprecipitation sequencing revealed that a 723-kb region from the short arm of chromosome 9 (9S) was involved in the formation of the new centromere. The 723-kb region, which is gene poor and enriched for transposons, contains two abundant DNA motifs. Genes in the new centromere region are still transcribed. The original 723-kb region showed a higher DNA methylation level compared with native centromeres but was not significantly changed when it was involved in new centromere formation. Our results indicate that functional centromeres may be formed without the known centromere-specific sequences, yet the maintenance of a high DNA methylation level seems to be crucial for the proper function of a new centromere. PMID:23771890

  16. Properties of genes essential for mouse development

    PubMed Central

    Kabir, Mitra; Barradas, Ana; Tzotzos, George T.; Hentges, Kathryn E.

    2017-01-01

    Essential genes are those that are critical for life. In the specific case of the mouse, they are the set of genes whose deletion means that a mouse is unable to survive after birth. As such, they are the key minimal set of genes needed for all the steps of development to produce an organism capable of life ex utero. We explored a wide range of sequence and functional features to characterise essential (lethal) and non-essential (viable) genes in mice. Experimental data curated manually identified 1301 essential genes and 3451 viable genes. Very many sequence features show highly significant differences between essential and viable mouse genes. Essential genes generally encode complex proteins, with multiple domains and many introns. These genes tend to be: long, highly expressed, old and evolutionarily conserved. These genes tend to encode ligases, transferases, phosphorylated proteins, intracellular proteins, nuclear proteins, and hubs in protein-protein interaction networks. They are involved with regulating protein-protein interactions, gene expression and metabolic processes, cell morphogenesis, cell division, cell proliferation, DNA replication, cell differentiation, DNA repair and transcription, cell differentiation and embryonic development. Viable genes tend to encode: membrane proteins or secreted proteins, and are associated with functions such as cellular communication, apoptosis, behaviour and immune response, as well as housekeeping and tissue specific functions. Viable genes are linked to transport, ion channels, signal transduction, calcium binding and lipid binding, consistent with their location in membranes and involvement with cell-cell communication. From the analysis of the composite features of essential and viable genes, we conclude that essential genes tend to be required for intracellular functions, and viable genes tend to be involved with extracellular functions and cell-cell communication. Knowledge of the features that are over-represented in essential genes allows for a deeper understanding of the functions and processes implemented during mammalian development. PMID:28562614

  17. Protein structure based prediction of catalytic residues

    PubMed Central

    2013-01-01

    Background: Worldwide structural genomics projects continue to release new protein structures at an unprecedented pace, so far nearly 6000, but only about 60% of these proteins have any sort of functional annotation. Results: We explored a range of features that can be used for the prediction of functional residues given a known three-dimensional structure. These features include various centrality measures of nodes in graphs of interacting residues: closeness, betweenness and page-rank centrality. We also analyzed the distance of functional amino acids to the general center of mass (GCM) of the structure, relative solvent accessibility (RSA), and the use of relative entropy as a measure of sequence conservation. From the selected features, neural networks were trained to identify catalytic residues. We found that using distance to the GCM together with amino acid type provides a good discriminant function, when combined independently with sequence conservation. Using an independent test set of 29 annotated protein structures, the method returned 411 of the initial 9262 residues as the most likely to be involved in function. The output 411 residues contain 70 of the annotated 111 catalytic residues. This represents an approximately 14-fold enrichment of catalytic residues on the entire input set (corresponding to a sensitivity of 63% and a precision of 17%), a performance competitive with that of other state-of-the-art methods. Conclusions: We found that several of the graph based measures utilize the same underlying feature of protein structures, which can be simply and more effectively captured with the distance to GCM definition. This also has the added advantage of simplicity and easy implementation. Meanwhile sequence conservation remains by far the most influential feature in identifying functional residues. We also found that due to the rapid changes in size and composition of sequence databases, conservation calculations must be recalibrated for specific reference databases. PMID:23433045
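
    The figures quoted in the abstract above (63% sensitivity, 17% precision, roughly 14-fold enrichment) can be rechecked from the four counts it reports. The following is a minimal sketch of that arithmetic only; the function name and argument names are illustrative assumptions and are not taken from the paper or its software.

```python
def enrichment_metrics(total, positives, selected, true_positives):
    """Return (sensitivity, precision, fold_enrichment) for a selected subset.

    total          -- all residues in the test set
    positives      -- residues annotated as catalytic
    selected       -- residues returned by the predictor
    true_positives -- returned residues that are annotated as catalytic
    """
    sensitivity = true_positives / positives   # share of catalytic residues recovered
    precision = true_positives / selected      # share of returned residues that are catalytic
    baseline = positives / total               # catalytic fraction of the whole input set
    return sensitivity, precision, precision / baseline


# Counts as reported in the abstract: 9262 residues, 111 catalytic,
# 411 returned, 70 of those catalytic.
sens, prec, fold = enrichment_metrics(
    total=9262, positives=111, selected=411, true_positives=70
)
print(f"sensitivity={sens:.0%} precision={prec:.0%} enrichment={fold:.1f}x")
```

    Enrichment is computed here as precision divided by the background rate of catalytic residues (111 of 9262, about 1.2%), which appears to be the definition implied by the phrase "enrichment of catalytic residues on the entire input set".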

  18. Analysis of the antigen recognition sites of anti-methamphetamine monoclonal antibodies (II): unique feature of MA-3 antibody.

    PubMed

    Ishimaru, M; Morikawa, K; Hifumi, E; Itoh, T; Uda, T

    2000-01-01

    A monoclonal antibody against methamphetamine (MA-3 mAb) was found to be strongly bound to ephedrine. This feature was quite different from that of other fourteen mAbs against MA. Analyses of cDNA sequence and steric conformation by molecular modeling revealed that one hydrophilic pocket was generated in the heavy chain of MA-3 mAb involving CDRH-1 and CDRH-2. Asn33, Asn35, Asn50 and Asp52 were the main components of the unique pocket capable of binding to the hydroxyl group of ephedrine.

  19. History dependence in insect flight decisions during odor tracking.

    PubMed

    Pang, Rich; van Breugel, Floris; Dickinson, Michael; Riffell, Jeffrey A; Fairhall, Adrienne

    2018-02-01

    Natural decision-making often involves extended decision sequences in response to variable stimuli with complex structure. As an example, many animals follow odor plumes to locate food sources or mates, but turbulence breaks up the advected odor signal into intermittent filaments and puffs. This scenario provides an opportunity to ask how animals use sparse, instantaneous, and stochastic signal encounters to generate goal-oriented behavioral sequences. Here we examined the trajectories of flying fruit flies (Drosophila melanogaster) and mosquitoes (Aedes aegypti) navigating in controlled plumes of attractive odorants. While it is known that mean odor-triggered flight responses are dominated by upwind turns, individual responses are highly variable. We asked whether deviations from mean responses depended on specific features of odor encounters, and found that odor-triggered turns were slightly but significantly modulated by two features of odor encounters. First, encounters with higher concentrations triggered stronger upwind turns. Second, encounters occurring later in a sequence triggered weaker upwind turns. To contextualize the latter history dependence theoretically, we examined trajectories simulated from three normative tracking strategies. We found that neither a purely reactive strategy nor a strategy in which the tracker learned the plume centerline over time captured the observed history dependence. In contrast, "infotaxis", in which flight decisions maximized expected information gain about source location, exhibited a history dependence aligned in sign with the data, though much larger in magnitude. These findings suggest that while true plume tracking is dominated by a reactive odor response it might also involve a history-dependent modulation of responses consistent with the accumulation of information about a source over multi-encounter timescales. This suggests that short-term memory processes modulating decision sequences may play a role in natural plume tracking.

  20. History dependence in insect flight decisions during odor tracking

    PubMed Central

    van Breugel, Floris; Dickinson, Michael; Riffell, Jeffrey A.; Fairhall, Adrienne

    2018-01-01

    Natural decision-making often involves extended decision sequences in response to variable stimuli with complex structure. As an example, many animals follow odor plumes to locate food sources or mates, but turbulence breaks up the advected odor signal into intermittent filaments and puffs. This scenario provides an opportunity to ask how animals use sparse, instantaneous, and stochastic signal encounters to generate goal-oriented behavioral sequences. Here we examined the trajectories of flying fruit flies (Drosophila melanogaster) and mosquitoes (Aedes aegypti) navigating in controlled plumes of attractive odorants. While it is known that mean odor-triggered flight responses are dominated by upwind turns, individual responses are highly variable. We asked whether deviations from mean responses depended on specific features of odor encounters, and found that odor-triggered turns were slightly but significantly modulated by two features of odor encounters. First, encounters with higher concentrations triggered stronger upwind turns. Second, encounters occurring later in a sequence triggered weaker upwind turns. To contextualize the latter history dependence theoretically, we examined trajectories simulated from three normative tracking strategies. We found that neither a purely reactive strategy nor a strategy in which the tracker learned the plume centerline over time captured the observed history dependence. In contrast, “infotaxis”, in which flight decisions maximized expected information gain about source location, exhibited a history dependence aligned in sign with the data, though much larger in magnitude. These findings suggest that while true plume tracking is dominated by a reactive odor response it might also involve a history-dependent modulation of responses consistent with the accumulation of information about a source over multi-encounter timescales. This suggests that short-term memory processes modulating decision sequences may play a role in natural plume tracking. PMID:29432454

  1. Genome Features of “Dark-Fly”, a Drosophila Line Reared Long-Term in a Dark Environment

    PubMed Central

    Zhou, Jun; Sugiyama, Yuzo; Nishimura, Osamu; Aizu, Tomoyuki; Toyoda, Atsushi; Fujiyama, Asao; Agata, Kiyokazu

    2012-01-01

    Organisms are remarkably adapted to diverse environments by specialized metabolisms, morphology, or behaviors. To address the molecular mechanisms underlying environmental adaptation, we have utilized a Drosophila melanogaster line, termed “Dark-fly”, which has been maintained in constant dark conditions for 57 years (1400 generations). We found that Dark-fly exhibited higher fecundity in dark than in light conditions, indicating that Dark-fly possesses some traits advantageous in darkness. Using next-generation sequencing technology, we determined the whole genome sequence of Dark-fly and identified approximately 220,000 single nucleotide polymorphisms (SNPs) and 4,700 insertions or deletions (InDels) in the Dark-fly genome compared to the genome of the Oregon-R-S strain, a control strain. 1.8% of SNPs were classified as non-synonymous SNPs (nsSNPs: i.e., they alter the amino acid sequence of gene products). Among them, we detected 28 nonsense mutations (i.e., they produce a stop codon in the protein sequence) in the Dark-fly genome. These included genes encoding an olfactory receptor and a light receptor. We also searched runs of homozygosity (ROH) regions as putative regions selected during the population history, and found 21 ROH regions in the Dark-fly genome. We identified 241 genes carrying nsSNPs or InDels in the ROH regions. These include a cluster of alpha-esterase genes that are involved in detoxification processes. Furthermore, analysis of structural variants in the Dark-fly genome showed the deletion of a gene related to fatty acid metabolism. Our results revealed unique features of the Dark-fly genome and provided a list of potential candidate genes involved in environmental adaptation. PMID:22432011

  2. A Reevaluation of Rice Mitochondrial Evolution Based on the Complete Sequence of Male-Fertile and Male-Sterile Mitochondrial Genomes

    PubMed Central

    Bentolila, Stéphane; Stefanov, Stefan

    2012-01-01

    Plant mitochondrial genomes have features that distinguish them radically from their animal counterparts: a high rate of rearrangement, of uptake and loss of DNA sequences, and an extremely low point mutation rate. Perhaps the most unique structural feature of plant mitochondrial DNAs is the presence of large repeated sequences involved in intramolecular and intermolecular recombination. In addition, rare recombination events can occur across shorter repeats, creating rearrangements that result in aberrant phenotypes, including pollen abortion, which is known as cytoplasmic male sterility (CMS). Using next-generation sequencing, we pyrosequenced two rice (Oryza sativa) mitochondrial genomes that belong to the indica subspecies. One genome is normal, while the other carries the wild abortive-CMS. We find that numerous rearrangements in the rice mitochondrial genome occur even between close cytotypes during rice evolution. Unlike maize (Zea mays), a closely related species also belonging to the grass family, integration of plastid sequences did not play a role in the sequence divergence between rice cytotypes. This study also uncovered an excellent candidate for the wild abortive-CMS-encoding gene; like most of the CMS-associated open reading frames that are known in other species, this candidate was created via a rearrangement, is chimeric in structure, possesses predicted transmembrane domains, and coopted the promoter of a genuine mitochondrial gene. Our data give new insights into rice mitochondrial evolution, correcting previous reports. PMID:22128137

  3. Influence of time and length size feature selections for human activity sequences recognition.

    PubMed

    Fang, Hongqing; Chen, Long; Srinivasan, Raghavendiran

    2014-01-01

    In this paper, the Viterbi algorithm based on a hidden Markov model is applied to recognize activity sequences from observed sensor events. Alternative selections of two features, the time feature values of sensor events and the activity length size feature values, are each tested, and the resulting activity sequence recognition performance of the Viterbi algorithm is evaluated. The results show that selecting larger time feature values of sensor events and/or smaller activity length size feature values gives relatively better activity sequence recognition performance. © 2013 ISA. Published by ISA. All rights reserved.
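
    For readers unfamiliar with the technique named in this abstract, the following is a minimal sketch of Viterbi decoding on a hidden Markov model, where hidden states stand for activities and observations stand for sensor events. The abstract gives no model details, so the state names, sensor names, and all probabilities below are hypothetical, and the time and activity-length features studied in the paper are not modeled here.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for an observation sequence.

    Scores are kept as log-probabilities to avoid numerical underflow.
    All probabilities are assumed to be non-zero.
    """
    # Initialisation: score of starting in each state and emitting obs[0].
    scores = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    backpointers = []

    # Recursion: extend the best path into each state, one observation at a time.
    for o in obs[1:]:
        step_scores, step_ptr = {}, {}
        for s in states:
            best_prev = max(states, key=lambda p: scores[-1][p] + math.log(trans_p[p][s]))
            step_scores[s] = (
                scores[-1][best_prev]
                + math.log(trans_p[best_prev][s])
                + math.log(emit_p[s][o])
            )
            step_ptr[s] = best_prev
        scores.append(step_scores)
        backpointers.append(step_ptr)

    # Termination and backtracking from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for ptr in reversed(backpointers):
        path.append(ptr[path[-1]])
    path.reverse()
    return path


# Hypothetical two-activity model (all values are illustrative assumptions).
states = ["sleep", "cook"]
start_p = {"sleep": 0.6, "cook": 0.4}
trans_p = {"sleep": {"sleep": 0.7, "cook": 0.3}, "cook": {"sleep": 0.4, "cook": 0.6}}
emit_p = {"sleep": {"bed": 0.9, "kitchen": 0.1}, "cook": {"bed": 0.2, "kitchen": 0.8}}

decoded = viterbi(["bed", "bed", "kitchen", "kitchen"], states, start_p, trans_p, emit_p)
print(decoded)  # ['sleep', 'sleep', 'cook', 'cook']
```

    Working in log space is a common design choice for longer sensor streams, since products of many small probabilities would otherwise underflow to zero.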

  4. The Effect of the Number and Nature of Features and of General Ability on the Simultaneous and Successive Processing of Maps.

    ERIC Educational Resources Information Center

    Sutherland, Sandra; Winn, William

    The interactions of three factors that may be involved with the memory for pattern or sequence in visual materials were investigated in this study: (1) arbitrariness of representation; (2) task; and (3) ability of students. The subjects, who were 29 graduate students in education, were pretested for general ability and randomly assigned to four…

  5. SIGMAR1 mutation associated with autosomal recessive Silver-like syndrome

    PubMed Central

    Horga, Alejandro; Tomaselli, Pedro J.; Gonzalez, Michael A.; Laurà, Matilde; Muntoni, Francesco; Manzur, Adnan Y.; Hanna, Michael G.; Blake, Julian C.; Houlden, Henry; Züchner, Stephan

    2016-01-01

    Objective: To describe the genetic and clinical features of a simplex patient with distal hereditary motor neuropathy (dHMN) and lower limb spasticity (Silver-like syndrome) due to a mutation in the sigma nonopioid intracellular receptor–1 gene (SIGMAR1) and review the phenotypic spectrum of mutations in this gene. Methods: We used whole-exome sequencing to investigate the proband. The variants of interest were investigated for segregation in the family using Sanger sequencing. Subsequently, a larger cohort of 16 unrelated dHMN patients was specifically screened for SIGMAR1 mutations. Results: In the proband, we identified a homozygous missense variant (c.194T>A, p.Leu65Gln) in exon 2 of SIGMAR1 as the probable causative mutation. Pathogenicity is supported by evolutionary conservation, in silico analyses, and the strong phenotypic similarities with previously reported cases carrying coding sequence mutations in SIGMAR1. No other mutations were identified in 16 additional patients with dHMN. Conclusions: We suggest that coding sequence mutations in SIGMAR1 present clinically with a combination of dHMN and pyramidal tract signs, with or without spasticity, in the lower limbs. Preferential involvement of extensor muscles of the upper limbs may be a distinctive feature of the disease. These observations should be confirmed in future studies. PMID:27629094

  6. SIGMAR1 mutation associated with autosomal recessive Silver-like syndrome.

    PubMed

    Horga, Alejandro; Tomaselli, Pedro J; Gonzalez, Michael A; Laurà, Matilde; Muntoni, Francesco; Manzur, Adnan Y; Hanna, Michael G; Blake, Julian C; Houlden, Henry; Züchner, Stephan; Reilly, Mary M

    2016-10-11

    To describe the genetic and clinical features of a simplex patient with distal hereditary motor neuropathy (dHMN) and lower limb spasticity (Silver-like syndrome) due to a mutation in the sigma nonopioid intracellular receptor-1 gene (SIGMAR1) and review the phenotypic spectrum of mutations in this gene. We used whole-exome sequencing to investigate the proband. The variants of interest were investigated for segregation in the family using Sanger sequencing. Subsequently, a larger cohort of 16 unrelated dHMN patients was specifically screened for SIGMAR1 mutations. In the proband, we identified a homozygous missense variant (c.194T>A, p.Leu65Gln) in exon 2 of SIGMAR1 as the probable causative mutation. Pathogenicity is supported by evolutionary conservation, in silico analyses, and the strong phenotypic similarities with previously reported cases carrying coding sequence mutations in SIGMAR1. No other mutations were identified in 16 additional patients with dHMN. We suggest that coding sequence mutations in SIGMAR1 present clinically with a combination of dHMN and pyramidal tract signs, with or without spasticity, in the lower limbs. Preferential involvement of extensor muscles of the upper limbs may be a distinctive feature of the disease. These observations should be confirmed in future studies. © 2016 American Academy of Neurology.

  7. PubDNA Finder: a web database linking full-text articles to sequences of nucleic acids.

    PubMed

    García-Remesal, Miguel; Cuevas, Alejandro; Pérez-Rey, David; Martín, Luis; Anguita, Alberto; de la Iglesia, Diana; de la Calle, Guillermo; Crespo, José; Maojo, Víctor

    2010-11-01

    PubDNA Finder is an online repository that we have created to link PubMed Central manuscripts to the sequences of nucleic acids appearing in them. It extends the search capabilities provided by PubMed Central by enabling researchers to perform advanced searches involving sequences of nucleic acids. This includes, among other features (i) searching for papers mentioning one or more specific sequences of nucleic acids and (ii) retrieving the genetic sequences appearing in different articles. These additional query capabilities are provided by a searchable index that we created by using the full text of the 176 672 papers available at PubMed Central at the time of writing and the sequences of nucleic acids appearing in them. To automatically extract the genetic sequences occurring in each paper, we used an original method we have developed. The database is updated monthly by automatically connecting to the PubMed Central FTP site to retrieve and index new manuscripts. Users can query the database via the web interface provided. PubDNA Finder can be freely accessed at http://servet.dia.fi.upm.es:8080/pubdnafinder

  8. PCR Primers to Study the Diversity of Expressed Fungal Genes Encoding Lignocellulolytic Enzymes in Soils Using High-Throughput Sequencing

    PubMed Central

    Barbi, Florian; Bragalini, Claudia; Vallon, Laurent; Prudent, Elsa; Dubost, Audrey; Fraissinet-Tachet, Laurence; Marmeisse, Roland; Luis, Patricia

    2014-01-01

    Plant biomass degradation in soil is one of the key steps of carbon cycling in terrestrial ecosystems. Fungal saprotrophic communities play an essential role in this process by producing hydrolytic enzymes active on the main components of plant organic matter. Open questions in this field regard the diversity of the species involved, the major biochemical pathways implicated and how these are affected by external factors such as litter quality or climate changes. This can be tackled by environmental genomic approaches involving the systematic sequencing of key enzyme-coding gene families using soil-extracted RNA as material. Such an approach necessitates the design and evaluation of gene family-specific PCR primers producing sequence fragments compatible with high-throughput sequencing approaches. In the present study, we developed and evaluated PCR primers for the specific amplification of fungal CAZy Glycoside Hydrolase gene families GH5 (subfamily 5) and GH11 encoding endo-β-1,4-glucanases and endo-β-1,4-xylanases respectively as well as Basidiomycota class II peroxidases, corresponding to the CAZy Auxiliary Activity family 2 (AA2), active on lignin. These primers were experimentally validated using DNA extracted from a wide range of Ascomycota and Basidiomycota species including 27 with sequenced genomes. Along with the published primers for Glycoside Hydrolase GH7 encoding enzymes active on cellulose, the newly designed primers were shown to be compatible with the Illumina MiSeq sequencing technology. Sequences obtained from RNA extracted from beech or spruce forest soils showed a high diversity and were uniformly distributed in gene trees featuring the global diversity of these gene families. This high-throughput sequencing approach using several degenerate primers constitutes a robust method, which allows the simultaneous characterization of the diversity of different fungal transcripts involved in plant organic matter degradation and may lead to the discovery of complex patterns in gene expression of soil fungal communities. PMID:25545363

  9. De novo design of RNA-binding proteins with a prion-like domain related to ALS/FTD proteinopathies.

    PubMed

    Mitsuhashi, Kana; Ito, Daisuke; Mashima, Kyoko; Oyama, Munenori; Takahashi, Shinichi; Suzuki, Norihiro

    2017-12-04

    Aberrant RNA-binding proteins form the core of the neurodegeneration cascade in spectrums of disease, such as amyotrophic lateral sclerosis (ALS)/frontotemporal dementia (FTD). Six ALS-related molecules, TDP-43, FUS, TAF15, EWSR1, heterogeneous nuclear (hn)RNPA1 and hnRNPA2, are RNA-binding proteins containing candidate mutations identified in ALS patients and share several common features, including harboring an aggregation-prone prion-like domain (PrLD) containing a glycine/serine-tyrosine-glycine/serine (G/S-Y-G/S)-motif-enriched low-complexity sequence and rich in glutamine and/or asparagine. Additionally, these six molecules are components of RNA granules involved in RNA quality control and become mislocated from the nucleus to form cytoplasmic inclusion bodies (IBs) in the ALS/FTD-affected brain. To reveal the essential mechanisms involved in ALS/FTD-related cytotoxicity associated with RNA-binding proteins containing PrLDs, we designed artificial RNA-binding proteins harboring G/S-Y-G/S-motif repeats with and without enriched glutamine residues and nuclear-import/export-signal sequences and examined their cytotoxicity in vitro. These proteins recapitulated features of ALS-linked molecules, including insoluble aggregation, formation of cytoplasmic IBs and components of RNA granules, and cytotoxicity instigation. These findings indicated that these artificial RNA-binding proteins mimicked features of ALS-linked molecules and allowed the study of mechanisms associated with gain of toxic functions related to ALS/FTD pathogenesis.

  10. Molecular characterization and intermolecular interaction of coat protein of Prunus necrotic ringspot virus: implications for virus assembly.

    PubMed

    Kulshrestha, Saurabh; Hallan, Vipin; Sharma, Anshul; Seth, Chandrika Attri; Chauhan, Anjali; Zaidi, Aijaz Asghar

    2013-09-01

    Coat protein (CP) and RNA3 from Prunus necrotic ringspot virus (PNRSV-rose), the most prevalent virus infecting rose in India, were characterized, and regions in the coat protein important for self-interaction during dimer formation were identified. The sequence analysis of CP and partial RNA 3 revealed that the rose isolate of PNRSV in India belongs to the PV-32 group of PNRSV isolates. Apart from the already established group-specific features of PV-32 group members, additional group-specific and host-specific features were also identified. Presence of methionine at position 90 in the amino acid sequence alignment of PNRSV CP gene (belonging to PV-32 group) was identified as the specific conserved feature for the rose isolates of PNRSV. As protein-protein interaction plays a vital role in the infection process, an attempt was made to identify the portions of PNRSV CP responsible for self-interaction using the yeast two-hybrid system. It was found (after analysis of the deletion clones) that the C-terminal region of PNRSV CP (amino acids 153-226) plays a vital role in this interaction during dimer formation. The N-terminal of PNRSV CP was previously known to be involved in CP-RNA interactions, but our results also suggested that the N-terminal of PNRSV CP, represented by amino acids 1-77, also interacts with the C-terminal (amino acids 153-226) in the yeast two-hybrid system, suggesting its probable involvement in the CP-CP interaction.

  11. Prediction of plant pre-microRNAs and their microRNAs in genome-scale sequences using structure-sequence features and support vector machine.

    PubMed

    Meng, Jun; Liu, Dong; Sun, Chao; Luan, Yushi

    2014-12-30

    MicroRNAs (miRNAs) are a family of non-coding RNAs approximately 21 nucleotides in length that play pivotal roles at the post-transcriptional level in animals, plants and viruses. These molecules silence their target genes by degrading transcription or suppressing translation. Studies have shown that miRNAs are involved in biological responses to a variety of biotic and abiotic stresses. Identification of these molecules and their targets can aid the understanding of regulatory processes. Recently, prediction methods based on machine learning have been widely used for miRNA prediction. However, most of these methods were designed for mammalian miRNA prediction, and few are available for predicting miRNAs in the pre-miRNAs of specific plant species. Although the complete Solanum lycopersicum genome has been published, only 77 Solanum lycopersicum miRNAs have been identified, far less than the estimated number. Therefore, it is essential to develop a prediction method based on machine learning to identify new plant miRNAs. A novel classification model based on a support vector machine (SVM) was trained to identify real and pseudo plant pre-miRNAs together with their miRNAs. An initial set of 152 novel features related to sequential structures was used to train the model. By applying feature selection, we obtained the best subset of 47 features for use with the Back Support Vector Machine-Recursive Feature Elimination (B-SVM-RFE) method for the classification of plant pre-miRNAs. Using this method, 63 features were obtained for plant miRNA classification. We then developed an integrated classification model, miPlantPreMat, which comprises MiPlantPre and MiPlantMat, to identify plant pre-miRNAs and their miRNAs. 
This model achieved approximately 90% accuracy using plant datasets from nine plant species, including Arabidopsis thaliana, Glycine max, Oryza sativa, Physcomitrella patens, Medicago truncatula, Sorghum bicolor, Arabidopsis lyrata, Zea mays and Solanum lycopersicum. Using miPlantPreMat, 522 Solanum lycopersicum miRNAs were identified in the Solanum lycopersicum genome sequence. We developed an integrated classification model, miPlantPreMat, based on structure-sequence features and SVM. MiPlantPreMat was used to identify both plant pre-miRNAs and the corresponding mature miRNAs. An improved feature selection method was proposed, resulting in high classification accuracy, sensitivity and specificity.

  12. iFeature: a python package and web server for features extraction and selection from protein and peptide sequences.

    PubMed

    Chen, Zhen; Zhao, Pei; Li, Fuyi; Leier, André; Marquez-Lago, Tatiana T; Wang, Yanan; Webb, Geoffrey I; Smith, A Ian; Daly, Roger J; Chou, Kuo-Chen; Song, Jiangning

    2018-03-08

    Structural and physiochemical descriptors extracted from sequence data have been widely used to represent sequences and predict structural, functional, expression and interaction profiles of proteins and peptides as well as DNAs/RNAs. Here, we present iFeature, a versatile Python-based toolkit for generating various numerical feature representation schemes for both protein and peptide sequences. iFeature is capable of calculating and extracting a comprehensive spectrum of 18 major sequence encoding schemes that encompass 53 different types of feature descriptors. It also allows users to extract specific amino acid properties from the AAindex database. Furthermore, iFeature integrates 12 different types of commonly used feature clustering, selection, and dimensionality reduction algorithms, greatly facilitating training, analysis, and benchmarking of machine-learning models. The functionality of iFeature is made freely available via an online web server and a stand-alone toolkit. Availability: http://iFeature.erc.monash.edu/; https://github.com/Superzchen/iFeature/. Contact: jiangning.song@monash.edu; kcchou@gordonlifescience.org; roger.daly@monash.edu. Supplementary data are available at Bioinformatics online.

  13. Use of eluted peptide sequence data to identify the binding characteristics of peptides to the insulin-dependent diabetes susceptibility allele HLA-DQ8 (DQ 3.2).

    PubMed

    Godkin, A; Friede, T; Davenport, M; Stevanovic, S; Willis, A; Jewell, D; Hill, A; Rammensee, H G

    1997-06-01

    HLA-DQ8 (A1*0301, B1*0302) and -DQ2 (A1*0501, B1*0201) are both associated with diseases such as insulin-dependent diabetes mellitus and coeliac disease. We used the technique of pool sequencing to look at the requirements of peptides binding to HLA-DQ8, and combined these data with naturally sequenced ligands and in vitro binding assays to describe a novel motif for HLA-DQ8. The motif, which has the same basic format as many HLA-DR molecules, consists of four or five anchor regions, in the positions from the N-terminus of the binding core of n, n + 3, n + 5/6 and n + 8, i.e. P1, P4, P6/7 and P9. P1 and P9 require negative or polar residues, with mainly aliphatic residues at P4 and P6/7. The features of the HLA-DQ8 motif were then compared to a pool sequence of peptides eluted from HLA-DQ2. A consensus motif for the binding of a common peptide which may be involved in disease pathogenesis is described. Neither of the disease-associated alleles HLA-DQ2 and -DQ8 have Asp at position 57 of the beta-chain. This Asp, if present, may form a salt bridge with an Arg at position 79 of the alpha-chain and so alter the binding specificity of P9. HLA-DQ2 and -DQ8 both appear to prefer negatively charged amino acids at P9. In contrast, HLA-DQ7 (A1*0301, B1*0301), which is not associated with diabetes, has Asp at beta 57, allowing positively charged amino acids at P9. This analysis of the sequence features of DQ-binding peptides suggests molecular characteristics which may be useful to predict epitopes involved in disease pathogenesis.
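
    The anchor spacing described in the abstract above (n, n + 3, n + 5/6 and n + 8, i.e. P1, P4, P6/7 and P9) can be illustrated with a minimal sketch. The function name `anchor_residues`, the 0-based `core_start` argument and the example peptide are hypothetical and chosen only for illustration; the peptide is made up and is not a ligand reported in the study.

```python
def anchor_residues(peptide, core_start):
    """Return the residues at the anchor positions of a nine-residue binding core.

    core_start is the 0-based index of P1 (the position called n in the abstract).
    Offsets follow the spacing n, n+3, n+5/6, n+8, i.e. P1, P4, P6/P7 and P9.
    """
    offsets = {"P1": 0, "P4": 3, "P6": 5, "P7": 6, "P9": 8}
    if core_start < 0 or core_start + 9 > len(peptide):
        raise ValueError("binding core does not fit inside the peptide")
    return {name: peptide[core_start + off] for name, off in offsets.items()}


# Made-up peptide (assumption, not from the paper); the core starts at index 2.
example = anchor_residues("XXEAAVAVLAEYY", 2)
```

    In this toy peptide P1 and P9 are glutamate (negatively charged) and P4 and P6/7 are aliphatic, which mirrors the preferences the abstract reports; a real analysis would also have to decide where the binding core starts, which this sketch takes as given.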

  14. Pseudomonas syringae pv. actinidiae Draft Genomes Comparison Reveal Strain-Specific Features Involved in Adaptation and Virulence to Actinidia Species

    PubMed Central

    Marcelletti, Simone; Ferrante, Patrizia; Petriccione, Milena; Firrao, Giuseppe; Scortichini, Marco

    2011-01-01

    A recent re-emerging bacterial canker disease incited by Pseudomonas syringae pv. actinidiae (Psa) is causing severe economic losses to Actinidia chinensis and A. deliciosa cultivations in southern Europe, New Zealand, Chile and South Korea. Little is known about the genetic features of this pathovar. We generated genome-wide Illumina sequence data from two Psa strains causing outbreaks of bacterial canker on the A. deliciosa cv. Hayward in Japan (J-Psa, type-strain of the pathovar) and in Italy (I-Psa) in 1984 and 1992, respectively as well as from a Psa strain (I2-Psa) isolated at the beginning of the recent epidemic on A. chinensis cv. Hort16A in Italy. All strains were isolated from typical leaf spot symptoms. The phylogenetic relationships revealed that Psa is more closely related to P. s. pv. theae than to P. avellanae within genomospecies 8. Comparative genomic analyses revealed both relevant intrapathovar variations and putative pathovar-specific genomic regions in Psa. The genomic sequences of J-Psa and I-Psa were very similar. Conversely, the I2-Psa genome encodes four additional effector protein genes, lacks a 50 kb plasmid and the phaseolotoxin gene cluster, argK-tox but has acquired a 160 kb plasmid and putative prophage sequences. Several lines of evidence from the analysis of the genome sequences support the hypothesis that this strain did not evolve from the Psa population that caused the epidemics in 1984–1992 in Japan and Italy but rather is the product of a recent independent evolution of the pathovar actinidiae for infecting Actinidia spp. All Psa strains share the genetic potential for copper resistance, antibiotic detoxification, high affinity iron acquisition and detoxification of nitric oxide of plant origin. Similar to other sequenced phytopathogenic pseudomonads associated with woody plant species, the Psa strains isolated from leaves also display a set of genes involved in the catabolism of plant-derived aromatic compounds. 
PMID:22132095

  15. Processing Dynamic Image Sequences from a Moving Sensor.

    DTIC Science & Technology

    1984-02-01

    65 Roadsign Image Sequence ..... ................ ... 70 Roadsign Sequence with Redundant Features .. ........ . 79 Roadsign Subimage...Selected Feature Error Values .. ........ 66 2c. Industrial Image Selected Feature Local Search Values. .. .... 67 3ab. Roadsign Image Error Values...72 3c. Roadsign Image Local Search Values ............. 73 4ab. Roadsign Redundant Feature Error Values. ............ 8 4c. Roadsign

  16. Novel Insights into Tree Biology and Genome Evolution as Revealed Through Genomics.

    PubMed

    Neale, David B; Martínez-García, Pedro J; De La Torre, Amanda R; Montanari, Sara; Wei, Xiao-Xin

    2017-04-28

    Reference genome sequences are the key to the discovery of genes and gene families that determine traits of interest. Recent progress in sequencing technologies has enabled a rapid increase in genome sequencing of tree species, allowing the dissection of complex characters of economic importance, such as fruit and wood quality and resistance to biotic and abiotic stresses. Although the number of reference genome sequences for trees lags behind those for other plant species, it is not too early to gain insight into the unique features that distinguish trees from nontree plants. Our review of the published data suggests that, although many gene families are conserved among herbaceous and tree species, some gene families, such as those involved in resistance to biotic and abiotic stresses and in the synthesis and transport of sugars, are often expanded in tree genomes. As the genomes of more tree species are sequenced, comparative genomics will further elucidate the complexity of tree genomes and how this relates to traits unique to trees.

  17. Bioinformatics and expressional analysis of cDNA clones from floral buds

    NASA Astrophysics Data System (ADS)

    Pawełkowicz, Magdalena Ewa; Skarzyńska, Agnieszka; Cebula, Justyna; Hincha, Dirck; Ziąbska, Karolina; Pląder, Wojciech; Przybecki, Zbigniew

    2017-08-01

    The application of genomic approaches may serve as an initial step in understanding the complexity of the biochemical networks and cellular processes responsible for regulation and execution of many developmental tasks. The molecular mechanism of sex expression in cucumber is still not elucidated. A study of differential expression was conducted to identify genes involved in sex determination and floral organ morphogenesis. Herein, we present the generation of expressed sequence tags (EST) obtained by differential hybridization (DH) and subtraction technique (cDNA-DSC) and their characteristic features such as molecular function, involvement in biological processes, expression and mapping position on the genome.

  18. The genome sequence of the model ascomycete fungus Podospora anserina

    PubMed Central

    Espagne, Eric; Lespinet, Olivier; Malagnac, Fabienne; Da Silva, Corinne; Jaillon, Olivier; Porcel, Betina M; Couloux, Arnaud; Aury, Jean-Marc; Ségurens, Béatrice; Poulain, Julie; Anthouard, Véronique; Grossetete, Sandrine; Khalili, Hamid; Coppin, Evelyne; Déquard-Chablat, Michelle; Picard, Marguerite; Contamine, Véronique; Arnaise, Sylvie; Bourdais, Anne; Berteaux-Lecellier, Véronique; Gautheret, Daniel; de Vries, Ronald P; Battaglia, Evy; Coutinho, Pedro M; Danchin, Etienne GJ; Henrissat, Bernard; Khoury, Riyad EL; Sainsard-Chanet, Annie; Boivin, Antoine; Pinan-Lucarré, Bérangère; Sellem, Carole H; Debuchy, Robert; Wincker, Patrick; Weissenbach, Jean; Silar, Philippe

    2008-01-01

    Background: The dung-inhabiting ascomycete fungus Podospora anserina is a model used to study various aspects of eukaryotic and fungal biology, such as ageing, prions and sexual development. Results: We present a 10X draft sequence of P. anserina genome, linked to the sequences of a large expressed sequence tag collection. Similar to higher eukaryotes, the P. anserina transcription/splicing machinery generates numerous non-conventional transcripts. Comparison of the P. anserina genome and orthologous gene set with the one of its close relatives, Neurospora crassa, shows that synteny is poorly conserved, the main result of evolution being gene shuffling in the same chromosome. The P. anserina genome contains fewer repeated sequences and has evolved new genes by duplication since its separation from N. crassa, despite the presence of the repeat induced point mutation mechanism that mutates duplicated sequences. We also provide evidence that frequent gene loss took place in the lineages leading to P. anserina and N. crassa. P. anserina contains a large and highly specialized set of genes involved in utilization of natural carbon sources commonly found in its natural biotope. It includes genes potentially involved in lignin degradation and efficient cellulose breakdown. Conclusion: The features of the P. anserina genome indicate a highly dynamic evolution since the divergence of P. anserina and N. crassa, leading to the ability of the former to use specific complex carbon sources that match its needs in its natural biotope. PMID:18460219

  19. A girl with early-onset epileptic encephalopathy associated with microdeletion involving CDKL5.

    PubMed

    Saitsu, Hirotomo; Osaka, Hitoshi; Nishiyama, Kiyomi; Tsurusaki, Yoshinori; Doi, Hiroshi; Miyake, Noriko; Matsumoto, Naomichi

    2012-05-01

    Recent studies have shown that aberrations of CDKL5 in female patients cause early-onset intractable seizures, severe developmental delay or regression, and Rett syndrome-like features. We report on a Japanese girl with early-onset epileptic encephalopathy, hypotonia, developmental regression, and Rett syndrome-like features. The patient showed generalized tonic seizures, and later, massive myoclonus induced by phone and light stimuli. Brain magnetic resonance imaging showed no structural brain anomalies but cerebral atrophy. Electroencephalogram showed frontal dominant diffuse poly spikes and waves. Through copy number analysis by genomic microarray, we found a microdeletion at Xp22.13. A de novo 137-kb deletion, involving exons 5-21 of CDKL5, RS1, and part of PPEF1 gene, was confirmed by quantitative PCR and breakpoint specific PCR analyses. Our report suggests that the clinical features associated with CDKL5 deletions could be implicated in Japanese patients, and that genetic testing of CDKL5, including both sequencing and deletion analyses, should be considered in girls with early-onset epileptic encephalopathy and RTT-like features. Copyright © 2011 The Japanese Society of Child Neurology. Published by Elsevier B.V. All rights reserved.

  20. Qualification of security printing features

    NASA Astrophysics Data System (ADS)

    Simske, Steven J.; Aronoff, Jason S.; Arnabat, Jordi

    2006-02-01

    This paper describes the statistical and hardware processes involved in qualifying two related printing features for their deployment in product (e.g. document and package) security. The first is a multi-colored tiling feature that can also be combined with microtext to provide additional forms of security protection. The color information is authenticated automatically with a variety of handheld, desktop and production scanners. The microtext is authenticated either following magnification or manually by a field inspector. The second security feature can also be tile-based. It involves the use of two inks that provide the same visual color, but differ in their transparency to infrared (IR) wavelengths. One of the inks is effectively transparent to IR wavelengths, allowing emitted IR light to pass through. The other ink is effectively opaque to IR wavelengths. These inks allow the printing of a seemingly uniform, or spot, color over a (truly) uniform IR emitting ink layer. The combination converts a uniform covert ink and a spot color to a variable data region capable of encoding identification sequences with high density. Also, it allows the extension of variable data printing for security to ostensibly static printed regions, affording greater security protection while meeting branding and marketing specifications.

  1. repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects.

    PubMed

    Liu, Bin; Liu, Fule; Fang, Longyun; Wang, Xiaolong; Chou, Kuo-Chen

    2015-04-15

    In order to develop powerful computational predictors for identifying the biological features or attributes of DNAs, one of the most challenging problems is to find a suitable approach to effectively represent the DNA sequences. To facilitate the studies of DNAs and nucleotides, we developed a Python package called representations of DNAs (repDNA) for generating the widely used features reflecting the physicochemical properties and sequence-order effects of DNAs and nucleotides. There are three feature groups composed of 15 features. The first group calculates three nucleic acid composition features describing the local sequence information by means of kmers; the second group calculates six autocorrelation features describing the level of correlation between two oligonucleotides along a DNA sequence in terms of their specific physicochemical properties; the third group calculates six pseudo nucleotide composition features, which can be used to represent a DNA sequence with a discrete model or vector yet still keep considerable sequence-order information via the physicochemical properties of its constituent oligonucleotides. In addition, these features can be easily calculated based on both the built-in and user-defined properties via using repDNA. The repDNA Python package is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repDNA/. Contact: bliu@insun.hit.edu.cn or kcchou@gordonlifescience.org. Supplementary data are available at Bioinformatics online. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.
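
    The k-mer based nucleic acid composition mentioned in the abstract above can be illustrated with a minimal plain-Python sketch. This is not the repDNA API: the function name `kmer_composition`, its parameters and the choice to skip windows containing ambiguous bases are assumptions made for illustration only.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Return normalized k-mer frequencies of a DNA sequence.

    The vector has 4**k entries, ordered lexicographically over A, C, G, T.
    Hypothetical helper for illustration; it does not reproduce repDNA itself.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with ambiguous bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# Overlapping 2-mers of "ACGTAC": AC, CG, GT, TA, AC
vec = kmer_composition("ACGTAC", k=2)
```

    Each sequence, whatever its length, maps to a fixed-length vector, which is what lets such features feed a standard classifier. Plain composition of this kind captures only local information; the autocorrelation and pseudo nucleotide composition groups described in the abstract are what add sequence-order information.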

  2. Spinal motor neuron involvement in a patient with homozygous PRUNE mutation.

    PubMed

    Iacomino, Michele; Fiorillo, Chiara; Torella, Annalaura; Severino, Mariasavina; Broda, Paolo; Romano, Catia; Falsaperla, Raffaele; Pozzolini, Giulia; Minetti, Carlo; Striano, Pasquale; Nigro, Vincenzo; Zara, Federico

    2018-05-01

    In the last few years, whole exome sequencing (WES) has allowed the identification of PRUNE mutations in patients featuring a complex neurological phenotype characterized by severe neurodevelopmental delay, microcephaly, epilepsy, optic atrophy, and brain or cerebellar atrophy. We describe an additional patient with a homozygous PRUNE mutation who presented with a spinal muscular atrophy phenotype, in addition to the already known brain developmental disorder. This novel feature expands the clinical consequences of PRUNE mutations and allows PRUNE syndrome to converge with previous descriptions of neurodevelopmental/neurodegenerative disorders linked to altered microtubule dynamics. Copyright © 2017 European Paediatric Neurology Society. Published by Elsevier Ltd. All rights reserved.

  3. LMX1B Mutations Cause Hereditary FSGS without Extrarenal Involvement

    PubMed Central

    Boyer, Olivia; Woerner, Stéphanie; Yang, Fan; Oakeley, Edward J.; Linghu, Bolan; Gribouval, Olivier; Tête, Marie-Josèphe; Duca, José S.; Klickstein, Lloyd; Damask, Amy J.; Szustakowski, Joseph D.; Heibel, Françoise; Matignon, Marie; Baudouin, Véronique; Chantrel, François; Champigneulle, Jacqueline; Martin, Laurent; Nitschké, Patrick; Gubler, Marie-Claire; Johnson, Keith J.; Chibout, Salah-Dine

    2013-01-01

    LMX1B encodes a homeodomain-containing transcription factor that is essential during development. Mutations in LMX1B cause nail-patella syndrome, characterized by dysplasia of the patellae, nails, and elbows and FSGS with specific ultrastructural lesions of the glomerular basement membrane (GBM). By linkage analysis and exome sequencing, we unexpectedly identified an LMX1B mutation segregating with disease in a pedigree of five patients with autosomal dominant FSGS but without either extrarenal features or ultrastructural abnormalities of the GBM suggestive of nail-patella–like renal disease. Subsequently, we screened 73 additional unrelated families with FSGS and found mutations involving the same amino acid (R246) in 2 families. An LMX1B in silico homology model suggested that the mutated residue plays an important role in strengthening the interaction between the LMX1B homeodomain and DNA; both identified mutations would be expected to diminish such interactions. In summary, these results suggest that isolated FSGS could result from mutations in genes that are also involved in syndromic forms of FSGS. This highlights the need to include these genes in all diagnostic approaches to FSGS that involve next-generation sequencing. PMID:23687361

  4. Comparative genomics of Paracoccus sp. SM22M-07 isolated from coral mucus: insights into bacteria-host interactions.

    PubMed

    Carlos, Camila; Pereira, Letícia Bianca; Ottoboni, Laura Maria Mariscal

    2017-06-01

    One of the main goals of coral microbiology is to understand the ways in which coral-bacteria associations are established and maintained. This work describes the sequencing of the genome of Paracoccus sp. SM22M-07 isolated from the mucus of the endemic Brazilian coral species Mussismilia hispida. Comparative analysis was used to identify unique genomic features of SM22M-07 that might be involved in its adaptation to the marine ecosystem and the nutrient-rich environment provided by coral mucus, as well as in the establishment and strengthening of the interaction with the host. These features included genes related to the type IV protein secretion system, erythritol catabolism, and succinoglycan biosynthesis. We experimentally confirmed the production of succinoglycan by Paracoccus sp. SM22M-07 and we hypothesize that it may be involved in the association of the bacterium with coral surfaces.

  5. Use of whole-exome sequencing to determine the genetic basis of multiple mitochondrial respiratory chain complex deficiencies.

    PubMed

    Taylor, Robert W; Pyle, Angela; Griffin, Helen; Blakely, Emma L; Duff, Jennifer; He, Langping; Smertenko, Tania; Alston, Charlotte L; Neeve, Vivienne C; Best, Andrew; Yarham, John W; Kirschner, Janbernd; Schara, Ulrike; Talim, Beril; Topaloglu, Haluk; Baric, Ivo; Holinski-Feder, Elke; Abicht, Angela; Czermin, Birgit; Kleinle, Stephanie; Morris, Andrew A M; Vassallo, Grace; Gorman, Grainne S; Ramesh, Venkateswaran; Turnbull, Douglass M; Santibanez-Koref, Mauro; McFarland, Robert; Horvath, Rita; Chinnery, Patrick F

    2014-07-02

    Mitochondrial disorders have emerged as a common cause of inherited disease, but their diagnosis remains challenging. Multiple respiratory chain complex defects are particularly difficult to diagnose at the molecular level because of the massive number of nuclear genes potentially involved in intramitochondrial protein synthesis, with many not yet linked to human disease. The objective was to determine the molecular basis of multiple respiratory chain complex deficiencies. We studied 53 patients referred to 2 national centers in the United Kingdom and Germany between 2005 and 2012. All had biochemical evidence of multiple respiratory chain complex defects but no primary pathogenic mitochondrial DNA mutation. Whole-exome sequencing was performed using 62-Mb exome enrichment, followed by variant prioritization using bioinformatic prediction tools, variant validation by Sanger sequencing, and segregation of the variant with the disease phenotype in the family. Presumptive causal variants were identified in 28 patients (53%; 95% CI, 39%-67%) and possible causal variants were identified in 4 (8%; 95% CI, 2%-18%). Together these accounted for 32 patients (60%; 95% CI, 46%-74%) and involved 18 different genes. These included recurrent mutations in RMND1, AARS2, and MTO1, each on a haplotype background consistent with a shared founder allele, and potential novel mutations in 4 possible mitochondrial disease genes (VARS2, GARS, FLAD1, and PTCD1). Distinguishing clinical features included deafness and renal involvement associated with RMND1 and cardiomyopathy with AARS2 and MTO1. However, atypical clinical features were present in some patients, including normal liver function and Leigh syndrome (subacute necrotizing encephalomyelopathy) seen in association with TRMU mutations and no cardiomyopathy with founder SCO2 mutations. It was not possible to confidently identify the underlying genetic basis in 21 patients (40%; 95% CI, 26%-54%). 
Exome sequencing enhances the ability to identify potential nuclear gene mutations in patients with biochemically defined defects affecting multiple mitochondrial respiratory chain complexes. Additional study is required in independent patient populations to determine the utility of this approach in comparison with traditional diagnostic methods.
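
    The diagnostic-yield percentages quoted in the abstract above follow directly from the patient counts, as the short check below shows. The variable names and the `percent` helper are illustrative assumptions; the confidence intervals are not recomputed here because the abstract does not state which interval method was used.

```python
def percent(count, total):
    """Percentage of total, rounded to the nearest whole number."""
    return round(100 * count / total)


total_patients = 53   # patients studied
presumptive = 28      # presumptive causal variants
possible = 4          # possible causal variants
unresolved = 21       # no genetic basis confidently identified

resolved = presumptive + possible  # 32 patients

summary = {
    "presumptive": percent(presumptive, total_patients),
    "possible": percent(possible, total_patients),
    "resolved": percent(resolved, total_patients),
    "unresolved": percent(unresolved, total_patients),
}
```

    The counts are internally consistent: 28 + 4 = 32 resolved patients, and 32 + 21 = 53 accounts for the whole cohort.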

  6. novPTMenzy: a database for enzymes involved in novel post-translational modifications

    PubMed Central

    Khater, Shradha; Mohanty, Debasisa

    2015-01-01

    With the recent discoveries of novel post-translational modifications (PTMs) which play important roles in signaling and biosynthetic pathways, identification of such PTM catalyzing enzymes by genome mining has been an area of major interest. Unlike well-known PTMs like phosphorylation, glycosylation, SUMOylation, no bioinformatics resources are available for enzymes associated with novel and unusual PTMs. Therefore, we have developed the novPTMenzy database which catalogs information on the sequence, structure, active site and genomic neighborhood of experimentally characterized enzymes involved in five novel PTMs, namely AMPylation, Eliminylation, Sulfation, Hydroxylation and Deamidation. Based on a comprehensive analysis of the sequence and structural features of these known PTM catalyzing enzymes, we have created Hidden Markov Model profiles for the identification of similar PTM catalyzing enzymatic domains in genomic sequences. We have also created predictive rules for grouping them into functional subfamilies and deciphering their mechanistic details by structure-based analysis of their active site pockets. These analytical modules have been made available as user friendly search interfaces of novPTMenzy database. It also has a specialized analysis interface for some PTMs like AMPylation and Eliminylation. The novPTMenzy database is a unique resource that can aid in discovery of unusual PTM catalyzing enzymes in newly sequenced genomes. Database URL: http://www.nii.ac.in/novptmenzy.html PMID:25931459

  7. Comparative genomic analysis of Acinetobacter strains isolated from murine colonic crypts.

    PubMed

    Saffarian, Azadeh; Touchon, Marie; Mulet, Céline; Tournebize, Régis; Passet, Virginie; Brisse, Sylvain; Rocha, Eduardo P C; Sansonetti, Philippe J; Pédron, Thierry

    2017-07-11

    A restricted set of aerobic bacteria dominated by the Acinetobacter genus was identified in murine intestinal colonic crypts. The vicinity of such bacteria with intestinal stem cells could indicate that they protect the crypt against cytotoxic and genotoxic signals. Genome analyses of these bacteria were performed to better appreciate their biodegradative capacities. Two taxonomically different clusters of Acinetobacter were isolated from murine proximal colonic crypts, one was identified as A. modestus and the other as A. radioresistens. Their identification was performed through biochemical parameters and housekeeping gene sequencing. After selection of one strain of each cluster (A. modestus CM11G and A. radioresistens CM38.2), comparative genomic analysis was performed on whole-genome sequencing data. The antibiotic resistance pattern of these two strains is different, in line with the many genes involved in resistance to heavy metals identified in both genomes. Moreover whereas the operon benABCDE involved in benzoate metabolism is encoded by the two genomes, the operon antABC encoding the anthranilate dioxygenase, and the phenol hydroxylase gene cluster are absent in the A. modestus genomic sequence, indicating that the two strains have different capacities to metabolize xenobiotics. A common feature of the two strains is the presence of a type IV pili system, and the presence of genes encoding proteins pertaining to secretion systems such as Type I and Type II secretion systems. Our comparative genomic analysis revealed that different Acinetobacter isolated from the same biological niche, even if they share a large majority of genes, possess unique features that could play a specific role in the protection of the intestinal crypt.

  8. Splicing predictions reliably classify different types of alternative splicing

    PubMed Central

    Busch, Anke; Hertel, Klemens J.

    2015-01-01

    Alternative splicing is a key player in the creation of complex mammalian transcriptomes and its misregulation is associated with many human diseases. Multiple mRNA isoforms are generated from most human genes, a process mediated by the interplay of various RNA signature elements and trans-acting factors that guide spliceosomal assembly and intron removal. Here, we introduce a splicing predictor that evaluates hundreds of RNA features simultaneously to successfully differentiate between exons that are constitutively spliced, exons that undergo alternative 5′ or 3′ splice-site selection, and alternative cassette-type exons. Surprisingly, the splicing predictor did not feature strong discriminatory contributions from binding sites for known splicing regulators. Rather, the ability of an exon to be involved in one or multiple types of alternative splicing is dictated by its immediate sequence context, mainly driven by the identity of the exon's splice sites, the conservation around them, and its exon/intron architecture. Thus, the splicing behavior of human exons can be reliably predicted based on basic RNA sequence elements. PMID:25805853

  9. Systematic discovery of novel eukaryotic transcriptional regulators using sequence homology independent prediction.

    PubMed

    Bossi, Flavia; Fan, Jue; Xiao, Jun; Chandra, Lilyana; Shen, Max; Dorone, Yanniv; Wagner, Doris; Rhee, Seung Y

    2017-06-26

    The molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized. To identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation. We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription. Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.

  10. Genes and Pathways Involved in Adult Onset Disorders Featuring Muscle Mitochondrial DNA Instability

    PubMed Central

    Ahmed, Naghia; Ronchi, Dario; Comi, Giacomo Pietro

    2015-01-01

    Replication and maintenance of mtDNA entirely relies on a set of proteins encoded by the nuclear genome, which include members of the core replicative machinery, proteins involved in the homeostasis of mitochondrial dNTPs pools or deputed to the control of mitochondrial dynamics and morphology. Mutations in their coding genes have been observed in familial and sporadic forms of pediatric and adult-onset clinical phenotypes featuring mtDNA instability. The list of defects involved in these disorders has recently expanded, including mutations in the exo-/endo-nuclease flap-processing proteins MGME1 and DNA2, supporting the notion that an enzymatic DNA repair system actively takes place in mitochondria. The results obtained in the last few years acknowledge the contribution of next-generation sequencing methods in the identification of new disease loci in small groups of patients and even single probands. Although heterogeneous, these genes can be conveniently classified according to the pathway to which they belong. The definition of the molecular and biochemical features of these pathways might be helpful for fundamental knowledge of these disorders, to accelerate genetic diagnosis of patients and the development of rational therapies. In this review, we discuss the molecular findings disclosed in adult patients with muscle pathology hallmarked by mtDNA instability. PMID:26251896

  11. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation.

    PubMed

    Bolleman, Jerven T; Mungall, Christopher J; Strozzi, Francesco; Baran, Joachim; Dumontier, Michel; Bonnal, Raoul J P; Buels, Robert; Hoehndorf, Robert; Fujisawa, Takatomo; Katayama, Toshiaki; Cock, Peter J A

    2016-06-13

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. We have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned "omics" areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe - and potentially merge - sequence annotations from multiple sources. Data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.
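
    To make the subject-predicate-object idea in this abstract concrete, here is a minimal sketch of how one feature location might be written as triples. It is not taken from the FALDO paper: plain Python tuples stand in for RDF triples, the `http://example.org/` namespace and the helper `region_triples` are hypothetical, and the FALDO term names (`location`, `Region`, `begin`, `end`, `ExactPosition`, `position`, `reference`, strand classes) are given as I understand the vocabulary and should be checked against the published ontology.

```python
# Illustrative only: plain tuples stand in for RDF triples; no RDF library is used.
FALDO = "http://biohackathon.org/resource/faldo#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace for the example data


def region_triples(feature_id, reference_id, begin, end, forward=True):
    """Return triples that locate one feature on one reference sequence."""
    feature = EX + feature_id
    reference = EX + reference_id
    region = feature + "/region"
    strand = FALDO + ("ForwardStrandPosition" if forward else "ReverseStrandPosition")

    # The feature points at a region; the region is typed as a FALDO Region.
    triples = [
        (feature, FALDO + "location", region),
        (region, RDF_TYPE, FALDO + "Region"),
    ]
    # Each boundary is its own position node carrying coordinate, strand and reference.
    for name, coordinate in (("begin", begin), ("end", end)):
        position = f"{region}/{name}"
        triples += [
            (region, FALDO + name, position),
            (position, RDF_TYPE, FALDO + "ExactPosition"),
            (position, RDF_TYPE, strand),
            (position, FALDO + "position", coordinate),
            (position, FALDO + "reference", reference),
        ]
    return triples


if __name__ == "__main__":
    for triple in region_triples("gene1", "chr1", 100, 200):
        print(triple)
```

    Because each boundary is a separate node, the same shape applies to features from any source file format, which is the integration point the abstract makes.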

  12. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

    DOE PAGES

    Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco; ...

    2016-06-13

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. In this paper, we have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.

  13. FALDO: a semantic standard for describing the location of nucleotide and protein feature annotation

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bolleman, Jerven T.; Mungall, Christopher J.; Strozzi, Francesco

    Nucleotide and protein sequence feature annotations are essential to understand biology on the genomic, transcriptomic, and proteomic level. Using Semantic Web technologies to query biological annotations, there was no standard that described this potentially complex location information as subject-predicate-object triples. In this paper, we have developed an ontology, the Feature Annotation Location Description Ontology (FALDO), to describe the positions of annotated features on linear and circular sequences. FALDO can be used to describe nucleotide features in sequence records, protein annotations, and glycan binding sites, among other features in coordinate systems of the aforementioned “omics” areas. Using the same data format to represent sequence positions that are independent of file formats allows us to integrate sequence data from multiple sources and data types. The genome browser JBrowse is used to demonstrate accessing multiple SPARQL endpoints to display genomic feature annotations, as well as protein annotations from UniProt mapped to genomic locations. Our ontology allows users to uniformly describe – and potentially merge – sequence annotations from multiple sources. Finally, data sources using FALDO can prospectively be retrieved using federalised SPARQL queries against public SPARQL endpoints and/or local private triple stores.

  14. Initial sequence and comparative analysis of the cat genome

    PubMed Central

    Pontius, Joan U.; Mullikin, James C.; Smith, Douglas R.; Lindblad-Toh, Kerstin; Gnerre, Sante; Clamp, Michele; Chang, Jean; Stephens, Robert; Neelam, Beena; Volfovsky, Natalia; Schäffer, Alejandro A.; Agarwala, Richa; Narfström, Kristina; Murphy, William J.; Giger, Urs; Roca, Alfred L.; Antunes, Agostinho; Menotti-Raymond, Marilyn; Yuhki, Naoya; Pecon-Slattery, Jill; Johnson, Warren E.; Bourque, Guillaume; Tesler, Glenn; O’Brien, Stephen J.

    2007-01-01

    The genome sequence (1.9-fold coverage) of an inbred Abyssinian domestic cat was assembled, mapped, and annotated with a comparative approach that involved cross-reference to annotated genome assemblies of six mammals (human, chimpanzee, mouse, rat, dog, and cow). The results resolved chromosomal positions for 663,480 contigs, 20,285 putative feline gene orthologs, and 133,499 conserved sequence blocks (CSBs). Additional annotated features include repetitive elements, endogenous retroviral sequences, nuclear mitochondrial (numt) sequences, micro-RNAs, and evolutionary breakpoints that suggest historic balancing of translocation and inversion incidences in distinct mammalian lineages. Large numbers of single nucleotide polymorphisms (SNPs), deletion insertion polymorphisms (DIPs), and short tandem repeats (STRs), suitable for linkage or association studies were characterized in the context of long stretches of chromosome homozygosity. In spite of the light coverage capturing ∼65% of euchromatin sequence from the cat genome, these comparative insights shed new light on the tempo and mode of gene/genome evolution in mammals, promise several research applications for the cat, and also illustrate that a comparative approach using more deeply covered mammals provides an informative, preliminary annotation of a light (1.9-fold) coverage mammal genome sequence. PMID:17975172

  15. Temporality of Features in Near-Death Experience Narratives

    PubMed Central

    Martial, Charlotte; Cassol, Héléna; Antonopoulos, Georgios; Charlier, Thomas; Heros, Julien; Donneau, Anne-Françoise; Charland-Verville, Vanessa; Laureys, Steven

    2017-01-01

    Background: After an occurrence of a Near-Death Experience (NDE), Near-Death Experiencers (NDErs) usually report extremely rich and detailed narratives. Phenomenologically, a NDE can be described as a set of distinguishable features. Some authors have proposed regular patterns of NDEs, however, the actual temporality sequence of NDE core features remains a little explored area. Objectives: The aim of the present study was to investigate the frequency distribution of these features (globally and according to the position of features in narratives) as well as the most frequently reported temporality sequences of features. Methods: We collected 154 French freely expressed written NDE narratives (i.e., Greyson NDE scale total score ≥ 7/32). A text analysis was conducted on all narratives in order to infer temporal ordering and frequency distribution of NDE features. Results: Our analyses highlighted the following most frequently reported sequence of consecutive NDE features: Out-of-Body Experience, Experiencing a tunnel, Seeing a bright light, Feeling of peace. Yet, this sequence was encountered in a very limited number of NDErs. Conclusion: These findings may suggest that NDEs temporality sequences can vary across NDErs. Exploring associations and relationships among features encountered during NDEs may complete the rigorous definition and scientific comprehension of the phenomenon. PMID:28659779

  16. Temporality of Features in Near-Death Experience Narratives.

    PubMed

    Martial, Charlotte; Cassol, Héléna; Antonopoulos, Georgios; Charlier, Thomas; Heros, Julien; Donneau, Anne-Françoise; Charland-Verville, Vanessa; Laureys, Steven

    2017-01-01

    Background: After an occurrence of a Near-Death Experience (NDE), Near-Death Experiencers (NDErs) usually report extremely rich and detailed narratives. Phenomenologically, a NDE can be described as a set of distinguishable features. Some authors have proposed regular patterns of NDEs, however, the actual temporality sequence of NDE core features remains a little explored area. Objectives: The aim of the present study was to investigate the frequency distribution of these features (globally and according to the position of features in narratives) as well as the most frequently reported temporality sequences of features. Methods: We collected 154 French freely expressed written NDE narratives (i.e., Greyson NDE scale total score ≥ 7/32). A text analysis was conducted on all narratives in order to infer temporal ordering and frequency distribution of NDE features. Results: Our analyses highlighted the following most frequently reported sequence of consecutive NDE features: Out-of-Body Experience, Experiencing a tunnel, Seeing a bright light, Feeling of peace. Yet, this sequence was encountered in a very limited number of NDErs. Conclusion: These findings may suggest that NDEs temporality sequences can vary across NDErs. Exploring associations and relationships among features encountered during NDEs may complete the rigorous definition and scientific comprehension of the phenomenon.

  17. Discriminative prediction of mammalian enhancers from DNA sequence

    PubMed Central

    Lee, Dongwon; Karchin, Rachel; Beer, Michael A.

    2011-01-01

    Accurately predicting regulatory sequences and enhancers in entire genomes is an important but difficult problem, especially in large vertebrate genomes. With the advent of ChIP-seq technology, experimental detection of genome-wide EP300/CREBBP bound regions provides a powerful platform to develop predictive tools for regulatory sequences and to study their sequence properties. Here, we develop a support vector machine (SVM) framework which can accurately identify EP300-bound enhancers using only genomic sequence and an unbiased set of general sequence features. Moreover, we find that the predictive sequence features identified by the SVM classifier reveal biologically relevant sequence elements enriched in the enhancers, but we also identify other features that are significantly depleted in enhancers. The predictive sequence features are evolutionarily conserved and spatially clustered, providing further support of their functional significance. Although our SVM is trained on experimental data, we also predict novel enhancers and show that these putative enhancers are significantly enriched in both ChIP-seq signal and DNase I hypersensitivity signal in the mouse brain and are located near relevant genes. Finally, we present results of comparisons between other EP300/CREBBP data sets using our SVM and uncover sequence elements enriched and/or depleted in the different classes of enhancers. Many of these sequence features play a role in specifying tissue-specific or developmental-stage-specific enhancer activity, but our results indicate that some features operate in a general or tissue-independent manner. In addition to providing a high confidence list of enhancer targets for subsequent experimental investigation, these results contribute to our understanding of the general sequence structure of vertebrate enhancers. PMID:21875935
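
    The abstract does not say which "general sequence features" feed the SVM. As a sketch under that caveat, the block below assumes one common choice, normalised k-mer frequencies, and shows only the feature-extraction step; the function name `kmer_features` and the default `k=3` are illustrative, and the resulting vectors would be passed to whatever classifier is used.

```python
from collections import Counter
from itertools import product


def kmer_features(sequence, k=3, alphabet="ACGT"):
    """Return the frequency of every possible k-mer, in lexicographic order.

    Windows containing symbols outside the alphabet (for example N) are ignored.
    """
    sequence = sequence.upper()
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    # Count every overlapping window of length k.
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    total = sum(counts[m] for m in kmers)
    # Normalise so that sequences of different lengths are comparable.
    return [counts[m] / total if total else 0.0 for m in kmers]


if __name__ == "__main__":
    vector = kmer_features("ACGTACGTTTGACA", k=2)
    print(len(vector), round(sum(vector), 6))
```

    A fixed-length vector like this depends only on the sequence itself, which matches the abstract's claim that enhancers are identified "using only genomic sequence".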

  18. Enhancing links between visual short term memory, visual attention and cognitive control processes through practice: An electrophysiological insight.

    PubMed

    Fuggetta, Giorgio; Duke, Philip A

    2017-05-01

    The operation of attention on visible objects involves a sequence of cognitive processes. The current study firstly aimed to elucidate the effects of practice on neural mechanisms underlying attentional processes as measured with both behavioural and electrophysiological measures. Secondly, it aimed to identify any pattern in the relationship between Event-Related Potential (ERP) components which play a role in the operation of attention in vision. Twenty-seven participants took part in two recording sessions one week apart, performing an experimental paradigm which combined a match-to-sample task with a memory-guided efficient visual-search task within one trial sequence. Overall, practice decreased behavioural response times, increased accuracy, and modulated several ERP components that represent cognitive and neural processing stages. This neuromodulation through practice was also associated with an enhanced link between behavioural measures and ERP components and with an enhanced cortico-cortical interaction of functionally interconnected ERP components. Principal component analysis (PCA) of the ERP amplitude data revealed three components, having different rostro-caudal topographic representations. The first component included both the centro-parietal and parieto-occipital mismatch triggered negativity - involved in integration of visual representations of the target with current task-relevant representations stored in visual working memory - loaded with second negative posterior-bilateral (N2pb) component, involved in categorising specific pop-out target features. The second component comprised the amplitude of bilateral anterior P2 - related to detection of a specific pop-out feature - loaded with bilateral anterior N2, related to detection of conflicting features, and fronto-central mismatch triggered negativity. 
The third component included the parieto-occipital N1 - related to early neural responses to the stimulus array - which loaded with the second negative posterior-contralateral (N2pc) component, mediating the process of orienting and focusing covert attention on peripheral target features. We discussed these three components as representing different neurocognitive systems modulated with practice within which the input selection process operates. Crown Copyright © 2017. Published by Elsevier B.V. All rights reserved.

  19. [DARS mutations responsible for hypomyelination with brain stem and spinal cord involvement and leg spasticity: report of two cases and review of literature].

    PubMed

    Zhang, J; Liu, M; Zhou, L; Zhang, Z B; Wang, J M; Jiang, Y W; Wu, Y

    2018-03-02

    Objective: To analyze the clinical and imaging features of hypomyelination with brain stem and spinal cord involvement and leg spasticity (HBSL) due to mutations in DARS, and to identify DARS mutations responsible for HBSL. Methods: Data on 2 HBSL patients who were admitted to the pediatric department of Peking University First Hospital from January 2009 through December 2016 were reviewed and the 2 patients were followed up. Targeted next generation sequencing, whole exome sequencing and Sanger sequencing were employed to identify potential genetic variations of the children and their parents. The clinical manifestations, MRI features and genotypic characteristics of the two patients were reviewed, and the literature was reviewed. Reported HBSL cases were searched with "leukoencephalopathies, DARS" on the databases of PubMed, Wanfang, China National Knowledge Infrastructure and VIP from 1975 to 2017. The clinical manifestations and molecular features were analyzed. Results: Both patients showed delayed motor development, but had normal cognitive development. At the last follow-up, at the age of 8 years, the most advanced motor milestone reached by case 1 was standing with help. At the last follow-up, at the age of 9 years, case 2 could walk independently. On physical examination, both showed leg spasticity, active tendon reflexes, and a positive Babinski sign. Both patients had brain MRI findings of high T2WI signal in bilateral deep cerebral white matter, slightly lower T1WI signal, and no abnormal DWI signal. Lesions of case 1 were relatively extensive and involved subcortical white matter, corpus callosum and internal capsule. Spinal MRI scans for both patients showed no abnormal signals. Novel mutations in the DARS gene, namely c.1498_1499insTCA (p.500_501insIle) and c.1210A>G (p.Met404Val) in case 1, and c.1432A>G (p.Met478Val) and c.1210A>G (p.Met404Val) in case 2, were identified. In the databases, 2 reports involving 13 foreign patients were retrieved. 
The age of disease onset was from 4 months to 18 years, and their initial symptoms were development delay or regression. Most of them presented with progressive lower extremity spasm, and the brain magnetic resonance imaging was characterized by hypomyelination in white matter. Clinical phenotypes of different age groups were significantly different. Conclusion: We have reported two patients with HBSL in China, and 3 novel mutations in DARS, which is helpful for the diagnosis and genetic counseling of HBSL.

  20. Visualizing bacterial tRNA identity determinants and antideterminants using function logos and inverse function logos

    PubMed Central

    Freyhult, Eva; Moulton, Vincent; Ardell, David H.

    2006-01-01

    Sequence logos are stacked bar graphs that generalize the notion of consensus sequence. They employ entropy statistics very effectively to display variation in a structural alignment of sequences of a common function, while emphasizing its over-represented features. Yet sequence logos cannot display features that distinguish functional subclasses within a structurally related superfamily nor do they display under-represented features. We introduce two extensions to address these needs: function logos and inverse logos. Function logos display subfunctions that are over-represented among sequences carrying a specific feature. Inverse logos generalize both sequence logos and function logos by displaying under-represented, rather than over-represented, features or functions in structural alignments. To make inverse logos, a compositional inverse is applied to the feature or function frequency distributions before logo construction, where a compositional inverse is a mathematical transform that makes common features or functions rare and vice versa. We applied these methods to a database of structurally aligned bacterial tDNAs to create highly condensed, birds-eye views of potentially all so-called identity determinants and antideterminants that confer specific amino acid charging or initiator function on tRNAs in bacteria. We recovered both known and a few potentially novel identity elements. Function logos and inverse logos are useful tools for exploratory bioinformatic analysis of structure–function relationships in sequence families and superfamilies. PMID:16473848
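
    The entropy statistic and the "compositional inverse" mentioned in this abstract can be illustrated for a single alignment column. This is a sketch under assumptions, not the authors' implementation: it uses the standard information content (maximum entropy minus observed entropy) without any small-sample correction, takes the inverse as reciprocal frequencies renormalised to sum to one, and simply drops symbols with zero frequency. The function names are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet="ACGU"):
    """Return (information content in bits, symbol frequencies) for one column.

    Assumes the column contains at least one symbol from the alphabet.
    """
    counts = Counter(column)
    n = sum(counts[s] for s in alphabet)
    freqs = {s: counts[s] / n for s in alphabet}
    entropy = -sum(p * math.log2(p) for p in freqs.values() if p > 0)
    # A fully conserved column carries log2(len(alphabet)) bits; a uniform one carries 0.
    return math.log2(len(alphabet)) - entropy, freqs


def compositional_inverse(freqs):
    """Make common symbols rare and vice versa: p_i -> (1/p_i) / sum_j (1/p_j).

    Symbols with zero frequency are dropped (an assumption of this sketch).
    """
    reciprocal = {s: 1.0 / p for s, p in freqs.items() if p > 0}
    total = sum(reciprocal.values())
    return {s: v / total for s, v in reciprocal.items()}


if __name__ == "__main__":
    info, freqs = column_information("AAAC")
    print(info, freqs)
    print(compositional_inverse(freqs))
```

    Running the logo calculation on the inverted frequencies is what lets under-represented symbols dominate the stack height, as the abstract describes for inverse logos.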

  1. Listening to sound patterns as a dynamic activity

    NASA Astrophysics Data System (ADS)

    Jones, Mari Riess

    2003-04-01

    The act of listening to a series of sounds created by some natural event is described as involving an entrainmentlike process that transpires in real time. Some aspects of this dynamic process are suggested. In particular, real-time attending is described in terms of an adaptive synchronization activity that permits a listener to target attending energy to forthcoming elements within an acoustical pattern (e.g., music, speech, etc.). Also described are several experiments that illustrate features of this approach as it applies to attending to musiclike patterns. These involve listeners' responses to changes in either the timing or the pitch structure (or both) of various acoustical sequences.

  2. Guiding students towards sensemaking: teacher questions focused on integrating scientific practices with science content

    NASA Astrophysics Data System (ADS)

    Benedict-Chambers, Amanda; Kademian, Sylvie M.; Davis, Elizabeth A.; Palincsar, Annemarie Sullivan

    2017-10-01

    Science education reforms articulate a vision of ambitious science teaching where teachers engage students in sensemaking discussions and emphasise the integration of scientific practices with science content. Learning to teach in this way is complex, and there are few examples of sensemaking discussions in schools where textbook lessons and teacher-directed discussions are the norm. The purpose of this study was to characterise the questioning practices of an experienced teacher who taught a curricular unit enhanced with educative features that emphasised students' engagement in scientific practices integrated with science content. Analyses indicated the teacher asked four types of questions: explication questions, explanation questions, science concept questions, and scientific practice questions, and she used three questioning patterns including: (1) focusing students on scientific practices, which involved a sequence of questions to turn students back to the scientific practice; (2) supporting students in naming observed phenomena, which involved a sequence of questions to help students use scientific language; and (3) guiding students in sensemaking, which involved a sequence of questions to help students learn about scientific practices, describe evidence, and develop explanations. Although many of the discussions in this study were not yet student-centred, they provide an image of a teacher asking specific questions that move students towards reform-oriented instruction. Implications for classroom practice are discussed and recommendations for future research are provided.

  3. Draft genome sequence of marine-derived Streptomyces sp. TP-A0598, a producer of anti-MRSA antibiotic lydicamycins.

    PubMed

    Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Fujita, Nobuyuki; Igarashi, Yasuhiro

    2015-01-01

    Streptomyces sp. TP-A0598, isolated from seawater, produces lydicamycin, a structurally unique type I polyketide bearing two nitrogen-containing five-membered rings, and four congeners, TPU-0037-A, -B, -C, and -D. We herein report the 8 Mb draft genome sequence of this strain, together with classification and features of the organism and generation, annotation and analysis of the genome sequence. The genome encodes 7,240 putative ORFs, of which 4,450 ORFs were assigned COG categories. Also, 66 tRNA genes and one rRNA operon were identified. The genome contains eight gene clusters involved in the production of polyketides and nonribosomal peptides. Among them, a PKS/NRPS gene cluster was identified as responsible for lydicamycin biosynthesis, and a plausible biosynthetic pathway was proposed on the basis of gene function prediction. These genome sequence data will facilitate probing the potential of secondary metabolism in marine-derived Streptomyces.

  4. A corticostriatal deficit promotes temporal distortion of automatic action in ageing

    PubMed Central

    Matamales, Miriam; Skrbis, Zala; Bailey, Matthew R; Balsam, Peter D; Balleine, Bernard W; Götz, Jürgen

    2017-01-01

    The acquisition of motor skills involves implementing action sequences that increase task efficiency while reducing cognitive loads. This learning capacity depends on specific cortico-basal ganglia circuits that are affected by normal ageing. Here, combining a series of novel behavioural tasks with extensive neuronal mapping and targeted cell manipulations in mice, we explored how ageing of cortico-basal ganglia networks alters the microstructure of action throughout sequence learning. We found that, after extended training, aged mice produced shorter actions and displayed squeezed automatic behaviours characterised by ultrafast oligomeric action chunks that correlated with deficient reorganisation of corticostriatal activity. Chemogenetic disruption of a striatal subcircuit in young mice reproduced age-related within-sequence features, and the introduction of an action-related feedback cue temporarily restored normal sequence structure in aged mice. Our results reveal static properties of aged cortico-basal ganglia networks that introduce temporal limits to action automaticity, something that can compromise procedural learning in ageing. PMID:29058672

  5. Analysis and Prediction of Exon Skipping Events from RNA-Seq with Sequence Information Using Rotation Forest.

    PubMed

    Du, Xiuquan; Hu, Changlin; Yao, Yu; Sun, Shiwei; Zhang, Yanping

    2017-12-12

    In bioinformatics, exon skipping (ES) event prediction is an essential part of alternative splicing (AS) event analysis. Although many methods have been developed to predict ES events, a solution has yet to be found. In this study, given the limitations of machine learning algorithms with RNA-Seq data or genome sequences, a new feature set, called RS (RNA-Seq and sequence) features, was constructed. These features include RNA-Seq features derived from the RNA-Seq data and sequence features derived from genome sequences. We propose a novel Rotation Forest classifier to predict ES events with the RS features (RotaF-RSES). To validate the efficacy of RotaF-RSES, a dataset from two human tissues was used, and RotaF-RSES achieved an accuracy of 98.4%, a specificity of 99.2%, a sensitivity of 94.1%, and an area under the curve (AUC) of 98.6%. When compared to the other available methods, the results indicate that RotaF-RSES is efficient and can predict ES events with RS features.
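
    For readers comparing the accuracy, specificity and sensitivity figures quoted above, the sketch below shows how these three measures follow from confusion-matrix counts. The function name and the counts in the usage example are made up for illustration and are not the study's data.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity and specificity from confusion-matrix counts.

    tp/fn: positive cases (e.g. true ES events) predicted correctly/incorrectly.
    tn/fp: negative cases predicted correctly/incorrectly.
    """
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": tp / (tp + fn),   # fraction of real positives recovered
        "specificity": tn / (tn + fp),   # fraction of real negatives recovered
    }


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show the calculation.
    print(classification_metrics(tp=8, fp=5, tn=85, fn=2))
```

    With imbalanced classes, accuracy is dominated by the larger class, which is why specificity and sensitivity are reported separately.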

  6. Intrinsic DNA curvature in trypanosomes.

    PubMed

    Smircich, Pablo; El-Sayed, Najib M; Garat, Beatriz

    2017-11-09

    Trypanosoma cruzi and Trypanosoma brucei are protozoan parasites causing Chagas disease and African sleeping sickness, displaying unique features of cellular and molecular biology. Remarkably, no canonical signals for RNA polymerase II promoters, which drive protein coding genes transcription, have been identified so far. The secondary structure of DNA has long been recognized as a signal in biological processes and more recently, its involvement in transcription initiation in Leishmania was proposed. In order to study whether this feature is conserved in trypanosomatids, we undertook a genome wide search for intrinsic DNA curvature in T. cruzi and T. brucei. Using a region integrated intrinsic curvature (RIIC) scoring that we previously developed, a non-random distribution of sequence-dependent curvature was observed. High RIIC scores were found to be significantly correlated with transcription start sites in T. cruzi, which have been mapped in divergent switch regions, whereas in T. brucei, the high RIIC scores correlated with sites that have been involved not only in RNA polymerase II initiation but also in termination. In addition, we observed regions with high RIIC score presenting in-phase tracts of Adenines, in the subtelomeric regions of the T. brucei chromosomes that harbor the variable surface glycoproteins genes. In both T. cruzi and T. brucei genomes, a link between DNA conformational signals and gene expression was found. High sequence dependent curvature is associated with transcriptional regulation regions. High intrinsic curvature also occurs at the T. brucei chromosome subtelomeric regions where the recombination processes involved in the evasion of the immune host system take place. These findings underscore the relevance of indirect DNA readout in these ancient eukaryotes.

  7. Grading of Gliomas by Using Radiomic Features on Multiple Magnetic Resonance Imaging (MRI) Sequences.

    PubMed

    Qin, Jiang-Bo; Liu, Zhenyu; Zhang, Hui; Shen, Chen; Wang, Xiao-Chun; Tan, Yan; Wang, Shuo; Wu, Xiao-Feng; Tian, Jie

    2017-05-07

    BACKGROUND Gliomas are the most common primary brain neoplasms. Misdiagnosis occurs in glioma grading due to an overlap in conventional MRI manifestations. The aim of the present study was to evaluate the power of radiomic features based on multiple MRI sequences - T2-Weighted-Imaging-FLAIR (FLAIR), T1-Weighted-Imaging-Contrast-Enhanced (T1-CE), and Apparent Diffusion Coefficient (ADC) map - in glioma grading, and to improve the power of glioma grading by combining features. MATERIAL AND METHODS Sixty-six patients with histopathologically proven gliomas underwent T2-FLAIR and T1WI-CE sequence scanning with some patients (n=63) also undergoing DWI scanning. A total of 114 radiomic features were derived with radiomic methods by using in-house software. All radiomic features were compared between high-grade gliomas (HGGs) and low-grade gliomas (LGGs). Features with significant statistical differences were selected for receiver operating characteristic (ROC) curve analysis. The relationships between significantly different radiomic features and glial fibrillary acidic protein (GFAP) expression were evaluated. RESULTS A total of 8 radiomic features from 3 MRI sequences displayed significant differences between LGGs and HGGs. FLAIR GLCM Cluster Shade, T1-CE GLCM Entropy, and ADC GLCM Homogeneity were the best features to use in differentiating LGGs and HGGs in each MRI sequence. The combined feature was best able to differentiate LGGs and HGGs, which improved the accuracy of glioma grading compared to the above features in each MRI sequence. A significant correlation was found between GFAP and T1-CE GLCM Entropy, as well as between GFAP and ADC GLCM Homogeneity. CONCLUSIONS The combined radiomic feature had the highest efficacy in distinguishing LGGs from HGGs.
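
    The ROC analysis mentioned above summarizes how well a single feature separates two groups. As a rough sketch, the area under the ROC curve can be computed as the probability that a randomly chosen case from one group scores higher than a randomly chosen case from the other. The function name and the feature values below are hypothetical; this is not the authors' in-house software.

```python
def auc(scores_pos, scores_neg):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


# Hypothetical feature values for two groups (illustration only).
hgg = [0.9, 0.8, 0.6, 0.55]
lgg = [0.7, 0.4, 0.3, 0.55]
value = auc(hgg, lgg)
print(value)
```

    The pairwise form is quadratic in the number of cases, which is acceptable for cohorts of this size; a rank-based formulation would be preferable for large datasets.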

  8. Spatio-temporal features for tracking and quadruped/biped discrimination

    NASA Astrophysics Data System (ADS)

    Rickman, Rick; Copsey, Keith; Bamber, David C.; Page, Scott F.

    2012-05-01

    Techniques such as SIFT and SURF facilitate efficient and robust image processing operations through the use of sparse and compact spatial feature descriptors and show much potential for defence and security applications. This paper considers the extension of such techniques to include information from the temporal domain, to improve utility in applications involving moving imagery within video data. In particular, the paper demonstrates how spatio-temporal descriptors can be used very effectively as the basis of a target tracking system and as target discriminators which can distinguish between bipeds and quadrupeds. Results using sequences of video imagery of walking humans and dogs are presented, and the relative merits of the approach are discussed.

  9. On the structural context and identification of enzyme catalytic residues.

    PubMed

    Chien, Yu-Tung; Huang, Shao-Wei

    2013-01-01

    Enzymes play important roles in most biological processes. Although only a small fraction of residues are directly involved in catalytic reactions, these catalytic residues are the most crucial parts of enzymes. The study of the fundamental and unique features of catalytic residues benefits the understanding of enzyme functions and catalytic mechanisms. In this work, we analyze the structural context of catalytic residues based on theoretical and experimental structure flexibility. The results show that catalytic residues have distinct structural features and context. Their neighboring residues, whether sequence or structure neighbors within a specific range, are usually structurally more rigid than those of noncatalytic residues. The structural context feature is combined with a support vector machine to identify catalytic residues from enzyme structure. The prediction results are better than or comparable to those of recent structure-based prediction methods.

  10. Two cases of X-linked juvenile retinoschisis with different optical coherence tomography findings and RS1 gene mutations.

    PubMed

    Chan, Wai Man; Choy, Kwong Wai; Wang, Jianghua; Lam, Dennis S C; Yip, Wilson W K; Fu, Weiling; Pang, Chi Pui

    2004-08-01

    The optical coherence tomography (OCT) findings, clinical features, and mutations in the RS1 gene of two unrelated patients with X-linked retinoschisis (XLRS) are reported herein. Two Chinese patients with early onset XLRS were given a comprehensive ophthalmologic examination and OCT investigation. The RS1 gene was screened for sequence alterations in all exons and splice regions. The two patients presented with different phenotypic features and OCT findings. The patient with the more severe clinical presentation had an RS1 exon 1 deletion, and a P193S mutation was found in the other patient, who had mild macular involvement. OCT demonstrates the markedly different features of XLRS patients with different RS1 mutations. This study strengthens the role of OCT in the diagnosis and monitoring of XLRS.

  11. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Sueyoshi, Eijun, E-mail: EijunSueyoshi@aol.com; Sakamoto, Ichiro; Okimoto, Tomoaki

    Amyloidosis is a rare systemic disease. However, involvement of the heart is a common finding and is the most frequent cause of death in amyloidosis. We report the sonographic, scintigraphic, and MRI features of a pathologically proven case of cardiac amyloidosis. Delayed contrast-enhanced MR images, using an inversion recovery prepped gradient-echo sequence, revealed diffuse enhancement in the wall of both left and right ventricles. This enhancement suggested expansion of the extracellular space of the myocardium caused by diffuse myocardial necrosis secondary to deposition of amyloid.

  12. Complete genome sequence of Pedobacter heparinus type strain (HIM 762-3T)

    PubMed Central

    Han, Cliff; Spring, Stefan; Lapidus, Alla; Del Rio, Tijana Glavina; Tice, Hope; Copeland, Alex; Cheng, Jan-Fang; Lucas, Susan; Chen, Feng; Nolan, Matt; Bruce, David; Goodwin, Lynne; Pitluck, Sam; Ivanova, Natalia; Mavromatis, Konstantinos; Mikhailova, Natalia; Pati, Amrita; Chen, Amy; Palaniappan, Krishna; Land, Miriam; Hauser, Loren; Chang, Yun-Juan; Jeffries, Cynthia C.; Saunders, Elizabeth; Chertkov, Olga; Brettin, Thomas; Göker, Markus; Rohde, Manfred; Bristow, Jim; Eisen, Jonathan A.; Markowitz, Victor; Hugenholtz, Philip; Kyrpides, Nikos C.; Klenk, Hans-Peter; Detter, John C.

    2009-01-01

    Pedobacter heparinus (Payza and Korn 1956) Steyn et al. 1998 comb. nov. is the type species of the rapidly growing genus Pedobacter within the family Sphingobacteriaceae of the phylum ‘Bacteroidetes’. P. heparinus is of interest, because it was the first isolated strain shown to grow with heparin as sole carbon and nitrogen source and because it produces several enzymes involved in the degradation of mucopolysaccharides. All available data about this species are based on a sole strain that was isolated from dry soil. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first report on a complete genome sequence of a member of the genus Pedobacter, and the 5,167,383 bp long single replicon genome with its 4287 protein-coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project. PMID:21304637

  13. Dual-echo ASL based assessment of motor networks: a feasibility study

    NASA Astrophysics Data System (ADS)

    Storti, Silvia Francesca; Boscolo Galazzo, Ilaria; Pizzini, Francesca B.; Menegaz, Gloria

    2018-04-01

    Objective. The dual-echo arterial spin labeling (DE-ASL) technique has recently been proposed for the simultaneous acquisition of ASL and blood-oxygenation-level-dependent (BOLD) functional magnetic resonance imaging (fMRI) data. The ability of this technique to detect functional connectivity at rest or during motor and motor imagery tasks is still unexplored, both per se and in comparison with conventional methods. The purpose is to quantify the sensitivity of the DE-ASL sequence with respect to the conventional fMRI sequence (cvBOLD) in detecting brain activations, and to assess and compare the relevance of node features in decoding the network structure. Approach. Thirteen volunteers were scanned with a pseudo-continuous DE-ASL sequence, from which the concomitant BOLD (ccBOLD) signal can be extracted simultaneously with the ASL signal. The approach consists of two steps: (i) model-based analyses for assessing brain activations at individual and group levels, followed by statistical analysis for comparing the activation elicited by the three sequences under two conditions (motor and motor imagery); (ii) brain connectivity graph-theoretical analysis for assessing and comparing the properties of the network models. Main results. Our results suggest that cvBOLD and ccBOLD have comparable sensitivity in detecting the regions involved in the active task, whereas ASL offers a higher degree of co-localization with smaller activation volumes. The connectivity results and the comparative analysis of node features across sequences revealed that there are no strong changes between rest and tasks and that the differences between the sequences are limited to a few connections. Significance. Considering the comparable sensitivity of the ccBOLD and cvBOLD sequences in detecting activated brain regions, the results demonstrate that DE-ASL can be successfully applied in functional studies, making it possible to obtain both ASL and BOLD information within a single sequence. Further, DE-ASL is a powerful technique for research and clinical applications, allowing quantitative comparisons as well as the characterization of functional connectivity.

  14. Extracting features from protein sequences to improve deep extreme learning machine for protein fold recognition.

    PubMed

    Ibrahim, Wisam; Abadeh, Mohammad Saniee

    2017-05-21

    Protein fold recognition is an important problem in bioinformatics, aimed at predicting the three-dimensional structure of a protein. One of the most challenging tasks in protein fold recognition is the extraction of efficient features from amino-acid sequences to obtain better classifiers. In this paper, we propose six descriptors to extract features from protein sequences. These descriptors are applied in the first stage of a three-stage framework, PCA-DELM-LDA, to extract feature vectors from the amino-acid sequences. Principal Component Analysis (PCA) is used to reduce the number of extracted features. The extracted feature vectors are used together with the original features to improve the performance of the Deep Extreme Learning Machine (DELM) in the second stage. Four new features are extracted from the second stage and used in the third stage by Linear Discriminant Analysis (LDA) to classify the instances into 27 folds. The proposed framework is implemented on the independent and combined feature sets in SCOP datasets. The experimental results show that the feature vectors extracted in the first stage can improve the performance of DELM in extracting new useful features in the second stage. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Comparative sequence analysis suggests a conserved gating mechanism for TRP channels

    PubMed Central

    Palovcak, Eugene; Delemotte, Lucie; Klein, Michael L.

    2015-01-01

    The transient receptor potential (TRP) channel superfamily plays a central role in transducing diverse sensory stimuli in eukaryotes. Although dissimilar in sequence and domain organization, all known TRP channels act as polymodal cellular sensors and form tetrameric assemblies similar to those of their distant relatives, the voltage-gated potassium (Kv) channels. Here, we investigated the related questions of whether the allosteric mechanism underlying polymodal gating is common to all TRP channels, and how this mechanism differs from that underpinning Kv channel voltage sensitivity. To provide insight into these questions, we performed comparative sequence analysis on large, comprehensive ensembles of TRP and Kv channel sequences, contextualizing the patterns of conservation and correlation observed in the TRP channel sequences in light of the well-studied Kv channels. We report sequence features that are specific to TRP channels and, based on insight from recent TRPV1 structures, we suggest a model of TRP channel gating that differs substantially from the one mediating voltage sensitivity in Kv channels. The common mechanism underlying polymodal gating involves the displacement of a defect in the H-bond network of S6 that changes the orientation of the pore-lining residues at the hydrophobic gate. PMID:26078053

  16. TagDust2: a generic method to extract reads from sequencing data.

    PubMed

    Lassmann, Timo

    2015-01-28

    Arguably the most basic step in the analysis of next generation sequencing data (NGS) involves the extraction of mappable reads from the raw reads produced by sequencing instruments. The presence of barcodes, adaptors and artifacts subject to sequencing errors makes this step non-trivial. Here I present TagDust2, a generic approach utilizing a library of hidden Markov models (HMM) to accurately extract reads from a wide array of possible read architectures. TagDust2 extracts more reads of higher quality compared to other approaches. Processing of multiplexed single, paired end and libraries containing unique molecular identifiers is fully supported. Two additional post processing steps are included to exclude known contaminants and filter out low complexity sequences. Finally, TagDust2 can automatically detect the library type of sequenced data from a predefined selection. Taken together TagDust2 is a feature rich, flexible and adaptive solution to go from raw to mappable NGS reads in a single step. The ability to recognize and record the contents of raw reads will help to automate and demystify the initial, and often poorly documented, steps in NGS data analysis pipelines. TagDust2 is freely available at: http://tagdust.sourceforge.net .

  17. Novel techniques for data decomposition and load balancing for parallel processing of vision systems: Implementation and evaluation using a motion estimation system

    NASA Technical Reports Server (NTRS)

    Choudhary, Alok Nidhi; Leung, Mun K.; Huang, Thomas S.; Patel, Janak H.

    1989-01-01

    Computer vision systems employ a sequence of vision algorithms in which the output of an algorithm is the input of the next algorithm in the sequence. Algorithms that constitute such systems exhibit vastly different computational characteristics, and therefore, require different data decomposition techniques and efficient load balancing techniques for parallel implementation. However, since the input data for a task is produced as the output data of the previous task, this information can be exploited to perform knowledge based data decomposition and load balancing. Presented here are algorithms for a motion estimation system. The motion estimation is based on the point correspondence between the involved images which are a sequence of stereo image pairs. Researchers propose algorithms to obtain point correspondences by matching feature points among stereo image pairs at any two consecutive time instants. Furthermore, the proposed algorithms employ non-iterative procedures, which results in saving considerable amounts of computation time. The system consists of the following steps: (1) extraction of features; (2) stereo match of images in one time instant; (3) time match of images from consecutive time instants; (4) stereo match to compute final unambiguous points; and (5) computation of motion parameters.

  18. Human β-glucuronidase: structure, function, and application in enzyme replacement therapy.

    PubMed

    Naz, Huma; Islam, Asimul; Waheed, Abdul; Sly, William S; Ahmad, Faizan; Hassan, Imtaiyaz

    2013-10-01

    Lysosomal storage diseases occur due to incomplete metabolic degradation of macromolecules by various hydrolytic enzymes in the lysosome. Despite structural differences, most of the lysosomal enzymes share many common features including a lysosomal targeting motif and phosphotransferase recognition sites. β-Glucuronidase (GUSB) is an important lysosomal enzyme involved in the degradation of glucuronate-containing glycosaminoglycan. The deficiency of GUSB causes mucopolysaccharidosis type VII (MPSVII), leading to lysosomal storage in the brain. GUSB is a well-studied protein for its expression, sequence, structure, and function. The purpose of this review is to summarize our current understanding of sequence, structure, function, and evolution of GUSB and its lysosomal enzyme targeting. Enzyme replacement therapy reported for this protein is also discussed.

  19. Wheat beta-expansin (EXPB11) genes: Identification of the expressed gene on chromosome 3BS carrying a pollen allergen domain

    PubMed Central

    2010-01-01

    Background Expansins form a large multi-gene family found in wheat and other cereal genomes that are involved in the expansion of cell walls as a tissue grows. The expansin family can be divided into two main groups, namely, alpha-expansin (EXPA) and beta-expansin (EXPB) proteins, with the EXPB group being of particular interest as group 1 pollen allergens. Results In this study, three beta-expansin genes were identified and characterized from a newly sequenced region of the Triticum aestivum cv. Chinese Spring chromosome 3B physical map at the Sr2 locus (FPC contig ctg11). The analysis of a 357 kb sub-sequence of FPC contig ctg11 identified one of the beta-expansin genes as TaEXPB11, originally identified as a cDNA from the wheat cv. Wyuna. Through the analysis of intron sequences of the three wheat cv. Chinese Spring genes, we propose that two of these beta-expansin genes are duplications of the TaEXPB11 gene. Comparative sequence analysis with two other wheat cultivars (cv. Westonia and cv. Hope) and a Triticum aestivum var. spelta line validated the identification of the Chinese Spring variant of TaEXPB11. The expression in maternal and grain tissues was confirmed by examining EST databases and carrying out RT-PCR experiments. Detailed examination of the position of TaEXPB11 relative to the locus encoding Sr2 disease resistance ruled out the possibility of this gene directly contributing to the resistance phenotype. Conclusions Through 3-D structural protein comparisons with Zea mays EXPB1, we proposed that variations within the coding sequence of TaEXPB11 in wheats may produce a functional change within features such as domain 1, related to possible involvement in cell wall structure, and domain 2, defining the pollen allergen domain and binding to IgE protein. The variation established in this gene suggests it is a clearly identifiable member of a gene family and reflects the dynamic features of the wheat genome as it adapted to a range of different environments and uses. Accession Numbers: ctg11 = FN564426. Survey sequences of TaEXPB11ws and TsEXPB11 are provided on request. PMID:20507562

  20. Somatic mutations affect key pathways in lung adenocarcinoma

    PubMed Central

    Ding, Li; Getz, Gad; Wheeler, David A.; Mardis, Elaine R.; McLellan, Michael D.; Cibulskis, Kristian; Sougnez, Carrie; Greulich, Heidi; Muzny, Donna M.; Morgan, Margaret B.; Fulton, Lucinda; Fulton, Robert S.; Zhang, Qunyuan; Wendl, Michael C.; Lawrence, Michael S.; Larson, David E.; Chen, Ken; Dooling, David J.; Sabo, Aniko; Hawes, Alicia C.; Shen, Hua; Jhangiani, Shalini N.; Lewis, Lora R.; Hall, Otis; Zhu, Yiming; Mathew, Tittu; Ren, Yanru; Yao, Jiqiang; Scherer, Steven E.; Clerc, Kerstin; Metcalf, Ginger A.; Ng, Brian; Milosavljevic, Aleksandar; Gonzalez-Garay, Manuel L.; Osborne, John R.; Meyer, Rick; Shi, Xiaoqi; Tang, Yuzhu; Koboldt, Daniel C.; Lin, Ling; Abbott, Rachel; Miner, Tracie L.; Pohl, Craig; Fewell, Ginger; Haipek, Carrie; Schmidt, Heather; Dunford-Shore, Brian H.; Kraja, Aldi; Crosby, Seth D.; Sawyer, Christopher S.; Vickery, Tammi; Sander, Sacha; Robinson, Jody; Winckler, Wendy; Baldwin, Jennifer; Chirieac, Lucian R.; Dutt, Amit; Fennell, Tim; Hanna, Megan; Johnson, Bruce E.; Onofrio, Robert C.; Thomas, Roman K.; Tonon, Giovanni; Weir, Barbara A.; Zhao, Xiaojun; Ziaugra, Liuda; Zody, Michael C.; Giordano, Thomas; Orringer, Mark B.; Roth, Jack A.; Spitz, Margaret R.; Wistuba, Ignacio I.; Ozenberger, Bradley; Good, Peter J.; Chang, Andrew C.; Beer, David G.; Watson, Mark A.; Ladanyi, Marc; Broderick, Stephen; Yoshizawa, Akihiko; Travis, William D.; Pao, William; Province, Michael A.; Weinstock, George M.; Varmus, Harold E.; Gabriel, Stacey B.; Lander, Eric S.; Gibbs, Richard A.; Meyerson, Matthew; Wilson, Richard K.

    2009-01-01

    Determining the genetic basis of cancer requires comprehensive analyses of large collections of histopathologically well-classified primary tumours. Here we report the results of a collaborative study to discover somatic mutations in 188 human lung adenocarcinomas. DNA sequencing of 623 genes with known or potential relationships to cancer revealed more than 1,000 somatic mutations across the samples. Our analysis identified 26 genes that are mutated at significantly high frequencies and thus are probably involved in carcinogenesis. The frequently mutated genes include tyrosine kinases, among them the EGFR homologue ERBB4; multiple ephrin receptor genes, notably EPHA3; vascular endothelial growth factor receptor KDR; and NTRK genes. These data provide evidence of somatic mutations in primary lung adenocarcinoma for several tumour suppressor genes involved in other cancers—including NF1, APC, RB1 and ATM—and for sequence changes in PTPRD as well as the frequently deleted gene LRP1B. The observed mutational profiles correlate with clinical features, smoking status and DNA repair defects. These results are reinforced by data integration including single nucleotide polymorphism array and gene expression array. Our findings shed further light on several important signalling pathways involved in lung adenocarcinoma, and suggest new molecular targets for treatment. PMID:18948947

  1. Magnetic resonance features of cerebral malaria.

    PubMed

    Yadav, P; Sharma, R; Kumar, S; Kumar, U

    2008-06-01

    Cerebral malaria is a major health hazard with high mortality. The disease is endemic in many developing countries, but with increasing tourism, occasional cases may be detected in countries where the disease is not prevalent. Early diagnosis and evaluation of cerebral involvement in malaria using modern imaging modalities have an impact on treatment and clinical outcome. The purpose of this study was to evaluate the magnetic resonance (MR) features of patients with cerebral malaria presenting with altered sensorium. We present the findings in three such patients. MR imaging was carried out using a 1.5-Tesla unit. The sequences performed were 5-mm-thick T1-weighted, T2-weighted, fluid-attenuated inversion-recovery (FLAIR), and T2-weighted gradient-echo axial sequences, and sagittal and coronal FLAIR. Diffusion-weighted imaging was performed with b values of 0 and 1000 s/mm², and apparent diffusion coefficient (ADC) maps were obtained. Focal hyperintensities in the bilateral periventricular white matter, corpus callosum, occipital subcortex, and bilateral thalami were noticed on T2-weighted and FLAIR sequences. The lesions were more marked in the splenium of the corpus callosum. No enhancement on postcontrast T1-weighted MR images was observed. There was no evidence of restricted diffusion on the diffusion-weighted sequence and ADC map. MR is a sensitive imaging modality, with a role in the assessment of cerebral lesions in malaria. Focal white matter and corpus callosal lesions without any restricted diffusion were the key findings in our patients.

  2. Predicting discovery rates of genomic features.

    PubMed

    Gravel, Simon

    2014-06-01

    Successful sequencing experiments require judicious sample selection. However, this selection must often be performed on the basis of limited preliminary data. Predicting the statistical properties of the final sample based on preliminary data can be challenging, because numerous uncertain model assumptions may be involved. Here, we ask whether we can predict "omics" variation across many samples by sequencing only a fraction of them. In the infinite-genome limit, we find that a pilot study sequencing 5% of a population is sufficient to predict the number of genetic variants in the entire population within 6% of the correct value, using an estimator agnostic to demography, selection, or population structure. To reach similar accuracy in a finite genome with millions of polymorphisms, the pilot study would require ∼15% of the population. We present computationally efficient jackknife and linear programming methods that exhibit substantially less bias than the state of the art when applied to simulated data and subsampled 1000 Genomes Project data. Extrapolating based on the National Heart, Lung, and Blood Institute Exome Sequencing Project data, we predict that 7.2% of sites in the capture region would be variable in a sample of 50,000 African Americans and 8.8% in a European sample of equal size. Finally, we show how the linear programming method can also predict discovery rates of various genomic features, such as the number of transcription factor binding sites across different cell types. Copyright © 2014 by the Genetics Society of America.
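
    To give a feel for the kind of extrapolation described above, the sketch below uses the textbook first-order jackknife, which adds a correction based on variants seen in exactly one sampled individual. This is an illustrative stand-in and should not be read as the estimator developed in the paper; the function name and the toy pilot data are hypothetical.

```python
def jackknife1_total(variant_carriers: dict, n_samples: int) -> float:
    """First-order jackknife estimate of the total number of variants.

    variant_carriers maps each observed variant to the number of sampled
    individuals carrying it; n_samples is the size of the pilot sample.
    """
    observed = len(variant_carriers)
    # Variants carried by exactly one sampled individual drive the correction.
    singletons = sum(1 for count in variant_carriers.values() if count == 1)
    return observed + singletons * (n_samples - 1) / n_samples


# Hypothetical pilot data: variant id -> number of carriers among 10 individuals.
pilot = {"v1": 1, "v2": 3, "v3": 1, "v4": 2, "v5": 1}
estimate = jackknife1_total(pilot, n_samples=10)
print(estimate)
```

    The intuition is that many singletons in a pilot sample signal many variants that have not been seen yet, so the estimate rises above the observed count.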

  3. Systematic discovery of novel eukaryotic transcriptional regulators using sequence homology independent prediction

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Bossi, Flavia; Fan, Jue; Xiao, Jun

    Here, the molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized. As a result, to identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation. We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription. Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.

  4. Systematic discovery of novel eukaryotic transcriptional regulators using sequence homology independent prediction

    DOE PAGES

    Bossi, Flavia; Fan, Jue; Xiao, Jun; ...

    2017-06-26

    Here, the molecular function of a gene is most commonly inferred by sequence similarity. Therefore, genes that lack sufficient sequence similarity to characterized genes (such as certain classes of transcriptional regulators) are difficult to classify using most function prediction algorithms and have remained uncharacterized. As a result, to identify novel transcriptional regulators systematically, we used a feature-based pipeline to screen protein families of unknown function. This method predicted 43 transcriptional regulator families in Arabidopsis thaliana, 7 families in Drosophila melanogaster, and 9 families in Homo sapiens. Literature curation validated 12 of the predicted families to be involved in transcriptional regulation. We tested 33 out of the 195 Arabidopsis putative transcriptional regulators for their ability to activate transcription of a reporter gene in planta and found twelve coactivators, five of which had no prior literature support. To investigate mechanisms of action in which the predicted regulators might work, we looked for interactors of an Arabidopsis candidate that did not show transactivation activity in planta and found that it might work with other members of its own family and a subunit of the Polycomb Repressive Complex 2 to regulate transcription. Our results demonstrate the feasibility of assigning molecular function to proteins of unknown function without depending on sequence similarity. In particular, we identified novel transcriptional regulators using biological features enriched in transcription factors. The predictions reported here should accelerate the characterization of novel regulators.

  5. Genome sequence of Methanobacterium congolense strain Buetzberg, a hydrogenotrophic, methanogenic archaeon, isolated from a mesophilic industrial-scale biogas plant utilizing bio-waste.

    PubMed

    Tejerizo, Gonzalo Torres; Kim, Yong Sung; Maus, Irena; Wibberg, Daniel; Winkler, Anika; Off, Sandra; Pühler, Alfred; Scherer, Paul; Schlüter, Andreas

    2017-04-10

    Methanogenic Archaea are of importance at the end of the anaerobic digestion (AD) chain for biomass conversion. They finally produce methane, the end-product of AD. Among this group of microorganisms, members of the genus Methanobacterium are ubiquitously present in anaerobic habitats, such as bioreactors. The genome of a novel methanogenic archaeon, namely Methanobacterium congolense Buetzberg, originally isolated from a mesophilic biogas plant, was completely sequenced to analyze putative adaptive genome features conferring competitiveness of this isolate within the biogas reactor environment. Sequencing and assembly of the M. congolense Buetzberg genome yielded a chromosome with a size of 2,451,457bp and a mean GC-content of 38.51%. Additionally, a plasmid with a size of 18,118bp, featuring a GC content of 36.05% was identified. The M. congolense Buetzberg plasmid showed no sequence similarities with the plasmids described previously suggesting that it represents a new plasmid type. Analysis of the M. congolense Buetzberg chromosome architecture revealed a high collinearity with the Methanobacterium paludis chromosome. Furthermore, annotation of the genome and functional predictions disclosed several genes involved in cell wall and membrane biogenesis. Compilation of specific genes among Methanobacterium strains originating from AD environments revealed 474 genetic determinants that could be crucial for adaptation of these strains to specific conditions prevailing in AD habitats. Copyright © 2017 Elsevier B.V. All rights reserved.
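
    The GC-content figures reported above are simple base-composition percentages. A minimal sketch of the calculation follows; the function name and the toy sequence are hypothetical, and ignoring ambiguous bases such as N is an assumption of this example rather than something stated in the abstract.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G and C bases among the unambiguous A/C/G/T bases."""
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


# Toy sequence for illustration; the two N bases are ignored.
toy = "ATGCGCNNATTA"
print(gc_content(toy))
```

    Comparing the chromosome and plasmid values in this way is a common first check for horizontally acquired replicons, since a differing base composition can hint at a distinct origin.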

  6. LINE-1 retrotransposons: from 'parasite' sequences to functional elements.

    PubMed

    Paço, Ana; Adega, Filomena; Chaves, Raquel

    2015-02-01

    Long interspersed nuclear elements-1 (LINE-1) are the most abundant and active retrotransposons in the mammalian genomes. Traditionally, the occurrence of LINE-1 sequences in the genome of mammals has been explained by the selfish DNA hypothesis. Nevertheless, recently, it has also been argued that these sequences could play important roles in these genomes, as in the regulation of gene expression, genome modelling and X-chromosome inactivation. The non-random chromosomal distribution is a striking feature of these retroelements that somehow reflects its functionality. In the present study, we have isolated and analysed a fraction of the open reading frame 2 (ORF2) LINE-1 sequence from three rodent species, Cricetus cricetus, Peromyscus eremicus and Praomys tullbergi. Physical mapping of the isolated sequences revealed an interspersed longitudinal AT pattern of distribution along all the chromosomes of the complement in the three genomes. A detailed analysis shows that these sequences are preferentially located in the euchromatic regions, although some signals could be detected in the heterochromatin. In addition, a coincidence between the location of imprinted gene regions (as Xist and Tsix gene regions) and the LINE-1 retroelements was also observed. According to these results, we propose an involvement of LINE-1 sequences in different genomic events as gene imprinting, X-chromosome inactivation and evolution of repetitive sequences located at the heterochromatic regions (e.g. satellite DNA sequences) of the rodents' genomes analysed.

  7. A streamlined method for analysing genome-wide DNA methylation patterns from low amounts of FFPE DNA.

    PubMed

    Ludgate, Jackie L; Wright, James; Stockwell, Peter A; Morison, Ian M; Eccles, Michael R; Chatterjee, Aniruddha

    2017-08-31

    Formalin-fixed paraffin-embedded (FFPE) tumor samples are a major source of DNA from patients in cancer research. However, FFPE is a challenging material to work with due to macromolecular fragmentation and nucleic acid crosslinking. FFPE tissue poses particular challenges for methylation analysis and for preparing sequencing-based libraries relying on bisulfite conversion. Successful bisulfite conversion is a key requirement for sequencing-based methylation analysis. Here we describe a complete and streamlined workflow for preparing next generation sequencing libraries for methylation analysis from FFPE tissues. This includes counting cells from FFPE blocks and extracting DNA from FFPE slides, testing bisulfite conversion efficiency with a polymerase chain reaction (PCR) based test, preparing reduced representation bisulfite sequencing libraries and massively parallel sequencing. The main features and advantages of this protocol are: (1) an optimized method for extracting good quality DNA from FFPE tissues; (2) an efficient bisulfite conversion and next generation sequencing library preparation protocol that uses 50 ng DNA from FFPE tissue; (3) incorporation of a PCR-based test to assess bisulfite conversion efficiency prior to sequencing. We provide a complete workflow and an integrated protocol for performing DNA methylation analysis at the genome scale, and we believe this will facilitate clinical epigenetic research that involves the use of FFPE tissue.

  8. Prediction of enhancer-promoter interactions via natural language processing.

    PubMed

    Zeng, Wanwen; Wu, Mengmeng; Jiang, Rui

    2018-05-09

    Precise identification of three-dimensional genome organization, especially enhancer-promoter interactions (EPIs), is important to deciphering gene regulation, cell differentiation and disease mechanisms. Currently, it is a challenging task to distinguish true interactions from other nearby non-interacting ones since the power of traditional experimental methods is limited due to low resolution or low throughput. We propose a novel computational framework EP2vec to assay three-dimensional genomic interactions. We first extract sequence embedding features, defined as fixed-length vector representations learned from variable-length sequences using an unsupervised deep learning method in natural language processing. Then, we train a classifier to predict EPIs using the learned representations in a supervised way. Experimental results demonstrate that EP2vec obtains F1 scores ranging from 0.841 to 0.933 on different datasets, which outperforms existing methods. We prove the robustness of sequence embedding features by carrying out sensitivity analysis. Besides, we identify motifs that represent cell line-specific information through analysis of the learned sequence embedding features by adopting an attention mechanism. Last, we show that even superior performance, with F1 scores of 0.889 to 0.940, can be achieved by combining sequence embedding features and experimental features. EP2vec sheds light on feature extraction for DNA sequences of arbitrary lengths and provides a powerful approach for EPI identification.
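
    The F1 score reported above is the harmonic mean of precision and recall. The following is a minimal sketch of that metric for binary labels (1 = interacting pair, 0 = non-interacting pair). The function name and the toy labels are illustrative assumptions and are not taken from EP2vec.

```python
def f1_score(y_true, y_pred):
    """F1 for binary labels, where 1 marks the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy example (made-up labels): 2 true positives, 1 false positive, 1 false negative.
score = f1_score([1, 1, 1, 0, 0], [1, 1, 0, 1, 0])
```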

  9. From the Cover: Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features

    NASA Astrophysics Data System (ADS)

    Derelle, Evelyne; Ferraz, Conchita; Rombauts, Stephane; Rouzé, Pierre; Worden, Alexandra Z.; Robbens, Steven; Partensky, Frédéric; Degroeve, Sven; Echeynié, Sophie; Cooke, Richard; Saeys, Yvan; Wuyts, Jan; Jabbari, Kamel; Bowler, Chris; Panaud, Olivier; Piégu, Benoît; Ball, Steven G.; Ral, Jean-Philippe; Bouget, François-Yves; Piganeau, Gwenael; de Baets, Bernard; Picard, André; Delseny, Michel; Demaille, Jacques; van de Peer, Yves; Moreau, Hervé

    2006-08-01

    The green lineage is reportedly 1,500 million years old, evolving shortly after the endosymbiosis event that gave rise to early photosynthetic eukaryotes. In this study, we unveil the complete genome sequence of an ancient member of this lineage, the unicellular green alga Ostreococcus tauri (Prasinophyceae). This cosmopolitan marine primary producer is the world's smallest free-living eukaryote known to date. Features likely reflecting optimization of environmentally relevant pathways, including resource acquisition, unusual photosynthesis apparatus, and genes potentially involved in C4 photosynthesis, were observed, as was downsizing of many gene families. Overall, the 12.56-Mb nuclear genome has an extremely high gene density, in part because of extensive reduction of intergenic regions and other forms of compaction such as gene fusion. However, the genome is structurally complex. It exhibits previously unobserved levels of heterogeneity for a eukaryote. Two chromosomes differ structurally from the other eighteen. Both have a significantly biased G+C content, and, remarkably, they contain the majority of transposable elements. Many chromosome 2 genes also have unique codon usage and splicing, but phylogenetic analysis and composition do not support alien gene origin. In contrast, most chromosome 19 genes show no similarity to green lineage genes and a large number of them are specialized in cell surface processes. Taken together, the complete genome sequence, unusual features, and downsized gene families, make O. tauri an ideal model system for research on eukaryotic genome evolution, including chromosome specialization and green lineage ancestry. genome heterogeneity | genome sequence | green alga | Prasinophyceae | gene prediction

  10. HMPAS: Human Membrane Protein Analysis System

    PubMed Central

    2013-01-01

    Background Membrane proteins perform essential roles in diverse cellular functions and are regarded as major pharmaceutical targets. The significance of membrane proteins has led to the development of dozens of resources related to membrane proteins. However, most of these resources are built for specific well-known membrane protein groups, making it difficult to find common and specific features of various membrane protein groups. Methods We collected human membrane proteins from the dispersed resources and predicted novel membrane protein candidates by using ortholog information and our membrane protein classifiers. The membrane proteins were classified according to the type of interaction with the membrane, subcellular localization, and molecular function. We also made a new feature dataset to characterize the membrane proteins in various aspects including membrane protein topology, domain, biological process, disease, and drug. Moreover, protein structure and ICD-10-CM based integrated disease and drug information was newly included. To analyze the comprehensive information of membrane proteins, we implemented analysis tools to identify novel sequence and functional features of the classified membrane protein groups and to extract features from protein sequences. Results We constructed HMPAS with 28,509 collected known membrane proteins and 8,076 newly predicted candidates. This system provides integrated information on human membrane proteins individually and in groups organized by 45 subcellular locations and 1,401 molecular functions. As a case study, we identified associations between the membrane proteins and diseases and show that membrane proteins are promising targets for diseases related to the nervous system and circulatory system. A web-based interface of this system was constructed to help researchers not only retrieve organized information on individual proteins but also use the tools to analyze the membrane proteins. Conclusions HMPAS provides comprehensive information about human membrane proteins including specific features of certain membrane protein groups. In this system, users can acquire information on individual proteins and specified groups focused on their conserved sequence features, involved cellular processes, and diseases. HMPAS may contribute as a valuable resource for the inference of novel cellular mechanisms and pharmaceutical targets associated with the human membrane proteins. HMPAS is freely available at http://fcode.kaist.ac.kr/hmpas. PMID:24564858

  11. Insights into heliobacterial photosynthesis and physiology from the genome of Heliobacterium modesticaldum.

    PubMed

    Sattley, W Matthew; Blankenship, Robert E

    2010-06-01

    The complete annotated genome sequence of Heliobacterium modesticaldum strain Ice1 provides our first glimpse into the genetic potential of the Heliobacteriaceae, a unique family of anoxygenic phototrophic bacteria. H. modesticaldum str. Ice1 is the first completely sequenced phototrophic representative of the Firmicutes, and heliobacteria are the only phototrophic members of this large bacterial phylum. The H. modesticaldum genome consists of a single 3.1-Mb circular chromosome with no plasmids. Of special interest are genomic features that lend insight to the physiology and ecology of heliobacteria, including the genetic inventory of the photosynthesis gene cluster. Genes involved in transport, photosynthesis, and central intermediary metabolism are described and catalogued. The obligately heterotrophic metabolism of heliobacteria is a key feature of the physiology and evolution of these phototrophs. The conspicuous absence of recognizable genes encoding the enzyme ATP-citrate lyase prevents autotrophic growth via the reverse citric acid cycle in heliobacteria, thus being a distinguishing differential characteristic between heliobacteria and green sulfur bacteria. The identities of electron carriers that enable energy conservation by cyclic light-driven electron transfer remain in question.

  12. Enhanced flyby science with onboard computer vision: Tracking and surface feature detection at small bodies

    NASA Astrophysics Data System (ADS)

    Fuchs, Thomas J.; Thompson, David R.; Bue, Brian D.; Castillo-Rogez, Julie; Chien, Steve A.; Gharibian, Dero; Wagstaff, Kiri L.

    2015-10-01

    Spacecraft autonomy is crucial to increase the science return of optical remote sensing observations at distant primitive bodies. To date, most small bodies exploration has involved short timescale flybys that execute prescripted data collection sequences. Light time delay means that the spacecraft must operate completely autonomously without direct control from the ground, but in most cases the physical properties and morphologies of prospective targets are unknown before the flyby. Surface features of interest are highly localized, and successful observations must account for geometry and illumination constraints. Under these circumstances onboard computer vision can improve science yield by responding immediately to collected imagery. It can reacquire bad data or identify features of opportunity for additional targeted measurements. We present a comprehensive framework for onboard computer vision for flyby missions at small bodies. We introduce novel algorithms for target tracking, target segmentation, surface feature detection, and anomaly detection. The performance and generalization power are evaluated in detail using expert annotations on data sets from previous encounters with primitive bodies.

  13. Arabidopsis thaliana telomeres exhibit euchromatic features

    PubMed Central

    Vaquero-Sedas, María I.; Gámez-Arjona, Francisco M.; Vega-Palas, Miguel A.

    2011-01-01

    Telomere function is influenced by chromatin structure and organization, which usually involves epigenetic modifications. We describe here the chromatin structure of Arabidopsis thaliana telomeres. Based on the study of six different epigenetic marks we show that Arabidopsis telomeres exhibit euchromatic features. In contrast, subtelomeric regions and telomeric sequences present at interstitial chromosomal loci are heterochromatic. Histone methyltransferases and the chromatin remodeling protein DDM1 control subtelomeric heterochromatin formation. Whereas histone methyltransferases are required for histone H3K9me2 and non-CpG DNA methylation, DDM1 directs CpG methylation but not H3K9me2 or non-CpG methylation. These results argue that both kinds of proteins participate in different pathways to reinforce subtelomeric heterochromatin formation. PMID:21071395

  14. Proteiniphilum saccharofermentans str. M3/6T isolated from a laboratory biogas reactor is versatile in polysaccharide and oligopeptide utilization as deduced from genome-based metabolic reconstructions.

    PubMed

    Tomazetto, Geizecler; Hahnke, Sarah; Wibberg, Daniel; Pühler, Alfred; Klocke, Michael; Schlüter, Andreas

    2018-06-01

    Proteiniphilum saccharofermentans str. M3/6T is a recently described species within the family Porphyromonadaceae (phylum Bacteroidetes), which was isolated from a mesophilic laboratory-scale biogas reactor. The genome of the strain was completely sequenced and manually annotated to reconstruct its metabolic potential regarding biomass degradation and fermentation pathways. The P. saccharofermentans str. M3/6T genome consists of a 4,414,963 bp chromosome featuring an average GC-content of 43.63%. Genome analyses revealed that the strain possesses 3396 protein-coding sequences. Among them are 158 genes assigned to the carbohydrate-active-enzyme families as defined by the CAZy database, including 116 genes encoding glycosyl hydrolases (GHs) involved in pectin, arabinogalactan, hemicellulose (arabinan, xylan, mannan, β-glucans), starch, fructan and chitin degradation. The strain also features several transporter genes, some of which are located in polysaccharide utilization loci (PUL). PUL gene products are involved in glycan binding, transport and utilization at the cell surface. In the genome of strain M3/6T, 64 PUL are present, most of them in association with genes encoding carbohydrate-active enzymes. Accordingly, the strain was predicted to metabolize several sugars yielding carbon dioxide, hydrogen, acetate, formate, propionate and isovalerate as end-products of the fermentation process. Moreover, P. saccharofermentans str. M3/6T encodes extracellular and intracellular proteases and transporters predicted to be involved in protein and oligopeptide degradation. Comparative analyses between P. saccharofermentans str. M3/6T and its closest described relative P. acetatigenes str. DSM 18083T indicate that both strains share a similar metabolism regarding decomposition of complex carbohydrates and fermentation of sugars.

  15. repRNA: a web server for generating various feature vectors of RNA sequences.

    PubMed

    Liu, Bin; Liu, Fule; Fang, Longyun; Wang, Xiaolong; Chou, Kuo-Chen

    2016-02-01

    With the rapid growth of RNA sequences generated in the postgenomic age, it is highly desirable to develop a flexible method that can generate various kinds of vectors to represent these sequences by focusing on their different features. This is because nearly all the existing machine-learning methods, such as SVM (support vector machine) and KNN (k-nearest neighbor), can only handle vectors but not sequences. To meet the increasing demands and speed up genome analyses, we have developed a new web server, called "representations of RNA sequences" (repRNA). Compared with the existing methods, repRNA is much more comprehensive, flexible and powerful, as reflected by the following facts: (1) it can generate 11 different modes of feature vectors for users to choose according to their investigation purposes; (2) it allows users to select the features from 22 built-in physicochemical properties and even from properties defined by the users themselves; (3) the resultant feature vectors and the secondary structures of the corresponding RNA sequences can be visualized. The repRNA web server is freely accessible to the public at http://bioinformatics.hitsz.edu.cn/repRNA/.
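
    To illustrate what turning a variable-length sequence into a fixed-length vector means, the sketch below computes a normalized k-mer composition vector for an RNA string. It is one common encoding, shown under stated assumptions; it is not the repRNA implementation, and the function name `kmer_vector` and its parameters are hypothetical.

```python
from itertools import product


def kmer_vector(seq, k=2, alphabet="ACGU"):
    """Return the normalized k-mer frequencies of seq as a list of length 4**k."""
    # Fixed ordering of all possible k-mers, so every sequence maps to the same axes.
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing unknown symbols such as N
            counts[window] += 1
            total += 1
    return [counts[w] / total if total else 0.0 for w in kmers]


# Sequences of different lengths yield vectors of the same length (16 for k=2).
v1 = kmer_vector("AUGGCUAAGC")
v2 = kmer_vector("GGGAAACCCUUUAGCA")
```

    Because both vectors have the same dimension, they can be passed directly to vector-based learners such as SVM or KNN.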

  16. An Alignment-Free Algorithm in Comparing the Similarity of Protein Sequences Based on Pseudo-Markov Transition Probabilities among Amino Acids

    PubMed Central

    Li, Yushuang; Yang, Jiasheng; Zhang, Yi

    2016-01-01

    In this paper, we propose a novel alignment-free method for comparing the similarity of protein sequences. We first encode a protein sequence into a 440-dimensional feature vector consisting of a 400-dimensional Pseudo-Markov transition probability vector among the 20 amino acids, a 20-dimensional content ratio vector, and a 20-dimensional position ratio vector of the amino acids in the sequence. By evaluating the Euclidean distances among the representing vectors, we compare the similarity of protein sequences. We then apply this method to the ND5 dataset consisting of the ND5 protein sequences of 9 species, and to the F10 and G11 datasets representing two of the xylanase-containing glycoside hydrolase families, i.e., families 10 and 11. As a result, our method achieves a correlation coefficient of 0.962 with the canonical protein sequence aligner ClustalW in the ND5 dataset, much higher than those of 5 other popular alignment-free methods. In addition, we successfully separate the xylanase sequences in the F10 family and the G11 family and illustrate that the F10 family is more heat stable than the G11 family, consistent with a few previous studies. Moreover, we prove mathematically an identity equation involving the Pseudo-Markov transition probability vector and the amino acid content ratio vector. PMID:27918587
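
    The general idea of encoding sequences as vectors and comparing them by Euclidean distance can be sketched as follows. The abstract does not define the Pseudo-Markov transition probabilities or the position ratio vector, so this sketch substitutes plain first-order transition frequencies (400 values) plus amino acid content ratios (20 values) and omits the position ratio part entirely. All names here are hypothetical, and the result is a 420-dimensional stand-in, not the authors' 440-dimensional encoding.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def encode(seq):
    """Encode a protein as 400 transition frequencies followed by 20 content ratios."""
    n = len(AMINO_ACIDS)
    residues = [r for r in seq.upper() if r in INDEX]  # drop non-standard symbols
    pair_counts = [[0] * n for _ in range(n)]
    content = [0] * n
    for r in residues:
        content[INDEX[r]] += 1
    for a, b in zip(residues, residues[1:]):
        pair_counts[INDEX[a]][INDEX[b]] += 1
    vec = []
    for row in pair_counts:
        row_sum = sum(row)
        # Row-normalize: estimated probability of the next residue given the current one.
        vec.extend(c / row_sum if row_sum else 0.0 for c in row)
    total = len(residues)
    vec.extend(c / total if total else 0.0 for c in content)
    return vec


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


# Toy sequences (made up for illustration); smaller distance means more similar.
d = euclidean(encode("MKTAYIAKQR"), encode("MKTAYLAKQR"))
```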

  17. Exome sequencing reveals a de novo POLD1 mutation causing phenotypic variability in mandibular hypoplasia, deafness, progeroid features, and lipodystrophy syndrome (MDPL).

    PubMed

    Elouej, Sahar; Beleza-Meireles, Ana; Caswell, Richard; Colclough, Kevin; Ellard, Sian; Desvignes, Jean Pierre; Béroud, Christophe; Lévy, Nicolas; Mohammed, Shehla; De Sandre-Giovannoli, Annachiara

    2017-06-01

    Mandibular hypoplasia, deafness, progeroid features, and lipodystrophy syndrome (MDPL) is an autosomal dominant systemic disorder characterized by prominent loss of subcutaneous fat, a characteristic facial appearance and metabolic abnormalities. This syndrome is caused by heterozygous de novo mutations in the POLD1 gene. To date, 19 patients with MDPL have been reported in the literature and among them 14 patients have been characterized at the molecular level. Twelve unrelated patients carried a recurrent in-frame deletion of a single codon (p.Ser605del) and two other patients carried a novel heterozygous mutation in exon 13 (p.Arg507Cys). Additionally and interestingly, germline mutations of the same gene have been involved in familial polyposis and colorectal cancer (CRC) predisposition. We describe a male and a female patient with MDPL respectively affected with mild and severe phenotypes. Both of them showed mandibular hypoplasia, a beaked nose with bird-like facies, prominent eyes, a small mouth, growth retardation, muscle and skin atrophy, but the female patient showed such a severe and early phenotype that a first working diagnosis of Hutchinson-Gilford Progeria was made. The exploration was performed by direct sequencing of POLD1 gene exon 15 in the male patient with a classical MDPL phenotype and by whole exome sequencing in the female patient and her unaffected parents. Exome sequencing identified in the latter patient a de novo heterozygous undescribed mutation in the POLD1 gene (NM_002691.3: c.3209T>A), predicted to cause the missense change p.Ile1070Asn in the ZnF2 (Zinc Finger 2) domain of the protein. This mutation was not reported in the 1000 Genome Project, dbSNP and Exome sequencing databases. Furthermore, the Isoleucine1070 residue of POLD1 is highly conserved among various species, suggesting that this substitution may cause a major impairment of POLD1 activity. 
For the second patient, affected with a typical MDPL phenotype, direct sequencing of POLD1 exon 15 revealed the recurrent in-frame deletion (c.1812_1814del, p.S605del). Our work highlights that mutations in different POLD1 domains can lead to phenotypic variability, ranging from dominantly inherited cancer predisposition syndromes, to mild MDPL phenotypes without lifespan reduction, to very severe MDPL syndromes with major premature aging features. These results also suggest that POLD1 gene testing should be considered in patients presenting with severe progeroid features. Copyright © 2017 Elsevier Inc. All rights reserved.

  18. Extraordinary Sequence Divergence at Tsga8, an X-linked Gene Involved in Mouse Spermiogenesis

    PubMed Central

    Good, Jeffrey M.; Vanderpool, Dan; Smith, Kimberly L.; Nachman, Michael W.

    2011-01-01

    The X chromosome plays an important role in both adaptive evolution and speciation. We used a molecular evolutionary screen of X-linked genes potentially involved in reproductive isolation in mice to identify putative targets of recurrent positive selection. We then sequenced five very rapidly evolving genes within and between several closely related species of mice in the genus Mus. All five genes were involved in male reproduction and four of the genes showed evidence of recurrent positive selection. The most remarkable evolutionary patterns were found at Testis-specific gene a8 (Tsga8), a spermatogenesis-specific gene expressed during postmeiotic chromatin condensation and nuclear transformation. Tsga8 was characterized by extremely high levels of insertion–deletion variation of an alanine-rich repetitive motif in natural populations of Mus domesticus and M. musculus, differing in length from the reference mouse genome by up to 89 amino acids (27% of the total protein length). This population-level variation was coupled with striking divergence in protein sequence and length between closely related mouse species. Although no clear orthologs had previously been described for Tsga8 in other mammalian species, we have identified a highly divergent hypothetical gene on the rat X chromosome that shares clear orthology with the 5′ and 3′ ends of Tsga8. Further inspection of this ortholog verified that it is expressed in rat testis and shares remarkable similarity with mouse Tsga8 across several general features of the protein sequence despite no conservation of nucleotide sequence across over 60% of the rat-coding domain. Overall, Tsga8 appears to be one of the most rapidly evolving genes to have been described in rodents. We discuss the potential evolutionary causes and functional implications of this extraordinary divergence and the possible contribution of Tsga8 and the other four genes we examined to reproductive isolation in mice. PMID:21186189

  19. TANGLE: Two-Level Support Vector Regression Approach for Protein Backbone Torsion Angle Prediction from Primary Sequences

    PubMed Central

    Song, Jiangning; Tan, Hao; Wang, Mingjun; Webb, Geoffrey I.; Akutsu, Tatsuya

    2012-01-01

    Protein backbone torsion angles (Phi) and (Psi) involve two rotation angles rotating around the Cα-N bond (Phi) and the Cα-C bond (Psi). Due to the planarity of the linked rigid peptide bonds, these two angles can essentially determine the backbone geometry of proteins. Accordingly, the accurate prediction of protein backbone torsion angle from sequence information can assist the prediction of protein structures. In this study, we develop a new approach called TANGLE (Torsion ANGLE predictor) to predict the protein backbone torsion angles from amino acid sequences. TANGLE uses a two-level support vector regression approach to perform real-value torsion angle prediction using a variety of features derived from amino acid sequences, including the evolutionary profiles in the form of position-specific scoring matrices, predicted secondary structure, solvent accessibility and natively disordered region as well as other global sequence features. When evaluated based on a large benchmark dataset of 1,526 non-homologous proteins, the mean absolute errors (MAEs) of the Phi and Psi angle prediction are 27.8° and 44.6°, respectively, which are 1% and 3% respectively lower than that using one of the state-of-the-art prediction tools ANGLOR. Moreover, the prediction of TANGLE is significantly better than a random predictor that was built on the amino acid-specific basis, with the p-value<1.46e-147 and 7.97e-150, respectively by the Wilcoxon signed rank test. As a complementary approach to the current torsion angle prediction algorithms, TANGLE should prove useful in predicting protein structural properties and assisting protein fold recognition by applying the predicted torsion angles as useful restraints. TANGLE is freely accessible at http://sunflower.kuicr.kyoto-u.ac.jp/~sjn/TANGLE/. PMID:22319565
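
    Torsion angles are periodic, so a prediction of 170° against a true value of -170° is off by 20°, not 340°. The sketch below computes a mean absolute error with that wrap-around. Whether TANGLE's reported MAEs were computed with this exact convention is not stated in the abstract, so treat the function (a hypothetical `angular_mae`) as an assumption.

```python
def angular_mae(predicted, observed):
    """Mean absolute error between angles in degrees, respecting 360-degree periodicity."""
    errors = []
    for p, o in zip(predicted, observed):
        diff = abs(p - o) % 360.0
        # Take the shorter way around the circle.
        errors.append(min(diff, 360.0 - diff))
    return sum(errors) / len(errors)


# Toy values (made up): the first pair wraps around the +/-180 boundary.
mae = angular_mae([170.0, -60.0], [-170.0, -45.0])
```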

  20. A survey of transposable element classification systems--a call for a fundamental update to meet the challenge of their diversity and complexity.

    PubMed

    Piégu, Benoît; Bire, Solenne; Arensburger, Peter; Bigot, Yves

    2015-05-01

    The increase of publicly available sequencing data has allowed for rapid progress in our understanding of genome composition. As new information becomes available we should constantly be updating and reanalyzing existing and newly acquired data. In this report we focus on transposable elements (TEs) which make up a significant portion of nearly all sequenced genomes. Our ability to accurately identify and classify these sequences is critical to understanding their impact on host genomes. At the same time, as we demonstrate in this report, problems with existing classification schemes have led to significant misunderstandings of the evolution of both TE sequences and their host genomes. In a pioneering publication Finnegan (1989) proposed classifying all TE sequences into two classes based on transposition mechanisms and structural features: the retrotransposons (class I) and the DNA transposons (class II). We have retraced how ideas regarding TE classification and annotation in both prokaryotic and eukaryotic scientific communities have changed over time. This has led us to observe that: (1) a number of TEs have convergent structural features and/or transposition mechanisms that have led to misleading conclusions regarding their classification, (2) the evolution of TEs is similar to that of viruses by having several unrelated origins, (3) there might be at least 8 classes and 12 orders of TEs including 10 novel orders. In an effort to address these classification issues we propose: (1) the outline of a universal TE classification, (2) a set of methods and classification rules that could be used by all scientific communities involved in the study of TEs, and (3) a 5-year schedule for the establishment of an International Committee for Taxonomy of Transposable Elements (ICTTE). Copyright © 2015 Elsevier Inc. All rights reserved.

  1. Utility of fat-suppressed sequences in differentiation of aggressive vs typical asymptomatic haemangioma of the spine.

    PubMed

    Nabavizadeh, Seyed Ali; Mamourian, Alexander; Schmitt, James E; Cloran, Francis; Vossough, Arastoo; Pukenas, Bryan; Loevner, Laurie A; Mohan, Suyash

    2016-01-01

    While haemangiomas are common benign vascular lesions involving the spine, some behave in an aggressive fashion. We investigated the utility of fat-suppressed sequences to differentiate between benign and aggressive vertebral haemangiomas. Patients with the diagnosis of aggressive vertebral haemangioma and available short tau inversion-recovery or T2 fat saturation sequence were included in the study. 11 patients with typical asymptomatic vertebral body haemangiomas were selected as the control group. Region of interest signal intensity (SI) analysis of the entire haemangioma as well as the portion of each haemangioma with highest signal on fat-saturation sequences was performed and normalized to a reference normal vertebral body. A total of 8 patients with aggressive vertebral haemangioma and 11 patients with asymptomatic typical vertebral haemangioma were included. There was a significant difference between total normalized mean SI ratio (3.14 vs 1.48, p = 0.0002), total normalized maximum SI ratio (5.72 vs 2.55, p = 0.0003), brightest normalized mean SI ratio (4.28 vs 1.72, p < 0.0001) and brightest normalized maximum SI ratio (5.25 vs 2.45, p = 0.0003). Multiple measures were able to discriminate between groups with high sensitivity (>88%) and specificity (>82%). In addition to the conventional imaging features such as vertebral expansion and presence of extravertebral component, quantitative evaluation of fat-suppression sequences is also another imaging feature that can differentiate aggressive haemangioma and typical asymptomatic haemangioma. The use of quantitative fat-suppressed MRI in vertebral haemangiomas is demonstrated. Quantitative fat-suppressed MRI can have a role in confirming the diagnosis of aggressive haemangiomas. In addition, this application can be further investigated in future studies to predict aggressiveness of vertebral haemangiomas in early stages.

  2. Efficient moving target analysis for inverse synthetic aperture radar images via joint speeded-up robust features and regular moment

    NASA Astrophysics Data System (ADS)

    Yang, Hongxin; Su, Fulin

    2018-01-01

    We propose a moving target analysis algorithm using speeded-up robust features (SURF) and regular moment in inverse synthetic aperture radar (ISAR) image sequences. In our study, we first extract interest points from ISAR image sequences by SURF. Different from traditional feature point extraction methods, SURF-based feature points are invariant to scattering intensity, target rotation, and image size. Then, we employ a bilateral feature registering model to match these feature points. The feature registering scheme can not only search the isotropic feature points to link the image sequences but also reduce the error matching pairs. After that, the target centroid is detected by regular moment. Consequently, a cost function based on correlation coefficient is adopted to analyze the motion information. Experimental results based on simulated and real data validate the effectiveness and practicability of the proposed method.

  3. Protein binding hot spots prediction from sequence only by a new ensemble learning method.

    PubMed

    Hu, Shan-Shan; Chen, Peng; Wang, Bing; Li, Jinyan

    2017-10-01

    Hot spots are interfacial core areas of binding proteins, which have been applied as targets in drug design. Locating hot spot areas experimentally is costly in both time and expense. Recently, in silico computational methods have been widely used for hot spot prediction through sequence or structure characterization. Because the structural information of proteins is not always solved, hot spot identification from amino acid sequences only is more useful for real-life applications. This work proposes a new sequence-based model that combines physicochemical features with the relative accessible surface area of amino acid sequences for hot spot prediction. The model consists of 83 classifiers involving the IBk (Instance-based k means) algorithm, where instances are encoded by important properties extracted from a total of 544 properties in the AAindex1 (Amino Acid Index) database. Then top-performance classifiers are selected to form an ensemble by a majority voting technique. The ensemble classifier outperforms the state-of-the-art computational methods, yielding an F1 score of 0.80 on the benchmark binding interface database (BID) test set. http://www2.ahu.edu.cn/pchen/web/HotspotEC.htm .
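
    Majority voting over an ensemble can be sketched in a few lines: each selected classifier labels every residue, and the ensemble reports the most frequent label per residue. This is a generic illustration with made-up predictions; it does not reproduce the classifier selection or the IBk models described above, and `majority_vote` is a hypothetical name.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per instance.

    predictions: list of lists, one inner list per classifier, all of equal length.
    Using an odd number of classifiers avoids ties for binary labels.
    """
    combined = []
    for votes in zip(*predictions):  # votes for one instance across all classifiers
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Three hypothetical classifiers labeling three residues (1 = hot spot, 0 = not).
ensemble = majority_vote([[1, 0, 1],
                          [1, 1, 0],
                          [0, 0, 1]])
```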

  4. AutoFACT: An Automatic Functional Annotation and Classification Tool

    PubMed Central

    Koski, Liisa B; Gray, Michael W; Lang, B Franz; Burger, Gertraud

    2005-01-01

    Background Assignment of function to new molecular sequence data is an essential step in genomics projects. The usual process involves similarity searches of a given sequence against one or more databases, an arduous process for large datasets. Results We present AutoFACT, a fully automated and customizable annotation tool that assigns biologically informative functions to a sequence. Key features of this tool are that it (1) analyzes nucleotide and protein sequence data; (2) determines the most informative functional description by combining multiple BLAST reports from several user-selected databases; (3) assigns putative metabolic pathways, functional classes, enzyme classes, GeneOntology terms and locus names; and (4) generates output in HTML, text and GFF formats for the user's convenience. We have compared AutoFACT to four well-established annotation pipelines. The error rate of functional annotation is estimated to be only between 1–2%. Comparison of AutoFACT to the traditional top-BLAST-hit annotation method shows that our procedure increases the number of functionally informative annotations by approximately 50%. Conclusion AutoFACT will serve as a useful annotation tool for smaller sequencing groups lacking dedicated bioinformatics staff. It is implemented in PERL and runs on LINUX/UNIX platforms. AutoFACT is available at . PMID:15960857
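
    The idea of combining several BLAST reports into one informative description can be sketched as a priority walk over databases. The sketch below is an assumption-laden illustration and not AutoFACT's actual algorithm: the list of uninformative terms, the E-value cutoff, the database names `dbA` and `dbB`, and the function `pick_annotation` are all hypothetical.

```python
# Assumed list of terms that make a hit description uninformative.
UNINFORMATIVE = ("hypothetical", "unknown", "unnamed", "uncharacterized")


def is_informative(description):
    """A description is informative if it contains none of the listed terms."""
    text = description.lower()
    return not any(term in text for term in UNINFORMATIVE)


def pick_annotation(reports, db_priority, max_evalue=1e-5):
    """Return (database, description) for the first informative hit.

    reports maps a database name to its hits as (description, evalue)
    tuples, best hit first. Databases are tried in the user-given order.
    """
    for db in db_priority:
        for description, evalue in reports.get(db, []):
            if evalue <= max_evalue and is_informative(description):
                return db, description
    return None, "unclassified"


# Hypothetical reports for one query sequence.
reports = {
    "dbA": [("hypothetical protein", 1e-50)],
    "dbB": [("ATP synthase subunit alpha", 1e-30)],
}
print(pick_annotation(reports, ["dbA", "dbB"]))
```

    With a plain top-BLAST-hit rule this query would be labelled "hypothetical protein"; walking further down the priority list recovers the more informative description from the second database.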

  5. Sequence, structure and function relationships in flaviviruses as assessed by evolutive aspects of its conserved non-structural protein domains.

    PubMed

    da Fonseca, Néli José; Lima Afonso, Marcelo Querino; Pedersolli, Natan Gonçalves; de Oliveira, Lucas Carrijo; Andrade, Dhiego Souto; Bleicher, Lucas

    2017-10-28

    Flaviviruses are responsible for serious diseases such as dengue, yellow fever, and zika fever. Their genomes encode a polyprotein which, after cleavage, results in three structural and seven non-structural proteins. Homologous proteins can be studied by conservation and coevolution analysis as detected in multiple sequence alignments, usually reporting positions which are strictly necessary for the structure and/or function of all members in a protein family or which are involved in a specific sub-class feature requiring the coevolution of residue sets. This study provides a complete conservation and coevolution analysis on all flaviviruses non-structural proteins, with results mapped on all well-annotated available sequences. A literature review on the residues found in the analysis enabled us to compile available information on their roles and distribution among different flaviviruses. Also, we provide the mapping of conserved and coevolved residues for all sequences currently in SwissProt as a supplementary material, so that particularities in different viruses can be easily analyzed. Copyright © 2017 Elsevier Inc. All rights reserved.

  6. Outcomes of Diagnostic Exome Sequencing in Patients With Diagnosed or Suspected Autism Spectrum Disorders.

    PubMed

    Rossi, Mari; El-Khechen, Dima; Black, Mary Helen; Farwell Hagman, Kelly D; Tang, Sha; Powis, Zöe

    2017-05-01

    Exome sequencing has recently been proved to be a successful diagnostic method for complex neurodevelopmental disorders. However, the diagnostic yield of exome sequencing for autism spectrum disorders has not been extensively evaluated in large cohorts to date. We performed diagnostic exome sequencing in a cohort of 163 individuals with autism spectrum disorder (66.3%) or autistic features (33.7%). The diagnostic yield observed in patients in our cohort was 25.8% (42 of 163) for positive or likely positive findings in characterized disease genes, while a candidate genetic etiology was reported for an additional 3.3% (4 of 120) of patients. Among the positive findings in the patients with autism spectrum disorder or autistic features, 61.9% were the result of de novo mutations. Patients presenting with psychiatric conditions or ataxia or paraplegia in addition to autism spectrum disorder or autistic features were significantly more likely to receive positive results compared with patients without these clinical features (95.6% vs 27.1%, P < 0.0001; 83.3% vs 21.2%, P < 0.0001, respectively). The majority of the positive findings were in recently identified autism spectrum disorder genes, supporting the importance of diagnostic exome sequencing for patients with autism spectrum disorder or autistic features as the causative genes might evade traditional sequential or panel testing. These results suggest that diagnostic exome sequencing would be an efficient primary diagnostic method for patients with autism spectrum disorders or autistic features. Moreover, our data may aid clinicians to better determine which subset of patients with autism spectrum disorder with additional clinical features would benefit the most from diagnostic exome sequencing. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Protein location prediction using atomic composition and global features of the amino acid sequence

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Cherian, Betsy Sheena; Nair, Achuthsankar S.

    2010-01-22

    Subcellular location of protein is constructive information in determining its function, screening for drug candidates, vaccine design, annotation of gene products and in selecting relevant proteins for further studies. Computational prediction of subcellular localization deals with predicting the location of a protein from its amino acid sequence. For a computational localization prediction method to be more accurate, it should exploit all possible relevant biological features that contribute to the subcellular localization. In this work, we extracted the biological features from the full length protein sequence to incorporate more biological information. A new biological feature, distribution of atomic composition, is effectively used with multiple physiochemical properties, amino acid composition, three part amino acid composition, and sequence similarity for predicting the subcellular location of the protein. Support Vector Machines are designed for four modules and prediction is made by a weighted voting system. Our system makes predictions with an accuracy of 100, 82.47 and 88.81 for the self-consistency test, jackknife test and independent data test, respectively. Our results provide evidence that the prediction based on the biological features derived from the full length amino acid sequence gives better accuracy than those derived from N-terminal alone. Considering the features as a distribution within the entire sequence will bring out the underlying property distribution in greater detail to enhance the prediction accuracy.
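
    Two of the sequence features named in this abstract, amino acid composition and three part amino acid composition, are simple to compute. The sketch below is a minimal illustration under an assumption: the abstract does not define the three parts, so the sequence is split here into three roughly equal segments (N-terminal, middle, C-terminal). The function names are hypothetical, and the atomic composition feature and the SVM modules are not reproduced.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Fraction of each standard amino acid in seq (20 values)."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [seq.count(aa) / n for aa in AMINO_ACIDS]


def three_part_composition(seq):
    """Composition of three consecutive segments, concatenated (60 values).

    Assumption: the parts are equal thirds of the sequence; the original
    work may define them differently.
    """
    n = len(seq)
    first_cut, second_cut = n // 3, 2 * n // 3
    parts = (seq[:first_cut], seq[first_cut:second_cut], seq[second_cut:])
    return [value for part in parts for value in composition(part)]


features = three_part_composition("AAACCCDDD")
print(len(features))
```

    Splitting the sequence keeps some positional information that a single whole-sequence composition vector discards, which matches the abstract's point about treating features as a distribution along the sequence.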

  8. Survey of endosymbionts in the Diaphorina citri metagenome and assembly of a Wolbachia wDi draft genome.

    PubMed

    Saha, Surya; Hunter, Wayne B; Reese, Justin; Morgan, J Kent; Marutani-Hert, Mizuri; Huang, Hong; Lindeberg, Magdalen

    2012-01-01

    Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. Results of the read mapping provided the most support for Wolbachia and an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences for four Wolbachia strains. Reads were assembled into a draft genome sequence, and the annotation assessed for the presence of features potentially involved in host interaction. Genome alignment with the complete sequences reveals membership of Wolbachia wDi in supergroup B, further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B, distinct from Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate from China.

  9. Survey of Endosymbionts in the Diaphorina citri Metagenome and Assembly of a Wolbachia wDi Draft Genome

    PubMed Central

    Saha, Surya; Hunter, Wayne B.; Reese, Justin; Morgan, J. Kent; Marutani-Hert, Mizuri; Huang, Hong; Lindeberg, Magdalen

    2012-01-01

    Diaphorina citri (Hemiptera: Psyllidae), the Asian citrus psyllid, is the insect vector of Ca. Liberibacter asiaticus, the causal agent of citrus greening disease. Sequencing of the D. citri metagenome has been initiated to gain better understanding of the biology of this organism and the potential roles of its bacterial endosymbionts. To corroborate candidate endosymbionts previously identified by rDNA amplification, raw reads from the D. citri metagenome sequence were mapped to reference genome sequences. Results of the read mapping provided the most support for Wolbachia and an enteric bacterium most similar to Salmonella. Wolbachia-derived reads were extracted using the complete genome sequences for four Wolbachia strains. Reads were assembled into a draft genome sequence, and the annotation assessed for the presence of features potentially involved in host interaction. Genome alignment with the complete sequences reveals membership of Wolbachia wDi in supergroup B, further supported by phylogenetic analysis of FtsZ. FtsZ and Wsp phylogenies additionally indicate that the Wolbachia strain in the Florida D. citri isolate falls into a sub-clade of supergroup B, distinct from Wolbachia present in Chinese D. citri isolates, supporting the hypothesis that the D. citri introduced into Florida did not originate from China. PMID:23166822

  10. Characterization of the first complete genome sequence of an Impatiens necrotic spot orthotospovirus isolate from the United States and worldwide phylogenetic analyses of INSV isolates.

    PubMed

    Zhao, Kaixi; Margaria, Paolo; Rosa, Cristina

    2018-05-10

    Impatiens necrotic spot orthotospovirus (INSV) can impact economically important ornamental plants and vegetables worldwide. Characterization studies on INSV are limited. For most INSV isolates, there are no complete genome sequences available. This lack of genomic information has a negative impact on the understanding of the INSV genetic diversity and evolution. Here we report the first complete nucleotide sequence of a US INSV isolate. INSV-UP01 was isolated from an impatiens in Pennsylvania, US. RT-PCR was used to clone its full-length genome and Vector NTI to assemble overlapping sequences. Phylogenetic trees were constructed by using MEGA7 software to show the phylogenetic relationships with other available INSV sequences worldwide. This US isolate has genome and biological features classical of INSV species and clusters in the Western Hemisphere clade, but its origin appears to be recent. Furthermore, INSV-UP01 might have been involved in a recombination event with an Italian isolate belonging to the Asian clade. Our analyses support that INSV isolates infect a broad plant-host range, group by geographic origin and not by host, and are subject to frequent recombination events. These results justify the need to generate and analyze complete genome sequences of orthotospoviruses in general and INSV in particular.

  11. Genome-guided exploration of metabolic features of Streptomyces peucetius ATCC 27952: past, current, and prospect.

    PubMed

    Thuan, Nguyen Huy; Dhakal, Dipesh; Pokhrel, Anaya Raj; Chu, Luan Luong; Van Pham, Thi Thuy; Shrestha, Anil; Sohng, Jae Kyung

    2018-05-01

    Streptomyces peucetius ATCC 27952 produces two major anthracyclines, doxorubicin (DXR) and daunorubicin (DNR), which are potent chemotherapeutic agents for the treatment of several cancers. In order to gain detailed insight on genetics and biochemistry of the strain, the complete genome was determined and analyzed. The result showed that its complete sequence contains 7187 protein coding genes in a total of 8,023,114 bp, whereas 87% of the genome contributed to the protein coding region. The genomic sequence included 18 rRNA, 66 tRNAs, and 3 non-coding RNAs. In silico studies predicted ~ 68 biosynthetic gene clusters (BCGs) encoding diverse classes of secondary metabolites, including non-ribosomal polyketide synthase (NRPS), polyketide synthase (PKS I, II, and III), terpenes, and others. Detailed analysis of the genome sequence revealed versatile biocatalytic enzymes such as cytochrome P450 (CYP), electron transfer systems (ETS) genes, methyltransferase (MT), glycosyltransferase (GT). In addition, numerous functional genes (transporter gene, SOD, etc.) and regulatory genes (afsR-sp, metK-sp, etc.) involved in the regulation of secondary metabolites were found. This minireview summarizes the genome-based genome mining (GM) of diverse BCGs and genome exploration (GE) of versatile biocatalytic enzymes, and other enzymes involved in maintenance and regulation of metabolism of S. peucetius. The detailed analysis of genome sequence provides critically important knowledge useful in the bioengineering of the strain or harboring catalytically efficient enzymes for biotechnological applications.

  12. Deep sequencing of foot-and-mouth disease virus reveals RNA sequences involved in genome packaging.

    PubMed

    Logan, Grace; Newman, Joseph; Wright, Caroline F; Lasecka-Dykes, Lidia; Haydon, Daniel T; Cottam, Eleanor M; Tuthill, Tobias J

    2017-10-18

    Non-enveloped viruses protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. Packaging and capsid assembly in RNA viruses can involve interactions between capsid proteins and secondary structures in the viral genome as exemplified by the RNA bacteriophage MS2 and as proposed for other RNA viruses of plants, animals and human. In the picornavirus family of non-enveloped RNA viruses, the requirements for genome packaging remain poorly understood. Here we show a novel and simple approach to identify predicted RNA secondary structures involved in genome packaging in the picornavirus foot-and-mouth disease virus (FMDV). By interrogating deep sequencing data generated from both packaged and unpackaged populations of RNA we have determined multiple regions of the genome with constrained variation in the packaged population. Predicted secondary structures of these regions revealed stem loops with conservation of structure and a common motif at the loop. Disruption of these features resulted in attenuation of virus growth in cell culture due to a reduction in assembly of mature virions. This study provides evidence for the involvement of predicted RNA structures in picornavirus packaging and offers a readily transferable methodology for identifying packaging requirements in many other viruses. Importance In order to transmit their genetic material to a new host, non-enveloped viruses must protect their genomes by packaging them into an outer shell or capsid of virus-encoded proteins. For many non-enveloped RNA viruses the requirements for this critical part of the viral life cycle remain poorly understood. We have identified RNA sequences involved in genome packaging of the picornavirus foot-and-mouth disease virus. This virus causes an economically devastating disease of livestock affecting both the developed and developing world. 
    The experimental methods developed to carry out this work are novel, simple and transferable to the study of packaging signals in other RNA viruses. Improved understanding of RNA packaging may lead to novel vaccine approaches or targets for antiviral drugs with broad spectrum activity. Copyright © 2017 Logan et al.

  13. Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes.

    PubMed

    Srinivasulu, Yerukala Sathipati; Wang, Jyun-Rong; Hsu, Kai-Ti; Tsai, Ming-Ju; Charoenkwan, Phasit; Huang, Wen-Lin; Huang, Hui-Ling; Ho, Shinn-Ying

    2015-01-01

    Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. 
    The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes.
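
    The performance figures quoted in this abstract (accuracy, sensitivity, specificity, mean absolute error) follow standard definitions, sketched below. The toy labels and values are invented for illustration and are not data from the study; treating high binding affinity as the positive class is an assumption.

```python
def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(numerator, denominator):
        # Undefined ratios (empty class) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": ratio(tp, tp + fn),  # true positive rate
        "specificity": ratio(tn, tn + fp),  # true negative rate
    }


def mean_absolute_error(observed, predicted):
    """Average absolute difference between paired values."""
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)


# Hypothetical labels: 1 = high affinity, 0 = low affinity.
labels = [1, 1, 1, 0, 0, 0]
calls = [1, 1, 0, 0, 0, 1]
metrics = binary_metrics(labels, calls)
print(metrics)
print(mean_absolute_error([1.0, 2.0], [2.0, 4.0]))
```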

  14. Characterizing informative sequence descriptors and predicting binding affinities of heterodimeric protein complexes

    PubMed Central

    2015-01-01

    Background Protein-protein interactions (PPIs) are involved in various biological processes, and underlying mechanism of the interactions plays a crucial role in therapeutics and protein engineering. Most machine learning approaches have been developed for predicting the binding affinity of protein-protein complexes based on structure and functional information. This work aims to predict the binding affinity of heterodimeric protein complexes from sequences only. Results This work proposes a support vector machine (SVM) based binding affinity classifier, called SVM-BAC, to classify heterodimeric protein complexes based on the prediction of their binding affinity. SVM-BAC identified 14 of 580 sequence descriptors (physicochemical, energetic and conformational properties of the 20 amino acids) to classify 216 heterodimeric protein complexes into low and high binding affinity. SVM-BAC yielded the training accuracy, sensitivity, specificity, AUC and test accuracy of 85.80%, 0.89, 0.83, 0.86 and 83.33%, respectively, better than existing machine learning algorithms. The 14 features and support vector regression were further used to estimate the binding affinities (Pkd) of 200 heterodimeric protein complexes. Prediction performance of a Jackknife test was the correlation coefficient of 0.34 and mean absolute error of 1.4. We further analyze three informative physicochemical properties according to their contribution to prediction performance. Results reveal that the following properties are effective in predicting the binding affinity of heterodimeric protein complexes: apparent partition energy based on buried molar fractions, relations between chemical structure and biological activity in principal component analysis IV, and normalized frequency of beta turn. 
    Conclusions The proposed sequence-based prediction method SVM-BAC uses an optimal feature selection method to identify 14 informative features to classify and predict binding affinity of heterodimeric protein complexes. The characterization analysis revealed that the average numbers of beta turns and hydrogen bonds at protein-protein interfaces in high binding affinity complexes are more than those in low binding affinity complexes. PMID:26681483

  15. Clinical and molecular characterization of duplications encompassing the human SHOX gene reveal a variable effect on stature.

    PubMed

    Thomas, N Simon; Harvey, John F; Bunyan, David J; Rankin, Julia; Grigelioniene, Giedre; Bruno, Damien L; Tan, Tiong Y; Tomkins, Susan; Hastings, Robert

    2009-07-01

    Deletions of the SHOX gene are well documented and cause disproportionate short stature and variable skeletal abnormalities. In contrast interstitial SHOX duplications limited to PAR1 appear to be very rare and the clinical significance of the only case report in the literature is unclear. Mapping of this duplication has now shown that it includes the entire SHOX gene but little flanking sequence and so will not encompass any of the long-range enhancers required for SHOX transcription. We now describe the clinical and molecular characterization of three additional cases. The duplications all included the SHOX coding sequence but varied in the amount of flanking sequence involved. The probands were ascertained for a variety of reasons: hypotonia and features of Asperger syndrome, Leri-Weill dyschondrosteosis (LWD), and a family history of cleft palate. However, the presence of a duplication did not correlate with any of these features or with evidence of skeletal abnormality. Remarkably, the proband with LWD had inherited both a SHOX deletion and a duplication. The effect of the duplications on stature was variable: height appeared to be elevated in some carriers, particularly in those with the largest duplications, but was still within the normal range. SHOX duplications are likely to be under ascertained and more cases need to be identified and characterized in detail in order to accurately determine their phenotypic consequences.

  16. Visualization of protein sequence features using JavaScript and SVG with pViz.js.

    PubMed

    Mukhyala, Kiran; Masselot, Alexandre

    2014-12-01

    pViz.js is a visualization library for displaying protein sequence features in a Web browser. By simply providing a sequence and the locations of its features, this lightweight, yet versatile, JavaScript library renders an interactive view of the protein features. Interactive exploration of protein sequence features over the Web is a common need in Bioinformatics. Although many Web sites have developed viewers to display these features, their implementations are usually focused on data from a specific source or use case. Some of these viewers can be adapted to fit other use cases but are not designed to be reusable. pViz makes it easy to display features as boxes aligned to a protein sequence with zooming functionality but also includes predefined renderings for secondary structure and post-translational modifications. The library is designed to further customize this view. We demonstrate such applications of pViz using two examples: a proteomic data visualization tool with an embedded viewer for displaying features on protein structure, and a tool to visualize the results of the variant_effect_predictor tool from Ensembl. pViz.js is a JavaScript library, available on github at https://github.com/Genentech/pviz. This site includes examples and functional applications, installation instructions and usage documentation. A Readme file, which explains how to use pViz with examples, is available as Supplementary Material A. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  17. Protein contact prediction by integrating deep multiple sequence alignments, coevolution and machine learning.

    PubMed

    Adhikari, Badri; Hou, Jie; Cheng, Jianlin

    2018-03-01

    In this study, we report the evaluation of the residue-residue contacts predicted by our three different methods in the CASP12 experiment, focusing on studying the impact of multiple sequence alignment, residue coevolution, and machine learning on contact prediction. The first method (MULTICOM-NOVEL) uses only traditional features (sequence profile, secondary structure, and solvent accessibility) with deep learning to predict contacts and serves as a baseline. The second method (MULTICOM-CONSTRUCT) uses our new alignment algorithm to generate deep multiple sequence alignment to derive coevolution-based features, which are integrated by a neural network method to predict contacts. The third method (MULTICOM-CLUSTER) is a consensus combination of the predictions of the first two methods. We evaluated our methods on 94 CASP12 domains. On a subset of 38 free-modeling domains, our methods achieved an average precision of up to 41.7% for top L/5 long-range contact predictions. The comparison of the three methods shows that the quality and effective depth of multiple sequence alignments, coevolution-based features, and machine learning integration of coevolution-based features and traditional features drive the quality of predicted protein contacts. On the full CASP12 dataset, the coevolution-based features alone can improve the average precision from 28.4% to 41.6%, and the machine learning integration of all the features further raises the precision to 56.3%, when top L/5 predicted long-range contacts are evaluated. And the correlation between the precision of contact prediction and the logarithm of the number of effective sequences in alignments is 0.66. © 2017 Wiley Periodicals, Inc.
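
    The "top L/5 long-range" precision reported in this abstract can be computed as sketched below. This is a hedged illustration rather than the CASP12 assessors' code: it assumes the common convention that long-range means a sequence separation of at least 24 residues, uses the number of selected predictions as the denominator, and the function name and toy data are hypothetical.

```python
def top_l5_long_range_precision(predicted, true_contacts, length, min_separation=24):
    """Precision of the top L/5 long-range contact predictions.

    predicted: iterable of (i, j, probability) residue pairs.
    true_contacts: iterable of (i, j) pairs observed in the native structure.
    length: sequence (or domain) length L.
    """
    long_range = [c for c in predicted if abs(c[0] - c[1]) >= min_separation]
    long_range.sort(key=lambda c: c[2], reverse=True)  # most confident first
    top = long_range[: max(1, length // 5)]
    if not top:
        return 0.0
    # Contacts are unordered pairs, so (i, j) and (j, i) must match.
    truth = {frozenset(pair) for pair in true_contacts}
    hits = sum(1 for i, j, _ in top if frozenset((i, j)) in truth)
    return hits / len(top)


# Hypothetical predictions for a 60-residue domain.
predicted = [(0, 30, 0.9), (1, 40, 0.8), (2, 50, 0.7), (3, 5, 0.99)]
native = [(30, 0), (2, 50)]
print(top_l5_long_range_precision(predicted, native, length=60))
```

    The short-range pair (3, 5) is discarded despite its high score, which is why long-range precision is a stricter measure than precision over all predicted pairs.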

  18. SCPRED: accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences.

    PubMed

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-05-01

    Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
    The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods.

  19. SCPRED: Accurate prediction of protein structural class for sequences of twilight-zone similarity with predicting sequences

    PubMed Central

    Kurgan, Lukasz; Cios, Krzysztof; Chen, Ke

    2008-01-01

    Background Protein structure prediction methods provide accurate results when a homologous protein is predicted, while poorer predictions are obtained in the absence of homologous templates. However, some protein chains that share twilight-zone pairwise identity can form similar folds and thus determining structural similarity without the sequence similarity would be desirable for the structure prediction. The folding type of a protein or its domain is defined as the structural class. Current structural class prediction methods that predict the four structural classes defined in SCOP provide up to 63% accuracy for the datasets in which sequence identity of any pair of sequences belongs to the twilight-zone. We propose SCPRED method that improves prediction accuracy for sequences that share twilight-zone pairwise similarity with sequences used for the prediction. Results SCPRED uses a support vector machine classifier that takes several custom-designed features as its input to predict the structural classes. Based on extensive design that considers over 2300 index-, composition- and physicochemical properties-based features along with features based on the predicted secondary structure and content, the classifier's input includes 8 features based on information extracted from the secondary structure predicted with PSI-PRED and one feature computed from the sequence. Tests performed with datasets of 1673 protein chains, in which any pair of sequences shares twilight-zone similarity, show that SCPRED obtains 80.3% accuracy when predicting the four SCOP-defined structural classes, which is superior when compared with over a dozen recent competing methods that are based on support vector machine, logistic regression, and ensemble of classifiers predictors. Conclusion The SCPRED can accurately find similar structures for sequences that share low identity with sequence used for the prediction. 
    The high predictive accuracy achieved by SCPRED is attributed to the design of the features, which are capable of separating the structural classes in spite of their low dimensionality. We also demonstrate that the SCPRED's predictions can be successfully used as a post-processing filter to improve performance of modern fold classification methods. PMID:18452616

  20. Methylotrophic Methylobacterium Bacteria Nodulate and Fix Nitrogen in Symbiosis with Legumes

    PubMed Central

    Sy, Abdoulaye; Giraud, Eric; Jourand, Philippe; Garcia, Nelly; Willems, Anne; de Lajudie, Philippe; Prin, Yves; Neyra, Marc; Gillis, Monique; Boivin-Masson, Catherine; Dreyfus, Bernard

    2001-01-01

    Rhizobia described so far belong to three distinct phylogenetic branches within the α-2 subclass of Proteobacteria. Here we report the discovery of a fourth rhizobial branch involving bacteria of the Methylobacterium genus. Rhizobia isolated from Crotalaria legumes were assigned to a new species, “Methylobacterium nodulans,” within the Methylobacterium genus on the basis of 16S ribosomal DNA analyses. We demonstrated that these rhizobia facultatively grow on methanol, which is a characteristic of Methylobacterium spp. but a unique feature among rhizobia. Genes encoding two key enzymes of methylotrophy and nodulation, the mxaF gene, encoding the α subunit of the methanol dehydrogenase, and the nodA gene, encoding an acyltransferase involved in Nod factor biosynthesis, were sequenced for the type strain, ORS2060. Plant tests and nodA amplification assays showed that “M. nodulans” is the only nodulating Methylobacterium sp. identified so far. Phylogenetic sequence analysis showed that “M. nodulans” NodA is closely related to Bradyrhizobium NodA, suggesting that this gene was acquired by horizontal gene transfer. PMID:11114919

  1. BioSAVE: display of scored annotation within a sequence context.

    PubMed

    Pollock, Richard F; Adryan, Boris

    2008-03-20

    Visualization of sequence annotation is a common feature in many bioinformatics tools. For many applications it is desirable to restrict the display of such annotation according to a score cutoff, as biological interpretation can be difficult in the presence of the entire data. Unfortunately, many visualisation solutions are somewhat static in the way they handle such score cutoffs. We present BioSAVE, a sequence annotation viewer with on-the-fly selection of visualisation thresholds for each feature. BioSAVE is a versatile OS X program for visual display of scored features (annotation) within a sequence context. The program reads sequence and additional supplementary annotation data (e.g., position weight matrix matches, conservation scores, structural domains) from a variety of commonly used file formats and displays them graphically. Onscreen controls then allow for live customisation of these graphics, including on-the-fly selection of visualisation thresholds for each feature. Possible applications of the program include display of transcription factor binding sites in a genomic context or the visualisation of structural domain assignments in protein sequences and many more. The dynamic visualisation of these annotations is useful, e.g., for the determination of cutoff values of predicted features to match experimental data. Program, source code and exemplary files are freely available at the BioSAVE homepage.

  2. BioSAVE: Display of scored annotation within a sequence context

    PubMed Central

    Pollock, Richard F; Adryan, Boris

    2008-01-01

    Background Visualization of sequence annotation is a common feature in many bioinformatics tools. For many applications it is desirable to restrict the display of such annotation according to a score cutoff, as biological interpretation can be difficult in the presence of the entire data. Unfortunately, many visualisation solutions are somewhat static in the way they handle such score cutoffs. Results We present BioSAVE, a sequence annotation viewer with on-the-fly selection of visualisation thresholds for each feature. BioSAVE is a versatile OS X program for visual display of scored features (annotation) within a sequence context. The program reads sequence and additional supplementary annotation data (e.g., position weight matrix matches, conservation scores, structural domains) from a variety of commonly used file formats and displays them graphically. Onscreen controls then allow for live customisation of these graphics, including on-the-fly selection of visualisation thresholds for each feature. Conclusion Possible applications of the program include display of transcription factor binding sites in a genomic context or the visualisation of structural domain assignments in protein sequences and many more. The dynamic visualisation of these annotations is useful, e.g., for the determination of cutoff values of predicted features to match experimental data. Program, source code and exemplary files are freely available at the BioSAVE homepage. PMID:18366701

  3. Fusion genes with ALK as recurrent partner in ependymoma-like gliomas: a new brain tumor entity?

    PubMed Central

    Olsen, Thale Kristin; Panagopoulos, Ioannis; Meling, Torstein R.; Micci, Francesca; Gorunova, Ludmila; Thorsen, Jim; Due-Tønnessen, Bernt; Scheie, David; Lund-Iversen, Marius; Krossnes, Bård; Saxhaug, Cathrine; Heim, Sverre; Brandal, Petter

    2015-01-01

    Background We have previously characterized 19 ependymal tumors using Giemsa banding and high-resolution comparative genomic hybridization. The aim of this study was to analyze these tumors searching for fusion genes. Methods RNA sequencing was performed in 12 samples. Potential fusion transcripts were assessed by seed count and structural chromosomal aberrations. Transcripts of interest were validated using fluorescence in situ hybridization and PCR followed by direct sequencing. Results RNA sequencing identified rearrangements of the anaplastic lymphoma kinase gene (ALK) in 2 samples. Both tumors harbored structural aberrations involving the ALK locus 2p23. Tumor 1 had an unbalanced t(2;14)(p23;q22) translocation which led to the fusion gene KTN1-ALK. Tumor 2 had an interstitial del(2)(p16p23) deletion causing the fusion of CCDC88A and ALK. In both samples, the breakpoint of ALK was located between exons 19 and 20. Both patients were infants and both tumors were supratentorial. The tumors were well demarcated from surrounding tissue and had both ependymal and astrocytic features but were diagnosed and treated as ependymomas. Conclusions By combining karyotyping and RNA sequencing, we identified the 2 first ever reported ALK rearrangements in CNS tumors. Such rearrangements may represent the hallmark of a new entity of pediatric glioma characterized by both ependymal and astrocytic features. Our findings are of particular importance because crizotinib, a selective ALK inhibitor, has demonstrated effect in patients with lung cancer harboring ALK rearrangements. Thus, ALK emerges as an interesting therapeutic target in patients with ependymal tumors carrying ALK fusions. PMID:25795305

  4. CEP78 is mutated in a distinct type of Usher syndrome.

    PubMed

    Fu, Qing; Xu, Mingchu; Chen, Xue; Sheng, Xunlun; Yuan, Zhisheng; Liu, Yani; Li, Huajin; Sun, Zixi; Li, Huiping; Yang, Lizhu; Wang, Keqing; Zhang, Fangxia; Li, Yumei; Zhao, Chen; Sui, Ruifang; Chen, Rui

    2017-03-01

    Usher syndrome is a genetically heterogeneous disorder characterized by combined visual impairment and hearing loss. Although about a dozen genes involved in Usher syndrome have been identified, the genetic basis remains unknown in 20-30% of patients. In this study, we aimed to identify the novel disease-causing gene of a distinct subtype of Usher syndrome. Ophthalmic examinations and hearing tests were performed on patients with Usher syndrome in two consanguineous families. Target capture sequencing was initially performed to screen for causative mutations in known retinal disease-causing loci. Whole exome sequencing (WES) and whole genome sequencing (WGS) were applied to identify novel disease-causing genes. RT-PCR and Sanger sequencing were performed to evaluate the splicing-altering effect of the identified CEP78 variants. Patients from the two independent families show a mild Usher syndrome phenotype characterized by juvenile or adult-onset cone-rod dystrophy and sensorineural hearing loss. WES and WGS identified two homozygous rare variants that affect mRNA splicing of the ciliary gene CEP78. RT-PCR confirmed that the two variants indeed lead to abnormal splicing, resulting in a premature stop of protein translation due to a frameshift. Our results provide evidence that CEP78 is a novel disease-causing gene for Usher syndrome, demonstrating an additional link between ciliopathy and the Usher protein network in photoreceptor cells and inner ear hair cells.

  5. Expansion of phenotype and genotypic data in CRB2-related syndrome.

    PubMed

    Lamont, Ryan E; Tan, Wen-Hann; Innes, A Micheil; Parboosingh, Jillian S; Schneidman-Duhovny, Dina; Rajkovic, Aleksandar; Pappas, John; Altschwager, Pablo; DeWard, Stephanie; Fulton, Anne; Gray, Kathryn J; Krall, Max; Mehta, Lakshmi; Rodan, Lance H; Saller, Devereux N; Steele, Deanna; Stein, Deborah; Yatsenko, Svetlana A; Bernier, François P; Slavotinek, Anne M

    2016-10-01

    Sequence variants in CRB2 cause a syndrome with greatly elevated maternal serum alpha-fetoprotein and amniotic fluid alpha-fetoprotein levels, cerebral ventriculomegaly and renal findings similar to Finnish congenital nephrosis. All reported patients have been homozygotes or compound heterozygotes for sequence variants in the Crumbs, Drosophila, Homolog of, 2 (CRB2) gene. Variants affecting CRB2 function have also been identified in four families with steroid resistant nephrotic syndrome, but without any other known systemic findings. We ascertained five previously unreported individuals with biallelic variants in CRB2 that were predicted to affect function. We compiled the clinical features of reported cases and reviewed available literature for cases with features suggestive of CRB2-related syndrome in order to better understand the phenotypic and genotypic manifestations. Phenotypic analyses showed that ventriculomegaly was a common clinical manifestation (9/11 confirmed cases), in contrast to the original reports, in which patients were ascertained due to renal disease. Two children had minor eye findings and one was diagnosed with a B-cell lymphoma. Further genetic analysis identified one family with two affected siblings who were both heterozygous for a variant in NPHS2 predicted to affect function and separate families with sequence variants in NPHS4 and BBS7 in addition to the CRB2 variants. Our report expands the clinical phenotype of CRB2-related syndrome and establishes ventriculomegaly and hydrocephalus as frequent manifestations. We found additional sequence variants in genes involved in kidney development and ciliopathies in patients with CRB2-related syndrome, suggesting that these variants may modify the phenotype.

  6. Computational Prediction of Protein Epsilon Lysine Acetylation Sites Based on a Feature Selection Method.

    PubMed

    Gao, JianZhao; Tao, Xue-Wen; Zhao, Jia; Feng, Yuan-Ming; Cai, Yu-Dong; Zhang, Ning

    2017-01-01

    Lysine acetylation, as one type of post-translational modification (PTM), plays key roles in cellular regulation and can be involved in a variety of human diseases. However, identifying lysine acetylation sites with traditional experimental approaches is often costly and time-consuming. Therefore, effective computational methods should be developed to predict the acetylation sites. In this study, we developed a position-specific method for epsilon lysine acetylation site prediction. Sequences of acetylated proteins were retrieved from the UniProt database. Various kinds of features such as position specific scoring matrix (PSSM), amino acid factors (AAF), and disorders were incorporated. A feature selection method based on mRMR (Maximum Relevance Minimum Redundancy) and IFS (Incremental Feature Selection) was employed. Finally, 319 optimal features were selected from a total of 541 features. Using the 319 optimal features to encode peptides, a predictor was constructed based on dagging. As a result, an accuracy of 69.56% with MCC of 0.2792 was achieved. Analysis of the optimal features suggested some important factors determining the lysine acetylation sites and provided insights into the mechanism of lysine acetylation, offering guidance for experimental validation.
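
    The incremental feature selection (IFS) step and the MCC metric named in this abstract can be illustrated with a minimal sketch. It assumes the features have already been ranked (for example by mRMR) and that a caller-supplied `evaluate` function returns a score for a feature subset; `incremental_feature_selection`, `evaluate` and the toy feature names are hypothetical and are not taken from the paper.

```python
import math


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0 when any marginal count is zero.
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def incremental_feature_selection(ranked_features, evaluate):
    """Score the top-k ranked features for k = 1..N and keep the best k.

    ranked_features: feature names, most relevant first (assumed pre-ranked).
    evaluate: hypothetical callable mapping a feature list to a score,
              e.g. a cross-validated MCC of some classifier.
    """
    best_score, best_subset = float("-inf"), []
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score > best_score:  # ties keep the smaller subset
            best_score, best_subset = score, subset
    return best_subset, best_score


if __name__ == "__main__":
    # Toy scoring function that peaks at three features (illustration only).
    toy = lambda subset: -abs(len(subset) - 3)
    print(incremental_feature_selection(["f1", "f2", "f3", "f4", "f5"], toy))
    print(mcc(tp=50, tn=50, fp=0, fn=0))
```

    In practice `evaluate` would wrap the training and validation of a real classifier; the toy function here only shows how the best prefix of the ranking is chosen.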

  7. Complete genome sequence of Syntrophobacter fumaroxidans strain (MPOBT)

    PubMed Central

    Plugge, Caroline M.; Henstra, Anne M.; Worm, Petra; Swarts, Daan C.; Paulitsch-Fuchs, Astrid H.; Scholten, Johannes C.M.; Lykidis, Athanasios; Lapidus, Alla L.; Goltsman, Eugene; Kim, Edwin; McDonald, Erin; Rohlin, Lars; Crable, Bryan R.; Gunsalus, Robert P.; Stams, Alfons J.M.; McInerney, Michael J.

    2012-01-01

    Syntrophobacter fumaroxidans strain MPOBT is the best-studied species of the genus Syntrophobacter. The species is of interest because of its anaerobic syntrophic lifestyle, its involvement in the conversion of propionate to acetate, H2 and CO2 during the overall degradation of organic matter, and its release of products that serve as substrates for other microorganisms. The strain is able to ferment fumarate in pure culture to CO2 and succinate, and is also able to grow as a sulfate reducer with propionate as an electron donor. This is the first complete genome sequence of a member of the genus Syntrophobacter and a member genus in the family Syntrophobacteraceae. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,990,251 bp long genome with its 4,098 protein-coding and 81 RNA genes is a part of the Microbial Genome Program (MGP) and the Genomes to Life (GTL) Program project. PMID:23450070

  8. Identification of the WBSCR9 gene, encoding a novel transcriptional regulator, in the Williams-Beuren syndrome deletion at 7q11.23.

    PubMed

    Peoples, R J; Cisco, M J; Kaplan, P; Francke, U

    1998-01-01

    We have identified a novel gene (WBSCR9) within the common Williams-Beuren syndrome (WBS) deletion by interspecies sequence conservation. The WBSCR9 gene encodes a roughly 7-kb transcript with an open reading frame of 1483 amino acids and a predicted protein product size of 170.8 kDa. WBSCR9 is comprised of at least 20 exons extending over 60 kb. The transcript is expressed ubiquitously throughout development and is subject to alternative splicing. Functional motifs identified by sequence homology searches include a bromodomain; a PHD, or C4HC3, finger; several putative nuclear localization signals; four nuclear receptor binding motifs; a polyglutamate stretch and two PEST sequences. Bromodomains, PHD motifs and nuclear receptor binding motifs are cardinal features of proteins that are involved in chromatin remodeling and modulation of transcription. Haploinsufficiency for WBSCR9 gene products may contribute to the complex phenotype of WBS by interacting with tissue-specific regulatory factors during development.

  9. Enhancing Teacher Preparation and Improving Faculty Teaching Skills: Lessons Learned from Implementing "Science That Matters", a Standards Based Interdisciplinary Science Course Sequence

    NASA Astrophysics Data System (ADS)

    Potter, Robert; Meisels, Gerry

    2005-06-01

    In a highly collaborative process we developed an introductory science course sequence to improve science literacy, especially among future elementary and middle school education majors. The materials and course features were designed using the results of research on teaching and learning to provide a rigorous, relevant and engaging standards-based science experience. More than ten years of combined planning, development, implementation and assessment of this college science course sequence for nonmajors and future teachers has provided significant insights and success in achieving our goal. This paper describes the history and iterative nature of our ongoing improvements, changes in faculty instructional practice, strategies used to overcome student resistance, significant student learning outcomes, support structures for faculty, and the essential and informative role of assessment in improving the outcomes. Our experience with diverse institutions, students and faculty provides the basis for the lessons we have learned and should be of help to others involved in advancing science education.

  10. Novel SNP array analysis and exome sequencing detect a homozygous exon 7 deletion of MEGF10 causing early onset myopathy, areflexia, respiratory distress and dysphagia (EMARDD)

    PubMed Central

    Pierson, Tyler Mark; Markello, Thomas; Accardi, John; Wolfe, Lynne; Adams, David; Sincan, Murat; Tarazi, Noor M.; Fajardo, Karin Fuentes; Cherukuri, Praveen F.; Bajraktari, Ilda; Meilleur, Katy G.; Donkervoort, Sandra; Jain, Mina; Hu, Ying; Lehky, Tanya J.; Cruz, Pedro; Mullikin, James C.; Bonnemann, Carsten; Gahl, William A.; Boerkoel, Cornelius F.; Tifft, Cynthia J.

    2013-01-01

    Early-onset myopathy, areflexia, respiratory distress and dysphagia (EMARDD) is a myopathic disorder associated with mutations in MEGF10. By novel analysis of SNP array hybridization and exome sequence coverage, we diagnosed a 10-year old girl with EMARDD following identification of a novel homozygous deletion of exon 7 in MEGF10. In contrast to previously reported EMARDD patients, her weakness was more prominent proximally than distally, and involved her legs more than her arms. MRI of her pelvis and thighs showed muscle atrophy and fatty replacement. Ultrasound of several muscle groups revealed dense homogenous increases in echogenicity. Cloning and sequencing of the deletion breakpoint identified features suggesting the mutation arose by fork stalling and template switching. These findings constitute the first genomic deletion causing EMARDD, expand the clinical phenotype, and provide new insight into the pattern and histology of its muscular pathology. PMID:23453856

  11. Molecular evolution of miraculin-like proteins in soybean Kunitz super-family.

    PubMed

    Selvakumar, Purushotham; Gahloth, Deepankar; Tomar, Prabhat Pratap Singh; Sharma, Nidhi; Sharma, Ashwani Kumar

    2011-12-01

    Miraculin-like proteins (MLPs) belong to soybean Kunitz super-family and have been characterized from many plant families like Rutaceae, Solanaceae, Rubiaceae, etc. Many of them possess trypsin inhibitory activity and are involved in plant defense. MLPs exhibit significant sequence identity (~30-95%) to native miraculin protein, also belonging to Kunitz super-family compared with a typical Kunitz family member (~30%). The sequence and structure-function comparison of MLPs with that of a classical Kunitz inhibitor have demonstrated that MLPs have evolved to form a distinct group within Kunitz super-family. Sequence analysis of new genes along with available MLP sequences in the literature revealed three major groups for these proteins. A significant feature of Rutaceae MLP type 2 sequences is the presence of phosphorylation motif. Subtle changes are seen in putative reactive loop residues among different MLPs suggesting altered specificities to specific proteases. In phylogenetic analysis, Rutaceae MLP type 1 and type 2 proteins clustered together on separate branches, whereas native miraculin along with other MLPs formed distinct clusters. Site-specific positive Darwinian selection was observed at many sites in both the groups of Rutaceae MLP sequences with most of the residues undergoing positive selection located in loop regions. The results demonstrate the sequence and thereby the structure-function divergence of MLPs as a distinct group within soybean Kunitz super-family due to biotic and abiotic stresses of local environment.

  12. Real-time ultrasound image classification for spine anesthesia using local directional Hadamard features.

    PubMed

    Pesteie, Mehran; Abolmaesumi, Purang; Ashab, Hussam Al-Deen; Lessoway, Victoria A; Massey, Simon; Gunka, Vit; Rohling, Robert N

    2015-06-01

    Injection therapy is a commonly used solution for back pain management. This procedure typically involves percutaneous insertion of a needle between or around the vertebrae, to deliver anesthetics near nerve bundles. Most frequently, spinal injections are performed either blindly using palpation or under the guidance of fluoroscopy or computed tomography. Recently, due to the drawbacks of the ionizing radiation of such imaging modalities, there has been a growing interest in using ultrasound imaging as an alternative. However, the complex spinal anatomy with different wave-like structures, affected by speckle noise, makes the accurate identification of the appropriate injection plane difficult. The aim of this study was to propose an automated system that can identify the optimal plane for epidural steroid injections and facet joint injections. A multi-scale and multi-directional feature extraction system to provide automated identification of the appropriate plane is proposed. Local Hadamard coefficients are obtained using the sequency-ordered Hadamard transform at multiple scales. Directional features are extracted from local coefficients which correspond to different regions in the ultrasound images. An artificial neural network is trained based on the local directional Hadamard features for classification. The proposed method yields distinctive features for classification which successfully classified 1032 images out of 1090 for epidural steroid injection and 990 images out of 1052 for facet joint injection. In order to validate the proposed method, a leave-one-out cross-validation was performed. The average classification accuracy for leave-one-out validation was 94 % for epidural and 90 % for facet joint targets. Also, the feature extraction time for the proposed method was 20 ms for a native 2D ultrasound image. 
A real-time machine learning system based on the local directional Hadamard features extracted by the sequency-ordered Hadamard transform for detecting the laminae and facet joints in ultrasound images has been proposed. The system has the potential to assist the anesthesiologists in quickly finding the target plane for epidural steroid injections and facet joint injections.
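
    The sequency-ordered Hadamard transform mentioned above can be sketched in a few lines. This is a one-dimensional toy version, assuming a Sylvester-type Hadamard matrix whose rows are reordered by their number of sign changes (their sequency); the paper applies local two-dimensional transforms at multiple scales, which this sketch does not attempt. The function names are hypothetical.

```python
def hadamard(n):
    """Natural-order (Sylvester) Hadamard matrix of size n, n a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = [[1]]
    while len(h) < n:
        # Sylvester doubling: [[H, H], [H, -H]]
        h = [row + row for row in h] + [row + [-v for v in row] for row in h]
    return h


def sign_changes(row):
    """Sequency of a row: how many times the sign flips along it."""
    return sum(1 for a, b in zip(row, row[1:]) if a != b)


def sequency_hadamard(n):
    """Hadamard matrix with rows sorted by increasing sequency."""
    return sorted(hadamard(n), key=sign_changes)


def sequency_transform(signal):
    """Unnormalized sequency-ordered Hadamard coefficients of a 1D signal."""
    return [sum(h * x for h, x in zip(row, signal))
            for row in sequency_hadamard(len(signal))]


if __name__ == "__main__":
    for row in sequency_hadamard(4):
        print(row)
    print(sequency_transform([1, -1, 1, -1]))
```

    Sorting by sequency places slowly varying basis rows first and rapidly alternating rows last, which is loosely analogous to ordering Fourier coefficients by frequency.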

  13. Clusters of ancestrally related genes that show paralogy in whole or in part are a major feature of the genomes of humans and other species.

    PubMed

    Walker, Michael B; King, Benjamin L; Paigen, Kenneth

    2012-01-01

    Arrangements of genes along chromosomes are a product of evolutionary processes, and we can expect that preferable arrangements will prevail over the span of evolutionary time, often being reflected in the non-random clustering of structurally and/or functionally related genes. Such non-random arrangements can arise by two distinct evolutionary processes: duplications of DNA sequences that give rise to clusters of genes sharing both sequence similarity and common sequence features and the migration together of genes related by function, but not by common descent. To provide a background for distinguishing between the two, which is important for future efforts to unravel the evolutionary processes involved, we here provide a description of the extent to which ancestrally related genes are found in proximity. Towards this purpose, we combined information from five genomic datasets, InterPro, SCOP, PANTHER, Ensembl protein families, and Ensembl gene paralogs. The results are provided in publicly available datasets (http://cgd.jax.org/datasets/clustering/paraclustering.shtml) describing the extent to which ancestrally related genes are in proximity beyond what is expected by chance (i.e. form paraclusters) in the human and nine other vertebrate genomes, as well as the D. melanogaster, C. elegans, A. thaliana, and S. cerevisiae genomes. With the exception of Saccharomyces, paraclusters are a common feature of the genomes we examined. In the human genome they are estimated to include at least 22% of all protein coding genes. Paraclusters are far more prevalent among some gene families than others, are highly species or clade specific and can evolve rapidly, sometimes in response to environmental cues. Altogether, they account for a large portion of the functional clustering previously reported in several genomes.

  14. Protein fold recognition using geometric kernel data fusion.

    PubMed

    Zakeri, Pooya; Jeuris, Ben; Vandebril, Raf; Moreau, Yves

    2014-07-01

    Various approaches based on features extracted from protein sequences and often machine learning methods have been used in the prediction of protein folds. Finding an efficient technique for integrating these different protein features has received increasing attention. In particular, kernel methods are an interesting class of techniques for integrating heterogeneous data. Various methods have been proposed to fuse multiple kernels. Most techniques for multiple kernel learning focus on learning a convex linear combination of base kernels. In addition to the limitation of linear combinations, working with such approaches could cause a loss of potentially useful information. We design several techniques to combine kernel matrices by taking more involved, geometry inspired means of these matrices instead of convex linear combinations. We consider various sequence-based protein features including information extracted directly from position-specific scoring matrices and local sequence alignment. We evaluate our methods for classification on the SCOP PDB-40D benchmark dataset for protein fold recognition. The best overall accuracy on the protein fold recognition test set obtained by our methods is ~86.7%. This is an improvement over the results of the best existing approach. Moreover, our computational model has been developed by incorporating the functional domain composition of proteins through a hybridization model. It is observed that by using our proposed hybridization model, the protein fold recognition accuracy is further improved to 89.30%. Furthermore, we investigate the performance of our approach on the protein remote homology detection problem by fusing multiple string kernels. The MATLAB code used for our proposed geometric kernel fusion frameworks is publicly available at http://people.cs.kuleuven.be/~raf.vandebril/homepage/software/geomean.php?menu=5/. © The Author 2014. Published by Oxford University Press.
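
    As an illustration of what a geometry-inspired mean of kernel matrices can look like, the sketch below computes the matrix geometric mean of two symmetric positive definite 2x2 matrices. It assumes the classical arithmetic-harmonic iteration, whose two sequences both converge to the geometric mean; this is a standard textbook construction chosen for brevity and is not claimed to be the algorithm in the authors' MATLAB code. The 2x2 restriction and all function names are assumptions made to keep the example dependency-free.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def add2(x, y):
    """Element-wise sum of two 2x2 matrices."""
    return [[x[i][j] + y[i][j] for j in range(2)] for i in range(2)]


def scale2(x, s):
    """Multiply every entry of a 2x2 matrix by the scalar s."""
    return [[s * v for v in row] for row in x]


def geometric_mean(a, b, iterations=40):
    """Geometric mean of two SPD 2x2 matrices.

    Repeatedly replaces (a, b) by their arithmetic mean (a + b) / 2 and
    harmonic mean 2 * (a^-1 + b^-1)^-1; both sequences share one limit.
    """
    for _ in range(iterations):
        arithmetic = scale2(add2(a, b), 0.5)
        harmonic = scale2(inv2(add2(inv2(a), inv2(b))), 2.0)
        a, b = arithmetic, harmonic
    return a


if __name__ == "__main__":
    k1 = [[1.0, 0.0], [0.0, 4.0]]   # toy "kernel" matrices
    k2 = [[9.0, 0.0], [0.0, 16.0]]
    print(geometric_mean(k1, k2))   # close to diag(3, 8) for these inputs
```

    For commuting matrices such as the diagonal pair above, the result reduces to the entry-wise square root of the product, which makes the example easy to check by hand. A convex linear combination of the same two matrices would give a different result.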

  15. Analysis of temporal transcription expression profiles reveal links between protein function and developmental stages of Drosophila melanogaster.

    PubMed

    Wan, Cen; Lees, Jonathan G; Minneci, Federico; Orengo, Christine A; Jones, David T

    2017-10-01

    Accurate gene or protein function prediction is a key challenge in the post-genome era. Most current methods perform well on molecular function prediction, but struggle to provide useful annotations relating to biological process functions due to the limited power of sequence-based features in that functional domain. In this work, we systematically evaluate the predictive power of temporal transcription expression profiles for protein function prediction in Drosophila melanogaster. Our results show significantly better performance on predicting protein function when transcription expression profile-based features are integrated with sequence-derived features, compared with the sequence-derived features alone. We also observe that the combination of expression-based and sequence-based features leads to further improvement of accuracy on predicting all three domains of gene function. Based on the optimal feature combinations, we then propose a novel multi-classifier-based function prediction method for Drosophila melanogaster proteins, FFPred-fly+. Interpreting our machine learning models also allows us to identify some of the underlying links between biological processes and developmental stages of Drosophila melanogaster.

  16. Conflict Background Triggered Congruency Sequence Effects in Graphic Judgment Task

    PubMed Central

    Zhao, Liang; Wang, Yonghui

    2013-01-01

    Congruency sequence effects refer to the reduction of congruency effects following an incongruent trial relative to following a congruent trial. The conflict monitoring account, one of the most influential explanations of this effect, assumes that the sequential modulations are evoked by response conflict. The present study aimed to explore congruency sequence effects in the absence of response conflict. We found that congruency sequence effects occurred in a graphic judgment task in which the conflict stimuli acted as irrelevant information. The findings reveal that processing task-irrelevant conflict stimulus features could also induce sequential modulations of interference. The results do not support the conflict monitoring interpretation and instead favor a feature integration account, in which the congruency sequence effects are attributed to repetitions of stimulus and response features. PMID:23372766

  17. Feature Selection Has a Large Impact on One-Class Classification Accuracy for MicroRNAs in Plants.

    PubMed

    Yousef, Malik; Saçar Demirci, Müşerref Duygu; Khalifa, Waleed; Allmer, Jens

    2016-01-01

    MicroRNAs (miRNAs) are short RNA sequences involved in posttranscriptional gene regulation. Their experimental analysis is complicated and, therefore, needs to be supplemented with computational miRNA detection. Currently, computational miRNA detection is mainly performed using machine learning and in particular two-class classification. For machine learning, the miRNAs need to be parametrized, and more than 700 features have been described. Positive training examples for machine learning are readily available, but negative data is hard to come by. Therefore, it seems preferable to use one-class classification instead of two-class classification. Previously, we were able to almost reach two-class classification accuracy using one-class classifiers. In this work, we employ feature selection procedures in conjunction with one-class classification and show that there is up to 36% difference in accuracy among these feature selection methods. The best feature set allowed the training of a one-class classifier which achieved an average accuracy of ~95.6%, thereby outperforming previous two-class-based plant miRNA detection approaches by about 0.5%. We believe that this can be improved upon in the future by rigorous filtering of the positive training examples and by improving current feature clustering algorithms to better target pre-miRNA feature selection.

  18. Structural Features of a Picornavirus Polymerase Involved in the Polyadenylation of Viral RNA

    PubMed Central

    Kempf, Brian J.; Kelly, Michelle M.; Springer, Courtney L.; Peersen, Olve B.

    2013-01-01

    Picornaviruses have 3′ polyadenylated RNA genomes, but the mechanisms by which these genomes are polyadenylated during viral replication remain obscure. Based on prior studies, we proposed a model wherein the poliovirus RNA-dependent RNA polymerase (3Dpol) uses a reiterative transcription mechanism while replicating the poly(A) and poly(U) portions of viral RNA templates. To further test this model, we examined whether mutations in 3Dpol influenced the polyadenylation of virion RNA. We identified nine alanine substitution mutations in 3Dpol that resulted in shorter or longer 3′ poly(A) tails in virion RNA. These mutations could disrupt structural features of 3Dpol required for the recruitment of a cellular poly(A) polymerase; however, the structural orientation of these residues suggests a direct role of 3Dpol in the polyadenylation of RNA genomes. Reaction mixtures containing purified 3Dpol and a template RNA with a defined poly(U) sequence provided data consistent with a template-dependent reiterative transcription mechanism for polyadenylation. The phylogenetically conserved structural features of 3Dpol involved in the polyadenylation of virion RNA include a thumb domain alpha helix that is positioned in the minor groove of the double-stranded RNA product and lysine and arginine residues that interact with the phosphates of both the RNA template and product strands. PMID:23468507

  19. MerCat: a versatile k-mer counter and diversity estimator for database-independent property analysis obtained from metagenomic and/or metatranscriptomic sequencing data

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    White, Richard A.; Panyala, Ajay R.; Glass, Kevin A.

    MerCat is a parallel, highly scalable and modular software package for robust property analysis of features in next-generation sequencing data. MerCat accepts assembled contigs and raw sequence reads from any platform as input and produces tables of feature abundance counts. MerCat allows direct analysis of data properties without the dependency on a reference sequence database that is common to search tools such as BLAST and/or DIAMOND, for compositional analysis of whole-community shotgun sequencing (e.g. metagenomes and metatranscriptomes).
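
    The idea of database-independent k-mer counting named in the record title can be shown with a short sketch. This is not MerCat's implementation or interface; `count_kmers` and `shannon_diversity` are hypothetical helper names, the Shannon index is only one possible diversity estimate, and skipping k-mers that contain characters other than A, C, G or T is an assumption made for the example.

```python
import math
from collections import Counter


def count_kmers(sequence, k):
    """Count every overlapping k-mer made only of A, C, G and T."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= set("ACGT"):  # skip ambiguous bases such as N
            counts[kmer] += 1
    return counts


def shannon_diversity(counts):
    """Shannon index (natural log) of a k-mer abundance table."""
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


if __name__ == "__main__":
    table = count_kmers("ATATA", 2)
    print(dict(table))               # AT and TA each occur twice
    print(shannon_diversity(table))  # ln(2) for two equally abundant k-mers
```

    Because the counts are derived from the reads or contigs alone, no reference database is consulted at any point, which is the property the abstract emphasizes.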

  20. Transcriptome analyses to investigate symbiotic relationships between marine protists

    PubMed Central

    Balzano, Sergio; Corre, Erwan; Decelle, Johan; Sierra, Roberto; Wincker, Patrick; Da Silva, Corinne; Poulain, Julie; Pawlowski, Jan; Not, Fabrice

    2015-01-01

    Rhizaria are an important component of oceanic plankton communities worldwide. A number of species harbor eukaryotic microalgal symbionts, which are horizontally acquired in the environment at each generation. Although these photosymbioses are determinant for Rhizaria ability to thrive in oceanic ecosystems, the mechanisms for symbiotic interactions are unclear. Using high-throughput sequencing technology (i.e., 454), we generated large Expressed Sequence Tag (EST) datasets from four uncultured Rhizaria, an acantharian (Amphilonche elongata), two polycystines (Collozoum sp. and Spongosphaera streptacantha), and one phaeodarian (Aulacantha scolymantha). We assessed the main genetic features of the host/symbionts consortium (i.e., the holobiont) transcriptomes and found rRNA sequences affiliated to a wide range of bacteria and protists in all samples, suggesting that diverse microbial communities are associated with the holobionts. A particular focus was then carried out to search for genes potentially involved in symbiotic processes such as the presence of c-type lectins-coding genes, which are proteins that play a role in cell recognition among eukaryotes. Unigenes coding putative c-type lectin domains (CTLD) were found in the species bearing photosynthetic symbionts (A. elongata, Collozoum sp., and S. streptacantha) but not in the non-symbiotic one (A. scolymantha). More particularly, phylogenetic analyses group CTLDs from A. elongata and Collozoum sp. on a distinct branch from S. streptacantha CTLDs, which contained carbohydrate-binding motifs typically observed in other marine photosymbiosis. Our data suggest that similarly to other well-known marine photosymbiosis involving metazoans, the interactions of glycans with c-type lectins is likely involved in modulation of the host/symbiont specific recognition in Radiolaria. PMID:25852650

  1. Informational structure of genetic sequences and nature of gene splicing

    NASA Astrophysics Data System (ADS)

    Trifonov, E. N.

    1991-10-01

    Only about 1/20 of DNA of higher organisms codes for proteins, by means of classical triplet code. The rest of DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus, being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of the Gnomic, language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications is to resolve such conflicts. New data are presented strongly indicating that the gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.

  2. Kindler syndrome with severe mucosal involvement in a large Palestinian pedigree.

    PubMed

    El Hachem, May; Diociaiuti, Andrea; Proto, Vittoria; Fortugno, Paola; Zambruno, Giovanna; Castiglia, Daniele; Naim, Majdy

    2015-01-01

    Kindler syndrome (KS) is a rare autosomal recessive disease of skin fragility, photosensitivity and progressive poikiloderma. Mucous membranes may also be involved. KS is caused by mutations in the FERMT1 gene encoding kindlin-1. We report the clinical and molecular features of the largest kindred with KS to date, comprising 18 affected family members (age range: 12-63 years) from the Gaza Strip. All the affected family members were clinically examined. In addition a skin biopsy for immunofluorescence testing was obtained from the index case. Molecular analysis of the FERMT1 gene was performed on genomic DNA extracted from peripheral blood of 5 patients. All patients presented skin and eye photosensitivity, cutaneous atrophy, dyschromia and poikiloderma, oral cavity involvement, dysphagia and constipation with anal fissures. In addition, nail dystrophy and digit webbing were observed in most of them. Ocular manifestations detected in all patients comprised ectropion and keratoconjunctivitis, with early development of symblepharon in 17 out of 18 cases and blindness in one. Of note, 17 out of 18 affected family members also suffered from urethral strictures since childhood. Diagnosis was supported by immunofluorescence findings and definitely confirmed by FERMT1 sequencing which identified the homozygous frame-shift mutation c.137_140delTAGT. The high rate of mucosal involvement, its early onset and progressive course are noticeable features of our kindred. Also noteworthy is the lack of muco-cutaneous malignancies, despite the sunny habitat.

  3. Autosomal dominant cerebellar ataxia with retinal degeneration (ADCA II): clinical and neuropathological findings in two pedigrees and genetic linkage to 3p12-p21.1.

    PubMed

    Jöbsis, G J; Weber, J W; Barth, P G; Keizers, H; Baas, F; van Schooneveld, M J; van Hilten, J J; Troost, D; Geesink, H H; Bolhuis, P A

    1997-04-01

    To investigate relations between clinical and neuropathological features and age of onset, presence of anticipation, and genetic linkage in autosomal dominant cerebellar ataxia type II (ADCA II). The natural history of ADCA II was studied on the basis of clinical and neuropathological findings in two pedigrees and genetic linkage studies were carried out with polymorphic DNA markers in the largest, four generation, pedigree. Ataxia was constant in all age groups. Retinal degeneration with early extinction of the electroretinogram constituted an important component in juvenile and early adult (< 25 years) onset but was variable in late adult presentation. Neuromuscular involvement due to spinal anterior horn disease was an important contributing factor to illness in juvenile cases. Postmortem findings in four patients confirm the general neurodegenerative nature of the disease, which includes prominent spinal anterior horn involvement and widespread involvement of grey and white matter. Genetic linkage was found with markers to chromosome 3p12-p21.1 (maximum pairwise lod score 4.42 at D3S1285). The sequence of clinical involvement seems related to age at onset. Retinal degeneration is variable in late onset patients and neuromuscular features are important in patients with early onset. Strong anticipation was found in subsequent generations. Linkage of ADCA II to chromosome 3p12-p21.1 is confirmed.

  4. Prediction of apoptosis protein locations with genetic algorithms and support vector machines through a new mode of pseudo amino acid composition.

    PubMed

    Kandaswamy, Krishna Kumar; Pugalenthi, Ganesan; Möller, Steffen; Hartmann, Enno; Kalies, Kai-Uwe; Suganthan, P N; Martinetz, Thomas

    2010-12-01

    Apoptosis is an essential process for controlling tissue homeostasis by regulating a physiological balance between cell proliferation and cell death. The subcellular locations of proteins performing the cell death are determined by mostly independent cellular mechanisms. The regular bioinformatics tools to predict the subcellular locations of such apoptotic proteins often fail. This work proposes a model for the sorting of proteins that are involved in apoptosis, allowing us to predict both their subcellular locations and the molecular properties that contribute to them. We report a novel hybrid Genetic Algorithm (GA)/Support Vector Machine (SVM) approach to predict apoptotic protein sequences using 119 sequence derived properties like frequency of amino acid groups, secondary structure, and physicochemical properties. GA is used for selecting a near-optimal subset of informative features that is most relevant for the classification. Jackknife cross-validation is applied to test the predictive capability of the proposed method on 317 apoptosis proteins. Our method achieved 85.80% accuracy using all 119 features and 89.91% accuracy for 25 features selected by GA. Our models were examined by a test dataset of 98 apoptosis proteins and obtained an overall accuracy of 90.34%. The results show that the proposed approach is promising; it is able to select small subsets of features and still improve the classification accuracy. Our model can contribute to the understanding of programmed cell death and drug discovery. The software and dataset are available at http://www.inb.uni-luebeck.de/tools-demos/apoptosis/GASVM.

  5. Robust k-mer frequency estimation using gapped k-mers

    PubMed Central

    Ghandi, Mahmoud; Mohammad-Noori, Morteza

    2013-01-01

    Oligomers of fixed length, k, commonly known as k-mers, are often used as fundamental elements in the description of DNA sequence features of diverse biological function, or as intermediate elements in the construction of more complex descriptors of sequence features such as position weight matrices. k-mers are very useful as general sequence features because they constitute a complete and unbiased feature set, and do not require parameterization based on incomplete knowledge of biological mechanisms. However, a fundamental limitation in the use of k-mers as sequence features is that as k is increased, larger spatial correlations in DNA sequence elements can be described, but the frequency of observing any specific k-mer becomes very small, and rapidly approaches a sparse matrix of binary counts. Thus any statistical learning approach using k-mers will be susceptible to noisy estimation of k-mer frequencies once k becomes large. Because all molecular DNA interactions have limited spatial extent, gapped k-mers often carry the relevant biological signal. Here we use gapped k-mer counts to more robustly estimate the ungapped k-mer frequencies, by deriving an equation for the minimum norm estimate of k-mer frequencies given an observed set of gapped k-mer frequencies. We demonstrate that this approach provides a more accurate estimate of the k-mer frequencies in real biological sequences using a sample of CTCF binding sites in the human genome. PMID:23861010
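
    To make the notion of gapped k-mer counts concrete, the following Python sketch enumerates them for a short sequence. It assumes a gapped k-mer is a window of length `l` in which only `k` positions are specified and the remaining positions are wildcards (written `-`); that definition, the function name `count_gapped_kmers`, and the toy sequence are illustrative assumptions. The sketch covers only the counting step, not the minimum norm estimate derived in the paper.

```python
from collections import Counter
from itertools import combinations

def count_gapped_kmers(seq, l, k):
    """Count gapped k-mers: length-l windows with k specified positions."""
    counts = Counter()
    for i in range(len(seq) - l + 1):
        window = seq[i:i + l]
        # Every choice of k informative positions yields one gapped pattern.
        for kept in combinations(range(l), k):
            pattern = "".join(
                window[j] if j in kept else "-" for j in range(l)
            )
            counts[pattern] += 1
    return counts

# Hypothetical toy sequence.
gapped = count_gapped_kmers("ACGTACGT", l=4, k=2)
```

    Because each window contributes to several gapped patterns, individual patterns are observed more often than any full-length k-mer, which is the property that makes them less sparse as features.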

  6. Robust k-mer frequency estimation using gapped k-mers.

    PubMed

    Ghandi, Mahmoud; Mohammad-Noori, Morteza; Beer, Michael A

    2014-08-01

    Oligomers of fixed length, k, commonly known as k-mers, are often used as fundamental elements in the description of DNA sequence features of diverse biological function, or as intermediate elements in the construction of more complex descriptors of sequence features such as position weight matrices. k-mers are very useful as general sequence features because they constitute a complete and unbiased feature set, and do not require parameterization based on incomplete knowledge of biological mechanisms. However, a fundamental limitation in the use of k-mers as sequence features is that as k is increased, larger spatial correlations in DNA sequence elements can be described, but the frequency of observing any specific k-mer becomes very small, and rapidly approaches a sparse matrix of binary counts. Thus any statistical learning approach using k-mers will be susceptible to noisy estimation of k-mer frequencies once k becomes large. Because all molecular DNA interactions have limited spatial extent, gapped k-mers often carry the relevant biological signal. Here we use gapped k-mer counts to more robustly estimate the ungapped k-mer frequencies, by deriving an equation for the minimum norm estimate of k-mer frequencies given an observed set of gapped k-mer frequencies. We demonstrate that this approach provides a more accurate estimate of the k-mer frequencies in real biological sequences using a sample of CTCF binding sites in the human genome.

  7. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Han, Cliff; Spring, Stefan; Lapidus, Alla

    Pedobacter heparinus (Payza and Korn 1956) Steyn et al. 1998 comb. nov. is the type species of the rapidly growing genus Pedobacter within the family Sphingobacteriaceae of the phylum 'Bacteroidetes'. P. heparinus is of interest, because it was the first isolated strain shown to grow with heparin as sole carbon and nitrogen source and because it produces several enzymes involved in the degradation of mucopolysaccharides. All available data about this species are based on a sole strain that was isolated from dry soil. Here we describe the features of this organism, together with the complete genome sequence, and annotation. This is the first report on a complete genome sequence of a member of the genus Pedobacter, and the 5,167,383 bp long single replicon genome with its 4287 protein-coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.

  8. Microfluidics for genome-wide studies involving next generation sequencing

    PubMed Central

    Murphy, Travis W.; Lu, Chang

    2017-01-01

    Next-generation sequencing (NGS) has revolutionized how molecular biology studies are conducted. Its decreasing cost and increasing throughput permit profiling of genomic, transcriptomic, and epigenomic features for a wide range of applications. Microfluidics has been proven to be highly complementary to NGS technology with its unique capabilities for handling small volumes of samples and providing platforms for automation, integration, and multiplexing. In this article, we review recent progress on applying microfluidics to facilitate genome-wide studies. We emphasize several technical aspects of NGS and how they benefit from coupling with microfluidic technology. We also summarize recent efforts on developing microfluidic technology for genomic, transcriptomic, and epigenomic studies, with emphasis on single cell analysis. We envision rapid growth in these directions, driven by the needs for testing scarce primary cell samples from patients in the context of precision medicine. PMID:28396707

  9. Contaminated and uncontaminated feeding influence perceived intimacy in mixed-sex dyads.

    PubMed

    Alley, Thomas R

    2012-06-01

    It was expected that viewers watching adult mixed-sex pairs dining together will give higher ratings of the perceived intimacy and involvement of the pair if feeding is displayed while eating, especially if the feeding involves contaminated (i.e., with potential germ transfer) foods. Our hypotheses were tested using a design in which participants viewed five videotapes in varying order. Each video showed different mixed-sex pairs of actors sharing a meal and included a distinct form of food sharing or none. These were shown to 50 small groups of young adults in quasi-random sequences to control for order effects. Immediately after each video, viewers were asked about the attractiveness, attraction and intimacy in the dyad they had just observed. As predicted, videos featuring contaminated feeding consistently produced higher ratings on involvement and attraction than those showing uncontaminated feeding which, in turn, mostly produced higher ratings on involvement and attraction than those showing no feeding behaviors. Copyright © 2012 Elsevier Ltd. All rights reserved.

  10. The first Taxus rhizosphere microbiome revealed by shotgun metagenomic sequencing.

    PubMed

    Hao, Da-Cheng; Zhang, Cai-Rong; Xiao, Pei-Gen

    2018-06-01

    In the present study, the shotgun high throughput metagenomic sequencing was implemented to globally capture the features of Taxus rhizosphere microbiome. Total reads could be assigned to 6925 species belonging to 113 bacteria phyla and 301 species of nine fungi phyla. For archaea and virus, 263 and 134 species were for the first time identified, respectively. More than 720,000 Unigenes were identified by clean reads assembly. The top five assigned phyla were Actinobacteria (363,941 Unigenes), Proteobacteria (182,053), Acidobacteria (44,527), Ascomycota (fungi; 18,267), and Chloroflexi (15,539). KEGG analysis predicted numerous functional genes; 7101 Unigenes belong to "Xenobiotics biodegradation and metabolism." A total of 12,040 Unigenes involved in defense mechanisms (e.g., xenobiotic metabolism) were annotated by eggNOG. Talaromyces addition could influence not only the diversity and structure of microbial communities of Taxus rhizosphere, but also the relative abundance of functional genes, including metabolic genes, antibiotic resistant genes, and genes involved in pathogen-host interaction, bacterial virulence, and bacterial secretion system. The structure and function of rhizosphere microbiome could be sensitive to non-native microbe addition, which could impact on the pollutant degradation. This study, complementary to the amplicon sequencing, more objectively reflects the native microbiome of Taxus rhizosphere and its response to environmental pressure, and lays a foundation for potential combination of phytoremediation and bioaugmentation. © 2018 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  11. The sequence of camelpox virus shows it is most closely related to variola virus, the cause of smallpox.

    PubMed

    Gubser, Caroline; Smith, Geoffrey L

    2002-04-01

    Camelpox virus (CMPV) and variola virus (VAR) are orthopoxviruses (OPVs) that share several biological features and cause high mortality and morbidity in their single host species. The sequence of a virulent CMPV strain was determined; it is 202182 bp long, with inverted terminal repeats (ITRs) of 6045 bp and has 206 predicted open reading frames (ORFs). As for other poxviruses, the genes are tightly packed with little non-coding sequence. Most genes within 25 kb of each terminus are transcribed outwards towards the terminus, whereas genes within the centre of the genome are transcribed from either DNA strand. The central region of the genome contains genes that are highly conserved in other OPVs and 87 of these are conserved in all sequenced chordopoxviruses. In contrast, genes towards either terminus are more variable and encode proteins involved in host range, virulence or immunomodulation. In some cases, these are broken versions of genes found in other OPVs. The relationship of CMPV to other OPVs was analysed by comparisons of DNA and predicted protein sequences, repeats within the ITRs and arrangement of ORFs within the terminal regions. Each comparison gave the same conclusion: CMPV is the closest known virus to variola virus, the cause of smallpox.

  12. Sequence Bundles: a novel method for visualising, discovering and exploring sequence motifs

    PubMed Central

    2014-01-01

    Background We introduce Sequence Bundles--a novel data visualisation method for representing multiple sequence alignments (MSAs). We identify and address key limitations of the existing bioinformatics data visualisation methods (i.e. the Sequence Logo) by enabling Sequence Bundles to give salient visual expression to sequence motifs and other data features, which would otherwise remain hidden. Methods For the development of Sequence Bundles we employed research-led information design methodologies. Sequences are encoded as uninterrupted, semi-opaque lines plotted on a 2-dimensional reconfigurable grid. Each line represents a single sequence. The thickness and opacity of the stack at each residue in each position indicates the level of conservation and the lines' curved paths expose patterns in correlation and functionality. Several MSAs can be visualised in a composite image. The Sequence Bundles method is designed to favour a tangible, continuous and intuitive display of information. Results We have developed a software demonstration application for generating a Sequence Bundles visualisation of MSAs provided for the BioVis 2013 redesign contest. A subsequent exploration of the visualised line patterns allowed for the discovery of a number of interesting features in the dataset. Reported features include the extreme conservation of sequences displaying a specific residue and bifurcations of the consensus sequence. Conclusions Sequence Bundles is a novel method for visualisation of MSAs and the discovery of sequence motifs. It can aid in generating new insight and hypothesis making. Sequence Bundles is well disposed for future implementation as an interactive visual analytics software, which can complement existing visualisation tools. PMID:25237395

  13. Novel de novo pathogenic variant in the NR2F2 gene in a boy with congenital heart defect and dysmorphic features.

    PubMed

    Upadia, Jariya; Gonzales, Patrick R; Robin, Nathaniel H

    2018-04-16

    The NR2F2 gene plays an important role in angiogenesis and heart development. Moreover, this gene is involved in organogenesis in many other organs in mouse models. Variants in this gene have been reported in a number of patients with nonsyndromic atrioventricular septal defect, and in one patient with congenital heart defect and dysmorphic features. Here we report an 11-month-old Caucasian male with global developmental delay, dysmorphic features, coarctation of the aorta, and ventricular septal defect. He was later found to have a pathogenic mutation in the NR2F2 gene by whole exome sequencing. This is the second instance in which an NR2F2 mutation has been identified in a child with a congenital heart defect and other anomalies. This case suggests that some variants in NR2F2 may cause syndromic forms of congenital heart defect. © 2018 Wiley Periodicals, Inc.

  14. Predicting protein-protein interactions by combing various sequence- derived features into the general form of Chou's Pseudo amino acid composition.

    PubMed

    Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao

    2012-05-01

    Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machineries of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information. These features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, the principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to a Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least be a useful supplementary tool to existing methods.

  15. Fifty years of coiled-coils and alpha-helical bundles: a close relationship between sequence and structure.

    PubMed

    Parry, David A D; Fraser, R D Bruce; Squire, John M

    2008-09-01

    alpha-Helical coiled coils are remarkable for the diversity of related conformations that they adopt in both fibrous and globular proteins, and for the range of functions that they exhibit. The coiled coils are based on a heptad (7-residue), hendecad (11-residue) or a related quasi-repeat of apolar residues in the sequences of the alpha-helical regions involved. Most of these, however, display one or more sequence discontinuities known as stutters or stammers. The resulting coiled coils vary in length, in the number of chains participating, in the relative polarity of the contributing alpha-helical regions (parallel or antiparallel), and in the pitch length and handedness of the supercoil (left- or right-handed). Functionally, the concept that a coiled coil can act only as a static rod is no longer valid, and the range of roles that these structures have now been shown to exhibit has expanded rapidly in recent years. An important development has been the recognition that the delightful simplicity that exists between sequence and structure, and between structure and function, allows coiled coils with specialized features to be designed de novo.

  16. Identification and removal of low-complexity sites in allele-specific analysis of ChIP-seq data.

    PubMed

    Waszak, Sebastian M; Kilpinen, Helena; Gschwind, Andreas R; Orioli, Andrea; Raghav, Sunil K; Witwicki, Robert M; Migliavacca, Eugenia; Yurovsky, Alisa; Lappalainen, Tuuli; Hernandez, Nouria; Reymond, Alexandre; Dermitzakis, Emmanouil T; Deplancke, Bart

    2014-01-15

    High-throughput sequencing technologies enable the genome-wide analysis of the impact of genetic variation on molecular phenotypes at unprecedented resolution. However, although powerful, these technologies can also introduce unexpected artifacts. We investigated the impact of library amplification bias on the identification of allele-specific (AS) molecular events from high-throughput sequencing data derived from chromatin immunoprecipitation assays (ChIP-seq). Putative AS DNA binding activity for RNA polymerase II was determined using ChIP-seq data derived from lymphoblastoid cell lines of two parent-daughter trios. We found that, at high-sequencing depth, many significant AS binding sites suffered from an amplification bias, as evidenced by a larger number of clonal reads representing one of the two alleles. To alleviate this bias, we devised an amplification bias detection strategy, which filters out sites with low read complexity and sites featuring a significant excess of clonal reads. This method will be useful for AS analyses involving ChIP-seq and other functional sequencing assays. The R package absfilter for library clonality simulations and detection of amplification-biased sites is available from http://updepla1srv1.epfl.ch/waszaks/absfilter

  17. Comparative analysis on the structural features of the 5' flanking region of κ-casein genes from six different species

    PubMed Central

    Gerencsér, Ákos; Barta, Endre; Boa, Simon; Kastanis, Petros; Bösze, Zsuzsanna; Whitelaw, C Bruce A

    2002-01-01

    κ-casein plays an essential role in the formation, stabilisation and aggregation of milk micelles. Control of κ-casein expression reflects this essential role, although an understanding of the mechanisms involved lags behind that of the other milk protein genes. We determined the 5'-flanking sequences for the murine, rabbit and human κ-casein genes and compared them to the published ruminant sequences. The most conserved region was not the proximal promoter region but an approximately 400 bp long region centred 800 bp upstream of the TATA box. This region contained two highly conserved MGF/STAT5 sites with common spacing relative to each other. In this region, six conserved short stretches of similarity were also found which did not correspond to known transcription factor consensus sites. In contrast to the ruminant and human 5' regulatory sequences, the rabbit and murine 5'-flanking regions did not harbour any kind of repetitive elements. We generated a phylogenetic tree of the six species based on multiple alignment of the κ-casein sequences. This study identified conserved candidate transcriptional regulatory elements within the κ-casein gene promoter. PMID:11929628

  18. Robust sensorimotor representation to physical interaction changes in humanoid motion learning.

    PubMed

    Shimizu, Toshihiko; Saegusa, Ryo; Ikemoto, Shuhei; Ishiguro, Hiroshi; Metta, Giorgio

    2015-05-01

    This paper proposes a learning from demonstration system based on a motion feature, called phase transfer sequence. The system aims to synthesize the knowledge on humanoid whole body motions learned during teacher-supported interactions, and apply this knowledge during different physical interactions between a robot and its surroundings. The phase transfer sequence represents the temporal order of the changing points in multiple time sequences. It encodes the dynamical aspects of the sequences so as to absorb the gaps in timing and amplitude derived from interaction changes. The phase transfer sequence was evaluated in reinforcement learning of sitting-up and walking motions conducted by a real humanoid robot and compatible simulator. In both tasks, the robotic motions were less dependent on physical interactions when learned by the proposed feature than by conventional similarity measurements. Phase transfer sequence also enhanced the convergence speed of motion learning. Our proposed feature is original primarily because it absorbs the gaps caused by changes of the originally acquired physical interactions, thereby enhancing the learning speed in subsequent interactions.

  19. WebLogo: A Sequence Logo Generator

    PubMed Central

    Crooks, Gavin E.; Hon, Gary; Chandonia, John-Marc; Brenner, Steven E.

    2004-01-01

    WebLogo generates sequence logos, graphical representations of the patterns within a multiple sequence alignment. Sequence logos provide a richer and more precise description of sequence similarity than consensus sequences and can rapidly reveal significant features of the alignment otherwise difficult to perceive. Each logo consists of stacks of letters, one stack for each position in the sequence. The overall height of each stack indicates the sequence conservation at that position (measured in bits), whereas the height of symbols within the stack reflects the relative frequency of the corresponding amino or nucleic acid at that position. WebLogo has been enhanced recently with additional features and options, to provide a convenient and highly configurable sequence logo generator. A command line interface and the complete, open WebLogo source code are available for local installation and customization. PMID:15173120
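
    The stack and symbol heights described in this record follow from the per-position Shannon entropy. The Python sketch below computes them for a small nucleotide alignment. It is not WebLogo's code: the function name `logo_heights` and the toy alignment are hypothetical, gap characters are simply ignored, and the small-sample correction that logo generators typically apply is omitted.

```python
import math
from collections import Counter

def logo_heights(aligned_seqs, alphabet="ACGT"):
    """Return, per alignment column, the height in bits of each symbol."""
    max_bits = math.log2(len(alphabet))  # 2 bits for nucleic acids
    columns = []
    for pos in range(len(aligned_seqs[0])):
        counts = Counter(seq[pos] for seq in aligned_seqs)
        total = sum(counts[s] for s in alphabet)
        if total == 0:  # column holds only gaps or unknown symbols
            columns.append({})
            continue
        freqs = {s: counts[s] / total for s in alphabet if counts[s]}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        info = max_bits - entropy  # overall stack height (conservation)
        # Each symbol's height is its relative frequency times the stack height.
        columns.append({s: p * info for s, p in freqs.items()})
    return columns

# Hypothetical toy alignment of four sequences.
alignment = ["ACGT", "ACGA", "ACTC", "AGCG"]
heights = logo_heights(alignment)
```

    In this toy alignment the first column is fully conserved, so its stack reaches the 2-bit maximum, while the last column contains all four bases equally often and its stack height is zero.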

  20. Genome Sequence of Azospirillum brasilense CBG497 and Comparative Analyses of Azospirillum Core and Accessory Genomes provide Insight into Niche Adaptation

    PubMed Central

    Wisniewski-Dyé, Florence; Lozano, Luis; Acosta-Cruz, Erika; Borland, Stéphanie; Drogue, Benoît; Prigent-Combaret, Claire; Rouy, Zoé; Barbe, Valérie; Mendoza Herrera, Alberto; González, Victor; Mavingui, Patrick

    2012-01-01

    Bacteria of the genus Azospirillum colonize roots of important cereals and grasses, and promote plant growth by several mechanisms, notably phytohormone synthesis. The genomes of several Azospirillum strains belonging to different species, isolated from various host plants and locations, were recently sequenced and published. In this study, an additional genome of an A. brasilense strain, isolated from maize grown on an alkaline soil in the northeast of Mexico, strain CBG497, was obtained. Comparative genomic analyses were performed on this new genome and three other genomes (A. brasilense Sp245, A. lipoferum 4B and Azospirillum sp. B510). The Azospirillum core genome was established and consists of 2,328 proteins, representing between 30% to 38% of the total encoded proteins within a genome. It is mainly chromosomally-encoded and contains 74% of genes of ancestral origin shared with some aquatic relatives. The non-ancestral part of the core genome is enriched in genes involved in signal transduction, in transport and in metabolism of carbohydrates and amino-acids, and in surface properties features linked to adaptation in fluctuating environments, such as soil and rhizosphere. Many genes involved in colonization of plant roots, plant-growth promotion (such as those involved in phytohormone biosynthesis), and properties involved in rhizosphere adaptation (such as catabolism of phenolic compounds, uptake of iron) are restricted to a particular strain and/or species, strongly suggesting niche-specific adaptation. PMID:24705077
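
    The split between a core genome and strain-specific genes can be sketched as a set intersection over ortholog families. The Python example below assumes that proteins have already been clustered into families by some orthology method, which is not shown; the function name `core_and_accessory`, the strain labels, and the family identifiers are hypothetical toy data, not results from the study.

```python
def core_and_accessory(genomes):
    """Split ortholog families into core (in every genome) and accessory sets."""
    family_sets = [set(families) for families in genomes.values()]
    core = set.intersection(*family_sets)
    accessory = {name: set(fams) - core for name, fams in genomes.items()}
    return core, accessory

# Hypothetical toy data: each genome is a list of ortholog family IDs.
toy_genomes = {
    "strain_A": ["fam1", "fam2", "fam3"],
    "strain_B": ["fam1", "fam2", "fam4"],
    "strain_C": ["fam1", "fam2", "fam5", "fam6"],
}
core, accessory = core_and_accessory(toy_genomes)

# Share of each genome's families that belong to the core.
core_share = {
    name: len(core) / len(set(fams)) for name, fams in toy_genomes.items()
}
```

    The `core_share` values correspond to the kind of per-genome percentage quoted in the abstract, computed here on toy data only.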

  1. MEvoLib v1.0: the first molecular evolution library for Python.

    PubMed

    Álvarez-Jarreta, Jorge; Ruiz-Pesini, Eduardo

    2016-10-28

    Molecular evolution studies involve many different hard computational problems solved, in most cases, with heuristic algorithms that provide a nearly optimal solution. Hence, diverse software tools exist for the different stages involved in a molecular evolution workflow. We present MEvoLib, the first molecular evolution library for Python, providing a framework to work with different tools and methods involved in the common tasks of molecular evolution workflows. In contrast with already existing bioinformatics libraries, MEvoLib is focused on the stages involved in molecular evolution studies, enclosing the set of tools with a common purpose in a single high-level interface with fast access to their frequent parameterizations. The gene clustering from partial or complete sequences has been improved with a new method that integrates accessible external information (e.g. GenBank's features data). Moreover, MEvoLib adjusts the fetching process from NCBI databases to optimize the download bandwidth usage. In addition, it has been implemented using parallelization techniques to cope with even large-scale scenarios. MEvoLib is the first library for Python designed to facilitate molecular evolution research for both expert and novice users. Its unique interface for each common task comprises several tools with their most used parameterizations. It also includes a method that takes advantage of biological knowledge to improve the gene partition of sequence datasets. Additionally, its implementation incorporates parallelization techniques to reduce computational costs when handling very large input datasets.

  2. Prospecting Biotechnologically-Relevant Monooxygenases from Cold Sediment Metagenomes: An In Silico Approach

    DOE PAGES

    Musumeci, Matias A.; Lozada, Mariana; Rial, Daniela V.; ...

    2017-04-09

    The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. As a result, this work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.

  3. Prospecting Biotechnologically-Relevant Monooxygenases from Cold Sediment Metagenomes: An In Silico Approach

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Musumeci, Matias A.; Lozada, Mariana; Rial, Daniela V.

    The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. As a result, this work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.

  4. Prospecting Biotechnologically-Relevant Monooxygenases from Cold Sediment Metagenomes: An In Silico Approach.

    PubMed

    Musumeci, Matías A; Lozada, Mariana; Rial, Daniela V; Mac Cormack, Walter P; Jansson, Janet K; Sjöling, Sara; Carroll, JoLynn; Dionisi, Hebe M

    2017-04-09

    The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer-Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. This work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments.

  5. Prospecting Biotechnologically-Relevant Monooxygenases from Cold Sediment Metagenomes: An In Silico Approach

    PubMed Central

    Musumeci, Matías A.; Lozada, Mariana; Rial, Daniela V.; Mac Cormack, Walter P.; Jansson, Janet K.; Sjöling, Sara; Carroll, JoLynn; Dionisi, Hebe M.

    2017-01-01

    The goal of this work was to identify sequences encoding monooxygenase biocatalysts with novel features by in silico mining an assembled metagenomic dataset of polar and subpolar marine sediments. The targeted enzyme sequences were Baeyer–Villiger and bacterial cytochrome P450 monooxygenases (CYP153). These enzymes have wide-ranging applications, from the synthesis of steroids, antibiotics, mycotoxins and pheromones to the synthesis of monomers for polymerization and anticancer precursors, due to their extraordinary enantio-, regio-, and chemo- selectivity that are valuable features for organic synthesis. Phylogenetic analyses were used to select the most divergent sequences affiliated to these enzyme families among the 264 putative monooxygenases recovered from the ~14 million protein-coding sequences in the assembled metagenome dataset. Three-dimensional structure modeling and docking analysis suggested features useful in biotechnological applications in five metagenomic sequences, such as wide substrate range, novel substrate specificity or regioselectivity. Further analysis revealed structural features associated with psychrophilic enzymes, such as broader substrate accessibility, larger catalytic pockets or low domain interactions, suggesting that they could be applied in biooxidations at room or low temperatures, saving costs inherent to energy consumption. This work allowed the identification of putative enzyme candidates with promising features from metagenomes, providing a suitable starting point for further developments. PMID:28397770

  6. Double-strand break repair processes drive evolution of the mitochondrial genome in Arabidopsis.

    PubMed

    Davila, Jaime I; Arrieta-Montiel, Maria P; Wamboldt, Yashitola; Cao, Jun; Hagmann, Joerg; Shedge, Vikas; Xu, Ying-Zhi; Weigel, Detlef; Mackenzie, Sally A

    2011-09-27

    The mitochondrial genome of higher plants is unusually dynamic, with recombination and nonhomologous end-joining (NHEJ) activities producing variability in size and organization. Plant mitochondrial DNA also generally displays much lower nucleotide substitution rates than mammalian or yeast systems. Arabidopsis displays these features and expedites characterization of the mitochondrial recombination surveillance gene MSH1 (MutS 1 homolog), lending itself to detailed study of de novo mitochondrial genome activity. In the present study, we investigated the underlying basis for unusual plant features as they contribute to rapid mitochondrial genome evolution. We obtained evidence of double-strand break (DSB) repair, including NHEJ, sequence deletions and mitochondrial asymmetric recombination activity in Arabidopsis wild-type and msh1 mutants on the basis of data generated by Illumina deep sequencing and confirmed by DNA gel blot analysis. On a larger scale, with mitochondrial comparisons across 72 Arabidopsis ecotypes, similar evidence of DSB repair activity differentiated ecotypes. Forty-seven repeat pairs were active in DNA exchange in the msh1 mutant. Recombination sites showed asymmetrical DNA exchange within lengths of 50- to 556-bp sharing sequence identity as low as 85%. De novo asymmetrical recombination involved heteroduplex formation, gene conversion and mismatch repair activities. Substoichiometric shifting by asymmetrical exchange created the appearance of rapid sequence gain and loss in association with particular repeat classes. Extensive mitochondrial genomic variation within a single plant species derives largely from DSB activity and its repair. Observed gene conversion and mismatch repair activity contribute to the low nucleotide substitution rates seen in these genomes. On a phenotypic level, these patterns of rearrangement likely contribute to the reproductive versatility of higher plants.

  7. Metal-free trifluoromethylation of aromatic and heteroaromatic aldehydes and ketones.

    PubMed

    Qiao, Yupu; Si, Tuda; Yang, Ming-Hsiu; Altman, Ryan A

    2014-08-01

    The ability to convert simple and common substrates into fluoroalkyl derivatives under mild conditions remains an important goal for medicinal and agricultural chemists. One representative example of a desirable transformation involves the conversion of aromatic and heteroaromatic ketones and aldehydes into aryl and heteroaryl β,β,β-trifluoroethylarenes and -heteroarenes. The traditional approach for this net transformation involves stoichiometric metals and/or multistep reaction sequences that consume excessive time, material, and labor resources while providing low yields of products. To complement these traditional strategies, we report a one-pot metal-free decarboxylative procedure for accessing β,β,β-trifluoroethylarenes and -heteroarenes from readily available ketones and aldehydes. This method features several benefits, including ease of operation, readily available reagents, mild reaction conditions, high functional-group compatibility, and scalability.

  8. Metal-Free Trifluoromethylation of Aromatic and Heteroaromatic Aldehydes and Ketones

    PubMed Central

    2015-01-01

    The ability to convert simple and common substrates into fluoroalkyl derivatives under mild conditions remains an important goal for medicinal and agricultural chemists. One representative example of a desirable transformation involves the conversion of aromatic and heteroaromatic ketones and aldehydes into aryl and heteroaryl β,β,β-trifluoroethylarenes and -heteroarenes. The traditional approach for this net transformation involves stoichiometric metals and/or multistep reaction sequences that consume excessive time, material, and labor resources while providing low yields of products. To complement these traditional strategies, we report a one-pot metal-free decarboxylative procedure for accessing β,β,β-trifluoroethylarenes and -heteroarenes from readily available ketones and aldehydes. This method features several benefits, including ease of operation, readily available reagents, mild reaction conditions, high functional-group compatibility, and scalability. PMID:25001876

  9. Genetic Rearrangements Can Modify Chromatin Features at Epialleles

    PubMed Central

    Foerster, Andrea M.; Dinh, Huy Q.; Sedman, Laura; Wohlrab, Bonnie; Mittelsten Scheid, Ortrun

    2011-01-01

    Analogous to genetically distinct alleles, epialleles represent heritable states of different gene expression from sequence-identical genes. Alleles and epialleles both contribute to phenotypic heterogeneity. While alleles originate from mutation and recombination, the source of epialleles is less well understood. We analyze active and inactive epialleles that were found at a transgenic insert with a selectable marker gene in Arabidopsis. Both converse expression states are stably transmitted to progeny. The silent epiallele was previously shown to change its state upon loss-of-function of trans-acting regulators and drug treatments. We analyzed the composition of the epialleles, their chromatin features, their nuclear localization, transcripts, and homologous small RNA. After mutagenesis by T-DNA transformation of plants carrying the silent epiallele, we found new active alleles. These switches were associated with different, larger or smaller, and non-overlapping deletions or rearrangements in the 3′ regions of the epiallele. These cis-mutations caused different degrees of gene expression stability depending on the nature of the sequence alteration, the consequences for transcription and transcripts, and the resulting chromatin organization upstream. This illustrates a tight dependence of epigenetic regulation on local structures and indicates that sequence alterations can cause epigenetic changes at some distance in regions not directly affected by the mutation. Similar effects may also be involved in gene expression and chromatin changes in the vicinity of transposon insertions or excisions, recombination events, or DNA repair processes and could contribute to the origin of new epialleles. PMID:22028669

  10. Using Highlighting to Train Attentional Expertise

    PubMed Central

    Roads, Brett; Mozer, Michael C.; Busey, Thomas A.

    2016-01-01

    Acquiring expertise in complex visual tasks is time consuming. To facilitate the efficient training of novices on where to look in these tasks, we propose an attentional highlighting paradigm. Highlighting involves dynamically modulating the saliency of a visual image to guide attention along the fixation path of a domain expert who had previously viewed the same image. In Experiment 1, we trained naive subjects via attentional highlighting on a fingerprint-matching task. Before and after training, we asked subjects to freely inspect images containing pairs of prints and determine whether the prints matched. Fixation sequences were automatically scored for the degree of expertise exhibited using a Bayesian discriminative model of novice and expert gaze behavior. Highlighted training causes gaze behavior to become more expert-like not only on the trained images but also on transfer images, indicating generalization of learning. In Experiment 2, to control for the possibility that the increase in expertise is due to mere exposure, we trained subjects via highlighting of fixation sequences from novices, not experts, and observed no transition toward expertise. In Experiment 3, to determine the specificity of the training effect, we trained subjects with expert fixation sequences from images other than the one being viewed, which preserves coarse-scale statistics of expert gaze but provides no information about fine-grain features. Observing at least a partial transition toward expertise, we obtain only weak evidence that the highlighting procedure facilitates the learning of critical local features. We discuss possible improvements to the highlighting procedure. PMID:26744839

  11. Using Highlighting to Train Attentional Expertise.

    PubMed

    Roads, Brett; Mozer, Michael C; Busey, Thomas A

    2016-01-01

    Acquiring expertise in complex visual tasks is time consuming. To facilitate the efficient training of novices on where to look in these tasks, we propose an attentional highlighting paradigm. Highlighting involves dynamically modulating the saliency of a visual image to guide attention along the fixation path of a domain expert who had previously viewed the same image. In Experiment 1, we trained naive subjects via attentional highlighting on a fingerprint-matching task. Before and after training, we asked subjects to freely inspect images containing pairs of prints and determine whether the prints matched. Fixation sequences were automatically scored for the degree of expertise exhibited using a Bayesian discriminative model of novice and expert gaze behavior. Highlighted training causes gaze behavior to become more expert-like not only on the trained images but also on transfer images, indicating generalization of learning. In Experiment 2, to control for the possibility that the increase in expertise is due to mere exposure, we trained subjects via highlighting of fixation sequences from novices, not experts, and observed no transition toward expertise. In Experiment 3, to determine the specificity of the training effect, we trained subjects with expert fixation sequences from images other than the one being viewed, which preserves coarse-scale statistics of expert gaze but provides no information about fine-grain features. Observing at least a partial transition toward expertise, we obtain only weak evidence that the highlighting procedure facilitates the learning of critical local features. We discuss possible improvements to the highlighting procedure.

  12. RNA-ID, a highly sensitive and robust method to identify cis-regulatory sequences using superfolder GFP and a fluorescence-based assay.

    PubMed

    Dean, Kimberly M; Grayhack, Elizabeth J

    2012-12-01

    We have developed a robust and sensitive method, called RNA-ID, to screen for cis-regulatory sequences in RNA using fluorescence-activated cell sorting (FACS) of yeast cells bearing a reporter in which expression of both superfolder green fluorescent protein (GFP) and yeast codon-optimized mCherry red fluorescent protein (RFP) is driven by the bidirectional GAL1,10 promoter. This method recapitulates previously reported progressive inhibition of translation mediated by increasing numbers of CGA codon pairs, and restoration of expression by introduction of a tRNA with an anticodon that base pairs exactly with the CGA codon. This method also reproduces effects of paromomycin and context on stop codon read-through. Five key features of this method contribute to its effectiveness as a selection for regulatory sequences: The system exhibits greater than a 250-fold dynamic range, a quantitative and dose-dependent response to known inhibitory sequences, exquisite resolution that allows nearly complete physical separation of distinct populations, and a reproducible signal between different cells transformed with the identical reporter, all of which are coupled with simple methods involving ligation-independent cloning, to create large libraries. Moreover, we provide evidence that there are sequences within a 9-nt library that cause reduced GFP fluorescence, suggesting that there are novel cis-regulatory sequences to be found even in this short sequence space. This method is widely applicable to the study of both RNA-mediated and codon-mediated effects on expression.

  13. The SIDER2 elements, interspersed repeated sequences that populate the Leishmania genomes, constitute subfamilies showing chromosomal proximity relationship.

    PubMed

    Requena, Jose M; Folgueira, Cristina; López, Manuel C; Thomas, M Carmen

    2008-06-02

    Protozoan parasites of the genus Leishmania are causative agents of a diverse spectrum of human diseases collectively known as leishmaniasis. These eukaryotic pathogens that diverged early from the main eukaryotic lineage possess a number of unusual genomic, molecular and biochemical features. The completion of the genome projects for three Leishmania species has generated invaluable information enabling a direct analysis of genome structure and organization. By using DNA macroarrays, made with Leishmania infantum genomic clones and hybridized with total DNA from the parasite, we identified a clone containing a repeated sequence. An analysis of the recently completed genome sequence of L. infantum, using this repeated sequence as bait, led to the identification of a new class of repeated elements that are interspersed along the different L. infantum chromosomes. These elements turned out to be homologues of SIDER2 sequences, which were recently identified in the Leishmania major genome; thus, we adopted this nomenclature for the Leishmania elements described herein. Since SIDER2 elements are very heterogeneous in sequence, their precise identification is rather laborious. We have characterized 54 LiSIDER2 elements in chromosome 32 and 27 in chromosome 20. The mean size for these elements is 550 bp and their sequence is G+C rich (mean value of 66.5%). On the basis of sequence similarity, these elements can be grouped in subfamilies that show a remarkable relationship of proximity, i.e. SIDER2s of a given subfamily are located close together in a chromosomal region without intercalating elements. For comparative purposes, we have identified the SIDER2 elements existing in L. major and Leishmania braziliensis chromosomes 32. While SIDER2 elements are highly conserved both in number and location between L. infantum and L. major, no such conservation exists when comparing with SIDER2s in L. braziliensis chromosome 32. SIDER2 elements constitute a relevant piece in the Leishmania genome organization. Sequence characteristics, genomic distribution and evolutionary conservation of SIDER2s are suggestive of relevant functions for these elements in Leishmania. Apart from a proven involvement in post-transcriptional mechanisms of gene regulation, SIDER2 elements could be involved in DNA amplification processes and, perhaps, in chromosome segregation as centromeric sequences.

  14. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection.

    PubMed

    Ma, Xin; Guo, Jing; Sun, Xiao

    2015-01-01

    The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), nonbinding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features have important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved the best performance (86.62% accuracy and 0.737 Matthews correlation coefficient). The high prediction accuracy suggests that our method can be a useful approach to identify RNA-binding proteins from sequence information.
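
    The two performance measures quoted above, accuracy and the Matthews correlation coefficient (MCC), can both be computed from a binary confusion matrix. The sketch below uses the standard definitions; the function names and the example counts are hypothetical and are not taken from the study.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix.

    Ranges from -1 (total disagreement) through 0 (chance level) to +1
    (perfect prediction). Returns 0.0 when the denominator is zero, a
    common convention assumed here.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


if __name__ == "__main__":
    # Hypothetical counts for illustration only
    print(accuracy(tp=40, tn=45, fp=5, fn=10))
    print(matthews_corrcoef(tp=40, tn=45, fp=5, fn=10))
```

    Unlike accuracy, the MCC uses all four cells of the confusion matrix, so it remains informative when binding and nonbinding proteins are unevenly represented.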

  15. Molecular Structure and Sequence in Complex Coacervates

    NASA Astrophysics Data System (ADS)

    Sing, Charles; Lytle, Tyler; Madinya, Jason; Radhakrishna, Mithun

    Oppositely-charged polyelectrolytes in aqueous solution can undergo associative phase separation, in a process known as complex coacervation. This results in a polyelectrolyte-dense phase (coacervate) and polyelectrolyte-dilute phase (supernatant). There remain challenges in understanding this process, despite a long history in polymer physics. We use Monte Carlo simulation to demonstrate that molecular features (charge spacing, size) play a crucial role in governing the equilibrium in coacervates. We show how these molecular features give rise to strong monomer sequence effects, due to a combination of counterion condensation and correlation effects. We distinguish between structural and sequence-based correlations, which can be designed to tune the phase diagram of coacervation. Sequence effects further inform the physical understanding of coacervation, and provide the basis for new coacervation models that take monomer-level features into account.

  16. The mitochondrial genome of a sea anemone Bolocera sp. exhibits novel genetic structures potentially involved in adaptation to the deep-sea environment.

    PubMed

    Zhang, Bo; Zhang, Yan-Hong; Wang, Xin; Zhang, Hui-Xian; Lin, Qiang

    2017-07-01

    The deep sea is one of the most extensive ecosystems on earth. Organisms living there survive in an extremely harsh environment, and their mitochondrial energy metabolism might be a result of evolution. As one of the most important organelles, mitochondria generate energy through energy metabolism and play an important role in almost all biological activities. In this study, the mitogenome of a deep-sea sea anemone ( Bolocera sp.) was sequenced and characterized. Like other metazoans, it contained 13 energy pathway protein-coding genes and two ribosomal RNAs. However, it also exhibited some unique features: just two transfer RNA genes, two group I introns, two transposon-like noncanonical open reading frames (ORFs), and a control region-like (CR-like) element. All of the mitochondrial genes were coded by the same strand (the H-strand). The genetic order and orientation were identical to those of most sequenced actiniarians. Phylogenetic analyses showed that this species was closely related to Bolocera tuediae . Positive selection analysis showed that three residues (31 L and 42 N in ATP6 , 570 S in ND5 ) of Bolocera sp. were positively selected sites. By comparing these features with those of shallow sea anemone species, we deduced that these novel gene features may influence the activity of mitochondrial genes. This study may provide some clues regarding the adaptation of Bolocera sp. to the deep-sea environment.

  17. A computational analysis of the three isoforms of glutamate dehydrogenase reveals structural features of the isoform EC 1.4.1.4 supporting a key role in ammonium assimilation by plants

    PubMed Central

    Jaspard, Emmanuel

    2006-01-01

    Background: There are three isoforms of glutamate dehydrogenase. The isoform EC 1.4.1.4 (GDH4) catalyses glutamate synthesis from 2-oxoglutarate and ammonium, using NAD(P)H. Ammonium assimilation is critical for plant growth. Although GDH4 from animals and prokaryotes are well characterized, there are few data concerning plant GDH4, even from those whose genomes are well annotated. Results: A large set of the three GDH isoforms was built resulting in 116 non-redundant full polypeptide sequences. A computational analysis was made to gain more information concerning the structure-function relationship of GDH4 from plants (Eukaryota, Viridiplantae). The tested plant GDH4 sequences were the only two known to date, those of Chlorella sorokiniana. This analysis revealed several structural features specific to plant GDH4: (i) the lack of a structure called "antenna"; (ii) the NAD(P)-binding motif GAGNVA; and (iii) a second putative coenzyme-binding motif GVLTGKG together with four residues involved in the binding of the reduced form of NADP. Conclusion: A number of structural features specific to plant GDH4 have been found. The results reinforce the probable key role of GDH4 in ammonium assimilation by plants. Reviewers: This article was reviewed by Tina Bakolitsa (nominated by Eugene Koonin), Martin Jambon (nominated by Laura Landweber), Sandor Pangor and Franck Eisenhaber. PMID:17173671

  18. Utility of fat-suppressed sequences in differentiation of aggressive vs typical asymptomatic haemangioma of the spine

    PubMed Central

    Nabavizadeh, Seyed Ali; Mamourian, Alexander; Schmitt, James E; Cloran, Francis; Vossough, Arastoo; Pukenas, Bryan; Loevner, Laurie A

    2016-01-01

    Objective: While haemangiomas are common benign vascular lesions involving the spine, some behave in an aggressive fashion. We investigated the utility of fat-suppressed sequences to differentiate between benign and aggressive vertebral haemangiomas. Methods: Patients with the diagnosis of aggressive vertebral haemangioma and available short tau inversion-recovery or T2 fat saturation sequence were included in the study. 11 patients with typical asymptomatic vertebral body haemangiomas were selected as the control group. Region of interest signal intensity (SI) analysis of the entire haemangioma as well as the portion of each haemangioma with highest signal on fat-saturation sequences was performed and normalized to a reference normal vertebral body. Results: A total of 8 patients with aggressive vertebral haemangioma and 11 patients with asymptomatic typical vertebral haemangioma were included. There was a significant difference between total normalized mean SI ratio (3.14 vs 1.48, p = 0.0002), total normalized maximum SI ratio (5.72 vs 2.55, p = 0.0003), brightest normalized mean SI ratio (4.28 vs 1.72, p < 0.0001) and brightest normalized maximum SI ratio (5.25 vs 2.45, p = 0.0003). Multiple measures were able to discriminate between groups with high sensitivity (>88%) and specificity (>82%). Conclusion: In addition to the conventional imaging features such as vertebral expansion and presence of extravertebral component, quantitative evaluation of fat-suppression sequences is also another imaging feature that can differentiate aggressive haemangioma and typical asymptomatic haemangioma. Advances in knowledge: The use of quantitative fat-suppressed MRI in vertebral haemangiomas is demonstrated. Quantitative fat-suppressed MRI can have a role in confirming the diagnosis of aggressive haemangiomas. In addition, this application can be further investigated in future studies to predict aggressiveness of vertebral haemangiomas in early stages. PMID:26511277

  19. Learning of goal-relevant and -irrelevant complex visual sequences in human V1.

    PubMed

    Rosenthal, Clive R; Mallik, Indira; Caballero-Gaudes, Cesar; Sereno, Martin I; Soto, David

    2018-06-12

    Learning and memory are supported by a network involving the medial temporal lobe and linked neocortical regions. Emerging evidence indicates that primary visual cortex (i.e., V1) may contribute to recognition memory, but this has been tested only with a single visuospatial sequence as the target memorandum. The present study used functional magnetic resonance imaging to investigate whether human V1 can support the learning of multiple, concurrent complex visual sequences involving discontinuous (second-order) associations. Two peripheral, goal-irrelevant but structured sequences of orientated gratings appeared simultaneously in fixed locations of the right and left visual fields alongside a central, goal-relevant sequence that was in the focus of spatial attention. Pseudorandom sequences were introduced at multiple intervals during the presentation of the three structured visual sequences to provide an online measure of sequence-specific knowledge at each retinotopic location. We found that a network involving the precuneus and V1 was involved in learning the structured sequence presented at central fixation, whereas right V1 was modulated by repeated exposure to the concurrent structured sequence presented in the left visual field. The same result was not found in left V1. These results indicate for the first time that human V1 can support the learning of multiple concurrent sequences involving complex discontinuous inter-item associations, even peripheral sequences that are goal-irrelevant. Copyright © 2018. Published by Elsevier Inc.

  20. Learning Behavior Characterization with Multi-Feature, Hierarchical Activity Sequences

    ERIC Educational Resources Information Center

    Ye, Cheng; Segedy, James R.; Kinnebrew, John S.; Biswas, Gautam

    2015-01-01

    This paper discusses Multi-Feature Hierarchical Sequential Pattern Mining, MFH-SPAM, a novel algorithm that efficiently extracts patterns from students' learning activity sequences. This algorithm extends an existing sequential pattern mining algorithm by dynamically selecting the level of specificity for hierarchically-defined features…

  1. Description of the PMAD DC test bed architecture and integration sequence

    NASA Technical Reports Server (NTRS)

    Beach, R. F.; Trash, L.; Fong, D.; Bolerjack, B.

    1991-01-01

    NASA-Lewis is responsible for the development, fabrication, and assembly of the electric power system (EPS) for the Space Station Freedom (SSF). The SSF power system is radically different from previous spacecraft power systems in both the size and complexity of the system. Unlike past spacecraft power systems, the SSF EPS will grow and be maintained on orbit and must be flexible to meet changing user power needs. The SSF power system is also unique in comparison with terrestrial power systems because it is dominated by power electronic converters which regulate and control the power. Although spacecraft historically have used power converters for regulation, they typically involved only a single series regulating element. The SSF EPS involves multiple regulating elements, two or more in series, prior to the load. These unique system features required the construction of a testbed which would allow the development of spacecraft power system technology. A description is provided of the Power Management and Distribution (PMAD) DC Testbed which was assembled to support the design and early evaluation of the SSF EPS. A description of the integration process used in the assembly sequence is also given along with a description of the support facility.

  2. SeqDepot: streamlined database of biological sequences and precomputed features.

    PubMed

    Ulrich, Luke E; Zhulin, Igor B

    2014-01-15

    Assembling and/or producing integrated knowledge of sequence features continues to be an onerous and redundant task despite a large number of existing resources. We have developed SeqDepot, a novel database that focuses solely on two primary goals: (i) assimilating known primary sequences with predicted feature data and (ii) providing the most simple and straightforward means to procure and readily use this information. Access to >28.5 million sequences and 300 million features is provided through a well-documented and flexible RESTful interface that supports fetching specific data subsets, bulk queries, visualization and searching by MD5 digests or external database identifiers. We have also developed an HTML5/JavaScript web application exemplifying how to interact with SeqDepot and Perl/Python scripts for use with local processing pipelines. Freely available on the web at http://seqdepot.net/. REST access via http://seqdepot.net/api/v1. Database files and scripts may be downloaded from http://seqdepot.net/download.
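
    The abstract above mentions searching by MD5 digests. As a minimal sketch of how a digest-based lookup key could be derived from a sequence, the following uses only the Python standard library. The function name and the normalization step (removing whitespace and upper-casing) are assumptions made for illustration; the record does not describe SeqDepot's actual identifier format or endpoints, so none are shown.

```python
import hashlib


def sequence_md5(sequence: str) -> str:
    """Return the hexadecimal MD5 digest of a sequence string.

    Assumption for illustration: whitespace is removed and letters are
    upper-cased before hashing, so formatting differences (line wraps,
    case) do not change the digest.
    """
    normalized = "".join(sequence.split()).upper()
    return hashlib.md5(normalized.encode("ascii")).hexdigest()


# Two differently formatted copies of the same (made-up) peptide
digest_a = sequence_md5("MKT AYIAKQR\nQISFVK")
digest_b = sequence_md5("mktayiakqrqisfvk")
print(digest_a == digest_b)
```

    Hashing a normalized sequence gives a fixed-length key that is identical for identical sequences regardless of the source database, which is one plausible reason a digest is convenient for cross-referencing.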

  3. Non-encapsidation Activities of the Capsid Proteins of Positive-strand RNA Viruses

    PubMed Central

    Ni, Peng; Kao, C. Cheng

    2013-01-01

    Viral capsid proteins (CPs) are characterized by their role in forming protective shells around viral genomes. However, CPs have additional and important roles in the virus infection cycles and in the cellular response to infection. These activities involve CP binding to RNAs in both sequence-specific and nonspecific manners as well as association with other proteins. This review focuses on CPs of both plant and animal-infecting viruses with positive-strand RNA genomes. We summarize the structural features of CPs and describe their modulatory roles in viral translation, RNA-dependent RNA synthesis, and host defense responses. PMID:24074574

  4. Mapping Sequence performed during the STS-117 R-Bar Pitch Maneuver

    NASA Image and Video Library

    2007-06-10

    ISS015-E-11320 (10 June 2007) --- This is one of a series of images, photographed with a digital still camera using an 800mm focal length, featuring the different areas of the Space Shuttle Atlantis as it approached the International Space Station and performed a back-flip to accommodate close scrutiny by eyeballs and cameras. This image shows part of Atlantis' cabin and its docking system, which a short time later was involved in linking up with the orbital outpost. Distance from the station and shuttle at this time was approximately 600 feet.

  5. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    PubMed

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the completion of the human and a few model organisms' genomes, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools which are capable of easily identifying the structures and extracting features from DNA sequences. One of the more important structures in a DNA sequence is repeat-related. Often they have to be masked before protein coding regions along a DNA sequence are to be identified or redundant expressed sequence tags (ESTs) are to be sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features from a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of genomes on the whole genomic scale by a PC.
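
    To make the idea of recurrence times concrete, here is a small sketch that records, for every word of length k, the distance between consecutive occurrences, and then pools those distances into a histogram. This is an assumed, simplified reading of "recurrence time"; the paper's exact definition and its codon index are not reproduced here, and the function names are hypothetical.

```python
from collections import defaultdict


def recurrence_times(sequence, k):
    """Map each k-mer to the list of gaps between its consecutive occurrences."""
    last_seen = {}              # k-mer -> index of its most recent occurrence
    times = defaultdict(list)   # k-mer -> list of recurrence gaps
    for i in range(len(sequence) - k + 1):
        word = sequence[i:i + k]
        if word in last_seen:
            times[word].append(i - last_seen[word])
        last_seen[word] = i
    return dict(times)


def recurrence_histogram(sequence, k):
    """Pool recurrence gaps over all k-mers: gap -> number of times observed."""
    hist = defaultdict(int)
    for gaps in recurrence_times(sequence, k).values():
        for gap in gaps:
            hist[gap] += 1
    return dict(hist)


toy = "ATGATGATGCCC"
print(recurrence_times(toy, 3))
print(recurrence_histogram(toy, 3))
```

    In the toy string every repeated 3-mer recurs at a gap of 3, so the histogram has a single bin; in this simplified picture, a histogram concentrated at one gap (or its multiples) is the kind of signal that would point to a tandem repeat or a periodicity.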

  6. Organization, chromosomal localization and promoter analysis of the gene encoding human acidic fibroblast growth factor intracellular binding protein.

    PubMed Central

    Kolpakova, E; Frengen, E; Stokke, T; Olsnes, S

    2000-01-01

    Acidic fibroblast growth factor (aFGF) intracellular binding protein (FIBP) is a protein found mainly in the nucleus that might be involved in the intracellular function of aFGF. Here we present a comparative analysis of the deduced amino acid sequences of human, murine and Drosophila FIBP analogues and demonstrate that FIBP is an evolutionarily conserved protein. The human gene spans more than 5 kb, comprising ten exons and nine introns, and maps to chromosome 11q13.1. Two slightly different splice variants found in different tissues were isolated and characterized. Sequence analysis of the region surrounding the translation start revealed a CpG island, a classical feature of widely expressed genes. Functional studies of the promoter region with a luciferase reporter system suggested a strong transcriptional activity residing within 600 bp of the 5' flanking region. PMID:11104667
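
    The abstract reports a CpG island around the translation start. As a rough sketch of how such a region is commonly screened, the code below computes GC content and the observed/expected CpG ratio and applies the widely cited thresholds (length of at least 200 bp, GC fraction above 0.5, ratio above 0.6). The record does not say which criteria the authors used, so the thresholds and function names here are assumptions.

```python
def cpg_stats(sequence):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = sequence.upper()
    n = len(seq)
    if n == 0:
        return 0.0, 0.0
    c_count = seq.count("C")
    g_count = seq.count("G")
    cpg_count = seq.count("CG")          # CpG dinucleotides on this strand
    gc_fraction = (c_count + g_count) / n
    expected = c_count * g_count / n     # expected CpG count if C and G were independent
    ratio = cpg_count / expected if expected > 0 else 0.0
    return gc_fraction, ratio


def looks_like_cpg_island(sequence, min_len=200, min_gc=0.5, min_ratio=0.6):
    """Apply assumed, commonly used thresholds to a candidate region."""
    gc_fraction, ratio = cpg_stats(sequence)
    return len(sequence) >= min_len and gc_fraction > min_gc and ratio > min_ratio


print(cpg_stats("CGCGCGCG"))
print(looks_like_cpg_island("CG" * 100))
```

    In practice such a test is run over sliding windows along the 5' region rather than on a single fixed string.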

  7. The Sequence and Analysis of Duplication Rich Human Chromosome 16

    DOE R&D Accomplishments Database

    Martin, Joel; Han, Cliff; Gordon, Laurie A.; Terry, Astrid; Prabhakar, Shyam; She, Xinwei; Xie, Gary; Hellsten, Uffe; Man Chan, Yee; Altherr, Michael; Couronne, Olivier; Aerts, Andrea; Bajorek, Eva; Black, Stacey; Blumer, Heather; Branscomb, Elbert; Brown, Nancy C.; Bruno, William J.; Buckingham, Judith M.; Callen, David F.; Campbell, Connie S.; Campbell, Mary L.; Campbell, Evelyn W.; Caoile, Chenier; Challacombe, Jean F.; Chasteen, Leslie A.; Chertkov, Olga; Chi, Han C.; Christensen, Mari; Clark, Lynn M.; Cohn, Judith D.; Denys, Mirian; Detter, John C.; Dickson, Mark; Dimitrijevic-Bussod, Mira; Escobar, Julio; Fawcett, Joseph J.; Flowers, Dave; Fotopulos, Dea; Glavina, Tijana; Gomez, Maria; Gonzales, Eidelyn; Goodstein, David; Goodwin, Lynne A.; Grady, Deborah L.; Grigoriev, Igor; Groza, Matthew; Hammon, Nancy; Hawkins, Trevor; Haydu, Lauren; Hildebrand, Carl E.; Huang, Wayne; Israni, Sanjay; Jett, Jamie; Jewett, Phillip E.; Kadner, Kristen; Kimball, Heather; Kobayashi, Arthur; Krawczyk, Marie-Claude; Leyba, Tina; Longmire, Jonathan L.; Lopez, Frederick; Lou, Yunian; Lowry, Steve; Ludeman, Thom; Mark, Graham A.; Mcmurray, Kimberly L.; Meincke, Linda J.; Morgan, Jenna; Moyzis, Robert K.; Mundt, Mark O.; Munk, A. Christine; Nandkeshwar, Richard D.; Pitluck, Sam; Pollard, Martin; Predki, Paul; Parson-Quintana, Beverly; Ramirez, Lucia; Rash, Sam; Retterer, James; Ricke, Darryl O.; Robinson, Donna L.; Rodriguez, Alex; Salamov, Asaf; Saunders, Elizabeth H.; Scott, Duncan; Shough, Timothy; Stallings, Raymond L.; Stalvey, Malinda; Sutherland, Robert D.; Tapia, Roxanne; Tesmer, Judith G.; Thayer, Nina; Thompson, Linda S.; Tice, Hope; Torney, David C.; Tran-Gyamfi, Mary; Tsai, Ming; Ulanovsky, Levy E.; Ustaszewska, Anna; Vo, Nu; White, P. Scott; Williams, Albert L.; Wills, Patricia L.; Wu, Jung-Rung; Wu, Kevin; Yang, Joan; DeJong, Pieter; Bruce, David; Doggett, Norman; Deaven, Larry; Schmutz, Jeremy; Grimwood, Jane; Richardson, Paul; et al.

    2004-01-01

    We report here the 78,884,754 base pairs of finished human chromosome 16 sequence, representing over 99.9 percent of its euchromatin. Manual annotation revealed 880 protein coding genes confirmed by 1,637 aligned transcripts, 19 tRNA genes, 341 pseudogenes and 3 RNA pseudogenes. These genes include metallothionein, cadherin and iroquois gene families, as well as the disease genes for polycystic kidney disease and acute myelomonocytic leukemia. Several large-scale structural polymorphisms spanning hundreds of kilobasepairs were identified and result in gene content differences across humans. One of the unique features of chromosome 16 is its high level of segmental duplication, ranked among the highest of the human autosomes. While the segmental duplications are enriched in the relatively gene poor pericentromere of the p-arm, some are involved in recent gene duplication and conversion events which are likely to have had an impact on the evolution of primates and human disease susceptibility.

  8. From Sequence and Forces to Structure, Function and Evolution of Intrinsically Disordered Proteins

    PubMed Central

    Forman-Kay, Julie D.; Mittag, Tanja

    2015-01-01

    Intrinsically disordered proteins (IDPs), which lack persistent structure, are a challenge to structural biology due to the inapplicability of standard methods for characterization of folded proteins as well as their deviation from the dominant structure/function paradigm. Their widespread presence and involvement in biological function, however, has spurred the growing acceptance of the importance of IDPs and the development of new tools for studying their structure, dynamics and function. The interplay of folded and disordered domains or regions for function and the existence of a continuum of protein states with respect to conformational energetics, motional timescales and compactness is shaping a unified understanding of structure-dynamics-disorder/function relationships. On the 20th anniversary of this journal, Structure, we provide a historical perspective on the investigation of IDPs and summarize the sequence features and physical forces that underlie their unique structural, functional and evolutionary properties. PMID:24010708

  9. From sequence and forces to structure, function, and evolution of intrinsically disordered proteins.

    PubMed

    Forman-Kay, Julie D; Mittag, Tanja

    2013-09-03

    Intrinsically disordered proteins (IDPs), which lack persistent structure, are a challenge to structural biology due to the inapplicability of standard methods for characterization of folded proteins as well as their deviation from the dominant structure/function paradigm. Their widespread presence and involvement in biological function, however, has spurred the growing acceptance of the importance of IDPs and the development of new tools for studying their structure, dynamics, and function. The interplay of folded and disordered domains or regions for function and the existence of a continuum of protein states with respect to conformational energetics, motional timescales, and compactness are shaping a unified understanding of structure-dynamics-disorder/function relationships. In the 20th anniversary of Structure, we provide a historical perspective on the investigation of IDPs and summarize the sequence features and physical forces that underlie their unique structural, functional, and evolutionary properties. Copyright © 2013 Elsevier Ltd. All rights reserved.

  10. An enhanceosome containing the Jun B/Fra-2 heterodimer and the HMG-I(Y) architectural protein controls HPV 18 transcription.

    PubMed

    Bouallaga, I; Massicard, S; Yaniv, M; Thierry, F

    2000-11-01

    Recent studies have reported new mechanisms that mediate the transcriptional synergy of strong tissue-specific enhancers, involving the cooperative assembly of higher-order nucleoprotein complexes called enhanceosomes. Here we show that the HPV18 enhancer, which controls the epithelial-specific transcription of the E6 and E7 transforming genes, exhibits characteristic features of these structures. We used deletion experiments to show that a core enhancer element cooperates, in a specific helical phasing, with distant essential factors binding to the ends of the enhancer. This core sequence, binding a Jun B/Fra-2 heterodimer, cooperatively recruits the architectural protein HMG-I(Y) in a nucleoprotein complex, where they interact with each other. Therefore, in HeLa cells, HPV18 transcription seems to depend upon the assembly of an enhanceosome containing multiple cellular factors recruited by a core sequence interacting with AP1 and HMG-I(Y).

  11. A septal chromosome segregator protein evolved into a conjugative DNA-translocator protein

    PubMed Central

    Sepulveda, Edgardo; Vogelmann, Jutta

    2011-01-01

    Streptomycetes, Gram-positive soil bacteria well known for the production of antibiotics, feature a unique conjugative DNA transfer system. In contrast to classical conjugation, which is characterized by the secretion of a pilot protein covalently linked to a single-stranded DNA molecule, in Streptomyces a double-stranded DNA molecule is translocated during conjugative transfer. This transfer involves a single plasmid-encoded protein, TraB. A detailed biochemical and biophysical characterization of TraB revealed a close relationship to FtsK, which mediates chromosome segregation during bacterial cell division. TraB translocates plasmid DNA by recognizing 8-bp direct repeats located in a specific plasmid region, clt. Similar sequences also occur by chance on chromosomes and have been shown to be bound by TraB. We suggest that TraB mobilizes chromosomal genes by interacting with these chromosomal clt-like sequences, without relying on the integration of the conjugative plasmid into the chromosome. PMID:22479692

  12. Whole-Exome Sequencing of Congenital Glaucoma Patients Reveals Hypermorphic Variants in GPATCH3, a New Gene Involved in Ocular and Craniofacial Development

    PubMed Central

    Ferre-Fernández, Jesús-José; Aroca-Aguilar, José-Daniel; Medina-Trillo, Cristina; Bonet-Fernández, Juan-Manuel; Méndez-Hernández, Carmen-Dora; Morales-Fernández, Laura; Corton, Marta; Cabañero-Valera, María-José; Gut, Marta; Tonda, Raul; Ayuso, Carmen; Coca-Prados, Miguel; García-Feijoo, Julián; Escribano, Julio

    2017-01-01

    Congenital glaucoma (CG) is a heterogeneous, inherited and severe optical neuropathy that originates from maldevelopment of the anterior segment of the eye. To identify new disease genes, we performed whole-exome sequencing of 26 unrelated CG patients. In one patient we identified two rare, recessive and hypermorphic coding variants in GPATCH3, a gene of unidentified function, and 5% of a second group of 170 unrelated CG patients carried rare variants in this gene. The recombinant GPATCH3 protein activated in vitro the proximal promoter of CXCR4, a gene involved in embryo neural crest cell migration. The GPATCH3 protein was detected in human tissues relevant to glaucoma (e.g., ciliary body). This gene was expressed in the dermis, skeletal muscles, periocular mesenchymal-like cells and corneal endothelium of early zebrafish embryos. Morpholino-mediated knockdown and transient overexpression of gpatch3 led to varying degrees of goniodysgenesis and ocular and craniofacial abnormalities, recapitulating some of the features of zebrafish embryos deficient in the glaucoma-related genes pitx2 and foxc1. In conclusion, our data suggest the existence of high genetic heterogeneity in CG and provide evidence for the role of GPATCH3 in this disease. We also show that GPATCH3 is a new gene involved in ocular and craniofacial development. PMID:28397860

  13. Predicting Protein-Protein Interactions by Combing Various Sequence-Derived.

    PubMed

    Zhao, Xiao-Wei; Ma, Zhi-Qiang; Yin, Ming-Hao

    2011-09-20

    Knowledge of protein-protein interactions (PPIs) plays an important role in constructing protein interaction networks and understanding the general machineries of biological systems. In this study, a new method is proposed to predict PPIs using a comprehensive set of 930 features based only on sequence information. These features measure, from different aspects, the interactions between residues a certain distance apart in the protein sequences. To achieve better performance, principal component analysis (PCA) is first employed to obtain an optimized feature subset. Then, the resulting 67-dimensional feature vectors are fed to a Support Vector Machine (SVM). Experimental results on Drosophila melanogaster and Helicobacter pylori datasets show that our method is very promising for predicting PPIs and may at least be a useful supplementary tool to existing methods.

  14. Preparation of Low-Input and Ligation-Free ChIP-seq Libraries Using Template-Switching Technology.

    PubMed

    Bolduc, Nathalie; Lehman, Alisa P; Farmer, Andrew

    2016-10-10

    Chromatin immunoprecipitation (ChIP) followed by high-throughput sequencing (ChIP-seq) has become the gold standard for mapping of transcription factors and histone modifications throughout the genome. However, for ChIP experiments involving few cells or targeting low-abundance transcription factors, the small amount of DNA recovered makes ligation of adapters very challenging. In this unit, we describe a ChIP-seq workflow that can be applied to small cell numbers, including a robust single-tube and ligation-free method for preparation of sequencing libraries from sub-nanogram amounts of ChIP DNA. An example ChIP protocol is first presented, resulting in selective enrichment of DNA-binding proteins and cross-linked DNA fragments immobilized on beads via an antibody bridge. This is followed by a protocol for fast and easy cross-linking reversal and DNA recovery. Finally, we describe a fast, ligation-free library preparation protocol, featuring DNA SMART technology, resulting in samples ready for Illumina sequencing. © 2016 by John Wiley & Sons, Inc.

  15. Multilocus enzyme electrophoresis and cytochrome B gene sequencing-based identification of Leishmania isolates from different foci of cutaneous leishmaniasis in Pakistan.

    PubMed

    Marco, Jorge D; Bhutto, Abdul M; Soomro, Farooq R; Baloch, Javed H; Barroso, Paola A; Kato, Hirotomo; Uezato, Hiroshi; Katakura, Ken; Korenaga, Masataka; Nonaka, Shigeo; Hashiguchi, Yoshihisa

    2006-08-01

    Seventeen Leishmania stocks isolated from cutaneous lesions of Pakistani patients were studied by multilocus enzyme electrophoresis and by polymerase chain reaction amplification and sequencing of the cytochrome b (Cyt b) gene. Eleven stocks that expressed nine zymodemes were assigned to L. (Leishmania) major. All of them were isolated from patients in the lowlands of Larkana district and Sibi city in Sindh and Balochistan provinces, respectively. The remaining six, distributed in two zymodemes (five and one), isolated from the highland of Quetta city, Balochistan, were identified as L. (L.) tropica. The same result at species level was obtained by the Cyt b sequencing for all the stocks examined. No clear-cut association between the clinical features (wet or dry type lesions) and the Leishmania species involved was found. Leishmania (L.) major was highly polymorphic compared with L. (L.) tropica. This difference may be explained by the fact that humans may act as a sole reservoir of L. (L.) tropica in anthroponotic cycles; however, many wild mammals can be reservoirs of L. (L.) major in zoonotic cycles.

  16. Oligo kernels for datamining on biological sequences: a case study on prokaryotic translation initiation sites

    PubMed Central

    Meinicke, Peter; Tech, Maike; Morgenstern, Burkhard; Merkl, Rainer

    2004-01-01

    Background: Kernel-based learning algorithms are among the most advanced machine learning methods and have been successfully applied to a variety of sequence classification tasks within the field of bioinformatics. Conventional kernels utilized so far do not provide an easy interpretation of the learnt representations in terms of positional and compositional variability of the underlying biological signals. Results: We propose a kernel-based approach to datamining on biological sequences. With our method it is possible to model and analyze positional variability of oligomers of any length in a natural way. On one hand this is achieved by mapping the sequences to an intuitive but high-dimensional feature space, well-suited for interpretation of the learnt models. On the other hand, by means of the kernel trick we can provide a general learning algorithm for that high-dimensional representation because all required statistics can be computed without performing an explicit feature space mapping of the sequences. By introducing a kernel parameter that controls the degree of position-dependency, our feature space representation can be tailored to the characteristics of the biological problem at hand. A regularized learning scheme enables application even to biological problems for which only small sets of example sequences are available. Our approach includes a visualization method for transparent representation of characteristic sequence features. Thereby importance of features can be measured in terms of discriminative strength with respect to classification of the underlying sequences. To demonstrate and validate our concept on a biochemically well-defined case, we analyze E. coli translation initiation sites in order to show that we can find biologically relevant signals. For that case, our results clearly show that the Shine-Dalgarno sequence is the most important signal upstream a start codon. The variability in position and composition we found for that signal is in accordance with previous biological knowledge. We also find evidence for signals downstream of the start codon, previously introduced as transcriptional enhancers. These signals are mainly characterized by occurrences of adenine in a region of about 4 nucleotides next to the start codon. Conclusions: We showed that the oligo kernel can provide a valuable tool for the analysis of relevant signals in biological sequences. In the case of translation initiation sites we could clearly deduce the most discriminative motifs and their positional variation from example sequences. Attractive features of our approach are its flexibility with respect to oligomer length and position conservation. By means of these two parameters oligo kernels can easily be adapted to different biological problems. PMID:15511290
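
    The two parameters named in this abstract, oligomer length and degree of position-dependency, can be illustrated with a small position-weighted k-mer kernel. The sketch below sums a Gaussian weight over every pair of occurrences of the same oligomer in two sequences. The Gaussian form and the parameter name sigma are assumptions made for illustration, and any normalization constant used in the paper is omitted, so this should be read as a simplified stand-in rather than the authors' definition.

```python
import math


def oligo_positions(sequence, k):
    """Map each oligomer of length k to the list of positions where it occurs."""
    positions = {}
    for i in range(len(sequence) - k + 1):
        positions.setdefault(sequence[i:i + k], []).append(i)
    return positions


def oligo_kernel(s, t, k, sigma):
    """Position-weighted similarity between sequences s and t.

    Each pair of matching oligomers at positions p (in s) and q (in t)
    contributes exp(-(p - q)^2 / (4 * sigma^2)). A small sigma rewards only
    matches at nearly identical positions; a large sigma ignores position.
    """
    pos_s = oligo_positions(s, k)
    pos_t = oligo_positions(t, k)
    total = 0.0
    for word, ps in pos_s.items():
        for p in ps:
            for q in pos_t.get(word, ()):
                total += math.exp(-((p - q) ** 2) / (4.0 * sigma ** 2))
    return total


strict = oligo_kernel("AGGAGG", "AGGAGG", k=3, sigma=0.01)   # position-specific
loose = oligo_kernel("AGGAGG", "AGGAGG", k=3, sigma=1e6)     # position-free
print(strict, loose)
```

    With a very small sigma the value approaches the number of oligomers that match at the same position, while with a very large sigma it approaches a plain count of shared oligomer pairs, which shows how one parameter interpolates between position-specific and position-independent comparison.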

  17. Autosomal dominant cerebellar ataxia with retinal degeneration (ADCA II): clinical and neuropathological findings in two pedigrees and genetic linkage to 3p12-p21.1.

    PubMed Central

    Jöbsis, G J; Weber, J W; Barth, P G; Keizers, H; Baas, F; van Schooneveld, M J; van Hilten, J J; Troost, D; Geesink, H H; Bolhuis, P A

    1997-01-01

    OBJECTIVES: To investigate relations between clinical and neuropathological features and age of onset, presence of anticipation, and genetic linkage in autosomal dominant cerebellar ataxia type II (ADCA II). METHODS: The natural history of ADCA II was studied on the basis of clinical and neuropathological findings in two pedigrees and genetic linkage studies were carried out with polymorphic DNA markers in the largest, four generation, pedigree. RESULTS: Ataxia was constant in all age groups. Retinal degeneration with early extinction of the electroretinogram constituted an important component in juvenile and early adult (< 25 years) onset but was variable in late adult presentation. Neuromuscular involvement due to spinal anterior horn disease was an important contributing factor to illness in juvenile cases. Postmortem findings in four patients confirm the general neurodegenerative nature of the disease, which includes prominent spinal anterior horn involvement and widespread involvement of grey and white matter. Genetic linkage was found with markers to chromosome 3p12-p21.1 (maximum pairwise lod score 4.42 at D3S1285). CONCLUSIONS: The sequence of clinical involvement seems related to age at onset. Retinal degeneration is variable in late onset patients and neuromuscular features are important in patients with early onset. Strong anticipation was found in subsequent generations. Linkage of ADCA II to chromosome 3p12-p21.1 is confirmed. PMID:9120450

  18. Anonymization of electronic medical records for validating genome-wide association studies

    PubMed Central

    Loukides, Grigorios; Gkoulalas-Divanis, Aris; Malin, Bradley

    2010-01-01

    Genome-wide association studies (GWAS) facilitate the discovery of genotype–phenotype relations from population-based sequence databases, which is an integral facet of personalized medicine. The increasing adoption of electronic medical records allows large amounts of patients’ standardized clinical features to be combined with the genomic sequences of these patients and shared to support validation of GWAS findings and to enable novel discoveries. However, disseminating these data “as is” may lead to patient reidentification when genomic sequences are linked to resources that contain the corresponding patients’ identity information based on standardized clinical features. This work proposes an approach that provably prevents this type of data linkage and furnishes a result that helps support GWAS. Our approach automatically extracts potentially linkable clinical features and modifies them in a way that they can no longer be used to link a genomic sequence to a small number of patients, while preserving the associations between genomic sequences and specific sets of clinical features corresponding to GWAS-related diseases. Extensive experiments with real patient data derived from the Vanderbilt University Medical Center verify that our approach generates data that eliminate the threat of individual reidentification, while supporting GWAS validation and clinical case analysis tasks. PMID:20385806

  19. Cascade detection for the extraction of localized sequence features; specificity results for HIV-1 protease and structure-function results for the Schellman loop.

    PubMed

    Newell, Nicholas E

    2011-12-15

    The extraction of the set of features most relevant to function from classified biological sequence sets is still a challenging problem. A central issue is the determination of expected counts for higher order features so that artifact features may be screened. Cascade detection (CD), a new algorithm for the extraction of localized features from sequence sets, is introduced. CD is a natural extension of the proportional modeling techniques used in contingency table analysis into the domain of feature detection. The algorithm is successfully tested on synthetic data and then applied to feature detection problems from two different domains to demonstrate its broad utility. An analysis of HIV-1 protease specificity reveals patterns of strong first-order features that group hydrophobic residues by side chain geometry and exhibit substantial symmetry about the cleavage site. Higher order results suggest that favorable cooperativity is weak by comparison and broadly distributed, but indicate possible synergies between negative charge and hydrophobicity in the substrate. Structure-function results for the Schellman loop, a helix-capping motif in proteins, contain strong first-order features and also show statistically significant cooperativities that provide new insights into the design of the motif. These include a new 'hydrophobic staple' and multiple amphipathic and electrostatic pair features. CD should prove useful not only for sequence analysis, but also for the detection of multifactor synergies in cross-classified data from clinical studies or other sources. Windows XP/7 application and data files available at: https://sites.google.com/site/cascadedetect/home. Contact: nacnewell@comcast.net. Supplementary information is available at Bioinformatics online.
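
    The central issue named in this abstract, obtaining expected counts for higher-order features, can be illustrated with the simplest case from contingency-table analysis: the expected count of a residue pair under independence of its two positions. The sketch below is that generic independence baseline only, not the cascade detection algorithm itself, and the function name and toy data are hypothetical.

```python
from collections import Counter


def pair_observed_expected(sequences, pos_i, pos_j):
    """For each residue pair seen at (pos_i, pos_j), return (observed, expected).

    The expected count assumes the two positions are independent:
    expected = count(a at pos_i) * count(b at pos_j) / number of sequences.
    """
    n = len(sequences)
    first_i = Counter(s[pos_i] for s in sequences)   # first-order counts at pos_i
    first_j = Counter(s[pos_j] for s in sequences)   # first-order counts at pos_j
    observed = Counter((s[pos_i], s[pos_j]) for s in sequences)
    result = {}
    for (a, b), obs in observed.items():
        expected = first_i[a] * first_j[b] / n
        result[(a, b)] = (obs, expected)
    return result


toy_sites = ["AL", "AL", "GV", "GV"]   # made-up two-residue sites
print(pair_observed_expected(toy_sites, 0, 1))
```

    A pair whose observed count is well above its expected count is a candidate cooperative (second-order) feature, whereas a pair that merely reflects two strong first-order features has observed and expected counts that roughly agree.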

  20. Toward a model for lexical access based on acoustic landmarks and distinctive features

    NASA Astrophysics Data System (ADS)

    Stevens, Kenneth N.

    2002-04-01

    This article describes a model in which the acoustic speech signal is processed to yield a discrete representation of the speech stream in terms of a sequence of segments, each of which is described by a set (or bundle) of binary distinctive features. These distinctive features specify the phonemic contrasts that are used in the language, such that a change in the value of a feature can potentially generate a new word. This model is a part of a more general model that derives a word sequence from this feature representation, the words being represented in a lexicon by sequences of feature bundles. The processing of the signal proceeds in three steps: (1) Detection of peaks, valleys, and discontinuities in particular frequency ranges of the signal leads to identification of acoustic landmarks. The type of landmark provides evidence for a subset of distinctive features called articulator-free features (e.g., [vowel], [consonant], [continuant]). (2) Acoustic parameters are derived from the signal near the landmarks to provide evidence for the actions of particular articulators, and acoustic cues are extracted by sampling selected attributes of these parameters in these regions. The selection of cues that are extracted depends on the type of landmark and on the environment in which it occurs. (3) The cues obtained in step (2) are combined, taking context into account, to provide estimates of "articulator-bound" features associated with each landmark (e.g., [lips], [high], [nasal]). These articulator-bound features, combined with the articulator-free features in (1), constitute the sequence of feature bundles that forms the output of the model. Examples of cues that are used, and justification for this selection, are given, as well as examples of the process of inferring the underlying features for a segment when there is variability in the signal due to enhancement gestures (recruited by a speaker to make a contrast more salient) or due to overlap of gestures from neighboring segments.

  1. Evaluating, Comparing, and Interpreting Protein Domain Hierarchies

    PubMed Central

    2014-01-01

    Abstract Arranging protein domain sequences hierarchically into evolutionarily divergent subgroups is important for investigating evolutionary history, for speeding up web-based similarity searches, for identifying sequence determinants of protein function, and for genome annotation. However, whether or not a particular hierarchy is optimal is often unclear, and independently constructed hierarchies for the same domain can often differ significantly. This article describes methods for statistically evaluating specific aspects of a hierarchy, for probing the criteria underlying its construction and for direct comparisons between hierarchies. Information theoretical notions are used to quantify the contributions of specific hierarchical features to the underlying statistical model. Such features include subhierarchies, sequence subgroups, individual sequences, and subgroup-associated signature patterns. Underlying properties are graphically displayed in plots of each specific feature's contributions, in heat maps of pattern residue conservation, in “contrast alignments,” and through cross-mapping of subgroups between hierarchies. Together, these approaches provide a deeper understanding of protein domain functional divergence, reveal uncertainties caused by inconsistent patterns of sequence conservation, and help resolve conflicts between competing hierarchies. PMID:24559108

  2. Well-characterized sequence features of eukaryote genomes and implications for ab initio gene prediction.

    PubMed

    Huang, Ying; Chen, Shi-Yi; Deng, Feilong

    2016-01-01

    In silico analysis of DNA sequences is an important area of computational biology in the post-genomic era. Over the past two decades, computational approaches for ab initio prediction of gene structure from genome sequence alone have greatly facilitated our understanding of a variety of biological questions. Although the computational prediction of protein-coding genes is already well established, robustly finding non-coding RNA genes, such as miRNA and lncRNA, remains a challenge. The two main aspects of ab initio gene prediction are the computed values used to describe sequence features and the algorithm used to train the discriminant function; various bioinformatic tools employ different combinations of the two. Herein, we briefly review these well-characterized sequence features in eukaryote genomes and their applications to ab initio gene prediction. The main purpose of this article is to provide an overview for beginners who aim to develop related bioinformatic tools.

  3. microRNA-122 target sites in the hepatitis C virus RNA NS5B coding region and 3' untranslated region: function in replication and influence of RNA secondary structure.

    PubMed

    Gerresheim, Gesche K; Dünnes, Nadia; Nieder-Röhrmann, Anika; Shalamova, Lyudmila A; Fricke, Markus; Hofacker, Ivo; Höner Zu Siederdissen, Christian; Marz, Manja; Niepmann, Michael

    2017-02-01

    We have analyzed the binding of the liver-specific microRNA-122 (miR-122) to three conserved target sites of hepatitis C virus (HCV) RNA, two in the non-structural protein 5B (NS5B) coding region and one in the 3' untranslated region (3'UTR). miR-122 binding efficiency strongly depends on target site accessibility under conditions when the range of flanking sequences available for the formation of local RNA secondary structures changes. Our results indicate that the particular sequence feature that contributes most to the correlation between target site accessibility and binding strength varies between different target sites. This suggests that the dynamics of miRNA/Ago2 binding not only depends on the target site itself but also on flanking sequence context to a considerable extent, in particular in a small viral genome in which strong selection constraints act on coding sequence and overlapping cis-signals and model the accessibility of cis-signals. In full-length genomes, single and combination mutations in the miR-122 target sites reveal that site 5B.2 is positively involved in regulating overall genome replication efficiency, whereas mutation of site 5B.3 showed a weaker effect. Mutation of the 3'UTR site and double or triple mutants showed no significant overall effect on genome replication, whereas in a translation reporter RNA, the 3'UTR target site inhibits translation directed by the HCV 5'UTR. Thus, the miR-122 target sites in the 3'-region of the HCV genome are involved in a complex interplay in regulating different steps of the HCV replication cycle.

  4. Genomic characterization of a Helicobacter pylori isolate from a patient with gastric cancer in China

    PubMed Central

    2014-01-01

    Background Helicobacter pylori is well known for its relationship with the occurrence of several severe gastric diseases. The mechanisms of pathogenesis triggered by H. pylori are less well known. In this study, we report the genome sequence and genomic characterizations of H. pylori strain HLJ039 that was isolated from a patient with gastric cancer in the Chinese province of Heilongjiang, where there is a high incidence of gastric cancer. To investigate potential genomic features that may be involved in pathogenesis of carcinoma, the genome was compared to three previously sequenced genomes in this area. Result We obtained 42 contigs with a total length of 1,611,192 bp and predicted 1,687 coding sequences. Compared to strains isolated from gastritis and ulcers in this area, 10 different regions were identified as being unique for HLJ039; they mainly encoded type II restriction-modification enzyme, type II m6A methylase, DNA-cytosine methyltransferase, DNA methylase, and hypothetical proteins. A unique 547-bp fragment sharing 93% identity with a hypothetical protein of Helicobacter cinaedi ATCC BAA-847 was not present in any other previous H. pylori strains. Phylogenetic analysis based on core genome single nucleotide polymorphisms shows that HLJ039 is defined as hspEAsia subgroup, which belongs to the hpEastAsia group. Conclusion DNA methylations, variations of the genomic regions involved in restriction and modification systems, are the “hot” regions that may be related to the mechanism of H. pylori-induced gastric cancer. The genome sequence will provide useful information for the deep mining of potential mechanisms related to East Asian gastric cancer. PMID:24565107

  5. The tmRNA website

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Hudson, Corey M.; Williams, Kelly P.

    We report that the transfer-messenger RNA (tmRNA) and its partner protein SmpB act together in resolving problems arising when translating bacterial ribosomes reach the end of mRNA with no stop codon. Their genes have been found in nearly all bacterial genomes and in some organelles. The tmRNA Website serves tmRNA sequences, alignments and feature annotations, and has recently moved to http://bioinformatics.sandia.gov/tmrna/. New features include software used to find the sequences, an update raising the number of unique tmRNA sequences from 492 to 1716, and a database of SmpB sequences which are served along with the tmRNA sequence from the same organism.

  6. The tmRNA website

    DOE PAGES

    Hudson, Corey M.; Williams, Kelly P.

    2014-11-05

    We report that the transfer-messenger RNA (tmRNA) and its partner protein SmpB act together in resolving problems arising when translating bacterial ribosomes reach the end of mRNA with no stop codon. Their genes have been found in nearly all bacterial genomes and in some organelles. The tmRNA Website serves tmRNA sequences, alignments and feature annotations, and has recently moved to http://bioinformatics.sandia.gov/tmrna/. New features include software used to find the sequences, an update raising the number of unique tmRNA sequences from 492 to 1716, and a database of SmpB sequences which are served along with the tmRNA sequence from the same organism.

  7. A classification of morphoseismic features in the New Madrid seismic zone

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Knox, R.; Stewart, D.

    1993-03-01

    The New Madrid Seismic Zone (NMSZ) contains thousands of surface features distributed over 5,000 square miles in four states. These are attributable to some combination of (1) seismically-induced liquefaction (SIL), (2) secondary deformation, and (3) seismically-induced slope failures. Most of these features were produced by the 1811--12 series of great earthquakes, but some predate and some postdate 1811--12. Subsequent non-seismic factors, such as hydrologically-induced liquefaction (HIL), mechanically-induced liquefaction (MIL), human activities, mass wasting, eolian and fluvial processes have modified all of these features. Morphoseismic features are new landforms produced by earthquakes, or are pre-existing landforms modified by them. Involved are complex interrelationships among several variables, including: (1) intensity and duration of seismic ground motion, (2) surface wave harmonics, (3) depth to water table, (4) depth to basement, (5) particle size, composition, and sorting of sediment making up the liquefied (LZ) and non-liquefied zones (NLZ), (6) topographic parameters, and (7) attitudes of beds and lenses susceptible to liquefaction. Morphoseismic features are depicted as results of a time-flow sequence initiated by primary basement disturbances which produce three major categories of surface response: secondary deformation, liquefaction and slope failure. Nine subcategories incorporate features produced by or resulting in: extruded sand, intruded sand, lateral spreading, faulting, subsidence of large areas, uplift of large areas, altered streams, coherent landslides, and incoherent landslides. The total morphoseismic features identified by this classification are 34 in number.

  8. Classifying transcription factor targets and discovering relevant biological features

    PubMed Central

    Holloway, Dustin T; Kon, Mark; DeLisi, Charles

    2008-01-01

    Background An important goal in post-genomic research is discovering the network of interactions between transcription factors (TFs) and the genes they regulate. We have previously reported the development of a supervised-learning approach to TF target identification, and used it to predict targets of 104 transcription factors in yeast. We now include a new sequence conservation measure, expand our predictions to include 59 new TFs, introduce a web-server, and implement an improved ranking method to reveal the biological features contributing to regulation. The classifiers combine 8 genomic datasets covering a broad range of measurements including sequence conservation, sequence overrepresentation, gene expression, and DNA structural properties. Principal Findings (1) Application of the method yields an amplification of information about yeast regulators. The ratio of total targets to previously known targets is greater than 2 for 11 TFs, with several having larger gains: Ash1(4), Ino2(2.6), Yaf1(2.4), and Yap6(2.4). (2) Many predicted targets for TFs match well with the known biology of their regulators. As a case study we discuss the regulator Swi6, presenting evidence that it may be important in the DNA damage response, and that the previously uncharacterized gene YMR279C plays a role in DNA damage response and perhaps in cell-cycle progression. (3) A procedure based on recursive-feature-elimination is able to uncover from the large initial data sets those features that best distinguish targets for any TF, providing clues relevant to its biology. An analysis of Swi6 suggests a possible role in lipid metabolism, and more specifically in metabolism of ceramide, a bioactive lipid currently being investigated for anti-cancer properties. (4) An analysis of global network properties highlights the transcriptional network hubs; the factors which control the most genes and the genes which are bound by the largest set of regulators. 
    Cell-cycle and growth related regulators dominate the former; genes involved in carbon metabolism and energy generation dominate the latter. Conclusion Postprocessing of regulatory-classifier results can provide high quality predictions, and feature ranking strategies can deliver insight into the regulatory functions of TFs. Predictions are available at an online web-server, including the full transcriptional network, which can be analyzed using VisAnt network analysis suite. Reviewers This article was reviewed by Igor Jouline, Todd Mockler (nominated by Valerian Dolja), and Sandor Pongor. PMID:18513408

  9. Applications of alignment-free methods in epigenomics.

    PubMed

    Pinello, Luca; Lo Bosco, Giosuè; Yuan, Guo-Cheng

    2014-05-01

    Epigenetic mechanisms play an important role in the regulation of cell type-specific gene activities, yet how epigenetic patterns are established and maintained remains poorly understood. Recent studies have supported a role of DNA sequences in recruitment of epigenetic regulators. Alignment-free methods have been applied to identify distinct sequence features that are associated with epigenetic patterns and to predict epigenomic profiles. Here, we review recent advances in such applications, including the methods to map DNA sequence to feature space, sequence comparison and prediction models. Computational studies using these methods have provided important insights into the epigenetic regulatory mechanisms.
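
    The step the review above calls mapping a DNA sequence to feature space is most often done with k-mer counts. The following is a minimal sketch under that assumption; the function name `kmer_vector` and the choice of normalized frequencies are illustrative and are not taken from the review.

```python
from itertools import product


def kmer_vector(seq, k=2, alphabet="ACGT"):
    """Map a DNA sequence to a vector of normalized k-mer frequencies.

    The vector has len(alphabet)**k entries, ordered lexicographically
    by k-mer. Windows containing symbols outside the alphabet (e.g. N)
    are skipped.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = seq.upper()
    # Slide a window of width k along the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


vec = kmer_vector("ACGTACGT", k=2)
print(len(vec))
```

    Because every sequence maps to a vector of the same length regardless of its own length, such vectors can be compared with an ordinary distance measure or passed to a prediction model without any alignment.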

  10. Utility of late gadolinium enhancement in pediatric cardiac MRI.

    PubMed

    Etesami, Maryam; Gilkeson, Robert C; Rajiah, Prabhakar

    2016-07-01

    Late gadolinium enhancement (LGE) cardiac magnetic resonance (MR) imaging sequence is increasingly used in the evaluation of pediatric cardiovascular disorders, and although LGE might be a normal feature at the sites of previous surgeries, it is pathologically seen as a result of extracellular space expansion, either from acute cell damage or chronic scarring or fibrosis. LGE is broadly divided into ischemic and non-ischemic patterns. LGE caused by myocardial infarction occurs in a vascular distribution and always involves the subendocardial portion, progressively involving the outer regions in a waveform pattern. Non-ischemic cardiomyopathies can have a mid-myocardial (either linear or patchy), subepicardial or diffuse subendocardial distribution. Idiopathic dilated cardiomyopathy can have a linear mid-myocardial pattern, while hypertrophic cardiomyopathy can have fine, patchy enhancement in hypertrophied and non-hypertrophied segments as well as right ventricular insertion points. Myocarditis and sarcoidosis have a mid-myocardial or subepicardial pattern of LGE. Fabry disease typically affects the basal inferolateral segment while Danon disease typically spares the septum. Pericarditis is characterized by diffuse or focal pericardial thickening and enhancement. Thrombus, the most common non-neoplastic cardiac mass, is characterized by absence of enhancement in all sequences, while neoplastic masses show at least some contrast enhancement, depending on the pathology. Regardless of the etiology, presence of LGE is associated with a poor prognosis. In this review, we describe the technical modifications required for performing LGE cardiac MR sequence in children, review and illustrate the patterns of LGE in children, and discuss their clinical significance.

  11. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2's q2-feature-classifier plugin.

    PubMed

    Bokulich, Nicholas A; Kaehler, Benjamin D; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A; Gregory Caporaso, J

    2018-05-17

    Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit ( https://github.com/caporaso-lab/tax-credit-data ). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.

  12. PathogenFinder--distinguishing friend from foe using bacterial whole genome sequence data.

    PubMed

    Cosentino, Salvatore; Voldby Larsen, Mette; Møller Aarestrup, Frank; Lund, Ole

    2013-01-01

    Although the majority of bacteria are harmless or even beneficial to their host, others are highly virulent and can cause serious diseases, and even death. Due to the constantly decreasing cost of high-throughput sequencing there are now many completely sequenced genomes available from both human pathogenic and innocuous strains. The data can be used to identify gene families that correlate with pathogenicity and to develop tools to predict the pathogenicity of newly sequenced strains, investigations that previously were mainly done by means of more expensive and time consuming experimental approaches. We describe PathogenFinder (http://cge.cbs.dtu.dk/services/PathogenFinder/), a web-server for the prediction of bacterial pathogenicity by analysing the input proteome, genome, or raw reads provided by the user. The method relies on groups of proteins, created without regard to their annotated function or known involvement in pathogenicity. The method has been built to work with all taxonomic groups of bacteria and using the entire training-set, achieved an accuracy of 88.6% on an independent test-set, by correctly classifying 398 out of 449 completely sequenced bacteria. The approach here proposed is not biased on sets of genes known to be associated with pathogenicity, thus the approach could aid the discovery of novel pathogenicity factors. Furthermore the pathogenicity prediction web-server could be used to isolate the potential pathogenic features of both known and unknown strains.

  13. Evolutionarily conserved regions and hydrophobic contacts at the superfamily level: The case of the fold-type I, pyridoxal-5′-phosphate-dependent enzymes

    PubMed Central

    Paiardini, Alessandro; Bossa, Francesco; Pascarella, Stefano

    2004-01-01

    The wealth of biological information provided by structural and genomic projects opens new prospects of understanding life and evolution at the molecular level. In this work, it is shown how computational approaches can be exploited to pinpoint protein structural features that remain invariant upon long evolutionary periods in the fold-type I, PLP-dependent enzymes. A nonredundant set of 23 superposed crystallographic structures belonging to this superfamily was built. Members of this family typically display high-structural conservation despite low-sequence identity. For each structure, a multiple-sequence alignment of orthologous sequences was obtained, and the 23 alignments were merged using the structural information to obtain a comprehensive multiple alignment of 921 sequences of fold-type I enzymes. The structurally conserved regions (SCRs), the evolutionarily conserved residues, and the conserved hydrophobic contacts (CHCs) were extracted from this data set, using both sequence and structural information. The results of this study identified a structural pattern of hydrophobic contacts shared by all of the superfamily members of fold-type I enzymes and involved in native interactions. This profile highlights the presence of a nucleus for this fold, in which residues participating in the most conserved native interactions exhibit preferential evolutionary conservation, that correlates significantly (r = 0.70) with the extent of mean hydrophobic contact value of their apolar fraction. PMID:15498941

  14. Trading genes along the silk road: mtDNA sequences and the origin of central Asian populations.

    PubMed Central

    Comas, D; Calafell, F; Mateu, E; Pérez-Lezaun, A; Bosch, E; Martínez-Arias, R; Clarimon, J; Facchini, F; Fiori, G; Luiselli, D; Pettener, D; Bertranpetit, J

    1998-01-01

    Central Asia is a vast region at the crossroads of different habitats, cultures, and trade routes. Little is known about the genetics and the history of the population of this region. We present the analysis of mtDNA control-region sequences in samples of the Kazakh, the Uighurs, the lowland Kirghiz, and the highland Kirghiz, which we have used to address both the population history of the region and the possible selective pressures that high altitude has on mtDNA genes. Central Asian mtDNA sequences present features intermediate between European and eastern Asian sequences, in several parameters-such as the frequencies of certain nucleotides, the levels of nucleotide diversity, mean pairwise differences, and genetic distances. Several hypotheses could explain the intermediate position of central Asia between Europe and eastern Asia, but the most plausible would involve extensive levels of admixture between Europeans and eastern Asians in central Asia, possibly enhanced during the Silk Road trade and clearly after the eastern and western Eurasian human groups had diverged. Lowland and highland Kirghiz mtDNA sequences are very similar, and the analysis of molecular variance has revealed that the fraction of mitochondrial genetic variance due to altitude is not significantly different from zero. Thus, it seems unlikely that altitude has exerted a major selective pressure on mitochondrial genes in central Asian populations. PMID:9837835

  15. The 28S–18S rDNA intergenic spacer from Crithidia fasciculata: repeated sequences, length heterogeneity, putative processing sites and potential interactions between U3 small nucleolar RNA and the ribosomal RNA precursor

    PubMed Central

    Schnare, Murray N.; Collings, James C.; Spencer, David F.; Gray, Michael W.

    2000-01-01

    In Crithidia fasciculata, the ribosomal RNA (rRNA) gene repeats range in size from ∼11 to 12 kb. This length heterogeneity is localized to a region of the intergenic spacer (IGS) that contains tandemly repeated copies of a 19mer sequence. The IGS also contains four copies of an ∼55 nt repeat that has an internal inverted repeat and is also present in the IGS of Leishmania species. We have mapped the C.fasciculata transcription initiation site as well as two other reverse transcriptase stop sites that may be analogous to the A0 and A′ pre-rRNA processing sites within the 5′ external transcribed spacer (ETS) of other eukaryotes. Features that could influence processing at these sites include two stretches of conserved primary sequence and three secondary structure elements present in the 5′ ETS. We also characterized the C.fasciculata U3 snoRNA, which has the potential for base-pairing with pre-rRNA sequences. Finally, we demonstrate that biosynthesis of large subunit rRNA in both C.fasciculata and Trypanosoma brucei involves 3′-terminal addition of three A residues that are not present in the corresponding DNA sequences. PMID:10982863

  16. DNA motifs associated with aberrant CpG island methylation.

    PubMed

    Feltus, F Alex; Lee, Eva K; Costello, Joseph F; Plass, Christoph; Vertino, Paula M

    2006-05-01

    Epigenetic silencing involving the aberrant methylation of promoter region CpG islands is widely recognized as a tumor suppressor silencing mechanism in cancer. However, the molecular pathways underlying aberrant DNA methylation remain elusive. Recently we showed that, on a genome-wide level, CpG island loci differ in their intrinsic susceptibility to aberrant methylation and that this susceptibility can be predicted based on underlying sequence context. These data suggest that there are sequence/structural features that contribute to the protection from or susceptibility to aberrant methylation. Here we use motif elicitation coupled with classification techniques to identify DNA sequence motifs that selectively define methylation-prone or methylation-resistant CpG islands. Motifs common to 28 methylation-prone or 47 methylation-resistant CpG island-containing genomic fragments were determined using the MEME and MAST algorithms. The five most discriminatory motifs derived from methylation-prone sequences were found to be associated with CpG islands in general and were nonrandomly distributed throughout the genome. In contrast, the eight most discriminatory motifs derived from the methylation-resistant CpG islands were randomly distributed throughout the genome. Interestingly, this latter group tended to associate with Alu and other repetitive sequences. Used together, the frequency of occurrence of these motifs successfully discriminated methylation-prone and methylation-resistant CpG island groups with an accuracy of 87% after 10-fold cross-validation. The motifs identified here are candidate methylation-targeting or methylation-protection DNA sequences.

  17. Nucleotide sequence of the L1 ribosomal protein gene of Xenopus laevis: remarkable sequence homology among introns.

    PubMed Central

    Loreni, F; Ruberti, I; Bozzoni, I; Pierandrei-Amaldi, P; Amaldi, F

    1985-01-01

    Ribosomal protein L1 is encoded by two genes in Xenopus laevis. The comparison of two cDNA sequences shows that the two L1 gene copies (L1a and L1b) have diverged in many silent sites and very few substitution sites; moreover a small duplication occurred at the very end of the coding region of the L1b gene which thus codes for a product five amino acids longer than that coded by L1a. Quantitatively the divergence between the two L1 genes confirms that a whole genome duplication took place in Xenopus laevis approximately 30 million years ago. A genomic fragment containing one of the two L1 gene copies (L1a), with its nine introns and flanking regions, has been completely sequenced. The 5' end of this gene has been mapped within a 20-pyrimidine stretch as already found for other vertebrate ribosomal protein genes. Four of the nine introns have a 60-nucleotide sequence with 80% homology; within this region some boxes, one of which is 16 nucleotides long, are 100% homologous among the four introns. This feature of L1a gene introns is interesting since we have previously shown that the activity of this gene is regulated at a post-transcriptional level and it involves the block of the normal splicing of some intron sequences. PMID:3841512

  18. Effective gene prediction by high resolution frequency estimator based on least-norm solution technique

    PubMed Central

    2014-01-01

    The linear algebraic concept of subspace plays a significant role in recent techniques of spectrum estimation. In this article, the authors have utilized the noise subspace concept for finding hidden periodicities in DNA sequences. With the vast growth of genomic sequences, the demand to accurately identify the protein-coding regions in DNA is steadily rising. Several techniques of DNA feature extraction that draw on various fields have emerged in the recent past, among which the application of digital signal processing tools is of prime importance. It is known that coding segments have a 3-base periodicity, while non-coding regions do not have this unique feature. One of the most important spectrum analysis techniques based on the concept of subspace is the least-norm method. The least-norm estimator developed in this paper shows sharp period-3 peaks in coding regions, completely eliminating background noise. The proposed method is compared with the existing sliding discrete Fourier transform (SDFT) method, popularly known as the modified periodogram method, on several genes from various organisms, and the results show that the proposed method is the more effective approach to gene prediction. Resolution, quality factor, sensitivity, specificity, miss rate, and wrong rate are used to establish the superiority of the least-norm gene prediction method over the existing method. PMID:24386895
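
    The 3-base periodicity that the abstract above relies on can be illustrated with an ordinary discrete Fourier transform evaluated at frequency 1/3. This is a minimal sketch of the baseline periodogram idea only, not the least-norm estimator of the paper; the function name `period3_power`, the four binary indicator sequences, and the normalization by the squared length are assumptions made for the example.

```python
import cmath


def period3_power(seq):
    """Normalized DFT power of a DNA sequence at frequency 1/3.

    Each base is turned into a binary indicator sequence (1 where the
    base occurs, 0 elsewhere). The squared magnitudes of the four DFT
    coefficients at frequency 1/3 are summed and divided by N**2.
    """
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        return 0.0
    total = 0.0
    for base in "ACGT":
        # DFT coefficient of the indicator sequence at k = N/3.
        coeff = sum(
            cmath.exp(-2j * cmath.pi * i / 3)
            for i, ch in enumerate(seq)
            if ch == base
        )
        total += abs(coeff) ** 2
    return total / (n * n)


periodic = "ATG" * 20   # strongly period-3 toy sequence
flat = "A" * 60         # no period-3 structure
print(period3_power(periodic) > period3_power(flat))
```

    In practice the statistic is computed in a window that slides along the genome, and regions where it peaks are candidate coding segments; the subspace methods in the abstract aim to sharpen those peaks relative to this plain periodogram.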

  19. Cloning of a newly identified heart-specific troponin I isoform, which lacks the troponin T binding portion, using the yeast hybrid system.

    PubMed

    Suzuki, Hideaki; Arakawa, Yasuhiro; Ito, Masaki; Yamada, Hisashi; Horiguchi-Yamada, Junko

    2006-01-01

    To elucidate the molecular pathogenesis behind increased levels of laminin in cardiac muscle cells in cardiomyopathy by using a yeast hybrid screen. The present study reports the cloning of a newly identified heart-specific troponin I isoform, which is putatively linked to laminin. Future studies will explore the functional significance of this connection. Yeast two-hybrid screen analysis was performed using MLF1-interacting protein (amino acids 1 to 318) as bait. The human heart complementary DNA library was screened by using the yeast-mating method for overnight culture. Two final positive clones from the heart library were isolated. These two clones encoded the same protein, a short isoform of human cardiac troponin I (TnI) that lacked TnI exons 5 and 6. The TnI isoform has a heart-specific expression pattern and it shares several sequence features with human cardiac TnI; however, it lacks the troponin T binding portion. The heart-specific segment of the human cardiac TnI isoform shares several sequence features with human cardiac TnI, but it lacks the troponin T binding portion. These results suggest that the heart-specific TnI isoform may be involved in cardiac development and disease.

  20. Cloning of a newly identified heart-specific troponin I isoform, which lacks the troponin T binding portion, using the yeast hybrid system

    PubMed Central

    Suzuki, Hideaki; Arakawa, Yasuhiro; Ito, Masaki; Yamada, Hisashi; Horiguchi-Yamada, Junko

    2006-01-01

    OBJECTIVE To elucidate the molecular pathogenesis behind increased levels of laminin in cardiac muscle cells in cardiomyopathy by using a yeast hybrid screen. The present study reports the cloning of a newly identified heart-specific troponin I isoform, which is putatively linked to laminin. Future studies will explore the functional significance of this connection. METHODS Yeast two-hybrid screen analysis was performed using MLF1-interacting protein (amino acids 1 to 318) as bait. The human heart complementary DNA library was screened by using the yeast-mating method for overnight culture. RESULTS Two final positive clones from the heart library were isolated. These two clones encoded the same protein, a short isoform of human cardiac troponin I (TnI) that lacked TnI exons 5 and 6. The TnI isoform has a heart-specific expression pattern and it shares several sequence features with human cardiac TnI; however, it lacks the troponin T binding portion. CONCLUSION The heart-specific segment of the human cardiac TnI isoform shares several sequence features with human cardiac TnI, but it lacks the troponin T binding portion. These results suggest that the heart-specific TnI isoform may be involved in cardiac development and disease. PMID:18651010

  1. Diversity, virulence, and antimicrobial resistance of the KPC-producing Klebsiella pneumoniae ST307 clone.

    PubMed

    Villa, Laura; Feudi, Claudia; Fortini, Daniela; Brisse, Sylvain; Passet, Virginie; Bonura, Celestino; Endimiani, Andrea; Mammina, Caterina; Ocampo, Ana Maria; Jimenez, Judy Natalia; Doumith, Michel; Woodford, Neil; Hopkins, Katie; Carattoli, Alessandra

    2017-04-01

    The global spread of Klebsiella pneumoniae producing Klebsiella pneumoniae carbapenemase (KPC) has been mainly associated with the dissemination of high-risk clones. In the last decade, hospital outbreaks involving KPC-producing K. pneumoniae have been predominantly attributed to isolates belonging to clonal group (CG) 258. However, results of recent epidemiological analysis indicate that KPC-producing sequence type (ST) 307, is emerging in different parts of the world and is a candidate to become a prevalent high-risk clone in the near future. Here we show that the ST307 genome encodes genetic features that may provide an advantage in adaptation to the hospital environment and the human host. Sequence analysis revealed novel plasmid-located virulence factors, including a cluster for glycogen synthesis. Glycogen production is considered to be one of the possible adaptive responses to long-term survival and growth in environments outside the host. Chromosomally-encoded virulence traits in the clone comprised fimbriae, an integrative conjugative element carrying the yersiniabactin siderophore, and two different capsular loci. Compared with the ST258 clone, capsulated ST307 isolates showed higher resistance to complement-mediated killing. The acquired genetic features identified in the genome of this new emerging clone may contribute to increased persistence of ST307 in the hospital environment and shed light on its potential epidemiological success.

  2. Complete Genome Sequence of the Cystic Fibrosis Pathogen Achromobacter xylosoxidans NH44784-1996 Complies with Important Pathogenic Phenotypes

    PubMed Central

    Jakobsen, Tim Holm; Hansen, Martin Asser; Jensen, Peter Østrup; Hansen, Lars; Riber, Leise; Cockburn, April; Kolpen, Mette; Rønne Hansen, Christine; Ridderberg, Winnie; Eickhardt, Steffen; Hansen, Marlene; Kerpedjiev, Peter; Alhede, Morten; Qvortrup, Klaus; Burmølle, Mette; Moser, Claus; Kühl, Michael; Ciofu, Oana; Givskov, Michael; Sørensen, Søren J.; Høiby, Niels; Bjarnsholt, Thomas

    2013-01-01

    Achromobacter xylosoxidans is an environmental opportunistic pathogen, which infects an increasing number of immunocompromised patients. In this study we combined genomic analysis of a clinically isolated A. xylosoxidans strain with phenotypic investigations of its important pathogenic features. We present a complete assembly of the genome of A. xylosoxidans NH44784-1996, an isolate from a cystic fibrosis patient obtained in 1996. The genome of A. xylosoxidans NH44784-1996 contains approximately 7 million base pairs with 6390 potential protein-coding sequences. We identified several features that render it an opportunistic human pathogen. We found genes involved in anaerobic growth and the pgaABCD operon encoding the biofilm adhesin poly-β-1,6-N-acetyl-D-glucosamine. Furthermore, the genome contains a range of antibiotic resistance genes coding efflux pump systems and antibiotic modifying enzymes. In vitro studies of A. xylosoxidans NH44784-1996 confirmed the genomic evidence for its ability to form biofilms, anaerobic growth via denitrification, and resistance to a broad range of antibiotics. Our investigation enables further studies of the functionality of important identified genes contributing to the pathogenicity of A. xylosoxidans and thereby improves our understanding and ability to treat this emerging pathogen. PMID:23894309

  3. The sequence, structure and evolutionary features of HOTAIR in mammals

    PubMed Central

    2011-01-01

    Background An increasing number of long noncoding RNAs (lncRNAs) have been identified recently. Different from all the others that function in cis to regulate local gene expression, the newly identified HOTAIR is located between HoxC11 and HoxC12 in the human genome and regulates HoxD expression in multiple tissues. Like the well-characterised lncRNA Xist, HOTAIR binds to polycomb proteins to methylate histones at multiple HoxD loci, but unlike Xist, many details of its structure and function, as well as the trans regulation, remain unclear. Moreover, HOTAIR is involved in the aberrant regulation of gene expression in cancer. Results To identify conserved domains in HOTAIR and study the phylogenetic distribution of this lncRNA, we searched the genomes of 10 mammalian and 3 non-mammalian vertebrates for matches to its 6 exons and the two conserved domains within the 1800 bp exon6 using Infernal. There was just one high-scoring hit for each mammal, but many low-scoring hits were found in both mammals and non-mammalian vertebrates. These hits and their flanking genes in four placental mammals and platypus were examined to determine whether HOTAIR contained elements shared by other lncRNAs. Several of the hits were within unknown transcripts or ncRNAs, many were within introns of, or antisense to, protein-coding genes, and conservation of the flanking genes was observed only between human and chimpanzee. Phylogenetic analysis revealed discrete evolutionary dynamics for orthologous sequences of HOTAIR exons. Exon1 at the 5' end and a domain in exon6 near the 3' end, which contain domains that bind to multiple proteins, have evolved faster in primates than in other mammals. Structures were predicted for exon1, two domains of exon6 and the full HOTAIR sequence. 
The sequence and structure of two fragments, in exon1 and the domain B of exon6 respectively, were identified to robustly occur in predicted structures of exon1, domain B of exon6 and the full HOTAIR in mammals. Conclusions HOTAIR exists in mammals, has poorly conserved sequences and considerably conserved structures, and has evolved faster than nearby HoxC genes. Exons of HOTAIR show distinct evolutionary features, and a 239 bp domain in the 1804 bp exon6 is especially conserved. These features, together with the absence of some exons and sequences in mouse, rat and kangaroo, suggest ab initio generation of HOTAIR in marsupials. Structure prediction identifies two fragments in the 5' end exon1 and the 3' end domain B of exon6, with sequence and structure invariably occurring in various predicted structures of exon1, the domain B of exon6 and the full HOTAIR. PMID:21496275

  4. Truncating Variants in NAA15 Are Associated with Variable Levels of Intellectual Disability, Autism Spectrum Disorder, and Congenital Anomalies.

    PubMed

    Cheng, Hanyin; Dharmadhikari, Avinash V; Varland, Sylvia; Ma, Ning; Domingo, Deepti; Kleyner, Robert; Rope, Alan F; Yoon, Margaret; Stray-Pedersen, Asbjørg; Posey, Jennifer E; Crews, Sarah R; Eldomery, Mohammad K; Akdemir, Zeynep Coban; Lewis, Andrea M; Sutton, Vernon R; Rosenfeld, Jill A; Conboy, Erin; Agre, Katherine; Xia, Fan; Walkiewicz, Magdalena; Longoni, Mauro; High, Frances A; van Slegtenhorst, Marjon A; Mancini, Grazia M S; Finnila, Candice R; van Haeringen, Arie; den Hollander, Nicolette; Ruivenkamp, Claudia; Naidu, Sakkubai; Mahida, Sonal; Palmer, Elizabeth E; Murray, Lucinda; Lim, Derek; Jayakar, Parul; Parker, Michael J; Giusto, Stefania; Stracuzzi, Emanuela; Romano, Corrado; Beighley, Jennifer S; Bernier, Raphael A; Küry, Sébastien; Nizon, Mathilde; Corbett, Mark A; Shaw, Marie; Gardner, Alison; Barnett, Christopher; Armstrong, Ruth; Kassahn, Karin S; Van Dijck, Anke; Vandeweyer, Geert; Kleefstra, Tjitske; Schieving, Jolanda; Jongmans, Marjolijn J; de Vries, Bert B A; Pfundt, Rolph; Kerr, Bronwyn; Rojas, Samantha K; Boycott, Kym M; Person, Richard; Willaert, Rebecca; Eichler, Evan E; Kooy, R Frank; Yang, Yaping; Wu, Joseph C; Lupski, James R; Arnesen, Thomas; Cooper, Gregory M; Chung, Wendy K; Gecz, Jozef; Stessman, Holly A F; Meng, Linyan; Lyon, Gholson J

    2018-05-03

    N-alpha-acetylation is a common co-translational protein modification that is essential for normal cell function in humans. We previously identified the genetic basis of an X-linked infantile lethal Mendelian disorder involving a c.109T>C (p.Ser37Pro) missense variant in NAA10, which encodes the catalytic subunit of the N-terminal acetyltransferase A (NatA) complex. The auxiliary subunit of the NatA complex, NAA15, is the dimeric binding partner for NAA10. Through a genotype-first approach with whole-exome or genome sequencing (WES/WGS) and targeted sequencing analysis, we identified and phenotypically characterized 38 individuals from 33 unrelated families with 25 different de novo or inherited, dominantly acting likely gene disrupting (LGD) variants in NAA15. Clinical features of affected individuals with LGD variants in NAA15 include variable levels of intellectual disability, delayed speech and motor milestones, and autism spectrum disorder. Additionally, mild craniofacial dysmorphology, congenital cardiac anomalies, and seizures are present in some subjects. RNA analysis in cell lines from two individuals showed degradation of the transcripts with LGD variants, probably as a result of nonsense-mediated decay. Functional assays in yeast confirmed a deleterious effect for two of the LGD variants in NAA15. Further supporting a mechanism of haploinsufficiency, individuals with copy-number variant (CNV) deletions involving NAA15 and surrounding genes can present with mild intellectual disability, mild dysmorphic features, motor delays, and decreased growth. We propose that defects in NatA-mediated N-terminal acetylation (NTA) lead to variable levels of neurodevelopmental disorders in humans, supporting the importance of the NatA complex in normal human development. Copyright © 2018 American Society of Human Genetics. All rights reserved.

  5. A machine learning approach for the identification of key markers involved in brain development from single-cell transcriptomic data.

    PubMed

    Hu, Yongli; Hase, Takeshi; Li, Hui Peng; Prabhakar, Shyam; Kitano, Hiroaki; Ng, See Kiong; Ghosh, Samik; Wee, Lawrence Jin Kiat

    2016-12-22

    The ability to sequence the transcriptomes of single cells using single-cell RNA-seq technologies presents a shift in the scientific paradigm where scientists are now able to concurrently investigate the complex biology of a heterogeneous population of cells, one at a time. However, to date, there has not been a suitable computational methodology for the analysis of such an intricate deluge of data, in particular techniques which will aid the identification of the unique transcriptomic profile differences between the different cellular subtypes. In this paper, we describe a novel methodology for the analysis of single-cell RNA-seq data, obtained from neocortical cells and neural progenitor cells, using machine learning algorithms (Support Vector Machine (SVM) and Random Forest (RF)). Thirty-eight key transcripts were identified, using the SVM-based recursive feature elimination (SVM-RFE) method of feature selection, to best differentiate developing neocortical cells from neural progenitor cells in the SVM and RF classifiers built. Also, these genes possessed a higher discriminative power (enhanced prediction accuracy) as compared with commonly used statistical techniques or geneset-based approaches. Further downstream network reconstruction analysis was carried out to unravel hidden general regulatory networks where novel interactions could be further validated in wet-lab experimentation and be useful candidates to be targeted for the treatment of neuronal developmental diseases. This novel approach is able to identify transcripts, with reported neuronal involvement, which optimally differentiate neocortical cells and neural progenitor cells. It is believed to be extensible and applicable to other single-cell RNA-seq expression profiles, such as those from the study of cancer progression and treatment within a highly heterogeneous tumour.

  6. Further developments in the controlled growth approach for optimal structural synthesis

    NASA Technical Reports Server (NTRS)

    Hajela, P.

    1982-01-01

    It is pointed out that the use of nonlinear programming methods in conjunction with finite element and other discrete analysis techniques have provided a powerful tool in the domain of optimal structural synthesis. The present investigation is concerned with new strategies which comprise an extension to the controlled growth method considered by Hajela and Sobieski-Sobieszczanski (1981). This method proposed an approach wherein the standard nonlinear programming (NLP) methodology of working with a very large number of design variables was replaced by a sequence of smaller optimization cycles, each involving a single 'dominant' variable. The current investigation outlines some new features. Attention is given to a modified cumulative constraint representation which is defined in both the feasible and infeasible domain of the design space. Other new features are related to the evaluation of the 'effectiveness measure' on which the choice of the dominant variable and the linking strategy is based.

  7. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.

    2004-08-06

    The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Measuring conservation of sequence features closely linked to function--such as binding-site clustering--makes better use of comparative sequence data than commonly used methods that examine only sequence identity.
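
    The idea of searching for dense clusters of predicted binding sites can be illustrated with a sliding-window scan. This is only a minimal sketch and not the authors' pipeline: the function name `find_clusters`, the window width and the site-count threshold are all assumptions made for illustration, and the predicted site positions are taken as given.

```python
def find_clusters(positions, window, min_sites):
    """Return merged (start, end) intervals in which at least `min_sites`
    predicted binding sites fall within a span shorter than `window` bp.

    `positions` holds site coordinates on one sequence (hypothetical input).
    """
    pos = sorted(positions)
    clusters = []
    left = 0
    for right in range(len(pos)):
        # Advance the left edge until the span is shorter than the window.
        while pos[right] - pos[left] >= window:
            left += 1
        if right - left + 1 >= min_sites:
            start, end = pos[left], pos[right]
            if clusters and start <= clusters[-1][1]:
                # Overlaps the previous dense interval: extend it.
                clusters[-1] = (clusters[-1][0], end)
            else:
                clusters.append((start, end))
    return clusters


# Illustrative coordinates only, not data from the study.
sites = [100, 150, 180, 220, 5000, 9000, 9050, 9100]
print(find_clusters(sites, window=200, min_sites=3))
# → [(100, 220), (9000, 9100)]
```

    A conservation test in the spirit of the abstract would run the same scan on the orthologous region of a second genome and ask whether a dense interval is recovered there as well.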

  8. HRAS mutations in Costello syndrome: detection of constitutional activating mutations in codon 12 and 13 and loss of wild-type allele in malignancy.

    PubMed

    Estep, Anne L; Tidyman, William E; Teitell, Michael A; Cotter, Philip D; Rauen, Katherine A

    2006-01-01

    Costello syndrome (CS) is a complex developmental disorder involving characteristic craniofacial features, failure to thrive, developmental delay, cardiac and skeletal anomalies, and a predisposition to develop neoplasia. Based on similarities with other cancer syndromes, we previously hypothesized that CS is likely due to activation of signal transduction through the Ras/MAPK pathway [Tartaglia et al., 2003]. In this study, the HRAS coding region was sequenced for mutations in a large, well-characterized cohort of 36 CS patients. Heterogeneous missense point mutations predicting an amino acid substitution were identified in 33/36 (92%) patients. The majority (91%) had a 34G --> A transition in codon 12. Less frequent mutations included 35G --> C (codon 12) and 37G --> T (codon 13). Parental samples did not have an HRAS mutation supporting the hypothesis of de novo heterogeneous mutations. There is phenotypic variability among patients with a 34G --> A transition. The most consistent features included characteristic facies and skin, failure to thrive, developmental delay, musculoskeletal abnormalities, visual impairment, cardiac abnormalities, and generalized hyperpigmentation. The two patients with 35G --> C had cardiac arrhythmias whereas one patient with a 37G --> T transversion had an enlarged aortic root. Of the patients with a clinical diagnosis of CS, neoplasia was the most consistent phenotypic feature for predicating an HRAS mutation. To gain an understanding of the relationship between constitutional HRAS mutations and malignancy, HRAS was sequenced in an advanced biphasic rhabdomyosarcoma/fibrosarcoma from an individual with a 34G --> A mutation. Loss of the wild-type HRAS allele was observed, suggesting tumorigenesis in CS patients is accompanied by additional somatic changes affecting HRAS. 
Finally, due to phenotypic overlap between CS and cardio-facio-cutaneous (CFC) syndromes, the HRAS coding region was sequenced in a well-characterized CFC cohort. No mutations were found, which supports a distinct genetic etiology between CS and CFC syndromes. (c) 2005 Wiley-Liss, Inc.

  9. Comparative genome analysis of rice-pathogenic Burkholderia provides insight into capacity to adapt to different environments and hosts.

    PubMed

    Seo, Young-Su; Lim, Jae Yun; Park, Jungwook; Kim, Sunyoung; Lee, Hyun-Hee; Cheong, Hoon; Kim, Sang-Mok; Moon, Jae Sun; Hwang, Ingyu

    2015-05-06

    In addition to human and animal diseases, bacteria of the genus Burkholderia can cause plant diseases. The representative species of rice-pathogenic Burkholderia are Burkholderia glumae, B. gladioli, and B. plantarii, which primarily cause grain rot, sheath rot, and seedling blight, respectively, resulting in severe reductions in rice production. Though Burkholderia rice pathogens cause problems in rice-growing countries, comprehensive studies of these rice-pathogenic species aiming to control Burkholderia-mediated diseases are only in the early stages. We first sequenced the complete genome of B. plantarii ATCC 43733T. Second, we conducted comparative analysis of the newly sequenced B. plantarii ATCC 43733T genome with eleven complete or draft genomes of B. glumae and B. gladioli strains. Furthermore, we compared the genome of three rice Burkholderia pathogens with those of other Burkholderia species such as those found in environmental habitats and those known as animal/human pathogens. These B. glumae, B. gladioli, and B. plantarii strains have unique genes involved in toxoflavin or tropolone toxin production and the clustered regularly interspaced short palindromic repeats (CRISPR)-mediated bacterial immune system. Although the genome of B. plantarii ATCC 43733T has many common features with those of B. glumae and B. gladioli, this B. plantarii strain has several unique features, including quorum sensing and CRISPR/CRISPR-associated protein (Cas) systems. The complete genome sequence of B. plantarii ATCC 43733T and publicly available genomes of B. glumae BGR1 and B. gladioli BSR3 enabled comprehensive comparative genome analyses among three rice-pathogenic Burkholderia species responsible for tissue rotting and seedling blight. Our results suggest that B. glumae has evolved rapidly, or has undergone rapid genome rearrangements or deletions, in response to the hosts. 
It also clarifies the unique features of rice-pathogenic Burkholderia species relative to other animal and human Burkholderia species.

  10. Determinants of Global Color-Based Selection in Human Visual Cortex.

    PubMed

    Bartsch, Mandy V; Boehler, Carsten N; Stoppel, Christian M; Merkel, Christian; Heinze, Hans-Jochen; Schoenfeld, Mircea A; Hopf, Jens-Max

    2015-09-01

    Feature attention operates in a spatially global way, with attended feature values being prioritized for selection outside the focus of attention. Accounts of global feature attention have emphasized feature competition as a determining factor. Here, we use magnetoencephalographic recordings in humans to test whether competition is critical for global feature selection to arise. Subjects performed a color/shape discrimination task in one visual field (VF), while irrelevant color probes were presented in the other unattended VF. Global effects of color attention were assessed by analyzing the response to the probe as a function of whether or not the probe's color was a target-defining color. We find that global color selection involves a sequence of modulations in extrastriate cortex, with an initial phase in higher tier areas (lateral occipital complex) followed by a later phase in lower tier retinotopic areas (V3/V4). Importantly, these modulations appeared with and without color competition in the focus of attention. Moreover, early parts of the modulation emerged for a task-relevant color not even present in the focus of attention. All modulations, however, were eliminated during simple onset-detection of the colored target. These results indicate that global color-based attention depends on target discrimination independent of feature competition in the focus of attention. © The Author 2014. Published by Oxford University Press. All rights reserved. For Permissions, please e-mail: journals.permissions@oup.com.

  11. A systematic approach to magnetic resonance imaging evaluation of epiphyseal lesions.

    PubMed

    Thawait, Shrey K; Thawait, Gaurav K; Frassica, Frank J; Andreisek, Gustav; Carrino, John A; Chhabra, Avneesh

    2013-04-01

    Magnetic Resonance Imaging (MRI) is the modality of choice to image epiphyseal lesions. It provides excellent soft tissue resolution and extent of disease. A wide spectrum of tumor and tumor-like lesions can involve the epiphysis. Early and accurate diagnosis as well as appropriate management of epiphyseal lesions is critical as these conditions may lead to disabling complications such as limb length discrepancy, angular or joint surface deformities and secondary osteoarthritis. In this article, we discuss the role of conventional sequences, such as T1W, fluid sensitive T2W and intravenous (IV) Gadolinium enhanced sequences as well as the additional value of problem solving MRI sequences such as chemical shift and diffusion weighted imaging. Based on the imaging findings on various MRI sequences and lesion characteristics, a systematic approach directed to the diagnoses of epiphyseal lesions is presented and discussed. MRI features of clinically and biopsy proven examples of the epiphyseal lesions, such as osteomyelitis, intra-osseous abscess, infiltrative malignancy, metastases, transient osteoporosis, subchondral insufficiency fracture, avascular necrosis, osteochondral fracture, osteochondritis dissecans, eosinophilic granuloma and geode are demonstrated. Using this systematic approach, the reader will be able to better characterize epiphyseal lesions with a potential to positively affect patient management. Copyright © 2013 Elsevier Inc. All rights reserved.

  12. Representation of DNA sequences in genetic codon context with applications in exon and intron prediction.

    PubMed

    Yin, Changchuan

    2015-04-01

    To apply digital signal processing (DSP) methods to analyze DNA sequences, the sequences first must be specially mapped into numerical sequences. Thus, effective numerical mappings of DNA sequences play key roles in the effectiveness of DSP-based methods such as exon prediction. Despite numerous mappings of symbolic DNA sequences to numerical series, the existing mapping methods do not include the genetic coding features of DNA sequences. We present a novel numerical representation of DNA sequences using genetic codon context (GCC) in which the numerical values are optimized by simulated annealing to maximize the 3-periodicity signal to noise ratio (SNR). The optimized GCC representation is then applied in exon and intron prediction by a Short-Time Fourier Transform (STFT) approach. The results show the GCC method enhances the SNR values of exon sequences and thus increases the accuracy of predicting protein coding regions in genomes compared with the commonly used 4D binary representation. In addition, this study offers a novel way to reveal specific features of DNA sequences by optimizing numerical mappings of symbolic DNA sequences.
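
    To make the 3-periodicity signal concrete, the sketch below computes it for the 4D binary representation that the abstract uses as its baseline: one 0/1 indicator sequence per base, with the Fourier power of the four sequences summed. It does not implement the GCC mapping or the simulated-annealing step. The SNR definition used here (power at frequency N/3 divided by the mean power over all non-zero frequencies) is an assumption, since the abstract does not spell out the formula, and the function names are hypothetical.

```python
import cmath


def power_spectrum(seq):
    """Summed DFT power of the four base-indicator sequences (4D binary mapping)."""
    n = len(seq)
    spectrum = []
    for k in range(n):
        total = 0.0
        for base in "ACGT":
            # DFT coefficient of the indicator sequence for this base.
            coeff = sum(cmath.exp(-2j * cmath.pi * k * i / n)
                        for i, s in enumerate(seq) if s == base)
            total += abs(coeff) ** 2
        spectrum.append(total)
    return spectrum


def period3_snr(seq):
    """Power at frequency N/3 relative to the mean power over k = 1..N-1.

    Assumed SNR definition; the sequence length must be a multiple of 3.
    """
    n = len(seq)
    if n == 0 or n % 3 != 0:
        raise ValueError("sequence length must be a positive multiple of 3")
    spectrum = power_spectrum(seq)
    mean_power = sum(spectrum[1:]) / (n - 1)
    return spectrum[n // 3] / mean_power


# Toy sequences, not real genes: a perfect codon-like repeat versus a period-2 repeat.
print(period3_snr("ATG" * 20))
print(period3_snr("AC" * 30))
```

    The direct O(N^2) transform keeps the example dependency-free; for real sequences an FFT routine over short sliding windows, as in the STFT approach the abstract mentions, would be the usual choice.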

  13. Detection of distorted frames in retinal video-sequences via machine learning

    NASA Astrophysics Data System (ADS)

    Kolar, Radim; Liberdova, Ivana; Odstrcilik, Jan; Hracho, Michal; Tornow, Ralf P.

    2017-07-01

    This paper describes the detection of distorted frames in retinal sequences based on a set of global features extracted from each frame. The feature vector is subsequently used in a classification step, in which three types of classifiers are tested. The best classification accuracy, 96%, was achieved with a support vector machine approach.

  14. Characterization and Comparative Overview of Complete Sequences of the First Plasmids of Pandoraea across Clinical and Non-clinical Strains

    PubMed Central

    Yong, Delicia; Tee, Kok Keng; Yin, Wai-Fong; Chan, Kok-Gan

    2016-01-01

    To date, information on plasmid analysis in Pandoraea spp. is scarce. To address the gap of knowledge on this, the complete sequences of eight plasmids from Pandoraea spp. namely Pandoraea faecigallinarum DSM 23572T (pPF72-1, pPF72-2), Pandoraea oxalativorans DSM 23570T (pPO70-1, pPO70-2, pPO70-3, pPO70-4), Pandoraea vervacti NS15 (pPV15) and Pandoraea apista DSM 16535T (pPA35) were studied for the first time in this study. The information on plasmid sequences in Pandoraea spp. is useful as the sequences did not match any known plasmid sequence deposited in public databases. Replication genes were not identified in some plasmids, a situation that has led to the possibility of host interaction involvement. Some plasmids were also void of par genes and intriguingly, repA gene was also not discovered in these plasmids. This further leads to the hypothesis of host-plasmid interaction. Plasmid stabilization/stability protein-encoding genes were observed in some plasmids but were not established for participating in plasmid segregation. Toxin-antitoxin systems MazEF, VapBC, RelBE, YgiT-MqsR, HigBA, and ParDE were identified across the plasmids and their presence would improve plasmid maintenance. Conjugation genes were identified portraying the conjugation ability amongst Pandoraea plasmids. Additionally, we found a shared region amongst some of the plasmids that consists of conjugation genes. The identification of genes involved in replication, segregation, toxin-antitoxin systems and conjugation, would aid the design of drugs to prevent the survival or transmission of plasmids carrying pathogenic properties. Additionally, genes conferring virulence and antibiotic resistance were identified amongst the plasmids. The observed features in the plasmids shed light on the Pandoraea spp. as opportunistic pathogens. PMID:27790203

  15. Comparative and Joint Analysis of Two Metagenomic Datasets from a Biogas Fermenter Obtained by 454-Pyrosequencing

    PubMed Central

    Jaenicke, Sebastian; Ander, Christina; Bekel, Thomas; Bisdorf, Regina; Dröge, Marcus; Gartemann, Karl-Heinz; Jünemann, Sebastian; Kaiser, Olaf; Krause, Lutz; Tille, Felix; Zakrzewski, Martha; Pühler, Alfred

    2011-01-01

    Biogas production from renewable resources is attracting increased attention as an alternative energy source due to the limited availability of traditional fossil fuels. Many countries are promoting the use of alternative energy sources for sustainable energy production. In this study, a metagenome from a production-scale biogas fermenter was analysed employing Roche's GS FLX Titanium technology and compared to a previous dataset obtained from the same community DNA sample that was sequenced on the GS FLX platform. Taxonomic profiling based on 16S rRNA-specific sequences and an Environmental Gene Tag (EGT) analysis employing CARMA demonstrated that both approaches benefit from the longer read lengths obtained on the Titanium platform. Results confirmed Clostridia as the most prevalent taxonomic class, whereas species of the order Methanomicrobiales are dominant among methanogenic Archaea. However, the analyses also identified additional taxa that were missed by the previous study, including members of the genera Streptococcus, Acetivibrio, Garciella, Tissierella, and Gelria, which might also play a role in the fermentation process leading to the formation of methane. Taking advantage of the CARMA feature to correlate taxonomic information of sequences with their assigned functions, it appeared that Firmicutes, followed by Bacteroidetes and Proteobacteria, dominate within the functional context of polysaccharide degradation whereas Methanomicrobiales represent the most abundant taxonomic group responsible for methane production. Clostridia is the most important class involved in the reductive CoA pathway (Wood-Ljungdahl pathway) that is characteristic for acetogenesis. Based on binning of 16S rRNA-specific sequences allocated to the dominant genus Methanoculleus, it could be shown that this genus is represented by several different species. 
Phylogenetic analysis of these sequences placed them in close proximity to the hydrogenotrophic methanogen Methanoculleus bourgensis. While rarefaction analyses still indicate incomplete coverage, examination of the GS FLX Titanium dataset resulted in the identification of additional genera and functional elements, providing a far more complete coverage of the community involved in anaerobic fermentative pathways leading to methane formation. PMID:21297863

  16. Orthogonal Polynomials Associated with Complementary Chain Sequences

    NASA Astrophysics Data System (ADS)

    Behera, Kiran Kumar; Sri Ranga, A.; Swaminathan, A.

    2016-07-01

    Using the minimal parameter sequence of a given chain sequence, we introduce the concept of complementary chain sequences, which we view as perturbations of chain sequences. Using the relation between these complementary chain sequences and the corresponding Verblunsky coefficients, the para-orthogonal polynomials and the associated Szegő polynomials are analyzed. Two illustrations, one involving Gaussian hypergeometric functions and the other involving Carathéodory functions are also provided. A connection between these two illustrations by means of complementary chain sequences is also observed.
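
    For orientation, under the standard definition a sequence a_1, a_2, ... is a positive chain sequence if a_n = (1 - g_{n-1}) g_n for some parameters with 0 <= g_0 < 1 and 0 < g_n < 1; the minimal parameter sequence is the one with g_0 = 0. The sketch below computes only these minimal parameters by the resulting recurrence. It does not construct the complementary chain sequences introduced in the paper, and the function name is hypothetical.

```python
from fractions import Fraction


def minimal_parameters(a):
    """Minimal parameters m_0 = 0, m_n = a_n / (1 - m_{n-1}) for terms a_1, a_2, ...

    Raises ValueError if some m_n leaves (0, 1), in which case the input is
    not the start of a chain sequence under the definition stated above.
    """
    m = [Fraction(0)]
    for a_n in a:
        m_n = Fraction(a_n) / (1 - m[-1])
        if not 0 < m_n < 1:
            raise ValueError("terms do not begin a chain sequence")
        m.append(m_n)
    return m


# Classic example: the constant sequence a_n = 1/4 has m_n = n / (2(n + 1)).
print(minimal_parameters([Fraction(1, 4)] * 3))
# → [Fraction(0, 1), Fraction(1, 4), Fraction(1, 3), Fraction(3, 8)]
```

    Exact rational arithmetic is used so that the boundary check is not disturbed by floating-point rounding.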

  17. A matter of emphasis: Linguistic stress habits modulate serial recall.

    PubMed

    Taylor, John C; Macken, Bill; Jones, Dylan M

    2015-04-01

    Models of short-term memory for sequential information rely on item-level, feature-based descriptions to account for errors in serial recall. Transposition errors within alternating similar/dissimilar letter sequences derive from interactions between overlapping features. However, in two experiments, we demonstrated that the characteristics of the sequence are what determine the fates of items, rather than the properties ascribed to the items themselves. Performance in alternating sequences is determined by the way that the sequences themselves induce particular prosodic rehearsal patterns, and not by the nature of the items per se. In a serial recall task, the shapes of the canonical "saw-tooth" serial position curves and transposition error probabilities at successive input-output distances were modulated by subvocal rehearsal strategies, despite all item-based parameters being held constant. We replicated this finding using nonalternating lists, thus demonstrating that transpositions are substantially influenced by prosodic features, such as stress, that emerge during subvocal rehearsal.

  18. A Novel Feature Extraction Method with Feature Selection to Identify Golgi-Resident Protein Types from Imbalanced Data

    PubMed Central

    Yang, Runtao; Zhang, Chengjin; Gao, Rui; Zhang, Lina

    2016-01-01

    The Golgi Apparatus (GA) is a major collection and dispatch station for numerous proteins destined for secretion, plasma membranes and lysosomes. The dysfunction of GA proteins can result in neurodegenerative diseases. Therefore, accurate identification of protein subGolgi localizations may assist in drug development and understanding the mechanisms of the GA involved in various cellular processes. In this paper, a new computational method is proposed for identifying cis-Golgi proteins from trans-Golgi proteins. Based on the concept of Common Spatial Patterns (CSP), a novel feature extraction technique is developed to extract evolutionary information from protein sequences. To deal with the imbalanced benchmark dataset, the Synthetic Minority Over-sampling Technique (SMOTE) is adopted. A feature selection method called Random Forest-Recursive Feature Elimination (RF-RFE) is employed to search the optimal features from the CSP based features and g-gap dipeptide composition. Based on the optimal features, a Random Forest (RF) module is used to distinguish cis-Golgi proteins from trans-Golgi proteins. Through the jackknife cross-validation, the proposed method achieves a promising performance with a sensitivity of 0.889, a specificity of 0.880, an accuracy of 0.885, and a Matthew’s Correlation Coefficient (MCC) of 0.765, which remarkably outperforms previous methods. Moreover, when tested on a common independent dataset, our method also achieves a significantly improved performance. These results highlight the promising performance of the proposed method to identify Golgi-resident protein types. Furthermore, the CSP based feature extraction method may provide guidelines for protein function predictions. PMID:26861308
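
    The four figures of merit quoted in this record (sensitivity, specificity, accuracy and MCC) all derive from the binary confusion matrix. The sketch below shows the usual formulas. The function name and the example counts are hypothetical and are not taken from the paper.

```python
import math


def binary_metrics(tp, fn, tn, fp):
    """Return (sensitivity, specificity, accuracy, MCC) from confusion counts.

    Assumes both classes are present, so tp + fn > 0 and tn + fp > 0.
    """
    sensitivity = tp / (tp + fn)          # true positive rate
    specificity = tn / (tn + fp)          # true negative rate
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, specificity, accuracy, mcc


# Hypothetical counts, for illustration only.
sens, spec, acc, mcc = binary_metrics(tp=8, fn=2, tn=9, fp=1)
```

    MCC is often reported alongside accuracy on imbalanced data because it uses all four cells of the confusion matrix.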

  19. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

    PubMed Central

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

    2013-01-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147
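
    As a rough illustration of what "kmer sequence features" means, the sketch below turns a DNA sequence into a fixed-length vector of k-mer counts, which is the kind of input a support vector machine could be trained on. It is not the kmer-SVM server's code; the function name and the choice to count overlapping k-mers on one strand only are assumptions made here.

```python
from collections import Counter
from itertools import product


def kmer_counts(seq, k):
    """Count overlapping k-mers in seq and return a vector of length 4**k.

    The vector is ordered lexicographically over the alphabet ACGT.
    Windows containing other symbols (e.g. N) are not in that vocabulary
    and are therefore ignored.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product("ACGT", repeat=k)]


# Hypothetical example sequence.
vector = kmer_counts("ACGTAC", 2)
```

    Each sequence in a positive or negative set would be mapped to such a vector before training a classifier.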

  20. Distribution of genotype network sizes in sequence-to-structure genotype-phenotype maps.

    PubMed

    Manrubia, Susanna; Cuesta, José A

    2017-04-01

    An essential quantity to ensure evolvability of populations is the navigability of the genotype space. Navigability, understood as the ease with which alternative phenotypes are reached, relies on the existence of sufficiently large and mutually attainable genotype networks. The size of genotype networks (e.g. the number of RNA sequences folding into a particular secondary structure or the number of DNA sequences coding for the same protein structure) is astronomically large in all functional molecules investigated: an exhaustive experimental or computational study of all RNA folds or all protein structures becomes impossible even for moderately long sequences. Here, we analytically derive the distribution of genotype network sizes for a hierarchy of models which successively incorporate features of increasingly realistic sequence-to-structure genotype-phenotype maps. The main feature of these models relies on the characterization of each phenotype through a prototypical sequence whose sites admit a variable fraction of letters of the alphabet. Our models interpolate between two limit distributions: a power-law distribution, when the ordering of sites in the prototypical sequence is strongly constrained, and a lognormal distribution, as suggested for RNA, when different orderings of the same set of sites yield different phenotypes. Our main result is the qualitative and quantitative identification of those features of sequence-to-structure maps that lead to different distributions of genotype network sizes. © 2017 The Author(s).

  1. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets.

    PubMed

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S; Beer, Michael A

    2013-07-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, Chromatin immunoprecipitation followed by sequence assays detect genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167-80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org.

  2. Music-Elicited Emotion Identification Using Optical Flow Analysis of Human Face

    NASA Astrophysics Data System (ADS)

    Kniaz, V. V.; Smirnova, Z. N.

    2015-05-01

    Human emotion identification from image sequences is highly demanded nowadays. The range of possible applications can vary from an automatic smile shutter function of consumer grade digital cameras to Biofied Building technologies, which enables communication between building space and residents. The highly perceptual nature of human emotions leads to the complexity of their classification and identification. The main question arises from the subjective quality of emotional classification of events that elicit human emotions. A variety of methods for formal classification of emotions were developed in musical psychology. This work is focused on identification of human emotions evoked by musical pieces using human face tracking and optical flow analysis. Facial feature tracking algorithm used for facial feature speed and position estimation is presented. Facial features were extracted from each image sequence using human face tracking with local binary patterns (LBP) features. Accurate relative speeds of facial features were estimated using optical flow analysis. Obtained relative positions and speeds were used as the output facial emotion vector. The algorithm was tested using original software and recorded image sequences. The proposed technique proves to give a robust identification of human emotions elicited by musical pieces. The estimated models could be used for human emotion identification from image sequences in such fields as emotion based musical background or mood dependent radio.

  3. Maximum entropy methods for extracting the learned features of deep neural networks.

    PubMed

    Finnegan, Alex; Song, Jun S

    2017-10-01

    New architectures of multilayer artificial neural networks and new methods for training them are rapidly revolutionizing the application of machine learning in diverse fields, including business, social science, physical sciences, and biology. Interpreting deep neural networks, however, currently remains elusive, and a critical challenge lies in understanding which meaningful features a network is actually learning. We present a general method for interpreting deep neural networks and extracting network-learned features from input data. We describe our algorithm in the context of biological sequence analysis. Our approach, based on ideas from statistical physics, samples from the maximum entropy distribution over possible sequences, anchored at an input sequence and subject to constraints implied by the empirical function learned by a network. Using our framework, we demonstrate that local transcription factor binding motifs can be identified from a network trained on ChIP-seq data and that nucleosome positioning signals are indeed learned by a network trained on chemical cleavage nucleosome maps. Imposing a further constraint on the maximum entropy distribution also allows us to probe whether a network is learning global sequence features, such as the high GC content in nucleosome-rich regions. This work thus provides valuable mathematical tools for interpreting and extracting learned features from feed-forward neural networks.

  4. Influenza virus sequence feature variant type analysis: evidence of a role for NS1 in influenza virus host range restriction.

    PubMed

    Noronha, Jyothi M; Liu, Mengya; Squires, R Burke; Pickett, Brett E; Hale, Benjamin G; Air, Gillian M; Galloway, Summer E; Takimoto, Toru; Schmolke, Mirco; Hunt, Victoria; Klem, Edward; García-Sastre, Adolfo; McGee, Monnie; Scheuermann, Richard H

    2012-05-01

    Genetic drift of influenza virus genomic sequences occurs through the combined effects of sequence alterations introduced by a low-fidelity polymerase and the varying selective pressures experienced as the virus migrates through different host environments. While traditional phylogenetic analysis is useful in tracking the evolutionary heritage of these viruses, the specific genetic determinants that dictate important phenotypic characteristics are often difficult to discern within the complex genetic background arising through evolution. Here we describe a novel influenza virus sequence feature variant type (Flu-SFVT) approach, made available through the public Influenza Research Database resource (www.fludb.org), in which variant types (VTs) identified in defined influenza virus protein sequence features (SFs) are used for genotype-phenotype association studies. Since SFs have been defined for all influenza virus proteins based on known structural, functional, and immune epitope recognition properties, the Flu-SFVT approach allows the rapid identification of the molecular genetic determinants of important influenza virus characteristics and their connection to underlying biological functions. We demonstrate the use of the SFVT approach to obtain statistical evidence for effects of NS1 protein sequence variations in dictating influenza virus host range restriction.

  5. Analysis and Prediction of Myristoylation Sites Using the mRMR Method, the IFS Method and an Extreme Learning Machine Algorithm.

    PubMed

    Wang, ShaoPeng; Zhang, Yu-Hang; Huang, GuoHua; Chen, Lei; Cai, Yu-Dong

    2017-01-01

    Myristoylation is an important hydrophobic post-translational modification that is covalently bound to the amino group of Gly residues on the N-terminus of proteins. The many diverse functions of myristoylation on proteins, such as membrane targeting, signal pathway regulation and apoptosis, are largely due to the lipid modification, whereas abnormal or irregular myristoylation on proteins can lead to several pathological changes in the cell. To better understand the function of myristoylated sites and to correctly identify them in protein sequences, this study conducted a novel computational investigation on identifying myristoylation sites in protein sequences. A training dataset with 196 positive and 84 negative peptide segments was obtained. Four types of features derived from the peptide segments following the myristoylation sites were used to specify myristoylated and non-myristoylated sites. Then, feature selection methods including maximum relevance and minimum redundancy (mRMR), incremental feature selection (IFS), and a machine learning algorithm (extreme learning machine method) were adopted to extract optimal features for the algorithm to identify myristoylation sites in protein sequences, thereby building an optimal prediction model. As a result, 41 key features were extracted and used to build an optimal prediction model. The effectiveness of the optimal prediction model was further validated by its performance on a test dataset. Furthermore, detailed analyses were also performed on the extracted 41 features to gain insight into the mechanism of myristoylation modification. This study provided a new computational method for identifying myristoylation sites in protein sequences. We believe that it can be a useful tool to predict myristoylation sites from protein sequences. Copyright© Bentham Science Publishers.

  6. Extensive structural variations between mitochondrial genomes of CMS and normal peppers (Capsicum annuum L.) revealed by complete nucleotide sequencing.

    PubMed

    Jo, Yeong Deuk; Choi, Yoomi; Kim, Dong-Hwan; Kim, Byung-Dong; Kang, Byoung-Cheorl

    2014-07-04

    Cytoplasmic male sterility (CMS) is an inability to produce functional pollen that is caused by mutation of the mitochondrial genome. Comparative analyses of mitochondrial genomes of lines with and without CMS in several species have revealed structural differences between genomes, including extensive rearrangements caused by recombination. However, the mitochondrial genome structure and the DNA rearrangements that may be related to CMS have not been characterized in Capsicum spp. We obtained the complete mitochondrial genome sequences of the pepper CMS line FS4401 (507,452 bp) and the fertile line Jeju (511,530 bp). Comparative analysis between mitochondrial genomes of peppers and tobacco that are included in Solanaceae revealed extensive DNA rearrangements and poor conservation in non-coding DNA. In comparison between pepper lines, FS4401 and Jeju mitochondrial DNAs contained the same complement of protein coding genes except for one additional copy of an atp6 gene (ψatp6-2) in FS4401. In terms of genome structure, we found eighteen syntenic blocks in the two mitochondrial genomes, which have been rearranged in each genome. By contrast, sequences between syntenic blocks, which were specific to each line, accounted for 30,380 and 17,847 bp in FS4401 and Jeju, respectively. The previously-reported CMS candidate genes, orf507 and ψatp6-2, were located on the edges of the largest sequence segments that were specific to FS4401. In this region, large number of small sequence segments which were absent or found on different locations in Jeju mitochondrial genome were combined together. The incorporation of repeats and overlapping of connected sequence segments by a few nucleotides implied that extensive rearrangements by homologous recombination might be involved in evolution of this region. Further analysis using mtDNA pairs from other plant species revealed common features of DNA regions around CMS-associated genes. 
    Although a large portion of the sequence context was shared by the mitochondrial genomes of CMS and male-fertile pepper lines, extensive genome rearrangements were detected. CMS candidate genes were located on the edges of highly rearranged CMS-specific DNA regions and near repeat sequences. These characteristics were also detected among CMS-associated genes in other species, implying that a common mechanism might be involved in the evolution of CMS-associated genes.

  7. Transcription Factor Information System (TFIS): A Tool for Detection of Transcription Factor Binding Sites.

    PubMed

    Narad, Priyanka; Kumar, Abhishek; Chakraborty, Amlan; Patni, Pranav; Sengupta, Abhishek; Wadhwa, Gulshan; Upadhyaya, K C

    2017-09-01

    Transcription factors are trans-acting proteins that interact with specific nucleotide sequences known as transcription factor binding site (TFBS), and these interactions are implicated in regulation of the gene expression. Regulation of transcriptional activation of a gene often involves multiple interactions of transcription factors with various sequence elements. Identification of these sequence elements is the first step in understanding the underlying molecular mechanism(s) that regulate the gene expression. For in silico identification of these sequence elements, we have developed an online computational tool named transcription factor information system (TFIS) for detecting TFBS for the first time using a collection of JAVA programs and is mainly based on TFBS detection using position weight matrix (PWM). The database used for obtaining position frequency matrices (PFM) is JASPAR and HOCOMOCO, which is an open-access database of transcription factor binding profiles. Pseudo-counts are used while converting PFM to PWM, and TFBS detection is carried out on the basis of percent score taken as threshold value. TFIS is equipped with advanced features such as direct sequence retrieving from NCBI database using gene identification number and accession number, detecting binding site for common TF in a batch of gene sequences, and TFBS detection after generating PWM from known raw binding sequences in addition to general detection methods. TFIS can detect the presence of potential TFBSs in both the directions at the same time. This feature increases its efficiency. And the results for this dual detection are presented in different colors specific to the orientation of the binding site. 
    Results obtained by TFIS are more detailed and specific to the detected TFs, because the result pages integrate informative links to related web servers such as Gene Ontology, the PAZAR database and the Transcription Factor Encyclopedia, in addition to NCBI and UniProt. Common TFs like SP1, AP1 and NF-KB of the Amyloid beta precursor gene are easily detected using TFIS, along with multiple binding sites. In another scenario, the embryonic developmental process, TFs of the FOX family (FOXL1 and FOXC1) were also identified. TFIS is platform-independent and publicly available, along with its support and documentation, at http://tfistool.appspot.com and http://www.bioinfoplus.com/tfis/. TFIS is licensed under the GNU General Public License, version 3 (GPL-3.0).
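
    The detection scheme this record describes (position frequency matrix to position weight matrix with pseudo-counts, then a percent-score threshold applied on both strands) can be sketched as follows. This is not the TFIS Java code. The pseudo-count of 0.8, the uniform background of 0.25, the min-max definition of the percent score, and all function names are assumptions made for illustration.

```python
import math

BASES = "ACGT"
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def pfm_to_pwm(pfm, pseudocount=0.8, background=0.25):
    """Convert a PFM (dict base -> counts per position) to log-odds weights."""
    width = len(pfm["A"])
    pwm = {base: [] for base in BASES}
    for pos in range(width):
        total = sum(pfm[base][pos] for base in BASES) + pseudocount
        for base in BASES:
            # The pseudo-count is split across bases by background frequency.
            prob = (pfm[base][pos] + pseudocount * background) / total
            pwm[base].append(math.log2(prob / background))
    return pwm


def score_site(pwm, site):
    """Sum the position-specific weights along one candidate site."""
    return sum(pwm[base][i] for i, base in enumerate(site))


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def scan(pwm, seq, threshold_pct=80.0):
    """Return (strand, start, site, percent) for windows at or above threshold.

    Percent score is assumed to be 100 * (score - min) / (max - min).
    Starts on the '-' strand index the reverse-complemented sequence.
    """
    width = len(pwm["A"])
    best = sum(max(pwm[b][i] for b in BASES) for i in range(width))
    worst = sum(min(pwm[b][i] for b in BASES) for i in range(width))
    if best == worst:  # uninformative matrix, nothing to rank
        return []
    hits = []
    for strand, s in (("+", seq.upper()), ("-", reverse_complement(seq.upper()))):
        for start in range(len(s) - width + 1):
            site = s[start:start + width]
            if any(ch not in BASES for ch in site):
                continue  # skip windows containing ambiguous symbols
            pct = 100.0 * (score_site(pwm, site) - worst) / (best - worst)
            if pct >= threshold_pct:
                hits.append((strand, start, site, round(pct, 1)))
    return hits


# Hypothetical toy matrix whose consensus is TATA.
toy_pfm = {"A": [0, 10, 0, 10], "C": [0, 0, 0, 0],
           "G": [0, 0, 0, 0], "T": [10, 0, 10, 0]}
toy_hits = scan(pfm_to_pwm(toy_pfm), "GGTATAGG")
```

    Scanning the reverse complement in the same pass is one simple way to report sites in both orientations, as the abstract describes.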

  8. PredPPCrys: accurate prediction of sequence cloning, protein production, purification and crystallization propensity from protein sequences using multi-step heterogeneous feature fusion and selection.

    PubMed

    Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning

    2014-01-01

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed 'PredPPCrys' using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
    In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys.

  9. Building Facade Modeling Under Line Feature Constraint Based on Close-Range Images

    NASA Astrophysics Data System (ADS)

    Liang, Y.; Sheng, Y. H.

    2018-04-01

    To solve existing problems in modeling building facades with point features alone from close-range images, a new method for modeling building facades under a line feature constraint is proposed in this paper. First, camera parameters and sparse spatial point cloud data were restored using SFM, and 3D dense point clouds were generated with MVS. Second, line features were detected based on the gradient direction, the detected line features were fitted considering directions and lengths, and the line features were then matched under multiple types of constraints and extracted from the multi-image sequence. Finally, the facade mesh of a building was triangulated with the point cloud and line features. The experiment shows that this method can effectively reconstruct the geometric facade of buildings by combining the point and line features of the close-range image sequence, especially in restoring the contour information of building facades.

  10. Different levels of learning interact to shape the congruency sequence effect.

    PubMed

    Weissman, Daniel H; Hawks, Zoë W; Egner, Tobias

    2016-04-01

    The congruency effect in distracter interference tasks is often reduced after incongruent relative to congruent trials. Moreover, this congruency sequence effect (CSE) is influenced by learning related to concrete stimulus and response features as well as by learning related to abstract cognitive control processes. There is an ongoing debate, however, over whether interactions between these learning processes are best explained by an episodic retrieval account, an adaptation by binding account, or a cognitive efficiency account of the CSE. To make this distinction, we orthogonally manipulated the expression of these learning processes in a novel factorial design involving the prime-probe arrow task. In Experiment 1, these processes interacted in an over-additive fashion to influence CSE magnitude. In Experiment 2, we replicated this interaction while showing it was not driven by conditional differences in the size of the congruency effect. In Experiment 3, we ruled out an alternative account of this interaction as reflecting conditional differences in learning related to concrete stimulus and response features. These findings support an episodic retrieval account of the CSE, in which repeating a stimulus feature from the previous trial facilitates the retrieval and use of previous-trial control parameters, thereby boosting control in the current trial. In contrast, they do not fit with (a) an adaptation by binding account, in which CSE magnitude is directly related to the size of the congruency effect, or (b) a cognitive efficiency account, in which costly control processes are recruited only when behavioral adjustments cannot be mediated by low-level associative mechanisms. (c) 2016 APA, all rights reserved.

  11. High resolution tempo-spatial ozone prediction with SVM and LSTM

    NASA Astrophysics Data System (ADS)

    Gao, D.; Zhang, Y.; Qu, Z.; Sadighi, K.; Coffey, E.; LIU, Q.; Hannigan, M.; Henze, D. K.; Dick, R.; Shang, L.; Lv, Q.

    2017-12-01

    To investigate and predict the exposure of ozone and other pollutants in urban areas, we utilize data from various infrastructures including EPA, NOAA and RIITS from the government of Los Angeles and construct statistical models to conduct ozone concentration prediction in Los Angeles areas at finer spatial and temporal granularity. Our work involves cyber data such as traffic, roads and population data as features for prediction. Two statistical models, Support Vector Machine (SVM) and Long Short-term Memory (LSTM, a deep learning method), are used for prediction. Our experiments show that kernelized SVM gains better prediction performance when taking traffic counts, road density and population density as features, with a prediction RMSE of 7.99 ppb for all-time ozone and 6.92 ppb for peak-value ozone. With simulated NOx from a Chemical Transport Model (CTM) as features, SVM generates even better prediction performance, with a prediction RMSE of 6.69 ppb. We also build an LSTM, which has shown great advantages at dealing with temporal sequences, to predict ozone concentration by treating ozone concentration as spatial-temporal sequences. Trained on ozone concentration measurements from the 13 EPA stations in the LA area, the model achieves 4.45 ppb RMSE. In addition, we build a variant of this model that adds spatial dynamics in the form of a transition matrix, which reveals new knowledge on pollutant transition. The forgetting gate of the trained LSTM is consistent with the delay effect of ozone concentration, and the trained transition matrix shows spatial consistency with the common direction of winds in the LA area.
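
    The RMSE values quoted in this record are root-mean-square errors between predicted and measured concentrations, in ppb. A minimal sketch of that calculation follows; the function name and the sample numbers are hypothetical and are not data from the study.

```python
import math


def rmse(predicted, observed):
    """Root-mean-square error between two equal-length, non-empty sequences."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical ozone values in ppb, for illustration only.
error_ppb = rmse([42.0, 55.0, 61.0], [40.0, 58.0, 60.0])
```

    Because the errors are squared before averaging, a few large misses raise RMSE more than many small ones.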

  12. Computed tomography synthesis from magnetic resonance images in the pelvis using multiple random forests and auto-context features

    NASA Astrophysics Data System (ADS)

    Andreasen, Daniel; Edmund, Jens M.; Zografos, Vasileios; Menze, Bjoern H.; Van Leemput, Koen

    2016-03-01

    In radiotherapy treatment planning that is only based on magnetic resonance imaging (MRI), the electron density information usually obtained from computed tomography (CT) must be derived from the MRI by synthesizing a so-called pseudo CT (pCT). This is a non-trivial task since MRI intensities are neither uniquely nor quantitatively related to electron density. Typical approaches involve either a classification or regression model requiring specialized MRI sequences to solve intensity ambiguities, or an atlas-based model necessitating multiple registrations between atlases and subject scans. In this work, we explore a machine learning approach for creating a pCT of the pelvic region from conventional MRI sequences without using atlases. We use a random forest provided with information about local texture, edges and spatial features derived from the MRI. This helps to solve intensity ambiguities. Furthermore, we use the concept of auto-context by sequentially training a number of classification forests to create and improve context features, which are finally used to train a regression forest for pCT prediction. We evaluate the pCT quality in terms of the voxel-wise error and the radiologic accuracy as measured by water-equivalent path lengths. We compare the performance of our method against two baseline pCT strategies, which either set all MRI voxels in the subject equal to the CT value of water, or in addition transfer the bone volume from the real CT. We show an improved performance compared to both baseline pCTs suggesting that our method may be useful for MRI-only radiotherapy.

  13. DNA sequence analysis in 598 individuals with a clinical diagnosis of osteogenesis imperfecta: diagnostic yield and mutation spectrum.

    PubMed

    Bardai, G; Moffatt, P; Glorieux, F H; Rauch, F

    2016-12-01

    We detected disease-causing mutations in 585 of 598 individuals (98 %) with typical features of osteogenesis imperfecta (OI). In mild OI, only collagen type I encoding genes were involved. In moderate to severe OI, mutations in 12 different genes were found; 11 % of these patients had mutations in recessive genes. OI is usually caused by mutations in COL1A1 or COL1A2, the genes encoding collagen type I alpha chains, but mutations in at least 16 other genes have also been associated with OI. It is presently unknown what proportion of individuals with clinical features of OI has a disease-causing mutation in one of these genes. DNA sequence analysis was performed on 598 individuals from 487 families who had a typical OI phenotype. OI type I was diagnosed in 43 % of individuals, and 57 % had moderate to severe OI, defined as OI types other than type I. Disease-causing variants were detected in 97 % of individuals with OI type I and in 99 % of patients with moderate to severe OI. All mutations found in OI type I were dominant and exclusively affected COL1A1 or COL1A2. In moderate to severe OI, dominant mutations were found in COL1A1/COL1A2 (77 %), IFITM5 (9 %), and P4HB (0.6 %). Mutations in one of the recessive OI-associated gene were observed in 12 % of individuals with moderate to severe OI. The genes most frequently involved in recessive OI were SERPINF1 (4.0 % of individuals with moderate to severe OI) and CRTAP (2.9 %). DNA sequence analysis of currently known OI-associated genes identifies disease-causing variants in almost all individuals with a typical OI phenotype. About 20 % of individuals with moderate to severe OI had mutations in genes other than COL1A1/COL1A2.

  14. Graph pyramids for protein function prediction

    PubMed Central

    2015-01-01

    Background: Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Methods: Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Results: Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on an average the proposed method misclassified only 21.1 sequences whereas baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data. PMID:26044522

  15. Graph pyramids for protein function prediction.

    PubMed

    Sandhan, Tushar; Yoo, Youngjun; Choi, Jin; Kim, Sun

    2015-01-01

    Uncovering the hidden organizational characteristics and regularities among biological sequences is the key issue for detailed understanding of an underlying biological phenomenon. Thus pattern recognition from nucleic acid sequences is an important affair for protein function prediction. As proteins from the same family exhibit similar characteristics, homology based approaches predict protein functions via protein classification. But conventional classification approaches mostly rely on the global features by considering only strong protein similarity matches. This leads to significant loss of prediction accuracy. Here we construct the Protein-Protein Similarity (PPS) network, which captures the subtle properties of protein families. The proposed method considers the local as well as the global features, by examining the interactions among 'weakly interacting proteins' in the PPS network and by using hierarchical graph analysis via the graph pyramid. Different underlying properties of the protein families are uncovered by operating the proposed graph based features at various pyramid levels. Experimental results on benchmark data sets show that the proposed hierarchical voting algorithm using graph pyramid helps to improve computational efficiency as well as the protein classification accuracy. Quantitatively, among 14,086 test sequences, on average the proposed method misclassified only 21.1 sequences whereas baseline BLAST score based global feature matching method misclassified 362.9 sequences. With each correctly classified test sequence, the fast incremental learning ability of the proposed method further enhances the training model. Thus it has achieved more than 96% protein classification accuracy using only 20% per class training data.
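
    The misclassification counts quoted above can be turned into error rates with simple arithmetic. The short sketch below only restates the numbers given in the abstract (14,086 test sequences, 21.1 and 362.9 average misclassifications); the function name is a hypothetical helper, and the result should not be confused with the separate 96% figure reported for the 20% training-data setting.

```python
def error_rate_percent(misclassified, total):
    """Average misclassification rate as a percentage of the test set."""
    return 100.0 * misclassified / total

TOTAL_TEST_SEQUENCES = 14086          # from the abstract
PYRAMID_MISCLASSIFIED = 21.1          # proposed method, average count
BLAST_MISCLASSIFIED = 362.9           # BLAST-score baseline, average count

pyramid_error = error_rate_percent(PYRAMID_MISCLASSIFIED, TOTAL_TEST_SEQUENCES)
blast_error = error_rate_percent(BLAST_MISCLASSIFIED, TOTAL_TEST_SEQUENCES)

print(f"graph pyramid error: {pyramid_error:.2f}%")
print(f"BLAST baseline error: {blast_error:.2f}%")
print(f"baseline/proposed ratio: {BLAST_MISCLASSIFIED / PYRAMID_MISCLASSIFIED:.1f}x")
```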

  16. Abstract feature codes: The building blocks of the implicit learning system.

    PubMed

    Eberhardt, Katharina; Esser, Sarah; Haider, Hilde

    2017-07-01

    According to the Theory of Event Coding (TEC; Hommel, Müsseler, Aschersleben, & Prinz, 2001), action and perception are represented in a shared format in the cognitive system by means of feature codes. In implicit sequence learning research, it is still common to make a conceptual difference between independent motor and perceptual sequences. This supposedly independent learning takes place in encapsulated modules (Keele, Ivry, Mayr, Hazeltine, & Heuer 2003) that process information along single dimensions. These dimensions have remained underspecified so far. It is especially not clear whether stimulus and response characteristics are processed in separate modules. Here, we suggest that feature dimensions as they are described in the TEC should be viewed as the basic content of modules of implicit learning. This means that the modules process all stimulus and response information related to certain feature dimensions of the perceptual environment. In 3 experiments, we investigated by means of a serial reaction time task the nature of the basic units of implicit learning. As a test case, we used stimulus location sequence learning. The results show that a stimulus location sequence and a response location sequence cannot be learned without interference (Experiment 2) unless one of the sequences can be coded via an alternative, nonspatial dimension (Experiment 3). These results support the notion that spatial location is one module of the implicit learning system and, consequently, that there are no separate processing units for stimulus versus response locations. (PsycINFO Database Record (c) 2017 APA, all rights reserved).

  17. Chemical event chain model of coupled genetic oscillators.

    PubMed

    Jörg, David J; Morelli, Luis G; Jülicher, Frank

    2018-03-01

    We introduce a stochastic model of coupled genetic oscillators in which chains of chemical events involved in gene regulation and expression are represented as sequences of Poisson processes. We characterize steady states by their frequency, their quality factor, and their synchrony by the oscillator cross correlation. The steady state is determined by coupling and exhibits stochastic transitions between different modes. The interplay of stochasticity and nonlinearity leads to isolated regions in parameter space in which the coupled system works best as a biological pacemaker. Key features of the stochastic oscillations can be captured by an effective model for phase oscillators that are coupled by signals with distributed delays.
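
    As a rough illustration of what it means to represent a chain of chemical events as a sequence of Poisson processes, the sketch below draws the completion time of a chain as a sum of independent exponential waiting times. The number of steps, the common rate, and the function name are assumptions made for this example; it does not model coupling, feedback, or oscillation as the paper does.

```python
import numpy as np

def chain_completion_times(n_steps, rate, n_samples, seed=0):
    """Sample the time needed to complete a chain of n_steps events.

    Each event is assumed to fire as a Poisson process with the same rate,
    so each waiting time is exponential with mean 1/rate, and the total time
    is the sum of n_steps such waits (a Gamma-distributed quantity).
    """
    rng = np.random.default_rng(seed)
    waits = rng.exponential(scale=1.0 / rate, size=(n_samples, n_steps))
    return waits.sum(axis=1)

times = chain_completion_times(n_steps=10, rate=2.0, n_samples=20000)
mean_time = times.mean()
coeff_var = times.std() / mean_time

# Theory for identical steps: mean = n_steps / rate, CV = 1 / sqrt(n_steps).
print(f"mean completion time: {mean_time:.3f} (theory 5.000)")
print(f"coefficient of variation: {coeff_var:.3f} (theory {1 / np.sqrt(10):.3f})")
```

    Longer chains give relatively sharper completion times (the coefficient of variation falls as one over the square root of the number of steps), which is one simple way to see how multi-step event chains can produce more regular timing than a single Poisson event.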

  18. Robinow syndrome: phenotypic variability in a family with a novel intragenic ROR2 mutation.

    PubMed

    Brunetti-Pierri, Nicola; Del Gaudio, Daniela; Peters, Hartmut; Justino, Henri; Ott, Claus-Eric; Mundlos, Stefan; Bacino, Carlos A

    2008-11-01

    Robinow syndrome comprises dysmorphic facial features, short stature, brachymesomelia, segmental spine defects, and genital hypoplasia. The range of severity in this disorder is broad. We report on the clinical and molecular findings of two sib pairs from the same extended family with Robinow syndrome due to a novel intragenic ROR2 deletion involving exons 6 and 7 that could not be detected by sequencing. The affected individuals exhibited variability with respect to the cleft lip, cleft palate, and cardiac findings and for the presence in one of the patients of syringomyelia, which has not been previously reported in Robinow syndrome. Copyright 2008 Wiley-Liss, Inc.

  19. Multistep modeling of protein structure: application to bungarotoxin

    NASA Technical Reports Server (NTRS)

    Srinivasan, S.; Shibata, M.; Rein, R.

    1986-01-01

    Modelling of bungarotoxin in atomic details is presented in this article. The model-building procedure utilizes the low-resolution crystal coordinates of the c-alpha atoms of bungarotoxin, sequence homology within the neurotoxin family, as well as high-resolution x-ray diffraction data of cobratoxin and erabutoxin. Our model-building procedure involves: (a) principles of comparative modelling, (b) embedding procedures of distance geometry, and (c) use of molecular mechanics for optimizing packing. The model is not only consistent with the c-alpha coordinates of crystal structure, but also agrees with solution conformational features of the triple-stranded beta sheet as observed by NOE measurements.

  20. Inversion-mediated gene fusions involving NAB2-STAT6 in an unusual malignant meningioma.

    PubMed

    Gao, F; Ling, C; Shi, L; Commins, D; Zada, G; Mack, W J; Wang, K

    2013-08-20

    Meningiomas are the most common primary intracranial tumours, with ∼3% meeting current histopathologic criteria for malignancy. In this study, we explored the transcriptome of meningiomas using RNA-Seq. Inversion-mediated fusions between two adjacent genes, NAB2 and STAT6, were detected in one malignant tumour, creating two novel in-frame transcripts that were validated by RT-PCR and Sanger sequencing. Gene fusions of NAB2-STAT6 were recently implicated in the pathogenesis of solitary fibrous tumours; our study suggested that similar fusions may also have a role in a malignant meningioma with unusual histopathologic features.

  1. Chemical event chain model of coupled genetic oscillators

    NASA Astrophysics Data System (ADS)

    Jörg, David J.; Morelli, Luis G.; Jülicher, Frank

    2018-03-01

    We introduce a stochastic model of coupled genetic oscillators in which chains of chemical events involved in gene regulation and expression are represented as sequences of Poisson processes. We characterize steady states by their frequency, their quality factor, and their synchrony by the oscillator cross correlation. The steady state is determined by coupling and exhibits stochastic transitions between different modes. The interplay of stochasticity and nonlinearity leads to isolated regions in parameter space in which the coupled system works best as a biological pacemaker. Key features of the stochastic oscillations can be captured by an effective model for phase oscillators that are coupled by signals with distributed delays.

  2. Approaches to Fungal Genome Annotation

    PubMed Central

    Haas, Brian J.; Zeng, Qiandong; Pearson, Matthew D.; Cuomo, Christina A.; Wortman, Jennifer R.

    2011-01-01

    Fungal genome annotation is the starting point for analysis of genome content. This generally involves the application of diverse methods to identify features on a genome assembly such as protein-coding and non-coding genes, repeats and transposable elements, and pseudogenes. Here we describe tools and methods leveraged for eukaryotic genome annotation with a focus on the annotation of fungal nuclear and mitochondrial genomes. We highlight the application of the latest technologies and tools to improve the quality of predicted gene sets. The Broad Institute eukaryotic genome annotation pipeline is described as one example of how such methods and tools are integrated into a sequencing center’s production genome annotation environment. PMID:22059117

  3. Neutral Evolution of Duplicated DNA: An Evolutionary Stick-Breaking Process Causes Scale-Invariant Behavior

    NASA Astrophysics Data System (ADS)

    Massip, Florian; Arndt, Peter F.

    2013-04-01

    Recently, an enrichment of identical matching sequences has been found in many eukaryotic genomes. Their length distribution exhibits a power law tail raising the question of what evolutionary mechanism or functional constraints would be able to shape this distribution. Here we introduce a simple and evolutionarily neutral model, which involves only point mutations and segmental duplications, and produces the same statistical features as observed for genomic data. Further, we extend a mathematical model for random stick breaking to analytically show that the exponent of the power law tail is -3 and universal as it does not depend on the microscopic details of the model.

  4. Phylogeny of isolates of Prunus necrotic ringspot virus from the Ilarvirus Ringtest and identification of group-specific features.

    PubMed

    Hammond, R W

    2003-06-01

    Isolates of Prunus necrotic ringspot virus (PNRSV) were examined to establish the level of naturally occurring sequence variation in the coat protein (CP) gene and to identify group-specific genome features that may prove valuable for the generation of diagnostic reagents. Phylogenetic analysis of a 452 bp sequence of 68 virus isolates, 20 obtained from the European Union Ilarvirus Ringtest held in October 1998, confirmed the clustering of the isolates into three distinct groups. Although no correlation was found between the sequence and host or geographic origin, there was a general trend for severe isolates to cluster into one group. Group-specific features have been identified for discrimination between virus strains.

  5. Form drag in rivers due to small-scale natural topographic features: 2. Irregular sequences

    USGS Publications Warehouse

    Kean, J.W.; Smith, J.D.

    2006-01-01

    The size, shape, and spacing of small-scale topographic features found on the boundaries of natural streams, rivers, and floodplains can be quite variable. Consequently, a procedure for determining the form drag on irregular sequences of different-sized topographic features is essential for calculating near-boundary flows and sediment transport. A method for carrying out such calculations is developed in this paper. This method builds on the work of Kean and Smith (2006), which describes the flow field for the simpler case of a regular sequence of identical topographic features. Both approaches model topographic features as two-dimensional elements with Gaussian-shaped cross sections defined in terms of three parameters. Field measurements of bank topography are used to show that (1) the magnitude of these shape parameters can vary greatly between adjacent topographic features and (2) the variability of these shape parameters follows a lognormal distribution. Simulations using an irregular set of topographic roughness elements show that the drag on an individual element is primarily controlled by the size and shape of the feature immediately upstream and that the spatial average of the boundary shear stress over a large set of randomly ordered elements is relatively insensitive to the sequence of the elements. In addition, a method to transform the topography of irregular surfaces into an equivalently rough surface of regularly spaced, identical topographic elements also is given. The methods described in this paper can be used to improve predictions of flow resistance in rivers as well as quantify bank roughness.

  6. Templated sequence insertion polymorphisms in the human genome

    NASA Astrophysics Data System (ADS)

    Onozawa, Masahiro; Aplan, Peter

    2016-11-01

    Templated Sequence Insertion Polymorphism (TSIP) is a recently described form of polymorphism recognized in the human genome, in which a sequence that is templated from a distant genomic region is inserted into the genome, seemingly at random. TSIPs can be grouped into two classes based on nucleotide sequence features at the insertion junctions; Class 1 TSIPs show features of insertions that are mediated via the LINE-1 ORF2 protein, including 1) target-site duplication (TSD), 2) polyadenylation 10-30 nucleotides downstream of a “cryptic” polyadenylation signal, and 3) preference for insertion at a 5’-TTTT/A-3’ sequence. In contrast, class 2 TSIPs show features consistent with repair of a DNA double-strand break via insertion of a DNA “patch” that is derived from a distant genomic region. Survey of a large number of normal human volunteers demonstrates that most individuals have 25-30 TSIPs, and that these TSIPs track with specific geographic regions. Similar to other forms of human polymorphism, we suspect that these TSIPs may be important for the generation of human diversity and genetic diseases.

  7. Why cellular communication during plant reproduction is particularly mediated by CRP signalling.

    PubMed

    Bircheneder, Susanne; Dresselhaus, Thomas

    2016-08-01

    Secreted cysteine-rich peptides (CRPs) represent one of the main classes of signalling peptides in plants. Whereas post-translationally modified small non-CRP peptides (psNCRPs) are mostly involved in signalling events during vegetative development and interactions with the environment, CRPs are overrepresented in reproductive processes including pollen germination and growth, self-incompatibility, gamete activation and fusion as well as seed development. In this opinion paper we compare the involvement of both types of peptides in vegetative and reproductive phases of the plant lifecycle. Besides their conserved cysteine pattern defining structural features, CRPs exhibit hypervariable primary sequences and a rapid evolution rate. As a result, CRPs represent a pool of highly polymorphic signalling peptides involved in species-specific functions during reproduction and thus likely represent key players to trigger speciation in plants by supporting reproductive isolation. In contrast, precursors of psNCRPs are proteolytically processed into small functional domains with high sequence conservation and act in more general processes. We discuss parallels in downstream processes of CRP signalling in both reproduction and defence against pathogenic fungi and alien pollen tubes, with special emphasis on the role of ROS and ion channels. In conclusion we suggest that CRP signalling during reproduction in plants has evolved from ancient defence mechanisms. © The Author 2016. Published by Oxford University Press on behalf of the Society for Experimental Biology. All rights reserved. For permissions, please email: journals.permissions@oup.com.

  8. Two distinct CXC chemokine receptors (CXCR3 and CXCR4) from the big-belly seahorse Hippocampus abdominalis: Molecular perspectives and immune defensive role upon pathogenic stress.

    PubMed

    Priyathilaka, Thanthrige Thiunuwan; Oh, Minyoung; Bathige, S D N K; De Zoysa, Mahanama; Lee, Jehee

    2017-06-01

    CXC chemokine receptor 3 (CXCR3) and 4 (CXCR4) are members of the seven transmembrane G protein coupled receptor family, involved in pivotal physiological functions. In this study, seahorse CXCR3 and CXCR4 (designated as HaCXCR3 and HaCXCR4) cDNA sequences were identified from the transcriptome library and subsequently molecularly characterized. HaCXCR3 and HaCXCR4 encoded 363 and 373 amino acid long polypeptides, respectively. The HaCXCR3 and HaCXCR4 deduced proteins have typical structural features of chemokine receptors, including seven transmembrane domains and a G protein coupled receptors family 1 profile with characteristic DRY motifs. Amino acid sequence comparison and phylogenetic analysis of these two CXC chemokine receptors revealed a close relationship to their corresponding teleost counterparts. Quantitative real time PCR analysis revealed that HaCXCR3 and HaCXCR4 were ubiquitously expressed in all the tested tissues, with highest expression levels in blood cells. The seahorse blood cells and kidney HaCXCR3 and HaCXCR4 mRNA expressions were differently modulated when challenged with Edwardsiella tarda, Streptococcus iniae, lipopolysaccharide, and polyinosinic:polycytidylic acid, confirming their involvement in post immune responses. Copyright © 2017 Elsevier Ltd. All rights reserved.

  9. Chromosome catastrophes involve replication mechanisms generating complex genomic rearrangements

    PubMed Central

    Liu, Pengfei; Erez, Ayelet; Sreenath Nagamani, Sandesh C.; Dhar, Shweta U.; Kołodziejska, Katarzyna E.; Dharmadhikari, Avinash V.; Cooper, M. Lance; Wiszniewska, Joanna; Zhang, Feng; Withers, Marjorie A.; Bacino, Carlos A.; Campos-Acevedo, Luis Daniel; Delgado, Mauricio R.; Freedenberg, Debra; Garnica, Adolfo; Grebe, Theresa A.; Hernández-Almaguer, Dolores; Immken, LaDonna; Lalani, Seema R.; McLean, Scott D.; Northrup, Hope; Scaglia, Fernando; Strathearn, Lane; Trapane, Pamela; Kang, Sung-Hae L.; Patel, Ankita; Cheung, Sau Wai; Hastings, P. J.; Stankiewicz, Paweł; Lupski, James R.; Bi, Weimin

    2011-01-01

    SUMMARY Complex genomic rearrangements (CGR) consisting of two or more breakpoint junctions have been observed in genomic disorders. Recently, a chromosome catastrophe phenomenon termed chromothripsis, in which numerous genomic rearrangements are apparently acquired in one single catastrophic event, was described in multiple cancers. Here we show that constitutionally acquired CGRs share similarities with cancer chromothripsis. In the 17 CGR cases investigated we observed localization and multiple copy number changes including deletions, duplications and/or triplications, as well as extensive translocations and inversions. Genomic rearrangements involved varied in size and complexities; in one case, array comparative genomic hybridization revealed 18 copy number changes. Breakpoint sequencing identified characteristic features, including small templated insertions at breakpoints and microhomology at breakpoint junctions, which have been attributed to replicative processes. The resemblance between CGR and chromothripsis suggests similar mechanistic underpinnings. Such chromosome catastrophic events appear to reflect basic DNA metabolism operative throughout an organism’s life cycle. PMID:21925314

  10. Evaluation of a patient with classical Ehlers-Danlos syndrome due to a 9q34 duplication affecting COL5A1.

    PubMed

    Kuroda, Yukiko; Ohashi, Ikuko; Naruto, Takuya; Ida, Kazumi; Enomoto, Yumi; Saito, Toshiyuki; Nagai, Jun-Ichi; Kurosawa, Kenji

    2018-03-09

    Ehlers-Danlos syndrome classical type is a connective tissue disorder characterized by skin hyperextensibility, atrophic scarring, and joint hypermobility. The condition typically results from mutations in COL5A1 or COL5A2 leading to functional haploinsufficiency. Here, we report a 24-year-old male with mild intellectual disability, dysmorphic features, and a phenotype consistent with Ehlers-Danlos syndrome classical type. A copy number variant-calling algorithm from panel sequencing data identified a deletion of exons 2-11 and a duplication of exons 12-67 within COL5A1. Array comparative genomic hybridization confirmed a 94 kb deletion at 9q34.3 involving exons 2-11 of COL5A1, and a 3.4 Mb duplication at 9q34.3 involving exons 12-67 of COL5A1. © 2018 Japanese Teratology Society.

  11. Identification of HIBCH gene mutations causing autosomal recessive Leigh syndrome: a gene involved in valine metabolism.

    PubMed

    Soler-Alfonso, Claudia; Enns, Gregory M; Koenig, Mary Kay; Saavedra, Heather; Bonfante-Mejia, Eliana; Northrup, Hope

    2015-03-01

    Leigh syndrome is a progressive neurodegenerative disorder with usual onset of symptoms during the first year of life. The disorder has been associated with mutations in over 30 genes. This difficulty with genetic heterogeneity makes whole exome sequencing a more cost-effective approach for investigation of etiology. We describe an individual with typical Leigh syndrome who was found to have compound heterozygous mutations in the gene HIBCH (3-hydroxyisobutyryl coenzyme A hydrolase), an enzyme involved in the catabolism of valine. She exhibited significant clinical improvement after a valine-restricted diet. A subset of patients with uncharacterized Leigh syndrome present with specific biochemical abnormalities. This report highlights the challenges and restrictions of routine metabolic testing and features the recognition of inborn errors of metabolism as potential treatable causes of Leigh syndrome. Copyright © 2015 Elsevier Inc. All rights reserved.

  12. Domain-specific learning of grammatical structure in musical and phonological sequences.

    PubMed

    Bly, Benjamin Martin; Carrión, Ricardo E; Rasch, Björn

    2009-01-01

    Artificial grammar learning depends on acquisition of abstract structural representations rather than domain-specific representational constraints, or so many studies tell us. Using an artificial grammar task, we compared learning performance in two stimulus domains in which respondents have differing tacit prior knowledge. We found that despite grammatically identical sequence structures, learning was better for harmonically related chord sequences than for letter name sequences or harmonically unrelated chord sequences. We also found transfer effects within the musical and letter name tasks, but not across the domains. We conclude that knowledge acquired in implicit learning depends not only on abstract features of structured stimuli, but that the learning of regularities is in some respects domain-specific and strongly linked to particular features of the stimulus domain.

  13. The impact of feature selection on one and two-class classification performance for plant microRNAs.

    PubMed

    Khalifa, Waleed; Yousef, Malik; Saçar Demirci, Müşerref Duygu; Allmer, Jens

    2016-01-01

    MicroRNAs (miRNAs) are short nucleotide sequences that form a typical hairpin structure which is recognized by a complex enzyme machinery. It ultimately leads to the incorporation of 18-24 nt long mature miRNAs into RISC where they act as recognition keys to aid in regulation of target mRNAs. Determining miRNAs experimentally is an involved process and, therefore, machine learning is used to complement such endeavors. The success of machine learning mostly depends on proper input data and appropriate features for parameterization of the data. Although two-class classification (TCC) is generally used in the field, one-class classification (OCC) has been tried for pre-miRNA detection because negative examples are hard to come by. Since both positive and negative examples are currently somewhat limited, feature selection can prove to be vital for furthering the field of pre-miRNA detection. In this study, we compare the performance of OCC and TCC using eight feature selection methods and seven different plant species providing positive pre-miRNA examples. Feature selection was very successful for OCC where the best feature selection method achieved an average accuracy of 95.6%, thereby being ∼29% better than the worst method which achieved 66.9% accuracy. While the performance is comparable to TCC, which performs up to 3% better than OCC, TCC is much less affected by feature selection and its largest performance gap is ∼13% which only occurs for two of the feature selection methodologies. We conclude that feature selection is crucially important for OCC and that it can perform on par with TCC given the proper set of features.

  14. Algorithm, applications and evaluation for protein comparison by Ramanujan Fourier transform.

    PubMed

    Zhao, Jian; Wang, Jiasong; Hua, Wei; Ouyang, Pingkai

    2015-12-01

    The amino acid sequence of a protein determines its chemical properties, chain conformation and biological functions. Protein sequence comparison is of great importance to identify similarities of protein structures and infer their functions. Many properties of a protein correspond to the low-frequency signals within the sequence. Low frequency modes in protein sequences are linked to the secondary structures, membrane protein types, and sub-cellular localizations of the proteins. In this paper, we present Ramanujan Fourier transform (RFT) with a fast algorithm to analyze the low-frequency signals of protein sequences. The RFT method is applied to similarity analysis of protein sequences with the Resonant Recognition Model (RRM). The results show that the proposed fast RFT method on protein comparison is more efficient than commonly used discrete Fourier transform (DFT). RFT can detect common frequencies as significant feature for specific protein families, and the RFT spectrum heat-map of protein sequences demonstrates the information conservation in the sequence comparison. The proposed method offers a new tool for pattern recognition, feature extraction and structural analysis on protein sequences. Copyright © 2015 Elsevier Ltd. All rights reserved.
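
    The Ramanujan Fourier transform mentioned above is built on Ramanujan sums c_q(n), which are integer-valued and periodic in n with period q. The sketch below computes these sums directly from their definition and uses them in one commonly cited finite-length projection, X_q = (1 / (phi(q) N)) * sum_n x(n) c_q(n). That projection formula, the function names, and the toy input are assumptions made for illustration; this is not the fast algorithm of the paper, and no protein-to-number encoding is included.

```python
from math import cos, gcd, pi

def ramanujan_sum(q, n):
    """c_q(n): sum of cos(2*pi*k*n/q) over 1 <= k <= q with gcd(k, q) == 1."""
    total = sum(cos(2 * pi * k * n / q) for k in range(1, q + 1) if gcd(k, q) == 1)
    return round(total)  # the exact value is always an integer

def totient(q):
    """Euler's phi(q): how many 1 <= k <= q are coprime to q."""
    return sum(1 for k in range(1, q + 1) if gcd(k, q) == 1)

def rft_coefficient(signal, q):
    """Finite-length projection of a numeric signal onto c_q (assumed normalisation)."""
    n_points = len(signal)
    weighted = sum(x * ramanujan_sum(q, n) for n, x in enumerate(signal, start=1))
    return weighted / (totient(q) * n_points)

# Toy signal with period 3 (it is c_3(n) itself for n = 1..6).
toy_signal = [-1, -1, 2, -1, -1, 2]
for q in (1, 2, 3):
    print(q, rft_coefficient(toy_signal, q))
```

    On the toy signal only the q = 3 coefficient is non-zero, which shows the way a period hidden in a numeric sequence shows up as a single dominant low-order component.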

  15. SSAW: A new sequence similarity analysis method based on the stationary discrete wavelet transform.

    PubMed

    Lin, Jie; Wei, Jing; Adjeroh, Donald; Jiang, Bing-Hua; Jiang, Yue

    2018-05-02

    Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts. A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. Then, the series of complex numbers formed are transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence is turned into a feature vector with numeric values, which can then be used for clustering and/or classification. Using two different types of applications, namely, clustering and classification, we compared SSAW against state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or superior performance in terms of standard indicators, such as accuracy, F-score, precision, and recall. The running time was significantly better in most cases. These make SSAW a suitable method for sequence analysis, especially given the rapidly increasing volumes of sequence data required by most modern applications.
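
    The steps described above (k-mer extraction, mapping to complex numbers, a stationary wavelet transform, a numeric feature vector) might be sketched as follows. This is only an illustration: the nucleotide-to-complex mapping, the position weighting, the single-level Haar filter with periodic extension, and the four summary statistics are assumptions made here, not the published SSAW definitions.

```python
import numpy as np

# Hypothetical mapping of nucleotides to points on the unit circle
# (an assumption for illustration, not SSAW's published mapping).
BASE_TO_COMPLEX = {"A": 1 + 0j, "C": 0 + 1j, "G": -1 + 0j, "T": 0 - 1j}

def kmers(seq, k):
    """Return all overlapping k-mers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]

def kmer_to_complex(kmer):
    """Map a k-mer to one complex number via a position-weighted sum (illustrative).

    Raises KeyError for characters other than A, C, G, T.
    """
    return sum(BASE_TO_COMPLEX[base] / (2 ** i) for i, base in enumerate(kmer))

def haar_swt_level1(signal):
    """One level of an undecimated (stationary) Haar transform, periodic extension.

    Both outputs have the same length as the input because nothing is downsampled.
    """
    x = np.asarray(signal, dtype=complex)
    shifted = np.roll(x, -1)                 # x[n + 1], wrapping around at the end
    approx = (x + shifted) / np.sqrt(2)      # low-pass (smooth) coefficients
    detail = (x - shifted) / np.sqrt(2)      # high-pass (difference) coefficients
    return approx, detail

def feature_vector(seq, k=3):
    """Turn a sequence into a fixed-length numeric vector (four summary statistics)."""
    series = [kmer_to_complex(m) for m in kmers(seq, k)]
    approx, detail = haar_swt_level1(series)
    return np.array([
        np.abs(approx).mean(), np.abs(approx).std(),
        np.abs(detail).mean(), np.abs(detail).std(),
    ])

print(feature_vector("ACGTACGTTGCA"))
```

    Because the resulting vectors have a fixed length regardless of sequence length, two sequences can be compared with any ordinary vector distance, which is what makes this style of method usable for clustering and classification without alignment.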

  16. Prediction of phenotypes of missense mutations in human proteins from biological assemblies.

    PubMed

    Wei, Qiong; Xu, Qifang; Dunbrack, Roland L

    2013-02-01

    Single nucleotide polymorphisms (SNPs) are the most frequent variation in the human genome. Nonsynonymous SNPs that lead to missense mutations can be neutral or deleterious, and several computational methods have been presented that predict the phenotype of human missense mutations. These methods use sequence-based and structure-based features in various combinations, relying on different statistical distributions of these features for deleterious and neutral mutations. One structure-based feature that has not been studied significantly is the accessible surface area within biologically relevant oligomeric assemblies. These assemblies are different from the crystallographic asymmetric unit for more than half of X-ray crystal structures. We find that mutations in the core of proteins or in the interfaces in biological assemblies are significantly more likely to be disease-associated than those on the surface of the biological assemblies. For structures with more than one protein in the biological assembly (whether the same sequence or different), we find the accessible surface area from biological assemblies provides a statistically significant improvement in prediction over the accessible surface area of monomers from protein crystal structures (P = 6e-5). When adding this information to sequence-based features such as the difference between wildtype and mutant position-specific profile scores, the improvement from biological assemblies is statistically significant but much smaller (P = 0.018). Combining this information with sequence-based features in a support vector machine leads to 82% accuracy on a balanced dataset of 50% disease-associated mutations from SwissVar and 50% neutral mutations from human/primate sequence differences in orthologous proteins. Copyright © 2012 Wiley Periodicals, Inc.

  17. Similarity as an organising principle in short-term memory.

    PubMed

    LeCompte, D C; Watkins, M J

    1993-03-01

    The role of stimulus similarity as an organising principle in short-term memory was explored in a series of seven experiments. Each experiment involved the presentation of a short sequence of items that were drawn from two distinct physical classes and arranged such that item class changed after every second item. Following presentation, one item was re-presented as a probe for the 'target' item that had directly followed it in the sequence. Memory for the sequence was considered organised by class if probability of recall was higher when the probe and target were from the same class than when they were from different classes. Such organisation was found when one class was auditory and the other was visual (spoken vs. written words, and sounds vs. pictures). It was also found when both classes were auditory (words spoken in a male voice vs. words spoken in a female voice) and when both classes were visual (digits shown in one location vs. digits shown in another). It is concluded that short-term memory can be organised on the basis of sensory modality and on the basis of certain features within both the auditory and visual modalities.

  18. Towards pathogenomics: a web-based resource for pathogenicity islands

    PubMed Central

    Yoon, Sung Ho; Park, Young-Kyu; Lee, Soohyun; Choi, Doil; Oh, Tae Kwang; Hur, Cheol-Goo; Kim, Jihyun F.

    2007-01-01

    Pathogenicity islands (PAIs) are genetic elements whose products are essential to the process of disease development. They have been horizontally (laterally) transferred from other microbes and are important in evolution of pathogenesis. In this study, a comprehensive database and search engines specialized for PAIs were established. The pathogenicity island database (PAIDB) is a comprehensive relational database of all the reported PAIs and potential PAI regions which were predicted by a method that combines feature-based analysis and similarity-based analysis. Also, using the PAI Finder search application, a multi-sequence query can be analyzed onsite for the presence of potential PAIs. As of April 2006, PAIDB contains 112 types of PAIs and 889 GenBank accessions containing either partial or all PAI loci previously reported in the literature, which are present in 497 strains of pathogenic bacteria. The database also offers 310 candidate PAIs predicted from 118 sequenced prokaryotic genomes. With the increasing number of prokaryotic genomes without functional inference and sequenced genetic regions of suspected involvement in diseases, this web-based, user-friendly resource has the potential to be of significant use in pathogenomics. PAIDB is freely accessible at . PMID:17090594

  19. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing

    PubMed Central

    Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li

    2010-01-01

    Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome. PMID:20392818

  20. Survey of the transcriptome of Aspergillus oryzae via massively parallel mRNA sequencing.

    PubMed

    Wang, Bin; Guo, Guangwu; Wang, Chao; Lin, Ying; Wang, Xiaoning; Zhao, Mouming; Guo, Yong; He, Minghui; Zhang, Yong; Pan, Li

    2010-08-01

    Aspergillus oryzae, an important filamentous fungus used in food fermentation and the enzyme industry, has been shown through genome sequencing and various other tools to have prominent features in its genomic composition. However, the functional complexity of the A. oryzae transcriptome has not yet been fully elucidated. Here, we applied direct high-throughput paired-end RNA-sequencing (RNA-Seq) to the transcriptome of A. oryzae under four different culture conditions. With the high resolution and sensitivity afforded by RNA-Seq, we were able to identify a substantial number of novel transcripts, new exons, untranslated regions, alternative upstream initiation codons and upstream open reading frames, which provide remarkable insight into the A. oryzae transcriptome. We were also able to assess the alternative mRNA isoforms in A. oryzae and found a large number of genes undergoing alternative splicing. Many genes and pathways that might be involved in higher levels of protein production in solid-state culture than in liquid culture were identified by comparing gene expression levels between different cultures. Our analysis indicated that the transcriptome of A. oryzae is much more complex than previously anticipated, and these results may provide a blueprint for further study of the A. oryzae transcriptome.

  1. Exploring the Sequence-based Prediction of Folding Initiation Sites in Proteins.

    PubMed

    Raimondi, Daniele; Orlando, Gabriele; Pancsa, Rita; Khan, Taushif; Vranken, Wim F

    2017-08-18

    Protein folding is a complex process that can lead to disease when it fails. Especially poorly understood are the very early stages of protein folding, which are likely defined by intrinsic local interactions between amino acids close to each other in the protein sequence. We here present EFoldMine, a method that predicts, from the primary amino acid sequence of a protein, which amino acids are likely involved in early folding events. The method is based on early folding data from hydrogen deuterium exchange (HDX) data from NMR pulsed labelling experiments, and uses backbone and sidechain dynamics as well as secondary structure propensities as features. The EFoldMine predictions give insights into the folding process, as illustrated by a qualitative comparison with independent experimental observations. Furthermore, on a quantitative proteome scale, the predicted early folding residues tend to become the residues that interact the most in the folded structure, and they are often residues that display evolutionary covariation. The connection of the EFoldMine predictions with both folding pathway data and the folded protein structure suggests that the initial statistical behavior of the protein chain with respect to local structure formation has a lasting effect on its subsequent states.

  2. Molecular Characterization of a Catalase from Hydra vulgaris

    PubMed Central

    Dash, Bhagirathi; Phillips, Timothy D.

    2012-01-01

    Catalase, an antioxidant and hydroperoxidase enzyme, protects the cellular environment from harmful effects of hydrogen peroxide by facilitating its degradation to oxygen and water. Molecular information on a cnidarian catalase and/or peroxidase is, however, limited. In this work an apparent full length cDNA sequence coding for a catalase (HvCatalase) was isolated from Hydra vulgaris using 3’- and 5’- (RLM) RACE approaches. The 1859 bp HvCatalase cDNA included an open reading frame of 1518 bp encoding a putative protein of 505 amino acids with a predicted molecular mass of 57.44 kDa. The deduced amino acid sequence of HvCatalase contained several highly conserved motifs including the heme-ligand signature sequence RLFSYGDTH and the active site signature FXRERIPERVVHAKGXGA. A comparative analysis showed the presence of conserved catalytic amino acids [His(71), Asn(145), and Tyr(354)] in HvCatalase as well. Homology modeling indicated the presence of the conserved features of the mammalian catalase fold. Hydrae exposed to thermal, starvation, metal and oxidative stress responded by regulating their catalase mRNA transcription. These results indicated that the HvCatalase gene is involved in the cellular stress response and (anti)oxidative processes triggered by stressor and contaminant exposure. PMID:22521743

  3. Characterization of the flgG operon of Rhodobacter sphaeroides WS8 and its role in flagellum biosynthesis.

    PubMed

    González-Pedrajo, Bertha; de la Mora, Javier; Ballado, Teresa; Camarena, Laura; Dreyfus, Georges

    2002-11-13

    In this work, we show evidence regarding the functionality of a large cluster of flagellar genes in Rhodobacter sphaeroides. The genes of this cluster, flgGHIJKL and orf-1, are mainly involved in the formation of the basal body, and flgK and flgL encode the hook-associated proteins HAP1 and HAP3. In general, these genes showed good similarity to those reported for Salmonella enterica. However, flgJ and flgK showed particular features that make them unique among the flagellar sequences already reported. flgJ is only a third of the size reported for flgJ from Salmonella, whereas flgK is about three times larger than any other flgK sequence previously known. Our results indicate that both genes are functional, and their products are essential for flagellar assembly. In contrast, the interruption of orf-1 did not affect motility, suggesting that this sequence, if functional, is not indispensable for flagellar assembly. Finally, we present genetic evidence suggesting that the flgGHIJKL genes are expressed as a single transcriptional unit depending on the sigma-54 factor.

  4. Novel rare variations of the oxytocin receptor (OXTR) gene in autism spectrum disorder individuals.

    PubMed

    Liu, Xiaoxi; Kawashima, Minae; Miyagawa, Taku; Otowa, Takeshi; Latt, Khun Zaw; Thiri, Myo; Nishida, Hisami; Sugiyama, Toshiro; Tsurusaki, Yoshinori; Matsumoto, Naomichi; Mabuchi, Akihiko; Tokunaga, Katsushi; Sasaki, Tsukasa

    2015-01-01

    The oxytocin receptor (OXTR) gene has been implicated as a risk gene for autism spectrum disorder (ASD), a neurodevelopmental disorder with essential features of impairments in social communication and reciprocal interaction. The genetic associations between common variations in OXTR and ASD have been reported in multiple ethnic populations. However, little is known about the distribution of rare variations within OXTR in ASD patients. In this study, we resequenced the full length of OXTR in 105 ASD individuals using an approach that combined the power of next-generation sequencing technology, long-range PCR and DNA pooling. We demonstrated that rare variants with minor allele frequency as low as 0.05% could be reliably detected by our method. We identified 28 novel variants including potential functional variants in the intron region and one rare missense variant (R150S). We subsequently performed Sanger sequencing and validated five novel variants located in previously suggested candidate regions in ASD individuals. Further sequencing of 312 healthy subjects showed that the burden of rare variants is significantly higher in ASD individuals than in healthy individuals. Our results support the hypothesis that rare variation in the OXTR gene might be involved in ASD.

  5. The role of consolidation in learning context-dependent phonotactic patterns in speech and digital sequence production.

    PubMed

    Anderson, Nathaniel D; Dell, Gary S

    2018-04-03

    Speakers implicitly learn novel phonotactic patterns by producing strings of syllables. The learning is revealed in their speech errors. First-order patterns, such as "/f/ must be a syllable onset," can be distinguished from contingent, or second-order, patterns, such as "/f/ must be an onset if the vowel is /a/, but a coda if the vowel is /o/." A meta-analysis of 19 experiments clearly demonstrated that first-order patterns affect speech errors to a very great extent in a single experimental session, but second-order vowel-contingent patterns only affect errors on the second day of testing, suggesting the need for a consolidation period. Two experiments tested an analogue to these studies involving sequences of button pushes, with fingers as "consonants" and thumbs as "vowels." The button-push errors revealed two of the key speech-error findings: first-order patterns are learned quickly, but second-order thumb-contingent patterns are only strongly revealed in the errors on the second day of testing. The influence of computational complexity on the implicit learning of phonotactic patterns in speech production may be a general feature of sequence production.

  6. Predicting DNA hybridization kinetics from sequence

    NASA Astrophysics Data System (ADS)

    Zhang, Jinny X.; Fang, John Z.; Duan, Wei; Wu, Lucia R.; Zhang, Angela W.; Dalchau, Neil; Yordanov, Boyan; Petersen, Rasmus; Phillips, Andrew; Zhang, David Yu

    2018-01-01

    Hybridization is a key molecular process in biology and biotechnology, but so far there is no predictive model for accurately determining hybridization rate constants based on sequence information. Here, we report a weighted neighbour voting (WNV) prediction algorithm, in which the hybridization rate constant of an unknown sequence is predicted based on its similarity to reactions with known rate constants. To construct this algorithm we first performed 210 fluorescence kinetics experiments to observe the hybridization kinetics of 100 different DNA target and probe pairs (36 nt sub-sequences of the CYCS and VEGF genes) at temperatures ranging from 28 to 55 °C. Automated feature selection and weighting optimization resulted in a final six-feature WNV model, which can predict hybridization rate constants of new sequences to within a factor of 3 with ∼91% accuracy, based on leave-one-out cross-validation. Accurate prediction of hybridization kinetics allows the design of efficient probe sequences for genomics research.
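
    The abstract above does not spell out how the neighbour votes are weighted, so the following is only a minimal sketch of the general idea, under stated assumptions: each known reaction is a tuple of numeric features plus a log10 rate constant, distance is a feature-weighted Euclidean distance, and votes decay exponentially with distance. The function names, the `scale` parameter and the exponential kernel are hypothetical and are not taken from the paper.

```python
import math

def wnv_predict(query, known, feature_weights, scale=1.0):
    """Predict a log10 rate constant as a distance-weighted average.

    query: tuple of feature values for the new sequence (hypothetical features).
    known: list of (features, log10_rate_constant) pairs.
    feature_weights: one non-negative weight per feature (assumed form).
    """
    numerator = 0.0
    denominator = 0.0
    for features, log_k in known:
        # Feature-weighted Euclidean distance between query and neighbour.
        distance = math.sqrt(sum(
            w * (q - f) ** 2
            for w, q, f in zip(feature_weights, query, features)
        ))
        # Closer neighbours cast larger votes (assumed exponential kernel).
        vote = math.exp(-distance / scale)
        numerator += vote * log_k
        denominator += vote
    return numerator / denominator

def loo_fraction_within_factor(known, feature_weights, factor=3.0, scale=1.0):
    """Leave-one-out: fraction of reactions predicted within `factor`."""
    tolerance = math.log10(factor)
    hits = 0
    for i, (features, log_k) in enumerate(known):
        rest = known[:i] + known[i + 1:]  # hold out reaction i
        predicted = wnv_predict(features, rest, feature_weights, scale)
        if abs(predicted - log_k) <= tolerance:
            hits += 1
    return hits / len(known)
```

    Working in log10 units makes "within a factor of 3" a simple absolute tolerance of log10(3); the feature set and weights would have to come from a separate selection and optimization step, which this sketch does not attempt.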

  7. Genome Sequence of Candidatus Nitrososphaera evergladensis from Group I.1b Enriched from Everglades Soil Reveals Novel Genomic Features of the Ammonia-Oxidizing Archaea

    PubMed Central

    Zhalnina, Kateryna V.; Dias, Raquel; Leonard, Michael T.; Dorr de Quadros, Patricia; Camargo, Flavio A. O.; Drew, Jennifer C.; Farmerie, William G.; Daroub, Samira H.; Triplett, Eric W.

    2014-01-01

    The activity of ammonia-oxidizing archaea (AOA) leads to the loss of nitrogen from soil, pollution of water sources and elevated emissions of greenhouse gas. To date, eight AOA genomes are available in the public databases; seven are from the group I.1a of the Thaumarchaeota and only one is from the group I.1b, isolated from hot springs. Many soils are dominated by AOA from the group I.1b, but the genomes of soil representatives of this group have not been sequenced and functionally characterized. The lack of knowledge of metabolic pathways of soil AOA presents a critical gap in understanding their role in biogeochemical cycles. Here, we describe the first complete genome of soil archaeon Candidatus Nitrososphaera evergladensis, which has been reconstructed from metagenomic sequencing of a highly enriched culture obtained from an agricultural soil. The AOA enrichment was sequenced with the high throughput next generation sequencing platforms from Pacific Biosciences and Ion Torrent. The de novo assembly of sequences resulted in one 2.95 Mb contig. Annotation of the reconstructed genome revealed many similarities of the basic metabolism with the rest of sequenced AOA. Ca. N. evergladensis belongs to the group I.1b and shares only 40% of whole-genome homology with the closest sequenced relative Ca. N. gargensis. Detailed analysis of the genome revealed coding sequences that were completely absent from the group I.1a. These unique sequences code for proteins involved in control of DNA integrity, transporters, two-component systems and a versatile CRISPR defense system. Notably, genomes from the group I.1b have more gene duplications compared to the genomes from the group I.1a. We suggest that the presence of these unique genes and gene duplications may be associated with the environmental versatility of this group. PMID:24999826

  8. Unlinking the methylome pattern from nucleotide sequence, revealed by large-scale in vivo genome engineering and methylome editing in medaka fish

    PubMed Central

    Nakamura, Ryohei; Uno, Ayako; Kumagai, Masahiko; Fukushima, Hiroto S.; Morishita, Shinichi; Takeda, Hiroyuki

    2017-01-01

    The heavily methylated vertebrate genomes are punctuated by stretches of poorly methylated DNA sequences that usually mark gene regulatory regions. It is known that the methylation state of these regions confers transcriptional control over their associated genes. Given its governance on the transcriptome, cellular functions and identity, genome-wide DNA methylation pattern is tightly regulated and evidently predefined. However, how the methylation pattern is determined in vivo remains enigmatic. Based on in silico and in vitro evidence, recent studies proposed that the regional hypomethylated state is primarily determined by local DNA sequence, e.g., high CpG density and presence of specific transcription factor binding sites. Nonetheless, the dependency of DNA methylation on nucleotide sequence has not been carefully validated in vertebrates in vivo. Herein, with the use of medaka (Oryzias latipes) as a model, the sequence dependency of DNA methylation was intensively tested in vivo. Our statistical modeling confirmed the strong statistical association between nucleotide sequence pattern and methylation state in the medaka genome. However, by manipulating the methylation state of a number of genomic sequences and reintegrating them into medaka embryos, we demonstrated that artificially conferred DNA methylation states were predominantly and robustly maintained in vivo, regardless of their sequences and endogenous states. This feature was also observed in the medaka transgene that had passed across generations. Thus, despite the observed statistical association, nucleotide sequence was unable to autonomously determine its own methylation state in medaka in vivo. Our results apparently argue against the notion of the governance on the DNA methylation by nucleotide sequence, but instead suggest the involvement of other epigenetic factors in defining and maintaining the DNA methylation landscape. Further investigation in other vertebrate models in vivo will be needed for the generalization of our observations made in medaka. PMID:29267279

  9. An Evolutionary Machine Learning Framework for Big Data Sequence Mining

    ERIC Educational Resources Information Center

    Kamath, Uday Krishna

    2014-01-01

    Sequence classification is an important problem in many real-world applications. Unlike other machine learning data, there are no "explicit" features or signals in sequence data that can help traditional machine learning algorithms learn and predict from the data. Sequence data exhibits inter-relationships in the elements that are…

  10. Sequencing analysis of 20,000 full-length cDNA clones from cassava reveals lineage specific expansions in gene families related to stress response

    PubMed Central

    Sakurai, Tetsuya; Plata, Germán; Rodríguez-Zapata, Fausto; Seki, Motoaki; Salcedo, Andrés; Toyoda, Atsushi; Ishiwata, Atsushi; Tohme, Joe; Sakaki, Yoshiyuki; Shinozaki, Kazuo; Ishitani, Manabu

    2007-01-01

    Background: Cassava, an allotetraploid known for its remarkable tolerance to abiotic stresses, is an important source of energy for humans and animals and a raw material for many industrial processes. A full-length cDNA library of cassava plants under normal, heat, drought, aluminum and post-harvest physiological deterioration conditions was built; 19968 clones were sequence-characterized using expressed sequence tags (ESTs). Results: The ESTs were assembled into 6355 contigs and 9026 singletons that were further grouped into 10577 scaffolds; we found 4621 new cassava sequences and 1521 sequences with no significant similarity to plant protein databases. Transcripts of 7796 distinct genes were captured and we were able to assign a functional classification to 78% of them while finding more than half of the enzymes annotated in metabolic pathways in Arabidopsis. The annotation of sequences that were not paired to transcripts of other species included many stress-related functional categories showing that our library is enriched with stress-induced genes. Finally, we detected 230 putative gene duplications that include key enzymes in reactive oxygen species signaling pathways and could play a role in cassava stress response features. Conclusion: The cassava full-length cDNA library presented here contains transcripts of genes involved in stress response as well as genes important for different areas of cassava research. This library will be an important resource for gene discovery, characterization and cloning; in the near future it will aid the annotation of the cassava genome. PMID:18096061

  11. An Imaging And Graphics Workstation For Image Sequence Analysis

    NASA Astrophysics Data System (ADS)

    Mostafavi, Hassan

    1990-01-01

    This paper describes an application-specific engineering workstation designed and developed to analyze imagery sequences from a variety of sources. The system combines the software and hardware environment of modern graphics-oriented workstations with digital image acquisition, processing and display techniques. The objective is to achieve automation and high throughput for many data reduction tasks involving metric studies of image sequences. The applications of such an automated data reduction tool include analysis of the trajectory and attitude of aircraft, missile, stores and other flying objects in various flight regimes including launch and separation as well as regular flight maneuvers. The workstation can also be used in an on-line or off-line mode to study three-dimensional motion of aircraft models in simulated flight conditions such as wind tunnels. The system's key features are: 1) acquisition and storage of image sequences by digitizing real-time video or frames from a film strip; 2) computer-controlled movie loop playback, slow motion and freeze frame display combined with digital image sharpening, noise reduction, contrast enhancement and interactive image magnification; 3) multiple leading edge tracking in addition to object centroids at up to 60 fields per second from either live input video or a stored image sequence; 4) automatic and manual field-of-view and spatial calibration; 5) image sequence database generation and management, including the measurement data products; 6) off-line analysis software for trajectory plotting and statistical analysis; 7) model-based estimation and tracking of object attitude angles; and 8) interface to a variety of video players and film transport sub-systems.

  12. 77 FR 28541 - Request for Comments on the Recommendation for the Disclosure of Sequence Listings Using XML...

    Federal Register 2010, 2011, 2012, 2013, 2014

    2012-05-15

    ... (EPO) as the lead, to propose a revised standard for the filing of nucleotide and/or amino acid.... ST.25 uses a controlled vocabulary of feature keys to describe nucleic acid and amino acid sequences... patent data purposes. The XML standard also includes four qualifiers for amino acids. These feature keys...

  13. Optimized approach for Ion Proton RNA sequencing reveals details of RNA splicing and editing features of the transcriptome.

    PubMed

    Brown, Roger B; Madrid, Nathaniel J; Suzuki, Hideaki; Ness, Scott A

    2017-01-01

    RNA-sequencing (RNA-seq) has become the standard method for unbiased analysis of gene expression but also provides access to more complex transcriptome features, including alternative RNA splicing, RNA editing, and even detection of fusion transcripts formed through chromosomal translocations. However, differences in library methods can adversely affect the ability to recover these different types of transcriptome data. For example, some methods have bias for one end of transcripts or rely on low-efficiency steps that limit the complexity of the resulting library, making detection of rare transcripts less likely. We tested several commonly used methods of RNA-seq library preparation and found vast differences in the detection of advanced transcriptome features, such as alternatively spliced isoforms and RNA editing sites. By comparing several different protocols available for the Ion Proton sequencer and by utilizing detailed bioinformatics analysis tools, we were able to develop an optimized random primer based RNA-seq technique that is reliable at uncovering rare transcript isoforms and RNA editing features, as well as fusion reads from oncogenic chromosome rearrangements. The combination of optimized libraries and rapid Ion Proton sequencing provides a powerful platform for the transcriptome analysis of research and clinical samples.

  14. Proteins without unique 3D structures: biotechnological applications of intrinsically unstable/disordered proteins.

    PubMed

    Uversky, Vladimir N

    2015-03-01

    Intrinsically disordered proteins (IDPs) and intrinsically disordered protein regions (IDPRs) are functional proteins or regions that do not have unique 3D structures under functional conditions. Therefore, from the viewpoint of their lack of stable 3D structure, IDPs/IDPRs are inherently unstable. Just as the structure and function of normal ordered globular proteins are determined by their amino acid sequences, the lack of unique 3D structure in IDPs/IDPRs and their disorder-based functionality are also encoded in the amino acid sequences. Because of their specific sequence features and distinctive conformational behavior, these intrinsically unstable proteins or regions have several applications in biotechnology. This review introduces some of the most characteristic features of IDPs/IDPRs (such as peculiarities of amino acid sequences of these proteins and regions, their major structural features, and peculiar responses to changes in their environment) and describes how these features can be used in biotechnology, for example for the proteome-wide analysis of the abundance of extended IDPs, for recombinant protein isolation and purification, as polypeptide nanoparticles for drug delivery, as solubilization tools, and as thermally sensitive carriers of active peptides and proteins. Copyright © 2014 WILEY-VCH Verlag GmbH & Co. KGaA, Weinheim.

  15. Transitions in Lava Emplacement Recorded in the Deccan Traps Sequence (India)

    NASA Astrophysics Data System (ADS)

    Vanderkluysen, L.; Self, S.; Jay, A. E.; Sheth, H. C.; Clarke, A. B.

    2015-12-01

    Transitions in the style of lava flow emplacement are recognized in the stratigraphic sequence of several mafic large igneous provinces (LIPs), including the Etendeka (Namibia), the Faeroe Islands (North Atlantic LIP), the Ethiopian Traps, and the Deccan Traps (India). These transitions, from units dominated by meter-sized pāhoehoe toes and lobes to those dominated by inflated sheet lobes tens to hundreds of meters in width and meters to tens of meters in height, seem to be a fundamental feature of LIP emplacement. In the Deccan, this volcanological transition is thought to coincide with deeper changes to the volcano-magmatic system expressed, notably, in the trace element and isotopic signature of erupted flows. We investigated this transition in the Deccan Traps by logging eight sequences along the Western Ghats, an escarpment in western India where the Deccan province is thickest and best exposed. The Deccan province, which once covered ~1 million km² of west-central India, is subdivided into eleven chemo-stratigraphic formations in the type sections of the Western Ghats. Where the lower Deccan formations are exposed, we found that as much as 65% of the exposed thickness (below the Khandala Formation) is made up of sheet lobes, from 40% in the Bhimashankar Formation to 75% in the Thakurvadi Formation. Near the bottom of the sequence, 25% of the Neral Formation is composed of sheet lobes ≥15 m in thickness. On this basis, the traditional view that inflated sheet lobes are an exclusive feature of the upper part of the stratigraphy must be challenged. Several mechanisms have been proposed to explain the development of compound flows and inflated sheet lobes, involving one or more of the following factors: underlying slope, varying effusion rate, and source geometry. Analogue experiments are currently under way to test the relative influence of each of these factors in the development of different lava flow morphologies in LIPs.

  16. Autofluorescence microscopy for paired-matched morphological and molecular identification of individual chigger mites (Acari: Trombiculidae), the vectors of scrub typhus.

    PubMed

    Kumlert, Rawadee; Chaisiri, Kittipong; Anantatat, Tippawan; Stekolnikov, Alexandr A; Morand, Serge; Prasartvit, Anchana; Makepeace, Benjamin L; Sungvornyothin, Sungsit; Paris, Daniel H

    2018-01-01

    Conventional gold standard characterization of chigger mites involves chemical preparation procedures (i.e. specimen clearing) for visualization of morphological features, which however contributes to destruction of the arthropod host DNA and any endosymbiont or pathogen DNA harbored within the specimen. In this study, a novel work flow based on autofluorescence microscopy was developed to enable identification of trombiculid mites to the species level on the basis of morphological traits without any special preparation, while preserving the mite DNA for subsequent genotyping. A panel of 16 specifically selected fluorescence microscopy images of mite features from available identification keys served for complete chigger morphological identification to the species level, and was paired with corresponding genotype data. We evaluated and validated this method for paired chigger morphological and genotypic ID using the mitochondrial cytochrome c oxidase subunit I gene (coi) in 113 chigger specimens representing 12 species and 7 genera (Leptotrombidium, Ascoschoengastia, Gahrliepia, Walchia, Blankaartia, Schoengastia and Schoutedenichia) from the Lao People's Democratic Republic (Lao PDR) to the species level (complete characterization), and 153 chiggers from 5 genera (Leptotrombidium, Ascoschoengastia, Helenicula, Schoengastiella and Walchia) from Thailand, Cambodia and Lao PDR to the genus level. A phylogenetic tree constructed from 77 coi gene sequences (approximately 640 bp length, n = 52 new coi sequences and n = 25 downloaded from GenBank), demonstrated clear grouping of assigned morphotypes at the genus levels, although evidence of both genetic polymorphism and morphological plasticity was found. With this new methodology, we provided the largest collection of characterized coi gene sequences for trombiculid mites to date, and almost doubled the number of available characterized coi gene sequences with a single study. The ability to provide paired phenotypic-genotypic data is of central importance for future characterization of mites and dissecting the molecular epidemiology of mites transmitting diseases like scrub typhus.

  17. Autofluorescence microscopy for paired-matched morphological and molecular identification of individual chigger mites (Acari: Trombiculidae), the vectors of scrub typhus

    PubMed Central

    Chaisiri, Kittipong; Anantatat, Tippawan; Stekolnikov, Alexandr A.; Morand, Serge; Prasartvit, Anchana; Makepeace, Benjamin L.; Sungvornyothin, Sungsit; Paris, Daniel H.

    2018-01-01

    Background: Conventional gold standard characterization of chigger mites involves chemical preparation procedures (i.e. specimen clearing) for visualization of morphological features, which however contributes to destruction of the arthropod host DNA and any endosymbiont or pathogen DNA harbored within the specimen. Methodology/Principal findings: In this study, a novel work flow based on autofluorescence microscopy was developed to enable identification of trombiculid mites to the species level on the basis of morphological traits without any special preparation, while preserving the mite DNA for subsequent genotyping. A panel of 16 specifically selected fluorescence microscopy images of mite features from available identification keys served for complete chigger morphological identification to the species level, and was paired with corresponding genotype data. We evaluated and validated this method for paired chigger morphological and genotypic ID using the mitochondrial cytochrome c oxidase subunit I gene (coi) in 113 chigger specimens representing 12 species and 7 genera (Leptotrombidium, Ascoschoengastia, Gahrliepia, Walchia, Blankaartia, Schoengastia and Schoutedenichia) from the Lao People’s Democratic Republic (Lao PDR) to the species level (complete characterization), and 153 chiggers from 5 genera (Leptotrombidium, Ascoschoengastia, Helenicula, Schoengastiella and Walchia) from Thailand, Cambodia and Lao PDR to the genus level. A phylogenetic tree constructed from 77 coi gene sequences (approximately 640 bp length, n = 52 new coi sequences and n = 25 downloaded from GenBank), demonstrated clear grouping of assigned morphotypes at the genus levels, although evidence of both genetic polymorphism and morphological plasticity was found. Conclusions/Significance: With this new methodology, we provided the largest collection of characterized coi gene sequences for trombiculid mites to date, and almost doubled the number of available characterized coi gene sequences with a single study. The ability to provide paired phenotypic-genotypic data is of central importance for future characterization of mites and dissecting the molecular epidemiology of mites transmitting diseases like scrub typhus. PMID:29494599

  18. Complete Genome Sequence of the Broad-Host-Range Vibriophage KVP40: Comparative Genomics of a T4-Related Bacteriophage

    PubMed Central

    Miller, Eric S.; Heidelberg, John F.; Eisen, Jonathan A.; Nelson, William C.; Durkin, A. Scott; Ciecko, Ann; Feldblyum, Tamara V.; White, Owen; Paulsen, Ian T.; Nierman, William C.; Lee, Jong; Szczypinski, Bridget; Fraser, Claire M.

    2003-01-01

    The complete genome sequence of the T4-like, broad-host-range vibriophage KVP40 has been determined. The genome sequence is 244,835 bp, with an overall G+C content of 42.6%. It encodes 386 putative protein-encoding open reading frames (CDSs), 30 tRNAs, 33 T4-like late promoters, and 57 potential rho-independent terminators. Overall, 92.1% of the KVP40 genome is coding, with an average CDS size of 587 bp. While 65% of the CDSs were unique to KVP40 and had no known function, the genome sequence and organization show specific regions of extensive conservation with phage T4. At least 99 KVP40 CDSs have homologs in the T4 genome (Blast alignments of 45 to 68% amino acid similarity). The shared CDSs represent 36% of all T4 CDSs but only 26% of those from KVP40. There is extensive representation of the DNA replication, recombination, and repair enzymes as well as the viral capsid and tail structural genes. KVP40 lacks several T4 enzymes involved in host DNA degradation, appears not to synthesize the modified cytosine (hydroxymethyl glucose) present in T-even phages, and lacks group I introns. KVP40 likely utilizes the T4-type sigma-55 late transcription apparatus, but features of early- or middle-mode transcription were not identified. There are 26 CDSs that have no viral homolog, and many did not necessarily originate from Vibrio spp., suggesting an even broader host range for KVP40. From these latter CDSs, an NAD salvage pathway was inferred that appears to be unique among bacteriophages. Features of the KVP40 genome that distinguish it from T4 are presented, as well as those, such as the replication and virion gene clusters, that are substantially conserved. PMID:12923095
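
    The percentages quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below is only an illustration: the variable names are made up here, the coding-density estimate ignores overlapping CDSs and the rounding of the mean CDS size, and the T4 CDS count is inferred from the 36% figure rather than quoted in the abstract.

```python
# Figures quoted in the KVP40 abstract above.
GENOME_BP = 244_835        # genome length in bp
N_CDS = 386                # putative protein-coding sequences
MEAN_CDS_BP = 587          # reported average CDS size in bp
SHARED_CDS = 99            # KVP40 CDSs with homologs in T4 ("at least 99")
T4_SHARED_FRACTION = 0.36  # share of all T4 CDSs that the homologs represent


def percent(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Share of KVP40 CDSs that have a T4 homolog (reported as 26%).
shared_pct_kvp40 = percent(SHARED_CDS, N_CDS)

# Rough coding density from count x mean size. This ignores overlaps and
# rounding of the mean, so it only approximates the reported 92.1%.
approx_coding_pct = percent(N_CDS * MEAN_CDS_BP, GENOME_BP)

# T4 CDS count implied by the 36% figure; an inference, not a quoted number.
implied_t4_cds = SHARED_CDS / T4_SHARED_FRACTION

print(f"{shared_pct_kvp40:.1f}% of KVP40 CDSs shared with T4")
print(f"approximate coding density: {approx_coding_pct:.1f}%")
print(f"implied T4 CDS count: about {implied_t4_cds:.0f}")
```

    The same 99 shared CDSs give two different percentages because they are divided by two different totals: the larger KVP40 gene set and the smaller T4 gene set.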

  19. Sequence of contactin, a 130-kD glycoprotein concentrated in areas of interneuronal contact, defines a new member of the immunoglobulin supergene family in the nervous system

    PubMed Central

    1988-01-01

    The primary amino acid sequence of contactin, a neuronal cell surface glycoprotein of 130 kD that is isolated in association with components of the cytoskeleton (Ranscht, B., D. J. Moss, and C. Thomas. 1984. J. Cell Biol. 99:1803-1813), was deduced from the nucleotide sequence of cDNA clones and is reported here. The cDNA sequence contains an open reading frame for a 1,071-amino acid transmembrane protein with 962 extracellular and 89 cytoplasmic amino acids. In its extracellular portion, the polypeptide features six type 1 and two type 2 repeats. The six amino-terminal type 1 repeats (I-VI) each consist of 81-99 amino acids and contain two cysteine residues that are in the right context to form globular domains as described for molecules with immunoglobulin structure. Within the proposed globular region, contactin shares 31% identical amino acids with the neural cell adhesion molecule NCAM. The two type 2 repeats (I-II) are each composed of 100 amino acids and lack cysteine residues. They are 20-31% identical to fibronectin type III repeats. Both the structural similarity of contactin to molecules of the immunoglobulin supergene family, in particular the amino acid sequence resemblance to NCAM, and its relationship to fibronectin indicate that contactin could be involved in some aspect of cellular adhesion. This suggestion is further strengthened by its localization in neuropil containing axon fascicles and synapses. PMID:3049624

  20. Mhc class II B gene evolution in East African cichlid fishes.

    PubMed

    Figueroa, F; Mayer, W E; Sültmann, H; O'hUigin, C; Tichy, H; Satta, Y; Takezaki, N; Takahata, N; Klein, J

    2000-06-01

    A distinctive feature of essential major histocompatibility complex (Mhc) loci is their polymorphism characterized by large genetic distances between alleles and long persistence times of allelic lineages. Since the lineages often span several successive speciations, we investigated the behavior of the Mhc alleles during or close to the speciation phase. We sequenced exon 2 of the class II B locus 4 from 232 East African cichlid fishes representing 32 related species. The divergence times of the (sub)species ranged from 6,000 to 8.4 million years. Two types of evolutionary analysis were used to elucidate the pattern of exon 2 sequence divergence. First, phylogenetic methods were applied to reconstruct the most likely evolutionary pathways leading from the last common ancestor of the set to the extant sequences, and to assess the probable mechanisms involved in allelic diversification. Second, pairwise comparisons of sequences were carried out to detect differences seemingly incompatible with origin by nonparallel point mutations. The analysis revealed point mutations to be the most important mechanism behind allelic divergences, with recombination playing only an auxiliary part. Comparison of sequences from related species revealed evidence of random allelic (lineage) losses apparently associated with speciation. Sharing of identical alleles could be demonstrated between species that diverged 2 million years ago. The phylogeny of the exon was incongruent with that of the flanking introns, indicating either a high degree of convergent evolution at the peptide-binding region-encoding sites, or intron homogenization.

  1. Insertion Sequences

    PubMed Central

    Mahillon, Jacques; Chandler, Michael

    1998-01-01

    Insertion sequences (ISs) constitute an important component of most bacterial genomes. Over 500 individual ISs have been described in the literature to date, and many more are being discovered in the ongoing prokaryotic and eukaryotic genome-sequencing projects. The last 10 years have also seen some striking advances in our understanding of the transposition process itself. Not least of these has been the development of various in vitro transposition systems for both prokaryotic and eukaryotic elements and, for several of these, a detailed understanding of the transposition process at the chemical level. This review presents a general overview of the organization and function of insertion sequences of eubacterial, archaebacterial, and eukaryotic origins with particular emphasis on bacterial elements and on different aspects of the transposition mechanism. It also attempts to provide a framework for classification of these elements by assigning them to various families or groups. A total of 443 members of the collection have been grouped in 17 families based on combinations of the following criteria: (i) similarities in genetic organization (arrangement of open reading frames); (ii) marked identities or similarities in the enzymes which mediate the transposition reactions, the recombinases/transposases (Tpases); (iii) similar features of their ends (terminal IRs); and (iv) fate of the nucleotide sequence of their target sites (generation of a direct target duplication of determined length). A brief description of the mechanism(s) involved in the mobility of individual ISs in each family and of the structure-function relationships of the individual Tpases is included where available. PMID:9729608

  2. Computational identification of developmental enhancers: conservation and function of transcription factor binding-site clusters in Drosophila melanogaster and Drosophila pseudoobscura

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Berman, Benjamin P.; Pfeiffer, Barret D.; Laverty, Todd R.

    2004-08-06

    Background: The identification of sequences that control transcription in metazoans is a major goal of genome analysis. In a previous study, we demonstrated that searching for clusters of predicted transcription factor binding sites could discover active regulatory sequences, and identified 37 regions of the Drosophila melanogaster genome with high densities of predicted binding sites for five transcription factors involved in anterior-posterior embryonic patterning. Nine of these clusters overlapped known enhancers. Here, we report the results of in vivo functional analysis of 27 remaining clusters. Results: We generated transgenic flies carrying each cluster attached to a basal promoter and reporter gene, and assayed embryos for reporter gene expression. Six clusters are enhancers of adjacent genes: giant, fushi tarazu, odd-skipped, nubbin, squeeze and pdm2; three drive expression in patterns unrelated to those of neighboring genes; the remaining 18 do not appear to have enhancer activity. We used the Drosophila pseudoobscura genome to compare patterns of evolution in and around the 15 positive and 18 false-positive predictions. Although conservation of primary sequence cannot distinguish true from false positives, conservation of binding-site clustering accurately discriminates functional binding-site clusters from those with no function. We incorporated conservation of binding-site clustering into a new genome-wide enhancer screen, and predict several hundred new regulatory sequences, including 85 adjacent to genes with embryonic patterns. Conclusions: Measuring conservation of sequence features closely linked to function - such as binding-site clustering - makes better use of comparative sequence data than commonly used methods that examine only sequence identity.
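
    The idea of scanning for regions with a high density of predicted binding sites can be sketched as a sliding-window count. This is a minimal illustration, not the authors' pipeline: the function names, the window size and the site threshold are all assumptions, and the conservation-of-clustering test described in the abstract is not implemented.

```python
from bisect import bisect_left


def dense_windows(positions, window, min_sites):
    """Return (first_site, last_site) spans that start at a predicted site
    and contain at least min_sites sites within `window` bp."""
    pos = sorted(positions)
    hits = []
    for i, start in enumerate(pos):
        # Index of the first site at or beyond the end of the window.
        j = bisect_left(pos, start + window)
        if j - i >= min_sites:
            hits.append((start, pos[j - 1]))
    return hits


def merge_overlapping(spans):
    """Merge spans that overlap or touch into single clusters."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical predicted binding-site coordinates on one chromosome arm.
sites = [100, 150, 180, 220, 900, 5000, 5040, 5100]
clusters = merge_overlapping(dense_windows(sites, window=200, min_sites=3))
print(clusters)
```

    In this toy input the isolated site at 900 is dropped, while the two groups of closely spaced sites are each reported as one cluster.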

  3. Sequence2Vec: a novel embedding approach for modeling transcription factor binding affinity landscape.

    PubMed

    Dai, Hanjun; Umarov, Ramzan; Kuwahara, Hiroyuki; Li, Yu; Song, Le; Gao, Xin

    2017-11-15

    An accurate characterization of the transcription factor (TF)-DNA affinity landscape is crucial to a quantitative understanding of the molecular mechanisms underpinning endogenous gene regulation. While recent advances in biotechnology have brought the opportunity for building binding affinity prediction methods, the accurate characterization of the TF-DNA binding affinity landscape still remains a challenging problem. Here we propose a novel sequence embedding approach for modeling the transcription factor binding affinity landscape. Our method represents DNA binding sequences as a hidden Markov model which captures both position specific information and long-range dependency in the sequence. A cornerstone of our method is a novel message passing-like embedding algorithm, called Sequence2Vec, which maps these hidden Markov models into a common nonlinear feature space and uses these embedded features to build a predictive model. Our method is a novel combination of the strength of probabilistic graphical models, feature space embedding and deep learning. We conducted comprehensive experiments on over 90 large-scale TF-DNA datasets which were measured by different high-throughput experimental technologies. Sequence2Vec outperforms alternative machine learning methods as well as the state-of-the-art binding affinity prediction methods. Our program is freely available at https://github.com/ramzan1990/sequence2vec. Contact: xin.gao@kaust.edu.sa or lsong@cc.gatech.edu. Supplementary data are available at Bioinformatics online. © The Author(s) 2017. Published by Oxford University Press.

  4. Mutations in FLVCR1 Cause Posterior Column Ataxia and Retinitis Pigmentosa

    PubMed Central

    Rajadhyaksha, Anjali M.; Elemento, Olivier; Puffenberger, Erik G.; Schierberl, Kathryn C.; Xiang, Jenny Z.; Putorti, Maria L.; Berciano, José; Poulin, Chantal; Brais, Bernard; Michaelides, Michel; Weleber, Richard G.; Higgins, Joseph J.

    2010-01-01

    The study of inherited retinal diseases has advanced our knowledge of the cellular and molecular mechanisms involved in sensory neural signaling. Dysfunction of two specific sensory modalities, vision and proprioception, characterizes the phenotype of the rare, autosomal-recessive disorder posterior column ataxia and retinitis pigmentosa (PCARP). Using targeted DNA capture and high-throughput sequencing, we analyzed the entire 4.2 Mb candidate sequence on chromosome 1q32 to find the gene mutated in PCARP in a single family. Employing comprehensive bioinformatic analysis and filtering, we identified a single-nucleotide coding variant in the feline leukemia virus subgroup C cellular receptor 1 (FLVCR1), a gene encoding a heme-transporter protein. Sanger sequencing confirmed the FLVCR1 mutation in this family and identified different homozygous missense mutations located within the protein's transmembrane channel segment in two other unrelated families with PCARP. To determine whether the selective pathologic features of PCARP correlated with FLVCR1 expression, we examined wild-type mouse Flvcr1 mRNA levels in the posterior column of the spinal cord and the retina via quantitative real-time reverse-transcriptase PCR. The Flvcr1 mRNA levels were most abundant in the retina, followed by the posterior column of the spinal cord and other brain regions. These results suggest that aberrant FLVCR1 causes a selective degeneration of a subpopulation of neurons in the retina and the posterior columns of the spinal cord via dysregulation of heme or iron homeostasis. This finding broadens the molecular basis of sensory neural signaling to include common mechanisms that involve proprioception and vision. PMID:21070897

  5. A novel recombinant retrovirus in the genomes of modern birds combines features of avian and mammalian retroviruses.

    PubMed

    Henzy, Jamie E; Gifford, Robert J; Johnson, Welkin E; Coffin, John M

    2014-03-01

    Endogenous retroviruses (ERVs) represent ancestral sequences of modern retroviruses or their extinct relatives. The majority of ERVs cluster alongside exogenous retroviruses into two main groups based on phylogenetic analyses of the reverse transcriptase (RT) enzyme. Class I includes gammaretroviruses, and class II includes lentiviruses and alpha-, beta-, and deltaretroviruses. However, analyses of the transmembrane subunit (TM) of the envelope glycoprotein (env) gene result in a different topology for some retroviruses, suggesting recombination events in which heterologous env sequences have been acquired. We previously demonstrated that the TM sequences of five of the six genera of orthoretroviruses can be divided into three types, each of which infects a distinct set of vertebrate classes. Moreover, these classes do not always overlap the host range of the associated RT classes. Thus, recombination resulting in acquisition of a heterologous env gene could in theory facilitate cross-species transmissions across vertebrate classes, for example, from mammals to reptiles. Here we characterized a family of class II avian ERVs, "TgERV-F," that acquired a mammalian gammaretroviral env sequence. Although TgERV-F clusters near a sister clade to alpharetroviruses, its genome also has some features of betaretroviruses. We offer evidence that this unusual recombinant has circulated among several avian orders and may still have infectious members. In addition to documenting the infection of a nongalliform avian species by a mammalian retrovirus, TgERV-F also underscores the importance of env sequences in reconstructing phylogenies and supports a possible role for env swapping in allowing cross-species transmissions across wide taxonomic distances. Retroviruses can sometimes acquire an envelope gene (env) from a distantly related retrovirus. 
    Since env is a key determinant of host range, such an event affects the host range of the recombinant virus and can lead to the creation of novel retroviral lineages. Retroviruses insert viral DNA into the host DNA during infection, and therefore vertebrate genomes contain a "fossil record" of endogenous retroviral sequences thought to represent past infections of germ cells. We examined endogenous retroviral sequences in avian genomes for evidence of recombination events involving env. Although cross-species transmissions of retroviruses between vertebrate classes (from mammals to birds, for example) are thought to be rare, we here characterized a group of avian retroviruses that acquired an env sequence from a mammalian retrovirus. We offer evidence that this unusual recombinant circulated among songbirds 2 to 4 million years ago and has remained active into the recent past.

  6. Enhanced Regulatory Sequence Prediction Using Gapped k-mer Features

    PubMed Central

    Mohammad-Noori, Morteza; Beer, Michael A.

    2014-01-01

    Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. PMID:25033408
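
    To make the notion of a gapped k-mer concrete, the sketch below counts them by brute force. It assumes one common reading of the term: a word of total length l in which k positions are informative and the remaining l - k are wildcards. The function name, the parameter names `l` and `k`, and the "-" gap symbol are illustrative choices. This is plain enumeration, not the tree-based kernel computation that the abstract attributes to gkm-SVM.

```python
from collections import Counter
from itertools import combinations


def gapped_kmer_counts(seq, l, k):
    """Count gapped k-mers: every length-l window of seq, with every choice
    of k informative positions kept and the other l - k shown as '-'."""
    counts = Counter()
    for i in range(len(seq) - l + 1):
        word = seq[i:i + l]
        for keep in combinations(range(l), k):
            keep_set = set(keep)
            pattern = "".join(
                base if p in keep_set else "-" for p, base in enumerate(word)
            )
            counts[pattern] += 1
    return counts


counts = gapped_kmer_counts("ACGT", l=3, k=2)
for pattern, n in sorted(counts.items()):
    print(pattern, n)
```

    Because each window contributes to several gapped patterns, a given pattern is observed more often than any single full-length word, which is the intuition behind the more robust frequency estimates described above.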

  7. Enhanced regulatory sequence prediction using gapped k-mer features.

    PubMed

    Ghandi, Mahmoud; Lee, Dongwon; Mohammad-Noori, Morteza; Beer, Michael A

    2014-07-01

    Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naïve-Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem.

  8. Application of genetic algorithm in integrated setup planning and operation sequencing

    NASA Astrophysics Data System (ADS)

    Kafashi, Sajad; Shakeri, Mohsen

    2011-01-01

    Process planning is an essential component for linking design and the manufacturing process. Setup planning and operation sequencing are two main tasks in process planning. Many studies have solved these two problems separately. Considering the fact that the two functions are complementary, it is necessary to integrate them more tightly so that performance of a manufacturing system can be improved economically and competitively. This paper presents a generative system and genetic algorithm (GA) approach to process planning for a given part. The proposed approach and optimization methodology analyze the TAD (tool approach direction), tolerance relations between features and feature precedence relations to generate all possible setups and operations using a workshop resource database. Based on these technological constraints, the GA approach, which adopts a feature-based representation, optimizes the setup plan and sequence of operations using cost indices. A case study shows that the developed system can generate satisfactory results in optimizing setup planning and operation sequencing simultaneously under feasible conditions.
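
    A toy genetic algorithm for sequencing operations under precedence constraints might look like the sketch below. Everything specific in it is an assumption made for illustration: the part, its operations, the TAD assignments, the precedence pairs, the priority-list encoding and the single cost term (number of TAD changes, standing in for setup changes). The paper's own representation, tolerance relations, resource database and cost indices are not reproduced.

```python
import random

# Hypothetical part: operation -> tool approach direction (TAD).
TAD = {"face": "+z", "pocket": "+z", "hole": "+z",
       "slot": "-x", "chamfer": "-x", "back_face": "-z"}
# Hypothetical precedence constraints as (before, after) pairs.
PRECEDENCE = [("face", "pocket"), ("pocket", "hole"),
              ("face", "slot"), ("slot", "chamfer")]


def decode(priority):
    """Turn a priority list (a permutation of operations) into a feasible
    sequence: repeatedly schedule the highest-priority operation whose
    predecessors are all done."""
    done, sequence = set(), []
    while len(sequence) < len(priority):
        ready = [op for op in priority if op not in done
                 and all(a in done for a, b in PRECEDENCE if b == op)]
        done.add(ready[0])
        sequence.append(ready[0])
    return sequence


def setup_changes(sequence):
    """Count TAD changes between consecutive operations."""
    return sum(TAD[a] != TAD[b] for a, b in zip(sequence, sequence[1:]))


def cost(priority):
    return setup_changes(decode(priority))


def order_crossover(p1, p2, rng):
    """Keep a slice of p1 in place and fill the rest in p2's order."""
    i, j = sorted(rng.sample(range(len(p1)), 2))
    middle = p1[i:j + 1]
    rest = [op for op in p2 if op not in middle]
    return rest[:i] + middle + rest[i:]


def mutate(perm, rng, rate=0.2):
    """Swap two positions with probability `rate`."""
    perm = perm[:]
    if rng.random() < rate:
        i, j = rng.sample(range(len(perm)), 2)
        perm[i], perm[j] = perm[j], perm[i]
    return perm


def run_ga(generations=40, pop_size=20, seed=0):
    rng = random.Random(seed)
    ops = list(TAD)
    population = [rng.sample(ops, len(ops)) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=cost)
        elite = population[:pop_size // 2]  # keep the better half
        children = [
            mutate(order_crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - len(elite))
        ]
        population = elite + children
    best = min(population, key=cost)
    return decode(best), cost(best)


best_sequence, best_cost = run_ga()
print(best_sequence, best_cost)
```

    Decoding through a precedence-aware scheduler means every chromosome maps to a feasible plan, so the search never has to penalize or repair infeasible sequences. That is one design option among several and is not claimed to be the one used in the paper.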

  9. Analysis of sequence repeats of proteins in the PDB.

    PubMed

    Mary Rajathei, David; Selvaraj, Samuel

    2013-12-01

    Internal repeats in protein sequences play a significant role in the evolution of protein structure and function. Applications of different bioinformatics tools help in the identification and characterization of these repeats. In the present study, we analyzed sequence repeats in a non-redundant set of proteins available in the Protein Data Bank (PDB). We used RADAR for detecting internal repeats in a protein, PDBeFOLD for assessing structural similarity, PDBsum for finding functional involvement and Pfam for domain assignment of the repeats in a protein. Through the analysis of sequence repeats, we found that the identity of the sequence repeats falls in the range of 20-40% and that the superimposed structures of most of the sequence repeats maintain similar overall folding. Analysis of sequence repeats at the functional level reveals that most of the sequence repeats are involved in the function of the protein through functionally involved residues in the repeat regions. We also found that sequence repeats in single and two domain proteins often contained conserved sequence motifs for the function of the domain. Copyright © 2013 Elsevier Ltd. All rights reserved.

  10. DMS-MaPseq for genome-wide or targeted RNA structure probing in vivo.

    PubMed

    Zubradt, Meghan; Gupta, Paromita; Persad, Sitara; Lambowitz, Alan M; Weissman, Jonathan S; Rouskin, Silvi

    2017-01-01

    Coupling of structure-specific in vivo chemical modification to next-generation sequencing is transforming RNA secondary structure studies in living cells. The dominant strategy for detecting in vivo chemical modifications uses reverse transcriptase truncation products, which introduce biases and necessitate population-average assessments of RNA structure. Here we present dimethyl sulfate (DMS) mutational profiling with sequencing (DMS-MaPseq), which encodes DMS modifications as mismatches using a thermostable group II intron reverse transcriptase. DMS-MaPseq yields a high signal-to-noise ratio, can report multiple structural features per molecule, and allows both genome-wide studies and focused in vivo investigations of even low-abundance RNAs. We apply DMS-MaPseq for the first analysis of RNA structure within an animal tissue and to identify a functional structure involved in noncanonical translation initiation. Additionally, we use DMS-MaPseq to compare the in vivo structure of pre-mRNAs with their mature isoforms. These applications illustrate DMS-MaPseq's capacity to dramatically expand in vivo analysis of RNA structure.
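
    Since the method records modifications as mismatches in sequencing reads, the basic readout is a per-position mismatch rate. The sketch below computes one under strong simplifying assumptions: reads are already aligned to the reference without insertions or deletions, and each read is given as a (start, sequence) pair. The function name and input format are hypothetical and do not describe the authors' analysis software.

```python
def mismatch_rates(reference, aligned_reads):
    """Per-position mismatch rate for gap-free alignments.

    aligned_reads: iterable of (start, read_sequence), with start 0-based.
    Returns a list with one rate per reference position, or None where
    no read covers the position.
    """
    coverage = [0] * len(reference)
    mismatches = [0] * len(reference)
    for start, read in aligned_reads:
        for offset, base in enumerate(read):
            pos = start + offset
            if pos >= len(reference):
                break  # ignore bases running past the reference end
            coverage[pos] += 1
            if base != reference[pos]:
                mismatches[pos] += 1
    return [m / c if c else None for m, c in zip(mismatches, coverage)]


# Toy reference and reads, invented for illustration.
reference = "ACGTAC"
reads = [(0, "ACGT"), (0, "ATGT"), (2, "GTAC"), (2, "GTCC")]
rates = mismatch_rates(reference, reads)
print(rates)
```

    Because each read is processed as a whole, the same loop could be extended to record which positions are mismatched together on one molecule, which relates to the abstract's point about reporting multiple structural features per molecule.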

  11. The diversity of H3 loops determines the antigen-binding tendencies of antibody CDR loops.

    PubMed

    Tsuchiya, Yuko; Mizuguchi, Kenji

    2016-04-01

    Of the complementarity-determining regions (CDRs) of antibodies, H3 loops, with varying amino acid sequences and loop lengths, adopt particularly diverse loop conformations. The diversity of H3 conformations produces an array of antigen recognition patterns involving all the CDRs, in which the residue positions actually in contact with the antigen vary considerably. Therefore, for a deeper understanding of antigen recognition, it is necessary to relate the sequence and structural properties of each residue position in each CDR loop to its ability to bind antigens. In this study, we proposed a new method for characterizing the structural features of the CDR loops and obtained the antigen-binding ability of each residue position in each CDR loop. This analysis led to a simple set of rules for identifying probable antigen-binding residues. We also found that the diversity of H3 loop lengths and conformations affects the antigen-binding tendencies of all the CDR loops. © 2016 The Protein Society.

  12. Sequence-dependent base pair stepping dynamics in XPD helicase unwinding

    PubMed Central

    Qi, Zhi; Pugh, Robert A; Spies, Maria; Chemla, Yann R

    2013-01-01

    Helicases couple the chemical energy of ATP hydrolysis to directional translocation along nucleic acids and transient duplex separation. Understanding helicase mechanism requires that the basic physicochemical process of base pair separation be understood. This necessitates monitoring helicase activity directly, at high spatio-temporal resolution. Using optical tweezers with single base pair (bp) resolution, we analyzed DNA unwinding by XPD helicase, a Superfamily 2 (SF2) DNA helicase involved in DNA repair and transcription initiation. We show that monomeric XPD unwinds duplex DNA in 1-bp steps, yet exhibits frequent backsteps and undergoes conformational transitions manifested in 5-bp backward and forward steps. Quantifying the sequence dependence of XPD stepping dynamics with near base pair resolution, we provide the strongest and most direct evidence thus far that forward, single-base pair stepping of a helicase utilizes the spontaneous opening of the duplex. The proposed unwinding mechanism may be a universal feature of DNA helicases that move along DNA phosphodiester backbones. DOI: http://dx.doi.org/10.7554/eLife.00334.001 PMID:23741615

  13. Emergence of biological organization through thermodynamic inversion.

    PubMed

    Kompanichenko, Vladimir

    2014-01-01

    Biological organization arises under thermodynamic inversion in prebiotic systems, in which the contributions of free energy and information prevail over the entropy contribution. The inversion might occur under specific far-from-equilibrium conditions in prebiotic systems oscillating around the bifurcation point. At the moment of inversion, the (physical) information characteristic of non-biological systems acquires new features: functionality, purposefulness, and control over life processes, which transform it into biological information. Random sequences of amino acids and nucleotides, spontaneously synthesized in the prebiotic microsystem, re-assemble in the primary living unit (probiont) into functional sequences that take part in the circulation of bioinformation through nucleoprotein interaction, resulting in the emergence of the genetic code. According to the proposed concept, oscillating three-dimensional prebiotic microsystems transformed into probionts in the changeable hydrothermal medium of the early Earth. The inversion concept states that spontaneous (accidental, random) transformations in prebiotic systems cannot produce life; only non-spontaneous (perspective, purposeful) transformations, which result from thermodynamic inversion, lead to the negentropy conversion of prebiotic systems into initial living units.

  14. Yeast Prions and Human Prion-like Proteins: Sequence Features and Prediction Methods

    PubMed Central

    Cascarina, Sean; Ross, Eric D.

    2014-01-01

    Prions are self-propagating infectious protein isoforms. A growing number of prions have been identified in yeast, each resulting from the conversion of soluble proteins into an insoluble amyloid form. These yeast prions have served as a powerful model system for studying the causes and consequences of prion aggregation. Remarkably, a number of human proteins containing prion-like domains, defined as domains with compositional similarity to yeast prion domains, have recently been linked to various human degenerative diseases, including amyotrophic lateral sclerosis (ALS). This suggests that the lessons learned from yeast prions may help in understanding these human diseases. In this review, we examine what has been learned about the amino acid sequence basis for prion aggregation in yeast, and how this information has been used to develop methods to predict aggregation propensity. We then discuss how this information is being applied to understand human disease, and the challenges involved in applying yeast prediction methods to higher organisms. PMID:24390581

  15. Yeast prions and human prion-like proteins: sequence features and prediction methods.

    PubMed

    Cascarina, Sean M; Ross, Eric D

    2014-06-01

    Prions are self-propagating infectious protein isoforms. A growing number of prions have been identified in yeast, each resulting from the conversion of soluble proteins into an insoluble amyloid form. These yeast prions have served as a powerful model system for studying the causes and consequences of prion aggregation. Remarkably, a number of human proteins containing prion-like domains, defined as domains with compositional similarity to yeast prion domains, have recently been linked to various human degenerative diseases, including amyotrophic lateral sclerosis. This suggests that the lessons learned from yeast prions may help in understanding these human diseases. In this review, we examine what has been learned about the amino acid sequence basis for prion aggregation in yeast, and how this information has been used to develop methods to predict aggregation propensity. We then discuss how this information is being applied to understand human disease, and the challenges involved in applying yeast prediction methods to higher organisms.

  16. Enhanced transformation of incidentally learned knowledge into explicit memory by dopaminergic modulation.

    PubMed

    Clos, Mareike; Sommer, Tobias; Schneider, Signe L; Rose, Michael

    2018-01-01

    During incidental learning statistical regularities are extracted from the environment without the intention to learn. Acquired implicit memory of these regularities can affect behavior in the absence of awareness. However, conscious insight into the underlying regularities can also develop during learning. Such emergence of explicit memory is an important learning mechanism that is assumed to involve prediction errors in the striatum and to be dopamine-dependent. Here we directly tested this hypothesis by manipulating dopamine levels during incidental learning in a modified serial reaction time task (SRTT) featuring a hidden regular sequence of motor responses in a placebo-controlled between-group study. Awareness of the sequential regularity was subsequently assessed using cued generation and additionally verified using free recall. The results demonstrated that dopaminergic modulation nearly doubled the amount of explicit sequence knowledge that emerged during learning in comparison to the placebo group. This strong effect clearly argues for a causal role of dopamine-dependent processing in the development of awareness of sequential regularities during learning.

  17. Shape classification of malignant lymphomas and leukemia by morphological watersheds and ARMA modeling

    NASA Astrophysics Data System (ADS)

    Celenk, Mehmet; Song, Yinglei; Ma, Limin; Zhou, Min

    2003-05-01

    A new algorithm that can be used to automatically recognize and classify malignant lymphomas and leukemia is proposed in this paper. The algorithm utilizes the morphological watershed to extract boundaries of cells from their grey-level images. It generates a sequence of Euclidean distances by selecting pixels in clockwise direction on the boundary of the cell and calculating the Euclidean distances of the selected pixels from the centroid of the cell. A feature vector associated with each cell is then obtained by applying the auto-regressive moving-average (ARMA) model to the generated sequence of Euclidean distances. The clustering measure $J_3 = \mathrm{trace}\{S_w^{-1} S_m\}$, involving the within ($S_w$) and mixed ($S_m$) class-scattering matrices, is computed for both cell classes to provide an insight into the extent to which different cell classes in the training data are separated. Our test results suggest that the algorithm is highly accurate for the development of an interactive, computer-assisted diagnosis (CAD) tool.
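
    The distance sequence described in this abstract can be sketched in a few lines. The example assumes the boundary pixels are already extracted and ordered clockwise, and it takes the centroid as the mean of the boundary points, which may differ from the centroid definition used in the paper. The function name is hypothetical, and the watershed segmentation and ARMA fitting steps are omitted.

```python
import math


def centroid_distance_signature(boundary):
    """Return the Euclidean distance of each boundary point from the
    centroid, in the order the points are given.

    boundary: list of (x, y) pixel coordinates, assumed ordered clockwise.
    The centroid here is the mean of the boundary points (an assumption).
    """
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    return [math.hypot(x - cx, y - cy) for x, y in boundary]


# Toy boundary: the four corners of a square, invented for illustration.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
signature = centroid_distance_signature(square)
print(signature)
```

    A round cell would give a nearly constant sequence, while an irregular outline gives a fluctuating one. That variation is what a time-series model such as ARMA can then summarize in a small number of coefficients.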

  18. The genome sequence of the facultative intracellular pathogen Brucella melitensis.

    PubMed

    DelVecchio, Vito G; Kapatral, Vinayak; Redkar, Rajendra J; Patra, Guy; Mujer, Cesar; Los, Tamara; Ivanova, Natalia; Anderson, Iain; Bhattacharyya, Anamitra; Lykidis, Athanasios; Reznik, Gary; Jablonski, Lynn; Larsen, Niels; D'Souza, Mark; Bernal, Axel; Mazur, Mikhail; Goltsman, Eugene; Selkov, Eugene; Elzer, Philip H; Hagius, Sue; O'Callaghan, David; Letesson, Jean-Jacques; Haselkorn, Robert; Kyrpides, Nikos; Overbeek, Ross

    2002-01-08

    Brucella melitensis is a facultative intracellular bacterial pathogen that causes abortion in goats and sheep and Malta fever in humans. The genome of B. melitensis strain 16M was sequenced and found to contain 3,294,935 bp distributed over two circular chromosomes of 2,117,144 bp and 1,177,787 bp encoding 3,197 ORFs. By using the bioinformatics suite ERGO, 2,487 (78%) ORFs were assigned functions. The origins of replication of the two chromosomes are similar to those of other alpha-proteobacteria. Housekeeping genes, including those involved in DNA replication, transcription, translation, core metabolism, and cell wall biosynthesis, are distributed on both chromosomes. Type I, II, and III secretion systems are absent, but genes encoding sec-dependent, sec-independent, and flagella-specific type III, type IV, and type V secretion systems as well as adhesins, invasins, and hemolysins were identified. Several features of the B. melitensis genome are similar to those of the symbiotic Sinorhizobium meliloti.

  19. The genome sequence of the facultative intracellular pathogen Brucella melitensis

    PubMed Central

    DelVecchio, Vito G.; Kapatral, Vinayak; Redkar, Rajendra J.; Patra, Guy; Mujer, Cesar; Los, Tamara; Ivanova, Natalia; Anderson, Iain; Bhattacharyya, Anamitra; Lykidis, Athanasios; Reznik, Gary; Jablonski, Lynn; Larsen, Niels; D'Souza, Mark; Bernal, Axel; Mazur, Mikhail; Goltsman, Eugene; Selkov, Eugene; Elzer, Philip H.; Hagius, Sue; O'Callaghan, David; Letesson, Jean-Jacques; Haselkorn, Robert; Kyrpides, Nikos; Overbeek, Ross

    2002-01-01

    Brucella melitensis is a facultative intracellular bacterial pathogen that causes abortion in goats and sheep and Malta fever in humans. The genome of B. melitensis strain 16M was sequenced and found to contain 3,294,935 bp distributed over two circular chromosomes of 2,117,144 bp and 1,177,787 bp encoding 3,197 ORFs. By using the bioinformatics suite ERGO, 2,487 (78%) ORFs were assigned functions. The origins of replication of the two chromosomes are similar to those of other α-proteobacteria. Housekeeping genes, including those involved in DNA replication, transcription, translation, core metabolism, and cell wall biosynthesis, are distributed on both chromosomes. Type I, II, and III secretion systems are absent, but genes encoding sec-dependent, sec-independent, and flagella-specific type III, type IV, and type V secretion systems as well as adhesins, invasins, and hemolysins were identified. Several features of the B. melitensis genome are similar to those of the symbiotic Sinorhizobium meliloti. PMID:11756688

  20. Conserved hypothetical protein Rv1977 in Mycobacterium tuberculosis strains contains sequence polymorphisms and might be involved in ongoing immune evasion.

    PubMed

    Jiang, Yi; Liu, Haican; Wang, Xuezhi; Li, Guilian; Qiu, Yan; Dou, Xiangfeng; Wan, Kanglin

    2015-01-01

    Host immune pressure and associated parasite immune evasion are key features of host-pathogen co-evolution. A previous study showed that human T cell epitopes of Mycobacterium tuberculosis are evolutionarily hyperconserved and thus it was deduced that M. tuberculosis lacks antigenic variation and immune evasion. Here, we selected 151 clinical Mycobacterium tuberculosis isolates from China, amplified the gene encoding Rv1977 and compared the sequences. The results showed that Rv1977, a conserved hypothetical protein, is not conserved in M. tuberculosis strains and that polymorphisms exist in the protein. Some mutations, especially one frameshift mutation, occurred in the antigen Rv1977, which is uncommon in M. tuberculosis strains and may alter the protein's function. The mutations and the deletion in the gene all affect one of three T cell epitopes, and the changed T cell epitope contained more than one variable position, which may suggest ongoing immune evasion.

  1. Technological advancements and their importance for nematode identification

    NASA Astrophysics Data System (ADS)

    Ahmed, Mohammed; Sapp, Melanie; Prior, Thomas; Karssen, Gerrit; Back, Matthew Alan

    2016-06-01

    Nematodes represent a species-rich and morphologically diverse group of metazoans known to inhabit both aquatic and terrestrial environments. Their role as biological indicators and as key players in nutrient cycling has been well documented. Some plant-parasitic species are also known to cause significant losses to crop production. In spite of this, there still exists a huge gap in our knowledge of their diversity due to the enormity of time and expertise often involved in characterising species using phenotypic features. Molecular methodology provides useful means of complementing the limited number of reliable diagnostic characters available for morphology-based identification. We discuss herein some of the limitations of traditional taxonomy and how molecular methodologies, especially the use of high-throughput sequencing, have assisted in carrying out large-scale nematode community studies and characterisation of phytonematodes through rapid identification of multiple taxa. We also provide brief descriptions of some of the current and almost-outdated high-throughput sequencing platforms and their applications in both plant nematology and soil ecology.

  2. PrimerMapper: high throughput primer design and graphical assembly for PCR and SNP detection

    PubMed Central

    O’Halloran, Damien M.

    2016-01-01

    Primer design represents a widely employed gambit in diverse molecular applications including PCR, sequencing, and probe hybridization. Variations of PCR, including primer walking, allele-specific PCR, and nested PCR provide specialized validation and detection protocols for molecular analyses that often require screening large numbers of DNA fragments. In these cases, automated sequence retrieval and processing become important features, and furthermore, a graphic that provides the user with a visual guide to the distribution of designed primers across targets is most helpful in quickly ascertaining primer coverage. To this end, I describe here, PrimerMapper, which provides a comprehensive graphical user interface that designs robust primers from any number of inputted sequences while providing the user with both, graphical maps of primer distribution for each inputted sequence, and also a global assembled map of all inputted sequences with designed primers. PrimerMapper also enables the visualization of graphical maps within a browser and allows the user to draw new primers directly onto the webpage. Other features of PrimerMapper include allele-specific design features for SNP genotyping, a remote BLAST window to NCBI databases, and remote sequence retrieval from GenBank and dbSNP. PrimerMapper is hosted at GitHub and freely available without restriction. PMID:26853558

  3. Genome-Wide Locations of Potential Epimutations Associated with Environmentally Induced Epigenetic Transgenerational Inheritance of Disease Using a Sequential Machine Learning Prediction Approach.

    PubMed

    Haque, M Muksitul; Holder, Lawrence B; Skinner, Michael K

    2015-01-01

    Environmentally induced epigenetic transgenerational inheritance of disease and phenotypic variation involves germline transmitted epimutations. The primary epimutations identified involve altered differential DNA methylation regions (DMRs). Different environmental toxicants have been shown to promote exposure (i.e., toxicant) specific signatures of germline epimutations. Analysis of genomic features associated with these epimutations identified low-density CpG regions (<3 CpG / 100bp) termed CpG deserts and a number of unique DNA sequence motifs. The rat genome was annotated for these and additional relevant features. The objective of the current study was to use a machine learning computational approach to predict all potential epimutations in the genome. A number of previously identified sperm epimutations were used as training sets. A novel machine learning approach using a sequential combination of Active Learning and Imbalance Class Learner analysis was developed. The transgenerational sperm epimutation analysis identified approximately 50K individual sites with a 1 kb mean size and 3,233 regions that had a minimum of three adjacent sites with a mean size of 3.5 kb. A select number of the most relevant genomic features were identified with the low density CpG deserts being a critical genomic feature of the features selected. A similar independent analysis with transgenerational somatic cell epimutation training sets identified a smaller number of 1,503 regions of genome-wide predicted sites and differences in genomic feature contributions. The predicted genome-wide germline (sperm) epimutations were found to be distinct from the predicted somatic cell epimutations. Validation of the genome-wide germline predicted sites used two recently identified transgenerational sperm epimutation signature sets from the pesticides dichlorodiphenyltrichloroethane (DDT) and methoxychlor (MXC) exposure lineage F3 generation. 
    Analysis of this positive validation data set showed a 100% prediction accuracy for all the DDT-MXC sperm epimutations. Observations further elucidate the genomic features associated with transgenerational germline epimutations and identify a genome-wide set of potential epimutations that can be used to facilitate identification of epigenetic diagnostics for ancestral environmental exposures and disease susceptibility.
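
    The CpG desert criterion quoted in this abstract (fewer than 3 CpG per 100 bp) can be illustrated with a short sketch. The function names, the use of non-overlapping windows, and the toy sequence are assumptions made here for illustration; the study's actual annotation pipeline is not described in the abstract.

```python
def cpg_counts(sequence, window=100):
    """Count CpG dinucleotides in consecutive non-overlapping windows.

    Assumption: a CpG that straddles a window boundary is not counted,
    and a trailing partial window is ignored.
    """
    sequence = sequence.upper()
    counts = []
    for start in range(0, len(sequence) - window + 1, window):
        counts.append(sequence[start:start + window].count("CG"))
    return counts


def cpg_desert_flags(sequence, window=100, threshold=3):
    """True for each window whose CpG count is below the threshold."""
    return [count < threshold for count in cpg_counts(sequence, window)]


# Toy sequence: one CpG-rich window followed by one window with no CpG.
toy = "CG" * 5 + "A" * 90 + "A" * 100
print(cpg_counts(toy))
print(cpg_desert_flags(toy))
```

    Windows flagged True would be candidate low-density regions under this simplified definition.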

  4. Measuring Magnetic Declination With Compass, GPS and Virtual Globes

    NASA Astrophysics Data System (ADS)

    O'Brien, W. P.

    2006-12-01

    Using virtual globe (VG) imagery to determine geographic bearing and a compass to determine magnetic bearing yielded acceptable experimental magnetic declination values for large linear physical features at 13 sites in the western continental United States. The geographic bearing of each feature was determined from measurements involving the latitude/longitude coordinate system associated with the VG image (from World Wind or Google Earth). The corresponding magnetic bearing was measured on the ground at the feature with a hand-bearing compass calibrated in 1-degree subdivisions. A sequence of GPS trackpoints, recorded while traveling along the feature either in an automobile or on foot, unambiguously identified the pertinent portion of the feature (a straight segment of a road, for example) when plotted on the VG image. For each physical feature located on a VG image, its geographic bearing was determined directly using on-screen measurement tools available with the VG program or by hand using ruler/protractor methods with printed copies of the VG image. An independent (no use of VG) geographic bearing was also extracted from the slope of a straight-line fit to a latitude/longitude plot of each feature's GPS coordinates, a value that was the same (to within the inherent uncertainty of the data) as the VG-determined bearing, thus validating this procedure for finding geographic bearings. Differences between the VG bearings and the magnetic bearings yielded experimental magnetic declination values within one degree (8 within 0.5 degree) of expected values. From the point of view of physics and geophysics pedagogy, this project affords students a simple magnetism/geodesy field experiment requiring only a good compass and a GPS receiver with memory and a data port. The novel and straightforward data analysis with VG software yields reliable experimental values for an important abstract geophysical quantity, magnetic declination. 
    Just as the compass has long provided easy access to Magnetic North, the coordinate systems inherent in recently-developed VG and GPS satellite technologies now provide easy access (i.e., no astronomical measurements involving Polaris or the Sun) to Geographic North for this and future applications.
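
    The analysis described in this abstract, a straight-line fit to GPS coordinates followed by a bearing difference, can be sketched as follows. This is an illustration under stated assumptions, not the author's procedure: the function names are hypothetical, a local flat-Earth approximation is used, and declination is taken as east-positive (true bearing = magnetic bearing + declination).

```python
import math


def bearing_from_track(points):
    """Geographic bearing (degrees, in [0, 180)) of a straight feature.

    `points` is a list of (lat, lon) pairs in decimal degrees. Latitude is
    fitted against longitude by least squares, so a feature running almost
    exactly north-south is not handled by this sketch. A line has two
    opposite directions; only the one in [0, 180) is returned.
    """
    n = len(points)
    mean_lat = sum(lat for lat, _ in points) / n
    mean_lon = sum(lon for _, lon in points) / n
    num = sum((lon - mean_lon) * (lat - mean_lat) for lat, lon in points)
    den = sum((lon - mean_lon) ** 2 for _, lon in points)
    slope = num / den  # degrees of latitude per degree of longitude
    # A degree of longitude covers cos(latitude) times the ground
    # distance covered by a degree of latitude.
    east = math.cos(math.radians(mean_lat))
    north = slope
    return math.degrees(math.atan2(east, north)) % 180.0


def declination(true_bearing, magnetic_bearing):
    """East-positive declination, wrapped into [-180, 180)."""
    return (true_bearing - magnetic_bearing + 180.0) % 360.0 - 180.0


track = [(0.000, 0.000), (0.001, 0.001), (0.002, 0.002)]
print(bearing_from_track(track))
print(declination(45.0, 33.0))
```

    The compass reading must be taken along the same direction of travel as the fitted bearing for the difference to be meaningful.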

  5. Molecular evolution of calcification genes in morphologically similar but phylogenetically unrelated scleractinian corals.

    PubMed

    Wirshing, Herman H; Baker, Andrew C

    2014-08-01

    Molecular phylogenies of scleractinian corals often fail to agree with traditional phylogenies derived from morphological characters. These discrepancies are generally attributed to non-homologous or morphologically plastic characters used in taxonomic descriptions. Consequently, morphological convergence of coral skeletons among phylogenetically unrelated groups is considered to be the major evolutionary process confounding molecular and morphological hypotheses. A strategy that may help identify cases of convergence and/or diversification in coral morphology is to compare phylogenies of existing "neutral" genetic markers used to estimate genealogic phylogenetic history with phylogenies generated from non-neutral genes involved in calcification (biomineralization). We tested the hypothesis that differences among calcification gene phylogenies with respect to the "neutral" trees may represent convergent or divergent functional strategies among calcification gene proteins that may correlate to aspects of coral skeletal morphology. Partial sequences of two nuclear genes previously determined to be involved in the calcification process in corals, "Cnidaria-III" membrane-bound/secreted α-carbonic anhydrase (CIII-MBSα-CA) and bone morphogenic protein (BMP) 2/4, were PCR-amplified, cloned and sequenced from 31 scleractinian coral species in 26 genera and 9 families. For comparison, "neutral" gene phylogenies were generated from sequences from two protein-coding "non-calcification" genes, one nuclear (β-tubulin) and one mitochondrial (cytochrome b), from the same individuals. Cloned CIII-MBSα-CA sequences were found to be non-neutral, and phylogenetic analyses revealed CIII-MBSα-CAs to exhibit a complex evolutionary history with clones distributed between at least 2 putative gene copies. However, for several coral taxa only one gene copy was recovered. With CIII-MBSα-CA, several recovered clades grouped taxa that differed from the "non-calcification" loci. 
    In some cases, these taxa shared aspects of their skeletal morphology (i.e., convergence or diversification relative to the "non-calcification" loci), but in other cases they did not. For example, the "non-calcification" loci recovered Atlantic and Pacific mussids as separate evolutionary lineages, whereas with CIII-MBSα-CA, clones of two species of Atlantic mussids (Isophyllia sinuosa and Mycetophyllia sp.) and two species of Pacific mussids (Acanthastrea echinata and Lobophyllia hemprichii) were united in a distinct clade (except for one individual of Mycetophyllia). However, this clade also contained other taxa which were not unambiguously correlated with morphological features. BMP2/4 also contained clones that likely represent different gene copies. However, many of the sequences showed no significant deviation from neutrality, and reconstructed phylogenies were similar to the "non-calcification" tree topologies with a few exceptions. Although individual calcification genes are unlikely to precisely explain the diverse morphological features exhibited by scleractinian corals, this study demonstrates an approach for identifying cases where morphological taxonomy may have been misled by convergent and/or divergent molecular evolutionary processes in corals. Studies such as this may help illuminate our understanding of the likely complex evolution of genes involved in the calcification process, and enhance our knowledge of the natural history and biodiversity within this central ecological group. Published by Elsevier Inc.

  6. Rigorous assessment and integration of the sequence and structure based features to predict hot spots

    PubMed Central

    2011-01-01

    Background: Systematic mutagenesis studies have shown that only a few interface residues, termed hot spots, contribute significantly to the binding free energy of protein-protein interactions. Therefore, hot spot prediction becomes increasingly important for better understanding the essence of protein interactions and for helping to narrow down the search space for drug design. Currently many computational methods have been developed by proposing different features. However, comparative assessment of these features, and furthermore effective and accurate methods, are still in pressing need. Results: In this study, we first comprehensively collect the features to discriminate hot spots and non-hot spots and analyze their distributions. We find that hot spots have lower relASA and larger relative change in ASA, suggesting hot spots tend to be protected from bulk solvent. In addition, hot spots have more contacts, including hydrogen bonds, salt bridges, and atomic contacts, which favor complex formation. Interestingly, we find that conservation score and sequence entropy are not significantly different between hot spots and non-hot spots in the Ab+ dataset (all complexes). In the Ab- dataset (antigen-antibody complexes are excluded), however, there are significant differences in these two features between hot spots and non-hot spots. Secondly, we explore the predictive ability of each feature and of combinations of features by support vector machines (SVMs). The results indicate that the sequence-based feature outperforms other combinations of features with reasonable accuracy, with a precision of 0.69, a recall of 0.68, an F1 score of 0.68, and an AUC of 0.68 on an independent test set. Compared with other machine learning methods and two energy-based approaches, our approach achieves the best performance. Moreover, we demonstrate the applicability of our method to predict hot spots of two protein complexes. Conclusion: Experimental results show that support vector machine classifiers are quite effective in predicting hot spots based on sequence features. Hot spots cannot be fully predicted through simple analysis based on physicochemical characteristics, but there is reason to believe that integration of features and machine learning methods can remarkably improve the predictive performance for hot spots. PMID:21798070
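
    The precision, recall and F1 figures reported in this abstract are related by the harmonic mean, which the following minimal sketch makes explicit. The function names and the confusion counts in the usage lines are illustrative assumptions; only the 0.69 and 0.68 inputs come from the abstract.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall(true_pos, false_pos, false_neg):
    """Precision and recall from confusion-matrix counts."""
    predicted = true_pos + false_pos
    actual = true_pos + false_neg
    precision = true_pos / predicted if predicted else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall


# Values quoted in the abstract: precision 0.69, recall 0.68.
print(round(f1_score(0.69, 0.68), 2))

# Hypothetical counts, for illustration only.
print(precision_recall(true_pos=3, false_pos=1, false_neg=2))
```

    Rounded to two decimals, the harmonic mean of 0.69 and 0.68 agrees with the F1 score of 0.68 stated in the abstract.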

  7. Generation of Tandem Direct Duplications by Reversed-Ends Transposition of Maize Ac Elements

    PubMed Central

    Peterson, Thomas

    2013-01-01

    Tandem direct duplications are a common feature of the genomes of eukaryotes ranging from yeast to human, where they comprise a significant fraction of copy number variations. The prevailing model for the formation of tandem direct duplications is non-allelic homologous recombination (NAHR). Here we report the isolation of a series of duplications and reciprocal deletions isolated de novo from a maize allele containing two Class II Ac/Ds transposons. The duplication/deletion structures suggest that they were generated by alternative transposition reactions involving the termini of two nearby transposable elements. The deletion/duplication breakpoint junctions contain 8 bp target site duplications characteristic of Ac/Ds transposition events, confirming their formation directly by an alternative transposition mechanism. Tandem direct duplications and reciprocal deletions were generated at a relatively high frequency (∼0.5 to 1%) in the materials examined here in which transposons are positioned nearby each other in appropriate orientation; frequencies would likely be much lower in other genotypes. To test whether this mechanism may have contributed to maize genome evolution, we analyzed sequences flanking Ac/Ds and other hAT family transposons and identified three small tandem direct duplications with the structural features predicted by the alternative transposition mechanism. Together these results show that some class II transposons are capable of directly inducing tandem sequence duplications, and that this activity has contributed to the evolution of the maize genome. PMID:23966872

  8. Diversity, virulence, and antimicrobial resistance of the KPC-producing Klebsiella pneumoniae ST307 clone

    PubMed Central

    Villa, Laura; Feudi, Claudia; Fortini, Daniela; Brisse, Sylvain; Passet, Virginie; Bonura, Celestino; Endimiani, Andrea; Mammina, Caterina; Ocampo, Ana Maria; Jimenez, Judy Natalia; Doumith, Michel; Woodford, Neil; Hopkins, Katie

    2017-01-01

    The global spread of Klebsiella pneumoniae producing Klebsiella pneumoniae carbapenemase (KPC) has been mainly associated with the dissemination of high-risk clones. In the last decade, hospital outbreaks involving KPC-producing K. pneumoniae have been predominantly attributed to isolates belonging to clonal group (CG) 258. However, results of recent epidemiological analysis indicate that KPC-producing sequence type (ST) 307, is emerging in different parts of the world and is a candidate to become a prevalent high-risk clone in the near future. Here we show that the ST307 genome encodes genetic features that may provide an advantage in adaptation to the hospital environment and the human host. Sequence analysis revealed novel plasmid-located virulence factors, including a cluster for glycogen synthesis. Glycogen production is considered to be one of the possible adaptive responses to long-term survival and growth in environments outside the host. Chromosomally-encoded virulence traits in the clone comprised fimbriae, an integrative conjugative element carrying the yersiniabactin siderophore, and two different capsular loci. Compared with the ST258 clone, capsulated ST307 isolates showed higher resistance to complement-mediated killing. The acquired genetic features identified in the genome of this new emerging clone may contribute to increased persistence of ST307 in the hospital environment and shed light on its potential epidemiological success. PMID:28785421

  9. Unusual Phenotypic Features in a Patient with a Novel Splice Mutation in the GHRHR Gene

    PubMed Central

    Hilal, Latifa; Hajaji, Yassir; Vie-Luton, Marie-Pierre; Ajaltouni, Zeina; Benazzouz, Bouchra; Chana, Maha; Chraïbi, Adelmajid; Kadiri, Abdelkrim; Amselem, Serge; Sobrier, Marie-Laure

    2008-01-01

    Isolated growth hormone deficiency (IGHD) may be of genetic origin. One of the few genes involved in that condition encodes the growth hormone releasing hormone receptor (GHRHR) that, through its ligand (GHRH), plays a pivotal role in the GH synthesis and secretion by the pituitary. Our objective is to describe the phenotype of two siblings born to a consanguineous union presenting with short stature (IGHD) and Magnetic Resonance Imaging (MRI) abnormalities, and to identify the molecular basis of this condition. Our main outcome measures were clinical and endocrinological investigations, MRI of the pituitary region, study of the GHRHR gene sequence and transcripts. In both patients, the severe growth retardation (−5SD) was combined with anterior pituitary hypoplasia. In addition to these classical phenotypic features for IGHD, one of the patients had a Chiari I malformation, an arachnoid cyst, and a dysmorphic anterior pituitary. A homozygous sequence variation in the consensus donor splice site of intron 1 (IVS1 + 2T > G) of the GHRHR gene was identified in both patients. Using in vitro transcription assay, we showed that this mutation results in abnormal splicing of GHRHR transcripts. In this report, which broadens the phenotype associated with GHRHR defects, we discuss the possible role of the GHRHR in the proper development of extrapituitary structures, through a mechanism that could be direct or secondary to severe GH deficiency. PMID:18297129

  10. Sequence Complexity of Chromosome 3 in Caenorhabditis elegans

    PubMed Central

    Pierro, Gaetano

    2012-01-01

    The complexity of the nucleotide sequences in chromosome 3 of Caenorhabditis elegans (C. elegans) is studied. The complexity of these sequences is compared with that of some random sequences. Moreover, by using some parameters related to complexity, such as fractal dimension and frequency, together with the indicator matrix, a first classification of the sequences of C. elegans is given. In particular, the sequences with the highest and lowest fractal values are singled out. It is shown that the intrinsic nature of the low fractal dimension sequences has many features in common with the random sequences. PMID:22919380

  11. Formin homology 2 domains occur in multiple contexts in angiosperms

    PubMed Central

    Cvrčková, Fatima; Novotný, Marian; Pícková, Denisa; Žárský, Viktor

    2004-01-01

    Background: Involvement of conservative molecular modules and cellular mechanisms in the widely diversified processes of eukaryotic cell morphogenesis leads to the intriguing question: how do similar proteins contribute to dissimilar morphogenetic outputs? Formins (FH2 proteins) play a central part in the control of actin organization and dynamics, providing a good example of evolutionarily versatile use of a conserved protein domain in the context of a variety of lineage-specific structural and signalling interactions. Results: In order to identify possible plant-specific sequence features within the FH2 protein family, we performed a detailed analysis of angiosperm formin-related sequences available in public databases, with particular focus on the complete Arabidopsis genome and the nearly finished rice genome sequence. This has led to revision of the current annotation of half of the 22 Arabidopsis formin-related genes. Comparative analysis of the two plant genomes revealed a good conservation of the previously described two subfamilies of plant formins (Class I and Class II), as well as several subfamilies within them that appear to predate the separation of monocot and dicot plants. Moreover, a number of plant Class II formins share an additional conserved domain, related to the protein phosphatase/tensin/auxilin fold. However, considerable inter-species variability sets limits to generalization of any functional conclusions reached on a single species such as Arabidopsis. Conclusions: The plant-specific domain context of the conserved FH2 domain, as well as plant-specific features of the domain itself, may reflect distinct functional requirements in plant cells. The variability of formin structures found in plants far exceeds that known from both fungi and metazoans, suggesting a possible contribution of FH2 proteins in the evolution of the plant type of multicellularity. PMID:15256004

  12. The effect of native-language experience on the sensory-obligatory components, the P1–N1–P2 and the T-complex

    PubMed Central

    Wagner, Monica; Shafer, Valerie L.; Martin, Brett; Steinschneider, Mitchell

    2013-01-01

    The influence of native-language experience on sensory-obligatory auditory-evoked potentials (AEPs) was investigated in native-English and native-Polish listeners. AEPs were recorded to the first word in nonsense word pairs, while participants performed a syllable identification task to the second word in the pairs. Nonsense words contained phoneme sequence onsets (i.e., /pt/, /pət/, /st/ and /sət/) that occur in the Polish and English languages, with the exception that /pt/ at syllable onset is an illegal phonotactic form in English. P1–N1–P2 waveforms from fronto-central electrode sites were comparable in English and Polish listeners, even though these same English participants were unable to distinguish the nonsense words having /pt/ and /pət/ onsets. The P1–N1–P2 complex indexed the temporal characteristics of the word stimuli in the same manner for both language groups. Taken together, these findings suggest that the fronto-central P1–N1–P2 complex reflects acoustic feature processing of speech and is not significantly influenced by exposure to the phoneme sequences of the native language. In contrast, the T-complex from bilateral posterior temporal sites was found to index phonological as well as acoustic feature processing to the nonsense word stimuli. An enhanced negativity for the /pt/ cluster relative to its contrast sequence (i.e., /pət/) occurred only for the Polish listeners, suggesting that neural networks within non-primary auditory cortex may be involved in early cortical phonological processing. PMID:23643857

  13. Feature-based respiratory motion tracking in native fluoroscopic sequences for dynamic roadmaps during minimally invasive procedures in the thorax and abdomen

    NASA Astrophysics Data System (ADS)

    Wagner, Martin G.; Laeseke, Paul F.; Schubert, Tilman; Slagowski, Jordan M.; Speidel, Michael A.; Mistretta, Charles A.

    2017-03-01

    Fluoroscopic image guidance for minimally invasive procedures in the thorax and abdomen suffers from respiratory and cardiac motion, which can cause severe subtraction artifacts and inaccurate image guidance. This work proposes novel techniques for respiratory motion tracking in native fluoroscopic images as well as a model-based estimation of vessel deformation. This would allow compensation for respiratory motion during the procedure and therefore simplify the workflow for minimally invasive procedures such as liver embolization. The method first establishes dynamic motion models for both the contrast-enhanced vasculature and curvilinear background features based on a native (non-contrast) and a contrast-enhanced image sequence acquired prior to device manipulation, under free breathing conditions. The model of vascular motion is generated by applying the diffeomorphic demons algorithm to an automatic segmentation of the subtraction sequence. The model of curvilinear background features is based on feature tracking in the native sequence. The two models establish the relationship between the respiratory state, which is inferred from curvilinear background features, and the vascular morphology during that same respiratory state. During subsequent fluoroscopy, curvilinear feature detection is applied to determine the appropriate vessel mask to display. The result is a dynamic motion-compensated vessel mask superimposed on the fluoroscopic image. Quantitative evaluation of the proposed methods was performed using a digital 4D CT-phantom (XCAT), which provides realistic human anatomy including sophisticated respiratory and cardiac motion models. Four groups of datasets were generated, where different parameters (cycle length, maximum diaphragm motion and maximum chest expansion) were modified within each image sequence. Each group contains 4 datasets consisting of the initial native and contrast-enhanced sequences as well as a sequence in which the respiratory motion is tracked. The respiratory motion tracking error was between 1.00% and 1.09%. The estimated dynamic vessel masks yielded a Sørensen-Dice coefficient between 0.94 and 0.96. Finally, the accuracy of the vessel contours was measured in terms of the 99th percentile of the error, which ranged between 0.64 and 0.96 mm. The presented results show that the approach is feasible for respiratory motion tracking and compensation and could therefore considerably improve the workflow of minimally invasive procedures in the thorax and abdomen.
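
    The Sørensen-Dice coefficient used in this abstract to score the estimated vessel masks is defined as twice the overlap divided by the total size of the two masks. The sketch below shows that definition on small binary masks; the function name, the nested-list mask format, and the choice to return 1.0 for two empty masks are assumptions made for illustration.

```python
def dice_coefficient(mask_a, mask_b):
    """Sørensen-Dice coefficient of two equally sized binary masks.

    Masks are nested lists of 0/1 values. Assumption: two empty masks
    are treated as a perfect match and return 1.0.
    """
    overlap = 0
    size_a = 0
    size_b = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            size_a += 1 if a else 0
            size_b += 1 if b else 0
            overlap += 1 if (a and b) else 0
    if size_a + size_b == 0:
        return 1.0
    return 2.0 * overlap / (size_a + size_b)


estimated = [[1, 1], [0, 0]]
reference = [[1, 0], [0, 0]]
print(dice_coefficient(estimated, reference))
```

    A value of 1.0 means the two masks coincide exactly, and 0.0 means they do not overlap at all.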

  14. Automatic Spatio-Temporal Flow Velocity Measurement in Small Rivers Using Thermal Image Sequences

    NASA Astrophysics Data System (ADS)

    Lin, D.; Eltner, A.; Sardemann, H.; Maas, H.-G.

    2018-05-01

    An automatic spatio-temporal flow velocity measurement approach, using an uncooled thermal camera, is proposed in this paper. The basic principle of the method is to track visible thermal features at the water surface in thermal camera image sequences. Radiometric and geometric calibrations are first performed to remove vignetting effects in thermal imagery and to get the interior orientation parameters of the camera. An object-based unsupervised classification approach is then applied to detect the interest regions for data referencing and thermal feature tracking. Subsequently, GCPs are extracted to orient the river image sequences and local hot points are identified as tracking features. Afterwards, accurate dense tracking outputs are obtained using the pyramidal Lucas-Kanade method. To validate the accuracy potential of the method, measurements obtained from thermal feature tracking are compared with reference measurements taken by a propeller gauge. The results show great potential for automatic flow velocity measurement in small rivers using imagery from a thermal camera.
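
    The tracking step named in this abstract can be illustrated with a single-level Lucas-Kanade estimate for one window, followed by a conversion from pixel displacement to surface velocity. This is a simplified sketch, not the paper's pipeline: it omits the image pyramid, calibration and georeferencing, and the function names, the ground sampling distance and the frame interval are hypothetical values chosen for illustration.

```python
import math


def lucas_kanade_window(frame0, frame1, cx, cy, half=3):
    """Least-squares displacement (u, v) in pixels for one window.

    Frames are nested lists indexed as frame[y][x]. Spatial gradients use
    central differences on frame0. Returns None if the window has too
    little texture for the 2x2 system to be solved.
    """
    sxx = sxy = syy = sxt = syt = 0.0
    for y in range(cy - half, cy + half + 1):
        for x in range(cx - half, cx + half + 1):
            ix = (frame0[y][x + 1] - frame0[y][x - 1]) / 2.0
            iy = (frame0[y + 1][x] - frame0[y - 1][x]) / 2.0
            it = frame1[y][x] - frame0[y][x]
            sxx += ix * ix
            sxy += ix * iy
            syy += iy * iy
            sxt += ix * it
            syt += iy * it
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        return None
    # Solve [[sxx, sxy], [sxy, syy]] [u, v]^T = [-sxt, -syt]^T.
    u = (-syy * sxt + sxy * syt) / det
    v = (sxy * sxt - sxx * syt) / det
    return u, v


def surface_speed(u, v, metres_per_pixel, seconds_between_frames):
    """Convert a pixel displacement into a speed in metres per second."""
    return math.hypot(u, v) * metres_per_pixel / seconds_between_frames


# Synthetic "hot spot": a quadratic bowl shifted by (0.3, -0.2) pixels.
size, centre = 15, 7
frame_a = [[(x - centre) ** 2 + (y - centre) ** 2 for x in range(size)]
           for y in range(size)]
frame_b = [[(x - centre - 0.3) ** 2 + (y - centre + 0.2) ** 2
            for x in range(size)] for y in range(size)]
u, v = lucas_kanade_window(frame_a, frame_b, centre, centre)
print(u, v)
print(surface_speed(u, v, metres_per_pixel=0.05, seconds_between_frames=0.1))
```

    A pyramidal variant repeats this estimate from coarse to fine image scales so that displacements larger than a pixel or two can still be recovered.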

  15. Discours polemique, refutation et resolution des sequences conversationnelles (Argumentative Discourse, Refutation and Outcome of Conversational Sequences).

    ERIC Educational Resources Information Center

    Moeschler, Jacques

    1981-01-01

    Analyzes the strategies employed in terminating conversational exchanges, with particular attention to argumentative sequences. Examines the features that distinguish these sequences from those that have a transactional character, and discusses the patterns of verbal interaction attendant to negative responses. Societe Nouvelle Didier Erudition,…

  16. Coupling detrended fluctuation analysis for multiple warehouse-out behavioral sequences

    NASA Astrophysics Data System (ADS)

    Yao, Can-Zhong; Lin, Ji-Nan; Zheng, Xu-Zhou

    2017-01-01

    Interaction patterns among different warehouses could make the warehouse-out behavioral sequences less predictable. We first apply coupling detrended fluctuation analysis to the warehouse-out quantity, and find that the multivariate sequences exhibit significant coupling multifractal characteristics regardless of the types of steel products. Secondly, we track the sources of multifractal warehouse-out sequences by shuffling and surrogating the original ones, and we find that the fat-tail distribution contributes more to multifractal features than the long-term memory, regardless of the types of steel products. From the perspective of warehouse contribution, some warehouses steadily contribute more to multifractality than other warehouses. Finally, based on multiscale multifractal analysis, we propose a Hurst surface structure to investigate coupling multifractality, and show that multiple behavioral sequences exhibit significant coupling multifractal features that emerge in, and are usually restricted to, relatively large time-scale intervals.

  17. Modeling of DNA local parameters predicts encrypted architectural motifs in Xenopus laevis ribosomal gene promoter.

    PubMed

    Roux-Rouquie, M; Marilley, M

    2000-09-15

    We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X. laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed.

  18. Characterization of photosynthetic ferredoxin from the Antarctic alga Chlamydomonas sp. UWO241 reveals novel features of cold adaptation.

    PubMed

    Cvetkovska, Marina; Szyszka-Mroz, Beth; Possmayer, Marc; Pittock, Paula; Lajoie, Gilles; Smith, David R; Hüner, Norman P A

    2018-05-08

    The objective of this work was to characterize photosynthetic ferredoxin from the Antarctic green alga Chlamydomonas sp. UWO241, a key enzyme involved in distributing photosynthetic reducing power. We hypothesize that ferredoxin possesses characteristics typical of cold-adapted enzymes, namely increased structural flexibility and high activity at low temperatures, accompanied by low stability at moderate temperatures. To address this objective, we purified ferredoxin from UWO241 and characterized the temperature dependence of its enzymatic activity and protein conformation. The UWO241 ferredoxin protein, RNA, and DNA sequences were compared with homologous sequences from related organisms. We provide evidence for the duplication of the main ferredoxin gene in the UWO241 nuclear genome and the presence of two highly similar proteins. Ferredoxin from UWO241 has both high activity at low temperatures and high stability at moderate temperatures, representing a novel class of cold-adapted enzymes. Our study reveals novel insights into how photosynthesis functions in the cold. The presence of two distinct ferredoxin proteins in UWO241 could provide an adaptive advantage for survival at cold temperatures. The primary amino acid sequence of ferredoxin is highly conserved among photosynthetic species, and we suggest that subtle differences in sequence can lead to significant changes in activity at low temperatures. © 2018 The Authors. New Phytologist © 2018 New Phytologist Trust.

  19. Involvement of Two Latex-Clearing Proteins during Rubber Degradation and Insights into the Subsequent Degradation Pathway Revealed by the Genome Sequence of Gordonia polyisoprenivorans Strain VH2

    PubMed Central

    Hiessl, Sebastian; Schuldes, Jörg; Thürmer, Andrea; Halbsguth, Tobias; Bröker, Daniel; Angelov, Angel; Liebl, Wolfgang; Daniel, Rolf

    2012-01-01

    The increasing production of synthetic and natural poly(cis-1,4-isoprene) rubber leads to huge challenges in waste management. Only a few bacteria are known to degrade rubber, and little is known about the mechanism of microbial rubber degradation. The genome of Gordonia polyisoprenivorans strain VH2, which is one of the most effective rubber-degrading bacteria, was sequenced and annotated to elucidate the degradation pathway and other features of this actinomycete. The genome consists of a circular chromosome of 5,669,805 bp and a circular plasmid of 174,494 bp with average GC contents of 67.0% and 65.7%, respectively. It contains 5,110 putative protein-coding sequences, including many candidate genes responsible for rubber degradation and other biotechnically relevant pathways. Furthermore, we detected two homologues of a latex-clearing protein, which is supposed to be a key enzyme in rubber degradation. The deletion of these two genes for the first time revealed clear evidence that latex-clearing protein is essential for the microbial utilization of rubber. Based on the genome sequence, we predict a pathway for the microbial degradation of rubber which is supported by previous and current data on transposon mutagenesis, deletion mutants, applied comparative genomics, and literature search. PMID:22327575
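
    The per-replicon GC contents reported here can be combined into a single genome-wide figure by weighting each replicon by its length. The sketch below illustrates that arithmetic with the chromosome and plasmid values from the abstract; the function names are hypothetical, and `gc_fraction` is included only to show how such a percentage is derived from raw sequence.

```python
def gc_fraction(sequence):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    bases = [b for b in sequence.upper() if b in "ACGT"]
    if not bases:
        return 0.0
    return sum(b in "GC" for b in bases) / len(bases)


def combined_gc(replicons):
    """Length-weighted GC fraction for (length_bp, gc_fraction) pairs."""
    total_length = sum(length for length, _ in replicons)
    return sum(length * gc for length, gc in replicons) / total_length


# Values taken from the abstract: chromosome and plasmid of strain VH2.
replicons = [(5_669_805, 0.670), (174_494, 0.657)]
print(f"genome-wide GC content: {combined_gc(replicons):.2%}")
```

    Because the plasmid is about 3% of the total length, the combined value stays close to the chromosomal 67.0%.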

  20. The identification of cis-regulatory elements: A review from a machine learning perspective.

    PubMed

    Li, Yifeng; Chen, Chih-Yu; Kaye, Alice M; Wasserman, Wyeth W

    2015-12-01

    The majority of the human genome consists of non-coding regions that have been called junk DNA. However, recent studies have unveiled that these regions contain cis-regulatory elements, such as promoters, enhancers, silencers, insulators, etc. These regulatory elements can play crucial roles in controlling gene expressions in specific cell types, conditions, and developmental stages. Disruption to these regions could contribute to phenotype changes. Precisely identifying regulatory elements is key to deciphering the mechanisms underlying transcriptional regulation. Cis-regulatory events are complex processes that involve chromatin accessibility, transcription factor binding, DNA methylation, histone modifications, and the interactions between them. The development of next-generation sequencing techniques has allowed us to capture these genomic features in depth. Applied analysis of genome sequences for clinical genetics has increased the urgency for detecting these regions. However, the complexity of cis-regulatory events and the deluge of sequencing data require accurate and efficient computational approaches, in particular, machine learning techniques. In this review, we describe machine learning approaches for predicting transcription factor binding sites, enhancers, and promoters, primarily driven by next-generation sequencing data. Data sources are provided in order to facilitate testing of novel methods. The purpose of this review is to attract computational experts and data scientists to advance this field. Crown Copyright © 2015. Published by Elsevier Ireland Ltd. All rights reserved.

  1. Genes involved in protein metabolism of the probiotic lactic acid bacterium Lactobacillus delbrueckii UFV H2b20.

    PubMed

    Do Carmo, A P; da Silva, D F; De Oliveira, M N V; Borges, A C; De Carvalho, A F; De Moraes, C A

    2011-09-01

    A basic requirement for the prediction of the potential use of lactic acid bacteria (LAB) in the dairy industry is the identification of specific genes involved in flavour-forming pathways. The probiotic Lactobacillus delbrueckii UFV H2b20 was submitted to a genetic characterisation and phylogenetic analysis of genes involved in protein catabolism. Eight genes belonging to this system were identified, which possess a close phylogenetic relationship to representative NCFM strains, as was demonstrated for oppC and oppBII, encoding oligopeptide transport system components. PepC, PepN, and PepX might be essential for growth of LAB, probiotic or not, since the corresponding genes are always present, including in the L. delbrueckii UFV H2b20 genome. For the pepX gene, a probable link between carbohydrate catabolism and PepX expression may exist, where it is regulated by PepR1/CcpA-like, a common feature between Lactobacillus strains and also in L. delbrueckii UFV H2b20. The well conserved evolutionary history of the ilvE gene is evidence that the pathways leading to branched-chain amino acid degradation, such as isoleucine and valine, are similar among L. delbrueckii subsp. bulgaricus strains and L. delbrueckii UFV H2b20. Thus, the involvement of succinate in flavour formation can be attributed to IlvE activity. The presence of aminopeptidase G in the L. delbrueckii UFV H2b20 genome, which is absent in several strains, might improve the proteolytic activity and effectiveness. The nucleotide sequence encoding PepG revealed that it is a cysteine endopeptidase, belonging to the Peptidase C1 superfamily; sequence analysis showed 99% identity with L. delbrueckii subsp. bulgaricus ATCC 11842 pepG, whereas protein sequence analysis revealed 100% similarity with PepG from the same organism. The present study proposes a schematic model to explain how the proteolytic system of the probiotic L. delbrueckii UFV H2b20 works, based on the components identified so far.

  2. Taxonomic evaluation of selected Ganoderma species and database sequence validation

    PubMed Central

    Jargalmaa, Suldbold; Eimes, John A.; Park, Myung Soo; Park, Jae Young; Oh, Seung-Yoon

    2017-01-01

    Species in the genus Ganoderma include several ecologically important and pathogenic fungal species whose medicinal and economic value is substantial. Due to the highly similar morphological features within the Ganoderma, identification of species has relied heavily on DNA sequencing using BLAST searches, which are only reliable if the GenBank submissions are accurately labeled. In this study, we examined 113 specimens collected from 1969 to 2016 from various regions in Korea using morphological features and multigene analysis (internal transcribed spacer, translation elongation factor 1-α, and the second largest subunit of RNA polymerase II). These specimens were identified as four Ganoderma species: G. sichuanense, G. cf. adspersum, G. cf. applanatum, and G. cf. gibbosum. With the exception of G. sichuanense, these species were difficult to distinguish based solely on morphological features. However, phylogenetic analysis at three different loci yielded concordant phylogenetic information, and supported the four species distinctions with high bootstrap support. A survey of over 600 Ganoderma sequences available on GenBank revealed that 65% of sequences were either misidentified or ambiguously labeled. Here, we suggest corrected annotations for GenBank sequences based on our phylogenetic validation and provide updated global distribution patterns for these Ganoderma species. PMID:28761785

  3. Web Apollo: a web-based genomic annotation editing platform.

    PubMed

    Lee, Eduardo; Helt, Gregg A; Reese, Justin T; Munoz-Torres, Monica C; Childers, Chris P; Buels, Robert M; Stein, Lincoln; Holmes, Ian H; Elsik, Christine G; Lewis, Suzanna E

    2013-08-30

    Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world.

  4. Web Apollo: a web-based genomic annotation editing platform

    PubMed Central

    2013-01-01

    Web Apollo is the first instantaneous, collaborative genomic annotation editor available on the web. One of the natural consequences following from current advances in sequencing technology is that there are more and more researchers sequencing new genomes. These researchers require tools to describe the functional features of their newly sequenced genomes. With Web Apollo researchers can use any of the common browsers (for example, Chrome or Firefox) to jointly analyze and precisely describe the features of a genome in real time, whether they are in the same room or working from opposite sides of the world. PMID:24000942

  5. Taxonomy-aware feature engineering for microbiome classification.

    PubMed

    Oudah, Mai; Henschel, Andreas

    2018-06-15

    What is a healthy microbiome? The pursuit of this and many related questions, especially in light of the recently recognized microbial component in a wide range of diseases, has sparked a surge in metagenomic studies. These diseases are often not simply attributable to a single pathogen but rather are the result of complex ecological processes. Relatedly, the increasing DNA sequencing depth and number of samples in metagenomic case-control studies have enabled the application of powerful statistical methods, e.g. Machine Learning approaches. For the latter, the feature space is typically shaped by the relative abundances of operational taxonomic units, as determined by cost-effective phylogenetic marker gene profiles. While a substantial body of microbiome/microbiota research involves unsupervised and supervised Machine Learning, very little attention has been paid to feature selection and engineering. We here propose the first algorithm to exploit phylogenetic hierarchy (i.e. an all-encompassing taxonomy) in feature engineering for microbiota classification. The rationale is to exploit the often mono- or oligophyletic distribution of relevant (but hidden) traits by virtue of taxonomic abstraction. The algorithm is embedded in a comprehensive microbiota classification pipeline, which we applied to a diverse range of datasets, distinguishing healthy from diseased microbiota samples. We demonstrate substantial improvements over the state-of-the-art microbiota classification tools in terms of classification accuracy, regardless of the actual Machine Learning technique while using drastically reduced feature spaces. Moreover, generalized features bear great explanatory value: they provide a concise description of conditions and thus help to provide pathophysiological insights. Indeed, the automatically and reproducibly derived features are consistent with previously published domain expert analyses.
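
    The idea of taxonomic abstraction can be illustrated by summing OTU relative abundances at every ancestor in the taxonomy, so that each higher-rank taxon becomes a candidate "generalized" feature. This is only a sketch of that aggregation step under assumed inputs, not the algorithm proposed in the paper; the function name, OTU identifiers and lineage labels are all hypothetical placeholders.

```python
from collections import defaultdict


def aggregate_by_taxonomy(otu_abundance, otu_lineage):
    """Sum OTU relative abundances at every ancestor of each lineage.

    otu_abundance: {otu_id: relative abundance in one sample}
    otu_lineage:   {otu_id: [rank_1, rank_2, ...]} from root towards leaf
    Returns {"rank_1;rank_2;...": summed abundance} for every lineage prefix.
    """
    features = defaultdict(float)
    for otu, abundance in otu_abundance.items():
        lineage = otu_lineage[otu]
        for depth in range(1, len(lineage) + 1):
            features[";".join(lineage[:depth])] += abundance
    return dict(features)


# Hypothetical sample with three OTUs and placeholder taxon names.
abundance = {"otu1": 0.2, "otu2": 0.3, "otu3": 0.5}
lineage = {
    "otu1": ["phylum_A", "order_A1"],
    "otu2": ["phylum_A", "order_A2"],
    "otu3": ["phylum_B", "order_B1"],
}

for taxon, value in sorted(aggregate_by_taxonomy(abundance, lineage).items()):
    print(taxon, round(value, 3))
```

    A downstream feature selection step could then choose, for each clade, whether the aggregated taxon or its individual OTUs is the more informative feature.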

  6. iNuc-PhysChem: A Sequence-Based Predictor for Identifying Nucleosomes via Physicochemical Properties

    PubMed Central

    Feng, Peng-Mian; Ding, Chen; Zuo, Yong-Chun; Chou, Kuo-Chen

    2012-01-01

    Nucleosome positioning has important roles in key cellular processes. Although intensive efforts have been made in this area, the rules defining nucleosome positioning are still elusive and debated. In this study, we carried out a systematic comparison among the profiles of twelve DNA physicochemical features between the nucleosomal and linker sequences in the Saccharomyces cerevisiae genome. We found that nucleosomal sequences have some position-specific physicochemical features, which can be used for in-depth study of nucleosomes. Meanwhile, a new predictor, called iNuc-PhysChem, was developed for identification of nucleosomal sequences by incorporating these physicochemical properties into a 1788-D (dimensional) feature vector, which was further reduced to an 884-D vector via the IFS (incremental feature selection) procedure to optimize the feature set. It was observed by a cross-validation test on a benchmark dataset that the overall success rate achieved by iNuc-PhysChem was over 96% in identifying nucleosomal or linker sequences. As a web-server, iNuc-PhysChem is freely accessible to the public at http://lin.uestc.edu.cn/server/iNuc-PhysChem. For the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematics that were presented just for the integrity in developing the predictor. Meanwhile, for those who prefer to run predictions on their own computers, the predictor's code can be easily downloaded from the web-server. It is anticipated that iNuc-PhysChem may become a useful high throughput tool for both basic research and drug design. PMID:23144709

  7. Predicting membrane protein types by fusing composite protein sequence features into pseudo amino acid composition.

    PubMed

    Hayat, Maqsood; Khan, Asifullah

    2011-02-21

    Membrane proteins are a vital type of protein that serve as channels, receptors, and energy transducers in a cell. Prediction of membrane protein types is an important research area in bioinformatics. Knowledge of membrane protein types provides some valuable information for predicting novel examples of membrane protein types. However, classification of membrane protein types can be both time consuming and susceptible to errors due to the inherent similarity of membrane protein types. In this paper, a neural-network-based membrane protein type prediction system is proposed. Composite protein sequence representation (CPSR) is used to extract the features of a protein sequence, which includes seven feature sets: amino acid composition, sequence length, 2 gram exchange group frequency, hydrophobic group, electronic group, sum of hydrophobicity, and R-group. Principal component analysis is then employed to reduce the dimensionality of the feature vector. The probabilistic neural network (PNN), generalized regression neural network, and support vector machine (SVM) are used as classifiers. A high success rate of 86.01% is obtained using SVM for the jackknife test. In the independent dataset test, PNN yields the highest accuracy of 95.73%. These classifiers exhibit improved performance using other performance measures such as sensitivity, specificity, Matthews correlation coefficient, and F-measure. The experimental results show that the prediction performance of the proposed scheme for classifying membrane protein types is the best reported so far. This performance improvement may largely be credited to the learning capabilities of neural networks and the composite feature extraction strategy, which exploits seven different properties of protein sequences. The proposed Mem-Predictor can be accessed at http://111.68.99.218/Mem-Predictor. Copyright © 2010 Elsevier Ltd. All rights reserved.
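
    Two of the seven feature sets named above, amino acid composition and sequence length, are simple enough to sketch directly. The code below is an illustrative assumption about how such features are commonly computed, not the CPSR implementation from the paper; the function names and the example peptide are hypothetical, and the remaining five feature sets are deliberately left out because the abstract does not define them.

```python
# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the 20 relative frequencies of the standard amino acids."""
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        return [0.0] * len(AMINO_ACIDS)
    return [residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def basic_features(sequence):
    """Concatenate composition (20 values) and sequence length (1 value)."""
    return amino_acid_composition(sequence) + [float(len(sequence))]


# Hypothetical short peptide used only to exercise the functions.
features = basic_features("MKTAYIAK")
print(len(features), features[0], features[-1])
```

    Each protein, whatever its length, is thereby mapped to a fixed-length numeric vector that a classifier such as an SVM can consume.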

  8. PredPPCrys: Accurate Prediction of Sequence Cloning, Protein Production, Purification and Crystallization Propensity from Protein Sequences Using Multi-Step Heterogeneous Feature Fusion and Selection

    PubMed Central

    Wang, Huilin; Wang, Mingjun; Tan, Hao; Li, Yuan; Zhang, Ziding; Song, Jiangning

    2014-01-01

    X-ray crystallography is the primary approach to solve the three-dimensional structure of a protein. However, a major bottleneck of this method is the failure of multi-step experimental procedures to yield diffraction-quality crystals, including sequence cloning, protein material production, purification, crystallization and ultimately, structural determination. Accordingly, prediction of the propensity of a protein to successfully undergo these experimental procedures based on the protein sequence may help narrow down laborious experimental efforts and facilitate target selection. A number of bioinformatics methods based on protein sequence information have been developed for this purpose. However, our knowledge on the important determinants of propensity for a protein sequence to produce high diffraction-quality crystals remains largely incomplete. In practice, most of the existing methods display poorer performance when evaluated on larger and updated datasets. To address this problem, we constructed an up-to-date dataset as the benchmark, and subsequently developed a new approach termed ‘PredPPCrys’ using the support vector machine (SVM). Using a comprehensive set of multifaceted sequence-derived features in combination with a novel multi-step feature selection strategy, we identified and characterized the relative importance and contribution of each feature type to the prediction performance of five individual experimental steps required for successful crystallization. The resulting optimal candidate features were used as inputs to build the first-level SVM predictor (PredPPCrys I). Next, prediction outputs of PredPPCrys I were used as the input to build second-level SVM classifiers (PredPPCrys II), which led to significantly enhanced prediction performance. Benchmarking experiments indicated that our PredPPCrys method outperforms most existing procedures on both up-to-date and previous datasets. 
In addition, the predicted crystallization targets of currently non-crystallizable proteins were provided as compendium data, which are anticipated to facilitate target selection and design for the worldwide structural genomics consortium. PredPPCrys is freely available at http://www.structbioinfor.org/PredPPCrys. PMID:25148528

  9. Cognitive and neural foundations of discrete sequence skill: a TMS study.

    PubMed

    Ruitenberg, Marit F L; Verwey, Willem B; Schutter, Dennis J L G; Abrahamse, Elger L

    2014-04-01

    Executing discrete movement sequences typically involves a shift with practice from a relatively slow, stimulus-based mode to a fast mode in which performance is based on retrieving and executing entire motor chunks. The dual processor model explains the performance of (skilled) discrete key-press sequences in terms of an interplay between a cognitive processor and a motor system. In the present study, we tested and confirmed the core assumptions of this model at the behavioral level. In addition, we explored the involvement of the pre-supplementary motor area (pre-SMA) in discrete sequence skill by applying inhibitory 20 min 1-Hz off-line repetitive transcranial magnetic stimulation (rTMS). Based on previous work, we predicted pre-SMA involvement in the selection/initiation of motor chunks, and this was confirmed by our results. The pre-SMA was further observed to be more involved in more complex than in simpler sequences, while no evidence was found for pre-SMA involvement in direct stimulus-response translations or associative learning processes. In conclusion, support is provided for the dual processor model, and for pre-SMA involvement in the initiation of motor chunks. Copyright © 2014 Elsevier Ltd. All rights reserved.

  10. Features of the bronchial bacterial microbiome associated with atopy, asthma, and responsiveness to inhaled corticosteroid treatment.

    PubMed

    Durack, Juliana; Lynch, Susan V; Nariya, Snehal; Bhakta, Nirav R; Beigelman, Avraham; Castro, Mario; Dyer, Anne-Marie; Israel, Elliot; Kraft, Monica; Martin, Richard J; Mauger, David T; Rosenberg, Sharon R; Sharp-King, Tonya; White, Steven R; Woodruff, Prescott G; Avila, Pedro C; Denlinger, Loren C; Holguin, Fernando; Lazarus, Stephen C; Lugogo, Njira; Moore, Wendy C; Peters, Stephen P; Que, Loretta; Smith, Lewis J; Sorkness, Christine A; Wechsler, Michael E; Wenzel, Sally E; Boushey, Homer A; Huang, Yvonne J

    2017-07-01

    Compositional differences in the bronchial bacterial microbiota have been associated with asthma, but it remains unclear whether the findings are attributable to asthma, to aeroallergen sensitization, or to inhaled corticosteroid treatment. We sought to compare the bronchial bacterial microbiota in adults with steroid-naive atopic asthma, subjects with atopy but no asthma, and nonatopic healthy control subjects and to determine relationships of the bronchial microbiota to phenotypic features of asthma. Bacterial communities in protected bronchial brushings from 42 atopic asthmatic subjects, 21 subjects with atopy but no asthma, and 21 healthy control subjects were profiled by using 16S rRNA gene sequencing. Bacterial composition and community-level functions inferred from sequence profiles were analyzed for between-group differences. Associations with clinical and inflammatory variables were examined, including markers of type 2-related inflammation and change in airway hyperresponsiveness after 6 weeks of fluticasone treatment. The bronchial microbiome differed significantly among the 3 groups. Asthmatic subjects were uniquely enriched in members of the Haemophilus, Neisseria, Fusobacterium, and Porphyromonas species and the Sphingomonodaceae family and depleted in members of the Mogibacteriaceae family and Lactobacillales order. Asthma-associated differences in predicted bacterial functions included involvement of amino acid and short-chain fatty acid metabolism pathways. Subjects with type 2-high asthma harbored significantly lower bronchial bacterial burden. Distinct changes in specific microbiota members were seen after fluticasone treatment. Steroid responsiveness was linked to differences in baseline compositional and functional features of the bacterial microbiome. Even in subjects with mild steroid-naive asthma, differences in the bronchial microbiome are associated with immunologic and clinical features of the disease. 
The specific differences identified suggest possible microbiome targets for future approaches to asthma treatment or prevention. Copyright © 2016 American Academy of Allergy, Asthma & Immunology. Published by Elsevier Inc. All rights reserved.

  11. A Feature-Based Approach to Modeling Protein–DNA Interactions

    PubMed Central

    Segal, Eran

    2008-01-01

    Transcription factor (TF) binding to its DNA target site is a fundamental regulatory interaction. The most common model used to represent TF binding specificities is a position specific scoring matrix (PSSM), which assumes independence between binding positions. However, in many cases, this simplifying assumption does not hold. Here, we present feature motif models (FMMs), a novel probabilistic method for modeling TF–DNA interactions, based on log-linear models. Our approach uses sequence features to represent TF binding specificities, where each feature may span multiple positions. We develop the mathematical formulation of our model and devise an algorithm for learning its structural features from binding site data. We also developed a discriminative motif finder, which discovers de novo FMMs that are enriched in target sets of sequences compared to background sets. We evaluate our approach on synthetic data and on the widely used TF chromatin immunoprecipitation (ChIP) dataset of Harbison et al. We then apply our algorithm to high-throughput TF ChIP data from mouse and human, reveal sequence features that are present in the binding specificities of mouse and human TFs, and show that FMMs explain TF binding significantly better than PSSMs. Our FMM learning and motif finder software are available at http://genie.weizmann.ac.il/. PMID:18725950
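
    The independence assumption mentioned above is visible in how a PSSM scores a site: the total score is a plain sum of per-position log-odds terms, so no term can depend on two positions at once. The sketch below builds such a matrix from aligned sites and scores a sequence; it illustrates the baseline PSSM only, not the feature motif model. The binding sites, the pseudocount and the uniform background are hypothetical choices.

```python
import math


def build_pssm(sites, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Per-position log2-odds scores from equal-length aligned sites."""
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(alphabet)
    pssm = []
    for pos in range(length):
        column = [site[pos] for site in sites]
        pssm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pssm


def score(pssm, sequence):
    """Sum of independent per-position scores (the PSSM assumption)."""
    return sum(column[base] for column, base in zip(pssm, sequence))


# Hypothetical aligned binding sites.
sites = ["TATAAT", "TATGAT", "TACAAT", "TATAAT"]
pssm = build_pssm(sites)
print(score(pssm, "TATAAT"), score(pssm, "GCGCGC"))
```

    A model whose features span several positions can, by contrast, reward or penalise specific combinations of bases, which this additive score cannot express.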

  12. Quantiprot - a Python package for quantitative analysis of protein sequences.

    PubMed

    Konopka, Bogumił M; Marciniak, Marta; Dyrka, Witold

    2017-07-17

    The field of protein sequence analysis is dominated by tools rooted in substitution matrices and alignments. A complementary approach is provided by methods of quantitative characterization. A major advantage of the approach is that quantitative properties define a multidimensional solution space, where sequences can be related to each other and differences can be meaningfully interpreted. Quantiprot is a software package in Python, which provides a simple and consistent interface to multiple methods for quantitative characterization of protein sequences. The package can be used to calculate dozens of characteristics directly from sequences or using physico-chemical properties of amino acids. Besides basic measures, Quantiprot performs quantitative analysis of recurrence and determinism in the sequence, calculates distribution of n-grams and computes the Zipf's law coefficient. We propose three main fields of application of the Quantiprot package. First, quantitative characteristics can be used in alignment-free similarity searches, and in clustering of large and/or divergent sequence sets. Second, a feature space defined by quantitative properties can be used in comparative studies of protein families and organisms. Third, the feature space can be used for evaluating generative models, where a large number of sequences generated by the model can be compared to actually observed sequences.
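
    One of the measures listed above, the distribution of n-grams, can be sketched with the standard library alone. The code below does not use or reproduce the Quantiprot API; the function name and the example sequence are hypothetical and serve only to show what an n-gram distribution of a protein sequence looks like.

```python
from collections import Counter


def ngram_distribution(sequence, n=2):
    """Relative frequency of each overlapping n-gram in a sequence."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    grams = [sequence[i:i + n] for i in range(len(sequence) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


# Hypothetical protein fragment.
distribution = ngram_distribution("MKKLLKKA", n=2)
for gram, freq in sorted(distribution.items()):
    print(gram, round(freq, 3))
```

    Vectors of such frequencies have the same dimensionality for every sequence, which is what makes alignment-free comparison and clustering possible.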

  13. DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues.

    PubMed

    Ma, Xin; Guo, Jing; Sun, Xiao

    2016-01-01

    DNA-binding proteins are fundamentally important in cellular processes. Several computational-based methods have been developed to improve the prediction of DNA-binding proteins in previous years. However, insufficient work has been done on the prediction of DNA-binding proteins from protein sequence information. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. The comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during the model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. High prediction accuracy and performance comparisons with previous research suggested that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/.
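
    The four figures reported for DNABP (accuracy, sensitivity, specificity and Matthews correlation coefficient) are all functions of a binary confusion matrix. The sketch below shows the standard definitions; the counts are hypothetical and are not the confusion matrix behind the published results.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Accuracy, sensitivity, specificity and MCC from confusion counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "mcc": mcc,
    }


# Hypothetical counts: 100 binding and 100 non-binding proteins.
metrics = binary_metrics(tp=80, fp=10, tn=90, fn=20)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

    Unlike accuracy, the MCC stays informative when the two classes differ in size, which is one reason it is reported alongside the other measures.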

  14. Mutations of E3 Ubiquitin Ligase Cbl Family Members Constitute a Novel Common Pathogenic Lesion in Myeloid Malignancies

    PubMed Central

    Makishima, Hideki; Cazzolli, Heather; Szpurka, Hadrian; Dunbar, Andrew; Tiu, Ramon; Huh, Jungwon; Muramatsu, Hideki; O'Keefe, Christine; Hsi, Eric; Paquette, Ronald L.; Kojima, Seiji; List, Alan F.; Sekeres, Mikkael A.; McDevitt, Michael A.; Maciejewski, Jaroslaw P.

    2009-01-01

    Purpose Acquired somatic uniparental disomy (UPD) is commonly observed in myelodysplastic syndromes (MDS), myelodysplastic/myeloproliferative neoplasms (MDS/MPN), or secondary acute myelogenous leukemia (sAML) and may point toward genes harboring mutations. Recurrent UPD11q led to identification of homozygous mutations in c-Cbl, an E3 ubiquitin ligase involved in attenuation of proliferative signals transduced by activated receptor tyrosine kinases. We examined the role and frequency of Cbl gene family mutations in MPN and related conditions. Methods We applied high-density SNP-A karyotyping to identify loss of heterozygosity of 11q in 442 patients with MDS, MDS/MPN, MPN, sAML evolved from these conditions, and primary AML. We sequenced c-Cbl, Cbl-b, and Cbl-c in patients with or without corresponding UPD or deletions and correlated mutational status with clinical features and outcomes. Results We identified c-Cbl mutations in 5% and 9% of patients with chronic myelomonocytic leukemia (CMML) and sAML, and also in CML blast crisis and juvenile myelomonocytic leukemia (JMML). Most mutations were homozygous and affected c-Cbl; mutations in Cbl-b were also found in patients with similar clinical features. Patients with Cbl family mutations showed poor prognosis, with a median survival of 5 months. Pathomorphologic features included monocytosis, monocytoid blasts, aberrant expression of phosphoSTAT5, and c-kit overexpression. Serial studies showed acquisition of c-Cbl mutations during malignant evolution. Conclusion Mutations in the Cbl family RING finger domain or linker sequence constitute important pathogenic lesions associated with not only preleukemic CMML, JMML, and other MPN, but also progression to AML, suggesting that impairment of degradation of activated tyrosine kinases constitutes an important cancer mechanism. PMID:19901108

  15. Predicting disulfide connectivity from protein sequence using multiple sequence feature vectors and secondary structure.

    PubMed

    Song, Jiangning; Yuan, Zheng; Tan, Hao; Huber, Thomas; Burrage, Kevin

    2007-12-01

    Disulfide bonds are primary covalent crosslinks between two cysteine residues in proteins that play critical roles in stabilizing the protein structures and are commonly found in extracytoplasmic or secreted proteins. In protein folding prediction, the localization of disulfide bonds can greatly reduce the search in conformational space. Therefore, there is a great need to develop computational methods capable of accurately predicting disulfide connectivity patterns in proteins that could have potentially important applications. We have developed a novel method to predict disulfide connectivity patterns from protein primary sequence, using a support vector regression (SVR) approach based on multiple sequence feature vectors and predicted secondary structure by the PSIPRED program. The results indicate that our method could achieve a prediction accuracy of 74.4% and 77.9%, respectively, when averaged on proteins with two to five disulfide bridges using 4-fold cross-validation, measured on the protein and cysteine pair on a well-defined non-homologous dataset. We assessed the effects of different sequence encoding schemes on the prediction performance of disulfide connectivity. It has been shown that the sequence encoding scheme based on multiple sequence feature vectors coupled with predicted secondary structure can significantly improve the prediction accuracy, thus enabling our method to outperform most other currently available predictors. Our work provides a complementary approach to the current algorithms that should be useful in computationally assigning disulfide connectivity patterns and helps in the annotation of protein sequences generated by large-scale whole-genome projects. The prediction web server and Supplementary Material are accessible at http://foo.maths.uq.edu.au/~huber/disulfide

  16. Distillation and purification of symmetric entangled Gaussian states

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Fiurasek, Jaromir

    2010-10-15

    We propose an entanglement distillation and purification scheme for symmetric two-mode entangled Gaussian states that allows one to asymptotically extract a pure entangled Gaussian state from any input entangled symmetric Gaussian state. The proposed scheme is a modified and extended version of the entanglement distillation protocol originally developed by Browne et al. [Phys. Rev. A 67, 062320 (2003)]. A key feature of the present protocol is that it utilizes a two-copy degaussification procedure that involves a Mach-Zehnder interferometer with single-mode non-Gaussian filters inserted in its two arms. The required non-Gaussian filtering operations can be implemented by coherently combining two sequences of single-photon addition and subtraction operations.

  17. CyanoClust: comparative genome resources of cyanobacteria and plastids.

    PubMed

    Sasaki, Naobumi V; Sato, Naoki

    2010-01-01

    Cyanobacteria, which perform oxygen-evolving photosynthesis as do chloroplasts of plants and algae, are one of the best-studied prokaryotic phyla and one from which many representative genomes have been sequenced. Lack of a suitable comparative genomic database has been a problem in cyanobacterial genomics because many proteins involved in physiological functions such as photosynthesis and nitrogen fixation are not catalogued in commonly used databases, such as Clusters of Orthologous Proteins (COG). CyanoClust is a database of homolog groups in cyanobacteria and plastids that are produced by the program Gclust. We have developed a web-server system for the protein homology database featuring cyanobacteria and plastids. Database URL: http://cyanoclust.c.u-tokyo.ac.jp/.

  18. The 11 Micron Emissions of Carbon Stars

    NASA Technical Reports Server (NTRS)

    Goebel, J. H.; Cheeseman, P.; Gerbault, F.

    1995-01-01

    A new classification scheme of the IRAS LRS carbon stars is presented. It comprises the separation of 718 probable carbon stars into 12 distinct self-similar spectral groupings. Continuum temperatures are assigned and range from 470 to 5000 K. Three distinct dust species are identifiable: SiC, alpha:C-H, and MgS. In addition to the narrow 11 + micron emission feature that is commonly attributed to SiC, a broad 11 + micron emission feature, that is correlated with the 8.5 and 7.7 micron features, is found and attributed to alpha:C-H. SiC and alpha:C-H band strengths are found to correlate with the temperature progression among the Classes. We find a spectral sequence of Classes that reflects the carbon star evolutionary sequence of spectral types, or alternatively developmental sequences of grain condensation in carbon-rich circumstellar shells. If decreasing temperature corresponds to increasing evolution, then decreasing temperature corresponds to increasing C/O resulting in increasing amounts of carbon rich dust, namely alpha:C-H. If decreasing the temperature corresponds to a grain condensation sequence, then heterogeneous, or induced nucleation scenarios are supported. SiC grains precede alpha:C-H and form the nuclei for the condensation of the latter material. At still lower temperatures, MgS appears to be quite prevalent. No 11.3 micron PAH features are identified in any of the 718 carbon stars. However, one of the coldest objects, IRAS 15048-5702, and a few others, displays an 11.9 micron emission feature characteristic of laboratory samples of coronene. That feature corresponds to the C-H out of plane deformation mode of aromatic hydrocarbon. This band indicates the presence of unsaturated, sp(sup 3), hydrocarbon bonds that may subsequently evolve into saturated bonds, sp(sup 2), if, and when, the star enters the planetary nebulae phase of stellar evolution. 
The effusion of hydrogen from the hydrocarbon grain results in the evolution in wavelength of this 11.9 micron emission feature to the 11.3 micron feature.

  19. The 11 Micron Emissions of Carbon Stars

    NASA Technical Reports Server (NTRS)

    Goebel, J. H.; Cheeseman, P.; Gerbault, F.

    1995-01-01

    A new classification scheme of the IRAS LRS carbon stars is presented. It comprises the separation of 718 probable carbon stars into 12 distinct self-similar spectral groupings. Continuum temperatures are assigned and range from 470 to 5000 K. Three distinct dust species are identifiable: SiC, alpha:C-H, and MgS. In addition to the narrow 11 + micron emission feature that is commonly attributed to SiC, a broad 11 + micron emission feature, that is correlated with the 8.5 and 7.7 micron features, is found and attributed to alpha:C-H. SiC and alpha:C-H band strengths are found to correlate with the temperature progression among the Classes. We find a spectral sequence of Classes that reflects the carbon star evolutionary sequence of spectral types, or alternatively developmental sequences of grain condensation in carbon-rich circumstellar shells. If decreasing temperature corresponds to increasing evolution, then decreasing temperature corresponds to increasing C/O resulting in increasing amounts of carbon rich dust, namely alpha:C-H. If decreasing the temperature corresponds to a grain condensation sequence, then heterogeneous, or induced nucleation scenarios are supported. SiC grains precede alpha:C-H and form the nuclei for the condensation of the latter material. At still lower temperatures, MgS appears to be quite prevalent. No 11.3 micron PAH features are identified in any of the 718 carbon stars. However, one of the coldest objects, IRAS 15048-5702, and a few others, displays an 11.9 micron emission feature characteristic of laboratory samples of coronene. That feature corresponds to the C-H out of plane deformation mode of aromatic hydrocarbon. This band indicates the presence of unsaturated, sp(sup 3), hydrocarbon bonds that may subsequently evolve into saturated bonds, sp(sup 2), if, and when, the star enters the planetary nebulae phase of stellar evolution.
The effusion of hydrogen from the hydrocarbon grain results in the evolution in wavelength of this 11.9 micron emission feature to the 11.3 micron feature.

  20. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences.

    PubMed

    Mizianty, Marcin J; Kurgan, Lukasz

    2009-12-13

    Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes.
The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/.

  1. Modular prediction of protein structural classes from sequences of twilight-zone identity with predicting sequences

    PubMed Central

    2009-01-01

    Background Knowledge of structural class is used by numerous methods for identification of structural/functional characteristics of proteins and could be used for the detection of remote homologues, particularly for chains that share twilight-zone similarity. In contrast to existing sequence-based structural class predictors, which target four major classes and which are designed for high identity sequences, we predict seven classes from sequences that share twilight-zone identity with the training sequences. Results The proposed MODular Approach to Structural class prediction (MODAS) method is unique as it allows for selection of any subset of the classes. MODAS is also the first to utilize a novel, custom-built feature-based sequence representation that combines evolutionary profiles and predicted secondary structure. The features quantify information relevant to the definition of the classes including conservation of residues and arrangement and number of helix/strand segments. Our comprehensive design considers 8 feature selection methods and 4 classifiers to develop Support Vector Machine-based classifiers that are tailored for each of the seven classes. Tests on 5 twilight-zone and 1 high-similarity benchmark datasets and comparison with over two dozen modern competing predictors show that MODAS provides the best overall accuracy that ranges between 80% and 96.7% (83.5% for the twilight-zone datasets), depending on the dataset. This translates into 19% and 8% error rate reduction when compared against the best performing competing method on the two largest datasets. The proposed predictor provides accurate predictions at 58% accuracy for the membrane proteins class, which is not considered by the majority of existing methods, even though this class accounts for only 2% of the data. Our predictive model is analyzed to demonstrate how and why the input features are associated with the corresponding classes.
Conclusions The improved predictions stem from the novel features that express collocation of the secondary structure segments in the protein sequence and that combine evolutionary and secondary structure information. Our work demonstrates that conservation and arrangement of the secondary structure segments predicted along the protein chain can successfully predict structural classes which are defined based on the spatial arrangement of the secondary structures. A web server is available at http://biomine.ece.ualberta.ca/MODAS/. PMID:20003388

  2. Indel PDB: a database of structural insertions and deletions derived from sequence alignments of closely related proteins.

    PubMed

    Hsing, Michael; Cherkasov, Artem

    2008-06-25

    Insertions and deletions (indels) represent a common type of sequence variations, which are less studied and pose many important biological questions. Recent research has shown that the presence of sizable indels in protein sequences may be indicative of protein essentiality and their role in protein interaction networks. Examples of utilization of indels for structure-based drug design have also been recently demonstrated. Nonetheless many structural and functional characteristics of indels remain less researched or unknown. We have created a web-based resource, Indel PDB, representing a structural database of insertions/deletions identified from the sequence alignments of highly similar proteins found in the Protein Data Bank (PDB). Indel PDB utilized large amounts of available structural information to characterize 1-, 2- and 3-dimensional features of indel sites. Indel PDB contains 117,266 non-redundant indel sites extracted from 11,294 indel-containing proteins. Unlike loop databases, Indel PDB features more indel sequences with secondary structures including alpha-helices and beta-sheets in addition to loops. The insertion fragments have been characterized by their sequences, lengths, locations, secondary structure composition, solvent accessibility, protein domain association and three dimensional structures. By utilizing the data available in Indel PDB, we have studied and presented here several sequence and structural features of indels. We anticipate that Indel PDB will not only enable future functional studies of indels, but will also assist protein modeling efforts and identification of indel-directed drug binding sites.

  3. Functional Testing of SLC26A4 Variants—Clinical and Molecular Analysis of a Cohort with Enlarged Vestibular Aqueduct from Austria

    PubMed Central

    Bernardinelli, Emanuele; Nofziger, Charity; Patsch, Wolfgang; Rasp, Gerd; Paulmichl, Markus; Dossena, Silvia

    2018-01-01

    The prevalence and spectrum of sequence alterations in the SLC26A4 gene, which codes for the anion exchanger pendrin, are population-specific and account for at least 50% of cases of non-syndromic hearing loss associated with an enlarged vestibular aqueduct. A cohort of nineteen patients from Austria with hearing loss and a radiological alteration of the vestibular aqueduct underwent Sanger sequencing of SLC26A4 and GJB2, coding for connexin 26. The pathogenicity of sequence alterations detected was assessed by determining ion transport and molecular features of the corresponding SLC26A4 protein variants. In this group, four uncharacterized sequence alterations within the SLC26A4 coding region were found. Three of these lead to protein variants with abnormal functional and molecular features, while one should be considered with no pathogenic potential. Pathogenic SLC26A4 sequence alterations were only found in 12% of patients. SLC26A4 sequence alterations commonly found in other Caucasian populations were not detected. This survey represents the first study on the prevalence and spectrum of SLC26A4 sequence alterations in an Austrian cohort and further suggests that genetic testing should always be integrated with functional characterization and determination of the molecular features of protein variants in order to unequivocally identify or exclude a causal link between genotype and phenotype. PMID:29320412

  4. Noise-robust speech recognition through auditory feature detection and spike sequence decoding.

    PubMed

    Schafer, Phillip B; Jin, Dezhe Z

    2014-03-01

    Speech recognition in noisy conditions is a major challenge for computer systems, but the human brain performs it routinely and accurately. Automatic speech recognition (ASR) systems that are inspired by neuroscience can potentially bridge the performance gap between humans and machines. We present a system for noise-robust isolated word recognition that works by decoding sequences of spikes from a population of simulated auditory feature-detecting neurons. Each neuron is trained to respond selectively to a brief spectrotemporal pattern, or feature, drawn from the simulated auditory nerve response to speech. The neural population conveys the time-dependent structure of a sound by its sequence of spikes. We compare two methods for decoding the spike sequences--one using a hidden Markov model-based recognizer, the other using a novel template-based recognition scheme. In the latter case, words are recognized by comparing their spike sequences to template sequences obtained from clean training data, using a similarity measure based on the length of the longest common sub-sequence. Using isolated spoken digits from the AURORA-2 database, we show that our combined system outperforms a state-of-the-art robust speech recognizer at low signal-to-noise ratios. Both the spike-based encoding scheme and the template-based decoding offer gains in noise robustness over traditional speech recognition methods. Our system highlights potential advantages of spike-based acoustic coding and provides a biologically motivated framework for robust ASR development.
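
    The template-based decoding described in this abstract relies on a similarity measure built from the length of the longest common subsequence (LCS). A minimal sketch of that idea follows. The encoding of a spike sequence as a list of neuron labels, the normalisation by the longer sequence, and the names `lcs_length`, `lcs_similarity` and `recognize` are all assumptions made for illustration; the authors' exact measure is not given in the abstract.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def lcs_similarity(a, b):
    """LCS length divided by the longer length (an assumed normalisation)."""
    if not a or not b:
        return 0.0
    return lcs_length(a, b) / max(len(a), len(b))


def recognize(spikes, templates):
    """Return the word whose template is most similar to the spike sequence."""
    return max(templates, key=lambda word: lcs_similarity(spikes, templates[word]))


if __name__ == "__main__":
    # Invented neuron labels standing in for spike sequences.
    templates = {"one": [3, 1, 4, 2, 5], "two": [9, 8, 7]}
    print(recognize([3, 1, 4, 2], templates))
```

    Because a subsequence need not be contiguous, spurious or missing spikes caused by noise lower the score only gradually, which fits the noise robustness the abstract reports for the template-based scheme.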

  5. The mitochondrial DNA 10197 G > A mutation causes MELAS/Leigh overlap syndrome presenting with acute auditory agnosia.

    PubMed

    Leng, Yinglin; Liu, Yuhe; Fang, Xiaojing; Li, Yao; Yu, Lei; Yuan, Yun; Wang, Zhaoxia

    2015-04-01

    Mitochondrial encephalomyopathy with lactic acidosis and stroke-like episodes/Leigh (MELAS/LS) overlap syndrome is a mitochondrial disorder subtype with clinical and magnetic resonance imaging (MRI) features that are characteristic of both MELAS and Leigh syndrome (LS). Here, we report an MELAS/LS case presenting with cortical deafness and seizures. Cranial MRI revealed multiple lesions involving bilateral temporal lobes, the basal ganglia and the brainstem, which conformed to neuroimaging features of both MELAS and LS. Whole mitochondrial DNA (mtDNA) sequencing and PCR-RFLP revealed a de novo heteroplasmic m.10197 G > A mutation in the NADH dehydrogenase subunit 3 gene (ND3), which was predicted to cause an alanine to threonine substitution at amino acid 47. Although the mtDNA m.10197 G > A mutation has been reported in association with LS, Leber hereditary optic neuropathy and dystonia, it has never been linked with MELAS/LS overlap syndrome. Our patient therefore expands the phenotypic spectrum of the mtDNA m.10197 G > A mutation.

  6. HCMM imagery for the discrimination of rock types, the detection of geothermal energy sources and the assessment of soil moisture content in western Queensland and adjacent parts of New South Wales and South Australia

    NASA Technical Reports Server (NTRS)

    Cole, M. M. (Principal Investigator)

    1980-01-01

    The author has identified the following significant results. Day-visible and day-IR imagery of northwest Queensland show that large scale geological features like the Mitakoodi anticlinorium, which involves rocks of contrasting lithological type, can be delineated. North of Cloncurry, the contrasting lithological units of the Knapdale quartzite and bedded argillaceous limestones within the Proterozoic Corella sequence are clearly delineated in the area of the Dugald River Lode. Major structural features in the Mount Isa area are revealed on the day-visible cover, which provides similar but less detailed information than the LANDSAT imagery. The day-IR cover provides less additional information for areas of outcropping bedrock than had been expected. Initial studies of the day-IR and night-IR cover for parts of South Australia suggest that they contain additional information on geology compared with day-visible cover.

  7. TFBSshape: a motif database for DNA shape features of transcription factor binding sites.

    PubMed

    Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W; Gordân, Raluca; Rohs, Remo

    2014-01-01

    Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein-DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone.

  8. TFBSshape: a motif database for DNA shape features of transcription factor binding sites

    PubMed Central

    Yang, Lin; Zhou, Tianyin; Dror, Iris; Mathelier, Anthony; Wasserman, Wyeth W.; Gordân, Raluca; Rohs, Remo

    2014-01-01

    Transcription factor binding sites (TFBSs) are most commonly characterized by the nucleotide preferences at each position of the DNA target. Whereas these sequence motifs are quite accurate descriptions of DNA binding specificities of transcription factors (TFs), proteins recognize DNA as a three-dimensional object. DNA structural features refine the description of TF binding specificities and provide mechanistic insights into protein–DNA recognition. Existing motif databases contain extensive nucleotide sequences identified in binding experiments based on their selection by a TF. To utilize DNA shape information when analysing the DNA binding specificities of TFs, we developed a new tool, the TFBSshape database (available at http://rohslab.cmb.usc.edu/TFBSshape/), for calculating DNA structural features from nucleotide sequences provided by motif databases. The TFBSshape database can be used to generate heat maps and quantitative data for DNA structural features (i.e., minor groove width, roll, propeller twist and helix twist) for 739 TF datasets from 23 different species derived from the motif databases JASPAR and UniPROBE. As demonstrated for the basic helix-loop-helix and homeodomain TF families, our TFBSshape database can be used to compare, qualitatively and quantitatively, the DNA binding specificities of closely related TFs and, thus, uncover differential DNA binding specificities that are not apparent from nucleotide sequence alone. PMID:24214955

  9. Object Tracking Using Adaptive Covariance Descriptor and Clustering-Based Model Updating for Visual Surveillance

    PubMed Central

    Qin, Lei; Snoussi, Hichem; Abdallah, Fahed

    2014-01-01

    We propose a novel approach for tracking an arbitrary object in video sequences for visual surveillance. The first contribution of this work is an automatic feature extraction method that is able to extract compact discriminative features from a feature pool before computing the region covariance descriptor. As the feature extraction method is adaptive to a specific object of interest, we refer to the region covariance descriptor computed using the extracted features as the adaptive covariance descriptor. The second contribution is to propose a weakly supervised method for updating the object appearance model during tracking. The method performs a mean-shift clustering procedure among the tracking result samples accumulated during a period of time and selects a group of reliable samples for updating the object appearance model. As such, the object appearance model is kept up-to-date and is prevented from contamination even in case of tracking mistakes. We conducted comparative experiments on real-world video sequences, which confirmed the effectiveness of the proposed approaches. The tracking system that integrates the adaptive covariance descriptor and the clustering-based model updating method accomplished stable object tracking on challenging video sequences. PMID:24865883

  10. A common class of transcripts with 5'-intron depletion, distinct early coding sequence features, and N1-methyladenosine modification.

    PubMed

    Cenik, Can; Chua, Hon Nian; Singh, Guramrit; Akef, Abdalla; Snyder, Michael P; Palazzo, Alexander F; Moore, Melissa J; Roth, Frederick P

    2017-03-01

    Introns are found in 5' untranslated regions (5'UTRs) for 35% of all human transcripts. These 5'UTR introns are not randomly distributed: Genes that encode secreted, membrane-bound and mitochondrial proteins are less likely to have them. Curiously, transcripts lacking 5'UTR introns tend to harbor specific RNA sequence elements in their early coding regions. To model and understand the connection between coding-region sequence and 5'UTR intron status, we developed a classifier that can predict 5'UTR intron status with >80% accuracy using only sequence features in the early coding region. Thus, the classifier identifies transcripts with 5'-proximal-intron-minus-like-coding regions ("5IM" transcripts). Unexpectedly, we found that the early coding sequence features defining 5IM transcripts are widespread, appearing in 21% of all human RefSeq transcripts. The 5IM class of transcripts is enriched for non-AUG start codons, more extensive secondary structure both preceding the start codon and near the 5' cap, greater dependence on eIF4E for translation, and association with ER-proximal ribosomes. 5IM transcripts are bound by the exon junction complex (EJC) at noncanonical 5' proximal positions. Finally, N1-methyladenosines are specifically enriched in the early coding regions of 5IM transcripts. Taken together, our analyses point to the existence of a distinct 5IM class comprising ∼20% of human transcripts. This class is defined by depletion of 5' proximal introns, presence of specific RNA sequence features associated with low translation efficiency, N1-methyladenosines in the early coding region, and enrichment for noncanonical binding by the EJC. © 2017 Cenik et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society.

  11. Supervised Learning for Detection of Duplicates in Genomic Sequence Databases.

    PubMed

    Chen, Qingyu; Zobel, Justin; Zhang, Xiuzhen; Verspoor, Karin

    2016-01-01

    First identified as an issue in 1996, duplication in biological databases introduces redundancy and even leads to inconsistency when contradictory information appears. The amount of data makes purely manual de-duplication impractical, and existing automatic systems cannot detect duplicates as precisely as can experts. Supervised learning has the potential to address such problems by building automatic systems that learn from expert curation to detect duplicates precisely and efficiently. While machine learning is a mature approach in other duplicate detection contexts, it has seen only preliminary application in genomic sequence databases. We developed and evaluated a supervised duplicate detection method based on an expert curated dataset of duplicates, containing over one million pairs across five organisms derived from genomic sequence databases. We selected 22 features to represent distinct attributes of the database records, and developed a binary model and a multi-class model. Both models achieve promising performance; under cross-validation, the binary model had over 90% accuracy in each of the five organisms, while the multi-class model maintains high accuracy and is more robust in generalisation. We performed an ablation study to quantify the impact of different sequence record features, finding that features derived from meta-data, sequence identity, and alignment quality impact performance most strongly. The study demonstrates machine learning can be an effective additional tool for de-duplication of genomic sequence databases. All data are available as described in the supplementary material.

  12. Comparative genome analysis of the candidate functional starter culture strains Lactobacillus fermentum 222 and Lactobacillus plantarum 80 for controlled cocoa bean fermentation processes.

    PubMed

    Illeghems, Koen; De Vuyst, Luc; Weckx, Stefan

    2015-10-12

    Lactobacillus fermentum 222 and Lactobacillus plantarum 80, isolates from a spontaneous Ghanaian cocoa bean fermentation process, proved to be interesting functional starter culture strains for cocoa bean fermentations. Lactobacillus fermentum 222 is a thermotolerant strain, able to dominate the fermentation process, thereby converting citrate and producing mannitol. Lactobacillus plantarum 80 is an acid-tolerant and facultative heterofermentative strain that is competitive during cocoa bean fermentation processes. In this study, whole-genome sequencing and comparative genome analysis were used to investigate the mechanisms of these strains to dominate the cocoa bean fermentation process. Through functional annotation and analysis of the high-coverage contigs obtained through 454 pyrosequencing, plantaricin production was predicted for L. plantarum 80. For L. fermentum 222, genes encoding a complete arginine deiminase pathway were attributed. Further, in-depth functional analysis revealed the capacities of these strains associated with carbohydrate and amino acid metabolism, such as the ability to use alternative external electron acceptors, the presence of an extended pyruvate metabolism, and the occurrence of several amino acid conversion pathways. A comparative genome sequence analysis using publicly available genome sequences of strains of the species L. plantarum and L. fermentum revealed unique features of both strains studied. Indeed, L. fermentum 222 possessed genes encoding additional citrate transporters and enzymes involved in amino acid conversions, whereas L. plantarum 80 is the only member of this species that harboured a gene cluster involved in uptake and consumption of fructose and/or sorbose. In-depth genome sequence analysis of the candidate functional starter culture strains L. fermentum 222 and L. plantarum 80 revealed their metabolic capacities, niche adaptations and functionalities that enable them to dominate the cocoa bean fermentation process. Further, these results offered insights into the cocoa bean fermentation ecosystem as a whole and will facilitate the selection of appropriate starter culture strains for controlled cocoa bean fermentation processes.

  13. Genomic insights into strategies used by Xanthomonas albilineans with its reduced artillery to spread within sugarcane xylem vessels.

    PubMed

    Pieretti, Isabelle; Royer, Monique; Barbe, Valérie; Carrere, Sébastien; Koebnik, Ralf; Couloux, Arnaud; Darrasse, Armelle; Gouzy, Jérôme; Jacques, Marie-Agnès; Lauber, Emmanuelle; Manceau, Charles; Mangenot, Sophie; Poussier, Stéphane; Segurens, Béatrice; Szurek, Boris; Verdier, Valérie; Arlat, Matthieu; Gabriel, Dean W; Rott, Philippe; Cociancich, Stéphane

    2012-11-21

    Xanthomonas albilineans causes leaf scald, a lethal disease of sugarcane. X. albilineans exhibits distinctive pathogenic mechanisms, ecology and taxonomy compared to other species of Xanthomonas. For example, this species produces a potent DNA gyrase inhibitor called albicidin that is largely responsible for inducing disease symptoms; its habitat is limited to xylem; and the species exhibits large variability. A first manuscript on the complete genome sequence of the highly pathogenic X. albilineans strain GPE PC73 focused exclusively on distinctive genomic features shared with Xylella fastidiosa, another xylem-limited Xanthomonadaceae. The present manuscript on the same genome sequence aims to describe all other pathogenicity-related genomic features of X. albilineans, and to compare, using suppression subtractive hybridization (SSH), genomic features of two strains differing in pathogenicity. Comparative genomic analyses showed that most of the known pathogenicity factors from other Xanthomonas species are conserved in X. albilineans, with the notable absence of two major determinants of the "artillery" of other plant pathogenic species of Xanthomonas: the xanthan gum biosynthesis gene cluster, and the type III secretion system Hrp (hypersensitive response and pathogenicity). Genomic features specific to X. albilineans that may contribute to specific adaptation of this pathogen to sugarcane xylem vessels were also revealed. SSH experiments led to the identification of 20 genes common to three highly pathogenic strains but missing in a less pathogenic strain. These 20 genes, which include four ABC transporter genes, a methyl-accepting chemotaxis protein gene and an oxidoreductase gene, could play a key role in pathogenicity. With the exception of hypothetical proteins revealed by our comparative genomic analyses and SSH experiments, no genes potentially involved in any offensive or counter-defensive mechanism specific to X. albilineans were identified, supposing that X. albilineans has a reduced artillery compared to other pathogenic Xanthomonas species. Particular attention has therefore been given to genomic features specific to X. albilineans making it more capable of evading sugarcane surveillance systems or resisting sugarcane defense systems. This study confirms that X. albilineans is a highly distinctive species within the genus Xanthomonas, and opens new perspectives towards a greater understanding of the pathogenicity of this destructive sugarcane pathogen.

  14. Not all (possibly) “random” sequences are created equal

    PubMed Central

    Pincus, Steve; Kalman, Rudolf E.

    1997-01-01

    The need to assess the randomness of a single sequence, especially a finite sequence, is ubiquitous, yet is unaddressed by axiomatic probability theory. Here, we assess randomness via approximate entropy (ApEn), a computable measure of sequential irregularity, applicable to single sequences of both (even very short) finite and infinite length. We indicate the novelty and facility of the multidimensional viewpoint taken by ApEn, in contrast to classical measures. Furthermore and notably, for finite length, finite state sequences, one can identify maximally irregular sequences, and then apply ApEn to quantify the extent to which given sequences differ from maximal irregularity, via a set of deficit (defm) functions. The utility of these defm functions, which we show allows one to considerably refine the notions of probabilistic independence and normality, is featured in several studies, including (i) digits of e, π, √2, and √3, both in base 2 and in base 10, and (ii) sequences given by fractional parts of multiples of irrationals. We prove companion analytic results, which also feature in a discussion of the role and validity of the almost sure properties from axiomatic probability theory insofar as they apply to specified sequences and sets of sequences (in the physical world). We conclude by relating the present results and perspective to both previous and subsequent studies. PMID:11038612
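
    As a rough illustration of the measure named in this record, the sketch below computes ApEn(m, r) for a finite numeric sequence using the standard definition (windows of length m and m+1, Chebyshev distance, self-matches counted). It is not taken from the paper: the function name, the defaults m=2 and r=0.5, and the three test sequences are assumptions chosen only for illustration, and the deficit (defm) functions are not implemented.

```python
import math
import random


def approximate_entropy(series, m=2, r=0.5):
    """ApEn(m, r) of a finite numeric sequence (self-matches included).

    The name and the default m and r are illustrative assumptions.
    """
    n = len(series)
    if n < m + 2:
        raise ValueError("sequence too short for the chosen m")

    def phi(k):
        # All overlapping windows of length k.
        windows = [series[i:i + k] for i in range(n - k + 1)]
        total = 0.0
        for w in windows:
            # Count windows within Chebyshev distance r of w (w itself counts).
            close = sum(
                1 for v in windows
                if max(abs(a - b) for a, b in zip(w, v)) <= r
            )
            total += math.log(close / len(windows))
        return total / len(windows)

    return phi(m) - phi(m + 1)


constant = [1] * 50                     # perfectly regular
periodic = [0, 1] * 50                  # regular, period 2
rng = random.Random(0)                  # arbitrary fixed seed
noisy = [rng.randint(0, 1) for _ in range(100)]

for name, seq in [("constant", constant), ("periodic", periodic), ("noisy", noisy)]:
    print(name, approximate_entropy(seq))
```

    Regular sequences score at or near zero, while the pseudo-random bits score markedly higher, which is the sense in which ApEn quantifies sequential irregularity.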

  15. Pairwise Sequence Alignment Library

    DOE Office of Scientific and Technical Information (OSTI.GOV)

    Jeff Daily, PNNL

    2015-05-20

    Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: Reference implementations of all known vectorized sequence alignment approaches. Implementations of Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms. Implementations across all modern CPU instruction sets including AVX2 and KNC. Language interfaces for C/C++ and Python.
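
    To make the data dependencies mentioned in this record concrete, here is a minimal scalar sketch of Smith-Waterman local alignment scoring. It is plain Python and does not use or imitate the parasail API; the function name, the linear gap model, and the scoring values are assumptions for illustration only. Each cell depends on its left, upper, and diagonal neighbours, which is the dependency pattern that makes straightforward vectorization difficult.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score of a and b with a linear gap penalty.

    Hypothetical helper; scoring parameters are illustrative defaults.
    """
    prev = [0] * (len(b) + 1)  # previous row of the dynamic-programming table
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Diagonal move: align a[i-1] with b[j-1].
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: scores never drop below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


print(smith_waterman_score("ACGT", "ACGT"))          # 8
print(smith_waterman_score("TTACGTTT", "GGACGTGG"))  # 8
print(smith_waterman_score("AAAA", "TTTT"))          # 0
```

    Only two rows are kept in memory because the score alone is returned; recovering the alignment itself would require storing the full table or traceback pointers.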

  16. The glucose transporter 1 -GLUT1- from the white shrimp Litopenaeus vannamei is up-regulated during hypoxia.

    PubMed

    Martínez-Quintana, José A; Peregrino-Uriarte, Alma B; Gollas-Galván, Teresa; Gómez-Jiménez, Silvia; Yepiz-Plascencia, Gloria

    2014-12-01

    During hypoxia the shrimp Litopenaeus vannamei accelerates anaerobic glycolysis to obtain energy; therefore, a correct supply of glucose to the cells is needed. Facilitated glucose transport across the cells is mediated by a group of membrane-embedded integral proteins called GLUT, GLUT1 being the most ubiquitous form. In this work, we report the first cDNA nucleotide and deduced amino acid sequences of a glucose transporter 1 from L. vannamei. A 1619 bp sequence was obtained by RT-PCR and RACE approaches. The 5' UTR is 161 bp and the poly(A) tail lies immediately after the stop codon in the mRNA. The ORF is 1485 bp and codes for 485 amino acids. The deduced protein sequence has high identity to GLUT1 proteins from several species and contains all the main features of glucose transporter proteins, including twelve transmembrane domains and the conserved motifs and amino acids involved in transport activity, ligand binding and membrane anchoring. Therefore, we decided to name this sequence glucose transporter 1 of L. vannamei (LvGLUT1). A partial gene sequence of 8.87 Kbp was also obtained; it contains the complete coding sequence divided into 10 exons. LvGlut1 expression was detected in hemocytes, hepatopancreas, intestine, gills, muscle and pleopods. The highest relative expression was found in gills and the lowest in hemocytes. This indicates that LvGlut1 is ubiquitously expressed but that its levels are tissue-specific. Upon short-term hypoxia, the GLUT1 transcripts increase 3.7-fold in hepatopancreas and gills. To our knowledge, this is the first evidence of expression of GLUT1 in crustaceans.

  17. Genomic deletions of OFD1 account for 23% of oral-facial-digital type 1 syndrome after negative DNA sequencing.

    PubMed

    Thauvin-Robinet, Christel; Franco, Brunella; Saugier-Veber, Pascale; Aral, Bernard; Gigot, Nadège; Donzel, Anne; Van Maldergem, Lionel; Bieth, Eric; Layet, Valérie; Mathieu, Michèle; Teebi, Ahmad; Lespinasse, James; Callier, Patrick; Mugneret, Francine; Masurel-Paulet, Alice; Gautier, Elodie; Huet, Frédéric; Teyssier, Jean-Raymond; Tosi, Mario; Frébourg, Thierry; Faivre, Laurence

    2009-02-01

    Oral-facial-digital type I syndrome (OFDI) is characterised by an X-linked dominant mode of inheritance with lethality in males. Clinical features include facial dysmorphism with oral, dental and distal abnormalities, polycystic kidney disease and central nervous system malformations. Considerable allelic heterogeneity has been reported within the OFD1 gene, but DNA bi-directional sequencing of the exons and intron-exon boundaries of the OFD1 gene remains negative in more than 20% of cases. We hypothesized that genomic rearrangements could account for the majority of the remaining undiagnosed cases. Thus, we took advantage of two independent available series of patients with OFDI syndrome and negative DNA bi-directional sequencing of the exons and intron-exon boundaries of the OFD1 gene from two different European labs: 13/36 cases from the French lab; 13/95 from the Italian lab. All patients were screened by a semiquantitative fluorescent multiplex method (QFMPSF) and relative quantification by real-time PCR (qPCR). Six OFD1 genomic deletions (exon 5, exons 1-8, exons 1-14, exons 10-11, exons 13-23 and exon 17) were identified, accounting for 5% of OFDI patients and for 23% of patients with negative mutation screening by DNA sequencing. The association of DNA direct sequencing, QFMPSF and qPCR detects OFD1 alteration in up to 85% of patients with a phenotype suggestive of OFDI syndrome. Given the average percentage of large genomic rearrangements (5%), we suggest that dosage methods should be performed in addition to DNA direct sequencing analysis to exclude the involvement of the OFD1 transcript when there are genetic counselling issues. (c) 2008 Wiley-Liss, Inc.

  18. High-precision relocation for aftershocks of the 2016 ML 5.8 Gyeongju earthquake in South Korea: Stress partitioning controlled by complex fault systems

    NASA Astrophysics Data System (ADS)

    Woo, J. U.; Rhie, J.; Kang, T. S.; Kim, S.; Chai, G.; Cho, E.

    2017-12-01

    A complex inherent fault system is one of the key factors controlling the occurrence of a mainshock and the pattern of its aftershock sequence. Many field studies have shown that the fault systems in the Korean Peninsula are complex because they were formed by various tectonic events since the Proterozoic. Apart from the fact that the mainshock is the largest (ML 5.8) ever recorded in South Korea, the Gyeongju earthquake sequence shows particularly interesting features: an ML 5.1 event preceded the ML 5.8 event by 50 min, and the two are located close to each other (1 km). In addition, an ML 4.5 event occurred 2-3 km away from the two events a week after the mainshock. Considering the reported focal mechanisms and hypocenters of the three major events, it is unlikely that the earthquake sequence occurred on a single fault plane. To depict the detailed fault geometry associated with the sequence, we precisely determine the relative locations of 1,400 aftershocks recorded by 27 broadband stations, which started to be deployed less than one hour after the mainshock. A double-difference algorithm is applied using relative travel time measurements obtained by a waveform cross-correlation method. Relocated hypocenters show that a major fault striking NE-SW and some minor faults are involved in the sequence. In particular, aftershocks immediately following the ML 4.5 event seem to occur on a fault striking NW-SE, which is orthogonal to the strike of the major fault. We expect that the Gyeongju earthquake sequence resulted from stress transfer controlled by the complex inherent fault system in this region.

  19. Sequencing Events: Exploring Art and Art Jobs.

    ERIC Educational Resources Information Center

    Stephens, Pamela Geiger; Shaddix, Robin K.

    2000-01-01

    Presents an activity for upper-elementary students that correlates the actions of archaeologists, patrons, and artists with the sequencing of events in a logical order. Features ancient Egyptian art images. Discusses the preparation of materials, motivation, a pre-writing activity, and writing a story in sequence. (CMK)

  20. MSLICE Sequencing

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Norris, Jeffrey S.; Morris, John R.

    2011-01-01

    MSLICE Sequencing is a graphical tool for writing sequences and integrating them into RML files, as well as for producing SCMF files for uplink. When operated in a testbed environment, it also supports uplinking these SCMF files to the testbed via Chill. This software features a free-form textual sequence editor featuring syntax coloring, automatic content assistance (including command and argument completion proposals), complete with types, value ranges, units, and descriptions from the command dictionary that appear as they are typed. The sequence editor also has a "field mode" that allows tabbing between arguments and displays type/range/units/description for each argument as it is edited. Color-coded error and warning annotations on problematic tokens are included, as well as indications of problems that are not visible in the current scroll range. "Quick Fix" suggestions are made for resolving problems, and all the features afforded by modern source editors are also included, such as copy/cut/paste, undo/redo, and a sophisticated find-and-replace system optionally using regular expressions. The software offers a full XML editor for RML files, which features syntax coloring, content assistance and problem annotations as above. There is a form-based "detail view" that allows structured editing of command arguments and sequence parameters when preferred. The "project view" shows the user's "workspace" as a tree of "resources" (projects, folders, and files) that can subsequently be opened in editors by double-clicking. Files can be added, deleted, dragged-dropped/copied-pasted between folders or projects, and these operations are undoable and redoable. A "problems view" contains a tabular list of all problems in the current workspace. Double-clicking on any row in the table opens an editor for the appropriate sequence, scrolling to the specific line with the problem, and highlighting the problematic characters.
From there, one can invoke "quick fix" as described above to resolve the issue. Once resolved, saving the file causes the problem to be removed from the problem view.

  1. Distribution and Features of the Six Classes of Peroxiredoxins

    PubMed Central

    Poole, Leslie B.; Nelson, Kimberly J.

    2016-01-01

    Peroxiredoxins are cysteine-dependent peroxide reductases that group into 6 different, structurally discernable classes. In 2011, our research team reported the application of a bioinformatic approach called active site profiling to extract active site-proximal sequence segments from the 29 distinct, structurally-characterized peroxiredoxins available at the time. These extracted sequences were then used to create unique profiles for the six groups which were subsequently used to search GenBank(nr), allowing identification of ∼3500 peroxiredoxin sequences and their respective subgroups. Summarized in this minireview are the features and phylogenetic distributions of each of these peroxiredoxin subgroups; an example is also provided illustrating the use of the web accessible, searchable database known as PREX to identify subfamily-specific peroxiredoxin sequences for the organism Vitis vinifera (grape). PMID:26810075

  2. Analysis of Biological Features Associated with Meiotic Recombination Hot and Cold Spots in Saccharomyces cerevisiae

    PubMed Central

    Hansen, Loren; Kim, Nak-Kyeong; Mariño-Ramírez, Leonardo; Landsman, David

    2011-01-01

    Meiotic recombination is not distributed uniformly throughout the genome. There are regions of high and low recombination rates called hot and cold spots, respectively. The recombination rate parallels the frequency of DNA double-strand breaks (DSBs) that initiate meiotic recombination. The aim is to identify biological features associated with DSB frequency. We constructed vectors representing various chromatin and sequence-based features for 1179 DSB hot spots and 1028 DSB cold spots. Using a feature selection approach, we have identified five features that distinguish hot from cold spots in Saccharomyces cerevisiae with high accuracy, namely the histone marks H3K4me3, H3K14ac, H3K36me3, and H3K79me3; and GC content. Previous studies have associated H3K4me3, H3K36me3, and GC content with areas of mitotic recombination. H3K14ac and H3K79me3 are novel predictions and thus represent good candidates for further experimental study. We also show nucleosome occupancy maps produced using next generation sequencing exhibit a bias at DSB hot spots and this bias is strong enough to obscure biologically relevant information. A computational approach using feature selection can productively be used to identify promising biological associations. H3K14ac and H3K79me3 are novel predictions of chromatin marks associated with meiotic DSBs. Next generation sequencing can exhibit a bias that is strong enough to lead to incorrect conclusions. Care must be taken when interpreting high throughput sequencing data where systematic biases have been documented. PMID:22242140

  3. Nonspatial Sequence Coding in CA1 Neurons

    PubMed Central

    Allen, Timothy A.; Salz, Daniel M.; McKenzie, Sam

    2016-01-01

    The hippocampus is critical to the memory for sequences of events, a defining feature of episodic memory. However, the fundamental neuronal mechanisms underlying this capacity remain elusive. While considerable research indicates hippocampal neurons can represent sequences of locations, direct evidence of coding for the memory of sequential relationships among nonspatial events remains lacking. To address this important issue, we recorded neural activity in CA1 as rats performed a hippocampus-dependent sequence-memory task. Briefly, the task involves the presentation of repeated sequences of odors at a single port and requires rats to identify each item as “in sequence” or “out of sequence”. We report that, while the animals' location and behavior remained constant, hippocampal activity differed depending on the temporal context of items—in this case, whether they were presented in or out of sequence. Some neurons showed this effect across items or sequence positions (general sequence cells), while others exhibited selectivity for specific conjunctions of item and sequence position information (conjunctive sequence cells) or for specific probe types (probe-specific sequence cells). We also found that the temporal context of individual trials could be accurately decoded from the activity of neuronal ensembles, that sequence coding at the single-cell and ensemble level was linked to sequence memory performance, and that slow-gamma oscillations (20–40 Hz) were more strongly modulated by temporal context and performance than theta oscillations (4–12 Hz). These findings provide compelling evidence that sequence coding extends beyond the domain of spatial trajectories and is thus a fundamental function of the hippocampus. SIGNIFICANCE STATEMENT The ability to remember the order of life events depends on the hippocampus, but the underlying neural mechanisms remain poorly understood. 
Here we addressed this issue by recording neural activity in hippocampal region CA1 while rats performed a nonspatial sequence memory task. We found that hippocampal neurons code for the temporal context of items (whether odors were presented in the correct or incorrect sequential position) and that this activity is linked with memory performance. The discovery of this novel form of temporal coding in hippocampal neurons advances our fundamental understanding of the neurobiology of episodic memory and will serve as a foundation for our cross-species, multitechnique approach aimed at elucidating the neural mechanisms underlying memory impairments in aging and dementia. PMID:26843637

  4. Aptaligner: automated software for aligning pseudorandom DNA X-aptamers from next-generation sequencing data.

    PubMed

    Lu, Emily; Elizondo-Riojas, Miguel-Angel; Chang, Jeffrey T; Volk, David E

    2014-06-10

    Next-generation sequencing results from bead-based aptamer libraries have demonstrated that traditional DNA/RNA alignment software is insufficient. This is particularly true for X-aptamers containing specialty bases (W, X, Y, Z, ...) that are identified by special encoding. Thus, we sought an automated program that uses the inherent design scheme of bead-based X-aptamers to create a hypothetical reference library and Markov modeling techniques to provide improved alignments. Aptaligner provides this feature as well as length error and noise level cutoff features, is parallelized to run on multiple central processing units (cores), and sorts sequences from a single chip into projects and subprojects.

  5. DNA Sequences from Formalin-Fixed Nematodes: Integrating Molecular and Morphological Approaches to Taxonomy

    PubMed Central

    Thomas, W. Kelley; Vida, J. T.; Frisse, Linda M.; Mundo, Manuel; Baldwin, James G.

    1997-01-01

    To effectively integrate DNA sequence analysis and classical nematode taxonomy, we must be able to obtain DNA sequences from formalin-fixed specimens. Microdissected sections of nematodes were removed from specimens fixed in formalin, using standard protocols and without destroying morphological features. The fixed sections provided sufficient template for multiple polymerase chain reaction-based DNA sequence analyses. PMID:19274156

  6. Extracting DNA words based on the sequence features: non-uniform distribution and integrity.

    PubMed

    Li, Zhi; Cao, Hongyan; Cui, Yuehua; Zhang, Yanbo

    2016-01-25

    A DNA sequence can be viewed as an unknown language with words as its functional units. Given that most sequence alignment algorithms such as the motif discovery algorithms depend on the quality of background information about sequences, it is necessary to develop an ab initio algorithm for extracting the "words" based only on the DNA sequences. We considered that non-uniform distribution and integrity were two important features of a word, based on which we developed an ab initio algorithm to extract "DNA words" that have potential functional meaning. A Kolmogorov-Smirnov test was used to test the consistency of DNA sequences with a uniform distribution, and the integrity was judged by the sequence and position alignment. Two random base sequences were adopted as negative controls, and an English book was used as a positive control to verify our algorithm. We applied our algorithm to the genomes of Saccharomyces cerevisiae and 10 strains of Escherichia coli to show the utility of the methods. The results provide strong evidence that the algorithm is a promising tool for building a DNA dictionary ab initio. Our method provides a fast way for large scale screening of important DNA elements and offers potential insights into the understanding of a genome.
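
    The record does not say how the Kolmogorov-Smirnov test was applied, so the following is only a sketch under an assumed setup: the start positions of a candidate word are rescaled to the interval [0, 1] and compared against a uniform distribution using the one-sample KS distance D. Both function names are hypothetical, and only the statistic is computed (no p-value).

```python
def ks_uniform_statistic(samples):
    """One-sample KS distance between samples in [0, 1] and Uniform(0, 1)."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one sample")
    d = 0.0
    for i, x in enumerate(xs, start=1):
        # Compare the uniform CDF (x) with the empirical CDF just after
        # and just before the i-th sorted sample.
        d = max(d, i / n - x, x - (i - 1) / n)
    return d


def word_positions(sequence, word):
    """Relative start positions (0..1) of every occurrence of word, overlaps included."""
    span = len(sequence) - len(word)
    hits = []
    start = sequence.find(word)
    while start != -1:
        hits.append(start / span if span > 0 else 0.0)
        start = sequence.find(word, start + 1)
    return hits


evenly_spread = [0.125, 0.375, 0.625, 0.875]
clustered = [0.01, 0.02, 0.03, 0.04]
print(ks_uniform_statistic(evenly_spread))  # 0.125
print(ks_uniform_statistic(clustered))      # large: far from uniform
print(word_positions("ACGTACGT", "ACG"))
```

    A large D for a word's positions would indicate a non-uniform distribution along the sequence, which is one of the two word features the abstract describes.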

  7. BASiNET-BiologicAl Sequences NETwork: a case study on coding and non-coding RNAs identification.

    PubMed

    Ito, Eric Augusto; Katahira, Isaque; Vicente, Fábio Fernandes da Rocha; Pereira, Luiz Filipe Protasio; Lopes, Fabrício Martins

    2018-06-05

    With the emergence of Next Generation Sequencing (NGS) technologies, a large volume of sequence data, in particular from de novo sequencing, has been rapidly produced at relatively low cost. In this context, computational tools are increasingly important to assist in the identification of relevant information to understand the functioning of organisms. This work introduces BASiNET, an alignment-free tool for classifying biological sequences based on feature extraction from complex network measurements. The method initially transforms the sequences and represents them as complex networks. Then it extracts topological measures and constructs a feature vector that is used to classify the sequences. The method was evaluated in the classification of coding and non-coding RNAs of 13 species and compared to the CNCI, PLEK and CPC2 methods. BASiNET outperformed all compared methods in all adopted organisms and datasets. BASiNET classified sequences in all organisms with high accuracy and low standard deviation, showing that the method is robust and not biased by the organism. The proposed methodology is implemented as open source in the R language and is freely available for download at https://cran.r-project.org/package=BASiNET.

  8. Texture analysis of common renal masses in multiple MR sequences for prediction of pathology

    NASA Astrophysics Data System (ADS)

    Hoang, Uyen N.; Malayeri, Ashkan A.; Lay, Nathan S.; Summers, Ronald M.; Yao, Jianhua

    2017-03-01

    This pilot study performs texture analysis on multiple magnetic resonance (MR) images of common renal masses for differentiation of renal cell carcinoma (RCC). Bounding boxes are drawn around each mass on one axial slice in T1 delayed sequence to use for feature extraction and classification. All sequences (T1 delayed, venous, arterial, pre-contrast phases, T2, and T2 fat saturated sequences) are co-registered and texture features are extracted from each sequence simultaneously. Random forest is used to construct models to classify lesions on 96 normal regions, 87 clear cell RCCs, 8 papillary RCCs, and 21 renal oncocytomas; ground truths are verified through pathology reports. The highest performance is seen in random forest model when data from all sequences are used in conjunction, achieving an overall classification accuracy of 83.7%. When using data from one single sequence, the overall accuracies achieved for T1 delayed, venous, arterial, and pre-contrast phase, T2, and T2 fat saturated were 79.1%, 70.5%, 56.2%, 61.0%, 60.0%, and 44.8%, respectively. This demonstrates promising results of utilizing intensity information from multiple MR sequences for accurate classification of renal masses.

  9. An RNAi-Enhanced Logic Circuit for Cancer Specific Detection and Destruction

    DTIC Science & Technology

    2013-02-01

    monomeric protein secreted by Corynebacterium diphtheriae, and pro-apoptotic members of Bcl-2 family: mBax (Mus musculus), hBax (Homo sapiens), and its...Gata3 mStaple. Intron-feature sequences – donor site, branch point, polypyrimidine tract, and acceptor site – were selected based on previously...sequences found in literature our intron features were chosen according SplicePort [4], an online analyzer that detects the likelihood of splicing to

  10. How the Sequence of a Gene Specifies Structural Symmetry in Proteins

    PubMed Central

    Shen, Xiaojuan; Huang, Tongcheng; Wang, Guanyu; Li, Guanglin

    2015-01-01

    Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains of proteins have the potential to start the co-translational folding process and this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the formation of structural symmetry: a major decrease of local codon usage bias in the middle of the codon sequence can be identified as a common feature; and major or consecutive decreases in local mRNA folding energy near the boundaries of the symmetric substructures can also be observed. The results suggest that gene duplication and fusion may be an evolutionarily conserved process for this protein fold. In addition, the usage of rare codons and the formation of higher order of secondary structure near the boundaries of symmetric substructures might have coevolved as conserved mechanisms to slow down translation elongation and to facilitate effective folding of symmetric substructures. These findings provide valuable insights into our understanding of the mechanisms of translation and its evolution, as well as the design of proteins via symmetric modules. PMID:26641668

  11. Immunoglobulin Light Chains Form an Extensive and Highly Ordered Fibril Involving the N- and C-Termini

    PubMed Central

    2017-01-01

    Light-chain (AL)-associated amyloidosis is a systemic disorder involving the formation and deposition of immunoglobulin AL fibrils in various bodily organs. One severe instance of AL disease is exhibited by the patient-derived variable domain (VL) of the light chain AL-09, a 108 amino acid residue protein containing seven mutations relative to the corresponding germline protein, κI O18/O8 VL. Previous work has demonstrated that the thermodynamic stability of native AL-09 VL is greatly lowered by two of these mutations, Y87H and N34I, whereas a third mutation, K42Q, further increases the kinetics of fibril formation. However, detailed knowledge regarding the residues that are responsible for stabilizing the misfolded fibril structure is lacking. In this study, using solid-state NMR spectroscopy, we show that the majority of the AL-09 VL sequence is immobilized in the fibrils and that the N- and C-terminal portions of the sequence are particularly well-structured. Thus, AL-09 VL forms an extensively ordered and β-strand-rich fibril structure. Furthermore, we demonstrate that the predominant β-sheet secondary structure and rigidity observed for in vitro prepared AL-09 VL fibrils are qualitatively similar to those observed for AL fibrils extracted from postmortem human spleen tissue, suggesting that this conformation may be representative of a common feature of AL fibrils. PMID:28261692

  12. Contrasting patterns of variation in weedy traits and unique crop features in divergent populations of US weedy rice (Oryza sativa sp.) in Arkansas and California.

    PubMed

    Kanapeckas, Kimberly L; Tseng, Te-Ming; Vigueira, Cynthia C; Ortiz, Aida; Bridges, William C; Burgos, Nilda R; Fischer, Albert J; Lawton-Rauh, Amy

    2018-06-01

    Weed evolution from crops involves changes in key traits, but it is unclear how genetic and phenotypic variation contribute to weed diversification and productivity. Weedy rice is a conspecific weed of rice (Oryza sativa) worldwide. We used principal component analysis and hierarchical clustering to understand how morphologically and evolutionarily distinct US weedy rice populations persist in rice fields in different locations under contrasting management regimes. Further, we used a representative subset of 15 sequence-tagged site fragments of expressed genes from global Oryza to assess genome-wide sequence variation among populations. Crop hull color and crop-overlapping maturity dates plus awns, seed (panicle) shattering (> 50%), pigmented pericarp and stature variation (30.2% of total phenotypic variance) characterize genetically less diverse California weedy rice. By contrast, wild-like hull color, seed shattering (> 50%) and stature differences (55.8% of total phenotypic variance) typify genetically diverse weedy rice ecotypes in Arkansas. Recent de-domestication of weedy species - such as in California weedy rice - can involve trait combinations indistinguishable from the crop. This underscores the need for strict seed certification with genetic monitoring and proactive field inspection to prevent proliferation of weedy plant types. In established populations, tillage practice may affect weed diversity and persistence over time. © 2017 Society of Chemical Industry.

  13. A new family of cyanobacterial penicillin-binding proteins. A missing link in the evolution of class A beta-lactamases.

    PubMed

    Urbach, Carole; Fastrez, Jacques; Soumillion, Patrice

    2008-11-21

    It is largely accepted that serine beta-lactamases evolved from some ancestral DD-peptidases involved in the biosynthesis and maintenance of the bacterial peptidoglycan. DD-peptidases are also called penicillin-binding proteins (PBPs), since they form stable acyl-enzymes with beta-lactam antibiotics, such as penicillins. On the other hand, beta-lactamases react similarly with these antibiotics, but the acyl-enzymes are unstable and rapidly hydrolyzed. Besides, all known PBPs and beta-lactamases share very low sequence similarities, thus rendering it difficult to understand how a PBP could evolve into a beta-lactamase. In this study, we identified a new family of cyanobacterial PBPs featuring the highest sequence similarity with the most widespread class A beta-lactamases. Interestingly, the Omega-loop, which, in the beta-lactamases, carries an essential glutamate involved in the deacylation process, is six amino acids shorter and does not contain any glutamate residue. From this new family of proteins, we characterized PBP-A from Thermosynechococcus elongatus and discovered hydrolytic activity with synthetic thiolesters that are usually good substrates of DD-peptidases. Penicillin degradation pathways as well as acylation and deacylation rates are characteristic of PBPs. In a first attempt to generate beta-lactamase activity, a 90-fold increase in deacylation rate was obtained by introducing a glutamate in the shorter Omega-loop.

  14. Characterization of chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL) in Shanghai, China: molecular and cytogenetic characteristics, IgV gene restriction and hypermutation patterns.

    PubMed

    Irons, Richard D; Le, Anh; Bao, Liming; Zhu, Xiongzeng; Ryder, John; Wang, Xiao Qin; Ji, Meirong; Chen, Yan; Wu, Xichun; Lin, Guowei

    2009-12-01

    The clinical, cytogenetic and molecular features of chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL), a disease previously considered to be rare in Asia, were examined in a consecutive series of 70 cases diagnosed by our laboratory over a 30-month period. Clonal abnormalities were observed in 80% of CLL/SLL cases using a combination of conventional cytogenetic and fluorescence in situ hybridization (FISH) analysis. Those involving 14q32/IGH were the most frequent (24 cases), followed by trisomy 12 and 11q abnormalities. IgV(H) gene usage was non-random with over-representation of V(H)4-34, V(H)3-23 and a previously unreported increase in V(H)3-48 gene use. Somatic hypermutation (SHM) of IgV(H) germline sequences was observed in 56.5% of cases with stereotyped patterns of SHM observed in V(H)4-34 heavy chain complementarity-determining (HCDR1) and framework region CFR2 sequences. These findings in a Chinese population suggest subtle geographical differences in IgV(H) gene usage, while the remarkably specific pattern of SHM suggests that a relatively limited set of antigens may be involved in the development of this disease worldwide. IgV(H) gene mutation status was a significant predictor of initial survival in CLL/SLL. However, an influence of karyotype on prognosis was not observed.

  15. Non-random occurrence of Robertsonian translocations in the house mouse (Mus musculus domesticus): is it related to quantitative variation in the minor satellite?

    PubMed

    Cazaux, Benoîte; Catalan, Josette; Claude, Julien; Britton-Davidian, Janice

    2014-01-01

    The house mouse, Mus musculus domesticus, shows extraordinary chromosomal diversity driven by fixation of Robertsonian (Rb) translocations. The high frequency of this rearrangement, which involves the centromeric regions, has been ascribed to the architecture of the satellite sequence (high quantity and homogeneity). This promotes centromere-related translocations through unequal recombination and gene conversion. A characteristic feature of Rb variation in this subspecies is the non-random contribution of different chromosomes to the translocation frequency, which, in turn, depends on the chromosome size. Here, the association between satellite quantity and Rb frequency was tested by PRINS of the minor satellite which is the sequence involved in the translocation breakpoints. Five chromosomes with different translocation frequencies were selected and analyzed among wild house mice from 8 European localities. Using a relative quantitative measurement per chromosome, the analysis detected a large variability in signal size most of which was observed between individuals and/or localities. The chromosomes differed significantly in the quantity of the minor satellite, but these differences were not correlated with their translocation frequency. However, the data uncovered a marginally significant correlation between the quantity of the minor satellite and chromosome size. The implications of these results on the evolution of the chromosomal architecture in the house mouse are discussed. © 2014 S. Karger AG, Basel.

  16. Two novel genes, fanA and fanB, involved in the biogenesis of K99 fimbriae.

    PubMed

    Roosendaal, E; Boots, M; de Graaf, F K

    1987-08-11

    The nucleotide sequence of the region located transcriptionally upstream of the K99 fimbrial subunit gene (fanC) was determined. Several putative transcription signals and two open reading frames, designated fanA and fanB, became apparent. Frameshift mutations in fanA and fanB reduced K99 fimbriae expression 8-fold and 16-fold, respectively. Complementation of the mutants in trans restored the K99 expression to about 75% of the wild type level, indicating that fanA and fanB code for transacting polypeptides involved in the biogenesis of K99 fimbriae. The fanA and fanB gene products FanA and FanB were not detectable in minicell preparations, indicating that both polypeptides are synthesized in very small amounts. However, in an in vitro DNA directed translation system FanA and FanB could be identified. The deduced amino acid sequences of FanA and FanB showed that both polypeptides contain no signal peptides, indicating a cytoplasmic location. Furthermore, the polypeptides are very hydrophilic, mainly basic, and exhibit remarkable homology to each other and to a regulatory protein (papB) encoded by the pap-operon (1). Some of these features are characteristics of nucleic acid binding proteins, which suggests that FanA and FanB have a regulatory function in the synthesis of FanC and the auxiliary polypeptides FanD-H.

  17. Molecular cloning of sea bass (Dicentrarchus labrax L.) caspase-8 gene and its involvement in Photobacterium damselae ssp. piscicida triggered apoptosis.

    PubMed

    Reis, Marta I R; Costa-Ramos, Carolina; do Vale, Ana; dos Santos, Nuno M S

    2010-07-01

    Caspase-8 is an initiator caspase that plays a crucial role in some cases of apoptosis by extrinsic and intrinsic pathways. Caspase-8 structure and function have been extensively studied in mammals, but in fish the characterization of that initiator caspase is still scarce. In this work, the sea bass counterpart of mammalian caspase-8 was sequenced and characterized, and its involvement in the apoptogenic activity of a toxin from a fish pathogen was assessed. A 2472 bp cDNA of sea bass caspase-8 was obtained, consisting of 1455 bp open reading frame coding for 484 amino acids and with a predicted molecular weight of 55.2 kDa. The sea bass caspase-8 gene has 6639 bp and is organized in 11 introns and 12 exons. Several distinctive features of sea bass caspase-8 were identified, which include two death effector domains, the caspase family domains p20 and p10, the caspase-8 active-site pentapeptide and potential aspartic acid cleavage sites. The sea bass caspase-8 sequence revealed a significant degree of similarity to corresponding sequences from several vertebrate taxonomic groups. A low expression of sea bass caspase-8 was detected in various tissues of non-stimulated sea bass. Furthermore, it is shown that stimulation of sea bass with mid-exponential phase culture supernatants from Photobacterium damselae ssp. piscicida (Phdp), known to induce selective apoptosis of macrophages and neutrophils, resulted in an increased expression of caspase-8 in the spleen, one of the main affected organs by Phdp infection. 2010 Elsevier Ltd. All rights reserved.
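
    The open reading frame and protein lengths quoted in this record can be cross-checked with simple codon arithmetic. The sketch below is only an illustration: the helper name orf_protein_length is hypothetical, and it assumes that the reported 1455 bp figure includes the terminating stop codon, which the abstract does not state explicitly.

```python
def orf_protein_length(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_bp nucleotides.

    Assumption (not stated in the abstract): the ORF length counts the
    stop codon, which does not encode an amino acid.
    """
    if orf_bp % 3 != 0:
        raise ValueError("ORF length must be a multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# 1455 bp -> 485 codons -> 484 residues once the stop codon is excluded
residues = orf_protein_length(1455)
```

    Under that assumption the 1455 bp ORF gives 485 codons and 484 residues, which agrees with the 484 amino acids reported.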

  18. The Complete Genome Sequence of Thermoproteus tenax: A Physiologically Versatile Member of the Crenarchaeota

    PubMed Central

    Siebers, Bettina; Zaparty, Melanie; Raddatz, Guenter; Tjaden, Britta; Albers, Sonja-Verena; Bell, Steve D.; Blombach, Fabian; Kletzin, Arnulf; Kyrpides, Nikos; Lanz, Christa; Plagens, André; Rampp, Markus; Rosinus, Andrea; von Jan, Mathias; Makarova, Kira S.; Klenk, Hans-Peter; Schuster, Stephan C.; Hensel, Reinhard

    2011-01-01

    Here, we report on the complete genome sequence of the hyperthermophilic Crenarchaeum Thermoproteus tenax (strain Kra1, DSM 2078T) a type strain of the crenarchaeotal order Thermoproteales. Its circular 1.84-megabase genome harbors no extrachromosomal elements and 2,051 open reading frames are identified, covering 90.6% of the complete sequence, which represents a high coding density. Derived from the gene content, T. tenax is a representative member of the Crenarchaeota. The organism is strictly anaerobic and sulfur-dependent with optimal growth at 86°C and pH 5.6. One particular feature is the great metabolic versatility, which is not accompanied by a distinct increase of genome size or information density as compared to other Crenarchaeota. T. tenax is able to grow chemolithoautotrophically (CO2/H2) as well as chemoorganoheterotrophically in presence of various organic substrates. All pathways for synthesizing the 20 proteinogenic amino acids are present. In addition, two presumably complete gene sets for NADH:quinone oxidoreductase (complex I) were identified in the genome and there is evidence that either NADH or reduced ferredoxin might serve as electron donor. Beside the typical archaeal A0A1-ATP synthase, a membrane-bound pyrophosphatase is found, which might contribute to energy conservation. Surprisingly, all genes required for dissimilatory sulfate reduction are present, which is confirmed by growth experiments. Mentionable is furthermore, the presence of two proteins (ParA family ATPase, actin-like protein) that might be involved in cell division in Thermoproteales, where the ESCRT system is absent, and of genes involved in genetic competence (DprA, ComF) that is so far unique within Archaea. PMID:22003381

  19. Transcriptomics Provide Insight Into Mussel (Mytilus galloprovincialis) Mantle Function And Its Role In Biomineralization

    NASA Astrophysics Data System (ADS)

    Zaghdoudi-Allan, N.; Yarra, T.; Churcher, A.; Felix, R. C.; Cardoso, J.; Clark, M.; Power, D. M.

    2016-02-01

    With over 90,000 extant species, the Mollusca is one of the most successful and species-rich phyla, comprising 23% of known marine fauna. Common to all molluscs, the mantle is a multi-functional highly muscular tissue that contacts the shell and envelops vital organs. In bivalves, the epithelial cells of the mantle secrete the external shell by a complex network of mechanisms that remain poorly understood. To date, the bulk of the work on Mytilus mantle has focused on two of its features, the mantle edge and the pallial mantle, and relatively little is known about the factors regulating its function. We hypothesize that the mantle edge in Mytilus species is heterogeneous in cellular structure and function and use next generation sequencing to mine for receptors involved in biomineralization. The mantle edge of the Mediterranean mussel (Mytilus galloprovincialis) was sectioned into three parts and sequenced using the Illumina platform. The transcriptome sequences generated assembled into 179,879 transcripts with a 34% GC content, congruent with other bivalve assemblies. The transcriptome was annotated and String analysis (http://www.string-db.org) was used for a preliminary characterisation of biological processes. To test our hypothesis, we compared the transcripts from the 3 mantle segments and the expression levels of putative receptors such as the G-protein coupled receptors (GPCRs) in the sectioned mantle of 6 individuals using qPCR. Candidates were chosen based on their regulatory function and potential involvement in shell formation. Our results show differences in transcript abundance and cellular function amongst the three mantle sections. Combining our transcriptomic study with histological studies of the mantle tissue, we present evidence of both molecular and structural heterogeneity of the mussel mantle and identify several putative regulatory networks.

  20. Targeting functional motifs of a protein family

    NASA Astrophysics Data System (ADS)

    Bhadola, Pradeep; Deo, Nivedita

    2016-10-01

    The structural organization of a protein family is investigated by devising a method based on the random matrix theory (RMT), which uses the physiochemical properties of the amino acid with multiple sequence alignment. A graphical method to represent protein sequences using physiochemical properties is devised that gives a fast, easy, and informative way of comparing the evolutionary distances between protein sequences. A correlation matrix associated with each property is calculated, where the noise reduction and information filtering is done using RMT involving an ensemble of Wishart matrices. The analysis of the eigenvalue statistics of the correlation matrix for the β-lactamase family shows the universal features as observed in the Gaussian orthogonal ensemble (GOE). The property-based approach captures the short- as well as the long-range correlation (approximately following GOE) between the eigenvalues, whereas the previous approach (treating amino acids as characters) gives the usual short-range correlations, while the long-range correlations are the same as that of an uncorrelated series. The distribution of the eigenvector components for the eigenvalues outside the bulk (RMT bound) deviates significantly from RMT observations and contains important information about the system. The information content of each eigenvector of the correlation matrix is quantified by introducing an entropic estimate, which shows that for the β-lactamase family the smallest eigenvectors (low eigenmodes) are highly localized as well as informative. These small eigenvectors, when processed, give clusters involving positions that have well-defined biological and structural importance matching with experiments. The approach is crucial for the recognition of structural motifs as shown in β-lactamase (and other families) and selectively identifies the important positions for targets to deactivate (activate) the enzymatic actions.

  1. CNNdel: Calling Structural Variations on Low Coverage Data Based on Convolutional Neural Networks

    PubMed Central

    2017-01-01

    Many structural variation (SV) detection methods have been proposed due to the popularization of next-generation sequencing (NGS). These SV calling methods use different SV-property-dependent features; however, they all suffer from poor accuracy when running on low coverage sequences. The union of results from these tools achieves fairly high sensitivity but still produces low accuracy on low coverage sequence data. That is, these methods contain many false positives. In this paper, we present CNNdel, an approach for calling deletions from paired-end reads. CNNdel gathers SV candidates reported by multiple tools and then extracts features from aligned BAM files at the positions of candidates. With labeled feature-expressed candidates as a training set, CNNdel trains convolutional neural networks (CNNs) to distinguish true unlabeled candidates from false ones. Results show that CNNdel works well with NGS reads from 26 low coverage genomes of the 1000 Genomes Project. The paper demonstrates that convolutional neural networks can automatically assign the priority of SV features and reduce the false positives efficaciously. PMID:28630866

  2. SNBRFinder: A Sequence-Based Hybrid Algorithm for Enhanced Prediction of Nucleic Acid-Binding Residues.

    PubMed

    Yang, Xiaoxia; Wang, Jia; Sun, Jun; Liu, Rong

    2015-01-01

    Protein-nucleic acid interactions are central to various fundamental biological processes. Automated methods capable of reliably identifying DNA- and RNA-binding residues in protein sequence are assuming ever-increasing importance. The majority of current algorithms rely on feature-based prediction, but their accuracy remains to be further improved. Here we propose a sequence-based hybrid algorithm SNBRFinder (Sequence-based Nucleic acid-Binding Residue Finder) by merging a feature predictor SNBRFinderF and a template predictor SNBRFinderT. SNBRFinderF was established using the support vector machine whose inputs include sequence profile and other complementary sequence descriptors, while SNBRFinderT was implemented with the sequence alignment algorithm based on profile hidden Markov models to capture the weakly homologous template of query sequence. Experimental results show that SNBRFinderF was clearly superior to the commonly used sequence profile-based predictor and SNBRFinderT can achieve comparable performance to the structure-based template methods. Leveraging the complementary relationship between these two predictors, SNBRFinder reasonably improved the performance of both DNA- and RNA-binding residue predictions. More importantly, the sequence-based hybrid prediction reached competitive performance relative to our previous structure-based counterpart. Our extensive and stringent comparisons show that SNBRFinder has obvious advantages over the existing sequence-based prediction algorithms. The value of our algorithm is highlighted by establishing an easy-to-use web server that is freely accessible at http://ibi.hzau.edu.cn/SNBRFinder.

  3. SIMAP--a comprehensive database of pre-calculated protein sequence similarities, domains, annotations and clusters.

    PubMed

    Rattei, Thomas; Tischler, Patrick; Götz, Stefan; Jehl, Marc-André; Hoser, Jonathan; Arnold, Roland; Conesa, Ana; Mewes, Hans-Werner

    2010-01-01

    The prediction of protein function as well as the reconstruction of evolutionary genesis employing sequence comparison at large is still the most powerful tool in sequence analysis. Due to the exponential growth of the number of known protein sequences and the subsequent quadratic growth of the similarity matrix, the computation of the Similarity Matrix of Proteins (SIMAP) becomes a computationally intensive task. The SIMAP database provides a comprehensive and up-to-date pre-calculation of the protein sequence similarity matrix, sequence-based features and sequence clusters. As of September 2009, SIMAP covers 48 million proteins and more than 23 million non-redundant sequences. Novel features of SIMAP include the expansion of the sequence space by including databases such as ENSEMBL as well as the integration of metagenomes based on their consistent processing and annotation. Furthermore, protein function predictions by Blast2GO are pre-calculated for all sequences in SIMAP and the data access and query functions have been improved. SIMAP helps biologists query the up-to-date sequence space systematically and facilitates large-scale downstream projects in computational biology. Access to SIMAP is freely provided through the web portal for individuals (http://mips.gsf.de/simap/) and for programmatic access through DAS (http://webclu.bio.wzw.tum.de/das/) and Web-Service (http://mips.gsf.de/webservices/services/SimapService2.0?wsdl).

  4. AfterQC: automatic filtering, trimming, error removing and quality control for fastq data.

    PubMed

    Chen, Shifu; Huang, Tanxiao; Zhou, Yanqing; Han, Yue; Xu, Mingyan; Gu, Jia

    2017-03-14

    Some applications, especially those clinical applications requiring high accuracy of sequencing data, usually have to face the troubles caused by unavoidable sequencing errors. Several tools have been proposed to profile the sequencing quality, but few of them can quantify or correct the sequencing errors. This unmet requirement motivated us to develop AfterQC, a tool with functions to profile sequencing errors and correct most of them, plus highly automated quality control and data filtering features. Different from most tools, AfterQC analyses the overlapping of paired sequences for pair-end sequencing data. Based on overlapping analysis, AfterQC can detect and cut adapters, and furthermore it gives a novel function to correct wrong bases in the overlapping regions. Another new feature is to detect and visualise sequencing bubbles, which can be commonly found on the flowcell lanes and may raise sequencing errors. Besides normal per cycle quality and base content plotting, AfterQC also provides features like polyX (a long sub-sequence of a same base X) filtering, automatic trimming and K-MER based strand bias profiling. For each single or pair of FastQ files, AfterQC filters out bad reads, detects and eliminates sequencer's bubble effects, trims reads at front and tail, detects the sequencing errors and corrects part of them, and finally outputs clean data and generates HTML reports with interactive figures. AfterQC can run in batch mode with multiprocess support, it can run with a single FastQ file, a single pair of FastQ files (for pair-end sequencing), or a folder for all included FastQ files to be processed automatically. Based on overlapping analysis, AfterQC can estimate the sequencing error rate and profile the error transform distribution. The results of our error profiling tests show that the error distribution is highly platform dependent. 
    Much more than just another new quality control (QC) tool, AfterQC is able to perform quality control, data filtering, error profiling and base correction automatically. Experimental results show that AfterQC can help to eliminate the sequencing errors for pair-end sequencing data to provide much cleaner outputs, and consequently help to reduce the false-positive variants, especially for the low-frequency somatic mutations. While providing rich configurable options, AfterQC can detect and set all the options automatically and requires no arguments in most cases.
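
    The polyX filtering mentioned in this record (rejecting reads that contain a long run of one repeated base X) can be sketched in a few lines. This is a minimal illustration and not AfterQC's actual implementation: the function names and the max_run threshold of 10 are assumptions chosen for the example.

```python
import re


def longest_homopolymer(read):
    """Return (base, length) of the longest run of one repeated character."""
    best_base, best_len = "", 0
    # (.)\1* matches a character followed by any number of repeats of it,
    # so finditer splits the read into consecutive homopolymer runs.
    for match in re.finditer(r"(.)\1*", read):
        run = match.group(0)
        if len(run) > best_len:
            best_base, best_len = run[0], len(run)
    return best_base, best_len


def passes_polyx_filter(read, max_run=10):
    """Keep a read only if its longest homopolymer run is at most max_run.

    max_run is a hypothetical threshold, not a documented AfterQC default.
    """
    _, length = longest_homopolymer(read)
    return length <= max_run


reads = ["ACGTTTTTGA", "A" * 12 + "CGT", "ACGT"]
kept = [r for r in reads if passes_polyx_filter(r)]
```

    With the assumed threshold, the second read (a run of twelve A bases) is discarded and the other two are kept.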

  5. An Observational Study of Children's Involvement in Informed Consent for Exome Sequencing Research.

    PubMed

    Miller, Victoria A; Werner-Lin, Allison; Walser, Sarah A; Biswas, Sawona; Bernhardt, Barbara A

    2017-02-01

    The goal of this study was to examine children's involvement in consent sessions for exome sequencing research and associations of involvement with provider and parent communication. Participants included 44 children (8-17 years) from five cohorts who were offered participation in an exome sequencing study. The consent sessions were audiotaped, transcribed, and coded. Providers attempted to facilitate the child's involvement in the majority (73%) of sessions, and most (75%) children also verbally participated. Provider facilitation was strongly associated with likelihood of child participation. These findings underscore that strategies such as asking for children's opinions and soliciting their questions show respect for children and may increase the likelihood that they are engaged and involved in decisions about research participation.

  6. Mining SNPs from EST sequences using filters and ensemble classifiers.

    PubMed

    Wang, J; Zou, Q; Guo, M Z

    2010-05-04

    Abundant single nucleotide polymorphisms (SNPs) provide the most complete information for genome-wide association studies. However, due to the bottleneck of manual discovery of putative SNPs and the inaccessibility of the original sequencing reads, it is essential to develop a more efficient and accurate computational method for automated SNP detection. We propose a novel computational method to rapidly find true SNPs in publicly available EST (expressed sequence tag) databases; this method is implemented as SNPDigger. EST sequences are clustered and aligned. SNP candidates are then obtained according to a measure of redundant frequency. Several new informative biological features, such as the structural neighbor profiles and the physical position of the SNP, were extracted from EST sequences, and the effectiveness of these features was demonstrated. An ensemble classifier, which employs a carefully selected feature set, was included for the imbalanced training data. The sensitivity and specificity of our method both exceeded 80% for human genetic data in the cross validation. Our method enables detection of SNPs from the user's own EST dataset and can be used on species for which there is no genome data. Our tests showed that this method can effectively guide SNP discovery in ESTs and will be useful for avoiding or reducing the cost of biological analyses.

  7. Nullomers and High Order Nullomers in Genomic Sequences

    PubMed Central

    Vergni, Davide; Santoni, Daniele

    2016-01-01

    A nullomer is an oligomer that does not occur as a subsequence in a given DNA sequence, i.e. it is an absent word of that sequence. The importance of nullomers in several applications, from drug discovery to forensic practice, is now debated in the literature. Here, we investigated the nature of nullomers, whether their absence in genomes has just a statistical explanation or it is a peculiar feature of genomic sequences. We introduced an extension of the notion of nullomer, namely high order nullomers, which are nullomers whose mutated sequences are still nullomers. We studied different aspects of them: comparison with nullomers of random sequences, CpG distribution and mean helical rise. In agreement with previous results we found that the number of nullomers in the human genome is much larger than expected by chance. Nevertheless antithetical results were found when considering a random DNA sequence preserving dinucleotide frequencies. The analysis of CpG frequencies in nullomers and high order nullomers revealed, as expected, a high CpG content but it also highlighted a strong dependence of CpG frequencies on the dinucleotide position, suggesting that nullomers have their own peculiar structure and are not simply sequences whose CpG frequency is biased. Furthermore, phylogenetic trees were built on eleven species based on both the similarities between the dinucleotide frequencies and the number of nullomers two species share, showing that nullomers are fairly conserved among close species. Finally the study of mean helical rise of nullomers sequences revealed significantly high mean rise values, reinforcing the hypothesis that those sequences have some peculiar structural features. 
    The obtained results show that nullomers are the consequence of the peculiar structure of DNA (also including biased CpG frequency and CpG islands), so that the hypermutability model, even when CpG islands are taken into account, seems insufficient to explain the nullomer phenomenon. Finally, high order nullomers could emphasize those features that already make simple nullomers useful in several applications. PMID:27906971
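
    The definitions in this record translate directly into a small enumeration. The sketch below lists the length-k nullomers of a sequence (words over A, C, G, T that never occur in it) and then keeps those whose mutated forms are also nullomers. It assumes that "mutated sequences" means all single-base substitutions, which is one reading of the abstract and not necessarily the authors' exact definition; all function names are hypothetical, and the brute-force enumeration is only practical for small k.

```python
from itertools import product

ALPHABET = "ACGT"


def kmers_present(seq, k):
    """Set of all length-k substrings that occur in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def nullomers(seq, k):
    """Sorted list of length-k words over ACGT that are absent from seq."""
    present = kmers_present(seq, k)
    words = ("".join(p) for p in product(ALPHABET, repeat=k))
    return sorted(w for w in words if w not in present)


def single_mutants(word):
    """Yield every word that differs from `word` at exactly one position."""
    for i, base in enumerate(word):
        for alt in ALPHABET:
            if alt != base:
                yield word[:i] + alt + word[i + 1:]


def high_order_nullomers(seq, k):
    """Nullomers whose single-base mutants are all nullomers as well.

    Assumption: "mutated" is taken to mean one substitution per word.
    """
    absent = set(nullomers(seq, k))
    return sorted(w for w in absent
                  if all(m in absent for m in single_mutants(w)))


absent_2mers = nullomers("ACGTACGT", 2)
```

    For the toy sequence ACGTACGT only four of the sixteen dinucleotides occur (AC, CG, GT, TA), so twelve are nullomers, and none of them is a high order nullomer under the assumed definition because each has at least one single-base mutant that is present.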

  8. Genome analysis of Desulfotomaculum gibsoniae strain GrollT a highly versatile Gram-positive sulfate-reducing bacterium

    PubMed Central

    Kuever, Jan; Visser, Michael; Loeffler, Claudia; Boll, Matthias; Worm, Petra; Sousa, Diana Z.; Plugge, Caroline M.; Schaap, Peter J.; Muyzer, Gerard; Pereira, Ines A.C.; Parshina, Sofiya N.; Goodwin, Lynne A.; Kyrpides, Nikos C.; Detter, Janine; Woyke, Tanja; Chain, Patrick; Davenport, Karen W.; Rohde, Manfred; Spring, Stefan; Klenk, Hans-Peter; Stams, Alfons J.M.

    2014-01-01

    Desulfotomaculum gibsoniae is a mesophilic member of the polyphyletic spore-forming genus Desulfotomaculum within the family Peptococcaceae. This bacterium was isolated from a freshwater ditch and is of interest because it can grow with a large variety of organic substrates, in particular several aromatic compounds, short-chain and medium-chain fatty acids, which are degraded completely to carbon dioxide coupled to the reduction of sulfate. It can grow autotrophically with H2 + CO2 and sulfate and slowly acetogenically with H2 + CO2, formate or methoxylated aromatic compounds in the absence of sulfate. It does not require any vitamins for growth. Here, we describe the features of D. gibsoniae strain GrollT together with the genome sequence and annotation. The chromosome has 4,855,529 bp organized in one circular contig and is the largest genome of all sequenced Desulfotomaculum spp. to date. A total of 4,666 candidate protein-encoding genes and 96 RNA genes were identified. Genes of the acetyl-CoA pathway, possibly involved in heterotrophic growth and in CO2 fixation during autotrophic growth, are present. The genome contains a large set of genes for the anaerobic transformation and degradation of aromatic compounds, which are lacking in the other sequenced Desulfotomaculum genomes. PMID:25197466

  9. Novel compound heterozygous mutations in CNGA1 in a Chinese family affected with autosomal recessive retinitis pigmentosa by targeted sequencing.

    PubMed

    Wang, Min; Gan, Dekang; Huang, Xin; Xu, Gezhi

    2016-07-08

    About 37 genes have been reported to be involved in autosomal recessive retinitis pigmentosa, a hereditary retinal disease. However, the causative genes remain unclear in many cases. Two sibs of a Chinese family with ocular disease were diagnosed at the Eye and ENT Hospital of Fudan University. Targeted sequencing was performed on the proband to screen for pathogenic mutations. PCR combined with Sanger sequencing was then performed on eight family members, including two affected and six unaffected individuals, to determine whether the mutations cosegregate with the disease. The two affected members exhibited clinical features that fit the criteria of autosomal recessive retinitis pigmentosa. Two heterozygous mutations (NM_000087, p.Y82X and p.L89fs) in CNGA1 were revealed in the proband. Affected members were compound heterozygotes for the two mutations, whereas unaffected members either had no mutation or were heterozygous carriers of only one of the two mutations. That is, these mutations cosegregate with autosomal recessive retinitis pigmentosa. The compound heterozygous mutations (NM_000087, p.Y82X and p.L89fs) in exon 6 of CNGA1 are pathogenic mutations in this Chinese family. Of these, p.Y82X is reported for the first time in a patient with autosomal recessive retinitis pigmentosa.

  10. Comparative Genomic Analyses of Clavibacter michiganensis subsp. insidiosus and Pathogenicity on Medicago truncatula.

    PubMed

    Lu, You; Ishimaru, Carol A; Glazebrook, Jane; Samac, Deborah A

    2018-02-01

    Clavibacter michiganensis is the most economically important gram-positive bacterial plant pathogen, with subspecies that cause serious diseases of maize, wheat, tomato, potato, and alfalfa. Much less is known about pathogenesis involving gram-positive plant pathogens than is known for gram-negative bacteria. Comparative genome analyses of C. michiganensis subspecies affecting tomato, potato, and maize have provided insights on pathogenicity. In this study, we identified strains of C. michiganensis subsp. insidiosus with contrasting pathogenicity on three accessions of the model legume Medicago truncatula. We generated complete genome sequences for two strains and compared these to a previously sequenced strain and genome sequences of four other subspecies. The three C. michiganensis subsp. insidiosus strains varied in gene content due to genome rearrangements, most likely facilitated by insertion elements, and plasmid number, which varied from one to three depending on strain. The core C. michiganensis genome consisted of 1,917 genes, with 379 genes unique to C. michiganensis subsp. insidiosus. An operon for synthesis of the extracellular blue pigment indigoidine, enzymes for pectin degradation, and an operon for inositol metabolism are among the unique features. Secreted serine proteases belonging to both the pat-1 and ppa families were present but highly diverged from those in other subspecies.

  11. Properties and structure of a low-potential, penta-heme cytochrome c 552 from a thermophilic purple sulfur photosynthetic bacterium Thermochromatium tepidum.

    PubMed

    Chen, Jing-Hua; Yu, Long-Jiang; Boussac, Alain; Wang-Otomo, Zheng-Yu; Kuang, Tingyun; Shen, Jian-Ren

    2018-04-24

    The thermophilic purple sulfur bacterium Thermochromatium tepidum possesses four main water-soluble redox proteins involved in the electron transfer behavior. Crystal structures have been reported for three of them: a high potential iron-sulfur protein, cytochrome c', and one of two low-potential cytochrome c 552 (which is a flavocytochrome c). In this study, we purified another low-potential cytochrome c 552 (LPC), determined its N-terminal amino acid sequence and the whole gene sequence, characterized it with absorption and electron paramagnetic spectroscopy, and solved its high-resolution crystal structure. This novel cytochrome was found to contain five c-type hemes. The overall fold of LPC consists of two distinct domains: one is the five heme-containing domain and the other is an Ig-like domain. This provides a representative example for the structures of multiheme cytochromes containing an odd number of hemes, although the structures of multiheme cytochromes with an even number of hemes are frequently seen in the PDB database. Comparison of the sequence and structure of LPC with other proteins in the databases revealed several characteristic features which may be important for its functioning. Based on the results obtained, we discuss the possible intracellular function of this LPC in Tch. tepidum.

  12. Finding functional features in Saccharomyces genomes by phylogenetic footprinting.

    PubMed

    Cliften, Paul; Sudarsanam, Priya; Desikan, Ashwin; Fulton, Lucinda; Fulton, Bob; Majors, John; Waterston, Robert; Cohen, Barak A; Johnston, Mark

    2003-07-04

    The sifting and winnowing of DNA sequence that occur during evolution cause nonfunctional sequences to diverge, leaving phylogenetic footprints of functional sequence elements in comparisons of genome sequences. We searched for such footprints among the genome sequences of six Saccharomyces species and identified potentially functional sequences. Comparison of these sequences allowed us to revise the catalog of yeast genes and identify sequence motifs that may be targets of transcriptional regulatory proteins. Some of these conserved sequence motifs reside upstream of genes with similar functional annotations or similar expression patterns or those bound by the same transcription factor and are thus good candidates for functional regulatory sequences.

  13. Gee Fu: a sequence version and web-services database tool for genomic assembly, genome feature and NGS data.

    PubMed

    Ramirez-Gonzalez, Ricardo; Caccamo, Mario; MacLean, Daniel

    2011-10-01

    Scientists now use high-throughput sequencing technologies and short-read assembly methods to create draft genome assemblies in just days. Tools and pipelines like the assembler, and the workflow management environments make it easy for a non-specialist to implement complicated pipelines to produce genome assemblies and annotations very quickly. Such accessibility results in a proliferation of assemblies and associated files, often for many organisms. These assemblies get used as a working reference by many different workers, from a bioinformatician doing gene prediction to a bench scientist designing primers for PCR. Here we describe Gee Fu, a database tool for genomic assembly and feature data, including next-generation sequence alignments. Gee Fu is an instance of a Ruby-On-Rails web application on a feature database that provides web and console interfaces for input, visualization of feature data via AnnoJ, access to data through a web-service interface, an API for direct data access by Ruby scripts and access to feature data stored in BAM files. Gee Fu provides a platform for storing and sharing different versions of an assembly and associated features that can be accessed and updated by bench biologists and bioinformaticians in ways that are easy and useful for each. Availability: http://tinyurl.com/geefu. Contact: dan.maclean@tsl.ac.uk.

  14. Feature co-localization landscape of the human genome

    PubMed Central

    Ng, Siu-Kin; Hu, Taobo; Long, Xi; Chan, Cheuk-Hin; Tsang, Shui-Ying; Xue, Hong

    2016-01-01

    Although feature co-localizations could serve as useful guide-posts to genome architecture, a comprehensive and quantitative feature co-localization map of the human genome has been lacking. Herein we show that, in contrast to the conventional bipartite division of genomic sequences into genic and inter-genic regions, pairwise co-localizations of forty-two genomic features in the twenty-two autosomes based on 50-kb to 2,000-kb sequence windows indicate a tripartite zonal architecture comprising Genic zones enriched with gene-related features and Alu-elements; Proximal zones enriched with MIR- and L2-elements, transcription-factor-binding-sites (TFBSs), and conserved-indels (CIDs); and Distal zones enriched with L1-elements. Co-localizations between single-nucleotide-polymorphisms (SNPs) and copy-number-variations (CNVs) reveal a fraction of sequence windows displaying steeply enhanced levels of SNPs, CNVs and recombination rates that point to active adaptive evolution in such pathways as immune response, sensory perceptions, and cognition. The strongest positive co-localization observed between TFBSs and CIDs suggests a regulatory role of CIDs in cooperation with TFBSs. The positive co-localizations of cancer somatic CNVs (CNVT) with all Proximal zone and most Genic zone features, in contrast to the distinctly more restricted co-localizations exhibited by germline CNVs (CNVG), reveal disparate distributions of CNVTs and CNVGs indicative of dissimilarity in their underlying mechanisms. PMID:26854351

  15. Formation and evolution of Lakshmi Planum, Venus: Assessment of models using observations from geological mapping

    NASA Astrophysics Data System (ADS)

    Ivanov, M. A.; Head, J. W.

    2008-12-01

    Detailed geological analysis of the Lakshmi Planum region of western Ishtar Terra results in the establishment of the sequence of major events during the formation and evolution of western Ishtar Terra, an important and somewhat unique area on Venus characterized by a raised volcanic plateau surrounded by distinctive folded mountain belts, such as Maxwell Montes. These mapping results and the stratigraphic and structural relationships provide a basis for addressing the complicated problem of Lakshmi Planum formation and for testing the suite of models previously proposed to explain this structure. We review and classify previous models of formation for western Ishtar Terra into "downwelling" models (generally involving convergence and underthrusting) and "upwelling" models (generally involving plume-like upwelling and divergence). The interpreted nature of units and the sequence of events derived from geological mapping are in contrast to the predictions of the divergent models. The major contradictions are as follows: (1) The very likely presence of an ancient (craton-like) tessera massif in the core of Lakshmi, which is inconsistent with the model of formation of Lakshmi due to rise and collapse of a mantle diapir; (2) The absence of rift zones in the interior of Lakshmi that are predicted by the divergent models; (3) The apparent migration of volcanic activity toward the center of Lakshmi, whereas divergent models predict the opposite trend; (4) The abrupt cessation of ridges of the mountain ranges at the edge of Lakshmi Planum and propagation of these ridges over hundreds of kilometers outside Lakshmi; the divergent models predict the opposite progression in the development of major contractional features. 
    In contrast, convergent models of formation and evolution of Lakshmi Planum appear to be more consistent with the observations and explain this structure by collision and underthrusting/subduction of lower-lying plains with the elevated and rigid block of tessera. These models are capable of explaining formation of the major features of western Ishtar (for example, the mountain belts), the sequences of events, and principal volcanic and tectonic trends during the evolution of Lakshmi. To explain the pronounced north-south asymmetry of Lakshmi these models need to consider the likelihood that the major focal points of collision are at the north and north-west margins of the plateau. We note that pure downwelling models, however, face three important difficulties: (1) The possibly unrealistically long time span that appears to be required to produce the major features of Lakshmi; (2) The strong north-south asymmetry of the Planum; the pure downwelling models predict the formation of a more symmetrical structure; and (3) The absence of radial contractional structures (arches and ridges) in the interior of Lakshmi that would represent the predictions of the downwelling models.

  16. Gene calling and bacterial genome annotation with BG7.

    PubMed

    Tobes, Raquel; Pareja-Tobes, Pablo; Manrique, Marina; Pareja-Tobes, Eduardo; Kovach, Evdokim; Alekhin, Alexey; Pareja, Eduardo

    2015-01-01

    New massive sequencing technologies are providing many bacterial genome sequences from diverse taxa but a refined annotation of these genomes is crucial for obtaining scientific findings and new knowledge. Thus, bacterial genome annotation has emerged as a key point to investigate in bacteria. Any efficient tool designed specifically to annotate bacterial genomes sequenced with massively parallel technologies has to consider the specific features of bacterial genomes (absence of introns and scarcity of nonprotein-coding sequence) and of next-generation sequencing (NGS) technologies (presence of errors and imperfectly assembled genomes). These features make it convenient to focus on coding regions and, hence, on protein sequences, which are the elements directly related to biological functions. In this chapter we describe how to annotate bacterial genomes with BG7, an open-source tool based on a protein-centered gene calling/annotation paradigm. BG7 is specifically designed for the annotation of bacterial genomes sequenced with NGS. The tool is tolerant of sequence errors and maintains its capabilities when annotating highly fragmented genomes or mixed sequences coming from several genomes (such as those obtained from metagenomic samples). BG7 has been designed with scalability as a requirement, with a computing infrastructure completely based on cloud computing (Amazon Web Services).

  17. A novel and efficient technique for identification and classification of GPCRs.

    PubMed

    Gupta, Ravi; Mittal, Ankush; Singh, Kuldip

    2008-07-01

    G-protein coupled receptors (GPCRs) play a vital role in different biological processes, such as regulation of growth, death, and metabolism of cells. GPCRs are the focus of a significant amount of current pharmaceutical research since they interact with more than 50% of prescription drugs. The dipeptide-based support vector machine (SVM) approach is the most accurate technique to identify and classify the GPCRs. However, this approach has two major disadvantages. First, the dimension of the dipeptide-based feature vector is equal to 400. The large dimension makes the classification task inefficient in both computation and memory. Second, it does not consider the biological properties of the protein sequence for identification and classification of GPCRs. In this paper, we present a novel-feature-based SVM classification technique. The novel features are derived by applying a wavelet-based time series analysis approach to protein sequences. The proposed feature space summarizes the variance information of seven important biological properties of amino acids in a protein sequence. In addition, the dimension of the feature vector for the proposed technique is equal to 35. Experiments were performed on GPCR protein sequences available at the GPCRs Database. Our approach achieves an accuracy of 99.9%, 98.06%, 97.78%, and 94.08% for GPCR superfamily, families, subfamilies, and subsubfamilies (amine group), respectively, when evaluated using fivefold cross-validation. Further, an accuracy of 99.8%, 97.26%, and 97.84% was obtained when evaluated on unseen or recall datasets of GPCR superfamily, families, and subfamilies, respectively. Comparison with the dipeptide-based SVM technique shows the effectiveness of our approach.
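
    The 400-dimensional dipeptide feature vector mentioned in this abstract follows from the 20 x 20 ordered pairs of standard amino acids. A minimal sketch of how such a vector could be computed is given below. The function name, the normalisation to frequencies, and the toy sequence are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
# All ordered residue pairs: 20 * 20 = 400 dipeptides.
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
INDEX = {dp: i for i, dp in enumerate(DIPEPTIDES)}


def dipeptide_composition(seq):
    """Return a 400-element list of dipeptide frequencies (hypothetical helper)."""
    counts = [0] * len(DIPEPTIDES)
    total = 0
    seq = seq.upper()
    for i in range(len(seq) - 1):
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:  # assumption: pairs with non-standard residues are skipped
            counts[idx] += 1
            total += 1
    if total == 0:
        return [0.0] * len(DIPEPTIDES)
    return [c / total for c in counts]


if __name__ == "__main__":
    vec = dipeptide_composition("MKTAYIAKQR")  # toy sequence, not real GPCR data
    print(len(vec))
```

    Each protein maps to one fixed-length vector regardless of its length, which is what makes the representation usable as SVM input and also explains its size.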

  18. SERDP and ESTCP Expert Panel Workshop on Research and Development Needs for the Environmental Remediation Application of Molecular Biological Tools

    DTIC Science & Technology

    2005-10-01

    used to infer metabolic rates in marine systems. For example, there is evidence from both pure cultures and environmental samples that rbcL...It includes many useful bioinformatics features such as constructing a neighbor-joining tree for a subset of sequences, downloading a subset of...further provide software that allow users to extract useful information from sequences. The most commonly used feature is probe/primer design

  19. Novel and general approach to linear filter design for contrast-to-noise ratio enhancement of magnetic resonance images with multiple interfering features in the scene

    NASA Astrophysics Data System (ADS)

    Soltanian-Zadeh, Hamid; Windham, Joe P.

    1992-04-01

    Maximizing the minimum absolute contrast-to-noise ratios (CNRs) between a desired feature and multiple interfering processes, by linear combination of images in a magnetic resonance imaging (MRI) scene sequence, is attractive for MRI analysis and interpretation. A general formulation of the problem is presented, along with a novel solution utilizing the simple and numerically stable method of Gram-Schmidt orthogonalization. We derive explicit solutions for the case of two interfering features first, then for three interfering features, and, finally, using a typical example, for an arbitrary number of interfering features. For the case of two interfering features, we also provide simplified analytical expressions for the signal-to-noise ratios (SNRs) and CNRs of the filtered images. The technique is demonstrated through its applications to simulated and acquired MRI scene sequences of a human brain with a cerebral infarction. For these applications, a 50 to 100% improvement for the smallest absolute CNR is obtained.
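
    The abstract names Gram-Schmidt orthogonalization as the numerical building block of its solution. The sketch below illustrates only that building block: it orthonormalises a set of interfering signature vectors and removes their components from a desired signature vector. It is not the authors' max-min CNR filter. The function names and the toy vectors are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors, tol=1e-12):
    """Return an orthonormal basis for the span of `vectors` (modified Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # drop vectors that are linearly dependent on earlier ones
            basis.append([wi / norm for wi in w])
    return basis


def remove_components(desired, interferers):
    """Subtract from `desired` its projection onto the span of `interferers`."""
    w = [float(x) for x in desired]
    for b in gram_schmidt(interferers):
        coeff = dot(w, b)
        w = [wi - coeff * bi for wi, bi in zip(w, b)]
    return w


if __name__ == "__main__":
    # Toy signature vectors over three images (hypothetical values).
    weights = remove_components([1, 2, 3], [[1, 0, 0], [1, 1, 0]])
    print(weights)
```

    A weight vector obtained this way is orthogonal to every interfering signature, so a linear combination of the images with these weights suppresses those features. The paper's criterion (maximising the smallest absolute CNR) is a different objective and may give different weights.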

  20. Fungal prion HET-s as a model for structural complexity and self-propagation in prions.

    PubMed

    Wan, William; Stubbs, Gerald

    2014-04-08

    The highly ordered and reproducible structure of the fungal prion HET-s makes it an excellent model system for studying the inherent properties of prions, self-propagating infectious proteins that have been implicated in a number of fatal diseases. In particular, the HET-s prion-forming domain readily folds into a relatively complex two-rung β-solenoid amyloid. The faithful self-propagation of this fold involves a diverse array of inter- and intramolecular structural features. These features include a long flexible loop connecting the two rungs, buried polar residues, salt bridges, and asparagine ladders. We have used site-directed mutagenesis and X-ray fiber diffraction to probe the relative importance of these features for the formation of β-solenoid structure, as well as the cumulative effects of multiple mutations. Using fibrillization kinetics and chemical stability assays, we have determined the biophysical effects of our mutations on the assembly and stability of the prion-forming domain. We have found that a diversity of structural features provides a level of redundancy that allows robust folding and stability even in the face of significant sequence alterations and suboptimal environmental conditions. Our findings provide fundamental insights into the structural interactions necessary for self-propagation. Propagation of prion structure seems to require an obligatory level of complexity that may not be reproducible in short peptide models.

  1. Next-generation sequencing using a pre-designed gene panel for the molecular diagnosis of congenital disorders in pediatric patients.

    PubMed

    Lim, Eileen C P; Brett, Maggie; Lai, Angeline H M; Lee, Siew-Peng; Tan, Ee-Shien; Jamuar, Saumya S; Ng, Ivy S L; Tan, Ene-Choo

    2015-12-14

    Next-generation sequencing (NGS) has revolutionized genetic research and offers enormous potential for clinical application. Sequencing the exome has the advantage of casting the net wide for all known coding regions while targeted gene panel sequencing provides enhanced sequencing depths and can be designed to avoid incidental findings in adult-onset conditions. A HaloPlex panel consisting of 180 genes within commonly altered chromosomal regions is available for use on both the Ion Personal Genome Machine (PGM) and MiSeq platforms to screen for causative mutations in these genes. We used this Haloplex ICCG panel for targeted sequencing of 15 patients with clinical presentations indicative of an abnormality in one of the 180 genes. Sequencing runs were done using the Ion 318 Chips on the Ion Torrent PGM. Variants were filtered for known polymorphisms and analysis was done to identify possible disease-causing variants before validation by Sanger sequencing. When possible, segregation of variants with phenotype in family members was performed to ascertain the pathogenicity of the variant. More than 97% of the target bases were covered at >20×. There was an average of 9.6 novel variants per patient. Pathogenic mutations were identified in five genes for six patients, with two novel variants. There were another five likely pathogenic variants, some of which were unreported novel variants. In a cohort of 15 patients, we were able to identify a likely genetic etiology in six patients (40%). Another five patients had candidate variants for which further evaluation and segregation analysis are ongoing. Our results indicate that the HaloPlex ICCG panel is useful as a rapid, high-throughput and cost-effective screening tool for 170 of the 180 genes. There is low coverage for some regions in several genes which might have to be supplemented by Sanger sequencing. 
    However, comparing the cost, ease of analysis, and shorter turnaround time, it is a good alternative to exome sequencing for patients whose features are suggestive of a genetic etiology involving one of the genes in the panel.
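
    The coverage figure quoted in this abstract (more than 97% of target bases covered at >20×) is a simple ratio over per-base read depths. A minimal sketch of that calculation follows, assuming the depths are already available as a list of integers. The function name and the toy depths are hypothetical.

```python
def fraction_covered(depths, min_depth=20):
    """Fraction of target bases whose read depth is strictly greater than min_depth."""
    depths = list(depths)
    if not depths:
        raise ValueError("no target bases supplied")
    return sum(1 for d in depths if d > min_depth) / len(depths)


if __name__ == "__main__":
    # Toy per-base depths for four target positions (not data from the study).
    print(fraction_covered([25, 30, 20, 5]))
```

    Regions that pull this fraction down are the ones the abstract suggests supplementing with Sanger sequencing.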

  2. DOE Office of Scientific and Technical Information (OSTI.GOV)

    Malashkevich, Vladimir N.; Higgins, Chelsea D.; Almo, Steven C.

    The coiled-coil is one of the most ubiquitous and well studied protein structural motifs. Significant effort has been devoted to dissecting subtle variations of the typical heptad repeat sequence pattern that can designate larger topological features such as relative α-helical orientation and oligomer size. Here we report the X-ray structure of a model coiled-coil peptide, HA2-Del-L2seM, which forms an unanticipated core antiparallel dimer with potential sites for discrete higher-order multimerization (trimer or tetramer). In the X-ray structure, a third, partially-ordered α-helix is weakly associated with the antiparallel dimer and analytical ultracentrifugation experiments indicate the peptide forms a well-defined tetramer in solution. The HA2-Del-L2seM sequence is closely related to a parent model peptide, HA2-Del, which we previously reported adopts a parallel trimer; HA2-Del-L2seM differs by only hydrophobic leucine to selenomethionine mutations and thus this subtle difference is sufficient to switch both relative α-helical topology and number of α-helices participating in the coiled-coil. Comparison of the X-ray structures of HA2-Del-L2seM (reported here) with the HA2-Del parent (reported previously) reveals novel interactions involving the selenomethionine residues that promote antiparallel coiled-coil configuration and preclude parallel trimer formation. Finally, these novel atomic insights are instructive for understanding subtle features that can affect coiled-coil topology and provide additional information for design of antiparallel coiled-coils.

  3. Biallelic Alteration and Dysregulation of the Hippo Pathway in Mucinous Tubular and Spindle Cell Carcinoma of the Kidney.

    PubMed

    Mehra, Rohit; Vats, Pankaj; Cieslik, Marcin; Cao, Xuhong; Su, Fengyun; Shukla, Sudhanshu; Udager, Aaron M; Wang, Rui; Pan, Jincheng; Kasaian, Katayoon; Lonigro, Robert; Siddiqui, Javed; Premkumar, Kumpati; Palapattu, Ganesh; Weizer, Alon; Hafez, Khaled S; Wolf, J Stuart; Sangoi, Ankur R; Trpkov, Kiril; Osunkoya, Adeboye O; Zhou, Ming; Giannico, Giovanna; McKenney, Jesse K; Dhanasekaran, Saravana M; Chinnaiyan, Arul M

    2016-11-01

    Mucinous tubular and spindle cell carcinoma (MTSCC) is a relatively rare subtype of renal cell carcinoma (RCC) with distinctive morphologic and cytogenetic features. Here, we carry out whole-exome and transcriptome sequencing of a multi-institutional cohort of MTSCC (n = 22). We demonstrate the presence of either biallelic loss of Hippo pathway tumor suppressor genes (TSG) and/or evidence of alteration of Hippo pathway genes in 85% of samples. PTPN14 (31%) and NF2 (22%) were the most commonly implicated Hippo pathway genes, whereas other genes such as SAV1 and HIPK2 were also involved in a mutually exclusive fashion. Mutations in the context of recurrent chromosomal losses amounted to biallelic alterations in these TSGs. As a readout of Hippo pathway inactivation, a majority of cases (90%) exhibited increased nuclear YAP1 protein expression. Taken together, nearly all cases of MTSCC exhibit some evidence of Hippo pathway dysregulation. MTSCC is a rare and relatively recently described subtype of RCC. Next-generation sequencing of a multi-institutional MTSCC cohort revealed recurrent chromosomal losses and somatic mutations in the Hippo signaling pathway genes leading to potential YAP1 activation. In virtually all cases of MTSCC, there was evidence of Hippo pathway dysregulation, suggesting a common mechanistic basis for this disease. Cancer Discov; 6(11); 1258-66. ©2016 AACR. This article is highlighted in the In This Issue feature, p. 1197.

  4. Characterization of a Case of Pigmentary Retinopathy in Sanfilippo Syndrome Type IIIA Associated with Compound Heterozygous Mutations in the SGSH Gene.

    PubMed

    Wilkin, Justin; Kerr, Natalie C; Byrd, Kathryn W; Ward, Jewell C; Iannaccone, Alessandro

    2016-06-01

    To report longitudinal phenotypic findings in a patient with Sanfilippo syndrome type IIIA, harboring SGSH mutations, one of which is novel. Heparan-N-sulfatidase enzyme function testing in skin fibroblasts and white blood cells and SGSH gene sequencing were obtained. Clinical office examinations, examinations under anesthesia, electroretinogram, spectral domain optical coherence tomography (SD-OCT), and fundus photography were performed over a 5-year period. Fundus examination revealed a progressive breadcrumb-like pigmentary retinopathy with perifoveal pigmentary involvement. SD-OCT showed loss of normal neuroretinal lamination and cystic macular changes responsive to treatment with carbonic anhydrase inhibitors. Electroretinography exhibited complex characteristics indicative of a generalized retinal rod > cone dysfunction with significant ON > OFF postreceptoral response compromise. Sequencing revealed compound heterozygous mutations in the SGSH gene, the novel c.88G > C (p.A30P) change and a second, previously reported one (c.734G > A, p.R245H). We have identified ocular features of a patient with Sanfilippo syndrome type IIIA harboring a novel SGSH mutation that were not previously known to occur in this disease - namely, a progressive retinopathy with distinctive features, cystic macular changes responsive to carbonic anhydrase inhibitors, and complex electroretinographic abnormalities consistent with postreceptoral dysfunction. SD-OCT imaging revealed retinal lamination changes consistent with previously reported histologic studies. Both the SD-OCT and the electroretinogram changes appear attributable to intraretinal deposition of heparan sulfate.

  5. Mutations in SNX14 Cause a Distinctive Autosomal-Recessive Cerebellar Ataxia and Intellectual Disability Syndrome

    PubMed Central

    Thomas, Anna C.; Williams, Hywel; Setó-Salvia, Núria; Bacchelli, Chiara; Jenkins, Dagan; O’Sullivan, Mary; Mengrelis, Konstantinos; Ishida, Miho; Ocaka, Louise; Chanudet, Estelle; James, Chela; Lescai, Francesco; Anderson, Glenn; Morrogh, Deborah; Ryten, Mina; Duncan, Andrew J.; Pai, Yun Jin; Saraiva, Jorge M.; Ramos, Fabiana; Farren, Bernadette; Saunders, Dawn; Vernay, Bertrand; Gissen, Paul; Straatmaan-Iwanowska, Anna; Baas, Frank; Wood, Nicholas W.; Hersheson, Joshua; Houlden, Henry; Hurst, Jane; Scott, Richard; Bitner-Glindzicz, Maria; Moore, Gudrun E.; Sousa, Sérgio B.; Stanier, Philip

    2014-01-01

    Intellectual disability and cerebellar atrophy occur together in a large number of genetic conditions and are frequently associated with microcephaly and/or epilepsy. Here we report the identification of causal mutations in Sorting Nexin 14 (SNX14) found in seven affected individuals from three unrelated consanguineous families who presented with recessively inherited moderate-severe intellectual disability, cerebellar ataxia, early-onset cerebellar atrophy, sensorineural hearing loss, and the distinctive association of progressively coarsening facial features, relative macrocephaly, and the absence of seizures. We used homozygosity mapping and whole-exome sequencing to identify a homozygous nonsense mutation and an in-frame multiexon deletion in two families. A homozygous splice site mutation was identified by Sanger sequencing of SNX14 in a third family, selected purely by phenotypic similarity. This discovery confirms that these characteristic features represent a distinct and recognizable syndrome. SNX14 encodes a cellular protein containing Phox (PX) and regulator of G protein signaling (RGS) domains. Weighted gene coexpression network analysis predicts that SNX14 is highly coexpressed with genes involved in cellular protein metabolism and vesicle-mediated transport. All three mutations either directly affected the PX domain or diminished SNX14 levels, implicating a loss of normal cellular function. This manifested as increased cytoplasmic vacuolation as observed in cultured fibroblasts. Our findings indicate an essential role for SNX14 in neural development and function, particularly in development and maturation of the cerebellum. PMID:25439728

  6. Mutations in SNX14 cause a distinctive autosomal-recessive cerebellar ataxia and intellectual disability syndrome.

    PubMed

    Thomas, Anna C; Williams, Hywel; Setó-Salvia, Núria; Bacchelli, Chiara; Jenkins, Dagan; O'Sullivan, Mary; Mengrelis, Konstantinos; Ishida, Miho; Ocaka, Louise; Chanudet, Estelle; James, Chela; Lescai, Francesco; Anderson, Glenn; Morrogh, Deborah; Ryten, Mina; Duncan, Andrew J; Pai, Yun Jin; Saraiva, Jorge M; Ramos, Fabiana; Farren, Bernadette; Saunders, Dawn; Vernay, Bertrand; Gissen, Paul; Straatmaan-Iwanowska, Anna; Baas, Frank; Wood, Nicholas W; Hersheson, Joshua; Houlden, Henry; Hurst, Jane; Scott, Richard; Bitner-Glindzicz, Maria; Moore, Gudrun E; Sousa, Sérgio B; Stanier, Philip

    2014-11-06

    Intellectual disability and cerebellar atrophy occur together in a large number of genetic conditions and are frequently associated with microcephaly and/or epilepsy. Here we report the identification of causal mutations in Sorting Nexin 14 (SNX14) found in seven affected individuals from three unrelated consanguineous families who presented with recessively inherited moderate-severe intellectual disability, cerebellar ataxia, early-onset cerebellar atrophy, sensorineural hearing loss, and the distinctive association of progressively coarsening facial features, relative macrocephaly, and the absence of seizures. We used homozygosity mapping and whole-exome sequencing to identify a homozygous nonsense mutation and an in-frame multiexon deletion in two families. A homozygous splice site mutation was identified by Sanger sequencing of SNX14 in a third family, selected purely by phenotypic similarity. This discovery confirms that these characteristic features represent a distinct and recognizable syndrome. SNX14 encodes a cellular protein containing Phox (PX) and regulator of G protein signaling (RGS) domains. Weighted gene coexpression network analysis predicts that SNX14 is highly coexpressed with genes involved in cellular protein metabolism and vesicle-mediated transport. All three mutations either directly affected the PX domain or diminished SNX14 levels, implicating a loss of normal cellular function. This manifested as increased cytoplasmic vacuolation as observed in cultured fibroblasts. Our findings indicate an essential role for SNX14 in neural development and function, particularly in development and maturation of the cerebellum. Copyright © 2014 The Authors. Published by Elsevier Inc. All rights reserved.

  7. Molecular analysis of immunoglobulin variable genes supports a germinal center experienced normal counterpart in primary cutaneous diffuse large B-cell lymphoma, leg-type.

    PubMed

    Pham-Ledard, Anne; Prochazkova-Carlotti, Martina; Deveza, Mélanie; Laforet, Marie-Pierre; Beylot-Barry, Marie; Vergier, Béatrice; Parrens, Marie; Feuillard, Jean; Merlio, Jean-Philippe; Gachard, Nathalie

    2017-11-01

    Immunophenotype of primary cutaneous diffuse large B-cell lymphoma, leg-type (PCLBCL-LT) suggests a germinal center-experienced B lymphocyte (BCL2+ MUM1+ BCL6+/-). As maturation history of B-cell is "imprinted" during B-cell development on the immunoglobulin gene sequence, we studied the structure and sequence of the variable part of the genes (IGHV, IGLV, IGKV), immunoglobulin surface expression and features of class switching in order to determine the PCLBCL-LT cell of origin. Clonality analysis with BIOMED2 protocol and VH leader primers was done on DNA extracted from frozen skin biopsies on retrospective samples from 14 patients. The clonal DNA IGHV sequence of the tumor was aligned and compared with the closest germline sequence and homology percentage was calculated. Superantigen binding sites were studied. Features of selection pressure were evaluated with the multinomial Lossos model. A functional monoclonal sequence was observed in 14 cases as determined for IGHV (10), IGLV (2) or IGKV (3). IGV mutation rates were high (>5%) in all cases but one (median:15.5%), with superantigen binding sites conservation. Features of selection pressure were identified in 11/12 interpretable cases, more frequently negative (75%) than positive (25%). Intraclonal variation was detected in 3 of 8 tumor specimens with a low rate of mutations. Surface immunoglobulin was an IgM in 12/12 cases. FISH analysis of IGHM locus, deleted during class switching, showed heterozygous IGHM gene deletion in half of cases. The genomic PCR analysis confirmed the deletions within the switch μ region. IGV sequences were highly mutated but functional, with negative features of selection pressure suggesting one or more germinal center passage(s) with somatic hypermutation, but superantigen (SpA) binding sites conservation. Genetic features of class switch were observed, but on the non functional allele and co-existing with primary isotype IgM expression. 
    These data suggest that the cell of origin is a germinal center-experienced, superantigen-selected B-cell at a stage between germinal center B-cell and plasma cell. Copyright © 2017 Japanese Society for Investigative Dermatology. Published by Elsevier B.V. All rights reserved.
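
    The homology percentage and mutation rate mentioned in this abstract can be illustrated with a minimal sketch. It assumes the tumor and germline sequences have already been aligned to equal length (gaps written as "-"); the function names and the toy sequences are hypothetical and are not taken from the study.

```python
def percent_identity(tumor: str, germline: str) -> float:
    """Percent of aligned, non-gap positions at which the two sequences agree."""
    if len(tumor) != len(germline):
        raise ValueError("sequences must be pre-aligned to the same length")
    compared = 0
    matches = 0
    for t, g in zip(tumor.upper(), germline.upper()):
        if t == "-" or g == "-":
            continue  # skip alignment gaps
        compared += 1
        if t == g:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


def mutation_rate(tumor: str, germline: str) -> float:
    """Mutation rate as 100 minus the percent identity to the germline."""
    return 100.0 - percent_identity(tumor, germline)


# Made-up 20-nt toy sequences differing at two positions.
germline = "CAGGTGCAGCTGGTGCAGTC"
tumor = "CAGGTACAGCTGGTGCAATC"
rate = mutation_rate(tumor, germline)
print(f"mutation rate: {rate:.1f}%, high (>5%): {rate > 5.0}")
```

    The 5% cut-off in the last line mirrors the threshold quoted in the abstract; how gaps and ambiguous bases were actually handled in the study is not stated there.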

  8. Identification of sequence motifs significantly associated with antisense activity.

    PubMed

    McQuisten, Kyle A; Peek, Andrew S

    2007-06-07

    Predicting the suppression activity of antisense oligonucleotide sequences is the main goal of the rational design of nucleic acids. To create an effective predictive model, it is important to know what properties of an oligonucleotide sequence associate significantly with antisense activity. Also, for the model to be efficient we must know what properties do not associate significantly and can be omitted from the model. This paper will discuss the results of a randomization procedure to find motifs that associate significantly with either high or low antisense suppression activity, analysis of their properties, as well as the results of support vector machine modelling using these significant motifs as features. We discovered 155 motifs that associate significantly with high antisense suppression activity and 202 motifs that associate significantly with low suppression activity. The motifs range in length from 2 to 5 bases, include several motifs that have been previously discovered as associating highly with antisense activity, and have thermodynamic properties consistent with previous work associating thermodynamic properties of sequences with their antisense activity. Statistical analysis revealed no correlation between a motif's position within an antisense sequence and that sequence's antisense activity. Also, many significant motifs existed as subwords of other significant motifs. Support vector regression experiments indicated that the feature set of significant motifs increased correlation compared to all possible motifs as well as several subsets of the significant motifs. 
    The thermodynamic properties of the significantly associated motifs support existing data correlating the thermodynamic properties of the antisense oligonucleotide with antisense efficiency, reinforcing our hypothesis that antisense suppression is strongly associated with probe/target thermodynamics, as there are no enzymatic mediators to speed the process along like the RNA Induced Silencing Complex (RISC) in RNAi. The independence of motif position and antisense activity also allows us to bypass consideration of this feature in the modelling process, promoting model efficiency and reducing the chance of overfitting when predicting antisense activity. The increase in SVR correlation with significant features compared to nearest-neighbour features indicates that thermodynamics alone is likely not the only factor in determining antisense efficiency.
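
    Because the abstract reports that motif position does not matter, a motif feature can simply be an occurrence count. The sketch below shows one possible way to turn an oligonucleotide into such a position-independent feature vector over motifs of 2 to 5 bases. The function names, the example oligonucleotide and the "significant" motif list are hypothetical placeholders, not the motifs found in the paper.

```python
from collections import Counter


def motif_counts(oligo: str, min_len: int = 2, max_len: int = 5) -> Counter:
    """Count every substring of length min_len..max_len (overlaps included)."""
    seq = oligo.upper()
    counts = Counter()
    for k in range(min_len, max_len + 1):
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def feature_vector(oligo: str, motifs: list) -> list:
    """Position-independent features: one occurrence count per chosen motif."""
    counts = motif_counts(oligo)
    return [counts[m] for m in motifs]  # Counter gives 0 for absent motifs


# Hypothetical motif list and oligonucleotide, for illustration only.
significant = ["CC", "TCC", "GGGG"]
print(feature_vector("TCCCGCCTGTGACATGCATT", significant))  # -> [3, 1, 0]
```

    Vectors of this kind could then be passed to a support vector regression routine; the choice of kernel and parameters used by the authors is not given in the abstract.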

  9. A machine-learning approach for predicting palmitoylation sites from integrated sequence-based features.

    PubMed

    Li, Liqi; Luo, Qifa; Xiao, Weidong; Li, Jinhui; Zhou, Shiwen; Li, Yongsheng; Zheng, Xiaoqi; Yang, Hua

    2017-02-01

    Palmitoylation is the covalent attachment of lipids to amino acid residues in proteins. As an important form of protein posttranslational modification, it increases the hydrophobicity of proteins, which contributes to protein transport, organelle localization, and function, and therefore plays an important role in a variety of cell biological processes. Identification of palmitoylation sites is necessary for understanding protein-protein interaction, protein stability, and activity. Since conventional experimental techniques to determine palmitoylation sites in proteins are both labor intensive and costly, a fast and accurate computational approach to predict palmitoylation sites from protein sequences is urgently needed. In this study, a support vector machine (SVM)-based method was proposed that integrates PSI-BLAST profile, physicochemical properties, [Formula: see text]-mer amino acid compositions (AACs), and [Formula: see text]-mer pseudo AACs into the principal feature vector. A recursive feature selection scheme was subsequently implemented to single out the most discriminative features. Finally, an SVM method was implemented to predict palmitoylation sites in proteins based on the optimal features. The proposed method achieved an accuracy of 99.41% and a Matthews Correlation Coefficient of 0.9773 for a benchmark dataset. The result indicates the efficiency and accuracy of our method in prediction of palmitoylation sites based on protein sequences.
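
    For readers unfamiliar with the two metrics quoted above, the following sketch computes accuracy and the Matthews Correlation Coefficient from a binary confusion matrix using their standard definitions. The counts in the example are made up and do not come from the study's benchmark.

```python
import math


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # common convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Hypothetical confusion-matrix counts, for illustration only.
tp, tn, fp, fn = 90, 95, 5, 10
print(f"accuracy = {accuracy(tp, tn, fp, fn):.3f}")
print(f"MCC      = {matthews_corrcoef(tp, tn, fp, fn):.3f}")
```

    Unlike accuracy, MCC stays informative when positive and negative sites are unevenly represented, which is why the two are often reported together.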

  10. Neural network and SVM classifiers accurately predict lipid binding proteins, irrespective of sequence homology.

    PubMed

    Bakhtiarizadeh, Mohammad Reza; Moradi-Shahrbabak, Mohammad; Ebrahimi, Mansour; Ebrahimie, Esmaeil

    2014-09-07

    Due to the central roles of lipid binding proteins (LBPs) in many biological processes, sequence based identification of LBPs is of great interest. The major challenge is that LBPs are diverse in sequence, structure, and function which results in low accuracy of sequence homology based methods. Therefore, there is a need for developing alternative functional prediction methods irrespective of sequence similarity. To identify LBPs from non-LBPs, the performances of support vector machine (SVM) and neural network were compared in this study. Comprehensive protein features and various techniques were employed to create datasets. Five-fold cross-validation (CV) and independent evaluation (IE) tests were used to assess the validity of the two methods. The results indicated that SVM outperforms neural network. SVM achieved 89.28% (CV) and 89.55% (IE) overall accuracy in identification of LBPs from non-LBPs and 92.06% (CV) and 92.90% (IE) (in average) for classification of different LBPs classes. Increasing the number and the range of extracted protein features as well as optimization of the SVM parameters significantly increased the efficiency of LBPs class prediction in comparison to the only previous report in this field. Altogether, the results showed that the SVM algorithm can be run on broad, computationally calculated protein features and offers a promising tool in detection of LBPs classes. The proposed approach has the potential to integrate and improve the common sequence alignment based methods. Copyright © 2014 Elsevier Ltd. All rights reserved.

  11. A hybrid model based on neural networks for biomedical relation extraction.

    PubMed

    Zhang, Yijia; Lin, Hongfei; Yang, Zhihao; Wang, Jian; Zhang, Shaowu; Sun, Yuanyuan; Yang, Liang

    2018-05-01

    Biomedical relation extraction can automatically extract high-quality biomedical relations from biomedical texts, which is a vital step for the mining of biomedical knowledge hidden in the literature. Recurrent neural networks (RNNs) and convolutional neural networks (CNNs) are two major neural network models for biomedical relation extraction. Neural network-based methods for biomedical relation extraction typically focus on the sentence sequence and employ RNNs or CNNs to learn the latent features from sentence sequences separately. However, RNNs and CNNs have their own advantages for biomedical relation extraction. Combining RNNs and CNNs may improve biomedical relation extraction. In this paper, we present a hybrid model for the extraction of biomedical relations that combines RNNs and CNNs. First, the shortest dependency path (SDP) is generated based on the dependency graph of the candidate sentence. To make full use of the SDP, we divide the SDP into a dependency word sequence and a relation sequence. Then, RNNs and CNNs are employed to automatically learn the features from the sentence sequence and the dependency sequences, respectively. Finally, the output features of the RNNs and CNNs are combined to detect and extract biomedical relations. We evaluate our hybrid model using five public protein-protein interaction (PPI) corpora and a drug-drug interaction (DDI) corpus. The experimental results suggest that the advantages of RNNs and CNNs in biomedical relation extraction are complementary. Combining RNNs and CNNs can effectively boost biomedical relation extraction performance. Copyright © 2018 Elsevier Inc. All rights reserved.

  12. Modeling of DNA local parameters predicts encrypted architectural motifs in Xenopus laevis ribosomal gene promoter

    PubMed Central

    Roux-Rouquie, Magali; Marilley, Monique

    2000-01-01

    We have modeled local DNA sequence parameters to search for DNA architectural motifs involved in transcription regulation and promotion within the Xenopus laevis ribosomal gene promoter and the intergenic spacer (IGS) sequences. The IGS was found to be shaped into distinct topological domains. First, intrinsic bends split the IGS into domains of common but different helical features. Local parameters at inter-domain junctions exhibit a high variability with respect to intrinsic curvature, bendability and thermal stability. Secondly, the repeated sequence blocks of the IGS exhibit right-handed supercoiled structures which could be related to their enhancer properties. Thirdly, the gene promoter presents both inherent curvature and minor groove narrowing which may be viewed as motifs of a structural code for protein recognition and binding. Such pre-existing deformations could simply be remodeled during the binding of the transcription complex. Alternatively, these deformations could pre-shape the promoter in such a way that further remodeling is facilitated. Mutations shown to abolish promoter curvature as well as intrinsic minor groove narrowing, in a variant which maintained full transcriptional activity, bring circumstantial evidence for structurally-preorganized motifs in relation to transcription regulation and promotion. Using well documented X.laevis rDNA regulatory sequences we showed that computer modeling may be of invaluable assistance in assessing encrypted architectural motifs. The evidence of these DNA topological motifs with respect to the concept of structural code is discussed. PMID:10982860

  13. Germline PTPN11 and somatic PIK3CA variant in a boy with megalencephaly-capillary malformation syndrome (MCAP) - pure coincidence?

    PubMed Central

    Döcker, Dennis; Schubach, Max; Menzel, Moritz; Spaich, Christiane; Gabriel, Heinz-Dieter; Zenker, Martin; Bartholdi, Deborah; Biskup, Saskia

    2015-01-01

    Megalencephaly-capillary malformation (MCAP) syndrome is an overgrowth syndrome that is diagnosed by clinical criteria. Recently, somatic and germline variants in genes that are involved in the PI3K-AKT pathway (AKT3, PIK3R2 and PIK3CA) have been described to be associated with MCAP and/or other related megalencephaly syndromes. We performed trio-exome sequencing in a 6-year-old boy and his healthy parents. Clinical features were macrocephaly, cutis marmorata, angiomata, asymmetric overgrowth, developmental delay, discrete midline facial nevus flammeus, toe syndactyly and postaxial polydactyly—thus, clearly an MCAP phenotype. Exome sequencing revealed a pathogenic de novo germline variant in the PTPN11 gene (c.1529A>G; p.(Gln510Arg)), which has so far been associated with Noonan, as well as LEOPARD syndrome. Whole-exome sequencing (>100 × coverage) did not reveal any alteration in the known megalencephaly genes. However, ultra-deep sequencing results from saliva (>1000 × coverage) revealed a 22% mosaic variant in PIK3CA (c.2740G>A; p.(Gly914Arg)). To our knowledge, this report is the first description of a PTPN11 germline variant in an MCAP patient. Data from experimental studies show a complex interaction of SHP2 (gene product of PTPN11) and the PI3K-AKT pathway. We hypothesize that certain PTPN11 germline variants might drive toward additional second-hit alterations. PMID:24939587

  14. Germline PTPN11 and somatic PIK3CA variant in a boy with megalencephaly-capillary malformation syndrome (MCAP)--pure coincidence?

    PubMed

    Döcker, Dennis; Schubach, Max; Menzel, Moritz; Spaich, Christiane; Gabriel, Heinz-Dieter; Zenker, Martin; Bartholdi, Deborah; Biskup, Saskia

    2015-03-01

    Megalencephaly-capillary malformation (MCAP) syndrome is an overgrowth syndrome that is diagnosed by clinical criteria. Recently, somatic and germline variants in genes that are involved in the PI3K-AKT pathway (AKT3, PIK3R2 and PIK3CA) have been described to be associated with MCAP and/or other related megalencephaly syndromes. We performed trio-exome sequencing in a 6-year-old boy and his healthy parents. Clinical features were macrocephaly, cutis marmorata, angiomata, asymmetric overgrowth, developmental delay, discrete midline facial nevus flammeus, toe syndactyly and postaxial polydactyly--thus, clearly an MCAP phenotype. Exome sequencing revealed a pathogenic de novo germline variant in the PTPN11 gene (c.1529A>G; p.(Gln510Arg)), which has so far been associated with Noonan, as well as LEOPARD syndrome. Whole-exome sequencing (>100 × coverage) did not reveal any alteration in the known megalencephaly genes. However, ultra-deep sequencing results from saliva (>1000 × coverage) revealed a 22% mosaic variant in PIK3CA (c.2740G>A; p.(Gly914Arg)). To our knowledge, this report is the first description of a PTPN11 germline variant in an MCAP patient. Data from experimental studies show a complex interaction of SHP2 (gene product of PTPN11) and the PI3K-AKT pathway. We hypothesize that certain PTPN11 germline variants might drive toward additional second-hit alterations.

  15. Divalent Metal-Ion Complexes with Dipeptide Ligands Having Phe and His Side-Chain Anchors: Effects of Sequence, Metal Ion, and Anchor.

    PubMed

    Dunbar, Robert C; Berden, Giel; Martens, Jonathan K; Oomens, Jos

    2015-09-24

    Conformational preferences have been surveyed for divalent metal cation complexes with the dipeptide ligands AlaPhe, PheAla, GlyHis, and HisGly. Density functional theory results for a full set of complexes are presented, and previous experimental infrared spectra, supplemented by a number of newly recorded spectra obtained with infrared multiple photon dissociation spectroscopy, provide experimental verification of the preferred conformations in most cases. The overall structural features of these complexes are shown, and attention is given to comparisons involving peptide sequence, nature of the metal ion, and nature of the side-chain anchor. A regular progression is observed as a function of binding strength, whereby the weakly binding metal ions (Ba(2+) to Ca(2+)) transition from carboxylate zwitterion (ZW) binding to charge-solvated (CS) binding, while the stronger binding metal ions (Ca(2+) to Mg(2+) to Ni(2+)) transition from CS binding to metal-ion-backbone binding (Iminol) by direct metal-nitrogen bonds to the deprotonated amide nitrogens. Two new sequence-dependent reversals are found between ZW and CS binding modes, such that Ba(2+) and Ca(2+) prefer ZW binding in the GlyHis case but prefer CS binding in the HisGly case. The overall binding strength for a given metal ion is not strongly dependent on the sequence, but the histidine peptides are significantly more strongly bound (by 50-100 kJ mol(-1)) than the phenylalanine peptides.

  16. Analytical study of avian reticuloendotheliosis virus dimeric RNA generated in vivo and in vitro.

    PubMed

    Darlix, J L; Gabus, C; Allain, B

    1992-12-01

    The retroviral genome consists of two identical RNA molecules associated at their 5' ends by a stable structure called the dimer linkage structure. The dimer linkage structure, while maintaining the dimer state of the retroviral genome, might also be involved in packaging and reverse transcription, as well as recombination during proviral DNA synthesis. To study the dimer structure of the retroviral genome and the mechanism of dimerization, we analyzed features of the dimeric genome of reticuloendotheliosis virus (REV) type A and identified elements required for its dimerization. Here we report that the REV dimeric genome extracted from virions and infected cells, as well as that synthesized in vitro, is more resistant to heat denaturation than avian sarcoma and leukemia virus, murine leukemia virus, or human immunodeficiency virus type 1 dimeric RNA. The minimal domain required to form a stable REV RNA dimer in vitro was found to map between positions 268 and 452 (KpnI and SalI sites), thus corresponding to the E encapsidation sequence (J. E. Embretson and H. M. Temin, J. Virol. 61:2675-2683, 1987). In addition, both the 5' and 3' halves of E are necessary in cis for RNA dimerization and the extent of RNA dimerization is influenced by viral sequences flanking E. Rapid and efficient dimerization of REV RNA containing gag sequences in addition to the E sequences and annealing of replication primer tRNA(Pro) to the primer-binding site necessitate the nucleocapsid protein.

  17. Analytical study of avian reticuloendotheliosis virus dimeric RNA generated in vivo and in vitro.

    PubMed Central

    Darlix, J L; Gabus, C; Allain, B

    1992-01-01

    The retroviral genome consists of two identical RNA molecules associated at their 5' ends by a stable structure called the dimer linkage structure. The dimer linkage structure, while maintaining the dimer state of the retroviral genome, might also be involved in packaging and reverse transcription, as well as recombination during proviral DNA synthesis. To study the dimer structure of the retroviral genome and the mechanism of dimerization, we analyzed features of the dimeric genome of reticuloendotheliosis virus (REV) type A and identified elements required for its dimerization. Here we report that the REV dimeric genome extracted from virions and infected cells, as well as that synthesized in vitro, is more resistant to heat denaturation than avian sarcoma and leukemia virus, murine leukemia virus, or human immunodeficiency virus type 1 dimeric RNA. The minimal domain required to form a stable REV RNA dimer in vitro was found to map between positions 268 and 452 (KpnI and SalI sites), thus corresponding to the E encapsidation sequence (J. E. Embretson and H. M. Temin, J. Virol. 61:2675-2683, 1987). In addition, both the 5' and 3' halves of E are necessary in cis for RNA dimerization and the extent of RNA dimerization is influenced by viral sequences flanking E. Rapid and efficient dimerization of REV RNA containing gag sequences in addition to the E sequences and annealing of replication primer tRNA(Pro) to the primer-binding site necessitate the nucleocapsid protein. PMID:1331519

  18. The African coelacanth genome provides insights into tetrapod evolution.

    PubMed

    Amemiya, Chris T; Alföldi, Jessica; Lee, Alison P; Fan, Shaohua; Philippe, Hervé; Maccallum, Iain; Braasch, Ingo; Manousaki, Tereza; Schneider, Igor; Rohner, Nicolas; Organ, Chris; Chalopin, Domitille; Smith, Jeramiah J; Robinson, Mark; Dorrington, Rosemary A; Gerdol, Marco; Aken, Bronwen; Biscotti, Maria Assunta; Barucca, Marco; Baurain, Denis; Berlin, Aaron M; Blatch, Gregory L; Buonocore, Francesco; Burmester, Thorsten; Campbell, Michael S; Canapa, Adriana; Cannon, John P; Christoffels, Alan; De Moro, Gianluca; Edkins, Adrienne L; Fan, Lin; Fausto, Anna Maria; Feiner, Nathalie; Forconi, Mariko; Gamieldien, Junaid; Gnerre, Sante; Gnirke, Andreas; Goldstone, Jared V; Haerty, Wilfried; Hahn, Mark E; Hesse, Uljana; Hoffmann, Steve; Johnson, Jeremy; Karchner, Sibel I; Kuraku, Shigehiro; Lara, Marcia; Levin, Joshua Z; Litman, Gary W; Mauceli, Evan; Miyake, Tsutomu; Mueller, M Gail; Nelson, David R; Nitsche, Anne; Olmo, Ettore; Ota, Tatsuya; Pallavicini, Alberto; Panji, Sumir; Picone, Barbara; Ponting, Chris P; Prohaska, Sonja J; Przybylski, Dariusz; Saha, Nil Ratan; Ravi, Vydianathan; Ribeiro, Filipe J; Sauka-Spengler, Tatjana; Scapigliati, Giuseppe; Searle, Stephen M J; Sharpe, Ted; Simakov, Oleg; Stadler, Peter F; Stegeman, John J; Sumiyama, Kenta; Tabbaa, Diana; Tafer, Hakim; Turner-Maier, Jason; van Heusden, Peter; White, Simon; Williams, Louise; Yandell, Mark; Brinkmann, Henner; Volff, Jean-Nicolas; Tabin, Clifford J; Shubin, Neil; Schartl, Manfred; Jaffe, David B; Postlethwait, John H; Venkatesh, Byrappa; Di Palma, Federica; Lander, Eric S; Meyer, Axel; Lindblad-Toh, Kerstin

    2013-04-18

    The discovery of a living coelacanth specimen in 1938 was remarkable, as this lineage of lobe-finned fish was thought to have become extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues show the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.

  19. Hereditary sensory and autonomic neuropathy type I in a Chinese family: British C133W mutation exists in the Chinese.

    PubMed

    Bi, Hongyan; Gao, Yunying; Yao, Sheng; Dong, Mingrui; Headley, Alexander Peter; Yuan, Yun

    2007-10-01

    Hereditary sensory and autonomic neuropathy type I (HSAN I) is an autosomal dominant disorder of the peripheral nervous system characterized by marked progressive sensory loss, with variable autonomic and motor involvement. The HSAN I locus maps to chromosome 9q22.1-22.3, and the disorder is caused by mutations in the gene coding for serine palmitoyltransferase long chain base subunit 1 (SPTLC1). Sequencing in HSAN I families has previously identified mutations in exons 5, 6 and 13 of this gene. Here we report the clinical, electrophysiological and pathological findings of a proband in a Chinese family with HSAN I. The affected members showed almost typical clinical features. Electrophysiological findings showed an axonal, predominantly sensory, neuropathy with motor and autonomic involvement. Sural nerve biopsy showed loss of myelinated and unmyelinated fibers. SPTLC1 mutational analysis revealed the C133W mutation, a mutation common in British HSAN I families.

  20. Processes involved in solving mathematical problems

    NASA Astrophysics Data System (ADS)

    Shahrill, Masitah; Putri, Ratu Ilma Indra; Zulkardi; Prahmana, Rully Charitas Indra

    2018-04-01

    This study examines one of the instructional practices features utilized within the Year 8 mathematics lessons in Brunei Darussalam. The codes from the TIMSS 1999 Video Study were applied and strictly followed, and from the 183 mathematics problems recorded, there were 95 problems with a solution presented during the public segments of the video-recorded lesson sequences of the four sampled teachers. The analyses involved firstly, identifying the processes related to mathematical problem statements, and secondly, examining the different processes used in solving the mathematical problems for each problem publicly completed during the lessons. The findings revealed that for three of the teachers, their problem statements coded as 'using procedures' ranged from 64% to 83%, while the remaining teacher had 40% of his problem statements coded as 'making connections'. The processes used when solving the problems were mainly 'using procedures', and none of the problems were coded as 'giving results only'. Furthermore, all four teachers made use of making the relevant connections in solving the problems given to their respective students.

  1. Leptotrichia species in human infections II

    PubMed Central

    Eribe, Emenike R. K.; Olsen, Ingar

    2017-01-01

    Leptotrichia species are non-motile facultative anaerobic/anaerobic bacteria that are found mostly in the oral cavity and some other parts of the human body, in animals, and even in ocean sediments. Valid species include L. buccalis, L. goodfellowii, L. hofstadii, L. hongkongensis, L. shahii, L. trevisanii, and L. wadei. Some species require serum or blood for growth. All species ferment carbohydrates and produce lactic acid that may be involved with tooth decay. Acting as opportunistic pathogens, they are involved in a variety of diseases, and have been isolated from immunocompromised but also immunocompetent individuals. Mucositis, oral lesions, wounds, and abscesses may predispose to Leptotrichia septicemia. Because identification of Leptotrichia species by phenotypic features occasionally leads to misidentification, genetic techniques such as 16S rRNA gene sequencing are recommended. Early diagnosis and treatment of Leptotrichia infections are important for positive outcomes. Over the last years, Leptotrichia species have been associated with several changes in taxonomy and new associations with clinical diseases. Such changes are reported in this updated review. PMID:29081911

  2. Genome assembly with in vitro proximity ligation data and whole-genome triplication in lettuce

    PubMed Central

    Reyes-Chin-Wo, Sebastian; Wang, Zhiwen; Yang, Xinhua; Kozik, Alexander; Arikit, Siwaret; Song, Chi; Xia, Liangfeng; Froenicke, Lutz; Lavelle, Dean O.; Truco, María-José; Xia, Rui; Zhu, Shilin; Xu, Chunyan; Xu, Huaqin; Xu, Xun; Cox, Kyle; Korf, Ian; Meyers, Blake C.; Michelmore, Richard W.

    2017-01-01

    Lettuce (Lactuca sativa) is a major crop and a member of the large, highly successful Compositae family of flowering plants. Here we present a reference assembly for the species and family. This was generated using whole-genome shotgun Illumina reads plus in vitro proximity ligation data to create large superscaffolds; it was validated genetically and superscaffolds were oriented in genetic bins ordered along nine chromosomal pseudomolecules. We identify several genomic features that may have contributed to the success of the family, including genes encoding Cycloidea-like transcription factors, kinases, enzymes involved in rubber biosynthesis and disease resistance proteins that are expanded in the genome. We characterize 21 novel microRNAs, one of which may trigger phasiRNAs from numerous kinase transcripts. We provide evidence for a whole-genome triplication event specific but basal to the Compositae. We detect 26% of the genome in triplicated regions containing 30% of all genes that are enriched for regulatory sequences and depleted for genes involved in defence. PMID:28401891

  3. Structural features based genome-wide characterization and prediction of nucleosome organization

    PubMed Central

    2012-01-01

    Background Nucleosome distribution along chromatin dictates genomic DNA accessibility and thus profoundly influences gene expression. However, the underlying mechanism of nucleosome formation remains elusive. Here, taking a structural perspective, we systematically explored nucleosome formation potential of genomic sequences and the effect on chromatin organization and gene expression in S. cerevisiae. Results We analyzed twelve structural features related to flexibility, curvature and energy of DNA sequences. The results showed that some structural features such as DNA denaturation, DNA-bending stiffness, Stacking energy, Z-DNA, Propeller twist and free energy, were highly correlated with in vitro and in vivo nucleosome occupancy. Specifically, they can be classified into two classes, one positively and the other negatively correlated with nucleosome occupancy. These two kinds of structural features facilitated nucleosome binding in centromere regions and repressed nucleosome formation in the promoter regions of protein-coding genes to mediate transcriptional regulation. Based on these analyses, we integrated all twelve structural features in a model to predict more accurately nucleosome occupancy in vivo than the existing methods that mainly depend on sequence compositional features. Furthermore, we developed a novel approach, named DLaNe, that located nucleosomes by detecting peaks of structural profiles, and built a meta predictor to integrate information from different structural features. As a comparison, we also constructed a hidden Markov model (HMM) to locate nucleosomes based on the profiles of these structural features. The result showed that the meta DLaNe and HMM-based method performed better than the existing methods, demonstrating the power of these structural features in predicting nucleosome positions. 
    Conclusions Our analysis revealed that DNA structures significantly contribute to nucleosome organization and influence chromatin structure and gene expression regulation. The results indicated that our proposed methods are effective in predicting nucleosome occupancy and positions and that these structural features are highly predictive of nucleosome organization. The implementation of our DLaNe method based on structural features is available online. PMID:22449207

  4. Identifying Group-Specific Sequences for Microbial Communities Using Long k-mer Sequence Signatures

    PubMed Central

    Wang, Ying; Fu, Lei; Ren, Jie; Yu, Zhaoxia; Chen, Ting; Sun, Fengzhu

    2018-01-01

    Comparing metagenomic samples is crucial for understanding microbial communities. For different groups of microbial communities, such as human gut metagenomic samples from patients with a certain disease and healthy controls, identifying group-specific sequences offers essential information for potential biomarker discovery. A sequence that is present, or rich, in one group, but absent, or scarce, in another group is considered “group-specific” in our study. Our main purpose is to discover group-specific sequence regions between control and case groups as disease-associated markers. We developed a long k-mer (k ≥ 30 bps)-based computational pipeline to detect group-specific sequences at strain resolution free from reference sequences, sequence alignments, and metagenome-wide de novo assembly. We called our method MetaGO: Group-specific oligonucleotide analysis for metagenomic samples. An open-source pipeline on Apache Spark was developed with parallel computing. We applied MetaGO to one simulated and three real metagenomic datasets to evaluate the discriminative capability of identified group-specific markers. In the simulated dataset, 99.11% of group-specific logical 40-mers covered 98.89% disease-specific regions from the disease-associated strain. In addition, 97.90% of group-specific numerical 40-mers covered 99.61 and 96.39% of differentially abundant genome and regions between two groups, respectively. For a large-scale metagenomic liver cirrhosis (LC)-associated dataset, we identified 37,647 group-specific 40-mer features. Any one of the features can predict disease status of the training samples with the average of sensitivity and specificity higher than 0.8. The random forests classification using the top 10 group-specific features yielded a higher AUC (from ∼0.8 to ∼0.9) than that of previous studies. All group-specific 40-mers were present in LC patients, but not healthy controls. 
    All the assembled 11 LC-specific sequences can be mapped to two strains of Veillonella parvula: UTDB1-3 and DSM2008. The experiments on the other two real datasets related to Inflammatory Bowel Disease and Type 2 Diabetes in Women consistently demonstrated that MetaGO achieved better prediction accuracy with fewer features compared to previous studies. The experiments showed that MetaGO is a powerful tool for identifying group-specific k-mers, which would be clinically applicable for disease prediction. MetaGO is available at https://github.com/VVsmileyx/MetaGO. PMID:29774017
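
    The idea of a "group-specific" k-mer that is present in one group and absent from the other can be sketched in a few lines. This is a deliberately strict toy version, assuming a k-mer must occur in every case sample and in no control sample; it is not the MetaGO pipeline, which also handles abundance-based ("numerical") features and runs on Apache Spark. The function names, the toy reads and the small k are hypothetical; the abstract uses k ≥ 30.

```python
def kmers(seq: str, k: int) -> set:
    """All distinct substrings of length k in one read."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def sample_kmers(reads: list, k: int) -> set:
    """Union of the k-mers of all reads in one sample."""
    found = set()
    for read in reads:
        found |= kmers(read.upper(), k)
    return found


def group_specific_kmers(case_samples: list, control_samples: list, k: int) -> set:
    """k-mers present in every case sample and absent from every control sample."""
    case_sets = [sample_kmers(s, k) for s in case_samples]
    control_sets = [sample_kmers(s, k) for s in control_samples]
    in_all_cases = set.intersection(*case_sets)
    in_any_control = set().union(*control_sets)
    return in_all_cases - in_any_control


# Toy data: two case samples and one control sample, each a list of reads.
cases = [["ACGTACGT", "TTTT"], ["GGACGTAA"]]
controls = [["TTTTGGCC"]]
print(sorted(group_specific_kmers(cases, controls, k=4)))  # -> ['ACGT', 'CGTA']
```

    A practical implementation would relax the all-or-nothing rule with a statistical test across samples and would treat a k-mer and its reverse complement as the same feature; both are omitted here for brevity.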

  5. Formulaic Sequences and the Implications for Second Language Learning

    ERIC Educational Resources Information Center

    Xu, Qi

    2016-01-01

    The present paper is a review of literature in relation to formulaic sequences and the implications for second language learning. The formulaic sequence is a significant part of our language, and plays an essential role in both first and second language learning. The paper first introduces the definition, classifications, and major features of…

  6. The nucleotide sequence of 5S ribosomal RNA from Micrococcus lysodeikticus.

    PubMed Central

    Hori, H; Osawa, S; Murao, K; Ishikura, H

    1980-01-01

    The nucleotide sequence of ribosomal 5S RNA from Micrococcus lysodeikticus is pGUUACGGCGGCUAUAGCGUGGGGGAAACGCCCGGCCGUAUAUCGAACCCGGAAGCUAAGCCCCAUAGCGCCGAUGGUUACUGUAACCGGGAGGUUGUGGGAGAGUAGGUCGCCGCCGUGAOH. When compared to other 5S RNAs, the sequence homology is greatest with Thermus aquaticus, and these two 5S RNAs reveal several features intermediate between those of typical gram-positive bacteria and gram-negative bacteria. PMID:6780979

  7. Atropos: specific, sensitive, and speedy trimming of sequencing reads.

    PubMed

    Didion, John P; Martin, Marcel; Collins, Francis S

    2017-01-01

    A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos.
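
    To make the two trimming operations concrete, the sketch below removes a 3' adapter (complete, or cut off at the read end) and then clips a low-quality tail. It is a simplified illustration and not the algorithm used by Atropos: the function names, the mismatch-only comparison (no insertions or deletions), and the default thresholds are assumptions chosen for brevity.

```python
from typing import List, Tuple


def trim_3prime_adapter(
    read: str,
    adapter: str,
    min_overlap: int = 3,          # assumed minimum number of adapter bases to match
    max_error_rate: float = 0.1,   # assumed tolerated mismatches per overlapping base
) -> str:
    """Cut the read at the leftmost position where the adapter (or its prefix) matches."""
    n = len(read)
    for start in range(n - min_overlap + 1):
        # Near the read end only a prefix of the adapter can be present.
        overlap = min(len(adapter), n - start)
        window = read[start:start + overlap]
        mismatches = sum(1 for a, b in zip(window, adapter) if a != b)
        if mismatches <= int(max_error_rate * overlap):
            return read[:start]
    return read


def trim_low_quality_tail(
    read: str, qualities: List[int], min_quality: int = 20
) -> Tuple[str, List[int]]:
    """Drop trailing bases whose Phred quality is below min_quality."""
    end = len(read)
    while end > 0 and qualities[end - 1] < min_quality:
        end -= 1
    return read[:end], qualities[:end]
```

    Scanning from the left returns the earliest acceptable match, so everything downstream of the adapter start is discarded along with it. Production trimmers generally use alignment that also tolerates indels, which this sketch leaves out.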

  8. Statistical properties of DNA sequences

    NASA Technical Reports Server (NTRS)

    Peng, C. K.; Buldyrev, S. V.; Goldberger, A. L.; Havlin, S.; Mantegna, R. N.; Simons, M.; Stanley, H. E.

    1995-01-01

    We review evidence supporting the idea that the DNA sequence in genes containing non-coding regions is correlated, and that the correlation is remarkably long range--indeed, nucleotides thousands of base pairs distant are correlated. We do not find such a long-range correlation in the coding regions of the gene. We resolve the problem of the "non-stationarity" feature of the sequence of base pairs by applying a new algorithm called detrended fluctuation analysis (DFA). We address the claim of Voss that there is no difference in the statistical properties of coding and non-coding regions of DNA by systematically applying the DFA algorithm, as well as standard FFT analysis, to every DNA sequence (33301 coding and 29453 non-coding) in the entire GenBank database. Finally, we describe briefly some recent work showing that the non-coding sequences have certain statistical features in common with natural and artificial languages. Specifically, we adapt to DNA the Zipf approach to analyzing linguistic texts. These statistical properties of non-coding sequences support the possibility that non-coding regions of DNA may carry biological information.
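
    A compact sketch of detrended fluctuation analysis applied to a DNA walk follows. It assumes the common purine/pyrimidine mapping (purine = -1, pyrimidine = +1), non-overlapping boxes, and linear detrending; the function names and these choices are illustrative assumptions, not details taken from the record above.

```python
import math
from typing import List, Sequence

PURINES = set("AG")


def dna_walk_steps(seq: str) -> List[int]:
    """Map each base to a step: -1 for a purine (A, G), +1 for a pyrimidine (C, T)."""
    return [-1 if base in PURINES else 1 for base in seq.upper()]


def profile(steps: Sequence[float]) -> List[float]:
    """Cumulative sum of the mean-subtracted steps (the 'walk')."""
    mean = sum(steps) / len(steps)
    total, out = 0.0, []
    for s in steps:
        total += s - mean
        out.append(total)
    return out


def fluctuation(profile_values: Sequence[float], box: int) -> float:
    """RMS residual after a least-squares line fit inside each non-overlapping box."""
    n_boxes = len(profile_values) // box
    if box < 2 or n_boxes == 0:
        raise ValueError("box must be >= 2 and no longer than the profile")
    x_mean = (box - 1) / 2
    sxx = sum((i - x_mean) ** 2 for i in range(box))
    squared = 0.0
    for b in range(n_boxes):
        seg = profile_values[b * box:(b + 1) * box]
        y_mean = sum(seg) / box
        slope = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(seg)) / sxx
        squared += sum((y - (y_mean + slope * (i - x_mean))) ** 2 for i, y in enumerate(seg))
    return math.sqrt(squared / (n_boxes * box))


def dfa_exponent(seq: str, boxes: Sequence[int]) -> float:
    """Slope of log F(n) against log n; boxes should be >= 4 so that F(n) > 0."""
    prof = profile(dna_walk_steps(seq))
    xs = [math.log(n) for n in boxes]
    ys = [math.log(fluctuation(prof, n)) for n in boxes]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys)) / sum(
        (x - x_mean) ** 2 for x in xs
    )
```

    For an uncorrelated sequence the exponent is expected to be close to 0.5, while long-range correlations of the kind discussed in the abstract would show up as a larger value. Box sizes spaced roughly evenly on a logarithmic scale (for example 8, 16, 32, 64) are a reasonable default.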

  9. Atropos: specific, sensitive, and speedy trimming of sequencing reads

    PubMed Central

    Collins, Francis S.

    2017-01-01

    A key step in the transformation of raw sequencing reads into biological insights is the trimming of adapter sequences and low-quality bases. Read trimming has been shown to increase the quality and reliability while decreasing the computational requirements of downstream analyses. Many read trimming software tools are available; however, no tool simultaneously provides the accuracy, computational efficiency, and feature set required to handle the types and volumes of data generated in modern sequencing-based experiments. Here we introduce Atropos and show that it trims reads with high sensitivity and specificity while maintaining leading-edge speed. Compared to other state-of-the-art read trimming tools, Atropos achieves significant increases in trimming accuracy while remaining competitive in execution times. Furthermore, Atropos maintains high accuracy even when trimming data with elevated rates of sequencing errors. The accuracy, high performance, and broad feature set offered by Atropos make it an appropriate choice for the pre-processing of Illumina, ABI SOLiD, and other current-generation short-read sequencing datasets. Atropos is open source and free software written in Python (3.3+) and available at https://github.com/jdidion/atropos. PMID:28875074

  10. Using cellular automata to generate image representation for biological sequences.

    PubMed

    Xiao, X; Shao, S; Ding, Y; Huang, Z; Chen, X; Chou, K-C

    2005-02-01

    A novel approach to visualizing biological sequences is developed based on cellular automata (Wolfram, S. Nature 1984, 311, 419-424), a set of discrete dynamical systems in which space and time are discrete. By transforming the symbolic sequence codes into digital codes, and using some optimal space-time evolvement rules of cellular automata, a biological sequence can be represented by a unique image, the so-called cellular automata image. Many important features, which are originally hidden in a long and complicated biological sequence, can be clearly revealed through its cellular automata image. With the number of biological sequences entering databanks rapidly increasing in the post-genomic era, it is anticipated that the cellular automata image will become a very useful vehicle for investigation into their key features, identification of their function, as well as revelation of their "fingerprint". It is anticipated that by using the concept of the pseudo amino acid composition (Chou, K.C. Proteins: Structure, Function, and Genetics, 2001, 43, 246-255), the cellular automata image approach can also be used to improve the quality of predicting protein attributes, such as structural class and subcellular location.
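
    The general mechanism (digital coding of the sequence, then repeated application of a local rule to build an image row by row) can be sketched as follows. The 2-bit nucleotide coding, the periodic boundary, and the default rule number are assumptions made for illustration; the abstract does not state which coding table or evolvement rule the authors used.

```python
from typing import List

# Hypothetical 2-bit coding of nucleotides; not taken from the paper.
CODE = {"A": (0, 0), "C": (0, 1), "G": (1, 0), "T": (1, 1)}


def encode(seq: str) -> List[int]:
    """Turn a nucleotide string into a flat list of bits."""
    bits: List[int] = []
    for base in seq.upper():
        bits.extend(CODE[base])
    return bits


def step(cells: List[int], rule: int) -> List[int]:
    """Apply one generation of a Wolfram elementary rule with periodic boundaries."""
    n = len(cells)
    return [
        (rule >> ((cells[(i - 1) % n] << 2) | (cells[i] << 1) | cells[(i + 1) % n])) & 1
        for i in range(n)
    ]


def automaton_image(seq: str, rule: int = 90, generations: int = 16) -> List[List[int]]:
    """Return the image as rows of bits: the coded sequence, then each later generation."""
    rows = [encode(seq)]
    for _ in range(generations):
        rows.append(step(rows[-1], rule))
    return rows
```

    Each row can be rendered as a line of black and white pixels, so the coded sequence forms the top edge of the image and the rule determines the texture beneath it. Two sequences that differ in a single residue start from different top rows and therefore produce different images.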

  11. A statistical learning approach to the modeling of chromatographic retention of oligonucleotides incorporating sequence and secondary structure data

    PubMed Central

    Sturm, Marc; Quinten, Sascha; Huber, Christian G.; Kohlbacher, Oliver

    2007-01-01

    We propose a new model for predicting the retention time of oligonucleotides. The model is based on ν support vector regression using features derived from base sequence and predicted secondary structure of oligonucleotides. Because of the secondary structure information, the model is applicable even at relatively low temperatures where the secondary structure is not suppressed by thermal denaturing. This makes the prediction of oligonucleotide retention time for arbitrary temperatures possible, provided that the target temperature lies within the temperature range of the training data. We describe different possibilities of feature calculation from base sequence and secondary structure, present the results and compare our model to existing models. PMID:17567619
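
    One simple way to derive features from the base sequence is to use length, base composition, and dinucleotide frequencies, as sketched below. This feature set is an assumption chosen for illustration and is not necessarily the one used by the authors; secondary-structure features are omitted because they would require a separate structure predictor.

```python
from itertools import product
from typing import List

BASES = "ACGT"


def sequence_features(oligo: str) -> List[float]:
    """Return [length, 4 base fractions, 16 dinucleotide fractions] for one oligonucleotide."""
    s = oligo.upper()
    n = len(s)
    if n == 0:
        raise ValueError("empty sequence")
    mono = [s.count(base) / n for base in BASES]
    # Overlapping adjacent pairs, e.g. "AACG" -> AA, AC, CG
    pairs = [s[i:i + 2] for i in range(n - 1)]
    di = [pairs.count(a + b) / max(len(pairs), 1) for a, b in product(BASES, repeat=2)]
    return [float(n)] + mono + di
```

    Vectors of this kind, together with the column temperature and structure-derived values, could then be passed to a ν-support vector regression implementation to fit measured retention times.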

  12. Protein Information Resource: a community resource for expert annotation of protein data

    PubMed Central

    Barker, Winona C.; Garavelli, John S.; Hou, Zhenglin; Huang, Hongzhan; Ledley, Robert S.; McGarvey, Peter B.; Mewes, Hans-Werner; Orcutt, Bruce C.; Pfeiffer, Friedhelm; Tsugita, Akira; Vinayaka, C. R.; Xiao, Chunlin; Yeh, Lai-Su L.; Wu, Cathy

    2001-01-01

    The Protein Information Resource, in collaboration with the Munich Information Center for Protein Sequences (MIPS) and the Japan International Protein Information Database (JIPID), produces the most comprehensive and expertly annotated protein sequence database in the public domain, the PIR-International Protein Sequence Database. To provide timely and high quality annotation and promote database interoperability, the PIR-International employs rule-based and classification-driven procedures based on controlled vocabulary and standard nomenclature and includes status tags to distinguish experimentally determined from predicted protein features. The database contains about 200 000 non-redundant protein sequences, which are classified into families and superfamilies and their domains and motifs identified. Entries are extensively cross-referenced to other sequence, classification, genome, structure and activity databases. The PIR web site features search engines that use sequence similarity and database annotation to facilitate the analysis and functional identification of proteins. The PIR-International databases and search tools are accessible on the PIR web site at http://pir.georgetown.edu/ and at the MIPS web site at http://www.mips.biochem.mpg.de. The PIR-International Protein Sequence Database and other files are also available by FTP. PMID:11125041

  13. Sockeye: A 3D Environment for Comparative Genomics

    PubMed Central

    Montgomery, Stephen B.; Astakhova, Tamara; Bilenky, Mikhail; Birney, Ewan; Fu, Tony; Hassel, Maik; Melsopp, Craig; Rak, Marcin; Robertson, A. Gordon; Sleumer, Monica; Siddiqui, Asim S.; Jones, Steven J.M.

    2004-01-01

    Comparative genomics techniques are used in bioinformatics analyses to identify the structural and functional properties of DNA sequences. As the amount of available sequence data steadily increases, the ability to perform large-scale comparative analyses has become increasingly relevant. In addition, the growing complexity of genomic feature annotation means that new approaches to genomic visualization need to be explored. We have developed a Java-based application called Sockeye that uses three-dimensional (3D) graphics technology to facilitate the visualization of annotation and conservation across multiple sequences. This software uses the Ensembl database project to import sequence and annotation information from several eukaryotic species. A user can additionally import their own custom sequence and annotation data. Individual annotation objects are displayed in Sockeye by using custom 3D models. Ensembl-derived and imported sequences can be analyzed by using a suite of multiple and pair-wise alignment algorithms. The results of these comparative analyses are also displayed in the 3D environment of Sockeye. By using the Java3D API to visualize genomic data in a 3D environment, we are able to compactly display cross-sequence comparisons. This provides the user with a novel platform for visualizing and comparing genomic feature organization. PMID:15123592

  14. probeBase—an online resource for rRNA-targeted oligonucleotide probes and primers: new features 2016

    PubMed Central

    Greuter, Daniel; Loy, Alexander; Horn, Matthias; Rattei, Thomas

    2016-01-01

    probeBase (http://www.probebase.net) is a manually maintained and curated database of rRNA-targeted oligonucleotide probes and primers. Contextual information and multiple options for evaluating in silico hybridization performance against the most recent rRNA sequence databases are provided for each oligonucleotide entry, which makes probeBase an important and frequently used resource for microbiology research and diagnostics. Here we present a major update of probeBase, which was last featured in the NAR Database Issue 2007. This update describes a complete remodeling of the database architecture and environment to accommodate computationally efficient access. Improved search functions, sequence match tools and data output now extend the opportunities for finding suitable hierarchical probe sets that target an organism or taxon at different taxonomic levels. To facilitate the identification of complementary probe sets for organisms represented by short rRNA sequence reads generated by amplicon sequencing or metagenomic analysis with next generation sequencing technologies such as Illumina and IonTorrent, we introduce a novel tool that recovers surrogate near full-length rRNA sequences for short query sequences and finds matching oligonucleotides in probeBase. PMID:26586809

  15. Chronic myelomonocytic leukemia masquerading as cutaneous indeterminate dendritic cell tumor: Expanding the spectrum of skin lesions in chronic myelomonocytic leukemia.

    PubMed

    Loghavi, Sanam; Curry, Jonathan L; Garcia-Manero, Guillermo; Patel, Keyur P; Xu, Jie; Khoury, Joseph D; Torres-Cabala, Carlos A; Nagarajan, Priyadharsini; Aung, Phyu P; Gibson, Bernard R; Goodwin, Brandon P; Kelly, Brent C; Korivi, Brinda R; Medeiros, L Jeffrey; Prieto, Victor G; Kantarjian, Hagop M; Bueso-Ramos, Carlos E; Tetzlaff, Michael T

    2017-12-01

    Chronic myelomonocytic leukemia (CMML) is a hematopoietic stem cell neoplasm exhibiting both myelodysplastic and myeloproliferative features. Cutaneous involvement by CMML is critical to recognize as it typically is a harbinger of disease progression and an increased incidence of transformation to acute myeloid leukemia. Cutaneous lesions of CMML exhibit heterogeneous histopathologic features that can be challenging to recognize as CMML. We describe a 67-year-old man with a 3-year history of CMML who had been managed on single-agent azacitidine with stable disease before developing splenomegaly and acute onset skin lesions. Examination of these skin lesions revealed a dense infiltrate of histiocytic cells morphologically resembling Langerhans type cells (lacking frank histopathologic atypia), and with the immunophenotype of an indeterminate cell histiocytosis (S100+ CD1a+ and langerin-). Given the history of CMML, next-generation sequencing studies were performed on the skin biopsy. These revealed a KRAS (p.G12R) mutation identical to that seen in the CMML 3 years prior, establishing a clonal relationship between the 2 processes. This case expands the spectrum for and underscores the protean nature of cutaneous involvement by CMML and underscores the importance of heightened vigilance when evaluating skin lesions of CMML patients. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.

  16. Intrinsic epidermoid of the brain stem: case report and review of the literature.

    PubMed

    Singh, Saraj K; Jain, Kapil; Jain, Vijendra Kumar

    2018-03-19

    Purely cystic brain stem epidermoid is a rare diagnosis among brain stem cystic lesions, and it is very rare in the pediatric age group. Here, we report a rare case of a completely cystic brain stem epidermoid in a child. The patient presented with clinical features of brain stem involvement. MRI of the brain was suggestive of a cystic brain stem lesion. The patient underwent surgery, and the final diagnosis of epidermoid cyst was confirmed by the histopathological report. With the help of advanced MRI sequences such as diffusion and ADC, a diagnosis of epidermoid cyst can be established even at unusual intracranial sites. Surgical resection of an epidermoid cyst in the brain stem should be attempted judiciously, utilizing all modern tools of neurosurgery.

  17. Early allelic selection in maize as revealed by ancient DNA.

    PubMed

    Jaenicke-Després, Viviane; Buckler, Ed S; Smith, Bruce D; Gilbert, M Thomas P; Cooper, Alan; Doebley, John; Pääbo, Svante

    2003-11-14

    Maize was domesticated from teosinte, a wild grass, by approximately 6300 years ago in Mexico. After initial domestication, early farmers continued to select for advantageous morphological and biochemical traits in this important crop. However, the timing and sequence of character selection are, thus far, known only for morphological features discernible in corn cobs. We have analyzed three genes involved in the control of plant architecture, storage protein synthesis, and starch production from archaeological maize samples from Mexico and the southwestern United States. The results reveal that the alleles typical of contemporary maize were present in Mexican maize by 4400 years ago. However, as recently as 2000 years ago, allelic selection at one of the genes may not yet have been complete.

  18. Novel mutation of OCRL1 in Lowe syndrome.

    PubMed

    Liu, Ting; Yue, Zhihui; Wang, Haiyan; Tong, Huajuan; Sun, Liangzhong

    2015-01-01

    Lowe syndrome is a rare, X-linked recessive genetic disease with multi-organ involvement. The pathogenic gene is OCRL1. The authors analyzed the OCRL1 mutation and summarized the clinical features of a Chinese child with Lowe syndrome. The patient is a boy aged 3 years and 7 months. He presented with hypotonia at birth and gradually presented with bilateral congenital cataracts, psychomotor retardation, hypophosphatemic rickets and renal tubular function disorder. Sequence analysis of OCRL1 revealed a novel insertion mutation, c.2367insA (p. Ala813X), in exon 22. This mutation was suspected to introduce a premature stop codon in OCRL1 and to truncate the OCRL1 protein. His mother, who carried a heterozygous mutation, showed no signs of abnormality.

  19. Historical feature pattern extraction based network attack situation sensing algorithm.

    PubMed

    Zeng, Yong; Liu, Dacheng; Lei, Zhou

    2014-01-01

    The situation sequence contains a series of complicated, multivariate random trends that are sudden and uncertain, and whose underlying principle is difficult for traditional algorithms to recognize and describe. To solve these problems, estimating the parameters of a very long situation sequence is essential but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, the HFPE algorithm seeks similar indications in the recorded history of the situation sequence and weighs the link intensity between an indication that occurred and its subsequent effect. Then it calculates the probability that a certain effect reappears given the current indication and makes a prediction after weighting. Meanwhile, the HFPE method provides an evolution algorithm to derive the prediction deviation in terms of pattern and accuracy. This algorithm can continuously improve the adaptability of HFPE through gradual fine-tuning. The method preserves the rules in the sequence as far as possible, does not need data preprocessing, and can continuously track and adapt to variation in the situation sequence.
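
    The "find similar past indications, then weight what followed them" step can be illustrated with a small nearest-pattern predictor. This is a sketch under stated assumptions rather than the HFPE algorithm itself: the Euclidean distance, the inverse-distance weighting, the window length, and the function name are all hypothetical choices, and the evolution step that tunes the method is not shown.

```python
from typing import List, Sequence, Tuple


def predict_next(history: Sequence[float], window: int = 3, top_k: int = 3) -> float:
    """Predict the next value from the successors of the past windows most like the current one."""
    if len(history) <= window:
        raise ValueError("history must be longer than the window")
    current = history[-window:]
    candidates: List[Tuple[float, float]] = []
    # Every earlier window that has a known successor is a candidate "indication".
    for start in range(len(history) - window):
        past = history[start:start + window]
        distance = sum((a - b) ** 2 for a, b in zip(past, current)) ** 0.5
        successor = history[start + window]
        candidates.append((distance, successor))
    candidates.sort(key=lambda c: c[0])
    best = candidates[:top_k]
    # Closer matches get larger weights; the small constant avoids division by zero.
    weights = [1.0 / (1e-9 + distance) for distance, _ in best]
    return sum(w * s for w, (_, s) in zip(weights, best)) / sum(weights)
```

    On a strictly periodic history the closest past windows are exact matches, so the prediction reproduces the value that followed them. On noisy data, `top_k` controls how many historical precedents are blended into the forecast.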

  20. Historical Feature Pattern Extraction Based Network Attack Situation Sensing Algorithm

    PubMed Central

    Zeng, Yong; Liu, Dacheng; Lei, Zhou

    2014-01-01

    The situation sequence contains a series of complicated, multivariate random trends that are sudden and uncertain, and whose underlying principle is difficult for traditional algorithms to recognize and describe. To solve these problems, estimating the parameters of a very long situation sequence is essential but very difficult, so this paper proposes a situation prediction method based on historical feature pattern extraction (HFPE). First, the HFPE algorithm seeks similar indications in the recorded history of the situation sequence and weighs the link intensity between an indication that occurred and its subsequent effect. Then it calculates the probability that a certain effect reappears given the current indication and makes a prediction after weighting. Meanwhile, the HFPE method provides an evolution algorithm to derive the prediction deviation in terms of pattern and accuracy. This algorithm can continuously improve the adaptability of HFPE through gradual fine-tuning. The method preserves the rules in the sequence as far as possible, does not need data preprocessing, and can continuously track and adapt to variation in the situation sequence. PMID:24892054
