Note: This page contains sample records for the topic jackknifing.
Last update: August 15, 2014.


EPA Science Inventory

An exact expression is given for the jackknife estimate of the number of species in a community and its variance when one uses quadrat sampling procedures. The jackknife estimate is a function of the number of species that occur in one and only one quadrat. The variance of the nu...


Estimating Species Richness Using the Jackknife Procedure.  

National Technical Information Service (NTIS)

An exact expression is given for the jackknife estimate of the number of species in a community and its variance when one uses quadrat sampling procedures. The jackknife estimate is a function of the number of species that occur in one and only one quadra...

J. F. Heltshe; N. E. Forrester
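The abstract above says the estimate depends on the species found in one and only one quadrat. The following is a minimal sketch of that idea, not the authors' code: it applies the ordinary leave-one-quadrat-out jackknife to the observed species count, which gives S + k(n - 1)/n for n quadrats and k unique species. The function name and the input layout (one set of species names per quadrat) are assumptions made for illustration.

```python
from collections import Counter


def jackknife_richness(quadrats):
    """First-order jackknife estimate of species richness and its variance.

    quadrats: list of sets, one set of species names per quadrat
    (hypothetical input layout chosen for this sketch).
    """
    n = len(quadrats)
    if n < 2:
        raise ValueError("need at least two quadrats")
    # Number of quadrats in which each species occurs.
    occurrences = Counter(sp for q in quadrats for sp in set(q))
    s_obs = len(occurrences)
    # Unique species per quadrat: species seen in this quadrat and no other.
    uniques = [sum(1 for sp in set(q) if occurrences[sp] == 1) for q in quadrats]
    k = sum(uniques)
    # Dropping quadrat i lowers the observed count by uniques[i]; the
    # standard jackknife formulas then reduce to the two lines below.
    estimate = s_obs + k * (n - 1) / n
    variance = (n - 1) / n * (sum(u * u for u in uniques) - k * k / n)
    return estimate, variance
```

For example, four quadrats containing {a, b}, {a, c}, {a} and {a, d, e} have 5 observed species, 4 of them unique, so the estimate is 5 + 4 × 3/4 = 8.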



Nonparametric Estimation of Standard Errors in Covariance Analysis Using the Infinitesimal Jackknife  

ERIC Educational Resources Information Center

The infinitesimal jackknife provides a simple general method for estimating standard errors in covariance structure analysis. Beyond its simplicity and generality what makes the infinitesimal jackknife method attractive is that essentially no assumptions are required to produce consistent standard error estimates, not even the requirement that the…

Jennrich, Robert I.



Modified Jack-knife estimation of parameter uncertainty in bilinear modelling by partial least squares regression (PLSR)  

Microsoft Academic Search

A method for assessing the uncertainty of the individual bilinear model parameters from two-block regression modelling by multivariate partial least squares regression (PLSR) is presented. The method is based on the so-called “Jack-knife” resampling, comparing the perturbed model parameter estimates from cross-validation with the estimates from the full model. The conventional jack-knifing from ordinary least squares regression is modified in

Harald Martens; Magni Martens



Jack-knife stretching promotes flexibility of tight hamstrings after 4 weeks: a pilot study.  


Tight hamstrings are reported to be one of the causes of low back pain. However, there have been few reports on effective stretching procedures for tight hamstrings. The so-called jack-knife stretch, an active-static type of stretching, can efficiently increase the flexibility of tight hamstrings. The aim of this study was to evaluate hamstring tightness before and after a 4-week stretching protocol in healthy volunteer adults and in patients aged under 18 years with low back pain. To assess hamstring tightness, we measured two parameters: (1) finger to floor distance (FFD) and (2) pelvis forward inclination angle (PFIA). Eight healthy adult volunteers who had no lumbar or hip problems participated in this study (mean age: 26.8 years). All lacked flexibility, and their FFD values were positive before the experiment. Subjects performed 2 sets of the jack-knife stretch every day for 4 weeks. One set consisted of 5 repetitions, each held for 5 s. Before and during the 4-week experiment, the FFD and PFIA of toe-touching tests were measured weekly. For 17 of the sports players aged under 18, only FFD was measured. In adult volunteers, FFD was 14.1 ± 6.1 cm before the experiment and decreased to -8.1 ± 3.7 cm by the end of week 4, indicating a gain in flexibility of 22.2 cm. PFIA was 50.6 ± 8.2 degrees before the experiment and 83.8 ± 5.8 degrees after. The differences before and after the experiment were significant (p < 0.05). For those aged under 18, FFD was 8.1 ± 8.0 cm before and -9.6 ± 6.8 cm after the stretching. This difference was significant (p < 0.05). The jack-knife stretch is a useful active-static stretching technique for efficiently increasing the flexibility of tight hamstrings. PMID:23412177

Sairyo, Koichi; Kawamura, Takeshi; Mase, Yasuyoshi; Hada, Yasushi; Sakai, Toshinori; Hasebe, Kiyotaka; Dezawa, Akira



Power spectral estimates using two-dimensional Morlet-fan wavelets with emphasis on the long wavelengths: jackknife errors, bandwidth resolution and orthogonality properties  

NASA Astrophysics Data System (ADS)

We present a method for estimating the errors on local and global wavelet power spectra using the jackknife approach to error estimation, and compare results with jackknifed multitaper (MT) spectrum estimates. We test the methods on both synthetic and real data, the latter being free air gravity over the Congo Basin. To satisfy the independence requirement of the jackknife we investigate the orthogonality properties of the 2-D Morlet wavelet. Although Morlet wavelets are non-orthogonal, we show that careful selection of parameters can yield approximate orthogonality in space and azimuth. We also find that, when computed via the Fourier transform, the continuous wavelet transform (CWT) contains errors at very long wavelengths due to the discretization of large-scale wavelets in the Fourier domain. We hence recommend the use of convolution in the space-domain at these scales, even though this is computationally more expensive. Finally, in providing an investigation into the bandwidth resolution of CWT and MT spectra and errors at long wavelengths, we show that the Morlet wavelet is superior in this regard to Slepian tapers. Wavelets with higher bandwidth resolution deliver smaller spectral error estimates, in contrast to the MT method, where tapers with higher bandwidth resolution deliver larger errors. This results in the fan-WT having better spectral estimation properties at long wavelengths than Slepian MTs.

Kirby, J. F.; Swain, C. J.



Jackknifing Estimated Weighted Least Squares.  

National Technical Information Service (NTIS)

The paper investigates regression analysis of experimental designs with replications, assuming variance heterogeneity, possibly combined with nonnormality. These replications yield variance estimators which result in Estimated Weighted Least Squares (EWLS...

J. P. C. Kleijnen; P. C. A. Karremans; W. K. Oortwijn; W. J. H. van Groenendaal



The effect of temperature and wing morphology on quantitative genetic variation in the cricket Gryllus firmus, with an appendix examining the statistical properties of the Jackknife-MANOVA method of matrix comparison.  


We investigated the effect of temperature and wing morphology on the quantitative genetic variances and covariances of five size-related traits in the sand cricket, Gryllus firmus. Micropterous and macropterous crickets were reared in the laboratory at 24, 28 and 32 degrees C. Quantitative genetic parameters were estimated using a nested full-sib family design, and (co)variance matrices were compared using the T method, Flury hierarchy and Jackknife-manova method. The results revealed that the mean phenotypic value of each trait varied significantly among temperatures and wing morphs, but temperature reaction norms were not similar across all traits. Micropterous individuals were always smaller than macropterous individuals while expressing more phenotypic variation, a finding discussed in terms of canalization and life-history trade-offs. We observed little variation between the matrices of among-family (co)variation corresponding to each combination of temperature and wing morphology, with only one matrix of six differing in structure from the others. The implications of this result are discussed with respect to the prediction of evolutionary trajectories. PMID:15525410

Bégin, M; Roff, D A; Debat, V



Jackknife: Its Application to Test Equating.  

National Technical Information Service (NTIS)

Many tests used by the Armed Services are revised frequently to update content and to reduce compromise. A major psychometric concern during revision is the necessity of deriving scores on the new test which are comparable to those on the old test. This s...

J. A. Earles; M. J. Ree; M. G. Kadura



Fatal accidental asphyxia in a jack-knife position  

Microsoft Academic Search

Accidental death from postural or positional asphyxia takes place when the abnormal position of the victim’s body compromises the process of respiration. Diagnosis is largely made by circumstantial evidence supported by absence of any other significant pathology or trauma explaining death. This case report is about a 50-year-old male who had been drinking the previous night and was found dead

F. A. Benomran



An Iterative Jackknife Approach for Assessing Reliability and Power of fMRI Group Analyses  

PubMed Central

For functional magnetic resonance imaging (fMRI) group activation maps, so-called second-level random effect approaches are commonly used, which are intended to be generalizable to the population as a whole. However, reliability of a certain activation focus as a function of group composition or group size cannot directly be deduced from such maps. This question is of particular relevance when examining smaller groups (<20–27 subjects). The approach presented here tries to address this issue by iteratively excluding each subject from a group study and presenting the overlap of the resulting (reduced) second-level maps in a group percent overlap map. This makes it possible to judge where activation is reliable even upon excluding one, two, or three (or more) subjects, thereby also demonstrating the inherent variability that is still present in second-level analyses. Moreover, when progressively decreasing group size, foci of activation will become smaller and/or disappear; hence, the group size at which a given activation disappears can be considered to reflect the power necessary to detect this particular activation. Systematically exploiting this effect makes it possible to rank clusters according to their observable effect size. The approach is tested using different scenarios from a recent fMRI study (children performing a “dual-use” fMRI task, n = 39), and the implications of this approach are discussed.

Wilke, Marko
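The leave-one-subject-out overlap idea in the abstract above can be sketched in a few lines. This is a heavily simplified illustration, not the published method: the "second-level map" here is just a thresholded group mean per voxel, whereas the study uses random effect analyses. The function name, the input layout and the threshold are all assumptions.

```python
def percent_overlap_map(subject_maps, threshold):
    """Leave-one-subject-out overlap of thresholded group maps.

    subject_maps: list of per-subject lists of voxel values, all of the
    same length (hypothetical layout for this sketch).
    Returns, per voxel, the percentage of reduced groups in which the
    group mean exceeds `threshold`.
    """
    n = len(subject_maps)
    if n < 2:
        raise ValueError("need at least two subjects")
    n_voxels = len(subject_maps[0])
    hits = [0] * n_voxels
    for left_out in range(n):
        # Reduced group: everyone except the subject being excluded.
        reduced = [m for i, m in enumerate(subject_maps) if i != left_out]
        for v in range(n_voxels):
            group_mean = sum(m[v] for m in reduced) / len(reduced)
            if group_mean > threshold:
                hits[v] += 1
    return [100.0 * h / n for h in hits]
```

A voxel with a value of 100 is "active" in every reduced group, while lower values flag activation that depends on particular subjects.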



A new multisensor network for collision avoidance and jackknife prevention of articulated vehicles using Lebesgue sampling  

Microsoft Academic Search

Sensor networks are increasingly used in advanced vehicle and transportation applications. It is desirable in many systems to minimize sensor power requirements. Motivated by the flexibility available with battery powered wireless sensors, this paper presents a new sensor network communication method that significantly reduces the occurrence of message transmissions. This is achieved by applying Lebesgue sampling theory to detect transitions

Roy A. McCann; Anh T. Le



Jackknife instrumental variables estimation: replication and extension of Angrist, Imbens and Krueger (1999)  

Microsoft Academic Search

I replicate most of the results in Angrist, Imbens, and Krueger (Journal of Applied Econometrics 1999; 14: 57-67), point to a possible error in and re-estimate Model 3, and analyze some simple extensions. The programming code, data, and results are available at

Anton Nakov



A Leisurely Look at the Bootstrap, the Jackknife, and Cross-Validation  

Microsoft Academic Search

This is an invited expository article for The American Statistician. It reviews the nonparametric estimation of statistical error, mainly the bias and standard error of an estimator, or the error rate of a prediction rule. The presentation is written at a relaxed mathematical level, omitting most proofs, regularity conditions, and technical details.

Bradley Efron; Gail Gong
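As a concrete illustration of the bias and standard error estimates this record refers to, here is a minimal sketch of the textbook delete-one jackknife for an arbitrary statistic. It is not taken from the article; the function name and signature are assumptions.

```python
import math


def jackknife(data, statistic):
    """Delete-one jackknife estimates of bias and standard error.

    data: list of observations; statistic: function mapping a list to a number.
    Returns (estimate on the full sample, jackknife bias, jackknife SE).
    """
    n = len(data)
    if n < 2:
        raise ValueError("need at least two observations")
    theta_hat = statistic(data)
    # Recompute the statistic with each observation left out in turn.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    bias = (n - 1) * (mean_loo - theta_hat)
    se = math.sqrt((n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out))
    return theta_hat, bias, se
```

For the sample mean, the jackknife standard error coincides with the usual s/√n and the bias estimate is zero. For the plug-in variance (dividing by n), subtracting the jackknife bias recovers the usual unbiased variance.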



Statistical Inference on Associated Fertility Life Parameters Using Jackknife Technique: Computational Aspects  

Microsoft Academic Search

Knowledge of population growth potential is crucial for studying population dynamics and for establishing management tactics for pest control. Estimation of population growth can be achieved with fertility life tables because they synthesize data on reproduction and mortality of a population. The five main parameters associated with a fertility life table are as follows: (1) the net reproductive rate (Ro),

Aline de H. N. Maia; Alfredo J. B. Luiz; Clayton Campanhola



Complete mitochondrial genome of the jackknife clam Solen grandis (Veneroida, Solenidae).  


The complete mitochondrial genome sequence of Solen grandis, a species that lives buried in muddy to fine sand substrates in sub-tidal waters, is described in this paper. The mitogenome (16,794 bp) consists of 12 protein-coding genes (loss of ATPase subunit 8), 22 tRNA genes, 2 rRNA genes and 1 putative control region. This is the typical bivalve mitochondrial gene composition. PMID:22409762

Zhu, Hong Chai; Shen, He Ding; Zheng, Pei; Zhang, Yu



ROCView: prototype software for data collection in jackknife alternative free-response receiver operating characteristic analysis  

PubMed Central

ROCView has been developed as an image display and response capture (IDRC) solution to image display and consistent recording of reader responses in relation to the free-response receiver operating characteristic paradigm. A web-based solution to IDRC for observer response studies allows observations to be completed from any location, assuming that display performance and viewing conditions are consistent with the study being completed. The simplistic functionality of the software allows observations to be completed without supervision. ROCView can display images from multiple modalities, in a randomised order if required. Following registration, observers are prompted to begin their image evaluation. All data are recorded via mouse clicks, one to localise (mark) and one to score confidence (rate) using either an ordinal or continuous rating scale. Up to nine “mark-rating” pairs can be made per image. Unmarked images are given a default score of zero. Upon completion of the study, both true-positive and false-positive reports can be downloaded and adapted for analysis. ROCView has the potential to be a useful tool in the assessment of modality performance difference for a range of imaging methods.

Thompson, J; Hogg, P; Thompson, S; Manning, D; Szczepura, K



Cross-Validation, the Jackknife, and the Bootstrap: Excess Error Estimation in Forward Logistic Regression  

Microsoft Academic Search

Given a prediction rule based on a set of patients, what is the probability of incorrectly predicting the outcome of a new patient? Call this probability the true error. An optimistic estimate is the apparent error, or the proportion of incorrect predictions on the original set of patients, and it is the goal of this article to study estimates of

Gail Gong



A Fortran IV Program for Estimating Parameters through Multiple Matrix Sampling with Standard Errors of Estimate Approximated by the Jackknife.  

ERIC Educational Resources Information Center

Described and listed herein with concomitant sample input and output is the Fortran IV program which estimates parameters and standard errors of estimate per parameters for parameters estimated through multiple matrix sampling. The specific program is an improved and expanded version of an earlier version. (Author/BJG)

Shoemaker, David M.


Pre-collapse identification of sinkholes in unconsolidated media at Dead Sea area by 'nanoseismic monitoring' (graphical jackknife location of weak sources by few, low-SNR records)  

Microsoft Academic Search

The sudden failure of near-surface cavities and the resulting sinkholes have constituted a recent hazard affecting the populations, lifelines and the economy of the Dead Sea region. This paper describes how seismic monitoring techniques could detect the extremely low-energy signals produced by cavitation in unconsolidated, layered media. Dozens of such events were recorded within a radius of 200 m during

Gilles Hillel Wust-Bloch; Manfred Joswig



Issues in Predictive Discriminant Analysis: Using and Interpreting the Leave-One-Out Jackknife Method and the Improvement-Over-Chance "I" Index Effect Size.  

ERIC Educational Resources Information Center

Prediction of group membership is the goal of predictive discriminant analysis (PDA) and the accuracy of group classification is the focus of PDA. The purpose of this paper is to provide an overview of how PDA works and how it can be used to answer a variety of research questions. The paper explains what PDA is and why it is important, and it…

Hwang, Dae-Yeop


16 CFR 1632.1 - Definitions.  

Code of Federal Regulations, 2010 CFR

...lounges, push-back sofas, sleep lounges, sofa beds (including jackknife sofa beds), sofa lounges (including glide-outs), studio...prototype means mattresses of a particular design, sharing all materials and methods...



Evaluating Result Replicability: Better Alternatives to Significance Tests.  

ERIC Educational Resources Information Center

Three procedures for evaluating the replicability of descriptive discriminant analysis (DDA) results are discussed. The techniques include cross-validation, the jackknife, and the bootstrap. Discriminant analysis is a multivariate technique used when group membership or classification is the focus of the analysis. DDA is used to describe major…

Gillaspy, James Arthur, Jr.


Estimating the reliability of diet overlap measures  

Microsoft Academic Search

Diet overlap measures, commonly used in studies of resource partitioning and competition in fish, are too often treated as fixed values; in fact, they are random variables. Two methods for estimating the variance of some overlap measures using stomach content data are examined here: the jackknife and the bootstrap. Simulation results indicate that the methods work well. In addition, they

Eric P. Smith



Affinities and Historical Zoogeography of the New Zealand Short-Tailed Bat, Mystacina tuberculata Gray 1843, Inferred from DNA-Hybridization Comparisons  

Microsoft Academic Search

We carried out DNA-hybridization comparisons among representatives of the major groups of Chiroptera to determine the phylogenetic position of the New Zealand short-tailed bat, Mystacina tuberculata. All analyses confirmed the noctilionoid affinity of this species suggested by an earlier serological study, with support from taxon jackknifing and at bootstrap levels of 98% or higher. However, a specific association with Noctilio

John A. W. Kirsch; James M. Hutcheon; Deanna G. P. Byrnes; Brian D. Lloyd



46 CFR 160.043-6 - Marking and packing.  

Code of Federal Regulations, 2013 CFR

... 2013-10-01 false Marking and packing. 160.043-6 Section 160.043-6...Vessels § 160.043-6 Marking and packing. (a) General. Jackknives...method for using the can opener. (c) Packing. Each jackknife, complete...



The truck backer-upper: an example of self-learning in neural networks  

Microsoft Academic Search

Neural networks can be used to solve highly nonlinear control problems. A two-layer neural network containing 26 adaptive neural elements has learned to back up a computer-simulated trailer truck to a loading dock, even when initially jackknifed. It is not yet known how to design a controller to perform this steering task. Nevertheless, the neural net was able to learn

Derrick Nguyen; Bernard Widrow



Accurate Prediction of Protein Secondary Structural Content  

Microsoft Academic Search

An improved multiple linear regression (MLR) method is proposed to predict a protein's secondary structural content based on its primary sequence. The amino acid composition, the autocorrelation function, and the interaction function of side-chain mass derived from the primary sequence are taken into account. The average absolute errors of prediction over 704 unrelated proteins with the jackknife test are 0.088,

Zong Lin; Xian-Ming Pan



Variance Estimation for NAEP Data Using a Resampling-Based Approach: An Application of Cognitive Diagnostic Models. Research Report. ETS RR-10-26  

ERIC Educational Resources Information Center

This paper presents an application of a jackknifing approach to variance estimation of ability inferences for groups of students, using a multidimensional discrete model for item response data. The data utilized to demonstrate the approach come from the National Assessment of Educational Progress (NAEP). In contrast to the operational approach…

Hsieh, Chueh-an; Xu, Xueli; von Davier, Matthias



How Meaningful Are Bayesian Support Values?  

Microsoft Academic Search

In this study, we used an empirical example based on 100 mitochondrial genomes from higher teleost fishes to compare the accuracy of parsimony-based jackknife values with Bayesian support values. Phylogenetic analyses of 366 partitions, using differential taxon and character sampling from the entire data matrix of 100 taxa and 7,990 characters, were performed for both phylogenetic methods. The tree topology

Mark P. Simmons; Kurt M. Pickett; Masaki Miya



Two-Step Weighted Least Squares Factor Analysis of Dichotomized Variables  

ERIC Educational Resources Information Center

A two-step weighted least squares estimator for multiple factor analysis of dichotomized variables is discussed. The estimator is based on the first and second order joint probabilities. Asymptotic standard errors and a model test are obtained by applying the Jackknife procedure. (Author)

Christoffersson, Anders



Ribosomal DNA and Resolution of Branching Order among the Ascomycota: How Many Nucleotides Are Enough?  

Microsoft Academic Search

Molecular phylogenies for the fungi in the Ascomycota rely heavily on 18S rRNA gene sequences but this gene alone does not answer all questions about relationships. Particularly problematical are the relationships among the first ascomycetes to diverge, the Archiascomycetes, and the branching order among the basal filamentous ascomycetes, the Euascomycetes. Would more data resolve branching order? We used the jackknife

Mary L. Berbee; David A. Carmean; Katarina Winka



Resampling Methods Revisited: Advancing the Understanding and Applications in Educational Research  

ERIC Educational Resources Information Center

Resampling methods including randomization test, cross-validation, the jackknife and the bootstrap are widely employed in the research areas of natural science, engineering and medicine, but they lack appreciation in educational research. The purpose of the present review is to revisit and highlight the key principles and developments of…

Bai, Haiyan; Pan, Wei



The Beginner's Guide to the Bootstrap Method of Resampling.  

ERIC Educational Resources Information Center

The bootstrap method of resampling can be useful in estimating the replicability of study results. The bootstrap procedure creates a mock population from a given sample of data from which multiple samples are then drawn. The method extends the usefulness of the jackknife procedure as it allows for computation of a given statistic across a maximal…

Lane, Ginny G.
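The resampling step described above (treating the sample as a mock population and drawing many samples from it) can be sketched as follows. This is a minimal illustration using only the Python standard library; the function name, the default of 2000 resamples and the fixed seed are assumptions, not part of the record.

```python
import random
import statistics


def bootstrap_se(data, statistic, n_resamples=2000, seed=0):
    """Bootstrap standard error of `statistic` on `data`.

    Draws `n_resamples` samples of len(data) with replacement and returns
    the standard deviation of the statistic across those resamples.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(data)
    replicates = [
        statistic([rng.choice(data) for _ in range(n)])
        for _ in range(n_resamples)
    ]
    return statistics.stdev(replicates)
```

Calling bootstrap_se(sample, statistics.mean) gives a value close to the plug-in standard error of the mean; the result varies slightly with the seed and the number of resamples.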



Microsoft Academic Search

Entropy and jackknife estimation procedures were used to find that custom rates are 20.3% lower than the true cost to own and operate machinery for an average size Kansas farm. A method was then developed to estimate a farm's total machinery costs with which to benchmark machinery costs.

Aaron J. Beaton; Kevin C. Dhuyvetter; Terry L. Kastens



The 1972 Wald Lecture Robust Statistics: A Review  

Microsoft Academic Search

This is a selective review on robust statistics, centering on estimates of location, but extending into other estimation and testing problems. After some historical remarks, several possible concepts of robustness are critically reviewed. Three important classes of estimates are singled out and some basic heuristic tools for assessing properties of robust estimates (or test statistics) are discussed: influence curve, jackknifing.

Peter J. Huber



Robust Tests for the Equality of Variances  

Microsoft Academic Search

Alternative formulations of Levene's test statistic for equality of variances are found to be robust under nonnormality. These statistics use more robust estimators of central location in place of the mean. They are compared with the unmodified Levene's statistic, a jackknife procedure, and a χ² test suggested by Layard which are all found to be less robust under nonnormality.

Morton B. Brown; Alan B. Forsythe
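A minimal sketch of a Levene-type statistic with a replaceable estimator of central location follows. It uses the standard one-way ANOVA form on absolute deviations from each group's center, with the median as the default; this is an illustration under those assumptions, not the authors' code, and the function name is hypothetical.

```python
import statistics


def levene_type_statistic(groups, center=statistics.median):
    """Levene-type W statistic on absolute deviations from a group center.

    groups: list of lists of numbers; center: function giving each group's
    central location (median by default; passing statistics.mean gives the
    unmodified Levene form).
    """
    k = len(groups)
    total_n = sum(len(g) for g in groups)
    # Absolute deviations of each observation from its own group's center.
    z = [[abs(x - center(g)) for x in g] for g in groups]
    z_group_means = [sum(zi) / len(zi) for zi in z]
    z_grand_mean = sum(sum(zi) for zi in z) / total_n
    between = sum(
        len(zi) * (zm - z_grand_mean) ** 2 for zi, zm in zip(z, z_group_means)
    )
    within = sum((v - zm) ** 2 for zi, zm in zip(z, z_group_means) for v in zi)
    # Raises ZeroDivisionError if all deviations within every group are equal.
    return (total_n - k) / (k - 1) * between / within
```

The result is typically compared with an F distribution on (k - 1, N - k) degrees of freedom, where k is the number of groups and N the total sample size.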



Sampling effort and estimates of species richness based on prepositioned area electrofisher samples  

USGS Publications Warehouse

Estimates of species richness based on electrofishing data are commonly used to describe the structure of fish communities. One electrofishing method for sampling riverine fishes that has become popular in the last decade is the prepositioned area electrofisher (PAE). We investigated the relationship between sampling effort and fish species richness at seven sites in the Tallapoosa River system, USA based on 1,400 PAE samples collected during 1994 and 1995. First, we estimated species richness at each site using the first-order jackknife and compared observed values for species richness and jackknife estimates of species richness to estimates based on historical collection data. Second, we used a permutation procedure and nonlinear regression to examine rates of species accumulation. Third, we used regression to predict the number of PAE samples required to collect the jackknife estimate of species richness at each site during 1994 and 1995. We found that jackknife estimates of species richness generally were less than or equal to estimates based on historical collection data. The relationship between PAE electrofishing effort and species richness in the Tallapoosa River was described by a positive asymptotic curve as found in other studies using different electrofishing gears in wadable streams. Results from nonlinear regression analyses indicated that rates of species accumulation were variable among sites and between years. Across sites and years, predictions of sampling effort required to collect jackknife estimates of species richness suggested that doubling sampling effort (to 200 PAEs) would typically increase observed species richness by not more than six species. However, sampling effort beyond about 60 PAE samples typically increased observed species richness by < 10%. We recommend using historical collection data in conjunction with a preliminary sample size of at least 70 PAE samples to evaluate estimates of species richness in medium-sized rivers. Seventy PAE samples should provide enough information to describe the relationship between sampling effort and species richness and thus facilitate evaluation of a sampling effort.

Bowen, Z.H.; Freeman, M.C.



Inferring Phylogenetic Networks from Gene Order Data  

PubMed Central

Existing algorithms allow us to infer phylogenetic networks from sequences (DNA, protein or binary), sets of trees, and distance matrices, but there are no methods to build them using the gene order data as an input. Here we describe several methods to build split networks from the gene order data, perform simulation studies, and use our methods for analyzing and interpreting different real gene order datasets. All proposed methods are based on intermediate data, which can be generated from genome structures under study and used as an input for network construction algorithms. Three intermediates are used: set of jackknife trees, distance matrix, and binary encoding. According to simulations and case studies, the best intermediates are jackknife trees and distance matrix (when used with Neighbor-Net algorithm). Binary encoding can also be useful, but only when the methods mentioned above cannot be used.

Morozov, Alexey Anatolievich; Galachyants, Yuri Pavlovich; Likhoshway, Yelena Valentinovna



A note on bias and mean squared error in steady-state quantile estimation  

NASA Astrophysics Data System (ADS)

When using a batch means methodology for estimation of a nonlinear function of a steady-state mean from the output of simulation experiments, it has been shown that a jackknife estimator may reduce the bias and mean squared error (mse) compared to the classical estimator, whereas the average of the classical estimators from the batches (the batch means estimator) has a worse performance from the point of view of bias and mse. In this paper we show that, under reasonable assumptions, the jackknife, classical and batch means estimators of quantiles of the steady-state distribution exhibit properties similar to those found when estimating a nonlinear function of a steady-state mean. We present some experimental results from the simulation of the waiting time in queue for an M/M/1 system under heavy traffic.

Muñoz, David F.; Ramírez-López, Adán
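The three estimators compared in the abstract above can be written down for the simpler case it mentions, a nonlinear function f of a steady-state mean. The sketch below assumes equal-sized batches summarized by their batch means and uses the standard delete-one-batch jackknife; the function name and interface are assumptions, and the quantile case studied in the paper is not covered.

```python
def batch_estimators(batch_means, f):
    """Classical, batch means and jackknife estimators of f(mean).

    batch_means: list of per-batch sample means from equal-sized batches
    (an assumption of this sketch); f: the nonlinear function of interest.
    """
    m = len(batch_means)
    if m < 2:
        raise ValueError("need at least two batches")
    grand_mean = sum(batch_means) / m
    # Classical estimator: apply f to the overall mean.
    classical = f(grand_mean)
    # Batch means estimator: average of f applied to each batch separately.
    batch_means_estimator = sum(f(b) for b in batch_means) / m
    # Jackknife: combine the classical estimator with the average of the
    # estimators obtained after deleting one batch at a time.
    leave_one_out = [f((m * grand_mean - b) / (m - 1)) for b in batch_means]
    jackknife = m * classical - (m - 1) * sum(leave_one_out) / m
    return classical, batch_means_estimator, jackknife
```

With f(x) = x² and batch means 1, 2 and 3, the classical estimator is 4, the batch means estimator is 14/3 and the jackknife gives 11/3, which is the classical value minus the estimated variance of the overall mean.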



Angiosperm phylogeny inferred from 18S rDNA, rbcL , and atpB sequences  

Microsoft Academic Search

A phylogenetic analysis of a combined data set for 560 angiosperms and seven outgroups based on three genes, 18S rDNA (1855 bp), rbcL (1428 bp), and atpB (1450 bp) representing a total of 4733 bp is presented. Parsimony analysis was expedited by use of a new computer program, the RATCHET. Parsimony jackknifing was performed to assess the support of clades.




Resampling Methods: Concepts, Applications, and Justification  

NSDL National Science Digital Library

Created by Chong Hu Yu for Cisco Systems, this journal article is a summary of resampling methods such as the jackknife, bootstrap, and permutation tests. It summarizes the tests, describes various software to perform the tests, and has a list of references. The article covers an introduction, the resampling methods themselves, software for resampling, the rationale for and criticisms of resampling, a conclusion, and references. This is an expansive resource that treats resampling methods in depth.

Yu, Chong H.



Asymmetric Business Cycle Fluctuations and Contagion Effects in G7 Countries  

Microsoft Academic Search

This research studies the possible existence of business cycle asymmetries in Canada, France, Germany, Italy, Japan, UK, and US real GDP growth rates. Asymmetries in these countries are modeled using in-sample as well as jackknife out-of-sample forecasts approximated from artificial neural networks. Univariate results show statistically significant evidence of asymmetries in business cycle fluctuations in all the series; this is corroborated

Khurshid M. Kiani


Bootstrap Cross-Validation Indices for PLS Path Model Assessment  

Microsoft Academic Search

The goal of PLS path modeling is primarily to estimate the variance of endogenous constructs and in turn their respective manifest variables (if reflective). Models with significant jackknife or bootstrap parameter estimates may still be considered invalid in a predictive sense. In this chapter, the objective is to shift from that of assessing the significance of parameter estimates (e.g., loadings

Wynne W. Chin


Prediction of protein structural classes using hybrid properties  

Microsoft Academic Search

In this paper, amino acid compositions are combined with some protein sequence properties (physiochemical properties) to predict protein structural classes. We are able to predict protein structural classes using a mathematical model that combines the nearest neighbor algorithm (NNA), mRMR (minimum redundancy, maximum relevance), and feature forward searching strategy. Jackknife cross-validation is used to evaluate the prediction accuracy. As a

Wenjin Li; Kao Lin; Kaiyan Feng; Yudong Cai



Prediction of resource volumes at untested locations using simple local prediction models  

USGS Publications Warehouse

This paper shows how local spatial nonparametric prediction models can be applied to estimate volumes of recoverable gas resources at individual undrilled sites, at multiple sites on a regional scale, and to compute confidence bounds for regional volumes based on the distribution of those estimates. An approach that combines cross-validation, the jackknife, and bootstrap procedures is used to accomplish this task. Simulation experiments show that cross-validation can be applied beneficially to select an appropriate prediction model. The cross-validation procedure worked well for a wide range of different states of nature and levels of information. Jackknife procedures are used to compute individual prediction estimation errors at undrilled locations. The jackknife replicates also are used with a bootstrap resampling procedure to compute confidence bounds for the total volume. The method was applied to data (partitioned into a training set and target set) from the Devonian Antrim Shale continuous-type gas play in the Michigan Basin in Otsego County, Michigan. The analysis showed that the model estimate of total recoverable volumes at prediction sites is within 4 percent of the total observed volume. The model predictions also provide frequency distributions of the cell volumes at the production unit scale. Such distributions are the basis for subsequent economic analyses. © Springer Science+Business Media, LLC 2007.

Attanasi, E. D.; Coburn, T. C.; Freeman, P. A.



MMOD: an R library for the calculation of population differentiation statistics.  


MMOD is a library for the R programming language that allows the calculation of the population differentiation measures D(est), G″(ST) and φ′(ST). R provides a powerful environment in which to conduct and record population genetic analyses but, at present, no R libraries provide functions for the calculation of these statistics from standard population genetic files. In addition to the calculation of differentiation measures, mmod can produce parametric bootstrap and jackknife samples of data sets for further analysis. By integrating with and complementing the existing libraries adegenet and pegas, mmod extends the power of R as a population genetic platform. PMID:22883857

Winter, David J



Efficacy of Anal Fistula Plug in Closure of Crohn’s Anorectal Fistulas  

Microsoft Academic Search

Purpose: The efficacy of Surgisis anal fistula plug in closure of Crohn’s anorectal fistula was studied. Methods: Patients with Crohn’s anorectal fistulas were prospectively studied. Diagnosis was made by histologic, radiographic, or endoscopic criteria. Variables recorded were: number of fistula tracts (primary openings), presence of setons, and current antitumor necrosis factor therapy. Under general anesthesia and in prone jackknife position, patients underwent

Lynn O’Connor; Bradley J. Champagne; Martha A. Ferguson; Guy R. Orangio; Marion E. Schertzer; David N. Armstrong



Efficacy of Anal Fistula Plug in Closure of Cryptoglandular Fistulas: Long-Term Follow-Up  

Microsoft Academic Search

Purpose: The long-term efficacy of Surgisis anal fistula plug in closure of cryptoglandular anorectal fistulas was studied. Methods: Patients with high cryptoglandular anorectal fistulas were prospectively studied. Additional variables recorded were: number of fistula tracts, and presence of setons. Under general anesthesia and in prone jackknife position, patients underwent irrigation of the fistula tract by using hydrogen peroxide. Each primary opening was

Bradley J. Champagne; Lynn M. O’Connor; Martha Ferguson; Guy R. Orangio; Marion E. Schertzer; David N. Armstrong



Statistical analysis of SHAPE-directed RNA secondary structure modeling  

PubMed Central

The ability to predict RNA secondary structure is fundamental for understanding and manipulating RNA function. The structural information obtained from selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) experiments greatly improves the accuracy of RNA secondary structure prediction. Recently, Das and colleagues [Kladwang et al., Biochemistry 50:8049 (2011)] proposed a “bootstrapping” approach to estimate the variance and helix-by-helix confidence levels of predicted secondary structures based on resampling (randomizing and summing) the measured SHAPE data. We show that the specific resampling approach described by Kladwang et al. introduces systematic errors and underestimates confidence in secondary structure prediction using SHAPE data. Instead, a leave-data-out jackknife approach better estimates the influence of a given experimental dataset on SHAPE-directed secondary structure modeling. Even when 35% of the data were left out in the jackknife approach, the confidence levels of SHAPE-directed secondary structure prediction were significantly higher than those calculated by Das and colleagues using bootstrapping. Helix confidence levels were thus significantly underestimated in the recent study, and resampling approach implemented by Kladwang et al. is not an appropriate metric for assigning confidences in SHAPE-directed secondary structure modeling.

Ramachandran, Srinivas; Ding, Feng; Weeks, Kevin M.; Dokholyan, Nikolay V.



HerMES: SPIRE Science Demonstration Phase maps  

NASA Astrophysics Data System (ADS)

We describe the production and verification of sky maps of the five Spectral and Photometric Imaging Receiver (SPIRE) fields observed as part of the Herschel Multi-tiered Extragalactic Survey (HerMES) during the Science Demonstration Phase (SDP) of the Herschel mission. We have implemented an iterative map-making algorithm [The SPIRE-HerMES Iterative Mapper (SHIM)] to produce high fidelity maps that preserve extended diffuse emission on the sky while exploiting the repeated observations of the same region of the sky with many detectors in multiple scan directions to minimize residual instrument noise. We specify here the SHIM algorithm and outline the various tests that were performed to determine and characterize the quality of the maps and verify that the astrometry, point source flux and power on all relevant angular scales meet the needs of the HerMES science goals. These include multiple jackknife tests, determination of the map transfer function and detailed examination of the power spectra of both sky and jackknife maps. The map transfer function is approximately unity on scales from 1 arcmin to 1°. Final maps (v1.0), including multiple jackknives, as well as the SHIM pipeline, have been used by the HerMES team for the production of SDP papers. Herschel is an ESA space observatory with science instruments provided by European-led Principal Investigator consortia and with important participation from NASA.

Levenson, L.; Marsden, G.; Zemcov, M.; Amblard, A.; Blain, A.; Bock, J.; Chapin, E.; Conley, A.; Cooray, A.; Dowell, C. D.; Ellsworth-Bowers, T. P.; Franceschini, A.; Glenn, J.; Griffin, M.; Halpern, M.; Nguyen, H. T.; Oliver, S. J.; Page, M. J.; Papageorgiou, A.; Pérez-Fournon, I.; Pohlen, M.; Rangwala, N.; Rowan-Robinson, M.; Schulz, B.; Scott, Douglas; Serra, P.; Shupe, D. L.; Valiante, E.; Vieira, J. D.; Vigroux, L.; Wiebe, D.; Wright, G.; Xu, C. K.



Isolation and characterization of microsatellite DNA in the piracema fish Prochilodus lineatus (Characiformes).  


We described five novel microsatellite loci for the piracema fish species Prochilodus lineatus (Characiformes), endemic to South America and of extreme importance to both commercial and artisanal fisheries across its occurrence area. A primary, unenriched genomic library was constructed and radioactively screened for repetitive motifs. Positive clones were automatically sequenced and based on the design of new primers, polymerase chain reaction assays were carried out to determine optimum reaction and electrophoretic conditions for each characterized locus. We evaluated its usefulness in population genetic studies by determining Hardy-Weinberg equilibrium, FIS and a jackknife estimate of the number of alleles for a sample of fish caught below the Funil Hydroelectric Power Plant dam (N = 95), Grande River, Brazil. The number of alleles varied from 3 to 21 and expected heterozygosities ranged from 0.58 to 0.91. Two of five loci were in Hardy-Weinberg equilibrium. Jackknife estimates of the number of alleles were higher than the observed number of alleles for three loci and could provide a measure of sampling bias. These markers should provide important tools for the determination of genetic structure, stock delimitation and reservoir fish management in the Grande River as well as to improve hatchery practices for environmental mitigation measures and to help sustain fisheries in the river. PMID:18273795

Yazbeck, G M; Kalapothakis, E



Naïve Bayes Classifier with Feature Selection to Identify Phage Virion Proteins  

PubMed Central

Knowledge about the protein composition of phage virions is a key step to understand the functions of phage virion proteins. However, the experimental method to identify virion proteins is time consuming and expensive. Thus, it is highly desirable to develop novel computational methods for phage virion protein identification. In this study, a Naïve Bayes based method was proposed to predict phage virion proteins using amino acid composition and dipeptide composition. In order to remove redundant information, a novel feature selection technique was employed to single out optimized features. In the jackknife test, the proposed method achieved an accuracy of 79.15% for phage virion and nonvirion protein classification, which is superior to that of other state-of-the-art classifiers. These results indicate that the proposed method could serve as an effective and promising high-throughput method in phage proteomics research.

Feng, Peng-Mian; Ding, Hui; Chen, Wei



Otolith elemental signatures indicate population separation in deep-sea rockfish, Helicolenus dactylopterus and Pontinus kuhlii, from the Azores  

NASA Astrophysics Data System (ADS)

Deep sea rockfish, Helicolenus dactylopterus and Pontinus kuhlii from the Azores archipelago were used to study population structuring using the trace element composition of the otolith. Through solution-based inductively coupled plasma mass spectrometry we identified elemental profiles that adequately identified fish from different island groups in the region (East, West and Central). Mg:Ca, Pb:Ca and Li:Ca ratios combined to distinguish H. dactylopterus with 67% overall success. Sr:Ca, Ba:Ca, Li:Ca and Cu:Ca provided adequate distinction in P. kuhlii with a mean jack-knifed classification success of 75%. This was a first attempt at determining the distinguishability of fish aggregations from this oceanic island setting, where suitable habitat for these species is limited and fragmented. Results of our study corroborate with previous research pointing to constrained home ranges for these species. Implications for fisheries management are important since these commercial resources should be managed locally rather than regionally.

Higgins, Ruth; Isidro, Eduardo; Menezes, Gui; Correia, Alberto



A quantitative structure-activity relationship study on a few series of anti-hepatitis C virus agents.  


A 2-Dimensional Quantitative Structure-Activity Relationship study has been performed on 2 series of hepatitis C virus (HCV) inhibitors, i.e., Isothiazoles and Thiazolones. In each case significant correlations are found between the anti-HCV potencies and some physicochemical, electronic and steric properties of the compounds, indicating that for the first series the activity is controlled by density and two indicator parameters (one for halogen and other for methyl), while for the second series density, Hammett constant and Kier's first order valence molecular connectivity index are important for anti-HCV activity. The validity of the correlation has been judged by leave-one-out jackknife procedure and predicting the activity of some test compounds. Using the correlations obtained, some new compounds of high potency have been predicted in each series. PMID:22530896

Varshney, Jonish; Sharma, Anjana; Gupta, Satya P



A method for WD40 repeat detection and secondary structure prediction.  


WD40-repeat proteins (WD40s), as one of the largest protein families in eukaryotes, play vital roles in assembling protein-protein/DNA/RNA complexes. WD40s fold into similar β-propeller structures despite diversified sequences. A program WDSP (WD40 repeat protein Structure Predictor) has been developed to accurately identify WD40 repeats and predict their secondary structures. The method is designed specifically for WD40 proteins by incorporating both local residue information and non-local family-specific structural features. It overcomes the problem of highly diversified protein sequences and variable loops. In addition, WDSP achieves a better prediction in identifying multiple WD40-domain proteins by taking the global combination of repeats into consideration. In secondary structure prediction, the average Q3 accuracy of WDSP in jack-knife test reaches 93.7%. A disease related protein LRRK2 was used as a representative example to demonstrate the structure prediction. PMID:23776530

Wang, Yang; Jiang, Fan; Zhuo, Zhu; Wu, Xian-Hui; Wu, Yun-Dong



Supervised method for periodontitis phenotypes prediction based on microbial composition using 16S rRNA sequences.  


Microbes play an important role in human health; however, little was learned about them in past decades because of the limitations of culture-based techniques. Recently, with the development of next-generation sequencing (NGS) technologies, it has become possible to sequence millions of sequences directly from environmental samples, which offers a way to probe the hidden world of microbial communities and detect associations between microbes and diseases. In the present work, we propose a supervised learning-based method to mine the relationship between microbes and periodontitis with 16S rRNA sequences. The jackknife accuracy is 94.83%, indicating that the method can effectively predict disease status. These findings not only expand our understanding of the association between microbes and diseases but also provide a potential approach for disease diagnosis and forensics. PMID:24878731

Chen, Wei; Cheng, Yong-Mei; Zhang, Shao-Wu; Pan, Quan



Structural class tendency of polypeptide: A new conception in predicting protein structural class  

NASA Astrophysics Data System (ADS)

Prediction of protein domain structural classes is an important topic in protein science. In this paper, we propose a new conception: structural class tendency of polypeptides (SCTP), which is based on the fact that a given amino acid fragment tends to be present in certain types of proteins. The SCTP is obtained from an available training data set PDB40-B. When using the SCTP to predict protein structural classes by the Intimate Sorting predictive method, we obtained predictive accuracies (jackknife test) of 93.7%, 96.5%, and 78.6% for the testing data sets PDB40-j, Chou&Maggiora and CHOU. These results indicate that the SCTP approach is quite encouraging and promising. This new conception provides an effective tool to extract valuable information from protein sequences.

Yu, Tao; Sun, Zhi-Bo; Sang, Jian-Ping; Huang, Sheng-You; Zou, Xian-Wu



[Protein structural class prediction with binary tree-based support vector machines].  


A new multi-classification method based on binary tree SVM (BT-SVM) is presented to predict protein structural class. The protein sequence, which is represented by a 26-D vector, is used as the input vector. The BT-SVM method resolves unclassifiable regions for multiclass problems which cannot be solved by SVM. Self-consistency and cross-validation tests are used to verify the performance of the proposed method on two benchmark datasets. Satisfactory test results demonstrate that the new method is promising. The jackknife results of the new method are compared with the existing results on the same datasets. The results of the new method are almost the same as those of the best existing method. This shows that the new method has good prediction performance and that it will become a useful tool in protein structure class prediction. PMID:18788309

Zhang, Tongliang; Ding, Yongsheng



Small-Angle X-Ray Scattering- and Nuclear Magnetic Resonance-Derived Conformational Ensemble of the Highly Flexible Antitoxin PaaA2.  


Antitoxins from prokaryotic type II toxin-antitoxin modules are characterized by a high degree of intrinsic disorder. The description of such highly flexible proteins is challenging because they cannot be represented by a single structure. Here, we present a combination of SAXS and NMR data to describe the conformational ensemble of the PaaA2 antitoxin from the human pathogen E. coli O157. The method encompasses the use of SAXS data to filter ensembles out of a pool of conformers generated by a custom NMR structure calculation protocol and the subsequent refinement by a block jackknife procedure. The final ensemble obtained through the method is validated by an established residual dipolar coupling analysis. We show that the conformational ensemble of PaaA2 is highly compact and that the protein exists in solution as two preformed helices, connected by a flexible linker, that probably act as molecular recognition elements for toxin inhibition. PMID:24768114

Sterckx, Yann G J; Volkov, Alexander N; Vranken, Wim F; Kragelj, Jaka; Jensen, Malene Ringkjøbing; Buts, Lieven; Garcia-Pino, Abel; Jové, Thomas; Van Melderen, Laurence; Blackledge, Martin; van Nuland, Nico A J; Loris, Remy



Using reduced amino acid composition to predict defensin family and subfamily: Integrating similarity measure and structural alphabet.  


Defensins are essentially ancient natural antibiotics with potent activity extending from lower organisms to humans. They can inhibit the growth or virulence of micro-organisms directly or indirectly enhance the host's immune system. The successful prediction of defensin peptides will provide very useful information and insights for the basic research of defensins. In this study, by selecting the N-peptide composition of reduced amino acid alphabet (RAAA) obtained from the structural alphabet named Protein Blocks as the feature parameters, the increment of diversity (ID) is first developed to predict defensin family and subfamily. The jackknife test based on 2-peptide composition of reduced amino acid alphabet (RAAA) with 13 reduced amino acids shows that the overall accuracies of prediction are 91.36% for defensin family and 94.21% for defensin subfamily. The results indicate that ID_RAAA is a simple and efficient prediction method for defensin peptides. PMID:19591890

Zuo, Yong-Chun; Li, Qian-Zhong



Using AdaBoost for the prediction of subcellular location of prokaryotic and eukaryotic proteins.  


In this paper, AdaBoost algorithm, a popular and effective prediction method, is applied to predict the subcellular locations of Prokaryotic and Eukaryotic Proteins-a dataset derived from SWISSPROT 33.0. Its prediction ability was evaluated by re-substitution test, Leave-One-Out Cross validation (LOOCV) and jackknife test. By comparing its results with some most popular predictors such as Discriminant Function, neural networks, and SVM, we demonstrated that the AdaBoost predictor outperformed these predictors. As a result, we arrive at the conclusion that AdaBoost algorithm could be employed as a robust method to predict subcellular location. An online web server for predicting subcellular location of prokaryotic and eukaryotic proteins is available at . PMID:18506593

Niu, Bing; Jin, Yu-Huan; Feng, Kai-Yan; Lu, Wen-Cong; Cai, Yu-Dong; Li, Guo-Zheng



A protein structural classes prediction method based on PSI-BLAST profile.  


Knowledge of protein structural classes plays an important role in understanding protein folding patterns. Prediction of protein structural class based solely on sequence data remains a challenging problem. In this study, we extract the long-range correlation information and linear correlation information from the position-specific score matrix (PSSM). A total of 3600 features are extracted; then, 278 features are selected by a filter feature selection method based on the 1189 dataset. To verify the performance of our method (named LCC-PSSM), jackknife tests are performed on three widely used low-similarity benchmark datasets. Comparison of our results with the existing methods shows that our method provides favorable performance for protein structural class prediction. A stand-alone version of the proposed method (LCC-PSSM) is written in the MATLAB language and it can be downloaded from PMID:24607742

Ding, Shuyan; Yan, Shoujiang; Qi, Shuhua; Li, Yan; Yao, Yuhua



Predicting peroxidase subcellular location by hybridizing different descriptors of Chou's pseudo amino acid patterns.  


Peroxidases as universal enzymes are essential for the regulation of reactive oxygen species levels and play major roles in both disease prevention and human pathologies. Automated prediction of functional protein localization is rarely reported and also is important for designing new drugs and drug targets. In this study, we first propose a support vector machine (SVM)-based method to predict peroxidase subcellular localization. Various Chou's pseudo amino acid descriptors and gene ontology (GO)-homology patterns were selected as input features to multiclass SVM. Prediction results showed that the smoothed PSSM encoding pattern performed better than the other approaches. The best overall prediction accuracy was 87.0% in a jackknife test using a PSSM profile of pattern with width=5. We also demonstrate that the present GO annotation is far from complete or deep enough for annotating proteins with a specific function. PMID:24802134

Zuo, Yong-Chun; Peng, Yong; Liu, Li; Chen, Wei; Yang, Lei; Fan, Guo-Liang



Classification in karyometry: performance testing and prediction error  

PubMed Central

Classification plays a central role in quantitative histopathology. Success is expressed in terms of the accuracy of prediction for the classification of future data points and an estimate of the prediction error. The prediction error is affected by the chosen procedure, e.g., the use of a training set of data points, a validation set, an independent test set, the sample size and the learning curve of the classification algorithm. For small samples, procedures such as the “jackknife”, the “leave one out” and the “bootstrap” are recommended to arrive at an unbiased estimate of the true prediction error. All of the procedures rest on the assumption that the data set used to derive a classification rule is representative of the diagnostic categories involved. It is this assumption that in quantitative histopathology has to be carefully verified before a clinically generally valid classification procedure can be claimed.

Bartels, PH; Bartels, HG



A novel predictor for protein structural class based on integrated information of the secondary structure sequence.  


The structural class has become one of the most important features for characterizing the overall folding type of a protein and played important roles in many aspects of protein research. At present, it is still a challenging problem to accurately predict protein structural class for low-similarity sequences. In this study, an 18-dimensional integrated feature vector is proposed by fusing the information about content and position of the predicted secondary structure elements. The consistently high accuracies of jackknife and 10-fold cross-validation tests on different low-similarity benchmark datasets show that the proposed method is reliable and stable. Comparison of our results with other methods demonstrates that our method is an effective computational tool for protein structural class prediction, especially for low-similarity sequences. PMID:24859536

Zhang, Lichao; Zhao, Xiqiang; Kong, Liang; Liu, Shuxia



Predicting protein subcellular locations with feature selection and analysis.  


In this paper, we propose a strategy to predict the subcellular locations of proteins by combining various feature selection methods. Firstly, proteins are coded by amino-acid composition and physicochemical properties, then these features are arranged by the Minimum Redundancy Maximum Relevance method and further filtered by a feature selection procedure. The Nearest Neighbor Algorithm is used as a prediction model to predict the protein subcellular locations, and gains a correct prediction rate of 70.63%, evaluated by Jackknife cross-validation. Results of feature selection also enable us to identify the most important protein properties. The prediction software is available for public access on the website, which may play an important complementary role to a series of web-server predictors summarized recently in a review by Chou and Shen (Chou, K.C., Shen, H.B. Natural Science, 2009, 2, 63-92, PMID:19995336

Cai, Yudong; He, Jianfeng; Li, Xinlei; Feng, Kaiyan; Lu, Lin; Feng, Kairui; Kong, Xiangyin; Lu, Wencong



Linear regression models for biomass table construction using cluster samples: A simulation study  

SciTech Connect

Biomass tables are often constructed by application of Ordinary Least Squares (OLS) regression methods to data from cluster samples. These methods assume that data are collected from a simple random sample. Application of OLS methods to clustered data leads to underestimates of the error of the regression, because of the intracluster correlation among trees from the same cluster. This study considered alternate regression models under a variety of cluster sampling methods, to determine model-sampling method combinations which yield both accurate model estimates and reliable estimates of model precision. Biomass was modeled as a function of diameter at breast height (DBH) and total height (H). Each model was fit under 19 least squares estimation procedures from five categories, including (1) ordinary methods, (2) modified methods using cluster sums as variables, (3) generalized least squares methods which consider estimated intracluster correlation, (4) jackknifed regression models, and (5) random coefficient regressions.

Gillespie, A.J.R.
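
One way a jackknifed regression can respect cluster structure is to delete a whole cluster at a time, so that correlated trees leave the sample together. The sketch below assumes that approach with a single predictor for brevity; the abstract does not say which jackknife variant the study used, and the function names and data are hypothetical.

```python
from statistics import mean

def ols_slope(points):
    """Least-squares slope of y on x for a list of (x, y) pairs."""
    xbar = mean(x for x, _ in points)
    ybar = mean(y for _, y in points)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    return sxy / sxx

def cluster_jackknife_slope(clusters):
    """Delete one whole cluster at a time; return (slope, standard error)."""
    g = len(clusters)
    full = ols_slope([p for c in clusters for p in c])
    leave_out = [
        ols_slope([p for j, c in enumerate(clusters) if j != i for p in c])
        for i in range(g)
    ]
    # Pseudo-values: their mean is the jackknife estimate and their
    # spread gives the jackknife variance of that estimate.
    pseudo = [g * full - (g - 1) * b for b in leave_out]
    estimate = mean(pseudo)
    variance = sum((v - estimate) ** 2 for v in pseudo) / (g * (g - 1))
    return estimate, variance ** 0.5

# Hypothetical clusters of (DBH, biomass) pairs (assumed data).
plots = [
    [(10.0, 21.0), (12.0, 25.5), (14.0, 27.0)],
    [(11.0, 20.0), (15.0, 31.0)],
    [(9.0, 19.5), (13.0, 26.0), (16.0, 33.5)],
]
slope, std_err = cluster_jackknife_slope(plots)
print(slope, std_err)
```

The standard error here comes from between-cluster variation in the leave-one-out slopes, so it does not rely on the trees within a cluster being independent.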



Avian community response to small-scale habitat disturbance in Maine  

USGS Publications Warehouse

The effects of small clearcuts (1 - 8 ha) on avian communities in the forest of eastern Maine were studied using point counts during spring 1978 - 1981. Surveys were conducted in uncut (control) and clear-cut (treatment) plots in three stand types: conifer, hardwood, and mixed growth. We used a mark-recapture model and its associated jackknife species richness estimator (N), as an indicator of avian community structure. Increases in estimated richness (N) and Shannon - Weaver diversity (H') were noted in the treated hardwood and mixed growth, but not in the conifer stands. Seventeen avian species increased in relative abundance, whereas two species declined. Stand treatment was associated with important changes in bird species composition. Increased habitat patchiness and the creation of forest edge are hypothesized as causes for the greater estimates of richness and diversity.

Derleth, E.L.; McAuley, D.G.; Dwyer, T.J.
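
A jackknife species richness estimator of the kind mentioned above can be sketched as follows. This assumes the first-order form, in which the observed species count is increased by the number of species detected on exactly one sampling occasion, scaled by (k - 1)/k for k occasions. The abstract does not state which order of jackknife was used, and the function name and survey data are hypothetical.

```python
def jackknife_richness(detections):
    """First-order jackknife richness from a list of per-occasion species sets."""
    k = len(detections)
    counts = {}
    for occasion in detections:
        for species in occasion:
            counts[species] = counts.get(species, 0) + 1
    observed = len(counts)
    # Species seen on exactly one occasion drive the correction term.
    singletons = sum(1 for c in counts.values() if c == 1)
    return observed + singletons * (k - 1) / k

# Hypothetical point-count data: species recorded on each of four visits.
visits = [
    {"warbler", "thrush"},
    {"warbler", "vireo"},
    {"warbler", "thrush", "junco"},
    {"thrush"},
]
estimate = jackknife_richness(visits)
print(estimate)
```

When no species is seen on only one occasion, the estimate equals the observed count.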



Prediction of mitochondrial proteins based on genetic algorithm - partial least squares and support vector machine.  


Mitochondria are essential cell organelles of eukaryotes. Hence, it is vitally important to develop an automated and reliable method for timely identification of novel mitochondrial proteins. In this study, mitochondrial proteins were encoded by dipeptide composition technology; then, the genetic algorithm-partial least square (GA-PLS) method was used to evaluate the dipeptide composition elements which are more important in recognizing mitochondrial proteins; further, these selected dipeptide composition elements were applied to support vector machine (SVM)-based classifiers to predict the mitochondrial proteins. All the models were trained and validated by the jackknife cross-validation test. The prediction accuracy is 85%, suggesting that it performs reasonably well in predicting the mitochondrial proteins. Our results strongly imply that not all the dipeptide compositions are informative and indispensable for predicting proteins. The source code of MATLAB and the dataset are available on request under PMID:17701100

Tan, F; Feng, X; Fang, Z; Li, M; Guo, Y; Jiang, L



Prediction of protein structural classes using hybrid properties.  


In this paper, amino acid compositions are combined with some protein sequence properties (physiochemical properties) to predict protein structural classes. We are able to predict protein structural classes using a mathematical model that combines the nearest neighbor algorithm (NNA), mRMR (minimum redundancy, maximum relevance), and feature forward searching strategy. Jackknife cross-validation is used to evaluate the prediction accuracy. As a result, the prediction success rate improves to 68.8%, which is better than the 62.2% obtained when using only amino acid compositions. Therefore, we conclude that the physiochemical properties are factors that contribute to the protein folding phenomena and the most contributing features are found to be the amino acid composition. We expect that prediction accuracy will improve further as more sequence information comes to light. A web server for predicting the protein structural classes is available at PMID:18953662

Li, Wenjin; Lin, Kao; Feng, Kaiyan; Cai, Yudong
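
The jackknife cross-validation of a nearest neighbor predictor described above can be sketched as follows. Each sample is left out in turn and predicted from the remaining samples. The mRMR ranking and forward feature search are omitted, and the feature vectors, labels, and function names are hypothetical placeholders rather than the authors' data or code.

```python
import math

def nearest_neighbor_label(query, samples):
    """Return the label of the sample closest to `query` (Euclidean distance)."""
    best = min(samples, key=lambda s: math.dist(query, s[0]))
    return best[1]

def jackknife_accuracy(samples):
    """Leave each sample out in turn, predict it from the rest, report the hit rate."""
    hits = 0
    for i, (features, label) in enumerate(samples):
        rest = samples[:i] + samples[i + 1:]
        if nearest_neighbor_label(features, rest) == label:
            hits += 1
    return hits / len(samples)

# Hypothetical two-feature vectors with structural-class labels (assumed data).
data = [
    ((0.0, 0.0), "alpha"), ((0.1, 0.2), "alpha"), ((0.2, 0.1), "alpha"),
    ((1.0, 1.0), "beta"), ((0.9, 1.1), "beta"), ((0.45, 0.45), "beta"),
]
accuracy = jackknife_accuracy(data)
print(accuracy)
```

Because every sample is tested exactly once and never appears in its own training set, the result for a given dataset does not depend on a random split.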



Comparative analysis on determinants of self-rated health among non-Hispanic White, Hispanic, and Asian American older adults.  


The purpose of the study is (1) to compare the effects of factors on self-rated health (SRH) among older non-Hispanic Whites (NHW), Hispanic, and Asian Californians and (2) to provide estimated influence size of each factor on SRH. This study analyzed secondary data drawn from the 2005 California Health Interview Survey. Binary logit regressions were used to analyze data with the Jackknife replication sampling weights. Significant differences were found in SRH among the three groups. Hispanics and Asians reported poorer health than NHW. Socioeconomic status, acculturation, and health access significantly accounted for an association between ethnicity and SRH. However, the magnitudes of their effects on SRH varied across the groups and by the factors examined. This study discusses and concludes with some recommendations on the opportunities presented by the Affordable Care Act and Healthy People 2020. PMID:23744285

Min, Jong Won; Rhee, Siyon; Lee, Sang E; Rhee, Jessica; Tran, Thanh



Temporal genetic variability and host sources of Escherichia coli associated with fecal pollution from domesticated animals in the shellfish culture environment of Xiangshan Bay, East China Sea.  


This study was conducted to analyze the genetic variability of Escherichia coli from domesticated animal wastes for microbial source tracking (MST) application in fecally contaminated shellfish growing waters of Xiangshan Bay, East China Sea. The (GTG)5 primer was used to generate 1363 fingerprints from E. coli isolated from feces of 9 known domesticated animal sources around this shellfish culture area. Jackknife analysis of the complete (GTG)5-PCR DNA fingerprint library indicated that isolates were assigned to the correct source groups with an 84.28% average rate of correct classification. Based on one-year source tracking data, the dominant sources of E. coli were swine, chickens, ducks and cows in this water area. Moreover, annual and spatial changes of E. coli concentrations and host sources may affect the level and distribution of zoonotic pathogen species in waters. Our findings will further contribute to preventing fecal pollution in aquatic environments and quality control of shellfish. PMID:21645948

Fu, Ling-Lin; Shuai, Jiang-Bing; Wang, Yanbo; Ma, Hong-Jia; Li, Jian-Rong



Bootstrapped MRMC confidence intervals  

NASA Astrophysics Data System (ADS)

The multiple-reader, multiple-case (MRMC) paradigm of Swets and Pickett (1982) for ROC analysis was expressed as a components of variance model by Dorfman, Berbaum, and Metz (1992) and validated by Roe and Metz (1997) for Type I error rates. Our group proposed an analysis of the MRMC components of variance model using bootstrap (Beiden, Wagner, and Campbell, 2000) experiments instead of jackknife pseudo-values. These approaches have been challenged by some contemporary authors (e.g. Zhou, Obuchowski, and McClish, 2002). The purpose of the present paper is to formally compare the models and to carry out validation tests of their performance. We investigate different approaches to statistical inference, including several types of nonparametric bootstrap confidence intervals and report on validation and simulation experiments of Type I errors.

Samuelson, Frank W.; Wagner, Robert F.
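
As background for the jackknife pseudo-values mentioned in the abstract above, here is a minimal sketch of the general construction: for n cases and a statistic theta, the i-th pseudo-value is n * theta(all cases) - (n - 1) * theta(all cases except i). The function names and the toy data are assumptions for illustration; in an MRMC study the statistic would be a reader's ROC accuracy estimate, which is not modeled here.

```python
from statistics import mean, variance

def jackknife_pseudo_values(data, statistic):
    """Return the n jackknife pseudo-values of `statistic` on `data`."""
    n = len(data)
    full = statistic(data)
    return [
        n * full - (n - 1) * statistic(data[:i] + data[i + 1:])
        for i in range(n)
    ]

def jackknife_estimate_and_se(data, statistic):
    """Jackknife estimate (mean of pseudo-values) and its standard error."""
    pseudo = jackknife_pseudo_values(data, statistic)
    return mean(pseudo), (variance(pseudo) / len(pseudo)) ** 0.5

# Hypothetical case-level scores; with the mean as the statistic,
# the pseudo-values reproduce the original observations.
scores = [2.0, 4.0, 6.0, 8.0]
pseudo = jackknife_pseudo_values(scores, mean)
estimate, std_err = jackknife_estimate_and_se(scores, mean)
```

Treating the pseudo-values as approximately independent observations is what allows them to be fed into a conventional analysis of variance.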



Linear regression in astronomy. II  

NASA Technical Reports Server (NTRS)

A wide variety of least-squares linear regression procedures used in observational astronomy, particularly investigations of the cosmic distance scale, are presented and discussed. The classes of linear models considered are (1) unweighted regression lines, with bootstrap and jackknife resampling; (2) regression solutions when measurement error, in one or both variables, dominates the scatter; (3) methods to apply a calibration line to new data; (4) truncated regression models, which apply to flux-limited data sets; and (5) censored regression models, which apply when nondetections are present. For the calibration problem we develop two new procedures: a formula for the intercept offset between two parallel data sets, which propagates slope errors from one regression to the other; and a generalization of the Working-Hotelling confidence bands to nonstandard least-squares lines. They can provide improved error analysis for Faber-Jackson, Tully-Fisher, and similar cosmic distance scale relations.

Feigelson, Eric D.; Babu, Gutti J.
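
For class (1) in the abstract above, an unweighted regression line with jackknife resampling, a minimal sketch of a delete-one jackknife standard error for the ordinary least-squares slope is given here. It is an illustration under simple assumptions (y regressed on x, no measurement-error model) and is not the authors' code; the function names and data are hypothetical.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of y on x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx

def jackknife_slope_se(xs, ys):
    """Delete-one jackknife standard error of the OLS slope."""
    n = len(xs)
    slopes = [
        ols_slope(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        for i in range(n)
    ]
    mean_slope = sum(slopes) / n
    jack_var = (n - 1) / n * sum((s - mean_slope) ** 2 for s in slopes)
    return jack_var ** 0.5

# Hypothetical data: an exact line and a noisy version of it.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
exact_ys = [2.0 * x + 1.0 for x in xs]
noisy_ys = [3.1, 4.9, 7.2, 8.8, 11.1]
se_exact = jackknife_slope_se(xs, exact_ys)
se_noisy = jackknife_slope_se(xs, noisy_ys)
```

On the exact line every delete-one slope equals 2, so the jackknife standard error vanishes; scatter in the data makes it positive.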



Predict protein structural class for low-similarity sequences by evolutionary difference information into the general form of Chou's pseudo amino acid composition.


Knowledge of protein structural class plays an important role in characterizing the overall folding type of a given protein. At present, it remains a challenge in computational biology to extract information from the protein sequence alone for structural class prediction of low-similarity sequences. In this study, a novel sequence representation method is proposed based on the position-specific scoring matrix for protein structural class prediction. Using a defined evolutionary difference formula, proteins of varying length are expressed as vectors of uniform dimension, which represent evolutionary difference information between the adjacent residues of a given protein. To perform and evaluate the proposed method, a support vector machine and jackknife tests are employed on three widely used datasets, 25PDB, 1189 and 640, with sequence similarity lower than 25%, 40% and 25%, respectively. Comparison of our results with previous methods shows that our method may provide a promising way to predict protein structural class, especially for low-similarity sequences. PMID:24735902

Zhang, Lichao; Zhao, Xiqiang; Kong, Liang



PACo: a novel procrustes application to cophylogenetic analysis.  


We present Procrustean Approach to Cophylogeny (PACo), a novel statistical tool to test for congruence between phylogenetic trees, or between phylogenetic distance matrices of associated taxa. Unlike previous tests, PACo evaluates the dependence of one phylogeny upon the other. This makes it especially appropriate to test the classical coevolutionary model that assumes that parasites that spend part of their life in or on their hosts track the phylogeny of their hosts. The new method does not require fully resolved phylogenies and allows for multiple host-parasite associations. PACo produces a Procrustes superimposition plot enabling a graphical assessment of the fit of the parasite phylogeny onto the host phylogeny and a goodness-of-fit statistic, whose significance is established by randomization of the host-parasite association data. The contribution of each individual host-parasite association to the global fit is measured by means of jackknife estimation of their respective squared residuals and confidence intervals associated to each host-parasite link. We carried out different simulations to evaluate the performance of PACo in terms of Type I and Type II errors with respect to two similar published tests. In most instances, PACo performed at least as well as the other tests and showed higher overall statistical power. In addition, the jackknife estimation of squared residuals enabled more elaborate validations about the nature of individual links than the ParaFitLink1 test of the program ParaFit. In order to demonstrate how it can be used in real biological situations, we applied PACo to two published studies using a script written in the public-domain statistical software R. PMID:23580325

Balbuena, Juan Antonio; Míguez-Lozano, Raúl; Blasco-Costa, Isabel



Phylogenetic relationships of agaric fungi based on nuclear large subunit ribosomal DNA sequences.  


Phylogenetic relationships of mushrooms and their relatives within the order Agaricales were addressed by using nuclear large subunit ribosomal DNA sequences. Approximately 900 bases of the 5' end of the nucleus-encoded large subunit RNA gene were sequenced for 154 selected taxa representing most families within the Agaricales. Several phylogenetic methods were used, including weighted and equally weighted parsimony (MP), maximum likelihood (ML), and distance methods (NJ). The starting tree for branch swapping in the ML analyses was the tree with the highest ML score among previously produced MP and NJ trees. A high degree of consensus was observed between phylogenetic estimates obtained through MP and ML. NJ trees differed according to the distance model that was used; however, all NJ trees still supported most of the same terminal groupings as the MP and ML trees did. NJ trees were always significantly suboptimal when evaluated against the best MP and ML trees, by both parsimony and likelihood tests. Our analyses suggest that weighted MP and ML provide the best estimates of Agaricales phylogeny. Similar support was observed between bootstrapping and jackknifing methods for evaluation of tree robustness. Phylogenetic analyses revealed many groups of agaricoid fungi that are supported by moderate to high bootstrap or jackknife values or are consistent with morphology-based classification schemes. Analyses also support separate placement of the boletes and russules, which are basal to the main core group of gilled mushrooms (the Agaricineae of Singer). 
Examples of monophyletic groups include the families Amanitaceae, Coprinaceae (excluding Coprinus comatus and subfamily Panaeolideae), Agaricaceae (excluding the Cystodermateae), and Strophariaceae pro parte (Stropharia, Pholiota, and Hypholoma); the mycorrhizal species of Tricholoma (including Leucopaxillus, also mycorrhizal); Mycena and Resinomycena; Termitomyces, Podabrella, and Lyophyllum; and Pleurotus with Hohenbuehelia. Several groups revealed by these data to be nonmonophyletic include the families Tricholomataceae, Cortinariaceae, and Hygrophoraceae and the genera Clitocybe, Omphalina, and Marasmius. This study provides a framework for future systematics studies in the Agaricales and suggestions for analyzing large molecular data sets. PMID:12118409

Moncalvo, J M; Lutzoni, F M; Rehner, S A; Johnson, J; Vilgalys, R



Substrate depletion analysis as an approach to the pre-steady-state anticooperative kinetics of aminoacyl adenylate formation by tryptophanyl-tRNA synthetase from beef pancreas.  


The formation of tryptophanyl adenylate catalyzed by tryptophanyl-tRNA synthetase from beef pancreas has been studied by stopped-flow analysis under conditions where the concentration of one of the substrates was largely decreasing during the time course of the reaction. Under such conditions a nonlinear regression analysis of the formation of the adenylate (adenylate vs. time curve) at several initial tryptophan and enzyme concentrations gave an accurate determination of both binding constants of this substrate. The use of the jackknife procedure according to Cornish-Bowden & Wong [Cornish-Bowden, A., & Wong, J.J. (1978) Biochem. J. 175, 969-976] gave the limit of confidence of these constants. This approach confirmed that tryptophanyl-tRNA synthetase presents a kinetic anticooperativity toward tryptophan in the activation reaction that closely parallels the anticooperativity found for tryptophan binding at equilibrium. Both sites are simultaneously forming the adenylate. The dissociation constants obtained under the present pre-steady-state conditions for tryptophan are KT1 = 1.6 +/- 0.5 microM and KT2 = 18.5 +/- 3.0 microM at pH 8.0, 25 degrees C. The rate constant kf of adenylate formation is identical for both active sites (kf = 42 +/- 5 s-1). The substrate depletion method presently used, linked to the jackknife procedure, proves to be particularly suitable for the determination of the kinetic constants and for the discrimination between different possible kinetic models of dimeric enzyme with high substrate affinity. In such a case this method is more reliable than the conventional method using substrate concentrations in high excess over that of the enzyme. PMID:6609716

Merle, M; Graves, P V; Labouesse, B



Maximum Likelihood Analyses of 3,490 rbcL Sequences: Scalability of Comprehensive Inference versus Group-Specific Taxon Sampling  

PubMed Central

The constant accumulation of sequence data poses new computational and methodological challenges for phylogenetic inference, since multiple sequence alignments grow both in the horizontal (number of base pairs, phylogenomic alignments) as well as vertical (number of taxa) dimension. Put aside the ongoing controversial discussion about appropriate models, partitioning schemes, and assembly methods for phylogenomic alignments, coupled with the high computational cost to infer these, for many organismic groups, a sufficient number of taxa is often exclusively available from one or just a few genes (e.g., rbcL, matK, rDNA). In this paper we address scalability of Maximum-Likelihood-based phylogeny reconstruction with respect to the number of taxa by example of several large nested single-gene rbcL alignments comprising 400 up to 3,491 taxa. In order to test the effect of taxon sampling, we employ an appropriately adapted taxon jackknifing approach. In contrast to standard jackknifing, this taxon subsampling procedure is not conducted entirely at random, but based on drawing subsamples from empirical taxon-groups which can either be user-defined or determined by using taxonomic information from databases. Our results indicate that, despite an unfavorable number of sequences to number of base pairs ratio, i.e., many relatively short sequences, Maximum Likelihood tree searches and bootstrap analyses scale well on single-gene rbcL alignments with a dense taxon sampling up to several thousand sequences. Moreover, the newly implemented taxon subsampling procedure can be beneficial for inferring higher level relationships and interpreting bootstrap support from comprehensive analysis.

Stamatakis, Alexandros; Goker, Markus; Grimm, Guido W.



HIV-1 protease cleavage site prediction based on amino acid property.  


Knowledge of the polyprotein cleavage sites by HIV protease will refine our understanding of its specificity, and the information thus acquired is useful for designing specific and efficient HIV protease inhibitors. Recently, several works have approached the HIV-1 protease specificity problem by applying a number of classifier creation and combination methods. The pace in searching for the proper inhibitors of HIV protease will be greatly expedited if one can find an accurate, robust, and rapid method for predicting the cleavage sites in proteins by HIV protease. In this article, we selected HIV-1 protease as the subject of the study. 299 oligopeptides were chosen for the training set, while the other 63 oligopeptides were taken as a test set. The peptides are represented by features constructed by AAIndex (Kawashima et al., Nucleic Acids Res 1999, 27, 368; Kawashima and Kanehisa, Nucleic Acids Res 2000, 28, 374). The mRMR method (Maximum Relevance, Minimum Redundancy; Ding and Peng, Proc Second IEEE Comput Syst Bioinformatics Conf 2003, 523; Peng et al., IEEE Trans Pattern Anal Mach Intell 2005, 27, 1226), combined with incremental feature selection (IFS) and feature forward search (FFS), is applied to find the two important cleavage sites and to select 364 important biochemistry features by jackknife test. Using KNN (K-nearest neighbors) to combine the selected features, the prediction model obtains a high accuracy rate of 91.3% for the jackknife cross-validation test and 87.3% for the independent-set test. It is expected that our feature selection scheme can be referred to as a useful assistant technique for finding effective inhibitors of HIV protease, especially for the scientists in this field. PMID:18496789

Niu, Bing; Lu, Lin; Liu, Liang; Gu, Tian Hong; Feng, Kai-Yan; Lu, Wen-Cong; Cai, Yu-Dong



Pine Hollow Watershed Project : FY 2000 Projects.  

SciTech Connect

The Pine Hollow Project (1999-010-00) is an on-going watershed restoration effort administered by Sherman County Soil and Water Conservation District and spearheaded by Pine Hollow/Jackknife Watershed Council. The headwaters are located near Shaniko in Wasco County, and the mouth is in Sherman County on the John Day River. Pine Hollow provides more than 20 miles of potential summer steelhead spawning and rearing habitat. The watershed is 92,000 acres. Land use is mostly range, with some dryland grain. There are no water rights on Pine Hollow. Due to shallow soils, the watershed is prone to rapid runoff events which scour out the streambed and the riparian vegetation. This project seeks to improve the quality of upland, riparian and in-stream habitat by restoring the natural hydrologic function of the entire watershed. Project implementation to date has consisted of construction of water/sediment control basins, gradient terraces on croplands, pasture cross-fences, upland water sources, and grass seeding on degraded sites, many of which were crop fields in the early part of the century. The project is expected to continue through about 2007. From March 2000 to June 2001, the Pine Hollow Project built 6 sediment basins, 1 cross-fence, 2 spring developments, 1 well development, 1 solar pump, 50 acres of native range seeding and 1 livestock waterline. FY2000 projects were funded by BPA, Oregon Watershed Enhancement Board, US Fish and Wildlife Service and landowners. In-kind services were provided by Sherman County Soil and Water Conservation District, USDA Natural Resources Conservation Service, USDI Bureau of Land Management, Oregon Department of Fish and Wildlife, Pine Hollow/Jackknife Watershed Council, landowners and Wasco County Soil and Water Conservation District.

Sherman County Soil and Water Conservation District



Bootstrapping phylogenies inferred from rearrangement data  

PubMed Central

Background: Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models. Results: We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches. Conclusions: Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. 
Its support values follow a similar scale and its receiver-operating characteristics are nearly identical, indicating that it provides similar levels of sensitivity and specificity. Thus our assessment method makes it possible to conduct phylogenetic analyses on whole genomes with the same degree of confidence as for analyses on aligned sequences. Extensions to search-based inference methods such as maximum parsimony and maximum likelihood are possible, but remain to be thoroughly tested.



Spectral estimation for geophysical time-series with inconvenient gaps  

NASA Astrophysics Data System (ADS)

The power of spectral estimation as a tool for studying geophysical processes is often limited by short records or breaks in available time-series. Direct spectral estimation using multitaper techniques designed to reduce variance and minimize leakage can help alleviate the first problem. For records with gaps, systematic interpolation or averaging of multitaper spectra derived from record fragments may prove adequate in some cases, but can be cumbersome to implement. Alternatively, multitapers can be modified for use in direct spectral estimation with intermittently sampled data. However, their performance has not been adequately studied. We investigate reliability and resolution of techniques that adapt prolate and minimum bias (MB) multitapers to accommodate the longest breaks in sampling, comparing the tapering functions (referred to as PRG or MBG tapers) with the standard prolate and MB tapers used for complete data series, and with the section-averaging approach. Using a synthetic data set, we test both jackknife and bootstrap methods to calculate confidence intervals for PRG and MBG multitaper spectral estimates and find the jackknife is both more accurate and faster to compute. To implement these techniques for a variety of data sets, we provide an algorithm that allows the user to balance judicious interpolation against the use of suitably adapted tapers, providing empirical measures of both bias and frequency resolution for candidate sets of tapers. These techniques are tested on diverse geophysical data sets: a record of change in the length of day, a model of the external dipole part of the geomagnetic field produced by the magnetospheric ring current, and a 12 Myr long irregularly sampled relative geomagnetic palaeointensity record with pernicious gaps. We conclude that both PRG and MBG tapers generally perform as well as, or better than, an optimized form of the commonly used section averaging approach. 
The greatest improvements seem to occur when the gap structure creates data segments of very unequal lengths. Ease of computation and more robust behaviour can make MBG tapers a better choice than PRG except when very fine-scale frequency resolution is required. These techniques could readily be applied for cross-spectral and transfer function estimation and are a useful addition to the geophysical toolbox.

Smith-Boughner, L. T.; Constable, C. G.



Constructing large-scale genetic maps using an evolutionary strategy algorithm.  

PubMed Central

This article is devoted to the problem of ordering in linkage groups with many dozens or even hundreds of markers. The ordering problem belongs to the field of discrete optimization on a set of all possible orders, amounting to n!/2 for n loci; hence it is considered an NP-hard problem. Several authors attempted to employ the methods developed in the well-known traveling salesman problem (TSP) for multilocus ordering, using the assumption that for a set of linked loci the true order will be the one that minimizes the total length of the linkage group. A novel, fast, and reliable algorithm developed for the TSP and based on evolution-strategy discrete optimization was applied in this study for multilocus ordering on the basis of pairwise recombination frequencies. The quality of derived maps under various complications (dominant vs. codominant markers, marker misclassification, negative and positive interference, and missing data) was analyzed using simulated data with approximately 50-400 markers. High performance of the employed algorithm allows systematic treatment of the problem of verification of the obtained multilocus orders on the basis of computing-intensive bootstrap and/or jackknife approaches for detecting and removing questionable marker scores, thereby stabilizing the resulting maps. Parallel calculation technology can easily be adopted for further acceleration of the proposed algorithm. Real data analysis (on maize chromosome 1 with 230 markers) is provided to illustrate the proposed methodology.

Mester, D; Ronin, Y; Minkov, D; Nevo, E; Korol, A
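
The size of the search space quoted in the abstract above, n!/2 orders for n loci, and the minimum-total-length criterion can be illustrated with a brute-force sketch for a handful of markers. This is not the evolution-strategy algorithm of the paper; the marker positions, distance matrix, and function names are hypothetical, and simple additive distances stand in for pairwise recombination frequencies.

```python
from itertools import permutations
from math import factorial

def count_orders(n):
    """Number of distinct orders of n >= 2 loci (an order and its mirror image count once)."""
    return factorial(n) // 2

def map_length(order, dist):
    """Total length of a linkage map: sum of distances between adjacent loci."""
    return sum(dist[a][b] for a, b in zip(order, order[1:]))

def best_order_brute_force(dist):
    """Try every order once (mirror images skipped) and keep the shortest map."""
    n = len(dist)
    candidates = (p for p in permutations(range(n)) if p[0] < p[-1])
    return min(candidates, key=lambda p: map_length(p, dist))

# Hypothetical map positions of loci 0..3, presented in scrambled order.
positions = [3.0, 0.0, 6.0, 1.0]
dist = [[abs(a - b) for b in positions] for a in positions]
best = best_order_brute_force(dist)
orders_for_20_loci = count_orders(20)
```

Exhaustive search already has to inspect more than 10^18 orders for 20 loci, which is why heuristic optimizers are needed for maps with hundreds of markers.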



Prediction of protein structural classes by Chou's pseudo amino acid composition: approached using continuous wavelet transform and principal component analysis.  


Prior knowledge of a protein's structural class can provide useful information about its overall structure, so quick and accurate computational determination of protein structural class is very important in protein science. One of the keys to a computational method is accurate representation of the protein sample. Here, based on the concept of Chou's pseudo-amino acid composition (AAC, Chou, Proteins: structure, function, and genetics, 43:246-255, 2001), a novel method of feature extraction that combined continuous wavelet transform (CWT) with principal component analysis (PCA) was introduced for the prediction of protein structural classes. Firstly, the digital signal was obtained by mapping each amino acid according to various physicochemical properties. Secondly, CWT was utilized to extract a new feature vector based on the wavelet power spectrum (WPS), which contains more abundant information of sequence order in the frequency domain and time domain, and PCA was then used to reorganize the feature vector to decrease information redundancy and computational complexity. Finally, a pseudo-amino acid composition feature vector was further formed to represent the primary sequence by coupling the AAC vector with a set of new WPS feature vectors in an orthogonal space by PCA. As a showcase, the rigorous jackknife cross-validation test was performed on the working datasets. The results indicated that prediction quality has been improved, and the current approach of protein representation may serve as a useful complementary vehicle in classifying other attributes of proteins, such as enzyme family class, subcellular localization, membrane protein types and protein secondary structure, etc. PMID:18726140

Li, Zhan-Chao; Zhou, Xi-Bin; Dai, Zong; Zou, Xiao-Yong



ISDTool: a computational model for predicting immunosuppressive domain of HERVs.  


Human endogenous retroviruses (HERVs) have been found to act as etiological cofactors in several chronic diseases, including cancer, autoimmunity and neurological dysfunction. The immunosuppressive domain (ISD) is a conserved region of the transmembrane protein (TM) in the envelope gene (env) of retroviruses. In vitro and in vivo evidence has shown that retroviral TM is highly immunosuppressive and that a synthetic peptide (CKS-17) that shows homology to ISD inhibits immune function. ISD is probably a potential pathogenic element in HERVs. However, fewer than one hundred ISDs of HERVs have been annotated by researchers so far, and universal software for domain prediction could not achieve sufficient accuracy for specific ISD. In this paper, a computational model is proposed to identify ISD in HERVs based on genome sequences only. It has a classification accuracy of 97.9% using the jackknife test. 117 HERV families were scanned with the model, and 1002 new putative ISDs have been predicted and annotated in the human chromosomes. This model is also applicable to search for ISDs in human T-lymphotropic virus (HTLV), simian T-lymphotropic virus (STLV) and murine leukemia virus (MLV) because of the evolutionary relationship between endogenous and exogenous retroviruses. Furthermore, software named ISDTool has been developed to facilitate the application of the model. Datasets and the software involved in the paper are all available at PMID:24583604

Lv, Hongqiang; Han, Jiuqiang; Liu, Jun; Zheng, Jiguang; Zhong, Dexing; Liu, Ruiling



Regional flood rainfall duration-frequency modeling at small ungaged sites  

NASA Astrophysics Data System (ADS)

Summary: Flood frequency data for different durations of floods are required in many practical hydrologic applications. The estimation of flood frequency as an integrated function of return period and flood duration can be accomplished by flood-duration-frequency modeling. This study introduces a new approach to regional flood-duration-frequency modeling that is based on statistical properties of combined flood-rainfall events. The approach integrates flood-duration-frequency (QDF) and rainfall depth-duration-frequency (DDF) models into one regional flood-rainfall duration-frequency model (QDDF). The proposed model has only one local parameter, which accounts for site-specific physiographic characteristics. Regional parameters of the model are determined from statistical properties of regional rainfall depth-duration-frequency curves. The main advantage of the proposed approach is that it relies on rainfall data, which are spatially and temporally more abundant than streamflow data, and usually also available in hydrologically ungaged areas. The regional QDDF model was applied to a set of small catchments from a hydro-climatologically homogeneous region in south-western Ontario, Canada. The performance of the model was compared to the performance of the conventional regional converging QDF model by means of a jack-knife procedure. The results showed that the proposed approach significantly outperformed the converging QDF model, leading to quantile estimates with three-times lower average BIAS and RMSE. The proposed QDDF model seems to be a promising alternative for the regional QDF modeling of floods in the study area.

Cunderlik, Juraj M.; Ouarda, Taha B. M. J.
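
A jack-knife comparison of regional models, as mentioned in the abstract above, is commonly organized as a leave-one-site-out loop: each gaged site is treated in turn as ungaged, its quantile is estimated from the remaining sites, and the errors are summarized as BIAS and RMSE. The sketch below assumes relative (dimensionless) error measures and uses a deliberately trivial regional estimator, the mean of the other sites; neither assumption comes from the paper, and the site names and values are hypothetical.

```python
from math import sqrt

def leave_one_site_out(site_values, regional_estimator):
    """Predict each site's value from all other sites, as if it were ungaged."""
    predictions = {}
    for held_out in site_values:
        others = {s: v for s, v in site_values.items() if s != held_out}
        predictions[held_out] = regional_estimator(others)
    return predictions

def relative_bias_and_rmse(observed, predicted):
    """Mean relative error (BIAS) and root-mean-square relative error (RMSE)."""
    errors = [(predicted[s] - observed[s]) / observed[s] for s in observed]
    bias = sum(errors) / len(errors)
    rmse = sqrt(sum(e * e for e in errors) / len(errors))
    return bias, rmse

def regional_mean(others):
    """Placeholder regional model: average of the remaining sites."""
    return sum(others.values()) / len(others)

# Hypothetical at-site flood quantiles for three catchments.
quantiles = {"A": 10.0, "B": 12.0, "C": 14.0}
preds = leave_one_site_out(quantiles, regional_mean)
bias, rmse = relative_bias_and_rmse(quantiles, preds)
```

Running the same loop with two competing estimators and comparing the resulting BIAS and RMSE values gives the kind of model comparison the abstract reports.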



Use of remote sensing for analysis and estimation of vector-borne disease  

NASA Astrophysics Data System (ADS)

Epidemiological data on malaria cases were correlated with satellite-based vegetation health (VH) indices to investigate whether they can be used as a proxy for monitoring the number of malaria cases. Mosquitoes, which spread malaria in Bangladesh, are very sensitive to environmental conditions, especially to changes in weather. Therefore, VH indices, which characterize weather conditions, were tested as indicators of mosquitoes' activities in the spread of malaria. Satellite data were represented by the following VH indices: Vegetation Condition Index (VCI), Temperature Condition Index (TCI), and Vegetation Health Index (VHI). They were derived from radiances measured by the Advanced Very High Resolution Radiometer (AVHRR) flown on NOAA afternoon polar orbiting satellites. Assessment of sensitivity of the VH indices was performed using correlation and regression analysis. Estimation models were validated using a jackknife cross-validation procedure. Results show that the VH indices can be used for detection and numerical estimation of the number of malaria cases. During the cooler months (January-April), when mosquitoes are less active, the correlation is low; it increases considerably during the warm and wet season (April-November), for TCI in early October and for VCI in mid September. All analyses and estimation models developed here are based on data obtained for Bangladesh.

Rahman, Atiqur


Recent Developments in the Dorfman-Berbaum-Metz Procedure for Multireader ROC Study Analysis  

PubMed Central

Rationale and Objectives: The Dorfman-Berbaum-Metz (DBM) method has been one of the most popular methods for analyzing multireader receiver operating characteristic (ROC) studies since it was proposed in 1992. Despite its popularity, the original procedure has several drawbacks: it is limited to jackknife accuracy estimates, it is substantially conservative, and it is not based on a satisfactory conceptual or theoretical model. Recently, solutions to these problems have been presented in three papers. Our purpose is to summarize and provide an overview of these recent developments. Materials and Methods: We present and discuss the recently proposed solutions for the various drawbacks of the original DBM method. Results: We compare the solutions in a simulation study and find that they result in improved performance for the DBM procedure. We also compare the solutions using two real data studies and find that the modified DBM procedure that incorporates these solutions yields more significant results and clearer interpretations of the variance component parameters than the original DBM procedure. Conclusions: We recommend using the modified DBM procedure that incorporates the recent developments.

Hillis, Stephen L.; Berbaum, Kevin S.; Metz, Charles E.



Discriminating protein structure classes by incorporating Pseudo Average Chemical Shift to Chou's general PseAAC and Support Vector Machine.  


Proteins control all biological functions in living species. Protein structures comprise four major classes: all-α, all-β, α+β, and α/β. Each class performs a different function according to its nature. Owing to the rapid growth of protein sequences in the databanks, identifying protein structure classes by conventional methods is costly and time-consuming. Given the importance of protein structure classes, it is thus highly desirable to develop a computational model for discriminating protein structure classes with high accuracy. For this purpose, we propose an in silico method that incorporates Pseudo Average Chemical Shift and Support Vector Machine. Two feature extraction schemes, namely Pseudo Amino Acid Composition and Pseudo Average Chemical Shift, are used to extract valuable information from protein sequences. The performance of the proposed model is assessed on four benchmark datasets (25PDB, 1189, 640 and 399) using the jackknife test. The success rates of the proposed model are 84.2%, 85.0%, 86.4%, and 89.2%, respectively, on the four datasets. The empirical results show that the proposed model compares favorably with existing models in the literature and might be useful for future research. PMID:24997484

Hayat, Maqsood; Iqbal, Nadeem



On the value of nuclear and mitochondrial gene sequences for reconstructing the phylogeny of vanilloid orchids (Vanilloideae, Orchidaceae)  

PubMed Central

Background and Aims Most molecular phylogenetic studies of Orchidaceae have relied heavily on DNA sequences from the plastid genome. Nuclear and mitochondrial loci have only been superficially examined for their systematic value. Since 40% of the genera within Vanilloideae are achlorophyllous mycoheterotrophs, this is an ideal group of orchids in which to evaluate non-plastid gene sequences. Methods Phylogenetic reconstructions for Vanilloideae were produced using independent and combined data from the nuclear 18S, 5·8S and 26S rDNA genes and the mitochondrial atpA gene and nad1b-c intron. Key Results These new data indicate placements for genera such as Lecanorchis and Galeola, for which plastid gene sequences have been mostly unavailable. Nuclear and mitochondrial parsimony jackknife trees are congruent with each other and previously published trees based solely on plastid data. Because of high rates of sequence divergence among vanilloid orchids, even the short 5·8S rDNA gene provides impressive levels of resolution and support. Conclusions Orchid systematists are encouraged to sequence nuclear and mitochondrial gene regions along with the growing number of plastid loci available.

Cameron, Kenneth M.



COMDYN: Software to study the dynamics of animal communities using a capture-recapture approach  

USGS Publications Warehouse

COMDYN is a set of programs developed for estimation of parameters associated with community dynamics using count data from two locations or time periods. It is Internet-based, allowing remote users either to input their own data, or to use data from the North American Breeding Bird Survey for analysis. COMDYN allows probability of detection to vary among species and among locations and time periods. The basic estimator for species richness underlying all estimators is the jackknife estimator proposed by Burnham and Overton. Estimators are presented for quantities associated with temporal change in species richness, including rate of change in species richness over time, local extinction probability, local species turnover and number of local colonizing species. Estimators are also presented for quantities associated with spatial variation in species richness, including relative richness at two locations and proportion of species present in one location that are also present at a second location. Application of the estimators to species richness estimation has been previously described and justified. The potential applications of these programs are discussed.

Hines, J.E.; Boulinier, T.; Nichols, J.D.; Sauer, J.R.; Pollock, K.H.
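
As a rough illustration of the jackknife richness estimator mentioned above, the sketch below implements only the first-order form, in which observed richness is increased by the number of species seen on exactly one sampling occasion, scaled by (k - 1)/k for k occasions. The data layout (one set of species per occasion) and the function name are assumptions; COMDYN's own implementation may select among higher-order estimators and is not reproduced here.

```python
def first_order_jackknife_richness(occasions):
    """Estimate species richness from presence data.

    occasions: list of sets, each holding the species detected on one
    sampling occasion (or quadrat).
    """
    k = len(occasions)
    detections = {}
    for species_seen in occasions:
        for species in species_seen:
            detections[species] = detections.get(species, 0) + 1
    observed = len(detections)
    # Species detected on exactly one occasion drive the correction.
    singletons = sum(1 for count in detections.values() if count == 1)
    return observed + singletons * (k - 1) / k


# Hypothetical survey: four species observed, two of them only once.
survey = [{"a", "b"}, {"a", "c"}, {"a", "b", "d"}]
estimate = first_order_jackknife_richness(survey)
```

The correction is largest when many species are seen only once, which is the situation in which undetected species are most likely.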



iMethyl-PseAAC: Identification of Protein Methylation Sites via a Pseudo Amino Acid Composition Approach.  


Before becoming native proteins during biosynthesis, polypeptide chains created by the ribosome's translation of mRNA undergo a series of "product-forming" steps, such as cutting, folding, and posttranslational modification (PTM). Knowledge of PTMs in proteins is crucial for dynamic proteome analysis of various human diseases and epigenetic inheritance. One of the most important PTMs is the Arg- or Lys-methylation that occurs on arginine or lysine, respectively. Given a protein, which of its Arg (or Lys) sites can be methylated, and which cannot? This is the first important problem for understanding the methylation mechanism in depth and for drug development. With the avalanche of protein sequences generated in the postgenomic age, its urgency has become self-evident. To address this problem, we proposed a new predictor, called iMethyl-PseAAC. In the prediction system, a peptide sample was formulated as a 346-dimensional vector, formed by incorporating its physicochemical, sequence evolution, biochemical, and structural disorder information into the general form of pseudo amino acid composition. It was observed by the rigorous jackknife test and independent dataset test that iMethyl-PseAAC was superior to any of the existing predictors in this area. PMID:24977164

Qiu, Wang-Ren; Xiao, Xuan; Lin, Wei-Zhong; Chou, Kuo-Chen



Differentiating prenatal exposure to methamphetamine and alcohol versus alcohol and not methamphetamine using tensor-based brain morphometry and discriminant analysis.  


Here we investigate the effects of prenatal exposure to methamphetamine (MA) on local brain volume using magnetic resonance imaging. Because many who use MA during pregnancy also use alcohol, a known teratogen, we examined whether local brain volumes differed among 61 children (ages 5-15 years), 21 with prenatal MA exposure, 18 with concomitant prenatal alcohol exposure (the MAA group), 13 with heavy prenatal alcohol but not MA exposure (ALC group), and 27 unexposed controls. Volume reductions were observed in both exposure groups relative to controls in striatal and thalamic regions bilaterally and in right prefrontal and left occipitoparietal cortices. Striatal volume reductions were more severe in the MAA group than in the ALC group, and, within the MAA group, a negative correlation between full-scale intelligence quotient (FSIQ) scores and caudate volume was observed. Limbic structures, including the anterior and posterior cingulate, the inferior frontal gyrus (IFG), and ventral and lateral temporal lobes bilaterally, were increased in volume in both exposure groups. Furthermore, cingulate and right IFG volume increases were more pronounced in the MAA than ALC group. Discriminant function analyses using local volume measurements and FSIQ were used to predict group membership, yielding factor scores that correctly classified 72% of participants in jackknife analyses. These findings suggest that striatal and limbic structures, known to be sites of neurotoxicity in adult MA abusers, may be more vulnerable to prenatal MA exposure than alcohol exposure and that more severe striatal damage is associated with more severe cognitive deficit. PMID:20237258

Sowell, Elizabeth R; Leow, Alex D; Bookheimer, Susan Y; Smith, Lynne M; O'Connor, Mary J; Kan, Eric; Rosso, Carly; Houston, Suzanne; Dinov, Ivo D; Thompson, Paul M



First Results from the Q/U Imaging ExperimenT (QUIET)  

NASA Astrophysics Data System (ADS)

The Q/U Imaging ExperimenT (QUIET) is a large-angular-scale telescope designed to measure the polarization of the cosmic microwave background from the Atacama Desert, Chile and to place direct, competitive limits on the tensor-to-scalar ratio (which parameterizes primordial inflationary B modes) using solely polarization information. We have used QUIET to observe 1000 sq. deg. of low-foreground sky at 43 (Q band) and 95 GHz (W band) between October 2008 and December 2010, collecting some 10000 hours of data in that time. The integrity of the Q-band data analysis has been verified with an extensive suite of jackknife tests for nullity, and by comparing results from two independent (and blind) analysis pipelines. I shall give an overview of QUIET and present the first power-spectrum results from the Q-band data set, including the E-mode power spectrum, a limit on the tensor-to-scalar ratio, and the detection of polarized Galactic synchrotron emission away from the Galactic plane.

Zwart, Jonathan
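
The idea behind a jackknife test for nullity can be sketched as follows: the data are split into two halves that should contain the same sky signal, the two estimates are differenced, and the difference is checked for consistency with zero. The chi-square form, the function name and the toy numbers below are assumptions made for illustration and are not taken from the QUIET pipelines.

```python
def null_test_chi2(half_a, half_b, sigma_a, sigma_b):
    """Chi-square of the difference between two data splits.

    half_a, half_b: band-power estimates from the two halves.
    sigma_a, sigma_b: their 1-sigma uncertainties, assumed independent.
    Returns (chi2, number_of_bins).
    """
    chi2 = 0.0
    for a, b, sa, sb in zip(half_a, half_b, sigma_a, sigma_b):
        chi2 += (a - b) ** 2 / (sa ** 2 + sb ** 2)
    return chi2, len(half_a)


# Hypothetical two-bin example; a chi2 close to the number of bins
# indicates that the split shows no significant systematic difference.
chi2, n_bins = null_test_chi2([1.0, 2.0], [1.0, 4.0], [1.0, 1.0], [1.0, 1.0])
```

A suite of such tests repeats this for many different ways of splitting the data, for example by time, by detector, or by observing conditions.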



Technical note: comparing von Luschan skin color tiles and modern spectrophotometry for measuring human skin pigmentation.  


Prior to the introduction of reflectance spectrophotometry into anthropological field research during the 1950s, human skin color was most commonly classified by visual skin color matching using the von Luschan tiles, a set of 36 standardized, opaque glass tiles arranged in a chromatic scale. Our goal was to establish a conversion formula between the tile-based color matching method and modern reflectance spectrophotometry to make historical and contemporary data comparable. Skin pigmentation measurements were taken on the forehead, inner upper arms, and backs of the hands using both the tiles and a spectrophotometer on 246 participants showing a broad range of skin pigmentation. From these data, a second-order polynomial conversion formula was derived by jackknife analysis to estimate melanin index (M-index) based on tile values. This conversion formula provides a means for comparing modern data to von Luschan tile measurements recorded in historical reports. This is particularly important for populations now extinct, extirpated, or admixed for which tile-based measures of skin pigmentation are the only data available. PMID:23633083

Swiatoniowski, Anna K; Quillen, Ellen E; Shriver, Mark D; Jablonski, Nina G



Stock structure of Lake Baikal omul as determined by whole-body morphology  

USGS Publications Warehouse

In Lake Baikal, three morphotypes of omul Coregonus autumnalis migratorius are recognized; the littoral, pelagic, and deep-water forms. Morphotype assignment is difficult, and similar to that encountered in pelagic and deep-water coregonines in the Laurentian Great Lakes. Principal component analysis revealed separation of all three morphotypes based on caudal peduncle length and depth, length and depth of the body between the dorsal and anal fin, and distance between the pectoral and pelvic fins. Strong negative loadings were associated with head measurements. Omul of the same morphotype captured at different locations were classified to location of capture using step-wise discriminant function analysis. Jackknife correct classifications ranged from 43 to 78% for littoral omul from five locations, and 45-86% for pelagic omul from four locations. Patterns of local misclassification of littoral omul suggested that the sub-population structure, hence stock affinity, may be influenced by movements and intermixing of individuals among areas that are joined bathymetrically. Pelagic omul were more distinguishable by site and may support a previous hypothesis of a spawning based rather than a foraging-based sub-population structure. Omul morphotypes may reflect adaptations to both ecological and local environmental conditions, and may have a genetic basis.

Bronte, Charles R.; Fleischer, G. W.; Maistrenko, S. G.; Pronin, N. M.



Multivariate seismic calibration for the Novaya Zemlya test site. Report No. 2, 27 June 1991-22 June 1992  

SciTech Connect

Within the last year, Soviet yield data have been acquired by DARPA for over 40 underground nuclear explosions at the Novaya Zemlya Test Site between 1964 and 1990. These yields are compared to previous estimates by other authors, based on observed seismic magnitudes and magnitude-log yield relations transported from other test sites. Several discrepancies in the yield data are noted. Seismic magnitude data, based on NORSAR Lg and P coda, Grafenberg Lg, and a world-wide m sub b, have been published by Ringdal and Fyen (1991) for 18 of these events. A similar set of Soviet network magnitudes have been published by Israelsson (1992). Using these data, estimates of the multivariate calibration parameters of the magnitude-log yield relations are computed. An outlier test is applied to the residuals to the lines of best fit. One of the two smallest events is identified as an outlier for every multivariate magnitude combination. A classical confidence interval is presented to estimate future yields, based on estimates of the unknown multivariate calibration parameters. A test of TTBT compliance and a definition of the F-number, based on the confidence interval, are also provided. F-number estimates are obtained for various magnitude combinations by jackknifing. The reliability of the results is discussed, in light of the fact that the data are tightly clustered for 16 of the 18 events.

Fisk, M.D.; Gray, H.L.; Alewine, R.W.; McCartor, G.D.



Use of Repetitive DNA Sequences and the PCR To Differentiate Escherichia coli Isolates from Human and Animal Sources  

PubMed Central

The rep-PCR DNA fingerprint technique, which uses repetitive intergenic DNA sequences, was investigated as a way to differentiate between human and animal sources of fecal pollution. BOX and REP primers were used to generate DNA fingerprints from Escherichia coli strains isolated from human and animal sources (geese, ducks, cows, pigs, chickens, and sheep). Our initial studies revealed that the DNA fingerprints obtained with the BOX primer were more effective for grouping E. coli strains than the DNA fingerprints obtained with REP primers. The BOX primer DNA fingerprints of 154 E. coli isolates were analyzed by using the Jaccard band-matching algorithm. Jackknife analysis of the resulting similarity coefficients revealed that 100% of the chicken and cow isolates and between 78 and 90% of the human, goose, duck, pig, and sheep isolates were assigned to the correct source groups. A dendrogram constructed by using Jaccard similarity coefficients almost completely separated the human isolates from the nonhuman isolates. Multivariate analysis of variance, a form of discriminant analysis, successfully differentiated the isolates and placed them in the appropriate source groups. Taken together, our results indicate that rep-PCR performed with the BOX A1R primer may be a useful and effective tool for rapidly determining sources of fecal pollution.

Dombek, Priscilla E.; Johnson, LeeAnn K.; Zimmerley, Sara T.; Sadowsky, Michael J.
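
A minimal sketch of the kind of analysis described above: band patterns are compared with the Jaccard coefficient, and each isolate is then held out in turn and assigned to the source group with which it has the highest average similarity. Representing a fingerprint as a set of band positions and using the average-similarity rule are assumptions for this example; the software used in the study may apply a different assignment rule.

```python
def jaccard(bands_a, bands_b):
    """Jaccard similarity: shared bands divided by all bands present."""
    a, b = set(bands_a), set(bands_b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def jackknife_assign(fingerprints, groups):
    """Assign each isolate to a group while excluding it from the groups."""
    assigned = []
    for i, fingerprint in enumerate(fingerprints):
        totals, counts = {}, {}
        for j, other in enumerate(fingerprints):
            if j == i:
                continue  # the held-out isolate must not vote for itself
            group = groups[j]
            totals[group] = totals.get(group, 0.0) + jaccard(fingerprint, other)
            counts[group] = counts.get(group, 0) + 1
        assigned.append(max(totals, key=lambda g: totals[g] / counts[g]))
    return assigned


# Hypothetical band positions for four isolates from two sources.
fingerprints = [{1, 2, 3}, {1, 2, 4}, {7, 8, 9}, {7, 8, 10}]
groups = ["human", "human", "cow", "cow"]
assigned = jackknife_assign(fingerprints, groups)
```

The percentage of isolates whose assigned group matches the true source is the figure reported as the rate of correct classification.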



Predicting Drug-Target Interaction Networks Based on Functional Groups and Biological Features  

PubMed Central

Background Study of drug-target interaction networks is an important topic for drug development. It is both time-consuming and costly to determine compound-protein interactions or potential drug-target interactions by experiments alone. As a complement, the in silico prediction methods can provide us with very useful information in a timely manner. Methods/Principal Findings To realize this, drug compounds are encoded with functional groups and proteins are encoded by biological features including biochemical and physicochemical properties. The optimal feature selection procedures are adopted by means of the mRMR (Maximum Relevance Minimum Redundancy) method. Instead of classifying the proteins as a whole family, target proteins are divided into four groups: enzymes, ion channels, G-protein-coupled receptors and nuclear receptors. Thus, four independent predictors are established using the Nearest Neighbor algorithm as their operation engine, each predicting the interactions between drugs and one of the four protein groups. As a result, the overall success rates by the jackknife cross-validation tests achieved with the four predictors are 85.48%, 80.78%, 78.49%, and 85.66%, respectively. Conclusion/Significance Our results indicate that the network prediction system thus established is quite promising and encouraging.

Shi, Xiao-He; Hu, Le-Le; Kong, Xiangyin; Cai, Yu-Dong; Chou, Kuo-Chen
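
The combination of a nearest neighbor predictor with a jackknife test, as described above, can be illustrated with the sketch below. Each sample is removed in turn, classified by its nearest remaining neighbor, and the fraction of correct calls is reported as the success rate. The Euclidean distance, the function name and the toy feature vectors are assumptions; the study's own features and distance measure are not reproduced.

```python
import math


def nn_jackknife_success_rate(vectors, labels):
    """Leave-one-out success rate of a 1-nearest-neighbor classifier."""
    correct = 0
    for i, vector in enumerate(vectors):
        # Nearest neighbor among all samples except the held-out one.
        nearest = min(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: math.dist(vector, vectors[j]),
        )
        if labels[nearest] == labels[i]:
            correct += 1
    return correct / len(vectors)


# Hypothetical two-dimensional feature vectors; the last sample lies
# closer to the other class and is therefore misclassified.
vectors = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0), (0.0, 2.5)]
labels = ["A", "A", "B", "B", "B"]
rate = nn_jackknife_success_rate(vectors, labels)
```

Because every sample is tested exactly once and never takes part in its own prediction, the result does not depend on a random split of the data.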



Evaluation of mark-recapture for estimating striped skunk abundance  

USGS Publications Warehouse

The mark-recapture method for estimating striped skunk (Mephitis mephitis) abundance was evaluated by systematically livetrapping a radio-equipped population on a 31.4-km² study area in North Dakota during late April of 1977 and 1978. The study population was 10 females and 13 males in 1977 and 20 females and 8 males in 1978. Skunks were almost exclusively nocturnal. Males traveled greater nightly distances than females (3.3 vs. 2.6 km, P < 0.05) and had larger home ranges (308 vs. 242 ha) although not significantly so. Increased windchill reduced night-time activity. The population was demographically but not geographically closed. Frequency of capture was positively correlated with time skunks spent on the study area. Little variation in capture probabilities was found among trap-nights. Skunks exhibited neither trap-proneness nor shyness. Capture rates in 1977 were higher for males than for females; the reverse occurred in 1978. Variation in individual capture rates was indicated among males in 1977 and among females in 1978. Ten estimators produced generally similar results, but all underestimated true population size. Underestimation was a function of the number of untrapped skunks, primarily those that spent limited time on the study area. The jackknife method produced the best estimates of skunk abundance.

Greenwood, R. J.; Sargeant, A. B.; Johnson, D. H.



LabCaS: Labeling calpain substrate cleavage sites from amino acid sequence using conditional random fields  

PubMed Central

The calpain family of Ca2+-dependent cysteine proteases plays a vital role in many important biological processes and is closely related to a variety of pathological states. Activated calpains selectively cleave relevant substrates at specific cleavage sites, yielding multiple fragments that can have different functions from the intact substrate protein. Until now, our knowledge of calpain functions and their substrate cleavage mechanisms has been limited because the experimental determination and validation of calpain binding are usually laborious and expensive. In this work, we aim to develop a new computational approach (LabCaS) for accurate prediction of the calpain substrate cleavage sites from amino acid sequences. To overcome the imbalance of negative and positive samples in machine-learning training, from which most former approaches have suffered when splitting sequences into short peptides, we designed a conditional random field algorithm that can label the potential cleavage sites directly from the entire sequences. By integrating multiple amino acid features and those derived from sequences, LabCaS achieves accurate recognition of the cleavage sites for most calpain proteins. In a jackknife test on a set of 129 benchmark proteins, LabCaS generates an AUC score of 0.862. The LabCaS program is freely available at:

Fan, Yong-Xian; Zhang, Yang; Shen, Hong-Bin



EFICAz: a comprehensive approach for accurate genome-scale enzyme function inference  

PubMed Central

EFICAz (Enzyme Function Inference by Combined Approach) is an automatic engine for large-scale enzyme function inference that combines predictions from four different methods developed and optimized to achieve high prediction accuracy: (i) recognition of functionally discriminating residues (FDRs) in enzyme families obtained by a Conservation-controlled HMM Iterative procedure for Enzyme Family classification (CHIEFc), (ii) pairwise sequence comparison using a family specific Sequence Identity Threshold, (iii) recognition of FDRs in Multiple Pfam enzyme families, and (iv) recognition of multiple Prosite patterns of high specificity. For FDR (i.e. conserved positions in an enzyme family that discriminate between true and false members of the family) identification, we have developed an Evolutionary Footprinting method that uses evolutionary information from homofunctional and heterofunctional multiple sequence alignments associated with an enzyme family. The FDRs show a significant correlation with annotated active site residues. In a jackknife test, EFICAz shows high accuracy (92%) and sensitivity (82%) for predicting four EC digits in testing sequences that are <40% identical to any member of the corresponding training set. Applied to Escherichia coli genome, EFICAz assigns more detailed enzymatic function than KEGG, and generates numerous novel predictions.

Tian, Weidong; Arakaki, Adrian K.; Skolnick, Jeffrey



Identifying the subfamilies of voltage-gated potassium channels using feature selection technique.  


Voltage-gated K+ channels (VKCs) play important roles in biological processes, especially in the nervous system. Different subfamilies of VKCs have different biological functions. Thus, identifying the subfamily of a VKC has become a meaningful task because it can guide disease diagnosis and drug design. However, traditional wet-lab experimental methods are costly and time-consuming. It is highly desirable to develop an effective and powerful computational tool for identifying different subfamilies of VKCs. In this study, a predictor, called iVKC-OTC, has been developed by incorporating the optimized tripeptide composition (OTC) generated by a feature selection technique into the general form of pseudo-amino acid composition to identify six subfamilies of VKCs. One of the remarkable advantages of introducing the optimized tripeptide composition is that it avoids the notorious dimension disaster and overfitting problems in statistical prediction. It was observed on a benchmark dataset, by using a jackknife test, that the overall accuracy achieved by iVKC-OTC reaches 96.77% in identifying the six subfamilies of VKCs, indicating that the new predictor is promising or at least may become a complementary tool to the existing methods in this area. It has not escaped our notice that the optimized tripeptide composition can also be used to investigate other protein classification problems. PMID:25054318

Liu, Wei-Xin; Deng, En-Ze; Chen, Wei; Lin, Hao
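
The tripeptide composition on which the predictor above is built can be illustrated with the sketch below, which counts overlapping three-residue windows in a sequence and normalizes them to frequencies. The feature selection step that yields the "optimized" composition is not shown, and the function name and toy sequence are assumptions made for the example.

```python
from collections import Counter


def tripeptide_composition(sequence):
    """Frequency of each overlapping tripeptide in a protein sequence."""
    n_windows = len(sequence) - 2
    if n_windows <= 0:
        return {}
    counts = Counter(sequence[i:i + 3] for i in range(n_windows))
    return {tripeptide: c / n_windows for tripeptide, c in counts.items()}


# Hypothetical six-residue sequence with four overlapping windows.
composition = tripeptide_composition("ACDACD")
```

With 20 amino acids the full feature space has 8000 possible tripeptides, which is why selecting a subset of them helps to limit overfitting on small benchmark datasets.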



Using Chou's pseudo amino acid composition to predict protein quaternary structure: a sequence-segmented PseAAC approach.  


In the protein universe, many proteins are composed of two or more polypeptide chains, generally referred to as subunits, which associate through noncovalent interactions and, occasionally, disulfide bonds to form protein quaternary structures. It has long been known that the functions of proteins are closely related to their quaternary structures; some examples include enzymes, hemoglobin, DNA polymerase, and ion channels. However, it is extremely labor-expensive and even impossible to quickly determine the structures of hundreds of thousands of protein sequences solely from experiments. Since the number of protein sequences entering databanks is increasing rapidly, it is highly desirable to develop computational methods for classifying the quaternary structures of proteins from their primary sequences. Since the concept of Chou's pseudo amino acid composition (PseAAC) was introduced, a variety of approaches, such as residue conservation scores, von Neumann entropy, multiscale energy, autocorrelation function, moment descriptors, and cellular automata, have been utilized to formulate the PseAAC for predicting different attributes of proteins. Here, in a different approach, a sequence-segmented PseAAC is introduced to represent protein samples. Meanwhile, multiclass SVM classifier modules were adopted to classify protein quaternary structures. As a demonstration, the dataset constructed by Chou and Cai [(2003) Proteins 53:282-289] was adopted as a benchmark dataset. The overall jackknife success rates thus obtained were 88.2-89.1%, indicating that the new approach is quite promising for predicting protein quaternary structure. PMID:18427713

Zhang, Shao-Wu; Chen, Wei; Yang, Feng; Pan, Quan



Heritable changes in regional cortical thickness with age.  


It is now well established that regional indices of brain structure such as cortical thickness, surface area or grey matter volume exhibit spatially variable patterns of heritability. However, a recent study found these patterns to change with age during development, a result supported by gene expression studies. Changes in heritability have not been investigated in adulthood so far and could have important implications in the study of heritability and genetic correlations in the brain as well as in the discovery of specific genes explaining them. Herein, we tested for genotype by age (G × A) interactions, an extension of genotype by environment interactions, through adulthood and healthy aging in 902 subjects from the Genetics of Brain Structure (GOBS) study. A "jackknife" based method for the analysis of stable cortical thickness clusters (JASC) and scale selection is also introduced. Although additive genetic variance remained constant throughout adulthood, we found evidence for incomplete pleiotropy across age in the cortical thickness of paralimbic and parieto-temporal areas. This suggests that different genetic factors account for cortical thickness heritability at different ages in these regions. PMID:24752552

Chouinard-Decorte, Francois; McKay, D Reese; Reid, Andrew; Khundrakpam, Budhachandra; Zhao, Lu; Karama, Sherif; Rioux, Pierre; Sprooten, Emma; Knowles, Emma; Kent, Jack W; Curran, Joanne E; Göring, Harald H H; Dyer, Thomas D; Olvera, Rene L; Kochunov, Peter; Duggirala, Ravi; Fox, Peter T; Almasy, Laura; Blangero, John; Bellec, Pierre; Evans, Alan C; Glahn, David C



Swfoldrate: predicting protein folding rates from amino acid sequence with sliding window method.  


Protein folding is the process by which a protein proceeds from its denatured state to its specific biologically active conformation. Understanding the relationship between sequences and the folding rates of proteins remains an important challenge. Most previous methods of predicting protein folding rate require the tertiary structure of a protein as an input. In this study, the long-range and short-range contacts in proteins were used to derive an extended version of the pseudo amino acid composition based on a sliding window method. This method is capable of predicting protein folding rates from the amino acid sequence alone, without the aid of any structural class information. We systematically studied the contributions of individual features to folding rate prediction. The optimal feature selection procedures are adopted by combining forward feature selection and sequential backward selection. Using the jackknife cross-validation test, the method was demonstrated on a large dataset. The predictor was built on multiple physicochemical and statistical features of proteins using a nonlinear support vector machine (SVM) regression model, and it obtained excellent agreement between predicted and experimentally observed folding rates of proteins. The correlation coefficient is 0.9313 and the standard error is 2.2692. The prediction server is freely available at PMID:22933332

Cheng, Xiang; Xiao, Xuan; Wu, Zhi-cheng; Wang, Pu; Lin, Wei-zhong



Predicting membrane protein types with bagging learner.  


The membrane protein type is an important feature in characterizing the overall topological folding type of a protein or its domains therein. Many investigators have devoted their efforts to the prediction of membrane protein type. Here, we propose a new approach, the bootstrap aggregating method or bagging learner, to address this problem based on the protein amino acid composition. As a demonstration, the benchmark dataset constructed by K.C. Chou and D.W. Elrod was used to test the new method. The overall success rate thus obtained by jackknife cross-validation was over 84%, indicating that the bagging learner as presented in this paper holds quite high potential in predicting the attributes of proteins, or at least can play a complementary role to many existing algorithms in this area. It is anticipated that the prediction quality can be further enhanced if the pseudo amino acid composition can be effectively incorporated into the current predictor. An online membrane protein type prediction web server developed in our lab is available at PMID:18680454

Niu, Bing; Jin, Yu-Huan; Feng, Kai-Yan; Liu, Liang; Lu, Wen-Cong; Cai, Yu-Dong; Li, Guo-Zheng
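
Bootstrap aggregating, the method named in the abstract above, can be sketched as follows: several base learners are each trained on a bootstrap resample of the training set, and their predictions are combined by majority vote. The choice of a 1-nearest-neighbor base learner, the number of rounds, the fixed seed and all names are assumptions made for this illustration; the paper's own base learner is not specified here.

```python
import math
import random
from collections import Counter


def nearest_label(training, query):
    """Label of the training sample closest to the query vector."""
    return min(training, key=lambda item: math.dist(item[0], query))[1]


def bootstrap_aggregate_predict(samples, query, n_rounds=25, seed=0):
    """Majority vote over base learners trained on bootstrap resamples.

    samples: list of (feature_vector, label) pairs.
    """
    rng = random.Random(seed)
    votes = Counter()
    for _ in range(n_rounds):
        # Draw a resample of the same size, with replacement.
        resample = [rng.choice(samples) for _ in samples]
        votes[nearest_label(resample, query)] += 1
    return votes.most_common(1)[0][0]


# Hypothetical composition-like vectors for two membrane protein types.
samples = [
    ((0.0, 0.0), "type I"), ((0.1, 0.2), "type I"), ((0.2, 0.1), "type I"),
    ((5.0, 5.0), "type II"), ((5.1, 5.2), "type II"), ((5.2, 5.1), "type II"),
]
predicted = bootstrap_aggregate_predict(samples, (0.1, 0.1))
```

Averaging over resamples mainly reduces the variance of unstable base learners, which is the usual motivation for this technique.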



Conotoxin superfamily prediction using diffusion maps dimensionality reduction and subspace classifier.  


Conotoxins are disulfide-rich small peptides that are invaluable channel-targeted peptides and target neuronal receptors, which have been demonstrated to be potent pharmaceuticals in the treatment of Alzheimer's disease, Parkinson's disease, and epilepsy. Accurate prediction of conotoxin superfamily would have many important applications towards the understanding of its biological and pharmacological functions. In this study, a novel method, named dHKNN, is developed to predict conotoxin superfamily. Firstly, we extract the protein's sequential features composed of physicochemical properties, evolutionary information, predicted secondary structures and amino acid composition. Secondly, we use the diffusion maps for dimensionality reduction, which interpret the eigenfunctions of Markov matrices as a system of coordinates on the original data set in order to obtain efficient representation of data geometric descriptions. Finally, an improved K-local hyperplane distance nearest neighbor subspace classifier method called dHKNN is proposed for predicting conotoxin superfamilies by considering the local density information in the diffusion space. The overall accuracy of 91.90% is obtained through the jackknife cross-validation test on a benchmark dataset, indicating the proposed dHKNN is promising. PMID:21787305

Yin, Jiang-Bo; Fan, Yong-Xian; Shen, Hong-Bin



Using pseudo amino acid composition to predict protease families by incorporating a series of protein biological features.  


Proteases are essential to most biological processes though they themselves remain intact during the processes. In this research, a computational approach was developed for predicting the families of proteases based on their sequences. According to the concept of pseudo amino acid composition, in order to catch the essential patterns for the sequences of proteases, the sample of a protein was formulated by a series of its biological features. There were a total of 132 biological features, which were sourced from various biochemical and physicochemical properties of the constituent amino acids. The importance of these features to the prediction is rated by Maximum Relevance Minimum Redundancy algorithm and then the Incremental Feature Selection was applied to select an optimal feature set, which was used to construct a predictor through the nearest neighbor algorithm. As a demonstration, the overall success rate by the jackknife test in identifying proteases among their seven families was 92.74%. It was revealed by further analysis on the optimal feature set that the secondary structure and amino acid composition play the key roles for the classification, which is quite consistent with some previous findings. The promising results imply that the predictor as presented in this paper may become a useful tool for studying proteases. PMID:21271978

Hu, Lele; Zheng, Lulu; Wang, Zhiwen; Li, Bing; Liu, Lei
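
The Incremental Feature Selection step described above can be sketched as a loop over a ranked feature list: features are added one at a time in ranked order, each growing subset is scored, and the best-scoring subset is kept. The scoring callable stands in for a jackknife success rate and is a placeholder; the feature names and scores below are invented for illustration only.

```python
def incremental_feature_selection(ranked_features, evaluate):
    """Return the best-scoring prefix of a ranked feature list.

    ranked_features: features ordered from most to least important.
    evaluate: callable that scores a feature subset (higher is better).
    """
    best_subset, best_score = [], float("-inf")
    for k in range(1, len(ranked_features) + 1):
        subset = ranked_features[:k]
        score = evaluate(subset)
        if score > best_score:
            best_subset, best_score = subset, score
    return best_subset, best_score


# Hypothetical scores indexed by subset size; in practice the score
# would come from a jackknife test of a predictor built on the subset.
toy_scores = {1: 0.70, 2: 0.90, 3: 0.85}
subset, score = incremental_feature_selection(
    ["f1", "f2", "f3"], lambda chosen: toy_scores[len(chosen)]
)
```

Because only prefixes of the ranking are tested, the number of evaluations grows linearly with the number of features rather than exponentially.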



A novel computational approach to predict transcription factor DNA binding preference.  


Transcription is one of the most important processes in the cell, in which transcription factors translate DNA sequences into RNA sequences. Accurate prediction of the DNA binding preference of transcription factors is valuable for understanding the transcription regulatory mechanism (1) and elucidating the regulation network (2-4). Here we predict the DNA binding preference of transcription factors based on the protein amino acid composition and physicochemical properties, a 0/1 encoding system of nucleotides, the minimum Redundancy Maximum Relevance feature selection method (5), and the Nearest Neighbor Algorithm. The overall prediction accuracy of the jackknife cross-validation test is 91.1%, indicating that this approach is a useful tool to explore the relation between a transcription factor and its binding sites. Moreover, we find that the secondary structure and polarizability of the transcription factor contribute most to the prediction. In particular, a 7-nt motif with an AT-rich region of the DNA binding sites discovered via our method is consistent with the statistical analysis from the TRANSFAC database (6). PMID:19099508

Cai, Yudong; He, Jianfeng; Li, Xinlei; Lu, Lin; Yang, Xinyi; Feng, Kaiyan; Lu, Wencong; Kong, Xiangyin



Digital core biopsy tissue texture used to distinguish benign from malignant breast calcifications  

NASA Astrophysics Data System (ADS)

To avoid missing breast malignancies, large numbers of benign breast lesions must be biopsied. To reduce the number of benign biopsies, a computer aided diagnosis (CAD) method has been developed which is based on tissue texture surrounding calcifications. When core samples are obtained stereotaxically, a digital record of the area biopsied is available. This method has been tested on 82 biopsies containing calcifications. Of these, 52 were benign and 30 were malignant. A region of interest centered on the biopsied area was processed to obtain texture features. Both co-occurrence and fractal features were collected and used with stepwise linear discriminant analysis to isolate useful features. A jackknife method identified ten features that gave a probability distribution associated with malignancy. Because of this association, a probability could be selected which eliminated 12 of the 52 benign biopsies without missing a malignancy. Thirty-nine could be avoided if five malignancies could be followed rather than biopsied. Unfortunately, four of these five missed malignancies do not have strong visual signs of malignancy and so the texture measurement error would not be overruled by radiological signs.

Kimme-Smith, Carolyn; Thiele, David; Johnson, Timothy; Zhou, Wensheng; Bassett, Lawrence W.



Discriminating bioluminescent proteins by incorporating average chemical shift and evolutionary information into the general form of Chou's pseudo amino acid composition.  


Bioluminescent proteins are highly sensitive optical reporters for imaging in live animals; they have been extensively used in analytical applications in intracellular monitoring, genetic regulation and detection, and immune and binding assays. In this work, we systematically analyzed the sequence and structure information of 199 bioluminescent and nonbioluminescent proteins, respectively. Based on the results, we presented a novel method called auto covariance of averaged chemical shift (acACS) for extracting structure features from a sequence. A classifier of support vector machine (SVM) fusing increment of diversity (ID) was used to distinguish bioluminescent proteins from nonbioluminescent proteins by combining dipeptide composition, reduced amino acid composition, evolutionary information, and acACS. The overall prediction accuracy evaluated by jackknife validation reached 82.16%. This result was better than that obtained by other existing methods. Improvement of the overall prediction accuracy reached up to 5.33% higher than those of the SVM and auto covariance of sequential evolution information by 10-fold cross-validation. The acACS algorithm also outperformed other feature extraction methods, indicating that our approach is better than other existing methods in the literature. PMID:23770403

Fan, Guo-Liang; Li, Qian-Zhong



Tomographic imaging of local earthquake delay times for three-dimensional velocity variation in western Washington  

NASA Astrophysics Data System (ADS)

Tomographic inversion is applied to delay times from local earthquakes to image three dimensional velocity variations in the Puget Sound region of Western Washington. The 37,500 square km region is represented by nearly cubic blocks of 5 km per side. P-wave arrival time observations from 4,387 crustal earthquakes, with depths of 0 to 40 km, were used as sources producing 36,865 rays covering the target region. A conjugate gradient method (LSQR) is used to invert the large, sparse system of equations. To diminish the effects of noisy data, the Laplacian is constrained to be zero within horizontal layers, providing smoothing of the model. The resolution is estimated by calculating impulse responses at blocks of interest and estimates of standard errors are calculated by the jackknife statistical procedure. Results of the inversion are correlated with some known geologic features and independent geophysical measurements. High P-wave velocities along the eastern flank of the Olympic Peninsula are interpreted to reflect the subsurface extension of Crescent terrane. Low velocities beneath the Puget Sound further to the east are inferred to reflect thick sediment accumulations. The Crescent terrane appears to extend beneath Puget Sound, consistent with its interpretation as a major accretionary unit. In the southern Puget Sound basin, high velocity anomalies at depths of 10-20 km are interpreted as Crescent terrane and are correlated with a region of low seismicity. Near Mt. Rainier, high velocity anomalies may reflect buried plutons.

Lees, Jonathan M.; Crosson, Robert S.
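
The abstract above states that standard errors were estimated with the jackknife but does not say how the data were partitioned. As a sketch of the general delete-one idea only, the code below recomputes a statistic with each observation removed and scales the spread of those recomputed values. The sample mean stands in for the tomographic inversion, and the `delays` values are hypothetical.

```python
import math


def jackknife_standard_error(data, statistic):
    """Delete-one jackknife standard error of `statistic` evaluated on `data`."""
    n = len(data)
    # Recompute the statistic n times, each time leaving out one observation.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - center) ** 2 for t in leave_one_out))


def mean(values):
    return sum(values) / len(values)


# Hypothetical delay-time residuals in seconds (not data from the paper).
delays = [0.12, -0.05, 0.30, 0.08, -0.11, 0.21]
se = jackknife_standard_error(delays, mean)
```

For the sample mean this reduces to the familiar s / sqrt(n), which makes a convenient sanity check before substituting a more expensive estimator.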



Regional flood frequency analysis at ungauged sites using the adaptive neuro-fuzzy inference system  

NASA Astrophysics Data System (ADS)

In this paper, the methodology of using adaptive neuro-fuzzy inference systems (ANFIS) for flood quantile estimation at ungauged sites is presented. The proposed approach has the system identification and interpretability of fuzzy models and the learning capability of artificial neural networks (ANNs). The structure of the ANFIS is identified using the subtractive clustering algorithm. A hybrid learning algorithm consisting of back-propagation and least-squares estimation is used for system training. The ANFIS approach provides an integrated mechanism for identifying the hydrological regions, generating knowledge from the data, providing flood estimates and self-tuning to achieve the optimal performance. The proposed approach is applied to 151 catchments in the province of Quebec, Canada, and is compared to the ANN approach, the nonlinear regression (NLR) approach and the nonlinear regression with regionalization approach (NLR-R). A jackknife procedure is used for the evaluation of the performances of the three approaches. Results indicate that the ANFIS approach has a much better generalization capability than the NLR and NLR-R approaches and is comparable to the ANN approach.

Shu, C.; Ouarda, T. B. M. J.




SciTech Connect

Background Imaging of Cosmic Extragalactic Polarization (BICEP) is a bolometric polarimeter designed to measure the inflationary B-mode polarization of the cosmic microwave background (CMB) at degree angular scales. During three seasons of observing at the South Pole (2006 through 2008), BICEP mapped ~2% of the sky chosen to be uniquely clean of polarized foreground emission. Here, we present initial results derived from a subset of the data acquired during the first two years. We present maps of temperature, Stokes Q and U, E and B modes, and associated angular power spectra. We demonstrate that the polarization data are self-consistent by performing a series of jackknife tests. We study potential systematic errors in detail and show that they are sub-dominant to the statistical errors. We measure the E-mode angular power spectrum with high precision at 21 <= l <= 335, detecting for the first time the peak expected at l ~ 140. The measured E-mode spectrum is consistent with expectations from a ΛCDM model, and the B-mode spectrum is consistent with zero. The tensor-to-scalar ratio derived from the B-mode spectrum is $r = 0.02^{+0.31}_{-0.26}$, or r < 0.72 at 95% confidence, the first meaningful constraint on the inflationary gravitational wave background to come directly from CMB B-mode polarization.

Chiang, H. C.; Barkats, D.; Bock, J. J.; Hristov, V. V.; Jones, W. C.; Kovac, J. M.; Lange, A. E.; Mason, P. V.; Matsumura, T. [Department of Physics, California Institute of Technology, Pasadena, CA 91125 (United States); Ade, P. A. R. [Department of Physics and Astronomy, University of Wales, Cardiff, CF24 3YB, Wales (United Kingdom); Battle, J. O.; Dowell, C. D.; Nguyen, H. T. [Jet Propulsion Laboratory, Pasadena, CA 91109 (United States); Bierman, E. M.; Keating, B. G. [Department of Physics, University of California at San Diego, La Jolla, CA 92093 (United States); Duband, L. [SBT, Commissariat a l'Energie Atomique, Grenoble (France); Hivon, E. F. [Institut d'Astrophysique de Paris, Paris (France); Holzapfel, W. L. [Department of Physics, University of California at Berkeley, Berkeley, CA 94720 (United States); Kuo, C. L. [Stanford University, Palo Alto, CA 94305 (United States); Leitch, E. M. [University of Chicago, Chicago, IL 60637 (United States)



Sensitivity analysis for misclassification in logistic regression via likelihood methods and predictive value weighting  

PubMed Central

The potential for bias due to misclassification error in regression analysis is well understood by statisticians and epidemiologists. Assuming little or no available data for estimating misclassification probabilities, investigators sometimes seek to gauge the sensitivity of an estimated effect to variations in the assumed values of those probabilities. We present an intuitive and flexible approach to such a sensitivity analysis, assuming an underlying logistic regression model. For outcome misclassification, we argue that a likelihood-based analysis is the cleanest and the most preferable approach. In the case of covariate misclassification, we combine observed data on the outcome, error-prone binary covariate of interest, and other covariates measured without error, together with investigator-supplied values for sensitivity and specificity parameters, to produce corresponding positive and negative predictive values. These values serve as estimated weights to be used in fitting the model of interest to an appropriately defined expanded data set using standard statistical software. Jackknifing provides a convenient tool for incorporating uncertainty in the estimated weights into valid standard errors to accompany log odds ratio estimates obtained from the sensitivity analysis. Examples illustrate the flexibility of this unified strategy, and simulations suggest that it performs well relative to a maximum likelihood approach carried out via numerical optimization.

Lyles, Robert H.; Lin, Ji



Discriminating lysosomal membrane protein types using dynamic neural network.  


This work presents a dynamic artificial neural network methodology that classifies proteins into their classes from their sequences alone: the lysosomal membrane protein classes and the various other membrane protein classes. In this paper, a neural network-based lysosomal-associated membrane protein type prediction system is proposed. Different protein sequence representations are fused to extract the features of a protein sequence, comprising seven feature sets: amino acid (AA) composition, sequence length, hydrophobic group, electronic group, sum of hydrophobicity, R-group, and dipeptide composition. To reduce the dimensionality of the large feature vector, we applied principal component analysis. The probabilistic neural network, generalized regression neural network, and Elman regression neural network (RNN) are used as classifiers and compared with the layer recurrent network (LRN), a dynamic network. Dynamic networks have memory, i.e., their output depends not only on the input but also on previous outputs. Thus, the accuracy of the LRN classifier comes out highest among all the artificial neural networks. The overall accuracy of jackknife cross-validation is 93.2% for the data set. These results suggest that the method can be effectively applied to discriminate lysosomal-associated membrane proteins from other membrane proteins (Type-I, outer membrane proteins, GPI-anchored) and globular proteins, and they also indicate that the protein sequence representation reflects the core features of membrane proteins better than the classical AA composition. PMID:23968467

Tripathi, Vijay; Gupta, Dwijendra Kumar



Predicting thermophilic proteins with pseudo amino acid composition: approached from chaos game representation and principal component analysis.  


Comprehensive knowledge of the thermophilic mechanisms of organisms whose optimum growth temperature (OGT) ranges from 50 to 80 °C plays a major role in helping to design stable proteins. How to predict whether proteins of unknown function are thermophilic is a long-standing problem that has not been satisfactorily resolved. Chaos game representation (CGR) can investigate hidden patterns in protein sequences and can also visually reveal their previously unknown structures. In this paper, using the general form of pseudo amino acid composition to represent protein samples, we propose a novel method for converting a protein sequence to a CGR picture using the CGR algorithm. A 24-dimensional vector extracted from these CGR segments and the first two PCA features are used to classify thermophilic and mesophilic proteins with a Support Vector Machine (SVM). Our method is evaluated by the jackknife test. For the 24-dimensional vector, the accuracy is 0.8792 and the Matthews Correlation Coefficient (MCC) is 0.7587. The 26-dimensional vector obtained by hybridizing with the PCA components performs highly satisfactorily, achieving an accuracy of 0.9944 and an MCC of 0.9888. The results show the effectiveness of the new hybrid method. PMID:21787282

Liu, Xiao-Lei; Lu, Jin-Long; Hu, Xue-Hai
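
The abstract above reports both accuracy and the Matthews Correlation Coefficient (MCC) from the jackknife test. As a reminder of how the two measures relate to a binary confusion matrix, here is a small sketch using the standard definitions. The counts are hypothetical and are not taken from the paper.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC for a binary confusion matrix; returns 0.0 when it is undefined
    (any row or column of the matrix sums to zero)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical jackknife outcome: thermophilic = positive, mesophilic = negative.
tp, tn, fp, fn = 90, 85, 15, 10
acc = accuracy(tp, tn, fp, fn)
mcc = matthews_corrcoef(tp, tn, fp, fn)
```

Accuracy lies in [0, 1] while MCC lies in [-1, 1], so on a roughly balanced data set an MCC noticeably lower than the accuracy, as in the 24-dimensional result above, is expected rather than a sign of inconsistency.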



Predicting the Types of J-Proteins Using Clustered Amino Acids  

PubMed Central

J-proteins are molecular chaperones and present in a wide variety of organisms from prokaryote to eukaryote. Based on their domain organizations, J-proteins can be classified into 4 types, that is, Type I, Type II, Type III, and Type IV. Different types of J-proteins play distinct roles in influencing cancer properties and cell death. Thus, reliably annotating the types of J-proteins is essential to better understand their molecular functions. In the present work, a support vector machine based method was developed to identify the types of J-proteins using the tripeptide composition of reduced amino acid alphabet. In the jackknife cross-validation, the maximum overall accuracy of 94% was achieved on a stringent benchmark dataset. We also analyzed the amino acid compositions by using analysis of variance and found the distinct distributions of amino acids in each family of the J-proteins. To enhance the value of the practical applications of the proposed model, an online web server was developed and can be freely accessed.

Feng, Pengmian; Zuo, Yongchun



Pharmacokinetics of oxymorphone in cats.  


This study reports the pharmacokinetics of oxymorphone in spayed female cats after intravenous administration. Six healthy adult domestic shorthair spayed female cats were used. Oxymorphone (0.1 mg/kg) was administered intravenously as a bolus. Blood samples were collected immediately prior to oxymorphone administration and at various times up to 480 min following administration. Plasma oxymorphone concentrations were determined by liquid chromatography-mass spectrometry, and plasma oxymorphone concentration-time data were fitted to compartmental models. A three-compartment model, with input in and elimination from the central compartment, best described the disposition of oxymorphone following intravenous administration. The apparent volume of distribution of the central compartment and apparent volume of distribution at steady state [mean ± SEM (range)] and the clearance and terminal half-life [harmonic mean ± jackknife pseudo-SD (range)] were 1.1 ± 0.2 (0.4-1.7) L/kg, 2.5 ± 0.4 (2.4-4.4) L/kg, 26 ± 7 (18-38) mL/, and 96 ± 49 (62-277) min, respectively. The disposition of oxymorphone in cats is characterized by a moderate volume of distribution and a short terminal half-life. PMID:21323677

Siao, K T; Pypendop, B H; Stanley, S D; Ilkiw, J E
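
The abstract above summarizes clearance and terminal half-life as "harmonic mean ± jackknife pseudo-SD" without giving a formula. One common construction, assumed here, takes the standard deviation of the jackknife pseudo-values of the harmonic mean. The sketch below follows that assumption, and the `half_lives` values are hypothetical rather than the study's individual data.

```python
import math


def harmonic_mean(values):
    """Harmonic mean of strictly positive values."""
    return len(values) / sum(1.0 / v for v in values)


def jackknife_pseudo_sd(values):
    """Standard deviation of the jackknife pseudo-values of the harmonic mean.

    Pseudo-value for subject i: n * H - (n - 1) * H_(-i), where H is the
    harmonic mean of all values and H_(-i) omits subject i.
    """
    n = len(values)
    full = harmonic_mean(values)
    pseudo = [
        n * full - (n - 1) * harmonic_mean(values[:i] + values[i + 1:])
        for i in range(n)
    ]
    center = sum(pseudo) / n
    return math.sqrt(sum((p - center) ** 2 for p in pseudo) / (n - 1))


# Hypothetical terminal half-lives in minutes for six animals.
half_lives = [62.0, 75.0, 88.0, 101.0, 130.0, 277.0]
hm = harmonic_mean(half_lives)
pseudo_sd = jackknife_pseudo_sd(half_lives)
```

The harmonic mean is often preferred for half-lives because it corresponds to averaging the underlying rate constants, and it is less affected by a single long half-life than the arithmetic mean.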



Bioequivalence evaluation of two formulations of pidotimod using a limited sampling strategy.  


The aim of this study was to develop a limited sampling strategy (LSS) to assess the bioequivalence of two formulations of pidotimod. A randomized, two-way, cross-over study was conducted in healthy Chinese volunteers to compare two formulations of pidotimod. A limited sampling model was established using regression models to estimate the pharmacokinetic parameters and assess the bioequivalence of pidotimod. The model was internally validated by the Jack-knife method and graphical methods. The traditional non-compartmental method was also used to analyze the data and compared with LSS method. The results indicate that following oral administration of a single 800 mg dose, the plasma AUC(0-12 h) and C(max) of pidotimod can be predicted accurately using only two to four plasma samples. The bioequivalence assessment based on the LSS models provided results very similar to that obtained using all the observed concentration-time data points and indicate that the two pidotimod formulations were bioequivalent. A LSS method for assessing the bioequivalence of pidotimod formulations was established and proved to be applicable and accurate. This LSS method could be considered appropriate for a pidotimod bioequivalence study, providing an inexpensive cost of sampling acquisition and analysis. And the methodology presented here may also be applicable to bioequivalence evaluation of other medications. PMID:23639228

Huang, Ji-Han; Huang, Xiao-Hui; Wang, Kun; Li, Jian-Chun; Xie, Xue-Feng; Shen, Chen-Lin; Li, Lu-Jin; Zheng, Qing-Shan



Human DNA Ligase III Recognizes DNA Ends by Dynamic Switching between Two DNA-Bound States  

SciTech Connect

Human DNA ligase III has essential functions in nuclear and mitochondrial DNA replication and repair and contains a PARP-like zinc finger (ZnF) that increases the extent of DNA nick joining and intermolecular DNA ligation, yet the bases for ligase III specificity and structural variation among human ligases are not understood. Here combined crystal structure and small-angle X-ray scattering results reveal dynamic switching between two nick-binding components of ligase III: the ZnF-DNA binding domain (DBD) forms a crescent-shaped surface used for DNA end recognition which switches to a ring formed by the nucleotidyl transferase (NTase) and OB-fold (OBD) domains for catalysis. Structural and mutational analyses indicate that high flexibility and distinct DNA binding domain features in ligase III assist both nick sensing and the transition from nick sensing by the ZnF to nick joining by the catalytic core. The collective results support a 'jackknife model' in which the ZnF loads ligase III onto nicked DNA and conformational changes deliver DNA into the active site. This work has implications for the biological specificity of DNA ligases and functions of PARP-like zinc fingers.

Cotner-Gohara, Elizabeth; Kim, In-Kwon; Hammel, Michal; Tainer, John A.; Tomkinson, Alan E.; Ellenberger, Tom (Scripps); (Maryland-MED); (WU-MED); (LBNL)



Phylogenetic analysis identifies the invertebrate pathogen Helicosporidium sp. as a green alga (Chlorophyta).  


Historically, the invertebrate pathogens of the genus Helicosporidium were considered to be either protozoa or fungi, but the taxonomic position of this group has not been considered since 1931. Recently, a Helicosporidium sp., isolated from the blackfly Simulium jonesi Stone & Snoddy (Diptera: Simuliidae), has been amplified in the heterologous host Helicoverpa zea. Genomic DNA has been extracted from gradient-purified cysts. The 18S, 28S and 5.8S regions of the Helicosporidium rDNA, as well as partial sequences of the actin and beta-tubulin genes, were amplified by PCR and sequenced. Comparative analysis of these nucleotide sequences was performed using neighbour-joining and maximum-parsimony methods. All inferred phylogenetic trees placed Helicosporidium sp. among the green algae (Chlorophyta), and this association was supported by bootstrap and parsimony jackknife values. Phylogenetic analysis focused on the green algae depicted Helicosporidium sp. as a close relative of Prototheca wickerhamii and Prototheca zopfii (Chlorophyta, Trebouxiophyceae), two achlorophyllous, pathogenic green algae. On the basis of this phylogenetic analysis, Helicosporidium sp. is clearly neither a protist nor a fungus, but appears to be the first described algal invertebrate pathogen. These conclusions lead us to propose the transfer of the genus Helicosporidium to Chlorophyta, Trebouxiophyceae. PMID:11837312

Tartar, Aurélien; Boucias, Drion G; Adams, Byron J; Becnel, James J



The diagnostic accuracy of dual-view digital mammography, single-view breast tomosynthesis and a dual-view combination of breast tomosynthesis and digital mammography in a free-response observer performance study  

PubMed Central

The purpose of the present study was to compare the diagnostic accuracy of dual-view digital mammography (DM), single-view breast tomosynthesis (BT) and BT combined with the opposite DM view. Patients with subtle lesions were selected to undergo BT examinations. Two radiologists who are non-participants in the study and have experience in using DM and BT determined the locations and extents of lesions in the images. Five expert mammographers interpreted the cases using the free-response paradigm. The task was to mark and rate clinically reportable findings suspicious for malignancy and clinically relevant benign findings. The marks were scored with reference to the outlined regions into lesion localization or non-lesion localization, and analysed by the jackknife alternative free-response receiver operating characteristic method. The analysis yielded statistically significant differences between the combined modality and dual-view DM (p < 0.05). No differences were found between single-view BT and dual-view DM or between single-view BT and the combined modality.

Svahn, T.; Andersson, I.; Chakraborty, D.; Svensson, S.; Ikeda, D.; Fornvik, D.; Mattsson, S.; Tingberg, A.; Zackrisson, S.



Dissociable executive functions in behavioral variant frontotemporal and Alzheimer dementias  

PubMed Central

Objective: The objective of this study was to determine which aspects of executive functions are most affected in behavioral variant frontotemporal dementia (bvFTD) and best differentiate this syndrome from Alzheimer disease (AD). Methods: We compared executive functions in 22 patients diagnosed with bvFTD, 26 with AD, and 31 neurologically healthy controls using a conceptually driven and comprehensive battery of executive function tests, the NIH EXAMINER battery. Results: The bvFTD and the AD patients were similarly impaired compared with controls on tests of working memory, category fluency, and attention, but the patients with bvFTD showed significantly more severe impairments than the patients with AD on tests of letter fluency, antisaccade accuracy, social decision-making, and social behavior. Discriminant function analysis with jackknifed cross-validation classified the bvFTD and AD patient groups with 73% accuracy. Conclusions: Executive function assessment can support bvFTD diagnosis when measures are carefully selected to emphasize frontally specific functions.

Feigenbaum, Dana; Rankin, Katherine P.; Smith, Glenn E.; Boxer, Adam L.; Wood, Kristie; Hanna, Sherrie M.; Miller, Bruce L.; Kramer, Joel H.



The [Formula: see text]-sample problem in a multi-state model and testing transition probability matrices.  


The choice of multi-state models is natural in analysis of survival data, e.g., when the subjects in a study pass through different states like 'healthy', 'in a state of remission', 'relapse' or 'dead' in a health related quality of life study. Competing risks is another common instance of the use of multi-state models. Statistical inference for such event history data can be carried out by assuming a stochastic process model. Under such a setting, comparison of the event history data generated by two different treatments calls for testing equality of the corresponding transition probability matrices. The present paper proposes solution to this class of problems by assuming a non-homogeneous Markov process to describe the transitions among the health states. A class of test statistics are derived for comparison of [Formula: see text] treatments by using a 'weight process'. This class, in particular, yields generalisations of the log-rank, Gehan, Peto-Peto and Harrington-Fleming tests. For an intrinsic comparison of the treatments, the 'leave-one-out' jackknife method is employed for identifying influential observations. The proposed methods are then used to develop the Kolmogorov-Smirnov type supremum tests corresponding to the various extended tests. To demonstrate the usefulness of the test procedures developed, a simulation study was carried out and an application to the Trial V data provided by International Breast Cancer Study Group is discussed. PMID:23722306

Tattar, Prabhanjan N; Vaman, H J



Identification of pathogenic fungi with an optoelectronic nose.  


Human fungal infections have gained recent notoriety following contamination of pharmaceuticals in the compounding process. Such invasive infections are a more serious global problem, especially for immunocompromised patients. While superficial fungal infections are common and generally curable, invasive fungal infections are often life-threatening and much harder to diagnose and treat. Despite the increasing awareness of the situation's severity, currently available fungal diagnostic methods cannot always meet diagnostic needs, especially for invasive fungal infections. Volatile organic compounds produced by fungi provide an alternative diagnostic approach for identification of fungal strains. We report here an optoelectronic nose based on a disposable colorimetric sensor array capable of rapid differentiation and identification of pathogenic fungi based on their metabolic profiles of emitted volatiles. The sensor arrays were tested with 12 human pathogenic fungal strains grown on standard agar medium. Array responses were monitored with an ordinary flatbed scanner. All fungal strains gave unique composite responses within 3 hours and were correctly clustered using hierarchical cluster analysis. A standard jackknifed linear discriminant analysis gave a classification accuracy of 94% for 155 trials. Tensor discriminant analysis, which takes better advantage of the high dimensionality of the sensor array data, gave a classification accuracy of 98.1%. The sensor array is also able to observe metabolic changes in growth patterns upon the addition of fungicides, and this provides a facile screening tool for determining fungicide efficacy for various fungal strains in real time. PMID:24570999

Zhang, Yinan; Askim, Jon R; Zhong, Wenxuan; Orlean, Peter; Suslick, Kenneth S



EGS hydraulic stimulation monitoring by surface arrays - location accuracy and completeness magnitude: the Basel Deep Heat Mining Project case study  

NASA Astrophysics Data System (ADS)

The potential and limits of monitoring induced seismicity by surface-based mini arrays were evaluated for the hydraulic stimulation of the Basel Deep Heat Mining Project. This project aimed at the exploitation of geothermal heat from a depth of about 4,630 m. As reference for our results, a network of borehole stations by Geothermal Explorers Ltd. provided ground truth information. We utilized array processing, sonogram event detection and outlier-resistant, graphical jackknife location procedures to compensate for the decrease in signal-to-noise ratio at the surface. We could correctly resolve the NNW-SSE striking fault plane by relative master event locations. Statistical analysis of our catalog data resulted in M_L 0.36 as completeness magnitude, but with significant day-to-night dependency. To compare to the performance of borehole data with M_W 0.9 as completeness magnitude, we applied two methods for converting M_L to M_W, which raised our M_C to M_W in the range of 0.99-1.13. Further, the b value for the duration of our measurement was calculated as 1.14 (related to M_L) and 1.66 (related to M_W), but changes over time could not be resolved from the error bars.

Häge, Martin; Blascheck, Patrick; Joswig, Manfred



Variables influencing the presence of subyearling fall Chinook salmon in shoreline habitats of the Hanford Reach, Columbia River  

USGS Publications Warehouse

Little information currently exists on habitat use by subyearling fall Chinook salmon Oncorhynchus tshawytscha rearing in large, main-stem habitats. We collected habitat use information on subyearlings in the Hanford Reach of the Columbia River during May 1994 and April-May 1995 using point abundance electrofishing. We analyzed measures of physical habitat using logistic regression to predict fish presence and absence in shoreline habitats. The difference between water temperature at the point of sampling and in the main river channel was the most important variable for predicting the presence and absence of subyearlings. Mean water velocities of 45 cm/s or less and habitats with low lateral bank slopes were also associated with a greater likelihood of subyearling presence. Intermediate-sized gravel and cobble substrates were significant predictors of fish presence, but small (256-mm) substrates were not. Our rearing model was accurate at predicting fish presence and absence using jackknifing (80% correct) and classification of observations from an independent data set (76% correct). The habitat requirements of fall Chinook salmon in the Hanford Reach are similar to those reported for juvenile Chinook salmon in smaller systems but are met in functionally different ways in a large river.

Tiffan, K. F.; Clark, L. O.; Garland, R. D.; Rondorf, D. W.



Calibration plots for risk prediction models in the presence of competing risks.  


A predicted risk of 17% can be called reliable if it can be expected that the event will occur to about 17 of 100 patients who all received a predicted risk of 17%. Statistical models can predict the absolute risk of an event such as cardiovascular death in the presence of competing risks such as death due to other causes. For personalized medicine and patient counseling, it is necessary to check that the model is calibrated in the sense that it provides reliable predictions for all subjects. There are three often encountered practical problems when the aim is to display or test if a risk prediction model is well calibrated. The first is lack of independent validation data, the second is right censoring, and the third is that when the risk scale is continuous, the estimation problem is as difficult as density estimation. To deal with these problems, we propose to estimate calibration curves for competing risks models based on jackknife pseudo-values that are combined with a nearest neighborhood smoother and a cross-validation approach to deal with all three problems. Copyright © 2014 John Wiley & Sons, Ltd. PMID:24668611

Gerds, Thomas A; Andersen, Per K; Kattan, Michael W
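
The abstract above relies on jackknife pseudo-values to handle right censoring. As a simplified sketch, assuming a single event type and the Kaplan-Meier estimator in place of the competing-risks estimator the paper works with, the pseudo-value for subject i at time t is n * F(t) - (n - 1) * F_(-i)(t), where F is the estimated event probability. Function names and the toy data in the usage note are illustrative assumptions.

```python
def event_probability(times, events, t):
    """One minus the Kaplan-Meier estimate: estimated P(event by time t).

    events[i] is 1 for an observed event and 0 for right censoring.
    """
    surv = 1.0
    event_times = sorted({ti for ti, ei in zip(times, events) if ei == 1 and ti <= t})
    for u in event_times:
        at_risk = sum(1 for ti in times if ti >= u)
        died = sum(1 for ti, ei in zip(times, events) if ti == u and ei == 1)
        surv *= 1.0 - died / at_risk
    return 1.0 - surv


def pseudo_values(times, events, t):
    """Jackknife pseudo-values of the event probability at time t."""
    n = len(times)
    full = event_probability(times, events, t)
    values = []
    for i in range(n):
        rest_times = times[:i] + times[i + 1:]
        rest_events = events[:i] + events[i + 1:]
        values.append(n * full - (n - 1) * event_probability(rest_times, rest_events, t))
    return values
```

With no censoring, each pseudo-value reduces to the 0/1 indicator of whether that subject had the event by time t; for example, `pseudo_values([1.0, 2.0, 3.0, 4.0, 5.0], [1, 1, 1, 1, 1], 2.5)` gives values equal to 1, 1, 0, 0, 0 up to rounding. Under censoring the pseudo-values stand in for the unobserved indicators, so averaging them among subjects with similar predicted risk gives one point of a calibration curve.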



Patterns of connectivity among populations of a coral reef fish  

NASA Astrophysics Data System (ADS)

Knowledge of the patterns and scale of connectivity among populations is essential for the effective management of species, but our understanding is still poor for marine species. We used otolith microchemistry of newly settled bicolor damselfish (Stegastes partitus) in the Mesoamerican Reef System (MRS), Western Caribbean, to investigate patterns of connectivity among populations over 2 years. First, we assessed spatial and temporal variability in trace elemental concentrations from the otolith edge to make a 'chemical map' of potential source reef(s) in the region. Significant otolith chemical differences were detected at three spatial scales (within-atoll, between-atolls, and region-wide), such that individuals were classified to locations with moderate (52% jackknife classification) to high (99%) accuracy. Most sites at Turneffe Atoll, Belize showed significant temporal variability in otolith concentrations on the scale of 1-2 months. Using a maximum likelihood approach, we estimated the natal source of larvae recruiting to reefs across the MRS by comparing 'natal' chemical signatures from the otolith of recruits to the 'chemical map' of potential source reef(s). Our results indicated that populations at both Turneffe Atoll and Banco Chinchorro supply a substantial amount of individuals to their own reefs (i.e., self-recruitment) and thus emphasize that marine conservation and management in the MRS region would benefit from localized management efforts as well as international cooperation.

Chittaro, P. M.; Hogan, J. D.



Using Chou's pseudo amino acid composition based on approximate entropy and an ensemble of AdaBoost classifiers to predict protein subnuclear location.  


The knowledge of subnuclear localization in eukaryotic cells is essential for understanding the life function of the nucleus. Developing prediction methods and tools for protein subnuclear localization has become an important research field in protein science because of the special characteristics of the cell nucleus. In this study, a novel approach is proposed to predict protein subnuclear localization. Each protein sample is represented by a Pseudo Amino Acid (PseAA) composition based on the approximate entropy (ApEn) concept, which reflects the complexity of time series. A novel ensemble classifier is designed incorporating three AdaBoost classifiers. The base classifier algorithms in the three AdaBoost classifiers are decision stumps, a fuzzy K nearest neighbors classifier, and radial basis-support vector machines, respectively. Different PseAA compositions are used as input data for the different AdaBoost classifiers in the ensemble. A genetic algorithm is used to optimize the dimension and weight factor of the PseAA composition. Two datasets often used in published works are used to validate the performance of the proposed approach. The results of the Jackknife cross-validation test are higher and more balanced than those of other methods on the same datasets. The promising results indicate that the proposed approach is effective and practical. It might become a useful tool in protein subnuclear localization. The software in Matlab and supplementary materials are freely available by contacting the corresponding author. PMID:18256886

Jiang, Xiaoying; Wei, Rong; Zhao, Yanjun; Zhang, Tongliang



The potential distribution of Phlebotomus papatasi (Diptera: Psychodidae) in Libya based on ecological niche model.  


The increased cases of cutaneous leishmaniasis vectored by Phlebotomus papatasi (Scopoli) in Libya have driven considerable effort to develop a predictive model for the potential geographical distribution of this disease. We collected adult P. papatasi from 17 sites in Musrata and Yefern regions of Libya using four different attraction traps. Our trap results and literature records describing the distribution of P. papatasi were incorporated into a MaxEnt algorithm prediction model that used 22 environmental variables. The model showed a high performance (AUC = 0.992 and 0.990 for training and test data, respectively). High suitability for P. papatasi was predicted to be largely confined to the coast at altitudes <600 m. Regions south of 30 degrees N latitude were calculated as unsuitable for this species. Jackknife analysis identified precipitation as having the most significant predictive power, while temperature and elevation variables were less influential. The National Leishmaniasis Control Program in Libya may find this information useful in their efforts to control zoonotic cutaneous leishmaniasis. Existing records are strongly biased toward a few geographical regions, and therefore, further sand fly collections are warranted that should include documentation of such factors as soil texture and humidity, land cover, and normalized difference vegetation index (NDVI) data to increase the model's predictive power. PMID:22679884

Abdel-Dayem, M S; Annajar, B B; Hanafi, H A; Obenauer, P J



New methodology of influential point detection in regression model building for the prediction of metabolic clearance rate of glucose.  


Identifying outliers and high-leverage points is a fundamental step in the least-squares regression model building process. The examination of data quality involves the detection of influential points, outliers and high-leverages, which cause many problems in regression analysis. On the basis of a statistical analysis of the residuals (classical, normalized, standardized, jackknife, predicted and recursive) and diagonal elements of a projection matrix, diagnostic plots for influential points indication are formed. The identification of outliers and high leverage points are combined with graphs for the identification of influence type based on the likelihood distance. The powerful procedure for the computation of influential points characteristics written in S-Plus is demonstrated on the model predicting the metabolic clearance rate of glucose (MCRg) that represents the ratio of the amount of glucose supplied to maintain blood glucose levels during the euglycemic clamp and the blood glucose concentration from common laboratory and anthropometric indices. MCRg reflects insulin sensitivity filtering-off the effect of blood glucose. The prediction of clamp parameters should enable us to avoid the demanding clamp examination, which is connected with a higher load and risk for patients. PMID:15080566

Meloun, Milan; Hill, Martin; Militký, Jirí; Vrbíková, Jana; Stanická, Sona; Skrha, Jan



Two DNA-binding and Nick Recognition Modules in Human DNA Ligase III

PubMed Central

Human DNA ligase III contains an N-terminal zinc finger domain that binds to nicks and gaps in DNA. This small domain has been described as a DNA nick sensor, but it is not required for DNA nick joining activity in vitro. In light of new structural information for mammalian ligases, we measured the DNA binding affinity and specificity of each domain of DNA ligase III. These studies identified two separate, independent DNA-binding modules in DNA ligase III that each bind specifically to nicked DNA over intact duplex DNA. One of these modules comprises the zinc finger domain and DNA-binding domain, which function together as a single DNA binding unit. The catalytic core of ligase III is the second DNA nick-binding module. Both binding modules are required for ligation of blunt ended DNA substrates. Although the zinc finger increases the catalytic efficiency of nick ligation, it appears to occupy the same binding site as the DNA ligase III catalytic core. We present a jackknife model for ligase III that posits conformational changes during nick sensing and ligation to extend the versatility of the enzyme.

Cotner-Gohara, Elizabeth; Kim, In-Kwon; Tomkinson, Alan E.; Ellenberger, Tom



Diagnostic performance of detecting breast cancer on computed radiographic (CR) mammograms: comparison of hard copy film, 3-megapixel liquid-crystal-display (LCD) monitor and 5-megapixel LCD monitor.  


The purpose was to compare observer performance in the detection of breast cancer using hard-copy film, and 3-megapixel (3-MP) and 5-megapixel (5-MP) liquid crystal display (LCD) monitors in a simulated screening setting. We amassed 100 sample sets, including 32 patients with surgically proven breast cancer (masses present, N = 12; microcalcifications, N = 10; other types, N = 10) and 68 normal controls. All the mammograms were obtained using computed radiography (CR; sampling pitch of 50 μm). Twelve mammographers independently assessed CR mammograms presented in random order for hard-copy and soft-copy reading at intervals of at least 4 weeks. Observers rated the images on seven-point (1 to 7) and continuous (0 to 100) malignancy scales. Receiver-operating-characteristics analysis was performed, and the average area under the curve (AUC) was calculated for each modality. The jackknife method with the Bonferroni correction was applied to multireader/multicase analysis. The average AUC values for the 3-MP LCD, 5-MP LCD, and hard-copy film were 0.954, 0.947, and 0.956 on the seven-point scale and 0.943, 0.923, and 0.944 on the continuous scale, respectively. There were no significant differences among the three modalities on either scale. Soft-copy reading using 3-MP and 5-MP LCDs is comparable to hard-copy reading for detecting breast cancer. PMID:18491108

Yamada, Takayuki; Suzuki, Akihiko; Uchiyama, Nachiko; Ohuchi, Noriaki; Takahashi, Shoki



Nucleosome positioning based on the sequence word composition.  


The DNA of all eukaryotic organisms is packaged into nucleosomes (a basic repeating unit of chromatin). A nucleosome consists of histone octamer wrapped by core DNA and linker histone H1 associated with linker DNA. It has profound effects on all DNA-dependent processes by affecting sequence accessibility. Understanding the factors that influence nucleosome positioning has great help to the study of genomic control mechanism. Among many determinants, the inherent DNA sequence has been suggested to have a dominant role in nucleosome positioning in vivo. Here, we used the method of minimum redundancy maximum relevance (mRMR) feature selection and the nearest neighbor algorithm (NNA) combined with the incremental feature selection (IFS) method to identify the most important sequence features that either favor or inhibit nucleosome positioning. We analyzed the words of 53,021 nucleosome DNA sequences and 50,299 linker DNA sequences of Saccharomyces cerevisiae. 32 important features were abstracted from 5,460 features, and the overall prediction accuracy through jackknife cross-validation test was 76.5%. Our results support that sequence-dependent DNA flexibility plays an important role in positioning nucleosome core particles and that genome sequence facilitates the rapid nucleosome reassembly instead of nucleosome depletion. Besides, our results suggest that there exist some additional features playing a considerable role in discriminating nucleosome forming and inhibiting sequences. These results confirmed that the underlying DNA sequence plays a major role in nucleosome positioning. PMID:21919856

Yi, Xian-Fu; He, Zhi-Song; Chou, Kuo-Chen; Kong, Xiang-Yin
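
Several records in this listing report accuracy from a "jackknife cross-validation test" combined with a nearest neighbor classifier. The sketch below shows one common reading of that procedure, assuming it means leave-one-out testing with a 1-nearest-neighbor rule and Euclidean distance. The function names and the toy feature vectors are hypothetical and do not reproduce any study's features, distance measure, or data.

```python
import math


def nearest_neighbor_label(query, samples, labels):
    """Return the label of the sample closest to `query` (Euclidean distance)."""
    best = min(range(len(samples)), key=lambda i: math.dist(query, samples[i]))
    return labels[best]


def jackknife_accuracy(samples, labels):
    """Leave-one-out test: hold out each sample in turn, classify it from
    the remaining samples, and return the fraction classified correctly."""
    correct = 0
    for i in range(len(samples)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        if nearest_neighbor_label(samples[i], rest_x, rest_y) == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy, made-up feature vectors (illustration only, not study data).
toy_x = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (1.0, 1.1), (0.9, 1.0), (1.2, 0.9)]
toy_y = ["linker", "linker", "linker", "nucleosome", "nucleosome", "nucleosome"]
print(jackknife_accuracy(toy_x, toy_y))
```

Because every sample is tested exactly once against a model that never saw it, the result does not depend on a random split, which is presumably why these abstracts describe the test as rigorous.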



Predicting Chemical Toxicity Effects Based on Chemical-Chemical Interactions  

PubMed Central

Toxicity is a major contributor to high attrition rates of new chemical entities in drug discoveries. In this study, an order-classifier was built to predict a series of toxic effects based on data concerning chemical-chemical interactions under the assumption that interactive compounds are more likely to share similar toxicity profiles. According to their interaction confidence scores, the order from the most likely toxicity to the least was obtained for each compound. Ten test groups, each of them containing one training dataset and one test dataset, were constructed from a benchmark dataset consisting of 17,233 compounds. By a Jackknife test on each of these test groups, the 1st order prediction accuracies of the training dataset and the test dataset were all approximately 79.50%, substantially higher than the rate of 25.43% achieved by random guesses. Encouraged by the promising results, we expect that our method will become a useful tool in screening out drugs with high toxicity.

Zhang, Jian; Feng, Kai-Rui; Zheng, Ming-Yue; Cai, Yu-Dong



AcalPred: A Sequence-Based Tool for Discriminating between Acidic and Alkaline Enzymes  

PubMed Central

The structure and activity of enzymes are influenced by the pH value of their surroundings. Although many enzymes work well in the pH range from 6 to 8, some specific enzymes have good efficiencies only in acidic (pH<5) or alkaline (pH>9) solution. Studies have demonstrated that the activities of enzymes correlate with their primary sequences. It is crucial to judge enzyme adaptation to an acidic or alkaline environment from its amino acid sequence in molecular mechanism clarification and the design of highly efficient enzymes. In this study, we developed a sequence-based method to discriminate acidic enzymes from alkaline enzymes. The analysis of variance was used to choose the optimized discriminating features derived from g-gap dipeptide compositions, and a support vector machine was utilized to establish the prediction model. In the rigorous jackknife cross-validation, an overall accuracy of 96.7% was achieved. The method can correctly predict 96.3% of acidic and 97.1% of alkaline enzymes. Comparison between the proposed method and previous methods demonstrates that the proposed method is more accurate. On the basis of this proposed method, we have built an online web-server called AcalPred which can be freely accessed from the website. We believe that AcalPred will become a powerful tool to study enzyme adaptation to acidic or alkaline environments.

Lin, Hao; Chen, Wei; Ding, Hui



Position-Specific Analysis and Prediction of Protein Pupylation Sites Based on Multiple Features  

PubMed Central

Pupylation is one of the most important posttranslational modifications of proteins; accurate identification of pupylation sites will facilitate the understanding of the molecular mechanism of pupylation. Besides the conventional experimental approaches, computational prediction of pupylation sites is highly desirable for its convenience and speed. In this study, we developed a novel predictor to predict the pupylation sites. First, the maximum relevance minimum redundancy (mRMR) and incremental feature selection methods were applied to five kinds of features to select the optimal feature set. Then the prediction model was built based on the optimal feature set with the assistance of the support vector machine algorithm. As a result, the overall jackknife success rate by the new predictor on a newly constructed benchmark dataset was 0.764, and the Matthews correlation coefficient was 0.522, indicating a good prediction. Feature analysis showed that all feature types contributed to the prediction of protein pupylation sites. Further site-specific feature analysis revealed that the features of sites surrounding the central lysine contributed more to the determination of pupylation sites than the other sites.

Zhao, Xiaowei; Dai, Jiangyan; Ning, Qiao; Ma, Zhiqiang; Yin, Minghao; Sun, Pingping
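
The abstract above reports both a success rate and a Matthews correlation coefficient (MCC). As a minimal sketch of how the two are computed from a binary confusion matrix, the functions below use the standard definitions. The example counts are hypothetical and are not the study's results.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).

    Returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts for illustration only.
tp, tn, fp, fn = 40, 45, 5, 10
print(accuracy(tp, tn, fp, fn), matthews_corrcoef(tp, tn, fp, fn))
```

MCC ranges from -1 to 1 and stays near 0 for uninformative predictors even when classes are imbalanced, so it complements a raw success rate.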



Predicting deleterious non-synonymous single nucleotide polymorphisms in signal peptides based on hybrid sequence attributes.  


Signal peptides play a crucial role in various biological processes, such as localization of cell surface receptors, translocation of secreted proteins and cell-cell communication. However, the amino acid mutation in signal peptides, also called non-synonymous single nucleotide polymorphisms (nsSNPs or SAPs) may lead to the loss of their functions. In the present study, a computational method was proposed for predicting deleterious nsSNPs in signal peptides based on random forest (RF) by incorporating position specific scoring matrix (PSSM) profile, SignalP score and physicochemical properties. These features were optimized by the maximum relevance minimum redundancy (mRMR) method. Then, a cost matrix was used to minimize the effect of the imbalanced data classification problem that usually occurred in nsSNPs prediction. The method achieved an overall accuracy of 84.5% and the area under the ROC curve (AUC) of 0.822 by Jackknife test, when the optimal subset included 10 features. Furthermore, on the same dataset, we compared our predictor with other existing methods, including R-score-based method and D-score-based methods, and the result of our method was superior to those of the two methods. The satisfactory performance suggests that our method is effective in predicting the deleterious nsSNPs in signal peptides. PMID:22277674

Qin, Wenli; Li, Yizhou; Li, Juan; Yu, Lezheng; Wu, Di; Jing, Runyu; Pu, Xuemei; Guo, Yanzhi; Li, Menglong



Predicting miRNA's target from primary structure by the nearest neighbor algorithm.  


We used a machine learning method, the nearest neighbor algorithm (NNA), to learn the relationship between miRNAs and their target proteins, generating a predictor which can then judge whether a new miRNA-target pair is true or not. We acquired 198 positive (true) miRNA-target pairs from Tarbase and the literature, and generated 4,888 negative (false) pairs through random combination. A 0/1 system and the frequencies of single nucleotides and di-nucleotides were used to encode miRNAs into vectors while various physicochemical parameters were used to encode the targets. The NNA was then applied, learning from these data to produce a predictor. We implemented minimum redundancy maximum relevance (mRMR) and properties forward selection (PFS) to reduce the redundancy of our encoding system, obtaining 91 most efficient properties. Finally, via the Jackknife cross-validation test, we got a positive accuracy of 69.2% and an overall accuracy of 96.0% with all the 253 properties. Besides, we got a positive accuracy of 83.8% and an overall accuracy of 97.2% with the 91 most efficient properties. A web-server for predictions is also made available at PMID:20041294

Lin, Kao; Qian, Ziliang; Lu, Lin; Lu, Lingyi; Lai, Lihui; Gu, Jieyi; Zeng, Zhenbing; Li, Haipeng; Cai, Yudong



Classification of transcription factors using protein primary structure.  


The transcription factor (TF) is a protein that binds DNA at specific site to help regulate the transcription from DNA to RNA. The mechanism of transcriptional regulatory can be much better understood if the category of transcription factors is known. We introduce a system which can automatically categorize transcription factors using their primary structures. A feature analysis strategy called "mRMR" (Minimum Redundancy, Maximum Relevance) is used to analyze the contribution of the TF properties towards the TF classification. mRMR is coupled with forward feature selection to choose an optimized feature subset for the classification. TF properties are composed of the amino acid composition and the physiochemical characters of the proteins. These properties will generate over a hundred features/parameters. We put all the features/parameters into a classifier, called NNA (nearest neighbor algorithm), for the classification. The classification accuracy is 93.81%, evaluated by a Jackknife test. Feature analysis using mRMR algorithm shows that secondary structure, amino acid composition and hydrophobicity are the most relevant features for classification. A free online classifier is available at PMID:20394581

Yang, Xin-Yi; Shi, Xiao-He; Meng, Xin; Li, Xin-Lei; Lin, Kao; Qian, Zi-Liang; Feng, Kai-Yan; Kong, Xiang-Yin; Cai, Yu-Dong



Predicting subcellular location of proteins using integrated-algorithm method.  


Protein's subcellular location, which indicates where a protein resides in a cell, is an important characteristic of protein. Correctly assigning proteins to their subcellular locations would be of great help to the prediction of proteins' function, genome annotation, and drug design. Yet, in spite of great technical advance in the past decades, it is still time-consuming and laborious to experimentally determine protein subcellular locations on a high throughput scale. Hence, four integrated-algorithm methods were developed to fulfill such high throughput prediction in this article. Two data sets taken from the literature (Chou and Elrod, Protein Eng 12:107-118, 1999) were used as training set and test set, which consisted of 2,391 and 2,598 proteins, respectively. Amino acid composition was applied to represent the protein sequences. The jackknife cross-validation was used to test the training set. The final best integrated-algorithm predictor was constructed by integrating 10 algorithms in Weka (a software tool for tackling data mining tasks) based on an mRMR (Minimum Redundancy Maximum Relevance) method. It achieves correct rates of 77.83% and 80.56% for the training set and test set, respectively, which is better than all of the 60 algorithms collected in Weka. This predicting software is available upon request. PMID:19662505

Cai, Yu-Dong; Lu, Lin; Chen, Lei; He, Jian-Feng



A novel sequence-based method for phosphorylation site prediction with feature selection and analysis.  


Phosphorylation is one of the most important post-translational modifications, and the identification of protein phosphorylation sites is particularly important for studying disease diagnosis. However, experimental detection of phosphorylation sites is labor intensive. It would be beneficial if computational methods were available to provide an extra reference for the phosphorylation sites. Here we developed a novel sequence-based method for serine, threonine, and tyrosine phosphorylation site prediction. The Nearest Neighbor algorithm was employed as the prediction engine. The peptides around the phosphorylation sites with a fixed length of thirteen amino acid residues were extracted via a sliding window along the protein chains concerned. Each of these peptides was coded into a vector with 6,072 features, derived from the Amino Acid Index (AAIndex) database, for the classification/detection. Incremental Feature Selection, a feature selection algorithm based on the Maximum Relevancy Minimum Redundancy (mRMR) method, was used to select a compact feature set for a further improvement of the classification performance. Three predictors were established for identifying the three types of phosphorylation sites, achieving overall accuracies of 66.64%, 66.11% and 66.69%, respectively. These rates were obtained by rigorous jackknife cross-validation tests. PMID:21919857

He, Zhi-Song; Shi, Xiao-He; Kong, Xiang-Ying; Zhu, Yu-Bei; Chou, Kuo-Chen



Prediction of membrane protein types in a hybrid space.  


Prediction of the types of membrane proteins is of great importance both for genome-wide annotation and for experimental researchers to understand proteins' functions. We describe a new strategy for the prediction of the types of membrane proteins using the Nearest Neighbor Algorithm. We introduced a bipartite feature space consisting of two kinds of disjoint vectors, proteins' domain profile and proteins' physiochemical characters. Jackknife cross validation test shows that a combination of both features greatly improves the prediction accuracy. Furthermore, the contribution of the physiochemical features to the classification of membrane proteins has also been explored using the feature selection method called "mRMR" (Minimum Redundancy, Maximum Relevance) (IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27 (8), 1226-1238). A more compact set of features that are mostly contributive to membrane protein classification are obtained. The analyses highlighted both hydrophobicity and polarity as the most important features. The predictor with 56 most contributive features achieves an acceptable prediction accuracy of 87.02%. Online prediction service is available freely on our Web site PMID:18260610

Jia, Peilin; Qian, Ziliang; Feng, Kaiyan; Lu, Wencong; Li, Yixue; Cai, Yudong



Position-specific analysis and prediction of protein pupylation sites based on multiple features.  


Pupylation is one of the most important posttranslational modifications of proteins; accurate identification of pupylation sites will facilitate the understanding of the molecular mechanism of pupylation. Besides the conventional experimental approaches, computational prediction of pupylation sites is highly desirable for its convenience and speed. In this study, we developed a novel predictor to predict the pupylation sites. First, the maximum relevance minimum redundancy (mRMR) and incremental feature selection methods were applied to five kinds of features to select the optimal feature set. Then the prediction model was built based on the optimal feature set with the assistance of the support vector machine algorithm. As a result, the overall jackknife success rate by the new predictor on a newly constructed benchmark dataset was 0.764, and the Matthews correlation coefficient was 0.522, indicating a good prediction. Feature analysis showed that all feature types contributed to the prediction of protein pupylation sites. Further site-specific feature analysis revealed that the features of sites surrounding the central lysine contributed more to the determination of pupylation sites than the other sites. PMID:24066285

Zhao, Xiaowei; Dai, Jiangyan; Ning, Qiao; Ma, Zhiqiang; Yin, Minghao; Sun, Pingping



Effects of Taxon Sampling in Reconstructions of Intron Evolution  

PubMed Central

Introns comprise a considerable portion of eukaryotic genomes; however, their evolution is understudied. Numerous works of the last years largely disagree on many aspects of intron evolution. Interpretation of these differences is hindered because different algorithms and taxon sampling strategies were used. Here, we present the first attempt of a systematic evaluation of the effects of taxon sampling on popular intron evolution estimation algorithms. Using the “taxon jackknife” method, we compared the effect of taxon sampling on the behavior of intron evolution inferring algorithms. We show that taxon sampling can dramatically affect the inferences and identify conditions where algorithms are prone to systematic errors. Presence or absence of some key species is often more important than the taxon sampling size alone. Criteria of representativeness of the taxonomic sampling for reliable reconstructions are outlined. Presence of the deep-branching species with relatively high intron density is more important than sheer number of species. According to these criteria, currently available genomic databases are representative enough to provide reliable inferences of the intron evolution in animals, land plants, and fungi, but they underrepresent many groups of unicellular eukaryotes, including the well-studied Alveolata.

Nikitin, Mikhail A.; Aleoshin, Vladimir V.



Inference of hazel grouse population structure using multilocus data: a landscape genetic approach.  


In conservation and management of species it is important to make inferences about gene flow, dispersal and population structure. In this study, we used 613 georeferenced tissue samples from hazel grouse (Bonasa bonasia) where each individual was genotyped at 12 microsatellite loci to make inference on population genetic structure, gene flow and dispersal in northern Sweden. Observed levels of genetic diversity suggest that Swedish hazel grouse do not suffer loss of genetic diversity compared with other grouse species. We found significant F(IS) (deviation from Hardy-Weinberg expectations) over the entire sample using jack-knifed estimators over loci, which is most likely explained by a Wahlund effect. With the use of spatial autocorrelation methods, we detected significant isolation by distance among individuals. Neighbourhood size was estimated in the order of 62-158 individuals corresponding to a dispersal distance of 950-1500 m. Using a spatial statistical model for landscape genetics to infer the number of populations and the spatial location of genetic discontinuities between these populations we found indications that Swedish hazel grouse are divided into a northern and a southern population. We could not find a sharp border between these two populations and none of the observed borders appeared to coincide with any potential geographical barriers. These results imply that gene flow appears somewhat unrestricted in the boreal taiga forests of northern Sweden and that the two populations of hazel grouse in Sweden may be explained by the post-glacial reinvasion history of the Scandinavian Peninsula. PMID:18827838

Sahlsten, J; Thörngren, H; Höglund, J



Identification of bacteriophage virion proteins by the ANOVA feature selection and analysis.  


The bacteriophage virion proteins play extremely important roles in the fate of host bacterial cells. Accurate identification of bacteriophage virion proteins is very important for understanding their functions and clarifying the lysis mechanism of bacterial cells. In this study, a new sequence-based method was developed to identify phage virion proteins. In the new method, the protein sequences were initially formulated by the g-gap dipeptide compositions. Subsequently, the analysis of variance (ANOVA) with incremental feature selection (IFS) was used to search for the optimal feature set. It was observed that, in jackknife cross-validation, the optimal feature set including 160 optimized features can produce the maximum accuracy of 85.02%. By performing feature analysis, we found that the correlation between two amino acids with one gap was more important than other correlations for phage virion protein prediction and that some of the 1-gap dipeptides were important and mainly contributed to the virion protein prediction. This analysis will provide novel insights into the function of phage virion proteins. On the basis of the proposed method, an online web-server, PVPred, was established and can be freely accessed from the website. We believe that PVPred will become a powerful tool to study phage virion proteins and to guide the related experimental validations. PMID:24931825

Ding, Hui; Feng, Peng-Mian; Chen, Wei; Lin, Hao



A novel computational method to predict transcription factor DNA binding preference.  


Transcription factor binds to sequence specific sites in regulatory region to control nearby gene's expression. It is termed as the major regulator of transcription. However, identifying DNA binding preference of transcription factors systematically is still a challenge. By using the nearest neighbor algorithm, a novel computational approach was developed to predict transcription factor DNA binding preference based on the gene ontology [M. Ashburner, C.A. Ball, J.A. Blake, D. Botstein, H. Butler, J.M. Cherry, A.P. Davis, K. Dolinski, S.S. Dwight, J.T. Eppig, M.A. Harris, D.P. Hill, L. Issel-Tarver, A. Kasarskis, S. Lewis, J.C. Matese, J.E. Richardson, M. Ringwald, G.M. Rubin, G. Sherlock, Gene Ontology: tool for the unification of biology, Nat. Genet. 25 (2000) 25-29.] and 0/1 encoding system of nucleotide. The overall success rate of Jackknife cross-validation test for our predictor reaches 76.6%, which indicates the DNA binding preference is closely correlated with its biological functions and computational method developed in this contribution could be a powerful tool to investigate transcription factor DNA binding preference, especially for those novel transcription factors with little prior knowledge on its DNA binding preference. PMID:16899225

Qian, Ziliang; Cai, Yu-Dong; Li, Yixue



A computational approach to identify genes for functional RNAs in genomic sequences  

PubMed Central

Currently there is no successful computational approach for identification of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach using neural networks and support vector machines to extract common features among known RNAs for prediction of new RNA genes in the unannotated regions of prokaryotic and archaeal genomes. The Escherichia coli genome was used for development, but we have applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80–90% accurate in jackknife testing experiments for bacteria and 90–99% for hyperthermophilic archaea. We also achieved a significant improvement in accuracy by combining these predictions with those obtained using a second set of parameters consisting of known RNA sequence motifs and the calculated free energy of folding. Several known fRNAs not included in the training datasets were identified as well as several hundred predicted novel RNAs. These studies indicate that there are many unidentified RNAs in simple genomes that can be predicted computationally as a precursor to experimental study. Public access to our RNA gene predictions and an interface for user predictions is available via the web.

Carter, Richard J.; Dubchak, Inna; Holbrook, Stephen R.



PSSP-RFE: Accurate Prediction of Protein Structural Class by Recursive Feature Extraction from PSI-BLAST Profile, Physical-Chemical Property and Functional Annotations  

PubMed Central

Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets.

Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi
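
Several records in this listing report accuracy from a "jackknife test", in which each sample is held out once, the predictor is trained on the remaining samples, and the held-out sample is then classified. The following is a minimal sketch of that procedure, not the authors' implementation: the function names are hypothetical, and a 1-nearest-neighbour rule stands in for the SVM described in the abstract.

```python
def nearest_neighbour_predict(train_x, train_y, query):
    """Stand-in classifier (assumption): label of the closest training point."""
    def sq_dist(point):
        return sum((a - b) ** 2 for a, b in zip(point, query))
    best = min(range(len(train_x)), key=lambda i: sq_dist(train_x[i]))
    return train_y[best]


def jackknife_accuracy(features, labels, predict=nearest_neighbour_predict):
    """Leave-one-out ("jackknife test") accuracy of a classifier."""
    correct = 0
    for i in range(len(features)):
        # Train on every sample except the i-th one.
        train_x = features[:i] + features[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        if predict(train_x, train_y, features[i]) == labels[i]:
            correct += 1
    return correct / len(features)


if __name__ == "__main__":
    xs = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
    ys = ["a", "a", "a", "b", "b", "b"]
    print(jackknife_accuracy(xs, ys))
```

Any classifier with the same `predict(train_x, train_y, query)` shape can be passed in; the held-out loop itself does not depend on the model.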



Statistical analysis of galaxy surveys - IV. An objective way to quantify the impact of superstructures on galaxy clustering statistics  

NASA Astrophysics Data System (ADS)

For galaxy clustering to provide robust constraints on cosmological parameters and galaxy formation models, it is essential to make reliable estimates of the errors on clustering measurements. We present a new technique, based on a spatial jackknife (JK) resampling, which provides an objective way to estimate errors on clustering statistics. Our approach allows us to set the appropriate size for the JK subsamples. The method also provides a means to assess the impact of individual regions on the measured clustering, and thereby to establish whether or not a given galaxy catalogue is dominated by one or several large structures, which would prevent it from being considered a ‘fair sample’. We apply this methodology to the two- and three-point correlation functions measured from a volume-limited sample of M* galaxies drawn from Data Release 7 of the Sloan Digital Sky Survey (SDSS). The frequency of JK subsample outliers in the data is shown to be consistent with that seen in large N-body simulations of clustering in the cosmological constant plus cold dark matter cosmology. We also present a comparison of the three-point correlation function in SDSS and Two-degree-Field Galaxy Redshift Survey using this approach and find consistent measurements between the two samples.

Norberg, P.; Gaztañaga, E.; Baugh, C. M.; Croton, D. J.
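
As a rough illustration of the delete-one jackknife error estimate that this kind of analysis builds on, the sketch below removes one subsample at a time, recomputes the statistic, and scales the scatter of the results by (n - 1)/n. It is a generic sketch under stated assumptions, not the authors' spatial method: the function name is hypothetical, and each "region" is reduced to a single number for brevity.

```python
import math


def jackknife_std_error(region_values, statistic):
    """Delete-one jackknife standard error of `statistic`.

    Returns the standard error and the list of leave-one-out estimates;
    an unusually discrepant entry in that list flags an influential region.
    """
    n = len(region_values)
    leave_one_out = [
        statistic(region_values[:i] + region_values[i + 1:]) for i in range(n)
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in leave_one_out)
    return math.sqrt(variance), leave_one_out


def mean(values):
    return sum(values) / len(values)


if __name__ == "__main__":
    error, estimates = jackknife_std_error([1.0, 2.0, 3.0, 4.0, 5.0], mean)
    print(error, estimates)
```

For the sample mean, this estimate coincides with the usual standard error (sample standard deviation divided by the square root of n), which is a convenient sanity check.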



Prediction of space sickness in astronauts from preflight fluid, electrolyte, and cardiovascular variables and Weightless Environmental Training Facility (WETF) training  

NASA Technical Reports Server (NTRS)

Nine preflight variables related to fluid, electrolyte, and cardiovascular status from 64 first-time Shuttle crewmembers were differentially weighted by discrimination analysis to predict the incidence and severity of each crewmember's space sickness as rated by NASA flight surgeons. The nine variables are serum uric acid, red cell count, environmental temperature at the launch site, serum phosphate, urine osmolality, serum thyroxine, sitting systolic blood pressure, calculated blood volume, and serum chloride. Using two methods of cross-validation on the original samples (jackknife and a stratified random subsample), these variables enable the prediction of space sickness incidence (NONE or SICK) with 80 percent success and space sickness severity (NONE, MILD, MODERATE, or SEVERE) with 59 percent success by one method of cross-validation and 67 percent by another method. Addition of a tenth variable, hours spent in the Weightlessness Environment Training Facility (WETF), did not improve the prediction of space sickness incidences but did improve the prediction of space sickness severity to 66 percent success by the first method of cross-validation of original samples and to 71 percent by the second method. Results to date suggest the presence of predisposing physiologic factors to space sickness that implicate fluid shift etiology. The data also suggest that prior exposure to fluid shift during WETF training may produce some circulatory pre-adaptation to fluid shifts in weightlessness that results in a reduction of space sickness severity.

Simanonok, K.; Mosely, E.; Charles, J.



PSSP-RFE: accurate prediction of protein structural class by recursive feature extraction from PSI-BLAST profile, physical-chemical property and functional annotations.  


Protein structure prediction is critical to functional annotation of the massively accumulated biological sequences, which prompts an imperative need for the development of high-throughput technologies. As a first and key step in protein structure prediction, protein structural class prediction becomes an increasingly challenging task. Amongst most homological-based approaches, the accuracies of protein structural class prediction are sufficiently high for high similarity datasets, but still far from being satisfactory for low similarity datasets, i.e., below 40% in pairwise sequence similarity. Therefore, we present a novel method for accurate and reliable protein structural class prediction for both high and low similarity datasets. This method is based on Support Vector Machine (SVM) in conjunction with integrated features from position-specific score matrix (PSSM), PROFEAT and Gene Ontology (GO). A feature selection approach, SVM-RFE, is also used to rank the integrated feature vectors through recursively removing the feature with the lowest ranking score. The definitive top features selected by SVM-RFE are input into the SVM engines to predict the structural class of a query protein. To validate our method, jackknife tests were applied to seven widely used benchmark datasets, reaching overall accuracies between 84.61% and 99.79%, which are significantly higher than those achieved by state-of-the-art tools. These results suggest that our method could serve as an accurate and cost-effective alternative to existing methods in protein structural classification, especially for low similarity datasets. PMID:24675610

Li, Liqi; Cui, Xiang; Yu, Sanjiu; Zhang, Yuan; Luo, Zhong; Yang, Hua; Zhou, Yue; Zheng, Xiaoqi



Tomographic inversion for three-dimensional velocity structure at Mount St. Helens using earthquake data  

NASA Astrophysics Data System (ADS)

Tomographic inversion is applied to 17,659 P phase observations at 21 stations from 2023 earthquakes in the vicinity of Mount St. Helens to study the three-dimensional velocity structure. Block size for the inversion is 2 km horizontally and 2 km or more vertically. Locations of hypocenters are assumed known and are based on a reference one-dimensional, layered velocity structure. A conjugate gradient technique (LSQR) is used to invert the large sparse system of equations, augmented by regularization with a Laplacian roughening matrix. Resolution is estimated by computing the impulse response of the inversion for various critical locations, and uncertainties of the estimates are determined by a jackknife approach. The results of the inversion show a remarkable correlation with known geological and geophysical features. The Spirit Lake and Spud Mt. plutons are characterized by high-velocity regions extending to approximately 9 km depth. The St. Helens seismic zone, a band of diffuse seismicity extending NNW from the volcano is evident as a prominent low-velocity lineation. The change in character of the velocity anomalies south of St. Helens corresponds well with the near cessation of seismic activity there. A low-velocity anomaly beneath the crater from 6 to 16 km depths may represent modern magma accumulations.

Lees, Jonathan M.; Crosson, Robert S.



iMethyl-PseAAC: Identification of Protein Methylation Sites via a Pseudo Amino Acid Composition Approach  

PubMed Central

Before becoming the native proteins during the biosynthesis, their polypeptide chains created by ribosome's translating mRNA will undergo a series of “product-forming” steps, such as cutting, folding, and posttranslational modification (PTM). Knowledge of PTMs in proteins is crucial for dynamic proteome analysis of various human diseases and epigenetic inheritance. One of the most important PTMs is the Arg- or Lys-methylation that occurs on arginine or lysine, respectively. Given a protein, which site of its Arg (or Lys) can be methylated, and which site cannot? This is the first important problem for understanding the methylation mechanism and drug development in depth. With the avalanche of protein sequences generated in the postgenomic age, its urgency has become self-evident. To address this problem, we proposed a new predictor, called iMethyl-PseAAC. In the prediction system, a peptide sample was formulated by a 346-dimensional vector, formed by incorporating its physicochemical, sequence evolution, biochemical, and structural disorder information into the general form of pseudo amino acid composition. It was observed by the rigorous jackknife test and independent dataset test that iMethyl-PseAAC was superior to any of the existing predictors in this area.

Qiu, Wang-Ren; Lin, Wei-Zhong; Chou, Kuo-Chen



Predicting red wolf release success in the southeastern United States  

USGS Publications Warehouse

Although the red wolf (Canis rufus) was once found throughout the southeastern United States, indiscriminate killing and habitat destruction reduced its range to a small section of coastal Texas and Louisiana. Wolves trapped from 1973 to 1980 were taken to establish a captive breeding program that was used to repatriate 2 mainland and 3 island red wolf populations. We collected data from 320 red wolf releases in these areas and classified each as a success or failure based on survival and reproductive criteria, and whether recaptures were necessary to resolve conflicts with humans. We evaluated the relations between release success and conditions at the release sites, characteristics of released wolves, and release procedures. Although <44% of the variation in release success was explained, model performance based on jackknife tests indicated a 72-80% correct prediction rate for the 4 operational models we developed. The models indicated that success was associated with human influences on the landscape and the level of wolf habituation to humans prior to release. We applied the models to 31 prospective areas for wolf repatriation and calculated an index of release success for each area. Decision-makers can use these models to objectively rank prospective release areas and compare strengths and weaknesses of each.

Van Manen, F. T.; Crawford, B. A.; Clark, J. D.



Deriving probabilistic regional envelope curves with two pooling methods  

NASA Astrophysics Data System (ADS)

Summary: A probabilistic regional envelope curve (PREC) assigns a recurrence interval to a regional envelope curve. A central point of this method is the determination of homogeneous regions according to the index flood hypothesis. A flood discharge associated with the recurrence interval (PREC flood quantile) is estimated for each gauge of a homogeneous region. In this study, the influence of two pooling methods on PREC for a large group of catchments located in the south-east of Germany is investigated. Firstly, using cluster analysis, fixed homogeneous regions are derived. Secondly, the Region of Influence (RoI) approach is combined with PREC. The sensitivity of PREC flood quantiles with respect to pooling groups is evaluated. Different candidate sets of catchment descriptors are used to derive pooling groups for both pooling methods. Each pooling group is checked by a homogeneity test. PRECs are then constructed for all homogeneous regions. The ensemble of PREC realisations reveals the sensitivity of the PREC flood quantiles. A comparison with the traditional index flood method ascertains the suitability of the pooling methods. A leave-one-out jackknifing procedure points out a similar performance of cluster analysis and RoI. Furthermore, a comparison of different degrees of heterogeneity for deriving pooling groups reveals that the performance of PREC for ungauged catchments decreases in more heterogeneous pooling groups.

Guse, Björn; Thieken, Annegret H.; Castellarin, Attilio; Merz, Bruno



Driver Assistance System for Passive Multi-Trailer Vehicles with Haptic Steering Limitations on the Leading Unit  

PubMed Central

Driving vehicles with one or more passive trailers has difficulties in both forward and backward motion due to inter-unit collisions, jackknife, and lack of visibility. Consequently, advanced driver assistance systems (ADAS) for multi-trailer combinations can be beneficial to accident avoidance as well as to driver comfort. The ADAS proposed in this paper aims to prevent unsafe steering commands by means of a haptic handwheel. Furthermore, when driving in reverse, the steering-wheel and pedals can be used as if the vehicle was driven from the back of the last trailer with visual aid from a rear-view camera. This solution, which can be implemented in drive-by-wire vehicles with hitch angle sensors, profits from two methods previously developed by the authors: safe steering by applying a curvature limitation to the leading unit, and a virtual tractor concept for backward motion that includes the complex case of set-point propagation through on-axle hitches. The paper addresses system requirements and provides implementation details to tele-operate two different off- and on-axle combinations of a tracked mobile robot pulling and pushing two dissimilar trailers.

Morales, Jesus; Mandow, Anthony; Martinez, Jorge L.; Reina, Antonio J.; Garcia-Cerezo, Alfonso



Resampling Methodologies and the Estimation of Parameters of Rare Events  

NASA Astrophysics Data System (ADS)

In extreme value theory the extremal index is a key parameter that enables a straightforward extension of the classic results for the independent case to stationary processes, measuring the degree of local dependence in the largest observations. Its estimation is important not only by itself but also because of its effect on the estimation of other parameters of extreme events. The estimators considered in the literature, despite having good asymptotic properties, show a strong dependence on the high level $u_n$, presenting a high variance for high levels and a high bias when the level decreases. It has been seen that the bias is the dominant component of the mean squared error of the semiparametric estimators presented in the literature. Resampling techniques have been applied in situations where classical statistical procedures are difficult to use, but for a dependent setup, the resampling has to be done using blocks of observations. An adaptive resampling approach, based on block-bootstrap and Jackknife-After-Bootstrap, is here considered for estimating the optimal block size in order to obtain a "good" estimator of the bias. This work is still in progress and the main objective of this first stage is to estimate the bias for obtaining a more stable path for the extremal index estimates. A simulation study as well as an application to daily returns of the S&P 500 stock index is presented.

Gomes, Dora Prata; Neves, Manuela
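
The point that dependent data must be resampled in blocks can be sketched with a moving-block bootstrap, shown below. This is only an illustration of block resampling under stated assumptions: the function name and the fixed block length are hypothetical, and the adaptive block-size selection and Jackknife-After-Bootstrap step described in the abstract are not implemented.

```python
import random


def moving_block_bootstrap(series, block_length, rng):
    """Resample `series` by concatenating randomly chosen contiguous blocks.

    Keeping observations together in blocks preserves the short-range
    dependence that resampling single observations would destroy.
    """
    n = len(series)
    possible_starts = range(n - block_length + 1)
    resample = []
    while len(resample) < n:
        start = rng.choice(possible_starts)
        resample.extend(series[start:start + block_length])
    return resample[:n]  # trim the last block if it overshoots


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    print(moving_block_bootstrap(list(range(10)), 5, rng))
```

Repeating the call many times and recomputing an estimator on each resample gives a bootstrap distribution from which bias and variance can be approximated.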



Predicting Drugs Side Effects Based on Chemical-Chemical Interactions and Protein-Chemical Interactions  

PubMed Central

A drug side effect is an undesirable effect which occurs in addition to the intended therapeutic effect of the drug. The unexpected side effects that many patients suffer from are the major causes of large-scale drug withdrawal. To address the problem, it is highly demanded by pharmaceutical industries to develop computational methods for predicting the side effects of drugs. In this study, a novel computational method was developed to predict the side effects of drug compounds by hybridizing the chemical-chemical and protein-chemical interactions. Compared to most of the previous works, our method can rank the potential side effects for any query drug according to their predicted level of risk. A training dataset and test datasets were constructed from the benchmark dataset that contains 835 drug compounds to evaluate the method. By a jackknife test on the training dataset, the 1st order prediction accuracy was 86.30%, while it was 89.16% on the test dataset. It is expected that the new method may become a useful tool for drug design, and that the findings obtained by hybridizing various interactions in a network system may provide useful insights for conducting in-depth pharmacological research as well, particularly at the level of systems biomedicine.

Chen, Lei; Huang, Tao; Zhang, Jian; Zheng, Ming-Yue; Feng, Kai-Yan; Cai, Yu-Dong; Chou, Kuo-Chen



Prediction of Drugs Target Groups Based on ChEBI Ontology  

PubMed Central

Most drugs have beneficial as well as adverse effects and exert their biological functions by adjusting and altering the functions of their target proteins. Thus, knowledge of drugs target proteins is essential for the improvement of therapeutic effects and mitigation of undesirable side effects. In the study, we proposed a novel prediction method based on drug/compound ontology information extracted from ChEBI to identify drugs target groups from which the kind of functions of a drug may be deduced. By collecting data in KEGG, a benchmark dataset consisting of 876 drugs, categorized into four target groups, was constructed. To evaluate the method more thoroughly, the benchmark dataset was divided into a training dataset and an independent test dataset. It is observed by jackknife test that the overall prediction accuracy on the training dataset was 83.12%, while it was 87.50% on the test dataset—the predictor exhibited an excellent generalization. The good performance of the method indicates that the ontology information of the drugs contains rich information about their target groups, and the study may become an inspiration to solve the problems of this sort and bridge the gap between ChEBI ontology and drugs target groups.

Gao, Yu-Fei; Chen, Lei; Huang, Guo-Hua; Zhang, Tao; Feng, Kai-Yan; Li, Hai-Peng; Jiang, Yang



The Effects of Topography on Shortwave solar radiation modelling: The JGrass-NewAge System way  

NASA Astrophysics Data System (ADS)

NewAGE-SwRB and NewAGE-DEC-MOD's are the two components of the JGrass-NewAge hydrological modeling system that estimate incident shortwave radiation. Shortwave solar radiation at the land surface is influenced by topographic parameters such as slope, aspect, altitude, and sky-view factor; hence, detailed analysis and discussion of their effects is a way to improve the modeling approach. NewAGE-SwRB accounts for slope, aspect, shadow, and the topographical information of the sites to estimate the cloudless irradiance. The first part of the paper is a topographic parameter analysis using the uDig GIS spatial toolbox, which is integrated in the JGrass-NewAge system, and indicates the effect of each topographic parameter on the shortwave radiation. A statistical study of station topographic geometry (slope, aspect, altitude, and sky-view factor) and of the correlation between pairs of station measurements was carried out to get a closer look at the impact of rugged topography. Jackknife correlation coefficients have been used to analyze the estimation bias between shortwave radiation at different topographic positions, thereby helping to develop generalized linear models that explain the impacts of those topographic features. In addition to the topographical parameters that NewAGE-SwRB accounts for, three parameters are needed to run the NewAGE-DEC-MOD's component: an estimate of the visibility extent (V), the single-scattering albedo, i.e., the fraction of incident energy scattered to total attenuation by aerosols (Wo), and the fraction of forward scattering to total scattering (Fs). Sufficient knowledge of the magnitude and spatial distribution of these parameters is crucial. In this paper, the particle swarm component of the NewAge system is used for automatic calibration of the NewAGE-DEC-MOD's parameters for each station, based on different optimization and objective functions.
Finally, the estimated parameters for all measurement stations are interpolated in space, and kriging spatial interpolation techniques have been applied to give their spatial structure. Different variogram models were determined to explain the spatial correlogram of the parameters, and were in turn used to estimate spatially distributed parameters using kriging. Jackknife kriging, which is a rekriging of each station by eliminating one sample from the original sample set and then taking the average of the rekriged estimates, has been used to test the practical validity of the model. The method gives better estimates and also yields a standard deviation that serves as a useful indicator of the uncertainty associated with station estimates. This analysis helps to understand the spatial variability of radiative transmittance with position, height, aspect, slope, and other topographic features. Two basin shortwave radiation data sets (one in flat topography and the other in mountainous topography) are used to test the statistical analysis of the modeling components of the JGrass-NewAGE model system.

Abera, Wuletawu; Formetta, Giuseppe; Rigon, Riccardo
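
The jackknife kriging step described in this abstract (re-estimate with one sample removed at a time, average the re-estimates, and use their spread as an uncertainty indicator) can be sketched as follows. This is a sketch under loud assumptions: inverse-distance weighting is used as a simple stand-in for kriging, no variogram is fitted, and all function names are hypothetical.

```python
import math


def idw_estimate(samples, target, power=2.0):
    """Inverse-distance-weighted estimate at `target` (stand-in for kriging).

    `samples` is a list of (x, y, value) tuples.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in samples:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # target coincides with a sample
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


def jackknife_interpolation(samples, target, estimator=idw_estimate):
    """Mean and standard deviation of the leave-one-sample-out estimates."""
    estimates = [
        estimator(samples[:i] + samples[i + 1:], target)
        for i in range(len(samples))
    ]
    n = len(estimates)
    mean = sum(estimates) / n
    spread = math.sqrt(sum((e - mean) ** 2 for e in estimates) / (n - 1))
    return mean, spread


if __name__ == "__main__":
    stations = [(1, 0, 1.0), (-1, 0, 2.0), (0, 1, 3.0), (0, -1, 4.0)]
    print(jackknife_interpolation(stations, (0, 0)))
```

A kriging routine with the same `estimator(samples, target)` shape could be substituted for `idw_estimate` without changing the jackknife loop.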



Prey stage preference and functional response of Euseius hibisci to Tetranychus urticae (Acari: Phytoseiidae, Tetranychidae).  


The aims of this study were: (a) determine the prey stage preference of female Euseius hibisci (Chant) (Phytoseiidae) at constant densities of different stages of Tetranychus urticae Koch (Tetranychidae), (b) assess the functional response of the predator females to the varying densities of eggs, larvae, or protonymphs of T. urticae, and (c) estimate the functional response of E. hibisci when pollen of Ligustrum ovalifolium was present as well. We conducted experiments on excised pieces of strawberry leaf arenas (Fragaria ananassa) under laboratory conditions of 25 ± 2 °C, 60 ± 5% RH and 12 h photophase. Our results indicated that the predator consumed significantly more prey eggs than other prey stages. Consumption of prey deutonymphs and adults was so low that they were excluded from the non-choice functional response experiments. The functional response on all food items was of type II. The two parameters of the functional response were estimated for each prey type by means of the adjusted non-linear regression model. The highest estimated value a' (instantaneous rate of discovery) and the lowest value of Th (handling time, including digestion) were found for the predator feeding on prey eggs, and a' was lowest and Th highest when fed protonymphs. Using the jack-knife method, the values for the functional response parameters were estimated. The values of a' and Th produced by the model were similar among all prey types except for the eggs, which were different. Using pollen simultaneously with prey larvae decreased the consumption of the latter over the full range of prey densities. The suitability of this predator for biological control of T. urticae on strawberry is discussed. PMID:15651524

Badii, Mohammad H; Hernández-Ortiz, Emilio; Flores, Adriana E; Landeros, Jerónimo
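
For orientation, a type II functional response is commonly written as Holling's disc equation, and the jackknife estimate of a fitted parameter is the mean of its delete-one pseudo-values. The abstract does not state which regression model was fitted, so the first equation is an assumption; the symbols N (prey density), T (exposure time), and N_a (prey attacked) are introduced here for illustration, while a' and T_h follow the abstract.

```latex
% Holling's disc equation for a type II response (assumed form)
N_a = \frac{a' N T}{1 + a' T_h N}

% Delete-one jackknife for a parameter \theta \in \{a', T_h\} fitted to n observations:
% \hat{\theta} is the full-sample fit, \hat{\theta}_{(-i)} the fit without observation i
\tilde{\theta}_i = n\,\hat{\theta} - (n-1)\,\hat{\theta}_{(-i)}, \qquad
\hat{\theta}_{\mathrm{JK}} = \frac{1}{n}\sum_{i=1}^{n}\tilde{\theta}_i, \qquad
\widehat{\mathrm{SE}} = \sqrt{\frac{1}{n(n-1)}\sum_{i=1}^{n}\bigl(\tilde{\theta}_i - \hat{\theta}_{\mathrm{JK}}\bigr)^2}
```

Comparing the jackknife estimates and standard errors across prey types is one way such parameters can be tested for differences.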



Influence of cell surface hydrophobicity on attachment of Campylobacter to abiotic surfaces.  


This work aimed to investigate the influence of physicochemical properties and prior mode of growth (planktonic or sessile culture) on attachment of 13 Campylobacter jejuni strains and 5 Campylobacter coli strains isolated from chicken samples to three abiotic surfaces: stainless steel, glass and polyurethane. Water contact angle and zeta potential measurements indicated that the strains varied with respect to surface hydrophobicity (17.6 ± 1.5 to 53.0 ± 2.3°) and surface charge (-3.3 ± 0.4 to -15.1 ± 0.5 mV). Individual strains had different attachment abilities to stainless steel and glass (3.79 ± 0.16 to 5.45 ± 0.08 log cell cm⁻²) but did not attach to polyurethane, with one exception. Attachment of Campylobacter to abiotic surfaces significantly correlated with cell surface hydrophobicity (P ≤ 0.007), but not with surface charge (P ≥ 0.507). Cells grown as planktonic and sessile culture generally differed significantly from each other with respect to hydrophobicity and attachment (P < 0.05), but not with respect to surface charge (P > 0.05). Principal component analysis (PCA) clustered strains into three groups (planktonic culture) and two groups (sessile culture) representing those with similar hydrophobicity and attachment. Of the four highly hydrophobic and adherent strains, three were C. coli, suggesting that isolates with greater hydrophobicity and adherence may occur more frequently among C. coli than C. jejuni strains, although this requires further investigation using a larger number of strains. Assignment of pulsed-field gel electrophoresis profiles to PCA groups using Jackknife analysis revealed no overall relationship between bacterial genotypes and bacterial attachment. No relationship between serotype distribution and bacterial attachment was apparent in this study. PMID:21569937

Nguyen, Vu Tuan; Turner, Mark S; Dykes, Gary A



Historical extension of operational NDVI products for livestock insurance in Kenya  

NASA Astrophysics Data System (ADS)

Droughts induce livestock losses that severely affect Kenyan pastoralists. Recent index insurance schemes have the potential of being a viable tool for insuring pastoralists against drought-related risk. Such schemes require as input a forage scarcity (or drought) index that can be reliably updated in near real-time, and that strongly relates to livestock mortality. Generally, a long record (>25 years) of the index is needed to correctly estimate mortality risk and calculate the related insurance premium. Data from current operational satellites used for large-scale vegetation monitoring span over a maximum of 15 years, a time period that is considered insufficient for accurate premium computation. This study examines how operational NDVI datasets compare to, and could be combined with, the non-operational recently constructed 30-year GIMMS AVHRR record (1981-2011) to provide a near-real time drought index with a long term archive for the arid lands of Kenya. We compared six freely available, near-real time NDVI products: five from MODIS and one from SPOT-VEGETATION. Prior to comparison, all datasets were averaged in time for the two vegetative seasons in Kenya, and aggregated spatially at the administrative division level at which the insurance is offered. The feasibility of extending the resulting aggregated drought indices back in time was assessed using jackknifed R² statistics (leave-one-year-out) for the overlapping period 2002-2011. We found that division-specific models were more effective than a global model for linking the division-level temporal variability of the index between NDVI products. Based on our results, good scope exists for historically extending the aggregated drought index, thus providing a longer operational record for insurance purposes. We showed that this extension may have large effects on the calculated insurance premium. Finally, we discuss several possible improvements to the drought index.

Vrieling, Anton; Meroni, Michele; Shee, Apurba; Mude, Andrew G.; Woodard, Joshua; de Bie, C. A. J. M. (Kees); Rembold, Felix
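
A leave-one-year-out R² of the kind mentioned in this abstract can be sketched as below: for each year, a regression linking the two products is fitted without that year, the held-out year is predicted, and the predictions are scored against the observations. This is a minimal sketch assuming a simple linear model and one common definition of the statistic (1 - SSE/SST); the abstract does not give the exact model or formula, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def jackknifed_r2(xs, ys):
    """R² computed from predictions made with each observation held out."""
    predictions = []
    for i in range(len(xs)):
        # Fit without year i, then predict year i.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        predictions.append(slope * xs[i] + intercept)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - sse / sst


if __name__ == "__main__":
    product_a = [0.0, 1.0, 2.0, 3.0, 4.0]   # hypothetical index values
    product_b = [1.0, 3.0, 4.0, 7.0, 9.0]
    print(jackknifed_r2(product_a, product_b))
```

Because every prediction comes from a model that never saw the held-out year, this statistic is lower than the ordinary in-sample R² and can even be negative for a poor model.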



Asymmetric Constriction of Dividing Escherichia coli Cells Induced by Expression of a Fusion between Two Min Proteins.  


The Min system, consisting of MinC, MinD, and MinE, plays an important role in localizing the Escherichia coli cell division machinery to midcell by preventing FtsZ ring (Z ring) formation at cell poles. MinC has two domains, MinCn and MinCc, which both bind to FtsZ and act synergistically to inhibit FtsZ polymerization. Binary fission of E. coli usually proceeds symmetrically, with daughter cells at roughly 180° to each other. In contrast, we discovered that overproduction of an artificial MinCc-MinD fusion protein in the absence of other Min proteins induced frequent and dramatic jackknife-like bending of cells at division septa, with cell constriction predominantly on the outside of the bend. Mutations in the fusion known to disrupt MinCc-FtsZ, MinCc-MinD, or MinD-membrane interactions largely suppressed bending division. Imaging of FtsZ-green fluorescent protein (GFP) showed no obvious asymmetric localization of FtsZ during MinCc-MinD overproduction, suggesting that a downstream activity of the Z ring was inhibited asymmetrically. Consistent with this, MinCc-MinD fusions localized predominantly to segments of the Z ring at the inside of developing cell bends, while FtsA (but not ZipA) tended to localize to the outside. As FtsA is required for ring constriction, we propose that this asymmetric localization pattern blocks constriction of the inside of the septal ring while permitting continued constriction of the outside portion. PMID:24682325

Rowlett, Veronica Wells; Margolin, William



Do adolescent Ecstasy users have different attitudes towards drugs when compared to Marijuana users?  

PubMed Central

Background: Perceived risk and attitudes about the consequences of drug use, perceptions of others' expectations and self-efficacy influence the intent to try drugs and continue drug use once use has started. We examine associations between adolescents’ attitudes and beliefs towards ecstasy use; because most ecstasy users have a history of marijuana use, we estimate the association for three groups of adolescents: non-marijuana/ecstasy users, marijuana users (used marijuana at least once but never used ecstasy) and ecstasy users (used ecstasy at least once). Methods: We used data from 5,049 adolescents aged 12–18 years who had complete weighted data in Round 2 of the Restricted Use Files (RUF) of the National Survey of Parents and Youth (NSPY). Data were analyzed using jackknife weighted multinomial logistic regression models. Results: Adolescent marijuana and ecstasy users were more likely to approve of marijuana and ecstasy use as compared to non-drug using youth. Adolescent marijuana and ecstasy users were more likely to have close friends who approved of ecstasy as compared to non-drug using youth. The magnitudes of these two associations were stronger for ecstasy use than for marijuana use in the final adjusted model. Our final adjusted model shows that approval of marijuana and ecstasy use was more strongly associated with marijuana and ecstasy use in adolescence than perceived risk in using both drugs. Conclusion: Information about the risks and consequences of ecstasy use needs to be presented to adolescents in order to attempt to reduce adolescents’ approval of ecstasy use as well as ecstasy experimentation.

Martins, Silvia S.; Storr, Carla L.; Alexandre, Pierre K.; Chilcoat, Howard D.



Improved classification of lung cancer tumors based on structural and physicochemical properties of proteins using data mining models.  


Detecting divergence between oncogenic tumors plays a pivotal role in cancer diagnosis and therapy. This research work was focused on designing a computational strategy to predict the class of lung cancer tumors from the structural and physicochemical properties (1497 attributes) of protein sequences obtained from genes defined by microarray analysis. The proposed methodology involved the use of hybrid feature selection techniques (gain ratio and correlation based subset evaluators with Incremental Feature Selection) followed by Bayesian Network prediction to discriminate lung cancer tumors as Small Cell Lung Cancer (SCLC), Non-Small Cell Lung Cancer (NSCLC) and the COMMON classes. Moreover, this methodology eliminated the need for extensive data cleansing strategies on the protein properties and revealed the optimal and minimal set of features that contributed to lung cancer tumor classification with an improved accuracy compared to previous work. We also attempted to predict via supervised clustering the possible clusters in the lung tumor data. Our results revealed that supervised clustering algorithms exhibited poor performance in differentiating the lung tumor classes. Hybrid feature selection identified the distribution of solvent accessibility, polarizability and hydrophobicity as the highest ranked features with Incremental feature selection and Bayesian Network prediction generating the optimal Jack-knife cross validation accuracy of 87.6%. Precise categorization of oncogenic genes causing SCLC and NSCLC based on the structural and physicochemical properties of their protein sequences is expected to unravel the functionality of proteins that are essential in maintaining the genomic integrity of a cell and also act as an informative source for drug design, targeting essential protein properties and their composition that are found to exist in lung cancer tumors. PMID:23505559

Ramani, R Geetha; Jacob, Shomona Gracia
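
Several records on this page, including the lung cancer record above, report a "jackknife cross-validation" accuracy. In this usage the term generally means leave-one-out validation: each sample is held out once, the model is fitted on the remaining samples, and the held-out sample is predicted. The sketch below illustrates that loop only. The nearest-neighbour classifier and the toy data are hypothetical stand-ins, not the Bayesian Network or the protein features used in the study.

```python
from math import dist


def nearest_neighbor_predict(train_X, train_y, x):
    """Hypothetical stand-in classifier: label of the closest training point."""
    best = min(range(len(train_X)), key=lambda i: dist(train_X[i], x))
    return train_y[best]


def jackknife_accuracy(X, y, predict=nearest_neighbor_predict):
    """Leave-one-out ("jackknife") accuracy: every sample is held out exactly once."""
    correct = 0
    for i in range(len(X)):
        # Train on everything except sample i.
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Made-up two-class toy data (illustration only).
X = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (0.9, 1.1), (1.1, 0.9)]
y = ["A", "A", "A", "B", "B", "B"]
acc = jackknife_accuracy(X, y)
print(f"jackknife accuracy: {acc:.3f}")
```

Because every sample serves as the test case exactly once, the result does not depend on a random split, which is one reason this test is popular for small benchmark datasets.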




SciTech Connect

We constrain the linear and quadratic bias parameters from the configuration dependence of the three-point correlation function (3PCF) in both redshift and projected space, utilizing measurements of spectroscopic galaxies in the Sloan Digital Sky Survey Main Galaxy Sample. We show that bright galaxies (M_r < -21.5) are biased tracers of mass, measured at a significance of 4.5σ in redshift space and 2.5σ in projected space by using a thorough error analysis in the quasi-linear regime (9-27 h^-1 Mpc). Measurements on a fainter galaxy sample are consistent with an unbiased model. We demonstrate that a linear bias model appears sufficient to explain the galaxy-mass bias of our samples, although a model using both linear and quadratic terms results in a better fit. In contrast, the bias values obtained from the linear model appear in better agreement with the data by inspection of the relative bias and yield implied values of σ_8 that are more consistent with current constraints. We investigate the covariance of the 3PCF, which itself is a measurement of galaxy clustering. We assess the accuracy of our error estimates by comparing results from mock galaxy catalogs to jackknife re-sampling methods. We identify significant differences in the structure of the covariance. However, the impact of these discrepancies appears to be mitigated by an eigenmode analysis that can account for the noisy, unresolved modes. Our joint analysis of both redshift space and projected measurements allows us to identify systematic effects affecting constraints from the 3PCF.

McBride, Cameron K. [Department of Physics and Astronomy, University of Pittsburgh, Pittsburgh, PA 15260 (United States)]; Connolly, Andrew J. [Department of Astronomy, University of Washington, Seattle, WA 98195-1580 (United States)]; Gardner, Jeffrey P. [Department of Physics, University of Washington, Seattle, WA 98195-1560 (United States)]; Scranton, Ryan [Department of Physics, University of California, Davis, CA 95616 (United States)]; Scoccimarro, Roman [Center for Cosmology and Particle Physics, New York University, New York, NY 10003 (United States)]; Berlind, Andreas A. [Department of Physics and Astronomy, Vanderbilt University, Nashville, TN 37235 (United States)]; Marín, Felipe [Department of Astronomy and Astrophysics, Kavli Institute for Cosmological Physics, University of Chicago, Chicago, IL 60637 (United States)]; Schneider, Donald P. [Department of Astronomy and Astrophysics, Pennsylvania State University, University Park, PA 16802 (United States)]



iLoc-Euk: A Multi-Label Classifier for Predicting the Subcellular Localization of Singleplex and Multiplex Eukaryotic Proteins  

PubMed Central

Predicting protein subcellular localization is an important and difficult problem, particularly when query proteins may have the multiplex character, i.e., simultaneously exist at, or move between, two or more different subcellular location sites. Most of the existing protein subcellular location predictors can only be used to deal with the single-location or “singleplex” proteins. Actually, multiple-location or “multiplex” proteins should not be ignored because they usually possess some unique biological functions worthy of our special notice. By introducing the “multi-labeled learning” and “accumulation-layer scale”, a new predictor, called iLoc-Euk, has been developed that can be used to deal with the systems containing both singleplex and multiplex proteins. As a demonstration, the jackknife cross-validation was performed with iLoc-Euk on a benchmark dataset of eukaryotic proteins classified into the following 22 location sites: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centriole, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole, where none of the proteins included has pairwise sequence identity to any other in the same subset. The overall success rate thus obtained by iLoc-Euk was 79%, which is significantly higher than that by any of the existing predictors that also have the capacity to deal with such a complicated and stringent system. As a user-friendly web-server, iLoc-Euk is freely accessible to the public. It is anticipated that iLoc-Euk may become a useful bioinformatics tool for Molecular Cell Biology, Proteomics, System Biology, and Drug Development. Also, its novel approach will further stimulate the development of predicting other protein attributes.

Chou, Kuo-Chen; Wu, Zhi-Cheng; Xiao, Xuan



Sodium and potassium intakes among US adults: NHANES 2003-2008  

PubMed Central

Background: The American Heart Association (AHA), Institute of Medicine (IOM), and US Departments of Health and Human Services and Agriculture (USDA) Dietary Guidelines for Americans all recommend that Americans limit sodium intake and choose foods that contain potassium to decrease the risk of hypertension and other adverse health outcomes. Objective: We estimated the distributions of usual daily sodium and potassium intakes by sociodemographic and health characteristics relative to current recommendations. Design: We used 24-h dietary recalls and other data from 12,581 adults aged ≥20 y who participated in NHANES in 2003–2008. Estimates of sodium and potassium intakes were adjusted for within-individual day-to-day variation by using measurement error models. SEs and 95% CIs were assessed by using jackknife replicate weights. Results: Overall, 99.4% (95% CI: 99.3%, 99.5%) of US adults consumed more sodium daily than recommended by the AHA (<1500 mg), and 90.7% (89.6%, 91.8%) consumed more than the IOM Tolerable Upper Intake Level (2300 mg). In US adults who are recommended by the Dietary Guidelines to further reduce sodium intake to 1500 mg/d (ie, African Americans aged ≥51 y or persons with hypertension, diabetes, or chronic kidney disease), 98.8% (98.4%, 99.2%) overall consumed >1500 mg/d, and 60.4% consumed >3000 mg/d, more than double the recommendation. Overall, <2% of US adults and ~5% of US men consumed ≥4700 mg K/d (ie, met recommendations for potassium). Conclusion: Regardless of recommendations or sociodemographic or health characteristics, the vast majority of US adults consume too much sodium and too little potassium.

Zhang, Zefeng; Carriquiry, Alicia L; Gunn, Janelle P; Kuklina, Elena V; Saydah, Sharon H; Yang, Quanhe; Moshfegh, Alanna J
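
The record above reports SEs obtained from jackknife replicate weights. Survey replicate weights follow a design-specific variance formula that the abstract does not give, so the sketch below shows only the simpler delete-one jackknife standard error, SE = sqrt((n-1)/n * sum_i (theta_i - theta_bar)^2), where theta_i is the statistic recomputed without observation i. The function names and the intake values are made up for illustration and are not taken from NHANES.

```python
from math import sqrt


def jackknife_se(values, statistic):
    """Delete-one jackknife standard error of statistic(values)."""
    n = len(values)
    # Recompute the statistic n times, leaving out one observation each time.
    replicates = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    mean_rep = sum(replicates) / n
    return sqrt((n - 1) / n * sum((r - mean_rep) ** 2 for r in replicates))


def mean(xs):
    return sum(xs) / len(xs)


# Hypothetical daily sodium intakes in mg (made-up numbers).
intakes = [3100.0, 3600.0, 2900.0, 4200.0, 3300.0]
se = jackknife_se(intakes, mean)
print(f"jackknife SE of the mean: {se:.1f} mg")
```

For the sample mean, this jackknife SE coincides with the familiar s/sqrt(n); the method becomes useful for statistics that have no simple closed-form standard error.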



Prediction of subcellular localization of eukaryotic proteins using position-specific profiles and neural network with weighted inputs.  


Subcellular location is one of the key biological characteristics of proteins. Position-specific profiles (PSP) have been introduced as important characteristics of proteins in this article. In this study, to obtain position-specific profiles, the Position Specific Iterative-Basic Local Alignment Search Tool (PSI-BLAST) has been used to search for protein sequences in a database. Position-specific scoring matrices are extracted from the profiles as one class of characteristics. Four-part amino acid compositions and 1st-7th order dipeptide compositions have also been calculated as the other two classes of characteristics. Therefore, twelve characteristic vectors are extracted from each of the protein sequences. Next, the characteristic vectors are weighted by a simple weighting function and input into a BP neural network predictor named PSP-Weighted Neural Network (PSP-WNN). The Levenberg-Marquardt algorithm is employed to adjust the weight matrices and thresholds during the network training instead of the error back propagation algorithm. With a jackknife test on the RH2427 dataset, PSP-WNN has achieved an overall prediction accuracy of 88.4%, higher than the results obtained by the general BP neural network, Markov model, and fuzzy k-nearest neighbors algorithm on this dataset. In addition, the prediction performance of PSP-WNN has been evaluated with a five-fold cross validation test on the PK7579 dataset and the prediction results have been consistently better than those of the previous method on the basis of several support vector machines, using compositions of both amino acids and amino acid pairs. These results indicate that PSP-WNN is a powerful tool for subcellular localization prediction. At the end of the article, the influence of different weighting proportions among the three characteristic vector categories on prediction accuracy is discussed, and an appropriate proportion is chosen as the one that increases the prediction accuracy. PMID:18155620

Zou, Lingyun; Wang, Zhengzhi; Huang, Jiaomin



Assessing the effect of a true-positive recall case in screening mammography: does perceptual priming alter radiologists' performance?  


Objective: To measure the effect of the insertion of less-difficult malignant cases on subsequent breast cancer detection by breast imaging radiologists. Methods: The research comprises two studies. Study 1: 8 radiologists read 2 sets of images each consisting of 40 mammographic cases. Set A contained four abnormal cases, and Set B contained six abnormal cases, including two priming cases (less difficult malignancies) placed at intervals of three and five subsequent cases before a subtle cancer. Study 2: 16 radiologists read a third condition of the same cases, known as Set C, containing six abnormal cases and two priming cases immediately preceding the subtle cancer cases. The readers were asked to localize malignancies and give confidence ratings on decisions. Results: Although not significant, a decrease in performance was observed in Set B compared with in Set A. There was a significant increase in the receiver operating characteristic (ROC) area under the curve (z = -2.532; p = 0.0114) and location sensitivity (z = -2.128; p = 0.0333) between the first and second halves of Set A and a marginal improvement in jackknife free-response ROC figure of merit (z = -1.89; p = 0.0587) between the first and second halves of Set B. In Study 2, Set C yielded no significant differences between the two halves of the study. Conclusion: Overall findings show no evidence that priming with lower difficulty malignant cases affects the detection of higher difficulty cancers; however, performance may decrease with priming. Advances in knowledge: This research suggests that inserting additional malignant cases in screening mammography sets as an audit tool may potentially lead to a decrease in performance of experienced breast radiologists. PMID:24814694

Lewis, S J; Mello-Thoms, C R; Brennan, P C; Lee, W; Tan, A; McEntee, M F; Evanoff, M; Pietrzyk, M; Reed, W M



Predicting Secretory Proteins of Malaria Parasite by Incorporating Sequence Evolution Information into Pseudo Amino Acid Composition via Grey System Model  

PubMed Central

Malaria has become a cause of poverty and a major hindrance to economic development. The culprit of the disease is the parasite, which secretes an array of proteins within the host erythrocyte to facilitate its own survival. Accordingly, the secretory proteins of the malaria parasite have become a logical target for drug design against malaria. Unfortunately, with the increasing resistance to the drugs thus developed, the situation has become more complicated. To cope with the drug resistance problem, one strategy is to identify in a timely manner the proteins secreted by the malaria parasite, which can serve as potential drug targets. However, it is both expensive and time-consuming to identify the secretory proteins of the malaria parasite by experiments alone. To expedite the process for developing effective drugs against malaria, a computational predictor called “iSMP-Grey” was developed that can be used to identify the secretory proteins of the malaria parasite based on the protein sequence information alone. During the prediction process a protein sample was formulated with a 60D (dimensional) feature vector formed by incorporating the sequence evolution information into the general form of PseAAC (pseudo amino acid composition) via a grey system model, which is particularly useful for solving complicated problems that lack sufficient information or need to process uncertain information. It was observed by the jackknife test that iSMP-Grey achieved an overall success rate of 94.8%, remarkably higher than those by the existing predictors in this area. As a user-friendly web-server, iSMP-Grey is freely accessible to the public. Moreover, for the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated mathematical equations involved in this paper.

Lin, Wei-Zhong; Fang, Jian-An; Xiao, Xuan; Chou, Kuo-Chen



Prediction of protein structure classes using hybrid space of multi-profile Bayes and bi-gram probability feature spaces.  


Proteins are the executants of biological functions in living organisms. Comprehension of protein structure is a challenging problem in the era of proteomics, computational biology, and bioinformatics because of its pivotal role in protein folding patterns. Owing to the large exploration of protein sequences in protein databanks and intricacy of protein structures, experimental and theoretical methods are insufficient for prediction of protein structure classes. Therefore, it is highly desirable to develop an accurate, reliable, and high throughput computational model to predict protein structure classes correctly from polygenetic sequences. In this regard, we propose a promising model employing hybrid descriptor space in conjunction with optimized evidence-theoretic K-nearest neighbor algorithm. Hybrid space is the composition of two descriptor spaces including Multi-profile Bayes and bi-gram probability. In order to enhance the generalization power of the classifier, we have selected high discriminative descriptors from the hybrid space using particle swarm optimization, a well-known evolutionary feature selection technique. Performance evaluation of the proposed model is performed using the jackknife test on three low similarity benchmark datasets including 25PDB, 1189, and 640. The success rates of the proposed model are 87.0%, 86.6%, and 88.4%, respectively on the three benchmark datasets. The comparative analysis exhibits that our proposed model has yielded promising results compared to the existing methods in the literature. In addition, our proposed prediction system might be helpful in future research particularly in cases where the major focus of research is on low similarity datasets. PMID:24384128

Hayat, Maqsood; Tahir, Muhammad; Khan, Sher Afzal



[Estimation of appropriate dose for computed radiography by the threshold value of the image quality figure].  


We estimated the optimum dose for imaging with a computed radiography (CR) system at two different pixel sizes based on the area under curve (AUC) in receiver operating characteristic (ROC) analysis and image quality figure (IQF). Samples for ROC analysis were prepared as follows. Acryl beads, 2.0 mm in diameter, were placed on a 50.0 mm tough water phantom that was fitted with a 20.0 mm Al filter (SID 200 cm, tube voltage 80 kV). The dose level at which the film density of the screen-film system (SRO250/SRG) was 1.0 ± 0.05 served as the reference dose (0.69 µC/kg). Five samples were prepared by multiplying the reference dose by 1/4, 1/2, 1, 2, and 4. The samples for image quality evaluation on the basis of IQF were prepared under identical conditions. A contrast-detail (C-D) phantom was placed on a 50.0 mm tough water phantom and images were taken. The contrast threshold of these samples was determined by 10 film readers, the same as those for the ROC analysis. When the significance of differences in the AUC was tested by the paired t-test (two-sided) and the Jackknife method, significant differences were noted between the reference dose and the 1/4 or 4-times dose at the standard pixel size (0.175 mm) and the smaller pixel size (0.0875 mm), while no significant difference was noted between the reference dose and the 1/2 or 2-times dose. In terms of IQF, no significant difference was noted between standard and smaller pixel sizes (paired t-test). The IQF data indicate that the dose level for imaging with CR can be reduced by about 30% from the reference dose. PMID:19420827

Mochizuki, Yasuo; Abe, Shinji; Yamaguchi, Kojirou



Clinical validation of a medical grade color monitor for chest radiology  

NASA Astrophysics Data System (ADS)

Until recently, the specifications of medical grade monochrome LCD monitors outperformed those of color LCD monitors. New generations of color LCD monitors, however, show specifications that are in many respects similar to those of monochrome monitors typically used in diagnostic workstations. The aim of the present study was to evaluate the impact of different medical grade monitors in terms of detection of simulated lung nodules in chest x-ray images. Specifically, we wanted to compare a new medical grade color monitor (Barco Coronis 6MP color) to a medical grade grayscale monitor (Barco Coronis 3MP monochrome) and a consumer color monitor (Philips 200VW 1.7MP color) by means of an observer performance experiment. Using the free-response acquisition data paradigm, seven radiologists were asked to detect and locate lung nodules (170 in total), simulated in half of the 200 chest X-ray images used in the experiment. The jackknife free-response receiver operating characteristic (JAFROC) analysis of the data showed a statistically significant difference between at least two monitors (F-value = 3.77, p-value = 0.0481). The Figure of Merit values were 0.727, 0.723 and 0.697 for the new color LCD monitor, the medical grade grayscale monitor and the consumer color monitor, respectively. There was no difference between the needed reading times, but there was a difference between the mean calculated Euclidean distances between the position marked by the observers and the center of the simulated nodule, indicating a better accuracy with both medical grade monitors. The present data suggest that the new generation of medical grade color monitors could be used in diagnostic workstations.

Jacobs, J.; Zanca, F.; Verschakelen, J.; Marchal, G.; Bosmans, H.



A hybrid orographic plus statistical model for downscaling daily precipitation in Northern California  

USGS Publications Warehouse

A hybrid (physical-statistical) scheme is developed to resolve the finescale distribution of daily precipitation over complex terrain. The scheme generates precipitation by combining information from the upper-air conditions and from sparsely distributed station measurements; thus, it proceeds in two steps. First, an initial estimate of the precipitation is made using a simplified orographic precipitation model. It is a steady-state, multilayer, and two-dimensional model following the concepts of Rhea. The model is driven by the 2.5° × 2.5° gridded National Oceanic and Atmospheric Administration-National Centers for Environmental Prediction upper-air profiles, and its parameters are tuned using the observed precipitation structure of the region. Precipitation is generated assuming a forced lifting of the air parcels as they cross the mountain barrier following a straight trajectory. Second, the precipitation is adjusted using errors between derived precipitation and observations from nearby sites. The study area covers the northern half of California, including coastal mountains, central valley, and the Sierra Nevada. The model is run for a 5-km rendition of terrain for days of January-March over the period of 1988-95. A jackknife analysis demonstrates the validity of the approach. The spatial and temporal distributions of the simulated precipitation field agree well with the observed precipitation. Further, a mapping of model performance indices (correlation coefficients, model bias, root-mean-square error, and threat scores) from an array of stations from the region indicates that the model performs satisfactorily in resolving daily precipitation at 5-km resolution.

Pandey, G. R.; Cayan, D. R.; Dettinger, M. D.; Georgakakos, K. P.



Nestedness in centipede (Chilopoda) assemblages on continental islands (Aegean, Greece)  

NASA Astrophysics Data System (ADS)

In natural ecosystems, species assemblages among isolated ecological communities such as continental islands often show a nested pattern in which biotas of sites with low species richness are non-random subsets of biotas of richer sites. The distribution of centipede (Chilopoda) species in the central and south Aegean archipelago was tested for nestedness. To achieve this aim we used distribution data for 53 species collected on 24 continental Aegean islands (Kyklades and Dodekanisa). Based on the first-order jackknife estimator, most of the islands were comprehensively surveyed. In order to quantify nestedness, we used the nestedness temperature calculator (NTC) as well as the nestedness metric based on overlap and decreasing fill (NODF). NTC indicated that data exhibited a high degree of nestedness in the central and south Aegean island complexes. As far as the Kyklades and Dodekanisa are concerned, NTC showed less nested centipede structures than the 24 islands. Likewise, NODF revealed a significant degree of nestedness in central and south Aegean islands. It also showed that biota matrices without singletons were more nested than the complete ones (Aegean, Kyklades and Dodekanisa). The two commonest centipede taxa (lithobiomorphs and geophilomorphs) contributed differently to centipede assemblages. In the Kyklades and Dodekanisa, geophilomorphs did not show a reliable nested arrangement, unlike lithobiomorphs. In relation to the entire data set, nestedness was positively associated with the degree of isolation. In the Kyklades, altitudinal range best explained nestedness patterns, while in Dodekanisa habitat heterogeneity proved to be more important for the centipede communities. Island area does not seem to be a significant explanatory variable. Some of our results from the Kyklades were critically compared with those for terrestrial isopod and land snail nested assemblages from the same geographical area. The complex geological and palaeogeographical history of the Aegean archipelago partly accounted for the pattern of centipede assemblages.

Simaiakis, Stylianos Michail; Martínez-Morales, Miguel Angel
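
The centipede record above uses the first-order jackknife estimator to judge how completely the islands were surveyed. In its usual incidence-based form the estimator is S_jack1 = S_obs + Q1 * (m - 1) / m, where S_obs is the number of species observed, Q1 the number of species found in exactly one sampling unit, and m the number of sampling units. The sketch below is a minimal illustration under that assumption. The function name, the species labels and the sample data are invented, and the abstract does not say what sampling units the authors used.

```python
def jackknife1_richness(incidence):
    """First-order jackknife richness estimate.

    incidence: list of sets, one set of species names per sampling unit.
    """
    m = len(incidence)
    counts = {}
    for unit in incidence:
        for species in unit:
            counts[species] = counts.get(species, 0) + 1
    s_obs = len(counts)
    # "Uniques": species recorded in exactly one sampling unit.
    uniques = sum(1 for c in counts.values() if c == 1)
    return s_obs + uniques * (m - 1) / m


# Made-up incidence data: four sampling units, five observed species.
samples = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}, {"a", "b", "e"}]
estimate = jackknife1_richness(samples)
observed = len(set().union(*samples))
print(f"observed {observed}, estimated {estimate}")
```

A survey is commonly regarded as fairly complete when the observed richness is close to the estimate, that is, when few species occur in only one sampling unit.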


DNA-dependent RNA polymerase subunit B as a tool for phylogenetic reconstructions: branching topology of the archaeal domain.  


The branching topology of the archaeal (archaebacterial) domain was inferred from sequence comparisons of the largest subunit (B) of DNA-dependent RNA polymerases (RNAP). Both the nucleic acid sequences of the genes coding for RNAP subunit B and the amino acid sequences of the derived gene products were used for phylogenetic reconstructions. Individual analysis of the three nucleotide positions of codons revealed significant inequalities with respect to guanosine and cytosine (GC) content and evolutionary rates. Only the nucleotides at the second codon positions were found to be unbiased by varied GC contents and sufficiently conserved for reliable phylogenetic reconstructions. A decision matrix was used for the combination of the results of distance matrix, maximum parsimony, and maximum likelihood methods. For this purpose the original results (sums of squares, steps, and logarithms of likelihoods) were transformed into comparable effective values and analyzed with methods known from the theory of statistical decisions. Phylogenetic invariants and statistical analysis with resampling techniques (bootstrap and jackknife) confirmed the preferred branching topology, which is significantly different from the topology known from phylogenetic trees based on 16S rRNA sequences. The preferred topology reconstructed by this analysis shows a common stem for the Methanococcales and Methanobacteriales and a separation of the thermophilic sulfur archaea from the methanogens and halophiles. The latter coincides with a unique phylogenetic location of a characteristic splitting event replacing the largest RNAP subunit of thermophilic sulfur archaea by two fragments in methanogens and halophiles. This topology is in good agreement with physiological and structural differences between the various archaea and demonstrates RNAP to be a suitable phylogenetic marker molecule. PMID:8007009

Klenk, H P; Zillig, W



Aboveground biomass and leaf area index (LAI) mapping for Niassa Reserve, northern Mozambique  

NASA Astrophysics Data System (ADS)

Estimations of biomass are critical in miombo woodlands because they represent the primary source of goods and services for over 80% of the population in southern Africa. This study was carried out in Niassa Reserve, northern Mozambique. The main objectives were, first, to estimate woody biomass and Leaf Area Index (LAI) using remotely sensed data [RADARSAT (C-band, λ = 5.7 cm)] and Landsat ETM+ derived Normalized Difference Vegetation Index (NDVI) and Simple Ratio (SR) calibrated by field measurements and, second, to determine, at both landscape and plot scales, the environmental controls (precipitation, woody cover density, fire and elephants) of biomass and LAI. A land-cover map (72% overall accuracy) was derived from the June 2004 ETM+ mosaic. Field biomass and LAI were correlated with RADARSAT backscatter (r_biomass = 0.65, r_LAI = 0.57, p < 0.0001) from July 2004, NDVI (r_biomass = 0.30, r_LAI = 0.35; p < 0.0001) and SR (r_biomass = 0.36, r_LAI = 0.40, p < 0.0001). A jackknife stepwise regression technique was used to develop the best predictive models for biomass (biomass = -5.19 + 0.074 * radarsat + 1.56 * SR, r^2 = 0.55) and LAI (LAI = -0.66 + 0.01 * radarsat + 0.22 * SR, r^2 = 0.45). Biomass and LAI maps were produced with an estimated peak of 18 kg m^-2 and 2.80 m^2 m^-2, respectively. At the landscape scale, both biomass and LAI were strongly determined by mean annual precipitation (F = 13.91, p = 0.0002). At the plot scale, woody biomass was significantly determined by fire frequency, and LAI by vegetation type.

Ribeiro, Natasha S.; Saatchi, Sassan S.; Shugart, Herman H.; Washington-Allen, Robert A.



Three-Dimensional Spectral-Domain Optical Coherence Tomography Data Analysis for Glaucoma Detection  

PubMed Central

Purpose: To develop a new three-dimensional (3D) spectral-domain optical coherence tomography (SD-OCT) data analysis method using a machine learning technique based on variable-size super pixel segmentation that efficiently utilizes the full 3D dataset to improve the discrimination between early glaucomatous and healthy eyes. Methods: 192 eyes of 96 subjects (44 healthy, 59 glaucoma suspect and 89 glaucomatous eyes) were scanned with SD-OCT. Each SD-OCT cube dataset was first converted into a 2D feature map based on retinal nerve fiber layer (RNFL) segmentation and then divided into a varying number of super pixels. Unlike the conventional super pixel having a fixed number of points, this newly developed variable-size super pixel is defined as a cluster of homogeneous adjacent pixels with variable size, shape and number. Features of the super pixel map were extracted and used as inputs to a machine classifier (LogitBoost adaptive boosting) to automatically identify diseased eyes. For discriminating performance assessment, the area under the curve (AUC) of the receiver operating characteristics of the machine classifier outputs was compared with the conventional circumpapillary RNFL (cpRNFL) thickness measurements. Results: The super pixel analysis showed statistically significantly higher AUC than the cpRNFL (0.855 vs. 0.707, respectively, p = 0.031, Jackknife test) when glaucoma suspects were discriminated from healthy eyes, while no significant difference was found when confirmed glaucoma eyes were discriminated from healthy eyes. Conclusions: A novel 3D OCT analysis technique performed at least as well as the cpRNFL in glaucoma discrimination and even better at glaucoma suspect discrimination. This new method has the potential to improve early detection of glaucomatous damage.

Wollstein, Gadi; Bilonick, Richard A.; Folio, Lindsey S.; Nadler, Zach; Kagemann, Larry; Schuman, Joel S.



Environmental, dietary, demographic, and activity variables associated with biomarkers of exposure for benzene and lead.  


Classification and regression tree methods represent a potentially powerful means of identifying patterns in exposure data that may otherwise be overlooked. Here, regression tree models are developed to identify associations between blood concentrations of benzene and lead and over 300 variables of disparate type (numerical and categorical), often with observations that are missing or below the quantitation limit. Benzene and lead are selected from among all the environmental agents measured in the NHEXAS Region V study because they are ubiquitous, and they serve as paradigms for volatile organic compounds (VOCs) and heavy metals, two classes of environmental agents that have very different properties. Two sets of regression models were developed. In the first set, only environmental and dietary measurements were employed as predictor variables, while in the second set these were supplemented with demographic and time-activity data. In both sets of regression models, the predictor variables were regressed on the blood concentrations of the environmental agents. Jack-knife cross-validation was employed to detect overfitting of the models to the data. Blood concentrations of benzene were found to be associated with: (a) indoor air concentrations of benzene; (b) the duration of time spent indoors with someone who was smoking; and (c) the number of cigarettes smoked by the subject. All these associations suggest that tobacco smoke is a major source of exposure to benzene. Blood concentrations of lead were found to be associated with: (a) house dust concentrations of lead; (b) the duration of time spent working in a closed workshop; and (c) the year in which the subject moved into the residence. An unexpected finding was that the regression trees identified time-activity data as better predictors of the blood concentrations than the measurements in environmental and dietary media. PMID:14603342

Roy, A; Georgopoulos, P G; Ouyang, M; Freeman, N; Lioy, P J



Prediction of Protein S-Nitrosylation Sites Based on Adapted Normal Distribution Bi-Profile Bayes and Chou's Pseudo Amino Acid Composition.  


Protein S-nitrosylation is a reversible post-translational modification by covalent modification on the thiol group of cysteine residues by nitric oxide. Growing evidence shows that protein S-nitrosylation plays an important role in normal cellular function as well as in various pathophysiologic conditions. Because of the inherent chemical instability of the S-NO bond and the low abundance of endogenous S-nitrosylated proteins, the unambiguous identification of S-nitrosylation sites by commonly used proteomic approaches remains challenging. Therefore, computational prediction of S-nitrosylation sites has been considered as a powerful auxiliary tool. In this work, we mainly adopted an adapted normal distribution bi-profile Bayes (ANBPB) feature extraction model to characterize the distinction of position-specific amino acids in 784 S-nitrosylated and 1568 non-S-nitrosylated peptide sequences. We developed a support vector machine prediction model, iSNO-ANBPB, by incorporating ANBPB with Chou's pseudo amino acid composition. In jackknife cross-validation experiments, iSNO-ANBPB yielded an accuracy of 65.39% and a Matthews correlation coefficient (MCC) of 0.3014. When tested on an independent dataset, iSNO-ANBPB achieved an accuracy of 63.41% and an MCC of 0.2984, which are much higher than the values achieved by the existing predictors SNOSite, iSNO-PseAAC, the Li et al. algorithm, and iSNO-AAPair. On another training dataset, iSNO-ANBPB also outperformed GPS-SNO and iSNO-PseAAC in the 10-fold cross-validation test. PMID:24918295

Jia, Cangzhi; Lin, Xin; Wang, Zhiping



Population pharmacokinetics of caffeine and its metabolites theobromine, paraxanthine and theophylline after inhalation in combination with diacetylmorphine.  


The stimulant effect of caffeine, as an additive in diacetylmorphine preparations for study purposes, may interfere with the pharmacodynamic effects of diacetylmorphine. In order to obtain insight into the pharmacology of caffeine after inhalation in heroin users, the pharmacokinetics of caffeine and its dimethylxanthine metabolites were studied. The objectives were to establish the population pharmacokinetics under these exceptional circumstances and to compare the results to published data regarding intravenous and oral administration in healthy volunteers. Diacetylmorphine preparations containing 100 mg of caffeine were used by 10 persons by inhalation. Plasma concentrations of caffeine, theobromine, paraxanthine and theophylline were measured by high performance liquid chromatography. Non-linear mixed effects modelling was used to estimate population pharmacokinetic parameters. The model was evaluated by the jack-knife procedure. Caffeine was rapidly and effectively absorbed after inhalation. Population pharmacokinetics of caffeine and its dimethylxanthine metabolites could adequately and simultaneously be described by a linear multi-compartment model. The volume of distribution for the central compartment was estimated to be 45.7 l and the apparent elimination rate constant of caffeine at 8 hr after inhalation was 0.150 hr(-1) for a typical individual. The bioavailability was approximately 60%. The presented model adequately describes the population pharmacokinetics of caffeine and its dimethylxanthine metabolites after inhalation of the caffeine sublimate of a 100 mg tablet. Validation proved the stability of the model. Pharmacokinetics of caffeine after inhalation and intravenous administration are to a large extent similar. The bioavailability of inhaled caffeine is approximately 60% in experienced smokers. PMID:15667599

Zandvliet, Anthe S; Huitema, Alwin D R; de Jonge, Milly E; den Hoed, Rob; Sparidans, Rolf W; Hendriks, Vincent M; van den Brink, Wim; van Ree, Jan M; Beijnen, Jos H



Does more sequence data improve estimates of galliform phylogeny? Analyses of a rapid radiation using a complete data matrix.  


The resolution of rapid evolutionary radiations or "bushes" in the tree of life has been one of the most difficult and interesting problems in phylogenetics. The avian order Galliformes appears to have undergone several rapid radiations that have limited the resolution of prior studies and obscured the position of taxa important both agriculturally and as model systems (chicken, turkey, Japanese quail). Here we present analyses of a multi-locus data matrix comprising over 15,000 sites, primarily from nuclear introns but also including three mitochondrial regions, from 46 galliform taxa with all gene regions sampled for all taxa. The increased sampling of unlinked nuclear genes provided strong bootstrap support for all but a small number of relationships. Coalescent-based methods to combine individual gene trees and analyses of datasets that are independent of published data indicated that this well-supported topology is likely to reflect the galliform species tree. The inclusion or exclusion of mitochondrial data had a limited impact upon analyses using either concatenated data or multispecies coalescent methods. Some of the key phylogenetic findings include support for a second major clade within the core phasianids that includes the chicken and Japanese quail and clarification of the phylogenetic relationships of turkey. Jackknifed datasets suggested that there is an advantage to sampling many independent regions across the genome rather than obtaining long sequences for a small number of loci, possibly reflecting the differences among gene trees that differ due to incomplete lineage sorting. Despite the novel insights we obtained using this increased sampling of gene regions, some nodes remain unresolved, likely due to periods of rapid diversification. Resolving these remaining groups will likely require sequencing a very large number of gene regions, but our analyses now appear to support a robust backbone for this order. PMID:24795852

Kimball, Rebecca T; Braun, Edward L



Modelling temperature, photoperiod and vernalization responses of Brunonia australis (Goodeniaceae) and Calandrinia sp. (Portulacaceae) to predict flowering time  

PubMed Central

Background and Aims Crop models for herbaceous ornamental species typically include functions for temperature and photoperiod responses, but very few incorporate vernalization, which is a requirement of many traditional crops. This study investigated the development of floriculture crop models, which describe temperature responses, plus photoperiod or vernalization requirements, using Australian native ephemerals Brunonia australis and Calandrinia sp. Methods A novel approach involved the use of a field crop modelling tool, DEVEL2. This optimization program estimates the parameters of selected functions within the development rate models using an iterative process that minimizes sum of squares residual between estimated and observed days for the phenological event. Parameter profiling and jack-knifing are included in DEVEL2 to remove bias from parameter estimates and introduce rigour into the parameter selection process. Key Results Development rate of B. australis from planting to first visible floral bud (VFB) was predicted using a multiplicative approach with a curvilinear function to describe temperature responses and a broken linear function to explain photoperiod responses. A similar model was used to describe the development rate of Calandrinia sp., except the photoperiod function was replaced with an exponential vernalization function, which explained a facultative cold requirement and included a coefficient for determining the vernalization ceiling temperature. Temperature was the main environmental factor influencing development rate for VFB to anthesis of both species and was predicted using a linear model. Conclusions The phenology models for B. australis and Calandrinia sp. described development rate from planting to VFB and from VFB to anthesis in response to temperature and photoperiod or vernalization and may assist modelling efforts of other herbaceous ornamental plants. 
In addition to crop management, the vernalization function could be used to identify plant communities most at risk from predicted increases in temperature due to global warming.

Cave, Robyn L.; Hammer, Graeme L.; McLean, Greg; Birch, Colin J.; Erwin, John E.; Johnston, Margaret E.



Dose reduction and its influence on diagnostic accuracy and radiation risk in digital mammography: an observer performance study using an anthropomorphic breast phantom  

PubMed Central

This study aimed to investigate the effect of dose reduction on diagnostic accuracy and radiation risk in digital mammography. Simulated masses and microcalcifications were positioned in an anthropomorphic breast phantom. Thirty digital images, 14 with lesions, 16 without, were acquired of the phantom using a Mammomat Novation (Siemens, Erlangen, Germany) at each of three dose levels. These corresponded to 100%, 50% and 30% of the normally used average glandular dose (AGD; 1.3 mGy for a standard breast). Eight observers interpreted the 90 unprocessed images in a free-response study and the data was analyzed with the jackknife free-response receiver operating characteristic (JAFROC) method. Observer performance was assessed using the JAFROC figure of merit (FOM). The benefit of radiation risk reduction was estimated based on several risk models. There was no statistically significant difference in performance, as described by the FOM, between the 100% and the 50% dose levels. However, the FOMs for both the 100% and the 50% dose were significantly different from the corresponding quantity for the 30% dose level (F-statistic = 4.95, p-value = 0.01). A dose reduction of 50% would result in 3-9 fewer breast cancer fatalities per 100,000 women undergoing annual screening from the age of 40 to 49 years. The results of the study indicate a possibility of reducing the dose to the breast to half of the dose level currently used. This has to be confirmed in clinical studies and possible differences depending on lesion type should be further examined.

Svahn, Tony; Hemdal, Bengt; Ruschin, Mark; Chakraborty, Dev P; Andersson, Ingvar; Tingberg, Anders; Mattsson, Soren



Predicting Anatomical Therapeutic Chemical (ATC) Classification of Drugs by Integrating Chemical-Chemical Interactions and Similarities  

PubMed Central

The Anatomical Therapeutic Chemical (ATC) classification system, recommended by the World Health Organization, categorizes drugs into different classes according to their therapeutic and chemical characteristics. For a set of query compounds, how can we identify which ATC-class (or classes) they belong to? It is an important and challenging problem because the information thus obtained would be quite useful for drug development and utilization. By hybridizing the information from chemical-chemical interactions and chemical-chemical similarities, a novel method was developed for this purpose. It was observed by the jackknife test on a benchmark dataset of 3,883 drug compounds that the overall success rate achieved by the prediction method was about 73% in identifying the drugs among the following 14 main ATC-classes: (1) alimentary tract and metabolism; (2) blood and blood forming organs; (3) cardiovascular system; (4) dermatologicals; (5) genitourinary system and sex hormones; (6) systemic hormonal preparations, excluding sex hormones and insulins; (7) anti-infectives for systemic use; (8) antineoplastic and immunomodulating agents; (9) musculoskeletal system; (10) nervous system; (11) antiparasitic products, insecticides and repellents; (12) respiratory system; (13) sensory organs; (14) various. Such a success rate is substantially higher than the 7% expected from random guessing. It has not escaped our notice that the current method can be straightforwardly extended to identify the drugs for their 2nd-level, 3rd-level, 4th-level, and 5th-level ATC-classifications once the statistically significant benchmark data are available for these lower levels.

Chen, Lei; Zeng, Wei-Ming; Cai, Yu-Dong; Feng, Kai-Yan; Chou, Kuo-Chen



H-ATLAS: estimating redshifts of Herschel sources from sub-mm fluxes  

NASA Astrophysics Data System (ADS)

Upon its completion, the Herschel Astrophysics Terahertz Large Area Survey (H-ATLAS) will be the largest sub-millimetre survey to date, detecting close to half-a-million sources. It will only be possible to measure spectroscopic redshifts for a small fraction of these sources. However, if the rest-frame spectral energy distribution (SED) of a typical H-ATLAS source is known, this SED and the observed Herschel fluxes can be used to estimate the redshifts of the H-ATLAS sources without spectroscopic redshifts. In this paper, we use a sub-set of 40 H-ATLAS sources with previously measured redshifts in the range 0.5 < z < 4.2 to derive a suitable average template for high-redshift H-ATLAS sources. We find that a template with two dust components (Tc = 23.9 K, Th = 46.9 K and ratio of mass of cold dust to mass of warm dust of 30.1) provides a good fit to the rest-frame fluxes of the sources in our calibration sample. We use a jackknife technique to estimate the accuracy of the redshifts estimated with this template, finding a root mean square of Δz/(1 + z) = 0.26. For sources for which there is prior information that they lie at z > 1, we estimate that the rms of Δz/(1 + z) = 0.12. We have used this template to estimate the redshift distribution for the sources detected in the H-ATLAS equatorial fields, finding a bimodal distribution with a mean redshift of 1.2, 1.9 and 2.5 for 250, 350 and 500 μm selected sources, respectively.

Pearson, E. A.; Eales, S.; Dunne, L.; Gonzalez-Nuevo, J.; Maddox, S.; Aguirre, J. E.; Baes, M.; Baker, A. J.; Bourne, N.; Bradford, C. M.; Clark, C. J. R.; Cooray, A.; Dariush, A.; Zotti, G. De; Dye, S.; Frayer, D.; Gomez, H. L.; Harris, A. I.; Hopwood, R.; Ibar, E.; Ivison, R. J.; Jarvis, M.; Krips, M.; Lapi, A.; Lupu, R. E.; Michałowski, M. J.; Rosenman, M.; Scott, D.; Valiante, E.; Valtchanov, I.; Werf, P. van der; Vieira, J. D.



Prediction of Protein S-Nitrosylation Sites Based on Adapted Normal Distribution Bi-Profile Bayes and Chou's Pseudo Amino Acid Composition  

PubMed Central

Protein S-nitrosylation is a reversible post-translational modification by covalent modification on the thiol group of cysteine residues by nitric oxide. Growing evidence shows that protein S-nitrosylation plays an important role in normal cellular function as well as in various pathophysiologic conditions. Because of the inherent chemical instability of the S-NO bond and the low abundance of endogenous S-nitrosylated proteins, the unambiguous identification of S-nitrosylation sites by commonly used proteomic approaches remains challenging. Therefore, computational prediction of S-nitrosylation sites has been considered as a powerful auxiliary tool. In this work, we mainly adopted an adapted normal distribution bi-profile Bayes (ANBPB) feature extraction model to characterize the distinction of position-specific amino acids in 784 S-nitrosylated and 1568 non-S-nitrosylated peptide sequences. We developed a support vector machine prediction model, iSNO-ANBPB, by incorporating ANBPB with Chou’s pseudo amino acid composition. In jackknife cross-validation experiments, iSNO-ANBPB yielded an accuracy of 65.39% and a Matthews correlation coefficient (MCC) of 0.3014. When tested on an independent dataset, iSNO-ANBPB achieved an accuracy of 63.41% and an MCC of 0.2984, which are much higher than the values achieved by the existing predictors SNOSite, iSNO-PseAAC, the Li et al. algorithm, and iSNO-AAPair. On another training dataset, iSNO-ANBPB also outperformed GPS-SNO and iSNO-PseAAC in the 10-fold cross-validation test.

Jia, Cangzhi; Lin, Xin; Wang, Zhiping



Grid search modeling of receiver functions: Implications for crustal structure in the Middle East and North Africa  

SciTech Connect

A grid search is used to estimate average crustal thickness and shear wave velocity structure beneath 12 three-component broadband seismic stations in the Middle East, North Africa, and nearby regions. The crustal thickness in these regions is found to vary from a minimum of 8.0 ± 1.5 km in East Africa (Afar) region to possibly a maximum of 64 ± 4.8 km in the lesser Caucasus. Stations located within the stable African platform indicate a crustal thickness of about 40 km. Teleseismic three-component waveform data produced by 165 earthquakes are used to create receiver function stacks for each station. Using a grid search, we have solved for the optimal and most simple shear velocity models beneath all 12 stations. Unlike other techniques (linearized least squares or forward modeling), the grid search methodology guarantees that we solve for the global minimum within our defined model parameter space. Using the grid search, we also qualitatively estimate the least number of layers required to model the observed receiver functions' major seismic phases (e.g., PS_Moho). A jackknife error estimation method is used to test the stability of our receiver function inversions for all 12 stations in the region that had recorded a sufficient number of high-quality broadband teleseismic waveforms. Five of the 12 estimates of crustal thicknesses are consistent with what is known of crustal structure from prior geophysical work. Furthermore, the remaining seven estimates of crustal structure are in regions for which previously there were few or no data about crustal thickness. © 1998 American Geophysical Union

Sandvol, E.; Seber, D.; Calvert, A.; Barazangi, M. [Institute for the Study of the Continents, Cornell University, Ithaca, New York (United States)]



Comparative assessment of GIS-based methods and metrics for estimating long-term exposures to air pollution  

NASA Astrophysics Data System (ADS)

The development of geographical information system techniques has opened up a wide array of methods for air pollution exposure assessment. The extent to which these provide reliable estimates of air pollution concentrations is nevertheless not clearly established. Nor is it clear which methods or metrics should be preferred in epidemiological studies. This paper compares the performance of ten different methods and metrics in terms of their ability to predict mean annual PM10 concentrations across 52 monitoring sites in London, UK. Metrics analysed include indicators (distance to nearest road, traffic volume on nearest road, heavy duty vehicle (HDV) volume on nearest road, road density within 150 m, traffic volume within 150 m and HDV volume within 150 m) and four modelling approaches: based on the nearest monitoring site, kriging, dispersion modelling and land use regression (LUR). Measures were computed in a GIS, and resulting metrics calibrated and validated against monitoring data using a form of grouped jack-knife analysis. The results show that PM10 concentrations across London show little spatial variation. As a consequence, most methods can predict the average without serious bias. Few of the approaches, however, show good correlations with monitored PM10 concentrations, and most predict no better than a simple classification based on site type. Only land use regression reaches acceptable levels of correlation (R² = 0.47), though this can be improved by also including information on site type. This might therefore be taken as a recommended approach in many studies, though care is needed in developing meaningful land use regression models, and like any method they need to be validated against local data before their application as part of epidemiological studies.

Gulliver, John; de Hoogh, Kees; Fecht, Daniela; Vienneau, Danielle; Briggs, David



Phylogenetic studies favour the unification of Pennisetum, Cenchrus and Odontelytrum (Poaceae): a combined nuclear, plastid and morphological analysis, and nomenclatural combinations in Cenchrus  

PubMed Central

Background and Aims Twenty-five genera having sterile inflorescence branches were recognized as the bristle clade within the x = 9 Paniceae (Panicoideae). Within the bristle clade, taxonomic circumscription of Cenchrus (20–25 species), Pennisetum (80–140) and the monotypic Odontelytrum is still unclear. Several criteria have been applied to characterize Cenchrus and Pennisetum, but none of these has proved satisfactory as the diagnostic characters, such as fusion of bristles in the inflorescences, show continuous variation. Methods A phylogenetic analysis based on morphological, plastid (trnL-F, ndhF) and nuclear (knotted) data is presented for a representative species sampling of the genera. All analyses were conducted under parsimony, using heuristic searches with TBR branch swapping. Branch support was assessed with parsimony jackknifing. Key Results Based on plastid and morphological data, Pennisetum, Cenchrus and Odontelytrum were supported as a monophyletic group: the PCO clade. Only one section of Pennisetum (Brevivalvula) was supported as monophyletic. The position of P. lanatum differed among data partitions, although the combined plastid and morphology and nuclear analyses showed this species to be a member of the PCO clade. The basic chromosome number x = 9 was found to be plesiomorphic, and x = 5, 7, 8, 10 and 17 were derived states. The nuclear phylogenetic analysis revealed a reticulate pattern of relationships among Pennisetum and Cenchrus, suggesting that there are at least three different genomes. Because apomixis can be transferred among species through hybridization, its history most likely reflects crossing relationships, rather than multiple independent appearances. 
Conclusions Due to the consistency between the present results and different phylogenetic hypotheses (including morphological, developmental and multilocus approaches), and the high support found for the PCO clade, also including the type species of the three genera, we propose unification of Pennisetum, Cenchrus and Odontelytrum. Species of Pennisetum and Odontelytrum are here transferred into Cenchrus, which has priority. Sixty-six new combinations are made here.

Chemisquy, M. Amelia; Giussani, Liliana M.; Scataglini, Maria A.; Kellogg, Elizabeth A.; Morrone, Osvaldo



Technical efficiency of district hospitals: Evidence from Namibia using Data Envelopment Analysis  

PubMed Central

Background In most countries of the sub-Saharan Africa, health care needs have been increasing due to emerging and re-emerging health problems. However, the supply of health care resources to address the problems has been continuously declining, thus jeopardizing the progress towards achieving the health-related Millennium Development Goals. Namibia is no exception to this. It is therefore necessary to quantify the level of technical inefficiency in the countries so as to alert policy makers of the potential resource gains to the health system if the hospitals that absorb a lion's share of the available resources are technically efficient. Method All public sector hospitals (N = 30) were included in the study. Hospital capacity utilization ratios and the data envelopment analysis (DEA) technique were used to assess technical efficiency. The DEA model used three inputs and two outputs. Data for four financial years (1997/98 to 2000/2001) was used for the analysis. To test for the robustness of the DEA technical efficiency scores the Jackknife analysis was used. Results The findings suggest the presence of substantial degree of pure technical and scale inefficiency. The average technical efficiency level during the given period was less than 75%. Less than half of the hospitals included in the study were located on the technically efficient frontier. Increasing returns to scale is observed to be the predominant form of scale inefficiency. Conclusion It is concluded that the existing level of pure technical and scale inefficiency of the district hospitals is considerably high and may negatively affect the government's initiatives to improve access to quality health care and scaling up of interventions that are necessary to achieve the health-related Millennium Development Goals. It is recommended that the inefficient hospitals learn from their efficient peers identified by the DEA model so as to improve the overall performance of the health system.

Zere, Eyob; Mbeeli, Thomas; Shangula, Kalumbi; Mandlhate, Custodia; Mutirua, Kautoo; Tjivambi, Ben; Kapenambili, William



Testate Amoebae as Paleohydrological Proxies in the Florida Everglades  

NASA Astrophysics Data System (ADS)

The largest wetland restoration effort ever attempted, the Comprehensive Everglades Restoration Plan (CERP), is currently underway in the Florida Everglades, and a critical goal of CERP is reestablishment of the pre-drainage (pre-AD 1880) hydrology. Paleoecological research in the greater Everglades ecosystem is underway to reconstruct past water levels and variability throughout the system, providing a basis for restoration targets. Testate amoebae, a group of unicellular organisms that form decay-resistant tests, have been successfully used in northern-latitude bogs to reconstruct past wetland hydrology; however, their application in other peatland types, particularly at lower latitudes, has not been well studied. We assessed the potential use of testate amoebae as tools to reconstruct the past hydrology of the Everglades. Modern surface samples were collected from the Everglades National Park and Water Conservation Areas, across a water table gradient that included four vegetation types (tree island interior, tree island edge, sawgrass transition, slough). Community composition was quantified and compared to environmental conditions (water table, pH, vegetation) using ordination and gradient-analysis approaches. Results of nonmetric multidimensional scaling revealed that the most important pattern of community change, representing about 30% of the variance in the dataset, was related to water-table depth (r² = 0.32). Jackknifed cross-validation of a transfer function for water table depth, based on a simple weighted average model, indicated the potential for testate amoebae in studies of past Everglades hydrology (RMSEP = 9 cm, r² = 0.47). Although the performance of the transfer function was not as good as those from northern-latitude bogs, our results suggest that testate amoebae could be a valuable tool in paleohydrological studies of the Everglades, particularly when used with other hydrological proxies (e.g., pollen, plant macrofossils, diatoms).

Andrews, T.; Booth, R.; Bernhardt, C. E.; Willard, D. A.
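
The abstract above mentions jackknifed cross-validation of a weighted-average transfer function and reports an RMSEP. The following is a minimal sketch of that idea, assuming the simplest form of weighted averaging with no deshrinking step. The function names, the toy abundance table and the water-table values are hypothetical and are not data from the study.

```python
import math


def wa_optima(abundances, env):
    """Taxon optima: abundance-weighted mean of the environmental variable."""
    n_taxa = len(abundances[0])
    optima = []
    for k in range(n_taxa):
        weight = sum(row[k] for row in abundances)
        if weight == 0:
            optima.append(None)  # taxon absent from the training samples
        else:
            weighted = sum(row[k] * e for row, e in zip(abundances, env))
            optima.append(weighted / weight)
    return optima


def wa_predict(row, optima):
    """Predicted environment: abundance-weighted mean of the taxon optima."""
    pairs = [(a, o) for a, o in zip(row, optima) if o is not None and a > 0]
    total = sum(a for a, _ in pairs)
    if total == 0:
        raise ValueError("sample shares no taxa with the training set")
    return sum(a * o for a, o in pairs) / total


def jackknife_rmsep(abundances, env):
    """Leave each sample out, refit the optima, predict it, and pool the errors."""
    squared_errors = []
    for i in range(len(env)):
        train_abund = abundances[:i] + abundances[i + 1:]
        train_env = env[:i] + env[i + 1:]
        optima = wa_optima(train_abund, train_env)
        prediction = wa_predict(abundances[i], optima)
        squared_errors.append((prediction - env[i]) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical data: 4 samples x 3 taxa (counts) and water-table depth in cm.
abundances = [[10, 0, 2], [6, 4, 0], [1, 8, 3], [0, 5, 9]]
water_table = [5.0, 12.0, 25.0, 40.0]

rmsep = jackknife_rmsep(abundances, water_table)
print(f"jackknifed RMSEP = {rmsep:.2f} cm")
```

Because each sample is predicted from optima fitted without it, the pooled error is a less optimistic measure than the error of a model fitted to all samples at once.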



An Ancient Origin for the Enigmatic Flat-Headed Frogs (Bombinatoridae: Barbourula) from the Islands of Southeast Asia  

PubMed Central

Background The complex history of Southeast Asian islands has long been of interest to biogeographers. Dispersal and vicariance events in the Pleistocene have received the most attention, though recent studies suggest a potentially more ancient history to components of the terrestrial fauna. Among this fauna is the enigmatic archaeobatrachian frog genus Barbourula, which only occurs on the islands of Borneo and Palawan. We utilize this lineage to gain unique insight into the temporal history of lineage diversification in Southeast Asian islands. Methodology/Principal Findings Using mitochondrial and nuclear genetic data, multiple fossil calibration points, and likelihood and Bayesian methods, we estimate phylogenetic relationships and divergence times for Barbourula. We determine the sensitivity of focal divergence times to specific calibration points by jackknife approach in which each calibration point is excluded from analysis. We find that relevant divergence time estimates are robust to the exclusion of specific calibration points. Barbourula is recovered as a monophyletic lineage nested within a monophyletic Costata. Barbourula diverged from its sister taxon Bombina in the Paleogene and the two species of Barbourula diverged in the Late Miocene. Conclusions/Significance The divergences within Barbourula and between it and Bombina are surprisingly old and represent the oldest estimates for a cladogenetic event resulting in living taxa endemic to Southeast Asian islands. Moreover, these divergence time estimates are consistent with a new biogeographic scenario: the Palawan Ark Hypothesis. We suggest that components of Palawan's terrestrial fauna might have “rafted” on emergent portions of the North Palawan Block during its migration from the Asian mainland to its present-day position near Borneo. Further, dispersal from Palawan to Borneo (rather than Borneo to Palawan) may explain the current day disjunct distribution of this ancient lineage.

Blackburn, David C.; Bickford, David P.; Diesmos, Arvin C.; Iskandar, Djoko T.; Brown, Rafe M.
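
The abstract above describes a jackknife approach in which each calibration point is excluded in turn to see how much the focal estimates move. A minimal sketch of that delete-one sensitivity check follows. The estimator is a deliberately trivial placeholder (a mean) standing in for a full dating analysis, and the calibration names and values are hypothetical.

```python
from statistics import mean


def delete_one_estimates(calibrations, estimate):
    """Re-run `estimate` once per calibration point, leaving that point out."""
    results = {}
    for name in calibrations:
        reduced = {k: v for k, v in calibrations.items() if k != name}
        results[name] = estimate(reduced)
    return results


def max_shift(full_estimate, reduced_estimates):
    """Largest absolute change caused by dropping any single calibration."""
    return max(abs(v - full_estimate) for v in reduced_estimates.values())


def toy_estimate(calibrations):
    """Placeholder estimator; a real analysis would re-run the dating model here."""
    return mean(calibrations.values())


# Hypothetical calibration values, chosen only to make the arithmetic visible.
calibrations = {"A": 10.0, "B": 12.0, "C": 11.0, "D": 30.0}

full = toy_estimate(calibrations)
reduced = delete_one_estimates(calibrations, toy_estimate)
shift = max_shift(full, reduced)
print(full, reduced, shift)
```

A small maximum shift relative to the uncertainty of the full estimate is what "robust to the exclusion of specific calibration points" would look like in this simplified setting.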



The Application of Censored Regression Models in Low Streamflow Analyses  

NASA Astrophysics Data System (ADS)

Estimation of low streamflow statistics at gauged and ungauged river sites is often a daunting task. This process is further confounded by the presence of intermittent streamflows, where streamflow is sometimes reported as zero, within a region. Streamflows recorded as zero may be zero, or may be less than the measurement detection limit. Such data are often referred to as censored data. Numerous methods have been developed to characterize intermittent streamflow series. Logit regression has been proposed to develop regional models of the probability that annual lowflow series (such as 7-day lowflows) are zero. In addition, Tobit regression, a method of regression that allows for censored dependent variables, has been proposed for lowflow regional regression models in regions where the lowflow statistic of interest is estimated as zero at some sites in the region. While these methods have been proposed, their use in practice has been limited. Here a delete-one jackknife simulation is presented to examine the performance of Logit and Tobit models of 7-day annual minimum flows in 6 USGS water resource regions in the United States. For the Logit model, an assessment is made of whether sites are correctly classified as having at least 10% of 7-day annual lowflows equal to zero. In such a situation, the 7-day, 10-year lowflow (Q710), a commonly employed low streamflow statistic, would be reported as zero. For the Tobit model, a comparison is made between results from the Tobit model, and from performing either ordinary least squares (OLS) or principal component regression (PCR) after the zero sites are dropped from the analysis. Initial results for the Logit model indicate this method to have a high probability of correctly classifying sites into groups with Q710s as zero and non-zero. Initial results also indicate the Tobit model produces better results than PCR and OLS when more than 5% of the sites in the region have Q710 values calculated as zero.

Kroll, C.; Luz, J.
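
The abstract above states that a site with at least 10% of its 7-day annual lowflows equal to zero would have its Q710 reported as zero, and that the Logit model is judged by how often it classifies sites correctly. A minimal sketch of that rule and of a simple correct-classification rate follows. The function names and the flow series are hypothetical, and the Logit model itself is not reproduced.

```python
def fraction_zero(annual_min_flows):
    """Share of years whose 7-day annual minimum flow is recorded as zero."""
    if not annual_min_flows:
        raise ValueError("need at least one annual value")
    zeros = sum(1 for q in annual_min_flows if q == 0)
    return zeros / len(annual_min_flows)


def q710_reported_as_zero(annual_min_flows, threshold=0.10):
    """Apply the 10% rule described in the abstract to one site."""
    return fraction_zero(annual_min_flows) >= threshold


def correct_classification_rate(predicted, observed):
    """Fraction of sites whose predicted zero / non-zero class matches the observed one."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    hits = sum(1 for p, o in zip(predicted, observed) if p == o)
    return hits / len(observed)


# Hypothetical 20-year records (arbitrary flow units), for illustration only.
site_a = [0, 0, 0] + [1.5] * 17   # 15% zeros
site_b = [0] + [2.0] * 19         # 5% zeros

observed = [q710_reported_as_zero(site_a), q710_reported_as_zero(site_b)]
print(observed)
```

In a delete-one setting, the predicted class for each site would come from a model fitted to all other sites, and the rate above would summarize how often those held-out predictions agree with the observed classes.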



Streamflow prediction in ungauged catchments using copula-based dissimilarity measures  

NASA Astrophysics Data System (ADS)

There are many procedures in the available literature to perform prediction in ungauged basins. Commonly, the Euclidean metric is used as a proxy of the hydrologic dissimilarity. Here we propose a procedure to find a metric on the basis of dissimilarity measures that are estimated from pairwise empirical copula densities of runoff. A metric is then defined in a transformed space of basin descriptors, whose parameterization is obtained with a variance reducing technique. A hydrologic model was run in an ungauged basin with sets of global parameters obtained from the k nearest neighboring donor basins using various metrics to take into account the uncertainty of its parameterization. Hydrologic model parameters were regionalized with a multiscale parameter regionalization technique whose transfer function parameters were found via calibration. The streamflow in an ungauged basin was found as an ensemble streamflow prediction to account for the uncertainties of the transfer function parameters as well as those to define the metric. This technique was applied in 38 German basins ranging in size from 70 to 4000 km². For each basin, a number of catchment descriptors and several climatic indices were quantified, e.g., mean slope, aspect, shape factor, mean elevation, and mean monthly temperature in January, among others. Daily streamflow time series correspond to the period from 1961 to 2000. Simulated daily discharge was validated with a Jackknife cross-validation technique. Nash-Sutcliffe efficiencies obtained in this way ranged between 0.76 and 0.86. These results suggested that the proposed technique would produce reasonable results in ungauged basins.

Samaniego, Luis; Bárdossy, András; Kumar, Rohini
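
The Nash-Sutcliffe efficiency reported in the record above has a standard definition: one minus the ratio of the squared simulation error to the variance of the observations about their mean. The sketch below assumes that textbook form; the function name `nash_sutcliffe` and the sample discharge values are hypothetical and are not taken from the study.

```python
import numpy as np


def nash_sutcliffe(observed, simulated):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    observed = np.asarray(observed, dtype=float)
    simulated = np.asarray(simulated, dtype=float)
    residual = np.sum((observed - simulated) ** 2)  # squared model error
    spread = np.sum((observed - observed.mean()) ** 2)  # variability of observations
    return float(1.0 - residual / spread)


# Hypothetical daily discharge values for a withheld basin.
obs = [3.1, 4.0, 6.2, 5.5, 4.1, 3.3]
sim = [3.0, 4.4, 5.8, 5.9, 4.0, 3.5]
print(f"NSE = {nash_sutcliffe(obs, sim):.2f}")
```

In a jackknife cross-validation, the simulated series for each basin would come from a run in which that basin was treated as ungauged.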



Efficacy of digital breast tomosynthesis for breast cancer diagnosis  

NASA Astrophysics Data System (ADS)

Purpose: To compare the diagnostic performance of digital breast tomosynthesis (DBT) in combination with digital mammography (DM) with that of digital mammography alone. Materials and Methods: Twenty-six experienced radiologists who specialized in breast imaging read 50 cases (27 cancers and 23 non-cancer cases) of patients who underwent DM and DBT. Both exams included the craniocaudal (CC) and mediolateral oblique (MLO) views. Histopathologic examination established truth in all lesions. Each case was interpreted in two modes, once with DM alone followed by DM+DBT, and the observers were asked to mark the location of any lesions, if present, and give each a score based on a five-category assessment by the Royal Australian and New Zealand College of Radiologists (RANZCR). The diagnostic performance of DM compared with that of DM+DBT was evaluated in terms of the difference between areas under receiver-operating characteristic curves (AUCs), Jackknife free-response receiver operating characteristic (JAFROC) figure-of-merit, sensitivity, location sensitivity and specificity. Results: Average AUC and JAFROC for DM versus DM+DBT were significantly different (AUC 0.690 vs. 0.781, p ≤ 0.0001; JAFROC 0.618 vs. 0.732, p ≤ 0.0001). In addition, the use of DM+DBT resulted in an improvement in sensitivity (0.629 vs. 0.701, p=0.0011), location sensitivity (0.548 vs. 0.690, p ≤ 0.0001) and specificity (0.656 vs. 0.758, p=0.0015) when compared to DM alone. Conclusion: Adding DBT to the standard DM significantly improved radiologists' performance in terms of AUCs, JAFROC figure of merit, sensitivity, location sensitivity and specificity values.

Alakhras, M.; Mello-Thoms, C.; Rickard, M.; Bourne, R.; Brennan, P. C.



A comparison of Australian and USA radiologists' performance in detection of breast cancer  

NASA Astrophysics Data System (ADS)

The aim of the current work was to compare the performance of radiologists who read a higher number of cases with those who read a lower number, as well as to examine the effect of number of years of experience on performance. This study compares Australian and USA radiologists with differing levels of experience when reading mammograms. Thirty mammographic cases were presented to 41 radiologists, 21 from Australia and 20 from the USA. Readers were asked to locate and visualize cancer and assign a mark-rating pair with confidence levels from 1 to 5. A jackknife free-response receiver operating characteristic (JAFROC), inferred receiver operating characteristic (ROC), sensitivity, specificity and location sensitivity were calculated. A Mann-Whitney test was used to compare the performance of Australian and USA radiologists using SPSS software. The results showed that the USA radiologists sampled had more years of experience (p≤0.01) but read fewer mammograms per year (p≤0.03). Significantly higher sensitivity and location sensitivity (p≤0.001) were found for the Australian radiologists when experience and the number of mammograms read per year were taken into account. There were no differences between the two countries in overall performance measured by JAFROC and inferred ROC. For the most experienced radiologists within the Australian sample, ROC and location sensitivity were higher when compared with the least experienced. The increased number of years of experience of the USA radiologists did not result in an increase in any performance metrics. The number of cases per year is a better predictor of improved diagnostic performance.

Suleiman, Wasfi I.; Georgian-Smith, Dianne; Evanoff, Michael G.; Lewis, Sarah; McEntee, Mark F.



Reliability of Different Mark-Recapture Methods for Population Size Estimation Tested against Reference Population Sizes Constructed from Field Data  

PubMed Central

Reliable estimates of population size are fundamental in many ecological studies and biodiversity conservation. Selecting appropriate methods to estimate abundance is often very difficult, especially if data are scarce. Most studies concerning the reliability of different estimators used simulation data based on assumptions about capture variability that do not necessarily reflect conditions in natural populations. Here, we used data from an intensively studied closed population of the arboreal gecko Gehyra variegata to construct reference population sizes for assessing twelve different population size estimators in terms of bias, precision, accuracy, and their 95%-confidence intervals. Two of the reference populations reflect natural biological entities, whereas the other reference populations reflect artificial subsets of the population. Since individual heterogeneity was assumed, we tested modifications of the Lincoln-Petersen estimator, a set of models in programs MARK and CARE-2, and a truncated geometric distribution. Ranking of methods was similar across criteria. Models accounting for individual heterogeneity performed best in all assessment criteria. For populations from heterogeneous habitats without obvious covariates explaining individual heterogeneity, we recommend using the moment estimator or the interpolated jackknife estimator (both implemented in CAPTURE/MARK). If data for capture frequencies are substantial, we recommend the sample coverage or the estimating equation (both models implemented in CARE-2). Depending on the distribution of catchabilities, our proposed multiple Lincoln-Petersen and a truncated geometric distribution obtained comparably good results. The former usually resulted in a minimum population size and the latter can be recommended when there is a long tail of low capture probabilities. Models with covariates and mixture models performed poorly. 
Our approach identified suitable methods and extended options to evaluate the performance of mark-recapture population size estimators under field conditions, which is essential for selecting an appropriate method and obtaining reliable results in ecology and conservation biology, and thus for sound management.

Grimm, Annegret; Gruber, Bernd; Henle, Klaus



Analysis of the successional patterns of insects on carrion in southwest Virginia.  


Studies of carrion-insect succession on domestic pig, Sus scrofa L., were conducted in the spring and summer of 2001 and 2002 in Blacksburg, VA, to identify and analyze the successional patterns of the taxa of forensic importance in southwest Virginia. Forty-seven insect taxa were collected in the spring. These were represented by 11 families (Diptera: Calliphoridae, Sarcophagidae, Muscidae, Sepsidae, Piophilidae; Coleoptera: Staphylinidae, Silphidae, Cleridae, Trogidae, Dermestidae, Histeridae). In the summer, 33 taxa were collected that were represented by all of the families collected in the spring, except Trogidae. The most common flies collected were the calliphorids: Phormia regina (Meigen) and Phaenicia coeruleiviridis (Macquart). The most common beetles were Creophilus maxillosus L. (Staphylinidae), Oiceoptoma noveboracense Forster, Necrophila americana L., Necrodes surinamensis (F.) (Silphidae), Euspilotus assimilis (Paykull), and Hister abbreviatus F. (Histeridae). Occurrence matrices were constructed for the successional patterns of insect taxa during 21 sampling intervals in the spring and 8 intervals in the summer studies. Jackknife estimates (mean ± 95% confidence limits) of overall Jaccard similarity in insect taxa among sampling intervals in the occurrence matrices were 0.213 ± 0.081 (spring 2001), 0.194 ± 0.043 (summer 2001), 0.257 ± 0.068 (spring 2002), and 0.274 ± 0.172 (summer 2002). Permutation analyses of the occurrence matrices showed that the patterns of succession of insect taxa were similar between spring 2001 and 2002 (P = 0.001) and between summer 2001 and 2002 (P = 0.007). The successional patterns seem to be typical for the seasonal periods and provide data on baseline fauna for estimating postmortem interval in cases of human death. This study is the first of its kind for southwest Virginia. PMID:15311476

Tabor, Kimberly L; Brewster, Carlyle C; Fell, Richard D
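
A jackknife estimate of overall Jaccard similarity, as reported in the record above, could be computed along the following lines. This is only a sketch under stated assumptions: the abstract does not say which units were deleted, so here one sampling interval (one row of the occurrence matrix) is dropped at a time, pseudo-values are averaged, and the 95% limits use a normal approximation. The names `mean_jaccard` and `jackknife_estimate` and the small occurrence matrix are hypothetical.

```python
from itertools import combinations

import numpy as np


def mean_jaccard(occ):
    """Mean pairwise Jaccard similarity between rows (sampling intervals)."""
    sims = []
    for a, b in combinations(occ, 2):
        union = np.logical_or(a, b).sum()
        if union:  # skip pairs in which no taxon occurs at all
            sims.append(np.logical_and(a, b).sum() / union)
    return float(np.mean(sims))


def jackknife_estimate(occ, stat):
    """Delete-one-row jackknife: returns (estimate, standard error)."""
    n = occ.shape[0]
    full = stat(occ)
    leave_out = np.array([stat(np.delete(occ, i, axis=0)) for i in range(n)])
    pseudo = n * full - (n - 1) * leave_out  # Tukey pseudo-values
    return float(pseudo.mean()), float(pseudo.std(ddof=1) / np.sqrt(n))


# Hypothetical occurrence matrix: rows are sampling intervals, columns are taxa.
occ = np.array(
    [
        [1, 1, 0, 0, 0],
        [1, 1, 1, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 1, 1, 1],
        [0, 0, 0, 1, 1],
    ],
    dtype=bool,
)
est, se = jackknife_estimate(occ, mean_jaccard)
print(f"Jaccard similarity: {est:.3f} +/- {1.96 * se:.3f} (approx. 95% limits)")
```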



Child Mortality Estimation: Appropriate Time Periods for Child Mortality Estimates from Full Birth Histories  

PubMed Central

Background: Child mortality estimates from complete birth histories from Demographic and Health Surveys (DHS) and similar surveys are a chief source of data used to track Millennium Development Goal 4, which aims for a reduction of under-five mortality by two-thirds between 1990 and 2015. Based on the expected sample sizes when the DHS program commenced, the estimates are usually based on 5-y time periods. Recent surveys have had larger sample sizes than early surveys, and here we aimed to explore the benefits of using shorter time periods than 5 y for estimation. We also explore the benefit of changing the estimation procedure from being based on years before the survey, i.e., measured with reference to the date of the interview for each woman, to being based on calendar years. Methods and Findings: Jackknife variance estimation was used to calculate standard errors for 207 DHS surveys in order to explore to what extent the large samples in recent surveys can be used to produce estimates based on 1-, 2-, 3-, 4-, and 5-y periods. We also recalculated the estimates for the surveys into calendar-year-based estimates. We demonstrate that estimation for 1-y periods is indeed possible for many recent surveys. Conclusions: The reduction in bias achieved using 1-y periods and calendar-year-based estimation is worthwhile in some cases. In particular, it allows tracking of the effects of particular events such as droughts, epidemics, or conflict on child mortality in a way not possible with previous estimation procedures. Recommendations to use estimation for short time periods when possible and to use calendar-year-based estimation were adopted in the United Nations 2011 estimates of child mortality.

Pedersen, Jon; Liu, Jing
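
Jackknife variance estimation of the kind named in the record above is often carried out by deleting one sampling cluster at a time. The sketch below assumes the textbook delete-one-cluster form applied to a simple ratio of deaths to exposure; it is not the estimator used in the study, which works from full birth histories. The function name `jackknife_ratio_se` and the cluster totals are hypothetical.

```python
import numpy as np


def jackknife_ratio_se(deaths, exposure):
    """Delete-one-cluster jackknife for the ratio sum(deaths) / sum(exposure).

    Returns (full-sample ratio, jackknife standard error).
    """
    deaths = np.asarray(deaths, dtype=float)
    exposure = np.asarray(exposure, dtype=float)
    k = len(deaths)
    full = deaths.sum() / exposure.sum()
    # Ratio recomputed with each cluster removed in turn.
    leave_out = (deaths.sum() - deaths) / (exposure.sum() - exposure)
    variance = (k - 1) / k * np.sum((leave_out - leave_out.mean()) ** 2)
    return float(full), float(np.sqrt(variance))


# Hypothetical per-cluster totals for one time period.
deaths = [3, 5, 2, 4, 6, 1]
exposure = [60.0, 80.0, 55.0, 70.0, 90.0, 40.0]  # child-years at risk
rate, se = jackknife_ratio_se(deaths, exposure)
print(f"rate = {rate:.4f}, jackknife SE = {se:.4f}")
```

Shorter reference periods leave fewer deaths per cluster, so the standard error from a calculation like this indicates whether 1-y estimates are precise enough to be useful.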



Estimation of sex from cranial measurements in a Western Australian population.  


It is widely accepted that the most accurate statistical estimations of biological attributes in the human skeleton (e.g., sex, age and stature) are produced using population-specific standards. As we previously demonstrated that the application of foreign standards to Western Australian individuals results in an unacceptably large sex bias (females frequently misclassified), population-specific standards are clearly required and greatly overdue. We report here on the first morphometric cranial sexing standards formulated specifically for application in, and based on the statistical analysis of, contemporary Western Australian individuals. The primary aim is to investigate the nature of cranial sexual dimorphism in this population and outline a series of statistically robust standards suitable for estimating sex in the complete bone and/or associated diagnostic fragments. The sample analysed comprised multi-detector computed tomography cranial scans of 400 individuals equally distributed by sex. Following 3D volume rendering, 31 landmarks were acquired using OsiriX, from which a total of 18 linear inter-landmark measurements were calculated. Measurements were analysed using basic descriptive statistics and discriminant function analyses employing jackknife validations of classification results. All measurements (except frontal breadth and orbital height - Bonferroni corrected) are sexually dimorphic with sex differences explaining 3.5-48.9% of sample variance. Bizygomatic breadth and maximum length of the cranium and the cranial base contribute most significantly to sex discrimination; the maximum classification accuracy was 90%, with a -2.1% sex-bias. We conclude that the cranium is both highly dimorphic and a reliable bone for estimating sex in Western Australian individuals. PMID:23537716

Franklin, Daniel; Cardini, Andrea; Flavel, Ambika; Kuliukas, Algis



Absolute and relative locations of earthquakes at Mount St. Helens, Washington, using continuous data: implications for magmatic processes: Chapter 4 in A volcano rekindled: the renewed eruption of Mount St. Helens, 2004-2006  

USGS Publications Warehouse

This study uses a combination of absolute and relative locations from earthquake multiplets to investigate the seismicity associated with the eruptive sequence at Mount St. Helens between September 23, 2004, and November 20, 2004. Multiplets, a prominent feature of seismicity during this time period, occurred as volcano-tectonic, hybrid, and low-frequency earthquakes spanning a large range of magnitudes and lifespans. Absolute locations were improved through the use of a new one-dimensional velocity model with excellent shallow constraints on P-wave velocities. We used jackknife tests to minimize possible biases in absolute and relative locations resulting from station outages and changing station configurations. In this paper, we show that earthquake hypocenters shallowed before the October 1 explosion along a north-dipping structure under the 1980-86 dome. Relative relocations of multiplets during the initial seismic unrest and ensuing eruption showed rather small source volumes before the October 1 explosion and larger tabular source volumes after October 5. All multiplets possess absolute locations very close to each other. However, the highly dissimilar waveforms displayed by each of the multiplets analyzed suggest that different sources and mechanisms were present within a very small source volume. We suggest that multiplets were related to pressurization of the conduit system that produced a stationary source that was highly stable over long time periods. On the basis of their response to explosions occurring in October 2004, earthquakes not associated with multiplets also appeared to be pressure dependent. The pressure source for these earthquakes appeared, however, to be different from the pressure source of the multiplets.

Thelen, Weston A.; Crosson, Robert S.; Creager, Kenneth C.



Does more sequence data improve estimates of galliform phylogeny? Analyses of a rapid radiation using a complete data matrix  

PubMed Central

The resolution of rapid evolutionary radiations or “bushes” in the tree of life has been one of the most difficult and interesting problems in phylogenetics. The avian order Galliformes appears to have undergone several rapid radiations that have limited the resolution of prior studies and obscured the position of taxa important both agriculturally and as model systems (chicken, turkey, Japanese quail). Here we present analyses of a multi-locus data matrix comprising over 15,000 sites, primarily from nuclear introns but also including three mitochondrial regions, from 46 galliform taxa with all gene regions sampled for all taxa. The increased sampling of unlinked nuclear genes provided strong bootstrap support for all but a small number of relationships. Coalescent-based methods to combine individual gene trees and analyses of datasets that are independent of published data indicated that this well-supported topology is likely to reflect the galliform species tree. The inclusion or exclusion of mitochondrial data had a limited impact upon analyses using either concatenated data or multispecies coalescent methods. Some of the key phylogenetic findings include support for a second major clade within the core phasianids that includes the chicken and Japanese quail and clarification of the phylogenetic relationships of turkey. Jackknifed datasets suggested that there is an advantage to sampling many independent regions across the genome rather than obtaining long sequences for a small number of loci, possibly reflecting the differences among gene trees that differ due to incomplete lineage sorting. Despite the novel insights we obtained using this increased sampling of gene regions, some nodes remain unresolved, likely due to periods of rapid diversification. Resolving these remaining groups will likely require sequencing a very large number of gene regions, but our analyses now appear to support a robust backbone for this order.

Braun, Edward L.



Development of Pneumatic Aerodynamic Devices to Improve the Performance, Economics, and Safety of Heavy Vehicles  

SciTech Connect

Under contract to the DOE Office of Heavy Vehicle Technologies, the Georgia Tech Research Institute (GTRI) is developing and evaluating pneumatic (blown) aerodynamic devices to improve the performance, economics, stability and safety of operation of Heavy Vehicles. The objective of this program is to apply the pneumatic aerodynamic aircraft technology previously developed and flight-tested by GTRI personnel to the design of an efficient blown tractor-trailer configuration. Recent experimental results obtained by GTRI using blowing have shown drag reductions of 35% on a streamlined automobile wind-tunnel model. Also measured were lift or down-load increases of 100-150% and the ability to control aerodynamic moments about all 3 axes without any moving control surfaces. Similar drag reductions yielded by blowing on bluff afterbody trailers in current US trucking fleet operations are anticipated to reduce yearly fuel consumption by more than 1.2 billion gallons, while even further reduction is possible using pneumatic lift to reduce tire rolling resistance. Conversely, increased drag and down force generated instantaneously by blowing can greatly improve braking characteristics and control in wet/icy weather due to effective "weight" increases on the tires. Safety is also enhanced by controlling side loads and moments caused on these Heavy Vehicles by winds, gusts and other vehicles passing. This may also help to eliminate the jack-knifing problem if caused by extreme wind side loads on the trailer. Lastly, reduction of the turbulent wake behind the trailer can reduce splash and spray patterns and rough air being experienced by following vehicles. This paper presents results developed by GTRI during the early portion of this effort, including a preliminary systems study, CFD prediction of the blown flowfields, and design of the baseline conventional tractor-trailer model and the pneumatic wind-tunnel model.

Robert J. Englar



Comparison between chest digital tomosynthesis and CT as a screening method to detect artificial pulmonary nodules: a phantom study  

PubMed Central

Objectives: The objective of this study was to evaluate the imaging capabilities of chest digital tomosynthesis (DT) as a screening method for the detection of artificial pulmonary nodules, and to compare its efficiency with that of CT. Methods: DT and CT were used to detect artificial pulmonary nodules (5 mm and 8 mm in diameter, ground-glass opacities) placed in a chest phantom. Using a three-dimensional filtered back-projection algorithm at acquisition angles of 8°, 20°, 30° and 40°, DT images of the desired layer thicknesses were reconstructed from the image data acquired during a single tomographic scan. Both standard and sharp CT reconstruction kernels were used, and the detectability index (DI) values computed for both the DT scan acquisition angles and CT reconstruction kernel types were considered. For the observer study, we examined 50 samples of artificial pulmonary nodules using both DT and CT imaging. On the basis of evaluations made by five thoracic radiologists, a jackknife free-response receiver operating characteristic (JAFROC) study was performed to compare and assess the differences in detection accuracy between CT and DT imaging. Results: For each increased acquisition angle, DI obtained by DT imaging was similar to that obtained by CT imaging. The difference in the observer-averaged JAFROC figure of merit for the five readings was 0.0363 (95% confidence interval: −0.18, 0.26; F=0.101; p=0.75). Conclusion: With the advantages of a decreased radiation dose and the practical accessibility of examination, DT may be a useful alternative to CT for the detection of artificial pulmonary nodules.

Gomi, T; Nakajima, M; Fujiwara, H; Takeda, T; Saito, K; Umeda, T; Sakaguchi, K



Estimating abundances of retroviral insertion sites from DNA fragment length data  

PubMed Central

Motivation: The relative abundance of retroviral insertions in a host genome is important in understanding the persistence and pathogenesis of both natural retroviral infections and retroviral gene therapy vectors. It could be estimated from a sample of cells if only the host genomic sites of retroviral insertions could be directly counted. When host genomic DNA is randomly broken via sonication and then amplified, amplicons of varying lengths are produced. The number of unique lengths of amplicons of an insertion site tends to increase according to its abundance, providing a basis for estimating relative abundance. However, as abundance increases, amplicons of the same length arise by chance, leading to a non-linear relation between the number of unique lengths and relative abundance. The difficulty in calibrating this relation is compounded by sample-specific variations in the relative frequencies of clones of each length. Results: A likelihood function is proposed for the discrete lengths observed in each of a collection of insertion sites and is maximized with a hybrid expectation–maximization algorithm. Patient data illustrate the method and simulations show that relative abundance can be estimated with little bias, but that variation in highly abundant sites can be large. In replicated patient samples, variation exceeds what the model implies, requiring adjustment as in Efron (2004) or the use of jackknife standard errors. Consequently, it is advantageous to collect replicate samples to strengthen inferences about relative abundance. Availability: An R package implements the algorithm described here. Supplementary information: Supplementary data are available at Bioinformatics online.

Berry, Charles C.; Gillet, Nicolas A.; Melamed, Anat; Gormley, Niall; Bangham, Charles R. M.; Bushman, Frederic D.



Monitoring hydrofrac-induced seismicity by surface arrays - the DHM-Project Basel case study  

NASA Astrophysics Data System (ADS)

The method "nanoseismic monitoring" was applied during the hydraulic stimulation at the Deep-Heat-Mining-Project (DHM-Project) Basel. Two small arrays at distances of 2.1 km and 4.8 km from the borehole recorded continuously for two days. During this time more than 2500 seismic events were detected. The surface monitoring of induced seismicity was compared with the reference provided by the hydrofrac monitoring. The latter was conducted with a network of borehole seismometers by Geothermal Explorers Limited. Array processing provides an outlier-resistant, graphical jack-knifing localization method which resulted in an average deviation from the reference of 850 m. Additionally, by applying the relative localization master-event method, the NNW-SSE strike direction of the reference was confirmed. It was shown that, in order to successfully estimate the magnitude of completeness as well as the b-value at the event rate and detection sensitivity present, 3 h segments of data are sufficient. This is supported by two segments out of over 13 h of evaluated data. These segments were chosen so that they represent a time of high seismic noise during normal working hours in daytime as well as the minimum anthropogenic noise at night. The low signal-to-noise ratio was compensated by the application of a sonogram event detection as well as a coincidence analysis within each array. Sonograms allow, by autoadaptive, non-linear filtering, the enhancement of signals whose amplitudes are just above noise level. For these events the magnitude was determined by the master-event method, allowing the magnitude of completeness to be computed by the entire-magnitude-range method provided by the ZMAP toolbox. Additionally, the b-values were determined and compared to the reference values. An introduction to the method of "nanoseismic monitoring" will be given as well as the comparison to reference data in the Basel case study.

Blascheck, P.; Häge, M.; Joswig, M.



Microbial diversity of biofilms in dental unit water systems.  


We investigated the microbial diversity of biofilms found in dental unit water systems (DUWS) by three methods. The first was microscopic examination by scanning electron microscopy (SEM), acridine orange staining, and fluorescent in situ hybridization (FISH). Most bacteria present in the biofilm were viable. FISH detected the beta and gamma, but not the alpha, subclasses of Proteobacteria. In the second method, 55 cultivated biofilm isolates were identified with the Biolog system, fatty acid analysis, and 16S ribosomal DNA (rDNA) sequencing. Only 16S identified all 55 isolates, which represented 13 genera. The most common organisms, as shown by analyses of 16S rDNA, belonged to the genera Afipia (28%) and Sphingomonas (16%). The third method was a culture-independent direct amplification and sequencing of 165 subclones from community biofilm 16S rDNA. This method revealed 40 genera: the most common ones included Leptospira (20%), Sphingomonas (14%), Bacillus (7%), Escherichia (6%), Geobacter (5%), and Pseudomonas (5%). Some of these organisms may be opportunistic pathogens. Our results have demonstrated that a biofilm in a health care setting may harbor a vast diversity of organisms. The results also reflect the limitations of culture-based techniques to detect and identify bacteria. Although this is the greatest diversity reported in DUWS biofilms, other genera may have been missed. Using a technique based on jackknife subsampling, we projected that a 25-fold increase in the number of subclones sequenced would approximately double the number of genera observed, reflecting the richness and high diversity of microbial communities in these biofilms. PMID:12788744

Singh, Ruby; Stine, O Colin; Smith, David L; Spitznagel, John K; Labib, Mohamed E; Williams, Henry N



Early detection of production deficit hot spots in semi-arid environment using FAPAR time series and a probabilistic approach  

NASA Astrophysics Data System (ADS)

Timely information on vegetation development at regional scale is needed in arid and semiarid African regions where rainfall variability leads to high inter-annual fluctuations in crop and pasture productivity, as well as to high risk of food crisis in the presence of severe drought events. The present study aims at developing and testing an automatic procedure to estimate the probability of experiencing a seasonal biomass production deficit solely on the basis of historical and near real-time remote sensing observations. The method is based on the extraction of vegetation phenology from SPOT-VEGETATION time series of the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR) and the subsequent computation of seasonally cumulated FAPAR as a proxy for vegetation gross primary production. Within-season forecasts of the overall seasonal performance, expressed in terms of probability of experiencing a critical deficit, are based on a statistical approach taking into account two factors: i) the similarity between the current FAPAR profile and past profiles observable in the 15-year FAPAR time series; ii) the uncertainty of past predictions of season outcome as derived using a jack-knifing technique. The method is applicable at the regional to continental scale and can be updated regularly during the season (whenever a new satellite observation is made available) to provide a synoptic view of the hot spots of likely production deficit. The specific objective of the procedure described here is to deliver to the food security analyst, as early as possible within the season, only the relevant information (e.g., masking out areas without active vegetation at the time of analysis), expressed through a reliable and easily interpretable measure of impending risk. Evaluation of method performance and examples of application in the Sahel region are discussed.

Meroni, M.; Fasbender, D.; Kayitakire, F.; Pini, G.; Rembold, F.; Urbano, F.; Verstraete, M. M.



Using the concept of Chou's pseudo amino acid composition to predict apoptosis proteins subcellular location: an approach by approximate entropy.  


The function of a protein is closely correlated with its subcellular location. Prediction of subcellular location of apoptosis proteins is an important research area in post-genetic era because the knowledge of apoptosis proteins is useful to understand the mechanism of programmed cell death. Compared with the conventional amino acid composition (AAC), the Pseudo Amino Acid composition (PseAA) as originally introduced by Chou can incorporate much more information of a protein sequence so as to remarkably enhance the power of using a discrete model to predict various attributes of a protein. In this study, a novel approach is presented to predict apoptosis protein solely from sequence based on the concept of Chou's PseAA composition. The concept of approximate entropy (ApEn), which is a parameter denoting complexity of time series, is used to construct PseAA composition as additional features. Fuzzy K-nearest neighbor (FKNN) classifier is selected as prediction engine. Particle swarm optimization (PSO) algorithm is adopted for optimizing the weight factors which are important in PseAA composition. Two datasets are used to validate the performance of the proposed approach, which incorporate six subcellular locations and four subcellular locations, respectively. The results obtained by jackknife test are quite encouraging. They indicate that the ApEn of a protein sequence could effectively represent the information of apoptosis proteins' subcellular locations. It can at least play a complementary role to many of the existing methods, and might become a potentially useful tool for protein function prediction. The software in Matlab is available freely by contacting the corresponding author. PMID:18473953

Jiang, Xiaoying; Wei, Rong; Zhang, Tongliang; Gu, Quan



Demographic history and rare allele sharing among human populations.  


High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2-4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ~1,000 sequenced chromosomes per population, whereas ~2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence. PMID:21730125

Gravel, Simon; Henn, Brenna M; Gutenkunst, Ryan N; Indap, Amit R; Marth, Gabor T; Clark, Andrew G; Yu, Fuli; Gibbs, Richard A; Bustamante, Carlos D



Demographic history and rare allele sharing among human populations  

PubMed Central

High-throughput sequencing technology enables population-level surveys of human genomic variation. Here, we examine the joint allele frequency distributions across continental human populations and present an approach for combining complementary aspects of whole-genome, low-coverage data and targeted high-coverage data. We apply this approach to data generated by the pilot phase of the Thousand Genomes Project, including whole-genome 2–4× coverage data for 179 samples from HapMap European, Asian, and African panels as well as high-coverage target sequencing of the exons of 800 genes from 697 individuals in seven populations. We use the site frequency spectra obtained from these data to infer demographic parameters for an Out-of-Africa model for populations of African, European, and Asian descent and to predict, by a jackknife-based approach, the amount of genetic diversity that will be discovered as sample sizes are increased. We predict that the number of discovered nonsynonymous coding variants will reach 100,000 in each population after ~1,000 sequenced chromosomes per population, whereas ~2,500 chromosomes will be needed for the same number of synonymous variants. Beyond this point, the number of segregating sites in the European and Asian panel populations is expected to overcome that of the African panel because of faster recent population growth. Overall, we find that the majority of human genomic variable sites are rare and exhibit little sharing among diverged populations. Our results emphasize that replication of disease association for specific rare genetic variants across diverged populations must overcome both reduced statistical power because of rarity and higher population divergence.

Gravel, Simon; Henn, Brenna M.; Gutenkunst, Ryan N.; Indap, Amit R.; Marth, Gabor T.; Clark, Andrew G.; Yu, Fuli; Gibbs, Richard A.; Bustamante, Carlos D.; Altshuler, David L.; Durbin, Richard M.; Abecasis, Goncalo R.; Bentley, David R.; Chakravarti, Aravinda; Clark, Andrew G.; Collins, Francis S.; De La Vega, Francisco M.; Donnelly, Peter; Egholm, Michael; Flicek, Paul; Gabriel, Stacey B.; Gibbs, Richard A.; Knoppers, Bartha M.; Lander, Eric S.; Lehrach, Hans; Mardis, Elaine R.; McVean, Gil A.; Nickerson, Debbie A.; Peltonen, Leena; Schafer, Alan J.; Sherry, Stephen T.; Wang, Jun; Wilson, Richard K.; Gibbs, Richard A.; Deiros, David; Metzker, Mike; Muzny, Donna; Reid, Jeff; Wheeler, David; Wang, Jun; Li, Jingxiang; Jian, Min; Li, Guoqing; Li, Ruiqiang; Liang, Huiqing; Tian, Geng; Wang, Bo; Wang, Jian; Wang, Wei; Yang, Huanming; Zhang, Xiuqing; Zheng, Huisong; Lander, Eric S.; Altshuler, David L.; Ambrogio, Lauren; Bloom, Toby; Cibulskis, Kristian; Fennell, Tim J.; Gabriel, Stacey B.; Jaffe, David B.; Shefler, Erica; Sougnez, Carrie L.; Bentley, David R.; Gormley, Niall; Humphray, Sean; Kingsbury, Zoya; Koko-Gonzales, Paula; Stone, Jennifer; McKernan, Kevin J.; Costa, Gina L.; Ichikawa, Jeffry K.; Lee, Clarence C.; Sudbrak, Ralf; Lehrach, Hans; Borodina, Tatiana A.; Dahl, Andreas; Davydov, Alexey N.; Marquardt, Peter; Mertes, Florian; Nietfeld, Wilfiried; Rosenstiel, Philip; Schreiber, Stefan; Soldatov, Aleksey V.; Timmermann, Bernd; Tolzmann, Marius; Egholm, Michael; Affourtit, Jason; Ashworth, Dana; Attiya, Said; Bachorski, Melissa; Buglione, Eli; Burke, Adam; Caprio, Amanda; Celone, Christopher; Clark, Shauna; Conners, David; Desany, Brian; Gu, Lisa; Guccione, Lorri; Kao, Kalvin; Kebbel, Andrew; Knowlton, Jennifer; Labrecque, Matthew; McDade, Louise; Mealmaker, Craig; Minderman, Melissa; Nawrocki, Anne; Niazi, Faheem; Pareja, Kristen; Ramenani, Ravi; Riches, David; Song, Wanmin; Turcotte, Cynthia; Wang, Shally; Mardis, Elaine R.; Wilson, Richard K.; Dooling, 
David; Fulton, Lucinda; Fulton, Robert; Weinstock, George; Durbin, Richard M.; Burton, John; Carter, David M.; Churcher, Carol; Coffey, Alison; Cox, Anthony; Palotie, Aarno; Quail, Michael; Skelly, Tom; Stalker, James; Swerdlow, Harold P.; Turner, Daniel; De Witte, Anniek; Giles, Shane; Gibbs, Richard A.; Wheeler, David; Bainbridge, Matthew; Challis, Danny; Sabo, Aniko; Yu, Fuli; Yu, Jin; Wang, Jun; Fang, Xiaodong; Guo, Xiaosen; Li, Ruiqiang; Li, Yingrui; Luo, Ruibang; Tai, Shuaishuai; Wu, Honglong; Zheng, Hancheng; Zheng, Xiaole; Zhou, Yan; Li, Guoqing; Wang, Jian; Yang, Huanming; Marth, Gabor T.; Garrison, Erik P.; Huang, Weichun; Indap, Amit; Kural, Deniz; Lee, Wan-Ping; Leong, Wen Fung; Quinlan, Aaron R.; Stewart, Chip; Stromberg, Michael P.; Ward, Alistair N.; Wu, Jiantao; Lee, Charles; Mills, Ryan E.; Shi, Xinghua; Daly, Mark J.; DePristo, Mark A.; Altshuler, David L.; Ball, Aaron D.; Banks, Eric; Bloom, Toby; Browning, Brian L.; Cibulskis, Kristian; Fennell, Tim J.; Garimella, Kiran V.; Grossman, Sharon R.; Handsaker, Robert E.; Hanna, Matt; Hartl, Chris; Jaffe, David B.; Kernytsky, Andrew M.; Korn, Joshua M.; Li, Heng; Maguire, Jared R.; McCarroll, Steven A.; McKenna, Aaron; Nemesh, James C.; Philippakis, Anthony A.; Poplin, Ryan E.; Price, Alkes; Rivas, Manuel A.; Sabeti, Pardis C.; Schaffner, Stephen F.; Shefler, Erica; Shlyakhter, Ilya A.; Cooper, David N.; Ball, Edward V.; Mort, Matthew; Phillips, Andrew D.; Stenson, Peter D.; Sebat, Jonathan; Makarov, Vladimir; Ye, Kenny; Yoon, Seungtai C.; Bustamante, Carlos D.; Clark, Andrew G.; Boyko, Adam; Degenhardt, Jeremiah; Gravel, Simon; Gutenkunst, Ryan N.; Kaganovich, Mark; Keinan, Alon; Lacroute, Phil; Ma, Xin; Reynolds, Andy; Clarke, Laura; Flicek, Paul; Cunningham, Fiona; Herrero, Javier; Keenen, Stephen; Kulesha, Eugene; Leinonen, Rasko; McLaren, William M.; Radhakrishnan, Rajesh; Smith, Richard E.; Zalunin, Vadim; Zheng-Bradley, Xiangqun; Korbel, Jan O.; Stutz, Adrian M.; Humphray, Sean; Bauer, Markus; 
Cheetham, R. Keira; Cox, Tony; Eberle, Michael; James, Terena; Kahn, Scott; Murray, Lisa; Chakravarti, Aravinda; Ye, Kai; De La Vega, Francisco M.; Fu, Yutao; Hyland, Fiona C. L.; Manning, Jonathan M.; McLaughlin, Stephen F.; Peckham, Heather E.; Sakarya, Onur; Sun, Yongming A.; Tsung, Eric F.; Batzer, Mark A.; Konkel, Miriam K.; Walker, Jerilyn A.; Sudbrak, Ralf; Albrecht, Marcus W.; Amstislavskiy, Vyacheslav S.; Herwig, Ralf; Parkhomchuk, Dimitri V.; Sherry, Stephen T.; Agarwala, Richa; Khouri, Hoda M.; Morgulis, Aleksandr O.; Paschall, Justin E.; Phan, Lon D.; Rotmistrovsky, Kirill E.; Sanders, Robert D.; Shumway, Martin F.



Late Results of Endorectal Flap in Management of High Type Perianal Fistula  

PubMed Central

BACKGROUND Fistula-in-ano is a problematic perianal disease for physicians and patients because it is sometimes difficult to manage. Because of the different types of fistulas seen in patients, a careful approach is necessary to choose correctly among the various surgical techniques. One surgical method for complex fistula is the endorectal advancement flap, which has been frequently performed because of its low complication rate. METHODS This study enrolled 40 (33 males, 7 females) patients who suffered from high type fistula (greater than 30%-50% involvement of the external sphincter) as noted on digital rectal examination and endoanal sonography. Patients were seen at Shahid Faghihi Hospital, affiliated with Shiraz University of Medical Sciences, between 2007 and 2011. All enrolled patients received similar preoperative preparation. We used the jackknife operative position and determined the internal orifice of the fistula by inserting a probe, with injection of methylene blue or hydrogen peroxide. The endorectal advancement flap included the mucosa, submucosa and a thin portion of the muscle, and completely covered the sutured internal orifice area. The external orifice was opened to the external border of the external sphincter to allow for effective drainage. RESULTS All enrolled patients were followed for 36 months, which was noticeable statistically when compared with other study findings of high type fistula. The location of the external orifice, age, sex and bowel habits were not related to recurrence rate. CONCLUSION Endorectal advancement flap in selected patients who suffer from high type fistula seems to have beneficial effects with a low recurrence rate. Nevertheless, management of complex high type fistulas remains a challenging topic.

Ghahramani, Ladan; Bananzadeh, Ali Mohammad; Izadpanah, Ahmad; Hosseini, Seyed Vahid



Detecting taxonomic signal in an under-utilised character system: geometric morphometrics of the forcipular coxae of Scutigeromorpha (Chilopoda)  

PubMed Central

Abstract To date, the forcipules have played almost no role in determining the systematics of scutigeromorph centipedes though in his 1974 review of taxonomic characters Markus Würmli suggested some potentially informative variation might be found in these structures. Geometric morphometric analyses were used to evaluate Würmli’s suggestion, specifically to determine whether the shape of the forcipular coxa contains information useful for diagnosing species. The geometry of the coxae of eight species from the genera Sphendononema, Scutigera, Dendrothereua, Thereuonema, Thereuopoda, Thereuopodina, Allothereua and Parascutigera was characterised using a combination of landmark- and semi-landmark-based sampling methods to summarize group-specific morphological variation. Canonical variates analysis of shape data characterizing the forcipular coxae indicates that these structures differ significantly between taxa at various systematic levels. Models calculated for the canonical variates space facilitate identification of the main shape differences between genera, including overall length/width, curvature of the external coxal margin, and the extent to which the coxofemoral condyle projects laterally. Jackknifed discriminant function analysis demonstrates that forcipular coxal training-set specimens were assigned to correct species in 61% of cases on average, the most accurate assignments being those of Parascutigera (Parascutigera guttata) and Thereuonema (Thereuonema microstoma). The geographically widespread species Thereuopoda longicornis, Sphendononema guildingii, Scutigera coleoptrata, and Dendrothereua linceci exhibit the least diagnostic coxae in our dataset. Thereuopoda longicornis populations sampled from different parts of East and Southeast Asia were significantly discriminated from each other, suggesting that, in this case, extensive synonymy may be obscuring diagnosable inter-species coxal shape differences.

Gutierrez, Beatriz Lopez; MacLeod, Norman; Edgecombe, Gregory D.



Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition  

SciTech Connect

The nucleus is the brain of eukaryotic cells that guides the life processes of the cell by issuing key instructions. For in-depth understanding of the biochemical process of the nucleus, the knowledge of localization of nuclear proteins is very important. With the avalanche of protein sequences generated in the post-genomic era, it is highly desired to develop an automated method for fast annotating the subnuclear locations for numerous newly found nuclear protein sequences so as to be able to timely utilize them for basic research and drug discovery. In view of this, a novel approach is developed for predicting the protein subnuclear location. It is featured by introducing a powerful classifier, the optimized evidence-theoretic K-nearest classifier, and using the pseudo amino acid composition [K.C. Chou, PROTEINS: Structure, Function, and Genetics, 43 (2001) 246], which can incorporate a considerable amount of sequence-order effects, to represent protein samples. As a demonstration, identifications were performed for 370 nuclear proteins among the following 9 subnuclear locations: (1) Cajal body, (2) chromatin, (3) heterochromatin, (4) nuclear diffuse, (5) nuclear pore, (6) nuclear speckle, (7) nucleolus, (8) PcG body, and (9) PML body. The overall success rates thus obtained by both the re-substitution test and jackknife cross-validation test are significantly higher than those by existing classifiers on the same working dataset. It is anticipated that the powerful approach may also become a useful high throughput vehicle to bridge the huge gap occurring in the post-genomic era between the number of gene sequences in databases and the number of gene products that have been functionally characterized. The OET-KNN classifier will be available at

Shen Hongbin [Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai 200030 (China); Chou Kuochen [Institute of Image Processing and Pattern Recognition, Shanghai Jiaotong University, Shanghai 200030 (China) and Gordon Life Science Institute, San Diego, CA 92130 (United States)]. E-mail:



iGPCR-Drug: A Web Server for Predicting Interaction between GPCRs and Drugs in Cellular Networking  

PubMed Central

Involved in many diseases such as cancer, diabetes, neurodegenerative, inflammatory and respiratory disorders, G-protein-coupled receptors (GPCRs) are among the most frequent targets of therapeutic drugs. It is time-consuming and expensive to determine whether a drug and a GPCR are to interact with each other in a cellular network purely by means of experimental techniques. Although some computational methods were developed in this regard based on the knowledge of the 3D (dimensional) structure of protein, unfortunately their usage is quite limited because the 3D structures for most GPCRs are still unknown. To overcome the situation, a sequence-based classifier, called “iGPCR-drug”, was developed to predict the interactions between GPCRs and drugs in cellular networking. In the predictor, the drug compound is formulated by a 2D (dimensional) fingerprint via a 256D vector, GPCR by the PseAAC (pseudo amino acid composition) generated with the grey model theory, and the prediction engine is operated by the fuzzy K-nearest neighbour algorithm. Moreover, a user-friendly web-server for iGPCR-drug was established at For the convenience of most experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results without the need to follow the complicated math equations presented in this paper just for its integrity. The overall success rate achieved by iGPCR-drug via the jackknife test was 85.5%, which is remarkably higher than the rate by the existing peer method developed in 2010 although no web server was ever established for it. It is anticipated that iGPCR-Drug may become a useful high throughput tool for both basic research and drug development, and that the approach presented here can also be extended to study other drug – target interaction networks.

Xiao, Xuan; Min, Jian-Liang; Wang, Pu; Chou, Kuo-Chen



Prediction of lysine ubiquitination with mRMR feature selection and analysis.  


Ubiquitination, one of the most important post-translational modifications of proteins, occurs when ubiquitin (a small 76-amino acid protein) is attached to lysine on a target protein. It often commits the labeled protein to degradation and plays important roles in regulating many cellular processes implicated in a variety of diseases. Since ubiquitination is rapid and reversible, it is time-consuming and labor-intensive to identify ubiquitination sites using conventional experimental approaches. To efficiently discover lysine-ubiquitination sites, a sequence-based predictor of ubiquitination sites was developed based on the nearest neighbor algorithm. We used the maximum relevance and minimum redundancy principle to identify the key features and the incremental feature selection procedure to optimize the prediction engine. PSSM conservation scores, amino acid factors and disorder scores of the surrounding sequence formed the optimized 456 features. The Matthews correlation coefficient (MCC) of our ubiquitination site predictor reached 0.142 in the jackknife cross-validation test on a large benchmark dataset. In an independent test, the MCC of our method was 0.139, higher than that of the existing ubiquitination site predictors UbiPred and UbPred. The MCCs of UbiPred and UbPred on the same test set were 0.135 and 0.117, respectively. Our analysis shows that the conservation of amino acids at and around lysine plays an important role in ubiquitination site prediction. Moreover, disorder and ubiquitination are strongly related. These findings might provide useful insights for studying the mechanisms of ubiquitination and modulating the ubiquitination pathway, potentially leading to therapeutic strategies in the future. PMID:21267749

Cai, Yudong; Huang, Tao; Hu, Lele; Shi, Xiaohe; Xie, Lu; Li, Yixue
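
The record above reports predictor quality as a Matthews correlation coefficient (MCC). As a reading aid, here is a minimal sketch of how an MCC is computed from the four confusion-matrix counts. The function name and the convention of returning 0.0 when a marginal total is zero are assumptions made for illustration, not details taken from the cited paper.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews correlation coefficient from binary confusion-matrix counts.

    tp, tn, fp, fn are the numbers of true positives, true negatives,
    false positives and false negatives.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        # Assumed convention: report 0.0 when any marginal total is zero.
        return 0.0
    return (tp * tn - fp * fn) / denom
```

The MCC ranges from -1 (every prediction wrong) to +1 (every prediction right), with 0 corresponding to chance-level agreement, so values such as 0.142 indicate a weak but positive association between predicted and true labels.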



Classification and analysis of regulatory pathways using graph property, biochemical and physicochemical property, and functional property.  


Given a regulatory pathway system consisting of a set of proteins, can we predict which pathway class it belongs to? Such a problem is closely related to the biological function of the pathway in cells and hence is quite fundamental and essential in systems biology and proteomics. This is also an extremely difficult and challenging problem due to its complexity. To address this problem, a novel approach was developed that can be used to predict query pathways among the following six functional categories: (i) "Metabolism", (ii) "Genetic Information Processing", (iii) "Environmental Information Processing", (iv) "Cellular Processes", (v) "Organismal Systems", and (vi) "Human Diseases". The prediction method was established through the following procedures: (i) according to the general form of pseudo amino acid composition (PseAAC), each of the pathways concerned is formulated as a 5570-D (dimensional) vector; (ii) each of the components in the 5570-D vector was derived by a series of feature extractions from the pathway system according to its graphic property, biochemical and physicochemical property, as well as functional property; (iii) the minimum redundancy maximum relevance (mRMR) method was adopted to operate the prediction. A cross-validation by the jackknife test on a benchmark dataset consisting of 146 regulatory pathways indicated that an overall success rate of 78.8% was achieved by our method in identifying query pathways among the above six classes, indicating the outcome is quite promising and encouraging. To the best of our knowledge, the current study represents the first effort in attempting to identify the type of a pathway system or its biological function. It is anticipated that our report may stimulate a series of follow-up investigations in this new and challenging area. PMID:21980418

Huang, Tao; Chen, Lei; Cai, Yu-Dong; Chou, Kuo-Chen



Prediction of deleterious non-synonymous SNPs based on protein interaction network and hybrid properties.  


Non-synonymous SNPs (nsSNPs), also known as Single Amino acid Polymorphisms (SAPs) account for the majority of human inherited diseases. It is important to distinguish the deleterious SAPs from neutral ones. Most traditional computational methods to classify SAPs are based on sequential or structural features. However, these features cannot fully explain the association between a SAP and the observed pathophysiological phenotype. We believe the better rationale for deleterious SAP prediction should be: If a SAP lies in the protein with important functions and it can change the protein sequence and structure severely, it is more likely related to disease. So we established a method to predict deleterious SAPs based on both protein interaction network and traditional hybrid properties. Each SAP is represented by 472 features that include sequential features, structural features and network features. Maximum Relevance Minimum Redundancy (mRMR) method and Incremental Feature Selection (IFS) were applied to obtain the optimal feature set and the prediction model was Nearest Neighbor Algorithm (NNA). In jackknife cross-validation, 83.27% of SAPs were correctly predicted when the optimized 263 features were used. The optimized predictor with 263 features was also tested in an independent dataset and the accuracy was still 80.00%. In contrast, SIFT, a widely used predictor of deleterious SAPs based on sequential features, has a prediction accuracy of 71.05% on the same dataset. In our study, network features were found to be most important for accurate prediction and can significantly improve the prediction performance. Our results suggest that the protein interaction context could provide important clues to help better illustrate SAP's functional association. This research will facilitate the post genome-wide association studies. PMID:20689580

Huang, Tao; Wang, Ping; Ye, Zhi-Qiang; Xu, Heng; He, Zhisong; Feng, Kai-Yan; Hu, Lele; Cui, Weiren; Wang, Kai; Dong, Xiao; Xie, Lu; Kong, Xiangyin; Cai, Yu-Dong; Li, Yixue
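
Several records in this list evaluate a nearest-neighbour predictor with a "jackknife test", in which each sample is held out in turn and classified using all the others. The sketch below illustrates that procedure for a plain 1-nearest-neighbour rule with squared Euclidean distance. The function name, the distance measure and the toy data are assumptions for illustration only; the cited studies use their own feature sets and classifier variants.

```python
def jackknife_accuracy(features, labels):
    """Leave-one-out (jackknife) accuracy of a 1-nearest-neighbour classifier.

    features: list of equal-length numeric tuples, one per sample.
    labels:   list of class labels, aligned with features.
    """
    n = len(features)
    correct = 0
    for i in range(n):
        best_dist, best_label = None, None
        for j in range(n):
            if j == i:
                continue  # the held-out sample may not vote for itself
            dist = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
            if best_dist is None or dist < best_dist:
                best_dist, best_label = dist, labels[j]
        if best_label == labels[i]:
            correct += 1
    return correct / n


# Toy data (hypothetical): two well-separated clusters.
toy_features = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.0)]
toy_labels = ["neutral", "neutral", "deleterious", "deleterious"]
print(jackknife_accuracy(toy_features, toy_labels))
```

Because every sample is predicted by a model that never saw it, the resulting success rate is less optimistic than a re-substitution test on the same data.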



A comparison of ROC inferred from FROC and conventional ROC  

NASA Astrophysics Data System (ADS)

This study aims to determine whether receiver operating characteristic (ROC) scores inferred from free-response receiver operating characteristic (FROC) are equivalent to conventional ROC scores for the same readers and cases. Forty-five examining radiologists of the American Board of Radiology independently reviewed 47 PA chest radiographs under at least two conditions. Thirty-seven cases had abnormal findings and 10 cases had normal findings. Half the readers were asked to first locate any visualized lung nodules, mark them and assign a level of confidence [the FROC mark-rating pair] and second give an overall rating to the entire image on the same scale [the ROC score]. The second half of readers gave the ROC rating first followed by the FROC mark-rating pairs. A normal image was represented with number 1 and malignant lesions with numbers 2-5. Jackknife free-response receiver operating characteristic (JAFROC) and inferred ROC (infROC) scores were calculated from the mark-rating pairs using JAFROC V4.1 software. ROC based on the overall rating of the image was calculated using DBM MRMC software, which was also used to compare infROC and ROC AUCs, treating the methods as modalities. Pearson's correlation coefficient and linear regression were used to examine their relationship using SPSS, version 21.0 (SPSS, Chicago, IL). The results of this study showed no significant difference between the ROC and inferred ROC AUCs (p?0.25), while Pearson's correlation coefficient was 0.7 (p?0.01). Inter-reader correlation calculated from Obuchowski-Rockette covariances ranged from 0.43-0.86, while intra-reader agreement was greater than previously reported, ranging from 0.68-0.82.

McEntee, Mark F.; Littlefair, Stephen; Pietrzyk, Mariusz W.



Housefly Population Density Correlates with Shigellosis among Children in Mirzapur, Bangladesh: A Time Series Analysis  

PubMed Central

Background Shigella infections are a public health problem in developing and transitional countries because of high transmissibility, severity of clinical disease, widespread antibiotic resistance and lack of a licensed vaccine. Whereas Shigellae are known to be transmitted primarily by direct fecal-oral contact and less commonly by contaminated food and water, the role of the housefly Musca domestica as a mechanical vector of transmission is less appreciated. We sought to assess the contribution of houseflies to Shigella-associated moderate-to-severe diarrhea (MSD) among children less than five years old in Mirzapur, Bangladesh, a site where shigellosis is hyperendemic, and to model the potential impact of a housefly control intervention. Methods Stool samples from 843 children presenting to Kumudini Hospital during 2009–2010 with new episodes of MSD (diarrhea accompanied by dehydration, dysentery or hospitalization) were analyzed. Housefly density was measured twice weekly in six randomly selected sentinel households. Poisson time series regression was performed and autoregression-adjusted attributable fractions (AFs) were calculated using the Bruzzi method, with standard errors via jackknife procedure. Findings Dramatic springtime peaks in housefly density in 2009 and 2010 were followed one to two months later by peaks of Shigella-associated MSD among toddlers and pre-school children. Poisson time series regression showed that housefly density was associated with Shigella cases at three lags (six weeks) (Incidence Rate Ratio = 1.39 [95% CI: 1.23 to 1.58] for each log increase in fly count), an association that was not confounded by ambient air temperature. Autocorrelation-adjusted AF calculations showed that a housefly control intervention could have prevented approximately 37% of the Shigella cases over the study period. Interpretation Houseflies may play an important role in the seasonal transmission of Shigella in some developing country ecologies.
Interventions to control houseflies should be evaluated as possible additions to the public health arsenal to diminish Shigella (and perhaps other causes of) diarrheal infection.

Farag, Tamer H.; Faruque, Abu S.; Wu, Yukun; Das, Sumon K.; Hossain, Anowar; Ahmed, Shahnawaz; Ahmed, Dilruba; Nasrin, Dilruba; Kotloff, Karen L.; Panchilangam, Sandra; Nataro, James P.; Cohen, Dani; Blackwelder, William C.; Levine, Myron M.
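
The record above obtains standard errors "via jackknife procedure". The sketch below shows the generic delete-one jackknife standard error of an arbitrary statistic, assuming independent observations held in a Python list. It is not the authors' autoregression-adjusted calculation; the function name and the example data are hypothetical.

```python
import math


def jackknife_se(data, statistic):
    """Delete-one jackknife standard error of statistic(data).

    data:      list of observations (assumed independent for this sketch).
    statistic: function mapping a list of observations to a number.
    """
    n = len(data)
    # Recompute the statistic n times, each time leaving one observation out.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return math.sqrt(variance)


def sample_mean(values):
    return sum(values) / len(values)


# For the sample mean, the jackknife reproduces the usual s / sqrt(n).
print(jackknife_se([1, 2, 3, 4, 5], sample_mean))
```

The same function can be applied to statistics with no simple variance formula, such as a ratio or an attributable fraction, by passing a different `statistic` callable.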



Differentiation of fecal Escherichia coli from human, livestock, and poultry sources by rep-PCR DNA fingerprinting on the shellfish culture area of East China Sea.  


The rep-PCR DNA fingerprinting performed with REP, BOX A1R, and (GTG)(5) primers was investigated as a way to differentiate between human, livestock, and poultry sources of fecal pollution in the area of Xiangshan Bay, East China Sea. Of the three methods, the BOX-PCR DNA fingerprints analyzed by the jack-knife algorithm revealed a high rate of correct classification (RCC), with 91.30, 80.39, 89.39, 86.14, 93.24, 87.72, and 89.28% of human, cattle, swine, chicken, duck, sheep, and goose E. coli isolates classified into the correct host source, respectively. The average rate of correct classification (ARCC) of REP-, BOX-, and (GTG)(5)-PCR patterns was 79.88, 88.21, and 86.39%, respectively. Although the highest number of bands was observed in (GTG)(5)-PCR fingerprints, the discriminatory efficacy of BOX-PCR was superior to both REP- and (GTG)(5)-PCR. Moreover, the similarity of 459 isolates originating from shellfish and growing water was compared with fecal-obtained strains. The results showed that 92.4 and 96.2% of E. coli strains isolated from midstream and downstream shellfish samples, respectively, had ≥ 80% similarity with corresponding strains isolated from fecal samples. This indicated that E. coli in feces could spread from human sewage or domestic farms to the surrounding shellfish culture water, and potentially affect the quality of shellfish. This work suggests that rep-PCR fingerprinting can be a promising genotypic tool for source identification of fecal pollution in the management of shellfish growing water in the East China Sea. PMID:21279641

Ma, Hong-Jia; Fu, Ling-Lin; Li, Jian-Rong



Spring flood reconstruction from continuous and discrete tree ring series  

NASA Astrophysics Data System (ADS)

This study proposes a method to reconstruct past spring flood discharge from continuous and discrete tree ring chronologies, since both have their respective strengths and weaknesses in northern environments. Ring width or density series provide uninterrupted records that are indirectly linked to regional discharge through a concomitant effect of climate on tree growth and streamflow. Conversely, discrete event chronologies constitute conspicuous records of past high water levels since they are constructed from trees that are directly damaged by the flood. However, the uncertainty of discrete series increases toward the past, and their relationships with spring discharge are often nonlinear. To take advantage of these two sources of information, we introduce a new transfer model technique on the basis of generalized additive model (GAM) theory. The incorporation of discrete predictors and the evaluation of the robustness of the nonlinear relationships are assessed using a jackknife procedure. We exemplify our approach in a reconstruction of May water supplies to the Caniapiscau hydroelectric reservoir in northern Quebec, Canada. We used earlywood density measurements as continuous variables and ice-scar dates around Lake Montausier in the James Bay area as a discrete variable. Strong calibration (0.57 < 0.61 < 0.75) and validation (0.27 < 0.44 < 0.58) R² statistics were obtained, thus highlighting the usefulness of the model. Our reconstruction suggests that, since ~1965, spring floods have become more intense and variable in comparison with the last 150 years. We argue that a similar procedure can be used in each case where discrete and continuous tree ring proxies are used together to reconstruct past spring floods.

Boucher, Étienne; Ouarda, Taha B. M. J.; Bégin, Yves; Nicault, Antoine



Dose dependence of mass and microcalcification detection in digital mammography: free response human observer studies  

PubMed Central

The purpose of this study was to evaluate the effect of dose reduction in digital mammography on the detection of two lesion types – malignant masses and clusters of microcalcifications. Two free-response observer studies were performed – one for each lesion type. Ninety screening images were retrospectively selected; each image was originally acquired under automatic exposure conditions, corresponding to an average glandular dose of 1.3 mGy for a standard breast (50 mm compressed breast thickness with 50% glandularity). For each study, one to three simulated lesions were added to each of forty images (abnormals) while fifty were kept without lesions (normals). Two levels of simulated system noise were added to the images yielding two new image sets, corresponding to simulated dose levels of 50% and 30% of the original images (100%). The manufacturer’s standard display processing was subsequently applied to all images. Four radiologists experienced in mammography evaluated the images by searching for lesions and marking and assigning confidence levels to suspicious regions. The search data was analyzed using jackknife free-response (JAFROC) methodology. For the detection of masses, the mean figure-of-merit (FOM) averaged over all readers was 0.74, 0.71, and 0.68 corresponding to dose levels 100%, 50% and 30%, respectively. These values were not statistically different from each other (F = 1.67, p = 0.19) but showed a decreasing trend. In contrast, in the microcalcification study the mean FOM was 0.93, 0.67, and 0.38 for the same dose levels and these values were all significantly different from each other (F = 109.84, p < 0.0001). The results indicate that lowering the present dose level by a factor of two compromised the detection of microcalcifications but had a weaker effect on mass detection.

Ruschin, Mark; Timberg, Pontus; Bath, Magnus; Hemdal, Bengt; Svahn, Tony; Saunders, Rob; Samei, Ehsan; Andersson, Ingvar; Mattsson, Soren; Chakraborty, Dev P.; Tingberg, Anders



EMPeror: a tool for visualizing high-throughput microbial community data  

PubMed Central

Background As microbial ecologists take advantage of high-throughput sequencing technologies to describe microbial communities across ever-increasing numbers of samples, new analysis tools are required to relate the distribution of microbes among larger numbers of communities, and to use increasingly rich and standards-compliant metadata to understand the biological factors driving these relationships. In particular, the Earth Microbiome Project drives these needs by profiling the genomic content of tens of thousands of samples across multiple environment types. Findings Features of EMPeror include: ability to visualize gradients and categorical data, visualize different principal coordinates axes, present the data in the form of parallel coordinates, show taxa as well as environmental samples, dynamically adjust the size and transparency of the spheres representing the communities on a per-category basis, dynamically scale the axes according to the fraction of variance each explains, show, hide or recolor points according to arbitrary metadata including that compliant with the MIxS family of standards developed by the Genomic Standards Consortium, display jackknifed-resampled data to assess statistical confidence in clustering, perform coordinate comparisons (useful for procrustes analysis plots), and greatly reduce loading times and overall memory footprint compared with existing approaches. Additionally, ease of sharing, given EMPeror’s small output file size, enables agile collaboration by allowing users to embed these visualizations via emails or web pages without the need for extra plugins. Conclusions Here we present EMPeror, an open source and web browser enabled tool with a versatile command line interface that allows researchers to perform rapid exploratory investigations of 3D visualizations of microbial community data, such as the widely used principal coordinates plots. 
EMPeror includes a rich set of controllers to modify features as a function of the metadata. By being specifically tailored to the requirements of microbial ecologists, EMPeror thus increases the speed with which insight can be gained from large microbiome datasets.



Prediction of apoptosis protein locations with genetic algorithms and support vector machines through a new mode of pseudo amino acid composition.  


Apoptosis is an essential process for controlling tissue homeostasis by regulating a physiological balance between cell proliferation and cell death. The subcellular locations of proteins performing the cell death are determined by mostly independent cellular mechanisms. The regular bioinformatics tools to predict the subcellular locations of such apoptotic proteins often fail. This work proposes a model for the sorting of proteins that are involved in apoptosis, allowing both the prediction of their subcellular locations and the identification of the molecular properties that contribute to them. We report a novel hybrid Genetic Algorithm (GA)/Support Vector Machine (SVM) approach to predict apoptotic protein sequences using 119 sequence-derived properties such as frequency of amino acid groups, secondary structure, and physicochemical properties. GA is used for selecting a near-optimal subset of informative features that is most relevant for the classification. Jackknife cross-validation is applied to test the predictive capability of the proposed method on 317 apoptosis proteins. Our method achieved 85.80% accuracy using all 119 features and 89.91% accuracy for 25 features selected by GA. Our models were examined on a test dataset of 98 apoptosis proteins and obtained an overall accuracy of 90.34%. The results show that the proposed approach is promising; it is able to select small subsets of features and still improve the classification accuracy. Our model can contribute to the understanding of programmed cell death and drug discovery. The software and dataset are available at PMID:20666727

Kandaswamy, Krishna Kumar; Pugalenthi, Ganesan; Möller, Steffen; Hartmann, Enno; Kalies, Kai-Uwe; Suganthan, P N; Martinetz, Thomas
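
The jackknife cross-validation mentioned in this abstract is commonly understood as leave-one-out testing: each protein is held out once, the model is trained on the rest, and the held-out prediction is scored. A minimal Python sketch follows. The nearest-centroid classifier, the function names, and the toy data are all assumptions chosen for brevity; the paper itself uses a GA/SVM pipeline that is not reproduced here.

```python
from collections import defaultdict


def nearest_centroid_predict(train_X, train_y, x):
    """Return the label whose class centroid is closest to x (squared Euclidean)."""
    sums = {}
    counts = defaultdict(int)
    for features, label in zip(train_X, train_y):
        if label not in sums:
            sums[label] = [0.0] * len(features)
        sums[label] = [s + f for s, f in zip(sums[label], features)]
        counts[label] += 1
    best_label, best_dist = None, float("inf")
    for label, total in sums.items():
        centroid = [s / counts[label] for s in total]
        dist = sum((a - b) ** 2 for a, b in zip(centroid, x))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(X, y, predict=nearest_centroid_predict):
    """Hold each sample out once, train on the others, score the held-out sample."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Hypothetical two-feature toy data, not taken from the paper.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [1.0, 0.9], [0.9, 1.1], [1.1, 1.0]]
y = ["cyto", "cyto", "cyto", "mito", "mito", "mito"]
accuracy = jackknife_accuracy(X, y)
```

Any classifier with the same `predict(train_X, train_y, x)` shape could be swapped in; the loop itself is the jackknife part.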



Population pharmacokinetics of clofarabine, a second-generation nucleoside analog, in pediatric patients with acute leukemia.  


The population pharmacokinetics of plasma clofarabine and intracellular clofarabine triphosphate were characterized in pediatric patients with acute leukemias. Traditional model-building techniques with NONMEM were used. Covariates were entered into the base model using a forward selection significance level of .05 and a backwards deletion criterion of .005. Model performance, stability, and influence analysis were assessed using the nonparametric bootstrap and n-1 jackknife. Simulations were used to understand the relationship between important covariates and exposure. A 2-compartment model with weight (scaled to a 40-kg reference patient) modeled as a power function on all pharmacokinetic parameters (0.75 on clearance-related terms and 1.0 on volume-related terms) was fit to plasma clofarabine concentrations (n = 32). White blood cell (WBC) count, modeled as a power function (scaled to a WBC count of 10 × 10^3/microL), was a significant predictor of central volume with power term 0.128 +/- 0.0314. A reference patient had a systemic clearance of 32.8 L/h (27% between-subject variability [BSV]), a central volume of 115 L (56% BSV), an intercompartmental clearance of 20.5 L/h (27% BSV), and a peripheral volume of 94.5 L (39% BSV). Intracellular clofarabine triphosphate concentrations were modeled using a random intercept model without any covariates. The average predicted concentration was 11.6 +/- 2.62 microM (80% BSV), and although clofarabine triphosphate half-life could not be definitively estimated, its value was taken to be longer than 24 hours. The results confirm that clofarabine should continue being dosed on a per-square-meter or per-body-weight basis. PMID:15496649

Bonate, Peter L; Craig, Adam; Gaynon, Paul; Gandhi, Varsha; Jeha, Sima; Kadota, Richard; Lam, Gilbert N; Plunkett, William; Razzouk, Bassem; Rytting, Michael; Steinherz, Peter; Weitman, Steve
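
The covariate structure described in this abstract (weight scaled to a 40-kg reference with exponents 0.75 and 1.0, and WBC count scaled to 10 × 10^3/microL with exponent 0.128 on central volume) can be sketched in Python as below. The function name and the multiplicative combination of the weight and WBC terms are assumptions; the abstract does not spell out the exact model equations, and between-subject variability is ignored.

```python
def typical_pk_parameters(weight_kg, wbc_thousands_per_microl):
    """Typical-value parameters under the power-function covariate model.

    Reference values are the ones quoted in the abstract for a 40-kg patient
    with a WBC count of 10 x 10^3/microL. Combining the weight and WBC effects
    by multiplication is an assumption made for this sketch.
    """
    size = weight_kg / 40.0
    wbc = wbc_thousands_per_microl / 10.0
    return {
        "clearance_l_per_h": 32.8 * size ** 0.75,
        "central_volume_l": 115.0 * size ** 1.0 * wbc ** 0.128,
        "intercompartmental_clearance_l_per_h": 20.5 * size ** 0.75,
        "peripheral_volume_l": 94.5 * size ** 1.0,
    }


reference = typical_pk_parameters(40.0, 10.0)
smaller_child = typical_pk_parameters(20.0, 10.0)
```

At the reference covariates every scaling factor equals 1, so the function returns the reference values unchanged; halving the weight halves the volume terms but reduces the clearance terms by a smaller factor (0.5 raised to 0.75).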



MR Imaging in Patients with Suspected Liver Metastases: Value of Liver-Specific Contrast Agent Gadoxetic Acid  

PubMed Central

Objective To compare the diagnostic performance of gadoxetic acid-enhanced magnetic resonance (MR) imaging with that of triple-phase multidetector-row computed tomography (MDCT) in the detection of liver metastasis. Materials and Methods Our institutional review board approved this retrospective study and waived informed consent. The study population consisted of 51 patients with hepatic metastases and 62 patients with benign hepatic lesions, who underwent triple-phase MDCT and gadoxetic acid-enhanced MRI within one month. Two radiologists independently and randomly reviewed MDCT and MRI images regarding the presence and probability of liver metastasis. In order to determine additional value of hepatobiliary-phase (HBP), the dynamic-MRI set alone and combined dynamic-and-HBP set were evaluated, respectively. The standard of reference was a combination of pathology diagnosis and follow-up imaging. For each reader, diagnostic accuracy was compared using the jackknife alternative free-response receiver-operating-characteristic (JAFROC). Results For both readers, average JAFROC figure-of-merit (FOM) was significantly higher on the MR image sets than on the MDCT images: average FOM was 0.582 on the MDCT, 0.788 on the dynamic-MRI set and 0.847 on the combined HBP set, respectively (p < 0.0001). The differences were more prominent for small (≤ 1 cm) lesions: average FOM values were 0.433 on MDCT, 0.711 on the dynamic-MRI set and 0.828 on the combined HBP set, respectively (p < 0.0001). Sensitivity increased significantly with the addition of HBP in gadoxetic acid-enhanced MR imaging (p < 0.0001). Conclusion Gadoxetic acid-enhanced MRI shows a better performance than triple-phase MDCT for the detection of hepatic metastasis, especially for small (≤ 1 cm) lesions.

Lee, Kyung Hee; Park, Ji Hoon; Kim, Jung Hoon; Park, Hee Sun; Yu, Mi Hye; Yoon, Jeong-Hee; Han, Joon Koo; Choi, Byung Ihn



Arthropod community structure in pastures of an island archipelago (Azores): looking for local-regional species richness patterns at fine-scales.  


The arthropod species richness of pastures in three Azorean islands was used to examine the relationship between local and regional species richness over two years. Two groups of arthropods, spiders and sucking insects, representing two functionally different but common groups of pasture invertebrates were investigated. The local-regional species richness relationship was assessed over relatively fine scales: quadrats (= local scale) and within pastures (= regional scale). Mean plot species richness was used as a measure of local species richness (= alpha diversity) and regional species richness was estimated at the pasture level (= gamma diversity) with the 'first-order-Jackknife' estimator. Three related issues were addressed: (i). the role of estimated regional species richness and variables operating at the local scale (vegetation structure and diversity) in determining local species richness; (ii). quantification of the relative contributions of alpha and beta diversity to regional diversity using additive partitioning; and (iii). the occurrence of consistent patterns in different years by analysing independently between-year data. Species assemblages of spiders were saturated at the local scale (similar local species richness and increasing beta-diversity in richer regions) and were more dependent on vegetational structure than regional species richness. Sucking insect herbivores, by contrast, exhibited a linear relationship between local and regional species richness, consistent with the proportional sampling model. The patterns were consistent between years. These results imply that for spiders local processes are important, with assemblages in a particular patch being constrained by habitat structure. In contrast, for sucking insects, local processes may be insignificant in structuring communities. PMID:15153294

Borges, P A V; Brown, V K
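
The first-order jackknife estimator used here for regional richness adds to the observed species count a correction based on the species found in exactly one quadrat: S_jack1 = S_obs + f1 (n - 1) / n, where n is the number of quadrats and f1 the number of single-quadrat species. The Python sketch below also shows additive partitioning (beta = gamma - alpha, with alpha as mean quadrat richness). Function names and the toy quadrats are assumptions for illustration only.

```python
def first_order_jackknife(quadrats):
    """First-order jackknife richness estimate from presence data per quadrat."""
    n = len(quadrats)
    if n == 0:
        raise ValueError("at least one quadrat is required")
    occurrences = {}
    for quadrat in quadrats:
        for species in set(quadrat):
            occurrences[species] = occurrences.get(species, 0) + 1
    observed = len(occurrences)
    # Species recorded in exactly one quadrat drive the correction term.
    uniques = sum(1 for count in occurrences.values() if count == 1)
    return observed + uniques * (n - 1) / n


def additive_beta(gamma, quadrats):
    """Additive partitioning: beta = gamma - mean quadrat (alpha) richness."""
    alpha = sum(len(set(q)) for q in quadrats) / len(quadrats)
    return gamma - alpha


# Hypothetical quadrat records: A occurs everywhere, C and D once each.
quadrats = [{"A", "B"}, {"A", "C"}, {"A", "B", "D"}, {"A"}]
gamma_estimate = first_order_jackknife(quadrats)   # 4 + 2 * 3/4 = 5.5
beta_estimate = additive_beta(gamma_estimate, quadrats)  # 5.5 - 2.0 = 3.5
```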



Computer-aided Detection Improves Detection of Pulmonary Nodules in Chest Radiographs beyond the Support by Bone-suppressed Images.  


Purpose To evaluate the added value of computer-aided detection (CAD) for lung nodules on chest radiographs when radiologists have bone-suppressed images (BSIs) available. Materials and Methods Written informed consent was waived by the institutional review board. Selection of study images and study setup was reviewed and approved by the institutional review boards. Three hundred posteroanterior (PA) and lateral chest radiographs (189 radiographs with negative findings and 111 radiographs with a solitary nodule) in 300 subjects were selected from image archives at four institutions. PA images were processed by using a commercially available CAD, and PA BSIs were generated. Five radiologists and three residents evaluated the radiographs with BSIs available, first, without CAD and, second, after inspection of the CAD marks. Readers marked locations suspicious for a nodule and provided a confidence score for that location to be a nodule. Location-based receiver operating characteristic analysis was performed by using jackknife alternative free-response receiver operating characteristic analysis. Area under the curve (AUC) functioned as figure of merit, and P values were computed with the Dorfman-Berbaum-Metz method. Results Average nodule size was 16.2 mm. Stand-alone CAD reached a sensitivity of 74% at 1.0 false-positive mark per image. Without CAD, average AUC for observers was 0.812. With CAD, performance significantly improved to an AUC of 0.841 (P = .0001). CAD detected 127 of 239 nodules that were missed after evaluation of the radiographs together with BSIs pooled over all observers. Only 57 of these detections were eventually marked by the observers after review of CAD candidates. Conclusion CAD improved radiologists' performance for the detection of lung nodules on chest radiographs, even when baseline performance was optimized by providing lateral radiographs and BSIs. Still, most of the true-positive CAD candidates are dismissed by observers. © RSNA, 2014. 
PMID:24635675

Schalekamp, Steven; van Ginneken, Bram; Koedam, Emmeline; Snoeren, Miranda M; Tiehuis, Audrey M; Wittenberg, Rianne; Karssemeijer, Nico; Schaefer-Prokop, Cornelia M



[Forest lightning fire forecasting for Daxing'anling Mountains based on MAXENT model].  


Daxing'anling Mountains is one of the areas with the highest occurrence of forest lightning fire in Heilongjiang Province, and developing a lightning fire forecast model to accurately predict the forest fires in this area is of importance. Based on the data of forest lightning fires and environment variables, the MAXENT model was used to predict the lightning fire in the Daxing'anling region. Firstly, we ran collinearity diagnostics on each environment variable, evaluated the importance of the environmental variables using training gain and the Jackknife method, and then evaluated the prediction accuracy of the MAXENT model using the max Kappa value and the AUC value. The results showed that the variance inflation factor (VIF) values of lightning energy and neutralized charge were 5.012 and 6.230, respectively. They were collinear with the other variables, so they could not be used for model training. Daily rainfall, the number of cloud-to-ground lightning, and current intensity of cloud-to-ground lightning were the three most important factors affecting the lightning fires in the forest, while the daily average wind speed and the slope were of less importance. As the proportion of test data increased, the max Kappa and AUC values increased. The max Kappa values were above 0.75 and the average value was 0.772, while all of the AUC values were above 0.5 and the average value was 0.859. With a moderate level of prediction accuracy being achieved, the MAXENT model could be used to predict forest lightning fire in Daxing'anling Mountains. PMID:25011305

Sun, Yu; Shi, Ming-Chang; Peng, Huan; Zhu, Pei-Lin; Liu, Si-Lin; Wu, Shi-Lei; He, Cheng; Chen, Feng



A New Estimate of the Earth's Land Surface Temperature History  

NASA Astrophysics Data System (ADS)

The Berkeley Earth Surface Temperature team has re-evaluated the world's atmospheric land surface temperature record using a linear least-squares method that allows the use of all the digitized records back to 1800, including short records that had been excluded by prior groups. We use the Kriging method to estimate an optimal weighting of stations to give a world average based on uniform weighting of the land surface. We have assembled a record of the available data by merging 1.6 billion temperature reports from 16 pre-existing data archives; this data base will be made available for public use. The former Global Historic Climatology Network (GHCN) monthly data base shows a sudden drop in the number of stations reporting monthly records from 1980 to the present; we avoid this drop by calculating monthly averages from the daily records. By using all the data, we reduce the effects of potential data selection bias. We make an independent estimate of the urban heat island effect by calculating the world land temperature trends based on stations chosen to be far from urban sites. We calculate the effect of poor station quality, as documented in the US by the team led by Anthony Watts, by estimating the temperature trends based solely on the stations ranked good (1,2 or 1,2,3 in the NOAA ranking scheme). We avoid issues of homogenization bias by using raw data; at times when the records are discontinuous (e.g. due to station moves) we break the record into smaller segments and analyze those, rather than attempt to correct the discontinuity. We estimate the uncertainties in the final results using the jackknife procedure developed by J. Tukey. We calculate spatial uncertainties by measuring the effects of geographical exclusion on recent data that have good world coverage. The results we obtain are compared to those published by the groups at NOAA, NASA-GISS, and Hadley-CRU in the UK.

Muller, R. A.; Curry, J. A.; Groom, D.; Jacobsen, B.; Perlmutter, S.; Rohde, R. A.; Rosenfeld, A.; Wickham, C.; Wurtele, J.
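
The jackknife uncertainty estimate referred to in this abstract can be illustrated with the textbook delete-one form: recompute the statistic with each observation removed, then scale the spread of those replicates by (n - 1) / n. The abstract does not say which deletion scheme the team used, so the delete-one choice, the function names, and the toy values in this Python sketch are assumptions.

```python
import math


def jackknife_standard_error(values, statistic):
    """Delete-one jackknife standard error of statistic(values)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # One replicate per observation, each computed without that observation.
    replicates = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    center = sum(replicates) / n
    variance = (n - 1) / n * sum((r - center) ** 2 for r in replicates)
    return math.sqrt(variance)


def mean(values):
    return sum(values) / len(values)


# Hypothetical station anomalies in degrees C, not real data.
anomalies = [0.1, 0.3, 0.2, 0.4, 0.5]
standard_error = jackknife_standard_error(anomalies, mean)
```

For the sample mean this reproduces the familiar s / sqrt(n); the value of the procedure is that the same function works for statistics with no simple closed-form error, such as a weighted global average.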



Source mechanisms of the 2000 earthquake swarm in the West Bohemia/Vogtland region (Central Europe)  

NASA Astrophysics Data System (ADS)

An earthquake swarm of magnitudes up to ML = 3.2 occurred in the region of West Bohemia/Vogtland (border area between Czech Republic and Germany) in autumn 2000. This swarm consisted of nine episodic phases and lasted 4 months. We retrieved source mechanisms of 102 earthquakes with magnitudes between ML = 1.6 and 3.2 applying inversion of the peak amplitudes of direct P and SH waves, which were determined from ground motion seismograms. The investigated events cover the whole swarm activity in both time and space. We use data from permanent stations of seismic network WEBNET and from temporal stations, which were deployed in the epicentral area during the swarm; the number of stations varied from 7 to 18. The unconstrained moment tensor (MT) expression of the mechanism, which describes a general system of dipoles, that is both double-couple (DC) and non-DC sources, was applied. MTs of each earthquake were estimated by inversion of three different sets of data: P-wave amplitudes only, P- and SH-wave amplitudes and P-wave amplitudes along with the SH-wave amplitudes from a priori selected four `base' WEBNET stations, the respective MT solutions are nearly identical for each event investigated. The resultant mechanisms of all events are dominantly DCs with only insignificant non-DC components mostly not exceeding 10 per cent. We checked reliability of the MTs in jackknife trials eliminating some data; we simulated the mislocation of hypocentre or contaminated the P- and SH-wave amplitudes by accidental errors. These tests proved stable and well constrained MT solutions. The massive dominance of the DC in all investigated events implies that the 2000 swarm consisted of a large number of pure shears along a fault plane. The focal mechanisms indicate both oblique-normal and oblique-thrust faulting, however, the oblique-normal faulting prevails. 
The predominant strikes and dips of the oblique-normal events fit well the geometry of the main fault plane Nový Kostel (NK) and also match the strike, dip and rake of the largest ML = 4.6 earthquake of a strong swarm in 1985/86. On the contrary, the 2000 source mechanisms differ substantially from those of the 1997-swarm (which took place in two fault segments at the edge of the main NK fault plane) in both the faulting and the content of non-DC components. Further, we found that the scalar seismic moment M0 is related to the local magnitude ML used by WEBNET as M0 ∝ 10^(1.12 ML), which differs from the scaling law using moment magnitude Mw, that is M0 ∝ 10^(1.5 Mw).

Horálek, Josef; Šílený, Jan
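
To make the difference between the two scaling laws concrete (moment proportional to 10 raised to 1.12 ML for the WEBNET local magnitude, versus 10 raised to 1.5 Mw for moment magnitude), the Python sketch below computes the moment ratio implied by a given magnitude difference. Only ratios are computed because the abstract gives no proportionality constant; the function name is an assumption.

```python
def moment_ratio(delta_magnitude, slope):
    """Ratio of scalar seismic moments for two events delta_magnitude apart,
    assuming M0 is proportional to 10 ** (slope * magnitude)."""
    return 10 ** (slope * delta_magnitude)


# Span of the studied events: ML = 1.6 to ML = 3.2, a difference of 1.6 units.
ratio_local = moment_ratio(1.6, 1.12)   # about 62
ratio_moment = moment_ratio(1.6, 1.5)   # about 251
```

Under the local-magnitude relation the largest studied event releases roughly 62 times the moment of the smallest, whereas applying the moment-magnitude slope to the same magnitude difference would imply a factor of about 251.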



MSLoc-DT: a new method for predicting the protein subcellular location of multispecies based on decision templates.  


Revealing the subcellular location of newly discovered protein sequences can bring insight to their function and guide research at the cellular level. The rapidly increasing number of sequences entering the genome databanks has called for the development of automated analysis methods. Currently, most existing methods used to predict protein subcellular locations cover only one, or a very limited number of species. Therefore, it is necessary to develop reliable and effective computational approaches to further improve the performance of protein subcellular prediction and, at the same time, cover more species. The current study reports the development of a novel predictor called MSLoc-DT to predict the protein subcellular locations of human, animal, plant, bacteria, virus, fungi, and archaea by introducing a novel feature extraction approach termed Amino Acid Index Distribution (AAID) and then fusing gene ontology information, sequential evolutionary information, and sequence statistical information through four different modes of pseudo amino acid composition (PseAAC) with a decision template rule. Using the jackknife test, MSLoc-DT can achieve 86.5, 98.3, 90.3, 98.5, 95.9, 98.1, and 99.3% overall accuracy for human, animal, plant, bacteria, virus, fungi, and archaea, respectively, on seven stringent benchmark datasets. Compared with other predictors (e.g., Gpos-PLoc, Gneg-PLoc, Virus-PLoc, Plant-PLoc, Plant-mPLoc, ProLoc-Go, Hum-PLoc, GOASVM) on the gram-positive, gram-negative, virus, plant, eukaryotic, and human datasets, the new MSLoc-DT predictor is much more effective and robust. Although the MSLoc-DT predictor is designed to predict the single location of proteins, our method can be extended to multiple locations of proteins by introducing multilabel machine learning approaches, such as the support vector machine and deep learning, as substitutes for the K-nearest neighbor (KNN) method. 
As a user-friendly web server, MSLoc-DT is freely accessible at PMID:24361712

Zhang, Shao-Wu; Liu, Yan-Fang; Yu, Yong; Zhang, Ting-He; Fan, Xiao-Nan



A Multi-Label Classifier for Predicting the Subcellular Localization of Gram-Negative Bacterial Proteins with Both Single and Multiple Sites  

PubMed Central

Prediction of protein subcellular localization is a challenging problem, particularly when the system concerned contains both singleplex and multiplex proteins. In this paper, by introducing the “multi-label scale” and hybridizing the information of gene ontology with the sequential evolution information, a novel predictor called iLoc-Gneg is developed for predicting the subcellular localization of Gram-negative bacterial proteins with both single-location and multiple-location sites. For facilitating comparison, the same stringent benchmark dataset used to estimate the accuracy of Gneg-mPLoc was adopted to demonstrate the power of iLoc-Gneg. The dataset contains 1,392 Gram-negative bacterial proteins classified into the following eight locations: (1) cytoplasm, (2) extracellular, (3) fimbrium, (4) flagellum, (5) inner membrane, (6) nucleoid, (7) outer membrane, and (8) periplasm. Of the 1,392 proteins, 1,328 are each with only one subcellular location and the other 64 are each with two subcellular locations, but none of the proteins included has pairwise sequence identity to any other in the same subset (subcellular location). It was observed that the overall success rate by jackknife test on such a stringent benchmark dataset by iLoc-Gneg was over 91%, which is about 6% higher than that by Gneg-mPLoc. As a user-friendly web-server, iLoc-Gneg is freely accessible to the public at Meanwhile, a step-by-step guide is provided on how to use the web-server to get the desired results. Furthermore, for the user's convenience, the iLoc-Gneg web-server also has the function to accept the batch job submission, which is not available in the existing version of Gneg-mPLoc web-server. It is anticipated that iLoc-Gneg may become a useful high throughput tool for Molecular Cell Biology, Proteomics, System Biology, and Drug Development.

Xiao, Xuan; Wu, Zhi-Cheng; Chou, Kuo-Chen



Spatial downscaling and mapping of daily precipitation and air temperature using daily station data and monthly mean maps  

NASA Astrophysics Data System (ADS)

Accurate maps of daily weather variables are an essential component of hydrologic and ecologic modeling. Here we present a four-step method that uses daily station data and transient monthly maps of precipitation and air temperature. This method uses the monthly maps to help interpolate between stations for more accurate production of daily maps at any spatial resolution. The first step analyzes the quality of each station's data using a discrepancy analysis that compares statistics derived from a statistical jack-knifing approach with a time-series evaluation of discrepancies generated for each station. Although several methods could be used for the second step of producing initial maps, such as kriging, splines, etc., we used a gradient plus inverse distance squared method that was developed to produce accurate climate maps for sparse data regions with widely separated and few climate stations, far fewer than would be needed for techniques such as kriging. The gradient plus inverse distance squared method uses local gradients in the climate parameters, easting, northing, and elevation, to adjust the inverse distance squared estimates for local gradients such as lapse rates, inversions, or rain shadows at scales of tens of meters to kilometers. The third step is to downscale World Wide Web (web) based transient monthly data, such as Precipitation-Elevation Regression on Independent Slope Method (PRISM) for the US (4 km or 800 m maps) or Climate Research Unit (CRU 3.1) data sets (40 km for global applications) to the scale of the daily data's digital elevation model. In the final step the downscaled transient monthly maps are used to adjust the daily time-series mapped data (~30 maps/month) for each month. These adjustments are used to scale daily maps so that summing them for precipitation or averaging them for temperature would more accurately reproduce the variability in selected monthly maps. 
This method allows for individual days to have maxima or minima values away from the station locations based on the underlying geographic structure of the monthly maps. We compare our results with the web based 12 km Variable Infiltration Capacity model (VIC) daily data and the 1 km DayMet daily data as well as make comparisons of the month summation or average of daily data sets with the PRISM and CRU data sets. There were mixed results in the comparisons with some good agreement and some bad agreement, even between VIC and DayMet. These daily maps are intended to be used as input to daily hydrological models. The results will provide more insight into the significance of the differences, at least from a hydrology perspective.

Flint, A. L.; Flint, L. E.; Stern, M. A.
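
The final step described in this abstract, adjusting the daily maps so that their monthly sum (precipitation) or monthly mean (temperature) matches the downscaled monthly map, can be sketched for a single grid cell as below. The abstract does not state how the adjustment is computed; the multiplicative factor for precipitation, the additive offset for temperature, and the function names in this Python sketch are assumptions.

```python
def scale_daily_precipitation(daily_values, monthly_total):
    """Rescale one cell's daily precipitation so the days sum to the monthly total.

    A multiplicative factor keeps dry days dry; this choice is an assumption.
    """
    daily_sum = sum(daily_values)
    if daily_sum == 0:
        # No daily signal to redistribute; leave the series unchanged.
        return list(daily_values)
    factor = monthly_total / daily_sum
    return [value * factor for value in daily_values]


def shift_daily_temperature(daily_values, monthly_mean):
    """Shift one cell's daily temperatures so their average equals the monthly mean.

    An additive offset preserves day-to-day differences; this is an assumption.
    """
    offset = monthly_mean - sum(daily_values) / len(daily_values)
    return [value + offset for value in daily_values]


# Hypothetical four-day "month" for one grid cell.
precip_mm = scale_daily_precipitation([0.0, 5.0, 10.0, 5.0], monthly_total=30.0)
temp_c = shift_daily_temperature([10.0, 12.0, 14.0, 12.0], monthly_mean=13.0)
```

Applying the same two functions cell by cell would turn a stack of daily maps into one that is consistent with the monthly map while keeping each day's spatial pattern from the station interpolation.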



Optimizing the diagnostic power with gastric emptying scintigraphy at multiple time points  

PubMed Central

Background Gastric Emptying Scintigraphy (GES) at intervals over 4 hours after a standardized radio-labeled meal is commonly regarded as the gold standard for diagnosing gastroparesis. The objectives of this study were: 1) to investigate the best time point and the best combination of multiple time points for diagnosing gastroparesis with repeated GES measures, and 2) to contrast and cross-validate Fisher's Linear Discriminant Analysis (LDA), a rank based Distribution Free (DF) approach, and the Classification And Regression Tree (CART) model. Methods A total of 320 patients with GES measures at 1, 2, 3, and 4 hours (h) after a standard meal using a standardized method were retrospectively collected. Area under the Receiver Operating Characteristic (ROC) curve and the rate of false classification through jackknife cross-validation were used for model comparison. Results Due to strong correlation and an abnormality in data distribution, no substantial improvement in diagnostic power was found with the best linear combination by LDA approach even with data transformation. With DF method, the linear combination of 4-h and 3-h increased the Area Under the Curve (AUC) and decreased the number of false classifications (0.87; 15.0%) over individual time points (0.83, 0.82; 15.6%, 25.3%, for 4-h and 3-h, respectively) at a higher sensitivity level (sensitivity = 0.9). The CART model using 4 hourly GES measurements along with patient's age was the most accurate diagnostic tool (AUC = 0.88, false classification = 13.8%). Patients having a 4-h gastric retention value >10% were 5 times more likely to have gastroparesis (179/207 = 86.5%) than those with ≤10% (18/113 = 15.9%). 
Conclusions With a mixed group of patients either referred with suspected gastroparesis or investigated for other reasons, the CART model is more robust than the LDA and DF approaches, capable of accommodating covariate effects and can be generalized for cross institutional applications, but could be unstable if sample size is limited.



School-age effects of the newborn individualized developmental care and assessment program for preterm infants with intrauterine growth restriction: preliminary findings  

PubMed Central

Background The experience in the newborn intensive care nursery results in premature infants’ neurobehavioral and neurophysiological dysfunction and poorer brain structure. Preterms with severe intrauterine growth restriction are doubly jeopardized given their compromised brains. The Newborn Individualized Developmental Care and Assessment Program improved outcome at early school-age for preterms with appropriate intrauterine growth. It also showed effectiveness to nine months for preterms with intrauterine growth restriction. The current study tested effectiveness into school-age for preterms with intrauterine growth restriction regarding executive function (EF), electrophysiology (EEG) and neurostructure (MRI). Methods Twenty-three 9-year-old former growth-restricted preterms, randomized at birth to standard care (14 controls) or to the Newborn Individualized Developmental Care and Assessment Program (9 experimentals) were assessed with standardized measures of cognition, achievement, executive function, electroencephalography, and magnetic resonance imaging. The participating children were comparable to those lost to follow-up, and the controls to the experimentals, in terms of newborn background health and demographics. All outcome measures were corrected for mother’s intelligence. Analysis techniques included two-group analysis of variance and stepwise discriminant analysis for the outcome measures, Wilks’ lambda and jackknifed classification to ascertain two-group classification success per and across domains; canonical correlation analysis to explore relationships among neuropsychological, electrophysiological and neurostructural domains at school-age, and from the newborn period to school-age. Results Controls and experimentals were comparable in age at testing, anthropometric and health parameters, and in cognitive and achievement scores. Experimentals scored better in executive function, spectral coherence, and cerebellar volumes. 
Furthermore, executive function, spectral coherence and brain structural measures discriminated controls from experimentals. Executive function correlated with coherence and brain structure measures, and with newborn-period neurobehavioral assessment. Conclusion The intervention in the intensive care nursery improved executive function as well as spectral coherence between occipital and frontal as well as parietal regions. The experimentals’ cerebella were significantly larger than the controls’. These results, while preliminary, point to the possibility of long-term brain improvement even of intrauterine growth compromised preterms if individualized intervention begins with admission to the NICU and extends throughout transition home. Larger sample replications are required in order to confirm these results. Clinical trial registration The study is registered as a clinical trial. The trial registration number is NCT00914108.



Structure and thermal regime beneath the South Pole region, East Antarctica, from magnetotelluric measurements  

NASA Astrophysics Data System (ADS)

Ten tensor magnetotelluric (MT) soundings have been acquired in a 54 km long profile across the South Pole area, East Antarctica. The MT transect was offset from the South Pole station ~5 km and oriented 210 grid north, approximately normal to the Trans-Antarctic Mountains. Surveying around South Pole station was pursued for four main reasons. First, we sought to illuminate first-order structure and physico-chemical state (temperatures, fluids, melts) of the crust and upper mantle of this part of East Antarctica. Secondly, conditions around the South Pole differ from those of previous MT experience at central West Antarctica, so that the project would help to define MT surveying feasibility over the entire continent. Thirdly, the results would provide a crustal response baseline for possible long-term MT monitoring to deep upper mantle depths at the South Pole. Fourthly, because Antarctic logistics are difficult, support facilities at the South Pole enable relatively efficient survey procedures. In making the MT measurements, the high electrical contact impedance at the electrode-firn interface was overcome using a custom-design electrode pre-amplifier at the electrode with low output impedance to the remainder of the recording electronics. Non-plane-wave effects in the data were suppressed using a robust jackknife procedure that emphasized outlier removal from the vertical magnetic field records. Good quality data were obtained, but the rate of collection was hampered by low geomagnetic activity and wind-generated, electrostatic noise induced in the ice. Profile data were inverted using a 2-D algorithm that damps model departures from an a priori structure, in this case a smooth 1-D profile obtained from inversion of an integral of the TM mode impedance along the profile. Inverse models show clear evidence for a pronounced (~1 km thickness), conductive section below the ice tentatively correlated with porous sediments of the Beacon Supergroup. 
Substantial variations in sedimentary conductance are inferred, which may translate into commensurate variations in sediment thickness. Low resistivities below ~30 km suggest thermal activity in the lower crust and upper mantle, and mantle support for this region of elevated East Antarctica. This contrasts with resistivity structure imaged previously in central West Antarctica, where resistivity remains high into the upper mantle consistent with a fossil state of extensional activity there.

Wannamaker, Philip E.; Stodt, John A.; Pellerin, Louise; Olsen, Steven L.; Hall, Darrell B.



Fine-scale Genetic Structure among Genetic Individuals of the Clone-Forming Monotypic Genus Echinosophora koreensis (Fabaceae)  

PubMed Central

• Background and Aims For rare endemics or endangered plant species that reproduce both sexually and vegetatively, it is critical to understand the extent of clonality, because clonal extent and distribution have important ecological and evolutionary consequences with conservation implications. A survey was undertaken to understand clonal effects on fine-scale genetic structure (FSGS) in two populations (one from a disturbed and the other from an undisturbed locality) of Echinosophora koreensis, an endangered small shrub belonging to a monotypic genus in central Korea that reproduces both sexually and vegetatively via rhizomes. • Methods Using inter-simple sequence repeats (ISSRs) as genetic markers, the spatial distribution of individuals was evaluated using Ripley's L(d)-statistics, and the spatial scale of clonal spread and the spatial distribution of ISSR genotypes were quantified using spatial autocorrelation analysis techniques (join-count statistics and kinship coefficient, Fij) for total samples and samples excluding clones. • Key Results A high degree of differentiation between populations was observed (ΦST(g) = 0·184, P < 0·001). Ripley's L(d)-statistics revealed a near random distribution of individuals in a disturbed population, whereas significant aggregation of individuals was found in an undisturbed site. The join-count statistics revealed that most clones significantly aggregate at ≤6-m interplant distance. The Sp statistic reflecting patterns of correlograms revealed a strong pattern of FSGS for all four data sets (Sp = 0·072–0·154), but these patterns were not significantly different from each other. At small interplant distances (≤2 m), however, jackknifed 95% CIs revealed that the total samples exhibited significantly higher Fij values than the same samples excluding clones. • Conclusion The strong FSGS from genets is consistent with two biological and ecological traits of E. koreensis: bee-pollination and limited seed dispersal. 
Furthermore, potential clone mates over repeated generations would contribute to the observed high Fij values among genets at short distance. To ensure long-term ex situ genetic variability of the endangered E. koreensis, individuals located at distances of 10–12 m should be collected across entire populations of E. koreensis.




Tests for intact and collapsed magnetofossil chains  

NASA Astrophysics Data System (ADS)

In recent years, new techniques for the detection of magnetofossils have been proposed, based on their unique first-order reversal curves (FORC) and ferromagnetic resonance (FMR) signatures. These signatures are related to the non-interacting (FORC) and strongly uniaxial anisotropy (FMR) of isolated chains of magnetic particles. However, little is known about the fate of these signatures in sediments where magnetosome chains collapsed during early diagenetic processes. Due to the impossibility of observing the particle arrangement in-situ, the structure of collapsed chains can only be inferred from TEM images of magnetic extracts and from first-principles consideration on the mechanical stability of magnetosome chains once the biological material around them is dissolved. The magnetic properties of double chains, produced by some strains of cocci, are also not known. According to these considerations, four main magnetofossil structures were taken into consideration: (1) isolated, linear chains, (2) double, half-staggered chains, where the gaps of one chain face the magnetosomes in the other chain, (3) double chains with side-to-side magnetosomes, which might result from a "jackknife" type of collapse of a single, long chain, and (4) zig-zag collapsed chains of elongated crystals, where the magnetosome long axes are perpendicular to the chain axis. The collapsed structures might be relevant in sediments where magnetofossils carry a significant part of the remanent magnetization, because chain collapse tends to cancel the original natural remanent magnetization. Detailed models for the hysteretic and anhysteretic properties of structures (1-4) have been calculated by taking realistic distributions of magnetosome size, elongation, and spacing into account, as inferred from a number of published TEM observations. 
Model calculations took a total of >2 years continuous running time on two computers in an effort to obtain realistic results, which are shown here for the first time. These results match measurements obtained previously on magnetosome-rich sediments in smallest details, showing that the identification of distinct intact and collapsed chain structures is possible. On the other hand, these results show that caution should be used when interpreting sediment hysteresis properties as mixtures of single domain (SD), multidomain (MD), and superparamagnetic (SP) particles; because some collapsed chain structures closely mimic SD-MD-SP mixing trends in a Day plot, although being made only of SD particles.

Egli, R.



Warfarin Anticoagulant Therapy: A Southern Italy Pharmacogenetics-Based Dosing Model  

PubMed Central

Background and Aim Warfarin is the most frequently prescribed anticoagulant worldwide. However, warfarin therapy is associated with a high risk of bleeding and thromboembolic events because of a large interindividual dose-response variability. We investigated the effect of genetic and non-genetic factors on warfarin dosage in a South Italian population in an attempt to set up an algorithm easily applicable in clinical practice. Materials and Methods A total of 266 patients from Southern Italy affected by cardiovascular diseases were enrolled and their clinical and anamnestic data recorded. All patients were genotyped for CYP2C9*2,*3, CYP4F2*3, VKORC1 -1639 G>A by the TaqMan assay and for variants VKORC1 1173 C>T and VKORC1 3730 G>A by denaturing high performance liquid chromatography and direct sequencing. The effect of genetic and non-genetic factors on warfarin dose variability was tested by multiple linear regression analysis, and an algorithm based on our data was established and then validated by the Jackknife procedure. Results Warfarin dose variability was influenced, in decreasing order, by VKORC1-1639 G>A (29.7%), CYP2C9*3 (11.8%), age (8.5%), CYP2C9*2 (3.5%), gender (2.0%) and lastly CYP4F2*3 (1.7%); VKORC1 1173 C>T and VKORC1 3730 G>A exerted a slight effect (<1% each). Taken together, these factors accounted for 58.4% of the warfarin dose variability in our population. Data obtained with our algorithm significantly correlated with those predicted by the two online algorithms: Warfarin dosing and Pharmgkb (p<0.001; R² = 0.805 and p<0.001; R² = 0.773, respectively). Conclusions Our algorithm, which is based on six polymorphisms, age and gender, is user-friendly and its application in clinical practice could improve the personalized management of patients undergoing warfarin therapy.

Mazzaccara, Cristina; Conti, Valeria; Liguori, Rosario; Simeon, Vittorio; Toriello, Mario; Severini, Angelo; Perricone, Corrado; Meccariello, Alfonso; Meccariello, Pasquale; Vitale, Dino Franco; Filippelli, Amelia; Sacchetti, Lucia



A Multi-Label Predictor for Identifying the Subcellular Locations of Singleplex and Multiplex Eukaryotic Proteins  

PubMed Central

Subcellular locations of proteins are important functional attributes. An effective and efficient subcellular localization predictor is necessary for rapidly and reliably annotating subcellular locations of proteins. Most existing subcellular localization methods can deal only with single-location proteins. In fact, proteins may simultaneously exist at, or move between, two or more different subcellular locations. To better reflect the characteristics of multiplex proteins, new methods for dealing with them are highly desirable. In this paper, a new predictor called Euk-ECC-mPLoc has been developed that can deal with systems containing both singleplex and multiplex eukaryotic proteins. It introduces a powerful multi-label learning approach that exploits correlations between subcellular locations and hybridizes gene ontology with dipeptide composition information. It can be utilized to identify eukaryotic proteins among the following 22 locations: (1) acrosome, (2) cell membrane, (3) cell wall, (4) centrosome, (5) chloroplast, (6) cyanelle, (7) cytoplasm, (8) cytoskeleton, (9) endoplasmic reticulum, (10) endosome, (11) extracellular, (12) Golgi apparatus, (13) hydrogenosome, (14) lysosome, (15) melanosome, (16) microsome, (17) mitochondrion, (18) nucleus, (19) peroxisome, (20) spindle pole body, (21) synapse, and (22) vacuole. Experimental results on a stringent benchmark dataset of eukaryotic proteins by the jackknife cross-validation test show that the average success rate and overall success rate obtained by Euk-ECC-mPLoc were 69.70% and 81.54%, respectively, indicating that our approach is quite promising. Particularly, the success rates achieved by Euk-ECC-mPLoc for small subsets were remarkably improved, indicating that it holds a high potential for stimulating the development of the area. 
As a user-friendly web-server, Euk-ECC-mPLoc is freely accessible to the public at the website We believe that Euk-ECC-mPLoc may become a useful high-throughput tool, or at least play a complementary role to the existing predictors in identifying subcellular locations of eukaryotic proteins.

Wang, Xiao; Li, Guo-Zheng



Computer-aided detection of breast masses: Four-view strategy for screening mammography  

PubMed Central

Purpose: To improve the performance of a computer-aided detection (CAD) system for mass detection by using four-view information in screening mammography. Methods: The authors developed a four-view CAD system that emulates radiologists’ reading by using the craniocaudal and mediolateral oblique views of the ipsilateral breast to reduce false positives (FPs) and the corresponding views of the contralateral breast to detect asymmetry. The CAD system consists of four major components: (1) Initial detection of breast masses on individual views, (2) information fusion of the ipsilateral views of the breast (referred to as two-view analysis), (3) information fusion of the corresponding views of the contralateral breast (referred to as bilateral analysis), and (4) fusion of the four-view information with a decision tree. The authors collected two data sets for training and testing of the CAD system: A mass set containing 389 patients with 389 biopsy-proven masses and a normal set containing 200 normal subjects. All cases had four-view mammograms. The true locations of the masses on the mammograms were identified by an experienced MQSA radiologist. The authors randomly divided the mass set into two independent sets for cross validation training and testing. The overall test performance was assessed by averaging the free response receiver operating characteristic (FROC) curves of the two test subsets. The FP rates during the FROC analysis were estimated by using the normal set only. The jackknife free-response ROC (JAFROC) method was used to estimate the statistical significance of the difference between the test FROC curves obtained with the single-view and the four-view CAD systems. Results: Using the single-view CAD system, the breast-based test sensitivities were 58% and 77% at the FP rates of 0.5 and 1.0 per image, respectively. With the four-view CAD system, the breast-based test sensitivities were improved to 76% and 87% at the corresponding FP rates, respectively. 
The improvement was found to be statistically significant (p<0.0001) by JAFROC analysis. Conclusions: The four-view information fusion approach that emulates radiologists’ reading strategy significantly improves the performance of breast mass detection of the CAD system in comparison with the single-view approach.

Wei, Jun; Chan, Heang-Ping; Zhou, Chuan; Wu, Yi-Ta; Sahiner, Berkman; Hadjiiski, Lubomir M.; Roubidoux, Marilyn A.; Helvie, Mark A.



Detection of breast abnormalities using a prototype resonance electrical impedance spectroscopy system: A preliminary study  

PubMed Central

Electrical impedance spectroscopy has been investigated with but limited success as an adjunct procedure to mammography and as a possible pre-screening tool to stratify risk for having or developing breast cancer in younger women. In this study, the authors explored a new resonance frequency based [resonance electrical impedance spectroscopy (REIS)] approach to identify breasts that may have highly suspicious abnormalities that had been recommended for biopsies. The authors assembled a prototype REIS system generating multifrequency electrical sweeps ranging from 100 to 4100 kHz every 12 s. Using only two probes, one in contact with the nipple and the other with the outer breast skin surface 60 mm away, a paired transmission signal detection system is generated. The authors recruited 150 women between 30 and 50 years old to participate in this study. REIS measurements were performed on both breasts. Of these women 58 had been scheduled for a breast biopsy and 13 had been recalled for additional imaging procedures due to suspicious findings. The remaining 79 women had negative screening examinations. Eight REIS output signals at and around the resonance frequency were computed for each breast and the subtracted signals between the left and right breasts were used in a simple jackknifing method to select an optimal feature set to be inputted into a multi-feature based artificial neural network (ANN) that aims to predict whether a woman’s breast had been determined as abnormal (warranting a biopsy) or not. The classification performance was evaluated using a leave-one-case-out method and receiver operating characteristics (ROC) analysis. The study shows that REIS examination is easy to perform, short in duration, and acceptable to all participants in terms of comfort level and there is no indication of sensation of an electrical current during the measurements. Six REIS difference features were selected as input signals to the ANN. 
The area under the ROC curve (Az) was 0.707±0.033 for classifying between biopsy cases and non-biopsy (including recalled and screening negative) and the performance (Az) increased to 0.746±0.033 after excluding recalled but negative cases. At 95% specificity, the sensitivity levels were approximately 20.5% and 30.4% in the two data sets tested. The results suggest that differences in REIS signals between two breasts measured in and around the tissue resonance frequency can be used to identify at least some of the women with suspicious abnormalities warranting biopsy with high specificity.

Zheng, Bin; Zuley, Margarita L.; Sumkin, Jules H.; Catullo, Victor J.; Abrams, Gordon S.; Rathfon, Grace Y.; Chough, Denise M.; Gruss, Michelle Z.; Gur, David



Intertidal benthic macrofauna of rare rocky fragments in the Amazon region.  


Rock fragment fields are important habitat for biodiversity maintenance in coastal regions, particularly when located in protected areas dominated by soft sediments. Research in this habitat has received surprisingly little attention on the Amazon Coast, even though rock fragments provide refuges, nursery grounds and food sources for a variety of benthic species. The present survey describes the mobile macroinvertebrate species composition and richness of the intertidal rocky fragments in Areuá Island within the "Mãe Grande de Curuçá" Marine Extractive Reserve (RESEX) on the Brazilian Amazon Coast. Samples were collected during the dry (August and November 2009) and rainy seasons (March and May 2010) on the upper and lower intertidal zone, using a 625 cm² quadrat. At each season and intertidal zone, macroinvertebrate samples were collected along four transects (20 m each) parallel to the waterline, and within each transect two quadrats were randomly sampled. Macroinvertebrates were identified, density determined, and biomass values obtained to characterize benthic diversity from the rocky fragments. The Jackknife procedure was used to estimate species richness from different intertidal zones during the dry and rainy seasons. The macrofaunal community comprised 85 taxa, with 17 "unique" taxa; 40 taxa were common to both intertidal zones and seasons, and 23 taxa were recorded for the first time on the Brazilian Amazon Coast. Species richness was estimated at 106 +/- 9.7 taxa and results suggest that sampling effort was representative. Polychaeta was the most dominant in species number, followed by Malacostraca and Gastropoda. Regarding frequency of occurrence, the crustacean species Dynamenella tropica, Parhyale sp. and Petrolisthes armatus were the most frequent, representing >75% of frequency of occurrence, and 39 taxa were least frequent, representing <5% of frequency of occurrence. 
Occurrence of crustaceans and polychaetes was particularly noteworthy in all intertidal zones and seasons, represented by 15 and 13 taxa, respectively. The most representative class in abundance and biomass was Malacostraca, which represented more than half of all individuals sampled and was dominated by Petrolisthes armatus. The latter was one of the most frequent, most numerous and highest-biomass species in the samples. In general, results indicated greater richness and biomass in the lower zone. Additionally, richness and density increased during the rainy season. Rock fragment fields in Areuá Island are rich in microhabitats and include a diverse array of species in a limited area. Our results underline the importance of rock fragment fields in Areuá Island for the maintenance of biodiversity in the Amazon Coast. PMID:24912344
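
As an illustration of the kind of calculation behind a jackknife richness estimate from quadrat data, the following is a minimal sketch of the first-order jackknife estimator and its standard error for quadrat sampling (the Heltshe and Forrester form, which depends on the species found in exactly one quadrat). The presence/absence table and the function name are hypothetical, and the abstract does not state which jackknife order the authors used, so this is not a reconstruction of their analysis.

```python
import math

def jackknife_richness(presence):
    """First-order jackknife richness from a quadrat-by-species 0/1 table.

    presence[i][j] is 1 if species j was found in quadrat i, else 0.
    Returns (estimate, standard_error).
    """
    n = len(presence)                       # number of quadrats
    n_species = len(presence[0])
    # In how many quadrats does each species occur?
    occurrences = [sum(row[j] for row in presence) for j in range(n_species)]
    s_obs = sum(1 for c in occurrences if c > 0)
    # "Unique" species occur in one and only one quadrat.
    unique = [j for j, c in enumerate(occurrences) if c == 1]
    k = len(unique)
    # Number of unique species held by each quadrat.
    uniques_per_quadrat = [sum(row[j] for j in unique) for row in presence]
    estimate = s_obs + k * (n - 1) / n
    variance = (n - 1) / n * (sum(u * u for u in uniques_per_quadrat) - k * k / n)
    return estimate, math.sqrt(variance)

# Hypothetical data: 4 quadrats (rows) by 5 species (columns).
quadrats = [
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 0, 0],
    [1, 0, 0, 1, 0],
]
estimate, se = jackknife_richness(quadrats)
print(estimate, se)
```

With these made-up quadrats, five species are observed and three of them are unique, so the estimate exceeds the observed count by 3 × (4 − 1)/4.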

Morais, Gisele Cavalcante; Lee, James Tony



Extreme Precipitation Mapping for Flood Risk Assessment in Ungauged Basins  

NASA Astrophysics Data System (ADS)

This poster presents a study mapping 2-year and 100-year annual maximum daily precipitation for rainfall-runoff studies and flood hazard estimation. The main objective was to discuss the quality and properties of maps of design precipitation with a given return period with respect to the expectations of the end user community. Four approaches to the preprocessing of annual maximum 24-hour precipitation data were used, and three interpolation methods employed. The first method is the direct mapping of at-site estimates of distribution function quantiles; the second is the direct mapping of local estimates of the three parameters of the GEV distribution. In the third method, the daily measurements of the precipitation totals were interpolated into a regular grid network, and then the time series of the maximum daily precipitation totals in each grid point of the selected region were statistically analysed. In the fourth method, the spatial distribution of the design precipitation was modeled by quantiles predicted by regional precipitation frequency analysis using the Hosking and Wallis procedure. Homogeneity of the region of interest was tested, and the index value (the mean annual maximum daily precipitation) was mapped using spatial interpolation (instead of the more usual regional regression). Quantiles were derived through the dimensionless regional frequency distribution estimated by using L-moments. The three interpolation methods used were inverse distance weighting, nearest neighbor and kriging. The daily precipitation measurements at 23 climate stations from 1961-2000 were used in the upper Hron basin in central Slovakia. Visual inspection and jackknife cross-validation were used to compare the combinations of approaches. 
Under the specific regime dominated by thermal and frontal convective events, the potential advantage of using mapping of daily precipitation series as a basis for quantile estimation was not shown, and under the given conditions the use of the regional frequency analysis is recommended as a suitable method to account for the spatial variability of design precipitation for mapping purposes. By trading space for time, it overcomes the problems of data shortage and unequal lengths of data series and, through the notion of quantitatively underpinned spatial homogeneity, it also offers a solution to the problem of inadequate spatial coverage and sampling of precipitation fields by the gauging network.
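
Jackknife cross-validation of an interpolation method can be sketched as follows: each station is withheld in turn, its value is predicted from the remaining stations, and the prediction errors are summarized. The example below uses inverse distance weighting with a power of 2; the station coordinates, values, power and function names are all hypothetical choices for illustration and are not taken from the study.

```python
import math

def idw(x, y, stations, power=2.0):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) tuples."""
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value            # exactly on a station: return its value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def loo_rmse(stations, power=2.0):
    """Leave-one-out (jackknife) cross-validation RMSE of the IDW interpolator."""
    squared_errors = []
    for i, (sx, sy, value) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]   # withhold station i
        squared_errors.append((idw(sx, sy, others, power) - value) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical stations: (x, y, design precipitation in mm).
stations = [
    (0.0, 0.0, 10.0),
    (1.0, 0.0, 20.0),
    (0.0, 1.0, 20.0),
    (1.0, 1.0, 30.0),
]
rmse = loo_rmse(stations)
print(rmse)
```

Running the same loop with a different interpolator in place of `idw` gives a like-for-like error score, which is one way competing mapping approaches could be compared.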

Kohnova, S.; Parajka, J.; Szolgay, J.; Hlavcova, K.



EEG spectral coherence data distinguish chronic fatigue syndrome patients from healthy controls and depressed patients-A case control study  

PubMed Central

Background Previous studies suggest central nervous system involvement in chronic fatigue syndrome (CFS), yet there are no established diagnostic criteria. CFS may be difficult to differentiate from clinical depression. The study's objective was to determine if spectral coherence, a computational derivative of spectral analysis of the electroencephalogram (EEG), could distinguish patients with CFS from healthy control subjects and not erroneously classify depressed patients as having CFS. Methods This is a study, conducted in an academic medical center electroencephalography laboratory, of 632 subjects: 390 healthy normal controls, 70 patients with carefully defined CFS, 24 with major depression, and 148 with general fatigue. Aside from fatigue, all patients were medically healthy by history and examination. EEGs were obtained and spectral coherences calculated after extensive artifact removal. Principal Components Analysis identified coherence factors and corresponding factor loading patterns. Discriminant analysis determined whether spectral coherence factors could reliably discriminate CFS patients from healthy control subjects without misclassifying depression as CFS. Results Analysis of EEG coherence data from a large sample (n = 632) of patients and healthy controls identified 40 factors explaining 55.6% total variance. Factors showed highly significant group differentiation (p < .0004) identifying 89.5% of unmedicated female CFS patients and 92.4% of healthy female controls. Recursive jackknifing showed predictions were stable. A conservative 10-factor discriminant function model was subsequently applied, and also showed highly significant group discrimination (p < .001), accurately classifying 88.9% unmedicated males with CFS, and 82.4% unmedicated male healthy controls. No patient with depression was classified as having CFS. The model was less accurate (73.9%) in identifying CFS patients taking psychoactive medications. 
Factors involving the temporal lobes were of primary importance. Conclusions EEG spectral coherence analysis identified unmedicated patients with CFS and healthy control subjects without misclassifying depressed patients as CFS, providing evidence that CFS patients demonstrate brain physiology that is not observed in healthy normals or patients with major depression. Studies of new CFS patients and comparison groups are required to determine the possible clinical utility of this test. The results concur with other studies finding neurological abnormalities in CFS, and implicate temporal lobe involvement in CFS pathophysiology.



Small (≤1-cm) Hepatocellular Carcinoma: Diagnostic Performance and Imaging Features at Gadoxetic Acid-enhanced MR Imaging.  


Purpose To assess diagnostic performance and imaging features of gadoxetic acid-enhanced magnetic resonance (MR) imaging in small (≤1-cm) hepatocellular carcinoma (HCC) detection in patients with chronic liver disease. Materials and Methods The institutional review board approved this retrospective study and waived informed consent. Sixty patients (56 men, four women; mean age, 60.1 years) with HCC (146 lesions; 70 > 1 cm, 76 ≤ 1 cm) underwent gadoxetic acid-enhanced MR imaging. HCC was confirmed at surgical resection (72 lesions; 30 > 1 cm, 42 ≤ 1 cm) or by showing interval growth with typical enhancement patterns at follow-up dynamic computed tomography or MR imaging (74 lesions; 40 > 1 cm, 34 ≤ 1 cm). Two radiologists assessed MR imaging features and graded likelihood of HCC with a five-point confidence scale. Jackknife alternative free-response receiver operating characteristic (JAFROC) method was used. Results Mean JAFROC figure of merit for small HCC was 0.717; that for large (>1-cm) HCC was 0.973 with substantial agreement (κ = 0.676). Mean sensitivity and positive predictive value (PPV) were 46.0% (70 of 152) and 48.3% (70 of 145) for small HCC versus 95.0% (133 of 140) and 78.2% (133 of 170) for large HCC, respectively. Eleven of 76 small HCCs (14%) were not seen on MR images, even after careful investigation. MR imaging features of small HCC included arterial enhancement (79%, 60 of 76), hypointensity on hepatobiliary phase (HBP) images (68%, 52 of 76), washout on 3-minute delayed phase images (50%, 38 of 76), hyperintensity on T2-weighted images (43%, 33 of 76), hypointensity on T1-weighted images (32%, 24 of 76), and restriction on diffusion-weighted images (28%, 20 of 72). Arterial enhancement and washout on 3-minute delayed phase images or hypointensity on HBP images occurred in 66% of small HCCs (50 of 76). 
Conclusion Diagnostic performance of gadoxetic acid-enhanced MR imaging for small HCC detection is still low, with mean sensitivity of 46.0% (70 of 152) and mean PPV of 48.3% (70 of 145). By adding hypointensity on HBP images as washout, diagnostic performance for small HCC detection can be improved. © RSNA, 2014 Online supplemental material is available for this article. PMID:24588677

Yu, Mi Hye; Kim, Jung Hoon; Yoon, Jeong-Hee; Kim, Hyo-Cheol; Chung, Jin Wook; Han, Joon Koo; Choi, Byung-Ihn



Development of waveform inversion techniques for using body-wave waveforms to infer localized three-dimensional seismic structure and an application to D"  

NASA Astrophysics Data System (ADS)

In order to further extract information on localized three-dimensional seismic structure from observed seismic data, we have developed and applied methods for seismic waveform inversion. Deriving algorithms for the calculation of synthetic seismograms and their partial derivatives, development of efficient software for their computation and for data handling, correction for near-source and near-receiver structure, and choosing appropriate parameterization of the model space are the key steps in such an inversion. We formulate the inverse problem of waveform inversion for localized structure, computing partial derivatives of waveforms with respect to the 3-D elastic moduli at arbitrary points in space for anisotropic and anelastic media. Our method does not use any great circle approximations in computing the synthetics and their partial derivatives. In order to efficiently solve the inverse problem we use the conjugate gradient (CG) method. We apply our methods to inversion for the three-dimensional shear wave structure in the lowermost mantle beneath Central America and the Western Pacific using waveforms in the period band from 12.5 to 200 s. Checkerboard tests show that waveform inversion of S, ScS, and the other phases which arrive between them can resolve laterally heterogeneous shear-wave structure in the lowermost mantle using waves propagating only in a relatively limited range of azimuths. Checkerboard tests also show that white noise has little impact on the results of waveform inversion. Various tests such as a jackknife test show that our model is robust. 
We verify the near-orthogonality of partial derivatives with respect to structure inside and outside the target region; we find that although datasets with only a small number of waveforms (e.g., waveforms recorded by stations for only a single event) cannot resolve structure inside and outside the target region, a dataset with a large number of waveforms can almost completely remove the effects of near-source and near-receiver structure. Waveform inversion with a large dataset is thus confirmed to be a promising approach to infer 3-D seismic fine structure in the Earth's deep interior.

Kawai, K.; Konishi, K.; Geller, R. J.; Fuji, N.



One year survival of ART and conventional restorations in patients with disability  

PubMed Central

Background Providing restorative treatment for persons with disability may be challenging and has been related to the patient’s ability to cope with the anxiety engendered by treatment and to cooperate fully with the demands of the clinical situation. The aim of the present study was to assess the survival rate of ART restorations compared to conventional restorations in people with disability referred for special care dentistry. Methods Three treatment protocols were distinguished: ART (hand instruments/high-viscosity glass-ionomer); conventional restorative treatment (rotary instrumentation/resin composite) in the clinic (CRT/clinic) and under general anaesthesia (CRT/GA). Patients were referred for restorative care to a special care centre and treated by one of two specialists. Patients and/or their caregivers were provided with written and verbal information regarding the proposed techniques, and selected the type of treatment they were to receive. Treatment was provided as selected but if this option proved clinically unfeasible one of the alternative techniques was subsequently proposed. Evaluation of restoration survival was performed by two independent trained and calibrated examiners using established ART restoration assessment codes at 6 months and 12 months. The Proportional Hazard model with frailty corrections was applied to calculate survival estimates over a one year period. Results 66 patients (13.6 ± 7.8 years) with 16 different medical disorders participated. CRT/clinic proved feasible for 5 patients (7.5%), the ART approach for 47 patients (71.2%), and 14 patients received CRT/GA (21.2%). In all, 298 dentine carious lesions were restored in primary and permanent teeth, 182 (ART), 21 (CRT/clinic) and 95 (CRT/GA). The 1-year survival rates and jackknife standard error of ART and CRT restorations were 97.8 ± 1.0% and 90.5 ± 3.2%, respectively (p = 0.01). 
Conclusions These short-term results indicate that ART appears to be an effective treatment protocol for treating patients with disability restoratively, many of whom have difficulty coping with the conventional restorative treatment. Trial registration number Netherlands Trial Registration: NTR 4400
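
For readers unfamiliar with the term, a jackknife standard error is obtained by recomputing a statistic with each observation left out in turn and measuring the spread of those leave-one-out values. The sketch below shows the generic delete-one formula applied to a simple proportion; the outcome data and function names are hypothetical, and it does not reproduce the study's proportional hazard model with frailty corrections.

```python
import math

def jackknife_se(data, statistic):
    """Delete-one jackknife standard error of statistic(data)."""
    n = len(data)
    # Recompute the statistic with each observation removed in turn.
    leave_one_out = [statistic(data[:i] + data[i + 1:]) for i in range(n)]
    center = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((t - center) ** 2 for t in leave_one_out))

def mean(values):
    return sum(values) / len(values)

# Hypothetical outcomes: 1 = restoration survived one year, 0 = failed.
outcomes = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
rate = mean(outcomes)
se = jackknife_se(outcomes, mean)
print(rate, se)
```

For a plain mean the jackknife standard error coincides with the usual sample standard deviation divided by the square root of n; the approach becomes more useful for statistics that lack a simple closed-form standard error.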



Detection of B-Mode Polarization at Degree Angular Scales by BICEP2.  


We report results from the BICEP2 experiment, a cosmic microwave background (CMB) polarimeter specifically designed to search for the signal of inflationary gravitational waves in the B-mode power spectrum around ℓ∼80. The telescope comprised a 26 cm aperture all-cold refracting optical system equipped with a focal plane of 512 antenna coupled transition edge sensor 150 GHz bolometers each with temperature sensitivity of ≈300 μK_CMB √s. BICEP2 observed from the South Pole for three seasons from 2010 to 2012. A low-foreground region of sky with an effective area of 380 square deg was observed to a depth of 87 nK deg in Stokes Q and U. In this paper we describe the observations, data reduction, maps, simulations, and results. We find an excess of B-mode power over the base lensed-ΛCDM expectation in the range 30 < ℓ < 150, inconsistent with the null hypothesis at a significance of >5σ. Through jackknife tests and simulations based on detailed calibration measurements we show that systematic contamination is much smaller than the observed excess. Cross correlating against WMAP 23 GHz maps we find that Galactic synchrotron makes a negligible contribution to the observed signal. We also examine a number of available models of polarized dust emission and find that at their default parameter values they predict power ∼(5–10)× smaller than the observed excess signal (with no significant cross-correlation with our maps). However, these models are not sufficiently constrained by external public data to exclude the possibility of dust emission bright enough to explain the entire excess signal. Cross correlating BICEP2 against 100 GHz maps from the BICEP1 experiment, the excess signal is confirmed with 3σ significance and its spectral index is found to be consistent with that of the CMB, disfavoring dust at 1.7σ. The observed B-mode power spectrum is well fit by a lensed-ΛCDM+tensor theoretical model with tensor-to-scalar ratio r=0.20_{-0.05}^{+0.07}, with r=0 disfavored at 7.0σ. 
Accounting for the contribution of foreground dust will shift this value downward by an amount which will be better constrained with upcoming data sets. PMID:24996078

Ade, P A R; Aikin, R W; Barkats, D; Benton, S J; Bischoff, C A; Bock, J J; Brevik, J A; Buder, I; Bullock, E; Dowell, C D; Duband, L; Filippini, J P; Fliescher, S; Golwala, S R; Halpern, M; Hasselfield, M; Hildebrandt, S R; Hilton, G C; Hristov, V V; Irwin, K D; Karkare, K S; Kaufman, J P; Keating, B G; Kernasovskiy, S A; Kovac, J M; Kuo, C L; Leitch, E M; Lueker, M; Mason, P; Netterfield, C B; Nguyen, H T; O'Brient, R; Ogburn, R W; Orlando, A; Pryke, C; Reintsema, C D; Richter, S; Schwarz, R; Sheehy, C D; Staniszewski, Z K; Sudiwala, R V; Teply, G P; Tolan, J E; Turner, A D; Vieregg, A G; Wong, C L; Yoon, K W



Computer-aided detection of breast masses: Four-view strategy for screening mammography  

SciTech Connect

Purpose: To improve the performance of a computer-aided detection (CAD) system for mass detection by using four-view information in screening mammography. Methods: The authors developed a four-view CAD system that emulates radiologists' reading by using the craniocaudal and mediolateral oblique views of the ipsilateral breast to reduce false positives (FPs) and the corresponding views of the contralateral breast to detect asymmetry. The CAD system consists of four major components: (1) Initial detection of breast masses on individual views, (2) information fusion of the ipsilateral views of the breast (referred to as two-view analysis), (3) information fusion of the corresponding views of the contralateral breast (referred to as bilateral analysis), and (4) fusion of the four-view information with a decision tree. The authors collected two data sets for training and testing of the CAD system: A mass set containing 389 patients with 389 biopsy-proven masses and a normal set containing 200 normal subjects. All cases had four-view mammograms. The true locations of the masses on the mammograms were identified by an experienced MQSA radiologist. The authors randomly divided the mass set into two independent sets for cross validation training and testing. The overall test performance was assessed by averaging the free response receiver operating characteristic (FROC) curves of the two test subsets. The FP rates during the FROC analysis were estimated by using the normal set only. The jackknife free-response ROC (JAFROC) method was used to estimate the statistical significance of the difference between the test FROC curves obtained with the single-view and the four-view CAD systems. Results: Using the single-view CAD system, the breast-based test sensitivities were 58% and 77% at the FP rates of 0.5 and 1.0 per image, respectively. With the four-view CAD system, the breast-based test sensitivities were improved to 76% and 87% at the corresponding FP rates, respectively. 
The improvement was found to be statistically significant (p<0.0001) by JAFROC analysis. Conclusions: The four-view information fusion approach that emulates radiologists' reading strategy significantly improves the performance of breast mass detection of the CAD system in comparison with the single-view approach.

Wei Jun; Chan Heangping; Zhou Chuan; Wu Yita; Sahiner, Berkman; Hadjiiski, Lubomir M.; Roubidoux, Marilyn A.; Helvie, Mark A. [Department of Radiology, University of Michigan, 1500 East Medical Center Drive, C478 Med-Inn Building, Ann Arbor, Michigan 48109-5842 (United States)]



Sampling times for monitoring tacrolimus in stable adult liver transplant recipients.  


The aim of this study was to determine the most informative sampling time(s) providing a precise prediction of tacrolimus area under the concentration-time curve (AUC). Fifty-four concentration-time profiles of tacrolimus from 31 adult liver transplant recipients were analyzed. Each profile contained 5 tacrolimus whole-blood concentrations (predose and 1, 2, 4, and 6 or 8 hours postdose), measured using liquid chromatography-tandem mass spectrometry. The concentration at 6 hours was interpolated for each profile, and 54 values of AUC(0-6) were calculated using the trapezoidal rule. The best sampling times were then determined using limited sampling strategies and sensitivity analysis. Linear mixed-effects modeling was performed to estimate regression coefficients of equations incorporating each concentration-time point (C0, C1, C2, C4, interpolated C5, and interpolated C6) as a predictor of AUC(0-6). Predictive performance was evaluated by assessment of the mean error (ME) and root mean square error (RMSE). Limited sampling strategy (LSS) equations with C2, C4, and C5 provided similar results for prediction of AUC(0-6) (R2 = 0.869, 0.844, and 0.832, respectively). These 3 time points were superior to C0 in the prediction of AUC. The ME was similar for all time points; the RMSE was smallest for C2, C4, and C5. The highest sensitivity index was determined to be 4.9 hours postdose at steady state, suggesting that this time point provides the most information about the AUC(0-12). The results from limited sampling strategies and sensitivity analysis supported the use of a single blood sample at 5 hours postdose as a predictor of both AUC(0-6) and AUC(0-12). A jackknife procedure was used to evaluate the predictive performance of the model, and this demonstrated that collecting a sample at 5 hours after dosing could be considered as the optimal sampling time for predicting AUC(0-6). PMID:15570182

Dansirikul, Chantaratsamon; Staatz, Christine E; Duffull, Stephen B; Taylor, Paul J; Lynch, Stephen V; Tett, Susan E
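
The workflow in the abstract above (trapezoidal AUC, a single-concentration predictor, then ME and RMSE from a jackknife) can be sketched roughly as follows. This is only an illustration under stated assumptions: the study used linear mixed-effects modeling, whereas this sketch swaps in ordinary least squares, and the concentration profiles, the helper names (`trapezoid_auc`, `fit_line`, `jackknife_errors`) and the sampling grid are hypothetical, not study data.

```python
import math

def trapezoid_auc(times, conc):
    """Area under a concentration-time curve by the trapezoidal rule."""
    return sum((t1 - t0) * (c0 + c1) / 2.0
               for t0, t1, c0, c1 in zip(times, times[1:], conc, conc[1:]))

def fit_line(x, y):
    """Ordinary least-squares intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def jackknife_errors(x, y):
    """Leave each profile out in turn, refit, and predict the held-out AUC."""
    errors = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        errors.append(a + b * x[i] - y[i])
    return errors

# Hypothetical profiles (not study data): concentrations at 0, 1, 2, 4, 6 h.
times = [0.0, 1.0, 2.0, 4.0, 6.0]
profiles = [
    [5.0, 12.0, 15.0, 10.0, 7.0],
    [4.0, 10.0, 13.0, 9.0, 6.5],
    [6.5, 16.0, 18.0, 13.0, 9.0],
    [3.5, 9.0, 11.0, 7.5, 5.0],
    [5.5, 14.0, 16.5, 11.0, 8.0],
]
auc = [trapezoid_auc(times, p) for p in profiles]
c5 = [(p[3] + p[4]) / 2.0 for p in profiles]  # linear interpolation at 5 h

errors = jackknife_errors(c5, auc)
mean_error = sum(errors) / len(errors)                      # ME (bias)
rmse = math.sqrt(sum(e * e for e in errors) / len(errors))  # RMSE (precision)
print(f"ME = {mean_error:.2f}, RMSE = {rmse:.2f}")
```

Because every prediction is made for a profile that was excluded from the fit, ME and RMSE computed this way describe out-of-sample performance rather than goodness of fit.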



GalNAc-transferase specificity prediction based on feature selection method.  


GalNAc-transferase can catalyze the biosynthesis of O-linked oligosaccharides. The specificity of GalNAc-transferase is composed of nine amino acid residues denoted by R4, R3, R2, R1, R0, R1', R2', R3', R4'. To predict whether the reducing monosaccharide will be covalently linked to the central residue R0 (Ser or Thr), a new method based on feature selection is proposed in this work. 277 nonapeptides from reference [Chou KC. A sequence-coupled vector-projection model for predicting the specificity of GalNAc-transferase. Protein Sci 1995;4:1365-83] are chosen as the training set. Each nonapeptide is represented by hundreds of amino acid properties collected from the Amino Acid Index database and transformed into a numeric vector with 4554 features. The Maximum Relevance Minimum Redundancy (mRMR) method, combined with Incremental Feature Selection (IFS) and Feature Forward Selection (FFS), is then applied for feature selection. The Nearest Neighbor Algorithm (NNA) is used to build prediction models. The optimal model contains 54 features, and its correct rate in the jackknife cross-validation test reaches 91.34%. Final feature analysis indicates that amino acid residues at position R3' play the most important role in the recognition of GalNAc-transferase specificity, which was confirmed by experiments [Elhammer AP, Poorman RA, Brown E, Maggiora LL, Hoogerheide JG, Kezdy FJ. The specificity of UDP-GalNAc:polypeptide N-acetylgalactosaminyltransferase as inferred from a database of in vivo substrates and from the in vitro glycosylation of proteins and peptides. J Biol Chem 1993;268:10029-38; O'Connell BC, Hagen FK, Tabak LA. The influence of flanking sequence on the O-glycosylation of threonine in vitro. J Biol Chem 1992;267:25010-8; Yoshida A, Suzuki M, Ikenaga H, Takeuchi M. Discovery of the shortest sequence motif for high level mucin-type O-glycosylation. J Biol Chem 1997;272:16884-8].
Our method can be used as a tool for predicting O-glycosylation sites and for investigating GalNAc-transferase specificity, which is useful for designing competitive inhibitors of GalNAc-transferase. The prediction software is available upon request. PMID:18955094

Lu, Lin; Niu, Bing; Zhao, Jun; Liu, Liang; Lu, Wen-Cong; Liu, Xiao-Jun; Li, Yi-Xue; Cai, Yu-Dong
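
The jackknife cross-validation test mentioned in the abstract above holds out each sample once and classifies it from all the others. A minimal sketch with a 1-nearest-neighbor rule is given below; the two-feature vectors, the labels and the function names are hypothetical stand-ins (the paper's vectors have 4554 features before selection), and Euclidean distance is an assumption, since the abstract does not state the metric.

```python
import math

def nearest_neighbor_label(query, samples, labels):
    """Label of the training sample closest to `query` (Euclidean distance)."""
    best = min(range(len(samples)), key=lambda j: math.dist(query, samples[j]))
    return labels[best]

def jackknife_accuracy(samples, labels):
    """Hold out each sample once, classify it from the rest, return the hit rate."""
    hits = 0
    for i, (x, y) in enumerate(zip(samples, labels)):
        rest_x = samples[:i] + samples[i + 1:]
        rest_y = labels[:i] + labels[i + 1:]
        hits += nearest_neighbor_label(x, rest_x, rest_y) == y
    return hits / len(samples)

# Hypothetical feature vectors: label 1 = glycosylated, label 0 = not.
samples = [[0.1, 0.2], [0.0, 0.4], [0.3, 0.1], [0.9, 1.0], [1.1, 0.8], [1.0, 1.2]]
labels = [0, 0, 0, 1, 1, 1]
print(f"jackknife correct rate = {jackknife_accuracy(samples, labels):.2%}")
```

Each sample is scored by a model that never saw it, so the reported rate is not inflated by a sample matching itself as its own nearest neighbor.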



Predicting N-terminal acetylation based on feature selection method.  


Methionine aminopeptidase and N-terminal acetyltransferase are the two enzymes that contribute most to N-terminal acetylation, which has long been recognized as a frequent and important kind of co-translational modification [R.A. Bradshaw, W.W. Brickey, K.W. Walker, N-terminal processing: the methionine aminopeptidase and N alpha-acetyl transferase families, Trends Biochem. Sci. 23 (1998) 263-267]. The combined action of these two enzymes leads to two types of N-terminal acetylated proteins, with or without the initiator methionine after the N-terminal acetylation. To accurately predict these two types of N-terminal acetylation, a new method based on feature selection has been developed. 1047 N-terminal acetylated and non-acetylated decapeptides retrieved from the Swiss-Prot database are encoded into feature vectors by amino acid properties collected in the Amino Acid Index database. The Maximum Relevance Minimum Redundancy (mRMR) method, combined with Incremental Feature Selection (IFS) and Feature Forward Selection (FFS), is then applied to extract informative features. The Nearest Neighbor Algorithm (NNA) is used to build prediction models. Tested by jackknife cross-validation, the correct rates of the predictors reach 91.34% and 75.49% for the two types, both better than the 84.41% and 62.99% obtained with motif methods [S. Huang, R.C. Elliott, P.S. Liu, R.K. Koduri, J.L. Weickmann, J.H. Lee, L.C. Blair, P. Ghosh-Dastidar, R.A. Bradshaw, K.M. Bryan, et al., Specificity of cotranslational amino-terminal processing of proteins in yeast, Biochemistry 26 (1987) 8242-8246; R. Yamada, R.A. Bradshaw, Rat liver polysome N alpha-acetyltransferase: substrate specificity, Biochemistry 30 (1991) 1017-1021]. Furthermore, the analysis of the informative features indicates that, besides the penultimate residue, at least six downstream residues might affect the rules that guide N-terminal acetylation. The software is available upon request. PMID:18533108

Cai, Yu-Dong; Lu, Lin



Computer-aided mass detection in mammography: False positive reduction via gray-scale invariant ranklet texture features  

SciTech Connect

In this work, gray-scale invariant ranklet texture features are proposed for false positive reduction (FPR) in computer-aided detection (CAD) of breast masses. Two main considerations are at the basis of this proposal. First, false positive (FP) marks surviving our previous CAD system seem to be characterized by specific texture properties that can be used to discriminate them from masses. Second, our previous CAD system achieves invariance to linear/nonlinear monotonic gray-scale transformations by encoding regions of interest into ranklet images through the ranklet transform, an image transformation similar to the wavelet transform, yet dealing with pixels' ranks rather than with their gray-scale values. Therefore, the new FPR approach proposed herein defines a set of texture features which are calculated directly from the ranklet images corresponding to the regions of interest surviving our previous CAD system, hence, ranklet texture features; then, a support vector machine (SVM) classifier is used for discrimination. As a result of this approach, texture-based information is used to discriminate FP marks surviving our previous CAD system; at the same time, invariance to linear/nonlinear monotonic gray-scale transformations of the new CAD system is guaranteed, as ranklet texture features are calculated from ranklet images that have this property themselves by construction. To emphasize the gray-scale invariance of both the previous and new CAD systems, training and testing are carried out without any in-between parameters' adjustment on mammograms having different gray-scale dynamics; in particular, training is carried out on analog digitized mammograms taken from a publicly available digital database, whereas testing is performed on full-field digital mammograms taken from an in-house database. 
Free-response receiver operating characteristic (FROC) curve analysis of the two CAD systems demonstrates that the new approach achieves a higher reduction of FP marks when compared to the previous one. Specifically, at 60%, 65%, and 70% per-mammogram sensitivity, the new CAD system achieves 0.50, 0.68, and 0.92 FP marks per mammogram, whereas at 70%, 75%, and 80% per-case sensitivity it achieves 0.37, 0.48, and 0.71 FP marks per mammogram, respectively. Conversely, at the same sensitivities, the previous CAD system reached 0.71, 0.87, and 1.15 FP marks per mammogram, and 0.57, 0.73, and 0.92 FPs per mammogram. Also, statistical significance of the difference between the two per-mammogram and per-case FROC curves is demonstrated by the p-value<0.001 returned by jackknife FROC analysis performed on the two CAD systems.

Masotti, Matteo; Lanconelli, Nico; Campanini, Renato [Department of Physics, University of Bologna, Viale Berti-Pichat 6/2, 40127, Bologna (Italy)]



A Deep Search for Extended Radio Continuum Emission from Dwarf Spheroidal Galaxies: Implications for Particle Dark Matter  

NASA Astrophysics Data System (ADS)

We present deep radio observations of four nearby dwarf spheroidal (dSph) galaxies, designed to detect extended synchrotron emission resulting from weakly interacting massive particle (WIMP) dark matter annihilations in their halos. Models by Colafrancesco et al. (CPU07) predict the existence of angularly large, smoothly distributed radio halos in such systems, which stem from electron and positron annihilation products spiraling in a turbulent magnetic field. We map a total of 40.5 deg² around the Draco, Ursa Major II, Coma Berenices, and Willman 1 dSphs with the Green Bank Telescope (GBT) at 1.4 GHz to detect this annihilation signature, greatly reducing discrete-source confusion using the NVSS catalog. We achieve a sensitivity of σ_sub ≲ 7 mJy beam⁻¹ in our discrete source-subtracted maps, implying that the NVSS is highly effective at removing background sources from GBT maps. For Draco we obtained approximately concurrent Very Large Array observations to quantify the variability of the discrete source background, and find it to have a negligible effect on our results. We construct radial surface brightness profiles from each of the subtracted maps, and jackknife the data to quantify the significance of the features therein. At the ~10' resolution of our observations, foregrounds contribute a standard deviation of 1.8 mJy beam⁻¹ ≤ σ_ast ≤ 5.7 mJy beam⁻¹ to our high-latitude maps, with the emission in Draco and Coma dominated by foregrounds. On the other hand, we find no significant emission in the Ursa Major II and Willman 1 fields, and explore the implications of non-detections in these fields for particle dark matter using the fiducial models of CPU07.
For a WIMP mass M_χ = 100 GeV annihilating into b\bar{b} final states and B = 1 μG, upper limits on the annihilation cross-section for Ursa Major II and Willman I are log(⟨σv⟩_χ, cm³ s⁻¹) ≲ −25 for the preferred set of charged particle propagation parameters adopted by CPU07; this is comparable to that inferred at γ-ray energies from the two-year Fermi Large Area Telescope data. We discuss three avenues for improving the constraints on ⟨σv⟩_χ presented here, and conclude that deep radio observations of dSphs are highly complementary to indirect WIMP searches at higher energies.

Spekkens, Kristine; Mason, Brian S.; Aguirre, James E.; Nhan, Bang



Detection of prostate cancer by integration of line-scan diffusion, T2-mapping and T2-weighted magnetic resonance imaging; a multichannel statistical classifier.  


A multichannel statistical classifier for detecting prostate cancer was developed and validated by combining information from three different magnetic resonance (MR) methodologies: T2-weighted, T2-mapping, and line scan diffusion imaging (LSDI). From these MR sequences, four different sets of image intensities were obtained: T2-weighted (T2W) from T2-weighted imaging, Apparent Diffusion Coefficient (ADC) from LSDI, and proton density (PD) and T2 (T2 Map) from T2-mapping imaging. Manually segmented tumor labels from a radiologist, which were validated by biopsy results, served as tumor "ground truth." Textural features were extracted from the images using co-occurrence matrix (CM) and discrete cosine transform (DCT). Anatomical location of voxels was described by a cylindrical coordinate system. A statistical jack-knife approach was used to evaluate our classifiers. Single-channel maximum likelihood (ML) classifiers were based on 1 of the 4 basic image intensities. Our multichannel classifiers: support vector machine (SVM) and Fisher linear discriminant (FLD), utilized five different sets of derived features. Each classifier generated a summary statistical map that indicated tumor likelihood in the peripheral zone (PZ) of the prostate gland. To assess classifier accuracy, the average areas under the receiver operator characteristic (ROC) curves over all subjects were compared. Our best FLD classifier achieved an average ROC area of 0.839(+/-0.064), and our best SVM classifier achieved an average ROC area of 0.761(+/-0.043). The T2W ML classifier, our best single-channel classifier, only achieved an average ROC area of 0.599(+/-0.146). Compared to the best single-channel ML classifier, our best multichannel FLD and SVM classifiers have statistically superior ROC performance (P=0.0003 and 0.0017, respectively) from pairwise two-sided t-test. 
By integrating the information from multiple images and capturing the textural and anatomical features in tumor areas, summary statistical maps can potentially aid in image-guided prostate biopsy and assist in guiding and controlling delivery of localized therapy under image guidance. PMID:14528961

Chan, Ian; Wells, William; Mulkern, Robert V; Haker, Steven; Zhang, Jianqing; Zou, Kelly H; Maier, Stephan E; Tempany, Clare M C



Uncertainty in Pedotransfer Functions from Soil Survey Data  

NASA Astrophysics Data System (ADS)

Pedotransfer functions (PTFs) are empirical relationships between hard-to-get soil parameters, i.e. hydraulic properties, and more easily obtainable basic soil properties, such as texture. Use of PTFs in large-scale projects and pilot studies relies on soil survey data, which provide basic soil data as categorical information. Unlike numerical variables, categorical data cannot be directly used in statistical regressions or neural networks to develop PTFs. The objectives of this work were (a) to find and test techniques to develop PTFs for soil water retention and saturated hydraulic conductivity with categorical soil data as inputs, and (b) to evaluate sources of uncertainty in the results of such PTFs and to explore opportunities for mitigating the uncertainty. We used a subset of about 12,000 samples from the US National Soil Characterization database to estimate water retention, and a data set of circa 1000 hydraulic conductivity measurements made in the US. Regression trees and polynomial neural networks based on dummy coding were the techniques tried for PTF development. Jackknife validation was used to prevent over-parameterization. Both techniques were equally efficient in developing PTFs, but regression trees gave much more transparent results. Textural class was the leading predictor, with RMSE values of about 6.5 and 4.1 vol.% for water retention at -33 and -1500 kPa, respectively. The RMSE values decreased by 10% when the laboratory textural analysis was used to establish the textural class. Textural class in the field was determined correctly in only 41% of all cases. To mitigate this source of error, we added slopes, position-on-the-slope classes, and land surface shape classes to the list of PTF inputs. Regression trees generated topotextural groups that encompassed several textural classes. Using topographic variables and soil horizon appeared to be the way to make up for errors made in field determination of texture.
Adding field descriptors of soil structure to the field-determined textural class gave similar results. No large improvement was achieved, probably because textural class, topographic descriptors and structure descriptors were correlated predictors in many cases. Both median values and uncertainty of the saturated hydraulic conductivity had a power-law decrease as clay content increased. Defining two classes of bulk density helped to estimate hydraulic conductivity within textural classes. We conclude that categorical field soil survey data can be used for PTF-based estimation of soil water retention and saturated hydraulic conductivity with quantified uncertainty.

Pachepsky, Y. A.; Rawls, W. J.
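
Dummy coding, as named in the abstract above, turns a categorical input such as textural class into indicator variables that a regression can use, and a jackknife (leave-one-out) RMSE then scores the fit on samples it has not seen. The sketch below is a simplified stand-in: it uses plain least squares on the indicators instead of the regression trees and polynomial neural networks of the study, and the class names, retention values and function names are hypothetical.

```python
import math

def dummy_code(value, categories):
    """Indicator (one-hot) vector for one categorical value."""
    return [1.0 if value == c else 0.0 for c in categories]

def fit_dummy_regression(classes, targets, categories):
    """Least squares on dummy-coded inputs: each coefficient is a class mean."""
    overall = sum(targets) / len(targets)
    coefs = []
    for c in categories:
        members = [t for k, t in zip(classes, targets) if k == c]
        # Assumption: a class absent from the training subset uses the overall mean.
        coefs.append(sum(members) / len(members) if members else overall)
    return coefs

def predict(value, categories, coefs):
    """Dot product of the dummy-coded input with the fitted coefficients."""
    return sum(d * b for d, b in zip(dummy_code(value, categories), coefs))

def jackknife_rmse(classes, targets, categories):
    """Leave one sample out, refit, and score the held-out prediction."""
    squared = []
    for i in range(len(classes)):
        coefs = fit_dummy_regression(classes[:i] + classes[i + 1:],
                                     targets[:i] + targets[i + 1:], categories)
        squared.append((predict(classes[i], categories, coefs) - targets[i]) ** 2)
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical samples: textural class and water retention (vol.%).
categories = ["sand", "loam", "clay"]
classes = ["sand", "sand", "loam", "loam", "clay", "clay"]
retention = [8.0, 10.0, 24.0, 27.0, 38.0, 41.0]
rmse = jackknife_rmse(classes, retention, categories)
print(f"leave-one-out RMSE = {rmse:.2f} vol.%")
```

A model with too many parameters fits the training samples closely but predicts the held-out ones poorly, which is why a held-out RMSE of this kind can guard against over-parameterization.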



A stable pattern of EEG spectral coherence distinguishes children with autism from neuro-typical controls - a large case control study  

PubMed Central

Background: The autism rate has recently increased to 1 in 100 children. Genetic studies demonstrate poorly understood complexity. Environmental factors apparently also play a role. Magnetic resonance imaging (MRI) studies demonstrate increased brain sizes and altered connectivity. Electroencephalogram (EEG) coherence studies confirm connectivity changes. However, genetic-, MRI- and/or EEG-based diagnostic tests are not yet available. The varied study results likely reflect methodological and population differences, small samples and, for EEG, lack of attention to group-specific artifact. Methods: Of the 1,304 subjects who participated in this study, with ages ranging from 1 to 18 years old and assessed with comparable EEG studies, 463 children were diagnosed with autism spectrum disorder (ASD); 571 children were neuro-typical controls (C). After artifact management, principal components analysis (PCA) identified EEG spectral coherence factors with corresponding loading patterns. The 2- to 12-year-old subsample consisted of 430 ASD- and 554 C-group subjects (n = 984). Discriminant function analysis (DFA) determined the spectral coherence factors' discrimination success for the two groups. Loading patterns on the DFA-selected coherence factors described ASD-specific coherence differences when compared to controls. Results: Total sample PCA of coherence data identified 40 factors which explained 50.8% of the total population variance. For the 2- to 12-year-olds, the 40 factors showed highly significant group differences (P < 0.0001). Ten randomly generated split half replications demonstrated high-average classification success (C, 88.5%; ASD, 86.0%). Still higher success was obtained in the more restricted age sub-samples using the jackknifing technique: 2- to 4-year-olds (C, 90.6%; ASD, 98.1%); 4- to 6-year-olds (C, 90.9%; ASD, 99.1%); and 6- to 12-year-olds (C, 98.7%; ASD, 93.9%).
Coherence loadings demonstrated reduced short-distance and reduced, as well as increased, long-distance coherences for the ASD-groups, when compared to the controls. Average spectral loading per factor was wide (10.1 Hz). Conclusions: Classification success suggests a stable coherence loading pattern that differentiates ASD- from C-group subjects. This might constitute an EEG coherence-based phenotype of childhood autism. The predominantly reduced short-distance coherences may indicate poor local network function. The increased long-distance coherences may represent compensatory processes or reduced neural pruning. The wide average spectral range of factor loadings may suggest over-damped neural networks.



Detection of B-Mode Polarization at Degree Angular Scales by BICEP2  

NASA Astrophysics Data System (ADS)

We report results from the BICEP2 experiment, a cosmic microwave background (CMB) polarimeter specifically designed to search for the signal of inflationary gravitational waves in the B-mode power spectrum around ℓ∼80. The telescope comprised a 26 cm aperture all-cold refracting optical system equipped with a focal plane of 512 antenna coupled transition edge sensor 150 GHz bolometers each with temperature sensitivity of ≈300 μK_{CMB}√s. BICEP2 observed from the South Pole for three seasons from 2010 to 2012. A low-foreground region of sky with an effective area of 380 square deg was observed to a depth of 87 nK deg in Stokes Q and U. In this paper we describe the observations, data reduction, maps, simulations, and results. We find an excess of B-mode power over the base lensed-ΛCDM expectation in the range 30 < ℓ < 150, inconsistent with the null hypothesis at a significance of > 5σ. Through jackknife tests and simulations based on detailed calibration measurements we show that systematic contamination is much smaller than the observed excess. Cross correlating against WMAP 23 GHz maps we find that Galactic synchrotron makes a negligible contribution to the observed signal. We also examine a number of available models of polarized dust emission and find that at their default parameter values they predict power ∼(5-10)× smaller than the observed excess signal (with no significant cross-correlation with our maps). However, these models are not sufficiently constrained by external public data to exclude the possibility of dust emission bright enough to explain the entire excess signal. Cross correlating BICEP2 against 100 GHz maps from the BICEP1 experiment, the excess signal is confirmed with 3σ significance and its spectral index is found to be consistent with that of the CMB, disfavoring dust at 1.7σ. The observed B-mode power spectrum is well fit by a lensed-ΛCDM+tensor theoretical model with tensor-to-scalar ratio r=0.20_{-0.05}^{+0.07}, with r=0 disfavored at 7.0σ.
Accounting for the contribution of foreground dust will shift this value downward by an amount which will be better constrained with upcoming data sets.

Ade, P. A. R.; Aikin, R. W.; Barkats, D.; Benton, S. J.; Bischoff, C. A.; Bock, J. J.; Brevik, J. A.; Buder, I.; Bullock, E.; Dowell, C. D.; Duband, L.; Filippini, J. P.; Fliescher, S.; Golwala, S. R.; Halpern, M.; Hasselfield, M.; Hildebrandt, S. R.; Hilton, G. C.; Hristov, V. V.; Irwin, K. D.; Karkare, K. S.; Kaufman, J. P.; Keating, B. G.; Kernasovskiy, S. A.; Kovac, J. M.; Kuo, C. L.; Leitch, E. M.; Lueker, M.; Mason, P.; Netterfield, C. B.; Nguyen, H. T.; O'Brient, R.; Ogburn, R. W.; Orlando, A.; Pryke, C.; Reintsema, C. D.; Richter, S.; Schwarz, R.; Sheehy, C. D.; Staniszewski, Z. K.; Sudiwala, R. V.; Teply, G. P.; Tolan, J. E.; Turner, A. D.; Vieregg, A. G.; Wong, C. L.; Yoon, K. W.; Bicep2 Collaboration



Estimation of parameters of dose-volume models and their confidence limits.  


Predictions of the normal-tissue complication probability (NTCP) for the ranking of treatment plans are based on fits of dose-volume models to clinical and/or experimental data. In the literature several different fit methods are used. In this work, frequently used methods and techniques to fit NTCP models to dose response data for establishing dose-volume effects are discussed. The techniques are tested for their usability with dose-volume data and NTCP models. Different methods to estimate the confidence intervals of the model parameters are part of this study. From a critical-volume (CV) model with biologically realistic parameters a primary dataset was generated, serving as the reference for this study and describable by the NTCP model. The CV model was fitted to this dataset. From the resulting parameters and the CV model, 1000 secondary datasets were generated by Monte Carlo simulation. All secondary datasets were fitted to obtain 1000 parameter sets of the CV model. Thus the 'real' spread in fit results due to statistical spreading in the data is obtained and has been compared with estimates of the confidence intervals obtained by different methods applied to the primary dataset. The confidence limits of the parameters of one dataset were estimated using three methods: the covariance matrix, the jackknife method, and direct evaluation of the likelihood landscape. These results were compared with the spread of the parameters obtained from the secondary parameter sets. For the estimation of confidence intervals on NTCP predictions, three methods were tested. Firstly, propagation of errors using the covariance matrix was used. Secondly, the meaning of the width of a bundle of curves that resulted from parameters that were within the one standard deviation region in the likelihood space was investigated. Thirdly, many parameter sets and their likelihood were used to create a likelihood-weighted probability distribution of the NTCP.
It is concluded that for the type of dose response data used here, only a full likelihood analysis will produce reliable results. The often-used approximations, such as the usage of the covariance matrix, produce inconsistent confidence limits on both the parameter sets and the resulting NTCP values. PMID:12884921

van Luijk, P; Delvigne, T C; Schilstra, C; Schippers, J M
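
For orientation, the jackknife method named in the abstract above can be sketched as a delete-one procedure that recomputes an estimate with each observation removed and derives a standard error from the spread. This is a generic illustration, not the authors' fitting code: `estimator` stands in for a full model fit, the data values are hypothetical, and the 1.96 multiplier assumes an approximately normal estimate. The abstract itself concludes that approximate confidence limits can be unreliable for this kind of data.

```python
import math

def jackknife(data, estimator):
    """Delete-one jackknife: bias-corrected estimate and standard error."""
    n = len(data)
    full = estimator(data)
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    bias_corrected = n * full - (n - 1) * mean_loo
    variance = (n - 1) / n * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return bias_corrected, math.sqrt(variance)

def mean(values):
    """Stand-in estimator; a real application would refit the model here."""
    return sum(values) / len(values)

# Hypothetical observations (not from the study).
data = [1.0, 2.0, 3.0, 4.0, 5.0]
estimate, std_err = jackknife(data, mean)
# Normal-approximation interval; the 1.96 multiplier is an assumption.
lower, upper = estimate - 1.96 * std_err, estimate + 1.96 * std_err
print(f"estimate = {estimate:.3f}, SE = {std_err:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

For the sample mean the jackknife standard error coincides with the usual s/√n, which makes this a convenient sanity check before applying the procedure to a fitted model parameter.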



Application of threshold-bias independent analysis to eye-tracking and FROC data  

PubMed Central

Rationale and Objectives: Studies of medical image interpretation have focused on either assessing radiologists’ performance using, for example, the receiver operating characteristic (ROC) paradigm, or assessing the interpretive process by analyzing eye-tracking (ET) data. Analysis of ET data has not benefited from threshold-bias independent figures-of-merit (FOMs) analogous to the area under the ROC curve. The aim was to demonstrate the feasibility of such FOMs and to measure the agreement between figures-of-merit derived from free-response ROC (FROC) and ET data. Methods: Eight expert breast radiologists interpreted a case set of 120 two-view mammograms while eye-position data and FROC data were continuously collected during the interpretation interval. Regions that attracted prolonged (> 800 ms) visual attention were considered to be virtual marks, and ratings based on the dwell and approach-rate (inverse of time-to-hit) were assigned to them. The virtual ratings were used to define threshold-bias independent FOMs in a manner analogous to the area under the trapezoidal alternative FROC (AFROC) curve (0 = worst, 1 = best). Agreement at the case level (0.5 = chance, 1 = perfect) was measured using the jackknife, and 95% confidence intervals (CI) for the FOMs and agreement were estimated using the bootstrap. Results: The AFROC mark-ratings FOM was largest, 0.734, CI = (0.65, 0.81), followed by the dwell FOM, 0.460 (0.34, 0.59), and then by the approach-rate FOM, 0.336 (0.25, 0.46). The differences between the FROC mark-ratings FOM and the perceptual FOMs were significant (p < 0.05). All pairwise agreements were significantly better than chance: ratings vs. dwell 0.707 (0.63, 0.88), dwell vs. approach-rate 0.703 (0.60, 0.79) and rating vs. approach-rate 0.606 (0.53, 0.68). The ratings vs. approach-rate agreement was significantly smaller than the dwell vs. approach-rate agreement (p = 0.008).
Conclusions: Leveraging current methods developed for analyzing observer performance data could complement current ways of analyzing ET data and lead to new insights.

Chakraborty, Dev P.; Yoon, Hong-Jun; Mello-Thoms, Claudia



Characterization of landslide kinematics with a long range terrestrial laser scan: a methodological approach  

NASA Astrophysics Data System (ADS)

The objective of this work is to present a methodology for analyzing large displacements of landslide with a terrestrial laser scan (TLS) and to characterize the acquisition and computation errors of the displacement fields. Several high resolution TLS observations (0.3 to 4 were acquired in representative plots of the Super-Sauze mudslide in 2007 and 2008: the main scarp in the upper part, the medium part exhibiting the highest displacement rates and the toe in the lower part. The TLS equipment is an Optech ILRIS-3D. All the processing has been performed with the Polyworks software. Among the procedures influencing the quality of the derived displacement fields, the alignment of the scans is the most sensitive. The distribution of statistical noise associated to equipment errors follows a normal law and does not significantly influence the quality of the displacement field ( = 0.1 cm, = 1.0 cm). As well, local changes in surface soil moisture do not significantly influence the quality of the displacement field; although the intensity of the signal is drastically decreased (~24% of the maximum intensity), observations on nearly saturated and unsaturated plots still indicate a tolerable error band of = 0.3 cm and = 0.2 cm at a distance of 30 m from the laser scan. To quantify the displacement field from the original point clouds, several approaches can be used: (1) point cloud comparisons (e.g. algorithm looking for the shortest points along a vector), (2) rebuilding of object geometry (TIN model analysis), and (3) difference of DEMs. In order to characterize displacements with an important horizontal component, it is demonstrated that the object recognition method is more efficient to characterize the kinematics on relative smooth topography than point clouds algorithms. 
To characterize displacement with a more important vertical component, such as the collapse of material from the main scarp of the mudslide, a "jackknife" procedure was used to identify the best interpolation techniques for producing the DEM. A differential DEM analysis made it possible to define the volume of the collapse (~23.000 m³) as well as a progressive subsidence of the area downslope. The quality of the alignment is the most sensitive parameter influencing the accuracy of the laser scan observations. Good overlap among the scans and the inclusion of stable parts are necessary to optimize the alignment procedure, but the number of scans to acquire must also be minimized during survey planning.

Travelletti, J.; Oppikofer, T.; Malet, J.-P.; Jaboyedoff, M.
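
The "jackknife" selection of an interpolation technique mentioned in the abstract above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the two candidate interpolators (nearest neighbour and inverse distance weighting), the RMSE criterion, and the synthetic planar grid are all assumptions chosen to keep the example small.

```python
import math

def nearest(known, x, y):
    """Elevation of the closest known point."""
    return min(known, key=lambda p: math.hypot(p[0] - x, p[1] - y))[2]

def idw(known, x, y, power=2.0):
    """Inverse-distance-weighted mean of all known elevations."""
    num = den = 0.0
    for px, py, pz in known:
        d = math.hypot(px - x, py - y)
        if d == 0.0:          # query coincides with a sample
            return pz
        w = d ** -power
        num += w * pz
        den += w
    return num / den

def jackknife_rmse(points, interpolate):
    """Leave each point out in turn, predict it from the rest, return the RMSE."""
    sq = 0.0
    for i, (x, y, z) in enumerate(points):
        rest = points[:i] + points[i + 1:]
        sq += (interpolate(rest, x, y) - z) ** 2
    return math.sqrt(sq / len(points))

# Hypothetical data: a planar slope z = x + y sampled on a 5 x 5 grid.
points = [(float(x), float(y), float(x + y)) for x in range(5) for y in range(5)]

candidates = {"nearest": nearest, "idw": idw}
scores = {name: jackknife_rmse(points, f) for name, f in candidates.items()}
best = min(scores, key=scores.get)   # interpolator with the lowest held-out error
print(scores, best)
```

The same loop accepts any function with the signature `interpolate(known, x, y)`, so further candidates (for example a TIN-based or kriging interpolator) could be scored in the same way.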



Subspace Dimensionality: A Tool for Automated QC in Seismic Array Processing  

NASA Astrophysics Data System (ADS)

Because of the great resolving power of seismic arrays, the application of automated processing to array data is critically important in treaty verification work. A significant problem in array analysis is the inclusion of bad sensor channels in the beamforming process. We are testing an approach to automated, on-the-fly quality control (QC) to aid in the identification of poorly performing sensor channels prior to beam-forming in routine event detection or location processing. The idea stems from methods used for large computer servers, when monitoring traffic at enormous numbers of nodes is impractical on a node-by-node basis, so the dimensionality of the node traffic is instead monitored for anomalies that could represent malware, cyber-attacks or other problems. The technique relies upon the use of subspace dimensionality or principal components of the overall system traffic. The subspace technique is not new to seismology, but its most common application has been limited to comparing waveforms to an a priori collection of templates for detecting highly similar events in a swarm or seismic cluster. In the established template application, a detector functions in a manner analogous to waveform cross-correlation, applying a statistical test to assess the similarity of the incoming data stream to known templates for events of interest. In our approach, we seek not to detect matching signals, but instead, we examine the signal subspace dimensionality in much the same way that the method addresses node traffic anomalies in large computer systems. Signal anomalies recorded on seismic arrays affect the dimensional structure of the array-wide time-series.
We have shown previously that this observation is useful in identifying real seismic events, either by looking at the raw signal or derivatives thereof (entropy, kurtosis), but here we explore the effects of malfunctioning channels on the dimension of the data and its derivatives, and how to leverage this effect for identifying bad array elements through a jackknifing process to isolate the anomalous channels, so that an automated analysis system might discard them prior to FK analysis and beamforming on events of interest.

Rowe, C. A.; Stead, R. J.; Begnaud, M. L.
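
One way to picture the channel jackknife described above is to delete each channel in turn and watch how a dimensionality measure of the array data changes. The sketch below is an assumption-laden illustration, not the authors' algorithm: it uses the participation ratio of the channel Gram matrix as the dimensionality measure, and the synthetic 9-channel array (eight coherent channels plus one unrelated one) is hypothetical.

```python
import math

def normalise(row):
    """Remove the mean and scale a channel to unit energy (dead channels stay zero)."""
    m = sum(row) / len(row)
    centred = [v - m for v in row]
    norm = math.sqrt(sum(v * v for v in centred))
    return [v / norm for v in centred] if norm > 0.0 else centred

def effective_dimension(channels):
    """Participation ratio (sum l)^2 / sum(l^2) of the Gram-matrix eigenvalues l,
    evaluated without an eigen-solver as trace(G)^2 / trace(G^2)."""
    rows = [normalise(r) for r in channels]
    gram = [[sum(a * b for a, b in zip(r1, r2)) for r2 in rows] for r1 in rows]
    trace = sum(gram[i][i] for i in range(len(rows)))
    trace_sq = sum(g * g for row in gram for g in row)
    return trace * trace / trace_sq

def jackknife_channels(channels):
    """Effective dimension of the array with each channel deleted in turn."""
    return [effective_dimension(channels[:i] + channels[i + 1:])
            for i in range(len(channels))]

# Hypothetical array: 8 channels share one 5 Hz signal, channel 3 records 13 Hz.
times = [k / 500.0 for k in range(500)]
good = [[(0.5 + 0.1 * j) * math.sin(2 * math.pi * 5 * t) for t in times]
        for j in range(8)]
bad = [math.sin(2 * math.pi * 13 * t) for t in times]
channels = good[:3] + [bad] + good[3:]

dims = jackknife_channels(channels)
suspect = min(range(len(dims)), key=dims.__getitem__)  # deletion that simplifies the data most
print(suspect, dims)
```

In this toy case, deleting the anomalous channel collapses the data to a single dimension, while deleting any healthy channel leaves two dimensions in place, so the minimum flags the channel to discard.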



Optimized multiple quantum MAS lineshape simulations in solid state NMR  

NASA Astrophysics Data System (ADS)

The majority of nuclei available for study in solid state Nuclear Magnetic Resonance have half-integer spin I>1/2, with corresponding electric quadrupole moment. As such, they may couple with a surrounding electric field gradient. This effect introduces anisotropic line broadening to spectra, arising from distinct chemical species within polycrystalline solids. In Multiple Quantum Magic Angle Spinning (MQMAS) experiments, a second frequency dimension is created, devoid of quadrupolar anisotropy. As a result, the center of gravity of peaks in the high resolution dimension is a function of isotropic second order quadrupole and chemical shift alone. However, for complex materials, these parameters take on a stochastic nature due in turn to structural and chemical disorder. Lineshapes may still overlap in the isotropic dimension, complicating the task of assignment and interpretation. A distributed computational approach is presented here which permits simulation of the two-dimensional MQMAS spectrum, generated by random variates from model distributions of isotropic chemical and quadrupole shifts. Owing to the non-convex nature of the residual sum of squares (RSS) function between experimental and simulated spectra, simulated annealing is used to optimize the simulation parameters. In this manner, local chemical environments for disordered materials may be characterized, and via a re-sampling approach, error estimates for parameters produced.
Program summary
Program title: mqmasOPT
Catalogue identifier: AEEC_v1_0
Program summary URL:
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence
No. of lines in distributed program, including test data, etc.: 3650
No. of bytes in distributed program, including test data, etc.: 73 853
Distribution format: tar.gz
Programming language: C, OCTAVE
Computer: UNIX/Linux
Operating system: UNIX/Linux
Has the code been vectorised or parallelized?: Yes
RAM: Example: (1597 powder angles) × (200 Samples) × (81 F2 frequency pts) × (31 F1 frequency points) = 3.5M, SMP AMD opteron
Classification: 2.3
External routines: OCTAVE, GNU Scientific Library, OPENMP
Nature of problem: The optimal simulation and modeling of multiple quantum magic angle spinning NMR spectra, for general systems, especially those with mild to significant disorder. The approach outlined and implemented in C and OCTAVE also produces model parameter error estimates.
Solution method: A model for each distinct chemical site is first proposed, for the individual contribution of crystallite orientations to the spectrum. This model is averaged over all powder angles [1], as well as the (stochastic) parameters; isotropic chemical shift and quadrupole coupling constant. The latter is accomplished via sampling from a bi-variate Gaussian distribution, using the Box-Muller algorithm to transform Sobol (quasi) random numbers [2]. A simulated annealing optimization is performed, and finally the non-linear jackknife [3] is applied in developing model parameter error estimates.
Additional comments: The distribution contains a script, mqmasOpt.m, which runs in the OCTAVE language workspace.
Running time: Example: (1597 powder angles) × (200 Samples) × (81 F2 frequency pts) × (31 F1 frequency points) = 58.35 seconds, SMP AMD opteron.
References:
[1] S.K. Zaremba, Annali di Matematica Pura ed Applicata 73 (1966) 293.
[2] H. Niederreiter, Random Number Generation and Quasi-Monte Carlo Methods, SIAM, 1992.
[3] T. Fox, D. Hinkley, K. Larntz, Technometrics 22 (1980) 29.

Brouwer, William J.; Davis, Michael C.; Mueller, Karl T.
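
The sampling step named in the solution method above (a Box-Muller transform applied to quasi-random numbers to draw from a bivariate Gaussian) can be sketched in a few lines. This is not the mqmasOPT code: a two-dimensional Halton sequence is swapped in for the Sobol sequence so that the example needs only the standard library, and the means, standard deviations and correlation passed in are hypothetical values.

```python
import math

def van_der_corput(index, base):
    """Radical inverse of a positive integer; one coordinate of a Halton point."""
    value, denom = 0.0, 1.0
    while index > 0:
        index, digit = divmod(index, base)
        denom *= base
        value += digit / denom
    return value

def bivariate_gaussian(n, mean, sd, rho):
    """Draw n correlated Gaussian pairs from Halton points via Box-Muller."""
    samples = []
    for i in range(1, n + 1):          # start at 1 so that u1 is never 0
        u1 = van_der_corput(i, 2)
        u2 = van_der_corput(i, 3)
        radius = math.sqrt(-2.0 * math.log(u1))
        z1 = radius * math.cos(2.0 * math.pi * u2)   # independent N(0, 1)
        z2 = radius * math.sin(2.0 * math.pi * u2)   # independent N(0, 1)
        x = mean[0] + sd[0] * z1
        # Mix z1 into the second coordinate to impose the correlation rho.
        y = mean[1] + sd[1] * (rho * z1 + math.sqrt(1.0 - rho * rho) * z2)
        samples.append((x, y))
    return samples

# Hypothetical site parameters: (chemical shift, quadrupole coupling constant).
samples = bivariate_gaussian(4096, mean=(10.0, 3.0), sd=(2.0, 0.5), rho=0.6)
print(samples[:3])
```

Each sampled pair would then feed one evaluation of the lineshape model; averaging over the pairs gives the disorder-broadened spectrum.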



Modeling deformation associated with the 2004-2008 dome-building eruption of Mount St. Helens  

NASA Astrophysics Data System (ADS)

We estimate deformation sources active during and after the 2004-2008 dome-building eruption of Mount St. Helens (MSH) by inverting campaign and continuous GPS (CGPS) measured deformation between 2000 and 2011. All data are corrected for background deformation using a tectonic model that includes block rotation and uniform strain accumulation. The campaign GPS surveys characterize the deformation over a large area, and the CGPS data allow estimates of time-dependent changes in the rate of deformation. Only one CGPS station, JRO1, was operating near MSH prior to the start of unrest on September 23, 2004. Most other CGPS stations, installed by the Plate Boundary Observatory and Cascade Volcano Observatory, were operating by mid-October, 2004. The inward displacement of JRO1 started with the seismic unrest on September 23, 2004, and continued at a rate of 0.5 mm/day until the last phreatic explosion on October 5, 2004 (note there was another explosion in March 2005). The deformation then decayed exponentially until activity ceased in January, 2008. The rate of decay was estimated using a number of clean CGPS time series, and then it was fixed to estimate amplitudes for all CGPS station displacements. The inward and downward movements (deflation) observed at all stations during the eruption (2004-2008) were best-fit by a prolate spheroid with geometric aspect ratio 0.19 ± 0.6, a depth of 7.4 ± 1.7 km, and a cavity volume decrease of 0.028 ± 0.005 cubic km. This source is practically vertical (dip angle: 84 ± 5; strike angle 298 ± 84) and is located beneath the dome. All errors are 95% bounds and have been estimated using jackknife. The post-eruption deformation (2008 - present) is characterized by deflation in the near field (within 2 km from the dome) and inflation in the far field. 
The near-field deflation signal is best fit by a very shallow sill-like source (~0.18 ± 0.05 km below the crater floor) with a radius of 0.5 ± 0.3 km and a cavity volume decrease of 0.010 ± 0.001 cubic km. The best-fitting source for the far-field inflation is a prolate spheroid of geometric aspect ratio 0.12 ± 0.2, a depth of 7.3 ± 0.6 km, and a cavity volume increase of 0.006 ± 0.001 cubic km. The source dips slightly to the north (dip angle: 75 ± 4; strike angle 357 ± 8). Both sources are located beneath the dome. These results suggest that the same deep magma source has been active beneath the volcano for the past 7 years. This source fed the dome eruption and is now slowly being filled. The shallow source controlling the near-field, post-eruption deformation is probably due to the cooling and contraction of the lava dome within the crater.

Lisowski, M.; Battaglia, M.
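
For readers unfamiliar with how jackknife error bounds of the kind quoted above are obtained, a minimal delete-one sketch follows. It is a generic illustration rather than the authors' inversion: the estimator here is a simple least-squares displacement rate, the time series is hypothetical, and turning the standard error into a 95% bound by multiplying by 1.96 assumes approximate normality.

```python
import math

def jackknife(data, estimator):
    """Return the full-sample estimate and its delete-one jackknife standard error."""
    n = len(data)
    full = estimator(data)
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((t - centre) ** 2 for t in leave_one_out)
    return full, math.sqrt(variance)

def slope(pairs):
    """Least-squares rate of change for (time, displacement) pairs."""
    n = len(pairs)
    mt = sum(t for t, _ in pairs) / n
    md = sum(d for _, d in pairs) / n
    return (sum((t - mt) * (d - md) for t, d in pairs)
            / sum((t - mt) ** 2 for t, _ in pairs))

# Hypothetical daily displacements (mm): a steady trend with a small scatter.
wobble = [0.10, -0.20, 0.05, 0.15, -0.10, 0.00, 0.20, -0.15, 0.05, -0.10]
series = [(float(t), -0.5 * t + w) for t, w in enumerate(wobble)]

rate, se = jackknife(series, slope)
print(f"rate = {rate:.3f} +/- {1.96 * se:.3f} mm/day (approx. 95% bound)")
```

Because `jackknife` only needs a function of the data, a full source inversion could be dropped in as the estimator, with one station or one observation removed per replicate.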



Scaling of Microbial Competence for Sediment Remediation  

NASA Astrophysics Data System (ADS)

Reliable characterization of the spatial distribution of sediment site attributes, such as contaminant concentrations or microbial activity, depends on how well sampled values represent all values throughout the entire study site. In addition to the reliability of samples described statistically, the physical scale of the samples may further introduce uncertainties. Whereas geostatistical tools have been developed to interpolate the attribute values in space, these do not explicitly take into account the uncertainties associated with the various scales (field, lab, mesocosm) at which the data have been collected. Hence, a model to evaluate uncertainties arising from the various sampling scales is required to properly sample and interpret data from large sites such as contaminated sediments. Here, we describe a statistical model to optimize the reliability of sampled data on a multi-scale basis. The model not only serves as a tool to evaluate relationships over different scales by their covariances, but also makes further use of these covariances as the basis for a precision-optimized estimator. Unlike conventional geostatistical tools, which are based on point-to-point spatial structures, the multi-scale model introduces a new framework for spatial analysis in which regional values at different scales are anchored by their mutual correlations. The model is developed using least-squares optimization for estimations of different scales, by which the estimation variance can be evaluated for the estimation of all scales combined. The estimation by the new model is expected to show less of the undesirable smoothing effect than the conventional kriging approaches in the neighborhood of sampled locations.
Preliminary results from a comparison to indicator kriging of a spatial dioxin dataset from the Passaic River indicate that both estimation models agree in regions with lower values, while the multi-scale model preserves the features in the regions where hot-spot measurements exist. Information on the smallest scale appropriate for the dataset is also honored by using the multi-scale model, while kriging approaches give artificial extrapolation at the near-distance variation despite the sampling scheme of the data set. Uncertainties introduced by sampling equipment can subsequently be analyzed by the multi-scale model after the evaluation of estimation uncertainties is complete, to meet the practical end of the model developed. Two evaluation tools for spatial estimation models, cross-validation and jackknifing, are performed on both the multi-scale model and the conventional kriging approach in order to assess the competence of the developed model. Both evaluation tools are similar in concept in that subsamples are removed from the original data set to be estimated by the rest of the data points, while focusing differently on either the overall estimation performance or the regional estimation capability. The comparison, using the Passaic River dataset, will be used to inform the applicability and objectivity for both models.

Li, M.; Adriaens, P.
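
The contrast drawn above between the two evaluation tools (removing subsamples to test overall performance versus regional capability) can be illustrated with a toy transect. The sketch rests on assumptions not stated in the abstract: point-wise leave-one-out stands in for cross-validation, withholding a whole block of neighbouring samples stands in for jackknifing, and a nearest-sample estimator replaces both kriging and the multi-scale model. The data are synthetic.

```python
import math

def nearest_value(known, x):
    """Estimate at x = value of the closest retained sample (stand-in estimator)."""
    return min(known, key=lambda p: abs(p[0] - x))[1]

def rmse(errors):
    return math.sqrt(sum(e * e for e in errors) / len(errors))

def leave_one_out(samples, estimate):
    """Overall performance: re-estimate every sample from all the others."""
    return rmse([estimate(samples[:i] + samples[i + 1:], x) - z
                 for i, (x, z) in enumerate(samples)])

def hold_out_region(samples, estimate, lo, hi):
    """Regional capability: withhold every sample with lo <= x <= hi at once."""
    kept = [(x, z) for x, z in samples if not lo <= x <= hi]
    held = [(x, z) for x, z in samples if lo <= x <= hi]
    return rmse([estimate(kept, x) - z for x, z in held])

# Hypothetical transect: concentration rising linearly along 20 stations.
samples = [(float(x), 2.0 * x) for x in range(20)]

overall = leave_one_out(samples, nearest_value)
regional = hold_out_region(samples, nearest_value, 8.0, 11.0)
print(overall, regional)
```

Withholding a contiguous region forces the estimator to bridge a gap wider than the sampling interval, which is why the regional error exceeds the point-wise one in this example.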



Azimuthal anisotropy beneath southern Africa from very broad-band surface-wave dispersion measurements  

NASA Astrophysics Data System (ADS)

Seismic anisotropy within the lithosphere of cratons preserves an important record of their ancient assembly. In southern Africa, anisotropy across the Archean Kaapvaal Craton and Limpopo Belt has been detected previously by observations of SKS-wave splitting. Because SKS-splitting measurements lack vertical resolution, however, the depth distribution of anisotropy has remained uncertain. End-member interpretations invoked the dominance of either anisotropy in the lithosphere (due to the fabric formed by deformation in Archean or Palaeoproterozoic orogenies) or that in the asthenosphere (due to the fabric formed by the recent plate motion), each with significant geodynamic implications. To determine the distribution of anisotropy with depth, we measured phase velocities of seismic surface waves between stations of the Southern African Seismic Experiment. We applied two complementary measurement approaches, very broad-band cross-correlation and multimode waveform inversion. Robust Rayleigh- and Love-wave dispersion curves were derived for four different subregions of Archean southern Africa in a period range from 5 s to 250-400 s (Rayleigh) and 5 s to 100-250 s (Love), depending on the region. Rayleigh-wave anisotropy was determined in each region at periods from 5 s to 150-200 s, sampling from the upper crust down to the asthenosphere. The jackknife method was used to estimate uncertainties, and the F-test to verify the statistical significance of anisotropy. We detected strong anisotropy with a N-S fast-propagation azimuth in the upper crust of the Limpopo Belt. We attribute it to aligned cracks, formed by the regional, E-W extensional stress associated with the southward propagation of the East African Rift. Our results show that it is possible to estimate regional stress from short-period, surface wave anisotropy, measured in this study using broad-band array recordings of teleseismic surface waves.
Rayleigh-wave anisotropy at 70-120 s periods shows that the fabric within the deep mantle lithosphere of the Limpopo Belt and northern Kaapvaal Craton is aligned parallel to the Archean-Palaeoproterozoic sutures at block boundaries. This confirms that the fabric within the lithosphere created by pervasive ancient deformation is preserved to this day. Suture-parallel fabric is absent, however, in the deep lithosphere of the western Kaapvaal Craton, suggesting that it was not reworked in the collision with the craton's core, either due to its mechanical strength or because the deformation mechanism was different from those that operated in the north. Anisotropy at periods greater than 120-130 s shows fast directions parallel to the plate motion and indicates shear wave anisotropy in the asthenosphere. The depth distribution of anisotropy revealed by surface wave measurements comprises elements of both end-member models proposed previously: anisotropy in the asthenosphere shows fast-propagation directions parallel to the plate motion; anisotropy in the Limpopo and northern Kaapvaal lithosphere shows fast directions parallel to the Archean-Palaeoproterozoic sutures. The distribution of SKS-splitting orientations across southern Africa reflects anisotropic fabric both within the lithosphere (dominating the splitting beneath the Limpopo Belt and northern Kaapvaal Craton) and within the asthenosphere (dominating beneath the western Kaapvaal Craton).

Adam, Joanne M.-C.; Lebedev, Sergei
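
The F-test for the significance of anisotropy mentioned above compares an isotropic fit with one that adds azimuthal terms. A minimal sketch follows, under assumptions of its own: the anisotropic model is taken to be c(psi) = c0 + A cos 2psi + B sin 2psi, the velocities are synthetic, and the function names are hypothetical. With exactly two extra parameters the F survival function has the closed form (1 + 2F/v)^(-v/2), so no statistics library is needed.

```python
import math

def solve(a, b):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def anisotropy_f_test(azimuths_deg, velocities):
    """Fit c0 + A cos(2 psi) + B sin(2 psi); test it against a constant velocity.
    Assumes noisy data (a perfect fit would give a zero residual to divide by)."""
    rows = [[1.0, math.cos(2 * math.radians(a)), math.sin(2 * math.radians(a))]
            for a in azimuths_deg]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * v for r, v in zip(rows, velocities)) for i in range(3)]
    coef = solve(ata, atb)
    rss1 = sum((v - sum(c * x for c, x in zip(coef, r))) ** 2
               for r, v in zip(rows, velocities))
    mean = sum(velocities) / len(velocities)
    rss0 = sum((v - mean) ** 2 for v in velocities)
    dof = len(velocities) - 3
    f_stat = max(0.0, ((rss0 - rss1) / 2.0) / (rss1 / dof))
    p_value = (1.0 + 2.0 * f_stat / dof) ** (-dof / 2.0)   # valid for 2 extra terms
    return coef, f_stat, p_value

# Hypothetical phase velocities (km/s) at 12 azimuths with a small alternating error.
azimuths = [15.0 * k for k in range(12)]
noise = [0.005 * (-1) ** k for k in range(12)]
aniso = [4.0 + 0.05 * math.cos(2 * math.radians(a)) + 0.02 * math.sin(2 * math.radians(a)) + e
         for a, e in zip(azimuths, noise)]
coef, f_stat, p_value = anisotropy_f_test(azimuths, aniso)
print(coef, f_stat, p_value)
```

A small p-value indicates that the two azimuthal terms reduce the misfit by more than chance would explain; the fast direction and amplitude then follow from A and B.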