Naïve Bayes classification in R.
Zhang, Zhongheng
2016-06-01
Naïve Bayes classification is a simple probabilistic classification method based on Bayes' theorem with an assumption of independence between features. A model is trained on a training dataset and then used to make predictions with the predict() function. This article introduces two functions, naiveBayes() and train(), for performing Naïve Bayes classification.
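The article itself works in R with naiveBayes(), train(), and predict(); purely as an illustration of the same fit-then-predict workflow, here is a minimal sketch in Python's scikit-learn (the dataset and API choices are mine, not the article's).

```python
# Sketch of the train-then-predict workflow the article describes for R's
# naiveBayes()/predict(), shown here with scikit-learn instead.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = GaussianNB()          # assumes conditionally independent features
model.fit(X_train, y_train)   # analogous to naiveBayes(Species ~ ., data = train)
pred = model.predict(X_test)  # analogous to predict(model, newdata = test)
print((pred == y_test).mean())
```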
NASA Astrophysics Data System (ADS)
Rahmadani, S.; Dongoran, A.; Zarlis, M.; Zakarias
2018-03-01
This paper discusses the problem of feature selection using genetic algorithms (GA) on a dataset for classification problems. The classification models used are the decision tree (DT) and Naive Bayes. We discuss how the Naive Bayes and decision tree models handle the classification problem on the dataset when its features are selected with a GA, and we then compare the two models' performance to see whether accuracy increases. The results show an increase in accuracy when features are selected using the GA. The proposed models are referred to as GADT (GA-Decision Tree) and GANB (GA-Naive Bayes). The datasets tested in this paper are taken from the UCI Machine Learning Repository.
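As a rough illustration of the GANB idea, the following Python sketch wraps a genetic search over binary feature masks around a naive Bayes classifier; the GA operators, settings, and dataset are assumptions for demonstration, not the authors' configuration.

```python
# Minimal GA-based feature selection around Naive Bayes (GANB-style sketch).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]

def fitness(mask):
    # Fitness of a feature subset = cross-validated NB accuracy.
    if not mask.any():
        return 0.0
    return cross_val_score(GaussianNB(), X[:, mask], y, cv=5).mean()

pop = rng.integers(0, 2, size=(20, n_features)).astype(bool)
for generation in range(15):
    scores = np.array([fitness(ind) for ind in pop])
    order = np.argsort(scores)[::-1]
    parents = pop[order[:10]]                     # truncation selection
    children = []
    for _ in range(10):
        a, b = parents[rng.integers(10, size=2)]
        cut = rng.integers(1, n_features)         # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_features) < 0.02      # bit-flip mutation
        children.append(child ^ flip)
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("selected features:", np.flatnonzero(best))
print("CV accuracy:", fitness(best))
```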
DOE Office of Scientific and Technical Information (OSTI.GOV)
De Steven, Diane, Ph.D.; Tone, Maureen, Ph.D.
1997-10-01
This report addresses four project objectives: (1) Gradient model of Carolina bay vegetation on the SRS--The authors use ordination analyses to identify environmental and landscape factors that are correlated with vegetation composition. Significant factors can provide a framework for site-based conservation of existing diversity, and they may also be useful site predictors for potential vegetation in bay restorations. (2) Regional analysis of Carolina bay vegetation diversity--They expand the ordination analyses to assess the degree to which SRS bays encompass the range of vegetation diversity found in the regional landscape of South Carolina's western Upper Coastal Plain. Such comparisons can indicate floristic status relative to regional potentials and identify missing species or community elements that might be re-introduced or restored. (3) Classification of vegetation communities in Upper Coastal Plain bays--They use cluster analysis to identify plant community-types at the regional scale, and explore how this classification may be functional with respect to significant environmental and landscape factors. An environmentally-based classification at the whole-bay level can provide a system of templates for managing bays as individual units and for restoring bays to desired plant communities. (4) Qualitative model for bay vegetation dynamics--They analyze present-day vegetation in relation to historic land uses and disturbances. The distinctive history of SRS bays provides the possibility of assessing pathways of post-disturbance succession. They attempt to develop a coarse-scale model of vegetation shifts in response to changing site factors; such qualitative models can provide a basis for suggesting management interventions that may be needed to maintain desired vegetation in protected or restored bays.
Assawamakin, Anunchai; Prueksaaroon, Supakit; Kulawonganunchai, Supasak; Shaw, Philip James; Varavithya, Vara; Ruangrajitpakorn, Taneth; Tongsima, Sissades
2013-01-01
Identification of suitable biomarkers for accurate prediction of phenotypic outcomes is a goal of personalized medicine. However, current machine learning approaches are either too complex or perform poorly. Here, a novel two-step machine-learning framework is presented to address this need. First, a Naïve Bayes estimator is used to rank features, of which the top-ranked ones most likely contain the most informative features for predicting the underlying biological classes. The top-ranked features are then used in a Hidden Naïve Bayes classifier to construct a classification prediction model from these filtered attributes. To obtain the minimum set of the most informative biomarkers, the bottom-ranked features are successively removed from the Naïve Bayes-filtered feature list one at a time, and the classification accuracy of the Hidden Naïve Bayes classifier is checked for each pruned feature set. The performance of the proposed two-step Bayes classification framework was tested on different types of -omics datasets, including gene expression microarray, single nucleotide polymorphism (SNP) array, and surface-enhanced laser desorption/ionization time-of-flight (SELDI-TOF) proteomic data. The proposed framework matched and, in some cases, outperformed other classification methods in terms of prediction accuracy, minimum number of classification markers, and computational time.
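A rough Python sketch of this rank-then-prune loop, with plain GaussianNB standing in for the Hidden Naive Bayes classifier (which has no scikit-learn implementation) and a single-feature cross-validation score standing in for the paper's ranking estimator:

```python
# Two-step sketch: (1) rank features by univariate NB score,
# (2) prune bottom-ranked features while monitoring accuracy.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = load_breast_cancer(return_X_y=True)

# Step 1: rank each feature by the CV accuracy of a single-feature NB model.
scores = [cross_val_score(GaussianNB(), X[:, [j]], y, cv=5).mean()
          for j in range(X.shape[1])]
ranked = np.argsort(scores)[::-1]          # best feature first

# Step 2: drop the worst-ranked feature one at a time; ">=" prefers the
# smaller subset when accuracy ties, matching the minimum-marker goal.
best_subset, best_acc = ranked, 0.0
for k in range(len(ranked), 0, -1):
    subset = ranked[:k]
    acc = cross_val_score(GaussianNB(), X[:, subset], y, cv=5).mean()
    if acc >= best_acc:
        best_subset, best_acc = subset, acc
print(len(best_subset), "features, CV accuracy", round(best_acc, 3))
```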
Modified Mahalanobis Taguchi System for Imbalance Data Classification
2017-01-01
The Mahalanobis Taguchi System (MTS) is considered one of the most promising binary classification algorithms for handling imbalanced data. Unfortunately, MTS lacks a method for determining an efficient threshold for the binary classification. In this paper, a nonlinear optimization model, named the Modified Mahalanobis Taguchi System (MMTS), is formulated based on minimizing the distance between the MTS Receiver Operating Characteristic (ROC) curve and the theoretical optimal point. To validate the MMTS classification efficacy, it has been benchmarked with Support Vector Machines (SVMs), Naive Bayes (NB), Probabilistic Mahalanobis Taguchi Systems (PTM), the Synthetic Minority Oversampling Technique (SMOTE), Adaptive Conformal Transformation (ACT), Kernel Boundary Alignment (KBA), Hidden Naive Bayes (HNB), and other improved Naive Bayes algorithms. MMTS outperforms the benchmarked algorithms especially when the imbalance ratio is greater than 400. A real-life case study from the manufacturing sector is used to demonstrate the applicability of the proposed model and to compare its performance with a Mahalanobis Genetic Algorithm (MGA).
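In its generic form, the thresholding idea is to pick the operating point on the ROC curve closest to the ideal corner (FPR = 0, TPR = 1). A minimal Python sketch, using an arbitrary scorer rather than the paper's Mahalanobis distances:

```python
# Pick the decision threshold whose ROC point is nearest the (0, 1) corner.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)
scores = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

fpr, tpr, thresholds = roc_curve(y, scores)
dist = np.hypot(fpr - 0.0, tpr - 1.0)       # distance to the (0, 1) corner
best = np.argmin(dist)
print("threshold:", thresholds[best], "FPR:", fpr[best], "TPR:", tpr[best])
```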
A Novel Feature Selection Technique for Text Classification Using Naïve Bayes.
Dey Sarkar, Subhajit; Goswami, Saptarsi; Agarwal, Aman; Aktar, Javed
2014-01-01
With the proliferation of unstructured data, text classification or text categorization has found many applications in topic classification, sentiment analysis, authorship identification, spam detection, and so on. There are many classification algorithms available; naïve Bayes remains one of the oldest and most popular. On one hand, implementation of naïve Bayes is simple; on the other hand, it also requires relatively little training data. However, the literature reports that naïve Bayes performs poorly in text classification compared to other classifiers, which makes it unattractive in spite of the simplicity and intuitiveness of the model. In this paper, we propose a two-step feature selection method that first applies univariate feature selection to reduce the search space and then clusters the surviving features to select relatively independent feature sets. We demonstrate the effectiveness of our method by a thorough evaluation and comparison over 13 datasets. The performance improvement thus achieved makes naïve Bayes comparable or superior to other classifiers. The proposed algorithm is shown to outperform other traditional methods such as greedy-search-based wrappers or CFS.
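A compact Python sketch of the filter-then-cluster pattern; the concrete scorer, clusterer, and synthetic data are illustrative assumptions, not the paper's choices:

```python
# Step 1: univariate filter; Step 2: cluster surviving features by
# correlation and keep one representative per cluster.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=500, n_features=100, n_informative=10,
                           n_redundant=40, random_state=0)

# Step 1: keep the 30 highest-scoring features.
filt = SelectKBest(f_classif, k=30).fit(X, y)
Xf = filt.transform(X)

# Step 2: cluster survivors on correlation distance, one feature per cluster.
# (metric="precomputed" requires scikit-learn >= 1.2; older versions use affinity=)
dist = 1 - np.abs(np.corrcoef(Xf, rowvar=False))
labels = AgglomerativeClustering(n_clusters=10, metric="precomputed",
                                 linkage="average").fit_predict(dist)
keep = [int(np.flatnonzero(labels == c)[0]) for c in range(10)]
print(cross_val_score(GaussianNB(), Xf[:, keep], y, cv=5).mean())
```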
Hierarchical Naive Bayes for genetic association studies.
Malovini, Alberto; Barbarini, Nicola; Bellazzi, Riccardo; de Michelis, Francesca
2012-01-01
Genome Wide Association Studies represent powerful approaches that aim at disentangling the genetic and molecular mechanisms underlying complex traits. The usual "one-SNP-at-a-time" testing strategy cannot capture the multi-factorial nature of these disorders. We propose a Hierarchical Naïve Bayes classification model that accounts for associations in SNP data characterized by Linkage Disequilibrium (LD). Validation shows that our model reaches classification performances superior to those obtained by the standard Naïve Bayes classifier on simulated and real datasets. In the Hierarchical Naïve Bayes implemented, the SNPs mapping to the same region of Linkage Disequilibrium are considered as "details" or "replicates" of the locus, each contributing to the overall effect of the region on the phenotype. A latent variable for each block, which models the "population" of correlated SNPs, can then be used to summarize the available information. The classification is thus performed relying on the latent variables' conditional probability distributions and on the available SNP data. The developed methodology has been tested on simulated datasets, each composed of 300 cases, 300 controls, and a variable number of SNPs. Our approach has also been applied to two real datasets on the genetic bases of Type 1 Diabetes and Type 2 Diabetes generated by the Wellcome Trust Case Control Consortium. The approach proposed in this paper, called Hierarchical Naïve Bayes, allows classification of examples for which genetic information on structurally correlated SNPs is available. It improves on Naïve Bayes performance by properly handling the within-loci variability.
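One plausible formalization of this block-latent-variable structure (my reading of the abstract, not the authors' exact notation), assuming SNPs are conditionally independent given their block's latent variable z_b:

```latex
% Posterior over phenotype class C given SNP vector x, with one latent
% variable z_b per LD block b and SNPs s within each block:
P(C \mid \mathbf{x}) \;\propto\; P(C)\prod_{b=1}^{B}
  \sum_{z_b} P(z_b \mid C) \prod_{s \in b} P(x_s \mid z_b)
```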
Bayesian learning for spatial filtering in an EEG-based brain-computer interface.
Zhang, Haihong; Yang, Huijuan; Guan, Cuntai
2013-07-01
Spatial filtering for EEG feature extraction and classification is an important tool in brain-computer interfaces. However, there is generally no established theory that links spatial filtering directly to Bayes classification error. To address this issue, this paper proposes and studies a Bayesian analysis theory for spatial filtering in relation to Bayes error. Following the maximum entropy principle, we introduce a gamma probability model for describing single-trial EEG power features. We then formulate and analyze the theoretical relationship between the Bayes classification error and the so-called Rayleigh quotient, which is a function of the spatial filters and essentially measures the ratio of power features between two classes. This paper also reports our extensive study that examines the theory and its use in classification, using three publicly available EEG data sets, state-of-the-art spatial filtering techniques, and various classifiers. Specifically, we validate the positive relationship between Bayes error and Rayleigh quotient in real EEG power features. Finally, we demonstrate that the Bayes error can be practically reduced by applying a new spatial filter with a lower Rayleigh quotient.
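For reference, the standard form of the Rayleigh quotient for a spatial filter w, where Sigma_1 and Sigma_2 are the two classes' spatial covariance matrices (the usual CSP-style convention; the paper's exact notation may differ):

```latex
% Ratio of filtered band power between the two classes:
J(\mathbf{w}) \;=\; \frac{\mathbf{w}^{\top}\Sigma_1\,\mathbf{w}}
                         {\mathbf{w}^{\top}\Sigma_2\,\mathbf{w}}
```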
A Dirichlet-Multinomial Bayes Classifier for Disease Diagnosis with Microbial Compositions.
Gao, Xiang; Lin, Huaiying; Dong, Qunfeng
2017-01-01
Dysbiosis of microbial communities is associated with various human diseases, raising the possibility of using microbial compositions as biomarkers for disease diagnosis. We have developed a Bayes classifier by modeling microbial compositions with Dirichlet-multinomial distributions, which are widely used to model multicategorical count data with extra variation. The parameters of the Dirichlet-multinomial distributions are estimated from training microbiome data sets based on maximum likelihood. The posterior probability of a microbiome sample belonging to a disease or healthy category is calculated based on Bayes' theorem, using the likelihood values computed from the estimated Dirichlet-multinomial distribution, as well as a prior probability estimated from the training microbiome data set or previously published information on disease prevalence. When tested on real-world microbiome data sets, our method, called DMBC (for Dirichlet-multinomial Bayes classifier), shows better classification accuracy than the only existing Bayesian microbiome classifier based on a Dirichlet-multinomial mixture model and the popular random forest method. The advantage of DMBC is its built-in automatic feature selection, capable of identifying a subset of microbial taxa with the best classification accuracy between different classes of samples based on cross-validation. This unique ability enables DMBC to maintain and even improve its accuracy at modeling species-level taxa. The R package for DMBC is freely available at https://github.com/qunfengdong/DMBC. IMPORTANCE By incorporating prior information on disease prevalence, Bayes classifiers have the potential to estimate disease probability better than other common machine-learning methods. Thus, it is important to develop Bayes classifiers specifically tailored for microbiome data. Our method shows higher classification accuracy than the only existing Bayesian classifier and the popular random forest method, and thus provides an alternative option for using microbial compositions for disease diagnosis.
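In the spirit of DMBC, here is a toy Python sketch of Dirichlet-multinomial classification by posterior probability; the parameter fit below is a crude moment-style heuristic rather than the maximum-likelihood estimation the paper uses, so treat it purely as an illustration (the real implementation is the authors' R package linked above).

```python
# Fit one Dirichlet-multinomial per class; classify by Bayes posterior.
import numpy as np
from scipy.special import gammaln

def dm_loglik(x, alpha):
    # log P(x | alpha) for a Dirichlet-multinomial, dropping the multinomial
    # coefficient (identical across classes for a fixed sample x).
    A, N = alpha.sum(), x.sum()
    return (gammaln(A) - gammaln(N + A)
            + np.sum(gammaln(x + alpha) - gammaln(alpha)))

def fit_class(counts, concentration=50.0):
    # Heuristic: mean composition scaled by an assumed concentration.
    p = counts.sum(axis=0) / counts.sum()
    return concentration * (p + 1e-6)

rng = np.random.default_rng(0)
healthy = rng.multinomial(1000, [0.5, 0.3, 0.1, 0.1], size=40)
disease = rng.multinomial(1000, [0.2, 0.2, 0.3, 0.3], size=40)

alphas = {"healthy": fit_class(healthy), "disease": fit_class(disease)}
prior = {"healthy": 0.5, "disease": 0.5}   # could come from disease prevalence

sample = rng.multinomial(1000, [0.25, 0.2, 0.25, 0.3])
post = {c: np.log(prior[c]) + dm_loglik(sample, a) for c, a in alphas.items()}
print(max(post, key=post.get), post)
```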
Valent, Francesca; Clagnan, Elena; Zanier, Loris
2014-01-01
To assess whether Naïve Bayes classification could be used to classify injury causes from the Emergency Room (ER) database: in the Friuli Venezia Giulia Region (Northern Italy), the electronic ER data have never been used to study the epidemiology of injuries, because the proportion of generic "accidental" causes is much higher than that of injuries with a specific cause. The Naïve Bayes classification method was applied to the regional ER database. Sensitivity, specificity, positive and negative predictive values, agreement, and the kappa statistic were calculated for the training dataset, and the distribution of causes of injury for the test dataset. On 22,248 records with known cause, the classifications assigned by the model agreed moderately (kappa = 0.53) with those assigned by ER personnel. The model was then used on 76,660 unclassified cases. Although the sensitivity and positive predictive value of the method were generally poor, mainly due to limitations in the ER data, it allowed the frequency of specific injury causes in the Region to be estimated for the first time. The model was useful for providing the "big picture" of non-fatal injuries in the Region. To improve the collection of injury data at the ER, the options available for injury classification in the ER software are being revised to make categories exhaustive and mutually exclusive.
NASA Astrophysics Data System (ADS)
Berlin, Cynthia Jane
1998-12-01
This research addresses the identification of the areal extent of the intertidal wetlands of Willapa Bay, Washington, and the evaluation of the potential for exotic Spartina alterniflora (smooth cordgrass) expansion in the bay using a spatial geographic approach. It is hoped that the results will address not only the management needs of the study area but provide a research design that may be applied to studies of other coastal wetlands. Four satellite images, three Landsat Multi-Spectral (MSS) and one Thematic Mapper (TM), are used to derive a map showing areas of water, low, middle and high intertidal, and upland. Two multi-date remote sensing mapping techniques are assessed: a supervised classification using density-slicing and an unsupervised classification using an ISODATA algorithm. Statistical comparisons are made between the resultant derived maps and the U.S.G.S. topographic maps for the Willapa Bay area. The potential for Spartina expansion in the bay is assessed using a sigmoidal (logistic) growth model and a spatial modelling procedure for four possible growth scenarios: without management controls (Business-as-Usual), with moderate management controls (e.g. harvesting to eliminate seed setting), under a hypothetical increase in the growth rate that may reflect favorable environmental changes, and under a hypothetical decrease in the growth rate that may reflect aggressive management controls. Comparisons for the statistics of the two mapping techniques suggest that although the unsupervised classification method performed satisfactorily, the supervised classification (density-slicing) method provided more satisfactory results. Results from the modelling of potential Spartina expansion suggest that Spartina expansion will proceed rapidly for the Business-as-Usual and hypothetical increase in the growth rate scenario, and at a slower rate for the elimination of seed setting and hypothetical decrease in the growth rate scenarios, until all potential habitat is filled.
In silico prediction of drug-induced myelotoxicity by using Naïve Bayes method.
Zhang, Hui; Yu, Peng; Zhang, Teng-Guo; Kang, Yan-Li; Zhao, Xiao; Li, Yuan-Yuan; He, Jia-Hui; Zhang, Ji
2015-11-01
Drug-induced myelotoxicity usually decreases the production of platelets, red cells, and white cells, so early identification and characterization of myelotoxicity hazards in drug development is essential. The purpose of this investigation was to develop a prediction model of drug-induced myelotoxicity by using a Naïve Bayes classifier. For comparison, other prediction models based on support vector machine and single-hidden-layer feed-forward neural network methods were also established. Among all the prediction models, the Naïve Bayes classification model showed the best prediction performance, which offered an average overall prediction accuracy of [Formula: see text] for the training set and [Formula: see text] for the external test set. The significant contribution of this study is that we developed the first Naïve Bayes classification model of the drug-induced myelotoxicity adverse effect using a larger-scale dataset, which could be employed for the prediction of drug-induced myelotoxicity. In addition, several important molecular descriptors and substructures of myelotoxic compounds have been identified, which should be taken into consideration in the design of new candidate compounds to produce safer and more effective drugs, ultimately reducing the attrition rate in later stages of drug development.
Factors Affecting the Item Parameter Estimation and Classification Accuracy of the DINA Model
ERIC Educational Resources Information Center
de la Torre, Jimmy; Hong, Yuan; Deng, Weiling
2010-01-01
To better understand the statistical properties of the deterministic inputs, noisy "and" gate cognitive diagnosis (DINA) model, the impact of several factors on the quality of the item parameter estimates and classification accuracy was investigated. Results of the simulation study indicate that the fully Bayes approach is most accurate when the…
Machine learning approach to automatic exudate detection in retinal images from diabetic patients
NASA Astrophysics Data System (ADS)
Sopharak, Akara; Dailey, Matthew N.; Uyyanonvara, Bunyarit; Barman, Sarah; Williamson, Tom; Thet Nwe, Khine; Aye Moe, Yin
2010-01-01
Exudates are among the preliminary signs of diabetic retinopathy, a major cause of vision loss in diabetic patients. Early detection of exudates could improve patients' chances of avoiding blindness. In this paper, we present a series of experiments on feature selection and exudate classification using naive Bayes and support vector machine (SVM) classifiers. We first fit the naive Bayes model to a training set consisting of 15 features extracted from each of 115,867 positive examples of exudate pixels and an equal number of negative examples. We then perform feature selection on the naive Bayes model, repeatedly removing features from the classifier, one by one, until classification performance stops improving. To find the best SVM, we begin with the best feature set from the naive Bayes classifier, and repeatedly add the previously-removed features to the classifier. For each combination of features, we perform a grid search to determine the best combination of hyperparameters ν (tolerance for training errors) and γ (radial basis function width). We compare the best naive Bayes and SVM classifiers to a baseline nearest neighbour (NN) classifier using the best feature sets from both classifiers. We find that the naive Bayes and SVM classifiers perform better than the NN classifier. The overall best sensitivity, specificity, precision, and accuracy are 92.28%, 98.52%, 53.05%, and 98.41%, respectively.
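The (ν, γ) grid search maps directly onto scikit-learn's NuSVC; a minimal Python sketch with illustrative grid values (not those of the paper):

```python
# Grid search over nu (training-error tolerance) and gamma (RBF width).
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import NuSVC

X, y = make_classification(n_samples=1000, n_features=15, random_state=0)
grid = GridSearchCV(
    NuSVC(kernel="rbf"),
    param_grid={"nu": [0.1, 0.3, 0.5], "gamma": [0.001, 0.01, 0.1, 1.0]},
    cv=5, scoring="accuracy",
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```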
Study on Bayes discriminant analysis of EEG data.
Shi, Yuan; He, DanDan; Qin, Fang
2014-01-01
In this paper, we applied Bayes discriminant analysis to objectively recorded EEG data from experimental subjects to arrive at a relatively accurate method for feature extraction and classification decisions. According to the strength of the α wave, the head electrodes were divided into four classes. Using part of the 21-electrode EEG data from 63 subjects, we performed Bayes discriminant analysis on the EEG data of six subjects. On part of the EEG data of the 63 subjects, Bayes discriminant analysis achieved an electrode classification accuracy of 64.4%. Bayes discriminant analysis gives higher prediction accuracy and extracts EEG features (mainly the α wave) more accurately, and is therefore well suited to feature extraction and classification decisions for EEG data.
NASA Astrophysics Data System (ADS)
Liu, Pudong; Zhou, Jiayuan; Shi, Runhe; Zhang, Chao; Liu, Chaoshun; Sun, Zhibin; Gao, Wei
2016-09-01
The aim of this work was to compare the Bayes method and a BP neural network for identifying coastal wetland plants from hyperspectral data, in order to optimize the classification method. For this purpose, we chose two dominant plants (invasive S. alterniflora and native P. australis) in the Yangtze Estuary; the leaf spectral reflectance of P. australis and S. alterniflora was measured with an ASD field spectrometer. We tested the Bayes method and the BP neural network for the identification of these two species. Results showed that three bands (555 nm, 711 nm, and 920 nm) could be identified as the sensitive bands to serve as input parameters for the two methods. The Bayes method and the BP neural network prediction model both performed well (88.57% accuracy for the Bayes prediction, about 80% for the BP neural network), but the Bayes method gave higher accuracy and stability.
NASA Technical Reports Server (NTRS)
Williamson, F. S. L.
1974-01-01
The use of remote sensors to determine the characteristics of the wetlands of the Chesapeake Bay and surrounding areas is discussed. The objectives of the program are stated as follows: (1) to use data and remote sensing techniques developed from studies of Rhode River, West River, and South River salt marshes to develop a wetland classification scheme useful in other regions of the Chesapeake Bay and to evaluate the classification system with respect to vegetation types, marsh physiography, man-induced perturbation, and salinity; and (2) to develop a program using remote sensing techniques, for the extension of the classification to Chesapeake Bay salt marshes and to coordinate this program with the goals of the Chesapeake Research Consortium and the states of Maryland and Virginia. Maps of the Chesapeake Bay areas are developed from aerial photographs to display the wetland structure and vegetation.
NASA Technical Reports Server (NTRS)
Mobasseri, B. G.; Mcgillem, C. D.; Anuta, P. E. (Principal Investigator)
1978-01-01
The author has identified the following significant results. The probability of correct classification of various populations in the data was defined as the primary performance index. Because the multispectral data are also multiclass in nature, a Bayes error estimation procedure that depends on a set of class statistics alone was required. The classification error was expressed in terms of an N-dimensional integral, where N is the dimensionality of the feature space. The multispectral scanner spatial model was represented by a linear shift-invariant multiple-port system in which the N spectral bands comprise the input processes. The scanner characteristic function, the relationship governing the transformation of the input spatial (and hence spectral) correlation matrices through the system, was developed.
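The classical multiclass expression of this quantity (the standard textbook form; the report's exact notation may differ) writes the Bayes error as an N-dimensional integral over the feature space:

```latex
% Bayes error for classes omega_i with priors P(omega_i) and class-
% conditional densities p(x | omega_i):
P_{e} \;=\; 1 - \int_{\mathbb{R}^{N}} \max_{i}\,
            \bigl[\,P(\omega_i)\,p(\mathbf{x}\mid\omega_i)\,\bigr]\,d\mathbf{x}
```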
Improved Fuzzy K-Nearest Neighbor Using Modified Particle Swarm Optimization
NASA Astrophysics Data System (ADS)
Jamaluddin; Siringoringo, Rimbun
2017-12-01
Fuzzy k-Nearest Neighbor (FkNN) is one of the most powerful classification methods. The presence of fuzzy concepts in this method successfully improves its performance on almost all classification problems. The main drawback of FkNN is that its parameters, the number of neighbors (k) and the fuzzy strength (m), are difficult to determine. Both parameters are very sensitive, and no theory or guideline exists for deducing proper values of 'm' and 'k', which makes FkNN difficult to control. This study uses Modified Particle Swarm Optimization (MPSO) to determine the best values of 'k' and 'm'. MPSO is based on the constriction factor method, an improvement of PSO designed to avoid local optima. The model proposed in this study was tested on the German Credit Dataset from the UCI Machine Learning Repository, which is widely used for classification problems. Applying MPSO to the determination of the FkNN parameters is expected to increase classification performance. The experiments indicate that the proposed model yields better classification performance than the plain FkNN model: it achieves an accuracy of 81%, compared with 70% for FkNN. Finally, the proposed model is compared with two other classification models, Naive Bayes and Decision Tree; it again performs better, with Naive Bayes achieving 75% accuracy and the decision tree 70%.
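For reference, the constriction factor method in its standard (Clerc) form, where c_1, c_2 are the acceleration coefficients, r_1, r_2 are uniform random numbers, and phi = c_1 + c_2 > 4:

```latex
% Constriction-factor PSO velocity update:
\chi = \frac{2}{\bigl|\,2-\varphi-\sqrt{\varphi^{2}-4\varphi}\,\bigr|}, \qquad
v \leftarrow \chi\bigl[v + c_1 r_1 (p_{\mathrm{best}} - x)
                         + c_2 r_2 (g_{\mathrm{best}} - x)\bigr]
```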
Theory and analysis of statistical discriminant techniques as applied to remote sensing data
NASA Technical Reports Server (NTRS)
Odell, P. L.
1973-01-01
Classification of remote earth resources sensing data according to normed exponential density statistics is reported. The use of density models appropriate for several physical situations provides an exact solution for the probabilities of classifications associated with the Bayes discriminant procedure even when the covariance matrices are unequal.
NASA Astrophysics Data System (ADS)
Sakuma, Jun; Wright, Rebecca N.
Privacy-preserving classification is the task of learning or training a classifier on the union of privately distributed datasets without sharing the datasets. The emphasis of existing studies in privacy-preserving classification has primarily been on designing privacy-preserving versions of particular data mining algorithms. However, in classification problems, preprocessing and postprocessing, such as model selection or attribute selection, play a prominent role in achieving higher classification accuracy. In this paper, we show that the generalization error of classifiers in privacy-preserving classification can be securely evaluated without sharing prediction results. Our main technical contribution is a new generalized Hamming distance protocol that is universally applicable to preprocessing and postprocessing in various privacy-preserving classification problems, such as model selection in support vector machines and attribute selection in naive Bayes classification.
Cannon, Edward O; Amini, Ata; Bender, Andreas; Sternberg, Michael J E; Muggleton, Stephen H; Glen, Robert C; Mitchell, John B O
2007-05-01
We investigate the classification performance of circular fingerprints in combination with the Naive Bayes Classifier (MP2D), Inductive Logic Programming (ILP) and Support Vector Inductive Logic Programming (SVILP) on a standard molecular benchmark dataset comprising 11 activity classes and about 102,000 structures. The Naive Bayes Classifier treats features independently while ILP combines structural fragments, and then creates new features with higher predictive power. SVILP is a very recently presented method which adds a support vector machine after common ILP procedures. The performance of the methods is evaluated via a number of statistical measures, namely recall, specificity, precision, F-measure, Matthews Correlation Coefficient, area under the Receiver Operating Characteristic (ROC) curve and enrichment factor (EF). According to the F-measure, which takes both recall and precision into account, SVILP is for seven out of the 11 classes the superior method. The results show that the Bayes Classifier gives the best recall performance for eight of the 11 targets, but has a much lower precision, specificity and F-measure. The SVILP model on the other hand has the highest recall for only three of the 11 classes, but generally far superior specificity and precision. To evaluate the statistical significance of the SVILP superiority, we employ McNemar's test which shows that SVILP performs significantly (p < 5%) better than both other methods for six out of 11 activity classes, while being superior with less significance for three of the remaining classes. While previously the Bayes Classifier was shown to perform very well in molecular classification studies, these results suggest that SVILP is able to extract additional knowledge from the data, thus improving classification results further.
Improving Hospital-Wide Early Resource Allocation through Machine Learning.
Gartner, Daniel; Padman, Rema
2015-01-01
The objective of this paper is to evaluate the extent to which early determination of diagnosis-related groups (DRGs) can be used for better allocation of scarce hospital resources. When elective patients seek admission, the true DRG, currently determined only at discharge, is unknown. We approach the problem of early DRG determination in three stages: (1) test how much a Naïve Bayes classifier can improve classification accuracy as compared to a hospital's current approach; (2) develop a statistical program that makes admission and scheduling decisions based on the patients' clinical pathways and scarce hospital resources; and (3) feed the DRG as classified by the Naïve Bayes classifier and the hospital's baseline approach into the model (which we evaluate in simulation). Our results reveal that the DRG grouper performs poorly in classifying the DRG correctly before admission, while the Naïve Bayes approach substantially improves the classification task. The results from connecting the classification method with the mathematical program also reveal that resource allocation decisions can be more effective and efficient with the hybrid approach.
Relevance popularity: A term event model based feature selection scheme for text classification.
Feng, Guozhong; An, Baiguo; Yang, Fengqin; Wang, Han; Zhang, Libiao
2017-01-01
Feature selection is a practical approach for improving the performance of text classification methods by optimizing the feature subsets input to classifiers. In traditional feature selection methods such as information gain and chi-square, the number of documents that contain a particular term (i.e., the document frequency) is often used. However, the frequency with which a given term appears in each document has not been fully investigated, even though it is a promising feature for producing accurate classifications. In this paper, we propose a new feature selection scheme based on a term-event multinomial naive Bayes probabilistic model. According to the model assumptions, the matching score function, which is based on the prediction probability ratio, can be factorized. Finally, we derive a feature selection measurement for each term after replacing inner parameters with their estimators. On a benchmark English text dataset (20 Newsgroups) and a Chinese text dataset (MPH-20), our numerical experiment results obtained using two widely used text classifiers (naive Bayes and support vector machine) demonstrate that our method outperformed the representative feature selection methods.
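The general flavor of a prediction-probability-ratio term score under a multinomial naive Bayes model can be sketched in a few lines of Python (a simplified illustration, not the paper's exact factorized measurement):

```python
# Rank terms by the gap between their smoothed class-conditional
# probabilities under a multinomial NB model.
import numpy as np

def term_scores(counts, labels):
    """counts: (n_docs, n_terms) term-frequency matrix; labels: 0/1 array."""
    s0 = counts[labels == 0].sum(axis=0) + 1.0   # Laplace smoothing
    s1 = counts[labels == 1].sum(axis=0) + 1.0
    p0, p1 = s0 / s0.sum(), s1 / s1.sum()        # P(term | class)
    return np.abs(np.log(p1) - np.log(p0))       # log probability ratio

rng = np.random.default_rng(0)
counts = rng.poisson(1.0, size=(200, 500))
labels = rng.integers(0, 2, size=200)
counts[labels == 1, :10] += 3                    # make 10 terms informative
top = np.argsort(term_scores(counts, labels))[::-1][:10]
print("top-ranked terms:", sorted(top))
```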
Mapping South San Francisco Bay's seabed diversity for use in wetland restoration planning
Fregoso, Theresa A.; Jaffe, B.; Rathwell, G.; Collins, W.; Rhynas, K.; Tomlin, V.; Sullivan, S.
2006-01-01
Data for an acoustic seabed classification were collected as part of a California Coastal Conservancy funded bathymetric survey of South Bay in early 2005. A QTC VIEW seabed classification system recorded echoes from a single-beam 50 kHz echosounder. Approximately 450,000 seabed classification records were generated from an area of about 30 sq. miles. Ten distinct acoustic classes were identified through an unsupervised classification system using principal component and cluster analyses. One hundred sixty-one grab samples and forty-five benthic community composition samples, collected in the study area shortly before and after the seabed classification survey, further refined the ten classes into groups based on grain size. A preliminary map of the surficial grain size of South Bay was developed from the combination of the seabed classification and the grab and benthic samples. The initial seabed classification map, the grain size map, and the locations of sediment samples will be displayed along with the methods of acoustic seabed classification.
Bayes classification of interferometric TOPSAR data
NASA Technical Reports Server (NTRS)
Michel, T. R.; Rodriguez, E.; Houshmand, B.; Carande, R.
1995-01-01
We report the Bayes classification of terrain types at different sites using airborne interferometric synthetic aperture radar (INSAR) data. A Gaussian maximum likelihood classifier was applied to multidimensional observations derived from the SAR intensity, the terrain elevation model, and the magnitude of the interferometric correlation. Training sets for forested, urban, agricultural, or bare areas were obtained either by selecting samples with known ground truth, or by k-means clustering of random sets of samples uniformly distributed across all sites and subsequent assignment of these clusters using ground truth. The accuracy of the classifier was used to optimize the discriminating efficiency of the chosen set of features. The most important features include the SAR intensity, a canopy penetration depth model, and the terrain slope. We demonstrate the classifier's performance across sites using a unique set of training classes for the four main terrain categories. The scenes examined include San Francisco (CA) (predominantly urban and water), Mount Adams (WA) (forested with clear cuts), Pasadena (CA) (urban with mountains), and Antioch Hills (CA) (water, swamps, fields). Issues related to the effects of image calibration and the robustness of the classification to calibration errors are explored. The relative performance of single-polarization interferometric data classification is contrasted against classification schemes based on polarimetric SAR data.
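A minimal Python sketch of a Gaussian maximum likelihood classifier of this kind, with synthetic two-class data standing in for the SAR-derived features (equal priors assumed; the real feature vectors would hold intensity, elevation, correlation, and so on):

```python
# Per-class Gaussian fit; assign each observation to the class with the
# largest log-likelihood (quadratic discriminant, equal priors).
import numpy as np

def fit(X_by_class):
    params = []
    for X in X_by_class:
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        params.append((mu, np.linalg.inv(cov), np.linalg.slogdet(cov)[1]))
    return params

def classify(x, params):
    scores = [-0.5 * (logdet + (x - mu) @ icov @ (x - mu))
              for mu, icov, logdet in params]      # equal priors assumed
    return int(np.argmax(scores))

rng = np.random.default_rng(0)
water = rng.normal([0.1, 2.0, 0.9], 0.1, size=(200, 3))
forest = rng.normal([0.6, 40.0, 0.5], [0.1, 5.0, 0.1], size=(200, 3))
params = fit([water, forest])
print(classify(np.array([0.55, 38.0, 0.45]), params))  # -> 1 (forest)
```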
Zhang, Zhipeng; Zhou, Jian; Song, Jingjing; Wang, Qixiang; Liu, Hongjun; Tang, Xuexi
2017-09-15
A habitat suitability index (HSI) model for the sea cucumber Apostichopus japonicus (Selenka) was established in the present study. Based on geographic information systems, the HSI model was used to identify potential sites around the Shandong Peninsula suitable for restoration of immature (<25 g) and mature (>25 g) A. japonicus. Six habitat factors were used as input variables for the HSI model: sediment classification, water temperature, salinity, water depth, pH, and dissolved oxygen. The weighting of each habitat factor was defined through the Delphi method. Sediment classification was the most important condition affecting the HSI of A. japonicus across the study areas, while water temperature was the most important condition across seasons. The HSI of Western Laizhou Bay was relatively low, meaning the site was not suitable for aquaculture-based restoration of A. japonicus. In contrast, Xiaoheishan Island, Rongcheng Bay, and Qingdao were preferable sites, suitable as habitats for restoration efforts.
Classifying emotion in Twitter using Bayesian network
NASA Astrophysics Data System (ADS)
Surya Asriadie, Muhammad; Syahrul Mubarok, Mohamad; Adiwijaya
2018-03-01
Language is used to express not only facts but also emotions. Emotions are noticeable in behavior and even in the social media statuses a person writes. Emotion analysis of text is performed on a variety of media, such as Twitter. This paper studies classification of emotions on Twitter using Bayesian networks because of their ability to model uncertainty and relationships between features. The result is two models based on Bayesian networks: the Full Bayesian Network (FBN) and the Bayesian Network with Mood Indicator (BNM). FBN is a massive Bayesian network in which each word is treated as a node. The study shows that the method used to train FBN is not very effective at creating the best model, and FBN performs worse than Naive Bayes: the F1-score for FBN is 53.71%, versus 54.07% for Naive Bayes. BNM is proposed as an alternative method based on an improvement of Multinomial Naive Bayes, with much lower computational complexity than FBN. Although it is not better than FBN, the resulting model successfully improves the performance of Multinomial Naive Bayes: the F1-score for the Multinomial Naive Bayes model is 51.49%, versus 52.14% for BNM.
Comparing K-mer based methods for improved classification of 16S sequences.
Vinje, Hilde; Liland, Kristian Hovde; Almøy, Trygve; Snipen, Lars
2015-07-01
The need for precise and stable taxonomic classification is highly relevant in modern microbiology. Parallel to the explosion in the amount of accessible sequence data, there has also been a shift in focus for classification methods. Previously, alignment-based methods were the most applicable tools; now, methods based on counting K-mers by sliding windows are the most interesting classification approach with respect to both speed and accuracy. Here, we present a systematic comparison of five different K-mer based classification methods for the 16S rRNA gene. The methods differ from each other in both data usage and modelling strategies. We based our study on the commonly known and well-used naïve Bayes classifier from the RDP project, and four other methods were implemented and tested on two different data sets, on full-length sequences as well as fragments of typical read length. The differences in classification error obtained by the methods seemed small, but they were stable across both data sets tested. The Preprocessed nearest-neighbour (PLSNN) method performed best for full-length 16S rRNA sequences, significantly better than the naïve Bayes RDP method. On fragmented sequences the naïve Bayes Multinomial method performed best, significantly better than all other methods. For both data sets explored, and on both full-length and fragmented sequences, all five methods reached an error plateau. We conclude that no K-mer based method is universally best for classifying both full-length sequences and fragments (reads). All methods approach an error plateau, indicating that improved training data are needed to improve classification from here. Classification errors occur most frequently for genera with few sequences present. For improving the taxonomy and testing new classification methods, a better, more universal, and more robust training data set is crucial.
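The shared K-mer representation behind these methods is simple to sketch; the Python toy below counts 4-mers by a sliding window and feeds them to a multinomial naive Bayes classifier (the sequences are synthetic stand-ins for 16S reads, and K = 4 is an assumption):

```python
# Sliding-window K-mer counts as features for multinomial naive Bayes.
from itertools import product
import numpy as np
from sklearn.naive_bayes import MultinomialNB

K = 4
KMERS = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=K))}

def kmer_counts(seq):
    v = np.zeros(len(KMERS))
    for i in range(len(seq) - K + 1):
        v[KMERS[seq[i:i + K]]] += 1          # sliding window of width K
    return v

rng = np.random.default_rng(0)
def fake_seq(bias):                          # toy stand-in for 16S reads
    return "".join(rng.choice(list("ACGT"), p=bias, size=300))

X = [kmer_counts(fake_seq([0.4, 0.1, 0.1, 0.4])) for _ in range(50)] + \
    [kmer_counts(fake_seq([0.1, 0.4, 0.4, 0.1])) for _ in range(50)]
y = [0] * 50 + [1] * 50
clf = MultinomialNB().fit(X, y)
print(clf.score(X, y))
```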
Zmiri, Dror; Shahar, Yuval; Taieb-Maimon, Meirav
2012-04-01
To test the feasibility of classifying emergency department patients into severity grades using data mining methods, emergency department records of 402 patients were classified into five severity grades by two expert physicians. The Naïve Bayes and C4.5 algorithms were applied to produce classifiers from patient data into severity grades. The classifiers' results over several subsets of the data were compared with the physicians' assessments, with a random classifier, and with a classifier that selects the maximal-prevalence class. The measures were positive predictive value, multiple-class extensions of sensitivity and specificity combinations, and entropy change. The mean accuracy of the data mining classifiers was 52.94 ± 5.89%, significantly better (P < 0.05) than the mean accuracy of a random classifier (34.60 ± 2.40%). The entropy of the input data sets was reduced through classification by a mean of 10.1%. Allowing for classification deviations of one severity grade led to a mean accuracy of 85.42 ± 1.42%; the classifiers' accuracy in that case was similar to the physicians' consensus rate. Learning from consensus records led to better performance. Reducing the number of severity grades improved results in certain cases. The performance of the Naïve Bayes and C4.5 algorithms was similar; on unbalanced data sets, Naïve Bayes performed better. It is possible to produce a computerized classification model for the severity grade of triage patients using data mining methods. Learning from patient records on which several physicians reached a consensus is preferable to learning from each physician's patients. Either Naïve Bayes or C4.5 can be used; Naïve Bayes is preferable for unbalanced data sets. An ambiguity in the intermediate severity grades seems to hamper both the physicians' agreement and the classifiers' accuracy.
Classification of Indonesian quote on Twitter using Naïve Bayes
NASA Astrophysics Data System (ADS)
Rachmadany, A.; Pranoto, Y. M.; Gunawan; Multazam, M. T.; Nandiyanto, A. B. D.; Abdullah, A. G.; Widiaty, I.
2018-01-01
A quote consists of sentences written in the hope that readers will develop strong personalities and continually improve themselves to move forward and achieve success. Social media is a place where people express their feelings to the world, and sometimes those expressions are quotes. The purpose of this study was to classify Indonesian quotes on Twitter using Naïve Bayes. The experiment applies text classification to Twitter data written by Twitter users: tweets that are quotes are grouped into six categories (Love, Life, Motivation, Education, Religion, Others). The language used is Indonesian, and the method is Naive Bayes. The result of this experiment is a web application containing a collection of Indonesian quotes that have been classified. The classification makes it easy for users to find quotes by class or keyword; for example, when a user wants to find a 'motivation' quote, this classification is very useful.
Monti, S.; Cooper, G. F.
1998-01-01
We present a new Bayesian classifier for computer-aided diagnosis. The new classifier builds upon the naive-Bayes classifier, and models the dependencies among patient findings in an attempt to improve its performance, both in terms of classification accuracy and in terms of calibration of the estimated probabilities. This work finds motivation in the argument that highly calibrated probabilities are necessary for the clinician to be able to rely on the model's recommendations. Experimental results are presented, supporting the conclusion that modeling the dependencies among findings improves calibration.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pon, R K; Cardenas, A F; Buttler, D J
The definition of what makes an article interesting varies from user to user and continually evolves even for a single user. As a result, for news recommendation systems, useless document features cannot be determined a priori, and all features are usually considered for interestingness classification. Consequently, the presence of currently useless features degrades classification performance [1], particularly over the initial set of news articles being classified. The initial set of documents is critical for a user when considering which particular news recommendation system to adopt. To address these problems, we introduce an improved version of the naive Bayes classifier with online feature selection. We use correlation to determine the utility of each feature and take advantage of the conditional independence assumption used by naive Bayes for online feature selection and classification. The augmented naive Bayes classifier performs 28% better than the traditional naive Bayes classifier in recommending news articles from the Yahoo! RSS feeds.
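A toy Python sketch of correlation-gated naive Bayes in an online setting; the scorer, threshold, and synthetic data are illustrative assumptions, not the authors' design:

```python
# Gate NB features by their correlation with the label, re-evaluated
# as new documents arrive.
import numpy as np
from sklearn.naive_bayes import BernoulliNB

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(300, 50))        # binary word-presence features
y = (X[:, :5].sum(axis=1) > 2).astype(int)    # only 5 features truly matter

hits = 0
for i in range(30, X.shape[0]):               # warm-up on the first 30 docs
    Xs, ys = X[:i], y[:i]
    corr = np.nan_to_num([abs(np.corrcoef(Xs[:, j], ys)[0, 1])
                          for j in range(X.shape[1])])
    useful = corr > 0.1                       # correlation gate (assumed)
    if not useful.any():
        useful = corr >= corr.max()           # fall back to the best feature
    clf = BernoulliNB().fit(Xs[:, useful], ys)
    hits += int(clf.predict(X[i:i + 1, useful])[0] == y[i])
print("online accuracy:", hits / (X.shape[0] - 30))
print("features kept at the end:", np.flatnonzero(useful))
```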
NASA Astrophysics Data System (ADS)
Calvin Frans Mariel, Wahyu; Mariyah, Siti; Pramana, Setia
2018-03-01
Deep learning is a new era of machine learning techniques that essentially imitate the structure and function of the human brain. It is a development of deeper Artificial Neural Networks (ANNs) that use more than one hidden layer. A deep learning neural network has a great ability to recognize patterns in various data types such as pictures, audio, text, and many more. In this paper, the authors try to measure that ability by applying it to text classification. The classification task here considers the sentiment content of a text, which is also called sentiment analysis. Using several combinations of text preprocessing and feature extraction techniques, we compare the modelling results of the deep learning neural network with two other commonly used algorithms, Naïve Bayes and Support Vector Machine (SVM). The algorithm comparison uses Indonesian text data with balanced and unbalanced sentiment composition. Based on the experimental simulation, the deep learning neural network clearly outperforms Naïve Bayes and SVM and offers a better F1 score; the feature extraction technique that most improves the modelling results is the bigram.
Delineation of marsh types from Corpus Christi Bay, Texas, to Perdido Bay, Alabama, in 2010
Enwright, Nicholas M.; Hartley, Stephen B.; Couvillion, Brady R.; Brasher, Michael G.; Visser, Jenneke M.; Mitchell, Michael K.; Ballard, Bart M.; Parr, Mark W.; Wilson, Barry C.
2015-07-23
This study incorporates about 9,800 ground reference locations collected via helicopter surveys in coastal wetland areas. Decision-tree analyses were used to classify emergent marsh vegetation types by using ground reference data from helicopter vegetation surveys and independent variables such as multitemporal satellite-based multispectral imagery from 2009 to 2011, bare-earth digital elevation models based on airborne light detection and ranging (lidar), alternative contemporary land cover classifications, and other spatially explicit variables. Image objects were created from 2010 National Agriculture Imagery Program color-infrared aerial photography. The final classification is a 10-meter raster dataset that was produced by using a majority filter to classify image objects according to the marsh vegetation type covering the majority of each image object. The classification is dated 2010 because the year is both the midpoint of the classified multitemporal satellite-based imagery (2009–11) and the date of the high-resolution airborne imagery that was used to develop image objects. The seamless classification produced through this work can be used to help develop and refine conservation efforts for priority natural resources.
Lu, Yingjie
2013-01-01
To facilitate patient involvement in online health communities and help patients obtain the informative and emotional support they need, a topic identification approach is proposed in this paper for automatically identifying the topics of health-related messages in an online health community, thus helping patients reach the messages most relevant to their queries efficiently. A feature-based classification framework is presented for automatic topic identification. We first collected messages related to some predefined topics in an online health community. We then combined three different types of features (n-gram-based, domain-specific, and sentiment features) to build four feature sets for health-related text representation. Finally, three text classification techniques (C4.5, Naïve Bayes, and SVM) were adopted to evaluate our topic classification model. By comparing different feature sets and classification techniques, we found that n-gram-based, domain-specific, and sentiment features were all effective in distinguishing different types of health-related topics. In addition, feature reduction based on information gain was also effective in improving topic classification performance. In terms of classification techniques, SVM outperformed C4.5 and Naïve Bayes significantly. The experimental results demonstrate that the proposed approach can identify the topics of online health-related messages efficiently.
Risk forewarning model for rice grain Cd pollution based on Bayes theory.
Wu, Bo; Guo, Shuhai; Zhang, Lingyan; Li, Fengmei
2018-03-15
Cadmium (Cd) pollution of rice grain caused by Cd-contaminated soils is a common problem in southwest and central-south China. In this study, utilizing the advantages of the Bayes classification statistical method, we established a risk forewarning model for rice grain Cd pollution and put forward two parameters (the prior probability factor and the data variability factor). The sensitivity analysis of the model parameters illustrated that sample size and standard deviation influence the accuracy and applicable range of the model. The accuracy of the model was improved by self-renewal, adding the posterior data to the prior data. Furthermore, this method can be used to predict the risk probability of rice grain Cd pollution under similar soil environment, tillage, and rice varietal conditions. The Bayes approach thus represents a feasible method for risk forewarning of heavy metal pollution of agricultural products caused by contaminated soils.
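At its core the model applies Bayes' rule to obtain a posterior risk probability; one generic way to write this (my notation, not the paper's, with the prior probability factor adjusting P(polluted)) is:

```latex
% Posterior risk that a grain sample exceeds the Cd limit given
% measured covariates x:
P(\text{polluted} \mid \mathbf{x}) \;=\;
  \frac{P(\mathbf{x} \mid \text{polluted})\,P(\text{polluted})}
       {\sum_{c \in \{\text{polluted},\,\text{safe}\}}
        P(\mathbf{x} \mid c)\,P(c)}
```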
A decision support model for investment on P2P lending platform.
Zeng, Xiangxiang; Liu, Li; Leung, Stephen; Du, Jiangze; Wang, Xun; Li, Tao
2017-01-01
Peer-to-peer (P2P) lending, as a novel economic lending model, has triggered new challenges in making effective investment decisions. In a P2P lending platform, one lender can invest in N loans and a loan may be accepted by M investors, thus forming a bipartite graph. Based on the bipartite graph model, we built an iterative computation model to evaluate the unknown loans. To validate the proposed model, we performed extensive experiments on real-world data from the largest American P2P lending marketplace, Prosper. By comparing our experimental results with those obtained by Bayes and Logistic Regression, we show that our computation model can help borrowers select good loans and help lenders make good investment decisions. Experimental results also show that the Logistic classification model is a good complement to our iterative computation model, which motivated us to integrate the two classification models. The experimental results of the hybrid classification model demonstrate that the Logistic classification model and our iterative computation model are complementary to each other. We conclude that the hybrid model (i.e., the integration of the iterative computation model and the Logistic classification model) is more efficient and stable than either individual model alone.
Modeling Verdict Outcomes Using Social Network Measures: The Watergate and Caviar Network Cases.
Masías, Víctor Hugo; Valle, Mauricio; Morselli, Carlo; Crespo, Fernando; Vargas, Augusto; Laengle, Sigifredo
2016-01-01
Modelling criminal trial verdict outcomes using social network measures is an emerging research area in quantitative criminology. Few studies have yet analyzed which of these measures are the most important for verdict modelling or which data classification techniques perform best for this application. To compare the performance of different techniques in classifying members of a criminal network, this article applies three different machine learning classifiers (Logistic Regression, Naïve Bayes, and Random Forest) with a range of social network measures and the necessary databases to model the verdicts in two real-world cases: the U.S. Watergate Conspiracy of the 1970s and the now-defunct Canada-based international drug trafficking ring known as the Caviar Network. In both cases it was found that the Random Forest classifier did better than either Logistic Regression or Naïve Bayes, and its superior performance was statistically significant. This being so, Random Forest was used not only for classification but also to assess the importance of the measures. For the Watergate case, the most important measure proved to be betweenness centrality, while for the Caviar Network it was the effective size of the network. These results are significant because they show that an approach combining machine learning with social network analysis not only can generate accurate classification models but also helps quantify the importance of social network variables in modelling verdict outcomes. We conclude our analysis with a discussion and some suggestions for future work in verdict modelling using social network measures.
Williams, Jennifer A.; Schmitter-Edgecombe, Maureen; Cook, Diane J.
2016-01-01
Introduction: Reducing the amount of testing required to accurately detect cognitive impairment is clinically relevant. The aim of this research was to determine the fewest number of clinical measures required to accurately classify participants as healthy older adult, mild cognitive impairment (MCI), or dementia using a suite of classification techniques. Methods: Two variable-selection machine learning models (naive Bayes and decision tree), a logistic regression, and two participant datasets (clinical diagnosis and clinical dementia rating, CDR) were explored. Participants classified using clinical diagnosis criteria included 52 individuals with dementia, 97 with MCI, and 161 cognitively healthy older adults. Participants classified using the CDR included 154 individuals with CDR = 0, 93 individuals with CDR = 0.5, and 25 individuals with CDR = 1.0+. Twenty-seven demographic, psychological, and neuropsychological variables were available for variable selection. Results: No significant difference was observed between the naive Bayes, decision tree, and logistic regression models for classification of either the clinical diagnosis or the CDR dataset. Participant classification (70.0–99.1%), geometric mean (60.9–98.1%), sensitivity (44.2–100%), and specificity (52.7–100%) were generally satisfactory. Unsurprisingly, the MCI/CDR = 0.5 participant group was the most challenging to classify. Through variable selection, only 2–9 variables were required for classification, and these varied between datasets in a clinically meaningful way. Conclusions: The current study results reveal that machine learning techniques can accurately classify cognitive impairment and reduce the number of measures required for diagnosis.
A Corpus-Based Approach for Automatic Thai Unknown Word Recognition Using Boosting Techniques
NASA Astrophysics Data System (ADS)
Techo, Jakkrit; Nattee, Cholwich; Theeramunkong, Thanaruk
While classification techniques can be applied for automatic unknown word recognition in a language without word boundary, it faces the problem of unbalanced datasets where the number of positive unknown word candidates is dominantly smaller than that of negative candidates. To solve this problem, this paper presents a corpus-based approach that introduces a so-called group-based ranking evaluation technique into ensemble learning in order to generate a sequence of classification models that later collaborate to select the most probable unknown word from multiple candidates. Given a classification model, the group-based ranking evaluation (GRE) is applied to construct a training dataset for learning the succeeding model, by weighing each of its candidates according to their ranks and correctness when the candidates of an unknown word are considered as one group. A number of experiments have been conducted on a large Thai medical text to evaluate performance of the proposed group-based ranking evaluation approach, namely V-GRE, compared to the conventional naïve Bayes classifier and our vanilla version without ensemble learning. As a result, the proposed method achieves an accuracy of 90.93±0.50% when the first rank is selected, while it gains 97.26±0.26% when the top-ten candidates are considered; these are 8.45% and 6.79% improvements over the conventional record-based naïve Bayes classifier and the vanilla version. Another result, applying only the best features, shows 93.93±0.22% and up to 98.85±0.15% accuracy for top-1 and top-10, respectively; these are 3.97% and 9.78% improvements over naive Bayes and the vanilla version. Finally, an error analysis is given.
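A loose NumPy reading of the group-based weighting idea: within each unknown-word group, candidates are re-weighted for the next model according to the rank the current model gives them and whether they are the correct candidate. The weighting rule below is our own guess at the mechanism, not the paper's exact formula:

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.random((5, 8))           # current model's scores: 5 groups x 8 candidates
correct = rng.integers(0, 8, size=5)  # index of the true unknown word in each group

weights = np.ones_like(scores)
for g in range(len(scores)):
    rank = np.argsort(np.argsort(-scores[g]))   # 0 = top-ranked candidate
    for c in range(scores.shape[1]):
        if c == correct[g]:
            # true candidate: weight grows the further it falls from rank 0
            weights[g, c] = 1.0 + rank[c]
        else:
            # wrong candidates ranked above the true one get emphasized too
            weights[g, c] = 2.0 if rank[c] < rank[correct[g]] else 0.5
weights /= weights.sum()   # normalized instance weights for training the next model
print(weights.round(3))
```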
Classification of earth terrain using polarimetric synthetic aperture radar images
NASA Technical Reports Server (NTRS)
Lim, H. H.; Swartz, A. A.; Yueh, H. A.; Kong, J. A.; Shin, R. T.; Van Zyl, J. J.
1989-01-01
Supervised and unsupervised classification techniques are developed and used to classify the earth terrain components from SAR polarimetric images of San Francisco Bay and Traverse City, Michigan. The supervised techniques include the Bayes classifiers, normalized polarimetric classification, and simple feature classification using discriminants such as the absolute and normalized magnitude response of individual receiver channel returns and the phase difference between receiver channels. An algorithm is developed as an unsupervised technique which classifies terrain elements based on the relationship between the orientation angle and the handedness of the transmitting and receiving polarization states. It is found that supervised classification produces the best results when accurate classifier training data are used, while unsupervised classification may be applied when training data are not available.
Multinomial mixture model with heterogeneous classification probabilities
Holland, M.D.; Gray, B.R.
2011-01-01
Royle and Link (Ecology 86(9):2505-2512, 2005) proposed an analytical method that allowed estimation of multinomial distribution parameters and classification probabilities from categorical data measured with error. While useful, we demonstrate algebraically and by simulations that this method yields biased multinomial parameter estimates when the probabilities of correct category classifications vary among sampling units. We address this shortcoming by treating these probabilities as logit-normal random variables within a Bayesian framework. We use Markov chain Monte Carlo to compute Bayes estimates from a simulated sample from the posterior distribution. Based on simulations, this elaborated Royle-Link model yields nearly unbiased estimates of multinomial and correct classification probability estimates when classification probabilities are allowed to vary according to the normal distribution on the logit scale or according to the Beta distribution. The method is illustrated using categorical submersed aquatic vegetation data. © 2010 Springer Science+Business Media, LLC.
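A small simulation of the data-generating setting the authors describe: each sampling unit classifies its observation into one of K categories, with a unit-specific probability of correct classification drawn logit-normally. This only sketches the heterogeneity the elaborated model accommodates; the MCMC fitting itself is beyond the sketch:

```python
import numpy as np
from scipy.special import expit

rng = np.random.default_rng(7)
K, n = 3, 1000
p = np.array([0.5, 0.3, 0.2])            # true multinomial parameters
theta = expit(rng.normal(1.5, 1.0, n))   # logit-normal P(correct) per unit
truth = rng.choice(K, size=n, p=p)
obs = truth.copy()
wrong = rng.random(n) >= theta           # units that misclassify
# misclassifications spread evenly over the other K-1 categories
obs[wrong] = (truth[wrong] + rng.integers(1, K, wrong.sum())) % K
print("true:", p, " raw observed:", np.bincount(obs, minlength=K) / n)
```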
Chung, Sukhoon; Rhee, Hyunsill; Suh, Yongmoo
2010-01-01
Objectives This study sought to find answers to the following questions: 1) Can we predict whether a patient will revisit a healthcare center? 2) Can we anticipate diseases of patients who revisit the center? Methods For the first question, we applied 5 classification algorithms (decision tree, artificial neural network, logistic regression, Bayesian networks, and Naïve Bayes) and the stacking-bagging method for building classification models. To solve the second question, we performed sequential pattern analysis. Results We determined: 1) In general, the most influential variables which impact whether a patient of a public healthcare center will revisit it or not are personal burden, insurance bill, period of prescription, age, systolic pressure, name of disease, and postal code. 2) The best plain classification model is dependent on the dataset. 3) Based on average of classification accuracy, the proposed stacking-bagging method outperformed all traditional classification models and our sequential pattern analysis revealed 16 sequential patterns. Conclusions Classification models and sequential patterns can help public healthcare centers plan and implement healthcare service programs and businesses that are more appropriate to local residents, encouraging them to revisit public health centers. PMID:21818426
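A hedged sketch of a stacked-then-bagged ensemble in the spirit of the study's stacking-bagging method, assuming scikit-learn; the Bayesian-network base learner has no direct sklearn equivalent, so Gaussian naive Bayes stands in, and the exact combination scheme is our assumption:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=600, n_features=12, random_state=0)
base = [("dt", DecisionTreeClassifier(max_depth=5)),
        ("ann", MLPClassifier(max_iter=2000)),
        ("lr", LogisticRegression(max_iter=1000)),
        ("nb", GaussianNB())]
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(max_iter=1000))
bagged_stack = BaggingClassifier(estimator=stack, n_estimators=5, random_state=0)
print(cross_val_score(bagged_stack, X, y, cv=5).mean().round(3))
```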
Protein classification based on text document classification techniques.
Cheng, Betty Yee Man; Carbonell, Jaime G; Klein-Seetharaman, Judith
2005-03-01
The need for accurate, automated protein classification methods continues to increase as advances in biotechnology uncover new proteins. G-protein coupled receptors (GPCRs) are a particularly difficult superfamily of proteins to classify due to extreme diversity among its members. Previous comparisons of BLAST, k-nearest neighbor (k-NN), hidden Markov model (HMM) and support vector machine (SVM) using alignment-based features have suggested that classifiers at the complexity of SVM are needed to attain high accuracy. Here, analogous to document classification, we applied Decision Tree and Naive Bayes classifiers with chi-square feature selection on counts of n-grams (i.e. short peptide sequences of length n) to this classification task. Using the GPCR dataset and evaluation protocol from the previous study, the Naive Bayes classifier attained an accuracy of 93.0 and 92.4% in level I and level II subfamily classification respectively, while SVM has a reported accuracy of 88.4 and 86.3%. This is a 39.7 and 44.5% reduction in residual error for level I and level II subfamily classification, respectively. The Decision Tree, while inferior to SVM, outperforms HMM in both level I and level II subfamily classification. For those GPCR families whose profiles are stored in the Protein FAMilies database of alignments and HMMs (PFAM), our method performs comparably to a search against those profiles. Finally, our method can be generalized to other protein families by applying it to the superfamily of nuclear receptors with 94.5, 97.8 and 93.6% accuracy in family, level I and level II subfamily classification respectively. Copyright 2005 Wiley-Liss, Inc.
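The document-classification analogy can be sketched directly with scikit-learn: peptide n-gram counts, chi-square feature selection, then naive Bayes. The sequences and labels below are toy placeholders, not the GPCR data:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

seqs = ["MNGTEGPNFYVPFSNKTG", "MAEQLTEEQIAEFKEAFS",
        "MNGTEGLNFYVPFSNKTG", "MADQLTEEQIAEFKEAFS"]
labels = ["gpcr", "other", "gpcr", "other"]

model = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 2)),  # n-gram counts
    SelectKBest(chi2, k=10),                               # chi-square selection
    MultinomialNB())
model.fit(seqs, labels)
print(model.predict(["MNGTEGINFYVPFSNKTG"]))
```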
Bayes-LQAS: classifying the prevalence of global acute malnutrition.
Olives, Casey; Pagano, Marcello
2010-06-09
Lot Quality Assurance Sampling (LQAS) applications in health have generally relied on frequentist interpretations for statistical validity. Yet health professionals often seek statements about the probability distribution of unknown parameters to answer questions of interest. The frequentist paradigm does not pretend to yield such information, although a Bayesian formulation might. This is the source of an error made in a recent paper published in this journal. Many applications lend themselves to a Bayesian treatment, and would benefit from such considerations in their design. We discuss Bayes-LQAS (B-LQAS), which allows for incorporation of prior information into the LQAS classification procedure, and thus shows how to correct the aforementioned error. Further, we pay special attention to the formulation of Bayes Operating Characteristic Curves and the use of prior information to improve survey designs. As a motivating example, we discuss the classification of Global Acute Malnutrition prevalence and draw parallels between the Bayes and classical classifications schemes. We also illustrate the impact of informative and non-informative priors on the survey design. Results indicate that using a Bayesian approach allows the incorporation of expert information and/or historical data and is thus potentially a valuable tool for making accurate and precise classifications.
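A minimal sketch of a Bayesian LQAS-style classification, assuming a conjugate Beta prior; the prior, sample size and action threshold below are illustrative values, not those of the paper:

```python
from scipy.stats import beta

n, d = 33, 7             # sample size and observed cases of acute malnutrition
a0, b0 = 2.0, 18.0       # informative Beta prior (prior mean 10%)
threshold = 0.10         # programme action threshold for GAM prevalence

post = beta(a0 + d, b0 + n - d)      # Beta-binomial conjugate update
p_exceed = 1 - post.cdf(threshold)
print(f"P(prevalence > {threshold:.0%}) = {p_exceed:.3f}")
print("classify as HIGH" if p_exceed > 0.5 else "classify as LOW")
```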
Stephens, David; Diesing, Markus
2014-01-01
Detailed seabed substrate maps are increasingly in demand for effective planning and management of marine ecosystems and resources. It has become common to use remotely sensed multibeam echosounder data in the form of bathymetry and acoustic backscatter in conjunction with ground-truth sampling data to inform the mapping of seabed substrates. Whilst, until recently, such data sets have typically been classified by expert interpretation, it is now obvious that more objective, faster and repeatable methods of seabed classification are required. This study compares the performances of a range of supervised classification techniques for predicting substrate type from multibeam echosounder data. The study area is located in the North Sea, off the north-east coast of England. A total of 258 ground-truth samples were classified into four substrate classes. Multibeam bathymetry and backscatter data, and a range of secondary features derived from these datasets, were used in this study. Six supervised classification techniques were tested: Classification Trees, Support Vector Machines, k-Nearest Neighbour, Neural Networks, Random Forest and Naive Bayes. Each classifier was trained multiple times using different input features, including i) the two primary features of bathymetry and backscatter, ii) a subset of the features chosen by a feature selection process and iii) all of the input features. The predictive performances of the models were validated using a separate test set of ground-truth samples. The statistical significance of model performances relative to a simple baseline model (Nearest Neighbour predictions on bathymetry and backscatter) was tested to assess the benefits of using more sophisticated approaches. The best performing models were tree-based methods and Naive Bayes, which achieved accuracies of around 0.8 and kappa coefficients of up to 0.5 on the test set. The models that used all input features did not generally perform well, highlighting the need for some means of feature selection.
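One way such a comparison might look with scikit-learn, scoring each learner by accuracy and Cohen's kappa on a held-out test set; the data here are a synthetic stand-in for the multibeam features and four substrate classes:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, cohen_kappa_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=258, n_features=8, n_informative=5,
                           n_classes=4, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
for clf in (DecisionTreeClassifier(), SVC(), KNeighborsClassifier(),
            MLPClassifier(max_iter=2000), RandomForestClassifier(), GaussianNB()):
    pred = clf.fit(Xtr, ytr).predict(Xte)
    print(f"{type(clf).__name__:25s} acc={accuracy_score(yte, pred):.2f} "
          f"kappa={cohen_kappa_score(yte, pred):.2f}")
```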
NASA Astrophysics Data System (ADS)
Fezzani, Ridha; Berger, Laurent
2018-06-01
An automated signal-based method was developed to analyse the seafloor backscatter data logged by a calibrated multibeam echosounder. The processing consists, first, of clustering each survey sub-area into a small number of homogeneous sediment types, based on the backscatter average level at one or several incidence angles. Second, it uses their local average angular response to extract discriminant descriptors, obtained by fitting the field data to the Generic Seafloor Acoustic Backscatter parametric model. Third, the descriptors are used for seafloor type classification. The method was tested on the multi-year data recorded by a calibrated 90-kHz Simrad ME70 multibeam sonar operated in the Bay of Biscay, France and Celtic Sea, Ireland. It was applied for seafloor-type classification into 12 classes, to a dataset of 158 spots surveyed for demersal and benthic fauna study and monitoring. Qualitative analyses and classified clusters using extracted parameters show good discriminatory potential, indicating the robustness of this approach.
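A hedged sketch of the descriptor-extraction step: fit a parametric angular-response curve to averaged backscatter-vs-incidence-angle data and use the fitted parameters as class descriptors. The three-parameter form below (Gaussian specular peak plus a Lambert-like tail) is a simplification standing in for the actual GSAB model:

```python
import numpy as np
from scipy.optimize import curve_fit

def angular_response(theta_deg, A, sigma, B):
    t = np.radians(theta_deg)
    return A * np.exp(-t**2 / (2 * sigma**2)) + B * np.cos(t) ** 2

theta = np.linspace(1, 60, 40)                 # incidence angles (degrees)
rng = np.random.default_rng(3)
data = angular_response(theta, 0.08, 0.12, 0.01) + rng.normal(0, 0.002, 40)

params, _ = curve_fit(angular_response, theta, data, p0=[0.05, 0.2, 0.005])
print("descriptors (A, sigma, B):", np.round(params, 4))
```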
NASA Technical Reports Server (NTRS)
Ackleson, S. G.; Klemas, V.
1987-01-01
Landsat MSS and TM imagery, obtained simultaneously over Guinea Marsh, VA, was analyzed and compared for its ability to detect submerged aquatic vegetation (SAV). An unsupervised clustering algorithm was applied to each image, where the input classification parameters are defined as functions of apparent sensor noise. Class confidence and accuracy were computed for all water areas by comparing the classified images, pixel-by-pixel, to rasterized SAV distributions derived from color aerial photography. To illustrate the effect of water depth on classification error, areas of depth greater than 1.9 m were masked, and class confidence and accuracy recalculated. A single-scattering radiative-transfer model is used to illustrate how percent canopy cover and water depth affect the volume reflectance from a water column containing SAV. For a submerged canopy that is morphologically and optically similar to Zostera marina inhabiting Lower Chesapeake Bay, dense canopies may be isolated by masking optically deep water. For less dense canopies, the effect of increasing water depth is to increase the apparent percent crown cover, which may result in classification error.
Adaptive classifier for steel strip surface defects
NASA Astrophysics Data System (ADS)
Jiang, Mingming; Li, Guangyao; Xie, Li; Xiao, Mang; Yi, Li
2017-01-01
Surface defect detection systems have been receiving increased attention for their precision, speed and low cost. One of the greatest challenges is reacting to accuracy deterioration over time as equipment ages and processes change. These variables make only a tiny change to the real-world model but have a big impact on the classification result. In this paper, we propose a new adaptive classifier with a Bayes kernel (BYEC), which updates the model with a small sample to make it adaptive to accuracy deterioration. First, abundant features are introduced to capture ample information about the defects. Second, we construct a series of SVMs on random subspaces of the features. Then, a Bayes classifier is trained as an evolutionary kernel to fuse the results from the base SVMs. Finally, we propose a method to update the Bayes evolutionary kernel. The proposed algorithm is experimentally compared with different algorithms; results demonstrate that it can be updated with a small sample and fit the changed model well. Robustness, a low sample requirement and adaptivity are demonstrated in the experiments.
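An interpretive sketch of the idea, assuming scikit-learn: SVMs trained on random feature subspaces, a naive Bayes layer fusing their outputs, and an incremental refit of only that light-weight fusion layer on a small new sample. The subspace sizes and update rule are our assumptions, not the paper's exact algorithm:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.naive_bayes import GaussianNB
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=500, n_features=30, random_state=0)
subspaces = [rng.choice(30, size=10, replace=False) for _ in range(7)]
svms = [SVC().fit(X[:, s], y) for s in subspaces]

def base_outputs(X):
    # stack each SVM's decision values as meta-features for the fusion layer
    return np.column_stack([m.decision_function(X[:, s])
                            for m, s in zip(svms, subspaces)])

fusion = GaussianNB().fit(base_outputs(X), y)        # Bayes fusion kernel

# "update with a small sample": incrementally refit only the fusion layer
X_new, y_new = make_classification(n_samples=40, n_features=30, random_state=1)
fusion.partial_fit(base_outputs(X_new), y_new)
```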
A SVM-based method for sentiment analysis in Persian language
NASA Astrophysics Data System (ADS)
Hajmohammadi, Mohammad Sadegh; Ibrahim, Roliana
2013-03-01
Persian language is the official language of Iran, Tajikistan and Afghanistan. Local online users often express their opinions and experiences on the web in written Persian. Although the information in those reviews is valuable to potential consumers and sellers, the huge number of web reviews makes it difficult to give an unbiased evaluation of a product. In this paper, the standard machine learning techniques SVM and naive Bayes are incorporated into the domain of online Persian movie reviews to automatically classify user reviews as positive or negative, and the performance of these two classifiers is compared in this language. The effects of feature presentations on classification performance are discussed. We find that accuracy is influenced by the interaction between the classification models and the feature options. The SVM classifier achieves accuracy as good as or better than naive Bayes on Persian movie reviews. Unigrams prove to be better features than bigrams and trigrams in capturing Persian sentiment orientation.
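A compact sketch of the feature/classifier comparison, assuming scikit-learn: unigram vs. bigram presence features with SVM and naive Bayes. The two-review corpus is a toy placeholder for the actual Persian movie reviews:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

docs = ["فیلم عالی بود", "فیلم بسیار بد بود"]   # toy positive / negative reviews
labels = ["pos", "neg"]

for ngrams in [(1, 1), (2, 2)]:                 # unigrams, then bigrams
    for clf in (LinearSVC(), MultinomialNB()):
        model = make_pipeline(CountVectorizer(ngram_range=ngrams, binary=True), clf)
        model.fit(docs, labels)
        print(ngrams, type(clf).__name__, model.predict(["فیلم بد بود"]))
```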
NASA Astrophysics Data System (ADS)
Khan, Asif; Ryoo, Chang-Kyung; Kim, Heung Soo
2017-04-01
This paper presents a comparative study of different classification algorithms for the classification of various types of inter-ply delaminations in smart composite laminates. Improved layerwise theory is used to model delamination at different interfaces along the thickness and longitudinal directions of the smart composite laminate. The input-output data obtained through a surface-bonded piezoelectric sensor and actuator are analyzed by a system identification algorithm to obtain the system parameters. The identified parameters for the healthy and delaminated structure are supplied as input data to the classification algorithms. The classification algorithms considered in this study are ZeroR, Classification via regression, Naïve Bayes, Multilayer Perceptron, Sequential Minimal Optimization, Multiclass-Classifier, and Decision tree (J48). The open-source software Waikato Environment for Knowledge Analysis (WEKA) is used to evaluate the classification performance of the classifiers mentioned above via 75-25 holdout and leave-one-sample-out cross-validation regarding classification accuracy, precision, recall, kappa statistic and ROC Area.
Protein Secondary Structure Prediction Using AutoEncoder Network and Bayes Classifier
NASA Astrophysics Data System (ADS)
Wang, Leilei; Cheng, Jinyong
2018-03-01
Protein secondary structure prediction belongs to bioinformatics and is an important research area. In this paper, we propose a new way to predict protein secondary structure using a Bayes classifier and an autoencoder network. Our experiments cover several aspects, including the construction of the model and the selection of parameters. The data set is the typical CB513 protein data set. Accuracy is assessed by 3-fold cross validation, from which we obtain the Q3 accuracy. The results illustrate that the autoencoder network improved the prediction accuracy of protein secondary structure.
Marine benthic habitat mapping of the West Arm, Glacier Bay National Park and Preserve, Alaska
Hodson, Timothy O.; Cochrane, Guy R.; Powell, Ross D.
2013-01-01
Seafloor geology and potential benthic habitats were mapped in West Arm, Glacier Bay National Park and Preserve, Alaska, using multibeam sonar, groundtruthed observations, and geological interpretations. The West Arm of Glacier Bay is a recently deglaciated fjord system under the influence of glacial and paraglacial marine processes. High glacially derived sediment and meltwater fluxes, slope instabilities, and variable bathymetry result in a highly dynamic estuarine environment and benthic ecosystem. We characterize the fjord seafloor and potential benthic habitats using the recently developed Coastal and Marine Ecological Classification Standard (CMECS) by the National Oceanic and Atmospheric Administration (NOAA) and NatureServe. Due to the high flux of glacially sourced fines, mud is the dominant substrate within the West Arm. Water-column characteristics are addressed using a combination of CTD and circulation model results. We also present sediment accumulation data derived from differential bathymetry. These data show the West Arm is divided into two contrasting environments: a dynamic upper fjord and a relatively static lower fjord. The results of these analyses serve as a test of the CMECS classification scheme and as a baseline for ongoing and future mapping efforts and correlations between seafloor substrate, benthic habitats, and glacimarine processes.
Novel naïve Bayes classification models for predicting the carcinogenicity of chemicals.
Zhang, Hui; Cao, Zhi-Xing; Li, Meng; Li, Yu-Zhi; Peng, Cheng
2016-11-01
Carcinogenicity prediction has become a significant issue for the pharmaceutical industry. The purpose of this investigation was to develop a novel prediction model of carcinogenicity of chemicals by using a naïve Bayes classifier. The established model was validated by the internal 5-fold cross validation and external test set. The naïve Bayes classifier gave an average overall prediction accuracy of 90 ± 0.8% for the training set and 68 ± 1.9% for the external test set. Moreover, five simple molecular descriptors (AlogP, molecular weight (MW), number of H donors, Apol and the Wiener index) considered important for the carcinogenicity of chemicals were identified, and some substructures related to carcinogenicity were obtained. Thus, we hope the established naïve Bayes prediction model could be applied to filter early-stage molecules for this potential carcinogenicity adverse effect; and the identified five simple molecular descriptors and substructures of carcinogens would give a better understanding of the carcinogenicity of chemicals, and further provide guidance for medicinal chemists in the design of new candidate drugs and lead optimization, ultimately reducing the attrition rate in later stages of drug development. Copyright © 2016 Elsevier Ltd. All rights reserved.
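A minimal sketch matching the described setup, assuming scikit-learn: a naive Bayes model on five simple molecular descriptors, validated by 5-fold cross validation. The descriptor matrix here is random stand-in data:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
descriptors = ["AlogP", "MW", "H_donors", "Apol", "Wiener"]
X = rng.normal(size=(300, len(descriptors)))   # stand-in descriptor values
y = rng.integers(0, 2, 300)                    # 1 = carcinogen, 0 = non-carcinogen
print(cross_val_score(GaussianNB(), X, y, cv=5).mean().round(3))
```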
A model-based test for treatment effects with probabilistic classifications.
Cavagnaro, Daniel R; Davis-Stober, Clintin P
2018-05-21
Within modern psychology, computational and statistical models play an important role in describing a wide variety of human behavior. Model selection analyses are typically used to classify individuals according to the model(s) that best describe their behavior. These classifications are inherently probabilistic, which presents challenges for performing group-level analyses, such as quantifying the effect of an experimental manipulation. We answer this challenge by presenting a method for quantifying treatment effects in terms of distributional changes in model-based (i.e., probabilistic) classifications across treatment conditions. The method uses hierarchical Bayesian mixture modeling to incorporate classification uncertainty at the individual level into the test for a treatment effect at the group level. We illustrate the method with several worked examples, including a reanalysis of the data from Kellen, Mata, and Davis-Stober (2017), and analyze its performance more generally through simulation studies. Our simulations show that the method is both more powerful and less prone to type-1 errors than Fisher's exact test when classifications are uncertain. In the special case where classifications are deterministic, we find a near-perfect power-law relationship between the Bayes factor, derived from our method, and the p value obtained from Fisher's exact test. We provide code in an online supplement that allows researchers to apply the method to their own data. (PsycINFO Database Record (c) 2018 APA, all rights reserved).
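An illustrative sketch of the problem being addressed: when model-based classifications are probabilistic, a single Fisher's exact test on "hard" counts ignores classification uncertainty. Below, that uncertainty is propagated by Monte Carlo; note the paper's actual solution is a hierarchical Bayesian mixture model, not this resampling shortcut:

```python
import numpy as np
from scipy.stats import fisher_exact

rng = np.random.default_rng(1)
# P(model A describes subject) for control and treatment groups (simulated)
p_control = rng.beta(2, 2, 30)
p_treat = rng.beta(3, 2, 30)

pvals = []
for _ in range(2000):
    a = (rng.random(30) < p_control).sum()   # sampled hard classifications
    b = (rng.random(30) < p_treat).sum()
    table = [[a, 30 - a], [b, 30 - b]]
    _, p = fisher_exact(table)
    pvals.append(p)
print("median p over classification draws:", np.median(pvals).round(3))
```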
Evaluating the Impact of Land Use Change on Submerged Aquatic Vegetation Stressors in Mobile Bay
NASA Technical Reports Server (NTRS)
Al-Hamdan, Mohammad; Estes, Maurice G., Jr.; Quattrochi, Dale; Thom, Ronald; Woodruff, Dana; Judd, Chaeli; Ellis, Jean; Watson, Brian; Rodriquez, Hugo; Johnson, Hoyt
2009-01-01
Alabama coastal systems have been subjected to increasing pressure from a variety of activities including urban and rural development, shoreline modifications, industrial activities, and dredging of shipping and navigation channels. The impacts on coastal ecosystems are often observed through the use of indicator species. One such indicator species for aquatic ecosystem health is submerged aquatic vegetation (SAV). Watershed and hydrodynamic modeling has been performed to evaluate the impact of land use change in Mobile and Baldwin counties on SAV stressors and controlling factors (temperature, salinity, and sediment) in Mobile Bay. Watershed modeling using the Loading Simulation Package in C++ (LSPC) was performed for all watersheds contiguous to Mobile Bay for land use scenarios in 1948, 1992, 2001, and 2030. Landsat-derived National Land Cover Data (NLCD) were used in the 1992 and 2001 simulations after having been reclassified to a common classification scheme. The Prescott Spatial Growth Model was used to project the 2030 land use scenario based on current trends. The LSPC model simulations provided output on changes in flow, temperature, and sediment for 22 discharge points into the Bay. These results were input into the Environmental Fluid Dynamics Computer Code (EFDC) hydrodynamic model to generate data on changes in temperature, salinity, and sediment on a grid with four vertical profiles throughout Mobile Bay. The changes in the aquatic ecosystem were used to perform an ecological analysis to evaluate the impact on SAV habitat suitability. This is the key product benefiting the Mobile Bay coastal environmental managers that integrates the influences of temperature, salinity, and sediment due to land use driven flow changes with the restoration potential of SAVs.
A Pairwise Naïve Bayes Approach to Bayesian Classification.
Asafu-Adjei, Josephine K; Betensky, Rebecca A
2015-10-01
Despite the relatively high accuracy of the naïve Bayes (NB) classifier, there may be several instances where it is not optimal, i.e. does not have the same classification performance as the Bayes classifier utilizing the joint distribution of the examined attributes. However, the Bayes classifier can be computationally intractable due to its required knowledge of the joint distribution. Therefore, we introduce a "pairwise naïve" Bayes (PNB) classifier that incorporates all pairwise relationships among the examined attributes, but does not require specification of the joint distribution. In this paper, we first describe the necessary and sufficient conditions under which the PNB classifier is optimal. We then discuss sufficient conditions for which the PNB classifier, and not NB, is optimal for normal attributes. Through simulation and actual studies, we evaluate the performance of our proposed classifier relative to the Bayes and NB classifiers, along with the HNB, AODE, LBR and TAN classifiers, using normal density and empirical estimation methods. Our applications show that the PNB classifier using normal density estimation yields the highest accuracy for data sets containing continuous attributes. We conclude that it offers a useful compromise between the Bayes and NB classifiers.
Waldman, John R.; Fabrizio, Mary C.
1994-01-01
Stock contribution studies of mixed-stock fisheries rely on the application of classification algorithms to samples of unknown origin. Although the performance of these algorithms can be assessed, there are no guidelines regarding decisions about including minor stocks, pooling stocks into regional groups, or sampling discrete substocks to adequately characterize a stock. We examined these questions for striped bass Morone saxatilis of the U.S. Atlantic coast by applying linear discriminant functions to meristic and morphometric data from fish collected from spawning areas. Some of our samples were from the Hudson and Roanoke rivers and four tributaries of the Chesapeake Bay. We also collected fish of mixed-stock origin from the Atlantic Ocean near Montauk, New York. Inclusion of the minor stock from the Roanoke River in the classification algorithm decreased the correct-classification rate, whereas grouping of the Roanoke River and Chesapeake Bay stock into a regional ("southern") group increased the overall resolution. The increased resolution was offset by our inability to obtain separate contribution estimates of the groups that were pooled. Although multivariate analysis of variance indicated significant differences among Chesapeake Bay substocks, increasing the number of substocks in the discriminant analysis decreased the overall correct-classification rate. Although the inclusion of one, two, three, or four substocks in the classification algorithm did not greatly affect the overall correct-classification rates, the specific combination of substocks significantly affected the relative contribution estimates derived from the mixed-stock sample. Future studies of this kind must balance the costs and benefits of including minor stocks and would profit from examination of the variation in discriminant characters among all Chesapeake Bay substocks.
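A sketch of the core analysis with scikit-learn: linear discriminant functions on simulated stand-ins for the meristic/morphometric characters, comparing correct-classification rates when stocks are kept separate versus pooled into a regional group:

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=m, size=(60, 6)) for m in (0.0, 0.4, 0.8)])
stocks = np.repeat(["hudson", "roanoke", "chesapeake"], 60)

full = cross_val_score(LinearDiscriminantAnalysis(), X, stocks, cv=5).mean()
pooled = np.where(stocks == "hudson", "hudson", "southern")  # regional grouping
south = cross_val_score(LinearDiscriminantAnalysis(), X, pooled, cv=5).mean()
print(f"3-stock rate {full:.2f} vs pooled rate {south:.2f}")
```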
Yourganov, Grigori; Schmah, Tanya; Churchill, Nathan W; Berman, Marc G; Grady, Cheryl L; Strother, Stephen C
2014-08-01
The field of fMRI data analysis is rapidly growing in sophistication, particularly in the domain of multivariate pattern classification. However, the interaction between the properties of the analytical model and the parameters of the BOLD signal (e.g. signal magnitude, temporal variance and functional connectivity) is still an open problem. We addressed this problem by evaluating a set of pattern classification algorithms on simulated and experimental block-design fMRI data. The set of classifiers consisted of linear and quadratic discriminants, linear support vector machine, and linear and nonlinear Gaussian naive Bayes classifiers. For linear discriminant, we used two methods of regularization: principal component analysis, and ridge regularization. The classifiers were used (1) to classify the volumes according to the behavioral task that was performed by the subject, and (2) to construct spatial maps that indicated the relative contribution of each voxel to classification. Our evaluation metrics were: (1) accuracy of out-of-sample classification and (2) reproducibility of spatial maps. In simulated data sets, we performed an additional evaluation of spatial maps with ROC analysis. We varied the magnitude, temporal variance and connectivity of simulated fMRI signal and identified the optimal classifier for each simulated environment. Overall, the best performers were linear and quadratic discriminants (operating on principal components of the data matrix) and, in some rare situations, a nonlinear Gaussian naïve Bayes classifier. The results from the simulated data were supported by within-subject analysis of experimental fMRI data, collected in a study of aging. This is the first study that systematically characterizes interactions between analysis model and signal parameters (such as magnitude, variance and correlation) on the performance of pattern classifiers for fMRI. Copyright © 2014 Elsevier Inc. All rights reserved.
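A sketch of roughly this classifier set on high-dimensional synthetic data (many features, few samples, as in volume-level fMRI), assuming scikit-learn; the component counts and shrinkage setting are illustrative choices, not the study's:

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import (LinearDiscriminantAnalysis,
                                           QuadraticDiscriminantAnalysis)
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=120, n_features=400, n_informative=20,
                           random_state=0)        # voxels >> volumes, as in fMRI
models = {
    "PCA+LDA": make_pipeline(PCA(n_components=30), LinearDiscriminantAnalysis()),
    "PCA+QDA": make_pipeline(PCA(n_components=10), QuadraticDiscriminantAnalysis()),
    "ridge LDA": LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto"),
    "linear SVM": LinearSVC(),
    "GNB": GaussianNB(),
}
for name, m in models.items():
    print(f"{name:10s} {cross_val_score(m, X, y, cv=5).mean():.2f}")
```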
Imholte, Gregory; Gottardo, Raphael
2017-01-01
Summary The peptide microarray immunoassay simultaneously screens sample serum against thousands of peptides, determining the presence of antibodies bound to array probes. Peptide microarrays tiling immunogenic regions of pathogens (e.g. envelope proteins of a virus) are an important high throughput tool for querying and mapping antibody binding. Because of the assay’s many steps, from probe synthesis to incubation, peptide microarray data can be noisy with extreme outliers. In addition, subjects may produce different antibody profiles in response to an identical vaccine stimulus or infection, due to variability among subjects’ immune systems. We present a robust Bayesian hierarchical model for peptide microarray experiments, pepBayes, to estimate the probability of antibody response for each subject/peptide combination. Heavy-tailed error distributions accommodate outliers and extreme responses, and tailored random effect terms automatically incorporate technical effects prevalent in the assay. We apply our model to two vaccine trial datasets to demonstrate model performance. Our approach enjoys high sensitivity and specificity when detecting vaccine induced antibody responses. A simulation study shows an adaptive thresholding classification method has appropriate false discovery rate control with high sensitivity, and receiver operating characteristics generated on vaccine trial data suggest that pepBayes clearly separates responses from non-responses. PMID:27061097
NASA Technical Reports Server (NTRS)
Hsu, Wei-Chen; Kuss, Amber Jean; Ketron, Tyler; Nguyen, Andrew; Remar, Alex Covello; Newcomer, Michelle; Fleming, Erich; Debout, Leslie; Debout, Brad; Detweiler, Angela;
2011-01-01
Tidal marshes are highly productive ecosystems that support migratory birds as roosting and over-wintering habitats on the Pacific Flyway. Microphytobenthos, or more commonly 'biofilms', contribute significantly to the primary productivity of wetland ecosystems, and provide a substantial food source for macroinvertebrates and avian communities. In this study, biofilms were characterized based on taxonomic classification, density differences, and spectral signatures. These techniques were then applied to remotely sensed images to map biofilm densities and distributions in the South Bay Salt Ponds and predict the carrying capacity of these newly restored ponds for migratory birds. The GER-1500 spectroradiometer was used to obtain in situ spectral signatures for each density class of biofilm. The spectral variation and taxonomic classification between high, medium, and low density biofilm cover types were mapped using in-situ spectral measurements and classification of EO-1 Hyperion and Landsat TM 5 images. Biofilm samples were also collected in the field to perform laboratory analyses including chlorophyll-a, taxonomic classification, and energy content. Comparison of the spectral signatures between the three density groups shows distinct variations useful for classification. Also, analysis of chlorophyll-a concentrations shows statistically significant differences between each density group, using the Tukey-Kramer test at an alpha level of 0.05. The potential carrying capacity in the South Bay Salt Ponds is estimated to be 250,000 birds.
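A sketch of the density-group comparison, assuming statsmodels: chlorophyll-a concentrations for low/medium/high biofilm density compared pairwise with the Tukey-Kramer procedure at alpha = 0.05 (simulated values stand in for the field samples):

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

rng = np.random.default_rng(0)
chla = np.concatenate([rng.normal(5, 1, 20),    # low density
                       rng.normal(8, 1, 20),    # medium
                       rng.normal(12, 1, 20)])  # high
groups = np.repeat(["low", "medium", "high"], 20)
print(pairwise_tukeyhsd(chla, groups, alpha=0.05))
```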
Multilayer perceptron, fuzzy sets, and classification
NASA Technical Reports Server (NTRS)
Pal, Sankar K.; Mitra, Sushmita
1992-01-01
A fuzzy neural network model based on the multilayer perceptron, using the back-propagation algorithm, and capable of fuzzy classification of patterns is described. The input vector consists of membership values to linguistic properties while the output vector is defined in terms of fuzzy class membership values. This allows efficient modeling of fuzzy or uncertain patterns with appropriate weights being assigned to the backpropagated errors depending upon the membership values at the corresponding outputs. During training, the learning rate is gradually decreased in discrete steps until the network converges to a minimum error solution. The effectiveness of the algorithm is demonstrated on a speech recognition problem. The results are compared with those of the conventional MLP, the Bayes classifier, and the other related models.
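A loose sketch of the input encoding with NumPy and scikit-learn; the triangular membership functions below are a simplification of the model's linguistic pi-functions, and the network is a plain MLP rather than the paper's fuzzy variant:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

def memberships(X):
    # expand each feature into low/medium/high membership values in [0, 1]
    lo, hi = X.min(0), X.max(0)
    z = (X - lo) / (hi - lo + 1e-9)
    low = np.clip(1 - 2 * z, 0, 1)
    med = np.clip(1 - np.abs(2 * z - 1), 0, 1)
    high = np.clip(2 * z - 1, 0, 1)
    return np.hstack([low, med, high])

X, y = make_classification(n_samples=300, n_features=5, random_state=0)
mlp = MLPClassifier(hidden_layer_sizes=(20,), max_iter=3000, random_state=0)
print(cross_val_score(mlp, memberships(X), y, cv=5).mean().round(2))
```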
A Step Towards EEG-based Brain Computer Interface for Autism Intervention
Fan, Jing; Wade, Joshua W.; Bian, Dayi; Key, Alexandra P.; Warren, Zachary E.; Mion, Lorraine C.; Sarkar, Nilanjan
2017-01-01
Autism Spectrum Disorder (ASD) is a prevalent and costly neurodevelopmental disorder. Individuals with ASD often have deficits in social communication skills as well as adaptive behavior skills related to daily activities. We have recently designed a novel virtual reality (VR) based driving simulator for driving skill training for individuals with ASD. In this paper, we explored the feasibility of detecting engagement level, emotional states, and mental workload during VR-based driving using EEG as a first step towards a potential EEG-based Brain Computer Interface (BCI) for assisting autism intervention. We used spectral features of EEG signals from a 14-channel EEG neuroheadset, together with therapist ratings of behavioral engagement, enjoyment, frustration, boredom, and difficulty to train a group of classification models. Seven classification methods were applied and compared including Bayes network, naïve Bayes, Support Vector Machine (SVM), multilayer perceptron, K-nearest neighbors (KNN), random forest, and J48. The classification results were promising, with over 80% accuracy in classifying engagement and mental workload, and over 75% accuracy in classifying emotional states. Such results may lead to an adaptive closed-loop VR-based skill training system for use in autism intervention. PMID:26737113
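A sketch of the feature/classifier stage, assuming SciPy and scikit-learn: Welch band powers per channel as spectral features, fed to a few of the listed classifiers. Synthetic signals replace the 14-channel neuroheadset recordings, and the band choices are illustrative:

```python
import numpy as np
from scipy.signal import welch
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
fs, n_trials = 128, 80
eeg = rng.normal(size=(n_trials, 14, fs * 4))    # trials x channels x samples
y = rng.integers(0, 2, n_trials)                 # e.g. engaged vs. not engaged

bands = [(4, 8), (8, 13), (13, 30)]              # theta, alpha, beta
f, psd = welch(eeg, fs=fs, axis=-1)
feats = np.concatenate(
    [psd[..., (f >= lo) & (f < hi)].mean(-1) for lo, hi in bands], axis=1)

for clf in (GaussianNB(), SVC(), KNeighborsClassifier(), RandomForestClassifier()):
    print(type(clf).__name__, cross_val_score(clf, feats, y, cv=5).mean().round(2))
```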
Hydrologic Landscape Classification to Estimate Bristol Bay Watershed Hydrology
The use of hydrologic landscapes has proven to be a useful tool for broad scale assessment and classification of landscapes across the United States. These classification systems help organize larger geographical areas into areas of similar hydrologic characteristics based on cl...
Breast cancer Ki67 expression preoperative discrimination by DCE-MRI radiomics features
NASA Astrophysics Data System (ADS)
Ma, Wenjuan; Ji, Yu; Qin, Zhuanping; Guo, Xinpeng; Jian, Xiqi; Liu, Peifang
2018-02-01
To investigate whether quantitative radiomics features extracted from dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) are associated with Ki67 expression of breast cancer. In this institutional review board approved retrospective study, we collected 377 cases of Chinese women who were diagnosed with invasive breast cancer in 2015. This cohort included 53 cases with low-Ki67 expression (Ki67 proliferation index less than 14%) and 324 cases with high-Ki67 expression (Ki67 proliferation index more than 14%). A binary classification of low- vs. high-Ki67 expression was performed. A set of 52 quantitative radiomics features, including morphological, gray-scale statistical, and texture features, were extracted from the segmented lesion area. Three of the most common machine learning classification methods, including Naive Bayes, k-Nearest Neighbor and support vector machine with Gaussian kernel, were employed for the classification, and the least absolute shrinkage and selection operator (LASSO) method was used to select the most predictive feature set for the classifiers. Classification performance was evaluated by the area under the receiver operating characteristic curve (AUC), accuracy, sensitivity and specificity. The model that used the Naive Bayes classification method achieved the best performance of the three, yielding a 0.773 AUC value, 0.757 accuracy, 0.777 sensitivity and 0.769 specificity. Our study showed that quantitative radiomics imaging features of breast tumor extracted from DCE-MRI are associated with breast cancer Ki67 expression. Future larger studies are needed in order to further evaluate the findings.
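A sketch of the described pipeline, assuming scikit-learn: L1-penalized (LASSO-style) feature selection over 52 stand-in radiomics features, then naive Bayes, scored by AUC on an imbalanced synthetic cohort (53 low vs. 324 high):

```python
import numpy as np
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import roc_auc_score
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(377, 52))                   # stand-in radiomics features
y = np.r_[np.zeros(53), np.ones(324)].astype(int)

lasso = LogisticRegression(penalty="l1", solver="liblinear", C=1.0)
model = make_pipeline(SelectFromModel(lasso), GaussianNB())
prob = cross_val_predict(model, X, y, cv=5, method="predict_proba")[:, 1]
print("AUC:", roc_auc_score(y, prob).round(3))
```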
Bayesian model reduction and empirical Bayes for group (DCM) studies
Friston, Karl J.; Litvak, Vladimir; Oswal, Ashwini; Razi, Adeel; Stephan, Klaas E.; van Wijk, Bernadette C.M.; Ziegler, Gabriel; Zeidman, Peter
2016-01-01
This technical note describes some Bayesian procedures for the analysis of group studies that use nonlinear models at the first (within-subject) level – e.g., dynamic causal models – and linear models at subsequent (between-subject) levels. Its focus is on using Bayesian model reduction to finesse the inversion of multiple models of a single dataset or a single (hierarchical or empirical Bayes) model of multiple datasets. These applications of Bayesian model reduction allow one to consider parametric random effects and make inferences about group effects very efficiently (in a few seconds). We provide the relatively straightforward theoretical background to these procedures and illustrate their application using a worked example. This example uses a simulated mismatch negativity study of schizophrenia. We illustrate the robustness of Bayesian model reduction to violations of the (commonly used) Laplace assumption in dynamic causal modelling and show how its recursive application can facilitate both classical and Bayesian inference about group differences. Finally, we consider the application of these empirical Bayesian procedures to classification and prediction. PMID:26569570
Supervised DNA Barcodes species classification: analysis, comparisons and results
2014-01-01
Background Specific fragments, coming from short portions of DNA (e.g., mitochondrial, nuclear, and plastid sequences), have been defined as DNA Barcode and can be used as markers for organisms of the main life kingdoms. Species classification with DNA Barcode sequences has been proven effective on different organisms. Indeed, specific gene regions have been identified as Barcode: COI in animals, rbcL and matK in plants, and ITS in fungi. The classification problem assigns an unknown specimen to a known species by analyzing its Barcode. This task has to be supported with reliable methods and algorithms. Methods In this work the efficacy of supervised machine learning methods to classify species with DNA Barcode sequences is shown. The Weka software suite, which includes a collection of supervised classification methods, is adopted to address the task of DNA Barcode analysis. Classifier families are tested on synthetic and empirical datasets belonging to the animal, fungus, and plant kingdoms. In particular, the function-based method Support Vector Machines (SVM), the rule-based RIPPER, the decision tree C4.5, and the Naïve Bayes method are considered. Additionally, the classification results are compared with respect to ad-hoc and well-established DNA Barcode classification methods. Results A software that converts the DNA Barcode FASTA sequences to the Weka format is released, to adapt different input formats and to allow the execution of the classification procedure. The analysis of results on synthetic and real datasets shows that SVM and Naïve Bayes outperform on average the other considered classifiers, although they do not provide a human-interpretable classification model. Rule-based methods have slightly inferior classification performances, but deliver the species-specific positions and nucleotide assignments. On synthetic data the supervised machine learning methods obtain superior classification performances with respect to the traditional DNA Barcode classification methods. On empirical data their classification performances are at a comparable level to the other methods. Conclusions The classification analysis shows that supervised machine learning methods are promising candidates for successfully handling the DNA Barcode species classification problem, obtaining excellent performance. To conclude, a powerful tool to perform species identification is now available to the DNA Barcoding community. PMID:24721333
NASA Astrophysics Data System (ADS)
Zhang, Y.; Li, F.; Zhang, S.; Hao, W.; Zhu, T.; Yuan, L.; Xiao, F.
2017-09-01
In this paper, the Statistical Distribution based Conditional Random Fields (STA-CRF) algorithm is exploited for improving marginal ice-water classification. Pixel-level ice concentration is presented as a comparison among the CRF-based methods. Furthermore, in order to explore the effective statistical distribution model to be integrated into STA-CRF, five statistical distribution models are investigated. The STA-CRF methods are tested on 2 scenes around Prydz Bay and Adélie Depression, which contain a variety of ice types during the melt season. Experimental results indicate that the proposed method can resolve the sea ice edge well in the Marginal Ice Zone (MIZ) and shows a robust distinction between ice and water.
Minimum Bayes risk image correlation
NASA Technical Reports Server (NTRS)
Minter, T. C., Jr.
1980-01-01
In this paper, the problem of designing a matched filter for image correlation will be treated as a statistical pattern recognition problem. It is shown that, by minimizing a suitable criterion, a matched filter can be estimated which approximates the optimum Bayes discriminant function in a least-squares sense. It is well known that the use of the Bayes discriminant function in target classification minimizes the Bayes risk, which in turn directly minimizes the probability of a false fix. A fast Fourier implementation of the minimum Bayes risk correlation procedure is described.
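A minimal sketch of the fast Fourier implementation of template correlation with NumPy; the Bayes-risk-minimizing filter design itself is not reproduced here, only the frequency-domain correlation step:

```python
import numpy as np
from numpy.fft import fft2, ifft2

rng = np.random.default_rng(0)
scene = rng.normal(size=(128, 128))
template = scene[40:56, 70:86].copy()            # 16x16 target patch

pad = np.zeros_like(scene)
pad[:16, :16] = template                         # zero-pad template to scene size
corr = np.real(ifft2(fft2(scene) * np.conj(fft2(pad))))  # circular correlation
peak = np.unravel_index(np.argmax(corr), corr.shape)
print("estimated target location:", peak)        # expect (40, 70)
```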
Linear dimension reduction and Bayes classification
NASA Technical Reports Server (NTRS)
Decell, H. P., Jr.; Odell, P. L.; Coberly, W. A.
1978-01-01
An explicit expression for a compression matrix T of smallest possible left dimension K consistent with preserving the n variate normal Bayes assignment of X to a given one of a finite number of populations and the K variate Bayes assignment of TX to that population was developed. The Bayes population assignment of X and TX were shown to be equivalent for a compression matrix T explicitly calculated as a function of the means and covariances of the given populations.
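In our own notation (a hedged restatement; the paper's exact construction is more detailed), the preservation property can be written as follows:

```latex
% Bayes discriminant for population \Pi_i = N(\mu_i, \Sigma_i) with prior p_i:
\[
g_i(x) = \ln p_i - \tfrac{1}{2}\ln\lvert\Sigma_i\rvert
       - \tfrac{1}{2}(x-\mu_i)^{\top}\Sigma_i^{-1}(x-\mu_i).
\]
% A K-by-n compression matrix T preserves Bayes assignments when
\[
\arg\max_i g_i(x) \;=\; \arg\max_i \tilde g_i(Tx) \quad \text{for all } x,
\]
% where \tilde g_i is the K-variate discriminant formed from the compressed
% means T\mu_i and covariances T\Sigma_i T^{\top}.
```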
Prinyakupt, Jaroonrut; Pluempitiwiriyawej, Charnchai
2015-06-30
Blood smear microscopic images are routinely investigated by haematologists to diagnose most blood diseases. However, the task is quite tedious and time consuming. An automatic detection and classification of white blood cells within such images can accelerate the process tremendously. In this paper we propose a system to locate white blood cells within microscopic blood smear images, segment them into nucleus and cytoplasm regions, extract suitable features and finally, classify them into five types: basophil, eosinophil, neutrophil, lymphocyte and monocyte. Two sets of blood smear images were used in this study's experiments. Dataset 1, collected from Rangsit University, were normal peripheral blood slides under light microscope with 100× magnification; 555 images with 601 white blood cells were captured by a Nikon DS-Fi2 high-definition color camera and saved in JPG format of size 960 × 1,280 pixels at 15 pixels per 1 μm resolution. In dataset 2, 477 cropped white blood cell images were downloaded from CellaVision.com. They are in JPG format of size 360 × 363 pixels. The resolution is estimated to be 10 pixels per 1 μm. The proposed system comprises a pre-processing step, nucleus segmentation, cell segmentation, feature extraction, feature selection and classification. The main concept of the segmentation algorithm employed uses white blood cell's morphological properties and the calibrated size of a real cell relative to image resolution. The segmentation process combined thresholding, morphological operation and ellipse curve fitting. Consequently, several features were extracted from the segmented nucleus and cytoplasm regions. Prominent features were then chosen by a greedy search algorithm called sequential forward selection. Finally, with a set of selected prominent features, both linear and naïve Bayes classifiers were applied for performance comparison. This system was tested on normal peripheral blood smear slide images from two datasets. Two sets of comparison were performed: segmentation and classification. The automatically segmented results were compared to the ones obtained manually by a haematologist. It was found that the proposed method is consistent and coherent in both datasets, with dice similarity of 98.9 and 91.6% for average segmented nucleus and cell regions, respectively. Furthermore, the overall correction rate in the classification phase is about 98 and 94% for linear and naïve Bayes models, respectively. The proposed system, based on normal white blood cell morphology and its characteristics, was applied to two different datasets. The results of the calibrated segmentation process on both datasets are fast, robust, efficient and coherent. Meanwhile, the classification of normal white blood cells into five types shows high sensitivity in both linear and naïve Bayes models, with slightly better results in the linear classifier.
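A sketch of the feature-selection and classification stage, assuming scikit-learn: greedy sequential forward selection of prominent features, then linear and naive Bayes classifiers. Synthetic features stand in for the extracted nucleus/cytoplasm measurements and the five cell types:

```python
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=600, n_features=25, n_informative=8,
                           n_classes=5, random_state=0)  # 5 white blood cell types
sfs = SequentialFeatureSelector(LinearDiscriminantAnalysis(),
                                n_features_to_select=8, direction="forward")
Xs = sfs.fit_transform(X, y)
for clf in (LinearDiscriminantAnalysis(), GaussianNB()):
    print(type(clf).__name__, cross_val_score(clf, Xs, y, cv=5).mean().round(3))
```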
Novel naïve Bayes classification models for predicting the chemical Ames mutagenicity.
Zhang, Hui; Kang, Yan-Li; Zhu, Yuan-Yuan; Zhao, Kai-Xia; Liang, Jun-Yu; Ding, Lan; Zhang, Teng-Guo; Zhang, Ji
2017-06-01
Prediction of drug candidates for mutagenicity is a regulatory requirement since mutagenic compounds could pose a toxic risk to humans. The aim of this investigation was to develop a novel prediction model of mutagenicity by using a naïve Bayes classifier. The established model was validated by the internal 5-fold cross validation and external test sets. For comparison, a recursive partitioning classifier prediction model was also established, and various other reported prediction models of mutagenicity were collected. Among these methods, the naïve Bayes classifier established here performed well and stably, yielding average overall prediction accuracies of 89.1±0.4% for the internal 5-fold cross validation of the training set and 77.3±1.5% for external test set I. The concordance of the external test set II with 446 marketed drugs was 90.9±0.3%. In addition, four simple molecular descriptors (e.g., Apol, number of H donors, Num-Rings and Wiener) related to mutagenicity and five representative substructures of mutagens (e.g., aromatic nitro, hydroxyl amine, nitroso, aromatic amine and N-methyl-N-methylenemethanaminum) produced by ECFP_14 fingerprints were identified. We hope the established naïve Bayes prediction model can be applied to risk assessment processes, and that the obtained information on mutagenic chemicals can guide the design of chemical libraries for hit and lead optimization. Copyright © 2017 Elsevier B.V. All rights reserved.
Using clustering and a modified classification algorithm for automatic text summarization
NASA Astrophysics Data System (ADS)
Aries, Abdelkrime; Oufaida, Houda; Nouali, Omar
2013-01-01
In this paper we describe a modified classification method intended for extractive summarization. The classification in this method does not need a learning corpus; it uses the input text itself. First, we cluster the document sentences to exploit the diversity of topics, then we use a learning algorithm (here we used Naive Bayes) on each cluster, considering it as a class. After obtaining the classification model, we calculate the score of a sentence in each class, using a scoring model derived from the classification algorithm. These scores are then used to reorder the sentences and extract the first ones as the output summary. We conducted some experiments using a corpus of scientific papers, and we compared our results to another summarization system called UNIS. We also examined the impact of tuning the clustering threshold on the resulting summary, as well as the impact of adding more features to the classifier. We found that this method gives good performance, and that the addition of new features (which is simple with this method) can improve the summary's accuracy.
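A compact sketch of the pipeline with scikit-learn: cluster the document's sentences, treat clusters as classes for a naive Bayes model, then score and rank sentences by their class log-probabilities. The scoring step is a simplification of the paper's derived scoring model:

```python
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

sentences = ["Naive Bayes assumes feature independence.",
             "Clustering groups sentences by topic.",
             "Bayes models are fast to train.",
             "Topic clusters guide extractive summaries.",
             "Independence assumptions simplify estimation."]
Xv = TfidfVectorizer().fit_transform(sentences)
clusters = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(Xv)
nb = MultinomialNB().fit(Xv, clusters)           # clusters act as classes
scores = nb.predict_log_proba(Xv).max(axis=1)    # sentence score in its best class
summary = [sentences[i] for i in scores.argsort()[::-1][:2]]
print(summary)
```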
Bayard, David S.; Neely, Michael
2016-01-01
An experimental design approach is presented for individualized therapy in the special case where the prior information is specified by a nonparametric (NP) population model. Here, a nonparametric model refers to a discrete probability model characterized by a finite set of support points and their associated weights. An important question arises as to how to best design experiments for this type of model. Many experimental design methods are based on Fisher Information or other approaches originally developed for parametric models. While such approaches have been used with some success across various applications, it is interesting to note that they largely fail to address the fundamentally discrete nature of the nonparametric model. Specifically, the problem of identifying an individual from a nonparametric prior is more naturally treated as a problem of classification, i.e., to find a support point that best matches the patient’s behavior. This paper studies the discrete nature of the NP experiment design problem from a classification point of view. Several new insights are provided including the use of Bayes Risk as an information measure, and new alternative methods for experiment design. One particular method, denoted as MMopt (Multiple-Model Optimal), will be examined in detail and shown to require minimal computation while having distinct advantages compared to existing approaches. Several simulated examples, including a case study involving oral voriconazole in children, are given to demonstrate the usefulness of MMopt in pharmacokinetics applications. PMID:27909942
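A hedged numeric sketch of the classification view: given nonparametric support points (here, clearance values of a one-compartment model, our assumed example) with prior weights, a candidate sample time can be scored by a Monte Carlo estimate of Bayes risk, i.e. the probability of misclassifying the subject's support point from one noisy measurement. MMopt's actual criterion is an analytic overbound of this risk; the simulation below is only illustrative:

```python
import numpy as np

dose, V = 100.0, 20.0                   # assumed dose and volume of distribution
CL = np.array([2.0, 4.0, 8.0])          # support points: clearance (L/h)
w = np.array([0.5, 0.3, 0.2])           # prior weights
sigma = 0.4                             # assumed assay noise SD

def bayes_risk(t, n=20000, rng=np.random.default_rng(0)):
    k = CL / V
    mu = (dose / V) * np.exp(-k * t)    # model-predicted concentration at time t
    true = rng.choice(len(CL), size=n, p=w)
    yobs = mu[true] + rng.normal(0, sigma, n)
    # classify by maximum posterior: prior weight x normal likelihood
    logpost = (np.log(w)[None, :]
               - 0.5 * ((yobs[:, None] - mu[None, :]) / sigma) ** 2)
    return np.mean(logpost.argmax(1) != true)

for t in (0.5, 2.0, 6.0, 12.0):
    print(f"t = {t:4.1f} h   estimated Bayes risk = {bayes_risk(t):.3f}")
```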
NASA Technical Reports Server (NTRS)
Al-Hamdan, Mohammad Z.; Estes, Maurice G., Jr.; Judd, Chaeli; Thom, Ron; Woodruff, Dana; Ellis, Jean T.; Quattrochi, Dale; Watson, Brian; Rodriquez, Hugo; Johnson, Hoyt
2012-01-01
Alabama coastal systems have been subjected to increasing pressure from a variety of activities including urban and rural development, shoreline modifications, industrial activities, and dredging of shipping and navigation channels. The impacts on coastal ecosystems are often observed through the use of indicator species. One such indicator species for aquatic ecosystem health is submerged aquatic vegetation (SAV). Watershed and hydrodynamic modeling has been performed to evaluate the impact of land cover land use (LCLU) change in the two counties surrounding Mobile Bay (Mobile and Baldwin) on SAV stressors and controlling factors (temperature, salinity, and sediment) in the Mobile Bay estuary. Watershed modeling using the Loading Simulation Package in C++ (LSPC) was performed for all watersheds contiguous to Mobile Bay for LCLU scenarios in 1948, 1992, 2001, and 2030. Remotely sensed Landsat-derived National Land Cover Data (NLCD) were used in the 1992 and 2001 simulations after having been reclassified to a common classification scheme. The Prescott Spatial Growth Model was used to project the 2030 LCLU scenario based on current trends. The LSPC model simulations provided output on changes in flow, temperature, and sediment for 22 discharge points into the estuary. These results were inputted in the Environmental Fluid Dynamics Computer Code (EFDC) hydrodynamic model to generate data on changes in temperature, salinity, and sediment on a grid throughout Mobile Bay and adjacent estuaries. The changes in the aquatic ecosystem were used to perform an ecological analysis to evaluate the impact on SAV habitat suitability. This is the key product benefiting the Mobile Bay coastal environmental managers that integrates the influences of temperature, salinity, and sediment due to LCLU driven flow changes with the restoration potential of SAVs. Data products and results are being integrated into NOAA s EcoWatch and Gulf of Mexico Data Atlas online systems for dissemination to coastal resource managers and stakeholders.
NASA Technical Reports Server (NTRS)
Al-Hamdan, Mohammad; Estes, Maurice G., Jr.; Judd, Chaeli; Woodruff, Dana; Ellis, Jean; Quattrochi, Dale; Watson, Brian; Rodriquez, Hugo; Johnson, Hoyt
2012-01-01
Alabama coastal systems have been subjected to increasing pressure from a variety of activities including urban and rural development, shoreline modifications, industrial activities, and dredging of shipping and navigation channels. The impacts on coastal ecosystems are often observed through the use of indicator species. One such indicator species for aquatic ecosystem health is submerged aquatic vegetation (SAV). Watershed and hydrodynamic modeling has been performed to evaluate the impact of land cover land use (LCLU) change in the two counties surrounding Mobile Bay (Mobile and Baldwin) on SAV stressors and controlling factors (temperature, salinity, and sediment) in the Mobile Bay estuary. Watershed modeling using the Loading Simulation Package in C++ (LSPC) was performed for all watersheds contiguous to Mobile Bay for LCLU scenarios in 1948, 1992, 2001, and 2030. Remotely sensed Landsat-derived National Land Cover Data (NLCD) were used in the 1992 and 2001 simulations after having been reclassified to a common classification scheme. The Prescott Spatial Growth Model was used to project the 2030 LCLU scenario based on current trends. The LSPC model simulations provided output on changes in flow, temperature, and sediment for 22 discharge points into the estuary. These results were input into the Environmental Fluid Dynamics Computer Code (EFDC) hydrodynamic model to generate data on changes in temperature, salinity, and sediment on a grid throughout Mobile Bay and adjacent estuaries. The changes in the aquatic ecosystem were used to perform an ecological analysis to evaluate the impact on SAV habitat suitability. This is the key product benefiting the Mobile Bay coastal environmental managers, as it integrates the influences of temperature, salinity, and sediment due to LCLU-driven flow changes with the restoration potential of SAVs. Data products and results are being integrated into NOAA's EcoWatch and Gulf of Mexico Data Atlas online systems for dissemination to coastal resource managers and stakeholders. Objective 1: Develop and utilize land use scenarios for Mobile and Baldwin Counties, AL, as input to models to predict the effects on water properties (temperature, salinity) for Mobile Bay through 2030. Objective 2: Evaluate the impact of land use change on seagrasses and SAV in Mobile Bay. Hypothesis: Urbanization will significantly increase surface flows and impact salinity and temperature variables that affect seagrasses and SAVs.
NASA Astrophysics Data System (ADS)
O'Carroll, Jack P. J.; Kennedy, Robert; Ren, Lei; Nash, Stephen; Hartnett, Michael; Brown, Colin
2017-10-01
The INFOMAR (Integrated Mapping For the Sustainable Development of Ireland's Marine Resource) initiative has acoustically mapped and classified a significant proportion of Ireland's Exclusive Economic Zone (EEZ), and is likely to be an important tool in Ireland's efforts to meet the criteria of the Marine Strategy Framework Directive (MSFD). In this study, open source and relic data were used in combination with new grab survey data to model EUNIS level 4 biotope distributions in Galway Bay, Ireland. The correct prediction rates of two artificial neural networks (ANNs) were compared to assess the effectiveness of acoustic sediment classifications versus sediments that were visually classified by an expert in the field as predictor variables. To test for autocorrelation between predictor variables, the RELATE routine with the Spearman rank correlation method was used. Optimal models were derived by iteratively removing predictor variables and comparing the correct prediction rates of each model. The models with the highest correct prediction rates were chosen as optimal. The optimal models each used a combination of salinity (binary; 0 = polyhaline and 1 = euhaline), proximity to reef (binary; 0 = within 50 m and 1 = outside 50 m), depth (continuous; metres) and a sediment descriptor (acoustic or observed) as predictor variables. As the status of benthic habitats is required to be assessed under the MSFD, the Ecological Status (ES) of the subtidal sediments of Galway Bay was also assessed using the Infaunal Quality Index. The ANN that used observed sediment classes as predictor variables could correctly predict the distribution of biotopes 67% of the time, compared to 63% for the ANN using acoustic sediment classes. Acoustic sediment ANN predictions were affected by local sediment heterogeneity and the lack of a mixed sediment class. The all-round poor performance of the ANNs is likely to be a result of the temporally variable and sparsely distributed data within the study area.
NASA Technical Reports Server (NTRS)
Newcomer, Michelle E.; Kuss, Amber Jean; Nguyen, Andrew; Schmidt, Cynthia L.
2012-01-01
In the past, natural tidal marshes in the south bay were segmented by levees and converted into ponds for use in salt production. In an effort to provide habitat for migratory birds and other native plants and animals, as well as to rebuild natural capital, the South Bay Salt Pond Restoration Project (SBSPRP) is focused on restoring a portion of the over 15,000 acres of wetlands in California's South San Francisco Bay. The process of restoration begins when a levee is breached; the bay water and sediment flow into the ponds and eventually restore natural tidal marshes. Since the spring of 2010 the NASA Ames Research Center (ARC) DEVELOP student internship program has collaborated with the South Bay Salt Pond Restoration Project (SBSPRP) to study the effects of these restoration efforts and to provide valuable information to assist in habitat management and ecological forecasting. All of the studies were based on remote sensing techniques -- NASA's area of expertise in the field of Earth Science, and used various analytical techniques such as predictive modeling, flora and fauna classification, and spectral detection, to name a few. Each study was conducted by a team of aspiring scientists as a part of the DEVELOP program at Ames.
Photometric Supernova Classification with Machine Learning
NASA Astrophysics Data System (ADS)
Lochner, Michelle; McEwen, Jason D.; Peiris, Hiranya V.; Lahav, Ofer; Winter, Max K.
2016-08-01
Automated photometric supernova classification has become an active area of research in recent years in light of current and upcoming imaging surveys such as the Dark Energy Survey (DES) and the Large Synoptic Survey Telescope, given that spectroscopic confirmation of type for all supernovae discovered will be impossible. Here, we develop a multi-faceted classification pipeline, combining existing and new approaches. Our pipeline consists of two stages: extracting descriptive features from the light curves and classification using a machine learning algorithm. Our feature extraction methods vary from model-dependent techniques, namely SALT2 fits, to more independent techniques that fit parametric models to curves, to a completely model-independent wavelet approach. We cover a range of representative machine learning algorithms, including naive Bayes, k-nearest neighbors, support vector machines, artificial neural networks, and boosted decision trees (BDTs). We test the pipeline on simulated multi-band DES light curves from the Supernova Photometric Classification Challenge. Using the commonly used area under the curve (AUC) of the Receiver Operating Characteristic as a metric, we find that the SALT2 fits and the wavelet approach, with the BDTs algorithm, each achieve an AUC of 0.98, where 1 represents perfect classification. We find that a representative training set is essential for good classification, whatever the feature set or algorithm, with implications for spectroscopic follow-up. Importantly, we find that by using either the SALT2 or the wavelet feature sets with a BDT algorithm, accurate classification is possible purely from light curve data, without the need for any redshift information.
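The two-stage shape of such a pipeline (descriptive features extracted from light curves, then a machine-learning classifier scored by AUC) can be sketched as follows. The toy light curves and the three simple features below merely stand in for SALT2 or wavelet features; none of the data are from the study.

```python
# Hedged sketch of a feature-extraction -> boosted-decision-tree pipeline,
# scored by AUC as in the text. The "light curves" are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

def make_curve(is_type_a):
    """Toy light curve: rise/decay profile whose timescale differs by class."""
    t = np.linspace(0, 60, 30)
    tau = 12.0 if is_type_a else 20.0
    flux = t * np.exp(-t / tau) + rng.normal(0, 0.3, t.size)
    # Stage 1: descriptive features (peak flux, time of peak, decline rate)
    peak, i_peak = flux.max(), flux.argmax()
    decline = (flux[i_peak] - flux[-1]) / (t[-1] - t[i_peak] + 1e-9)
    return [peak, t[i_peak], decline]

X = np.array([make_curve(i % 2 == 0) for i in range(1000)])
y = np.array([i % 2 == 0 for i in range(1000)], dtype=int)

# Stage 2: boosted decision trees, evaluated with AUC
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
bdt = GradientBoostingClassifier().fit(X_tr, y_tr)
print("AUC:", roc_auc_score(y_te, bdt.predict_proba(X_te)[:, 1]))
```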
Spatial estimation from remotely sensed data via empirical Bayes models
NASA Technical Reports Server (NTRS)
Hill, J. R.; Hinkley, D. V.; Kostal, H.; Morris, C. N.
1984-01-01
Multichannel satellite image data, available as LANDSAT imagery, are recorded as a multivariate time series (four channels, multiple passovers) in two spatial dimensions. The application of parametric empirical Bayes theory to the classification of, and the estimation of the probability of, each crop type at each of a large number of pixels is considered. This theory involves both the probability distribution of imagery data, conditional on crop types, and the prior spatial distribution of crop types. For the latter, Markov models indexed by estimable parameters are used. A broad outline of the general theory reveals several questions for further research. Some detailed results are given for the special case of two crop types when only a line transect is analyzed. Finally, the estimation of an underlying continuous process on the lattice is discussed, which would be applicable to such quantities as crop yield.
Bayesian model reduction and empirical Bayes for group (DCM) studies.
Friston, Karl J; Litvak, Vladimir; Oswal, Ashwini; Razi, Adeel; Stephan, Klaas E; van Wijk, Bernadette C M; Ziegler, Gabriel; Zeidman, Peter
2016-03-01
This technical note describes some Bayesian procedures for the analysis of group studies that use nonlinear models at the first (within-subject) level - e.g., dynamic causal models - and linear models at subsequent (between-subject) levels. Its focus is on using Bayesian model reduction to finesse the inversion of multiple models of a single dataset or a single (hierarchical or empirical Bayes) model of multiple datasets. These applications of Bayesian model reduction allow one to consider parametric random effects and make inferences about group effects very efficiently (in a few seconds). We provide the relatively straightforward theoretical background to these procedures and illustrate their application using a worked example. This example uses a simulated mismatch negativity study of schizophrenia. We illustrate the robustness of Bayesian model reduction to violations of the (commonly used) Laplace assumption in dynamic causal modelling and show how its recursive application can facilitate both classical and Bayesian inference about group differences. Finally, we consider the application of these empirical Bayesian procedures to classification and prediction. Copyright © 2015 The Authors. Published by Elsevier Inc. All rights reserved.
Bennet, Jaison; Ganaprakasam, Chilambuchelvan Arul; Arputharaj, Kannan
2014-01-01
Cancer classification by doctors and radiologists was based on morphological and clinical features and had limited diagnostic ability in olden days. The recent arrival of DNA microarray technology has led to the concurrent monitoring of thousands of gene expressions in a single chip which stimulates the progress in cancer classification. In this paper, we have proposed a hybrid approach for microarray data classification based on nearest neighbor (KNN), naive Bayes, and support vector machine (SVM). Feature selection prior to classification plays a vital role and a feature selection technique which combines discrete wavelet transform (DWT) and moving window technique (MWT) is used. The performance of the proposed method is compared with the conventional classifiers like support vector machine, nearest neighbor, and naive Bayes. Experiments have been conducted on both real and benchmark datasets and the results indicate that the ensemble approach produces higher classification accuracy than conventional classifiers. This paper serves as an automated system for the classification of cancer and can be applied by doctors in real cases which serve as a boon to the medical community. This work further reduces the misclassification of cancers which is highly not allowed in cancer detection.
Accelerometer and Camera-Based Strategy for Improved Human Fall Detection.
Zerrouki, Nabil; Harrou, Fouzi; Sun, Ying; Houacine, Amrane
2016-12-01
In this paper, we address the problem of detecting human falls using anomaly detection. Detection and classification of falls are based on accelerometric data and variations in human silhouette shape. First, we use the exponentially weighted moving average (EWMA) monitoring scheme to detect a potential fall in the accelerometric data. We used the EWMA to identify features that correspond with a particular type of fall, allowing us to classify falls. Only features corresponding with detected falls were used in the classification phase. Using a subset of the original data to design the classification models minimizes training time and simplifies the models. Based on features corresponding to detected falls, we used the support vector machine (SVM) algorithm to distinguish between true falls and fall-like events. We apply this strategy to the publicly available fall detection database from the University of Rzeszów. Results indicated that our strategy accurately detected and classified fall events, suggesting its potential application to early alert mechanisms in the event of fall situations and its capability for classification of detected falls. Comparison of the classification results of the EWMA-based SVM classifier with those achieved using three commonly used machine learning classifiers (neural network, K-nearest neighbor and naïve Bayes) showed our model to be superior.
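A minimal sketch of this two-step strategy, assuming a 1-D acceleration-magnitude signal: an EWMA statistic with a simplified 3-sigma limit flags candidate falls, and an SVM trained on invented window features then separates true falls from fall-like events. Everything below (signal, limit, features) is a placeholder, not the paper's configuration.

```python
# Step 1: EWMA chart flags candidate falls; step 2: SVM classifies them.
import numpy as np
from sklearn.svm import SVC

def ewma(x, lam=0.2):
    """Exponentially weighted moving average of a 1-D signal."""
    z = np.empty_like(x)
    z[0] = x[0]
    for i in range(1, len(x)):
        z[i] = lam * x[i] + (1 - lam) * z[i - 1]
    return z

rng = np.random.default_rng(1)
accel = rng.normal(1.0, 0.05, 500)   # acceleration magnitude (g), at rest ~1 g
accel[200:205] += 2.5                # injected fall-like spike

z = ewma(accel)
limit = z.mean() + 3 * z.std()       # simplified 3-sigma control limit
alarms = np.flatnonzero(z > limit)
print("EWMA alarms at samples:", alarms)

# Step 2 (sketch): features from flagged windows feed an SVM classifier
X = rng.normal(size=(40, 3))         # stand-in window features
y = rng.integers(0, 2, 40)           # 0 = fall-like event, 1 = true fall
clf = SVC(kernel="rbf").fit(X, y)
print("predicted label for one flagged window:", clf.predict(X[:1])[0])
```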
False-color infrared aerial photography of the Yaquina Bay Estuary, Oregon was acquired at extreme low tides and digitally orthorectified with a ground pixel resolution of 20 cm to provide data for intertidal vegetation mapping. Submerged, semi-exposed and exposed eelgrass mead...
Speaker gender identification based on majority vote classifiers
NASA Astrophysics Data System (ADS)
Mezghani, Eya; Charfeddine, Maha; Nicolas, Henri; Ben Amar, Chokri
2017-03-01
Speaker gender identification is considered among the most important tools in several multimedia applications, namely in automatic speech recognition, interactive voice response systems and audio browsing systems. Gender identification system performance is closely linked to the selected feature set and the employed classification model. Typical techniques are based on selecting the best performing classification method or searching for the optimum tuning of one classifier's parameters through experimentation. In this paper, we consider a relevant and rich set of features involving pitch and MFCCs as well as other temporal and frequency-domain descriptors. Five classification models, including decision tree, discriminant analysis, naïve Bayes, support vector machine and k-nearest neighbor, were evaluated. The three best performing classifiers among the five contribute by majority voting between their scores. Experiments were performed on three datasets spoken in three languages, English, German and Arabic, in order to validate the language independency of the proposed scheme. Results confirm that the presented system reaches a satisfying accuracy rate and promising classification performance thanks to the discriminating abilities and diversity of the used features combined with mid-level statistics.
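The voting stage can be sketched with scikit-learn's VotingClassifier. The three estimators below merely stand in for whichever three classifiers performed best in the study, and the features and labels are random placeholders rather than pitch/MFCC descriptors.

```python
# Minimal sketch of majority voting over three classifiers.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 20))   # stand-in acoustic feature vectors
y = rng.integers(0, 2, 200)      # 0 = female, 1 = male (invented labels)

vote = VotingClassifier(
    estimators=[("dt", DecisionTreeClassifier()),
                ("svm", SVC()),
                ("knn", KNeighborsClassifier())],
    voting="hard",               # hard voting: majority of predicted labels
)
vote.fit(X, y)
print("majority-vote predictions:", vote.predict(X[:5]))
```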
NASA Technical Reports Server (NTRS)
Nguyen, Andrew; Gole, Alexander; Randall, Jarom; Dlott, Glade; Zhang, Sylvia; Alfaro, Brian; Schmidt, Cindy; Skiles, J. W.
2011-01-01
Mapping and predicting the spatial distribution of invasive plant species is central to habitat management but difficult to implement at landscape and regional scales. Remote sensing techniques can reduce the impact field campaigns have on these ecologically sensitive areas and can provide a regional and multi-temporal view of invasive species spread. Invasive perennial pepperweed (Lepidium latifolium) is now widespread in fragmented estuaries of the South San Francisco Bay, and is shown to degrade native vegetation in estuaries and adjacent habitats, thereby reducing forage and shelter for wildlife. The purpose of this study is to map the present distribution of pepperweed in estuarine areas of the South San Francisco Bay Salt Pond Restoration Project (Alviso, CA), and to create a habitat suitability model to predict future spread. Pepperweed reflectance data were collected in-situ with a GER 1500 spectroradiometer along with 88 corresponding pepperweed presence and absence points used for building the statistical models. The spectral angle mapper (SAM) classification algorithm was used to distinguish the reflectance spectrum of pepperweed and map its distribution using an image from EO-1 Hyperion. To map pepperweed, we performed a supervised classification on an ASTER image with a resulting classification accuracy of 71.8%. We generated a weighted overlay analysis model within a geographic information system (GIS) framework to predict areas in the study site most susceptible to pepperweed colonization. Variables for the model included propensity for disturbance, status of pond restoration, proximity to water channels, and terrain curvature. A Generalized Additive Model (GAM) was also used to generate a probability map and investigate the statistical probability that each variable contributed to predicting pepperweed spread. Results from the GAM revealed that distance to channels, distance to ponds and curvature were statistically significant (p < 0.01) in determining the locations of suitable pepperweed habitats.
On the classification techniques in data mining for microarray data classification
NASA Astrophysics Data System (ADS)
Aydadenta, Husna; Adiwijaya
2018-03-01
Cancer is one of the deadliest diseases; according to WHO data, by 2015 there were 8.8 million deaths caused by cancer, and this number will increase every year if not addressed earlier. Microarray data have become one of the most popular resources for cancer-identification studies in the field of health, since microarray data can be used to examine levels of gene expression in particular cell samples, allowing thousands of genes to be analyzed simultaneously. By using data mining techniques, we can classify microarray data samples and thereby identify whether cancer is present or not. In this paper we discuss research applying several data mining techniques to microarray data, namely Support Vector Machine (SVM), Artificial Neural Network (ANN), Naive Bayes, k-Nearest Neighbor (kNN), and C4.5, together with a simulation of the Random Forest algorithm using the Relief dimension-reduction technique. The results show the performance measure (accuracy) of each classification algorithm and indicate that the accuracy of the Random Forest algorithm is higher than that of the other classification algorithms (SVM, ANN, Naive Bayes, kNN, and C4.5). It is hoped that this paper can provide some information about the speed, accuracy, performance and computational cost of each data mining classification technique on microarray data.
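A comparison of this kind can be sketched with scikit-learn as below. Relief is not available in scikit-learn, so a simple variance filter stands in for the dimension-reduction step, and the microarray-like matrix and labels are random placeholders.

```python
# Sketch: cross-validated accuracy of several classifiers on a stand-in
# high-dimensional "microarray" matrix, after a placeholder feature filter.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.feature_selection import VarianceThreshold
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 2000))              # 60 samples x 2000 "genes"
y = rng.integers(0, 2, 60)                   # cancer / not-cancer (invented)
X = VarianceThreshold(0.9).fit_transform(X)  # placeholder for Relief

models = {"SVM": SVC(), "ANN": MLPClassifier(max_iter=500),
          "Naive Bayes": GaussianNB(), "kNN": KNeighborsClassifier(),
          "C4.5-like tree": DecisionTreeClassifier(),
          "Random Forest": RandomForestClassifier()}
for name, model in models.items():
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {acc:.2f}")
```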
Ruvindy, Rendy; White III, Richard Allen; Neilan, Brett Anthony; Burns, Brendan Paul
2016-01-01
Modern microbial mats are potential analogues of some of Earth's earliest ecosystems. Excellent examples can be found in Shark Bay, Australia, with mats of various morphologies. To further our understanding of the functional genetic potential of these complex microbial ecosystems, we conducted shotgun metagenomic analyses for the first time. We assembled metagenomic next-generation sequencing data to classify the taxonomic and metabolic potential across diverse morphologies of marine mats in Shark Bay. The microbial community across taxonomic classifications, using protein-coding and small subunit rRNA genes directly extracted from the metagenomes, suggests that three phyla, Proteobacteria, Cyanobacteria and Bacteroidetes, dominate all marine mats. However, the microbial community structures of the Shark Bay and Highbourne Cay (Bahamas) marine systems appear to be distinct from each other. The metabolic potential (based on SEED subsystem classifications) of the Shark Bay and Highbourne Cay microbial communities was also distinct. Shark Bay metagenomes have a metabolic pathway profile consisting of both heterotrophic and photosynthetic pathways, whereas Highbourne Cay appears to be dominated almost exclusively by photosynthetic pathways. Alternative non-RuBisCO-based carbon metabolism, including the reductive TCA cycle and 3-hydroxypropionate/4-hydroxybutyrate pathways, is highly represented in Shark Bay metagenomes while not represented in Highbourne Cay microbial mats or any other mat-forming ecosystems investigated to date. Potentially novel aspects of nitrogen cycling were also observed, as well as putative heavy metal cycling (arsenic, mercury, copper and cadmium). Finally, archaea are highly represented in Shark Bay and may have critical roles in overall ecosystem function in these modern microbial mats. PMID:26023869
Remote sensing of Earth terrain
NASA Technical Reports Server (NTRS)
Kong, Jin Au; Shin, Robert T.; Nghiem, Son V.; Yueh, Herng-Aung; Han, Hsiu C.; Lim, Harold H.; Arnold, David V.
1990-01-01
Remote sensing of earth terrain is examined. The layered random medium model is used to investigate the fully polarimetric scattering of electromagnetic waves from vegetation. The model is used to interpret the measured data for vegetation fields such as rice, wheat, or soybean over water or soil. Accurate calibration of polarimetric radar systems is essential for the polarimetric remote sensing of earth terrain. A polarimetric calibration algorithm using three arbitrary in-scene reflectors is developed. In the interpretation of active and passive microwave remote sensing data from earth terrain, the random medium model was shown to be quite successful. A multivariate K-distribution is proposed to model the statistics of fully polarimetric radar returns from earth terrain. In terrain cover classification using synthetic aperture radar (SAR) images, application of the K-distribution model provides better performance than conventional Gaussian classifiers. The layered random medium model is used to study the polarimetric response of sea ice. Supervised and unsupervised classification procedures are also developed and applied to synthetic aperture radar polarimetric images in order to identify their various earth terrain components for more than two classes. These classification procedures were applied to San Francisco Bay and Traverse City SAR images.
Peltokoski, Jaana; Vehviläinen-Julkunen, Katri; Pitkäaho, Taina; Mikkonen, Santtu; Miettinen, Merja
2015-10-01
To examine the relationship of a comprehensive health care orientation process with a hospital's attractiveness. Little is known about which indicators of the employee orientation process best explain a hospital organisation's attractiveness. Empirical data were collected from registered nurses (n = 145) and physicians (n = 37) working in two specialised hospital districts. A Naive Bayes Classification was applied to examine the comprehensive orientation process indicators that predict a hospital's attractiveness. The model was composed of five orientation process indicators: the contribution of the orientation process to nurses' and physicians' intention to stay; the defined responsibilities of the orientation process; interaction between newcomer and colleagues; responsibilities that are adapted for tasks; and newcomers' baseline knowledge assessment, which should be done before the orientation phase. The Naive Bayes Classification was used to explore the employee orientation process and related indicators. The model constructed provides insight that can be used in designing and implementing the orientation process to promote the hospital organisation's attractiveness. Managers should focus on developing fluently organised orientation practices based on the indicators that predict the hospital's attractiveness. For the purpose of personalised orientation, employees' baseline knowledge and competence level should be assessed before the orientation phase. © 2014 John Wiley & Sons Ltd.
Garcia-Chimeno, Yolanda; Garcia-Zapirain, Begonya; Gomez-Beldarrain, Marian; Fernandez-Ruanova, Begonya; Garcia-Monco, Juan Carlos
2017-04-13
Feature selection methods are commonly used to identify subsets of relevant features to facilitate the construction of models for classification, yet little is known about how feature selection methods perform in diffusion tensor images (DTIs). In this study, feature selection and machine learning classification methods were tested for the purpose of automating diagnosis of migraines using both DTIs and questionnaire answers related to emotion and cognition - factors that influence pain perception. We selected 52 adult subjects for the study, divided into three groups: a control group (15), subjects with sporadic migraine (19) and subjects with chronic migraine and medication overuse (18). These subjects underwent diffusion tensor magnetic resonance imaging to assess the white matter pathway integrity of the regions of interest involved in pain and emotion. The tests also gathered data about pathology. The DTI images and test results were then introduced into feature selection algorithms (Gradient Tree Boosting, L1-based, Random Forest and Univariate) to reduce the features of the first dataset, and into classification algorithms (SVM (Support Vector Machine), Boosting (Adaboost) and Naive Bayes) to perform classification of the migraine groups. Moreover, we implemented a committee method to improve classification accuracy based on the feature selection algorithms. When classifying the migraine groups, the greatest improvements in accuracy were made using the proposed committee-based feature selection method. Using this approach, the accuracy of classification into three types improved from 67 to 93% when using the Naive Bayes classifier, from 90 to 95% with the support vector machine classifier, and from 93 to 94% with boosting. The features determined to be most useful for classification were related to pain, analgesics and the left uncinate region of the brain (connected with pain and emotion). The proposed feature selection committee method improved the performance of migraine diagnosis classifiers compared to individual feature selection methods, producing a robust system that achieved over 90% accuracy in all classifiers. The results suggest that the proposed methods can be used to support specialists in the classification of migraines in patients undergoing magnetic resonance imaging.
A Lightweight Hierarchical Activity Recognition Framework Using Smartphone Sensors
Han, Manhyung; Bang, Jae Hun; Nugent, Chris; McClean, Sally; Lee, Sungyoung
2014-01-01
Activity recognition for the purposes of recognizing a user's intentions using multimodal sensors is becoming a widely researched topic largely based on the prevalence of the smartphone. Previous studies have reported the difficulty in recognizing life-logs by only using a smartphone due to the challenges with activity modeling and real-time recognition. In addition, recognizing life-logs is difficult due to the absence of an established framework which enables the use of different sources of sensor data. In this paper, we propose a smartphone-based Hierarchical Activity Recognition Framework which extends the Naïve Bayes approach for the processing of activity modeling and real-time activity recognition. The proposed algorithm demonstrates higher accuracy than the Naïve Bayes approach and also enables the recognition of a user's activities within a mobile environment. The proposed algorithm has the ability to classify fifteen activities with an average classification accuracy of 92.96%. PMID:25184486
Using Loss Functions for DIF Detection: An Empirical Bayes Approach.
ERIC Educational Resources Information Center
Zwick, Rebecca; Thayer, Dorothy; Lewis, Charles
2000-01-01
Studied a method for flagging differential item functioning (DIF) based on loss functions. Builds on earlier research that led to the development of an empirical Bayes enhancement to the Mantel-Haenszel DIF analysis. Tested the method through simulation and found its performance better than some commonly used DIF classification systems. (SLD)
A hybrid approach to select features and classify diseases based on medical data
NASA Astrophysics Data System (ADS)
AbdelLatif, Hisham; Luo, Jiawei
2018-03-01
Feature selection is a popular problem in the classification of diseases in clinical medicine. Here, we develop a hybrid methodology to classify diseases based on three medical datasets: the Arrhythmia, Breast cancer, and Hepatitis datasets. This methodology, called k-means ANOVA Support Vector Machine (K-ANOVA-SVM), uses k-means clustering with the ANOVA statistic to preprocess the data and select the significant features, and Support Vector Machines in the classification process. To compare and evaluate the performance, we chose three classification algorithms, decision tree, Naïve Bayes, and Support Vector Machines, and applied the medical datasets directly to these algorithms. Our methodology gave much better classification accuracy, 98% on the Arrhythmia dataset, 92% on the Breast cancer dataset and 88% on the Hepatitis dataset, compared to applying the medical data directly to decision tree, Naïve Bayes, and Support Vector Machines. The ROC curve and precision achieved with K-ANOVA-SVM were also the best among the compared algorithms.
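Since the abstract does not spell out the pipeline, the following is only a plausible reconstruction of the K-ANOVA-SVM idea in scikit-learn: k-means cluster membership is appended as a derived feature, ANOVA F-test feature selection keeps the most significant attributes, and an SVM classifies. The data, cluster count, and number of selected features are all invented.

```python
# Hedged sketch: k-means preprocessing + ANOVA feature selection + SVM.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.pipeline import Pipeline
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 30))   # stand-in clinical feature matrix
y = rng.integers(0, 2, 300)      # diseased / healthy (invented labels)

# k-means as a preprocessing step: append each sample's cluster id as a feature
clusters = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
X_aug = np.column_stack([X, clusters])

model = Pipeline([
    ("anova", SelectKBest(f_classif, k=10)),  # keep 10 most significant features
    ("svm", SVC(kernel="rbf")),
])
print("CV accuracy: %.2f" % cross_val_score(model, X_aug, y, cv=5).mean())
```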
NASA Technical Reports Server (NTRS)
Mulligan, P. J.; Gervin, J. C.; Lu, Y. C.
1985-01-01
An area bordering the Eastern Shore of the Chesapeake Bay was selected for study and classified using unsupervised techniques applied to LANDSAT-2 MSS data and several band combinations of LANDSAT-4 TM data. The accuracies of these Level I land cover classifications were verified using the Taylor's Island USGS 7.5 minute topographic map, which was photointerpreted, digitized and rasterized. For the Taylor's Island map, comparing the MSS and TM three-band (2, 3, 4) classifications, the increased resolution of TM produced a small improvement in overall accuracy of 1%, due primarily to small improvements of 1% and 3% in areas such as water and woodland. This was expected, as the MSS data typically produce high accuracies for categories which cover large contiguous areas. However, in the categories covering smaller areas within the map there was generally an improvement of at least 10%. Classification of the important residential category improved 12%, and wetlands were mapped with 11% greater accuracy.
Awaysheh, Abdullah; Wilcke, Jeffrey; Elvinger, François; Rees, Loren; Fan, Weiguo; Zimmerman, Kurt L
2016-11-01
Inflammatory bowel disease (IBD) and alimentary lymphoma (ALA) are common gastrointestinal diseases in cats. The very similar clinical signs and histopathologic features of these diseases make the distinction between them diagnostically challenging. We tested the use of supervised machine-learning algorithms to differentiate between the 2 diseases using data generated from noninvasive diagnostic tests. Three prediction models were developed using 3 machine-learning algorithms: naive Bayes, decision trees, and artificial neural networks. The models were trained and tested on data from complete blood count (CBC) and serum chemistry (SC) results for the following 3 groups of client-owned cats: normal, inflammatory bowel disease (IBD), or alimentary lymphoma (ALA). Naive Bayes and artificial neural networks achieved higher classification accuracy (sensitivities of 70.8% and 69.2%, respectively) than the decision tree algorithm (63%, p < 0.0001). The areas under the receiver-operating characteristic curve for classifying cases into the 3 categories were 83% for naive Bayes, 79% for the decision tree, and 82% for artificial neural networks. Prediction models using machine learning provided a method for distinguishing between ALA-IBD, ALA-normal, and IBD-normal. The naive Bayes and artificial neural networks classifiers used 10 and 4 of the CBC and SC variables, respectively, to outperform the C4.5 decision tree, which used 5 CBC and SC variables in classifying cats into the 3 classes. These models can provide another noninvasive diagnostic tool to assist clinicians with differentiating between IBD and ALA, and between diseased and nondiseased cats. © 2016 The Author(s).
Bayes Error Rate Estimation Using Classifier Ensembles
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Ghosh, Joydeep
2003-01-01
The Bayes error rate gives a statistical lower bound on the error achievable for a given classification problem and the associated choice of features. By reliably estimating this rate, one can assess the usefulness of the feature set that is being used for classification. Moreover, by comparing the accuracy achieved by a given classifier with the Bayes rate, one can quantify how effective that classifier is. Classical approaches for estimating or finding bounds for the Bayes error in general yield rather weak results for small sample sizes, unless the problem has some simple characteristics, such as Gaussian class-conditional likelihoods. This article shows how the outputs of a classifier ensemble can be used to provide reliable and easily obtainable estimates of the Bayes error with negligible extra computation. Three methods of varying sophistication are described. First, we present a framework that estimates the Bayes error when multiple classifiers, each providing an estimate of the a posteriori class probabilities, are combined through averaging. Second, we bolster this approach by adding an information-theoretic measure of output correlation to the estimate. Finally, we discuss a more general method that just looks at the class labels indicated by ensemble members and provides error estimates based on the disagreements among classifiers. The methods are illustrated on artificial data, a difficult four-class problem involving underwater acoustic data, and two problems from the Proben1 benchmarks. For data sets with known Bayes error, the combiner-based methods introduced in this article outperform existing methods. The estimates obtained by the proposed methods also seem quite reliable for the real-life data sets for which the true Bayes rates are unknown.
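Two of these estimators can be sketched directly. The ensemble "posteriors" below are random stand-ins, so the printed numbers are meaningless; the point is only the shape of the computations: (1) the Bayes error implied by averaged posterior estimates, E[1 - max_c p(c|x)], and (2) a disagreement-based proxy computed from predicted labels alone.

```python
# Sketch of posterior-averaging and disagreement-based Bayes error estimates.
import numpy as np

rng = np.random.default_rng(0)
n_classifiers, n_samples, n_classes = 7, 1000, 3

# Each classifier outputs (noisy) posterior estimates; rows sum to one
logits = rng.normal(size=(n_classifiers, n_samples, n_classes))
posteriors = np.exp(logits) / np.exp(logits).sum(axis=2, keepdims=True)

# (1) Average posteriors across the ensemble, then E[1 - max_c p(c|x)]
avg_post = posteriors.mean(axis=0)
bayes_error_est = (1.0 - avg_post.max(axis=1)).mean()

# (2) Label disagreement: fraction of members voting against the plurality
labels = posteriors.argmax(axis=2)            # shape (classifiers, samples)
def plurality_disagreement(col):
    counts = np.bincount(col, minlength=n_classes)
    return 1.0 - counts.max() / len(col)
disagreement = np.mean([plurality_disagreement(labels[:, i])
                        for i in range(n_samples)])

print(f"posterior-based estimate: {bayes_error_est:.3f}")
print(f"disagreement-based proxy: {disagreement:.3f}")
```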
NASA Technical Reports Server (NTRS)
Estes, Maurice G.; Al-Hamdan, Mohammed; Thom, Ron; Quattrochi, Dale; Woodruff, Dana; Judd, Chaeli; Ellis, Jean; Watson, Brian; Rodriguez, Hugo; Johnson, Hoyt
2009-01-01
There is a continued need to understand how human activities along the northern Gulf of Mexico coast are impacting the natural ecosystems. The gulf coast is experiencing rapid population growth and associated land cover/land use change. Mobile Bay, AL is a designated pilot region of the Gulf of Mexico Alliance (GOMA) and is the focus area of many current NASA and NOAA studies. This is a critical region, both ecologically and economically, for the entire United States because it has the fourth largest freshwater inflow in the continental USA, is a vital nursery habitat for commercially and recreationally important fisheries, and houses a working waterfront and port that is expanding. Watershed and hydrodynamic modeling has been performed for Mobile Bay to evaluate the impact of land use change in Mobile and Baldwin counties on the aquatic ecosystem. Watershed modeling using the Loading Simulation Package in C++ (LSPC) was performed for all watersheds contiguous to Mobile Bay for land use scenarios in 1948, 1992, 2001, and 2030. The Prescott Spatial Growth Model was used to project the 2030 land use scenario based on observed trends. All land use scenarios were developed to a common land classification system created by merging the 1992 and 2001 National Land Cover Data (NLCD). The LSPC model output provides changes in flow, temperature, sediments and general water quality for 22 discharge points into the Bay. These results were input into the Environmental Fluid Dynamics Computer Code (EFDC) hydrodynamic model to generate data on changes in temperature, salinity, and sediment concentrations on a grid with four vertical profiles throughout the Bay's aquatic ecosystems. The models were calibrated using in-situ data collected at sampling stations in and around Mobile Bay. This phase of the project has focused on sediment modeling because of its significant influence on light attenuation, which is a critical factor in the health of submerged aquatic vegetation. The impact of land use change on sediment concentrations was evaluated by analyzing the LSPC and EFDC sediment simulations for the four land use scenarios. Such analysis was also performed for storm and non-storm periods. In-situ data of total suspended sediments (TSS) and light attenuation were used to develop a regression model to estimate light attenuation from TSS. This regression model was used to derive marine light attenuation estimates throughout Mobile Bay using the EFDC TSS outputs. The changes in sediment concentrations and the associated impact on light attenuation in the aquatic ecosystem were used to perform an ecological analysis evaluating the impact on seagrass and Submerged Aquatic Vegetation (SAV) habitat. This is the key product benefiting the Mobile Bay coastal environmental managers, as it integrates the influences of sediments due to land-use-driven flow changes with the restoration potential of SAVs.
The Still Bay and Howiesons Poort at Sibudu and Blombos: Understanding Middle Stone Age Technologies
Soriano, Sylvain; Villa, Paola; Delagnes, Anne; Degano, Ilaria; Pollarolo, Luca; Lucejko, Jeannette J.; Henshilwood, Christopher; Wadley, Lyn
2015-01-01
The classification of archaeological assemblages in the Middle Stone Age of South Africa in terms of diversity and temporal continuity has significant implications with respect to recent cultural evolutionary models which propose either gradual accumulation or discontinuous, episodic processes for the emergence and diffusion of cultural traits. We present the results of a systematic technological and typological analysis of the Still Bay assemblages from Sibudu and Blombos. A similar approach is used in the analysis of the Howiesons Poort (HP) assemblages from Sibudu seen in comparison with broadly contemporaneous assemblages from Rose Cottage and Klasies River Cave 1A. Using our own and published data from other sites we report on the diversity between stone artifact assemblages and discuss to what extent they can be grouped into homogeneous lithic sets. The gradual evolution of debitage techniques within the Howiesons Poort sequence with a progressive abandonment of the HP technological style argues against the saltational model for its disappearance while the technological differences between the Sibudu and Blombos Still Bay artifacts considerably weaken an interpretation of similarities between the assemblages and their grouping into the same cultural unit. Limited sampling of a fragmented record may explain why simple models of cultural evolution do not seem to apply to a complex reality. PMID:26161665
Hybrid analysis for indicating patients with breast cancer using temperature time series.
Silva, Lincoln F; Santos, Alair Augusto S M D; Bravo, Renato S; Silva, Aristófanes C; Muchaluat-Saade, Débora C; Conci, Aura
2016-07-01
Breast cancer is the most common cancer among women worldwide. Diagnosis and treatment in early stages increase cure chances. The temperature of cancerous tissue is generally higher than that of healthy surrounding tissues, making thermography an option to be considered in screening strategies of this cancer type. This paper proposes a hybrid methodology for analyzing dynamic infrared thermography in order to indicate patients with risk of breast cancer, using unsupervised and supervised machine learning techniques, which characterizes the methodology as hybrid. The dynamic infrared thermography monitors or quantitatively measures temperature changes on the examined surface, after a thermal stress. In the dynamic infrared thermography execution, a sequence of breast thermograms is generated. In the proposed methodology, this sequence is processed and analyzed by several techniques. First, the region of the breasts is segmented and the thermograms of the sequence are registered. Then, temperature time series are built and the k-means algorithm is applied on these series using various values of k. Clustering formed by k-means algorithm, for each k value, is evaluated using clustering validation indices, generating values treated as features in the classification model construction step. A data mining tool was used to solve the combined algorithm selection and hyperparameter optimization (CASH) problem in classification tasks. Besides the classification algorithm recommended by the data mining tool, classifiers based on Bayesian networks, neural networks, decision rules and decision tree were executed on the data set used for evaluation. Test results support that the proposed analysis methodology is able to indicate patients with breast cancer. Among 39 tested classification algorithms, K-Star and Bayes Net presented 100% classification accuracy. Furthermore, among the Bayes Net, multi-layer perceptron, decision table and random forest classification algorithms, an average accuracy of 95.38% was obtained. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
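The unsupervised feature-construction step can be sketched as follows: k-means is run on the temperature time series for several values of k, and a cluster-validity index for each k (silhouette here, as one example of such an index) becomes a feature for the later classifiers. The series below are random stand-ins, not thermography data.

```python
# Sketch: clustering-validity indices over several k values become features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
series = rng.normal(size=(120, 20))   # 120 pixel time series, 20 time points

features = []
for k in range(2, 6):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(series)
    features.append(silhouette_score(series, km.labels_))
print("validity-index feature vector:", np.round(features, 3))
```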
ERIC Educational Resources Information Center
Rudner, Lawrence
2016-01-01
In the machine learning literature, it is commonly accepted as fact that as calibration sample sizes increase, Naïve Bayes classifiers initially outperform Logistic Regression classifiers in terms of classification accuracy. Applied to subtests from an on-line final examination and from a highly regarded certification examination, this study shows…
Can single classifiers be as useful as model ensembles to produce benthic seabed substratum maps?
NASA Astrophysics Data System (ADS)
Turner, Joseph A.; Babcock, Russell C.; Hovey, Renae; Kendrick, Gary A.
2018-05-01
Numerous machine-learning classifiers are available for benthic habitat map production, which can lead to different results. This study highlights the performance of the Random Forest (RF) classifier, which was significantly better than Classification Trees (CT), Naïve Bayes (NB), and a multi-model ensemble in terms of overall accuracy, Balanced Error Rate (BER), Kappa, and area under the curve (AUC) values. RF accuracy was often higher than 90% for each substratum class, even at the most detailed level of the substratum classification and AUC values also indicated excellent performance (0.8-1). Total agreement between classifiers was high at the broadest level of classification (75-80%) when differentiating between hard and soft substratum. However, this sharply declined as the number of substratum categories increased (19-45%) including a mix of rock, gravel, pebbles, and sand. The model ensemble, produced from the results of all three classifiers by majority voting, did not show any increase in predictive performance when compared to the single RF classifier. This study shows how a single classifier may be sufficient to produce benthic seabed maps and model ensembles of multiple classifiers.
Ensemble Methods for Classification of Physical Activities from Wrist Accelerometry.
Chowdhury, Alok Kumar; Tjondronegoro, Dian; Chandran, Vinod; Trost, Stewart G
2017-09-01
To investigate whether the use of ensemble learning algorithms improves physical activity recognition accuracy compared to single classifier algorithms, and to compare the classification accuracy achieved by three conventional ensemble machine learning methods (bagging, boosting, random forest) and a custom ensemble model comprising four algorithms commonly used for activity recognition (binary decision tree, k nearest neighbor, support vector machine, and neural network). The study used three independent data sets that included wrist-worn accelerometer data. For each data set, a four-step classification framework consisting of data preprocessing, feature extraction, normalization and feature selection, and classifier training and testing was implemented. For the custom ensemble, decisions from the single classifiers were aggregated using three decision fusion methods: weighted majority vote, naïve Bayes combination, and behavior knowledge space combination. Classifiers were cross-validated using leave-one-subject-out cross-validation and compared on the basis of average F1 scores. In all three data sets, ensemble learning methods consistently outperformed the individual classifiers. Among the conventional ensemble methods, random forest models provided consistently high activity recognition; however, the custom ensemble model using weighted majority voting demonstrated the highest classification accuracy in two of the three data sets. Combining multiple individual classifiers using conventional or custom ensemble learning methods can improve activity recognition accuracy from wrist-worn accelerometer data.
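The weighted-majority-vote fusion can be sketched in a few lines: each base classifier's vote is weighted (here by a hypothetical validation F1 score) and the class with the highest total weight wins. The votes and weights below are invented.

```python
# Minimal sketch of weighted majority vote decision fusion.
import numpy as np

# Predicted labels from 4 base classifiers (tree, kNN, SVM, NN) for 6 windows
votes = np.array([[0, 1, 1, 0, 2, 1],
                  [0, 1, 1, 0, 2, 2],
                  [1, 1, 0, 0, 2, 1],
                  [0, 1, 1, 1, 2, 1]])
weights = np.array([0.71, 0.78, 0.83, 0.80])  # invented validation F1 scores
n_classes = 3

def weighted_vote(col, w, k):
    """Sum each classifier's weight onto its voted class; return the winner."""
    scores = np.zeros(k)
    for label, wi in zip(col, w):
        scores[label] += wi
    return int(scores.argmax())

fused = [weighted_vote(votes[:, i], weights, n_classes)
         for i in range(votes.shape[1])]
print("fused activity labels:", fused)
```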
LDA boost classification: boosting by topics
NASA Astrophysics Data System (ADS)
Lei, La; Qiao, Guo; Qimin, Cao; Qitao, Li
2012-12-01
AdaBoost is an efficacious classification algorithm, especially in text categorization (TC) tasks. The methodology of setting up a classifier committee and voting on the documents for classification can achieve high categorization precision. However, the traditional Vector Space Model can easily lead to the curse of dimensionality and feature sparsity problems, which seriously affect classification performance. This article proposes a novel classification algorithm called LDABoost, based on the boosting ideology, which uses Latent Dirichlet Allocation (LDA) to model the feature space. Instead of using words or phrases, LDABoost uses latent topics as the features. In this way, the feature dimension is significantly reduced. An improved Naïve Bayes (NB) is designed as the weak classifier, which keeps the efficiency advantage of the classic NB algorithm and has higher precision. Moreover, a two-stage iterative weighting method called Cute Integration is proposed for improving accuracy by integrating the weak classifiers into a strong classifier in a more rational way. Mutual Information is used as the metric for weight allocation. The voting information and the categorization decisions made by the basis classifiers are fully utilized for generating the strong classifier. Experimental results reveal that LDABoost, performing categorization in a low-dimensional space, has higher accuracy than traditional AdaBoost algorithms and many other classic classification algorithms. Moreover, its runtime consumption is lower than that of different versions of AdaBoost and of TC algorithms based on support vector machines and neural networks.
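A loose sketch of this recipe with scikit-learn stand-ins: LDA topic proportions replace the raw term vector, and boosted Naive Bayes learners classify in the reduced topic space. The corpus and labels are invented, and the paper's Cute Integration weighting is simplified here to standard AdaBoost.

```python
# Hedged sketch: LDA topics as features + boosted Naive Bayes weak learners.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.ensemble import AdaBoostClassifier
from sklearn.naive_bayes import GaussianNB

docs = ["the stock market fell sharply", "shares and bonds rallied today",
        "the team won the final match", "a stunning goal sealed the game"] * 25
labels = [0, 0, 1, 1] * 25   # finance vs. sport (invented labels)

counts = CountVectorizer().fit_transform(docs)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
topics = lda.fit_transform(counts)   # documents represented in 5-topic space

# Naive Bayes as the weighted base learner inside boosting; on scikit-learn
# versions before 1.2 the keyword is base_estimator rather than estimator.
booster = AdaBoostClassifier(estimator=GaussianNB(), n_estimators=30,
                             random_state=0).fit(topics, labels)
print("train accuracy:", booster.score(topics, labels))
```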
A new computational strategy for predicting essential genes.
Cheng, Jian; Wu, Wenwu; Zhang, Yinwen; Li, Xiangchen; Jiang, Xiaoqian; Wei, Gehong; Tao, Shiheng
2013-12-21
Determination of the minimum gene set for cellular life is one of the central goals in biology. Genome-wide essential gene identification has progressed rapidly in certain bacterial species; however, it remains difficult to achieve in most eukaryotic species. Several computational models have recently been developed to integrate gene features and used as alternatives to transfer gene essentiality annotations between organisms. We first collected features that were widely used by previous predictive models and assessed the relationships between gene features and gene essentiality using a stepwise regression model. We found two issues that could significantly reduce model accuracy: (i) the effect of multicollinearity among gene features and (ii) the diverse and even contrasting correlations between gene features and gene essentiality existing within and among different species. To address these issues, we developed a novel model called feature-based weighted Naïve Bayes model (FWM), which is based on Naïve Bayes classifiers, logistic regression, and genetic algorithm. The proposed model assesses features and filters out the effects of multicollinearity and diversity. The performance of FWM was compared with other popular models, such as support vector machine, Naïve Bayes model, and logistic regression model, by applying FWM to reciprocally predict essential genes among and within 21 species. Our results showed that FWM significantly improves the accuracy and robustness of essential gene prediction. FWM can remarkably improve the accuracy of essential gene prediction and may be used as an alternative method for other classification work. This method can contribute substantially to the knowledge of the minimum gene sets required for living organisms and the discovery of new drug targets.
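The core of a feature-weighted Naive Bayes can be sketched as below, where each per-feature log-likelihood term carries a weight; in the paper such weights would be searched by a genetic algorithm, whereas here they are fixed by hand purely to illustrate the scoring rule. The data are random stand-ins, not gene features.

```python
# Sketch of a Gaussian Naive Bayes with per-feature weights (FWM-style idea).
import numpy as np

def weighted_gnb_fit(X, y):
    """Per-class Gaussian statistics (mean, variance, prior)."""
    classes = np.unique(y)
    stats = {c: (X[y == c].mean(0), X[y == c].var(0) + 1e-9,
                 np.mean(y == c)) for c in classes}
    return classes, stats

def weighted_gnb_predict(X, classes, stats, w):
    preds = []
    for x in X:
        scores = []
        for c in classes:
            mu, var, prior = stats[c]
            loglik = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
            scores.append(np.log(prior) + np.sum(w * loglik))  # weighted terms
        preds.append(classes[int(np.argmax(scores))])
    return np.array(preds)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(1, 1, (50, 4))])
y = np.repeat([0, 1], 50)
classes, stats = weighted_gnb_fit(X, y)
w = np.array([1.0, 1.0, 0.2, 0.2])   # a genetic algorithm would search these
print("accuracy:", (weighted_gnb_predict(X, classes, stats, w) == y).mean())
```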
Sound Classification in Hearing Aids Inspired by Auditory Scene Analysis
NASA Astrophysics Data System (ADS)
Büchler, Michael; Allegro, Silvia; Launer, Stefan; Dillier, Norbert
2005-12-01
A sound classification system for the automatic recognition of the acoustic environment in a hearing aid is discussed. The system distinguishes the four sound classes "clean speech," "speech in noise," "noise," and "music." A number of features that are inspired by auditory scene analysis are extracted from the sound signal. These features describe amplitude modulations, spectral profile, harmonicity, amplitude onsets, and rhythm. They are evaluated together with different pattern classifiers. Simple classifiers, such as rule-based and minimum-distance classifiers, are compared with more complex approaches, such as Bayes classifier, neural network, and hidden Markov model. Sounds from a large database are employed for both training and testing of the system. The achieved recognition rates are very high except for the class "speech in noise." Problems arise in the classification of compressed pop music, strongly reverberated speech, and tonal or fluctuating noises.
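As a hedged illustration of the simpler end of the classifier spectrum compared above, the sketch below contrasts a minimum-distance (nearest class mean) rule with a simple Gaussian Bayes classifier on invented two-dimensional features; scikit-learn's NearestCentroid implements exactly the nearest-class-mean rule.

```python
import numpy as np
from sklearn.neighbors import NearestCentroid
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(1)
# toy acoustic features (e.g., modulation depth, harmonicity) for two classes
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(2, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)   # 0 = "clean speech", 1 = "noise"

md = NearestCentroid().fit(X, y)    # minimum-distance classifier
nb = GaussianNB().fit(X, y)         # simple Bayes classifier
print(md.score(X, y), nb.score(X, y))
```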
Mycofier: a new machine learning-based classifier for fungal ITS sequences.
Delgado-Serrano, Luisa; Restrepo, Silvia; Bustos, Jose Ricardo; Zambrano, Maria Mercedes; Anzola, Juan Manuel
2016-08-11
The taxonomic and phylogenetic classification based on sequence analysis of the ITS1 genomic region has become a crucial component of fungal ecology and diversity studies. Currently, no accurate alignment-free classification tool exists for fungal ITS1 sequences in large environmental surveys. This study describes the development of a machine learning-based classifier for the taxonomic assignment of fungal ITS1 sequences at the genus level. A fungal ITS1 sequence database was built using curated data, and training and test sets were generated from it. A Naïve Bayes classifier was built using features from the primary sequence, achieving an accuracy of 87% in classification at the genus level. The final model was based on a Naïve Bayes algorithm using ITS1 sequences from 510 fungal genera. This classifier, denoted Mycofier, provides classification accuracy similar to that of BLASTN, but its database contains curated data, and the tool, being alignment-independent, is more efficient and contributes to the field, given the lack of an accurate classification tool for large volumes of fungal ITS1 sequence data. The software and source code for Mycofier are freely available at https://github.com/ldelgado-serrano/mycofier.git.
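Mycofier's exact features are not given here; a common alignment-free choice, shown as a hedged sketch with invented sequences and an assumed k-mer length of 8, is to treat overlapping k-mers as words and train a multinomial Naïve Bayes classifier on their counts.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# toy ITS1 fragments with genus labels (invented data)
seqs = ["ACGTACGTGGTT", "ACGTACGTGGAA", "TTGGCCAATTGG", "TTGGCCAATTCC"]
genera = ["Fusarium", "Fusarium", "Aspergillus", "Aspergillus"]

clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(8, 8)),  # overlapping 8-mers
    MultinomialNB(),
)
clf.fit(seqs, genera)
print(clf.predict(["ACGTACGTGGTA"]))
```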
Bokulich, Nicholas A; Kaehler, Benjamin D; Rideout, Jai Ram; Dillon, Matthew; Bolyen, Evan; Knight, Rob; Huttley, Gavin A; Gregory Caporaso, J
2018-05-17
Taxonomic classification of marker-gene sequences is an important step in microbiome analysis. We present q2-feature-classifier ( https://github.com/qiime2/q2-feature-classifier ), a QIIME 2 plugin containing several novel machine-learning and alignment-based methods for taxonomy classification. We evaluated and optimized several commonly used classification methods implemented in QIIME 1 (RDP, BLAST, UCLUST, and SortMeRNA) and several new methods implemented in QIIME 2 (a scikit-learn naive Bayes machine-learning classifier, and alignment-based taxonomy consensus methods based on VSEARCH, and BLAST+) for classification of bacterial 16S rRNA and fungal ITS marker-gene amplicon sequence data. The naive-Bayes, BLAST+-based, and VSEARCH-based classifiers implemented in QIIME 2 meet or exceed the species-level accuracy of other commonly used methods designed for classification of marker gene sequences that were evaluated in this work. These evaluations, based on 19 mock communities and error-free sequence simulations, including classification of simulated "novel" marker-gene sequences, are available in our extensible benchmarking framework, tax-credit ( https://github.com/caporaso-lab/tax-credit-data ). Our results illustrate the importance of parameter tuning for optimizing classifier performance, and we make recommendations regarding parameter choices for these classifiers under a range of standard operating conditions. q2-feature-classifier and tax-credit are both free, open-source, BSD-licensed packages available on GitHub.
The nearest neighbor and the bayes error rates.
Loizou, G; Maybank, S J
1987-02-01
The (k, l) nearest neighbor method of pattern classification is compared to the Bayes method. If the two acceptance rates are equal then the asymptotic error rates satisfy the inequalities $E_{k,l+1} \le E^*(\rho) \le E_{k,l} \le d\,E^*(\rho)$, where $d$ is a function of $k$, $l$, and the number of pattern classes, and $\rho$ is the reject threshold for the Bayes method. An explicit expression for $d$ is given which is optimal in the sense that for some probability distributions $E_{k,l}$ and $d\,E^*(\rho)$ are equal.
Gurney, J C; Ansari, E; Harle, D; O'Kane, N; Sagar, R V; Dunne, M C M
2018-02-09
To determine the accuracy of a Bayesian learning scheme (Bayes') applied to the prediction of clinical decisions made by specialist optometrists in relation to the referral refinement of chronic open angle glaucoma. This cross-sectional observational study involved collection of data from the worst affected or right eyes of a consecutive sample of cases (n = 1,006) referred into the West Kent Clinical Commissioning Group Community Ophthalmology Team (COT) by high street optometrists. Multilevel classification of each case was based on race, sex, age, family history of chronic open angle glaucoma, reason for referral, Goldmann Applanation Tonometry (intraocular pressure and interocular asymmetry), optic nerve head assessment (vertical size, cup disc ratio and interocular asymmetry), central corneal thickness and visual field analysis (Hodapp-Parrish-Anderson classification). Randomised stratified tenfold cross-validation was applied to determine the accuracy of Bayes' by comparing its output to the clinical decisions of three COT specialist optometrists; namely, the decision to discharge, follow-up or refer each case. Outcomes of cross-validation, expressed as means and standard deviations, showed that the accuracy of Bayes' was high (95%, 2.0%) but that it falsely discharged (3.4%, 1.6%) or referred (3.1%, 1.5%) some cases. The results indicate that Bayes' has the potential to augment the decisions of specialist optometrists.
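The study's clinical variables and exact Bayesian scheme are not reproduced here; the sketch below shows only the stated evaluation protocol, randomized stratified ten-fold cross-validation of a Bayes-type classifier, on invented data with placeholder dimensions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(42)
X = rng.normal(size=(1006, 11))      # 11 clinical variables (placeholders)
y = rng.integers(0, 3, size=1006)    # discharge / follow-up / refer (invented)

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
scores = cross_val_score(GaussianNB(), X, y, cv=cv)
print(scores.mean(), scores.std())   # mean accuracy and its spread
```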
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, P.; Beaudet, P.
1980-01-01
The classification of large dimensional data sets arising from the merging of remote sensing data with more traditional forms of ancillary data is considered. Decision tree classification, a popular approach to the problem, is characterized by the property that samples are subjected to a sequence of decision rules before they are assigned to a unique class. An automated technique for effective decision tree design which relies only on a priori statistics is presented. This procedure utilizes a set of two-dimensional canonical transforms and Bayes table look-up decision rules. An optimal design at each node is derived based on the associated decision table. A procedure for computing the global probability of correct classification is also provided. An example is given in which class statistics obtained from an actual LANDSAT scene are used as input to the program. The resulting decision tree design has an associated probability of correct classification of 0.76, compared to the theoretically optimum 0.79 probability of correct classification associated with a full-dimensional Bayes classifier. Recommendations for future research are included.
Implementation of mutual information and bayes theorem for classification microarray data
NASA Astrophysics Data System (ADS)
Dwifebri Purbolaksono, Mahendra; Widiastuti, Kurnia C.; Syahrul Mubarok, Mohamad; Adiwijaya; Aminy Ma’ruf, Firda
2018-03-01
Microarray technology makes it possible to read gene expression profiles, and analyzing these data, in particular deciding which attributes matter more than others, is essential; for example, microarray data can carry the information needed to diagnose cancer from a person's genes. Preparing microarray data is difficult and time-consuming because the data contain a large number of insignificant and irrelevant attributes, so a method is needed to reduce the dimensionality of microarray data without discarding the important information in the attributes. This research uses Mutual Information for dimensionality reduction, and the system is built with a machine learning approach based on Bayes' theorem, a statistical and probabilistic method. Combining the two methods yields a powerful approach to microarray data classification: the experimental results show that the system classifies microarray data well, with the highest F1-scores of 91.06% using a Bayesian network and 88.85% using Naïve Bayes.
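A hedged sketch of the described pipeline, mutual-information feature ranking followed by a Bayes classifier, using scikit-learn's mutual_info_classif and GaussianNB on invented expression data (the paper's Bayesian-network variant is not shown):

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 500))       # 60 samples, 500 probes (toy data)
y = rng.integers(0, 2, size=60)      # tumor vs. normal (invented labels)

clf = make_pipeline(
    SelectKBest(mutual_info_classif, k=20),  # keep the 20 most informative probes
    GaussianNB(),
)
clf.fit(X, y)
print(clf.score(X, y))
```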
A comparative study of nonparametric methods for pattern recognition
NASA Technical Reports Server (NTRS)
Hahn, S. F.; Nelson, G. D.
1972-01-01
The applied research discussed in this report determines and compares the correct classification percentages of the nonparametric sign test, Wilcoxon's signed rank test, and the K-class classifier with the performance of the Bayes classifier. The performance is determined for data that have Gaussian, Laplacian, and Rayleigh probability density functions. The correct classification percentage is shown graphically for differences in the modes and/or means of the probability density functions for four, eight, and sixteen samples. The K-class classifier performed very well relative to the other classifiers used. Since the K-class classifier is a nonparametric technique, it usually performed better than the Bayes classifier, which assumes the data to be Gaussian even when they are not. The K-class classifier has the advantage over the Bayes classifier of working well with non-Gaussian data without requiring the probability density function of the data to be determined. It should be noted that the data in this experiment were always unimodal.
Arribas-Gil, Ana; De la Cruz, Rolando; Lebarbier, Emilie; Meza, Cristian
2015-06-01
We propose a classification method for longitudinal data. The Bayes classifier is classically used to determine a classification rule, where the underlying density in each class needs to be well modeled and estimated. This work is motivated by a real dataset of hormone levels measured at the early stages of pregnancy that can be used to predict normal versus abnormal pregnancy outcomes. The proposed model, a semiparametric linear mixed-effects model (SLMM), is a particular case of the semiparametric nonlinear mixed-effects class of models (SNMM), in which finite-dimensional (fixed effects and variance components) and infinite-dimensional (an unknown function) parameters have to be estimated. In SNMMs, maximum likelihood estimation is performed iteratively, alternating parametric and nonparametric procedures. However, if one can assume that the random effects and the unknown function interact in a linear way, more efficient estimation methods can be used. Our contribution is a unified estimation procedure based on a penalized EM-type algorithm. The Expectation and Maximization steps are explicit; in the latter step, the unknown function is estimated nonparametrically using a lasso-type procedure. A simulation study and an application to real data are performed. © 2015, The International Biometric Society.
NASA Astrophysics Data System (ADS)
Estuar, Maria Regina Justina; Victorino, John Noel; Coronel, Andrei; Co, Jerelyn; Tiausas, Francis; Señires, Chiara Veronica
2017-09-01
Use of wireless sensor networks and smartphone integration to monitor environmental parameters surrounding plantations is made possible by readily available and affordable sensors. Providing low-cost monitoring devices would be beneficial, especially to small farm owners, in a developing country like the Philippines, where agriculture accounts for a significant share of the labor market. This study discusses the integration of wireless soil sensor devices and smartphones to create an application that uses multidimensional analysis to detect the presence or absence of plant disease. Specifically, soil sensors collect soil quality parameters in a sink node, from which the smartphone retrieves data via Bluetooth. Given this, a classification model is needed on the mobile phone to report the infection status of a soil. Though tree classification is the most appropriate approach for datasets of continuous parameters, it must first be determined whether tree models yield coherent results. Soil sensor data residing on the phone are modeled using several variants of the decision tree, namely: decision tree (DT), best-fit (BF) decision tree, functional tree (FT), Naive Bayes (NB) decision tree, J48, J48graft, and LAD tree, where the decision tree approaches the problem by considering all sensor nodes as one. Results show significant differences among soil sensor parameters, indicating variances in scores between infected and uninfected sites. Furthermore, analysis of variance in accuracy, recall, precision, and F1-measure scores shows homogeneity among the NBTree, J48graft, and J48 tree classification models.
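A hedged sketch of this kind of model comparison (the Weka tree variants named above are replaced by scikit-learn stand-ins, and the data are invented): cross-validated scores for each model are compared with a one-way ANOVA, where a large p-value is consistent with homogeneity.

```python
import numpy as np
from scipy.stats import f_oneway
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 6))        # toy soil-quality parameters
y = rng.integers(0, 2, size=120)     # infected vs. uninfected (invented)

models = {
    "shallow_tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    "deep_tree": DecisionTreeClassifier(max_depth=None, random_state=0),
    "entropy_tree": DecisionTreeClassifier(criterion="entropy", random_state=0),
}
scores = {name: cross_val_score(m, X, y, cv=10) for name, m in models.items()}
print(f_oneway(*scores.values()))    # F statistic and p-value across models
```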
The ABAG biogenic emissions inventory project
NASA Technical Reports Server (NTRS)
Carson-Henry, C. (Editor)
1982-01-01
The ability to identify the role of biogenic hydrocarbon emissions in contributing to overall ozone production in the Bay Area, and to identify the significance of that role, were investigated in a joint project of the Association of Bay Area Governments (ABAG) and NASA/Ames Research Center. Ozone, which is produced when nitrogen oxides and hydrocarbons combine in the presence of sunlight, is a primary factor in air quality planning. In investigating the role of biogenic emissions, this project employed a pre-existing land cover classification to define areal extent of land cover types. Emission factors were then derived for those cover types. The land cover data and emission factors were integrated into an existing geographic information system, where they were combined to form a Biogenic Hydrocarbon Emissions Inventory. The emissions inventory information was then integrated into an existing photochemical dispersion model.
NASA Astrophysics Data System (ADS)
Avetisyan, H.; Bruna, O.; Holub, J.
2016-11-01
Numerous techniques and algorithms have been dedicated to extracting emotions from input data. Our investigation found that emotion-detection approaches can be classified into the following three types: keyword-based/lexicon-based, learning-based, and hybrid. The most commonly used techniques, such as the keyword-spotting method, Support Vector Machines, the Naïve Bayes classifier, Hidden Markov Models, and hybrid algorithms, have achieved impressive results in this area and can reach more than 90% accuracy.
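A hedged sketch of the simplest of the three approaches, keyword spotting (the lexicon and the scoring rule are invented for illustration; production systems use curated emotion lexicons): each keyword found in the text votes for its emotion, and the top-scoring emotion is returned.

```python
# toy emotion lexicon (invented); real systems use curated resources
LEXICON = {
    "joy": {"happy", "delighted", "great"},
    "anger": {"furious", "hate", "annoyed"},
    "sadness": {"sad", "miserable", "cry"},
}

def keyword_spot(text: str) -> str:
    tokens = text.lower().split()
    scores = {emo: sum(t in words for t in tokens) for emo, words in LEXICON.items()}
    return max(scores, key=scores.get)  # ties resolve arbitrarily

print(keyword_spot("I was so happy and delighted today"))  # -> joy
```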
Spitz, J; Jouma'a, J
2013-06-01
Energy densities of 670 fishes belonging to nine species were measured to evaluate intraspecific variability. Functional groups based on energy density appeared to be sufficiently robust to individual variability to provide a classification of forage fish quality applicable in a variety of ecological fields including ecosystem modelling. © 2013 The Authors. Journal of Fish Biology © 2013 The Fisheries Society of the British Isles.
NASA Astrophysics Data System (ADS)
Li, A.; Tsai, F. T. C.; Jafari, N.; Chen, Q. J.; Bentley, S. J.
2017-12-01
A vast area of river deltaic wetlands stretches across the southern Louisiana coast. The wetlands are suffering from a high rate of land loss, which increasingly threatens coastal communities and energy infrastructure. A regional stratigraphic framework of the delta plain is now imperative both to answer scientific questions (such as how the delta plain grows and decays) and to provide information to coastal protection and restoration projects (such as marsh creation and the construction of levees and floodwalls). Over the years, subsurface investigations in Louisiana have been conducted by state and federal agencies (Louisiana Department of Natural Resources, United States Geological Survey, United States Army Corps of Engineers, etc.), research institutes (Louisiana Geological Survey, LSU Coastal Studies Institute, etc.), engineering firms, and oil and gas companies. This has resulted in the availability of various types of data, including geological, geotechnical, and geophysical data. However, it is challenging to integrate the different types of data and construct three-dimensional stratigraphic models at regional scale. In this study, a set of geostatistical methods was used to tackle this problem. An ordinary kriging method was used to regionalize continuous data, such as grain size, water content, liquid limit, plasticity index, and cone penetrometer tests (CPTs). Indicator kriging and multiple indicator kriging methods were used to regionalize categorical data, such as soil classification. A compositional kriging method was used to regionalize compositional data, such as soil composition (fractions of sand, silt, and clay). Stratigraphy models were constructed for three cases in the coastal zone: (1) the Inner Harbor Navigation Canal (IHNC) area, where soil classification and soil behavior type (SBT) stratigraphies were constructed using ordinary kriging; (2) the Middle Barataria Bay area, where a soil classification stratigraphy was constructed using multiple indicator kriging; and (3) the Lower Barataria Bay and Lower Breton Sound areas, where a soil texture stratigraphy was constructed using soil compositional data and compositional kriging. Cross sections were extracted from the three-dimensional stratigraphy models to reveal the spatial distributions of different stratigraphic features.
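A hedged sketch of the ordinary kriging step for one continuous variable, assuming the pykrige package, a spherical variogram, and entirely invented borehole data (the study's indicator and compositional kriging steps are not shown):

```python
import numpy as np
from pykrige.ok import OrdinaryKriging  # assumes pykrige is installed

rng = np.random.default_rng(0)
x, y = rng.uniform(0, 10, 50), rng.uniform(0, 10, 50)  # borehole locations (toy)
z = 30 + 5 * np.sin(x) + rng.normal(0, 1, 50)          # e.g., water content (%)

ok = OrdinaryKriging(x, y, z, variogram_model="spherical")
gridx = np.linspace(0, 10, 25)
gridy = np.linspace(0, 10, 25)
zhat, ss = ok.execute("grid", gridx, gridy)            # estimates and kriging variances
print(zhat.shape, ss.shape)                            # (25, 25) each
```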
Spinnato, J; Roubaud, M-C; Burle, B; Torrésani, B
2015-06-01
The main goal of this work is to develop a model for multisensor signals, such as magnetoencephalography or electroencephalography (EEG) signals, that accounts for inter-trial variability and is suitable for the corresponding binary classification problems. An important constraint is that the model be simple enough to handle small and unbalanced datasets, as often encountered in BCI-type experiments. The method involves the linear mixed-effects statistical model, the wavelet transform, and spatial filtering, and aims at the characterization of localized discriminant features in multisensor signals. After a discrete wavelet transform and spatial filtering, a projection onto the relevant wavelet and spatial channel subspaces is used for dimension reduction. The projected signals are then decomposed as the sum of a signal of interest (i.e., discriminant) and background noise, using a very simple Gaussian linear mixed model. Thanks to the simplicity of the model, the corresponding parameter estimation problem is simplified. Robust estimates of class-covariance matrices are obtained from small sample sizes, and an effective Bayes plug-in classifier is derived. The approach is applied to the detection of error potentials in multichannel EEG data in a very unbalanced situation (detection of rare events). Classification results prove the relevance of the proposed approach in such a context. The combination of the linear mixed model, wavelet transform, and spatial filtering for EEG classification is, to the best of our knowledge, an original approach, which is proven to be effective. This paper improves upon earlier results on similar problems, and the three main ingredients all play an important role.
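Under a Gaussian model with a shared covariance, the plug-in Bayes classifier reduces to linear discriminant analysis; as a hedged stand-in for the paper's robust small-sample covariance estimation, the sketch below uses scikit-learn's shrinkage estimator on invented wavelet-domain features.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 25))        # 40 trials, 25 wavelet/spatial features (toy)
y = np.r_[np.zeros(35), np.ones(5)]  # rare error trials: strongly unbalanced

# Ledoit-Wolf shrinkage stabilizes the covariance estimate for small samples;
# flat priors keep the rare class from being ignored
clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto", priors=[0.5, 0.5])
clf.fit(X, y)
print(clf.predict(X[:3]))
```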
Image segmentation using hidden Markov Gauss mixture models.
Pyun, Kyungsuk; Lim, Johan; Won, Chee Sun; Gray, Robert M
2007-07-01
Image segmentation is an important tool in image processing and can serve as an efficient front end to sophisticated algorithms and thereby simplify subsequent processing. We develop a multiclass image segmentation method using hidden Markov Gauss mixture models (HMGMMs) and provide examples of segmentation of aerial images and textures. HMGMMs incorporate supervised learning, fitting the observation probability distribution given each class by a Gauss mixture estimated using vector quantization with a minimum discrimination information (MDI) distortion. We formulate the image segmentation problem using a maximum a posteriori criterion and find the hidden states that maximize the posterior density given the observation. We estimate both the hidden Markov parameters and the hidden states using a stochastic expectation-maximization algorithm. Our results demonstrate that HMGMMs provide better classification in terms of Bayes risk and spatial homogeneity of the classified objects than do several popular methods, including classification and regression trees, learning vector quantization, causal hidden Markov models (HMMs), and multiresolution HMMs. The computational load of HMGMMs is similar to that of causal HMMs.
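Setting aside the hidden Markov spatial coupling, the class-conditional Gauss mixture component can be sketched with scikit-learn (data, feature meanings, and mixture sizes below are invented): fit one GaussianMixture per class and label each feature vector by the class with the highest log-likelihood.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
# toy texture features for two classes (e.g., water vs. forest)
X0 = rng.normal(0, 1, (200, 4))
X1 = rng.normal(3, 1, (200, 4))

gmms = [GaussianMixture(n_components=3, random_state=0).fit(X) for X in (X0, X1)]

def classify(X):
    # score_samples gives each sample's log-likelihood under a class GMM
    ll = np.column_stack([g.score_samples(X) for g in gmms])
    return ll.argmax(axis=1)

print(classify(rng.normal(3, 1, (5, 4))))  # expected: mostly class 1
```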
BayesMotif: de novo protein sorting motif discovery from impure datasets.
Hu, Jianjun; Zhang, Fan
2010-01-18
Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals consists of amino acid sub-sequences usually located at the N-terminus or C-terminus of a protein sequence. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly, so effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms. We formulated the protein sorting motif discovery problem as a classification problem and proposed a Bayesian-classifier-based algorithm (BayesMotif) for de novo identification of a common type of protein sorting motif in which a highly conserved anchor is present along with a less conserved motif region. A false-positive removal procedure is developed to iteratively remove sequences that are unlikely to contain true motifs, so that the algorithm can identify motifs from impure input sequences. Experiments on both implanted-motif datasets and real-world datasets showed that the enhanced BayesMotif algorithm can identify anchored sorting motifs from pure or impure protein sequence datasets, and that the false-positive removal procedure helps to identify true motifs even when only 20% of the input sequences contain true motif instances. We proposed BayesMotif, a novel Bayesian-classification-based algorithm for de novo discovery of a special category of anchored protein sorting motifs from impure datasets. Compared to conventional motif discovery algorithms such as MEME, our algorithm can find less conserved motifs with short, highly conserved anchors. It also has the advantage of easily incorporating additional meta-sequence features, such as hydrophobicity or charge of the motifs, which may help to overcome the limitations of the PWM (position weight matrix) motif model.
NASA Technical Reports Server (NTRS)
Rignot, E.; Chellappa, R.
1993-01-01
We present a maximum a posteriori (MAP) classifier for classifying multifrequency, multilook, single polarization SAR intensity data into regions or ensembles of pixels of homogeneous and similar radar backscatter characteristics. A model for the prior joint distribution of the multifrequency SAR intensity data is combined with a Markov random field for representing the interactions between region labels to obtain an expression for the posterior distribution of the region labels given the multifrequency SAR observations. The maximization of the posterior distribution yields Bayes's optimum region labeling or classification of the SAR data or its MAP estimate. The performance of the MAP classifier is evaluated by using computer-simulated multilook SAR intensity data as a function of the parameters in the classification process. Multilook SAR intensity data are shown to yield higher classification accuracies than one-look SAR complex amplitude data. The MAP classifier is extended to the case in which the radar backscatter from the remotely sensed surface varies within the SAR image because of incidence angle effects. The results obtained illustrate the practicality of the method for combining SAR intensity observations acquired at two different frequencies and for improving classification accuracy of SAR data.
Learning accurate very fast decision trees from uncertain data streams
NASA Astrophysics Data System (ADS)
Liang, Chunquan; Zhang, Yang; Shi, Peng; Hu, Zhengguo
2015-12-01
Most existing works on data stream classification assume the streaming data are precise and definite. Such an assumption, however, does not always hold in practice, since data uncertainty is ubiquitous in data stream applications due to imprecise measurement, missing values, privacy protection, etc. The goal of this paper is to learn accurate decision tree models from uncertain data streams for classification analysis. On the basis of very fast decision tree (VFDT) algorithms, we propose an algorithm for constructing an uncertain VFDT tree with classifiers at the tree leaves (uVFDTc). The uVFDTc algorithm can exploit uncertain information effectively and efficiently in both the learning and the classification phases. In the learning phase, it uses Hoeffding bound theory to learn from uncertain data streams and yields fast and reasonable decision trees. In the classification phase, it uses uncertain naive Bayes (UNB) classifiers at the tree leaves to improve classification performance. Experimental results on both synthetic and real-life datasets demonstrate the strong ability of uVFDTc to classify uncertain data streams. The use of UNB at the tree leaves improves the performance of uVFDTc, especially the any-time property, the benefit of exploiting uncertain information, and the robustness against uncertainty.
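The Hoeffding bound that VFDT-style learners use to decide when enough examples have accumulated at a leaf states that, with probability at least 1 − δ, the true mean of a variable with range R lies within ε of the observed mean. A small sketch (the function and variable names are ours):

```python
import math

def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """epsilon = sqrt(R^2 * ln(1/delta) / (2n)): the true mean lies within
    epsilon of the observed mean with probability at least 1 - delta."""
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))

# a leaf splits once the observed gain gap between the two best attributes
# exceeds epsilon for the n examples seen so far
print(hoeffding_bound(value_range=1.0, delta=1e-7, n=500))
```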
Li, X C; Li, J S; Meng, L; Bai, Y N; Yu, D S; Liu, X N; Liu, X F; Jiang, X J; Ren, X W; Yang, X T; Shen, X P; Zhang, J W
2017-08-10
Objective: To identify the dominant pathogens of febrile respiratory syndrome (FRS) patients in Gansu province and to establish a Bayes discriminant function for identifying patients infected with the dominant pathogens. Methods: FRS patients were recruited in sentinel hospitals across Gansu province from 2009 to 2015, and the dominant pathogens were determined by describing the composition of the pathogen profile. Significant clinical variables were selected by stepwise discriminant analysis to establish the Bayes discriminant function. Results: Among the viral pathogens detected in FRS patients, influenza virus and rhinovirus showed higher positive rates than the other viruses (13.79% and 8.63%), accounting for 54.38% and 13.73% of virus-positive patients. The most frequently detected bacteria were Streptococcus pneumoniae and Haemophilus influenzae (44.41% and 18.07%), accounting for 66.21% and 24.55% of bacteria-positive patients. The original-validated rate of the discriminant function, established from 11 clinical variables, was 73.1%, with a cross-validated rate of 70.6%. Conclusion: Influenza virus, rhinovirus, Streptococcus pneumoniae, and Haemophilus influenzae were the dominant pathogens of FRS in Gansu province. The Bayes discriminant analysis showed high accuracy in classifying the dominant pathogens and has applicative value for FRS.
Wen, Tingxi; Zhang, Zhongnan
2017-05-01
In this paper, genetic algorithm-based frequency-domain feature search (GAFDS) method is proposed for the electroencephalogram (EEG) analysis of epilepsy. In this method, frequency-domain features are first searched and then combined with nonlinear features. Subsequently, these features are selected and optimized to classify EEG signals. The extracted features are analyzed experimentally. The features extracted by GAFDS show remarkable independence, and they are superior to the nonlinear features in terms of the ratio of interclass distance and intraclass distance. Moreover, the proposed feature search method can search for features of instantaneous frequency in a signal after Hilbert transformation. The classification results achieved using these features are reasonable; thus, GAFDS exhibits good extensibility. Multiple classical classifiers (i.e., k-nearest neighbor, linear discriminant analysis, decision tree, AdaBoost, multilayer perceptron, and Naïve Bayes) achieve satisfactory classification accuracies by using the features generated by the GAFDS method and the optimized feature selection. The accuracies for 2-classification and 3-classification problems may reach up to 99% and 97%, respectively. Results of several cross-validation experiments illustrate that GAFDS is effective in the extraction of effective features for EEG classification. Therefore, the proposed feature selection and optimization model can improve classification accuracy.
Mujtaba, Ghulam; Shuib, Liyana; Raj, Ram Gopal; Rajandram, Retnagowri; Shaikh, Khairunisa
2018-07-01
Automatic text classification techniques are useful for classifying plaintext medical documents. This study aims to automatically predict the cause of death from free-text forensic autopsy reports by comparing various schemes for feature extraction, term weighting or feature value representation, text classification, and feature reduction. For the experiments, autopsy reports belonging to eight different causes of death were collected, preprocessed, and converted into 43 master feature vectors using various schemes for feature extraction, representation, and reduction. Six different text classification techniques were applied to these 43 master feature vectors to construct a classification model that can predict the cause of death. Finally, classification model performance was evaluated using four performance measures: overall accuracy, macro precision, macro F-measure, and macro recall. The experiments showed that unigram features obtained the highest performance compared to bigram, trigram, and hybrid-gram features. Among the feature representation schemes, term frequency and term frequency with inverse document frequency obtained similar and better results than binary frequency and normalized term frequency with inverse document frequency. Furthermore, the chi-square feature reduction approach outperformed the Pearson correlation and information gain approaches. Finally, among the text classification algorithms, the support vector machine classifier outperformed random forest, Naive Bayes, k-nearest neighbor, decision tree, and ensemble-voted classifiers. Our results and comparisons hold practical importance and serve as references for future work; the comparison outputs also provide state-of-the-art baselines against which future proposals can be compared with existing automated text classification techniques. Copyright © 2017 Elsevier Ltd and Faculty of Forensic and Legal Medicine. All rights reserved.
Fuzzy inference system for identification of geological stratigraphy off Prydz Bay, East Antarctica
NASA Astrophysics Data System (ADS)
Singh, Upendra K.
2011-12-01
The analysis of well-logging data plays a key role in the exploration and development of hydrocarbon reservoirs. Various well-log parameters, such as porosity, gamma ray, density, transit time, and resistivity, help in classifying strata and estimating the physical, electrical, and acoustical properties of the subsurface lithology. Strong and conspicuous changes in some of the log parameters associated with a particular geological stratigraphic formation are a function of its composition and physical properties, which aid classification. However, some substrata show moderate values in the respective log parameters, making it difficult to identify the kind of strata from the standard variability ranges of the log parameters and visual inspection alone. The complexity increases further as more sensors are involved. An attempt is made to identify the kinds of stratigraphy from well logs over the Prydz Bay basin, East Antarctica, using a fuzzy inference system. A model is built from a few datasets of known stratigraphy, and the network model is then used as a test model to infer the lithology of a borehole from its geophysical logs, which were not used in the simulation. The fuzzy-based algorithm is first trained, validated, and tested on well-log data, and finally identifies the formation lithology of a hydrocarbon reservoir system in the study area. The effectiveness of this technique is demonstrated by analyzing the results against actual lithologs and coring data of ODP Leg 188. The fuzzy results show that the training performance equals 82.95% while the prediction ability is 87.69%. The fuzzy results are very encouraging, and the model is able to decipher even thin seams and other strata from geophysical logs. The results identify a significant sand formation in the depth range 316.0-341.0 m, where core recovery is incomplete.
Chapman, Brian E; Lee, Sean; Kang, Hyunseok Peter; Chapman, Wendy W
2011-10-01
In this paper we describe an application called peFinder for document-level classification of CT pulmonary angiography reports. peFinder is based on a generalized version of the ConText algorithm, a simple text processing algorithm for identifying features in clinical report documents. peFinder was used to answer questions about the disease state (pulmonary emboli present or absent), the certainty state of the diagnosis (uncertainty present or absent), the temporal state of an identified pulmonary embolus (acute or chronic), and the technical quality state of the exam (diagnostic or not diagnostic). Gold standard answers for each question were determined from the consensus classifications of three human annotators. peFinder results were compared to naive Bayes classifiers using unigrams and bigrams. The sensitivities (and positive predictive values) for peFinder were 0.98 (0.83), 0.86 (0.96), 0.94 (0.93), and 0.60 (0.90) for disease state, quality state, certainty state, and temporal state, respectively, compared to 0.68 (0.77), 0.67 (0.87), 0.62 (0.82), and 0.04 (0.25) for the naive Bayes classifier using unigrams, and 0.75 (0.79), 0.52 (0.69), 0.59 (0.84), and 0.04 (0.25) for the naive Bayes classifier using bigrams. Copyright © 2011 Elsevier Inc. All rights reserved.
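A hedged sketch of the unigram and bigram naive Bayes baselines that peFinder was compared against (the reports, labels, and use of scikit-learn are illustrative assumptions):

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

reports = ["no evidence of pulmonary embolism",
           "acute pulmonary embolus in the right lower lobe",
           "chronic pulmonary emboli unchanged from prior",
           "study negative for pe"]
pe_present = [0, 1, 1, 0]   # invented document-level labels

for ngrams in [(1, 1), (1, 2)]:  # unigrams only, then unigrams + bigrams
    clf = make_pipeline(CountVectorizer(ngram_range=ngrams), MultinomialNB())
    clf.fit(reports, pe_present)
    print(ngrams, clf.predict(["acute embolus noted"]))
```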
Text Classification for Intelligent Portfolio Management
2002-05-01
Approaches applied in recent years include nearest neighbor classification [15], naive Bayes with EM (Expectation Maximization) [11] [13], and Winnow with active learning [10]. In particular, active learning is used to actively select documents for labeling, and EM then assigns labels to the remaining documents.
Figueroa, Rosa L; Flores, Christopher A
2016-08-01
Obesity is a chronic disease with an increasing impact on the world's population. In this work, we present a method of identifying obesity automatically using text mining techniques and information related to body weight measures and obesity comorbidities. We used a dataset of 3015 de-identified medical records that contain labels for two classification problems. The first classification problem distinguishes between obesity, overweight, normal weight, and underweight. The second classification problem differentiates between obesity types: super obesity, morbid obesity, severe obesity and moderate obesity. We used a Bag of Words approach to represent the records together with unigram and bigram representations of the features. We implemented two approaches: a hierarchical method and a nonhierarchical one. We used Support Vector Machine and Naïve Bayes together with ten-fold cross validation to evaluate and compare performances. Our results indicate that the hierarchical approach does not work as well as the nonhierarchical one. In general, our results show that Support Vector Machine obtains better performances than Naïve Bayes for both classification problems. We also observed that bigram representation improves performance compared with unigram representation.
Movement imagery classification in EMOTIV cap based system by Naïve Bayes.
Stock, Vinicius N; Balbinot, Alexandre
2016-08-01
Brain-computer interfaces (BCI) provide means of communication and control in assistive technology that do not require motor activity from the user. The goal of this study is to classify two types of imagined movements, of the left and right hands, in an EMOTIV cap based system, using the Naïve Bayes classifier. A preliminary analysis with respect to results obtained by other experiments in this field is also conducted. Processing of the electroencephalography (EEG) signals is done by applying Common Spatial Pattern filters. The EPOC electrode cap is used for EEG acquisition in two test subjects, for two distinct trial formats. The channels used are FC5, FC6, P7, and P8 of the 10-20 system, and the differences relative to using the C3, C4, P3, and P4 positions are discussed. Dataset 3 of BCI Competition II is also analyzed using the implemented algorithms. The maximum classification results for the proposed experiment and for the BCI Competition dataset were, respectively, 79% and 85%. The conclusion of this study is that the chosen electrode positions may be applied in BCI systems with satisfactory classification rates.
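A hedged sketch of the described chain, Common Spatial Pattern filtering followed by Naïve Bayes, assuming the MNE-Python package for the CSP step and substituting random arrays for real EEG epochs:

```python
import numpy as np
from mne.decoding import CSP             # assumes mne is installed
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(80, 4, 256))        # 80 epochs, 4 channels, 256 samples (toy)
y = np.repeat([0, 1], 40)                # left vs. right hand imagery (invented)

clf = make_pipeline(CSP(n_components=2), GaussianNB())
clf.fit(X, y)
print(clf.score(X, y))
```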
Uranga, Jon; Arrizabalaga, Haritz; Boyra, Guillermo; Hernandez, Maria Carmen; Goñi, Nicolas; Arregui, Igor; Fernandes, Jose A; Yurramendi, Yosu; Santiago, Josu
2017-01-01
This study presents a methodology for the automated analysis of commercial medium-range sonar signals for detecting presence/absence of bluefin tuna (Tunnus thynnus) in the Bay of Biscay. The approach uses image processing techniques to analyze sonar screenshots. For each sonar image we extracted measurable regions and analyzed their characteristics. Scientific data was used to classify each region into a class ("tuna" or "no-tuna") and build a dataset to train and evaluate classification models by using supervised learning. The methodology performed well when validated with commercial sonar screenshots, and has the potential to automatically analyze high volumes of data at a low cost. This represents a first milestone towards the development of acoustic, fishery-independent indices of abundance for bluefin tuna in the Bay of Biscay. Future research lines and additional alternatives to inform stock assessments are also discussed.
An analysis of USSPACECOM's space surveillance network sensor tasking methodology
NASA Astrophysics Data System (ADS)
Berger, Jeff M.; Moles, Joseph B.; Wilsey, David G.
1992-12-01
This study provides the basis for the development of a cost/benefit assessment model to determine the effects of alterations to the Space Surveillance Network (SSN) on orbital element (OE) set accuracy. It provides a review of current methods used by NORAD and the SSN to gather and process observations, an alternative to the current Gabbard classification method, and the development of a model to determine the effects of observation rate and correction interval on OE set accuracy. The proposed classification scheme is based on satellite J2 perturbations. Specifically, classes were established based on mean motion, eccentricity, and inclination since J2 perturbation effects are functions of only these elements. Model development began by creating representative sensor observations using a highly accurate orbital propagation model. These observations were compared to predicted observations generated using the NORAD Simplified General Perturbation (SGP4) model and differentially corrected using a Bayes, sequential estimation, algorithm. A 10-run Monte Carlo analysis was performed using this model on 12 satellites using 16 different observation rate/correction interval combinations. An ANOVA and confidence interval analysis of the results show that this model does demonstrate the differences in steady state position error based on varying observation rate and correction interval.
A novel artificial immune clonal selection classification and rule mining with swarm learning model
NASA Astrophysics Data System (ADS)
Al-Sheshtawi, Khaled A.; Abdul-Kader, Hatem M.; Elsisi, Ashraf B.
2013-06-01
Metaheuristic optimisation algorithms have become a popular choice for solving complex problems. By integrating the Artificial Immune clonal selection algorithm (CSA) and the particle swarm optimisation (PSO) algorithm, a novel hybrid Clonal Selection Classification and Rule Mining with Swarm Learning Algorithm (CS2) is proposed. The main goal of the approach is to exploit and explore the parallel-computation merit of Clonal Selection and the speed and self-organisation merits of Particle Swarm by sharing information between the clonal selection population and the particle swarm. Hence, we employed the advantages of PSO to improve the mutation mechanism of the artificial immune CSA and to mine classification rules within datasets. Consequently, our proposed algorithm required less training time and fewer memory cells in comparison to other AIS algorithms. In this paper, classification rule mining has been modelled as a multiobjective optimisation problem with predictive accuracy. The multiobjective approach is intended to allow the PSO algorithm to return an approximation to the accuracy and comprehensibility border, containing solutions that are spread across the border. We compared the classification accuracy of CS2 with five commonly used CSAs, namely AIRS1, AIRS2, AIRS-Parallel, CLONALG, and CSCA, using eight benchmark datasets. We also compared the classification accuracy of CS2 with five other methods, namely Naïve Bayes, SVM, MLP, CART, and RBF. The results show that the proposed algorithm is comparable to the 10 studied algorithms. As a result, the hybridisation of CSA and PSO can develop their respective merits, compensate for each other's defects, and improve both search-optimisation quality and speed.
NASA Astrophysics Data System (ADS)
Clark, M. L.
2016-12-01
The goal of this study was to assess multi-temporal, Hyperspectral Infrared Imager (HyspIRI) satellite imagery for improved forest class mapping relative to multispectral satellites. The study area was the western San Francisco Bay Area, California and forest alliances (e.g., forest communities defined by dominant or co-dominant trees) were defined using the U.S. National Vegetation Classification System. Simulated 30-m HyspIRI, Landsat 8 and Sentinel-2 imagery were processed from image data acquired by NASA's AVIRIS airborne sensor in year 2015, with summer and multi-temporal (spring, summer, fall) data analyzed separately. HyspIRI reflectance was used to generate a suite of hyperspectral metrics that targeted key spectral features related to chemical and structural properties. The Random Forests classifier was applied to the simulated images and overall accuracies (OA) were compared to those from real Landsat 8 images. For each image group, broad land cover (e.g., Needle-leaf Trees, Broad-leaf Trees, Annual agriculture, Herbaceous, Built-up) was classified first, followed by a finer-detail forest alliance classification for pixels mapped as closed-canopy forest. There were 5 needle-leaf tree alliances and 16 broad-leaf tree alliances, including 7 Quercus (oak) alliance types. No forest alliance classification exceeded 50% OA, indicating that there was broad spectral similarity among alliances, most of which were not spectrally pure but rather a mix of tree species. In general, needle-leaf (Pine, Redwood, Douglas Fir) alliances had better class accuracies than broad-leaf alliances (Oaks, Madrone, Bay Laurel, Buckeye, etc). Multi-temporal data classifications all had 5-6% greater OA than with comparable summer data. For simulated data, HyspIRI metrics had 4-5% greater OA than Landsat 8 and Sentinel-2 multispectral imagery and 3-4% greater OA than HyspIRI reflectance. Finally, HyspIRI metrics had 8% greater OA than real Landsat 8 imagery. In conclusion, forest alliance classification was found to be a difficult remote sensing application with moderate resolution (30 m) satellite imagery; however, of the data tested, HyspIRI spectral metrics had the best performance relative to multispectral satellites.
NASA Astrophysics Data System (ADS)
Biondo, Manuela; Bartholomä, Alexander
2017-04-01
One of the burning issues in acoustic seabed classification is the lack of solid, repeatable statistical procedures that can support the verification of acoustic variability in relation to seabed properties. Acoustic sediment classification schemes often lead to biased and subjective interpretation, as they ultimately aim at an oversimplified categorization of the seabed based on conventionally defined sediment types. However, grain size variability alone cannot account for acoustic diversity, which is ultimately affected by multiple physical processes, the scale of heterogeneity, instrument settings, data quality, and image processing and segmentation performance. Understanding and assessing the weight of all of these factors on backscatter is a difficult task, owing to the spatially limited and fragmentary knowledge of the seabed from direct observations (e.g., grab samples, cores, videos). In particular, large-scale mapping requires an enormous amount of ground-truthing data that is often obtained from heterogeneous and multidisciplinary sources, adding a further chance of misclassification. Despite all of these limitations, acoustic segments still contain signals of seabed changes that, if appropriate procedures are established, can be translated into meaningful knowledge. In this study we design a simple, repeatable method, based on multivariate procedures, to classify a 100 km², high-frequency (450 kHz) sidescan sonar mosaic acquired in 2012 in the shallow upper-mesotidal inlet of the Jade Bay (German North Sea coast). The tool used for the automated classification of the backscatter mosaic is the QTC SWATHVIEW™ software. The ground-truthing database included grab-sample data from multiple sources (2009-2011). The method was designed to extract quantitative descriptors of acoustic backscatter and model their spatial changes in relation to grain size distribution and morphology. The modelled relationships were used to: 1) assess the automated segmentation performance, 2) obtain a ranking of the seabed attributes most responsible for acoustic diversity, and 3) select the best-fit ground-truthing information to characterize each acoustic class. Using a supervised Linear Discriminant Analysis (LDA), relationships between seabed parameters and acoustic class discrimination were modelled, and the acoustic class of each data point was predicted; the model achieved a success rate of 63.5%. An unsupervised LDA was used to model relationships between acoustic variables and clustered seabed categories in order to identify misrepresentative ground-truthing data points; this model achieved a success rate of 50.8%. Misclassified data points were disregarded in the final classification. The analyses led to a clearer, more accurate appreciation of relationship patterns and an improved understanding of the site-specific processes affecting the acoustic signal. Value was added to the qualitative classification output by comparing it with a more recent set of acoustic and ground-truthing information (2014). The classification resulted in the first acoustic sediment map ever produced for the area and offers valuable knowledge of detailed sediment variability. The method proved to be a simple, repeatable strategy that may be applied to similar work and environments.
Automated Classification of Pathology Reports.
Oleynik, Michel; Finger, Marcelo; Patrão, Diogo F C
2015-01-01
This work develops an automated classifier of pathology reports which infers the topography and the morphology classes of a tumor using codes from the International Classification of Diseases for Oncology (ICD-O). Data from 94,980 patients of the A.C. Camargo Cancer Center was used for training and validation of Naive Bayes classifiers, evaluated by the F1-score. Measures greater than 74% in the topographic group and 61% in the morphologic group are reported. Our work provides a successful baseline for future research for the classification of medical documents written in Portuguese and in other domains.
Barbosa, Rommel Melgaço; Nacano, Letícia Ramos; Freitas, Rodolfo; Batista, Bruno Lemos; Barbosa, Fernando
2014-09-01
This article aims to evaluate 2 machine learning algorithms, decision trees and naïve Bayes (NB), for egg classification (free-range eggs compared with battery eggs). The database used for the study consisted of 15 chemical elements (As, Ba, Cd, Co, Cs, Cu, Fe, Mg, Mn, Mo, Pb, Se, Sr, V, and Zn) determined in 52 eggs samples (20 free-range and 32 battery eggs) by inductively coupled plasma mass spectrometry. Our results demonstrated that decision trees and NB associated with the mineral contents of eggs provide a high level of accuracy (above 80% and 90%, respectively) for classification between free-range and battery eggs and can be used as an alternative method for adulteration evaluation. © 2014 Institute of Food Technologists®
Gasson, Peter; Miller, Regis; Stekel, Dov J; Whinder, Frances; Zieminska, Kasia
2010-01-01
Dalbergia nigra is one of the most valuable timber species of its genus, having been traded for over 300 years. Due to over-exploitation it is facing extinction and trade has been banned under CITES Appendix I since 1992. Current methods, primarily comparative wood anatomy, are inadequate for conclusive species identification. This study aims to find a set of anatomical characters that distinguish the wood of D. nigra from other commercially important species of Dalbergia from Latin America. Qualitative and quantitative wood anatomy, principal components analysis and naïve Bayes classification were conducted on 43 specimens of Dalbergia, eight D. nigra and 35 from six other Latin American species. Dalbergia cearensis and D. miscolobium can be distinguished from D. nigra on the basis of vessel frequency for the former, and ray frequency for the latter. Principal components analysis was unable to provide any further basis for separating the species. Naïve Bayes classification using the four characters: minimum vessel diameter; frequency of solitary vessels; mean ray width; and frequency of axially fused rays, classified all eight D. nigra correctly with no false negatives, but there was a false positive rate of 36.36 %. Wood anatomy alone cannot distinguish D. nigra from all other commercially important Dalbergia species likely to be encountered by customs officials, but can be used to reduce the number of specimens that would need further study.
Ackerman, Seth D.; Pappal, Adrienne L.; Huntley, Emily C.; Blackwood, Dann S.; Schwab, William C.
2015-01-01
Sea-floor sample collection is an important component of a statewide cooperative mapping effort between the U.S. Geological Survey (USGS) and the Massachusetts Office of Coastal Zone Management (CZM). Sediment grab samples, bottom photographs, and video transects were collected within Vineyard Sound and Buzzards Bay in 2010 aboard the research vessel Connecticut. This report contains sample data and related information, including analyses of surficial-sediment grab samples, locations and images of sea-floor photography, survey lines along which sea-floor video was collected, and a classification of benthic biota observed in sea-floor photographs, based on the Coastal and Marine Ecological Classification Standard (CMECS). These sample data and analyses are used to verify interpretations of geophysical data and are an essential part of geologic maps of the sea floor. The data also provide a valuable inventory of benthic habitat and resources. Geographic information system (GIS) data, maps, and interpretations produced through the USGS and CZM mapping cooperative are intended to aid efforts to manage coastal and marine resources and to provide baseline information for research focused on coastal evolution and environmental change.
CFS-SMO based classification of breast density using multiple texture models.
Sharma, Vipul; Singh, Sukhwinder
2014-06-01
It is widely acknowledged in the medical profession that the density of breast tissue is a major risk factor for breast cancer. Increased breast density has been linked with an increased risk of breast cancer, as high density makes it difficult for radiologists to see an abnormality, which leads to false negative results. Therefore, there is a need for highly efficient techniques for breast tissue classification based on density. This paper presents a hybrid scheme for classification of fatty and dense mammograms using correlation-based feature selection (CFS) and sequential minimal optimization (SMO). In this work, texture analysis is done on a region of interest selected from the mammogram. Various texture models have been used to quantify the texture of parenchymal patterns of the breast. To reduce the dimensionality and to identify the features that differentiate between breast tissue densities, CFS is used. Finally, classification is performed using SMO. The performance is evaluated using 322 images of the mini-MIAS database. The highest accuracy of 96.46% is obtained for the two-class problem (fatty and dense) using the proposed approach. The performance of the features selected by CFS is also evaluated with Naïve Bayes, Multilayer Perceptron, RBF Network, J48 and kNN classifiers. The proposed CFS-SMO method outperforms all other classifiers, giving a sensitivity of 100%. This makes it suitable as a second opinion in classifying breast tissue density.
Feature Augmentation via Nonparametrics and Selection (FANS) in High-Dimensional Classification.
Fan, Jianqing; Feng, Yang; Jiang, Jiancheng; Tong, Xin
2015-01-01
We propose a high dimensional classification method that involves nonparametric feature augmentation. Knowing that marginal density ratios are the most powerful univariate classifiers, we use the ratio estimates to transform the original feature measurements. Subsequently, penalized logistic regression is invoked, taking as input the newly transformed or augmented features. This procedure trains models equipped with local complexity and global simplicity, thereby avoiding the curse of dimensionality while creating a flexible nonlinear decision boundary. The resulting method is called Feature Augmentation via Nonparametrics and Selection (FANS). We motivate FANS by generalizing the Naive Bayes model, writing the log ratio of joint densities as a linear combination of those of marginal densities. It is related to generalized additive models, but has better interpretability and computability. Risk bounds are developed for FANS. In numerical analysis, FANS is compared with competing methods, so as to provide a guideline on its best application domain. Real data analysis demonstrates that FANS performs very competitively on benchmark email spam and gene expression data sets. Moreover, FANS is implemented by an extremely fast algorithm through parallel computing. PMID:27185970
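To make the transformation concrete, here is a minimal sketch of the FANS idea in Python with scikit-learn, assuming kernel density estimates for the marginal class-conditional densities; the synthetic data, bandwidth, and penalty level are illustrative choices, not the authors' implementation (which also splits the sample between density estimation and regression).

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def fans_transform(X_fit, y_fit, X_eval, bandwidth=0.5):
    """Replace each feature by its estimated marginal log density ratio."""
    Z = np.empty_like(X_eval)
    for j in range(X_fit.shape[1]):
        kde1 = KernelDensity(bandwidth=bandwidth).fit(X_fit[y_fit == 1][:, [j]])
        kde0 = KernelDensity(bandwidth=bandwidth).fit(X_fit[y_fit == 0][:, [j]])
        Z[:, j] = (kde1.score_samples(X_eval[:, [j]])
                   - kde0.score_samples(X_eval[:, [j]]))
    return Z

Z_tr = fans_transform(X_tr, y_tr, X_tr)
Z_te = fans_transform(X_tr, y_tr, X_te)
# penalized (L1) logistic regression on the augmented features
clf = LogisticRegression(penalty='l1', solver='liblinear').fit(Z_tr, y_tr)
print('test accuracy:', clf.score(Z_te, y_te))
```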
Detection of dechallenge in spontaneous reporting systems: a comparison of Bayes methods.
Banu, A Bazila; Alias Balamurugan, S Appavu; Thirumalaikolundusubramanian, Ponniah
2014-01-01
Dechallenge is a response observed as the reduction or disappearance of adverse drug reactions (ADRs) on withdrawal of a drug from a patient. Currently available algorithms to detect dechallenge have limitations, so there is a need to compare newer methods. To detect dechallenge in spontaneous reporting systems, the data-mining algorithms Naive Bayes and Improved Naive Bayes were applied and their performance compared in terms of accuracy and error. Analyzing factors of dechallenge such as outcome and disease category will help medical practitioners and pharmaceutical industries determine the reasons for dechallenge in order to take essential steps toward drug safety. Adverse drug reaction reports from 2011 and 2012 were downloaded from the United States Food and Drug Administration's database. The classification results showed that the Improved Naive Bayes algorithm outperformed Naive Bayes in detecting dechallenge, with an accuracy of 90.11% and an error of 9.8%. Detecting dechallenge for unknown samples is essential for proper prescription. To overcome the issues exposed by the Naive Bayes algorithm, the Improved Naive Bayes algorithm can be used to detect dechallenge with higher accuracy and minimal error.
Multivariate spline methods in surface fitting
NASA Technical Reports Server (NTRS)
Guseman, L. F., Jr. (Principal Investigator); Schumaker, L. L.
1984-01-01
The use of spline functions in the development of classification algorithms is examined. In particular, a method is formulated for producing spline approximations to bivariate density functions, where the density function is described by a histogram of measurements. The resulting approximations are then incorporated into a Bayesian classification procedure for which the Bayes decision regions and the probability of misclassification are readily computed. Some preliminary numerical results are presented to illustrate the method.
1990-05-01
Report documentation page (unclassified), Portersville Bay, Mobile County, Alabama, May 1990; author: Johnny L. Grandison; performing organization: U.S. Army Engineer District, Mobile.
Trusel, Luke D.; Cochrane, Guy R.; Etherington, Lisa L.; Powell, Ross D.; Mayer, Larry A.
2010-01-01
Seafloor geology and potential benthic habitats were mapped in Muir Inlet, Glacier Bay National Park and Preserve, Alaska, using multibeam sonar, ground-truth information, and geological interpretations. Muir Inlet is a recently deglaciated fjord that is under the influence of glacial and paraglacial marine processes. High glacially derived sediment and meltwater fluxes, slope instabilities, and variable bathymetry result in a highly dynamic estuarine environment and benthic ecosystem. We characterize the fjord seafloor and potential benthic habitats using the Coastal and Marine Ecological Classification Standard (CMECS) recently developed by the National Oceanic and Atmospheric Administration (NOAA) and NatureServe. Substrates within Muir Inlet are dominated by mud, derived from the high glacial debris flux. Water-column characteristics are derived from a combination of conductivity temperature depth (CTD) measurements and circulation-model results. We also present modern glaciomarine sediment accumulation data from quantitative differential bathymetry. These data show Muir Inlet is divided into two contrasting environments: a dynamic upper fjord and a relatively static lower fjord. The accompanying maps represent the first publicly available high-resolution bathymetric surveys of Muir Inlet. The results of these analyses serve as a test of the CMECS and as a baseline for continued mapping and correlations among seafloor substrate, benthic habitats, and glaciomarine processes.
Zhang, Yiyan; Xin, Yi; Li, Qin; Ma, Jianshe; Li, Shuai; Lv, Xiaodan; Lv, Weiqi
2017-11-02
Various data mining algorithms continue to emerge with the development of related disciplines, and their applicable scopes and performances differ. Hence, finding a suitable algorithm for a dataset is becoming important for biomedical researchers who need to solve practical problems promptly. In this paper, seven well-established algorithms, namely C4.5, support vector machine, AdaBoost, k-nearest neighbor, naïve Bayes, random forest, and logistic regression, were selected as the research objects. The seven algorithms were applied to the 12 most popular UCI public datasets for classification tasks, and their performances were compared through induction and analysis. The sample size, number of attributes, number of missing values, sample size of each class, correlation coefficients between variables, class entropy of the task variable, and the ratio of the sample size of the largest class to that of the smallest class were calculated to characterize the 12 datasets. The two ensemble algorithms reached high classification accuracy on most datasets. Moreover, random forest performed better than AdaBoost on unbalanced, multi-class datasets. Simple algorithms, such as naïve Bayes and logistic regression, are suitable for small datasets with high correlation between the task and other non-task attribute variables. K-nearest neighbor and C4.5 decision tree algorithms perform well on binary- and multi-class task datasets. Support vector machine is more adept at balanced, small, binary-class datasets. No algorithm maintains the best performance across all datasets. The applicability of the seven data mining algorithms to datasets with different characteristics was summarized to provide a reference for biomedical researchers or beginners in different fields.
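A comparison of this kind is straightforward to reproduce. The sketch below, assuming Python with scikit-learn, runs the seven algorithm families through the same 10-fold cross-validation on one public dataset; an entropy-based CART stands in for C4.5, which scikit-learn does not ship, and all other choices are default settings rather than the paper's configurations.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
models = {
    'C4.5-like tree': DecisionTreeClassifier(criterion='entropy'),
    'SVM': make_pipeline(StandardScaler(), SVC()),
    'AdaBoost': AdaBoostClassifier(),
    'k-NN': make_pipeline(StandardScaler(), KNeighborsClassifier()),
    'naive Bayes': GaussianNB(),
    'random forest': RandomForestClassifier(),
    'logistic regression': make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
}
for name, clf in models.items():
    scores = cross_val_score(clf, X, y, cv=10)  # same folds-per-model protocol
    print(f'{name:20s} mean accuracy = {scores.mean():.3f}')
```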
NASA Astrophysics Data System (ADS)
Pham, Binh Thai; Prakash, Indra; Tien Bui, Dieu
2018-02-01
A hybrid machine learning approach combining Random Subspace (RSS) and Classification And Regression Trees (CART), named RSSCART, is proposed for the spatial prediction of landslides. The model combines the RSS method, an efficient ensemble technique, with CART, a state-of-the-art classifier. The Luc Yen district of Yen Bai province, a prominent landslide-prone area of Viet Nam, was selected for model development. Performance of the RSSCART model was evaluated through the Receiver Operating Characteristic (ROC) curve, statistical analysis methods, and the Chi-square test. Results were compared with benchmark landslide models, namely Support Vector Machines (SVM), single CART, Naïve Bayes Trees (NBT), and Logistic Regression (LR). In the development of the model, ten important landslide-affecting factors related to geomorphology, geology and geo-environment were considered, namely slope angle, elevation, slope aspect, curvature, lithology, distance to faults, distance to rivers, distance to roads, and rainfall. Performance of the RSSCART model (AUC = 0.841) was the best compared with the other popular landslide models, namely SVM (0.835), single CART (0.822), NBT (0.821), and LR (0.723). These results indicate that RSSCART is a promising method for spatial landslide prediction.
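For readers who want to experiment with the ensemble design, the following sketch approximates an RSS-CART model in Python with scikit-learn: a bagging wrapper grows CART trees on random feature subspaces. The synthetic data and the 0.7 subspace fraction are illustrative, not the study's landslide inventory or settings, and the `estimator` keyword assumes scikit-learn >= 1.2 (older versions call it `base_estimator`).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=1000, n_features=10, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

rss_cart = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # CART base classifier
    n_estimators=100,
    max_features=0.7,   # each tree sees a random subspace of the factors
    bootstrap=False,    # no sample bootstrapping: pure random subspace
    random_state=1,
).fit(X_tr, y_tr)

# evaluate with the same ROC-based criterion the study reports
auc = roc_auc_score(y_te, rss_cart.predict_proba(X_te)[:, 1])
print('test AUC:', round(auc, 3))
```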
Textual and visual content-based anti-phishing: a Bayesian approach.
Zhang, Haijun; Liu, Gang; Chow, Tommy W S; Liu, Wenyin
2011-10-01
A novel framework using a Bayesian approach for content-based phishing web page detection is presented. Our model takes into account textual and visual contents to measure the similarity between the protected web page and suspicious web pages. A text classifier, an image classifier, and an algorithm fusing the results from the classifiers are introduced. An outstanding feature of this paper is the exploration of a Bayesian model to estimate the matching threshold. This is required in the classifier for determining the class of the web page and identifying whether the web page is phishing or not. In the text classifier, the naive Bayes rule is used to calculate the probability that a web page is phishing. In the image classifier, the earth mover's distance is employed to measure the visual similarity, and our Bayesian model is designed to determine the threshold. In the data fusion algorithm, Bayes' theorem is used to synthesize the classification results from textual and visual content. The effectiveness of our proposed approach was examined on a large-scale dataset collected from real phishing cases. Experimental results demonstrated that the text classifier and the image classifier we designed deliver promising results, the fusion algorithm outperforms either of the individual classifiers, and our model can be adapted to different phishing cases.
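The fusion step can be illustrated in a few lines. The sketch below is an assumption-laden simplification rather than the paper's estimator: it combines the two classifiers' posterior probabilities in log-odds space under a conditional-independence (naive Bayes) assumption, with the prior and the two scores as inputs.

```python
import numpy as np

def fuse_posteriors(prior_phish, p_text, p_image):
    """Naive-Bayes-style fusion of two classifier outputs.

    p_text and p_image are each classifier's estimate of P(phishing | evidence);
    treating the two evidence sources as conditionally independent, each one
    contributes its log-likelihood ratio on top of the shared prior.
    """
    def logit(p):
        return np.log(p) - np.log1p(-p)
    prior_logit = logit(prior_phish)
    fused = (prior_logit
             + (logit(p_text) - prior_logit)
             + (logit(p_image) - prior_logit))
    return 1.0 / (1.0 + np.exp(-fused))  # back to a probability

# two agreeing detectors reinforce each other: prints ~0.973
print(fuse_posteriors(0.5, 0.8, 0.9))
```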
NASA Astrophysics Data System (ADS)
Rose, K.
2012-12-01
Habitat mapping and classification provides essential information for land use planning and ecosystem research, monitoring and management. At the Grand Bay National Estuarine Research Reserve (GRDNERR), Mississippi, habitat characterization of the Grand Bay watershed will also be used to develop a decision-support tool for the NERR's managers and state and local partners. Grand Bay NERR habitat units were identified using a combination of remotely sensed imagery, aerial photography and elevation data. Airborne Imaging Spectrometer for Applications (AISA) hyperspectral data, acquired 5 and 6 May 2010, was analyzed and classified using ENVI v4.8 and v5.0 software. The AISA system was configured to return 63 bands of digital imagery data with a spectral range of 400 to 970 nm (VNIR), spectral resolution (bandwidth) at 8.76 nm, and 1 m spatial resolution. Minimum Noise Fraction (MNF) and Inverse Minimum Noise Fraction were applied to the data prior to using Spectral Angle Mapper ([SAM] supervised) and ISODATA (unsupervised) classification techniques. The resulting class image was exported to ArcGIS 10.0 and visually inspected and compared with the original imagery as well as auxiliary datasets to assist in the attribution of habitat characteristics to the spectral classes, including: National Agricultural Imagery Program (NAIP) aerial photography, Jackson County, MS, 2010; USFWS National Wetlands Inventory, 2007; an existing GRDNERR habitat map (2004), SAV (2009) and salt panne (2002-2003) GIS produced by GRDNERR; and USACE lidar topo-bathymetry, 2005. A field survey to validate the map's accuracy will take place during the 2012 summer season. ENVI's Random Sample generator was used to generate GIS points for a ground-truth survey. The broad range of coastal estuarine habitats and geomorphological features- many of which are transitional and vulnerable to environmental stressors- that have been identified within the GRDNERR point to the value of the Reserve for continued coastal research.
NASA Astrophysics Data System (ADS)
Cochrane, G. R.; Hodson, T. O.; Allee, R.; Cicchetti, G.; Finkbeiner, M.; Goodin, K.; Handley, L.; Madden, C.; Mayer, G.; Shumchenia, E.
2012-12-01
The U.S. Geological Survey (USGS) is one of four primary organizations (along with the National Oceanic and Atmospheric Administration, the Environmental Protection Agency, and NatureServe) responsible for the development of the Coastal and Marine Ecological Classification Standard (CMECS) over the past decade. In June 2012 the Federal Geographic Data Committee approved CMECS as the first-ever comprehensive federal standard for classifying and describing coastal and marine ecosystems. The USGS has pioneered the application of CMECS in Glacier Bay, Alaska as part of its Seafloor Mapping and Benthic Habitat Studies Project. This presentation briefly describes the standard and its application as part of geological survey studies in the West Arm of Glacier Bay. CMECS offers a simple, standard framework and common terminology for describing natural and human-influenced ecosystems from the upper tidal reaches of estuaries to the deepest portions of the ocean. The framework is organized into two settings, biogeographic and aquatic, and four components: water column, geoform, substrate, and biotic. Each describes a separate aspect of the environment and biota. Settings and components can be used in combination or independently to describe ecosystem features. The hierarchical arrangement of units of the settings and components allows users to apply CMECS at the scale and specificity that best suit their needs. Modifiers allow users to customize the classification to meet specific needs. Biotopes can be described when there is a need for more detailed information on the biota and their environment. USGS efforts focused primarily on the substrate and geoform components. Previous research has demonstrated three classes of bottom type that can be derived from multibeam data and that in part determine the distribution of benthic organisms: soft, flat bottom; mixed bottom, including coarse sediment and low-relief rock with low to moderate rugosity; and rugose, hard bottom. The West Arm of Glacier Bay has all of these habitats, the most abundant being soft, flat bottom. In Glacier Bay, species associated with soft, flat bottom habitats include gastropods, algae, flatfish, Tanner crabs, shrimp, sea pens, and other crustaceans; soft corals and sponges dominate areas of boulder and rock substrate. Video observations suggest that the geological-biological associations found in central Glacier Bay are at least partially analogous to those in the West Arm. Given that soft, mud substrate is the most prevalent habitat in the West Arm, it is expected that the species associated with a soft bottom in the bay proper are the most abundant within the West Arm. While mud is the dominant substrate throughout the fjord, the upper and lower West Arm are potentially very different environments due to the spatially and temporally heterogeneous influence of glaciation and its effects on fjord hydrologic and oceanographic conditions. Therefore, we expect variations in the distribution of species, and the development of biotopes for Glacier Bay will require data applicable to the full spectrum of CMECS components.
Risk Classification with an Adaptive Naive Bayes Kernel Machine Model.
Minnier, Jessica; Yuan, Ming; Liu, Jun S; Cai, Tianxi
2015-04-22
Genetic studies of complex traits have uncovered only a small number of risk markers explaining a small fraction of heritability and adding little improvement to disease risk prediction. Standard single-marker methods may lack power in selecting informative markers or estimating effects. Most existing methods also typically do not account for non-linearity. Identifying markers with weak signals and estimating their joint effects among many non-informative markers remains challenging. One potential approach is to group markers based on biological knowledge such as gene structure. If markers in a group tend to have similar effects, proper usage of the group structure could improve power and efficiency in estimation. We propose a two-stage method relating markers to disease risk by taking advantage of known gene-set structures. Imposing a naive Bayes kernel machine (KM) model, we estimate gene-set-specific risk models that relate each gene-set to the outcome in stage I. The KM framework efficiently models potentially non-linear effects of predictors without requiring explicit specification of functional forms. In stage II, we aggregate information across gene-sets via a regularization procedure. Estimation and computational efficiency are further improved with kernel principal component analysis. Asymptotic results for model estimation and gene-set selection are derived, and numerical studies suggest that the proposed procedure could outperform existing procedures for constructing genetic risk models.
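As a rough illustration of the two-stage idea (not the authors' kernel machine estimator), the sketch below uses Python with scikit-learn: stage I builds a per-gene-set risk score through kernel PCA plus logistic regression, and stage II aggregates the scores with an L1-regularized model so that uninformative sets can be dropped. The gene-set boundaries, kernel, and component counts are hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.decomposition import KernelPCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=400, n_features=30, n_informative=10,
                           random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

gene_sets = [range(0, 10), range(10, 20), range(20, 30)]  # hypothetical groupings

def stage_one(X_fit, y_fit, X_eval):
    """Per gene-set risk scores via kernel PCA + logistic regression."""
    scores_fit, scores_eval = [], []
    for g in gene_sets:
        kpca = KernelPCA(n_components=3, kernel='rbf').fit(X_fit[:, list(g)])
        F_fit, F_eval = kpca.transform(X_fit[:, list(g)]), kpca.transform(X_eval[:, list(g)])
        m = LogisticRegression().fit(F_fit, y_fit)
        scores_fit.append(m.decision_function(F_fit))
        scores_eval.append(m.decision_function(F_eval))
    return np.column_stack(scores_fit), np.column_stack(scores_eval)

S_tr, S_te = stage_one(X_tr, y_tr, X_te)
# stage II: regularized aggregation across gene-sets (L1 selects informative sets)
agg = LogisticRegression(penalty='l1', solver='liblinear').fit(S_tr, y_tr)
print('test accuracy:', agg.score(S_te, y_te))
```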
Hierarchical Rhetorical Sentence Categorization for Scientific Papers
NASA Astrophysics Data System (ADS)
Rachman, G. H.; Khodra, M. L.; Widyantoro, D. H.
2018-03-01
Important information in scientific papers is often conveyed by rhetorical sentences that fall into certain categories. Extracting this information requires text categorization. Previous work on this task has employed word frequency, semantic word similarity, hierarchical classification, and other techniques. This paper therefore presents rhetorical sentence categorization for scientific papers, employing TF-IDF and Word2Vec to capture word frequency and semantic word similarity, together with hierarchical classification. Every experiment is tested with two classifiers, Naïve Bayes and linear SVM. This paper shows that the hierarchical classifier outperforms the flat classifier with either TF-IDF or Word2Vec, although the improvement is only about 2 percentage points, from 27.82% with the flat classifier to 29.61% with the hierarchical classifier. It also shows that the hierarchical classifier can build a different learning model for each child category.
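The hierarchical scheme can be sketched compactly: a top-level classifier routes a sentence to a coarse rhetorical zone, and a per-zone classifier assigns the fine category. The toy corpus and the (coarse, fine) label pairs below are invented for illustration; the paper's categories and features (TF-IDF vs. Word2Vec) are richer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# toy corpus; the (coarse, fine) rhetorical labels are hypothetical
data = [
    ("we propose a new categorization method", "own", "aim"),
    ("our approach employs hierarchical classifiers", "own", "method"),
    ("we introduce a sentence-level model", "own", "aim"),
    ("features are weighted with tf-idf", "own", "method"),
    ("previous work used word frequency", "background", "related"),
    ("earlier studies employed svm classifiers", "background", "related"),
    ("the task remains difficult in practice", "background", "gap"),
    ("existing systems ignore semantics", "background", "gap"),
]
docs = [d for d, _, _ in data]
coarse = [c for _, c, _ in data]

# level 1: route sentences to a coarse rhetorical zone
top = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(docs, coarse)

# level 2: one classifier per coarse zone for the fine category
children = {}
for zone in set(coarse):
    idx = [i for i, c in enumerate(coarse) if c == zone]
    children[zone] = make_pipeline(TfidfVectorizer(), MultinomialNB()).fit(
        [docs[i] for i in idx], [data[i][2] for i in idx])

def classify(sentence):
    zone = top.predict([sentence])[0]
    return zone, children[zone].predict([sentence])[0]

print(classify("we present a novel weighting method"))
```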
Prediction of carbonate rock type from NMR responses using data mining techniques
NASA Astrophysics Data System (ADS)
Gonçalves, Eduardo Corrêa; da Silva, Pablo Nascimento; Silveira, Carla Semiramis; Carneiro, Giovanna; Domingues, Ana Beatriz; Moss, Adam; Pritchard, Tim; Plastino, Alexandre; Azeredo, Rodrigo Bagueira de Vasconcellos
2017-05-01
Recent studies have indicated that the accurate identification of carbonate rock types in a reservoir can be employed as a preliminary step to enhance the effectiveness of petrophysical property modeling. Furthermore, rock typing activity has been shown to be of key importance in several steps of formation evaluation, such as the study of sedimentary series, reservoir zonation and well-to-well correlation. In this paper, a methodology based exclusively on the analysis of 1H-NMR (Nuclear Magnetic Resonance) relaxation responses - using data mining algorithms - is evaluated to perform the automatic classification of carbonate samples according to their rock type. We analyze the effectiveness of six different classification algorithms (k-NN, Naïve Bayes, C4.5, Random Forest, SMO and Multilayer Perceptron) and two data preprocessing strategies (discretization and feature selection). The dataset used in this evaluation is formed by 78 1H-NMR T2 distributions of fully brine-saturated rock samples from six different rock type classes. The experiments reveal that the combination of preprocessing strategies with classification algorithms is able to achieve a prediction accuracy of 97.4%.
NASA Astrophysics Data System (ADS)
Yu, Xin; Cao, Liang; Liu, Jinhu; Zhao, Bo; Shan, Xiujuan; Dou, Shuozeng
2014-09-01
We tested the use of otolith shape analysis to discriminate between species and stocks of five goby species (Ctenotrypauchen chinensis, Odontamblyopus lacepedii, Amblychaeturichthys hexanema, Chaeturichthys stigmatias, and Acanthogobius hasta) found in northern Chinese coastal waters. The five species were well differentiated, with high overall classification success using shape indices (83.7%), elliptic Fourier coefficients (98.6%), or the combination of both (94.9%). However, shape analysis alone was only moderately successful at discriminating among the four stocks (Liaodong Bay, LD; Bohai Bay, BH; Huanghe (Yellow) River estuary, HRE; and Jiaozhou Bay, JZ) of A. hasta (50%-54%) and C. stigmatias (65.7%-75.8%). For these two species, shape analysis was moderately successful at discriminating the HRE or JZ stocks from the other stocks, but failed to effectively identify the LD and BH stocks. A large number of otoliths were misclassified between the HRE and JZ stocks, which are geographically well separated. In C. stigmatias, classification success for stock discrimination was higher using elliptic Fourier coefficients alone (70.2%) or in combination with shape indices (75.8%) than using shape indices alone (65.7%), whereas there was little difference among the three methods for A. hasta. Our results support the common belief that otolith shape analysis is generally more effective for interspecific identification than for intraspecific discrimination. Moreover, compared with shape index analysis, Fourier analysis improves classification success in inter- and intra-species discrimination, although this was not always the case for all species.
NASA Astrophysics Data System (ADS)
Candra Permana, Fahmi; Rosmansyah, Yusep; Setiawan Abdullah, Atje
2017-10-01
Students' activity on social media can provide implicit knowledge and new perspectives for an educational system. Sentiment analysis is a part of text mining that can help analyze and classify opinion data. This research uses text mining with the naive Bayes method as an opinion classifier, as an alternative method for evaluating students' satisfaction with an educational institution. Based on test results, the system can classify opinions in Bahasa Indonesia using naive Bayes with an accuracy of 84%, and a comparison between the existing system and the proposed system for evaluating students' satisfaction with the learning process showed a difference of only 16.49%.
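A minimal version of such an opinion classifier takes only a few lines with scikit-learn; the English stand-in reviews below are illustrative (the study worked with Bahasa Indonesia text), and a real system would train on a much larger annotated sample.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# toy stand-ins for student opinions collected from social media
reviews = ["the lecturer explains clearly", "great course and helpful staff",
           "the schedule is confusing", "facilities are poor and outdated",
           "i enjoyed the practical sessions", "assignments were badly organized"]
labels = ["positive", "positive", "negative", "negative", "positive", "negative"]

# bag-of-words counts feeding a multinomial naive Bayes opinion classifier
model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(reviews, labels)
print(model.predict(["the schedule is confusing and badly organized"]))
```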
Geologic characteristics of benthic habitats in Glacier Bay, southeast Alaska
Harney, Jodi N.; Cochrane, Guy R.; Etherington, Lisa L.; Dartnell, Pete; Golden, Nadine E.; Chezar, Hank
2006-01-01
In April 2004, more than 40 hours of georeferenced submarine digital video was collected in water depths of 15-370 m in Glacier Bay to (1) ground-truth existing geophysical data (bathymetry and acoustic reflectance), (2) examine and record geologic characteristics of the sea floor, (3) investigate the relation between substrate types and benthic communities, and (4) construct predictive maps of seafloor geomorphology and habitat distribution. Common substrates observed include rock, boulders, cobbles, rippled sand, bioturbated mud, and extensive beds of living horse mussels and scallops. Four principal sea-floor geomorphic types are distinguished by using video observations. Their distribution in lower and central Glacier Bay is predicted using a supervised, hierarchical decision-tree statistical classification of geophysical data.
Admiralty Bay Benthos Diversity—A census of a complex polar ecosystem
NASA Astrophysics Data System (ADS)
Siciński, Jacek; Jażdżewski, Krzysztof; Broyer, Claude De; Presler, Piotr; Ligowski, Ryszard; Nonato, Edmundo F.; Corbisier, Thais N.; Petti, Monica A. V.; Brito, Tania A. S.; Lavrado, Helena P.; Błażewicz-Paszkowycz, Magdalena; Pabis, Krzysztof; Jażdżewska, Anna; Campos, Lucia S.
2011-03-01
A thorough census of Admiralty Bay benthic biodiversity was completed through the synthesis of data acquired from more than 30 years of observations. Most of the available records arise from successive Polish and Brazilian Antarctic expeditions organized since 1977 and 1982, respectively, but also include new data from joint collecting efforts during the International Polar Year (2007-2009). Geological and hydrological characteristics of Admiralty Bay and a comprehensive species checklist with detailed data on the distribution and nature of the benthic communities are provided. Approximately 1300 species of benthic organisms (excluding bacteria, fungi and parasites) were recorded from the bay's entire depth range (0-500 m). Generalized classifications and descriptions of soft-bottom and hard-bottom invertebrate communities are presented. A time-series analysis showed seasonal and interannual changes in the shallow benthic communities, likely related to ice formation and ice melt within the bay. As one of the best-studied regions in the maritime Antarctic, Admiralty Bay represents a legacy site where continued, systematically integrated data sampling can evaluate the effects of climate change on marine life. Both the high species richness and the high assemblage diversity of the Admiralty Bay shelf benthic community have been documented against the background of habitat heterogeneity.
Prediction of outcome in multiorgan resections for cancer using a bayes-network.
Udelnow, Andrej; Leinung, Steffen; Grochola, Lukasz Filipp; Henne-Bruns, Doris; Würl, Peter
2013-01-01
The long-term success of multivisceral resections for cancer is difficult to forecast due to the complexity of factors influencing the prognosis. The aim of our study was to assess the predictivity of a Bayes network for postoperative outcome and survival. We included every oncologic patient undergoing resection of 4 or more organs from 2002 to 2005 at Ulm University Hospital. Preoperative data were assessed, as well as the tumour classification, the resected organs, intra- and postoperative complications, and overall survival. Using the Genie 2.0 software we developed a Bayes network. Multivisceral tumour resections were performed in 22 patients. The areas under the receiver operating characteristic curves for the variables "survival >12 months" and "hospitalisation >28 days" as predicted by the Bayes network were 0.81 and 0.77 and differed significantly from 0.5 (p = 0.019 and 0.028, respectively). The positive predictive values of the Bayes network for these variables were 1 and 0.8 and the negative ones 0.71 and 0.88, respectively. Bayes networks are useful for prognosis estimation in individual patients and can help decide whether to perform a multivisceral resection for cancer.
Liu, Jingfang; Zhang, Pengzhu; Lu, Yingjie
2014-11-01
User-generated medical messages on the Internet contain extensive information related to adverse drug reactions (ADRs) and are a valuable resource for post-marketing drug surveillance. The aim of this study was to find an effective method to automatically identify messages related to ADRs in online user reviews. We conducted experiments on online user reviews using different feature sets and different classification techniques. First, messages from three communities (allergy, schizophrenia and pain management) were collected, and 3000 messages were annotated. Second, an N-gram-based feature set and a medical domain-specific feature set were generated. Third, three classification techniques, SVM, C4.5 and Naïve Bayes, were used to perform the classification tasks separately. Finally, we evaluated the performance of each combination of feature set and classification technique by comparing metrics including accuracy and F-measure. In terms of accuracy, the SVM classifier exceeded 0.8 while the C4.5 and Naïve Bayes classifiers fell below 0.8; meanwhile, the combined feature set (n-gram-based plus domain-specific features) consistently outperformed either single feature set. In terms of F-measure, the highest value, 0.895, was achieved using the combined feature set with an SVM classifier. Overall, combining both feature sets with an SVM classifier yields an effective method for automatically identifying ADR-related messages in online user reviews.
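The winning combination, a merged feature space feeding an SVM, can be sketched as follows in Python with scikit-learn. Character n-grams stand in for the paper's hand-crafted medical domain-specific features, which are not described here, and the four example messages are invented.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import LinearSVC

# word n-grams approximate the paper's n-gram feature set; char n-grams are a
# stand-in for its medical domain-specific features (an assumption)
features = FeatureUnion([
    ("word_ngrams", TfidfVectorizer(ngram_range=(1, 2))),
    ("char_ngrams", TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5))),
])

msgs = ["this drug gave me severe headaches", "no side effects so far",
        "felt dizzy and nauseous after the dose", "works fine for my allergy"]
is_adr = [1, 0, 1, 0]  # 1 = message describes an adverse drug reaction

clf = make_pipeline(features, LinearSVC()).fit(msgs, is_adr)
print(clf.predict(["terrible nausea since starting the medication"]))
```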
Content Abstract Classification Using Naive Bayes
NASA Astrophysics Data System (ADS)
Latif, Syukriyanto; Suwardoyo, Untung; Aldrin Wihelmus Sanadi, Edwin
2018-03-01
This study aims to classify abstracts based on the most frequently used words in the abstracts of English-language journals. The research uses text mining technology that extracts text data to search for information in a set of documents. A total of 120 abstracts were downloaded from www.computer.org. The data are grouped into three categories: DM (Data Mining), ITS (Intelligent Transport System) and MM (Multimedia). The system uses the naive Bayes algorithm to classify the abstracts, with a feature selection process using term weighting to assign a weight to each word. Dimensionality reduction was applied to remove words that rarely appear in the documents, with reduction parameters tested from 10% to 90% of the 5,344 words. The performance of the classification system was tested using a confusion matrix on comparisons of training and test data. The results show that the best classification was obtained with 75% of the data used for training and 25% for testing. Accuracy for the DM, ITS and MM categories was 100%, 100% and 86%, respectively, with a dimensionality reduction parameter of 30% and a learning rate between 0.1 and 0.5.
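The evaluation pipeline described above (75/25 split, naive Bayes, confusion matrix, pruning of rarely occurring words) can be sketched as follows with scikit-learn; the toy abstracts and the min_df pruning threshold are illustrative stand-ins for the 120 downloaded abstracts and the paper's term-weighting scheme.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import confusion_matrix, accuracy_score
from sklearn.pipeline import make_pipeline

# toy abstracts; labels mirror the paper's DM / ITS / MM categories
abstracts = ["mining frequent patterns in large databases",
             "clustering and association rule discovery",
             "classification of transactional data streams",
             "traffic flow prediction for smart roads",
             "vehicle routing and congestion control",
             "sensor networks for intelligent transport",
             "video compression and streaming quality",
             "image and audio retrieval systems",
             "multimedia content indexing methods"] * 4
labels = (["DM"] * 3 + ["ITS"] * 3 + ["MM"] * 3) * 4

X_tr, X_te, y_tr, y_te = train_test_split(
    abstracts, labels, test_size=0.25, stratify=labels, random_state=0)

# min_df acts as a crude dimensionality reduction, dropping words that
# rarely appear across documents
model = make_pipeline(CountVectorizer(min_df=2), MultinomialNB())
model.fit(X_tr, y_tr)
pred = model.predict(X_te)
print(accuracy_score(y_te, pred))
print(confusion_matrix(y_te, pred, labels=["DM", "ITS", "MM"]))
```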
On Lagrangian residual currents with applications in south San Francisco Bay, California
Cheng, Ralph T.; Casulli, Vincenzo
1982-01-01
The Lagrangian residual circulation has often been introduced as the sum of the Eulerian residual circulation and the Stokes' drift. Unfortunately, this definition of the Lagrangian residual circulation is conceptually incorrect because both the Eulerian residual circulation and the Stokes' drift are Eulerian variables. In this paper a classification of various residual variables is reviewed and the variables properly defined. The Lagrangian residual circulation is then studied by means of a two-stage formulation of a computer model. The tidal circulation is first computed in a conventional Eulerian way, and then the Lagrangian residual circulation is determined by a method patterned after the method of markers and cells. To demonstrate properties of the Lagrangian residual circulation, application of this approach in South San Francisco Bay, California, is considered. With the aid of the model results, properties of the Eulerian and Lagrangian residual circulation are examined. It can be concluded that estimation of the Lagrangian residual circulation from Eulerian data may lead to unacceptable error, particularly in a tidal estuary where the tidal excursion is of the same order of magnitude as the length scale of the basin. A direct calculation of the Lagrangian residual circulation must be made and has been shown to be feasible.
Spatial modeling and classification of corneal shape.
Marsolo, Keith; Twa, Michael; Bullimore, Mark A; Parthasarathy, Srinivasan
2007-03-01
One of the most promising applications of data mining is in biomedical data used in patient diagnosis. Any method of data analysis intended to support the clinical decision-making process should meet several criteria: it should capture clinically relevant features, be computationally feasible, and provide easily interpretable results. In an initial study, we examined the feasibility of using Zernike polynomials to represent biomedical instrument data in conjunction with a decision tree classifier to distinguish between the diseased and non-diseased eyes. Here, we provide a comprehensive follow-up to that work, examining a second representation, pseudo-Zernike polynomials, to determine whether they provide any increase in classification accuracy. We compare the fidelity of both methods using residual root-mean-square (rms) error and evaluate accuracy using several classifiers: neural networks, C4.5 decision trees, Voting Feature Intervals, and Naïve Bayes. We also examine the effect of several meta-learning strategies: boosting, bagging, and Random Forests (RFs). We present results comparing accuracy as it relates to dataset and transformation resolution over a larger, more challenging, multi-class dataset. They show that classification accuracy is similar for both data transformations, but differs by classifier. We find that the Zernike polynomials provide better feature representation than the pseudo-Zernikes and that the decision trees yield the best balance of classification accuracy and interpretability.
Predicted seafloor facies of Central Santa Monica Bay, California
Dartnell, Peter; Gardner, James V.
2004-01-01
Summary -- Mapping surficial seafloor facies (sand, silt, muddy sand, rock, etc.) should be the first step in marine geological studies and is crucial when modeling sediment processes, pollution transport, deciphering tectonics, and defining benthic habitats. This report outlines an empirical technique that predicts the distribution of seafloor facies for a large area offshore Los Angeles, CA using high-resolution bathymetry and co-registered, calibrated backscatter from multibeam echosounders (MBES) correlated to ground-truth sediment samples. The technique uses a series of procedures that involve supervised classification and a hierarchical decision tree classification that are now available in advanced image-analysis software packages. Derivative variance images of both bathymetry and acoustic backscatter are calculated from the MBES data and then used in a hierarchical decision-tree framework to classify the MBES data into areas of rock, gravelly muddy sand, muddy sand, and mud. A quantitative accuracy assessment on the classification results is performed using ground-truth sediment samples. The predicted facies map is also ground-truthed using seafloor photographs and high-resolution sub-bottom seismic-reflection profiles. This Open-File Report contains the predicted seafloor facies map as a georeferenced TIFF image along with the multibeam bathymetry and acoustic backscatter data used in the study as well as an explanation of the empirical classification process.
Bayesian network modelling of upper gastrointestinal bleeding
NASA Astrophysics Data System (ADS)
Aisha, Nazziwa; Shohaimi, Shamarina; Adam, Mohd Bakri
2013-09-01
Bayesian networks are graphical probabilistic models that represent causal and other relationships between domain variables. In the context of medical decision making, these models have been explored to help in medical diagnosis and prognosis. In this paper, we discuss the Bayesian network formalism in building medical support systems and we learn a tree augmented naive Bayes network (TAN) from gastrointestinal bleeding (GIB) data. The accuracy of the TAN in classifying the source of gastrointestinal bleeding into upper or lower source is obtained. The TAN achieves a high classification accuracy of 86% and an area under the curve of 92%. A sensitivity analysis of the model shows relatively high levels of entropy reduction for color of the stool, history of gastrointestinal bleeding, consistency, and the ratio of blood urea nitrogen to creatinine. The TAN facilitates the identification of the source of GIB and requires further validation.
Nearest Neighbor Algorithms for Pattern Classification
NASA Technical Reports Server (NTRS)
Barrios, J. O.
1972-01-01
A solution of the discrimination problem is considered by means of the minimum distance classifier, commonly referred to as the nearest neighbor (NN) rule. The NN rule is nonparametric, or distribution free, in the sense that it does not depend on any assumptions about the underlying statistics for its application. The k-NN rule is a procedure that assigns an observation vector z to a category F if most of the k nearby observations x_i are elements of F. The condensed nearest neighbor (CNN) rule may be used to reduce the size of the training set required for categorization. The Bayes risk serves merely as a reference: the limit of excellence beyond which it is not possible to go. The NN risk is bounded below by the Bayes risk and above by twice the Bayes risk.
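The bracketing of the NN risk by the Bayes risk is easy to check numerically. In the sketch below, two equally likely unit-variance Gaussian classes with means 0 and 2 give a Bayes risk of Phi(-1), approximately 0.159, and the empirical 1-NN error on held-out data should land between that value and twice it; the sample sizes are arbitrary.

```python
import numpy as np
from scipy.stats import norm
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
n = 5000
# two equally likely 1-D Gaussian classes, means 0 and 2, unit variance
y = rng.integers(0, 2, size=n)
x = rng.normal(loc=2.0 * y, scale=1.0).reshape(-1, 1)

# the Bayes rule thresholds at the midpoint 1.0, so the Bayes risk is Phi(-1)
bayes_risk = norm.cdf(-1.0)

knn = KNeighborsClassifier(n_neighbors=1).fit(x[:4000], y[:4000])
nn_error = 1.0 - knn.score(x[4000:], y[4000:])
print(f"Bayes risk = {bayes_risk:.3f}, 1-NN error = {nn_error:.3f} "
      f"(should lie between R* and 2R*)")
```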
On the Discriminant Analysis in the 2-Populations Case
NASA Astrophysics Data System (ADS)
Rublík, František
2008-01-01
The empirical Bayes Gaussian rule, which in the normal case yields good values of the probability of total error, may yield high values of the maximum probability of error. From this point of view the presented modified version of the classification rule of Broffitt, Randles and Hogg appears to be superior. The modification included in this paper is termed the WR method, and the choice of its weights is discussed. The mentioned methods are also compared with the k-nearest-neighbours classification rule.
Caccia, Valentina G; Boyer, Joseph N
2005-11-01
An objective classification analysis was performed on a water quality data set from 25 sites collected monthly during 1994-2003. The water quality parameters measured included: TN, TON, DIN, NH4+, NO3-, NO2-, TP, SRP, TN:TP ratio, TOC, DO, CHL A, turbidity, salinity and temperature. Based on this spatial analysis, Biscayne Bay was divided into five zones having similar water quality characteristics. A robust nutrient gradient, driven mostly by dissolved inorganic nitrogen, from alongshore to offshore in the main Bay, was a large determinant in the spatial clustering. Two of these zones (Alongshore and Inshore) were heavily influenced by freshwater input from four canals which drain the South Dade agricultural area, Black Point Landfill, and sewage treatment plant. The North Bay zone, with high turbidity, phytoplankton biomass, total phosphorus, and low DO, was affected by runoff from five canals, the Munisport Landfill, and the urban landscape. The South Bay zone, an embayment surrounded by mangrove wetlands with little urban development, was high in dissolved organic constituents but low in inorganic nutrients. The Main Bay was the area most influenced by water exchange with the Atlantic Ocean and showed the lowest nutrient concentrations. The water quality in Biscayne Bay is therefore highly dependent of the land use and influence from the watershed.
Schmidt, Wiebke; Evers-King, Hayley L.; Campos, Carlos J. A.; Jones, Darren B.; Miller, Peter I.; Davidson, Keith; Shutler, Jamie D.
2018-01-01
Microbiological contamination or elevated marine biotoxin concentrations within shellfish can result in temporary closure of shellfish aquaculture harvesting, leading to financial loss for the aquaculture business and a potential reduction in consumer confidence in shellfish products. We present a method for predicting short-term variations in shellfish concentrations of Escherichia coli and biotoxin (okadaic acid and its derivates dinophysistoxins and pectenotoxins). The approach was evaluated for 2 contrasting shellfish harvesting areas. Through a meta-data analysis and using environmental data (in situ, satellite observations and meteorological nowcasts and forecasts), key environmental drivers were identified and used to develop models to predict E. coli and biotoxin concentrations within shellfish. Models were trained and evaluated using independent datasets, and the best models were identified based on the model exhibiting the lowest root mean square error. The best biotoxin model was able to provide 1 wk forecasts with an accuracy of 86%, a 0% false positive rate and a 0% false discovery rate (n = 78 observations) when used to predict the closure of shellfish beds due to biotoxin. The best E. coli models were used to predict the European hygiene classification of the shellfish beds to an accuracy of 99% (n = 107 observations) and 98% (n = 63 observations) for a bay (St Austell Bay) and an estuary (Turnaware Bar), respectively. This generic approach enables high accuracy short-term farm-specific forecasts, based on readily accessible environmental data and observations. PMID:29805719
Classification of postural profiles among mouth-breathing children by learning vector quantization.
Mancini, F; Sousa, F S; Hummel, A D; Falcão, A E J; Yi, L C; Ortolani, C F; Sigulem, D; Pisa, I T
2011-01-01
Mouth breathing is a chronic syndrome that may bring about postural changes. Finding characteristic patterns of change in the complex musculoskeletal system of mouth-breathing children has been a challenge. Learning vector quantization (LVQ) is an artificial neural network model that can be applied for this purpose. The aim of the present study was to apply LVQ to determine the characteristic postural profiles shown by mouth-breathing children, in order to further understand abnormal posture among mouth breathers. Postural training data on 52 children (30 mouth breathers and 22 nose breathers) and postural validation data on 32 children (22 mouth breathers and 10 nose breathers) were used. The performance of LVQ was compared with that of other classification models: self-organizing maps, back-propagation applied to multilayer perceptrons, Bayesian networks, naive Bayes, J48 decision trees, K*, and k-nearest-neighbor classifiers. Classifier accuracy was assessed by means of leave-one-out cross-validation, area under the ROC curve (AUC), and inter-rater agreement (Kappa statistics). By using the LVQ model, five postural profiles for mouth-breathing children could be determined. LVQ showed satisfactory results for mouth-breathing and nose-breathing classification: sensitivity and specificity rates of 0.90 and 0.95, respectively, when using the training dataset, and 0.95 and 0.90, respectively, when using the validation dataset. The five postural profiles for mouth-breathing children suggested by LVQ were incorporated into application software for classifying the severity of mouth breathers' abnormal posture.
Bayes factors based on robust TDT-type tests for family trio design.
Yuan, Min; Pan, Xiaoqing; Yang, Yaning
2015-06-01
Adaptive transmission disequilibrium test (aTDT) and MAX3 test are two robust-efficient association tests for case-parent family trio data. Both tests incorporate information of common genetic models including recessive, additive and dominant models and are efficient in power and robust to genetic model specifications. The aTDT uses information of departure from Hardy-Weinberg disequilibrium to identify the potential genetic model underlying the data and then applies the corresponding TDT-type test, and the MAX3 test is defined as the maximum of the absolute value of three TDT-type tests under the three common genetic models. In this article, we propose three robust Bayes procedures, the aTDT based Bayes factor, MAX3 based Bayes factor and Bayes model averaging (BMA), for association analysis with case-parent trio design. The asymptotic distributions of aTDT under the null and alternative hypothesis are derived in order to calculate its Bayes factor. Extensive simulations show that the Bayes factors and the p-values of the corresponding tests are generally consistent and these Bayes factors are robust to genetic model specifications, especially so when the priors on the genetic models are equal. When equal priors are used for the underlying genetic models, the Bayes factor method based on aTDT is more powerful than those based on MAX3 and Bayes model averaging. When the prior placed a small (large) probability on the true model, the Bayes factor based on aTDT (BMA) is more powerful. Analysis of a simulation data about RA from GAW15 is presented to illustrate applications of the proposed methods.
A study and evaluation of image analysis techniques applied to remotely sensed data
NASA Technical Reports Server (NTRS)
Atkinson, R. J.; Dasarathy, B. V.; Lybanon, M.; Ramapriyan, H. K.
1976-01-01
An analysis of phenomena causing nonlinearities in the transformation from Landsat multispectral scanner coordinates to ground coordinates is presented. Experimental results comparing rms errors at ground control points indicated a slight improvement when a nonlinear (8-parameter) transformation was used instead of an affine (6-parameter) transformation. Using a preliminary ground truth map of a test site in Alabama covering the Mobile Bay area and six Landsat images of the same scene, several classification methods were assessed. A methodology was developed for automatic change detection using classification/cluster maps. A coding scheme was employed for generation of change depiction maps indicating specific types of changes. Inter- and intraseasonal data of the Mobile Bay test area were compared to illustrate the method. A beginning was made in the study of data compression by applying a Karhunen-Loeve transform technique to a small section of the test data set. The second part of the report provides a formal documentation of the several programs developed for the analysis and assessments presented.
Moore, Jason H; Gilbert, Joshua C; Tsai, Chia-Ti; Chiang, Fu-Tien; Holden, Todd; Barney, Nate; White, Bill C
2006-07-21
Detecting, characterizing, and interpreting gene-gene interactions or epistasis in studies of human disease susceptibility is both a mathematical and a computational challenge. To address this problem, we have previously developed a multifactor dimensionality reduction (MDR) method for collapsing high-dimensional genetic data into a single dimension (i.e. constructive induction) thus permitting interactions to be detected in relatively small sample sizes. In this paper, we describe a comprehensive and flexible framework for detecting and interpreting gene-gene interactions that utilizes advances in information theory for selecting interesting single-nucleotide polymorphisms (SNPs), MDR for constructive induction, machine learning methods for classification, and finally graphical models for interpretation. We illustrate the usefulness of this strategy using artificial datasets simulated from several different two-locus and three-locus epistasis models. We show that the accuracy, sensitivity, specificity, and precision of a naïve Bayes classifier are significantly improved when SNPs are selected based on their information gain (i.e. class entropy removed) and reduced to a single attribute using MDR. We then apply this strategy to detecting, characterizing, and interpreting epistatic models in a genetic study (n = 500) of atrial fibrillation and show that both classification and model interpretation are significantly improved.
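The filter-then-classify portion of this strategy (information-gain ranking of SNPs followed by a naïve Bayes classifier) can be sketched as below with scikit-learn; the MDR constructive-induction step is omitted, and the quantized synthetic matrix is only a stand-in for real SNP genotypes.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.datasets import make_classification

# stand-in for a SNP matrix: 500 subjects x 50 markers coded 0/1/2
X_cont, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                                random_state=3)
X = np.digitize(X_cont, bins=np.quantile(X_cont, [1/3, 2/3])).astype(float)

pipe = make_pipeline(
    SelectKBest(mutual_info_classif, k=10),  # rank markers by information gain
    GaussianNB(),                            # naive Bayes on the retained markers
)
print(cross_val_score(pipe, X, y, cv=5).mean())
```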
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Taiping; Yang, Zhaoqing; Khangaonkar, Tarang
2010-04-22
In this study, a hydrodynamic model based on the unstructured-grid finite volume coastal ocean model (FVCOM) was developed for Bellingham Bay, Washington. The model simulates water surface elevation, velocity, temperature, and salinity in a three-dimensional domain that covers the entire Bellingham Bay and adjacent water bodies, including Lummi Bay, Samish Bay, Padilla Bay, and Rosario Strait. The model was developed using Pacific Northwest National Laboratory's high-resolution Puget Sound and Northwest Straits circulation and transport model. A sub-model grid for Bellingham Bay and adjacent coastal waters was extracted from the Puget Sound model and refined in Bellingham Bay using bathymetric light detection and ranging (LIDAR) and river channel cross-section data. The model uses tides, river inflows, and meteorological inputs to predict water surface elevations, currents, salinity, and temperature. A tidal open boundary condition was specified using standard National Oceanic and Atmospheric Administration (NOAA) predictions. Temperature and salinity open boundary conditions were specified based on observed data. Meteorological forcing (wind, solar radiation, and net surface heat flux) was obtained from NOAA observations and National Center for Environmental Prediction North American Regional Analysis outputs. The model was run in parallel with 48 cores using a time step of 2.5 seconds; 26 days of simulation took 18 hours of CPU time. The model was calibrated with oceanographic field data for the period 6/1/2009 to 6/26/2009. These data were collected specifically for model development and calibration and include time series of water-surface elevation, currents, temperature, and salinity, as well as temperature and salinity profiles during instrument deployment and retrieval. Comparisons between model predictions and field observations show an overall reasonable agreement in both temporal and spatial scales. Root mean square errors for the surface elevation, velocity, temperature, and salinity time series are 0.11 m, 0.10 m/s, 1.28 °C, and 1.91 ppt, respectively. The model was able to reproduce the salinity and temperature stratification inside Bellingham Bay. Wetting and drying processes in the tidal flats of Bellingham Bay, Samish Bay, and Padilla Bay were also successfully simulated. Both model results and observed data indicated that water surface elevations inside Bellingham Bay are highly correlated with tides. Circulation inside the bay is weak and complex and is affected by various forcing mechanisms, including tides, winds, freshwater inflows, and other local factors. The Bellingham Bay model solution was successfully linked to the NOAA oil spill trajectory simulation model, the General NOAA Operational Modeling Environment (GNOME). Overall, the Bellingham Bay model has been calibrated reasonably well and can be used to provide detailed hydrodynamic information in the bay and adjacent water bodies. While there is room for further improvement as more data become available, the calibrated model provides useful hydrodynamic information for Bellingham Bay and can be used to support sediment transport and water quality modeling, as well as to assist in the design of nearshore restoration scenarios.
NASA Astrophysics Data System (ADS)
Liu, Lei; Chen, Hongde; Zhong, Yijiang; Wang, Jun; Xu, Changgui; Chen, Anqing; Du, Xiaofeng
2017-10-01
Sediment gravity flow deposits are common, particularly in sandy formations, but their origin has been a matter of debate and there is no consensus about the classification of such deposits. However, sediment gravity flow sandstones are economically important and have the potential to meet a growing demand in oil and gas exploration, so there is a drive to better understand them. This study focuses on sediment gravity flow deposits identified from well cores in Palaeogene deposits from the Liaodong Bay Depression in Bohai Bay Basin, China. We classify the sediment gravity flow deposits into eight lithofacies using lithological characteristics, grain size, and sedimentary structures, and interpret the associated depositional processes. Based on the scale, spatial distribution, and contact relationships of sediment gravity flow deposits, we defined six types of lithofacies associations (LAs) that reflect transformation processes and depositional morphology: LA1 (unconfined proximal breccia deposits), LA2 (confined channel deposits), LA3 (braided-channel lobe deposits), LA4 (unconfined lobe deposits), LA5 (distal sheet deposits), and LA6 (non-channelized sheet deposits). Finally, we established three depositional models that reflect the sedimentological characteristics and depositional processes of sediment gravity flow deposits: (1) slope-apron gravel-rich depositional model, which involves cohesive debris flows deposited as LA1 and dilute turbidity currents deposited as LA5; (2) non-channelized surge-like turbidity current depositional model, which mainly comprises sandy slumping, suspended load dominated turbidity currents, and dilute turbidity currents deposited as LA5 and LA6; and (3) channelized subaqueous-fan depositional model, which consists of non-cohesive bedload dominated turbidity currents, suspended load dominated turbidity currents, and dilute turbidity currents deposited as LA2-LA5, originating from sustained extrabasinal turbidity currents (hyperpycnal flow). The depositional models may be applicable to oil and gas exploration and production from sediment gravity flow systems in similar lacustrine depositional environments elsewhere.
Single-accelerometer-based daily physical activity classification.
Long, Xi; Yin, Bin; Aarts, Ronald M
2009-01-01
In this study, a single tri-axial accelerometer placed on the waist was used to record acceleration data for human physical activity classification. The data collection involved 24 subjects performing daily real-life activities in a naturalistic environment without researchers' intervention. For the purpose of assessing customers' daily energy expenditure, walking, running, cycling, driving, and sports were chosen as target activities for classification. This study compared a Bayesian classification approach with a decision-tree-based approach. A Bayes classifier has the advantage of being more extensible, requiring little effort in classifier retraining and software updates upon further expansion or modification of the target activities. Principal components analysis was applied to remove the correlation among features and to reduce the feature vector dimension. Experiments using leave-one-subject-out and 10-fold cross-validation protocols revealed a classification accuracy of approximately 80%, comparable with that obtained by a decision tree classifier.
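The processing chain, PCA to decorrelate features followed by a Bayes classifier, is simple to prototype. In this sketch the synthetic five-class feature matrix stands in for windowed accelerometer features (means, variances, spectral bins, and so on), and Gaussian naive Bayes stands in for the paper's Bayes classifier; the component count is an illustrative choice.

```python
from sklearn.datasets import make_classification
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# stand-in for windowed accelerometer features over five activity classes
X, y = make_classification(n_samples=2400, n_features=24, n_informative=10,
                           n_classes=5, n_clusters_per_class=1, random_state=4)

# PCA decorrelates the features, which suits the independence assumption
# of the Gaussian naive Bayes classifier applied afterwards
pipe = make_pipeline(PCA(n_components=10), GaussianNB())
print(cross_val_score(pipe, X, y, cv=10).mean())
```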
Machine learning for the assessment of Alzheimer's disease through DTI
NASA Astrophysics Data System (ADS)
Lella, Eufemia; Amoroso, Nicola; Bellotti, Roberto; Diacono, Domenico; La Rocca, Marianna; Maggipinto, Tommaso; Monaco, Alfonso; Tangaro, Sabina
2017-09-01
Digital imaging techniques have found several medical applications in the development of computer aided detection systems, especially in neuroimaging. Recent advances in Diffusion Tensor Imaging (DTI) aim to discover biological markers for the early diagnosis of Alzheimer's disease (AD), one of the most widespread neurodegenerative disorders. We explore here how different supervised classification models provide robust support for the diagnosis of AD patients. We use DTI measures assessing the structural integrity of white matter (WM) fiber tracts to reveal patterns of disrupted brain connectivity. In particular, we compute voxel-wise measures of fractional anisotropy (FA) and mean diffusivity (MD) to identify the brain regions most affected by neurodegeneration, and then derive intensity features to feed supervised classification algorithms. We evaluate the accuracy of discriminating AD patients from healthy controls (HC) on a dataset of 80 subjects (40 HC, 40 AD) from the Alzheimer's Disease Neuroimaging Initiative (ADNI). We compare three state-of-the-art classification models: Random Forests, Naive Bayes, and Support Vector Machines (SVMs). We use a repeated five-fold cross-validation framework with nested feature selection to perform a fair comparison between these algorithms and to evaluate the information content they provide. Results show that AD patterns are well localized within the brain; thus, DTI features can support the AD diagnosis.
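A hedged R sketch of comparing the three classifiers under repeated five-fold cross-validation with caret::train(); the data are random placeholders and the nested feature selection step is omitted. Assumes the caret package plus its randomForest, klaR, and kernlab backends.

```r
library(caret)
set.seed(42)
X <- data.frame(matrix(rnorm(80 * 10), ncol = 10))  # stand-in for DTI intensity features
y <- factor(rep(c("AD", "HC"), each = 40))

ctrl <- trainControl(method = "repeatedcv", number = 5, repeats = 10)
for (m in c("rf", "nb", "svmRadial")) {
  fit <- train(X, y, method = m, trControl = ctrl)
  cat(m, "accuracy:", max(fit$results$Accuracy), "\n")
}
```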
A new local-global approach for classification.
Peres, R T; Pedreira, C E
2010-09-01
In this paper, we propose a new local-global pattern classification scheme that combines supervised and unsupervised approaches, taking advantage of both local and global environments. We understand global methods as those concerned with constructing a model for the whole problem space using the totality of the available observations. Local methods focus on subregions of the space, possibly using an appropriately selected subset of the sample. In the proposed method, the sample is first divided into local cells by using an unsupervised vector quantization algorithm, the LBG (Linde-Buzo-Gray). In a second stage, the resulting assemblage of much easier problems is solved locally with a scheme inspired by Bayes' rule. Four classification methods were implemented for comparison with the proposed scheme: Learning Vector Quantization (LVQ), feedforward neural networks, Support Vector Machines (SVM), and k-nearest neighbors. These four methods and the proposed scheme were applied to eleven datasets: two controlled experiments plus nine publicly available datasets from the UCI repository. The proposed method showed quite competitive performance when compared to these classical and widely used classifiers. Our method is simple to understand and implement and is based on very intuitive concepts. Copyright 2010 Elsevier Ltd. All rights reserved.
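A minimal R sketch of the local-global idea: partition the sample into cells with an unsupervised quantizer, then solve each cell locally. kmeans() stands in for the LBG algorithm, and a majority-class rule stands in for the paper's Bayes-inspired local scheme.

```r
set.seed(7)
X <- rbind(matrix(rnorm(100, 0), ncol = 2), matrix(rnorm(100, 3), ncol = 2))
y <- factor(rep(c("A", "B"), each = 50))

cell <- kmeans(X, centers = 4)$cluster            # unsupervised local cells
rule <- tapply(y, cell, function(cl) names(which.max(table(cl))))  # local decision per cell
pred <- factor(rule[as.character(cell)], levels = levels(y))
mean(pred == y)                                    # training accuracy of the local rules
```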
Geothermal Potential of Adak Island, Alaska
1985-10-01
alteration of the Andrew Bay Hot Springs is essentially propylitic, with the introduction of pyrite and the conversion of magnetite to pyrite. This pyritic... features: Goethite coats the walls of a 1-mm fracture in this rock. Classification: Propylitically altered andesite porphyry breccia.
Computing a Comprehensible Model for Spam Filtering
NASA Astrophysics Data System (ADS)
Ruiz-Sepúlveda, Amparo; Triviño-Rodriguez, José L.; Morales-Bueno, Rafael
In this paper, we describe the application of the Decision Tree Boosting (DTB) learning model to spam email filtering. This classification task implies learning in a high-dimensional feature space, so it is an example of how the DTB algorithm performs on such problems. In [1], it has been shown that hypotheses computed by the DTB model are more comprehensible than those computed by other ensemble methods. Hence, this paper aims to show that the DTB algorithm maintains the same comprehensibility of hypotheses in high-dimensional feature space problems while achieving the performance of other ensemble methods. Four traditional evaluation measures (precision, recall, F1, and accuracy) were considered for performance comparison between DTB and other models usually applied to spam email filtering. The hypothesis computed by DTB is smaller and more comprehensible than those computed by AdaBoost and Naïve Bayes.
Boehm, Udo; Steingroever, Helen; Wagenmakers, Eric-Jan
2018-06-01
Quantitative models that represent different cognitive variables in terms of model parameters are an important tool in the advancement of cognitive science. To evaluate such models, their parameters are typically tested for relationships with behavioral and physiological variables that are thought to reflect specific cognitive processes. However, many models do not come equipped with the statistical framework needed to relate model parameters to covariates. Instead, researchers often revert to classifying participants into groups depending on their values on the covariates, and subsequently comparing the estimated model parameters between these groups. Here we develop a comprehensive solution to the covariate problem in the form of a Bayesian regression framework. Our framework can be easily added to existing cognitive models and allows researchers to quantify the evidential support for relationships between covariates and model parameters using Bayes factors. Moreover, we present a simulation study that demonstrates the superiority of the Bayesian regression framework to the conventional classification-based approach.
Classifying environmentally significant urban land uses with satellite imagery.
Park, Mi-Hyun; Stenstrom, Michael K
2008-01-01
We investigated Bayesian networks to classify urban land use from satellite imagery. Landsat Enhanced Thematic Mapper Plus (ETM+) images were used for the classification in two study areas: (1) Marina del Rey and its vicinity in the Santa Monica Bay Watershed, CA, and (2) drainage basins adjacent to the Sweetwater Reservoir in San Diego, CA. Bayesian networks provided 80-95% classification accuracy for urban land use using four different classification systems. The classifications were robust with small training data sets, at both full and reduced radiometric resolution. The networks needed only 5% of the total data (i.e., 1500 pixels) as training samples and only 5- or 6-bit information for accurate classification. The network explicitly showed the relationships among variables through its structure and was also capable of utilizing information from non-spectral data. The classification can be used to provide timely and inexpensive land use information over large areas for environmental purposes such as estimating stormwater pollutant loads.
Evolving optimised decision rules for intrusion detection using particle swarm paradigm
NASA Astrophysics Data System (ADS)
Sivatha Sindhu, Siva S.; Geetha, S.; Kannan, A.
2012-12-01
The aim of this article is to construct a practical intrusion detection system (IDS) that properly analyses the statistics of network traffic patterns and classifies them as normal or anomalous. The objective is to show that the choice of effective network traffic features and a proficient machine-learning paradigm enhances the detection accuracy of an IDS. In this article, a rule-based approach with a family of six decision tree classifiers, namely Decision Stump, C4.5, Naive Bayes Tree, Random Forest, Random Tree, and Representative Tree, is introduced to detect anomalous network patterns. In particular, the proposed swarm-optimisation-based approach selects the instances that compose the training set, and an optimised decision tree operating on this training set produces classification rules with improved coverage, classification capability, and generalisation ability. Experiments with the Knowledge Discovery and Data Mining (KDD) data set, which contains information on traffic patterns during normal and intrusive behaviour, show that the proposed algorithm produces optimised decision rules and outperforms other machine-learning algorithms.
Vafaee Sharbaf, Fatemeh; Mosafer, Sara; Moattar, Mohammad Hossein
2016-06-01
This paper proposes an approach for gene selection in microarray data. The proposed approach consists of a primary filter stage using the Fisher criterion, which reduces the initial set of genes and hence the search space and time complexity. Then, a wrapper approach based on cellular learning automata (CLA) optimized with the ant colony method (ACO) is used to find the set of features that improves classification accuracy. CLA is applied due to its capability to learn and model complicated relationships. The features selected in the last phase are evaluated using ROC curves, and the smallest, most effective feature subset is determined. The classifiers evaluated in the proposed framework are k-nearest neighbor, support vector machine, and naïve Bayes. The proposed approach is evaluated on 4 microarray datasets. The evaluations confirm that the proposed approach can find the smallest subset of genes while approaching the maximum accuracy. Copyright © 2016 Elsevier Inc. All rights reserved.
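A minimal R sketch of the primary filter stage, assuming the common two-class Fisher criterion F(g) = (mu1 - mu2)^2 / (var1 + var2); the CLA-ACO wrapper stage is not reproduced.

```r
fisher_score <- function(x, y) {
  g1 <- x[y == levels(y)[1]]; g2 <- x[y == levels(y)[2]]
  (mean(g1) - mean(g2))^2 / (var(g1) + var(g2))
}
set.seed(3)
expr <- matrix(rnorm(60 * 100), nrow = 60)          # 60 samples x 100 toy genes
cls  <- factor(rep(c("tumor", "normal"), each = 30))
scores <- apply(expr, 2, fisher_score, y = cls)
head(order(scores, decreasing = TRUE), 10)          # top-ranked genes for the wrapper stage
```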
Classification of wetlands vegetation using small scale color infrared imagery
NASA Technical Reports Server (NTRS)
Williamson, F. S. L.
1975-01-01
A classification system for Chesapeake Bay wetlands was derived from the correlation of film density classes and actual vegetation classes. The data processing programs used were developed by the Laboratory for the Applications of Remote Sensing. These programs were tested for their value in classifying natural vegetation, using digitized data from small scale aerial photography. Existing imagery and the vegetation map of Farm Creek Marsh were used to determine the optimal number of classes, and to aid in determining if the computer maps were a believable product.
Consistent latent position estimation and vertex classification for random dot product graphs.
Sussman, Daniel L; Tang, Minh; Priebe, Carey E
2014-01-01
In this work, we show that using the eigen-decomposition of the adjacency matrix, we can consistently estimate latent positions for random dot product graphs provided the latent positions are i.i.d. from some distribution. If class labels are observed for a number of vertices tending to infinity, then we show that the remaining vertices can be classified with error converging to Bayes optimal using the k-nearest-neighbors classification rule. We evaluate the proposed methods on simulated data and a graph derived from Wikipedia.
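A minimal R sketch of the procedure on a toy graph: estimate latent positions from the scaled top eigenvectors of the adjacency matrix, then classify held-out vertices with class::knn(). The embedding dimension and k are arbitrary choices here.

```r
library(class)                                   # provides knn()
set.seed(11)
n <- 100
A <- matrix(rbinom(n * n, 1, 0.1), n, n)
A[lower.tri(A)] <- t(A)[lower.tri(A)]; diag(A) <- 0          # toy symmetric adjacency
e <- eigen(A, symmetric = TRUE)
Xhat <- e$vectors[, 1:2] %*% diag(sqrt(abs(e$values[1:2])))  # latent position estimates
labels <- factor(rep(c("u", "v"), each = n / 2))
knn(train = Xhat[1:80, ], test = Xhat[81:100, ], cl = labels[1:80], k = 5)
```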
Mandelkow, Hendrik; de Zwart, Jacco A.; Duyn, Jeff H.
2016-01-01
Naturalistic stimuli like movies evoke complex perceptual processes, which are of great interest in the study of human cognition by functional MRI (fMRI). However, conventional fMRI analysis based on statistical parametric mapping (SPM) and the general linear model (GLM) is hampered by a lack of accurate parametric models of the BOLD response to complex stimuli. In this situation, statistical machine-learning methods, a.k.a. multivariate pattern analysis (MVPA), have received growing attention for their ability to generate stimulus response models in a data-driven fashion. However, machine-learning methods typically require large amounts of training data as well as computational resources. In the past, this has largely limited their application to fMRI experiments involving small sets of stimulus categories and small regions of interest in the brain. By contrast, the present study compares several classification algorithms known as Nearest Neighbor (NN), Gaussian Naïve Bayes (GNB), and (regularized) Linear Discriminant Analysis (LDA) in terms of their classification accuracy in discriminating the global fMRI response patterns evoked by a large number of naturalistic visual stimuli presented as a movie. Results show that LDA regularized by principal component analysis (PCA) achieved high classification accuracies, above 90% on average for single fMRI volumes acquired 2 s apart during a 300 s movie (chance level 0.7% = 2 s/300 s). The largest source of classification errors was autocorrelation in the BOLD signal compounded by the similarity of consecutive stimuli. All classifiers performed best when given input features from a large region of interest comprising around 25% of the voxels that responded significantly to the visual stimulus. Consistent with this, the most informative principal components represented widespread distributions of co-activated brain regions that were similar between subjects and may represent functional networks. In light of these results, the combination of naturalistic movie stimuli and classification analysis in fMRI experiments may prove to be a sensitive tool for the assessment of changes in natural cognitive processes under experimental manipulation. PMID:27065832
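A minimal R sketch of PCA-regularized LDA, the best-performing classifier above, using prcomp() and MASS::lda() on placeholder data in place of fMRI volumes:

```r
library(MASS)                                   # provides lda()
set.seed(5)
X <- matrix(rnorm(150 * 50), ncol = 50)         # stand-in for voxel response patterns
y <- factor(sample(letters[1:3], 150, TRUE))    # stand-in for stimulus labels

pcs <- prcomp(X)$x[, 1:10]                      # regularize: keep a few components
fit <- lda(pcs, grouping = y)
mean(predict(fit, pcs)$class == y)              # resubstitution accuracy
```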
Forward for book entitled "Estuaries: Classification, Ecology, and Human Impacts"
The author was introduced to the science of estuaries as a graduate student in the early 1980s, studying the ecology of oyster populations in Chesapeake Bay. To undertake this research, he needed to learn not only about oyster biology, but also about the unique physical and chemi...
Question analysis for Indonesian comparative question
NASA Astrophysics Data System (ADS)
Saelan, A.; Purwarianti, A.; Widyantoro, D. H.
2017-01-01
Information seeking is one of today's human needs. Comparing things using a search engine surely takes more time than searching for a single thing. In this paper, we analyze comparative questions for a comparative question answering system. A comparative question is a question that compares two or more entities. We grouped comparative questions into 5 types: selection between mentioned entities, selection between unmentioned entities, selection between any entities, comparison, and yes-or-no questions. Then we extracted 4 types of information from comparative questions: entity, aspect, comparison, and constraint. We built classifiers for the classification task and the information extraction task. The features used for the classification task are bag-of-words, while for information extraction we used the lexical form of the current word, the lexical forms of the 2 previous and following words, and the previous label as features. We tried 2 scenarios: classification first and extraction first. For classification first, we used the classification result as a feature for extraction; conversely, for extraction first, we used the extraction results as features for classification. We found that results were better when extraction was done before classification. For the extraction task, classification using SMO gave the best result (88.78%), while for the classification task it was better to use naïve Bayes (82.35%).
Kim, Eunji; Ivanov, Ivan; Hua, Jianping; Lampe, Johanna W; Hullar, Meredith Aj; Chapkin, Robert S; Dougherty, Edward R
2017-01-01
Ranking feature sets for phenotype classification based on gene expression is a challenging issue in cancer bioinformatics. When the number of samples is small, all feature selection algorithms are known to be unreliable, producing significant error, and error estimators suffer from different degrees of imprecision. The problem is compounded by the fact that the accuracy of classification depends on the manner in which the phenomena are transformed into data by the measurement technology. Because next-generation sequencing technologies amount to a nonlinear transformation of the actual gene or RNA concentrations, they can potentially produce less discriminative data relative to the actual gene expression levels. In this study, we compare the performance of ranking feature sets derived from a model of RNA-Seq data with that of a multivariate normal model of gene concentrations using 3 measures: (1) ranking power, (2) length of extensions, and (3) Bayes features. This is a model-based study examining the effectiveness of reporting lists of small feature sets using RNA-Seq data and the effects of different model parameters and error estimators. The results demonstrate that the general trends of the parameter effects on the ranking power of the underlying gene concentrations are preserved in the RNA-Seq data, whereas the power of finding a good feature set becomes weaker when gene concentrations are transformed by the sequencing machine.
Numerical study of water residence time in the Yueqing Bay based on the eulerian approach
NASA Astrophysics Data System (ADS)
Ying, Chao; Li, Xinwen; Liu, Yong; Yao, Wenwei; Li, Ruijie
2018-05-01
Yueqing Bay is a semi-enclosed bay located in the southeast of Zhejiang Province, China. Due to substantial anthropogenic influences since 1964, water quality in the bay has deteriorated seriously, and urgent measures should be taken to protect the water body. In this study, a numerical model was calibrated for water surface elevation and tidal currents from August 14 to August 26, 2011. Comparisons of observed and simulated data showed that the model reproduced the tidal range and phase and the variations of the current at different periods fairly well. The calibrated model was then applied to investigate the spatial flushing pattern of the bay through calculation of residence time. The results obtained from a series of model experiments demonstrated that the residence time increased from 10 days at the bay mouth to more than 70 days at the upper bay. The average residence time over the whole bay was 49.5 days. In addition, the flushing homogeneity curve showed that the residence time in the bay varied smoothly. This study provides a numerical tool to quantify transport timescales in Yueqing Bay and supports adaptive management of the bay by local authorities.
Modeling ozone episodes in the Baltimore-Washington region
NASA Technical Reports Server (NTRS)
Ryan, William F.
1994-01-01
Surface ozone (O3) concentrations in excess of the National Ambient Air Quality Standard (NAAQS) continue to occur in metropolitan areas in the United States despite efforts to control emissions of O3 precursors. Future O3 control strategies will be based on results from modeling efforts that have just begun in many areas. Two initial questions that arise are model sensitivity to domain-specific conditions and the selection of episodes for model evaluation and control strategy development. For the Baltimore-Washington (B-W) region, the presence of the Chesapeake Bay introduces a number of issues relevant to model sensitivity. In this paper, the specific question of determining model volume (mixing height) for the Urban Airshed Model (UAM) is discussed and various alternative methods are compared. For the latter question, several analytic approaches, Cluster Analysis and Classification and Regression Tree (CART) analysis, are undertaken to determine meteorological conditions associated with severe O3 events in the B-W domain.
A Landsat-Based Assessment of Mobile Bay Land Use and Land Cover Change from 1974 to 2008
NASA Technical Reports Server (NTRS)
Spruce, Joseph; Ellis, Jean; Smoot, James; Swann, Roberta; Graham, William
2009-01-01
The Mobile Bay region has experienced noteworthy land use and land cover (LULC) change in the latter half of the 20th century. Accompanying this change has been urban expansion and a reduction of rural land uses. Much of this LULC change has reportedly occurred since the landfall of Hurricane Frederic in 1979. The Mobile Bay region provides great economic and ecologic benefits to the Nation, including important coastal habitat for a broad diversity of fisheries and wildlife. Regional urbanization threatens the estuary's water quality and aquatic-habitat-dependent biota, including commercial fisheries and avian wildlife. Coastal conservation and urban land use planners require additional information on historical LULC change to support coastal habitat restoration and resiliency management efforts. This presentation discusses results of a Gulf of Mexico Application Pilot project that was conducted in 2008 to quantify and assess LULC change from 1974 to 2008. This project was led by NASA Stennis Space Center and involved multiple Gulf of Mexico Alliance (GOMA) partners, including the Mobile Bay National Estuary Program (NEP), the U.S. Army Corps of Engineers, the National Oceanic and Atmospheric Administration's (NOAA's) National Coastal Data Development Center (NCDDC), and the NOAA Coastal Services Center. Nine Landsat images were employed to compute LULC products because of their availability and suitability for the application. The project also used Landsat-based national LULC products, including coastal LULC products from NOAA's Coastal Change Analysis Program (C-CAP), available at 5-year intervals since 1995. Our study was initiated in part because C-CAP LULC products were not available to assess the region's urbanization prior to 1995 or subsequent to the post-Hurricane Katrina products of 2006. This project assessed LULC change across the 34-year time frame and at decadal and middecadal scales. The study area included the majority of Mobile and Baldwin counties, which encompass Mobile Bay. Each date of Landsat data was classified using an end-user-defined, modified Anderson level 1 classification scheme. LULC classifications were refined using a decision rule approach in conjunction with available C-CAP products. Individual dates of LULC classifications were validated by image interpretation of stratified random locations on raw Landsat color composite imagery in combination with higher resolution remote sensing and in-situ reference data. The results indicate that during the 34-year study period, urban areas increased from 96,688 to 150,227 acres, representing a 55.37% increase, or 1.63% per annum. Most of the identified urban expansion results from conversion of rural forest and agriculture to urban cover types. Final LULC mapping and metadata products were produced for the entire study area as well as watersheds of concern within the study area. Final project products, including LULC trend information, were incorporated into the Mobile Bay NEP State of the Bay report. Products and metadata were transferred to NOAA NCDDC to allow free online accessibility and use by GOMA partners and by the public.
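The growth figures above can be verified directly; note the per-annum value is a simple (not compounded) rate:

```r
urban_1974 <- 96688; urban_2008 <- 150227          # acres, from the text
total_pct  <- (urban_2008 - urban_1974) / urban_1974 * 100
total_pct        # 55.37% over 34 years
total_pct / 34   # ~1.63% per annum, simple rate
```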
Observations and a linear model of water level in an interconnected inlet-bay system
NASA Astrophysics Data System (ADS)
Aretxabaleta, Alfredo L.; Ganju, Neil K.; Butman, Bradford; Signell, Richard P.
2017-04-01
A system of barrier islands and back-barrier bays occurs along southern Long Island, New York, and in many coastal areas worldwide. Characterizing the bay physical response to water level fluctuations is needed to understand flooding during extreme events and evaluate their relation to geomorphological changes. Offshore sea level is one of the main drivers of water level fluctuations in semienclosed back-barrier bays. We analyzed observed water levels (October 2007 to November 2015) and developed analytical models to better understand bay water level along southern Long Island. An increase (~0.02 m change in 0.17 m amplitude) in the dominant M2 tidal amplitude (containing the largest fraction of the variability) was observed in Great South Bay during mid-2014. The observed changes in both tidal amplitude and bay water level transfer from offshore were related to the dredging of nearby inlets and possibly the changing size of a breach across Fire Island caused by Hurricane Sandy (after December 2012). The bay response was independent of the magnitude of the fluctuations (e.g., storms) at a specific frequency. An analytical model that incorporates bay and inlet dimensions reproduced the observed transfer function in Great South Bay and surrounding areas. The model predicts the transfer function in Moriches and Shinnecock bays where long-term observations were not available. The model is a simplified tool to investigate changes in bay water level and enables the evaluation of future conditions and alternative geomorphological settings.
Extracting galactic structure parameters from multivariated density estimation
NASA Technical Reports Server (NTRS)
Chen, B.; Creze, M.; Robin, A.; Bienayme, O.
1992-01-01
Multivariate statistical analysis, including cluster analysis (unsupervised classification), discriminant analysis (supervised classification), and principal component analysis (a dimensionality reduction method), together with nonparametric density estimation, has been successfully used to search for meaningful associations in the 5-dimensional space of observables between observed points and sets of simulated points generated from a synthetic approach to galaxy modelling. These methodologies can be applied as new tools to obtain information about hidden structure otherwise unrecognizable, and place important constraints on the space distribution of various stellar populations in the Milky Way. In this paper, we concentrate on illustrating how to use nonparametric density estimation to substitute for the true densities of both the simulated sample and the real sample in the five-dimensional space. In order to fit model-predicted densities to reality, we derive a set of equations comprising n equations (where n is the total number of observed points) in m unknown parameters (where m is the number of predefined groups). A least-squares estimation allows us to determine the density law of different groups and components in the Galaxy. The output from our software, which can be used in many research fields, also gives the systematic error between the model and the observation by a Bayes rule.
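A minimal R sketch of nonparametric (kernel) density estimation, the substitution step described above, in one dimension rather than the five-dimensional space of observables:

```r
set.seed(9)
obs  <- c(rnorm(300, -1, 0.5), rnorm(200, 2, 1))    # toy 1-D observable
dens <- density(obs, kernel = "gaussian")            # bandwidth chosen automatically
plot(dens, main = "Kernel density estimate")
approx(dens$x, dens$y, xout = c(-1, 0, 2))$y         # density evaluated at new points
```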
Drinking Water Microbiome as a Screening Tool for ...
Many water utilities in the US using chloramine as disinfectant treatment in their distribution systems have experienced nitrification episodes, which detrimentally impact the water quality. A chloraminated drinking water distribution system (DWDS) simulator was operated through four successive operational schemes, including two stable events (SS) and an episode of nitrification (SF), followed by a ‘chlorine burn’ (SR) by switching disinfectant from chloramine to free chlorine. The current research investigated the viability of biological signatures as potential indicators of operational failure and predictors of nitrification in DWDS. For this purpose, we examined the bulk water (BW) bacterial microbiome of a chloraminated DWDS simulator operated through successive operational schemes, including an episode of nitrification. BW data was chosen because sampling of BW in a DWDS by water utility operators is relatively simpler and easier than collecting biofilm samples from underground pipes. The methodology applied a supervised classification machine learning approach (naïve Bayes algorithm) for developing predictive models for nitrification. Classification models were trained with biological datasets (Operational Taxonomic Unit [OTU] and genus-level taxonomic groups) generated using next generation high-throughput technology, and divided into two groups (i.e. binary) of positives and negatives (Failure and Stable, respectively). We also invest
Logical Differential Prediction Bayes Net, improving breast cancer diagnosis for older women.
Nassif, Houssam; Wu, Yirong; Page, David; Burnside, Elizabeth
2012-01-01
Overdiagnosis is a phenomenon in which screening identifies cancer which may not go on to cause symptoms or death. Women over 65 who develop breast cancer bear the heaviest burden of overdiagnosis. This work introduces novel machine learning algorithms to improve the diagnostic accuracy of breast cancer in aging populations. At the same time, we aim at minimizing unnecessary invasive procedures (thus decreasing false positives) and concomitantly addressing overdiagnosis. We develop a novel algorithm, Logical Differential Prediction Bayes Net (LDP-BN), that calculates the risk of breast disease based on mammography findings. LDP-BN uses Inductive Logic Programming (ILP) to learn relational rules, selects older-specific differentially predictive rules, and incorporates them into a Bayes Net, significantly improving its performance. In addition, LDP-BN offers valuable insight into the classification process, revealing novel older-specific rules that link mass presence to invasive disease, and calcification presence and lack of detectable mass to DCIS.
Know your data: understanding implicit usage versus explicit action in video content classification
NASA Astrophysics Data System (ADS)
Yew, Jude; Shamma, David A.
2011-02-01
In this paper, we present a method for video category classification using only social metadata from websites like YouTube. In place of content analysis, we utilize the communicative and social contexts surrounding videos as a means to determine a categorical genre, e.g., Comedy or Music. We hypothesize that video clips belonging to different genre categories have distinct signatures and patterns that are reflected in their collected metadata. In particular, we define and describe social metadata as usage or action to aid in classification. We trained a Naive Bayes classifier to predict categories from a sample of 1,740 YouTube videos representing the top five genre categories. Using just a small number of the available metadata features, we compare the classifications produced by our Naive Bayes classifier with those provided by the uploader of each particular video. Compared to random predictions with the YouTube data (21% accurate), our classifier attained a mediocre 33% accuracy in predicting video genres. However, we found that the accuracy of our classifier improves significantly with nominal factoring of the explicit data features. By factoring the ratings of the videos in the dataset, the classifier was able to accurately predict the genres of 75% of the videos. We argue that the patterns of social activity found in the metadata are not just meaningful in their own right, but are indicative of the meaning of the shared video content. The results presented by this project represent a first step in investigating the potential meaning and significance of social metadata and its relation to the media experience.
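A minimal R sketch of the factoring step that improved accuracy above: binning a numeric metadata feature into nominal levels before training e1071::naiveBayes(). Feature names and cut points are invented for illustration.

```r
library(e1071)
set.seed(2)
n <- 500
meta <- data.frame(
  ratings  = cut(runif(n, 1, 5), breaks = c(0, 2.5, 4, 5),
                 labels = c("low", "mid", "high")),   # nominal factoring of ratings
  comments = rpois(n, 20),                            # a raw numeric feature
  genre    = factor(sample(c("Comedy", "Music"), n, TRUE))
)
fit <- naiveBayes(genre ~ ratings + comments, data = meta)
table(predict(fit, meta), meta$genre)                 # confusion matrix on toy data
```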
He, Qiwei; Veldkamp, Bernard P; Glas, Cees A W; de Vries, Theo
2017-03-01
Patients' narratives about traumatic experiences and symptoms are useful in clinical screening and diagnostic procedures. In this study, we presented an automated assessment system to screen patients for posttraumatic stress disorder via a natural language processing and text-mining approach. Four machine-learning algorithms (decision tree, naive Bayes, support vector machine, and an alternative classification approach called the product score model) were used in combination with n-gram representation models to identify patterns between verbal features in self-narratives and psychiatric diagnoses. With our sample, the product score model with unigrams attained the highest prediction accuracy when compared with practitioners' diagnoses. The addition of multigrams contributed most to balancing the metrics of sensitivity and specificity. This article also demonstrates that text mining is a promising approach for analyzing patients' self-expression behavior, thus helping clinicians identify potential patients from an early stage.
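A minimal base-R sketch of the n-gram representation such classifiers consume; the tokenization here is deliberately crude and the example sentence is invented.

```r
ngrams <- function(text, n = 2) {
  toks <- strsplit(tolower(text), "\\W+")[[1]]   # naive tokenization on non-word chars
  toks <- toks[nzchar(toks)]
  if (length(toks) < n) return(character(0))
  sapply(seq_len(length(toks) - n + 1),
         function(i) paste(toks[i:(i + n - 1)], collapse = " "))
}
ngrams("I could not sleep after the accident", 2)   # bigrams from a toy narrative
```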
Taniguchi, Hidetaka; Sato, Hiroshi; Shirakawa, Tomohiro
2018-05-09
Human learners can generalize a new concept from a small number of samples. In contrast, conventional machine learning methods require large amounts of data to address the same types of problems. Humans have cognitive biases that promote fast learning. Here, we developed a method to reduce the gap between human beings and machines in this type of inference by utilizing cognitive biases. We implemented a human cognitive model into machine learning algorithms and compared their performance with the currently most popular methods, naïve Bayes, support vector machine, neural networks, logistic regression and random forests. We focused on the task of spam classification, which has been studied for a long time in the field of machine learning and often requires a large amount of data to obtain high accuracy. Our models achieved superior performance with small and biased samples in comparison with other representative machine learning methods.
Höhna, Sebastian; Landis, Michael J; Heath, Tracy A; Boussau, Bastien; Lartillot, Nicolas; Moore, Brian R; Huelsenbeck, John P; Ronquist, Fredrik
2016-07-01
Programs for Bayesian inference of phylogeny currently implement a unique and fixed suite of models. Consequently, users of these software packages are simultaneously forced to use a number of programs for a given study, while also lacking the freedom to explore models that have not been implemented by the developers of those programs. We developed a new open-source software package, RevBayes, to address these problems. RevBayes is entirely based on probabilistic graphical models, a powerful generic framework for specifying and analyzing statistical models. Phylogenetic-graphical models can be specified interactively in RevBayes, piece by piece, using a new succinct and intuitive language called Rev. Rev is similar to the R language and the BUGS model-specification language, and should be easy to learn for most users. The strength of RevBayes is the simplicity with which one can design, specify, and implement new and complex models. Fortunately, this tremendous flexibility does not come at the cost of slower computation; as we demonstrate, RevBayes outperforms competing software for several standard analyses. Compared with other programs, RevBayes has fewer black-box elements. Users need to explicitly specify each part of the model and analysis. Although this explicitness may initially be unfamiliar, we are convinced that this transparency will improve understanding of phylogenetic models in our field. Moreover, it will motivate the search for improvements to existing methods by brazenly exposing the model choices that we make to critical scrutiny. RevBayes is freely available at http://www.RevBayes.com [Bayesian inference; Graphical models; MCMC; statistical phylogenetics.]. © The Author(s) 2016. Published by Oxford University Press, on behalf of the Society of Systematic Biologists.
Changes in Chesapeake Bay Hypoxia over the Past Century
NASA Astrophysics Data System (ADS)
Friedrichs, M. A.; Kaufman, D. E.; Najjar, R.; Tian, H.; Zhang, B.; Yao, Y.
2016-02-01
The Chesapeake Bay, one of the world's largest estuaries, is among the many coastal systems where hypoxia is a major concern and where dissolved oxygen thus represents a critical factor in determining the health of the Bay's ecosystem. Over the past century, the population of the Chesapeake Bay region has almost quadrupled, greatly modifying land cover and management practices within the watershed. Simultaneously, the Chesapeake Bay has been experiencing a high degree of climate change, including increases in temperature, precipitation, and precipitation intensity. Together, these changes have resulted in significantly increased riverine nutrient inputs to the Bay. In order to examine how interdecadal changes in riverine nitrogen input affect biogeochemical cycling and dissolved oxygen concentrations in Chesapeake Bay, a land-estuarine-ocean biogeochemical modeling system has been developed for this region. Riverine inputs of nitrogen to the Bay are computed from a terrestrial ecosystem model (the Dynamic Land Ecosystem Model; DLEM) that resolves riverine discharge variability on scales of days to years. This temporally varying discharge is then used as input to the estuarine-carbon-biogeochemical model embedded in the Regional Ocean Modeling System (ROMS), which provides estimates of the oxygen concentrations and nitrogen fluxes within the Bay as well as advective exports from the Bay to the adjacent Mid-Atlantic Bight shelf. Simulation results from this linked modeling system for the present (early 2000s) have been extensively evaluated with in situ and remotely sensed data. Longer-term simulations are used to isolate the effect of increased riverine nitrogen loading on dissolved oxygen concentrations and biogeochemical cycling within the Chesapeake Bay.
NASA Technical Reports Server (NTRS)
Defigueiredo, R. J. P.
1974-01-01
General classes of nonlinear and linear transformations were investigated for the reduction of the dimensionality of the classification (feature) space so that, for a prescribed dimension m of this space, the increase of the misclassification risk is minimized.
NASA Technical Reports Server (NTRS)
Ackleson, S. G.; Klemas, V.
1985-01-01
LANDSAT Thematic Mapper (TM) and Multispectral Scanner (MSS) imagery generated simultaneously over Guinea Marsh, Virginia, are assessed for their ability to detect submerged aquatic, bottom-adhering plant canopies (SAV). An unsupervised clustering algorithm is applied to both image types, and the resulting classifications are compared to SAV distributions derived from color aerial photography. Class confidence and accuracy are first computed for all water areas and then only for shallow areas where water depth is less than 6 feet. In both the TM and MSS imagery, masking water areas deeper than 6 ft resulted in greater classification accuracy at confidence levels greater than 50%. Both systems perform poorly in detecting SAV with crown cover densities less than 70%. On the basis of spectral resolution, radiometric sensitivity, and the location of visible bands, TM imagery does not offer a significant advantage over MSS data for detecting SAV in the Lower Chesapeake Bay. However, because TM imagery has higher spatial resolution, smaller SAV canopies may be detected than is possible with MSS data.
NASA Astrophysics Data System (ADS)
Lin, Yi; Jiang, Miao
2017-01-01
Tree species information is essential for forest research and management purposes, which in turn require approaches for accurate and precise classification of tree species. One such remote sensing technology, terrestrial laser scanning (TLS), has proved capable of characterizing detailed tree structures, such as tree stem geometry. Can TLS further differentiate between broad- and needle-leaves? If the answer is positive, TLS data can be used for classification of taxonomic tree groups by directly examining their differences in leaf morphology. An analysis was proposed to assess TLS-represented broad- and needle-leaf structures, followed by a Bayes classifier to perform the classification. Tests indicated that the proposed method can accomplish the task, with an overall accuracy of 77.78%. This study indicates a way of classifying the two major broad- and needle-leaf taxonomies measured by TLS in accordance with their literal definitions, and demonstrates the potential of extending TLS applications in forestry.
NASA Astrophysics Data System (ADS)
Prakash, A.; Haselwimmer, C. E.; Gens, R.; Womble, J. N.; Ver Hoef, J.
2013-12-01
Tidewater glaciers are prominent landscape features that play a significant role in landscape and ecosystem processes along the southeastern and southcentral coasts of Alaska. Tidewater glaciers calve large icebergs that serve as an important substrate for harbor seals (Phoca vitulina richardii) for resting, pupping, nursing young, molting, and avoiding predators. Many of the tidewater glaciers in Alaska are retreating, which may influence harbor seal populations. Our objectives are to investigate the relationship between ice conditions and harbor seal distributions, which are poorly understood, in Johns Hopkins Inlet, Glacier Bay National Park, Alaska, using a combination of airborne remote sensing and statistical modeling techniques. We present an overview of some results from Object-Based Image Analysis (OBIA) for classification of a time series of very high spatial resolution (4 cm pixels) airborne imagery acquired over Johns Hopkins Inlet during the harbor seal pupping season in June and during the molting season in August from 2007 to 2012. Using OBIA we have developed a workflow to automate processing of the large volumes (~1250 images/survey) of airborne visible imagery for (1) classification of ice products (e.g., percent ice cover, percent brash ice, percent icebergs) at a range of scales, and (2) quantitative determination of ice morphological properties such as iceberg size, roundness, and texture that are not available from traditional per-pixel classification approaches. These ice classifications and morphological variables are then used in statistical models to assess relationships with harbor seal abundance and distribution. Ultimately, understanding these relationships may provide novel perspectives on the spatial and temporal variation of harbor seals in tidewater glacial fjords.
Exploring the CAESAR database using dimensionality reduction techniques
NASA Astrophysics Data System (ADS)
Mendoza-Schrock, Olga; Raymer, Michael L.
2012-06-01
The Civilian American and European Surface Anthropometry Resource (CAESAR) database, containing over 40 anthropometric measurements on over 4000 humans, has been extensively explored for pattern recognition and classification purposes using the raw, original data [1-4]. However, some of the anthropometric variables would be impossible to collect in an uncontrolled environment. Here, we explore the use of dimensionality reduction methods in concert with a variety of classification algorithms for gender classification using only those variables that are readily observable in an uncontrolled environment. Several dimensionality reduction techniques are employed to learn the underlying structure of the data. These techniques include linear projections such as the classical Principal Components Analysis (PCA) and non-linear (manifold learning) techniques, such as Diffusion Maps and the Isomap technique. This paper briefly describes all three techniques, and compares three different classifiers, Naïve Bayes, Adaboost, and Support Vector Machines (SVM), for gender classification in conjunction with each of these three dimensionality reduction approaches.
Segmentation schema for enhancing land cover identification: A case study using Sentinel 2 data
NASA Astrophysics Data System (ADS)
Mongus, Domen; Žalik, Borut
2018-04-01
Land monitoring is increasingly performed using high- and medium-resolution optical satellites, such as Sentinel-2. However, optical data are inevitably subject to the variable operational conditions under which they were acquired. Overlapping of features caused by shadows, soft transitions between shadowed and non-shadowed regions, and temporal variability of the observed land-cover types usually require radiometric corrections. This study examines a new approach to enhancing the accuracy of land cover identification that resolves this problem. The proposed method constructs an ensemble-type classification model with weak classifiers tuned to the particular operational conditions under which the data were acquired. Iterative segmentation over the learning set is applied for this purpose, where the feature space is partitioned according to the likelihood of misclassifications introduced by the classification model. As these misclassifications are a consequence of overlapping features, such partitioning avoids the need for radiometric corrections of the data and implicitly divides land cover types into subclasses. As a result, improved performance of all tested classification approaches was measured during validation conducted on Sentinel-2 data. The highest accuracies in terms of F1-scores were achieved using the Naive Bayes classifier as the weak classifier, while supplementing the original spectral signatures with the normalised difference vegetation index and texture analysis features, namely average intensity, contrast, homogeneity, and dissimilarity. In total, an F1-score of nearly 95% was achieved in this way, with F1-scores for each particular land cover type reaching above 90%.
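For reference, the normalised difference vegetation index used as a supplementary feature is NDVI = (NIR - Red)/(NIR + Red); for Sentinel-2 this is typically computed from bands 8 and 4. A toy R computation:

```r
red  <- c(0.08, 0.12, 0.30)        # toy band-4 (red) reflectances
nir  <- c(0.45, 0.40, 0.32)        # toy band-8 (near-infrared) reflectances
ndvi <- (nir - red) / (nir + red)
ndvi                                # dense vegetation gives values near 1
```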
Naive scoring of human sleep based on a hidden Markov model of the electroencephalogram.
Yaghouby, Farid; Modur, Pradeep; Sunderam, Sridhar
2014-01-01
Clinical sleep scoring involves tedious visual review of overnight polysomnograms by a human expert. Many attempts have been made to automate the process by training computer algorithms such as support vector machines and hidden Markov models (HMMs) to replicate human scoring. Such supervised classifiers are typically trained on scored data and then validated on scored out-of-sample data. Here we describe a methodology based on HMMs for scoring an overnight sleep recording without the benefit of a trained initial model. The number of states in the data is not known a priori and is optimized using a Bayes information criterion. When tested on a 22-subject database, this unsupervised classifier agreed well with human scores (mean of Cohen's kappa > 0.7). The HMM also outperformed other unsupervised classifiers (Gaussian mixture models, k-means, and linkage trees), that are capable of naive classification but do not model dynamics, by a significant margin (p < 0.05).
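A hedged R sketch of BIC-based selection of the number of states, using a Gaussian mixture fitted with mclust::Mclust() as a simplified stand-in for the HMM (a mixture ignores the temporal dynamics an HMM models):

```r
library(mclust)                     # Mclust() selects the number of components by BIC
set.seed(4)
feat <- c(rnorm(200, 0, 1), rnorm(150, 4, 0.8), rnorm(100, 8, 1.2))  # toy EEG feature
fit  <- Mclust(feat)                # fits mixtures over a range of state counts
fit$G                               # number of "states" chosen by BIC
plot(fit, what = "BIC")
```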
Emotion detection model of Filipino music
NASA Astrophysics Data System (ADS)
Noblejas, Kathleen Alexis; Isidro, Daryl Arvin; Samonte, Mary Jane C.
2017-02-01
This research explored the creation of a model to detect emotion in Filipino songs. The emotion model used was based on Paul Ekman's six basic emotions. The songs were classified into the following genres: kundiman, novelty, pop, and rock. The songs were annotated by a group of music experts based on the emotion each song induces in the listener. Musical features of the songs were extracted using jAudio, while the lyric features were extracted by a bag-of-words feature representation. The audio and lyric features of the Filipino songs were extracted for classification by the three chosen classifiers: Naïve Bayes, Support Vector Machines, and k-Nearest Neighbors. The goal of the research was to determine which classifier works best for Filipino music. Evaluation was done by 10-fold cross-validation, and accuracy, precision, recall, and F-measure results were compared. The models were also tested with unknown test data to further determine their accuracy through the prediction results.
Fortuno, Cristina; James, Paul A; Young, Erin L; Feng, Bing; Olivier, Magali; Pesaran, Tina; Tavtigian, Sean V; Spurdle, Amanda B
2018-05-18
Clinical interpretation of germline missense variants represents a major challenge, including those in the TP53 Li-Fraumeni syndrome gene. Bioinformatic prediction is a key part of variant classification strategies. We aimed to optimize the performance of the Align-GVGD tool used for p53 missense variant prediction and compare its performance to other bioinformatic tools (SIFT, PolyPhen-2) and ensemble methods (REVEL, BayesDel). Reference sets of assumed pathogenic and assumed benign variants were defined using functional and/or clinical data. Area under the curve and Matthews correlation coefficient (MCC) values were used as objective functions to select an optimized protein multi-sequence alignment with the best performance for Align-GVGD. MCC comparison of tools using binary categories showed optimized Align-GVGD (C15 cut-off) combined with BayesDel (0.16 cut-off), or with REVEL (0.5 cut-off), to have the best overall performance. Further, a semi-quantitative approach using multiple tiers of bioinformatic prediction, validated using an independent set of non-functional and functional variants, supported use of Align-GVGD and BayesDel prediction for different strength-of-evidence levels in the ACMG/AMP rules. We provide a rationale for bioinformatic tool selection for TP53 variant classification and have also computed relevant bioinformatic predictions for every possible p53 missense variant to facilitate their use by the scientific and medical community. This article is protected by copyright. All rights reserved.
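The MCC objective used above can be computed directly from a 2x2 confusion matrix; a minimal R sketch with invented counts:

```r
# Matthews correlation coefficient from confusion-matrix counts
mcc <- function(tp, tn, fp, fn) {
  num <- tp * tn - fp * fn
  den <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (den == 0) 0 else num / den
}
mcc(tp = 40, tn = 35, fp = 5, fn = 10)   # toy pathogenic/benign call counts
```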
An Evaluation of Hierarchical Bayes Estimation for the Two- Parameter Logistic Model.
ERIC Educational Resources Information Center
Kim, Seock-Ho
Hierarchical Bayes procedures for the two-parameter logistic item response model were compared for estimating item parameters. Simulated data sets were analyzed using two different Bayes estimation procedures, the two-stage hierarchical Bayes estimation (HB2) and the marginal Bayesian with known hyperparameters (MB), and marginal maximum…
Assessment of various supervised learning algorithms using different performance metrics
NASA Astrophysics Data System (ADS)
Susheel Kumar, S. M.; Laxkar, Deepak; Adhikari, Sourav; Vijayarajan, V.
2017-11-01
Our work presents a comparison of supervised machine learning algorithms based on their performance on a binary classification task. The supervised machine learning algorithms taken into consideration are Support Vector Machine (SVM), Decision Tree (DT), K Nearest Neighbour (KNN), Naïve Bayes (NB), and Random Forest (RF). This paper focuses on comparing the performance of the above-mentioned algorithms on one binary classification task by analysing metrics such as accuracy, F-measure, G-measure, precision, misclassification rate, false positive rate, true positive rate, specificity, and prevalence.
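Most of the metrics listed reduce to functions of the confusion-matrix counts; a minimal R sketch with invented counts:

```r
metrics <- function(tp, fp, tn, fn) {
  c(accuracy    = (tp + tn) / (tp + fp + tn + fn),
    precision   = tp / (tp + fp),
    recall      = tp / (tp + fn),              # true positive rate
    specificity = tn / (tn + fp),
    fpr         = fp / (fp + tn),              # false positive rate
    f_measure   = 2 * tp / (2 * tp + fp + fn))
}
round(metrics(tp = 45, fp = 5, tn = 40, fn = 10), 3)
```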
2009-08-01
Recreating the 1950's Chesapeake Bay: Use of a Network Model to Guide... (System-Wide Water Resources Program, ERDC/EL TR-09-9). Snippet: Figure 14. Comparison of surface light extinction for base and 1950's RMB2 results in the upper, mid, and lower regions of the Chesapeake Bay.
Comparison of four approaches to a rock facies classification problem
Dubois, M.K.; Bohling, Geoffrey C.; Chakrabarti, S.
2007-01-01
In this study, seven classifiers based on four different approaches were tested in a rock facies classification problem: classical parametric methods using Bayes' rule, and non-parametric methods using fuzzy logic, k-nearest neighbor, and a feed-forward, back-propagating artificial neural network. The objective was to determine the most effective classifier for geologic facies prediction in wells without cores in the Panoma gas field in southwest Kansas. Study data include 3600 samples of known rock facies class (from core), each sample having either four or five measured properties (wire-line log curves) and two derived geologic properties (geologic constraining variables). The sample set was divided into two subsets, one for training and one for testing the ability of the trained classifier to correctly assign classes. Artificial neural networks clearly outperformed all other classifiers and are effective tools for this particular classification problem. Classical parametric models were inadequate due to the nature of the predictor variables (high-dimensional and not linearly correlated) and the feature space of the classes (overlapping). The other non-parametric methods tested, k-nearest neighbor and fuzzy logic, would need considerable improvement to match the neural network effectiveness, but further work, possibly combining certain aspects of the three non-parametric methods, may be justified. © 2006 Elsevier Ltd. All rights reserved.
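A minimal R sketch of one of the non-parametric approaches compared above, k-nearest neighbor via class::knn(), with random stand-ins for the wire-line log curves:

```r
library(class)                                  # provides knn()
set.seed(6)
logs   <- matrix(rnorm(300 * 5), ncol = 5)      # toy log-curve predictors
facies <- factor(sample(paste0("F", 1:3), 300, TRUE))
tr <- sample(300, 200)                          # training/testing split
pred <- knn(train = logs[tr, ], test = logs[-tr, ], cl = facies[tr], k = 7)
mean(pred == facies[-tr])                       # holdout accuracy
```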
Delineation of marsh types of the Texas coast from Corpus Christi Bay to the Sabine River in 2010
Enwright, Nicholas M.; Hartley, Stephen B.; Brasher, Michael G.; Visser, Jenneke M.; Mitchell, Michael K.; Ballard, Bart M.; Parr, Mark W.; Couvillion, Brady R.; Wilson, Barry C.
2014-01-01
Coastal zone managers and researchers often require detailed information regarding emergent marsh vegetation types for modeling habitat capacities and needs of marsh-reliant wildlife (such as waterfowl and alligator). Detailed information on the extent and distribution of marsh vegetation zones throughout the Texas coast has been historically unavailable. In response, the U.S. Geological Survey, in cooperation and collaboration with the U.S. Fish and Wildlife Service via the Gulf Coast Joint Venture, Texas A&M University-Kingsville, the University of Louisiana-Lafayette, and Ducks Unlimited, Inc., has produced a classification of marsh vegetation types along the middle and upper Texas coast from Corpus Christi Bay to the Sabine River. This study incorporates approximately 1,000 ground reference locations collected via helicopter surveys in coastal marsh areas and about 2,000 supplemental locations from fresh marsh, water, and “other” (that is, nonmarsh) areas. About two-thirds of these data were used for training, and about one-third were used for assessing accuracy. Decision-tree analyses using Rulequest See5 were used to classify emergent marsh vegetation types by using these data, multitemporal satellite-based multispectral imagery from 2009 to 2011, a bare-earth digital elevation model (DEM) based on airborne light detection and ranging (lidar), alternative contemporary land cover classifications, and other spatially explicit variables believed to be important for delineating the extent and distribution of marsh vegetation communities. Image objects were generated from segmentation of high-resolution airborne imagery acquired in 2010 and were used to refine the classification. The classification is dated 2010 because the year is both the midpoint of the multitemporal satellite-based imagery (2009–11) classified and the date of the high-resolution airborne imagery that was used to develop image objects. Overall accuracy corrected for bias (accuracy estimate incorporates true marginal proportions) was 91 percent (95 percent confidence interval [CI]: 89.2–92.8), with a kappa statistic of 0.79 (95 percent CI: 0.77–0.81). The classification performed best for saline marsh (user’s accuracy 81.5 percent; producer’s accuracy corrected for bias 62.9 percent) but showed a lesser ability to discriminate intermediate marsh (user’s accuracy 47.7 percent; producer’s accuracy corrected for bias 49.5 percent). Because of confusion in intermediate and brackish marsh classes, an alternative classification containing only three marsh types was created in which intermediate and brackish marshes were combined into a single class. Image objects were reattributed by using this alternative three-marsh-type classification. Overall accuracy, corrected for bias, of this more general classification was 92.4 percent (95 percent CI: 90.7–94.2), and the kappa statistic was 0.83 (95 percent CI: 0.81–0.85). Mean user’s accuracy for marshes within the four-marsh-type and three-marsh-type classifications was 65.4 percent and 75.6 percent, respectively, whereas mean producer’s accuracy was 56.7 percent and 65.1 percent, respectively. This study provides a more objective and repeatable method for classifying marsh types of the middle and upper Texas coast at an extent and greater level of detail than previously available for the study area. 
The seamless classification produced through this work is now available to help State agencies (such as the Texas Parks and Wildlife Department) and landscape-scale conservation partnerships (such as the Gulf Coast Prairie Landscape Conservation Cooperative and the Gulf Coast Joint Venture) to develop and (or) refine conservation plans targeting priority natural resources. Moreover, these data may improve projections of landscape change and serve as a baseline for monitoring future changes resulting from chronic and episodic stressors.
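The accuracy figures above (overall accuracy, kappa, user's and producer's accuracies) all derive from a confusion matrix. A minimal sketch of the arithmetic, with an entirely hypothetical four-class error matrix standing in for the study's marsh-type data:

```python
import numpy as np

# Hypothetical 4-class error matrix (rows = reference, columns = predicted),
# standing in for the four marsh types; NOT the study's actual data.
cm = np.array([
    [50,  5,  3,  2],
    [ 6, 40, 10,  4],
    [ 4, 12, 35,  9],
    [ 2,  3,  8, 47],
], dtype=float)

n = cm.sum()
overall = np.trace(cm) / n                                # observed (overall) accuracy
chance = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n**2   # agreement expected by chance
kappa = (overall - chance) / (1 - chance)

producers = np.diag(cm) / cm.sum(axis=1)  # producer's accuracy: correct / reference total
users = np.diag(cm) / cm.sum(axis=0)      # user's accuracy: correct / mapped total

print(f"overall accuracy = {overall:.3f}, kappa = {kappa:.3f}")
print("producer's:", producers.round(3))
print("user's:    ", users.round(3))
```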
NASA Astrophysics Data System (ADS)
Bever, A. J.; Harris, C. K.; McNinch, J.
2006-12-01
Poverty Bay is a small embayment located on the eastern shore of New Zealand's North Island. The modern Waipaoa River, a small mountainous river that drains highly erodible mudstone and siltstone, discharges ~15 million tons of sediment per year to Poverty Bay. Rates of bay infilling from fluvial sediment have varied since the maximum shoreline transgression, ~7,000 years ago (~7 ka). The evolving geometry of Poverty Bay has likely impacted sediment dispersal over these timescales, and thereby influenced the stratigraphic architecture, rates of shoreline progradation, and sediment supply to the continental shelf. This modeling study investigates sediment transport within both the modern and the paleo (~7 ka) Poverty Bay. The Regional Ocean Modeling System was used to examine sediment transport within the modern and ~7 ka Poverty Bay basin geometries. The numerical model includes hydrodynamics driven by winds and buoyancy, and sediment resuspension from energetic waves and currents. Strong winds and waves from the southeast were used, along with high Waipaoa freshwater and sediment discharge, consistent with storm conditions. Besides shedding light on short-term transport mechanisms, these results are being incorporated into a stratigraphic model by Wolinsky and Swenson. The paleo basin geometry narrowed at the head of the bay, causing currents to converge and promoting near-field sediment deposition. Buoyancy- and wind-driven across-shelf currents in the modern bay transport sediment away from the river mouth. Sediment was deposited closer to the river mouth in the paleo bay than in the modern bay, and the modern bay exported much more sediment to the continental shelf than predicted for the middle Holocene bay. Net across-shelf fluxes decreased from a maximum at the head of the bay to nearly zero at the mouth during the paleo run. The modern run, however, had net across-shelf fluxes still half the maximum at the bay mouth. Results from short-term model runs indicated that, with similar river discharges, the ~7 ka Poverty Bay shoreline should have prograded rapidly as sediment was deposited near the river mouth at the head of the bay, an area of little accommodation space. The trapping of sediment within the bay would have led to a relatively sediment-starved continental shelf. As the river mouth progressed towards the wider section of the bay, progradation should have been reduced as both proximal accommodation space and sediment export to the continental shelf increased.
Hydrodynamics and water quality models applied to Sepetiba Bay
NASA Astrophysics Data System (ADS)
Cunha, Cynara de L. da N.; Rosman, Paulo C. C.; Ferreira, Aldo Pacheco; Carlos do Nascimento Monteiro, Teófilo
2006-10-01
A coupled hydrodynamic and water quality model is used to simulate the pollution in Sepetiba Bay due to sewage effluent. Sepetiba Bay has a complicated geometry and bottom topography, and is located on the Brazilian coast near Rio de Janeiro. In the simulation, the dissolved oxygen (DO) concentration and biochemical oxygen demand (BOD) are used as indicators for the presence of organic matter in the body of water, and as parameters for evaluating the environmental pollution of the eastern part of Sepetiba Bay. Effluent sources in the model are taken from DO and BOD field measurements. The simulation results are consistent with field observations and demonstrate that the model has been correctly calibrated. The model is suitable for evaluating the environmental impact of sewage effluent on Sepetiba Bay from river inflows, assessing the feasibility of different treatment schemes, and developing specific monitoring activities. This approach has general applicability for environmental assessment of complicated coastal bays.
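For readers unfamiliar with coupled BOD-DO dynamics, the classical Streeter-Phelps oxygen-sag model captures the basic mechanism (organic matter consumes oxygen while reaeration replenishes it). The sketch below is that textbook formulation with illustrative parameter values, not the multidimensional model used in the study:

```python
import numpy as np

def streeter_phelps(t, L0, D0, kd, ka):
    """DO deficit D(t) from BOD decay (rate kd) and reaeration (rate ka),
    the classic Streeter-Phelps oxygen-sag solution."""
    return (kd * L0 / (ka - kd)) * (np.exp(-kd * t) - np.exp(-ka * t)) + D0 * np.exp(-ka * t)

t = np.linspace(0, 10, 101)   # days of travel time downstream of an effluent source
deficit = streeter_phelps(t, L0=8.0, D0=1.0, kd=0.3, ka=0.6)  # mg/L; illustrative rates
DO_sat = 8.5                  # saturation DO (mg/L), illustrative
DO = DO_sat - deficit
print(f"minimum DO ~ {DO.min():.2f} mg/L at t ~ {t[DO.argmin()]:.1f} d")
```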
NASA Astrophysics Data System (ADS)
Ye, Wei; Song, Wei
2018-02-01
In this paper, the remote sensing monitoring of sea ice is recast as a data-mining classification problem. Based on statistics of the relevant band data from HJ1B remote sensing images, the main HJ1B bands related to the reflectance of seawater and sea ice were identified. On this basis, decision tree rules for sea ice monitoring were constructed from those bands, and the rules were then applied to the Liaodong Bay area, which is heavily covered by sea ice, for sea ice monitoring. The results show that the method is effective.
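Decision-tree rules of the kind described reduce to band-threshold tests. A sketch with hypothetical band indices and thresholds (the paper's actual HJ1B band statistics and rules are not reproduced here):

```python
import numpy as np

def classify_pixel(bands):
    """Toy sea-ice decision rules over a pixel's band reflectances.
    Band roles and thresholds are hypothetical stand-ins, not the paper's rules."""
    b1, b2, b3, b4 = bands            # e.g. blue, green, red, NIR reflectance
    if b4 > 0.15 and b3 > 0.20:       # ice stays bright well into the red/NIR
        return "sea ice"
    if b2 > 0.08 and b4 < 0.05:       # water reflects in green, absorbs strongly in NIR
        return "seawater"
    return "other"

scene = np.random.default_rng(0).uniform(0, 0.4, size=(5, 4))  # 5 pixels x 4 bands
print([classify_pixel(px) for px in scene])
```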
Flow in water-intake pump bays: A guide for utility engineers. Final report
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ettema, R.
1998-09-01
This report is intended to serve as a guide for power-plant engineers facing problems with flow conditions in pump bays in water-intake structures, especially those located alongside rivers. The guide briefly introduces the typical prevailing flow field outside of a riverside water intake. That flow field often sets the inflow conditions for pump bays located within the water intake. The monograph then presents and discusses the main flow problems associated with pump bays. The problems usually revolve around the formation of troublesome vortices. A novel feature of this monograph is the use of numerical modeling to reveal diagnostically how the vortices form and their sensitivities to flow conditions, such as uniformity of approach flow entering the bay and water-surface elevation relative to pump-bell submergence. The modeling was carried out using a computer code developed specially for the present project. Pump-bay layouts are discussed next. The discussion begins with a summary of the main variables influencing bay flows. The numerical model is used to determine the sensitivities of the vortices to variations in the geometric parameters. Remedial fixes are then presented; they include the use of flow-control vanes and suction scoops for ensuring satisfactory flow performance in severe flow conditions, notably flows with strong cross flow and shallow flows. The monograph ends with descriptions of modeling techniques. An extensive discussion is provided on the use of the numerical model for illuminating bay flows. The model is used to show how fluid viscosity affects bay flow. The effect of fluid viscosity is an important consideration in hydraulic modeling of water intakes.
Classifying smoking urges via machine learning
Dumortier, Antoine; Beckjord, Ellen; Shiffman, Saul; Sejdić, Ervin
2016-01-01
Background and objective: Smoking is the largest preventable cause of death and diseases in the developed world, and advances in modern electronics and machine learning can help us deliver real-time intervention to smokers in novel ways. In this paper, we examine different machine learning approaches to use situational features associated with having or not having urges to smoke during a quit attempt in order to accurately classify high-urge states. Methods: To test our machine learning approaches, specifically, Bayes, discriminant analysis and decision tree learning methods, we used a dataset collected from over 300 participants who had initiated a quit attempt. The three classification approaches are evaluated observing sensitivity, specificity, accuracy and precision. Results: The outcome of the analysis showed that algorithms based on feature selection make it possible to obtain high classification rates with only a few features selected from the entire dataset. The classification tree method outperformed the naive Bayes and discriminant analysis methods, with an accuracy of the classifications up to 86%. These numbers suggest that machine learning may be a suitable approach to deal with smoking cessation matters, and to predict smoking urges, outlining a potential use for mobile health applications. Conclusions: Machine learning classifiers can help identify smoking situations, and the search for the best features and classifier parameters significantly improves the algorithms' performance. In addition, this study also supports the usefulness of new technologies in improving the effect of smoking cessation interventions, the management of time and patients by therapists, and thus the optimization of available health care resources. Future studies should focus on providing more adaptive and personalized support to people who really need it, in a minimum amount of time by developing novel expert systems capable of delivering real-time interventions. PMID:28110725
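The three-classifier comparison can be reproduced in outline with scikit-learn; synthetic data stands in for the situational features, and sensitivity/specificity are computed from the pooled cross-validated confusion matrix:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the situational-feature dataset (urge vs no-urge).
X, y = make_classification(n_samples=600, n_features=20, n_informative=6, random_state=0)

models = {
    "naive Bayes": GaussianNB(),
    "discriminant analysis": LinearDiscriminantAnalysis(),
    "decision tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}
for name, model in models.items():
    pred = cross_val_predict(model, X, y, cv=10)      # pooled 10-fold predictions
    tn, fp, fn, tp = confusion_matrix(y, pred).ravel()
    sens, spec = tp / (tp + fn), tn / (tn + fp)
    acc = (tp + tn) / len(y)
    print(f"{name:22s} sens={sens:.2f} spec={spec:.2f} acc={acc:.2f}")
```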
To study the circulation and water quality in the Tillamook Bay, Oregon, a high-resolution estuarine model that covers the shallow bay and the surrounding wetland has been developed. The estuarine circulation at Tillamook Bay is mainly driven by the tides and the river flows and ...
Sea Ice Detection Based on an Improved Similarity Measurement Method Using Hyperspectral Data.
Han, Yanling; Li, Jue; Zhang, Yun; Hong, Zhonghua; Wang, Jing
2017-05-15
Hyperspectral remote sensing technology can acquire nearly continuous spectrum information and rich sea ice image information, thus providing an important means of sea ice detection. However, the correlation and redundancy among hyperspectral bands reduce the accuracy of traditional sea ice detection methods. Based on the spectral characteristics of sea ice, this study presents an improved similarity measurement method based on linear prediction (ISMLP) to detect sea ice. First, the first original band with a large amount of information is determined based on mutual information theory. Subsequently, a second original band with the least similarity is chosen by the spectral correlation measuring method. Finally, subsequent bands are selected through the linear prediction method, and a support vector machine classifier model is applied to classify sea ice. In experiments performed on images of Baffin Bay and Bohai Bay, comparative analyses were conducted to compare the proposed method and traditional sea ice detection methods. Our proposed ISMLP method achieved the highest classification accuracies (91.18% and 94.22%) in both experiments. From these results the ISMLP method exhibits better performance overall than other methods and can be effectively applied to hyperspectral sea ice detection.
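A simplified reading of the ISMLP pipeline (mutual information for the first band, minimal spectral correlation for the second, linear-prediction residuals for subsequent bands, then an SVM) can be sketched as follows; this uses synthetic data and is not the authors' implementation:

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 30))      # pixels x hyperspectral bands (synthetic)
y = rng.integers(0, 2, size=500)    # ice / not-ice labels (synthetic)

# 1) first band: highest mutual information with the class labels
mi = mutual_info_classif(X, y, random_state=0)
selected = [int(np.argmax(mi))]

# 2) second band: least spectrally correlated with the first
corr = np.abs(np.corrcoef(X, rowvar=False))[selected[0]]
corr[selected[0]] = np.inf
selected.append(int(np.argmin(corr)))

# 3) further bands: largest residual when linearly predicted from selected bands
while len(selected) < 5:
    A = X[:, selected]
    coef, *_ = np.linalg.lstsq(A, X, rcond=None)   # predict all bands from chosen ones
    resid = ((X - A @ coef) ** 2).sum(axis=0)
    resid[selected] = -np.inf                      # never re-select a chosen band
    selected.append(int(np.argmax(resid)))

clf = SVC(kernel="rbf").fit(X[:, selected], y)
print("selected bands:", selected, "training accuracy:", clf.score(X[:, selected], y))
```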
[Diversity and antimicrobial activities of cultivable bacteria isolated from Jiaozhou Bay].
Wang, Yiting; Zhang, Chuanbo; Qi, Lin; Jia, Xiaoqiang; Lu, Wenyu
2016-12-04
Marine microorganisms have great potential for producing biologically active secondary metabolites. To study their diversity and antimicrobial activity, we examined 9 sediment samples from different observation sites in Jiaozhou Bay. We used YPD and Z2216E culture media to isolate bacteria from the sediments, and 16S rRNA was sequenced for classification and identification of the isolates. Then, we used the Oxford cup method to detect antimicrobial activities of the isolated bacteria against 7 test strains. Lastly, we selected 16 representatives to detect secondary-metabolite biosynthesis genes (PKSI, NRPS, CYP, PhzE, dTGD) by PCR-specific amplification. A total of 76 bacterial strains were isolated from Jiaozhou Bay. According to 16S rRNA gene sequence analysis, these strains could be sorted into 11 genera belonging to 8 different families: Aneurinibacillus, Brevibacillus, Microbacterium, Oceanisphae, Bacillus, Marinomonas, Staphylococcus, Kocuria, Arthrobacters, Micrococcus and Pseudoalteromonas. Of these, 34 strains showed antimicrobial activity against at least one of the tested strains. All 16 selected strains carried at least one functional gene, and 5 strains possessed more than three functional genes. The Jiaozhou Bay area is rich in microbial resources with potential for providing useful secondary metabolites.
Brooks, R.A.; Bell, S.S.
2005-01-01
A descriptive study of the architecture of the red mangrove, Rhizophora mangle L., habitat of Tampa Bay, FL, was conducted to assess whether plant architecture could be used to discriminate overwash from fringing forest types. Seven above-water (e.g., tree height, diameter at breast height, and leaf area) and 10 below-water (e.g., root density, root complexity, and maximum root order) architectural features were measured in eight mangrove stands. A multivariate technique (discriminant analysis) was used to test the ability of different models comprising above-water, below-water, or whole-tree architecture to classify forest type. Root architectural features appear to be better than classical forestry measurements at discriminating between fringing and overwash forests but, regardless of the features loaded into the model, misclassification rates were high, as forest type was correctly classified in only 66% of cases. Based upon habitat architecture, the results of this study do not support a sharp distinction between overwash and fringing red mangrove forests in Tampa Bay but rather indicate that the two are architecturally indistinguishable. Therefore, within this northern portion of the geographic range of red mangroves, a more appropriate classification system based upon architecture may be one in which overwash and fringing forest types are combined into a single, "tide-dominated" category. © 2005 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Melesse, Assefa; Hajigholizadeh, Mohammad; Blakey, Tara
2017-04-01
In this study, Landsat 8 and Sea-Viewing Wide Field-of-View Sensor (SeaWIFS) data were used to model the spatiotemporal changes of water quality parameters: turbidity, chlorophyll-a (chl-a), total phosphate, and total nitrogen (TN) from Landsat 8, and algal blooms from SeaWIFS. The study was conducted in Florida Bay, south Florida, and model outputs were compared with in-situ observations. The Landsat 8-based analysis found that the predictive models for chl-a and turbidity concentrations, developed through stepwise multiple linear regression (MLR), gave high coefficients of determination in the dry (wet) season (R2 = 0.86 (0.66) for chl-a and R2 = 0.84 (0.63) for turbidity). Total phosphate and TN were estimated using best-fit multiple linear regression models as a function of Landsat TM and OLI data and ground data, and showed high coefficients of determination in the dry (wet) season (R2 = 0.74 (0.69) for total phosphate and R2 = 0.82 (0.82) for TN). Similarly, the ability of SeaWIFS to retrieve chl-a from optically shallow coastal waters by applying algorithms specific to the pixels' benthic class was evaluated. Benthic class was determined through satellite image-based classification methods. It was found that the benthic-class-based chl-a modeling algorithm performed better than the existing regionally tuned approach. Evaluation of the residuals indicated the potential for further improvement of chl-a estimation through finer characterization of benthic environments. Key words: Landsat, SeaWIFS, water quality, Florida Bay, chl-a, turbidity
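The regression step behind these models is ordinary multiple linear regression of in-situ concentrations on band reflectances; a minimal (non-stepwise) sketch with synthetic values rather than the study's coefficients:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
bands = rng.uniform(0.01, 0.2, size=(40, 4))   # e.g. four visible/NIR band reflectances (synthetic)
# Synthetic "in-situ" chl-a generated from two bands plus noise, purely for illustration
chl = 5 + 30 * bands[:, 2] - 12 * bands[:, 0] + rng.normal(0, 0.5, 40)

mlr = LinearRegression().fit(bands, chl)
print("R^2 =", round(mlr.score(bands, chl), 2))   # coefficient of determination
print("coefficients:", mlr.coef_.round(1), "intercept:", round(mlr.intercept_, 1))
```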
Bayesian methods for estimating GEBVs of threshold traits
Wang, C-L; Ding, X-D; Wang, J-Y; Liu, J-F; Fu, W-X; Zhang, Z; Yin, Z-J; Zhang, Q
2013-01-01
Estimation of genomic breeding values is the key step in genomic selection (GS). Many methods have been proposed for continuous traits, but methods for threshold traits are still scarce. Here we introduced the threshold model into the framework of GS; specifically, we extended the three Bayesian methods BayesA, BayesB and BayesCπ on the basis of the threshold model for estimating genomic breeding values of threshold traits, and the extended methods are correspondingly termed BayesTA, BayesTB and BayesTCπ. Computing procedures of the three BayesT methods using a Markov chain Monte Carlo algorithm were derived. A simulation study was performed to investigate the accuracy benefit of the presented methods for genomic estimated breeding values (GEBVs) of threshold traits. Factors affecting the performance of the three BayesT methods were addressed. As expected, the three BayesT methods generally performed better than the corresponding normal Bayesian methods, in particular when the number of phenotypic categories was small. In the standard scenario (number of categories=2, incidence=30%, number of quantitative trait loci=50, h2=0.3), the accuracies were improved by 30.4, 2.4, and 5.7 percentage points, respectively. In most scenarios, BayesTB and BayesTCπ generated similar accuracies and both performed better than BayesTA. In conclusion, our work proved that the threshold model fits well for predicting GEBVs of threshold traits, and BayesTCπ is suggested to be the method of choice for GS of threshold traits. PMID:23149458
Ioannidis, Konstantinos; Chamberlain, Samuel R; Treder, Matthias S; Kiraly, Franz; Leppink, Eric W; Redden, Sarah A; Stein, Dan J; Lochner, Christine; Grant, Jon E
2016-12-01
Problematic internet use is common, functionally impairing, and in need of further study. Its relationship with obsessive-compulsive and impulsive disorders is unclear. Our objective was to evaluate whether problematic internet use can be predicted from recognised forms of impulsive and compulsive traits and symptomatology. We recruited volunteers aged 18 and older using media advertisements at two sites (Chicago USA, and Stellenbosch, South Africa) to complete an extensive online survey. State-of-the-art out-of-sample evaluation of machine learning predictive models was used, which included Logistic Regression, Random Forests and Naïve Bayes. Problematic internet use was identified using the Internet Addiction Test (IAT). 2006 complete cases were analysed, of whom 181 (9.0%) had moderate/severe problematic internet use. Using Logistic Regression and Naïve Bayes we produced a classification prediction with a receiver operating characteristic area under the curve (ROC-AUC) of 0.83 (SD 0.03), whereas using a Random Forests algorithm the prediction ROC-AUC was 0.84 (SD 0.03) [all three models superior to baseline models p < 0.0001]. The models showed robust transfer between the study sites in all validation sets [p < 0.0001]. Prediction of problematic internet use was possible using specific measures of impulsivity and compulsivity in a population of volunteers. Moreover, this study offers proof-of-concept in support of using machine learning in psychiatry to demonstrate replicability of results across geographically and culturally distinct settings. Copyright © 2016 The Author(s). Published by Elsevier Ltd. All rights reserved.
Elwen, Simon Harvey; Nastasi, Aurora
2014-01-01
A signature whistle type is a learned, individually distinctive whistle type in a dolphin's acoustic repertoire that broadcasts the identity of the whistle owner. The acquisition and use of signature whistles indicates complex cognitive functioning that requires wider investigation in wild dolphin populations. Here we identify signature whistle types from a population of approximately 100 wild common bottlenose dolphins (Tursiops truncatus) inhabiting Walvis Bay, and describe signature whistle occurrence, acoustic parameters and temporal production. A catalogue of 43 repeatedly emitted whistle types (REWTs) was generated by analysing 79 hrs of acoustic recordings. From this, 28 signature whistle types were identified using a method based on the temporal patterns in whistle sequences. A visual classification task conducted by 5 naïve judges showed high levels of agreement in classification of whistles (Fleiss-Kappa statistic, κ = 0.848, Z = 55.3, P<0.001) and supported our categorisation. Signature whistle structure remained stable over time and location, with most types (82%) recorded in 2 or more years, and 4 identified at Walvis Bay and a second field site approximately 450 km away. Whistle acoustic parameters were consistent with those of signature whistles documented in Sarasota Bay (Florida, USA). We provide evidence of possible two-voice signature whistle production by a common bottlenose dolphin. Although signature whistle types have potential use as a marker for studying individual habitat use, we only identified approximately 28% of those from the Walvis Bay population, despite considerable recording effort. We found that signature whistle type diversity was higher in larger dolphin groups and groups with calves present. This is the first study describing signature whistles in a wild free-ranging T. truncatus population inhabiting African waters, and it provides a baseline on which more in-depth behavioural studies can be built. PMID:25203814
Chen, You-Shyang; Cheng, Ching-Hsue; Lai, Chien-Jung; Hsu, Cheng-Yi; Syu, Han-Jhou
2012-02-01
Identifying patients in a Target Customer Segment (TCS) is important to determine the demand for, and to appropriately allocate resources for, health care services. The purpose of this study is to propose a two-stage clustering-classification model through (1) initially integrating the RFM attributes and the K-means algorithm for clustering the TCS patients and (2) then integrating the global discretization method and rough set theory for classifying hospitalized departments and optimizing health care services. To assess the performance of the proposed model, a dataset was used from a representative hospital (termed Hospital-A), extracted from a database from an empirical study in Taiwan comprising 183,947 samples characterized by 44 attributes during 2008. The proposed model was compared with three techniques, Decision Tree, Naive Bayes, and Multilayer Perceptron, and the empirical results showed significant promise in terms of accuracy. The generated knowledge-based rules provide useful information to maximize resource utilization and support the development of a strategy for decision-making in hospitals. From the findings, 75 patients in the TCS, three hospital departments, and specific diagnostic items were discovered in the data for Hospital-A. A potential determinant for gender differences was found, and the age attribute was not significant to the hospital departments. Copyright © 2011 Elsevier Ltd. All rights reserved.
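The first stage (RFM attributes plus K-means) can be sketched as below, with synthetic patient records standing in for the Hospital-A data and an illustrative cluster count:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Synthetic RFM table: recency (days since last visit), frequency (visits), monetary (spend)
rfm = np.column_stack([
    rng.integers(1, 365, 300),
    rng.integers(1, 40, 300),
    rng.gamma(2.0, 800.0, 300),
]).astype(float)

Xz = StandardScaler().fit_transform(rfm)   # put R, F, M on a common scale
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(Xz)

# A TCS-style reading: clusters with low recency and high frequency/monetary values
# are candidate target segments.
for k in range(4):
    members = rfm[labels == k]
    print(f"cluster {k}: n={len(members):3d}, mean RFM={members.mean(axis=0).round(1)}")
```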
Sediment calibration strategies of Phase 5 Chesapeake Bay watershed model
Wu, J.; Shenk, G.W.; Raffensperger, Jeff P.; Moyer, D.; Linker, L.C.
2005-01-01
Sediment is a primary constituent of concern for Chesapeake Bay due to its effect on water clarity. Accurate representation of sediment processes and behavior in the Chesapeake Bay watershed model is critical for developing sound load-reduction strategies. Sediment calibration remains one of the most difficult components of watershed-scale assessment. This is especially true for the Chesapeake Bay watershed model, given the size of the watershed being modeled and the complexity involved in land and stream simulation processes. To obtain the best calibration, the Chesapeake Bay Program has developed four different strategies for sediment calibration of the Phase 5 watershed model: 1) comparing observed and simulated sediment rating curves for different parts of the hydrograph; 2) analyzing change of bed depth over time; 3) relating deposition/scour to total annual sediment loads; and 4) calculating "goodness-of-fit" statistics. These strategies allow a more accurate sediment calibration, and also provide insightful information on sediment processes and behavior in the Chesapeake Bay watershed.
Hyperspectral analysis of seagrass in Redfish Bay, Texas
NASA Astrophysics Data System (ADS)
Wood, John S.
Remote sensing using multi- and hyperspectral imaging and analysis has been used in resource management for quite some time, and for a variety of purposes. In the studies that follow, hyperspectral imagery of Redfish Bay is used to discriminate between species of seagrasses found below the water surface. Water attenuates and reflects light across the electromagnetic spectrum, and as a result, subsurface analysis can be more complex than that performed in the terrestrial world. In these studies, an iterative process is developed using ENVI image processing software and ArcGIS software. Band selection was based on recommendations developed empirically in conjunction with ongoing research into depth corrections, which were applied to the imagery bands (a default depth of 65 cm was used). Polygons generated, classified, and aggregated within ENVI are reclassified in ArcGIS using field site data that was randomly selected for that purpose. After the first iteration, polygons that remain classified as 'Mixed' are subjected to another iteration of classification in ENVI, then brought into ArcGIS and reclassified. Finally, when that classification scheme is exhausted, a supervised classification is performed using a 'Maximum Likelihood' technique, which assigned the remaining polygons to the classification most like the training polygons, by digital number value. Producer's Accuracy by classification ranged from 23.33% for the 'MixedMono' class to 66.67% for the 'Bare' class; User's Accuracy by classification ranged from 22.58% for the 'MixedMono' class to 69.57% for the 'Bare' classification. An overall accuracy of 37.93% was achieved. Producer's and User's Accuracies for Halodule were 29% and 39%, respectively; for Thalassia, they were 46% and 40%. Cohen's Kappa Coefficient was calculated at 0.2988. We then returned to the field and collected spectral signatures of monotypic stands of seagrass at varying depths and at three sensor levels: above the water surface, just below the air/water interface, and at the canopy position, when it differed from the subsurface position. Analysis of plots of these spectral curves, after applying depth corrections and Multiplicative Scatter Correction, indicates that there are detectable spectral differences between Halodule and Thalassia species at all three positions. Further analysis indicated that only above-surface spectral signals could reliably be used to discriminate between species, because there was an overlap of the standard deviations in the other two positions. A recommendation for wavelengths that would produce increased accuracy in hyperspectral image analysis was made, based on areas where there is a significant amount of difference between the mean spectral signatures and no overlap of the standard deviations in our samples. The original hyperspectral imagery was reprocessed using the bands recommended from the research above (approximately 535, 600, 620, 638, and 656 nm). A depth raster was developed from various available sources, which was resampled and reclassified to reflect values for water absorption and water scattering; these were then applied to each band using the depth correction algorithm. Processing followed the iterative classification methods described above. Accuracy for this round of processing improved; overall accuracy increased from 38% to 57%.
Improvements were noted in Producer's Accuracy, with the 'Bare' classification increasing from 67% to 73%, Halodule increasing from 29% to 63%, Thalassia increasing slightly, from 46% to 50%, and 'MixedMono' improving from 23% to 42%. User's Accuracy also improved, with the 'Bare' class increasing from 69% to 70%, Halodule increasing from 39% to 67%, Thalassia increasing from 40% to 7%, and 'MixedMono' increasing from 22.5% to 35%. A very recent report shows the mean percent cover of seagrasses in Redfish Bay and Corpus Christi Bay combined for all species at 68.6%, and individually by species: Halodule 39.8%, Thalassia 23.7%, Syringodium 4%, Ruppia 1% and Halophila 0.1%. Our study classifies 15% as 'Bare', 23% Halodule, 18% Thalassia, and 2% Ruppia. In addition, we classify 5% as 'Mixed', 22% as 'MixedMono', 12% as 'Bare/Halodule Mix', and 3% 'Bare/Thalassia Mix'. Aggregating the 'Bare' and 'Bare/species' classes would equate to approximately 30%, very close to what this new study produces. Other classes are quite similar, when considering that their study includes no 'Mixed' classifications. This series of research studies illustrates the application and utility of hyperspectral imagery and associated processing to mapping shallow benthic habitats. It also demonstrates that the technology is rapidly changing and adapting, which will lead to even further increases in accuracy. Future studies with hyperspectral imaging should include extensive spectral field collection, and the application of a depth correction.
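The depth correction applied to the imagery bands is described only at a high level here. One common simple form is a Beer-Lambert two-way attenuation correction; the sketch below uses that form with hypothetical per-band coefficients, which may differ from the algorithm actually used in these studies:

```python
import numpy as np

def depth_correct(r_obs, r_deep, k, z):
    """Two-way exponential water-column correction (Beer-Lambert style):
    bottom reflectance ~ (observed - deep-water signal) * exp(2*k*z).
    The attenuation coefficients k (1/m) and deep-water terms are
    band-specific and hypothetical here."""
    return (r_obs - r_deep) * np.exp(2.0 * k * z)

r_obs = np.array([0.06, 0.09, 0.05])   # observed reflectance in three bands
r_deep = np.array([0.02, 0.03, 0.01])  # optically deep-water reflectance per band
k = np.array([0.10, 0.25, 0.60])       # diffuse attenuation per band (blue to red), 1/m
print(depth_correct(r_obs, r_deep, k, z=0.65))  # z = 0.65 m, the default depth noted above
```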
Wu, H Y; Chen, K L; Chen, Z H; Chen, Q H; Qiu, Y P; Wu, J C; Zhang, J F
2012-03-01
This research presents an evaluation of the ecological quality status (EcoQS) of three semi-enclosed coastal areas using a fuzzy integrated assessment method (FIAM). With this method, the hierarchy structure was defined by an index system of 11 indicators selected from biotic and physicochemical elements, and the weight vector of the index system was calculated with a Delphi-Analytic Hierarchy Process (AHP) procedure. The FIAM was then used to produce an EcoQS assessment. Most of the sampling stations demonstrated a clear gradient in EcoQS, ranging from high to poor status. Among the four statuses, high and good were dominant for the three bays, accounting for 55.9% and 26.5% of stations, respectively, especially for Sansha Bay and Luoyuan Bay. The assessment results were found to be consistent with the pressure information and parameters obtained at most stations. In addition, the sources of uncertainty in the classification of EcoQS are also discussed. Copyright © 2011 Elsevier Ltd. All rights reserved.
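The FIAM step combines the AHP weight vector with a membership matrix over the status classes. A generic fuzzy comprehensive evaluation sketch, with invented weights and memberships (the study used 11 indicators; 4 are shown for brevity):

```python
import numpy as np

statuses = ["high", "good", "moderate", "poor"]

# Hypothetical AHP weight vector over 4 indicators (weights sum to 1)
w = np.array([0.4, 0.3, 0.2, 0.1])

# Membership matrix R: row i gives indicator i's membership degree in each status class
R = np.array([
    [0.6, 0.3, 0.1, 0.0],
    [0.2, 0.5, 0.2, 0.1],
    [0.1, 0.4, 0.4, 0.1],
    [0.0, 0.2, 0.5, 0.3],
])

b = w @ R   # weighted-average fuzzy operator: composite membership in each status
print(dict(zip(statuses, b.round(3))))
print("EcoQS:", statuses[int(np.argmax(b))])   # maximum-membership principle
```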
Langland, Michael J.; Blomquist, Joel D.; Moyer, Douglas; Hyer, Kenneth; Chanat, Jeffrey G.
2013-01-01
The U.S. Geological Survey, in cooperation with Chesapeake Bay Program (CBP) partners, routinely reports long-term concentration trends and monthly and annual constituent loads for stream water-quality monitoring stations across the Chesapeake Bay watershed. This report documents flow-adjusted trends in sediment and total nitrogen and phosphorus concentrations for 31 stations in the years 1985–2011 and for 32 stations in the years 2002–2011. Sediment and total nitrogen and phosphorus yields for 65 stations are presented for the years 2006–2011. A combined nontidal water-quality indicator (based on both trends and yields) indicates there are more stations classified as “improving water-quality trend and a low yield” than “degrading water-quality trend and a high yield” for total nitrogen. The same type of 2-way classification for total phosphorus and sediment results in equal numbers of stations in each indicator class.
BayeSED: A General Approach to Fitting the Spectral Energy Distribution of Galaxies
NASA Astrophysics Data System (ADS)
Han, Yunkun; Han, Zhanwen
2014-11-01
We present a newly developed version of BayeSED, a general Bayesian approach to the spectral energy distribution (SED) fitting of galaxies. The new BayeSED code has been systematically tested on a mock sample of galaxies. The comparison between the estimated and input values of the parameters shows that BayeSED can recover the physical parameters of galaxies reasonably well. We then applied BayeSED to interpret the SEDs of a large Ks-selected sample of galaxies in the COSMOS/UltraVISTA field with stellar population synthesis models. Using the new BayeSED code, a Bayesian model comparison of stellar population synthesis models has been performed for the first time. We found that the 2003 model by Bruzual & Charlot, statistically speaking, has greater Bayesian evidence than the 2005 model by Maraston for the Ks-selected sample. In addition, while setting the stellar metallicity as a free parameter obviously increases the Bayesian evidence of both models, varying the initial mass function has a notable effect only on the Maraston model. Meanwhile, the physical parameters estimated with BayeSED are found to be generally consistent with those obtained using the popular grid-based FAST code, while the former parameters exhibit more natural distributions. Based on the estimated physical parameters of the galaxies in the sample, we qualitatively classified the galaxies in the sample into five populations that may represent galaxies at different evolution stages or in different environments. We conclude that BayeSED could be a reliable and powerful tool for investigating the formation and evolution of galaxies from the rich multi-wavelength observations currently available. A binary version of the BayeSED code parallelized with Message Passing Interface is publicly available at https://bitbucket.org/hanyk/bayesed.
Classification of iRBD and Parkinson's disease patients based on eye movements during sleep.
Christensen, Julie A E; Koch, Henriette; Frandsen, Rune; Kempfner, Jacob; Arvastson, Lars; Christensen, Soren R; Sorensen, Helge B D; Jennum, Poul
2013-01-01
Patients suffering from the sleep disorder idiopathic rapid-eye-movement sleep behavior disorder (iRBD) have been observed to be at high risk of developing Parkinson's disease (PD). This makes it essential to analyze them in the search for PD biomarkers. This study aims at classifying patients suffering from iRBD or PD based on features reflecting eye movements (EMs) during sleep. A Latent Dirichlet Allocation (LDA) topic model was developed based on features extracted from two electrooculographic (EOG) signals measured as part of full-night polysomnographic (PSG) recordings from ten control subjects. The trained model was tested on ten other control subjects, ten iRBD patients and ten PD patients, obtaining an EM topic mixture diagram for each subject in the test dataset. Three features were extracted from the topic mixture diagrams, reflecting "certainty", "fragmentation" and "stability" in the temporal distribution of the EM topics. Using a Naive Bayes (NB) classifier and the features "certainty" and "stability" yielded the best classification result, and the subjects were classified with a sensitivity of 95%, a specificity of 80% and an accuracy of 90%. This study demonstrates, in a data-driven approach, that iRBD and PD patients may exhibit abnormal form and/or temporal distribution of EMs during sleep.
NASA Technical Reports Server (NTRS)
Urquhart, Erin A.; Zaitchik, Benjamin F.; Waugh, Darryn W.; Guikema, Seth D.; Del Castillo, Carlos E.
2014-01-01
The effect that climate change and variability will have on waterborne bacteria is a topic of increasing concern for coastal ecosystems, including the Chesapeake Bay. Surface water temperature trends in the Bay indicate a warming pattern of roughly 0.3–0.4 °C per decade over the past 30 years. It is unclear what impact future warming will have on pathogens currently found in the Bay, including Vibrio spp. Using historical environmental data, combined with three different statistical models of Vibrio vulnificus probability, we explore the relationship between environmental change and predicted Vibrio vulnificus presence in the upper Chesapeake Bay. We find that the predicted response of V. vulnificus probability to high temperatures in the Bay differs systematically between models of differing structure. As existing publicly available datasets are inadequate to determine which model structure is most appropriate, the impact of climatic change on the probability of V. vulnificus presence in the Chesapeake Bay remains uncertain. This result points to the challenge of characterizing the climate sensitivity of ecological systems in which data are sparse and only statistical models of ecological sensitivity exist.
Dynamic modeling of Tampa Bay urban development using parallel computing
Xian, G.; Crane, M.; Steinwand, D.
2005-01-01
Urban land use and land cover has changed significantly in the environs of Tampa Bay, Florida, over the past 50 years. Extensive urbanization has created substantial change to the region's landscape and ecosystems. This paper uses a dynamic urban-growth model, SLEUTH, which applies six geospatial data themes (slope, land use, exclusion, urban extent, transportation, hillshade), to study the process of urbanization and associated land use and land cover change in the Tampa Bay area. To reduce processing time and complete the modeling process within an acceptable period, the model is recoded and ported to a Beowulf cluster. The parallel-processing computer system accomplishes the massive amount of computation the modeling simulation requires. The SLEUTH calibration process for the Tampa Bay urban growth simulation required only 10 hours of CPU time. The model predicts future land use/cover change trends for Tampa Bay from 1992 to 2025. Urban extent is predicted to double in the Tampa Bay watershed between 1992 and 2025. Results show an upward trend of urbanization at the expense of declines of 58% and 80% in agricultural and forested lands, respectively.
Christensen, Victoria G.; Payne, G.A.; Kallemeyn, Larry W.
2004-01-01
Implementation of an order by the International Joint Commission in January 2000 changed operating procedures for dams that regulate two large reservoirs in Voyageurs National Park in northern Minnesota. These new procedures were expected to restore a more natural water regime and affect water levels, water quality, and trophic status. Results of laboratory analyses and field measurements of chemical and physical properties from May 2001 through September 2003 were compared with similar data collected prior to the change in operating procedures. Rank sum tests showed significant decreases in chlorophyll-a concentrations and trophic state indices for Kabetogama Lake (p=0.021) and Black Bay (p=0.007). There were no significant decreases in total phosphorus concentration, however, perhaps due to internal cycling of phosphorus. No sites had significant trends in seasonal total phosphorus concentrations, with the exception of May samples from Sand Point Lake, which had a significant decreasing trend (tau=-0.056, probability=0.03). May chlorophyll-a concentrations for Kabetogama Lake showed a significant decreasing trend (tau=-0.42, probability=0.05). Based on mean chlorophyll trophic-state indices (2001-03), Sand Point, Namakan, and Rainy Lakes would be classified as oligotrophic to mesotrophic, and Kabetogama Lake and Rainy Lake at Black Bay would be classified as mesotrophic. The classification of Sand Point, Namakan, and Rainy Lakes remains the same as for data collected prior to the change in operating procedures. In contrast, the trophic classification of Kabetogama Lake and Rainy Lake at Black Bay has changed from eutrophic to mesotrophic.
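Chlorophyll-based trophic state indices of this kind are commonly computed with Carlson's (1977) formula; the sketch below assumes a Carlson-type index and conventional class breakpoints, which may differ from those used in the report:

```python
import math

def tsi_chl(chl_ug_per_L):
    """Carlson's trophic state index from chlorophyll-a (ug/L)."""
    return 9.81 * math.log(chl_ug_per_L) + 30.6

def trophic_class(tsi):
    # Conventional Carlson breakpoints; agencies sometimes use slightly different cutoffs.
    if tsi < 40:
        return "oligotrophic"
    if tsi < 50:
        return "mesotrophic"
    if tsi < 70:
        return "eutrophic"
    return "hypereutrophic"

for chl in (1.5, 5.0, 20.0):   # illustrative chlorophyll-a concentrations
    t = tsi_chl(chl)
    print(f"chl-a = {chl:5.1f} ug/L -> TSI = {t:4.1f} ({trophic_class(t)})")
```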
Pashaei, Elnaz; Ozen, Mustafa; Aydin, Nizamettin
2015-08-01
Improving the accuracy of supervised classification algorithms in biomedical applications is an active area of research. In this study, we improve the performance of the Particle Swarm Optimization (PSO) combined with C4.5 decision tree (PSO+C4.5) classifier by applying a Boosted C5.0 decision tree as the fitness function. To evaluate the effectiveness of our proposed method, it is implemented on 1 microarray dataset and 5 different medical datasets obtained from the UCI machine learning databases. Moreover, the results of the PSO + Boosted C5.0 implementation are compared to eight well-known benchmark classification methods (PSO+C4.5, support vector machine with a radial basis function kernel, Classification And Regression Tree (CART), C4.5 decision tree, C5.0 decision tree, Boosted C5.0 decision tree, Naive Bayes, and weighted k-nearest neighbor). A repeated five-fold cross-validation method was used to assess the performance of the classifiers. Experimental results show that our proposed method not only improves on the performance of PSO+C4.5 but also obtains higher classification accuracy than the other classification methods.
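A compact sketch of the wrapper idea: binary PSO proposes feature masks and a boosted tree scores them. Scikit-learn has no C5.0, so GradientBoostingClassifier stands in for the Boosted C5.0 fitness function, and all hyperparameters are illustrative:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=25, n_informative=5, random_state=0)
n_particles, n_feat, iters = 10, X.shape[1], 15

def fitness(mask):
    """Cross-validated accuracy of a boosted tree on the selected features."""
    cols = mask.astype(bool)
    if not cols.any():
        return 0.0
    clf = GradientBoostingClassifier(n_estimators=30, random_state=0)  # stand-in for Boosted C5.0
    return cross_val_score(clf, X[:, cols], y, cv=3).mean()

pos = (rng.random((n_particles, n_feat)) < 0.5).astype(int)   # binary positions = feature masks
vel = rng.normal(0.0, 1.0, (n_particles, n_feat))
pbest = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(iters):
    r1, r2 = rng.random((2, n_particles, n_feat))
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = (rng.random((n_particles, n_feat)) < 1.0 / (1.0 + np.exp(-vel))).astype(int)  # sigmoid -> bits
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest[improved], pbest_fit[improved] = pos[improved], fit[improved]
    gbest = pbest[pbest_fit.argmax()].copy()

print("selected features:", np.flatnonzero(gbest), "CV accuracy:", round(pbest_fit.max(), 3))
```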
An ant colony optimization based feature selection for web page classification.
Saraç, Esra; Özel, Selma Ayşe
2014-01-01
The increased popularity of the web has caused a huge amount of information to be added to it, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features, such as HTML/XML tags, URLs, hyperlinks, and text contents, that should be considered during an automated classification process. The aim of this study is to reduce the number of features used, in order to improve both the runtime and the accuracy of web page classification. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k-nearest-neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using ACO for feature selection improves both the accuracy and the runtime performance of classification. We also showed that the proposed ACO-based algorithm can select better features than the well-known information gain and chi-square feature selection methods.
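The ACO wrapper can be sketched in a similar spirit: ants assemble feature subsets with probability proportional to pheromone, subsets are scored by a classifier (k-nearest neighbors here as a lightweight stand-in for the paper's classifiers), and pheromone is reinforced on features from good subsets:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=20, n_informative=4, random_state=0)
n_feat, n_ants, n_iters, subset_size = X.shape[1], 8, 12, 6   # fixed subset size, a simplification

tau = np.ones(n_feat)                      # pheromone on each feature
best_subset, best_score = None, -np.inf

for _ in range(n_iters):
    for _ant in range(n_ants):
        p = tau / tau.sum()                # selection probability proportional to pheromone
        subset = rng.choice(n_feat, size=subset_size, replace=False, p=p)
        score = cross_val_score(KNeighborsClassifier(), X[:, subset], y, cv=3).mean()
        if score > best_score:
            best_subset, best_score = subset, score
        tau[subset] += score               # reinforce features that appeared in good subsets
    tau *= 0.9                             # pheromone evaporation

print("best subset:", sorted(best_subset), "CV accuracy:", round(best_score, 3))
```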
We conducted aerial photographic surveys of Oregon's Yaquina Bay estuary during consecutive summers from 1997 through 2001. Imagery was obtained during low tide exposures of intertidal mudflats, allowing use of near-infrared color film to detect and discriminate plant communitie...
Aerial photographic surveys of Oregon's Yaquina Bay estuary were conducted during consecutive summers from 1997 through 2000. Imagery was obtained during low tide exposures of intertidal mudflats, allowing use of near-infrared color film to detect and discriminate plant communit...
1980-08-01
also a mobile substrate habitat type, but not the massive dunes described previously; some vegetation is established. Most foredunes along the coastal...
NASA Technical Reports Server (NTRS)
Morris, Carl N.
1987-01-01
Motivated by the LANDSAT problem of estimating the probability of crop or geological types based on multi-channel satellite imagery data, Morris and Kostal (1983), Hill, Hinkley, Kostal, and Morris (1984), and Morris, Hinkley, and Johnston (1985) developed an empirical Bayes approach to this problem. Here, researchers return to those developments, making certain improvements and extensions, but restricting attention to the binary case of only two attributes.
Kambhampati, Satya Samyukta; Singh, Vishal; Manikandan, M Sabarimalai; Ramkumar, Barathram
2015-08-01
In this Letter, the authors present a unified framework for fall event detection and classification using the cumulants extracted from acceleration (ACC) signals acquired using a single waist-mounted triaxial accelerometer. The main objective of this Letter is to find suitable representative cumulants and classifiers for effectively detecting and classifying different types of fall and non-fall events. In the first level of the proposed hierarchical decision tree algorithm, fall detection is implemented using fifth-order cumulants and a support vector machine (SVM) classifier. In the second level, the fall event classification algorithm uses the fifth-order cumulants and SVM. Finally, human activity classification is performed using the second-order cumulants and SVM. The detection and classification results are compared with those of the decision tree, naive Bayes, multilayer perceptron and SVM classifiers with different types of time-domain features, including the second-, third-, fourth- and fifth-order cumulants and the signal magnitude vector and signal magnitude area. The experimental results demonstrate that the second- and fifth-order cumulant features and SVM classifier can achieve optimal detection and classification rates of above 95%, as well as the lowest false alarm rate of 1.03%.
A Theoretical Analysis of Why Hybrid Ensembles Work.
Hsu, Kuo-Wei
2017-01-01
Inspired by the group decision-making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose to use a mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains open. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting the use of different algorithms to accuracy gain. We also conduct experiments on the classification performance of hybrid ensembles of classifiers created by decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Therefore, through this paper, we provide a complement to the theoretical foundation of creating and using hybrid ensembles.
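A minimal illustration of such a hybrid ensemble, mixing the two algorithm families discussed (decision tree and naïve Bayes) via soft voting in scikit-learn; the dataset and sizes are illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=15, n_informative=5, random_state=0)

# Hybrid ensemble: two structurally different learners, probabilities averaged
hybrid = VotingClassifier(
    estimators=[("dt", DecisionTreeClassifier(random_state=0)), ("nb", GaussianNB())],
    voting="soft",
)

for name, model in [("decision tree", DecisionTreeClassifier(random_state=0)),
                    ("naive Bayes", GaussianNB()),
                    ("hybrid DT+NB", hybrid)]:
    score = cross_val_score(model, X, y, cv=10).mean()
    print(f"{name:14s} CV accuracy = {score:.3f}")
```

The diversity argument is visible here: the tree and the naïve Bayes model make different kinds of errors, so averaging their probabilities can beat either member alone.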
Wavelet-based energy features for glaucomatous image classification.
Dua, Sumeet; Acharya, U Rajendra; Chowriappa, Pradeep; Sree, S Vinitha
2012-01-01
Texture features within images are actively pursued for accurate and efficient glaucoma classification. Energy distribution over wavelet subbands is applied to find these important texture features. In this paper, we investigate the discriminatory potential of wavelet features obtained from the Daubechies (db3), Symlets (sym3), and biorthogonal (bio3.3, bio3.5, and bio3.7) wavelet filters. We propose a novel technique to extract energy signatures obtained using the 2-D discrete wavelet transform, and subject these signatures to different feature ranking and feature selection strategies. We have gauged the effectiveness of the resultant ranked and selected subsets of features using support vector machine, sequential minimal optimization, random forest, and naïve Bayes classification strategies. We observed an accuracy of around 93% using tenfold cross-validation, demonstrating the effectiveness of these methods.
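Subband energy signatures of this kind can be computed with PyWavelets (which names the biorthogonal filters 'bior3.x'); the sketch below uses a synthetic 2-D array in place of fundus images:

```python
import numpy as np
import pywt

rng = np.random.default_rng(0)
img = rng.normal(size=(64, 64))   # synthetic stand-in for a fundus image patch

features = []
for name in ("db3", "sym3", "bior3.3", "bior3.5", "bior3.7"):
    coeffs = pywt.wavedec2(img, wavelet=name, level=2)   # 2-D discrete wavelet transform
    cA, detail_levels = coeffs[0], coeffs[1:]
    energies = [float(np.sum(cA ** 2))]                  # approximation-band energy
    for (cH, cV, cD) in detail_levels:                   # horizontal/vertical/diagonal subbands
        energies += [float(np.sum(c ** 2)) for c in (cH, cV, cD)]
    total = sum(energies)
    features.append([e / total for e in energies])       # normalized energy signature

print(len(features), "filters x", len(features[0]), "energy features per image")
```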
NASA Technical Reports Server (NTRS)
Erb, R. B.
1974-01-01
The Coastal Analysis Team of the Johnson Space Center conducted a 1-year investigation of ERTS-1 MSS data to determine its usefulness in coastal zone management. Galveston Bay, Texas, was the study area for evaluating both conventional image interpretation and computer-aided techniques. There was limited success in detecting, identifying, and measuring the areal extent of water bodies, turbidity zones, phytoplankton blooms, salt marshes, grasslands, swamps, and low wetlands using image interpretation techniques. Computer-aided techniques were generally successful in identifying these features. Areal measurement accuracies for salt marshes ranged from 89 to 99 percent. Overall classification accuracy across all study sites was 89 percent for Level 1 and 75 percent for Level 2.
Factors affecting GEBV accuracy with single-step Bayesian models.
Zhou, Lei; Mrode, Raphael; Zhang, Shengli; Zhang, Qin; Li, Bugao; Liu, Jian-Feng
2018-01-01
A single-step approach to obtaining genomic prediction was first proposed in 2009. Many studies have investigated the components of GEBV accuracy in genomic selection. However, it is still unclear how the population structure and the relationships between training and validation populations influence GEBV accuracy in single-step analysis. Here, we explored the components of GEBV accuracy in single-step Bayesian analysis with a simulation study. Three scenarios with various numbers of QTL (5, 50, and 500) were simulated. Three models were implemented to analyze the simulated data: single-step genomic best linear unbiased prediction (GBLUP; SSGBLUP), single-step BayesA (SS-BayesA), and single-step BayesB (SS-BayesB). According to our results, GEBV accuracy was influenced by the relationships between the training and validation populations more significantly for ungenotyped animals than for genotyped animals. SS-BayesA/BayesB showed an obvious advantage over SSGBLUP in the 5- and 50-QTL scenarios. The SS-BayesB model obtained the lowest accuracy in the 500-QTL scenario. The SS-BayesA model was the most efficient and robust across all QTL scenarios. Generally, both the relationships between training and validation populations and the LD between markers and QTL contributed to GEBV accuracy in the single-step analysis, and the advantages of single-step Bayesian models were more apparent when the trait is controlled by fewer QTL.
Pattin, Kristine A.; White, Bill C.; Barney, Nate; Gui, Jiang; Nelson, Heather H.; Kelsey, Karl R.; Andrew, Angeline S.; Karagas, Margaret R.; Moore, Jason H.
2008-01-01
Multifactor dimensionality reduction (MDR) was developed as a nonparametric and model-free data mining method for detecting, characterizing, and interpreting epistasis in the absence of significant main effects in genetic and epidemiologic studies of complex traits such as disease susceptibility. The goal of MDR is to change the representation of the data using a constructive induction algorithm to make nonadditive interactions easier to detect using any classification method such as naïve Bayes or logistic regression. Traditionally, MDR-constructed variables have been evaluated with a naïve Bayes classifier that is combined with 10-fold cross validation to obtain an estimate of predictive accuracy or generalizability of epistasis models. Traditionally, we have used permutation testing to statistically evaluate the significance of models obtained through MDR. The advantage of permutation testing is that it controls for false-positives due to multiple testing. The disadvantage is that permutation testing is computationally expensive. This is an important issue that arises in the context of detecting epistasis on a genome-wide scale. The goal of the present study was to develop and evaluate several alternatives to large-scale permutation testing for assessing the statistical significance of MDR models. Using data simulated from 70 different epistasis models, we compared the power and type I error rate of MDR using a 1000-fold permutation test with hypothesis testing using an extreme value distribution (EVD). We find that this new hypothesis testing method provides a reasonable alternative to the computationally expensive 1000-fold permutation test and is 50 times faster. We then demonstrate this new method by applying it to a genetic epidemiology study of bladder cancer susceptibility that was previously analyzed using MDR and assessed using a 1000-fold permutation test. PMID:18671250
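The permutation test being replaced works as sketched below: labels are shuffled to break any genotype-phenotype association, the cross-validated accuracy is recomputed to build a null distribution, and the p-value is the fraction of null accuracies at least as large as the observed one. Naive Bayes and synthetic data stand in for the full MDR pipeline:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=200, n_features=8, n_informative=3, random_state=0)

observed = cross_val_score(GaussianNB(), X, y, cv=10).mean()

n_perm = 200                        # the study used 1000; reduced here to keep the sketch quick
null = np.empty(n_perm)
for i in range(n_perm):
    y_perm = rng.permutation(y)     # destroy the feature-label association
    null[i] = cross_val_score(GaussianNB(), X, y_perm, cv=10).mean()

p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)   # add-one correction
print(f"observed accuracy = {observed:.3f}, permutation p = {p_value:.4f}")
```

The cost is plain in the loop: every permutation repeats the full cross-validation, which is exactly why a closed-form null such as an extreme value distribution is attractive at genome-wide scale.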
Baele, Guy; Lemey, Philippe; Vansteelandt, Stijn
2013-03-06
Accurate model comparison requires extensive computation times, especially for parameter-rich models of sequence evolution. In the Bayesian framework, model selection is typically performed through the evaluation of a Bayes factor, the ratio of two marginal likelihoods (one for each model). Recently introduced techniques to estimate (log) marginal likelihoods, such as path sampling and stepping-stone sampling, offer increased accuracy over the traditional harmonic mean estimator at an increased computational cost. Most often, each model's marginal likelihood will be estimated individually, which leads the resulting Bayes factor to suffer from errors associated with each of these independent estimation processes. We here assess the original 'model-switch' path sampling approach for direct Bayes factor estimation in phylogenetics, as well as an extension that uses more samples, to construct a direct path between two competing models, thereby eliminating the need to calculate each model's marginal likelihood independently. Further, we provide a competing Bayes factor estimator using an adaptation of the recently introduced stepping-stone sampling algorithm and set out to determine appropriate settings for accurately calculating such Bayes factors, with context-dependent evolutionary models as an example. While we show that modest efforts are required to roughly identify the increase in model fit, only drastically increased computation times ensure the accuracy needed to detect more subtle details of the evolutionary process. We show that our adaptation of stepping-stone sampling for direct Bayes factor calculation outperforms the original path sampling approach as well as an extension that exploits more samples. Our proposed approach for Bayes factor estimation also has preferable statistical properties over the use of individual marginal likelihood estimates for both models under comparison. Assuming a sigmoid function to determine the path between two competing models, we provide evidence that a single well-chosen sigmoid shape value requires less computational efforts in order to approximate the true value of the (log) Bayes factor compared to the original approach. We show that the (log) Bayes factors calculated using path sampling and stepping-stone sampling differ drastically from those estimated using either of the harmonic mean estimators, supporting earlier claims that the latter systematically overestimate the performance of high-dimensional models, which we show can lead to erroneous conclusions. Based on our results, we argue that highly accurate estimation of differences in model fit for high-dimensional models requires much more computational effort than suggested in recent studies on marginal likelihood estimation.
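The stepping-stone estimator can be made concrete on a toy beta-binomial problem, where the power posteriors can be sampled exactly and the marginal likelihood is known in closed form, so the estimate can be checked. This illustrates the general technique only, not the phylogenetic implementation discussed here:

```python
import numpy as np
from math import lgamma
from scipy.special import betaln
from scipy.stats import binom

k, n = 7, 20        # observed successes out of n Bernoulli trials
a0, b0 = 1.0, 1.0   # Beta(a0, b0) prior on the success probability

def log_marginal_exact():
    """Closed-form beta-binomial evidence: C(n,k) * B(a0+k, b0+n-k) / B(a0, b0)."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return log_choose + betaln(a0 + k, b0 + n - k) - betaln(a0, b0)

def log_marginal_stepping_stone(n_steps=32, n_samp=5000, seed=0):
    """Stepping-stone estimate: chain together ratios of normalizers between
    adjacent power posteriors. Here the power posterior at inverse temperature t
    is Beta(a0 + t*k, b0 + t*(n-k)), so each 'stone' can be sampled exactly."""
    rng = np.random.default_rng(seed)
    temps = np.linspace(0.0, 1.0, n_steps + 1) ** 3   # denser near the prior (t = 0)
    log_ml = 0.0
    for t0, t1 in zip(temps[:-1], temps[1:]):
        p = rng.beta(a0 + t0 * k, b0 + t0 * (n - k), size=n_samp)
        w = (t1 - t0) * binom.logpmf(k, n, p)                # log L(p)^(t1 - t0)
        log_ml += np.logaddexp.reduce(w) - np.log(n_samp)    # stable log-mean-exp
    return log_ml

print("exact log marginal likelihood:", round(log_marginal_exact(), 4))
print("stepping-stone estimate      :", round(log_marginal_stepping_stone(), 4))
```

A (log) Bayes factor is then just the difference between two such log marginal likelihoods; in the phylogenetic setting the exact beta sampling is replaced by MCMC at each temperature, which is where the computational cost discussed above comes from.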
Wavelet Packet Entropy for Heart Murmurs Classification
Safara, Fatemeh; Doraisamy, Shyamala; Azman, Azreen; Jantan, Azrul; Ranga, Sri
2012-01-01
Heart murmurs are the first signs of cardiac valve disorders. Several studies have been conducted in recent years to automatically differentiate normal heart sounds from heart sounds with murmurs using various types of audio features. Entropy has been used successfully as a feature to distinguish different heart sounds. In this paper, a new entropy measure, previously introduced for analyzing mammograms, was applied to heart sounds, and its feasibility for classifying five types of heart sounds and murmurs was demonstrated. Four common murmurs were considered: aortic regurgitation, mitral regurgitation, aortic stenosis, and mitral stenosis. Wavelet packet transform was employed for heart sound analysis, and the entropy was calculated to derive feature vectors. Five classifiers were evaluated to assess the discriminatory power of the generated features. The best results were achieved by BayesNet with 96.94% accuracy. These promising results substantiate the effectiveness of the proposed wavelet packet entropy for heart sound classification. PMID:23227043
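As a sketch of the feature-extraction step, the snippet below decomposes a signal with a wavelet packet transform (via PyWavelets) and computes the Shannon entropy of the normalized coefficient energy in each terminal node. The paper's specific entropy measure, wavelet, and decomposition depth are not given here, so 'db4', level 4, and plain Shannon entropy are illustrative assumptions.

```python
import numpy as np
import pywt

def wavelet_packet_entropy(signal, wavelet="db4", level=4):
    """Shannon entropy of normalized energy in each terminal wavelet-packet node."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
    features = []
    for node in wp.get_level(level, order="freq"):
        energy = node.data ** 2
        p = energy / (energy.sum() + 1e-12)            # normalized energy distribution
        features.append(-np.sum(p * np.log2(p + 1e-12)))  # Shannon entropy of the node
    return np.array(features)

# Example: a synthetic stand-in for a heart sound (decaying tone plus noise)
t = np.linspace(0, 1, 2000)
signal = np.exp(-5 * t) * np.sin(2 * np.pi * 60 * t) + 0.1 * np.random.randn(t.size)
print(wavelet_packet_entropy(signal))   # 2**level entropy features
```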
Gender classification from video under challenging operating conditions
NASA Astrophysics Data System (ADS)
Mendoza-Schrock, Olga; Dong, Guozhu
2014-06-01
The literature is abundant with papers on gender classification research. However, the majority of such research assumes that there is enough resolution for the subject's face to be resolved, and hence most of it actually falls in the face recognition and facial feature area. A gap exists for gender classification under challenging operating conditions—different seasonal conditions, different clothing, etc.—and when the subject's face cannot be resolved due to lack of resolution. The Seasonal Weather and Gender (SWAG) Database is a novel database that contains subjects walking through a scene under operating conditions that span a calendar year. This paper exploits a subset of that database—the SWAG One dataset—using data mining techniques, traditional classifiers (e.g., Naïve Bayes, Support Vector Machine), and both traditional (Canny edge detection) and non-traditional (height/width ratios) feature extractors to achieve high correct gender classification rates (greater than 85%). Another novelty is the exploitation of frame differentials.
Classification Algorithms for Big Data Analysis, a Map Reduce Approach
NASA Astrophysics Data System (ADS)
Ayma, V. A.; Ferreira, R. S.; Happ, P.; Oliveira, D.; Feitosa, R.; Costa, G.; Plaza, A.; Gamba, P.
2015-03-01
For many years, the scientific community has been concerned with increasing the accuracy of different classification methods, and major achievements have been made so far. Besides this issue, the increasing amount of data generated every day by remote sensors raises further challenges to be overcome. In this work, a tool within the scope of InterIMAGE Cloud Platform (ICP), an open-source, distributed framework for automatic image interpretation, is presented. The tool, named ICP: Data Mining Package, is able to perform supervised classification procedures on huge amounts of data, usually referred to as big data, on a distributed infrastructure using Hadoop MapReduce. The tool has four classification algorithms implemented, taken from WEKA's machine learning library, namely: Decision Trees, Naïve Bayes, Random Forest and Support Vector Machines (SVM). The results of an experimental analysis using an SVM classifier on data sets of different sizes for different cluster configurations demonstrate the potential of the tool, as well as aspects that affect its performance.
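Naïve Bayes fits MapReduce naturally because its sufficient statistics are additive counts: mappers emit (class, feature, value) count pairs, the framework sums them, and prediction needs only the aggregated table. The pure-Python sketch below simulates the two phases locally; it illustrates the idea only and is not the ICP: Data Mining Package API, and the toy records and smoothing choice are assumptions.

```python
import math
from functools import reduce

# Toy labeled records: ({feature: value, ...}, label)
data = [
    ({"outlook": "sunny", "windy": "no"}, "play"),
    ({"outlook": "rainy", "windy": "yes"}, "stay"),
    ({"outlook": "sunny", "windy": "yes"}, "play"),
    ({"outlook": "rainy", "windy": "no"}, "stay"),
]

def mapper(record):
    feats, label = record
    yield (("class", label), 1)               # class count
    for name, value in feats.items():
        yield ((label, name, value), 1)       # per-class feature-value count

def reducer(counts, pair):
    key, value = pair
    counts[key] = counts.get(key, 0) + value
    return counts

# The "shuffle" is implicit: reduce over the concatenated mapper outputs
pairs = (pair for record in data for pair in mapper(record))
counts = reduce(reducer, pairs, {})

def predict(feats, classes=("play", "stay"), alpha=1.0):
    n = sum(counts[("class", c)] for c in classes)
    scores = {}
    for c in classes:
        s = math.log(counts[("class", c)] / n)
        for name, value in feats.items():
            # Laplace smoothing; 2*alpha assumes two possible values per feature
            s += math.log((counts.get((c, name, value), 0) + alpha)
                          / (counts[("class", c)] + 2 * alpha))
        scores[c] = s
    return max(scores, key=scores.get)

print(predict({"outlook": "sunny", "windy": "no"}))
```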
Default Bayes Factors for Model Selection in Regression
ERIC Educational Resources Information Center
Rouder, Jeffrey N.; Morey, Richard D.
2012-01-01
In this article, we present a Bayes factor solution for inference in multiple regression. Bayes factors are principled measures of the relative evidence from data for various models or positions, including models that embed null hypotheses. In this regard, they may be used to state positive evidence for a lack of an effect, which is not possible…
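Although the article develops default priors for this purpose, a quick feel for a regression Bayes factor can be had from the common BIC approximation, BF10 ≈ exp((BIC0 − BIC1)/2), which corresponds to a unit-information prior rather than the defaults discussed in the article. The sketch below compares an intercept-only model against a one-covariate model on simulated data; all specifics are illustrative assumptions.

```python
import math
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 0.3 * x + rng.normal(size=n)      # true slope 0.3

def bic(y, design):
    # OLS fit; Gaussian BIC up to an additive constant shared by both models
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    k = design.shape[1]
    return len(y) * math.log(rss / len(y)) + k * math.log(len(y))

null_design = np.ones((n, 1))                     # intercept only
alt_design = np.column_stack([np.ones(n), x])     # intercept + covariate

bf10 = math.exp((bic(y, null_design) - bic(y, alt_design)) / 2.0)
print(f"approximate BF10 = {bf10:.1f}")   # values > 1 favor including x
```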
Numerical modeling of salt marsh morphological change induced by Hurricane Sandy
Hu, Kelin; Chen, Qin; Wang, Hongqing; Hartig, Ellen K.; Orton, Philip M.
2018-01-01
The salt marshes of Jamaica Bay serve as a recreational outlet for New York City residents, mitigate wave impacts during coastal storms, and provide habitat for critical wildlife species. Hurricanes have been recognized as one of the critical drivers of coastal wetland morphology due to their effects on hydrodynamics and sediment transport, deposition, and erosion processes. In this study, the Delft3D modeling suite was utilized to examine the effects of Hurricane Sandy (2012) on salt marsh morphology in Jamaica Bay. Observed marsh elevation change and accretion from rod Surface Elevation Tables and feldspar Marker Horizons (SET-MH) and hydrodynamic measurements during Hurricane Sandy were used to calibrate and validate the wind-waves-surge-sediment transport-morphology coupled model. The model results agreed well with in situ field measurements. The validated model was then used to detect salt marsh morphological change due to Sandy across Jamaica Bay. Model results indicate that the island-wide morphological changes in the bay's salt marshes due to Sandy were in the range of −30 mm (erosion) to +15 mm (deposition), and spatially complex and heterogeneous. The storm generated paired deposition and erosion patches at local scales. Salt marshes inside the west section of the bay showed erosion overall while marshes inside the east section showed deposition from Sandy. The net sediment amount that Sandy brought into the bay is only about 1% of the total amount of reworked sediment within the bay during the storm. Numerical experiments show that waves and vegetation played a critical role in sediment transport and associated wetland morphological change in Jamaica Bay. Furthermore, without the protection of vegetation, the marsh islands of Jamaica Bay would experience both more erosion and less accretion in coastal storms.
Summary of findings about circulation and the estuarine turbidity maximum in Suisun Bay, California
Schoellhamer, David H.; Burau, Jon R.
1998-01-01
Suisun Bay, California, is the most landward subembayment of San Francisco Bay (fig. 1) and is an important ecological habitat (Cloern and others, 1983; Jassby and others, 1995). During the 1960s and 1970s, data collected in Suisun Bay were analyzed to develop a conceptual model of how water, salt, and sediment move within and through the Bay. This conceptual model has been used to manage freshwater flows from the Sacramento-San Joaquin Delta to Suisun Bay to improve habitat for several threatened and endangered fish species. Instrumentation used to measure water velocity, salinity, and suspended-solids concentration (SSC) greatly improved during the 1980s and 1990s. The U.S. Geological Survey (USGS) has utilized these new instruments to collect one of the largest, high-quality hydrodynamic and sediment data sets available for any estuary. Analysis of these new data has led to the revision of the conceptual model of circulation and sediment transport in Suisun Bay.
Lohmann, Melinda A.; Swain, Eric D.; Wang, John D.; Dixon, Joann
2012-01-01
Biscayne National Park, located in Biscayne Bay in southeast Florida, is one of the largest marine parks in the country and sustains a large natural marine fishery where numerous threatened and endangered species reproduce. In recent years, the bay has experienced hypersaline conditions (salinity greater than 35 practical salinity units) of increasing magnitude and duration. Hypersalinity events were particularly pronounced during April to August 2004 in nearshore areas along the southern and middle parts of the bay. Prolonged hypersaline conditions can cause degradation of water quality and permanent damage to, or loss of, brackish nursery habitats for multiple species of fish and crustaceans as well as damage to certain types of seagrasses that are not tolerant of extreme changes in salinity. To evaluate the factors that contribute to hypersalinity events and to test the effects of possible changes in precipitation patterns and canal flows into Biscayne Bay on salinity in the bay, the U.S. Geological Survey constructed a coupled surface-water/groundwater numerical flow model. The model is designed to account for freshwater flows into Biscayne Bay through the canal system, leakage of salty bay water into the underlying Biscayne aquifer, discharge of fresh and salty groundwater from the Biscayne aquifer into the bay, direct effects of precipitation on bay salinity, indirect effects of precipitation on recharge to the Biscayne aquifer, direct effects of evapotranspiration (ET) on bay salinity, indirect effects of ET on recharge to the Biscayne aquifer, and maintenance of mass balance of both water and solute. The model was constructed using the Flow and Transport in a Linked Overland/Aquifer Density Dependent System (FTLOADDS) simulator, version 3.3, which couples the two-dimensional, surface-water flow and solute-transport simulator SWIFT2D with the density-dependent, groundwater flow and solute-transport simulator SEAWAT. The model was calibrated by a trial-and-error method to fit observed groundwater heads, estimated base flow, and measured bay salinity and temperatures from 1996 to 2004, as well as the location of the freshwater-saltwater interface in the aquifer, by adjusting ET rate limiters, canal vertical hydraulic conductance, leakage rate coefficients (transition-layer thickness and hydraulic conductivity), Manning's n value, and delineation of rainfall zones. Although flow budget calculations indicate that precipitation, ET, and groundwater flux into the bay represent a small portion of the overall budget, these factors may be important in controlling salinity in some parts of the bay, for example, the southern parts of the bay where the canal system is not extensively developed or controlled. The balance of precipitation and ET during the wet season generally results in a reduction of bay salinity, whereas the balance of precipitation and ET during the dry season generally results in an increase in bay salinity. During years when wet season precipitation is lower than average, for example, less than 70 percent of the total precipitation in an average year, ET could outweigh precipitation over the bay for essentially the entire year. Hypersaline conditions are prone to occur near the end of the dry season because precipitation rates are generally lower, canal discharge rates (which are strongly correlated to precipitation rates) are also generally lower, and ET rates are higher than during the rest of the year.
The hypersalinity event of 2004 followed several years of relatively low precipitation and correspondingly reduced canal structure releases and was unusually extensive, continuing into July. Thus, hypersalinity is ultimately the result of a cumulative deficit of precipitation. The model was used to test the effects of possible changes in canal flux and precipitation. Simulation results showed that increasing, reducing, or modifying canal discharge rates affected salinity most in the northern part of the bay, where there are more canals and canal-control structures. Doubling and halving precipitation affected bay salinity more in the southern part of the bay, where there are fewer canals and canal-control structures, than in the northern part. The model is designed to quantify factors that contribute to hypersaline conditions in Biscayne Bay and may be less appropriate for addressing other issues or examining conditions substantially different from those described in this report. Model results must be interpreted in light of model limitations, which include representation of the system and conceptual model, uncertainty in physical properties used to describe the system or processes, the scale and discretization of the system, and representation of the boundary conditions.
Martínez, Carlos Alberto; Khare, Kshitij; Banerjee, Arunava; Elzo, Mauricio A
2017-03-21
This study corresponds to the second part of a companion paper devoted to the development of Bayesian multiple regression models accounting for randomness of genotypes in across-population genome-wide prediction. This family of models considers heterogeneous and correlated marker effects and allelic frequencies across populations, and has the ability to consider records from non-genotyped individuals and individuals with missing genotypes in any subset of loci without the need for previous imputation, taking into account uncertainty about imputed genotypes. This paper extends this family of models by considering multivariate spike and slab conditional priors for marker allele substitution effects and contains derivations of approximate Bayes factors and fractional Bayes factors to compare models from part I, and those developed here, with their null versions. These null versions correspond to simpler models ignoring heterogeneity of populations, but still accounting for randomness of genotypes. For each marker locus, the spike component of the prior corresponded to a point mass at 0 in R^S, where S is the number of populations, and the slab component was an S-variate Gaussian distribution; independent conditional priors were assumed across loci. For the Gaussian components, covariance matrices were assumed to be either the same for all markers or different for each marker. For null models, the priors were simply univariate versions of these finite mixture distributions. Approximate algebraic expressions for Bayes factors and fractional Bayes factors were found using the Laplace approximation. Using the simulated datasets described in part I, these models were implemented and compared with models derived in part I using measures of predictive performance based on squared Pearson correlations, the Deviance Information Criterion, Bayes factors, and fractional Bayes factors. The extensions presented here enlarge our family of genome-wide prediction models, making it more flexible in the sense that it now offers more modeling options. Copyright © 2017 Elsevier Ltd. All rights reserved.
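The Laplace approximation used for the Bayes factors here replaces an intractable integral with a Gaussian centered at the posterior mode. The sketch below applies it to a deliberately simple normal model where the marginal likelihood is also available in closed form (so the approximation happens to be exact); the data and priors are illustrative assumptions, and a log Bayes factor would be the difference of two such log marginal likelihoods.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import multivariate_normal, norm

rng = np.random.default_rng(3)
y = rng.normal(0.4, 1.0, size=30)

def neg_log_joint(theta):
    # Negative log of likelihood N(theta, 1) times prior N(0, 1)
    th = theta[0]
    return -(norm.logpdf(y, th, 1.0).sum() + norm.logpdf(th, 0.0, 1.0))

theta_map = minimize(neg_log_joint, x0=[0.0]).x

# Numerical second derivative of the negative log joint at the mode
h = 1e-4
hess = (neg_log_joint(theta_map + h) - 2 * neg_log_joint(theta_map)
        + neg_log_joint(theta_map - h)) / h**2

d = 1  # parameter dimension
log_marg_laplace = (-neg_log_joint(theta_map)
                    + 0.5 * d * np.log(2 * np.pi) - 0.5 * np.log(hess))

# Exact log marginal for this conjugate normal-normal model, for comparison
n = len(y)
cov = np.eye(n) + np.ones((n, n))   # sigma^2 I + tau0^2 11^T with both equal to 1
log_marg_exact = multivariate_normal.logpdf(y, mean=np.zeros(n), cov=cov)
print(log_marg_laplace, log_marg_exact)
```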
NASA Astrophysics Data System (ADS)
Ranjbar, Mohammad Hassan; Hadjizadeh Zaker, Nasser
2018-01-01
Gorgan Bay is a semi-enclosed basin located in the southeast of the Caspian Sea, Iran. The bay is recognized as a resting place for migratory birds as well as a spawning habitat for native fish. However, apparently, no detailed research on its physical processes has previously been conducted. In this study, a 3D coupled hydrodynamic and solute transport model was used to investigate general circulation, thermohaline structure, and residence time in Gorgan Bay. Model outputs were validated against a set of field observations. Bottom friction and attenuation coefficient of light intensity were tuned in order to achieve optimum agreement with the observations. Results revealed that, due to the interaction between bathymetry and prevailing winds, a barotropic double-gyre circulation, dominating the general circulation, existed during all seasons in Gorgan Bay. Furthermore, temperature and salinity fluctuations in the bay were seasonal, due to the seasonal variability of atmospheric fluxes. Results also indicated that under the prevailing winds, the domain-averaged residence time in Gorgan Bay would be approximately 95 days. The rivers discharging into Gorgan Bay are considered as the main sources of nutrients in the bay. Since their mouths are located in the area with a residence time of over 100 days, Gorgan Bay could be at risk of eutrophication; it is necessary to adopt preventive measures against water quality degradation.
A neuromorphic network for generic multivariate data classification
Schmuker, Michael; Pfeil, Thomas; Nawrot, Martin Paul
2014-01-01
Computational neuroscience has uncovered a number of computational principles used by nervous systems. At the same time, neuromorphic hardware has matured to a state where fast silicon implementations of complex neural networks have become feasible. En route to future technical applications of neuromorphic computing the current challenge lies in the identification and implementation of functional brain algorithms. Taking inspiration from the olfactory system of insects, we constructed a spiking neural network for the classification of multivariate data, a common problem in signal and data analysis. In this model, real-valued multivariate data are converted into spike trains using “virtual receptors” (VRs). Their output is processed by lateral inhibition and drives a winner-take-all circuit that supports supervised learning. VRs are conveniently implemented in software, whereas the lateral inhibition and classification stages run on accelerated neuromorphic hardware. When trained and tested on real-world datasets, we find that the classification performance is on par with a naïve Bayes classifier. An analysis of the network dynamics shows that stable decisions in output neuron populations are reached within less than 100 ms of biological time, matching the time-to-decision reported for the insect nervous system. Through leveraging a population code, the network tolerates the variability of neuronal transfer functions and trial-to-trial variation that is inevitably present on the hardware system. Our work provides a proof of principle for the successful implementation of a functional spiking neural network on a configurable neuromorphic hardware system that can readily be applied to real-world computing problems. PMID:24469794
Douglas, P K; Harris, Sam; Yuille, Alan; Cohen, Mark S
2011-05-15
Machine learning (ML) has become a popular tool for mining functional neuroimaging data, and there are now hopes of performing such analyses efficiently in real-time. Towards this goal, we compared the accuracy of six different ML algorithms applied to neuroimaging data of persons engaged in a bivariate task, asserting their belief or disbelief of a variety of propositional statements. We performed unsupervised dimension reduction and automated feature extraction using independent component (IC) analysis and extracted IC time courses. Optimization of classification hyperparameters across each classifier occurred prior to assessment. Maximum accuracy was achieved at 92% for Random Forest, followed by 91% for AdaBoost, 89% for Naïve Bayes, 87% for a J48 decision tree, 86% for K*, and 84% for support vector machine. For real-time decoding applications, finding a parsimonious subset of diagnostic ICs might be useful. We used a forward search technique to sequentially add ranked ICs to the feature subspace. For the current data set, we determined that approximately six ICs represented a meaningful basis set for classification. We then projected these six IC spatial maps forward onto a later scanning session within subject. We then applied the optimized ML algorithms to these new data instances, and found that classification accuracy results were reproducible. Additionally, we compared our classification method to our previously published general linear model results on this same data set. The highest ranked IC spatial maps show similarity to brain regions associated with contrasts for belief > disbelief, and disbelief > belief. Copyright © 2010 Elsevier Inc. All rights reserved.
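The forward-search step, adding ranked components one at a time and keeping the subset that maximizes cross-validated accuracy, is easy to sketch. The snippet below uses synthetic features ranked by ANOVA F-score as stand-ins for the study's ranked ICs; dataset, classifier, and ranking criterion are all assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import f_classif
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=150, n_features=20, n_informative=6,
                           random_state=0)

# Rank candidate components (here by ANOVA F-score; the study ranked ICs)
scores, _ = f_classif(X, y)
order = np.argsort(scores)[::-1]

# Forward search: grow the feature subspace in ranked order, track CV accuracy
best_k, best_acc = 0, 0.0
for k in range(1, len(order) + 1):
    acc = cross_val_score(RandomForestClassifier(random_state=0),
                          X[:, order[:k]], y, cv=5).mean()
    if acc > best_acc:
        best_k, best_acc = k, acc
print(f"best subset size={best_k}, CV accuracy={best_acc:.3f}")
```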
Spatially and Temporally Detailed Modeling of Water Quality in Narragansett Bay
Nutrient loading to Narragansett Bay has led to eutrophication, resulting in hypoxia and anoxia, finfish and shellfish kills, loss of seagrass, and reductions in the recreational and economic value of the Bay. We are developing a model that simulates the effects of external nutri...
Inputs and spatial distribution patterns of Cr in Jiaozhou Bay
NASA Astrophysics Data System (ADS)
Yang, Dongfang; Miao, Zhenqing; Huang, Xinmin; Wei, Linzhen; Feng, Ming
2018-03-01
Cr pollution in marine bays has become one of the critical environmental issues, and understanding the input and spatial distribution patterns is essential to pollution control. According to the source strengths of the major pollution sources, the input patterns of pollutants to a marine bay can be classified as slight, moderate, and heavy, and the corresponding spatial distributions can be described by three block models, respectively. This paper analyzed the input patterns and distributions of Cr in Jiaozhou Bay, eastern China, based on surveys of Cr in surface waters during 1979-1983. Results showed that the inputs of Cr to Jiaozhou Bay could be classified as moderate (32.32-112.30 μg L−1) and slight (4.17-19.76 μg L−1). The input patterns of Cr thus included the moderate and slight patterns, and the horizontal distributions could be described by Block Model 2 and Block Model 3, respectively. In the case of moderate input via overland runoff, Cr contents decreased from the estuaries to the bay mouth, and the distribution pattern was parallel. In the case of moderate input via marine currents, Cr contents decreased from the bay mouth into the bay, and the distribution pattern was parallel to circular. The block models are able to reveal the transfer processes of various pollutants and are helpful for understanding the distributions of pollutants in marine bays.
NASA Astrophysics Data System (ADS)
Perrot, Thierry; Rossi, Nadège; Ménesguen, Alain; Dumas, Franck
2014-04-01
First recorded in the 1970s, massive green macroalgal blooms have since become an annual recurrence in Brittany, France. Eutrophication (in particular, anthropogenic nitrogen input) has been identified as the main factor controlling Ulva 'green tide' events. In this study, we modelled Ulva proliferation using a two-dimensional model coupling hydrodynamic and biological models (coined 'MARS-Ulves') for five sites along the Brittany coastline (La Fresnaye Bay, Saint-Brieuc Bay, Lannion Bay, Guissény Bay and Douarnenez Bay). Calibration of the biological model was mainly based on the seasonal variation of the maximum nitrogen uptake rate (VmaxN) and the half-saturation constant for nitrogen (KN) to reproduce the internal nutrient quotas measured in situ at each site. In each bay, model predictions were in agreement with observed algal coverage converted into biomass. A numerical tracking method was implemented to identify the contribution of the rivers that empty into the study bays, and scenarios of decreases in nitrate concentration in rivers were simulated. Results from numerical nitrogen tracking highlighted the main nitrogen sources of green tides and also showed that each river contributes locally to green tides. In addition, dynamic modelling showed that the nitrate concentrations in rivers must be limited to between 5 and 15 mg l−1, depending on the bay, to halve Ulva biomass on the coasts. The three-step methodology developed in this study (analysing total dissolved inorganic nitrogen flux from rivers, tracking nitrogen sources in Ulva and developing scenarios for reducing nitrogen) provides qualitative and quantitative guidelines for stakeholders to define specific nitrogen reduction targets for better environmental management of water quality.
Modeling tidal hydrodynamics of San Diego Bay, California
Wang, P.-F.; Cheng, R.T.; Richter, K.; Gross, E.S.; Sutton, D.; Gartner, J.W.
1998-01-01
In 1983, current data were collected by the National Oceanic and Atmospheric Administration using mechanical current meters. During 1992 through 1996, acoustic Doppler current profilers as well as mechanical current meters and tide gauges were used. These measurements not only document tides and tidal currents in San Diego Bay, but also provide independent data sets for model calibration and verification. A high-resolution (100-m grid), depth-averaged, numerical hydrodynamic model has been implemented for San Diego Bay to describe essential tidal hydrodynamic processes in the bay. The model is calibrated using the 1983 data set and verified using the more recent 1992-1996 data. Discrepancies between model predictions and field data in both model calibration and verification are of the same order of magnitude as the uncertainties in the field data. The calibrated and verified numerical model has been used to quantify residence time and the dilution and flushing of contaminant effluent into San Diego Bay. Furthermore, the numerical model has become an important research tool in ongoing hydrodynamic and water quality studies and in guiding future field data collection programs.
A Bayes linear Bayes method for estimation of correlated event rates.
Quigley, John; Wilson, Kevin J; Walls, Lesley; Bedford, Tim
2013-12-01
Typically, full Bayesian estimation of correlated event rates can be computationally challenging since estimators are intractable. When estimation of event rates represents one activity within a larger modeling process, there is an incentive to develop more efficient inference than provided by a full Bayesian model. We develop a new subjective inference method for correlated event rates based on a Bayes linear Bayes model under the assumption that events are generated from a homogeneous Poisson process. To reduce the elicitation burden we introduce homogenization factors to the model and, as an alternative to a subjective prior, an empirical method using the method of moments is developed. Inference under the new method is compared against estimates obtained under a full Bayesian model, which takes a multivariate gamma prior, where the predictive and posterior distributions are derived in terms of well-known functions. The mathematical properties of both models are presented. A simulation study shows that the Bayes linear Bayes inference method and the full Bayesian model provide equally reliable estimates. An illustrative example, motivated by a problem of estimating correlated event rates across different users in a simple supply chain, shows how ignoring the correlation leads to biased estimation of event rates. © 2013 Society for Risk Analysis.
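As a feel for the method-of-moments, empirical Bayes flavor of this approach, the sketch below matches a gamma prior to observed raw event rates and then performs the conjugate gamma-Poisson update for each process. It is a deliberately naive illustration: it ignores the correlation structure and the Poisson sampling noise that inflates the raw-rate variance, both of which the paper's Bayes linear Bayes model handles properly, and the counts and exposures are invented.

```python
import numpy as np

# Observed event counts and exposure times for several related processes
counts = np.array([3, 0, 7, 2, 5])
exposure = np.array([10.0, 8.0, 12.0, 9.0, 11.0])

raw = counts / exposure
m, v = raw.mean(), raw.var(ddof=1)

# Method-of-moments gamma prior: mean = a/b, variance = a/b^2
# (naive: v also contains Poisson sampling noise, uncorrected here)
b = m / v
a = m * b

# Conjugate gamma-Poisson update: posterior is Gamma(a + n_i, b + t_i)
post_mean = (a + counts) / (b + exposure)
print("raw rates:   ", np.round(raw, 3))
print("shrunk rates:", np.round(post_mean, 3))
```

The posterior means are pulled toward the pooled rate, which is exactly the shrinkage behavior that protects sparse-count processes from wild raw estimates.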
Does the cost function matter in Bayes decision rule?
Schlüter, Ralf; Nussbaum-Thom, Markus; Ney, Hermann
2012-02-01
In many tasks in pattern recognition, such as automatic speech recognition (ASR), optical character recognition (OCR), part-of-speech (POS) tagging, and other string recognition tasks, we are faced with a well-known inconsistency: The Bayes decision rule is usually used to minimize string (symbol sequence) error, whereas, in practice, we want to minimize symbol (word, character, tag, etc.) error. When comparing different recognition systems, we do indeed use symbol error rate as an evaluation measure. The topic of this work is to analyze the relation between string (i.e., 0-1) and symbol error (i.e., metric, integer valued) cost functions in the Bayes decision rule, for which fundamental analytic results are derived. Simple conditions are derived for which the Bayes decision rule with integer-valued metric cost function and with 0-1 cost gives the same decisions or leads to classes with limited cost. The corresponding conditions can be tested with complexity linear in the number of classes. The results obtained do not make any assumption w.r.t. the structure of the underlying distributions or the classification problem. Nevertheless, the general analytic results are analyzed via simulations of string recognition problems with Levenshtein (edit) distance cost function. The results support earlier findings that considerable improvements are to be expected when initial error rates are high.
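The distinction can be made concrete with a toy posterior over candidate strings: under 0-1 cost the Bayes decision is the MAP string, while under an edit-distance cost it is the candidate minimizing expected Levenshtein distance, and the two can disagree. The sketch below constructs such a case; the strings and probabilities are invented for illustration.

```python
def levenshtein(a, b):
    # Standard dynamic-programming edit distance
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

# Toy posterior over candidate symbol strings
posterior = {"ab": 0.4, "cb": 0.3, "cd": 0.3}

# 0-1 cost: the Bayes decision is simply the MAP string
map_decision = max(posterior, key=posterior.get)

# Metric cost: minimize the expected Levenshtein distance instead
def expected_cost(candidate):
    return sum(p * levenshtein(candidate, w) for w, p in posterior.items())

risk_decision = min(posterior, key=expected_cost)
print(map_decision)    # 'ab' (posterior mode)
print(risk_decision)   # 'cb' (closer on average to the other candidates)
```

Here the mode 'ab' has expected edit cost 0.9 while 'cb' has 0.7, so the two cost functions yield different decisions, which is precisely the inconsistency the paper analyzes.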
NASA Astrophysics Data System (ADS)
Yang, Ye; Chui, Ting Fong May
2017-07-01
Many coastal areas worldwide have been reclaimed to meet the increasing land demand. Understanding the effects of land reclamation on the hydrodynamics and transport processes of a semi-enclosed bay is therefore of significance. From a case study of Deep Bay (DB) in China and referring to idealized bay models, the effects of two types of land reclamation, one that narrows the bay mouth and another that reduces the water area inside the bay, were examined in this study. Simulation results of idealized models show that the current velocity at the bay mouth and the incoming tidal energy flux are negatively correlated with the width of the bay mouth, as the tidal prism remains almost constant when the bay mouth width is reduced. The bay mouth width reduction would also increase the tidal energy dissipation inside the bay due to increased friction. In DB, a 30% reduction in the mouth width increased the bay mouth current velocity by up to 5% and the total incoming energy flux by 18%. The narrowed bay mouth also substantially changed the bay's vertical structure of salinity, increasing the stratification strength by 1.7×10−4 s−2. For reductions in the water surface area in the head of the bay, results from idealized bay simulations show that the current velocity throughout the bay, the incoming tidal energy flux, and salinity at the inner bay all decrease with water area reduction. When 14% of the area of DB was reclaimed, the current velocity at the bay mouth decreased by 9% but increased in the middle and inner parts. The incoming tidal energy flux also increased as the coastline became more streamlined after reclamation, and the salinity at the inner bay decreased. Both reclamation types have substantially altered the water and salt transport processes and increased the water exchange ability of the bay with the adjacent sea.
Coastal upwelling by wind-driven forcing in Jervis Bay, New South Wales: A numerical study for 2011
NASA Astrophysics Data System (ADS)
Sun, Youn-Jong; Jalón-Rojas, Isabel; Wang, Xiao Hua; Jiang, Donghui
2018-06-01
The Princeton Ocean Model (POM) was used to investigate an upwelling event in Jervis Bay, New South Wales (SE Australia), with varying wind directions and strengths. The POM was adopted in a downscaling approach, with the regional ocean model one-way nested within a global ocean model. The upwelling event was detected from the observed wind data and satellite sea surface temperature images. The validated model reproduced the upwelling event, showing the input of wind-driven bottom cold water to the bay, its subsequent deflection to the south, and its outcropping to the surface along the west and south coasts. Nevertheless, the behavior of the bottom water that intruded into the bay varied with different wind directions and strengths. Upwelling-favorable wind directions for flushing efficiency within the bay were ranked in the following order: N (0°; northerly) > NNE (30°; north-northeasterly) > NW (315°; northwesterly) > NE (45°; northeasterly) > ENE (60°; east-northeasterly). Increasing wind strengths also enhance cold water penetration and water exchange. It was determined that wind-driven downwelling within the bay, which occurred with NNE, NE and ENE winds, played a key role in blocking the intrusion of the cold water upwelled through the bay entrance. A northerly wind stress higher than 0.3 N m−2 was required for the cold water to reach the northern innermost bay.
NASA Technical Reports Server (NTRS)
Jones, R. (Principal Investigator); Harwood, P.; Finley, R.; Clements, G.; Lodwick, L.; Mcculloch, S.; Marphy, D.
1976-01-01
The author has identified the following significant results. The most significant ADP result was the modification of the DAM package to produce classified printouts, scaled and registered to U.S.G.S. 7.5-minute topographic maps, from LARSYS-type classification files. With this modification, all the powerful scaling and registration capabilities of DAM become available for multiclass classification files. The most significant results with respect to image interpretation were the application of mapping techniques to a new, more complex area, and the refinement of an image interpretation procedure which should yield the best results.
Tampa Bay Study Data and Information Management System (DIMS)
NASA Astrophysics Data System (ADS)
Edgar, N. T.; Johnston, J. B.; Yates, K.; Smith, K. E.
2005-05-01
Providing easy access to data and information is an essential component of both science and management. The Tampa Bay Data and Information Management System (DIMS) catalogs and publicizes data and products which are generated through the Tampa Bay Integrated Science Study. The publicly accessible interface consists of a Web site (http://gulfsci.usgs.gov), a digital library, and an interactive map server (IMS). The Tampa Bay Study Web site contains information from scientists involved in the study, and is also the portal site for the digital library and IMS. Study information is highlighted on the Web site according to the estuarine component: geology and geomorphology, water and sediment quality, ecosystem structure and function, and hydrodynamics. The Tampa Bay Digital Library is a web-based clearinghouse for digital products on Tampa Bay, including documents, maps, spatial and tabular data sets, presentations, etc. New developments to the digital library include new search features, 150 new products over the past year, and partnerships to expand the offering of science products. The IMS is a Web-based geographic information system (GIS) used to store, analyze and display data pertaining to Tampa Bay. Upgrades to the IMS have improved performance and speed, as well as increased the number of data sets available for mapping. The Tampa Bay DIMS is a dynamic entity and will continue to evolve with the study. Beginning in 2005, the Tampa Bay Integrated Coastal Model will have a more prominent presence within the DIMS. The Web site will feature model projects and plans; the digital library will host model products and data sets; the IMS will display spatial model data sets and analyses. These tools will be used to increase communication of USGS efforts in Tampa Bay to the public, local managers, and scientists.
Estimation of residence time in a shallow lacustrine embayment
NASA Astrophysics Data System (ADS)
Razmi, A. M.; Barry, D. A.; Lemmin, U.; Bakhtyar, R.
2012-12-01
Near-shore water quality in lacustrine bays subjected to effluent or stream discharges is affected by, amongst other things, the residence time within a given bay. Vidy Bay, located on the northern shore of Lake Geneva, Switzerland, receives discharge from a wastewater treatment plant, the Chamberonne River and a storm-water drain. The residence time of water in the bay largely depends on water exchanges with the main basin (Grand Lac) of Lake Geneva. Field investigations and modeling of the hydrodynamics of Vidy Bay have shown that currents are variable, due mainly to wind variability over the lake. In broad terms, however, there are two main current patterns in the bay: (i) currents linked to large gyres in the Grand Lac, or (ii) currents partially independent of the Grand Lac and controlled by small-scale gyres within the bay. Residence times in Vidy Bay were computed using the hydrodynamic model Delft3D. Since the Vidy Bay shoreline follows a shallow arc, the definition of the off-shore extent of the bay is ambiguous; here, the largest within-bay gyre is used. Particle tracking was conducted for each of the three discharges into the bay. Model results were computed using meteorological data for 2010, and thus include the natural variability in wind patterns and seasonal stratification. An analysis of the results shows that a water parcel from the wastewater outfall has a residence time ranging from hours to days. The residence time is shortest near the surface and longest in the near-bottom layer. The results confirmed that wind forcing, thermal stratification, and water depth are the main factors influencing residence time.
Identifying Wrist Fracture Patients with High Accuracy by Automatic Categorization of X-ray Reports
de Bruijn, Berry; Cranney, Ann; O’Donnell, Siobhan; Martin, Joel D.; Forster, Alan J.
2006-01-01
The authors performed this study to determine the accuracy of several text classification methods to categorize wrist x-ray reports. We randomly sampled 751 textual wrist x-ray reports. Two expert reviewers rated the presence (n = 301) or absence (n = 450) of an acute fracture of the wrist. We developed two information retrieval (IR) text classification methods and a machine learning method using a support vector machine (TC-1). In cross-validation on the derivation set (n = 493), TC-1 outperformed the two IR-based methods and six benchmark classifiers, including Naive Bayes and a Neural Network. In the validation set (n = 258), TC-1 demonstrated consistent performance with 93.8% accuracy; 95.5% sensitivity; 92.9% specificity; and 87.5% positive predictive value. TC-1 was easy to implement and superior in performance to the other classification methods. PMID:16929046
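A modern equivalent of such a report classifier can be assembled in a few lines with TF-IDF features and a linear support vector machine. The snippet below is a sketch on a tiny invented corpus using a scikit-learn pipeline, not the TC-1 system described in the paper, and real report collections would be far larger.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Tiny stand-in corpus (labels: 1 = acute fracture present, 0 = absent)
reports = [
    "transverse fracture of the distal radius with dorsal angulation",
    "no acute fracture or dislocation identified",
    "comminuted fracture of the distal radius, moderate displacement",
    "normal alignment, no fracture seen, soft tissues unremarkable",
    "acute fracture of the ulnar styloid process",
    "degenerative changes, no acute bony abnormality",
]
labels = [1, 0, 1, 0, 1, 0]

# Bag-of-words/bigram TF-IDF features feeding a linear SVM
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
print(cross_val_score(clf, reports, labels, cv=3).mean())
```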
A Theoretical Analysis of Why Hybrid Ensembles Work
2017-01-01
Inspired by the group decision making process, ensembles or combinations of classifiers have been found favorable in a wide variety of application domains. Some researchers propose to use a mixture of two different types of classification algorithms to create a hybrid ensemble. Why does such an ensemble work? The question remains open. Following the concept of diversity, which is one of the fundamental elements of the success of ensembles, we conduct a theoretical analysis of why hybrid ensembles work, connecting the use of different algorithms to accuracy gain. We also conduct experiments on the classification performance of hybrid ensembles of classifiers created by the decision tree and naïve Bayes classification algorithms, each of which is a top data mining algorithm and often used to create non-hybrid ensembles. Through this paper, we therefore provide a complement to the theoretical foundation of creating and using hybrid ensembles. PMID:28255296
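A hybrid ensemble of the two algorithms named here is easy to assemble with scikit-learn: the sketch below mixes bagged decision trees with bagged Gaussian naïve Bayes learners and combines them by soft voting on a bundled dataset. It illustrates the construction only; the paper's experimental setup and combination rule may differ.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Non-hybrid ensembles: bagged trees and bagged naive Bayes learners
trees = BaggingClassifier(DecisionTreeClassifier(), n_estimators=10, random_state=0)
nbs = BaggingClassifier(GaussianNB(), n_estimators=10, random_state=0)

# Hybrid ensemble: mix the two base learners and vote on predicted probabilities
hybrid = VotingClassifier([("dt", trees), ("nb", nbs)], voting="soft")

for name, model in [("trees", trees), ("naive Bayes", nbs), ("hybrid", hybrid)]:
    print(name, cross_val_score(model, X, y, cv=5).mean().round(3))
```

Soft voting is used so that the diversity argument applies at the probability level; hard majority voting is the other common choice.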
Fault detection and diagnosis of diesel engine valve trains
NASA Astrophysics Data System (ADS)
Flett, Justin; Bone, Gary M.
2016-05-01
This paper presents the development of a fault detection and diagnosis (FDD) system for use with a diesel internal combustion engine (ICE) valve train. A novel feature is generated for each of the valve closing and combustion impacts. Deformed valve spring faults and abnormal valve clearance faults were seeded on a diesel engine instrumented with one accelerometer. Five classification methods were implemented experimentally and compared. The FDD system using the Naïve-Bayes classification method produced the best overall performance, with a lowest detection accuracy (DA) of 99.95% and a lowest classification accuracy (CA) of 99.95% for the spring faults occurring on individual valves. The lowest DA and CA values for multiple faults occurring simultaneously were 99.95% and 92.45%, respectively. The DA and CA results demonstrate the accuracy of our FDD system for diesel ICE valve train fault scenarios not previously addressed in the literature.
First Step Towards a Coastal Modelling System for South Africa: a St. Helena Bay Case Study
NASA Astrophysics Data System (ADS)
Collins, C.; Lamont, T.; Loveday, B. R.; Hermes, J. C.; Veitch, J.; Backeberg, B.
2016-02-01
St. Helena Bay, forming part of the southern Benguela ecosystem, is the largest bay on the west coast of South Africa and is a biologically important region for pelagic fish, hake, and rock lobster. To date, only a few infrequent studies have focussed on variations in the bay-scale circulation. A monthly ship-based monitoring line, the St. Helena Bay Monitoring Line (SHBML), was initiated in 2000 to determine the seasonal changes in cross-shelf hydrography and biology. Even though there has been an increase in ocean modelling in and around South Africa in recent years, coastal modelling is still in its infancy. The 12-year observational data set in the St. Helena Bay region, the only long-term, cross-shelf, full-water-column data set for South Africa, makes this area the perfect natural laboratory for the development of a coastal modelling system. In this study, the climatological mean temperature and salinity from three different ROMS simulations and a HYCOM simulation are evaluated against the in situ observations from the SHBML with the aim of determining the influence of different forcing products, horizontal and vertical resolution, as well as vertical coordinate schemes on the vertical structure of the ocean. The model simulations tend to overestimate the temperature and salinity across the shelf, and particularly within St. Helena Bay. Furthermore, the models misrepresent the vertical salinity and temperature structures. Interestingly, below 800 m, there is better agreement between temperature in the models and the in situ observations. This is the first detailed comparison of modelled and in situ data for the greater St. Helena Bay area at this scale, and the next phase will examine whether the model that is most congruent with the observations resolves the same interannual signals as observed in the in situ data.
Automatic classification of protein structures using physicochemical parameters.
Mohan, Abhilash; Rao, M Divya; Sunderrajan, Shruthi; Pennathur, Gautam
2014-09-01
Protein classification is the first step to functional annotation; SCOP and Pfam databases are currently the most relevant protein classification schemes. However, the disproportion in the number of three-dimensional (3D) protein structures generated versus their classification into relevant superfamilies/families emphasizes the need for automated classification schemes. Predicting the function of novel proteins based on sequence information alone has proven to be a major challenge. The present study focuses on the use of physicochemical parameters in conjunction with machine learning algorithms (Naive Bayes, Decision Trees, Random Forest and Support Vector Machines) to classify proteins into their respective SCOP superfamily/Pfam family, using sequence-derived information. Spectrophores™, a 1D descriptor of the 3D molecular field surrounding a structure, was used as a benchmark to compare the performance of the physicochemical parameters. The machine learning algorithms were modified to select features based on information gain for each SCOP superfamily/Pfam family. The effect of combining physicochemical parameters and spectrophores on classification accuracy (CA) was studied. Machine learning algorithms trained with the physicochemical parameters consistently classified SCOP superfamilies and Pfam families with a classification accuracy above 90%, while spectrophores performed with a CA of around 85%. Feature selection improved classification accuracy for both the physicochemical-parameter and spectrophore based machine learning algorithms. Combining both attributes resulted in a marginal loss of performance. Physicochemical parameters were able to classify proteins from both schemes with classification accuracy ranging from 90% to 96%. These results suggest the usefulness of this method in classifying proteins from amino acid sequences.
FAST Mast Structural Response to Axial Loading: Modeling and Verification
NASA Technical Reports Server (NTRS)
Knight, Norman F., Jr.; Elliott, Kenny B.; Templeton, Justin D.; Song, Kyongchan; Rayburn, Jeffery T.
2012-01-01
The International Space Station's solar array wing mast shadowing problem is the focus of this paper. A building-block approach to modeling and analysis is pursued for the primary structural components of the solar array wing mast structure. Starting with an ANSYS (Registered Trademark) finite element model, a verified MSC.Nastran (Trademark) model is established for a single longeron. This finite element model translation requires the conversion of several modeling and analysis features for the two structural analysis tools to produce comparable results for the single-longeron configuration. The model is then reconciled using test data. The resulting MSC.Nastran (Trademark) model is then extended to a single-bay configuration and verified using single-bay test data. Conversion of the MSC.Nastran (Trademark) single-bay model to Abaqus (Trademark) is also performed to simulate the elastic-plastic longeron buckling response of the single bay prior to folding.
Tidal oscillation of sediment between a river and a bay: A conceptual model
Ganju, N.K.; Schoellhamer, D.H.; Warner, J.C.; Barad, M.F.; Schladow, S.G.
2004-01-01
A conceptual model of fine sediment transport between a river and a bay is proposed, based on observations at two rivers feeding the same bay. The conceptual model consists of river, transitional, and bay regimes. Within the transitional regime, resuspension, advection, and deposition create a mass of sediment that oscillates landward and seaward. While suspended, this sediment mass forms an estuarine turbidity maximum. At slack tides this sediment mass temporarily deposits on the bed, creating landward and seaward deposits. Tidal excursion and slack tide deposition limit the range of the sediment mass. To verify this conceptual model, data from two small tributary rivers of San Pablo Bay are presented. Tidal variability of suspended-sediment concentration markedly differs between the landward and seaward deposits, allowing interpretation of the intratidal movement of the oscillating sediment mass. Application of this model in suitable estuaries will assist in numerical model calibration as well as in data interpretation. A similar model has been applied to some larger-scale European estuaries, which bear a geometric resemblance to the systems analyzed in this study. © 2004 Elsevier Ltd. All rights reserved.
USDA-ARS's Scientific Manuscript database
Current restoration efforts for the Chesapeake Bay watershed mandate a timeline for reducing the load of nutrients and sediment to receiving waters. The Chesapeake Bay Watershed Model (WSM) has been used for two decades to simulate hydrology and nutrient and sediment transport; however, spatial limi...
USDA-ARS's Scientific Manuscript database
The Jobos Bay Watershed, located in south-central Puerto Rico, is a tropical Conservation Effects Assessment Project (CEAP) Special Emphasis Watershed. The purpose of CEAP is to quantify environmental benefits of conservation practices and includes field and watershed modeling. In Jobos Bay, the goa...
Seafloor habitat mapping and classification in Glacier Bay, Alaska: Phase 1 & 2 1996-2004
Hooge, Philip N.; Carlson, Paul R.; Mondragon, Jennifer; Etherington, Lisa L.; Cochran, G.R.
2004-01-01
Glacier Bay is a diverse fjord ecosystem with multiple sills, numerous tidewater glaciers and a highly complex oceanographic system. The Bay was completely glaciated prior to the 1700s and subsequently experienced the fastest glacial retreat recorded in historical times. Currently, some of the highest sedimentation rates ever observed occur in the Bay, along with rapid uplift (up to 2.5 cm/year) due to a combination of plate tectonics and isostatic rebound. Glacier Bay is the second deepest fjord in Alaska, with depths over 500 meters. This variety of physical processes and bathymetry creates many diverse habitats within a relatively small area (1,255 km²). Habitat can be defined as the locality, including resources and environmental conditions, occupied by a species or population of organisms (Morrison et al. 1992). Mapping and characterization of benthic habitat is crucial to an understanding of marine species and can serve a variety of purposes including: understanding species distributions and improving stock assessments, designing special management areas and marine protected areas, monitoring and protecting important habitats, and assessing habitat change due to natural or human impacts. In 1996, Congress recognized the importance of understanding benthic habitat for fisheries management by reauthorizing the Magnuson-Stevens Fishery Conservation and Management Act and amending it with the Sustainable Fisheries Act (SFA). This amendment emphasizes the importance of habitat protection to healthy fisheries and requires identification of essential fish habitat in management decisions. Recently, the National Park Service's Ocean Stewardship Strategy identified the creation of benthic habitat maps and sediment maps as crucial components to complete basic ocean park resource inventories (Davis 2003). Glacier Bay National Park managers currently have very limited knowledge about the bathymetry, sediment types, and various marine habitats of ecological importance in the Park. Ocean floor bathymetry and sediment type are the building blocks of marine communities. Bottom type and shape affect the kinds of benthic communities that develop in a particular environment as well as the oceanographic conditions that communities are subject to. Accurate mapping of the ocean floor is essential for park managers' understanding of existing marine communities and will be important in assessing human-induced changes (e.g., vessel traffic and commercial fishing), biological change (e.g., rapid sea otter recolonization), and geological processes of change (e.g., deglaciation). Information on animal-habitat relationships, particularly within a marine reserve framework, will be valuable to agencies making decisions about critical habitats, marine reserve design, as well as fishery management. Identification and mapping of benthic habitat provides National Park Service managers with tools to increase the effectiveness of resource management. The primary objective of this project is to investigate the geological characteristics of the biological habitats of halibut, Dungeness crab, king crab, and Tanner crab within Glacier Bay National Park. Additionally, habitat classification of shallow water regions of Glacier Bay will provide crucial information on the relationship between benthic habitat features and the abundance of benthic prey items for a variety of marine predators, including sea ducks, the rapidly increasing population of sea otters, and other marine mammals.
A computer model of long-term salinity in San Francisco Bay: Sensitivity to mixing and inflows
Uncles, R.J.; Peterson, D.H.
1995-01-01
A two-level model of the residual circulation and tidally-averaged salinity in San Francisco Bay has been developed in order to interpret long-term (days to decades) salinity variability in the Bay. Applications of the model to biogeochemical studies are also envisaged. The model has been used to simulate daily-averaged salinity in the upper and lower levels of a 51-segment discretization of the Bay over the 22-y period 1967–1988. Observed, monthly-averaged surface salinity data and monthly averages of the daily-simulated salinity are in reasonable agreement, both near the Golden Gate and in the upper reaches, close to the delta. Agreement is less satisfactory in the central reaches of North Bay, in the vicinity of Carquinez Strait. Comparison of daily-averaged data at Station 5 (Pittsburg, in the upper North Bay) with modeled data indicates close agreement with a correlation coefficient of 0.97 for the 4110 daily values. The model successfully simulates the marked seasonal variability in salinity as well as the effects of rapidly changing freshwater inflows. Salinity variability is driven primarily by freshwater inflow. The sensitivity of the modeled salinity to variations in the longitudinal mixing coefficients is investigated. The modeled salinity is relatively insensitive to the calibration factor for vertical mixing and relatively sensitive to the calibration factor for longitudinal mixing. The optimum value of the longitudinal calibration factor is 1.1, compared with the physically-based value of 1.0. Linear time-series analysis indicates that the observed and dynamically-modeled salinity-inflow responses are in good agreement in the lower reaches of the Bay.
Numerical Simulation of Regional Circulation in the Monterey Bay Region
NASA Technical Reports Server (NTRS)
Tseng, Y. H.; Dietrich, D. E.; Ferziger, J. H.
2003-01-01
The objective of this study is to produce a high-resolution numerical model of the Monterey Bay area in which the dynamics are determined by the complex geometry of the coastline, steep bathymetry, and the influence of the water masses that constitute the CCS. Our goal is to simulate the regional-scale ocean response with realistic dynamics (annual cycle), forcing, and domain. In particular, we focus on non-hydrostatic effects (by comparing the results of hydrostatic and non-hydrostatic models) and the role of complex geometry, i.e. the bay and submarine canyon, on the nearshore circulation. To the best of our knowledge, the current study is the first to simulate the regional circulation in the vicinity of Monterey Bay using a non-hydrostatic model. Section 2 introduces the high-resolution Monterey Bay area regional model (MBARM). Section 3 provides the results and verification with mooring and satellite data. Section 4 compares the results of hydrostatic and non-hydrostatic models.
Gabere, Musa Nur; Hussein, Mohamed Aly; Aziz, Mohammad Azhar
2016-01-01
Purpose There has been considerable interest in using whole-genome expression profiles for the classification of colorectal cancer (CRC). The selection of important features is a crucial step before training a classifier. Methods In this study, we built a model that uses support vector machine (SVM) to classify cancer and normal samples using Affymetrix exon microarray data obtained from 90 samples of 48 patients diagnosed with CRC. From the 22,011 genes, we selected the 20, 30, 50, 100, 200, 300, and 500 genes most relevant to CRC using the minimum-redundancy–maximum-relevance (mRMR) technique. With these gene sets, an SVM model was designed using four different kernel types (linear, polynomial, radial basis function [RBF], and sigmoid). Results The best model, which used 30 genes and the RBF kernel, outperformed other combinations; it had an accuracy of 84% for both ten-fold and leave-one-out cross-validations in discriminating the cancer samples from the normal samples. With this 30-gene set from mRMR, six classifiers were trained using random forest (RF), Bayes net (BN), multilayer perceptron (MLP), naïve Bayes (NB), reduced error pruning tree (REPT), and SVM. Two hybrids, mRMR + SVM and mRMR + BN, were the best models when tested on other datasets, and they achieved a prediction accuracy of 95.27% and 91.99%, respectively, compared to other mRMR hybrid models (mRMR + RF, mRMR + NB, mRMR + REPT, and mRMR + MLP). Ingenuity Pathway Analysis was used to analyze the functions of the 30 genes selected for this model and their potential association with CRC: CDH3, CEACAM7, CLDN1, IL8, IL6R, MMP1, MMP7, and TGFB1 were predicted to be CRC biomarkers. Conclusion This model could be used to further develop a diagnostic tool for predicting CRC based on gene expression data from patient samples. PMID:27330311
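The mRMR + SVM pipeline can be sketched compactly: greedily pick features that maximize relevance to the label while penalizing redundancy with the features already chosen, then cross-validate an RBF SVM on the selected subset. The snippet below is an illustrative simplification on synthetic data (mutual information for relevance, absolute correlation as a cheap redundancy proxy in place of the usual mutual information), not the authors' exon-array pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=90, n_features=200, n_informative=10,
                           random_state=0)

def mrmr(X, y, k):
    """Greedy mRMR: maximize relevance to y minus mean redundancy to chosen set."""
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

idx = mrmr(X, y, k=30)
acc = cross_val_score(SVC(kernel="rbf"), X[:, idx], y, cv=10).mean()
print(f"10-fold CV accuracy with 30 mRMR-selected features: {acc:.3f}")
```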
NASA Astrophysics Data System (ADS)
McNabb, R. W.; Womble, J. N.; Prakash, A.; Gens, R.; Ver Hoef, J.
2014-12-01
Tidewater glaciers play an important role in many landscape and ecosystem processes in fjords, terminating in the sea and calving icebergs and discharging meltwater directly into the ocean. Tidewater glaciers provide floating ice for use as habitat for harbor seals (Phoca vitulina richardii) for resting, pupping, nursing, molting, and avoiding predators. Tidewater glaciers are found in high concentrations in Southeast and Southcentral Alaska; currently, many of these glaciers are retreating or have stabilized in a retracted state, raising questions about the future availability of ice in these fjords as habitat for seals. Our primary objective is to investigate the relationship between harbor seal distribution and ice availability at an advancing tidewater glacier in Johns Hopkins Inlet, Glacier Bay National Park, Alaska. To this end, we use a combination of visible and infrared aerial photographs, object-based image analysis (OBIA), and statistical modeling techniques. We have developed a workflow to automate the processing of the imagery and the classification of the fjordscape (e.g., individual icebergs, brash ice, and open water), providing quantitative information on ice coverage as well as properties not typically found in traditional pixel-based classification techniques, such as block angularity and seal density across the fjord. Reflectance variation in the red channel of the optical images has proven to be the most important first-level criterion to separate open water from floating ice. This first-level criterion works well in areas without dense brash ice, but tends to misclassify dense brash ice as single icebergs. Isolating these large misclassified regions and applying a higher reflectance threshold as a second-level criterion helps to isolate individual ice blocks surrounded by dense brash ice. We present classification results from surveys taken during June and August, 2007-2013, as well as preliminary results from statistical modeling of the spatio-temporal distribution of seals and ice. OBIA is a powerful method of habitat classification and offers an effective approach to compare the spatio-temporal distribution and availability of glacial ice habitats for harbor seals in tidewater glacial fjords.
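A schematic sketch of the two-level thresholding logic described above: a first reflectance threshold on the red channel separates water from ice, and a second, higher threshold is applied inside large connected regions to split individual blocks out of dense brash ice. The image and both thresholds are hypothetical placeholders, not the study's calibrated values.

```python
import numpy as np
from scipy import ndimage

red = np.random.default_rng(0).random((200, 200))   # placeholder red-channel reflectance

ice = red > 0.5                                     # first-level criterion: ice vs water
labels, n = ndimage.label(ice)
sizes = ndimage.sum(ice, labels, index=range(1, n + 1))

blocks = np.zeros_like(ice)
for region in np.flatnonzero(sizes > 500) + 1:      # large regions: candidate dense brash
    mask = labels == region
    blocks |= mask & (red > 0.75)                   # second-level, higher threshold
print("ice pixels:", int(ice.sum()), "block pixels inside brash:", int(blocks.sum()))
```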
Use of UUVs to Evaluate and Improve Model Performance Within a Tidally-Dominated Bay
2008-09-30
Sequim Bay Road, Sequim, WA 98382. Grant Number: N00014-07-1-1113. LONG-TERM...releasing rhodamine dye on the surface of Sequim Bay (Sequim, Washington) from an anchored vessel in 2006. Concurrently collected data from the...advective transport from a point release in Sequim Bay, Washington. Tidal, wind-driven and density-driven circulation were accounted for in the model. The
2010-06-01
Pacific Northwest National Laboratory, 1529 W. Sequim Bay Rd., Sequim, WA 98382; University of South Carolina, Columbia, SC; Tetra...Watershed and Hydrodynamic Modeling for Evaluating the Impact of Land Use Change on Submerged Aquatic Vegetation and Seagrasses in Mobile Bay...land use change. Mobile Bay, AL is a designated pilot region of the Gulf of Mexico Alliance (GOMA) and is the focus area of many current NASA and
2014-01-01
Affinity capture of DNA methylation combined with high-throughput sequencing strikes a good balance between the high cost of whole genome bisulfite sequencing and the low coverage of methylation arrays. We present BayMeth, an empirical Bayes approach that uses a fully methylated control sample to transform observed read counts into regional methylation levels. In our model, inefficient capture can readily be distinguished from low methylation levels. BayMeth improves on existing methods, allows explicit modeling of copy number variation, and offers computationally efficient analytical mean and variance estimators. BayMeth is available in the Repitools Bioconductor package. PMID:24517713
Using a food-web model to assess the trophic structure and energy flows in Daya Bay, China
NASA Astrophysics Data System (ADS)
Chen, Zuozhi; Xu, Shannan; Qiu, Yongsong
2015-12-01
Daya Bay is one of the largest and most important semi-enclosed bays along the southern coast of China. Owing to its favorable geomorphological and climatic conditions, the bay has become an important conservation zone for aquatic germplasm resources in the South China Sea. To characterize its trophic structure, ecosystem properties and keystone species, a food-web model for Daya Bay was developed by means of a mass-balance approach using the Ecopath with Ecosim software. The mean trophic transfer efficiency for the ecosystem as a whole is 10.9%, while that of trophic level II is 5.1%. The primary and secondary producers, including phytoplankton, zooplankton and micro-zoobenthos, showed the largest overall impacts on the rest of the groups in a mixed trophic impact (MTI) analysis and are classified as the keystone groups. The analysis of ecosystem attributes indicated that the Daya Bay ecosystem can be categorized as immature and/or in a degraded stage. A comparison of this model with other coastal ecosystems, including Kuosheng Bay, Tongoy Bay, Beibu Gulf and Cadiz Gulf, indicated that the Daya Bay ecosystem is a clearly stressed system and is more vulnerable to external disturbance. In general, our study indicates that a holistic approach is needed to minimize the impacts of anthropogenic activities to ensure the sustainability of the ecosystem in the future.
Automating document classification for the Immune Epitope Database
Wang, Peng; Morgan, Alexander A; Zhang, Qing; Sette, Alessandro; Peters, Bjoern
2007-01-01
Background The Immune Epitope Database contains information on immune epitopes curated manually from the scientific literature. Like similar projects in other knowledge domains, significant effort is spent on identifying which articles are relevant for this purpose. Results We here report our experience in automating this process using Naïve Bayes classifiers trained on 20,910 abstracts classified by domain experts. Improvements on the basic classifier performance were made by a) utilizing information stored in PubMed beyond the abstract itself, b) applying standard feature selection criteria, and c) extracting domain-specific feature patterns that, e.g., identify peptide sequences. We have implemented the classifier into the curation process, determining whether abstracts are clearly relevant, clearly irrelevant, or if no certain classification can be made, in which case the abstracts are classified manually. Testing this classification scheme on an independent dataset, we achieve 95% sensitivity and specificity in the 51.1% of abstracts that were automatically classified. Conclusion By implementing text classification, we have sped up the reference selection process without sacrificing the sensitivity or specificity of the human expert classification. This study provides both practical recommendations for users of text classification tools and a large dataset which can serve as a benchmark for tool developers. PMID:17655769
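A minimal sketch, not the IEDB system itself: a Naive Bayes abstract classifier with a "reject" band, so only confident predictions are automated and borderline abstracts fall back to manual curation. The corpus, labels, and probability thresholds are illustrative placeholders.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

abstracts = ["epitope binding assay in mice",
             "survey of hospital staffing levels",
             "T cell epitope mapping of influenza",
             "economics of vaccine supply chains"]
labels = [1, 0, 1, 0]                       # 1 = relevant for curation

vec = TfidfVectorizer()
X = vec.fit_transform(abstracts)
clf = MultinomialNB().fit(X, labels)

p_rel = clf.predict_proba(vec.transform(["MHC epitope prediction study"]))[0, 1]
if p_rel > 0.9:
    decision = "clearly relevant"
elif p_rel < 0.1:
    decision = "clearly irrelevant"
else:
    decision = "no certain classification - route to manual curation"
print(f"P(relevant) = {p_rel:.2f}: {decision}")
```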
Chapter 4: Regional magnetic domains of the Circum-Arctic: A framework for geodynamic interpretation
Saltus, R.W.; Miller, E.L.; Gaina, C.; Brown, P.J.
2011-01-01
We identify and discuss 57 magnetic anomaly pattern domains spanning the Circum-Arctic. The domains are based on analysis of a new Circum-Arctic data compilation. The magnetic anomaly patterns can be broadly related to general geodynamic classification of the crust into stable, deformed (magnetic and nonmagnetic), deep magnetic high, oceanic and large igneous province domains. We compare the magnetic domains with topography/bathymetry, regional geology, regional free air gravity anomalies and estimates of the relative magnetic 'thickness' of the crust. Most of the domains and their geodynamic classification assignments are consistent with their topographic/bathymetric and geological expression. A few of the domains are potentially controversial. For example, the extent of the Iceland Faroe large igneous province as identified by magnetic anomalies may disagree with other definitions for this feature. Also the lack of definitive magnetic expression of oceanic crust in Baffin Bay, the Norwegian-Greenland Sea and the Amerasian Basin is at odds with some previous interpretations. The magnetic domains and their boundaries provide clues for tectonic models and boundaries within this poorly understood portion of the globe. ?? 2011 The Geological Society of London.
Study of the method of water-injected meat identifying based on low-field nuclear magnetic resonance
NASA Astrophysics Data System (ADS)
Xu, Jianmei; Lin, Qing; Yang, Fang; Zheng, Zheng; Ai, Zhujun
2018-01-01
The aim of this study was to apply the low-field nuclear magnetic resonance technique to characterize the regular variation of the transverse relaxation (T2) spectral parameters of water-injected meat with the proportion of injected water. On this basis, one-way ANOVA and discriminant analysis were used to analyse how well these parameters distinguish the water-injection proportion, and a model for identifying water-injected meat was established. The results show that, except for T21b, T22e and T23b, the parameters of the T2 relaxation spectrum changed regularly with the water-injection proportion. The parameters differed in their ability to distinguish the water-injection proportion. With S, P22 and T23m as predictor variables, Fisher and Bayes models were established by discriminant analysis, allowing both qualitative and quantitative classification of water-injected meat. The correct discrimination rates in both validation and cross-validation were 88%, and the model was stable.
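A hedged sketch of the discriminant step described above: a linear (Fisher-style) discriminant on the three relaxation-spectrum predictors. The feature values below are fabricated placeholders, not measured NMR data.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Columns: S, P22, T23m (hypothetical values per sample)
X = np.array([[120, 0.31, 45], [150, 0.42, 60], [115, 0.30, 44],
              [160, 0.45, 66], [118, 0.33, 46], [158, 0.44, 64]])
y = np.array([0, 1, 0, 1, 0, 1])   # 0 = normal meat, 1 = water-injected

model = LinearDiscriminantAnalysis().fit(X, y)
print("training accuracy:", model.score(X, y))
print("predicted class for a new sample:", model.predict([[155, 0.43, 62]])[0])
```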
An Ant Colony Optimization Based Feature Selection for Web Page Classification
2014-01-01
The increased popularity of the web has caused the inclusion of huge amount of information to the web, and as a result of this explosive information growth, automated web page classification systems are needed to improve search engines' performance. Web pages have a large number of features such as HTML/XML tags, URLs, hyperlinks, and text contents that should be considered during an automated classification process. The aim of this study is to reduce the number of features to be used to improve runtime and accuracy of the classification of web pages. In this study, we used an ant colony optimization (ACO) algorithm to select the best features, and then we applied the well-known C4.5, naive Bayes, and k nearest neighbor classifiers to assign class labels to web pages. We used the WebKB and Conference datasets in our experiments, and we showed that using the ACO for feature selection improves both accuracy and runtime performance of classification. We also showed that the proposed ACO based algorithm can select better features with respect to the well-known information gain and chi square feature selection methods. PMID:25136678
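A simplified sketch in the spirit of the ACO feature selection above: pheromone values bias random feature subsets, and subsets that classify well reinforce their features. The scoring classifier, subset size, and data are placeholders, not the paper's configuration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=200, n_features=30, n_informative=5, random_state=0)

rng = np.random.default_rng(0)
pheromone = np.ones(X.shape[1])
best_subset, best_score = None, -np.inf

for _ in range(20):                                  # "ant" iterations
    probs = pheromone / pheromone.sum()
    subset = rng.choice(X.shape[1], size=8, replace=False, p=probs)
    score = cross_val_score(GaussianNB(), X[:, subset], y, cv=3).mean()
    pheromone *= 0.9                                 # evaporation
    pheromone[subset] += score                       # reinforcement
    if score > best_score:
        best_subset, best_score = subset, score

print("best subset:", sorted(best_subset.tolist()), f"accuracy: {best_score:.2f}")
```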
Bayes factors and multimodel inference
Link, W.A.; Barker, R.J.; Thomson, David L.; Cooch, Evan G.; Conroy, Michael J.
2009-01-01
Multimodel inference has two main themes: model selection, and model averaging. Model averaging is a means of making inference conditional on a model set, rather than on a selected model, allowing formal recognition of the uncertainty associated with model choice. The Bayesian paradigm provides a natural framework for model averaging, and provides a context for evaluation of the commonly used AIC weights. We review Bayesian multimodel inference, noting the importance of Bayes factors. Noting the sensitivity of Bayes factors to the choice of priors on parameters, we define and propose nonpreferential priors as offering a reasonable standard for objective multimodel inference.
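A small worked sketch of the ideas above: Bayes factors approximated from BIC differences and the corresponding posterior model weights under a uniform model prior. The two competing regression models and the data are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
x = rng.uniform(0, 1, n)
y = 2.0 + 1.5 * x + rng.normal(0, 0.5, n)

def bic(design, y):
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = np.sum((y - design @ beta) ** 2)
    return len(y) * np.log(rss / len(y)) + design.shape[1] * np.log(len(y))

bics = np.array([bic(np.ones((n, 1)), y),                    # intercept-only model
                 bic(np.column_stack([np.ones(n), x]), y)])  # linear model

bf_21 = np.exp(-0.5 * (bics[1] - bics[0]))   # approximate BF: linear vs null
w = np.exp(-0.5 * (bics - bics.min()))
w /= w.sum()                                 # posterior model weights
print(f"BF(linear vs null) ~ {bf_21:.1f}, model weights ~ {w.round(3)}")
```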
1989-07-01
TECHNICAL REPORT HL-89-14 VERIFICATION OF THE HYDRODYNAMIC AND Si SEDIMENT TRANSPORT HYBRID MODELING SYSTEM FOR CUMBERLAND SOUND AND I’) KINGS BAY...Hydrodynamic and Sediment Transport Hybrid Modeling System for Cumberland Sound and Kings Bay Navigation Channel, Georgia 12 PERSONAL AUTHOR(S) Granat...Hydrodynamic results from RMA-2V were used in the numerical sediment transport code STUDH in modeling the interaction of the flow transport and
NASA Astrophysics Data System (ADS)
Holmquist, J. R.; Byrd, K. B.; Ballanti, L.; Nguyen, D.; Simard, M.; Windham-Myers, L.; Thomas, N.
2017-12-01
Remote sensing based maps of tidal marshes, both of their extents and carbon stocks, have the potential to play a key role in conducting greenhouse gas inventories and implementing climate mitigation policies. Our goal was to generate a single remote sensing model of tidal marsh aboveground biomass and carbon that represents nationally diverse tidal marshes within the conterminous United States (CONUS). To meet this objective we developed the first national-scale dataset of aboveground tidal marsh biomass, species composition, and aboveground plant carbon content (%C) from six CONUS regions: Cape Cod, MA, Chesapeake Bay, MD, Everglades, FL, Mississippi Delta, LA, San Francisco Bay, CA, and Puget Sound, WA. Using the random forest algorithm we tested Sentinel-1 radar backscatter metrics and Landsat vegetation indices as predictors of biomass. The final model, driven by six Landsat vegetation indices and with the soil adjusted vegetation index as the most important (n=409, RMSE=310 g/m2, 10.3% normalized RMSE), successfully predicted biomass and carbon for a range of marsh plant functional types defined by height, leaf angle and growth form. Model error was reduced by scaling field measured biomass by Landsat fraction green vegetation derived from object-based classification of National Agriculture Imagery Program imagery. We generated 30m resolution biomass maps for estuarine and palustrine emergent tidal marshes as indicated by a modified NOAA Coastal Change Analysis Program map for each region. With a mean plant %C of 44.1% (n=1384, 95% C.I.=43.99% - 44.37%) we estimated mean aboveground carbon densities (Mg/ha) and total carbon stocks for each wetland type for each region. Louisiana palustrine emergent marshes had the highest C density (2.67 ±0.08 Mg/ha) of all regions, while San Francisco Bay brackish/saline marshes had the highest C density of all estuarine emergent marshes (2.03 ±0.06 Mg/ha). This modeling and data synthesis effort will allow for aboveground C stocks in tidal marshes to be included for the first time in the 2018 U.S. EPA Greenhouse Gas Inventory for coastal wetlands. As technical barriers have been reduced through the availability of free post-processed satellite data, cloud computing platforms and open source software, this approach can potentially be applied globally as well.
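A hedged sketch of the biomass model described above: a random forest regression from vegetation indices to field biomass, scored by hold-out RMSE. The predictors and responses are synthetic placeholders for the Landsat indices and plot data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
indices = rng.random((409, 6))                              # six vegetation indices
biomass = 3000 * indices[:, 0] + rng.normal(0, 300, 409)    # g/m^2, placeholder

X_tr, X_te, y_tr, y_te = train_test_split(indices, biomass, random_state=0)
rf = RandomForestRegressor(random_state=0).fit(X_tr, y_tr)
rmse = np.sqrt(mean_squared_error(y_te, rf.predict(X_te)))
print(f"hold-out RMSE: {rmse:.0f} g/m^2")
```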
NASA Technical Reports Server (NTRS)
April, G. C.; Liu, H. A.
1975-01-01
Total coliform group bacteria were selected to expand the mathematical modeling capabilities of the hydrodynamic and salinity models to understand their relationship to commercial fishing ventures within bay waters and to gain a clear insight into the effect that rivers draining into the bay have on water quality conditions. Parametric observations revealed that temperature factors and river flow rate have a pronounced effect on the concentration profiles, while wind conditions showed only slight effects. An examination of coliform group loading concentrations at constant river flow rates and temperature shows these loading changes have an appreciable influence on total coliform distribution within Mobile Bay.
Results from air quality modeling and field measurements made as part of the Bay Region Atmospheric Chemistry Experiment (BRACE) along with related scientific literature were reviewed to provide an improved estimate of atmospheric reactive nitrogen (N) deposition to Tampa Bay, to...
Mu, Chun-sun; Zhang, Ping; Kong, Chun-yan; Li, Yang-ning
2015-09-01
To study the application of a Bayes probability model in differentiating yin and yang jaundice syndromes in neonates, 107 jaundice neonates admitted to hospital within 10 days after birth were assigned to two groups according to syndrome differentiation: 68 in the yang jaundice syndrome group and 39 in the yin jaundice syndrome group. Data collected for the neonates were factors related to jaundice before, during and after birth. Blood routines, liver and renal functions, and myocardial enzymes were tested on the admission day or the next day. A logistic regression model and Bayes discriminant analysis were used to screen factors important for yin and yang jaundice syndrome differentiation. Finally, a Bayes probability model for yin and yang jaundice syndromes was established and assessed. The screened factors included mother's age, gestational diabetes mellitus (GDM), gestational age, asphyxia, ABO hemolytic disease, red blood cell distribution width (RDW-SD), platelet-large cell ratio (P-LCR), serum direct bilirubin (DBIL), alkaline phosphatase (ALP), and cholinesterase (CHE). Bayes discriminant analysis was performed in SPSS to obtain the discriminant function coefficients, from which the following functions were established. Yang jaundice syndrome: y1 = -21.701 + 2.589 × mother's age + 1.037 × GDM - 17.175 × asphyxia + 13.876 × gestational age + 6.303 × ABO hemolytic disease + 2.116 × RDW-SD + 0.831 × DBIL + 0.012 × ALP + 1.697 × LCR + 0.001 × CHE. Yin jaundice syndrome: y2 = -33.511 + 2.991 × mother's age + 3.960 × GDM - 12.877 × asphyxia + 11.848 × gestational age + 1.820 × ABO hemolytic disease + 2.231 × RDW-SD + 0.999 × DBIL + 0.023 × ALP + 1.916 × LCR + 0.002 × CHE. Hypothesis testing of the discriminant functions gave Wilks' λ = 0.393 (P = 0.000), so the functions were statistically significant. When the Bayes probability model was checked for discriminating yin and yang jaundice syndromes, the coincidence rates for both syndromes exceeded 90%. Yin and yang jaundice syndromes in neonates could thus be accurately judged by the Bayes discriminant functions.
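A direct sketch of how the two discriminant functions above would be applied: compute y1 and y2 for a neonate's feature vector and assign the syndrome with the larger score. The coefficients are taken from the abstract; the example feature values are hypothetical.

```python
import numpy as np

# Coefficient order: mother's age, GDM, asphyxia, gestational age,
# ABO hemolytic disease, RDW-SD, DBIL, ALP, LCR, CHE.
coef_yang = np.array([2.589, 1.037, -17.175, 13.876, 6.303,
                      2.116, 0.831, 0.012, 1.697, 0.001])
coef_yin = np.array([2.991, 3.960, -12.877, 11.848, 1.820,
                     2.231, 0.999, 0.023, 1.916, 0.002])
intercepts = np.array([-21.701, -33.511])

x = np.array([28.0, 0.0, 0.0, 39.0, 1.0,
              44.0, 10.0, 180.0, 22.0, 5000.0])   # hypothetical neonate

y1, y2 = intercepts + np.array([coef_yang @ x, coef_yin @ x])
print(f"y1 (yang) = {y1:.1f}, y2 (yin) = {y2:.1f} ->",
      "yang jaundice" if y1 > y2 else "yin jaundice")
```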
Integrating Fluvial and Oceanic Drivers in Operational Flooding Forecasts for San Francisco Bay
NASA Astrophysics Data System (ADS)
Herdman, Liv; Erikson, Li; Barnard, Patrick; Kim, Jungho; Cifelli, Rob; Johnson, Lynn
2016-04-01
The nine counties that make up the San Francisco Bay area are home to 7.5 million people, and these communities are susceptible to flooding along the bay shoreline and the inland creeks that drain to the bay. A forecast model that integrates fluvial and oceanic drivers is necessary for predicting flooding in this complex urban environment. The U.S. Geological Survey (USGS) and National Weather Service (NWS) are developing a state-of-the-art flooding forecast model for the San Francisco Bay area that will predict watershed and ocean-based flooding up to 72 hours in advance of an approaching storm. The model framework for flood forecasts is based on the USGS-developed Coastal Storm Modeling System (CoSMoS) that was applied to San Francisco Bay under the Our Coast Our Future project. For this application, we utilize Delft3D-FM, a hydrodynamic model based on a flexible mesh grid, to calculate water levels that account for tidal forcing, seasonal water level anomalies, surge and in-Bay generated wind waves from the wind and pressure fields of a NWS forecast model, and tributary discharges from the Research Distributed Hydrologic Model (RDHM), developed by the NWS Office of Hydrologic Development. The flooding extent is determined by overlaying the resulting water levels onto a recently completed 2-m digital elevation model of the study area, which best resolves the extensive levee and tidal marsh systems in the region. Here we present initial pilot results of hindcast winter storms in January 2010 and December 2012, where the flooding is driven by oceanic and fluvial factors, respectively. We also demonstrate the feasibility of predicting flooding on an operational time scale that incorporates both atmospheric and hydrologic forcings.
A semi-automated image analysis procedure for in situ plankton imaging systems.
Bi, Hongsheng; Guo, Zhenhua; Benfield, Mark C; Fan, Chunlei; Ford, Michael; Shahrestani, Suzan; Sieracki, Jeffery M
2015-01-01
Plankton imaging systems are capable of providing fine-scale observations that enhance our understanding of key physical and biological processes. However, processing the large volumes of data collected by imaging systems remains a major obstacle for their employment, and existing approaches are designed either for images acquired under laboratory controlled conditions or within clear waters. In the present study, we developed a semi-automated approach to analyze plankton taxa from images acquired by the ZOOplankton VISualization (ZOOVIS) system within turbid estuarine waters, in Chesapeake Bay. When compared to images under laboratory controlled conditions or clear waters, images from highly turbid waters are often of relatively low quality and more variable, due to the large amount of objects and nonlinear illumination within each image. We first customized a segmentation procedure to locate objects within each image and extracted them for classification. A maximally stable extremal regions algorithm was applied to segment large gelatinous zooplankton and an adaptive threshold approach was developed to segment small organisms, such as copepods. Unlike the existing approaches for images acquired from laboratory, controlled conditions or clear waters, the target objects are often the majority class, and the classification can be treated as a multi-class classification problem. We customized a two-level hierarchical classification procedure using support vector machines to classify the target objects (< 5%), and remove the non-target objects (> 95%). First, histograms of oriented gradients feature descriptors were constructed for the segmented objects. In the first step all non-target and target objects were classified into different groups: arrow-like, copepod-like, and gelatinous zooplankton. Each object was passed to a group-specific classifier to remove most non-target objects. After the object was classified, an expert or non-expert then manually removed the non-target objects that could not be removed by the procedure. The procedure was tested on 89,419 images collected in Chesapeake Bay, and results were consistent with visual counts with >80% accuracy for all three groups.
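A minimal sketch of the two-level scheme described above: HOG descriptors feed a first-level SVM that routes objects to coarse groups, then a group-specific SVM accepts or rejects each object as a target. The images and labels are synthetic placeholders, not ZOOVIS data.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import SVC

rng = np.random.default_rng(0)
images = rng.random((40, 64, 64))      # placeholder segmented objects
groups = np.arange(40) % 3             # 0=arrow-like, 1=copepod-like, 2=gelatinous
is_target = np.arange(40) % 2          # 1 = target organism, 0 = non-target

feats = np.array([hog(im, pixels_per_cell=(16, 16)) for im in images])

level1 = SVC().fit(feats, groups)      # route each object to a coarse group
level2 = {g: SVC().fit(feats[groups == g], is_target[groups == g]) for g in range(3)}

g = int(level1.predict(feats[:1])[0])
keep = bool(level2[g].predict(feats[:1])[0])
print("group:", g, "keep as target:", keep)
```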
Verification testing of the BaySaver Separation System, Model 10K was conducted on a 10 acre drainage basin near downtown Griffin, Georgia. The system consists of two water tight pre-cast concrete manholes and a high-density polyethylene BaySaver Separator Unit. The BaySaver Mod...
In silico prediction of ROCK II inhibitors by different classification approaches.
Cai, Chuipu; Wu, Qihui; Luo, Yunxia; Ma, Huili; Shen, Jiangang; Zhang, Yongbin; Yang, Lei; Chen, Yunbo; Wen, Zehuai; Wang, Qi
2017-11-01
ROCK II is an important pharmacological target linked to central nervous system disorders such as Alzheimer's disease. The purpose of this research is to generate ROCK II inhibitor prediction models by machine learning approaches. First, four sets of descriptors were calculated with MOE 2010 and PaDEL-Descriptor, and optimized by F-score and linear forward selection methods. In addition, four classification algorithms were used to initially build 16 classifiers with k-nearest neighbors (k-NN), naïve Bayes, random forest, and support vector machine. Furthermore, three sets of structural fingerprint descriptors were introduced to enhance the predictive capacity of the classifiers, which were assessed with fivefold cross-validation, test set validation and external test set validation. The best two models, MFK + MACCS and MLR + SubFP, both achieved MCC values of 0.925 on the external test set. After that, a privileged substructure analysis was performed to reveal common chemical features of ROCK II inhibitors. Finally, binding modes were analyzed to identify relationships between molecular descriptors and activity, while main interactions were revealed by comparing the docking interactions of the most potent and the weakest ROCK II inhibitors. To the best of our knowledge, this is the first report on ROCK II inhibitors utilizing machine learning approaches, providing a new method for discovering novel ROCK II inhibitors.
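A hedged sketch of the evaluation protocol above: train a classifier on molecular descriptors and score it with the Matthews correlation coefficient (MCC) under cross-validation. The descriptors are random placeholders, not MOE/PaDEL output, and random forest stands in for the paper's full classifier set.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import make_scorer, matthews_corrcoef
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=50, random_state=0)
scores = cross_val_score(RandomForestClassifier(random_state=0), X, y,
                         cv=5, scoring=make_scorer(matthews_corrcoef))
print(f"five-fold MCC: {scores.mean():.3f} +/- {scores.std():.3f}")
```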
Synergistic use of FIA plot data and Landsat 7 ETM+ images for large area forest mapping
Chengquan Huang; Limin Yang; Collin Homer; Michael Coan; Russell Rykhus; Zheng Zhang; Bruce Wylie; Kent Hegge; Andrew Lister; Michael Hoppus; Ronald Tymcio; Larry DeBlander; William Cooke; Ronald McRoberts; Daniel Wendt; Dale Weyermann
2002-01-01
FIA plot data were used to assist in classifying forest land cover from Landsat imagery and relevant ancillary data in two regions of the U.S.: one around the Chesapeake Bay area and the other around Utah. The overall accuracies for the forest/nonforest classification were over 90 percent and about 80 percent, respectively, in the two regions. The accuracies for...
On Algorithms for Generating Computationally Simple Piecewise Linear Classifiers
1989-05-01
suffers. - Waveform classification, e.g. speech recognition, seismic analysis (i.e. discrimination between earthquakes and nuclear explosions), target...assuming Gaussian distributions (B-G), d) Bayes classifier with probability densities estimated with the k-NN method (B-kNN), e) the nearest neighbour...range of classifiers are chosen, including a fast, easily computable and often used classifier (B-G), and reliable and complex classifiers (B-kNN and NNR)
Computer Based Behavioral Biometric Authentication via Multi-Modal Fusion
2013-03-01
the decisions made by each individual modality. Fusion of features is the simple concatenation of feature vectors from multiple modalities to be... [table fragment - classifier, feature selector, number of features: BayesNet, MDL, 330; LibSVM, PCA, 80; J48, Wrapper Evaluator, 11] 3.5.3 Ensemble Based Decision Level Fusion. In ensemble learning multiple...The high fusion percentages validate our hypothesis that by combining features from multiple modalities, classification accuracy can be improved. As
PERCH: A Unified Framework for Disease Gene Prioritization.
Feng, Bing-Jian
2017-03-01
To interpret genetic variants discovered from next-generation sequencing, integration of heterogeneous information is vital for success. This article describes a framework named PERCH (Polymorphism Evaluation, Ranking, and Classification for a Heritable trait), available at http://BJFengLab.org/. It can prioritize disease genes by quantitatively unifying a new deleteriousness measure called BayesDel, an improved assessment of the biological relevance of genes to the disease, a modified linkage analysis, a novel rare-variant association test, and a converted variant call quality score. It supports data that contain various combinations of extended pedigrees, trios, and case-controls, and allows for a reduced penetrance, an elevated phenocopy rate, liability classes, and covariates. BayesDel is more accurate than PolyPhen2, SIFT, FATHMM, LRT, Mutation Taster, Mutation Assessor, PhyloP, GERP++, SiPhy, CADD, MetaLR, and MetaSVM. The overall approach is faster and more powerful than the existing quantitative method pVAAST, as shown by simulations of challenging situations in finding the missing heritability of a complex disease. This framework can also classify variants of uncertain significance by quantitatively integrating allele frequencies, deleteriousness, association, and co-segregation. PERCH is a versatile tool for gene prioritization in gene discovery research and variant classification in clinical genetic testing. © 2016 The Authors. Human Mutation published by Wiley Periodicals, Inc.
Classification of sodium MRI data of cartilage using machine learning.
Madelin, Guillaume; Poidevin, Frederick; Makrymallis, Antonios; Regatte, Ravinder R
2015-11-01
To assess the possible utility of machine learning for classifying subjects with and without osteoarthritis using sodium magnetic resonance imaging data. Theory: Support vector machine, k-nearest neighbors, naïve Bayes, discriminant analysis, linear regression, logistic regression, neural networks, decision tree, and tree bagging were tested. Sodium magnetic resonance imaging with and without fluid suppression by inversion recovery was acquired on the knee cartilage of 19 controls and 28 osteoarthritis patients. Sodium concentrations were measured in regions of interest in the knee for both acquisitions. The mean (MEAN) and standard deviation (STD) of these concentrations were measured in each region of interest, and the minimum, maximum, and mean of these two measurements were calculated over all regions of interest for each subject. The resulting 12 variables per subject were used as predictors for classification. Either Min [STD] alone, or in combination with Mean [MEAN] or Min [MEAN], all from fluid-suppressed data, were the best predictors, with an accuracy >74%, mainly with linear logistic regression and linear support vector machine. Other good classifiers include discriminant analysis, linear regression, and naïve Bayes. Machine learning is a promising technique for classifying osteoarthritis patients and controls from sodium magnetic resonance imaging data. © 2014 Wiley Periodicals, Inc.
Cevenini, Gabriele; Barbini, Emanuela; Scolletta, Sabino; Biagioli, Bonizella; Giomarelli, Pierpaolo; Barbini, Paolo
2007-11-22
Popular predictive models for estimating morbidity probability after heart surgery are compared critically in a unitary framework. The study is divided into two parts. In the first part modelling techniques and intrinsic strengths and weaknesses of different approaches were discussed from a theoretical point of view. In this second part the performances of the same models are evaluated in an illustrative example. Eight models were developed: Bayes linear and quadratic models, k-nearest neighbour model, logistic regression model, Higgins and direct scoring systems and two feed-forward artificial neural networks with one and two layers. Cardiovascular, respiratory, neurological, renal, infectious and hemorrhagic complications were defined as morbidity. Training and testing sets each of 545 cases were used. The optimal set of predictors was chosen among a collection of 78 preoperative, intraoperative and postoperative variables by a stepwise procedure. Discrimination and calibration were evaluated by the area under the receiver operating characteristic curve and Hosmer-Lemeshow goodness-of-fit test, respectively. Scoring systems and the logistic regression model required the largest set of predictors, while Bayesian and k-nearest neighbour models were much more parsimonious. In testing data, all models showed acceptable discrimination capacities, however the Bayes quadratic model, using only three predictors, provided the best performance. All models showed satisfactory generalization ability: again the Bayes quadratic model exhibited the best generalization, while artificial neural networks and scoring systems gave the worst results. Finally, poor calibration was obtained when using scoring systems, k-nearest neighbour model and artificial neural networks, while Bayes (after recalibration) and logistic regression models gave adequate results. Although all the predictive models showed acceptable discrimination performance in the example considered, the Bayes and logistic regression models seemed better than the others, because they also had good generalization and calibration. The Bayes quadratic model seemed to be a convincing alternative to the much more usual Bayes linear and logistic regression models. It showed its capacity to identify a minimum core of predictors generally recognized as essential to pragmatically evaluate the risk of developing morbidity after heart surgery.
Fang, Shu-Ming; Zhang, Xianming; Bao, Lian-Jun; Zeng, Eddy Y
2016-05-01
Antifouling paint applied to fishing vessels is the primary source of dichloro-diphenyl-trichloroethane (DDT) to the coastal marine environments of China. With the aim of providing science-based support for potential regulations on DDT use in antifouling paint, we utilized a fugacity-based model to evaluate the fate and impact of p,p'-DDT, the dominant component of the DDT mixture, in Daya Bay and Hailing Bay, two typical estuarine bays in South China. The emissions of p,p'-DDT from fishing vessels to the aquatic environments of Hailing Bay and Daya Bay were estimated as 9.3 and 7.7 kg yr⁻¹, respectively. Uncertainty analysis indicated that the temporal variability of p,p'-DDT was well described by the model if fishing vessels were considered as the only direct source, i.e., fishing vessels should be the dominant source of p,p'-DDT in coastal bay areas of China. Estimated hazard quotients indicated that sediment in Hailing Bay posed a high risk to the aquatic system, and that it would take at least 21 years to reduce the hazards to a safe level. Moreover, p,p'-DDT tends to migrate from water to sediment throughout Hailing Bay and Daya Bay. On the other hand, our previous research indicated that p,p'-DDT was more likely to migrate from sediment to water in the mariculture zones located in shallow waters of these two bays, where fishing vessels frequently remain. These findings suggest that relocating mariculture zones to deeper waters would reduce the likelihood of farmed fish contamination by p,p'-DDT. Copyright © 2016 Elsevier Ltd. All rights reserved.
Adkison, M.; Peterman, R.; Lapointe, M.; Gillis, D.; Korman, J.
1996-01-01
We compare alternative models of sockeye salmon (Oncorhynchus nerka) productivity (returns per spawner) using more than 30 years of catch and escapement data for Bristol Bay, Alaska, and the Fraser River, British Columbia. The models examined include several alternative forms of models that incorporate climatic influences as well as models not based on climate. For most stocks, a stationary stock-recruitment relationship explains very little of the interannual variation in productivity. In Bristol Bay, productivity co-varies among stocks and appears to be strongly related to fluctuations in climate. The best model for Bristol Bay sockeye involved a change in the 1970s in the parameters of the Ricker stock-recruitment curve; the stocks generally became more productive. In contrast, none of the models of Fraser River stocks that we examined explained much of the variability in their productivity.
The long-term salinity field in San Francisco Bay
Uncles, R.J.; Peterson, D.H.
1996-01-01
Data are presented on long-term salinity behaviour in San Francisco Bay, California. A two-level, width-averaged model of the tidally averaged salinity and circulation has been written in order to interpret the long-term (days to decades) salinity variability. The model has been used to simulate daily averaged salinity in the upper and lower levels of a 51-segment discretization of the Bay over the 22-yr period 1967-1988. Monthly averaged surface salinity from observations and monthly-averaged simulated salinity are in reasonable agreement. Good agreement is obtained from comparison with daily averaged salinity measured in the upper reaches of North Bay. The salinity variability is driven primarily by freshwater inflow with relatively minor oceanic influence. All stations exhibit a marked seasonal cycle in accordance with the Mediterranean climate, as well as a rich spectrum of variability due to extreme inflow events and extended periods of drought. Monthly averaged salinity intrusion positions have a pronounced seasonal variability and show an approximately linear response to the logarithm of monthly averaged Delta inflow. Although few observed data are available for studies of long-term salinity stratification, modelled stratification is found to be strongly dependent on freshwater inflow; the nature of that dependence varies throughout the Bay. Near the Golden Gate, stratification tends to increase up to very high inflows. In the central reaches of North Bay, modelled stratification maximizes as a function of inflow and further inflow reduces stratification. Near the head of North Bay, lowest summer inflows are associated with the greatest modelled stratification. Observations from the central reaches of North Bay show marked spring-neap variations in stratification and gravitational circulation, both being stronger at neap tides. This spring-neap variation is simulated by the model. A feature of the modelled stratification is a hysteresis in which, for a given spring-neap tidal range and fairly steady inflows, the stratification is higher progressing from neaps to springs than from springs to neaps. The simulated responses of the Bay to perturbations in coastal sea salinity and Delta inflow have been used to further delineate the time-scales of salinity variability. Simulations have been performed about low inflow, steady-state conditions for both salinity and Delta inflow perturbations. For salinity perturbations a small, sinusoidal salinity signal with a period of 1 yr has been applied at the coastal boundary as well as a pulse of salinity with a duration of one day. For Delta inflow perturbations a small, sinusoidally varying inflow signal with a period of 1 yr has been superimposed on an otherwise constant Delta inflow, as well as a pulse of inflow with a duration of one day. Perturbations in coastal salinity dissipate as they move through the Bay. Seasonal perturbations require about 40-45 days to propagate from the coastal ocean to the Delta and to the head of South Bay. The response times of the model to perturbations in freshwater inflow are faster than this in North Bay and comparable in South Bay. In North Bay, time-scales are consistent with advection due to lower level, up-estuary transport of coastal salinity perturbations; for inflow perturbations, faster response times arise from both upper level, down-estuary advection and much faster, down-estuary migration of isohalines in response to inflow volume continuity.
In South Bay, the dominant time-scales are governed by tidal dispersion.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Balaguru, Karthik; Leung, L. Ruby; Lu, Jian
2016-06-27
Analysis of Bay of Bengal tropical cyclone (TC) track data for the month of May during 1980-2013 reveals a meridional dipole in TC intensification: TC intensification rates increased in the northern Bay and decreased in the southern Bay. The dipole was driven by an increase in low-level vorticity and atmospheric humidity in the northern Bay, making the environment more favorable for TC intensification, and enhanced vertical wind shear in the southern Bay, tending to reduce TC development. These environmental changes were associated with a strengthening of the monsoon circulation for the month of May, driven by a La Niña-like shift in tropical Pacific SSTs and associated tropical wave dynamics. Analysis of a suite of climate models from the CMIP5 archive for the 150-year historical period shows that most models correctly reproduce the link between ENSO and Bay of Bengal TC activity through the monsoon at interannual timescales. Under the RCP 8.5 scenario the same CMIP5 models produce an El Niño-like warming trend in the equatorial Pacific, tending to weaken the monsoon circulation. These results suggest ...
Comparing two Bayes methods based on the free energy functions in Bernoulli mixtures.
Yamazaki, Keisuke; Kaji, Daisuke
2013-08-01
Hierarchical learning models are ubiquitously employed in information science and data engineering. Their structure makes the posterior distribution complicated in the Bayes method, so prediction, including construction of the posterior, is not tractable, though the advantages of the method are empirically well known. The variational Bayes method is widely used as an approximation method for applications; it has a tractable posterior based on the variational free energy function. Its asymptotic behavior has been studied in many hierarchical models, and a phase transition is observed. The exact form of the asymptotic variational Bayes energy has been derived in Bernoulli mixture models, and the phase diagram shows that there are three types of parameter learning. However, the approximation accuracy and the interpretation of the transition point have not been clarified yet. The present paper precisely analyzes the Bayes free energy function of Bernoulli mixtures. By comparing the free energy functions of these two Bayes methods, we can determine the approximation accuracy and elucidate the behavior of the parameter learning. Our results show that the Bayes free energy has the same learning types while the transition points are different. Copyright © 2013 Elsevier Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Chao, Yi; Farrara, John D.; Zhang, Hongchun; Zhang, Yinglong J.; Ateljevich, Eli; Chai, Fei; Davis, Curtiss O.; Dugdale, Richard; Wilkerson, Frances
2017-07-01
A three-dimensional numerical modeling system for the San Francisco Bay is presented. The system is based on an unstructured grid numerical model known as the Semi-implicit Cross-scale Hydroscience Integrated System Model (SCHISM). The lateral boundary condition is provided by a regional coastal ocean model. The surface forcing is provided by a regional atmospheric model. The SCHISM results from a decadal hindcast run are compared with available tide gauge data, as well as a collection of temperature and salinity profiles. An examination of the observed climatological annual mean salinities at the United States Geological Survey (USGS) stations shows the highest salinities to be in the open ocean and the lowest well north (upstream) of the Central Bay, a pattern that does not change substantially with season. The corresponding mean SCHISM salinities reproduced the observed variations with location quite well, though with a fresh bias. The lowest values within the Bay occur during spring and the highest values during autumn, mirroring the seasonal variations in river discharge. The corresponding observed mean temperatures within the Bay were 2 to 3 °C cooler in the Central Bay than to either the north or south. This observed pattern of a cooler Central Bay was not particularly well reproduced in the SCHISM results, which also showed a cold bias. Examination of the seasonal means revealed that the cool Central Bay pattern is found only during summer in the SCHISM results. The persistent cold and fresh biases in the model control run were nearly eliminated in a sensitivity run with modifications to the surface heat flux and river discharge. The surface atmospheric forcing and the heat flux at the western boundary are found to be the two major terms in a SCHISM-based heat budget analysis of the mean seasonal temperature cycle for the Central Bay. In the Central Bay salt budget, freshwater discharged by rivers into upstream portions of the Bay to the north, balanced by the influx of salt from the west, are the primary drivers of the mean seasonal salinity cycle. Concerning the interannual variability in temperatures, the warm anomalies during the period 2014-16 were the strongest and most persistent departures from normal during the period analyzed and were realistically reproduced by SCHISM. The most prominent salinity anomalies in both the observations and SCHISM results were the salty anomalies that persisted for most of the four-year California drought of 2012-2015.
Mikulich-Gilbertson, Susan K; Wagner, Brandie D; Grunwald, Gary K; Riggs, Paula D; Zerbe, Gary O
2018-01-01
Medical research is often designed to investigate changes in a collection of response variables that are measured repeatedly on the same subjects. The multivariate generalized linear mixed model (MGLMM) can be used to evaluate random coefficient associations (e.g. simple correlations, partial regression coefficients) among outcomes that may be non-normal and differently distributed by specifying a multivariate normal distribution for their random effects and then evaluating the latent relationship between them. Empirical Bayes predictors are readily available for each subject from any mixed model and are observable and hence, plotable. Here, we evaluate whether second-stage association analyses of empirical Bayes predictors from a MGLMM, provide a good approximation and visual representation of these latent association analyses using medical examples and simulations. Additionally, we compare these results with association analyses of empirical Bayes predictors generated from separate mixed models for each outcome, a procedure that could circumvent computational problems that arise when the dimension of the joint covariance matrix of random effects is large and prohibits estimation of latent associations. As has been shown in other analytic contexts, the p-values for all second-stage coefficients that were determined by naively assuming normality of empirical Bayes predictors provide a good approximation to p-values determined via permutation analysis. Analyzing outcomes that are interrelated with separate models in the first stage and then associating the resulting empirical Bayes predictors in a second stage results in different mean and covariance parameter estimates from the maximum likelihood estimates generated by a MGLMM. The potential for erroneous inference from using results from these separate models increases as the magnitude of the association among the outcomes increases. Thus if computable, scatterplots of the conditionally independent empirical Bayes predictors from a MGLMM are always preferable to scatterplots of empirical Bayes predictors generated by separate models, unless the true association between outcomes is zero.
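A hedged sketch of the separate-models shortcut the passage compares against the joint MGLMM: fit two univariate mixed models, extract empirical Bayes (BLUP) random intercepts per subject, and correlate them in a second stage. The data are simulated with a shared subject effect; the statsmodels-based workflow is an illustrative stand-in, not the authors' implementation.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
subj = np.repeat(np.arange(30), 5)
u = rng.normal(0, 1, 30)                    # latent shared subject effect
df = pd.DataFrame({"subj": subj,
                   "y1": u[subj] + rng.normal(0, 1, 150),
                   "y2": 0.8 * u[subj] + rng.normal(0, 1, 150)})

eb = {}
for outcome in ("y1", "y2"):
    fit = smf.mixedlm(f"{outcome} ~ 1", df, groups=df["subj"]).fit()
    eb[outcome] = np.array([fit.random_effects[s].iloc[0] for s in range(30)])

r = np.corrcoef(eb["y1"], eb["y2"])[0, 1]
print(f"second-stage correlation of EB predictors: {r:.2f}")
```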
NASA Astrophysics Data System (ADS)
Uzbaş, Betül; Arslan, Ahmet
2018-04-01
Gender classification is an important step for human-computer interaction and identification processes, and the human face image is one of the most important sources for determining gender. In the present study, gender classification is performed automatically from facial images. In order to classify gender, we propose a combination of features extracted from face, eye and lip regions using a hybrid of Local Binary Pattern and Gray-Level Co-Occurrence Matrix methods. The features are extracted from automatically detected face, eye and lip regions. All of the extracted features are combined and given as input parameters to classification methods (Support Vector Machine, Artificial Neural Networks, Naive Bayes and k-Nearest Neighbor) for gender classification. The Nottingham Scan face database, which consists of frontal face images of 100 people (50 male and 50 female), is used for this purpose. As a result of the experimental studies, the highest success rate, 98%, was achieved using the Support Vector Machine. The experimental results illustrate the efficacy of our proposed method.
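A hedged sketch of the LBP + GLCM feature fusion described above, applied to one placeholder grayscale region; a real pipeline would repeat this for the face, eye and lip crops and concatenate everything before the classifier. Function names assume a recent scikit-image (graycomatrix/graycoprops).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops, local_binary_pattern

region = (np.random.default_rng(0).random((64, 64)) * 255).astype(np.uint8)

lbp = local_binary_pattern(region, P=8, R=1, method="uniform")
lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

glcm = graycomatrix(region, distances=[1], angles=[0],
                    levels=256, symmetric=True, normed=True)
glcm_feats = np.array([graycoprops(glcm, p)[0, 0]
                       for p in ("contrast", "homogeneity", "energy", "correlation")])

features = np.concatenate([lbp_hist, glcm_feats])   # fused feature vector
print("fused feature vector length:", features.size)
```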
Comparisons and Selections of Features and Classifiers for Short Text Classification
NASA Astrophysics Data System (ADS)
Wang, Ye; Zhou, Zhi; Jin, Shan; Liu, Debin; Lu, Mi
2017-10-01
Short text is considerably different from traditional long text documents due to its shortness and conciseness, which hinders the application of conventional machine learning and data mining algorithms to short text classification. Following traditional artificial intelligence methods, we divide short text classification into three steps, namely preprocessing, feature selection and classifier comparison. In this paper, we illustrate step-by-step how we approach our goals. Specifically, in feature selection, we compared the performance and robustness of four methods: one-hot encoding, tf-idf weighting, word2vec and paragraph2vec; in the classification part, we deliberately chose and compared Naive Bayes, Logistic Regression, Support Vector Machine, K-nearest Neighbor and Decision Tree as our classifiers. Then, we compared and analysed the classifiers horizontally with each other and vertically with the feature selections. Regarding the datasets, we crawled more than 400,000 short text files from the Shanghai and Shenzhen Stock Exchanges and manually labeled them into two classes, the big and the small; there are eight labels in the big class, and 59 labels in the small class.
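An illustrative sketch of the horizontal classifier comparison the abstract describes, using tf-idf features. The toy corpus and labels stand in for the stock exchange disclosure files; training-set accuracy is printed only to show the comparison loop.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier

texts = ["quarterly profit warning", "dividend announcement",
         "board member resigns", "share buyback plan",
         "profit rises on strong sales", "new board appointment"]
labels = [0, 1, 0, 1, 0, 1]   # toy binary labels

X = TfidfVectorizer().fit_transform(texts)
for clf in (MultinomialNB(), LogisticRegression(), LinearSVC(),
            KNeighborsClassifier(n_neighbors=3), DecisionTreeClassifier()):
    print(type(clf).__name__, clf.fit(X, labels).score(X, labels))
```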
Texture classification of lung computed tomography images
NASA Astrophysics Data System (ADS)
Pheng, Hang See; Shamsuddin, Siti M.
2013-03-01
Development of algorithms for computer-aided diagnosis (CAD) schemes is growing rapidly to assist the radiologist in medical image interpretation. Texture analysis of computed tomography (CT) scans is an important preliminary stage in computerized detection and classification systems for lung cancer. Among the different types of image feature analysis, Haralick texture features with a variety of statistical measures have been widely used in image texture description. The extraction of texture feature values is essential for a CAD system, especially in the classification of normal and abnormal tissue on cross-sectional CT images. This paper compares experimental results using texture extraction and different machine learning methods for classifying normal and abnormal tissues in lung CT images. The machine learning methods involved in this assessment are the Artificial Immune Recognition System (AIRS), Naive Bayes, Decision Tree (J48) and Backpropagation Neural Network. AIRS is found to provide high accuracy (99.2%) and sensitivity (98.0%) in the assessment. For experiment and testing purposes, publicly available datasets in the Reference Image Database to Evaluate Therapy Response (RIDER) are used as study cases.
The dynamics of İzmir Bay under the effects of wind and thermohaline forces
NASA Astrophysics Data System (ADS)
Sayın, Erdem; Eronat, Canan
2018-04-01
The dominant circulation pattern of İzmir Bay on the Aegean Sea coast of Turkey is studied taking into consideration the influence of wind and thermohaline forces. İzmir Bay is discussed by subdividing the bay into outer, middle and inner areas. Wind is the most important driving force in the İzmir coastal area. There are also thermohaline forces due to the existence of water types of different physical properties in the bay. In contrast to the two-layer stratification during summer, a homogeneous water column exists in winter. The free surface version of the Princeton model (Killworth's 3-D general circulation model) is applied, with the input data obtained through the measurements made by the research vessel K. Piri Reis. As a result of the simulations with artificial wind, the strong consistent wind generates circulation patterns independent of the seasonal stratification in the bay. Wind-driven circulation causes cyclonic or anticyclonic movements in the middle bay where the distinct İzmir Bay Water (IBW) forms. Cyclonic movement takes place under the influence of southerly and westerly winds. On the other hand, northerly and easterly winds cause an anticyclonic movement in the middle bay. The outer and inner bay also have the wind-driven recirculation patterns expected.
Ranjbar, Mohammad Hassan; Hadjizadeh Zaker, Nasser
2016-11-01
Gorgan Bay is a semi-enclosed basin located in the southeast of the Caspian Sea in Iran and is an important marine habitat for fish and seabirds. In the present study, the environmental capacity of phosphorus in Gorgan Bay was estimated using a 3D ecological-hydrodynamic numerical model and a linear programming model. The distribution of phosphorus, simulated by the numerical model, was used as an index for the occurrence of eutrophication and to determine the water quality response field of each of the pollution sources. The linear programming model was used to calculate and allocate the total maximum allowable loads of phosphorus to each of the pollution sources in a way that eutrophication be prevented and at the same time maximum environmental capacity be achieved. In addition, the effect of an artificial inlet on the environmental capacity of the bay was investigated. Observations of surface currents in Gorgan Bay were made by GPS-tracked surface drifters to provide data for calibration and verification of numerical modeling. Drifters were deployed at five different points across the bay over a period of 5 days. The results indicated that the annual environmental capacity of phosphorus is approximately 141 t if a concentration of 0.0477 mg/l for phosphorus is set as the water quality criterion. Creating an artificial inlet with a width of 1 km in the western part of the bay would result in a threefold increase in the environmental capacity of the study area.
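A schematic sketch of the allocation step described above: maximize the total allowable phosphorus load subject to the water-quality criterion at a response point. The response coefficients (concentration increase at the checkpoint per unit load from each source, as a hydrodynamic model would supply) are hypothetical.

```python
from scipy.optimize import linprog

# Hypothetical response coefficients: mg/l at the checkpoint per t/yr
# of phosphorus load from each of three sources.
response = [0.0004, 0.0003, 0.0002]
criterion = 0.0477            # water-quality criterion for phosphorus (mg/l)

# linprog minimizes, so negate the objective to maximize total load.
res = linprog(c=[-1.0, -1.0, -1.0],
              A_ub=[response], b_ub=[criterion],
              bounds=[(0, None)] * 3)
print("total allowable load (t/yr):", round(-res.fun, 1), "allocation:", res.x)
```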
Wind-Driven Waves in Tampa Bay, Florida
NASA Astrophysics Data System (ADS)
Gilbert, S. A.; Meyers, S. D.; Luther, M. E.
2002-12-01
Turbidity and nutrient flux due to sediment resuspension by waves and currents are important factors controlling water quality in Tampa Bay. During December 2001 and January 2002, four Sea Bird Electronics SeaGauge wave and tide recorders were deployed in Tampa Bay in each major bay segment. Since May 2002, a SeaGauge has been continuously deployed at a site in middle Tampa Bay as a component of the Bay Regional Atmospheric Chemistry Experiment (BRACE). Initial results for the summer 2002 data indicate that significant wave height is linearly dependent on wind speed and direction over a range of 1 to 12 m/s. The data were divided into four groups according to wind direction. Wave height dependence on wind speed was examined for each group. Both northeasterly and southwesterly winds force significant wave heights that are about 30% larger than those for northwesterly and southeasterly winds. This difference is explained by variations in fetch due to basin shape. Comparisons are made between these observations and the results of a SWAN-based model of Tampa Bay. The SWAN wave model is coupled to a three-dimensional circulation model and computes wave spectra at each model grid cell under observed wind conditions and modeled water velocity. When SWAN is run without dissipation, the model results are generally similar in wave period but about 25%-50% higher in significant wave height than the observations. The impact of various dissipation mechanisms such as bottom drag and whitecapping on the wave state is being investigated. Preliminary analyses on winter data give similar results.
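A small sketch of the reported linear dependence: regress significant wave height on wind speed within one wind-direction group. The series are synthetic placeholders for the SeaGauge records, and the fit would be repeated per direction group.

```python
import numpy as np

rng = np.random.default_rng(2)
wind = rng.uniform(1, 12, 200)                     # wind speed (m/s)
hsig = 0.05 * wind + rng.normal(0, 0.02, 200)      # significant wave height (m)

slope, intercept = np.polyfit(wind, hsig, 1)
print(f"Hs ~ {slope:.3f} * U + {intercept:.3f}")
```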
Tampa Bay Water Clarity Model (TBWCM): As a Predictive Tool
The Tampa Bay Water Clarity Model was developed as a predictive tool for estimating the impact of changing nutrient loads on water clarity as measured by Secchi depth. The model combines a physical mixing model with an irradiance model and a nutrient cycling model. A 10-segment bi...
Effects of waves on water dispersion in a semi-enclosed estuarine bay
NASA Astrophysics Data System (ADS)
Delpey, M. T.; Ardhuin, F.; Otheguy, P.
2012-04-01
The bay of Saint Jean de Luz - Ciboure is a touristic destination located in the southwest of France on the Basque coast. This small bay is 1.5 km wide and 1 km long. It is semi-enclosed by breakwaters, so the area is mostly protected from waves except in its eastern part, where wave breaking is regularly observed over a shallow rock shelf. In the rest of the area the currents are generally weak. The bay receives freshwater inflows from two rivers. During intense rain events, the rivers can introduce pollutants into the bay. The input of pollutants, combined with the weak dynamics of the area, can affect water quality for several days. To study such phenomena, mechanisms of water dispersion in the bay are investigated. The present paper focuses on the effects of waves on bay dynamics. Several field experiments were conducted in the area, combining wave and current measurements from a set of ADCPs and ADVs, Lagrangian drifter experiments in the surf zone, and salinity and temperature profile measurements. An analysis of this dataset reveals that the bay combines remarkable density stratification due to freshwater inflows with occasionally intense wave-induced currents in the surf zone. These currents have a strong influence on river plume dynamics when the sea state is energetic. Moreover, modifications of hydrodynamics in the bay passes are found to be strongly correlated with sea-state evolution. This result suggests a significant impact of waves on bay flushing. To further analyse these phenomena, a three-dimensional numerical model of bay hydrodynamics is developed. The model aims at reproducing freshwater inflows combined with wind-, tide- and wave-induced currents and mixing. The model of the bay is implemented using the code MOHID [1], which has been modified to allow the three-dimensional representation of wave-current interactions proposed by Ardhuin et al. [2008b] [2]. The circulation is forced by the wave field modelled with the code WAVEWATCH III [3]. A first comparison between model results and in situ observations shows reasonable agreement. References: [1] Braunschweig, F., Chambel, P., Fernandes, L., Pina, P., Neves, R. The object-oriented design of the integrated modelling system MOHID. Computational Methods in Water Resources International Conference, Chapel Hill, North Carolina, USA. [2] Ardhuin, F., Rascle, N., Belibassakis, K. A., 2008b. Explicit wave-averaged primitive equations using a generalized Lagrangian mean. Ocean Modelling 20, 35-60. [3] Tolman, H. L., 2009. User manual and system documentation of WAVEWATCH III version 3.14. Tech. Rep. 276, NOAA/NWS/NCEP/MMAB.
Water resources planning for rivers draining into Mobile Bay
NASA Technical Reports Server (NTRS)
April, G. C.
1976-01-01
The application of remote sensing, automatic data processing, modeling, and other aerospace-related technologies to hydrological engineering and water resource management is discussed for the entire river drainage system that feeds the Mobile Bay estuary. The adaptation and implementation of existing mathematical modeling methods are investigated for the purpose of describing the behavior of Mobile Bay. Of particular importance are the effects that system variables such as river flow rate, wind direction and speed, and tidal state have on water movement and quality within the bay system.
Seismic Velocity Structure across the Hayward Fault Zone Near San Leandro, California
NASA Astrophysics Data System (ADS)
Strayer, L. M.; Catchings, R.; Chan, J. H.; Richardson, I. S.; McEvilly, A.; Goldman, M.; Criley, C.; Sickler, R. R.
2017-12-01
In Fall 2016 we conducted the East Bay Seismic Investigation, a NEHRP-funded collaboration between California State University, East Bay and the United States Geological Survey. The study produced a large volume of seismic data, allowing us to examine the subsurface across the East Bay plain and hills using a variety of geophysical methods. We know of no other survey that has imaged this area at this scale and with this degree of resolution. Initial models show that seismic velocities of the Hayward Fault Zone (HFZ), the East Bay plain, and the East Bay hills are illuminated to depths of 5-6 km. We used explosive sources at 1-km intervals along a 15-km-long, NE-striking (~055°) seismic line centered on the HFZ. Vertical- and horizontal-component sensors were spaced at 100 m intervals along the entire profile, with vertical-component sensors at 20 m intervals across mapped or suspected faults. Preliminary seismic refraction tomography across the HFZ, sensu lato (including sub-parallel, connected, and related faults), shows that the San Leandro Block (SLB) is a low-velocity feature in the upper 1-3 km, with nearly the same Vp as the adjacent Great Valley sediments to the east, and low Vs values. In our initial analysis we can trace the SLB and its bounding faults (Hayward, Chabot) nearly vertically to at least 2-4 km depth. Similarly, preliminary migrated reflection images suggest that many if not all of the peripheral reverse, strike-slip and oblique-slip faults of the wider HFZ dip toward the SLB, into a curtain of relocated epicenters that defines the HFZ at depth, indicative of a 'flower structure'. Preliminary Vs tomography identifies another apparently weak zone at depth, located about 1.5 km east of the San Leandro shoreline, that may represent the northward continuation of the Silver Creek Fault. Centered 4 km from the Bay, there is a distinctive, 2-km-wide, uplifted, horst-like, high-velocity structure (both Vp and Vs) that bounds the SLB to the west, outboard of the HF. We also acquired 2-D shear-wave velocity results using the multichannel analysis of surface waves (MASW) method on Rayleigh waves generated along the seismic profile. The MASW results show a 600 m depth of investigation, and Vs100 values range from 228 m/s to 335 m/s at fault zones, corresponding to NEHRP site classification D.
Long-term isolation and local adaptation in Palau's Nikko Bay help corals thrive in acidic waters
NASA Astrophysics Data System (ADS)
Golbuu, Yimnang; Gouezo, Marine; Kurihara, Haruko; Rehm, Lincoln; Wolanski, Eric
2016-09-01
The reefs in Palau's Nikko Bay live in seawater with low pH, similar to the conditions predicted for 2100 because of ocean acidification. Nevertheless, the reefs at Nikko Bay have high coral cover and high diversity. We hypothesize that the low-pH environment in Nikko Bay is caused by low flushing rates, which lead to long-term isolation and local adaptation. To test this hypothesis, we modeled the water circulation in and around Nikko Bay. Model results show that the average residence time is 71 d, ten times the residence time on fore-reef habitats. The long residence time restricts the exchange of coral larvae between the bay and adjacent reefs, allowing persistent selection for tolerant traits and local adaptation. The corals in Nikko Bay are also more susceptible to local pollution because the waters are poorly flushed. Therefore, local management must focus on minimizing human impacts such as dredging, overfishing and pollution in the bay, which would compromise the condition of corals that have already adapted to low-pH conditions.
Lima, Ana Carolina E S; de Castro, Leandro Nunes
2014-10-01
Social media allow web users to create and share content pertaining to different subjects, exposing their activities, opinions, feelings and thoughts. In this context, online social media have attracted the interest of data scientists seeking to understand behaviours and trends, whilst collecting statistics for social sites. One potential application for these data is personality prediction, which aims to understand a user's behaviour within social media. Traditional personality prediction relies on users' profiles, their status updates, the messages they post, etc. Here, a personality prediction system for social media data is introduced that differs from most approaches in the literature in that it works with groups of texts instead of single texts and does not take users' profiles into account. Also, the proposed approach extracts meta-attributes from texts and does not work directly with the content of the messages. The set of possible personality traits is taken from the Big Five model, which allows the problem to be characterised as a multi-label classification task. The problem is then transformed into a set of five binary classification problems and solved by means of a semi-supervised learning approach, owing to the difficulty of annotating the massive amounts of data generated in social media. In our implementation, the proposed system was trained with three well-known machine-learning algorithms: a Naïve Bayes classifier, a Support Vector Machine, and a Multilayer Perceptron neural network. The system was applied to predict the personality of Tweets taken from three datasets available in the literature, and achieved approximately 83% prediction accuracy, with some personality traits presenting better individual classification rates than others.
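The "five binary problems" transformation can be sketched as binary relevance: one independent classifier per Big Five trait. The sketch below uses Gaussian Naive Bayes on synthetic stand-ins for the meta-attributes and omits the semi-supervised step the authors describe; all names and dimensions are assumptions.

```python
# Binary relevance for multi-label Big Five prediction (illustrative only).
import numpy as np
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 10))          # meta-attributes per group of texts (synthetic)
Y = rng.integers(0, 2, size=(200, 5))   # one binary label per trait (synthetic)
traits = ["openness", "conscientiousness", "extraversion",
          "agreeableness", "neuroticism"]

# One independent Naive Bayes classifier per personality trait.
models = {t: GaussianNB().fit(X, Y[:, j]) for j, t in enumerate(traits)}

x_new = rng.normal(size=(1, 10))
print({t: int(m.predict(x_new)[0]) for t, m in models.items()})
```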
A trophic model of fringing coral reefs in Nanwan Bay, southern Taiwan suggests overfishing.
Liu, Pi-Jen; Shao, Kwang-Tsao; Jan, Rong-Quen; Fan, Tung-Yung; Wong, Saou-Lien; Hwang, Jiang-Shiou; Chen, Jen-Ping; Chen, Chung-Chi; Lin, Hsing-Juh
2009-09-01
Several coral reefs of Nanwan Bay, Taiwan have recently undergone shifts to macroalgal or sea anemone dominance. Thus, a mass-balance trophic model was constructed to analyze the structure and functioning of the food web. The fringing reef model comprised 18 compartments, with the highest trophic level of 3.45 for piscivorous fish. Comparative analyses with other reef models demonstrated that Nanwan Bay was similar to reefs with high fishery catches. While coral biomass was not lower, fish biomass was lower than that of reefs with high catches. Consequently, the sums of consumption and respiratory flows and the total system throughput were also lower. The Nanwan Bay model thus suggests an overfished status in which the mean trophic level of the catch, matter cycling, and trophic transfer efficiency are extremely reduced.
NASA Astrophysics Data System (ADS)
Tien Bui, Dieu; Hoang, Nhat-Duc
2017-09-01
In this study, a probabilistic model named BayGmmKda is proposed for flood susceptibility assessment in a study area in central Vietnam. The new model is a Bayesian framework constructed from a combination of a Gaussian mixture model (GMM), radial-basis-function Fisher discriminant analysis (RBFDA), and a geographic information system (GIS) database. In the Bayesian framework, the GMM is used to model the data distribution of flood-influencing factors in the GIS database, whereas RBFDA is utilized to construct a latent variable that aims at enhancing model performance. The posterior probabilistic output of the BayGmmKda model is then used as a flood susceptibility index. Experimental results showed that the proposed hybrid framework is superior to benchmark models, including the adaptive neuro-fuzzy inference system and the support vector machine. To facilitate model implementation, a software program for BayGmmKda has been developed in MATLAB. The BayGmmKda program can accurately establish a flood susceptibility map for the study region, and local authorities can overlay this susceptibility map onto various land-use maps for land-use planning or management.
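The core of the framework, class-conditional Gaussian mixtures combined through Bayes' rule into a posterior susceptibility index, can be sketched as follows. This is a minimal sketch, not the authors' BayGmmKda implementation: the RBFDA latent variable and the GIS factors are omitted, and all data are synthetic.

```python
# Class-conditional GMMs + Bayes' rule as a susceptibility index (illustrative).
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
X_flood = rng.normal(1.0, 1.0, size=(300, 4))   # hypothetical factor vectors, flood cells
X_dry = rng.normal(-1.0, 1.0, size=(300, 4))    # non-flood cells

gmm_flood = GaussianMixture(n_components=3, random_state=0).fit(X_flood)
gmm_dry = GaussianMixture(n_components=3, random_state=0).fit(X_dry)

def susceptibility(x, prior_flood=0.5):
    """Posterior P(flood | x) via Bayes' rule, used as the susceptibility index."""
    lf = np.exp(gmm_flood.score_samples(x)) * prior_flood
    ld = np.exp(gmm_dry.score_samples(x)) * (1 - prior_flood)
    return lf / (lf + ld)

print(susceptibility(rng.normal(size=(2, 4))))
```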
Bayes factors for the linear ballistic accumulator model of decision-making.
Evans, Nathan J; Brown, Scott D
2018-04-01
Evidence accumulation models of decision-making have led to advances in several different areas of psychology. These models provide a way to integrate response time and accuracy data and to describe performance in terms of latent cognitive processes. Testing important psychological hypotheses using cognitive models requires a method for making inferences about different versions of the models that assume different parameters to be responsible for observed effects. Model-based inference from noisy data is difficult, and has proven especially problematic with current model selection methods based on parameter estimation. We provide a method for computing Bayes factors through Monte-Carlo integration for the linear ballistic accumulator (LBA; Brown and Heathcote, 2008), a widely used evidence accumulation model. Bayes factors are used frequently for inference with simpler statistical models, and they do not require parameter estimation. To overcome the computational burden of estimating Bayes factors via brute-force integration, we exploit general-purpose graphics processing units, and we provide free code for this. This approach allows estimation of Bayes factors via Monte-Carlo integration within a practical time frame. We demonstrate the method using both simulated and real data, and investigate the stability of the Monte-Carlo approximation and the LBA's inferential properties in simulation studies.
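The brute-force estimator is easy to state: the marginal likelihood p(D|M) = E_prior[p(D|θ)] is approximated by averaging the likelihood over draws from the prior, and the Bayes factor is the ratio of two such estimates. The sketch below substitutes a simple Gaussian likelihood for the LBA likelihood and uses a log-mean-exp for stability; all numbers and priors are illustrative.

```python
# Monte-Carlo marginal likelihoods and a Bayes factor (toy Gaussian model).
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(0.3, 1.0, size=50)   # stand-in observations

def log_marginal(prior_mu, prior_sd, n_draws=50_000):
    """Estimate log p(D|M) = log E_prior[p(D|mu)] by averaging over prior draws."""
    mus = rng.normal(prior_mu, prior_sd, size=n_draws)
    logl = (-0.5 * ((data[None, :] - mus[:, None]) ** 2).sum(axis=1)
            - 0.5 * data.size * np.log(2 * np.pi))
    m = logl.max()                      # log-mean-exp for numerical stability
    return m + np.log(np.exp(logl - m).mean())

# Bayes factor comparing a narrow ("null-like") prior with a wide prior on the mean.
log_bf = log_marginal(0.0, 0.1) - log_marginal(0.0, 1.0)
print("log Bayes factor:", log_bf)
```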
NASA Astrophysics Data System (ADS)
Galperin, Boris; Mellor, George L.
1990-09-01
The three-dimensional model of Delaware Bay, River and adjacent continental shelf was described in Part 1. Here, Part 2 of this two-part paper demonstrates that the model is capable of realistic simulation of current and salinity distributions, tidal cycle variability, events of strong mixing caused by high winds and rapid salinity changes due to high river runoff. The 25-h average subtidal circulation strongly depends on the wind forcing. Monthly residual currents and salinity distributions demonstrate a classical two-layer estuarine circulation wherein relatively low salinity water flows out at the surface and compensating high salinity water from the shelf flows at the bottom. The salinity intrusion is most vigorous along deep channels in the Bay. Winds can generate salinity fronts inside and outside the Bay and enhance or weaken the two-layer circulation pattern. Since the portion of the continental shelf included in the model is limited, the model shelf circulation is locally wind-driven and excludes such effects as coastally trapped waves and interaction with Gulf Stream rings; nevertheless, a significant portion of the coastal elevation variability is hindcast by the model. Also, inclusion of the shelf improves simulation of salinity inside the Bay compared with simulations where the salinity boundary condition is specified at the mouth of the Bay.
Increasing the Knowledge of Stratification in Shallow Coastal Environments
NASA Astrophysics Data System (ADS)
Ojo, T.; Bonner, J.; Hodges, B.; Maidment, D.; Montagna, P.; Minsker, B.
2006-12-01
A testbed has been established using Corpus Christi Bay as an environmental field facility to study the phenomenon of hypoxia that has been observed to develop at certain periods during the year. Stratification affects vertical turbulent mixing of heat, momentum and mass (or constituents) within the water column, in turn influencing the transport of material. The mixing threshold depends on the value of the Richardson number, Ri, with complete vertical mixing occurring at low values (< 0.25) and inhibition of mixing due to stratification at high values (> 0.25) of Ri. Corpus Christi Bay, with an average depth of ~3 m, is the largest of a system of five bays and has been known to stratify due to inflows of hypersaline water (up to 50 psu) from adjoining bays, the Laguna Madre and Oso Bay. Laguna Madre is separated from the Gulf of Mexico by a barrier island and becomes hypersaline because of the imbalance between freshwater inflow and bay evaporation. Hypersalinity also occurs in Oso Bay due to anthropogenic forcing from a power plant that draws 400 MGD of cooling water from the upper Laguna Madre and discharges waste water into Oso Bay. Several wastewater treatment plants also discharge directly into Oso Bay or its tributary streams. The objective of this study is to develop a methodology for prescribing the set of parameters required for modeling and characterizing hypoxia in this shallow wind-driven bay. The extent to which Ri depends on external forcing at the surface boundary was measured using our fully instrumented sensor platforms. Each platform includes sensors for synchronous near-surface meteorological (wind velocity, barometric pressure, air temperature) and water-column oceanographic (current, water temperature, conductivity, particle size distribution, particulate concentration, dissolved oxygen, nutrient) variables, measured using fixed and mobile vertical-profiling sensor platforms. A 2D hydrodynamic model was initially developed for the bay; results indicate that water mass is conserved through a strong vortex spawning from the ~20 m deep ship channel that runs east-west along the northernmost portion of the bay. HF radar observations, however, do not indicate this vortical structure, suggesting that water conservation is maintained through vertical eddies, captured by 3D current measurements using acoustic Doppler profilers. This is an example of advanced sensors indicating the need for more advanced modeling, leading us toward the development of a 3D hydrodynamic model for the bay. The geomorphology of the bay (shallow with respect to the deep ship channel) poses a challenge in this model development. Knowledge of stratification in this system of bays has been increased through this study. Measurements taken using the instrument suite deployed by our research facility were coupled with observed and predicted hydrodynamic and meteorological data, providing new insight into stratification in Corpus Christi Bay. The bay was observed to cycle through quiescent and well-mixed periods under strong wind influence, with the onset of hypoxia during the summer months (June through August). Quiescent periods, when combined with tidal cycling and inland horizontal gradient propagation from the adjoining water bodies described above, lead to conditions favorable to stratification.
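For reference, the gradient Richardson number used as the mixing criterion is Ri = N² / (∂u/∂z)², with N² = -(g/ρ₀) ∂ρ/∂z. A small worked computation on hypothetical profile data is sketched below; the density and velocity values are placeholders, not the Corpus Christi Bay measurements.

```python
# Gradient Richardson number from density and velocity profiles (illustrative).
import numpy as np

g, rho0 = 9.81, 1025.0
z = np.array([0.0, -1.0, -2.0, -3.0])               # depth, m (surface to bed)
rho = np.array([1015.0, 1020.0, 1024.0, 1025.0])    # hypothetical density, kg/m^3
u = np.array([0.30, 0.22, 0.12, 0.05])              # along-bay current, m/s

n2 = -(g / rho0) * np.gradient(rho, z)              # buoyancy frequency squared
shear2 = np.gradient(u, z) ** 2
ri = n2 / shear2
print("Ri profile:", ri)                            # Ri > 0.25: mixing inhibited
```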
Soils and Vegetation of the Khaipudyr Bay Coast of the Barents Sea
NASA Astrophysics Data System (ADS)
Shamrikova, E. V.; Deneva, S. V.; Panyukov, A. N.; Kubik, O. S.
2018-04-01
Soils and vegetation of the coastal zone of the Khaipudyr Bay of the Barents Sea have been examined and compared with analogous objects in the Karelian coastal zone of the White Sea. The environmental conditions of these two areas are somewhat different: the climate of the Khaipudyr Bay coast is more severe, and the seawater salinity is higher (32-33‰ in the Khaipudyr Bay and 25-26‰ in the White Sea). The soil cover patterns of both regions are highly variable. Salt-affected marsh soils (Tidalic Fluvisols) are widespread. The complicated mesotopography includes high geomorphic positions that are not affected by tidal water. Under these conditions, zonal factors of pedogenesis predominate and lead to the development of Cryic Folic Histosols and Histic Reductaquic Cryosols. On low marshes, the concentrations of soluble Ca2+, K+ + Na+, Cl-, and SO42- ions in the soils of the Khaipudyr Bay coast are two to four times higher than those in the analogous soils of the Karelian coast. Cluster analysis of a number of soil characteristics allows separation of three soil groups: soils of low marshes, soils of middle-high marshes, and soils of higher positions developing under the impact of zonal factors together with the aerial transfer and deposition of seawater drops. The corresponding plant communities are represented by coastal sedge cenoses, forb-grassy halophytic cenoses, and zonal cenoses of hypoarctic tundra. It is argued that the grouping of marsh soils in the new substantive-genetic classification system of Russian soils requires further elaboration.
A comparison between skeleton and bounding box models for falling direction recognition
NASA Astrophysics Data System (ADS)
Narupiyakul, Lalita; Srisrisawang, Nitikorn
2017-12-01
Falling is an injury that can lead to a serious medical condition in people of every age, but for the elderly the risk of serious injury is much higher. Because prompt treatment of a fallen person is one way of preventing serious injury, several works have implemented algorithms to recognize falls. Our work compares the performance of two models based on feature extraction: (i) body joint data (Skeleton Data), the joints' positions in 3 axes, and (ii) a bounding box (Box-size Data) covering all body joints. The machine learning algorithms chosen were Decision Tree (DT), Naïve Bayes (NB), K-nearest neighbors (KNN), Linear discriminant analysis (LDA), Voting Classification (VC), and Gradient boosting (GB). The results illustrate that the models trained with Skeleton data perform far better than those trained with Box-size data (with average accuracies of 94-81% and 80-75%, respectively). KNN shows the best performance with both the body joint model and the bounding box model. In conclusion, KNN with the body joint model performs best among the others.
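A sketch of the comparison protocol follows: the same classifiers are trained on two alternative feature representations and scored by cross-validated accuracy. The random arrays standing in for the skeleton (25 joints × 3 axes) and bounding-box features, the four-class direction label, and the restriction to three of the six listed algorithms are all assumptions for illustration.

```python
# Same classifiers, two feature sets, cross-validated accuracy (synthetic data).
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(5)
y = rng.integers(0, 4, 300)                                   # assumed 4 fall directions
X_skeleton = rng.normal(size=(300, 75)) + y[:, None] * 0.5    # 25 joints x 3 axes
X_box = rng.normal(size=(300, 3)) + y[:, None] * 0.2          # bounding-box size

models = {"DT": DecisionTreeClassifier(), "NB": GaussianNB(),
          "KNN": KNeighborsClassifier()}
for name, clf in models.items():
    for feat_name, X in [("skeleton", X_skeleton), ("box", X_box)]:
        acc = cross_val_score(clf, X, y, cv=5).mean()
        print(f"{name} on {feat_name}: {acc:.2f}")
```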
Tidal-flow, circulation, and flushing characteristics of Kings Bay, Citrus County, Florida
Hammett, K.M.; Goodwin, C.R.; Sanders, G.L.
1996-01-01
Kings Bay is an estuary on the gulf coast of peninsular Florida with a surface area of less than one square mile. It is a unique estuarine system with no significant inflowing rivers or streams: as much as 99 percent of the freshwater entering the bay originates from multiple spring vents at the bottom of the estuary. The circulation and flushing characteristics of Kings Bay were evaluated by applying SIMSYS2D, a two-dimensional numerical model, with field data used to calibrate and verify the model. Lagrangian particle simulations were used to determine the circulation characteristics for three hydrologic conditions: low inflow, typical inflow, and low inflow with reduced friction from aquatic vegetation. Spring discharge transported the particles from Kings Bay through Crystal River and out of the model domain. Tidal effects added an oscillatory component to the particle paths. The mean particle residence time was 59 hours for low inflow with reduced friction; particle residence time is therefore affected more by spring discharge than by bottom friction. Circulation patterns were virtually identical for the three simulated hydrologic conditions. Simulated particles introduced in the southern part of Kings Bay traveled along the eastern side of Buzzard Island before entering Crystal River and exiting the model domain. The flushing characteristics of Kings Bay for the three hydrologic conditions were determined by simulating the injection of conservative dye constituents. The average concentration of dye initially injected in Kings Bay decreased asymptotically because of spring discharge, and the tide caused some oscillation in the average dye concentration. Ninety-five percent of the injected dye exited Kings Bay and Crystal River within 94 hours for low inflow, 71 hours for typical inflow, and 94 hours for low inflow with reduced bottom friction. Simulation results indicate that all of the open waters of Kings Bay are flushed by the spring discharge, and that reduced bottom friction has little effect on flushing.
Seafloor geomorphology of western Antarctic Peninsula bays: a signature of ice flow behaviour
NASA Astrophysics Data System (ADS)
Munoz, Yuribia P.; Wellner, Julia S.
2018-01-01
Glacial geomorphology is used in Antarctica to reconstruct ice advance during the Last Glacial Maximum and subsequent retreat across the continental shelf. Analogous geomorphic assemblages are found in glaciated fjords and are used to interpret the glacial history and glacial dynamics in those areas. In addition, understanding the distribution of submarine landforms in bays and the local controls exerted on ice flow can help improve numerical models by providing constraints through these drainage areas. We present multibeam swath bathymetry from several bays in the South Shetland Islands and the western Antarctic Peninsula. The submarine landforms are described and interpreted in detail. A schematic model was developed showing the features found in the bays: from glacial lineations and moraines in the inner bay, to grounding zone wedges and drumlinoid features in the middle bay, and streamlined features and meltwater channels in the outer bay areas. In addition, we analysed local variables in the bays and observed the following: (1) the number of landforms found in the bays scales with the size of the bay, but the geometry of the bays dictates the types of features that form; specifically, we observe a correlation between bay width and the number of transverse features present. (2) The smaller seafloor features are present only in the smaller glacial systems, indicating that the short-lived atmospheric and oceanographic fluctuations responsible for the formation of these landforms are only recorded in these smaller systems. (3) Meltwater channels are abundant on the seafloor; some are subglacial, carved in bedrock, and some are modern erosional features, carved in soft sediment. Lastly, based on geomorphological evidence, we propose that the features found in some of the proximal bay areas were formed during a recent glacial advance, likely the Little Ice Age.
An automated approach to the design of decision tree classifiers
NASA Technical Reports Server (NTRS)
Argentiero, P.; Chin, R.; Beaudet, P.
1982-01-01
An automated technique is presented for designing effective decision tree classifiers predicated only on a priori class statistics. The procedure relies on linear feature extractions and Bayes table look-up decision rules. Associated error matrices are computed and utilized to provide an optimal design of the decision tree at each so-called 'node'. A by-product of this procedure is a simple algorithm for computing the global probability of correct classification assuming the statistical independence of the decision rules. Attention is given to a more precise definition of decision tree classification, the mathematical details on the technique for automated decision tree design, and an example of a simple application of the procedure using class statistics acquired from an actual Landsat scene.
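The by-product mentioned above reduces to simple arithmetic: under statistical independence of the node decision rules, the probability of correct classification along a path through the tree is the product of the per-node accuracies. A toy illustration with hypothetical rates:

```python
# Global probability of correct classification along one decision-tree path,
# assuming independent per-node decision rules (hypothetical accuracy values).
path_node_accuracies = [0.98, 0.95, 0.90]
p_correct = 1.0
for p in path_node_accuracies:
    p_correct *= p
print(f"global P(correct) along this path: {p_correct:.3f}")  # 0.838
```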
Geomorphologic Modeling of a Macro-Tidal Embayment With Extensive Tidal Flats: Skagit Bay, WA
2009-01-01
Hibler, Lyle; Maxwell, Adam (Pacific Northwest National Laboratory, 1529 West Sequim Bay Road, Sequim, WA 98382)
Geomorphologic Modeling of a Macro-tidal Embayment with Extensive Tidal Flats: Skagit Bay, WA
2008-01-01
Hibler, Lyle; Maxwell, Adam (Pacific Northwest National Laboratory, 1529 West Sequim Bay Road, Sequim, WA 98382)
Revealing Fundamental Physics from the Daya Bay Neutrino Experiment Using Deep Neural Networks
Racah, Evan; Ko, Seyoon; Sadowski, Peter; ...
2017-02-02
Experiments in particle physics produce enormous quantities of data that must be analyzed and interpreted by teams of physicists. This analysis is often exploratory, where scientists are unable to enumerate the possible types of signal prior to performing the experiment. Thus, tools for summarizing, clustering, visualizing and classifying high-dimensional data are essential. In this work, we show that meaningful physical content can be revealed by transforming the raw data into a learned high-level representation using deep neural networks, with measurements taken at the Daya Bay Neutrino Experiment as a case study. We further show how convolutional deep neural networks can provide an effective classification filter with greater than 97% accuracy across different classes of physics events, significantly better than other machine learning approaches.
Dai, Tianjiao; Zhang, Yan; Tang, Yushi; Bai, Yaohui; Tao, Yile; Huang, Bei; Wen, Donghui
2016-10-01
Coastal areas are land-sea transitional zones with complex natural and anthropogenic disturbances. Microorganisms in coastal sediments adapt to such disturbances both individually and as a community, and the microbial community structure changes spatially and temporally under environmental stress. In this study, we investigated the microbial community structure in the sediments of Hangzhou Bay, a seriously polluted bay in China. In order to identify the roles and contributions of all microbial taxa, we set thresholds of 0.1% for rare taxa and 1% for abundant taxa, and classified all operational taxonomic units into six exclusive categories based on their abundance. The results showed that the key taxa in differentiating the communities are the abundant taxa (AT), conditionally abundant taxa (CAT), and conditionally rare or abundant taxa (CRAT). A large population in the conditionally rare taxa (CRT) made this category collectively significant in differentiating the communities. Both bacteria and archaea demonstrated a distance-decay pattern of community similarity in the bay, and this pattern was strengthened by the rare taxa, CRT and CRAT, but weakened by AT and CAT. This implies that the low-abundance taxa were more deterministically distributed, while the high-abundance taxa were more ubiquitously distributed.
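A hedged sketch of the threshold-based categorization follows. The exact six-category definitions below are assumptions patterned on the text (always below 0.1%, always above 1%, and taxa crossing those thresholds across samples), and the relative abundances are synthetic.

```python
# Classify OTUs into abundance categories from a samples-by-OTUs table.
import numpy as np

rare_t, abundant_t = 0.001, 0.01             # 0.1% and 1% thresholds (abstract)
rng = np.random.default_rng(6)
otu = rng.dirichlet(np.ones(200), size=10)   # relative abundances: 10 samples x 200 OTUs

def categorize(col):
    # col: one OTU's relative abundance across all samples.
    if col.max() < rare_t:
        return "rare taxa"
    if col.min() > abundant_t:
        return "abundant taxa (AT)"
    if col.min() > rare_t and col.max() <= abundant_t:
        return "moderate taxa"
    if col.max() > abundant_t and col.min() < rare_t:
        return "conditionally rare or abundant (CRAT)"
    if col.max() > abundant_t:
        return "conditionally abundant (CAT)"
    return "conditionally rare (CRT)"

labels = [categorize(otu[:, j]) for j in range(otu.shape[1])]
print({c: labels.count(c) for c in set(labels)})
```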
Huang, Guangzao; Yuan, Mingshun; Chen, Moliang; Li, Lei; You, Wenjie; Li, Hanjie; Cai, James J; Ji, Guoli
2017-10-07
The application of machine learning in cancer diagnostics has shown great promise and is of importance in clinical settings. Here we consider applying machine learning methods to transcriptomic data derived from tumor-educated platelets (TEPs) from individuals with different types of cancer. We aim to define a reliability measure for diagnostic purposes to increase the potential for facilitating personalized treatments. To this end, we present a novel classification method called MFRB (Multiple Fitting Regression and Bayes decision), which integrates the process of multiple fitting regression (MFR) with Bayes decision theory. MFR is first used to map the multidimensional features of the transcriptomic data into a one-dimensional feature. The probability density function of each class in the mapped space is then fitted with a Gaussian probability density function. Finally, Bayes decision theory is used to build a probabilistic classifier with the estimated probability density functions. The output of MFRB can be used to determine which class a sample belongs to, as well as to assign a reliability measure for a given class. The classical support vector machine (SVM) and probabilistic SVM (PSVM) were used to evaluate the performance of the proposed method with simulated and real TEP datasets. Our results indicate that the proposed MFRB method achieves the best performance compared to SVM and PSVM, mainly due to its strong generalization ability for limited, imbalanced, and noisy data.
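The MFRB pipeline can be caricatured in a few lines: map features to one dimension with a regression, fit a Gaussian per class in the mapped space, and apply Bayes' rule so the posterior doubles as a reliability measure. The sketch replaces the authors' multiple fitting regression with plain ridge regression and uses synthetic data throughout.

```python
# Regression-to-1D mapping + per-class Gaussians + Bayes decision (illustrative).
import numpy as np
from scipy.stats import norm
from sklearn.linear_model import Ridge

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 20))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 300) > 0).astype(int)

reg = Ridge().fit(X, y)          # map multidimensional features to a 1-D score
z = reg.predict(X)

# Per-class Gaussian densities and priors in the mapped space.
params = {c: (z[y == c].mean(), z[y == c].std()) for c in (0, 1)}
priors = {c: np.mean(y == c) for c in (0, 1)}

def posterior(z_new):
    """P(class 1 | z) via Bayes' rule; also serves as a reliability measure."""
    lik = {c: norm.pdf(z_new, *params[c]) * priors[c] for c in (0, 1)}
    return lik[1] / (lik[0] + lik[1])

print(posterior(reg.predict(X[:3])))
```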
Terziotti, Silvia; Capel, Paul D.; Tesoriero, Anthony J.; Hopple, Jessica A.; Kronholm, Scott C.
2018-03-07
The water quality of the Chesapeake Bay may be adversely affected by dissolved nitrate carried in groundwater discharge to streams. To estimate the concentrations, loads, and yields of nitrate from groundwater to streams for the Chesapeake Bay watershed, a regression model was developed based on nitrate concentrations measured at baseflow in 156 small streams with watersheds of less than 500 square miles (mi²). The regression model has three predictive variables: geologic unit, percent developed land, and percent agricultural land. Estimated and measured values within geologic units were closely matched; the coefficient of determination (R²) for the model was 0.6906. The model was used to calculate baseflow nitrate concentrations for over 83,000 National Hydrography Dataset Plus Version 2 catchments, aggregated to 1,966 12-digit hydrologic units in the Chesapeake Bay watershed. The modeled output geospatial data layers provided estimated annual loads and yields of nitrate from groundwater into streams. The spatial distribution of annual nitrate yields from groundwater estimated by this method was compared to the total watershed yields from all sources estimated by a Chesapeake Bay SPAtially Referenced Regressions On Watershed attributes (SPARROW) water-quality model. The comparison showed similar spatial patterns. The regression model for the groundwater contribution had similar but lower yields, suggesting that groundwater is an important source of nitrogen for streams in the Chesapeake Bay watershed.
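The regression structure described, one categorical predictor (geologic unit) plus two land-cover percentages, can be sketched with a one-hot encoding. The data, category names, and coefficient values below are synthetic placeholders, not the study's.

```python
# Linear regression with a one-hot geologic-unit term (illustrative data).
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(8)
df = pd.DataFrame({
    "geologic_unit": rng.choice(["carbonate", "crystalline", "siliciclastic"], 200),
    "pct_developed": rng.uniform(0, 60, 200),
    "pct_agriculture": rng.uniform(0, 80, 200),
})
nitrate = 0.05 * df.pct_agriculture + rng.normal(0, 0.5, 200)  # mg/L, synthetic

model = make_pipeline(
    ColumnTransformer([("geo", OneHotEncoder(), ["geologic_unit"])],
                      remainder="passthrough"),
    LinearRegression())
model.fit(df, nitrate)
print("R^2:", model.score(df, nitrate))
```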
Three-dimensional hydrodynamic modelling study of reverse estuarine circulation: Kuwait Bay.
Alosairi, Y; Pokavanich, T; Alsulaiman, N
2018-02-01
Hydrodynamics and associated environmental processes have always been of major concern to coastal-dependent countries such as Kuwait, owing to the environmental impact that accompanies economic and commercial activities along coastal areas. In the current study, a three-dimensional numerical model is utilized to unveil the main dynamic and physical properties of Kuwait Bay during the critical season. The model performance over the summer months (June, July and August 2012) is assessed against comprehensive field measurements of water levels, velocity, temperature and salinity before the model is used to describe the circulation as driven by tides, gravitational convection and winds. The results showed that the baroclinic conditions in the Bay are mainly determined by the horizontal salinity gradient and, to a much lesser extent, the temperature gradient. The gradients stretch over the southern coast of the Bay, where dense water is found in the inner and enclosed areas while relatively lighter waters are found near the mouth of the Bay. This gradient imposes a reversed estuarine circulation along the main axis of the Bay, particularly during neap tides, when landward flow near the surface and seaward flow near the bed are most evident. The results also revealed that the shallow areas, including Sulaibikhat and Jahra Bays, are well mixed and generally flow in the counter-clockwise direction. Clockwise circulations dominated the northern portion of the Bay, forming a large eddy, while turbulent fields associated with tidal currents were localized near the headlands.
Pérez-Rodríguez, Paulino; Gianola, Daniel; González-Camacho, Juan Manuel; Crossa, José; Manès, Yann; Dreisigacker, Susanne
2012-01-01
In genome-enabled prediction, parametric, semi-parametric, and non-parametric regression models have been used. This study assessed the predictive ability of linear and non-linear models using dense molecular markers. The linear models were linear on marker effects and included the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B. The non-linear models (this refers to non-linearity on markers) were reproducing kernel Hilbert space (RKHS) regression, Bayesian regularized neural networks (BRNN), and radial basis function neural networks (RBFNN). These statistical models were compared using 306 elite wheat lines from CIMMYT genotyped with 1717 diversity array technology (DArT) markers and two traits, days to heading (DTH) and grain yield (GY), measured in each of 12 environments. It was found that the three non-linear models had better overall prediction accuracy than the linear regression specification. Results showed a consistent superiority of RKHS and RBFNN over the Bayesian LASSO, Bayesian ridge regression, Bayes A, and Bayes B models. PMID:23275882
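A rough analogue of the linear-versus-nonlinear comparison can be sketched with scikit-learn, using Bayesian ridge regression to stand in for the Bayesian linear marker models and an RBF kernel ridge to stand in for RKHS regression. The marker matrix and trait values are simulated; only the 306-line count echoes the study.

```python
# Linear Bayesian marker model vs. RKHS-style kernel regression (synthetic data).
import numpy as np
from sklearn.linear_model import BayesianRidge
from sklearn.kernel_ridge import KernelRidge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(9)
markers = rng.integers(0, 2, size=(306, 500)).astype(float)  # lines x markers
effects = rng.normal(0, 0.1, 500)
trait = markers @ effects + 0.5 * np.sin(markers[:, 0] * 3) + rng.normal(0, 1, 306)

for name, model in [("Bayesian ridge (linear)", BayesianRidge()),
                    ("RKHS-style kernel ridge", KernelRidge(kernel="rbf"))]:
    r = cross_val_score(model, markers, trait, cv=5, scoring="r2").mean()
    print(f"{name}: mean CV R^2 = {r:.2f}")
```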
Modification of Gaussian mixture models for data classification in high energy physics
NASA Astrophysics Data System (ADS)
Štěpánek, Michal; Franc, Jiří; Kůs, Václav
2015-01-01
In high energy physics, we deal with the demanding task of separating signal from background. The Model Based Clustering method involves the estimation of distribution mixture parameters via the Expectation-Maximization algorithm in the training phase and the application of Bayes' rule in the testing phase. Modifications of the algorithm such as weighting, missing-data processing, and overtraining avoidance are discussed. Due to the strong dependence of the algorithm on initialization, genetic optimization techniques such as mutation, elitism, parasitism, and rank selection of individuals are also covered. Data pre-processing plays a significant role in the subsequent combination of final discriminants to improve signal separation efficiency. Moreover, the results of top quark separation from the Tevatron collider are compared with those of standard multivariate techniques in high energy physics. Results from this study have been used in the measurement of the inclusive top pair production cross section employing the full DØ Tevatron Run II dataset (9.7 fb⁻¹).
Predictive analysis and data mining among the employment of fresh graduate students in HEI
NASA Astrophysics Data System (ADS)
Rahman, Nor Azziaty Abdul; Tan, Kian Lam; Lim, Chen Kim
2017-10-01
Higher education management faces the problem of producing graduates who fully meet the needs of industry, while industry faces the problem of finding skilled graduates who suit its needs, partly due to the lack of an effective method for assessing problem-solving skills. The purpose of this paper is to propose a suitable classification model for predicting and assessing, from attributes of the student dataset, whether graduates meet the selection criteria of work demanded by industry in their academic field. Supervised and unsupervised machine learning algorithms were used in this research: K-Nearest Neighbor, Naïve Bayes, Decision Tree, Neural Network, Logistic Regression and Support Vector Machine. The proposed model will help university management make better long-term plans for producing graduates who are skilled, knowledgeable and fulfill industry needs.
Brain cancer probed by native fluorescence and stokes shift spectroscopy
NASA Astrophysics Data System (ADS)
Zhou, Yan; Liu, Cheng-hui; He, Yong; Pu, Yang; Li, Qingbo; Wang, Wei; Alfano, Robert R.
2012-12-01
Optical biopsy spectroscopy was applied to diagnose human brain cancer in vitro. Native fluorescence, Stokes shift, and excitation spectra were obtained from malignant meningioma, benign and normal meningeal tissues, and benign acoustic neuroma tissues. A wide range of excitation wavelengths was used to establish criteria for distinguishing brain diseases. Alterations of the fluorescence spectra between normal and abnormal brain tissues were identified from the characteristic fluorophores under excitation across the UV-to-visible wavelength range. It was found that the ratios of peak intensities and the peak positions in both the fluorescence and Stokes shift spectra may be used to diagnose human brain meningeal diseases. A preliminary analysis of the fluorescence spectral data from cancerous and normal meningeal tissues with a basic biochemical component analysis (BBCA) model and a Bayes classification model based on statistical methods revealed the changes in components and classified the difference between cancerous and normal human brain meningeal tissues with a prediction accuracy of 0.93 in comparison with histopathology and immunohistochemistry reports (the gold standard).
Martucci, Sarah K.; Krstolic, Jennifer L.; Raffensperger, Jeff P.; Hopkins, Katherine J.
2006-01-01
The U.S. Geological Survey, U.S. Environmental Protection Agency Chesapeake Bay Program Office, Interstate Commission on the Potomac River Basin, Maryland Department of the Environment, Virginia Department of Conservation and Recreation, Virginia Department of Environmental Quality, and the University of Maryland Center for Environmental Science are collaborating on the Chesapeake Bay Regional Watershed Model, using Hydrological Simulation Program - FORTRAN to simulate streamflow and concentrations and loads of nutrients and sediment to Chesapeake Bay. The model will be used to provide information for resource managers. In order to establish a framework for model simulation, digital spatial datasets were created defining the discretization of the model region (including the Chesapeake Bay watershed, as well as the adjacent parts of Maryland, Delaware, and Virginia outside the watershed) into land segments, a stream-reach network, and associated watersheds. Land segmentation was based on county boundaries represented by a 1:100,000-scale digital dataset. Fifty of the 254 counties and incorporated cities in the model region were divided on the basis of physiography and topography, producing a total of 309 land segments. The stream-reach network for the Chesapeake Bay watershed part of the model region was based on the U.S. Geological Survey Chesapeake Bay SPARROW (SPAtially Referenced Regressions On Watershed attributes) model stream-reach network. Because that network was created only for the Chesapeake Bay watershed, the rest of the model region uses a 1:500,000-scale stream-reach network. Streams with mean annual streamflow of less than 100 cubic feet per second were excluded based on attributes from the dataset. Additional changes were made to enhance the data and to allow for inclusion of stream reaches with monitoring data that were not part of the original network. Thirty-meter-resolution Digital Elevation Model data were used to delineate watersheds for each stream reach. State watershed boundaries replaced the Digital Elevation Model-derived watersheds where coincident. After a number of corrections, the watersheds were coded to indicate major and minor basin, mean annual streamflow, and each watershed's unique identifier as well as that of the downstream watershed. Land segments and watersheds were intersected to create land-watershed segments for the model.
A nowcast model for tides and tidal currents in San Francisco Bay, California
Cheng, Ralph T.; Smith, Richard E.
1998-01-01
The National Oceanic and Atmospheric Administration (NOAA) installed the Physical Oceanographic Real-Time System (PORTS) in San Francisco Bay, California to provide observations of tides, tidal currents, and meteorological conditions. PORTS data are used for optimizing vessel operations, increasing the margin of safety for navigation, and guiding hazardous-material spill prevention and response. Because tides and tidal currents in San Francisco Bay are extremely complex, limited real-time observations are insufficient to resolve the spatial variations of tides and tidal currents. To fill the information gaps, a high-resolution, robust, semi-implicit, finite-difference nowcast numerical model has been implemented for San Francisco Bay. The model grid and water depths are defined on coordinates based on the Mercator projection, so the model outputs can be directly superimposed on navigation charts. A data assimilation algorithm has been established to derive the boundary conditions for model simulations. The nowcast model is executed every hour, continuously simulating tides and tidal currents starting 24 hours before the present time (now) and covering a total of 48 hours of simulation. Forty-eight hours of nowcast model results are available to the public at all times through the World Wide Web (WWW). Users can view and download the nowcast model results for tide and tidal-current distributions in San Francisco Bay for their specific applications and for further analysis.
Schmieder, Roberta; Puehler, Florian; Neuhaus, Roland; Kissel, Maria; Adjei, Alex A; Miner, Jeffrey N; Mumberg, Dominik; Ziegelbauer, Karl; Scholz, Arne
2013-01-01
OBJECTIVE: The objectives of the study were to evaluate the allosteric mitogen-activated protein kinase kinase (MEK) inhibitor BAY 86-9766 in monotherapy and in combination with sorafenib in orthotopic and subcutaneous hepatocellular carcinoma (HCC) models with different underlying etiologies in two species. DESIGN: Antiproliferative potential of BAY 86-9766 and synergistic effects with sorafenib were studied in several HCC cell lines. Relevant pathway signaling was studied in MH3924a cells. For in vivo testing, the HCC cells were implanted subcutaneously or orthotopically. Survival and mode of action (MoA) were analyzed. RESULTS: BAY 86-9766 exhibited potent antiproliferative activity in HCC cell lines with half-maximal inhibitory concentration values ranging from 33 to 762 nM. BAY 86-9766 was strongly synergistic with sorafenib in suppressing tumor cell proliferation and inhibiting phosphorylation of the extracellular signal-regulated kinase (ERK). BAY 86-9766 prolonged survival in Hep3B xenografts, murine Hepa129 allografts, and MH3924A rat allografts. Additionally, tumor growth, ascites formation, and serum alpha-fetoprotein levels were reduced. Synergistic effects in combination with sorafenib were shown in Huh-7, Hep3B xenografts, and MH3924A allografts. On the signaling pathway level, the combination of BAY 86-9766 and sorafenib led to inhibition of the upregulatory feedback loop toward MEK phosphorylation observed after BAY 86-9766 monotreatment. With regard to the underlying MoA, inhibition of ERK phosphorylation, tumor cell proliferation, and microvessel density was observed in vivo. CONCLUSION: BAY 86-9766 shows potent single-agent antitumor activity and acts synergistically in combination with sorafenib in preclinical HCC models. These results support the ongoing clinical development of BAY 86-9766 and sorafenib in advanced HCC. PMID:24204195
NASA Astrophysics Data System (ADS)
Horiguchi, Fumio; Nakata, Kisaburo; Ito, Naganori; Okawa, Ken
2006-12-01
A risk assessment of tributyltin (TBT) in Tokyo Bay was conducted using the Margin of Exposure (MOE) method at the species level for the Japanese short-neck clam, Ruditapes philippinarum. The assessment endpoint was defined to protect R. philippinarum in Tokyo Bay from the growth effects of TBT. A No Observed Effect Concentration (NOEC) for this species with respect to growth reduction induced by TBT was estimated from experimental results published in the scientific literature. Sources of TBT in this study were assumed to be commercial vessels in harbors and navigation routes. Concentrations of TBT in Tokyo Bay were estimated using a three-dimensional hydrodynamic model, an ecosystem model and a chemical fate model. MOEs for this species were estimated for the years 1990, 2000, and 2007 at approximately 1-3, 10, and 100, respectively, indicating a declining temporal trend in the probability of adverse growth effects. A simplified software package called RAMTB was developed by incorporating the chemical fate model and databases of seasonal flow fields and distributions of organic substances (phytoplankton and detritus) in Tokyo Bay, simulated by the hydrodynamic and ecological models, respectively.
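The MOE arithmetic behind the reported trend is simply the NOEC divided by the predicted exposure concentration. The NOEC and exposure values below are placeholders chosen only to reproduce the reported order-of-magnitude pattern, not the paper's data.

```python
# Margin of Exposure: MOE = NOEC / predicted exposure concentration.
noec_tbt = 100.0                                           # ng/L, hypothetical NOEC
predicted_exposure = {1990: 50.0, 2000: 10.0, 2007: 1.0}   # ng/L, hypothetical
moe = {yr: noec_tbt / c for yr, c in predicted_exposure.items()}
print(moe)   # larger MOE -> lower probability of adverse growth effects
```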
Ge Sun; Timothy J. Callahan; Jennifer E. Pyzoha; Carl C. Trettin
2006-01-01
Restoring depressional wetlands or geographically isolated wetlands such as cypress swamps and Carolina bays on the Atlantic Coastal Plains requires a clear understanding of the hydrologic processes and water balances. The objectives of this paper are to (1) test a distributed forest hydrology model, FLATWOODS, for a Carolina bay wetland system using seven years of...
A&M. TAN607. Sections for second phase expansion: engine maintenance, machine, ...
A&M. TAN-607. Sections for second phase expansion: engine maintenance, machine, and welding shops; high bay assembly shop, chemical cleaning room (decontamination). Details of sliding door hoods. Approved by INEEL Classification Office for public release. Ralph M. Parsons 1299-5-ANP/GE-3-607-A 109. Date: August 1956. INEEL index code no. 034-0607-00-693-107169 - Idaho National Engineering Laboratory, Test Area North, Scoville, Butte County, ID
NASA Astrophysics Data System (ADS)
Jacobsen, Timothy R.; Milutinovic, James D.; Miller, James R.
1990-11-01
Physical processes are important in determining benthic recruitment success in estuarine ecosystems. We have conducted two field studies with passive surface drifters to examine the large-scale advection and local dispersion in the region of the oyster seed beds in Delaware Bay. The two studies show that the wind is critical in determining the final location of the drifters and that axial fronts in the bay may play an important role in reducing cross-bay particle dispersion and may keep particles in the nearshore oyster beds. Simulations of particle trajectories from a three-dimensional numerical model of Delaware Bay were also analyzed to determine the sensitivity of particle trajectories to varying wind conditions and different assumptions about larval vertical migration.
Chang, Ni-Bin; Wimberly, Brent; Xuan, Zhemin
2012-03-01
This study presents an integrated k-means clustering and gravity model (IKCGM) for investigating the spatiotemporal patterns of nutrient and associated dissolved oxygen levels in Tampa Bay, Florida. By using k-means clustering to first partition the nutrient data into a user-specified number of subsets, it is possible to discover the spatiotemporal patterns of nutrient distribution in the bay and capture the inherent linkages of hydrodynamic and biogeochemical features. Such patterns may then be combined with a gravity model linking the nutrient source contribution from each coastal watershed to the generated clusters in the bay to aid source-apportionment analysis for environmental management. The clustering analysis was carried out on one year (2008) of water quality data comprising 55 sample stations throughout Tampa Bay, collected by the Environmental Protection Commission of Hillsborough County. In addition, hydrological and river water quality data for the same year were acquired from the United States Geological Survey's National Water Information System to support the gravity modeling analysis. The results show that the k-means model with 8 clusters is the optimal choice, in which cluster 2 in Lower Tampa Bay had the minimum values of total nitrogen (TN) concentration, chlorophyll a (Chl-a) concentration, and ocean color in every season, as well as the minimum concentration of total phosphorus (TP) in three consecutive seasons of 2008. The datasets indicate that Lower Tampa Bay is an area with limited nutrient input throughout the year. Cluster 5, located in Middle Tampa Bay, displayed elevated TN concentrations, ocean color values, and Chl-a concentrations, suggesting that high values of colored dissolved organic matter are linked with some nutrient sources. The gravity modeling analysis indicates that the Alafia River Basin is the major contributor of nutrients in terms of both TP and TN in all seasons. With this new integration, improvements in environmental monitoring and assessment were achieved, advancing our understanding of sea-land interactions and nutrient cycling in a critical coastal bay of the Gulf of Mexico.
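The partitioning step is a direct k-means application over a station-by-feature water-quality matrix. A minimal sketch with synthetic features follows (the study's optimum was k = 8); the feature layout is an assumption.

```python
# Cluster monitoring stations by their water-quality signatures (synthetic data).
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(10)
# 55 stations x features (e.g., seasonal TN, TP, Chl-a, ocean color values)
stations = rng.normal(size=(55, 16))
labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(stations)
print(np.bincount(labels))   # stations per cluster
```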
Hydrodynamics and Eutrophication Model Study of Indian River and Rehoboth Bay, Delaware
1994-05-01
Station, Vicksburg, MS. Chapter I: Introduction. The Study System: Indian River and Rehoboth Bay (Figure 1-1) are two water bodies that form part of the ... and mass transport throughout the system. Objectives: The primary objective of this study is to provide a hydrodynamic/water quality model package of ... portion opens out into Indian River Bay (Figure 3-1). The cooling water diversion was included in the hydrodynamic model. Flow through the power plant, at ...
NASA Astrophysics Data System (ADS)
Boschetto, Davide; Di Claudio, Gianluca; Mirzaei, Hadis; Leong, Rupert; Grisan, Enrico
2016-03-01
Celiac disease (CD) is an immune-mediated enteropathy triggered by exposure to gluten and similar proteins in genetically susceptible persons, increasing their risk of various complications. Small-bowel mucosal damage due to CD involves various degrees of endoscopically relevant lesions that are not easily recognized: their overall sensitivity and positive predictive values are poor even when zoom endoscopy is used. Confocal laser endomicroscopy (CLE) allows skilled and trained experts to qualitatively evaluate mucosal alterations such as a decrease in goblet cell density, villous atrophy or crypt hypertrophy. We present a method for automatically classifying CLE images into three classes: normal regions, villous atrophy (VA) and crypt hypertrophy (CH). Classification is performed after a feature selection process in which four features are extracted from each image via homomorphic filtering and border identification with Canny and Sobel operators. Three classifiers were tested on a dataset of 67 images labeled by experts in the three classes (normal, VA and CH): a linear classifier, a Naive Bayes quadratic classifier, and a standard quadratic discriminant analysis, all validated with ten-fold cross-validation. Linear classification achieves 82.09% accuracy (class accuracies: 90.32% for normal villi, 82.35% for VA and 68.42% for CH; sensitivity: 0.68, specificity: 1.00), Naive Bayes analysis returns 83.58% accuracy (90.32% for normal villi, 70.59% for VA and 84.21% for CH; sensitivity: 0.84, specificity: 0.92), while the quadratic analysis achieves a final accuracy of 94.03% (96.77% for normal villi, 94.12% for VA and 89.47% for CH; sensitivity: 0.89, specificity: 0.98).
Machine learning approaches to diagnosis and laterality effects in semantic dementia discourse.
Garrard, Peter; Rentoumi, Vassiliki; Gesierich, Benno; Miller, Bruce; Gorno-Tempini, Maria Luisa
2014-06-01
Advances in automatic text classification have been necessitated by the rapid increase in the availability of digital documents. Machine learning (ML) algorithms can 'learn' from data: for instance, a ML system can be trained on a set of features derived from written texts belonging to known categories, and learn to distinguish between them. Such a trained system can then be used to classify unseen texts. In this paper, we explore the potential of the technique to classify transcribed speech samples along clinical dimensions, using vocabulary data alone. We report the accuracy with which two related ML algorithms [naive Bayes Gaussian (NBG) and naive Bayes multinomial (NBM)] categorized picture descriptions produced by: 32 semantic dementia (SD) patients versus 10 healthy, age-matched controls; and SD patients with left- (n = 21) versus right-predominant (n = 11) patterns of temporal lobe atrophy. We used information gain (IG) to identify the vocabulary features that were most informative to each of these two distinctions. In the SD versus control classification task, both algorithms achieved accuracies of greater than 90%. In the right- versus left-temporal lobe predominant classification, NBM achieved a high level of accuracy (88%); this level was achieved by both NBM and NBG when the features used in the training set were restricted to those with high values of IG. The most informative features for the patient versus control task were low-frequency content words, generic terms and components of metanarrative statements. For the right versus left task, the number of informative lexical features was too small to support any specific inferences. An enriched feature set, including values derived from Quantitative Production Analysis (QPA), may shed further light on this little-understood distinction.
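The NBG/NBM contrast comes down to the per-class likelihood: Gaussian NB fits a normal density to each word's frequency, while multinomial NB treats a document as draws from a class-specific word distribution. A hand-rolled multinomial NB in R, on a toy document-term count matrix rather than the study's transcripts, makes the latter concrete:

    # R: multinomial naive Bayes from scratch on a document-term count matrix
    nbm_train <- function(X, y, alpha = 1) {          # alpha: Laplace smoothing
      classes <- levels(y)
      prior <- table(y) / length(y)
      # log P(word | class): smoothed word counts, normalized within each class
      loglik <- sapply(classes, function(cl) {
        counts <- colSums(X[y == cl, , drop = FALSE]) + alpha
        log(counts / sum(counts))
      })
      list(classes = classes, logprior = log(as.numeric(prior)), loglik = loglik)
    }

    nbm_predict <- function(model, X) {
      scores <- X %*% model$loglik                    # sum of count * log P(w|c)
      scores <- sweep(scores, 2, model$logprior, "+") # add log prior per class
      model$classes[max.col(scores)]
    }

    # Toy usage: 6 documents, 3 vocabulary words, two classes
    X <- matrix(c(3,0,1, 2,1,0, 4,0,2, 0,3,1, 1,4,0, 0,2,2), ncol = 3, byrow = TRUE)
    y <- factor(c("SD", "SD", "SD", "control", "control", "control"))
    m <- nbm_train(X, y)
    nbm_predict(m, X)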
A Simple Model of Nitrogen Concentration, Throughput, and Denitrification in Estuaries
The Estuary Nitrogen Model (ENM) is a mass balance model that includes calculation of nitrogen losses within bays and estuaries using system flushing time. The model has been used to demonstrate the dependence of throughput and denitrification of nitrogen in bays and estuaries on...
Simulation of ground-water discharge to Biscayne Bay, southeastern Florida
Langevin, Christian David
2001-01-01
As part of the Place-Based Studies Program, the U.S. Geological Survey initiated a project in 1996, in cooperation with the U.S. Army Corps of Engineers, to quantify the rates and patterns of submarine ground-water discharge to Biscayne Bay. Project objectives were achieved through field investigations at three sites (Coconut Grove, Deering Estate, and Mowry Canal) along the coastline of Biscayne Bay and through the development and calibration of variable-density, ground-water flow models. Two-dimensional, vertical cross-sectional models were developed for steady-state conditions for the Coconut Grove and Deering Estate transects to quantify local-scale ground-water discharge patterns to Biscayne Bay. A larger regional-scale model was developed in three dimensions to simulate submarine ground-water discharge to the entire bay. The SEAWAT code, which is a combined version of MODFLOW and MT3D, was used to simulate the complex variable-density flow patterns. Field data suggest that ground-water discharge to Biscayne Bay relative to the shoreline is restricted to within 300 meters at Coconut Grove, 600 to 1,000 meters at Deering Estate, and 100 meters at Mowry Canal. The vertical cross-sectional models, which were calibrated to the field data using the assumption of steady state, tend to focus ground-water discharge to within 50 to 200 meters of the shoreline. With homogeneous distributions for aquifer parameters and a constant-concentration boundary for Biscayne Bay, the numerical models could not reproduce the lower ground-water salinities observed beneath the bay, which suggests that further research may be necessary to improve the accuracy of the numerical simulations. Results from the cross-sectional models, which were able to simulate the approximate position of the saltwater interface, suggest that longitudinal dispersivity ranges between 1 and 10 meters, and transverse dispersivity ranges from 0.1 to 1 meter for the Biscayne aquifer. The three-dimensional, regional-scale model was calibrated to ground-water heads, canal baseflow, and the general position of the saltwater interface for nearly a 10-year period from 1989 to 1998. The mean absolute error between observed and simulated head values is 0.15 meter. The mean absolute error between observed and simulated baseflow is 3 x 10^5 cubic meters per day. The position of the simulated saltwater interface generally matches the position observed in the field, except for areas north of the Miami Canal where the simulated saltwater interface is located about 5 kilometers inland of the observed saltwater interface. Results from the regional-scale model suggest that the average rate of fresh ground-water discharge to Biscayne Bay for the 10-year period (1989-98) is about 2 x 10^5 cubic meters per day for 100 kilometers of coastline. This simulated discharge rate is about 6 percent of the measured surface-water discharge to Biscayne Bay for the same period. The model also suggests that nearly 100 percent of the fresh ground-water discharge is to the northern half of Biscayne Bay, north of the Cutler Drain Canal. South of the Cutler Drain Canal, coastal lowlands prevent the water table from rising high enough to drive measurable quantities of ground water to Biscayne Bay. Annual variations in sea-level elevation, which can be as large as 0.3 meter, have a substantial effect on rates of ground-water discharge. During 1989-98, simulated rates of ground-water discharge to Biscayne Bay generally are highest when sea level is relatively low.
NASA Astrophysics Data System (ADS)
Goldberg, Daniel L.; Loughner, Christopher P.; Tzortziou, Maria; Stehr, Jeffrey W.; Pickering, Kenneth E.; Marufu, Lackson T.; Dickerson, Russell R.
2014-02-01
Air quality models, such as the Community Multiscale Air Quality (CMAQ) model, indicate decidedly higher ozone near the surface of large interior water bodies, such as the Great Lakes and Chesapeake Bay. In order to test the validity of the model output, we performed surface measurements of ozone (O3) and total reactive nitrogen (NOy) on the 26-m Delaware II NOAA Small Research Vessel experimental (SRVx), deployed in the Chesapeake Bay for 10 daytime cruises in July 2011 as part of NASA's GEO-CAPE CBODAQ oceanographic field campaign in conjunction with NASA's DISCOVER-AQ air quality field campaign. During this 10-day period, the EPA O3 regulatory standard of 75 ppbv averaged over an 8-h period was exceeded four times over water while ground stations in the area only exceeded the standard at most twice. This suggests that on days when the Baltimore/Washington region is in compliance with the EPA standard, air quality over the Chesapeake Bay might exceed the EPA standard. Ozone observations over the bay during the afternoon were consistently 10-20% higher than the closest upwind ground sites during the 10-day campaign; this pattern persisted during good and poor air quality days. A lower boundary layer, reduced cloud cover, slower dry deposition rates, and other lesser mechanisms, contribute to the local maximum of ozone over the Chesapeake Bay. Observations from this campaign were compared to a CMAQ simulation at 1.33 km resolution. The model is able to predict the regional maximum of ozone over the Chesapeake Bay accurately, but NOy concentrations are significantly overestimated. Explanations for the overestimation of NOy in the model simulations are also explored.
Edward T. Sherwood; Holly Greening; Lizanne Garcia; Kris Kaufman; Tony Janicki; Ray Pribble; Brett Cunningham; Steve Peene; Jim Fitzpatrick; Kellie Dixon; Mike Wessel
2016-01-01
The Tampa Bay estuary has undergone a remarkable ecosystem recovery since the 1980s despite continued population growth within the region. However, during this time, the Old Tampa Bay (OTB) segment has lagged behind the rest of the Bay's recovery relative to improvements in overall water quality and seagrass coverage. In 2011, the Tampa Bay Estuary Program, in...
Restoration Lessons Learned from Bay Scallop Habitat Models
Habitat quality and quantity are important factors to consider when restoring bay scallop (Argopecten irradians) populations; however, data linking habitat attributes to bay scallop populations are lacking. This information is essential to guide restoration efforts to reverse sc...
A Mass Balance for Mercury in the San Francisco Bay Area
MacLeod, Matthew; McKone, Thomas E.; Mackay, Don
2008-01-01
We develop and illustrate a general regional multi-species model that describes the fate and transport of mercury in three forms, elemental, divalent, and methylated, in a generic regional environment including air, soil, vegetation, water and sediment. The objectives of the model are to describe the fate of the three forms of mercury in the environment and to determine the dominant physical sinks that remove mercury from the system. Chemical transformations between the three groups of mercury species are modeled by assuming constant ratios of species concentrations in individual environmental media. We illustrate and evaluate the model with an application to describe the fate and transport of mercury in the San Francisco Bay Area of California. The model successfully rationalizes the identified sources with observed concentrations of total mercury and methyl mercury in the San Francisco Bay Estuary. The mass balance provided by the model indicates that continental and global background sources control mercury concentrations in the atmosphere, but loadings to water in the San Francisco Bay estuary are dominated by runoff from the Central Valley catchment and re-mobilization of contaminated sediments deposited during past mining activities. The model suggests that the response time of mercury concentrations in the San Francisco Bay estuary to changes in loadings is long, of the order of 50 years.
FISHERY-ORIENTED MODEL OF MARYLAND OYSTER POPULATIONS
We used time series data to calibrate a model of oyster population dynamics for Maryland's Chesapeake Bay. Model parameters were fishing mortality, natural mortality, recruitment, and carrying capacity. We calibrated for the Maryland bay as a whole and separately for 3 salinity z...
Torbati, Mahbaneh Eshaghzadeh; Mitreva, Makedonka; Gopalakrishnan, Vanathi
2016-12-01
Human microbiome data from genomic sequencing technologies is fast accumulating, giving us insights into bacterial taxa that contribute to health and disease. The predictive modeling of such microbiota count data for the classification of human infection by parasitic worms, such as helminths, can help in detection and management across global populations. Real-world datasets of microbiome experiments are typically sparse, containing hundreds of measurements for bacterial species, of which only a few are detected in the bio-specimens that are analyzed. This feature of microbiome data produces the challenge of needing more observations for accurate predictive modeling and has been dealt with previously using different methods of feature reduction. To our knowledge, integrative methods, such as transfer learning, have not yet been explored in the microbiome domain as a way to deal with data sparsity by incorporating knowledge of different but related datasets. One way of incorporating this knowledge is by using a meaningful mapping among features of these datasets. In this paper, we claim that this mapping would exist among members of each individual cluster, grouped based on phylogenetic dependency among taxa and their association with the phenotype. We validate our claim by showing that models incorporating associations in such a grouped feature space result in no performance deterioration for the given classification task. We test our hypothesis using classification models that detect helminth infection in microbiota of human fecal samples obtained from Indonesia and Liberia. In our experiments, we first learn binary classifiers for helminth infection detection using Naive Bayes, Support Vector Machines, Multilayer Perceptrons, and Random Forest methods. In the next step, we add taxonomic modeling by using the SMART-scan module to group the data, and learn classifiers using the same four methods to test the validity of the achieved groupings. We observed a 6% to 23% and 7% to 26% performance improvement based on the Area Under the receiver operating characteristic (ROC) Curve (AUC) and Balanced Accuracy (Bacc) measures, respectively, over 10 runs of 10-fold cross-validation. These results show that using phylogenetic dependency to group our microbiota data results in a noticeable improvement in classification performance for helminth infection detection. These promising results from this feasibility study demonstrate that methods such as SMART-scan can be utilized in the future for knowledge transfer from different but related microbiome datasets by phylogenetically-related functional mapping, to enable novel integrative biomarker discovery.
Geomorphic Modeling of Macro-Tidal Embayment with Extensive Tidal Flats: Skagit Bay, Washington
2011-09-30
Lyle Hibler and Adam Maxwell, Battelle-Pacific Northwest Division, Marine Sciences Laboratory, Sequim, WA 98382.
ONR Tidal Flats DRI: Planning Joint Modeling and Field Exercises
2007-01-01
Lyle Hibler and Adam Maxwell, Battelle/Marine Research Operations, 1529 West Sequim Bay Road, Sequim, WA 98382. Award Number: N000140710694.
Development of a Knowledge-Based System Approach for Decision Making in Construction Projects
1992-05-01
a generic model for an administrative facility and medical facility with predefined fixed building systems based on Air Force criteria and past... Tables 5-1 and 5-2 list the facility types covered, including maintenance hangar (medium bay), corrosion control hangar (high bay), fuel system maintenance hangar (medium bay), the medical model (building support, medical logistics), and missile assembly/maintenance and missile loading/unloading buildings.
Parveen, Salina; DaSilva, Ligia; DePaola, Angelo; Bowers, John; White, Chanelle; Munasinghe, Kumudini Apsara; Brohawn, Kathy; Mudoh, Meshack; Tamplin, Mark
2013-01-15
Information is limited about the growth and survival of naturally-occurring Vibrio parahaemolyticus in live oysters harvested from different regions and in different oyster species under commercially relevant storage conditions. This study produced a predictive model for the growth of naturally-occurring V. parahaemolyticus in live Eastern oysters (Crassostrea virginica) harvested from the Chesapeake Bay, MD, USA and stored at 5-30 °C until oysters gapped. The model was validated with model-independent data collected from Eastern oysters harvested from the Chesapeake Bay and Mobile Bay, AL, USA and Asian (C. ariakensis) oysters from the Chesapeake Bay, VA, USA. The effect of harvest season, region and water condition on growth rate (GR) was also tested. At each time interval, two samples consisting of six oysters each were analyzed by a direct-plating method for total V. parahaemolyticus. The Baranyi D-model was fitted to the total V. parahaemolyticus growth and survival data. A secondary model was produced using the square root model. V. parahaemolyticus inactivated slowly at 5 and 10 °C, with average rates of -0.002 and -0.001 log cfu/h, respectively. The average GRs at 15, 20, 25, and 30 °C were 0.038, 0.082, 0.228, and 0.219 log cfu/h, respectively. The bias and accuracy factors of the secondary model for model-independent data were 1.36 and 1.46 for Eastern oysters from Mobile Bay and the Chesapeake Bay, respectively. V. parahaemolyticus GRs were markedly lower in Asian oysters. Harvest temperature, salinity, region and season had no effect on GRs. The observed GRs were less than those predicted by the U.S. Food and Drug Administration's V. parahaemolyticus quantitative risk assessment.
NASA Astrophysics Data System (ADS)
Thompson, D. E.; Rajkumar, T.
2002-12-01
The San Francisco Bay Delta is a large hydrodynamic complex that incorporates the Sacramento and San Joaquin Estuaries, the Suisun Marsh, and the San Francisco Bay proper. Competition exists for the use of this extensive water system from the fisheries industry, the agricultural industry, and the marine and estuarine animal species within the Delta. As tidal fluctuations occur, more saline water pushes upstream, allowing fish to migrate beyond the Suisun Marsh for breeding and habitat occupation. However, the agriculture industry does not want extensive salinity intrusion to impact water quality for human and plant consumption. The balance is regulated by pumping stations located along the estuaries and reservoirs, whereby flushing of fresh water keeps the saline intrusion at bay. The pumping schedule is driven by data collected at various locations within the Bay Delta and by numerical models that predict the salinity intrusion as part of a larger model of the system. The Interagency Ecological Program (IEP) for the San Francisco Bay / Sacramento-San Joaquin Estuary collects, monitors, and archives the data, and the Department of Water Resources provides a numerical model simulation (DSM2) from which predictions are made that drive the pumping schedule. A problem with DSM2 is that the numerical simulation takes roughly 16 hours to complete a prediction. We have created a neural net, optimized with a genetic algorithm, that takes as input the archived data from multiple gauging stations and predicts stage, salinity, and flow at the Carquinez Straits (at the downstream end of the Suisun Marsh). This model seems to be robust in its predictions and operates much faster than the current numerical DSM2 model. Because the Bay-Delta is strongly tidally driven, we used both Principal Component Analysis and Fast Fourier Transforms to discover dominant features within the IEP data. We then filtered out the dominant tidal forcing to discover non-primary tidal effects, and used this to enhance the neural network by mapping input-output relationships in a more efficient manner. Furthermore, the neural network implicitly incorporates both the hydrodynamic and water quality models into a single predictive system. Although our model has not yet been enhanced to demonstrate improved pumping schedules, it has the potential to support better decision-making procedures that may then be implemented by State agencies if desired. Our intention is now to use our calibrated Bay-Delta neural model in the smaller Elkhorn Slough complex near Monterey Bay, where no such hydrodynamic model currently exists. At the Elkhorn Slough, we are fusing the neural net model of tidally-driven flow with in situ flow data and airborne and satellite remote sensing data. These further constrain the behavior of the model in predicting the longer-term health and future of this vital estuary. In particular, we are using visible data to explore the effects of the sediment plume that wastes into Monterey Bay, and infrared data and thermal emissivities to characterize the plant habitat along the margins of the Slough as salinity intrusion and sediment removal change the boundary of the estuary. The details of the Bay-Delta neural net model and its application to the Elkhorn Slough are presented in this paper.
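The tidal-filtering step mentioned above, removing the dominant harmonics so the model can focus on non-primary effects, can be sketched with base R's fft() on a synthetic stage series (toy data, not the IEP records):

    # R: suppress dominant tidal harmonics in a stage series via FFT
    t <- seq(0, 30 * 24, by = 1)                      # hourly samples, 30 days
    stage <- 1.2 * sin(2 * pi * t / 12.42) +          # M2 tide (12.42 h period)
             0.4 * sin(2 * pi * t / 24) +             # weaker diurnal component
             rnorm(length(t), sd = 0.1)

    sp <- fft(stage)
    keep <- Mod(sp) < 0.5 * max(Mod(sp[-1]))          # zero out the dominant peaks
    keep[1] <- TRUE                                   # retain the mean (DC) term
    residual <- Re(fft(sp * keep, inverse = TRUE)) / length(sp)
    # residual now holds the non-primary signal (diurnal component plus noise)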
Mid-Bay Islands Hydrodynamics and Sedimentation Modeling Study, Chesapeake Bay
2006-08-01
largest estuary in the United States, extending more than 150 miles from its seaward end at the Atlantic Ocean to the bayward end at the entrance to...water enters the bay from more than 150 major rivers and streams at approximately 80,000 cu ft/sec. Ocean tides enter the bay through the Atlantic...Ocean entrance and C&D Canal. The mean range of tides in the bay varies from approximately 1 ft on the western shore to 3 ft at the Atlantic Ocean
Sukuru, Sai Chetan K; Nigsch, Florian; Quancard, Jean; Renatus, Martin; Chopra, Rajiv; Brooijmans, Natasja; Mikhailov, Dmitri; Deng, Zhan; Cornett, Allen; Jenkins, Jeremy L; Hommel, Ulrich; Davies, John W; Glick, Meir
2010-01-01
We present here a comprehensive analysis of proteases in the peptide substrate space and demonstrate its applicability for lead discovery. Aligned octapeptide substrates of 498 proteases taken from the MEROPS peptidase database were used for the in silico analysis. A multiple-category naïve Bayes model, trained on the two-dimensional chemical features of the substrates, was able to classify the substrates of 365 (73%) proteases and elucidate statistically significant chemical features for each of their specific substrate positions. The positional awareness of the method allows us to identify the most similar substrate positions between proteases. Our analysis reveals that proteases from different families, based on the traditional classification (aspartic, cysteine, serine, and metallo), could have substrates that differ at the cleavage site (P1–P1′) but are similar away from it. Caspase-3 (cysteine protease) and granzyme B (serine protease) are previously known examples of cross-family neighbors identified by this method. To assess whether peptide substrate similarity between unrelated proteases could reliably translate into the discovery of low molecular weight synthetic inhibitors, a lead discovery strategy was tested on two other cross-family neighbor pairs, namely cathepsin L2 and matrix metalloproteinase 9, and calpain 1 and pepsin A. For both these pairs, a naïve Bayes classifier model trained on inhibitors of one protease could successfully enrich those of its neighbor from a different family and vice versa, indicating that this approach could be prospectively applied to lead discovery for a novel protease target with no known synthetic inhibitors.
HABITAT ASSESSMENT MODELS FOR BAY SCALLOP, ARGOPECTEN IRRADIANS
Bay scallops (Argopecten irradians) inhabit shallow subtidal habitats along the Atlantic coast of the United States and require settlement substrates, such as submerged aquatic vegetation (SAV), for their early juvenile stages. The short lifespan of bay scallops (1-2 yr) coupled...
Kish, George R.; Harrison, Arnell S.; Alderson, Mark
2008-01-01
The U.S. Geological Survey, in cooperation with the Sarasota Bay Estuary Program conducted a retrospective review of characteristics of the Sarasota Bay watershed in west-central Florida. This report describes watershed characteristics, surface- and ground-water processes, and the environmental setting of the Sarasota Bay watershed. Population growth during the last 50 years is transforming the Sarasota Bay watershed from rural and agriculture to urban and suburban. The transition has resulted in land-use changes that influence surface- and ground-water processes in the watershed. Increased impervious cover decreases recharge to ground water and increases overland runoff and the pollutants carried in the runoff. Soil compaction resulting from agriculture, construction, and recreation activities also decreases recharge to ground water. Conventional approaches to stormwater runoff have involved conveyances and large storage areas. Low-impact development approaches, designed to provide recharge near the precipitation point-of-contact, are being used increasingly in the watershed. Simple pollutant loading models applied to the Sarasota Bay watershed have focused on large-scale processes and pollutant loads determined from empirical values and mean event concentrations. Complex watershed models and more intensive data-collection programs can provide the level of information needed to quantify (1) the effects of lot-scale land practices on runoff, storage, and ground-water recharge, (2) dry and wet season flux of nutrients through atmospheric deposition, (3) changes in partitioning of water and contaminants as urbanization alters predevelopment rainfall-runoff relations, and (4) linkages between watershed models and lot-scale models to evaluate the effect of small-scale changes over the entire Sarasota Bay watershed. As urbanization in the Sarasota Bay watershed continues, focused research on water-resources issues can provide information needed by water-resources managers to ensure the future health of the watershed.
Millie, David F; Fahnenstiel, Gary L; Weckman, Gary R; Klarer, David M; Dyble, Julianne; Vanderploeg, Henry A; Fishman, Daniel B
2011-08-01
Phytoplankton and Microcystis aeruginosa (Kütz.) Kütz. biovolumes were characterized and modeled, respectively, with regard to hydrological and meteorological variables during zebra mussel invasion in Saginaw Bay (1990-1996). Total phytoplankton and Microcystis biomass within the inner bay were one and one-half and six times greater, respectively, than those of the outer bay. Following mussel invasion, mean total biomass in the inner bay decreased 84% but then returned to its approximate initial value. Microcystis was not present in the bay during 1990 and 1991 and thereafter occurred at 52% of sample sites/dates, with the greatest biomass occurring in 1994-1996 and within months having water temperatures >19°C. With an overall relative biomass of 0.03 ± 0.01 (mean ± SE), Microcystis had, at best, a marginal impact upon holistic compositional dynamics. Dynamics of the centric diatom Cyclotella ocellata Pant. and large pennate diatoms dominated compositional dissimilarities both inter- and intra-annually. The environmental variables that corresponded with phytoplankton distributions were similar for the inner and outer bays, and together identified physical forcing and biotic utilization of nutrients as determinants of system-level biomass patterns. Nonparametric models explained 70%-85% of the variability in Microcystis biovolumes and identified maximal biomass to occur at total phosphorus (TP) concentrations ranging from 40 to 45 μg/L. From isometric projections depicting modeled Microcystis/environmental interactions, a TP concentration of <30 μg/L was identified as a desirable contemporary "target" for management efforts to ameliorate bloom potentials throughout mussel-impacted bay waters.
Modelling Wind Effects on Subtidal Salinity in Apalachicola Bay, Florida
NASA Astrophysics Data System (ADS)
Huang, W.; Jones, W. K.; Wu, T. S.
2002-07-01
Salinity is an important factor for oyster and estuarine productivity in Apalachicola Bay. Observations of salinity at oyster reefs have indicated a high correlation between subtidal salinity variations and the surface winds along the bay axis in an approximately east-west direction. In this paper, we applied a calibrated hydrodynamic model to examine the surface wind effects on the volume fluxes in the tidal inlets and the subtidal salinity variations in the bay. Model simulations show that, due to the large size of inlets located at the east and west ends of this long estuary, surface winds have significant effects on the volume fluxes in the estuary inlets for the water exchanges between the estuary and ocean. In general, eastward winds cause inflow through the inlets at the western end and outflow through the inlets at the eastern end of the bay. Winds of 15 mph in the east-west direction can induce a 2000 m^3 s^-1 inflow of saline seawater into the bay through the inlets, a rate about 2.6 times the annual average freshwater inflow from the river. Due to the varied wind-induced volume fluxes in the inlets and the circulation in the bay, the time series of subtidal salinity at oyster reefs increases considerably during strong east-west wind conditions in comparison to salinity during windless conditions. In order to gain a better understanding of the characteristics of the wind-induced subtidal circulation and salinity variations, the researchers also conducted model simulations under constant east-west wind conditions. Results show that the volume fluxes are linearly proportional to the east-west wind stresses. Spatial distributions of daily average salinity and currents clearly show the significant effects of winds on the bay.
Varadhan, Ravi; Wang, Sue-Jane
2016-01-01
Treatment effect heterogeneity is a well-recognized phenomenon in randomized controlled clinical trials. In this paper, we discuss subgroup analyses with prespecified subgroups of clinical or biological importance. We explore various alternatives to the naive (traditional univariate) subgroup analyses to address the issues of multiplicity and confounding. Specifically, we consider a model-based Bayesian shrinkage approach (Bayes-DS) and a nonparametric, empirical Bayes shrinkage approach (Emp-Bayes) to temper the optimism of traditional univariate subgroup analyses; a standardization approach that accounts for correlation between baseline covariates; and a model-based maximum likelihood estimation (MLE) approach. The Bayes-DS and Emp-Bayes methods model the variation in subgroup-specific treatment effect rather than testing the null hypothesis of no difference between subgroups. The standardization approach addresses the issue of confounding in subgroup analyses. The MLE approach is considered only for comparison in simulation studies as the “truth”, since the data were generated from the same model. Using the characteristics of a hypothetical large outcome trial, we perform simulation studies and articulate the utilities and potential limitations of these estimators. Simulation results indicate that Bayes-DS and Emp-Bayes can protect against the optimism present in the naïve approach. Due to its simplicity, the naïve approach should be the reference for reporting univariate subgroup-specific treatment effect estimates from exploratory subgroup analyses. Standardization, although it tends to have a larger variance, is suggested when it is important to address the confounding of univariate subgroup effects due to correlation between baseline covariates. The Bayes-DS approach is available as an R package (DSBayes).
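To make the shrinkage idea concrete: an empirical Bayes estimator pulls each subgroup's effect estimate toward the overall mean, with the amount of pooling set by the ratio of between-subgroup variance to total variance. A minimal R sketch with toy numbers and a method-of-moments variance estimate, not the paper's Emp-Bayes implementation:

    # R: empirical Bayes shrinkage of subgroup treatment effects
    est <- c(0.35, 0.10, -0.05, 0.22, 0.40)   # naive subgroup effect estimates
    se  <- c(0.15, 0.12,  0.20, 0.10, 0.25)   # their standard errors

    mu <- weighted.mean(est, 1 / se^2)        # overall precision-weighted mean
    # Between-subgroup variance, method of moments (truncated at zero)
    tau2 <- max(0, var(est) - mean(se^2))

    shrink <- tau2 / (tau2 + se^2)            # weight on each subgroup's own data
    eb <- mu + shrink * (est - mu)            # shrunken estimates
    round(cbind(naive = est, emp_bayes = eb), 3)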
Potential Inundation due to Rising Sea Levels in the San Francisco Bay Region
Knowles, Noah
2009-01-01
An increase in the rate of sea level rise is one of the primary impacts of projected global climate change. To assess potential inundation associated with a continued acceleration of sea level rise, the highest resolution elevation data available were assembled from various sources and mosaicked to cover the land surfaces of the San Francisco Bay region. Next, to quantify high water levels throughout the bay, a hydrodynamic model of the San Francisco Estuary was driven by a projection of hourly water levels at the Presidio. This projection was based on a combination of climate model outputs and empirical models and incorporates astronomical, storm surge, El Niño, and long-term sea level rise influences. Based on the resulting data, maps of areas vulnerable to inundation were produced, corresponding to specific amounts of sea level rise and recurrence intervals. These maps portray areas where inundation will likely be an increasing concern. In the North Bay, wetland survival and developed fill areas are at risk. In Central and South bays, a key feature is the bay-ward periphery of developed areas that would be newly vulnerable to inundation. Nearly all municipalities adjacent to South Bay face this risk to some degree. For the Bay as a whole, as early as 2050 under this scenario, the one-year peak event nearly equals the 100-year peak event in 2000. Maps of vulnerable areas are presented and some implications discussed.
NASA Astrophysics Data System (ADS)
Cloern, J.
2008-12-01
Programs to ensure sustainability of coastal ecosystems and the biological diversity they harbor require ecological forecasting to assess habitat transformations from the coupled effects of climate change and human population growth. A multidisciplinary modeling project (CASCaDE) was launched in 2007 to develop 21st-century visions of the Sacramento-San Joaquin Delta and San Francisco Bay under four scenarios of climate change and increasing demand for California's water resource. The process begins with downscaled projections of daily weather from GCM's and routes these to a watershed model that computes runoff and an operations model that computes inflows to the Bay-Delta. Hydrologic and climatic outputs, including sea level rise, drive models of tidal hydrodynamics-salinity-temperature in the Delta, sediment inputs and evolving geomorphology of San Francisco Bay. These projected habitat changes are being used to address priority questions asked by resource managers: How will changes in seasonal streamflow, salinity and water temperature, frequency of extreme weather and hydrologic events, and geomorphology influence the sustainability of native species that depend upon the Bay-Delta and the ecosystem services it provides?
Machine learning of neural representations of suicide and emotion concepts identifies suicidal youth
Just, Marcel Adam; Pan, Lisa; Cherkassky, Vladimir L.; McMakin, Dana; Cha, Christine; Nock, Matthew K.; Brent, David
2017-01-01
The clinical assessment of suicidal risk would be significantly complemented by a biologically-based measure that assesses alterations in the neural representations of concepts related to death and life in people who engage in suicidal ideation. This study used machine-learning algorithms (Gaussian Naïve Bayes) to identify such individuals (17 suicidal ideators vs 17 controls) with high (91%) accuracy, based on their altered fMRI neural signatures of death- and life-related concepts. The most discriminating concepts were death, cruelty, trouble, carefree, good, and praise. A similar classifier accurately (94%) discriminated 9 suicidal ideators who had made a suicide attempt from 8 who had not. Moreover, a major facet of the concept alterations was the evoked emotion, whose neural signature served as an alternative basis for accurate (85%) group classification. The study establishes a biological, neurocognitive basis for altered concept representations in participants with suicidal ideation, which enables highly accurate group membership classification.
Automatic discovery of optimal classes
NASA Technical Reports Server (NTRS)
Cheeseman, Peter; Stutz, John; Freeman, Don; Self, Matthew
1986-01-01
A criterion, based on Bayes' theorem, is described that defines the optimal set of classes (a classification) for a given set of examples. This criterion is transformed into an equivalent minimum message length criterion with an intuitive information interpretation. This criterion does not require that the number of classes be specified in advance; the number is determined by the data. The minimum message length criterion includes the message length required to describe the classes, so there is a built-in bias against adding new classes unless they lead to a reduction in the message length required to describe the data. Unfortunately, the search space of possible classifications is too large to search exhaustively, so heuristic search methods, such as simulated annealing, are applied. Tutored learning and probabilistic prediction in particular cases are an important indirect result of optimal class discovery. Extensions to the basic class induction program include the ability to combine categorical and real-valued data, hierarchical classes, independent classifications, and deciding for each class which attributes are relevant.
Zhang, Wenyu; Zhang, Zhenjiang
2015-01-01
Decision fusion in sensor networks enables sensors to improve classification accuracy while reducing the energy consumption and bandwidth demand for data transmission. In this paper, we focus on the decentralized multi-class classification fusion problem in wireless sensor networks (WSNs), and a new simple but effective decision fusion rule based on belief function theory is proposed. Unlike existing belief function based decision fusion schemes, the proposed approach is compatible with any type of classifier because the basic belief assignments (BBAs) of each sensor are constructed on the basis of the classifier's training-output confusion matrix and real-time observations. We also derive an explicit global BBA in the fusion center under Dempster's combinational rule, greatly simplifying the decision-making operation in the fusion center. Also, sending the whole BBA structure to the fusion center is avoided. Experimental results demonstrate that the proposed fusion rule has better performance in fusion accuracy compared with the naïve Bayes rule and weighted majority voting rule.
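The combination step at the heart of this scheme, Dempster's rule, is short to write down. A minimal R sketch, assuming each sensor's BBA puts mass only on singleton classes plus the whole frame Theta; the paper's confusion-matrix construction of the BBAs is not reproduced here:

    # R: Dempster's rule for BBAs over singletons {a, b, c} plus the frame Theta
    combine <- function(m1, m2) {
      classes <- setdiff(names(m1), "Theta")
      m <- setNames(numeric(length(m1)), names(m1))
      for (cl in classes)          # non-conflicting pairs: same singleton, or Theta
        m[cl] <- m1[cl] * m2[cl] + m1[cl] * m2["Theta"] + m1["Theta"] * m2[cl]
      m["Theta"] <- m1["Theta"] * m2["Theta"]
      K <- 1 - sum(m)              # mass assigned to conflicting focal elements
      m / (1 - K)                  # renormalize (Dempster's rule)
    }

    # Two sensors' BBAs for a three-class problem
    s1 <- c(a = 0.6, b = 0.1, c = 0.1, Theta = 0.2)
    s2 <- c(a = 0.5, b = 0.3, c = 0.0, Theta = 0.2)
    fused <- combine(s1, s2)
    names(which.max(fused[setdiff(names(fused), "Theta")]))  # fused decision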
A New Direction of Cancer Classification: Positive Effect of Low-Ranking MicroRNAs.
Li, Feifei; Piao, Minghao; Piao, Yongjun; Li, Meijing; Ryu, Keun Ho
2014-10-01
Many studies based on microRNA (miRNA) expression profiles have shown a new aspect of cancer classification. Because one characteristic of miRNA expression data is its high dimensionality, feature selection methods have been used to facilitate dimensionality reduction. These feature selection methods have one shortcoming thus far: they only consider the cases where the feature-to-class relationship is 1:1 or n:1. However, because one miRNA may influence more than one type of cancer, such miRNAs tend to be ranked low by traditional feature selection methods and are removed most of the time. Given the limited number of miRNAs, low-ranking miRNAs are also important to cancer classification. We considered both high- and low-ranking features to cover all cases (1:1, n:1, 1:n, and m:n) in cancer classification. First, we used the correlation-based feature selection method to select the high-ranking miRNAs, and chose the support vector machine, Bayes network, decision tree, k-nearest-neighbor, and logistic classifiers to perform cancer classification. Then, we chose the Chi-square test, information gain, gain ratio, and Pearson's correlation feature selection methods to build the m:n feature subset, and used the selected miRNAs for cancer classification. The low-ranking miRNA expression profiles achieved higher classification accuracy compared with using only the high-ranking miRNAs from traditional feature selection methods. Our results demonstrate the positive effect of low-ranking miRNAs, captured by the m:n feature subset, on cancer classification.
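As an illustration of one filter method named here, the following R sketch ranks features by a chi-square statistic against the class label after quartile discretization (the expression matrix and the discretization choice are ours, not the authors'):

    # R: rank features by chi-square statistic against the class label
    set.seed(3)
    X <- matrix(rnorm(60 * 20), 60, 20,
                dimnames = list(NULL, paste0("miR", 1:20)))
    y <- factor(rep(c("tumor", "normal"), each = 30))
    X[y == "tumor", 1:3] <- X[y == "tumor", 1:3] + 2   # three informative features

    chi_stat <- apply(X, 2, function(f) {
      bins <- cut(f, breaks = quantile(f, 0:4 / 4), include.lowest = TRUE)
      suppressWarnings(chisq.test(table(bins, y))$statistic)
    })
    head(sort(chi_stat, decreasing = TRUE))            # top-ranked features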
NASA Astrophysics Data System (ADS)
Best, Sara; Lundrigan, Sarah; Demirov, Entcho; Wroblewski, Joe
2011-10-01
Gilbert Bay on the southeast coast of Labrador is the site of the first Marine Protected Area (MPA) established in the subarctic coastal zone of eastern Canada. The MPA was created to conserve a genetically distinctive population of Atlantic cod, Gadus morhua. This article presents results from a study of the interannual variability in atmospheric and physical oceanographic characteristics of Gilbert Bay over the period 1949-2006. We describe seasonal and interannual variability of the atmospheric parameters at the sea surface in the bay. The interannual variability of the atmosphere in the Gilbert Bay region is related to the North Atlantic Oscillation (NAO) and a recent warming trend in the local climate of coastal Labrador. The related changes in seawater temperature, salinity and sea-ice thickness in winter are simulated with a one-dimensional water column model, the General Ocean Turbulence Model (GOTM). A warming Gilbert Bay ecosystem would be favorable for cod growth, but reduced sea-ice formation during the winter months increases the danger of traveling across the bay by snowmobile.
Predicting the vertical structure of tidal current and salinity in San Francisco Bay, California
Ford, Michael; Wang, Jia; Cheng, Ralph T.
1990-01-01
A two-dimensional laterally averaged numerical estuarine model is developed to study the vertical variations of tidal hydrodynamic properties in the central/north part of San Francisco Bay, California. Tidal stage data, current meter measurements, and conductivity, temperature, and depth profiling data in San Francisco Bay are used for comparison with model predictions. An extensive review of the literature is conducted to assess the success and failure of previous similar investigations and to establish a strategy for development of the present model. A σ plane transformation is used in the vertical dimension to alleviate problems associated with fixed grid model applications in the bay, where the tidal range can be as much as 20–25% of the total water depth. Model predictions of tidal stage and velocity compare favorably with the available field data, and prototype salinity stratification is qualitatively reproduced. Conclusions from this study as well as future model applications and research needs are discussed.
Modeling the tides of Massachusetts and Cape Cod Bays
Jenter, H.L.; Signell, R.P.; Blumberg, A.F.; ,
1993-01-01
A time-dependent, three-dimensional numerical modeling study of the tides of Massachusetts and Cape Cod Bays, motivated by construction of a new sewage treatment plant and ocean outfall for the city of Boston, has been undertaken by the authors. The numerical model being used is a hybrid version of the Blumberg and Mellor ECOM3D model, modified to include a semi-implicit time-stepping scheme and transport of a non-reactive dissolved constituent. Tides in the bays are dominated by the semi-diurnal frequencies, in particular by the M2 tide, due to the resonance of these frequencies in the Gulf of Maine. The numerical model reproduces measured tidal ellipses well in unstratified wintertime conditions. Stratified conditions present more of a problem because tidal-frequency internal wave generation and propagation significantly complicate the structure of the resulting tidal field. Nonetheless, the numerical model reproduces qualitative aspects of the stratified tidal flow that are consistent with observations in the bays.
A&M. TAN-607. Elevation for second-phase expansion of A&M Building. Work areas south of the Carpentry Shop. High-bay shop, decontamination room at south-most end. Approved by INEEL Classification Office for public release. Ralph M. Parsons 1299-5-ANP/GE-3-607-A 106. Date: August 1956. INEEL index code no. 034-0607-00-693-107166 - Idaho National Engineering Laboratory, Test Area North, Scoville, Butte County, ID
1977-03-01
preserved in 70% ethanol for future reference. Periphyton (attached algae): periphyton from the rivers are being collected and periphyton from bear...The most abundant phytoplankton include: Asterionella formosa, Tabellaria fenestrata, Melosira granulata, Dinobryon sp., Synedra acus, and Cyclotella sp...listed in Table 5 below. Table 5 (Aquatic Habitats, Mile Post 7) tabulates, for each water body, the site classification, species of major importance, and major benthic substrates.
Ferragina, A.; de los Campos, G.; Vazquez, A. I.; Cecchinato, A.; Bittante, G.
2017-01-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict “difficult-to-predict” dairy traits, such as milk fatty acid (FA) expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm−1 were available and averaged before data analysis. Three Bayesian models: Bayesian ridge regression (Bayes RR), Bayes A, and Bayes B, and 2 reference models: PLS and modified PLS (MPLS) procedures, were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving from calibration to external validation methods, and in moving from PLS and MPLS to Bayesian methods, particularly Bayes A and Bayes B. The maximum R2 values of validation were obtained with Bayes B and Bayes A. Among the FA, C10:0 (% of each FA on a total-FA basis) had the highest R2 (0.75, achieved with Bayes A and Bayes B), and among the technological traits, fresh cheese yield had the highest R2 (0.82, achieved with Bayes B). These 2 methods have proven to be useful instruments in shrinking and selecting very informative wavelengths and inferring the structure and functions of the analyzed traits. We conclude that Bayesian models are powerful tools for deriving calibration equations, and, importantly, these equations can be easily developed using existing open-source software. As part of our study, we provide scripts based on the open-source R software BGLR, which can be used to train customized prediction equations for other traits or populations.
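Since the authors note that their equations were built with the open-source BGLR package, a hedged calibration sketch is easy to give. The spectra below are simulated placeholders for a real wavenumber matrix; BGLR's convention of predicting records whose response is set to NA mimics the training-testing split:

    # R: Bayesian ridge and Bayes B calibrations with the BGLR package
    library(BGLR)
    set.seed(1)
    n <- 200; p <- 300                       # placeholder sizes; real FTIR spectra
    X <- matrix(rnorm(n * p), n, p)          # would have thousands of wavenumbers
    y <- drop(X[, 1:10] %*% rnorm(10)) + rnorm(n)  # toy trait, e.g., an FA percentage

    # Hold out a test set to mimic the training-testing validation
    tst <- sample(n, 50)
    yNA <- y; yNA[tst] <- NA                 # BGLR predicts records with NA response

    fit <- BGLR(y = yNA,
                ETA = list(list(X = X, model = "BRR")),  # or "BayesA", "BayesB"
                nIter = 6000, burnIn = 1000, verbose = FALSE)

    cor(fit$yHat[tst], y[tst])^2             # validation R2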
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stein, Peter J.; Edson, Patrick L.
2013-12-20
This project saw the completion of the design and development of a second-generation, high frequency (90-120 kHz) Subsurface-Threat Detection Sonar Network (SDSN). The system was deployed, operated, and tested in Cobscook Bay, Maine, near the site of the Ocean Renewable Power Company TidGen™ power unit. This effort resulted in a very successful demonstration of the SDSN detection, tracking, localization, and classification capabilities in a high-current, MHK environment, as measured by results from the detection and tracking trials in Cobscook Bay. The new high frequency node, designed to operate outside the hearing range of a subset of marine mammals, was shown to detect and track objects of marine mammal-like target strength to ranges of approximately 500 meters. This performance range results in the SDSN system tracking objects for a significant duration - on the order of minutes - even in a tidal flow of 5-7 knots, potentially allowing time for MHK system or operator decision-making if marine mammals are present. Having demonstrated detection and tracking of synthetic targets with target strengths similar to some marine mammals, the primary hurdle to eventual automated monitoring is a dataset of actual marine mammal kinematic behavior and modifying the tracking algorithms and parameters, which are currently tuned to human diver kinematics and classification.
Linear and Order Statistics Combiners for Pattern Classification
NASA Technical Reports Server (NTRS)
Tumer, Kagan; Ghosh, Joydeep; Lau, Sonie (Technical Monitor)
2001-01-01
Several researchers have experimentally shown that substantial improvements can be obtained in difficult pattern recognition problems by combining or integrating the outputs of multiple classifiers. This chapter provides an analytical framework to quantify the improvements in classification results due to combining. The results apply to both linear combiners and order statistics combiners. We first show that, to a first-order approximation, the error rate obtained over and above the Bayes error rate is directly proportional to the variance of the actual decision boundaries around the Bayes optimum boundary. Combining classifiers in output space reduces this variance, and hence reduces the 'added' error. If N unbiased classifiers are combined by simple averaging, the added error rate can be reduced by a factor of N if the individual errors in approximating the decision boundaries are uncorrelated. Expressions are then derived for linear combiners which are biased or correlated, and the effect of output correlations on ensemble performance is quantified. For order statistics based non-linear combiners, we derive expressions that indicate how much the median, the maximum and, in general, the i-th order statistic can improve classifier performance. The analysis presented here facilitates the understanding of the relationships among error rates, classifier boundary distributions, and combining in output space. Experimental results on several public domain data sets are provided to illustrate the benefits of combining and to support the analytical results.
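In code, the combiners analyzed here reduce to row-wise statistics over the classifiers' posterior outputs. A small R illustration with randomly generated stand-ins for N classifiers' class-posterior estimates:

    # R: linear (mean) and order-statistics (median, max) combiners
    # posts: array of posterior estimates, dim = (classifier, sample, class)
    set.seed(42)
    N <- 5; n <- 100; C <- 3
    posts <- array(runif(N * n * C), dim = c(N, n, C))
    posts <- sweep(posts, c(1, 2), apply(posts, c(1, 2), sum), "/")  # normalize

    combine <- function(stat) apply(posts, c(2, 3), stat)  # n x C combined scores
    pred_mean   <- max.col(combine(mean))     # averaging (linear) combiner
    pred_median <- max.col(combine(median))   # order statistic: median
    pred_max    <- max.col(combine(max))      # order statistic: maximum
    table(pred_mean, pred_median)             # compare the two decision rules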
Coupling Fluvial and Oceanic Drivers in Flooding Forecasts for San Francisco Bay
NASA Astrophysics Data System (ADS)
Herdman, L.; Kim, J.; Cifelli, R.; Barnard, P.; Erikson, L. H.; Johnson, L. E.; Chandrasekar, V.
2016-12-01
San Francisco Bay is a highly urbanized estuary, and the surrounding communities are susceptible to flooding along the bay shoreline and the inland rivers and creeks that drain to the Bay. A forecast model that integrates fluvial and oceanic drivers is necessary for predicting flooding in this complex urban environment. This study introduces the state-of-the-art coupling of the USGS Coastal Storm Modeling System (CoSMoS) with the NWS Research Distributed Hydrologic Model (RDHM) for San Francisco Bay. For this application, we utilize Delft3D-FM, a hydrodynamic model based on a flexible mesh grid, to calculate water levels that account for tidal forcing, seasonal water level anomalies, surge, and in-Bay generated wind waves from the wind and pressure fields of a NWS forecast model. The tributary discharges from RDHM are dynamic and meteorologically driven, allowing for operational use of CoSMoS, which has previously relied on statistical estimates of river discharge. The flooding extent is determined by overlaying the resulting maximum water levels onto a recently updated 2-m digital elevation model of the study area, which best resolves the extensive levee and tidal marsh systems in the region. The results we present here are focused on the interaction of the Bay and the Napa River watershed. This study demonstrates the interoperability of the CoSMoS and RDHM prediction models. We also use this pilot region to examine storm flooding impacts in a series of storm scenarios that simulate 5- to 100-year return-period events in terms of either coastal or fluvial drivers. These scenarios demonstrate the wide range of possible flooding outcomes considering rainfall recurrence intervals, soil moisture conditions, storm surge, wind speed, and tides (spring and neap). With a simulated set of over 25 storm scenarios, we show how the extent, level, and duration of flooding depend on these atmospheric and hydrologic parameters, and we also determine a range of likely flood events.
Wind effect on salt transport variability in the Bay of Bengal
NASA Astrophysics Data System (ADS)
Sandeep, K. K.; Pant, V.
2017-12-01
The Bay of Bengal (BoB) exhibits large spatial variability in its sea surface salinity (SSS) pattern, caused by its unique hydrological, meteorological and oceanographic characteristics. This SSS variability is largely controlled by the seasonally reversing monsoon winds and the associated currents. Further, the BoB receives substantial freshwater inputs through excess precipitation over evaporation and river discharge. Rivers such as the Ganges, Brahmaputra, Mahanadi, Krishna, Godavari, and Irrawaddy annually discharge a freshwater volume of between 1.5 x 10^12 and 1.83 x 10^13 m^3 into the bay. A major volume of this freshwater input to the bay occurs during the southwest monsoon (June-September) period. In the present study, the relative role of winds in the SSS variability in the bay is investigated using an eddy-resolving, three-dimensional Regional Ocean Modeling System (ROMS) numerical model. The model is configured with the realistic bathymetry and coastline of the study region and forced with a daily climatology of atmospheric variables. River discharges from the major rivers are distributed at the model grid points representing their respective geographic locations. Salt transport estimates from the realistic model simulation are compared with standard reference datasets. Further, different experiments were carried out with idealized surface wind forcing representing normal, low, high, and very high wind speed conditions in the bay, while retaining the realistic daily varying directions for all cases. The experimental simulations exhibit distinct dispersal patterns of the freshwater plume and SSS in response to the idealized winds. Comparison of the meridional and zonal surface salt transport estimated for each experiment showed strong seasonality of varying magnitude in the bay, with maximum spatial and temporal variability in the western and northern parts of the BoB.
Analysing Twitter and web queries for flu trend prediction.
Santos, José Carlos; Matos, Sérgio
2014-05-07
Social media platforms encourage people to share diverse aspects of their daily life. Among these, shared health-related information might be used to infer health status and incidence rates for specific conditions or symptoms. In this work, we present an infodemiology study that evaluates the use of Twitter messages and search engine query logs to estimate and predict the incidence rate of influenza-like illness in Portugal. Based on a manually classified dataset of 2704 tweets from Portugal, we selected a set of 650 textual features to train a Naïve Bayes classifier to identify tweets mentioning flu or flu-like illness or symptoms. We obtained a precision of 0.78 and an F-measure of 0.83, based on cross validation over the complete annotated set. Furthermore, we trained a multiple linear regression model to estimate the health-monitoring data from the Influenzanet project, using as predictors the relative frequencies obtained from the tweet classification results and from query logs, and achieved a correlation ratio of 0.89 (p<0.001). These classification and regression models were also applied to estimate the flu incidence in the following flu season, achieving a correlation of 0.72. Previous studies addressing the estimation of disease incidence based on user-generated content have mostly focused on the English language. Our results further validate those studies and show that, by changing the initial steps of data preprocessing and feature extraction and selection, the proposed approaches can be adapted to other languages. Additionally, we investigated whether the predictive model created can be applied to data from the subsequent flu season. In this case, although the prediction result was good, an initial phase to adapt the regression model could be necessary to achieve more robust results.
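The second stage of this pipeline, mapping classifier and query-log frequencies to an incidence rate, is an ordinary multiple linear regression. A minimal R sketch with simulated weekly aggregates (flu_tweets and queries are hypothetical stand-ins for the paper's predictors):

    # R: regress surveillance incidence on tweet and query-log frequencies
    set.seed(7)
    weeks <- data.frame(
      flu_tweets = runif(30),                # weekly share of tweets flagged by NB
      queries    = runif(30)                 # weekly relative query frequency
    )
    weeks$ili <- 2 + 5 * weeks$flu_tweets + 3 * weeks$queries + rnorm(30, sd = 0.3)

    fit <- lm(ili ~ flu_tweets + queries, data = weeks)
    summary(fit)$r.squared                   # the paper reports r = 0.89 on real data
    predict(fit, newdata = data.frame(flu_tweets = 0.4, queries = 0.5))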
Rajendran, Senthilnathan; Jothi, Arunachalam
2018-05-16
The three-dimensional structure of a protein depends on the interactions between its amino acid residues. These interactions are in turn influenced by various biophysical properties of the amino acids. There are several examples of proteins that share the same fold but are very dissimilar at the sequence level. For proteins to share a common fold, some crucial interactions should be maintained despite insignificant sequence similarity. Since the interactions arise from the biophysical properties of the amino acids, we should be able to detect descriptive patterns for folds at such a property level. Along these lines, the main focus of our research is to analyze such proteins and to characterize them in terms of their biophysical properties. Protein structures with sequence similarity lower than 40% were selected for ten different subfolds from three different mainfolds (according to the CATH classification) and were used for this analysis. We used the normalized values of 49 physico-chemical, energetic and conformational properties of amino acids. We characterize the folds based on their average biophysical property values. We also observed fold-specific correlational behavior of biophysical properties despite a very low sequence similarity in our data. We further trained three different binary classification models (Naive Bayes-NB, Support Vector Machines-SVM and Bayesian Generalized Linear Model-BGLM) which could discriminate mainfolds based on the biophysical properties. We also show that, among the three generated models, the BGLM classifier was able to discriminate protein sequences in the all-beta category with 81.43% accuracy and all-alpha and alpha-beta proteins with 83.37% accuracy. Copyright © 2018 Elsevier Ltd. All rights reserved.
Missisquoi Bay Phosphorus Model Addendum
This technical memorandum provides results of an extended load reduction simulation. The memorandum serves as an addendum to the main Missisquoi Bay Phosphorus Mass Balance Model report prepared for the Lake Champlain Basin Program by LimnoTech in 2012.
Sentiment analysis system for movie review in Bahasa Indonesia using naive bayes classifier method
NASA Astrophysics Data System (ADS)
Nurdiansyah, Yanuar; Bukhori, Saiful; Hidayat, Rahmad
2018-04-01
Sentiment appears in many kinds of documents; one important source is the sentiment expressed in product or service reviews. It is therefore important to be able to process and extract textual data from such documents. We propose a system that classifies sentiments from review documents into two classes: positive sentiment and negative sentiment. We use the Naive Bayes Classifier method in the document classification system that we build. We chose Movienthusiast, a website of movie reviews in Bahasa Indonesia, as the source of our review documents. From there, we collected 1201 movie reviews: 783 positive reviews and 418 negative reviews, which we used as the dataset for this machine-learning classifier. Classification accuracy averaged 88.37% over five accuracy-measuring attempts on this dataset.
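A minimal scikit-learn sketch of this kind of Naive Bayes review classifier, averaging accuracy over five random train/test splits as in the evaluation above; the reviews, labels, and split size are toy stand-ins (the actual system used 1201 Bahasa Indonesia reviews).

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import train_test_split

# Toy stand-ins for labeled movie reviews (1 = positive, 0 = negative).
reviews = ["great film, loved the acting", "boring plot and weak ending",
           "a masterpiece of direction", "terrible pacing, fell asleep",
           "wonderful soundtrack and story", "awful script and bad acting",
           "brilliant and moving", "dull, predictable, disappointing"]
labels = np.array([1, 0, 1, 0, 1, 0, 1, 0])

accuracies = []
for seed in range(5):  # five accuracy-measuring attempts, new split each time
    Xtr, Xte, ytr, yte = train_test_split(reviews, labels, test_size=0.25,
                                          random_state=seed, stratify=labels)
    model = make_pipeline(CountVectorizer(), MultinomialNB()).fit(Xtr, ytr)
    accuracies.append(model.score(Xte, yte))
print(np.mean(accuracies))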
Skylab/EREP application to ecological, geological, and oceanographic investigations of Delaware Bay
NASA Technical Reports Server (NTRS)
Klemas, V.; Bartlett, D. S.; Philpot, W. D.; Rogers, R. H.; Reed, L. E.
1978-01-01
Skylab/EREP S190A and S190B film products were optically enhanced and visually interpreted to extract data suitable for: (1) mapping coastal land use; (2) inventorying wetlands vegetation; (3) monitoring tidal conditions; (4) observing suspended sediment patterns; (5) charting surface currents; (6) locating coastal fronts and water mass boundaries; (7) monitoring industrial and municipal waste dumps in the ocean; (8) determining the size and flow direction of river, bay and man-made discharge plumes; and (9) observing ship traffic. Film products were visually analyzed to identify and map ten land-use and vegetation categories at a scale of 1:125,000. Digital tapes from the multispectral scanner were used to prepare thematic maps of land use. Classification accuracies, obtained by comparing the derived thematic land-use maps with USGS-CARETS land-use maps in southern Delaware, ranged from 44 percent to 100 percent.
ECOSYSTEM MODELING IN COBSCOOK BAY, MAINE: A SUMMARY, PERSPECTIVE, AND LOOK FORWARD
In the mid-1990s, an interdisciplinary, multi-institutional team of scientists was assembled to address basic issues concerning biological productivity and the unique co-occurrence of many unusual ecological features in Cobscook Bay, Maine. Cobscook Bay is a geologically complex,...
Wang, Hongqing; Hladik, C.M.; Huang, W.; Milla, K.; Edmiston, L.; Harwell, M.A.; Schalles, J.F.
2010-01-01
Apalachicola Bay, Florida, accounts for 90% of Florida's and 10% of the nation's eastern oyster (Crassostrea virginica) harvesting. Chlorophyll-a concentration and total suspended solids (TSS) are two important water quality variables, among other environmental factors such as salinity, for eastern oyster production in Apalachicola Bay. In this research, we developed regression models of the relationships between the reflectance of the Moderate-Resolution Imaging Spectroradiometer (MODIS) Terra 250 m data and the two water quality variables, based on Bay-wide field data collected during 14-17 October 2002, a relatively dry period, and 3-5 April 2006, a relatively wet period, respectively. We then selected the best regression models (highest coefficient of determination, R²) to derive Bay-wide maps of chlorophyll-a concentration and TSS for the two periods. The MODIS-derived maps revealed large spatial and temporal variations in chlorophyll-a concentration and TSS across the entire Apalachicola Bay. © 2010 Taylor & Francis.
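The model-selection step here (fitting several candidate regressions of a water quality variable on MODIS reflectance and keeping the one with the highest R²) might look like the following sketch; the band values, chlorophyll-a samples, and candidate predictor forms are invented for illustration, not the study's actual bands or models.

import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical co-located samples: MODIS 250 m band reflectances and field chl-a.
b1 = np.array([0.021, 0.034, 0.028, 0.045, 0.039, 0.052])  # red band
b2 = np.array([0.010, 0.019, 0.013, 0.030, 0.024, 0.037])  # near-infrared band
chl = np.array([3.1, 5.2, 4.0, 8.1, 6.6, 9.4])             # chlorophyll-a, ug/L

candidates = {
    "b1 only": b1.reshape(-1, 1),
    "b2 only": b2.reshape(-1, 1),
    "ratio b2/b1": (b2 / b1).reshape(-1, 1),
    "b1 and b2": np.column_stack([b1, b2]),
}
scores = {name: LinearRegression().fit(X, chl).score(X, chl)  # R^2
          for name, X in candidates.items()}
best = max(scores, key=scores.get)
print(best, round(scores[best], 3))  # candidate used to map the whole bay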
Ma, Jian; Lu, Chen; Liu, Hongmei
2015-01-01
The aircraft environmental control system (ECS) is a critical aircraft system, which provides the appropriate environmental conditions to ensure the safe transport of air passengers and equipment. The functionality and reliability of ECS have received increasing attention in recent years. The heat exchanger is a particularly significant component of the ECS, because its failure decreases the system’s efficiency, which can lead to catastrophic consequences. Fault diagnosis of the heat exchanger is necessary to prevent risks. However, two problems hinder the implementation of the heat exchanger fault diagnosis in practice. First, the actual measured parameter of the heat exchanger cannot effectively reflect the fault occurrence, whereas the heat exchanger faults are usually depicted by utilizing the corresponding fault-related state parameters that cannot be measured directly. Second, both the traditional Extended Kalman Filter (EKF) and the EKF-based Double Model Filter have certain disadvantages, such as sensitivity to modeling errors and difficulties in selection of initialization values. To solve the aforementioned problems, this paper presents a fault-related parameter adaptive estimation method based on strong tracking filter (STF) and Modified Bayes classification algorithm for fault detection and failure mode classification of the heat exchanger, respectively. Heat exchanger fault simulation is conducted to generate fault data, through which the proposed methods are validated. The results demonstrate that the proposed methods are capable of providing accurate, stable, and rapid fault diagnosis of the heat exchanger. PMID:25823010
DEVELOP Chesapeake Bay Watershed Hydrology - UAV Sensor Web
NASA Astrophysics Data System (ADS)
Holley, S. D.; Baruah, A.
2008-12-01
The Chesapeake Bay is the largest estuary in the United States, with a watershed extending through six states and the nation's capital. Urbanization and agricultural practices have led to an excess runoff of nutrients and sediment into the bay. Nutrient and sediment loading stimulate the growth of algal blooms associated with various problems, including localized dissolved oxygen deficiencies, toxic algal blooms and death of marine life. The Chesapeake Bay Program, among other stakeholder organizations, contributes greatly to the restoration efforts of the Chesapeake Bay. These stakeholders contribute in many ways, such as monitoring the water quality, leading clean-up projects, and actively restoring native habitats. The first stage of the DEVELOP Chesapeake Bay Coastal Management project, relating to water quality, contributed to the restoration efforts by introducing NASA satellite-based water quality data products to the stakeholders as a complement to their current monitoring methods. The second stage, to be initiated in the fall 2008 internship term, will focus on the impacts of land cover variability within the Chesapeake Bay Watershed. Multiple student-led discussions with members of the Land Cover team at the Chesapeake Bay Program Office during the DEVELOP GSFC 2008 summer term uncovered the need for remote sensing data for hydrological mapping in the watershed. The Chesapeake Bay Program expressed in repeated discussions on land cover mapping that significant portions of upper river areas, streams, and the land directly interfacing those waters are not accurately depicted in the watershed model. Without such hydrological mapping correlated with land cover data, the model will not be useful in depicting source areas of nutrient loading, which has an ecological and economic impact in and around the Chesapeake Bay. The fall 2008 DEVELOP team will examine the use of UAV-flown sensors in connection with in-situ and Earth observation satellite data. To maximize the web of data, students will also examine NASA's research into self-organizing neural networks to ensure the data is correlated in such a manner as to support the sensor web connections. Additionally, students will learn the operation and functionality of the Chesapeake Bay Program's watershed model to examine and determine the potential for integration of the sensor web data into the watershed model.
Tidal-flow, circulation, and flushing changes caused by dredge and fill in Hillsborough Bay, Florida
Goodwin, Carl R.
1991-01-01
Hillsborough Bay, Florida, underwent extensive physical changes between 1880 and 1972 because of the construction of islands, channels, and shoreline fills. These changes resulted in a progressive reduction in the quantity of tidal water that enters and leaves the bay. Dredging and filling also changed the magnitude and direction of tidal flow in most of the bay. A two-dimensional, finite-difference hydrodynamic model was used to simulate flood, ebb, and residual water transport for physical conditions in Hillsborough Bay and the northeastern part of Middle Tampa Bay during 1880, 1972, and 1985. The calibrated and verified model was used to evaluate cumulative water-transport changes resulting from construction in the study area between 1880 and 1972. The model also was used to evaluate water-transport changes as a result of a major Federal dredging project completed in 1985. The model indicates that transport changes resulting from the Federal dredging project are much less areally extensive than the corresponding transport changes resulting from construction between 1880 and 1972. Dredging-caused changes of more than 50 percent in flood and ebb water transport were computed to occur over only about 8 square miles of the 65-square-mile study area between 1972 and 1985. Model results indicate that construction between 1880 and 1972 caused changes of similar magnitude over about 23 square miles. Dredging-caused changes of more than 50 percent in residual water transport were computed to occur over only 17 square miles between 1972 and 1985. Between 1880 and 1972, changes of similar magnitude were computed to occur over an area of 45 square miles. Model results also reveal historical tide-induced circulation patterns. The patterns consist of a series of about 8 interconnected circulatory features in 1880 and as many as 15 in 1985. Dredging- and construction-caused changes in number, size, position, shape, and intensity of the circulatory features increase tide-induced circulation throughout the bay. Circulation patterns for 1880, 1972, and 1985 levels of development differ in many details, but all exhibit residual landward flow of water in the deep, central part of the bay and residual seaward flow in the shallows along the bay margins. This general residual flow pattern is confirmed by both computed transport of a hypothetical constituent and long-term salinity observations in Hillsborough Bay. This residual-flow concept has been used to estimate the average time it takes a particle to move from the head to the mouth of the bay. The mean transit time was computed to be 58 days in 1880 and 29 days in 1972 and 1985. This increase in circulation and decrease in transit time since 1880 are estimated to have caused an increase in the average salinity of Hillsborough Bay of about 2 parts per thousand. Dredge and fill construction is concluded to have significantly increased circulation and flushing between 1880 and 1972. Little circulation or flushing change is attributed to dredging activity since 1972.
Acosta-Mesa, Héctor Gabriel; Cruz-Ramírez, Nicandro; Hernández-Jiménez, Rodolfo
2017-01-01
Efforts have been made to improve the diagnostic performance of colposcopy, trying to help better diagnose cervical cancer, particularly in developing countries. However, improvements in a number of areas are still necessary, such as the time it takes to process the full digital image of the cervix, the performance of the computing systems used to identify different kinds of tissues, and biopsy sampling. In this paper, we explore three different, well-known automatic classification methods (k-Nearest Neighbors, Naïve Bayes, and C4.5), in addition to different data models that take full advantage of this information and improve the diagnostic performance of colposcopy based on acetowhite temporal patterns. Based on the ROC and PRC area scores, the k-Nearest Neighbors and discrete PLA representation performed better than other methods. The values of sensitivity, specificity, and accuracy reached using this method were 60% (95% CI 50–70), 79% (95% CI 71–86), and 70% (95% CI 60–80), respectively. The acetowhitening phenomenon is not exclusive to high-grade lesions, and we have found acetowhite temporal patterns of epithelial changes that are not precancerous lesions but that are similar to positive ones. These findings need to be considered when developing more robust computing systems in the future. PMID:28744318
Predicting hepatotoxicity using ToxCast in vitro bioactivity and ...
Background: The U.S. EPA ToxCast™ program is screening thousands of environmental chemicals for bioactivity using hundreds of high-throughput in vitro assays to build predictive models of toxicity. We represented chemicals based on bioactivity and chemical structure descriptors and then used supervised machine learning to predict their hepatotoxic effects. Results: A set of 677 chemicals were represented by 711 in vitro bioactivity descriptors (from ToxCast assays), 4,376 chemical structure descriptors (from QikProp, OpenBabel, PADEL, and PubChem), and three hepatotoxicity categories (from animal studies). Hepatotoxicants were defined by rat liver histopathology observed after chronic chemical testing and grouped into hypertrophy (161), injury (101) and proliferative lesions (99). Classifiers were built using six machine learning algorithms: linear discriminant analysis (LDA), Naïve Bayes (NB), support vector classification (SVM), classification and regression trees (CART), k-nearest neighbors (KNN) and an ensemble of classifiers (ENSMB). Classifiers of hepatotoxicity were built using chemical structure, ToxCast bioactivity, and a hybrid representation. Predictive performance was evaluated using 10-fold cross-validation testing and in-loop, filter-based feature subset selection. Hybrid classifiers had the best balanced accuracy for predicting hypertrophy (0.78±0.08), injury (0.73±0.10) and proliferative lesions (0.72±0.09). Though chemical and bioactivity class
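The evaluation protocol mentioned above, k-fold cross-validation with feature subset selection performed in-loop (i.e., re-run on each training fold rather than once on the full data), can be sketched with a scikit-learn pipeline. The descriptor matrix, labels, selector, and classifier below are placeholders for illustration, not the study's actual setup.

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.svm import SVC
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(120, 300))    # stand-in for bioactivity + structure descriptors
y = rng.integers(0, 2, size=120)   # stand-in hepatotoxicity labels

# Because the selector sits inside the pipeline, feature selection is re-fit
# on each training fold, so the held-out fold never leaks into the filter.
pipe = Pipeline([("select", SelectKBest(f_classif, k=25)),
                 ("clf", SVC(kernel="linear"))])
scores = cross_val_score(pipe, X, y, cv=StratifiedKFold(n_splits=10),
                         scoring="balanced_accuracy")
print(scores.mean().round(2), scores.std().round(2))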
Seagrass Identification Using High-Resolution 532nm Bathymetric LiDAR and Hyperspectral Imagery
NASA Astrophysics Data System (ADS)
Pan, Z.; Prasad, S.; Starek, M. J.; Fernandez Diaz, J. C.; Glennie, C. L.; Carter, W. E.; Shrestha, R. L.; Singhania, A.; Gibeaut, J. C.
2013-12-01
Seagrass provides vital habitat for marine fisheries and is a key indicator species of coastal ecosystem vitality. Monitoring seagrass is therefore an important environmental initiative, but measuring details of seagrass distribution over large areas via remote sensing has proved challenging. Developments in airborne bathymetric light detection and ranging (LiDAR) provide great potential in this regard. Traditional bathymetric LiDAR systems have been limited in their ability to map within the shallow water zone (< 1 m) where seagrass is typically present due to limitations in receiver response and laser pulse length. Emergent short-pulse width bathymetric LiDAR sensors and waveform processing algorithms enable depth measurements in shallow water environments previously inaccessible. This 3D information of the benthic layer can be applied to detect seagrass and characterize its distribution. Researchers with the National Center for Airborne Laser Mapping (NCALM) at the University of Houston (UH) and the Coastal and Marine Geospatial Sciences Lab (CMGL) of the Harte Research Institute at Texas A&M University-Corpus Christi conducted a coordinated airborne and boat-based survey of the Redfish Bay State Scientific Area as part of a collaborative study to investigate the capabilities of bathymetric LiDAR and hyperspectral imaging for seagrass mapping. Redfish Bay, located along the middle Texas coast of the Gulf of Mexico, is a state scientific area designated for the purpose of protecting and studying native seagrasses. Redfish Bay is part of the broader Coastal Bend Bays estuary system recognized by the US Environmental Protection Agency (EPA) as a national estuary of significance. For this survey, UH acquired high-resolution discrete-return and full-waveform bathymetric data using their Optech Aquarius 532 nm green LiDAR. In a separate flight, UH collected 2 sets of hyperspectral imaging data (1.2-m pixel resolution and 72 bands, and 0.6m pixel resolution and 36 bands) with their CASI 1500 hyperspectral sensor. The ground survey was conducted by CMGL. The team used an airboat to collect in-situ radiometer measurements of sky irradiance and surface water reflectance at different locations in the bay. The team also collected water samples, GPS position, and depth. A follow-up survey was conducted to acquire ground-truth data of benthic type at over 80 locations within the bay. Two complementary approaches were developed to detect and map the seagrass cover over the study area - automated classification algorithms were validated with high spatial resolution hyperspectral imagery, and a continuous wavelet based signal processing and pulse broadening analysis of the digitized returns was performed with the full waveform of the bathymetric LiDAR. The two approaches were compared to the collected ground truth data of seagrass type, height, and location. Results of the evaluation will be presented, along with a preliminary discussion of the fusion of the LiDAR and hyperspectral imagery for improved overall classification accuracy.
Results for both sequential and simultaneous calibration of exchange flows between segments of a 10-box, one-dimensional, well-mixed, bifurcated tidal mixing model for Tampa Bay are reported. Calibrations were conducted for three model options with different mathematical expressi...
MASS BALANCE MODELLING OF PCBS IN THE FOX RIVER/GREEN BAY COMPLEX
The USEPA Office of Research and Development developed and applied a multimedia, mass balance modeling approach to the Fox River/Green Bay complex to aid managers with remedial decision-making. The suite of models was applied to PCBs due to the long history of contamination and ...
Order-Constrained Bayes Inference for Dichotomous Models of Unidimensional Nonparametric IRT
ERIC Educational Resources Information Center
Karabatsos, George; Sheu, Ching-Fan
2004-01-01
This study introduces an order-constrained Bayes inference framework useful for analyzing data containing dichotomous scored item responses, under the assumptions of either the monotone homogeneity model or the double monotonicity model of nonparametric item response theory (NIRT). The framework involves the implementation of Gibbs sampling to…
NASA Technical Reports Server (NTRS)
Love, W. J.
1972-01-01
The objectives and scope of the Chesapeake Bay study are discussed. The physical, chemical, biological, political, and social phenomena of concern to the Chesapeake Bay area are included in the study. The construction of a model of the bay which will provide a means of accurately studying the interaction of the ecological factors is described. The application of the study by management organizations for development, enhancement, conservation, preservation, and restoration of the resources is examined.
Modeling the seasonal circulation in Massachusetts Bay
Signell, Richard P.; Jenter, Harry L.; Blumberg, Alan F.; ,
1994-01-01
An 18 month simulation of circulation was conducted in Massachusetts Bay, a roughly 35 m deep, 100 × 50 km embayment on the northeastern shelf of the United States. Using a variant of the Blumberg-Mellor (1987) model, it was found that a continuous 18 month run was only possible if the velocity field was Shapiro filtered to remove two-grid-length energy that developed along the open boundary due to mismatch in locally generated and climatologically forced water properties. The seasonal development of temperature and salinity stratification was well represented by the model once σ-coordinate errors were reduced by subtracting domain-averaged vertical profiles of temperature, salinity and density before horizontal differencing was performed. Comparison of modeled and observed subtidal currents at fixed locations revealed that the model performance varies strongly with season and distance from the open boundaries. The model performs best during unstratified conditions, and in the interior of the bay. The model performs poorest during stratified conditions and in the regions where the bay is driven predominantly by remote fluctuations from the Gulf of Maine.
3. DETAIL VIEW OF DIRECT DRIVE STERLING 'DOLPHIN T' MODEL ...
3. DETAIL VIEW OF DIRECT DRIVE STERLING 'DOLPHIN T' MODEL 4 CYLINDER, GASOLINE TRACTOR-TYPE ENGINE WITH FALKBIBBY FLEXIBLE COUPLING - Central Railroad of New Jersey, Newark Bay Lift Bridge, Spanning Newark Bay, Newark, Essex County, NJ
Probabilistic multi-person localisation and tracking in image sequences
NASA Astrophysics Data System (ADS)
Klinger, T.; Rottensteiner, F.; Heipke, C.
2017-05-01
The localisation and tracking of persons in image sequences is commonly guided by recursive filters. Especially in a multi-object tracking environment, where mutual occlusions are inherent, the predictive model is prone to drift away from the actual target position when not taking context into account. Further, if the image-based observations are imprecise, the trajectory is prone to be updated towards a wrong position. In this work we address both these problems by using a new predictive model on the basis of Gaussian Process Regression, and by using generic object detection, as well as instance-specific classification, for refined localisation. The predictive model takes into account the motion of every tracked pedestrian in the scene and the prediction is executed with respect to the velocities of neighbouring persons. In contrast to existing methods our approach uses a Dynamic Bayesian Network in which the state vector of a recursive Bayes filter, as well as the location of the tracked object in the image, are modelled as unknowns. This allows the detection to be corrected before it is incorporated into the recursive filter. Our method is evaluated on a publicly available benchmark dataset and outperforms related methods in terms of geometric precision and tracking accuracy.
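As a rough illustration of the Gaussian Process Regression prediction step, here is a single-track sketch with invented positions; the paper's model additionally conditions on the velocities of neighbouring pedestrians, which this sketch omits.

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical single track: (x, y) positions of one pedestrian at frames 0-9.
t = np.arange(10, dtype=float).reshape(-1, 1)
xy = np.column_stack([0.50 * np.arange(10),           # x drifts right
                      0.20 * np.arange(10) + 0.1])    # y drifts up

# One GP per coordinate, conditioned on the frame index; predict frame 10.
gp = GaussianProcessRegressor(kernel=RBF(length_scale=5.0),
                              alpha=1e-2, normalize_y=True)
pred = [float(gp.fit(t, xy[:, k]).predict(np.array([[10.0]]))[0]) for k in (0, 1)]
print(pred)  # predicted (x, y) one frame ahead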
NASA Astrophysics Data System (ADS)
Clair, T. A.; Ehrman, J. M.
2006-12-01
The effects of a doubling of atmospheric CO2 on temperature and precipitation will change annual runoff and dissolved organic carbon (DOC) export patterns in northern Canada. Because of the physical size of northern Canada and the range of climatic changes across it, we found it necessary to model potential changes in river water and carbon exports in the region using a neural network approach. We developed one model for hydrology and one for DOC, using as inputs monthly General Circulation Model temperature and precipitation predictions, historical hydrology and dissolved organic carbon values, as well as catchment size and slope. Mining Environment Canada's historical hydrology and water chemistry databases allowed us to identify 20 sites suitable for our analysis. The site results were summarized within the Canadian Terrestrial Ecozone classification system. Our results show spring melts occurring one month sooner in all northern ecozones except the Hudson Bay Plains zone, with changes in melt intensity occurring in most regions. The DOC model predicts that exports from catchments will increase by between 10 and 20%, depending on the ecozone. Generally, we predict that major changes in both hydrology and carbon cycling should be expected in northern Canadian ecosystems on a warmer planet.
Comparisons between data assimilated HYCOM output and in situ Argo measurements in the Bay of Bengal
NASA Astrophysics Data System (ADS)
Wilson, E. A.; Riser, S.
2014-12-01
This study evaluates the performance of data assimilated Hybrid Coordinate Ocean Model (HYCOM) output for the Bay of Bengal from September 2008 through July 2013. We find that while HYCOM assimilates Argo data, the model still suffers from significant temperature and salinity biases in this region. These biases are most severe in the northern Bay of Bengal, where the model tends to be too saline near the surface and too fresh at depth. The maximum magnitude of these biases is approximately 0.6 PSS. We also find that the model's salinity biases have a distinct seasonal cycle. The most problematic periods are the months following the summer monsoon (Oct-Jan). HYCOM's near surface temperature estimates compare more favorably with Argo, but significant errors exist at deeper levels. We argue that optimal interpolation will tend to induce positive salinity biases in the northern regions of the Bay. Further, we speculate that these biases are introduced when the model relaxes to climatology and assimilates real-time data.
Bayesian inference for psychology, part IV: parameter estimation and Bayes factors.
Rouder, Jeffrey N; Haaf, Julia M; Vandekerckhove, Joachim
2018-02-01
In the psychological literature, there are two seemingly different approaches to inference: that from estimation of posterior intervals and that from Bayes factors. We provide an overview of each method and show that a salient difference is the choice of models. The two approaches as commonly practiced can be unified with a certain model specification, now popular in the statistics literature, called spike-and-slab priors. A spike-and-slab prior is a mixture of a null model, the spike, with an effect model, the slab. The estimate of the effect size here is a function of the Bayes factor, showing that estimation and model comparison can be unified. The salient difference is that common Bayes factor approaches provide for privileged consideration of theoretically useful parameter values, such as the value corresponding to the null hypothesis, while estimation approaches do not. Both approaches, either privileging the null or not, are useful depending on the goals of the analyst.
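In LaTeX notation, a generic spike-and-slab specification, and the way it ties effect estimation to the Bayes factor, can be written as follows (our notation, not necessarily the authors'):

p(\theta) = \pi_0 \, \delta_0(\theta) + (1 - \pi_0) \, g(\theta),
\qquad
\frac{P(M_0 \mid y)}{P(M_1 \mid y)} = \mathrm{BF}_{01} \, \frac{\pi_0}{1 - \pi_0},
\qquad
E[\theta \mid y] = P(M_1 \mid y) \, E[\theta \mid y, M_1],

where \delta_0 is the point mass at zero (the spike), g is the slab density, and \mathrm{BF}_{01} is the Bayes factor for the null against the effect model; the last identity is the sense in which the effect estimate is a function of the Bayes factor.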
Bayes Factor Covariance Testing in Item Response Models.
Fox, Jean-Paul; Mulder, Joris; Sinharay, Sandip
2017-12-01
Two marginal one-parameter item response theory models are introduced, by integrating out the latent variable or random item parameter. It is shown that both marginal response models are multivariate (probit) models with a compound symmetry covariance structure. Several common hypotheses concerning the underlying covariance structure are evaluated using (fractional) Bayes factor tests. The support for a unidimensional factor (i.e., assumption of local independence) and differential item functioning are evaluated by testing the covariance components. The posterior distribution of common covariance components is obtained in closed form by transforming latent responses with an orthogonal (Helmert) matrix. This posterior distribution is defined as a shifted-inverse-gamma, thereby introducing a default prior and a balanced prior distribution. Based on that, an MCMC algorithm is described to estimate all model parameters and to compute (fractional) Bayes factor tests. Simulation studies are used to show that the (fractional) Bayes factor tests have good properties for testing the underlying covariance structure of binary response data. The method is illustrated with two real data studies.
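For reference, a compound symmetry covariance structure on p items, equal variances with one common covariance, has the generic form (notation ours):

\Sigma = \sigma^2 \left[ (1 - \rho) \, I_p + \rho \, \mathbf{1}_p \mathbf{1}_p^{\top} \right],

so hypotheses such as local independence or the absence of differential item functioning reduce to constraints on covariance components like \rho, which is what the (fractional) Bayes factor tests above evaluate.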
NASA Astrophysics Data System (ADS)
Havens, H.; Luther, M. E.; Meyers, S. D.
2008-12-01
Response time is critical following a hazardous spill in a marine environment and rapid assessment of circulation patterns can mitigate the damage. Tampa Bay Physical Oceanographic Real-Time System (TB-PORTS) data are used to drive a numerical circulation model of the bay for the purpose of hazardous material spill response, monitoring of human health risks, and environmental protection and management. The model is capable of rapidly producing forecast simulations that, in the event of a human health or ecosystem threat, can alert authorities to areas in Tampa Bay with a high probability of being affected by the material. Responders to an anhydrous ammonia spill in November 2007 in Tampa Bay utilized the numerical model of circulation in the estuary to predict where the spill was likely to be transported. The model quickly generated a week-long simulation predicting how winds and currents might move the spill around the bay. The physical mechanisms transporting ammonium alternated from being tidally driven for the initial two days following the spill to a more classical two-layered circulation for the remainder of the simulation. Velocity profiles of Tampa Bay reveal a strong outward flowing current present at the time of the simulation which acted as a significant transport mechanism for ammonium within the bay. Probability distributions, calculated from the predicted model trajectories, guided sampling in the days after the spill resulting in the detection of a toxic Pseudo-nitzschia bloom that likely was initiated as a result of the anhydrous ammonia spill. The prediction system at present is only accessible to scientists in the Ocean Monitoring and Prediction Lab (OMPL) at the University of South Florida. The forecast simulations are compiled into an animation that is provided to end users at their request. In the future, decision makers will be allowed access to an online component of the coastal prediction system that can be used to manage response and mitigation efforts in order to reduce the risk from such disasters as hazardous material spills or ship groundings.
Grummer, Jared A; Bryson, Robert W; Reeder, Tod W
2014-03-01
Current molecular methods of species delimitation are limited by the types of species delimitation models and scenarios that can be tested. Bayes factors allow for more flexibility in testing non-nested species delimitation models and hypotheses of individual assignment to alternative lineages. Here, we examined the efficacy of Bayes factors in delimiting species through simulations and empirical data from the Sceloporus scalaris species group. Marginal-likelihood scores of competing species delimitation models, from which Bayes factor values were compared, were estimated with four different methods: harmonic mean estimation (HME), smoothed harmonic mean estimation (sHME), path-sampling/thermodynamic integration (PS), and stepping-stone (SS) analysis. We also performed model selection using a posterior simulation-based analog of the Akaike information criterion through Markov chain Monte Carlo analysis (AICM). Bayes factor species delimitation results from the empirical data were then compared with results from the reversible-jump MCMC (rjMCMC) coalescent-based species delimitation method Bayesian Phylogenetics and Phylogeography (BP&P). Simulation results show that HME and sHME perform poorly compared with PS and SS marginal-likelihood estimators when identifying the true species delimitation model. Furthermore, Bayes factor delimitation (BFD) of species showed improved performance when species limits are tested by reassigning individuals between species, as opposed to either lumping or splitting lineages. In the empirical data, BFD through PS and SS analyses, as well as the rjMCMC method, each provide support for the recognition of all scalaris group taxa as independent evolutionary lineages. Bayes factor species delimitation and BP&P also support the recognition of three previously undescribed lineages. In both simulated and empirical data sets, harmonic and smoothed harmonic mean marginal-likelihood estimators provided much higher marginal-likelihood estimates than PS and SS estimators. The AICM displayed poor repeatability in both simulated and empirical data sets, and produced inconsistent model rankings across replicate runs with the empirical data. Our results suggest that species delimitation through the use of Bayes factors with marginal-likelihood estimates via PS or SS analyses provide a useful and complementary alternative to existing species delimitation methods.
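In symbols, the quantities being compared are marginal likelihoods and their ratio, the Bayes factor; the harmonic mean estimator criticized above averages inverse likelihoods over posterior draws \theta_i, which is what makes it notoriously high-variance (notation ours):

\mathrm{BF}_{12} = \frac{p(D \mid M_1)}{p(D \mid M_2)},
\qquad
p(D \mid M) = \int p(D \mid \theta, M) \, p(\theta \mid M) \, d\theta,
\qquad
\hat{p}_{\mathrm{HME}}(D) = \left( \frac{1}{N} \sum_{i=1}^{N} \frac{1}{p(D \mid \theta_i)} \right)^{-1}.

Path-sampling and stepping-stone methods instead estimate p(D \mid M) along a sequence of power posteriors bridging prior and posterior, which is consistent with their better performance in the simulations described here.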
Understanding the Flushing Capability of Bellingham Bay and Its Implication on Bottom Water Hypoxia
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, Taiping; Yang, Zhaoqing
2015-05-05
In this study, an unstructured-grid finite-volume coastal ocean model (FVCOM) was used to simulate hydrodynamic circulation and assess the flushing capability in Bellingham Bay, Washington, USA. The model was reasonably calibrated against field observations for water level, velocity and salinity, and was further used to calculate residence time distributions in the study site. The model results suggest that, despite the large tidal ranges (~4 m during spring tide), tidal currents are relatively weak in Bellingham Bay, with surface currents generally below 0.5 m/s. The local residence time in Bellingham Bay varies from near zero to as long as 15 days, depending on the location and river flow condition. In general, Bellingham Bay is a well-flushed coastal embayment affected by freshwater discharge, tides, wind, and density-driven circulation. The basin-wide global residence time ranges from 5 to 7 days. The model results also provide useful information on possible causes of the emerging summertime hypoxia problem in the north central region of Bellingham Bay. It was concluded that the formation of the bottom hypoxic water likely results from the increased consumption rate of oxygen in the bottom oceanic inflow, already low in dissolved oxygen, by organic matter accumulated in the regions characterized by relatively long residence times in summer months.
The environmental fluid dynamics code (EFDC) was used to study the three dimensional (3D) circulation, water quality, and ecology in Narragansett Bay, RI. Predictions of the Bay hydrodynamics included the behavior of the water surface elevation, currents, salinity, and temperatur...
Importance of Dissolved Organic Nitrogen to Water Quality in Narragansett Bay
This preliminary analysis of the importance of the dissolved organic nitrogen (DON) pool in Narragansett Bay is being conducted as part of a five-year study of Narragansett Bay and its watershed. This larger study includes water quality and ecological modeling components that foc...
Modeling Diel Oxygen Dynamics and Ecosystem Metabolism in Weeks Bay, Alabama.
Weeks Bay is a shallow eutrophic estuary that exhibits frequent summertime diel-cycling hypoxia and periods of dissolved oxygen (DO) oversaturation during the day. Diel DO dynamics in shallow estuaries like Weeks Bay are complex, and may be influenced by wind forcing, vertical an...
Predicting tidal currents in San Francisco Bay using a spectral model
Burau, Jon R.; Cheng, Ralph T.
1988-01-01
This paper describes the formulation of a spectral (or frequency based) model which solves the linearized shallow water equations. To account for highly variable basin bathymetry, spectral solutions are obtained using the finite element method which allows the strategic placement of the computation points in the specific areas of interest or in areas where the gradients of the dependent variables are expected to be large. Model results are compared with data using simple statistics to judge overall model performance in the San Francisco Bay estuary. Once the model is calibrated and verified, prediction of the tides and tidal currents in San Francisco Bay is accomplished by applying astronomical tides (harmonic constants deduced from field data) at the prediction time along the model boundaries.
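Once harmonic constants are available, tidal prediction itself reduces to a sum of cosines. A minimal sketch follows; the station amplitude, phase, and mean level are invented, while the constituent speeds are the standard values for M2, K1 and O1.

import numpy as np

# Hypothetical harmonic constants for one station: (amplitude m, phase deg,
# angular speed deg/hour). Speeds are the standard constituent values.
constituents = {"M2": (0.58, 220.0, 28.9841042),
                "K1": (0.37, 105.0, 15.0410686),
                "O1": (0.23,  98.0, 13.9430356)}
mean_level = 1.0  # m, invented datum offset

def tide(t_hours):
    """Predicted level: h(t) = h0 + sum_i A_i * cos(omega_i * t - phi_i)."""
    h = mean_level
    for amp, phase_deg, speed in constituents.values():
        h = h + amp * np.cos(np.deg2rad(speed * t_hours - phase_deg))
    return h

print(tide(np.arange(0.0, 25.0, 6.0)))  # levels at 0, 6, 12, 18, 24 h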
Rodgers, Joseph Lee
2016-01-01
The Bayesian-frequentist debate typically portrays these statistical perspectives as opposing views. However, both Bayesian and frequentist statisticians have expanded their epistemological basis away from a singular focus on the null hypothesis, to a broader perspective involving the development and comparison of competing statistical/mathematical models. For frequentists, statistical developments such as structural equation modeling and multilevel modeling have facilitated this transition. For Bayesians, the Bayes factor has facilitated this transition. The Bayes factor is treated in articles within this issue of Multivariate Behavioral Research. The current presentation provides brief commentary on those articles and more extended discussion of the transition toward a modern modeling epistemology. In certain respects, Bayesians and frequentists share common goals.
Oil Spill Detection along the Gulf of Mexico Coastline based on Airborne Imaging Spectrometer Data
NASA Astrophysics Data System (ADS)
Arslan, M. D.; Filippi, A. M.; Guneralp, I.
2013-12-01
The Deepwater Horizon oil spill in the Gulf of Mexico between April and July 2010 demonstrated the importance of synoptic oil-spill monitoring in coastal environments via remote-sensing methods. This study focuses on terrestrial oil-spill detection and thickness estimation based on hyperspectral images acquired along the coastline of the Gulf of Mexico. We use AVIRIS (Airborne Visible/Infrared Imaging Spectrometer) imaging spectrometer data collected over Bay Jimmy and Wilkinson Bay within Barataria Bay, Louisiana, USA during September 2010. We also employ field-based observations of the degree of oil accumulation along the coastline, as well as in situ measurements from the literature. As part of our proposed spectroscopic approach, we operate on atmospherically- and geometrically-corrected hyperspectral AVIRIS data to extract image-derived endmembers via Minimum Noise Fraction transform, Pixel Purity Index generation, and n-dimensional visualization. Extracted endmembers are then used as input to endmember-mapping algorithms to yield fractional-abundance images and crisp classification images. We also employ Multiple Endmember Spectral Mixture Analysis (MESMA) for oil detection and mapping in order to enable the number and types of endmembers to vary on a per-pixel basis, in contrast to simple Spectral Mixture Analysis (SMA). MESMA thus better allows accounting for the spectral variability of oil (e.g., due to varying oil thicknesses, states of degradation, and the presence of different oil types) and of other materials, including soils and salt marsh vegetation of varying types, which may or may not be affected by the oil spill. A decision-tree approach is also utilized for comparison. Classification results indicate that MESMA provides advantageous capabilities for mapping several oil-thickness classes for affected vegetation and soils along the Gulf of Mexico coastline, relative to the conventional approaches tested. Oil-thickness mapping results from MESMA and the decision tree demonstrate that such products can be accurately generated in complex coastal environments.
Feature weight estimation for gene selection: a local hyperlinear learning approach
2014-01-01
Background Modeling high-dimensional data involving thousands of variables is particularly important for gene expression profiling experiments; nevertheless, it remains a challenging task. One of the challenges is to implement an effective method for selecting a small set of relevant genes, buried in high-dimensional irrelevant noise. RELIEF is a popular and widely used approach for feature selection owing to its low computational cost and high accuracy. However, RELIEF-based methods suffer from instability, especially in the presence of noisy and/or high-dimensional outliers. Results We propose an innovative feature weighting algorithm, called LHR, to select informative genes from highly noisy data. LHR is based on RELIEF for feature weighting using classical margin maximization. The key idea of LHR is to estimate the feature weights through local approximation rather than global measurement, which is typically used in existing methods. The weights obtained by our method are very robust to degradation of noisy features, even those with vast dimensions. To demonstrate the performance of our method, extensive experiments involving classification tests have been carried out on both synthetic and real microarray benchmark datasets by combining the proposed technique with standard classifiers, including the support vector machine (SVM), k-nearest neighbor (KNN), hyperplane k-nearest neighbor (HKNN), linear discriminant analysis (LDA) and naive Bayes (NB). Conclusion Experiments on both synthetic and real-world datasets demonstrate the superior performance of the proposed feature selection method combined with supervised learning in three aspects: 1) high classification accuracy, 2) excellent robustness to noise and 3) good stability across various classification algorithms. PMID:24625071
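The core RELIEF idea referenced above (weights grow for features that separate a sample from its nearest neighbor of the other class, and shrink for features that separate it from its nearest neighbor of the same class) fits in a few lines. This is a plain binary-class sketch of classic RELIEF on toy data, not the LHR local-hyperlinear variant the paper proposes.

import numpy as np

def relief(X, y, n_iter=200, seed=0):
    """Minimal binary RELIEF: reward separation from the nearest miss,
    penalize separation from the nearest hit."""
    rng = np.random.default_rng(seed)
    X = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-12)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(X))
        d = np.abs(X - X[i]).sum(axis=1)  # L1 distances to sample i
        d[i] = np.inf                     # exclude the sample itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))
        miss = np.argmin(np.where(y != y[i], d, np.inf))
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 8))
y = (X[:, 0] > 0).astype(int)   # only feature 0 is informative
print(relief(X, y).round(3))    # weight for feature 0 should dominate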
Larrañaga, Ana; Bielza, Concha; Pongrácz, Péter; Faragó, Tamás; Bálint, Anna; Larrañaga, Pedro
2015-03-01
Barking is perhaps the most characteristic form of vocalization in dogs; however, very little is known about its role in the intraspecific communication of this species. Besides the obvious need for ethological research, both in the field and in the laboratory, the possible information content of barks can also be explored by computerized acoustic analyses. This study compares four different supervised learning methods (naive Bayes, classification trees, k-nearest neighbors and logistic regression) combined with three strategies for selecting variables (all variables, filter and wrapper feature subset selections) to classify Mudi dogs by sex, age, context and individual from their barks. The classification accuracy of the models obtained was estimated by means of k-fold cross-validation. Percentages of correct classifications were 85.13 % for determining sex, 80.25 % for predicting age (recodified as young, adult and old), 55.50 % for classifying contexts (seven situations) and 67.63 % for recognizing individuals (8 dogs), so the results are encouraging. The best-performing method was k-nearest neighbors following a wrapper feature selection approach. The results for classifying contexts and recognizing individual dogs were better with this method than they were for other approaches reported in the specialized literature. This is the first time that the sex and age of domestic dogs have been predicted with the help of sound analysis. This study shows that dog barks carry ample information regarding the caller's indexical features. Our computerized analysis provides indirect proof that barks may serve as an important source of information for dogs as well.
Empirical Bayes Approaches to Multivariate Fuzzy Partitions.
ERIC Educational Resources Information Center
Woodbury, Max A.; Manton, Kenneth G.
1991-01-01
An empirical Bayes-maximum likelihood estimation procedure is presented for the application of fuzzy partition models in describing high dimensional discrete response data. The model describes individuals in terms of partial membership in multiple latent categories that represent bounded discrete spaces. (SLD)
Modeling Total Suspended Solids (TSS) Concentrations in Narragansett Bay.
This work covers mechanistic modeling of suspended particulates in estuarine systems with an application to Narragansett Bay, RI. Suspended particles directly affect water clarity and attenuate light in the water column. Water clarity affects both phytoplankton and submerged aqua...
NASA Astrophysics Data System (ADS)
Wang, Y.; Ramaswamy, V.; Saleh, F.
2017-12-01
Barnegat Bay is located on the east coast of New Jersey, United States, and is separated from the Atlantic Ocean by the narrow Barnegat Peninsula, which acts as a barrier island. The bay is fed by several rivers which empty through small estuaries along the inner shore. In terms of vulnerability to flooding, the Barnegat Peninsula is under the influence of both coastal storm surge and riverine flooding. Barnegat Bay was hit by Hurricane Sandy, which caused flood damage and extensive cross-island flow along many streets perpendicular to the shoreline. The objective of this work is to identify and quantify the sources of flooding using a two-dimensional inland hydrodynamic model. The hydrodynamic model was forced by three observed coastal boundary conditions and one hydrologic boundary condition from the United States Geological Survey (USGS). Model reliability was evaluated against both the FEMA spatial flooding extent and USGS high-water marks. The simulated flooding extent showed good agreement with the reanalysis spatial inundation extents. Results offered important perspectives on the flow of water into the bay, the velocity and the depth of the inundated areas. Such information can help emergency managers and decision makers plan evacuations and deploy flood defenses.
Quantifying groundwater’s role in delaying improvements to Chesapeake Bay water quality
Sanford, Ward E.; Pope, Jason P.
2013-01-01
A study has been undertaken to determine the time required for the effects of nitrogen-reducing best management practices (BMPs) implemented at the land surface to reach the Chesapeake Bay via groundwater transport to streams. To accomplish this, a nitrogen mass-balance regression (NMBR) model was developed and applied to seven watersheds on the Delmarva Peninsula. The model included the distribution of groundwater return times obtained from a regional groundwater-flow (GWF) model, the history of nitrogen application at the land surface over the last century, and parameters that account for denitrification. The model was (1) able to reproduce nitrate concentrations in streams and wells over time, including a recent decline in the rate at which concentrations have been increasing, and (2) used to forecast future nitrogen delivery from the Delmarva Peninsula to the Bay given different scenarios of nitrogen load reduction to the water table. The relatively deep porous aquifers of the Delmarva yield longer groundwater return times than those reported earlier for western parts of the Bay watershed. Accordingly, several decades will be required to see the full effects of current and future BMPs. The magnitude of this time lag is critical information for Chesapeake Bay watershed managers and stakeholders.
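The lag mechanism described here can be illustrated by convolving a surface nitrogen-input history with a groundwater return-time distribution. In this sketch the exponential distribution, its 20-year mean, and the load numbers are all invented for illustration; the study itself derived return-time distributions from a regional groundwater-flow model.

import numpy as np

years = np.arange(1940, 2041)
surface_load = np.clip((years - 1940) * 0.5, 0, 30)   # rising N applications
surface_load[years > 2010] = 10.0                     # assumed BMP reduction

tau = 20.0                                            # hypothetical mean return time, years
lag = np.exp(-np.arange(100) / tau)
lag /= lag.sum()                                      # return-time distribution
delivered = np.convolve(surface_load, lag)[:len(years)]  # lagged delivery to streams
print(delivered[years == 2030])  # still reflects pre-BMP loads decades later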
Fingerprints of Sea Level Rise on Changing Tides in the Chesapeake and Delaware Bays
NASA Astrophysics Data System (ADS)
Ross, Andrew C.; Najjar, Raymond G.; Li, Ming; Lee, Serena Blyth; Zhang, Fan; Liu, Wei
2017-10-01
Secular tidal trends are present in many tide gauge records, but their causes are often unclear. This study examines trends in tides over the last century in the Chesapeake and Delaware Bays. Statistical models show negative M2 amplitude trends at the mouths of both bays, while some upstream locations have insignificant or positive trends. To determine whether sea level rise is responsible for these trends, we include a term for mean sea level in the statistical models and compare the results with predictions from numerical and analytical models. The observed and predicted sensitivities of M2 amplitude and phase to mean sea level are similar, although the numerical model amplitude is less sensitive to sea level. The sensitivity occurs as a result of strengthening and shifting of the amphidromic system in the Chesapeake Bay and decreasing frictional effects and increasing convergence in the Delaware Bay. After accounting for the effect of sea level, significant negative background M2 and S2 amplitude trends are present; these trends may be related to other factors such as dredging, tide gauge errors, or river discharge. Projected changes in tidal amplitudes due to sea level rise over the 21st century are substantial in some areas, but depend significantly on modeling assumptions.
Land use and climate change are expected to alter key processes in the Chesapeake Bay watershed and can potentially exacerbate the impact of excess nitrogen. Atmospheric sources are one of the largest loadings of nitrogen to the Chesapeake Bay watershed. In this study, we explore...
An Adaptive Model of Student Performance Using Inverse Bayes
ERIC Educational Resources Information Center
Lang, Charles
2014-01-01
This article proposes a coherent framework for the use of Inverse Bayesian estimation to summarize and make predictions about student behaviour in adaptive educational settings. The Inverse Bayes Filter utilizes Bayes theorem to estimate the relative impact of contextual factors and internal student factors on student performance using time series…
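The update underlying the filter is Bayes' theorem; read \theta as the internal student factors and x as an observed response (a generic statement of the rule, not the article's full derivation):

P(\theta \mid x) = \frac{P(x \mid \theta) \, P(\theta)}{P(x)},

with the "inverse" use working backwards from observed performance to apportion its explanation between contextual factors and internal student factors.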
NASA Astrophysics Data System (ADS)
Ma, Shutian; Motazedian, Dariush; Corchete, Victor
2013-04-01
Many crucial tasks in seismology, such as locating seismic events and estimating focal mechanisms, need crustal velocity models. The velocity models of shallow structures are particularly important in the simulation of ground motions. In southern Ontario, Canada, many small shallow earthquakes occur, generating high-frequency Rayleigh (Rg) waves that are sensitive to shallow structures. In this research, the dispersion of Rg waves was used to obtain shear-wave velocities in the top few kilometers of the crust in the Georgian Bay, Sudbury, and Thunder Bay areas of southern Ontario. Several shallow velocity models were obtained based on the dispersion of recorded Rg waves. The Rg waves generated by an mN 3.0 natural earthquake on the northern shore of Georgian Bay were used to obtain velocity models for the area of an earthquake swarm in 2007. The Rg waves generated by a mining-induced event in the Sudbury area in 2005 were used to retrieve velocity models between Georgian Bay and the Ottawa River. The Rg waves generated by the largest event in a natural earthquake swarm near Thunder Bay in 2008 were used to obtain a velocity model in that swarm area. The basic feature of all the investigated models is that there is a top low-velocity layer with a thickness of about 0.5 km. The seismic velocities change mainly within the top 2 km, where small earthquakes often occur.
Land-Use and Land-Cover Change around Mobile Bay, Alabama from 1974-2008
NASA Technical Reports Server (NTRS)
Ellis, Jean; Spruce, Joseph P.; Swann, Roberta; Smooth, James C.
2009-01-01
This document summarizes the major findings of a Gulf of Mexico Application Pilot project led by NASA Stennis Space Center (SSC) in conjunction with a regional collaboration network of the Gulf of Mexico Alliance (GOMA). NASA researchers processed and analyzed multi-temporal Landsat data to assess land-use and land-cover (LULC) changes in the coastal counties of Mobile and Baldwin, AL between 1974 and 2008. Our goal was to create satellite-based LULC data products using methods that could be transferable to other coastal areas of concern within the Gulf of Mexico. The Mobile Bay National Estuary Program (MBNEP) is the primary end-user; however, several other state and local groups may benefit from the project's data products, which will be available through NOAA-NCDDC's Regional Ecosystem Data Management program. Mobile Bay is a critical ecological and economic region for the Gulf of Mexico and the entire country. Mobile Bay was designated as an estuary of national significance in 1996. This estuary receives the fourth largest freshwater inflow in the United States. It provides vital nursery habitat for commercially and recreationally important fish species. It has exceptional aquatic and terrestrial bio-diversity; however, its estuary health is influenced by changing LULC patterns, such as urbanization. Mobile and Baldwin counties experienced population growth of 1.1% and 20.5%, respectively, from 2000 to 2006. Urban expansion and population growth are likely to accelerate with the construction and operation of the ThyssenKrupp steel mill in the northeast portion of Mobile County. Land-use and land-cover change can negatively impact Gulf coast water quality and ecological resources. The conversion of forest to urban cover types impacts the carbon cycle and increases freshwater and sediment inputs to coastal waters. Increased freshwater runoff decreases salinity and increases the turbidity of coastal waters, thus impacting the growth potential of submerged aquatic vegetation (SAV), which is critical nursery ground for many Gulf fish species. A survey of Mobile Bay SAV showed widespread decreases since the 1940s. Prior to our project, coastal environmental managers in Baldwin and Mobile counties needed more understanding of the historical LULC for properly assessing the impacts of urbanization. In particular, more information on the location and extent of changing urbanization LULC patterns was needed to aid LULC planning and to assess predictions of future LULC patterns. Our products will assist the coastal environmental managers and land-use planners in making better community growth planning decisions. Our project also will help to establish a historical baseline of LULC distributions, which is a fundamental need in any stewardship plan. The primary research objective of our project was to produce historic and current geospatial LULC change products across a 34-year time frame. A multi-decadal coastal LULC change product was the major project deliverable. The geographic extent and nature of change was quantified and assessed for the upland herbaceous, barren, open water, urban, upland forest, woody wetland, and non-woody wetland-dominated land cover types. We focused on regional analyses of decadal-scale urban expansion and watershed-scale analyses of LULC change for multiple areas of concern to the Mobile Bay NEP (Figure A). We used the following dates to derive LULC classification products from Landsat data: 1974, 1979, 1984, 1988, 1991, 1996, 2001, 2005, and 2008.
We assessed the accuracy of our products using randomly sampled locations and digital geospatial reference data, including field survey data, high-resolution orthorectified aerial photography, high-resolution multispectral and panchromatic satellite data displays (from QuickBird and Corona sensors), digital elevation model data, and National Wetlands Inventory wetland cover type data. NOAA's Coastal Change Analysis Program (C-CAP) and National Land Cover Database (NLCD) products were used for qualitative comparison in assessing map accuracy. We calculated an average overall classification accuracy of 87%, with similar overall accuracies for the older (MSS) and newer (TM and ETM) Landsat LULC products.
A Machine Learning Concept for DTN Routing
NASA Technical Reports Server (NTRS)
Dudukovich, Rachel; Hylton, Alan; Papachristou, Christos
2017-01-01
This paper discusses the concept and architecture of a machine-learning-based router for delay-tolerant space networks. The techniques of reinforcement learning and Bayesian learning are used to supplement the routing decisions of the popular Contact Graph Routing algorithm. An introduction to the concepts of Contact Graph Routing, Q-routing and Naive Bayes classification is given. The development of an architecture for a cross-layer feedback framework for DTN (Delay-Tolerant Networking) protocols is discussed. Finally, the initial simulation setup and results are given.
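For readers unfamiliar with Q-routing, the core of the technique is a single temporal-difference update of per-neighbor delivery-time estimates. The R sketch below is a generic illustration of that update rule under assumed names (update_q, q_delay, s_delay) and an assumed table layout; it is not the router implementation described in the paper.

# Minimal Q-routing update sketch (generic; not the paper's code).
# Q is a 3-D array: Q[node, destination, neighbor] holds the estimated
# delivery time when 'node' forwards traffic for 'destination' via 'neighbor'.
update_q <- function(Q, x, d, y, q_delay, s_delay, eta = 0.5) {
  t_best <- min(Q[y, d, ])                # neighbor y's best onward estimate
  # Move the estimate toward observed queueing + transmission delay plus
  # the downstream cost (standard Q-routing temporal-difference update).
  Q[x, d, y] <- Q[x, d, y] + eta * ((q_delay + s_delay + t_best) - Q[x, d, y])
  Q
}

# Usage: node 1 forwarded a bundle for destination 3 via neighbor 2 and
# observed 0.4 s of queueing and 1.1 s of transmission delay.
Q <- array(10, dim = c(3, 3, 3))          # optimistic initial estimates
Q <- update_q(Q, x = 1, d = 3, y = 2, q_delay = 0.4, s_delay = 1.1)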
Simulation of scenario earthquake influenced field by using GIS
Zuo, H.-Q.; Xie, L.-L.; Borcherdt, R.D.
1999-01-01
The method for estimating the site effect on ground motion specified by Borcherdt (1994a, 1994b) is briefly introduced in the paper. This method, together with detailed geological and site classification data for the San Francisco Bay area of California, United States, is applied to simulate the influenced field of a scenario earthquake using GIS technology, and software for the simulation has been developed. The paper is a partial result of a cooperative research project between the China Seismological Bureau and the US Geological Survey.
Mechanics of Composite Materials with Different Moduli in Tension and Compression
1978-07-01
Feature selection for the classification of traced neurons.
López-Cabrera, José D; Lorenzo-Ginori, Juan V
2018-06-01
The great availability of computational tools to calculate the properties of traced neurons leads to the existence of many descriptors which allow the automated classification of neurons from these reconstructions. This situation makes it necessary to eliminate irrelevant features and to select the most appropriate among them, in order to improve the quality of the classification obtained. The dataset used contains a total of 318 traced neurons, classified by human experts into 192 GABAergic interneurons and 126 pyramidal cells. The features were extracted by means of the L-measure software, which is one of the most widely used computational tools in neuroinformatics to quantify traced neurons. We review some current feature selection techniques, such as filter, wrapper, embedded and ensemble methods. The stability of the feature selection methods was measured. For the ensemble methods, several aggregation methods based on different metrics were applied to combine the subsets obtained during the feature selection process. The subsets obtained by applying feature selection methods were evaluated using supervised classifiers, among which Random Forest, C4.5, SVM, Naïve Bayes, Knn, Decision Table and the Logistic classifier were used as classification algorithms. Feature selection methods of the filter, embedded, wrapper and ensemble types were compared, and the subsets returned were tested in classification tasks with different classification algorithms. The L-measure features EucDistanceSD, PathDistanceSD, Branch_pathlengthAve, Branch_pathlengthSD and EucDistanceAve were present in more than 60% of the selected subsets, which provides evidence of their importance in the classification of these neurons. Copyright © 2018 Elsevier B.V. All rights reserved.
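The stability measurement mentioned in this abstract can be made concrete with a resampling scheme: run the selector on bootstrap copies of the data and average the pairwise Jaccard similarity of the selected subsets. The R sketch below assumes a simple t-statistic filter and a feature matrix X with column names; the selector and all names are illustrative, not the authors' code.

# Stability of a filter-type feature selector (illustrative sketch).
select_top_k <- function(X, y, k = 10) {
  # Rank features by absolute two-sample t statistic and keep the top k;
  # X must have column names, y must be a two-level factor.
  tstat <- apply(X, 2, function(f) abs(t.test(f ~ y)$statistic))
  names(sort(tstat, decreasing = TRUE))[1:k]
}

selection_stability <- function(X, y, k = 10, B = 30) {
  subsets <- lapply(1:B, function(b) {
    idx <- sample(nrow(X), replace = TRUE)          # bootstrap resample
    select_top_k(X[idx, ], y[idx], k)
  })
  pairs <- combn(B, 2)
  jac <- apply(pairs, 2, function(p) {
    a <- subsets[[p[1]]]; b <- subsets[[p[2]]]
    length(intersect(a, b)) / length(union(a, b))   # Jaccard index
  })
  mean(jac)                                         # 1 = perfectly stable
}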
Brain Decoding-Classification of Hand Written Digits from fMRI Data Employing Bayesian Networks
Yargholi, Elahe'; Hossein-Zadeh, Gholam-Ali
2016-01-01
We are frequently exposed to hand-written digits 0–9 in modern life. Success in the decoding-classification of hand-written digits helps us understand the corresponding brain mechanisms and processes and assists greatly in designing more efficient brain–computer interfaces. However, all digits belong to the same semantic category, and the similarity in appearance of hand-written digits makes this decoding-classification a challenging problem. In the present study, for the first time, an augmented naïve Bayes classifier is used for classification of functional Magnetic Resonance Imaging (fMRI) measurements to decode hand-written digits, taking advantage of brain connectivity information in the decoding-classification. fMRI was recorded from three healthy participants, with an age range of 25–30. Results in different brain lobes (frontal, occipital, parietal, and temporal) show that utilizing connectivity information significantly improves decoding-classification, and the capabilities of different brain lobes in decoding-classification of hand-written digits were compared to each other. In addition, in each lobe the most contributing areas and brain connectivities were determined, and connectivities with short distances between their endpoints were recognized to be more efficient. Moreover, a data-driven method was applied to investigate the similarity of brain areas in responding to stimuli, and this revealed both similarly active areas and active mechanisms during this experiment. An interesting finding was that during the experiment of watching hand-written digits there were several active networks (visual, working memory, motor, and language processing), but the most relevant one to the task was the language processing network, according to the voxel selection. PMID:27468261
A Variational Bayes Genomic-Enabled Prediction Model with Genotype × Environment Interaction
Montesinos-López, Osval A.; Montesinos-López, Abelardo; Crossa, José; Montesinos-López, José Cricelio; Luna-Vázquez, Francisco Javier; Salinas-Ruiz, Josafhat; Herrera-Morales, José R.; Buenrostro-Mariscal, Raymundo
2017-01-01
There are Bayesian and non-Bayesian genomic models that take into account G×E interactions. However, the computational cost of implementing Bayesian models is high, and becomes almost impossible when the number of genotypes, environments, and traits is very large, while, in non-Bayesian models, there are often important and unsolved convergence problems. The variational Bayes method is popular in machine learning, and, by approximating the probability distributions through optimization, it tends to be faster than Markov Chain Monte Carlo methods. For this reason, in this paper, we propose a new genomic variational Bayes version of the Bayesian genomic model with G×E using half-t priors on each standard deviation (SD) term to guarantee highly noninformative priors and posterior inferences that are not sensitive to the choice of hyper-parameters. We show the complete theoretical derivation of the full conditional and the variational posterior distributions, and their implementations. We used eight experimental genomic maize and wheat data sets to illustrate the new proposed variational Bayes approximation, and compared its predictions and implementation time with a standard Bayesian genomic model with G×E. Results indicated that prediction accuracies are slightly higher in the standard Bayesian model with G×E than in its variational counterpart, but, in terms of computation time, the variational Bayes genomic model with G×E is, in general, 10 times faster than the conventional Bayesian genomic model with G×E. For this reason, the proposed model may be a useful tool for researchers who need to predict and select genotypes in several environments. PMID:28391241
Influence of orographically steered winds on Mutsu Bay surface currents
NASA Astrophysics Data System (ADS)
Yamaguchi, Satoshi; Kawamura, Hiroshi
2005-09-01
Effects of a spatially dependent sea surface wind field on currents in Mutsu Bay, which is located at the northern end of Japan's Honshu Island, are investigated using winds derived from synthetic aperture radar (SAR) images and a numerical model. A characteristic wind pattern over the bay was evidenced by analysis of 118 SAR images and coincided with in situ observations. Wind is topographically steered, with easterly winds entering the bay through the terrestrial gap and stronger wind blowing over the central water toward the bay's mouth. Nearshore winds are weaker due to terrestrial blockage. Using the Princeton Ocean Model, we investigated currents forced by the observed spatially dependent wind field. The predicted current pattern agrees well with available observations. For a uniform wind field of equal magnitude and average direction, the circulation pattern departs from observations, demonstrating that vorticity input due to spatially dependent wind stress is essential to generation of the wind-driven current in Mutsu Bay.
Long Wave Runup in Asymmetric Bays and in Fjords With Two Separate Heads
NASA Astrophysics Data System (ADS)
Raz, Amir; Nicolsky, Dmitry; Rybkin, Alexei; Pelinovsky, Efim
2018-03-01
Modeling of tsunamis in glacial fjords prompts us to evaluate the applicability of the cross-sectionally averaged nonlinear shallow water equations to model the propagation and runup of long waves in asymmetric bays and in fjords with two heads. We utilize the Tuck-Hwang transformation, initially introduced for plane beaches and generalized here for bays with arbitrary cross section, to transform the nonlinear governing equations into a linear equation. The solution of the linearized equation describing the runup at the shoreline is computed by taking into account the incident wave at the toe of the last sloping segment. We verify our predictions against direct numerical simulation of the 2-D shallow water equations and show that our solution is valid both for bays with an asymmetric L-shaped cross section and for fjords with two heads, that is, bays with a W-shaped cross section.
A High-Authority/Low-Authority Control Strategy for Coupled Aircraft-Style Bays
NASA Technical Reports Server (NTRS)
Schiller, N. H.; Fuller, C. R.; Cabell, R. H.
2006-01-01
This paper presents a numerical investigation of an active structural acoustic control strategy for coupled aircraft-style bays. While structural coupling can destabilize or limit the performance of some model-based decentralized control systems, fully coupled centralized control strategies are impractical for typical aircraft containing several hundred bays. An alternative is to use classical rate feedback with matched, collocated transducer pairs to achieve active damping. Unfortunately, due to the conservative nature of this strategy, stability is guaranteed at the expense of achievable noise reduction. Therefore, this paper describes the development of a combined control strategy using robust active damping in addition to a high-authority controller based on linear quadratic Gaussian (LQG) theory. The combined control system is evaluated on a tensioned, two-bay model using piezoceramic actuators and ideal point velocity sensors. Transducer placement on the two-bay structure is discussed, and the advantages of a combined control strategy are presented.
Ferragina, A; de los Campos, G; Vazquez, A I; Cecchinato, A; Bittante, G
2015-11-01
The aim of this study was to assess the performance of Bayesian models commonly used for genomic selection to predict "difficult-to-predict" dairy traits, such as milk fatty acid (FA) composition expressed as percentage of total fatty acids, and technological properties, such as fresh cheese yield and protein recovery, using Fourier-transform infrared (FTIR) spectral data. Our main hypothesis was that Bayesian models that can estimate shrinkage and perform variable selection may improve our ability to predict FA traits and technological traits above and beyond what can be achieved using the current calibration models (e.g., partial least squares, PLS). To this end, we assessed a series of Bayesian methods and compared their prediction performance with that of PLS. The comparison between models was done using the same sets of data (i.e., same samples, same variability, same spectral treatment) for each trait. Data consisted of 1,264 individual milk samples collected from Brown Swiss cows for which gas chromatographic FA composition, milk coagulation properties, and cheese-yield traits were available. For each sample, 2 spectra in the infrared region from 5,011 to 925 cm(-1) were available and averaged before data analysis. Three Bayesian models (Bayesian ridge regression, Bayes RR; Bayes A; and Bayes B) and 2 reference models (PLS and modified PLS, MPLS) were used to calibrate equations for each of the traits. The Bayesian models used were implemented in the R package BGLR (http://cran.r-project.org/web/packages/BGLR/index.html), whereas the PLS and MPLS were those implemented in the WinISI II software (Infrasoft International LLC, State College, PA). Prediction accuracy was estimated for each trait and model using 25 replicates of a training-testing validation procedure. Compared with PLS, which is currently the most widely used calibration method, MPLS and the 3 Bayesian methods showed significantly greater prediction accuracy. Accuracy increased in moving from calibration to external validation methods, and in moving from PLS and MPLS to Bayesian methods, particularly Bayes A and Bayes B. The maximum R(2) value of validation was obtained with Bayes B and Bayes A. Among the FA, C10:0 (% of each FA on total FA basis) had the highest R(2) (0.75, achieved with Bayes A and Bayes B), and among the technological traits, fresh cheese yield had an R(2) of 0.82 (achieved with Bayes B). These 2 methods have proven to be useful instruments in shrinking and selecting very informative wavelengths and inferring the structure and functions of the analyzed traits. We conclude that Bayesian models are powerful tools for deriving calibration equations, and, importantly, these equations can be easily developed using existing open-source software. As part of our study, we provide scripts based on the open-source R software BGLR, which can be used to train customized prediction equations for other traits or populations. Copyright © 2015 American Dairy Science Association. Published by Elsevier Inc. All rights reserved.
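As the authors point out, calibrations of this kind can be reproduced with the open-source BGLR package. The R sketch below shows the general shape of a Bayes B calibration on a spectral matrix; the simulated data, dimensions, and iteration counts are placeholders rather than the paper's settings.

# Bayes B calibration on a spectral matrix with BGLR (placeholder settings).
library(BGLR)
set.seed(1)
n <- 200; p <- 1060                        # samples x spectral variables
X <- matrix(rnorm(n * p), n, p)            # stand-in for FTIR absorbances
y <- as.vector(X[, 1:5] %*% rnorm(5) + rnorm(n))  # stand-in trait

fm <- BGLR(y = y,
           ETA = list(spec = list(X = X, model = "BayesB")),
           nIter = 6000, burnIn = 1000, verbose = FALSE)

cor(y, fm$yHat)^2                          # calibration R-squared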
Topobathymetric model of Mobile Bay, Alabama
Danielson, Jeffrey J.; Brock, John C.; Howard, Daniel M.; Gesch, Dean B.; Bonisteel-Cormier, Jamie M.; Travers, Laurinda J.
2013-01-01
Topobathymetric Digital Elevation Models (DEMs) are a merged rendering of both topography (land elevation) and bathymetry (water depth) that provides a seamless elevation product useful for inundation mapping, as well as for other earth science applications, such as the development of sediment-transport, sea-level rise, and storm-surge models. This 1/9-arc-second (approximately 3 meters) resolution model of Mobile Bay, Alabama, was developed using multiple topographic and bathymetric datasets, collected on different dates. The topographic data were obtained primarily from the U.S. Geological Survey (USGS) National Elevation Dataset (NED) (http://ned.usgs.gov/) at 1/9-arc-second resolution; USGS Experimental Advanced Airborne Research Lidar (EAARL) data (2 meters) (http://pubs.usgs.gov/ds/400/); and topographic lidar data (2 meters) and Compact Hydrographic Airborne Rapid Total Survey (CHARTS) lidar data (2 meters) from the U.S. Army Corps of Engineers (USACE) (http://www.csc.noaa.gov/digitalcoast/data/coastallidar/). Bathymetry was derived from digital soundings obtained from the National Oceanic and Atmospheric Administration's (NOAA) National Geophysical Data Center (NGDC) (http://www.ngdc.noaa.gov/mgg/geodas/geodas.html) and from water-penetrating lidar sources, such as EAARL and CHARTS. Mobile Bay is ecologically important as it is the fourth largest estuary in the United States. The Mobile and Tensaw Rivers drain into the bay at the northern end, with the bay emptying into the Gulf of Mexico at the southern end. Dauphin Island (a barrier island) and the Fort Morgan Peninsula form the mouth of Mobile Bay. Mobile Bay is 31 miles (50 kilometers) long with a maximum width of 24 miles (39 kilometers) and a total area of 413 square miles (1,070 square kilometers). The vertical datum of the Mobile Bay topobathymetric model is the North American Vertical Datum of 1988 (NAVD 88). All the topographic datasets were originally referenced to NAVD 88, and no transformations were made to these input data. The NGDC hydrographic, multibeam, and trackline surveys were transformed from mean low water (MLW) or mean lower low water (MLLW) to NAVD 88 using VDatum (http://vdatum.noaa.gov). VDatum is a tool developed by the National Geodetic Survey (NGS) that performs transformations among tidal, ellipsoid-based, geoid-based, and orthometric datums using calibrated hydrodynamic models. The vertical accuracy of the input topographic data varied depending on the input source. Because the input elevation data were derived primarily from lidar, the vertical accuracy ranges from 6 to 20 centimeters in root mean square error (RMSE). The horizontal datum of the Mobile Bay topobathymetric model is the North American Datum of 1983 (NAD 83), geographic coordinates. All the topographic and bathymetric datasets were originally referenced to NAD 83, and no transformations were made to the input data. The bathymetric surveys were downloaded referenced to NAD 83 geographic, and therefore no horizontal transformations were required. The topobathymetric model of Mobile Bay and detailed metadata can be obtained from the USGS Web site: http://nationalmap.gov/.
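The merge step described above can be illustrated in R with the terra package: resample the bathymetric grid onto the topographic grid, then let topography take precedence where both surfaces exist. The file names and the precedence rule are assumptions for illustration only, not the USGS production workflow.

# Illustrative topobathy merge (hypothetical file names; both inputs NAVD 88).
library(terra)
topo  <- rast("mobile_bay_topo_navd88.tif")    # land elevations
bathy <- rast("mobile_bay_bathy_navd88.tif")   # water depths
bathy_r <- resample(bathy, topo, method = "bilinear")  # align the two grids
topobathy <- cover(topo, bathy_r)   # keep topo where present, fill with bathy
writeRaster(topobathy, "mobile_bay_topobathy.tif", overwrite = TRUE)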
Das, D K; Maiti, A K; Chakraborty, C
2015-03-01
In this paper, we propose a comprehensive image characterization and classification framework for malaria-infected stage detection using microscopic images of thin blood smears. The methodology mainly includes microscopic imaging of Leishman-stained blood slides, noise reduction and illumination correction, erythrocyte segmentation, and feature selection followed by machine classification. Amongst three image segmentation algorithms (namely, rule-based, Chan-Vese-based and marker-controlled watershed methods), the marker-controlled watershed technique provides better boundary detection of erythrocytes, especially in overlapping situations. Microscopic features at the intensity, texture and morphology levels are extracted to discriminate infected and noninfected erythrocytes. In order to obtain a subgroup of potential features, feature selection techniques, namely the F-statistic and information gain criteria, are considered here for ranking. Finally, five different classifiers, namely Naive Bayes, multilayer perceptron neural network, logistic regression, classification and regression tree (CART), and RBF neural network, have been trained and tested on 888 erythrocytes (infected and noninfected) for each feature subset. Performance evaluation of the proposed methodology shows that the multilayer perceptron network provides higher accuracy for malaria-infected erythrocyte recognition and infected stage classification. Results show that the top 90 features ranked by F-statistic (specificity: 98.64%, sensitivity: 100%, PPV: 99.73% and overall accuracy: 96.84%) and the top 60 features ranked by information gain (specificity: 97.29%, sensitivity: 100%, PPV: 99.46% and overall accuracy: 96.73%) provide better results for malaria-infected stage classification. © 2014 The Authors Journal of Microscopy © 2014 Royal Microscopical Society.
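The F-statistic ranking used above corresponds to a one-way ANOVA computed per feature. The R sketch below is a generic illustration of that ranking on an arbitrary feature matrix; it is not the authors' pipeline.

# Rank features by one-way ANOVA F statistic (illustrative sketch).
# X: feature matrix with column names; y: factor (infected / noninfected).
f_rank <- function(X, y) {
  f <- apply(X, 2, function(feat) {
    summary(aov(feat ~ y))[[1]]$`F value`[1]
  })
  sort(f, decreasing = TRUE)    # larger F = stronger class separation
}
# e.g., keep the top 90 ranked features for the downstream classifiers:
# top90 <- names(f_rank(X, y))[1:90]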
Using a Content Management System for Integrated Water Quantity, Quality and Instream Flows Modeling
NASA Astrophysics Data System (ADS)
Burgholzer, R.; Brogan, C. O.; Scott, D.; Keys, T.
2017-12-01
With increased population and water demand, in-stream flows can become depleted by consumptive uses, and the dilution of permitted discharges may be compromised. Reduced flows downstream of water withdrawals may increase the violation rate for bacterial concentrations from direct deposition by livestock and wildlife. Water storage reservoirs are constructed and operated to ensure more stable supplies for consumptive demands and dilution flows; however, their use comes at the cost of increased evaporative losses, potential for thermal pollution, interrupted fish migration, and reduced flooding events that are critical to maintaining habitat and water quality. Because of this complex interrelationship between water quantity, quality and instream habitat, comprehensive multi-disciplinary models must be developed to ensure the long-term sustainability of water resources and to avoid conflicts between drinking water, food and energy production, and aquatic biota. The Commonwealth of Virginia funded the expansion of the Chesapeake Bay Program Phase 5 model to cover the entire state, and has been using this model to evaluate water supply permits and plans since 2009. This integrated modeling system combines a content management system (Drupal and PHP) for model input data and leverages the modularity of HSPF with the custom segmentation and parameterization routines programmed by modelers working with the Chesapeake Bay Program. The model has been applied to over 30 Virginia water permits, instream flow and aquatic habitat models, and Virginia's 30-year water supply demand projections. Future versions will leverage the Bay Model auto-calibration routines for adding small-scale water supply and TMDL models, utilize climate change scenarios, and integrate Virginia's reservoir management modules into the Chesapeake Bay watershed model, feeding projected demand and operational changes back to EPA models to improve the realism of future Bay-wide simulations.
Papageorgiou, Eirini; Nieuwenhuys, Angela; Desloovere, Kaat
2017-01-01
Background: This study aimed to improve the automatic probabilistic classification of joint motion gait patterns in children with cerebral palsy by using the expert knowledge available via a recently developed Delphi-consensus study. To this end, this study applied both Naïve Bayes and Logistic Regression classification with varying degrees of usage of the expert knowledge (expert-defined and discretized features). A database of 356 patients and 1719 gait trials was used to validate the classification performance of eleven joint motions. Hypotheses: Two main hypotheses stated that: (1) joint motion patterns in children with CP, obtained through a Delphi-consensus study, can be automatically classified following a probabilistic approach, with an accuracy similar to clinical expert classification, and (2) the inclusion of clinical expert knowledge in the selection of relevant gait features and the discretization of continuous features increases the performance of automatic probabilistic joint motion classification. Findings: This study provided objective evidence supporting the first hypothesis. Automatic probabilistic gait classification using the expert knowledge available from the Delphi-consensus study resulted in accuracy (91%) similar to that obtained with two expert raters (90%), and higher accuracy than that obtained with non-expert raters (78%). Regarding the second hypothesis, this study demonstrated that the use of more advanced machine learning techniques such as automatic feature selection and discretization instead of expert-defined and discretized features can result in slightly higher joint motion classification performance. However, the increase in performance is limited and does not outweigh the additional computational cost and the higher risk of loss of clinical interpretability, which threatens the clinical acceptance and applicability. PMID:28570616
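A probabilistic classifier over discretized gait features of the kind described above can be prototyped in a few lines with the e1071 package. The feature names, cut points, and labels below are invented placeholders, not the study's Delphi-defined features.

# Naive Bayes over discretized gait features (placeholder data and names).
library(e1071)
set.seed(2)
n <- 300
gait <- data.frame(
  min_knee_flex_stance = cut(rnorm(n, 15, 8), breaks = c(-Inf, 5, 25, Inf),
                             labels = c("reduced", "normal", "increased")),
  peak_dorsiflex_swing = cut(rnorm(n, 5, 6), breaks = c(-Inf, 0, 10, Inf),
                             labels = c("reduced", "normal", "increased")),
  pattern = factor(sample(c("crouch", "recurvatum", "normal"), n, TRUE))
)
fit  <- naiveBayes(pattern ~ ., data = gait)
pred <- predict(fit, gait)
mean(pred == gait$pattern)    # resubstitution accuracy on the toy data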
Is there a signal of sea-level rise in Chesapeake Bay salinity?
NASA Astrophysics Data System (ADS)
Hilton, T. W.; Najjar, R. G.; Zhong, L.; Li, M.
2008-09-01
We evaluate the hypothesis that sea-level rise over the second half of the 20th century has led to detectable increases in Chesapeake Bay salinity. We exploit a simple statistical model that predicts monthly mean salinity as a function of Susquehanna River flow in 23 segments of the main stem Chesapeake Bay. The residual (observed minus modeled) salinity exhibits statistically significant (p < 0.05) linear trends between 1949 and 2006 in 13 of the 23 segments of the bay. The salinity change estimated from the trend line over this period varies from -2.0 to 2.2, with 10 of the 13 segments showing positive changes. The mean and median salinity changes over all 23 segments are 0.47 and 0.72; over the 13 segments with significant trends they are 0.71 and 1.1. We ran a hydrodynamic model of the bay under present-day and reduced sea-level conditions and found a bay-average salinity increase of about 0.5, which supports the hypothesis that the residual salinity trends have a significant component due to sea-level rise. Uncertainties remain, however, due to the limited spatial and temporal extent of historical salinity data and the infilling of the bay due to sedimentation. The salinity residuals also exhibit interannual variability, with peaks occurring at intervals of roughly 7 to 9 years, which are partially explained by Atlantic Shelf salinity, Potomac River flow and the meridional component of wind stress.
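The residual-trend analysis can be reproduced in outline: regress monthly salinity on river flow, then test the residuals for a linear trend in time. The R sketch below assumes a data frame d with columns date, salinity, and flow for a single bay segment; the study's actual model is more elaborate.

# Residual (observed minus modeled) salinity trend for one bay segment.
flow_fit <- lm(salinity ~ flow, data = d)       # flow-only salinity model
d$resid  <- residuals(flow_fit)                 # variation flow cannot explain
d$years  <- as.numeric(d$date - min(d$date)) / 365.25
trend    <- lm(resid ~ years, data = d)
summary(trend)$coefficients["years", ]          # slope per year and p-value
# A significant positive slope is consistent with a sea-level-rise signal.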
NASA Astrophysics Data System (ADS)
Cerralbo, Pablo; Espino, Manuel; Grifoll, Manel
2016-08-01
This contribution shows the importance of cross-shore spatial wind variability for the water circulation in a small micro-tidal bay. The hydrodynamic wind response at Alfacs Bay (Ebro River delta, NW Mediterranean Sea) is investigated with a numerical model (ROMS) supported by in situ observations. The wind variability observed in meteorological measurements is characterized with meteorological model (WRF) outputs. In the hydrodynamic simulations of the bay, the water circulation response is affected by the cross-shore wind variability, leading to water current structures not observed in the homogeneous-wind case. When the wind heterogeneity is considered, the water exchange in the longitudinal direction increases significantly, reducing the water exchange time by around 20%. Wind forcing resolved at half the size of the bay (in our case around 9 km) inhibits cross-shore wind variability, which significantly affects the resultant circulation pattern. The characteristic response is also investigated using idealized test cases. These results show how the wind curl contributes to the hydrodynamic response in shallow areas and promotes the exchange between the bay and the open sea. A negative wind curl is related to the formation of an anti-cyclonic gyre at the bay's mouth. Our results highlight the importance of considering appropriate wind resolution even in small-scale domains (such as bays or harbors) to characterize the hydrodynamics, with relevant implications for the water exchange time and the consequent water quality and ecological parameters.
Arnold, W Ray; Warren-Hicks, William J
2007-01-01
The objective of this study was to estimate site- and region-specific dissolved copper criteria for a large embayment, the Chesapeake Bay, USA. The intent is to show the utility of two site-specific copper saltwater quality criteria estimation models and associated region-specific criteria selection methods. The criteria estimation models and selection methods are simple, efficient, and cost-effective tools for resource managers. The methods are proposed as potential substitutes for the US Environmental Protection Agency's water effect ratio methods. Dissolved organic carbon data and the copper criteria models were used to produce probability-based estimates of site-specific copper saltwater quality criteria. Site- and date-specific criteria estimates were made for 88 sites (n = 5,296) in the Chesapeake Bay. The average and range of estimated site-specific chronic dissolved copper criteria for the Chesapeake Bay were 7.5 and 5.3 to 16.9 microg Cu/L. The average and range of estimated site-specific acute dissolved copper criteria for the Chesapeake Bay were 11.7 and 8.3 to 26.4 microg Cu/L. The results suggest that the applicable national and state copper criteria could be increased in much of the Chesapeake Bay and remain protective. Virginia Department of Environmental Quality copper criteria near the mouth of the Chesapeake Bay, however, would need to decrease to protect species of equal or greater sensitivity to that of the marine mussel, Mytilus sp.
NASA Astrophysics Data System (ADS)
Nicolle, Amandine; Dumas, Franck; Foveau, Aurélie; Foucher, Eric; Thiébaut, Eric
2013-06-01
The king scallop (Pecten maximus) is one of the most important benthic species of the English Channel, as it constitutes the largest fishery in terms of landings in this area. To support strategies of spatial fishery management, we developed a high-resolution biophysical model to study scallop dispersal in two bays along the French coast of the English Channel (i.e., the bay of Saint-Brieuc and the bay of Seine) and to quantify the relative roles of local hydrodynamic processes, temperature-dependent planktonic larval duration (PLD) and active swimming behaviour (SB). The two bays were chosen for three reasons: (1) the distribution of the scallop stocks in these areas is well known from annual scallop stock surveys, (2) these two bays harbour important fisheries, and (3) scallops in these two areas present some differences in terms of reproductive cycle and spawning duration. The English Channel currents and temperature were simulated for 10 years (2000-2010) with the MARS-3D code and then used by the Lagrangian module of MARS-3D to model larval transport. Results were analysed in terms of larval distribution at settlement and connectivity rates. While larval transport in the two bays depended on both the tidal residual circulation and the wind-induced currents, the relative role of these two hydrodynamic processes varied between bays. In the bay of Saint-Brieuc, the main patterns of larval dispersal were due to tides, the wind being only a source of variability in the extent of the larval patch and the local retention rate. Conversely, in the bay of Seine, wind-induced currents altered both the direction and the extent of larval transport. The main effect of a variable PLD, linked to the thermal history of each larva, was to reduce the spread of dispersal and consequently increase local retention by about 10% on average. Although swimming behaviour could influence larval dispersal during the first days of the PLD, when larvae are mainly located in surface waters, it has a minor role in larval distribution at settlement and retention rates. The analysis of connectivity between subpopulations within each bay allows identification of the main sources of larvae, which depend on both the characteristics of local hydrodynamics and the spatial heterogeneity of reproductive outputs.
NASA Astrophysics Data System (ADS)
Balt, C.; Kincaid, C. R.; Ullman, D. S.
2010-12-01
Greenwich Bay and the Providence River represent two subsystems of the Narragansett Bay (RI) estuary with chronic water quality problems. Both underway and moored Acoustic Doppler Current Profiler (ADCP) observations have shown the presence of large-scale, subtidal gyres within these subsystems. Prior numerical models of Narragansett Bay, developed using the Regional Ocean Modeling System (ROMS), indicate that prevailing summer sea-breeze conditions are favorable to the evolution of stable circulation gyres, which increase retention times within each subsystem. Fluid dynamics laboratory models of the Providence River, conducted in the Geophysical Fluid Dynamics Laboratory of the Research School of Earth Sciences (Australian National University), reproduce gyres that match first-order features of the ADCP data. These laboratory models also reveal details of small-scale eddies along the edges of the retention gyre. We report results from spatially and temporally detailed current meter deployments (using SeaHorse Tilt Current Meters) in both subsystems, which reveal details of the growth and decay of gyres under various spring-summer forcing conditions. In particular, current meters were deployed during the severe flooding events in the Narragansett Bay watershed during March 2010. A combination of current meter data and high-resolution ROMS modeling is used to show how gyres effectively limit subtidal exchange from the Providence River and Greenwich Bay and to understand the forcing conditions that favor efficient flushing. The residence times of stable gyres within these regions can be an order of magnitude larger than values predicted by fraction-of-freshwater methods. ROMS modeling is employed to characterize gyre energy, stability, and flushing rates for a wide range of seasonal, wind and runoff scenarios.
Hurricane-induced Sediment Transport and Morphological Change in Jamaica Bay, New York
NASA Astrophysics Data System (ADS)
Hu, K.; Chen, Q. J.
2016-02-01
Jamaica Bay is located in Brooklyn and Queens, New York, on the western end of the south shore of Long Island. It has experienced a conversion of more than 60% of its vegetated salt-marsh islands to intertidal and subtidal mudflats. Hurricanes and nor'easters are among the important driving forces that reshape the coastal landscape quickly and affect wetland sustainability. Wetland protection and restoration need a better understanding of hydrodynamics and sediment transport in this area, especially under extreme weather conditions. Hurricane Sandy, which made landfall along the east coast on October 29, 2012, provides a critical opportunity for studying the impacts of hurricanes on sedimentation, erosion and morphological changes in Jamaica Bay and its salt marsh islands. The Delft3D model suite was applied to model hydrodynamics and sediment transport in Jamaica Bay and the salt marsh islands. Three domains were set up for nested computation. The local domain covering the bay and salt marshes has a resolution of 10 m. The wave module was online-coupled with the flow module. Vegetation effects were represented as a large number of rigid cylinders by a sub-module in Delft3D. Parameters in sediment transport and morphological change were carefully chosen and calibrated. Pre- and post-Sandy Surface Elevation Table (SET)/accretion data, including marker horizon (short-term) and 137Cs and 210Pb (long-term) measurements at salt marsh islands in Jamaica Bay, were used for model validation. Model results indicate that waves played an important role in hurricane-induced morphological change in Jamaica Bay and its wetlands. In addition, numerical experiments were carried out to investigate the impacts of hypothetical hurricanes. This study has been supported by the U.S. Geological Survey Hurricane Sandy Disaster Recovery Act Funds.
NASA Astrophysics Data System (ADS)
Welford, J. Kim; Peace, Alexander L.; Geng, Meixia; Dehler, Sonya A.; Dickie, Kate
2018-05-01
Mesozoic to Cenozoic continental rifting, breakup, and spreading between North America and Greenland led to the opening, from south to north, of the Labrador Sea and eventually Baffin Bay, between Baffin Island, northeast Canada, and northwest Greenland. Baffin Bay lies at the northern limit of this extinct rift, transform, and spreading system and remains largely underexplored. Given the sparsity of existing crustal-scale geophysical investigations of Baffin Bay, regional potential field methods and quantitative deformation assessments based on plate reconstructions provide two means of examining Baffin Bay at the regional scale and drawing conclusions about its crustal structure, its rifting history, and the role of pre-existing structures in its evolution. Despite the identification of extinct spreading axes and fracture zones based on gravity data, insights into the nature and structure of the underlying crust have only been gleaned from limited deep seismic experiments, mostly concentrated in the north and east where the continental shelf is shallower and wider. Baffin Bay is partially underlain by oceanic crust, with zones of variable width of extended continental crust along its margins. 3-D gravity inversions, constrained by bathymetry and depth-to-basement data, have generated a range of 3-D crustal density models that collectively reveal an asymmetric distribution of extended continental crust, approximately 25-30 km thick, along the margins of Baffin Bay, with a wider zone on the Greenland margin. A zone of 5 to 13 km thick crust lies at the centre of Baffin Bay, with the thinnest crust (5 km thick) clearly aligning with Eocene spreading centres. The resolved crustal thicknesses are generally in agreement with available seismic constraints, with discrepancies mostly corresponding to zones of higher density lower crust along the Greenland margin and Nares Strait. Deformation modelling of the rifted margins of Baffin Bay, based on independent plate reconstructions in GPlates, was performed to gauge the influence of original crustal thickness and the width of the deformation zone on the crustal thicknesses obtained from the gravity inversions. These results show the best match with the gravity inversions for an original unstretched crustal thickness of 34-36 km, consistent with present-day crustal thicknesses derived from teleseismic studies beyond the likely continentward limits of rifting around the margins of Baffin Bay. The width of the deformation zone has only a minimal influence on the modelled crustal thicknesses, provided the zone is wide enough that edge effects do not interfere with the main modelled domain.
Influence of net freshwater supply on salinity in Florida Bay
Nuttle, William K.; Fourqurean, James W.; Cosby, Bernard J.; Zieman, Joseph C.; Robblee, Michael B.
2000-01-01
An annual water budget for Florida Bay, the large, seasonally hypersaline estuary in Everglades National Park, was constructed using physically based models and long-term (31 years) data on salinity, hydrology, and climate. Effects of seasonal and interannual variations of the net freshwater supply (runoff plus rainfall minus evaporation) on salinity variation within the bay were also examined. Particular attention was paid to the effects of runoff, which are the focus of ambitious plans to restore and conserve the Florida Bay ecosystem. From 1965 to 1995 the annual runoff from the Everglades into the bay was less than one tenth of the annual direct rainfall onto the bay, while estimated annual evaporation slightly exceeded annual rainfall. The average net freshwater supply to the bay over a year was thus approximately zero, and interannual variations in salinity appeared to be affected primarily by interannual fluctuations in rainfall. At the annual scale, runoff apparently had little effect on the bay as a whole during this period. On a seasonal basis, variations in rainfall, evaporation, and runoff were not in phase, and the net freshwater supply to the bay varied between positive and negative values, contributing to a strong seasonal pattern in salinity, especially in regions of the bay relatively isolated from exchanges with the Gulf of Mexico and the Atlantic Ocean. Changes in runoff could have a greater effect on salinity in the bay if the seasonal patterns of rainfall and evaporation and the timing of the runoff are considered. One model was also used to simulate spatial and temporal patterns of the salinity responses expected to result from changes in net freshwater supply. Simulations in which runoff was increased by a factor of 2 (but with no change in spatial pattern) indicated that increased runoff would lower salinity values in eastern Florida Bay, increase the variability of salinity in the South Region, but have little effect on salinity in the Central and West Regions.
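The budget logic above reduces to a one-line identity. The values in the R sketch below are hypothetical bay-averaged depth equivalents, chosen only to show how a small runoff term and near-cancelling rainfall and evaporation yield a net supply close to zero.

# Net freshwater supply identity (all values hypothetical, e.g. cm/yr).
runoff      <- 10    # Everglades runoff: less than 1/10 of direct rainfall
rainfall    <- 110
evaporation <- 115   # slightly exceeds rainfall
net_supply  <- runoff + rainfall - evaporation   # ~5 cm/yr, near zero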
NASA Astrophysics Data System (ADS)
Lee, Jun; Lee, Jungwoo; Yun, Sang-Leen; Oh, Hye-Cheol
2017-08-01
The purpose of this study was to develop a two-dimensional shallow water flow model using the finite volume method on a combined unstructured triangular and quadrilateral grid system to simulate coastal, estuarine and river flows. The intercell numerical fluxes were calculated using the classical Osher-Solomon approximate Riemann solver for the governing conservation laws, to be able to handle wetting and drying processes and to capture tidal-bore-like phenomena. The developed model was validated against several benchmark test problems, including the two-dimensional dam-break problem. The model results agreed well with the results of other models and with experimental results in the literature. The combined unstructured triangular and quadrilateral grid system was successfully implemented in the model, so the developed model is more flexible when applied to an estuarine system that includes narrow channels. The model was then tested in Mobile Bay, Alabama, USA. The developed model reproduced water surface elevation well, with an overall predictive skill of 0.98. We found that the primary inlet, Main Pass, covered only 35% of the freshwater exchange while it covered 89% of the total water exchange between the ocean and Mobile Bay. There was also a discharge phase difference between Main Pass and the secondary inlet, Pass aux Herons, and this phase difference in flows would play a critical role in the exchange of substances between the eastern Mississippi Sound and the northern Gulf of Mexico through Main Pass and Pass aux Herons in Mobile Bay.
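A predictive skill of this kind is commonly the Willmott (1981) index of agreement; under that assumption, it can be computed as below. This is a generic metric function in R, not code from the paper.

# Willmott (1981) predictive skill: 1 = perfect agreement with observations.
willmott_skill <- function(model, obs) {
  ob <- mean(obs)
  1 - sum((model - obs)^2) / sum((abs(model - ob) + abs(obs - ob))^2)
}
# usage: willmott_skill(eta_modeled, eta_observed) for water-level series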
Baumeister, Judith; Fischer, Ruediger; Eckenberg, Peter; Henninger, Kerstin; Ruebsamen-Waigmann, Helga; Kleymann, Gerald
2007-01-01
The efficacy of BAY 57-1293, a novel non-nucleosidic inhibitor of herpes simplex virus 1 and 2 (HSV-1 and HSV-2), bovine herpesvirus and pseudorabies virus, was studied in the guinea pig model of genital herpes in comparison with the licensed drug valaciclovir (Valtrex). Early therapy with BAY 57-1293 almost completely suppressed the symptoms of acute HSV-2 infection, and reduced virus shedding and viral load in the sacral dorsal root ganglia by up to three orders of magnitude, resulting in decreased latency and a greatly diminished frequency of subsequent recurrent episodes. In contrast, valaciclovir showed only moderate effects in this set of experiments. When treatment was initiated late during the course of disease after symptoms were apparent, that is, a setting closer to most clinical situations, the efficacy of therapy with BAY 57-1293 was even more pronounced. Compared with valaciclovir, BAY 57-1293 halved the time necessary for complete healing. Moreover, the onset of action was fast, so that only very few animals developed new lesions after treatment commenced. Finally, in a study addressing the treatment of recurrent disease in animals whose primary infection had remained untreated, BAY 57-1293 was efficient in suppressing the episodes. In summary, superior potency and efficacy of BAY 57-1293 over standard treatment with valaciclovir was demonstrated in relevant animal models of human genital herpes disease in terms of abrogating an HSV infection, reducing latency and the frequency of subsequent recurrences. Furthermore, BAY 57-1293 shortens the time to healing even if initiation of therapy is delayed.
Bay scallops (Argopecten irradians) inhabit shallow subtidal habitats along the Atlantic coast of the United States and require settlement substrates, such as submerged aquatic vegetation (SAV), for their early juvenile stages. The short lifespan of bay scallops (1-2 yr) coupled...
Federal Register 2010, 2011, 2012, 2013, 2014
2012-12-19
.../Electronic Equipment Bay Fire Detection and Smoke Penetration AGENCY: Federal Aviation Administration (FAA... where the flightcrew could determine the origin of smoke or fire by a straightforward airplane flight.... The FAA has no requirement for smoke and/or fire detection in the electrical/electronic equipment bays...
Federal Register 2010, 2011, 2012, 2013, 2014
2013-03-04
...; Electrical/Electronic Equipment Bay Fire Detection and Smoke Penetration AGENCY: Federal Aviation... where the flight crew could determine the origin of smoke or fire by a straightforward airplane flight.... The FAA has no requirement for smoke and/or fire detection in the electrical/electronic equipment bays...
In the mid-1990s the Tampa Bay Estuary Program proposed a nutrient reduction strategy focused on improving water clarity to promote seagrass expansion within Tampa Bay. A System Dynamics Model is being developed to evaluate spatially and temporally explicit impacts of nutrient r...
Replacement of tritiated water from irradiated fuel storage bay
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castillo, I.; Boniface, H.; Suppiah, S.
2015-03-15
Recently, AECL developed a novel method to reduce tritium emissions (to groundwater) and personnel doses at the NRU (National Research Universal) reactor irradiated fuel storage bay (also known as the rod or spent fuel bay) through a water swap process. The light water in the fuel bay had built up tritium that had been transferred from the heavy water moderator through normal fuel transfers. The major advantage of the thermal stratification method was that a very effective tritium reduction could be achieved by swapping a minimal volume of bay water, as warm tritiated water could be skimmed off the bay surface. A demonstration of the method was done that involved Computational Fluid Dynamics (CFD) modeling of the swap process and a test program that showed excellent agreement with model predictions for the effective removal of almost all the tritium with a minimal water volume. Building on the successful demonstration, AECL fabricated, installed, commissioned and operated a full-scale system to perform a water swap. This full-scale water swap operation achieved a tritium removal efficiency of about 96%.