Science.gov

Sample records for algorithm random forests

  1. Research on machine learning framework based on random forest algorithm

    NASA Astrophysics Data System (ADS)

    Ren, Qiong; Cheng, Hui; Han, Hai

    2017-03-01

    With the continuous development of machine learning, industry and academia have released many machine learning frameworks built on distributed computing platforms, and these have been widely adopted. However, existing machine learning frameworks are constrained by the limitations of the underlying algorithms themselves, such as sensitivity to parameter choices, interference from noise, and a high barrier to use. This paper introduces the research background of machine learning frameworks and, building on the random forest algorithm commonly used for classification, sets out the research objectives and content, proposes an improved adaptive random forest algorithm (referred to as ARF), and, on the basis of ARF, designs and implements a machine learning framework.

  2. Fault Detection of Aircraft System with Random Forest Algorithm and Similarity Measure

    PubMed Central

    Park, Wookje; Jung, Sikhang

    2014-01-01

    A fault detection algorithm was developed using a similarity measure and the random forest algorithm. The resulting algorithm was applied to an unmanned aerial vehicle (UAV) that we prepared. The similarity measure was designed with the help of distance information, and its usefulness was verified by proof. Fault decisions were made by calculating a weighted similarity measure. Twelve available coefficients from the healthy and faulty status data groups were used to make the decision. The similarity measure weights were obtained through the random forest algorithm (RFA), which provides data priorities. To obtain a fast decision response, a limited number of coefficients was also considered. The relation between detection rate and the amount of feature data was analyzed and illustrated. Through repeated trials of the similarity calculation, the useful amount of data was determined. PMID:25057508

  3. Tissue segmentation of computed tomography images using a Random Forest algorithm: a feasibility study

    NASA Astrophysics Data System (ADS)

    Polan, Daniel F.; Brady, Samuel L.; Kaufman, Robert A.

    2016-09-01

    There is a need for robust, fully automated whole body organ segmentation for diagnostic CT. This study investigates and optimizes a Random Forest algorithm for automated organ segmentation; explores the limitations of a Random Forest algorithm applied to the CT environment; and demonstrates segmentation accuracy in a feasibility study of pediatric and adult patients. To the best of our knowledge, this is the first study to investigate a trainable Weka segmentation (TWS) implementation using Random Forest machine-learning as a means to develop a fully automated tissue segmentation tool developed specifically for pediatric and adult examinations in a diagnostic CT environment. Current innovation in computed tomography (CT) is focused on radiomics, patient-specific radiation dose calculation, and image quality improvement using iterative reconstruction, all of which require specific knowledge of tissue and organ systems within a CT image. The purpose of this study was to develop a fully automated Random Forest classifier algorithm for segmentation of neck-chest-abdomen-pelvis CT examinations based on pediatric and adult CT protocols. Seven materials were classified: background, lung/internal air or gas, fat, muscle, solid organ parenchyma, blood/contrast-enhanced fluid, and bone tissue, using Matlab and the TWS plugin of FIJI. The following classifier feature filters of TWS were investigated: minimum, maximum, mean, and variance, each evaluated over a voxel radius of 2^n (n from 0 to 4), along with noise reduction and edge-preserving filters: Gaussian, bilateral, Kuwahara, and anisotropic diffusion. The Random Forest algorithm used 200 trees with 2 features randomly selected per node. The optimized auto-segmentation algorithm resulted in 16 image features including features derived from the maximum, mean, variance, Gaussian, and Kuwahara filters. Dice similarity coefficient (DSC) calculations between manually segmented and Random Forest algorithm segmented images from 21
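
    As an illustration only (not the authors' TWS/FIJI pipeline), the filter bank described above can be sketched with SciPy and scikit-learn; the synthetic image, labels, and filter sizes below are assumptions:

    ```python
    import numpy as np
    from scipy import ndimage
    from sklearn.ensemble import RandomForestClassifier

    def feature_stack(img):
        """Min/max/mean/variance filters over radii 2**n (n = 0..4), plus a Gaussian."""
        feats = [img.ravel()]
        for n in range(5):
            size = 2 * 2 ** n + 1  # window spanning a radius of 2**n voxels
            feats.append(ndimage.minimum_filter(img, size=size).ravel())
            feats.append(ndimage.maximum_filter(img, size=size).ravel())
            mean = ndimage.uniform_filter(img, size=size)
            feats.append(mean.ravel())
            # variance over the window: E[x^2] - (E[x])^2
            feats.append((ndimage.uniform_filter(img**2, size=size) - mean**2).ravel())
        feats.append(ndimage.gaussian_filter(img, sigma=2).ravel())
        return np.column_stack(feats)

    img = np.random.rand(64, 64)               # stand-in for a CT slice
    labels = (img > 0.5).astype(int).ravel()   # stand-in for manual annotations

    # 200 trees with 2 features tried per node, as in the abstract
    rf = RandomForestClassifier(n_estimators=200, max_features=2)
    rf.fit(feature_stack(img), labels)
    segmented = rf.predict(feature_stack(img)).reshape(img.shape)
    ```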

  4. Fault diagnosis in spur gears based on genetic algorithm and random forest

    NASA Astrophysics Data System (ADS)

    Cerrada, Mariela; Zurita, Grover; Cabrera, Diego; Sánchez, René-Vinicio; Artés, Mariano; Li, Chuan

    2016-03-01

    There are growing demands for condition-based monitoring of gearboxes, and therefore new methods to improve the reliability, effectiveness, and accuracy of gear fault detection ought to be evaluated. Feature selection is still an important aspect of machine learning-based diagnosis for reaching good performance in diagnostic models. On the other hand, random forest classifiers are suitable models for industrial environments where large data samples are not usually available for training such diagnostic models. The main aim of this research is to build a robust system for multi-class fault diagnosis in spur gears by selecting the best set of condition parameters in the time, frequency, and time-frequency domains, extracted from vibration signals. The diagnosis is performed using genetic algorithms and a classifier based on random forest, in a supervised environment. The original set of condition parameters is reduced by around 66% of its initial size using genetic algorithms, while still achieving an acceptable classification precision of over 97%. The approach is tested on real vibration signals considering several fault classes, one of them an incipient fault, under different running conditions of load and velocity.

  5. Urban Road Detection in Airborne Laser Scanning Point Cloud Using Random Forest Algorithm

    NASA Astrophysics Data System (ADS)

    Kaczałek, B.; Borkowski, A.

    2016-06-01

    The objective of this research is to detect points that describe a road surface in an unclassified airborne laser scanning (ALS) point cloud. For this purpose we use the Random Forest learning algorithm. The proposed methodology consists of two stages: preparation of features and supervised point cloud classification. In this approach we consider only ALS points representing the last echo. For these points, RGB, intensity, the normal vectors, their mean values, and the standard deviations are provided. Moreover, local and global height variations are taken into account as components of a feature vector. The feature vectors are calculated on the basis of a 3D Delaunay triangulation. The proposed methodology was tested on point clouds with an average point density of 12 pts/m2 that represent a large urban scene. A significance level of 15% was set for the decision trees of the learning algorithm. As a result of the Random Forest classification we received two subsets of ALS points, one of which represents points belonging to the road network. After evaluating the classification we achieved an overall classification accuracy of 90%. Finally, the ALS points representing roads were merged and simplified into road network polylines using morphological operations.

  6. Modification of the random forest algorithm to avoid statistical dependence problems when classifying remote sensing imagery

    NASA Astrophysics Data System (ADS)

    Cánovas-García, Fulgencio; Alonso-Sarría, Francisco; Gomariz-Castillo, Francisco; Oñate-Valdivieso, Fernando

    2017-06-01

    Random forest is a classification technique widely used in remote sensing. One of its advantages is that it produces an estimation of classification accuracy based on the so-called out-of-bag cross-validation method. It is usually assumed that this estimation is not biased and may be used instead of validation based on an external data-set or a cross-validation external to the algorithm. In this paper we show that this is not necessarily the case when classifying remote sensing imagery using training areas with several pixels or objects. According to our results, out-of-bag cross-validation clearly overestimates accuracy, both overall and per class. The reason is that, within a training patch, pixels or objects are not statistically independent of each other; however, they are split by bootstrapping into in-bag and out-of-bag as if they really were. We believe that putting whole patches, rather than individual pixels/objects, into one set or the other would produce a less biased out-of-bag cross-validation. To deal with the problem, we propose a modification of the random forest algorithm that splits training patches instead of the pixels (or objects) that compose them. This modified algorithm does not overestimate accuracy and has no lower predictive capability than the original. When its results are validated with an external data-set, the accuracy is not different from that obtained with the original algorithm. We analysed three remote sensing images with different classification approaches (pixel and object based); in the three cases reported, the proposed modification produces a less biased accuracy estimation.
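
    The proposed modification amounts to bootstrapping whole patches rather than individual pixels, so that a tree's out-of-bag pixels never come from patches it was trained on. A minimal sketch of this idea, assuming synthetic data and a hypothetical patch_id grouping (not the authors' code):

    ```python
    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 8))              # pixel/object features
    patch_id = rng.integers(0, 50, size=1000)   # training patch of each pixel
    y = (X[:, 0] + rng.normal(scale=0.1, size=1000) > 0).astype(int)

    patches = np.unique(patch_id)
    votes = np.zeros((len(y), 2))
    for _ in range(100):                                               # 100 trees
        in_bag = rng.choice(patches, size=len(patches), replace=True)  # bootstrap patches
        train = np.isin(patch_id, in_bag)
        tree = DecisionTreeClassifier().fit(X[train], y[train])
        oob = ~train                              # whole patches stay out-of-bag together
        votes[oob, tree.predict(X[oob])] += 1

    seen = votes.sum(axis=1) > 0
    oob_acc = (votes[seen].argmax(axis=1) == y[seen]).mean()
    print(f"patch-level OOB accuracy: {oob_acc:.3f}")
    ```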

  7. Recursive random forest algorithm for constructing multilayered hierarchical gene regulatory networks that govern biological pathways

    PubMed Central

    Zhang, Kui; Busov, Victor; Wei, Hairong

    2017-01-01

    Background Present knowledge indicates that a multilayered hierarchical gene regulatory network (ML-hGRN) often operates above a biological pathway. Although ML-hGRNs are very important for understanding how a pathway is regulated, there is almost no computational algorithm for directly constructing them. Results A backward elimination random forest (BWERF) algorithm was developed for constructing the ML-hGRN operating above a biological pathway. For each pathway gene, BWERF used a random forest model to calculate the importance values of all transcription factors (TFs) for the pathway gene recursively, with a portion (e.g. 1/10) of the least important TFs excluded in each round of modeling; in each round, the importance values of all TFs for the pathway gene were updated and ranked, until only one TF remained in the list. Afterwards, the importance values of a TF to all pathway genes were aggregated and fitted to a Gaussian mixture model to determine which TFs to retain for the regulatory layer immediately above the pathway layer. The TFs acquired at this second layer were then set as the new bottom layer to infer the next layer up, and this process was repeated until an ML-hGRN with the expected number of layers was obtained. Conclusions BWERF improved the accuracy of constructing ML-hGRNs because it used backward elimination to exclude noise genes and aggregated the individual importance values when determining TF retention. We validated BWERF by using it to construct ML-hGRNs operating above the mouse pluripotency maintenance pathway and the Arabidopsis lignocellulosic pathway. Compared to GENIE3, BWERF showed an improvement in recognizing authentic TFs regulating a pathway. Compared to the bottom-up Gaussian graphical model algorithm we developed for constructing ML-hGRNs, BWERF can construct ML-hGRNs with significantly fewer edges, enabling biologists to choose the implicit edges for experimental
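
    The backward-elimination loop described here is straightforward to sketch. The following toy version, assuming synthetic expression matrices and scikit-learn (not the BWERF implementation), drops the least important ~1/10 of TFs each round and re-ranks the rest:

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(1)
    tf_expr = rng.normal(size=(200, 50))   # 200 samples x 50 candidate TFs
    gene_expr = tf_expr[:, 3] + 0.5 * tf_expr[:, 7] + rng.normal(scale=0.1, size=200)

    tfs = list(range(tf_expr.shape[1]))
    eliminated = []                        # least important TFs, in elimination order
    while len(tfs) > 1:
        rf = RandomForestRegressor(n_estimators=100, random_state=0)
        rf.fit(tf_expr[:, tfs], gene_expr)
        order = np.argsort(rf.feature_importances_)   # ascending importance
        n_drop = max(1, len(tfs) // 10)               # exclude ~1/10 per round
        eliminated.extend(tfs[i] for i in order[:n_drop])
        tfs = [tfs[i] for i in order[n_drop:]]        # re-rank the survivors next round
    eliminated.extend(tfs)
    print("last TFs standing (most important):", eliminated[-5:])
    ```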

  8. Feature selection for outcome prediction in oesophageal cancer using genetic algorithm and random forest classifier.

    PubMed

    Paul, Desbordes; Su, Ruan; Romain, Modzelewski; Sébastien, Vauclin; Pierre, Vera; Isabelle, Gardin

    2016-12-28

    The outcome prediction of patients can greatly help to personalize cancer treatment. A large number of quantitative features (clinical exams, imaging, …) are potentially useful for assessing patient outcome. The challenge is to choose the most predictive subset of features. In this paper, we propose a new feature selection strategy called GARF (genetic algorithm based on random forest), applied to features extracted from positron emission tomography (PET) images and clinical data. The most relevant features, predictive of the therapeutic response or prognostic of patient survival 3 years after the end of treatment, were selected using GARF on a cohort of 65 patients with locally advanced oesophageal cancer eligible for chemo-radiation therapy. The most relevant predictive results were obtained with a subset of 9 features, leading to a random forest misclassification rate of 18±4% and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.823±0.032. The most relevant prognostic results were obtained with 8 features, leading to an error rate of 20±7% and an AUC of 0.750±0.108. Both predictive and prognostic results show better performance using GARF than using the 4 other methods studied.

  9. Automatic classification of endogenous seismic sources within a landslide body using random forest algorithm

    NASA Astrophysics Data System (ADS)

    Provost, Floriane; Hibert, Clément; Malet, Jean-Philippe; Stumpf, André; Doubre, Cécile

    2016-04-01

    Different studies have shown the presence of microseismic activity in soft-rock landslides. The seismic signals exhibit significantly different features in the time and frequency domains, which allows their classification and interpretation. Most of the classes can be associated with different mechanisms of deformation occurring within and at the surface (e.g. rockfall, slide-quake, fissure opening, fluid circulation). However, some signals are not fully understood, and some classes contain too few examples to allow any interpretation. To move toward a more complete interpretation of the links between the dynamics of soft-rock landslides and the physical processes controlling their behaviour, a complete catalog of the endogenous seismicity is needed. We propose a multi-class detection method based on the random forests algorithm to automatically classify the source of seismic signals. Random forests is a supervised machine learning technique based on the computation of a large number of decision trees. The multiple decision trees are constructed from training sets including each of the target classes. In the case of seismic signals, these attributes may encompass spectral features but also waveform characteristics, multi-station observations and other relevant information. The Random Forest classifier is used because it provides state-of-the-art performance when compared with other machine learning techniques (e.g. SVM, Neural Networks) and requires no fine tuning. Furthermore it is relatively fast, robust, easy to parallelize, and inherently suitable for multi-class problems. In this work, we present the first results of the classification method applied to the seismicity recorded at the Super-Sauze landslide between 2013 and 2015. We selected a dozen seismic signal features that precisely characterize the spectral content of the signals (e.g. central frequency, spectrum width, energy in several frequency bands, spectrogram shape, spectrum local and global maxima

  10. Identifying subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm.

    PubMed

    Li, Zhan-Chao; Lai, Yan-Hua; Chen, Li-Li; Chen, Chao; Xie, Yun; Dai, Zong; Zou, Xiao-Yong

    2013-04-05

    In the post-genome era, one of the most important and challenging tasks is to identify the subcellular localizations of protein complexes and further elucidate their functions in human health, with applications to understanding disease mechanisms, diagnosis and therapy. Although various experimental approaches have been developed and employed to identify the subcellular localizations of protein complexes, the laboratory technologies fall far behind the rapid accumulation of protein complexes. Therefore, it is highly desirable to develop a computational method to rapidly and reliably identify the subcellular localizations of protein complexes. In this study, a novel method is proposed for predicting subcellular localizations of mammalian protein complexes based on graph theory with a random forest algorithm. Protein complexes are modeled as weighted graphs containing nodes and edges, where nodes represent proteins, edges represent protein-protein interactions, and weights are descriptors of protein primary structures. Topological structure features are proposed and adopted to characterize protein complexes based on graph theory. Random forest is employed to construct a model and predict subcellular localizations of protein complexes. Accuracies on a training set by a 10-fold cross-validation test for predicting plasma membrane/membrane attached, cytoplasm and nucleus are 84.78%, 71.30%, and 82.00%, respectively. Accuracies on the independent test set are 81.31%, 69.95% and 81.00%, respectively. These high prediction accuracies exhibit the state-of-the-art performance of the current method. It is anticipated that the proposed method may become a useful high-throughput tool and play a complementary role to the existing experimental techniques in identifying subcellular localizations of mammalian protein complexes. The source code of Matlab and the dataset can be obtained freely on request from the authors.

  11. Characterizing stand-level forest canopy cover and height using Landsat time series, samples of airborne LiDAR, and the Random Forest algorithm

    NASA Astrophysics Data System (ADS)

    Ahmed, Oumer S.; Franklin, Steven E.; Wulder, Michael A.; White, Joanne C.

    2015-03-01

    Many forest management activities, including the development of forest inventories, require spatially detailed forest canopy cover and height data. Among the various remote sensing technologies, LiDAR (Light Detection and Ranging) offers the most accurate and consistent means of obtaining reliable canopy structure measurements. A potential solution to reduce the cost of LiDAR data is to integrate transects (samples) of LiDAR data with frequently acquired and spatially comprehensive optical remotely sensed data. Although multiple regression is commonly used for such modeling, it often does not fully capture the complex relationships between forest structure variables. This study investigates the potential of Random Forest (RF), a machine learning technique, to estimate LiDAR-measured canopy structure using a time series of Landsat imagery. The study is implemented over a 2600 ha area of industrially managed coastal temperate forests on Vancouver Island, British Columbia, Canada. We implemented a trajectory-based approach to time series analysis that generates time since disturbance (TSD) and disturbance intensity information for each pixel, and we used this information to stratify the forest land base into two strata: mature forests and young forests. Canopy cover and height for three forest classes (i.e. mature, young, and mature and young combined) were modeled separately using multiple regression and Random Forest (RF) techniques. For all forest classes, the RF models provided improved estimates relative to the multiple regression models. The lowest validation error was obtained for the mature forest stratum in a RF model (R2 = 0.88, RMSE = 2.39 m and bias = -0.16 for canopy height; R2 = 0.72, RMSE = 0.068% and bias = -0.0049 for canopy cover). This study demonstrates the value of using disturbance and successional history to inform estimates of canopy structure and obtain improved estimates of forest canopy cover and height using the RF algorithm.

  12. Land cover classification using random forest with genetic algorithm-based parameter optimization

    NASA Astrophysics Data System (ADS)

    Ming, Dongping; Zhou, Tianning; Wang, Min; Tan, Tian

    2016-07-01

    Land cover classification based on remote sensing imagery is an important means to monitor, evaluate, and manage land resources. However, it requires robust classification methods that allow accurate mapping of complex land cover categories. Random forest (RF) is a powerful machine-learning classifier that can be used in land remote sensing. However, two important parameters of RF classification, namely the number of trees and the number of variables tried at each split, affect classification accuracy. Thus, optimal parameter selection is an unavoidable problem in RF-based image classification. This study uses the genetic algorithm (GA) to optimize these two parameters of RF to produce optimal land cover classification accuracy. HJ-1B CCD2 image data are used to classify six different land cover categories in Changping, Beijing, China. Experimental results show that GA-RF can avoid arbitrariness in the selection of parameters. The experiments also compare land cover classification results using the GA-RF method, the traditional RF method (with default parameters), and the support vector machine method. Compared with the traditional RF and support vector machine methods, the GA-RF method improved classification accuracy by 1.02% and 6.64%, respectively. The comparison results show that GA-RF is a feasible solution for land cover classification without compromising accuracy or incurring excessive time.
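
    A toy sketch of GA-based tuning of the two RF parameters, with out-of-bag accuracy as the fitness function; the population size, ranges, and mutation scheme are illustrative assumptions, not the authors' settings:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(2)
    X, y = make_classification(n_samples=500, n_features=20, random_state=0)

    def fitness(ntree, mtry):
        rf = RandomForestClassifier(n_estimators=ntree, max_features=mtry,
                                    oob_score=True, random_state=0).fit(X, y)
        return rf.oob_score_

    pop = [(int(rng.integers(50, 301)), int(rng.integers(1, 21))) for _ in range(8)]
    for gen in range(5):
        ranked = sorted(pop, key=lambda p: fitness(*p), reverse=True)
        parents = ranked[:4]                                  # truncation selection
        children = []
        for a, b in zip(parents, parents[1:] + parents[:1]):
            ntree = a[0] if rng.random() < 0.5 else b[0]      # uniform crossover
            mtry = a[1] if rng.random() < 0.5 else b[1]
            ntree = int(np.clip(ntree + rng.integers(-50, 51), 10, 500))  # mutation
            mtry = int(np.clip(mtry + rng.integers(-2, 3), 1, X.shape[1]))
            children.append((ntree, mtry))
        pop = parents + children

    best = max(pop, key=lambda p: fitness(*p))
    print("best (ntree, mtry):", best)
    ```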

  13. Forest Fires in a Random Forest

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhaïl; Vega Orozco, Carmen D.

    2013-04-01

    Forest fires in Canton Ticino (Switzerland) are very complex phenomena. Meteorological data can explain some occurrences of fires in time, but not necessarily in space. Using anthropogenic and geographical feature data with the random forest algorithm, this study tries to highlight the factors that most influence fire ignition and to identify areas at risk. The fundamental scientific problem considered in the present research is the application of random forest algorithms to the analysis and modeling of forest fire patterns in a high-dimensional input feature space. This study focuses on the 2,224 anthropogenic forest fires among the 2,401 forest fire ignition points that occurred in Canton Ticino from 1969 to 2008. Provided by the Swiss Federal Institute for Forest, Snow and Landscape Research (WSL), the database characterizes each fire by its location (x,y coordinates of the ignition point), start date, duration, burned area, and other information such as ignition cause and topographic features such as slope, aspect, altitude, etc. In addition, the VECTOR25 database from SwissTopo was used to extract the distances between fire ignition points and anthropogenic structures such as buildings, the road network, the rail network, etc. Developed by L. Breiman and A. Cutler, the Random Forests (RF) algorithm provides an ensemble of classification and regression trees. By a pseudo-random variable selection at each split node, this method grows a variety of decision trees that do not return the same results, and thus, by a committee system, returns a value with better accuracy than other machine learning methods. The algorithm directly incorporates a measure of variable importance, which is used to identify the factors affecting forest fires. Using this measure, several models can be fitted, and a prediction can then be made throughout the validity domain of Canton Ticino. Comprehensive RF analysis was carried out in order to 1

  14. Building a genetic risk model for bipolar disorder from genome-wide association data with random forest algorithm

    PubMed Central

    Chuang, Li-Chung; Kuo, Po-Hsiu

    2017-01-01

    A genetic risk score could be beneficial in assisting clinical diagnosis for complex diseases with high heritability. With large-scale genome-wide association (GWA) data, the current study constructed a genetic risk model with a machine learning approach for bipolar disorder (BPD). The GWA dataset of BPD from the Genetic Association Information Network was used as the training data for model construction, and the Systematic Treatment Enhancement Program (STEP) GWA data were used as the validation dataset. A random forest algorithm was applied to pre-filtered markers, and variable importance indices were assessed. A total of 289 candidate markers were selected by random forest procedures with good discriminability; the area under the receiver operating characteristic curve was 0.944 (0.935–0.953) in the training set and 0.702 (0.681–0.723) in the STEP dataset. Using a score cutoff of 184, the sensitivity and specificity for BPD were 0.777 and 0.854, respectively. Pathway analyses revealed important biological pathways for the identified genes. In conclusion, the present study identified informative genetic markers to differentiate BPD from healthy controls with acceptable discriminability in the validation dataset. In the future, diagnostic classification can be further improved by assessing more comprehensive clinical risk factors and jointly analysing them with genetic data in large samples. PMID:28045094

  15. Land cover and land use mapping of the iSimangaliso Wetland Park, South Africa: comparison of oblique and orthogonal random forest algorithms

    NASA Astrophysics Data System (ADS)

    Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad

    2016-01-01

    In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The binary oRF algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistical difference between the classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms classified the indigenous forest class poorly, as indicated by the low UAs and PAs. Finally, the results from this study support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.

  16. An automated bladder volume measurement algorithm by pixel classification using random forests.

    PubMed

    Annangi, Pavan; Frigstad, Sigmund; Subin, S B; Torp, Anders; Ramasubramaniam, Sundararajan; Varna, Srinivas

    2016-08-01

    Residual bladder volume measurement is a very important marker for patients with urinary retention problems. Handheld ultrasound devices would be extremely useful for monitoring patients with these conditions at the bedside by nurses, or in an outpatient setting by general physicians. However, to increase the usage of these devices by nontraditional users, automated tools that aid them in the scanning and measurement process will be of great help. In this paper, we develop a robust segmentation algorithm to automatically measure bladder volume by segmenting bladder contours from sagittal and transverse ultrasound views using a combination of machine learning and active contour algorithms. The algorithm is tested on 50 unseen images and 23 transverse and longitudinal image pairs, and the performance is reported.

  17. Prediction of protein-protein interactions using chaos game representation and wavelet transform via the random forest algorithm.

    PubMed

    Jia, J H; Liu, Z; Chen, X; Xiao, X; Liu, B X

    2015-10-02

    Studying the network of protein-protein interactions (PPIs) will provide valuable insights into the inner workings of cells. It is vitally important to develop an automated, high-throughput tool that efficiently predicts protein-protein interactions. This study proposes a new model for PPI prediction based on the concept of chaos game representation and the wavelet transform, which means that a considerable amount of sequence-order information can be incorporated into a set of discrete numbers. The advantage of using chaos game representation and the wavelet transform to formulate the protein sequence is that it reflects the overall sequence-order characteristics more effectively than the conventional correlation factors. Using such a formulation frame to represent the protein sequences means that the random forest algorithm can be used to conduct the prediction. The results for a large-scale independent test dataset show that the proposed model can achieve an excellent performance with an accuracy value of about 0.86 and a geometric mean value of about 0.85. The model is therefore a useful supplementary tool for PPI predictions. The predictor used in this article is freely available at http://www.jci-bioinfo.cn/PPI.

  18. Comparison between WorldView-2 and SPOT-5 images in mapping the bracken fern using the random forest algorithm

    NASA Astrophysics Data System (ADS)

    Odindi, John; Adam, Elhadi; Ngubane, Zinhle; Mutanga, Onisimo; Slotow, Rob

    2014-01-01

    Plant species invasion is known to be a major threat to socioeconomic and ecological systems. Due to the high cost and limited extent of urban green spaces, high mapping accuracy is necessary to optimize the management of such spaces. We compare the performance of the new-generation WorldView-2 (WV-2) and SPOT-5 images in mapping the bracken fern [Pteridium aquilinum (L.) Kuhn] in a conserved urban landscape. Using the random forest algorithm, grid-search approaches based on the out-of-bag estimate of error were used to determine the optimal ntree and mtry combinations. The variable importance and backward feature elimination techniques were further used to determine the influence of the image bands on mapping accuracy. Additionally, the value of commonly used vegetation indices in enhancing the classification accuracy was tested on the better-performing image data. Results show that the performance of the new WV-2 bands was better than that of the traditional bands. Overall classification accuracies of 84.72 and 72.22% were achieved for the WV-2 and SPOT images, respectively. Use of selected indices from the WV-2 bands increased the overall classification accuracy to 91.67%. The findings of this study show the suitability of new-generation imagery for mapping the bracken fern within often vulnerable urban natural vegetation cover types.
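
    A grid search over ntree and mtry driven by the out-of-bag error can be sketched as follows; the grid values and synthetic data are assumptions, not those of the study:

    ```python
    from itertools import product
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=400, n_features=8, random_state=0)

    best = None
    for ntree, mtry in product([100, 250, 500], [2, 4, 6]):
        rf = RandomForestClassifier(n_estimators=ntree, max_features=mtry,
                                    oob_score=True, random_state=0).fit(X, y)
        oob_error = 1.0 - rf.oob_score_   # out-of-bag estimate of error
        if best is None or oob_error < best[0]:
            best = (oob_error, ntree, mtry)

    print(f"OOB error {best[0]:.3f} at ntree={best[1]}, mtry={best[2]}")
    ```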

  19. A tale of two "forests": random forest machine learning aids tropical forest carbon mapping.

    PubMed

    Mascaro, Joseph; Asner, Gregory P; Knapp, David E; Kennedy-Bowdoin, Ty; Martin, Roberta E; Anderson, Christopher; Higgins, Mark; Chadwick, K Dana

    2014-01-01

    Accurate and spatially explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, with and without spatial contextual modeling; in the former case, x and y position were included directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (the so-called "out-of-bag" estimate), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best-performing run of Random Forest, and explained 59% of LiDAR-based carbon estimates within the validation area, compared to 37% for stratification or 43% for Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha(-1) when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation.

  20. Identification of non-coding RNAs with a new composite feature in the Hybrid Random Forest Ensemble algorithm

    PubMed Central

    Lertampaiporn, Supatcha; Thammarongtham, Chinae; Nukoolkit, Chakarida; Kaewkamnerdpong, Boonserm; Ruengjitchatchawalya, Marasri

    2014-01-01

    To identify non-coding RNA (ncRNA) signals within genomic regions, a classification tool was developed based on a hybrid random forest (RF) with a logistic regression model to efficiently discriminate short ncRNA sequences as well as long complex ncRNA sequences. This RF-based classifier was trained on a well-balanced dataset with a discriminative set of features and achieved an accuracy, sensitivity and specificity of 92.11%, 90.7% and 93.5%, respectively. The selected feature set includes a newly proposed feature, SCORE. This feature is generated based on a logistic regression function that combines five significant features (structure, sequence, modularity, structural robustness and coding potential) to enable improved characterization of long ncRNA (lncRNA) elements. The use of SCORE improved the performance of the RF-based classifier in the identification of Rfam lncRNA families. A genome-wide ncRNA classification framework was applied to a wide variety of organisms, with an emphasis on those of economic, social, public health, environmental and agricultural significance, such as various bacterial genomes, the Arthrospira (Spirulina) genome, and rice and human genomic regions. Our framework was able to identify known ncRNAs with sensitivities of greater than 90% and 77.7% for prokaryotic and eukaryotic sequences, respectively. Our classifier is available at http://ncrna-pred.com/HLRF.htm. PMID:24771344
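
    The SCORE construction, a logistic-regression combination of several base features appended as a single new feature for the RF, can be sketched as follows (synthetic data; the choice of five stand-in features is an assumption):

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression

    X, y = make_classification(n_samples=500, n_features=10, random_state=0)
    five = X[:, :5]   # stand-ins for structure, sequence, modularity, robustness, coding potential

    # SCORE: a logistic-regression combination of the five features, appended as one new feature
    score = LogisticRegression().fit(five, y).decision_function(five)[:, None]
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(np.hstack([X, score]), y)
    ```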

  21. Automated classification of seismic sources in a large database using the random forest algorithm: First results at Piton de la Fournaise volcano (La Réunion).

    NASA Astrophysics Data System (ADS)

    Hibert, Clément; Provost, Floriane; Malet, Jean-Philippe; Stumpf, André; Maggi, Alessia; Ferrazzini, Valérie

    2016-04-01

    In recent decades, the increasing quality of seismic sensors and the capability to transfer large quantities of data remotely have led to a rapid densification of local, regional and global seismic networks for near real-time monitoring. This technological advance permits the use of seismology to document geological and natural/anthropogenic processes (volcanoes, ice-calving, landslides, snow and rock avalanches, geothermal fields), but has also led to an ever-growing quantity of seismic data. This wealth of seismic data makes the construction of complete seismicity catalogs, which include earthquakes but also other sources of seismic waves, more challenging and very time-consuming, as this critical pre-processing stage is classically done by human operators. To overcome this issue, the development of automatic methods for processing continuous seismic data appears to be a necessity. The classification algorithm should satisfy the need for a method that is robust, precise and versatile enough to be deployed to monitor seismicity in very different contexts. We propose a multi-class detection method based on the random forests algorithm to automatically classify the source of seismic signals. Random forests is a supervised machine learning technique based on the computation of a large number of decision trees. The multiple decision trees are constructed from training sets including each of the target classes. In the case of seismic signals, these attributes may encompass spectral features but also waveform characteristics, multi-station observations and other relevant information. The Random Forests classifier is used because it provides state-of-the-art performance when compared with other machine learning techniques (e.g. SVM, Neural Networks) and requires no fine tuning. Furthermore it is relatively fast, robust, easy to parallelize, and inherently suitable for multi-class problems. In this work, we present the first results of the classification method applied

  22. Weighted Hybrid Decision Tree Model for Random Forest Classifier

    NASA Astrophysics Data System (ADS)

    Kulkarni, Vrushali Y.; Sinha, Pradeep K.; Petare, Manisha C.

    2016-06-01

    Random Forest is an ensemble, supervised machine learning algorithm. An ensemble generates many classifiers and combines their results by majority voting. Random forest uses the decision tree as its base classifier. In decision tree induction, an attribute split/evaluation measure is used to decide the best split at each node of the decision tree. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation among them. The work presented in this paper concerns attribute split measures and is a two-step process: first, a theoretical study of five selected split measures is carried out and a comparison matrix is generated to capture the pros and cons of each measure; these theoretical results are then verified through empirical analysis. For the empirical analysis, a random forest is generated using each of the five selected split measures, chosen one at a time (i.e., random forest using information gain, random forest using gain ratio, etc.). Next, based on this theoretical and empirical analysis, a new hybrid decision tree model for the random forest classifier is proposed. In this model, the individual decision trees in the Random Forest are generated using different split measures. The model is augmented by weighted voting based on the strength of each individual tree. The new approach has shown a notable increase in the accuracy of random forest.
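
    A minimal sketch of the hybrid idea, trees grown with different split measures and votes weighted by each tree's strength; scikit-learn only exposes the 'gini' and 'entropy' criteria, so the paper's full set of five measures is not reproduced, and weighting by validation accuracy is an illustrative choice:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=600, n_features=10, random_state=0)
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, random_state=0)

    rng = np.random.default_rng(3)
    trees, weights = [], []
    for i in range(50):
        criterion = ["gini", "entropy"][i % 2]          # alternate split measures
        rows = rng.integers(0, len(X_tr), len(X_tr))    # bootstrap sample
        t = DecisionTreeClassifier(criterion=criterion, max_features="sqrt",
                                   random_state=i).fit(X_tr[rows], y_tr[rows])
        trees.append(t)
        weights.append(t.score(X_val, y_val))           # tree "strength" as its vote weight

    votes = np.zeros((len(X_val), 2))
    for t, w in zip(trees, weights):
        votes[np.arange(len(X_val)), t.predict(X_val)] += w
    print("weighted-vote accuracy:", (votes.argmax(axis=1) == y_val).mean())
    ```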

  23. Spectral Classification of Asteroids by Random Forest

    NASA Astrophysics Data System (ADS)

    Huang, C.; Ma, Y. H.; Zhao, H. B.; Lu, X. P.

    2016-09-01

    With the growing volume of asteroid spectral and photometric data, a variety of classification methods for asteroids have been proposed. This paper classifies asteroids based on the observations of the Sloan Digital Sky Survey (SDSS) Moving Object Catalogue (MOC) using the random forest algorithm. With training data derived from the taxonomies of Tholen, Bus, Lazzaro, and DeMeo, and from Principal Component Analysis, we classify 48642 asteroids according to their g, r, i, and z SDSS magnitudes. In this way, asteroids are divided into 8 spectral classes (C, X, S, B, D, K, L, and V).

  24. Multivariable integration method for estimating sea surface salinity in coastal waters from in situ data and remotely sensed data using random forest algorithm

    NASA Astrophysics Data System (ADS)

    Liu, Meiling; Liu, Xiangnan; Liu, Da; Ding, Chao; Jiang, Jiale

    2015-02-01

    A random forest (RF) model was created to estimate sea surface salinity (SSS) in the Hong Kong Sea, China, by integrating in situ and remotely sensed data. Optical remotely sensed data from China's HJ-1 satellite and in situ data were collected. The salinity prediction model was developed from in situ environmental variables, namely sea surface temperature (SST), pH, total inorganic nitrogen (TIN) and Chl-a, which are strongly related to SSS according to Pearson's correlation analysis. Large-scale SSS was estimated using the established salinity model with the same input parameters. Ordinary kriging interpolation of the in situ data and a retrieval model based on the remotely sensed data were developed to obtain the large-scale input parameters of the model. Different numbers of trees in the forest (ntree) and numbers of features at each node (mtry) were tried in the RF model. The results showed that an optimum RF model was obtained with mtry=32 and ntree=2000, and the most important variable for SSS prediction was SST, followed by TIN, Chl-a and pH. This RF model was successful in evaluating the temporal-spatial distribution of SSS and had a relatively low estimation error. The root mean square error (RMSE) was less than 2.0 psu, the mean absolute error (MAE) was below 1.5 psu, and the absolute percent error (APE) was lower than 5%. The final RF salinity model was then compared with a multiple linear regression (MLR) model, a back-propagation artificial neural network model, and a classification and regression trees (CART) model. The RF model had a lower estimation error than the other three models. In addition, the RF model performed consistently across different periods, suggesting that it could be applied universally. This demonstrates that the RF algorithm has the capability to estimate SSS in coastal waters by integrating in situ and remotely sensed data.

  25. Mapping the distributions of C3 and C4 grasses in the mixed-grass prairies of southwest Oklahoma using the Random Forest classification algorithm

    NASA Astrophysics Data System (ADS)

    Yan, Dong; de Beurs, Kirsten M.

    2016-05-01

    The objective of this paper is to demonstrate a new method to map the distributions of C3 and C4 grasses at 30 m resolution and over a 25-year period (1988-2013) by combining the Random Forest (RF) classification algorithm with patch stable areas identified using the spatial pattern analysis software FRAGSTATS. Predictor variables for the RF classifications consisted of ten spectral variables, four soil edaphic variables and three topographic variables. We provide a confidence score for obtaining pure land cover at each pixel location by retrieving the classification tree votes. Classification accuracy assessments and predictor variable importance evaluations were conducted based on a repeated stratified sampling approach. Results show that patch stable areas obtained from larger patches are more appropriate sample data pools for training and validating RF classifiers for historical land cover mapping, and that it is more reasonable to use patch stable areas as sample pools to map land cover in a year closer to the present than in years further back in time. The percentage of high-confidence prediction pixels across the study area ranges from 71.18% in 1988 to 73.48% in 2013. The repeated stratified sampling approach is necessary to reduce the positive bias in the estimated classification accuracy caused by the possible selection of training and validation pixels from the same patch stable areas. The RF classification algorithm was able to identify the important environmental factors affecting the distributions of C3 and C4 grasses in our study area, such as elevation, soil pH, soil organic matter and soil texture.
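
    Retrieving tree votes as a per-pixel confidence score can be sketched as follows; the 90% agreement threshold and synthetic data are assumptions (scikit-learn's predict_proba averages per-tree probabilities, so votes are collected from the individual estimators directly):

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=300, n_features=6, random_state=0)
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

    preds = rf.predict(X)
    per_tree = np.stack([t.predict(X) for t in rf.estimators_])  # (n_trees, n_samples)
    vote_frac = (per_tree == preds).mean(axis=0)                 # tree agreement per pixel
    high_conf = vote_frac >= 0.9                                 # illustrative threshold
    print(f"{high_conf.mean():.1%} of pixels predicted with >=90% tree agreement")
    ```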

  26. Aggregated Recommendation through Random Forests

    PubMed Central

    2014-01-01

    Aggregated recommendation refers to the process of suggesting one kind of items to a group of users. Compared to user-oriented or item-oriented approaches, it is more general and, therefore, more appropriate for cold-start recommendation. In this paper, we propose a random forest approach to creating aggregated recommender systems. The approach is used to predict the rating of a group of users for a kind of items. In the preprocessing stage, we merge user, item, and rating information to construct an aggregated decision table, where rating information serves as the decision attribute. We also model the data conversion process corresponding to the new-user, new-item, and both-new problems. In the training stage, a forest is built for the aggregated training set, where each leaf is assigned a distribution of discrete ratings. In the testing stage, we present four prediction approaches to compute evaluation values based on the distribution of each tree. Experimental results on the well-known MovieLens dataset show that the aggregated approach maintains an acceptable level of accuracy. PMID:25180204

  27. Classification of large microarray datasets using fast random forest construction.

    PubMed

    Manilich, Elena A; Özsoyoğlu, Z Meral; Trubachev, Valeriy; Radivoyevitch, Tomas

    2011-04-01

    Random forest is an ensemble classification algorithm. It performs well when most predictive variables are noisy and can be used when the number of variables is much larger than the number of observations. The use of bootstrap samples and restricted subsets of attributes makes it more powerful than simple ensembles of trees. The main advantage of a random forest classifier is its explanatory power: it measures variable importance or impact of each factor on a predicted class label. These characteristics make the algorithm ideal for microarray data. It was shown to build models with high accuracy when tested on high-dimensional microarray datasets. Current implementations of random forest in the machine learning and statistics community, however, limit its usability for mining over large datasets, as they require that the entire dataset remains permanently in memory. We propose a new framework, an optimized implementation of a random forest classifier, which addresses specific properties of microarray data, takes computational complexity of a decision tree algorithm into consideration, and shows excellent computing performance while preserving predictive accuracy. The implementation is based on reducing overlapping computations and eliminating dependency on the size of main memory. The implementation's excellent computational performance makes the algorithm useful for interactive data analyses and data mining.

  28. Random forests for classification in ecology.

    PubMed

    Cutler, D Richard; Edwards, Thomas C; Beard, Karen H; Cutler, Adele; Hess, Kyle T; Gibson, Jacob; Lawler, Joshua J

    2007-11-01

    Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.

  29. Robust automated lymph node segmentation with random forests

    NASA Astrophysics Data System (ADS)

    Allen, David; Lu, Le; Yao, Jianhua; Liu, Jiamin; Turkbey, Evrim; Summers, Ronald M.

    2014-03-01

    Enlarged lymph nodes may indicate the presence of illness. Therefore, identification and measurement of lymph nodes provide essential biomarkers for diagnosing disease. Accurate automatic detection and measurement of lymph nodes can assist radiologists, providing better repeatability and quality assurance, but it is challenging because lymph nodes are often very small and have highly variable shapes. In this paper, we propose to tackle this problem via supervised statistical learning-based robust voxel labeling, specifically the random forest algorithm. Random forest employs an ensemble of decision trees that are trained on labeled multi-class data to recognize the data features, and is adopted here to handle low-level image features sampled and extracted from 3D medical scans. We exploit three types of image features (intensity, order-1 contrast and order-2 contrast) and evaluate their effectiveness in a random forest feature selection setting. The trained forest can then be applied to unseen data by voxel scanning via sliding windows (11×11×11) to assign the class label and class-conditional probability to each unlabeled voxel at the center of the window. Voxels from the manually annotated lymph nodes in a CT volume are treated as the positive class; background non-lymph-node voxels as negatives. We show that the random forest algorithm can be adapted to perform the voxel labeling task accurately and efficiently. The experimental results are very promising, with AUCs (area under curve) of the training and validation ROC (receiver operating characteristic) of 0.972 and 0.959, respectively. The visualized voxel labeling results also confirm the validity of the approach.

  30. Structure damage detection based on random forest recursive feature elimination

    NASA Astrophysics Data System (ADS)

    Zhou, Qifeng; Zhou, Hao; Zhou, Qingqing; Yang, Fan; Luo, Linkai

    2014-05-01

    Feature extraction is a key preliminary step in structural damage detection. In this paper, a structural damage detection method based on wavelet packet decomposition (WPD) and random forest recursive feature elimination (RF-RFE) is proposed. In order to obtain the most effective feature subset and improve identification accuracy, a two-stage feature selection method is adopted after WPD. First, the damage features are sorted according to the original random forest variable importance analysis. Second, RF-RFE is used to eliminate the least important feature and reorder the feature list in each iteration, yielding a new feature importance sequence. Finally, the k-nearest neighbor (KNN) algorithm is used as a benchmark classifier to evaluate the extracted feature subset. A four-storey steel shear building model is chosen as an example to verify the method. The experimental results show that the smaller feature set obtained by the proposed method achieves higher identification accuracy and reduces the detection time cost.
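
    A sketch of RF-RFE with a KNN benchmark, using scikit-learn's RFE wrapper on synthetic data (the WPD features of the study are replaced by random stand-ins):

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.feature_selection import RFE
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier

    X, y = make_classification(n_samples=300, n_features=40, n_informative=5, random_state=0)

    # Drop the least important feature each round, re-ranking by RF importance every time
    rfe = RFE(RandomForestClassifier(n_estimators=100, random_state=0),
              n_features_to_select=5, step=1).fit(X, y)

    # Evaluate the selected subset with KNN as the benchmark classifier
    acc = cross_val_score(KNeighborsClassifier(), X[:, rfe.support_], y, cv=5).mean()
    print("selected features:", np.flatnonzero(rfe.support_), "KNN accuracy:", round(acc, 3))
    ```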

  31. Phenotype Recognition for RNAi Screening by Random Projection Forest

    NASA Astrophysics Data System (ADS)

    Zhang, Bailing

    2011-06-01

    High-content screening is important in drug discovery. The use of images of living cells as the basic unit for molecule discovery can aid the identification of small compounds that alter cellular phenotypes. As such, efficient computational methods are required for the rate-limiting task of cellular phenotype identification. In this paper we first investigate the effectiveness of a feature description approach combining Haralick texture analysis with the Curvelet transform, and then propose a new ensemble approach for classification. The ensemble contains a set of base classifiers trained using random projections (RP) of the original features onto higher-dimensional spaces. With the Classification and Regression Tree (CART) as the base classifier, it has been empirically demonstrated that the proposed Random Projection Forest ensemble gives better classification results than those achieved by the Boosting, Bagging and Rotation Forest algorithms, offering a classification rate of ~88% with the smallest standard deviation, which compares favorably with the published result of 82%.
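
    A minimal sketch of a random projection forest: each CART is trained on its own random Gaussian projection of the features into a higher-dimensional space, and predictions are combined by majority vote (projection dimension and ensemble size are illustrative):

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=20, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rng = np.random.default_rng(4)
    ensemble = []
    for _ in range(25):
        R = rng.normal(size=(X.shape[1], 40))            # random projection, 20 -> 40 dims
        ensemble.append((R, DecisionTreeClassifier().fit(X_tr @ R, y_tr)))

    votes = np.zeros((len(X_te), 2))
    for R, tree in ensemble:
        votes[np.arange(len(X_te)), tree.predict(X_te @ R)] += 1   # majority voting
    print("RP-forest accuracy:", (votes.argmax(axis=1) == y_te).mean())
    ```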

  32. Random survival forests for competing risks

    PubMed Central

    Ishwaran, Hemant; Gerds, Thomas A.; Kogalur, Udaya B.; Moore, Richard D.; Gange, Stephen J.; Lau, Bryan M.

    2014-01-01

    We introduce a new approach to competing risks using random forests. Our method is fully non-parametric and can be used for selecting event-specific variables and for estimating the cumulative incidence function. We show that the method is highly effective for both prediction and variable selection in high-dimensional problems and in settings such as HIV/AIDS that involve many competing risks. PMID:24728979

  33. Random forests for genomic data analysis.

    PubMed

    Chen, Xi; Ishwaran, Hemant

    2012-06-01

    Random forests (RF) is a popular tree-based ensemble machine learning tool that is highly data adaptive, applies to "large p, small n" problems, and is able to account for correlation as well as interactions among features. This makes RF particularly appealing for high-dimensional genomic data analysis. In this article, we systematically review the applications and recent progresses of RF for genomic data, including prediction and classification, variable selection, pathway analysis, genetic association and epistasis detection, and unsupervised learning.

  34. An integrative computational framework based on a two-step random forest algorithm improves prediction of zinc-binding sites in proteins.

    PubMed

    Zheng, Cheng; Wang, Mingjun; Takemoto, Kazuhiro; Akutsu, Tatsuya; Zhang, Ziding; Song, Jiangning

    2012-01-01

    Zinc-binding proteins are the most abundant metalloproteins in the Protein Data Bank, where the zinc ions usually have catalytic, regulatory or structural roles critical to the function of the protein. Accurate prediction of zinc-binding sites is not only useful for the inference of protein function but also important for the prediction of 3D structure. Here, we present a new integrative framework that combines multiple sequence and structural properties and graph-theoretic network features, followed by an efficient feature selection, to improve prediction of zinc-binding sites. We investigate what information can be retrieved from the sequence, structure and network levels that is relevant to zinc-binding site prediction. We perform a two-step feature selection using random forest to remove redundant features and quantify the relative importance of the retrieved features. Benchmarking on a high-quality structural dataset containing 1,103 protein chains and 484 zinc-binding residues, our method achieved >80% recall at a precision of 75% for the zinc-binding residues Cys, His, Glu and Asp on 5-fold cross-validation tests, which is a 10%-28% higher recall at the same 75% precision compared to SitePredict and zincfinder at the residue level on the same dataset. The independent test also indicates that our method achieved recall of 0.790 and 0.759 at the residue and protein levels, respectively, outperforming the other two methods. Moreover, the AUC (the Area Under the Curve) and AURPC (the Area Under the Recall-Precision Curve) of our method are also better than those of the other two methods. Our method can not only be applied to large-scale identification of zinc-binding sites when structural information of the target is available, but also gives valuable insights into the important features arising from different levels that collectively characterize the zinc-binding sites. The scripts and datasets are available at http://protein.cau.edu.cn/zincidentifier/.

  16. Wildfire smoke detection using temporospatial features and random forest classifiers

    NASA Astrophysics Data System (ADS)

    Ko, Byoungchul; Kwak, Joon-Young; Nam, Jae-Yeal

    2012-01-01

We propose a wildfire smoke detection algorithm that uses temporospatial visual features and random forest (RF) classifiers, i.e., ensembles of decision trees. In general, wildfire smoke detection is particularly important for early warning systems because smoke is usually generated before flames; in addition, smoke can be detected from a long distance owing to its diffusion characteristics. In order to detect wildfire smoke using a video camera, temporospatial characteristics such as color, wavelet coefficients, motion orientation, and a histogram of oriented gradients are extracted from the preceding 100 corresponding frames and the current keyframe. Two RFs are then trained using independent temporal and spatial feature vectors. Finally, a candidate block is declared a smoke block if the smoke class attains the maximum average probability across the two RFs. The proposed algorithm was successfully applied to various wildfire-smoke and smoke-colored videos and performed better than other related algorithms.
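
    A minimal sketch of that two-forest decision rule, assuming scikit-learn and using random placeholder arrays in place of the color/wavelet/motion/HOG feature extraction:

      # Two forests, one per feature family; a block is labeled smoke when
      # the smoke class has the highest averaged probability.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(3)
      n = 300
      X_temporal = rng.normal(size=(n, 40))   # stand-in temporal features
      X_spatial = rng.normal(size=(n, 60))    # stand-in spatial features
      y = rng.integers(0, 2, size=n)          # 1 = smoke block, 0 = non-smoke

      rf_t = RandomForestClassifier(n_estimators=100, random_state=3).fit(X_temporal, y)
      rf_s = RandomForestClassifier(n_estimators=100, random_state=3).fit(X_spatial, y)

      # Average the two class-probability estimates and take the argmax.
      proba = (rf_t.predict_proba(X_temporal) + rf_s.predict_proba(X_spatial)) / 2
      is_smoke = proba.argmax(axis=1) == 1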

  17. Random Bits Forest: a Strong Classifier/Regressor for Big Data

    PubMed Central

    Wang, Yi; Li, Yi; Pu, Weilin; Wen, Kathryn; Shugart, Yin Yao; Xiong, Momiao; Jin, Li

    2016-01-01

Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ~10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). The algorithm also performed well in testing with an independent data set, a real psoriasis genome-wide association study (GWAS). PMID:27444562
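
    The shape of that pipeline (random shallow nonlinear feature maps feeding a forest) can be caricatured in a few lines. This sketch assumes scikit-learn, uses untrained random ReLU projections, and omits the paper's gradient-boosting generation and selection of 3-layer networks entirely; it illustrates the idea, not the method.

      # Loose sketch of the RBF pipeline shape only.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(4)
      n, p, n_nets = 800, 10, 200
      X = rng.normal(size=(n, p))
      y = np.sin(X[:, 0]) + X[:, 1] ** 2 + 0.1 * rng.normal(size=n)

      # Each "network" here is just a random projection + ReLU producing one
      # feature; the real method trains and selects small 3-layer networks.
      W = rng.normal(size=(p, n_nets))
      b = rng.normal(size=n_nets)
      H = np.maximum(X @ W + b, 0.0)          # random hidden activations

      rf = RandomForestRegressor(n_estimators=300, random_state=4).fit(H, y)
      print("train R^2 on random features:", rf.score(H, y))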

  18. Random Bits Forest: a Strong Classifier/Regressor for Big Data

    NASA Astrophysics Data System (ADS)

    Wang, Yi; Li, Yi; Pu, Weilin; Wen, Kathryn; Shugart, Yin Yao; Xiong, Momiao; Jin, Li

    2016-07-01

Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ~10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). The algorithm also performed well in testing with an independent data set, a real psoriasis genome-wide association study (GWAS).

  19. Randomized Algorithms for Matrices and Data

    NASA Astrophysics Data System (ADS)

    Mahoney, Michael W.

    2012-03-01

    This chapter reviews recent work on randomized matrix algorithms. By “randomized matrix algorithms,” we refer to a class of recently developed random sampling and random projection algorithms for ubiquitous linear algebra problems such as least-squares (LS) regression and low-rank matrix approximation. These developments have been driven by applications in large-scale data analysis—applications which place very different demands on matrices than traditional scientific computing applications. Thus, in this review, we will focus on highlighting the simplicity and generality of several core ideas that underlie the usefulness of these randomized algorithms in scientific applications such as genetics (where these algorithms have already been applied) and astronomy (where, hopefully, in part due to this review they will soon be applied). The work we will review here had its origins within theoretical computer science (TCS). An important feature in the use of randomized algorithms in TCS more generally is that one must identify and then algorithmically deal with relevant “nonuniformity structure” in the data. For the randomized matrix algorithms to be reviewed here and that have proven useful recently in numerical linear algebra (NLA) and large-scale data analysis applications, the relevant nonuniformity structure is defined by the so-called statistical leverage scores. Defined more precisely below, these leverage scores are basically the diagonal elements of the projection matrix onto the dominant part of the spectrum of the input matrix. As such, they have a long history in statistical data analysis, where they have been used for outlier detection in regression diagnostics. More generally, these scores often have a very natural interpretation in terms of the data and processes generating the data. For example, they can be interpreted in terms of the leverage or influence that a given data point has on, say, the best low-rank matrix approximation; and this
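
    The statistical leverage scores described above are concrete to compute: they are the squared row norms of the matrix of top-k left singular vectors, i.e. the diagonal of the projection onto the dominant subspace. A short numpy sketch with a synthetic matrix and an illustrative k:

      # Leverage scores via the SVD.
      import numpy as np

      rng = np.random.default_rng(5)
      A = rng.normal(size=(1000, 20))
      k = 5

      U, s, Vt = np.linalg.svd(A, full_matrices=False)
      Uk = U[:, :k]                            # basis for the dominant subspace
      leverage = np.sum(Uk ** 2, axis=1)       # diag of Uk @ Uk.T, row by row

      assert np.isclose(leverage.sum(), k)     # leverage scores sum to k
      print("most influential rows:", np.argsort(leverage)[::-1][:10])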

  20. Radiation metabolomics. 3. Biomarker discovery in the urine of gamma-irradiated rats using a simplified metabolomics protocol of gas chromatography-mass spectrometry combined with random forests machine learning algorithm.

    PubMed

    Lanz, Christian; Patterson, Andrew D; Slavík, Josef; Krausz, Kristopher W; Ledermann, Monika; Gonzalez, Frank J; Idle, Jeffrey R

    2009-08-01

Radiation metabolomics employing mass spectral technologies represents a plausible means of high-throughput, minimally invasive radiation biodosimetry. A simplified metabolomics protocol is described that employs ubiquitous gas chromatography-mass spectrometry and open source software, including the random forests machine learning algorithm, to uncover latent biomarkers of 3 Gy gamma radiation in rats. Urine was collected from six male Wistar rats and six sham-irradiated controls for 7 days, 4 prior to irradiation and 3 after irradiation. Water and food consumption, urine volume, body weight, and sodium, potassium, calcium, chloride, phosphate and urea excretion showed major effects from exposure to gamma radiation. The metabolomics protocol uncovered several urinary metabolites that were significantly up-regulated (glyoxylate, threonate, thymine, uracil, p-cresol) and down-regulated (citrate, 2-oxoglutarate, adipate, pimelate, suberate, azelaate) as a result of radiation exposure. Thymine and uracil were shown to derive largely from thymidine and 2'-deoxyuridine, which are known radiation biomarkers in the mouse. The radiation metabolomic phenotype in rats appeared to derive from oxidative stress and effects on kidney function. Gas chromatography-mass spectrometry is a promising platform on which to develop the field of radiation metabolomics further and to assist in the design of instrumentation for use in detecting biological consequences of environmental radiation release.

  1. Improving protein fold recognition by random forest

    PubMed Central

    2014-01-01

Background Recognizing the correct structural fold among known template protein structures for a target protein (i.e. fold recognition) is essential for template-based protein structure modeling. Since the fold recognition problem can be defined as a binary classification problem of predicting whether or not the unknown fold of a target protein is similar to an already known template protein structure in a library, machine learning methods have been effectively applied to tackle this problem. In our work, we developed RF-Fold, which uses random forest, one of the most powerful and scalable machine learning classification methods, to recognize protein folds. Results RF-Fold consists of hundreds of decision trees that can be trained efficiently on very large datasets to make accurate predictions on a highly imbalanced dataset. We evaluated RF-Fold on the standard Lindahl's benchmark dataset of 976 × 975 target-template protein pairs through cross-validation. Compared with 17 different fold recognition methods, the performance of RF-Fold is generally comparable to the best performance at each level of difficulty, from the easiest (family) level through the medium-hard (superfamily) level to the hardest (fold) level. Based on the top template protein ranked by RF-Fold, the correct recognition rate is 84.5%, 63.4%, and 40.8% at the family, superfamily, and fold levels, respectively. Based on the top five template protein folds ranked by RF-Fold, the correct recognition rate increases to 91.5%, 79.3% and 58.3% at the family, superfamily, and fold levels. Conclusions The good performance achieved by RF-Fold demonstrates the random forest's effectiveness for protein fold recognition. PMID:25350499

  2. Photometric classification of quasars from RCS-2 using Random Forest

    NASA Astrophysics Data System (ADS)

    Carrasco, D.; Barrientos, L. F.; Pichara, K.; Anguita, T.; Murphy, D. N. A.; Gilbank, D. G.; Gladders, M. D.; Yee, H. K. C.; Hsieh, B. C.; López, S.

    2015-12-01

The classification and identification of quasars is fundamental to many astronomical research areas. Given the large volume of photometric survey data available in the near future, automated methods for doing so are required. In this article, we present a new quasar candidate catalog from the Red-Sequence Cluster Survey 2 (RCS-2), identified solely from photometric information using an automated algorithm suitable for large surveys. The algorithm performance is tested using a well-defined SDSS spectroscopic sample of quasars and stars. The Random Forest algorithm constructs the catalog from RCS-2 point sources using SDSS spectroscopically-confirmed stars and quasars. The algorithm identifies putative quasars from broadband magnitudes (g, r, i, z) and colors. Exploiting NUV GALEX measurements for a subset of the objects, we refine the classifier by adding new information. An additional subset of the data with WISE W1 and W2 bands is also studied. Upon analyzing 542 897 RCS-2 point sources, the algorithm identified 21 501 quasar candidates with a training-set-derived precision (the fraction of true positives within the group assigned quasar status) of 89.5% and recall (the fraction of true positives relative to all sources that actually are quasars) of 88.4%. These performance metrics improve for the GALEX subset: 6529 quasar candidates are identified from 16 898 sources, with a precision and recall of 97.0% and 97.5%, respectively. Algorithm performance is further improved when WISE data are included, with precision and recall increasing to 99.3% and 99.1%, respectively, for 21 834 quasar candidates from 242 902 sources. We compiled our final catalog (38 257 candidates) by merging these samples and removing duplicates. An observational follow-up of 17 bright (r < 19) candidates with long-slit spectroscopy at the DuPont telescope (LCO) yielded 14 confirmed quasars. The results signal encouraging progress in the classification of point sources with Random Forest algorithms to search

  3. Selecting materialized views using random algorithm

    NASA Astrophysics Data System (ADS)

    Zhou, Lijuan; Hao, Zhongxiao; Liu, Chi

    2007-04-01

The data warehouse is a repository of information collected from multiple, possibly heterogeneous, autonomous distributed databases. The information stored at the data warehouse is in the form of views, referred to as materialized views. The selection of materialized views is one of the most important decisions in designing a data warehouse: they are stored for the purpose of efficiently implementing on-line analytical processing queries, and query response time is the first issue the user considers. In this paper, we therefore develop algorithms to select a set of views to materialize in a data warehouse so as to minimize the total view maintenance cost under the constraint of a given query response time; we call this the query-cost view-selection problem. First, the cost graph and cost model of the query-cost view-selection problem are presented. Second, methods for selecting materialized views using randomized algorithms are presented. The genetic algorithm is applied to the materialized view selection problem; however, as the genetic process evolves, legal solutions become increasingly difficult to produce, so many solutions are eliminated and the time needed to generate solutions grows. Therefore, an improved algorithm is presented in this paper, combining simulated annealing with the genetic algorithm to solve the query-cost view-selection problem. Finally, simulation experiments were conducted to test the correctness and efficiency of our algorithms. The experiments show that the given methods can provide near-optimal solutions in limited time and work well in practical cases. Randomized algorithms will become invaluable tools for data warehouse evolution.

  4. Random forest automated supervised classification of Hipparcos periodic variable stars

    NASA Astrophysics Data System (ADS)

    Dubath, P.; Rimoldini, L.; Süveges, M.; Blomme, J.; López, M.; Sarro, L. M.; De Ridder, J.; Cuypers, J.; Guy, L.; Lecoeur, I.; Nienartowicz, K.; Jan, A.; Beck, M.; Mowlavi, N.; De Cat, P.; Lebzelter, T.; Eyer, L.

    2011-07-01

    We present an evaluation of the performance of an automated classification of the Hipparcos periodic variable stars into 26 types. The sub-sample with the most reliable variability types available in the literature is used to train supervised algorithms to characterize the type dependencies on a number of attributes. The most useful attributes evaluated with the random forest methodology include, in decreasing order of importance, the period, the amplitude, the V-I colour index, the absolute magnitude, the residual around the folded light-curve model, the magnitude distribution skewness and the amplitude of the second harmonic of the Fourier series model relative to that of the fundamental frequency. Random forests and a multi-stage scheme involving Bayesian network and Gaussian mixture methods lead to statistically equivalent results. In standard 10-fold cross-validation (CV) experiments, the rate of correct classification is between 90 and 100 per cent, depending on the variability type. The main mis-classification cases, up to a rate of about 10 per cent, arise due to confusion between SPB and ACV blue variables and between eclipsing binaries, ellipsoidal variables and other variability types. Our training set and the predicted types for the other Hipparcos periodic stars are available online.

  5. Analyzing training information from random forests for improved image segmentation.

    PubMed

    Mahapatra, Dwarikanath

    2014-04-01

    Labeled training data are used for challenging medical image segmentation problems to learn different characteristics of the relevant domain. In this paper, we examine random forest (RF) classifiers, their learned knowledge during training and ways to exploit it for improved image segmentation. Apart from learning discriminative features, RFs also quantify their importance in classification. Feature importance is used to design a feature selection strategy critical for high segmentation and classification accuracy, and also to design a smoothness cost in a second-order MRF framework for graph cut segmentation. The cost function combines the contribution of different image features like intensity, texture, and curvature information. Experimental results on medical images show that this strategy leads to better segmentation accuracy than conventional graph cut algorithms that use only intensity information in the smoothness cost.
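
    A minimal sketch of letting forest importances drive feature selection, assuming scikit-learn, with synthetic features standing in for the intensity, texture and curvature descriptors (the MRF smoothness cost and graph cut step are not shown):

      # Keep only features whose RF importance exceeds the mean importance.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.ensemble import RandomForestClassifier

      X, y = make_classification(n_samples=500, n_features=30,
                                 n_informative=6, random_state=6)
      rf = RandomForestClassifier(n_estimators=200, random_state=6).fit(X, y)

      keep = rf.feature_importances_ > rf.feature_importances_.mean()
      X_selected = X[:, keep]
      print("kept", keep.sum(), "of", X.shape[1], "features")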

  6. Reducing RANS Model Error Using Random Forest

    NASA Astrophysics Data System (ADS)

    Wang, Jian-Xun; Wu, Jin-Long; Xiao, Heng; Ling, Julia

    2016-11-01

    Reynolds-Averaged Navier-Stokes (RANS) models are still the work-horse tools in the turbulence modeling of industrial flows. However, the model discrepancy due to the inadequacy of modeled Reynolds stresses largely diminishes the reliability of simulation results. In this work we use a physics-informed machine learning approach to improve the RANS modeled Reynolds stresses and propagate them to obtain the mean velocity field. Specifically, the functional forms of Reynolds stress discrepancies with respect to mean flow features are trained based on an offline database of flows with similar characteristics. The random forest model is used to predict Reynolds stress discrepancies in new flows. Then the improved Reynolds stresses are propagated to the velocity field via RANS equations. The effects of expanding the feature space through the use of a complete basis of Galilean tensor invariants are also studied. The flow in a square duct, which is challenging for standard RANS models, is investigated to demonstrate the merit of the proposed approach. The results show that both the Reynolds stresses and the propagated velocity field are improved over the baseline RANS predictions. SAND Number: SAND2016-7437 A
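
    The offline training step can be pictured as a plain regression problem. The sketch below assumes scikit-learn and uses synthetic placeholders for both the mean-flow features and the Reynolds stress discrepancy; the propagation through the RANS equations is outside its scope.

      # Conceptual sketch: regress a discrepancy field on mean-flow features.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(7)
      n_cells, n_feat = 2000, 10
      features = rng.normal(size=(n_cells, n_feat))   # e.g. tensor invariants
      discrepancy = (0.5 * features[:, 0] - 0.2 * features[:, 3] ** 2
                     + 0.05 * rng.normal(size=n_cells))

      rf = RandomForestRegressor(n_estimators=200, random_state=7)
      rf.fit(features, discrepancy)                   # offline training database

      # At prediction time, apply to a new flow's features, then propagate the
      # corrected Reynolds stresses through the RANS equations (not shown).
      new_flow = rng.normal(size=(500, n_feat))
      predicted_discrepancy = rf.predict(new_flow)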

  7. Patch forest: a hybrid framework of random forest and patch-based segmentation

    NASA Astrophysics Data System (ADS)

    Xie, Zhongliu; Gillies, Duncan

    2016-03-01

    The development of an accurate, robust and fast segmentation algorithm has long been a research focus in medical computer vision. State-of-the-art practices often involve non-rigidly registering a target image with a set of training atlases for label propagation over the target space to perform segmentation, a.k.a. multi-atlas label propagation (MALP). In recent years, the patch-based segmentation (PBS) framework has gained wide attention due to its advantage of relaxing the strict voxel-to-voxel correspondence to a series of pair-wise patch comparisons for contextual pattern matching. Despite a high accuracy reported in many scenarios, computational efficiency has consistently been a major obstacle for both approaches. Inspired by recent work on random forest, in this paper we propose a patch forest approach, which by equipping the conventional PBS with a fast patch search engine, is able to boost segmentation speed significantly while retaining an equal level of accuracy. In addition, a fast forest training mechanism is also proposed, with the use of a dynamic grid framework to efficiently approximate data compactness computation and a 3D integral image technique for fast box feature retrieval.

  8. Using Random Forest Models to Predict Organizational Violence

    NASA Technical Reports Server (NTRS)

    Levine, Burton; Bobashev, Georgly

    2012-01-01

We present a methodology to assess the proclivity of an organization to commit violence against nongovernment personnel. We fitted a Random Forest model using the Minorities at Risk Organizational Behavior (MAROB) dataset. The MAROB data are longitudinal, so individual observations are not independent. We propose a modification to the standard Random Forest methodology to account for the violation of the independence assumption. We present the results of the model fit, an example of predicting violence for an organization, and finally a summary of the forest in a "meta-tree."

  9. Random Forests-Based Feature Selection for Land-Use Classification Using LIDAR Data and Orthoimagery

    NASA Astrophysics Data System (ADS)

    Guan, H.; Yu, J.; Li, J.; Luo, L.

    2012-07-01

The development of lidar systems, especially those incorporating high-resolution camera components, has shown great potential for urban classification. However, automatically selecting the best features for land-use classification is challenging. Random Forests, a relatively recently developed machine learning algorithm, is receiving considerable attention in the field of image classification and pattern recognition, in particular because it provides a measure of variable importance. In this study, the performance of Random Forests-based feature selection for urban areas was therefore explored. First, we extracted features from the lidar data, including height-based and intensity-based GLCM measures; further spectral features, such as the Red, Green and Blue bands and GLCM-based measures, were obtained from the imagery. Finally, Random Forests was used to automatically select the optimal, uncorrelated features for land-use classification. Lidar data at 0.5 m resolution and aerial imagery were used to assess the feature selection performance of Random Forests in the study area, located in Mannheim, Germany. The results clearly demonstrate that Random Forests-based feature selection can improve classification performance.

  10. A parallel algorithm for random searches

    NASA Astrophysics Data System (ADS)

    Wosniack, M. E.; Raposo, E. P.; Viswanathan, G. M.; da Luz, M. G. E.

    2015-11-01

We discuss a parallelization procedure for a two-dimensional random search by a single individual, a typical sequential process. To ensure the parallel version retains the features of the sequential random search, we analyze the spatial patterns of the encountered targets for different search strategies and densities of homogeneously distributed targets. We identify a lognormal tendency for the distribution of distances between consecutively detected targets. Then, by assigning the corresponding mean and standard deviation of this distribution to each configuration in the parallel simulations (constituted by parallel random walkers), we are able to recover important statistical properties, e.g., the target detection efficiency, of the original problem. The proposed parallel approach presents a speedup of nearly one order of magnitude compared with the sequential implementation. This algorithm can easily be adapted to different instances, such as searches in three dimensions. Its possible range of applicability covers problems in areas as diverse as automated computer searches in high-capacity databases and animal foraging.

  11. Some Randomized Algorithms for Convex Quadratic Programming

    SciTech Connect

    Goldbach, R.

    1999-01-15

    We adapt some randomized algorithms of Clarkson [3] for linear programming to the framework of so-called LP-type problems, which was introduced by Sharir and Welzl [10]. This framework is quite general and allows a unified and elegant presentation and analysis. We also show that LP-type problems include minimization of a convex quadratic function subject to convex quadratic constraints as a special case, for which the algorithms can be implemented efficiently, if only linear constraints are present. We show that the expected running times depend only linearly on the number of constraints, and illustrate this by some numerical results. Even though the framework of LP-type problems may appear rather abstract at first, application of the methods considered in this paper to a given problem of that type is easy and efficient. Moreover, our proofs are in fact rather simple, since many technical details of more explicit problem representations are handled in a uniform manner by our approach. In particular, we do not assume boundedness of the feasible set as required in related methods.

  12. Global patterns and predictions of seafloor biomass using random forests.

    PubMed

    Wei, Chih-Lin; Rowe, Gilbert T; Escobar-Briones, Elva; Boetius, Antje; Soltwedel, Thomas; Caley, M Julian; Soliman, Yousria; Huettmann, Falk; Qu, Fangyuan; Yu, Zishan; Pitcher, C Roland; Haedrich, Richard L; Wicksten, Mary K; Rex, Michael A; Baguley, Jeffrey G; Sharma, Jyotsna; Danovaro, Roberto; MacDonald, Ian R; Nunnally, Clifton C; Deming, Jody W; Montagna, Paul; Lévesque, Mélanie; Weslawski, Jan Marcin; Wlodarska-Kowalczuk, Maria; Ingole, Baban S; Bett, Brian J; Billett, David S M; Yool, Andrew; Bluhm, Bodil A; Iken, Katrin; Narayanaswamy, Bhavani E

    2010-12-30

A comprehensive seafloor biomass and abundance database has been constructed from 24 oceanographic institutions worldwide within the Census of Marine Life (CoML) field projects. The machine-learning algorithm Random Forests was employed to model and predict seafloor standing stocks from surface primary production, water-column integrated and export particulate organic matter (POM), seafloor relief, and bottom water properties. The predictive models explain 63% to 88% of stock variance among the major size groups. Individual and composite maps of predicted global seafloor biomass and abundance are generated for bacteria, meiofauna, macrofauna, and megafauna (invertebrates and fishes). Patterns of benthic standing stocks were positive functions of surface primary production and delivery of the particulate organic carbon (POC) flux to the seafloor. At a regional scale, the census maps illustrate that integrated biomass is highest at the poles, on continental margins associated with coastal upwelling, and in broad zones associated with equatorial divergence. The lowest values are consistently encountered on the central abyssal plains of the major ocean basins. The shift of biomass dominance groups with depth is shown to be driven by the decrease in average body size rather than abundance, presumably due to a decrease in the quantity and quality of the food supply. This biomass census and the associated maps are vital components of mechanistic deep-sea food web models and global carbon cycling, and as such provide fundamental information that can be incorporated into evidence-based management.

  13. Global Patterns and Predictions of Seafloor Biomass Using Random Forests

    PubMed Central

    Wei, Chih-Lin; Rowe, Gilbert T.; Escobar-Briones, Elva; Boetius, Antje; Soltwedel, Thomas; Caley, M. Julian; Soliman, Yousria; Huettmann, Falk; Qu, Fangyuan; Yu, Zishan; Pitcher, C. Roland; Haedrich, Richard L.; Wicksten, Mary K.; Rex, Michael A.; Baguley, Jeffrey G.; Sharma, Jyotsna; Danovaro, Roberto; MacDonald, Ian R.; Nunnally, Clifton C.; Deming, Jody W.; Montagna, Paul; Lévesque, Mélanie; Weslawski, Jan Marcin; Wlodarska-Kowalczuk, Maria; Ingole, Baban S.; Bett, Brian J.; Billett, David S. M.; Yool, Andrew; Bluhm, Bodil A.; Iken, Katrin; Narayanaswamy, Bhavani E.

    2010-01-01

A comprehensive seafloor biomass and abundance database has been constructed from 24 oceanographic institutions worldwide within the Census of Marine Life (CoML) field projects. The machine-learning algorithm Random Forests was employed to model and predict seafloor standing stocks from surface primary production, water-column integrated and export particulate organic matter (POM), seafloor relief, and bottom water properties. The predictive models explain 63% to 88% of stock variance among the major size groups. Individual and composite maps of predicted global seafloor biomass and abundance are generated for bacteria, meiofauna, macrofauna, and megafauna (invertebrates and fishes). Patterns of benthic standing stocks were positive functions of surface primary production and delivery of the particulate organic carbon (POC) flux to the seafloor. At a regional scale, the census maps illustrate that integrated biomass is highest at the poles, on continental margins associated with coastal upwelling, and in broad zones associated with equatorial divergence. The lowest values are consistently encountered on the central abyssal plains of the major ocean basins. The shift of biomass dominance groups with depth is shown to be driven by the decrease in average body size rather than abundance, presumably due to a decrease in the quantity and quality of the food supply. This biomass census and the associated maps are vital components of mechanistic deep-sea food web models and global carbon cycling, and as such provide fundamental information that can be incorporated into evidence-based management. PMID:21209928

  14. Adapting GNU random forest program for Unix and Windows

    NASA Astrophysics Data System (ADS)

    Jirina, Marcel; Krayem, M. Said; Jirina, Marcel, Jr.

    2013-10-01

Random Forest is a well-known method, and also a program, for data clustering and classification. Unfortunately, the original Random Forest program is rather difficult to use. Here we describe a new version of this program, originally written in Fortran 77. The modified program, in Fortran 95, needs to be compiled only once, and information for different tasks is passed with the help of arguments. The program was tested with 24 data sets from the UCI Machine Learning Repository, and the results are available online.

  15. Pathway analysis using random forests with bivariate node-split for survival outcomes

    PubMed Central

    Pang, Herbert; Datta, Debayan; Zhao, Hongyu

    2010-01-01

Motivation: There is great interest in pathway-based methods for genomics data analysis in the research community. Although machine learning methods, such as random forests, have been developed to correlate survival outcomes with a set of genes, no study has assessed the ability of these methods to incorporate pathway information for analyzing microarray data. In general, genes that are identified without incorporating biological knowledge are more difficult to interpret, and correlating pathway-based gene expression with survival outcomes may lead to biologically more meaningful prognosis biomarkers. Thus, a comprehensive study of how these methods perform in a pathway-based setting is warranted. Results: In this article, we describe a pathway-based method using random forests to correlate gene expression data with survival outcomes and introduce a novel bivariate node-splitting random survival forest. The proposed method allows researchers to identify important pathways for predicting patient prognosis and time to disease progression, and to discover important genes within those pathways. We compared different implementations of random forests with different split criteria and found that bivariate node-splitting random survival forests with the log-rank test are among the best. We also performed simulation studies showing that random forests outperform several other machine learning algorithms and have results comparable with a newly developed component-wise Cox boosting model. Thus, pathway-based survival analysis using machine learning tools represents a promising approach to dissecting pathways and generating new biological hypotheses from microarray studies. Availability: R package Pwayrfsurvival is available from URL: http://www.duke.edu/∼hp44/pwayrfsurvival.htm Contact: pathwayrf@gmail.com Supplementary information: Supplementary data are available at Bioinformatics online. PMID:19933158

  16. Propensity score and proximity matching using random forest.

    PubMed

    Zhao, Peng; Su, Xiaogang; Ge, Tingting; Fan, Juanjuan

    2016-03-01

    In order to derive unbiased inference from observational data, matching methods are often applied to produce balanced treatment and control groups in terms of all background variables. Propensity score has been a key component in this research area. However, propensity score based matching methods in the literature have several limitations, such as model mis-specifications, categorical variables with more than two levels, difficulties in handling missing data, and nonlinear relationships. Random forest, averaging outcomes from many decision trees, is nonparametric in nature, straightforward to use, and capable of solving these issues. More importantly, the precision afforded by random forest (Caruana et al., 2008) may provide us with a more accurate and less model dependent estimate of the propensity score. In addition, the proximity matrix, a by-product of the random forest, may naturally serve as a distance measure between observations that can be used in matching. The proposed random forest based matching methods are applied to data from the National Health and Nutrition Examination Survey (NHANES). Our results show that the proposed methods can produce well balanced treatment and control groups. An illustration is also provided that the methods can effectively deal with missing data in covariates.
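
    A minimal sketch of the propensity side of this idea, assuming scikit-learn: estimate P(treatment | covariates) from a forest's class probabilities, then match each treated unit to the control with the nearest score. The proximity-matrix variant is not shown, and the data are synthetic.

      # RF-based propensity scores with 1:1 nearest-neighbor matching.
      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(8)
      n, p = 600, 8
      X = rng.normal(size=(n, p))
      treated = (X[:, 0] + rng.normal(size=n) > 0).astype(int)

      rf = RandomForestClassifier(n_estimators=500, random_state=8).fit(X, treated)
      propensity = rf.predict_proba(X)[:, 1]          # P(treatment | covariates)

      # Match each treated unit to the control with the closest propensity score.
      t_idx = np.flatnonzero(treated == 1)
      c_idx = np.flatnonzero(treated == 0)
      matches = c_idx[np.argmin(
          np.abs(propensity[t_idx, None] - propensity[None, c_idx]), axis=1)]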

  17. Random Forests for Evaluating Pedagogy and Informing Personalized Learning

    ERIC Educational Resources Information Center

    Spoon, Kelly; Beemer, Joshua; Whitmer, John C.; Fan, Juanjuan; Frazee, James P.; Stronach, Jeanne; Bohonak, Andrew J.; Levine, Richard A.

    2016-01-01

    Random forests are presented as an analytics foundation for educational data mining tasks. The focus is on course- and program-level analytics including evaluating pedagogical approaches and interventions and identifying and characterizing at-risk students. As part of this development, the concept of individualized treatment effects (ITE) is…

  18. Using random forests to diagnose aviation turbulence.

    PubMed

    Williams, John K

    Atmospheric turbulence poses a significant hazard to aviation, with severe encounters costing airlines millions of dollars per year in compensation, aircraft damage, and delays due to required post-event inspections and repairs. Moreover, attempts to avoid turbulent airspace cause flight delays and en route deviations that increase air traffic controller workload, disrupt schedules of air crews and passengers and use extra fuel. For these reasons, the Federal Aviation Administration and the National Aeronautics and Space Administration have funded the development of automated turbulence detection, diagnosis and forecasting products. This paper describes a methodology for fusing data from diverse sources and producing a real-time diagnosis of turbulence associated with thunderstorms, a significant cause of weather delays and turbulence encounters that is not well-addressed by current turbulence forecasts. The data fusion algorithm is trained using a retrospective dataset that includes objective turbulence reports from commercial aircraft and collocated predictor data. It is evaluated on an independent test set using several performance metrics including receiver operating characteristic curves, which are used for FAA turbulence product evaluations prior to their deployment. A prototype implementation fuses data from Doppler radar, geostationary satellites, a lightning detection network and a numerical weather prediction model to produce deterministic and probabilistic turbulence assessments suitable for use by air traffic managers, dispatchers and pilots. The algorithm is scheduled to be operationally implemented at the National Weather Service's Aviation Weather Center in 2014.

  19. Classification of remote sensed images using random forests and deep learning framework

    NASA Astrophysics Data System (ADS)

    Piramanayagam, S.; Schwartzkopf, W.; Koehler, F. W.; Saber, E.

    2016-10-01

In this paper, we explore the use of two machine learning algorithms for the land cover classification of multi-sensor remote sensed images: (a) random forest for structured labels and (b) a fully convolutional neural network. In the random forest algorithm, individual decision trees are trained on features obtained from image patches and the corresponding patch labels. Structural information present in the image patches improves the classification performance compared to utilizing pixel features alone. The random forest method was trained and evaluated on the ISPRS Vaihingen dataset, which consists of true orthophoto (TOP: near IR, R, G) and Digital Surface Model (DSM) data. The method achieves an overall accuracy of 86.3% on the test dataset. We also show qualitative results on a SAR image. In addition, we employ a fully convolutional neural network framework (FCN) to perform pixel-wise classification of the above multi-sensor imagery. The TOP and DSM data have individual convolutional layers, with features fused before the fully convolutional layers. The network, when evaluated on the Vaihingen dataset, achieves an overall classification accuracy of 88%.

  20. Toward Digital Staining using Imaging Mass Spectrometry and Random Forests

    PubMed Central

    Hanselmann, Michael; Köthe, Ullrich; Kirchner, Marc; Renard, Bernhard Y.; Amstalden, Erika R.; Glunde, Kristine; Heeren, Ron M. A.; Hamprecht, Fred A.

    2009-01-01

    We show on Imaging Mass Spectrometry (IMS) data that the Random Forest classifier can be used for automated tissue classification and that it results in predictions with high sensitivities and positive predictive values, even when inter-sample variability is present in the data. We further demonstrate how Markov Random Fields and vector-valued median filtering can be applied to reduce noise effects to further improve the classification results in a post-hoc smoothing step. Our study gives clear evidence that digital staining by means of IMS constitutes a promising complement to chemical staining techniques. PMID:19469555

  1. Genetic algorithms as global random search methods

    NASA Technical Reports Server (NTRS)

    Peck, Charles C.; Dhawan, Atam P.

    1995-01-01

    Genetic algorithm behavior is described in terms of the construction and evolution of the sampling distributions over the space of candidate solutions. This novel perspective is motivated by analysis indicating that the schema theory is inadequate for completely and properly explaining genetic algorithm behavior. Based on the proposed theory, it is argued that the similarities of candidate solutions should be exploited directly, rather than encoding candidate solutions and then exploiting their similarities. Proportional selection is characterized as a global search operator, and recombination is characterized as the search process that exploits similarities. Sequential algorithms and many deletion methods are also analyzed. It is shown that by properly constraining the search breadth of recombination operators, convergence of genetic algorithms to a global optimum can be ensured.

  2. Genetic algorithms as global random search methods

    NASA Technical Reports Server (NTRS)

    Peck, Charles C.; Dhawan, Atam P.

    1995-01-01

Genetic algorithm behavior is described in terms of the construction and evolution of the sampling distributions over the space of candidate solutions. This novel perspective is motivated by analysis indicating that the schema theory is inadequate for completely and properly explaining genetic algorithm behavior. Based on the proposed theory, it is argued that the similarities of candidate solutions should be exploited directly, rather than encoding candidate solutions and then exploiting their similarities. Proportional selection is characterized as a global search operator, and recombination is characterized as the search process that exploits similarities. Sequential algorithms and many deletion methods are also analyzed. It is shown that by properly constraining the search breadth of recombination operators, convergence of genetic algorithms to a global optimum can be ensured.

  3. Collision-Resolution Algorithms and Random-Access Communications.

    DTIC Science & Technology

    1980-04-01

[Recoverable abstract fragment; the remainder of this record is report documentation form residue:] …performance of random-access algorithms that incorporate these algorithms. The first and most important of these is the conditional mean CRI…

  4. A Randomized Approximate Nearest Neighbors Algorithm

    DTIC Science & Technology

    2010-09-14

[Recoverable abstract fragment; the remainder of this record is reference list and report documentation form residue:] …we may further assume that t > a² and evaluate the cdf of D−a at t by computing the probability of D−a being smaller than t, to obtain F_{D−a}(t) = ∫_{a²}^{t} …

  5. Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions

    PubMed Central

    Hengl, Tomislav; Heuvelink, Gerard B. M.; Kempen, Bas; Leenaars, Johan G. B.; Walsh, Markus G.; Shepherd, Keith D.; Sila, Andrew; MacMillan, Robert A.; Mendes de Jesus, Jorge; Tamene, Lulseged; Tondoh, Jérôme E.

    2015-01-01

    80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008–2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to the agricultural management—organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15–75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. This indicates a promising potential for transferring pedological

  6. Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions.

    PubMed

    Hengl, Tomislav; Heuvelink, Gerard B M; Kempen, Bas; Leenaars, Johan G B; Walsh, Markus G; Shepherd, Keith D; Sila, Andrew; MacMillan, Robert A; Mendes de Jesus, Jorge; Tamene, Lulseged; Tondoh, Jérôme E

    2015-01-01

    80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008-2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to the agricultural management--organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15-75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. This indicates a promising potential for transferring pedological
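
    The headline comparison (random forests versus linear regression under 5-fold cross-validation) is easy to reproduce in miniature. The sketch below assumes scikit-learn, with synthetic covariates standing in for the AfSIS point data and covariate stacks.

      # 5-fold cross-validated RMSE: random forest vs. linear regression.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(9)
      X = rng.normal(size=(1000, 25))                 # stand-in covariates
      y = X[:, 0] ** 2 + X[:, 1] * X[:, 2] + 0.3 * rng.normal(size=1000)

      models = [("random forest",
                 RandomForestRegressor(n_estimators=200, random_state=9)),
                ("linear regression", LinearRegression())]
      for name, model in models:
          scores = cross_val_score(model, X, y, cv=5,
                                   scoring="neg_root_mean_squared_error")
          print(f"{name}: RMSE {-scores.mean():.3f}")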

  7. Predicting adaptive phenotypes from multilocus genotypes in Sitka spruce (Picea sitchensis) using random forest.

    PubMed

    Holliday, Jason A; Wang, Tongli; Aitken, Sally

    2012-09-01

    Climate is the primary driver of the distribution of tree species worldwide, and the potential for adaptive evolution will be an important factor determining the response of forests to anthropogenic climate change. Although association mapping has the potential to improve our understanding of the genomic underpinnings of climatically relevant traits, the utility of adaptive polymorphisms uncovered by such studies would be greatly enhanced by the development of integrated models that account for the phenotypic effects of multiple single-nucleotide polymorphisms (SNPs) and their interactions simultaneously. We previously reported the results of association mapping in the widespread conifer Sitka spruce (Picea sitchensis). In the current study we used the recursive partitioning algorithm 'Random Forest' to identify optimized combinations of SNPs to predict adaptive phenotypes. After adjusting for population structure, we were able to explain 37% and 30% of the phenotypic variation, respectively, in two locally adaptive traits--autumn budset timing and cold hardiness. For each trait, the leading five SNPs captured much of the phenotypic variation. To determine the role of epistasis in shaping these phenotypes, we also used a novel approach to quantify the strength and direction of pairwise interactions between SNPs and found such interactions to be common. Our results demonstrate the power of Random Forest to identify subsets of markers that are most important to climatic adaptation, and suggest that interactions among these loci may be widespread.

  8. System and Method for Tracking Vehicles Using Random Search Algorithms.

    DTIC Science & Technology

    1997-01-31

The patent application is available for licensing; requests for information should be addressed to the Office of Naval Research, Department of the Navy. The invention relates to a system and a method for tracking vehicles using random search algorithm methodologies. From the description of the prior art: …algorithm methodologies for finding peaks in non-linear functions. U.S. Patent No. 5,148,513 to Koza et al., for example, relates to a non-linear…

  9. RAQ–A Random Forest Approach for Predicting Air Quality in Urban Sensing Systems

    PubMed Central

    Yu, Ruiyun; Yang, Yu; Yang, Leyou; Han, Guangjie; Move, Oguti Ann

    2016-01-01

Air quality information such as the concentration of PM2.5 is of great significance for human health and city management. It affects the way people travel, urban planning, government policies and so on. However, major cities typically have only a limited number of air quality monitoring stations, while air quality varies across urban areas, with large differences possible even between closely neighboring regions. In this paper, a random forest approach for predicting air quality (RAQ) is proposed for urban sensing systems. The data generated by urban sensing include meteorology data, road information, real-time traffic status and point of interest (POI) distribution. The random forest algorithm is exploited for data training and prediction. The performance of RAQ is evaluated with real city data. Compared with three other algorithms, this approach achieves better prediction precision, and the experiments show that air quality can be inferred with high accuracy from the data obtained through urban sensing. PMID:26761008

  10. Disaggregating census data for population mapping using random forests with remotely-sensed and ancillary data.

    PubMed

    Stevens, Forrest R; Gaughan, Andrea E; Linard, Catherine; Tatem, Andrew J

    2015-01-01

High resolution, contemporary data on human population distributions are vital for measuring impacts of population growth, monitoring human-environment interactions and for planning and policy development. Many methods are used to disaggregate census data and predict population densities for finer scale, gridded population data sets. We present a new semi-automated dasymetric modeling approach that incorporates detailed census and ancillary data in a flexible "Random Forest" estimation technique. We outline the combination of widely available remotely-sensed and geospatial data that contribute to the modeled dasymetric weights, and then use the Random Forest model to generate a gridded prediction of population density at ~100 m spatial resolution. This prediction layer is then used as the weighting surface to perform dasymetric redistribution of the census counts at a country level. As a case study we compare the new algorithm and its products for three countries (Vietnam, Cambodia, and Kenya) with other common gridded population data production methodologies. We discuss the advantages of the new method and its gains in accuracy and flexibility over those previous approaches. Finally, we outline how this algorithm will be extended to provide freely-available gridded population data sets for Africa, Asia and Latin America.

  11. Classification of acoustic emission signals using wavelets and Random Forests : Application to localized corrosion

    NASA Astrophysics Data System (ADS)

    Morizet, N.; Godin, N.; Tang, J.; Maillet, E.; Fregonese, M.; Normand, B.

    2016-03-01

This paper proposes a novel approach to classifying acoustic emission (AE) signals deriving from corrosion experiments, even when they are embedded in a noisy environment. To validate the new methodology, synthetic data are first used in an in-depth analysis comparing Random Forests (RF) to the k-Nearest Neighbor (k-NN) algorithm. Moreover, a new evaluation tool called the alter-class matrix (ACM) is introduced to simulate different degrees of uncertainty on labeled data for supervised classification. Tests on real cases involving noise and crevice corrosion are then conducted by preprocessing the waveforms, including wavelet denoising, and extracting a rich set of features as input to the RF algorithm. To this end, a software package called RF-CAM has been developed. Results show that this approach is very efficient on ground truth data and is also very promising on real data, especially for its reliability, performance and speed, which are important criteria for the chemical industry.

  12. Analytic and Algorithmic Solution of Random Satisfiability Problems

    NASA Astrophysics Data System (ADS)

    Mézard, M.; Parisi, G.; Zecchina, R.

    2002-08-01

    We study the satisfiability of random Boolean expressions built from many clauses with K variables per clause (K-satisfiability). Expressions with a ratio α of clauses to variables less than a threshold αc are almost always satisfiable, whereas those with a ratio above this threshold are almost always unsatisfiable. We show the existence of an intermediate phase below αc, where the proliferation of metastable states is responsible for the onset of complexity in search algorithms. We introduce a class of optimization algorithms that can deal with these metastable states; one such algorithm has been tested successfully on the largest existing benchmark of K-satisfiability.

  13. Simple-random-sampling-based multiclass text classification algorithm.

    PubMed

    Liu, Wuying; Wang, Lin; Yi, Mianzhu

    2014-01-01

Multiclass text classification (MTC) is a challenging issue and the corresponding MTC algorithms can be used in many applications. The space-time overhead of such algorithms is a pressing concern in the era of big data. Through an investigation of the token frequency distribution in a Chinese web document collection, this paper reexamines the power law and proposes a simple-random-sampling-based MTC (SRSMTC) algorithm. Supported by a token-level memory that stores labeled documents, the SRSMTC algorithm uses a text retrieval approach to solve text classification problems. Experimental results on the TanCorp data set show that the SRSMTC algorithm can achieve state-of-the-art performance at greatly reduced space-time cost.

  14. Estimating tropical forest structure using discrete return lidar data and a locally trained synthetic forest algorithm

    NASA Astrophysics Data System (ADS)

    Palace, M. W.; Sullivan, F. B.; Ducey, M.; Czarnecki, C.; Zanin Shimbo, J.; Mota e Silva, J.

    2012-12-01

Forests are complex ecosystems with diverse species assemblages, crown structures, size class distributions, and historical disturbances. This complexity makes monitoring, understanding and forecasting carbon dynamics difficult, yet it is also central to the carbon cycling of terrestrial vegetation. Lidar data are often used solely to associate plot-level biomass measurements with canopy height models, but much more may be gleaned by examining the full lidar profile. Using discrete return airborne light detection and ranging (lidar) data collected in 2009 by the Tropical Ecology Assessment and Monitoring Network (TEAM), we compared synthetic vegetation profiles to lidar-derived relative vegetation profiles (RVPs) in La Selva, Costa Rica. To accomplish this, we developed RVPs describing the vertical distribution of plant material on 20 plots at La Selva by transforming cumulative lidar observations to account for obscured plant material. Hundreds of synthetic profiles were developed for forests containing approximately 200,000 trees with random diameter at breast height (DBH), assuming a Weibull distribution with a shape of 1.0 and mean DBH ranging from 0 cm to 500 cm. For each tree in the synthetic forests, crown shape (width, depth) and total height were estimated using previously developed allometric equations for tropical forests. Profiles for each synthetic forest were generated and compared to the TEAM lidar data to determine the best-fitting synthetic profile for each of the 20 field plots. After determining the best-fit synthetic profile using the minimum sum of squared differences, we are able to estimate forest structure (diameter distribution, height, and biomass) and to compare our estimates to field data for each of the twenty field plots. Our preliminary results show promise for estimating forest structure and biomass using lidar data and computer modeling.
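
    The synthetic-forest generation step can be sketched in a few lines of numpy. The allometric coefficients below are hypothetical placeholders, not the study's tropical-forest equations.

      # Generate one synthetic forest: Weibull DBH (shape 1.0), then height
      # and crown size from hypothetical power-law allometry.
      import numpy as np

      rng = np.random.default_rng(10)
      n_trees = 200_000
      mean_dbh = 25.0                         # cm; the study sweeps ~0 to 500

      # For a Weibull with shape k = 1 (an exponential), scale equals the mean.
      dbh = rng.weibull(1.0, size=n_trees) * mean_dbh

      height = 3.0 * dbh ** 0.6               # m (illustrative coefficients)
      crown_width = 0.5 * dbh ** 0.55
      crown_depth = 0.3 * height

      # Binning crown material by height yields a synthetic vertical profile
      # to compare against the lidar-derived relative vegetation profile.
      profile, edges = np.histogram(height, bins=50,
                                    weights=crown_width * crown_depth)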

  15. Prediction of aquatic toxicity mode of action using linear discriminant and random forest models.

    PubMed

    Martin, Todd M; Grulke, Christopher M; Young, Douglas M; Russom, Christine L; Wang, Nina Y; Jackson, Crystal R; Barron, Mace G

    2013-09-23

The ability to determine the mode of action (MOA) for a diverse group of chemicals is a critical part of ecological risk assessment and chemical regulation. However, existing MOA assignment approaches in ecotoxicology have been limited to relatively few MOAs, have high uncertainty, or rely on professional judgment. In this study, machine learning algorithms (linear discriminant analysis and random forest) were used to develop models for assigning aquatic toxicity MOA. These methods were selected since they have been shown to be able to correlate diverse data sets and provide an indication of the most important descriptors. A data set of MOA assignments for 924 chemicals was developed using a combination of high confidence assignments, international consensus classifications, ASTER (ASsessment Tools for the Evaluation of Risk) predictions, and weight-of-evidence professional judgment based on an assessment of structure and literature information. The overall data set was randomly divided into a training set (75%) and a validation set (25%) and then used to develop linear discriminant analysis (LDA) and random forest (RF) MOA assignment models. The LDA and RF models had high internal concordance and specificity and produced overall prediction accuracies ranging from 84.5 to 87.7% for the validation set. These results demonstrate that computational chemistry approaches can be used to determine the acute toxicity MOA across a large range of structures and mechanisms.
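
    The split-and-compare protocol is simple to mirror with scikit-learn. The sketch below uses synthetic stand-in descriptors and six placeholder classes rather than the study's chemical data.

      # 75/25 split, then LDA and RF classifiers for MOA-style assignment.
      from sklearn.datasets import make_classification
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import train_test_split

      X, y = make_classification(n_samples=924, n_features=40,
                                 n_informative=10, n_classes=6,
                                 random_state=11)
      X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25,
                                                  random_state=11)

      for name, model in [("LDA", LinearDiscriminantAnalysis()),
                          ("RF", RandomForestClassifier(n_estimators=300,
                                                        random_state=11))]:
          acc = model.fit(X_tr, y_tr).score(X_val, y_val)
          print(f"{name} validation accuracy: {acc:.3f}")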

  16. Spatial downscaling of precipitation using adaptable random forests

    NASA Astrophysics Data System (ADS)

    He, Xiaogang; Chaney, Nathaniel W.; Schleiss, Marc; Sheffield, Justin

    2016-10-01

    This paper introduces Prec-DWARF (Precipitation Downscaling With Adaptable Random Forests), a novel machine-learning-based method for statistical downscaling of precipitation. Prec-DWARF sets up a nonlinear relationship between precipitation at fine resolution and covariates at coarse/fine resolution, based on the binary-tree ensemble method known as Random Forests (RF). In addition to a single RF, we also consider a more advanced implementation based on two independent RFs, which yields better results for extreme precipitation. Hourly gauge-radar precipitation data at 0.125° from NLDAS-2 are used to conduct synthetic experiments with different spatial resolutions (0.25°, 0.5°, and 1°). Quantitative evaluation of these experiments demonstrates that Prec-DWARF consistently outperforms the baseline (i.e., bilinear interpolation in this case) and can reasonably reproduce the spatial and temporal patterns, occurrence and distribution of observed precipitation fields. However, Prec-DWARF with a single RF significantly underestimates precipitation extremes and often cannot correctly recover the fine-scale spatial structure, especially for the 1° experiments. Prec-DWARF with a double RF exhibits improvement in the simulation of extreme precipitation as well as its spatial and temporal structures, but variogram analyses show that the spatial and temporal variability of the downscaled fields is still strongly underestimated. Covariate importance analysis shows that the most important predictors for the downscaling are the coarse-scale precipitation values over adjacent grid cells as well as the distance to the closest dry grid cell (i.e., the dry drift). The encouraging results demonstrate the potential of Prec-DWARF and machine-learning based techniques in general for the statistical downscaling of precipitation.
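
    A minimal single-RF analogue of this setup can be written with scikit-learn's RandomForestRegressor: fine-scale precipitation is regressed on the coarse-scale precipitation of the surrounding cells plus the dry-drift covariate. The covariates and the synthetic "truth" below are invented for illustration and do not reproduce the NLDAS-2 experiments.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        # hypothetical covariates for each fine-resolution cell: coarse precipitation
        # over the 3x3 neighbourhood of coarse cells, plus distance to nearest dry cell
        rng = np.random.default_rng(2)
        n = 20000
        coarse_neighbors = rng.gamma(2.0, 2.0, size=(n, 9))
        dry_distance = rng.uniform(0, 50, size=(n, 1))     # the "dry drift" covariate
        X = np.hstack([coarse_neighbors, dry_distance])
        # synthetic fine-scale truth: nonlinear in the covariates, for illustration only
        y = coarse_neighbors[:, 4] * np.exp(-dry_distance[:, 0] / 25) + rng.normal(0, 0.2, n)

        rf = RandomForestRegressor(n_estimators=200, n_jobs=-1, random_state=2)
        rf.fit(X[:15000], y[:15000])
        print("R^2 on held-out cells:", rf.score(X[15000:], y[15000:]))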

  17. Pigmented skin lesion detection using random forest and wavelet-based texture

    NASA Astrophysics Data System (ADS)

    Hu, Ping; Yang, Tie-jun

    2016-10-01

    The incidence of cutaneous malignant melanoma, a disease of worldwide distribution and the deadliest form of skin cancer, has been increasing rapidly over the last few decades. Because advanced cutaneous melanoma is still incurable, early detection is an important step toward a reduction in mortality. Dermoscopy photographs are commonly used in melanoma diagnosis and can capture detailed features of a lesion. Great variability exists in the visual appearance of pigmented skin lesions. Therefore, in order to minimize the diagnostic errors that result from the difficulty and subjectivity of visual interpretation, an automatic detection approach is required. The objectives of this paper were to propose a hybrid method using random forests and the Gabor wavelet transform to accurately separate lesion from non-lesion regions in dermoscopy photographs, and to analyze the resulting segmentation accuracy. A random forest classifier consisting of a set of decision trees was used for classification. Gabor wavelets model the receptive fields of visual cortical cells in the mammalian brain, and an image can be decomposed into multiple scales and orientations using them. The Gabor function has been recognized as a very useful tool in texture analysis, due to its optimal localization properties in both the spatial and frequency domains. Texture features based on the Gabor wavelet transform are computed from the Gabor-filtered image. Experimental results indicate the following: (1) the proposed random-forest-based algorithm outperformed the state of the art in pigmented skin lesion detection, and (2) the inclusion of Gabor-wavelet-based texture features improved segmentation accuracy significantly.
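
    A toy version of the pipeline, Gabor magnitude responses as per-pixel texture features feeding a random forest, can be sketched with scikit-image and scikit-learn. The frequencies, orientation count, and the synthetic "lesion" image are arbitrary choices, and training and evaluating on the same pixels is for demonstration only.

        import numpy as np
        from skimage.filters import gabor
        from sklearn.ensemble import RandomForestClassifier

        def gabor_features(image, frequencies=(0.1, 0.2, 0.3), n_orient=4):
            """Per-pixel texture features: Gabor magnitude at several scales/orientations."""
            feats = []
            for f in frequencies:
                for k in range(n_orient):
                    real, imag = gabor(image, frequency=f, theta=k * np.pi / n_orient)
                    feats.append(np.hypot(real, imag))
            return np.stack(feats, axis=-1).reshape(-1, len(frequencies) * n_orient)

        # toy image: darker "lesion" disc on a brighter background, plus noise
        rng = np.random.default_rng(3)
        yy, xx = np.mgrid[:64, :64]
        mask = (yy - 32) ** 2 + (xx - 32) ** 2 < 15 ** 2
        image = np.where(mask, 0.3, 0.7) + rng.normal(0, 0.05, (64, 64))

        X, y = gabor_features(image), mask.ravel()
        rf = RandomForestClassifier(n_estimators=100, random_state=3).fit(X, y)
        print("pixel accuracy:", (rf.predict(X) == y).mean())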

  18. Critical behaviour of spanning forests on random planar graphs

    NASA Astrophysics Data System (ADS)

    Bondesan, Roberto; Caracciolo, Sergio; Sportiello, Andrea

    2017-02-01

    As a follow-up of previous work of the authors, we analyse the statistical mechanics model of random spanning forests on random planar graphs. Special emphasis is given to the analysis of the critical behaviour. Exploiting an exact relation with a model of O(-2) loops and dimers, previously solved by Kostov and Staudacher, we identify critical and multicritical loci, and find them consistent with recent results of Bousquet-Mélou and Courtiel. This is also consistent with the KPZ relation and with the Berker-Kadanoff phase in the anti-ferromagnetic regime of the Potts model on periodic lattices, predicted by Saleur. To our knowledge, this is the first known example in which KPZ is seen explicitly to work within a Berker-Kadanoff phase. We set up equations for the generating function at the value t = -1 of the fugacity, which is of combinatorial interest, and we investigate the resulting numerical series, a favourite problem of Tony Guttmann's. Dedicated to Tony Guttmann on the occasion of his 70th birthday.

  19. MOQA min-max heapify: A randomness preserving algorithm

    NASA Astrophysics Data System (ADS)

    Gao, Ang; Hennessy, Aoife; Schellekens, Michel

    2012-09-01

    MOQA is a high-level data structuring language, designed to allow for modular static timing analysis [1, 2, 3]. In essence, MOQA allows the programmer to determine the average running time of a broad class of programs directly from the code in a (semi-)automated way. The modularity property brings a strong advantage for the programmer: the capacity to combine parts of code, where the average time is simply the sum of the times of the parts, is very helpful in static analysis and is not available in current languages. Modularity also improves the precision of average-case analysis, supporting the determination of accurate estimates of the average number of basic operations of MOQA programs. The mathematical theory underpinning this approach is that of random structures and their preservation. Applying any MOQA operation to all elements of a random structure results in an output isomorphic to one or more random structures, which is the key to systematic timing. Here we introduce the approach in a self-contained way and provide a MOQA version of the well-known min-max heapify algorithm, constructed with the MOQA product operation. We demonstrate the "randomness preservation" property of the algorithm and illustrate the applicability of our method by deriving its exact average time.
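
    For reference, the underlying min-max heapify (the classic push-down procedure of Atkinson et al.) can be rendered in ordinary Python as below; this is a plain-language sketch of the standard algorithm, not the MOQA formulation with its product operation.

        def _level(i):
            # depth of 0-based node i in the implicit binary tree
            return (i + 1).bit_length() - 1

        def _push_down(a, i):
            n = len(a)
            while True:
                kids = [c for c in (2 * i + 1, 2 * i + 2) if c < n]
                if not kids:
                    return
                grands = [g for g in range(4 * i + 3, 4 * i + 7) if g < n]
                cand = kids + grands
                if _level(i) % 2 == 0:               # min level
                    m = min(cand, key=lambda j: a[j])
                    if a[m] >= a[i]:
                        return
                    a[i], a[m] = a[m], a[i]
                    if m in grands:
                        p = (m - 1) // 2             # restore the max parent if needed
                        if a[m] > a[p]:
                            a[m], a[p] = a[p], a[m]
                        i = m                        # keep pushing down from m
                    else:
                        return
                else:                                # max level: mirror image
                    m = max(cand, key=lambda j: a[j])
                    if a[m] <= a[i]:
                        return
                    a[i], a[m] = a[m], a[i]
                    if m in grands:
                        p = (m - 1) // 2
                        if a[m] < a[p]:
                            a[m], a[p] = a[p], a[m]
                        i = m
                    else:
                        return

        def minmax_heapify(a):
            # bottom-up construction: push down every internal node
            for i in range(len(a) // 2 - 1, -1, -1):
                _push_down(a, i)

        a = [9, 1, 2, 8, 7, 6, 5]
        minmax_heapify(a)
        print(a)   # a[0] is the minimum; max(a[1], a[2]) is the maximum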

  20. Sequential Monte Carlo tracking of the marginal artery by multiple cue fusion and random forest regression.

    PubMed

    Cherry, Kevin M; Peplinski, Brandon; Kim, Lauren; Wang, Shijun; Lu, Le; Zhang, Weidong; Liu, Jianfei; Wei, Zhuoshi; Summers, Ronald M

    2015-01-01

    Given the potential importance of marginal artery localization to automated registration in computed tomography colonography (CTC), we have devised a semi-automated method of marginal vessel detection employing sequential Monte Carlo tracking (also known as particle filter tracking) by multiple cue fusion based on intensity, vesselness, organ detection, and minimum spanning tree information for poorly enhanced vessel segments. We then employed a random forest algorithm for intelligent cue fusion and decision making, which achieved high sensitivity and robustness. After applying a vessel pruning procedure to the tracking results, we achieved statistically significantly improved precision compared to a baseline Hessian detection method (75.2% versus 2.7% for the baseline, p<0.001). The method also showed a statistically significantly improved recall rate compared to a two-cue baseline method using fewer vessel cues (67.7% versus 30.7% for the baseline, p<0.001). These results demonstrate that marginal artery localization on CTC is feasible by combining a discriminative classifier (i.e., random forest) with a sequential Monte Carlo tracking mechanism. In so doing, we present the effective application of an anatomical probability map to vessel pruning as well as a supplementary spatial coordinate system for colonic segmentation and registration when this task has been confounded by colon lumen collapse.

  1. Predictive lithological mapping of Canada's North using Random Forest classification applied to geophysical and geochemical data

    NASA Astrophysics Data System (ADS)

    Harris, J. R.; Grunsky, E. C.

    2015-07-01

    A recent method for mapping lithology based on the Random Forest (RF) machine classification algorithm is evaluated. Random Forests, a supervised classifier, requires training data representative of each lithology to produce a predictive or classified map. We use two training strategies: one based on the locations of lake sediment geochemical samples, where the rock type at each sample station is recorded from a legacy geology map, and a second based on lithology recorded at field stations from reconnaissance field mapping. We apply the classification to interpolated major and minor lake sediment geochemical data as well as airborne total field magnetic and gamma ray spectrometer data. Using this method we produce predictions of the lithology of a large section of the Hearne Archean-Paleoproterozoic tectonic domain in northern Canada. The results indicate that meaningful predictive lithologic maps can be produced using RF classification for both training strategies. The best results were achieved when all data were used; however, the geochemical and gamma ray data were the strongest predictors of the various lithologies. The maps generated from this research can be used to complement field mapping activities, by focusing field work on areas where the predicted geology and legacy geology do not match, and as first-order geological maps in poorly mapped areas.

  2. Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals

    NASA Astrophysics Data System (ADS)

    Li, Chuan; Sanchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego; Vásquez, Rafael E.

    2016-08-01

    Fault diagnosis is an effective tool to guarantee safe operation of gearboxes. Acoustic and vibratory measurements in such mechanical devices are all sensitive to the existence of faults. This work addresses the use of a deep random forest fusion (DRFF) technique to improve fault diagnosis performance for gearboxes by using measurements from an acoustic emission (AE) sensor and an accelerometer that monitor the gearbox condition simultaneously. The statistical parameters of the wavelet packet transform (WPT) are first produced from the AE signal and the vibratory signal, respectively. Two deep Boltzmann machines (DBMs) are then developed for deep representations of the WPT statistical parameters. A random forest is finally suggested to fuse the outputs of the two DBMs as the integrated DRFF model. The proposed DRFF technique is evaluated on gearbox fault diagnosis experiments under different operational conditions and achieves a classification rate of 97.68% over 11 different condition patterns. Compared to peer algorithms, the proposed method exhibits the best performance. The results indicate that deep learning fusion of acoustic and vibratory signals may improve fault diagnosis capabilities for gearboxes.
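
    Omitting the DBM stage, the skeleton of the fusion idea, WPT statistics from each sensor channel concatenated and classified by a random forest, can be sketched with PyWavelets and scikit-learn. The signals, wavelet choice, and condition labels below are synthetic stand-ins.

        import numpy as np
        import pywt
        from sklearn.ensemble import RandomForestClassifier

        def wpt_stats(signal, wavelet="db4", level=3):
            """Statistical features (log-energy, std) of the wavelet packet nodes."""
            wp = pywt.WaveletPacket(signal, wavelet, maxlevel=level)
            feats = []
            for node in wp.get_level(level, "natural"):
                d = node.data
                feats += [np.log(np.sum(d ** 2) + 1e-12), np.std(d)]
            return np.array(feats)

        # hypothetical dataset: AE and vibration channels for two gearbox conditions
        rng = np.random.default_rng(4)
        X, y = [], []
        for label, freq in [(0, 5), (1, 9)]:          # toy "healthy" vs "faulty" tones
            for _ in range(60):
                t = np.linspace(0, 1, 1024)
                ae = np.sin(2 * np.pi * freq * t) + rng.normal(0, 0.5, t.size)
                vib = np.sin(2 * np.pi * 2 * freq * t) + rng.normal(0, 0.5, t.size)
                # fuse the two sensors by concatenating their WPT features
                X.append(np.concatenate([wpt_stats(ae), wpt_stats(vib)]))
                y.append(label)
        X, y = np.array(X), np.array(y)
        rf = RandomForestClassifier(n_estimators=200, random_state=4).fit(X[::2], y[::2])
        print("held-out accuracy:", rf.score(X[1::2], y[1::2]))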

  3. Gene selection using iterative feature elimination random forests for survival outcomes

    PubMed Central

    Pang, Herbert; George, Stephen L.; Hui, Ken; Tong, Tiejun

    2012-01-01

    Although many feature selection methods for classification have been developed, there is a need to identify genes in high-dimensional data with censored survival outcomes. Traditional methods for gene selection in classification problems have several drawbacks. First, the majority of the gene selection approaches for classification are single-gene based. Second, many of the gene selection procedures are not embedded within the algorithm itself. The technique of random forests has been found to perform well in high dimensional data settings with survival outcomes. It also has an embedded feature to identify variables of importance. Therefore, it is an ideal candidate for gene selection in high dimensional data with survival outcomes. In this paper, we develop a novel method based on the random forests to identify a set of prognostic genes. We compare our method with several machine learning methods and various node split criteria using several real data sets. Our method performed well in both simulations and real data analysis. Additionally, we have shown the advantages of our approach over single-gene based approaches. Our method incorporates multivariate correlations in microarray data for survival outcomes. The described method allows us to best utilize the information available from microarray data with survival outcomes. PMID:22547432
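
    The iterative feature elimination loop itself is straightforward: fit a forest, rank genes by the embedded importance measure, and discard the weakest. The sketch below substitutes an ordinary classification forest for a survival forest (scikit-learn has no survival splitting rule) purely to show the loop structure.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(5)
        X = rng.normal(size=(200, 500))                  # 500 "genes", 200 samples
        y = (X[:, :5].sum(axis=1) > 0).astype(int)       # only first 5 genes informative
        genes = np.arange(X.shape[1])

        # iteratively drop the least important half of the genes each round
        while len(genes) > 10:
            rf = RandomForestClassifier(n_estimators=300, random_state=5)
            rf.fit(X[:, genes], y)
            order = np.argsort(rf.feature_importances_)   # ascending importance
            genes = genes[order[len(genes) // 2:]]        # keep the top half

        print("selected genes:", np.sort(genes))          # should recover genes 0..4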

  4. A biased random-key genetic algorithm for data clustering.

    PubMed

    Festa, P

    2013-09-01

    Cluster analysis aims at finding subsets (clusters) of a given set of entities, which are homogeneous and/or well separated. Starting from the 1990s, cluster analysis has been applied to several domains with numerous applications. It has emerged as one of the most exciting interdisciplinary fields, having benefited from concepts and theoretical results obtained by different scientific research communities, including genetics, biology, biochemistry, mathematics, and computer science. The last decade has brought several new algorithms, which are able to solve larger sized and real-world instances. We will give an overview of the main types of clustering and criteria for homogeneity or separation. Solution techniques are discussed, with special emphasis on the combinatorial optimization perspective, with the goal of providing conceptual insights and literature references to the broad community of clustering practitioners. A new biased random-key genetic algorithm is also described and compared with several efficient hybrid GRASP algorithms recently proposed to cluster biological data.
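
    A biased random-key genetic algorithm keeps chromosomes as vectors of keys in [0,1], decodes them into a candidate solution, and mates each offspring from one elite and one non-elite parent with a crossover biased toward the elite. The sketch below uses a simple decoder (keys scaled to k centroids, fitness = within-cluster SSE); the decoder, population sizes, and bias rho are illustrative choices, not those of the paper.

        import numpy as np

        rng = np.random.default_rng(6)

        def decode(keys, X, k):
            """Decode a random-key chromosome into k centroids within the data range."""
            lo, hi = X.min(axis=0), X.max(axis=0)
            centers = lo + keys.reshape(k, -1) * (hi - lo)
            d = ((X[:, None, :] - centers[None]) ** 2).sum(-1)
            return d.min(axis=1).sum()                    # within-cluster SSE (fitness)

        def brkga(X, k, pop=50, elite=10, mutants=10, rho=0.7, gens=100):
            n_keys = k * X.shape[1]
            P = rng.random((pop, n_keys))
            for _ in range(gens):
                fit = np.array([decode(c, X, k) for c in P])
                P = P[np.argsort(fit)]                    # best (lowest SSE) first
                offspring = []
                for _ in range(pop - elite - mutants):
                    e = P[rng.integers(elite)]            # one elite parent
                    o = P[rng.integers(elite, pop)]       # one non-elite parent
                    mask = rng.random(n_keys) < rho       # biased crossover toward elite
                    offspring.append(np.where(mask, e, o))
                P = np.vstack([P[:elite], offspring, rng.random((mutants, n_keys))])
            fit = np.array([decode(c, X, k) for c in P])
            return P[np.argmin(fit)], fit.min()

        # toy data: two well-separated blobs
        X = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(3, 0.3, (50, 2))])
        best, sse = brkga(X, k=2)
        print("best within-cluster SSE:", round(sse, 3))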

  5. Prediction of Protein-Protein Interactions with Physicochemical Descriptors and Wavelet Transform via Random Forests.

    PubMed

    Jia, Jianhua; Xiao, Xuan; Liu, Bingxiang

    2016-06-01

    Protein-protein interactions (PPIs) provide valuable insight into the inner workings of cells, so it is important to study the network of PPIs. It is vitally important to develop an automated method as a high-throughput tool to predict PPIs in a timely fashion. Based on physicochemical descriptors, a protein was converted into several digital signals, which were then analyzed with the wavelet transform. With this formulation framework for representing protein sequence samples, the random forests algorithm was adopted to conduct prediction. The results on a large-scale independent test data set show that the proposed model can achieve good performance, with an accuracy value of about 0.86 and a geometric mean value of about 0.85. It can therefore serve as a useful supplementary tool for PPI prediction. The predictor used in this article is freely available at http://www.jci-bioinfo.cn/PPI_RF.

  6. Continental-scale ICESat canopy height modelling sensitivity and random forest simulations in Australia and Canada

    NASA Astrophysics Data System (ADS)

    Hopkinson, C.; Mahoney, C.; Held, A. A.; Hall, R.

    2014-12-01

    The Geoscience Laser Altimeter System (GLAS), previously onboard the Ice, Cloud, and land Elevation Satellite (ICESat), uniquely offers near-global waveform LiDAR coverage; however, data quality is subject to system, temporal, and spatial issues. These subtleties are investigated here with respect to canopy height comparisons with 3 airborne LiDAR sites in Australia. Optimal GLAS results were obtained from high-energy laser transmissions from laser 3 during leaf-on conditions; GLAS data corresponded best with 95th percentile heights from an all-return airborne LiDAR point cloud. In addition, the best GLAS results were obtained over relatively open canopies, where prominent ground returns can be retrieved. Optimized GLAS data within Australian forests were employed as canopy height observations and related to 6 predictor variables (landcover, cover fraction, elevation, slope, soils, and species) by random forest (RF) models. Fifty-seven RF models were trained, varying by binomial combinations of predictor data, from 2 to 6 inputs. Trained models were separately utilized to predict Australia-wide canopy heights; RF canopy height outputs were validated against spatially concurrent airborne LiDAR 95th percentile canopy heights from an all-return point cloud for 10 sites encompassing multiple ecosystems. The best RF output was obtained from the predictor inputs landcover, cover fraction, elevation, soils, and species, yielding an RMSE of 7.98 m and R² of 0.97. Results indicate inherent issues (noted in existing literature) in GLAS observations that propagate through RF algorithms, manifested as canopy height underestimations for taller vegetation (>45 m). To extend this research to the Canadian boreal forest context, research is also targeting canopy height model development in the Northwest Territories, allowing investigations of time-variant phenology and landcover sensitivity due to wetland extent and growth, snow cover and other land cover changes common within boreal
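
    The 57-model design corresponds to all combinations of 2 to 6 predictors (C(6,2)+C(6,3)+C(6,4)+C(6,5)+C(6,6) = 15+20+15+6+1 = 57). A compact way to reproduce that enumeration with a stand-in data set and invented response:

        import itertools
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.metrics import mean_squared_error

        predictors = ["landcover", "cover_fraction", "elevation", "slope", "soils", "species"]
        rng = np.random.default_rng(7)
        n = 2000
        data = {p: rng.normal(size=n) for p in predictors}       # hypothetical rasters
        height = (2 * data["cover_fraction"] + 0.5 * data["elevation"]
                  + rng.normal(0, 0.5, n))                       # synthetic canopy height

        results = []
        for r in range(2, len(predictors) + 1):                  # 57 combinations in all
            for combo in itertools.combinations(predictors, r):
                X = np.column_stack([data[p] for p in combo])
                rf = RandomForestRegressor(n_estimators=100, random_state=7)
                rf.fit(X[:1500], height[:1500])
                rmse = mean_squared_error(height[1500:], rf.predict(X[1500:])) ** 0.5
                results.append((rmse, combo))

        best_rmse, best_combo = min(results)
        print(f"best combo {best_combo} with RMSE {best_rmse:.2f}")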

  7. Detecting knee osteoarthritis and its discriminating parameters using random forests.

    PubMed

    Kotti, Margarita; Duffell, Lynsey D; Faisal, Aldo A; McGregor, Alison H

    2017-02-24

    This paper tackles the problem of automatic detection of knee osteoarthritis. A computer system is built that takes body kinetics as input and produces as output not only an estimate of the presence of knee osteoarthritis, as previously done in the literature, but also the most discriminating parameters along with a set of rules on how the decision was reached. This fills the gap of interpretability between the medical and the engineering approaches. We collected locomotion data from 47 subjects with knee osteoarthritis and 47 healthy subjects. Osteoarthritis subjects were recruited from hospital clinics and GP surgeries, and age- and sex-matched healthy subjects from the local community. Subjects walked on a walkway equipped with two force plates with piezoelectric 3-component force sensors. Parameters of the vertical, anterior-posterior, and medio-lateral ground reaction forces, such as mean value, push-off time, and slope, were extracted. Random forest regressors then map those parameters via rule induction to the degree of knee osteoarthritis. To boost generalisation ability, a subject-independent protocol is employed. The 5-fold cross-validated accuracy is 72.61%±4.24%. We show that with 3 steps or fewer a reliable clinical measure can be extracted in a rule-based approach when the dataset is analysed appropriately.
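
    The subject-independent protocol can be expressed with a grouped cross-validation splitter, so that no subject contributes steps to both the training and test folds. A sketch with invented gait parameters:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import GroupKFold, cross_val_score

        # hypothetical: 94 subjects x 10 steps each, 12 ground-reaction-force parameters
        rng = np.random.default_rng(8)
        subjects = np.repeat(np.arange(94), 10)
        y = np.repeat(rng.integers(0, 2, 94), 10)          # OA vs healthy, per subject
        X = rng.normal(size=(subjects.size, 12)) + y[:, None] * 0.5

        # GroupKFold keeps all steps of a subject in the same fold, so no subject
        # appears in both training and test data (the subject-independent protocol)
        scores = cross_val_score(RandomForestClassifier(n_estimators=200, random_state=8),
                                 X, y, cv=GroupKFold(n_splits=5), groups=subjects)
        print("accuracy: %.2f%% +/- %.2f%%" % (100 * scores.mean(), 100 * scores.std()))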

  8. Random Forests for Global and Regional Crop Yield Predictions

    PubMed Central

    Jeong, Jig Han; Resop, Jonathan P.; Mueller, Nathaniel D.; Fleisher, David H.; Yun, Kyungdahm; Butler, Ethan E.; Timlin, Dennis J.; Shim, Kyo-Moon; Gerber, James S.; Reddy, Vangimalla R.

    2016-01-01

    Accurate predictions of crop yield are critical for developing effective agricultural and food policies at the regional and global scales. We evaluated a machine-learning method, Random Forests (RF), for its ability to predict crop yield responses to climate and biophysical variables at global and regional scales in wheat, maize, and potato in comparison with multiple linear regressions (MLR) serving as a benchmark. We used crop yield data from various sources and regions for model training and testing: 1) gridded global wheat grain yield, 2) maize grain yield from US counties over thirty years, and 3) potato tuber and maize silage yield from the northeastern seaboard region. RF was found highly capable of predicting crop yields and outperformed MLR benchmarks in all performance statistics that were compared. For example, the root mean square errors (RMSE) ranged between 6 and 14% of the average observed yield with RF models in all test cases, whereas these values ranged from 14% to 49% for MLR models. Our results show that RF is an effective and versatile machine-learning method for crop yield predictions at regional and global scales owing to its high accuracy and precision, ease of use, and utility in data analysis. RF may result in a loss of accuracy when predicting the extreme ends or responses beyond the boundaries of the training data. PMID:27257967

  9. Random forest regression for magnetic resonance image synthesis.

    PubMed

    Jog, Amod; Carass, Aaron; Roy, Snehashis; Pham, Dzung L; Prince, Jerry L

    2017-01-01

    By choosing different pulse sequences and their parameters, magnetic resonance imaging (MRI) can generate a large variety of tissue contrasts. This very flexibility, however, can yield inconsistencies with MRI acquisitions across datasets or scanning sessions that can in turn cause inconsistent automated image analysis. Although image synthesis of MR images has been shown to be helpful in addressing this problem, an inability to synthesize both T2-weighted brain images that include the skull and FLuid Attenuated Inversion Recovery (FLAIR) images has been reported. The method described herein, called REPLICA, addresses these limitations. REPLICA is a supervised random forest image synthesis approach that learns a nonlinear regression to predict intensities of alternate tissue contrasts given specific input tissue contrasts. Experimental results include direct image comparisons between synthetic and real images, results from image analysis tasks on both synthetic and real images, and comparison against other state-of-the-art image synthesis methods. REPLICA is computationally fast, and is shown to be comparable to other methods on tasks they are able to perform. Additionally REPLICA has the capability to synthesize both T2-weighted images of the full head and FLAIR images, and perform intensity standardization between different imaging datasets.

  10. Vertebral degenerative disc disease severity evaluation using random forest classification

    NASA Astrophysics Data System (ADS)

    Munoz, Hector E.; Yao, Jianhua; Burns, Joseph E.; Pham, Yasuyuki; Stieger, James; Summers, Ronald M.

    2014-03-01

    Degenerative disc disease (DDD) develops in the spine as vertebral discs degenerate and osseous excrescences or outgrowths naturally form to restabilize unstable segments of the spine. These osseous excrescences, or osteophytes, may progress or stabilize in size as the spine reaches a new equilibrium point. We have previously created a CAD system that detects DDD. This paper presents a new system to determine the severity of DDD of individual vertebral levels. This will be useful to monitor the progress of developing DDD, as rapid growth may indicate that there is a greater stabilization problem that should be addressed. The existing DDD CAD system extracts the spine from CT images and segments the cortical shell of individual levels with a dual-surface model. The cortical shell is unwrapped, and is analyzed to detect the hyperdense regions of DDD. Three radiologists scored the severity of DDD of each disc space of 46 CT scans. Radiologists' scores and features generated from CAD detections were used to train a random forest classifier. The classifier then assessed the severity of DDD at each vertebral disc level. The agreement between the computer severity score and the average radiologist's score had a quadratic weighted Cohen's kappa of 0.64.
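
    The agreement statistic quoted here is available directly in scikit-learn; a minimal example with made-up severity grades (quadratic weighting penalises large disagreements more than near-misses):

        from sklearn.metrics import cohen_kappa_score

        # hypothetical severity grades (0-3) from the average radiologist and the CAD system
        radiologist = [0, 1, 2, 3, 2, 1, 0, 3, 2, 1, 1, 2]
        computer    = [0, 1, 2, 2, 3, 1, 0, 3, 1, 1, 2, 2]

        kappa = cohen_kappa_score(radiologist, computer, weights="quadratic")
        print(f"quadratic weighted kappa: {kappa:.2f}")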

  11. Exploring precrash maneuvers using classification trees and random forests.

    PubMed

    Harb, Rami; Yan, Xuedong; Radwan, Essam; Su, Xiaogang

    2009-01-01

    Taking evasive action in critical traffic situations that precede motor vehicle crashes gives drivers an opportunity to avoid the crash or at least diminish its severity. This study explores the driver, vehicle, and environment characteristics associated with crash avoidance maneuvers (i.e., evasive actions or no evasive actions). Rear-end collisions, head-on collisions, and angle collisions are analyzed separately using decision trees, and the significance of the variables on the binary response variable (evasive actions or no evasive actions) is determined. Moreover, the random forests method is employed to rank the importance of the driver/vehicle/environment characteristics for crash avoidance maneuvers. According to the exploratory analyses, drivers' visibility obstruction, physical impairment, and distraction are associated with crash avoidance maneuvers in all three types of accidents. Moreover, speed limit is associated with rear-end collision avoidance maneuvers, and vehicle type is correlated with head-on collision and angle collision avoidance maneuvers. It is recommended that future research investigate the explored trends (e.g., physically impaired drivers, visibility obstruction) further using driving simulators, which may help in legislative initiatives and in-vehicle technology recommendations.

  12. Credit Risk Evaluation of Power Market Players with Random Forest

    NASA Astrophysics Data System (ADS)

    Umezawa, Yasushi; Mori, Hiroyuki

    A new method is proposed for credit risk evaluation in a power market. Credit risk evaluation measures the bankruptcy risk of a company. Power system liberalization has created a new environment that puts emphasis on profit maximization and risk minimization. Electricity transactions are likely to create risk between companies, so power market players are concerned with risk minimization. As a management strategy, a risk index is needed to evaluate the worth of a business partner. This paper proposes a new method for evaluating credit risk with Random Forest (RF), which performs ensemble learning over decision trees. RF is an efficient data mining technique for clustering data and extracting relationships between input and output data. In addition, a method of generating pseudo-measurements is proposed to improve the performance of RF. The proposed method is successfully applied to real financial data of energy utilities in the power market, and a comparison is made between the proposed and conventional methods.

  13. Disaggregating Census Data for Population Mapping Using Random Forests with Remotely-Sensed and Ancillary Data

    PubMed Central

    Stevens, Forrest R.; Gaughan, Andrea E.; Linard, Catherine; Tatem, Andrew J.

    2015-01-01

    High resolution, contemporary data on human population distributions are vital for measuring impacts of population growth, monitoring human-environment interactions and for planning and policy development. Many methods are used to disaggregate census data and predict population densities for finer scale, gridded population data sets. We present a new semi-automated dasymetric modeling approach that incorporates detailed census and ancillary data in a flexible, “Random Forest” estimation technique. We outline the combination of widely available, remotely-sensed and geospatial data that contribute to the modeled dasymetric weights and then use the Random Forest model to generate a gridded prediction of population density at ~100 m spatial resolution. This prediction layer is then used as the weighting surface to perform dasymetric redistribution of the census counts at a country level. As a case study we compare the new algorithm and its products for three countries (Vietnam, Cambodia, and Kenya) with other common gridded population data production methodologies. We discuss the advantages of the new method and increases over the accuracy and flexibility of those previous approaches. Finally, we outline how this algorithm will be extended to provide freely-available gridded population data sets for Africa, Asia and Latin America. PMID:25689585
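
    The dasymetric step is simple arithmetic once the forest has produced a weight per grid cell: each unit's census count is split across its cells in proportion to the predicted weights, so unit totals are preserved exactly. A sketch with hypothetical covariates and a stand-in trained model:

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(9)
        n_cells = 1000
        # hypothetical ~100 m grid-cell covariates (lights, land cover, roads, ...)
        X = rng.normal(size=(n_cells, 5))
        unit = rng.integers(0, 20, n_cells)                    # census unit of each cell
        census_counts = rng.integers(1000, 50000, 20)          # known count per unit

        # a RF trained elsewhere would predict a density weight per cell; here we
        # stand in a model fitted to random training data, for illustration only
        rf = RandomForestRegressor(n_estimators=100, random_state=9)
        rf.fit(rng.normal(size=(500, 5)), rng.gamma(2.0, 1.0, 500))
        weight = np.clip(rf.predict(X), 1e-6, None)

        # dasymetric redistribution: split each unit's count proportionally
        # to the predicted weights of the cells inside that unit
        pop = np.empty(n_cells)
        for u in range(20):
            cells = unit == u
            pop[cells] = census_counts[u] * weight[cells] / weight[cells].sum()

        assert np.allclose(np.bincount(unit, weights=pop), census_counts)  # totals preserved
        print("max cell population:", pop.max().round(1))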

  14. An assessment of the effectiveness of a random forest classifier for land-cover classification

    NASA Astrophysics Data System (ADS)

    Rodriguez-Galiano, V. F.; Ghimire, B.; Rogan, J.; Chica-Olmo, M.; Rigol-Sanchez, J. P.

    2012-01-01

    Land cover monitoring using remotely sensed data requires robust classification methods which allow for the accurate mapping of complex land cover and land use categories. Random forest (RF) is a powerful machine learning classifier that is relatively unknown in land remote sensing and has not been evaluated thoroughly by the remote sensing community compared to more conventional pattern recognition techniques. Key advantages of RF include its non-parametric nature, high classification accuracy, and capability to determine variable importance. However, the split rules for classification are unknown, so RF can be considered a black-box type of classifier. RF provides an algorithm for estimating missing values and the flexibility to perform several types of data analysis, including regression, classification, survival analysis, and unsupervised learning. In this paper, the performance of the RF classifier for land cover classification of a complex area is explored. Evaluation was based on several criteria: mapping accuracy, and sensitivity to data set size and noise. Landsat-5 Thematic Mapper data captured in European spring and summer were used with auxiliary variables derived from a digital terrain model to classify 14 different land categories in the south of Spain. Results show that the RF algorithm yields accurate land cover classifications, with 92% overall accuracy and a Kappa index of 0.92. RF is robust to training data reduction and noise: significant differences in kappa values were only observed for data reduction beyond 50% and noise addition beyond 20%. Additionally, variables that RF identified as most important for classifying land cover coincided with expectations. A McNemar test indicates an overall better performance of the random forest model over a single decision tree at the 0.00001 significance level.

  15. Prediction of O-glycosylation Sites Using Random Forest and GA-Tuned PSO Technique.

    PubMed

    Hassan, Hebatallah; Badr, Amr; Abdelhalim, M B

    2015-01-01

    O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at particular serine (S) or threonine (T) sites. Several O-glycosylation site predictors have been developed, but a need for even better prediction tools remains. One challenge in training the classifiers is that the available datasets are highly imbalanced, which makes the classification accuracy for the minority class unsatisfactory. In our previous work, we proposed a new classification approach based on particle swarm optimization (PSO) and random forest (RF) that addresses the imbalanced dataset problem. The PSO parameter settings in the training process impact the classification accuracy. Thus, in this paper, we optimize the PSO parameters with a genetic algorithm in order to increase the classification accuracy. Our proposed genetic algorithm-based approach has shown better performance in terms of area under the receiver operating characteristic curve than existing predictors. In addition, we implemented a glycosylation predictor tool based on this approach and demonstrated that it can successfully identify candidate glycosylation sites in a case study protein.

  16. An Individual Tree Detection Algorithm for Dense Deciduous Forests with Spreading Branches

    NASA Astrophysics Data System (ADS)

    Shao, G.

    2015-12-01

    Individual tree information derived from LiDAR may have the potential to assist forest inventory and improve the assessment of forest structure and composition for sustainable forest management. Algorithms developed for individual tree detection commonly focus on finding tree tops to locate tree positions. However, the spreading branches (cylindrical crowns) of deciduous trees make such algorithms less effective in dense canopies. This research applies a machine learning algorithm, mean shift, to position individual trees based on the density of the LiDAR point cloud instead of detecting tree tops. The study site is a dense oak forest in Indiana, US. The selection of mean shift kernels is discussed, and constant and dynamic bandwidths for the mean shift algorithm are applied and compared.
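
    The core of the method, mean shift run on the horizontal coordinates of the point cloud so that density modes mark stems, is a few lines with scikit-learn; the simulated crowns and bandwidth below are illustrative.

        import numpy as np
        from sklearn.cluster import MeanShift

        # toy "LiDAR" returns: three tree crowns simulated as 2-D Gaussian point clusters
        rng = np.random.default_rng(10)
        stems = np.array([[0.0, 0.0], [4.0, 1.0], [2.0, 5.0]])
        points = np.vstack([s + rng.normal(0, 0.8, (200, 2)) for s in stems])

        # mean shift climbs the point-density surface; each mode is taken as a tree stem
        ms = MeanShift(bandwidth=1.5).fit(points)
        print("detected stems:\n", ms.cluster_centers_.round(1))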

  17. A Robust Random Forest-Based Approach for Heart Rate Monitoring Using Photoplethysmography Signal Contaminated by Intense Motion Artifacts

    PubMed Central

    Ye, Yalan; He, Wenwen; Cheng, Yunfei; Huang, Wenxia; Zhang, Zhilin

    2017-01-01

    The estimation of heart rate (HR) based on wearable devices is of interest in fitness. Photoplethysmography (PPG) is a promising approach to estimate HR due to low cost; however, it is easily corrupted by motion artifacts (MA). In this work, a robust two-stage approach based on random forest is proposed for accurately estimating HR from a photoplethysmography signal contaminated by intense motion artifacts. Stage 1 proposes a hybrid method to effectively remove MA with a low computation complexity, where two MA removal algorithms are combined by an accurate binary decision algorithm whose aim is to decide whether or not to adopt the second MA removal algorithm. Stage 2 proposes a random forest-based spectral peak-tracking algorithm, whose aim is to locate the spectral peak corresponding to HR, formulating the problem of spectral peak tracking into a pattern classification problem. Experiments on the PPG datasets including 22 subjects used in the 2015 IEEE Signal Processing Cup showed that the proposed approach achieved the average absolute error of 1.65 beats per minute (BPM) on the 22 PPG datasets. Compared to state-of-the-art approaches, the proposed approach has better accuracy and robustness to intense motion artifacts, indicating its potential use in wearable sensors for health monitoring and fitness tracking. PMID:28212327

  18. A Robust Random Forest-Based Approach for Heart Rate Monitoring Using Photoplethysmography Signal Contaminated by Intense Motion Artifacts.

    PubMed

    Ye, Yalan; He, Wenwen; Cheng, Yunfei; Huang, Wenxia; Zhang, Zhilin

    2017-02-16

    The estimation of heart rate (HR) based on wearable devices is of interest in fitness. Photoplethysmography (PPG) is a promising approach to estimate HR due to low cost; however, it is easily corrupted by motion artifacts (MA). In this work, a robust two-stage approach based on random forest is proposed for accurately estimating HR from a photoplethysmography signal contaminated by intense motion artifacts. Stage 1 proposes a hybrid method to effectively remove MA with a low computation complexity, where two MA removal algorithms are combined by an accurate binary decision algorithm whose aim is to decide whether or not to adopt the second MA removal algorithm. Stage 2 proposes a random forest-based spectral peak-tracking algorithm, whose aim is to locate the spectral peak corresponding to HR, formulating the problem of spectral peak tracking into a pattern classification problem. Experiments on the PPG datasets including 22 subjects used in the 2015 IEEE Signal Processing Cup showed that the proposed approach achieved the average absolute error of 1.65 beats per minute (BPM) on the 22 PPG datasets. Compared to state-of-the-art approaches, the proposed approach has better accuracy and robustness to intense motion artifacts, indicating its potential use in wearable sensors for health monitoring and fitness tracking.

  19. Random Matrix Approach to Quantum Adiabatic Evolution Algorithms

    NASA Technical Reports Server (NTRS)

    Boulatov, Alexei; Smelyanskiy, Vadim N.

    2004-01-01

    We analyze the power of quantum adiabatic evolution algorithms (QAEA) for solving random NP-hard optimization problems within a theoretical framework based on random matrix theory (RMT). We present two types of driven RMT models. In the first model, the driving Hamiltonian is represented by Brownian motion in the matrix space. We use the Brownian motion model to obtain a description of multiple avoided crossing phenomena. We show that the failure mechanism of the QAEA is due to the interaction of the ground state with the "cloud" formed by all the excited states, confirming that in the driven RMT models the Landau-Zener mechanism of dissipation is not important. We show that the QAEA has a finite probability of success in a certain range of parameters, implying polynomial complexity of the algorithm. The second model corresponds to the standard QAEA with the problem Hamiltonian taken from the Gaussian unitary RMT ensemble (GUE). We show that the level dynamics in this model can be mapped onto the dynamics in the Brownian motion model. However, the driven RMT model always leads to exponential complexity of the algorithm due to the presence of long-range intertemporal correlations of the eigenvalues. Our results indicate that the weakness of effective transitions is the leading effect that can make the Markovian-type QAEA successful.

  20. Combinatorial approximation algorithms for MAXCUT using random walks.

    SciTech Connect

    Seshadhri, Comandur; Kale, Satyen

    2010-11-01

    We give the first combinatorial approximation algorithm for MaxCut that beats the trivial 0.5 factor by a constant. The main partitioning procedure is very intuitive, natural, and easily described. It essentially performs a number of random walks and aggregates the information to provide the partition. We can control the running time to get an approximation factor-running time tradeoff. We show that for any constant b > 1.5, there is an Õ(n^b) algorithm that outputs a (0.5 + δ)-approximation for MaxCut, where δ = δ(b) is some positive constant. One of the components of our algorithm is a weak local graph partitioning procedure that may be of independent interest. Given a starting vertex i and a conductance parameter φ, unless a random walk of length ℓ = O(log n) starting from i mixes rapidly (in terms of φ and ℓ), we can find a cut of conductance at most φ close to the vertex. The work done per vertex found in the cut is sublinear in n.

  1. Hydrologic landscape regionalisation using deductive classification and random forests.

    PubMed

    Brown, Stuart C; Lester, Rebecca E; Versace, Vincent L; Fawcett, Jonathon; Laurenson, Laurie

    2014-01-01

    Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications which have required the use of an a priori spatial unit (e.g. a catchment) which necessarily results in the loss of variability that is known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different number of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km2 of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less-accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km2, demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic trends at the scale of

  2. Hydrologic Landscape Regionalisation Using Deductive Classification and Random Forests

    PubMed Central

    Brown, Stuart C.; Lester, Rebecca E.; Versace, Vincent L.; Fawcett, Jonathon; Laurenson, Laurie

    2014-01-01

    Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications which have required the use of an a priori spatial unit (e.g. a catchment) which necessarily results in the loss of variability that is known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different number of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km2 of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less-accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km2, demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic trends at the scale of

  3. Random search optimization based on genetic algorithm and discriminant function

    NASA Technical Reports Server (NTRS)

    Kiciman, M. O.; Akgul, M.; Erarslanoglu, G.

    1990-01-01

    The general problem of optimization with arbitrary merit and constraint functions, which could be convex, concave, monotonic, or non-monotonic, is treated using stochastic methods. To improve the efficiency of the random search methods, a genetic algorithm for the search phase and a discriminant function for the constraint-control phase were utilized. The validity of the technique is demonstrated by comparing the results to published test problem results. Numerical experimentation indicated that for cases where a quick near optimum solution is desired, a general, user-friendly optimization code can be developed without serious penalties in both total computer time and accuracy.

  4. Randomized Algorithms for Systems and Control: Theory and Applications

    DTIC Science & Technology

    2008-05-01


  5. Applying a weighted random forests method to extract karst sinkholes from LiDAR data

    NASA Astrophysics Data System (ADS)

    Zhu, Junfeng; Pierskalla, William P.

    2016-02-01

    Detailed mapping of sinkholes provides critical information for mitigating sinkhole hazards and understanding groundwater and surface water interactions in karst terrains. LiDAR (Light Detection and Ranging) measures the earth's surface at high resolution and density and has shown great potential to drastically improve the locating and delineating of sinkholes. However, processing LiDAR data to extract sinkholes requires separating sinkholes from other depressions, which can be laborious because of the sheer number of depressions commonly generated from LiDAR data. In this study, we applied random forests, a machine learning method, to automatically separate sinkholes from other depressions in a karst region in central Kentucky. The sinkhole-extraction random forest was grown on a training dataset built from an area where LiDAR-derived depressions were manually classified through visual inspection and field verification. Based on the geometry of depressions, as well as natural and human factors related to sinkholes, 11 parameters were selected as predictive variables to form the dataset. Because the training dataset was imbalanced, with the majority of depressions being non-sinkholes, a weighted random forests method was used to improve the accuracy of predicting sinkholes. The weighted random forest achieved an average accuracy of 89.95% on the training dataset, demonstrating that the random forest can be an effective sinkhole classifier. Testing of the random forest in another area, however, resulted in moderate success with an average accuracy rate of 73.96%. This study suggests that an automatic sinkhole extraction procedure like the random forest classifier can significantly reduce time and labor costs and make it more tractable to map sinkholes from LiDAR data over large areas. However, the random forests method cannot totally replace manual procedures such as visual inspection and field verification.
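
    One common way to realise a weighted random forest for this kind of class imbalance is scikit-learn's class_weight option, which reweights samples inversely to class frequency; whether this matches the weighting used in the study is not specified, so treat the sketch below as a generic stand-in with invented depression features.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import classification_report

        # hypothetical depression features (depth, area, circularity, ...); only ~5%
        # of LiDAR-derived depressions are true sinkholes, so classes are imbalanced
        rng = np.random.default_rng(11)
        X = rng.normal(size=(4000, 11))                       # 11 predictive variables
        y = (X[:, 0] + X[:, 1] > 2.3).astype(int)             # rare positive class

        # class_weight='balanced' reweights samples inversely to class frequency
        rf = RandomForestClassifier(n_estimators=300, class_weight="balanced",
                                    random_state=11).fit(X[:3000], y[:3000])
        print(classification_report(y[3000:], rf.predict(X[3000:]), digits=3))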

  6. Random forests on Hadoop for genome-wide association studies of multivariate neuroimaging phenotypes

    PubMed Central

    2013-01-01

    Motivation: Multivariate quantitative traits arise naturally in recent neuroimaging genetics studies, in which both structural and functional variability of the human brain is measured non-invasively through techniques such as magnetic resonance imaging (MRI). There is growing interest in detecting genetic variants associated with such multivariate traits, especially in genome-wide studies. Random forests (RFs) classifiers, which are ensembles of decision trees, are amongst the best performing machine learning algorithms and have been successfully employed for the prioritisation of genetic variants in case-control studies. RFs can also be applied to produce gene rankings in association studies with multivariate quantitative traits, and to estimate genetic similarity measures that are predictive of the trait. However, in studies involving hundreds of thousands of SNPs and high-dimensional traits, a very large ensemble of trees must be inferred from the data in order to obtain reliable rankings, which makes the application of these algorithms computationally prohibitive. Results: We have developed a parallel version of the RF algorithm for regression and genetic similarity learning tasks in large-scale population genetic association studies involving multivariate traits, called PaRFR (Parallel Random Forest Regression). Our implementation takes advantage of the MapReduce programming model and is deployed on Hadoop, an open-source software framework that supports data-intensive distributed applications. Notable speed-ups are obtained by introducing a distance-based criterion for node splitting in the tree estimation process. PaRFR has been applied to a genome-wide association study on Alzheimer's disease (AD) in which the quantitative trait consists of a high-dimensional neuroimaging phenotype describing longitudinal changes in the human brain structure. PaRFR provides a ranking of SNPs associated to this trait, and produces pair-wise measures of genetic proximity

  7. Multivariate classification with random forests for gravitational wave searches of black hole binary coalescence

    NASA Astrophysics Data System (ADS)

    Baker, Paul T.; Caudill, Sarah; Hodge, Kari A.; Talukder, Dipongkar; Capano, Collin; Cornish, Neil J.

    2015-03-01

    Searches for gravitational waves produced by coalescing black hole binaries with total masses ≳25 M⊙ use matched filtering with templates of short duration. Non-Gaussian noise bursts in gravitational wave detector data can mimic short signals and limit the sensitivity of these searches. Previous searches have relied on empirically designed statistics incorporating signal-to-noise ratio and signal-based vetoes to separate gravitational wave candidates from noise candidates. We report on sensitivity improvements achieved using a multivariate candidate ranking statistic derived from a supervised machine learning algorithm. We apply the random forest of bagged decision trees technique to two separate searches in the high mass (≳25 M⊙ ) parameter space. For a search which is sensitive to gravitational waves from the inspiral, merger, and ringdown of binary black holes with total mass between 25 M⊙ and 100 M⊙ , we find sensitive volume improvements as high as 70±13%-109±11% when compared to the previously used ranking statistic. For a ringdown-only search which is sensitive to gravitational waves from the resultant perturbed intermediate mass black hole with mass roughly between 10 M⊙ and 600 M⊙ , we find sensitive volume improvements as high as 61±4%-241±12% when compared to the previously used ranking statistic. We also report how sensitivity improvements can differ depending on mass regime, mass ratio, and available data quality information. Finally, we describe the techniques used to tune and train the random forest classifier that can be generalized to its use in other searches for gravitational waves.

  8. Selecting Random Distributed Elements for HIFU using Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Zhou, Yufeng

    2011-09-01

    As an effective and noninvasive therapeutic modality for tumor treatment, high-intensity focused ultrasound (HIFU) has attracted attention from both physicians and patients. New generations of HIFU systems with the ability to electrically steer the HIFU focus using phased array transducers have been under development. The presence of side and grating lobes may cause undesired thermal accumulation at the interface of the coupling medium (i.e. water) and skin, or in the intervening tissue. Although sparse randomly distributed piston elements could reduce the amplitude of grating lobes, there are theoretically no grating lobes with the use of concave elements in the new phased array HIFU. A new HIFU transmission strategy is proposed in this study: firing a number of but not all elements for a certain period and then changing to another group for the next firing sequence. The advantages are: 1) the asymmetric position of active elements may reduce the side lobes, and 2) each element has some resting time during the entire HIFU ablation (up to several hours for some clinical applications) so that the decreasing efficiency of the transducer due to thermal accumulation is minimized. A genetic algorithm was used for selecting randomly distributed elements in a HIFU array. Amplitudes of the first side lobes at the focal plane were used as the fitness value in the optimization. Overall, it is suggested that the proposed new strategy could reduce the side lobe and the consequent side effects, and that the genetic algorithm is effective in selecting randomly distributed elements in a HIFU array.

  9. A robust and accurate approach to automatic blood vessel detection and segmentation from angiography x-ray images using multistage random forests

    NASA Astrophysics Data System (ADS)

    Gupta, Vipin; Kale, Amit; Sundar, Hari

    2012-03-01

    In this paper we propose a novel approach based on multi-stage random forests to address problems faced by traditional vessel segmentation algorithms on account of image artifacts such as stitches, organ shadows, etc. Our approach collects a very large set of training data consisting of positive and negative examples of valid seed points. The method makes use of a 14x14 window around a putative seed point. For this window three types of feature vectors are computed: vesselness, eigenvalue, and a novel effective margin feature. A random forest (RF) is trained for each of the feature vectors. At run time the three RFs are applied in succession to putative seed points generated by a naive vessel detection algorithm based on vesselness. Our approach prunes this set of putative seed points to correctly identify true seed points, thereby avoiding false positives. We demonstrate the effectiveness of our algorithm on a large dataset of angiography images.
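
    The cascade logic, one forest per feature family, applied in succession so that each stage prunes the surviving candidates, can be sketched as follows with invented features; the window size, feature dimensions, and thresholds of the paper are not reproduced.

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(12)
        n = 5000
        # hypothetical per-candidate feature vectors for each of the three cue types
        vesselness = rng.normal(size=(n, 8))
        eigenvalue = rng.normal(size=(n, 6))
        margin = rng.normal(size=(n, 4))
        y = (vesselness[:, 0] + eigenvalue[:, 0] + margin[:, 0] > 1.5).astype(int)

        # train one RF per feature family
        stages = []
        for F in (vesselness, eigenvalue, margin):
            stages.append(RandomForestClassifier(n_estimators=100, random_state=12)
                          .fit(F[:4000], y[:4000]))

        # apply the stages in succession: a candidate seed point survives only if
        # every stage accepts it, pruning false positives at each step
        alive = np.arange(4000, n)
        for rf, F in zip(stages, (vesselness, eigenvalue, margin)):
            alive = alive[rf.predict(F[alive]) == 1]
        print(f"{alive.size} of {n - 4000} candidates survive all three stages")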

  10. Random matrix approach to quantum adiabatic evolution algorithms

    SciTech Connect

    Boulatov, A.; Smelyanskiy, V.N.

    2005-05-15

    We analyze the power of the quantum adiabatic evolution algorithm (QAA) for solving random computationally hard optimization problems within a theoretical framework based on random matrix theory (RMT). We present two types of driven RMT models. In the first model, the driving Hamiltonian is represented by Brownian motion in the matrix space. We use the Brownian motion model to obtain a description of multiple avoided crossing phenomena. We show that nonadiabatic corrections in the QAA are due to the interaction of the ground state with the 'cloud' formed by most of the excited states, confirming that in driven RMT models, the Landau-Zener scenario of pairwise level repulsions is not relevant for the description of nonadiabatic corrections. We show that the QAA has a finite probability of success in a certain range of parameters, implying a polynomial complexity of the algorithm. The second model corresponds to the standard QAA with the problem Hamiltonian taken from the RMT Gaussian unitary ensemble (GUE). We show that the level dynamics in this model can be mapped onto the dynamics in the Brownian motion model. For this reason, the driven GUE model can also lead to polynomial complexity of the QAA. The main contribution to the failure probability of the QAA comes from the nonadiabatic corrections to the eigenstates, which only depend on the absolute values of the transition amplitudes. Due to the mapping between the two models, these absolute values are the same in both cases. Our results indicate that this 'phase irrelevance' is the leading effect that can make both the Markovian- and GUE-type QAAs successful.

  11. Precipitation estimates from MSG SEVIRI daytime, night-time and twilight data with random forests

    NASA Astrophysics Data System (ADS)

    Kühnlein, Meike; Appelhans, Tim; Thies, Boris; Nauss, Thomas

    2014-05-01

    We introduce a new rainfall retrieval technique based on MSG SEVIRI data which aims to retrieve rainfall rates in a continuous manner (day, twilight and night) at high temporal resolution. Due to the deficiencies of existing optical rainfall retrievals, the focus of this technique is on assigning rainfall rates to precipitating cloud areas in connection with extra-tropical cyclones in the mid-latitudes, including both convective and advective-stratiform precipitating cloud areas. The technique is realized in three steps: (i) precipitating cloud areas are identified; (ii) the precipitating cloud areas are separated into convective and advective-stratiform precipitating areas; (iii) rainfall rates are assigned to the convective and advective-stratiform precipitating areas, respectively. Considering the dominant precipitation processes of convective and advective-stratiform precipitation areas within extra-tropical cyclones, satellite-based information on cloud top height, cloud top temperature, cloud phase and cloud water path is used to retrieve information about precipitation. The approach uses the ensemble classification and regression technique random forests to develop the prediction algorithms. Random forest models combine characteristics that make them well suited for application in precipitation remote sensing. One of the key advantages is the ability to capture non-linear associations between predictors and response, which becomes important when dealing with complex non-linear events like precipitation. The use of a machine learning approach differentiates the proposed technique from most state-of-the-art satellite-based rainfall retrievals, which generally use conventional parametric approaches. To train and validate the model, the radar-based RADOLAN RW product from the German Weather Service (DWD) is used, which provides area-wide gauge-adjusted hourly precipitation information. Beside the overall performance of the

  12. Unbiased split variable selection for random survival forests using maximally selected rank statistics.

    PubMed

    Wright, Marvin N; Dankowski, Theresa; Ziegler, Andreas

    2017-04-15

    The most popular approach for analyzing survival data is the Cox regression model. The Cox model may, however, be misspecified, and its proportionality assumption may not always be fulfilled. An alternative approach for survival prediction is random forests for survival outcomes. The standard split criterion for random survival forests is the log-rank test statistic, which favors splitting variables with many possible split points. Conditional inference forests avoid this split variable selection bias. However, linear rank statistics are utilized by default in conditional inference forests to select the optimal splitting variable, which cannot detect non-linear effects in the independent variables. An alternative is to use maximally selected rank statistics for the split point selection. As in conditional inference forests, splitting variables are compared on the p-value scale. However, instead of the conditional Monte-Carlo approach used in conditional inference forests, p-value approximations are employed. We describe several p-value approximations and the implementation of the proposed random forest approach. A simulation study demonstrates that unbiased split variable selection is possible. However, there is a trade-off between unbiased split variable selection and runtime. In benchmark studies of prediction performance on simulated and real datasets, the new method performs better than random survival forests if informative dichotomous variables are combined with uninformative variables with more categories and better than conditional inference forests if non-linear covariate effects are included. In a runtime comparison, the method proves to be computationally faster than both alternatives, if a simple p-value approximation is used. Copyright © 2017 John Wiley & Sons, Ltd.
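    The authors' implementation is in the R package ranger; purely as an illustration of fitting a random survival forest in Python, here is a minimal sketch with scikit-survival (which uses log-rank splitting, not the paper's maximally selected rank statistics), assuming that package is installed:

```python
# Illustration only: a random survival forest fit on synthetic data.
import numpy as np
from sksurv.ensemble import RandomSurvivalForest
from sksurv.util import Surv

rng = np.random.default_rng(1)
n, p = 200, 5
X = rng.normal(size=(n, p))
time = rng.exponential(scale=np.exp(0.5 * X[:, 0]))  # synthetic survival times
event = rng.random(n) < 0.7                          # ~30% censoring
y = Surv.from_arrays(event=event, time=time)         # structured (event, time) array

rsf = RandomSurvivalForest(n_estimators=100, min_samples_leaf=15, random_state=1)
rsf.fit(X, y)
print("concordance index on training data:", rsf.score(X, y))
```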

  13. Efficient Grammar Induction Algorithm with Parse Forests from Real Corpora

    NASA Astrophysics Data System (ADS)

    Kurihara, Kenichi; Kameya, Yoshitaka; Sato, Taisuke

    The task of inducing grammar structures has received a great deal of attention, with motivations ranging from using grammar induction as the first stage in building large treebanks to building better language models. However, grammar induction has inherent computational complexity. To overcome it, some grammar induction algorithms add new production rules incrementally, refining the grammar while keeping the computational complexity low. In this paper, we propose a new efficient grammar induction algorithm. Although our algorithm is similar to algorithms that learn a grammar incrementally, it uses the graphical EM algorithm instead of the Inside-Outside algorithm. We report results of learning experiments in terms of learning speed. The results show that our algorithm learns a grammar in constant time regardless of the size of the grammar. Because our algorithm decreases syntactic ambiguity at each step, it reduces the time required for learning; this constant-time learning considerably affects learning time for larger grammars. We also report an evaluation of criteria for choosing nonterminals. Our algorithm refines a grammar based on one nonterminal in each step; since there can be several criteria for deciding which nonterminal is best, we evaluate them by learning experiments.

  14. Urban land cover thematic disaggregation, employing datasets from multiple sources and RandomForests modeling

    NASA Astrophysics Data System (ADS)

    Gounaridis, Dimitrios; Koukoulas, Sotirios

    2016-09-01

    Urban land cover mapping has lately attracted a vast amount of attention, as it closely relates to a broad scope of scientific and management applications. Recent methodological and technological advancements facilitate the development of datasets with improved accuracy. However, the thematic resolution of urban land cover has received much less attention so far, a fact that hampers the utility of the produced datasets. This paper seeks to provide insights towards the improvement of the thematic resolution of urban land cover classification. We integrate existing, readily available datasets of acceptable accuracy from multiple sources with remote sensing techniques. The study site is Greece, and the urban land cover is classified nationwide into five classes using the RandomForests algorithm. The results allowed us to quantify, for the first time with good accuracy, the proportion occupied by each urban land cover class. The total area covered by urban land cover is 2280 km2 (1.76% of the total terrestrial area), the dominant class is discontinuous dense urban fabric (50.71% of urban land cover), and the least occurring class is discontinuous very low density urban fabric (2.06% of urban land cover).

  15. Automatic co-segmentation of lung tumor based on random forest in PET-CT images

    NASA Astrophysics Data System (ADS)

    Jiang, Xueqing; Xiang, Dehui; Zhang, Bin; Zhu, Weifang; Shi, Fei; Chen, Xinjian

    2016-03-01

    In this paper, a fully automatic method is proposed to segment the lung tumor in clinical 3D PET-CT images. The proposed method effectively combines PET and CT information to make full use of the high contrast of PET images and the superior spatial resolution of CT images. Our approach consists of three main parts: (1) initial segmentation, in which the spine is removed in the CT images and initial connected regions are obtained by threshold-based segmentation in the PET images; (2) coarse segmentation, in which a monotonic downhill function is applied to rule out structures that have standardized uptake values (SUV) similar to the lung tumor but do not satisfy a monotonic property in the PET images; (3) fine segmentation, in which a random forest is applied to accurately segment the lung tumor by extracting effective features from the PET and CT images simultaneously. We validated our algorithm on a dataset of 24 3D PET-CT images from different patients with non-small cell lung cancer (NSCLC). The average TPVF, FPVF and accuracy rate (ACC) were 83.65%, 0.05% and 99.93%, respectively. The correlation analysis shows that our segmented lung tumor volumes have a strong correlation (average 0.985) with ground truth 1 and ground truth 2 labeled by a clinical expert.

  16. Human tracking in thermal images using adaptive particle filters with online random forest learning

    NASA Astrophysics Data System (ADS)

    Ko, Byoung Chul; Kwak, Joon-Young; Nam, Jae-Yeal

    2013-11-01

    This paper presents a fast and robust human tracking method for use with a moving long-wave infrared thermal camera under poor illumination, shadows, and cluttered backgrounds. To improve tracking performance while minimizing computation time, this study proposes online learning of classifiers based on particle filters and a combination of a local intensity distribution (LID) with oriented center-symmetric local binary patterns (OCS-LBP). Specifically, we design a real-time random forest (RF), an ensemble of decision trees for confidence estimation, and the confidences of the RF are converted into a likelihood function of the target state. First, the target model is selected by the user and particles are sampled. Then, RFs are generated using the positive and negative examples with LID and OCS-LBP features by online learning. In the next stage, the learned RF classifiers are used to detect the most likely target position in the subsequent frame. Then, the RFs are learned again by means of fast retraining with the tracked object and background appearance in the new frame. The proposed algorithm is successfully applied to various thermal videos as tests, and its tracking performance is better than that of other methods.

  17. Random Forest and Objected-Based Classification for Forest Pest Extraction from Uav Aerial Imagery

    NASA Astrophysics Data System (ADS)

    Yuan, Yi; Hu, Xiangyun

    2016-06-01

    Forest pests are one of the most important factors affecting forest health. However, because it is difficult to delineate infested areas and to predict how infestations spread, control and extermination efforts have so far not been effective, and infested areas continue to expand. Thus, the introduction of spatial information technology is in high demand: examining the spatial distribution characteristics of an infestation makes it possible to establish timely and proper control strategies by periodically assessing the infestation status and predicting how the infection will spread. With UAV photography becoming more and more popular, it has become much cheaper and faster to acquire UAV images, which are well suited to monitoring forest health and detecting pests. This paper proposes a new method to effectively detect forest pest damage in UAV aerial imagery. Each image is first segmented into many superpixels, and a 12-dimensional statistical texture descriptor is then computed for each superpixel and used to train and classify the data. Finally, the classification results are refined by some simple rules. The experiments show that the method is effective for the extraction of forest pest areas in UAV images.

  18. Automated segmentation of thyroid gland on CT images with multi-atlas label fusion and random classification forest

    NASA Astrophysics Data System (ADS)

    Liu, Jiamin; Chang, Kevin; Kim, Lauren; Turkbey, Evrim; Lu, Le; Yao, Jianhua; Summers, Ronald

    2015-03-01

    The thyroid gland plays an important role in clinical practice, especially for radiation therapy treatment planning. For patients with head and neck cancer, radiation therapy requires a precise delineation of the thyroid gland to be spared on the pre-treatment planning CT images to avoid thyroid dysfunction. In the current clinical workflow, the thyroid gland is normally delineated manually by radiologists or radiation oncologists, which is time consuming and error prone. Therefore, a system for automated segmentation of the thyroid is desirable. However, automated segmentation of the thyroid is challenging because the thyroid is inhomogeneous and surrounded by structures that have similar intensities. In this work, the thyroid gland segmentation is initially estimated by a multi-atlas label fusion (MALF) algorithm. The segmentation is then refined by supervised, statistical-learning-based voxel labeling with a random forest (RF) algorithm. MALF transfers expert-labeled thyroids from atlases to a target image using deformable registration, and errors produced by label transfer are reduced by label fusion, which combines the results produced by all atlases into a consensus solution. The RF then employs an ensemble of decision trees that are trained on labeled thyroids to recognize features. The trained forest classifier is applied to the thyroid estimated by the MALF by voxel scanning to assign the class-conditional probability. Voxels from the expert-labeled thyroids in CT volumes are treated as positive classes; background non-thyroid voxels as negatives. We applied this automated thyroid segmentation system to CT scans of 20 patients. The results showed that the MALF achieved an overall 0.75 Dice Similarity Coefficient (DSC) and the RF classification further improved the DSC to 0.81.

  19. Variable selection with random forest: Balancing stability, performance, and interpretation in ecological and environmental modeling

    EPA Science Inventory

    Random forest (RF) is popular in ecological and environmental modeling, in part, because of its insensitivity to correlated predictors and resistance to overfitting. Although variable selection has been proposed to improve both performance and interpretation of RF models, it is u...
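    The record is truncated, but a common RF variable-selection workflow of the kind it discusses can be sketched with permutation importance; the threshold rule below is an illustrative choice, not the procedure from the record:

```python
# Generic sketch of RF variable selection: rank predictors by permutation
# importance and keep those whose mean importance exceeds its own std.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                       random_state=0)
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)

imp = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
keep = np.where(imp.importances_mean > imp.importances_std)[0]
print("selected predictors:", keep)
```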

  20. The Random Forests Statistical Technique: An Examination of Its Value for the Study of Reading

    ERIC Educational Resources Information Center

    Matsuki, Kazunaga; Kuperman, Victor; Van Dyke, Julie A.

    2016-01-01

    Studies investigating individual differences in reading ability often involve data sets containing a large number of collinear predictors and a small number of observations. In this article, we discuss the method of Random Forests and demonstrate its suitability for addressing the statistical concerns raised by such data sets. The method is…

  1. Oak Park and River Forest High School Random Access Information Center; A PACE Program. Report II.

    ERIC Educational Resources Information Center

    Oak Park - River Forest High School, Oak Park, IL.

    The specifications, planning, and initial development phases of the Random Access Center at the Oak Park and River Forest High School in Oak Park, Illinois, are described with particular attention to the ways that the five functional specifications and the five-part program rationale were implemented in the system design. Specifications, set out…

  2. Random forest fishing: a novel approach to identifying organic group of risk factors in genome-wide association studies

    PubMed Central

    Yang, Wei; Charles Gu, C

    2014-01-01

    Genome-wide association studies (GWAS) have brought methodological challenges in handling massive high-dimensional data and also real opportunities for studying the joint effect of many risk factors acting in concert as an organic group. The random forest (RF) methodology is recognized by many for its potential in examining interaction effects in large data sets. However, RF is not designed to directly handle GWAS data, which typically have hundreds of thousands of single-nucleotide polymorphisms as predictor variables. We propose and evaluate a novel extension of RF, called random forest fishing (RFF), for GWAS analysis. RFF repeatedly updates a relatively small set of predictors obtained by RF tests to find globally important groups predictive of the disease phenotype, using a novel search algorithm based on genetic programming and simulated annealing. A key improvement of RFF results from the use of guidance incorporating empirical test results of genome-wide pairwise interactions. Evaluated using simulated and real GWAS data sets, RFF is shown to be effective in identifying important predictors, particularly when both marginal effects and interactions exist, and is applicable to very large GWAS data sets. PMID:23695277

  3. Fault diagnosis of spur gearbox based on random forest and wavelet packet decomposition

    NASA Astrophysics Data System (ADS)

    Cabrera, Diego; Sancho, Fernando; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Li, Chuan; Vásquez, Rafael E.

    2015-09-01

    This paper addresses the development of a random forest classifier for multi-class fault diagnosis in spur gearboxes. The vibration signal's condition parameters are first extracted by applying wavelet packet decomposition with multiple mother wavelets, and the energy content of the coefficients at the terminal nodes is used as the input feature for the classification problem. Then, a search through the parameter space is performed to find the best values for the number of trees and the number of random features. In this way, the best set of mother wavelets for the application is identified, and the best features are selected through the internal ranking of the random forest classifier. The results show that the proposed method reached 98.68% classification accuracy, with high efficiency and robustness of the models.
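    A minimal sketch of the feature-extraction step with PyWavelets: decompose each vibration record into a wavelet packet tree and use the energy of the terminal-node coefficients as classifier inputs. The mother wavelet, depth, and signals below are illustrative placeholders, not the paper's settings:

```python
# Wavelet packet energy features + random forest, on synthetic "vibration"
# records standing in for healthy vs. faulty gearbox signals.
import numpy as np
import pywt
from sklearn.ensemble import RandomForestClassifier

def wp_energy_features(signal, wavelet="db4", level=3):
    """Energy of each terminal node of a wavelet packet tree."""
    wp = pywt.WaveletPacket(data=signal, wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    nodes = wp.get_level(level, order="natural")
    return np.array([np.sum(node.data ** 2) for node in nodes])

rng = np.random.default_rng(2)
signals = [rng.normal(size=1024) * (1 + k % 2) for k in range(40)]
labels = [k % 2 for k in range(40)]   # 0 = healthy, 1 = faulty (synthetic)

X = np.vstack([wp_energy_features(s) for s in signals])
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)
print("training accuracy:", clf.score(X, labels))
```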

  4. A novel quantum random number generation algorithm used by smartphone camera

    NASA Astrophysics Data System (ADS)

    Wu, Nan; Wang, Kun; Hu, Haixing; Song, Fangmin; Li, Xiangdong

    2015-05-01

    We study an efficient algorithm to extract quantum random numbers (QRN) from the raw data obtained by charge-coupled device (CCD) or complementary metal-oxide semiconductor (CMOS) based sensors, like the camera used in a commercial smartphone. Based on the NIST statistical test suite for random number generators, the proposed algorithm achieves a high QRN generation rate and high statistical randomness. The algorithm turns such simple, low-cost, and reliable devices into QRN generators for quantum key distribution (QKD) and other cryptographic applications.
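    The abstract does not spell out the extraction algorithm itself; as a generic illustration of the kind of post-processing such generators rely on, here is the classic von Neumann debiasing step applied to the least significant bits of raw sensor values (not the paper's method):

```python
# Von Neumann debiasing on sensor LSBs: pairs 01 -> 0, 10 -> 1, 00/11 dropped.
import numpy as np

def von_neumann_extract(raw_bytes):
    bits = np.unpackbits(np.frombuffer(raw_bytes, dtype=np.uint8))[7::8]  # LSB of each byte
    pairs = bits[: len(bits) // 2 * 2].reshape(-1, 2)
    keep = pairs[:, 0] != pairs[:, 1]     # discard 00 and 11 pairs
    return pairs[keep, 0]                 # 01 -> 0, 10 -> 1

# Stand-in for raw camera sensor bytes.
raw = np.random.default_rng(3).integers(0, 256, 4096, dtype=np.uint8).tobytes()
out = von_neumann_extract(raw)
print(len(out), "debiased bits; mean =", out.mean())
```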

  5. Random forests-based differential analysis of gene sets for gene expression data.

    PubMed

    Hsueh, Huey-Miin; Zhou, Da-Wei; Tsai, Chen-An

    2013-04-10

    In DNA microarray studies, gene-set analysis (GSA) has become the focus of gene expression data analysis. GSA utilizes the gene expression profiles of functionally related gene sets in Gene Ontology (GO) categories or a priori defined biological classes to assess the significance of gene sets associated with clinical outcomes or phenotypes. Many statistical approaches have been proposed to determine whether such functionally related gene sets are differentially expressed (enrichment and/or depletion) across phenotypes. However, little attention has been given to the discriminatory power of gene sets and classification of patients. In this study, we propose a method of gene set analysis in which gene sets are used to develop classifications of patients based on the Random Forest (RF) algorithm. The corresponding empirical p-value of an observed out-of-bag (OOB) error rate of the classifier is introduced to identify differentially expressed gene sets using an adequate resampling method. In addition, we discuss the impacts and correlations of genes within each gene set based on the measures of variable importance in the RF algorithm. Significant classifications are reported and visualized together with the underlying gene sets and their contribution to the phenotypes of interest. Numerical studies using both synthesized data and a series of publicly available gene expression data sets are conducted to evaluate the performance of the proposed methods. Compared with other hypothesis testing approaches, our proposed methods are reliable and successful in identifying enriched gene sets and in discovering the contributions of genes within a gene set. The classification results of identified gene sets can provide a valuable alternative to gene set testing to reveal the unknown, biologically relevant classes of samples or patients. In summary, our proposed method allows one to simultaneously assess the discriminatory ability of gene sets and the importance of genes for
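    The OOB-based test statistic is straightforward to sketch: score a gene set by the OOB error of an RF trained on its expression profile, then obtain an empirical p-value by refitting on permuted class labels. The data below are synthetic placeholders:

```python
# OOB error of one gene set plus a permutation-based empirical p-value.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
X = rng.normal(size=(80, 15))        # expression of one gene set (15 genes)
y = rng.integers(0, 2, 80)           # phenotype labels
X[y == 1, :5] += 1.0                 # make 5 genes informative

def oob_error(X, y, seed=0):
    rf = RandomForestClassifier(n_estimators=500, oob_score=True,
                                random_state=seed).fit(X, y)
    return 1.0 - rf.oob_score_

obs = oob_error(X, y)
B = 100
null = [oob_error(X, rng.permutation(y)) for _ in range(B)]
p_value = (1 + sum(e <= obs for e in null)) / (B + 1)
print(f"OOB error = {obs:.3f}, empirical p = {p_value:.3f}")
```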

  6. Evaluation of three satellite-based latent heat flux algorithms over forest ecosystems using eddy covariance data.

    PubMed

    Yao, Yunjun; Zhang, Yuhu; Zhao, Shaohua; Li, Xianglan; Jia, Kun

    2015-06-01

    We have evaluated the performance of three satellite-based latent heat flux (LE) algorithms over forest ecosystems using observed data from 40 flux towers distributed across the world on all continents. These are the revised remote sensing-based Penman-Monteith LE (RRS-PM) algorithm, the modified satellite-based Priestley-Taylor LE (MS-PT) algorithm, and the semi-empirical Penman LE (UMD-SEMI) algorithm. Sensitivity analysis illustrates that the energy and vegetation terms have the highest sensitivity compared with other input variables. The validation results show that the three algorithms demonstrate substantial differences in performance when estimating daily LE variations among five forest ecosystem biomes. Based on the average Nash-Sutcliffe efficiency and root-mean-squared error (RMSE), the MS-PT algorithm has high performance over both deciduous broadleaf forest (DBF) (0.81, 25.4 W/m(2)) and mixed forest (MF) (0.62, 25.3 W/m(2)) sites, the RRS-PM algorithm has high performance over evergreen broadleaf forest (EBF) (0.4, 28.1 W/m(2)) sites, and the UMD-SEMI algorithm has high performance over both deciduous needleleaf forest (DNF) (0.78, 17.1 W/m(2)) and evergreen needleleaf forest (ENF) (0.51, 28.1 W/m(2)) sites. The lower uncertainties in the required forcing data for the MS-PT algorithm, the complicated structure of the RRS-PM algorithm, and the ground-calibrated coefficients of the UMD-SEMI algorithm may explain these differences.
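    For reference, the two skill scores used in this comparison, under their standard definitions (Nash-Sutcliffe efficiency and RMSE); the flux values in the toy check are invented:

```python
# NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2); RMSE as usual.
import numpy as np

def nse(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def rmse(obs, sim):
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return float(np.sqrt(np.mean((obs - sim) ** 2)))

# Toy check with made-up latent heat flux values [W/m^2].
obs = np.array([110.0, 95.0, 130.0, 80.0])
sim = np.array([100.0, 90.0, 140.0, 85.0])
print("NSE:", nse(obs, sim), "RMSE:", rmse(obs, sim))
```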

  7. Inference of biological networks using Bi-directional Random Forest Granger causality.

    PubMed

    Furqan, Mohammad Shaheryar; Siyal, Mohammad Yakoob

    2016-01-01

    The standard ordinary least squares based Granger causality is one of the most widely used methods for detecting causal interactions between time series data. However, the high-dimensional data produced by recent technological developments limit the utility of some existing implementations. In this paper, we propose a technique called Bi-directional Random Forest Granger causality. This technique uses random forest regularization together with the idea of reusing the time series by reversing the time stamps to extract more causal information. We demonstrated the effectiveness of our proposed method by applying it to simulated data and then to two real biological datasets, i.e., fMRI and HeLa cell data. The fMRI data were used to map the brain network involved in deductive reasoning, while the HeLa cell dataset was used to map the gene network involved in cancer.

  8. A U-Statistic-based random Forest approach for genetic association study.

    PubMed

    Li, Ming; Peng, Ruo-Sin; Wei, Changshuai; Lu, Qing

    2012-06-01

    Variations in complex traits are influenced by multiple genetic variants, environmental risk factors, and their interactions. Though substantial progress has been made in identifying single genetic variants associated with complex traits, detecting the gene-gene and gene-environment interactions remains a great challenge. When a large number of genetic variants and environmental risk factors are involved, searching for interactions is limited to pair-wise interactions due to the exponentially increased feature space and computational intensity. Alternatively, recursive partitioning approaches, such as random forests, have gained popularity in high-dimensional genetic association studies. In this article, we propose a U-Statistic-based random forest approach, referred to as Forest U-Test, for genetic association studies with quantitative traits. Through simulation studies, we showed that the Forest U-Test outperformed existing methods. The proposed method was also applied to study Cannabis Dependence (CD), using three independent datasets from the Study of Addiction: Genetics and Environment. A significant joint association was detected with an empirical p-value less than 0.001. The finding was also replicated in two independent datasets with p-values of 5.93e-19 and 4.70e-17, respectively.

  9. An improved label propagation algorithm based on the similarity matrix using random walk

    NASA Astrophysics Data System (ADS)

    Zhang, Xian-Kun; Song, Chen; Jia, Jia; Lu, Zeng-Lei; Zhang, Qian

    2016-05-01

    Community detection based on the label propagation algorithm (LPA) has attracted widespread interest because of its high efficiency. However, it is difficult to guarantee the accuracy of community detection because label spreading in the algorithm is random. In response to this problem, an improved LPA based on random walk (RWLPA) is proposed in this paper. Firstly, a matrix measuring the similarity among the nodes in the network is computed. Secondly, during label propagation, when more than one neighbor label ties for the highest frequency at a node, the label of the neighbor with the highest similarity, rather than the label of a random neighbor, is chosen for the update. This avoids labels propagating randomly among communities. Finally, we test LPA and the improved LPA on benchmark networks and real-world networks. The results show that the quality of the communities discovered by the improved algorithm is higher than that of the traditional algorithm.

  10. Improved progressive TIN densification filtering algorithm for airborne LiDAR data in forested areas

    NASA Astrophysics Data System (ADS)

    Zhao, Xiaoqian; Guo, Qinghua; Su, Yanjun; Xue, Baolin

    2016-07-01

    Filtering of light detection and ranging (LiDAR) data into the ground and non-ground points is a fundamental step in processing raw airborne LiDAR data. This paper proposes an improved progressive triangulated irregular network (TIN) densification (IPTD) filtering algorithm that can cope with a variety of forested landscapes, particularly both topographically and environmentally complex regions. The IPTD filtering algorithm consists of three steps: (1) acquiring potential ground seed points using the morphological method; (2) obtaining accurate ground seed points; and (3) building a TIN-based model and iteratively densifying TIN. The IPTD filtering algorithm was tested in 15 forested sites with various terrains (i.e., elevation and slope) and vegetation conditions (i.e., canopy cover and tree height), and was compared with seven other commonly used filtering algorithms (including morphology-based, slope-based, and interpolation-based filtering algorithms). Results show that the IPTD achieves the highest filtering accuracy for nine of the 15 sites. In general, it outperforms the other filtering algorithms, yielding the lowest average total error of 3.15% and the highest average kappa coefficient of 89.53%.

  11. Differentiation of fat, muscle, and edema in thigh MRIs using random forest classification

    NASA Astrophysics Data System (ADS)

    Kovacs, William; Liu, Chia-Ying; Summers, Ronald M.; Yao, Jianhua

    2016-03-01

    There are many diseases that affect the distribution of muscles, including Duchenne and fascioscapulohumeral dystrophy, among other myopathies. In these cases, it is important to quantify both the muscle and fat volumes to track disease progression. There has also been evidence that abnormal signal intensity on MR images, which often indicates edema or inflammation, can be a good predictor of muscle deterioration. We present a fully automated method that examines magnetic resonance (MR) images of the thigh and identifies fat, muscle, and edema using a random forest classifier. First, the thigh regions are automatically segmented using the T1 sequence. Then, inhomogeneity artifacts are corrected using the N3 technique. The T1 and STIR (short tau inversion recovery) images are then aligned using landmark-based registration with the bone marrow. The normalized T1 and STIR intensity values are used to train the random forest. Once trained, the random forest can accurately classify the aforementioned classes. This method was evaluated on MR images of 9 patients. The precision values are 0.91+/-0.06, 0.98+/-0.01 and 0.50+/-0.29 for muscle, fat, and edema, respectively. The recall values are 0.95+/-0.02, 0.96+/-0.03 and 0.43+/-0.09 for muscle, fat, and edema, respectively. This demonstrates the feasibility of utilizing information from multiple MR sequences for the accurate quantification of fat, muscle and edema.

  12. A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction

    PubMed Central

    Haider, Saad; Rahman, Raziur; Ghosh, Souparno; Pal, Ranadip

    2015-01-01

    Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble-based approaches such as Random Forests have been shown to perform well in both individual sensitivity prediction studies and team-science-based prediction challenges. However, Random Forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignore the relationship between different drug sensitivities during model generation. This application motivates the need for multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in the output response structure between the training data and node samples as the difference in the two empirical copulas. We illustrate that copulas are suitable for capturing the multivariate structure of output responses independent of the marginal distributions and that the copula-based multivariate random forest framework can provide higher-accuracy prediction and improved variable selection. The proposed framework has been validated on the Genomics of Drug Sensitivity in Cancer and the Cancer Cell Line Encyclopedia databases. PMID:26658256
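    The paper's cost criterion compares two empirical copulas; the copula itself is easy to sketch from normalized ranks (pseudo-observations), which factor out the marginal distributions. The bivariate data below are synthetic stand-ins for a pair of correlated drug responses:

```python
# Empirical copula of a multivariate response, computed from rank-based
# pseudo-observations. Only the copula itself is shown, not the node-vs-
# training dissimilarity the authors build on top of it.
import numpy as np
from scipy.stats import rankdata

def pseudo_observations(Y):
    """Column-wise ranks scaled to (0, 1); marginals are factored out."""
    n = Y.shape[0]
    return np.column_stack([rankdata(Y[:, j]) for j in range(Y.shape[1])]) / (n + 1)

def empirical_copula(U, u):
    """C_n(u) = fraction of pseudo-observations componentwise <= u."""
    return np.mean(np.all(U <= u, axis=1))

rng = np.random.default_rng(5)
Y = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=500)
U = pseudo_observations(Y)
print(empirical_copula(U, np.array([0.5, 0.5])))  # ~0.4 under strong positive dependence
```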

  13. A Randomized Gossip Consensus Algorithm on Convex Metric Spaces

    DTIC Science & Technology

    2012-01-01

  14. A Randomized Approximate Nearest Neighbors Algorithm - A Short Version

    DTIC Science & Technology

    2011-01-13

  15. Advances in SCA and RF-DNA Fingerprinting Through Enhanced Linear Regression Attacks and Application of Random Forest Classifiers

    DTIC Science & Technology

    2014-09-18

  16. Effective algorithm for random mask generation used in secured optical data encryption and communication

    NASA Astrophysics Data System (ADS)

    Liu, Yuexin; Metzner, John J.; Guo, Ruyan; Yu, Francis T. S.

    2005-09-01

    An efficient and secure algorithm for random phase mask generation used in optical data encryption and transmission systems is proposed, based on Diffie-Hellman public key distribution. The generated random mask offers higher security because it is never exposed on vulnerable transmission channels. The effectiveness of retrieving the original image and the robustness against blind manipulation are demonstrated by our numerical results. In addition, this algorithm can be easily extended to multicast networking systems, and refreshing the shared random key is also very simple to implement.
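    A toy sketch of the scheme's central property: both parties derive the same mask from a Diffie-Hellman shared secret, so the mask never travels over the channel. The parameters, hash-based key derivation, and uniform-phase mask below are illustrative assumptions, not the paper's exact construction, and the toy modulus is far too small for real security:

```python
# Both parties derive an identical random phase mask from a DH shared secret.
import hashlib
import numpy as np

p, g = 0xFFFFFFFB, 5          # toy prime modulus and generator (NOT secure)
a, b = 123456789, 987654321   # private keys of sender and receiver

A = pow(g, a, p)              # sender -> receiver (public)
B = pow(g, b, p)              # receiver -> sender (public)
secret_sender = pow(B, a, p)
secret_receiver = pow(A, b, p)
assert secret_sender == secret_receiver   # shared secret, never transmitted

# Hash the shared secret into a seed, then expand it into a phase mask.
seed = int.from_bytes(hashlib.sha256(str(secret_sender).encode()).digest()[:8], "big")
mask = np.random.default_rng(seed).uniform(0, 2 * np.pi, size=(64, 64))
print(mask.shape, mask[0, 0])
```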

  17. Predicting local Soil- and Land-units with Random Forest in the Senegalese Sahel

    NASA Astrophysics Data System (ADS)

    Grau, Tobias; Brandt, Martin; Samimi, Cyrus

    2013-04-01

    MODIS (MCD12Q1) or Globcover are often the only available global land-cover products; however, ground-truthing in the Sahel of Senegal has shown that most classes do not agree with the actual land cover, making those products unusable for local applications. We suggest a methodology that models local Wolof land- and soil-types in an area of the Senegalese Ferlo around Linguère at different scales. In a first step, interviews with the local population were conducted to ascertain the local denotation of soil units, as well as their agricultural use and the woody vegetation mainly growing on them. "Ndjor" are soft sand soils with mainly Combretum glutinosum trees. They are suitable for groundnuts and beans, while millet is grown on hard sand soils ("Bardjen") dominated by Balanites aegyptiaca and Acacia tortilis. "Xur" are clayey depressions with a high diversity of tree species. Lateritic pasture sites with dense woody vegetation (mostly Pterocarpus lucens and Guiera senegalensis) have never been used for cropping and are called "All". In a second step, vegetation and soil parameters of 85 plots (~1 ha) were surveyed in the field. 28 soil parameters are clustered into 4 classes using the WARD algorithm; 81% of the plots agree with the local classification. Then, an ordination (NMDS) with 2 dimensions and a stress value of 9.13% was calculated using the 28 soil parameters. It shows several significant relationships between the soil classes and the fitted environmental parameters, which are derived from field data, a digital elevation model, Landsat and RapidEye imagery, as well as TRMM rainfall data. Landsat's band 5 reflectance values (1.55-1.75 µm) of the mean dry-season image (2000-2010) have an R² of 0.42 and constitute the most important of 9 significant variables (5% level). A random forest classifier is then used to extrapolate the 4 classes to the whole study area based on the 9 significant environmental parameters. At a resolution of 30 m the OOB (out-of-bag) error

  18. SNRFCB: sub-network based random forest classifier for predicting chemotherapy benefit on survival for cancer treatment.

    PubMed

    Shi, Mingguang; He, Jianmin

    2016-04-01

    Adjuvant chemotherapy (CTX) should be individualized to provide potential survival benefit and avoid potential harm to cancer patients. Our goal was to establish a computational approach for making personalized estimates of the survival benefit from adjuvant CTX. We developed a Sub-Network based Random Forest classifier for predicting Chemotherapy Benefit (SNRFCB) based on gene expression datasets of lung cancer. The SNRFCB approach was then validated in independent test cohorts for identifying chemotherapy responder cohorts and chemotherapy non-responder cohorts. SNRFCB involved the pre-selection of gene sub-network signatures based on the mutations and on protein-protein interaction data as well as the application of the random forest algorithm to gene expression datasets. Adjuvant CTX was significantly associated with the prolonged overall survival of lung cancer patients in the chemotherapy responder group (P = 0.008), but it was not beneficial to patients in the chemotherapy non-responder group (P = 0.657). Adjuvant CTX was significantly associated with the prolonged overall survival of lung cancer squamous cell carcinoma (SQCC) subtype patients in the chemotherapy responder cohorts (P = 0.024), but it was not beneficial to patients in the chemotherapy non-responder cohorts (P = 0.383). SNRFCB improved prediction performance as compared to the machine learning method, support vector machine (SVM). To test the general applicability of the predictive model, we further applied the SNRFCB approach to human breast cancer datasets and also observed superior performance. SNRFCB could provide recurrent probability for individual patients and identify which patients may benefit from adjuvant CTX in clinical trials.

  19. Comparative analyses between retained introns and constitutively spliced introns in Arabidopsis thaliana using random forest and support vector machine.

    PubMed

    Mao, Rui; Raj Kumar, Praveen Kumar; Guo, Cheng; Zhang, Yang; Liang, Chun

    2014-01-01

    One of the important modes of pre-mRNA post-transcriptional modification is alternative splicing. Alternative splicing allows creation of many distinct mature mRNA transcripts from a single gene by utilizing different splice sites. In plants like Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many past studies focused on the positional distribution of retained introns (RIs) among different genic regions and the regulation of their expression, while little systematic classification of RIs from constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and support vector machine (SVM) with a radial basis kernel function (RBF) to differentiate these two types of introns in Arabidopsis. By comparing coordinates of introns of all annotated mRNAs from TAIR10, we obtained our high-quality experimental data. To distinguish RIs from CSIs, we investigated the unique characteristics of RIs in comparison with CSIs and finally extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach was more accurate in effectively classifying RIs from CSIs in comparison with four other approaches. The optimal penalty parameter C and the RBF kernel parameter γ in SVM were set based on a particle swarm optimization algorithm (PSOSVM). Our classification performance showed an F-measure of 80.8% (random forest) and 77.4% (PSOSVM). Not only were the basic sequence features and positional distribution characteristics of RIs obtained, but putative regulatory motifs in intron splicing were also predicted based on our feature extraction approach. Clearly, our study will facilitate a better understanding of the underlying mechanisms involved in intron retention.

  20. Quantification of the Heterogeneity of Prognostic Cellular Biomarkers in Ewing Sarcoma Using Automated Image and Random Survival Forest Analysis

    PubMed Central

    Yu, Haiyue; Branford White, Harriet; Schäfer, Karl L.; Llombart-Bosch, Antonio; Machado, Isidro; Picci, Piero; Hogendoorn, Pancras C. W.; Athanasou, Nicholas A.; Noble, J. Alison; Hassan, A. Bassim

    2014-01-01

    Driven by genomic somatic variation, tumour tissues are typically heterogeneous, yet unbiased quantitative methods are rarely used to analyse heterogeneity at the protein level. Motivated by this problem, we developed automated image segmentation of images of multiple biomarkers in Ewing sarcoma to generate distributions of biomarkers between and within tumour cells. We further integrate high dimensional data with patient clinical outcomes utilising random survival forest (RSF) machine learning. Using material from cohorts of genetically diagnosed Ewing sarcoma with EWSR1 chromosomal translocations, confocal images of tissue microarrays were segmented with level sets and watershed algorithms. Each cell nucleus and cytoplasm were identified in relation to DAPI and CD99, respectively, and protein biomarkers (e.g. Ki67, pS6, Foxo3a, EGR1, MAPK) localised relative to nuclear and cytoplasmic regions of each cell in order to generate image feature distributions. The image distribution features were analysed with RSF in relation to known overall patient survival from three separate cohorts (185 informative cases). Variation in pre-analytical processing resulted in elimination of a high number of non-informative images that had poor DAPI localisation or biomarker preservation (67 cases, 36%). The distribution of image features for biomarkers in the remaining high quality material (118 cases, 104 features per case) were analysed by RSF with feature selection, and performance assessed using internal cross-validation, rather than a separate validation cohort. A prognostic classifier for Ewing sarcoma with low cross-validation error rates (0.36) was comprised of multiple features, including the Ki67 proliferative marker and a sub-population of cells with low cytoplasmic/nuclear ratio of CD99. Through elimination of bias, the evaluation of high-dimensionality biomarker distribution within cell populations of a tumour using random forest analysis in quality controlled tumour

  1. Data mining in the Life Sciences with Random Forest: a walk in the park or lost in the jungle?

    PubMed

    Touw, Wouter G; Bayjanov, Jumamurat R; Overmars, Lex; Backus, Lennart; Boekhorst, Jos; Wels, Michiel; van Hijum, Sacha A F T

    2013-05-01

    In the Life Sciences, 'omics' data are increasingly generated by different high-throughput technologies. Often only the integration of these data allows uncovering biological insights that can be experimentally validated or mechanistically modelled, i.e. sophisticated computational approaches are required to extract the complex non-linear trends present in omics data. Classification techniques allow training a model based on variables (e.g. SNPs in genetic association studies) to separate different classes (e.g. healthy subjects versus patients). Random Forest (RF) is a versatile classification algorithm suited for the analysis of these large data sets. In the Life Sciences, RF is popular because RF classification models have a high prediction accuracy and provide information on the importance of variables for classification. For omics data, variables or conditional relations between variables are typically important for a subset of samples of the same class. For example, within a class of cancer patients, certain SNP combinations may be important for a subset of patients that have a specific subtype of cancer, but not important for a different subset of patients. These conditional relationships can in principle be uncovered from the data with RF, as they are implicitly taken into account by the algorithm during the creation of the classification model. This review details some of the, to the best of our knowledge, rarely or never used RF properties that allow maximizing the biological insights that can be extracted from complex omics data sets using RF.

  2. Feature integration with random forests for real-time human activity recognition

    NASA Astrophysics Data System (ADS)

    Kataoka, Hirokatsu; Hashimoto, Kiyoshi; Aoki, Yoshimitsu

    2015-02-01

    This paper presents an approach for real-time human activity recognition. Three different kinds of features (flow, shape, and a keypoint-based feature) are applied in activity recognition. We use random forests for feature integration and activity classification: a forest is created for each feature and performs as a weak classifier. The International Classification of Functioning, Disability and Health (ICF) proposed by the WHO is applied in order to set a novel class definition for activity recognition. Experiments on human activity recognition using the proposed framework show 99.2% (Weizmann action dataset), 95.5% (KTH human actions dataset), and 54.6% (UCF50 dataset) recognition accuracy at real-time processing speed. The feature integration and activity-class definition allow us to achieve recognition accuracy that matches the state of the art in real time.

  3. A systematic review of randomized controlled trials on curative and health enhancement effects of forest therapy

    PubMed Central

    Kamioka, Hiroharu; Tsutani, Kiichiro; Mutoh, Yoshiteru; Honda, Takuya; Shiozawa, Nobuyoshi; Okada, Shinpei; Park, Sang-Jun; Kitayuguchi, Jun; Kamada, Masamitsu; Okuizumi, Hiroyasu; Handa, Shuichi

    2012-01-01

    Objective To summarize the evidence for curative and health enhancement effects through forest therapy and to assess the quality of studies based on a review of randomized controlled trials (RCTs). Study design A systematic review based on RCTs. Methods Studies were eligible if they were RCTs. Studies included one treatment group in which forest therapy was applied. The following databases – from 1990 to November 9, 2010 – were searched: MEDLINE via PubMed, CINAHL, Web of Science, and Ichushi-Web. All Cochrane databases and Campbell Systematic Reviews were also searched up to November 9, 2010. Results Two trials met all inclusion criteria. No specific diseases were evaluated, and both studies reported significant effectiveness in one or more outcomes for health enhancement. However, the results of evaluations with the CONSORT (Consolidated Standards of Reporting Trials) 2010 and CLEAR NPT (A Checklist to Evaluate a Report of a Nonpharmacological Trial) checklists generally showed a remarkable lack of description in the studies. Furthermore, there was a problem of heterogeneity, thus a meta-analysis was unable to be performed. Conclusion Because there was insufficient evidence on forest therapy due to poor methodological and reporting quality and heterogeneity of RCTs, it was not possible to offer any conclusions about the effects of this intervention. However, it was possible to identify problems with current RCTs of forest therapy, and to propose a strategy for strengthening study quality and stressing the importance of study feasibility and original check items based on characteristics of forest therapy as a future research agenda. PMID:22888281

  4. The Application of Imperialist Competitive Algorithm for Fuzzy Random Portfolio Selection Problem

    NASA Astrophysics Data System (ADS)

    EhsanHesamSadati, Mir; Bagherzadeh Mohasefi, Jamshid

    2013-10-01

    This paper presents an implementation of the Imperialist Competitive Algorithm (ICA) for solving the fuzzy random portfolio selection problem, where asset returns are represented by fuzzy random variables. Portfolio optimization is an important research field in modern finance. Using the necessity-based model, the fuzzy random variables are reformulated into a linear program, and ICA is designed to find the optimal solution. A numerical example illustrates the implementation of ICA for the fuzzy random portfolio selection problem and shows the efficiency of the proposed method.

  5. Fanning - A classification algorithm for mixture landscapes applied to Landsat data of Maine forests

    NASA Technical Reports Server (NTRS)

    Ungar, S. G.; Bryant, E.

    1981-01-01

    It is pointed out that typical landscapes include a relatively small number of 'pure' land cover types which combine in various proportions to form a myriad of mixture types. Most Landsat classification algorithms used today require a separate user specification for each category, including mixture categories. Attention is given to a simpler approach, which requires the user to specify only the 'pure' types; mixture pixels are then classified on the basis of the proportion of the area covered by each pure type within the pixel. The 'fanning' algorithm quantifies varying proportions of two 'pure' land cover types in selected mixture pixels. This algorithm was applied to 200,000 ha of forest land in Maine and compared with standard inventory information. Results compared well with a discrete-category classification of the same area.
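    The mixture idea can be sketched for two 'pure' types: model a pixel spectrum p as α·s1 + (1-α)·s2 and solve for the proportion α in closed form by least squares. The four-band signatures below are invented placeholders, not values from the paper:

```python
# Least-squares proportion of two "pure" cover types in a mixture pixel.
import numpy as np

def mixture_proportion(p, s1, s2):
    """Least-squares alpha in p ~ alpha*s1 + (1-alpha)*s2, clipped to [0, 1]."""
    d = s1 - s2
    alpha = np.dot(p - s2, d) / np.dot(d, d)
    return float(np.clip(alpha, 0.0, 1.0))

s_forest = np.array([20.0, 30.0, 25.0, 90.0])    # hypothetical 4-band signature
s_clearcut = np.array([40.0, 55.0, 60.0, 50.0])  # hypothetical 4-band signature
pixel = 0.7 * s_forest + 0.3 * s_clearcut        # 70/30 mixture

print(mixture_proportion(pixel, s_forest, s_clearcut))  # ~0.7
```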

  6. Automated segmentation of dental CBCT image with prior-guided sequential random forests

    SciTech Connect

    Wang, Li; Gao, Yaozong; Shi, Feng; Li, Gang; Chen, Ken-Chung; Tang, Zhen; Xia, James J.; Shen, Dinggang

    2016-01-15

    Purpose: Cone-beam computed tomography (CBCT) is an increasingly utilized imaging modality for the diagnosis and treatment planning of patients with craniomaxillofacial (CMF) deformities. Accurate segmentation of the CBCT image is an essential step in generating 3D models for the diagnosis and treatment planning of patients with CMF deformities. However, due to image artifacts caused by beam hardening, imaging noise, inhomogeneity, truncation, and maximal intercuspation, it is difficult to segment CBCT images. Methods: In this paper, the authors present a new automatic segmentation method to address these problems. Specifically, the authors first employ a majority voting method to estimate the initial segmentation probability maps of both mandible and maxilla based on multiple aligned expert-segmented CBCT images. These probability maps provide an important prior guidance for CBCT segmentation. The authors then extract both appearance features from the CBCTs and context features from the initial probability maps to train the first layer of the random forest classifier, which can select discriminative features for segmentation. Based on this trained first-layer classifier, the probability maps are updated and employed to train the next layer of the random forest classifier. By iteratively training subsequent random forest classifiers using both the original CBCT features and the updated segmentation probability maps, a sequence of classifiers can be derived for accurate segmentation of CBCT images. Results: Segmentation results on CBCTs of 30 subjects were both quantitatively and qualitatively validated based on manually labeled ground truth. The average Dice ratios of mandible and maxilla by the authors' method were 0.94 and 0.91, respectively, which are significantly better than the state-of-the-art method based on sparse representation (p-value < 0.001). Conclusions: The authors have developed and validated a novel fully automated method
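    A minimal two-layer sketch of this sequential (auto-context) scheme: a first forest is trained on appearance features, its class-probability maps are appended as context features, and a second forest is trained on the augmented input. The data are generic placeholders, and out-of-fold probabilities stand in for the atlas-based probability maps:

```python
# Two-layer sequential random forest: appearance features, then
# appearance + probability-map context features.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

rf1 = RandomForestClassifier(n_estimators=200, random_state=0)
# Out-of-fold probability "maps" avoid leaking training labels into layer 2.
proba = cross_val_predict(rf1, X, y, cv=5, method="predict_proba")

X2 = np.hstack([X, proba])                 # appearance + context features
rf2 = RandomForestClassifier(n_estimators=200, random_state=0).fit(X2, y)
print("layer-2 training accuracy:", rf2.score(X2, y))
```

Real use would iterate this update several times, as the abstract describes, rather than stopping at two layers.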

  7. Characterizing channel change along a multithread gravel-bed river using random forest image classification

    NASA Astrophysics Data System (ADS)

    Overstreet, B. T.; Legleiter, C. J.

    2012-12-01

    The Snake River in Grand Teton National Park is a dam-regulated but highly dynamic gravel-bed river that alternates between a single thread and a multithread planform. Identifying key drivers of channel change on this river could improve our understanding of 1) how flow regulation at Jackson Lake Dam has altered the character of the river over time; 2) how changes in the distribution of various types of vegetation impacts river dynamics; and 3) how the Snake River will respond to future human and climate driven disturbances. Despite the importance of monitoring planform changes over time, automated channel extraction and understanding the physical drivers contributing to channel change continue to be challenging yet critical steps in the remote sensing of riverine environments. In this study we use the random forest statistical technique to first classify land cover within the Snake River corridor and then extract channel features from a sequence of high-resolution multispectral images of the Snake River spanning the period from 2006 to 2012, which encompasses both exceptionally dry years and near-record runoff in 2011. We show that the random forest technique can be used to classify images with as few as four spectral bands with far greater accuracy than traditional single-tree classification approaches. Secondly, we couple random forest derived land cover maps with LiDAR derived topography, bathymetry, and canopy height to explore physical drivers contributing to observed channel changes on the Snake River. In conclusion we show that the random forest technique is a powerful tool for classifying multispectral images of rivers. Moreover, we hypothesize that with sufficient data for calculating spatially distributed metrics of channel form and more frequent channel monitoring, this tool can also be used to identify areas with high probabilities of channel change. Land cover maps of a portion of the Snake River produced from digital aerial photography from 2010 and

  8. Identification of a potential fibromyalgia diagnosis using random forest modeling applied to electronic medical records

    PubMed Central

    Emir, Birol; Masters, Elizabeth T; Mardekian, Jack; Clair, Andrew; Kuhn, Max; Silverman, Stuart L

    2015-01-01

    Background Diagnosis of fibromyalgia (FM), a chronic musculoskeletal condition characterized by widespread pain and a constellation of symptoms, remains challenging and is often delayed. Methods Random forest modeling of electronic medical records was used to identify variables that may facilitate earlier FM identification and diagnosis. Subjects aged ≥18 years with two or more listings of the International Classification of Diseases, Ninth Revision (ICD-9) code for FM (ICD-9 729.1) ≥30 days apart during the 2012 calendar year were defined as cases among subjects associated with an integrated delivery network and who had one or more health care provider encounter in the Humedica database in calendar years 2011 and 2012. Controls were without the FM ICD-9 codes. Seventy-two demographic, clinical, and health care resource utilization variables were entered into a random forest model with downsampling to account for cohort imbalances (<1% of subjects had FM). The importance of the top ten variables was ranked by normalizing to 100% the variable whose omission from the model caused the largest loss in predictive performance. Since random forest is a complex prediction method, a set of simple rules was derived to help understand what factors drive individual predictions. Results The ten variables identified by the model were: number of visits where laboratory/non-imaging diagnostic tests were ordered; number of outpatient visits excluding office visits; age; number of office visits; number of opioid prescriptions; number of medications prescribed; number of pain medications excluding opioids; number of medications administered/ordered; number of emergency room visits; and number of musculoskeletal conditions. A receiver operating characteristic curve confirmed the model's predictive accuracy using an independent test set (area under the curve, 0.810). To enhance interpretability, nine rules were developed that could be used with good predictive probability of
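    The downsampling step can be sketched directly: with roughly 1% cases, the controls are subsampled to match the cases before fitting the forest. All variables below are synthetic placeholders, not the Humedica fields used in the study:

```python
# Downsampled random forest for a heavily imbalanced cohort (~1% cases).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
n = 20000
X = rng.normal(size=(n, 10))               # placeholder EMR-derived variables
y = (rng.random(n) < 0.01).astype(int)     # ~1% cases, like the FM cohort
X[y == 1] += 0.5                           # give cases some signal

cases = np.where(y == 1)[0]
controls = rng.choice(np.where(y == 0)[0], size=len(cases), replace=False)
idx = np.concatenate([cases, controls])    # balanced training sample

rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(X[idx], y[idx])
print("predicted case rate on full cohort:", rf.predict(X).mean())
```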

  9. An efficient algorithm for generating random number pairs drawn from a bivariate normal distribution

    NASA Technical Reports Server (NTRS)

    Campbell, C. W.

    1983-01-01

    An efficient algorithm for generating random number pairs from a bivariate normal distribution was developed. Any desired values of the two means, two standard deviations, and correlation coefficient can be selected. Theoretically the technique is exact, and in practice its accuracy is limited only by the quality of the uniform-distribution random number generator, inaccuracies in computer function evaluation, and arithmetic. A FORTRAN routine was written to check the algorithm, and good accuracy was obtained. Some small errors in the correlation coefficient were observed to vary in a surprisingly regular manner. A simple model was developed which explained the qualitative aspects of the errors.
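    One standard construction consistent with the abstract: draw two independent standard normals and combine them so the pair has the desired means, standard deviations, and correlation coefficient. A NumPy sketch (the original routine was FORTRAN):

```python
# Correlated bivariate normal pairs from two independent standard normals.
import numpy as np

def bivariate_normal_pair(m1, m2, s1, s2, rho, rng):
    z1, z2 = rng.standard_normal(2)
    x = m1 + s1 * z1
    y = m2 + s2 * (rho * z1 + np.sqrt(1.0 - rho ** 2) * z2)
    return x, y

rng = np.random.default_rng(7)
pairs = np.array([bivariate_normal_pair(1.0, -2.0, 2.0, 0.5, 0.7, rng)
                  for _ in range(100000)])
print("sample correlation:", np.corrcoef(pairs.T)[0, 1])   # ~0.7
```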

  10. Multisite updating Markov chain Monte Carlo algorithm for morphologically constrained Gibbs random fields

    NASA Astrophysics Data System (ADS)

    Sivakumar, Krishnamoorthy; Goutsias, John I.

    1998-09-01

    We study the problem of simulating a class of Gibbs random field models, called morphologically constrained Gibbs random fields, using Markov chain Monte Carlo sampling techniques. Traditional single-site updating Markov chain Monte Carlo sampling algorithms, like the Metropolis algorithm, tend to converge extremely slowly when used to simulate these models, particularly at low temperatures and for constraints involving large geometrical shapes. Moreover, the morphologically constrained Gibbs random fields are not, in general, Markov. Hence, a Markov chain Monte Carlo sampling algorithm based on the Gibbs sampler is not possible. We propose a variant of the Metropolis algorithm that, at each iteration, allows multi-site updating and converges substantially faster than the traditional single-site updating algorithm. The set of sites that are updated at a particular iteration is specified in terms of a shape parameter and a size parameter. Computation of the acceptance probability involves a 'test ratio,' which requires computation of the ratio of the probabilities of the current and new realizations. Because of the special structure of our energy function, this computation can be done by means of a simple, local iterative procedure; therefore, the lack of Markovianity does not impose any additional computational burden for model simulation. The proposed algorithm has been used to simulate a number of image texture models, both synthetic and natural.
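    A generic illustration of multi-site (block) Metropolis updating, using a nearest-neighbour Ising model as a stand-in for the morphologically constrained fields: a whole square block of spins is proposed for flipping at once and accepted with probability min(1, exp(-ΔE/T)):

```python
# Block-update Metropolis on a toy Ising model (not the paper's model).
import numpy as np

rng = np.random.default_rng(8)
L, T, block = 32, 2.0, 4                        # lattice size, temperature, block size
spins = rng.choice([-1, 1], size=(L, L))

def energy(s):
    # Nearest-neighbour Ising energy with periodic boundaries.
    return -np.sum(s * np.roll(s, 1, axis=0)) - np.sum(s * np.roll(s, 1, axis=1))

for _ in range(5000):
    i, j = rng.integers(0, L - block, size=2)   # block position (size parameter = block)
    e_old = energy(spins)
    spins[i:i + block, j:j + block] *= -1       # propose flipping the whole block
    dE = energy(spins) - e_old
    if rng.random() >= np.exp(min(0.0, -dE / T)):
        spins[i:i + block, j:j + block] *= -1   # reject: undo the flip

print("mean magnetization:", spins.mean())
```

Recomputing the full energy each step keeps the sketch short; a practical implementation would evaluate only the local energy change along the block boundary.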

  11. A Random Forest-Induced Distance-Based Measure of Physiological Dysregulation.

    PubMed

    Bello, Ghalib Ayodeji; Dumancas, Gerard

    2017-01-17

    Aging involves gradual, multisystemic physiological dysregulation and over time, this degenerative process increases an individual's risk for multiple age-related comorbidities. The ability to quantify age-related physiological dysregulation can provide key insights into the biological mechanisms underlying the aging process and facilitate the development of clinical interventions. Recent studies have introduced and validated a measure of physiological dysregulation based on statistical distance. This measure quantifies the extent of physiological dysregulation in an individual by measuring how much the individual's biomarker profile deviates from the expected average. The measurement is done by conceptualizing an individual's biomarker profile as a point in multidimensional space, and computing the Mahalanobis distance between this point and a population-based norm. Higher distances imply a greater degree of physiological dysregulation, i.e. increased divergence from normal, healthy functioning. The biomarkers used for the computation are typically clinical markers of physiological function, for example, cholesterol levels and blood glucose. Major shortcomings of this Mahalanobis distance-based approach are the incorrect assumption of multivariate normality, and identical weighting of biomarkers. In this study, we introduce a nonparametric approach that requires no distributional assumptions. This approach utilizes Random Survival Forests and produces a distance measure that exhibits better performance than the standard approach based on Mahalanobis distance. We find that our Random Forest-induced distance metric substantially outperforms the standard measure in predicting mortality, health status and biological age, which suggests it is a more accurate tool for characterizing and quantifying age-related physiological dysregulation.
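    The baseline the paper improves on is easy to state in code: the Mahalanobis distance of an individual's biomarker profile from a population norm. The biomarker values below are synthetic placeholders:

```python
# Mahalanobis-distance measure of physiological dysregulation.
import numpy as np

rng = np.random.default_rng(9)
population = rng.normal(size=(1000, 6))        # reference biomarker matrix
mu = population.mean(axis=0)
cov_inv = np.linalg.inv(np.cov(population, rowvar=False))

def dysregulation(x):
    """sqrt((x - mu)^T Sigma^-1 (x - mu)): divergence from the population norm."""
    d = x - mu
    return float(np.sqrt(d @ cov_inv @ d))

healthy = mu.copy()
deviant = mu + 3.0                             # profile far from the norm
print(dysregulation(healthy), dysregulation(deviant))
```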

  12. Performance evaluation of random forest and support vector regressions in natural hazard change detection

    NASA Astrophysics Data System (ADS)

    Eisavi, Vahid; Homayouni, Saeid

    2016-10-01

    Information on land use and land cover changes is considered a foremost requirement for monitoring environmental change. Developing change detection methodology in the remote sensing community is an active research topic. However, to the best of our knowledge, no research has been conducted so far on the application of random forest regression (RFR) and support vector regression (SVR) for natural hazard change detection from high-resolution optical remote sensing observations. Hence, the objective of this study is to examine the use of RFR and SVR to discriminate between changed and unchanged areas after a tsunami. For this study, RFR and SVR were applied to two different pilot coastlines in Indonesia and Japan. Two different remotely sensed data sets acquired by the Quickbird and Ikonos sensors were used for efficient evaluation of the proposed methodology. The results demonstrated better performance of SVM compared to random forest (RF), with an overall accuracy higher by 3% to 4% and a kappa coefficient higher by 0.05 to 0.07. Using McNemar's test, statistically significant differences (Z ≥ 1.96), at the 5% significance level, between the confusion matrices of the RF classifier and the support vector classifier were observed in both study areas. The high accuracy of change detection obtained in this study confirms that these methods have the potential to be used for detecting changes due to natural hazards.
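
    The McNemar comparison reported above reduces to a simple computation on the discordant counts of the two classifiers. A minimal sketch, with placeholder counts rather than the study's data:

        # McNemar's test for comparing two classifiers on the same samples.
        import numpy as np

        # f12: samples classifier 1 gets right and classifier 2 gets wrong;
        # f21: the reverse. Only these discordant pairs enter the test.
        f12, f21 = 132, 85  # illustrative counts

        z = (f12 - f21) / np.sqrt(f12 + f21)
        print(abs(z) >= 1.96)  # True: significant difference at the 5% level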

  13. A Novel Hepatocellular Carcinoma Image Classification Method Based on Voting Ranking Random Forests

    PubMed Central

    Xia, Bingbing; Jiang, Huiyan; Liu, Huiling; Yi, Dehui

    2016-01-01

    This paper proposed a novel voting ranking random forests (VRRF) method for solving the hepatocellular carcinoma (HCC) image classification problem. Firstly, in the preprocessing stage, this paper used bilateral filtering for hematoxylin-eosin (HE) pathological images. Next, this paper segmented the bilateral-filtered image and obtained three different kinds of images: the single binary cell image, the single minimum exterior rectangle cell image, and the single cell image of size n×n. After that, this paper defined atypia features, which include auxiliary circularity, amendment circularity, and cell symmetry. Besides, this paper extracted some shape features, fractal dimension features, and several gray features such as the Local Binary Patterns (LBP) feature, the Gray Level Co-occurrence Matrix (GLCM) feature, and Tamura features. Finally, this paper proposed an HCC image classification model based on random forests and further optimized the model by the voting ranking method. The experimental results showed that the proposed features combined with the VRRF method perform well in the HCC image classification problem. PMID:27293477

  14. A Novel Hepatocellular Carcinoma Image Classification Method Based on Voting Ranking Random Forests.

    PubMed

    Xia, Bingbing; Jiang, Huiyan; Liu, Huiling; Yi, Dehui

    2015-01-01

    This paper proposed a novel voting ranking random forests (VRRF) method for solving the hepatocellular carcinoma (HCC) image classification problem. Firstly, in the preprocessing stage, this paper used bilateral filtering for hematoxylin-eosin (HE) pathological images. Next, this paper segmented the bilateral-filtered image and obtained three different kinds of images: the single binary cell image, the single minimum exterior rectangle cell image, and the single cell image of size n×n. After that, this paper defined atypia features, which include auxiliary circularity, amendment circularity, and cell symmetry. Besides, this paper extracted some shape features, fractal dimension features, and several gray features such as the Local Binary Patterns (LBP) feature, the Gray Level Co-occurrence Matrix (GLCM) feature, and Tamura features. Finally, this paper proposed an HCC image classification model based on random forests and further optimized the model by the voting ranking method. The experimental results showed that the proposed features combined with the VRRF method perform well in the HCC image classification problem.

  15. Examining predictors of chemical toxicity in freshwater fish using the random forest technique.

    PubMed

    Tuulaikhuu, Baigal-Amar; Guasch, Helena; García-Berthou, Emili

    2017-03-03

    Chemical pollution is one of the main issues globally threatening the enormous biodiversity of freshwater ecosystems. The toxicity of substances depends on many factors such as the chemical itself, the species affected, environmental conditions, exposure duration, and concentration. We used the random forest technique to examine the factors that mediate toxicity in a set of widespread fishes and analyses of covariance to further assess the importance of differential sensitivity among fish species. Among 13 variables, the 5 most important predictors of toxicity with random forests were, by order of importance, the chemical substance itself (i.e., Chemical Abstracts Service number considered as a categorical factor), octanol-water partition coefficient (log P), pollutant prioritization, ecological structure-activity relationship (ECOSAR) classification, and fish species for 50% lethal concentrations (LC50) and the chemical substance, fish species, log P, ECOSAR classification, and water temperature for no observed effect concentrations (NOECs). Fish species was a very important predictor for both endpoints and with the two contrasting statistical techniques used. Different fish species displayed very different relationships with log P, often with different slopes and with as much importance as the partition coefficient. Therefore, caution should be exercised when extrapolating toxicological results or relationships among species. In addition, further research is needed to determine species-specific sensitivities and unravel the mechanisms behind them.
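
    The predictor ranking described above comes from random-forest variable importance, which can be sketched as follows with scikit-learn. The data and the toy relationship are invented stand-ins for the paper's toxicity database.

        # Ranking toxicity predictors by random-forest variable importance.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(2)
        n = 300
        X = np.column_stack([
            rng.normal(size=n),     # log P (octanol-water partition coefficient)
            rng.integers(0, 5, n),  # ECOSAR class (integer-encoded)
            rng.integers(0, 8, n),  # fish species (integer-encoded)
            rng.normal(20, 5, n),   # water temperature
        ])
        y = 2.0 * X[:, 0] + 0.5 * X[:, 2] + rng.normal(size=n)  # synthetic log LC50

        rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
        for name, imp in zip(["logP", "ECOSAR", "species", "temperature"],
                             rf.feature_importances_):
            print(f"{name}: {imp:.3f}")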

  16. Disulfide Connectivity Prediction Based on Modelled Protein 3D Structural Information and Random Forest Regression.

    PubMed

    Yu, Dong-Jun; Li, Yang; Hu, Jun; Yang, Xibei; Yang, Jing-Yu; Shen, Hong-Bin

    2015-01-01

    Disulfide connectivity is an important protein structural characteristic. Accurately predicting disulfide connectivity solely from protein sequence helps to improve the intrinsic understanding of protein structure and function, especially in the post-genome era, where large volumes of sequenced proteins without functional annotation are accumulating rapidly. In this study, a new feature extracted from predicted protein 3D structural information is proposed and integrated with traditional features to form discriminative features. Based on the extracted features, a random forest regression model is trained to predict protein disulfide connectivity. We compare the proposed method with popular existing predictors by performing both cross-validation and independent validation tests on benchmark datasets. The experimental results demonstrate the superiority of the proposed method over existing predictors. We believe the superiority of the proposed method benefits from both the good discriminative capability of the newly developed features and the powerful modelling capability of the random forest. The web server implementation, called TargetDisulfide, and the benchmark datasets are freely available at http://csbio.njust.edu.cn/bioinf/TargetDisulfide for academic use.

  17. [Tree species information extraction of farmland returned to forests based on improved support vector machine algorithm].

    PubMed

    Wu, Jian; Peng, Dao-Li

    2011-04-01

    Analyzing spectral differences among tree species and improving the classification algorithm are the main difficulties in extracting tree species information from remote sensing images, and are also the keys to improving accuracy in tree species information extraction in areas of farmland returned to forests. TM images were selected for this study, and the spectral indexes that could distinguish tree species information were selected by analyzing tree species spectra. Afterwards, tree species information was extracted using an improved support vector machine algorithm. Although errors and confusion exist, this method shows satisfactory results, with an overall accuracy of 81.7%; the corresponding result for the traditional method is 72.5%. The method in this paper achieves more precise extraction of tree species information, and the results can meet the demands of accurate monitoring and decision-making. This method is significant for the rapid assessment of project quality.

  18. Can't See the Forest: Using an Evolutionary Algorithm to Produce an Animated Artwork

    NASA Astrophysics Data System (ADS)

    Trist, Karen; Ciesielski, Vic; Barile, Perry

    We describe an artist's journey of working with an evolutionary algorithm to create an artwork suitable for exhibition in a gallery. Software based on the evolutionary algorithm produces animations which engage the viewer with a target image slowly emerging from a random collection of greyscale lines. The artwork consists of a grid of movies of eucalyptus tree targets. Each movie resolves with different aesthetic qualities, tempo and energy. The artist exercises creative control by choice of target and values for evolutionary and drawing parameters.

  19. Development of a Satellite-based evapotranspiration algorithm: A case study for Two Deciduous Forest Sites

    NASA Astrophysics Data System (ADS)

    Elmasri, B.; Rahman, A. F.

    2011-12-01

    We introduce a new methodology to estimate 8-day average daily evapotranspiration (ET) using both routinely available data and the Penman-Monteith (P-M) equation. Our algorithm considers the environmental constraints on surface resistance and ET by (1) including vapor pressure deficit (VPD), incoming solar radiation, soil moisture, and temperature constraints on stomatal conductance; (2) using leaf area index (LAI) to scale from leaf to canopy conductance; and (3) calculating canopy resistance as a function of environmental variables such as net radiation, a precipitation index, and VPD. Remote sensing data from the Moderate Resolution Imaging Spectroradiometer (MODIS) and the Advanced Microwave Scanning Radiometer-EOS (AMSR-E) were used to estimate ET by using MODIS land surface temperature (LST) to estimate VPD, AMSR-E soil moisture to estimate canopy conductance, and MODIS surface emissivity and albedo to estimate shortwave and net radiation. The algorithm was evaluated using ET observations from two AmeriFlux eddy covariance flux towers located at the Morgan Monroe State Forest (MMSF) in Indiana and the Harvard Forest (HarvF) in Massachusetts for the period 2003-2008. ET estimates from our algorithm were compared to the flux observations. Results indicated a root mean square error (RMSE) of the 8-day average ET of 0.57 mm for the HarvF and 0.47 mm for the MMSF. A significant correlation was found between the estimated and observed 8-day average ET, with r2 of 0.84 for the HarvF and 0.88 for the MMSF. Using tower meteorological data, the r2 slightly increased to 0.90 for the MMSF. The algorithms for VPD and radiation were tested against flux observations and showed strong correlations, with r2 ranging from 0.68 to 0.82. Sensitivity analysis revealed that the modeled ET predictions are highly sensitive to changes in the canopy resistance values, so accurate estimates of canopy resistance are essential for improving ET predictions. Our algorithm

  20. A large scale microwave emission model for forests. Contribution to the SMOS algorithm

    NASA Astrophysics Data System (ADS)

    Rahmoune, R.; Della Vecchia, A.; Ferrazzoli, P.; Guerriero, L.; Martin-Porqueras, F.

    2009-04-01

    are being considered. Also the effects of temperature gradients within the crown canopy are being considered. The model was tested against radiometric measurements carried out by towers and aircraft. A new test has been done using the brightness temperatures measured over some forests in Finland by the AMIRAS radiometer, which is an airborne demonstrator of the MIRAS imaging radiometer to be launched with SMOS. The outputs produced by the model are used to fit the parameters of the simple radiative transfer model which will be used in the Level 2 soil moisture retrieval algorithm. It is planned to compare model outputs with L1C data, which will be made available during the commissioning phase. To this end, a number of adequate extended forest sites are being selected: the Amazon rain forest, the Zaire Basin, the Argentinian Chaco forest, and the forests of Finland. 2. PARAMETRIC STUDIES In this paper, results of parametric simulations are shown. The emissivity at vertical and horizontal polarization is simulated as a function of soil moisture content for various conditions of forest cover. Seasonal effects are considered, and the values of Leaf Area Index in winter and summer are taken as basic inputs. The difference between the two values is attributed partially to arboreous foliage and partially to understory, while the woody biomass is assumed to be constant in time. Results indicate that seasonal effects are limited, but not negligible. The simulations are repeated for different distributions of trunk diameters. If the distribution is centered on lower diameter values, the forest is optically thicker for a given biomass. Also the variations of brightness temperature due to a temperature gradient within the crown canopy have been estimated. The outputs are used to predict the values of a simple first order RT model. 3. COMPARISONS WITH EXPERIMENTAL DATA Results of previous comparisons between model simulations and experimental data are summarized. Experimental

  1. Discrimination of tree species using random forests from the Chinese high-resolution remote sensing satellite GF-1

    NASA Astrophysics Data System (ADS)

    Lv, Jie; Ma, Ting

    2016-10-01

    Tree species distribution is an important issue for sustainable forest resource management. However, the accuracy of tree species discrimination using remote-sensing data needs to be improved to support operational forestry-monitoring tasks. This study aimed to classify tree species in the Liangshui Nature Reserve of Heilongjiang Province, China using spectral and structural remote sensing information in an automated Random Forest modelling approach. It evaluates and compares the performance of two machine learning classifiers, random forests (RF) and support vector machines (SVM), in classifying images from the Chinese high-resolution remote sensing satellite GF-1. Texture features were extracted from the GF-1 image with the grey-level co-occurrence matrix method, and the Normalized Difference Vegetation Index (NDVI), Ratio Vegetation Index (RVI), Enhanced Vegetation Index (EVI), and Difference Vegetation Index (DVI) were calculated and coupled into the model. The results show that the Random Forest model yielded the highest classification accuracy and prediction success for the tree species, with an overall classification accuracy of 81.07% and a Kappa coefficient of 0.77. The proposed random forests method achieved highly satisfactory tree species discrimination results. Aerial LiDAR data should be further explored in future research.

  2. Muscle forces during running predicted by gradient-based and random search static optimisation algorithms.

    PubMed

    Miller, Ross H; Gillette, Jason C; Derrick, Timothy R; Caldwell, Graham E

    2009-04-01

    Muscle forces during locomotion are often predicted using static optimisation and sequential quadratic programming (SQP). SQP has been criticised for over-estimating force magnitudes and under-estimating co-contraction. These problems may be related to SQP's difficulty in locating the global minimum of complex optimisation problems. Algorithms designed to locate the global minimum may be useful in addressing these problems. Muscle forces for 18 flexors and extensors of the lower extremity were predicted for 10 subjects during the stance phase of running. Static optimisation using SQP and two random search (RS) algorithms (a genetic algorithm and simulated annealing) estimated muscle forces by minimising the sum of cubed muscle stresses. The RS algorithms predicted smaller peak forces (42% smaller on average) and smaller muscle impulses (46% smaller on average) than SQP, and located solutions with smaller cost function scores. Results suggest that RS may be a more effective tool than SQP for minimising the sum of cubed muscle stresses in static optimisation.
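
    The optimisation problem itself is compact: minimise the sum of cubed muscle stresses subject to the muscle forces reproducing the joint moment. The sketch below uses SciPy's SLSQP as the SQP-style solver and dual annealing as a random-search stand-in; the moment arms, cross-sections, and bounds are invented for illustration.

        # Static optimisation: sum of cubed muscle stresses, one joint moment.
        import numpy as np
        from scipy.optimize import minimize, dual_annealing

        r = np.array([0.05, 0.04, 0.03])     # moment arms (m), assumed
        pcsa = np.array([30.0, 20.0, 15.0])  # cross-sectional areas (cm^2), assumed
        M = 60.0                             # required joint moment (N*m)
        bounds = [(0.0, 3000.0)] * 3         # force bounds (N)

        cost = lambda f: np.sum((f / pcsa) ** 3)

        # Gradient-based (SQP-style) solution with an equality constraint.
        sqp = minimize(cost, x0=[100.0] * 3, method="SLSQP", bounds=bounds,
                       constraints={"type": "eq", "fun": lambda f: r @ f - M})

        # Random-search solution; the constraint becomes a quadratic penalty.
        penalised = lambda f: cost(f) + 1e4 * (r @ f - M) ** 2
        rs = dual_annealing(penalised, bounds=bounds, seed=0)

        print(sqp.x, rs.x)  # compare the two predicted force distributions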

  3. Biased Random-Key Genetic Algorithms for the Winner Determination Problem in Combinatorial Auctions.

    PubMed

    de Andrade, Carlos Eduardo; Toso, Rodrigo Franco; Resende, Mauricio G C; Miyazawa, Flávio Keidi

    2015-01-01

    In this paper we address the problem of picking a subset of bids in a general combinatorial auction so as to maximize the overall profit using the first-price model. This winner determination problem assumes that a single bidding round is held to determine both the winners and the prices to be paid. We introduce six variants of biased random-key genetic algorithms for this problem. Three of them use a novel initialization technique that makes use of solutions of intermediate linear programming relaxations of an exact mixed integer linear programming model as initial chromosomes of the population. An experimental evaluation compares the effectiveness of the proposed algorithms with the standard mixed integer linear programming formulation, a specialized exact algorithm, and the best-performing heuristics proposed for this problem. The proposed algorithms are competitive and offer strong results, mainly for large-scale auctions.
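
    The defining ingredient of a biased random-key genetic algorithm is its decoder, which turns a vector of keys in [0, 1) into a feasible solution. A minimal sketch for winner determination, with toy bids and the evolutionary loop itself omitted:

        # Random-key decoding for the winner determination problem.
        import numpy as np

        # Each bid: (set of items requested, price offered); toy data.
        bids = [({"a", "b"}, 10.0), ({"b", "c"}, 8.0), ({"c"}, 5.0), ({"a", "d"}, 7.0)]

        def decode(keys):
            taken, profit = set(), 0.0
            for i in np.argsort(keys):   # lower key = higher priority
                items, price = bids[i]
                if not (items & taken):  # accept only item-disjoint bids
                    taken |= items
                    profit += price
            return profit

        rng = np.random.default_rng(3)
        population = rng.random((50, len(bids)))  # 50 random-key chromosomes
        print(max(decode(k) for k in population))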

  4. A hybrid flower pollination algorithm based modified randomized location for multi-threshold medical image segmentation.

    PubMed

    Wang, Rui; Zhou, Yongquan; Zhao, Chengyan; Wu, Haizhou

    2015-01-01

    Multi-threshold image segmentation is a powerful image processing technique that is used for the preprocessing of pattern recognition and computer vision. However, traditional multilevel thresholding methods are computationally expensive because they involve exhaustively searching the optimal thresholds to optimize the objective functions. To overcome this drawback, this paper proposes a flower pollination algorithm with a randomized location modification. The proposed algorithm is used to find optimal threshold values for maximizing Otsu's objective functions with regard to eight medical grayscale images. When benchmarked against other state-of-the-art evolutionary algorithms, the new algorithm proves itself to be robust and effective through numerical experimental results including Otsu's objective values and standard deviations.
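
    The objective being maximised here is Otsu's between-class variance over the classes induced by the thresholds. The sketch below implements that objective, with a naive random perturbation search standing in for the flower pollination metaheuristic, on a synthetic image.

        # Multi-threshold Otsu objective plus a crude random search.
        import numpy as np

        rng = np.random.default_rng(4)
        image = rng.integers(0, 256, size=(128, 128))  # synthetic grayscale image
        hist = np.bincount(image.ravel(), minlength=256) / image.size
        levels = np.arange(256)

        def otsu_objective(thresholds):
            # Between-class variance of the classes cut out by the thresholds.
            edges = [0, *sorted(int(t) for t in thresholds), 256]
            mu_total = hist @ levels
            score = 0.0
            for lo, hi in zip(edges[:-1], edges[1:]):
                w = hist[lo:hi].sum()
                if w > 0:
                    mu = (hist[lo:hi] @ levels[lo:hi]) / w
                    score += w * (mu - mu_total) ** 2
            return score

        best = rng.integers(1, 255, size=3)  # three thresholds
        for _ in range(2000):
            cand = np.clip(best + rng.integers(-10, 11, size=3), 1, 254)
            if otsu_objective(cand) > otsu_objective(best):
                best = cand
        print(sorted(best))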

  5. Numerical Demultiplexing of Color Image Sensor Measurements via Non-linear Random Forest Modeling

    PubMed Central

    Deglint, Jason; Kazemzadeh, Farnoud; Cho, Daniel; Clausi, David A.; Wong, Alexander

    2016-01-01

    The simultaneous capture of imaging data at multiple wavelengths across the electromagnetic spectrum is highly challenging, requiring complex and costly multispectral image devices. In this study, we investigate the feasibility of simultaneous multispectral imaging using conventional image sensors with color filter arrays via a novel comprehensive framework for numerical demultiplexing of the color image sensor measurements. A numerical forward model characterizing the formation of sensor measurements from light spectra hitting the sensor is constructed based on a comprehensive spectral characterization of the sensor. A numerical demultiplexer is then learned via non-linear random forest modeling based on the forward model. Given the learned numerical demultiplexer, one can then demultiplex simultaneously-acquired measurements made by the color image sensor into reflectance intensities at discrete selectable wavelengths, resulting in a higher resolution reflectance spectrum. Experimental results demonstrate the feasibility of such a method for the purpose of simultaneous multispectral imaging. PMID:27346434

  6. A Random Forest Model for Predicting Allosteric and Functional Sites on Proteins.

    PubMed

    Chen, Ava S-Y; Westwood, Nicholas J; Brear, Paul; Rogers, Graeme W; Mavridis, Lazaros; Mitchell, John B O

    2016-04-01

    We created a computational method to identify allosteric sites using a machine learning approach trained and tested on protein structures containing bound ligand molecules. The Random Forest machine learning approach was adopted to build our three-way predictive model. Based on descriptors collated for each ligand and binding site, the classification model allows us to assign protein cavities as allosteric, regular or orthosteric, and hence to identify allosteric sites. Forty-three structural descriptors per complex were derived and used to characterize individual protein-ligand binding sites belonging to the three classes: allosteric, regular and orthosteric. We carried out a separate validation on a further unseen set of protein structures containing the ligand 2-(N-cyclohexylamino) ethane sulfonic acid (CHES).

  7. Using Classification and Regression Trees (CART) and random forests to analyze attrition: Results from two simulations.

    PubMed

    Hayes, Timothy; Usami, Satoshi; Jacobucci, Ross; McArdle, John J

    2015-12-01

    In this article, we describe a recent development in the analysis of attrition: using classification and regression trees (CART) and random forest methods to generate inverse sampling weights. These flexible machine learning techniques have the potential to capture complex nonlinear, interactive selection models, yet to our knowledge, their performance in the missing data analysis context has never been evaluated. To assess the potential benefits of these methods, we compare their performance with commonly employed multiple imputation and complete case techniques in 2 simulations. These initial results suggest that weights computed from pruned CART analyses performed well in terms of both bias and efficiency when compared with other methods. We discuss the implications of these findings for applied researchers.
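
    The weighting idea reduces to fitting a tree ensemble that predicts retention from baseline covariates and inverting the predicted probabilities. A minimal sketch with simulated dropout; a random forest stands in here, while the article also evaluates pruned CART.

        # Inverse-probability-of-retention weights from a random forest.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(5)
        n = 1000
        X = rng.normal(size=(n, 4))  # baseline covariates
        p_drop = 1 / (1 + np.exp(-(X[:, 0] + X[:, 1] * X[:, 2])))  # nonlinear dropout
        retained = rng.random(n) > p_drop  # True = observed at follow-up

        rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, retained)
        p_retained = rf.predict_proba(X)[:, 1].clip(0.05, 0.95)  # trim extremes

        weights = np.where(retained, 1.0 / p_retained, 0.0)
        # Retained cases that resemble dropouts are up-weighted in later analyses.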

  8. Prediction of G Protein-Coupled Receptors with SVM-Prot Features and Random Forest

    PubMed Central

    Ju, Ying

    2016-01-01

    G protein-coupled receptors (GPCRs) are the largest receptor superfamily. In this paper, we employ physical-chemical properties derived from SVM-Prot to represent GPCRs. Random Forest was utilized as the classifier to distinguish them from other protein sequences. The MEME suite was used to detect the 10 most significant conserved motifs of human GPCRs. In the testing datasets, the average accuracy was 91.61%, and the average AUC was 0.9282. MEME discovery analysis showed that many motifs aggregate in the seven hydrophobic transmembrane helix regions, consistent with the characteristics of GPCRs. All of the above indicate that our machine-learning method can successfully distinguish GPCRs from non-GPCRs. PMID:27529053

  9. Random forest Granger causality for detection of effective brain connectivity using high-dimensional data.

    PubMed

    Furqan, Mohammad Shaheryar; Siyal, Mohammad Yakoob

    2016-03-01

    Studies have shown that brain functions are not localized to isolated areas and connections but rather depend on the intricate network of connections and regions inside the brain. These networks are commonly analyzed using Granger causality (GC), which utilizes the ordinary least squares (OLS) method in its standard implementation. In the past, several approaches have been shown to address the limitations of OLS by using diverse regularization schemes. However, there are still some shortcomings in terms of accuracy, precision, and false discovery rate (FDR). In this paper, we propose a new strategy that uses Random Forest as a regularization technique for computing GC, improving on these shortcomings. We demonstrate the effectiveness of our proposed methodology by comparing the results with existing least absolute shrinkage and selection operator (LASSO) and Elastic-Net regularized implementations of GC using a simulated dataset. We then use our proposed approach to map the network involved in deductive reasoning using the real StarPlus dataset.
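
    The underlying idea can be sketched generically: a source series Granger-causes a target if adding the source's lags improves prediction of the target beyond the target's own lags, with a random forest as the predictor. This illustrates the principle only, not the authors' exact regularisation scheme.

        # Random-forest Granger causality on a simulated pair of series.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(6)
        T, lag = 500, 3
        x = rng.normal(size=T)
        y = np.zeros(T)
        for t in range(1, T):  # y is driven by lagged x
            y[t] = 0.5 * y[t - 1] + 0.8 * x[t - 1] + 0.1 * rng.normal()

        def lagged(series, lag):
            return np.column_stack([series[lag - k - 1:len(series) - k - 1]
                                    for k in range(lag)])

        target = y[lag:]
        own = lagged(y, lag)                     # y's own history
        full = np.hstack([own, lagged(x, lag)])  # plus x's history

        def oob_error(X):
            rf = RandomForestRegressor(n_estimators=300, oob_score=True,
                                       random_state=0).fit(X, target)
            return 1.0 - rf.oob_score_  # out-of-bag error proxy

        print(oob_error(own) > oob_error(full))  # True suggests x -> y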

  10. Assessment of Carbon Stocks in the Topsoil Using Random Forest and Remote Sensing Images.

    PubMed

    Kim, Jongsung; Grunwald, Sabine

    2016-11-01

    Wetland soils are able to exhibit both consumption and production of greenhouse gases, and they play an important role in the regulation of the global carbon (C) cycle. Still, it is challenging to accurately evaluate the actual amount of C stored in wetlands. The incorporation of remote sensing data into digital soil models has great potential to assess C stocks in wetland soils. Our objectives were (i) to develop C stock prediction models utilizing remote sensing images and environmental ancillary data, (ii) to identify the prime environmental predictor variables that explain the spatial distribution of soil C, and (iii) to assess the amount of C stored in the top 20-cm soils of a prominent nutrient-enriched wetland. We collected a total of 108 soil cores at two soil depths (0-10 cm and 10-20 cm) in the Water Conservation Area 2A, FL. We developed random forest models to predict soil C stocks using field observation data, environmental ancillary data, and spectral data derived from remote sensing images, including Satellite Pour l'Observation de la Terre (spatial resolution: 10 m), Landsat Enhanced Thematic Mapper Plus (30 m), and Moderate Resolution Imaging Spectroradiometer (250 m). The random forest models showed high performance to predict C stocks, and variable importance revealed that hydrology was the major environmental factor explaining the spatial distribution of soil C stocks in Water Conservation Area 2A. Our results showed that this area stores about 4.2 Tg (4.2 Mt) of C in the top 20-cm soils.

  11. Prediction of detailed enzyme functions and identification of specificity determining residues by random forests.

    PubMed

    Nagao, Chioko; Nagano, Nozomi; Mizuguchi, Kenji

    2014-01-01

    Determining enzyme functions is essential for a thorough understanding of cellular processes. Although many prediction methods have been developed, it remains a significant challenge to predict enzyme functions at the fourth-digit level of the Enzyme Commission numbers. Functional specificity of enzymes often changes drastically by mutations of a small number of residues and therefore, information about these critical residues can potentially help discriminate detailed functions. However, because these residues must be identified by mutagenesis experiments, the available information is limited, and the lack of experimentally verified specificity determining residues (SDRs) has hindered the development of detailed function prediction methods and computational identification of SDRs. Here we present a novel method for predicting enzyme functions by random forests, EFPrf, along with a set of putative SDRs, the random forests derived SDRs (rf-SDRs). EFPrf consists of a set of binary predictors for enzymes in each CATH superfamily and the rf-SDRs are the residue positions corresponding to the most highly contributing attributes obtained from each predictor. EFPrf showed a precision of 0.98 and a recall of 0.89 in a cross-validated benchmark assessment. The rf-SDRs included many residues, whose importance for specificity had been validated experimentally. The analysis of the rf-SDRs revealed both a general tendency that functionally diverged superfamilies tend to include more active site residues in their rf-SDRs than in less diverged superfamilies, and superfamily-specific conservation patterns of each functional residue. EFPrf and the rf-SDRs will be an effective tool for annotating enzyme functions and for understanding how enzyme functions have diverged within each superfamily.

  12. An evaluation of ISOCLS and CLASSY clustering algorithms for forest classification in northern Idaho. [Elk River quadrangle of the Clearwater National Forest]

    NASA Technical Reports Server (NTRS)

    Werth, L. F. (Principal Investigator)

    1981-01-01

    Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithm were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho, and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate desired information classes. CLASSY reveals more in a single run about the classes that can be separated, shows more promise for forest stratification than ISOCLS, and shows more promise for consistency. One major drawback of CLASSY is that important forest and range classes that are smaller than a minimum cluster size will be combined with other classes. The algorithm requires so much computer storage that only data sets as small as a quadrangle can be used at one time.

  13. A surrogate-primary replacement algorithm for response-adaptive randomization in stroke clinical trials.

    PubMed

    Nowacki, Amy S; Zhao, Wenle; Palesch, Yuko Y

    2015-01-12

    Response-adaptive randomization (RAR) offers clinical investigators benefit by modifying the treatment allocation probabilities to optimize the ethical, operational, or statistical performance of the trial. Delayed primary outcomes and their effect on RAR have been studied in the literature; however, the incorporation of surrogate outcomes has not been fully addressed. We explore the benefits and limitations of surrogate outcome utilization in RAR in the context of acute stroke clinical trials. We propose a novel surrogate-primary (S-P) replacement algorithm where a patient's surrogate outcome is used in the RAR algorithm only until their primary outcome becomes available to replace it. Computer simulations investigate the effect of both the delay in obtaining the primary outcome and the underlying surrogate and primary outcome distributional discrepancies on complete randomization, standard RAR, and the S-P replacement algorithm. Results show that when the primary outcome is delayed, the S-P replacement algorithm reduces the variability of the treatment allocation probabilities and achieves stabilization sooner. Additionally, the benefit of the S-P replacement algorithm proved robust in that it preserved power and reduced the expected number of failures across a variety of scenarios.

  14. Nonconvergence of the Wang-Landau algorithms with multiple random walkers.

    PubMed

    Belardinelli, R E; Pereyra, V D

    2016-05-01

    This paper discusses some convergence properties in the entropic sampling Monte Carlo methods with multiple random walkers, particularly in the Wang-Landau (WL) and 1/t algorithms. The classical algorithms are modified by the use of m independent random walkers in the energy landscape to calculate the density of states (DOS). The Ising model is used to show the convergence properties in the calculation of the DOS, as well as the critical temperature, while the calculation of the number π by multiple dimensional integration is used in the continuum approximation. In each case, the error is obtained separately for each walker at a fixed time, t; then, the average over m walkers is performed. It is observed that the error goes as 1/√m. However, if the number of walkers increases above a certain critical value m > m_x, the error reaches a constant value (i.e., it saturates). This occurs for both algorithms; however, it is shown that for a given system, the 1/t algorithm is more efficient and accurate than the similar version of the WL algorithm. It follows that it makes no sense to increase the number of walkers above a critical value m_x, since it does not reduce the error in the calculation. Therefore, the number of walkers does not guarantee convergence.
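
    The 1/√m averaging behaviour is easy to reproduce on the paper's π example with plain Monte Carlo; the sketch below averages the estimates of m independent walkers. The saturation above m_x requires the full Wang-Landau machinery and is not reproduced here.

        # Error of an m-walker average falls roughly as 1/sqrt(m).
        import numpy as np

        rng = np.random.default_rng(7)
        samples_per_walker = 10000

        def walker_estimate():
            p = rng.random((samples_per_walker, 2))
            return 4.0 * np.mean(p[:, 0] ** 2 + p[:, 1] ** 2 <= 1.0)

        for m in (1, 4, 16, 64, 256):
            est = np.mean([walker_estimate() for _ in range(m)])
            print(m, abs(est - np.pi))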

  15. Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence

    PubMed Central

    Mi, Chunrong; Huettmann, Falk; Han, Xuesong; Wen, Lijia

    2017-01-01

    Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of the above four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and has been known to perform extremely well in ecological predictions. However, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid

  16. Why choose Random Forest to predict rare species distribution with few samples in large undersampled areas? Three Asian crane species models provide supporting evidence.

    PubMed

    Mi, Chunrong; Huettmann, Falk; Guo, Yumin; Han, Xuesong; Wen, Lijia

    2017-01-01

    Species distribution models (SDMs) have become an essential tool in ecology, biogeography, evolution and, more recently, in conservation biology. How to generalize species distributions in large undersampled areas, especially with few samples, is a fundamental issue of SDMs. In order to explore this issue, we used the best available presence records for the Hooded Crane (Grus monacha, n = 33), White-naped Crane (Grus vipio, n = 40), and Black-necked Crane (Grus nigricollis, n = 75) in China as three case studies, employing four powerful and commonly used machine learning algorithms to map the breeding distributions of the three species: TreeNet (Stochastic Gradient Boosting, Boosted Regression Tree Model), Random Forest, CART (Classification and Regression Tree) and Maxent (Maximum Entropy Models). In addition, we developed an ensemble forecast by averaging the predicted probabilities of the above four models. Commonly used model performance metrics (area under the ROC curve (AUC) and true skill statistic (TSS)) were employed to evaluate model accuracy. The latest satellite tracking data and compiled literature data were used as two independent testing datasets to confront model predictions. We found that Random Forest demonstrated the best performance for most assessment methods, provided a better model fit to the testing data, and achieved better species range maps for each crane species in undersampled areas. Random Forest has been generally available for more than 20 years and has been known to perform extremely well in ecological predictions. However, its potential is still widely underused in conservation, (spatial) ecological applications and for inference. Our results show that it informs ecological and biogeographical theories as well as being suitable for conservation applications, specifically when the study area is undersampled. This method helps to save model-selection time and effort, and allows robust and rapid

  17. Rotorcraft Blade Mode Damping Identification from Random Responses Using a Recursive Maximum Likelihood Algorithm

    NASA Technical Reports Server (NTRS)

    Molusis, J. A.

    1982-01-01

    An on-line technique is presented for the identification of rotor blade modal damping and frequency from rotorcraft random response test data. The identification technique is based upon a recursive maximum likelihood (RML) algorithm, which is demonstrated to have excellent convergence characteristics in the presence of random measurement noise and random excitation. The RML technique requires virtually no user interaction, provides accurate confidence bands on the parameter estimates, and can be used for continuous monitoring of modal damping during wind tunnel or flight testing. Results are presented from simulated random response data which quantify the identified parameter convergence behavior for various levels of random excitation. The data length required for acceptable parameter accuracy is shown to depend upon the amplitude of random response and the modal damping level. Random response amplitudes of 1.25 degrees to 0.05 degrees are investigated. The RML technique is applied to hingeless rotor test data. The inplane lag regressing mode is identified at different rotor speeds. The identification from the test data is compared with the simulation results and with other available estimates of frequency and damping.

  18. Random Forest-Based Recognition of Isolated Sign Language Subwords Using Data from Accelerometers and Surface Electromyographic Sensors

    PubMed Central

    Su, Ruiliang; Chen, Xiang; Cao, Shuai; Zhang, Xu

    2016-01-01

    Sign language recognition (SLR) has been widely used for communication amongst the hearing-impaired and non-verbal community. This paper proposes an accurate and robust SLR framework using an improved decision tree as the base classifier of random forests. This framework was used to recognize Chinese sign language subwords using recordings from a pair of portable devices worn on both arms consisting of accelerometers (ACC) and surface electromyography (sEMG) sensors. The experimental results demonstrated the validity of the proposed random forest-based method for recognition of Chinese sign language (CSL) subwords. With the proposed method, 98.25% average accuracy was obtained for the classification of a list of 121 frequently used CSL subwords. Moreover, the random forests method demonstrated a superior performance in resisting the impact of bad training samples. When the proportion of bad samples in the training set reached 50%, the recognition error rate of the random forest-based method was only 10.67%, while that of a single decision tree adopted in our previous work was almost 27.5%. Our study offers a practical way of realizing a robust and wearable EMG-ACC-based SLR system. PMID:26784195
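
    The robustness claim above (graceful degradation under corrupted training labels) is straightforward to probe on synthetic data by contrasting a single decision tree with a forest. The data below is a generated stand-in, not the sEMG/ACC corpus.

        # Label-noise robustness: single tree vs. random forest.
        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split
        from sklearn.tree import DecisionTreeClassifier

        X, y = make_classification(n_samples=2000, n_features=20, n_informative=10,
                                   n_classes=4, random_state=0)
        Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

        rng = np.random.default_rng(10)
        bad = rng.random(len(ytr)) < 0.5  # corrupt half of the training labels
        ytr_noisy = np.where(bad, rng.integers(0, 4, len(ytr)), ytr)

        tree = DecisionTreeClassifier(random_state=0).fit(Xtr, ytr_noisy)
        forest = RandomForestClassifier(n_estimators=300,
                                        random_state=0).fit(Xtr, ytr_noisy)
        print(tree.score(Xte, yte), forest.score(Xte, yte))  # forest degrades less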

  19. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    ERIC Educational Resources Information Center

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…

  20. Data dependent random forest applied to screening for laryngeal disorders through analysis of sustained phonation: acoustic versus contact microphone.

    PubMed

    Verikas, A; Gelzinis, A; Vaiciukynas, E; Bacauskiene, M; Minelga, J; Hållander, M; Uloza, V; Padervinskis, E

    2015-02-01

    Comprehensive evaluation of results obtained using acoustic and contact microphones in screening for laryngeal disorders through analysis of sustained phonation is the main objective of this study. Aiming to obtain a versatile characterization of voice samples recorded using microphones of both types, 14 different sets of features are extracted and used to build an accurate classifier to distinguish between normal and pathological cases. We propose a new, data-dependent, random-forests-based way to combine information available from the different feature sets. An approach to exploring data and decisions made by a random forest is also presented. Experimental investigations using a mixed gender database of 273 subjects have shown that the perceptual linear predictive cepstral coefficients (PLPCC) was the best feature set for both microphones. However, the linear predictive coefficients (LPC) and linear predictive cosine transform coefficients (LPCTC) exhibited good performance in the acoustic microphone case only. Models designed using the acoustic microphone data significantly outperformed the ones built using data recorded by the contact microphone. The contact microphone did not bring any additional information useful for the classification. The proposed data-dependent random forest significantly outperformed the traditional random forest.

  1. Random Forest-Based Recognition of Isolated Sign Language Subwords Using Data from Accelerometers and Surface Electromyographic Sensors.

    PubMed

    Su, Ruiliang; Chen, Xiang; Cao, Shuai; Zhang, Xu

    2016-01-14

    Sign language recognition (SLR) has been widely used for communication amongst the hearing-impaired and non-verbal community. This paper proposes an accurate and robust SLR framework using an improved decision tree as the base classifier of random forests. This framework was used to recognize Chinese sign language subwords using recordings from a pair of portable devices worn on both arms consisting of accelerometers (ACC) and surface electromyography (sEMG) sensors. The experimental results demonstrated the validity of the proposed random forest-based method for recognition of Chinese sign language (CSL) subwords. With the proposed method, 98.25% average accuracy was obtained for the classification of a list of 121 frequently used CSL subwords. Moreover, the random forests method demonstrated a superior performance in resisting the impact of bad training samples. When the proportion of bad samples in the training set reached 50%, the recognition error rate of the random forest-based method was only 10.67%, while that of a single decision tree adopted in our previous work was almost 27.5%. Our study offers a practical way of realizing a robust and wearable EMG-ACC-based SLR system.

  2. Random forest in remote sensing: A review of applications and future directions

    NASA Astrophysics Data System (ADS)

    Belgiu, Mariana; Drăguţ, Lucian

    2016-04-01

    A random forest (RF) classifier is an ensemble classifier that produces multiple decision trees, using a randomly selected subset of training samples and variables. This classifier has become popular within the remote sensing community due to the accuracy of its classifications. The overall objective of this work was to review the utilization of the RF classifier in remote sensing. This review has revealed that the RF classifier can successfully handle high data dimensionality and multicollinearity, being both fast and insensitive to overfitting. It is, however, sensitive to the sampling design. The variable importance (VI) measurement provided by the RF classifier has been extensively exploited in different scenarios, for example to reduce the number of dimensions of hyperspectral data, to identify the most relevant multisource remote sensing and geographic data, and to select the most suitable season to classify particular target classes. Further investigations are required into less commonly exploited uses of this classifier, such as sample proximity analysis to detect and remove outliers in the training samples.

  3. Groupwise conditional random forests for automatic shape classification and contour quality assessment in radiotherapy planning.

    PubMed

    McIntosh, Chris; Svistoun, Igor; Purdie, Thomas G

    2013-06-01

    Radiation therapy is used to treat cancer patients around the world. High quality treatment plans maximally radiate the targets while minimally radiating healthy organs at risk. In order to judge plan quality and safety, segmentations of the targets and organs at risk are created, and the amount of radiation that will be delivered to each structure is estimated prior to treatment. If the targets or organs at risk are mislabelled, or the segmentations are of poor quality, the safety of the radiation doses will be erroneously reviewed and an unsafe plan could proceed. We propose a technique to automatically label groups of segmentations of different structures from a radiation therapy plan for the joint purposes of providing quality assurance and data mining. Given one or more segmentations and an associated image, we seek to assign medically meaningful labels to each segmentation and report the confidence of that label. Our method uses random forests to learn joint distributions over the training features, and then exploits a set of learned potential group configurations to build a conditional random field (CRF) that ensures the assignment of labels is consistent across the group of segmentations. The CRF is then solved via a constrained assignment problem. We validate our method on 1574 plans, consisting of 17,579 segmentations, demonstrating an overall classification accuracy of 91.58%. Our results also demonstrate the stability of RF with respect to tree depth and the number of splitting variables in large data sets.

  4. Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

    ERIC Educational Resources Information Center

    Golino, Hudson F.; Gomes, Cristiano M. A.

    2016-01-01

    This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…

  5. Lesion segmentation from multimodal MRI using random forest following ischemic stroke.

    PubMed

    Mitra, Jhimli; Bourgeat, Pierrick; Fripp, Jurgen; Ghose, Soumya; Rose, Stephen; Salvado, Olivier; Connelly, Alan; Campbell, Bruce; Palmer, Susan; Sharma, Gagan; Christensen, Soren; Carey, Leeanne

    2014-09-01

    Understanding structure-function relationships in the brain after stroke is reliant not only on the accurate anatomical delineation of the focal ischemic lesion, but also on previous infarcts, remote changes and the presence of white matter hyperintensities. The robust definition of primary stroke boundaries and secondary brain lesions will have significant impact on investigation of brain-behavior relationships and lesion volume correlations with clinical measures after stroke. Here we present an automated approach to identify chronic ischemic infarcts in addition to other white matter pathologies, that may be used to aid the development of post-stroke management strategies. Our approach uses Bayesian-Markov Random Field (MRF) classification to segment probable lesion volumes present on fluid attenuated inversion recovery (FLAIR) MRI. Thereafter, a random forest classification of the information from multimodal (T1-weighted, T2-weighted, FLAIR, and apparent diffusion coefficient (ADC)) MRI images and other context-aware features (within the probable lesion areas) was used to extract areas with high likelihood of being classified as lesions. The final segmentation of the lesion was obtained by thresholding the random forest probabilistic maps. The accuracy of the automated lesion delineation method was assessed in a total of 36 patients (24 male, 12 female, mean age: 64.57 ± 14.23 yrs) at 3 months after stroke onset and compared with manually segmented lesion volumes by an expert. Accuracy assessment of the automated lesion identification method was performed using the commonly used evaluation metrics. The mean sensitivity of segmentation was measured to be 0.53 ± 0.13 with a mean positive predictive value of 0.75 ± 0.18. The mean lesion volume difference was observed to be 32.32% ± 21.643% with a high Pearson's correlation of r = 0.76 (p < 0.0001). The lesion overlap accuracy was measured in terms of Dice similarity coefficient with a mean of 0.60 ± 0.12, while the contour

  6. A partially reflecting random walk on spheres algorithm for electrical impedance tomography

    SciTech Connect

    Maire, Sylvain; Simon, Martin

    2015-12-15

    In this work, we develop a probabilistic estimator for the voltage-to-current map arising in electrical impedance tomography. This novel so-called partially reflecting random walk on spheres estimator enables Monte Carlo methods to compute the voltage-to-current map in an embarrassingly parallel manner, which is an important issue with regard to the corresponding inverse problem. Our method uses the well-known random walk on spheres algorithm inside subdomains where the diffusion coefficient is constant and employs replacement techniques motivated by finite difference discretization to deal with both mixed boundary conditions and interface transmission conditions. We analyze the global bias and the variance of the new estimator both theoretically and experimentally. Subsequently, the variance of the new estimator is considerably reduced via a novel control variate conditional sampling technique which yields a highly efficient hybrid forward solver coupling probabilistic and deterministic algorithms.

  7. Search Control Algorithm Based on Random Step Size Hill-Climbing Method for Adaptive PMD Compensation

    NASA Astrophysics Data System (ADS)

    Tanizawa, Ken; Hirose, Akira

    Adaptive polarization mode dispersion (PMD) compensation is required for the speed-up and advancement of present optical communications. The combination of a tunable PMD compensator and its adaptive control method achieves adaptive PMD compensation. In this paper, we report an effective search control algorithm for the feedback control of the PMD compensator. The algorithm is based on the hill-climbing method. However, unlike in the conventional hill-climbing method, the step size changes randomly to prevent the convergence from being trapped at a local maximum or a flat region. The randomness is based on Gaussian probability density functions. We conducted transmission simulations at 160 Gb/s, and the results show that the proposed method provides more optimal compensator control than the conventional hill-climbing method.
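
    The control idea is small enough to sketch directly: ordinary hill climbing keeps only uphill moves, but drawing the step size (and sign) from a Gaussian occasionally produces a jump large enough to leave a local maximum. The toy objective below stands in for the compensator's feedback monitor signal; all values are assumptions.

        # Hill climbing with a Gaussian-distributed random step size.
        import numpy as np

        rng = np.random.default_rng(8)

        def monitor(theta):
            # Toy feedback signal: global peak at 2.0, local peak at -1.5.
            return np.exp(-(theta - 2.0) ** 2) + 0.4 * np.exp(-4 * (theta + 1.5) ** 2)

        theta = -1.4  # start near the local maximum
        for _ in range(500):
            step = rng.normal(0.0, 1.5)  # random step size and sign each trial
            if monitor(theta + step) > monitor(theta):
                theta += step            # keep only uphill moves
        print(theta)  # usually ends near the global peak at 2.0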

  8. Enhancing network robustness against targeted and random attacks using a memetic algorithm

    NASA Astrophysics Data System (ADS)

    Tang, Xianglong; Liu, Jing; Zhou, Mingxing

    2015-08-01

    In the past decades, there has been much interest in the elasticity of infrastructures to targeted and random attacks. In the recent work by Schneider C. M. et al., Proc. Natl. Acad. Sci. U.S.A., 108 (2011) 3838, the authors proposed an effective measure (namely R; here we label it R_t to represent the measure for targeted attacks) to evaluate network robustness against targeted node attacks. Using a greedy algorithm, they found that the optimal structure is an onion-like one. However, real systems are often under threats of both targeted attacks and random failures. So, enhancing network robustness against both targeted and random attacks is of great importance. In this paper, we first design a random-robustness index (R_r). We find that the onion-like networks destroyed the original strong ability of BA networks to resist random attacks. Moreover, the structure of an R_r-optimized network is found to be different from that of an onion-like network. To design robust scale-free networks (RSF) which are resistant to both targeted and random attacks (TRA) without changing the degree distribution, a memetic algorithm (MA) is proposed, labeled as MA-RSF_TRA. In the experiments, both synthetic scale-free networks and real-world networks are used to validate the performance of MA-RSF_TRA. The results show that MA-RSF_TRA has a great ability in searching for the most robust network structure that is resistant to both targeted and random attacks.
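
    The robustness measure R at the heart of this line of work can be sketched in a few lines with networkx: repeatedly remove the current highest-degree node (a targeted attack) and average the surviving fraction of the largest connected component. The BA test graph is illustrative; a random-attack variant in the spirit of R_r would pick removal targets uniformly at random instead.

        # Schneider et al.'s robustness R under a targeted degree attack.
        import networkx as nx

        def robustness_targeted(graph):
            g = graph.copy()
            n = g.number_of_nodes()
            total = 0.0
            for _ in range(n - 1):
                node = max(g.degree, key=lambda kv: kv[1])[0]  # highest degree
                g.remove_node(node)
                total += max(len(c) for c in nx.connected_components(g)) / n
            return total / n

        print(robustness_targeted(nx.barabasi_albert_graph(200, 2, seed=0)))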

  9. Performance of the quantum adiabatic algorithm on random instances of two optimization problems on regular hypergraphs

    NASA Astrophysics Data System (ADS)

    Farhi, Edward; Gosset, David; Hen, Itay; Sandvik, A. W.; Shor, Peter; Young, A. P.; Zamponi, Francesco

    2012-11-01

    In this paper we study the performance of the quantum adiabatic algorithm on random instances of two combinatorial optimization problems, 3-regular 3-XORSAT and 3-regular max-cut. The cost functions associated with these two clause-based optimization problems are similar as they are both defined on 3-regular hypergraphs. For 3-regular 3-XORSAT the clauses contain three variables and for 3-regular max-cut the clauses contain two variables. The quantum adiabatic algorithms we study for these two problems use interpolating Hamiltonians which are amenable to sign-problem free quantum Monte Carlo and quantum cavity methods. Using these techniques we find that the quantum adiabatic algorithm fails to solve either of these problems efficiently, although for different reasons.

  10. Representation of high frequency Space Shuttle data by ARMA algorithms and random response spectra

    NASA Technical Reports Server (NTRS)

    Spanos, P. D.; Mushung, L. J.

    1990-01-01

    High frequency Space Shuttle lift-off data are treated by autoregressive (AR) and autoregressive-moving-average (ARMA) digital algorithms. These algorithms provide useful information on the spectral densities of the data. Further, they yield spectral models which lend themselves to incorporation into the concept of the random response spectrum. This concept yields a reasonably smooth power spectrum for the design of structural and mechanical systems when the available data bank is limited. Due to the non-stationarity of the lift-off event, the pertinent data are split into three slices. Each slice is associated with a rather distinguishable phase of the lift-off event, where stationarity can be expected. The presented results are rather preliminary in nature; the aim is to call attention to the availability of the discussed digital algorithms and to the need to augment the Space Shuttle data bank as more flights are completed.

  11. The backtracking survey propagation algorithm for solving random K-SAT problems

    PubMed Central

    Marino, Raffaele; Parisi, Giorgio; Ricci-Tersenghi, Federico

    2016-01-01

    Discrete combinatorial optimization has a central role in many scientific disciplines; however, for hard problems we lack linear-time algorithms that would allow us to solve very large instances. Moreover, it is still unclear what key features make a discrete combinatorial optimization problem hard to solve. Here we study random K-satisfiability problems with K = 3, 4, which are known to be very hard close to the SAT-UNSAT threshold, where problems stop having solutions. We show that the backtracking survey propagation algorithm, in a time practically linear in the problem size, is able to find solutions very close to the threshold, in a region unreachable by any other algorithm. All solutions found have no frozen variables, thus supporting the conjecture that only unfrozen solutions can be found in linear time, and that a problem becomes impossible to solve in linear time when all solutions contain frozen variables. PMID:27694952

  12. The backtracking survey propagation algorithm for solving random K-SAT problems

    NASA Astrophysics Data System (ADS)

    Marino, Raffaele; Parisi, Giorgio; Ricci-Tersenghi, Federico

    2016-10-01

    Discrete combinatorial optimization has a central role in many scientific disciplines; however, for hard problems we lack linear-time algorithms that would allow us to solve very large instances. Moreover, it is still unclear which key features make a discrete combinatorial optimization problem hard to solve. Here we study random K-satisfiability problems with K = 3, 4, which are known to be very hard close to the SAT-UNSAT threshold, where problems stop having solutions. We show that the backtracking survey propagation algorithm, in a time practically linear in the problem size, is able to find solutions very close to the threshold, in a region unreachable by any other algorithm. All solutions found have no frozen variables, thus supporting the conjecture that only unfrozen solutions can be found in linear time, and that a problem becomes impossible to solve in linear time when all solutions contain frozen variables.

  13. Classification Model for Forest Fire Hotspot Occurrences Prediction Using ANFIS Algorithm

    NASA Astrophysics Data System (ADS)

    Wijayanto, A. K.; Sani, O.; Kartika, N. D.; Herdiyeni, Y.

    2017-01-01

    This study proposed the application of a data mining technique, namely the Adaptive Neuro-Fuzzy Inference System (ANFIS), to forest fire hotspot data to develop classification models for hotspot occurrence in Central Kalimantan. A hotspot is a point indicated as the location of a fire. In this study, the hotspot distribution is categorized as true alarm or false alarm. ANFIS is a soft computing method in which a given input-output data set is expressed as a fuzzy inference system (FIS). The FIS implements a nonlinear mapping from its input space to the output space. The study classified hotspots as target objects by correlating spatial attribute data, using three folds in the ANFIS algorithm to obtain the best model. The best result, obtained from the 3rd fold, gave a low training error (0.0093676) and an equally low testing error (0.0093676). Distance to road is the most influential attribute for the probability of a true versus a false alarm, as the level of human activity is higher near roads. This classification model can be used to develop an early warning system for forest fires.

  14. Randomized algorithms for high quality treatment planning in volumetric modulated arc therapy

    NASA Astrophysics Data System (ADS)

    Yang, Yu; Dong, Bin; Wen, Zaiwen

    2017-02-01

    In recent years, volumetric modulated arc therapy (VMAT) has become an increasingly important radiation technique widely used in clinical cancer treatment. One of the key problems in VMAT is treatment plan optimization, which is complicated by the constraints imposed by the equipment involved. In this paper, we consider a model with four major constraints: a bound on the beam intensity, an upper bound on the rate of change of the beam intensity, the moving speed of the leaves of the multi-leaf collimator (MLC), and its directional convexity. We solve the model with a two-stage algorithm, alternately minimizing with respect to the aperture shapes and the beam intensities. Specifically, the aperture shapes are obtained by a greedy algorithm whose performance is enhanced by random sampling over the leaf pairs with a decremental rate. The beam intensity is optimized using a gradient projection method with non-monotone line search. We further improve the proposed algorithm by incremental random importance sampling of the voxels to reduce the computational cost of the energy functional. Numerical simulations on two clinical cancer datasets demonstrate that our method is highly competitive with state-of-the-art algorithms in terms of both computational time and quality of treatment planning.
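
    The gradient projection step can be pictured with a minimal sketch (ours, on a toy box-constrained quadratic; we show a monotone Armijo line search rather than the paper's non-monotone variant).

        import numpy as np

        def project_box(x, lo, hi):
            # Euclidean projection onto the box constraints lo <= x <= hi.
            return np.clip(x, lo, hi)

        def projected_gradient(grad_f, f, x0, lo, hi, step=1.0, iters=200):
            x = x0.copy()
            for _ in range(iters):
                g = grad_f(x)
                t = step
                # backtracking line search on the projected point
                while f(project_box(x - t * g, lo, hi)) > f(x) - 1e-4 * t * (g @ g):
                    t *= 0.5
                    if t < 1e-12:
                        return x  # no further progress possible
                x = project_box(x - t * g, lo, hi)
            return x

        # toy problem: min ||Ax - b||^2 subject to 0 <= x <= 1 (intensity-like bounds)
        rng = np.random.default_rng(0)
        A, b = rng.standard_normal((30, 10)), rng.standard_normal(30)
        f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
        grad = lambda x: A.T @ (A @ x - b)
        x_star = projected_gradient(grad, f, np.full(10, 0.5), 0.0, 1.0)
        print(f(x_star))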

  15. Patient-Specific Predictive Modeling Using Random Forests: An Observational Study for the Critically Ill

    PubMed Central

    2017-01-01

    Background With a large-scale electronic health record repository, it is feasible to build a customized patient outcome prediction model specifically for a given patient. This approach involves identifying past patients who are similar to the present patient and using their data to train a personalized predictive model. Our previous work investigated a cosine-similarity patient similarity metric (PSM) for such patient-specific predictive modeling. Objective The objective of the study is to investigate the random forest (RF) proximity measure as a PSM in the context of personalized mortality prediction for intensive care unit (ICU) patients. Methods A total of 17,152 ICU admissions were extracted from the Multiparameter Intelligent Monitoring in Intensive Care II database. A number of predictor variables were extracted from the first 24 hours in the ICU. The outcome to be predicted was 30-day mortality. A patient-specific predictive model was trained for each ICU admission using an RF PSM inspired by the RF proximity measure. Death counting, logistic regression, decision tree, and RF models were studied, with a hard threshold applied to RF PSM values so that only the M most similar patients were included in model training, where M was varied. In addition, case-specific random forests (CSRFs), which use RF proximity for weighted bootstrapping, were trained. Results Compared to our previous study that investigated a cosine-similarity PSM, the RF PSM resulted in superior or comparable predictive performance. RF and CSRF exhibited the best performances (in terms of mean area under the receiver operating characteristic curve [95% confidence interval], RF: 0.839 [0.835-0.844]; CSRF: 0.832 [0.821-0.843]). RF and CSRF did not benefit from personalization via the use of the RF PSM, while the other models did. Conclusions The RF PSM led to good mortality prediction performance for several predictive models, although it failed to induce improved performance in RF and CSRF. The distinction
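
    A hedged illustration of the underlying proximity idea (our sketch with synthetic data; scikit-learn does not expose RF proximity directly, but it can be derived from leaf indices):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier

        X, y = make_classification(n_samples=300, n_features=10, random_state=0)
        rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

        # Leaf indices, shape (n_samples, n_trees): two patients are "proximal"
        # when many trees route them to the same leaf.
        leaves = rf.apply(X)

        def rf_proximity(leaves, i, j):
            # fraction of trees in which samples i and j land in the same leaf
            return np.mean(leaves[i] == leaves[j])

        # Rank past patients by proximity to a new index patient (here, sample 0).
        prox = np.mean(leaves == leaves[0], axis=1)
        most_similar = np.argsort(prox)[::-1][1:51]  # top 50 neighbours, excluding self
        print(most_similar[:10])
        print(rf_proximity(leaves, 0, int(most_similar[0])))

    A personalized model in the spirit of the study would then be trained only on these M most similar past patients.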

  16. Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards

    PubMed Central

    2011-01-01

    Background Computer-coded verbal autopsy (CCVA) is a promising alternative to the standard approach of physician-certified verbal autopsy (PCVA), because of its high speed, low cost, and reliability. This study introduces a new CCVA technique and validates its performance using defined clinical diagnostic criteria as a gold standard for a multisite sample of 12,542 verbal autopsies (VAs). Methods The Random Forest (RF) Method from machine learning (ML) was adapted to predict cause of death by training random forests to distinguish between each pair of causes, and then combining the results through a novel ranking technique. We assessed quality of the new method at the individual level using chance-corrected concordance and at the population level using cause-specific mortality fraction (CSMF) accuracy as well as linear regression. We also compared the quality of RF to PCVA for all of these metrics. We performed this analysis separately for adult, child, and neonatal VAs. We also assessed the variation in performance with and without household recall of health care experience (HCE). Results For all metrics, for all settings, RF was as good as or better than PCVA, with the exception of a nonsignificantly lower CSMF accuracy for neonates with HCE information. With HCE, the chance-corrected concordance of RF was 3.4 percentage points higher for adults, 3.2 percentage points higher for children, and 1.6 percentage points higher for neonates. The CSMF accuracy was 0.097 higher for adults, 0.097 higher for children, and 0.007 lower for neonates. Without HCE, the chance-corrected concordance of RF was 8.1 percentage points higher than PCVA for adults, 10.2 percentage points higher for children, and 5.9 percentage points higher for neonates. The CSMF accuracy was higher for RF by 0.102 for adults, 0.131 for children, and 0.025 for neonates. Conclusions We found that our RF Method outperformed the PCVA method in terms of chance-corrected concordance and CSMF accuracy for
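
    For reference, the individual-level and population-level metrics used above are commonly defined as follows (our rendering of the standard definitions from the verbal autopsy metrics literature, with N the number of causes, TP_j and FN_j the true positives and false negatives for cause j):

        \mathrm{CCC}_j = \frac{\dfrac{TP_j}{TP_j + FN_j} - \dfrac{1}{N}}{1 - \dfrac{1}{N}},
        \qquad
        \mathrm{CSMF\;accuracy} = 1 - \frac{\sum_{j=1}^{N} \left| \mathrm{CSMF}_j^{\mathrm{true}} - \mathrm{CSMF}_j^{\mathrm{pred}} \right|}{2\left(1 - \min_{j} \mathrm{CSMF}_j^{\mathrm{true}}\right)}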

  17. Random generation of periodic hard ellipsoids based on molecular dynamics: A computationally-efficient algorithm

    NASA Astrophysics Data System (ADS)

    Ghossein, Elias; Lévesque, Martin

    2013-11-01

    This paper presents a computationally-efficient algorithm for generating random periodic packings of hard ellipsoids. The algorithm is based on molecular dynamics, where the ellipsoids are set in translational and rotational motion and their volumes gradually increase. Binary collision times are computed by simply finding the roots of a non-linear function. In addition, an original and efficient method to compute the collision time between an ellipsoid and a cube face is proposed. The algorithm can generate all types of ellipsoids (prolate, oblate and scalene) with very high aspect ratios (i.e., >10). It is the first time that such packings are reported in the literature. Orientation tensors were computed for the generated packings, showing that the ellipsoids had a uniform distribution of orientations. Moreover, for low aspect ratios (i.e., ⩽10), the volume fraction appears to be the most influential parameter on the algorithm's CPU time; for higher aspect ratios, the aspect ratio itself becomes as important as the volume fraction. All necessary pseudo-codes are given so that the reader can easily implement the algorithm.

  18. Drug Concentration Thresholds Predictive of Therapy Failure and Death in Children With Tuberculosis: Bread Crumb Trails in Random Forests

    PubMed Central

    Swaminathan, Soumya; Pasipanodya, Jotam G.; Ramachandran, Geetha; Hemanth Kumar, A. K.; Srivastava, Shashikant; Deshpande, Devyani; Nuermberger, Eric; Gumbo, Tawanda

    2016-01-01

    Background. The role of drug concentrations in clinical outcomes in children with tuberculosis is unclear. Target concentrations for dose optimization are unknown. Methods. Plasma drug concentrations measured in Indian children with tuberculosis were modeled using compartmental pharmacokinetic analyses. The children were followed until the end of therapy to ascertain therapy failure or death. An ensemble of artificial intelligence algorithms, including random forests, was used to identify predictors of clinical outcome from among 30 clinical, laboratory, and pharmacokinetic variables. Results. Among the 143 children with known outcomes, there was high between-child variability in isoniazid, rifampin, and pyrazinamide concentrations: 110 (77%) completed therapy, 24 (17%) failed therapy, and 9 (6%) died. The main predictors of therapy failure or death were a pyrazinamide peak concentration <38.10 mg/L and a rifampin peak concentration <3.01 mg/L. The relative risk of these poor outcomes below these peak concentration thresholds was 3.64 (95% confidence interval [CI], 2.28–5.83). Isoniazid exhibited concentration-dependent antagonism with rifampin and pyrazinamide, with an adjusted odds ratio for therapy failure of 3.00 (95% CI, 2.08–4.33) in the antagonistic concentration range. With death alone as the outcome, the same drug concentrations, together with z scores (indicators of malnutrition) and age <3 years, were highly ranked predictors. In children <3 years old, an isoniazid 0- to 24-hour area under the concentration-time curve <11.95 mg/L × hour and/or a rifampin peak <3.10 mg/L were the best predictors of therapy failure, with a relative risk of 3.43 (95% CI, .99–11.82). Conclusions. We have identified new antibiotic target concentrations, which are potential biomarkers associated with treatment failure and death in children with tuberculosis. PMID:27742636
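
    The variable-ranking step can be sketched generically with random forest impurity importances (our illustration with placeholder data and hypothetical variable names; the paper used an ensemble of artificial intelligence algorithms, not this exact code):

        import pandas as pd
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier

        # Stand-in for the 30 clinical/laboratory/pharmacokinetic variables;
        # the column names are hypothetical.
        X, y = make_classification(n_samples=143, n_features=30, n_informative=5,
                                   random_state=0)
        cols = [f"var_{i}" for i in range(30)]

        rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
        ranking = pd.Series(rf.feature_importances_, index=cols)
        print(ranking.sort_values(ascending=False).head(5))  # top candidate predictors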

  19. Classification of nanoparticle diffusion processes in vital cells by a multifeature random forests approach: application to simulated data, darkfield, and confocal laser scanning microscopy

    NASA Astrophysics Data System (ADS)

    Wagner, Thorsten; Kroll, Alexandra; Wiemann, Martin; Lipinski, Hans-Gerd

    2016-04-01

    Darkfield and confocal laser scanning microscopy both allow for simultaneous observation of live cells and single nanoparticles, so a characterization of nanoparticle uptake and intracellular mobility appears possible within living cells. Single particle tracking makes it possible to characterize both the particle and the surrounding cell. In the case of free diffusion, the mean squared displacement of each nanoparticle trajectory can be measured, which allows computing the corresponding diffusion coefficient and, if desired, converting it into the hydrodynamic diameter using the Stokes-Einstein equation and the viscosity of the fluid. However, within the more complex system of a cell's cytoplasm, unrestrained diffusion is scarce and several other types of movement may occur: confined or anomalous diffusion (e.g. diffusion in porous media), active transport, and combinations thereof have been described by several authors. To distinguish between these types of particle movement we developed an appropriate classification method, and simulated three types of particle motion in a 2D plane using a Monte Carlo approach: (1) normal diffusion, using random direction and step length, (2) subdiffusion, using confinements such as a reflective boundary with defined radius or reflective objects in the close vicinity, and (3) superdiffusion, adding a directed flow to the normal diffusion. To simulate subdiffusion we devised a new method based on tracks of different length combined with equally probable obstacle interaction. Next we estimated the fractal dimension, the elongation, and the ratio of long-time to short-time diffusion coefficients. These features were used to train a random forests classification algorithm. The accuracy for simulated trajectories with 180 steps was 97% (95%-CI: 0.9481-0.9884). The balanced accuracy was 94%, 99% and 98% for normal, sub- and superdiffusion, respectively. Nanoparticle tracking analysis was used with 100 nm polystyrene particles
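
    The MSD-to-diameter pipeline mentioned above can be sketched as follows (ours; simulated free 2D diffusion, with a hypothetical frame interval and the viscosity of water):

        import numpy as np

        KB = 1.380649e-23  # Boltzmann constant, J/K

        def msd(track, max_lag):
            # time-averaged mean squared displacement of a (T, 2) trajectory in metres
            return np.array([np.mean(np.sum((track[lag:] - track[:-lag]) ** 2, axis=1))
                             for lag in range(1, max_lag + 1)])

        # simulate free 2D diffusion: per-axis step variance is 2*D*dt
        dt = 0.01          # frame interval in seconds (hypothetical)
        D_true = 4e-12     # m^2/s
        rng = np.random.default_rng(0)
        track = np.cumsum(rng.normal(0, np.sqrt(2 * D_true * dt), size=(1000, 2)), axis=0)

        # for free 2D diffusion MSD(t) = 4*D*t, so D is the fitted slope / 4
        lags = np.arange(1, 251) * dt
        D_est = np.polyfit(lags, msd(track, 250), 1)[0] / 4

        # Stokes-Einstein: hydrodynamic diameter d = k_B * T / (3 * pi * eta * D)
        T_kelvin, eta = 293.15, 1.0e-3  # room temperature, water viscosity ~1 mPa*s
        d_h = KB * T_kelvin / (3 * np.pi * eta * D_est)
        print(f"D = {D_est:.2e} m^2/s, hydrodynamic diameter = {d_h * 1e9:.0f} nm")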

  20. On efficient randomized algorithms for finding the PageRank vector

    NASA Astrophysics Data System (ADS)

    Gasnikov, A. V.; Dmitriev, D. Yu.

    2015-03-01

    Two randomized methods are considered for finding the PageRank vector; in other words, the solution of the system p^T = p^T P with a stochastic n × n matrix P, where n ~ 10^7-10^9, is sought (in the class of probability distributions) with accuracy ε, ε ≫ n^{-1}. Thus, the possibility of brute-force multiplication of P by the column is ruled out in the case of dense objects. The first method is based on the idea of Markov chain Monte Carlo algorithms. This approach is efficient when the iterative process p_{t+1}^T = p_t^T P quickly reaches a steady state. Additionally, it takes into account another specific feature of P, namely, that the nonzero off-diagonal elements of P are equal within rows (this property is used to organize a random walk over the graph with the matrix P). Based on modern concentration-of-measure inequalities, new bounds for the running time of this method are presented that take into account the specific features of P. In the second method, the search for a ranking vector is reduced to finding the equilibrium in the antagonistic matrix game min_{p ∈ S_n(1)} max_{u ∈ S_n(1)} u^T (P^T - I) p, where S_n(1) is the unit simplex in ℝ^n and I is the identity matrix. The arising problem is solved by applying a slightly modified Grigoriadis-Khachiyan algorithm (1995). This technique, like the Nazin-Polyak method (2009), is a randomized version of Nemirovski's mirror descent method. The difference is that randomization in the Grigoriadis-Khachiyan algorithm is used when the gradient is projected onto the simplex rather than when the stochastic gradient is computed. For sparse matrices P, the method proposed yields noticeably better results.
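
    As a generic illustration of the first (MCMC) idea, a random walk with teleportation estimates PageRank from visit frequencies. This is a simplified sketch of the general principle, not the paper's algorithm or its running-time analysis.

        import numpy as np

        def mcmc_pagerank(adj, n_walks=20000, walk_len=50, damping=0.85, seed=0):
            # With probability `damping` follow a uniformly random out-link,
            # otherwise jump to a uniformly random node; count visit frequencies.
            rng = np.random.default_rng(seed)
            n = len(adj)
            visits = np.zeros(n)
            for _ in range(n_walks):
                v = rng.integers(n)
                for _ in range(walk_len):
                    if adj[v] and rng.random() < damping:
                        v = adj[v][rng.integers(len(adj[v]))]
                    else:
                        v = rng.integers(n)  # teleport (also handles dangling nodes)
                    visits[v] += 1
            return visits / visits.sum()

        # toy 4-node graph given as adjacency lists
        adj = [[1, 2], [2], [0], [0, 2]]
        print(mcmc_pagerank(adj).round(3))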

  1. 3D statistical shape models incorporating 3D random forest regression voting for robust CT liver segmentation

    NASA Astrophysics Data System (ADS)

    Norajitra, Tobias; Meinzer, Hans-Peter; Maier-Hein, Klaus H.

    2015-03-01

    During image segmentation, 3D Statistical Shape Models (SSM) usually conduct a limited search for target landmarks within one-dimensional search profiles perpendicular to the model surface. In addition, landmark appearance is modeled only locally based on linear profiles and weak learners, altogether leading to segmentation errors from landmark ambiguities and limited search coverage. We present a new method for 3D SSM segmentation based on 3D Random Forest Regression Voting. For each surface landmark, a Random Regression Forest is trained to learn a 3D spatial displacement function between the corresponding reference landmark and a set of surrounding sample points, based on an infinite set of non-local randomized 3D Haar-like features. Landmark search is then conducted omni-directionally within 3D search spaces, where voxel-wise forest predictions of the landmark position contribute to a common voting map reflecting the overall position estimate. Segmentation experiments were conducted on a set of 45 CT volumes of the human liver, of which 40 images were randomly chosen for training and 5 for testing. Without parameter optimization, using simple candidate selection and a single-resolution approach, excellent results were achieved, while faster convergence and better concavity segmentation were observed, altogether underlining the potential of our approach in terms of increased robustness from distinct landmark detection and better search coverage.

  2. SEGMA: An Automatic SEGMentation Approach for Human Brain MRI Using Sliding Window and Random Forests

    PubMed Central

    Serag, Ahmed; Wilkinson, Alastair G.; Telford, Emma J.; Pataky, Rozalia; Sparrow, Sarah A.; Anblagan, Devasuda; Macnaught, Gillian; Semple, Scott I.; Boardman, James P.

    2017-01-01

    Quantitative volumes from brain magnetic resonance imaging (MRI) acquired across the life course may be useful for investigating long term effects of risk and resilience factors for brain development and healthy aging, and for understanding early life determinants of adult brain structure. Therefore, there is an increasing need for automated segmentation tools that can be applied to images acquired at different life stages. We developed an automatic segmentation method for human brain MRI, where a sliding window approach and a multi-class random forest classifier were applied to high-dimensional feature vectors for accurate segmentation. The method performed well on brain MRI data acquired from 179 individuals, analyzed in three age groups: newborns (38–42 weeks gestational age), children and adolescents (4–17 years) and adults (35–71 years). As the method can learn from partially labeled datasets, it can be used to segment large-scale datasets efficiently. It could also be applied to different populations and imaging modalities across the life course. PMID:28163680

  3. Genome-wide association study for backfat thickness in Canchim beef cattle using Random Forest approach

    PubMed Central

    2013-01-01

    Background Meat quality involves many traits, such as marbling, tenderness, juiciness, and backfat thickness, all of which require attention from livestock producers. Backfat thickness improvement by means of traditional selection techniques in Canchim beef cattle has been challenging due to its low heritability, and because it is measured late in an animal's life. Therefore, the implementation of new methodologies for the identification of single nucleotide polymorphisms (SNPs) linked to backfat thickness is an important strategy for genetic improvement of carcass and meat quality. Results The set of SNPs identified by the random forest approach explained as much as 50% of the deregressed estimated breeding value (dEBV) variance associated with backfat thickness, and a small set of 5 SNPs was able to explain 34% of the dEBV for backfat thickness. Several quantitative trait loci (QTL) for fat-related traits were found in the surrounding areas of the SNPs, as well as many genes with roles in lipid metabolism. Conclusions These results provide a better understanding of backfat deposition and regulation pathways, and can be considered a starting point for future implementation of a genomic selection program for backfat thickness in Canchim beef cattle. PMID:23738659

  4. Diagnosis of colorectal cancer by near-infrared optical fiber spectroscopy and random forest

    NASA Astrophysics Data System (ADS)

    Chen, Hui; Lin, Zan; Wu, Hegang; Wang, Li; Wu, Tong; Tan, Chao

    2015-01-01

    Near-infrared (NIR) spectroscopy has several advantages: it is noninvasive, fast, relatively inexpensive, and carries no risk of ionizing radiation. Differences in NIR signals can reflect many physiological changes, which are in turn associated with factors such as vascularization, cellularity, oxygen consumption, or remodeling. NIR spectral differences between colorectal cancer and healthy tissues were investigated. A Fourier transform NIR spectroscopy instrument equipped with a fiber-optic probe was used to mimic in situ clinical measurements. A total of 186 spectra were collected and then preprocessed with the standard normal variate (SNV) transformation to remove unwanted background variance. All specimens and spots used for spectral collection were confirmed by staining and examination by an experienced pathologist to ensure they were representative of the pathology. Principal component analysis (PCA) was used to uncover possible clustering. Several methods, including random forest (RF), partial least squares-discriminant analysis (PLS-DA), K-nearest neighbors, and classification and regression trees (CART), were used to extract spectral features and to construct diagnostic models. Comparison reveals that, although no obvious difference in misclassification ratio (MCR) was observed between these models, RF is preferable since it is quicker, more convenient, and insensitive to over-fitting. The results indicate that NIR spectroscopy coupled with an RF model can serve as a potential tool for discriminating colorectal cancer tissues from normal ones.
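
    A minimal sketch of the SNV-plus-RF pipeline (ours; random placeholder "spectra", so the score is chance-level, but the structure mirrors the described workflow):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        def snv(spectra):
            # standard normal variate: centre and scale each spectrum individually
            mu = spectra.mean(axis=1, keepdims=True)
            sd = spectra.std(axis=1, keepdims=True)
            return (spectra - mu) / sd

        # Hypothetical stand-in for the 186 measured spectra (rows) over wavelengths.
        rng = np.random.default_rng(0)
        X = rng.standard_normal((186, 500))
        y = rng.integers(0, 2, 186)  # 0 = healthy, 1 = cancer (placeholder labels)

        rf = RandomForestClassifier(n_estimators=300, random_state=0)
        print(cross_val_score(rf, snv(X), y, cv=5).mean())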

  5. IChemPIC: A Random Forest Classifier of Biological and Crystallographic Protein-Protein Interfaces.

    PubMed

    Da Silva, Franck; Desaphy, Jérémy; Bret, Guillaume; Rognan, Didier

    2015-09-28

    Protein-protein interactions are becoming a major focus of academic and pharmaceutical research to identify low molecular weight compounds able to modulate oligomeric signaling complexes. As the number of protein complexes of known three-dimensional structure is constantly increasing, there is a need to discard biologically irrelevant interfaces and prioritize those of high value for potential druggability assessment. A Random Forest model has been trained on a set of 300 protein-protein interfaces using 45 molecular interaction descriptors as input. It is able to predict the nature of external test interfaces (crystallographic vs biological) with accuracy at least equal to that of the best state-of-the-art methods. However, our method presents unique advantages in the early prioritization of potentially ligandable protein-protein interfaces: (i) it is equally robust in predicting either crystallographic or biological contacts and (ii) it can be applied to a wide array of oligomeric complexes ranging from small-sized biological interfaces to large crystallographic contacts.

  6. Combinations of Stressors in Midlife: Examining Role and Domain Stressors Using Regression Trees and Random Forests

    PubMed Central

    2013-01-01

    Objectives. Global perceptions of stress (GPS) have major implications for mental and physical health, and stress in midlife may influence adaptation in later life. Thus, it is important to determine the unique and interactive effects of diverse influences of role stress (at work or in personal relationships), loneliness, life events, time pressure, caregiving, finances, discrimination, and neighborhood circumstances on these GPS. Method. Exploratory regression trees and random forests were used to examine complex interactions among myriad events and chronic stressors in middle-aged participants’ (N = 410; mean age = 52.12) GPS. Results. Different role and domain stressors were influential at high and low levels of loneliness. Varied combinations of these stressors resulting in similar levels of perceived stress are also outlined as examples of equifinality. Loneliness emerged as an important predictor across trees. Discussion. Exploring multiple stressors simultaneously provides insights into the diversity of stressor combinations across individuals—even those with similar levels of global perceived stress—and answers theoretical mandates to better understand the influence of stress by sampling from many domain and role stressors. Further, the unique influences of each predictor relative to the others inform theory and applied work. Finally, examples of equifinality and multifinality call for targeted interventions. PMID:23341437

  7. Random Forest Segregation of Drug Responses May Define Regions of Biological Significance.

    PubMed

    Bukhari, Qasim; Borsook, David; Rudin, Markus; Becerra, Lino

    2016-01-01

    The ability to assess brain responses in an unsupervised manner based on fMRI measures has remained a challenge. Here we have applied the Random Forest (RF) method to detect differences in the pharmacological MRI (phMRI) response in rats to treatment with an analgesic drug (buprenorphine) as compared to control (saline). Three groups of animals were studied: two groups treated with different doses of the opioid buprenorphine, low dose (LD) and high dose (HD), and one receiving saline. PhMRI responses were evaluated in 45 brain regions and RF analysis was applied to allocate rats to the individual treatment groups. RF analysis was able to identify drug effects based on differential phMRI responses in the hippocampus, amygdala, nucleus accumbens, superior colliculus, and the lateral and posterior thalamus for drug vs. saline. These structures have high levels of mu opioid receptors. In addition, these regions are involved in aversive signaling, which is inhibited by mu opioids. The results demonstrate that buprenorphine-mediated phMRI responses comprise characteristic features that allow a supervised differentiation from placebo-treated rats, as well as the proper allocation to the respective drug dose group, using the RF method, a method that has been successfully applied in clinical studies.

  8. Object-oriented mapping of urban trees using Random Forest classifiers

    NASA Astrophysics Data System (ADS)

    Puissant, Anne; Rougier, Simon; Stumpf, André

    2014-02-01

    Since vegetation in urban areas delivers crucial ecological services as a support to human well-being and to the urban population in general, its monitoring is a major issue for urban planners. Mapping and monitoring the changes in urban green spaces are important tasks because of their functions such as the management of air, climate and water quality, the reduction of noise, the protection of species and the development of recreational activities. In this context, the objective of this work is to propose a methodology to inventory and map the urban tree spaces from a mono-temporal very high resolution (VHR) optical image using a Random Forest classifier in combination with object-oriented approaches. The methodology is developed and its performance is evaluated on a dataset of the city of Strasbourg (France) for different categories of built-up areas. The results indicate a good accuracy and a high robustness for the classification of the green elements in terms of user and producer accuracies.

  9. Diagnosis of colorectal cancer by near-infrared optical fiber spectroscopy and random forest.

    PubMed

    Chen, Hui; Lin, Zan; Wu, Hegang; Wang, Li; Wu, Tong; Tan, Chao

    2015-01-25

    Near-infrared (NIR) spectroscopy has several advantages: it is noninvasive, fast, relatively inexpensive, and carries no risk of ionizing radiation. Differences in NIR signals can reflect many physiological changes, which are in turn associated with factors such as vascularization, cellularity, oxygen consumption, or remodeling. NIR spectral differences between colorectal cancer and healthy tissues were investigated. A Fourier transform NIR spectroscopy instrument equipped with a fiber-optic probe was used to mimic in situ clinical measurements. A total of 186 spectra were collected and then preprocessed with the standard normal variate (SNV) transformation to remove unwanted background variance. All specimens and spots used for spectral collection were confirmed by staining and examination by an experienced pathologist to ensure they were representative of the pathology. Principal component analysis (PCA) was used to uncover possible clustering. Several methods, including random forest (RF), partial least squares-discriminant analysis (PLS-DA), K-nearest neighbors, and classification and regression trees (CART), were used to extract spectral features and to construct diagnostic models. Comparison reveals that, although no obvious difference in misclassification ratio (MCR) was observed between these models, RF is preferable since it is quicker, more convenient, and insensitive to over-fitting. The results indicate that NIR spectroscopy coupled with an RF model can serve as a potential tool for discriminating colorectal cancer tissues from normal ones.

  10. Selective of informative metabolites using random forests based on model population analysis.

    PubMed

    Huang, Jian-Hua; Yan, Jun; Wu, Qing-Hua; Duarte Ferro, Miguel; Yi, Lun-Zhao; Lu, Hong-Mei; Xu, Qing-Song; Liang, Yi-Zeng

    2013-12-15

    One of the main goals of metabolomics studies is to discover informative metabolites or biomarkers, which may be used to diagnose diseases and to elucidate pathology. Sophisticated feature selection approaches are required to extract the information hidden in such complex 'omics' data. In this study, a new and robust selection method is proposed, combining random forests (RF) with model population analysis (MPA), for selecting informative metabolites from three metabolomic datasets. According to their contribution to classification accuracy, the metabolites were classified into three kinds: informative, non-informative, and interfering metabolites. Based on the proposed method, informative metabolites were selected for the three datasets; further analyses of these metabolites between healthy and diseased groups were then performed, with t-tests showing that the P values for all selected metabolites were below 0.05. Moreover, the informative metabolites identified by the current method were demonstrated to be correlated with the clinical outcome under investigation. The source code of MPA-RF in Matlab can be freely downloaded from http://code.google.com/p/my-research-list/downloads/list.

  11. Risk Prediction of One-Year Mortality in Patients with Cardiac Arrhythmias Using Random Survival Forest.

    PubMed

    Miao, Fen; Cai, Yun-Peng; Zhang, Yu-Xiao; Li, Ye; Zhang, Yuan-Ting

    2015-01-01

    Existing models for predicting mortality based on the traditional Cox proportional hazards approach (CPH) often have low prediction accuracy. This paper aims to develop a clinical risk model with good accuracy for predicting 1-year mortality in cardiac arrhythmia patients using random survival forest (RSF), a robust approach for survival analysis. 10,488 cardiac arrhythmia patients available in the public MIMIC II clinical database were investigated, with 3,452 deaths occurring within the 1-year follow-up. Forty risk factors, including demographics, clinical and laboratory information, and antiarrhythmic agents, were analyzed as potential predictors of all-cause mortality. RSF was adopted to build a comprehensive survival model and a simplified risk model composed of the 14 top risk factors. The comprehensive model achieved a prediction accuracy of 0.81 measured by c-statistic with 10-fold cross-validation. The simplified risk model also achieved a good accuracy of 0.799. Both results outperformed traditional CPH (which achieved a c-statistic of 0.733 for the comprehensive model and 0.718 for the simplified model). Moreover, various factors are observed to have a nonlinear impact on cardiac arrhythmia prognosis. As a result, the RSF-based model, which takes nonlinearity into account, significantly outperformed the traditional Cox proportional hazards model and has great potential to be a more effective approach for survival analysis.
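
    A hedged sketch of fitting a random survival forest, assuming the scikit-survival package is available (synthetic data with a deliberately nonlinear effect; names and parameters are ours):

        import numpy as np
        from sksurv.ensemble import RandomSurvivalForest
        from sksurv.util import Surv
        from sksurv.metrics import concordance_index_censored

        # Hypothetical stand-in for the risk factors; the nonlinear term is the
        # kind of effect a CPH model misses but an RSF can capture.
        rng = np.random.default_rng(0)
        n = 500
        X = rng.standard_normal((n, 10))
        risk = X[:, 0] + 0.5 * X[:, 1] ** 2
        time = rng.exponential(np.exp(-risk))
        event = rng.random(n) < 0.6  # ~60% observed deaths, rest censored
        y = Surv.from_arrays(event=event, time=time)

        rsf = RandomSurvivalForest(n_estimators=200, min_samples_leaf=10,
                                   random_state=0).fit(X, y)
        c_index = concordance_index_censored(event, time, rsf.predict(X))[0]
        print(round(c_index, 3))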

  12. A New MAC Address Spoofing Detection Technique Based on Random Forests

    PubMed Central

    Alotaibi, Bandar; Elleithy, Khaled

    2016-01-01

    Media access control (MAC) addresses in wireless networks can be trivially spoofed using off-the-shelf devices. The aim of this research is to detect MAC address spoofing in wireless networks using a hard-to-spoof measurement that is correlated with the location of the wireless device, namely the received signal strength (RSS). We developed a passive solution that does not require modifications to standards or protocols. The solution was tested in a live test-bed (i.e., a wireless local area network with the aid of two air monitors acting as sensors) and achieved 99.77%, 93.16% and 88.38% accuracy when the attacker is 8–13 m, 4–8 m and less than 4 m away from the victim device, respectively. We implemented three previous methods on the same test-bed and found that our solution outperforms existing solutions. Our solution is based on an ensemble method known as random forests. PMID:26927103

  13. Prediction of Protein–Protein Interaction Sites in Sequences and 3D Structures by Random Forests

    PubMed Central

    Šikić, Mile; Tomić, Sanja; Vlahoviček, Kristian

    2009-01-01

    Identifying interaction sites in proteins provides important clues to the function of a protein and is becoming increasingly relevant in topics such as systems biology and drug discovery. Although there are numerous papers on the prediction of interaction sites using information derived from structure, there are only a few case reports on the prediction of interaction residues based solely on protein sequence. Here, a sliding window approach is combined with the Random Forests method to predict protein interaction sites using (i) a combination of sequence- and structure-derived parameters and (ii) sequence information alone. For sequence-based prediction we achieved a precision of 84% with a 26% recall and an F-measure of 40%. When combined with structural information, the prediction performance increases to a precision of 76% and a recall of 38% with an F-measure of 51%. We also present an attempt to rationalize the sliding window size and demonstrate that a nine-residue window is the most suitable for predictor construction. Finally, we demonstrate the applicability of our prediction methods by modeling the Ras–Raf complex using predicted interaction sites as target binding interfaces. Our results suggest that it is possible to predict protein interaction sites with quite a high accuracy using only sequence information. PMID:19180183
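
    The sequence-only variant can be sketched as follows (our illustration): one-hot encoding each nine-residue sliding window, matching the window size the authors found most suitable, to produce fixed-length feature vectors for a random forest.

        import numpy as np

        AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
        AA_INDEX = {a: i for i, a in enumerate(AMINO_ACIDS)}

        def window_features(seq, window=9):
            # One-hot encode each sliding window centred on the residue to be
            # classified as interface / non-interface; out-of-range positions
            # are left as zero padding.
            half = window // 2
            n, k = len(seq), len(AMINO_ACIDS)
            feats = np.zeros((n, window * k))
            for i in range(n):
                for w in range(-half, half + 1):
                    j = i + w
                    if 0 <= j < n:
                        feats[i, (w + half) * k + AA_INDEX[seq[j]]] = 1.0
            return feats

        X = window_features("MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ")
        print(X.shape)  # (sequence length, 9 * 20) feature vectors for an RF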

  14. Discrimination of fish populations using parasites: Random Forests on a 'predictable' host-parasite system.

    PubMed

    Pérez-Del-Olmo, A; Montero, F E; Fernández, M; Barrett, J; Raga, J A; Kostadinova, A

    2010-10-01

    We address the effect of spatial scale and temporal variation on model generality when forming predictive models for fish assignment using a new data mining approach, Random Forests (RF), applied to variable biological markers (parasite community data). Models were implemented for a fish host-parasite system sampled along the Mediterranean and Atlantic coasts of Spain and were validated using independent datasets. We considered 2 basic classification problems in evaluating the importance of variations in parasite infracommunities for the assignment of individual fish to their populations of origin: a multiclass task (2-5 population models, using 2 seasonal replicates from each of the populations) and a 2-class task (using 4 seasonal replicates from 1 Atlantic and 1 Mediterranean population each). The main results are that (i) RF are well suited for multiclass population assignment using parasite communities in non-migratory fish; (ii) RF provide an efficient means for model cross-validation on the baseline data, which allows sample size limitations in parasite tag studies to be tackled effectively; (iii) the performance of RF depends on the complexity and spatial extent/configuration of the problem; and (iv) the development of predictive models is strongly influenced by seasonal change, which stresses the importance of both temporal replication and model validation in parasite tagging studies.

  15. Random Forest Segregation of Drug Responses May Define Regions of Biological Significance

    PubMed Central

    Bukhari, Qasim; Borsook, David; Rudin, Markus; Becerra, Lino

    2016-01-01

    The ability to assess brain responses in an unsupervised manner based on fMRI measures has remained a challenge. Here we have applied the Random Forest (RF) method to detect differences in the pharmacological MRI (phMRI) response in rats to treatment with an analgesic drug (buprenorphine) as compared to control (saline). Three groups of animals were studied: two groups treated with different doses of the opioid buprenorphine, low dose (LD) and high dose (HD), and one receiving saline. PhMRI responses were evaluated in 45 brain regions and RF analysis was applied to allocate rats to the individual treatment groups. RF analysis was able to identify drug effects based on differential phMRI responses in the hippocampus, amygdala, nucleus accumbens, superior colliculus, and the lateral and posterior thalamus for drug vs. saline. These structures have high levels of mu opioid receptors. In addition, these regions are involved in aversive signaling, which is inhibited by mu opioids. The results demonstrate that buprenorphine-mediated phMRI responses comprise characteristic features that allow a supervised differentiation from placebo-treated rats, as well as the proper allocation to the respective drug dose group, using the RF method, a method that has been successfully applied in clinical studies. PMID:27014046

  16. Semantic classification of urban buildings combining VHR image and GIS data: An improved random forest approach

    NASA Astrophysics Data System (ADS)

    Du, Shihong; Zhang, Fangli; Zhang, Xiuyuan

    2015-07-01

    While most existing studies have focused on extracting geometric information on buildings, only a few have concentrated on semantic information. The lack of semantic information cannot satisfy many demands on resolving environmental and social issues. This study presents an approach to semantically classify buildings into much finer categories than those of existing studies by learning a random forest (RF) classifier from a large number of imbalanced samples with high-dimensional features. First, a two-level segmentation mechanism combining GIS and VHR imagery produces single image objects at a large scale and intra-object components at a small scale. Second, a semi-supervised method chooses a large number of unbiased samples by considering the spatial proximity and intra-cluster similarity of buildings. Third, two important improvements to the RF classifier are made: a voting-distribution ranked rule for reducing the influence of imbalanced samples on classification accuracy, and a feature importance measurement for evaluating each feature's contribution to the recognition of each category. Fourth, the semantic classification of urban buildings is practically conducted in Beijing, and the results demonstrate that the proposed approach is effective and accurate. The seven categories used in the study are finer than those in existing work and more helpful for studying many environmental and social problems.
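
    The paper's voting-distribution ranked rule is specific to its implementation; a common off-the-shelf way to counter imbalanced samples in an RF, shown here as a hedged sketch, is per-bootstrap class weighting:

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import classification_report

        # Imbalanced toy data: one building category much rarer than the other.
        X, y = make_classification(n_samples=2000, n_features=20,
                                   weights=[0.95, 0.05], random_state=0)

        # class_weight="balanced_subsample" reweights classes within each
        # bootstrap sample, reducing the bias of majority classes on RF voting.
        rf = RandomForestClassifier(n_estimators=300,
                                    class_weight="balanced_subsample",
                                    random_state=0).fit(X, y)
        print(classification_report(y, rf.predict(X)))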

  17. Brain Tumour Segmentation based on Extremely Randomized Forest with high-level features.

    PubMed

    Pinto, Adriano; Pereira, Sergio; Correia, Higino; Oliveira, J; Rasteiro, Deolinda M L D; Silva, Carlos A

    2015-08-01

    Gliomas are among the most common and aggressive brain tumours. Segmentation of these tumours is important for surgery and treatment planning, as well as for follow-up evaluations. However, it is a difficult task, given that their size and location are variable and the delineation of all tumour tissue is not trivial, even with all the different modalities of Magnetic Resonance Imaging (MRI). We propose a discriminative and fully automatic method for the segmentation of gliomas, using appearance- and context-based features to feed an Extremely Randomized Forest (Extra-Trees). Some of these features are computed over a non-linear transformation of the image. The proposed method was evaluated using the publicly available Challenge database from BraTS 2013, obtaining a Dice score of 0.83, 0.78 and 0.73 for the complete tumour, the core region and the enhanced region, respectively. Our results are competitive when compared against other results reported using the same database.

  18. Parallel and deterministic algorithms from MRFs (Markov Random Fields): Surface reconstruction and integration. Memorandum report

    SciTech Connect

    Geiger, D.; Girosi, F.

    1989-05-01

    In recent years many researchers have investigated the use of Markov random fields (MRFs) for computer vision. They can be applied, for example, at the output of the visual processes to reconstruct surfaces from sparse and noisy depth data, or to integrate early vision processes to label physical discontinuities. Drawbacks of MRF models have been the computational complexity of the implementation and the difficulty in estimating the parameters of the model. This paper derives deterministic approximations to MRF models. One of the considered models is shown to give rise, in a natural way, to the graduated non-convexity (GNC) algorithm. This model can be applied to smooth a field while preserving its discontinuities. A new model is then proposed: it allows the gradient of the field to be enhanced at the discontinuities and smoothed elsewhere. All the theoretical results are obtained in the framework of mean field theory, a well-known statistical mechanics technique. A fast, parallel, and iterative algorithm to solve the deterministic equations of the two models is presented, together with experiments on synthetic and real images. The algorithm is applied to the problem of surface reconstruction in the case of sparse data. A fast algorithm is also described that solves the problem of aligning the discontinuities of different visual models with intensity edges via integration.

  19. Optical double image security using random phase fractional Fourier domain encoding and phase-retrieval algorithm

    NASA Astrophysics Data System (ADS)

    Rajput, Sudheesh K.; Nishchal, Naveen K.

    2017-04-01

    We propose a novel security scheme based on double random phase fractional Fourier domain encoding (DRPE) and a modified Gerchberg-Saxton (G-S) phase-retrieval algorithm for securing two images simultaneously. One of the images to be encrypted is converted into a phase-only image using the modified G-S algorithm, and this function is used as a key for encrypting the other image. The original images are retrieved by employing the concept of a known-plaintext attack and following the DRPE decryption steps with all correct keys. The proposed scheme is also used for the encryption of two color images with the help of the convolution theorem and the phase-truncated fractional Fourier transform. With some modification, the scheme is extended to the simultaneous encryption of gray-scale and color images. As a proof of concept, simulation results are presented for securing two gray-scale images, two color images, and simultaneous gray-scale and color images.

  20. A novel chaotic block image encryption algorithm based on dynamic random growth technique

    NASA Astrophysics Data System (ADS)

    Wang, Xingyuan; Liu, Lintao; Zhang, Yingqian

    2015-03-01

    This paper proposes a new block image encryption scheme based on hybrid chaotic maps and a dynamic random growth technique. Since the cat map is periodic and can easily be cracked by a chosen-plaintext attack, we use the cat map in a more secure way that completely eliminates the cyclical phenomenon and resists chosen-plaintext attack. In the diffusion process, an intermediate parameter is calculated from the image block and used as the initial parameter of a chaotic map to generate a random data stream. In this way, the generated key streams depend on the plaintext image, which allows the scheme to resist chosen-plaintext attack. The experimental results show that the proposed encryption algorithm is secure enough to be used in image transmission systems.
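
    For context on the periodicity problem the authors work around, a minimal implementation of the classical Arnold cat map (our sketch) shows how iterating it eventually restores the original image:

        import numpy as np

        def arnold_cat_map(img, iterations=1):
            # The pixel at (x, y) moves to ((x + y) mod N, (x + 2y) mod N).
            # The map is a bijection on an N x N grid, and it is periodic:
            # iterating long enough returns the original image.
            n = img.shape[0]
            assert img.shape[0] == img.shape[1], "cat map needs a square image"
            out = img
            for _ in range(iterations):
                x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
                scrambled = np.empty_like(out)
                scrambled[(x + y) % n, (x + 2 * y) % n] = out[x, y]
                out = scrambled
            return out

        img = np.arange(64, dtype=np.uint8).reshape(8, 8)
        cur, k = arnold_cat_map(img), 1
        while not np.array_equal(cur, img):  # count iterations until restored
            cur, k = arnold_cat_map(cur), k + 1
        print(f"period for an 8x8 image: {k}")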

  1. Random search algorithm for solving the nonlinear Fredholm integral equations of the second kind.

    PubMed

    Hong, Zhimin; Yan, Zaizai; Yan, Jiao

    2014-01-01

    In this paper, a randomized numerical approach is used to obtain approximate solutions for a class of nonlinear Fredholm integral equations of the second kind. The proposed approach contains two steps: first, we define a discretized form of the integral equation by quadrature formula methods; the solution of this discretized form converges to the exact solution of the integral equation under some conditions on the kernel. We then convert the problem to an optimal control problem by introducing an artificial control function. In the second step, the solution of the discretized form is approximated by a kind of Monte Carlo (MC) random search algorithm. Finally, some examples are given to show the efficiency of the proposed approach.

  2. Applications of the BIOPHYS Algorithm for Physically-Based Retrieval of Biophysical, Structural and Forest Disturbance Information

    NASA Technical Reports Server (NTRS)

    Peddle, Derek R.; Huemmrich, K. Fred; Hall, Forrest G.; Masek, Jeffrey G.; Soenen, Scott A.; Jackson, Chris D.

    2011-01-01

    Canopy reflectance model inversion using look-up table approaches provides powerful and flexible options for deriving improved forest biophysical structural information (BSI) compared with traditional statistical empirical methods. The BIOPHYS algorithm is an improved, physically-based inversion approach for deriving BSI for independent use and validation, and for monitoring, inventorying and quantifying forest disturbance, as well as input to ecosystem, climate and carbon models. Based on the multiple-forward-mode (MFM) inversion approach, BIOPHYS results were summarized from different studies (Minnesota/NASA COVER; Virginia/LEDAPS; Saskatchewan/BOREAS), sensors (airborne MMR; Landsat; MODIS) and models (GeoSail; GOMS). Application outputs included forest density, height, crown dimension, branch and green leaf area, canopy cover, disturbance estimates based on multi-temporal chronosequences, and structural change following recovery from forest fires over the last century. Good correspondence with field validation data was obtained. Integrated analyses of imagery from multiple solar and view angles further improved retrievals compared with single-pass data. Quantifying ecosystem dynamics such as the area and percentage of forest disturbance, early regrowth and succession provides essential inputs to process-driven models of carbon flux. BIOPHYS is well suited for large-area, multi-temporal applications involving multiple image sets and mosaics for assessing vegetation disturbance and quantifying biophysical structural dynamics and change. It is also suitable for integration with forest inventory, monitoring, updating, and other programs.

  3. Evolving random fractal Cantor superlattices for the infrared using a genetic algorithm.

    PubMed

    Bossard, Jeremy A; Lin, Lan; Werner, Douglas H

    2016-01-01

    Ordered and chaotic superlattices have been identified in Nature that give rise to a variety of colours reflected by the skin of various organisms. In particular, organisms such as silvery fish possess superlattices that reflect a broad range of light from the visible to the UV. Such superlattices have previously been identified as 'chaotic', but we propose that apparent 'chaotic' natural structures, which have been previously modelled as completely random structures, should have an underlying fractal geometry. Fractal geometry, often described as the geometry of Nature, can be used to mimic structures found in Nature, but deterministic fractals produce structures that are too 'perfect' to appear natural. Introducing variability into fractals produces structures that appear more natural. We suggest that the 'chaotic' (purely random) superlattices identified in Nature are more accurately modelled by multi-generator fractals. Furthermore, we introduce fractal random Cantor bars as a candidate for generating both ordered and 'chaotic' superlattices, such as the ones found in silvery fish. A genetic algorithm is used to evolve optimal fractal random Cantor bars with multiple generators targeting several desired optical functions in the mid-infrared and the near-infrared. We present optimized superlattices demonstrating broadband reflection as well as single and multiple pass bands in the near-infrared regime.

  4. Evolving random fractal Cantor superlattices for the infrared using a genetic algorithm

    PubMed Central

    Bossard, Jeremy A.; Lin, Lan; Werner, Douglas H.

    2016-01-01

    Ordered and chaotic superlattices have been identified in Nature that give rise to a variety of colours reflected by the skin of various organisms. In particular, organisms such as silvery fish possess superlattices that reflect a broad range of light from the visible to the UV. Such superlattices have previously been identified as ‘chaotic’, but we propose that apparent ‘chaotic’ natural structures, which have been previously modelled as completely random structures, should have an underlying fractal geometry. Fractal geometry, often described as the geometry of Nature, can be used to mimic structures found in Nature, but deterministic fractals produce structures that are too ‘perfect’ to appear natural. Introducing variability into fractals produces structures that appear more natural. We suggest that the ‘chaotic’ (purely random) superlattices identified in Nature are more accurately modelled by multi-generator fractals. Furthermore, we introduce fractal random Cantor bars as a candidate for generating both ordered and ‘chaotic’ superlattices, such as the ones found in silvery fish. A genetic algorithm is used to evolve optimal fractal random Cantor bars with multiple generators targeting several desired optical functions in the mid-infrared and the near-infrared. We present optimized superlattices demonstrating broadband reflection as well as single and multiple pass bands in the near-infrared regime. PMID:26763335

  5. Estimating CT Image from MRI Data Using Structured Random Forest and Auto-context Model

    PubMed Central

    Huynh, Tri; Gao, Yaozong; Kang, Jiayin; Wang, Li; Zhang, Pei; Lian, Jun; Shen, Dinggang

    2015-01-01

    Computed tomography (CT) imaging is an essential tool in various clinical diagnoses and in radiotherapy treatment planning. Since CT image intensities are directly related to positron emission tomography (PET) attenuation coefficients, they are indispensable for attenuation correction (AC) of PET images. However, due to the relatively high dose of radiation exposure in a CT scan, it is advised to limit the acquisition of CT images. In addition, in the new combined PET and magnetic resonance (MR) imaging scanners, only MR images are available, and these are unfortunately not directly applicable to AC. These issues greatly motivate the development of methods for reliable estimation of a CT image from the corresponding MR image of the same subject. In this paper, we propose a learning-based method to tackle this challenging problem. Specifically, we first partition a given MR image into a set of patches. Then, for each patch, we use a structured random forest to directly predict a CT patch as a structured output, where a new ensemble model is also used to ensure robust prediction. Image features are innovatively crafted to achieve multi-level sensitivity, with spatial information integrated through only rigid-body alignment to help avoid error-prone inter-subject deformable registration. Moreover, we use an auto-context model to iteratively refine the prediction. Finally, we combine all of the predicted CT patches to obtain the final prediction for the given MR image. We demonstrate the efficacy of our method on two datasets: human brain and prostate images. Experimental results show that our method can accurately predict CT images in various scenarios, even for images undergoing large shape variation, and also outperforms two state-of-the-art methods. PMID:26241970

  6. Random Forests to Predict Rectal Toxicity Following Prostate Cancer Radiation Therapy

    SciTech Connect

    Ospina, Juan D.; Zhu, Jian; Chira, Ciprian; Bossi, Alberto; Delobel, Jean B.; Beckendorf, Véronique; Dubray, Bernard; Lagrange, Jean-Léon; Correa, Juan C.; and others

    2014-08-01

    Purpose: To propose a random forest normal tissue complication probability (RF-NTCP) model to predict late rectal toxicity following prostate cancer radiation therapy, and to compare its performance to that of classic NTCP models. Methods and Materials: Clinical data and dose-volume histograms (DVH) were collected from 261 patients who received 3-dimensional conformal radiation therapy for prostate cancer with at least 5 years of follow-up. The series was split 1000 times into training and validation cohorts. An RF was trained to predict the risk of 5-year overall rectal toxicity and bleeding. Parameters of the Lyman-Kutcher-Burman (LKB) model were identified and a logistic regression model was fit. The performance of all the models was assessed by computing the area under the receiver operating characteristic curve (AUC). Results: The 5-year grade ≥2 overall rectal toxicity and grade ≥1 and grade ≥2 rectal bleeding rates were 16%, 25%, and 10%, respectively. Predictive capabilities were obtained using the RF-NTCP model for all 3 toxicity endpoints, in both the training and validation cohorts. Age and use of anticoagulants were found to be predictors of rectal bleeding. The AUC for RF-NTCP ranged from 0.66 to 0.76, depending on the toxicity endpoint. The AUC values for LKB-NTCP were statistically significantly inferior, ranging from 0.62 to 0.69. Conclusions: The RF-NTCP model may be a useful new tool in predicting late rectal toxicity, including variables other than DVH, and thus appears to be a strong competitor to classic NTCP models.

  7. Classification of melanoma lesions using sparse coded features and random forests

    NASA Astrophysics Data System (ADS)

    Rastgoo, Mojdeh; Lemaître, Guillaume; Morel, Olivier; Massich, Joan; Garcia, Rafael; Meriaudeau, Fabrice; Marzani, Franck; Sidibé, Désiré

    2016-03-01

    Malignant melanoma is the most dangerous type of skin cancer, yet it is the most treatable kind of cancer when diagnosed early, which remains a challenging task for clinicians and dermatologists. In this regard, CAD systems based on machine learning and image processing techniques have been developed to differentiate melanoma lesions from benign and dysplastic nevi using dermoscopic images. Generally, these frameworks are composed of sequential processes: pre-processing, segmentation, and classification. This architecture faces two main challenges: (i) each process is complex, requires tuning a set of parameters, and is specific to a given dataset; and (ii) the performance of each process depends on the previous one, so errors accumulate throughout the framework. In this paper, we propose a framework for melanoma classification based on sparse coding which does not rely on any pre-processing or lesion segmentation. Our framework uses a Random Forests classifier and sparse representations of three features: SIFT, Hue and Opponent angle histograms, and RGB intensities. The experiments are carried out on the public PH2 dataset using 10-fold cross-validation. The results show that the sparse-coded SIFT feature achieves the highest performance, with sensitivity and specificity of 100% and 90.3% respectively, for a dictionary size of 800 atoms and a sparsity level of 2. Furthermore, the descriptor based on RGB intensities achieves similar results, with sensitivity and specificity of 100% and 71.3% respectively, for a smaller dictionary size of 100 atoms. In conclusion, dictionary learning techniques encode strong structures of dermoscopic images and provide discriminant descriptors.

  8. Using Random Forest to Improve the Downscaling of Global Livestock Census Data

    PubMed Central

    Nicolas, Gaëlle; Robinson, Timothy P.; Wint, G. R. William; Conchedda, Giulia; Cinardi, Giuseppina; Gilbert, Marius

    2016-01-01

    Large scale, high-resolution global data on farm animal distributions are essential for spatially explicit assessments of the epidemiological, environmental and socio-economic impacts of the livestock sector. This has been the major motivation behind the development of the Gridded Livestock of the World (GLW) database, which has been extensively used since its first publication in 2007. The database relies on a downscaling methodology whereby census counts of animals in sub-national administrative units are redistributed at the level of grid cells as a function of a series of spatial covariates. The recent upgrade of GLW1 to GLW2 involved automating the processing, improving the input data, and downscaling at a spatial resolution of 1 km per cell (5 km per cell in the earlier version). The underlying statistical methodology, however, remained unchanged. In this paper, we evaluate new methods to downscale census data with higher accuracy and increased processing efficiency. Two main factors were evaluated, based on sample census datasets of cattle in Africa and chickens in Asia. First, we implemented and evaluated Random Forest (RF) models instead of stratified regressions. Second, we investigated whether models that predicted the number of animals per rural person (per capita) could provide better downscaled estimates than the previous approach, which predicted absolute densities (animals per km2). RF models consistently provided better predictions than the stratified regressions for both continents and species. The benefit of per capita over absolute density models varied according to the species and continent. In addition, different technical options were evaluated to reduce the processing time while maintaining predictive power. Future GLW runs (GLW 3.0) will apply the new RF methodology with optimized modelling options. The potential benefit of per capita models will need to be further investigated with a better distinction between rural and agricultural
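
    The downscaling step itself is compact. A hedged sketch follows (synthetic covariates, a made-up census total, and hypothetical column names; the production GLW pipeline is considerably richer): a random forest regression is trained at the administrative-unit level, and its per-cell density predictions are rescaled so that they sum back to the unit's census count, keeping the redistribution mass-preserving.

    ```python
    import numpy as np
    import pandas as pd
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(1)
    # One row per administrative unit: spatial covariates and a cattle density
    units = pd.DataFrame({"rainfall": rng.uniform(200, 1200, 300),
                          "ndvi": rng.uniform(0.1, 0.8, 300),
                          "travel_time": rng.uniform(0, 24, 300)})
    units["cattle_per_km2"] = (0.05 * units.rainfall * units.ndvi
                               + rng.normal(0, 5, 300)).clip(0)

    rf = RandomForestRegressor(n_estimators=500, random_state=1)
    rf.fit(units[["rainfall", "ndvi", "travel_time"]], units["cattle_per_km2"])

    # Predict density for the grid cells of one unit, then rescale so the cell
    # totals reproduce the reported census count (mass-preserving downscaling)
    cells = pd.DataFrame({"rainfall": rng.uniform(200, 1200, 1000),
                          "ndvi": rng.uniform(0.1, 0.8, 1000),
                          "travel_time": rng.uniform(0, 24, 1000)})
    density = rf.predict(cells)
    census_total = 50_000.0
    counts = density / density.sum() * census_total
    print("redistributed total:", counts.sum())   # equals census_total by design
    ```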

  9. Object-based gully system prediction from medium resolution imagery using Random Forests

    NASA Astrophysics Data System (ADS)

    Shruthi, Rajesh B. V.; Kerle, Norman; Jetten, Victor; Stein, Alfred

    2014-07-01

    Erosion, in particular gully erosion, is a widespread problem. Its mapping is crucial for erosion monitoring and for the remediation of degraded areas. In addition, mapping areas with high potential for future gully erosion can assist prevention strategies. Statistical relations with topographic variables collected in the field are appropriate for determining areas susceptible to gullying. Image analysis of high resolution remotely sensed imagery (HRI) in combination with field verification has proven to be a good approach, although it depends on expensive imagery. Automatic and semi-automatic methods, such as object-oriented analysis (OOA), are rapid and reproducible. However, HRI data are not always available. We therefore attempted to identify gully systems using statistical modeling of image features from medium resolution imagery, here ASTER. These data were used for determining areas within gully system boundaries (GSB) using a semi-automatic method based on OOA. We assess whether the selection of useful object features can be done in an objective and transferable way, using Random Forests (RF) for the prediction of gully systems at regional scale, here in the Sehoul region, near Rabat, Morocco. Moderate success was achieved using a semi-automatic object-based RF model (out-of-bag error of 18.8%). Besides compensating for the imbalance between gully and non-gully classes, the procedure followed in this study enabled us to balance the classification error rates. The user's and producer's accuracies obtained with balanced classes showed improved accuracy of the spatial estimates of gully systems compared to those obtained with imbalanced classes. The model over-predicted the area within the GSB (13-27%), but its overall performance demonstrated that medium resolution satellite images contain sufficient information to identify gully systems, so that large areas can be mapped with relatively little effort and acceptable accuracy.
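
    The class-balancing device can be illustrated with scikit-learn on a synthetic stand-in for the gully/non-gully objects (the paper's exact balancing procedure may differ): reweighting each bootstrap sample by class frequency evens out the per-class error rates, and the out-of-bag (OOB) samples provide the kind of error estimate quoted above.

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic, heavily imbalanced stand-in (~9:1 non-gully to gully objects)
    X, y = make_classification(n_samples=2000, n_features=12, weights=[0.9, 0.1],
                               random_state=0)

    # "balanced_subsample" reweights classes within each tree's bootstrap sample,
    # which balances the commission/omission error rates between the classes
    rf = RandomForestClassifier(n_estimators=500,
                                class_weight="balanced_subsample",
                                oob_score=True, random_state=0).fit(X, y)
    print("OOB error: %.3f" % (1 - rf.oob_score_))
    ```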

  10. Bush encroachment monitoring using multi-temporal Landsat data and random forests

    NASA Astrophysics Data System (ADS)

    Symeonakis, E.; Higginbottom, T.

    2014-11-01

    It is widely accepted that land degradation and desertification (LDD) are serious global threats to humans and the environment. Around a third of the savannahs in Africa are affected by LDD processes that may lead to substantial declines in ecosystem functioning and services. Indirectly, LDD can be monitored using relevant indicators. The encroachment of woody plants into grasslands, and the subsequent conversion of savannahs and open woodlands into shrublands, has attracted a lot of attention over recent decades and has been identified as a potential indicator of LDD. Mapping bush encroachment over large areas can only effectively be done using Earth Observation (EO) data and techniques. However, the accurate assessment of large-scale savannah degradation through bush encroachment with satellite imagery remains a formidable task, because vegetation variability in the satellite data, in response to highly variable rainfall patterns, can obscure the underlying degradation processes. Here, we present a methodological framework for monitoring bush encroachment-related land degradation in a savannah environment in the Northwest Province of South Africa. We utilise multi-temporal Landsat TM and ETM+ (SLC-on) data from 1989 until 2009, mostly from the dry season, together with ancillary data in a GIS environment. We then use the machine learning classification approach of random forests to identify the extent of encroachment over the 20-year period. The results show that in the study area, bush encroachment is as alarming as permanent vegetation loss. The classification for the year 2009 is validated, yielding low commission and omission errors and high k-statistic values for the grass and woody vegetation classes. Our approach is a step towards a rigorous and effective savannah degradation assessment.

  11. Insight into Best Variables for COPD Case Identification: A Random Forests Analysis

    PubMed Central

    Leidy, Nancy K.; Malley, Karen G.; Steenrod, Anna W.; Mannino, David M.; Make, Barry J.; Bowler, Russ P.; Thomashow, Byron M.; Barr, R. G.; Rennard, Stephen I.; Houfek, Julia F.; Yawn, Barbara P.; Han, Meilan K.; Meldrum, Catherine A.; Bacci, Elizabeth D.; Walsh, John W.; Martinez, Fernando

    2016-01-01

    Rationale This study is part of a larger, multi-method project to develop a questionnaire for identifying undiagnosed cases of chronic obstructive pulmonary disease (COPD) in primary care settings, with specific interest in the detection of patients with moderate to severe airway obstruction or risk of exacerbation. Objectives To examine 3 existing datasets for insight into key features of COPD that could be useful in the identification of undiagnosed COPD. Methods Random forests analyses were applied to the following databases: COPD Foundation Peak Flow Study Cohort (N=5761), Burden of Obstructive Lung Disease (BOLD) Kentucky site (N=508), and COPDGene® (N=10,214). Four scenarios were examined to find the best, smallest sets of variables that distinguished cases and controls: (1) moderate to severe COPD (forced expiratory volume in 1 second [FEV1] <50% predicted) versus no COPD; (2) undiagnosed versus diagnosed COPD; (3) COPD with and without exacerbation history; and (4) clinically significant COPD (FEV1 <60% predicted or history of acute exacerbation) versus all others. Results Sets of 4 to 8 variables were able to differentiate cases from controls, with sensitivity ≥73% (range: 73%–90%) and specificity >68% (range: 68%–93%). Across scenarios, the best models included age, smoking status or history, symptoms (cough, wheeze, phlegm), general or breathing-related activity limitation, episodes of acute bronchitis, and/or missed work days and non-work activities due to breathing or health. Conclusions The results provide insight into the variables that should be considered during the development of candidate items for a new questionnaire to identify undiagnosed cases of clinically significant COPD. PMID:26835508
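
    The search for the best, smallest variable sets can be sketched as follows (synthetic data; the study's databases and candidate items are of course different): rank the variables by random-forest importance, then evaluate progressively larger subsets and keep the smallest one whose cross-validated performance is acceptable.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=1000, n_features=30, n_informative=6,
                               random_state=0)

    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]   # most important first

    for k in range(2, 9):                               # try sets of 2..8 variables
        subset = order[:k]
        acc = cross_val_score(RandomForestClassifier(n_estimators=200,
                                                     random_state=0),
                              X[:, subset], y, cv=5).mean()
        print("%d variables -> CV accuracy %.3f" % (k, acc))
    ```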

  12. Deciphering the Routes of invasion of Drosophila suzukii by Means of ABC Random Forest.

    PubMed

    Fraimout, Antoine; Debat, Vincent; Fellous, Simon; Hufbauer, Ruth A; Foucaud, Julien; Pudlo, Pierre; Marin, Jean-Michel; Price, Donald K; Cattel, Julien; Chen, Xiao; Deprá, Marindia; François Duyck, Pierre; Guedot, Christelle; Kenis, Marc; Kimura, Masahito T; Loeb, Gregory; Loiseau, Anne; Martinez-Sañudo, Isabel; Pascual, Marta; Polihronakis Richmond, Maxi; Shearer, Peter; Singh, Nadia; Tamura, Koichiro; Xuéreb, Anne; Zhang, Jinping; Estoup, Arnaud

    2017-04-01

    Deciphering invasion routes from molecular data is crucial to understanding biological invasions, including identifying bottlenecks in population size and admixture among distinct populations. Here, we unravel the invasion routes of the invasive pest Drosophila suzukii using a multi-locus microsatellite dataset (25 loci on 23 worldwide sampling locations). To do this, we use approximate Bayesian computation (ABC), which has improved the reconstruction of invasion routes but can be computationally expensive. We use our study to illustrate the use of a new, more efficient ABC method, ABC random forest (ABC-RF), and compare it to a standard ABC method (ABC-LDA). We find that Japan emerges as the most probable source of the earliest recorded invasion into Hawaii. Southeast China and Hawaii together are the most probable sources of the populations in western North America, which in turn served as sources for those in eastern North America. European populations are genetically more homogeneous than North American populations, and their most probable source is northeast China, with evidence of limited gene flow from the eastern US as well. All introduced populations passed through bottlenecks, and the analyses reveal five distinct admixture events. These findings can inform hypotheses concerning how this species evolved in its different and independent source and invasive populations. Methodological comparisons indicate that ABC-RF and ABC-LDA show concordant results if ABC-LDA is based on a large number of simulated datasets, but that ABC-RF outperforms ABC-LDA when using a comparable and more manageable number of simulated datasets, especially when analyzing complex introduction scenarios.
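
    The essence of ABC-RF can be conveyed in a few lines (a toy two-scenario simulator and made-up observed summaries, nothing like the paper's microsatellite model): simulate datasets under each competing scenario, reduce each to summary statistics, train a random forest to recover the scenario label, and read off the forest's votes for the observed summaries.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    def simulate(scenario, n=50):
        """Toy simulator returning summary statistics under one scenario."""
        if scenario == 0:                    # single source, strong bottleneck
            het = rng.beta(2, 8, n)
        else:                                # admixture of two sources
            het = 0.5 * rng.beta(2, 8, n) + 0.5 * rng.beta(8, 2, n)
        return [het.mean(), het.var(), np.median(het)]

    # Reference table: simulated datasets labelled by the generating scenario
    scenarios = rng.integers(0, 2, size=5000)
    stats = np.array([simulate(s) for s in scenarios])

    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    rf.fit(stats, scenarios)
    observed = [[0.24, 0.03, 0.22]]          # summaries of the "real" data (made up)
    print("votes per scenario:", rf.predict_proba(observed))
    ```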

  13. Random Forest classification based on star graph topological indices for antioxidant proteins.

    PubMed

    Fernández-Blanco, Enrique; Aguiar-Pulido, Vanessa; Munteanu, Cristian Robert; Dorado, Julian

    2013-01-21

    Aging and quality of life are important research topics nowadays in areas such as the life sciences, chemistry and pharmacology. People live longer and thus want to spend that extra time with a better quality of life. In this regard, there exists a tiny subset of molecules in nature, known as antioxidant proteins, that may influence the aging process. However, testing every single protein to identify its properties is quite expensive and inefficient. For this reason, this work proposes a model in which the primary structure of the protein is represented using complex network graphs, which can be used to reduce the number of proteins to be tested for antioxidant biological activity. The graph obtained as a representation helps describe the complex system using topological indices. More specifically, in this work, Randić's Star Networks have been used, together with the associated indices calculated with the S2SNet tool. To simulate the existing proportion of antioxidant proteins in nature, a dataset containing 1999 proteins, of which 324 are antioxidant proteins, was created. Using these data as input, Star Graph Topological Indices were calculated with the S2SNet tool. These indices were then used as input to several classification techniques. Among the techniques utilised, the Random Forest showed the best performance, achieving a score of 94% correctly classified instances. Although the target class (antioxidant proteins) represents a tiny subset of the dataset, the proposed model is able to achieve 81.8% correctly classified instances for this class, with a precision of 81.3%.

  14. Random forest classification of large volume structures for visuo-haptic rendering in CT images

    NASA Astrophysics Data System (ADS)

    Mastmeyer, Andre; Fortmeier, Dirk; Handels, Heinz

    2016-03-01

    For patient-specific, voxel-based visuo-haptic rendering of CT scans of the liver area, the fully automatic segmentation of large volume structures such as skin, soft tissue, lungs and intestine (risk structures) is important. Using a machine learning based approach, several existing segmentations from 10 gold-standard segmented patients are learned by random decision forests, individually and collectively. The core of this paper is feature selection and the application of the learned classifiers to a new patient data set. In a leave-some-out cross-validation, the obtained full volume segmentations are compared to the gold-standard segmentations of the untrained patients. The proposed classifiers use a multi-dimensional feature space to estimate the hidden truth, instead of relying on clinically standard threshold- and connectivity-based methods. The results of our efficient whole-body section classification are multi-label maps of the considered tissues. For visuo-haptic simulation, other small volume structures would have to be segmented additionally; we also take a look at such structures (liver vessels). In an experimental leave-some-out study of 10 patients, the proposed method performs much more efficiently than state-of-the-art methods. In two variants of the leave-some-out experiments we obtain best mean Dice ratios of 0.79, 0.97, 0.63 and 0.83 for skin, soft tissue, hard bone and risk structures. Liver structures are segmented with Dice ratios of 0.93 for the liver, 0.43 for blood vessels and 0.39 for bile vessels.

  15. In vivo MRI based prostate cancer localization with random forests and auto-context model.

    PubMed

    Qian, Chunjun; Wang, Li; Gao, Yaozong; Yousuf, Ambereen; Yang, Xiaoping; Oto, Aytekin; Shen, Dinggang

    2016-09-01

    Prostate cancer is one of the major causes of cancer death for men. Magnetic resonance (MR) imaging is being increasingly used as an important modality to localize prostate cancer. Therefore, localizing prostate cancer in MRI with automated detection methods has become an active area of research. Many methods have been proposed for this task. However, most previous methods focused on identifying cancer only in the peripheral zone (PZ), or on classifying suspicious cancer ROIs into benign and cancerous tissue. Little work has been done on developing a fully automatic method for cancer localization in the entire prostate region, including the central gland (CG) and transition zone (TZ). In this paper, we propose a novel learning-based multi-source integration framework to directly localize prostate cancer regions from in vivo MRI. We employ random forests to effectively integrate features from multi-source images for cancer localization. Here, the multi-source images initially comprise the multi-parametric MRIs (i.e., T2, DWI, and dADC) and later also the iteratively estimated and refined tissue probability map of prostate cancer. Experimental results on data from 26 real patients show that our method can accurately localize cancerous sections. The high section-based evaluation (SBE), combined with the ROC analysis results of individual patients, shows that the proposed method is promising for in vivo MRI based prostate cancer localization, which can be used for guiding prostate biopsy, targeting the tumor in focal therapy planning, triage and follow-up of patients under active surveillance, as well as decision making in treatment selection. The common ROC analysis, with an AUC value of 0.832, and the ROI-based ROC analysis, with an AUC value of 0.883, both illustrate the effectiveness of our proposed method.
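
    The auto-context loop at the core of the method can be sketched on synthetic voxel features (in the paper the context features are spatial neighbourhoods of the probability map, and the maps fed back during training are estimated with cross-validation; here a single per-voxel probability stands in for both):

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    # Stand-ins for per-voxel multi-parametric features (T2, DWI, dADC)
    X = rng.normal(size=(5000, 3))
    y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, 5000) > 1).astype(int)

    prob = np.full((5000, 1), 0.5)           # initial, uninformative probability map
    for it in range(3):                      # auto-context iterations
        rf = RandomForestClassifier(n_estimators=300, random_state=it)
        rf.fit(np.hstack([X, prob]), y)      # append the current map as a feature
        # NOTE: for brevity the refined map is predicted on the training voxels;
        # a faithful implementation would use held-out (cross-validated) estimates
        prob = rf.predict_proba(np.hstack([X, prob]))[:, [1]]
    ```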

  16. Cooperative mobile agents search using beehive partitioned structure and Tabu Random search algorithm

    NASA Astrophysics Data System (ADS)

    Ramazani, Saba; Jackson, Delvin L.; Selmic, Rastko R.

    2013-05-01

    In search and surveillance operations, deploying a team of mobile agents provides a robust solution that has multiple advantages over using a single agent, in efficiency and in minimizing exploration time. This paper addresses the challenge of identifying a target in a given environment when using a team of mobile agents, by proposing a novel method for mapping and moving agent teams in a cooperative manner. The approach consists of two parts. First, the region is partitioned into a hexagonal beehive structure in order to provide equidistant movements in every direction and to allow for more natural and flexible environment mapping. Additionally, in search environments that are partitioned into hexagons, mobile agents have an efficient travel path while performing searches due to this partitioning approach. Second, we use a team of mobile agents that move in a cooperative manner and utilize the Tabu Random algorithm to search for the target. Due to the ever-increasing use of robotics and Unmanned Aerial Vehicle (UAV) platforms, many applications of cooperative multi-agent search have recently emerged that would benefit from the approach presented in this work, including search and rescue operations, surveillance, data collection, and border patrol. In this paper, the increased efficiency of the Tabu Random Search algorithm in combination with hexagonal partitioning is simulated and analyzed, and the advantages of this approach are presented and discussed.
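
    A single-agent toy version conveys the two ingredients (the paper coordinates a team of agents with a richer tabu strategy; the axial hexagon coordinates, tabu-list length, and step budget below are all illustrative): the agent moves randomly among the six equidistant hexagonal neighbours while a fixed-length tabu list discourages revisiting recent cells.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    # Six equidistant neighbour moves on a hexagonal grid in axial coordinates
    HEX_MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
    TARGET = (7, -3)

    def tabu_random_search(start, max_steps=5000, tabu_size=50):
        pos, tabu = start, [start]
        for step in range(max_steps):
            moves = [(pos[0] + dq, pos[1] + dr) for dq, dr in HEX_MOVES]
            candidates = [m for m in moves if m not in tabu] or moves
            pos = candidates[rng.integers(len(candidates))]   # random non-tabu move
            tabu = (tabu + [pos])[-tabu_size:]                # fixed-length tabu list
            if pos == TARGET:
                return step + 1
        return None                                           # budget exhausted

    print("target found after", tabu_random_search((0, 0)), "moves")
    ```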

  17. Quantitative structure-property relationships of retention indices of some sulfur organic compounds using random forest technique as a variable selection and modeling method.

    PubMed

    Goudarzi, Nasser; Shahsavani, Davood; Emadi-Gandaghi, Fereshteh; Chamjangali, Mansour Arab

    2016-10-01

    In this work, a novel quantitative structure-property relationship technique is proposed on the basis of the random forest for the prediction of the retention indices of some sulfur organic compounds. To calculate the retention indices of these compounds, theoretical descriptors derived from their molecular structures are employed. The influence of the significant parameters affecting the predictive power of the developed random forest, namely the number of randomly selected variables tried at each split (m) and the number of trees (nt), is studied to obtain the best model. After optimizing the nt and m parameters, the random forest model built with m = 70 and nt = 460 was found to yield the best results. The artificial neural network and multiple linear regression modeling techniques are also used to predict the retention index values of these compounds for comparison with the results of the random forest model. The descriptors selected by stepwise regression and by the random forest model are used to build the artificial neural network models. The results achieved show the superiority of the random forest model over the other models for the prediction of the retention indices of the studied compounds.
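
    The m/nt tuning maps directly onto scikit-learn's RandomForestRegressor, where max_features plays the role of m and n_estimators the role of nt; the sketch below uses a synthetic descriptor matrix, and the grid is illustrative but includes the paper's optimum (m = 70, nt = 460).

    ```python
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor

    # Synthetic stand-in for the descriptor matrix -> retention index mapping
    X, y = make_regression(n_samples=120, n_features=100, n_informative=20,
                           noise=5.0, random_state=0)

    best = None
    for m in (10, 30, 50, 70, 90):           # variables tried at each split
        for nt in (100, 220, 340, 460):      # number of trees
            rf = RandomForestRegressor(n_estimators=nt, max_features=m,
                                       oob_score=True, random_state=0).fit(X, y)
            if best is None or rf.oob_score_ > best[0]:
                best = (rf.oob_score_, m, nt)
    print("best OOB R^2 = %.3f at m = %d, nt = %d" % best)
    ```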

  18. Track-Before-Detect Algorithm for Faint Moving Objects based on Random Sampling and Consensus

    NASA Astrophysics Data System (ADS)

    Dao, P.; Rast, R.; Schlaegel, W.; Schmidt, V.; Dentamaro, A.

    2014-09-01

    There are many algorithms developed for tracking and detecting faint moving objects in congested backgrounds. One obvious application is detection of targets in images where each pixel corresponds to the received power in a particular location. In our application, a visible imager operated in stare mode observes geostationary objects as fixed, stars as moving, and non-geostationary objects as drifting in the field of view. We would like to achieve high-sensitivity detection of the drifters. The ability to improve SNR with track-before-detect (TBD) processing, where target information is collected and collated before the detection decision is made, allows respectable performance against dim moving objects. Generally, a TBD algorithm consists of a pre-processing stage that highlights potential targets and a temporal filtering stage. However, the algorithms that have been successfully demonstrated, e.g. Viterbi-based and Bayesian-based, demand formidable processing power and memory. We propose an algorithm that exploits the quasi-constant velocity of objects, the predictability of the stellar clutter and the intrinsically low false alarm rate of detecting signature candidates in 3-D, based on an iterative method called "RANdom SAmple Consensus" (RANSAC), and one that can run in real time on a typical PC. The technique is tailored for searching for objects with small telescopes in stare mode. Our RANSAC-MT (Moving Target) algorithm estimates parameters of a mathematical model (e.g., linear motion) from a set of observed data which contains a significant number of outliers, while identifying inliers. In the pre-processing phase, candidate blobs are selected based on morphology and an intensity threshold that would normally generate an unacceptable level of false alarms. The RANSAC sampling rejects candidates that conform to the predictable motion of the stars. Data collected with a 17-inch telescope by AFRL/RH and a COTS lens/EM-CCD sensor by the AFRL/RD Satellite Assessment Center is
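
    The central RANSAC step, fitting a constant-velocity model to candidate detections so that inliers emerge from clutter, can be sketched as follows (synthetic detections, and scikit-learn's RANSACRegressor as a stand-in for the authors' RANSAC-MT implementation):

    ```python
    import numpy as np
    from sklearn.linear_model import RANSACRegressor

    rng = np.random.default_rng(0)
    # Candidate blob detections as (t, x, y): one drifter on a constant-velocity
    # track, buried in uniformly scattered clutter
    t = np.arange(30.0)
    track = np.c_[t, 5.0 + 0.8 * t, 12.0 - 0.3 * t]
    track[:, 1:] += rng.normal(0, 0.2, (30, 2))          # measurement noise
    clutter = np.c_[rng.uniform(0, 30, 200),
                    rng.uniform(0, 40, 200),
                    rng.uniform(0, 40, 200)]
    detections = np.vstack([track, clutter])

    # Fit x(t) and y(t) jointly with a linear model; detections consistent with
    # quasi-constant-velocity motion survive as inliers, clutter is rejected
    ransac = RANSACRegressor(residual_threshold=1.0, random_state=0)
    ransac.fit(detections[:, [0]], detections[:, 1:])
    print("inliers kept:", ransac.inlier_mask_.sum())    # close to the 30 track points
    ```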

  19. An efficient voting algorithm for finding additive biclusters with random background.

    PubMed

    Xiao, Jing; Wang, Lusheng; Liu, Xiaowen; Jiang, Tao

    2008-12-01

    The biclustering problem has been extensively studied in many areas, including e-commerce, data mining, machine learning, pattern recognition, statistics, and, more recently, computational biology. Given an n × m matrix A (n ≥ m), the main goal of biclustering is to identify a subset of rows (called objects) and a subset of columns (called properties) such that some objective function that specifies the quality of the found bicluster (formed by the subsets of rows and of columns of A) is optimized. The problem has been proved or conjectured to be NP-hard for various objective functions. In this article, we study a probabilistic model for the implanted additive bicluster problem, where each element in the n × m background matrix is a random integer from [0, L − 1] for some integer L, and a k × k implanted additive bicluster is obtained from an error-free additive bicluster by randomly changing each element to a number in [0, L − 1] with probability θ. We propose an O(n²m) time algorithm based on voting to solve the problem. We show that when k ≥ Ω(√(n log n)), the voting algorithm can correctly find the implanted bicluster with probability at least 1 − 9/n². We also implement our algorithm as a C++ program named VOTE. The implementation incorporates several ideas for estimating the size of an implanted bicluster, adjusting the threshold in voting, dealing with small biclusters, and dealing with overlapping implanted biclusters. Our experimental results on both simulated and real datasets show that VOTE can find biclusters with high accuracy and speed.

  20. Identifying and Analyzing Novel Epilepsy-Related Genes Using Random Walk with Restart Algorithm

    PubMed Central

    Guo, Wei; Shang, Dong-Mei; Cao, Jing-Hui; Feng, Kaiyan; Wang, ShaoPeng

    2017-01-01

    As a pathological condition, epilepsy is caused by abnormal neuronal discharges in the brain which temporarily disrupt cerebral function. Epilepsy is a chronic disease which occurs at all ages and can seriously affect patients' personal lives. Thus, there is a strong need to develop effective medicines or instruments to treat the disease. Identifying epilepsy-related genes is essential for understanding and treating the disease, because the proteins encoded by epilepsy-related genes are candidates for potential drug targets. In this study, a pioneering computational workflow was proposed to predict novel epilepsy-related genes using the random walk with restart (RWR) algorithm. As reported in the literature, the RWR algorithm often produces a number of false positive genes, so in this study a permutation test and functional association tests were implemented to filter the genes identified by the RWR algorithm, greatly reducing the number of suspected genes and resulting in only thirty-three novel epilepsy genes. Finally, these novel genes were analyzed on the basis of recently published literature. Our findings indicate that all of the novel genes are closely related to epilepsy. It is believed that the proposed workflow can also be applied to identify genes related to other diseases and deepen our understanding of their mechanisms. PMID:28255556
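
    The RWR iteration itself is short; a minimal sketch on a toy network follows (the real application runs on a genome-scale protein network with known epilepsy genes as seeds): iterate p ← (1 − r)·W·p + r·p0 to convergence, where W is the column-normalized adjacency matrix and r is the restart probability.

    ```python
    import numpy as np

    def random_walk_with_restart(W, seeds, r=0.3, tol=1e-8):
        """W: column-normalized adjacency matrix; seeds: indices of known genes."""
        p0 = np.zeros(W.shape[0])
        p0[seeds] = 1.0 / len(seeds)
        p = p0.copy()
        while True:
            p_next = (1 - r) * W @ p + r * p0
            if np.abs(p_next - p).sum() < tol:
                return p_next
            p = p_next

    # Toy protein-protein interaction network (symmetric adjacency, 5 nodes)
    A = np.array([[0, 1, 1, 0, 0],
                  [1, 0, 1, 1, 0],
                  [1, 1, 0, 0, 0],
                  [0, 1, 0, 0, 1],
                  [0, 0, 0, 1, 0]], dtype=float)
    W = A / A.sum(axis=0)                    # column-normalize
    scores = random_walk_with_restart(W, seeds=[0])
    print(scores)                            # high scores = candidate disease genes
    ```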

  1. A random forest classifier for the prediction of energy expenditure and type of physical activity from wrist and hip accelerometers.

    PubMed

    Ellis, Katherine; Kerr, Jacqueline; Godbole, Suneeta; Lanckriet, Gert; Wing, David; Marshall, Simon

    2014-11-01

    Wrist accelerometers are being used in population level surveillance of physical activity (PA) but more research is needed to evaluate their validity for correctly classifying types of PA behavior and predicting energy expenditure (EE). In this study we compare accelerometers worn on the wrist and hip, and the added value of heart rate (HR) data, for predicting PA type and EE using machine learning. Forty adults performed locomotion and household activities in a lab setting while wearing three ActiGraph GT3X+ accelerometers (left hip, right hip, non-dominant wrist) and a HR monitor (Polar RS400). Participants also wore a portable indirect calorimeter (COSMED K4b2), from which EE and metabolic equivalents (METs) were computed for each minute. We developed two predictive models: a random forest classifier to predict activity type and a random forest of regression trees to estimate METs. Predictions were evaluated using leave-one-user-out cross-validation. The hip accelerometer obtained an average accuracy of 92.3% in predicting four activity types (household, stairs, walking, running), while the wrist accelerometer obtained an average accuracy of 87.5%. Across all 8 activities combined (laundry, window washing, dusting, dishes, sweeping, stairs, walking, running), the hip and wrist accelerometers obtained average accuracies of 70.2% and 80.2% respectively. Predicting METs using the hip or wrist devices alone obtained root mean square errors (rMSE) of 1.09 and 1.00 METs per 6 min bout, respectively. Including HR data improved MET estimation, but did not significantly improve activity type classification. These results demonstrate the validity of random forest classification and regression forests for PA type and MET prediction using accelerometers. The wrist accelerometer proved more useful in predicting activities with significant arm movement, while the hip accelerometer was superior for predicting locomotion and estimating EE.

  3. Comparing Algorithms for Graph Isomorphism Using Discrete- and Continuous-Time Quantum Random Walks

    DOE PAGES

    Rudinger, Kenneth; Gamble, John King; Bach, Eric; ...

    2013-07-01

    Berry and Wang [Phys. Rev. A 83, 042317 (2011)] show numerically that a discrete-time quantum random walk of two noninteracting particles is able to distinguish some non-isomorphic strongly regular graphs from the same family. Here we analytically demonstrate how it is possible for these walks to distinguish such graphs, while continuous-time quantum walks of two noninteracting particles cannot. We show analytically and numerically that even single-particle discrete-time quantum random walks can distinguish some strongly regular graphs, though not as many as two-particle noninteracting discrete-time walks. Additionally, we demonstrate how, given the same quantum random walk, subtle differences in the graph certificate construction algorithm can nontrivially impact the walk's distinguishing power. We also show that no continuous-time walk of a fixed number of particles can distinguish all strongly regular graphs when used in conjunction with any of the graph certificates we consider. We extend this constraint to discrete-time walks of fixed numbers of noninteracting particles for one kind of graph certificate; it remains an open question as to whether or not this constraint applies to the other graph certificates we consider.

  5. Water chemistry in 179 randomly selected Swedish headwater streams related to forest production, clear-felling and climate.

    PubMed

    Löfgren, Stefan; Fröberg, Mats; Yu, Jun; Nisell, Jakob; Ranneby, Bo

    2014-12-01

    From a policy perspective, it is important to understand forestry effects on surface waters from a landscape perspective. The EU Water Framework Directive demands remedial actions if good ecological status is not achieved. In Sweden, 44 % of the surface water bodies have moderate ecological status or worse. Many of these drain catchments with a mosaic of managed forests. It is important for the forestry sector and water authorities to be able to identify where, in the forested landscape, special precautions are necessary. The aim of this study was to quantify the relations between forestry parameters and headwater stream concentrations of nutrients, organic matter and acid-base chemistry. The results are put into the context of regional climate, sulphur and nitrogen deposition, as well as marine influences. Water chemistry was measured in 179 randomly selected headwater streams from two regions in southwest and central Sweden, corresponding to 10 % of the Swedish land area. Forest status was determined from satellite images and Swedish National Forest Inventory data using the probabilistic classifier method, which was used to model stream water chemistry with Bayesian model averaging. The results indicate that concentrations of e.g. nitrogen, phosphorus and organic matter are related to factors associated with forest production, but that it is not forestry per se that causes the excess losses. Instead, factors simultaneously affecting forest production and stream water chemistry, such as climate, extensive soil pools and nitrogen deposition, are the most likely candidates. The relationships with clear-felled and wetland areas are likely to be direct effects.

  6. Constraining ecosystem model with adaptive Metropolis algorithm using boreal forest site eddy covariance measurements

    NASA Astrophysics Data System (ADS)

    Mäkelä, Jarmo; Susiluoto, Jouni; Markkanen, Tiina; Aurela, Mika; Järvinen, Heikki; Mammarella, Ivan; Hagemann, Stefan; Aalto, Tuula

    2016-12-01

    We examined parameter optimisation in the JSBACH (Kaminski et al., 2013; Knorr and Kattge, 2005; Reick et al., 2013) ecosystem model, applied to two boreal forest sites (Hyytiälä and Sodankylä) in Finland. We identified and tested key parameters in soil hydrology and in formulations related to forest water and carbon exchange, and optimised them using the adaptive Metropolis (AM) algorithm for Hyytiälä, with a 5-year calibration period (2000-2004) followed by a 4-year validation period (2005-2008). Sodankylä acted as an independent validation site, where no optimisations were made. The tuning provided estimates of the full distribution of possible parameters, along with information about correlation, sensitivity and identifiability. Some parameters were correlated with each other due to a phenomenological connection between carbon uptake and water stress, or due to other connections arising from the set-up of the model formulations. The latter holds especially for the vegetation phenology parameters. The least identifiable parameters include the phenology parameters, the parameters connecting relative humidity and soil dryness, and the field capacity of the skin reservoir. These soil parameters were masked by the large contribution from vegetation transpiration. In addition to leaf area index and the maximum carboxylation rate, the most effective parameters adjusting the gross primary production (GPP) and evapotranspiration (ET) fluxes in the seasonal tuning were related to the soil wilting point, drainage and the moisture stress imposed on vegetation. For the daily and half-hourly tunings the most important parameters were the ratio of leaf-internal to external CO2 concentration and the parameter connecting relative humidity and soil dryness. Effectively, the seasonal tuning transferred water from soil moisture into ET, and the daily and half-hourly tunings reversed this process. The seasonal tuning improved the month-to-month development of GPP and ET, and produced the most stable estimates of water use
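
    The AM sampler itself is a short recipe (Haario et al.'s scaling 2.4²/d and covariance adaptation; the Gaussian toy log-posterior below stands in for a full JSBACH run, and the burn-in and adaptation interval are illustrative):

    ```python
    import numpy as np

    def log_post(theta):
        """Placeholder log-posterior; in the study this would wrap a JSBACH run."""
        return -0.5 * np.sum((theta - 1.0) ** 2)

    rng = np.random.default_rng(0)
    d, n_iter, t0 = 3, 10000, 1000
    sd = 2.4 ** 2 / d                        # standard AM scaling factor
    chain = np.zeros((n_iter, d))
    theta, lp = np.zeros(d), log_post(np.zeros(d))
    cov = 0.1 * np.eye(d)                    # initial proposal covariance

    for t in range(n_iter):
        prop = rng.multivariate_normal(theta, cov)
        lp_prop = log_post(prop)
        if np.log(rng.uniform()) < lp_prop - lp:       # Metropolis acceptance
            theta, lp = prop, lp_prop
        chain[t] = theta
        if t >= t0 and t % 50 == 0:          # adapt proposal covariance to history
            cov = sd * (np.cov(chain[: t + 1].T) + 1e-6 * np.eye(d))

    print("posterior mean estimate:", chain[n_iter // 2:].mean(axis=0))
    ```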

  7. Random forest learning of ultrasonic statistical physics and object spaces for lesion detection in 2D sonomammography

    NASA Astrophysics Data System (ADS)

    Sheet, Debdoot; Karamalis, Athanasios; Kraft, Silvan; Noël, Peter B.; Vag, Tibor; Sadhu, Anup; Katouzian, Amin; Navab, Nassir; Chatterjee, Jyotirmoy; Ray, Ajoy K.

    2013-03-01

    Breast cancer is the most common form of cancer in women. Early diagnosis can significantly improve life expectancy and allow different treatment options. Clinicians favor 2D ultrasonography for breast tissue abnormality screening due to its high sensitivity and specificity compared to competing technologies. However, inter- and intra-observer variability in visual assessment and reporting of lesions often handicaps its performance. Existing Computer Assisted Diagnosis (CAD) systems, though able to detect solid lesions, are often restricted in performance. These restrictions are the inability to (1) detect lesions of multiple sizes and shapes, and (2) differentiate hypo-echoic lesions from their posterior acoustic shadowing. In this work we present a completely automatic system for detection and segmentation of breast lesions in 2D ultrasound images. We employ random forests for learning a tissue-specific primal to discriminate breast lesions from surrounding normal tissues. This enables the system to detect lesions of multiple shapes and sizes, as well as to discriminate hypo-echoic lesions from their associated posterior acoustic shadowing. The primal comprises (i) multiscale estimated ultrasonic statistical physics and (ii) scale-space characteristics. The random forest learns the lesion vs. background primal from a database of 2D ultrasound images with labeled lesions. For segmentation, the posterior probabilities of lesion pixels estimated by the learnt random forest are hard-thresholded to provide a random walks segmentation stage with starting seeds. Our method achieves detection with 99.19% accuracy and segmentation with mean contour-to-contour error < 3 pixels on a set of 40 images with 49 lesions.
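
    The coupling of the learned posterior to the random-walks stage can be sketched as below (a synthetic image, two trivial intensity features, and training on the same image for brevity; the paper learns multiscale statistical-physics features from a labeled database). The forest's confident pixels become the walker's seeds, and the random walker fills in the lesion boundary.

    ```python
    import numpy as np
    from scipy.ndimage import gaussian_filter
    from sklearn.ensemble import RandomForestClassifier
    from skimage.segmentation import random_walker

    rng = np.random.default_rng(0)
    img = rng.normal(0.0, 0.1, (128, 128))
    img[40:80, 50:90] -= 0.6                         # synthetic hypo-echoic "lesion"
    truth = np.zeros((128, 128), dtype=int)
    truth[40:80, 50:90] = 1

    # Two stand-in features per pixel (the paper uses multiscale statistics)
    feats = np.stack([img, gaussian_filter(img, 3)], axis=-1).reshape(-1, 2)
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(feats, truth.ravel())
    post = rf.predict_proba(feats)[:, 1].reshape(128, 128)

    # Hard-threshold the posteriors into seeds and let the random walker segment
    seeds = np.zeros((128, 128), dtype=int)
    seeds[post > 0.9] = 1                            # confident lesion
    seeds[post < 0.1] = 2                            # confident background
    segmentation = random_walker(img, seeds, beta=130)
    ```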

  8. Addressing methodological challenges in implementing the nursing home pain management algorithm randomized controlled trial

    PubMed Central

    Ersek, Mary; Polissar, Nayak; Du Pen, Anna; Jablonski, Anita; Herr, Keela; Neradilek, Moni B

    2015-01-01

    Background Unrelieved pain among nursing home (NH) residents is a well-documented problem. Attempts have been made to enhance pain management for older adults, including those in NHs. Several evidence-based clinical guidelines have been published to assist providers in assessing and managing acute and chronic pain in older adults. Despite the proliferation and dissemination of these practice guidelines, research has shown that intensive systems-level implementation strategies are necessary to change clinical practice and patient outcomes within a health-care setting. One promising approach is the embedding of guidelines into explicit protocols and algorithms to enhance decision making. Purpose The goal of the article is to describe several issues that arose in the design and conduct of a study comparing the effectiveness of pain management algorithms coupled with a comprehensive adoption program against that of education alone in improving evidence-based pain assessment and management practices, decreasing pain and depressive symptoms, and enhancing mobility among NH residents. Methods The study used a cluster-randomized controlled trial (RCT) design in which the individual NH was the unit of randomization. Rogers' Diffusion of Innovations theory provided the framework for the intervention. Outcome measures were surrogate-reported usual pain, self-reported usual and worst pain, and self-reported pain-related interference with activities, depression, and mobility. Results The final sample consisted of 485 NH residents from 27 NHs. The investigators were able to use a staggered enrollment strategy to recruit and retain facilities. The adaptive randomization procedures were successful in balancing intervention and control sites on key NH characteristics. Several strategies were successfully implemented to enhance the adoption of the algorithm. Limitations/Lessons The investigators encountered several methodological challenges that were inherent to

  9. Variances in the projections, resulting from CLIMEX, Boosted Regression Trees and Random Forests techniques

    NASA Astrophysics Data System (ADS)

    Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh

    2016-05-01

    The aim of this study was to comparatively investigate and evaluate the capabilities of correlative and mechanistic modeling processes, applied to the projection of future distributions of date palm in novel environments, and to establish a method for minimizing uncertainty in the projections of the differing techniques. The study area, on a global scale, comprises the Middle Eastern countries. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm (Phoenix dactylifera L.). The Global Climate Model (GCM) CSIRO-Mk3.0 (CS), using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain the projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation in the Middle Eastern countries, for the present and for the year 2100. The four different modeling approaches predict fairly different distributions. Projections from CL were more conservative than those from MX. The BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provides higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appear to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections resulting from the different techniques. The assessment and interpretation of model projections requires reservations

  10. Fast conical surface evaluation via randomized algorithm in the null-screen test

    NASA Astrophysics Data System (ADS)

    Aguirre-Aguirre, D.; Díaz-Uribe, R.; Villalobos-Mendoza, B.

    2017-01-01

    This work presents a method to recover the shape of a surface via randomized algorithms when the null-screen test is used, instead of the integration process that is commonly performed. This is because most of the errors are added during the reconstruction of the surface (i.e., the integration process). Such large surfaces are widely used in the aerospace sector and in industry in general, and a major problem arises when these surfaces have to be tested. The null-screen method is a low-cost test, and a complete surface analysis can be done using this method. In this paper, we show the simulations done for the analysis of fast conic surfaces, where it is demonstrated that the quality and shape of a surface under study can be recovered with a percentage error < 2%.

  11. Simulation of Anderson localization in a random fiber using a fast Fresnel diffraction algorithm

    NASA Astrophysics Data System (ADS)

    Davis, Jeffrey A.; Cottrell, Don M.

    2016-06-01

    Anderson localization has been previously demonstrated both theoretically and experimentally for transmission of a Gaussian beam through long distances in an optical fiber consisting of a random array of smaller fibers, each having either a higher or lower refractive index. However, the computational times were extremely long. We show how to simulate these results using a fast Fresnel diffraction algorithm. In each iteration of this approach, the light passes through a phase mask, undergoes Fresnel diffraction over a small distance, and then passes through the same phase mask. We also show results where we use a binary amplitude mask at the input that selectively illuminates either the higher or the lower index fibers. Additionally, we examine imaging of various sized objects through these fibers. In all cases, our results are consistent with other computational methods and experimental results, but with a much reduced computational time.
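
    The iteration described maps onto a standard FFT-based split-step sketch (all fiber and grid parameters below are assumed for illustration): each step applies half of the mask phase, a Fresnel transfer function in the spatial-frequency domain, and the same half mask again.

    ```python
    import numpy as np

    # Assumed grid/fiber parameters (illustrative only)
    N, L, wl, dz, steps = 256, 100e-6, 633e-9, 10e-6, 1000
    x = np.linspace(-L / 2, L / 2, N)
    X, Y = np.meshgrid(x, x)
    fx = np.fft.fftfreq(N, d=L / N)
    FX, FY = np.meshgrid(fx, fx)
    H = np.exp(-1j * np.pi * wl * dz * (FX ** 2 + FY ** 2))   # Fresnel kernel

    rng = np.random.default_rng(0)
    dn = 1e-3 * (rng.random((N, N)) > 0.5)        # random binary index contrast
    half_mask = np.exp(1j * (2 * np.pi / wl) * dn * dz / 2)

    field = np.exp(-(X ** 2 + Y ** 2) / (10e-6) ** 2)         # input Gaussian beam
    for _ in range(steps):   # mask -> Fresnel step -> same mask, repeated
        field = half_mask * np.fft.ifft2(np.fft.fft2(half_mask * field) * H)
    print("peak output intensity:", np.abs(field).max() ** 2)
    ```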

  12. An improved random walk algorithm for the implicit Monte Carlo method

    NASA Astrophysics Data System (ADS)

    Keady, Kendra P.; Cleveland, Mathew A.

    2017-01-01

    In this work, we introduce a modified Implicit Monte Carlo (IMC) Random Walk (RW) algorithm, which increases simulation efficiency for multigroup radiative transfer problems with strongly frequency-dependent opacities. To date, the RW method has only been implemented in "fully-gray" form; that is, the multigroup IMC opacities are group-collapsed over the full frequency domain of the problem to obtain a gray diffusion problem for RW. This formulation works well for problems with large spatial cells and/or opacities that are weakly dependent on frequency; however, the efficiency of the RW method degrades when the spatial cells are thin or the opacities are a strong function of frequency. To address this inefficiency, we introduce a RW frequency group cutoff in each spatial cell, which divides the frequency domain into optically thick and optically thin components. In the modified algorithm, opacities for the RW diffusion problem are obtained by group-collapsing IMC opacities below the frequency group cutoff. Particles with frequencies above the cutoff are transported via standard IMC, while particles below the cutoff are eligible for RW. This greatly increases the total number of RW steps taken per IMC time-step, which in turn improves the efficiency of the simulation. We refer to this new method as Partially-Gray Random Walk (PGRW). We present numerical results for several multigroup radiative transfer problems, which show that the PGRW method is significantly more efficient than standard RW for several problems of interest. In general, PGRW decreases runtimes by a factor of ∼2-4 compared to standard RW, and a factor of ∼3-6 compared to standard IMC. While PGRW is slower than frequency-dependent Discrete Diffusion Monte Carlo (DDMC), it is also easier to adapt to unstructured meshes and can be used in spatial cells where DDMC is not applicable. This suggests that it may be optimal to employ both DDMC and PGRW in a single simulation.

  13. SU-F-BRD-09: A Random Walk Model Algorithm for Proton Dose Calculation

    SciTech Connect

    Yao, W; Farr, J

    2015-06-15

    Purpose: To develop a random walk model algorithm for calculating proton dose with a balanced computation burden and accuracy. Methods: The random walk (RW) model is sometimes referred to as a density Monte Carlo (MC) simulation. In MC proton dose calculation, the use of a Gaussian angular distribution of protons due to multiple Coulomb scattering (MCS) is convenient, but in RW the use of a Gaussian angular distribution requires extremely large computation and memory. Thus, our RW model adopts a spatial distribution derived from the angular one to accelerate the computation and to decrease the memory usage. From the physics and from comparison with MC simulations, we have determined and analytically expressed the critical variables affecting the dose accuracy in our RW model. Results: Besides variables such as MCS, stopping power, and the energy spectrum after energy absorption, which have been extensively discussed in the literature, the following variables were found to be critical in our RW model: (1) the inverse square law, which can significantly reduce the computation burden and memory; (2) the non-Gaussian spatial distribution after MCS; and (3) the mean direction of scatters at each voxel. In comparison to MC results, taken as reference, for a water phantom irradiated by mono-energetic proton beams from 75 MeV to 221.28 MeV, the gamma test pass rate was 100% for the 2%/2mm/10% criterion. For a highly heterogeneous phantom consisting of water embedded with a 10 cm cortical bone and a 10 cm lung in the Bragg peak region of the proton beam, the gamma test pass rate was greater than 98% for the 3%/3mm/10% criterion. Conclusion: We have determined the key variables in our RW model for proton dose calculation. Compared with commercial pencil beam algorithms, our RW model much improves the dose accuracy in heterogeneous regions, and it is about 10 times faster than MC simulations.

  14. Hardware architecture for projective model calculation and false match refining using random sample consensus algorithm

    NASA Astrophysics Data System (ADS)

    Azimi, Ehsan; Behrad, Alireza; Ghaznavi-Ghoushchi, Mohammad Bagher; Shanbehzadeh, Jamshid

    2016-11-01

    The projective model is an important mapping function for the calculation of global transformation between two images. However, its hardware implementation is challenging because of a large number of coefficients with different required precisions for fixed point representation. A VLSI hardware architecture is proposed for the calculation of a global projective model between input and reference images and refining false matches using random sample consensus (RANSAC) algorithm. To make the hardware implementation feasible, it is proved that the calculation of the projective model can be divided into four submodels comprising two translations, an affine model and a simpler projective mapping. This approach makes the hardware implementation feasible and considerably reduces the required number of bits for fixed point representation of model coefficients and intermediate variables. The proposed hardware architecture for the calculation of a global projective model using the RANSAC algorithm was implemented using Verilog hardware description language and the functionality of the design was validated through several experiments. The proposed architecture was synthesized by using an application-specific integrated circuit digital design flow utilizing 180-nm CMOS technology as well as a Virtex-6 field programmable gate array. Experimental results confirm the efficiency of the proposed hardware architecture in comparison with software implementation.

  15. From analytical solutions of solute transport equations to multidimensional time-domain random walk (TDRW) algorithms

    NASA Astrophysics Data System (ADS)

    Bodin, Jacques

    2015-03-01

    In this study, new multi-dimensional time-domain random walk (TDRW) algorithms are derived from approximate one-dimensional (1-D), two-dimensional (2-D), and three-dimensional (3-D) analytical solutions of the advection-dispersion equation and from exact 1-D, 2-D, and 3-D analytical solutions of the pure-diffusion equation. These algorithms enable the calculation of both the time required for a particle to travel a specified distance in a homogeneous medium and the mass recovery at the observation point, which may be incomplete due to 2-D or 3-D transverse dispersion or diffusion. The method is extended to heterogeneous media, represented as a piecewise collection of homogeneous media. The particle motion is then decomposed along a series of intermediate checkpoints located on the medium interface boundaries. The accuracy of the multi-dimensional TDRW method is verified against (i) exact analytical solutions of solute transport in homogeneous media and (ii) finite-difference simulations in a synthetic 2-D heterogeneous medium of simple geometry. The results demonstrate that the method is ideally suited to purely diffusive transport and to advection-dispersion transport problems dominated by advection. Conversely, the method is not recommended for highly dispersive transport problems because the accuracy of the advection-dispersion TDRW algorithms degrades rapidly for a low Péclet number, consistent with the accuracy limit of the approximate analytical solutions. The proposed approach provides a unified methodology for deriving multi-dimensional time-domain particle equations and may be applicable to other mathematical transport models, provided that appropriate analytical solutions are available.
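
    For the 1-D advection-dispersion case the idea reduces to a few lines: the transit time across a homogeneous cell follows an inverse Gaussian law (mean dx/v, shape dx²/(2D)), which NumPy exposes as the Wald distribution. The parameter values below are assumed, chosen to sit in the advection-dominated regime the paper recommends.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    v, D, dx = 1e-5, 1e-9, 0.01         # velocity (m/s), dispersion (m^2/s), cell (m)
    n_cells, n_particles = 100, 10000   # Peclet number v*dx/D = 100 (advective)

    # Transit time over one cell: inverse Gaussian with mean dx/v, shape dx^2/(2D)
    mu, lam = dx / v, dx ** 2 / (2 * D)
    transit = rng.wald(mu, lam, size=(n_particles, n_cells))

    # Summing the per-cell transit times gives each particle's arrival time at
    # the observation point x = n_cells * dx; no small time steps are ever taken
    arrival = transit.sum(axis=1)
    print("mean arrival %.3g s (advective travel time %.3g s)"
          % (arrival.mean(), n_cells * dx / v))
    ```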

  16. Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy)

    NASA Astrophysics Data System (ADS)

    Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele

    2015-11-01

    The aim of this work is to define reliable susceptibility models for shallow landslides using the Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit, with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructure. The landslides, mainly earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials covering the underlying metamorphic bedrock. The work has been carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with the observation of high resolution aerial colour orthophotos; ii) identification of landslide source areas; iii) preparation of data on the landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview of the existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on the Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on the results and performance of the models; vii) evaluation of the predictive capabilities of the models using the ROC curve, AUC and contingency tables; viii) comparison of the model results and of the obtained susceptibility maps; and ix) analysis of the temporal variation of landslide susceptibility related to input parameter changes. The models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. Land use and wildfire

  17. Harmonics elimination algorithm for operational modal analysis using random decrement technique

    NASA Astrophysics Data System (ADS)

    Modak, S. V.; Rawal, Chetan; Kundra, T. K.

    2010-05-01

    Operational modal analysis (OMA) extracts the modal parameters of a structure using its output response measured during operation. When applied to mechanical engineering structures, OMA is often faced with the problem of harmonics present in the output response, which can cause erroneous modal extraction. This paper demonstrates for the first time that the random decrement (RD) method can be efficiently employed to eliminate the harmonics from the randomdec signatures. Further, the research shows that even large-amplitude harmonics can be eliminated effectively by including additional random excitation, which obviously need not be recorded for analysis, as is the case with any other OMA method. The free decays obtained from RD have been used for modal identification of the system using the eigensystem realization algorithm (ERA). The proposed harmonic elimination method has an advantage over previous methods in that it does not require the harmonic frequencies to be known, and it can be used for multiple harmonics, including periodic signals. The theory behind the harmonic elimination is first developed and validated. The effectiveness of the method is demonstrated through a simulated study and then by experimental studies on a beam and on a more complex F-shaped structure, which resembles the skeleton of a drilling or milling machine tool.
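
    The RD signature at the heart of the method is just a triggered average; a minimal sketch on a synthetic response follows (one broadband-excited mode plus a 50 Hz harmonic; the filter, trigger level, and segment length are illustrative, and the broadband drive loosely plays the role of the additional random excitation proposed in the paper):

    ```python
    import numpy as np
    from scipy.signal import lfilter

    def random_decrement(y, threshold, seg_len):
        """Average the segments that follow each up-crossing of the threshold."""
        idx = np.where((y[:-1] < threshold) & (y[1:] >= threshold))[0]
        idx = idx[idx + seg_len < len(y)]
        return np.mean([y[i:i + seg_len] for i in idx], axis=0)

    fs = 1000
    t = np.arange(0, 60, 1 / fs)
    rng = np.random.default_rng(0)

    # Lightly damped 12 Hz mode excited by white noise (the random part)...
    r, th = 0.995, 2 * np.pi * 12 / fs
    mode = lfilter([1.0], [1, -2 * r * np.cos(th), r ** 2],
                   rng.normal(0, 1, t.size))
    y = mode + 0.5 * np.sin(2 * np.pi * 50 * t)   # ...plus a harmonic component

    # The triggered average resembles a free decay of the structural response
    # and can then be fed to ERA for modal identification
    signature = random_decrement(y, threshold=y.std(), seg_len=fs)
    ```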

  18. Segmentation of prostate from CT scans using a combined voxel random forests classification with spherical harmonics regularization

    NASA Astrophysics Data System (ADS)

    Commandeur, F.; Acosta, O.; Simon, A.; Ospina Arango, J. D.; Dillenseger, J. L.; Mathieu, R.; Haigron, P.; de Crevoisier, R.

    2015-01-01

    In prostate cancer external beam radiotherapy, identification of the pelvic structures in computed tomography (CT) is required for treatment planning and is performed manually by experts. Manual delineation of the prostate in CT is time consuming and prone to observer variability. We propose a fully automated process combining a Random Forests (RF) classification with Spherical Harmonics (SPHARM) regularization to identify the prostate boundaries. The proposed method outperformed a classical atlas-based approach from the literature. Combining RF to detect the prostate and SPHARM to regularize its shape provided promising results for automatic prostate segmentation.

  19. SU-D-201-06: Random Walk Algorithm Seed Localization Parameters in Lung Positron Emission Tomography (PET) Images

    SciTech Connect

    Soufi, M; Asl, A Kamali; Geramifar, P

    2015-06-15

    Purpose: The objective of this study was to find the best seed localization parameters for application of the random walk algorithm to lung tumor delineation in Positron Emission Tomography (PET) images. Methods: PET images suffer from statistical noise, and tumor delineation in these images is therefore a challenging task. The random walk algorithm, a graph-based image segmentation technique, is reliably robust to image noise, and its fast computation and fast editing characteristics make it powerful for clinical purposes. We implemented the random walk algorithm in MATLAB. The algorithm was validated and verified with a 4D-NCAT phantom with spherical lung lesions of different diameters from 20 to 90 mm (in incremental steps of 10 mm) and tumor-to-background ratios of 4:1 and 8:1. STIR (Software for Tomographic Image Reconstruction) was applied to reconstruct the phantom PET images with pixel sizes of 2×2×2 and 4×4×4 mm{sup 3}. For seed localization, we selected pixels at different percentages of the maximum Standardized Uptake Value (SUVmax): at least 70%, 80%, 90% or 100% SUVmax for foreground seeds, and up to 20% to 55% SUVmax (in 5% increments) for background seeds. To investigate the algorithm's performance on clinical data, 19 patients with lung tumors were also studied. The resulting contours from the algorithm were compared with manual contouring by a nuclear medicine expert as the ground truth. Results: Phantom and clinical lesion segmentation showed that the best segmentation results were obtained by selecting pixels with at least 70% SUVmax as foreground seeds and pixels up to 30% SUVmax as background seeds. A mean Dice Similarity Coefficient of 94% ± 5% (83% ± 6%) and a mean Hausdorff Distance of 1 (2) pixels were obtained for the phantom (clinical) study. Conclusion: The accurate results of the random walk algorithm in PET image segmentation assure its application for radiation treatment planning and
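
    The reported seed-localization rule can be sketched as follows (the SUV volume here is a random placeholder; the 70%/30% thresholds are the values found optimal above). Labels in this form, 0 for unlabeled voxels and positive integers for seeds, are what, e.g., skimage.segmentation.random_walker expects:

    ```python
    # Select foreground/background seeds from SUV thresholds for a random walker.
    import numpy as np

    def select_seeds(suv, fg_frac=0.70, bg_frac=0.30):
        suv_max = suv.max()
        seeds = np.zeros(suv.shape, dtype=int)  # 0 = unlabeled
        seeds[suv >= fg_frac * suv_max] = 1     # foreground (tumor) seeds
        seeds[suv <= bg_frac * suv_max] = 2     # background seeds
        return seeds

    suv = np.random.rand(64, 64, 64) * 10.0     # placeholder SUV volume
    seeds = select_seeds(suv)
    ```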

  20. An Introduction to Recursive Partitioning: Rationale, Application and Characteristics of Classification and Regression Trees, Bagging and Random Forests

    PubMed Central

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2010-01-01

    Recursive partitioning methods have become popular and widely used tools for non-parametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine and bioinformatics within the past few years. High dimensional problems are common not only in genetics, but also in some areas of psychological research, where only few subjects can be measured due to time or cost constraints, yet a large amount of data is generated for each subject. Random forests have been shown to achieve a high prediction accuracy in such applications, and to provide descriptive variable importance measures reflecting the impact of each variable in both main effects and interactions. The aim of this work is to introduce the principles of the standard recursive partitioning methods as well as recent methodological improvements, to illustrate their usage for low and high dimensional data exploration, but also to point out limitations of the methods and potential pitfalls in their practical application. Application of the methods is illustrated using freely available implementations in the R system for statistical computing. PMID:19968396
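
    A minimal illustration of the three methods reviewed, a single classification tree, bagged trees and a random forest, assuming scikit-learn rather than the R implementations the paper describes:

    ```python
    # Ten-fold cross-validated accuracy of a tree, bagging and a random forest.
    from sklearn.datasets import load_breast_cancer
    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_breast_cancer(return_X_y=True)
    models = {
        "single tree": DecisionTreeClassifier(random_state=0),
        "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=200,
                                     random_state=0),
        "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    for name, m in models.items():
        print(name, round(cross_val_score(m, X, y, cv=10).mean(), 3))
    ```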

  1. Classification of Potential Water Bodies Using Landsat 8 OLI and a Combination of Two Boosted Random Forest Classifiers

    PubMed Central

    Ko, Byoung Chul; Kim, Hyeong Hun; Nam, Jae Yeal

    2015-01-01

    This study proposes a new water body classification method using top-of-atmosphere (TOA) reflectance and water indices (WIs) of the Landsat 8 Operational Land Imager (OLI) sensor and corresponding random forest classifiers. In this study, multispectral images from the OLI sensor are represented as TOA reflectance and WI values because classification using these two measures performs better than classification of the raw spectral images. Two types of boosted random forest (BRF) classifiers are learned from TOA reflectance and WI values, respectively, instead of using heuristic thresholds or unsupervised methods. The final probability is a linear combination of the probabilities from the two BRFs and is used to assign image pixels to the water class. This study first demonstrates that the Landsat 8 OLI sensor achieves a higher classification rate because its 12-bit quantization provides an improved signal-to-noise ratio compared to the 8-bit data available from other sensors. In addition, we show that the proposed combination of two BRF classifiers gives robust water body classification results, regardless of topology, river properties, and background environment. PMID:26110405

  2. Detecting Sirex noctilio grey-attacked and lightning-struck pine trees using airborne hyperspectral data, random forest and support vector machines classifiers

    NASA Astrophysics Data System (ADS)

    Abdel-Rahman, Elfatih M.; Mutanga, Onisimo; Adam, Elhadi; Ismail, Riyad

    2014-02-01

    The visual progression of sirex (Sirex noctilio) infestation symptoms has been categorized into three distinct infestation phases, namely the green, red and grey stages. The grey stage is the final stage, which leads to almost complete defoliation, resulting in dead standing trees or snags. Dead standing pine trees, however, could also be due to lightning damage. Hence, the objective of the present study was to distinguish amongst healthy, sirex grey-attacked and lightning-damaged pine trees using AISA Eagle hyperspectral data and the random forest (RF) and support vector machine (SVM) classifiers. Our study also examines the possibility of separating the aforementioned pine tree damage classes from other landscape classes in the study area. The results revealed the robustness of the two machine learning classifiers, with an overall accuracy of 74.50% (total disagreement = 26%) for RF and 73.50% (total disagreement = 27%) for SVM using all the remaining AISA Eagle spectral bands after removing the noisy ones. When the most useful spectral bands as measured by RF were exploited, the overall accuracy improved considerably: 78% (total disagreement = 22%) for RF and 76.50% (total disagreement = 24%) for SVM. There was no significant difference between the performances of the two classifiers as demonstrated by the results of McNemar's test (χ2 = 0.14 and 0.03 when, respectively, all the remaining AISA Eagle wavebands after removing the noisy ones and the most important wavebands were used). This study concludes that AISA Eagle data classified using the RF and SVM algorithms provide relatively accurate information that is important to the forest industry for making informed decisions regarding pine plantation health protocols.
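
    McNemar's test, as used above, compares two classifiers on the same samples: with b and c the counts of samples that only one of the two classifiers labels correctly, the statistic is (b - c)^2 / (b + c) on one degree of freedom. A sketch with placeholder outcomes:

    ```python
    # McNemar's chi-squared test for paired classifier outcomes.
    import numpy as np
    from scipy.stats import chi2

    def mcnemar(correct_a, correct_b):
        b = int(np.sum(correct_a & ~correct_b))  # only classifier A correct
        c = int(np.sum(~correct_a & correct_b))  # only classifier B correct
        stat = (b - c) ** 2 / (b + c)            # assumes b + c > 0
        return stat, chi2.sf(stat, df=1)         # statistic, p-value

    rf_ok = np.random.rand(200) < 0.78           # placeholder per-sample results
    svm_ok = np.random.rand(200) < 0.76
    print(mcnemar(rf_ok, svm_ok))
    ```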

  3. Discrimination of raw and processed Dipsacus asperoides by near infrared spectroscopy combined with least squares-support vector machine and random forests

    NASA Astrophysics Data System (ADS)

    Xin, Ni; Gu, Xiao-Feng; Wu, Hao; Hu, Yu-Zhu; Yang, Zhong-Lin

    2012-04-01

    Most herbal medicines can be processed to fulfill different requirements of therapy. The purpose of this study was to discriminate between raw and processed Dipsacus asperoides, a common traditional Chinese medicine, based on their near infrared (NIR) spectra. Least squares-support vector machine (LS-SVM) and random forests (RF) were employed for full-spectrum classification. Three types of kernels, including the linear kernel, polynomial kernel and radial basis function (RBF) kernel, were checked for optimization of the LS-SVM model. For comparison, a linear discriminant analysis (LDA) model was built for classification, and the successive projections algorithm (SPA) was executed prior to building the LDA model to choose an appropriate subset of wavelengths. The three methods were applied to a dataset containing 40 raw herbs and 40 corresponding processed herbs. We ran 50 runs of 10-fold cross validation to evaluate the models' efficiency. The performance of the LS-SVM with the RBF kernel (RBF LS-SVM) was better than with the other two kernels. The RF, RBF LS-SVM and SPA-LDA models successfully classified all test samples. The mean error rates for the 50 runs of 10-fold cross validation were 1.35% for RBF LS-SVM, 2.87% for RF, and 2.50% for SPA-LDA. The best classification results were obtained by using the LS-SVM with the RBF kernel, while RF was the fastest in training and making predictions.

  4. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    PubMed

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression trees (BRT), classification and regression trees (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence the locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using the CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristic (ROC) curve. Of the 864 springs identified, 605 (≈70%) locations were used for spring potential mapping, while the remaining 259 (≈30%) springs were used for model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103, and for CART and RF the AUCs were 0.7870 and 0.7119, respectively. It was therefore concluded that the BRT model produced the best predictions of spring locations, followed by the CART and RF models. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.

  5. 3-D Ultrasound Segmentation of the Placenta Using the Random Walker Algorithm: Reliability and Agreement.

    PubMed

    Stevenson, Gordon N; Collins, Sally L; Ding, Jane; Impey, Lawrence; Noble, J Alison

    2015-12-01

    Volumetric segmentation of the placenta using 3-D ultrasound is currently performed clinically to investigate the correlation between organ volume and fetal outcome or pathology. Previously, interpolative or semi-automatic contour-based methodologies were used to provide volumetric results. We describe the validation of an original random walker (RW)-based algorithm against manual segmentation and an existing semi-automated method, virtual organ computer-aided analysis (VOCAL), using initialization time, inter- and intra-observer variability of volumetric measurements and quantification accuracy (with respect to manual segmentation) as metrics of success. Both semi-automatic methods require initialization, so the first experiment compared initialization times. Initialization was timed by one observer using 20 subjects, revealing significant differences (p < 0.001) in the time taken to initialize the VOCAL method compared with the RW method. In the second experiment, 10 subjects were used to analyze intra- and inter-observer variability between two observers. Bland-Altman plots were used to analyze variability, and intra- and inter-observer variability was measured by intra-class correlation coefficients, reported for all three methods. Intra-class correlation coefficient values for intra-observer variability were higher for the RW method than for VOCAL, and both were similar to manual segmentation. Inter-observer variability was 0.94 (0.88, 0.97), 0.91 (0.81, 0.95) and 0.80 (0.61, 0.90) for manual, RW and VOCAL, respectively. Finally, a third observer with no prior ultrasound experience was introduced and volumetric differences from manual segmentation were reported. Dice similarity coefficients for observers 1, 2 and 3 were respectively 0.84 ± 0.12, 0.94 ± 0.08 and 0.84 ± 0.11, and the mean was 0.87 ± 0.13. The RW algorithm was found to provide results concordant with those for manual segmentation and to outperform VOCAL in aspects of observer

  6. Computed tomography synthesis from magnetic resonance images in the pelvis using multiple random forests and auto-context features

    NASA Astrophysics Data System (ADS)

    Andreasen, Daniel; Edmund, Jens M.; Zografos, Vasileios; Menze, Bjoern H.; Van Leemput, Koen

    2016-03-01

    In radiotherapy treatment planning based only on magnetic resonance imaging (MRI), the electron density information usually obtained from computed tomography (CT) must be derived from the MRI by synthesizing a so-called pseudo CT (pCT). This is a non-trivial task, since MRI intensities are neither uniquely nor quantitatively related to electron density. Typical approaches involve either a classification or regression model requiring specialized MRI sequences to resolve intensity ambiguities, or an atlas-based model necessitating multiple registrations between atlases and subject scans. In this work, we explore a machine learning approach for creating a pCT of the pelvic region from conventional MRI sequences without using atlases. We use a random forest provided with information about local texture, edges and spatial features derived from the MRI, which helps to resolve intensity ambiguities. Furthermore, we use the concept of auto-context, sequentially training a number of classification forests to create and improve context features, which are finally used to train a regression forest for pCT prediction. We evaluate the pCT quality in terms of the voxel-wise error and radiologic accuracy, as measured by water-equivalent path lengths. We compare the performance of our method against two baseline pCT strategies, which either set all MRI voxels in the subject equal to the CT value of water, or in addition transfer the bone volume from the real CT. We show an improved performance compared to both baseline pCTs, suggesting that our method may be useful for MRI-only radiotherapy.
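
    The auto-context chaining described above can be sketched as follows (shapes and data are placeholders; in practice the context features would come from held-out or out-of-bag predictions to limit overfitting):

    ```python
    # Chain classification forests, feeding each one's class probabilities back
    # in as context features, then train a regression forest for pCT values.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    rng = np.random.default_rng(0)
    X = rng.normal(size=(5000, 20))            # per-voxel MRI-derived features
    tissue = rng.integers(0, 3, size=5000)     # placeholder tissue labels
    ct = rng.normal(size=5000)                 # placeholder CT numbers

    feats = X
    for _ in range(3):                         # a few auto-context rounds
        clf = RandomForestClassifier(n_estimators=100, random_state=0)
        context = clf.fit(feats, tissue).predict_proba(feats)
        feats = np.hstack([X, context])        # augment the original features

    reg = RandomForestRegressor(n_estimators=100, random_state=0).fit(feats, ct)
    ```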

  7. Sequence-Based Prediction of RNA-Binding Proteins Using Random Forest with Minimum Redundancy Maximum Relevance Feature Selection

    PubMed Central

    Ma, Xin; Guo, Jing; Sun, Xiao

    2015-01-01

    The prediction of RNA-binding proteins is one of the most challenging problems in computational biology. Although some studies have investigated this problem, the accuracy of prediction is still not sufficient. In this study, a highly accurate method was developed to predict RNA-binding proteins from amino acid sequences using random forests with the minimum redundancy maximum relevance (mRMR) method, followed by incremental feature selection (IFS). We incorporated conjoint triad features and three novel features: binding propensity (BP), non-binding propensity (NBP), and evolutionary information combined with physicochemical properties (EIPP). The results showed that these novel features play important roles in improving the performance of the predictor. Using the mRMR-IFS method, our predictor achieved its best performance (86.62% accuracy and a 0.737 Matthews correlation coefficient). The high prediction accuracy and successful prediction performance suggest that our method can be a useful approach to identify RNA-binding proteins from sequence information. PMID:26543860

  8. Early identification of mild cognitive impairment using incomplete random forest-robust support vector machine and FDG-PET imaging.

    PubMed

    Lu, Shen; Xia, Yong; Cai, Weidong; Fulham, Michael; Feng, David Dagan

    2017-02-07

    Alzheimer's disease (AD) is the most common type of dementia and will be an increasing health problem in society as the population ages. Mild cognitive impairment (MCI) is considered to be a prodromal stage of AD. The ability to identify subjects with MCI will become increasingly important as disease-modifying therapies for AD are developed. We propose a semi-supervised learning method based on robust optimization for the identification of MCI from [18F]fluorodeoxyglucose (FDG) PET scans. We extracted three groups of spatial features from the cortical and subcortical regions of each FDG-PET image volume. We measured the statistical uncertainty related to these spatial features via a transformation using an incomplete random forest and formulated the MCI identification problem under a robust optimization framework. We compared our approach to other state-of-the-art methods in different learning schemas. Our method outperformed the other techniques in the ability to separate MCI from normal controls.

  9. Deep neural network and random forest hybrid architecture for learning to detect retinal vessels in fundus images.

    PubMed

    Maji, Debapriya; Santara, Anirban; Ghosh, Sambuddha; Sheet, Debdoot; Mitra, Pabitra

    2015-08-01

    Vision impairment due to pathological damage of the retina can largely be prevented through periodic screening using fundus color imaging. However, the challenge with large-scale screening is the inability to exhaustively detect the fine blood vessels crucial to disease diagnosis. In this work we present a computational imaging framework using a hybrid architecture of deep and ensemble learning for reliable detection of blood vessels in fundus color images. A deep neural network (DNN) is used for unsupervised learning of vesselness dictionaries using sparsely trained denoising auto-encoders (DAE), followed by supervised learning of the DNN response using a random forest for detecting vessels in color fundus images. In experimental evaluation with the DRIVE database, we achieve vessel detection with a maximum average accuracy of 0.9327 and an area under the ROC curve of 0.9195.

  10. Relationship between clustering and algorithmic phase transitions in the random k-XORSAT model and its NP-complete extensions

    NASA Astrophysics Data System (ADS)

    Altarelli, F.; Monasson, R.; Zamponi, F.

    2008-01-01

    We study the performance of stochastic heuristic search algorithms on Uniquely Extendible Constraint Satisfaction Problems with random inputs. We show that, for any heuristic preserving the Poissonian nature of the underlying instance, the (heuristic-dependent) largest ratio αa of constraints per variable for which a search algorithm is likely to find solutions is smaller than the critical ratio αd above which solutions are clustered and highly correlated. In addition we show that the clustering ratio can be reached, when the number k of variables per constraint goes to infinity, by the so-called Generalized Unit Clause heuristic.

  11. Tuning of an optimal fuzzy PID controller with stochastic algorithms for networked control systems with random time delay.

    PubMed

    Pan, Indranil; Das, Saptarshi; Gupta, Amitava

    2011-01-01

    An optimal PID and an optimal fuzzy PID controller have been tuned by minimizing the Integral of Time multiplied Absolute Error (ITAE) and the squared controller output for a networked control system (NCS). The tuning is attempted for a higher-order, time-delay system using stochastic algorithms, viz. the Genetic Algorithm (GA) and two variants of Particle Swarm Optimization (PSO), and the closed-loop performances are compared. The paper shows that random variation in network delay can be handled more efficiently with fuzzy logic based PID controllers than with conventional PID controllers.
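
    The ITAE criterion minimized above is the integral of t·|e(t)| over the response; a one-line numerical approximation from a sampled error signal (the error signal here is a placeholder):

    ```python
    # ITAE = integral of t * |e(t)| dt, approximated by the trapezoidal rule.
    import numpy as np

    t = np.linspace(0.0, 10.0, 1001)
    e = np.exp(-t) * np.sin(3 * t)   # placeholder closed-loop error signal
    itae = np.trapz(t * np.abs(e), t)
    print(round(itae, 4))
    ```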

  12. Convergent Random Forest Predictor: Methodology for predicting drug response from genome-scale data applied to anti-TNF response

    PubMed Central

    Bienkowska, Jadwiga; Dagin, Gul; Batliwalla, Franak; Allaire, Normand; Roubenoff, Ronenn; Gregersen, Peter; Carulli, John

    2015-01-01

    Biomarker development for the prediction of patient response to therapy is one of the goals of molecular profiling of human tissues. Due to the large number of transcripts, the relatively limited number of samples, and the high variability of the data, identification of predictive biomarkers is a challenge for data analysis. Furthermore, many genes may be responsible for drug response differences, but often only a few are sufficient for accurate prediction. Here we present an analysis approach, the Convergent Random Forest (CRF) method, for the identification of highly predictive biomarkers. The aim is to select from genome-wide expression data a small number of non-redundant biomarkers that could be developed into a simple and robust diagnostic tool. Our method combines the Random Forest classifier and gene expression clustering to rank and select a small number of predictive genes. We evaluated the CRF approach by analyzing four different data sets. The first set contains transcript profiles of whole blood from rheumatoid arthritis patients, collected before anti-TNF treatment, and their subsequent response to the therapy. In this set, CRF identified 8 transcripts predicting response to therapy with 89% accuracy. We also applied the CRF to the analysis of three previously published expression data sets. For all sets, we compared the CRF and recursive support vector machine (RSVM) approaches to feature selection and classification. In all cases the CRF selects a much smaller number of features, five to eight genes, while achieving similar or better performance on both training and independent testing data sets. For both methods, performance estimates obtained by cross-validation are similar to performance on the independent samples. The method has been implemented in R and is available from the authors upon request: Jadwiga.Bienkowska@biogenidec.com. PMID:19699293

  13. Tracing Forest Change through 40 Years on Two Continents with the BULC Algorithm and Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Cardille, J. A.

    2015-12-01

    With the opening of the Landsat archive, researchers have a vast new data source teeming with imagery and potential. Beyond Landsat, data from other sensors are newly available as well, including ALOS/PALSAR, Sentinel-1 and -2, MERIS, and many more. Google Earth Engine, developed to organize and provide analysis tools for these immense data sets, is an ideal platform for researchers trying to sift through huge image stacks: it offers nearly unlimited processing power and storage with a straightforward programming interface. Yet labeling forest change through time remains challenging given the current state of the art for interpreting remote sensing image sequences, and combining data from very different image platforms remains quite difficult. To address these challenges, we developed the BULC algorithm (Bayesian Updating of Land Cover), designed for the continuous updating of land-cover classifications through time in large data sets. The algorithm ingests data from any of the wide variety of earth-resources sensors; it maintains a running estimate of the land-cover probabilities and the most probable class at all time points along a sequence of events. Here we compare BULC results from two study sites that witnessed considerable forest change in the last 40 years: the Pacific Northwest of the United States and the Mato Grosso region of Brazil. In Brazil, we incorporated rough classifications from more than 100 images of varying quality, mixing imagery from more than 10 different sensors. In the Pacific Northwest, we used BULC to identify forest changes due to logging and urbanization from 1973 to the present. Both regions had classification sequences that were better than many of the component days, effectively ignoring clouds and other unwanted signals while fusing the information from several platforms. As we leave remote sensing's data-poor era and enter a period with multiple looks at Earth's surface from multiple sensors over a short period of
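
    The Bayesian updating at the heart of BULC can be sketched for a single pixel as follows (the confusion values standing in for per-event accuracy are illustrative only):

    ```python
    # Recursive Bayes update of one pixel's land-cover class probabilities.
    import numpy as np

    n_classes = 3
    prob = np.full(n_classes, 1.0 / n_classes)   # uniform prior

    # Assumed P(observed class | true class): 0.8 on the diagonal, 0.1 off it.
    likelihood = np.full((n_classes, n_classes), 0.1)
    np.fill_diagonal(likelihood, 0.8)

    for observed in [0, 0, 1, 0, 2, 0]:          # sequence of event classifications
        prob *= likelihood[observed]             # multiply in the new evidence
        prob /= prob.sum()                       # renormalize

    print("most probable class:", prob.argmax(), prob.round(3))
    ```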

  14. A Novel Compressed Sensing Method for Magnetic Resonance Imaging: Exponential Wavelet Iterative Shrinkage-Thresholding Algorithm with Random Shift

    PubMed Central

    Zhang, Yudong; Yang, Jiquan; Yang, Jianfei; Liu, Aijun; Sun, Ping

    2016-01-01

    Aim. Accelerating magnetic resonance imaging (MRI) scanning can help improve hospital throughput, and patients benefit from shorter waiting times. Task. In the last decade, various rapid MRI techniques based on compressed sensing (CS) were proposed; however, neither the computation time nor the reconstruction quality of traditional CS-MRI met the requirements of clinical use. Method. In this study, a novel method was proposed with the name of exponential wavelet iterative shrinkage-thresholding algorithm with random shift (abbreviated as EWISTARS). It is composed of three successful components: (i) the exponential wavelet transform, (ii) the iterative shrinkage-thresholding algorithm, and (iii) random shift. Results. Experimental results validated that, compared to state-of-the-art approaches, EWISTARS obtained the lowest mean absolute error, the lowest mean-squared error, and the highest peak signal-to-noise ratio. Conclusion. EWISTARS is superior to state-of-the-art approaches. PMID:27066068
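
    Component (ii), the iterative shrinkage-thresholding algorithm (ISTA), can be sketched on a generic sparse recovery problem, minimizing 0.5*||y - Ax||^2 + lam*||x||_1 (the exponential wavelet transform and the random shift of EWISTARS are omitted here):

    ```python
    # Plain ISTA: gradient step on the data term, then soft-thresholding.
    import numpy as np

    def soft(x, t):
        return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

    rng = np.random.default_rng(0)
    A = rng.normal(size=(128, 256))
    x_true = np.zeros(256)
    x_true[rng.choice(256, 10, replace=False)] = 1.0
    y = A @ x_true

    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = Lipschitz constant
    x, lam = np.zeros(256), 0.1
    for _ in range(500):
        x = soft(x + step * A.T @ (y - A @ x), step * lam)
    ```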

  15. A Proposed Extension to the Soil Moisture and Ocean Salinity Level 2 Algorithm for Mixed Forest and Moderate Vegetation Pixels

    NASA Technical Reports Server (NTRS)

    Panciera, Rocco; Walker, Jeffrey P.; Kalma, Jetse; Kim, Edward

    2011-01-01

    The Soil Moisture and Ocean Salinity (SMOS) mission, launched in November 2009, provides global maps of soil moisture and ocean salinity by measuring the L-band (1.4 GHz) emission of the Earth's surface with a spatial resolution of 40-50 km. Uncertainty in the retrieval of soil moisture over large heterogeneous areas such as SMOS pixels is expected, due to the non-linearity of the relationship between soil moisture and the microwave emission. The current baseline soil moisture retrieval algorithm adopted by SMOS and implemented in the SMOS Level 2 (SMOS L2) processor partially accounts for the sub-pixel heterogeneity of the land surface by modelling the individual contributions of different pixel fractions to the overall pixel emission. This retrieval approach is tested in this study using airborne L-band data over an area the size of a SMOS pixel characterised by a mix of Eucalypt forest and moderate vegetation types (grassland and crops), with the objective of assessing its ability to correct for the soil moisture retrieval error induced by land surface heterogeneity. A preliminary analysis using a traditional uniform pixel retrieval approach shows that the sub-pixel heterogeneity of land cover type causes significant errors in soil moisture retrieval (7.7% v/v RMSE, 2% v/v bias) in pixels characterised by a significant amount of forest (40-60%). Although the retrieval approach adopted by SMOS partially reduces this error, it is affected by errors beyond the SMOS target accuracy, presenting in particular a strong dry bias when a fraction of the pixel is occupied by forest (4.1% v/v RMSE, -3.1% v/v bias). An extension to the SMOS approach is proposed that accounts for the heterogeneity of vegetation optical depth within the SMOS pixel. The proposed approach is shown to significantly reduce the error in retrieved soil moisture (2.8% v/v RMSE, -0.3% v/v bias) in pixels characterised by a critical amount of forest (40-60%), at the limited cost of only a crude estimate of the

  16. MODIS 250m burned area mapping based on an algorithm using change point detection and Markov random fields.

    NASA Astrophysics Data System (ADS)

    Mota, Bernardo; Pereira, Jose; Campagnolo, Manuel; Killick, Rebeca

    2013-04-01

    Area burned in the tropical savannas of Brazil was mapped from MODIS-AQUA daily 250 m resolution imagery by adapting one of the European Space Agency fire_CCI project burned area algorithms, based on change point detection and Markov random fields. The study area covers 1.44 Mkm2 and was analyzed with data from 2005. The daily 1000 m image quality layer was used for cloud and cloud shadow screening. The algorithm treats each pixel as a time series and detects changes in the statistical properties of the NIR reflectance values to identify potential burning dates. The first step of the algorithm is robust filtering, to exclude outlier observations, followed by application of the Pruned Exact Linear Time (PELT) change point detection technique. Near-infrared (NIR) spectral reflectance changes between time segments, and post-change NIR reflectance values, are combined into a fire likelihood score. Change points corresponding to an increase in reflectance are dismissed as potential burn events, as are those occurring outside of a pre-defined fire season. In the last step of the algorithm, monthly burned area probability maps and detection date maps are converted to dichotomous (burned/unburned) maps using Markov random fields, which take into account both spatial and temporal relations in the potential burned area maps. A preliminary assessment of our results is performed by comparison with data from the MODIS 1 km active fires and 500 m burned area products, taking into account the differences in spatial resolution between the two sensors.
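
    A sketch of the change point detection step on one pixel's NIR time series, assuming the ruptures Python package (which provides a PELT implementation); the penalty value and the synthetic reflectances are illustrative:

    ```python
    # Detect candidate burn dates as change points in an NIR reflectance series.
    import numpy as np
    import ruptures as rpt

    nir = np.r_[np.random.normal(0.35, 0.02, 120),   # pre-fire reflectance
                np.random.normal(0.20, 0.02, 80)]    # post-burn drop

    algo = rpt.Pelt(model="l2").fit(nir)
    breakpoints = algo.predict(pen=0.5)              # indices of detected changes
    print(breakpoints)
    ```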

  17. Automatic classification of endogenous landslide seismicity using the Random Forest supervised classifier

    NASA Astrophysics Data System (ADS)

    Provost, F.; Hibert, C.; Malet, J.-P.

    2017-01-01

    The deformation of slow-moving landslides developed in clays induces endogenous seismicity of mostly low-magnitude events (ML<1). Long seismic records and complete catalogs are needed to identify the types of seismic sources and understand their mechanisms. Manual classification of long records is time-consuming and may be highly subjective. We propose an automatic classification method based on the computation of 71 seismic attributes and the use of a supervised classifier. No attribute was selected a priori, in order to create a generic multi-class classification method applicable to many landslide contexts. The method can be applied directly to the results of a simple detector. We developed the approach on the eight-sensor seismic network of the Super-Sauze clay-rich landslide (South French Alps) for the detection of four types of seismic sources. The automatic algorithm achieves 93% sensitivity in comparison with a manually interpreted catalog taken as the reference.

  18. Intra-and-Inter Species Biomass Prediction in a Plantation Forest: Testing the Utility of High Spatial Resolution Spaceborne Multispectral RapidEye Sensor and Advanced Machine Learning Algorithms

    PubMed Central

    Dube, Timothy; Mutanga, Onisimo; Adam, Elhadi; Ismail, Riyad

    2014-01-01

    The quantification of aboveground biomass using remote sensing is critical for better understanding the role of forests in carbon sequestration and for informed sustainable management. Although remote sensing techniques have proven useful for assessing forest biomass in general, more work is required to investigate their capabilities in predicting intra-and-inter species biomass, which is mainly characterised by non-linear relationships. In this study, we tested two machine learning algorithms, Stochastic Gradient Boosting (SGB) and Random Forest (RF) regression trees, for predicting intra-and-inter species biomass using high resolution RapidEye reflectance bands as well as the derived vegetation indices in a commercial plantation. The results showed that the SGB algorithm yielded the best performance for intra-and-inter species biomass prediction, both using all the predictor variables and based on the most important selected variables. For example, using the most important variables the algorithm produced an R2 of 0.80 and an RMSE of 16.93 t·ha−1 for E. grandis; an R2 of 0.79 and RMSE of 17.27 t·ha−1 for P. taeda; and an R2 of 0.61 and RMSE of 43.39 t·ha−1 for the combined species data set. Comparatively, RF yielded plausible results only for E. dunii (R2 of 0.79; RMSE of 7.18 t·ha−1). We demonstrated that although the two statistical methods were able to predict biomass accurately, RF produced weaker results than SGB when applied to the combined species dataset. The result underscores the relevance of stochastic models in predicting biomass drawn from different species and genera using the new generation high resolution RapidEye sensor with strategically positioned bands. PMID:25140631

  19. Predictive modeling of groundwater nitrate pollution using Random Forest and multisource variables related to intrinsic and specific vulnerability: a case study in an agricultural setting (Southern Spain).

    PubMed

    Rodriguez-Galiano, Victor; Mendes, Maria Paula; Garcia-Soldado, Maria Jose; Chica-Olmo, Mario; Ribeiro, Luis

    2014-04-01

    Watershed management decisions need robust methods that allow accurate predictive modeling of pollutant occurrences. Random Forest (RF) is a powerful machine learning, data-driven method that is rarely used in water resources studies and thus has not been evaluated thoroughly in this field compared to more conventional pattern recognition techniques. Key advantages of RF include its non-parametric nature, high predictive accuracy, and capability to determine variable importance. This last characteristic can be used to better understand the individual role and the combined effect of explanatory variables in both protecting and exposing groundwater from and to a pollutant. In this paper, the performance of RF regression for predictive modeling of nitrate pollution is explored, based on intrinsic and specific vulnerability assessment of the Vega de Granada aquifer. The applicability of this machine learning technique is demonstrated in an agriculture-dominated area where nitrate concentrations in groundwater can exceed the trigger value of 50 mg/L at many locations. A comprehensive GIS database of twenty-four parameters related to intrinsic hydrogeologic properties, driving forces, remotely sensed variables and physical-chemical variables measured in situ was used as input to build different predictive models of nitrate pollution. RF measures of importance were also used to define the most significant predictors of nitrate pollution in groundwater, allowing the establishment of the pollution sources (pressures). The potential of RF for generating a vulnerability map to nitrate pollution is assessed considering multiple criteria related to variations in the algorithm parameters and the accuracy of the maps. The performance of RF is also evaluated in comparison with the logistic regression (LR) method using different efficiency measures to ensure their generalization ability. Prediction results show the ability of RF to build accurate models

  20. A Circuit-Based Neural Network with Hybrid Learning of Backpropagation and Random Weight Change Algorithms

    PubMed Central

    Yang, Changju; Kim, Hyongsuk; Adhikari, Shyam Prasad; Chua, Leon O.

    2016-01-01

    A hybrid learning method combining software-based backpropagation (BP) learning and hardware-based random weight change (RWC) learning is proposed for the development of circuit-based neural networks. Backpropagation is known as one of the most efficient learning algorithms, but its weak point is that its hardware implementation is extremely difficult. The RWC algorithm, which is very easy to implement in hardware, takes too many iterations to learn. The proposed learning algorithm is a hybrid of the two: the main learning is first performed with a software version of the BP algorithm, and the learned weights are then transplanted onto a hardware version of the neural circuit. At the time of the weight transplantation, a significant amount of output error occurs due to the characteristic differences between the software and the hardware. In the proposed method, this error is reduced via complementary learning with the RWC algorithm, which is implemented in simple hardware. The usefulness of the proposed hybrid learning system is verified via simulations on several classical learning problems. PMID:28025566
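
    The RWC rule itself can be sketched as follows (a toy version: perturb all weights by a small random ±δ, keep applying the same perturbation while the error decreases, otherwise draw a new one; the weights and error function are placeholders):

    ```python
    # Random weight change (RWC): hardware-friendly, gradient-free fine-tuning.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=10)                    # weights after BP pre-training

    def error(w):
        return np.sum((w - 1.0) ** 2)          # placeholder error function

    delta = 0.01 * rng.choice([-1.0, 1.0], size=w.shape)
    e_prev = error(w)
    for _ in range(10000):
        w_new = w + delta
        e_new = error(w_new)
        if e_new < e_prev:                     # improved: keep the same perturbation
            w, e_prev = w_new, e_new
        else:                                  # worse: draw a new random perturbation
            delta = 0.01 * rng.choice([-1.0, 1.0], size=w.shape)
    ```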

  2. Multi-Objective Random Search Algorithm for Simultaneously Optimizing Wind Farm Layout and Number of Turbines

    NASA Astrophysics Data System (ADS)

    Feng, Ju; Shen, Wen Zhong; Xu, Chang

    2016-09-01

    A new algorithm for multi-objective wind farm layout optimization is presented. It formulates the wind turbine locations as continuous variables and is capable of optimizing the number of turbines and their locations in the wind farm simultaneously. Two objectives are considered. One is to maximize the total power production, which is calculated by considering the wake effects using the Jensen wake model combined with the local wind distribution. The other is to minimize the total electrical cable length. This length is taken to be the total length of the minimal spanning tree that connects all turbines and is calculated using Prim's algorithm. Constraints on the wind farm boundary and wind turbine proximity are also considered. An ideal test case shows that the proposed algorithm largely outperforms a well-known multi-objective genetic algorithm (NSGA-II). In a real test case based on the Horns Rev 1 wind farm, the algorithm also obtains useful Pareto frontiers and provides a wide range of Pareto-optimal layouts with different numbers of turbines for a real-life wind farm developer.
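
    The cable-length objective can be computed with Prim's algorithm as sketched below (turbine coordinates are placeholders):

    ```python
    # Total cable length as the weight of the minimum spanning tree (Prim).
    import numpy as np

    def mst_cable_length(pos):
        """pos: (n, 2) turbine coordinates; returns the total MST edge length."""
        n = len(pos)
        dist = np.linalg.norm(pos[:, None, :] - pos[None, :, :], axis=-1)
        in_tree = [0]
        best = dist[0].copy()          # cheapest link from the tree to each node
        total = 0.0
        for _ in range(n - 1):
            best[in_tree] = np.inf     # nodes already in the tree are ineligible
            j = int(np.argmin(best))
            total += best[j]
            in_tree.append(j)
            best = np.minimum(best, dist[j])
        return total

    turbines = np.random.rand(20, 2) * 5000.0   # placeholder layout in metres
    print(mst_cable_length(turbines))
    ```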

  3. Design, synthesis and experimental validation of novel potential chemopreventive agents using random forest and support vector machine binary classifiers.

    PubMed

    Sprague, Brienne; Shi, Qian; Kim, Marlene T; Zhang, Liying; Sedykh, Alexander; Ichiishi, Eiichiro; Tokuda, Harukuni; Lee, Kuo-Hsiung; Zhu, Hao

    2014-06-01

    Compared to the current knowledge on cancer chemotherapeutic agents, only limited information is available on the ability of organic compounds, such as drugs and/or natural products, to prevent or delay the onset of cancer. In order to evaluate chemical chemopreventive potentials and design novel chemopreventive agents with low to no toxicity, we developed predictive computational models for chemopreventive agents in this study. First, we curated a database containing over 400 organic compounds with known chemoprevention activities. Based on this database, various random forest and support vector machine binary classifiers were developed. All of the resulting models were validated by cross validation procedures. Then, the validated models were applied to virtually screen a chemical library containing around 23,000 natural products and derivatives. We selected a list of 148 novel chemopreventive compounds based on the consensus prediction of all validated models. We further analyzed the predicted active compounds by their ease of organic synthesis. Finally, 18 compounds were synthesized and experimentally validated for their chemopreventive activity. The experimental validation results paralleled the cross validation results, demonstrating the utility of the developed models. The predictive models developed in this study can be applied to virtually screen other chemical libraries to identify novel lead compounds for the chemoprevention of cancers.

  4. A Combined Random Forests and Active Contour Model Approach for Fully Automatic Segmentation of the Left Atrium in Volumetric MRI

    PubMed Central

    Luo, Gongning

    2017-01-01

    Segmentation of the left atrium (LA) from cardiac magnetic resonance imaging (MRI) datasets is of great importance for image-guided atrial fibrillation ablation, LA fibrosis quantification, and cardiac biophysical modelling. However, automated LA segmentation from cardiac MRI is challenging due to limited image resolution, considerable variability in anatomical structures across subjects, and the dynamic motion of the heart. In this work, we propose a combined random forests (RFs) and active contour model (ACM) approach for fully automatic segmentation of the LA from cardiac volumetric MRI. Specifically, we employ RFs within an auto-context scheme to effectively integrate contextual and appearance information from multisource images for inferring the LA shape. The inferred shape is then incorporated into a volume-scalable ACM to further improve the segmentation accuracy. We validated the proposed method on the cardiac volumetric MRI datasets from the STACOM 2013 and HVSMR 2016 databases and showed that it outperforms other recent automated LA segmentation methods. The validation metrics, average Dice coefficient (DC) and average surface-to-surface distance (S2S), were computed as 0.9227 ± 0.0598 and 1.14 ± 1.205 mm, versus 0.6222–0.878 and 1.34–8.72 mm obtained by the other methods, respectively. PMID:28316992
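
    The Dice coefficient quoted above measures the overlap of two binary masks, DC = 2|A∩B| / (|A| + |B|); a minimal version with placeholder masks:

    ```python
    # Dice coefficient between a predicted and a reference binary mask.
    import numpy as np

    def dice(a, b):
        a, b = a.astype(bool), b.astype(bool)
        return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

    pred = np.zeros((64, 64, 64), bool); pred[20:40, 20:40, 20:40] = True
    ref = np.zeros((64, 64, 64), bool); ref[22:42, 20:40, 20:40] = True
    print(round(dice(pred, ref), 4))
    ```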

  5. Random Forest Classification of Sediments on Exposed Intertidal Flats Using ALOS-2 Quad-Polarimetric SAR Data

    NASA Astrophysics Data System (ADS)

    Wang, W.; Yang, X.; Liu, G.; Zhou, H.; Ma, W.; Yu, Y.; Li, Z.

    2016-06-01

    Coastal zones are among the world's most densely populated areas, and an accurate, cost-effective, frequent, and synoptic method of monitoring these complex ecosystems is needed. However, misclassification of sediments on exposed intertidal flats restricts the development of coastal zone surveillance. With the advent of SAR (Synthetic Aperture Radar) satellites, polarimetric SAR satellite imagery plays an increasingly important role in monitoring changes in coastal wetlands. This research investigated the necessity of combining SAR polarimetric features with optical data, and their contribution to accurate sediment classification. Three experimental groups were set up to assess the most appropriate descriptors: (i) several SAR polarimetric descriptors extracted from the scattering matrix using the Cloude-Pottier, Freeman-Durden and Yamaguchi methods; (ii) optical remote sensing (RS) data with R, G and B channels as the second feature combination; and (iii) the chosen SAR and optical RS indicators together. Classification was carried out using Random Forest (RF) classifiers, and a general map of the intertidal flats was generated. Experiments were implemented using ALOS-2 L-band satellite imagery and GF-1 optical multi-spectral data acquired in the same period. The weights of the descriptors were evaluated by the RF Variable Importance (VI). Results suggested that the optical data source offers few advantages for sediment classification and can even reduce the effect of the SAR indicators. Polarimetric SAR feature sets show great potential for intertidal flat classification and are promising for classifying mud flats, sand flats, bare farmland and tidal water.

  7. Proteomics Analysis with a Nano Random Forest Approach Reveals Novel Functional Interactions Regulated by SMC Complexes on Mitotic Chromosomes

    PubMed Central

    Ohta, Shinya; Montaño-Gutierrez, Luis F.; de Lima Alves, Flavia; Ogawa, Hiromi; Toramoto, Iyo; Sato, Nobuko; Morrison, Ciaran G.; Takeda, Shunichi; Hudson, Damien F.; Earnshaw, William C.

    2016-01-01

    Packaging of DNA into condensed chromosomes during mitosis is essential for the faithful segregation of the genome into daughter nuclei. Although the structure and composition of mitotic chromosomes have been studied for over 30 years, these aspects are yet to be fully elucidated. Here, we used stable isotope labeling with amino acids in cell culture to compare the proteomes of mitotic chromosomes isolated from cell lines harboring conditional knockouts of members of the condensin (SMC2, CAP-H, CAP-D3), cohesin (Scc1/Rad21), and SMC5/6 (SMC5) complexes. Our analysis revealed that these complexes associate with chromosomes independently of each other, with the SMC5/6 complex showing no significant dependence on any other chromosomal proteins during mitosis. To identify subtle relationships between chromosomal proteins, we employed a nano Random Forest (nanoRF) approach to detect protein complexes and the relationships between them. Our nanoRF results suggested that as few as 113 of 5058 detected chromosomal proteins are functionally linked to chromosome structure and segregation. Furthermore, nanoRF data revealed 23 proteins that were not previously suspected to have functional interactions with complexes playing important roles in mitosis. Subsequent small-interfering-RNA-based validation and localization tracking by green fluorescent protein-tagging highlighted novel candidates that might play significant roles in mitotic progression. PMID:27231315

  8. Polarimetric SAR decomposition parameter subset selection and their optimal dynamic range evaluation for urban area classification using Random Forest

    NASA Astrophysics Data System (ADS)

    Hariharan, Siddharth; Tirodkar, Siddhesh; Bhattacharya, Avik

    2016-02-01

    Urban area classification is important for monitoring the ever-increasing urbanization and studying its environmental impact. Two NASA JPL UAVSAR L-band (wavelength: 23 cm) datasets were used in this study for urban area classification. The two datasets differ in terms of urban area structures, building patterns, and the geometric shapes and sizes of the buildings. In these datasets, some urban areas appear oriented about the radar line of sight (LOS) while other areas appear non-oriented. In this study, roll-invariant polarimetric SAR decomposition parameters were used to classify these urban areas. Random Forest (RF), an ensemble decision tree learning technique, was used in this study. RF performs parameter subset selection as part of its classification procedure, and in this study the parameter subsets were obtained and analyzed to infer the scattering mechanisms useful for urban area classification. The Cloude-Pottier α, the Touzi dominant scattering amplitude αs1 and the anisotropy A were among the top six important parameters selected for both datasets; however, these parameters were ranked differently for the two datasets. The urban area classification using RF was compared with the Support Vector Machine (SVM) and the Maximum Likelihood Classifier (MLC) for both datasets. RF outperforms SVM by 4% and MLC by 12% on Dataset 1. It also outperforms SVM and MLC by 3.5% and 11%, respectively, on Dataset 2.

  9. Agricultural cropland mapping using black-and-white aerial photography, Object-Based Image Analysis and Random Forests

    NASA Astrophysics Data System (ADS)

    Vogels, M. F. A.; de Jong, S. M.; Sterk, G.; Addink, E. A.

    2017-02-01

    Land-use and land-cover (LULC) conversions have an important impact on land degradation, erosion and water availability. Information on historical land cover (change) is crucial for studying and modelling land- and ecosystem degradation. During the past decades major LULC conversions occurred in Africa, Southeast Asia and South America as a consequence of a growing population and economy; most distinct is the conversion of natural vegetation into cropland. Historical LULC information can be derived from satellite imagery, but satellite records only date back to approximately 1972. Before the emergence of satellite imagery, landscapes were monitored by black-and-white (B&W) aerial photography, which is usually interpreted visually, a very time-consuming approach. This study presents an innovative, semi-automated method to map cropland acreage from B&W photography. Cropland acreage was mapped at two study sites, in Ethiopia and in The Netherlands. For this purpose we used Geographic Object-Based Image Analysis (GEOBIA) and a Random Forest classification on a set of variables comprising texture, shape, slope, neighbour and spectral information. The overall mapping accuracies attained are 90% and 96% for the two study areas, respectively. This mapping method extends the period over which historical cropland expansion can be mapped, purely from the brightness information in B&W photography, back to the 1930s, which is beneficial for regions where historical land-use statistics are mostly absent.

  10. DNABP: Identification of DNA-Binding Proteins Based on Feature Selection Using a Random Forest and Predicting Binding Residues

    PubMed Central

    Guo, Jing; Sun, Xiao

    2016-01-01

    DNA-binding proteins are fundamentally important in cellular processes. Several computational methods have been developed in recent years to improve the prediction of DNA-binding proteins, but insufficient work has been done on predicting them from protein sequence information alone. In this paper, a novel predictor, DNABP (DNA-binding proteins), was designed to predict DNA-binding proteins using the random forest (RF) classifier with a hybrid feature. The hybrid feature contains two types of novel sequence features, which reflect information about the conservation of physicochemical properties of the amino acids, and the binding propensity of DNA-binding residues and non-binding propensities of non-binding residues. Comparisons with each feature demonstrated that these two novel features contributed most to the improvement in predictive ability. Furthermore, to improve the prediction performance of the DNABP model, feature selection using the minimum redundancy maximum relevance (mRMR) method combined with incremental feature selection (IFS) was carried out during model construction. The results showed that the DNABP model could achieve 86.90% accuracy, 83.76% sensitivity, 90.03% specificity and a Matthews correlation coefficient of 0.727. The high prediction accuracy and performance comparisons with previous research suggest that DNABP could be a useful approach to identify DNA-binding proteins from sequence information. The DNABP web server system is freely available at http://www.cbi.seu.edu.cn/DNABP/. PMID:27907159

  11. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling.

    PubMed

    Zhou, Fuqun; Zhang, Aining

    2016-10-25

    Nowadays, various time-series Earth Observation datasets with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to use these time-series MODIS datasets efficiently for long-term environmental monitoring due to their vast volume and information redundancy, and this challenge will grow when Sentinel 2-3 data become available. Another challenge researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle these two important issues in a case study of land cover mapping using CCRS 10-day MODIS composites with the help of two features of Random Forests: variable importance and outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized to transfer sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about half of the variables we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data.
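
    A sketch of the subset-selection idea, with synthetic data standing in for the MODIS stack and scikit-learn assumed: rank variables by RF importance, then keep the shortest prefix of the ranking whose cross-validated accuracy stays close to that of the full set:

    ```python
    # Variable-importance-driven selection of an optimal band subset.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=600, n_features=36, n_informative=10,
                               random_state=0)  # placeholder time-series bands
    rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]

    full = cross_val_score(rf, X, y, cv=5).mean()
    for k in range(2, X.shape[1] + 1):
        sub = RandomForestClassifier(n_estimators=300, random_state=0)
        acc = cross_val_score(sub, X[:, order[:k]], y, cv=5).mean()
        if acc >= full - 0.01:                   # "close to the full dataset"
            print(f"{k} of {X.shape[1]} variables suffice (acc={acc:.3f})")
            break
    ```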

  12. Random forests in non-invasive sensorimotor rhythm brain-computer interfaces: a practical and convenient non-linear classifier.

    PubMed

    Steyrl, David; Scherer, Reinhold; Faller, Josef; Müller-Putz, Gernot R

    2016-02-01

    There is general agreement in the brain-computer interface (BCI) community that although non-linear classifiers can provide better results in some cases, linear classifiers are preferable, particularly because non-linear classifiers often involve a number of parameters that must be carefully chosen. However, new non-linear classifiers have been developed over the last decade. One of them is the random forest (RF) classifier. Although popular in other fields of science, RFs are not common in BCI research. In this work, we address three open questions regarding RFs in sensorimotor rhythm (SMR) BCIs: parametrization, online applicability, and performance compared to regularized linear discriminant analysis (LDA). We found that the performance of RF is constant over a large range of parameter values. We demonstrate, for the first time, that RFs are applicable online in SMR-BCIs. Further, we show in an offline BCI simulation that RFs statistically significantly outperform regularized LDA by about 3%. These results confirm that RFs are practical and convenient non-linear classifiers for SMR-BCIs. Taking into account further properties of RFs, such as independence from feature distributions, maximum margin behavior, and multiclass and advanced data mining capabilities, we argue that RFs should be taken into consideration for future BCIs.
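
    The RF-versus-regularized-LDA comparison could be reproduced in outline as below; the band-power features and class labels are synthetic placeholders, and scikit-learn's shrinkage LDA stands in for the paper's regularized LDA.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
X = rng.normal(size=(400, 20))      # hypothetical band-power SMR features per trial
y = rng.integers(0, 2, size=400)    # two motor imagery classes

rf = RandomForestClassifier(n_estimators=1000, random_state=0)
lda = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")  # regularized LDA
for name, clf in [("random forest", rf), ("shrinkage LDA", lda)]:
    acc = cross_val_score(clf, X, y, cv=10)
    print(f"{name}: {acc.mean():.3f} +/- {acc.std():.3f}")
```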

  13. Mapping sub-antarctic cushion plants using random forests to combine very high resolution satellite imagery and terrain modelling.

    PubMed

    Bricher, Phillippa K; Lucieer, Arko; Shaw, Justine; Terauds, Aleks; Bergstrom, Dana M

    2013-01-01

    Monitoring changes in the distribution and density of plant species often requires accurate and high-resolution baseline maps of those species. Detecting such change at the landscape scale is often problematic, particularly in remote areas. We examine a new technique to improve accuracy and objectivity in mapping vegetation, combining species distribution modelling and satellite image classification on a remote sub-Antarctic island. In this study, we combine spectral data from very high resolution WorldView-2 satellite imagery and terrain variables from a high resolution digital elevation model to improve mapping accuracy, in both pixel- and object-based classifications. Random forest classification was used to explore the effectiveness of these approaches on mapping the distribution of the critically endangered cushion plant Azorella macquariensis Orchard (Apiaceae) on sub-Antarctic Macquarie Island. Both pixel- and object-based classifications of the distribution of Azorella achieved very high overall validation accuracies (91.6-96.3%, κ = 0.849-0.924). Both two-class and three-class classifications were able to accurately and consistently identify the areas where Azorella was absent, indicating that these maps provide a suitable baseline for monitoring expected change in the distribution of the cushion plants. Detecting such change is critical given the threats this species is currently facing under altering environmental conditions. The method presented here has applications to monitoring a range of species, particularly in remote and isolated environments.
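
    For reference, the two headline validation statistics quoted above (overall accuracy and κ) can be computed from reference and predicted labels as in this small sketch; the label vectors are invented for illustration.

```python
import numpy as np
from sklearn.metrics import accuracy_score, cohen_kappa_score

# Hypothetical reference and predicted class labels for validation pixels/objects.
reference = np.array([0, 0, 1, 1, 1, 2, 2, 0, 1, 2])
predicted = np.array([0, 0, 1, 1, 2, 2, 2, 0, 1, 1])
print("overall accuracy:", accuracy_score(reference, predicted))
print("kappa           :", cohen_kappa_score(reference, predicted))
```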

  14. Peaks and dips in Gaussian random fields: a new algorithm for the shear eigenvalues, and the excursion set theory

    NASA Astrophysics Data System (ADS)

    Rossi, Graziano

    2013-04-01

    We present a new algorithm to sample the constrained eigenvalues of the initial shear field associated with Gaussian statistics, called the 'peak/dip excursion-set-based' algorithm, at positions which correspond to peaks or dips of the correlated density field. The computational procedure is based on a new formula which extends Doroshkevich's unconditional distribution for the eigenvalues of the linear tidal field, to account for the fact that haloes and voids may correspond to maxima or minima of the density field. The ability to differentiate between random positions and special points in space around which haloes or voids may form (i.e. peaks/dips), encoded in the new formula and reflected in the algorithm, naturally leads to a straightforward implementation of an excursion set model for peaks and dips in Gaussian random fields - one of the key advantages of this sampling procedure. In addition, it offers novel insights into the statistical description of the cosmic web. As a first physical application, we show how the standard distributions of shear ellipticity and prolateness in triaxial models of structure formation are modified by the constraint. In particular, we provide a new expression for the conditional distribution of shape parameters given the density peak constraint, which generalizes previous work in the literature. The formula has important implications for the modelling of non-spherical dark matter halo shapes, in relation to their initial shape distribution. We also test and confirm our theoretical predictions for the individual distributions of eigenvalues subjected to the extremum constraint, along with other directly related conditional probabilities. Finally, we indicate how the proposed sampling procedure naturally integrates into the standard excursion set model, potentially solving some of its well-known problems, and into the ellipsoidal collapse framework. Several other ongoing applications and extensions, towards the development of
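
    For context, the unconditional Doroshkevich distribution that the new formula extends is commonly quoted, for ordered eigenvalues $\lambda_1 \ge \lambda_2 \ge \lambda_3$ of the shear tensor in a Gaussian field with variance $\sigma^2$, as

$$ p(\lambda_1,\lambda_2,\lambda_3) = \frac{3375}{8\sqrt{5}\,\pi\,\sigma^6}\, \exp\!\left(-\frac{3 I_1^2}{\sigma^2} + \frac{15 I_2}{2\sigma^2}\right) (\lambda_1-\lambda_2)(\lambda_2-\lambda_3)(\lambda_1-\lambda_3), $$

    where $I_1 = \lambda_1+\lambda_2+\lambda_3$ and $I_2 = \lambda_1\lambda_2+\lambda_1\lambda_3+\lambda_2\lambda_3$.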

  15. A production-inventory model with permissible delay incorporating learning effect in random planning horizon using genetic algorithm

    NASA Astrophysics Data System (ADS)

    Kar, Mohuya B.; Bera, Shankar; Das, Debasis; Kar, Samarjit

    2015-10-01

    This paper presents a production-inventory model for deteriorating items with stock-dependent demand under inflation in a random planning horizon. The supplier offers the retailer fully permissible delay in payment. It is assumed that the time horizon of the business period is random in nature and follows an exponential distribution with a known mean. Here, a learning effect is also introduced for the production cost and setup cost. The model is formulated as a profit maximization problem with respect to the retailer and solved with the help of a genetic algorithm (GA) and particle swarm optimization (PSO). Moreover, the convergence of the two methods, GA and PSO, is studied against the number of generations, and it is seen that GA converges more rapidly than PSO. The optimum results from the two methods are compared both numerically and graphically. It is observed that the performance of GA is marginally better than that of PSO. We have provided some numerical examples and some sensitivity analyses to illustrate the model.
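
    A toy illustration of the GA side of the comparison: a minimal real-coded genetic algorithm maximizing a stand-in profit function. The objective, population size, and operators are all hypothetical; the paper's actual objective couples production, inventory, credit terms and learning effects.

```python
import numpy as np

rng = np.random.default_rng(3)

def profit(x):
    # Hypothetical smooth stand-in for the model's expected profit.
    return -np.sum((x - 0.3) ** 2, axis=-1)

pop = rng.uniform(0, 1, size=(50, 4))          # 50 candidate decision vectors
for gen in range(200):
    fit = profit(pop)
    # Tournament selection: keep the better of random pairs.
    i, j = rng.integers(0, 50, size=(2, 50))
    parents = np.where((fit[i] > fit[j])[:, None], pop[i], pop[j])
    # Uniform crossover with a random partner, then sparse Gaussian mutation.
    mask = rng.random(parents.shape) < 0.5
    children = np.where(mask, parents, parents[rng.permutation(50)])
    children += rng.normal(0, 0.05, size=children.shape) * (rng.random(children.shape) < 0.1)
    pop = np.clip(children, 0, 1)
best = pop[np.argmax(profit(pop))]
print("best decision vector:", best, "profit:", profit(best))
```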

  16. Development of Solution Algorithm and Sensitivity Analysis for Random Fuzzy Portfolio Selection Model

    NASA Astrophysics Data System (ADS)

    Hasuike, Takashi; Katagiri, Hideki

    2010-10-01

    This paper focuses on a portfolio selection problem that considers an investor's subjectivity, and on sensitivity analysis for changes in that subjectivity. Since the proposed problem is formulated as a random fuzzy programming problem, owing to both randomness and subjectivity represented by fuzzy numbers, it is not well-defined. Therefore, by introducing the Sharpe ratio, one of the most important performance measures of portfolio models, the main problem is transformed into a standard fuzzy programming problem. Furthermore, using sensitivity analysis for fuzziness, the analytical optimal portfolio with the sensitivity factor is obtained.
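
    For reference, the Sharpe ratio invoked above is, in its usual form,

$$ \mathrm{Sharpe}(x) = \frac{E[R_p(x)] - r_f}{\sigma_p(x)}, $$

    where $R_p(x)$ is the return of portfolio $x$, $r_f$ the risk-free rate, and $\sigma_p(x)$ the standard deviation of the portfolio return.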

  17. Temporal optimisation of image acquisition for land cover classification with Random Forest and MODIS time-series

    NASA Astrophysics Data System (ADS)

    Nitze, Ingmar; Barrett, Brian; Cawkwell, Fiona

    2015-02-01

    The analysis and classification of land cover is one of the principal applications in terrestrial remote sensing. Due to the seasonal variability of different vegetation types and land surface characteristics, the ability to discriminate land cover types changes over time. Multi-temporal classification can help to improve classification accuracies, but different constraints, such as financial restrictions or atmospheric conditions, may impede its application. The optimisation of image acquisition timing and frequency can help to increase the effectiveness of the classification process. For this purpose, the Feature Importance (FI) measure of the state-of-the-art machine learning method Random Forest was used to determine the optimal image acquisition periods for a general (Grassland, Forest, Water, Settlement, Peatland) and a Grassland-specific (Improved Grassland, Semi-Improved Grassland) land cover classification in central Ireland, based on a 9-year time-series of MODIS Terra 16-day composite data (MOD13Q1). Feature Importances for each acquisition period of the Enhanced Vegetation Index (EVI) and Normalised Difference Vegetation Index (NDVI) were calculated for both classification scenarios. In the general land cover classification, the months December and January showed the highest, and July and August the lowest, separability for both VIs over the entire nine-year period. This temporal separability was reflected in the classification accuracies, where the optimal choice of image dates outperformed the worst image date by 13% using NDVI and 5% using EVI in a mono-temporal analysis. With the addition of the next best image periods to the data input, the classification accuracies converged quickly to their limit at around 8-10 images. The binary classification schemes, using two classes only, showed a stronger seasonal dependency with a higher intra-annual, but lower inter-annual, variation. Nonetheless, anomalous weather conditions, such as the cold winter of
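
    A sketch of how per-period Feature Importances might be aggregated to rank acquisition periods, assuming a hypothetical layout of one EVI and one NDVI variable per composite period; the data and layout are illustrative, not the study's.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)
n_periods = 23                              # 16-day MODIS composites per year
X = rng.normal(size=(800, n_periods * 2))   # assumed order: [EVI, NDVI] per period
y = rng.integers(0, 5, size=800)            # hypothetical land cover classes

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
# Sum the two VI importances within each acquisition period, then rank periods.
period_importance = rf.feature_importances_.reshape(n_periods, 2).sum(axis=1)
ranked = np.argsort(period_importance)[::-1]
print("acquisition periods ranked by importance:", ranked[:5], "...")
```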

  18. Fast Numerical Algorithms for 3-D Scattering from PEC and Dielectric Random Rough Surfaces in Microwave Remote Sensing

    NASA Astrophysics Data System (ADS)

    Zhang, Lisha

    We present fast and robust numerical algorithms for 3-D scattering from perfectly electrically conducting (PEC) and dielectric random rough surfaces in microwave remote sensing. The Coifman wavelets, or Coiflets, are employed to implement Galerkin's procedure in the method of moments (MoM). Due to the high-precision one-point quadrature, the Coiflets yield fast evaluations of most off-diagonal entries, reducing the matrix fill effort from O(N²) to O(N). The orthogonality and Riesz basis of the Coiflets generate a well-conditioned impedance matrix, with rapid convergence for the conjugate gradient solver. The resulting impedance matrix is further sparsified by the matrix-formed standard fast wavelet transform (SFWT). By properly selecting multiresolution levels of the total transformation matrix, the solution precision can be enhanced without noticeably sacrificing matrix sparsity or memory consumption. The unified fast scattering algorithm for dielectric random rough surfaces asymptotically reduces to the PEC case when the loss tangent grows extremely large. Numerical results demonstrate that the reduced PEC model does not suffer from ill-posed problems. Compared with previous publications and laboratory measurements, good agreement is observed.

  19. Feature selection and classification of urinary mRNA microarray data by iterative random forest to diagnose renal fibrosis: a two-stage study

    PubMed Central

    Zhou, Le-Ting; Cao, Yu-Han; Lv, Lin-Li; Ma, Kun-Ling; Chen, Ping-Sheng; Ni, Hai-Feng; Lei, Xiang-Dong; Liu, Bi-Cheng

    2017-01-01

    Renal fibrosis is a common pathological pathway of progressive chronic kidney disease (CKD). However, kidney function parameters are suboptimal for detecting early fibrosis, and therefore, novel biomarkers are urgently needed. We designed a 2-stage study and constructed a targeted microarray to detect urinary mRNAs of CKD patients with renal biopsy and healthy participants. We analysed the microarray data by an iterative random forest method to select candidate biomarkers and produce a more accurate classifier of renal fibrosis. Seventy-six and 49 participants were enrolled into stage I and stage II studies, respectively. By the iterative random forest method, we identified a four-mRNA signature in urinary sediment, including TGFβ1, MMP9, TIMP2, and vimentin, as important features of tubulointerstitial fibrosis (TIF). All four mRNAs significantly correlated with TIF scores and discriminated TIF with high sensitivity, which was further validated in the stage-II study. The combined classifiers showed excellent sensitivity and outperformed serum creatinine and estimated glomerular filtration rate measurements in diagnosing TIF. Another four mRNAs significantly correlated with glomerulosclerosis. These findings showed that urinary mRNAs can serve as sensitive biomarkers of renal fibrosis, and the random forest classifier containing urinary mRNAs showed favourable performance in diagnosing early renal fibrosis. PMID:28045061
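
    The iterative random forest selection described above can be sketched as a loop that repeatedly drops the least important features while tracking cross-validated accuracy; the mRNA panel below is synthetic and the 20% drop fraction is an arbitrary choice.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(5)
X = rng.normal(size=(125, 60))      # hypothetical urinary mRNA expression panel
y = rng.integers(0, 2, size=125)    # fibrosis vs. control

features = list(range(X.shape[1]))
history = []
while len(features) > 2:
    rf = RandomForestClassifier(n_estimators=500, random_state=0)
    acc = cross_val_score(rf, X[:, features], y, cv=5).mean()
    history.append((len(features), acc))
    rf.fit(X[:, features], y)
    # Drop the least important 20% of the remaining features each round.
    order = np.argsort(rf.feature_importances_)
    keep = max(2, int(len(features) * 0.8))
    features = [features[k] for k in sorted(order[-keep:])]
best = max(history, key=lambda t: t[1])
print("best panel size:", best[0], "CV accuracy:", round(best[1], 3))
```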

  20. Use of random forest to estimate population attributable fractions from a case-control study of Salmonella enterica serotype Enteritidis infections.

    PubMed

    Gu, W; Vieira, A R; Hoekstra, R M; Griffin, P M; Cole, D

    2015-10-01

    To design effective food safety programmes, we need to estimate how many sporadic foodborne illnesses are caused by specific food sources, based on case-control studies. Logistic regression has substantive limitations for analysing structured questionnaire data with numerous exposures and missing values. We adapted random forest to analyse data from a case-control study of Salmonella enterica serotype Enteritidis illness for source attribution. For the estimation of summary population attributable fractions (PAFs) of exposures grouped into transmission routes, we devised a counterfactual estimator to predict the reductions in illness associated with removing grouped exposures. For the purpose of comparison, we fitted the data using logistic regression models with stepwise forward and backward variable selection. Our results show that the forward and backward variable selection of the logistic regression models was not consistent for parameter estimation, with different significant exposures identified. By contrast, the random forest model produced estimated PAFs of grouped exposures consistent in rank order with results obtained from outbreak data, with egg-related exposures having the highest estimated PAF (22.1%, 95% confidence interval 8.5-31.8). Random forest might be structurally more coherent and efficient than logistic regression models for attributing Salmonella illnesses to sources involving many causal pathways.
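
    The counterfactual PAF estimator devised in the study can be illustrated roughly as follows: predict risk with the fitted forest, zero out a group of exposures, re-predict, and compare. The data, the exposure grouping, and the zeroing convention are all hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(6)
X = rng.integers(0, 2, size=(2000, 30)).astype(float)  # binary exposure answers
y = rng.integers(0, 2, size=2000)                      # case vs. control
egg_related = [0, 1, 2]                                # hypothetical grouped exposures

rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)
p_obs = rf.predict_proba(X)[:, 1].mean()

# Counterfactual: remove the grouped exposures and re-predict illness risk.
X_cf = X.copy()
X_cf[:, egg_related] = 0.0
p_cf = rf.predict_proba(X_cf)[:, 1].mean()
paf = (p_obs - p_cf) / p_obs
print(f"estimated PAF for the exposure group: {paf:.1%}")
```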

  1. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness.

    PubMed

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia's marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy, and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF), based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge-informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. The effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to 'small p and large n' problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and
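
    Of the FS methods listed, Boruta has a widely used open-source implementation; a minimal usage sketch with the third-party boruta package is shown below, on invented predictor data.

```python
import numpy as np
from boruta import BorutaPy                      # third-party: pip install Boruta
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
X = rng.normal(size=(300, 15))        # hypothetical multibeam backscatter derivatives
y = rng.integers(0, 4, size=300)      # four hardness classes

rf = RandomForestClassifier(n_jobs=-1, max_depth=5)
boruta = BorutaPy(rf, n_estimators="auto", random_state=0)
boruta.fit(X, y)                      # BorutaPy expects plain numpy arrays
print("confirmed predictors:", np.where(boruta.support_)[0])
```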

  2. Robust prediction of B-factor profile from sequence using two-stage SVR based on random forest feature selection.

    PubMed

    Pan, Xiao-Yong; Shen, Hong-Bin

    2009-01-01

    The B-factor, which measures the uncertainty in the position of an atom within a crystal structure, is highly correlated with protein internal motion. Although the rapid progress of structural biology in recent years makes more accurate protein structures available than ever, with the avalanche of new protein sequences emerging during the post-genomic era, the gap between the known protein sequences and the known protein structures becomes wider and wider. It is urgent to develop automated methods to predict the B-factor profile from the amino acid sequence directly, so that it can be utilized for basic research in a timely manner. In this article, we propose a novel approach, called PredBF, to predict the real value of the B-factor. We first extract both global and local features from the protein sequences as well as their evolutionary information; then random forest feature selection is applied to rank their importance, and the most important features are fed into a two-stage support vector regression (SVR) for prediction, where the initial predicted outputs from the 1st SVR are further fed into the 2nd-layer SVR for final refinement. Our results reveal that a systematic analysis of the importance of different features gives deep insights into the different contributions of the features and is very necessary for developing effective B-factor prediction tools. The two-layer SVR prediction model designed in this study further enhanced the robustness of predicting the B-factor profile. As a web server, PredBF is freely available at http://www.csbio.sjtu.edu.cn/bioinf/PredBF for academic use.
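
    A rough outline of the two-stage design, with random forest importances as the selection step and the first-stage prediction appended as an input to the second stage; the features, targets, and top-20 cutoff are placeholders rather than PredBF's actual configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

rng = np.random.default_rng(8)
X = rng.normal(size=(600, 50))             # hypothetical sequence-derived features
y = X[:, :3].sum(axis=1) + rng.normal(scale=0.3, size=600)  # stand-in B-factor values

# Stage 0: rank features with random forest importance, keep the top 20.
rf = RandomForestRegressor(n_estimators=300, random_state=0).fit(X, y)
top = np.argsort(rf.feature_importances_)[::-1][:20]

# Stage 1 SVR on the selected features; stage 2 refines using the
# stage-1 prediction appended as an extra input.
svr1 = SVR().fit(X[:, top], y)
stage1 = svr1.predict(X[:, top])
X2 = np.column_stack([X[:, top], stage1])
svr2 = SVR().fit(X2, y)
print("refined prediction for the first sample:", svr2.predict(X2[:1]))
```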

  3. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness

    PubMed Central

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia’s marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy, and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF), based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge-informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. The effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to ‘small p and large n’ problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and

  4. A comparative study of family-specific protein-ligand complex affinity prediction based on random forest approach

    NASA Astrophysics Data System (ADS)

    Wang, Yu; Guo, Yanzhi; Kuang, Qifan; Pu, Xuemei; Ji, Yue; Zhang, Zhihang; Li, Menglong

    2015-04-01

    The assessment of binding affinity between ligands and target proteins plays an essential role in the drug discovery and design process. As an alternative to widely used scoring approaches, machine learning methods have also been proposed for fast prediction of binding affinity with promising results, but most of them were developed as all-purpose models despite the specific functions of different protein families, since proteins from different functional families have different structures and physicochemical features. In this study, we propose a random forest method to predict protein-ligand binding affinity based on a comprehensive feature set covering protein sequence, binding pocket, ligand structure and intermolecular interaction. Feature processing and compression were implemented separately for the different protein family datasets, which indicates that different features contribute to different models, so an individual representation for each protein family is necessary. Three family-specific models were constructed for three important protein target families: HIV-1 protease, trypsin and carbonic anhydrase. As a comparison, two generic models including diverse protein families were also built. The evaluation results show that the models built on family-specific datasets perform better than those built on the generic datasets; the Pearson and Spearman correlation coefficients (Rp and Rs) on the test sets are 0.740, 0.874, 0.735 and 0.697, 0.853, 0.723 for HIV-1 protease, trypsin and carbonic anhydrase, respectively. Comparisons with other methods further demonstrate that individual representation and model construction for each protein family is a more reasonable way to predict the affinity of a particular protein family.
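
    A family-specific model of this kind reduces, in outline, to fitting a random forest regressor on one family's complexes and scoring it with Rp and Rs, as in this sketch on synthetic descriptors.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(9)
X = rng.normal(size=(400, 30))                       # hypothetical complex descriptors
y = X[:, 0] * 2 + rng.normal(scale=0.5, size=400)    # stand-in binding affinity

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
pred = rf.predict(X_te)
print("Rp:", pearsonr(y_te, pred)[0], "Rs:", spearmanr(y_te, pred)[0])
```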

  5. A correction scheme for a simplified analytical random walk model algorithm of proton dose calculation in distal Bragg peak regions

    NASA Astrophysics Data System (ADS)

    Yao, Weiguang; Merchant, Thomas E.; Farr, Jonathan B.

    2016-10-01

    The lateral homogeneity assumption is used in most analytical algorithms for proton dose, such as the pencil-beam algorithms and our simplified analytical random walk model. To improve the dose calculation in the distal fall-off region in heterogeneous media, we analyzed primary proton fluence near heterogeneous media and propose to calculate the lateral fluence with voxel-specific Gaussian distributions. The lateral fluence from a beamlet is no longer expressed by a single Gaussian for all the lateral voxels, but by a specific Gaussian for each lateral voxel. The voxel-specific Gaussian for the beamlet of interest is calculated by re-initializing the fluence deviation on an effective surface where the proton energies of the beamlet of interest and the beamlet passing the voxel are the same. The dose improvement from the correction scheme was demonstrated by the dose distributions in two sets of heterogeneous phantoms consisting of cortical bone, lung, and water and by evaluating distributions in example patients with a head-and-neck tumor and metal spinal implants. The dose distributions from Monte Carlo simulations were used as the reference. The correction scheme effectively improved the dose calculation accuracy in the distal fall-off region and increased the gamma test pass rate. The extra computation for the correction was about 20% of that for the original algorithm but is dependent upon patient geometry.
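
    For orientation, the lateral fluence of a beamlet in such analytical algorithms is typically modeled as a normalized 2-D Gaussian whose width $\sigma(z)$ grows with depth; the correction described above replaces the single $\sigma(z)$ with a voxel-specific value:

$$ \Phi(x,y;z) = \frac{1}{2\pi\sigma^2(z)} \exp\!\left(-\frac{x^2+y^2}{2\sigma^2(z)}\right). $$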

  6. A new logistic dynamic particle swarm optimization algorithm based on random topology.

    PubMed

    Ni, Qingjian; Deng, Jianming

    2013-01-01

    The population topology of particle swarm optimization (PSO) directly affects the dissemination of optimal information during the evolutionary process and has a significant impact on the performance of PSO. Classic static population topologies are usually used in PSO, such as the fully connected, ring, star, and square topologies. In this paper, the performance of PSO with the proposed random topologies is analyzed, and the relationship between population topology and the performance of PSO is explored from the perspective of graph-theoretic characteristics of population topologies. Further, for a relatively new PSO variant named logistic dynamic particle swarm optimization, an extensive simulation study is presented to discuss the effectiveness of the random topology and design strategies for population topology. Finally, the experimental data are analyzed and discussed, and some useful conclusions about the design and use of population topology in PSO are proposed, which can provide a basis for further discussion and research.
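
    A generic PSO with a random informant topology (in the spirit of, though not identical to, the topologies studied above) can be sketched as follows; the objective, swarm size, and coefficients are illustrative.

```python
import numpy as np

rng = np.random.default_rng(11)

def objective(x):
    return np.sum(x ** 2, axis=-1)     # toy sphere function to minimize

n, dim = 30, 10
pos = rng.uniform(-5, 5, size=(n, dim))
vel = np.zeros((n, dim))
pbest, pbest_val = pos.copy(), objective(pos)

# Random topology: each particle is informed by itself plus 3 random particles.
neighbors = [np.append(rng.choice(n, size=3, replace=False), i) for i in range(n)]

for _ in range(300):
    vals = objective(pos)
    improved = vals < pbest_val
    pbest[improved] = pos[improved]
    pbest_val[improved] = vals[improved]
    for i in range(n):
        nb = neighbors[i]
        lbest = pbest[nb[np.argmin(pbest_val[nb])]]   # best informant
        r1, r2 = rng.random(dim), rng.random(dim)
        vel[i] = 0.72 * vel[i] + 1.49 * r1 * (pbest[i] - pos[i]) \
                                + 1.49 * r2 * (lbest - pos[i])
    pos = pos + vel
print("best value found:", pbest_val.min())
```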

  7. The adaptive dynamic community detection algorithm based on the non-homogeneous random walking

    NASA Astrophysics Data System (ADS)

    Xin, Yu; Xie, Zhi-Qiang; Yang, Jing

    2016-05-01

    As habits and customs change, people's social activity tends to be changeable, so a community evolution analysis method is required to mine the dynamic information in social networks. To this end, we design a random walking possibility function and a topology gain function to calculate the global influence matrix of the nodes. From the analysis of the global influence matrix, the clustering directions of the nodes can be obtained, and thus the NRW (Non-Homogeneous Random Walk) method for detecting static overlapping communities can be established. We design the ANRW (Adaptive Non-Homogeneous Random Walk) method by adapting the nodes impacted by dynamic events, based on the NRW. The ANRW combines local community detection with dynamic adaptive adjustment to decrease its computational cost. Furthermore, the ANRW treats the node as the unit of computation, so it is well suited to parallel computing, which could meet the requirements of large-scale dataset mining. Finally, experimental analysis verifies the efficiency of the ANRW for dynamic community detection.

  8. Robust 3D object localization and pose estimation for random bin picking with the 3DMaMa algorithm

    NASA Astrophysics Data System (ADS)

    Skotheim, Øystein; Thielemann, Jens T.; Berge, Asbjørn; Sommerfelt, Arne

    2010-02-01

    Enabling robots to automatically locate and pick up randomly placed and oriented objects from a bin is an important challenge in factory automation, replacing tedious and heavy manual labor. A system should be able to recognize and locate objects with a predefined shape and estimate the position with the precision necessary for a gripping robot to pick it up. We describe a system that consists of a structured light instrument for capturing 3D data and a robust approach for object location and pose estimation. The method does not depend on segmentation of range images, but instead searches through pairs of 2D manifolds to localize candidates for object match. This leads to an algorithm that is not very sensitive to scene complexity or the number of objects in the scene. Furthermore, the strategy for candidate search is easily reconfigurable to arbitrary objects. Experiments reported in this paper show the utility of the method on a general random bin picking problem, in this paper exemplified by localization of car parts with random position and orientation. Full pose estimation is done in less than 380 ms per image. We believe that the method is applicable for a wide range of industrial automation problems where precise localization of 3D objects in a scene is needed.

  9. Exploring issues of training data imbalance and mislabelling on random forest performance for large area land cover classification using the ensemble margin

    NASA Astrophysics Data System (ADS)

    Mellor, Andrew; Boukir, Samia; Haywood, Andrew; Jones, Simon

    2015-07-01

    Studies have demonstrated the robust performance of the ensemble machine learning classifier, random forests, for remote sensing land cover classification, particularly across complex landscapes. This study introduces new ensemble margin criteria to evaluate the performance of Random Forests (RF) in the context of large-area land cover classification and examines the effect of different training data characteristics (imbalance and mislabelling) on classification accuracy and uncertainty. The study presents a new margin-weighted confusion matrix, which, used in combination with the traditional confusion matrix, provides confidence estimates associated with correctly and misclassified instances in the RF classification model. Landsat TM satellite imagery and topographic and climate ancillary data are used to build binary (forest/non-forest) and multiclass (forest canopy cover classes) classification models, trained using sample aerial photograph maps, across Victoria, Australia. Experiments were undertaken to reveal insights into the behaviour of RF over large and complex data, in which training data are not evenly distributed among classes (imbalance) and contain systematically mislabelled instances. The results of the experiments reveal that while the error rate of the RF classifier is relatively insensitive to mislabelled training data (in the multiclass experiment, from an overall Kappa of 78.3% with no mislabelled instances to 70.1% with 25% mislabelling in each class), the level of associated confidence falls at a faster rate than overall accuracy with increasing amounts of mislabelled training data. In general, balanced training data resulted in the lowest overall error rates for the classification experiments (82.3% and 78.3% for the binary and multiclass experiments, respectively). However, the results of the study demonstrate that imbalance can be introduced to improve the error rates of more difficult classes, without adversely affecting overall classification accuracy.
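
    The ensemble margin underlying these criteria is commonly defined, for a given instance, as the fraction of trees voting for the true class minus the largest fraction voting for any other class; a sketch on synthetic data follows. The helper name and dataset are illustrative, and the code assumes the integer labels coincide with the forest's internal class encoding.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_classes=3, n_informative=5, random_state=0)
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

def ensemble_margin(forest, x, true_class):
    # Fraction of trees voting for the true class minus the largest
    # fraction voting for any other class; ranges from -1 to 1.
    votes = np.array([t.predict(x.reshape(1, -1))[0] for t in forest.estimators_])
    frac = np.bincount(votes.astype(int), minlength=forest.n_classes_) / len(votes)
    others = np.delete(frac, true_class)
    return frac[true_class] - others.max()

print("margin of first training instance:", ensemble_margin(rf, X[0], y[0]))
```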

  10. Predictive value of initial FDG-PET features for treatment response and survival in esophageal cancer patients treated with chemo-radiation therapy using a random forest classifier

    PubMed Central

    Ruan, Su; Modzelewski, Romain; Pineau, Pascal; Vauclin, Sébastien; Gouel, Pierrick; Michel, Pierre; Di Fiore, Frédéric; Vera, Pierre; Gardin, Isabelle

    2017-01-01

    Purpose In oncology, texture features extracted from positron emission tomography with 18-fluorodeoxyglucose images (FDG-PET) are of increasing interest for predictive and prognostic studies, leading to several tens of features per tumor. To select the best features, the use of a random forest (RF) classifier was investigated. Methods Sixty-five patients with an esophageal cancer treated with a combined chemo-radiation therapy were retrospectively included. All patients underwent a pretreatment whole-body FDG-PET. The patients were followed for 3 years after the end of the treatment. The response assessment was performed 1 month after the end of the therapy. Patients were classified as complete responders and non-complete responders. Sixty-one features were extracted from medical records and PET images. First, Spearman’s analysis was performed to eliminate correlated features. Then, the best predictive and prognostic subsets of features were selected using a RF algorithm. These results were compared to those obtained by a Mann-Whitney U test (predictive study) and a univariate Kaplan-Meier analysis (prognostic study). Results Among the 61 initial features, 28 were not correlated. From these 28 features, the best subset of complementary features found using the RF classifier to predict response was composed of 2 features: metabolic tumor volume (MTV) and homogeneity from the co-occurrence matrix. The corresponding predictive value (AUC = 0.836 ± 0.105, Se = 82 ± 9%, Sp = 91 ± 12%) was higher than the best predictive results found using the Mann-Whitney test: busyness from the gray level difference matrix (P < 0.0001, AUC = 0.810, Se = 66%, Sp = 88%). The best prognostic subset found using RF was composed of 3 features: MTV and 2 clinical features (WHO status and nutritional risk index) (AUC = 0.822 ± 0.059, Se = 79 ± 9%, Sp = 95 ± 6%), while no feature was significantly prognostic according to the Kaplan-Meier analysis. Conclusions The RF classifier can

  11. Efficient asymmetric image authentication schemes based on photon counting-double random phase encoding and RSA algorithms.

    PubMed

    Moon, Inkyu; Yi, Faliu; Han, Mingu; Lee, Jieun

    2016-06-01

    Recently, double random phase encoding (DRPE) has been integrated with the photon counting (PC) imaging technique for the purpose of secure image authentication. In this scheme, the same key should be securely distributed and shared between the sender and receiver, but this is one of the most vexing problems of symmetric cryptosystems. In this study, we propose an efficient asymmetric image authentication scheme by combining the PC-DRPE and RSA algorithms, which solves key management and distribution problems. The retrieved image from the proposed authentication method contains photon-limited encrypted data obtained by means of PC-DRPE. Therefore, the original image can be protected while the retrieved image can be efficiently verified using a statistical nonlinear correlation approach. Experimental results demonstrate the feasibility of our proposed asymmetric image authentication method.

  12. Using Logistic Regression and Random Forests multivariate statistical methods for landslide spatial probability assessment in North-Est Sicily, Italy

    NASA Astrophysics Data System (ADS)

    Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele

    2015-04-01

    The first phase of the work aimed to identify the spatial relationships between the landslide locations and the 13 related factors by using the Frequency Ratio bivariate statistical method. The analysis was then carried out using a multivariate statistical approach, with the Logistic Regression technique and the Random Forests technique, which gave the best results in terms of AUC. The models were performed and evaluated with different sample sizes, also taking into account the temporal variation of input variables such as areas burned by wildfire. The most significant outcomes of this work are the relevant influence of the sample size on the model results and the strong importance of some environmental factors (e.g. land use and wildfires) for the identification of the depletion zones of extremely rapid shallow landslides.

  13. Forest Walk Methods for Localizing Body Joints from Single Depth Image

    PubMed Central

    Jung, Ho Yub; Lee, Soochahn; Heo, Yong Seok; Yun, Il Dong

    2015-01-01

    We present multiple random forest methods for human pose estimation from single depth images that can operate at very high frame rates. We introduce four algorithms: random forest walk, greedy forest walk, random forest jumps, and greedy forest jumps. The proposed approaches can accurately infer the 3D positions of body joints without additional information such as a temporal prior. A regression forest is trained to estimate the probability distribution of the direction or offset toward a particular joint, relative to the current position. During pose estimation, the new position is chosen from a set of representative directions or offsets, and the distribution for the next position is found by traversing the regression tree from the new position. Continual position sampling through 3D space eventually produces an expectation of sample positions, which we take as the estimated joint position. The experiments show that the accuracy is higher than current state-of-the-art pose estimation methods, with an additional advantage in computation time. PMID:26402029

  14. Automatic segmentation of ground-glass opacities in lung CT images by using Markov random field-based algorithms.

    PubMed

    Zhu, Yanjie; Tan, Yongqing; Hua, Yanqing; Zhang, Guozhen; Zhang, Jianguo

    2012-06-01

    Chest radiologists rely on the segmentation and quantitative analysis of ground-glass opacities (GGO) to perform imaging diagnoses that evaluate the disease severity or recovery stages of diffuse parenchymal lung diseases. However, GGO are computationally difficult to segment and analyze compared with other lung diseases, since they usually do not have clear boundaries. In this paper, we present a new approach that automatically segments GGO in lung computed tomography (CT) images using algorithms derived from Markov random field theory. Further, we systematically evaluate the performance of the algorithms in segmenting GGO in lung CT images under different situations. CT image studies from 41 patients with diffuse lung diseases were enrolled in this research. The local distributions were modeled with both simple and adaptive (AMAP) models of maximum a posteriori (MAP). For best segmentation, we used the simulated annealing algorithm with a Gibbs sampler to solve the combinatorial optimization problem of the MAP estimators, and we applied a knowledge-guided strategy to reduce false-positive regions. We achieved AMAP-based GGO segmentation results of 86.94%, 94.33%, and 94.06% in average sensitivity, specificity, and accuracy, respectively, and we evaluated the performance using radiologists' subjective evaluation and quantitative analysis and diagnosis. We also compared the results of AMAP-based GGO segmentation with those of support vector machine-based methods, and we discuss the reliability and other issues of AMAP-based GGO segmentation. Our results demonstrate the acceptability and usefulness of AMAP-based GGO segmentation for assisting radiologists in detecting GGO in high-resolution CT diagnostic procedures.

  15. Estimating Digital Terrain Model in forest areas from TanDEM-X and Stereo-photogrammetric technique by means of Random Volume over Ground model

    NASA Astrophysics Data System (ADS)

    Lee, S. K.; Fatoyinbo, T. E.; Lagomasino, D.; Osmanoglu, B.; Feliciano, E. A.

    2015-12-01

    The Digital Terrain Model (DTM) in forest areas is invaluable information for various environmental, hydrological and ecological studies, for example watershed delineation, vegetation canopy height estimation, water dynamics modeling, and forest biomass and carbon estimation. There are few solutions for extracting bare-earth Digital Elevation Model (DEM) information. Airborne lidar systems are widely and successfully used for estimating bare-earth DEMs with centimeter-order accuracy and high spatial resolution. However, the high cost of operation and small image coverage prevent the use of airborne lidar sensors at large or global scales. Although ICESat/GLAS (Ice, Cloud, and land Elevation Satellite/Geoscience Laser Altimeter System) lidar datasets have been available for global DTM estimation at relatively low cost, the large footprint size of 70 m and the interval of 172 m are insufficient for various applications. In this study we propose to extract a higher-resolution bare-earth DEM over vegetated areas from the combination of interferometric complex coherence from single-pass TanDEM-X (TDX) data at HH polarization and a Digital Surface Model (DSM) derived from high-resolution WorldView (WV) images by means of the random volume over ground (RVoG) model. The RVoG model is widely and successfully used for polarimetric SAR interferometry (Pol-InSAR) forest canopy height inversion. The bare-earth DEM is obtained from the complex volume decorrelation in the RVoG model, with the DSM estimated by the stereo-photogrammetric technique. Forest canopy height can be estimated by subtracting the estimated bare-earth DEM from the DSM. Finally, a DTM from an airborne lidar system was used to validate the bare-earth DEM and forest canopy height estimates.

  16. Monitoring grass nutrients and biomass as indicators of rangeland quality and quantity using random forest modelling and WorldView-2 data

    NASA Astrophysics Data System (ADS)

    Ramoelo, Abel; Cho, M. A.; Mathieu, R.; Madonsela, S.; van de Kerchove, R.; Kaszta, Z.; Wolff, E.

    2015-12-01

    Land use and climate change could have huge impacts on food security and the health of various ecosystems. Leaf nitrogen (N) and above-ground biomass are among the key factors limiting agricultural production and ecosystem functioning, and they can be used as indicators of rangeland quality and quantity. Conventional methods for assessing these vegetation parameters at the landscape scale are time consuming and tedious. Remote sensing provides a bird's-eye view of the landscape, which creates an opportunity to assess these vegetation parameters over wider rangeland areas. Estimation of leaf N has been successful during peak productivity or high biomass, and few studies have estimated leaf N in the dry season. The estimation of above-ground biomass has been hindered by signal saturation problems when using conventional vegetation indices. The objective of this study is to monitor leaf N and above-ground biomass as indicators of rangeland quality and quantity using WorldView-2 satellite images and the random forest technique in the north-eastern part of South Africa. A series of field campaigns to collect samples for leaf N and biomass was undertaken in March 2013, April or May 2012 (end of wet season) and July 2012 (dry season). Several conventional and red-edge-based vegetation indices were computed. Overall, the results indicate that random forest and vegetation indices explained over 89% of leaf N concentrations for grass and trees, and less than 89% across all the years of assessment. The red-edge-based vegetation indices were among the important variables for predicting leaf N. For biomass, the random forest model explained over 84% of the biomass variation in all years, and visible bands as well as red-edge-based vegetation indices were found to be important. The study demonstrated that leaf N can be monitored using high-spatial-resolution imagery with red-edge band capability, which is important for rangeland assessment and monitoring.

  17. The potential of random forest and neural networks for biomass and recombinant protein modeling in Escherichia coli fed-batch fermentations.

    PubMed

    Melcher, Michael; Scharl, Theresa; Spangl, Bernhard; Luchner, Markus; Cserjan, Monika; Bayer, Karl; Leisch, Friedrich; Striedner, Gerald

    2015-09-01

    Product quality assurance strategies in the production of biopharmaceuticals are currently undergoing a transformation from empirical "quality by testing" to rational, knowledge-based "quality by design" approaches. The major challenges in this context are the fragmentary understanding of bioprocesses and the severely limited real-time access to process variables related to product quality and quantity. Data-driven modeling of process variables in combination with model-predictive process control concepts represents a potential solution to these problems. The selection of the statistical techniques best qualified for bioprocess data analysis and modeling is a key criterion. In this work, a series of recombinant Escherichia coli fed-batch production processes with varying cultivation conditions was conducted, employing a comprehensive on- and offline process monitoring platform. The applicability of two machine learning methods, random forest and neural networks, for the prediction of cell dry mass and recombinant protein based on online-available process parameters and two-dimensional multi-wavelength fluorescence spectroscopy is investigated. Models based solely on routinely measured process variables give a satisfying prediction accuracy of about ±4% for the cell dry mass, while additional spectroscopic information allows for an estimation of the protein concentration within ±12%. The results clearly argue for a combined approach: neural networks as the modeling technique and random forest as the variable selection tool.

  18. Downscaling of surface moisture flux and precipitation in the Ebro Valley (Spain) using analogues and analogues followed by random forests and multiple linear regression

    NASA Astrophysics Data System (ADS)

    Ibarra-Berastegi, G.; Saénz, J.; Ezcurra, A.; Elías, A.; Diaz Argandoña, J.; Errasti, I.

    2011-06-01

    In this paper, reanalysis fields from the ECMWF have been statistically downscaled to predict surface moisture flux and daily precipitation from large-scale atmospheric fields at two observatories (Zaragoza and Tortosa, Ebro Valley, Spain) during the 1961-2001 period. Three types of downscaling models have been built: (i) analogues, (ii) analogues followed by random forests and (iii) analogues followed by multiple linear regression. The inputs consist of data (predictor fields) taken from the ERA-40 reanalysis. The predicted fields are precipitation and surface moisture flux as measured at the two observatories. With the aim of reducing the dimensionality of the problem, the ERA-40 fields have been decomposed using empirical orthogonal functions. The available daily data have been divided into two parts: a training period used to find a group of about 300 analogues to build the downscaling model (1961-1996), and a test period (1997-2001), where the models' performance has been assessed using independent data. In the case of surface moisture flux, the models based on analogues followed by random forests do not clearly outperform those built on analogues plus multiple linear regression, while simple averages calculated from the nearest analogues found in the training period yielded only slightly worse results. In the case of precipitation, the three types of model performed equally. These results suggest that most of the models' downscaling capability can be attributed to the analogues-calculation stage.
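
    The analogues stage common to all three models can be sketched as EOF (PCA) compression followed by a nearest-neighbour search in the reduced space; the fields, sample sizes, and the simple analogue-mean downscaling step below are placeholders.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(10)
fields = rng.normal(size=(5000, 200))   # hypothetical daily large-scale predictor fields
precip = rng.gamma(2.0, 1.0, size=5000) # hypothetical local daily precipitation

train, test = slice(0, 4000), slice(4000, 5000)
# Reduce dimensionality with EOFs (PCA), then find ~300 nearest
# analogue days in the training period for each test day.
pca = PCA(n_components=20).fit(fields[train])
z = pca.transform(fields)
nn = NearestNeighbors(n_neighbors=300).fit(z[train])
_, idx = nn.kneighbors(z[test])

# Simplest downscaling step: average the observed precipitation of the analogues.
pred = precip[train][idx].mean(axis=1)
print("predicted precipitation, first 5 test days:", pred[:5].round(2))
```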

  19. Downscaling of surface moisture flux and precipitation in the Ebro Valley (Spain) using analogues and analogues followed by random forests and multiple linear regression

    NASA Astrophysics Data System (ADS)

    Ibarra-Berastegi, G.; Saénz, J.; Ezcurra, A.; Elías, A.; Diaz de Argandoña, J.; Errasti, I.

    2011-02-01

    In this paper, reanalysis fields from the ECMWF have been statistically downscaled to predict surface moisture flux and daily precipitation from large-scale atmospheric fields at two observatories (Zaragoza and Tortosa, Ebro Valley, Spain) during the 1961-2001 period. Three types of downscaling models have been built: (i) analogues, (ii) analogues followed by random forests and (iii) analogues followed by multiple linear regression. The inputs consist of data (predictor fields) taken from the ERA-40 reanalysis. The predicted fields are precipitation and surface moisture flux as measured at the two observatories. With the aim of reducing the dimensionality of the problem, the ERA-40 fields have been decomposed using empirical orthogonal functions. The available daily data have been divided into two parts: a training period used to find a group of about 300 analogues to build the downscaling model (1961-1996), and a test period (1997-2001), where the models' performance has been assessed using independent data. In the case of surface moisture flux, the models based on analogues followed by random forests do not clearly outperform those built on analogues plus multiple linear regression, while simple averages calculated from the nearest analogues found in the training period yielded only slightly worse results. In the case of precipitation, the three types of model performed equally. These results suggest that most of the models' downscaling capability can be attributed to the analogues-calculation stage.

  20. Classification of savanna tree species, in the Greater Kruger National Park region, by integrating hyperspectral and LiDAR data in a Random Forest data mining environment

    NASA Astrophysics Data System (ADS)

    Naidoo, L.; Cho, M. A.; Mathieu, R.; Asner, G.

    2012-04-01

    The accurate classification and mapping of individual trees at the species level in the savanna ecosystem can provide numerous benefits for the managerial authorities. Such benefits include the mapping of economically useful tree species, which are a key source of food production and fuel wood for the local communities, and of problematic alien invasive and bush-encroaching species, which can threaten the integrity of the environment and the livelihoods of the local communities. Species-level mapping is particularly challenging in African savannas, which are complex, heterogeneous, and open environments with high intra-species spectral variability due to differences in geology, topography, rainfall, herbivory and human impacts within relatively short distances. Savanna vegetation is also highly irregular in canopy and crown shape, height and other structural dimensions, with a combination of open grassland patches and dense woody thicket - a stark contrast to the more homogeneous forest vegetation. This study classified eight common savanna tree species in the Greater Kruger National Park region, South Africa, using a combination of hyperspectral and Light Detection and Ranging (LiDAR)-derived structural parameters, in the form of seven predictor datasets, in an automated Random Forest modelling approach. The most important predictors, which were found to play an important role in the different classification models and contributed to the success of the hybrid dataset model when combined, were tree height; NDVI; the chlorophyll b wavelength (466 nm); and a selection of raw, continuum-removed and Spectral Angle Mapper (SAM) bands. It was also concluded that the hybrid predictor dataset Random Forest model yielded the highest classification accuracy and prediction success for the eight savanna tree species, with an overall classification accuracy of 87.68% and a KHAT value of 0.843.

  1. Derivation of Randomized Algorithms.

    DTIC Science & Technology

    1985-10-01

    ... multiple comparisons between keys are allowed on each step. Thus a comparison tree machine with p processors is allowed a maximum of p comparisons at ... be generated from a single original RAM by execution of a fork operation. This model, known as PRAM, allows multiple concurrent reads but prohibits ...

  2. Effect of sample size on multi-parametric prediction of tissue outcome in acute ischemic stroke using a random forest classifier

    NASA Astrophysics Data System (ADS)

    Forkert, Nils Daniel; Fiehler, Jens

    2015-03-01

    The tissue outcome prediction in acute ischemic stroke patients is highly relevant for clinical and research purposes. It has been shown that the combined analysis of diffusion and perfusion MRI datasets using high-level machine learning techniques leads to an improved prediction of final infarction compared to single perfusion parameter thresholding. However, most high-level classifiers require previous training and, until now, it has been unclear how many subjects are required for this, which is the focus of this work. 23 MRI datasets of acute stroke patients with known tissue outcome were used in this work. Relative values of diffusion and perfusion parameters, as well as the binary tissue outcome, were extracted on a voxel-by-voxel level for all patients and used for training of a random forest classifier. The number of patients used for training set definition was iteratively and randomly reduced from all 22 other patients to only one other patient. Thus, 22 tissue outcome predictions were generated for each patient using the trained random forest classifiers and compared to the known tissue outcome using the Dice coefficient. Overall, a logarithmic relation between the number of patients used for training set definition and tissue outcome prediction accuracy was found. Quantitatively, a mean Dice coefficient of 0.45 was found for the prediction using the training set consisting of the voxel information from only one other patient, which increases to 0.53 when using all other patients (n=22). Based on extrapolation, 50-100 patients appear to be a reasonable tradeoff between tissue outcome prediction accuracy and the effort required for data acquisition and preparation.
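
    The Dice coefficient used to score the predictions is simple to compute from two binary masks, as in this sketch with toy masks:

```python
import numpy as np

def dice(a, b):
    # Dice coefficient between two binary masks: 2|A n B| / (|A| + |B|).
    a, b = a.astype(bool), b.astype(bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

pred = np.array([[0, 1, 1], [0, 1, 0]])   # hypothetical predicted infarct mask
ref  = np.array([[0, 1, 0], [0, 1, 1]])   # hypothetical final infarct mask
print("Dice:", dice(pred, ref))           # ~0.667 for these toy masks
```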

  3. Population genetic structure of two rare tree species (Colubrina oppositifolia and Alphitonia ponderosa, Rhamnaceae) from Hawaiian dry and mesic forests using random amplified polymorphic DNA markers.

    PubMed

    Kwon, J A; Morden, C W

    2002-06-01

    Hawaiian dry and mesic forests contain an increasingly rare assemblage of species due to habitat destruction, invasive alien weeds and exotic pests. Two rare Rhamnaceae species in these ecosystems, Colubrina oppositifolia and Alphitonia ponderosa, were examined using random amplified polymorphic DNA (RAPD) markers to determine the genetic structure of the populations and the amount of variation relative to other native Hawaiian species. Relative variation is lower than in other Hawaiian species, although this is probably not a consequence of a genetic bottleneck. Larger populations of both species contain the highest levels of genetic diversity and smaller populations generally the least, as determined by the number of polymorphic loci, estimated heterozygosity, and Shannon's index of genetic diversity. Populations on separate islands were readily discernible for both species, as were two populations of C. oppositifolia on Hawai'i island (the North and South Kona populations). Substructure among ecologically separated Kaua'i subpopulations of A. ponderosa was also evident. Although population diversity is thought to have remained at pre-disturbance levels, population size continues to decline as recruitment is either absent or does not keep pace with the senescence of mature plants. Recovery efforts must focus on the control of alien species if these and other endemic dry and mesic forest species are to persist.

  4. Exposure assessment models for elemental components of particulate matter in an urban environment: A comparison of regression and random forest approaches

    NASA Astrophysics Data System (ADS)

    Brokamp, Cole; Jandarov, Roman; Rao, M. B.; LeMasters, Grace; Ryan, Patrick

    2017-02-01

    Exposure assessment for elemental components of particulate matter (PM) using land use modeling is a complex problem due to the high spatial and temporal variation in pollutant concentrations at the local scale. Land use regression (LUR) models may fail to capture complex interactions and non-linear relationships between pollutant concentrations and land use variables. The increasing availability of big spatial data and machine learning methods presents an opportunity to improve PM exposure assessment models. In this manuscript, our objective was to develop a novel land use random forest (LURF) model and compare its accuracy and precision to a LUR model for elemental components of PM in the urban city of Cincinnati, Ohio. PM smaller than 2.5 μm (PM2.5) and eleven elemental components were measured at 24 sampling stations from the Cincinnati Childhood Allergy and Air Pollution Study (CCAAPS). Over 50 different predictors associated with transportation, physical features, community socioeconomic characteristics, greenspace, land cover, and emission point sources were used to construct the LUR and LURF models. Cross-validation was used to quantify and compare model performance. LURF and LUR models were created for aluminum (Al), copper (Cu), iron (Fe), potassium (K), manganese (Mn), nickel (Ni), lead (Pb), sulfur (S), silicon (Si), vanadium (V), zinc (Zn), and total PM2.5 in the CCAAPS study area. LURF utilized a more diverse and greater number of predictors than LUR, and the LURF models for Al, K, Mn, Pb, Si, Zn, TRAP, and PM2.5 all showed a decrease in fractional predictive error of at least 5% compared to their LUR counterparts. The LURF models for Al, Cu, Fe, K, Mn, Pb, Si, Zn, TRAP, and PM2.5 all had a cross-validated fractional predictive error of less than 30%. Furthermore, the LUR models showed a differential exposure assessment bias and had a higher prediction error variance. Random forest and other machine learning methods may provide more accurate exposure assessment.

  5. Influence of multi-source and multi-temporal remotely sensed and ancillary data on the accuracy of random forest classification of wetlands in northern Minnesota

    USGS Publications Warehouse

    Corcoran, Jennifer M.; Knight, Joseph F.; Gallant, Alisa L.

    2013-01-01

    Wetland mapping at the landscape scale using remotely sensed data requires both affordable data and an efficient, accurate classification method. Random forest classification offers several advantages over traditional land cover classification techniques, including a bootstrapping technique to generate robust estimations of outliers in the training data, as well as the capability of measuring classification confidence. Though the random forest classifier can generate complex decision trees with a multitude of input data and still not run a high risk of overfitting, there is a great need to reduce computational and operational costs by including only key input data sets without sacrificing a significant level of accuracy. Our main questions for this study site in Northern Minnesota were: (1) how do the classification accuracy and confidence of wetland mapping compare when using different remote sensing platforms and sets of input data; (2) what are the key input variables for accurate differentiation of upland, water, and wetlands, including wetland type; and (3) which datasets and seasonal imagery yield the best accuracy for wetland classification. Our results show the key input variables include terrain (elevation and curvature) and soils descriptors (hydric), along with an assortment of remotely sensed data collected in the spring (satellite visible, near infrared, and thermal bands; satellite normalized vegetation index and Tasseled Cap greenness and wetness; and horizontal-horizontal (HH) and horizontal-vertical (HV) polarization using L-band satellite radar). We undertook this exploratory analysis to inform decisions by natural resource managers charged with monitoring wetland ecosystems and to aid in designing a system for consistent operational mapping of wetlands across landscapes similar to those found in Northern Minnesota.
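
    Ranking key input variables of the kind listed above maps directly onto the out-of-bag machinery of a random forest. The sketch below uses hypothetical feature names and simulated pixel samples rather than the Minnesota data; it shows how variable importance and the bootstrap-based internal accuracy are read off a fitted scikit-learn forest.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(1)
      features = ["elevation", "curvature", "hydric_soil", "spring_nir",
                  "ndvi_spring", "tc_wetness", "palsar_hh", "palsar_hv"]
      X = rng.random((1000, len(features)))   # placeholder pixel samples
      y = rng.integers(0, 4, 1000)            # 0=upland, 1=water, 2=bog, 3=fen (placeholder)

      rf = RandomForestClassifier(n_estimators=500, oob_score=True, random_state=1).fit(X, y)
      print(f"out-of-bag accuracy: {rf.oob_score_:.2f}")   # bootstrap-based internal accuracy
      for name, imp in sorted(zip(features, rf.feature_importances_), key=lambda t: -t[1]):
          print(f"{name:12s} {imp:.3f}")                   # key input variables rank highest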

  6. Landslide susceptibility assessment in Lianhua County (China): A comparison between a random forest data mining technique and bivariate and multivariate statistical models

    NASA Astrophysics Data System (ADS)

    Hong, Haoyuan; Pourghasemi, Hamid Reza; Pourtaghi, Zohre Sadat

    2016-04-01

    Landslides are an important natural hazard that causes a great amount of damage around the world every year, especially during the rainy season. The Lianhua area is located in the middle of China's southern mountainous area, west of Jiangxi Province, and is known to be an area prone to landslides. The aim of this study was to evaluate and compare landslide susceptibility maps produced using the random forest (RF) data mining technique with those produced by bivariate (evidential belief function and frequency ratio) and multivariate (logistic regression) statistical models for Lianhua County, China. First, a landslide inventory map was prepared using aerial photograph interpretation, satellite images, and extensive field surveys. In total, 163 landslide events were recognized in the study area, with 114 landslides (70%) used for training and 49 landslides (30%) used for validation. Next, the landslide conditioning factors, including the slope angle, altitude, slope aspect, topographic wetness index (TWI), slope-length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, distance to roads, annual precipitation, land use, normalized difference vegetation index (NDVI), and lithology, were derived from the spatial database. Finally, the landslide susceptibility maps of Lianhua County were generated in ArcGIS 10.1 based on the random forest (RF), evidential belief function (EBF), frequency ratio (FR), and logistic regression (LR) approaches and were validated using a receiver operating characteristic (ROC) curve. The ROC plot assessment results showed that for landslide susceptibility maps produced using the EBF, FR, LR, and RF models, the area under the curve (AUC) values were 0.8122, 0.8134, 0.7751, and 0.7172, respectively. Therefore, we can conclude that all four models have an AUC of more than 0.70 and can be used in landslide susceptibility mapping in the study area; meanwhile, the EBF and FR models had the best performance for Lianhua

  7. Segmentation of the Cerebellar Peduncles Using a Random Forest Classifier and a Multi-object Geometric Deformable Model: Application to Spinocerebellar Ataxia Type 6.

    PubMed

    Ye, Chuyang; Yang, Zhen; Ying, Sarah H; Prince, Jerry L

    2015-07-01

    The cerebellar peduncles, comprising the superior cerebellar peduncles (SCPs), the middle cerebellar peduncle (MCP), and the inferior cerebellar peduncles (ICPs), are white matter tracts that connect the cerebellum to other parts of the central nervous system. Methods for automatic segmentation and quantification of the cerebellar peduncles are needed for objectively and efficiently studying their structure and function. Diffusion tensor imaging (DTI) provides key information to support this goal, but it remains challenging because the tensors change dramatically in the decussation of the SCPs (dSCP), the region where the SCPs cross. This paper presents an automatic method for segmenting the cerebellar peduncles, including the dSCP. The method uses volumetric segmentation concepts based on extracted DTI features. The dSCP and noncrossing portions of the peduncles are modeled as separate objects, and are initially classified using a random forest classifier together with the DTI features. To obtain geometrically correct results, a multi-object geometric deformable model is used to refine the random forest classification. The method was evaluated using a leave-one-out cross-validation on five control subjects and four patients with spinocerebellar ataxia type 6 (SCA6). It was then used to evaluate group differences in the peduncles in a population of 32 controls and 11 SCA6 patients. In the SCA6 group, we have observed significant decreases in the volumes of the dSCP and the ICPs and significant increases in the mean diffusivity in the noncrossing SCPs, the MCP, and the ICPs. These results are consistent with a degeneration of the cerebellar peduncles in SCA6 patients.

  8. Free variable selection QSPR study to predict (19)F chemical shifts of some fluorinated organic compounds using Random Forest and RBF-PLS methods.

    PubMed

    Goudarzi, Nasser

    2016-04-05

    In this work, two new and powerful chemometrics methods are applied for the modeling and prediction of the (19)F chemical shift values of some fluorinated organic compounds. The radial basis function-partial least square (RBF-PLS) and random forest (RF) methods are employed to construct the models to predict the (19)F chemical shifts. In this study, no separate variable selection method was used, because the RF method can serve as both a variable selection and a modeling technique. Effects of the important parameters affecting the RF prediction power, such as the number of trees (nt) and the number of randomly selected variables to split each node (m), were investigated. The root-mean-square errors of prediction (RMSEP) for the training set and the prediction set for the RBF-PLS and RF models were 44.70, 23.86, 29.77, and 23.69, respectively. Also, the correlation coefficients of the prediction set for the RBF-PLS and RF models were 0.8684 and 0.9313, respectively. The results obtained reveal that the RF model can be used as a powerful chemometrics tool for quantitative structure-property relationship (QSPR) studies.
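
    The two tuning parameters named above, nt and m, correspond to n_estimators and max_features in scikit-learn. A minimal grid-search sketch under assumed placeholder descriptors (not the fluorinated-compound dataset) might look as follows.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import GridSearchCV

      rng = np.random.default_rng(2)
      X = rng.random((150, 40))   # placeholder molecular descriptors
      y = rng.random(150) * 100   # placeholder 19F chemical shifts (ppm)

      # nt = n_estimators (number of trees); m = max_features (variables tried per split)
      grid = GridSearchCV(RandomForestRegressor(random_state=2),
                          {"n_estimators": [100, 300, 500],
                           "max_features": [5, 10, 20, "sqrt"]},
                          scoring="neg_root_mean_squared_error", cv=5)
      grid.fit(X, y)
      print(grid.best_params_, -grid.best_score_)   # RMSEP-style score for the best (nt, m)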

  9. Meta-analysis of randomized controlled trials reveals an improved clinical outcome of using genotype plus clinical algorithm for warfarin dosing.

    PubMed

    Liao, Zhenqi; Feng, Shaoguang; Ling, Peng; Zhang, Guoqing

    2015-02-01

    Previous studies have raised interest in using the genotyping of CYP2C9 and VKORC1 to guide warfarin dosing. However, there is a lack of solid evidence that a genotype plus clinical algorithm provides better clinical outcomes than a clinical algorithm alone. The results of recently reported clinical trials are contradictory and need to be systematically evaluated. In this study, we aim to assess whether a genotype plus clinical algorithm for warfarin is superior to a clinical algorithm alone through a meta-analysis of randomized controlled trials (RCTs). All relevant studies from PubMed and reference lists from Jan 1, 1995 to Jan 13, 2014 were extracted and screened. Eligible studies included randomized trials that compared a clinical plus pharmacogenetic algorithm group to a clinical algorithm-only group using adult (≥ 18 years) patients with disease conditions that require warfarin use. We further used fixed-effect models to calculate the mean difference or the risk ratios (RRs) and 95% CIs to analyze the extracted data. The statistical heterogeneity was calculated using I(2). The percentage of time within the therapeutic INR range was considered to be the primary clinical outcome. The initial search strategy identified 50 citations, and 7 trials were eligible. These seven trials included 1,910 participants: 960 patients who received genotype plus clinical algorithm warfarin dosing and 950 patients who received the clinical algorithm only. We found that the percentage of time within the therapeutic INR range of the genotype-guided group was improved compared with the standard group in the RCTs when the initial standard dose was fixed (95% CI 0.09-0.40; I(2) = 47.8%). However, for the studies using non-fixed initial doses, the genotype-guided group failed to exhibit a statistically significant outcome compared to the standard group. No significant difference was observed in the incidences of adverse events (RR 0.94, 95% CI 0.84-1.04; I(2) = 0%, p
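
    The fixed-effect pooling used in such a meta-analysis is a short computation: each trial's mean difference is weighted by its inverse variance. The sketch below uses hypothetical trial summaries, not the seven trials analyzed above.

      import numpy as np

      # hypothetical per-trial mean differences in percent time-in-therapeutic-range
      # and their standard errors (placeholders, not the trials in this meta-analysis)
      md = np.array([0.30, 0.15, 0.22])
      se = np.array([0.10, 0.12, 0.09])

      w = 1.0 / se**2                        # inverse-variance weights (fixed-effect model)
      pooled = np.sum(w * md) / np.sum(w)
      pooled_se = np.sqrt(1.0 / np.sum(w))
      lo, hi = pooled - 1.96 * pooled_se, pooled + 1.96 * pooled_se
      q = np.sum(w * (md - pooled) ** 2)     # Cochran's Q for heterogeneity
      i2 = max(0.0, (q - (len(md) - 1)) / q) * 100
      print(f"pooled MD = {pooled:.2f} (95% CI {lo:.2f} to {hi:.2f}), I^2 = {i2:.0f}%")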

  10. Hierarchical Bayesian spatial models for predicting multiple forest variables using waveform LiDAR, hyperspectral imagery, and large inventory datasets

    USGS Publications Warehouse

    Finley, Andrew O.; Banerjee, Sudipto; Cook, Bruce D.; Bradford, John B.

    2013-01-01

    In this paper we detail a multivariate spatial regression model that couples LiDAR, hyperspectral and forest inventory data to predict forest outcome variables at a high spatial resolution. The proposed model is used to analyze forest inventory data collected on the US Forest Service Penobscot Experimental Forest (PEF), ME, USA. In addition to helping meet the regression model's assumptions, results from the PEF analysis suggest that the addition of multivariate spatial random effects improves model fit and predictive ability, compared with two commonly applied modeling approaches. This improvement results from explicitly modeling the covariation among forest outcome variables and spatial dependence among observations through the random effects. Direct application of such multivariate models to even moderately large datasets is often computationally infeasible because of cubic order matrix algorithms involved in estimation. We apply a spatial dimension reduction technique to help overcome this computational hurdle without sacrificing richness in modeling.

  11. Extending OPNET Modeler with External Pseudo Random Number Generators and Statistical Evaluation by the Limited Relative Error Algorithm

    NASA Astrophysics Data System (ADS)

    Becker, Markus; Weerawardane, Thushara Lanka; Li, Xi; Görg, Carmelita

    Pseudo Random Number Generators (PRNGs) are the basis of stochastic simulations, and the use of good generators is essential for valid simulation results. OPNET Modeler, a well-known tool for the simulation of communication networks, provides a built-in Pseudo Random Number Generator. The extension of OPNET Modeler with external generators and additional statistical evaluation methods performed for this paper increases the flexibility and options available in simulation studies.

  12. Optimal Subset Selection of Time-Series MODIS Images and Sample Data Transfer with Random Forests for Supervised Classification Modelling

    PubMed Central

    Zhou, Fuqun; Zhang, Aining

    2016-01-01

    Nowadays, various time-series Earth Observation data with multiple bands are freely available, such as Moderate Resolution Imaging Spectroradiometer (MODIS) datasets including 8-day composites from NASA, and 10-day composites from the Canada Centre for Remote Sensing (CCRS). It is challenging to efficiently use these time-series MODIS datasets for long-term environmental monitoring due to their vast volume and information redundancy. This challenge will be greater when Sentinel 2–3 data become available. Another challenge that researchers face is the lack of in-situ data for supervised modelling, especially for time-series data analysis. In this study, we attempt to tackle these two important issues with a case study of land cover mapping using CCRS 10-day MODIS composites and two features of Random Forests: variable importance and outlier identification. The variable importance feature is used to analyze and select optimal subsets of time-series MODIS imagery for efficient land cover mapping, and the outlier identification feature is utilized for transferring sample data available from one year to an adjacent year for supervised classification modelling. The results of the case study of agricultural land cover classification at a regional scale show that using only about half of the variables, we can achieve land cover classification accuracy close to that generated using the full dataset. The proposed simple but effective solution of sample transferring could make supervised modelling possible for applications lacking sample data. PMID:27792152
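
    Selecting an optimal subset of time-series bands by importance ranking, as described above, can be sketched as follows. This is an illustrative outline with simulated composites and labels, not the CCRS data: a forest is fitted once, variables are ordered by importance, and accuracy is re-checked on progressively smaller subsets.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(3)
      X = rng.random((2000, 36))      # e.g. 36 ten-day composite bands over a season (placeholder)
      y = rng.integers(0, 5, 2000)    # placeholder land cover labels

      rf = RandomForestClassifier(n_estimators=300, random_state=3).fit(X, y)
      order = np.argsort(rf.feature_importances_)[::-1]   # most informative composites first

      for k in (9, 18, 36):           # keep the top k variables and re-check accuracy
          acc = cross_val_score(RandomForestClassifier(n_estimators=300, random_state=3),
                                X[:, order[:k]], y, cv=5).mean()
          print(f"top {k:2d} variables: CV accuracy = {acc:.3f}")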

  13. Gender classification in low-resolution surveillance video: in-depth comparison of random forests and SVMs

    NASA Astrophysics Data System (ADS)

    Geelen, Christopher D.; Wijnhoven, Rob G. J.; Dubbelman, Gijs; de With, Peter H. N.

    2015-03-01

    This research considers gender classification in surveillance environments, typically involving low-resolution images and a large amount of viewpoint variations and occlusions. Gender classification is inherently difficult due to the large intra-class variation and inter-class correlation. We have developed a gender classification system, which is successfully evaluated on two novel datasets that realistically reflect the above conditions, typical of surveillance. The system reaches a mean accuracy of up to 90% and approaches our human baseline of 92.6%, demonstrating a high-quality gender classification system. We also present an in-depth discussion of the fundamental differences between SVM and RF classifiers. We conclude that balancing the degree of randomization in any classifier is required for the highest classification accuracy. For our problem, an RF-SVM hybrid classifier exploiting the combination of HSV and LBP features results in the highest classification accuracy of 89.9 ± 0.2%, while classification computation time is negligible compared to the detection time of pedestrians.

  14. Cubic-scaling algorithm and self-consistent field for the random-phase approximation with second-order screened exchange.

    PubMed

    Moussa, Jonathan E

    2014-01-07

    The random-phase approximation with second-order screened exchange (RPA+SOSEX) is a model of electron correlation energy with two caveats: its accuracy depends on an arbitrary choice of mean field, and it scales as O(n⁵) operations and O(n³) memory for n electrons. We derive a new algorithm that reduces its scaling to O(n³) operations and O(n²) memory using controlled approximations and a new self-consistent field that approximates Brueckner coupled-cluster doubles theory with RPA+SOSEX, referred to as Brueckner RPA theory. The algorithm comparably reduces the scaling of second-order Møller-Plesset perturbation theory with smaller cost prefactors than RPA+SOSEX. Within a semiempirical model, we study H2 dissociation to test accuracy and Hn rings to verify scaling.

  15. H-DROP: an SVM based helical domain linker predictor trained with features optimized by combining random forest and stepwise selection

    NASA Astrophysics Data System (ADS)

    Ebina, Teppei; Suzuki, Ryosuke; Tsuji, Ryotaro; Kuroda, Yutaka

    2014-08-01

    Domain linker prediction is attracting much interest as it can help identify novel domains suitable for high-throughput proteomics analysis. Here, we report H-DROP, an SVM-based Helical Domain linker pRediction using OPtimal features. H-DROP is, to the best of our knowledge, the first predictor for specifically and effectively identifying helical linkers. This was made possible first because a large training dataset became available from IS-Dom, and second because we selected a small number of optimal features from a huge number of potential ones. The training helical linker dataset, which included 261 helical linkers, was constructed by detecting helical residues at the boundary regions of two independent structural domains listed in our previously reported IS-Dom dataset. 45 optimal feature candidates were selected from 3,000 features by random forest, which were further reduced to 26 optimal features by stepwise selection. The prediction sensitivity and precision of H-DROP were 35.2% and 38.8%, respectively. These values were over 10.7% higher than those of control methods, including our previously developed DROP, which is a coil linker predictor, and PPRODO, which is trained with undifferentiated domain boundary sequences. Overall, these results indicated that helical linkers can be predicted from sequence information alone by using a strictly curated training dataset for helical linkers and a carefully selected set of optimal features. H-DROP is available at http://domserv.lab.tuat.ac.jp.
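
    The two-stage selection described above, a random forest ranking followed by stepwise refinement for an SVM, can be sketched with scikit-learn. The feature matrix, labels, pool size, and final feature count below are placeholders, not the H-DROP configuration.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.feature_selection import SequentialFeatureSelector
      from sklearn.svm import SVC

      rng = np.random.default_rng(4)
      X = rng.random((500, 60))     # placeholder sequence-derived features
      y = rng.integers(0, 2, 500)   # 1 = helical linker residue, 0 = not (placeholder)

      # stage 1: random forest importance ranking keeps a candidate pool
      rf = RandomForestClassifier(n_estimators=300, random_state=4).fit(X, y)
      pool = np.argsort(rf.feature_importances_)[::-1][:15]

      # stage 2: forward stepwise selection of a smaller optimal set for the SVM
      sfs = SequentialFeatureSelector(SVC(kernel="rbf"), n_features_to_select=8,
                                      direction="forward", cv=5)
      sfs.fit(X[:, pool], y)
      print("selected feature indices:", pool[sfs.get_support()])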

  16. Identification of copper phthalocyanine blue polymorphs in unaged and aged paint systems by means of micro-Raman spectroscopy and Random Forest.

    PubMed

    Anghelone, Marta; Jembrih-Simbürger, Dubravka; Schreiner, Manfred

    2015-10-05

    Copper phthalocyanine (CuPc) blues (PB15) are largely used as pigments in art and industry. In these fields, mainly three different polymorphic modifications of PB15 are employed: alpha, beta and epsilon. Differentiating among these CuPc forms can give important information for developing conservation strategies and can help in relative dating, since each form was introduced to the market in a different time period. This study focuses on the classification of Raman spectra measured using a 532 nm excitation wavelength on: (i) dry pigment powders, (ii) unaged mock-ups of self-made paints, (iii) unaged commercial paints, and (iv) paints subjected to accelerated UV ageing. The ratios among integrated Raman bands were taken into consideration as features for Random Forest (RF) classification. Feature selection based on the Gini contrast score was carried out on the measured dataset to determine the Raman band ratios with higher predictive power. These were used as polymorphic markers, in order to establish an easy and accessible method for the identification. Three different ratios and the presence of a characteristic vibrational band allowed the identification of the crystal modification in pigment powders as well as in unaged and aged paint films.

  17. Quantitative and qualitative determination of LiuweiDihuang preparations by ultra high performance liquid chromatography in dual-wavelength fingerprinting mode and random forest.

    PubMed

    Qiu, Yixing; Huang, Jianhua; Jiang, Xingming; Chen, Yang; Liu, Yang; Zeng, Rong; Shehla, Nuzhat; Liu, Qiang; Liao, Duanfang; Guo, Dean; Liang, Yizeng; Wang, Wei

    2015-11-01

    The classical traditional Chinese formulation LiuweiDihuang, shown to have clinical efficacy for "nourishing kidney-yin" in traditional Chinese medicine, has been used for thousands of years in China. Little attention, however, has been paid to quality control methods for this formulation. Hence, a rapid and sensitive analytical technique is urgently needed for the evaluation of LiuweiDihuang preparations to assess their quality and pharmacological functionality. In this study, an ultra high performance liquid chromatography dual-wavelength method was developed to simultaneously determine 11 constituents in LiuweiDihuang preparations. This robust approach provided a fast and comprehensive quantitative determination of the major bioactive markers within LiuweiDihuang preparations. To distinguish four dosage forms of LiuweiDihuang preparations, a random forest technique was applied to the spectrometric fingerprint data obtained. This combined approach of chromatographic techniques and data analysis might serve as a rapid and efficient tool to ensure the quality of LiuweiDihuang preparations and other Chinese medicinal formulations and can support quality control and scientific research into the pharmacological potential of these formulations.

  18. A hybrid mixture discriminant analysis-random forest computational model for the prediction of volume of distribution of drugs in human.

    PubMed

    Lombardo, Franco; Obach, R Scott; Dicapua, Frank M; Bakken, Gregory A; Lu, Jing; Potter, David M; Gao, Feng; Miller, Michael D; Zhang, Yao

    2006-04-06

    A computational approach is described that can predict the VD(ss) of new compounds in humans, with an accuracy of within 2-fold of the actual value. A dataset of VD values for 384 drugs in humans was used to train a hybrid mixture discriminant analysis-random forest (MDA-RF) model using 31 computed descriptors. Descriptors included terms describing lipophilicity, ionization, molecular volume, and various molecular fragments. For a test set of 23 proprietary compounds not used in model construction, the geometric mean fold-error (GMFE) was 1.78-fold (+/-11.4%). The model was also tested using a leave-class out approach wherein subsets of drugs based on therapeutic class were removed from the training set of 384, the model was recast, and the VD(ss) values for each of the subsets were predicted. GMFE values ranged from 1.46 to 2.94-fold, depending on the subset. Finally, for an additional set of 74 compounds, VD(ss) predictions made using the computational model were compared to predictions made using previously described methods dependent on animal pharmacokinetic data. Computational VD(ss) predictions were, on average, 2.13-fold different from the VD(ss) predictions from animal data. The computational model described can predict human VD(ss) with an accuracy comparable to predictions requiring substantially greater effort and can be applied in place of animal experimentation.
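
    The geometric mean fold-error reported above is a compact accuracy measure for predictions spanning orders of magnitude. A short worked computation with hypothetical predicted and observed VD(ss) values (not the paper's data):

      import numpy as np

      # hypothetical predicted vs. observed VDss values (L/kg); placeholders only
      pred = np.array([1.2, 0.8, 3.5, 0.4, 7.0])
      obs  = np.array([1.0, 1.1, 2.5, 0.5, 5.0])

      # geometric mean fold-error: antilog of the mean absolute log10 fold difference
      gmfe = 10 ** np.mean(np.abs(np.log10(pred / obs)))
      within_2fold = np.mean(np.maximum(pred / obs, obs / pred) <= 2) * 100
      print(f"GMFE = {gmfe:.2f}-fold; {within_2fold:.0f}% of predictions within 2-fold")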

  19. Influence of atmospherically induced random wave fronts on diffraction imagery - A computer simulation model for testing image reconstruction algorithms

    NASA Technical Reports Server (NTRS)

    Barakat, Richard; Beletic, James W.

    1990-01-01

    This paper is devoted to the development of a two-dimensional computer-simulation model that is based on the rigid constraints of optical diffraction theory, with careful attention paid to the generation of sample realizations of Gaussian-distributed, spatially random, isotropic wave fronts that have zero mean and prescribed covariance functions. Given a sample realization of the wave front, the corresponding centered point-spread function and optical-transfer function are evaluated. A detailed study is made of the statistics of random wave-front tilt, point-spread function, modulus squared of transfer function, and phase of transfer function.
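
    One standard way to generate such Gaussian random wave fronts with a prescribed covariance is the spectral method: white noise is filtered in Fourier space by the square root of the corresponding power spectrum. The sketch below is illustrative only; the grid size and the Kolmogorov-type spectrum are assumptions, not the paper's specification.

      import numpy as np

      n, dx = 256, 0.05                       # grid size and sample spacing (assumed)
      kx = np.fft.fftfreq(n, d=dx)
      k = np.hypot(*np.meshgrid(kx, kx))
      k[0, 0] = kx[1]                         # avoid division by zero at the DC term

      psd = k ** (-11.0 / 3.0)                # Kolmogorov-type phase spectrum (assumed)
      noise = np.fft.fft2(np.random.default_rng(5).standard_normal((n, n)))
      phase_screen = np.real(np.fft.ifft2(noise * np.sqrt(psd)))
      phase_screen -= phase_screen.mean()     # enforce zero mean

      # the point-spread function follows from the aperture field with this phase
      pupil = np.exp(1j * phase_screen)
      psf = np.abs(np.fft.fftshift(np.fft.fft2(pupil))) ** 2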

  20. Mapping forests in Monsoon Asia with ALOS PALSAR and MODIS imagery in 2010

    NASA Astrophysics Data System (ADS)

    Qin, Y.

    2015-12-01

    Spatial distribution and temporal dynamics of forests are important to climate change, the carbon cycle, and biodiversity. An accurate forest map is required in monsoon Asia, where extensive forest changes have occurred. An algorithm was developed to map the distribution of forests in Monsoon Asia in 2010, with the integration of structure information from the Advanced Land Observation System (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR) mosaic dataset and phenology information from the MOD13Q1 NDVI and MOD09A1 land surface reflectance products. The PALSAR-based forest map was generated with a decision tree classification and assessed with randomly selected ground-truth samples from high spatial resolution images in Google Earth. Spatial and area comparisons were performed between our forest map (OU/Fudan F/NF) and other forest maps generated by the Japan Aerospace Exploration Agency (JAXA F/NF), European Space Agency (ESA F/NF), Boston University (MCD12Q1 F/NF), Food and Agricultural Organization (FAO FRA), and University of Maryland (Landsat forests). We then investigated the reasons for the large uncertainties among these typical forest maps in 2010. This study provides a way to monitor the dynamics of forests using Synthetic Aperture Radar (SAR) and optical satellite images, and the resultant F/NF datasets can be used to analyze the impacts of forest changes on climate and ecosystems.

  1. Using random forests to explore the effects of site attributes and soil properties on near-saturated and saturated hydraulic conductivity

    NASA Astrophysics Data System (ADS)

    Jorda, Helena; Koestel, John; Jarvis, Nicholas

    2014-05-01

    Knowledge of the near-saturated and saturated hydraulic conductivity of soil is fundamental for understanding important processes like groundwater contamination risks or runoff and soil erosion. Hydraulic conductivities are, however, difficult and time-consuming to determine by direct measurements, especially at the field scale or larger. So far, pedotransfer functions do not offer an especially reliable alternative, since published approaches exhibit poor prediction performance. In our study, we aimed to build pedotransfer functions by growing random forests (a statistical learning approach) on 486 datasets from the meta-database on tension-disk infiltrometer measurements collected from peer-reviewed literature and recently presented by Jarvis et al. (2013, Influence of soil, land use and climatic factors on the hydraulic conductivity of soil. Hydrol. Earth Syst. Sci. 17(12), 5185-5195). When some data from a specific source publication were allowed to enter the training set whereas others were used for validation, the results of a 10-fold cross-validation showed reasonable coefficients of determination of 0.53 for hydraulic conductivity at 10 cm tension, K10, and 0.41 for saturated conductivity, Ks. The estimated average annual temperature and precipitation at the site were the most important predictors for K10, while bulk density and estimated average annual temperature were most important for Ks prediction. The soil organic carbon content and the diameter of the disk infiltrometer were also important for the prediction of both K10 and Ks. However, coefficients of determination were around zero when all datasets of a specific source publication were excluded from the training set and exclusively used for validation. This may indicate experimenter bias, or that better predictors have to be found, or that a larger dataset has to be used to infer meaningful pedotransfer functions for saturated and near-saturated hydraulic conductivities. More research is in progress
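
    The distinction between the two validation schemes above, mixing versus withholding whole source publications, can be expressed with grouped cross-validation. A minimal sketch with simulated data (the predictors, targets, and group labels are placeholders):

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.model_selection import GroupKFold, cross_val_score

      rng = np.random.default_rng(6)
      X = rng.random((486, 12))           # placeholder site/soil predictors
      y = rng.random(486)                 # placeholder log10 K10 values
      papers = rng.integers(0, 30, 486)   # source publication of each dataset

      rf = RandomForestRegressor(n_estimators=500, random_state=6)
      # grouping by source publication keeps all data from one paper out of training,
      # the stricter validation that exposed possible experimenter bias above
      r2 = cross_val_score(rf, X, y, cv=GroupKFold(n_splits=10), groups=papers, scoring="r2")
      print(f"leave-source-out R^2: {r2.mean():.2f}")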

  2. Discovery of Novel Hepatitis C Virus NS5B Polymerase Inhibitors by Combining Random Forest, Multiple e-Pharmacophore Modeling and Docking

    PubMed Central

    Wei, Yu; Li, Jinlong; Qing, Jie; Huang, Mingjie; Wu, Ming; Gao, Fenghua; Li, Dongmei; Hong, Zhangyong; Kong, Lingbao; Huang, Weiqiang; Lin, Jianping

    2016-01-01

    The NS5B polymerase is one of the most attractive targets for developing new drugs to block Hepatitis C virus (HCV) infection. We describe the discovery of novel potent HCV NS5B polymerase inhibitors by employing a virtual screening (VS) approach, which is based on random forest (RB-VS), e-pharmacophore (PB-VS), and docking (DB-VS) methods. In the RB-VS stage, after feature selection, a model with 16 descriptors was used. In the PB-VS stage, six energy-based pharmacophore (e-pharmacophore) models from different crystal structures of the NS5B polymerase with ligands binding at the palm I, thumb I and thumb II regions were used. In the DB-VS stage, the Glide SP and XP docking protocols with default parameters were employed. In the virtual screening approach, the RB-VS, PB-VS and DB-VS methods were applied in increasing order of complexity to screen the InterBioScreen database. From the final hits, we selected 5 compounds for further anti-HCV activity and cellular cytotoxicity assay. All 5 compounds were found to inhibit NS5B polymerase with IC50 values of 2.01–23.84 μM and displayed anti-HCV activities with EC50 values ranging from 1.61 to 21.88 μM, and all compounds displayed no cellular cytotoxicity (CC50 > 100 μM) except compound N2, which displayed weak cytotoxicity with a CC50 value of 51.3 μM. The hit compound N2 had the best antiviral activity against HCV, with a selective index of 32.1. The 5 hit compounds with new scaffolds could potentially serve as NS5B polymerase inhibitors through further optimization and development. PMID:26845440

  3. A Comparative Assessment of the Influences of Human Impacts on Soil Cd Concentrations Based on Stepwise Linear Regression, Classification and Regression Tree, and Random Forest Models

    PubMed Central

    Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.

    2016-01-01

    Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries. The
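
    Comparing models with ME, MAE, and RMSE on a held-out validation set, as done above, takes only a few lines. The sketch below uses ordinary linear regression as a stand-in for stepwise linear regression, a single decision tree for CART, and simulated calibration/validation splits of the same sizes as in the study; all data are placeholders.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.linear_model import LinearRegression
      from sklearn.tree import DecisionTreeRegressor

      rng = np.random.default_rng(7)
      X_cal, y_cal = rng.random((222, 10)), rng.random(222)   # placeholder calibration set
      X_val, y_val = rng.random((54, 10)), rng.random(54)     # placeholder validation set

      for name, model in [("SLR", LinearRegression()),        # stand-in for stepwise LR
                          ("CART", DecisionTreeRegressor(random_state=7)),
                          ("RF", RandomForestRegressor(n_estimators=500, random_state=7))]:
          err = model.fit(X_cal, y_cal).predict(X_val) - y_val
          print(f"{name:4s} ME={err.mean():+.3f}  MAE={np.abs(err).mean():.3f}  "
                f"RMSE={np.sqrt((err**2).mean()):.3f}")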

  4. FITTING NONLINEAR ORDINARY DIFFERENTIAL EQUATION MODELS WITH RANDOM EFFECTS AND UNKNOWN INITIAL CONDITIONS USING THE STOCHASTIC APPROXIMATION EXPECTATION–MAXIMIZATION (SAEM) ALGORITHM

    PubMed Central

    Chow, Sy- Miin; Lu, Zhaohua; Zhu, Hongtu; Sherwood, Andrew

    2014-01-01

    The past decade has evidenced the increased prevalence of irregularly spaced longitudinal data in social sciences. Clearly lacking, however, are modeling tools that allow researchers to fit dynamic models to irregularly spaced data, particularly data that show nonlinearity and heterogeneity in dynamical structures. We consider the issue of fitting multivariate nonlinear differential equation models with random effects and unknown initial conditions to irregularly spaced data. A stochastic approximation expectation–maximization algorithm is proposed and its performance is evaluated using a benchmark nonlinear dynamical systems model, namely, the Van der Pol oscillator equations. The empirical utility of the proposed technique is illustrated using a set of 24-h ambulatory cardiovascular data from 168 men and women. Pertinent methodological challenges and unresolved issues are discussed. PMID:25416456

  5. Fitting Nonlinear Ordinary Differential Equation Models with Random Effects and Unknown Initial Conditions Using the Stochastic Approximation Expectation-Maximization (SAEM) Algorithm.

    PubMed

    Chow, Sy-Miin; Lu, Zhaohua; Sherwood, Andrew; Zhu, Hongtu

    2016-03-01

    The past decade has evidenced the increased prevalence of irregularly spaced longitudinal data in social sciences. Clearly lacking, however, are modeling tools that allow researchers to fit dynamic models to irregularly spaced data, particularly data that show nonlinearity and heterogeneity in dynamical structures. We consider the issue of fitting multivariate nonlinear differential equation models with random effects and unknown initial conditions to irregularly spaced data. A stochastic approximation expectation-maximization algorithm is proposed and its performance is evaluated using a benchmark nonlinear dynamical systems model, namely, the Van der Pol oscillator equations. The empirical utility of the proposed technique is illustrated using a set of 24-h ambulatory cardiovascular data from 168 men and women. Pertinent methodological challenges and unresolved issues are discussed.
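
    The Van der Pol oscillator used as a benchmark in both records above is a two-dimensional nonlinear system that is easy to simulate. A minimal sketch with scipy; the damping parameter, initial conditions, and time span are assumed for illustration only.

      import numpy as np
      from scipy.integrate import solve_ivp

      def van_der_pol(t, state, mu):
          # x'' - mu * (1 - x^2) * x' + x = 0, written as a first-order system
          x, v = state
          return [v, mu * (1 - x**2) * v - x]

      sol = solve_ivp(van_der_pol, (0.0, 20.0), [2.0, 0.0], args=(1.0,),
                      dense_output=True, rtol=1e-8)
      t = np.linspace(0.0, 20.0, 200)
      x = sol.sol(t)[0]   # trajectory, which can be sampled at irregular time points
      print(x[:5])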

  6. Woody vegetation cover monitoring with multi-temporal Landsat data and Random Forests: the case of the Northwest Province (South Africa)

    NASA Astrophysics Data System (ADS)

    Symeonakis, Elias; Higginbottom, Thomas; Petroulaki, Kyriaki

    2016-04-01

    Land degradation and desertification (LDD) are serious global threats to humans and the environment. Globally, 10-20% of drylands and 24% of the world's productive lands are potentially degraded, which affects 1.5 billion people and reduces GDP by €3.4 billion. In Africa, LDD processes affect up to a third of savannahs, leading to a decline in the ecosystem services provided to some of the continent's poorest and most vulnerable communities. Indirectly, LDD can be monitored using relevant indicators. The encroachment of woody plants into grasslands, and the subsequent conversion of savannahs and open woodlands into shrublands, has attracted a lot of attention over the last decades and has been identified as an indicator of LDD. According to some assessments, bush encroachment has rendered 1.1 million ha of South African savanna unusable, threatens another 27 million ha (~17% of the country), and has reduced the grazing capacity throughout the region by up to 50%. Mapping woody cover encroachment over large areas can only be effectively achieved using remote sensing data and techniques. The longest continuously operating Earth-observation program, the Landsat series, is now freely available as an atmospherically corrected, cloud-masked surface reflectance product. The availability and length of the Landsat archive is thus an unparalleled Earth-observation resource, particularly for long-term change detection and monitoring. Here, we map and monitor woody vegetation cover in the Northwest Province of South Africa, a mosaic of 12 Landsat scenes that extends over more than 100,000 km². We employ a multi-temporal approach with dry-season TM, ETM+ and OLI data from 15 epochs between 1989 and 2015. We use 0.5 m-pixel colour aerial photography to collect >15,000 samples for training and validating a Random Forest model to map woody cover, grasses, crops, urban and bare areas. High classification accuracies are achieved, especially so for the two cover types indirectly

  7. Sequence Based Prediction of DNA-Binding Proteins Based on Hybrid Feature Selection Using Random Forest and Gaussian Naïve Bayes

    PubMed Central

    Lou, Wangchao; Wang, Xiaoqing; Chen, Fan; Chen, Yixiao; Jiang, Bo; Zhang, Hua

    2014-01-01

    Developing an efficient method for determination of DNA-binding proteins, due to their vital roles in gene regulation, is becoming highly desired, since it would be invaluable to advance our understanding of protein functions. In this study, we proposed a new method for the prediction of DNA-binding proteins, by performing feature ranking using random forest and wrapper-based feature selection using a forward best-first search strategy. The features comprise information from the primary sequence, predicted secondary structure, predicted relative solvent accessibility, and position specific scoring matrix. The proposed method, called DBPPred, used Gaussian naïve Bayes as the underlying classifier since it outperformed five other classifiers, including decision tree, logistic regression, k-nearest neighbor, support vector machine with polynomial kernel, and support vector machine with radial basis function. As a result, the proposed DBPPred yields the highest average accuracy of 0.791 and average MCC of 0.583 according to the five-fold cross validation with ten runs on the training benchmark dataset PDB594. Subsequently, blind tests on the independent dataset PDB186 by the proposed model trained on the entire PDB594 dataset and by five other existing methods (including iDNA-Prot, DNA-Prot, DNAbinder, DNABIND and DBD-Threader) were performed, resulting in the proposed DBPPred yielding the highest accuracy of 0.769, MCC of 0.538, and AUC of 0.790. The independent tests performed by the proposed DBPPred on a large non-DNA-binding protein dataset and two RNA-binding protein datasets also showed improved or comparable quality when compared with the relevant prediction methods. Moreover, we observed that, for the majority of the selected features, the mean feature values differ statistically significantly between the DNA-binding and non-DNA-binding proteins. All of the experimental results indicate that the proposed DBPPred

  8. Aboveground Biomass Modeling from Field and LiDAR Data in Brazilian Amazon Tropical Rain Forest

    NASA Astrophysics Data System (ADS)

    Silva, C. A.; Hudak, A. T.; Vierling, L. A.; Keller, M. M.; Klauberg Silva, C. K.

    2015-12-01

    Tropical forests are an important component of global carbon stocks, but tropical forest responses to climate change are not sufficiently studied or understood. Among remote sensing technologies, airborne LiDAR (Light Detection and Ranging) may be best suited for quantifying tropical forest carbon stocks. Our objective was to estimate aboveground biomass (AGB) using airborne LiDAR and field plot data in Brazilian tropical rain forest. Forest attributes such as tree density, diameter at breast height, and height were measured at a combination of square plots and linear transects (n=82) distributed across six different geographic zones in the Amazon. Using previously published allometric equations, tree AGB was computed and then summed to calculate total AGB at each sample plot. LiDAR-derived canopy structure metrics were also computed at each sample plot, and random forest regression modelling was applied to predict AGB from selected LiDAR metrics. The LiDAR-derived AGB model was assessed using the random forest explained variation, adjusted coefficient of determination (Adj. R²), root mean square error (RMSE, both absolute and relative) and bias (both absolute and relative). Our findings showed that the 99th percentile of height and height skewness were the best LiDAR metrics for AGB prediction. The AGB model using these two best predictors explained 59.59% of AGB variation, with an Adj. R² of 0.92, RMSE of 33.37 Mg/ha (20.28%), and bias of -0.69 (-0.42%). This study showed that LiDAR canopy structure metrics can be used to predict AGB stocks in tropical forest with acceptable precision and accuracy. Therefore, we conclude that there is good potential to monitor carbon sequestration in Brazilian tropical rain forest using airborne LiDAR data, large field plots, and the random forest algorithm.
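
    A random forest regression of plot AGB on LiDAR metrics, as used above, fits in a few lines. The sketch below is illustrative: the two predictors echo the ones named in the abstract (99th height percentile and height skewness), but the values, plot count, and hyperparameters are placeholders, not the study's data or settings.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(8)
      # placeholder plot-level LiDAR metrics: 99th height percentile, height skewness
      X = np.column_stack([rng.uniform(10, 45, 82), rng.normal(0, 1, 82)])
      agb = rng.uniform(50, 400, 82)   # placeholder field AGB (Mg/ha)

      rf = RandomForestRegressor(n_estimators=1000, oob_score=True, random_state=8).fit(X, agb)
      pred = rf.oob_prediction_        # out-of-bag predictions avoid optimistic bias
      rmse = np.sqrt(np.mean((pred - agb) ** 2))
      print(f"explained variation = {rf.oob_score_:.2f}, RMSE = {rmse:.1f} Mg/ha "
            f"({100 * rmse / agb.mean():.1f}%)")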

  9. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    PubMed Central

    2011-01-01

    Background Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, like Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven nonparametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (Median (Me) = 0.76) and an area under the ROC curve of Me = 0.90. However, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forests ranked second in overall accuracy (Me = 0.73), with a high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with an acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed

  10. Forest Management.

    ERIC Educational Resources Information Center

    Weicherding, Patrick J.; And Others

    This bulletin deals with forest management and provides an overview of forestry for the non-professional. The bulletin is divided into six sections: (1) What Is Forestry Management?; (2) How Is the Forest Measured?; (3) What Is Forest Protection?; (4) How Is the Forest Harvested?; (5) What Is Forest Regeneration?; and (6) What Is Forest…

  11. Mapping forests in monsoon Asia with ALOS PALSAR 50-m mosaic images and MODIS imagery in 2010.

    PubMed

    Qin, Yuanwei; Xiao, Xiangming; Dong, Jinwei; Zhang, Geli; Roy, Partha Sarathi; Joshi, Pawan Kumar; Gilani, Hammad; Murthy, Manchiraju Sri Ramachandra; Jin, Cui; Wang, Jie; Zhang, Yao; Chen, Bangqian; Menarguez, Michael Angelo; Biradar, Chandrashekhar M; Bajgain, Rajen; Li, Xiangping; Dai, Shengqi; Hou, Ying; Xin, Fengfei; Moore, Berrien

    2016-02-11

    Extensive forest changes have occurred in monsoon Asia, substantially affecting climate, carbon cycle and biodiversity. Accurate forest cover maps at fine spatial resolutions are required to qualify and quantify these effects. In this study, an algorithm was developed to map forests in 2010, with the use of structure and biomass information from the Advanced Land Observation System (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR) mosaic dataset and the phenological information from MODerate Resolution Imaging Spectroradiometer (MOD13Q1 and MOD09A1) products. Our forest map (PALSARMOD50 m F/NF) was assessed through randomly selected ground truth samples from high spatial resolution images and had an overall accuracy of 95%. Total area of forests in monsoon Asia in 2010 was estimated to be ~6.3 × 10⁶ km². The distribution of evergreen and deciduous forests agreed reasonably well with the median Normalized Difference Vegetation Index (NDVI) in winter. PALSARMOD50 m F/NF map showed good spatial and areal agreements with selected forest maps generated by the Japan Aerospace Exploration Agency (JAXA F/NF), European Space Agency (ESA F/NF), Boston University (MCD12Q1 F/NF), Food and Agricultural Organization (FAO FRA), and University of Maryland (Landsat forests), but relatively large differences and uncertainties in tropical forests and evergreen and deciduous forests.

  12. Mapping forests in monsoon Asia with ALOS PALSAR 50-m mosaic images and MODIS imagery in 2010

    PubMed Central

    Qin, Yuanwei; Xiao, Xiangming; Dong, Jinwei; Zhang, Geli; Roy, Partha Sarathi; Joshi, Pawan Kumar; Gilani, Hammad; Murthy, Manchiraju Sri Ramachandra; Jin, Cui; Wang, Jie; Zhang, Yao; Chen, Bangqian; Menarguez, Michael Angelo; Biradar, Chandrashekhar M.; Bajgain, Rajen; Li, Xiangping; Dai, Shengqi; Hou, Ying; Xin, Fengfei; Moore III, Berrien

    2016-01-01

    Extensive forest changes have occurred in monsoon Asia, substantially affecting climate, carbon cycle and biodiversity. Accurate forest cover maps at fine spatial resolutions are required to qualify and quantify these effects. In this study, an algorithm was developed to map forests in 2010, with the use of structure and biomass information from the Advanced Land Observation System (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR) mosaic dataset and the phenological information from MODerate Resolution Imaging Spectroradiometer (MOD13Q1 and MOD09A1) products. Our forest map (PALSARMOD50 m F/NF) was assessed through randomly selected ground truth samples from high spatial resolution images and had an overall accuracy of 95%. Total area of forests in monsoon Asia in 2010 was estimated to be ~6.3 × 10⁶ km². The distribution of evergreen and deciduous forests agreed reasonably well with the median Normalized Difference Vegetation Index (NDVI) in winter. PALSARMOD50 m F/NF map showed good spatial and areal agreements with selected forest maps generated by the Japan Aerospace Exploration Agency (JAXA F/NF), European Space Agency (ESA F/NF), Boston University (MCD12Q1 F/NF), Food and Agricultural Organization (FAO FRA), and University of Maryland (Landsat forests), but relatively large differences and uncertainties in tropical forests and evergreen and deciduous forests. PMID:26864143

  13. Mapping forests in monsoon Asia with ALOS PALSAR 50-m mosaic images and MODIS imagery in 2010

    NASA Astrophysics Data System (ADS)

    Qin, Yuanwei; Xiao, Xiangming; Dong, Jinwei; Zhang, Geli; Roy, Partha Sarathi; Joshi, Pawan Kumar; Gilani, Hammad; Murthy, Manchiraju Sri Ramachandra; Jin, Cui; Wang, Jie; Zhang, Yao; Chen, Bangqian; Menarguez, Michael Angelo; Biradar, Chandrashekhar M.; Bajgain, Rajen; Li, Xiangping; Dai, Shengqi; Hou, Ying; Xin, Fengfei; Moore, Berrien, III

    2016-02-01

    Extensive forest changes have occurred in monsoon Asia, substantially affecting climate, carbon cycle and biodiversity. Accurate forest cover maps at fine spatial resolutions are required to qualify and quantify these effects. In this study, an algorithm was developed to map forests in 2010, with the use of structure and biomass information from the Advanced Land Observation System (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR) mosaic dataset and the phenological information from MODerate Resolution Imaging Spectroradiometer (MOD13Q1 and MOD09A1) products. Our forest map (PALSARMOD50 m F/NF) was assessed through randomly selected ground truth samples from high spatial resolution images and had an overall accuracy of 95%. Total area of forests in monsoon Asia in 2010 was estimated to be ~6.3 × 10⁶ km². The distribution of evergreen and deciduous forests agreed reasonably well with the median Normalized Difference Vegetation Index (NDVI) in winter. PALSARMOD50 m F/NF map showed good spatial and areal agreements with selected forest maps generated by the Japan Aerospace Exploration Agency (JAXA F/NF), European Space Agency (ESA F/NF), Boston University (MCD12Q1 F/NF), Food and Agricultural Organization (FAO FRA), and University of Maryland (Landsat forests), but relatively large differences and uncertainties in tropical forests and evergreen and deciduous forests.

  14. Scheduling algorithms

    NASA Astrophysics Data System (ADS)

    Wolfe, William J.; Wood, David; Sorensen, Stephen E.

    1996-12-01

    This paper discusses automated scheduling as it applies to complex domains such as factories, transportation, and communications systems. The window-constrained-packing problem is introduced as an ideal model of the scheduling trade-offs. Specific algorithms are compared in terms of simplicity, speed, and accuracy. In particular, dispatch, look-ahead, and genetic algorithms are statistically compared on randomly generated job sets. The conclusion is that dispatch methods are fast and fairly accurate, while modern algorithms, such as genetic algorithms and simulated annealing, have excessive run times and are too complex to be practical.
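
    A dispatch method of the kind found fast and fairly accurate above can be sketched in a few lines. The following is an illustrative earliest-deadline-first rule on randomly generated window-constrained jobs; the job parameters and the single-machine setting are assumptions, not the paper's experimental setup.

      import random

      # jobs have a release time, deadline, and duration; the dispatch rule greedily
      # fills one machine in earliest-deadline-first order
      random.seed(9)
      jobs = []
      for i in range(20):
          release = random.uniform(0, 50)
          jobs.append({"id": i, "release": release,
                       "deadline": release + random.uniform(5, 20),
                       "duration": random.uniform(1, 6)})

      t, schedule = 0.0, []
      for job in sorted(jobs, key=lambda j: j["deadline"]):   # dispatch rule: EDF
          start = max(t, job["release"])
          if start + job["duration"] <= job["deadline"]:      # fits inside its window
              schedule.append((job["id"], start))
              t = start + job["duration"]

      print(f"scheduled {len(schedule)} of {len(jobs)} jobs")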

  15. Developing a Learning Algorithm-Generated Empirical Relaxer

    SciTech Connect

    Mitchell, Wayne; Kallman, Josh; Toreja, Allen; Gallagher, Brian; Jiang, Ming; Laney, Dan

    2016-03-30

    One of the main difficulties when running Arbitrary Lagrangian-Eulerian (ALE) simulations is determining how much to relax the mesh during the Eulerian step. This determination is currently made by the user on a simulation-by-simulation basis. We present a Learning Algorithm-Generated Empirical Relaxer (LAGER), which uses a regression random forest algorithm to automate this decision process. We also demonstrate that LAGER successfully relaxes a variety of test problems, maintains simulation accuracy, and has the potential to significantly decrease both the person-hours and computational hours needed to run a successful ALE simulation.

  16. Mapping forested wetlands in the Great Zhan River Basin through integrating optical, radar, and topographical data classification techniques.

    PubMed

    Na, X D; Zang, S Y; Wu, C S; Li, W L

    2015-11-01

    Knowledge of the spatial extent of forested wetlands is essential to many studies, including wetland functioning assessment, greenhouse gas flux estimation, and wildlife suitable habitat identification. For discriminating forested wetlands from their adjacent land cover types, researchers have resorted to image analysis techniques applied to numerous remotely sensed data. While met with some success, there is still no consensus on the optimal approaches for mapping forested wetlands. To address this problem, we examined two machine learning approaches, the random forest (RF) and K-nearest neighbor (KNN) algorithms, and applied these two approaches in the framework of pixel-based and object-based classifications. The RF and KNN algorithms were constructed using predictors derived from Landsat 8 imagery, Radarsat-2 advanced synthetic aperture radar (SAR), and topographical indices. The results show that the object-based classifications performed better than per-pixel classifications using the same algorithm (RF) in terms of overall accuracy, and the difference in their kappa coefficients is statistically significant (p<0.01). There were noticeable omissions for forested and herbaceous wetlands in the per-pixel classifications using the RF algorithm. As for the object-based image analysis, there were also statistically significant differences (p<0.01) in kappa coefficient between results based on the RF and KNN algorithms. The object-based classification using RF provided a more visually adequate distribution of the land cover types of interest, while the object classifications based on the KNN algorithm showed noticeable commission errors for forested wetlands and omission errors for agricultural land. This research proves that object-based classification with RF using optical, radar, and topographical data improved the mapping accuracy of land covers and provided a feasible approach to discriminate forested wetlands from the other land cover types in forestry areas.
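
    The kappa coefficient compared above measures agreement beyond chance from a confusion matrix. A short worked computation with hypothetical confusion matrices (not the study's accuracy assessments):

      import numpy as np

      def cohens_kappa(confusion):
          # observed agreement minus chance agreement, normalized
          confusion = np.asarray(confusion, dtype=float)
          total = confusion.sum()
          observed = np.trace(confusion) / total
          expected = (confusion.sum(0) * confusion.sum(1)).sum() / total**2
          return (observed - expected) / (1 - expected)

      # hypothetical confusion matrices for two classifications of the same scene
      kappa_rf  = cohens_kappa([[80, 5, 3], [6, 70, 8], [2, 9, 75]])
      kappa_knn = cohens_kappa([[74, 8, 6], [9, 63, 12], [5, 11, 70]])
      print(f"kappa RF = {kappa_rf:.3f}, kappa KNN = {kappa_knn:.3f}")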

  17. SIDRA: a blind algorithm for signal detection in photometric surveys

    NASA Astrophysics Data System (ADS)

    Mislis, D.; Bachelet, E.; Alsubai, K. A.; Bramich, D. M.; Parley, N.

    2016-01-01

    We present the Signal Detection using Random-Forest Algorithm (SIDRA). SIDRA is a detection and classification algorithm based on the random forest machine learning technique. The goal of this paper is to show the power of SIDRA for quick and accurate signal detection and classification. We first diagnose the power of the method with simulated light curves and then try it on a subset of the Kepler space mission catalogue. We use five classes of simulated light curves (CONSTANT, TRANSIT, VARIABLE, MLENS and EB for constant light curves, transiting exoplanets, variables, microlensing events and eclipsing binaries, respectively) to analyse the power of the method. The algorithm uses four features in order to classify the light curves. The training sample contains 5000 light curves (1000 from each class) and 50 000 random light curves for testing. The total SIDRA success ratio is ≥90 per cent. Furthermore, the success ratio reaches 95-100 per cent for the CONSTANT, VARIABLE, EB and MLENS classes and 92 per cent for the TRANSIT class with a decision probability of 60 per cent. Because the TRANSIT class is the one that fails the most, we run a simultaneous fit using SIDRA and a Box Least Square (BLS)-based algorithm for searching for transiting exoplanets. As a result, our algorithm detects 7.5 per cent more planets than a classic BLS algorithm, with better results for lower signal-to-noise light curves. SIDRA succeeds in catching 98 per cent of the planet candidates in the Kepler sample and incorrectly flags 7 per cent of the false alarm subset. SIDRA promises to be useful as a detection algorithm and/or classifier for large photometric surveys such as the TESS and PLATO future exoplanet space missions.
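
    The decision-probability threshold described above has a direct counterpart in the class vote fractions of a random forest. The sketch below mirrors the setup in outline only: the four features, class labels, and sample sizes are simulated placeholders, not SIDRA itself.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(10)
      classes = ["CONSTANT", "TRANSIT", "VARIABLE", "MLENS", "EB"]
      X_train = rng.random((5000, 4))   # 4 summary features per light curve (placeholder)
      y_train = rng.integers(0, 5, 5000)
      X_test = rng.random((50000, 4))

      rf = RandomForestClassifier(n_estimators=500, random_state=10).fit(X_train, y_train)
      proba = rf.predict_proba(X_test)   # per-class vote fractions

      # apply a 60 per cent decision probability: classify only confident light curves
      confident = proba.max(axis=1) >= 0.60
      labels = np.array(classes)[proba.argmax(axis=1)]
      print(f"classified {confident.mean():.0%} of light curves above the threshold")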

  18. Mapping the distribution of the main host for plague in a complex landscape in Kazakhstan: An object-based approach using SPOT-5 XS, Landsat 7 ETM+, SRTM and multiple Random Forests.

    PubMed

    Wilschut, L I; Addink, E A; Heesterbeek, J A P; Dubyanskiy, V M; Davis, S A; Laudisoit, A; Begon, M; Burdelov, L A; Atshabar, B B; de Jong, S M

    2013-08-01

    Plague is a zoonotic infectious disease present in great gerbil populations in Kazakhstan. Infectious disease dynamics are influenced by the spatial distribution of the carriers (hosts) of the disease. The great gerbil, the main host in our study area, lives in burrows, which can be recognized on high resolution satellite imagery. In this study, using earth observation data at various spatial scales, we map the spatial distribution of burrows in a semi-desert landscape. The study area consists of various landscape types. To evaluate whether identification of burrows by classification is possible in these landscape types, the study area was subdivided into eight landscape units, on the basis of Landsat 7 ETM+ derived Tasselled Cap Greenness and Brightness, and SRTM-derived standard deviation in elevation. In the field, 904 burrows were mapped. Using two segmented 2.5 m resolution SPOT-5 XS satellite scenes, reference object sets were created. Random Forests were built for both SPOT scenes and used to classify the images. Additionally, a stratified classification was carried out by building separate Random Forests per landscape unit. Burrows were successfully classified in all landscape units. In the 'steppe on floodplain' areas, classification worked best: producer's and user's accuracy in those areas reached 88% and 100%, respectively. In the 'floodplain' areas, with a more heterogeneous vegetation cover, classification worked least well; there, the accuracies were 86% and 58%, respectively. Stratified classification improved the results in all landscape units where comparison was possible (four), increasing kappa coefficients by 13, 10, 9 and 1%, respectively. In this study, an innovative stratification method using high- and medium-resolution imagery was applied in order to map host distribution on a large spatial scale. The burrow maps we developed will help to detect changes in the distribution of great gerbil populations and, moreover, serve as a unique empirical

  19. Scalable Nearest Neighbor Algorithms for High Dimensional Data.

    PubMed

    Muja, Marius; Lowe, David G

    2014-11-01

    For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching.
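
    FLANN's randomized k-d forest is directly usable through OpenCV's Python bindings. The sketch below matches two synthetic SIFT-like descriptor sets; the index and search parameters are illustrative defaults, not tuned values from the paper.

      # Approximate nearest-neighbour matching with FLANN via OpenCV.
      import cv2
      import numpy as np

      rng = np.random.default_rng(0)
      des_a = rng.random((1000, 128), dtype=np.float32)   # SIFT-like vectors
      des_b = rng.random((1000, 128), dtype=np.float32)

      FLANN_INDEX_KDTREE = 1                              # randomized k-d forest
      matcher = cv2.FlannBasedMatcher(
          dict(algorithm=FLANN_INDEX_KDTREE, trees=5),    # index parameters
          dict(checks=50))                                # speed/accuracy knob

      matches = matcher.knnMatch(des_a, des_b, k=2)
      # Keep matches that pass Lowe's ratio test.
      good = [m for m, n in matches if m.distance < 0.7 * n.distance]
      print(len(good), "confident matches")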

  20. On the likelihood of forests

    NASA Astrophysics Data System (ADS)

    Shang, Yilun

    2016-08-01

    How complex a network is crucially impacts its function and performance. In many modern applications, the networks involved have a growth property and sparse structures, which pose challenges to physicists and applied mathematicians. In this paper, we introduce the forest likelihood as a plausible measure to gauge how difficult it is to construct a forest in a non-preferential attachment way. Based on the notions of admittable labeling and path construction, we propose algorithms for computing the forest likelihood of a given forest. Concrete examples as well as the distributions of forest likelihoods for all forests with some fixed numbers of nodes are presented. Moreover, we illustrate the ideas on real-life networks, including a benzenoid tree, a mathematical family tree, and a peer-to-peer network.

  1. Mapping forest biomass from space - Fusion of hyperspectral EO1-hyperion data and Tandem-X and WorldView-2 canopy height models

    NASA Astrophysics Data System (ADS)

    Kattenborn, Teja; Maack, Joachim; Faßnacht, Fabian; Enßle, Fabian; Ermert, Jörg; Koch, Barbara

    2015-03-01

    Spaceborne sensors allow for wide-scale assessments of forest ecosystems. Combining the products of multiple sensors is hypothesized to improve the estimation of forest biomass. We applied interferometric (Tandem-X) and photogrammetric (WorldView-2) predictors, e.g. canopy height models, in combination with hyperspectral predictors (EO1-Hyperion), using four different machine-learning algorithms for biomass estimation in temperate forest stands near Karlsruhe, Germany. An iterative model selection procedure was used to identify the optimal combination of predictors. The most accurate model (Random Forest) reached an r² of 0.73 with an RMSE of 14.9% (29.4 t/ha). Further results revealed that the predictive accuracy depended highly on the statistical model and the area size of the field samples. We conclude that a fusion of canopy height and spectral information allows for accurate estimations of forest biomass from space.
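
    In scikit-learn terms, the fusion amounts to stacking the canopy-height and spectral predictors column-wise and fitting a Random Forest regressor. A hedged sketch with synthetic stand-ins for the Tandem-X, WorldView-2 and Hyperion predictors follows, reporting the same r² and relative RMSE metrics.

      # Sketch of biomass regression from fused height + spectral predictors.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.metrics import r2_score, mean_squared_error
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(0)
      n = 300
      chm_insar = rng.uniform(5, 40, n)                 # interferometric CHM
      chm_photo = chm_insar + rng.normal(0, 2, n)       # photogrammetric CHM
      spectra = rng.normal(size=(n, 10))                # selected hyperspectral bands
      X = np.column_stack([chm_insar, chm_photo, spectra])
      agb = 7.0 * chm_insar + rng.normal(0, 25, n)      # t/ha, synthetic truth

      X_tr, X_te, y_tr, y_te = train_test_split(X, agb, random_state=0)
      model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
      pred = model.predict(X_te)
      rmse = mean_squared_error(y_te, pred) ** 0.5
      print(f"r2 = {r2_score(y_te, pred):.2f}, "
            f"RMSE = {rmse:.1f} t/ha ({100 * rmse / y_te.mean():.0f}%)")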

  2. Algorithms and Algorithmic Languages.

    ERIC Educational Resources Information Center

    Veselov, V. M.; Koprov, V. M.

    This paper is intended as an introduction to a number of problems connected with the description of algorithms and algorithmic languages, particularly the syntaxes and semantics of algorithmic languages. The terms "letter, word, alphabet" are defined and described. The concept of the algorithm is defined and the relation between the algorithm and…

  3. Forest Resources

    SciTech Connect

    2016-06-01

    Forest biomass is an abundant biomass feedstock that complements the conventional forest use of wood for paper and wood materials. It may be utilized for bioenergy production, such as heat and electricity, as well as for biofuels and a variety of bioproducts, such as industrial chemicals, textiles, and other renewable materials. The resources within the 2016 Billion-Ton Report include primary forest resources, which are taken directly from timberland-only forests, removed from the land, and taken to the roadside.

  4. A small-scale randomized controlled trial of the self-help version of the New Forest Parent Training Programme for children with ADHD symptoms.

    PubMed

    Daley, David; O'Brien, Michelle

    2013-09-01

    The efficacy of a self-help parent training programme for children with attention deficit hyperactivity disorder (ADHD) was evaluated. The New Forest Parenting Programme Self-help (NFPP-SH) is a 6-week written self-help psychological intervention designed to treat childhood ADHD. Forty-three children were randomised to either NFPP-SH intervention or a waiting list control group. Outcomes were child ADHD symptoms measured using questionnaires and direct observation, self-reported parental mental health, parenting competence, and the quality of parent-child interaction. Measures of child symptoms and parental outcomes were assessed before and after the intervention. ADHD symptoms were reduced, and parental competence was increased by self-help intervention. Forty-five percent of intervention children showed clinically significant reductions in ADHD symptoms. Self-help intervention did not lead to improvements in parental mental health or parent-child interaction. Findings provide support for the efficacy of self-help intervention for a clinical sample of children with ADHD symptoms. Self-help may provide a potentially cost-effective method of increasing access to evidence-based interventions for clinical populations.

  5. Is random access memory random?

    NASA Technical Reports Server (NTRS)

    Denning, P. J.

    1986-01-01

    Most software is constructed on the assumption that the programs and data are stored in random access memory (RAM). Physical limitations on the relative speeds of processor and memory elements lead to a variety of memory organizations that match the processor addressing rate with the memory service rate. These include interleaved and cached memory. A very high fraction of a processor's address requests can be satisfied from the cache without reference to the main memory. The cache requests information from main memory in blocks that can be transferred at the full memory speed. Programmers who organize algorithms for locality can realize the highest performance from these computers.
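
    The locality point is easy to make concrete: touching memory in address order keeps the cache and prefetcher effective, while a random visiting order defeats them. A rough NumPy timing sketch (numbers will vary by machine, and part of the gap is the extra gather copy):

      # Sequential versus random-order traversal of a large array.
      import time
      import numpy as np

      a = np.arange(20_000_000, dtype=np.int64)
      idx = np.random.permutation(a.size)        # random visiting order

      t0 = time.perf_counter(); a.sum()
      t1 = time.perf_counter(); a[idx].sum()     # gather in random order, then sum
      t2 = time.perf_counter()
      print(f"sequential: {t1 - t0:.3f} s, random order: {t2 - t1:.3f} s")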

  6. Efficiency and effectiveness of the use of an acenocoumarol pharmacogenetic dosing algorithm versus usual care in patients with venous thromboembolic disease initiating oral anticoagulation: study protocol for a randomized controlled trial

    PubMed Central

    2012-01-01

    Background Hemorrhagic events are frequent in patients on treatment with antivitamin-K oral anticoagulants due to their narrow therapeutic margin. Studies performed with acenocoumarol have shown the relationship between demographic, clinical and genotypic variants and the response to these drugs. Once the influence of these genetic and clinical factors on the dose of acenocoumarol needed to maintain a stable international normalized ratio (INR) has been demonstrated, new strategies need to be developed to predict the appropriate doses of this drug. Several pharmacogenetic algorithms have been developed for warfarin, but only three have been developed for acenocoumarol. After the development of a pharmacogenetic algorithm, the obvious next step is to demonstrate its effectiveness and utility by means of a randomized controlled trial. The aim of this study is to evaluate the effectiveness and efficiency of an acenocoumarol dosing algorithm developed by our group which includes demographic, clinical and pharmacogenetic variables (VKORC1, CYP2C9, CYP4F2 and ApoE) in patients with venous thromboembolism (VTE). Methods and design This is a multicenter, single blind, randomized controlled clinical trial. The protocol has been approved by La Paz University Hospital Research Ethics Committee and by the Spanish Drug Agency. Two hundred and forty patients with VTE in which oral anticoagulant therapy is indicated will be included. Randomization (case/control 1:1) will be stratified by center. Acenocoumarol dose in the control group will be scheduled and adjusted following common clinical practice; in the experimental arm dosing will be following an individualized algorithm developed and validated by our group. Patients will be followed for three months. The main endpoints are: 1) Percentage of patients with INR within the therapeutic range on day seven after initiation of oral anticoagulant therapy; 2) Time from the start of oral anticoagulant treatment to achievement of a

  7. Aboveground Biomass Monitoring over Siberian Boreal Forest Using Radar Remote Sensing Data

    NASA Astrophysics Data System (ADS)

    Stelmaszczuk-Gorska, M. A.; Thiel, C. J.; Schmullius, C.

    2014-12-01

    Aboveground biomass (AGB) plays an essential role in ecosystem research and global cycles, and is of vital importance in climate studies. AGB accumulated in forests is of special monitoring interest, as forests contain the most biomass compared with other land biomes. The largest of the land biomes is the boreal forest, which has a substantial carbon accumulation capability; its carbon stock is estimated to be 272 ± 23 Pg C (32%) [1]. Russia's forests are of particular concern because they are the largest source of uncertainty in global carbon stock calculations [1] and their inventory data have not been updated in the last 25 years [2]. In this research, new empirical models for AGB estimation are proposed. A processing scheme was developed that uses radar L-band data for AGB retrieval and optical data for an update of in situ data. The approach was trained and validated in the Asian part of the boreal forest, in southern Russian Central Siberia, in two Siberian Federal Districts: Krasnoyarsk Kray and Irkutsk Oblast. Together the training and testing forest territories cover an area of approximately 3,500 km². ALOS PALSAR L-band single (HH - horizontal transmitted and received) and dual (HH and HV - horizontal transmitted, horizontal and vertical received) polarizations in Single Look Complex (SLC) format were used to calculate the backscattering coefficient in gamma nought and coherence. In total, more than 150 images acquired between 2006 and 2011 were available. The data were obtained through the ALOS Kyoto and Carbon Initiative Project (K&C) and were used to calibrate a randomForest algorithm. Additionally, simple linear and multiple-regression approaches were used. The uncertainty of the AGB estimation at pixel and stand level was calculated as approximately 35% by validation against an independent dataset. Previous studies employing ALOS PALSAR data over boreal forests reported uncertainties of 39.4% using a randomForest approach [2] or 42.8% using a semi-empirical approach [3].

  8. Mapping the distribution of the main host for plague in a complex landscape in Kazakhstan: An object-based approach using SPOT-5 XS, Landsat 7 ETM+, SRTM and multiple Random Forests

    PubMed Central

    Wilschut, L.I.; Addink, E.A.; Heesterbeek, J.A.P.; Dubyanskiy, V.M.; Davis, S.A.; Laudisoit, A.; Begon, M.; Burdelov, L.A.; Atshabar, B.B.; de Jong, S.M.

    2013-01-01

    Plague is a zoonotic infectious disease present in great gerbil populations in Kazakhstan. Infectious disease dynamics are influenced by the spatial distribution of the carriers (hosts) of the disease. The great gerbil, the main host in our study area, lives in burrows, which can be recognized on high-resolution satellite imagery. In this study, using earth observation data at various spatial scales, we map the spatial distribution of burrows in a semi-desert landscape. The study area consists of various landscape types. To evaluate whether identification of burrows by classification is possible in these landscape types, the study area was subdivided into eight landscape units on the basis of Landsat 7 ETM+ derived Tasselled Cap Greenness and Brightness and SRTM-derived standard deviation in elevation. In the field, 904 burrows were mapped. Using two segmented 2.5 m resolution SPOT-5 XS satellite scenes, reference object sets were created. Random Forests were built for both SPOT scenes and used to classify the images. Additionally, a stratified classification was carried out by building separate Random Forests per landscape unit. Burrows were successfully classified in all landscape units. In the ‘steppe on floodplain’ areas, classification worked best: producer's and user's accuracy in those areas reached 88% and 100%, respectively. In the ‘floodplain’ areas with a more heterogeneous vegetation cover, classification worked least well; there, accuracies were 86% and 58%, respectively. Stratified classification improved the results in all landscape units where comparison was possible (four), increasing kappa coefficients by 13, 10, 9 and 1%, respectively. In this study, an innovative stratification method using high- and medium-resolution imagery was applied in order to map host distribution on a large spatial scale. The burrow maps we developed will help to detect changes in the distribution of great gerbil populations and, moreover, serve as a unique

  9. Random broadcast on random geometric graphs

    SciTech Connect

    Bradonjic, Milan; Elsasser, Robert; Friedrich, Tobias

    2009-01-01

    In this work, we consider the random broadcast time on random geometric graphs (RGGs). The classic random broadcast model, also known as the push algorithm, is defined as follows: starting with one informed node, in each succeeding round every informed node chooses one of its neighbors uniformly at random and informs it. We consider the random broadcast time on RGGs when, with high probability: (i) the RGG is connected, (ii) there exists a giant component in the RGG. We show that the random broadcast time is bounded by O(√n + diam(component)), where diam(component) is the diameter of the entire graph or of the giant component, for regimes (i) and (ii), respectively. In other words, for both regimes, we derive the broadcast time to be Θ(diam(G)), which is asymptotically optimal.
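
    The push model takes only a few lines to simulate. The sketch below runs it on the giant component of a random geometric graph with networkx (graph size and radius are illustrative) and compares the number of rounds with the graph diameter.

      # Push broadcast on a random geometric graph.
      import random
      import networkx as nx

      G = nx.random_geometric_graph(500, 0.1, seed=0)
      giant = G.subgraph(max(nx.connected_components(G), key=len))

      informed = {next(iter(giant.nodes))}
      rounds = 0
      while len(informed) < giant.number_of_nodes():
          # Every informed node pushes to one uniformly random neighbour.
          informed |= {random.choice(list(giant.neighbors(v))) for v in informed}
          rounds += 1

      print(f"{rounds} rounds; diameter = {nx.diameter(giant)}")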

  10. On Gaussian random supergravity

    NASA Astrophysics Data System (ADS)

    Bachlechner, Thomas C.

    2014-04-01

    We study the distribution of metastable vacua and the likelihood of slow roll inflation in high dimensional random landscapes. We consider two examples of landscapes: a Gaussian random potential and an effective supergravity potential defined via a Gaussian random superpotential and a trivial Kähler potential. To examine these landscapes we introduce a random matrix model that describes the correlations between various derivatives and we propose an efficient algorithm that allows for a numerical study of high dimensional random fields. Using these novel tools, we find that the vast majority of metastable critical points in N dimensional random supergravities are either approximately supersymmetric with |F| ≪ M_susy or supersymmetric. Such approximately supersymmetric points are dynamical attractors in the landscape and the probability that a randomly chosen critical point is metastable scales as log(P) ∝ −N. We argue that random supergravities lead to potentially interesting inflationary dynamics.

  11. An Analysis of Algorithms for Solving Discrete Logarithms in Fixed Groups

    DTIC Science & Technology

    2010-03-01

    (No abstract available; the record excerpts the report's table of contents, which covers Pollard's rho algorithm, Pollard's kangaroo algorithm, and the index calculus algorithm for computing discrete logarithms.)

  12. RS-Forest: A Rapid Density Estimator for Streaming Anomaly Detection

    PubMed Central

    Wu, Ke; Zhang, Kun; Fan, Wei; Edwards, Andrea; Yu, Philip S.

    2015-01-01

    Anomaly detection in streaming data is of high interest in numerous application domains. In this paper, we propose a novel one-class semi-supervised algorithm to detect anomalies in streaming data. Underlying the algorithm is a fast and accurate density estimator implemented by multiple fully randomized space trees (RS-Trees), named RS-Forest. The piecewise constant density estimate of each RS-tree is defined on the tree node into which an instance falls. Each incoming instance in a data stream is scored by the density estimates averaged over all trees in the forest. Two strategies, statistical attribute range estimation of high probability guarantee and dual node profiles for rapid model update, are seamlessly integrated into RS-Forest to systematically address the ever-evolving nature of data streams. We derive the theoretical upper bound for the proposed algorithm and analyze its asymptotic properties via bias-variance decomposition. Empirical comparisons to the state-of-the-art methods on multiple benchmark datasets demonstrate that the proposed method features high detection rate, fast response, and insensitivity to most of the parameter settings. Algorithm implementations and datasets are available upon request. PMID:25685112
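
    RS-Forest itself is not shipped in mainstream libraries, but scikit-learn's IsolationForest is a closely related ensemble of randomized trees and illustrates the same scoring idea on a stream; the sliding-window refits below are a crude stand-in for RS-Forest's dual-profile online updates.

      # Streaming anomaly scoring with a related randomized-tree ensemble.
      import numpy as np
      from sklearn.ensemble import IsolationForest

      rng = np.random.default_rng(0)
      window, scores, model = [], [], None
      for t in range(5000):
          # Mostly nominal points, with rare shifted anomalies.
          x = rng.normal(size=2) if rng.random() > 0.01 else rng.normal(6, 1, 2)
          window.append(x)
          if len(window) > 500:
              window.pop(0)                      # keep a window of recent points
          if t % 250 == 0 and len(window) >= 100:
              model = IsolationForest(n_estimators=50, random_state=0).fit(window)
          if model is not None:
              scores.append(model.score_samples([x])[0])  # lower = more anomalous

      print("most anomalous score seen:", min(scores))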

  13. The Effectiveness of Parent Training as a Treatment for Preschool Attention-Deficit/Hyperactivity Disorder: Study Protocol for a Randomized Controlled, Multicenter Trial of the New Forest Parenting Program in Everyday Clinical Practice

    PubMed Central

    Daley, David; Frydenberg, Morten; Rask, Charlotte U; Sonuga-Barke, Edmund; Thomsen, Per H

    2016-01-01

    Background Parent training is recommended as the first-line treatment for attention-deficit/hyperactivity disorder (ADHD) in preschool children. The New Forest Parenting Programme (NFPP) is an evidence-based parenting program developed specifically to target preschool ADHD. Objective The objective of this trial is to investigate whether the NFPP can be effectively delivered for children referred through official community pathways in everyday clinical practice. Methods A multicenter randomized controlled parallel arm trial design is employed. There are two treatment arms, NFPP and treatment as usual. NFPP consists of eight individually delivered parenting sessions, where the child attends during three of the sessions. Outcomes are examined at three time points (T1, T2, T3): T1 (baseline), T2 (week 12, post intervention), and T3 (6-month follow-up). 140 children between the ages of 3 and 7, with a clinical diagnosis of ADHD, informed by the Development and Well Being Assessment, and recruited from three child and adolescent psychiatry departments in Denmark will take part. Randomization is on a 1:1 basis, stratified for age and gender. Results The primary endpoint is change in ADHD symptoms as measured by the Preschool ADHD-Rating Scale (ADHD-RS) by T2. Secondary outcome measures include: effects on this measure at T3, and T2 and T3 measures of teacher-reported Preschool ADHD-RS scores, parent- and teacher-rated scores on the Strengths & Difficulties Questionnaire, direct observation of ADHD behaviors during the Child's Solo Play, observation of parent-child interaction, parent sense of competence, and family stress. Results will be reported using the standards set out in the Consolidated Standards of Reporting Trials Statement for Randomized Controlled Trials of nonpharmacological treatments. Conclusions The trial will provide evidence as to whether NFPP is a more effective treatment for preschool ADHD than the treatment usually offered in everyday clinical practice. Trial

  14. Comparison of 3D-OP-OSEM and 3D-FBP reconstruction algorithms for High-Resolution Research Tomograph studies: effects of randoms estimation methods

    NASA Astrophysics Data System (ADS)

    van Velden, Floris H. P.; Kloet, Reina W.; van Berckel, Bart N. M.; Wolfensberger, Saskia P. A.; Lammertsma, Adriaan A.; Boellaard, Ronald

    2008-06-01

    The High-Resolution Research Tomograph (HRRT) is a dedicated human brain positron emission tomography (PET) scanner. Recently, a 3D filtered backprojection (3D-FBP) reconstruction method has been implemented to reduce the bias in short duration frames currently observed in 3D ordinary Poisson OSEM (3D-OP-OSEM) reconstructions. Further improvements might be expected using a new method of variance reduction on randoms (VRR) based on coincidence histograms instead of using the delayed window technique (DW) to estimate randoms. The goal of this study was to evaluate VRR in combination with 3D-OP-OSEM and 3D-FBP reconstruction techniques. To this end, several phantom studies and a human brain study were performed. For most phantom studies, 3D-OP-OSEM showed higher accuracy of observed activity concentrations with VRR than with DW. However, both positive and negative deviations in reconstructed activity concentrations and large biases of grey to white matter contrast ratio (up to 88%) were still observed as a function of scan statistics. Moreover, 3D-OP-OSEM+VRR also showed bias up to 64% in clinical data, i.e. in some pharmacokinetic parameters, as compared with those obtained with 3D-FBP+VRR. In the case of 3D-FBP, VRR showed results similar to those of DW for both phantom and clinical data, except that VRR yielded a better standard deviation of 6-10%. Therefore, VRR should be used to correct for randoms in HRRT PET studies.

  15. A random-walk algorithm for modeling lithospheric density and the role of body forces in the evolution of the Midcontinent Rift

    NASA Astrophysics Data System (ADS)

    Levandowski, Will; Boyd, Oliver S.; Briggs, Rich W.; Gold, Ryan D.

    2015-12-01

    This paper develops a Monte Carlo algorithm for extracting three-dimensional lithospheric density models from geophysical data. Empirical scaling relationships between velocity and density create a 3-D starting density model, which is then iteratively refined until it reproduces observed gravity and topography. This approach permits deviations from uniform crustal velocity-density scaling, which provide insight into crustal lithology and prevent spurious mapping of crustal anomalies into the mantle. We test this algorithm on the Proterozoic Midcontinent Rift (MCR), north-central United States. The MCR provides a challenge because it hosts a gravity high overlying low shear-wave velocity crust in a generally flat region. Our initial density estimates are derived from a seismic velocity/crustal thickness model based on joint inversion of surface-wave dispersion and receiver functions. By adjusting these estimates to reproduce gravity and topography, we generate a lithospheric-scale model that reveals dense middle crust and eclogitized lowermost crust within the rift. Mantle lithospheric density beneath the MCR is not anomalous, consistent with geochemical evidence that lithospheric mantle was not the primary source of rift-related magmas and suggesting that extension occurred in response to far-field stress rather than a hot mantle plume. Similarly, the subsequent inversion of normal faults resulted from changing far-field stress that exploited not only warm, recently faulted crust but also a gravitational potential energy low in the MCR. The success of this density modeling algorithm in the face of such apparently contradictory geophysical properties suggests that it may be applicable to a variety of tectonic and geodynamic problems.
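
    Stripped of the geophysics, the refinement loop is a random walk through model space with an accept/reject rule. The sketch below shows that skeleton only; the misfit function is a placeholder where the real gravity and topography forward calculations would go.

      # Skeleton of the iterative density-refinement loop (forward model is a stub).
      import numpy as np

      rng = np.random.default_rng(0)
      density = np.full((10, 10, 5), 2800.0)     # starting 3-D model, kg/m^3

      def misfit(rho):
          # Placeholder: predict gravity and topography from rho and
          # compare both with observations.
          return np.sum((rho.mean(axis=2) - 2850.0) ** 2)

      current = misfit(density)
      for _ in range(20000):
          trial = density.copy()
          i, j, k = (rng.integers(s) for s in density.shape)
          trial[i, j, k] += rng.normal(0, 10)    # random-walk step in one cell
          m = misfit(trial)
          if m < current:                        # keep improving models; real schemes
              density, current = trial, m        # may also accept some uphill moves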

  16. A random-walk algorithm for modeling lithospheric density and the role of body forces in the evolution of the Midcontinent Rift

    USGS Publications Warehouse

    Levandowski, William Brower; Boyd, Oliver; Briggs, Richard; Gold, Ryan D.

    2015-01-01

    We test this algorithm on the Proterozoic Midcontinent Rift (MCR), north-central U.S. The MCR provides a challenge because it hosts a gravity high overlying low shear-wave velocity crust in a generally flat region. Our initial density estimates are derived from a seismic velocity/crustal thickness model based on joint inversion of surface-wave dispersion and receiver functions. By adjusting these estimates to reproduce gravity and topography, we generate a lithospheric-scale model that reveals dense middle crust and eclogitized lowermost crust within the rift. Mantle lithospheric density beneath the MCR is not anomalous, consistent with geochemical evidence that lithospheric mantle was not the primary source of rift-related magmas and suggesting that extension occurred in response to far-field stress rather than a hot mantle plume. Similarly, the subsequent inversion of normal faults resulted from changing far-field stress that exploited not only warm, recently faulted crust but also a gravitational potential energy low in the MCR. The success of this density modeling algorithm in the face of such apparently contradictory geophysical properties suggests that it may be applicable to a variety of tectonic and geodynamic problems. 

  17. Improved forest change detection with terrain illumination corrected landsat images

    Technology Transfer Automated Retrieval System (TEKTRAN)

    An illumination correction algorithm has been developed to improve the accuracy of forest change detection from Landsat reflectance data. This algorithm is based on an empirical rotation model and was tested on the Landsat imagery pair over Cherokee National Forest, Tennessee, Uinta-Wasatch-Cache N...

  18. Handling packet dropouts and random delays for unstable delayed processes in NCS by optimal tuning of PIλDμ controllers with evolutionary algorithms.

    PubMed

    Pan, Indranil; Das, Saptarshi; Gupta, Amitava

    2011-10-01

    The issues of stochastically varying network delays and packet dropouts in Networked Control System (NCS) applications have been simultaneously addressed by time-domain optimal tuning of fractional-order (FO) PID controllers. Different variants of evolutionary algorithms are used for the tuning process and their performances are compared. The effectiveness of the fractional-order PI(λ)D(μ) controllers over their integer-order counterparts is also examined. Two standard test bench plants with time delay and unstable poles, as encountered in process control applications, are tuned with the proposed method to establish the validity of the tuning methodology. The proposed tuning methodology is independent of the specific choice of plant and is also applicable to less complicated systems, so it is useful in a wide variety of scenarios. The paper also shows the superiority of FOPID controllers over their conventional PID counterparts for NCS applications.
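
    The tuning loop itself is generic: simulate the closed loop, score the response, and let an evolutionary optimizer search the gain space. The sketch below tunes an integer-order PID on a toy delayed plant with SciPy's differential evolution; the plant, cost function and bounds are stand-ins, and the fractional-order operators of the paper are omitted.

      # Evolutionary PID tuning on a toy plant with input delay.
      import numpy as np
      from scipy.optimize import differential_evolution

      def iae_cost(gains, dt=0.01, horizon=5.0, delay_steps=20):
          kp, ki, kd = gains
          y = v = integ = prev_err = 0.0
          u_buf = [0.0] * delay_steps            # models the network delay
          iae = 0.0
          for _ in range(int(horizon / dt)):
              err = 1.0 - y                      # unit step setpoint
              integ += err * dt
              u = kp * err + ki * integ + kd * (err - prev_err) / dt
              prev_err = err
              u_buf.append(u)
              v += (-v + u_buf.pop(0)) * dt      # plant: y'' = -y' + u(t - delay)
              y += v * dt
              if abs(y) > 1e6:                   # unstable candidate: huge penalty
                  return 1e9
              iae += abs(err) * dt
          return iae                             # integral of absolute error

      res = differential_evolution(iae_cost, bounds=[(0, 20), (0, 10), (0, 5)],
                                   seed=0, maxiter=40)
      print("tuned (Kp, Ki, Kd):", res.x, "IAE:", res.fun)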

  19. Natural Variability of Mexican Forest Fires

    NASA Astrophysics Data System (ADS)

    Velasco-Herrera, Graciela; Velasco Herrera, Victor Manuel; Kemper-Valverdea, N.

    The purposes of this paper are: 1) to present a new algorithm for analyzing forest fires; 2) to discuss the present understanding of natural variability at different scales, with special emphasis on Mexican conditions since 1972; 3) to analyze the internal and external factors affecting forest fires, for example ENSO and total solar irradiance; 4) to discuss the implications of this knowledge for research and for restoration and management methods whose purpose is to enhance forest biodiversity conservation; and 5) to present an estimate of Mexican forest fires for the next decade. These results may be useful for minimizing human and economic losses.

  20. Satellite-based forest monitoring: spatial and temporal forecast of growing index and short-wave infrared band.

    PubMed

    Bayr, Caroline; Gallaun, Heinz; Kleb, Ulrike; Kornberger, Birgit; Steinegger, Martin; Winter, Martin

    2016-04-18

    For detecting anomalies or interventions in the field of forest monitoring we propose an approach based on the spatial and temporal forecast of satellite time series data. For each pixel of the satellite image three different types of forecasts are provided, namely spatial, temporal and combined spatio-temporal forecast. Spatial forecast means that a clustering algorithm is used to group the time series data based on the features normalised difference vegetation index (NDVI) and the short-wave infrared band (SWIR). For estimation of the typical temporal trajectory of the NDVI and SWIR during the vegetation period of each spatial cluster, we apply several methods of functional data analysis including functional principal component analysis, and a novel form of random regression forests with online learning (streaming) capability. The temporal forecast is carried out by means of functional time series analysis and an autoregressive integrated moving average model. The combination of the temporal forecasts, which is based on the past of the considered pixel, and spatial forecasts, which is based on highly correlated pixels within one cluster and their past, is performed by functional data analysis, and a variant of random regression forests adapted to online learning capabilities. For evaluation of the methods, the approaches are applied to a study area in Germany for monitoring forest damages caused by wind-storm, and to a study area in Spain for monitoring forest fires.
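
    The spatial-forecast component reduces to a familiar pattern: cluster pixels by their past trajectories and let each cluster's statistics predict its members' future values. A stripped-down sketch with k-means (a stand-in for the paper's clustering, on synthetic NDVI trajectories) is given below; the functional time-series and random-regression-forest machinery is omitted.

      # Cluster-based spatial forecast of pixel trajectories.
      import numpy as np
      from sklearn.cluster import KMeans

      rng = np.random.default_rng(0)
      ndvi = rng.random((5000, 20))              # per-pixel NDVI, 20 acquisitions
      past, future = ndvi[:, :15], ndvi[:, 15:]

      labels = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(past)
      # Each pixel's forecast is the mean future trajectory of its cluster,
      # i.e. of the highly correlated pixels it was grouped with.
      cluster_mean = np.vstack([future[labels == k].mean(axis=0) for k in range(8)])
      forecast = cluster_mean[labels]            # shape (5000, 5)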

  1. Greedy algorithms in disordered systems

    NASA Astrophysics Data System (ADS)

    Duxbury, P. M.; Dobrin, R.

    1999-08-01

    We discuss search, minimal path and minimal spanning tree algorithms and their applications to disordered systems. Greedy algorithms solve these problems exactly and are related to extremal dynamics in physics. Minimal cost path (Dijkstra) and minimal cost spanning tree (Prim) algorithms provide extremal dynamics for a polymer in a random medium (the KPZ universality class) and invasion percolation (without trapping), respectively.
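
    The correspondence is concrete: with random bond energies on a lattice, the polymer's ground-state configuration is a Dijkstra shortest path, and Prim's minimal spanning tree traces invasion percolation. A small networkx sketch (lattice size and disorder distribution are arbitrary choices):

      # Greedy optima in a disordered lattice: Dijkstra path and Prim tree.
      import random
      import networkx as nx

      L = 50
      G = nx.grid_2d_graph(L, L)
      for u, v in G.edges:
          G.edges[u, v]["weight"] = random.random()    # quenched random energies

      path = nx.dijkstra_path(G, (0, 0), (L - 1, L - 1))       # directed polymer
      energy = nx.dijkstra_path_length(G, (0, 0), (L - 1, L - 1))
      mst = nx.minimum_spanning_tree(G, algorithm="prim")      # invasion percolation
      print(f"ground-state path: {len(path)} sites, energy {energy:.2f}")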

  2. Mapping tree health using airborne laser scans and hyperspectral imagery: a case study for a floodplain eucalypt forest

    NASA Astrophysics Data System (ADS)

    Shendryk, Iurii; Tulbure, Mirela; Broich, Mark; McGrath, Andrew; Alexandrov, Sergey; Keith, David

    2016-04-01

    Airborne laser scanning (ALS) and hyperspectral imaging (HSI) are two complementary remote sensing technologies that provide comprehensive structural and spectral characteristics of forests over large areas. In this study we developed two algorithms: one for individual tree delineation utilizing ALS and the other utilizing ALS and HSI to characterize health of delineated trees in a structurally complex floodplain eucalypt forest. We conducted experiments in the largest eucalypt, river red gum forest in the world, located in the south-east of Australia that experienced severe dieback over the past six decades. For detection of individual trees from ALS we developed a novel bottom-up approach based on Euclidean distance clustering to detect tree trunks and random walks segmentation to further delineate tree crowns. Overall, our algorithm was able to detect 67% of tree trunks with diameter larger than 13 cm. We assessed the accuracy of tree delineations in terms of crown height and width, with correct delineation of 68% of tree crowns. The increase in ALS point density from ~12 to ~24 points/m2 resulted in tree trunk detection and crown delineation increase of 11% and 13%, respectively. Trees with incorrectly delineated crowns were generally attributed to areas with high tree density along water courses. The accurate delineation of trees allowed us to classify the health of this forest using machine learning and field-measured tree crown dieback and transparency ratios, which were good predictors of tree health in this forest. ALS and HSI derived indices were used as predictor variables to train and test object-oriented random forest classifier. Returned pulse width, intensity and density related ALS indices were the most important predictors in the tree health classifications. At the forest level in terms of tree crown dieback, 77% of trees were classified as healthy, 14% as declining and 9% as dying or dead with 81% mapping accuracy. Similarly, in terms of tree

  3. Analysis of image content recognition algorithm based on sparse coding and machine learning

    NASA Astrophysics Data System (ADS)

    Xiao, Yu

    2017-03-01

    This paper presents an image classification algorithm based on a spatial sparse coding model and random forests. First, SIFT features are extracted from the image. Sparse coding theory is then used to generate a visual vocabulary from the SIFT features, and each SIFT feature is encoded as a sparse vector. Through a combination of regional pooling and spatial sparse vectors, a sparse vector of fixed dimension is obtained to represent the image. Finally, a random forest classifier is trained and tested on the image sparse vectors, using the standard benchmark datasets Caltech-101 and Scene-15. The experimental results show that the proposed algorithm effectively represents the features of the image and improves classification accuracy. The paper also proposes an image recognition algorithm based on image segmentation, sparse coding and multiple-instance learning. This algorithm treats the image as a multiple-instance bag whose instances are the sparse-coded SIFT features; the sparse coding model generates the visual vocabulary, bags are mapped to the feature space through statistics on the number of instances they contain, and a 1-norm SVM is then used to classify images and to generate sample weights for selecting important image features.
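
    The encoding-then-classify pipeline can be sketched with scikit-learn: learn a dictionary (visual vocabulary) from descriptors, sparse-code each descriptor, pool the codes per image, and feed the pooled vectors to a random forest. The descriptors below are random stand-ins for SIFT features, and the pooling choice is illustrative.

      # Sparse coding + pooling + random forest, on synthetic descriptors.
      import numpy as np
      from sklearn.decomposition import MiniBatchDictionaryLearning
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(0)
      descriptors = rng.normal(size=(200, 50, 128))   # 200 images x 50 SIFT-like
      labels = rng.integers(0, 5, 200)

      # Learn a 64-atom visual vocabulary from a subsample of descriptors.
      dico = MiniBatchDictionaryLearning(n_components=64, alpha=1.0,
                                         random_state=0)
      dico.fit(descriptors.reshape(-1, 128)[::10])

      def encode(img_desc):
          codes = dico.transform(img_desc)            # sparse codes, (50, 64)
          return np.abs(codes).max(axis=0)            # max pooling per image

      X = np.array([encode(d) for d in descriptors])
      clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, labels)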

  4. Certified randomness in quantum physics.

    PubMed

    Acín, Antonio; Masanes, Lluis

    2016-12-07

    The concept of randomness plays an important part in many disciplines. On the one hand, the question of whether random processes exist is fundamental for our understanding of nature. On the other, randomness is a resource for cryptography, algorithms and simulations. Standard methods for generating randomness rely on assumptions about the devices that are often not valid in practice. However, quantum technologies enable new methods for generating certified randomness, based on the violation of Bell inequalities. These methods are referred to as device-independent because they do not rely on any modelling of the devices. Here we review efforts to design device-independent randomness generators and the associated challenges.

  5. Certified randomness in quantum physics

    NASA Astrophysics Data System (ADS)

    Acín, Antonio; Masanes, Lluis

    2016-12-01

    The concept of randomness plays an important part in many disciplines. On the one hand, the question of whether random processes exist is fundamental for our understanding of nature. On the other, randomness is a resource for cryptography, algorithms and simulations. Standard methods for generating randomness rely on assumptions about the devices that are often not valid in practice. However, quantum technologies enable new methods for generating certified randomness, based on the violation of Bell inequalities. These methods are referred to as device-independent because they do not rely on any modelling of the devices. Here we review efforts to design device-independent randomness generators and the associated challenges.

  6. Forest Cover Estimation in Ireland Using Radar Remote Sensing: A Comparative Analysis of Forest Cover Assessment Methodologies.

    PubMed

    Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O'Halloran, John

    2015-01-01

    Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1-98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting.
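
    The two ensemble classifiers differ mainly in how node splits are chosen (optimized thresholds versus fully random ones), and in scikit-learn they are drop-in replacements for each other, which makes the comparison a few lines; the synthetic features below merely stand in for the per-pixel SAR predictors.

      # Random Forests versus Extremely Randomised Trees on stand-in SAR features.
      import numpy as np
      from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      X = rng.normal(size=(3000, 6))                  # e.g. HH/HV backscatter stats
      y = (X[:, 0] - 0.8 * X[:, 1] + rng.normal(0, 0.5, 3000) > 0).astype(int)

      for clf in (RandomForestClassifier(n_estimators=200, random_state=0),
                  ExtraTreesClassifier(n_estimators=200, random_state=0)):
          acc = cross_val_score(clf, X, y, cv=5).mean()
          print(f"{type(clf).__name__}: accuracy = {acc:.3f}")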

  7. Forest Cover Estimation in Ireland Using Radar Remote Sensing: A Comparative Analysis of Forest Cover Assessment Methodologies

    PubMed Central

    Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O'Halloran, John

    2015-01-01

    Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1–98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting. PMID:26262681

  8. Tropical forests

    SciTech Connect

    Not Available

    1985-01-01

    Major international aid and nongovernmental groups have agreed on a strategy to conserve tropical forests. Their plan calls for a $5.3 billion, five-year program for the 56 most critically affected countries. This report consists of three parts. The Plan details the costs of deforestation in both developing and industrialized countries, uncovers its real causes, and outlines a five-part action plan. Case Studies reviews dozens of detailed accounts of successful forest management projects from around the world, covering wide-ranging ecological conditions and taking into account the economics of forest products in different marketing situations. Country Investment Profiles spell out country-by-country listings of what should be done, who should do it, and how much it will cost.

  9. A Geospatial Assessment of Mountain Pine Beetle Infestations and Their Effect on Forest Health in Okanogan-Wenatchee National Forest

    NASA Astrophysics Data System (ADS)

    Allain, M.; Nguyen, A.; Johnson, E.; Williams, E.; Tsai, S.; Prichard, S.; Freed, T.; Skiles, J. W.

    2010-12-01

    Fire-suppression over the past century has resulted in an accumulation of forest litter and increased tree density. As nutrients are sequestered in forest litter and not recycled by forest fires, soil nutrient concentrations have decreased. The forests of Northern Washington are in poor health as a result of these factors coupled with sequential droughts. The mountain pine beetle (MPB) thrives in such conditions, giving rise to an outbreak in Washington’s Okanogan-Wenatchee National Forest. These outbreaks occur in three successive stages— the green, red, and gray stages. Beetles first infest the tree in the green phase, leading to discoloration of needles in the red phase and eventually death in the gray phase. With the use of geospatial technology, these outbreaks can be better mapped and assessed to evaluate forest health. Field work on seventeen randomly selected sites was conducted using the point-centered quarter method. The stratified random sampling technique ensured that the sampled trees were representative of all classifications present. Additional measurements taken were soil nutrient concentrations (sodium [Na+], nitrate [NO3-], and potassium [K+]), soil pH, and tree temperatures. Satellite imagery was used to define infestation levels and geophysical parameters, such as land cover, vegetation classification, and vegetation stress. ASTER images were used with the Ratio Vegetation Index (RVI) to explore the differences in vegetation, while MODIS images were used to analyze the Disturbance Index (DI). Four other vegetation indices from Landsat TM5 were used to distinguish the green, red and gray phases. Selected imagery from the Hyperion sensor was used to run a minimum distance supervised classification in ENVI, thus testing the ability of Hyperion imagery to detect the green phase. The National Agricultural Imagery Program (NAIP) archive was used to generate accurate maps of beetle-infested regions. This algorithm was used to detect bark beetle

  10. Evaluation of Algorithms for a Miles-in-Trail Decision Support Tool

    NASA Technical Reports Server (NTRS)

    Bloem, Michael; Hattaway, David; Bambos, Nicholas

    2012-01-01

    Four machine learning algorithms were prototyped and evaluated for use in a proposed decision support tool that would assist air traffic managers as they set Miles-in-Trail restrictions. The tool would display probabilities that each possible Miles-in-Trail value should be used in a given situation. The algorithms were evaluated with an expected Miles-in-Trail cost that assumes traffic managers set restrictions based on the tool-suggested probabilities. Basic Support Vector Machine, random forest, and decision tree algorithms were evaluated, as was a softmax regression algorithm that was modified to explicitly reduce the expected Miles-in-Trail cost. The algorithms were evaluated with data from the summer of 2011 for air traffic flows bound to the Newark Liberty International Airport (EWR) over the ARD, PENNS, and SHAFF fixes. The algorithms were provided with 18 input features that describe the weather at EWR, the runway configuration at EWR, the scheduled traffic demand at EWR and the fixes, and other traffic management initiatives in place at EWR. Features describing other traffic management initiatives at EWR and the weather at EWR achieved relatively high information gain scores, indicating that they are the most useful for estimating Miles-in-Trail. In spite of a high variance or over-fitting problem, the decision tree algorithm achieved the lowest expected Miles-in-Trail costs when the algorithms were evaluated using 10-fold cross validation with the summer 2011 data for these air traffic flows.

  11. Mapping forest composition from the Canadian National Forest Inventory and land cover classification maps.

    PubMed

    Yemshanov, Denys; McKenney, Daniel W; Pedlar, John H

    2012-08-01

    Canada's National Forest Inventory (CanFI) provides coarse-grained, aggregated information on a large number of forest attributes. Though reasonably well suited for summary reporting on national forest resources, the coarse spatial nature of this data limits its usefulness in modeling applications that require information on forest composition at finer spatial resolutions. An alternative source of information is the land cover classification produced by the Canadian Forest Service as part of its Earth Observation for Sustainable Development of Forests (EOSD) initiative. This product, which is derived from Landsat satellite imagery, provides relatively high resolution coverage, but only very general information on forest composition (such as conifer, mixedwood, and deciduous). Here we link the CanFI and EOSD products using a spatial randomization technique to distribute the forest composition information in CanFI to the forest cover classes in EOSD. The resultant geospatial coverages provide randomized predictions of forest composition, which incorporate the fine-scale spatial detail of the EOSD product and agree in general terms with the species composition summaries from the original CanFI estimates. We describe the approach and provide illustrative results for selected major commercial tree species in Canada.

  12. Fragmentation of random trees

    NASA Astrophysics Data System (ADS)

    Kalay, Z.; Ben-Naim, E.

    2015-01-01

    We study fragmentation of a random recursive tree into a forest by repeated removal of nodes. The initial tree consists of N nodes and it is generated by sequential addition of nodes with each new node attaching to a randomly-selected existing node. As nodes are removed from the tree, one at a time, the tree dissolves into an ensemble of separate trees, namely, a forest. We study statistical properties of trees and nodes in this heterogeneous forest, and find that the fraction of remaining nodes m characterizes the system in the limit N → ∞. We obtain analytically the size density φ_s of trees of size s. The size density has a power-law tail φ_s ~ s^(−α) with exponent α = 1 + 1/m. Therefore, the tail becomes steeper as further nodes are removed, and the fragmentation process is unusual in that the exponent α increases continuously with time. We also extend our analysis to the case where nodes are added as well as removed, and obtain the asymptotic size density for growing trees.
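
    The setup is simple to reproduce numerically: grow a random recursive tree, delete a random subset of nodes, and tally fragment sizes; the tail of the size distribution can then be checked against the predicted exponent α = 1 + 1/m. A networkx sketch with illustrative sizes:

      # Fragmenting a random recursive tree and inspecting fragment sizes.
      import random
      from collections import Counter
      import networkx as nx

      N = 100_000
      G = nx.Graph()
      G.add_node(0)
      for new in range(1, N):
          G.add_edge(new, random.randrange(new))   # attach to a uniform older node

      G.remove_nodes_from(random.sample(list(G.nodes), N // 2))   # m = 0.5
      sizes = Counter(len(c) for c in nx.connected_components(G))
      print("largest fragment sizes:", sorted(sizes)[-5:])   # tail ~ s^-(1 + 1/m)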

  13. Fragmentation of random trees

    NASA Astrophysics Data System (ADS)

    Kalay, Ziya; Ben-Naim, Eli

    2015-03-01

    We investigate the fragmentation of a random recursive tree by repeated removal of nodes, resulting in a forest of disjoint trees. The initial tree is generated by sequentially attaching new nodes to randomly chosen existing nodes until the tree contains N nodes. As nodes are removed, one at a time, the tree dissolves into an ensemble of separate trees, namely a forest. We study the statistical properties of trees and nodes in this heterogeneous forest. In the limit N → ∞, we find that the system is characterized by a single parameter: the fraction of remaining nodes m. We obtain analytically the size density ϕ_s of trees of size s, which has a power-law tail ϕ_s ~ s^(−α), with exponent α = 1 + 1/m. Therefore, the tail becomes steeper as further nodes are removed, producing an unusual scaling exponent that increases continuously with time. Furthermore, we investigate the fragment size distribution in a growing tree, where nodes are added as well as removed, and find that the distribution for this case is much narrower.

  14. Forests & Trees.

    ERIC Educational Resources Information Center

    Gage, Susan

    1989-01-01

    This newsletter discusses the disappearance of the world's forests and the resulting environmental problems of erosion and flooding; loss of genetic diversity; climatic changes such as less rainfall, and intensifying of the greenhouse effect; and displacement and destruction of indigenous cultures. The articles, lessons, and activities are…

  15. Forest Imaging

    NASA Technical Reports Server (NTRS)

    1992-01-01

    NASA's Technology Applications Center, with other government and academic agencies, provided technology for improved resources management to the Cibola National Forest. Landsat satellite images enabled vegetation over a large area to be classified for purposes of timber analysis, wildlife habitat, range measurement and development of general vegetation maps.

  16. Algorithmic chemistry

    SciTech Connect

    Fontana, W.

    1990-12-13

    In this paper complex adaptive systems are defined by a self-referential loop in which objects encode functions that act back on these objects. A model for this loop is presented. It uses a simple recursive formal language, derived from the lambda-calculus, to provide a semantics that maps character strings into functions that manipulate symbols on strings. The interaction between two functions, or algorithms, is defined naturally within the language through function composition, and results in the production of a new function. An iterated map acting on sets of functions and a corresponding graph representation are defined. Their properties are useful for discussing the behavior of a fixed-size ensemble of randomly interacting functions. This "function gas", or "Turing gas", is studied under various conditions, and evolves cooperative interaction patterns of considerable intricacy. These patterns adapt under the influence of perturbations consisting of the addition of new random functions to the system. Different organizations emerge depending on the availability of self-replicators.

  17. Patchiness of forest landscape can predict species distribution better than abundance: the case of a forest-dwelling passerine, the short-toed treecreeper, in central Italy

    PubMed Central

    Valerio, Francesco; Balestrieri, Rosario; Posillico, Mario; Bucci, Rodolfo; Altea, Tiziana; De Cinti, Bruno; Matteucci, Giorgio

    2016-01-01

    Environmental heterogeneity affects not only the distribution of a species but also its local abundance. High heterogeneity due to habitat alteration and fragmentation can influence the realized niche of a species, lowering habitat suitability as well as reducing local abundance. We investigate whether a relationship exists between habitat suitability and abundance and whether both are affected by fragmentation. Our aim was to assess the predictive power of such a relationship to derive advice for environmental management. As a model species we used a forest specialist, the short-toed treecreeper (Family: Certhiidae; Certhia brachydactyla Brehm, 1820), and sampled it in central Italy. Species distribution was modelled as a function of forest structure, productivity and fragmentation, while abundance was directly estimated in two central Italian forest stands. Different algorithms were implemented to model species distribution, employing 170 occurrence points provided mostly by the MITO2000 database: an artificial neural network, classification tree analysis, flexible discriminant analysis, generalized boosting models, generalized linear models, multivariate additive regression splines, maximum entropy and random forests. Abundance was estimated also considering detectability, through N-mixture models. Differences between forest stands in both abundance and habitat suitability were assessed as well as the existence of a relationship. Simpler algorithms resulted in higher goodness of fit than complex ones. Fragmentation was highly influential in determining potential distribution. Local abundance and habitat suitability differed significantly between the two forest stands, which were also significantly different in the degree of fragmentation. Regression showed that suitability has a weak significant effect in explaining increasing value of abundance. In particular, local abundances varied both at low and high suitability values. The study lends support to the

  18. Application of AIS Technology to Forest Mapping

    NASA Technical Reports Server (NTRS)

    Yool, S. R.; Star, J. L.

    1985-01-01

    Concerns about the environmental effects of large-scale deforestation have prompted efforts to map forests over large areas using various remote sensing data and image processing techniques. Basic research on the spectral characteristics of forest vegetation is required to form a basis for the development of new techniques and for image interpretation. Examination of LANDSAT data and image processing algorithms over a portion of boreal forest has demonstrated the complexity of the relations between the various expressions of forest canopies, environmental variability, and the relative capacities of different image processing algorithms to achieve high classification accuracies under these conditions. Airborne Imaging Spectrometer (AIS) data may in part provide the means to interpret the responses of standard data and techniques to the vegetation, based on its relatively high spectral resolution.

  19. Competitive evaluation of data mining algorithms for use in classification of leukocyte subtypes with Raman microspectroscopy.

    PubMed

    Maguire, A; Vega-Carrascal, I; Bryant, J; White, L; Howe, O; Lyng, F M; Meade, A D

    2015-04-07

    Raman microspectroscopy has been investigated for some time for use in label-free cell sorting devices. These approaches require coupling of the Raman spectrometer to complex data mining algorithms for identification of cellular subtypes such as the leukocyte subpopulations of lymphocytes and monocytes. In this study, three distinct multivariate classification approaches, (PCA-LDA, SVMs and Random Forests) are developed and tested on their ability to classify the cellular subtype in extracted peripheral blood mononuclear cells (T-cell lymphocytes from myeloid cells), and are evaluated in terms of their respective classification performance. A strategy for optimisation of each of the classification algorithm is presented with emphasis on reduction of model complexity in each of the algorithms. The relative classification performance and performance characteristics are highlighted, overall suggesting the radial basis function SVM as a robust option for classification of leukocytes with Raman microspectroscopy.

  20. Higher-order force gradient symplectic algorithms

    NASA Astrophysics Data System (ADS)

    Chin, Siu A.; Kidwell, Donald W.

    2000-12-01

    We show that a recently discovered fourth order symplectic algorithm, which requires one evaluation of the force gradient in addition to three evaluations of the force, when iterated to higher order, yields algorithms that are far superior to similarly iterated higher order algorithms based on the standard Forest-Ruth algorithm. We gauge the accuracy of each algorithm by comparing the step-size independent error functions associated with energy conservation and the rotation of the Laplace-Runge-Lenz vector when solving a highly eccentric Kepler problem. For orders 6, 8, 10, and 12, the new algorithms are approximately a factor of 10^3, 10^4, 10^4, and 10^5 better.
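
    For reference, the baseline being improved on is compact. The following is a sketch of the standard fourth-order Forest-Ruth integrator on a unit harmonic oscillator (the paper's Kepler test and the force-gradient correction to the middle kick are not reproduced), monitoring energy conservation.

      # Standard fourth-order Forest-Ruth symplectic integrator.
      theta = 1.0 / (2.0 - 2.0 ** (1.0 / 3.0))
      c = [theta / 2, (1 - theta) / 2, (1 - theta) / 2, theta / 2]   # drifts
      d = [theta, 1 - 2 * theta, theta]                              # kicks

      def force(q):
          return -q                        # unit harmonic oscillator

      def step(q, p, h):
          for i in range(3):
              q += c[i] * h * p            # drift
              p += d[i] * h * force(q)     # kick
          return q + c[3] * h * p, p       # final drift

      q, p, h = 1.0, 0.0, 0.05
      for _ in range(10_000):
          q, p = step(q, p, h)
      print(f"energy after 10k steps: {0.5 * (p * p + q * q):.10f} (exact 0.5)")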

  1. Identifying Active Travel Behaviors in Challenging Environments Using GPS, Accelerometers, and Machine Learning Algorithms

    PubMed Central

    Ellis, Katherine; Godbole, Suneeta; Marshall, Simon; Lanckriet, Gert; Staudenmayer, John; Kerr, Jacqueline

    2014-01-01

    Background: Active travel is an important area in physical activity research, but objective measurement of active travel is still difficult. Automated methods to measure travel behaviors will improve research in this area. In this paper, we present a supervised machine learning method for transportation mode prediction from global positioning system (GPS) and accelerometer data. Methods: We collected a dataset of about 150 h of GPS and accelerometer data from two research assistants following a protocol of prescribed trips consisting of five activities: bicycling, riding in a vehicle, walking, sitting, and standing. We extracted 49 features from 1-min windows of this data. We compared the performance of several machine learning algorithms and chose a random forest algorithm to classify the transportation mode. We used a moving average output filter to smooth the output predictions over time. Results: The random forest algorithm achieved 89.8% cross-validated accuracy on this dataset. Adding the moving average filter to smooth output predictions increased the cross-validated accuracy to 91.9%. Conclusion: Machine learning methods are a viable approach for automating measurement of active travel, particularly for measuring travel activities that traditional accelerometer data processing methods misclassify, such as bicycling and vehicle travel. PMID:24795875
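
    A hedged sketch of the two-stage pipeline described above: a random forest predicts a travel mode for each 1-min window, and a moving average then smooths the predicted class probabilities over time. The placeholder data are random, so the numbers are meaningless; only the mechanics are shown.

        # Per-window random forest predictions, then temporal smoothing of the
        # class probabilities with a moving average (one plausible reading of
        # the paper's moving average output filter).
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        X_train = rng.normal(size=(500, 49))       # 49 features per 1-min window
        y_train = rng.integers(0, 5, size=500)     # 5 travel modes (placeholder)
        X_test = rng.normal(size=(60, 49))         # one hour of consecutive windows

        clf = RandomForestClassifier(n_estimators=200, random_state=0)
        clf.fit(X_train, y_train)
        proba = clf.predict_proba(X_test)          # shape (n_windows, n_classes)

        w = 5                                      # smoothing window (minutes)
        kernel = np.ones(w) / w
        smoothed = np.apply_along_axis(
            lambda p: np.convolve(p, kernel, mode="same"), 0, proba)
        labels = smoothed.argmax(axis=1)           # final smoothed mode sequence
        print(labels[:10])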

  2. The use of airborne hyperspectral data for tree species classification in a species-rich Central European forest area

    NASA Astrophysics Data System (ADS)

    Richter, Ronny; Reu, Björn; Wirth, Christian; Doktor, Daniel; Vohland, Michael

    2016-10-01

    The success of remote sensing approaches to assess tree species diversity in a heterogeneously mixed forest stand depends on the availability of both appropriate data and suitable classification algorithms. To separate the ten broadleaf tree species present in a small-structured floodplain forest, the Leipzig Riverside Forest, we introduce a majority-based classification approach for Discriminant Analysis based on Partial Least Squares (PLS-DA), which was tested against Random Forest (RF) and Support Vector Machines (SVM). The classifier performance was tested on different sets of airborne hyperspectral image data (AISA DUAL) that were acquired on single dates in August and September and also stacked to a composite product. Shadowed gaps and shadowed crown parts were eliminated via spectral mixture analysis (SMA) prior to the pixel-based classification. Training and validation sets were defined spectrally with the conditioned Latin hypercube method as a stratified random sampling procedure. In the validation, PLS-DA consistently outperformed the RF and SVM approaches on all datasets. The additional use of spectral variable selection (CARS, "competitive adaptive reweighted sampling") combined with PLS-DA further improved classification accuracies. Up to 78.4% overall accuracy was achieved for the stacked dataset. The image recorded in August provided slightly higher accuracies than the September image, regardless of the applied classifier.

  3. A unifying graph-cut image segmentation framework: algorithms it encompasses and equivalences among them

    NASA Astrophysics Data System (ADS)

    Ciesielski, Krzysztof Chris; Udupa, Jayaram K.; Falcão, A. X.; Miranda, P. A. V.

    2012-02-01

    We present a general graph-cut segmentation framework GGC, in which the delineated objects returned by the algorithms optimize the energy functions associated with the lp norm, 1 <= p <= ∞. Two classes of well known algorithms belong to GGC: the standard graph cut GC (such as the min-cut/max-flow algorithm) and the relative fuzzy connectedness algorithms RFC (including iterative RFC, IRFC). The norm-based description of GGC provides a more elegant and mathematically better recognized framework for our earlier results from [18, 19]. Moreover, it allows precise theoretical comparison of GGC representable algorithms with the algorithms discussed in a recent paper [22] (min-cut/max-flow graph cut, random walker, shortest path/geodesic, Voronoi diagram, power watershed/shortest path forest), which optimize, via lp norms, the intermediate segmentation step (the labeling of scene voxels), but for which the final object need not optimize the lp energy function used. Actually, the comparison of the GGC representable algorithms with those encompassed in the framework described in [22] constitutes the main contribution of this work.

  4. Electromagnetic wave extinction within a forested canopy

    NASA Technical Reports Server (NTRS)

    Karam, M. A.; Fung, A. K.

    1989-01-01

    A forested canopy is modeled by a collection of randomly oriented finite-length cylinders shaded by randomly oriented and distributed disk- or needle-shaped leaves. For a plane wave exciting the forested canopy, the extinction coefficient is formulated in terms of the extinction cross sections (ECSs) in the local frame of each forest component and the Eulerian angles of orientation (used to describe the orientation of each component). The ECSs in the local frame for the finite-length cylinders used to model the branches are obtained by using the forward-scattering theorem. ECSs in the local frame for the disk- and needle-shaped leaves are obtained by summation of the absorption and scattering cross sections. The behavior of the extinction coefficients with the incidence angle is investigated numerically for both deciduous and coniferous forests. The dependences of the extinction coefficients on the orientation of the leaves are illustrated numerically.

  5. Dispersal of forest insects

    NASA Technical Reports Server (NTRS)

    Mcmanus, M. L.

    1979-01-01

    Dispersal flights of selected species of forest insects which are associated with periodic outbreaks of pests that occur over large contiguous forested areas are discussed. Gypsy moths, spruce budworms, and forest tent caterpillars were studied for their massive migrations in forested areas. Results indicate that large dispersals into forested areas are due to the females, except in the case of the gypsy moth.

  6. Randomized selection on the GPU

    SciTech Connect

    Monroe, Laura Marie; Wendelberger, Joanne R; Michalak, Sarah E

    2011-01-13

    We implement here a fast and memory-sparing probabilistic top N selection algorithm on the GPU. To our knowledge, this is the first direct selection in the literature for the GPU. The algorithm proceeds via a probabilistic guess-and-check process searching for the Nth element. It always gives a correct result and always terminates. The use of randomization reduces the amount of data that needs heavy processing, and so reduces the average time required for the algorithm. Probabilistic Las Vegas algorithms of this kind are a form of stochastic optimization and can be well suited to more general parallel processors with limited amounts of fast memory.
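
    The guess-and-check idea can be sketched on the CPU as follows: estimate a pivot from a random sample, discard everything below it, and verify that enough candidates survive, retrying with a looser pivot otherwise. This is a plain-Python reconstruction of the general Las Vegas pattern, not the cited GPU implementation.

        # Las Vegas top-N selection: always exact, randomness affects run time only.
        import numpy as np

        def randomized_top_n(data, n, sample_size=1024, rng=None):
            rng = rng or np.random.default_rng()
            k = 1                                        # k-th largest sample value as pivot
            while True:
                sample = rng.choice(data, size=sample_size, replace=False)
                kth = len(sample) - k
                pivot = np.partition(sample, kth)[kth]   # guessed pivot
                candidates = data[data >= pivot]         # cheap filter prunes most data
                if len(candidates) >= n:                 # check: enough survivors?
                    return np.sort(candidates)[-n:][::-1]  # exact top N
                k = min(k * 2, sample_size)              # pivot too high; loosen, retry

        data = np.random.default_rng(0).normal(size=1_000_000)
        print(randomized_top_n(data, 10))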

  7. Geological Mapping Using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Harvey, A. S.; Fotopoulos, G.

    2016-06-01

    Remotely sensed spectral imagery, geophysical (magnetic and gravity), and geodetic (elevation) data are useful in a variety of Earth science applications such as environmental monitoring and mineral exploration. Using these data with Machine Learning Algorithms (MLA), which are widely used in image analysis and statistical pattern recognition applications, may enhance preliminary geological mapping and interpretation. This approach contributes towards a rapid and objective means of geological mapping in contrast to conventional field expedition techniques. In this study, four supervised MLAs (naïve Bayes, k-nearest neighbour, random forest, and support vector machines) are compared in order to assess their performance for correctly identifying geological rock types in an area with complete ground validation information. Geological maps of the Sudbury region are used for calibration and validation. The percentage of correct classifications was used as the indicator of performance. Results show that random forest is the best approach. As expected, MLA performance improves with more calibration clusters, i.e. a more uniform distribution of calibration data over the study region. Performance is generally low, though geological trends that correspond to a ground validation map are visualized. Low performance may be the result of poor spectral images of bare rock, which can be covered by vegetation or water. The distribution of calibration clusters and MLA input parameters affect the performance of the MLAs. Generally, performance improves with more uniform sampling, though this increases the required computational effort and time. With the achievable performance levels in this study, the technique is useful in identifying regions of interest and general rock-type trends. In particular, phase I geological site investigations will benefit from this approach and lead to the selection of sites for advanced surveys.

  8. Genetic algorithms

    NASA Technical Reports Server (NTRS)

    Wang, Lui; Bayer, Steven E.

    1991-01-01

    Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithm concepts and applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.

  9. The weirdest SDSS galaxies: results from an outlier detection algorithm

    NASA Astrophysics Data System (ADS)

    Baron, Dalya; Poznanski, Dovi

    2017-03-01

    How can we discover objects we did not know existed within the large data sets that now abound in astronomy? We present an outlier detection algorithm that we developed, based on an unsupervised Random Forest. We test the algorithm on more than two million galaxy spectra from the Sloan Digital Sky Survey and examine the 400 galaxies with the highest outlier score. We find objects with extreme emission-line ratios, abnormally strong absorption lines, and unusual continua, including extremely reddened galaxies. We find galaxy-galaxy gravitational lenses, double-peaked emission-line galaxies and close galaxy pairs. We find galaxies with high-ionization lines, galaxies that host supernovae and galaxies with unusual gas kinematics. Only a fraction of the outliers we find were reported by previous studies that used specific and tailored algorithms to find a single class of unusual objects. Our algorithm is general and detects all of these classes, and many more, regardless of what makes them peculiar. It can be executed on imaging, time series and other spectroscopic data, operates well with thousands of features, is not sensitive to missing values and is easily parallelizable.
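
    One common construction of an unsupervised Random Forest outlier score, in the spirit of the method above, trains a forest to separate the real data from synthetic data drawn from the product of the marginal feature distributions, then scores each object by its average tree proximity to the other real objects. The sketch below uses this proximity recipe on synthetic data; it is an assumed reading of the approach, not the authors' exact pipeline.

        # Unsupervised RF outlier scoring via the real-vs-synthetic trick.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        X_real = rng.normal(size=(300, 8))            # stand-in "spectra"
        X_real[:3] += 6                               # a few planted outliers

        # Synthetic class: shuffle each feature independently, destroying
        # the correlations that characterise the real data.
        X_synth = np.column_stack([rng.permutation(col) for col in X_real.T])
        X = np.vstack([X_real, X_synth])
        y = np.r_[np.zeros(len(X_real)), np.ones(len(X_synth))]

        forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

        # Proximity of two objects = fraction of trees placing them in the same leaf.
        leaves = forest.apply(X_real)                 # (n_samples, n_trees)
        prox = (leaves[:, None, :] == leaves[None, :, :]).mean(-1)
        outlier_score = 1 - (prox.sum(1) - 1) / (len(X_real) - 1)
        print(np.argsort(outlier_score)[-5:])         # indices of strongest outliers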

  10. Biodiversity Mapping in a Tropical West African Forest with Airborne Hyperspectral Data

    PubMed Central

    Vaglio Laurin, Gaia; Chan, Jonathan Cheung-Wai; Chen, Qi; Lindsell, Jeremy A.; Coomes, David A.; Guerriero, Leila; Frate, Fabio Del; Miglietta, Franco; Valentini, Riccardo

    2014-01-01

    Tropical forests are major repositories of biodiversity, but are fast disappearing as land is converted to agriculture. Decision-makers need to know which of the remaining forests to prioritize for conservation, but the only spatial information on forest biodiversity has, until recently, come from a sparse network of ground-based plots. Here we explore whether airborne hyperspectral imagery can be used to predict the alpha diversity of upper canopy trees in a West African forest. The abundances of tree species were collected from 64 plots (each 1250 m² in size) within a Sierra Leonean national park, and Shannon-Wiener biodiversity indices were calculated. An airborne spectrometer measured reflectances of 186 bands in the visible and near-infrared spectral range at 1 m² resolution. The standard deviations of these reflectance values and their first-order derivatives were calculated for each plot from the c. 1250 pixels of hyperspectral information within them. Shannon-Wiener indices were then predicted from these plot-based reflectance statistics using a machine-learning algorithm (Random Forest). The regression model fitted the data well (pseudo-R² = 84.9%), and we show that standard deviations of green-band reflectances and infra-red region derivatives had the strongest explanatory powers. Our work shows that airborne hyperspectral sensing can be very effective at mapping canopy tree diversity, because its high spatial resolution allows within-plot heterogeneity in reflectance to be characterized, making it an effective tool for monitoring forest biodiversity over large geographic scales. PMID:24937407

  11. Biodiversity mapping in a tropical West African forest with airborne hyperspectral data.

    PubMed

    Vaglio Laurin, Gaia; Cheung-Wai Chan, Jonathan; Chen, Qi; Lindsell, Jeremy A; Coomes, David A; Guerriero, Leila; Del Frate, Fabio; Miglietta, Franco; Valentini, Riccardo

    2014-01-01

    Tropical forests are major repositories of biodiversity, but are fast disappearing as land is converted to agriculture. Decision-makers need to know which of the remaining forests to prioritize for conservation, but the only spatial information on forest biodiversity has, until recently, come from a sparse network of ground-based plots. Here we explore whether airborne hyperspectral imagery can be used to predict the alpha diversity of upper canopy trees in a West African forest. The abundances of tree species were collected from 64 plots (each 1250 m² in size) within a Sierra Leonean national park, and Shannon-Wiener biodiversity indices were calculated. An airborne spectrometer measured reflectances of 186 bands in the visible and near-infrared spectral range at 1 m² resolution. The standard deviations of these reflectance values and their first-order derivatives were calculated for each plot from the c. 1250 pixels of hyperspectral information within them. Shannon-Wiener indices were then predicted from these plot-based reflectance statistics using a machine-learning algorithm (Random Forest). The regression model fitted the data well (pseudo-R² = 84.9%), and we show that standard deviations of green-band reflectances and infra-red region derivatives had the strongest explanatory powers. Our work shows that airborne hyperspectral sensing can be very effective at mapping canopy tree diversity, because its high spatial resolution allows within-plot heterogeneity in reflectance to be characterized, making it an effective tool for monitoring forest biodiversity over large geographic scales.

  12. Discriminant forest classification method and system

    SciTech Connect

    Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.

    2012-11-06

    A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or the Anderson-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.

  13. Genomic-enabled prediction with classification algorithms

    PubMed Central

    Ornella, L; Pérez, P; Tapia, E; González-Camacho, J M; Burgueño, J; Zhang, X; Singh, S; Vicente, F S; Bonnett, D; Dreisigacker, S; Singh, R; Long, N; Crossa, J

    2014-01-01

    Pearson's correlation coefficient (ρ) is the most commonly reported metric of the success of prediction in genomic selection (GS). However, in real breeding ρ may not be very useful for assessing the quality of the regression in the tails of the distribution, where individuals are chosen for selection. This research used 14 maize and 16 wheat data sets with different trait–environment combinations. Six different models were evaluated by means of a cross-validation scheme (50 random partitions each, with 90% of the individuals in the training set and 10% in the testing set). The predictive accuracy of these algorithms for selecting individuals belonging to the best α=10, 15, 20, 25, 30, 35, 40% of the distribution was estimated using Cohen's kappa coefficient (κ) and an ad hoc measure, which we call relative efficiency (RE), which indicates the expected genetic gain due to selection when individuals are selected based on GS exclusively. We put special emphasis on the analysis for α=15%, because it is a percentile commonly used in plant breeding programmes (for example, at CIMMYT). We also used ρ as a criterion for overall success. The algorithms used were: Bayesian LASSO (BL), Ridge Regression (RR), Reproducing Kernel Hilbert Spaces (RKHS), Random Forest Regression (RFR), and Support Vector Regression (SVR) with linear (lin) and Gaussian kernels (rbf). The performance of regression methods for selecting the best individuals was compared with that of three supervised classification algorithms: Random Forest Classification (RFC) and Support Vector Classification (SVC) with linear (lin) and Gaussian (rbf) kernels. Classification methods were evaluated using the same cross-validation scheme but with the response vector of the original training sets dichotomised using a given threshold. For α=15%, SVC-lin presented the highest κ coefficients in 13 of the 14 maize data sets, with best values ranging from 0.131 to 0.722 (statistically significant in 9 data sets

  14. Genomic-enabled prediction with classification algorithms.

    PubMed

    Ornella, L; Pérez, P; Tapia, E; González-Camacho, J M; Burgueño, J; Zhang, X; Singh, S; Vicente, F S; Bonnett, D; Dreisigacker, S; Singh, R; Long, N; Crossa, J

    2014-06-01

    Pearson's correlation coefficient (ρ) is the most commonly reported metric of the success of prediction in genomic selection (GS). However, in real breeding ρ may not be very useful for assessing the quality of the regression in the tails of the distribution, where individuals are chosen for selection. This research used 14 maize and 16 wheat data sets with different trait-environment combinations. Six different models were evaluated by means of a cross-validation scheme (50 random partitions each, with 90% of the individuals in the training set and 10% in the testing set). The predictive accuracy of these algorithms for selecting individuals belonging to the best α=10, 15, 20, 25, 30, 35, 40% of the distribution was estimated using Cohen's kappa coefficient (κ) and an ad hoc measure, which we call relative efficiency (RE), which indicates the expected genetic gain due to selection when individuals are selected based on GS exclusively. We put special emphasis on the analysis for α=15%, because it is a percentile commonly used in plant breeding programmes (for example, at CIMMYT). We also used ρ as a criterion for overall success. The algorithms used were: Bayesian LASSO (BL), Ridge Regression (RR), Reproducing Kernel Hilbert Spaces (RKHS), Random Forest Regression (RFR), and Support Vector Regression (SVR) with linear (lin) and Gaussian kernels (rbf). The performance of regression methods for selecting the best individuals was compared with that of three supervised classification algorithms: Random Forest Classification (RFC) and Support Vector Classification (SVC) with linear (lin) and Gaussian (rbf) kernels. Classification methods were evaluated using the same cross-validation scheme but with the response vector of the original training sets dichotomised using a given threshold. For α=15%, SVC-lin presented the highest κ coefficients in 13 of the 14 maize data sets, with best values ranging from 0.131 to 0.722 (statistically significant in 9 data sets
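
    The evaluation idea above can be sketched as follows: dichotomise true and predicted values at the top-α threshold and score the agreement with Cohen's κ. The RE computation shown is a simplified stand-in for the paper's ad hoc measure, and all data are synthetic.

        # Top-alpha selection scored with Cohen's kappa plus a simplified RE.
        import numpy as np
        from sklearn.metrics import cohen_kappa_score

        rng = np.random.default_rng(1)
        y_true = rng.normal(size=500)                         # "true" breeding values
        y_pred = 0.7 * y_true + rng.normal(scale=0.7, size=500)  # GS predictions

        alpha = 0.15
        sel_true = y_true >= np.quantile(y_true, 1 - alpha)  # truly best 15%
        sel_pred = y_pred >= np.quantile(y_pred, 1 - alpha)  # selected by prediction

        print("kappa:", cohen_kappa_score(sel_true, sel_pred))
        # Simplified relative efficiency: mean true value of the GS-selected
        # individuals relative to the mean of the truly best individuals.
        print("RE:", y_true[sel_pred].mean() / y_true[sel_true].mean())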

  15. Effects of Forest Disturbances on Forest Structural Parameters Retrieval from Lidar Waveform Data

    NASA Technical Reports Server (NTRS)

    Ranson, K. Jon; Sun, G.

    2011-01-01

    The effect of forest disturbance on the lidar waveform and the forest biomass estimation was demonstrated by model simulation. The results show that the correlation between stand biomass and the lidar waveform indices changes when the stand spatial structure changes due to disturbances rather than the natural succession. This has to be considered in developing algorithms for regional or global mapping of biomass from lidar waveform data.

  16. Montana's forest resources. Forest Service resource bulletin

    SciTech Connect

    Conner, R.C.; O'Brien, R.A.

    1993-09-01

    The report includes highlights of the forest resource in Montana as of 1989 and describes the extent, condition, and location of the State's forests, with particular emphasis on timberland. It includes statistical tables covering area by land class, ownership, and forest type; growing-stock and sawtimber volumes; and growth, mortality, and removals for timberland.

  17. Forests through the Eye of a Satellite: Understanding regional forest-cover dynamics using Landsat Imagery

    NASA Astrophysics Data System (ADS)

    Baumann, Matthias

    Forests are changing at an alarming pace worldwide. Forests are an important provider of ecosystem services that contribute to human wellbeing, including the provision of timber and non-timber products, habitat for biodiversity, recreation amenities. Most prominently, forests serve as a sink for atmospheric carbon dioxide that ultimately helps to mitigate changes in the global climate. It is thus important to understand where, how and why forests change worldwide. My dissertation provides answers to these questions. The overarching goal of my dissertation is to improve our understanding of regional forest-cover dynamics by analyzing Landsat satellite imagery. I answer where forests change following drastic socio-economic shocks by using the breakdown of the Soviet Union as a natural experiment. My dissertation provides innovative algorithms to answer why forests change---because of human activities or because of natural events such as storms. Finally, I will show how dynamic forests are within one year by providing ways to characterize green-leaf phenology from satellite imagery. With my findings I directly contribute to a better understanding of the processes on the Earth's surface and I highlight the importance of satellite imagery to learn about regional and local forest-cover dynamics.

  18. MODIS Based Estimation of Forest Aboveground Biomass in China.

    PubMed

    Yin, Guodong; Zhang, Yuan; Sun, Yan; Wang, Tao; Zeng, Zhenzhong; Piao, Shilong

    2015-01-01

    Accurate estimation of forest biomass C stock is essential to understand carbon cycles. However, current estimates of Chinese forest biomass are mostly based on inventory-based timber volumes and empirical conversion factors at the provincial scale, which could introduce large uncertainties in forest biomass estimation. Here we provide a data-driven estimate of Chinese forest aboveground biomass from 2001 to 2013 at a spatial resolution of 1 km by integrating a recently reviewed plot-level ground-measured forest aboveground biomass database with geospatial information from the 1-km Moderate-Resolution Imaging Spectroradiometer (MODIS) dataset in a machine learning algorithm (the model tree ensemble, MTE). We show that Chinese forest aboveground biomass is 8.56 Pg C, which is mainly contributed by evergreen needle-leaf forests and deciduous broadleaf forests. The mean forest aboveground biomass density is 56.1 Mg C ha⁻¹, with high values observed in temperate humid regions. The responses of forest aboveground biomass density to mean annual temperature are closely tied to water conditions; that is, negative responses dominate regions with mean annual precipitation less than 1300 mm y⁻¹ and positive responses prevail in regions with mean annual precipitation higher than 2800 mm y⁻¹. During the 2000s, the forests in China sequestered C at a rate of 61.9 Tg C y⁻¹, and this C sink is mainly distributed in north China and may be attributed to warming climate, rising CO₂ concentration, N deposition, and growth of young forests.

  19. MODIS Based Estimation of Forest Aboveground Biomass in China

    PubMed Central

    Sun, Yan; Wang, Tao; Zeng, Zhenzhong; Piao, Shilong

    2015-01-01

    Accurate estimation of forest biomass C stock is essential to understand carbon cycles. However, current estimates of Chinese forest biomass are mostly based on inventory-based timber volumes and empirical conversion factors at the provincial scale, which could introduce large uncertainties in forest biomass estimation. Here we provide a data-driven estimate of Chinese forest aboveground biomass from 2001 to 2013 at a spatial resolution of 1 km by integrating a recently reviewed plot-level ground-measured forest aboveground biomass database with geospatial information from the 1-km Moderate-Resolution Imaging Spectroradiometer (MODIS) dataset in a machine learning algorithm (the model tree ensemble, MTE). We show that Chinese forest aboveground biomass is 8.56 Pg C, which is mainly contributed by evergreen needle-leaf forests and deciduous broadleaf forests. The mean forest aboveground biomass density is 56.1 Mg C ha⁻¹, with high values observed in temperate humid regions. The responses of forest aboveground biomass density to mean annual temperature are closely tied to water conditions; that is, negative responses dominate regions with mean annual precipitation less than 1300 mm y⁻¹ and positive responses prevail in regions with mean annual precipitation higher than 2800 mm y⁻¹. During the 2000s, the forests in China sequestered C at a rate of 61.9 Tg C y⁻¹, and this C sink is mainly distributed in north China and may be attributed to warming climate, rising CO₂ concentration, N deposition, and growth of young forests. PMID:26115195

  20. Alerts of forest disturbance from MODIS imagery

    NASA Astrophysics Data System (ADS)

    Hammer, Dan; Kraft, Robin; Wheeler, David

    2014-12-01

    This paper reports the methodology and computational strategy for a forest cover disturbance alerting system. Analytical techniques from time series econometrics are applied to imagery from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor to detect temporal instability in vegetation indices. The characteristics of each MODIS pixel's spectral history are extracted and compared against historical data on forest cover loss to develop a geographically localized classification rule that can be applied across the humid tropical biome. The final output is a probability of forest disturbance for each 500 m pixel that is updated every 16 days. The primary objective is to provide high-confidence alerts of forest disturbance while minimizing false positives. We find that the alerts serve this purpose exceedingly well in Pará, Brazil, with high-probability alerts garnering a user accuracy of 98 percent over the training period and 93 percent after the training period (2000-2005) when compared against the PRODES deforestation data set, which is used to assess spatial accuracy. Implemented in Clojure and Java on the Hadoop distributed data processing platform, the algorithm is a fast, automated, and open source system for detecting forest disturbance. It is intended to be used in conjunction with higher-resolution imagery and data products that cannot be updated as quickly as MODIS-based data products. By highlighting hotspots of change, the algorithm and associated output can focus high-resolution data acquisition and aid local forest conservation enforcement.

  1. Extremely Randomized Machine Learning Methods for Compound Activity Prediction.

    PubMed

    Czarnecki, Wojciech M; Podlewska, Sabina; Bojarski, Andrzej J

    2015-11-09

    Speed, a relatively low requirement for computational resources, and high effectiveness in evaluating the bioactivity of compounds have caused a rapid growth of interest in the application of machine learning methods to virtual screening tasks. However, due to the growth in the amount of data in cheminformatics and related fields, the aim of research has shifted not only towards the development of algorithms of high predictive power but also towards the simplification of previously existing methods to obtain results more quickly. In this study, we tested two approaches belonging to the group of so-called 'extremely randomized methods', the Extreme Entropy Machine and Extremely Randomized Trees, for their ability to properly identify compounds that have activity towards particular protein targets. These methods were compared with their 'non-extreme' competitors, i.e., Support Vector Machine and Random Forest. The extreme approaches were found not only to improve the efficiency of the classification of bioactive compounds, but also to be less computationally complex, requiring fewer steps to perform an optimization procedure.
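
    A quick sketch contrasting the two tree-ensemble families on a synthetic task; Extremely Randomized Trees draw split thresholds at random instead of optimising them, which is typically where the speed advantage comes from. Dataset and settings are illustrative assumptions.

        # Extremely Randomized Trees vs a standard random forest.
        import time
        from sklearn.datasets import make_classification
        from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        X, y = make_classification(n_samples=5000, n_features=50, random_state=0)
        for Clf in (RandomForestClassifier, ExtraTreesClassifier):
            clf = Clf(n_estimators=300, random_state=0, n_jobs=-1)
            t0 = time.perf_counter()
            acc = cross_val_score(clf, X, y, cv=3).mean()
            print(f"{Clf.__name__}: acc={acc:.3f}, "
                  f"time={time.perf_counter() - t0:.1f}s")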

  2. Testing an earthquake prediction algorithm

    USGS Publications Warehouse

    Kossobokov, V.G.; Healy, J.H.; Dewey, J.W.

    1997-01-01

    A test to evaluate earthquake prediction algorithms is being applied to a Russian algorithm known as M8. The M8 algorithm makes intermediate term predictions for earthquakes to occur in a large circle, based on integral counts of transient seismicity in the circle. In a retroactive prediction for the period January 1, 1985 to July 1, 1991 the algorithm as configured for the forward test would have predicted eight of ten strong earthquakes in the test area. A null hypothesis, based on random assignment of predictions, predicts eight earthquakes in 2.87% of the trials. The forward test began July 1, 1991 and will run through December 31, 1997. As of July 1, 1995, the algorithm had forward predicted five out of nine earthquakes in the test area, which success ratio would have been achieved in 53% of random trials with the null hypothesis.

  3. Quantum Algorithms

    NASA Technical Reports Server (NTRS)

    Abrams, D.; Williams, C.

    1999-01-01

    This thesis describes several new quantum algorithms. These include a polynomial time algorithm that uses a quantum fast Fourier transform to find eigenvalues and eigenvectors of a Hamiltonian operator, and that can be applied in cases for which all known classical algorithms require exponential time.

  4. Forest Health Detectives

    ERIC Educational Resources Information Center

    Bal, Tara L.

    2014-01-01

    "Forest health" is an important concept often not covered in tree, forest, insect, or fungal ecology and biology. With minimal, inexpensive equipment, students can investigate and conduct their own forest health survey to assess the percentage of trees with natural or artificial wounds or stress. Insects and diseases in the forest are…

  5. A Hybrid Color Space for Skin Detection Using Genetic Algorithm Heuristic Search and Principal Component Analysis Technique

    PubMed Central

    2015-01-01

    Color is one of the most prominent features of an image and is used in many skin and face detection applications. Color space transformation is widely used by researchers to improve face and skin detection performance. Despite the substantial research efforts in this area, choosing a proper color space in terms of skin and face classification performance that can address issues like illumination variations, various camera characteristics and diversity in skin color tones has remained an open issue. This research proposes a new three-dimensional hybrid color space termed SKN by employing the Genetic Algorithm heuristic and Principal Component Analysis to find the optimal representation of human skin color in over seventeen existing color spaces. The Genetic Algorithm heuristic is used to find the optimal color component combination setup in terms of skin detection accuracy, while the Principal Component Analysis projects the optimal Genetic Algorithm solution to a less complex dimension. Pixel-wise skin detection was used to evaluate the performance of the proposed color space. We employed four classifiers, including Random Forest, Naïve Bayes, Support Vector Machine and Multilayer Perceptron, to generate the human skin color predictive model. The proposed color space was compared to some existing color spaces and shows superior results in terms of pixel-wise skin detection accuracy. Experimental results show that by using the Random Forest classifier, the proposed SKN color space obtained an average F-score and True Positive Rate of 0.953 and a False Positive Rate of 0.0482, which outperformed the existing color spaces in terms of pixel-wise skin detection accuracy. The results also indicate that among the classifiers used in this study, Random Forest is the most suitable classifier for pixel-wise skin detection applications. PMID:26267377

  6. Mapping in random-structures

    SciTech Connect

    Reidys, C.M.

    1996-06-01

    A mapping in random-structures is defined on the vertices of a generalized hypercube Q_α^n. A random-structure will consist of (1) a random contact graph and (2) a family of relations imposed on adjacent vertices. The vertex set of a random contact graph will be the set of all coordinates of a vertex P ∈ Q_α^n. Its edge set will be the union of the edge sets of two random graphs. The first is a random 1-regular graph on 2m vertices (coordinates) and the second is a random graph G_p with p = c_2/n on all n vertices (coordinates). The structure of the random contact graphs will be investigated and it will be shown that for certain values of m, c_2 the mapping in random-structures allows search by means of the set of random-structures. This is applied to mappings in RNA secondary structures. Also, the results on random-structures might be helpful for designing 3D-folding algorithms for RNA.

  7. Simulating California reservoir operation using the classification and regression-tree algorithm combined with a shuffled cross-validation scheme

    NASA Astrophysics Data System (ADS)

    Yang, Tiantian; Gao, Xiaogang; Sorooshian, Soroosh; Li, Xin

    2016-03-01

    The controlled outflows from a reservoir or dam are highly dependent on the decisions made by the reservoir operators, rather than on a natural hydrological process. Differences exist between the natural upstream inflows to reservoirs and the controlled outflows from reservoirs that supply the downstream users. With decision makers' awareness of a changing climate, reservoir management requires adaptable means to incorporate more information into decision making, such as water delivery requirements, environmental constraints, dry/wet conditions, etc. In this paper, a robust reservoir outflow simulation model is presented, which incorporates one of the well-developed data-mining models (Classification and Regression Tree, CART) to predict the complicated human-controlled reservoir outflows and extract the reservoir operation patterns. A shuffled cross-validation approach is further implemented to improve CART's predictive performance. An application study of nine major reservoirs in California is carried out. Results produced by the enhanced CART, original CART, and random forest are compared with observations. The statistical measurements show that the enhanced CART and random forest outperform the CART control run in general, and the enhanced CART algorithm gives better predictive performance than random forest in simulating the peak flows. The results also show that the proposed model is able to consistently and reasonably predict the expert release decisions. Experiments indicate that the release operation in the Oroville Lake is significantly dominated by the SWP allocation amount and that reservoirs with low elevation are more sensitive to inflow amount than others.
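
    A minimal sketch of the shuffled cross-validation idea: shuffling before splitting breaks the serial ordering of reservoir records so that each fold mixes wet and dry periods. The synthetic data below are i.i.d., so the shuffle makes little difference here; with real, serially ordered records the gap would be more visible. All feature names are assumptions.

        # CART regression with and without shuffled K-fold cross-validation.
        import numpy as np
        from sklearn.model_selection import KFold, cross_val_score
        from sklearn.tree import DecisionTreeRegressor

        rng = np.random.default_rng(0)
        n = 2000                                     # daily records (synthetic)
        inflow = rng.gamma(2.0, 50.0, n)
        storage = rng.uniform(0.3, 1.0, n)
        allocation = rng.uniform(0.0, 1.0, n)       # stand-in for SWP allocation
        X = np.column_stack([inflow, storage, allocation])
        outflow = 0.6 * inflow * storage + 30 * allocation + rng.normal(0, 5, n)

        tree = DecisionTreeRegressor(max_depth=8, random_state=0)
        for shuffle in (False, True):
            cv = KFold(n_splits=5, shuffle=shuffle,
                       random_state=0 if shuffle else None)
            r2 = cross_val_score(tree, X, outflow, cv=cv, scoring="r2").mean()
            print(f"shuffle={shuffle}: mean R2 = {r2:.3f}")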

  8. Randomization Strategies.

    PubMed

    Kepler, Christopher K

    2017-04-01

    An understanding of randomization is important both for study design and to assist medical professionals in evaluating the medical literature. Simple randomization can be done through a variety of techniques, but carries a risk of unequal distribution of subjects into treatment groups. Block randomization can be used to overcome this limitation by ensuring that small subgroups are distributed evenly between treatment groups. Finally, techniques can be used to evenly distribute subjects between treatment groups while accounting for confounding variables, so as not to skew results when there is a high index of suspicion that a particular variable will influence outcome.
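
    Block randomization is easy to sketch: within every block of four subjects, two go to each arm, so group sizes can never drift more than half a block apart. The block size and arm labels below are illustrative.

        # Block randomization: balanced assignment within fixed-size blocks.
        import random

        def block_randomize(n_subjects, block_size=4, arms=("A", "B"), seed=42):
            rng = random.Random(seed)
            per_arm = block_size // len(arms)
            assignments = []
            while len(assignments) < n_subjects:
                block = list(arms) * per_arm      # e.g. ['A', 'B', 'A', 'B']
                rng.shuffle(block)                # random order within the block
                assignments.extend(block)
            return assignments[:n_subjects]

        print(block_randomize(10))                # e.g. ['B', 'A', 'A', 'B', ...]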

  9. Forest dynamics

    PubMed Central

    Frelich, Lee

    2016-01-01

    Forest dynamics encompass changes in stand structure, species composition, and species interactions with disturbance and environment over a range of spatial and temporal scales. For convenience, spatial scale is defined as individual tree, neighborhood, stand, and landscape. Whether a given canopy-leveling disturbance will initiate a sequence of development in structure with little change in composition or initiate an episode of succession depends on a match or mismatch, respectively, with traits of the dominant tree species that allow the species to survive disturbance. When these match, certain species-disturbance type combinations lock in a pattern of stand and landscape dynamics that can persist for several generations of trees; thus, dominant tree species regulate, as well as respond to, disturbance. A complex interaction among tree species, neighborhood effects, disturbance type and severity, landform, and soils determines how stands of differing composition form and the mosaic of stands that compose the landscape. Neighborhood effects (e.g., serotinous seed rain, sprouting, shading, leaf-litter chemistry, and leaf-litter physical properties) operate at small spatial extents of the individual tree and its neighbors but play a central role in forest dynamics by contributing to patch formation at stand scales and dynamics of the entire landscape. Dominance by tree species with neutral to negative neighborhood effects leads to unstable landscape dynamics in disturbance-prone regions, wherein most stands are undergoing succession; stability can only occur under very low-severity disturbance regimes. Dominance by species with positive effects leads to stable landscape dynamics wherein only a small proportion of stands undergo succession at any one time. Positive neighborhood effects are common in temperate and boreal zones, whereas negative effects are more common in tropical climates. Landscapes with positive dynamics have alternate categories of dynamics

  10. Forest dynamics.

    PubMed

    Frelich, Lee

    2016-01-01

    Forest dynamics encompass changes in stand structure, species composition, and species interactions with disturbance and environment over a range of spatial and temporal scales. For convenience, spatial scale is defined as individual tree, neighborhood, stand, and landscape. Whether a given canopy-leveling disturbance will initiate a sequence of development in structure with little change in composition or initiate an episode of succession depends on a match or mismatch, respectively, with traits of the dominant tree species that allow the species to survive disturbance. When these match, certain species-disturbance type combinations lock in a pattern of stand and landscape dynamics that can persist for several generations of trees; thus, dominant tree species regulate, as well as respond to, disturbance. A complex interaction among tree species, neighborhood effects, disturbance type and severity, landform, and soils determines how stands of differing composition form and the mosaic of stands that compose the landscape. Neighborhood effects (e.g., serotinous seed rain, sprouting, shading, leaf-litter chemistry, and leaf-litter physical properties) operate at small spatial extents of the individual tree and its neighbors but play a central role in forest dynamics by contributing to patch formation at stand scales and dynamics of the entire landscape. Dominance by tree species with neutral to negative neighborhood effects leads to unstable landscape dynamics in disturbance-prone regions, wherein most stands are undergoing succession; stability can only occur under very low-severity disturbance regimes. Dominance by species with positive effects leads to stable landscape dynamics wherein only a small proportion of stands undergo succession at any one time. Positive neighborhood effects are common in temperate and boreal zones, whereas negative effects are more common in tropical climates. Landscapes with positive dynamics have alternate categories of dynamics

  11. Random thoughts

    NASA Astrophysics Data System (ADS)

    ajansen; kwhitefoot; panteltje1; edprochak; sudhakar, the

    2014-07-01

    In reply to the physicsworld.com news story “How to make a quantum random-number generator from a mobile phone” (16 May, http://ow.ly/xFiYc, see also p5), which describes a way of delivering random numbers by counting the number of photons that impinge on each of the individual pixels in the camera of a Nokia N9 smartphone.

  12. Bearing fault component identification using information gain and machine learning algorithms

    NASA Astrophysics Data System (ADS)

    Vinay, Vakharia; Kumar, Gupta Vijay; Kumar, Kankar Pavan

    2015-04-01

    In the present study an attempt has been made to identify various bearing faults using machine learning algorithms. Vibration signals obtained from faults in the inner race, outer race, rolling element, and combined faults are considered. The raw vibration signal cannot be used directly, since vibration signals are masked by noise. To overcome this difficulty, a combined time-frequency domain method, the wavelet transform, is used. A wavelet selection criterion based on minimum permutation entropy is further employed to select the most appropriate base wavelet. Statistical features from the selected wavelet coefficients are calculated to form a feature vector. To reduce the size of the feature vector, the information gain attribute selection method is employed. The modified feature set is fed into machine learning algorithms, such as random forest and self-organizing map, to maximize fault identification efficiency. The results revealed that the attribute selection method improves the fault identification accuracy for bearing components.
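
    A sketch of the pipeline described above: rank features by mutual information (the information-gain style criterion available in scikit-learn), keep the top ones, and classify with a random forest. The synthetic features standing in for wavelet statistics are assumptions.

        # Information-gain style feature selection followed by a random forest.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import SelectKBest, mutual_info_classif
        from sklearn.model_selection import cross_val_score
        from sklearn.pipeline import make_pipeline

        # Stand-in for statistical features of wavelet coefficients; 4 fault classes.
        X, y = make_classification(n_samples=400, n_features=30, n_informative=8,
                                   n_classes=4, n_clusters_per_class=1,
                                   random_state=0)
        pipe = make_pipeline(
            SelectKBest(mutual_info_classif, k=10),   # keep 10 most informative
            RandomForestClassifier(n_estimators=200, random_state=0),
        )
        print("accuracy:", cross_val_score(pipe, X, y, cv=5).mean().round(3))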

  13. Monte Carlo simulations: Hidden errors from "good" random number generators

    NASA Astrophysics Data System (ADS)

    Ferrenberg, Alan M.; Landau, D. P.; Wong, Y. Joanna

    1992-12-01

    The Wolff algorithm is now accepted as the best cluster-flipping Monte Carlo algorithm for beating "critical slowing down." We show how this method can yield incorrect answers due to subtle correlations in "high quality" random number generators.

  14. Radar modeling of a boreal forest

    NASA Technical Reports Server (NTRS)

    Chauhan, Narinder S.; Lang, Roger H.; Ranson, K. J.

    1991-01-01

    Microwave modeling, ground truth, and SAR data are used to investigate the characteristics of forest stands. A mixed coniferous forest stand has been modeled at P, L, and C bands. Extensive measurements of ground truth and canopy geometry parameters were performed in a 200-m-square hemlock-dominated forest plot. About 10 percent of the trees were sampled to determine a distribution of diameter at breast height (DBH). Hemlock trees in the forest are modeled by characterizing tree trunks, branches, and needles as randomly oriented lossy dielectric cylinders whose area and orientation distributions are prescribed. The distorted Born approximation is used to compute the backscatter at P, L, and C bands. The theoretical results are found to be lower than the calibrated ground-truth data. The experiment and model results agree quite closely, however, when the ratios of VV to HH and HV to HH are compared.

  15. Aboveground Biomass and Dynamics of Forest Attributes using LiDAR Data and Vegetation Model

    NASA Astrophysics Data System (ADS)

    V V L, P. A.

    2015-12-01

    In recent years, biomass estimation for tropical forests has received much attention because regional biomass is considered a critical input to climate change studies. Biomass largely determines the potential carbon emission that could be released to the atmosphere due to deforestation or conversion to non-forest land use. Thus, accurate biomass estimation is necessary for a better understanding of deforestation impacts on global warming and environmental degradation. In this context, the inclusion of forest stand height in biomass estimation plays a major role in reducing the uncertainty of the estimate. The improvement in accuracy shall also help in meeting the MRV objectives of REDD+. Along with precise estimates of biomass, it is also important to emphasize the role of vegetation models, which will most likely become an important tool for assessing the effects of climate change on potential vegetation dynamics and terrestrial carbon storage and for managing terrestrial ecosystem sustainability. Remote sensing is an efficient way to estimate forest parameters over large areas, especially at regional scale where field data are limited. LiDAR (Light Detection And Ranging) provides accurate information on the vertical structure of forests. We estimated average tree canopy heights and AGB from GLAS waveform parameters by using a multiple linear regression model in the forested area of Madhya Pradesh (area 3,08,245 km²), India. The heights derived from ICESat-GLAS were correlated with field-measured tree canopy heights for 60 plots. Results showed a significant correlation of R² = 74% for top canopy heights and R² = 57% for stand biomass. A total biomass estimate of 320.17 Mt and canopy height maps were generated using the random forest algorithm. These canopy heights and biomass maps were used in vegetation models to predict changes in the biophysical and physiological characteristics of forests under a changing climate. In our study we have

  16. Stochastic gradient boosting classification trees for forest fuel types mapping through airborne laser scanning and IRS LISS-III imagery

    NASA Astrophysics Data System (ADS)

    Chirici, G.; Scotti, R.; Montaghi, A.; Barbati, A.; Cartisano, R.; Lopez, G.; Marchetti, M.; McRoberts, R. E.; Olsson, H.; Corona, P.

    2013-12-01

    This paper presents an application of Airborne Laser Scanning (ALS) data in conjunction with an IRS LISS-III image for mapping forest fuel types. For two study areas of 165 km² and 487 km² in Sicily (Italy), 16,761 plots of size 30 m × 30 m were distributed using a tessellation-based stratified sampling scheme. ALS metrics and spectral signatures from IRS extracted for each plot were used as predictors to classify forest fuel types observed and identified by photointerpretation and fieldwork. Following the use of traditional parametric methods that produced unsatisfactory results, three non-parametric classification approaches were tested: (i) classification and regression tree (CART), (ii) the CART bagging method called Random Forests, and (iii) the CART bagging/boosting stochastic gradient boosting (SGB) approach. This contribution summarizes previous experience using ALS data for estimating forest variables useful for fire management in general and for fuel type mapping in particular. It summarizes the characteristics of classification and regression trees, presents the pre-processing operations, the classification algorithms, and the achieved results. The results demonstrated the superiority of the SGB method, with an overall accuracy of 84%. The most relevant ALS metric was canopy cover, defined as the percentage of non-ground returns. Other relevant metrics included the spectral information from IRS and several other ALS metrics such as percentiles of the height distribution, the mean height of all returns, and the number of returns.
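
    Stochastic gradient boosting is distinguished from plain boosting by fitting each tree on a random fraction of the training data; in scikit-learn this corresponds to setting subsample < 1. The sketch below contrasts it with a bagging ensemble (random forest) on synthetic data standing in for ALS metrics and IRS bands; all settings are illustrative.

        # Stochastic gradient boosting (subsample < 1) vs a bagging ensemble.
        from sklearn.datasets import make_classification
        from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        X, y = make_classification(n_samples=2000, n_features=20, n_classes=5,
                                   n_informative=10, random_state=0)
        models = {
            "SGB": GradientBoostingClassifier(subsample=0.5, n_estimators=300,
                                              max_depth=3, random_state=0),
            "Random forest (bagging)": RandomForestClassifier(n_estimators=300,
                                                              random_state=0),
        }
        for name, m in models.items():
            print(name, cross_val_score(m, X, y, cv=3).mean().round(3))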

  17. Random sequential adsorption on fractals.

    PubMed

    Ciesla, Michal; Barbasz, Jakub

    2012-07-28

    Irreversible adsorption of spheres on flat collectors having dimension d < 2 is studied. Molecules are adsorbed on Sierpinski's triangle and carpet-like fractals (1 < d < 2), and on a general Cantor set (d < 1). The adsorption process is modeled numerically using the random sequential adsorption (RSA) algorithm. The paper concentrates on the measurement of fundamental properties of coverages, i.e., the maximal random coverage ratio and the density autocorrelation function, as well as RSA kinetics. The obtained results allow us to improve the phenomenological relation between the maximal random coverage ratio and the collector dimension. Moreover, simulations show that, in general, most of the known dimensional properties of adsorbed monolayers remain valid for non-integer dimensions.

  18. QUANTIFYING FOREST ABOVEGROUND CARBON POOLS AND FLUXES USING MULTI-TEMPORAL LIDAR A report on field monitoring, remote sensing MMV, GIS integration, and modeling results for forestry field validation test to quantify aboveground tree biomass and carbon

    SciTech Connect

    Lee Spangler; Lee A. Vierling; Eva K. Stand; Andrew T. Hudak; Jan U.H. Eitel; Sebastian Martinuzzi

    2012-04-01

    Sound policy recommendations relating to the role of forest management in mitigating atmospheric carbon dioxide (CO₂) depend upon establishing accurate methodologies for quantifying forest carbon pools for large tracts of land that can be dynamically updated over time. Light Detection and Ranging (LiDAR) remote sensing is a promising technology for achieving accurate estimates of aboveground biomass and thereby carbon pools; however, not much is known about the accuracy of estimating biomass change and carbon flux from repeat LiDAR acquisitions containing different data sampling characteristics. In this study, discrete return airborne LiDAR data were collected in 2003 and 2009 across approximately 20,000 hectares (ha) of an actively managed, mixed conifer forest landscape in northern Idaho, USA. Forest inventory plots were established via a random stratified sampling design and sampled in 2003 and 2009. The Random Forest machine learning algorithm was used to establish statistical relationships between inventory data and forest structural metrics derived from the LiDAR acquisitions. Aboveground biomass maps were created for the study area based on statistical relationships developed at the plot level. Over this 6-year period, we found that the mean increase in biomass due to forest growth across the non-harvested portions of the study area was 4.8 metric tons/hectare (Mg/ha). In these non-harvested areas, we found a significant difference in biomass increase among forest successional stages, with a higher biomass increase in mature and old forest compared to stand initiation and young forest. Approximately 20% of the landscape had been disturbed by harvest activities during the six-year time period, representing a biomass loss of >70 Mg/ha in these areas. During the study period, these harvest activities outweighed growth at the landscape scale, resulting in an overall loss in aboveground carbon at this site. The 30-fold increase in sampling density

  19. A 50-m Forest Cover Map in Southeast Asia from ALOS/PALSAR and Its Application on Forest Fragmentation Assessment

    PubMed Central

    Dong, Jinwei; Xiao, Xiangming; Sheldon, Sage; Biradar, Chandrashekhar; Zhang, Geli; Dinh Duong, Nguyen; Hazarika, Manzul; Wikantika, Ketut; Takeuhci, Wataru; Moore, Berrien

    2014-01-01

    Southeast Asia experienced higher rates of deforestation than other continents in the 1990s and still was a hotspot of forest change in the 2000s. Biodiversity conservation planning and accurate estimation of forest carbon fluxes and pools need more accurate information about forest area, spatial distribution and fragmentation. However, the recent forest maps of Southeast Asia were generated from optical images at spatial resolutions of several hundreds of meters, and they do not capture well the exceptionally complex and dynamic environments in Southeast Asia. The forest area estimates from those maps vary substantially, ranging from 1.73×10⁶ km² (GlobCover) to 2.69×10⁶ km² (MCD12Q1) in 2009; and their uncertainty is constrained by frequent cloud cover and coarse spatial resolution. Recently, cloud-free imagery from the Phased Array Type L-band Synthetic Aperture Radar (PALSAR) onboard the Advanced Land Observing Satellite (ALOS) became available. We used the PALSAR 50-m orthorectified mosaic imagery in 2009 to generate a forest cover map of Southeast Asia at 50-m spatial resolution. The validation, using ground-reference data collected from the Geo-Referenced Field Photo Library and high-resolution images in Google Earth, showed that our forest map has a reasonably high accuracy (producer's accuracy 86% and user's accuracy 93%). The PALSAR-based forest area estimates in 2009 are significantly correlated with those from GlobCover and MCD12Q1 at national and subnational scales but differ in some regions at the pixel scale due to different spatial resolutions, forest definitions, and algorithms. The resultant 50-m forest map was used to quantify forest fragmentation and it revealed substantial details of forest fragmentation. This new 50-m map of tropical forests could serve as a baseline map for forest resource inventory, deforestation monitoring, reducing emissions from deforestation and forest degradation (REDD+) implementation, and biodiversity. PMID:24465714

  20. A 50-m forest cover map in Southeast Asia from ALOS/PALSAR and its application on forest fragmentation assessment.

    PubMed

    Dong, Jinwei; Xiao, Xiangming; Sheldon, Sage; Biradar, Chandrashekhar; Zhang, Geli; Duong, Nguyen Dinh; Hazarika, Manzul; Wikantika, Ketut; Takeuhci, Wataru; Moore, Berrien

    2014-01-01

    Southeast Asia experienced higher rates of deforestation than other continents in the 1990s and still was a hotspot of forest change in the 2000s. Biodiversity conservation planning and accurate estimation of forest carbon fluxes and pools need more accurate information about forest area, spatial distribution and fragmentation. However, the recent forest maps of Southeast Asia were generated from optical images at spatial resolutions of several hundreds of meters, and they do not capture well the exceptionally complex and dynamic environments in Southeast Asia. The forest area estimates from those maps vary substantially, ranging from 1.73×10⁶ km² (GlobCover) to 2.69×10⁶ km² (MCD12Q1) in 2009; and their uncertainty is constrained by frequent cloud cover and coarse spatial resolution. Recently, cloud-free imagery from the Phased Array Type L-band Synthetic Aperture Radar (PALSAR) onboard the Advanced Land Observing Satellite (ALOS) became available. We used the PALSAR 50-m orthorectified mosaic imagery in 2009 to generate a forest cover map of Southeast Asia at 50-m spatial resolution. The validation, using ground-reference data collected from the Geo-Referenced Field Photo Library and high-resolution images in Google Earth, showed that our forest map has a reasonably high accuracy (producer's accuracy 86% and user's accuracy 93%). The PALSAR-based forest area estimates in 2009 are significantly correlated with those from GlobCover and MCD12Q1 at national and subnational scales but differ in some regions at the pixel scale due to different spatial resolutions, forest definitions, and algorithms. The resultant 50-m forest map was used to quantify forest fragmentation and it revealed substantial details of forest fragmentation. This new 50-m map of tropical forests could serve as a baseline map for forest resource inventory, deforestation monitoring, reducing emissions from deforestation and forest degradation (REDD+) implementation, and biodiversity.

  1. Effectiveness of community forestry in Prey Long forest, Cambodia.

    PubMed

    Lambrick, Frances H; Brown, Nick D; Lawrence, Anna; Bebber, Daniel P

    2014-04-01

    Cambodia has 57% forest cover, the second highest in the Greater Mekong region, and a high deforestation rate (1.2%/year, 2005-2010). Community forestry (CF) has been proposed as a way to reduce deforestation and support livelihoods through local management of forests. CF is expanding rapidly in Cambodia. The National Forests Program aims to designate one million hectares of forest to CF by 2030. However, the effectiveness of CF in conservation is not clear due to a global lack of controlled comparisons, multiple meanings of CF, and the context-specific nature of CF implementation. We assessed the effectiveness of CF by comparing 9 CF sites with paired controls in state production forest in the area of Prey Long forest, Cambodia. We assessed forest condition in 18-20 randomly placed variable-radius plots and fixed-area regeneration plots. We surveyed 10% of households in each of the 9 CF villages to determine the proportion that used forest products, as a measure of household dependence on the forest. CF sites had fewer signs of anthropogenic damage (cut stems, stumps, and burned trees), higher aboveground biomass, more regenerating stems, and reduced canopy openness than control areas. Abundance of economically valuable species, however, was higher in control sites. We used survey results and geographic parameters to model factors affecting CF outcomes. Interaction between management type, CF or control, and forest dependence indicated that CF was more effective in cases where the community relied on forest products for subsistence use and income.

  2. Estimating Forest Aboveground Biomass by Combining Optical and SAR Data: A Case Study in Genhe, Inner Mongolia, China

    PubMed Central

    Shao, Zhenfeng; Zhang, Linjing

    2016-01-01

    Estimation of forest aboveground biomass is critical for regional carbon policies and sustainable forest management. Passive optical remote sensing and active microwave remote sensing both play an important role in the monitoring of forest biomass. However, optical spectral reflectance is saturated in relatively dense vegetation areas, and microwave backscattering is significantly influenced by the underlying soil when the vegetation coverage is low. Both of these conditions decrease the estimation accuracy of forest biomass. A new optical and microwave integrated vegetation index (VI) was proposed based on observations from both field experiments and satellite (Landsat 8 Operational Land Imager (OLI) and RADARSAT-2) data. According to the difference in interaction between the multispectral reflectance and microwave backscattering signatures with biomass, the combined VI (COVI) was designed using the weighted optical optimized soil-adjusted vegetation index (OSAVI) and microwave horizontally transmitted and vertically received signal (HV) to overcome the disadvantages of both data types. The performance of the COVI was evaluated by comparison with optical data alone, Synthetic Aperture Radar (SAR) data alone, and a simple combination of independent optical and SAR variables. The most accurate performance was obtained by models based on the COVI together with the optimal optical and microwave variables (excluding OSAVI and HV), in combination with a random forest algorithm and the largest number of reference samples. The results also revealed that the predictive accuracy depended highly on the statistical method and the number of sample units. The validation indicated that this integrated method of determining the new VI is a good synergistic way to combine both optical and microwave information for the accurate estimation of forest biomass. PMID:27338378
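
    The paper derives the COVI's weighting empirically; the sketch below only illustrates the general pattern of fusing an optical index with SAR backscatter and regressing biomass on the result with a random forest. The variable names, the fixed weight w, and the synthetic data are all illustrative assumptions, not the study's method.

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)
        n = 200
        osavi = rng.uniform(0.1, 0.8, n)     # optical index (placeholder values)
        hv_db = rng.uniform(-18.0, -8.0, n)  # SAR HV backscatter in dB (placeholder)
        biomass = 120 * osavi + 4 * (hv_db + 18) + rng.normal(0, 5, n)  # synthetic truth

        # Illustrative fixed weight; the paper derives its weighting from observations.
        w = 0.5
        hv_scaled = (hv_db - hv_db.min()) / (hv_db.max() - hv_db.min())
        covi = w * osavi + (1 - w) * hv_scaled

        model = RandomForestRegressor(n_estimators=500, random_state=0)
        model.fit(covi.reshape(-1, 1), biomass)
        print("R^2 on training data (optimistic):", model.score(covi.reshape(-1, 1), biomass))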

  3. PolInSAR Optimal Coherence Estimation and its Application in Imaging Forest Canopy

    NASA Astrophysics Data System (ADS)

    Lin, Q.; Chu, T.; Zebker, H. A.

    2012-12-01

    Polarimetric SAR interferometry (PolInSAR), which combines polarimetric and interferometric data, is a good candidate for global biomass estimation. One advantage of PolInSAR is the possibility of forming interferograms from all possible linear combinations of polarization states; this improves the coherence level and, as a consequence, increases the accuracy of the reconstructed elevation of scatterers. PolInSAR thus offers a way to locate the scattering center of the forest canopy and can be used for global biomass measurement. As a key procedure of PolInSAR, coherence optimization seeks the scattering mechanisms between two SAR acquisitions that yield the highest interferometric coherence estimate. Various algorithms have been proposed to solve this problem, including two-mechanism coherence (2MC) optimization, single-mechanism coherence (1MC) optimization, and numerical range methods. The optimal coherence, as an essential parameter in the Random Volume over Ground (RVOG) model, can be used to retrieve forest tree height and thus contributes to global biomass estimation. We will examine data acquired by ALOS PALSAR over Hawaii to image the forest canopy area. Various optimal coherence methods are used and the results are compared.
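
    Whatever the optimization scheme, the quantity being maximized is the sample complex coherence between the two acquisitions. A minimal numpy sketch of that estimator on synthetic single-channel data (a full PolInSAR optimizer would search over polarization weight vectors rather than fix the channels, which is not attempted here):

        import numpy as np

        rng = np.random.default_rng(1)
        n = 10_000  # pixels in the estimation window
        s1 = rng.normal(size=n) + 1j * rng.normal(size=n)
        s2 = 0.7 * s1 + 0.3 * (rng.normal(size=n) + 1j * rng.normal(size=n))

        # Sample complex coherence: |<s1 s2*>| / sqrt(<|s1|^2> <|s2|^2>)
        num = np.mean(s1 * np.conj(s2))
        den = np.sqrt(np.mean(np.abs(s1) ** 2) * np.mean(np.abs(s2) ** 2))
        print(f"coherence magnitude: {np.abs(num) / den:.3f}")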

  4. Aboveground carbon loss in natural and managed tropical forests from 2000 to 2012

    NASA Astrophysics Data System (ADS)

    Tyukavina, A.; Baccini, A.; Hansen, M. C.; Potapov, P. V.; Stehman, S. V.; Houghton, R. A.; Krylov, A. M.; Turubanova, S.; Goetz, S. J.

    2015-07-01

    Tropical forests provide global climate regulation ecosystem services and their clearing is a significant source of anthropogenic greenhouse gas (GHG) emissions and resultant radiative forcing of climate change. However, consensus on pan-tropical forest carbon dynamics is lacking. We present a new estimate that employs recommended good practices to quantify gross tropical forest aboveground carbon (AGC) loss from 2000 to 2012 through the integration of Landsat-derived tree canopy cover, height, intactness and forest cover loss and GLAS-lidar derived forest biomass. An unbiased estimate of forest loss area is produced using a stratified random sample with strata derived from a wall-to-wall 30 m forest cover loss map. Our sample-based results separate the gross loss of forest AGC into losses from natural forests (0.59 PgC yr⁻¹) and losses from managed forests (0.43 PgC yr⁻¹) including plantations, agroforestry systems and subsistence agriculture. Latin America accounts for 43% of gross AGC loss and 54% of natural forest AGC loss, with Brazil experiencing the highest AGC loss for both categories at national scales. We estimate gross tropical forest AGC loss and natural forest loss to account for 11% and 6% of global year 2012 CO2 emissions, respectively. Given recent trends, natural forests will likely constitute an increasingly smaller proportion of tropical forest GHG emissions and of global emissions as fossil fuel consumption increases, with implications for the valuation of co-benefits in tropical forest conservation.
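
    The stratified estimator behind the unbiased loss-area figure is simply an area-weighted sum of per-stratum sample proportions. A sketch with made-up strata and proportions (not the study's numbers):

        import numpy as np

        # Placeholder strata: total mapped area (km^2) and the sampled proportion
        # of true loss found in each stratum.
        stratum_area = np.array([5.0e6, 1.2e6, 0.3e6])  # e.g. no-, low-, high-loss strata
        sample_prop = np.array([0.002, 0.15, 0.78])

        loss_area = (stratum_area * sample_prop).sum()
        print(f"estimated gross loss area: {loss_area:.0f} km^2")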

  5. Space complexity of estimation of distribution algorithms.

    PubMed

    Gao, Yong; Culberson, Joseph

    2005-01-01

    In this paper, we investigate the space complexity of the Estimation of Distribution Algorithms (EDAs), a class of sampling-based variants of the genetic algorithm. By analyzing the nature of EDAs, we identify criteria that characterize the space complexity of two typical implementation schemes of EDAs, the factorized distribution algorithm and Bayesian network-based algorithms. Using random additive functions as the prototype, we prove that the space complexity of the factorized distribution algorithm and Bayesian network-based algorithms is exponential in the problem size even if the optimization problem has a very sparse interaction structure.
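
    For contrast with the factorized and Bayesian-network schemes analyzed here, the simplest EDA stores only univariate marginals (UMDA) and therefore needs only linear space; the exponential cost enters once joint factors must be represented. A minimal UMDA sketch on the OneMax problem:

        import numpy as np

        rng = np.random.default_rng(2)
        n_bits, pop, elite, gens = 40, 200, 50, 60
        p = np.full(n_bits, 0.5)  # univariate marginal model: O(n) space

        for _ in range(gens):
            X = (rng.random((pop, n_bits)) < p).astype(int)  # sample from the model
            fitness = X.sum(axis=1)                          # OneMax objective
            best = X[np.argsort(fitness)[-elite:]]           # truncation selection
            p = best.mean(axis=0).clip(0.05, 0.95)           # re-estimate marginals

        print("best fitness found:", fitness.max(), "of", n_bits)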

  6. ASTErIsM: application of topometric clustering algorithms in automatic galaxy detection and classification

    NASA Astrophysics Data System (ADS)

    Tramacere, A.; Paraficz, D.; Dubath, P.; Kneib, J.-P.; Courbin, F.

    2016-12-01

    We present a study on galaxy detection and shape classification using topometric clustering algorithms. We first use the DBSCAN algorithm to extract, from CCD frames, groups of adjacent pixels with significant fluxes and we then apply the DENCLUE algorithm to separate the contributions of overlapping sources. The DENCLUE separation is based on the localization of patterns of local maxima, through an iterative algorithm that associates each pixel with the closest local maximum. Our main classification goal is to separate elliptical from spiral galaxies. We introduce new sets of features derived from the computation of geometrical invariant moments of the pixel group shape and from the statistics of the spatial distribution of the DENCLUE local maxima patterns. Ellipticals are characterized by a single group of local maxima, related to the galaxy core, while spiral galaxies have additional groups related to segments of spiral arms. We use two different supervised ensemble classification algorithms: Random Forest and Gradient Boosting. Using a sample of ≃24 000 galaxies taken from the Galaxy Zoo 2 main sample with spectroscopic redshifts, we test our classification against the Galaxy Zoo 2 catalogue. We find that features extracted from our pipeline give, on average, an accuracy of ≃93 per cent when testing on a test set with a size of 20 per cent of our full data set, with features derived from the angular distribution of density attractors ranking at the top in discrimination power.
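
    The first stage of the pipeline, grouping adjacent significant pixels, maps naturally onto scikit-learn's DBSCAN. A hedged sketch on a synthetic frame (the paper's actual thresholds, eps, and the DENCLUE deblending stage are not reproduced here):

        import numpy as np
        from sklearn.cluster import DBSCAN

        rng = np.random.default_rng(3)
        frame = rng.normal(0, 1, (128, 128))
        frame[30:40, 30:40] += 6   # synthetic source 1
        frame[80:95, 70:85] += 5   # synthetic source 2

        ys, xs = np.where(frame > 3.0)  # pixels above a significance threshold
        coords = np.column_stack([ys, xs])

        # eps ~ 1.5 connects 8-neighbour pixels; min_samples rejects isolated noise.
        labels = DBSCAN(eps=1.5, min_samples=5).fit_predict(coords)
        n_sources = len(set(labels)) - (1 if -1 in labels else 0)
        print("detected pixel groups:", n_sources)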

  7. Education Highlights: Forest Biomass

    ScienceCinema

    Barone, Rachel; Canter, Christina

    2016-07-12

    Argonne intern Rachel Barone from Ithaca College worked with Argonne mentor Christina Canter in studying forest biomass. This research will help scientists develop large scale use of biofuels from forest biomass.

  8. Education Highlights: Forest Biomass

    SciTech Connect

    Barone, Rachel; Canter, Christina

    2016-01-27

    Argonne intern Rachel Barone from Ithaca College worked with Argonne mentor Christina Canter in studying forest biomass. This research will help scientists develop large scale use of biofuels from forest biomass.

  9. Random Vibrations

    NASA Technical Reports Server (NTRS)

    Messaro, Semma; Harrison, Phillip

    2010-01-01

    Ares I Zonal Random vibration environments due to acoustic impingement and combustion processes are developed for liftoff, ascent and reentry. Random Vibration test criteria for Ares I Upper Stage pyrotechnic components are developed by enveloping the applicable zonal environments where each component is located. Random vibration tests will be conducted to assure that these components will survive and function appropriately after exposure to the expected vibration environments. Methodology: Random Vibration test criteria for Ares I Upper Stage pyrotechnic components were desired that would envelop all the applicable environments where each component was located. Applicable Ares I Vehicle drawings and design information needed to be assessed to determine the location(s) for each component on the Ares I Upper Stage. Design and test criteria needed to be developed by plotting and enveloping the applicable environments using Microsoft Excel spreadsheet software and documenting them in a report using Microsoft Word word-processing software. Conclusion: Random vibration liftoff, ascent, and green run design & test criteria for the Upper Stage pyrotechnic components were developed by using Microsoft Excel to envelop zonal environments applicable to each component. Results were transferred from Excel into a report using Microsoft Word. After the report is reviewed and edited by my mentor, it will be submitted for publication as an attachment to a memorandum. Pyrotechnic component designers will extract criteria from my report for incorporation into the design and test specifications for components. Eventually the hardware will be tested to the environments I developed to assure that the components will survive and function appropriately after exposure to the expected vibration environments.

  10. The Fate of the Forest in Brazil, 2000 to 2013

    NASA Astrophysics Data System (ADS)

    Zalles, V.; Potapov, P.; Hansen, M.

    2015-12-01

    Better understanding the drivers of tropical deforestation is essential to research on global climate change and biodiversity loss, and would be particularly informative to ongoing international climate change negotiations. Geographically explicit maps of post-forest land cover can provide valuable information about the extent and spatial distribution of the major drivers of deforestation. Brazil is the country with the largest extent of tropical forest in the world and the one with the most tropical forest cover loss since the turn of this century. This fate of the forest study aims to determine which land covers have replaced forest cover in Brazil. Using a classification tree algorithm, we determined pasture and cropland extent in areas of forest cover loss in Brazil circa 2012. We used 30 m resolution Landsat data for the 2000-2013 time period as well as tree cover loss data from the Global Forest Change (GFC) maps published by Hansen et al. (2013). The GFC data was used to mask out areas not categorized as forest cover lost between 2000 and 2013. Additionally, the year of loss layer was used to disaggregate pasture and cropland extent by year. Our results comprehensively demonstrate the extent to which pasture is the dominant post-forest land cover in Brazil. More broadly, the product reveals spatiotemporal patterns of forest conversion to pasture and cropland in Brazil, which could lead to a better understanding of the underlying drivers of deforestation.
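
    Disaggregating post-forest land cover by the GFC year-of-loss layer reduces to a masked count per year; a sketch with synthetic rasters (array names and class codes are illustrative, not the study's data):

        import numpy as np

        rng = np.random.default_rng(4)
        shape = (500, 500)
        loss_year = rng.integers(0, 14, shape)   # 0 = no loss, 1..13 = 2001..2013
        land_cover = rng.integers(0, 3, shape)   # 0 = other, 1 = pasture, 2 = cropland

        for year in range(1, 14):
            in_year = loss_year == year
            pasture = np.count_nonzero(in_year & (land_cover == 1))
            cropland = np.count_nonzero(in_year & (land_cover == 2))
            print(f"{2000 + year}: pasture px = {pasture}, cropland px = {cropland}")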

  11. Comparison of four machine learning algorithms for their applicability in satellite-based optical rainfall retrievals

    NASA Astrophysics Data System (ADS)

    Meyer, Hanna; Kühnlein, Meike; Appelhans, Tim; Nauss, Thomas

    2016-03-01

    Machine learning (ML) algorithms have successfully been demonstrated to be valuable tools in satellite-based rainfall retrievals which show the practicability of using ML algorithms when faced with high dimensional and complex data. Moreover, recent developments in parallel computing with ML present new possibilities for training and prediction speed and therefore make their usage in real-time systems feasible. This study compares four ML algorithms - random forests (RF), neural networks (NNET), averaged neural networks (AVNNET) and support vector machines (SVM) - for rainfall area detection and rainfall rate assignment using MSG SEVIRI data over Germany. Satellite-based proxies for cloud top height, cloud top temperature, cloud phase and cloud water path serve as predictor variables. The results indicate an overestimation of rainfall area delineation regardless of the ML algorithm (averaged bias = 1.8) but a high probability of detection ranging from 81% (SVM) to 85% (NNET). On a 24-hour basis, the performance of the rainfall rate assignment yielded R2 values between 0.39 (SVM) and 0.44 (AVNNET). Though the differences in the algorithms' performance were rather small, NNET and AVNNET were identified as the most suitable algorithms. On average, they demonstrated the best performance in rainfall area delineation as well as in rainfall rate assignment. NNET's computational speed is an additional advantage in work with large datasets such as in remote sensing based rainfall retrievals. However, since no single algorithm performed considerably better than the others we conclude that further research in providing suitable predictors for rainfall is of greater necessity than an optimization through the choice of the ML algorithm.
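
    The four-way comparison can be reproduced in miniature with scikit-learn, with MLPClassifier standing in for NNET/AVNNET and synthetic data standing in for the SEVIRI-derived predictors; none of the settings below are the study's:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.neural_network import MLPClassifier
        from sklearn.pipeline import make_pipeline
        from sklearn.preprocessing import StandardScaler
        from sklearn.svm import SVC

        # Synthetic stand-in for cloud-top height/temperature, phase, water path.
        X, y = make_classification(n_samples=2000, n_features=4, n_informative=3,
                                   n_redundant=0, random_state=0)

        models = {
            "RF": RandomForestClassifier(n_estimators=300, random_state=0),
            "NNET": make_pipeline(StandardScaler(),
                                  MLPClassifier(hidden_layer_sizes=(16,),
                                                max_iter=1000, random_state=0)),
            "SVM": make_pipeline(StandardScaler(), SVC()),
        }
        for name, model in models.items():
            scores = cross_val_score(model, X, y, cv=5)
            print(f"{name}: mean CV accuracy = {scores.mean():.3f}")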

  12. Comparison of machine learning algorithms for their applicability in satellite-based optical rainfall retrievals

    NASA Astrophysics Data System (ADS)

    Meyer, Hanna; Kühnlein, Meike; Appelhans, Tim; Nauss, Thomas

    2015-04-01

    Machine learning (ML) algorithms have been successfully evaluated as valuable tools in satellite-based rainfall retrievals, which shows the high potential of ML algorithms when faced with high-dimensional and complex data. Moreover, recent developments in parallel computing with ML offer new possibilities in terms of training and prediction speed and therefore make their usage in real-time systems feasible. The present study compares four ML algorithms for rainfall area detection and rainfall rate assignment during daytime, night-time and twilight using MSG SEVIRI data over Germany. Satellite-based proxies for cloud top height, cloud top temperature, cloud phase and cloud water path are applied as predictor variables. As machine learning algorithms, random forests (RF), neural networks (NNET), averaged neural networks (AVNNET) and support vector machines (SVM) are chosen. The comparison is realised in three steps. First, an extensive tuning study is carried out to customise each of the models. Secondly, the models are trained using the optimum values of model parameters found in the tuning study. Finally, the trained models are used to detect rainfall areas and to assign rainfall rates using an independent validation dataset, which is compared against ground-based radar data. To train and validate the models, the radar-based RADOLAN RW product from the German Weather Service (DWD) is used, which provides area-wide gauge-adjusted hourly precipitation information. Though the differences in the performance of the algorithms were rather small, NNET and AVNNET have been identified as the most suitable algorithms. On average, they showed the best performance in rainfall area delineation as well as in rainfall rate assignment. The fast computation time of NNET allows working with large datasets, as is required in remote sensing based rainfall retrievals. However, since none of the algorithms performed considerably better than the others, we conclude that further research into providing suitable predictors for rainfall is of greater necessity than optimization through the choice of the ML algorithm.

  13. A simple and practical control of the authenticity of organic sugarcane samples based on the use of machine-learning algorithms and trace elements determination by inductively coupled plasma mass spectrometry.

    PubMed

    Barbosa, Rommel M; Batista, Bruno L; Barião, Camila V; Varrique, Renan M; Coelho, Vinicius A; Campiglia, Andres D; Barbosa, Fernando

    2015-10-01

    A practical and easy control of the authenticity of organic sugarcane samples based on the use of machine-learning algorithms and trace elements determination by inductively coupled plasma mass spectrometry is proposed. Reference ranges for 32 chemical elements in 22 samples of sugarcane (13 organic and 9 non-organic) were established and then two algorithms, Naive Bayes (NB) and Random Forest (RF), were evaluated to classify the samples. Accurate results (>90%) were obtained when using all variables (i.e., 32 elements). However, accuracy was improved (95.4% for NB) when only eight minerals (Rb, U, Al, Sr, Dy, Nb, Ta, Mo), chosen by a feature selection algorithm, were employed. Thus, the use of a fingerprint based on trace element levels associated with classification machine learning algorithms may be used as a simple alternative for authenticity evaluation of organic sugarcane samples.
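
    The experimental setup, two classifiers over a small table of element concentrations plus a feature-selection step, looks roughly like this in scikit-learn (the concentrations are synthetic; the study chose its eight elements with its own selection algorithm):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.feature_selection import SelectKBest, f_classif
        from sklearn.model_selection import cross_val_score
        from sklearn.naive_bayes import GaussianNB
        from sklearn.pipeline import make_pipeline

        rng = np.random.default_rng(5)
        X = rng.lognormal(0.0, 1.0, size=(22, 32))  # 22 samples x 32 elements
        y = np.array([1] * 13 + [0] * 9)            # 13 organic, 9 non-organic
        X[y == 1, :8] *= 1.8                        # synthetic signal in 8 elements

        for name, clf in [("NB", GaussianNB()),
                          ("RF", RandomForestClassifier(random_state=0))]:
            model = make_pipeline(SelectKBest(f_classif, k=8), clf)
            scores = cross_val_score(model, X, y, cv=5)
            print(f"{name} with 8 selected elements: {scores.mean():.2%}")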

  14. The Children's Rain Forest.

    ERIC Educational Resources Information Center

    Thornton, Carol A.; And Others

    1995-01-01

    Describes a unit on rain forests in which first graders studied about rain forests, built a classroom rain forest, and created a bulletin board. They also graphed rainfall, estimated body water, and estimated the number of newspapers that could be produced from one canopy tree. (MKR)

  15. An R package for spatial coverage sampling and random sampling from compact geographical strata by k-means

    NASA Astrophysics Data System (ADS)

    Walvoort, D. J. J.; Brus, D. J.; de Gruijter, J. J.

    2010-10-01

    Both for mapping and for estimating spatial means of an environmental variable, the accuracy of the result will usually be increased by dispersing the sample locations so that they cover the study area as uniformly as possible. We developed a new R package for designing spatial coverage samples for mapping, and for random sampling from compact geographical strata for estimating spatial means. The mean squared shortest distance (MSSD) was chosen as objective function, which can be minimized by k-means clustering. Two k-means algorithms are described, one for unequal area and one for equal area partitioning. The R package is illustrated with three examples: (1) subsampling of square and circular sampling plots commonly used in surveys of soil, vegetation, forest, etc.; (2) sampling of agricultural fields for soil testing; and (3) infill sampling of climate stations for mainland Australia and Tasmania. The algorithms give satisfactory results within reasonable computing time.
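
    The MSSD objective is exactly what k-means minimizes when the "data" are candidate locations on a fine grid, so the package's core idea can be sketched in a few lines (a Python stand-in; the package itself is in R):

        import numpy as np
        from sklearn.cluster import KMeans

        # Candidate locations: a fine grid over a unit-square study area.
        g = np.linspace(0, 1, 50)
        grid = np.array([(x, y) for x in g for y in g])

        # k-means partitions the area into compact strata; the centroids form
        # the spatial coverage sample that minimizes MSSD.
        km = KMeans(n_clusters=25, n_init=10, random_state=0).fit(grid)
        sample_points = km.cluster_centers_

        mssd = km.inertia_ / len(grid)  # mean squared distance to nearest sample point
        print(f"MSSD: {mssd:.5f}")

    For the stratified random-sampling variant, one would instead draw a random candidate location from each cluster rather than use its centroid.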

  16. Research on Routing Selection Algorithm Based on Genetic Algorithm

    NASA Astrophysics Data System (ADS)

    Gao, Guohong; Zhang, Baojian; Li, Xueyong; Lv, Jinna

    The genetic algorithm is a random search and optimization method based on natural selection and the genetic mechanisms of living organisms. In recent years, because of its potential in solving complicated problems and its successful applications in industrial engineering, the genetic algorithm has attracted wide attention from domestic and international scholars. Routing selection communication has been defined as a standard communication model of IP version 6. This paper proposes a service model of routing selection communication, and designs and implements a new routing selection algorithm based on a genetic algorithm. The experimental simulation results show that this algorithm can obtain better solutions in less time and with a more balanced network load, which enhances the search ratio and the availability of network resources, and improves the quality of service.
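
    A routing GA of this kind encodes candidate paths as chromosomes and evolves them under selection, crossover, and mutation. A toy sketch that minimizes path cost through a random layered network (the problem setup and operators are illustrative, not the paper's design):

        import numpy as np

        rng = np.random.default_rng(6)
        layers, nodes, pop, gens = 8, 6, 60, 150
        cost = rng.uniform(1, 10, (layers - 1, nodes, nodes))  # random link costs

        def path_cost(p):
            return sum(cost[i, p[i], p[i + 1]] for i in range(layers - 1))

        P = rng.integers(0, nodes, (pop, layers))  # chromosome: one node per layer
        for _ in range(gens):
            f = np.array([path_cost(p) for p in P])
            P = P[np.argsort(f)][: pop // 2]       # truncation selection
            kids = []
            for _ in range(pop - len(P)):
                a, b = P[rng.integers(len(P))], P[rng.integers(len(P))]
                cut = int(rng.integers(1, layers)) # one-point crossover
                child = np.concatenate([a[:cut], b[cut:]])
                if rng.random() < 0.2:             # mutation: rewire one hop
                    child[rng.integers(layers)] = rng.integers(nodes)
                kids.append(child)
            P = np.vstack([P, kids])

        print("best route cost:", min(path_cost(p) for p in P))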

  17. Haplotyping algorithms

    SciTech Connect

    Sobel, E.; Lange, K.; O`Connell, J.R.

    1996-12-31

    Haplotyping is the logical process of inferring gene flow in a pedigree based on phenotyping results at a small number of genetic loci. This paper formalizes the haplotyping problem and suggests four algorithms for haplotype reconstruction. These algorithms range from exhaustive enumeration of all haplotype vectors to combinatorial optimization by simulated annealing. Application of the algorithms to published genetic analyses shows that manual haplotyping is often erroneous. Haplotyping is employed in screening pedigrees for phenotyping errors and in positional cloning of disease genes from conserved haplotypes in population isolates. 26 refs., 6 figs., 3 tabs.
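
    Of the four reconstruction schemes, the simulated-annealing variant follows the generic anneal loop; a domain-neutral skeleton is sketched below (the state encoding, neighbour move, and penalty function are placeholders, not a real haplotyping implementation):

        import math
        import random

        random.seed(7)

        def neighbour(state):
            # Placeholder move: flip one phase bit (a real implementation would
            # perturb a haplotype vector while respecting pedigree constraints).
            s = state[:]
            i = random.randrange(len(s))
            s[i] ^= 1
            return s

        def energy(state):
            # Placeholder penalty (a real implementation would score the
            # recombinations implied by the haplotype configuration).
            return sum(state)

        state = [random.randint(0, 1) for _ in range(30)]
        T = 5.0
        while T > 1e-3:
            cand = neighbour(state)
            dE = energy(cand) - energy(state)
            if dE <= 0 or random.random() < math.exp(-dE / T):
                state = cand             # accept downhill, uphill with prob e^(-dE/T)
            T *= 0.995                   # geometric cooling schedule
        print("final penalty:", energy(state))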

  18. Performance analysis of image processing algorithms for classification of natural vegetation in the mountains of southern California

    NASA Technical Reports Server (NTRS)

    Yool, S. R.; Star, J. L.; Estes, J. E.; Botkin, D. B.; Eckhardt, D. W.

    1986-01-01

    The earth's forests fix carbon from the atmosphere during photosynthesis. Scientists are concerned that massive forest removals may promote an increase in atmospheric carbon dioxide, with possible global warming and related environmental effects. Space-based remote sensing may enable the production of accurate world forest maps needed to examine this concern objectively. To test the limits of remote sensing for large-area forest mapping, we use Landsat data acquired over a site in the forested mountains of southern California to examine the relative capacities of a variety of popular image processing algorithms to discriminate different forest types. Results indicate that certain algorithms are best suited to forest classification. Differences in performance between the algorithms tested appear related to variations in their sensitivities to spectral variations caused by background reflectance, differential illumination, and spatial pattern by species. Results emphasize the complex interplay among the land-cover regime, the remotely sensed data, and the algorithms used to process those data.

  19. High contrast holograms using nanotube forest

    NASA Astrophysics Data System (ADS)

    Montelongo, Yunuen; Chen, Bingan; Butt, Haider; Robertson, John; Wilkinson, Timothy D.

    2013-09-01

    Nanotube forests behave as a highly absorbent material when the tubes are randomly placed at sub-wavelength scales. Furthermore, it is possible to create diffractive structures when these bulks are patterned on a substrate. Here, we introduce an alternative way to fabricate intensity holograms by patterning fringes of nanotube forest on a substrate. The result is an efficient intensity hologram that is not restricted to sub-wavelength patterning. Both theoretical and experimental analyses were performed, with good agreement. The produced holograms show uniform behaviour throughout the visible spectrum.

  20. Forest height estimation from mountain forest areas using general model-based decomposition for polarimetric interferometric synthetic aperture radar images

    NASA Astrophysics Data System (ADS)

    Minh, Nghia Pham; Zou, Bin; Cai, Hongjun; Wang, Chengyi

    2014-01-01

    The estimation of forest parameters over mountain forest areas using polarimetric interferometric synthetic aperture radar (PolInSAR) images is of great interest in remote sensing applications. For mountain forest areas, scattering mechanisms are strongly affected by ground topography variations. Most previous studies in modeling microwave backscattering signatures of forest areas have been carried out over relatively flat areas. Therefore, a new