Science.gov

Sample records for algorithm random forests

  1. Fault Detection of Aircraft System with Random Forest Algorithm and Similarity Measure

    PubMed Central

    Park, Wookje; Jung, Sikhang

    2014-01-01

    A fault detection algorithm was developed using a similarity measure and the random forest algorithm. The algorithm was applied to an unmanned aerial vehicle (UAV) prepared by the authors. The similarity measure was designed with the help of distance information, and its usefulness was verified by proof. Fault decisions were carried out by calculating a weighted similarity measure. Twelve available coefficients from the healthy and faulty status data groups were used to make the decision. The similarity measure weighting was obtained through the random forest algorithm (RFA), which provides data priority. In order to obtain a fast decision response, a limited number of coefficients was also considered. The relation between detection rate and the amount of feature data was analyzed and illustrated. Through repeated trials of the similarity calculation, the useful amount of data was obtained. PMID:25057508
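
    The abstract describes the mechanism only at a high level; a minimal sketch of the idea, with synthetic healthy/faulty coefficient data standing in for the UAV measurements and scikit-learn feature importances supplying the "data priority" weights, might look like this:

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    # Synthetic stand-in: 12 coefficients per sample, healthy (0) vs faulty (1)
    X_healthy = rng.normal(0.0, 1.0, size=(200, 12))
    X_faulty = rng.normal(0.5, 1.2, size=(200, 12))
    X = np.vstack([X_healthy, X_faulty])
    y = np.array([0] * 200 + [1] * 200)

    # The random forest supplies per-coefficient weights ("data priority")
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    w = rf.feature_importances_              # non-negative, sums to 1

    def weighted_similarity(x, reference, w):
        # Distance-based similarity: 1 at zero distance, decaying with the
        # importance-weighted L1 distance (one plausible choice, not the paper's)
        return np.exp(-np.sum(w * np.abs(x - reference)))

    healthy_template = X_healthy.mean(axis=0)
    print(weighted_similarity(X_faulty[0], healthy_template, w))  # low => fault
    ```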

  2. Tissue segmentation of computed tomography images using a Random Forest algorithm: a feasibility study

    NASA Astrophysics Data System (ADS)

    Polan, Daniel F.; Brady, Samuel L.; Kaufman, Robert A.

    2016-09-01

    There is a need for robust, fully automated whole body organ segmentation for diagnostic CT. This study investigates and optimizes a Random Forest algorithm for automated organ segmentation; explores the limitations of a Random Forest algorithm applied to the CT environment; and demonstrates segmentation accuracy in a feasibility study of pediatric and adult patients. To the best of our knowledge, this is the first study to investigate a trainable Weka segmentation (TWS) implementation using Random Forest machine-learning as a means to develop a fully automated tissue segmentation tool developed specifically for pediatric and adult examinations in a diagnostic CT environment. Current innovation in computed tomography (CT) is focused on radiomics, patient-specific radiation dose calculation, and image quality improvement using iterative reconstruction, all of which require specific knowledge of tissue and organ systems within a CT image. The purpose of this study was to develop a fully automated Random Forest classifier algorithm for segmentation of neck–chest–abdomen–pelvis CT examinations based on pediatric and adult CT protocols. Seven materials were classified: background, lung/internal air or gas, fat, muscle, solid organ parenchyma, blood/contrast enhanced fluid, and bone tissue, using Matlab and the TWS plugin of FIJI. The following classifier feature filters of TWS were investigated: minimum, maximum, mean, and variance, each evaluated over a voxel radius of 2^n (n = 0 to 4), along with noise reduction and edge-preserving filters: Gaussian, bilateral, Kuwahara, and anisotropic diffusion. The Random Forest algorithm used 200 trees with 2 features randomly selected per node. The optimized auto-segmentation algorithm resulted in 16 image features including features derived from maximum, mean, variance, Gaussian, and Kuwahara filters. Dice similarity coefficient (DSC) calculations between manually segmented and Random Forest algorithm segmented images from 21
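
    A rough sketch of such a pipeline (the study used Matlab and FIJI's TWS plugin; here scipy and scikit-learn stand in, and a random array replaces a real CT slice):

    ```python
    import numpy as np
    from scipy import ndimage
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    img = rng.normal(size=(64, 64))          # stand-in for one CT slice
    labels = (img > 0).astype(int)           # stand-in for manual labels

    # Filter bank: min/max/mean/variance over radii 2**n, n = 0..4,
    # plus a Gaussian smoothing, loosely mirroring the TWS feature filters
    feats = []
    for n in range(5):
        size = 2 * 2 ** n + 1
        mean = ndimage.uniform_filter(img, size)
        feats += [
            ndimage.minimum_filter(img, size),
            ndimage.maximum_filter(img, size),
            mean,
            ndimage.uniform_filter(img ** 2, size) - mean ** 2,  # variance
        ]
    feats.append(ndimage.gaussian_filter(img, sigma=2))
    X = np.stack([f.ravel() for f in feats], axis=1)

    # 200 trees, 2 candidate features per split, as reported in the abstract
    rf = RandomForestClassifier(n_estimators=200, max_features=2, random_state=0)
    rf.fit(X, labels.ravel())
    pred = rf.predict(X).reshape(img.shape)  # on training pixels: optimistic

    dice = 2 * np.logical_and(pred, labels).sum() / (pred.sum() + labels.sum())
    print("Dice similarity coefficient:", dice)
    ```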

  3. Tissue segmentation of computed tomography images using a Random Forest algorithm: a feasibility study.

    PubMed

    Polan, Daniel F; Brady, Samuel L; Kaufman, Robert A

    2016-09-01

    There is a need for robust, fully automated whole body organ segmentation for diagnostic CT. This study investigates and optimizes a Random Forest algorithm for automated organ segmentation; explores the limitations of a Random Forest algorithm applied to the CT environment; and demonstrates segmentation accuracy in a feasibility study of pediatric and adult patients. To the best of our knowledge, this is the first study to investigate a trainable Weka segmentation (TWS) implementation using Random Forest machine-learning as a means to develop a fully automated tissue segmentation tool developed specifically for pediatric and adult examinations in a diagnostic CT environment. Current innovation in computed tomography (CT) is focused on radiomics, patient-specific radiation dose calculation, and image quality improvement using iterative reconstruction, all of which require specific knowledge of tissue and organ systems within a CT image. The purpose of this study was to develop a fully automated Random Forest classifier algorithm for segmentation of neck-chest-abdomen-pelvis CT examinations based on pediatric and adult CT protocols. Seven materials were classified: background, lung/internal air or gas, fat, muscle, solid organ parenchyma, blood/contrast enhanced fluid, and bone tissue, using Matlab and the TWS plugin of FIJI. The following classifier feature filters of TWS were investigated: minimum, maximum, mean, and variance, each evaluated over a voxel radius of 2^n (n = 0 to 4), along with noise reduction and edge-preserving filters: Gaussian, bilateral, Kuwahara, and anisotropic diffusion. The Random Forest algorithm used 200 trees with 2 features randomly selected per node. The optimized auto-segmentation algorithm resulted in 16 image features including features derived from maximum, mean, variance, Gaussian, and Kuwahara filters. Dice similarity coefficient (DSC) calculations between manually segmented and Random Forest algorithm segmented images from 21

  4. High-resolution climate data over conterminous US using random forest algorithm

    NASA Astrophysics Data System (ADS)

    Hashimoto, H.; Nemani, R. R.; Wang, W.

    2014-12-01

    We developed a new methodology to create high-resolution precipitation data using the random forest algorithm. Two approaches are commonly used: physical downscaling from GCM data using a regional climate model, and interpolation from ground observation data. Physical downscaling can be applied only to a small region because it is computationally expensive and complex to deploy. On the other hand, interpolation schemes from ground observations do not consider physical processes. In this study, we utilized the random forest algorithm to integrate atmospheric reanalysis data, satellite data, topography data, and ground observation data. First we considered situations where precipitation is the same across the domain, largely dominated by storm-like systems. We then picked several points to train the random forest algorithm. The random forest algorithm estimates out-of-bag errors spatially and produces the relative importance of each input variable. This methodology has the following advantages. (1) It can ingest any spatial dataset to improve downscaling; even non-precipitation datasets can be ingested, such as satellite cloud cover data, radar reflectivity images, or modeled convective available potential energy. (2) The methodology is purely statistical, so physical assumptions are not required, whereas most interpolation schemes assume an empirical relationship between precipitation and elevation for orographic precipitation. (3) Low-quality values in the ingested data do not cause critical bias in the results because of the ensemble nature of random forests, so users do not need to pay special attention to quality control of the input data compared with other interpolation methodologies. (4) The same methodology can be applied to produce other high-resolution climate datasets, such as wind and cloud cover, variables that are usually hard to interpolate with conventional algorithms. In conclusion, the proposed methodology can produce reasonable
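
    The two mechanisms named here, spatial out-of-bag error and relative variable importance, are directly exposed by most random forest implementations; a hedged sketch with illustrative predictor names (not the authors' actual inputs):

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(1)
    n = 2000
    # Hypothetical predictors: reanalysis precipitation, satellite cloud
    # cover, elevation, CAPE; target: gauge-observed precipitation
    X = rng.random((n, 4))
    y = 2.0 * X[:, 0] + 0.5 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.1, n)

    rf = RandomForestRegressor(n_estimators=300, oob_score=True, random_state=1)
    rf.fit(X, y)
    print("OOB R^2:", rf.oob_score_)         # out-of-bag error estimate
    for name, imp in zip(["reanalysis", "cloud", "elevation", "cape"],
                         rf.feature_importances_):
        print(name, round(imp, 3))           # relative variable importance
    ```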

  5. Autoclassification of the Variable 3XMM Sources Using the Random Forest Machine Learning Algorithm

    NASA Astrophysics Data System (ADS)

    Farrell, Sean A.; Murphy, Tara; Lo, Kitty K.

    2015-11-01

    In the current era of large surveys and massive data sets, autoclassification of astrophysical sources using intelligent algorithms is becoming increasingly important. In this paper we present the catalog of variable sources in the Third XMM-Newton Serendipitous Source catalog (3XMM) autoclassified using the Random Forest machine learning algorithm. We used a sample of manually classified variable sources from the second data release of the XMM-Newton catalogs (2XMMi-DR2) to train the classifier, obtaining an accuracy of ∼92%. We also evaluated the effectiveness of identifying spurious detections using a sample of spurious sources, achieving an accuracy of ∼95%. Manual investigation of a random sample of classified sources confirmed these accuracy levels and showed that the Random Forest machine learning algorithm is highly effective at automatically classifying 3XMM sources. Here we present the catalog of classified 3XMM variable sources. We also present three previously unidentified unusual sources that were flagged as outlier sources by the algorithm: a new candidate supergiant fast X-ray transient, a 400 s X-ray pulsar, and an eclipsing 5 hr binary system coincident with a known Cepheid.

  6. Combining Spectral and Texture Features Using Random Forest Algorithm: Extracting Impervious Surface Area in Wuhan

    NASA Astrophysics Data System (ADS)

    Shao, Zhenfeng; Zhang, Yuan; Zhang, Lei; Song, Yang; Peng, Minjun

    2016-06-01

    Impervious surface area (ISA) is one of the most important indicators of urban environments. Numerous approaches based on multi-resolution remote sensing images have been proposed to extract impervious surface, using statistical estimation, sub-pixel classification, and spectral mixture analysis. Through these methods, impervious surface maps can be effectively applied to regional-scale planning and management. For large regions, however, high-resolution remote sensing images provide more detail and are therefore more conducive to environmental monitoring and urban management. Since the purpose of this study is to map impervious surfaces more effectively, three classification algorithms (random forests, decision trees, and artificial neural networks) were tested for their ability to map impervious surface. Random forests outperformed the decision trees and artificial neural networks in precision. Combining spectral indices and texture, the random forest achieved impervious surface extraction with a producer's accuracy of 0.98, a user's accuracy of 0.97, an overall accuracy of 0.98, and a kappa coefficient of 0.97.
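
    The four reported figures all derive from the confusion matrix; a small sketch with toy labels (1 = impervious, 0 = pervious):

    ```python
    import numpy as np
    from sklearn.metrics import cohen_kappa_score, confusion_matrix

    y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0, 1, 1])   # toy reference labels
    y_pred = np.array([1, 1, 0, 0, 0, 0, 1, 0, 1, 1])   # toy classifications

    cm = confusion_matrix(y_true, y_pred)    # rows: reference, cols: predicted
    producer_acc = cm[1, 1] / cm[1, :].sum() # omission side (recall)
    user_acc = cm[1, 1] / cm[:, 1].sum()     # commission side (precision)
    overall_acc = np.trace(cm) / cm.sum()
    kappa = cohen_kappa_score(y_true, y_pred)
    print(producer_acc, user_acc, overall_acc, kappa)
    ```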

  7. Fault diagnosis in spur gears based on genetic algorithm and random forest

    NASA Astrophysics Data System (ADS)

    Cerrada, Mariela; Zurita, Grover; Cabrera, Diego; Sánchez, René-Vinicio; Artés, Mariano; Li, Chuan

    2016-03-01

    There are growing demands for condition-based monitoring of gearboxes, and therefore new methods to improve the reliability, effectiveness, and accuracy of gear fault detection ought to be evaluated. Feature selection is still an important aspect of machine learning-based diagnosis for reaching good performance of the diagnostic models. On the other hand, random forest classifiers are suitable models in industrial environments where large data samples are not usually available for training such diagnostic models. The main aim of this research is to build a robust system for multi-class fault diagnosis in spur gears by selecting the best set of condition parameters in the time, frequency, and time-frequency domains, extracted from vibration signals. The diagnostic system is built using genetic algorithms and a classifier based on random forest, in a supervised environment. The original set of condition parameters is reduced by around 66% of its initial size using genetic algorithms, while still achieving an acceptable classification precision over 97%. The approach is tested on real vibration signals considering several fault classes, one of them an incipient fault, under different running conditions of load and velocity.
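
    The GA-plus-RF combination could be sketched as below: a deliberately tiny genetic algorithm over boolean feature masks, with cross-validated random forest accuracy as the fitness function (population size, mutation rate, and all other parameters are illustrative, not the paper's):

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=300, n_features=30, n_informative=8,
                               random_state=0)  # stand-in condition parameters

    def fitness(mask):
        if mask.sum() == 0:
            return 0.0
        rf = RandomForestClassifier(n_estimators=50, random_state=0)
        return cross_val_score(rf, X[:, mask], y, cv=3).mean()

    pop = rng.random((20, X.shape[1])) < 0.5       # initial random masks
    for _ in range(15):
        scores = np.array([fitness(m) for m in pop])
        parents = pop[np.argsort(scores)[-10:]]    # keep the best half
        children = []
        for _ in range(10):
            a, b = parents[rng.integers(10, size=2)]
            child = np.where(rng.random(X.shape[1]) < 0.5, a, b)  # crossover
            child ^= rng.random(X.shape[1]) < 0.02                # mutation
            children.append(child)
        pop = np.vstack([parents, children])
    best = pop[np.argmax([fitness(m) for m in pop])]
    print("kept", best.sum(), "of", X.shape[1], "features")
    ```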

  8. Urban Road Detection in Airborne Laser Scanning Point Cloud Using Random Forest Algorithm

    NASA Astrophysics Data System (ADS)

    Kaczałek, B.; Borkowski, A.

    2016-06-01

    The objective of this research is to detect points that describe a road surface in an unclassified airborne laser scanning (ALS) point cloud. For this purpose we use the Random Forest learning algorithm. The proposed methodology consists of two stages: preparation of features and supervised point cloud classification. In this approach we consider only the ALS points representing the last echo. For these points, the RGB values, intensity, the normal vectors, their mean values, and the standard deviations are provided. Moreover, local and global height variations are taken into account as components of the feature vector. The feature vectors are calculated on the basis of a 3D Delaunay triangulation. The proposed methodology was tested on point clouds with an average point density of 12 pts/m² that represent a large urban scene. A significance level of 15% was set for the decision trees of the learning algorithm. As a result of the Random Forest classification we obtained two subsets of ALS points, one of which represents points belonging to the road network. The classification evaluation showed an overall classification accuracy of about 90%. Finally, the ALS points representing roads were merged and simplified into road network polylines using morphological operations.

  9. Automatic classification of endogenous seismic sources within a landslide body using random forest algorithm

    NASA Astrophysics Data System (ADS)

    Provost, Floriane; Hibert, Clément; Malet, Jean-Philippe; Stumpf, André; Doubre, Cécile

    2016-04-01

    Different studies have shown the presence of microseismic activity in soft-rock landslides. The seismic signals exhibit significantly different features in the time and frequency domains, which allow their classification and interpretation. Most of the classes can be associated with different mechanisms of deformation occurring within and at the surface (e.g. rockfall, slide-quake, fissure opening, fluid circulation). However, some signals remain not fully understood, and some classes contain too few examples to permit interpretation. To move toward a more complete interpretation of the links between the dynamics of soft-rock landslides and the physical processes controlling their behaviour, a complete catalog of the endogenous seismicity is needed. We propose a multi-class detection method based on the random forest algorithm to automatically classify the source of seismic signals. Random forest is a supervised machine learning technique based on the computation of a large number of decision trees. The multiple decision trees are constructed from training sets including each of the target classes, each signal being described by a set of attributes. In the case of seismic signals, these attributes may encompass spectral features but also waveform characteristics, multi-station observations, and other relevant information. The Random Forest classifier is used because it provides state-of-the-art performance compared with other machine learning techniques (e.g. SVM, neural networks) and requires no fine tuning. Furthermore it is relatively fast, robust, easy to parallelize, and inherently suitable for multi-class problems. In this work, we present the first results of the classification method applied to the seismicity recorded at the Super-Sauze landslide between 2013 and 2015. We selected a dozen seismic signal features that characterize precisely its spectral content (e.g. central frequency, spectrum width, energy in several frequency bands, spectrogram shape, spectrum local and global maxima
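
    A minimal sketch of a few such spectral attributes, with a plain NumPy sine wave standing in for a real seismogram (the band edges are illustrative); the resulting feature vectors would then be fed to a standard random forest classifier:

    ```python
    import numpy as np

    def spectral_features(sig, fs):
        # Central frequency, spectrum width, and coarse band energies,
        # a subset of the attribute types named in the abstract
        spec = np.abs(np.fft.rfft(sig)) ** 2
        freqs = np.fft.rfftfreq(sig.size, d=1.0 / fs)
        p = spec / spec.sum()
        fc = np.sum(freqs * p)                          # central frequency
        width = np.sqrt(np.sum((freqs - fc) ** 2 * p))  # spectrum width
        edges = [0, 5, 10, 20, 50]                      # Hz, illustrative
        bands = [spec[(freqs >= lo) & (freqs < hi)].sum()
                 for lo, hi in zip(edges[:-1], edges[1:])]
        return [fc, width, *bands]

    fs = 100.0
    sig = np.sin(2 * np.pi * 7 * np.arange(0, 10, 1 / fs))  # toy signal
    print(spectral_features(sig, fs))
    ```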

  10. Pre-Hospital Triage of Trauma Patients Using the Random Forest Computer Algorithm

    PubMed Central

    Scerbo, Michelle; Radhakrishnan, Hari; Cotton, Bryan; Dua, Anahita; Del Junco, Deborah; Wade, Charles; Holcomb, John B.

    2015-01-01

    Background Over-triage not only wastes resources but displaces the patient from their community and causes delay of treatment for the more seriously injured. This study aimed to validate the Random Forest computer model (RFM) as a means of better triaging trauma patients to Level I trauma centers. Methods Adult trauma patients with "medium activation" presenting via helicopter to a Level I trauma center from May 2007 to May 2009 were included. The "medium activation" trauma patient is alert and hemodynamically stable on scene but has either subnormal vital signs or an accumulation of risk factors that may indicate a potentially serious injury. Variables included in the RFM computer analysis were demographics, mechanism of injury, pre-hospital fluid, medications, vitals, and disposition. Statistical analysis was performed via the Random Forest algorithm to compare our institutional triage rate to rates determined by the RFM. Results A total of 1,653 patients were included in this study, of which 496 were used in the testing set of the RFM. In our testing set, 33.8% of patients brought to our Level I trauma center could have been managed at a Level III trauma center, and 88% of patients that required a Level I trauma center were identified correctly. In the testing set, the over-triage rate was 66%; utilizing the RFM decreased the over-triage rate to 42% (p<0.001). The under-triage rate was 8.3%. The RFM predicted patient disposition with a sensitivity of 89%, specificity of 42%, negative predictive value of 92%, and positive predictive value of 34%. Conclusion While prospective validation is required, it appears that computer modeling could potentially be used to guide triage decisions, allowing both more accurate triage and more efficient use of the trauma system. PMID:24484906

  11. Characterizing stand-level forest canopy cover and height using Landsat time series, samples of airborne LiDAR, and the Random Forest algorithm

    NASA Astrophysics Data System (ADS)

    Ahmed, Oumer S.; Franklin, Steven E.; Wulder, Michael A.; White, Joanne C.

    2015-03-01

    Many forest management activities, including the development of forest inventories, require spatially detailed forest canopy cover and height data. Among the various remote sensing technologies, LiDAR (Light Detection and Ranging) offers the most accurate and consistent means for obtaining reliable canopy structure measurements. A potential solution to reduce the cost of LiDAR data is to integrate transects (samples) of LiDAR data with frequently acquired and spatially comprehensive optical remotely sensed data. Although multiple regression is commonly used for such modeling, often it does not fully capture the complex relationships between forest structure variables. This study investigates the potential of Random Forest (RF), a machine learning technique, to estimate LiDAR-measured canopy structure using a time series of Landsat imagery. The study is implemented over a 2600 ha area of industrially managed coastal temperate forests on Vancouver Island, British Columbia, Canada. We implemented a trajectory-based approach to time series analysis that generates time since disturbance (TSD) and disturbance intensity information for each pixel, and we used this information to stratify the forest land base into two strata: mature forests and young forests. Canopy cover and height for three forest classes (i.e. mature, young, and combined mature and young) were modeled separately using multiple regression and Random Forest (RF) techniques. For all forest classes, the RF models provided improved estimates relative to the multiple regression models. The lowest validation error was obtained for the mature forest stratum in an RF model (R2 = 0.88, RMSE = 2.39 m and bias = -0.16 for canopy height; R2 = 0.72, RMSE = 0.068% and bias = -0.0049 for canopy cover). This study demonstrates the value of using disturbance and successional history to inform estimates of canopy structure and obtain improved estimates of forest canopy cover and height using the RF algorithm.
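
    The regression-versus-RF comparison is easy to reproduce in outline; a hedged sketch with synthetic data standing in for the Landsat predictors and LiDAR-derived canopy height:

    ```python
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=500, n_features=8, noise=10, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
        pred = model.fit(X_tr, y_tr).predict(X_te)
        rmse = mean_squared_error(y_te, pred) ** 0.5
        bias = np.mean(pred - y_te)
        print(type(model).__name__, round(r2_score(y_te, pred), 2),
              round(rmse, 2), round(bias, 2))   # R2, RMSE, bias
    ```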

  12. [Segmentation of Winter Wheat Canopy Image Based on Visual Spectral and Random Forest Algorithm].

    PubMed

    Liu, Ya-dong; Cui, Ri-xian

    2015-12-01

    Digital image analysis has been widely used in non-destructive monitoring of crop growth and nitrogen nutrition status due to its simplicity and efficiency. It is necessary to segment the winter wheat plant from the soil background for assessing canopy cover, the intensity levels of the visible spectrum (R, G, and B), and other color indices derived from RGB. In the present study, according to the variation in the R, G, and B components of the sRGB color space and the L*, a*, and b* components of the CIEL*a*b* color space between wheat plant and soil background, segmentation of the wheat plant from the soil background was conducted by Otsu's method based on the a* component of the CIEL*a*b* color space, an RGB-based random forest method, and a CIEL*a*b*-based random forest method, respectively. The ability to segment wheat plant from soil background was evaluated by the segmentation accuracy. The results showed that all three methods revealed good ability to segment wheat plant from soil background. Otsu's method had the lowest segmentation accuracy in comparison with the other two methods, and there was only a small difference in segmentation error between the two random forest methods. In conclusion, the random forest method revealed its capacity to segment wheat plant from soil background using only the visible spectral information of the canopy image, without any color component combinations or color space transformation. PMID:26964234
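
    The Otsu-on-a* baseline can be sketched with scikit-image (a random array stands in for a real canopy photograph, so the numbers are meaningless; for the two random forest variants the per-pixel RGB or L*a*b* triples would serve as the feature vectors):

    ```python
    import numpy as np
    from skimage.color import rgb2lab
    from skimage.filters import threshold_otsu

    rgb = np.random.default_rng(0).random((120, 160, 3))  # stand-in image
    a_star = rgb2lab(rgb)[..., 1]     # a*: green (negative) to red (positive)

    t = threshold_otsu(a_star)
    plant_mask = a_star < t           # vegetation is greener, hence lower a*
    print("plant fraction:", plant_mask.mean())
    ```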

  13. Forest Fires in a Random Forest

    NASA Astrophysics Data System (ADS)

    Leuenberger, Michael; Kanevski, Mikhaïl; Vega Orozco, Carmen D.

    2013-04-01

    Forest fires in Canton Ticino (Switzerland) are very complex phenomena. Meteorological data can explain some occurrences of fires in time, but not necessarily in space. Using anthropogenic and geographical feature data with the random forest algorithm, this study tries to highlight the factors that most influence fire ignition and to identify areas at risk. The fundamental scientific problem considered in the present research is an application of random forest algorithms to the analysis and modeling of forest fire patterns in a high-dimensional input feature space. This study is focused on the 2,224 anthropogenic forest fires among the 2,401 forest fire ignition points that occurred in Canton Ticino from 1969 to 2008. Provided by the Swiss Federal Institute for Forest, Snow and Landscape Research (WSL), the database characterizes each fire by its location (x,y coordinates of the ignition point), start date, duration, burned area, and other information such as ignition cause and topographic features (slope, aspect, altitude, etc.). In addition, the VECTOR25 database from SwissTopo was used to extract the distances between fire ignition points and anthropogenic structures such as buildings, the road network, the rail network, etc. Developed by L. Breiman and A. Cutler, the Random Forests (RF) algorithm provides an ensemble of classification and regression trees. By a pseudo-random variable selection for each split node, this method grows a variety of decision trees that do not return the same results, and thus, by a committee system, returns a value that has better accuracy than other machine learning methods. The algorithm directly incorporates a measure of variable importance, which is used to display the factors affecting forest fires. With this parameter, several models can be fit, and thus a prediction can be made throughout the validity domain of Canton Ticino. A comprehensive RF analysis was carried out in order to 1

  14. [Study on prediction of compound-target-disease network of chuanxiong rhizoma based on random forest algorithm].

    PubMed

    Yuan, Jie; Li, Xiao-Jie; Chen, Chao; Song, Xiang-Gang; Wang, Shu-Mei

    2014-06-01

    Small-molecule drugs and their drug target data, such as enzymes, ion channels, G-protein-coupled receptors, and nuclear receptors, were collected from the KEGG database as training sets to establish drug-target interaction models based on the random forest algorithm. The accuracies of the models were evaluated by 10-fold cross-validation, showing that the predicted success rates of the four drug target models were 71.34%, 67.08%, 73.17%, and 67.83%, respectively. The models were adopted to predict the targets of 26 chemical components and to establish the compound-target-disease network. The results were well verified by the literature. The models established in this paper are highly accurate and can be used to discover potential targets of other traditional Chinese medicine ingredients. PMID:25244771

  15. Land cover and land use mapping of the iSimangaliso Wetland Park, South Africa: comparison of oblique and orthogonal random forest algorithms

    NASA Astrophysics Data System (ADS)

    Bassa, Zaakirah; Bob, Urmilla; Szantoi, Zoltan; Ismail, Riyad

    2016-01-01

    In recent years, the popularity of tree-based ensemble methods for land cover classification has increased significantly. Using WorldView-2 image data, we evaluate the potential of the oblique random forest algorithm (oRF) to classify a highly heterogeneous protected area. In contrast to the random forest (RF) algorithm, the oRF algorithm builds multivariate trees by learning the optimal split using a supervised model. The oRF binary algorithm is adapted to a multiclass land cover and land use application using both the "one-against-one" and "one-against-all" combination approaches. Results show that the oRF algorithms are capable of achieving high classification accuracies (>80%). However, there was no statistically significant difference between the classification accuracies obtained by the oRF algorithms and the more popular RF algorithm. For all the algorithms, user accuracies (UAs) and producer accuracies (PAs) >80% were recorded for most of the classes. Both the RF and oRF algorithms poorly classified the indigenous forest class, as indicated by the low UAs and PAs. Finally, the results from this study advocate and support the utility of the oRF algorithm for land cover and land use mapping of protected areas using WorldView-2 image data.
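
    scikit-learn has no oblique random forest, but the two combination schemes can be sketched with any binary base learner; here a logistic regression stands in for one oblique split model:

    ```python
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier

    # Toy stand-in for a multiclass land cover problem
    X, y = make_classification(n_samples=400, n_features=10, n_classes=4,
                               n_informative=6, random_state=0)

    binary = LogisticRegression(max_iter=1000)   # stand-in binary learner
    print(OneVsOneClassifier(binary).fit(X, y).score(X, y))   # "one-against-one"
    print(OneVsRestClassifier(binary).fit(X, y).score(X, y))  # "one-against-all"
    ```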

  16. Comparison between WorldView-2 and SPOT-5 images in mapping the bracken fern using the random forest algorithm

    NASA Astrophysics Data System (ADS)

    Odindi, John; Adam, Elhadi; Ngubane, Zinhle; Mutanga, Onisimo; Slotow, Rob

    2014-01-01

    Plant species invasion is known to be a major threat to socioeconomic and ecological systems. Due to the high cost and limited extent of urban green spaces, high mapping accuracy is necessary to optimize the management of such spaces. We compare the performance of the new-generation WorldView-2 (WV-2) and SPOT-5 images in mapping the bracken fern [Pteridium aquilinum (L.) Kuhn] in a conserved urban landscape. Using the random forest algorithm, a grid-search approach based on the out-of-bag error estimate was used to determine the optimal ntree and mtry combination. The variable importance and backward feature elimination techniques were further used to determine the influence of the image bands on mapping accuracy. Additionally, the value of the commonly used vegetation indices in enhancing the classification accuracy was tested on the better performing image data. Results show that the performance of the new WV-2 bands was better than that of the traditional bands. Overall classification accuracies of 84.72% and 72.22% were achieved for the WV-2 and SPOT images, respectively. The use of selected indices from the WV-2 bands increased the overall classification accuracy to 91.67%. The findings in this study show the suitability of the new-generation imagery for mapping the bracken fern within the often vulnerable natural vegetation cover types of urban landscapes.
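
    The grid search over ntree and mtry using the out-of-bag error estimate can be sketched as follows (synthetic data; the candidate grids are illustrative):

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=500, n_features=8, random_state=0)

    best = None
    for ntree in (100, 300, 500):
        for mtry in (2, 3, 4):
            rf = RandomForestClassifier(n_estimators=ntree, max_features=mtry,
                                        oob_score=True, random_state=0).fit(X, y)
            oob_error = 1 - rf.oob_score_
            if best is None or oob_error < best[0]:
                best = (oob_error, ntree, mtry)
    print("best OOB error %.3f at ntree=%d, mtry=%d" % best)
    ```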

  17. PRBP: Prediction of RNA-Binding Proteins Using a Random Forest Algorithm Combined with an RNA-Binding Residue Predictor.

    PubMed

    Ma, Xin; Guo, Jing; Xiao, Ke; Sun, Xiao

    2015-01-01

    The prediction of RNA-binding proteins is an incredibly challenging problem in computational biology. Although great progress has been made using various machine learning approaches with numerous features, the problem is still far from being solved. In this study, we attempt to predict RNA-binding proteins directly from amino acid sequences. A novel approach, PRBP, predicts RNA-binding proteins using information about predicted RNA-binding residues in conjunction with a random forest based method. For a given protein, we first predict its RNA-binding residues and then judge whether the protein binds RNA or not based on information from that prediction. If the protein cannot be identified by the information associated with its predicted RNA-binding residues, then a novel random forest predictor is used to determine whether the query protein is an RNA-binding protein. We incorporated evolutionary information combined with physicochemical features (EIPP) and an amino acid composition feature to establish the random forest predictor. Feature analysis showed that EIPP contributed the most to the prediction of RNA-binding proteins. The results also showed that information from the RNA-binding residue prediction improved the overall performance of our RNA-binding protein prediction. It is anticipated that the PRBP method will become a useful tool for identifying RNA-binding proteins. A PRBP Web server implementation is freely available at http://www.cbi.seu.edu.cn/PRBP/.

  18. A tale of two "forests": random forest machine learning aids tropical forest carbon mapping.

    PubMed

    Mascaro, Joseph; Asner, Gregory P; Knapp, David E; Kennedy-Bowdoin, Ty; Martin, Roberta E; Anderson, Christopher; Higgins, Mark; Chadwick, K Dana

    2014-01-01

    Accurate and spatially-explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely-sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, both with and without spatial contextual modeling, by including, in the latter case, x and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (the so-called "out-of-bag" validation), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best performing run of Random Forest, and explained 59% of LiDAR-based carbon estimates within the validation area, compared to 37% for stratification and 43% for Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha(-1) when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation. PMID:24489686
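
    The spatial-context idea amounts to appending map coordinates to the predictor matrix; a toy comparison (note that the random hold-out used here is a weaker test than the paper's contiguous 8-million-hectare validation block):

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 3000
    xy = rng.random((n, 2)) * 100            # map coordinates, km
    spectral = rng.random((n, 4))            # stand-in remote sensing layers
    carbon = (30 * np.sin(xy[:, 0] / 15) + 20 * spectral[:, 0]
              + rng.normal(0, 5, n))         # spatially structured target

    for X in (spectral, np.hstack([spectral, xy])):  # without / with x, y
        X_tr, X_te, y_tr, y_te = train_test_split(X, carbon, random_state=0)
        rf = RandomForestRegressor(n_estimators=200, random_state=0)
        print(rf.fit(X_tr, y_tr).score(X_te, y_te))  # held-out R^2
    ```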

  1. Detecting targets hidden in random forests

    NASA Astrophysics Data System (ADS)

    Kouritzin, Michael A.; Luo, Dandan; Newton, Fraser; Wu, Biao

    2009-05-01

    Military tanks, cargo or troop carriers, missile carriers, and rocket launchers often hide from detection in forests, which complicates the problem of locating these hidden targets. An electro-optic camera mounted on a surveillance aircraft or unmanned aerial vehicle is used to capture images of forests with possible hidden targets, e.g., rocket launchers. We consider random forests with longitudinal and latitudinal correlations. Specifically, foliage coverage is encoded with a binary representation (i.e., foliage or no foliage) and is correlated in adjacent regions. We address the problem of detecting camouflaged targets hidden in random forests by building memory into the observations. In particular, we propose an efficient algorithm to generate random forests, ground, and camouflage of hidden targets with two-dimensional correlations. The observations are a sequence of snapshots consisting of foliage-obscured ground or target. Theoretically, detection is possible because there are subtle differences in the correlations of the ground and the camouflage of the rocket launcher; however, these differences are well beyond human perception. To detect the presence of hidden targets automatically, we develop a Markov representation for these sequences and modify the classical filtering equations to allow the Markov chain observation. Particle filters are used to estimate the position of the targets in combination with a novel random weighting technique. Furthermore, we give positive proof-of-concept simulations.

  2. Automated classification of seismic sources in a large database using the random forest algorithm: First results at Piton de la Fournaise volcano (La Réunion).

    NASA Astrophysics Data System (ADS)

    Hibert, Clément; Provost, Floriane; Malet, Jean-Philippe; Stumpf, André; Maggi, Alessia; Ferrazzini, Valérie

    2016-04-01

    In the past decades, the increasing quality of seismic sensors and the capability to transfer large quantities of data remotely have led to a fast densification of local, regional, and global seismic networks for near real-time monitoring. This technological advance permits the use of seismology to document geological and natural/anthropogenic processes (volcanoes, ice-calving, landslides, snow and rock avalanches, geothermal fields), but has also led to an ever-growing quantity of seismic data. This wealth of seismic data makes the construction of complete seismicity catalogs, which include earthquakes but also other sources of seismic waves, more challenging and very time-consuming, as this critical pre-processing stage is classically done by human operators. To overcome this issue, the development of automatic methods for processing continuous seismic data appears to be a necessity. The classification algorithm should satisfy the need for a method that is robust, precise, and versatile enough to be deployed to monitor seismicity in very different contexts. We propose a multi-class detection method based on the random forest algorithm to automatically classify the source of seismic signals. Random forest is a supervised machine learning technique based on the computation of a large number of decision trees. The multiple decision trees are constructed from training sets including each of the target classes, each signal being described by a set of attributes. In the case of seismic signals, these attributes may encompass spectral features but also waveform characteristics, multi-station observations, and other relevant information. The Random Forest classifier is used because it provides state-of-the-art performance compared with other machine learning techniques (e.g. SVM, neural networks) and requires no fine tuning. Furthermore it is relatively fast, robust, easy to parallelize, and inherently suitable for multi-class problems. In this work, we present the first results of the classification method applied

  3. Weighted Hybrid Decision Tree Model for Random Forest Classifier

    NASA Astrophysics Data System (ADS)

    Kulkarni, Vrushali Y.; Sinha, Pradeep K.; Petare, Manisha C.

    2016-06-01

    Random Forest is an ensemble, supervised machine learning algorithm. An ensemble generates many classifiers and combines their results by majority voting. Random Forest uses the decision tree as its base classifier. In decision tree induction, an attribute split/evaluation measure is used to decide the best split at each node of the decision tree. The generalization error of a forest of tree classifiers depends on the strength of the individual trees in the forest and the correlation among them. The work presented in this paper is related to attribute split measures and is a two-step process: first, a theoretical study of five selected split measures is done and a comparison matrix is generated to understand the pros and cons of each measure. These theoretical results are then verified by empirical analysis, in which a random forest is generated using each of the five selected split measures, chosen one at a time, i.e. a random forest using information gain, a random forest using gain ratio, and so on. Next, based on this theoretical and empirical analysis, a new hybrid decision tree model for the random forest classifier is proposed, in which the individual decision trees in the Random Forest are generated using different split measures. This model is augmented by weighted voting based on the strength of the individual trees. The new approach has shown a notable increase in the accuracy of random forest.
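
    A hedged sketch of the proposed model: bootstrapped trees grown with alternating split measures, each tree's vote weighted by its out-of-bag accuracy (the particular weighting function is an assumption, since the abstract does not fix one):

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=600, n_features=12, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    rng = np.random.default_rng(0)
    trees, weights = [], []
    for i in range(50):
        crit = "gini" if i % 2 == 0 else "entropy"    # mix of split measures
        boot = rng.integers(0, len(X_tr), len(X_tr))  # bootstrap sample
        t = DecisionTreeClassifier(criterion=crit, max_features="sqrt",
                                   random_state=i).fit(X_tr[boot], y_tr[boot])
        oob = np.setdiff1d(np.arange(len(X_tr)), boot)
        trees.append(t)
        weights.append(t.score(X_tr[oob], y_tr[oob])) # tree strength on OOB data

    # Weighted vote: each tree's class probabilities scaled by its strength
    proba = sum(w * t.predict_proba(X_te) for w, t in zip(weights, trees))
    print("weighted-vote accuracy:", (proba.argmax(1) == y_te).mean())
    ```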

  4. A Random Forest-based ensemble method for activity recognition.

    PubMed

    Feng, Zengtao; Mo, Lingfei; Li, Meng

    2015-01-01

    This paper presents a multi-sensor ensemble approach to human physical activity (PA) recognition using random forest. We designed an ensemble learning algorithm which integrates several independent Random Forest classifiers based on different sensor feature sets to build a more stable, more accurate, and faster classifier for human activity recognition. To evaluate the algorithm, PA data collected from PAMAP (Physical Activity Monitoring for Aging People), a standard, publicly available database, was used for training and testing. The experimental results show that the algorithm is able to correctly recognize 19 PA types with an accuracy of 93.44%, while training is faster than with other methods. The ensemble classifier system based on the RF (Random Forest) algorithm achieves high recognition accuracy and fast calculation. PMID:26737432
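
    A minimal sketch of the per-sensor-forest design, with synthetic feature columns standing in for the different sensor streams and probability averaging standing in for the paper's combination rule:

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Pretend columns 0-3, 4-7 and 8-11 come from three different sensors
    X, y = make_classification(n_samples=800, n_features=12, n_informative=9,
                               n_classes=3, random_state=0)
    sensors = [slice(0, 4), slice(4, 8), slice(8, 12)]

    forests = [RandomForestClassifier(n_estimators=100, random_state=i)
               .fit(X[:, s], y) for i, s in enumerate(sensors)]

    # Combine the independent forests by averaging their class probabilities
    proba = sum(f.predict_proba(X[:, s]) for f, s in zip(forests, sensors))
    print("ensemble accuracy:", (proba.argmax(1) == y).mean())
    ```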

  5. Evaluating total inorganic nitrogen in coastal waters through fusion of multi-temporal RADARSAT-2 and optical imagery using random forest algorithm

    NASA Astrophysics Data System (ADS)

    Liu, Meiling; Liu, Xiangnan; Li, Jin; Ding, Chao; Jiang, Jiale

    2014-12-01

    Satellites routinely provide frequent, large-scale, near-surface views of many oceanographic variables pertinent to plankton ecology. However, the nutrient fertility of water can be challenging to detect accurately using remote sensing technology. This research explored an approach to estimate nutrient fertility in coastal waters through the fusion of synthetic aperture radar (SAR) images and optical images using the random forest (RF) algorithm. The estimation of total inorganic nitrogen (TIN) in the Hong Kong Sea, China, was used as a case study. In March 2009 and in May and August 2010, multi-temporal in situ data, CCD images from China's HJ-1 satellite, and RADARSAT-2 images were acquired. Four sensitive parameters were selected as input variables to evaluate TIN: single-band reflectance, a normalized difference spectral index (NDSI), and HV and VH polarizations. The RF algorithm was used to merge the different input variables from the SAR and optical imagery to generate a new dataset (i.e., the TIN outputs). The results showed the temporal-spatial distribution of TIN: values decreased from coastal waters to the open water areas, and TIN values in the northeast were higher than those found in the southwest of the study area. The maximum TIN values occurred in May. Additionally, the accuracy of the TIN estimates was significantly improved when the SAR and optical data were used in combination rather than either data type alone. This study suggests that this method of estimating nutrient fertility in coastal waters by effectively fusing data from multiple sensors is very promising.

  6. Mapping the distributions of C3 and C4 grasses in the mixed-grass prairies of southwest Oklahoma using the Random Forest classification algorithm

    NASA Astrophysics Data System (ADS)

    Yan, Dong; de Beurs, Kirsten M.

    2016-05-01

    The objective of this paper is to demonstrate a new method to map the distributions of C3 and C4 grasses at 30 m resolution and over a 25-year period (1988-2013) by combining the Random Forest (RF) classification algorithm with patch stable areas identified using the spatial pattern analysis software FRAGSTATS. Predictor variables for the RF classifications consisted of ten spectral variables, four soil edaphic variables, and three topographic variables. We provide a confidence score for obtaining pure land cover at each pixel location by retrieving the classification tree votes. Classification accuracy assessments and predictor variable importance evaluations were conducted based on a repeated stratified sampling approach. Results show that patch stable areas obtained from larger patches are more appropriate for use as sample data pools to train and validate RF classifiers for historical land cover mapping, and that it is more reasonable to use patch stable areas as sample pools to map land cover in a year closer to the present than in years further back in time. The percentage of high-confidence prediction pixels across the study area ranges from 71.18% in 1988 to 73.48% in 2013. The repeated stratified sampling approach is necessary to reduce the positive bias in the estimated classification accuracy caused by the possible selection of training and validation pixels from the same patch stable areas. The RF classification algorithm was able to identify the important environmental factors affecting the distributions of C3 and C4 grasses in our study area, such as elevation, soil pH, soil organic matter, and soil texture.
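
    The tree-vote confidence score is directly available in most implementations; a short sketch (scikit-learn's predict_proba averages per-tree probability estimates rather than counting hard votes, which is a close stand-in for the vote fraction):

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    X, y = make_classification(n_samples=400, n_features=6, random_state=0)
    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

    votes = rf.predict_proba(X)            # per-class vote share per pixel
    confidence = votes.max(axis=1)         # confidence of the winning class
    print("share of high-confidence pixels:", (confidence >= 0.9).mean())
    ```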

  7. A Tale of Two “Forests”: Random Forest Machine Learning Aids Tropical Forest Carbon Mapping

    PubMed Central

    Mascaro, Joseph; Asner, Gregory P.; Knapp, David E.; Kennedy-Bowdoin, Ty; Martin, Roberta E.; Anderson, Christopher; Higgins, Mark; Chadwick, K. Dana

    2014-01-01

    Accurate and spatially-explicit maps of tropical forest carbon stocks are needed to implement carbon offset mechanisms such as REDD+ (Reduced Deforestation and Degradation Plus). The Random Forest machine learning algorithm may aid carbon mapping applications using remotely-sensed data. However, Random Forest has never been compared to traditional and potentially more reliable techniques such as regionally stratified sampling and upscaling, and it has rarely been employed with spatial data. Here, we evaluated the performance of Random Forest in upscaling airborne LiDAR (Light Detection and Ranging)-based carbon estimates compared to the stratification approach over a 16-million hectare focal area of the Western Amazon. We considered two runs of Random Forest, both with and without spatial contextual modeling by including—in the latter case—x and y position directly in the model. In each case, we set aside 8 million hectares (i.e., half of the focal area) for validation; this rigorous test of Random Forest went above and beyond the internal validation normally compiled by the algorithm (i.e., called “out-of-bag”), which proved insufficient for this spatial application. In this heterogeneous region of Northern Peru, the model with spatial context was the best performing run of Random Forest, and explained 59% of LiDAR-based carbon estimates within the validation area, compared to 37% for stratification and 43% by Random Forest without spatial context. With the 60% improvement in explained variation, RMSE against validation LiDAR samples improved from 33 to 26 Mg C ha−1 when using Random Forest with spatial context. Our results suggest that spatial context should be considered when using Random Forest, and that doing so may result in substantially improved carbon stock modeling for purposes of climate change mitigation. PMID:24489686

  8. Aggregated Recommendation through Random Forests

    PubMed Central

    2014-01-01

    Aggregated recommendation refers to the process of suggesting one kind of item to a group of users. Compared to user-oriented or item-oriented approaches, it is more general and, therefore, more appropriate for cold-start recommendation. In this paper, we propose a random forest approach to create aggregated recommender systems. The approach is used to predict the rating of a group of users for a kind of item. In the preprocessing stage, we merge user, item, and rating information to construct an aggregated decision table, where rating information serves as the decision attribute. We also model the data conversion process corresponding to the new-user, new-item, and both-new problems. In the training stage, a forest is built for the aggregated training set, where each leaf is assigned a distribution of discrete ratings. In the testing stage, we present four prediction approaches to compute evaluation values based on the distribution of each tree. Experimental results on the well-known MovieLens dataset show that the aggregated approach maintains an acceptable level of accuracy. PMID:25180204

  9. Randomized approximate nearest neighbors algorithm.

    PubMed

    Jones, Peter Wilcox; Osipov, Andrei; Rokhlin, Vladimir

    2011-09-20

    We present a randomized algorithm for the approximate nearest neighbor problem in d-dimensional Euclidean space. Given N points {x_j} in R^d, the algorithm attempts to find k nearest neighbors for each of x_j, where k is a user-specified integer parameter. The algorithm is iterative, and its running time requirements are proportional to T·N·(d·(log d) + k·(d + log k)·(log N)) + N·k^2·(d + log k), with T the number of iterations performed. The memory requirements of the procedure are of the order N·(d + k). A by-product of the scheme is a data structure, permitting a rapid search for the k nearest neighbors among {x_j} for an arbitrary point x ∈ R^d. The cost of each such query is proportional to T·(d·(log d) + log(N/k)·k·(d + log k)), and the memory requirements for the requisite data structure are of the order N·(d + k) + T·(d + N). The algorithm utilizes random rotations and a basic divide-and-conquer scheme, followed by a local graph search. We analyze the scheme's behavior for certain types of distributions of {x_j} and illustrate its performance via several numerical examples.
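
    The published algorithm is considerably more refined, but a toy variant of the rotate, sort, and search-locally idea can be sketched in NumPy (window size and iteration count are illustrative, and duplicate candidates across iterations are not deduplicated):

    ```python
    import numpy as np

    def approx_knn(X, k, T=8, window=3):
        # Each iteration: random rotation, sort along the first rotated
        # coordinate, and consider only nearby points in that ordering
        n, d = X.shape
        best_d = np.full((n, k), np.inf)
        best_i = np.zeros((n, k), dtype=int)
        rng = np.random.default_rng(0)
        for _ in range(T):
            Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random rotation
            order = np.argsort(X @ Q[:, 0])
            for pos, i in enumerate(order):
                lo, hi = max(0, pos - window * k), min(n, pos + window * k + 1)
                cand = order[lo:hi]
                cand = cand[cand != i]
                dist = np.linalg.norm(X[cand] - X[i], axis=1)
                merged_d = np.concatenate([best_d[i], dist])
                merged_i = np.concatenate([best_i[i], cand])
                keep = np.argsort(merged_d)[:k]
                best_d[i], best_i[i] = merged_d[keep], merged_i[keep]
        return best_i

    X = np.random.default_rng(1).normal(size=(500, 10))
    print(approx_knn(X, k=5)[:3])          # candidate neighbour indices
    ```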

  10. Classification of large microarray datasets using fast random forest construction.

    PubMed

    Manilich, Elena A; Özsoyoğlu, Z Meral; Trubachev, Valeriy; Radivoyevitch, Tomas

    2011-04-01

    Random forest is an ensemble classification algorithm. It performs well when most predictive variables are noisy and can be used when the number of variables is much larger than the number of observations. The use of bootstrap samples and restricted subsets of attributes makes it more powerful than simple ensembles of trees. The main advantage of a random forest classifier is its explanatory power: it measures variable importance or impact of each factor on a predicted class label. These characteristics make the algorithm ideal for microarray data. It was shown to build models with high accuracy when tested on high-dimensional microarray datasets. Current implementations of random forest in the machine learning and statistics community, however, limit its usability for mining over large datasets, as they require that the entire dataset remains permanently in memory. We propose a new framework, an optimized implementation of a random forest classifier, which addresses specific properties of microarray data, takes computational complexity of a decision tree algorithm into consideration, and shows excellent computing performance while preserving predictive accuracy. The implementation is based on reducing overlapping computations and eliminating dependency on the size of main memory. The implementation's excellent computational performance makes the algorithm useful for interactive data analyses and data mining.

  11. A random forest classifier for lymph diseases.

    PubMed

    Azar, Ahmad Taher; Elshazly, Hanaa Ismail; Hassanien, Aboul Ella; Elkorany, Abeer Mohamed

    2014-02-01

    Machine learning-based classification techniques provide support for the decision-making process in many areas of health care, including diagnosis, prognosis, screening, etc. Feature selection (FS) is expected to improve classification performance, particularly in situations characterized by the high data dimensionality problem caused by relatively few training examples compared to a large number of measured features. In this paper, a random forest classifier (RFC) approach is proposed to diagnose lymph diseases. Focusing on feature selection, the first stage of the proposed system aims at constructing diverse feature selection algorithms such as genetic algorithm (GA), Principal Component Analysis (PCA), Relief-F, Fisher, Sequential Forward Floating Search (SFFS) and the Sequential Backward Floating Search (SBFS) for reducing the dimension of lymph diseases dataset. Switching from feature selection to model construction, in the second stage, the obtained feature subsets are fed into the RFC for efficient classification. It was observed that GA-RFC achieved the highest classification accuracy of 92.2%. The dimension of input feature space is reduced from eighteen to six features by using GA. PMID:24290902

  12. Random forests for classification in ecology.

    PubMed

    Cutler, D Richard; Edwards, Thomas C; Beard, Karen H; Cutler, Adele; Hess, Kyle T; Gibson, Jacob; Lawler, Joshua J

    2007-11-01

    Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature.

  13. Random forests for classification in ecology

    USGS Publications Warehouse

    Cutler, D.R.; Edwards, T.C.; Beard, K.H.; Cutler, A.; Hess, K.T.; Gibson, J.; Lawler, J.J.

    2007-01-01

    Classification procedures are some of the most widely used statistical methods in ecology. Random forests (RF) is a new and powerful statistical classifier that is well established in other disciplines but is relatively unknown in ecology. Advantages of RF compared to other statistical classifiers include (1) very high classification accuracy; (2) a novel method of determining variable importance; (3) ability to model complex interactions among predictor variables; (4) flexibility to perform several types of statistical data analysis, including regression, classification, survival analysis, and unsupervised learning; and (5) an algorithm for imputing missing values. We compared the accuracies of RF and four other commonly used statistical classifiers using data on invasive plant species presence in Lava Beds National Monument, California, USA, rare lichen species presence in the Pacific Northwest, USA, and nest sites for cavity nesting birds in the Uinta Mountains, Utah, USA. We observed high classification accuracy in all applications as measured by cross-validation and, in the case of the lichen data, by independent test data, when comparing RF to other common classification methods. We also observed that the variables that RF identified as most important for classifying invasive plant species coincided with expectations based on the literature. © 2007 by the Ecological Society of America.

  14. missForest: Nonparametric missing value imputation using random forest

    NASA Astrophysics Data System (ADS)

    Stekhoven, Daniel J.

    2015-05-01

    missForest imputes missing values particularly in the case of mixed-type data. It uses a random forest trained on the observed values of a data matrix to predict the missing values. It can be used to impute continuous and/or categorical data including complex interactions and non-linear relations. It yields an out-of-bag (OOB) imputation error estimate without the need of a test set or elaborate cross-validation and can be run in parallel to save computation time. missForest has been used to, among other things, impute variable star colors in an All-Sky Automated Survey (ASAS) dataset of variable stars with no NOMAD match.
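
    missForest itself is an R package; the following is a rough Python analogue of the same idea, assuming scikit-learn's IterativeImputer with a random forest regressor and continuous placeholder data (the real package also handles categorical variables and reports an OOB error estimate).

      import numpy as np
      from sklearn.experimental import enable_iterative_imputer  # noqa: F401
      from sklearn.impute import IterativeImputer
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(2)
      X = rng.normal(size=(200, 4))
      X_missing = X.copy()
      X_missing[rng.random(X.shape) < 0.2] = np.nan    # ~20% missing at random

      # Regress each incomplete variable on the others with a random forest and
      # fill the gaps with its predictions, iterating until convergence.
      imputer = IterativeImputer(
          estimator=RandomForestRegressor(n_estimators=100, random_state=2),
          max_iter=10, random_state=2)
      X_imputed = imputer.fit_transform(X_missing)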

  15. Randomized Algorithms for Matrices and Data

    NASA Astrophysics Data System (ADS)

    Mahoney, Michael W.

    2012-03-01

    This chapter reviews recent work on randomized matrix algorithms. By “randomized matrix algorithms,” we refer to a class of recently developed random sampling and random projection algorithms for ubiquitous linear algebra problems such as least-squares (LS) regression and low-rank matrix approximation. These developments have been driven by applications in large-scale data analysis—applications which place very different demands on matrices than traditional scientific computing applications. Thus, in this review, we will focus on highlighting the simplicity and generality of several core ideas that underlie the usefulness of these randomized algorithms in scientific applications such as genetics (where these algorithms have already been applied) and astronomy (where, hopefully, in part due to this review they will soon be applied). The work we will review here had its origins within theoretical computer science (TCS). An important feature in the use of randomized algorithms in TCS more generally is that one must identify and then algorithmically deal with relevant “nonuniformity structure” in the data. For the randomized matrix algorithms to be reviewed here and that have proven useful recently in numerical linear algebra (NLA) and large-scale data analysis applications, the relevant nonuniformity structure is defined by the so-called statistical leverage scores. Defined more precisely below, these leverage scores are basically the diagonal elements of the projection matrix onto the dominant part of the spectrum of the input matrix. As such, they have a long history in statistical data analysis, where they have been used for outlier detection in regression diagnostics. More generally, these scores often have a very natural interpretation in terms of the data and processes generating the data. For example, they can be interpreted in terms of the leverage or influence that a given data point has on, say, the best low-rank matrix approximation; and this
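
    A small numpy sketch of the statistical leverage scores described above, computed as squared row norms of the left singular vectors, followed by the leverage-based row sampling used in randomized least-squares algorithms; the matrix and sample size are arbitrary placeholders.

      import numpy as np

      rng = np.random.default_rng(3)
      A = rng.normal(size=(1000, 20))      # tall matrix, n >> d

      # Leverage scores: squared row norms of the left singular vectors,
      # i.e. the diagonal of the projection onto A's column space.
      U, _, _ = np.linalg.svd(A, full_matrices=False)
      leverage = (U ** 2).sum(axis=1)

      # Sample rows proportionally to leverage and rescale, the core step of
      # many randomized least-squares and low-rank approximation algorithms.
      p = leverage / leverage.sum()
      idx = rng.choice(A.shape[0], size=100, replace=True, p=p)
      A_sketch = A[idx] / np.sqrt(100 * p[idx, None])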

  16. Structure damage detection based on random forest recursive feature elimination

    NASA Astrophysics Data System (ADS)

    Zhou, Qifeng; Zhou, Hao; Zhou, Qingqing; Yang, Fan; Luo, Linkai

    2014-05-01

    Feature extraction is a key early step in structural damage detection. In this paper, a structural damage detection method based on wavelet packet decomposition (WPD) and random forest recursive feature elimination (RF-RFE) is proposed. In order to obtain the most effective feature subset and improve identification accuracy, a two-stage feature selection method is adopted after WPD. First, the damage features are ranked according to the original random forest variable importance analysis. Second, RF-RFE eliminates the least important feature and re-ranks the feature list at each iteration, producing a new feature importance sequence. Finally, the k-nearest neighbor (KNN) algorithm is used as a benchmark classifier to evaluate the extracted feature subset. A four-storey steel shear building model is chosen as an example for method verification. The experimental results show that the smaller feature subset obtained by the proposed method achieves higher identification accuracy and reduces detection time.
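
    A compact sketch of the two-stage selection under stated assumptions: scikit-learn's RFE driven by random forest importances (re-ranked with a fresh forest each elimination round) and a KNN benchmark, with random placeholder features standing in for the wavelet packet energies.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.feature_selection import RFE
      from sklearn.model_selection import cross_val_score
      from sklearn.neighbors import KNeighborsClassifier

      rng = np.random.default_rng(4)
      X = rng.normal(size=(240, 64))    # placeholder wavelet packet energies
      y = rng.integers(0, 4, size=240)  # placeholder damage states

      # Drop the least important feature and re-rank with a fresh forest each round.
      selector = RFE(RandomForestClassifier(n_estimators=200, random_state=4),
                     n_features_to_select=8, step=1).fit(X, y)

      # Evaluate the selected subset with the KNN benchmark classifier.
      score = cross_val_score(KNeighborsClassifier(n_neighbors=5),
                              X[:, selector.support_], y, cv=5).mean()
      print(score)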

  17. RandomForest4Life: a Random Forest for predicting ALS disease progression.

    PubMed

    Hothorn, Torsten; Jung, Hans H

    2014-09-01

    We describe a method for predicting disease progression in amyotrophic lateral sclerosis (ALS) patients. The method was developed as a submission to the DREAM Phil Bowen ALS Prediction Prize4Life Challenge of summer 2012. Based on repeated patient examinations over a three-month period, we used a random forest algorithm to predict future disease progression. The procedure was set up and internally evaluated using data from 1197 ALS patients. External validation by an expert jury was based on undisclosed information of an additional 625 patients; all patient data were obtained from the PRO-ACT database. In terms of prediction accuracy, the approach described here ranked third best. Our interpretation of the prediction model confirmed previous reports suggesting that past disease progression is a strong predictor of future disease progression measured on the ALS functional rating scale (ALSFRS). We also found that larger variability in initial ALSFRS scores is linked to faster future disease progression. The results reported here furthermore suggested that approaches taking the multidimensionality of the ALSFRS into account promise some potential for improved ALS disease prediction.

  18. Random rotation survival forest for high dimensional censored data.

    PubMed

    Zhou, Lifeng; Wang, Hong; Xu, Qingsong

    2016-01-01

    Recently, rotation forest has been extended to regression and survival analysis problems. However, due to the intensive computation incurred by principal component analysis, rotation forest often fails when high-dimensional or big data are confronted. In this study, we extend rotation forest to high-dimensional censored time-to-event data analysis by combining random subspace, bagging and rotation forest. Supported by proper statistical analysis, we show that the proposed method, random rotation survival forest, outperforms state-of-the-art survival ensembles such as random survival forest and popular regularized Cox models. PMID:27625979

  19. Random Bits Forest: a Strong Classifier/Regressor for Big Data

    PubMed Central

    Wang, Yi; Li, Yi; Pu, Weilin; Wen, Kathryn; Shugart, Yin Yao; Xiong, Momiao; Jin, Li

    2016-01-01

    Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ~10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). The algorithm also performed highly in testing with an independent data set, a real psoriasis genome-wide association study (GWAS). PMID:27444562
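
    A heavily simplified sketch of the idea, assuming numpy and scikit-learn: outputs of small random 3-layer networks serve as features for a random forest. The gradient-boosting scheme the authors use to generate and select ~10,000 networks is omitted here; the networks below are purely random.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(5)
      X = rng.normal(size=(500, 10))
      y = (np.sin(X[:, 0]) + X[:, 1] ** 2 > 1).astype(int)

      def random_net_features(X, n_nets=200, hidden=8):
          # Outputs of small random 3-layer (input-hidden-output) networks.
          feats = []
          for _ in range(n_nets):
              W1 = rng.normal(size=(X.shape[1], hidden))
              W2 = rng.normal(size=(hidden, 1))
              feats.append(np.tanh(np.tanh(X @ W1) @ W2))
          return np.hstack(feats)

      Z = random_net_features(X)        # the "random bits"
      rf = RandomForestClassifier(n_estimators=300, random_state=5).fit(Z, y)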

  20. Random Bits Forest: a Strong Classifier/Regressor for Big Data.

    PubMed

    Wang, Yi; Li, Yi; Pu, Weilin; Wen, Kathryn; Shugart, Yin Yao; Xiong, Momiao; Jin, Li

    2016-01-01

    Efficiency, memory consumption, and robustness are common problems with many popular methods for data analysis. As a solution, we present Random Bits Forest (RBF), a classification and regression algorithm that integrates neural networks (for depth), boosting (for width), and random forests (for prediction accuracy). Through a gradient boosting scheme, it first generates and selects ~10,000 small, 3-layer random neural networks. These networks are then fed into a modified random forest algorithm to obtain predictions. Testing with datasets from the UCI (University of California, Irvine) Machine Learning Repository shows that RBF outperforms other popular methods in both accuracy and robustness, especially with large datasets (N > 1000). The algorithm also performed highly in testing with an independent data set, a real psoriasis genome-wide association study (GWAS). PMID:27444562

  1. Ancestry assessment using random forest modeling.

    PubMed

    Hefner, Joseph T; Spradley, M Kate; Anderson, Bruce

    2014-05-01

    A skeletal assessment of ancestry relies on morphoscopic traits and skeletal measurements. Using a sample of American Black (n = 38), American White (n = 39), and Southwest Hispanics (n = 72), the present study investigates whether these data provide similar biological information and combines both data types into a single classification using a random forest model (RFM). Our results indicate that both data types provide similar information concerning the relationships among population groups. Also, by combining both in an RFM, the correct allocation of ancestry for an unknown cranium increases. The distribution of cross-validated grouped cases correctly classified using discriminant analyses and RFMs ranges between 75.4% (discriminant function analysis, morphoscopic data only) and 89.6% (RFM). Unlike the traditional, experience-based approach using morphoscopic traits, the inclusion of both data types in a single analysis is a quantifiable approach accounting for more variation within and between groups, reducing misclassification rates, and capturing aspects of cranial shape, size, and morphology.

  2. Photometric classification of quasars from RCS-2 using Random Forest

    NASA Astrophysics Data System (ADS)

    Carrasco, D.; Barrientos, L. F.; Pichara, K.; Anguita, T.; Murphy, D. N. A.; Gilbank, D. G.; Gladders, M. D.; Yee, H. K. C.; Hsieh, B. C.; López, S.

    2015-12-01

    The classification and identification of quasars is fundamental to many astronomical research areas. Given the large volume of photometric survey data available in the near future, automated methods for doing so are required. In this article, we present a new quasar candidate catalog from the Red-Sequence Cluster Survey 2 (RCS-2), identified solely from photometric information using an automated algorithm suitable for large surveys. The algorithm performance is tested using a well-defined SDSS spectroscopic sample of quasars and stars. The Random Forest algorithm constructs the catalog from RCS-2 point sources using SDSS spectroscopically-confirmed stars and quasars. The algorithm identifies putative quasars from broadband magnitudes (g, r, i, z) and colors. Exploiting NUV GALEX measurements for a subset of the objects, we refine the classifier by adding new information. An additional subset of the data with WISE W1 and W2 bands is also studied. Upon analyzing 542 897 RCS-2 point sources, the algorithm identified 21 501 quasar candidates with a training-set-derived precision (the fraction of true positives within the group assigned quasar status) of 89.5% and recall (the fraction of true positives relative to all sources that actually are quasars) of 88.4%. These performance metrics improve for the GALEX subset: 6529 quasar candidates are identified from 16 898 sources, with a precision and recall of 97.0% and 97.5%, respectively. Algorithm performance is further improved when WISE data are included, with precision and recall increasing to 99.3% and 99.1%, respectively, for 21 834 quasar candidates from 242 902 sources. We compiled our final catalog (38 257) by merging these samples and removing duplicates. An observational follow-up of 17 bright (r < 19) candidates with long-slit spectroscopy at the DuPont telescope (LCO) yields 14 confirmed quasars. The results signal encouraging progress in the classification of point sources with Random Forest algorithms to search
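
    A schematic of the photometric classification step under stated assumptions (scikit-learn, synthetic magnitudes in place of RCS-2/SDSS photometry): broadband magnitudes plus adjacent-band colors feed a random forest, and precision and recall are computed as defined in the abstract.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import precision_score, recall_score
      from sklearn.model_selection import train_test_split

      rng = np.random.default_rng(6)
      mags = rng.normal(20.0, 1.0, size=(2000, 4))     # placeholder g, r, i, z
      colors = np.diff(mags, axis=1)                   # adjacent-band colors
      X = np.hstack([mags, colors])
      y = rng.integers(0, 2, size=2000)                # 1 = quasar, 0 = star

      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=6)
      rf = RandomForestClassifier(n_estimators=500, random_state=6).fit(X_tr, y_tr)
      pred = rf.predict(X_te)
      print("precision:", precision_score(y_te, pred))
      print("recall:   ", recall_score(y_te, pred))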

  3. Improving protein fold recognition by random forest

    PubMed Central

    2014-01-01

    Background Recognizing the correct structural fold among known template protein structures for a target protein (i.e. fold recognition) is essential for template-based protein structure modeling. Since the fold recognition problem can be defined as a binary classification problem of predicting whether or not the unknown fold of a target protein is similar to an already known template protein structure in a library, machine learning methods have been effectively applied to tackle this problem. In our work, we developed RF-Fold that uses random forest - one of the most powerful and scalable machine learning classification methods - to recognize protein folds. Results RF-Fold consists of hundreds of decision trees that can be trained efficiently on very large datasets to make accurate predictions on a highly imbalanced dataset. We evaluated RF-Fold on the standard Lindahl benchmark dataset comprised of 976 × 975 target-template protein pairs through cross-validation. Compared with 17 different fold recognition methods, the performance of RF-Fold is generally comparable to the best performance at different difficulty levels, ranging from the easiest family level, through the medium-hard superfamily level, to the hardest fold level. Based on the top-one template protein ranked by RF-Fold, the correct recognition rate is 84.5%, 63.4%, and 40.8% at family, superfamily, and fold levels, respectively. Based on the top-five template protein folds ranked by RF-Fold, the correct recognition rate increases to 91.5%, 79.3% and 58.3% at family, superfamily, and fold levels. Conclusions The good performance achieved by RF-Fold demonstrates the random forest's effectiveness for protein fold recognition. PMID:25350499

  4. Random forest automated supervised classification of Hipparcos periodic variable stars

    NASA Astrophysics Data System (ADS)

    Dubath, P.; Rimoldini, L.; Süveges, M.; Blomme, J.; López, M.; Sarro, L. M.; De Ridder, J.; Cuypers, J.; Guy, L.; Lecoeur, I.; Nienartowicz, K.; Jan, A.; Beck, M.; Mowlavi, N.; De Cat, P.; Lebzelter, T.; Eyer, L.

    2011-07-01

    We present an evaluation of the performance of an automated classification of the Hipparcos periodic variable stars into 26 types. The sub-sample with the most reliable variability types available in the literature is used to train supervised algorithms to characterize the type dependencies on a number of attributes. The most useful attributes evaluated with the random forest methodology include, in decreasing order of importance, the period, the amplitude, the V-I colour index, the absolute magnitude, the residual around the folded light-curve model, the magnitude distribution skewness and the amplitude of the second harmonic of the Fourier series model relative to that of the fundamental frequency. Random forests and a multi-stage scheme involving Bayesian network and Gaussian mixture methods lead to statistically equivalent results. In standard 10-fold cross-validation (CV) experiments, the rate of correct classification is between 90 and 100 per cent, depending on the variability type. The main mis-classification cases, up to a rate of about 10 per cent, arise due to confusion between SPB and ACV blue variables and between eclipsing binaries, ellipsoidal variables and other variability types. Our training set and the predicted types for the other Hipparcos periodic stars are available online.
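
    A minimal sketch of the 10-fold cross-validation protocol with per-type correct-classification rates, assuming scikit-learn and placeholder attributes in place of the period, amplitude, colour index and other light-curve features named above.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.metrics import confusion_matrix
      from sklearn.model_selection import cross_val_predict

      rng = np.random.default_rng(7)
      # Placeholder attributes: period, amplitude, V-I colour, absolute magnitude, ...
      X = rng.normal(size=(600, 7))
      y = rng.integers(0, 5, size=600)  # placeholder variability types

      pred = cross_val_predict(
          RandomForestClassifier(n_estimators=500, random_state=7), X, y, cv=10)
      cm = confusion_matrix(y, pred)
      print(cm.diagonal() / cm.sum(axis=1))   # per-type correct-classification rate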

  6. A parallel algorithm for random searches

    NASA Astrophysics Data System (ADS)

    Wosniack, M. E.; Raposo, E. P.; Viswanathan, G. M.; da Luz, M. G. E.

    2015-11-01

    We discuss a parallelization procedure for a two-dimensional random search of a single individual, a typical sequential process. To assure the same features of the sequential random search in the parallel version, we analyze the spatial patterns of the encountered targets in the sequential case for different search strategies and densities of homogeneously distributed targets. We identify a lognormal tendency for the distribution of distances between consecutively detected targets. Then, by assigning the distinct mean and standard deviation of this distribution for each corresponding configuration in the parallel simulations (constituted by parallel random walkers), we are able to recover important statistical properties, e.g., the target detection efficiency, of the original problem. The proposed parallel approach presents a speedup of nearly one order of magnitude compared with the sequential implementation. This algorithm can be easily adapted to different instances, such as searches in three dimensions. Its possible range of applicability covers problems in areas as diverse as automated computer searches in high-capacity databases and animal foraging.
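
    A toy sketch of the parallelization idea under stated assumptions: inter-target distances for each parallel walker are drawn from a lognormal distribution whose parameters (placeholders below) would be fitted to the sequential search.

      import numpy as np

      rng = np.random.default_rng(8)
      mu, sigma = 1.2, 0.8      # placeholder lognormal parameters from the sequential run

      def parallel_walkers(n_walkers, n_targets):
          # Each walker draws its inter-target distances from the fitted
          # lognormal, so the ensemble reproduces the sequential statistics.
          share = n_targets // n_walkers
          hops = rng.lognormal(mean=mu, sigma=sigma, size=(n_walkers, share))
          return hops.sum(axis=1)   # total distance travelled per walker

      print(parallel_walkers(8, 400))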

  7. Patch forest: a hybrid framework of random forest and patch-based segmentation

    NASA Astrophysics Data System (ADS)

    Xie, Zhongliu; Gillies, Duncan

    2016-03-01

    The development of an accurate, robust and fast segmentation algorithm has long been a research focus in medical computer vision. State-of-the-art practices often involve non-rigidly registering a target image with a set of training atlases for label propagation over the target space to perform segmentation, a.k.a. multi-atlas label propagation (MALP). In recent years, the patch-based segmentation (PBS) framework has gained wide attention due to its advantage of relaxing the strict voxel-to-voxel correspondence to a series of pair-wise patch comparisons for contextual pattern matching. Despite a high accuracy reported in many scenarios, computational efficiency has consistently been a major obstacle for both approaches. Inspired by recent work on random forest, in this paper we propose a patch forest approach, which by equipping the conventional PBS with a fast patch search engine, is able to boost segmentation speed significantly while retaining an equal level of accuracy. In addition, a fast forest training mechanism is also proposed, with the use of a dynamic grid framework to efficiently approximate data compactness computation and a 3D integral image technique for fast box feature retrieval.

  8. Random Forest (RF) Wrappers for Waveband Selection and Classification of Hyperspectral Data.

    PubMed

    Poona, Nitesh Keshavelal; van Niekerk, Adriaan; Nadel, Ryan Leslie; Ismail, Riyad

    2016-02-01

    Hyperspectral data collected using a field spectroradiometer was used to model asymptomatic stress in Pinus radiata and Pinus patula seedlings infected with the pathogen Fusarium circinatum. Spectral data were analyzed using the random forest algorithm. To improve the classification accuracy of the model, subsets of wavebands were selected using three feature selection algorithms: (1) Boruta; (2) recursive feature elimination (RFE); and (3) area under the receiver operating characteristic curve of the random forest (AUC-RF). Results highlighted the robustness of the above feature selection methods when used in conjunction with the random forest algorithm for analyzing hyperspectral data. Overall, the Boruta feature selection algorithm provided the best results. When discriminating F. circinatum stress in Pinus radiata seedlings, Boruta selected wavebands (n = 69) yielded the best overall classification accuracies (training error of 17.00%, independent test error of 17.00% and an AUC value of 0.91). Classification results were, however, significantly lower for P. patula seedlings, with a training error of 24.00%, independent test error of 38.00%, and an AUC value of 0.65. A hybrid selection method that utilizes combinations of wavebands selected from the three feature selection algorithms was also tested. The hybrid method showed an improvement in classification accuracies for P. patula, and no improvement for P. radiata. The results of this study provide impetus towards implementing a hyperspectral framework for detecting stress within nursery environments.

  9. Using Random Forest Models to Predict Organizational Violence

    NASA Technical Reports Server (NTRS)

    Levine, Burton; Bobashev, Georgly

    2012-01-01

    We present a methodology to assess the proclivity of an organization to commit violence against nongovernment personnel. We fitted a Random Forest model using the Minority at Risk Organizational Behavior (MAROS) dataset. The MAROS data are longitudinal, so individual observations are not independent. We propose a modification to the standard Random Forest methodology to account for the violation of the independence assumption. We present the results of the model fit, an example of predicting violence for an organization; and finally, we present a summary of the forest in a "meta-tree,"

  10. Random forest construction with robust semisupervised node splitting.

    PubMed

    Liu, Xiao; Song, Mingli; Tao, Dacheng; Liu, Zicheng; Zhang, Luming; Chen, Chun; Bu, Jiajun

    2015-01-01

    Random forest (RF) is a very important classifier with applications in various machine learning tasks, but its promising performance heavily relies on the size of labeled training data. In this paper, we investigate the construction of RFs with a small amount of labeled data and find that the performance bottleneck is located in the node splitting procedures; hence, existing solutions fail to properly partition the feature space if there are insufficient training data. To achieve robust node splitting with insufficient data, we present semisupervised splitting to overcome this limitation by splitting nodes with the guidance of both labeled and abundant unlabeled data. In particular, an accurate quality measure of node splitting is obtained by carrying out kernel-based density estimation, whereby a multiclass version of the asymptotic mean integrated squared error criterion is proposed to adaptively select the optimal bandwidth of the kernel. To avoid the curse of dimensionality, we project the data points from the original high-dimensional feature space onto a low-dimensional subspace before estimation. A unified optimization framework is proposed to select a coupled pair of subspace and separating hyperplane such that the smoothness of the subspace and the quality of the splitting are guaranteed simultaneously. Our algorithm efficiently avoids overfitting caused by bad initialization and local maxima when compared with conventional margin maximization-based semisupervised methods. We demonstrate the effectiveness of the proposed algorithm by comparing it with state-of-the-art supervised and semisupervised algorithms for typical computer vision applications, such as object categorization, face recognition, and image segmentation, on publicly available data sets. PMID:25494503

  11. CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data.

    PubMed

    Bressler, Ryan; Kreisberg, Richard B; Bernard, Brady; Niederhuber, John E; Vockley, Joseph G; Shmulevich, Ilya; Knijnenburg, Theo A

    2015-01-01

    Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high performance implementations. We describe CloudForest, a Random Forest package written in Go, which is particularly well suited for large, heterogeneous, genetic and biomedical datasets. CloudForest includes several extensions, such as dealing with unbalanced classes and missing values. Its flexible design enables users to easily implement additional extensions. CloudForest achieves fast running times through effective use of the CPU cache, optimization for different classes of features, and efficient multi-threading. https://github.com/ilyalab/CloudForest. PMID:26679347

  12. CloudForest: A Scalable and Efficient Random Forest Implementation for Biological Data

    PubMed Central

    Bressler, Ryan; Kreisberg, Richard B.; Bernard, Brady; Niederhuber, John E.; Vockley, Joseph G.; Shmulevich, Ilya; Knijnenburg, Theo A.

    2015-01-01

    Random Forest has become a standard data analysis tool in computational biology. However, extensions to existing implementations are often necessary to handle the complexity of biological datasets and their associated research questions. The growing size of these datasets requires high performance implementations. We describe CloudForest, a Random Forest package written in Go, which is particularly well suited for large, heterogeneous, genetic and biomedical datasets. CloudForest includes several extensions, such as dealing with unbalanced classes and missing values. Its flexible design enables users to easily implement additional extensions. CloudForest achieves fast running times through effective use of the CPU cache, optimization for different classes of features, and efficient multi-threading. https://github.com/ilyalab/CloudForest. PMID:26679347

  13. Decoherence in optimized quantum random-walk search algorithm

    NASA Astrophysics Data System (ADS)

    Zhang, Yu-Chao; Bao, Wan-Su; Wang, Xiang; Fu, Xiang-Qun

    2015-08-01

    This paper investigates the effects of decoherence generated by broken-link-type noise in the hypercube on an optimized quantum random-walk search algorithm. When random broken links occur in the hypercube, the optimized quantum random-walk search algorithm with decoherence is modeled by defining a shift operator that includes the possibility of broken links. For a given database size, we obtain the maximum success rate of the algorithm and the required number of iterations through numerical simulations and analysis when the algorithm is subject to decoherence. The computational complexity of the algorithm with decoherence is then obtained. The results show that the ultimate effect of broken-link-type decoherence on the optimized quantum random-walk search algorithm is negative. Project supported by the National Basic Research Program of China (Grant No. 2013CB338002).

  14. Modeling X Chromosome Data Using Random Forests: Conquering Sex Bias.

    PubMed

    Winham, Stacey J; Jenkins, Gregory D; Biernacka, Joanna M

    2016-02-01

    Machine learning methods, including Random Forests (RF), are increasingly used for genetic data analysis. However, the standard RF algorithm does not correctly model the effects of X chromosome single nucleotide polymorphisms (SNPs), leading to biased estimates of variable importance. We propose extensions of RF to correctly model X SNPs, including a stratified approach and an approach based on the process of X chromosome inactivation. We applied the new and standard RF approaches to case-control alcohol dependence data from the Study of Addiction: Genes and Environment (SAGE), and compared the performance of the alternative approaches via a simulation study. Standard RF applied to a case-control study of alcohol dependence yielded inflated variable importance estimates for X SNPs, even when sex was included as a variable, but the results of the new RF methods were consistent with univariate regression-based approaches that correctly model X chromosome data. Simulations showed that the new RF methods eliminate the bias in standard RF variable importance for X SNPs when sex is associated with the trait, and are able to detect causal autosomal and X SNPs. Even in the absence of sex effects, the new extensions perform similarly to standard RF. Thus, we provide a powerful multimarker approach for genetic analysis that accommodates X chromosome data in an unbiased way. This method is implemented in the freely available R package "snpRF" (http://www.cran.r-project.org/web/packages/snpRF/). PMID:26639183
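
    One plausible reading of the stratified approach, sketched with scikit-learn on synthetic genotypes: a separate forest per sex, with variable importances averaged, so that X-chromosome coding is never compared across sexes. This is an illustration, not the snpRF implementation.

      import numpy as np
      from sklearn.ensemble import RandomForestClassifier

      rng = np.random.default_rng(9)
      X = rng.integers(0, 3, size=(400, 20)).astype(float)  # genotypes coded 0/1/2
      sex = rng.integers(0, 2, size=400)                    # 0 = male, 1 = female
      y = rng.integers(0, 2, size=400)                      # case/control status

      # Grow one forest per sex and average the importance vectors, so the
      # X-chromosome coding is never mixed across sexes.
      importances = []
      for s in (0, 1):
          rf = RandomForestClassifier(n_estimators=300, random_state=9)
          rf.fit(X[sex == s], y[sex == s])
          importances.append(rf.feature_importances_)
      stratified_importance = np.mean(importances, axis=0)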

  16. Global patterns and predictions of seafloor biomass using random forests.

    PubMed

    Wei, Chih-Lin; Rowe, Gilbert T; Escobar-Briones, Elva; Boetius, Antje; Soltwedel, Thomas; Caley, M Julian; Soliman, Yousria; Huettmann, Falk; Qu, Fangyuan; Yu, Zishan; Pitcher, C Roland; Haedrich, Richard L; Wicksten, Mary K; Rex, Michael A; Baguley, Jeffrey G; Sharma, Jyotsna; Danovaro, Roberto; MacDonald, Ian R; Nunnally, Clifton C; Deming, Jody W; Montagna, Paul; Lévesque, Mélanie; Weslawski, Jan Marcin; Wlodarska-Kowalczuk, Maria; Ingole, Baban S; Bett, Brian J; Billett, David S M; Yool, Andrew; Bluhm, Bodil A; Iken, Katrin; Narayanaswamy, Bhavani E

    2010-01-01

    A comprehensive seafloor biomass and abundance database has been constructed from 24 oceanographic institutions worldwide within the Census of Marine Life (CoML) field projects. The machine-learning algorithm, Random Forests, was employed to model and predict seafloor standing stocks from surface primary production, water-column integrated and export particulate organic matter (POM), seafloor relief, and bottom water properties. The predictive models explain 63% to 88% of stock variance among the major size groups. Individual and composite maps of predicted global seafloor biomass and abundance are generated for bacteria, meiofauna, macrofauna, and megafauna (invertebrates and fishes). Patterns of benthic standing stocks were positive functions of surface primary production and delivery of the particulate organic carbon (POC) flux to the seafloor. At a regional scale, the census maps illustrate that integrated biomass is highest at the poles, on continental margins associated with coastal upwelling and with broad zones associated with equatorial divergence. Lowest values are consistently encountered on the central abyssal plains of major ocean basins. The shift of biomass dominance groups with depth is shown to be affected by the decrease in average body size rather than abundance, presumably due to a decrease in the quantity and quality of the food supply. This biomass census and associated maps are vital components of mechanistic deep-sea food web models and global carbon cycling, and as such provide fundamental information that can be incorporated into evidence-based management. PMID:21209928

  17. Global Patterns and Predictions of Seafloor Biomass Using Random Forests

    PubMed Central

    Wei, Chih-Lin; Rowe, Gilbert T.; Escobar-Briones, Elva; Boetius, Antje; Soltwedel, Thomas; Caley, M. Julian; Soliman, Yousria; Huettmann, Falk; Qu, Fangyuan; Yu, Zishan; Pitcher, C. Roland; Haedrich, Richard L.; Wicksten, Mary K.; Rex, Michael A.; Baguley, Jeffrey G.; Sharma, Jyotsna; Danovaro, Roberto; MacDonald, Ian R.; Nunnally, Clifton C.; Deming, Jody W.; Montagna, Paul; Lévesque, Mélanie; Weslawski, Jan Marcin; Wlodarska-Kowalczuk, Maria; Ingole, Baban S.; Bett, Brian J.; Billett, David S. M.; Yool, Andrew; Bluhm, Bodil A.; Iken, Katrin; Narayanaswamy, Bhavani E.

    2010-01-01

    A comprehensive seafloor biomass and abundance database has been constructed from 24 oceanographic institutions worldwide within the Census of Marine Life (CoML) field projects. The machine-learning algorithm, Random Forests, was employed to model and predict seafloor standing stocks from surface primary production, water-column integrated and export particulate organic matter (POM), seafloor relief, and bottom water properties. The predictive models explain 63% to 88% of stock variance among the major size groups. Individual and composite maps of predicted global seafloor biomass and abundance are generated for bacteria, meiofauna, macrofauna, and megafauna (invertebrates and fishes). Patterns of benthic standing stocks were positive functions of surface primary production and delivery of the particulate organic carbon (POC) flux to the seafloor. At a regional scale, the census maps illustrate that integrated biomass is highest at the poles, on continental margins associated with coastal upwelling and with broad zones associated with equatorial divergence. Lowest values are consistently encountered on the central abyssal plains of major ocean basins. The shift of biomass dominance groups with depth is shown to be affected by the decrease in average body size rather than abundance, presumably due to a decrease in the quantity and quality of the food supply. This biomass census and associated maps are vital components of mechanistic deep-sea food web models and global carbon cycling, and as such provide fundamental information that can be incorporated into evidence-based management. PMID:21209928

  18. Experimental implementation of the quantum random-walk algorithm

    SciTech Connect

    Du Jiangfeng; Li Hui; Shi Mingjun; Zhou Xianyi; Han Rongdian; Xu Xiaodong; Wu Jihui

    2003-04-01

    The quantum random walk is a possible approach to construct quantum algorithms. Several groups have investigated the quantum random walk and experimental schemes were proposed. In this paper, we present the experimental implementation of the quantum random-walk algorithm on a nuclear-magnetic-resonance quantum computer. We observe that the quantum walk is in sharp contrast to its classical counterpart. In particular, the properties of the quantum walk depend strongly on quantum entanglement.

  19. Adapting GNU random forest program for Unix and Windows

    NASA Astrophysics Data System (ADS)

    Jirina, Marcel; Krayem, M. Said; Jirina, Marcel, Jr.

    2013-10-01

    The Random Forest is a well-known method, and also a program, for data clustering and classification. Unfortunately, the original Random Forest program is rather difficult to use. Here we describe a new version of this program, originally written in Fortran 77. The modified program, in Fortran 95, needs to be compiled only once, and information for different tasks is passed with the help of arguments. The program was tested with 24 data sets from the UCI MLR, and results are available on the net.

  20. Topics in Randomized Algorithms for Numerical Linear Algebra

    NASA Astrophysics Data System (ADS)

    Holodnak, John T.

    In this dissertation, we present results for three topics in randomized algorithms. Each topic is related to random sampling. We begin by studying a randomized algorithm for matrix multiplication that randomly samples outer products. We show that if a set of deterministic conditions is satisfied, then the algorithm can compute the exact product. In addition, we show probabilistic bounds on the two-norm relative error of the algorithm. In the second part, we discuss the sensitivity of leverage scores to perturbations. Leverage scores are scalar quantities that give a notion of importance to the rows of a matrix. They are used as sampling probabilities in many randomized algorithms. We show bounds on the difference between the leverage scores of a matrix and a perturbation of the matrix. In the last part, we approximate functions over an active subspace of parameters. To identify the active subspace, we apply an algorithm that relies on a random sampling scheme. We show bounds on the accuracy of the active subspace identification algorithm and construct an approximation to a function with 3556 parameters using a ten-dimensional active subspace.
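
    A short numpy sketch of the outer-product sampling algorithm analyzed in the first part, under the usual norm-proportional sampling probabilities; the matrix sizes and the number of sampled outer products are placeholders.

      import numpy as np

      rng = np.random.default_rng(10)
      A = rng.normal(size=(50, 200))
      B = rng.normal(size=(200, 40))

      # Sample c of the 200 outer products A[:, k] B[k, :] with probability
      # proportional to their norms; rescaling makes the sum unbiased for AB.
      norms = np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=1)
      p = norms / norms.sum()
      c = 60
      idx = rng.choice(200, size=c, replace=True, p=p)
      AB_est = sum(np.outer(A[:, k], B[k, :]) / (c * p[k]) for k in idx)

      print(np.linalg.norm(AB_est - A @ B) / np.linalg.norm(A @ B))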

  1. Genetic algorithms as global random search methods

    NASA Technical Reports Server (NTRS)

    Peck, Charles C.; Dhawan, Atam P.

    1995-01-01

    Genetic algorithm behavior is described in terms of the construction and evolution of the sampling distributions over the space of candidate solutions. This novel perspective is motivated by analysis indicating that the schema theory is inadequate for completely and properly explaining genetic algorithm behavior. Based on the proposed theory, it is argued that the similarities of candidate solutions should be exploited directly, rather than encoding candidate solutions and then exploiting their similarities. Proportional selection is characterized as a global search operator, and recombination is characterized as the search process that exploits similarities. Sequential algorithms and many deletion methods are also analyzed. It is shown that by properly constraining the search breadth of recombination operators, convergence of genetic algorithms to a global optimum can be ensured.

  3. Evaluating Random Forests for Survival Analysis using Prediction Error Curves.

    PubMed

    Mogensen, Ulla B; Ishwaran, Hemant; Gerds, Thomas A

    2012-09-01

    Prediction error curves are increasingly used to assess and compare predictions in survival analysis. This article surveys the R package pec, which provides a set of functions for efficient computation of prediction error curves. The software implements inverse probability of censoring weights to deal with right-censored data and several variants of cross-validation to deal with the apparent error problem. In principle, all kinds of prediction models can be assessed, and the package readily supports most traditional regression modeling strategies, like Cox regression or additive hazard regression, as well as state-of-the-art machine learning methods such as random forests, a nonparametric method that provides a promising alternative to traditional strategies in low- and high-dimensional settings. We show how the functionality of pec can be extended to yet unsupported prediction models. As an example, we implement support for random forest prediction models based on the R packages randomSurvivalForest and party. Using data from the Copenhagen Stroke Study, we use pec to compare random forests to a Cox regression model derived from stepwise variable selection. Reproducible results on the user level are given for publicly available data from the German breast cancer study group.

  4. Parameter identification using a creeping-random-search algorithm

    NASA Technical Reports Server (NTRS)

    Parrish, R. V.

    1971-01-01

    A creeping-random-search algorithm is applied to different types of problems in the field of parameter identification. The studies are intended to demonstrate that a random-search algorithm can be applied successfully to these various problems, which often cannot be handled by conventional deterministic methods, and also to introduce methods that speed convergence to an extremal of the problem under investigation. Six two-parameter identification problems with analytic solutions are solved, and two application problems are discussed in some detail. Results of the study show that a modified version of the basic creeping-random-search algorithm chosen does speed convergence in comparison with the unmodified version. The results also show that the algorithm can successfully solve problems that contain limits on state or control variables, inequality constraints (both independent and dependent, and linear and nonlinear), or stochastic models.
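
    A minimal sketch of a creeping random search, assuming numpy: perturb the current estimate, keep improvements, and contract the step size on failures (a simple stand-in for the convergence-speeding modification the paper studies), applied to a toy two-parameter identification problem.

      import numpy as np

      rng = np.random.default_rng(11)

      def creeping_random_search(f, x0, step=1.0, iters=2000):
          # Perturb the current point, keep improvements, and contract the
          # step on failures to speed convergence near the extremal.
          x = np.asarray(x0, dtype=float)
          fx = f(x)
          for _ in range(iters):
              cand = x + step * rng.normal(size=x.shape)
              fc = f(cand)
              if fc < fx:
                  x, fx = cand, fc
              else:
                  step *= 0.999
          return x, fx

      # Toy two-parameter identification: fit a*t^2 + b*t to generated data.
      t = np.linspace(0.0, 1.0, 50)
      obs = 3.0 * t ** 2 - 1.5 * t
      loss = lambda prm: np.sum((prm[0] * t ** 2 + prm[1] * t - obs) ** 2)
      print(creeping_random_search(loss, [0.0, 0.0]))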

  5. Comparison of random forest and parametric imputation models for imputing missing data using MICE: a CALIBER study.

    PubMed

    Shah, Anoop D; Bartlett, Jonathan W; Carpenter, James; Nicholas, Owen; Hemingway, Harry

    2014-03-15

    Multivariate imputation by chained equations (MICE) is commonly used for imputing missing data in epidemiologic research. The "true" imputation model may contain nonlinearities which are not included in default imputation models. Random forest imputation is a machine learning technique which can accommodate nonlinearities and interactions and does not require a particular regression model to be specified. We compared parametric MICE with a random forest-based MICE algorithm in 2 simulation studies. The first study used 1,000 random samples of 2,000 persons drawn from the 10,128 stable angina patients in the CALIBER database (Cardiovascular Disease Research using Linked Bespoke Studies and Electronic Records; 2001-2010) with complete data on all covariates. Variables were artificially made "missing at random," and the bias and efficiency of parameter estimates obtained using different imputation methods were compared. Both MICE methods produced unbiased estimates of (log) hazard ratios, but random forest was more efficient and produced narrower confidence intervals. The second study used simulated data in which the partially observed variable depended on the fully observed variables in a nonlinear way. Parameter estimates were less biased using random forest MICE, and confidence interval coverage was better. This suggests that random forest imputation may be useful for imputing complex epidemiologic data sets in which some patients have missing data.

  6. Toward Digital Staining using Imaging Mass Spectrometry and Random Forests

    PubMed Central

    Hanselmann, Michael; Köthe, Ullrich; Kirchner, Marc; Renard, Bernhard Y.; Amstalden, Erika R.; Glunde, Kristine; Heeren, Ron M. A.; Hamprecht, Fred A.

    2009-01-01

    We show on Imaging Mass Spectrometry (IMS) data that the Random Forest classifier can be used for automated tissue classification and that it results in predictions with high sensitivities and positive predictive values, even when inter-sample variability is present in the data. We further demonstrate how Markov Random Fields and vector-valued median filtering can be applied to reduce noise effects to further improve the classification results in a post-hoc smoothing step. Our study gives clear evidence that digital staining by means of IMS constitutes a promising complement to chemical staining techniques. PMID:19469555

  7. The Study of Randomized Visual Saliency Detection Algorithm

    PubMed Central

    Xu, Weihong; Kuang, Fangjun; Gao, Shangbing

    2013-01-01

    Image segmentation aimed at a high-quality visual saliency map depends heavily on the existing visual saliency metrics, which mostly produce only a sketchy saliency map; a coarse saliency map in turn degrades the image segmentation results. This paper presents a randomized visual saliency detection algorithm. The randomized method can quickly generate a detailed saliency map of the same size as the original input image, and it can meet real-time requirements for content-based image scaling based on the saliency map. The randomization also enables fast detection of salient regions in video, while the algorithm requires only a small amount of memory to produce a detailed visual saliency map. The presented results show that using the resulting saliency map in the subsequent image segmentation process yields ideal segmentation results. PMID:24382980

  8. The study of randomized visual saliency detection algorithm.

    PubMed

    Chen, Yuantao; Xu, Weihong; Kuang, Fangjun; Gao, Shangbing

    2013-01-01

    Image segmentation aimed at a high-quality visual saliency map depends heavily on the existing visual saliency metrics, which mostly produce only a sketchy saliency map; a coarse saliency map in turn degrades the image segmentation results. This paper presents a randomized visual saliency detection algorithm. The randomized method can quickly generate a detailed saliency map of the same size as the original input image, and it can meet real-time requirements for content-based image scaling based on the saliency map. The randomization also enables fast detection of salient regions in video, while the algorithm requires only a small amount of memory to produce a detailed visual saliency map. The presented results show that using the resulting saliency map in the subsequent image segmentation process yields ideal segmentation results.

  9. Random geometric prior forest for multiclass object segmentation.

    PubMed

    Liu, Xiao; Song, Mingli; Tao, Dacheng; Bu, Jiajun; Chen, Chun

    2015-10-01

    Recent advances in object detection have led to the development of segmentation by detection approaches that integrate top-down geometric priors for multiclass object segmentation. A key yet under-addressed issue in utilizing top-down cues for the problem of multiclass object segmentation by detection is efficiently generating robust and accurate geometric priors. In this paper, we propose a random geometric prior forest scheme to obtain object-adaptive geometric priors efficiently and robustly. In the scheme, a testing object first searches for training neighbors with similar geometries using the random geometric prior forest, and then the geometry of the testing object is reconstructed by linearly combining the geometries of its neighbors. Our scheme enjoys several favorable properties when compared with conventional methods. First, it is robust and very fast because its inference does not suffer from bad initializations, poor local minima, or complex optimization. Second, the figure/ground geometries of training samples are utilized in a multitask manner. Third, our scheme is object-adaptive but does not require the labeling of parts or poselets, and thus, it is quite easy to implement. To demonstrate the effectiveness of the proposed scheme, we integrate the obtained top-down geometric priors with conventional bottom-up color cues in the frame of graph cut. The proposed random geometric prior forest achieves the best segmentation results of all of the methods tested on VOC2010/2012 and is 90 times faster than the current state-of-the-art method. PMID:25974937

  10. Random Forest and Rotation Forest for fully polarized SAR image classification using polarimetric and spatial features

    NASA Astrophysics Data System (ADS)

    Du, Peijun; Samat, Alim; Waske, Björn; Liu, Sicong; Li, Zhenhong

    2015-07-01

    Fully Polarimetric Synthetic Aperture Radar (PolSAR) has the advantages of all-weather, day-and-night observation and high-resolution capability. The collected data are usually stored as Sinclair, coherence or covariance matrices, which are directly related to the physical properties of natural media and the backscattering mechanism. Additional information related to the nature of the scattering medium can be exploited through polarimetric decomposition theorems. Accordingly, PolSAR image classification has gained increasing attention from the remote sensing community in recent years. However, the above polarimetric measurements or parameters cannot provide sufficient information for accurate PolSAR image classification in some scenarios, e.g. in complex urban areas, where different scattering media may exhibit similar PolSAR responses for several unavoidable reasons. Inspired by the complementarity between spectral and spatial features, which brings remarkable improvements in optical image classification, the complementary information between polarimetric and spatial features may also contribute to PolSAR image classification. Therefore, the roles of textural features such as contrast, dissimilarity, homogeneity and local range, as well as morphological profiles (MPs), in PolSAR image classification are investigated using two advanced ensemble learning (EL) classifiers: Random Forest and Rotation Forest. The supervised Wishart classifier and support vector machines (SVMs) are used as benchmark classifiers for evaluation and comparison. Experimental results with three Radarsat-2 images in quad polarization mode indicate that classification accuracies can be significantly increased by integrating spatial and polarimetric features using ensemble learning strategies. Rotation Forest achieves better accuracy than SVM and Random Forest, while Random Forest is much faster than Rotation Forest.

  11. Markov random-field-based anomaly screening algorithm

    NASA Astrophysics Data System (ADS)

    Bello, Martin G.

    1995-06-01

    A novel anomaly screening algorithm is described which makes use of a regression diagnostic associated with the fitting of Markov Random Field (MRF) models. This regression diagnostic quantifies the extent to which a given neighborhood of pixels is atypical, relative to local background characteristics. The screening algorithm consists first of the calculation of MRF-based anomaly statistic values. Next, 'blob' features, such as pixel count and maximal pixel intensity, are calculated and ranked over the image in order to 'filter' the blobs down to a final subset of most likely candidates. Receiver operating characteristics obtained from applying the above described screening algorithm to the detection of minelike targets in high- and low-frequency side-scan sonar imagery are presented, together with results obtained from other screening algorithms for comparison, demonstrating performance comparable to trained human operators. In addition, real-time implementation considerations associated with each algorithmic component of the described procedure are identified.

  12. Simulation of multicorrelated random processes using the FFT algorithm

    NASA Technical Reports Server (NTRS)

    Wittig, L. E.; Sinha, A. K.

    1975-01-01

    A technique for the digital simulation of multicorrelated Gaussian random processes is described. This technique is based upon generating discrete frequency functions which correspond to the Fourier transform of the desired random processes, and then using the fast Fourier transform (FFT) algorithm to obtain the actual random processes. The main advantage of this method of simulation over other methods is computation time; it appears to be more than an order of magnitude faster than present methods of simulation. One of the main uses of multicorrelated simulated random processes is in solving nonlinear random vibration problems by numerical integration of the governing differential equations. The response of a nonlinear string to a distributed noise input is presented as an example.
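
    A sketch of the single-process case under stated assumptions (numpy, a placeholder target spectrum, and one common normalization convention): a Gaussian random spectrum with the desired magnitude is synthesized and inverted with the FFT. The multicorrelated case additionally requires factoring the cross-spectral density matrix.

      import numpy as np

      rng = np.random.default_rng(12)
      n, dt = 4096, 0.01
      freqs = np.fft.rfftfreq(n, dt)

      # Target one-sided power spectral density (placeholder low-pass shape).
      psd = 1.0 / (1.0 + (freqs / 5.0) ** 4)

      # Draw a Gaussian random spectrum with the desired magnitude, then
      # invert with the FFT to obtain one realization of the process.
      spectrum = rng.normal(size=freqs.size) + 1j * rng.normal(size=freqs.size)
      spectrum *= np.sqrt(psd * n / (2.0 * dt))
      x = np.fft.irfft(spectrum, n=n)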

  13. Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions.

    PubMed

    Hengl, Tomislav; Heuvelink, Gerard B M; Kempen, Bas; Leenaars, Johan G B; Walsh, Markus G; Shepherd, Keith D; Sila, Andrew; MacMillan, Robert A; Mendes de Jesus, Jorge; Tamene, Lulseged; Tondoh, Jérôme E

    2015-01-01

    80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008-2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to the agricultural management--organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15-75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forests models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. This indicates a promising potential for transferring pedological
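
    A schematic of the model comparison on synthetic data, assuming scikit-learn: 5-fold cross-validated RMSE for linear regression versus a random forest on a deliberately nonlinear target, with placeholder covariates standing in for the AfSIS point data and remote-sensing covariates.

      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.linear_model import LinearRegression
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(13)
      X = rng.normal(size=(500, 10))    # placeholder covariates (climate, relief, ...)
      y = X[:, 0] ** 2 + np.sin(X[:, 1]) + 0.3 * rng.normal(size=500)

      for name, model in [("linear regression", LinearRegression()),
                          ("random forest",
                           RandomForestRegressor(n_estimators=300, random_state=13))]:
          rmse = -cross_val_score(model, X, y, cv=5,
                                  scoring="neg_root_mean_squared_error").mean()
          print(name, round(rmse, 3))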

  15. Mapping Soil Properties of Africa at 250 m Resolution: Random Forests Significantly Improve Current Predictions

    PubMed Central

    Hengl, Tomislav; Heuvelink, Gerard B. M.; Kempen, Bas; Leenaars, Johan G. B.; Walsh, Markus G.; Shepherd, Keith D.; Sila, Andrew; MacMillan, Robert A.; Mendes de Jesus, Jorge; Tamene, Lulseged; Tondoh, Jérôme E.

    2015-01-01

    80% of arable land in Africa has low soil fertility and suffers from physical soil problems. Additionally, significant amounts of nutrients are lost every year due to unsustainable soil management practices. This is partially the result of insufficient use of soil management knowledge. To help bridge the soil information gap in Africa, the Africa Soil Information Service (AfSIS) project was established in 2008. Over the period 2008–2014, the AfSIS project compiled two point data sets: the Africa Soil Profiles (legacy) database and the AfSIS Sentinel Site database. These data sets contain over 28 thousand sampling locations and represent the most comprehensive soil sample data sets of the African continent to date. Utilizing these point data sets in combination with a large number of covariates, we have generated a series of spatial predictions of soil properties relevant to agricultural management—organic carbon, pH, sand, silt and clay fractions, bulk density, cation-exchange capacity, total nitrogen, exchangeable acidity, Al content and exchangeable bases (Ca, K, Mg, Na). We specifically investigate differences between two predictive approaches: random forests and linear regression. Results of 5-fold cross-validation demonstrate that the random forests algorithm consistently outperforms the linear regression algorithm, with average decreases of 15–75% in Root Mean Squared Error (RMSE) across soil properties and depths. Fitting and running random forest models takes an order of magnitude more time and the modelling success is sensitive to artifacts in the input data, but as long as quality-controlled point data are provided, an increase in soil mapping accuracy can be expected. Results also indicate that globally predicted soil classes (USDA Soil Taxonomy, especially Alfisols and Mollisols) help improve continental scale soil property mapping, and are among the most important predictors. This indicates a promising potential for transferring pedological
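
    The cross-validated comparison described above is straightforward to reproduce in outline. The sketch below (not the authors' code) compares a random forest against a linear model under 5-fold cross-validation with scikit-learn; the covariates and the soil-property target are simulated stand-ins.

        # Sketch: RF vs. linear regression under 5-fold CV, mirroring the
        # evaluation described in the abstract. All data are simulated.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.linear_model import LinearRegression
        from sklearn.metrics import mean_squared_error
        from sklearn.model_selection import KFold

        rng = np.random.default_rng(0)
        X = rng.normal(size=(500, 20))                                 # stand-in covariates
        y = X[:, 0] ** 2 + X[:, 1] + rng.normal(scale=0.3, size=500)   # stand-in soil property

        for name, model in [("random forest", RandomForestRegressor(n_estimators=200, random_state=0)),
                            ("linear regression", LinearRegression())]:
            rmses = []
            for train, test in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
                model.fit(X[train], y[train])
                rmses.append(mean_squared_error(y[test], model.predict(X[test])) ** 0.5)
            print(name, "mean RMSE:", round(float(np.mean(rmses)), 3))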

  16. Predicting adaptive phenotypes from multilocus genotypes in Sitka spruce (Picea sitchensis) using random forest.

    PubMed

    Holliday, Jason A; Wang, Tongli; Aitken, Sally

    2012-09-01

    Climate is the primary driver of the distribution of tree species worldwide, and the potential for adaptive evolution will be an important factor determining the response of forests to anthropogenic climate change. Although association mapping has the potential to improve our understanding of the genomic underpinnings of climatically relevant traits, the utility of adaptive polymorphisms uncovered by such studies would be greatly enhanced by the development of integrated models that account for the phenotypic effects of multiple single-nucleotide polymorphisms (SNPs) and their interactions simultaneously. We previously reported the results of association mapping in the widespread conifer Sitka spruce (Picea sitchensis). In the current study we used the recursive partitioning algorithm 'Random Forest' to identify optimized combinations of SNPs to predict adaptive phenotypes. After adjusting for population structure, we were able to explain 37% and 30% of the phenotypic variation, respectively, in two locally adaptive traits--autumn budset timing and cold hardiness. For each trait, the leading five SNPs captured much of the phenotypic variation. To determine the role of epistasis in shaping these phenotypes, we also used a novel approach to quantify the strength and direction of pairwise interactions between SNPs and found such interactions to be common. Our results demonstrate the power of Random Forest to identify subsets of markers that are most important to climatic adaptation, and suggest that interactions among these loci may be widespread.
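
    A minimal sketch of the core step, ranking SNPs by random forest importance for a quantitative trait, follows; the genotypes, effect sizes and the epistatic pair are simulated and purely illustrative, not the study's data.

        # Sketch: random forest importance ranking of SNPs for a simulated
        # quantitative trait that includes one epistatic (interacting) pair.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(1)
        genotypes = rng.integers(0, 3, size=(300, 40)).astype(float)   # 300 trees x 40 SNPs
        trait = (genotypes[:, 0] - 0.5 * genotypes[:, 1]
                 + 0.8 * genotypes[:, 0] * genotypes[:, 2]             # epistatic interaction
                 + rng.normal(scale=0.5, size=300))

        rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(genotypes, trait)
        top5 = np.argsort(rf.feature_importances_)[::-1][:5]
        print("top 5 SNPs by importance:", top5)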

  17. RAQ-A Random Forest Approach for Predicting Air Quality in Urban Sensing Systems.

    PubMed

    Yu, Ruiyun; Yang, Yu; Yang, Leyou; Han, Guangjie; Move, Oguti Ann

    2016-01-01

    Air quality information such as the concentration of PM2.5 is of great significance for human health and city management. It affects travel behavior, urban planning, government policies and more. However, major cities typically have only a limited number of air quality monitoring stations, while air quality varies across urban areas and can differ greatly even between closely neighboring regions. In this paper, a random forest approach for predicting air quality (RAQ) is proposed for urban sensing systems. The data generated by urban sensing include meteorology data, road information, real-time traffic status and point of interest (POI) distribution. The random forest algorithm is exploited for data training and prediction. The performance of RAQ is evaluated with real city data. Compared with three other algorithms, this approach achieves better prediction precision. The experiments show that air quality can be inferred with remarkably high accuracy from data obtained through urban sensing.

  18. RAQ–A Random Forest Approach for Predicting Air Quality in Urban Sensing Systems

    PubMed Central

    Yu, Ruiyun; Yang, Yu; Yang, Leyou; Han, Guangjie; Move, Oguti Ann

    2016-01-01

    Air quality information such as the concentration of PM2.5 is of great significance for human health and city management. It affects travel behavior, urban planning, government policies and more. However, major cities typically have only a limited number of air quality monitoring stations, while air quality varies across urban areas and can differ greatly even between closely neighboring regions. In this paper, a random forest approach for predicting air quality (RAQ) is proposed for urban sensing systems. The data generated by urban sensing include meteorology data, road information, real-time traffic status and point of interest (POI) distribution. The random forest algorithm is exploited for data training and prediction. The performance of RAQ is evaluated with real city data. Compared with three other algorithms, this approach achieves better prediction precision. The experiments show that air quality can be inferred with remarkably high accuracy from data obtained through urban sensing. PMID:26761008

  19. Fire detection system using random forest classification for image sequences of complex background

    NASA Astrophysics Data System (ADS)

    Kim, Onecue; Kang, Dong-Joong

    2013-06-01

    We present a fire alarm system based on image processing that detects fire accidents in various environments. To reduce the false alarms that frequently appeared in earlier systems, we combine image features including color, motion, and blinking information. We specifically define the color conditions of fires in the hue-saturation-value and RGB color spaces. Fire features are represented as intensity variation, color mean and variance, motion, and image differences. Moreover, blinking fire features are modeled by using crossing patches. We propose an algorithm that classifies patches into fire or nonfire areas by using random forest supervised learning. We design an embedded surveillance device with acrylonitrile butadiene styrene housing for stable fire detection in outdoor environments. The experimental results show that our algorithm works robustly in complex environments and is able to detect fires in real time.
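
    As a rough illustration of the patch-classification step (not the authors' implementation), the sketch below trains a random forest on simulated color, motion and blinking feature vectors for fire and non-fire patches; all feature distributions are made up.

        # Sketch: random forest classification of image patches into fire /
        # non-fire from simple color, motion and blinking features (simulated).
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(2)
        n = 1000
        # columns: mean hue, saturation, intensity variance, frame diff, blink rate
        fire = np.column_stack([rng.normal(0.05, 0.02, n), rng.normal(0.9, 0.05, n),
                                rng.normal(0.4, 0.10, n), rng.normal(0.5, 0.10, n),
                                rng.normal(0.6, 0.10, n)])
        nonfire = np.column_stack([rng.normal(0.4, 0.20, n), rng.normal(0.4, 0.20, n),
                                   rng.normal(0.1, 0.05, n), rng.normal(0.1, 0.05, n),
                                   rng.normal(0.05, 0.05, n)])
        X = np.vstack([fire, nonfire])
        y = np.r_[np.ones(n), np.zeros(n)]
        Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)
        clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(Xtr, ytr)
        print("patch accuracy:", clf.score(Xte, yte))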

  20. Prediction of aquatic toxicity mode of action using linear discriminant and random forest models.

    PubMed

    Martin, Todd M; Grulke, Christopher M; Young, Douglas M; Russom, Christine L; Wang, Nina Y; Jackson, Crystal R; Barron, Mace G

    2013-09-23

    The ability to determine the mode of action (MOA) for a diverse group of chemicals is a critical part of ecological risk assessment and chemical regulation. However, existing MOA assignment approaches in ecotoxicology have been limited to relatively few MOAs, have high uncertainty, or rely on professional judgment. In this study, machine learning algorithms (linear discriminant analysis and random forest) were used to develop models for assigning aquatic toxicity MOA. These methods were selected since they have been shown to be able to correlate diverse data sets and provide an indication of the most important descriptors. A data set of MOA assignments for 924 chemicals was developed using a combination of high confidence assignments, international consensus classifications, ASTER (ASsessment Tools for the Evaluation of Risk) predictions, and weight-of-evidence professional judgment based on an assessment of structure and literature information. The overall data set was randomly divided into a training set (75%) and a validation set (25%) and then used to develop linear discriminant analysis (LDA) and random forest (RF) MOA assignment models. The LDA and RF models had high internal concordance and specificity and were able to produce overall prediction accuracies ranging from 84.5 to 87.7% for the validation set. These results demonstrate that computational chemistry approaches can be used to determine acute toxicity MOAs across a large range of structures and mechanisms.

  1. Estimating tropical forest structure using discrete return lidar data and a locally trained synthetic forest algorithm

    NASA Astrophysics Data System (ADS)

    Palace, M. W.; Sullivan, F. B.; Ducey, M.; Czarnecki, C.; Zanin Shimbo, J.; Mota e Silva, J.

    2012-12-01

    Forests are complex ecosystems with diverse species assemblages, crown structures, size class distributions, and historical disturbances. This complexity makes monitoring, understanding and forecasting carbon dynamics difficult. Still, this complexity is also central to carbon cycling in terrestrial vegetation. Lidar data are often used solely to associate plot-level biomass measurements with canopy height models, yet much more may be gleaned from examining the full lidar profile. Using discrete return airborne light detection and ranging (lidar) data collected in 2009 by the Tropical Ecology Assessment and Monitoring Network (TEAM), we compared synthetic vegetation profiles to lidar-derived relative vegetation profiles (RVPs) in La Selva, Costa Rica. To accomplish this, we developed RVPs to describe the vertical distribution of plant material on 20 plots at La Selva by transforming cumulative lidar observations to account for obscured plant material. Hundreds of synthetic profiles were developed for forests containing approximately 200,000 trees with random diameter at breast height (DBH), assuming a Weibull distribution with a shape parameter of 1.0 and mean DBH ranging from 0 cm to 500 cm. For each tree in the synthetic forests, crown shape (width, depth) and total height were estimated using previously developed allometric equations for tropical forests. Profiles for each synthetic forest were generated and compared to TEAM lidar data to determine the best-fitting synthetic profile for each of the 20 field plots at La Selva. After determining the best-fit synthetic profile using the minimum sum of squared differences, we are able to estimate forest structure (diameter distribution, height, and biomass) and to compare our estimates to field data for each of the twenty field plots. Our preliminary results show promise for estimating forest structure and biomass using lidar data and computer modeling.

  2. Modeling Gene Regulation in Liver Hepatocellular Carcinoma with Random Forests

    PubMed Central

    2016-01-01

    Liver hepatocellular carcinoma (HCC) remains a leading cause of cancer-related death. Poor understanding of the mechanisms underlying HCC prevents early detection and leads to high mortality. We developed a random forest model that incorporates copy-number variation, DNA methylation, transcription factor, and microRNA binding information as features to predict gene expression in HCC. Our model achieved a highly significant correlation between predicted and measured expression of held-out genes. Furthermore, we identified potential regulators of gene expression in HCC. Many of these regulators have been previously found to be associated with cancer and are differentially expressed in HCC. We also evaluated our predicted target sets for these regulators by making comparison with experimental results. Lastly, we found that the transcription factor E2F6, one of the candidate regulators inferred by our model, is predictive of survival rate in HCC. Results of this study will provide directions for future prospective studies in HCC.

  3. Random Matrix Approach to Quantum Adiabatic Evolution Algorithms

    NASA Technical Reports Server (NTRS)

    Boulatov, Alexei; Smelyanskiy, Vadim N.

    2004-01-01

    We analyze the power of quantum adiabatic evolution algorithms (QAA) for solving random NP-hard optimization problems within a theoretical framework based on random matrix theory (RMT). We present two types of driven RMT models. In the first model, the driving Hamiltonian is represented by Brownian motion in the matrix space. We use the Brownian motion model to obtain a description of multiple avoided crossing phenomena. We show that the failure mechanism of the QAA is due to the interaction of the ground state with the "cloud" formed by all the excited states, confirming that in the driven RMT models the Landau-Zener mechanism of dissipation is not important. We show that the QAA has a finite probability of success in a certain range of parameters, implying the polynomial complexity of the algorithm. The second model corresponds to the standard QAA with the problem Hamiltonian taken from the Gaussian Unitary RMT ensemble (GUE). We show that the level dynamics in this model can be mapped onto the dynamics in the Brownian motion model. However, the driven RMT model always leads to exponential complexity of the algorithm due to the presence of long-range intertemporal correlations of the eigenvalues. Our results indicate that the weakness of effective transitions is the leading effect that can make the Markovian-type QAA successful.

  4. Combinatorial approximation algorithms for MAXCUT using random walks.

    SciTech Connect

    Seshadhri, Comandur; Kale, Satyen

    2010-11-01

    We give the first combinatorial approximation algorithm for MaxCut that beats the trivial 0.5 factor by a constant. The main partitioning procedure is very intuitive, natural, and easily described. It essentially performs a number of random walks and aggregates the information to provide the partition. We can control the running time to get an approximation factor-running time tradeoff. We show that for any constant b > 1.5, there is an Õ(n^b) algorithm that outputs a (0.5 + δ)-approximation for MaxCut, where δ = δ(b) is some positive constant. One of the components of our algorithm is a weak local graph partitioning procedure that may be of independent interest. Given a starting vertex i and a conductance parameter φ, unless a random walk of length ℓ = O(log n) starting from i mixes rapidly (in terms of φ and ℓ), we can find a cut of conductance at most φ close to the vertex. The work done per vertex found in the cut is sublinear in n.
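
    For context, the "trivial 0.5 factor" mentioned above comes from the fact that a uniformly random partition cuts each edge with probability 1/2. The sketch below verifies this baseline empirically on a small complete graph; it is illustrative only, as the paper's random-walk procedure is considerably more involved.

        # Sketch: a uniformly random partition cuts half of all edges in
        # expectation, which is the 0.5-factor baseline the paper improves on.
        import random

        def average_random_cut(edges, n, trials=1000):
            total = 0
            for _ in range(trials):
                side = [random.random() < 0.5 for _ in range(n)]
                total += sum(side[u] != side[v] for u, v in edges)
            return total / trials

        edges = [(u, v) for u in range(20) for v in range(u + 1, 20)]  # complete graph K20
        print(average_random_cut(edges, 20), "of", len(edges), "edges cut on average")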

  5. Low scaling algorithms for the random phase and GW approximation

    NASA Astrophysics Data System (ADS)

    Kaltak, Merzuk; Klimes, Jiri; Kresse, Georg

    2015-03-01

    The computationally most expensive step in conventional RPA implementations is the calculation of the independent particle polarizability χ0. We present an algorithm that calculates χ0 using the Green's function in real space and imaginary time. In combination with optimized non-uniform frequency and time grids, the correlation energy at the random phase approximation level can be calculated efficiently with a computational cost that grows only cubically with system size. We apply this approach to calculate RPA defect energies of silicon using unit cells with up to 250 atoms and 128 CPU cores. Furthermore, we show how to extend the algorithm to the GW framework of Hedin and solve the Dyson equation for the Green's function with the same computational effort. This work was supported by the Austrian Spezialforschungsbereich Vienna Computational Materials Laboratory (SFB ViCoM) and the Deutsche Forschungsgruppe (FOR) 1346.
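
    For reference, the central quantity in such space-time approaches can be written as a product of imaginary-time Green's functions. The expression below is the standard form used in this family of methods (sign and normalization conventions vary between references); it is supplied for orientation and is not quoted from the abstract.

        % Independent-particle polarizability in real space and imaginary time:
        \chi^{0}(\mathbf{r},\mathbf{r}',\mathrm{i}\tau)
            = -\, G(\mathbf{r},\mathbf{r}',\mathrm{i}\tau)\, G(\mathbf{r}',\mathbf{r},-\mathrm{i}\tau)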

  6. Predictive lithological mapping of Canada's North using Random Forest classification applied to geophysical and geochemical data

    NASA Astrophysics Data System (ADS)

    Harris, J. R.; Grunsky, E. C.

    2015-07-01

    A recent method for mapping lithology which involves the Random Forest (RF) machine classification algorithm is evaluated. Random Forests, a supervised classifier, requires training data representative of each lithology to produce a predictive or classified map. We use two training strategies, one based on the location of lake sediment geochemical samples where the rock type is recorded from a legacy geology map at each sample station and the second strategy is based on lithology recorded from field stations derived from reconnaissance field mapping. We apply the classification to interpolated major and minor lake sediment geochemical data as well as airborne total field magnetic and gamma ray spectrometer data. Using this method we produce predictions of the lithology of a large section of the Hearne Archean - Paleoproterozoic tectonic domain, in northern Canada. The results indicate that meaningful predictive lithologic maps can be produced using RF classification for both training strategies. The best results were achieved when all data were used; however, the geochemical and gamma ray data were the strongest predictors of the various lithologies. The maps generated from this research can be used to complement field mapping activities by focusing field work on areas where the predicted geology and legacy geology do not match and as first order geological maps in poorly mapped areas.

  7. Sequential Monte Carlo tracking of the marginal artery by multiple cue fusion and random forest regression.

    PubMed

    Cherry, Kevin M; Peplinski, Brandon; Kim, Lauren; Wang, Shijun; Lu, Le; Zhang, Weidong; Liu, Jianfei; Wei, Zhuoshi; Summers, Ronald M

    2015-01-01

    Given the potential importance of marginal artery localization in automated registration in computed tomography colonography (CTC), we have devised a semi-automated method of marginal vessel detection employing sequential Monte Carlo tracking (also known as particle filtering tracking) by multiple cue fusion based on intensity, vesselness, organ detection, and minimum spanning tree information for poorly enhanced vessel segments. We then employed a random forest algorithm for intelligent cue fusion and decision making which achieved high sensitivity and robustness. After applying a vessel pruning procedure to the tracking results, we achieved statistically significantly improved precision compared to a baseline Hessian detection method (75.2% versus 2.7%, p<0.001). This method also showed a statistically significantly improved recall rate compared to a 2-cue baseline method using fewer vessel cues (67.7% versus 30.7%, p<0.001). These results demonstrate that marginal artery localization on CTC is feasible by combining a discriminative classifier (i.e., random forest) with a sequential Monte Carlo tracking mechanism. In so doing, we present the effective application of an anatomical probability map to vessel pruning as well as a supplementary spatial coordinate system for colonic segmentation and registration when this task has been confounded by colon lumen collapse.
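
    The generic sequential Monte Carlo (particle filter) loop underlying such trackers is compact. The sketch below is a stand-in, not the authors' cue model: particles are propagated, weighted by a hypothetical fused-cue likelihood, and resampled.

        # Sketch: propagate -> weight -> resample, the particle-filter cycle.
        # cue_score is a hypothetical fused likelihood, peaked at the vessel.
        import numpy as np

        rng = np.random.default_rng(3)

        def cue_score(positions, target):
            return np.exp(-np.sum((positions - target) ** 2, axis=1) / 2.0)

        n_particles, target = 500, np.array([5.0, 5.0, 5.0])
        particles = rng.normal(size=(n_particles, 3))
        for _ in range(20):
            particles = particles + rng.normal(scale=0.5, size=particles.shape)    # propagate
            w = cue_score(particles, target)
            w = w / w.sum()                                                        # normalize
            particles = particles[rng.choice(n_particles, size=n_particles, p=w)]  # resample
        print("tracked position estimate:", particles.mean(axis=0).round(2))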

  8. Gearbox fault diagnosis based on deep random forest fusion of acoustic and vibratory signals

    NASA Astrophysics Data System (ADS)

    Li, Chuan; Sanchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Cabrera, Diego; Vásquez, Rafael E.

    2016-08-01

    Fault diagnosis is an effective tool to guarantee safe operations in gearboxes. Acoustic and vibratory measurements in such mechanical devices are both sensitive to the existence of faults. This work addresses the use of a deep random forest fusion (DRFF) technique to improve fault diagnosis performance for gearboxes by using measurements from an acoustic emission (AE) sensor and an accelerometer that monitor the gearbox condition simultaneously. The statistical parameters of the wavelet packet transform (WPT) are first produced from the AE signal and the vibratory signal, respectively. Two deep Boltzmann machines (DBMs) are then developed for deep representations of the WPT statistical parameters. A random forest is finally suggested to fuse the outputs of the two DBMs as the integrated DRFF model. The proposed DRFF technique is evaluated using gearbox fault diagnosis experiments under different operational conditions, and achieves a classification rate of 97.68% across 11 different condition patterns. Compared with peer algorithms, the proposed method exhibits the best performance. The results indicate that the deep learning fusion of acoustic and vibratory signals may improve fault diagnosis capabilities for gearboxes.

  9. Disaggregating Census Data for Population Mapping Using Random Forests with Remotely-Sensed and Ancillary Data

    PubMed Central

    Stevens, Forrest R.; Gaughan, Andrea E.; Linard, Catherine; Tatem, Andrew J.

    2015-01-01

    High resolution, contemporary data on human population distributions are vital for measuring impacts of population growth, monitoring human-environment interactions and for planning and policy development. Many methods are used to disaggregate census data and predict population densities for finer scale, gridded population data sets. We present a new semi-automated dasymetric modeling approach that incorporates detailed census and ancillary data in a flexible, “Random Forest” estimation technique. We outline the combination of widely available, remotely-sensed and geospatial data that contribute to the modeled dasymetric weights and then use the Random Forest model to generate a gridded prediction of population density at ~100 m spatial resolution. This prediction layer is then used as the weighting surface to perform dasymetric redistribution of the census counts at a country level. As a case study we compare the new algorithm and its products for three countries (Vietnam, Cambodia, and Kenya) with other common gridded population data production methodologies. We discuss the advantages of the new method and its gains in accuracy and flexibility over those previous approaches. Finally, we outline how this algorithm will be extended to provide freely-available gridded population data sets for Africa, Asia and Latin America. PMID:25689585
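
    A minimal sketch of the dasymetric redistribution idea follows: a random forest predicts a density weighting surface from covariates, and each census unit's count is spread over its cells in proportion to those weights. The units, covariates and densities are simulated stand-ins, not the study's data.

        # Sketch: RF weighting surface + dasymetric redistribution of counts.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(4)
        n_cells = 2000
        covariates = rng.normal(size=(n_cells, 6))      # e.g. land cover, lights, roads
        density = np.exp(covariates[:, 0])              # hypothetical training signal
        unit_id = rng.integers(0, 10, size=n_cells)     # census unit of each grid cell

        rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(covariates, density)
        weights = rf.predict(covariates)                # the gridded weighting surface

        counts = {u: 1000.0 for u in range(10)}         # census count per unit
        population = np.zeros(n_cells)
        for u, count in counts.items():
            cells = unit_id == u
            population[cells] = count * weights[cells] / weights[cells].sum()
        print("redistributed total:", round(population.sum(), 1))  # preserves the counts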

  10. Random Forests for Global and Regional Crop Yield Predictions

    PubMed Central

    Jeong, Jig Han; Resop, Jonathan P.; Mueller, Nathaniel D.; Fleisher, David H.; Yun, Kyungdahm; Butler, Ethan E.; Timlin, Dennis J.; Shim, Kyo-Moon; Gerber, James S.; Reddy, Vangimalla R.

    2016-01-01

    Accurate predictions of crop yield are critical for developing effective agricultural and food policies at the regional and global scales. We evaluated a machine-learning method, Random Forests (RF), for its ability to predict crop yield responses to climate and biophysical variables at global and regional scales in wheat, maize, and potato in comparison with multiple linear regressions (MLR) serving as a benchmark. We used crop yield data from various sources and regions for model training and testing: 1) gridded global wheat grain yield, 2) maize grain yield from US counties over thirty years, and 3) potato tuber and maize silage yield from the northeastern seaboard region. RF was found highly capable of predicting crop yields and outperformed MLR benchmarks in all performance statistics that were compared. For example, the root mean square errors (RMSE) ranged between 6 and 14% of the average observed yield with RF models in all test cases whereas these values ranged from 14% to 49% for MLR models. Our results show that RF is an effective and versatile machine-learning method for crop yield predictions at regional and global scales for its high accuracy and precision, ease of use, and utility in data analysis. RF may result in a loss of accuracy when predicting the extreme ends or responses beyond the boundaries of the training data. PMID:27257967

  11. Random Forests for Global and Regional Crop Yield Predictions.

    PubMed

    Jeong, Jig Han; Resop, Jonathan P; Mueller, Nathaniel D; Fleisher, David H; Yun, Kyungdahm; Butler, Ethan E; Timlin, Dennis J; Shim, Kyo-Moon; Gerber, James S; Reddy, Vangimalla R; Kim, Soo-Hyung

    2016-01-01

    Accurate predictions of crop yield are critical for developing effective agricultural and food policies at the regional and global scales. We evaluated a machine-learning method, Random Forests (RF), for its ability to predict crop yield responses to climate and biophysical variables at global and regional scales in wheat, maize, and potato in comparison with multiple linear regressions (MLR) serving as a benchmark. We used crop yield data from various sources and regions for model training and testing: 1) gridded global wheat grain yield, 2) maize grain yield from US counties over thirty years, and 3) potato tuber and maize silage yield from the northeastern seaboard region. RF was found highly capable of predicting crop yields and outperformed MLR benchmarks in all performance statistics that were compared. For example, the root mean square errors (RMSE) ranged between 6 and 14% of the average observed yield with RF models in all test cases whereas these values ranged from 14% to 49% for MLR models. Our results show that RF is an effective and versatile machine-learning method for crop yield predictions at regional and global scales for its high accuracy and precision, ease of use, and utility in data analysis. RF may result in a loss of accuracy when predicting the extreme ends or responses beyond the boundaries of the training data. PMID:27257967

  12. Random Forests for Global and Regional Crop Yield Predictions.

    PubMed

    Jeong, Jig Han; Resop, Jonathan P; Mueller, Nathaniel D; Fleisher, David H; Yun, Kyungdahm; Butler, Ethan E; Timlin, Dennis J; Shim, Kyo-Moon; Gerber, James S; Reddy, Vangimalla R; Kim, Soo-Hyung

    2016-01-01

    Accurate predictions of crop yield are critical for developing effective agricultural and food policies at the regional and global scales. We evaluated a machine-learning method, Random Forests (RF), for its ability to predict crop yield responses to climate and biophysical variables at global and regional scales in wheat, maize, and potato in comparison with multiple linear regressions (MLR) serving as a benchmark. We used crop yield data from various sources and regions for model training and testing: 1) gridded global wheat grain yield, 2) maize grain yield from US counties over thirty years, and 3) potato tuber and maize silage yield from the northeastern seaboard region. RF was found highly capable of predicting crop yields and outperformed MLR benchmarks in all performance statistics that were compared. For example, the root mean square errors (RMSE) ranged between 6 and 14% of the average observed yield with RF models in all test cases whereas these values ranged from 14% to 49% for MLR models. Our results show that RF is an effective and versatile machine-learning method for crop yield predictions at regional and global scales for its high accuracy and precision, ease of use, and utility in data analysis. RF may result in a loss of accuracy when predicting the extreme ends or responses beyond the boundaries of the training data.
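
    The caveat in the final sentence, that RF may lose accuracy beyond the boundaries of the training data, is easy to demonstrate. In the toy sketch below the forest saturates outside its training range while a linear model extrapolates; the data are synthetic.

        # Sketch: random forests cannot extrapolate beyond the training range.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.linear_model import LinearRegression

        X_train = np.linspace(0, 10, 200).reshape(-1, 1)
        y_train = 2.0 * X_train.ravel()                  # y = 2x on [0, 10]
        X_far = np.array([[20.0]])                       # far outside training range

        rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
        lin = LinearRegression().fit(X_train, y_train)
        print("true 40.0 | rf:", rf.predict(X_far)[0], "| linear:", lin.predict(X_far)[0])
        # the forest's answer stays near max(y_train) = 20; the line reaches 40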

  13. Vertebral degenerative disc disease severity evaluation using random forest classification

    NASA Astrophysics Data System (ADS)

    Munoz, Hector E.; Yao, Jianhua; Burns, Joseph E.; Pham, Yasuyuki; Stieger, James; Summers, Ronald M.

    2014-03-01

    Degenerative disc disease (DDD) develops in the spine as vertebral discs degenerate and osseous excrescences or outgrowths naturally form to restabilize unstable segments of the spine. These osseous excrescences, or osteophytes, may progress or stabilize in size as the spine reaches a new equilibrium point. We have previously created a CAD system that detects DDD. This paper presents a new system to determine the severity of DDD of individual vertebral levels. This will be useful to monitor the progress of developing DDD, as rapid growth may indicate that there is a greater stabilization problem that should be addressed. The existing DDD CAD system extracts the spine from CT images and segments the cortical shell of individual levels with a dual-surface model. The cortical shell is unwrapped, and is analyzed to detect the hyperdense regions of DDD. Three radiologists scored the severity of DDD of each disc space of 46 CT scans. Radiologists' scores and features generated from CAD detections were used to train a random forest classifier. The classifier then assessed the severity of DDD at each vertebral disc level. The agreement between the computer severity score and the average radiologist's score had a quadratic weighted Cohen's kappa of 0.64.
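
    The agreement statistic quoted above is available directly in scikit-learn; the sketch below computes a quadratically weighted Cohen's kappa on illustrative severity scores (not the study's data), where the quadratic weighting penalizes large severity disagreements more heavily than near misses.

        # Sketch: quadratic weighted Cohen's kappa between two severity raters.
        from sklearn.metrics import cohen_kappa_score

        radiologist = [0, 1, 2, 3, 2, 1, 0, 3, 2, 1]   # illustrative DDD severity scores
        computer    = [0, 1, 2, 2, 2, 0, 0, 3, 3, 1]
        print(cohen_kappa_score(radiologist, computer, weights="quadratic"))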

  14. Prediction of O-glycosylation Sites Using Random Forest and GA-Tuned PSO Technique

    PubMed Central

    Hassan, Hebatallah; Badr, Amr; Abdelhalim, MB

    2015-01-01

    O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at particular serine (S) or threonine (T) sites. Several O-glycosylation site predictors have been developed; however, a need for even better prediction tools remains. One challenge in training the classifiers is that the available datasets are highly imbalanced, which makes the classification accuracy for the minority class unsatisfactory. In our previous work, we proposed a new classification approach based on particle swarm optimization (PSO) and random forest (RF) that addresses the imbalanced dataset problem. The PSO parameter settings in the training process impact the classification accuracy. Thus, in this paper, we optimize the PSO parameters with a genetic algorithm in order to increase the classification accuracy. Our proposed genetic algorithm-based approach shows better performance in terms of area under the receiver operating characteristic curve than existing predictors. In addition, we implemented a glycosylation predictor tool based on this approach and demonstrated that it can successfully identify candidate glycosylation sites in a case-study protein. PMID:26244014

  15. Prediction of O-glycosylation Sites Using Random Forest and GA-Tuned PSO Technique.

    PubMed

    Hassan, Hebatallah; Badr, Amr; Abdelhalim, M B

    2015-01-01

    O-glycosylation is one of the main types of mammalian protein glycosylation; it occurs at particular serine (S) or threonine (T) sites. Several O-glycosylation site predictors have been developed; however, a need for even better prediction tools remains. One challenge in training the classifiers is that the available datasets are highly imbalanced, which makes the classification accuracy for the minority class unsatisfactory. In our previous work, we proposed a new classification approach based on particle swarm optimization (PSO) and random forest (RF) that addresses the imbalanced dataset problem. The PSO parameter settings in the training process impact the classification accuracy. Thus, in this paper, we optimize the PSO parameters with a genetic algorithm in order to increase the classification accuracy. Our proposed genetic algorithm-based approach shows better performance in terms of area under the receiver operating characteristic curve than existing predictors. In addition, we implemented a glycosylation predictor tool based on this approach and demonstrated that it can successfully identify candidate glycosylation sites in a case-study protein.

  16. An Individual Tree Detection Algorithm for Dense Deciduous Forests with Spreading Branches

    NASA Astrophysics Data System (ADS)

    Shao, G.

    2015-12-01

    Individual tree information derived from LiDAR may have the potential to assist forest inventory and improve the assessment of forest structure and composition for sustainable forest management. Algorithms developed for individual tree detection commonly focus on finding tree tops to locate tree positions. However, the spreading branches (cylinder crowns) of deciduous forests cause such algorithms to work less effectively on dense canopy. This research applies a machine learning algorithm, mean shift, to position individual trees based on the density of the LiDAR point cloud instead of detecting tree tops. The study site is located in a dense oak forest in Indiana, US. The selection of mean shift kernels is discussed. Constant and dynamic bandwidths of the mean shift algorithm are applied and compared.
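
    As a rough sketch of that clustering step (not the author's pipeline), the example below runs scikit-learn's MeanShift with a constant bandwidth on simulated horizontal LiDAR coordinates and counts the resulting candidate stems.

        # Sketch: mean shift clustering of (x, y) LiDAR returns; each cluster
        # center is treated as a candidate tree position. Points are simulated.
        import numpy as np
        from sklearn.cluster import MeanShift

        rng = np.random.default_rng(5)
        stems = rng.uniform(0, 50, size=(15, 2))               # hypothetical stem locations
        points = np.vstack([s + rng.normal(scale=1.5, size=(200, 2)) for s in stems])

        ms = MeanShift(bandwidth=3.0).fit(points)              # constant-bandwidth variant
        print("candidate trees detected:", len(ms.cluster_centers_))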

  17. Randomized tree construction algorithm to explore energy landscapes.

    PubMed

    Jaillet, Léonard; Corcho, Francesc J; Pérez, Juan-Jesús; Cortés, Juan

    2011-12-01

    In this work, a new method for exploring conformational energy landscapes is described. The method, called transition-rapidly exploring random tree (T-RRT), combines ideas from statistical physics and robot path planning algorithms. A search tree is constructed on the conformational space starting from a given state. The tree expansion is driven by a double strategy: on the one hand, it is naturally biased toward yet unexplored regions of the space; on the other, a Monte Carlo-like transition test guides the expansion toward energetically favorable regions. The balance between these two strategies is automatically achieved due to a self-tuning mechanism. The method is able to efficiently find both energy minima and transition paths between them. As a proof of concept, the method is applied to two academic benchmarks and the alanine dipeptide.

  18. Randomized tree construction algorithm to explore energy landscapes.

    PubMed

    Jaillet, Léonard; Corcho, Francesc J; Pérez, Juan-Jesús; Cortés, Juan

    2011-12-01

    In this work, a new method for exploring conformational energy landscapes is described. The method, called transition-rapidly exploring random tree (T-RRT), combines ideas from statistical physics and robot path planning algorithms. A search tree is constructed on the conformational space starting from a given state. The tree expansion is driven by a double strategy: on the one hand, it is naturally biased toward yet unexplored regions of the space; on the other, a Monte Carlo-like transition test guides the expansion toward energetically favorable regions. The balance between these two strategies is automatically achieved due to a self-tuning mechanism. The method is able to efficiently find both energy minima and transition paths between them. As a proof of concept, the method is applied to two academic benchmarks and the alanine dipeptide. PMID:21919017
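
    The expansion loop described above is compact enough to sketch. The toy below grows a tree on a two-well energy surface using a fixed-temperature Metropolis-style transition test; the paper's self-tuning temperature mechanism is omitted, and the landscape and parameters are illustrative.

        # Sketch: T-RRT-style expansion on a toy two-well energy surface.
        # A node extends toward a random sample (exploration bias) and is kept
        # only if a Metropolis-style test on the energy increase passes.
        import math, random

        def energy(p):
            x, y = p
            return (x ** 2 - 1) ** 2 + y ** 2   # minima near (-1, 0) and (1, 0)

        def t_rrt(steps=3000, step_size=0.05, temperature=0.1):
            random.seed(0)
            tree = [(-1.0, 0.0)]                # start in the left minimum
            for _ in range(steps):
                target = (random.uniform(-2, 2), random.uniform(-2, 2))
                near = min(tree, key=lambda q: (q[0] - target[0]) ** 2 + (q[1] - target[1]) ** 2)
                d = math.hypot(target[0] - near[0], target[1] - near[1]) or 1.0
                new = (near[0] + step_size * (target[0] - near[0]) / d,
                       near[1] + step_size * (target[1] - near[1]) / d)
                dE = energy(new) - energy(near)
                if dE <= 0 or random.random() < math.exp(-dE / temperature):
                    tree.append(new)            # transition test passed
            return tree

        tree = t_rrt()
        print("nodes:", len(tree), "| crossed into right well:", any(x > 0.9 for x, _ in tree))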

  19. Random matrix approach to quantum adiabatic evolution algorithms

    SciTech Connect

    Boulatov, A.; Smelyanskiy, V.N.

    2005-05-15

    We analyze the power of the quantum adiabatic evolution algorithm (QAA) for solving random computationally hard optimization problems within a theoretical framework based on random matrix theory (RMT). We present two types of driven RMT models. In the first model, the driving Hamiltonian is represented by Brownian motion in the matrix space. We use the Brownian motion model to obtain a description of multiple avoided crossing phenomena. We show that nonadiabatic corrections in the QAA are due to the interaction of the ground state with the 'cloud' formed by most of the excited states, confirming that in driven RMT models, the Landau-Zener scenario of pairwise level repulsions is not relevant for the description of nonadiabatic corrections. We show that the QAA has a finite probability of success in a certain range of parameters, implying a polynomial complexity of the algorithm. The second model corresponds to the standard QAA with the problem Hamiltonian taken from the RMT Gaussian unitary ensemble (GUE). We show that the level dynamics in this model can be mapped onto the dynamics in the Brownian motion model. For this reason, the driven GUE model can also lead to polynomial complexity of the QAA. The main contribution to the failure probability of the QAA comes from the nonadiabatic corrections to the eigenstates, which only depend on the absolute values of the transition amplitudes. Due to the mapping between the two models, these absolute values are the same in both cases. Our results indicate that this 'phase irrelevance' is the leading effect that can make both the Markovian- and GUE-type QAAs successful.

  20. Hydrologic Landscape Regionalisation Using Deductive Classification and Random Forests

    PubMed Central

    Brown, Stuart C.; Lester, Rebecca E.; Versace, Vincent L.; Fawcett, Jonathon; Laurenson, Laurie

    2014-01-01

    Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications which have required the use of an a priori spatial unit (e.g. a catchment) which necessarily results in the loss of variability that is known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different numbers of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km2 of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less-accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km2, demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising, suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic trends at the scale of

  1. Global Crustal Heat Flow Using Random Decision Forest Prediction

    NASA Astrophysics Data System (ADS)

    Becker, J. J.; Wood, W. T.; Martin, K. M.

    2014-12-01

    We have applied supervised learning with random decision forests (RDF) to estimate, or predict, a global, densely spaced grid of crustal heat flow. The results are similar to previously published global heat flow predictions but are more accurate and offer higher resolution. The training inputs are measurement values and uncertainties from existing sparsely sampled (~8,000 locations), geographically biased, yet globally extensive datasets of crustal heat flow. The RDF estimate is a highly non-linear empirical relationship between crustal heat flow and dozens of other parameters (attributes) for which we have densely sampled global estimates (e.g., crustal age, water depth, crustal thickness, seismic sound speed, seafloor temperature, sediment thickness, and sediment grain type). Synthetic attributes were key to obtaining good results with the RDF method. We created synthetic attributes by applying physical intuition and statistical analyses to the fundamental attributes. Statistics include the median, kurtosis, and dozens of other functions, all calculated at every node and averaged over a variety of ranges from 5 to 500 km. Other synthetic attributes are simply plausible (e.g., distance from volcanoes, seafloor porosity, mean grain size). More than 600 densely sampled attributes are used in our prediction, and for each we estimated its relative importance. The important attributes included all those expected from geophysics (e.g., inverse square root of age, gradient of depth, crustal thickness, crustal density, sediment thickness, distance from trenches), and some unexpected but plausible attributes (e.g., seafloor temperature), but none that were unphysical. The simplicity of the RDF technique may also be of great interest beyond the discipline of crustal heat flow, as it allows for more geologically intelligent predictions, decreasing the effect of sampling bias and improving predictions in regions with little or no data, while rigorously
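
    A hedged sketch of the synthetic-attribute idea follows: derive physically motivated transforms of a fundamental attribute and let the forest rank them by importance. The "physics" and the grids below are simulated stand-ins, not the study's data.

        # Sketch: synthetic attributes (e.g. 1/sqrt(age)) ranked by RF importance.
        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(8)
        age = rng.uniform(1.0, 180.0, size=3000)        # seafloor age, Myr (stand-in)
        depth = rng.uniform(2000.0, 6000.0, size=3000)  # water depth, m (stand-in)
        heat_flow = 500.0 / np.sqrt(age) + rng.normal(scale=5.0, size=3000)  # toy model

        names = ["age", "depth", "1/sqrt(age)", "log(age)"]
        features = np.column_stack([age, depth, 1.0 / np.sqrt(age), np.log(age)])
        rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(features, heat_flow)
        for name, imp in zip(names, rf.feature_importances_):
            print(f"{name:12s} importance {imp:.2f}")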

  2. Hydrologic landscape regionalisation using deductive classification and random forests.

    PubMed

    Brown, Stuart C; Lester, Rebecca E; Versace, Vincent L; Fawcett, Jonathon; Laurenson, Laurie

    2014-01-01

    Landscape classification and hydrological regionalisation studies are being increasingly used in ecohydrology to aid in the management and research of aquatic resources. We present a methodology for classifying hydrologic landscapes based on spatial environmental variables by employing non-parametric statistics and hybrid image classification. Our approach differed from previous classifications which have required the use of an a priori spatial unit (e.g. a catchment) which necessarily results in the loss of variability that is known to exist within those units. The use of a simple statistical approach to identify an appropriate number of classes eliminated the need for large amounts of post-hoc testing with different numbers of groups, or the selection and justification of an arbitrary number. Using statistical clustering, we identified 23 distinct groups within our training dataset. The use of a hybrid classification employing random forests extended this statistical clustering to an area of approximately 228,000 km2 of south-eastern Australia without the need to rely on catchments, landscape units or stream sections. This extension resulted in a highly accurate regionalisation at both 30-m and 2.5-km resolution, and a less-accurate 10-km classification that would be more appropriate for use at a continental scale. A smaller case study, of an area covering 27,000 km2, demonstrated that the method preserved the intra- and inter-catchment variability that is known to exist in local hydrology, based on previous research. Preliminary analysis linking the regionalisation to streamflow indices is promising, suggesting that the method could be used to predict streamflow behaviour in ungauged catchments. Our work therefore simplifies current classification frameworks that are becoming more popular in ecohydrology, while better retaining small-scale variability in hydrology, thus enabling future attempts to explain and visualise broad-scale hydrologic trends at the scale of
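
    The hybrid workflow above (statistical clustering to define classes, then a random forest to extend them across the full grid) can be outlined as follows. The cluster count of 23 matches the abstract; the environmental variables are simulated stand-ins, and KMeans stands in for whatever clustering the authors used.

        # Sketch: unsupervised classes on training cells, RF extension to all cells.
        import numpy as np
        from sklearn.cluster import KMeans
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(6)
        train_env = rng.normal(size=(1000, 8))     # environmental variables, training cells
        full_env = rng.normal(size=(20000, 8))     # the full study-area grid

        labels = KMeans(n_clusters=23, n_init=10, random_state=0).fit_predict(train_env)
        rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(train_env, labels)
        regionalisation = rf.predict(full_env)     # hydrologic landscape class per cell
        print("classes mapped:", len(set(regionalisation)))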

  3. Inner Random Restart Genetic Algorithm for Practical Delivery Schedule Optimization

    NASA Astrophysics Data System (ADS)

    Sakurai, Yoshitaka; Takada, Kouhei; Onoyama, Takashi; Tsukamoto, Natsuki; Tsuruta, Setsuo

    A delivery route optimization that improves the efficiency of real-time delivery or a distribution network requires solving Traveling Salesman Problems (TSPs) of several tens to hundreds (but fewer than two thousand) cities within an interactive response time (about 3 seconds or less) and with expert-level accuracy (an error rate of about 3% or less). Further, to make things more difficult, the optimization is subject to the special requirements or preferences of various delivery sites, persons, or societies. To meet these requirements, an Inner Random Restart Genetic Algorithm (Irr-GA) is proposed and developed. This method combines meta-heuristics, such as random restart and GA, with different types of simple heuristics. These simple heuristics are the 2-opt and NI (Nearest Insertion) methods, each applied as a gene operation. The proposed method is hierarchically structured, integrating meta-heuristics and heuristics, both of which are multiple but simple. The method is elaborated so that field experts as well as field engineers can easily understand it, making the solution or method easy to customize and extend according to customers' needs or tastes. Comparison based on the experimental results and consideration proved that the method meets the above requirements better than other methods, judging not only from optimality but also from the simplicity, flexibility, and expandability needed for practical use.
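
    One of the simple inner heuristics named above, 2-opt, is sketched below on random Euclidean cities; this is a generic illustration of the move, not the Irr-GA implementation.

        # Sketch: 2-opt tour improvement, one of the simple heuristics the
        # abstract applies as a gene operation; cities are random points.
        import math, random

        def tour_length(tour, pts):
            return sum(math.dist(pts[tour[i]], pts[tour[(i + 1) % len(tour)]])
                       for i in range(len(tour)))

        def two_opt(tour, pts):
            improved = True
            while improved:
                improved = False
                for i in range(1, len(tour) - 1):
                    for j in range(i + 1, len(tour)):
                        cand = tour[:i] + tour[i:j][::-1] + tour[j:]   # reverse a segment
                        if tour_length(cand, pts) < tour_length(tour, pts):
                            tour, improved = cand, True
            return tour

        random.seed(0)
        pts = [(random.random(), random.random()) for _ in range(60)]
        before = list(range(60))
        after = two_opt(before, pts)
        print("length before:", round(tour_length(before, pts), 2),
              "after 2-opt:", round(tour_length(after, pts), 2))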

  4. Applying a weighted random forests method to extract karst sinkholes from LiDAR data

    NASA Astrophysics Data System (ADS)

    Zhu, Junfeng; Pierskalla, William P.

    2016-02-01

    Detailed mapping of sinkholes provides critical information for mitigating sinkhole hazards and understanding groundwater and surface water interactions in karst terrains. LiDAR (Light Detection and Ranging) measures the earth's surface at high resolution and high density and has shown great potential to drastically improve the location and delineation of sinkholes. However, processing LiDAR data to extract sinkholes requires separating sinkholes from other depressions, which can be laborious because of the sheer number of depressions commonly generated from LiDAR data. In this study, we applied random forests, a machine learning method, to automatically separate sinkholes from other depressions in a karst region in central Kentucky. The sinkhole-extraction random forest was grown on a training dataset built from an area where LiDAR-derived depressions were manually classified through visual inspection and field verification. Based on the geometry of depressions, as well as natural and human factors related to sinkholes, 11 parameters were selected as predictive variables to form the dataset. Because the training dataset was imbalanced, with the majority of depressions being non-sinkholes, a weighted random forests method was used to improve the accuracy of predicting sinkholes. The weighted random forest achieved an average accuracy of 89.95% on the training dataset, demonstrating that the random forest can be an effective sinkhole classifier. Testing the random forest in another area, however, resulted in moderate success, with an average accuracy of 73.96%. This study suggests that an automatic sinkhole extraction procedure such as the random forest classifier can significantly reduce time and labor costs and make it more tractable to map sinkholes from LiDAR data over large areas. However, the random forests method cannot totally replace manual procedures, such as visual inspection and field verification.
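
    A minimal sketch of a weighted random forest on an imbalanced depression dataset follows, using scikit-learn's class_weight option as one possible weighting scheme (the study's exact weighting may differ); the 11 predictors and labels are simulated.

        # Sketch: weighted RF for rare sinkholes among many ordinary depressions.
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import train_test_split

        rng = np.random.default_rng(7)
        X = rng.normal(size=(5000, 11))                                           # 11 predictors
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=5000) > 2.5).astype(int)   # rare class

        Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)
        rf = RandomForestClassifier(n_estimators=300, class_weight="balanced_subsample",
                                    random_state=0).fit(Xtr, ytr)
        print("sinkhole recall:", rf.predict(Xte)[yte == 1].mean())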

  5. Improving the chances of successful protein structure determination with a random forest classifier

    SciTech Connect

    Jahandideh, Samad; Jaroszewski, Lukasz; Godzik, Adam

    2014-03-01

    Using an extended set of protein features calculated separately for protein surface and interior, a new version of XtalPred based on a random forest classifier achieves a significant improvement in predicting the success of structure determination from the primary amino-acid sequence. Obtaining diffraction quality crystals remains one of the major bottlenecks in structural biology. The ability to predict the chances of crystallization from the amino-acid sequence of the protein can, at least partly, address this problem by allowing a crystallographer to select homologs that are more likely to succeed and/or to modify the sequence of the target to avoid features that are detrimental to successful crystallization. In 2007, the now widely used XtalPred algorithm [Slabinski et al. (2007), Protein Sci. 16, 2472–2482] was developed. XtalPred classifies proteins into five ‘crystallization classes’ based on a simple statistical analysis of the physicochemical features of a protein. Here, towards the same goal, advanced machine-learning methods are applied and, in addition, the predictive potential of additional protein features such as predicted surface ruggedness, hydrophobicity, side-chain entropy of surface residues and amino-acid composition of the predicted protein surface are tested. The new XtalPred-RF (random forest) achieves significant improvement of the prediction of crystallization success over the original XtalPred. To illustrate this, XtalPred-RF was tested by revisiting target selection from 271 Pfam families targeted by the Joint Center for Structural Genomics (JCSG) in PSI-2, and it was estimated that the number of targets entered into the protein-production and crystallization pipeline could have been reduced by 30% without lowering the number of families for which the first structures were solved. The prediction improvement depends on the subset of targets used as a testing set and reaches 100% (i.e. twofold) for the top class of predicted

  6. RFMirTarget: Predicting Human MicroRNA Target Genes with a Random Forest Classifier

    PubMed Central

    Mendoza, Mariana R.; da Fonseca, Guilherme C.; Loss-Morais, Guilherme; Alves, Ronnie; Margis, Rogerio; Bazzan, Ana L. C.

    2013-01-01

    MicroRNAs are key regulators of eukaryotic gene expression whose fundamental role has already been identified in many cell pathways. The correct identification of miRNA targets is still a major challenge in bioinformatics and has motivated the development of several computational methods to overcome inherent limitations of experimental analysis. Indeed, the best results reported so far in terms of specificity and sensitivity are associated with machine learning-based methods for microRNA-target prediction. Following this trend, in the current paper we discuss and explore a microRNA-target prediction method based on a random forest classifier, namely RFMirTarget. Despite its well-known robustness regarding general classification tasks, to the best of our knowledge, random forests have not been deeply explored for the specific context of predicting microRNA targets. Our framework first analyzes alignments between candidate microRNA-target pairs and extracts a set of structural, thermodynamics, alignment, seed and position-based features, upon which classification is performed. Experiments have shown that RFMirTarget outperforms several well-known classifiers with statistical significance, and that its performance is not impaired by the class imbalance problem or feature correlation. Moreover, comparing it against other algorithms for microRNA target prediction using independent test data sets from TarBase and starBase, we observe a very promising performance, with higher sensitivity in relation to other methods. Finally, tests performed with RFMirTarget show the benefits of feature selection even for a classifier with embedded feature importance analysis, and the consistency between relevant features identified and important biological properties for effective microRNA-target gene alignment. PMID:23922946

  7. Multivariate classification with random forests for gravitational wave searches of black hole binary coalescence

    NASA Astrophysics Data System (ADS)

    Baker, Paul T.; Caudill, Sarah; Hodge, Kari A.; Talukder, Dipongkar; Capano, Collin; Cornish, Neil J.

    2015-03-01

    Searches for gravitational waves produced by coalescing black hole binaries with total masses ≳25 M⊙ use matched filtering with templates of short duration. Non-Gaussian noise bursts in gravitational wave detector data can mimic short signals and limit the sensitivity of these searches. Previous searches have relied on empirically designed statistics incorporating signal-to-noise ratio and signal-based vetoes to separate gravitational wave candidates from noise candidates. We report on sensitivity improvements achieved using a multivariate candidate ranking statistic derived from a supervised machine learning algorithm. We apply the random forest of bagged decision trees technique to two separate searches in the high mass (≳25 M⊙ ) parameter space. For a search which is sensitive to gravitational waves from the inspiral, merger, and ringdown of binary black holes with total mass between 25 M⊙ and 100 M⊙ , we find sensitive volume improvements as high as 70±13%-109±11% when compared to the previously used ranking statistic. For a ringdown-only search which is sensitive to gravitational waves from the resultant perturbed intermediate mass black hole with mass roughly between 10 M⊙ and 600 M⊙ , we find sensitive volume improvements as high as 61±4%-241±12% when compared to the previously used ranking statistic. We also report how sensitivity improvements can differ depending on mass regime, mass ratio, and available data quality information. Finally, we describe the techniques used to tune and train the random forest classifier that can be generalized to its use in other searches for gravitational waves.

  8. Biased Randomized Algorithm for Fast Model-Based Diagnosis

    NASA Technical Reports Server (NTRS)

    Williams, Colin; Vartan, Farrokh

    2005-01-01

    A biased randomized algorithm has been developed to enable the rapid computational solution of a propositional-satisfiability (SAT) problem equivalent to a diagnosis problem. The closest competing methods of automated diagnosis are described in the preceding article "Fast Algorithms for Model-Based Diagnosis" and in "Two Methods of Efficient Solution of the Hitting-Set Problem" (NPO-30584), which appears elsewhere in this issue. It is necessary to recapitulate some of the information from the cited articles as a prerequisite to a description of the present method. As used here, "diagnosis" signifies, more precisely, a type of model-based diagnosis in which one explores any logical inconsistencies between the observed and expected behaviors of an engineering system. The function of each component and the interconnections among all the components of the engineering system are represented as a logical system. Hence, the expected behavior of the engineering system is represented as a set of logical consequences. Faulty components lead to inconsistency between the observed and expected behaviors of the system, represented by logical inconsistencies. Diagnosis - the task of finding the faulty components - reduces to finding the components, the abnormalities of which could explain all the logical inconsistencies. One seeks a minimal set of faulty components (denoted a minimal diagnosis), because the trivial solution, in which all components are deemed to be faulty, always explains all inconsistencies. In the methods of the cited articles, the minimal-diagnosis problem is treated as equivalent to a minimal-hitting-set problem, which is translated from a combinatorial to a computational problem by mapping it onto the Boolean-satisfiability and integer-programming problems. The integer-programming approach taken in one of the prior methods is complete (in the sense that it is guaranteed to find a solution if one exists) and slow and yields a lower bound on the size of the
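
    In the same spirit, a biased randomized search for satisfiability can be sketched as a WalkSAT-style loop: pick an unsatisfied clause, then flip either a random variable in it or, with higher probability, the variable whose flip leaves the fewest clauses unsatisfied. This is a generic illustration of biased random flipping, not the article's algorithm; the toy instance is made up.

        # Sketch: WalkSAT-style biased randomized search over truth assignments.
        import random

        def walksat(clauses, n_vars, flips=10000, p_random=0.3):
            random.seed(0)
            assign = [random.choice([True, False]) for _ in range(n_vars + 1)]  # index 0 unused
            def sat(c):
                return any(assign[abs(l)] == (l > 0) for l in c)
            for _ in range(flips):
                unsat = [c for c in clauses if not sat(c)]
                if not unsat:
                    return assign[1:]                    # all clauses satisfied
                clause = random.choice(unsat)
                if random.random() < p_random:
                    var = abs(random.choice(clause))     # diversifying random flip
                else:                                    # biased (greedy) flip
                    def broken(v):
                        assign[v] = not assign[v]
                        count = sum(not sat(c) for c in clauses)
                        assign[v] = not assign[v]
                        return count
                    var = min((abs(l) for l in clause), key=broken)
                assign[var] = not assign[var]
            return None

        # toy CNF as signed-literal lists: (x1 v x2), (~x1 v x3), (~x2 v ~x3), (x2 v x3)
        clauses = [[1, 2], [-1, 3], [-2, -3], [2, 3]]
        print("satisfying assignment:", walksat(clauses, 3))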

  9. Phase Transitions in Sampling Algorithms and the Underlying Random Structures

    NASA Astrophysics Data System (ADS)

    Randall, Dana

    Sampling algorithms based on Markov chains arise in many areas of computing, engineering and science. The idea is to perform a random walk among the elements of a large state space so that samples chosen from the stationary distribution are useful for the application. In order to get reliable results, we require the chain to be rapidly mixing, or quickly converging to equilibrium. For example, to sample independent sets in a given graph G, the so-called hard-core lattice gas model, we can start at any independent set and repeatedly add or remove a single vertex (if allowed). By defining the transition probabilities of these moves appropriately, we can ensure that the chain will converge to a useful distribution over the state space Ω. For instance, the Gibbs (or Boltzmann) distribution, parameterized by Λ > 0, is defined so that π(I) = Λ^{|I|}/Z, where Z = Σ_{J ∈ Ω} Λ^{|J|} is the normalizing constant known as the partition function. An interesting phenomenon occurs as Λ is varied. For small values of Λ, local Markov chains converge quickly to stationarity, while for large values, they are prohibitively slow. To see why, imagine the underlying graph G is a region of the Cartesian lattice. Large independent sets will dominate the stationary distribution π when Λ is sufficiently large, and yet it will take a very long time to move from an independent set lying mostly on the odd sublattice to one that is mostly even. This phenomenon is well known in the statistical physics community, and is characterized by a phase transition in the underlying model.
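
    The add/remove chain described above is concrete enough to code directly. The sketch below implements Glauber dynamics for the hard-core model on an n x n grid: a uniformly chosen vertex is resampled to be occupied with probability Λ/(1+Λ) whenever none of its neighbors is occupied, which leaves π(I) ∝ Λ^{|I|} stationary. Parameters are illustrative.

        # Sketch: Glauber dynamics for the hard-core (independent set) model.
        import random

        def glauber_hardcore(n, lam, steps=100000):
            random.seed(0)
            vertices = [(i, j) for i in range(n) for j in range(n)]
            nbrs = {(i, j): [(i + di, j + dj) for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                             if 0 <= i + di < n and 0 <= j + dj < n]
                    for i, j in vertices}
            indep = set()
            for _ in range(steps):
                v = random.choice(vertices)
                if v in indep:
                    if random.random() < 1.0 / (1.0 + lam):   # resample to empty
                        indep.discard(v)
                elif all(u not in indep for u in nbrs[v]):    # move allowed
                    if random.random() < lam / (1.0 + lam):   # resample to occupied
                        indep.add(v)
            return indep

        print("|I| after mixing at lam = 0.5:", len(glauber_hardcore(20, 0.5)))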

  10. Adaptive randomized algorithms for analysis and design of control systems under uncertain environments

    NASA Astrophysics Data System (ADS)

    Chen, Xinjia

    2015-05-01

    We consider the general problem of analysis and design of control systems in the presence of uncertainties. We treat uncertainties that affect a control system as random variables. The performance of the system is measured by the expectation of some derived random variables, which are typically bounded. We develop adaptive sequential randomized algorithms for estimating and optimizing the expectation of such bounded random variables with guaranteed accuracy and confidence level. These algorithms can be applied to overcome the conservatism and computational complexity in the analysis and design of controllers to be used in uncertain environments. We develop methods for investigating the optimality and computational complexity of such algorithms.
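
    The algorithms in this record are adaptive and sequential; as a simpler non-adaptive baseline under the same boundedness assumption, the Hoeffding inequality already yields a fixed sample size with guaranteed accuracy and confidence, as in this sketch (the performance measure is a made-up example):

        import math, random

        def hoeffding_sample_size(eps, delta):
            # Samples needed so that |mean - E[X]| < eps with probability
            # at least 1 - delta, for X bounded in [0, 1].
            return math.ceil(math.log(2.0 / delta) / (2.0 * eps ** 2))

        def estimate_mean(draw, eps=0.02, delta=0.05):
            n = hoeffding_sample_size(eps, delta)
            return sum(draw() for _ in range(n)) / n

        # Toy performance measure: probability that a random gain stays below 2.
        print(estimate_mean(lambda: 1.0 if random.uniform(0.5, 2.5) < 2.0 else 0.0))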

  11. Prediction of Protein Cleavage Site with Feature Selection by Random Forest

    PubMed Central

    Li, Bi-Qing; Cai, Yu-Dong; Feng, Kai-Yan; Zhao, Gui-Jun

    2012-01-01

    Proteinases play critical roles in both intra- and extracellular processes by binding and cleaving their protein substrates. The cleavage can either be non-specific, as part of degradation during protein catabolism, or highly specific, as part of proteolytic cascades and signal transduction events. Identification of these targets is extremely challenging. Current computational approaches for predicting cleavage sites are very limited since they mainly represent the amino acid sequences as patterns or frequency matrices. In this work, we developed a novel predictor based on the Random Forest (RF) algorithm, using the maximum relevance minimum redundancy (mRMR) method followed by incremental feature selection (IFS). Features capturing physicochemical/biochemical properties, sequence conservation, residual disorder, amino acid occurrence frequency, secondary structure, and solvent accessibility were utilized to represent the peptides concerned. We compared our method with existing prediction tools available for predicting possible cleavage sites in candidate substrates. It is shown that our method makes much more reliable predictions in terms of overall prediction accuracy. In addition, this predictor allows the use of a wide range of proteinases. PMID:23029276
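
    A minimal sketch of the incremental feature selection (IFS) step, assuming scikit-learn, synthetic data in place of the peptide features, and random forest importances in place of the paper's mRMR ranking:

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score

        X, y = make_classification(n_samples=300, n_features=40,
                                   n_informative=8, random_state=0)

        # Rank features once, then evaluate nested subsets of growing size
        # and keep the best-scoring subset (the IFS idea).
        rf = RandomForestClassifier(random_state=0).fit(X, y)
        rank = np.argsort(rf.feature_importances_)[::-1]
        results = []
        for k in range(1, 21):
            score = cross_val_score(RandomForestClassifier(random_state=0),
                                    X[:, rank[:k]], y, cv=5).mean()
            results.append((score, k))
        best_score, best_k = max(results)
        print(best_k, round(best_score, 3))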

  12. What variables are important in predicting bovine viral diarrhea virus? A random forest approach.

    PubMed

    Machado, Gustavo; Mendoza, Mariana Recamonde; Corbellini, Luis Gustavo

    2015-07-24

    Bovine viral diarrhea virus (BVDV) causes one of the most economically important diseases in cattle, and the virus is found worldwide. A better understanding of the disease-associated factors is a crucial step towards the definition of strategies for control and eradication. In this study we trained a random forest (RF) prediction model and performed variable importance analysis to identify factors associated with BVDV occurrence. In addition, we assessed the influence of feature selection on RF performance and evaluated its predictive power relative to other popular classifiers and to logistic regression. We found that the RF classification model resulted in an average error rate of 32.03% for the negative class (negative for BVDV) and 36.78% for the positive class (positive for BVDV). The RF model presented an area under the ROC curve of 0.702. Variable importance analysis revealed that important predictors of BVDV occurrence were: a) who inseminates the animals, b) the number of neighboring farms that have cattle, and c) whether rectal palpation is performed routinely. Our results suggest that the use of machine learning algorithms, especially RF, is a promising methodology for the analysis of cross-sectional studies, presenting satisfactory predictive power and the ability to identify predictors that represent potential risk factors for BVDV investigation. We examined classical predictors and found some new and hard-to-control practices that may lead to the spread of this disease within and among farms, mainly regarding poor or neglected reproduction management, which should be considered for disease control and eradication.

  13. Urban land cover thematic disaggregation, employing datasets from multiple sources and RandomForests modeling

    NASA Astrophysics Data System (ADS)

    Gounaridis, Dimitrios; Koukoulas, Sotirios

    2016-09-01

    Urban land cover mapping has lately attracted a vast amount of attention as it closely relates to a broad scope of scientific and management applications. Recent methodological and technological advancements facilitate the development of datasets with improved accuracy. However, the thematic resolution of urban land cover has received much less attention so far, a fact that hampers the utility of the produced datasets. This paper seeks to provide insights towards the improvement of the thematic resolution of urban land cover classification. We integrate existing, readily available datasets from multiple sources, with acceptable accuracies, with remote sensing techniques. The study site is Greece, and the urban land cover is classified nationwide into five classes using the RandomForests algorithm. The results allowed us to quantify, for the first time with good accuracy, the proportion occupied by each urban land cover class. The total area covered by urban land cover is 2280 km² (1.76% of the total terrestrial area), the dominant class is discontinuous dense urban fabric (50.71% of urban land cover) and the least occurring class is discontinuous very low density urban fabric (2.06% of urban land cover).

  14. Automatic co-segmentation of lung tumor based on random forest in PET-CT images

    NASA Astrophysics Data System (ADS)

    Jiang, Xueqing; Xiang, Dehui; Zhang, Bin; Zhu, Weifang; Shi, Fei; Chen, Xinjian

    2016-03-01

    In this paper, a fully automatic method is proposed to segment the lung tumor in clinical 3D PET-CT images. The proposed method effectively combines PET and CT information to make full use of the high contrast of PET images and the superior spatial resolution of CT images. Our approach consists of three main parts: (1) initial segmentation, in which spines are removed in the CT images and initial connected regions are obtained by threshold-based segmentation in the PET images; (2) coarse segmentation, in which a monotonic downhill function is applied to rule out structures that have standardized uptake values (SUV) similar to the lung tumor but do not satisfy a monotonic property in the PET images; and (3) fine segmentation, in which a random forest is applied to accurately segment the lung tumor by extracting effective features from the PET and CT images simultaneously. We validated our algorithm on a dataset consisting of 24 3D PET-CT images from different patients with non-small cell lung cancer (NSCLC). The average TPVF, FPVF and accuracy rate (ACC) were 83.65%, 0.05% and 99.93%, respectively. The correlation analysis shows that our segmented lung tumor volumes have a strong correlation (average 0.985) with ground truth 1 and ground truth 2 labeled by a clinical expert.

  15. Effects of systematic phase errors on optimized quantum random-walk search algorithm

    NASA Astrophysics Data System (ADS)

    Zhang, Yu-Chao; Bao, Wan-Su; Wang, Xiang; Fu, Xiang-Qun

    2015-06-01

    This study investigates the effects of systematic errors in phase inversions on the success rate and number of iterations in the optimized quantum random-walk search algorithm. Using the geometric description of this algorithm, a model of the algorithm with phase errors is established, and the relationship between the success rate of the algorithm, the database size, the number of iterations, and the phase error is determined. For a given database size, we obtain both the maximum success rate of the algorithm and the required number of iterations when phase errors are present in the algorithm. Analyses and numerical simulations show that the optimized quantum random-walk search algorithm is more robust against phase errors than Grover’s algorithm. Project supported by the National Basic Research Program of China (Grant No. 2013CB338002).

  16. Random Survival Forests for Predicting the Bed Occupancy in the Intensive Care Unit

    PubMed Central

    Ruyssinck, Joeri; Houthooft, Rein; Couckuyt, Ivo; Gadeyne, Bram; Colpaert, Kirsten; Decruyenaere, Johan; De Turck, Filip; Dhaene, Tom

    2016-01-01

    Predicting the bed occupancy of an intensive care unit (ICU) is a daunting task. The uncertainty associated with the prognosis of critically ill patients and the random arrival of new patients can lead to capacity problems and the need for reactive measures. In this paper, we work towards a predictive model based on Random Survival Forests which can assist physicians in estimating the bed occupancy. As input data, we make use of the Sequential Organ Failure Assessment (SOFA) score collected and calculated from 4098 patients at two ICU units of Ghent University Hospital over a time period of four years. We compare the performance of our system with a baseline performance and a standard Random Forest regression approach. Our results indicate that Random Survival Forests can effectively be used to assist in the occupancy prediction problem. Furthermore, we show that a group based approach, such as Random Survival Forests, performs better compared to a setting in which the length of stay of a patient is individually assessed.
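
    A minimal sketch of fitting a random survival forest, assuming the scikit-survival package, with made-up features and length-of-stay data standing in for the SOFA-based inputs:

        import numpy as np
        from sksurv.ensemble import RandomSurvivalForest

        rng = np.random.default_rng(0)
        X = rng.normal(size=(200, 3))             # e.g. severity-score features
        time = rng.exponential(5.0, size=200)     # length of stay (days)
        event = rng.random(200) < 0.9             # True = discharge observed
        y = np.array(list(zip(event, time)),
                     dtype=[("event", bool), ("time", float)])

        rsf = RandomSurvivalForest(n_estimators=100, random_state=0).fit(X, y)
        surv = rsf.predict_survival_function(X[:5], return_array=True)
        print(surv.shape)   # (5, n_event_times): per-patient survival curves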

  17. Automated segmentation of thyroid gland on CT images with multi-atlas label fusion and random classification forest

    NASA Astrophysics Data System (ADS)

    Liu, Jiamin; Chang, Kevin; Kim, Lauren; Turkbey, Evrim; Lu, Le; Yao, Jianhua; Summers, Ronald

    2015-03-01

    The thyroid gland plays an important role in clinical practice, especially for radiation therapy treatment planning. For patients with head and neck cancer, radiation therapy requires a precise delineation of the thyroid gland, to be spared, on the pre-treatment planning CT images to avoid thyroid dysfunction. In the current clinical workflow, the thyroid gland is normally delineated manually by radiologists or radiation oncologists, which is time consuming and error prone. Therefore, a system for automated segmentation of the thyroid is desirable. However, automated segmentation of the thyroid is challenging because the thyroid is inhomogeneous and surrounded by structures that have similar intensities. In this work, the thyroid gland segmentation is initially estimated by a multi-atlas label fusion algorithm. The segmentation is then refined by supervised statistical learning-based voxel labeling with a random forest algorithm. Multi-atlas label fusion (MALF) transfers expert-labeled thyroids from atlases to a target image using deformable registration. Errors produced by label transfer are reduced by label fusion, which combines the results produced by all atlases into a consensus solution. Then, random forest (RF) employs an ensemble of decision trees trained on labeled thyroids to recognize features. The trained forest classifier is then applied to the thyroid estimated from the MALF by voxel scanning to assign the class-conditional probability. Voxels from the expert-labeled thyroids in CT volumes are treated as the positive class; background non-thyroid voxels as negatives. We applied this automated thyroid segmentation system to CT scans of 20 patients. The results showed that the MALF achieved an overall Dice similarity coefficient (DSC) of 0.75 and the RF classification further improved the DSC to 0.81.

  18. Random Forest and Objected-Based Classification for Forest Pest Extraction from Uav Aerial Imagery

    NASA Astrophysics Data System (ADS)

    Yuan, Yi; Hu, Xiangyun

    2016-06-01

    Forest pests are one of the most important factors affecting forest health. However, because it is difficult to delineate infested areas and to predict how an infestation will spread, partial control and extermination measures have not been effective, and infested areas continue to spread. The introduction of spatial information technology is therefore highly desirable: by periodically mapping the infestation as early as possible and predicting how it will spread, the spatial distribution characteristics can be examined and timely control strategies established. Now that UAV photography has become increasingly popular, it is much cheaper and faster to acquire UAV images, which are well suited to monitoring forest health and detecting pests. This paper proposes a new method for the effective detection of forest pests in UAV aerial imagery. An image is first segmented into superpixels, and a 12-dimensional statistical texture descriptor is then computed for each superpixel and used to train the classifier and classify the data. Finally, the classification results are refined by some simple rules. The experiments show that the method is effective for the extraction of forest pest areas in UAV images.
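
    A rough sketch of the per-superpixel pipeline, assuming scikit-image and scikit-learn, with simple per-segment statistics standing in for the paper's 12-dimensional texture descriptor and placeholder labels:

        import numpy as np
        from skimage.segmentation import slic
        from sklearn.ensemble import RandomForestClassifier

        def superpixel_features(image, n_segments=200):
            # SLIC superpixels, then a few summary statistics per segment.
            segments = slic(image, n_segments=n_segments, start_label=0)
            feats = []
            for s in range(segments.max() + 1):
                pix = image[segments == s]        # pixels of one superpixel
                feats.append([pix.mean(), pix.std(), np.median(pix)])
            return segments, np.array(feats)

        image = np.random.rand(120, 120, 3)       # stand-in for a UAV tile
        segments, X = superpixel_features(image)
        y = np.random.randint(0, 2, len(X))       # placeholder pest/healthy labels
        clf = RandomForestClassifier(n_estimators=100).fit(X, y)
        print(clf.predict(X[:5]))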

  19. Variable selection with random forest: Balancing stability, performance, and interpretation in ecological and environmental modeling

    EPA Science Inventory

    Random forest (RF) is popular in ecological and environmental modeling, in part, because of its insensitivity to correlated predictors and resistance to overfitting. Although variable selection has been proposed to improve both performance and interpretation of RF models, it is u...

  20. Oak Park and River Forest High School Random Access Information Center; A PACE Program. Report II.

    ERIC Educational Resources Information Center

    Oak Park - River Forest High School, Oak Park, IL.

    The specifications, planning, and initial development phases of the Random Access Center at the Oak Park and River Forest High School in Oak Park, Illinois, are described with particular attention to the ways that the five functional specifications and the five-part program rationale were implemented in the system design. Specifications, set out…

  1. The Random Forests Statistical Technique: An Examination of Its Value for the Study of Reading

    ERIC Educational Resources Information Center

    Matsuki, Kazunaga; Kuperman, Victor; Van Dyke, Julie A.

    2016-01-01

    Studies investigating individual differences in reading ability often involve data sets containing a large number of collinear predictors and a small number of observations. In this article, we discuss the method of Random Forests and demonstrate its suitability for addressing the statistical concerns raised by such data sets. The method is…

  2. Source-enhanced coalescence of trees in a random forest

    NASA Astrophysics Data System (ADS)

    Lushnikov, A. A.

    2015-08-01

    The time evolution of a random graph with a varying number of edges and vertices is considered. The edges and vertices are assumed to be added at random, one at a time, with different rates. A fresh edge either connects two linked components and forms a new component of larger order g (coalescence of graphs) or increases (by one) the number of edges in a given linked component (cycling). Assuming the vertices to have a finite valence (the number of edges connected to a given vertex is limited), the kinetic equation for the distribution of linked components of the graph over their orders and valences is formulated and solved exactly, by applying the generating function method, for the case of coalescence of trees. The evolution process is shown to reveal a phase transition: the emergence of a giant linked component whose order is comparable to the total order of the graph. The time dependencies of the moments of the distribution of linked components over their orders and valences are found explicitly for the pre-gelation period, and the critical behavior of the spectrum is analyzed. It is found that the linked components are γ-distributed over g with the algebraic prefactor g^{-5/2}. The coalescence process is shown to terminate in the formation of a steady-state γ spectrum with the same algebraic prefactor.

  3. Random forest fishing: a novel approach to identifying organic group of risk factors in genome-wide association studies

    PubMed Central

    Yang, Wei; Charles Gu, C

    2014-01-01

    Genome-wide association studies (GWAS) have brought methodological challenges in handling massive high-dimensional data and also real opportunities for studying the joint effect of many risk factors acting in concert as an organic group. The random forest (RF) methodology is recognized by many for its potential in examining interaction effects in large data sets. However, RF is not designed to directly handle GWAS data, which typically have hundreds of thousands of single-nucleotide polymorphisms as predictor variables. We propose and evaluate a novel extension of RF, called random forest fishing (RFF), for GWAS analysis. RFF repeatedly updates a relatively small set of predictors obtained by RF tests to find globally important groups predictive of the disease phenotype, using a novel search algorithm based on genetic programming and simulated annealing. A key improvement of RFF results from the use of guidance incorporating empirical test results of genome-wide pairwise interactions. Evaluated using simulated and real GWAS data sets, RFF is shown to be effective in identifying important predictors, particularly when both marginal effects and interactions exist, and is applicable to very large GWAS data sets. PMID:23695277

  4. Fault diagnosis of spur gearbox based on random forest and wavelet packet decomposition

    NASA Astrophysics Data System (ADS)

    Cabrera, Diego; Sancho, Fernando; Sánchez, René-Vinicio; Zurita, Grover; Cerrada, Mariela; Li, Chuan; Vásquez, Rafael E.

    2015-09-01

    This paper addresses the development of a random forest classifier for multi-class fault diagnosis in spur gearboxes. The vibration signal's condition parameters are first extracted by applying wavelet packet decomposition with multiple mother wavelets, and the coefficients' energy content for the terminal nodes is used as the input feature for the classification problem. Then, a study over the parameter space is performed to find the best values for the number of trees and the number of random features. In this way, the best set of mother wavelets for the application is identified and the best features are selected through the internal ranking of the random forest classifier. The results show that the proposed method reached 98.68% classification accuracy, with high efficiency and robustness in the models.
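
    A minimal sketch of this feature extraction step, assuming the PyWavelets package, with 'db4' as an illustrative mother wavelet and a synthetic signal in place of measured vibrations:

        import numpy as np
        import pywt

        def wpd_energy_features(signal, wavelet="db4", level=3):
            # Wavelet packet decomposition; the energy of each terminal
            # node's coefficient vector becomes one input feature.
            wp = pywt.WaveletPacket(data=signal, wavelet=wavelet, maxlevel=level)
            nodes = wp.get_level(level, order="natural")
            energies = np.array([np.sum(np.square(n.data)) for n in nodes])
            return energies / energies.sum()      # normalized energy per band

        signal = np.sin(np.linspace(0, 100, 2048)) + 0.1 * np.random.randn(2048)
        print(wpd_energy_features(signal))        # 2**level = 8 features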

  5. Relevant feature set estimation with a knock-out strategy and random forests.

    PubMed

    Ganz, Melanie; Greve, Douglas N; Fischl, Bruce; Konukoglu, Ender

    2015-11-15

    Group analysis of neuroimaging data is a vital tool for identifying anatomical and functional variations related to diseases as well as normal biological processes. The analyses are often performed on a large number of highly correlated measurements using a relatively small number of samples. Despite the correlation structure, the most widely used approach is to analyze the data using univariate methods followed by post-hoc corrections that try to account for the data's multivariate nature. Although widely used, this approach may fail to recover from the adverse effects of the initial analysis when local effects are not strong. Multivariate pattern analysis (MVPA) is a powerful alternative to the univariate approach for identifying relevant variations. By jointly analyzing all the measures, MVPA techniques can detect global effects even when individual local effects are too weak to detect with univariate analysis. Current approaches are successful in identifying variations that yield highly predictive and compact models. However, they suffer from reduced sensitivity and instability in the identification of relevant variations. Furthermore, current methods' user-defined parameters are often unintuitive and difficult to determine. In this article, we propose a novel MVPA method for group analysis of high-dimensional data that overcomes the drawbacks of the current techniques. Our approach explicitly aims to identify all relevant variations using a "knock-out" strategy and the Random Forest algorithm. In evaluations with synthetic datasets the proposed method achieved substantially higher sensitivity and accuracy than the state-of-the-art MVPA methods, and outperformed the univariate approach when the effect size is low. In experiments with real datasets the proposed method identified regions beyond the univariate approach, while other MVPA methods failed to replicate the univariate results. More importantly, in a reproducibility study with the well-known ADNI dataset

  6. Genotype-phenotype matching analysis of 38 Lactococcus lactis strains using random forest methods

    PubMed Central

    2013-01-01

    Background Lactococcus lactis is used in dairy food fermentation and for the efficient production of industrially relevant enzymes. The genome content and different phenotypes have been determined for multiple L. lactis strains in order to understand intra-species genotype and phenotype diversity and annotate gene functions. In this study, we identified relations between gene presence and a collection of 207 phenotypes across 38 L. lactis strains of dairy and plant origin. Gene occurrence and phenotype data were used in an iterative gene selection procedure, based on the Random Forest algorithm, to identify genotype-phenotype relations. Results A total of 1388 gene-phenotype relations were found, of which some confirmed known gene-phenotype relations, such as the importance of arabinose utilization genes only for strains of plant origin. We also identified a gene cluster related to growth on melibiose, a plant disaccharide; this cluster is present only in melibiose-positive strains and can be used as a genetic marker in trait improvement. Additionally, several novel gene-phenotype relations were uncovered, for instance, genes related to arsenite resistance or arginine metabolism. Conclusions Our results indicate that genotype-phenotype matching by integrating large data sets makes it possible to identify gene-phenotype relations and potentially improve gene function annotation, and that the identified relations can be used for screening bacterial culture collections for desired phenotypes. In addition to all gene-phenotype relations, we also provide coherent phenotype data for 38 Lactococcus strains assessed in 207 different phenotyping experiments, which to our knowledge is the largest such dataset to date for the Lactococcus lactis species. PMID:23530958

  7. Space resection model calculation based on Random Sample Consensus algorithm

    NASA Astrophysics Data System (ADS)

    Liu, Xinzhu; Kang, Zhizhong

    2016-03-01

    Resection has been one of the most important problems in photogrammetry. It aims to recover the position and attitude of the camera at the shooting point. In some cases, however, the observations used in the calculation contain gross errors. This paper presents a robust algorithm that uses the RANSAC method with a DLT model, effectively avoiding the difficulty of determining initial values that arises when using the collinearity equations. The results also show that our strategy can exclude gross errors and leads to an accurate and efficient way to obtain the elements of exterior orientation.
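
    The RANSAC idea can be sketched generically (here for a 2D line rather than the paper's DLT space-resection model): repeatedly fit a model to a minimal random sample, count inliers, and keep the consensus model:

        import random

        def ransac(points, fit, error, n_min, n_iter=500, tol=1.0):
            best_model, best_inliers = None, []
            for _ in range(n_iter):
                sample = random.sample(points, n_min)   # minimal subset
                model = fit(sample)
                inliers = [p for p in points if error(model, p) < tol]
                if len(inliers) > len(best_inliers):
                    # Refit on the consensus set of inliers.
                    best_model, best_inliers = fit(inliers), inliers
            return best_model, best_inliers

        def fit_line(pts):                # least-squares line y = a*x + b
            n = len(pts)
            xbar = sum(x for x, _ in pts) / n
            ybar = sum(y for _, y in pts) / n
            a = sum((x - xbar) * (y - ybar) for x, y in pts) / \
                sum((x - xbar) ** 2 for x, _ in pts)
            return a, ybar - a * xbar

        def line_error(model, p):
            a, b = model
            return abs(p[1] - (a * p[0] + b))

        pts = [(float(x), 2.0 * x + 1.0) for x in range(20)]
        pts += [(3.5, 40.0), (12.5, -3.0)]          # two gross errors
        print(ransac(pts, fit_line, line_error, n_min=2))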

  8. Evaluation of three satellite-based latent heat flux algorithms over forest ecosystems using eddy covariance data.

    PubMed

    Yao, Yunjun; Zhang, Yuhu; Zhao, Shaohua; Li, Xianglan; Jia, Kun

    2015-06-01

    We have evaluated the performance of three satellite-based latent heat flux (LE) algorithms over forest ecosystems using observed data from 40 flux towers distributed across the world on all continents. These are the revised remote sensing-based Penman-Monteith LE (RRS-PM) algorithm, the modified satellite-based Priestley-Taylor LE (MS-PT) algorithm, and the semi-empirical Penman LE (UMD-SEMI) algorithm. Sensitivity analysis illustrates that the energy and vegetation terms have the highest sensitivity compared with other input variables. The validation results show that the three algorithms demonstrate substantial differences in performance for estimating daily LE variations among five forest ecosystem biomes. Based on the average Nash-Sutcliffe efficiency and root-mean-squared error (RMSE), the MS-PT algorithm has high performance over both deciduous broadleaf forest (DBF) (0.81, 25.4 W/m²) and mixed forest (MF) (0.62, 25.3 W/m²) sites, the RRS-PM algorithm has high performance over evergreen broadleaf forest (EBF) (0.4, 28.1 W/m²) sites, and the UMD-SEMI algorithm has high performance over both deciduous needleleaf forest (DNF) (0.78, 17.1 W/m²) and evergreen needleleaf forest (ENF) (0.51, 28.1 W/m²) sites. The lower uncertainties in the required forcing data for the MS-PT algorithm, the complicated algorithm structure of the RRS-PM algorithm, and the ground-calibrated coefficients of the UMD-SEMI algorithm may explain these differences.

  9. Evaluation of Algorithms for Calculating Forest Micrometeorological Variables Using an Extensive Dataset of Paired Station Recordings

    NASA Astrophysics Data System (ADS)

    Garvelmann, J.; Pohl, S.; Warscher, M.; Mair, E.; Marke, T.; Strasser, U.; Kunstmann, H.

    2015-12-01

    Forests represent significant areas of subalpine environments, and their influence is crucial for snow cover dynamics on the ground. Since measurements of the major micrometeorological variables are usually lacking for forested sites, physically based or empirical parameterizations are typically applied to calculate beneath-canopy micrometeorological conditions for snow hydrological modeling. Most of these parameterizations have been developed from observations at selected long-term climate stations. Consequently, the high spatial variability of the micrometeorological variables is usually not taken into account. The goal of this study is to evaluate existing approaches using an extensive dataset collected during five winter seasons using a stratified sampling design with pairs of snow monitoring stations (SnoMoS) at open/forested sites in three study areas (the Black Forest region of SW Germany, the Brixenbach catchment in the Austrian Alps, and the Berchtesgadener Ache catchment in the Berchtesgaden Alps of SE Germany). In total, recordings from 110 station pairs were available for analysis. The measurements of air temperature, relative humidity, wind speed, and global radiation from the open field sites were used to calculate the adjacent inside-forest conditions. The calculated results are compared with the respective beneath-canopy measurements in order to evaluate the applied model algorithms. The results reveal that the algorithms reproduce the inside-canopy conditions for wind speed and global radiation surprisingly well. However, air temperature and relative humidity are not well reproduced. Our study proposes a modification of the two respective parameterizations, developed from the paired measurements.

  10. Improved progressive TIN densification filtering algorithm for airborne LiDAR data in forested areas

    NASA Astrophysics Data System (ADS)

    Zhao, Xiaoqian; Guo, Qinghua; Su, Yanjun; Xue, Baolin

    2016-07-01

    Filtering of light detection and ranging (LiDAR) data into the ground and non-ground points is a fundamental step in processing raw airborne LiDAR data. This paper proposes an improved progressive triangulated irregular network (TIN) densification (IPTD) filtering algorithm that can cope with a variety of forested landscapes, particularly both topographically and environmentally complex regions. The IPTD filtering algorithm consists of three steps: (1) acquiring potential ground seed points using the morphological method; (2) obtaining accurate ground seed points; and (3) building a TIN-based model and iteratively densifying TIN. The IPTD filtering algorithm was tested in 15 forested sites with various terrains (i.e., elevation and slope) and vegetation conditions (i.e., canopy cover and tree height), and was compared with seven other commonly used filtering algorithms (including morphology-based, slope-based, and interpolation-based filtering algorithms). Results show that the IPTD achieves the highest filtering accuracy for nine of the 15 sites. In general, it outperforms the other filtering algorithms, yielding the lowest average total error of 3.15% and the highest average kappa coefficient of 89.53%.

  11. A Copula Based Approach for Design of Multivariate Random Forests for Drug Sensitivity Prediction

    PubMed Central

    Haider, Saad; Rahman, Raziur; Ghosh, Souparno; Pal, Ranadip

    2015-01-01

    Modeling sensitivity to drugs based on genetic characterizations is a significant challenge in the area of systems medicine. Ensemble-based approaches such as random forests have been shown to perform well both in individual sensitivity prediction studies and in team-science-based prediction challenges. However, random forests generate a deterministic predictive model for each drug based on the genetic characterization of the cell lines and ignore the relationship between different drug sensitivities during model generation. This application motivates the need for multivariate ensemble learning techniques that can increase prediction accuracy and improve variable importance ranking by incorporating the relationships between different output responses. In this article, we propose a novel cost criterion that captures the dissimilarity in the output response structure between the training data and node samples as the difference between the two empirical copulas. We illustrate that copulas are suitable for capturing the multivariate structure of output responses independent of the marginal distributions, and that the copula-based multivariate random forest framework can provide higher prediction accuracy and improved variable selection. The proposed framework has been validated on the Genomics of Drug Sensitivity in Cancer and the Cancer Cell Line Encyclopedia databases. PMID:26658256
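
    A minimal sketch of the empirical copula of a multivariate response, the building block of the proposed cost criterion (the node-splitting comparison itself is not reproduced), assuming NumPy:

        import numpy as np

        def empirical_copula(Y):
            # Map each output dimension to normalized ranks: the marginals
            # are discarded, the dependence structure is kept.
            n = Y.shape[0]
            ranks = np.argsort(np.argsort(Y, axis=0), axis=0) + 1
            return ranks / n            # pseudo-observations in (0, 1]

        # Two correlated toy drug responses.
        Y = np.random.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=100)
        U = empirical_copula(Y)
        print(np.corrcoef(U.T)[0, 1])   # dependence survives the transform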

  12. Differentiation of fat, muscle, and edema in thigh MRIs using random forest classification

    NASA Astrophysics Data System (ADS)

    Kovacs, William; Liu, Chia-Ying; Summers, Ronald M.; Yao, Jianhua

    2016-03-01

    There are many diseases that affect the distribution of muscles, including Duchenne and facioscapulohumeral dystrophy, among other myopathies. In these disease cases, it is important to quantify both the muscle and fat volumes to track the disease progression. There has also been evidence that abnormal signal intensity on MR images, which often is an indication of edema or inflammation, can be a good predictor of muscle deterioration. We present a fully automated method that examines magnetic resonance (MR) images of the thigh and identifies the fat, muscle, and edema using a random forest classifier. First, the thigh regions are automatically segmented using the T1 sequence. Then, inhomogeneity artifacts are corrected using the N3 technique. The T1 and STIR (short tau inversion recovery) images are then aligned using landmark-based registration with the bone marrow. The normalized T1 and STIR intensity values are used to train the random forest. Once trained, the random forest can accurately classify the aforementioned classes. This method was evaluated on MR images of 9 patients. The precision values are 0.91+/-0.06, 0.98+/-0.01 and 0.50+/-0.29 for muscle, fat, and edema, respectively. The recall values are 0.95+/-0.02, 0.96+/-0.03 and 0.43+/-0.09 for muscle, fat, and edema, respectively. This demonstrates the feasibility of utilizing information from multiple MR sequences for the accurate quantification of fat, muscle and edema.

  13. An Optimal Parallel Algorithm for Constructing a Spanning Forest on Trapezoid Graphs

    NASA Astrophysics Data System (ADS)

    Honma, Hirotoshi; Masuyama, Shigeru

    Given a simple graph G with n vertices, m edges, and k connected components, the spanning forest problem is to find a spanning tree for each connected component of G. This problem has applications to the electrical power demand problem, computer network design, circuit analysis, etc. An optimal parallel algorithm for finding a spanning tree on a trapezoid graph was given by Bera et al.; it takes O(log n) time with O(n/log n) processors on the EREW (Exclusive-Read Exclusive-Write) PRAM. Bera et al.'s algorithm is very simple and elegant, and it correctly constructs a spanning tree when the graph is connected. However, their algorithm cannot accept a disconnected graph as input: applied to a disconnected graph, a concurrent write occurs once for each connected component, which is not possible on the EREW PRAM. In this paper we present an O(log n)-time parallel algorithm with O(n/log n) processors for constructing a spanning forest of a trapezoid graph G on the EREW PRAM, even if G is disconnected.

  14. Experimental implementation of a quantum random-walk search algorithm using strongly dipolar coupled spins

    SciTech Connect

    Lu Dawei; Peng Xinhua; Du Jiangfeng; Zhu Jing; Zou Ping; Yu Yihua; Zhang Shanmin; Chen Qun

    2010-02-15

    An important quantum search algorithm based on the quantum random walk performs an oracle search on a database of N items with O(√N) calls, yielding a speedup similar to the Grover quantum search algorithm. The algorithm was implemented on a quantum information processor of three-qubit liquid-crystal nuclear magnetic resonance (NMR) for the case of finding 1 out of 4, and tomography of the diagonal elements of all the final density matrices was completed with comprehensible one-dimensional NMR spectra. The experimental results agree well with the theoretical predictions.

  15. Predicting local Soil- and Land-units with Random Forest in the Senegalese Sahel

    NASA Astrophysics Data System (ADS)

    Grau, Tobias; Brandt, Martin; Samimi, Cyrus

    2013-04-01

    MODIS (MCD12Q1) and Globcover are often the only available global land-cover products; however, ground-truthing in the Sahel of Senegal has shown that most classes do not agree with the actual land cover, making those products unusable in any local application. We suggest a methodology that models local Wolof land- and soil-types in an area of the Senegalese Ferlo around Linguère at different scales. In a first step, interviews with the local population were conducted to ascertain the local denotation of soil units, as well as their agricultural use and the woody vegetation mainly growing on them. "Ndjor" are soft sand soils with mainly Combretum glutinosum trees. They are suitable for groundnuts and beans, while millet is grown on hard sand soils ("Bardjen") dominated by Balanites aegyptiaca and Acacia tortilis. "Xur" are clayey depressions with a high diversity of tree species. Lateritic pasture sites with dense woody vegetation (mostly Pterocarpus lucens and Guiera senegalensis) have never been used for cropping and are called "All". In a second step, vegetation and soil parameters of 85 plots (~1 ha) were surveyed in the field. 28 different soil parameters are clustered into 4 classes using the WARD algorithm. Here, 81% agree with the local classification. Then, an ordination (NMDS) with 2 dimensions and a stress value of 9.13% was calculated using the 28 soil parameters. It shows several significant relationships between the soil classes and the fitted environmental parameters, which are derived from field data, a digital elevation model, Landsat and RapidEye imagery, as well as TRMM rainfall data. Landsat's band 5 reflectance (1.55-1.75 µm) of the mean dry-season image (2000-2010) has an R² of 0.42 and is the most important of 9 significant variables (5% level). A random forest classifier is then used to extrapolate the 4 classes to the whole study area based on the 9 significant environmental parameters. At a resolution of 30 m, the OOB (out-of-bag) error

  16. An integrated classifier for computer-aided diagnosis of colorectal polyps based on random forest and location index strategies

    NASA Astrophysics Data System (ADS)

    Hu, Yifan; Han, Hao; Zhu, Wei; Li, Lihong; Pickhardt, Perry J.; Liang, Zhengrong

    2016-03-01

    Feature classification plays an important role in differentiation or computer-aided diagnosis (CADx) of suspicious lesions. As a widely used ensemble learning algorithm for classification, random forest (RF) has a distinguished performance for CADx. Our recent study has shown that the location index (LI), which is derived from the well-known kNN (k nearest neighbor) and wkNN (weighted k nearest neighbor) classifiers [1], also has a distinguished role in classification for CADx. Therefore, in this paper, based on the property that the LI achieves very high accuracy, we design an algorithm to integrate the LI into RF for an improved or higher value of the AUC (area under the receiver operating characteristic, or ROC, curve). Experiments were performed using a database of 153 lesions (polyps), including 116 neoplastic lesions and 37 hyperplastic lesions, with comparison to the existing classifiers RF and wkNN, respectively. A noticeable gain by the proposed integrated classifier was quantified by the AUC measure.

  17. Comparative analyses between retained introns and constitutively spliced introns in Arabidopsis thaliana using random forest and support vector machine.

    PubMed

    Mao, Rui; Raj Kumar, Praveen Kumar; Guo, Cheng; Zhang, Yang; Liang, Chun

    2014-01-01

    One of the important modes of pre-mRNA post-transcriptional modification is alternative splicing. Alternative splicing allows the creation of many distinct mature mRNA transcripts from a single gene by utilizing different splice sites. In plants like Arabidopsis thaliana, the most common type of alternative splicing is intron retention. Many past studies focus on the positional distribution of retained introns (RIs) among different genic regions and their expression regulation, while little systematic classification of RIs from constitutively spliced introns (CSIs) has been conducted using machine learning approaches. We used random forest and a support vector machine (SVM) with a radial basis kernel function (RBF) to differentiate these two types of introns in Arabidopsis. By comparing the coordinates of introns of all annotated mRNAs from TAIR10, we obtained our high-quality experimental data. To distinguish RIs from CSIs, we investigated the unique characteristics of RIs in comparison with CSIs and finally extracted 37 quantitative features: local and global nucleotide sequence features of introns, frequent motifs, the signal strength of splice sites, and the similarity between sequences of introns and their flanking regions. We demonstrated that our proposed feature extraction approach was more accurate in effectively classifying RIs from CSIs in comparison with four other approaches. The optimal penalty parameter C and the RBF kernel parameter γ in the SVM were set based on a particle swarm optimization algorithm (PSOSVM). Our classification performance showed an F-measure of 80.8% (random forest) and 77.4% (PSOSVM). Not only were the basic sequence features and positional distribution characteristics of RIs obtained, but putative regulatory motifs in intron splicing were also predicted based on our feature extraction approach. Clearly, our study will facilitate a better understanding of the underlying mechanisms involved in intron retention.

  19. Magnetic localization and orientation of the capsule endoscope based on a random complex algorithm

    PubMed Central

    He, Xiaoqi; Zheng, Zizhao; Hu, Chao

    2015-01-01

    The development of the capsule endoscope has made it possible to examine the whole gastrointestinal tract without much pain. However, there are still some important problems to be solved, among which one important problem is the localization of the capsule. Currently, magnetic positioning technology is a suitable method for capsule localization, and it depends on a reliable system and algorithm. In this paper, based on the magnetic dipole model and a magnetic sensor array, we propose nonlinear optimization algorithms using a random complex algorithm, applied to the optimization calculation for the nonlinear function of the dipole, to determine the three-dimensional position parameters and two-dimensional direction parameters. The stability and the antinoise ability of the algorithm are compared with those of the Levenberg-Marquardt algorithm. The simulation and experiment results show that, in terms of the error level of the initial guess of the magnet location, the random complex algorithm is more accurate and more stable, and has a higher "denoise" capacity, with a larger range of initial guess values. PMID:25914561

  20. Random forest-based similarity measures for multi-modal classification of Alzheimer’s disease

    PubMed Central

    Gray, Katherine R.; Aljabar, Paul; Heckemann, Rolf A.; Hammers, Alexander; Rueckert, Daniel

    2012-01-01

    Neurodegenerative disorders, such as Alzheimer's disease, are associated with changes in multiple neuroimaging and biological measures. These may provide complementary information for diagnosis and prognosis. We present a multi-modality classification framework in which manifolds are constructed based on pairwise similarity measures derived from random forest classifiers. Similarities from multiple modalities are combined to generate an embedding that simultaneously encodes information about all the available features. Multi-modality classification is then performed using coordinates from this joint embedding. We evaluate the proposed framework by application to neuroimaging and biological data from the Alzheimer's Disease Neuroimaging Initiative (ADNI). Features include regional MRI volumes, voxel-based FDG-PET signal intensities, CSF biomarker measures, and categorical genetic information. Classification based on the joint embedding constructed using information from all four modalities outperforms classification based on any individual modality for comparisons between Alzheimer's disease patients and healthy controls, as well as between mild cognitive impairment patients and healthy controls. Based on the joint embedding, we achieve classification accuracies of 89% between Alzheimer's disease patients and healthy controls, and 75% between mild cognitive impairment patients and healthy controls. These results are comparable with those reported in other recent studies using multi-kernel learning. Random forests provide consistent pairwise similarity measures for multiple modalities, thus facilitating the combination of different types of feature data. We demonstrate this by application to data in which the number of features differs by several orders of magnitude between modalities. Random forest classifiers extend naturally to multi-class problems, and the framework described here could be applied to distinguish between multiple patient groups in the
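
    The pairwise similarity in question is the standard random forest proximity; a minimal sketch with scikit-learn and synthetic features (the manifold embedding step is not reproduced):

        import numpy as np
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier

        X, y = make_classification(n_samples=100, n_features=10, random_state=0)
        rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

        # Proximity of two samples = fraction of trees in which they land
        # in the same leaf.
        leaves = rf.apply(X)                       # (n_samples, n_trees)
        prox = (leaves[:, None, :] == leaves[None, :, :]).mean(axis=2)
        print(prox.shape, prox[0, 0])              # symmetric, diagonal = 1.0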

  1. A systematic review of randomized controlled trials on curative and health enhancement effects of forest therapy

    PubMed Central

    Kamioka, Hiroharu; Tsutani, Kiichiro; Mutoh, Yoshiteru; Honda, Takuya; Shiozawa, Nobuyoshi; Okada, Shinpei; Park, Sang-Jun; Kitayuguchi, Jun; Kamada, Masamitsu; Okuizumi, Hiroyasu; Handa, Shuichi

    2012-01-01

    Objective To summarize the evidence for curative and health enhancement effects of forest therapy and to assess the quality of studies based on a review of randomized controlled trials (RCTs). Study design A systematic review based on RCTs. Methods Studies were eligible if they were RCTs that included at least one treatment group in which forest therapy was applied. The following databases – from 1990 to November 9, 2010 – were searched: MEDLINE via PubMed, CINAHL, Web of Science, and Ichushi-Web. All Cochrane databases and Campbell Systematic Reviews were also searched up to November 9, 2010. Results Two trials met all inclusion criteria. No specific diseases were evaluated, and both studies reported significant effectiveness in one or more outcomes for health enhancement. However, the results of evaluations with the CONSORT (Consolidated Standards of Reporting Trials) 2010 and CLEAR NPT (A Checklist to Evaluate a Report of a Nonpharmacological Trial) checklists generally showed a remarkable lack of description in the studies. Furthermore, there was a problem of heterogeneity, so a meta-analysis could not be performed. Conclusion Because there was insufficient evidence on forest therapy due to poor methodological and reporting quality and heterogeneity of RCTs, it was not possible to offer any conclusions about the effects of this intervention. However, it was possible to identify problems with current RCTs of forest therapy, and to propose a strategy for strengthening study quality, stressing the importance of study feasibility and original check items based on the characteristics of forest therapy as a future research agenda. PMID:22888281

  2. Characterizing channel change along a multithread gravel-bed river using random forest image classification

    NASA Astrophysics Data System (ADS)

    Overstreet, B. T.; Legleiter, C. J.

    2012-12-01

    The Snake River in Grand Teton National Park is a dam-regulated but highly dynamic gravel-bed river that alternates between a single-thread and a multithread planform. Identifying key drivers of channel change on this river could improve our understanding of 1) how flow regulation at Jackson Lake Dam has altered the character of the river over time; 2) how changes in the distribution of various types of vegetation impact river dynamics; and 3) how the Snake River will respond to future human- and climate-driven disturbances. Despite the importance of monitoring planform changes over time, automated channel extraction and understanding the physical drivers contributing to channel change continue to be challenging yet critical steps in the remote sensing of riverine environments. In this study we use the random forest statistical technique to first classify land cover within the Snake River corridor and then extract channel features from a sequence of high-resolution multispectral images of the Snake River spanning the period from 2006 to 2012, which encompasses both exceptionally dry years and near-record runoff in 2011. We show that the random forest technique can be used to classify images with as few as four spectral bands with far greater accuracy than traditional single-tree classification approaches. Secondly, we couple random forest-derived land cover maps with LiDAR-derived topography, bathymetry, and canopy height to explore the physical drivers contributing to observed channel changes on the Snake River. In conclusion, we show that the random forest technique is a powerful tool for classifying multispectral images of rivers. Moreover, we hypothesize that with sufficient data for calculating spatially distributed metrics of channel form and more frequent channel monitoring, this tool can also be used to identify areas with high probabilities of channel change. Land cover maps of a portion of the Snake River produced from digital aerial photography from 2010 and

  3. Identification of a potential fibromyalgia diagnosis using random forest modeling applied to electronic medical records

    PubMed Central

    Emir, Birol; Masters, Elizabeth T; Mardekian, Jack; Clair, Andrew; Kuhn, Max; Silverman, Stuart L

    2015-01-01

    Background Diagnosis of fibromyalgia (FM), a chronic musculoskeletal condition characterized by widespread pain and a constellation of symptoms, remains challenging and is often delayed. Methods Random forest modeling of electronic medical records was used to identify variables that may facilitate earlier FM identification and diagnosis. Subjects aged ≥18 years with two or more listings of the International Classification of Diseases, Ninth Revision (ICD-9) code for FM (ICD-9 729.1) ≥30 days apart during the 2012 calendar year were defined as cases among subjects associated with an integrated delivery network who had one or more health care provider encounters in the Humedica database in calendar years 2011 and 2012. Controls were those without the FM ICD-9 codes. Seventy-two demographic, clinical, and health care resource utilization variables were entered into a random forest model with downsampling to account for cohort imbalances (<1% of subjects had FM). The importance of the top ten variables was ranked, normalized to 100% for the variable whose omission from the model caused the largest loss in predictive performance. Since random forest is a complex prediction method, a set of simple rules was derived to help understand what factors drive individual predictions. Results The ten variables identified by the model were: number of visits where laboratory/non-imaging diagnostic tests were ordered; number of outpatient visits excluding office visits; age; number of office visits; number of opioid prescriptions; number of medications prescribed; number of pain medications excluding opioids; number of medications administered/ordered; number of emergency room visits; and number of musculoskeletal conditions. A receiver operating characteristic curve confirmed the model's predictive accuracy using an independent test set (area under the curve, 0.810). To enhance interpretability, nine rules were developed that could be used with good predictive probability of

  4. Automated segmentation of geographic atrophy of the retinal epithelium via random forests in AREDS color fundus images

    PubMed Central

    Feeny, Albert K.; Tadarati, Mongkol; Freund, David E.; Bressler, Neil M.; Burlina, Philippe

    2015-01-01

    Background Age-related macular degeneration (AMD), left untreated, is the leading cause of vision loss in people older than 55. Severe central vision loss occurs in the advanced stage of the disease, characterized either by the ingrowth of choroidal neovascularization (CNV), termed the "wet" form, or by geographic atrophy (GA) of the retinal pigment epithelium (RPE) involving the center of the macula, termed the "dry" form. Tracking the change in GA area over time is important since it allows for the characterization of the effectiveness of GA treatments. Tracking GA evolution can be achieved by physicians performing manual delineation of the GA area on retinal fundus images. However, manual GA delineation is time-consuming and subject to inter- and intra-observer variability. Methods We have developed a fully automated GA segmentation algorithm for color fundus images that uses a supervised machine learning approach employing a random forest classifier. This algorithm is developed and tested using a dataset of images from the NIH-sponsored Age-Related Eye Disease Study (AREDS). The GA segmentation output was compared against a manual delineation by a retina specialist. Results Using 143 color fundus images from 55 different patient eyes, our algorithm achieved a PPV of 0.82±0.19 and an NPV of 0.95±0.07. Discussion This is the first study, to our knowledge, applying machine learning methods to GA segmentation on color fundus images and using AREDS imagery for testing. These preliminary results show promising evidence that machine learning methods may have utility in automated characterization of GA from color fundus images. PMID:26318113

  5. Convergence rates of finite difference stochastic approximation algorithms part II: implementation via common random numbers

    NASA Astrophysics Data System (ADS)

    Dai, Liyi

    2016-05-01

    Stochastic optimization is a fundamental problem that finds applications in many areas, including the biological and cognitive sciences. The classical stochastic approximation algorithm for iterative stochastic optimization requires gradient information about the sample objective function that is typically difficult to obtain in practice. Recently there has been renewed interest in derivative-free approaches to stochastic optimization. In this paper, we examine the rates of convergence for the Kiefer-Wolfowitz algorithm and the mirror descent algorithm, approximating the gradient using finite differences generated through common random numbers. It is shown that the convergence of these algorithms can be accelerated by controlling the implementation of the finite differences. In particular, the rate can be increased to n^{-2/5} in general, and to n^{-1/2}, the best possible rate of stochastic approximation, in Monte Carlo optimization for a broad class of problems, where n is the iteration number.
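
    A toy sketch of a Kiefer-Wolfowitz iteration with common random numbers, in which both finite-difference evaluations reuse the same seed so that the noise cancels in the gradient estimate (objective, step sizes, and constants are illustrative):

        import random

        def f(x, rng):
            return (x - 2.0) ** 2 + rng.gauss(0, 0.5)   # noisy objective

        def kw_crn(x0, n_iter=2000, a=1.0, c=0.5):
            x = x0
            for n in range(1, n_iter + 1):
                seed = random.randrange(10 ** 9)
                cn = c / n ** 0.25                      # difference width
                # Common random numbers: the same seed drives both sides,
                # so the additive noise cancels in the difference.
                gplus = f(x + cn, random.Random(seed))
                gminus = f(x - cn, random.Random(seed))
                x -= (a / n) * (gplus - gminus) / (2.0 * cn)
            return x

        print(kw_crn(x0=10.0))   # approaches the minimizer x* = 2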

  6. Biased Random-Key Genetic Algorithms for the Winner Determination Problem in Combinatorial Auctions.

    PubMed

    de Andrade, Carlos Eduardo; Toso, Rodrigo Franco; Resende, Mauricio G C; Miyazawa, Flávio Keidi

    2015-01-01

    In this paper we address the problem of picking a subset of bids in a general combinatorial auction so as to maximize the overall profit using the first-price model. This winner determination problem assumes that a single bidding round is held to determine both the winners and the prices to be paid. We introduce six variants of biased random-key genetic algorithms for this problem. Three of them use a novel initialization technique that uses solutions of intermediate linear programming relaxations of an exact mixed integer linear programming model as the initial chromosomes of the population. An experimental evaluation compares the effectiveness of the proposed algorithms with the standard mixed integer linear programming formulation, a specialized exact algorithm, and the best-performing heuristics proposed for this problem. The proposed algorithms are competitive and offer strong results, mainly for large-scale auctions.
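
    A sketch of the random-key decoding at the heart of such algorithms: a chromosome of keys in [0, 1] orders the bids, which are then accepted greedily while their items remain unsold. Only the decoder plus random sampling of chromosomes is shown, not the biased crossover and population mechanics; the bids are toy data:

        import random

        # Toy first-price auction: (items requested, bid price).
        bids = [({"a", "b"}, 10.0), ({"b", "c"}, 8.0), ({"c"}, 5.0), ({"a"}, 4.0)]

        def decode(keys):
            # Sort bids by their random keys, accept greedily if feasible.
            order = sorted(range(len(bids)), key=lambda i: keys[i])
            sold, profit = set(), 0.0
            for i in order:
                items, price = bids[i]
                if not items & sold:          # no item sold twice
                    sold |= items
                    profit += price
            return profit

        chromosomes = ([random.random() for _ in bids] for _ in range(1000))
        print(max(decode(ch) for ch in chromosomes))   # best profit found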

  7. Security authentication with a three-dimensional optical phase code using random forest classifier.

    PubMed

    Markman, Adam; Carnicer, Artur; Javidi, Bahram

    2016-06-01

    An object with a unique three-dimensional (3D) optical phase mask attached is analyzed for security and authentication. These 3D optical phase masks are more difficult to duplicate, or to describe with a mathematical formulation, than 2D masks and thus have improved security capabilities. A quick response code was modulated using a random 3D optical phase mask, generating a 3D optical phase code (OPC). Due to the scattering of light through the 3D OPC, a unique speckle pattern based on the materials and structure in the 3D optical phase mask is generated and recorded on a CCD device. Feature extraction is performed by calculating the mean, variance, skewness, kurtosis, and entropy for each recorded speckle pattern. The random forest classifier is used for authentication. Optical experiments demonstrate the feasibility of the authentication scheme. PMID:27409445
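
    A minimal sketch of the stated feature extraction, assuming NumPy/SciPy and a simulated speckle image; the five statistics form the input vector for the random forest authenticator:

        import numpy as np
        from scipy.stats import skew, kurtosis, entropy

        def speckle_features(img):
            # Five summary statistics of a recorded speckle pattern.
            flat = img.ravel()
            hist, _ = np.histogram(flat, bins=64, density=True)
            return np.array([flat.mean(), flat.var(),
                             skew(flat), kurtosis(flat),
                             entropy(hist + 1e-12)])   # avoid log(0)

        speckle = np.random.rayleigh(scale=1.0, size=(256, 256))  # toy pattern
        print(speckle_features(speckle))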

  8. Breast segmentation in MRI using Poisson surface reconstruction initialized with random forest edge detection

    NASA Astrophysics Data System (ADS)

    Martel, Anne L.; Gallego-Ortiz, Cristina; Lu, YingLi

    2016-03-01

    Segmentation of breast tissue in MRI images is an important pre-processing step for many applications. We present a new method that uses a random forest classifier to identify candidate edges in the image and then applies a Poisson reconstruction step to define a 3D surface based on the detected edge points. Using leave-one-patient-out cross-validation we achieve a Dice overlap score of 0.96 +/- 0.02 for T1-weighted non-fat-suppressed images in 8 patients. In a second dataset of 332 images acquired using a Dixon sequence, which was not used in training the random forest classifier, the mean Dice score was 0.90 +/- 0.03. Using this approach we have achieved accurate, robust segmentation results using a very small training set.

  9. Application of random forests method to predict the retention indices of some polycyclic aromatic hydrocarbons.

    PubMed

    Goudarzi, N; Shahsavani, D; Emadi-Gandaghi, F; Chamjangali, M Arab

    2014-03-14

    In this work, a quantitative structure-retention relationship (QSRR) investigation was carried out, based on the new method of random forests (RF), to predict the retention indices (RIs) of some polycyclic aromatic hydrocarbon (PAH) compounds. The RIs of these compounds were predicted using theoretical descriptors generated from their molecular structures. Effects of the important parameters affecting the RF prediction power, such as the number of trees (nt) and the number of randomly selected variables used to split each node (m), were investigated. Optimization of these parameters showed that the RF method gives the best results at m=70 and nt=460. Also, the performance of the RF model was compared with that of the artificial neural network (ANN) and multiple linear regression (MLR) techniques. The results obtained show the relative superiority of the RF method over the MLR and ANN ones.
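    The hyperparameter search over nt and m maps directly onto a standard grid search; a sketch with scikit-learn, where the descriptor matrix X and retention indices y are assumed given (X is assumed to have at least 80 descriptor columns) and the grid is chosen to bracket the optimum reported above:

        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import GridSearchCV

        param_grid = {
            "n_estimators": [260, 360, 460, 560],   # candidate nt values
            "max_features": [50, 60, 70, 80],       # candidate m values (variables per split)
        }
        search = GridSearchCV(
            RandomForestRegressor(random_state=0),
            param_grid, cv=5, scoring="neg_root_mean_squared_error")
        # search.fit(X, y); search.best_params_ would then report the best
        # (nt, m) pair, e.g. nt = 460 and m = 70 as in the abstract.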

  10. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis.

    PubMed

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to advances in sensor technology, growing volumes of medical image data make it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes, and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features that provide no additional information for clustering analysis. The widely used methods for removing irrelevant features are sparse clustering algorithms that use a lasso-type penalty to select features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value, which are difficult to determine in practice. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms current sparse clustering algorithms in image cluster analysis. PMID:26196383

  11. Functional Principal Component Analysis and Randomized Sparse Clustering Algorithm for Medical Image Analysis

    PubMed Central

    Lin, Nan; Jiang, Junhai; Guo, Shicheng; Xiong, Momiao

    2015-01-01

    Due to advances in sensor technology, growing volumes of medical image data make it possible to visualize anatomical changes in biological tissues. As a consequence, medical images have the potential to enhance the diagnosis of disease, the prediction of clinical outcomes, and the characterization of disease progression. At the same time, the growing data dimensions pose great methodological and computational challenges for the representation and selection of features in image cluster analysis. To address these challenges, we first extend functional principal component analysis (FPCA) from one dimension to two dimensions to fully capture the spatial variation of the image signals. The image signals contain a large number of redundant features that provide no additional information for clustering analysis. The widely used methods for removing irrelevant features are sparse clustering algorithms that use a lasso-type penalty to select features. However, the accuracy of clustering using a lasso-type penalty depends on the selection of the penalty parameters and the threshold value, which are difficult to determine in practice. Recently, randomized algorithms have received a great deal of attention in big data analysis. This paper presents a randomized algorithm for accurate feature selection in image clustering analysis. The proposed method is applied to both liver and kidney cancer histology image data from the TCGA database. The results demonstrate that the randomized feature selection method coupled with functional principal component analysis substantially outperforms current sparse clustering algorithms in image cluster analysis. PMID:26196383

  12. Fast randomized Hough transformation track initiation algorithm based on multi-scale clustering

    NASA Astrophysics Data System (ADS)

    Wan, Minjie; Gu, Guohua; Chen, Qian; Qian, Weixian; Wang, Pengcheng

    2015-10-01

    A fast randomized Hough transformation track initiation algorithm based on multi-scale clustering is proposed to overcome problems in traditional infrared search and track systems (IRST), which cannot provide movement information about the initial target or select the correlation threshold automatically with a two-dimensional track association algorithm based on bearing-only information. Throughout the new algorithm, all targets are assumed to move with uniform rectilinear motion. Concepts of space random sampling, a parameter-space dynamic linking table, and convergent mapping of the image to parameter space are developed on the basis of the fast randomized Hough transformation. Because peak detection is built on a threshold method, peak values tend to cluster, and accuracy can only be ensured when the parameter space has an obvious peak; a multi-scale idea is therefore added to the algorithm. First, a primary association is conducted to select several alternative tracks using a low threshold. Then, the alternative tracks are processed by multi-scale clustering, through which accurate numbers and parameters of tracks are determined automatically by transforming the scale parameters. The first three frames are processed by this algorithm to obtain the first three targets of the track; two slightly different gate radii are then worked out, the mean value of which is used as the global correlation threshold. Moreover, a new model for curvilinear equation correction is applied to the track initiation algorithm to solve the problem of shape distortion when a three-dimensional space curve is mapped to a two-dimensional bearing-only space. Using sideways flight, launch, and landing as examples for modeling and simulation, the application of the proposed approach proves its effectiveness, accuracy, and adaptivity.

  13. Nonconvergence of the Wang-Landau algorithms with multiple random walkers

    NASA Astrophysics Data System (ADS)

    Belardinelli, R. E.; Pereyra, V. D.

    2016-05-01

    This paper discusses some convergence properties of entropic sampling Monte Carlo methods with multiple random walkers, particularly the Wang-Landau (WL) and 1/t algorithms. The classical algorithms are modified by the use of m independent random walkers in the energy landscape to calculate the density of states (DOS). The Ising model is used to show the convergence properties in the calculation of the DOS, as well as the critical temperature, while the calculation of the number π by multidimensional integration is used in the continuum approximation. In each case, the error is obtained separately for each walker at a fixed time t; then, the average over m walkers is performed. It is observed that the error goes as 1/√m. However, if the number of walkers increases above a certain critical value m > mx, the error reaches a constant value (i.e., it saturates). This occurs for both algorithms; however, it is shown that for a given system, the 1/t algorithm is more efficient and accurate than the similar version of the WL algorithm. It follows that it makes no sense to increase the number of walkers above the critical value mx, since this does not reduce the error in the calculation. Therefore, the number of walkers does not guarantee convergence.

  14. Reducing the variability in random-phase initialized Gerchberg-Saxton Algorithm

    NASA Astrophysics Data System (ADS)

    Salgado-Remacha, Francisco Javier

    2016-11-01

    The Gerchberg-Saxton algorithm is a common tool for designing computer-generated holograms. Standard functions exist for evaluating the quality of the final results; however, the use of a randomized initial guess leads to different results on each run, increasing the variability of the evaluation-function values. This is especially detrimental when the computing time is long. In this work, a new tool is presented that describes the fidelity of the results with notably reduced variability over multiple runs of the Gerchberg-Saxton algorithm. The new tool proves very helpful for topical fields such as 3D digital holography.
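    For reference, the basic random-phase-initialized Gerchberg-Saxton loop that the abstract refers to looks as follows (a minimal NumPy sketch for a single Fourier-plane constraint; variable names are illustrative). The random initial phase on the first line is the source of the run-to-run variability discussed above:

        import numpy as np

        def gerchberg_saxton(source_amp, target_amp, n_iter=100, seed=None):
            # Recover a phase mask whose far field matches target_amp,
            # starting from a random initial phase.
            rng = np.random.default_rng(seed)
            phase = rng.uniform(0, 2 * np.pi, source_amp.shape)  # random init
            for _ in range(n_iter):
                field = source_amp * np.exp(1j * phase)
                far = np.fft.fft2(field)
                # Impose the target amplitude, keep the propagated phase.
                far = target_amp * np.exp(1j * np.angle(far))
                field = np.fft.ifft2(far)
                # Impose the source amplitude, keep the back-propagated phase.
                phase = np.angle(field)
            return phase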

  15. 1H NMR based metabolic profiling in Crohn's disease by random forest methodology.

    PubMed

    Fathi, Fariba; Majari-Kasmaee, Laleh; Mani-Varnosfaderani, Ahmad; Kyani, Anahita; Rostami-Nejad, Mohammad; Sohrabzadeh, Kaveh; Naderi, Nosratollah; Zali, Mohammad Reza; Rezaei-Tavirani, Mostafa; Tafazzoli, Mohsen; Arefi-Oskouie, Afsaneh

    2014-07-01

    The present study was designed to search for metabolic biomarkers and their correlation with serum zinc in Crohn's disease patients. Crohn's disease (CD) is a form of inflammatory bowel disease that may affect any part of the gastrointestinal tract and can be difficult to diagnose using clinical tests. Thus, the introduction of a novel diagnostic method would be a major step towards CD treatment. Proton nuclear magnetic resonance spectroscopy ((1)H NMR) was employed for metabolic profiling to find out which metabolites in the serum have meaningful significance in the diagnosis of CD. CD and healthy subjects were correctly classified using random forest methodology. The classification model for the external test set showed a 94% correct classification of CD and healthy subjects. The present study suggests valine and isoleucine as differentiating metabolites for CD diagnosis. These metabolites can be used for screening of at-risk samples at the early stages of CD diagnosis. Moreover, a robust random forest regression model with good prediction outcomes was developed for correlating serum zinc level and metabolite concentrations. The regression model showed correlation (R(2)) and root mean square error values of 0.83 and 6.44, respectively. This model suggests valuable clues for understanding the mechanism of zinc deficiency in CD patients.

  16. A Novel Hepatocellular Carcinoma Image Classification Method Based on Voting Ranking Random Forests.

    PubMed

    Xia, Bingbing; Jiang, Huiyan; Liu, Huiling; Yi, Dehui

    2015-01-01

    This paper proposed a novel voting ranking random forests (VRRF) method for solving the hepatocellular carcinoma (HCC) image classification problem. Firstly, in the preprocessing stage, bilateral filtering was applied to the hematoxylin-eosin (HE) pathological images. Next, the bilateral-filtered image was segmented to obtain three different kinds of images: a single binary cell image, a single minimum exterior rectangle cell image, and a single cell image of size n×n. After that, atypia features were defined, including auxiliary circularity, amendment circularity, and cell symmetry. In addition, some shape features, fractal dimension features, and several gray features such as the Local Binary Patterns (LBP) feature, the Gray Level Co-occurrence Matrix (GLCM) feature, and Tamura features were extracted. Finally, an HCC image classification model based on random forests was proposed and further optimized by the voting ranking method. The experimental results showed that the proposed features combined with the VRRF method perform well on the HCC image classification problem.

  17. Modeling Urban Dynamics Using Random Forest: Implementing ROC and TOC for Model Evaluation

    NASA Astrophysics Data System (ADS)

    Ahmadlou, M.; Delavar, M. R.; Shafizadeh-Moghadam, H.; Tayyebi, A.

    2016-06-01

    The importance of the spatial accuracy of land use/cover change maps necessitates the use of high-performance models. To reach this goal, calibrating machine learning (ML) approaches to model land use/cover conversions has received increasing interest among scholars. This originates from the strength of these techniques, which powerfully account for the complex relationships underlying urban dynamics. Compared to other ML techniques, random forest has rarely been used for modeling urban growth. This paper, drawing on information from multi-temporal Landsat satellite images of 1985, 2000 and 2015, calibrates a random forest regression (RFR) model to quantify variable importance and simulate the spatial patterns of urban change. The results and performance of the RFR model were evaluated using two complementary tools, relative operating characteristics (ROC) and total operating characteristics (TOC), by overlaying the map of observed change and the modeled suitability map for land use change (error map). The suitability map produced by the RFR model showed an area under the curve of 82.48% for ROC, which indicates very good performance and highlights its appropriateness for simulating urban growth.

  18. A Novel Hepatocellular Carcinoma Image Classification Method Based on Voting Ranking Random Forests

    PubMed Central

    Xia, Bingbing; Jiang, Huiyan; Liu, Huiling; Yi, Dehui

    2016-01-01

    This paper proposed a novel voting ranking random forests (VRRF) method for solving the hepatocellular carcinoma (HCC) image classification problem. Firstly, in the preprocessing stage, bilateral filtering was applied to the hematoxylin-eosin (HE) pathological images. Next, the bilateral-filtered image was segmented to obtain three different kinds of images: a single binary cell image, a single minimum exterior rectangle cell image, and a single cell image of size n×n. After that, atypia features were defined, including auxiliary circularity, amendment circularity, and cell symmetry. In addition, some shape features, fractal dimension features, and several gray features such as the Local Binary Patterns (LBP) feature, the Gray Level Co-occurrence Matrix (GLCM) feature, and Tamura features were extracted. Finally, an HCC image classification model based on random forests was proposed and further optimized by the voting ranking method. The experimental results showed that the proposed features combined with the VRRF method perform well on the HCC image classification problem. PMID:27293477

  19. 1H NMR based metabolic profiling in Crohn's disease by random forest methodology.

    PubMed

    Fathi, Fariba; Majari-Kasmaee, Laleh; Mani-Varnosfaderani, Ahmad; Kyani, Anahita; Rostami-Nejad, Mohammad; Sohrabzadeh, Kaveh; Naderi, Nosratollah; Zali, Mohammad Reza; Rezaei-Tavirani, Mostafa; Tafazzoli, Mohsen; Arefi-Oskouie, Afsaneh

    2014-07-01

    The present study was designed to search for metabolic biomarkers and their correlation with serum zinc in Crohn's disease patients. Crohn's disease (CD) is a form of inflammatory bowel disease that may affect any part of the gastrointestinal tract and can be difficult to diagnose using clinical tests. Thus, the introduction of a novel diagnostic method would be a major step towards CD treatment. Proton nuclear magnetic resonance spectroscopy ((1)H NMR) was employed for metabolic profiling to find out which metabolites in the serum have meaningful significance in the diagnosis of CD. CD and healthy subjects were correctly classified using random forest methodology. The classification model for the external test set showed a 94% correct classification of CD and healthy subjects. The present study suggests valine and isoleucine as differentiating metabolites for CD diagnosis. These metabolites can be used for screening of at-risk samples at the early stages of CD diagnosis. Moreover, a robust random forest regression model with good prediction outcomes was developed for correlating serum zinc level and metabolite concentrations. The regression model showed correlation (R(2)) and root mean square error values of 0.83 and 6.44, respectively. This model suggests valuable clues for understanding the mechanism of zinc deficiency in CD patients. PMID:24757065

  20. Parallel Implementation of Fast Randomized Algorithms for Low Rank Matrix Decomposition

    SciTech Connect

    Lucas, Andrew J.; Stalizer, Mark; Feo, John T.

    2014-03-01

    We analyze the parallel performance of randomized interpolative decomposition by decomposing low-rank complex-valued Gaussian random matrices larger than 100 GB. We chose a Cray XMT supercomputer as it provides an almost ideal PRAM model, permitting quick investigation of parallel algorithms without obfuscation from hardware idiosyncrasies. We find that on non-square matrices performance scales almost linearly, with runtime about 100 times faster on 128 processors. We also verify that numerically discovered error bounds still hold on matrices two orders of magnitude larger than those previously tested.

  1. Rotorcraft Blade Mode Damping Identification from Random Responses Using a Recursive Maximum Likelihood Algorithm

    NASA Technical Reports Server (NTRS)

    Molusis, J. A.

    1982-01-01

    An on-line technique is presented for the identification of rotor blade modal damping and frequency from rotorcraft random response test data. The identification technique is based upon a recursive maximum likelihood (RML) algorithm, which is demonstrated to have excellent convergence characteristics in the presence of random measurement noise and random excitation. The RML technique requires virtually no user interaction, provides accurate confidence bands on the parameter estimates, and can be used for continuous monitoring of modal damping during wind tunnel or flight testing. Results are presented from simulated random response data which quantify the identified parameter convergence behavior for various levels of random excitation. The data length required for acceptable parameter accuracy is shown to depend upon the amplitude of random response and the modal damping level. Random response amplitudes of 1.25 degrees to 0.05 degrees are investigated. The RML technique is applied to hingeless rotor test data. The inplane lag regressing mode is identified at different rotor speeds. The identification from the test data is compared with the simulation results and with other available estimates of frequency and damping.

  2. On the convergence of EM-like algorithms for image segmentation using Markov random fields.

    PubMed

    Roche, Alexis; Ribes, Delphine; Bach-Cuadra, Meritxell; Krüger, Gunnar

    2011-12-01

    Inference in Markov random field image segmentation models is usually performed using iterative methods that adapt the well-known expectation-maximization (EM) algorithm for independent mixture models. However, some of these adaptations are ad hoc and may turn out to be numerically unstable. In this paper, we review three EM-like variants for Markov random field segmentation and compare their convergence properties at both the theoretical and practical levels. We specifically advocate a numerical scheme involving asynchronous voxel updating, for which general convergence results can be established. Our experiments on brain tissue classification in magnetic resonance images provide evidence that this algorithm may achieve significantly faster convergence than its competitors while yielding at least as good segmentation results.

  3. Enhancing network robustness against targeted and random attacks using a memetic algorithm

    NASA Astrophysics Data System (ADS)

    Tang, Xianglong; Liu, Jing; Zhou, Mingxing

    2015-08-01

    In the past decades, there has been much interest in the robustness of infrastructures to targeted and random attacks. In the recent work by Schneider C. M. et al., Proc. Natl. Acad. Sci. U.S.A., 108 (2011) 3838, the authors proposed an effective measure (namely R; here we label it Rt to represent the measure for targeted attacks) to evaluate network robustness against targeted node attacks. Using a greedy algorithm, they found that the optimal structure is an onion-like one. However, real systems are often under threat of both targeted attacks and random failures, so enhancing network robustness against both types of attack is of great importance. In this paper, we first design a random-robustness index (Rr). We find that onion-like networks destroy the original strong ability of BA networks to resist random attacks. Moreover, the structure of an Rr-optimized network is found to be different from that of an onion-like network. To design robust scale-free networks (RSF) which are resistant to both targeted and random attacks (TRA) without changing the degree distribution, a memetic algorithm (MA) is proposed, labeled MA-RSFTRA. In the experiments, both synthetic scale-free networks and real-world networks are used to validate the performance of MA-RSFTRA. The results show that MA-RSFTRA has a great ability to search for the most robust network structure that is resistant to both targeted and random attacks.
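    The targeted-attack measure R of Schneider et al. averages the fraction of nodes left in the largest connected component as nodes are removed one by one. A minimal sketch using networkx, with a random-failure variant in the spirit of the Rr index (the paper's Rr would average over many random trials; a single trial is shown):

        import random
        import networkx as nx

        def robustness(G, attack="targeted", seed=0):
            # R = (1/N) * sum over removals of the largest-component
            # fraction remaining after each successive node removal.
            rnd = random.Random(seed)
            H = G.copy()
            N = G.number_of_nodes()
            total = 0.0
            for _ in range(N):
                if attack == "targeted":
                    # Adaptively remove the current highest-degree node.
                    node = max(H.degree, key=lambda kv: kv[1])[0]
                else:
                    # Random failure (one trial of an Rr-style measure).
                    node = rnd.choice(list(H.nodes))
                H.remove_node(node)
                if H.number_of_nodes():
                    total += max(len(c) for c in nx.connected_components(H)) / N
            return total / N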

  4. Predicting protein-RNA interaction amino acids using random forest based on submodularity subset selection.

    PubMed

    Pan, Xiaoyong; Zhu, Lin; Fan, Yong-Xian; Yan, Junchi

    2014-11-13

    Protein-RNA interaction plays a crucial role in many biological processes, such as protein synthesis, transcription and post-transcription of gene expression, and the pathogenesis of disease. In particular, RNAs often function by binding to proteins. Identification of the binding interface region is especially useful for cellular pathway analysis and drug design. In this study, we proposed a novel approach for binding site identification in proteins that not only integrates local and global features from the protein sequence directly, but also constructs a balanced training dataset using sub-sampling based on submodularity subset selection. Firstly, we extracted local and global features from the protein sequence, such as evolution information and molecular weight. Secondly, the number of non-interaction sites is much larger than that of interaction sites, which leads to a sample imbalance problem and hence a machine learning model biased toward non-interaction sites. To better resolve this problem, instead of the previous random sub-sampling of over-represented non-interaction sites, a novel sampling approach based on submodularity subset selection was employed, which can select a more representative data subset. Finally, random forests were trained on the optimally selected training subsets to predict interaction sites. Our results showed that the proposed method is very promising for predicting protein-RNA interaction residues; it achieved an accuracy of 0.863, which is better than other state-of-the-art methods. Furthermore, random forest feature importance analysis indicated that the extracted global features have very strong discriminative ability for identifying interaction residues.
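    A minimal sketch of the class-balancing step followed by random forest training, with plain random undersampling standing in for the paper's submodularity-based subset selection (X, y, and all parameters are hypothetical):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def train_balanced_rf(X, y, seed=0):
            # Balance interaction (y=1) vs. non-interaction (y=0) residues
            # by undersampling the majority class, then train a forest.
            # NOTE: simple random undersampling is used here as a stand-in
            # for the submodularity-based subset selection of the paper.
            rng = np.random.default_rng(seed)
            pos = np.flatnonzero(y == 1)
            neg = np.flatnonzero(y == 0)
            neg_sub = rng.choice(neg, size=len(pos), replace=False)
            idx = np.concatenate([pos, neg_sub])
            clf = RandomForestClassifier(n_estimators=500, random_state=seed)
            clf.fit(X[idx], y[idx])
            return clf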

  5. The backtracking survey propagation algorithm for solving random K-SAT problems

    PubMed Central

    Marino, Raffaele; Parisi, Giorgio; Ricci-Tersenghi, Federico

    2016-01-01

    Discrete combinatorial optimization has a central role in many scientific disciplines; however, for hard problems we lack linear-time algorithms that would allow us to solve very large instances. Moreover, it is still unclear what key features make a discrete combinatorial optimization problem hard to solve. Here we study random K-satisfiability problems with K=3,4, which are known to be very hard close to the SAT-UNSAT threshold, where problems stop having solutions. We show that the backtracking survey propagation algorithm, in a time practically linear in the problem size, is able to find solutions very close to the threshold, in a region unreachable by any other algorithm. All solutions found have no frozen variables, thus supporting the conjecture that only unfrozen solutions can be found in linear time and that a problem becomes impossible to solve in linear time when all solutions contain frozen variables. PMID:27694952

  6. The backtracking survey propagation algorithm for solving random K-SAT problems

    NASA Astrophysics Data System (ADS)

    Marino, Raffaele; Parisi, Giorgio; Ricci-Tersenghi, Federico

    2016-10-01

    Discrete combinatorial optimization has a central role in many scientific disciplines; however, for hard problems we lack linear-time algorithms that would allow us to solve very large instances. Moreover, it is still unclear what key features make a discrete combinatorial optimization problem hard to solve. Here we study random K-satisfiability problems with K=3,4, which are known to be very hard close to the SAT-UNSAT threshold, where problems stop having solutions. We show that the backtracking survey propagation algorithm, in a time practically linear in the problem size, is able to find solutions very close to the threshold, in a region unreachable by any other algorithm. All solutions found have no frozen variables, thus supporting the conjecture that only unfrozen solutions can be found in linear time and that a problem becomes impossible to solve in linear time when all solutions contain frozen variables.

  7. A randomized algorithm for two-cluster partition of a set of vectors

    NASA Astrophysics Data System (ADS)

    Kel'manov, A. V.; Khandeev, V. I.

    2015-02-01

    A randomized algorithm is substantiated for the strongly NP-hard problem of partitioning a finite set of vectors in Euclidean space into two clusters of given sizes according to the minimum-sum-of-squared-distances criterion. It is assumed that the centroid of one of the clusters is to be optimized and is determined as the mean value over all vectors in that cluster. The centroid of the other cluster is fixed at the origin. For a fixed parameter value, the algorithm finds an approximate solution of the problem in time that is linear in the space dimension and the input size of the problem, for given values of the relative error and failure probability. Conditions are established under which the algorithm is asymptotically exact and runs in time that is linear in the space dimension and quadratic in the input size of the problem.

  8. Representation of high frequency Space Shuttle data by ARMA algorithms and random response spectra

    NASA Technical Reports Server (NTRS)

    Spanos, P. D.; Mushung, L. J.

    1990-01-01

    High-frequency Space Shuttle lift-off data are treated by autoregressive (AR) and autoregressive moving-average (ARMA) digital algorithms. These algorithms provide useful information on the spectral densities of the data. Further, they yield spectral models which lend themselves to incorporation into the concept of the random response spectrum. This concept yields a reasonably smooth power spectrum for the design of structural and mechanical systems when the available data bank is limited. Due to the non-stationarity of the lift-off event, the pertinent data are split into three slices. Each slice is associated with a rather distinguishable phase of the lift-off event, where stationarity can be expected. The presented results are preliminary in nature; the aim is to call attention to the availability of the discussed digital algorithms and to the need to augment the Space Shuttle data bank as more flights are completed.

  9. Automated segmentation of the thyroid gland on thoracic CT scans by multiatlas label fusion and random forest classification.

    PubMed

    Narayanan, Divya; Liu, Jiamin; Kim, Lauren; Chang, Kevin W; Lu, Le; Yao, Jianhua; Turkbey, Evrim B; Summers, Ronald M

    2015-10-01

    The thyroid is an endocrine gland that regulates metabolism. Thyroid image analysis plays an important role in both diagnostic radiology and radiation oncology treatment planning. The low tissue contrast of the thyroid relative to surrounding anatomic structures makes manual segmentation of this organ challenging. This work proposes a fully automated system for thyroid segmentation on CT imaging. Following initial thyroid segmentation with multiatlas joint label fusion, a random forest (RF) algorithm was applied. Multiatlas label fusion transfers labels from labeled atlases and warps them to target images using deformable registration. A consensus atlas solution was formed based on optimal weighting of atlases and similarity to a given target image. Following the initial segmentation, a trained RF classifier employed voxel scanning to assign class-conditional probabilities to the voxels in the target image. Thyroid voxels were categorized with positive labels and nonthyroid voxels were categorized with negative labels. Our method was evaluated on CT scans from 66 patients, 6 of which served as atlases for multiatlas label fusion. The system with the independent multiatlas label fusion method and the RF classifier achieved average Dice similarity coefficients of [Formula: see text] and [Formula: see text], respectively. The system with sequential multiatlas label fusion followed by RF correction increased the Dice similarity coefficient to [Formula: see text] and improved the segmentation accuracy.

  10. Automatic estimation of voice onset time for word-initial stops by applying random forest to onset detection.

    PubMed

    Lin, Chi-Yueh; Wang, Hsiao-Chuan

    2011-07-01

    The voice onset time (VOT) of a stop consonant is the interval between its burst onset and voicing onset. Among a variety of research topics on VOT, one that has been studied for years is how VOTs are efficiently measured. Manual annotation is a feasible way, but it becomes a time-consuming task when the corpus size is large. This paper proposes an automatic VOT estimation method based on an onset detection algorithm. At first, a forced alignment is applied to identify the locations of stop consonants. Then a random forest based onset detector searches each stop segment for its burst and voicing onsets to estimate a VOT. The proposed onset detection can detect the onsets in an efficient and accurate manner with only a small amount of training data. The evaluation data extracted from the TIMIT corpus were 2344 words with a word-initial stop. The experimental results showed that 83.4% of the estimations deviate less than 10 ms from their manually labeled values, and 96.5% of the estimations deviate by less than 20 ms. Some factors that influence the proposed estimation method, such as place of articulation, voicing of a stop consonant, and quality of succeeding vowel, were also investigated.

  11. A Note on the Behavior of the Randomized Kaczmarz Algorithm of Strohmer and Vershynin.

    PubMed

    Censor, Yair; Herman, Gabor T; Jiang, Ming

    2009-08-01

    In a recent paper by T. Strohmer and R. Vershynin ["A Randomized Kaczmarz Algorithm with Exponential Convergence", Journal of Fourier Analysis and Applications, published online on April 25, 2008] a "randomized Kaczmarz algorithm" is proposed for solving systems of linear equations [Formula: see text]. In that algorithm the next equation to be used in an iterative Kaczmarz process is selected with a probability proportional to the squared norm of its row, ‖a_i‖^2. The paper illustrates the superiority of this selection method for the reconstruction of a bandlimited function from its nonuniformly spaced sampling values. In this note we point out that the reported success of the algorithm of Strohmer and Vershynin in their numerical simulation depends on the specific choices that are made in translating the underlying problem, whose geometrical nature is "find a common point of a set of hyperplanes", into a system of algebraic equations. If this translation is carefully done, as in the numerical simulation provided by Strohmer and Vershynin for the reconstruction of a bandlimited function from its nonuniformly spaced sampling values, then indeed good performance may result. However, there will always be legitimate algebraic representations of the underlying problem (so that the set of solutions of the system of algebraic equations is exactly the set of points in the intersection of the hyperplanes) for which the selection method of Strohmer and Vershynin will perform in an inferior manner.
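    The selection rule at issue is easy to state in code: each iteration projects the current iterate onto the hyperplane of one equation, with the equation drawn with probability proportional to its squared row norm. A minimal NumPy sketch (iteration count and names are illustrative):

        import numpy as np

        def randomized_kaczmarz(A, b, n_iter=10000, seed=0):
            # Strohmer-Vershynin randomized Kaczmarz for Ax = b: project
            # onto one hyperplane per step, row i chosen with probability
            # proportional to ||a_i||^2.
            rng = np.random.default_rng(seed)
            m, n = A.shape
            row_sq = np.sum(A * A, axis=1)
            probs = row_sq / row_sq.sum()
            x = np.zeros(n)
            for _ in range(n_iter):
                i = rng.choice(m, p=probs)
                x += (b[i] - A[i] @ x) / row_sq[i] * A[i]
            return x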

  12. A large scale microwave emission model for forests. Contribution to the SMOS algorithm

    NASA Astrophysics Data System (ADS)

    Rahmoune, R.; Della Vecchia, A.; Ferrazzoli, P.; Guerriero, L.; Martin-Porqueras, F.

    2009-04-01

    are being considered. Also the effects of temperature gradients within the crown canopy are being considered. The model was tested against radiometric measurements carried out by towers and aircraft. A new test has been done using the brightness temperatures measured over some forests in Finland by the AMIRAS radiometer, which is an airborne demonstrator of the MIRAS imaging radiometer to be launched with SMOS. The outputs produced by the model are used to fit the parameters of the simple radiative transfer model which will be used in the Level 2 soil moisture retrieval algorithm. It is planned to compare model outputs with L1C data, which will be made available during the commissioning phase. To this end, a number of adequate extended forest sites are being selected: the Amazon rain forest, the Zaire Basins, the Argentinian Chaco forest, and the Finland forest. 2. PARAMETRIC STUDIES In this paper, results of parametric simulations are shown. The emissivity at vertical and horizontal polarization is simulated as a function of soil moisture content for various conditions of forest cover. Seasonal effects are considered, and the values of Leaf Area Index in winter and summer are taken as basic inputs. The difference between the two values is attributed partially to arboreous foliage and partially to understory, while the woody biomass is assumed to be constant in time. Results indicate that seasonal effects are limited, but not negligible. The simulations are repeated for different distributions of trunk diameters. If the distribution is centered on lower diameter values, the forest is optically thicker, for a given biomass. Also the variations of brightness temperature due to a temperature gradient within the crown canopy have been estimated. The outputs are used to predict the values of a simple first order RT model. 3. COMPARISONS WITH EXPERIMENTAL DATA Results of previous comparisons between model simulations and experimental data are summarized. Experimental

  13. What a difference a parameter makes: a psychophysical comparison of random dot motion algorithms.

    PubMed

    Pilly, Praveen K; Seitz, Aaron R

    2009-06-01

    Random dot motion (RDM) displays have emerged as one of the standard stimulus types employed in psychophysical and physiological studies of motion processing. RDMs are convenient because it is straightforward to manipulate the relative motion energy for a given motion direction in addition to stimulus parameters such as the speed, contrast, duration, density, aperture, etc. However, as widely as RDMs are employed so do they vary in their details of implementation. As a result, it is often difficult to make direct comparisons across studies employing different RDM algorithms and parameters. Here, we systematically measure the ability of human subjects to estimate motion direction for four commonly used RDM algorithms under a range of parameters in order to understand how these different algorithms compare in their perceptibility. We find that parametric and algorithmic differences can produce dramatically different performances. These effects, while surprising, can be understood in relationship to pertinent neurophysiological data regarding spatiotemporal displacement tuning properties of cells in area MT and how the tuning function changes with stimulus contrast and retinal eccentricity. These data help give a baseline by which different RDM algorithms can be compared, demonstrate a need for clearly reporting RDM details in the methods of papers, and also pose new constraints and challenges to models of motion direction processing.
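    Because the abstract's point is that implementation details of RDM generation matter, a sketch of one common variant helps fix ideas: per frame, a coherent fraction of dots steps in the signal direction while the remainder are replotted at random. Other algorithms differ precisely in how the noise dots are handled; all parameters here are illustrative:

        import numpy as np

        def rdm_frames(n_dots=200, n_frames=60, coherence=0.5,
                       direction=0.0, speed=0.02, seed=0):
            # One common RDM variant: per frame, a random subset of dots
            # (fraction = coherence) steps in the signal direction; the
            # rest are replotted uniformly at random. Positions live in
            # the unit square and wrap around at the edges.
            rng = np.random.default_rng(seed)
            pos = rng.random((n_dots, 2))
            step = speed * np.array([np.cos(direction), np.sin(direction)])
            frames = []
            for _ in range(n_frames):
                signal = rng.random(n_dots) < coherence
                pos[signal] = (pos[signal] + step) % 1.0          # coherent step
                pos[~signal] = rng.random(((~signal).sum(), 2))   # noise dots
                frames.append(pos.copy())
            return frames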

  14. Randomized algorithms for stability and robustness analysis of high-speed communication networks.

    PubMed

    Alpcan, Tansu; Başar, Tamer; Tempo, Roberto

    2005-09-01

    This paper initiates a study toward developing and applying randomized algorithms for stability of high-speed communication networks. The focus is on congestion and delay-based flow controllers for sources, which are "utility maximizers" for individual users. First, we introduce a nonlinear algorithm for such source flow controllers, which uses as feedback aggregate congestion and delay information from bottleneck nodes of the network, and depends on a number of parameters, among which are link capacities, user preference for utility, and pricing. We then linearize this nonlinear model around its unique equilibrium point and perform a robustness analysis for a special symmetric case with a single bottleneck node. The "symmetry" here captures the scenario when certain utility and pricing parameters are the same across all active users, for which we derive closed-form necessary and sufficient conditions for stability and robustness under parameter variations. In addition, the ranges of values for the utility and pricing parameters for which stability is guaranteed are computed exactly. These results also admit counterparts for the case when the pricing parameters vary across users, but the utility parameter values are still the same. In the general nonsymmetric case, when closed-form derivation is not possible, we construct specific randomized algorithms which provide a probabilistic estimate of the local stability of the network. In particular, we use Monte Carlo as well as quasi-Monte Carlo techniques for the linearized model. The results obtained provide a complete analysis of congestion control algorithms for internet style networks with a single bottleneck node as well as for networks with general random topologies. PMID:16252829

  15. Classification of interstitial lung disease patterns using local DCT features and random forest.

    PubMed

    Anthimopoulos, M; Christodoulidis, S; Christe, A; Mougiakakou, S

    2014-01-01

    Over the last decade, a plethora of computer-aided diagnosis (CAD) systems have been proposed aiming to improve the accuracy of the physicians in the diagnosis of interstitial lung diseases (ILD). In this study, we propose a scheme for the classification of HRCT image patches with ILD abnormalities as a basic component towards the quantification of the various ILD patterns in the lung. The feature extraction method relies on local spectral analysis using a DCT-based filter bank. After convolving the image with the filter bank, q-quantiles are computed for describing the distribution of local frequencies that characterize image texture. Then, the gray-level histogram values of the original image are added forming the final feature vector. The classification of the already described patches is done by a random forest (RF) classifier. The experimental results prove the superior performance and efficiency of the proposed approach compared against the state-of-the-art.
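    A minimal sketch of the described feature extraction: build a small 2D DCT basis as a filter bank, convolve, take q-quantiles of the response magnitudes, and append the gray-level histogram (basis size, quantile set, and bin count are illustrative assumptions, not the paper's exact configuration):

        import numpy as np
        from scipy.signal import convolve2d
        from scipy.fft import idctn

        def dct_filter_bank(size=4):
            # size*size spatial kernels, one per 2D DCT basis function.
            filters = []
            for u in range(size):
                for v in range(size):
                    coeff = np.zeros((size, size))
                    coeff[u, v] = 1.0
                    filters.append(idctn(coeff, norm="ortho"))
            return filters

        def patch_features(patch, q=(0.1, 0.25, 0.5, 0.75, 0.9), n_bins=16):
            # Quantiles of each filter response plus the gray-level
            # histogram, mirroring the feature vector described above.
            feats = []
            for f in dct_filter_bank():
                resp = convolve2d(patch, f, mode="valid")
                feats.extend(np.quantile(np.abs(resp), q))
            hist, _ = np.histogram(patch, bins=n_bins, density=True)
            return np.concatenate([feats, hist])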

  16. Prediction of G Protein-Coupled Receptors with SVM-Prot Features and Random Forest.

    PubMed

    Liao, Zhijun; Ju, Ying; Zou, Quan

    2016-01-01

    G protein-coupled receptors (GPCRs) are the largest receptor superfamily. In this paper, we employ physicochemical properties derived from SVM-Prot to represent GPCRs. Random Forest was utilized as the classifier to distinguish them from other protein sequences. The MEME suite was used to detect the 10 most significant conserved motifs of human GPCRs. On the testing datasets, the average accuracy was 91.61% and the average AUC was 0.9282. MEME discovery analysis showed that many motifs aggregate in the seven hydrophobic transmembrane helices, consistent with the characteristics of GPCRs. All of the above indicate that our machine-learning method can successfully distinguish GPCRs from non-GPCRs. PMID:27529053

  17. Prediction of G Protein-Coupled Receptors with SVM-Prot Features and Random Forest

    PubMed Central

    Ju, Ying

    2016-01-01

    G protein-coupled receptors (GPCRs) are the largest receptor superfamily. In this paper, we employ physicochemical properties derived from SVM-Prot to represent GPCRs. Random Forest was utilized as the classifier to distinguish them from other protein sequences. The MEME suite was used to detect the 10 most significant conserved motifs of human GPCRs. On the testing datasets, the average accuracy was 91.61% and the average AUC was 0.9282. MEME discovery analysis showed that many motifs aggregate in the seven hydrophobic transmembrane helices, consistent with the characteristics of GPCRs. All of the above indicate that our machine-learning method can successfully distinguish GPCRs from non-GPCRs. PMID:27529053

  18. Numerical Demultiplexing of Color Image Sensor Measurements via Non-linear Random Forest Modeling

    NASA Astrophysics Data System (ADS)

    Deglint, Jason; Kazemzadeh, Farnoud; Cho, Daniel; Clausi, David A.; Wong, Alexander

    2016-06-01

    The simultaneous capture of imaging data at multiple wavelengths across the electromagnetic spectrum is highly challenging, requiring complex and costly multispectral image devices. In this study, we investigate the feasibility of simultaneous multispectral imaging using conventional image sensors with color filter arrays via a novel comprehensive framework for numerical demultiplexing of the color image sensor measurements. A numerical forward model characterizing the formation of sensor measurements from light spectra hitting the sensor is constructed based on a comprehensive spectral characterization of the sensor. A numerical demultiplexer is then learned via non-linear random forest modeling based on the forward model. Given the learned numerical demultiplexer, one can then demultiplex simultaneously-acquired measurements made by the color image sensor into reflectance intensities at discrete selectable wavelengths, resulting in a higher resolution reflectance spectrum. Experimental results demonstrate the feasibility of such a method for the purpose of simultaneous multispectral imaging.

  19. A multi-stage random forest classifier for phase contrast cell segmentation.

    PubMed

    Essa, Ehab; Xie, Xianghua; Errington, Rachel J; White, Nick

    2015-01-01

    We present a machine learning based approach to automatically detect and segment cells in phase contrast images. The proposed method consists of a multi-stage classification scheme based on random forest (RF) classifier. Both low level and mid level image features are used to determine meaningful cell regions. Pixel-wise RF classification is first carried out to categorize pixels into 4 classes (dark cell, bright cell, halo artifact, and background) and generate a probability map for cell regions. K-means clustering is then applied on the probability map to group similar pixels into candidate cell regions. Finally, cell validation is performed by another RF to verify the candidate cell regions. The proposed method has been tested on U2-OS human osteosarcoma phase contrast images. The experimental results show better performance of the proposed method with precision 92.96% and recall 96.63% compared to a state-of-the-art segmentation technique. PMID:26737137

  20. Numerical Demultiplexing of Color Image Sensor Measurements via Non-linear Random Forest Modeling.

    PubMed

    Deglint, Jason; Kazemzadeh, Farnoud; Cho, Daniel; Clausi, David A; Wong, Alexander

    2016-01-01

    The simultaneous capture of imaging data at multiple wavelengths across the electromagnetic spectrum is highly challenging, requiring complex and costly multispectral image devices. In this study, we investigate the feasibility of simultaneous multispectral imaging using conventional image sensors with color filter arrays via a novel comprehensive framework for numerical demultiplexing of the color image sensor measurements. A numerical forward model characterizing the formation of sensor measurements from light spectra hitting the sensor is constructed based on a comprehensive spectral characterization of the sensor. A numerical demultiplexer is then learned via non-linear random forest modeling based on the forward model. Given the learned numerical demultiplexer, one can then demultiplex simultaneously-acquired measurements made by the color image sensor into reflectance intensities at discrete selectable wavelengths, resulting in a higher resolution reflectance spectrum. Experimental results demonstrate the feasibility of such a method for the purpose of simultaneous multispectral imaging. PMID:27346434

  1. A multi-stage random forest classifier for phase contrast cell segmentation.

    PubMed

    Essa, Ehab; Xie, Xianghua; Errington, Rachel J; White, Nick

    2015-01-01

    We present a machine learning based approach to automatically detect and segment cells in phase contrast images. The proposed method consists of a multi-stage classification scheme based on random forest (RF) classifier. Both low level and mid level image features are used to determine meaningful cell regions. Pixel-wise RF classification is first carried out to categorize pixels into 4 classes (dark cell, bright cell, halo artifact, and background) and generate a probability map for cell regions. K-means clustering is then applied on the probability map to group similar pixels into candidate cell regions. Finally, cell validation is performed by another RF to verify the candidate cell regions. The proposed method has been tested on U2-OS human osteosarcoma phase contrast images. The experimental results show better performance of the proposed method with precision 92.96% and recall 96.63% compared to a state-of-the-art segmentation technique.

  2. Numerical Demultiplexing of Color Image Sensor Measurements via Non-linear Random Forest Modeling

    PubMed Central

    Deglint, Jason; Kazemzadeh, Farnoud; Cho, Daniel; Clausi, David A.; Wong, Alexander

    2016-01-01

    The simultaneous capture of imaging data at multiple wavelengths across the electromagnetic spectrum is highly challenging, requiring complex and costly multispectral image devices. In this study, we investigate the feasibility of simultaneous multispectral imaging using conventional image sensors with color filter arrays via a novel comprehensive framework for numerical demultiplexing of the color image sensor measurements. A numerical forward model characterizing the formation of sensor measurements from light spectra hitting the sensor is constructed based on a comprehensive spectral characterization of the sensor. A numerical demultiplexer is then learned via non-linear random forest modeling based on the forward model. Given the learned numerical demultiplexer, one can then demultiplex simultaneously-acquired measurements made by the color image sensor into reflectance intensities at discrete selectable wavelengths, resulting in a higher resolution reflectance spectrum. Experimental results demonstrate the feasibility of such a method for the purpose of simultaneous multispectral imaging. PMID:27346434

  3. Confidence Intervals for Random Forests: The Jackknife and the Infinitesimal Jackknife

    PubMed Central

    Wager, Stefan; Hastie, Trevor; Efron, Bradley

    2014-01-01

    We study the variability of predictions made by bagged learners and random forests, and show how to estimate standard errors for these methods. Our work builds on variance estimates for bagging proposed by Efron (1992, 2013) that are based on the jackknife and the infinitesimal jackknife (IJ). In practice, bagged predictors are computed using a finite number B of bootstrap replicates, and working with a large B can be computationally expensive. Direct applications of jackknife and IJ estimators to bagging require B = Θ(n^1.5) bootstrap replicates to converge, where n is the size of the training set. We propose improved versions that only require B = Θ(n) replicates. Moreover, we show that the IJ estimator requires 1.7 times fewer bootstrap replicates than the jackknife to achieve a given accuracy. Finally, we study the sampling distributions of the jackknife and IJ variance estimates themselves. We illustrate our findings with multiple experiments and simulation studies. PMID:25580094
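    The IJ variance estimate for a bagged predictor can be written down compactly: with N_bi the number of times training point i appears in bootstrap b and t_b the b-th tree's prediction at the test point, V_IJ = sum_i Cov_b(N_bi, t_b)^2. A minimal sketch of the raw estimator (without the finite-B bias correction the paper develops; all names are illustrative):

        import numpy as np
        from sklearn.tree import DecisionTreeRegressor

        def bagged_with_ij(X, y, x_test, B=2000, seed=0):
            # Bagged regression-tree prediction at x_test together with
            # the raw infinitesimal-jackknife variance estimate.
            rng = np.random.default_rng(seed)
            n = len(y)
            counts = np.zeros((B, n))   # N_bi: bootstrap inclusion counts
            preds = np.zeros(B)         # t_b: per-tree predictions
            for b in range(B):
                idx = rng.integers(0, n, size=n)          # bootstrap sample
                counts[b] = np.bincount(idx, minlength=n)
                tree = DecisionTreeRegressor(random_state=b).fit(X[idx], y[idx])
                preds[b] = tree.predict(x_test.reshape(1, -1))[0]
            # cov_i = Cov_b(N_bi, t_b), estimated over the B replicates.
            cov = ((counts - counts.mean(0))
                   * (preds - preds.mean())[:, None]).mean(0)
            return preds.mean(), np.sum(cov ** 2)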

  4. Numerical Demultiplexing of Color Image Sensor Measurements via Non-linear Random Forest Modeling.

    PubMed

    Deglint, Jason; Kazemzadeh, Farnoud; Cho, Daniel; Clausi, David A; Wong, Alexander

    2016-06-27

    The simultaneous capture of imaging data at multiple wavelengths across the electromagnetic spectrum is highly challenging, requiring complex and costly multispectral image devices. In this study, we investigate the feasibility of simultaneous multispectral imaging using conventional image sensors with color filter arrays via a novel comprehensive framework for numerical demultiplexing of the color image sensor measurements. A numerical forward model characterizing the formation of sensor measurements from light spectra hitting the sensor is constructed based on a comprehensive spectral characterization of the sensor. A numerical demultiplexer is then learned via non-linear random forest modeling based on the forward model. Given the learned numerical demultiplexer, one can then demultiplex simultaneously-acquired measurements made by the color image sensor into reflectance intensities at discrete selectable wavelengths, resulting in a higher resolution reflectance spectrum. Experimental results demonstrate the feasibility of such a method for the purpose of simultaneous multispectral imaging.

  5. A Random Forest Model for Predicting Allosteric and Functional Sites on Proteins.

    PubMed

    Chen, Ava S-Y; Westwood, Nicholas J; Brear, Paul; Rogers, Graeme W; Mavridis, Lazaros; Mitchell, John B O

    2016-04-01

    We created a computational method to identify allosteric sites using a machine learning approach trained and tested on protein structures containing bound ligand molecules. The Random Forest machine learning approach was adopted to build our three-way predictive model. Based on descriptors collated for each ligand and binding site, the classification model allows us to assign protein cavities as allosteric, regular, or orthosteric, and hence to identify allosteric sites. Forty-three structural descriptors per complex were derived and used to characterize individual protein-ligand binding sites belonging to the three classes: allosteric, regular, and orthosteric. We carried out a separate validation on a further unseen set of protein structures containing the ligand 2-(N-cyclohexylamino)ethanesulfonic acid (CHES). PMID:27491922

  6. Hyperspectral image clustering method based on artificial bee colony algorithm and Markov random fields

    NASA Astrophysics Data System (ADS)

    Sun, Xu; Yang, Lina; Gao, Lianru; Zhang, Bing; Li, Shanshan; Li, Jun

    2015-01-01

    Center-oriented hyperspectral image clustering methods have been widely applied to hyperspectral remote sensing image processing; however, the drawbacks are obvious, including over-simple computing models and underutilized spatial information. In recent years, some studies have tried to improve this situation. We introduce the artificial bee colony (ABC) and Markov random field (MRF) algorithms to propose an ABC-MRF-cluster model to solve the problems mentioned above. In this model, a typical ABC algorithm framework is adopted, in which cluster centers and the results of the iterated conditional modes (ICM) algorithm are treated as feasible solutions and objective functions, respectively, and MRF is modified to be capable of dealing with the clustering problem. Finally, four datasets and two indices are used to show that the ABC-cluster and ABC-MRF-cluster methods can obtain better accuracy than conventional methods. Specifically, the ABC-cluster method is superior in terms of spectral discrimination power, whereas the ABC-MRF-cluster method provides better results in terms of the adjusted Rand index. In experiments on simulated images with different signal-to-noise ratios, ABC-cluster and ABC-MRF-cluster showed good stability.

  7. An evaluation of ISOCLS and CLASSY clustering algorithms for forest classification in northern Idaho. [Elk River quadrangle of the Clearwater National Forest]

    NASA Technical Reports Server (NTRS)

    Werth, L. F. (Principal Investigator)

    1981-01-01

    Both the iterative self-organizing clustering system (ISOCLS) and the CLASSY algorithm were applied to forest and nonforest classes for one 1:24,000 quadrangle map of northern Idaho, and the classification and mapping accuracies were evaluated with 1:30,000 color infrared aerial photography. Confusion matrices for the two clustering algorithms were generated and studied to determine which is most applicable to forest and rangeland inventories in future projects. In an unsupervised mode, ISOCLS requires many trial-and-error runs to find the proper parameters to separate the desired information classes. CLASSY reveals more in a single run about the classes that can be separated, shows more promise for forest stratification than ISOCLS, and shows more promise for consistency. One major drawback of CLASSY is that important forest and range classes smaller than the minimum cluster size will be combined with other classes. The algorithm also requires so much computer storage that only data sets as small as a quadrangle can be used at one time.

  8. Random forest in remote sensing: A review of applications and future directions

    NASA Astrophysics Data System (ADS)

    Belgiu, Mariana; Drăguţ, Lucian

    2016-04-01

    A random forest (RF) classifier is an ensemble classifier that produces multiple decision trees, using a randomly selected subset of training samples and variables. This classifier has become popular within the remote sensing community due to the accuracy of its classifications. The overall objective of this work was to review the utilization of the RF classifier in remote sensing. This review has revealed that the RF classifier can successfully handle high data dimensionality and multicollinearity, being both fast and insensitive to overfitting. It is, however, sensitive to the sampling design. The variable importance (VI) measurement provided by the RF classifier has been extensively exploited in different scenarios, for example to reduce the number of dimensions of hyperspectral data, to identify the most relevant multisource remote sensing and geographic data, and to select the most suitable season to classify particular target classes. Further investigations are required into less commonly exploited uses of this classifier, such as sample proximity analysis to detect and remove outliers in the training samples.
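    The dimensionality-reduction use of variable importance mentioned above reduces, in practice, to ranking bands by the fitted forest's importance scores and keeping the top few; a brief scikit-learn sketch (X, y, and k are hypothetical):

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        def top_bands(X, y, k=10):
            # X: pixels x spectral bands, y: land-cover labels.
            # Returns the indices of the k most important bands.
            rf = RandomForestClassifier(n_estimators=500, random_state=0)
            rf.fit(X, y)
            return np.argsort(rf.feature_importances_)[::-1][:k]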

  9. Groupwise conditional random forests for automatic shape classification and contour quality assessment in radiotherapy planning.

    PubMed

    McIntosh, Chris; Svistoun, Igor; Purdie, Thomas G

    2013-06-01

    Radiation therapy is used to treat cancer patients around the world. High quality treatment plans maximally radiate the targets while minimally radiating healthy organs at risk. In order to judge plan quality and safety, segmentations of the targets and organs at risk are created, and the amount of radiation that will be delivered to each structure is estimated prior to treatment. If the targets or organs at risk are mislabelled, or the segmentations are of poor quality, the safety of the radiation doses will be erroneously reviewed and an unsafe plan could proceed. We propose a technique to automatically label groups of segmentations of different structures from a radiation therapy plan for the joint purposes of providing quality assurance and data mining. Given one or more segmentations and an associated image we seek to assign medically meaningful labels to each segmentation and report the confidence of that label. Our method uses random forests to learn joint distributions over the training features, and then exploits a set of learned potential group configurations to build a conditional random field (CRF) that ensures the assignment of labels is consistent across the group of segmentations. The CRF is then solved via a constrained assignment problem. We validate our method on 1574 plans, consisting of 17,579 segmentations, demonstrating an overall classification accuracy of 91.58%. Our results also demonstrate the stability of RF with respect to tree depth and the number of splitting variables in large data sets.
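
    The final step described above solves a constrained assignment problem; as a loose illustration of that step only (hypothetical confidence values and SciPy's Hungarian-algorithm solver, not the authors' CRF formulation), consider:

    ```python
    # One-to-one assignment of structure labels to segmentations: given a
    # matrix of per-structure label confidences (random stand-ins here for RF
    # posteriors), pick the consistent labelling maximizing total confidence.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    conf = rng.random((4, 4))  # rows: segmentations, cols: candidate labels

    rows, cols = linear_sum_assignment(-conf)  # negate to maximize
    for r, c in zip(rows, cols):
        print(f"segmentation {r} -> label {c} (confidence {conf[r, c]:.2f})")
    ```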

  10. View-invariant, partially occluded human detection in still images using part bases and random forest

    NASA Astrophysics Data System (ADS)

    Ko, Byoung Chul; Son, Jung Eun; Nam, Jae-Yeal

    2015-05-01

    This paper presents a part-based human detection method that is invariant to variations in human view and to partial occlusion by other objects. First, to address view variance, parts are extracted from three views: frontal-rear, left profile, and right profile. A random set of rectangular parts is then extracted from the upper, middle, and lower body following a Gaussian distribution. Second, an individual part classifier is constructed using random forests across all parts extracted from the three views. From the part locations of each view, part vectors (PVs) are generated, and part bases (PB) are formed by clustering the PVs, with a weight for each PB. For testing, a PV for the frontal-rear view is estimated using the trained part detectors and is then applied to the trained PB for each view class, and the distance between the PB and PVs is computed. After applying the same process to the other two views, the human hypothesis and view with the minimum score are selected. The proposed method is applied to pedestrian datasets, and its detection precision is, on average, 0.14 higher than that of related methods, while achieving a faster or comparable processing time, averaging 1.85 s per image.

  11. Random Forest-Based Recognition of Isolated Sign Language Subwords Using Data from Accelerometers and Surface Electromyographic Sensors.

    PubMed

    Su, Ruiliang; Chen, Xiang; Cao, Shuai; Zhang, Xu

    2015-01-01

    Sign language recognition (SLR) has been widely used for communication amongst the hearing-impaired and non-verbal community. This paper proposes an accurate and robust SLR framework using an improved decision tree as the base classifier of random forests. This framework was used to recognize Chinese sign language subwords using recordings from a pair of portable devices worn on both arms consisting of accelerometers (ACC) and surface electromyography (sEMG) sensors. The experimental results demonstrated the validity of the proposed random forest-based method for recognition of Chinese sign language (CSL) subwords. With the proposed method, 98.25% average accuracy was obtained for the classification of a list of 121 frequently used CSL subwords. Moreover, the random forests method demonstrated a superior performance in resisting the impact of bad training samples. When the proportion of bad samples in the training set reached 50%, the recognition error rate of the random forest-based method was only 10.67%, while that of a single decision tree adopted in our previous work was almost 27.5%. Our study offers a practical way of realizing robust, wearable EMG-ACC-based SLR systems. PMID:26784195
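
    The robustness result can be reproduced in miniature; a sketch assuming scikit-learn and synthetic data, with deliberately corrupted training labels standing in for the paper's "bad samples":

    ```python
    # Compare a single decision tree with a random forest when a fraction of
    # the training labels has been flipped (illustrative experiment).
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, random_state=1)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

    rng = np.random.default_rng(1)
    noisy = y_tr.copy()
    flip = rng.random(len(noisy)) < 0.3            # corrupt 30% of training labels
    noisy[flip] = 1 - noisy[flip]

    for name, clf in [("single tree", DecisionTreeClassifier(random_state=1)),
                      ("random forest", RandomForestClassifier(n_estimators=300, random_state=1))]:
        clf.fit(X_tr, noisy)
        print(name, "error rate: %.3f" % (1 - clf.score(X_te, y_te)))
    ```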

  12. An Introduction to Recursive Partitioning: Rationale, Application, and Characteristics of Classification and Regression Trees, Bagging, and Random Forests

    ERIC Educational Resources Information Center

    Strobl, Carolin; Malley, James; Tutz, Gerhard

    2009-01-01

    Recursive partitioning methods have become popular and widely used tools for nonparametric regression and classification in many scientific fields. Especially random forests, which can deal with large numbers of predictor variables even in the presence of complex interactions, have been applied successfully in genetics, clinical medicine, and…
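
    A minimal sketch of the three methods the article introduces, a single classification tree, bagged trees, and a random forest, assuming scikit-learn and synthetic data:

    ```python
    # Cross-validated accuracy of CART, bagging, and random forests on the
    # same synthetic classification task (illustrative comparison only).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=500, n_features=30, n_informative=6, random_state=0)

    models = {
        "CART": DecisionTreeClassifier(random_state=0),
        "bagging": BaggingClassifier(DecisionTreeClassifier(), n_estimators=200, random_state=0),
        "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    for name, model in models.items():
        print(name, cross_val_score(model, X, y, cv=5).mean())
    ```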

  13. Lesion segmentation from multimodal MRI using random forest following ischemic stroke.

    PubMed

    Mitra, Jhimli; Bourgeat, Pierrick; Fripp, Jurgen; Ghose, Soumya; Rose, Stephen; Salvado, Olivier; Connelly, Alan; Campbell, Bruce; Palmer, Susan; Sharma, Gagan; Christensen, Soren; Carey, Leeanne

    2014-09-01

    Understanding structure-function relationships in the brain after stroke is reliant not only on the accurate anatomical delineation of the focal ischemic lesion, but also on previous infarcts, remote changes and the presence of white matter hyperintensities. The robust definition of primary stroke boundaries and secondary brain lesions will have significant impact on investigation of brain-behavior relationships and lesion volume correlations with clinical measures after stroke. Here we present an automated approach to identify chronic ischemic infarcts in addition to other white matter pathologies that may be used to aid the development of post-stroke management strategies. Our approach uses Bayesian-Markov Random Field (MRF) classification to segment probable lesion volumes present on fluid attenuated inversion recovery (FLAIR) MRI. Thereafter, a random forest classification of the information from multimodal (T1-weighted, T2-weighted, FLAIR, and apparent diffusion coefficient (ADC)) MRI images and other context-aware features (within the probable lesion areas) was used to extract areas with high likelihood of being classified as lesions. The final segmentation of the lesion was obtained by thresholding the random forest probabilistic maps. The accuracy of the automated lesion delineation method was assessed in a total of 36 patients (24 male, 12 female, mean age: 64.57±14.23 years) at 3 months after stroke onset and compared with manually segmented lesion volumes by an expert. Accuracy assessment of the automated lesion identification method was performed using the commonly used evaluation metrics. The mean sensitivity of segmentation was measured to be 0.53±0.13 with a mean positive predictive value of 0.75±0.18. The mean lesion volume difference was observed to be 32.32%±21.643% with a high Pearson's correlation of r=0.76 (p<0.0001). The lesion overlap accuracy was measured in terms of Dice similarity coefficient with a mean of 0.60±0.12, while the contour
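
    The last two steps, thresholding the forest's probability map and scoring the mask with the Dice coefficient, look roughly as follows; this is a hypothetical sketch with stand-in voxel features, not the authors' multimodal pipeline:

    ```python
    # Threshold a random-forest probability map into a binary lesion mask,
    # then score it against a reference mask with the Dice coefficient.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    X = rng.random((4000, 5))                      # stand-in multimodal voxel features
    y = (X[:, 0] + X[:, 1] > 1.2).astype(int)      # stand-in lesion labels

    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    prob_map = rf.predict_proba(X)[:, 1]           # per-voxel lesion probability
    mask = prob_map > 0.5                          # thresholded segmentation

    def dice(a, b):
        """Dice similarity coefficient between two boolean masks."""
        return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

    print("Dice vs. reference:", dice(mask, y.astype(bool)))
    ```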

  14. VES/TEM 1D joint inversion by using Controlled Random Search (CRS) algorithm

    NASA Astrophysics Data System (ADS)

    Bortolozo, Cassiano Antonio; Porsani, Jorge Luís; Santos, Fernando Acácio Monteiro dos; Almeida, Emerson Rodrigo

    2015-01-01

    Electrical (DC) and transient electromagnetic (TEM) soundings are used in a great number of environmental, hydrological, and mining exploration studies. Usually, data interpretation is accomplished by individual 1D models, often resulting in ambiguous models. This can be explained by the way the two methodologies sample the medium beneath the surface. Vertical electrical sounding (VES) is good at resolving resistive structures, while transient electromagnetic (TEM) sounding is very sensitive to conductive structures. Another difference is that VES better detects shallow structures, while TEM soundings can reach deeper layers. A Matlab program for 1D joint inversion of VES and TEM soundings was developed to exploit the best of both methods. The program uses the CRS - Controlled Random Search - algorithm for both single and 1D joint inversions. Inversion programs usually use Marquardt-type algorithms, but for electrical and electromagnetic methods these algorithms may converge to a local minimum or fail to converge. The algorithm was first tested with synthetic data and then used to invert experimental data from two places in the Paraná sedimentary basin (the cities of Bebedouro and Pirassununga), both located in São Paulo State, Brazil. The geoelectric model obtained from 1D joint inversion of the VES and TEM data is consistent with the real geological conditions, and ambiguities were minimized. Results with synthetic and real data show that 1D VES/TEM joint inversion recovers the simulated models better and shows great potential in geological studies, especially hydrogeological studies.
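
    A bare-bones controlled random search minimizer in the spirit of Price's algorithm can be sketched as follows; this is illustrative Python, not the authors' Matlab program, and the Rosenbrock misfit is a stand-in for the VES/TEM joint objective:

    ```python
    # Controlled Random Search: keep a population of points, repeatedly reflect
    # one random point through the centroid of others, and replace the current
    # worst point whenever the trial improves on it.
    import numpy as np

    def crs_minimize(f, lo, hi, n_pop=50, iters=5000, seed=0):
        rng = np.random.default_rng(seed)
        dim = len(lo)
        pop = rng.uniform(lo, hi, size=(n_pop, dim))     # random initial population
        val = np.apply_along_axis(f, 1, pop)
        for _ in range(iters):
            idx = rng.choice(n_pop, dim + 1, replace=False)
            centroid = pop[idx[:-1]].mean(axis=0)
            trial = 2.0 * centroid - pop[idx[-1]]        # reflection step
            if np.all(trial >= lo) and np.all(trial <= hi):
                f_trial = f(trial)
                worst = np.argmax(val)
                if f_trial < val[worst]:                 # replace the worst point
                    pop[worst], val[worst] = trial, f_trial
        best = np.argmin(val)
        return pop[best], val[best]

    # toy misfit in place of the geophysical objective
    rosen = lambda p: (1 - p[0])**2 + 100 * (p[1] - p[0]**2)**2
    print(crs_minimize(rosen, lo=np.array([-2.0, -2.0]), hi=np.array([2.0, 2.0])))
    ```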

  15. Random Forest as an Imputation Method for Education and Psychology Research: Its Impact on Item Fit and Difficulty of the Rasch Model

    ERIC Educational Resources Information Center

    Golino, Hudson F.; Gomes, Cristiano M. A.

    2016-01-01

    This paper presents a non-parametric imputation technique, named random forest, from the machine learning field. The random forest procedure has two main tuning parameters: the number of trees grown in the prediction and the number of predictors used. Fifty experimental conditions were created in the imputation procedure, with different…
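
    In Python, a comparable random-forest imputation can be sketched with scikit-learn's IterativeImputer wrapping a forest; this is an assumption about tooling, in the spirit of the missForest procedure rather than the paper's own setup:

    ```python
    # Random-forest imputation of missing item responses: each incomplete
    # column is iteratively predicted from the others by a forest.
    import numpy as np
    from sklearn.experimental import enable_iterative_imputer  # noqa: F401
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.impute import IterativeImputer

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 6))
    X[rng.random(X.shape) < 0.1] = np.nan      # knock out ~10% of the entries

    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=100, random_state=0),
        max_iter=10, random_state=0)
    X_complete = imputer.fit_transform(X)
    print(np.isnan(X_complete).sum(), "missing values remain")
    ```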

  16. Evolving random fractal Cantor superlattices for the infrared using a genetic algorithm.

    PubMed

    Bossard, Jeremy A; Lin, Lan; Werner, Douglas H

    2016-01-01

    Ordered and chaotic superlattices have been identified in Nature that give rise to a variety of colours reflected by the skin of various organisms. In particular, organisms such as silvery fish possess superlattices that reflect a broad range of light from the visible to the UV. Such superlattices have previously been identified as 'chaotic', but we propose that apparent 'chaotic' natural structures, which have been previously modelled as completely random structures, should have an underlying fractal geometry. Fractal geometry, often described as the geometry of Nature, can be used to mimic structures found in Nature, but deterministic fractals produce structures that are too 'perfect' to appear natural. Introducing variability into fractals produces structures that appear more natural. We suggest that the 'chaotic' (purely random) superlattices identified in Nature are more accurately modelled by multi-generator fractals. Furthermore, we introduce fractal random Cantor bars as a candidate for generating both ordered and 'chaotic' superlattices, such as the ones found in silvery fish. A genetic algorithm is used to evolve optimal fractal random Cantor bars with multiple generators targeting several desired optical functions in the mid-infrared and the near-infrared. We present optimized superlattices demonstrating broadband reflection as well as single and multiple pass bands in the near-infrared regime.
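
    The evolutionary loop itself is the standard one; a generic genetic-algorithm skeleton, with a toy fitness standing in for the superlattice's optical objective:

    ```python
    # Generic GA: truncation selection, one-point crossover, random mutation.
    # The fitness here matches a random target vector; the paper's fitness
    # would instead score a candidate superlattice's reflection spectrum.
    import numpy as np

    rng = np.random.default_rng(0)
    TARGET = rng.random(16)                         # stand-in design target

    def fitness(genome):
        return -np.abs(genome - TARGET).sum()       # higher is better

    def evolve(pop_size=60, genes=16, generations=200, mut_rate=0.05):
        pop = rng.random((pop_size, genes))
        for _ in range(generations):
            scores = np.array([fitness(g) for g in pop])
            parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]
            children = []
            for _ in range(pop_size - len(parents)):
                a, b = parents[rng.choice(len(parents), 2, replace=False)]
                cut = rng.integers(1, genes)        # one-point crossover
                child = np.concatenate([a[:cut], b[cut:]])
                mask = rng.random(genes) < mut_rate # mutation
                child[mask] = rng.random(mask.sum())
                children.append(child)
            pop = np.vstack([parents, children])
        return pop[np.argmax([fitness(g) for g in pop])]

    best = evolve()
    print("best fitness:", fitness(best))
    ```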

  17. Evolving random fractal Cantor superlattices for the infrared using a genetic algorithm

    PubMed Central

    Bossard, Jeremy A.; Lin, Lan; Werner, Douglas H.

    2016-01-01

    Ordered and chaotic superlattices have been identified in Nature that give rise to a variety of colours reflected by the skin of various organisms. In particular, organisms such as silvery fish possess superlattices that reflect a broad range of light from the visible to the UV. Such superlattices have previously been identified as ‘chaotic’, but we propose that apparent ‘chaotic’ natural structures, which have been previously modelled as completely random structures, should have an underlying fractal geometry. Fractal geometry, often described as the geometry of Nature, can be used to mimic structures found in Nature, but deterministic fractals produce structures that are too ‘perfect’ to appear natural. Introducing variability into fractals produces structures that appear more natural. We suggest that the ‘chaotic’ (purely random) superlattices identified in Nature are more accurately modelled by multi-generator fractals. Furthermore, we introduce fractal random Cantor bars as a candidate for generating both ordered and ‘chaotic’ superlattices, such as the ones found in silvery fish. A genetic algorithm is used to evolve optimal fractal random Cantor bars with multiple generators targeting several desired optical functions in the mid-infrared and the near-infrared. We present optimized superlattices demonstrating broadband reflection as well as single and multiple pass bands in the near-infrared regime. PMID:26763335

  18. Automated Detection of Selective Logging in Amazon Forests Using Airborne Lidar Data and Pattern Recognition Algorithms

    NASA Astrophysics Data System (ADS)

    Keller, M. M.; d'Oliveira, M. N.; Takemura, C. M.; Vitoria, D.; Araujo, L. S.; Morton, D. C.

    2012-12-01

    Selective logging, the removal of several valuable timber trees per hectare, is an important land use in the Brazilian Amazon and may degrade forests through long-term changes in structure and loss of forest carbon and species diversity. As with deforestation, the annual area affected by selective logging has declined significantly in the past decade; nonetheless, this land use affects several thousand km2 per year in Brazil. We studied a 1000 ha area of the Antimary State Forest (FEA) in the State of Acre, Brazil (9.304°S, 68.281°W), which has a basal area of 22.5 m2 ha-1 and an above-ground biomass of 231 Mg ha-1. Logging intensity was low, approximately 10 to 15 m3 ha-1. We collected small-footprint airborne lidar data over the study area with an Optech ALTM 3100EA, once each in 2010 and 2011. The study area contained both recent and older logging performed with both conventional and technologically advanced logging techniques. Lidar return density averaged over 20 m-2 for both collection periods, with estimated horizontal and vertical precision of 0.30 and 0.15 m. A relative density model comparing returns from 0 to 1 m elevation with returns in the 1 to 5 m elevation range revealed the pattern of roads and skid trails; these patterns were confirmed by a ground-based GPS survey. A GIS model of the road and skid network was built using the lidar and ground data. We tested and compared two pattern recognition approaches for automating logging detection: both commercial eCognition segmentation and a Frangi filter algorithm identified the road and skid trail network when compared with the GIS model. We report on the effectiveness of these two techniques.

  19. Cooperative mobile agents search using beehive partitioned structure and Tabu Random search algorithm

    NASA Astrophysics Data System (ADS)

    Ramazani, Saba; Jackson, Delvin L.; Selmic, Rastko R.

    2013-05-01

    In search and surveillance operations, deploying a team of mobile agents provides a robust solution that has multiple advantages over using a single agent in efficiency and minimizing exploration time. This paper addresses the challenge of identifying a target in a given environment when using a team of mobile agents by proposing a novel method of mapping and movement of agent teams in a cooperative manner. The approach consists of two parts. First, the region is partitioned into a hexagonal beehive structure in order to provide equidistant movements in every direction and to allow for more natural and flexible environment mapping. Additionally, in search environments that are partitioned into hexagons, mobile agents have an efficient travel path while performing searches due to this partitioning approach. Second, we use a team of mobile agents that move in a cooperative manner and utilize the Tabu Random algorithm to search for the target. Due to the ever-increasing use of robotics and Unmanned Aerial Vehicle (UAV) platforms, the field of cooperative multi-agent search has developed many applications recently that would benefit from the use of the approach presented in this work, including: search and rescue operations, surveillance, data collection, and border patrol. In this paper, the increased efficiency of the Tabu Random Search algorithm method in combination with hexagonal partitioning is simulated, analyzed, and advantages of this approach are presented and discussed.

  1. Improved random-starting method for the EM algorithm for finite mixtures of regressions.

    PubMed

    Schepers, Jan

    2015-03-01

    Two methods for generating random starting values for the expectation maximization (EM) algorithm are compared in terms of yielding maximum likelihood parameter estimates in finite mixtures of regressions. One of these methods is ubiquitous in applications of finite mixture regression, whereas the other method is an alternative that appears not to have been used so far. The two methods are compared in two simulation studies and on an illustrative data set. The results show that the alternative method yields solutions with likelihood values at least as high as, and often higher than, those returned by the standard method. Moreover, analyses of the illustrative data set show that the results obtained by the two methods may differ considerably with regard to some of the substantive conclusions. The results reported in this article indicate that in applications of finite mixture regression, consideration should be given to the type of mechanism chosen to generate random starting values for the EM algorithm. In order to facilitate the use of the proposed alternative method, an R function implementing the approach is provided in the Appendix of the article.
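
    The general tactic of many random EM starts with the best likelihood kept can be illustrated with a Gaussian mixture, which uses the same EM machinery; mixtures of regressions would follow the same pattern with a different M-step (scikit-learn, synthetic data):

    ```python
    # EM from random starting values: with n_init > 1, the fit with the
    # highest lower bound on the log-likelihood is retained.
    import numpy as np
    from sklearn.mixture import GaussianMixture

    rng = np.random.default_rng(0)
    X = np.concatenate([rng.normal(-2, 1, (200, 1)), rng.normal(3, 1, (200, 1))])

    single = GaussianMixture(n_components=2, init_params="random", n_init=1, random_state=0).fit(X)
    multi = GaussianMixture(n_components=2, init_params="random", n_init=25, random_state=0).fit(X)
    print("1 start  :", single.lower_bound_)
    print("25 starts:", multi.lower_bound_)   # at least as high, often higher
    ```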

  2. Classification of nanoparticle diffusion processes in vital cells by a multifeature random forests approach: application to simulated data, darkfield, and confocal laser scanning microscopy

    NASA Astrophysics Data System (ADS)

    Wagner, Thorsten; Kroll, Alexandra; Wiemann, Martin; Lipinski, Hans-Gerd

    2016-04-01

    Darkfield and confocal laser scanning microscopy both allow for a simultaneous observation of live cells and single nanoparticles. Accordingly, a characterization of nanoparticle uptake and intracellular mobility appears possible within living cells. Single particle tracking makes it possible to characterize the particle and the surrounding cell. In case of free diffusion, the mean squared displacement for each trajectory of a nanoparticle can be measured which allows computing the corresponding diffusion coefficient and, if desired, converting it into the hydrodynamic diameter using the Stokes-Einstein equation and the viscosity of the fluid. However, within the more complex system of a cell's cytoplasm unrestrained diffusion is scarce and several other types of movements may occur. Thus, confined or anomalous diffusion (e.g. diffusion in porous media), active transport, and combinations thereof were described by several authors. To distinguish between these types of particle movement we developed an appropriate classification method, and simulated three types of particle motion in a 2D plane using a Monte Carlo approach: (1) normal diffusion, using random direction and step-length, (2) subdiffusion, using confinements like a reflective boundary with defined radius or reflective objects in the closer vicinity, and (3) superdiffusion, using a directed flow added to the normal diffusion. To simulate subdiffusion we devised a new method based on tracks of different length combined with equally probable obstacle interaction. Next we estimated the fractal dimension, elongation and the ratio of long-time / short-time diffusion coefficients. These features were used to train a random forests classification algorithm. The accuracy for simulated trajectories with 180 steps was 97% (95%-CI: 0.9481-0.9884). The balanced accuracy was 94%, 99% and 98% for normal-, sub- and superdiffusion, respectively. Nanoparticle tracking analysis was used with 100 nm polystyrene particles
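
    A toy version of the simulate-then-classify pipeline above, assuming scikit-learn and one simple straightness feature in place of the paper's fractal-dimension and elongation features:

    ```python
    # Simulate 2-D tracks for normal diffusion and directed ("super") diffusion,
    # compute one feature per track, and classify the tracks with a forest.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(0)

    def track(steps=180, drift=0.0):
        moves = rng.normal(size=(steps, 2)) + drift
        return np.cumsum(moves, axis=0)

    def features(t):
        disp = np.linalg.norm(t[-1] - t[0])                      # net displacement
        path = np.linalg.norm(np.diff(t, axis=0), axis=1).sum()  # total path length
        return [disp / path]                                     # straightness in (0, 1]

    X = [features(track()) for _ in range(300)] + \
        [features(track(drift=0.3)) for _ in range(300)]
    y = [0] * 300 + [1] * 300                                    # 0 = normal, 1 = superdiffusion
    print(cross_val_score(RandomForestClassifier(200, random_state=0), X, y, cv=5).mean())
    ```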

  3. Random forests for verbal autopsy analysis: multisite validation study using clinical diagnostic gold standards

    PubMed Central

    2011-01-01

    Background Computer-coded verbal autopsy (CCVA) is a promising alternative to the standard approach of physician-certified verbal autopsy (PCVA), because of its high speed, low cost, and reliability. This study introduces a new CCVA technique and validates its performance using defined clinical diagnostic criteria as a gold standard for a multisite sample of 12,542 verbal autopsies (VAs). Methods The Random Forest (RF) Method from machine learning (ML) was adapted to predict cause of death by training random forests to distinguish between each pair of causes, and then combining the results through a novel ranking technique. We assessed quality of the new method at the individual level using chance-corrected concordance and at the population level using cause-specific mortality fraction (CSMF) accuracy as well as linear regression. We also compared the quality of RF to PCVA for all of these metrics. We performed this analysis separately for adult, child, and neonatal VAs. We also assessed the variation in performance with and without household recall of health care experience (HCE). Results For all metrics, for all settings, RF was as good as or better than PCVA, with the exception of a nonsignificantly lower CSMF accuracy for neonates with HCE information. With HCE, the chance-corrected concordance of RF was 3.4 percentage points higher for adults, 3.2 percentage points higher for children, and 1.6 percentage points higher for neonates. The CSMF accuracy was 0.097 higher for adults, 0.097 higher for children, and 0.007 lower for neonates. Without HCE, the chance-corrected concordance of RF was 8.1 percentage points higher than PCVA for adults, 10.2 percentage points higher for children, and 5.9 percentage points higher for neonates. The CSMF accuracy was higher for RF by 0.102 for adults, 0.131 for children, and 0.025 for neonates. Conclusions We found that our RF Method outperformed the PCVA method in terms of chance-corrected concordance and CSMF accuracy for
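
    For reference, CSMF accuracy, the population-level metric above, is commonly defined as one minus the summed absolute errors in the cause fractions, normalized by the worst attainable error; a small sketch of that definition (with made-up fractions):

    ```python
    # CSMF accuracy: 1 is a perfect set of cause fractions, 0 the worst possible.
    import numpy as np

    def csmf_accuracy(true_fracs, pred_fracs):
        true_fracs = np.asarray(true_fracs, dtype=float)
        pred_fracs = np.asarray(pred_fracs, dtype=float)
        return 1 - np.abs(true_fracs - pred_fracs).sum() / (2 * (1 - true_fracs.min()))

    print(csmf_accuracy([0.5, 0.3, 0.2], [0.4, 0.35, 0.25]))  # -> 0.875
    ```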

  4. Drug Concentration Thresholds Predictive of Therapy Failure and Death in Children With Tuberculosis: Bread Crumb Trails in Random Forests

    PubMed Central

    Swaminathan, Soumya; Pasipanodya, Jotam G.; Ramachandran, Geetha; Hemanth Kumar, A. K.; Srivastava, Shashikant; Deshpande, Devyani; Nuermberger, Eric; Gumbo, Tawanda

    2016-01-01

    Background. The role of drug concentrations in clinical outcomes in children with tuberculosis is unclear. Target concentrations for dose optimization are unknown. Methods. Plasma drug concentrations measured in Indian children with tuberculosis were modeled using compartmental pharmacokinetic analyses. The children were followed until the end of therapy to ascertain therapy failure or death. An ensemble of artificial intelligence algorithms, including random forests, was used to identify predictors of clinical outcome from among 30 clinical, laboratory, and pharmacokinetic variables. Results. Among the 143 children with known outcomes, there was high between-child variability of isoniazid, rifampin, and pyrazinamide concentrations: 110 (77%) completed therapy, 24 (17%) failed therapy, and 9 (6%) died. The main predictors of therapy failure or death were a pyrazinamide peak concentration <38.10 mg/L and a rifampin peak concentration <3.01 mg/L. The relative risk of these poor outcomes below these peak concentration thresholds was 3.64 (95% confidence interval [CI], 2.28–5.83). Isoniazid had concentration-dependent antagonism with rifampin and pyrazinamide, with an adjusted odds ratio for therapy failure of 3.00 (95% CI, 2.08–4.33) in the antagonism concentration range. With death alone as the outcome, the same drug concentrations, plus z scores (indicators of malnutrition) and age <3 years, were highly ranked predictors. In children <3 years old, an isoniazid 0- to 24-hour area under the concentration-time curve <11.95 mg/L × hour and/or a rifampin peak <3.10 mg/L were the best predictors of therapy failure, with a relative risk of 3.43 (95% CI, .99–11.82). Conclusions. We have identified new antibiotic target concentrations, which are potential biomarkers associated with treatment failure and death in children with tuberculosis. PMID:27742636

  5. Track-Before-Detect Algorithm for Faint Moving Objects based on Random Sampling and Consensus

    NASA Astrophysics Data System (ADS)

    Dao, P.; Rast, R.; Schlaegel, W.; Schmidt, V.; Dentamaro, A.

    2014-09-01

    There are many algorithms developed for tracking and detecting faint moving objects in congested backgrounds. One obvious application is detection of targets in images where each pixel corresponds to the received power in a particular location. In our application, a visible imager operated in stare mode observes geostationary objects as fixed, stars as moving, and non-geostationary objects as drifting in the field of view. We would like to achieve high-sensitivity detection of the drifters. The ability to improve SNR with track-before-detect (TBD) processing, where target information is collected and collated before the detection decision is made, allows respectable performance against dim moving objects. Generally, a TBD algorithm consists of a pre-processing stage that highlights potential targets and a temporal filtering stage. However, the algorithms that have been successfully demonstrated, e.g. Viterbi-based and Bayesian-based, demand formidable processing power and memory. We propose an algorithm that exploits the quasi-constant velocity of objects, the predictability of the stellar clutter, and the intrinsically low false alarm rate of detecting signature candidates in 3-D, based on an iterative method called RANdom SAmple Consensus (RANSAC), one that can run in real time on a typical PC. The technique is tailored for searching for objects with small telescopes in stare mode. Our RANSAC-MT (Moving Target) algorithm estimates parameters of a mathematical model (e.g., linear motion) from a set of observed data which contains a significant number of outliers while identifying inliers. In the pre-processing phase, candidate blobs were selected based on morphology and an intensity threshold that would normally generate an unacceptable level of false alarms. The RANSAC sampling rejects candidates that conform to the predictable motion of the stars. Data collected with a 17 inch telescope by AFRL/RH and a COTS lens/EM-CCD sensor by the AFRL/RD Satellite Assessment Center is
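
    The consensus step can be illustrated with a one-dimensional constant-velocity fit; a minimal RANSAC sketch with synthetic detections, not the RANSAC-MT implementation:

    ```python
    # Fit a linear (constant-velocity) track to detections that are mostly
    # clutter, keeping the candidate model with the largest consensus set.
    import numpy as np

    rng = np.random.default_rng(0)
    t = np.linspace(0, 1, 40)
    x = 2.0 * t + 0.5 + rng.normal(0, 0.01, t.size)                   # true drifting object
    x[rng.choice(t.size, 25, replace=False)] = rng.uniform(0, 3, 25)  # clutter

    best_inliers, best_model = 0, None
    for _ in range(500):
        i, j = rng.choice(t.size, 2, replace=False)     # minimal sample: 2 points
        v = (x[j] - x[i]) / (t[j] - t[i])               # candidate velocity
        x0 = x[i] - v * t[i]
        inliers = np.abs(x - (v * t + x0)) < 0.05       # consensus test
        if inliers.sum() > best_inliers:
            best_inliers, best_model = inliers.sum(), (v, x0)

    print("inliers:", best_inliers, "model (v, x0):", best_model)
    ```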

  6. An efficient voting algorithm for finding additive biclusters with random background.

    PubMed

    Xiao, Jing; Wang, Lusheng; Liu, Xiaowen; Jiang, Tao

    2008-12-01

    The biclustering problem has been extensively studied in many areas, including e-commerce, data mining, machine learning, pattern recognition, statistics, and, more recently, computational biology. Given an n × m matrix A (n ≥ m), the main goal of biclustering is to identify a subset of rows (called objects) and a subset of columns (called properties) such that some objective function that specifies the quality of the found bicluster (formed by the subsets of rows and of columns of A) is optimized. The problem has been proved or conjectured to be NP-hard for various objective functions. In this article, we study a probabilistic model for the implanted additive bicluster problem, where each element in the n × m background matrix is a random integer from [0, L − 1] for some integer L, and a k × k implanted additive bicluster is obtained from an error-free additive bicluster by randomly changing each element to a number in [0, L − 1] with probability θ. We propose an O(n²m) time algorithm based on voting to solve the problem. We show that when k ≥ Ω(√(n log n)), the voting algorithm can correctly find the implanted bicluster with probability at least 1 − 9/n². We also implement our algorithm as a C++ program named VOTE. The implementation incorporates several ideas for estimating the size of an implanted bicluster, adjusting the threshold in voting, dealing with small biclusters, and dealing with overlapping implanted biclusters. Our experimental results on both simulated and real datasets show that VOTE can find biclusters with high accuracy and speed. PMID:19040364

  7. Precise algorithm to generate random sequential addition of hard hyperspheres at saturation.

    PubMed

    Zhang, G; Torquato, S

    2013-11-01

    The study of the packing of hard hyperspheres in d-dimensional Euclidean space R^d has been a topic of great interest in statistical mechanics and condensed matter theory. While the densest known packings are ordered in sufficiently low dimensions, it has been suggested that in sufficiently large dimensions, the densest packings might be disordered. The random sequential addition (RSA) time-dependent packing process, in which congruent hard hyperspheres are randomly and sequentially placed into a system without interparticle overlap, is a useful packing model to study disorder in high dimensions. Of particular interest is the infinite-time saturation limit in which the available space for another sphere tends to zero. However, the associated saturation density has been determined in all previous investigations by extrapolating the density results for nearly saturated configurations to the saturation limit, which necessarily introduces numerical uncertainties. We have refined an algorithm devised by us [S. Torquato, O. U. Uche, and F. H. Stillinger, Phys. Rev. E 74, 061308 (2006)] to generate RSA packings of identical hyperspheres. The improved algorithm produces packings that are guaranteed to contain no available space in a large simulation box using finite computational time with heretofore unattained precision and across the widest range of dimensions (2≤d≤8). We have also calculated the packing and covering densities, pair correlation function g2(r), and structure factor S(k) of the saturated RSA configurations. As the space dimension increases, we find that pair correlations markedly diminish, consistent with a recently proposed "decorrelation" principle, and the degree of "hyperuniformity" (suppression of infinite-wavelength density fluctuations) increases. We have also calculated the void exclusion probability in order to compute the so-called quantizer error of the RSA packings, which is related to the second moment of inertia of the average
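
    For contrast with the precise saturation algorithm described above, the naive rejection-sampling version of RSA for equal disks in a periodic 2-D box takes only a few lines; this sketch approaches, but cannot certify, saturation:

    ```python
    # Random sequential addition of equal disks with periodic boundaries:
    # propose random centers and accept only non-overlapping ones.
    import numpy as np

    rng = np.random.default_rng(0)
    L, radius, attempts = 1.0, 0.05, 50_000
    centers = []

    for _ in range(attempts):
        p = rng.random(2) * L
        ok = True
        for q in centers:
            d = np.abs(p - q)
            d = np.minimum(d, L - d)          # minimum-image distance
            if np.hypot(*d) < 2 * radius:
                ok = False
                break
        if ok:
            centers.append(p)

    density = len(centers) * np.pi * radius**2 / L**2
    print(f"{len(centers)} disks placed, packing fraction {density:.3f}")
    ```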

  8. Precise algorithm to generate random sequential addition of hard hyperspheres at saturation

    NASA Astrophysics Data System (ADS)

    Zhang, G.; Torquato, S.

    2013-11-01

    The study of the packing of hard hyperspheres in d-dimensional Euclidean space R^d has been a topic of great interest in statistical mechanics and condensed matter theory. While the densest known packings are ordered in sufficiently low dimensions, it has been suggested that in sufficiently large dimensions, the densest packings might be disordered. The random sequential addition (RSA) time-dependent packing process, in which congruent hard hyperspheres are randomly and sequentially placed into a system without interparticle overlap, is a useful packing model to study disorder in high dimensions. Of particular interest is the infinite-time saturation limit in which the available space for another sphere tends to zero. However, the associated saturation density has been determined in all previous investigations by extrapolating the density results for nearly saturated configurations to the saturation limit, which necessarily introduces numerical uncertainties. We have refined an algorithm devised by us [S. Torquato, O. U. Uche, and F. H. Stillinger, Phys. Rev. E 74, 061308 (2006)] to generate RSA packings of identical hyperspheres. The improved algorithm produces packings that are guaranteed to contain no available space in a large simulation box using finite computational time with heretofore unattained precision and across the widest range of dimensions (2≤d≤8). We have also calculated the packing and covering densities, pair correlation function g2(r), and structure factor S(k) of the saturated RSA configurations. As the space dimension increases, we find that pair correlations markedly diminish, consistent with a recently proposed “decorrelation” principle, and the degree of “hyperuniformity” (suppression of infinite-wavelength density fluctuations) increases. We have also calculated the void exclusion probability in order to compute the so-called quantizer error of the RSA packings, which is related to the

  9. Statistical physics analysis of the computational complexity of solving random satisfiability problems using backtrack algorithms

    NASA Astrophysics Data System (ADS)

    Cocco, S.; Monasson, R.

    2001-08-01

    The computational complexity of solving random 3-Satisfiability (3-SAT) problems is investigated using statistical physics concepts and techniques related to phase transitions, growth processes, and (real-space) renormalization flows. 3-SAT is a representative example of a hard computational task; it consists of deciding whether a set of αN randomly drawn logical constraints involving N Boolean variables can be satisfied altogether or not. Widely used solving procedures, such as the Davis-Putnam-Logemann-Loveland (DPLL) algorithm, perform a systematic search for a solution through a sequence of trials and errors represented by a search tree. The size of the search tree accounts for the computational complexity, i.e. the amount of computational effort required to achieve resolution. In the present study, we identify, using theory and numerical experiments, easy (size of the search tree scaling polynomially with N) and hard (exponential scaling) regimes as a function of the ratio α of constraints per variable. The typical complexity is explicitly calculated in the different regimes, in very good agreement with numerical simulations. Our theoretical approach is based on the analysis of the growth of the branches in the search tree under the operation of DPLL. On each branch, the initial 3-SAT problem is dynamically turned into a more generic 2+p-SAT problem, where p and 1 - p are the fractions of constraints involving three and two variables respectively. The growth of each branch is monitored by the dynamical evolution of α and p and is represented by a trajectory in the static phase diagram of the random 2+p-SAT problem. Depending on whether or not the trajectories cross the boundary between satisfiable and unsatisfiable phases, single branches or full trees are generated by DPLL, resulting in easy or hard resolutions. Our picture for the origin of complexity can be applied to other computational problems solved by branch and bound algorithms.
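
    The DPLL search analyzed above is compact enough to sketch in full; a minimal solver with unit propagation and branching, without the heuristics of production solvers:

    ```python
    # Clauses are lists of nonzero ints; -v denotes the negation of variable v.
    def simplify(clauses, lit):
        """Assign lit = True: drop satisfied clauses, shorten the rest.
        Returns None when an empty clause (a contradiction) appears."""
        out = []
        for c in clauses:
            if lit in c:
                continue                       # clause satisfied
            reduced = [l for l in c if l != -lit]
            if not reduced:                    # empty clause: contradiction
                return None
            out.append(reduced)
        return out

    def dpll(clauses):
        while any(len(c) == 1 for c in clauses):       # unit propagation
            unit = next(c[0] for c in clauses if len(c) == 1)
            clauses = simplify(clauses, unit)
            if clauses is None:
                return False
        if not clauses:
            return True                                # all clauses satisfied
        lit = clauses[0][0]                            # branch on a literal
        for guess in (lit, -lit):
            reduced = simplify(clauses, guess)
            if reduced is not None and dpll(reduced):
                return True
        return False

    # (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
    print(dpll([[1, 2], [-1, 3], [-2, -3]]))           # -> True
    ```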

  10. Comparing Algorithms for Graph Isomorphism Using Discrete- and Continuous-Time Quantum Random Walks

    SciTech Connect

    Rudinger, Kenneth; Gamble, John King; Bach, Eric; Friesen, Mark; Joynt, Robert; Coppersmith, S. N.

    2013-07-01

    Berry and Wang [Phys. Rev. A 83, 042317 (2011)] show numerically that a discrete-time quantum random walk of two noninteracting particles is able to distinguish some non-isomorphic strongly regular graphs from the same family. Here we analytically demonstrate how it is possible for these walks to distinguish such graphs, while continuous-time quantum walks of two noninteracting particles cannot. We show analytically and numerically that even single-particle discrete-time quantum random walks can distinguish some strongly regular graphs, though not as many as two-particle noninteracting discrete-time walks. Additionally, we demonstrate how, given the same quantum random walk, subtle differences in the graph certificate construction algorithm can nontrivially impact the walk's distinguishing power. We also show that no continuous-time walk of a fixed number of particles can distinguish all strongly regular graphs when used in conjunction with any of the graph certificates we consider. We extend this constraint to discrete-time walks of fixed numbers of noninteracting particles for one kind of graph certificate; it remains an open question as to whether or not this constraint applies to the other graph certificates we consider.

  11. Comparing Algorithms for Graph Isomorphism Using Discrete- and Continuous-Time Quantum Random Walks

    DOE PAGES

    Rudinger, Kenneth; Gamble, John King; Bach, Eric; Friesen, Mark; Joynt, Robert; Coppersmith, S. N.

    2013-07-01

    Berry and Wang [Phys. Rev. A 83, 042317 (2011)] show numerically that a discrete-time quantum random walk of two noninteracting particles is able to distinguish some non-isomorphic strongly regular graphs from the same family. Here we analytically demonstrate how it is possible for these walks to distinguish such graphs, while continuous-time quantum walks of two noninteracting particles cannot. We show analytically and numerically that even single-particle discrete-time quantum random walks can distinguish some strongly regular graphs, though not as many as two-particle noninteracting discrete-time walks. Additionally, we demonstrate how, given the same quantum random walk, subtle differences in the graph certificate construction algorithm can nontrivially impact the walk's distinguishing power. We also show that no continuous-time walk of a fixed number of particles can distinguish all strongly regular graphs when used in conjunction with any of the graph certificates we consider. We extend this constraint to discrete-time walks of fixed numbers of noninteracting particles for one kind of graph certificate; it remains an open question as to whether or not this constraint applies to the other graph certificates we consider.

  12. A New MAC Address Spoofing Detection Technique Based on Random Forests.

    PubMed

    Alotaibi, Bandar; Elleithy, Khaled

    2016-01-01

    Media access control (MAC) addresses in wireless networks can be trivially spoofed using off-the-shelf devices. The aim of this research is to detect MAC address spoofing in wireless networks using a hard-to-spoof measurement that is correlated with the location of the wireless device, namely the received signal strength (RSS). We developed a passive solution that does not require modification of standards or protocols. The solution was tested in a live test-bed (i.e., a wireless local area network with the aid of two air monitors acting as sensors) and achieved 99.77%, 93.16% and 88.38% accuracy when the attacker is 8-13 m, 4-8 m and less than 4 m away from the victim device, respectively. We implemented three previous methods on the same test-bed and found that our solution outperforms existing solutions. Our solution is based on an ensemble method known as random forests. PMID:26927103

  13. Diagnosis of colorectal cancer by near-infrared optical fiber spectroscopy and random forest

    NASA Astrophysics Data System (ADS)

    Chen, Hui; Lin, Zan; Wu, Hegang; Wang, Li; Wu, Tong; Tan, Chao

    2015-01-01

    Near-infrared (NIR) spectroscopy has such advantages as being noninvasive, fast, relatively inexpensive, and free of ionizing radiation. Differences in the NIR signals can reflect many physiological changes, which are in turn associated with such factors as vascularization, cellularity, oxygen consumption, or remodeling. NIR spectral differences between colorectal cancer and healthy tissues were investigated. A Fourier transform NIR spectroscopy instrument equipped with a fiber-optic probe was used to mimic in situ clinical measurements. A total of 186 spectra were collected and then underwent standard normal variate (SNV) preprocessing to remove unwanted background variance. All specimens and spots used for spectral collection were confirmed by staining and examination by an experienced pathologist to ensure they were representative of the pathology. Principal component analysis (PCA) was used to uncover possible clustering. Several methods, including random forest (RF), partial least squares-discriminant analysis (PLSDA), K-nearest neighbors, and classification and regression trees (CART), were used to extract spectral features and to construct diagnostic models. The comparison reveals that, even though no obvious difference in misclassification ratio (MCR) was observed between these models, RF is preferable since it is quicker, more convenient, and insensitive to over-fitting. The results indicate that NIR spectroscopy coupled with an RF model can serve as a potential tool for discriminating colorectal cancer tissues from normal ones.
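
    The SNV preprocessing step mentioned above is a per-spectrum standardization; a short sketch with stand-in spectra:

    ```python
    # Standard normal variate (SNV): each spectrum is centred and scaled by its
    # own mean and standard deviation to suppress baseline and scatter effects.
    import numpy as np

    def snv(spectra):
        """Row-wise SNV: (x - mean(x)) / std(x) for each spectrum."""
        spectra = np.asarray(spectra, dtype=float)
        mean = spectra.mean(axis=1, keepdims=True)
        std = spectra.std(axis=1, keepdims=True)
        return (spectra - mean) / std

    raw = np.random.default_rng(0).random((186, 700))   # stand-in NIR spectra
    print(snv(raw).mean(axis=1)[:3])                    # ~0 after correction
    ```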

  14. Risk Prediction of One-Year Mortality in Patients with Cardiac Arrhythmias Using Random Survival Forest

    PubMed Central

    Miao, Fen; Cai, Yun-Peng; Zhang, Yu-Xiao; Li, Ye; Zhang, Yuan-Ting

    2015-01-01

    Existing models for predicting mortality based on the traditional Cox proportional hazards approach (CPH) often have low prediction accuracy. This paper aims to develop a clinical risk model with good accuracy for predicting 1-year mortality in cardiac arrhythmia patients using random survival forest (RSF), a robust approach for survival analysis. 10,488 cardiac arrhythmia patients available in the public MIMIC II clinical database were investigated, with 3,452 deaths occurring within 1-year follow-up. Forty risk factors, including demographics, clinical and laboratory information, and antiarrhythmic agents, were analyzed as potential predictors of all-cause mortality. RSF was adopted to build a comprehensive survival model and a simplified risk model composed of the 14 top risk factors. The comprehensive model achieved a prediction accuracy of 0.81 measured by c-statistic with 10-fold cross validation. The simplified risk model also achieved a good accuracy of 0.799. Both results outperformed traditional CPH (which achieved a c-statistic of 0.733 for the comprehensive model and 0.718 for the simplified model). Moreover, various factors are observed to have a nonlinear impact on cardiac arrhythmia prognosis. As a result, the RSF-based model, which takes nonlinearity into account, significantly outperformed the traditional Cox proportional hazards model and has great potential to be a more effective approach for survival analysis. PMID:26379761

  15. Discrimination of fish populations using parasites: Random Forests on a 'predictable' host-parasite system.

    PubMed

    Pérez-Del-Olmo, A; Montero, F E; Fernández, M; Barrett, J; Raga, J A; Kostadinova, A

    2010-10-01

    We address the effect of spatial scale and temporal variation on model generality when forming predictive models for fish assignment using a new data mining approach, Random Forests (RF), to variable biological markers (parasite community data). Models were implemented for a fish host-parasite system sampled along the Mediterranean and Atlantic coasts of Spain and were validated using independent datasets. We considered 2 basic classification problems in evaluating the importance of variations in parasite infracommunities for assignment of individual fish to their populations of origin: multiclass (2-5 population models, using 2 seasonal replicates from each of the populations) and 2-class task (using 4 seasonal replicates from 1 Atlantic and 1 Mediterranean population each). The main results are that (i) RF are well suited for multiclass population assignment using parasite communities in non-migratory fish; (ii) RF provide an efficient means for model cross-validation on the baseline data and this allows sample size limitations in parasite tag studies to be tackled effectively; (iii) the performance of RF is dependent on the complexity and spatial extent/configuration of the problem; and (iv) the development of predictive models is strongly influenced by seasonal change and this stresses the importance of both temporal replication and model validation in parasite tagging studies.

  16. Object-oriented mapping of urban trees using Random Forest classifiers

    NASA Astrophysics Data System (ADS)

    Puissant, Anne; Rougier, Simon; Stumpf, André

    2014-02-01

    Since vegetation in urban areas delivers crucial ecological services as a support to human well-being and to the urban population in general, its monitoring is a major issue for urban planners. Mapping and monitoring the changes in urban green spaces are important tasks because of their functions such as the management of air, climate and water quality, the reduction of noise, the protection of species and the development of recreational activities. In this context, the objective of this work is to propose a methodology to inventory and map the urban tree spaces from a mono-temporal very high resolution (VHR) optical image using a Random Forest classifier in combination with object-oriented approaches. The methodology is developed and its performance is evaluated on a dataset of the city of Strasbourg (France) for different categories of built-up areas. The results indicate a good accuracy and a high robustness for the classification of the green elements in terms of user and producer accuracies.

  17. Genome-wide association study for backfat thickness in Canchim beef cattle using Random Forest approach

    PubMed Central

    2013-01-01

    Background Meat quality involves many traits, such as marbling, tenderness, juiciness, and backfat thickness, all of which require attention from livestock producers. Backfat thickness improvement by means of traditional selection techniques in Canchim beef cattle has been challenging due to its low heritability, and it is measured late in an animal’s life. Therefore, the implementation of new methodologies for identification of single nucleotide polymorphisms (SNPs) linked to backfat thickness is an important strategy for genetic improvement of carcass and meat quality. Results The set of SNPs identified by the random forest approach explained as much as 50% of the deregressed estimated breeding value (dEBV) variance associated with backfat thickness, and a small set of 5 SNPs was able to explain 34% of the dEBV for backfat thickness. Several quantitative trait loci (QTL) for fat-related traits were found in the surrounding areas of the SNPs, as well as many genes with roles in lipid metabolism. Conclusions These results provide a better understanding of backfat deposition and regulation pathways, and can be considered a starting point for future implementation of a genomic selection program for backfat thickness in Canchim beef cattle. PMID:23738659

  18. Selection of informative metabolites using random forests based on model population analysis.

    PubMed

    Huang, Jian-Hua; Yan, Jun; Wu, Qing-Hua; Duarte Ferro, Miguel; Yi, Lun-Zhao; Lu, Hong-Mei; Xu, Qing-Song; Liang, Yi-Zeng

    2013-12-15

    One of the main goals of metabolomics studies is to discover informative metabolites or biomarkers, which may be used to diagnose diseases and to uncover pathology. Sophisticated feature selection approaches are required to extract the information hidden in such complex 'omics' data. In this study, a new and robust selection method is proposed, combining random forests (RF) with model population analysis (MPA), for selecting informative metabolites from three metabolomic datasets. According to their contribution to classification accuracy, the metabolites were classified into three kinds: informative, non-informative, and interfering. Based on the proposed method, informative metabolites were selected for the three datasets; further analyses of these metabolites between healthy and diseased groups were then performed, and t-tests showed that the P values for all selected metabolites were below 0.05. Moreover, the informative metabolites identified by the current method were demonstrated to be correlated with the clinical outcome under investigation. The source code of MPA-RF in Matlab can be freely downloaded from http://code.google.com/p/my-research-list/downloads/list. PMID:24209380

  19. Identifying functions of protein complexes based on topology similarity with random forest.

    PubMed

    Li, Zhan-Chao; Lai, Yan-Hua; Chen, Li-Li; Xie, Yun; Dai, Zong; Zou, Xiao-Yong

    2014-03-01

    Elucidating the functions of protein complexes is critical for understanding disease mechanisms, diagnosis, and therapy. In this study, based on the concept that protein complexes with similar topology may have similar functions, we first model protein complexes as weighted graphs with nodes representing proteins and edges indicating interactions between proteins. Second, we use topology features derived from the graphs to characterize protein complexes based on graph theory. Finally, we construct a predictor using random forest and the topology features to identify the functions of protein complexes. The effectiveness of the current method is evaluated by identifying the functions of mammalian protein complexes, and the predictor is then utilized to identify the functions of protein complexes retrieved from human protein-protein interaction networks. We identify some protein complexes with significant roles in the occurrence of tumors, vesicles, and retinoblastoma. It is anticipated that the current research has an important impact on pathogenesis and the pharmaceutical industry. The source code of Matlab and the dataset are freely available on request from the authors. PMID:24389559

  1. A New MAC Address Spoofing Detection Technique Based on Random Forests

    PubMed Central

    Alotaibi, Bandar; Elleithy, Khaled

    2016-01-01

    Media access control (MAC) addresses in wireless networks can be trivially spoofed using off-the-shelf devices. The aim of this research is to detect MAC address spoofing in wireless networks using a hard-to-spoof measurement that is correlated with the location of the wireless device, namely the received signal strength (RSS). We developed a passive solution that does not require modification of standards or protocols. The solution was tested in a live test-bed (i.e., a wireless local area network with the aid of two air monitors acting as sensors) and achieved 99.77%, 93.16% and 88.38% accuracy when the attacker is 8–13 m, 4–8 m and less than 4 m away from the victim device, respectively. We implemented three previous methods on the same test-bed and found that our solution outperforms existing solutions. Our solution is based on an ensemble method known as random forests. PMID:26927103

  2. Random Forest Segregation of Drug Responses May Define Regions of Biological Significance

    PubMed Central

    Bukhari, Qasim; Borsook, David; Rudin, Markus; Becerra, Lino

    2016-01-01

    The ability to assess brain responses in an unsupervised manner from fMRI measures has remained a challenge. Here we applied the Random Forest (RF) method to detect differences in the pharmacological MRI (phMRI) response in rats to treatment with an analgesic drug (buprenorphine) as compared to control (saline). Three groups of animals were studied: two groups treated with different doses of the opioid buprenorphine, low dose (LD) and high dose (HD), and one receiving saline. PhMRI responses were evaluated in 45 brain regions, and RF analysis was applied to allocate rats to the individual treatment groups. RF analysis was able to identify drug effects based on differential phMRI responses in the hippocampus, amygdala, nucleus accumbens, superior colliculus, and the lateral and posterior thalamus for drug vs. saline. These structures have high levels of mu opioid receptors. In addition, these regions are involved in aversive signaling, which is inhibited by mu opioids. The results demonstrate that buprenorphine-mediated phMRI responses comprise characteristic features that allow a supervised differentiation from placebo-treated rats, as well as the proper allocation to the respective drug dose group, using the RF method, a method that has been successfully applied in clinical studies. PMID:27014046

  3. Evaluation of random forest regression for prediction of breeding value from genomewide SNPs.

    PubMed

    Sarkar, Rupam Kumar; Rao, A R; Meher, Prabina Kumar; Nepolean, T; Mohapatra, T

    2015-06-01

    Genomic prediction estimates the breeding value of genotypes from molecular marker data and has turned out to be a powerful tool for the efficient utilization of germplasm resources and the rapid improvement of cultivars. Model-based techniques have been widely used for prediction of breeding values of genotypes from genomewide association studies. However, the random forest (RF), a model-free ensemble learning method, is not widely used for this purpose. In this study, the optimum values of the tuning parameters of RF were identified and applied to predict the breeding value of genotypes based on genomewide single-nucleotide polymorphisms (SNPs), where the number of SNPs (P variables) is much higher than the number of genotypes (n observations) (P » n). Further, a comparison was made with the model-based genomic prediction methods, namely, the least absolute shrinkage and selection operator (LASSO), ridge regression (RR) and elastic net (EN) under P » n. It was found that the correlations between the predicted and observed trait response were 0.591, 0.539, 0.431 and 0.587 for RF, LASSO, RR and EN, respectively, which implies superiority of RF over the model-based techniques in genomic prediction. Hence, we suggest that the RF methodology can be used as an alternative to model-based techniques for the prediction of breeding value at the genome level with higher accuracy. PMID:26174666
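
    The comparison can be sketched on synthetic data with P >> n as follows, assuming scikit-learn; the genotype coding, trait model and tuning values are invented for illustration:

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.linear_model import ElasticNet, Lasso, Ridge
        from sklearn.model_selection import cross_val_predict

        rng = np.random.default_rng(1)
        n, p = 200, 2000                                   # far more SNPs than genotypes
        X = rng.integers(0, 3, size=(n, p)).astype(float)  # 0/1/2 allele counts
        beta = np.zeros(p)
        beta[:20] = rng.normal(size=20)                    # a few causal SNPs
        y = X @ beta + rng.normal(scale=2.0, size=n)       # trait = genetic + noise

        models = {
            "RF": RandomForestRegressor(n_estimators=200, max_features="sqrt",
                                        random_state=0),
            "LASSO": Lasso(alpha=0.1),
            "RR": Ridge(alpha=1.0),
            "EN": ElasticNet(alpha=0.1, l1_ratio=0.5),
        }
        for name, model in models.items():
            pred = cross_val_predict(model, X, y, cv=5)
            print(name, "r =", round(np.corrcoef(y, pred)[0, 1], 3))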

  4. Random Forests Are Able to Identify Differences in Clotting Dynamics from Kinetic Models of Thrombin Generation.

    PubMed

    Arumugam, Jayavel; Bukkapatnam, Satish T S; Narayanan, Krishna R; Srinivasa, Arun R

    2016-01-01

    Current methods for distinguishing acute coronary syndromes such as heart attack from stable coronary artery disease, based on the kinetics of thrombin formation, have been limited to evaluating sensitivity of well-established chemical species (e.g., thrombin) using simple quantifiers of their concentration profiles (e.g., maximum level of thrombin concentration, area under the thrombin concentration versus time curve). In order to get an improved classifier, we use a 34-protein factor clotting cascade model and convert the simulation data into a high-dimensional representation (about 19000 features) using a piecewise cubic polynomial fit. Then, we systematically find plausible assays to effectively gauge changes in acute coronary syndrome/coronary artery disease populations by introducing a statistical learning technique called Random Forests. We find that differences associated with acute coronary syndromes emerge in combinations of a handful of features. For instance, concentrations of 3 chemical species, namely, active alpha-thrombin, tissue factor-factor VIIa-factor Xa ternary complex, and intrinsic tenase complex with factor X, at specific time windows, could be used to classify acute coronary syndromes to an accuracy of about 87.2%. Such a combination could be used to efficiently assay the coagulation system. PMID:27171403

  5. Longitudinal clinical score prediction in Alzheimer's disease with soft-split sparse regression based random forest.

    PubMed

    Huang, Lei; Jin, Yan; Gao, Yaozong; Thung, Kim-Han; Shen, Dinggang

    2016-10-01

    Alzheimer's disease (AD) is an irreversible neurodegenerative disease that affects a large population worldwide. Cognitive scores at multiple time points can be reliably used to evaluate the progression of the disease clinically. In recent studies, machine learning techniques have shown promising results for the prediction of AD clinical scores. However, current models have multiple limitations, such as the assumption of linearity and the exclusion of subjects with missing data. Here, we present a nonlinear supervised sparse regression-based random forest (RF) framework to predict a variety of longitudinal AD clinical scores. Furthermore, we propose a soft-split technique that assigns probabilistic paths to a test sample in the RF for more accurate predictions. In order to benefit from the longitudinal scores in the study, unlike previous studies that often removed subjects with missing scores, we first estimate the missing scores with our proposed soft-split sparse regression-based RF and then utilize the estimated longitudinal scores at all previous time points to predict the scores at the next time point. The experimental results demonstrate that our proposed method is superior to the traditional RF and outperforms other state-of-the-art regression models. Our method can also be extended as a general regression framework to predict other disease scores. PMID:27500865

  6. Land cover mapping based on random forest classification of multitemporal spectral and thermal images.

    PubMed

    Eisavi, Vahid; Homayouni, Saeid; Yazdi, Ahmad Maleknezhad; Alimohammadi, Abbas

    2015-05-01

    Thematic mapping of complex landscapes, with various phenological patterns from satellite imagery, is a particularly challenging task. However, supplementary information, such as multitemporal data and/or land surface temperature (LST), has the potential to improve the land cover classification accuracy and efficiency. In this paper, in order to map land covers, we evaluated the potential of multitemporal Landsat 8 spectral and thermal imagery using a random forest (RF) classifier. We used a grid search approach based on the out-of-bag (OOB) estimate of error to optimize the RF parameters. Four different scenarios were considered: (1) RF classification of multitemporal spectral images, (2) RF classification of multitemporal LST images, (3) RF classification of all multitemporal LST and spectral images, and (4) RF classification of selected important or optimum features. The study area was Naghadeh city and its surrounding region, located in West Azerbaijan Province, northwest of Iran. The overall accuracies of the first, second, third, and fourth scenarios were 86.48, 82.26, 90.63, and 91.82%, respectively. The quantitative assessments demonstrated that the most important or optimum features increase the class separability, while the spectral and thermal features produced a more moderate increase in land cover mapping accuracy. In addition, the contribution of the multitemporal thermal information led to a considerable increase in the user and producer accuracies of classes with rapid temporal change behavior, such as crops and vegetation. PMID:25910718
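
    The OOB-based grid search can be sketched as follows with scikit-learn; the parameter grid and stand-in data are assumptions, not the study's settings:

        import itertools
        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier

        # Stand-in for a pixel-by-feature matrix of multitemporal bands.
        X, y = make_classification(n_samples=1000, n_features=20, n_informative=8,
                                   n_classes=4, n_clusters_per_class=1,
                                   random_state=0)

        best = None
        for n_trees, mtry in itertools.product([100, 300, 500], [2, 4, 8]):
            rf = RandomForestClassifier(n_estimators=n_trees, max_features=mtry,
                                        oob_score=True, random_state=0).fit(X, y)
            oob_error = 1.0 - rf.oob_score_   # out-of-bag estimate of error
            if best is None or oob_error < best[0]:
                best = (oob_error, n_trees, mtry)
        print("lowest OOB error %.3f at n_estimators=%d, max_features=%d" % best)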

  7. Task-Dependent Band-Selection of Hyperspectral Images by Projection-Based Random Forests

    NASA Astrophysics Data System (ADS)

    Hänsch, R.; Hellwich, O.

    2016-06-01

    The automatic classification of land cover types from hyperspectral images is a challenging problem due to (among others) the large number of spectral bands and their high spatial and spectral correlation. The extraction of meaningful features that enable a subsequent classifier to distinguish between different land cover classes is often limited to a subset of all available data dimensions, found by band selection techniques or other methods of dimensionality reduction. This work applies Projection-Based Random Forests to hyperspectral images, which not only removes the need for an explicit feature extraction, but also provides mechanisms to automatically select spectral bands that contain original (i.e. non-redundant) as well as highly meaningful information for the given classification task. The proposed method is applied to four challenging hyperspectral datasets, and it is shown that the effective number of spectral bands can be considerably limited without losing too much classification performance, e.g. a loss of only 1% accuracy if roughly 13% of all available bands are used.

  8. Using conditional inference trees and random forests to predict the bioaccumulation potential of organic chemicals.

    PubMed

    Strempel, Sebastian; Nendza, Monika; Scheringer, Martin; Hungerbühler, Konrad

    2013-04-01

    This study presents a data-oriented, tiered approach to assessing the bioaccumulation potential of chemicals according to the European chemicals regulation on Registration, Evaluation, Authorisation and Restriction of Chemicals (REACH). The authors compiled data for eight physicochemical descriptors (partition coefficients, degradation half-lives, polarity, and so forth) for a set of 713 organic chemicals for which experimental values of the bioconcentration factor (BCF) are available. The authors employed supervised machine learning methods (conditional inference trees and random forests) to derive relationships between the physicochemical descriptors and the BCF values. In a first tier, the authors established rules for classifying a chemical as bioaccumulative (B) or nonbioaccumulative (non-B). In a second tier, the authors developed a new tool for estimating numerical BCF values. For both cases the optimal set of relevant descriptors was determined; these are biotransformation half-life and octanol-water distribution coefficient (log D) for the classification rules, and log D, biotransformation half-life, and topological polar surface area for the BCF estimation tool. The uncertainty of the BCF estimates obtained with the new estimation tool was quantified by comparing the estimated and experimental BCF values of the 713 chemicals. Comparison with existing BCF estimation methods indicates that the performance of this new BCF estimation tool is at least as high as that of existing methods. The authors recommend the present classification rules and BCF estimation tool for consensus application in combination with existing BCF estimation methods.

  9. A survival prediction model of rats in hemorrhagic shock using the random forest classifier.

    PubMed

    Choi, Joon Yul; Kim, Sung Kean; Lee, Wan Hyung; Yoo, Tae Keun; Kim, Deok Won

    2012-01-01

    Hemorrhagic shock is the cause of one third of deaths resulting from injury in the world. Although many studies have tried to diagnose hemorrhagic shock early and accurately, such attempts have been inconclusive due to the compensatory mechanisms of humans. The objective of this study was to construct a survival prediction model of rats in hemorrhagic shock using a random forest (RF) model, a recently developed classifier recognized for its performance. Heart rate (HR), mean arterial pressure (MAP), respiratory rate (RR), lactate concentration (LC), and perfusion (PF) measured in rats were used as input variables for the RF model, and its performance was compared with that of a logistic regression (LR) model. Before constructing the models, we performed 5-fold cross-validation for RF variable selection and forward stepwise variable selection for the LR model to determine which variables are important for the models. For the LR model, sensitivity, specificity, accuracy, and area under the receiver operating characteristic curve (ROC-AUC) were 1, 0.89, 0.94, and 0.98, respectively. For the RF model, sensitivity, specificity, accuracy, and AUC were 0.96, 1, 0.98, and 0.99, respectively. In conclusion, the RF model was superior to the LR model for survival prediction in the rat model.
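
    A small sketch of such a comparison, assuming scikit-learn, with simulated stand-ins for the five vital-sign inputs (HR, MAP, RR, LC, PF):

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.linear_model import LogisticRegression
        from sklearn.metrics import confusion_matrix, roc_auc_score
        from sklearn.model_selection import train_test_split

        X, y = make_classification(n_samples=400, n_features=5, n_informative=4,
                                   n_redundant=0, random_state=0)  # 1 = survived
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        for name, model in [("LR", LogisticRegression(max_iter=1000)),
                            ("RF", RandomForestClassifier(n_estimators=300,
                                                          random_state=0))]:
            model.fit(X_tr, y_tr)
            tn, fp, fn, tp = confusion_matrix(y_te, model.predict(X_te)).ravel()
            auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
            print(f"{name}: sens={tp/(tp+fn):.2f} spec={tn/(tn+fp):.2f} "
                  f"acc={(tp+tn)/len(y_te):.2f} AUC={auc:.2f}")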

  10. Risk Prediction of One-Year Mortality in Patients with Cardiac Arrhythmias Using Random Survival Forest.

    PubMed

    Miao, Fen; Cai, Yun-Peng; Zhang, Yu-Xiao; Li, Ye; Zhang, Yuan-Ting

    2015-01-01

    Existing models for predicting mortality based on the traditional Cox proportional hazards approach (CPH) often have low prediction accuracy. This paper aims to develop a clinical risk model with good accuracy for predicting 1-year mortality in patients with cardiac arrhythmias using random survival forest (RSF), a robust approach for survival analysis. A total of 10,488 cardiac arrhythmia patients available in the public MIMIC II clinical database were investigated, with 3,452 deaths occurring within the 1-year follow-up. Forty risk factors, including demographics, clinical and laboratory information, and antiarrhythmic agents, were analyzed as potential predictors of all-cause mortality. RSF was adopted to build a comprehensive survival model and a simplified risk model composed of the 14 top risk factors. The comprehensive model achieved a prediction accuracy of 0.81 measured by the c-statistic with 10-fold cross-validation. The simplified risk model also achieved a good accuracy of 0.799. Both results outperformed the traditional CPH (which achieved a c-statistic of 0.733 for the comprehensive model and 0.718 for the simplified model). Moreover, various factors are observed to have a nonlinear impact on cardiac arrhythmia prognosis. As a result, the RSF-based model, which took nonlinearity into account, significantly outperformed the traditional Cox proportional hazards model and has great potential to be a more effective approach for survival analysis.
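
    A hedged sketch of a random survival forest, assuming the scikit-survival package (a convenient modern implementation, not necessarily what the authors used); the risk factors and follow-up times are simulated:

        import numpy as np
        from sksurv.ensemble import RandomSurvivalForest
        from sksurv.util import Surv

        rng = np.random.default_rng(0)
        n = 500
        X = rng.normal(size=(n, 14))               # 14 top risk factors (placeholder)
        time = rng.exponential(scale=365, size=n)  # follow-up time in days
        event = rng.random(n) < 0.33               # True = death observed
        y = Surv.from_arrays(event=event, time=time)

        rsf = RandomSurvivalForest(n_estimators=300, min_samples_leaf=10,
                                   random_state=0).fit(X, y)
        print("c-statistic:", rsf.score(X, y))     # concordance index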

  11. Semantic classification of urban buildings combining VHR image and GIS data: An improved random forest approach

    NASA Astrophysics Data System (ADS)

    Du, Shihong; Zhang, Fangli; Zhang, Xiuyuan

    2015-07-01

    While most existing studies have focused on extracting geometric information on buildings, only a few have concentrated on semantic information. The lack of semantic information cannot satisfy many demands on resolving environmental and social issues. This study presents an approach to semantically classify buildings into much finer categories than those of existing studies by learning a random forest (RF) classifier from a large number of imbalanced samples with high-dimensional features. First, a two-level segmentation mechanism combining GIS data and VHR imagery produces single image objects at a large scale and intra-object components at a small scale. Second, a semi-supervised method chooses a large number of unbiased samples by considering the spatial proximity and intra-cluster similarity of buildings. Third, two important improvements are made to the RF classifier: a voting-distribution ranked rule for reducing the influence of imbalanced samples on classification accuracy, and a feature importance measurement for evaluating each feature's contribution to the recognition of each category. Fourth, the semantic classification of urban buildings is practically conducted in Beijing city, and the results demonstrate that the proposed approach is effective and accurate. The seven categories used in the study are finer than those in existing work and more helpful for studying many environmental and social problems.

  12. In silico prediction of toxic action mechanisms of phenols for imbalanced data with Random Forest learner.

    PubMed

    Chen, Jing; Tang, Yuan Yan; Fang, Bin; Guo, Chang

    2012-05-01

    With an increasing need for the rapid and effective safety assessment of compounds in industrial and civil-use products, in silico toxicity exploration techniques provide an economic way for environmental hazard assessment. Over the last decade, in silico research has developed many quantitative structure-activity relationship models to predict toxicity mechanisms. Most of these methods benefit from data analysis and machine learning techniques, which rely heavily on the characteristics of the data sets. For the Tetrahymena pyriformis toxicity data sets, there is a great technical challenge: data imbalance. The skewness of the class distribution would greatly deteriorate the prediction performance on rare classes. Most previous research on predicting phenol mechanisms of toxic action did not consider this practical problem. In this work, we dealt with the problem by considering the difference between the two types of misclassification. A Random Forest learner was employed in a cost-sensitive learning framework to construct prediction models based on selected molecular descriptors. In computational experiments, both the global and local models obtained appreciable overall prediction accuracies. In particular, the performance on rare classes was indeed promoted. Moreover, for practical usage of these models, the balance between the two misclassifications can be adjusted by using different cost matrices according to the application goals. PMID:22481075
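
    The cost-sensitive framing can be sketched with scikit-learn's class weights; the imbalance ratio and cost values below are illustrative, not the paper's:

        from sklearn.datasets import make_classification
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.metrics import classification_report
        from sklearn.model_selection import train_test_split

        # Imbalanced problem standing in for common vs. rare toxicity mechanisms.
        X, y = make_classification(n_samples=2000, weights=[0.95, 0.05],
                                   random_state=0)
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

        # Penalize missing the rare class 10x more than a false alarm; changing
        # these weights plays the role of supplying a different cost matrix.
        rf = RandomForestClassifier(n_estimators=300, class_weight={0: 1, 1: 10},
                                    random_state=0).fit(X_tr, y_tr)
        print(classification_report(y_te, rf.predict(X_te)))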

  13. Applications of the BIOPHYS Algorithm for Physically-Based Retrieval of Biophysical, Structural and Forest Disturbance Information

    NASA Technical Reports Server (NTRS)

    Peddle, Derek R.; Huemmrich, K. Fred; Hall, Forrest G.; Masek, Jeffrey G.; Soenen, Scott A.; Jackson, Chris D.

    2011-01-01

    Canopy reflectance model inversion using look-up table approaches provides powerful and flexible options for deriving improved forest biophysical structural information (BSI) compared with traditional statistical empirical methods. The BIOPHYS algorithm is an improved, physically-based inversion approach for deriving BSI for independent use and validation, for monitoring, inventory and quantification of forest disturbance, and as input to ecosystem, climate and carbon models. Based on the multiple-forward-mode (MFM) inversion approach, BIOPHYS results were summarized from different studies (Minnesota/NASA COVER; Virginia/LEDAPS; Saskatchewan/BOREAS), sensors (airborne MMR; Landsat; MODIS) and models (GeoSail; GOMS). Application outputs included forest density, height, crown dimension, branch and green leaf area, canopy cover, disturbance estimates based on multi-temporal chronosequences, and structural change following recovery from forest fires over the last century. Good correspondence with validation field data was obtained. Integrated analyses of multiple solar and view angle imagery further improved retrievals compared with single-pass data. Quantifying ecosystem dynamics, such as the area and percentage of forest disturbance, early regrowth and succession, provides essential inputs to process-driven models of carbon flux. BIOPHYS is well suited for large-area, multi-temporal applications involving multiple image sets and mosaics for assessing vegetation disturbance and quantifying biophysical structural dynamics and change. It is also suitable for integration with forest inventory, monitoring, updating, and other programs.
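
    Look-up-table inversion in the multiple-forward-mode spirit can be sketched conceptually as below; the toy two-band forward model is a stand-in for a real canopy reflectance model such as GeoSail or GOMS:

        import numpy as np

        def forward_model(lai, cover):
            """Toy red/NIR reflectance model (placeholder for a canopy model)."""
            red = 0.30 - 0.05 * lai * cover
            nir = 0.20 + 0.08 * lai * cover
            return np.array([red, nir])

        # Run the model forward over a grid of biophysical parameters (the LUT).
        grid = [(lai, cover) for lai in np.linspace(0.5, 6.0, 23)
                             for cover in np.linspace(0.1, 1.0, 19)]
        lut = np.array([forward_model(lai, cover) for lai, cover in grid])

        # Invert: match an observed spectrum to the nearest LUT entry.
        observed = np.array([0.12, 0.48])
        best = np.argmin(((lut - observed) ** 2).sum(axis=1))
        print("retrieved LAI=%.2f, crown cover=%.2f" % grid[best])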

  14. An Evaluation of the MOD17 Gross Primary Production Algorithm in a Mangrove Forest

    NASA Astrophysics Data System (ADS)

    Wells, H.; Najjar, R.; Herrmann, M.; Fuentes, J. D.; Ruiz-Plancarte, J.

    2015-12-01

    Though coastal wetlands occupy a small fraction of the Earth's surface, they are extremely active ecosystems and play a significant role in the global carbon budget. However, coastal wetlands are still poorly understood, especially when compared to open-ocean and terrestrial ecosystems. This is partly due to the limited in situ observations in these areas. One of the ways around the limited in situ data is to use remote sensing products. Here we present the first evaluation of the MOD17 remote sensing algorithm of gross primary productivity (GPP) in a mangrove forest using data from a flux tower in the Florida Everglades. MOD17 utilizes remote sensing products from the Moderate Resolution Imaging Spectroradiometer and meteorological fields from the NCEP/DOE Reanalysis 2. MOD17 is found to capture the long-term mean and seasonal amplitude of GPP but has significant errors describing the interannual variability, intramonthly variability, and the phasing of the annual cycle in GPP. Regarding the latter, MOD17 overestimates GPP when salinity is high and underestimates it when it is low, consistent with the fact that MOD17 ignores salinity and salinity tends to decrease GPP. Including salinity in the algorithm would then most likely improve its accuracy. MOD17 also assumes that GPP is linear with respect to PAR (photosynthetically active radiation), which does not hold true in the mangroves. Finally, the estimated PAR and air temperature inputs to MOD17 were found to be significantly lower than observed. In summary, while MOD17 captures some aspects of GPP variability at this mangrove site, it appears to be doing so for the wrong reasons.
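
    For orientation, a MOD17-style light-use-efficiency calculation can be sketched as below; the constants are placeholders rather than actual MOD17 biome-parameter values, and the linear dependence of GPP on PAR visible in the code is precisely the assumption the evaluation questions:

        def ramp(x, lo, hi):
            """Linear scalar: 0 at or below lo, 1 at or above hi."""
            return min(max((x - lo) / (hi - lo), 0.0), 1.0)

        def mod17_gpp(sw_rad, fpar, tmin, vpd,
                      eps_max=0.0012,                # kg C per MJ APAR (placeholder)
                      tmin_lo=-8.0, tmin_hi=8.5,     # deg C
                      vpd_lo=650.0, vpd_hi=3100.0):  # Pa; high VPD suppresses GPP
            par = 0.45 * sw_rad                      # PAR as ~45% of shortwave
            apar = par * fpar                        # absorbed PAR
            scalar = ramp(tmin, tmin_lo, tmin_hi) * (1.0 - ramp(vpd, vpd_lo, vpd_hi))
            return eps_max * scalar * apar           # GPP per time step

        print(mod17_gpp(sw_rad=12.0, fpar=0.8, tmin=18.0, vpd=900.0))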

  15. Data Security in Ad Hoc Networks Using Randomization of Cryptographic Algorithms

    NASA Astrophysics Data System (ADS)

    Krishna, B. Ananda; Radha, S.; Keshava Reddy, K. Chenna

    Ad hoc networks are a new wireless networking paradigm for mobile hosts. Unlike traditional mobile wireless networks, ad hoc networks do not rely on any fixed infrastructure; instead, hosts rely on each other to keep the network connected. Military tactical and other security-sensitive operations are still the main applications of ad hoc networks, although there is a trend to adopt ad hoc networks for commercial uses due to their unique properties. One main challenge in the design of these networks is how to feasibly detect and defend against the major attacks on data, such as impersonation and unauthorized data modification. Moreover, some nodes in the network may be malicious, with the objective of degrading the network performance. In this study, we propose a security model in which packets are encrypted and decrypted using multiple algorithms, with the algorithm selected at random. The performance of the proposed model is analyzed, and it is observed that there is no increase in control overhead, although a slight delay is introduced due to the encryption process. We conclude that the proposed security model works well for heavily loaded networks with high mobility and can be extended to more cryptographic algorithms.
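
    A hedged sketch of the idea (not the authors' protocol), assuming the Python cryptography package: each packet is encrypted with one of several ciphers chosen at random, and the choice index travels in a small packet header:

        import os
        import secrets
        from cryptography.hazmat.primitives.ciphers.aead import (AESGCM,
                                                                 ChaCha20Poly1305)

        # Shared cipher suite; in a real network the keys would be pre-distributed.
        suite = [AESGCM(AESGCM.generate_key(bit_length=256)),
                 ChaCha20Poly1305(ChaCha20Poly1305.generate_key())]

        def encrypt_packet(payload: bytes) -> bytes:
            algo_id = secrets.randbelow(len(suite))   # random algorithm selection
            nonce = os.urandom(12)
            ct = suite[algo_id].encrypt(nonce, payload, None)
            return bytes([algo_id]) + nonce + ct      # header: algo id + nonce

        def decrypt_packet(packet: bytes) -> bytes:
            algo_id, nonce, ct = packet[0], packet[1:13], packet[13:]
            return suite[algo_id].decrypt(nonce, ct, None)

        print(decrypt_packet(encrypt_packet(b"hello ad hoc network")))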

  16. From analytical solutions of solute transport equations to multidimensional time-domain random walk (TDRW) algorithms

    NASA Astrophysics Data System (ADS)

    Bodin, Jacques

    2015-03-01

    In this study, new multi-dimensional time-domain random walk (TDRW) algorithms are derived from approximate one-dimensional (1-D), two-dimensional (2-D), and three-dimensional (3-D) analytical solutions of the advection-dispersion equation and from exact 1-D, 2-D, and 3-D analytical solutions of the pure-diffusion equation. These algorithms enable the calculation of both the time required for a particle to travel a specified distance in a homogeneous medium and the mass recovery at the observation point, which may be incomplete due to 2-D or 3-D transverse dispersion or diffusion. The method is extended to heterogeneous media, represented as a piecewise collection of homogeneous media. The particle motion is then decomposed along a series of intermediate checkpoints located on the medium interface boundaries. The accuracy of the multi-dimensional TDRW method is verified against (i) exact analytical solutions of solute transport in homogeneous media and (ii) finite-difference simulations in a synthetic 2-D heterogeneous medium of simple geometry. The results demonstrate that the method is ideally suited to purely diffusive transport and to advection-dispersion transport problems dominated by advection. Conversely, the method is not recommended for highly dispersive transport problems because the accuracy of the advection-dispersion TDRW algorithms degrades rapidly for a low Péclet number, consistent with the accuracy limit of the approximate analytical solutions. The proposed approach provides a unified methodology for deriving multi-dimensional time-domain particle equations and may be applicable to other mathematical transport models, provided that appropriate analytical solutions are available.
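
    A minimal 1-D sketch under standard assumptions: the advective-dispersive first-passage time across a homogeneous segment is inverse-Gaussian distributed with mean L/v and shape L^2/(2D), and a heterogeneous medium is handled by summing segment times at the intermediate checkpoints:

        import numpy as np

        rng = np.random.default_rng(0)

        def segment_travel_time(L, v, D, n_particles):
            """Sample first-passage times across one homogeneous segment."""
            return rng.wald(mean=L / v, scale=L**2 / (2.0 * D), size=n_particles)

        # Piecewise-homogeneous medium: (length, velocity, dispersion) per segment.
        segments = [(2.0, 1.0, 0.01), (3.0, 0.5, 0.05), (1.0, 2.0, 0.02)]
        arrival = sum(segment_travel_time(L, v, D, 10000) for L, v, D in segments)
        print("median arrival time:", np.median(arrival))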

  17. Harmonics elimination algorithm for operational modal analysis using random decrement technique

    NASA Astrophysics Data System (ADS)

    Modak, S. V.; Rawal, Chetan; Kundra, T. K.

    2010-05-01

    Operational modal analysis (OMA) extracts the modal parameters of a structure from its output response, generally measured during operation. When applied to mechanical engineering structures, OMA is often faced with the problem of harmonics present in the output response, which can cause erroneous modal extraction. This paper demonstrates for the first time that the random decrement (RD) method can be efficiently employed to eliminate harmonics from the randomdec signatures. Further, the work shows that even large-amplitude harmonics can be eliminated effectively by including additional random excitation, which, as in any other OMA method, obviously need not be recorded for analysis. The free decays obtained from RD are used for modal identification of the system using the eigensystem realization algorithm (ERA). The proposed harmonic elimination method has an advantage over previous methods in that it does not require the harmonic frequencies to be known and can be used for multiple harmonics, including periodic signals. The theory behind the harmonic elimination is first developed and validated. The effectiveness of the method is demonstrated through a simulated study and then by experimental studies on a beam and a more complex F-shaped structure, which resembles the skeleton of a drilling or milling machine tool.
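
    The random decrement step itself can be sketched as below (an illustration, not the paper's exact implementation): response segments starting at each upward crossing of a trigger level are averaged, so random content cancels while deterministic free-decay content survives:

        import numpy as np

        def random_decrement(x, trigger, seg_len):
            """Average segments of x starting at upward crossings of trigger."""
            starts = np.where((x[:-1] < trigger) & (x[1:] >= trigger))[0]
            starts = starts[starts + seg_len < len(x)]
            return np.stack([x[s:s + seg_len] for s in starts]).mean(axis=0)

        rng = np.random.default_rng(0)
        t = np.arange(0.0, 60.0, 0.01)
        x = np.sin(2 * np.pi * 3.1 * t) * np.exp(-0.02 * t)  # structural mode
        x += 0.5 * np.sin(2 * np.pi * 10.0 * t)              # spurious harmonic
        x += 0.3 * rng.standard_normal(t.size)               # added random excitation
        signature = random_decrement(x, trigger=np.std(x), seg_len=500)
        print(signature[:5])                                 # randomdec signature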

  1. Medical Decision Support System for Diagnosis of Heart Arrhythmia using DWT and Random Forests Classifier.

    PubMed

    Alickovic, Emina; Subasi, Abdulhamit

    2016-04-01

    In this study, a Random Forests (RF) classifier is proposed for ECG heartbeat signal classification in the diagnosis of heart arrhythmia. Discrete wavelet transform (DWT) is used to decompose ECG signals into successive frequency bands. A set of statistical features was extracted from the obtained frequency bands to describe the distribution of wavelet coefficients. This study shows that the RF classifier achieves superior performance compared to other decision tree methods using 10-fold cross-validation on the ECG datasets, and the obtained results suggest that further significant improvements in classification accuracy can be accomplished by the proposed classification system. Accurate ECG signal classification is the major requirement for the detection of all arrhythmia types. The performance of the proposed system was evaluated on two different databases, namely the MIT-BIH database and the St. Petersburg Institute of Cardiological Technics 12-lead Arrhythmia Database. For the MIT-BIH database, the RF classifier yielded an overall accuracy of 99.33% against 98.44% and 98.67% for the C4.5 and CART classifiers, respectively. For the St. Petersburg Institute of Cardiological Technics 12-lead Arrhythmia Database, the RF classifier yielded an overall accuracy of 99.95% against 99.80% for both the C4.5 and CART classifiers. The combined model with multiscale principal component analysis (MSPCA) de-noising, DWT and the RF classifier also achieves better performance, with the area under the receiver operating characteristic (ROC) curve (AUC) and F-measure equal to 0.999 and 0.993 for the MIT-BIH database and 1 and 0.999 for the St. Petersburg Institute of Cardiological Technics 12-lead Arrhythmia Database, respectively. The obtained results demonstrate that the proposed system has the capacity for reliable classification of ECG signals and can assist clinicians in making an accurate diagnosis of cardiovascular disorders (CVDs). PMID:26922592
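
    A compact sketch of the DWT-plus-RF pipeline, assuming PyWavelets and scikit-learn; the wavelet choice, band statistics and toy beats are placeholders for the actual MIT-BIH processing:

        import numpy as np
        import pywt
        from sklearn.ensemble import RandomForestClassifier

        def beat_features(beat, wavelet="db4", level=4):
            """Statistics of the wavelet coefficients in each sub-band."""
            feats = []
            for band in pywt.wavedec(beat, wavelet, level=level):
                feats += [band.mean(), band.std(), np.abs(band).max()]
            return feats

        rng = np.random.default_rng(0)
        normal = [np.sin(np.linspace(0, 8, 256)) + 0.1 * rng.standard_normal(256)
                  for _ in range(100)]
        abnormal = [np.sign(np.sin(np.linspace(0, 8, 256)))
                    + 0.1 * rng.standard_normal(256) for _ in range(100)]
        X = np.array([beat_features(b) for b in normal + abnormal])
        y = [0] * 100 + [1] * 100
        clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
        print("training accuracy:", clf.score(X, y))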

  2. Classification of melanoma lesions using sparse coded features and random forests

    NASA Astrophysics Data System (ADS)

    Rastgoo, Mojdeh; Lemaître, Guillaume; Morel, Olivier; Massich, Joan; Garcia, Rafael; Meriaudeau, Fabrice; Marzani, Franck; Sidibé, Désiré

    2016-03-01

    Malignant melanoma is the most dangerous type of skin cancer, yet it is also the most treatable kind of cancer provided it is diagnosed early, which remains a challenging task for clinicians and dermatologists. In this regard, CAD systems based on machine learning and image processing techniques have been developed to differentiate melanoma lesions from benign and dysplastic nevi using dermoscopic images. Generally, these frameworks are composed of sequential processes: pre-processing, segmentation, and classification. This architecture faces two main challenges: (i) each process is complex, requires a set of parameters to be tuned, and is specific to a given dataset; and (ii) the performance of each process depends on the previous one, so errors accumulate throughout the framework. In this paper, we propose a framework for melanoma classification based on sparse coding which does not rely on any pre-processing or lesion segmentation. Our framework uses a Random Forests classifier and sparse representations of three features: SIFT, Hue and Opponent angle histograms, and RGB intensities. The experiments are carried out on the public PH2 dataset using 10-fold cross-validation. The results show that the sparse-coded SIFT feature achieves the highest performance, with sensitivity and specificity of 100% and 90.3%, respectively, with a dictionary size of 800 atoms and a sparsity level of 2. Furthermore, the descriptor based on RGB intensities achieves similar results, with sensitivity and specificity of 100% and 71.3%, respectively, for a smaller dictionary size of 100 atoms. In conclusion, dictionary learning techniques encode strong structures of dermoscopic images and provide discriminant descriptors.

  3. Random forest classification of large volume structures for visuo-haptic rendering in CT images

    NASA Astrophysics Data System (ADS)

    Mastmeyer, Andre; Fortmeier, Dirk; Handels, Heinz

    2016-03-01

    For patient-specific voxel-based visuo-haptic rendering of CT scans of the liver area, the fully automatic segmentation of large volume structures such as skin, soft tissue, lungs and intestine (risk structures) is important. Using a machine learning based approach, several existing segmentations from 10 segmented gold-standard patients are learned by random decision forests, individually and collectively. The core of this paper is feature selection and the application of the learned classifiers to a new patient data set. In a leave-some-out cross-validation, the obtained full volume segmentations are compared to the gold-standard segmentations of the untrained patients. The proposed classifiers use a multi-dimensional feature space to estimate the hidden truth, instead of relying on clinical standard threshold and connectivity based methods. The results of our efficient whole-body section classification are multi-label maps of the considered tissues. For visuo-haptic simulation, other small volume structures would have to be segmented additionally. We also take a look at such structures (liver vessels). In an experimental leave-some-out study consisting of 10 patients, the proposed method performs much more efficiently compared to state-of-the-art methods. In two variants of leave-some-out experiments we obtain best mean DICE ratios of 0.79, 0.97, 0.63 and 0.83 for skin, soft tissue, hard bone and risk structures. Liver structures are segmented with DICE 0.93 for the liver, 0.43 for blood vessels and 0.39 for bile vessels.
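
    A 2-D toy sketch of voxel-wise classification from a filter-bank feature stack, assuming scipy and scikit-learn; the filters and data are simplified stand-ins for the paper's feature space:

        import numpy as np
        from scipy.ndimage import gaussian_filter
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        image = rng.normal(size=(64, 64))
        image[16:48, 16:48] += 3.0                      # bright "tissue" region

        # Per-voxel features: raw value plus Gaussian smoothings at several radii.
        feats = np.stack([image] + [gaussian_filter(image, s) for s in (1, 2, 4)],
                         axis=-1)
        X = feats.reshape(-1, feats.shape[-1])

        labels = np.zeros(image.shape, dtype=int)       # gold-standard-style mask
        labels[16:48, 16:48] = 1
        y = labels.ravel()

        rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
        label_map = rf.predict(X).reshape(image.shape)  # multi-label map
        print("voxel agreement:", (label_map == labels).mean())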

  4. Prediction of hot spots in protein interfaces using a random forest model with hybrid features.

    PubMed

    Wang, Lin; Liu, Zhi-Ping; Zhang, Xiang-Sun; Chen, Luonan

    2012-03-01

    Prediction of hot spots in protein interfaces provides crucial information for research on protein-protein interaction and drug design. Existing machine learning methods generally judge whether a given residue is likely to be a hot spot by extracting features only from the target residue. However, hot spots usually form a small cluster of residues which are tightly packed together at the center of the protein interface. With this in mind, we present a novel method to extract hybrid features which incorporate a wide range of information about the target residue and its spatially neighboring residues, i.e. the nearest contact residue in the other face (mirror-contact residue) and the nearest contact residue in the same face (intra-contact residue). We provide a novel random forest (RF) model to effectively integrate these hybrid features for predicting hot spots in protein interfaces. Our method achieves accuracy (ACC) of 82.4% and a Matthews correlation coefficient (MCC) of 0.482 on the Alanine Scanning Energetics Database, and ACC of 77.6% and MCC of 0.429 on the Binding Interface Database. In a comparison study, the performance of our RF model exceeds that of other existing methods, such as Robetta, FOLDEF, KFC, KFC2, MINERVA and HotPoint. Of our hybrid features, three physicochemical features of the target residues (mass, polarizability and isoelectric point), the relative side-chain accessible surface area and the average depth index of mirror-contact residues are found to be the main discriminative features in hot spot prediction. We also confirm that hot spots tend to form large contact surface areas between two interacting proteins. Source data and code are available at: http://www.aporc.org/doc/wiki/HotSpot. PMID:22258275

  5. Identification of High-Impact cis-Regulatory Mutations Using Transcription Factor Specific Random Forest Models

    PubMed Central

    Svetlichnyy, Dmitry; Imrichova, Hana; Fiers, Mark; Kalender Atak, Zeynep; Aerts, Stein

    2015-01-01

    Cancer genomes contain vast amounts of somatic mutations, many of which are passenger mutations not involved in oncogenesis. Whereas driver mutations in protein-coding genes can be distinguished from passenger mutations based on their recurrence, non-coding mutations are usually not recurrent at the same position. Therefore, it is still unclear how to identify cis-regulatory driver mutations, particularly when chromatin data from the same patient is not available, thus relying only on sequence and expression information. Here we use machine-learning methods to predict functional regulatory regions using sequence information alone, and compare the predicted activity of the mutated region with the reference sequence. This way we define the Predicted Regulatory Impact of a Mutation in an Enhancer (PRIME). We find that the recently identified driver mutation in the TAL1 enhancer has a high PRIME score, representing a “gain-of-target” for MYB, whereas the highly recurrent TERT promoter mutation has a surprisingly low PRIME score. We trained Random Forest models for 45 cancer-related transcription factors, and used these to score variations in the HeLa genome and somatic mutations across more than five hundred cancer genomes. Each model predicts only a small fraction of non-coding mutations with a potential impact on the function of the encompassing regulatory region. Nevertheless, as these few candidate driver mutations are often linked to gains in chromatin activity and gene expression, they may contribute to the oncogenic program by altering the expression levels of specific oncogenes and tumor suppressor genes. PMID:26562774
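
    A much-simplified conceptual sketch of this kind of delta-scoring, assuming scikit-learn; the k-mer featurization, motif and training sequences are toys, not the paper's feature set or models:

        import itertools
        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        KMERS = ["".join(p) for p in itertools.product("ACGT", repeat=3)]

        def kmer_counts(seq):
            return [seq.count(k) for k in KMERS]

        rng = np.random.default_rng(0)
        # Toy training set: "active" regions carry a GGCGGC motif.
        pos = ["".join(rng.choice(list("ACGT"), 50)) + "GGCGGC" for _ in range(200)]
        neg = ["".join(rng.choice(list("ACGT"), 56)) for _ in range(200)]
        X = np.array([kmer_counts(s) for s in pos + neg])
        y = [1] * 200 + [0] * 200
        rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, y)

        def score(seq):
            return rf.predict_proba([kmer_counts(seq)])[0, 1]

        ref = "ACGTACGTGGAGGCACGTACGT"
        mut = "ACGTACGTGGCGGCACGTACGT"   # one substitution creates the motif
        print("PRIME-like delta:", score(mut) - score(ref))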

  6. In vivo MRI based prostate cancer localization with random forests and auto-context model.

    PubMed

    Qian, Chunjun; Wang, Li; Gao, Yaozong; Yousuf, Ambereen; Yang, Xiaoping; Oto, Aytekin; Shen, Dinggang

    2016-09-01

    Prostate cancer is one of the major causes of cancer death for men. Magnetic resonance (MR) imaging is increasingly used as an important modality to localize prostate cancer. Therefore, localizing prostate cancer in MRI with automated detection methods has become an active area of research. Many methods have been proposed for this task. However, most previous methods focused on identifying cancer only in the peripheral zone (PZ), or on classifying suspicious cancer ROIs into benign and cancer tissue. Little work has been done on developing a fully automatic method for cancer localization in the entire prostate region, including the central gland (CG) and transition zone (TZ). In this paper, we propose a novel learning-based multi-source integration framework to directly localize prostate cancer regions from in vivo MRI. We employ random forests to effectively integrate features from multi-source images for cancer localization. Here, the multi-source images initially include the multi-parametric MRIs (i.e., T2, DWI, and dADC) and later also the iteratively estimated and refined tissue probability map of prostate cancer. Experimental results on 26 real patient datasets show that our method can accurately localize cancerous sections. The higher section-based evaluation (SBE), combined with the ROC analysis results of individual patients, shows that the proposed method is promising for in vivo MRI-based prostate cancer localization, which can be used for guiding prostate biopsy, targeting the tumor in focal therapy planning, triage and follow-up of patients under active surveillance, and decision making in treatment selection. The standard ROC analysis with an AUC of 0.832 and the ROI-based ROC analysis with an AUC of 0.883 both illustrate the effectiveness of the proposed method. PMID:27048995

  7. Random Forests to Predict Rectal Toxicity Following Prostate Cancer Radiation Therapy

    SciTech Connect

    Ospina, Juan D.; Zhu, Jian; Chira, Ciprian; Bossi, Alberto; Delobel, Jean B.; Beckendorf, Véronique; Dubray, Bernard; Lagrange, Jean-Léon; Correa, Juan C.; and others

    2014-08-01

    Purpose: To propose a random forest normal tissue complication probability (RF-NTCP) model to predict late rectal toxicity following prostate cancer radiation therapy, and to compare its performance to that of classic NTCP models. Methods and Materials: Clinical data and dose-volume histograms (DVH) were collected from 261 patients who received 3-dimensional conformal radiation therapy for prostate cancer with at least 5 years of follow-up. The series was split 1000 times into training and validation cohorts. A RF was trained to predict the risk of 5-year overall rectal toxicity and bleeding. Parameters of the Lyman-Kutcher-Burman (LKB) model were identified and a logistic regression model was fit. The performance of all the models was assessed by computing the area under the receiver operating characteristic curve (AUC). Results: The 5-year grade ≥2 overall rectal toxicity and grade ≥1 and grade ≥2 rectal bleeding rates were 16%, 25%, and 10%, respectively. Predictive capabilities were obtained using the RF-NTCP model for all 3 toxicity endpoints, in both the training and validation cohorts. Age and the use of anticoagulants were found to be predictors of rectal bleeding. The AUC for RF-NTCP ranged from 0.66 to 0.76, depending on the toxicity endpoint. The AUC values for the LKB-NTCP were statistically significantly inferior, ranging from 0.62 to 0.69. Conclusions: The RF-NTCP model may be a useful new tool in predicting late rectal toxicity, including variables other than the DVH, and thus appears to be a strong competitor to classic NTCP models.

  8. Random Forest classification based on star graph topological indices for antioxidant proteins.

    PubMed

    Fernández-Blanco, Enrique; Aguiar-Pulido, Vanessa; Munteanu, Cristian Robert; Dorado, Julian

    2013-01-21

    Aging and quality of life are important research topics nowadays in areas such as the life sciences, chemistry and pharmacology. People live longer and thus want to spend that extra time with a better quality of life. In this regard, there exists a tiny subset of molecules in nature, known as antioxidant proteins, that may influence the aging process. However, testing every single protein in order to identify its properties is quite expensive and inefficient. For this reason, this work proposes a model in which the primary structure of the protein is represented using complex network graphs, which can be used to reduce the number of proteins to be tested for antioxidant biological activity. The graph obtained as a representation helps describe the complex system by using topological indices. More specifically, in this work, Randić's Star Networks have been used, together with the associated indices calculated with the S2SNet tool. In order to simulate the existing proportion of antioxidant proteins in nature, a dataset containing 1999 proteins, of which 324 are antioxidant proteins, was created. Using these data as input, Star Graph Topological Indices were calculated with the S2SNet tool. These indices were then used as input to several classification techniques. Among the techniques utilised, the Random Forest showed the best performance, achieving a score of 94% correctly classified instances. Although the target class (antioxidant proteins) represents a tiny subset of the dataset, the proposed model is able to achieve 81.8% correctly classified instances for this class, with a precision of 81.3%.

  9. Using Random Forest to Improve the Downscaling of Global Livestock Census Data

    PubMed Central

    Nicolas, Gaëlle; Robinson, Timothy P.; Wint, G. R. William; Conchedda, Giulia; Cinardi, Giuseppina; Gilbert, Marius

    2016-01-01

    Large scale, high-resolution global data on farm animal distributions are essential for spatially explicit assessments of the epidemiological, environmental and socio-economic impacts of the livestock sector. This has been the major motivation behind the development of the Gridded Livestock of the World (GLW) database, which has been extensively used since its first publication in 2007. The database relies on a downscaling methodology whereby census counts of animals in sub-national administrative units are redistributed at the level of grid cells as a function of a series of spatial covariates. The recent upgrade of GLW1 to GLW2 involved automating the processing, improvement of input data, and downscaling at a spatial resolution of 1 km per cell (5 km per cell in the earlier version). The underlying statistical methodology, however, remained unchanged. In this paper, we evaluate new methods to downscale census data with a higher accuracy and increased processing efficiency. Two main factors were evaluated, based on sample census datasets of cattle in Africa and chickens in Asia. First, we implemented and evaluated Random Forest models (RF) instead of stratified regressions. Second, we investigated whether models that predicted the number of animals per rural person (per capita) could provide better downscaled estimates than the previous approach that predicted absolute densities (animals per km2). RF models consistently provided better predictions than the stratified regressions for both continents and species. The benefit of per capita over absolute density models varied according to the species and continent. In addition, different technical options were evaluated to reduce the processing time while maintaining their predictive power. Future GLW runs (GLW 3.0) will apply the new RF methodology with optimized modelling options. The potential benefit of per capita models will need to be further investigated with a better distinction between rural and agricultural populations.
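
    The downscaling step can be sketched as follows, assuming scikit-learn; the covariates and scales are invented, and the essential constraint is that the downscaled cell values sum back to each administrative unit's census total:

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor

        rng = np.random.default_rng(0)
        n_cells = 5000
        covariates = rng.normal(size=(n_cells, 6))    # climate, land cover, people...
        density = np.exp(covariates[:, 0]) * 10       # animals per km2 (synthetic)
        unit_id = rng.integers(0, 50, size=n_cells)   # admin unit of each cell

        # Train where densities are known (here: a random half of the cells).
        train = rng.random(n_cells) < 0.5
        rf = RandomForestRegressor(n_estimators=200, random_state=0)
        rf.fit(covariates[train], density[train])
        pred = rf.predict(covariates)

        # Rescale so each unit's cells sum to its census count (volume-preserving).
        census_total = np.bincount(unit_id, weights=density)
        pred_total = np.bincount(unit_id, weights=pred)
        downscaled = pred * census_total[unit_id] / pred_total[unit_id]
        print("unit totals preserved:",
              np.allclose(np.bincount(unit_id, weights=downscaled), census_total))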

  10. Dust storm detection using random forests and physical-based approaches over the Middle East

    NASA Astrophysics Data System (ADS)

    Souri, Amir Hossein; Vajedian, Sanaz

    2015-07-01

    Dust storms are important phenomena over large regions of the arid and semi-arid areas of the Middle East. Due to the influence of dust aerosols on climate and human daily activities, dust detection plays a crucial role in environmental and climatic studies. Detection of dust storms is critical to accurately understand dust, its properties and its distribution. Currently, remotely sensed data such as MODIS (Moderate Resolution Imaging Spectroradiometer), with appropriate temporal and spectral resolutions, have been widely used for this purpose. This paper investigates the capability of two physical-based methods and, for the first time, a random forests (RF) classifier to detect dust storms using MODIS imagery. Since the physical-based approaches are empirical, they suffer from certain drawbacks, such as the high variability of thresholds depending on the underlying surface; classification-based approaches can therefore be deployed as an alternative. In this paper, the most relevant bands are chosen based on the physical effects of the major classes, particularly dust, cloud and snow, on both the emissive infrared and reflective bands. In order to verify the capability of the methods, the OMAERUV AAOD (aerosol absorption optical depth) product from the OMI (Ozone Monitoring Instrument) sensor is exploited. In addition, some small regions are selected manually to be considered as ground truth for measuring the probability of false detection (POFD) and probability of missed detection (POMD). The dust class generated by RF is qualitatively consistent with the location and extent of dust observed in OMAERUV and MODIS true-colour images. Quantitatively, the dust classes generated for eight dust outbreaks in the Middle East are found to be accurate, with a POFD of 7% and a POMD of 6%. Moreover, the results demonstrate the sound capability of RF in classifying dust plumes over both water and land simultaneously. The performance of the physical-based approaches is found to be weaker than that of RF.

  11. Bush encroachment monitoring using multi-temporal Landsat data and random forests

    NASA Astrophysics Data System (ADS)

    Symeonakis, E.; Higginbottom, T.

    2014-11-01

    It is widely accepted that land degradation and desertification (LDD) are serious global threats to humans and the environment. Around a third of the savannahs in Africa are affected by LDD processes that may lead to substantial declines in ecosystem functioning and services. Indirectly, LDD can be monitored using relevant indicators. The encroachment of woody plants into grasslands, and the subsequent conversion of savannahs and open woodlands into shrublands, has attracted a lot of attention over the last decades and has been identified as a potential indicator of LDD. Mapping bush encroachment over large areas can only effectively be done using Earth Observation (EO) data and techniques. However, the accurate assessment of large-scale savannah degradation through bush encroachment with satellite imagery remains a formidable task, because vegetation variability in response to highly variable rainfall patterns might obscure the underlying degradation processes in the satellite data. Here, we present a methodological framework for the monitoring of bush encroachment-related land degradation in a savannah environment in the Northwest Province of South Africa. We utilise multi-temporal Landsat TM and ETM+ (SLC-on) data from 1989 until 2009, mostly from the dry season, and ancillary data in a GIS environment. We then use the machine learning classification approach of random forests to identify the extent of encroachment over the 20-year period. The results show that in the study area, bush encroachment is as alarming as permanent vegetation loss. The classification of the year 2009 is validated, yielding low commission and omission errors and high k-statistic values for the grass and woody vegetation classes. Our approach is a step towards a rigorous and effective savannah degradation assessment.

  12. iDNA-Prot: identification of DNA binding proteins using random forest with grey model.

    PubMed

    Lin, Wei-Zhong; Fang, Jian-An; Xiao, Xuan; Chou, Kuo-Chen

    2011-01-01

    DNA-binding proteins play crucial roles in various cellular processes. Developing high throughput tools for rapidly and effectively identifying DNA-binding proteins is one of the major challenges in the field of genome annotation. Although many efforts have been made in this regard, further effort is needed to enhance the prediction power. By incorporating features extracted from protein sequences via the "grey model" into the general form of pseudo amino acid composition, and by adopting the random forest operation engine, we proposed a new predictor, called iDNA-Prot, for identifying uncharacterized proteins as DNA-binding or non-DNA-binding proteins based on their amino acid sequence information alone. The overall success rate of iDNA-Prot was 83.96%, obtained via jackknife tests on a newly constructed stringent benchmark dataset in which none of the included proteins has ≥25% pairwise sequence identity to any other in the same subset. In addition to achieving a high success rate, the computational time for iDNA-Prot is remarkably shorter in comparison with the relevant existing predictors. Hence it is anticipated that iDNA-Prot may become a useful high throughput tool for large-scale analysis of DNA-binding proteins. As a user-friendly web-server, iDNA-Prot is freely accessible to the public at http://icpr.jci.edu.cn/bioinfo/iDNA-Prot or http://www.jci-bioinfo.cn/iDNA-Prot. Moreover, for the convenience of the vast majority of experimental scientists, a step-by-step guide is provided on how to use the web-server to get the desired results. PMID:21935457

  13. Insight into Best Variables for COPD Case Identification: A Random Forests Analysis

    PubMed Central

    Leidy, Nancy K.; Malley, Karen G.; Steenrod, Anna W.; Mannino, David M.; Make, Barry J.; Bowler, Russ P.; Thomashow, Byron M.; Barr, R. G.; Rennard, Stephen I.; Houfek, Julia F.; Yawn, Barbara P.; Han, Meilan K.; Meldrum, Catherine A.; Bacci, Elizabeth D.; Walsh, John W.; Martinez, Fernando

    2016-01-01

    Rationale This study is part of a larger, multi-method project to develop a questionnaire for identifying undiagnosed cases of chronic obstructive pulmonary disease (COPD) in primary care settings, with specific interest in the detection of patients with moderate to severe airway obstruction or risk of exacerbation. Objectives To examine 3 existing datasets for insight into key features of COPD that could be useful in the identification of undiagnosed COPD. Methods Random forests analyses were applied to the following databases: COPD Foundation Peak Flow Study Cohort (N=5761), Burden of Obstructive Lung Disease (BOLD) Kentucky site (N=508), and COPDGene® (N=10,214). Four scenarios were examined to find the best, smallest sets of variables that distinguished cases and controls: (1) moderate to severe COPD (forced expiratory volume in 1 second [FEV1] <50% predicted) versus no COPD; (2) undiagnosed versus diagnosed COPD; (3) COPD with and without exacerbation history; and (4) clinically significant COPD (FEV1 <60% predicted or history of acute exacerbation) versus all others. Results From 4 to 8 variables were able to differentiate cases from controls, with sensitivity ≥73% (range: 73–90%) and specificity >68% (range: 68–93%). Across scenarios, the best models included age, smoking status or history, symptoms (cough, wheeze, phlegm), general or breathing-related activity limitation, episodes of acute bronchitis, and/or missed work days and non-work activities due to breathing or health. Conclusions The results provide insight into variables that should be considered during the development of candidate items for a new questionnaire to identify undiagnosed cases of clinically significant COPD. PMID:26835508
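
    The variable-ranking step can be sketched with random forest feature importances, assuming scikit-learn; the variables and simulated responses below are placeholders for the study datasets:

        import numpy as np
        import pandas as pd
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(0)
        n = 1000
        df = pd.DataFrame({
            "age": rng.integers(40, 80, n),
            "pack_years": rng.exponential(15, n),
            "cough": rng.integers(0, 2, n),
            "wheeze": rng.integers(0, 2, n),
            "activity_limit": rng.integers(0, 5, n),
        })
        # Simulated case status loosely driven by smoking and symptoms.
        logit = 0.04 * df.pack_years + 0.8 * df.wheeze + 0.3 * df.activity_limit - 2
        y = rng.random(n) < 1 / (1 + np.exp(-logit))

        rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(df, y)
        ranking = sorted(zip(rf.feature_importances_, df.columns), reverse=True)
        print([name for _, name in ranking[:4]])   # a small discriminating set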

  15. SU-D-201-06: Random Walk Algorithm Seed Localization Parameters in Lung Positron Emission Tomography (PET) Images

    SciTech Connect

    Soufi, M; Asl, A Kamali; Geramifar, P

    2015-06-15

    Purpose: The objective of this study was to find the best seed localization parameters in the random walk algorithm as applied to lung tumor delineation in Positron Emission Tomography (PET) images. Methods: PET images suffer from statistical noise, and therefore tumor delineation in these images is a challenging task. The random walk algorithm, a graph-based image segmentation technique, has reliable image noise robustness. Its fast computation and fast editing characteristics also make it powerful for clinical purposes. We implemented the random walk algorithm using MATLAB code. The validation and verification of the algorithm were done with a 4D-NCAT phantom with spherical lung lesions of different diameters from 20 to 90 mm (in incremental steps of 10 mm) and different tumor-to-background ratios of 4:1 and 8:1. STIR (Software for Tomographic Image Reconstruction) was applied to reconstruct the phantom PET images with different pixel sizes of 2×2×2 and 4×4×4 mm³. For seed localization, we selected pixels with different maximum Standardized Uptake Value (SUVmax) percentages: at least (70%, 80%, 90% and 100%) SUVmax for foreground seeds and up to (20% to 55%, in 5% increments) SUVmax for background seeds. For investigation of algorithm performance on clinical data, 19 patients with lung tumors were also studied. The resulting contours from the algorithm were compared with manual contouring by a nuclear medicine expert as ground truth. Results: Phantom and clinical lesion segmentation showed that the best segmentation results were obtained by selecting pixels with at least 70% SUVmax as foreground seeds and pixels up to 30% SUVmax as background seeds. A mean Dice Similarity Coefficient of 94% ± 5% (83% ± 6%) and a mean Hausdorff Distance of 1 (2) pixels were obtained for the phantom (clinical) study. Conclusion: The accurate results of the random walk algorithm in PET image segmentation assure its application for radiation treatment planning and
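
    The reported seed rule is easy to express in code. A sketch on a synthetic volume, using scikit-image's random walker as a stand-in for the authors' MATLAB implementation; thresholds follow the reported optimum (foreground ≥70% SUVmax, background ≤30% SUVmax):

    # Hedged sketch of the seed-selection rule on a synthetic "PET" volume,
    # segmented with scikit-image's random walker.
    import numpy as np
    from skimage.segmentation import random_walker

    rng = np.random.default_rng(1)
    vol = rng.normal(0.5, 0.1, size=(32, 32, 32))      # noisy background
    vol[12:20, 12:20, 12:20] += 2.0                    # hot "lesion"

    suv_max = vol.max()
    labels = np.zeros_like(vol, dtype=np.int32)        # 0 = unlabeled
    labels[vol >= 0.7 * suv_max] = 1                   # foreground seeds
    labels[vol <= 0.3 * suv_max] = 2                   # background seeds

    mask = random_walker(vol, labels, beta=130) == 1   # tumor mask
    print("segmented voxels:", mask.sum())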

  16. Image encryption algorithm based on the random local phase encoding in gyrator transform domains

    NASA Astrophysics Data System (ADS)

    Liu, Zhengjun; Yang, Meng; Liu, Wei; Li, She; Gong, Min; Liu, Wanyu; Liu, Shutian

    2012-09-01

    A random local phase encoding method is presented for encrypting a secret image. Some random polygons are introduced to control the local regions of random phase encoding. The data located in the random polygons are encoded by random phase encoding. The random phase data is the main key in this encryption method. Different random phases, calculated by using a monotonic function, are employed. The random data defining the random polygons serve as an additional key for enhancing the security of the image encryption scheme. Numerical simulations are given to demonstrate the performance of the proposed encryption approach.

  17. A fast random walk algorithm for computing the pulsed-gradient spin-echo signal in multiscale porous media.

    PubMed

    Grebenkov, Denis S

    2011-02-01

    A new method for computing the signal attenuation due to restricted diffusion in a linear magnetic field gradient is proposed. A fast random walk (FRW) algorithm for simulating random trajectories of diffusing spin-bearing particles is combined with gradient encoding. As random moves of a FRW are continuously adapted to local geometrical length scales, the method is efficient for simulating pulsed-gradient spin-echo experiments in hierarchical or multiscale porous media such as concrete, sandstones, sedimentary rocks and, potentially, brain or lungs. PMID:21159532
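
    The core FRW idea, jumps continuously adapted to the local distance from the boundary, fits in a few lines. The following toy single-particle sketch in a unit sphere uses a crude dt ≈ r²/(6D) bookkeeping; the jump-time statistics of a rigorous FRW are more involved:

    # Minimal illustration of the fast-random-walk idea: each jump is
    # scaled to the local distance to the domain boundary, so steps are
    # large in open regions and shrink near walls.
    import numpy as np

    rng = np.random.default_rng(2)
    D = 1.0          # diffusion coefficient
    eps = 1e-3       # absorb when this close to the boundary

    pos = np.zeros(3)
    t = 0.0
    while True:
        dist = 1.0 - np.linalg.norm(pos)     # distance to the sphere wall
        if dist < eps:
            break                            # particle reached the boundary
        r = 0.5 * dist                       # adapt the jump size to geometry
        step = rng.normal(size=3)
        pos += r * step / np.linalg.norm(step)
        t += r * r / (6.0 * D)               # crude elapsed-time estimate
    print("first-passage time estimate:", t)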

  18. A fast random walk algorithm for computing the pulsed-gradient spin-echo signal in multiscale porous media.

    PubMed

    Grebenkov, Denis S

    2011-02-01

    A new method for computing the signal attenuation due to restricted diffusion in a linear magnetic field gradient is proposed. A fast random walk (FRW) algorithm for simulating random trajectories of diffusing spin-bearing particles is combined with gradient encoding. As random moves of a FRW are continuously adapted to local geometrical length scales, the method is efficient for simulating pulsed-gradient spin-echo experiments in hierarchical or multiscale porous media such as concrete, sandstones, sedimentary rocks and, potentially, brain or lungs.

  19. Identification of human protein complexes from local sub-graphs of protein-protein interaction network based on random forest with topological structure features.

    PubMed

    Li, Zhan-Chao; Lai, Yan-Hua; Chen, Li-Li; Zhou, Xuan; Dai, Zong; Zou, Xiao-Yong

    2012-03-01

    In the post-genomic era, one of the most important and challenging tasks is to identify protein complexes and further elucidate their molecular mechanisms in specific biological processes. Previous computational approaches usually identify protein complexes from the protein interaction network based on dense sub-graphs and incomplete a priori information. Additionally, these approaches pay little attention to the biological properties of proteins, and there is no common metric for evaluating performance. It is therefore necessary to construct a novel method for identifying protein complexes and elucidating their function. In this study, a novel approach is proposed to identify protein complexes using random forest and topological structure. Each protein complex is represented by a graph of interactions, where a descriptor of the protein primary structure is used to characterize the biological properties of each protein and each vertex is weighted by the descriptor. Topological structure features are developed and used to characterize protein complexes. The random forest algorithm is utilized to build the prediction model and identify protein complexes from local sub-graphs instead of dense sub-graphs. As a demonstration, the proposed approach is applied to protein interaction data in human, and satisfactory results are obtained, with accuracy of 80.24%, sensitivity of 81.94%, specificity of 80.07%, and a Matthews correlation coefficient of 0.4087 in a 10-fold cross-validation test. Some new protein complexes are identified, and analysis based on Gene Ontology shows that the complexes are likely to be true complexes and play important roles in the pathogenesis of some diseases. PCI-RFTS, a corresponding executable program for protein complex identification, can be acquired freely on request from the authors.
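
    A hedged sketch of the general recipe, representing each candidate sub-graph by simple topological descriptors and classifying with a random forest; the feature set here is illustrative, not the paper's exact descriptors:

    # Sketch: topological descriptors of candidate complexes (sub-graphs)
    # fed to a random forest; dense random graphs act as toy "complexes".
    import networkx as nx
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    def topo_features(g: nx.Graph) -> list:
        degs = [d for _, d in g.degree()]
        return [g.number_of_nodes(), g.number_of_edges(),
                nx.density(g), nx.average_clustering(g),
                float(np.mean(degs)), float(np.max(degs))]

    # Toy training set: dense sub-graphs as "complexes", sparse ones as decoys.
    graphs, labels = [], []
    for seed in range(40):
        graphs.append(nx.gnp_random_graph(10, 0.7, seed=seed)); labels.append(1)
        graphs.append(nx.gnp_random_graph(10, 0.15, seed=seed)); labels.append(0)

    X = np.array([topo_features(g) for g in graphs])
    clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, labels)
    candidate = nx.gnp_random_graph(10, 0.6, seed=99)
    print("complex probability:",
          clf.predict_proba([topo_features(candidate)])[0, 1])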

  20. Variances in the projections, resulting from CLIMEX, Boosted Regression Trees and Random Forests techniques

    NASA Astrophysics Data System (ADS)

    Shabani, Farzin; Kumar, Lalit; Solhjouy-fard, Samaneh

    2016-05-01

    The aim of this study was to comparatively investigate and evaluate the capabilities of correlative and mechanistic modeling processes applied to projecting future distributions of date palm in novel environments, and to establish a method of minimizing uncertainty across the projections of differing techniques. The study area comprises Middle Eastern countries, considered at a global scale. We compared the mechanistic model CLIMEX (CL) with the correlative models MaxEnt (MX), Boosted Regression Trees (BRT), and Random Forests (RF) to project current and future distributions of date palm (Phoenix dactylifera L.). The Global Climate Model (GCM) CSIRO-Mk3.0 (CS), using the A2 emissions scenario, was selected for making projections. Both indigenous and alien distribution data of the species were utilized in the modeling process. The common areas predicted by MX, BRT, RF, and CL from the CS GCM were extracted and compared to ascertain the projection uncertainty levels of each individual technique. The common areas identified by all four modeling techniques were used to produce a map indicating suitable and unsuitable areas for date palm cultivation in Middle Eastern countries, for the present and the year 2100. The four modeling approaches predict fairly different distributions. Projections from CL were more conservative than those from MX. BRT and RF were the most conservative methods in terms of projections for the current time. The combination of the final CL and MX projections for the present and 2100 provides higher certainty concerning those areas that will become highly suitable for future date palm cultivation. According to the four models, cold, hot, and wet stress, with differences on a regional basis, appear to be the major restrictions on future date palm distribution. The results demonstrate variances in the projections resulting from the different techniques. The assessment and interpretation of model projections requires reservations

  1. Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy)

    NASA Astrophysics Data System (ADS)

    Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele

    2015-11-01

    The aim of this work is to define reliable susceptibility models for shallow landslides using Logistic Regression and Random Forests multivariate statistical techniques. The study area, located in North-East Sicily, was hit on October 1st 2009 by a severe rainstorm (225 mm of cumulative rainfall in 7 h) which caused flash floods and more than 1000 landslides. Several small villages, such as Giampilieri, were hit, with 31 fatalities, 6 missing persons and damage to buildings and transportation infrastructure. Landslides, mainly earth and debris translational slides evolving into debris flows, were triggered on steep slopes and involved colluvium and regolith materials covering the underlying metamorphic bedrock. The work was carried out with the following steps: i) realization of a detailed event landslide inventory map through field surveys coupled with observation of high-resolution aerial colour orthophotos; ii) identification of landslide source areas; iii) data preparation of landslide controlling factors and descriptive statistics based on a bivariate method (Frequency Ratio) to get an initial overview of existing relationships between causative factors and shallow landslide source areas; iv) choice of criteria for the selection and sizing of the mapping unit; v) implementation of 5 multivariate statistical susceptibility models based on Logistic Regression and Random Forests techniques and focused on landslide source areas; vi) evaluation of the influence of sample size and type of sampling on results and performance of the models; vii) evaluation of the predictive capabilities of the models using ROC curve, AUC and contingency tables; viii) comparison of model results and obtained susceptibility maps; and ix) analysis of temporal variation of landslide susceptibility related to input parameter changes. Models based on Logistic Regression and Random Forests have demonstrated excellent predictive capabilities. Land use and wildfire
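
    Step vii), comparing the predictive capability of the two model families with ROC/AUC, might look as follows on synthetic predictor data (the predictors stand in for slope, curvature, land use, etc.):

    # Simplified sketch: logistic regression vs. random forest susceptibility
    # models evaluated by ROC/AUC on a held-out sample.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(3)
    n = 2000
    X = rng.normal(size=(n, 5))             # stand-in controlling factors
    y = ((1.5 * X[:, 0] + X[:, 1] ** 2 + rng.normal(size=n)) > 1).astype(int)

    Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.3, random_state=0)
    for model in (LogisticRegression(max_iter=1000),
                  RandomForestClassifier(n_estimators=300, random_state=0)):
        auc = roc_auc_score(yte, model.fit(Xtr, ytr).predict_proba(Xte)[:, 1])
        print(type(model).__name__, "AUC:", round(auc, 3))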

  2. Fractional Fourier domain optical image hiding using phase retrieval algorithm based on iterative nonlinear double random phase encoding.

    PubMed

    Wang, Xiaogang; Chen, Wen; Chen, Xudong

    2014-09-22

    We present a novel image hiding method based on phase retrieval algorithm under the framework of nonlinear double random phase encoding in fractional Fourier domain. Two phase-only masks (POMs) are efficiently determined by using the phase retrieval algorithm, in which two cascaded phase-truncated fractional Fourier transforms (FrFTs) are involved. No undesired information disclosure, post-processing of the POMs or digital inverse computation appears in our proposed method. In order to achieve the reduction in key transmission, a modified image hiding method based on the modified phase retrieval algorithm and logistic map is further proposed in this paper, in which the fractional orders and the parameters with respect to the logistic map are regarded as encryption keys. Numerical results have demonstrated the feasibility and effectiveness of the proposed algorithms.

  3. An automatic water body area monitoring algorithm for satellite images based on Markov Random Fields

    NASA Astrophysics Data System (ADS)

    Elmi, Omid; Tourian, Mohammad J.; Sneeuw, Nico

    2016-04-01

    Our knowledge about the spatial and temporal variation of hydrological parameters is surprisingly poor, because most of it is based on in situ stations, whose number has declined dramatically during the past decades. On the other hand, remote sensing techniques have proven their ability to measure different parameters of Earth phenomena. Optical and SAR satellite imagery provide the opportunity to monitor spatial change in the coastline, which can serve as a way to determine the water extent repeatedly at an appropriate time interval. An appropriate classification technique to separate water and land is the backbone of any automatic water body monitoring scheme. Due to changes in the water level, river and lake extent, atmosphere, sunlight radiation and onboard calibration of the satellite over time, most pixel-based classification techniques fail to determine accurate water masks. Beyond pixel intensity, spatial correlation between neighboring pixels is another source of information that should be used to decide pixel labels. Water bodies have strong spatial correlation in satellite images. Therefore, including contextual information as an additional constraint in the procedure of water body monitoring improves the accuracy of the derived water masks significantly. In this study, we present an automatic algorithm for water body area monitoring based on maximum a posteriori (MAP) estimation of Markov Random Fields (MRF). First we collect all available images from selected case studies during the monitoring period. Then, for each image separately, we apply k-means clustering to derive a primary water mask. After that we develop an MRF using the pixel values and the primary water mask for each image. Then, among the different realizations of the field, we select the one that maximizes the posterior estimate. We solve this optimization problem using graph cut techniques. A graph with two terminals is constructed, after which the best labelling structure for
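
    The k-means initialization followed by MAP-MRF refinement can be illustrated compactly. The sketch below substitutes iterated conditional modes (ICM) for the paper's graph-cut optimizer; the energy balances pixel intensity against agreement with neighboring labels:

    # Hedged stand-in for the MAP-MRF step: k-means gives a primary water
    # mask, then ICM sweeps refine it using a spatial smoothness prior.
    import numpy as np
    from sklearn.cluster import KMeans

    rng = np.random.default_rng(4)
    img = np.where(np.arange(64)[None, :] > 32, 0.8, 0.2) \
        + rng.normal(0, 0.25, (64, 64))

    # Primary water mask from k-means on pixel intensity.
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(img.reshape(-1, 1))
    labels = km.labels_.reshape(img.shape)
    means = km.cluster_centers_.ravel()

    beta = 2.0  # weight of the spatial-smoothness prior
    for _ in range(5):  # ICM sweeps
        padded = np.pad(labels, 1, mode="edge")
        neigh1 = (padded[:-2, 1:-1] + padded[2:, 1:-1] +
                  padded[1:-1, :-2] + padded[1:-1, 2:])   # neighbors labeled 1
        cost = []
        for k in (0, 1):
            data_term = (img - means[k]) ** 2
            agree = neigh1 if k == 1 else 4 - neigh1
            cost.append(data_term - beta * agree)
        labels = np.argmin(np.stack(cost), axis=0)
    print("water pixels:", int((labels == np.argmax(means)).sum()))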

  4. Normalized algorithm for mapping and dating forest disturbances and regrowth for the United States

    NASA Astrophysics Data System (ADS)

    He, Liming; Chen, Jing M.; Zhang, Shaoliang; Gomez, Gustavo; Pan, Yude; McCullough, Kevin; Birdsey, Richard; Masek, Jeffrey G.

    2011-04-01

    Forest disturbances such as harvesting, wildfire and insect infestation are critical ecosystem processes affecting the carbon cycle. Because carbon dynamics are related to time since disturbance, forest stand age, which can be used as a surrogate for major clear-cut/fire disturbance information, has recently been recognized as an important input to forest carbon cycle models for improving prediction accuracy. In this study, forest disturbances in the USA for the period ~1990-2000 were mapped using 400+ pairs of re-sampled Landsat TM/ETM scenes at 500 m resolution, which were provided by the Landsat Ecosystem Disturbance Adaptive Processing System project. The detected disturbances were then separated into two five-year age groups, facilitated by Forest Inventory and Analysis (FIA) data, which were used to calculate the area of forest regeneration for each county in the USA. In this study, a disturbance index (DI) was defined as the ratio of the short-wave infrared (SWIR, band 5) to near-infrared (NIR, band 4) reflectance. Forest disturbances were identified through the Normalized Difference of Disturbance Index (NDDI) between circa 2000 and 1990, where a positive NDDI means disturbance and a negative NDDI means regrowth. Axis rotation was performed on the plot between the DIs of the two matched Landsat scenes in order to reduce any difference in DIs caused by non-disturbance factors. The threshold of NDDI for each TM/ETM pair was determined by analysis of FIA data. Minor disturbances affecting small areas may be omitted due to the coarse resolution of the aggregated Landsat data, but the major stand-clearing disturbances (clear-cut harvest, fire) are captured. The spatial distribution of the detected disturbed areas was validated with Monitoring Trends in Burn Severity fire data in four states of the western USA (Washington, Oregon, Idaho, and California). Results indicate omission errors of 66.9%. An important application of this remote sensing-based disturbance map is
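
    The index definitions translate directly into code. A short sketch, assuming the usual normalized-difference form for the NDDI:

    # Sketch of the disturbance index (DI = SWIR/NIR, Landsat bands 5 and 4)
    # and its normalized difference between the two epochs; the exact NDDI
    # normalization is assumed to take the usual normalized-difference form.
    import numpy as np

    def disturbance_index(b5_swir, b4_nir):
        return b5_swir / np.clip(b4_nir, 1e-6, None)

    di_1990 = disturbance_index(np.array([0.12, 0.30]), np.array([0.40, 0.25]))
    di_2000 = disturbance_index(np.array([0.25, 0.18]), np.array([0.30, 0.35]))

    nddi = (di_2000 - di_1990) / (di_2000 + di_1990)
    print(nddi)  # positive -> disturbance, negative -> regrowth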

  5. Classification of Potential Water Bodies Using Landsat 8 OLI and a Combination of Two Boosted Random Forest Classifiers.

    PubMed

    Ko, Byoung Chul; Kim, Hyeong Hun; Nam, Jae Yeal

    2015-06-11

    This study proposes a new water body classification method using top-of-atmosphere (TOA) reflectance and water indices (WIs) of the Landsat 8 Operational Land Imager (OLI) sensor and corresponding random forest classifiers. In this study, multispectral images from the OLI sensor are represented as TOA reflectance and WI values because a classification result using these two measures is better than one using raw spectral images. Two types of boosted random forest (BRF) classifiers are learned using TOA reflectance and WI values, respectively, instead of heuristic threshold or unsupervised methods. The final probability is a linear sum of the probabilities of the two different BRFs, used to classify image pixels into the water class. This study first demonstrates that the Landsat 8 OLI sensor achieves a higher classification rate because its 12-bit quantization provides an improved radiometric signal-to-noise ratio compared with the 8-bit data available from other sensors. In addition, we show that the proposed combination of two BRF classifiers yields robust water body classification results, regardless of topology, river properties, and background environment.
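
    The fusion rule, a linear sum of the two per-pixel water probabilities, can be sketched as follows on synthetic features; gradient boosting stands in for the paper's boosted random forest:

    # Hedged sketch: one classifier on TOA-reflectance features, one on
    # water-index features; their probabilities are combined linearly.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(5)
    n = 1000
    toa = rng.normal(size=(n, 6))             # TOA reflectance features
    wi = rng.normal(size=(n, 2))              # water-index features
    y = ((toa[:, 0] + wi[:, 0] + rng.normal(size=n)) > 0).astype(int)

    idx_tr, idx_te = train_test_split(np.arange(n), test_size=0.3,
                                      random_state=0)
    clf_toa = GradientBoostingClassifier().fit(toa[idx_tr], y[idx_tr])
    clf_wi = GradientBoostingClassifier().fit(wi[idx_tr], y[idx_tr])

    # Linear sum of the two probability maps, then threshold to water class.
    p = 0.5 * clf_toa.predict_proba(toa[idx_te])[:, 1] \
      + 0.5 * clf_wi.predict_proba(wi[idx_te])[:, 1]
    water = p > 0.5
    print("accuracy:", (water == y[idx_te].astype(bool)).mean())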

  6. Identification of and handling of critical irradiance forecast uncertainties using a Random Forest scheme - a case study for southern Brazil

    NASA Astrophysics Data System (ADS)

    Beyer, Hans Georg; Preed Revheim, Pal; Kratenber, Manfred Georg; Zuern, Hans Helmut

    2015-04-01

    For the secure operation of the utility grid, especially in grids with a high penetration of volatile wind or solar generation, forecasts of the respective power flows are essential. Full benefit of the forecasts cannot, however, be taken without knowledge of their uncertainties. Based on irradiance data from southern Brazil, we present a scheme for the identification of situations in which elevated forecast errors are to be expected. For this, the classification technique Random Forests is applied, using the history of the site-specific irradiance forecast errors together with a set of auxiliary meteorological variables. As a byproduct, extracted systematic forecast errors are used to update the forecasts. The predictive performance of the Random Forest models is assessed by the predictions' ability to reduce the number of hours or days with forecast errors exceeding the limits, and by the resulting overall forecast RMSE. Little to no improvement is obtained when predicting next-hour forecast errors, while significant improvements are obtained when predicting next-day forecast errors. Setting a relatively low limit for forecast error exceedance is found to give the largest improvements in terms of reduction of forecast RMSE.

  7. Classification of Potential Water Bodies Using Landsat 8 OLI and a Combination of Two Boosted Random Forest Classifiers

    PubMed Central

    Ko, Byoung Chul; Kim, Hyeong Hun; Nam, Jae Yeal

    2015-01-01

    This study proposes a new water body classification method using top-of-atmosphere (TOA) reflectance and water indices (WIs) of the Landsat 8 Operational Land Imager (OLI) sensor and corresponding random forest classifiers. In this study, multispectral images from the OLI sensor are represented as TOA reflectance and WI values because a classification result using these two measures is better than one using raw spectral images. Two types of boosted random forest (BRF) classifiers are learned using TOA reflectance and WI values, respectively, instead of heuristic threshold or unsupervised methods. The final probability is a linear sum of the probabilities of the two different BRFs, used to classify image pixels into the water class. This study first demonstrates that the Landsat 8 OLI sensor achieves a higher classification rate because its 12-bit quantization provides an improved radiometric signal-to-noise ratio compared with the 8-bit data available from other sensors. In addition, we show that the proposed combination of two BRF classifiers yields robust water body classification results, regardless of topology, river properties, and background environment. PMID:26110405

  8. A Novel Compressed Sensing Method for Magnetic Resonance Imaging: Exponential Wavelet Iterative Shrinkage-Thresholding Algorithm with Random Shift.

    PubMed

    Zhang, Yudong; Yang, Jiquan; Yang, Jianfei; Liu, Aijun; Sun, Ping

    2016-01-01

    Aim. Accelerating magnetic resonance imaging (MRI) scanning can help improve hospital throughput. Patients will benefit from less waiting time. Task. In the last decade, various rapid MRI techniques based on compressed sensing (CS) were proposed. However, neither the computation time nor the reconstruction quality of traditional CS-MRI met the requirements of clinical use. Method. In this study, a novel method was proposed, named the exponential wavelet iterative shrinkage-thresholding algorithm with random shift (abbreviated as EWISTARS). It is composed of three successful components: (i) exponential wavelet transform, (ii) iterative shrinkage-thresholding algorithm, and (iii) random shift. Results. Experimental results validated that, compared to state-of-the-art approaches, EWISTARS obtained the least mean absolute error, the least mean-squared error, and the highest peak signal-to-noise ratio. Conclusion. EWISTARS is superior to state-of-the-art approaches. PMID:27066068
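
    Component (ii), the iterative shrinkage-thresholding algorithm (ISTA), is the workhorse of the method. A minimal numpy sketch for a generic sparse recovery problem min ||y - Ax||^2/2 + lam*||x||_1, omitting the wavelet and random-shift components:

    # Minimal ISTA sketch: gradient step on the data term followed by
    # soft thresholding, for a toy compressed-sensing problem.
    import numpy as np

    rng = np.random.default_rng(6)
    m, n = 60, 120
    A = rng.normal(size=(m, n)) / np.sqrt(m)
    x_true = np.zeros(n)
    x_true[rng.choice(n, 8, replace=False)] = rng.normal(size=8)
    y = A @ x_true

    lam = 0.02
    step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant
    x = np.zeros(n)
    for _ in range(300):
        grad = A.T @ (A @ x - y)
        z = x - step * grad
        x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)  # shrinkage
    print("recovery error:", np.linalg.norm(x - x_true))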

  9. A Novel Compressed Sensing Method for Magnetic Resonance Imaging: Exponential Wavelet Iterative Shrinkage-Thresholding Algorithm with Random Shift

    PubMed Central

    Zhang, Yudong; Yang, Jiquan; Yang, Jianfei; Liu, Aijun; Sun, Ping

    2016-01-01

    Aim. Accelerating magnetic resonance imaging (MRI) scanning can help improve hospital throughput. Patients will benefit from less waiting time. Task. In the last decade, various rapid MRI techniques based on compressed sensing (CS) were proposed. However, neither the computation time nor the reconstruction quality of traditional CS-MRI met the requirements of clinical use. Method. In this study, a novel method was proposed, named the exponential wavelet iterative shrinkage-thresholding algorithm with random shift (abbreviated as EWISTARS). It is composed of three successful components: (i) exponential wavelet transform, (ii) iterative shrinkage-thresholding algorithm, and (iii) random shift. Results. Experimental results validated that, compared to state-of-the-art approaches, EWISTARS obtained the least mean absolute error, the least mean-squared error, and the highest peak signal-to-noise ratio. Conclusion. EWISTARS is superior to state-of-the-art approaches. PMID:27066068

  10. Model parameter adaption-based multi-model algorithm for extended object tracking using a random matrix.

    PubMed

    Li, Borui; Mu, Chundi; Han, Shuli; Bai, Tianming

    2014-01-01

    Traditional object tracking technology usually regards the target as a point source object. However, this approximation is no longer appropriate for tracking extended objects such as large targets and closely spaced group objects. Bayesian extended object tracking (EOT) using a random symmetric positive definite (SPD) matrix is a very effective method to jointly estimate the kinematic state and physical extension of the target. The key issue in the application of this random matrix-based EOT approach is to model the physical extension and measurement noise accurately. Model parameter adaptive approaches for both extension dynamics and measurement noise are proposed in this study, based on the properties of the SPD matrix, to improve the performance of extension estimation. An interacting multi-model algorithm based on the model-parameter-adaptive filter using a random matrix is also presented. Simulation results demonstrate the effectiveness of the proposed adaptive approaches and multi-model algorithm. The estimation performance for physical extension is better than that of the other algorithms, especially when the target maneuvers. The kinematic state estimation error is lower than the others' as well. PMID:24763252

  11. Model parameter adaption-based multi-model algorithm for extended object tracking using a random matrix.

    PubMed

    Li, Borui; Mu, Chundi; Han, Shuli; Bai, Tianming

    2014-04-24

    Traditional object tracking technology usually regards the target as a point source object. However, this approximation is no longer appropriate for tracking extended objects such as large targets and closely spaced group objects. Bayesian extended object tracking (EOT) using a random symmetric positive definite (SPD) matrix is a very effective method to jointly estimate the kinematic state and physical extension of the target. The key issue in the application of this random matrix-based EOT approach is to model the physical extension and measurement noise accurately. Model parameter adaptive approaches for both extension dynamics and measurement noise are proposed in this study, based on the properties of the SPD matrix, to improve the performance of extension estimation. An interacting multi-model algorithm based on the model-parameter-adaptive filter using a random matrix is also presented. Simulation results demonstrate the effectiveness of the proposed adaptive approaches and multi-model algorithm. The estimation performance for physical extension is better than that of the other algorithms, especially when the target maneuvers. The kinematic state estimation error is lower than the others' as well.

  12. Use of a porous material description of forests in infrasonic propagation algorithms.

    PubMed

    Swearingen, Michelle E; White, Michael J; Ketcham, Stephen A; McKenna, Mihan H

    2013-10-01

    Infrasound can propagate very long distances and remain at measurable levels. As a result, infrasound sensing is used for remote monitoring in many applications. At local ranges, on the order of 10 km, the influence of the presence or absence of forests on the propagation of infrasonic signals is considered. Because the wavelengths of interest are much larger than the scale of individual components, the forest is modeled as a porous material. This approximation is developed starting from the relaxation model of porous materials. The representation is then incorporated into a Crank-Nicolson parabolic equation solver to determine the relative impacts of the physical parameters of a forest (trunk size and basal area), the presence of gaps/trees in otherwise continuous forest/open terrain, and the effects of meteorology coupled with the porous layer. Finally, the simulations are compared to experimental data from a 10.9 kg blast propagated 14.5 km. Comparison to the experimental data shows that appropriate inclusion of a forest layer along the propagation path provides a closer fit to the data than solely changing the ground type, across the frequency range from 1 to 30 Hz. PMID:24116403

  13. Use of a porous material description of forests in infrasonic propagation algorithms.

    PubMed

    Swearingen, Michelle E; White, Michael J; Ketcham, Stephen A; McKenna, Mihan H

    2013-10-01

    Infrasound can propagate very long distances and remain at measurable levels. As a result, infrasound sensing is used for remote monitoring in many applications. At local ranges, on the order of 10 km, the influence of the presence or absence of forests on the propagation of infrasonic signals is considered. Because the wavelengths of interest are much larger than the scale of individual components, the forest is modeled as a porous material. This approximation is developed starting from the relaxation model of porous materials. The representation is then incorporated into a Crank-Nicolson parabolic equation solver to determine the relative impacts of the physical parameters of a forest (trunk size and basal area), the presence of gaps/trees in otherwise continuous forest/open terrain, and the effects of meteorology coupled with the porous layer. Finally, the simulations are compared to experimental data from a 10.9 kg blast propagated 14.5 km. Comparison to the experimental data shows that appropriate inclusion of a forest layer along the propagation path provides a closer fit to the data than solely changing the ground type, across the frequency range from 1 to 30 Hz.

  14. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    PubMed

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using the CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristic (ROC) curve. From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103, and for CART and RF the AUCs were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results in predicting locations of springs, followed by the CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy. PMID:26687087

  15. GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran.

    PubMed

    Naghibi, Seyed Amir; Pourghasemi, Hamid Reza; Dixon, Barnali

    2016-01-01

    Groundwater is considered one of the most valuable fresh water resources. The main objective of this study was to produce groundwater spring potential maps in the Koohrang Watershed, Chaharmahal-e-Bakhtiari Province, Iran, using three machine learning models: boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF). Thirteen hydrological-geological-physiographical (HGP) factors that influence locations of springs were considered in this research. These factors include slope degree, slope aspect, altitude, topographic wetness index (TWI), slope length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, lithology, land use, drainage density, and fault density. Subsequently, groundwater spring potential was modeled and mapped using the CART, RF, and BRT algorithms. The predicted results from the three models were validated using the receiver operating characteristic (ROC) curve. From 864 springs identified, 605 (≈70 %) locations were used for the spring potential mapping, while the remaining 259 (≈30 %) springs were used for the model validation. The area under the curve (AUC) for the BRT model was calculated as 0.8103, and for CART and RF the AUCs were 0.7870 and 0.7119, respectively. Therefore, it was concluded that the BRT model produced the best prediction results in predicting locations of springs, followed by the CART and RF models, respectively. Geospatially integrated BRT, CART, and RF methods proved to be useful in generating the spring potential map (SPM) with reasonable accuracy.

  16. Discrimination of raw and processed Dipsacus asperoides by near infrared spectroscopy combined with least squares-support vector machine and random forests

    NASA Astrophysics Data System (ADS)

    Xin, Ni; Gu, Xiao-Feng; Wu, Hao; Hu, Yu-Zhu; Yang, Zhong-Lin

    2012-04-01

    Most herbal medicines can be processed to fulfill different therapeutic requirements. The purpose of this study was to discriminate between raw and processed Dipsacus asperoides, a common traditional Chinese medicine, based on their near infrared (NIR) spectra. Least squares-support vector machine (LS-SVM) and random forests (RF) were employed for full-spectrum classification. Three types of kernels, including the linear kernel, polynomial kernel and radial basis function (RBF) kernel, were examined for optimization of the LS-SVM model. For comparison, a linear discriminant analysis (LDA) model was built for classification, with the successive projections algorithm (SPA) executed prior to building the LDA model to choose an appropriate subset of wavelengths. The three methods were applied to a dataset containing 40 raw herbs and 40 corresponding processed herbs. We ran 50 runs of 10-fold cross-validation to evaluate model efficiency. The performance of the LS-SVM with the RBF kernel (RBF LS-SVM) was better than that of the other two kernels. The RF, RBF LS-SVM and SPA-LDA all successfully classified all test samples. The mean error rates for the 50 runs of 10-fold cross-validation were 1.35% for RBF LS-SVM, 2.87% for RF, and 2.50% for SPA-LDA. The best classification results were obtained by using the LS-SVM with the RBF kernel, while RF was fast in training and prediction.

  17. An algorithm to detect chimeric clones and random noise in genomic mapping

    SciTech Connect

    Grigoriev, A.; Mott, R.; Lehrach, H.

    1994-07-15

    Experimental noise and contiguous clone inserts can pose serious problems in reconstructing genomic maps from hybridization data. The authors describe an algorithm that easily identifies false positive signals and clones containing chimeric inserts/internal deletions. The algorithm "dechimerizes" clones, splitting them into independent contiguous components and cleaning the initial library into a more consistent data set for further ordering. The effectiveness of the algorithm is demonstrated on both simulated data and the real YAC map of the whole genome of the fission yeast Schizosaccharomyces pombe. 8 refs., 3 figs., 1 tab.

  18. Computed tomography synthesis from magnetic resonance images in the pelvis using multiple random forests and auto-context features

    NASA Astrophysics Data System (ADS)

    Andreasen, Daniel; Edmund, Jens M.; Zografos, Vasileios; Menze, Bjoern H.; Van Leemput, Koen

    2016-03-01

    In radiotherapy treatment planning that is only based on magnetic resonance imaging (MRI), the electron density information usually obtained from computed tomography (CT) must be derived from the MRI by synthesizing a so-called pseudo CT (pCT). This is a non-trivial task since MRI intensities are neither uniquely nor quantitatively related to electron density. Typical approaches involve either a classification or regression model requiring specialized MRI sequences to solve intensity ambiguities, or an atlas-based model necessitating multiple registrations between atlases and subject scans. In this work, we explore a machine learning approach for creating a pCT of the pelvic region from conventional MRI sequences without using atlases. We use a random forest provided with information about local texture, edges and spatial features derived from the MRI. This helps to solve intensity ambiguities. Furthermore, we use the concept of auto-context by sequentially training a number of classification forests to create and improve context features, which are finally used to train a regression forest for pCT prediction. We evaluate the pCT quality in terms of the voxel-wise error and the radiologic accuracy as measured by water-equivalent path lengths. We compare the performance of our method against two baseline pCT strategies, which either set all MRI voxels in the subject equal to the CT value of water, or in addition transfer the bone volume from the real CT. We show an improved performance compared to both baseline pCTs suggesting that our method may be useful for MRI-only radiotherapy.
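
    The auto-context mechanism can be sketched compactly: each classification forest's per-voxel probabilities are appended as extra features for the next stage, and a regression forest finally predicts pseudo-CT numbers. Feature content and stage count below are illustrative:

    # Hedged sketch of auto-context with random forests on toy features.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    rng = np.random.default_rng(7)
    n = 2000
    mri_feats = rng.normal(size=(n, 10))          # texture/edge/spatial features
    tissue = (mri_feats[:, 0] > 0).astype(int)    # toy tissue labels
    ct = 1000.0 * tissue + 40.0 * mri_feats[:, 1] + rng.normal(0, 20, n)

    X = mri_feats
    for stage in range(2):                        # sequential context stages
        clf = RandomForestClassifier(n_estimators=100,
                                     random_state=stage).fit(X, tissue)
        context = clf.predict_proba(X)            # probability "context" maps
        X = np.hstack([X, context])               # auto-context: append features

    reg = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, ct)
    print("pCT MAE:", np.abs(reg.predict(X) - ct).mean())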

  19. A randomized controlled trial of a diagnostic algorithm for symptoms of uncomplicated cystitis at an out-of-hours service

    PubMed Central

    Grude, Nils; Lindbaek, Morten

    2015-01-01

    Objective. To compare the clinical outcome of patients presenting with symptoms of uncomplicated cystitis who were seen by a doctor with that of patients who were given treatment following a diagnostic algorithm. Design. Randomized controlled trial. Setting. Out-of-hours service, Oslo, Norway. Intervention. Women with typical symptoms of uncomplicated cystitis were included in the trial in the period September 2010–November 2011. They were randomized into two groups: one group received standard treatment according to the diagnostic algorithm; the other group received treatment after a regular consultation with a doctor. Subjects. Women (n = 441) aged 16–55 years. Mean age in both groups 27 years. Main outcome measures. Number of days until symptomatic resolution. Results. No significant differences were found between the groups in basic patient demographics, severity of symptoms, or percentage of urine samples with single-culture growth. A median of three days until symptomatic resolution was found in both groups. By day four, 79% in the algorithm group and 72% in the regular consultation group were free of symptoms (p = 0.09). The number of patients who contacted a doctor again in the follow-up period and received alternative antibiotic treatment was slightly but not significantly higher (p = 0.08) after regular consultation than after treatment according to the diagnostic algorithm. There were no cases of severe pyelonephritis or hospital admissions during the follow-up period. Conclusion. Using a diagnostic algorithm is a safe and efficient method for treating women with symptoms of uncomplicated cystitis at an out-of-hours service. This simplification of the treatment strategy can lead to more rational use of consultation time and stricter adherence to the National Antibiotic Guidelines for a common disorder. PMID:25961367

  20. Deep neural network and random forest hybrid architecture for learning to detect retinal vessels in fundus images.

    PubMed

    Maji, Debapriya; Santara, Anirban; Ghosh, Sambuddha; Sheet, Debdoot; Mitra, Pabitra

    2015-08-01

    Vision impairment due to pathological damage of the retina can largely be prevented through periodic screening using fundus color imaging. However, the challenge with large-scale screening is the inability to exhaustively detect the fine blood vessels crucial to disease diagnosis. In this work we present a computational imaging framework using a hybrid architecture based on deep and ensemble learning for reliable detection of blood vessels in fundus color images. A deep neural network (DNN) is used for unsupervised learning of vesselness dictionaries using sparse trained denoising auto-encoders (DAE), followed by supervised learning of the DNN response using a random forest for detecting vessels in color fundus images. In experimental evaluation with the DRIVE database, we achieve the objective of vessel detection with a maximum average accuracy of 0.9327 and area under the ROC curve of 0.9195.

  1. Deep neural network and random forest hybrid architecture for learning to detect retinal vessels in fundus images.

    PubMed

    Maji, Debapriya; Santara, Anirban; Ghosh, Sambuddha; Sheet, Debdoot; Mitra, Pabitra

    2015-08-01

    Vision impairment due to pathological damage of the retina can largely be prevented through periodic screening using fundus color imaging. However, the challenge with large-scale screening is the inability to exhaustively detect the fine blood vessels crucial to disease diagnosis. In this work we present a computational imaging framework using a hybrid architecture based on deep and ensemble learning for reliable detection of blood vessels in fundus color images. A deep neural network (DNN) is used for unsupervised learning of vesselness dictionaries using sparse trained denoising auto-encoders (DAE), followed by supervised learning of the DNN response using a random forest for detecting vessels in color fundus images. In experimental evaluation with the DRIVE database, we achieve the objective of vessel detection with a maximum average accuracy of 0.9327 and area under the ROC curve of 0.9195. PMID:26736930

  2. Random sampler M-estimator algorithm with sequential probability ratio test for robust function approximation via feed-forward neural networks.

    PubMed

    El-Melegy, Moumen T

    2013-07-01

    This paper addresses the problem of fitting a functional model to data corrupted with outliers using a multilayered feed-forward neural network. Although it is of high importance in practical applications, this problem has not received careful attention from the neural network research community. One recent approach to solving this problem is to use a neural network training algorithm based on the random sample consensus (RANSAC) framework. This paper proposes a new algorithm that offers two enhancements over the original RANSAC algorithm. The first one improves the algorithm accuracy and robustness by employing an M-estimator cost function to decide on the best estimated model from the randomly selected samples. The other one improves the time performance of the algorithm by utilizing a statistical pretest based on Wald's sequential probability ratio test. The proposed algorithm is successfully evaluated on synthetic and real data, contaminated with varying degrees of outliers, and compared with existing neural network training algorithms.
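
    The first enhancement, scoring each randomly sampled model with an M-estimator cost instead of an inlier count, is shown below for simple line fitting with a Huber cost; the neural-network model and the sequential probability ratio pretest are omitted:

    # Hedged sketch: RANSAC-style sampling with a Huber M-estimator score.
    import numpy as np

    def huber(r, delta=1.0):
        a = np.abs(r)
        return np.where(a <= delta, 0.5 * r ** 2, delta * (a - 0.5 * delta))

    rng = np.random.default_rng(8)
    x = rng.uniform(-5, 5, 200)
    y = 2.0 * x + 1.0 + rng.normal(0, 0.3, 200)
    y[:40] += rng.uniform(-20, 20, 40)            # heavy outliers

    best_cost, best_model = np.inf, None
    for _ in range(500):
        i, j = rng.choice(len(x), 2, replace=False)
        if x[i] == x[j]:
            continue
        slope = (y[j] - y[i]) / (x[j] - x[i])      # model from minimal sample
        intercept = y[i] - slope * x[i]
        cost = huber(y - (slope * x + intercept)).sum()
        if cost < best_cost:                       # keep the best-scoring model
            best_cost, best_model = cost, (slope, intercept)
    print("estimated slope/intercept:", best_model)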

  3. Alignment control of carbon nanotube forest from random to nearly perfectly aligned by utilizing the crowding effect.

    PubMed

    Xu, Ming; Futaba, Don N; Yumura, Motoo; Hata, Kenji

    2012-07-24

    Alignment represents an important structural parameter of carbon nanotubes (CNTs) owing to their exceptionally high aspect ratio and one-dimensional character. In this paper, we demonstrate a general approach to control the alignment of few-walled CNT forests from nearly random to nearly ideally aligned by tailoring the density of active catalysts at the catalyst formation stage, which can be achieved experimentally by controlling the CNT forest mass density. Experimentally, we found that the catalyst density and the degree of alignment were inseparably linked because of a crowding effect from neighboring CNTs, that is, the increasing confinement of CNTs with increased density. Therefore, the CNT density governed the degree of alignment, which increased monotonically with the density. This relationship, in turn, allowed precise control of the alignment through control of the mass density. To understand this behavior further, we developed a simple, first-order model based on the flexural modulus of the CNTs that could quantitatively describe the relationship between the degree of alignment (HOF) and carbon nanotube spacing (crowding effect) for any type of CNT.

  4. Evaluation of Laser Based Alignment Algorithms Under Additive Random and Diffraction Noise

    SciTech Connect

    McClay, W A; Awwal, A; Wilhelmsen, K; Ferguson, W; McGee, M; Miller, M

    2004-09-30

    The purpose of the automatic alignment algorithm at the National Ignition Facility (NIF) is to determine the position of a laser beam based on the position of beam features from video images. The position information obtained is used to command motors and attenuators to adjust the beam lines to the desired position, which facilitates the alignment of all 192 beams. One of the goals of the algorithm development effort is to ascertain the performance, reliability, and uncertainty of the position measurement. This paper describes a method of evaluating the performance of algorithms using Monte Carlo simulation. In particular we show the application of this technique to the LM1_LM3 algorithm, which determines the position of a series of two beam light sources. The performance of the algorithm was evaluated for an ensemble of over 900 simulated images with varying image intensities and noise counts, as well as varying diffraction noise amplitude and frequency. The performance of the algorithm on the image data set had a tolerance well beneath the 0.5-pixel system requirement.

  5. A Comparison of Hourly Typhoon Rainfall Forecasting Models Based on Support Vector Machines and Random Forests with Different Predictor Sets

    NASA Astrophysics Data System (ADS)

    Lin, Kun-Hsiang; Tseng, Hung-Wei; Kuo, Chen-Min; Yang, Tao-Chang; Yu, Pao-Shan

    2016-04-01

    Typhoons with heavy rainfall and strong wind often cause severe floods and losses in Taiwan, which motivates the development of rainfall forecasting models as part of an early warning system. Thus, this study aims to develop rainfall forecasting models based on two machine learning methods, support vector machines (SVMs) and random forests (RFs), and to investigate the performance of the models with different predictor sets in order to find the optimal predictor set for forecasting. Four predictor sets were used: (1) antecedent rainfalls; (2) antecedent rainfalls and typhoon characteristics; (3) antecedent rainfalls and meteorological factors; and (4) antecedent rainfalls, typhoon characteristics and meteorological factors, to construct models for 1- to 6-hour-ahead rainfall forecasting. An application to three rainfall stations in the Yilan River basin, northeastern Taiwan, was conducted. Firstly, the performance of the SVMs-based forecasting model with predictor set #1 was analyzed. The results show that the accuracy of the models for 2- to 6-hour-ahead forecasting decreases rapidly compared to the accuracy of the model for 1-hour-ahead forecasting, which is acceptable. To improve model performance, each predictor set was further examined in the SVMs-based forecasting model. The results reveal that the SVMs-based model using predictor set #4 as input variables performs better than the other sets, and a significant improvement in model performance is found, especially for long lead-time forecasting. Lastly, the performance of the SVMs-based model using predictor set #4 as input variables was compared with that of the RFs-based model using predictor set #4 as input variables. It is found that the RFs-based model is superior to the SVMs-based model in hourly typhoon rainfall forecasting. Keywords: hourly typhoon rainfall forecasting, predictor selection, support vector machines, random forests

  6. Identifying the origin of groundwater samples in a multi-layer aquifer system with Random Forest classification

    NASA Astrophysics Data System (ADS)

    Baudron, Paul; Alonso-Sarría, Francisco; García-Aróstegui, José Luís; Cánovas-García, Fulgencio; Martínez-Vicente, David; Moreno-Brotóns, Jesús

    2013-08-01

    Accurate identification of the origin of groundwater samples is not always possible in complex multilayered aquifers. This poses a major difficulty for reliable interpretation of geochemical results. The problem is especially severe when information on the tubewells' design is hard to obtain. This paper presents a supervised classification method based on the Random Forest (RF) machine learning technique to identify the layer from which groundwater samples were extracted. The classification rules were based on the major ion composition of the samples. We applied this method to the Campo de Cartagena multi-layer aquifer system in southeastern Spain. A large amount of hydrogeochemical data was available, but only a limited fraction of the sampled tubewells included a reliable determination of the borehole design and, consequently, of the aquifer layer being exploited. An added difficulty was the very similar compositions of water samples extracted from different aquifer layers. Moreover, not all groundwater samples included the same geochemical variables. Despite such a difficult background, the Random Forest classification reached accuracies over 90%. These results were much better than those of the Linear Discriminant Analysis (LDA) and Decision Trees (CART) supervised classification methods. From a total of 1549 samples, 805 proceeded from one unique identified aquifer, 409 proceeded from a possible blend of waters from several aquifers and 335 were of unknown origin. Only 468 of the 805 unique-aquifer samples included all the chemical variables needed to calibrate and validate the models. Finally, 107 of the groundwater samples of unknown origin could be classified. Most unclassified samples did not feature a complete dataset. The uncertainty in the identification of training samples was taken into account to enhance the model. Most of the samples that could not be identified had an incomplete dataset.

  7. An Improved Patient-Specific Mortality Risk Prediction in ICU in a Random Forest Classification Framework.

    PubMed

    Ghose, Soumya; Mitra, Jhimli; Khanna, Sankalp; Dowling, Jason

    2015-01-01

    Dynamic and automatic patient-specific prediction of the risk associated with ICU mortality may facilitate timely and appropriate intervention by health professionals in hospitals. In this work, patient information and time series measurements of vital signs and laboratory results from the first 48 hours of the ICU stays of 4000 adult patients from a publicly available dataset are used to design and validate a mortality prediction system. An ensemble of decision trees is used to simultaneously predict and associate a risk score with each patient in a k-fold validation framework. Risk assessment prediction accuracy of 87% is achieved with our model, and the results show significant improvement over the SAPS-I baseline algorithm commonly used for mortality prediction in ICU. The performance of our model is further compared to other state-of-the-art algorithms evaluated on the same dataset. PMID:26210418

  8. On the efficiency of a randomized mirror descent algorithm in online optimization problems

    NASA Astrophysics Data System (ADS)

    Gasnikov, A. V.; Nesterov, Yu. E.; Spokoiny, V. G.

    2015-04-01

    A randomized online version of the mirror descent method is proposed. It differs from the existing versions by the randomization method. Randomization is performed at the stage of the projection of a subgradient of the function being optimized onto the unit simplex rather than at the stage of the computation of a subgradient, which is common practice. As a result, a componentwise subgradient descent with a randomly chosen component is obtained, which admits an online interpretation. This observation, for example, has made it possible to uniformly interpret results on weighting expert decisions and propose the most efficient method for searching for an equilibrium in a zero-sum two-person matrix game with sparse matrix.
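
    A hedged sketch of the scheme: entropic mirror descent on the unit simplex (multiplicative weights), with randomization over a single gradient component per step as the abstract describes. The linear loss and step size are illustrative:

    # Randomized componentwise mirror descent on the simplex for a linear
    # loss f(x) = c . x; scaling the picked component by n keeps the
    # gradient estimate unbiased.
    import numpy as np

    rng = np.random.default_rng(9)
    n, T = 10, 5000
    c = rng.uniform(0, 1, n)                 # linear loss coefficients
    x = np.ones(n) / n                       # start at the simplex center
    eta = np.sqrt(np.log(n) / T)

    for _ in range(T):
        k = rng.integers(n)                  # random component of the gradient
        g = np.zeros(n)
        g[k] = n * c[k]                      # unbiased estimate of grad f = c
        x *= np.exp(-eta * g)                # entropic mirror step
        x /= x.sum()                         # renormalize onto the simplex
    print("loss:", c @ x, "optimum:", c.min())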

  9. Tracing Forest Change through 40 Years on Two Continents with the BULC Algorithm and Google Earth Engine

    NASA Astrophysics Data System (ADS)

    Cardille, J. A.

    2015-12-01

    With the opening of the Landsat archive, researchers have a vast new data source teeming with imagery and potential. Beyond Landsat, data from other sensors is newly available as well: these include ALOS/PALSAR, Sentinel-1 and -2, MERIS, and many more. Google Earth Engine, developed to organize and provide analysis tools for these immense data sets, is an ideal platform for researchers trying to sift through huge image stacks. It offers nearly unlimited processing power and storage with a straightforward programming interface. Yet labeling forest change through time remains challenging given the current state of the art for interpreting remote sensing image sequences. Moreover, combining data from very different image platforms remains quite difficult. To address these challenges, we developed the BULC algorithm (Bayesian Updating of Land Cover), designed for the continuous updating of land-cover classifications through time in large data sets. The algorithm ingests data from any of the wide variety of earth-resources sensors; it maintains a running estimate of land-cover probabilities and the most probable class at all time points along a sequence of events. Here we compare BULC results from two study sites that witnessed considerable forest change in the last 40 years: the Pacific Northwest of the United States and the Mato Grosso region of Brazil. In Brazil, we incorporated rough classifications from more than 100 images of varying quality, mixing imagery from more than 10 different sensors. In the Pacific Northwest, we used BULC to identify forest changes due to logging and urbanization from 1973 to the present. Both regions had classification sequences that were better than many of the component days, effectively ignoring clouds and other unwanted signal while fusing the information contained on several platforms. As we leave remote sensing's data-poor era and enter a period with multiple looks at Earth's surface from multiple sensors over a short period of
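
    The Bayesian updating at the heart of BULC can be illustrated per pixel: each new rough classification updates a running class posterior through an assumed confusion matrix. All values below are toy numbers:

    # Hedged per-pixel sketch of Bayesian updating of land-cover
    # probabilities from a sequence of rough classifications.
    import numpy as np

    classes = ["forest", "cleared"]
    prior = np.array([0.5, 0.5])                   # running per-pixel estimate

    # Assumed P(observed label | true class) for one sensor's classification.
    confusion = np.array([[0.8, 0.2],              # true forest
                          [0.3, 0.7]])             # true cleared

    observations = ["forest", "cleared", "cleared"]  # sequence of event labels
    for obs in observations:
        j = classes.index(obs)
        posterior = prior * confusion[:, j]        # Bayes numerator
        prior = posterior / posterior.sum()        # normalize; new prior
        print(obs, "->", dict(zip(classes, prior.round(3))))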

  10. Multi-Objective Random Search Algorithm for Simultaneously Optimizing Wind Farm Layout and Number of Turbines

    NASA Astrophysics Data System (ADS)

    Feng, Ju; Shen, Wen Zhong; Xu, Chang

    2016-09-01

    A new algorithm for multi-objective wind farm layout optimization is presented. It formulates the wind turbine locations as continuous variables and is capable of optimizing the number of turbines and their locations in the wind farm simultaneously. Two objectives are considered. One is to maximize the total power production, which is calculated by considering the wake effects using the Jensen wake model combined with the local wind distribution. The other is to minimize the total electrical cable length. This length is assumed to be the total length of the minimal spanning tree that connects all turbines and is calculated using Prim's algorithm. Constraints on the wind farm boundary and wind turbine proximity are also considered. An ideal test case shows the proposed algorithm largely outperforms a well-known multi-objective genetic algorithm (NSGA-II). In a real test case based on the Horns Rev 1 wind farm, the algorithm also obtains useful Pareto frontiers and provides a wide range of Pareto-optimal layouts with different numbers of turbines for a real-life wind farm developer.
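
    The cable-length objective can be computed with a few lines. The sketch uses scipy's minimum spanning tree routine, which is equivalent to Prim's algorithm for this purpose:

    # Hedged sketch of the cable-length objective: total length of the
    # minimal spanning tree over turbine positions.
    import numpy as np
    from scipy.sparse.csgraph import minimum_spanning_tree
    from scipy.spatial.distance import pdist, squareform

    rng = np.random.default_rng(10)
    turbines = rng.uniform(0, 5000, size=(12, 2))   # candidate layout (m)

    dist = squareform(pdist(turbines))              # pairwise distances
    cable_length = minimum_spanning_tree(dist).sum()
    print("total cable length (m):", round(float(cable_length), 1))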

  11. Genome-wide association data classification and SNPs selection using two-stage quality-based Random Forests

    PubMed Central

    2015-01-01

    Background Single-nucleotide polymorphism (SNP) selection and identification are the most important tasks in genome-wide association data analysis. The problem is difficult because genome-wide association data are very high dimensional and a large portion of the SNPs in the data are irrelevant to the disease. Advanced machine learning methods have been successfully used in genome-wide association studies (GWAS) for identification of genetic variants that have relatively big effects in some common, complex diseases. Among them, the most successful one is Random Forests (RF). Despite performing well in terms of prediction accuracy on some data sets of moderate size, RF still struggles in GWAS with selecting informative SNPs and building accurate prediction models. In this paper, we propose a new two-stage quality-based sampling method in random forests, named ts-RF, for SNP subspace selection in GWAS. The method first applies p-value assessment to find a cut-off point that separates the SNPs into informative and irrelevant groups. The informative group is further divided into two sub-groups: highly informative and weakly informative SNPs. When sampling the SNP subspace for building trees for the forest, only SNPs from these two sub-groups are taken into account. The feature subspaces always contain highly informative SNPs when used to split a node of a tree. Results This approach enables one to generate more accurate trees with a lower prediction error, meanwhile possibly avoiding overfitting. It allows one to detect interactions of multiple SNPs with the diseases, and to reduce the dimensionality and the amount of genome-wide association data needed for learning the RF model. Extensive experiments on two genome-wide SNP data sets (Parkinson case-control data comprising 408,803 SNPs and Alzheimer case-control data comprising 380,157 SNPs) and 10 gene data sets have demonstrated that the proposed model significantly reduced prediction
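
    A hedged sketch of the two-stage sampling idea: rank SNPs by association p-value, split the informative ones into strong and weak groups, and draw each tree's feature subspace only from those groups, always including the strong ones. Cut-offs and proportions are illustrative:

    # Sketch of quality-based subspace sampling on synthetic genotypes.
    import numpy as np
    from scipy.stats import chi2_contingency

    rng = np.random.default_rng(11)
    n, p = 300, 1000
    snps = rng.integers(0, 3, size=(n, p))               # genotypes 0/1/2
    status = (snps[:, 0] + snps[:, 1] + rng.normal(size=n) > 2).astype(int)

    def snp_pvalue(g):
        # 2x3 case/control-by-genotype table; +1 avoids empty cells.
        table = np.array([[np.sum((g == v) & (status == s)) + 1
                           for v in (0, 1, 2)] for s in (0, 1)])
        return chi2_contingency(table)[1]

    pvals = np.array([snp_pvalue(snps[:, j]) for j in range(p)])
    informative = np.argsort(pvals)[:100]                # keep best by p-value
    strong, weak = informative[:20], informative[20:]

    def sample_subspace(size=30):
        # Always include the strong SNPs, fill the rest from the weak group.
        return np.concatenate([strong, rng.choice(weak, size - len(strong),
                                                  replace=False)])
    print("example tree subspace:", sample_subspace()[:10])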

  12. A Proposed Extension to the Soil Moisture and Ocean Salinity Level 2 Algorithm for Mixed Forest and Moderate Vegetation Pixels

    NASA Technical Reports Server (NTRS)

    Panciera, Rocco; Walker, Jeffrey P.; Kalma, Jetse; Kim, Edward

    2011-01-01

    The Soil Moisture and Ocean Salinity (SMOS) mission, launched in November 2009, provides global maps of soil moisture and ocean salinity by measuring the L-band (1.4 GHz) emission of the Earth's surface with a spatial resolution of 40-50 km. Uncertainty in the retrieval of soil moisture over large heterogeneous areas such as SMOS pixels is expected, due to the non-linearity of the relationship between soil moisture and the microwave emission. The current baseline soil moisture retrieval algorithm adopted by SMOS and implemented in the SMOS Level 2 (SMOS L2) processor partially accounts for the sub-pixel heterogeneity of the land surface by modelling the individual contributions of different pixel fractions to the overall pixel emission. This retrieval approach is tested in this study using airborne L-band data over an area the size of a SMOS pixel characterised by a mix of Eucalypt forest and moderate vegetation types (grassland and crops), with the objective of assessing its ability to correct for the soil moisture retrieval error induced by land surface heterogeneity. A preliminary analysis using a traditional uniform-pixel retrieval approach shows that the sub-pixel heterogeneity of land cover type causes significant errors in soil moisture retrieval (7.7% v/v RMSE, 2% v/v bias) in pixels characterised by a significant amount of forest (40-60%). Although the retrieval approach adopted by SMOS partially reduces this error, it is affected by errors beyond the SMOS target accuracy, presenting in particular a strong dry bias when a fraction of the pixel is occupied by forest (4.1% v/v RMSE, -3.1% v/v bias). An extension to the SMOS approach is proposed that accounts for the heterogeneity of vegetation optical depth within the SMOS pixel. The proposed approach is shown to significantly reduce the error in retrieved soil moisture (2.8% v/v RMSE, -0.3% v/v bias) in pixels characterised by a critical amount of forest (40-60%), at the limited cost of only a crude estimate of the

  13. Segmentation of heterogeneous or small FDG PET positive tissue based on a 3D-locally adaptive random walk algorithm.

    PubMed

    Onoma, D P; Ruan, S; Thureau, S; Nkhali, L; Modzelewski, R; Monnehan, G A; Vera, P; Gardin, I

    2014-12-01

    A segmentation algorithm based on the random walk (RW) method, called 3D-LARW, has been developed to delineate small tumors or tumors with a heterogeneous distribution of FDG on PET images. Building on the original RW algorithm [1], we propose an improved approach using new parameters that depend on the Euclidean distance between two adjacent voxels instead of a fixed value, and that integrate probability densities of labels into the system of linear equations used in the RW. These improvements were evaluated and compared with the original RW method, a fixed-value thresholding (40% of the maximum in the lesion), an adaptive thresholding algorithm, and the FLAB method, on uniform spheres filled with FDG, on simulated heterogeneous spheres, and on clinical data (14 patients). On these three different data sets, 3D-LARW showed better segmentation results than the original RW algorithm and the three other methods. As expected, these improvements are more pronounced for the segmentation of small tumors or tumors having heterogeneous FDG uptake.

  14. Random Forest ensembles for detection and prediction of Alzheimer's disease with a good between-cohort robustness.

    PubMed

    Lebedev, A V; Westman, E; Van Westen, G J P; Kramberger, M G; Lundervold, A; Aarsland, D; Soininen, H; Kłoszewska, I; Mecocci, P; Tsolaki, M; Vellas, B; Lovestone, S; Simmons, A

    2014-01-01

    Computer-aided diagnosis of Alzheimer's disease (AD) is a rapidly developing field of neuroimaging with strong potential to be used in practice. In this context, assessment of models' robustness to noise and imaging protocol differences, together with post-processing and tuning strategies, are key tasks to be addressed in order to move towards successful clinical applications. In this study, we investigated the efficacy of Random Forest classifiers trained using different structural MRI measures, with and without neuroanatomical constraints, in the detection and prediction of AD in terms of accuracy and between-cohort robustness. From the ADNI database, 185 AD patients and 225 healthy controls (HC) were randomly split into training and testing datasets. 165 subjects with mild cognitive impairment (MCI) were distributed according to the month of conversion to dementia (4-year follow-up). Structural 1.5 T MRI scans were processed using Freesurfer segmentation and cortical reconstruction. Using the resulting output, AD/HC classifiers were trained. Training included model tuning and performance assessment using out-of-bag estimation. Subsequently the classifiers were validated on the AD/HC test set and for the ability to predict MCI-to-AD conversion. Models' between-cohort robustness was additionally assessed using the AddNeuroMed dataset acquired with harmonized clinical and imaging protocols. In the ADNI set, the best AD/HC sensitivity/specificity (88.6%/92.0% - test set) was achieved by combining cortical thickness and volumetric measures. The Random Forest model resulted in significantly higher accuracy compared to the reference classifier (linear Support Vector Machine). The models trained using parcelled and high-dimensional (HD) input demonstrated equivalent performance, but the former was more effective in terms of computation/memory and time costs. The sensitivity/specificity for detecting MCI-to-AD conversion (but not AD/HC classification performance) was further
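
    The tuning-by-out-of-bag step described above is easy to reproduce: each tree's bootstrap leaves out roughly a third of the subjects, and their held-out predictions give a validation estimate without a separate split. A minimal scikit-learn sketch on synthetic stand-in features (the data and the tuned grid are illustrative assumptions):

    ```python
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Stand-in for cortical-thickness + volumetric features (hypothetical data).
    X, y = make_classification(n_samples=400, n_features=120, n_informative=15,
                               random_state=0)

    # Out-of-bag accuracy lets us tune without touching a held-out test set.
    best = None
    for mtry in (5, 10, 20, 40):
        rf = RandomForestClassifier(n_estimators=500, max_features=mtry,
                                    oob_score=True, random_state=0).fit(X, y)
        if best is None or rf.oob_score_ > best.oob_score_:
            best = rf
    print(best.max_features, round(best.oob_score_, 3))
    ```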

  15. Mass weighted urn design--A new randomization algorithm for unequal allocations.

    PubMed

    Zhao, Wenle

    2015-07-01

    Unequal allocations have been used in clinical trials motivated by ethical, efficiency, or feasibility concerns. The commonly used permuted block randomization faces a tradeoff between effective imbalance control with a small block size and an accurate allocation target with a large block size. Few other unequal-allocation randomization designs have been proposed in the literature, and applications in real trials have hardly ever been reported, partly due to their complexity in implementation compared to permuted block randomization. Proposed in this paper is the mass weighted urn design, in which the number of balls in the urn equals the number of treatments and remains unchanged during the study. The chance of a ball being randomly selected is proportional to the mass of the ball. After each treatment assignment, a part of the mass of the selected ball is redistributed to all balls based on the target allocation ratio. This design allows any desired optimal unequal allocation to be accurately targeted without approximation, and provides consistent imbalance control throughout the allocation sequence. The statistical properties of this new design are evaluated with the Euclidean distance between the observed treatment distribution and the desired treatment distribution as the treatment imbalance measure, and the Euclidean distance between the conditional allocation probability and the target allocation probability as the allocation predictability measure. Computer simulation results are presented comparing the mass weighted urn design with other randomization designs currently available for unequal allocations.
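
    The update rule lends itself to a few lines of simulation. The sketch below encodes one plausible reading of the design as summarised here (one unit of mass moved per assignment, total mass fixed by a parameter alpha); the constants are illustrative, and Zhao's paper should be consulted for the exact specification.

    ```python
    import numpy as np

    def mass_weighted_urn(target, n_subjects, alpha=2.0, seed=0):
        """Sequential assignment under (one reading of) the mass weighted urn
        design: selection probability is proportional to ball mass; after each
        draw, one unit of the selected ball's mass is redistributed to all
        balls in the target allocation ratio. `alpha` scales total mass.
        """
        rng = np.random.default_rng(seed)
        w = np.asarray(target, dtype=float)
        w = w / w.sum()                    # target allocation probabilities
        mass = alpha * w                   # initial ball masses
        assignments = []
        for _ in range(n_subjects):
            p = np.clip(mass, 0, None)     # guard against negative mass
            p = p / p.sum()
            k = rng.choice(len(w), p=p)
            mass[k] -= 1.0                 # take one unit from the selected ball
            mass += w                      # ...and return it in the ratio w
            assignments.append(k)
        return np.array(assignments)

    a = mass_weighted_urn([3, 2], 500)     # 3:2 target allocation
    print(np.bincount(a) / len(a))         # close to [0.6, 0.4]
    ```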

  16. Development of Solution Algorithm and Sensitivity Analysis for Random Fuzzy Portfolio Selection Model

    NASA Astrophysics Data System (ADS)

    Hasuike, Takashi; Katagiri, Hideki

    2010-10-01

    This paper proposes a portfolio selection problem that takes an investor's subjectivity into account, together with a sensitivity analysis for changes in that subjectivity. Since the proposed problem is formulated as a random fuzzy programming problem, owing to both randomness and subjectivity represented by fuzzy numbers, it is not well defined. Therefore, by introducing the Sharpe ratio, one of the most important performance measures of portfolio models, the main problem is transformed into a standard fuzzy programming problem. Furthermore, using the sensitivity analysis for fuzziness, the analytical optimal portfolio with the sensitivity factor is obtained.

  17. A comparison of random forest and its Gini importance with standard chemometric methods for the feature selection and classification of spectral data

    PubMed Central

    Menze, Bjoern H; Kelm, B Michael; Masuch, Ralf; Himmelreich, Uwe; Bachert, Peter; Petrich, Wolfgang; Hamprecht, Fred A

    2009-01-01

    Background Regularized regression methods such as principal component or partial least squares regression perform well in learning tasks on high dimensional spectral data, but cannot explicitly eliminate irrelevant features. The random forest classifier with its associated Gini feature importance, on the other hand, allows for an explicit feature elimination, but may not be optimally adapted to spectral data due to the topology of its constituent classification trees, which are based on orthogonal splits in feature space. Results We propose to combine the best of both approaches, and we evaluated the joint use of a feature selection based on a recursive feature elimination using the Gini importance of random forests together with regularized classification methods on spectral data sets from medical diagnostics, chemotaxonomy, biomedical analytics, food science, and synthetically modified spectral data. Here, a feature selection using the Gini feature importance with a regularized classification by discriminant partial least squares regression performed as well as or better than a filtering according to different univariate statistical tests, or using regression coefficients in a backward feature elimination. It outperformed the direct application of the random forest classifier, or the direct application of the regularized classifiers on the full set of features. Conclusion The Gini importance of the random forest provided superior means for measuring feature relevance on spectral data, but – on an optimal subset of features – the regularized classifiers might be preferable over the random forest classifier, in spite of their limitation to model linear dependencies only. A feature selection based on Gini importance, however, may precede a regularized linear classification to identify this optimal subset of features, and to earn a double benefit of both dimensionality reduction and the elimination of noise from the classification task. PMID:19591666
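
    The combination is straightforward to sketch with scikit-learn: rank features by Gini importance, recursively drop the weakest, then fit a regularized linear classifier on the survivors. Logistic regression stands in here for the paper's discriminant PLS, the data is a synthetic placeholder for real spectra, and for brevity the selection is not nested inside the cross-validation.

    ```python
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X, y = make_classification(n_samples=300, n_features=500, n_informative=20,
                               random_state=0)          # stand-in for spectra
    kept = np.arange(X.shape[1])

    # Recursive elimination: drop the less important half each round.
    while len(kept) > 25:
        rf = RandomForestClassifier(n_estimators=200, random_state=0)
        rf.fit(X[:, kept], y)
        order = np.argsort(rf.feature_importances_)     # ascending Gini importance
        kept = kept[order[len(kept) // 2:]]             # keep the better half

    # Regularized linear classifier on the Gini-selected subset.
    clf = LogisticRegression(penalty='l2', C=1.0, max_iter=1000)
    print(cross_val_score(clf, X[:, kept], y, cv=5).mean())
    ```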

  18. Intra-and-Inter Species Biomass Prediction in a Plantation Forest: Testing the Utility of High Spatial Resolution Spaceborne Multispectral RapidEye Sensor and Advanced Machine Learning Algorithms

    PubMed Central

    Dube, Timothy; Mutanga, Onisimo; Adam, Elhadi; Ismail, Riyad

    2014-01-01

    The quantification of aboveground biomass using remote sensing is critical for better understanding the role of forests in carbon sequestration and for informed sustainable management. Although remote sensing techniques have proven useful in assessing forest biomass in general, more work is required to investigate their capabilities in predicting intra-and-inter species biomass, which is mainly characterised by non-linear relationships. In this study, we tested two machine learning algorithms, Stochastic Gradient Boosting (SGB) and Random Forest (RF) regression trees, for predicting intra-and-inter species biomass using high resolution RapidEye reflectance bands as well as derived vegetation indices in a commercial plantation. The results showed that the SGB algorithm yielded the best performance for intra-and-inter species biomass prediction, both when using all the predictor variables and when using only the most important selected variables. For example, using the most important variables the algorithm produced an R2 of 0.80 and RMSE of 16.93 t·ha−1 for E. grandis; an R2 of 0.79 and RMSE of 17.27 t·ha−1 for P. taeda; and an R2 of 0.61 and RMSE of 43.39 t·ha−1 for the combined species data set. Comparatively, RF yielded plausible results only for E. dunnii (R2 of 0.79; RMSE of 7.18 t·ha−1). We demonstrated that although the two statistical methods were able to predict biomass accurately, RF produced weaker results than SGB when applied to the combined species dataset. The result underscores the relevance of stochastic models in predicting biomass drawn from different species and genera using the new generation high resolution RapidEye sensor with strategically positioned bands. PMID:25140631
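
    A compact way to reproduce this kind of comparison with scikit-learn is shown below; GradientBoostingRegressor with subsampling plays the role of SGB, and the regression data is a synthetic stand-in for per-plot biomass against RapidEye bands and indices.

    ```python
    from sklearn.datasets import make_regression
    from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
    from sklearn.metrics import mean_squared_error, r2_score
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for plot biomass vs RapidEye bands + vegetation indices.
    X, y = make_regression(n_samples=250, n_features=12, noise=20, random_state=0)
    Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

    models = [GradientBoostingRegressor(subsample=0.5, random_state=0),  # SGB-style
              RandomForestRegressor(n_estimators=500, random_state=0)]
    for model in models:
        pred = model.fit(Xtr, ytr).predict(Xte)
        rmse = mean_squared_error(yte, pred) ** 0.5
        print(type(model).__name__, round(r2_score(yte, pred), 2), round(rmse, 1))
    ```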

  19. Mapping SOC (Soil Organic Carbon) using LiDAR-derived vegetation indices in a random forest regression model

    NASA Astrophysics Data System (ADS)

    Will, R. M.; Glenn, N. F.; Benner, S. G.; Pierce, J. L.; Spaete, L.; Li, A.

    2015-12-01

    Quantifying SOC (Soil Organic Carbon) storage in complex terrain is challenging due to high spatial variability. Generally, the challenge is met by transforming point data to the entire landscape using surrogate, spatially-distributed variables like elevation or precipitation. In many ecosystems, remotely sensed information on above-ground vegetation (e.g. NDVI) is a good predictor of below-ground carbon stocks. In this project, we are attempting to improve this predictive method by incorporating LiDAR-derived vegetation indices. LiDAR provides a mechanism for improved characterization of aboveground vegetation by providing structural parameters such as vegetation height and biomass. In this study, a random forest model is used to predict SOC using a suite of LiDAR-derived vegetation indices as predictor variables. The Reynolds Creek Experimental Watershed (RCEW) is an ideal location for a study of this type since it encompasses a strong elevation/precipitation gradient that supports lower biomass sagebrush ecosystems at low elevations and forests with more biomass at higher elevations. Sagebrush ecosystems composed of Wyoming, Low and Mountain Sagebrush have SOC values ranging from 0.4 to 1% (top 30 cm), while higher biomass ecosystems composed of aspen, juniper and fir have SOC values approaching 4% (top 30 cm). Large differences in SOC have been observed between canopy and interspace locations, and high resolution vegetation information is likely to explain plot-scale variability in SOC. Mapping of the SOC reservoir will help identify underlying controls on SOC distribution and provide insight into which processes are most important in determining SOC in semi-arid mountainous regions. In addition, airborne LiDAR has the potential to characterize vegetation communities at a high resolution and could be a tool for improving estimates of SOC at larger scales.

  20. Fast Numerical Algorithms for 3-D Scattering from PEC and Dielectric Random Rough Surfaces in Microwave Remote Sensing

    NASA Astrophysics Data System (ADS)

    Zhang, Lisha

    We present fast and robust numerical algorithms for 3-D scattering from perfectly electrically conducting (PEC) and dielectric random rough surfaces in microwave remote sensing. The Coifman wavelets, or Coiflets, are employed to implement Galerkin's procedure in the method of moments (MoM). Due to the high-precision one-point quadrature, the Coiflets yield fast evaluations of the most off-diagonal entries, reducing the matrix fill effort from O(N²) to O(N). The orthogonality and Riesz basis of the Coiflets generate a well-conditioned impedance matrix, with rapid convergence for the conjugate gradient solver. The resulting impedance matrix is further sparsified by the matrix-formed standard fast wavelet transform (SFWT). By properly selecting multiresolution levels of the total transformation matrix, the solution precision can be enhanced without noticeably sacrificing matrix sparsity or memory consumption. The unified fast scattering algorithm for dielectric random rough surfaces asymptotically reduces to the PEC case when the loss tangent grows extremely large. Numerical results demonstrate that the reduced PEC model does not suffer from ill-posed problems. Compared with previous publications and laboratory measurements, good agreement is observed.

  1. Predictive modeling of groundwater nitrate pollution using Random Forest and multisource variables related to intrinsic and specific vulnerability: a case study in an agricultural setting (Southern Spain).

    PubMed

    Rodriguez-Galiano, Victor; Mendes, Maria Paula; Garcia-Soldado, Maria Jose; Chica-Olmo, Mario; Ribeiro, Luis

    2014-04-01

    Watershed management decisions need robust methods that allow an accurate predictive modeling of pollutant occurrences. Random Forest (RF) is a powerful machine learning, data-driven method that is rarely used in water resources studies and thus has not been evaluated thoroughly in this field. Compared to more conventional pattern recognition techniques, key advantages of RF include: its non-parametric nature; high predictive accuracy; and capability to determine variable importance. This last characteristic can be used to better understand the individual role and the combined effect of explanatory variables in both protecting and exposing groundwater from and to a pollutant. In this paper, the performance of RF regression for predictive modeling of nitrate pollution is explored, based on intrinsic and specific vulnerability assessment of the Vega de Granada aquifer. The applicability of this new machine learning technique is demonstrated in an agriculture-dominated area where nitrate concentrations in groundwater can exceed the trigger value of 50 mg/L at many locations. A comprehensive GIS database of twenty-four parameters related to intrinsic hydrogeologic properties, driving forces, remotely sensed variables and physical-chemical variables measured in situ were used as inputs to build different predictive models of nitrate pollution. RF measures of importance were also used to define the most significant predictors of nitrate pollution in groundwater, allowing the establishment of the pollution sources (pressures). The potential of RF for generating a vulnerability map to nitrate pollution is assessed considering multiple criteria related to variations in the algorithm parameters and the accuracy of the maps. The performance of RF is also evaluated in comparison to the logistic regression (LR) method using different efficiency measures to ensure their generalization ability. Prediction results show the ability of RF to build accurate models

  2. Low Scaling Algorithms for the Random Phase Approximation: Imaginary Time and Laplace Transformations.

    PubMed

    Kaltak, Merzuk; Klimeš, Jiří; Kresse, Georg

    2014-06-10

    In this paper, we determine efficient imaginary frequency and imaginary time grids for second-order Møller-Plesset (MP) perturbation theory. The least-squares and Minimax quadratures are compared for periodic systems, finding that the Minimax quadrature performs slightly better for the considered materials. We show that the imaginary frequency grids developed for second order also perform well for the correlation energy in the direct random phase approximation. Furthermore, we show that the polarizabilities on the imaginary time axis can be Fourier-transformed to the imaginary frequency domain, since the time and frequency Minimax grids are dual to each other. The same duality is observed for the least-squares grids. The transformation from imaginary time to imaginary frequency allows one to reduce the time complexity to cubic (in system size), so that random phase approximation (RPA) correlation energies become accessible for large systems.

  3. A correction scheme for a simplified analytical random walk model algorithm of proton dose calculation in distal Bragg peak regions

    NASA Astrophysics Data System (ADS)

    Yao, Weiguang; Merchant, Thomas E.; Farr, Jonathan B.

    2016-10-01

    The lateral homogeneity assumption is used in most analytical algorithms for proton dose, such as the pencil-beam algorithms and our simplified analytical random walk model. To improve the dose calculation in the distal fall-off region in heterogeneous media, we analyzed primary proton fluence near heterogeneous media and propose to calculate the lateral fluence with voxel-specific Gaussian distributions. The lateral fluence from a beamlet is no longer expressed by a single Gaussian for all the lateral voxels, but by a specific Gaussian for each lateral voxel. The voxel-specific Gaussian for the beamlet of interest is calculated by re-initializing the fluence deviation on an effective surface where the proton energies of the beamlet of interest and the beamlet passing the voxel are the same. The dose improvement from the correction scheme was demonstrated by the dose distributions in two sets of heterogeneous phantoms consisting of cortical bone, lung, and water and by evaluating distributions in example patients with a head-and-neck tumor and metal spinal implants. The dose distributions from Monte Carlo simulations were used as the reference. The correction scheme effectively improved the dose calculation accuracy in the distal fall-off region and increased the gamma test pass rate. The extra computation for the correction was about 20% of that for the original algorithm but is dependent upon patient geometry.

  4. Improved scaling of time-evolving block-decimation algorithm through reduced-rank randomized singular value decomposition

    NASA Astrophysics Data System (ADS)

    Tamascelli, D.; Rosenbach, R.; Plenio, M. B.

    2015-06-01

    When the amount of entanglement in a quantum system is limited, the relevant dynamics of the system is restricted to a very small part of the state space. When restricted to this subspace, the description of the system becomes efficient in the system size. A class of algorithms, exemplified by the time-evolving block-decimation (TEBD) algorithm, makes use of this observation by selecting the relevant subspace through a decimation technique relying on the singular value decomposition (SVD). In these algorithms, the complexity of each time-evolution step is dominated by the SVD. Here we show that, by applying a randomized version of the SVD routine (RRSVD), the power law governing the computational complexity of TEBD is lowered by one degree, resulting in a considerable speed-up. We exemplify the potential gains in efficiency with some real-world examples to which TEBD can be successfully applied, and demonstrate that for those systems RRSVD delivers results as accurate as state-of-the-art deterministic SVD routines.
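
    For readers unfamiliar with randomized SVD, the sketch below implements a reduced-rank routine in the Halko-Martinsson-Tropp style: project onto a random subspace, orthonormalize, and take a small exact SVD. This is the generic technique, not the paper's RRSVD code, and the oversampling and power-iteration settings are illustrative.

    ```python
    import numpy as np

    def rrsvd(A, rank, n_oversample=10, n_iter=2, seed=0):
        """Reduced-rank randomized SVD (generic Halko et al. style sketch)."""
        rng = np.random.default_rng(seed)
        m, n = A.shape
        Omega = rng.standard_normal((n, rank + n_oversample))
        Y = A @ Omega                          # sample the range of A
        for _ in range(n_iter):                # power iterations sharpen spectrum
            Y = A @ (A.T @ Y)
        Q, _ = np.linalg.qr(Y)                 # orthonormal basis for the range
        B = Q.T @ A                            # small (rank+p) x n matrix
        Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
        return (Q @ Ub)[:, :rank], s[:rank], Vt[:rank]

    A = np.random.default_rng(1).standard_normal((600, 400))
    U, s, Vt = rrsvd(A, rank=50)
    print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # relative error
    ```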

  5. The adaptive dynamic community detection algorithm based on the non-homogeneous random walking

    NASA Astrophysics Data System (ADS)

    Xin, Yu; Xie, Zhi-Qiang; Yang, Jing

    2016-05-01

    As habits and customs change, people's social activity tends to change as well, so a method for analyzing community evolution is needed to mine the dynamic information in social networks. To this end, we design a random walking possibility function and a topology gain function to calculate the global influence matrix of the nodes. From the analysis of the global influence matrix, the clustering directions of the nodes can be obtained, and thus the NRW (Non-Homogeneous Random Walk) method for detecting static overlapping communities can be established. We then design the ANRW (Adaptive Non-Homogeneous Random Walk) method, which adapts the nodes affected by dynamic events, on top of the NRW. The ANRW combines local community detection with dynamic adaptive adjustment to decrease its computational cost. Furthermore, the ANRW treats the node as the unit of computation, so it is well suited to parallel computing and can meet the requirements of large-scale data mining. Finally, experimental analysis verifies the efficiency of ANRW for dynamic community detection.

  6. Robust 3D object localization and pose estimation for random bin picking with the 3DMaMa algorithm

    NASA Astrophysics Data System (ADS)

    Skotheim, Øystein; Thielemann, Jens T.; Berge, Asbjørn; Sommerfelt, Arne

    2010-02-01

    Enabling robots to automatically locate and pick up randomly placed and oriented objects from a bin is an important challenge in factory automation, replacing tedious and heavy manual labor. A system should be able to recognize and locate objects with a predefined shape and estimate their position with the precision necessary for a gripping robot to pick them up. We describe a system that consists of a structured light instrument for capturing 3D data and a robust approach for object localization and pose estimation. The method does not depend on segmentation of range images, but instead searches through pairs of 2D manifolds to localize candidates for object matches. This leads to an algorithm that is not very sensitive to scene complexity or the number of objects in the scene. Furthermore, the strategy for candidate search is easily reconfigurable to arbitrary objects. Experiments reported here show the utility of the method on a general random bin picking problem, exemplified by localization of car parts with random position and orientation. Full pose estimation is done in less than 380 ms per image. We believe that the method is applicable to a wide range of industrial automation problems where precise localization of 3D objects in a scene is needed.

  7. Random Forest Classification of Sediments on Exposed Intertidal Flats Using ALOS-2 Quad-Polarimetric SAR Data

    NASA Astrophysics Data System (ADS)

    Wang, W.; Yang, X.; Liu, G.; Zhou, H.; Ma, W.; Yu, Y.; Li, Z.

    2016-06-01

    Coastal zones are among the world's most densely populated areas, and an accurate, cost-effective, frequent, and synoptic method of monitoring these complex ecosystems is needed. However, misclassification of sediments on exposed intertidal flats restricts the development of coastal zone surveillance. With the advent of SAR (Synthetic Aperture Radar) satellites, polarimetric SAR satellite imagery plays an increasingly important role in monitoring changes in coastal wetlands. This research investigated the necessity of combining SAR polarimetric features with optical data, and their contribution to accurate sediment classification. Three experimental groups were set up to assess the most appropriate descriptors: (i) several SAR polarimetric descriptors extracted from the scattering matrix using the Cloude-Pottier, Freeman-Durden and Yamaguchi methods; (ii) optical remote sensing (RS) data with R, G and B channels, forming the second feature combination; and (iii) the chosen SAR and optical RS indicators combined in one classifier. Classification was carried out using Random Forest (RF) classifiers, and an overall map of the intertidal flats was generated. Experiments were implemented using ALOS-2 L-band satellite imagery and GF-1 optical multi-spectral data acquired in the same period. The weights of the descriptors were evaluated by VI (RF Variable Importance). Results suggested that the optical data source offers little advantage for sediment classification and can even reduce the effectiveness of the SAR indicators. Polarimetric SAR feature sets show great potential for intertidal flat classification and are promising for classifying mud flats, sand flats, bare farmland and tidal water.

  8. Proteomics Analysis with a Nano Random Forest Approach Reveals Novel Functional Interactions Regulated by SMC Complexes on Mitotic Chromosomes*

    PubMed Central

    Ohta, Shinya; Montaño-Gutierrez, Luis F.; de Lima Alves, Flavia; Ogawa, Hiromi; Toramoto, Iyo; Sato, Nobuko; Morrison, Ciaran G.; Takeda, Shunichi; Hudson, Damien F.; Earnshaw, William C.

    2016-01-01

    Packaging of DNA into condensed chromosomes during mitosis is essential for the faithful segregation of the genome into daughter nuclei. Although the structure and composition of mitotic chromosomes have been studied for over 30 years, these aspects are yet to be fully elucidated. Here, we used stable isotope labeling with amino acids in cell culture to compare the proteomes of mitotic chromosomes isolated from cell lines harboring conditional knockouts of members of the condensin (SMC2, CAP-H, CAP-D3), cohesin (Scc1/Rad21), and SMC5/6 (SMC5) complexes. Our analysis revealed that these complexes associate with chromosomes independently of each other, with the SMC5/6 complex showing no significant dependence on any other chromosomal proteins during mitosis. To identify subtle relationships between chromosomal proteins, we employed a nano Random Forest (nanoRF) approach to detect protein complexes and the relationships between them. Our nanoRF results suggested that as few as 113 of 5058 detected chromosomal proteins are functionally linked to chromosome structure and segregation. Furthermore, nanoRF data revealed 23 proteins that were not previously suspected to have functional interactions with complexes playing important roles in mitosis. Subsequent small-interfering-RNA-based validation and localization tracking by green fluorescent protein-tagging highlighted novel candidates that might play significant roles in mitotic progression. PMID:27231315

  9. Mapping Sub-Antarctic Cushion Plants Using Random Forests to Combine Very High Resolution Satellite Imagery and Terrain Modelling

    PubMed Central

    Bricher, Phillippa K.; Lucieer, Arko; Shaw, Justine; Terauds, Aleks; Bergstrom, Dana M.

    2013-01-01

    Monitoring changes in the distribution and density of plant species often requires accurate and high-resolution baseline maps of those species. Detecting such change at the landscape scale is often problematic, particularly in remote areas. We examine a new technique to improve accuracy and objectivity in mapping vegetation, combining species distribution modelling and satellite image classification on a remote sub-Antarctic island. In this study, we combine spectral data from very high resolution WorldView-2 satellite imagery and terrain variables from a high resolution digital elevation model to improve mapping accuracy, in both pixel- and object-based classifications. Random forest classification was used to explore the effectiveness of these approaches on mapping the distribution of the critically endangered cushion plant Azorella macquariensis Orchard (Apiaceae) on sub-Antarctic Macquarie Island. Both pixel- and object-based classifications of the distribution of Azorella achieved very high overall validation accuracies (91.6–96.3%, κ = 0.849–0.924). Both two-class and three-class classifications were able to accurately and consistently identify the areas where Azorella was absent, indicating that these maps provide a suitable baseline for monitoring expected change in the distribution of the cushion plants. Detecting such change is critical given the threats this species is currently facing under altering environmental conditions. The method presented here has applications to monitoring a range of species, particularly in remote and isolated environments. PMID:23940805

  10. Proteomics-based, multivariate random forest method for prediction of protein separation behavior during cation-exchange chromatography.

    PubMed

    Swanson, Ryan K; Xu, Ruo; Nettleton, Dan; Glatz, Charles E

    2012-08-01

    The most significant cost of recombinant protein production lies in the optimization of the downstream purification methods, mainly due to a lack of knowledge of the separation behavior of the host cell proteins (HCP). To reduce the effort required for purification process development, this work was aimed at modeling the separation behavior of a complex mixture of proteins in cation-exchange chromatography (CEX). With the emergence of molecular pharming as a viable option for the production of recombinant pharmaceutical proteins, the HCP mixture chosen was an extract of corn germ. Aqueous two phase system (ATPS) partitioning followed by two-dimensional electrophoresis (2DE) provided data on isoelectric point, molecular weight and surface hydrophobicity of the extract and step-elution fractions. A multivariate random forest (MVRF) method was then developed using the three characterization variables to predict the elution pattern of individual corn HCP. The MVRF method achieved an average root mean squared error (RMSE) value of 0.0406 (fraction of protein eluted in each CEX elution step) for all the proteins that were characterized, providing evidence for the effectiveness of both the characterization method and the analysis approach for protein purification applications. PMID:22748375
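
    scikit-learn's random forest regressor accepts a multivariate target directly, which matches the shape of this problem: predict, for each protein, the fraction eluting in each CEX step from its isoelectric point, molecular weight, and surface hydrophobicity. The sketch below uses synthetic stand-in data with those three descriptors; the value ranges and four-step setup are illustrative assumptions.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_predict

    rng = np.random.default_rng(0)
    n = 120                                          # proteins (synthetic stand-in)
    X = np.column_stack([rng.uniform(4, 10, n),      # isoelectric point
                         rng.uniform(10, 150, n),    # molecular weight (kDa)
                         rng.uniform(0, 1, n)])      # surface hydrophobicity
    logits = rng.standard_normal((n, 4)) + X[:, :1] / 3       # 4 elution steps
    Y = np.exp(logits) / np.exp(logits).sum(1, keepdims=True) # fractions sum to 1

    model = RandomForestRegressor(n_estimators=300, random_state=0)
    pred = cross_val_predict(model, X, Y, cv=5)      # multi-output regression
    rmse = np.sqrt(((pred - Y) ** 2).mean())
    print(round(rmse, 3))                            # cf. the paper's 0.0406 RMSE
    ```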

  11. A co-adaptive sensory motor rhythms Brain-Computer Interface based on common spatial patterns and Random Forest.

    PubMed

    Schwarz, Andreas; Scherer, Reinhold; Steyrl, David; Faller, Josef; Muller-Putz, Gernot R

    2015-08-01

    Sensorimotor rhythm (SMR) based Brain-Computer Interfaces (BCI) typically require lengthy user training. This can be exhausting and fatiguing for the user, as data collection may be monotonous and typically proceeds without any feedback for user motivation. Hence, new ways to reduce user training and improve performance are needed. We recently introduced a two-class motor imagery BCI system which continuously adapted with increasing run-time to the brain patterns of the user. The system was designed to provide visual feedback to the user after just five minutes. The aim of the current work was to improve user-specific online adaptation, which was expected to lead to higher performance. To maximize SMR discrimination, the method of filter-bank common spatial patterns (fbCSP) and a Random Forest (RF) classifier were combined. In a supporting online study, all volunteers performed significantly better than chance. An overall peak accuracy of 88.6 ± 6.1% (SD) was reached, which significantly exceeded the performance of our previous system by 13%. Therefore, we consider this system the next step towards fully auto-calibrating motor imagery BCIs. PMID:26736445

  12. Automated sleep stage identification system based on time-frequency analysis of a single EEG channel and random forest classifier.

    PubMed

    Fraiwan, Luay; Lweesy, Khaldon; Khasawneh, Natheer; Wenz, Heinrich; Dickhaus, Hartmut

    2012-10-01

    In this work, an efficient automated new approach for sleep stage identification based on the new standard of the American Academy of Sleep Medicine (AASM) is presented. The proposed approach employs time-frequency analysis and entropy measures for feature extraction from a single electroencephalograph (EEG) channel. Three time-frequency techniques were deployed for the analysis of the EEG signal: the Choi-Williams distribution (CWD), the continuous wavelet transform (CWT), and the Hilbert-Huang Transform (HHT). Polysomnographic recordings from sixteen subjects were used in this study, and features were extracted from the time-frequency representation of the EEG signal using Renyi's entropy. The classification of the extracted features was done using a random forest classifier. The performance of the new approach was tested by evaluating the accuracy and the kappa coefficient for the three time-frequency distributions: CWD, CWT, and HHT. The CWT time-frequency distribution outperformed the other two distributions and showed excellent performance with an accuracy of 0.83 and a kappa coefficient of 0.76.
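
    The feature-extraction step is easy to illustrate: compute a time-frequency distribution of an EEG epoch, normalise it to a density, and take its Renyi entropy. The sketch below uses an ordinary spectrogram as a stand-in for the CWD/CWT/HHT of the paper; the sampling rate, entropy order, and test signals are illustrative assumptions.

    ```python
    import numpy as np
    from scipy.signal import spectrogram

    def renyi_entropy_tf(x, fs, alpha=3):
        """Renyi entropy of a normalised time-frequency distribution of x.
        A spectrogram stands in here for the CWD/CWT/HHT used in the paper."""
        _, _, S = spectrogram(x, fs=fs, nperseg=fs)    # |STFT|^2 magnitudes
        P = S / S.sum()                                # treat the TFD as a density
        return np.log2((P ** alpha).sum()) / (1 - alpha)

    fs = 128
    t = np.arange(30 * fs) / fs                        # one 30 s epoch
    alpha_wave = np.sin(2 * np.pi * 10 * t)            # alpha-band activity
    noise = np.random.default_rng(0).standard_normal(t.size)
    print(renyi_entropy_tf(alpha_wave + 0.1 * noise, fs))  # concentrated -> lower
    print(renyi_entropy_tf(noise, fs))                     # spread out -> higher
    ```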

  13. Random forests in non-invasive sensorimotor rhythm brain-computer interfaces: a practical and convenient non-linear classifier.

    PubMed

    Steyrl, David; Scherer, Reinhold; Faller, Josef; Müller-Putz, Gernot R

    2016-02-01

    There is general agreement in the brain-computer interface (BCI) community that, although non-linear classifiers can provide better results in some cases, linear classifiers are preferable, particularly as non-linear classifiers often involve a number of parameters that must be carefully chosen. However, new non-linear classifiers have been developed over the last decade. One of them is the random forest (RF) classifier. Although popular in other fields of science, RFs are not common in BCI research. In this work, we address three open questions regarding RFs in sensorimotor rhythm (SMR) BCIs: parametrization, online applicability, and performance compared to regularized linear discriminant analysis (LDA). We found that the performance of RF is constant over a large range of parameter values. We demonstrate - for the first time - that RFs are applicable online in SMR-BCIs. Further, we show in an offline BCI simulation that RFs statistically significantly outperform regularized LDA by about 3%. These results confirm that RFs are practical and convenient non-linear classifiers for SMR-BCIs. Taking into account further properties of RFs, such as independence from feature distributions, maximum margin behavior, and multiclass and advanced data mining capabilities, we argue that RFs should be taken into consideration for future BCIs.

  14. Polarimetric SAR decomposition parameter subset selection and their optimal dynamic range evaluation for urban area classification using Random Forest

    NASA Astrophysics Data System (ADS)

    Hariharan, Siddharth; Tirodkar, Siddhesh; Bhattacharya, Avik

    2016-02-01

    Urban area classification is important for monitoring ever-increasing urbanization and studying its environmental impact. Two L-band (wavelength: 23 cm) datasets from NASA JPL's UAVSAR were used in this study for urban area classification. The two datasets differ in terms of urban area structures, building patterns, and the geometric shapes and sizes of the buildings. In these datasets, some urban areas appear oriented about the radar line of sight (LOS) while other areas appear non-oriented. In this study, roll-invariant polarimetric SAR decomposition parameters were used to classify these urban areas. Random Forest (RF), which is an ensemble decision tree learning technique, was used in this study. RF performs parameter subset selection as a part of its classification procedure. In this study, parameter subsets were obtained and analyzed to infer scattering mechanisms useful for urban area classification. The Cloude-Pottier α, the Touzi dominant scattering amplitude αs1 and the anisotropy A were among the top six important parameters selected for both datasets. However, it was observed that these parameters were ranked differently for the two datasets. The urban area classification using RF was compared with the Support Vector Machine (SVM) and the Maximum Likelihood Classifier (MLC) for both datasets. RF outperforms SVM by 4% and MLC by 12% in Dataset 1. It also outperforms SVM and MLC by 3.5% and 11%, respectively, in Dataset 2.

  16. EcmPred: prediction of extracellular matrix proteins based on random forest with maximum relevance minimum redundancy feature selection.

    PubMed

    Kandaswamy, Krishna Kumar; Pugalenthi, Ganesan; Kalies, Kai-Uwe; Hartmann, Enno; Martinetz, Thomas

    2013-01-21

    The extracellular matrix (ECM) is a major component of tissues of multicellular organisms. It consists of secreted macromolecules, mainly polysaccharides and glycoproteins. Malfunctions of ECM proteins lead to severe disorders such as Marfan syndrome, osteogenesis imperfecta, numerous chondrodysplasias, and skin diseases. In this work, we report a random forest approach, EcmPred, for the prediction of ECM proteins from protein sequences. EcmPred was trained on a dataset containing 300 ECM and 300 non-ECM proteins and tested on a dataset containing 145 ECM and 4187 non-ECM proteins. EcmPred achieved 83% accuracy on the training and 77% on the test dataset. EcmPred predicted 15 out of 20 experimentally verified ECM proteins. By scanning the entire human proteome, we predicted novel ECM proteins validated with gene ontology and InterPro. The dataset and standalone version of the EcmPred software are available at http://www.inb.uni-luebeck.de/tools-demos/Extracellular_matrix_proteins/EcmPred.

  17. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    PubMed

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advancements in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, with the availability of large data sets come the inherent challenges of new methods of statistical analysis and modeling. Considering that a complex phenotype may be the effect of a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.

  18. In silico modelling of permeation enhancement potency in Caco-2 monolayers based on molecular descriptors and random forest.

    PubMed

    Welling, Søren H; Clemmensen, Line K H; Buckley, Stephen T; Hovgaard, Lars; Brockhoff, Per B; Refsgaard, Hanne H F

    2015-08-01

    Structural traits of permeation enhancers are important determinants of their capacity to promote enhanced drug absorption. Therefore, in order to obtain a better understanding of structure-activity relationships for permeation enhancers, a Quantitative Structure-Activity Relationship (QSAR) model has been developed. The random forest QSAR model was based upon Caco-2 data for 41 surfactant-like permeation enhancers from Whitehead et al. (2008) and molecular descriptors calculated from their structures. The QSAR model was validated by two test sets: (i) an eleven-compound experimental set with Caco-2 data and (ii) nine compounds with Caco-2 data from the literature. Feature contributions, a recently developed diagnostic tool, were applied to elucidate the contribution of individual molecular descriptors to the predicted potency. Feature contributions provided easily interpretable suggestions of important structural properties for potent permeation enhancers, such as segregation of hydrophilic and lipophilic domains. By focusing on surfactant-like properties, it is possible to model the potency of permeation enhancers, a complex class of pharmaceutical excipients. For the first time, a QSAR model has been developed for permeation enhancement. The model is a valuable in silico approach for both screening of new permeation enhancers and physicochemical optimisation of surfactant enhancer systems.

  19. Efficient asymmetric image authentication schemes based on photon counting-double random phase encoding and RSA algorithms.

    PubMed

    Moon, Inkyu; Yi, Faliu; Han, Mingu; Lee, Jieun

    2016-06-01

    Recently, double random phase encoding (DRPE) has been integrated with the photon counting (PC) imaging technique for the purpose of secure image authentication. In this scheme, the same key should be securely distributed and shared between the sender and receiver, but this is one of the most vexing problems of symmetric cryptosystems. In this study, we propose an efficient asymmetric image authentication scheme by combining the PC-DRPE and RSA algorithms, which solves key management and distribution problems. The retrieved image from the proposed authentication method contains photon-limited encrypted data obtained by means of PC-DRPE. Therefore, the original image can be protected while the retrieved image can be efficiently verified using a statistical nonlinear correlation approach. Experimental results demonstrate the feasibility of our proposed asymmetric image authentication method. PMID:27411183
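
    The optical-encryption side is straightforward to sketch: classical DRPE applies random phase masks in the input and Fourier planes, and photon counting keeps only a Poisson-limited record of the encrypted field. The numpy sketch below shows those two steps on a random stand-in image; the RSA key-exchange layer of the paper is omitted, and the image size and photon budget are illustrative.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)
    img = rng.uniform(size=(64, 64))                 # stand-in amplitude image

    # Double random phase encoding: masks in the input and Fourier planes.
    m1 = np.exp(2j * np.pi * rng.uniform(size=img.shape))
    m2 = np.exp(2j * np.pi * rng.uniform(size=img.shape))
    enc = np.fft.ifft2(np.fft.fft2(img * m1) * m2)

    # Photon counting: a Poisson-limited record of the encrypted intensity.
    n_photons = 5000
    inten = np.abs(enc) ** 2
    pc = rng.poisson(n_photons * inten / inten.sum())   # expected total ~ n_photons

    # With both keys, decryption inverts the two masks exactly.
    dec = np.fft.ifft2(np.fft.fft2(enc) / m2) / m1
    print(pc.sum(), np.allclose(np.abs(dec), img))      # lossless with full data
    ```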

  1. Temporal optimisation of image acquisition for land cover classification with Random Forest and MODIS time-series

    NASA Astrophysics Data System (ADS)

    Nitze, Ingmar; Barrett, Brian; Cawkwell, Fiona

    2015-02-01

    The analysis and classification of land cover is one of the principal applications in terrestrial remote sensing. Due to the seasonal variability of different vegetation types and land surface characteristics, the ability to discriminate land cover types changes over time. Multi-temporal classification can help to improve classification accuracies, but different constraints, such as financial restrictions or atmospheric conditions, may impede its application. The optimisation of image acquisition timing and frequency can help to increase the effectiveness of the classification process. For this purpose, the Feature Importance (FI) measure of the state-of-the-art machine learning method Random Forest was used to determine the optimal image acquisition periods for a general (Grassland, Forest, Water, Settlement, Peatland) and a Grassland-specific (Improved Grassland, Semi-Improved Grassland) land cover classification in central Ireland, based on a 9-year time-series of MODIS Terra 16 day composite data (MOD13Q1). Feature Importances for each acquisition period of the Enhanced Vegetation Index (EVI) and Normalised Difference Vegetation Index (NDVI) were calculated for both classification scenarios. In the general land cover classification, the months December and January showed the highest, and July and August the lowest, separability for both VIs over the entire nine-year period. This temporal separability was reflected in the classification accuracies, where the optimal choice of image dates outperformed the worst image date by 13% using NDVI and 5% using EVI in a mono-temporal analysis. With the addition of the next-best image periods to the data input, the classification accuracies quickly converged to their limit at around 8-10 images. The binary classification schemes, using two classes only, showed a stronger seasonal dependency, with higher intra-annual but lower inter-annual variation. Nonetheless anomalous weather conditions, such as the cold winter of

  2. Automatic segmentation of ground-glass opacities in lung CT images by using Markov random field-based algorithms.

    PubMed

    Zhu, Yanjie; Tan, Yongqing; Hua, Yanqing; Zhang, Guozhen; Zhang, Jianguo

    2012-06-01

    Chest radiologists rely on the segmentation and quantitative analysis of ground-glass opacities (GGO) to perform imaging diagnoses that evaluate the disease severity or recovery stages of diffuse parenchymal lung diseases. However, it is computationally difficult to segment and analyze patterns of GGO, compared with other lung diseases, since GGO usually do not have clear boundaries. In this paper, we present a new approach which automatically segments GGO in lung computed tomography (CT) images using algorithms derived from Markov random field theory. Further, we systematically evaluate the performance of the algorithms in segmenting GGO in lung CT images under different situations. CT image studies from 41 patients with diffuse lung diseases were enrolled in this research. The local distributions were modeled with both simple and adaptive (AMAP) maximum a posteriori (MAP) models. For best segmentation, we used the simulated annealing algorithm with a Gibbs sampler to solve the combinatorial optimization problem of the MAP estimators, and we applied a knowledge-guided strategy to reduce false positive regions. We achieved AMAP-based GGO segmentation results of 86.94%, 94.33%, and 94.06% in average sensitivity, specificity, and accuracy, respectively, and we evaluated the performance using radiologists' subjective evaluation as well as quantitative analysis and diagnosis. We also compared the results of AMAP-based GGO segmentation with those of support vector machine-based methods, and we discuss the reliability and other issues of AMAP-based GGO segmentation. Our research results demonstrate the acceptability and usefulness of AMAP-based GGO segmentation for assisting radiologists in detecting GGO in high-resolution CT diagnostic procedures.

  3. Applicability of random sequential adsorption algorithm for simulation of surface plasma polishing kinetics

    NASA Astrophysics Data System (ADS)

    Minárik, Stanislav; Vaňa, Dušan

    2015-11-01

    The applicability of a random sequential adsorption (RSA) model to material removal during surface plasma polishing is discussed. The mechanical nature of the plasma polishing process is taken into consideration in a modified version of the RSA model. During plasma polishing, the surface layer is levelled: molecules of material are removed from the surface mechanically as a consequence of the surface deformation induced by the impact of plasma particles. We propose a modification of the RSA technique to describe the reduction of material on the surface, provided that the sequential character of molecule release from the surface is maintained throughout the polishing process. This empirical model is able to estimate the depth profile of material density on the surface during plasma polishing. We have shown that preliminary results obtained from this model are in good agreement with experimental results. We believe that molecular dynamics simulation of the polishing process, and possibly also of other types of surface treatment, can be based on this model. However, the influence of material parameters and processing conditions (including plasma characteristics) must be taken into account using appropriate model variables.
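
    For reference, the unmodified model the authors start from is the classic RSA process: objects arrive sequentially at random positions and stick only if they do not overlap anything already deposited. A minimal 2-D hard-disk version follows (box size, disk radius, and attempt count are illustrative):

    ```python
    import numpy as np

    def rsa_disks(box=1.0, radius=0.03, attempts=20000, seed=0):
        """Classic random sequential adsorption: disks arrive one at a time at
        random positions and are accepted only if they overlap no earlier disk."""
        rng = np.random.default_rng(seed)
        centers = np.empty((0, 2))
        for _ in range(attempts):
            p = rng.uniform(radius, box - radius, size=2)
            if centers.size == 0 or \
               (np.linalg.norm(centers - p, axis=1) >= 2 * radius).all():
                centers = np.vstack([centers, p])   # accepted: disk is adsorbed
        return centers

    disks = rsa_disks()
    coverage = len(disks) * np.pi * 0.03 ** 2
    print(len(disks), round(coverage, 3))  # approaches the ~0.547 jamming limit
    ```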

  4. Use of random forest to estimate population attributable fractions from a case-control study of Salmonella enterica serotype Enteritidis infections.

    PubMed

    Gu, W; Vieira, A R; Hoekstra, R M; Griffin, P M; Cole, D

    2015-10-01

    To design effective food safety programmes, we need to estimate how many sporadic foodborne illnesses are caused by specific food sources, based on case-control studies. Logistic regression has substantive limitations for analysing structured questionnaire data with numerous exposures and missing values. We adapted random forest to analyse data from a case-control study of Salmonella enterica serotype Enteritidis illness for source attribution. For estimation of summary population attributable fractions (PAFs) of exposures grouped into transmission routes, we devised a counterfactual estimator to predict the reduction in illness associated with removing grouped exposures. For comparison, we fitted the data using logistic regression models with stepwise forward and backward variable selection. Our results show that forward and backward variable selection in the logistic regression models were not consistent for parameter estimation, identifying different significant exposures. By contrast, the random forest model produced estimated PAFs of grouped exposures consistent in rank order with results obtained from outbreak data, with egg-related exposures having the highest estimated PAF (22·1%, 95% confidence interval 8·5-31·8). Random forest might be structurally more coherent and efficient than logistic regression models for attributing Salmonella illnesses to sources involving many causal pathways.
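
    The counterfactual estimator reduces to a few lines once a classifier is fitted: predict illness risk on the observed exposures, predict again with a grouped set of exposures switched off, and take the relative drop in mean risk as the summary PAF. The sketch below does this on synthetic case-control data; the exposure grouping, effect sizes, and variable names are illustrative assumptions, not the study's data.

    ```python
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    n = 2000
    X = rng.integers(0, 2, size=(n, 10))           # binary exposure questionnaire
    egg_group = [0, 1, 2]                          # hypothetical egg-related items
    risk = 0.08 + 0.12 * X[:, egg_group].max(1) + 0.05 * X[:, 5]
    y = rng.uniform(size=n) < risk                 # illness indicator

    rf = RandomForestClassifier(n_estimators=500, random_state=0).fit(X, y)

    # Counterfactual: remove the grouped exposures and re-predict illness risk.
    X_cf = X.copy()
    X_cf[:, egg_group] = 0
    p_obs = rf.predict_proba(X)[:, 1].mean()
    p_cf = rf.predict_proba(X_cf)[:, 1].mean()
    paf = (p_obs - p_cf) / p_obs                   # summary PAF for the group
    print(round(paf, 3))
    ```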

  5. Selecting Optimal Random Forest Predictive Models: A Case Study on Predicting the Spatial Distribution of Seabed Hardness.

    PubMed

    Li, Jin; Tran, Maggie; Siwabessy, Justy

    2016-01-01

    Spatially continuous predictions of seabed hardness are important baseline environmental information for sustainable management of Australia's marine jurisdiction. Seabed hardness is often inferred from multibeam backscatter data with unknown accuracy, and can be inferred from underwater video footage at limited locations. In this study, we classified the seabed into four classes based on two new seabed hardness classification schemes (i.e., hard90 and hard70). We developed optimal predictive models to predict seabed hardness using random forest (RF) based on the point data of hardness classes and spatially continuous multibeam data. Five feature selection (FS) methods, namely variable importance (VI), averaged variable importance (AVI), knowledge-informed AVI (KIAVI), Boruta and regularized RF (RRF), were tested based on predictive accuracy. Effects of highly correlated, important and unimportant predictors on the accuracy of RF predictive models were examined. Finally, spatial predictions generated using the most accurate models were visually examined and analysed. This study confirmed that: 1) hard90 and hard70 are effective seabed hardness classification schemes; 2) seabed hardness of four classes can be predicted with a high degree of accuracy; 3) the typical approach used to pre-select predictive variables by excluding highly correlated variables needs to be re-examined; 4) the identification of the important and unimportant predictors provides useful guidelines for further improving predictive models; 5) FS methods select the most accurate predictive model(s) instead of the most parsimonious ones, and AVI and Boruta are recommended for future studies; and 6) RF is an effective modelling method with high predictive accuracy for multi-level categorical data and can be applied to 'small p and large n' problems in environmental sciences. Additionally, automated computational programs for AVI need to be developed to increase its computational efficiency and

  6. Combined use of principal component analysis and random forests identify population-informative single nucleotide polymorphisms: application in cattle breeds.

    PubMed

    Bertolini, F; Galimberti, G; Calò, D G; Schiavo, G; Matassino, D; Fontanesi, L

    2015-10-01

    The genetic identification of the population of origin of individuals, including animals, has several practical applications in forensics, evolution, conservation genetics, breeding and authentication of animal products. Commercial high-density single nucleotide polymorphism (SNP) genotyping tools that have been recently developed in many species provide information from a large number of polymorphic sites that can be used to identify population-/breed-informative markers. In this study, starting from Illumina BovineSNP50 v1 BeadChip array genotyping data available from 3711 cattle of four breeds (2091 Italian Holstein, 738 Italian Brown, 475 Italian Simmental and 407 Marchigiana), principal component analysis (PCA) and random forests (RFs) were combined to identify informative SNP panels useful for cattle breed identification. From a PCA-preselected list of 580 SNPs, RFs were computed using ranking methods (Mean Decrease in the Gini Index and Mean Accuracy Decrease) to identify the most informative 48 and 96 SNPs for breed assignment. The out-of-bag (OOB) error rate for both ranking methods and SNP densities ranged from 0.0 to 0.1% in the reference population. Application of this approach in a test population (10% of individuals pre-extracted from the whole data set) achieved 100% correct assignment with both classifiers. Linkage disequilibrium between selected SNPs was relevant (r^2 > 0.6) only in a few pairs of markers, indicating that most of the selected SNPs captured different fractions of variance. Several informative SNPs were in genes/QTL regions that affect or are associated with phenotypes or production traits that might differentiate the investigated breeds. The combination of PCA and RF to perform SNP selection and breed assignment can be easily implemented and is able to identify subsets of informative SNPs useful for population assignment starting from a large number of markers derived by high-throughput genotyping platforms.
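
    A rough sketch of the two-step workflow on invented genotypes: SNPs are preselected by their PCA loadings, the preselected set is ranked by Mean Decrease in Gini (scikit-learn's feature_importances_), and the resulting 48-SNP panel is scored by its out-of-bag error. All sample sizes and allele frequencies are made up.

    ```python
    # PCA preselection followed by random-forest ranking, as a rough sketch
    # of the SNP-panel workflow (synthetic 0/1/2 genotypes, 4 "breeds").
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(2)
    n, p, k = 600, 2000, 4
    breed = rng.integers(0, k, n)
    freq = rng.uniform(0.05, 0.95, size=(k, p))      # breed-specific allele freqs
    X = rng.binomial(2, freq[breed])                 # genotype matrix

    # 1) PCA: keep SNPs loading strongly on the leading components
    pca = PCA(n_components=10).fit(X)
    loading = np.abs(pca.components_).max(axis=0)
    pre = np.argsort(loading)[-500:]                 # preselected SNP list

    # 2) RF: rank preselected SNPs by Mean Decrease in Gini
    rf = RandomForestClassifier(n_estimators=500, oob_score=True,
                                random_state=0).fit(X[:, pre], breed)
    panel48 = pre[np.argsort(rf.feature_importances_)[-48:]]

    rf48 = RandomForestClassifier(n_estimators=500, oob_score=True,
                                  random_state=0).fit(X[:, panel48], breed)
    print(f"OOB error with 48-SNP panel: {1 - rf48.oob_score_:.3%}")
    ```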

  8. Robust prediction of B-factor profile from sequence using two-stage SVR based on random forest feature selection.

    PubMed

    Pan, Xiao-Yong; Shen, Hong-Bin

    2009-01-01

    B-factor is highly correlated with protein internal motion and is used to measure the uncertainty in the position of an atom within a crystal structure. Although the rapid progress of structural biology in recent years makes more accurate protein structures available than ever, with the avalanche of new protein sequences emerging during the post-genomic era, the gap between the known protein sequences and the known protein structures grows wider and wider. It is urgent to develop automated methods to predict the B-factor profile from the amino acid sequence directly, so that such predictions can be utilized for basic research in a timely manner. In this article, we propose a novel approach, called PredBF, to predict the real value of the B-factor. We first extract both global and local features from the protein sequences as well as their evolution information; then random forest feature selection is applied to rank their importance, and the most important features are input to a two-stage support vector regression (SVR) for prediction, where the initial predicted outputs from the first SVR are further input to the second-layer SVR for final refinement. Our results reveal that a systematic analysis of the importance of different features gives deep insights into the different contributions of the features and is very necessary for developing effective B-factor prediction tools. The two-layer SVR prediction model designed in this study further enhanced the robustness of predicting the B-factor profile. As a web server, PredBF is freely available at: http://www.csbio.sjtu.edu.cn/bioinf/PredBF for academic use.
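
    The two-stage architecture, an RF ranking followed by an SVR whose first-stage predictions are fed back as an extra input to a second SVR, might be outlined as below. The features and targets are synthetic, not the PredBF feature set.

    ```python
    # Two-stage SVR with random-forest feature ranking, roughly following
    # the recipe described above (synthetic features and targets).
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVR

    rng = np.random.default_rng(3)
    X = rng.normal(size=(800, 60))                    # sequence-derived features
    y = X[:, :5] @ rng.normal(size=5) + 0.3 * rng.normal(size=800)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # stage 0: rank features with a random forest and keep the top 20
    rank = RandomForestRegressor(n_estimators=300, random_state=0).fit(X_tr, y_tr)
    top = np.argsort(rank.feature_importances_)[-20:]

    # stage 1: SVR on the selected features
    svr1 = SVR(kernel="rbf").fit(X_tr[:, top], y_tr)
    stage1_tr = svr1.predict(X_tr[:, top])
    stage1_te = svr1.predict(X_te[:, top])

    # stage 2: refine by feeding the stage-1 prediction back in as a feature
    X2_tr = np.column_stack([X_tr[:, top], stage1_tr])
    X2_te = np.column_stack([X_te[:, top], stage1_te])
    svr2 = SVR(kernel="rbf").fit(X2_tr, y_tr)
    cc = np.corrcoef(svr2.predict(X2_te), y_te)[0, 1]
    print(f"test correlation after the second stage: {cc:.3f}")
    ```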

  9. Using Logistic Regression and Random Forests multivariate statistical methods for landslide spatial probability assessment in North-East Sicily, Italy

    NASA Astrophysics Data System (ADS)

    Trigila, Alessandro; Iadanza, Carla; Esposito, Carlo; Scarascia-Mugnozza, Gabriele

    2015-04-01

    The first phase of the work aimed to identify the spatial relationships between the landslide locations and the 13 related factors by using the Frequency Ratio bivariate statistical method. The analysis was then carried out by adopting a multivariate statistical approach, according to the Logistic Regression technique and the Random Forests technique, which gave the best results in terms of AUC. The models were performed and evaluated with different sample sizes, also taking into account the temporal variation of input variables such as areas burned by wildfire. The most significant outcomes of this work are: the relevant influence of the sample size on the model results and the strong importance of some environmental factors (e.g. land use and wildfires) for the identification of the depletion zones of extremely rapid shallow landslides.

  10. Combining random forest and 2D correlation analysis to identify serum spectral signatures for neuro-oncology.

    PubMed

    Smith, Benjamin R; Ashton, Katherine M; Brodbelt, Andrew; Dawson, Timothy; Jenkinson, Michael D; Hunt, Neil T; Palmer, David S; Baker, Matthew J

    2016-06-01

    Fourier transform infrared (FTIR) spectroscopy has long been established as an analytical technique for the measurement of vibrational modes of molecular systems. More recently, FTIR has been used for the analysis of biofluids with the aim of becoming a tool to aid diagnosis. For the clinician, this represents a convenient, fast, non-subjective option for the study of biofluids and the diagnosis of disease states. The patient also benefits from this method, as the procedure for the collection of serum is much less invasive and stressful than traditional biopsy. This is especially true of patients in whom brain cancer is suspected. A brain biopsy is very unpleasant for the patient, potentially dangerous and can occasionally be inconclusive. We therefore present a method for the diagnosis of brain cancer from serum samples using FTIR and machine learning techniques. The study involved 433 patients, from each of whom 9 spectra were collected in the range 600-4000 cm^-1. To begin the development of the novel method, various pre-processing steps were investigated and ranked in terms of final accuracy of the diagnosis. Random forest machine learning was utilised as a classifier to separate patients into cancer or non-cancer categories based upon the intensities of wavenumbers present in their spectra. Generalised 2D correlation analysis was then employed to further augment the machine learning, and also to establish spectral features important for the distinction between cancer and non-cancer serum samples. Using these methods, sensitivities of up to 92.8% and specificities of up to 91.5% were possible. Furthermore, ratiometrics were also investigated in order to establish any correlations present in the dataset. We show a rapid, computationally light, accurate, statistically robust methodology for the identification of spectral features present in differing disease states. With current advances in IR technology, such as the development of rapid discrete

  11. Random forests for feature selection in QSPR Models - an application for predicting standard enthalpy of formation of hydrocarbons

    PubMed Central

    2013-01-01

    Background One of the main topics in the development of quantitative structure-property relationship (QSPR) predictive models is the identification of the subset of variables that represent the structure of a molecule and which are predictors for a given property. There are several automated feature selection methods, ranging from backward, forward or stepwise procedures to further elaborated methodologies such as evolutionary programming. The problem lies in selecting the minimum subset of descriptors that can predict a certain property with good performance, in a computationally efficient and robust way, since the presence of irrelevant or redundant features can cause poor generalization capacity. In this paper an alternative selection method, based on Random Forests to determine variable importance, is proposed in the context of QSPR regression problems, with an application to a manually curated dataset for predicting standard enthalpy of formation. The subsequent predictive models are trained with support vector machines, introducing the variables sequentially from a ranked list based on the variable importance. Results The model generalizes well even with a high-dimensional dataset and in the presence of highly correlated variables. The feature selection step was shown to yield lower prediction errors, with RMSE values 23% lower than without feature selection, albeit using only 6% of the total number of variables (89 from the original 1485). The proposed approach further compared favourably with other feature selection methods and dimension reduction of the feature space. The predictive model was selected using a 10-fold cross validation procedure and, after selection, it was validated with an independent set to assess its performance when applied to new data; the results were similar to the ones obtained for the training set, supporting the robustness of the proposed approach. Conclusions The proposed methodology seemingly improves the prediction
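
    The sequential-introduction step, training support vector models on growing prefixes of the RF-ranked descriptor list and keeping the prefix with the lowest cross-validated error, might look like the following sketch (synthetic descriptors; SVR stands in for the paper's support vector machines):

    ```python
    # Ranked-list forward introduction of variables into a support vector
    # model, in the spirit of the QSPR workflow above (synthetic descriptors).
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVR

    rng = np.random.default_rng(4)
    X = rng.normal(size=(300, 100))                  # molecular descriptors
    y = X[:, :8] @ rng.normal(size=8) + 0.2 * rng.normal(size=300)

    order = np.argsort(RandomForestRegressor(n_estimators=300, random_state=0)
                       .fit(X, y).feature_importances_)[::-1]

    scores = []
    for k in range(1, 31):                           # add descriptors one by one
        cols = order[:k]
        rmse = -cross_val_score(SVR(kernel="rbf"), X[:, cols], y, cv=10,
                                scoring="neg_root_mean_squared_error").mean()
        scores.append((rmse, k))

    best_rmse, best_k = min(scores)
    print(f"lowest 10-fold RMSE {best_rmse:.3f} with {best_k} descriptors")
    ```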

  12. An evidence gathering and assessment technique designed for a forest cover classification algorithm based on the Dempster-Shafer theory of evidence

    NASA Astrophysics Data System (ADS)

    Szymanski, David Lawrence

    This thesis presents a new approach for classifying Landsat 5 Thematic Mapper (TM) imagery that utilizes digitally represented, non-spectral data in the classification step. A classification algorithm based on the Dempster-Shafer theory of evidence is developed and tested for its ability to provide an accurate representation of forest cover on the ground at the Anderson et al (1976) level II. The research focuses on defining an objective, systematic method of gathering and assessing the evidence from digital sources including TM data, the normalized difference vegetation index, soils, slope, aspect, and elevation. The algorithm is implemented using the ESRI ArcView Spatial Analyst software package and the Grid spatial data structure, with software coded in both ArcView Avenue and C. The methodology uses frequency-of-occurrence information to gather evidence and also introduces measures of evidence quality that quantify the ability of the evidence source to differentiate the Anderson forest cover classes. The measures are derived objectively and empirically and are based on common principles of legal argument. The evidence assessment measures augment the Dempster-Shafer theory, and the research will determine whether they provide an argument that is sound, credible, and consistent. This research produces a method for identifying, assessing, and combining evidence sources using the Dempster-Shafer theory that results in a classified image containing the Anderson forest cover class. Test results indicate that the new classifier performs with accuracy similar to the traditional maximum likelihood approach. However, confusion among the deciduous and mixed classes remains. The utility of the evidence gathering method and the evidence assessment method is demonstrated and confirmed. The algorithm presents an operational method of using the Dempster-Shafer theory of evidence for forest classification.
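
    Dempster's rule of combination, the operation such a classifier applies per pixel, is compact enough to show directly. The two mass functions below, defined over a toy frame of deciduous/coniferous/mixed cover, are hypothetical numbers for illustration only:

    ```python
    # Dempster's rule of combination for two mass functions over subsets of
    # a small frame of discernment (hypothetical masses for illustration).

    def combine(m1, m2):
        """Combine two mass functions keyed by frozenset focal elements."""
        combined, conflict = {}, 0.0
        for a, wa in m1.items():
            for b, wb in m2.items():
                inter = a & b
                if inter:
                    combined[inter] = combined.get(inter, 0.0) + wa * wb
                else:
                    conflict += wa * wb            # mass lost to conflict
        return {s: w / (1.0 - conflict) for s, w in combined.items()}

    D, C, M = "deciduous", "coniferous", "mixed"
    spectral = {frozenset({D}): 0.5, frozenset({D, M}): 0.3,
                frozenset({D, C, M}): 0.2}         # evidence from TM bands
    terrain = {frozenset({D, M}): 0.6, frozenset({C}): 0.1,
               frozenset({D, C, M}): 0.3}          # evidence from slope/aspect

    for focal, mass in sorted(combine(spectral, terrain).items(),
                              key=lambda t: -t[1]):
        print(set(focal), round(mass, 3))
    ```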

  13. Monitoring grass nutrients and biomass as indicators of rangeland quality and quantity using random forest modelling and WorldView-2 data

    NASA Astrophysics Data System (ADS)

    Ramoelo, Abel; Cho, M. A.; Mathieu, R.; Madonsela, S.; van de Kerchove, R.; Kaszta, Z.; Wolff, E.

    2015-12-01

    Land use and climate change could have huge impacts on food security and the health of various ecosystems. Leaf nitrogen (N) and above-ground biomass are some of the key factors limiting agricultural production and ecosystem functioning. Leaf N and biomass can be used as indicators of rangeland quality and quantity. Conventional methods for assessing these vegetation parameters at the landscape scale are time consuming and tedious. Remote sensing provides a bird's-eye view of the landscape, which creates an opportunity to assess these vegetation parameters over wider rangeland areas. Estimation of leaf N has been successful during peak productivity or high biomass, while few studies have estimated leaf N in the dry season. The estimation of above-ground biomass has been hindered by signal saturation problems when using conventional vegetation indices. The objective of this study is to monitor leaf N and above-ground biomass as indicators of rangeland quality and quantity using WorldView-2 satellite images and the random forest technique in the north-eastern part of South Africa. A series of field campaigns to collect samples for leaf N and biomass was undertaken in March 2013, April or May 2012 (end of wet season) and July 2012 (dry season). Several conventional and red edge based vegetation indices were computed. Overall results indicate that random forest and vegetation indices explained over 89% of leaf N concentrations for grass and trees, and less than 89% for all the years of assessment. The red edge based vegetation indices were among the important variables for predicting leaf N. For the biomass, the random forest model explained over 84% of biomass variation in all years, and visible bands as well as red edge based vegetation indices were found to be important. The study demonstrated that leaf N can be monitored using high spatial resolution imagery with red edge band capability, which is important for rangeland assessment and monitoring.

  14. The potential of random forest and neural networks for biomass and recombinant protein modeling in Escherichia coli fed-batch fermentations.

    PubMed

    Melcher, Michael; Scharl, Theresa; Spangl, Bernhard; Luchner, Markus; Cserjan, Monika; Bayer, Karl; Leisch, Friedrich; Striedner, Gerald

    2015-09-01

    Product quality assurance strategies in production of biopharmaceuticals currently undergo a transformation from empirical "quality by testing" to rational, knowledge-based "quality by design" approaches. The major challenges in this context are the fragmentary understanding of bioprocesses and the severely limited real-time access to process variables related to product quality and quantity. Data-driven modeling of process variables in combination with model predictive process control concepts represents a potential solution to these problems. The selection of statistical techniques best qualified for bioprocess data analysis and modeling is a key criterion. In this work a series of recombinant Escherichia coli fed-batch production processes with varying cultivation conditions employing a comprehensive on- and offline process monitoring platform was conducted. The applicability of two machine learning methods, random forest and neural networks, for the prediction of cell dry mass and recombinant protein based on online available process parameters and two-dimensional multi-wavelength fluorescence spectroscopy is investigated. Models solely based on routinely measured process variables give a satisfying prediction accuracy of about ± 4% for the cell dry mass, while additional spectroscopic information allows for an estimation of the protein concentration within ± 12%. The results clearly argue for a combined approach: neural networks as modeling technique and random forest as variable selection tool. PMID:26121295
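
    The combined approach argued for in the closing sentence, a random forest to select variables and a neural network to model them, can be sketched as follows; the process variables are simulated and MLPRegressor is merely a stand-in for the authors' network:

    ```python
    # Combined approach: random forest for variable selection, neural
    # network for the model (synthetic fermentation-style data).
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPRegressor
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(5)
    X = rng.normal(size=(500, 40))               # online process + fluorescence vars
    y = X[:, [0, 3, 7]] @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=500)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # keep the ten variables the forest considers most important
    keep = np.argsort(RandomForestRegressor(n_estimators=300, random_state=0)
                      .fit(X_tr, y_tr).feature_importances_)[-10:]

    nn = make_pipeline(StandardScaler(),
                       MLPRegressor(hidden_layer_sizes=(32,), max_iter=5000,
                                    random_state=0))
    nn.fit(X_tr[:, keep], y_tr)
    print(f"test R^2 with RF-selected variables: {nn.score(X_te[:, keep], y_te):.3f}")
    ```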

  15. Analysis and Recognition of Traditional Chinese Medicine Pulse Based on the Hilbert-Huang Transform and Random Forest in Patients with Coronary Heart Disease.

    PubMed

    Guo, Rui; Wang, Yiqin; Yan, Hanxia; Yan, Jianjun; Yuan, Fengyin; Xu, Zhaoxia; Liu, Guoping; Xu, Wenjie

    2015-01-01

    Objective. This research provides objective and quantitative parameters of the traditional Chinese medicine (TCM) pulse conditions for distinguishing between patients with the coronary heart disease (CHD) and normal people by using the proposed classification approach based on Hilbert-Huang transform (HHT) and random forest. Methods. The energy and the sample entropy features were extracted by applying the HHT to TCM pulse by treating these pulse signals as time series. By using the random forest classifier, the extracted two types of features and their combination were, respectively, used as input data to establish classification model. Results. Statistical results showed that there were significant differences in the pulse energy and sample entropy between the CHD group and the normal group. Moreover, the energy features, sample entropy features, and their combination were inputted as pulse feature vectors; the corresponding average recognition rates were 84%, 76.35%, and 90.21%, respectively. Conclusion. The proposed approach could be appropriately used to analyze pulses of patients with CHD, which can lay a foundation for research on objective and quantitative criteria on disease diagnosis or Zheng differentiation.

  17. Classification of savanna tree species, in the Greater Kruger National Park region, by integrating hyperspectral and LiDAR data in a Random Forest data mining environment

    NASA Astrophysics Data System (ADS)

    Naidoo, L.; Cho, M. A.; Mathieu, R.; Asner, G.

    2012-04-01

    The accurate classification and mapping of individual trees at species level in the savanna ecosystem can provide numerous benefits for the managerial authorities. Such benefits include the mapping of economically useful tree species, which are a key source of food production and fuel wood for the local communities, and of problematic alien invasive and bush encroaching species, which can threaten the integrity of the environment and livelihoods of the local communities. Species level mapping is particularly challenging in African savannas, which are complex, heterogeneous and open environments with high intra-species spectral variability due to differences in geology, topography, rainfall, herbivory and human impacts within relatively short distances. Savanna vegetation is also highly irregular in canopy and crown shape, height and other structural dimensions, with a combination of open grassland patches and dense woody thicket - a stark contrast to the more homogeneous forest vegetation. This study classified eight common savanna tree species in the Greater Kruger National Park region, South Africa, using a combination of hyperspectral and Light Detection and Ranging (LiDAR)-derived structural parameters, in the form of seven predictor datasets, in an automated Random Forest modelling approach. The most important predictors, which were found to play an important role in the different classification models and contributed to the success of the hybrid dataset model when combined, were tree height; NDVI; the chlorophyll b wavelength (466 nm) and a selection of raw, continuum-removed and Spectral Angle Mapper (SAM) bands. It was also concluded that the hybrid predictor dataset Random Forest model yielded the highest classification accuracy and prediction success for the eight savanna tree species, with an overall classification accuracy of 87.68% and a KHAT value of 0.843.

  18. Effect of sample size on multi-parametric prediction of tissue outcome in acute ischemic stroke using a random forest classifier

    NASA Astrophysics Data System (ADS)

    Forkert, Nils Daniel; Fiehler, Jens

    2015-03-01

    The tissue outcome prediction in acute ischemic stroke patients is highly relevant for clinical and research purposes. It has been shown that the combined analysis of diffusion and perfusion MRI datasets using high-level machine learning techniques leads to an improved prediction of final infarction compared to single perfusion parameter thresholding. However, most high-level classifiers require previous training and, until now, it has been unclear how many subjects are required for this, which is the focus of this work. 23 MRI datasets of acute stroke patients with known tissue outcome were used in this work. Relative values of diffusion and perfusion parameters as well as the binary tissue outcome were extracted on a voxel-by-voxel level for all patients and used for training of a random forest classifier. The number of patients used for training set definition was iteratively and randomly reduced from using all 22 other patients to only one other patient. Thus, 22 tissue outcome predictions were generated for each patient using the trained random forest classifiers and compared to the known tissue outcome using the Dice coefficient. Overall, a logarithmic relation between the number of patients used for training set definition and tissue outcome prediction accuracy was found. Quantitatively, a mean Dice coefficient of 0.45 was found for the prediction using the training set consisting of the voxel information from only one other patient, which increases to 0.53 if using all other patients (n=22). Based on extrapolation, 50-100 patients appear to be a reasonable tradeoff between tissue outcome prediction accuracy and the effort required for data acquisition and preparation.
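
    The shape of the experiment, predicting a held-out patient's voxels from forests trained on a growing number of other patients and scoring with the Dice coefficient, is easy to mock up. The voxel features below are synthetic, not MRI-derived:

    ```python
    # Dice coefficient between predicted and known binary tissue outcome as a
    # function of how many "patients" contribute voxels to the training set.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(6)

    def make_patient(n_vox=2000):
        X = rng.normal(size=(n_vox, 6))          # relative diffusion/perfusion values
        y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(0, 0.5, n_vox) > 1).astype(int)
        return X, y

    def dice(a, b):
        return 2 * np.logical_and(a, b).sum() / (a.sum() + b.sum())

    patients = [make_patient() for _ in range(23)]
    X_test, y_test = patients[0]                 # held-out patient

    for n_train in (1, 5, 22):
        Xs, ys = zip(*patients[1:1 + n_train])
        rf = RandomForestClassifier(n_estimators=200, random_state=0)
        rf.fit(np.vstack(Xs), np.concatenate(ys))
        print(n_train, "training patients -> Dice",
              round(dice(rf.predict(X_test), y_test), 3))
    ```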

  19. Pharmacogenetics-based warfarin dosing algorithm decreases time to stable anticoagulation and the risk of major hemorrhage: an updated meta-analysis of randomized controlled trials.

    PubMed

    Wang, Zhi-Quan; Zhang, Rui; Zhang, Peng-Pai; Liu, Xiao-Hong; Sun, Jian; Wang, Jun; Feng, Xiang-Fei; Lu, Qiu-Fen; Li, Yi-Gang

    2015-04-01

    Warfarin is still the most widely used oral anticoagulant for thromboembolic diseases, despite the recent emergence of novel anticoagulants. However, the difficulty of maintaining a stable dose within the therapeutic range and the subsequent serious adverse effects have markedly limited its use in clinical practice. A pharmacogenetics-based warfarin dosing algorithm is a recently emerged strategy to predict the initial and maintenance doses of warfarin. However, whether this algorithm is superior to the conventional clinically guided dosing algorithm remains controversial. We compared pharmacogenetics-based and clinically guided dosing algorithms in an updated meta-analysis. We searched OVID MEDLINE, EMBASE, and the Cochrane Library for relevant citations. The primary outcome was the percentage of time in the therapeutic range. The secondary outcomes were the time to stable therapeutic dose and the risks of adverse events including all-cause mortality, thromboembolic events, total bleedings, and major bleedings. Eleven randomized controlled trials with 2639 participants were included. Our pooled estimates indicated that the pharmacogenetics-based dosing algorithm did not improve the percentage of time in the therapeutic range [weighted mean difference, 4.26; 95% confidence interval (CI), -0.50 to 9.01; P = 0.08], but it significantly shortened the time to stable therapeutic dose (weighted mean difference, -8.67; 95% CI, -11.86 to -5.49; P < 0.00001). Additionally, the pharmacogenetics-based algorithm significantly reduced the risk of major bleedings (odds ratio, 0.48; 95% CI, 0.23 to 0.98; P = 0.04), but it did not reduce the risks of all-cause mortality, total bleedings, or thromboembolic events. Our results suggest that the pharmacogenetics-based warfarin dosing algorithm significantly improves the efficiency of International Normalized Ratio correction and reduces the risk of major hemorrhage.

  20. Landslide susceptibility assessment in Lianhua County (China): A comparison between a random forest data mining technique and bivariate and multivariate statistical models

    NASA Astrophysics Data System (ADS)

    Hong, Haoyuan; Pourghasemi, Hamid Reza; Pourtaghi, Zohre Sadat

    2016-04-01

    Landslides are an important natural hazard that causes a great amount of damage around the world every year, especially during the rainy season. The Lianhua area is located in the middle of China's southern mountainous area, west of Jiangxi Province, and is known to be an area prone to landslides. The aim of this study was to evaluate and compare landslide susceptibility maps produced using the random forest (RF) data mining technique with those produced by bivariate (evidential belief function and frequency ratio) and multivariate (logistic regression) statistical models for Lianhua County, China. First, a landslide inventory map was prepared using aerial photograph interpretation, satellite images, and extensive field surveys. In total, 163 landslide events were recognized in the study area, with 114 landslides (70%) used for training and 49 landslides (30%) used for validation. Next, the landslide conditioning factors, including the slope angle, altitude, slope aspect, topographic wetness index (TWI), slope-length (LS), plan curvature, profile curvature, distance to rivers, distance to faults, distance to roads, annual precipitation, land use, normalized difference vegetation index (NDVI), and lithology, were derived from the spatial database. Finally, the landslide susceptibility maps of Lianhua County were generated in ArcGIS 10.1 based on the random forest (RF), evidential belief function (EBF), frequency ratio (FR), and logistic regression (LR) approaches and were validated using a receiver operating characteristic (ROC) curve. The ROC plot assessment results showed that for landslide susceptibility maps produced using the EBF, FR, LR, and RF models, the area under the curve (AUC) values were 0.8122, 0.8134, 0.7751, and 0.7172, respectively. Therefore, we can conclude that all four models have an AUC of more than 0.70 and can be used in landslide susceptibility mapping in the study area; meanwhile, the EBF and FR models had the best performance for Lianhua County.
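
    The validation step, ranking susceptibility models by the area under the ROC curve, takes only a few lines with scikit-learn; logistic regression and a random forest below stand in for the four models, applied to synthetic conditioning factors:

    ```python
    # Comparing susceptibility models by ROC AUC, as in the validation step
    # described above (stand-in classifiers on synthetic conditioning factors).
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.metrics import roc_auc_score
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=1630, n_features=14, n_informative=6,
                               random_state=7)      # 14 conditioning factors
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

    models = {"LR": LogisticRegression(max_iter=1000),
              "RF": RandomForestClassifier(n_estimators=500, random_state=0)}
    for name, model in models.items():
        prob = model.fit(X_tr, y_tr).predict_proba(X_te)[:, 1]
        print(f"{name}: AUC = {roc_auc_score(y_te, prob):.4f}")
    ```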

  1. Reliable rain rates from optical satellite sensors - a random forests-based approach for the hourly retrieval of rainfall rates from Meteosat SEVIRI

    NASA Astrophysics Data System (ADS)

    Kühnlein, Meike; Appelhans, Tim; Thies, Boris; Nauss, Thomas

    2013-04-01

    Many ecological and biodiversity-oriented projects require area-wide precipitation information, and satellite-based rainfall retrievals are often the only option. Using optical and microphysical cloud property retrievals, area-wide information about the distribution of precipitating clouds can generally be provided from optical sensors aboard geostationary (GEO) weather satellites. However, the retrieval of rainfall amounts at high spatio-temporal resolution from such sensors bears large uncertainties. In existing optical retrievals, the rainfall rate is generally retrieved as a function of the cloud-top temperature, which leads to sufficient results for deep-convective systems, but such a concept is inappropriate for any kind of advective/stratiform precipitation formation process. To overcome this drawback, several authors have suggested using optical and microphysical cloud parameters not only for the rain-area delineation but also for the rain rate retrieval. In the present study, a method has been developed to estimate hourly rainfall rates using cloud physical properties retrieved from MSG SEVIRI data. The rainfall rate assignment is realized by using an ensemble classification and regression technique called random forests. This method is already widely established in other disciplines, but has not yet been utilized extensively by climatologists. Random forests is used to assign rainfall rates to already identified rain areas in a two-step approach. First, the rain area is separated into areas of different precipitation processes. Next, rainfall rates are assigned to these areas. For the development and validation of the new technique, radar-based precipitation data of the German Weather Service are used. The so-called RADOLAN RW product provides gauge-adjusted precipitation amounts at a temporal resolution of one hour. Germany is chosen as the study area of the new technique. The region can be regarded as sufficiently representative of mid-latitude precipitation.
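
    The two-step assignment can be sketched as one classifier that splits the rain area into process types, followed by one regressor per type. The cloud properties and rain rates below are simulated, not SEVIRI or RADOLAN data:

    ```python
    # Two-step assignment: an RF classifier separates the rain area into
    # process classes, then one RF regressor per class assigns rain rates.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

    rng = np.random.default_rng(8)
    X = rng.normal(size=(5000, 8))                  # cloud optical/microphysical vars
    process = (X[:, 0] > 0).astype(int)             # 0 = stratiform, 1 = convective
    rate = np.where(process == 1,
                    np.exp(1.0 + 0.8 * X[:, 1]),    # heavier convective rain
                    np.exp(0.2 + 0.3 * X[:, 1]))    # lighter stratiform rain

    clf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X, process)
    regs = {c: RandomForestRegressor(n_estimators=300, random_state=0)
                .fit(X[process == c], rate[process == c]) for c in (0, 1)}

    pred_class = clf.predict(X)
    pred_rate = np.empty_like(rate)
    for c, reg in regs.items():
        mask = pred_class == c
        pred_rate[mask] = reg.predict(X[mask])
    print("mean absolute error [mm/h]:", round(np.abs(pred_rate - rate).mean(), 3))
    ```

    In practice the two stages would of course be validated against independent radar data rather than the training pixels used here.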

  2. Influence of multi-source and multi-temporal remotely sensed and ancillary data on the accuracy of random forest classification of wetlands in northern Minnesota

    USGS Publications Warehouse

    Corcoran, Jennifer M.; Knight, Joseph F.; Gallant, Alisa L.

    2013-01-01

    Wetland mapping at the landscape scale using remotely sensed data requires both affordable data and an efficient, accurate classification method. Random forest classification offers several advantages over traditional land cover classification techniques, including a bootstrapping technique to generate robust estimations of outliers in the training data, as well as the capability of measuring classification confidence. Though the random forest classifier can generate complex decision trees with a multitude of input data and still not run a high risk of overfitting, there is a great need to reduce computational and operational costs by including only key input data sets without sacrificing a significant level of accuracy. Our main questions for this study site in Northern Minnesota were: (1) how does classification accuracy and confidence of mapping wetlands compare using different remote sensing platforms and sets of input data; (2) what are the key input variables for accurate differentiation of upland, water, and wetlands, including wetland type; and (3) which datasets and seasonal imagery yield the best accuracy for wetland classification. Our results show the key input variables include terrain (elevation and curvature) and soils descriptors (hydric), along with an assortment of remotely sensed data collected in the spring (satellite visible, near infrared, and thermal bands; satellite normalized vegetation index and Tasseled Cap greenness and wetness; and horizontal-horizontal (HH) and horizontal-vertical (HV) polarization using L-band satellite radar). We undertook this exploratory analysis to inform decisions by natural resource managers charged with monitoring wetland ecosystems and to aid in designing a system for consistent operational mapping of wetlands across landscapes similar to those found in Northern Minnesota.

  3. Estimating the granularity coefficient of a Potts-Markov random field within a Markov chain Monte Carlo algorithm.

    PubMed

    Pereyra, Marcelo; Dobigeon, Nicolas; Batatia, Hadj; Tourneret, Jean-Yves

    2013-06-01

    This paper addresses the problem of estimating the Potts parameter β jointly with the unknown parameters of a Bayesian model within a Markov chain Monte Carlo (MCMC) algorithm. Standard MCMC methods cannot be applied to this problem because performing inference on β requires computing the intractable normalizing constant of the Potts model. In the proposed MCMC method, the estimation of β is conducted using a likelihood-free Metropolis-Hastings algorithm. Experimental results obtained for synthetic data show that estimating β jointly with the other unknown parameters leads to estimation results that are as good as those obtained with the actual value of β. On the other hand, choosing an incorrect value of β can degrade estimation performance significantly. To illustrate the interest of this method, the proposed algorithm is successfully applied to real two-dimensional SAR and three-dimensional ultrasound images.
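
    The device that makes inference on β possible, simulating an auxiliary field at the proposed β so that the intractable normalizing constants cancel in the Metropolis-Hastings ratio, can be shown on a toy Potts grid. The sketch below is an exchange-algorithm-style variant with deliberately short, slow pure-Python runs; the authors' likelihood-free scheme differs in its details:

    ```python
    # Exchange-algorithm-style Metropolis-Hastings for the Potts parameter
    # beta: the normalising constants Z(beta) cancel in the acceptance ratio
    # once an auxiliary field is simulated at the proposed beta.
    import numpy as np

    rng = np.random.default_rng(9)
    K, N = 3, 24                                   # Potts labels, grid size

    def neighbours_agree(z):
        """S(z): number of equal-label neighbour pairs (the Potts statistic)."""
        return (z[:-1, :] == z[1:, :]).sum() + (z[:, :-1] == z[:, 1:]).sum()

    def gibbs_field(beta, sweeps=20):
        """Approximate draw from the Potts model via Gibbs sweeps."""
        z = rng.integers(0, K, size=(N, N))
        for _ in range(sweeps):
            for i in range(N):
                for j in range(N):
                    counts = np.zeros(K)
                    for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        a, b = i + di, j + dj
                        if 0 <= a < N and 0 <= b < N:
                            counts[z[a, b]] += 1
                    w = np.exp(beta * counts)
                    z[i, j] = rng.choice(K, p=w / w.sum())
        return z

    x = gibbs_field(0.8)                           # "observed" field at beta = 0.8
    s_x, beta = neighbours_agree(x), 0.5
    for _ in range(100):                           # MH chain on beta
        prop = beta + rng.normal(0, 0.1)
        if prop <= 0:
            continue                               # reject proposals outside the prior
        aux = gibbs_field(prop)                    # auxiliary field at proposed beta
        # unnormalised likelihood ratios: (prop - beta) * (S(x) - S(aux))
        log_alpha = (prop - beta) * (s_x - neighbours_agree(aux))
        if np.log(rng.uniform()) < log_alpha:
            beta = prop
    print("draw from the posterior over beta:", round(beta, 3))
    ```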

  4. Cubic-scaling algorithm and self-consistent field for the random-phase approximation with second-order screened exchange

    SciTech Connect

    Moussa, Jonathan E.

    2014-01-07

    The random-phase approximation with second-order screened exchange (RPA+SOSEX) is a model of electron correlation energy with two caveats: its accuracy depends on an arbitrary choice of mean field, and it scales as O(n^5) operations and O(n^3) memory for n electrons. We derive a new algorithm that reduces its scaling to O(n^3) operations and O(n^2) memory using controlled approximations and a new self-consistent field that approximates Brueckner coupled-cluster doubles theory with RPA+SOSEX, referred to as Brueckner RPA theory. The algorithm comparably reduces the scaling of second-order Møller-Plesset perturbation theory with smaller cost prefactors than RPA+SOSEX. Within a semiempirical model, we study H_2 dissociation to test accuracy and H_n rings to verify scaling.

  5. Free variable selection QSPR study to predict 19F chemical shifts of some fluorinated organic compounds using Random Forest and RBF-PLS methods

    NASA Astrophysics Data System (ADS)

    Goudarzi, Nasser

    2016-04-01

    In this work, two new and powerful chemometrics methods are applied for the modeling and prediction of the 19F chemical shift values of some fluorinated organic compounds. The radial basis function-partial least squares (RBF-PLS) and random forest (RF) methods are employed to construct models to predict the 19F chemical shifts. No separate variable selection method was used in this study, since the RF method can serve as both a variable selection and a modeling technique. Effects of the important parameters affecting the prediction power of RF, such as the number of trees (nt) and the number of randomly selected variables used to split each node (m), were investigated. The root-mean-square errors of prediction (RMSEP) for the training set and the prediction set for the RBF-PLS and RF models were 44.70, 23.86, 29.77, and 23.69, respectively. Also, the correlation coefficients of the prediction set for the RBF-PLS and RF models were 0.8684 and 0.9313, respectively. The results obtained reveal that the RF model can be used as a powerful chemometrics tool for quantitative structure-property relationship (QSPR) studies.
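
    The two tuning parameters named above, the number of trees (nt) and the number of randomly selected variables per split (m), correspond to n_estimators and max_features in scikit-learn; a small grid on synthetic descriptors shows how one might probe them (the RMSEP values are meaningful only for the toy data):

    ```python
    # Effect of the two RF tuning knobs, the number of trees (nt) and the
    # number of variables tried at each split (m), on prediction error.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.metrics import mean_squared_error
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(10)
    X = rng.normal(size=(250, 30))                 # synthetic descriptors
    y = X[:, :6] @ rng.normal(size=6) + 0.5 * rng.normal(size=250)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for nt in (100, 500):
        for m in (5, 10, "sqrt"):
            rf = RandomForestRegressor(n_estimators=nt, max_features=m,
                                       random_state=0).fit(X_tr, y_tr)
            rmsep = mean_squared_error(y_te, rf.predict(X_te)) ** 0.5
            print(f"nt={nt:4d}  m={m!s:>4}  RMSEP={rmsep:.3f}")
    ```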

  6. Hierarchical Bayesian spatial models for predicting multiple forest variables using waveform LiDAR, hyperspectral imagery, and large inventory datasets

    USGS Publications Warehouse

    Finley, Andrew O.; Banerjee, Sudipto; Cook, Bruce D.; Bradford, John B.

    2013-01-01

    In this paper we detail a multivariate spatial regression model that couples LiDAR, hyperspectral and forest inventory data to predict forest outcome variables at a high spatial resolution. The proposed model is used to analyze forest inventory data collected on the US Forest Service Penobscot Experimental Forest (PEF), ME, USA. In addition to helping meet the regression model's assumptions, results from the PEF analysis suggest that the addition of multivariate spatial random effects improves model fit and predictive ability, compared with two commonly applied modeling approaches. This improvement results from explicitly modeling the covariation among forest outcome variables and spatial dependence among observations through the random effects. Direct application of such multivariate models to even moderately large datasets is often computationally infeasible because of the cubic-order matrix algorithms involved in estimation. We apply a spatial dimension reduction technique to help overcome this computational hurdle without sacrificing richness in modeling.

  7. Identification of copper phthalocyanine blue polymorphs in unaged and aged paint systems by means of micro-Raman spectroscopy and Random Forest.

    PubMed

    Anghelone, Marta; Jembrih-Simbürger, Dubravka; Schreiner, Manfred

    2015-10-01

    Copper phthalocyanine (CuPc) blues (PB15) are widely used in art and industry as pigments. In these fields mainly three different polymorphic modifications of PB15 are employed: alpha, beta and epsilon. Differentiating among these CuPc forms can give important information for developing conservation strategies and can help in relative dating, since each form was introduced to the market in a different time period. This study focuses on the classification of Raman spectra measured using a 532 nm excitation wavelength on: (i) dry pigment powders, (ii) unaged mock-ups of self-made paints, (iii) unaged commercial paints, and (iv) paints subjected to accelerated UV ageing. The ratios among integrated Raman bands are taken into consideration as features for Random Forest (RF) classification. Feature selection based on the Gini contrast score was carried out on the measured dataset to determine the Raman band ratios with the highest predictive power. These were used as polymorphic markers, in order to establish an easy and accessible identification method. Three different ratios and the presence of a characteristic vibrational band allowed the identification of the crystal modification in pigment powders as well as in unaged and aged paint films.

  8. Integration of Random Forest with population-based outlier analyses provides insight on the genomic basis and evolution of run timing in Chinook salmon (Oncorhynchus tshawytscha).

    PubMed

    Brieuc, Marine S O; Ono, Kotaro; Drinan, Daniel P; Naish, Kerry A

    2015-06-01

    Anadromous Chinook salmon populations vary in the period of river entry at the initiation of adult freshwater migration, facilitating optimal arrival at natal spawning. Run timing is a polygenic trait that shows evidence of rapid parallel evolution in some lineages, signifying a key role for this phenotype in the ecological divergence between populations. Studying the genetic basis of local adaptation in quantitative traits is often impractical in wild populations. Therefore, we used a novel approach, Random Forest, to detect markers linked to run timing across 14 populations from contrasting environments in the Columbia River and Puget Sound, USA. The approach permits detection of loci of small effect on the phenotype. Divergence between populations at these loci was then examined using both principal component analysis and FST outlier analyses, to determine whether shared genetic changes resulted in similar phenotypes across different lineages. Sequencing of 9107 RAD markers in 414 individuals identified 33 predictor loci explaining 79.2% of trait variance. Discriminant analysis of principal components of the predictors revealed both shared and unique evolutionary pathways in the trait across different lineages, characterized by minor allele frequency changes. However, genome mapping of predictor loci also identified positional overlap with two genomic outlier regions, consistent with selection on loci of large effect. Therefore, the results suggest selective sweeps on few loci and minor changes in loci that were detected by this study. Use of a polygenic framework has provided initial insight into how divergence in a trait has occurred in the wild. PMID:25913096

  11. Preoperative overnight parenteral nutrition (TPN) improves skeletal muscle protein metabolism indicated by microarray algorithm analyses in a randomized trial.

    PubMed

    Iresjö, Britt-Marie; Engström, Cecilia; Lundholm, Kent

    2016-06-01

    Loss of muscle mass is associated with increased risk of morbidity and mortality in hospitalized patients. Uncertainties about the efficiency of treatment by short-term artificial nutrition remain, specifically regarding improvement of protein balance in skeletal muscles. In this study, algorithmic microarray analysis was applied to map cellular changes related to muscle protein metabolism in human skeletal muscle tissue during provision of overnight preoperative total parenteral nutrition (TPN). Twenty-two patients (11/group) scheduled for upper GI surgery due to malignant or benign disease received a continuous peripheral all-in-one TPN infusion (30 kcal/kg/day, 0.16 gN/kg/day) or saline infusion for 12 h prior to operation. Biopsies from the rectus abdominis muscle were taken at the start of operation for isolation of muscle RNA. RNA expression microarray analyses were performed with Agilent Sureprint G3, 8 × 60K arrays using one-color labeling. 447 mRNAs were differentially expressed between study and control patients (P < 0.1). mRNAs related to ribosomal biogenesis, mRNA processing, and translation were upregulated during overnight nutrition, particularly anabolic signaling via S6K1 (P < 0.01-0.1). Transcripts of genes associated with lysosomal degradation showed consistently lower expression during TPN, while mRNAs for ubiquitin-mediated degradation of proteins as well as transcripts related to intracellular signaling pathways, PI3-kinase/MAP-kinase, were either increased or decreased. In conclusion, overnight standard TPN infusion at a constant rate altered mRNAs associated with mTOR signaling, increased initiation of protein translation, and suppressed autophagy/lysosomal degradation of proteins. This indicates that overnight preoperative parenteral nutrition is effective in promoting muscle protein metabolism. PMID:27273879

  13. A Comparative Assessment of the Influences of Human Impacts on Soil Cd Concentrations Based on Stepwise Linear Regression, Classification and Regression Tree, and Random Forest Models

    PubMed Central

    Qiu, Lefeng; Wang, Kai; Long, Wenli; Wang, Ke; Hu, Wei; Amable, Gabriel S.

    2016-01-01

    Soil cadmium (Cd) contamination has attracted a great deal of attention because of its detrimental effects on animals and humans. This study aimed to develop and compare the performances of stepwise linear regression (SLR), classification and regression tree (CART) and random forest (RF) models in the prediction and mapping of the spatial distribution of soil Cd and to identify likely sources of Cd accumulation in Fuyang County, eastern China. Soil Cd data from 276 topsoil (0–20 cm) samples were collected and randomly divided into calibration (222 samples) and validation datasets (54 samples). Auxiliary data, including detailed land use information, soil organic matter, soil pH, and topographic data, were incorporated into the models to simulate the soil Cd concentrations and further identify the main factors influencing soil Cd variation. The predictive models for soil Cd concentration exhibited acceptable overall accuracies (72.22% for SLR, 70.37% for CART, and 75.93% for RF). The SLR model exhibited the largest predicted deviation, with a mean error (ME) of 0.074 mg/kg, a mean absolute error (MAE) of 0.160 mg/kg, and a root mean squared error (RMSE) of 0.274 mg/kg, and the RF model produced the results closest to the observed values, with an ME of 0.002 mg/kg, an MAE of 0.132 mg/kg, and an RMSE of 0.198 mg/kg. The RF model also exhibited the greatest R2 value (0.772). The CART model predictions closely followed, with ME, MAE, RMSE, and R2 values of 0.013 mg/kg, 0.154 mg/kg, 0.230 mg/kg and 0.644, respectively. The three prediction maps generally exhibited similar and realistic spatial patterns of soil Cd contamination. The heavily Cd-affected areas were primarily located in the alluvial valley plain of the Fuchun River and its tributaries because of the dramatic industrialization and urbanization processes that have occurred there. The most important variable for explaining high levels of soil Cd accumulation was the presence of metal smelting industries.
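
    The three error summaries used in the comparison (ME, MAE, RMSE) are straightforward to compute. The sketch below mimics the 276-sample calibration/54-sample validation split on synthetic soil data, with SLR, CART and RF stand-ins:

    ```python
    # ME, MAE and RMSE for linear, tree and forest models on a synthetic
    # stand-in for the soil-Cd calibration/validation split.
    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeRegressor

    rng = np.random.default_rng(11)
    X = rng.normal(size=(276, 8))                  # land use, SOM, pH, terrain...
    y = 0.3 + 0.2 * X[:, 0] - 0.1 * X[:, 1] + 0.15 * rng.normal(size=276)

    X_cal, X_val, y_cal, y_val = train_test_split(X, y, test_size=54,
                                                  random_state=0)
    models = {"SLR": LinearRegression(),
              "CART": DecisionTreeRegressor(max_depth=5, random_state=0),
              "RF": RandomForestRegressor(n_estimators=500, random_state=0)}
    for name, model in models.items():
        e = model.fit(X_cal, y_cal).predict(X_val) - y_val
        print(f"{name:4s} ME={e.mean():+.3f}  MAE={np.abs(e).mean():.3f}  "
              f"RMSE={np.sqrt((e ** 2).mean()):.3f}")
    ```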

  14. Discovery of Novel Hepatitis C Virus NS5B Polymerase Inhibitors by Combining Random Forest, Multiple e-Pharmacophore Modeling and Docking

    PubMed Central

    Wei, Yu; Li, Jinlong; Qing, Jie; Huang, Mingjie; Wu, Ming; Gao, Fenghua; Li, Dongmei; Hong, Zhangyong; Kong, Lingbao; Huang, Weiqiang; Lin, Jianping

    2016-01-01

    The NS5B polymerase is one of the most attractive targets for developing new drugs to block Hepatitis C virus (HCV) infection. We describe the discovery of novel potent HCV NS5B polymerase inhibitors by employing a virtual screening (VS) approach, which is based on random forest (RB-VS), e-pharmacophore (PB-VS), and docking (DB-VS) methods. In the RB-VS stage, after feature selection, a model with 16 descriptors was used. In the PB-VS stage, six energy-based pharmacophore (e-pharmacophore) models from different crystal structures of the NS5B polymerase with ligands binding at the palm I, thumb I and thumb II regions were used. In the DB-VS stage, the Glide SP and XP docking protocols with default parameters were employed. In the virtual screening approach, the RB-VS, PB-VS and DB-VS methods were applied in increasing order of complexity to screen the InterBioScreen database. From the final hits, we selected 5 compounds for further anti-HCV activity and cellular cytotoxicity assay. All 5 compounds were found to inhibit NS5B polymerase with IC50 values of 2.01–23.84 μM and displayed anti-HCV activities with EC50 values ranging from 1.61 to 21.88 μM, and all compounds displayed no cellular cytotoxicity (CC50 > 100 μM) except compound N2, which displayed weak cytotoxicity with a CC50 value of 51.3 μM. The hit compound N2 had the best antiviral activity against HCV, with a selective index of 32.1. The 5 hit compounds with new scaffolds could potentially serve as NS5B polymerase inhibitors through further optimization and development. PMID:26845440

  15. Exploring metabolic syndrome serum free fatty acid profiles based on GC-SIM-MS combined with random forests and canonical correlation analysis.

    PubMed

    Dai, Ling; Gonçalves, Carlos M Vicente; Lin, Zhang; Huang, Jianhua; Lu, Hongmei; Yi, Lunzhao; Liang, Yizeng; Wang, Dongsheng; An, Dong

    2015-04-01

    Metabolic syndrome (MetS) is a cluster of metabolic abnormalities associated with an increased risk of developing cardiovascular diseases or type II diabetes. To date, the etiology of MetS remains complex and not fully understood. Metabolic profiling is a powerful tool for exploring metabolic perturbations and potential biomarkers, and thus may shed light on the pathophysiological mechanism of diseases. In this study, fatty acid profiling was employed to explore the metabolic disturbances and discover potential biomarkers of MetS. Fatty acid profiles of serum samples from metabolic syndrome patients and healthy controls were first analyzed by gas chromatography-selected ion monitoring-mass spectrometry (GC-SIM-MS), a robust method for quantitation of fatty acids. Then, the supervised multivariate statistical method of random forests (RF) was used to establish a classification and prediction model for MetS, which could assist the diagnosis of MetS. Furthermore, canonical correlation analysis (CCA) was employed to investigate the relationships between free fatty acids (FFAs) and clinical parameters. As a result, several FFAs, including C16:1n-9c, C20:1n-9c and C22:4n-6c, were identified as potential biomarkers of MetS. The results also indicated that high density lipoprotein-cholesterol (HDL-C), triglycerides (TG) and fasting blood glucose (FBG) were the parameters most closely correlated with FFA disturbances of MetS, and thus deserve more attention than waist circumference (WC) and systolic/diastolic blood pressure (SBP/DBP) when monitoring FFA disturbances of MetS in clinical practice. The results demonstrate that metabolic profiling by GC-SIM-MS combined with RF and CCA may be a useful tool for discovering perturbations of serum FFAs and possible biomarkers of MetS.

  16. A decision support system based on an ensemble of random forests for improving the management of women with abnormal findings at cervical cancer screening.

    PubMed

    Bountris, Panagiotis; Haritou, Maria; Pouliakis, Abraham; Karakitsos, Petros; Koutsouris, Dimitrios

    2015-08-01

    In most cases, cervical cancer (CxCa) develops due to underestimated abnormalities in the Pap test. Today, there are ancillary molecular biology techniques available that provide important information related to CxCa and the natural history of the Human Papillomavirus (HPV), including HPV DNA tests, HPV mRNA tests and immunocytochemistry techniques such as overexpression of p16. These techniques are either highly sensitive or highly specific, but not both at the same time; thus no perfect method is available today. In this paper we present a decision support system (DSS) based on an ensemble of Random Forests (RFs) for the intelligent combination of the results of classic and ancillary techniques available for CxCa detection, in order to exploit the benefits of each technique and produce more accurate results. The proposed system achieved both high sensitivity (86.1%) and high specificity (93.3%), as well as high overall accuracy (91.8%), in detecting cervical intraepithelial neoplasia grade 2 or worse (CIN2+). The system's performance was better than that of any single test involved in this study. Moreover, the proposed architecture of an ensemble of RFs proved better than a single-classifier approach. The presented system can handle cases with missing tests and, more importantly, cases with an inadequate cytological outcome; thus it can also produce accurate results for stand-alone HPV-based screening, where the Pap test is not applied. The proposed system may identify women at true risk of developing CxCa and guide personalised management and therapeutic interventions.
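
    A hedged sketch of the ensemble idea: one RF per test panel, with class probabilities soft-voted over whichever panels are available for a case; the feature layout and all data are invented for illustration:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(2)
        X = rng.normal(size=(600, 9))     # cols 0-2: cytology, 3-5: HPV DNA/mRNA, 6-8: p16 (placeholders)
        y = rng.integers(0, 2, size=600)  # 1 = CIN2+, 0 = below CIN2

        # One RF per test panel; a case with a missing panel is scored by the remaining members.
        panels = {"cytology": [0, 1, 2], "hpv": [3, 4, 5], "p16": [6, 7, 8]}
        members = {name: RandomForestClassifier(n_estimators=300, random_state=2).fit(X[:, cols], y)
                   for name, cols in panels.items()}

        def predict_proba(case, available):
            """Soft-vote over the RFs whose input panels are available for this case."""
            probs = [members[name].predict_proba(case[None, panels[name]])[0, 1]
                     for name in available]
            return float(np.mean(probs))

        print(predict_proba(X[0], available=["hpv", "p16"]))  # cytology result missing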

  17. Application of the Random Forest method to analyse epidemiological and phenotypic characteristics of Salmonella 4,[5],12:i:- and Salmonella Typhimurium strains.

    PubMed

    Barco, L; Mancin, M; Ruffa, M; Saccardin, C; Minorello, C; Zavagnin, P; Lettini, A A; Olsen, J E; Ricci, A

    2012-11-01

    Salmonella enterica 4,[5],12:i:- is a monophasic variant of S. Typhimurium whose prevalence has risen sharply in the last decade. Although S. 4,[5],12:i:- and S. Typhimurium are known to pose a considerable public health risk, there is no detailed information on the circulation of these serovars in Italy, particularly as far as veterinary isolates are concerned. For this reason, a data set of 877 strains isolated in the north-east of Italy from foodstuffs, animals and the environment during 2005-2010 was analysed. The Random Forests (RF) method was used to identify the epidemiological and phenotypic variables that best differentiate the two serovars. Both descriptive analysis and RF revealed that S. 4,[5],12:i:- is less heterogeneous than S. Typhimurium. RF highlighted phage type as the most important variable for differentiating the two serovars. The most common phage types identified for S. 4,[5],12:i:- were DT20a, U311 and DT193. The same phage types were also found in S. Typhimurium isolates, although with a much lower prevalence. DT7 and DT120 were ascribed to the two serovars at comparable levels. DT104, DT2 and DT99 were ascribed exclusively to S. Typhimurium, and almost all the other phage types identified were more related to the latter serovar. These data confirm that phage typing can indicate the biphasic or monophasic state of the strains investigated and could therefore support serotyping results. However, phage typing cannot be used as the definitive method to differentiate the two serovars, as some phage types were detected for both serovars and, in particular, all phage types found for S. 4,[5],12:i:- were also found for S. Typhimurium.
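
    The variable-ranking step can be sketched with a random forest plus permutation importance on a one-hot-encoded design matrix; the columns below are illustrative stand-ins for the study's epidemiological and phenotypic variables:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.inspection import permutation_importance

        rng = np.random.default_rng(3)
        # Placeholder one-hot design matrix: phage type, source of isolation, region, year...
        X = rng.integers(0, 2, size=(877, 20)).astype(float)
        y = rng.integers(0, 2, size=877)  # 1 = S. 4,[5],12:i:-, 0 = S. Typhimurium

        rf = RandomForestClassifier(n_estimators=500, random_state=3).fit(X, y)
        imp = permutation_importance(rf, X, y, n_repeats=10, random_state=3)
        ranking = np.argsort(imp.importances_mean)[::-1]
        print("most discriminating variables (column indices):", ranking[:5])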

  18. Mapping forests in Monsoon Asia with ALOS PALSAR and MODIS imagery in 2010

    NASA Astrophysics Data System (ADS)

    Qin, Y.

    2015-12-01

    The spatial distribution and temporal dynamics of forests are important to climate change, the carbon cycle, and biodiversity. An accurate forest map is required in monsoon Asia, where extensive forest changes have occurred. An algorithm was developed to map the distribution of forests in monsoon Asia in 2010 by integrating structure information from the Advanced Land Observation System (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR) mosaic dataset with phenology information from the MOD13Q1 NDVI and MOD09A1 land surface reflectance products. The PALSAR-based forest map was generated with a decision tree classification and assessed against randomly selected ground truth samples from high spatial resolution images in Google Earth. Spatial and area comparisons were performed between our forest map (OU/Fudan F/NF) and the forest maps generated by the Japan Aerospace Exploration Agency (JAXA F/NF), European Space Agency (ESA F/NF), Boston University (MCD12Q1 F/NF), Food and Agriculture Organization (FAO FRA), and University of Maryland (Landsat forests). We then investigated the reasons for the large uncertainties among these forest maps in 2010. This study provides a way to monitor forest dynamics using Synthetic Aperture Radar (SAR) and optical satellite images, and the resultant F/NF datasets can be used to analyze the impacts of forest changes on climate and ecosystems.

  19. Quantumness, Randomness and Computability

    NASA Astrophysics Data System (ADS)

    Solis, Aldo; Hirsch, Jorge G.

    2015-06-01

    Randomness plays a central role in the quantum mechanical description of our interactions. We review the relationship between the violation of Bell inequalities, non-signaling and randomness. We discuss the challenge in defining a random string, and show that algorithmic information theory provides a necessary condition for randomness using Borel normality. We close with a view on incomputability and its implications in physics.
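
    One common formulation of the Borel normality condition (due to Calude) requires every k-bit block, for k up to log2 log2 n, to occur with frequency within sqrt(log2(n)/n) of the ideal 2^-k; that convention is assumed in the self-contained sketch below:

        import math
        from collections import Counter

        def borel_normal(bits: str) -> bool:
            """Check Borel normality of a binary string: for every block length
            k <= log2(log2(n)), each of the 2^k blocks, read non-overlapping,
            must occur with frequency within sqrt(log2(n)/n) of 2^-k."""
            n = len(bits)
            kmax = max(1, int(math.log2(math.log2(n))))
            tol = math.sqrt(math.log2(n) / n)
            for k in range(1, kmax + 1):
                m = n // k
                counts = Counter(bits[i * k:(i + 1) * k] for i in range(m))
                for j in range(2 ** k):
                    block = format(j, f"0{k}b")
                    if abs(counts[block] / m - 2 ** -k) > tol:
                        return False
            return True

        print(borel_normal("01" * 4096))  # periodic strings typically fail for k >= 2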

  20. Construction of an Indonesian herbal constituents database and its use in Random Forest modelling in a search for inhibitors of aldose reductase.

    PubMed

    Naeem, Sadaf; Hylands, Peter; Barlow, David

    2012-02-01

    Data on phytochemical constituents of plants commonly used in traditional Indonesian medicine have been compiled as a computer database. This database (the Indonesian Herbal constituents database, IHD) currently contains details on ∼1,000 compounds found in 33 different plants. For each entry, the IHD gives details of chemical structure, trivial and systematic name, CAS registry number, pharmacology (where known), toxicology (LD50), botanical species, the part(s) of the plant(s) where the compounds are found, typical dosage(s) and reference(s). A second database has also been compiled for plant-derived compounds with known activity against the enzyme aldose reductase (AR). This database (the aldose reductase inhibitors database, ARID) contains the same details as the IHD and currently comprises information on 120 different AR inhibitors. Virtual screening of all compounds in the IHD has been performed using Random Forest (RF) modelling, in a search for novel leads active against AR, to provide new forms of symptomatic relief for diabetic patients. For the RF modelling, a set of simple 2D chemical descriptors was employed to classify all compounds in the combined ARID and IHD databases as either active or inactive as AR inhibitors. The resulting RF models (which gave misclassification rates of 21%) were used to identify putative new AR inhibitors in the IHD, such compounds being identified as those giving RF scores >0.5 in each of the three different RF models developed. In vitro assays were subsequently performed for four of the compounds obtained as hits in this in silico screening, to determine their inhibitory activity against human recombinant AR. The two compounds with the highest RF scores (prunetin and ononin) showed the highest activities experimentally (giving ∼58% and ∼52% inhibition at a concentration of 15 μM, respectively), while the compounds with the lowest RF scores (vanillic acid and cinnamic acid) showed the lowest activities.
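
    The hit-selection rule (an RF activity probability above 0.5 in each of three models) can be sketched as follows; descriptor matrices and labels are placeholders, and making the three models differ only by random seed is an assumption:

        import numpy as np
        from sklearn.ensemble import RandomForestClassifier

        rng = np.random.default_rng(4)
        X_known = rng.normal(size=(1120, 32))    # 2D descriptors: ARID actives + IHD compounds (placeholders)
        y_known = rng.integers(0, 2, size=1120)  # 1 = AR inhibitor, 0 = inactive
        X_ihd = rng.normal(size=(1000, 32))      # descriptors of the IHD compounds to screen

        # Three RF models; a compound is a hit only when every model
        # assigns it an activity probability above 0.5.
        models = [RandomForestClassifier(n_estimators=300, random_state=s).fit(X_known, y_known)
                  for s in (10, 20, 30)]
        probs = np.stack([m.predict_proba(X_ihd)[:, 1] for m in models])
        hits = np.where((probs > 0.5).all(axis=0))[0]
        print(f"{len(hits)} putative AR inhibitors")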

  1. Woody vegetation cover monitoring with multi-temporal Landsat data and Random Forests: the case of the Northwest Province (South Africa)

    NASA Astrophysics Data System (ADS)

    Symeonakis, Elias; Higginbottom, Thomas; Petroulaki, Kyriaki

    2016-04-01

    Land degradation and desertification (LDD) are serious global threats to humans and the environment. Globally, 10-20% of drylands and 24% of the world's productive lands are potentially degraded, which affects 1.5 billion people and reduces GDP by €3.4 billion. In Africa, LDD processes affect up to a third of savannahs, leading to a decline in the ecosystem services provided to some of the continent's poorest and most vulnerable communities. Indirectly, LDD can be monitored using relevant indicators. The encroachment of woody plants into grasslands, and the subsequent conversion of savannahs and open woodlands into shrublands, has attracted a lot of attention over recent decades and has been identified as an indicator of LDD. According to some assessments, bush encroachment has rendered 1.1 million ha of South African savanna unusable, threatens another 27 million ha (~17% of the country), and has reduced the grazing capacity throughout the region by up to 50%. Mapping woody cover encroachment over large areas can only be achieved effectively using remote sensing data and techniques. The longest continuously operating Earth-observation programme, the Landsat series, is now freely available as an atmospherically corrected, cloud-masked surface reflectance product. The availability and length of the Landsat archive is thus an unparalleled Earth-observation resource, particularly for long-term change detection and monitoring. Here, we map and monitor woody vegetation cover in the Northwest Province of South Africa, a mosaic of 12 Landsat scenes extending over more than 100,000 km². We employ a multi-temporal approach with dry-season TM, ETM+ and OLI data from 15 epochs between 1989 and 2015. We use 0.5 m-pixel colour aerial photography to collect >15,000 samples for training and validating a Random Forest model to map woody cover, grasses, crops, urban and bare areas. High classification accuracies are achieved, especially so for the two cover types indirectly

  2. An efficient, three-dimensional, anisotropic, fractional Brownian motion and truncated fractional Levy motion simulation algorithm based on successive random additions

    NASA Astrophysics Data System (ADS)

    Lu, Silong; Molz, Fred J.; Liu, Hui Hai

    2003-02-01

    Fluid flow and solute transport in the subsurface are known to be strongly influenced by the heterogeneity of aquifers. To simulate aquifer properties such as logarithmic hydraulic conductivity (ln(K)) variations, fractional Brownian motion (fBm) and truncated fractional Levy motion (fLm) were suggested previously. In this paper, an efficient three-dimensional successive random additions (SRA) algorithm is presented to construct spatial ln(K) distributions. A convenient conditioning procedure, using the inverse-distance-weighting method as a data interpolator to force the generated fBm or truncated fLm realization to pass through known data points, is also included. The proposed method, coded in FORTRAN, and a complementary code for verifying fractal structure in fBm realizations based on dispersional analysis, are carefully validated through numerical tests. These software packages allow one to go beyond the stationary stochastic process hydrology of the 1980s to the new geostatistics of non-stationary stochastic processes with stationary increments, as embodied by the stochastic fractals fBm and fLm and their associated increments fGn and fLn.
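
    A minimal one-dimensional sketch of the successive random additions idea; the paper's algorithm is three-dimensional, anisotropic and conditioned on data, none of which is reproduced here, and the variance schedule below is one common textbook choice:

        import numpy as np

        def fbm_sra(n_levels, H, sigma=1.0, seed=0):
            """1D fractional Brownian motion by successive random additions (SRA):
            start from two endpoints, then repeatedly (i) insert midpoints by
            linear interpolation and (ii) add Gaussian noise to every point,
            shrinking the noise variance by 2**(-2*H) per level (Hurst exponent H)."""
            rng = np.random.default_rng(seed)
            x = np.array([0.0, sigma * rng.standard_normal()])
            var = 0.5 * sigma**2 * (1.0 - 2.0 ** (2 * H - 2))  # variance of the first additions
            for _ in range(n_levels):
                new = np.empty(2 * len(x) - 1)
                new[0::2] = x                                  # keep existing points
                new[1::2] = 0.5 * (x[:-1] + x[1:])             # interpolated midpoints
                new += np.sqrt(var) * rng.standard_normal(len(new))  # the "random additions"
                x, var = new, var * 2.0 ** (-2 * H)            # shrink variance each level
            return x

        trace = fbm_sra(n_levels=10, H=0.7)  # 1025-point realization, persistent (H > 0.5)
        print(trace[:5])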

  3. Holocene local forest history at two sites in Småland, southern Sweden - insights from quantitative reconstructions using the Landscape Reconstruction Algorithm

    NASA Astrophysics Data System (ADS)

    Cui, Qiaoyu; Gaillard, Marie-José; Lemdahl, Geoffrey; Olsson, Fredrik; Sugita, Shinya

    2010-05-01

    Quantitative reconstruction of past vegetation using fossil pollen has long been problematic. It is well known that pollen percentages and pollen accumulation rates do not represent vegetation abundance properly, because pollen values are influenced by many factors, of which inter-taxonomic differences in pollen productivity and vegetation structure are the most important. It is also recognized that pollen assemblages from large sites (lakes or bogs) record the characteristics of the regional vegetation, while pollen assemblages from small sites record local features. Based on the theoretical understanding of the factors and mechanisms that affect pollen representation of vegetation, Sugita (2007a and b) proposed the Landscape Reconstruction Algorithm (LRA) to estimate vegetation abundance in percentage cover at well-defined spatial scales. The LRA includes two models, REVEALS and LOVE. REVEALS estimates regional vegetation abundance at a spatial scale of 100 km x 100 km. LOVE estimates local vegetation abundance at the spatial scale of the relevant source area of pollen (RSAP sensu Sugita 1993) of the pollen site. REVEALS estimates are needed to apply LOVE in order to calculate the RSAP and the vegetation cover within the RSAP. The two models were validated theoretically and empirically. Two small bogs in southern Sweden were studied for pollen, plant macrofossils, charcoal, and coleoptera in order to reconstruct the local Holocene forest and fire history (e.g. Greisman and Gaillard 2009; Olsson et al. 2009). We applied the LOVE model in order to 1) compare the LOVE estimates with pollen percentages for a better understanding of the local forest history; and 2) obtain more precise information on the local vegetation to explain between-site differences in fire history. We used pollen records from two large lakes in Småland to obtain REVEALS estimates for twelve continuous 500-yr time windows. Following the strategy of the Swedish VR LANDCLIM project (see Gaillard

  4. Data mining methods in the prediction of Dementia: A real-data comparison of the accuracy, sensitivity and specificity of linear discriminant analysis, logistic regression, neural networks, support vector machines, classification trees and random forests

    PubMed Central

    2011-01-01

    Background: Dementia and cognitive impairment associated with aging are a major medical and social concern. Neuropsychological testing is a key element in the diagnostic procedures of Mild Cognitive Impairment (MCI), but presently has limited value in the prediction of progression to dementia. We advance the hypothesis that newer statistical classification methods derived from data mining and machine learning, such as Neural Networks, Support Vector Machines and Random Forests, can improve the accuracy, sensitivity and specificity of predictions obtained from neuropsychological testing. Seven non-parametric classifiers derived from data mining methods (Multilayer Perceptron Neural Networks, Radial Basis Function Neural Networks, Support Vector Machines, CART, CHAID and QUEST Classification Trees, and Random Forests) were compared to three traditional classifiers (Linear Discriminant Analysis, Quadratic Discriminant Analysis and Logistic Regression) in terms of overall classification accuracy, specificity, sensitivity, area under the ROC curve and Press' Q. Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. Results: Press' Q test showed that all classifiers performed better than chance alone (p < 0.05). Support Vector Machines showed the largest overall classification accuracy (median (Me) = 0.76) and area under the ROC curve (Me = 0.90); however, this method showed high specificity (Me = 1.0) but low sensitivity (Me = 0.3). Random Forests ranked second in overall accuracy (Me = 0.73), with high area under the ROC curve (Me = 0.73), specificity (Me = 0.73) and sensitivity (Me = 0.64). Linear Discriminant Analysis also showed acceptable overall accuracy (Me = 0.66), with acceptable area under the ROC curve (Me = 0.72), specificity (Me = 0.66) and sensitivity (Me = 0.64). The remaining classifiers showed
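
    The comparison protocol, k-fold cross-validation of several classifiers followed by Friedman's nonparametric test on the fold-wise scores, can be sketched as follows; the data and the reduced model set are placeholders, not the study's:

        import numpy as np
        from scipy.stats import friedmanchisquare
        from sklearn.datasets import make_classification
        from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
        from sklearn.ensemble import RandomForestClassifier
        from sklearn.model_selection import cross_val_score
        from sklearn.svm import SVC

        # Placeholder stand-in for 10 neuropsychological test scores per subject.
        X, y = make_classification(n_samples=400, n_features=10, random_state=5)

        models = {"LDA": LinearDiscriminantAnalysis(),
                  "SVM": SVC(),
                  "RF": RandomForestClassifier(n_estimators=300, random_state=5)}

        # 5-fold CV accuracy per classifier, then Friedman's test across the folds.
        scores = {name: cross_val_score(m, X, y, cv=5) for name, m in models.items()}
        for name, s in scores.items():
            print(f"{name}: median accuracy {np.median(s):.2f}")
        print("Friedman p-value:", friedmanchisquare(*scores.values()).pvalue)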

  5. Aboveground Biomass Modeling from Field and LiDAR Data in Brazilian Amazon Tropical Rain Forest

    NASA Astrophysics Data System (ADS)

    Silva, C. A.; Hudak, A. T.; Vierling, L. A.; Keller, M. M.; Klauberg Silva, C. K.

    2015-12-01

    Tropical forests are an important component of global carbon stocks, but tropical forest responses to climate change are not sufficiently studied or understood. Among remote sensing technologies, airborne LiDAR (Light Detection and Ranging) may be best suited for quantifying tropical forest carbon stocks. Our objective was to estimate aboveground biomass (AGB) using airborne LiDAR and field plot data in Brazilian tropical rain forest. Forest attributes such as tree density, diameter at breast height, and height were measured at a combination of square plots and linear transects (n=82) distributed across six different geographic zones in the Amazon. Using previously published allometric equations, tree AGB was computed and then summed to calculate total AGB at each sample plot. LiDAR-derived canopy structure metrics were also computed at each sample plot, and random forest regression modelling was applied to predict AGB from selected LiDAR metrics. The LiDAR-derived AGB model was assessed using the random forest explained variation, adjusted coefficient of determination (Adj. R²), root mean square error (RMSE, both absolute and relative) and bias (both absolute and relative). Our findings showed that the 99th percentile of height and height skewness were the best LiDAR metrics for AGB prediction. The AGB model using these two predictors explained 59.59% of AGB variation, with an Adj. R² of 0.92, RMSE of 33.37 Mg/ha (20.28%), and bias of -0.69 Mg/ha (-0.42%). This study showed that LiDAR canopy structure metrics can be used to predict AGB stocks in tropical forest with acceptable precision and accuracy. We therefore conclude that there is good potential to monitor carbon sequestration in Brazilian tropical rain forest using airborne LiDAR data, large field plots, and the random forest algorithm.
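
    A hedged sketch of the regression step, fitting a random forest to the two predictors named above (99th height percentile and height skewness); the field data below are synthetic, generated only so the script runs end to end:

        import numpy as np
        from sklearn.ensemble import RandomForestRegressor
        from sklearn.model_selection import cross_val_predict

        rng = np.random.default_rng(6)
        n = 82                        # number of field plots, as in the abstract
        h99 = rng.uniform(10, 45, n)  # 99th percentile of LiDAR return heights (m)
        skew = rng.normal(0, 1, n)    # skewness of the height distribution
        agb = 8.0 * h99 + 5.0 * skew + rng.normal(0, 30, n)  # synthetic AGB (Mg/ha)

        X = np.column_stack([h99, skew])
        rf = RandomForestRegressor(n_estimators=500, random_state=6)
        pred = cross_val_predict(rf, X, agb, cv=5)  # out-of-fold predictions

        rmse = np.sqrt(np.mean((pred - agb) ** 2))
        print(f"RMSE: {rmse:.1f} Mg/ha ({100 * rmse / agb.mean():.1f}%)")
        print(f"bias: {np.mean(pred - agb):.2f} Mg/ha")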

  6. Experimental evidence of quantum randomness incomputability

    SciTech Connect

    Calude, Cristian S.; Dinneen, Michael J.; Dumitrescu, Monica; Svozil, Karl

    2010-08-15

    In contrast with software-generated randomness (called pseudo-randomness), quantum randomness can be proven incomputable; that is, it is not exactly reproducible by any algorithm. We provide experimental evidence of the incomputability (an asymptotic property) of quantum randomness by performing finite tests of randomness inspired by algorithmic information theory.

  7. Mapping forests in monsoon Asia with ALOS PALSAR 50-m mosaic images and MODIS imagery in 2010

    NASA Astrophysics Data System (ADS)

    Qin, Yuanwei; Xiao, Xiangming; Dong, Jinwei; Zhang, Geli; Roy, Partha Sarathi; Joshi, Pawan Kumar; Gilani, Hammad; Murthy, Manchiraju Sri Ramachandra; Jin, Cui; Wang, Jie; Zhang, Yao; Chen, Bangqian; Menarguez, Michael Angelo; Biradar, Chandrashekhar M.; Bajgain, Rajen; Li, Xiangping; Dai, Shengqi; Hou, Ying; Xin, Fengfei; Moore, Berrien, III

    2016-02-01

    Extensive forest changes have occurred in monsoon Asia, substantially affecting climate, carbon cycle and biodiversity. Accurate forest cover maps at fine spatial resolutions are required to qualify and quantify these effects. In this study, an algorithm was developed to map forests in 2010, with the use of structure and biomass information from the Advanced Land Observation System (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR) mosaic dataset and the phenological information from MODerate Resolution Imaging Spectroradiometer (MOD13Q1 and MOD09A1) products. Our forest map (PALSARMOD50 m F/NF) was assessed through randomly selected ground truth samples from high spatial resolution images and had an overall accuracy of 95%. Total area of forests in monsoon Asia in 2010 was estimated to be ~6.3 × 10⁶ km². The distribution of evergreen and deciduous forests agreed reasonably well with the median Normalized Difference Vegetation Index (NDVI) in winter. The PALSARMOD50 m F/NF map showed good spatial and areal agreement with selected forest maps generated by the Japan Aerospace Exploration Agency (JAXA F/NF), European Space Agency (ESA F/NF), Boston University (MCD12Q1 F/NF), Food and Agriculture Organization (FAO FRA), and University of Maryland (Landsat forests), but relatively large differences and uncertainties in tropical forests and in evergreen and deciduous forests.

  8. Mapping forests in monsoon Asia with ALOS PALSAR 50-m mosaic images and MODIS imagery in 2010

    PubMed Central

    Qin, Yuanwei; Xiao, Xiangming; Dong, Jinwei; Zhang, Geli; Roy, Partha Sarathi; Joshi, Pawan Kumar; Gilani, Hammad; Murthy, Manchiraju Sri Ramachandra; Jin, Cui; Wang, Jie; Zhang, Yao; Chen, Bangqian; Menarguez, Michael Angelo; Biradar, Chandrashekhar M.; Bajgain, Rajen; Li, Xiangping; Dai, Shengqi; Hou, Ying; Xin, Fengfei; Moore III, Berrien

    2016-01-01

    Extensive forest changes have occurred in monsoon Asia, substantially affecting climate, carbon cycle and biodiversity. Accurate forest cover maps at fine spatial resolutions are required to qualify and quantify these effects. In this study, an algorithm was developed to map forests in 2010, with the use of structure and biomass information from the Advanced Land Observation System (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR) mosaic dataset and the phenological information from MODerate Resolution Imaging Spectroradiometer (MOD13Q1 and MOD09A1) products. Our forest map (PALSARMOD50 m F/NF) was assessed through randomly selected ground truth samples from high spatial resolution images and had an overall accuracy of 95%. Total area of forests in monsoon Asia in 2010 was estimated to be ~6.3 × 10⁶ km². The distribution of evergreen and deciduous forests agreed reasonably well with the median Normalized Difference Vegetation Index (NDVI) in winter. The PALSARMOD50 m F/NF map showed good spatial and areal agreement with selected forest maps generated by the Japan Aerospace Exploration Agency (JAXA F/NF), European Space Agency (ESA F/NF), Boston University (MCD12Q1 F/NF), Food and Agriculture Organization (FAO FRA), and University of Maryland (Landsat forests), but relatively large differences and uncertainties in tropical forests and in evergreen and deciduous forests. PMID:26864143

  9. Mapping forests in monsoon Asia with ALOS PALSAR 50-m mosaic images and MODIS imagery in 2010.

    PubMed

    Qin, Yuanwei; Xiao, Xiangming; Dong, Jinwei; Zhang, Geli; Roy, Partha Sarathi; Joshi, Pawan Kumar; Gilani, Hammad; Murthy, Manchiraju Sri Ramachandra; Jin, Cui; Wang, Jie; Zhang, Yao; Chen, Bangqian; Menarguez, Michael Angelo; Biradar, Chandrashekhar M; Bajgain, Rajen; Li, Xiangping; Dai, Shengqi; Hou, Ying; Xin, Fengfei; Moore, Berrien

    2016-01-01

    Extensive forest changes have occurred in monsoon Asia, substantially affecting climate, carbon cycle and biodiversity. Accurate forest cover maps at fine spatial resolutions are required to qualify and quantify these effects. In this study, an algorithm was developed to map forests in 2010, with the use of structure and biomass information from the Advanced Land Observation System (ALOS) Phased Array L-band Synthetic Aperture Radar (PALSAR) mosaic dataset and the phenological information from MODerate Resolution Imaging Spectroradiometer (MOD13Q1 and MOD09A1) products. Our forest map (PALSARMOD50 m F/NF) was assessed through randomly selected ground truth samples from high spatial resolution images and had an overall accuracy of 95%. Total area of forests in monsoon Asia in 2010 was estimated to be ~6.3 × 10⁶ km². The distribution of evergreen and deciduous forests agreed reasonably well with the median Normalized Difference Vegetation Index (NDVI) in winter. The PALSARMOD50 m F/NF map showed good spatial and areal agreement with selected forest maps generated by the Japan Aerospace Exploration Agency (JAXA F/NF), European Space Agency (ESA F/NF), Boston University (MCD12Q1 F/NF), Food and Agriculture Organization (FAO FRA), and University of Maryland (Landsat forests), but relatively large differences and uncertainties in tropical forests and in evergreen and deciduous forests. PMID:26864143

  11. Algorithms and Algorithmic Languages.

    ERIC Educational Resources Information Center

    Veselov, V. M.; Koprov, V. M.

    This paper is intended as an introduction to a number of problems connected with the description of algorithms and algorithmic languages, particularly the syntaxes and semantics of algorithmic languages. The terms "letter, word, alphabet" are defined and described. The concept of the algorithm is defined and the relation between the algorithm and…

  12. Scalable Nearest Neighbor Algorithms for High Dimensional Data.

    PubMed

    Muja, Marius; Lowe, David G

    2014-11-01

    For many computer vision and machine learning problems, large training sets are key for good performance. However, the most computationally expensive part of many computer vision and machine learning algorithms consists of finding nearest neighbor matches to high dimensional vectors that represent the training data. We propose new algorithms for approximate nearest neighbor matching and evaluate and compare them with previous algorithms. For matching high dimensional features, we find two algorithms to be the most efficient: the randomized k-d forest and a new algorithm proposed in this paper, the priority search k-means tree. We also propose a new algorithm for matching binary features by searching multiple hierarchical clustering trees and show it outperforms methods typically used in the literature. We show that the optimal nearest neighbor algorithm and its parameters depend on the data set characteristics and describe an automated configuration procedure for finding the best algorithm to search a particular data set. In order to scale to very large data sets that would otherwise not fit in the memory of a single machine, we propose a distributed nearest neighbor matching framework that can be used with any of the algorithms described in the paper. All this research has been released as an open source library called fast library for approximate nearest neighbors (FLANN), which has been incorporated into OpenCV and is now one of the most popular libraries for nearest neighbor matching. PMID:26353063

  14. Forest Management.

    ERIC Educational Resources Information Center

    Weicherding, Patrick J.; And Others

    This bulletin deals with forest management and provides an overview of forestry for the non-professional. The bulletin is divided into six sections: (1) What Is Forestry Management?; (2) How Is the Forest Measured?; (3) What Is Forest Protection?; (4) How Is the Forest Harvested?; (5) What Is Forest Regeneration?; and (6) What Is Forest…

  15. Mapping forested wetlands in the Great Zhan River Basin through integrating optical, radar, and topographical data classification techniques.

    PubMed

    Na, X D; Zang, S Y; Wu, C S; Li, W L

    2015-11-01

    Knowledge of the spatial extent of forested wetlands is essential to many studies, including wetland functioning assessment, greenhouse gas flux estimation, and identification of suitable wildlife habitat. For discriminating forested wetlands from adjacent land cover types, researchers have resorted to image analysis techniques applied to numerous remotely sensed data. While these have had some success, there is still no consensus on the optimal approach for mapping forested wetlands. To address this problem, we examined two machine learning approaches, random forest (RF) and K-nearest neighbor (KNN) algorithms, and applied them within both pixel-based and object-based classification frameworks. The RF and KNN algorithms were constructed using predictors derived from Landsat 8 imagery, Radarsat-2 advanced synthetic aperture radar (SAR), and topographical indices. The results show that the object-based classifications performed better than per-pixel classifications using the same algorithm (RF) in terms of overall accuracy, and the difference in their kappa coefficients is statistically significant (p<0.01). There were noticeable omissions for forested and herbaceous wetlands in the per-pixel classifications using the RF algorithm. For the object-based image analysis, there were also statistically significant differences (p<0.01) in kappa coefficient between the RF- and KNN-based results. The object-based classification using RF provided a more visually adequate distribution of the land cover types of interest, while the object-based classification using KNN showed noticeable commissions for forested wetlands and omissions for agricultural land. This research demonstrates that object-based classification with RF using optical, radar, and topographical data improved the mapping accuracy of land covers and provided a feasible approach to discriminating forested wetlands from the other land cover types in forested areas.

  17. Random Effects Diagonal Metric Multidimensional Scaling Models.

    ERIC Educational Resources Information Center

    Clarkson, Douglas B.; Gonzalez, Richard

    2001-01-01

    Defines a random effects diagonal metric multidimensional scaling model, gives its computational algorithms, describes researchers' experiences with these algorithms, and provides an illustration of the use of the model and algorithms. (Author/SLD)

  18. Efficiency and effectiveness of the use of an acenocoumarol pharmacogenetic dosing algorithm versus usual care in patients with venous thromboembolic disease initiating oral anticoagulation: study protocol for a randomized controlled trial

    PubMed Central

    2012-01-01

    Background: Hemorrhagic events are frequent in patients on treatment with anti-vitamin K oral anticoagulants, due to their narrow therapeutic margin. Studies performed with acenocoumarol have shown the relationship between demographic, clinical and genotypic variants and the response to these drugs. Once the influence of these genetic and clinical factors on the dose of acenocoumarol needed to maintain a stable international normalized ratio (INR) has been demonstrated, new strategies need to be developed to predict the appropriate doses of this drug. Several pharmacogenetic algorithms have been developed for warfarin, but only three for acenocoumarol. After the development of a pharmacogenetic algorithm, the obvious next step is to demonstrate its effectiveness and utility by means of a randomized controlled trial. The aim of this study is to evaluate the effectiveness and efficiency of an acenocoumarol dosing algorithm developed by our group, which includes demographic, clinical and pharmacogenetic variables (VKORC1, CYP2C9, CYP4F2 and ApoE), in patients with venous thromboembolism (VTE). Methods and design: This is a multicenter, single-blind, randomized controlled clinical trial. The protocol has been approved by La Paz University Hospital Research Ethics Committee and by the Spanish Drug Agency. Two hundred and forty patients with VTE in whom oral anticoagulant therapy is indicated will be included. Randomization (case/control 1:1) will be stratified by center. The acenocoumarol dose in the control group will be scheduled and adjusted following common clinical practice; in the experimental arm, dosing will follow an individualized algorithm developed and validated by our group. Patients will be followed for three months. The main endpoints are: 1) percentage of patients with INR within the therapeutic range on day seven after initiation of oral anticoagulant therapy; 2) time from the start of oral anticoagulant treatment to achievement of a stable INR.

  19. Is random access memory random?

    NASA Technical Reports Server (NTRS)

    Denning, P. J.

    1986-01-01

    Most software is constructed on the assumption that programs and data are stored in random access memory (RAM). Physical limitations on the relative speeds of processor and memory elements lead to a variety of memory organizations that match the processor's addressing rate with the memory's service rate. These include interleaved and cached memory. A very high fraction of a processor's address requests can be satisfied from the cache without reference to main memory. The cache requests information from main memory in blocks that can be transferred at full memory speed. Programmers who organize algorithms for locality can realize the highest performance from these computers.
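
    The locality point can be illustrated with a toy timing experiment: traversing a C-ordered matrix along contiguous rows versus along strided columns; the matrix size and the NumPy stand-in for "memory" are arbitrary:

        import time
        import numpy as np

        a = np.random.default_rng(7).normal(size=(4096, 4096))  # C order: rows are contiguous

        def row_major_sum(m):
            """Visit elements in memory order: consecutive addresses, cache-friendly."""
            return sum(m[i, :].sum() for i in range(m.shape[0]))

        def col_major_sum(m):
            """Visit elements against memory order: large strides defeat the cache."""
            return sum(m[:, j].sum() for j in range(m.shape[1]))

        for f in (row_major_sum, col_major_sum):
            t = time.perf_counter()
            f(a)
            print(f.__name__, f"{time.perf_counter() - t:.3f} s")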

  20. Mapping the distribution of the main host for plague in a complex landscape in Kazakhstan: An object-based approach using SPOT-5 XS, Landsat 7 ETM+, SRTM and multiple Random Forests.

    PubMed

    Wilschut, L I; Addink, E A; Heesterbeek, J A P; Dubyanskiy, V M; Davis, S A; Laudisoit, A; Begon, M; Burdelov, L A; Atshabar, B B; de Jong, S M

    2013-08-01

    Plague is a zoonotic infectious disease present in great gerbil populations in Kazakhstan. Infectious disease dynamics are influenced by the spatial distribution of the carriers (hosts) of the disease. The great gerbil, the main host in our study area, lives in burrows, which can be recognized on high resolution satellite imagery. In this study, using earth observation data at various spatial scales, we map the spatial distribution of burrows in a semi-desert landscape. The study area consists of various landscape types. To evaluate whether identification of burrows by classification is possible in these landscape types, the study area was subdivided into eight landscape units, on the basis of Landsat 7 ETM+ derived Tasselled Cap Greenness and Brightness, and SRTM derived standard deviation in elevation. In the field, 904 burrows were mapped. Using two segmented 2.5 m resolution SPOT-5 XS satellite scenes, reference object sets were created. Random Forests were built for both SPOT scenes and used to classify the images. Additionally, a stratified classification was carried out, by building separate Random Forests per landscape unit. Burrows were successfully classified in all landscape units. In the 'steppe on floodplain' areas, classification worked best: producer's and user's accuracy in those areas reached 88% and 100%, respectively. In the 'floodplain' areas with a more heterogeneous vegetation cover, classification worked least well; there, accuracies were 86 and 58% respectively. Stratified classification improved the results in all landscape units where comparison was possible (four), increasing kappa coefficients by 13, 10, 9 and 1%, respectively. In this study, an innovative stratification method using high- and medium resolution imagery was applied in order to map host distribution on a large spatial scale. The burrow maps we developed will help to detect changes in the distribution of great gerbil populations and, moreover, serve as a unique empirical

  2. Mapping the distribution of the main host for plague in a complex landscape in Kazakhstan: An object-based approach using SPOT-5 XS, Landsat 7 ETM+, SRTM and multiple Random Forests

    NASA Astrophysics Data System (ADS)

    Wilschut, L. I.; Addink, E. A.; Heesterbeek, J. A. P.; Dubyanskiy, V. M.; Davis, S. A.; Laudisoit, A.; Begon, M.; Burdelov, L. A.; Atshabar, B. B.; de Jong, S. M.

    2013-08-01

    Plague is a zoonotic infectious disease present in great gerbil populations in Kazakhstan. Infectious disease dynamics are influenced by the spatial distribution of the carriers (hosts) of the disease. The great gerbil, the main host in our study area, lives in burrows, which can be recognized on high resolution satellite imagery. In this study, using earth observation data at various spatial scales, we map the spatial distribution of burrows in a semi-desert landscape. The study area consists of various landscape types. To evaluate whether identification of burrows by classification is possible in these landscape types, the study area was subdivided into eight landscape units, on the basis of Landsat 7 ETM+ derived Tasselled Cap Greenness and Brightness, and SRTM derived standard deviation in elevation. In the field, 904 burrows were mapped. Using two segmented 2.5 m resolution SPOT-5 XS satellite scenes, reference object sets were created. Random Forests were built for both SPOT scenes and used to classify the images. Additionally, a stratified classification was carried out, by building separate Random Forests per landscape unit. Burrows were successfully classified in all landscape units. In the ‘steppe on floodplain’ areas, classification worked best: producer's and user's accuracy in those areas reached 88% and 100%, respectively. In the ‘floodplain’ areas with a more heterogeneous vegetation cover, classification worked least well; there, accuracies were 86 and 58% respectively. Stratified classification improved the results in all landscape units where comparison was possible (four), increasing kappa coefficients by 13, 10, 9 and 1%, respectively. In this study, an innovative stratification method using high- and medium resolution imagery was applied in order to map host distribution on a large spatial scale. The burrow maps we developed will help to detect changes in the distribution of great gerbil populations and, moreover, serve as a unique

  3. Random broadcast on random geometric graphs

    SciTech Connect

    Bradonjic, Milan; Elsasser, Robert; Friedrich, Tobias

    2009-01-01

    In this work, we consider the random broadcast time on random geometric graphs (RGGs). The classic random broadcast model, also known as the push algorithm, is defined as follows: starting with one informed node, in each succeeding round every informed node chooses one of its neighbors uniformly at random and informs it. We consider the random broadcast time on RGGs when, with high probability, (i) the RGG is connected, or (ii) the RGG contains a giant component. We show that the random broadcast time is bounded by O(√n + diam(component)), where diam(component) is the diameter of the entire graph or of the giant component, for regimes (i) and (ii), respectively. In other words, for both regimes, we derive the broadcast time to be Θ(diam(G)), which is asymptotically optimal.
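
    The push model described above is easy to simulate; the sketch below runs it on a toy cycle graph rather than a random geometric graph, and all names are illustrative:

        import random

        def push_broadcast_rounds(adj, start, seed=0):
            """Simulate the classic push protocol: each round, every informed node
            picks a uniformly random neighbor and informs it. Returns the number
            of rounds until all nodes are informed (assumes a connected graph)."""
            rng = random.Random(seed)
            informed = {start}
            rounds = 0
            while len(informed) < len(adj):
                newly = {rng.choice(adj[u]) for u in informed if adj[u]}
                informed |= newly
                rounds += 1
            return rounds

        # Toy example: a cycle of 64 nodes (broadcast time grows with the diameter).
        n = 64
        adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
        print(push_broadcast_rounds(adj, start=0))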

  4. Angular Distribution of Particles Emerging from a Diffusive Region and its Implications for the Fleck-Canfield Random Walk Algorithm for Implicit Monte Carlo Radiation Transport

    SciTech Connect

    Cooper, M.A.

    2000-07-03

    We present various approximations for the angular distribution of particles emerging from an optically thick, purely isotropically scattering region into a vacuum. Our motivation is to use such a distribution for the Fleck-Canfield random walk method [1] for implicit Monte Carlo (IMC) [2] radiation transport problems. We demonstrate that the cosine distribution recommended in the original random walk paper [1] is a poor approximation to the angular distribution predicted by transport theory. Then we examine other approximations that more closely match the transport angular distribution.

  5. Multisource passive acoustic tracking: an application of random finite set data fusion

    NASA Astrophysics Data System (ADS)

    Ali, Andreas M.; Hudson, Ralph E.; Lorenzelli, Flavio; Yao, Kung

    2010-04-01

    Multisource passive acoustic tracking is useful in animal bio-behavioral studies, replacing or enhancing human involvement during and after field data collection. Multiple simultaneous vocalizations are a common occurrence in a forest or a jungle, where many species are encountered. Given a set of nodes capable of producing multiple direction-of-arrival (DOA) estimates, such data need to be combined into meaningful track estimates. Random Finite Sets provide a suitable mathematical probabilistic model for analysis and for the synthesis of optimal estimation algorithms. The proposed algorithm was verified using a simulation and a controlled test experiment.

  6. On the likelihood of forests

    NASA Astrophysics Data System (ADS)

    Shang, Yilun

    2016-08-01

    How complex a network is crucially impacts its function and performance. In many modern applications, the networks involved have a growth property and sparse structures, which pose challenges to physicists and applied mathematicians. In this paper, we introduce the forest likelihood as a plausible measure to gauge how difficult it is to construct a forest in a non-preferential attachment way. Based on the notions of admittable labeling and path construction, we propose algorithms for computing the forest likelihood of a given forest. Concrete examples as well as the distributions of forest likelihoods for all forests with some fixed numbers of nodes are presented. Moreover, we illustrate the ideas on real-life networks, including a benzenoid tree, a mathematical family tree, and a peer-to-peer network.

  7. Comparison of 3D-OP-OSEM and 3D-FBP reconstruction algorithms for High-Resolution Research Tomograph studies: effects of randoms estimation methods.

    PubMed

    van Velden, Floris H P; Kloet, Reina W; van Berckel, Bart N M; Wolfensberger, Saskia P A; Lammertsma, Adriaan A; Boellaard, Ronald

    2008-06-21

    The High-Resolution Research Tomograph (HRRT) is a dedicated human brain positron emission tomography (PET) scanner. Recently, a 3D filtered backprojection (3D-FBP) reconstruction method has been implemented to reduce bias in short duration frames, currently observed in 3D ordinary Poisson OSEM (3D-OP-OSEM) reconstructions. Further improvements might be expected using a new method of variance reduction on randoms (VRR) based on coincidence histograms instead of using the delayed window technique (DW) to estimate randoms. The goal of this study was to evaluate VRR in combination with 3D-OP-OSEM and 3D-FBP reconstruction techniques. To this end, several phantom studies and a human brain study were performed. For most phantom studies, 3D-OP-OSEM showed higher accuracy of observed activity concentrations with VRR than with DW. However, both positive and negative deviations in reconstructed activity concentrations and large biases of grey to white matter contrast ratio (up to 88%) were still observed as a function of scan statistics. Moreover, 3D-OP-OSEM+VRR also showed bias of up to 64% in clinical data, i.e., in some pharmacokinetic parameters, as compared with those obtained with 3D-FBP+VRR. In the case of 3D-FBP, VRR showed similar results to DW for both phantom and clinical data, except that VRR showed a better standard deviation of 6-10%. Therefore, VRR should be used to correct for randoms in HRRT PET studies.

  8. Comparison of 3D-OP-OSEM and 3D-FBP reconstruction algorithms for High-Resolution Research Tomograph studies: effects of randoms estimation methods

    NASA Astrophysics Data System (ADS)

    van Velden, Floris H. P.; Kloet, Reina W.; van Berckel, Bart N. M.; Wolfensberger, Saskia P. A.; Lammertsma, Adriaan A.; Boellaard, Ronald

    2008-06-01

    The High-Resolution Research Tomograph (HRRT) is a dedicated human brain positron emission tomography (PET) scanner. Recently, a 3D filtered backprojection (3D-FBP) reconstruction method has been implemented to reduce bias in short duration frames, currently observed in 3D ordinary Poisson OSEM (3D-OP-OSEM) reconstructions. Further improvements might be expected using a new method of variance reduction on randoms (VRR) based on coincidence histograms instead of using the delayed window technique (DW) to estimate randoms. The goal of this study was to evaluate VRR in combination with 3D-OP-OSEM and 3D-FBP reconstruction techniques. To this end, several phantom studies and a human brain study were performed. For most phantom studies, 3D-OP-OSEM showed higher accuracy of observed activity concentrations with VRR than with DW. However, both positive and negative deviations in reconstructed activity concentrations and large biases of grey to white matter contrast ratio (up to 88%) were still observed as a function of scan statistics. Moreover, 3D-OP-OSEM+VRR also showed bias of up to 64% in clinical data, i.e., in some pharmacokinetic parameters, as compared with those obtained with 3D-FBP+VRR. In the case of 3D-FBP, VRR showed similar results to DW for both phantom and clinical data, except that VRR showed a better standard deviation of 6-10%. Therefore, VRR should be used to correct for randoms in HRRT PET studies.

  9. Mapping the distribution of the main host for plague in a complex landscape in Kazakhstan: An object-based approach using SPOT-5 XS, Landsat 7 ETM+, SRTM and multiple Random Forests

    PubMed Central

    Wilschut, L.I.; Addink, E.A.; Heesterbeek, J.A.P.; Dubyanskiy, V.M.; Davis, S.A.; Laudisoit, A.; Begon, M.; Burdelov, L.A.; Atshabar, B.B.; de Jong, S.M.

    2013-01-01

    Plague is a zoonotic infectious disease present in great gerbil populations in Kazakhstan. Infectious disease dynamics are influenced by the spatial distribution of the carriers (hosts) of the disease. The great gerbil, the main host in our study area, lives in burrows, which can be recognized on high resolution satellite imagery. In this study, using earth observation data at various spatial scales, we map the spatial distribution of burrows in a semi-desert landscape. The study area consists of various landscape types. To evaluate whether identification of burrows by classification is possible in these landscape types, the study area was subdivided into eight landscape units, on the basis of Landsat 7 ETM+ derived Tasselled Cap Greenness and Brightness, and SRTM derived standard deviation in elevation. In the field, 904 burrows were mapped. Using two segmented 2.5 m resolution SPOT-5 XS satellite scenes, reference object sets were created. Random Forests were built for both SPOT scenes and used to classify the images. Additionally, a stratified classification was carried out, by building separate Random Forests per landscape unit. Burrows were successfully classified in all landscape units. In the ‘steppe on floodplain’ areas, classification worked best: producer's and user's accuracy in those areas reached 88% and 100%, respectively. In the ‘floodplain’ areas with a more heterogeneous vegetation cover, classification worked least well; there, accuracies were 86 and 58% respectively. Stratified classification improved the results in all landscape units where comparison was possible (four), increasing kappa coefficients by 13, 10, 9 and 1%, respectively. In this study, an innovative stratification method using high- and medium resolution imagery was applied in order to map host distribution on a large spatial scale. The burrow maps we developed will help to detect changes in the distribution of great gerbil populations and, moreover, serve as a unique

  10. A random-walk algorithm for modeling lithospheric density and the role of body forces in the evolution of the Midcontinent Rift

    USGS Publications Warehouse

    Levandowski, William Brower; Boyd, Oliver; Briggs, Richard; Gold, Ryan D.

    2015-01-01

    We test this algorithm on the Proterozoic Midcontinent Rift (MCR), north-central U.S. The MCR provides a challenge because it hosts a gravity high overlying low shear-wave velocity crust in a generally flat region. Our initial density estimates are derived from a seismic velocity/crustal thickness model based on joint inversion of surface-wave dispersion and receiver functions. By adjusting these estimates to reproduce gravity and topography, we generate a lithospheric-scale model that reveals dense middle crust and eclogitized lowermost crust within the rift. Mantle lithospheric density beneath the MCR is not anomalous, consistent with geochemical evidence that lithospheric mantle was not the primary source of rift-related magmas and suggesting that extension occurred in response to far-field stress rather than a hot mantle plume. Similarly, the subsequent inversion of normal faults resulted from changing far-field stress that exploited not only warm, recently faulted crust but also a gravitational potential energy low in the MCR. The success of this density modeling algorithm in the face of such apparently contradictory geophysical properties suggests that it may be applicable to a variety of tectonic and geodynamic problems. 

  11. Aboveground Biomass Monitoring over Siberian Boreal Forest Using Radar Remote Sensing Data

    NASA Astrophysics Data System (ADS)

    Stelmaszczuk-Gorska, M. A.; Thiel, C. J.; Schmullius, C.

    2014-12-01

    Aboveground biomass (AGB) plays an essential role in ecosystem research and global cycles and is of vital importance in climate studies. AGB accumulated in forests is of special monitoring interest, as forests contain more biomass than any other land biome. The largest of the land biomes is the boreal forest, which has a substantial carbon accumulation capability; its carbon stock is estimated to be 272 +/- 23 Pg C (32%) [1]. Russia's forests are of particular concern, because they are the largest source of uncertainty in global carbon stock calculations [1] and their inventory data have not been updated in the last 25 years [2]. In this research, new empirical models for AGB estimation are proposed. A processing scheme was developed that uses radar L-band data for AGB retrieval and optical data to update the in situ data. The approach was trained and validated in the Asian part of the boreal forest, in southern Russian Central Siberia, in two Siberian Federal Districts: Krasnoyarsk Kray and Irkutsk Oblast. Together, the training and testing forest territories cover an area of approximately 3,500 km2. ALOS PALSAR L-band single (HH: horizontal transmitted and received) and dual (HH and HV: horizontal transmitted, horizontal and vertical received) polarizations in Single Look Complex (SLC) format were used to calculate the backscattering coefficient in gamma nought and coherence. In total, more than 150 images acquired between 2006 and 2011 were available. The data were obtained through the ALOS Kyoto and Carbon Initiative Project (K&C) and were used to calibrate a randomForest algorithm; simple linear and multiple-regression approaches were also used. The uncertainty of the AGB estimation at pixel and stand level was approximately 35%, calculated by validation against an independent dataset. Previous studies employing ALOS PALSAR data over boreal forests reported uncertainties of 39.4% using a randomForest approach [2] and 42.8% using a semi-empirical approach [3].
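
    A hedged sketch of the retrieval step (Python with scikit-learn; the synthetic backscatter/coherence features and their dependence on AGB are invented for illustration and do not reproduce the paper's calibration): fit a random forest regressor to stand-level radar predictors and report a relative RMSE.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(42)

    # Synthetic stand-level features standing in for ALOS PALSAR
    # predictors: gamma-nought backscatter (HH, HV) and coherence.
    n = 500
    agb = rng.uniform(20, 250, size=n)                      # "true" AGB (t/ha)
    hh = -10 + 2.5 * np.log10(agb) + rng.normal(0, 0.5, n)
    hv = -16 + 4.0 * np.log10(agb) + rng.normal(0, 0.5, n)
    coh = 0.7 - 0.002 * agb + rng.normal(0, 0.05, n)
    X = np.column_stack([hh, hv, coh])

    X_tr, X_te, y_tr, y_te = train_test_split(X, agb, random_state=0)
    rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
    rmse = np.sqrt(np.mean((rf.predict(X_te) - y_te) ** 2))
    print(f"relative RMSE: {100 * rmse / y_te.mean():.1f}%")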

  12. Handling packet dropouts and random delays for unstable delayed processes in NCS by optimal tuning of PIλDμ controllers with evolutionary algorithms.

    PubMed

    Pan, Indranil; Das, Saptarshi; Gupta, Amitava

    2011-10-01

    The issues of stochastically varying network delays and packet dropouts in Networked Control System (NCS) applications have been simultaneously addressed by time-domain optimal tuning of fractional-order (FO) PID controllers. Different variants of evolutionary algorithms are used for the tuning process and their performances are compared. The effectiveness of the fractional-order PI(λ)D(μ) controllers relative to their integer-order counterparts is also examined. Two standard test-bench plants with time delay and unstable poles, which are encountered in process control applications, are tuned with the proposed method to establish the validity of the tuning methodology. The proposed tuning methodology is independent of the specific choice of plant and is also applicable to less complicated systems, making it useful in a wide variety of scenarios. The paper also shows the superiority of FOPID controllers over their conventional PID counterparts for NCS applications. PMID:21621208
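
    A simplified illustration of evolutionary controller tuning (Python with scipy; the toy unstable first-order plant, integer-order PID, and ISE cost are assumptions, and the paper's fractional-order operators and network delay/dropout models are omitted): differential evolution searches the gain space for a minimum-cost closed-loop response.

    import numpy as np
    from scipy.optimize import differential_evolution

    # Toy unstable plant dx/dt = a*x + u (a > 0), closed with an
    # integer-order PID under actuator saturation.
    def cost(gains, a=0.2, dt=0.01, t_end=10.0, setpoint=1.0):
        kp, ki, kd = gains
        x = integ = prev_err = 0.0
        ise = 0.0
        for _ in range(int(t_end / dt)):
            err = setpoint - x
            integ += err * dt
            deriv = (err - prev_err) / dt
            u = float(np.clip(kp * err + ki * integ + kd * deriv, -50, 50))
            x += (a * x + u) * dt                 # explicit Euler step
            prev_err = err
            ise += err * err * dt                 # integral of squared error
        return ise

    result = differential_evolution(cost, bounds=[(0, 20), (0, 10), (0, 2)],
                                    seed=1, maxiter=30)
    print("Kp, Ki, Kd =", result.x, " ISE =", result.fun)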

  13. An Automated Three-Dimensional Detection and Segmentation Method for Touching Cells by Integrating Concave Points Clustering and Random Walker Algorithm

    PubMed Central

    Gong, Hui; Chen, Shangbin; Zhang, Bin; Ding, Wenxiang; Luo, Qingming; Li, Anan

    2014-01-01

    Characterizing cytoarchitecture is crucial for understanding brain functions and neural diseases. In neuroanatomy, it is an important task to accurately extract cell population centroids and contours. Recent advances have permitted imaging at single-cell resolution for an entire mouse brain using the Nissl staining method. However, it is difficult to precisely segment numerous cells, especially cells touching each other. As presented herein, we have developed an automated three-dimensional detection and segmentation method applied to Nissl staining data, with the following two key steps: 1) concave points clustering to determine the seed points of touching cells; and 2) random walker segmentation to obtain cell contours. We have also evaluated the performance of our proposed method with several mouse brain datasets, which were captured with the micro-optical sectioning tomography imaging system and include closely touching cells. Compared with traditional detection and segmentation methods, our approach shows promising detection accuracy and high robustness. PMID:25111442
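
    A reduced sketch of the segmentation step (Python with scikit-image/scipy; the synthetic touching discs are illustrative, and distance-transform maxima stand in for the paper's concave-points clustering when seeding): random walker labels propagate from the seeds to split the touching objects.

    import numpy as np
    from scipy import ndimage as ndi
    from skimage.feature import peak_local_max
    from skimage.segmentation import random_walker

    # Two touching "cells" as overlapping discs in a binary image.
    img = np.zeros((80, 120), dtype=bool)
    yy, xx = np.ogrid[:80, :120]
    img |= (yy - 40) ** 2 + (xx - 45) ** 2 < 20 ** 2
    img |= (yy - 40) ** 2 + (xx - 75) ** 2 < 20 ** 2

    # Seeds: distance-transform maxima stand in for the paper's
    # concave-points clustering step.
    dist = ndi.distance_transform_edt(img)
    peaks = peak_local_max(dist, min_distance=15, labels=img.astype(int))
    markers = np.zeros(img.shape, dtype=int)
    markers[tuple(peaks.T)] = np.arange(1, len(peaks) + 1)
    markers[~img] = -1                      # exclude background from the walk

    labels = random_walker(dist, markers)
    print(np.unique(labels))                # -1 plus one label per cell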

  14. The Effectiveness of Parent Training as a Treatment for Preschool Attention-Deficit/Hyperactivity Disorder: Study Protocol for a Randomized Controlled, Multicenter Trial of the New Forest Parenting Program in Everyday Clinical Practice

    PubMed Central

    Daley, David; Frydenberg, Morten; Rask, Charlotte U; Sonuga-Barke, Edmund; Thomsen, Per H

    2016-01-01

    Background Parent training is recommended as the first-line treatment for attention-deficit/hyperactivity disorder (ADHD) in preschool children. The New Forest Parenting Programme (NFPP) is an evidence-based parenting program developed specifically to target preschool ADHD. Objective The objective of this trial is to investigate whether the NFPP can be effectively delivered for children referred through official community pathways in everyday clinical practice. Methods A multicenter randomized controlled parallel-arm trial design is employed. There are two treatment arms: NFPP and treatment as usual. NFPP consists of eight individually delivered parenting sessions, with the child attending three of the sessions. Outcomes are examined at three time points: T1 (baseline), T2 (week 12, post-intervention), and T3 (6-month follow-up). 140 children between the ages of 3 and 7, with a clinical diagnosis of ADHD informed by the Development and Well-Being Assessment, recruited from three child and adolescent psychiatry departments in Denmark, will take part. Randomization is on a 1:1 basis, stratified for age and gender. Results The primary endpoint is change in ADHD symptoms, as measured by the Preschool ADHD Rating Scale (ADHD-RS), by T2. Secondary outcome measures include effects on this measure at T3, as well as T2 and T3 measures of teacher-reported Preschool ADHD-RS scores, parent- and teacher-rated scores on the Strengths and Difficulties Questionnaire, direct observation of ADHD behaviors during the Child's Solo Play, observation of parent-child interaction, parent sense of competence, and family stress. Results will be reported using the standards set out in the Consolidated Standards of Reporting Trials Statement for Randomized Controlled Trials of nonpharmacological treatments. Conclusions The trial will provide evidence as to whether NFPP is a more effective treatment for preschool ADHD than the treatment usually offered in everyday clinical practice. Trial

  15. The LeFE algorithm: embracing the complexity of gene expression in the interpretation of microarray data

    PubMed Central

    Eichler, Gabriel S; Reimers, Mark; Kane, David; Weinstein, John N

    2007-01-01

    Interpretation of microarray data remains a challenge, and most methods fail to consider the complex, nonlinear regulation of gene expression. To address that limitation, we introduce Learner of Functional Enrichment (LeFE), a statistical/machine learning algorithm based on Random Forest, and demonstrate it on several diverse datasets: smoker/never smoker, breast cancer classification, and cancer drug sensitivity. We also compare it with previously published algorithms, including Gene Set Enrichment Analysis. LeFE regularly identifies statistically significant functional themes consistent with known biology. PMID:17845722
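
    A loose sketch of the idea behind LeFE, not the published algorithm (Python with scikit-learn; the data, category membership, and importance-based score are all synthetic assumptions): train a random forest on a gene category plus random background genes and compare the category's importance against a label-permutation null.

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)

    # Synthetic expression matrix: 60 samples x 200 genes; genes 0-9
    # form a "category" weakly associated with the class label.
    X = rng.normal(size=(60, 200))
    y = rng.integers(0, 2, size=60)
    X[:, :10] += 0.8 * y[:, None]
    category = np.arange(10)

    def category_score(X, y, category, n_background=40, seed=0):
        # Mean RF importance of category genes minus that of random
        # background genes trained alongside them.
        r = np.random.default_rng(seed)
        others = np.setdiff1d(np.arange(X.shape[1]), category)
        bg = r.choice(others, size=n_background, replace=False)
        cols = np.concatenate([category, bg])
        rf = RandomForestClassifier(n_estimators=300, random_state=0)
        imp = rf.fit(X[:, cols], y).feature_importances_
        return imp[: len(category)].mean() - imp[len(category):].mean()

    obs = category_score(X, y, category)
    null = [category_score(X, rng.permutation(y), category, seed=i) for i in range(50)]
    p = (1 + sum(v >= obs for v in null)) / (len(null) + 1)
    print(f"category score = {obs:.4f}, permutation p = {p:.3f}")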

  16. Random bistochastic matrices

    NASA Astrophysics Data System (ADS)

    Cappellini, Valerio; Sommers, Hans-Jürgen; Bruzda, Wojciech; Życzkowski, Karol

    2009-09-01

    Ensembles of random stochastic and bistochastic matrices are investigated. While all columns of a random stochastic matrix can be chosen independently, the rows and columns of a bistochastic matrix have to be correlated. We evaluate the probability measure induced into the Birkhoff polytope of bistochastic matrices by applying the Sinkhorn algorithm to a given ensemble of random stochastic matrices. For matrices of order N = 2 we derive explicit formulae for the probability distributions induced by random stochastic matrices with columns distributed according to the Dirichlet distribution. For arbitrary N we construct an initial ensemble of stochastic matrices which allows one to generate random bistochastic matrices according to a distribution locally flat at the center of the Birkhoff polytope. The value of the probability density at this point enables us to obtain an estimation of the volume of the Birkhoff polytope, consistent with recent asymptotic results.
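
    A minimal numerical sketch of the construction (Python with numpy; the tolerance and matrix order are arbitrary): apply the Sinkhorn algorithm, alternating row and column normalization, to a random stochastic matrix with Dirichlet-distributed rows until it is bistochastic.

    import numpy as np

    def sinkhorn(S, tol=1e-12, max_iter=10_000):
        # Alternately normalize rows and columns until both sum to 1.
        B = S.astype(float).copy()
        for _ in range(max_iter):
            B /= B.sum(axis=1, keepdims=True)
            B /= B.sum(axis=0, keepdims=True)
            if np.allclose(B.sum(axis=1), 1.0, atol=tol):
                break
        return B

    rng = np.random.default_rng(0)
    S = rng.dirichlet(np.ones(4), size=4)   # stochastic: Dirichlet rows
    B = sinkhorn(S)
    print(B.sum(axis=0), B.sum(axis=1))     # both close to all-ones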

  17. Algorithmic chemistry

    SciTech Connect

    Fontana, W.

    1990-12-13

    In this paper complex adaptive systems are defined by a self-referential loop in which objects encode functions that act back on these objects. A model for this loop is presented. It uses a simple recursive formal language, derived from the lambda-calculus, to provide a semantics that maps character strings into functions that manipulate symbols on strings. The interaction between two functions, or algorithms, is defined naturally within the language through function composition, and results in the production of a new function. An iterated map acting on sets of functions and a corresponding graph representation are defined. Their properties are useful for discussing the behavior of a fixed-size ensemble of randomly interacting functions. This "function gas", or "Turing gas", is studied under various conditions, and evolves cooperative interaction patterns of considerable intricacy. These patterns adapt under the influence of perturbations consisting of the addition of new random functions to the system. Different organizations emerge depending on the availability of self-replicators.

  18. Improved forest change detection with terrain illumination corrected landsat images

    Technology Transfer Automated Retrieval System (TEKTRAN)

    An illumination correction algorithm has been developed to improve the accuracy of forest change detection from Landsat reflectance data. This algorithm is based on an empirical rotation model and was tested on the Landsat imagery pair over Cherokee National Forest, Tennessee, Uinta-Wasatch-Cache N...

  19. Evaluation of Algorithms for a Miles-in-Trail Decision Support Tool

    NASA Technical Reports Server (NTRS)

    Bloem, Michael; Hattaway, David; Bambos, Nicholas

    2012-01-01

    Four machine learning algorithms were prototyped and evaluated for use in a proposed decision support tool that would assist air traffic managers as they set Miles-in-Trail restrictions. The tool would display probabilities that each possible Miles-in-Trail value should be used in a given situation. The algorithms were evaluated with an expected Miles-in-Trail cost that assumes traffic managers set restrictions based on the tool-suggested probabilities. Basic Support Vector Machine, random forest, and decision tree algorithms were evaluated, as was a softmax regression algorithm that was modified to explicitly reduce the expected Miles-in-Trail cost. The algorithms were evaluated with data from the summer of 2011 for air traffic flows bound to the Newark Liberty International Airport (EWR) over the ARD, PENNS, and SHAFF fixes. The algorithms were provided with 18 input features that describe the weather at EWR, the runway configuration at EWR, the scheduled traffic demand at EWR and the fixes, and other traffic management initiatives in place at EWR. Features describing other traffic management initiatives at EWR and the weather at EWR achieved relatively high information gain scores, indicating that they are the most useful for estimating Miles-in-Trail. In spite of a high variance or over-fitting problem, the decision tree algorithm achieved the lowest expected Miles-in-Trail costs when the algorithms were evaluated using 10-fold cross validation with the summer 2011 data for these air traffic flows.
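
    A schematic of the expected-cost evaluation (Python with scikit-learn; the features, labels, and cost matrix are synthetic stand-ins, not the paper's traffic data): a plain softmax (logistic regression) model produces Miles-in-Trail probabilities, and the evaluation weights a cost matrix by those probabilities.

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Synthetic stand-in: 18 features, 3 candidate MiT values.
    X = rng.normal(size=(2000, 18))
    y = (X[:, :4].sum(axis=1) > 0).astype(int) + (X[:, 4] > 1).astype(int)

    # Cost of applying restriction r when t would have been best; the
    # paper's cost function is domain-specific.
    C = 10.0 * np.abs(np.subtract.outer(np.arange(3), np.arange(3)))

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    P = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)

    # Expected cost when restriction r is chosen with probability P[i, r].
    print("expected cost per decision:", np.mean(np.sum(P * C[y_te], axis=1)))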

  20. Mapping tree health using airborne laser scans and hyperspectral imagery: a case study for a floodplain eucalypt forest

    NASA Astrophysics Data System (ADS)

    Shendryk, Iurii; Tulbure, Mirela; Broich, Mark; McGrath, Andrew; Alexandrov, Sergey; Keith, David

    2016-04-01

    Airborne laser scanning (ALS) and hyperspectral imaging (HSI) are two complementary remote sensing technologies that provide comprehensive structural and spectral characteristics of forests over large areas. In this study we developed two algorithms: one for individual tree delineation utilizing ALS, and the other utilizing ALS and HSI to characterize the health of the delineated trees in a structurally complex floodplain eucalypt forest. We conducted experiments in the world's largest river red gum (eucalypt) forest, located in south-east Australia, which has experienced severe dieback over the past six decades. For detection of individual trees from ALS we developed a novel bottom-up approach based on Euclidean distance clustering to detect tree trunks and random walks segmentation to further delineate tree crowns. Overall, our algorithm was able to detect 67% of tree trunks with diameters larger than 13 cm. We assessed the accuracy of tree delineation in terms of crown height and width, with correct delineation of 68% of tree crowns. An increase in ALS point density from ~12 to ~24 points/m2 resulted in tree trunk detection and crown delineation increases of 11% and 13%, respectively. Incorrectly delineated crowns were generally attributed to areas with high tree density along water courses. The accurate delineation of trees allowed us to classify the health of this forest using machine learning and field-measured tree crown dieback and transparency ratios, which were good predictors of tree health in this forest. ALS- and HSI-derived indices were used as predictor variables to train and test an object-oriented random forest classifier. Returned pulse width, intensity, and density-related ALS indices were the most important predictors in the tree health classifications. At the forest level, in terms of tree crown dieback, 77% of trees were classified as healthy, 14% as declining, and 9% as dying or dead, with 81% mapping accuracy. Similarly, in terms of tree

  1. Natural Variability of Mexican Forest Fires

    NASA Astrophysics Data System (ADS)

    Velasco-Herrera, Graciela; Velasco Herrera, Victor Manuel; Kemper-Valverdea, N.

    The purposes of this paper are 1) to present a new algorithm for analyzing forest fires, 2) to discuss the present understanding of natural variability at different scales, with special emphasis on Mexican conditions since 1972, 3) to analyze the internal and external factors affecting forest fires, for example ENSO and Total Solar Irradiance, 4) to discuss the implications of this knowledge for research and for restoration and management methods whose purpose is to enhance forest biodiversity conservation, and 5) to present an estimate of Mexican forest fires for the next decade. These results may be useful for minimizing human and economic losses.

  2. Fast Randomized STDMA Link Scheduling

    NASA Astrophysics Data System (ADS)

    Gomez, Sergio; Gras, Oriol; Friderikos, Vasilis

    In this paper a fast randomized parallel link-swap-based packing (RSP) algorithm for timeslot allocation in a spatial time division multiple access (STDMA) wireless mesh network is presented. The proposed randomized algorithm extends several greedy scheduling algorithms that utilize the physical interference model by applying a local search that leads to a substantial improvement in spatial timeslot reuse. Numerical simulations reveal that, compared to previously proposed scheduling schemes, the randomized algorithm can achieve a performance gain of up to 11%. A significant benefit of the proposed scheme is that the computations can be parallelized and can therefore efficiently utilize commodity and emerging multi-core and/or multi-CPU processors.

  3. Time fluctuation analysis of forest fire sequences

    NASA Astrophysics Data System (ADS)

    Vega Orozco, Carmen D.; Kanevski, Mikhaïl; Tonini, Marj; Golay, Jean; Pereira, Mário J. G.

    2013-04-01

    Forest fires are complex events involving both space and time fluctuations. Understanding their dynamics and pattern distribution is of great importance in order to improve resource allocation and support fire management actions at local and global levels. This study aims at characterizing the temporal fluctuations of forest fire sequences observed in Portugal, the country that holds the largest wildfire land dataset in Europe. This research applies several exploratory data analysis measures to 302,000 forest fires that occurred from 1980 to 2007. The applied clustering measures are: the Morisita clustering index, fractal and multifractal dimensions (box-counting), Ripley's K-function, the Allan Factor, and variography. These algorithms enable a global analysis of temporal structure, describing the degree of clustering of a point pattern and defining whether the observed events occur randomly, in clusters, or in a regular pattern. The considered methods are of general importance and can be used for other spatio-temporal events (e.g., crime, epidemiology, biodiversity, geomarketing). An important contribution of this research deals with the analysis and estimation of local measures of clustering that help in understanding their temporal structure. Each measure is described and executed for the raw data (forest fires geo-database), and the results are compared to reference patterns generated under the null hypothesis of randomness (Poisson processes) embedded in the same time period as the raw data. This comparison enables estimating the degree of deviation of the real data from a Poisson process. Generalizations to functional measures of these clustering methods, taking the phenomena into account, were also applied and adapted to detect time dependences in a measured variable (i.e., burned area). The time clustering of the raw data is compared several times with the Poisson processes at different thresholds of the measured function. Then, the clustering measure value
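
    One of the listed measures, sketched on synthetic data (Python with numpy; the window sizes and the clustered surrogate are arbitrary choices): the Allan Factor of counts in consecutive windows, which stays near 1 for a Poisson process and deviates for clustered sequences.

    import numpy as np

    def allan_factor(times, T):
        # AF(T) = <(N_{k+1} - N_k)^2> / (2 <N_k>), where N_k counts
        # events in the k-th window of length T; ~1 for a Poisson process.
        n_win = int(times.max() // T)
        counts, _ = np.histogram(times, bins=n_win, range=(0, n_win * T))
        d = np.diff(counts).astype(float)
        return d.dot(d) / len(d) / (2 * counts.mean())

    rng = np.random.default_rng(0)
    poisson = np.cumsum(rng.exponential(1.0, size=50000))   # homogeneous reference
    clustered = poisson[(poisson // 50) % 2 == 0]           # alternating on/off bursts

    for T in (1.0, 10.0, 100.0):
        print(f"T={T:6.0f}  AF(poisson)={allan_factor(poisson, T):5.2f}  "
              f"AF(clustered)={allan_factor(clustered, T):5.2f}")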

  4. Higher-order force gradient symplectic algorithms

    NASA Astrophysics Data System (ADS)

    Chin, Siu A.; Kidwell, Donald W.

    2000-12-01

    We show that a recently discovered fourth-order symplectic algorithm, which requires one evaluation of the force gradient in addition to three evaluations of the force, when iterated to higher order, yields algorithms that are far superior to similarly iterated higher-order algorithms based on the standard Forest-Ruth algorithm. We gauge the accuracy of each algorithm by comparing the step-size-independent error functions associated with energy conservation and the rotation of the Laplace-Runge-Lenz vector when solving a highly eccentric Kepler problem. For orders 6, 8, 10, and 12, the new algorithms are better by factors of approximately 10^3, 10^4, 10^4, and 10^5, respectively.
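
    For reference, a sketch of the standard (non-gradient) Forest-Ruth scheme the paper improves upon (Python with numpy; the eccentricity, step size, and run length are arbitrary test choices, and the force-gradient term of the newer algorithms is not included): a fourth-order drift-kick composition applied to an eccentric Kepler orbit, with energy drift as the accuracy gauge.

    import numpy as np

    # Fourth-order Forest-Ruth composition: drift-kick stages with
    # coefficients built from theta = 1 / (2 - 2**(1/3)).
    THETA = 1.0 / (2.0 - 2.0 ** (1.0 / 3.0))

    def kepler_acc(q):
        return -q / np.linalg.norm(q) ** 3        # GM = 1

    def forest_ruth_step(q, v, dt):
        for c, d in ((THETA / 2, THETA),
                     ((1 - THETA) / 2, 1 - 2 * THETA),
                     ((1 - THETA) / 2, THETA),
                     (THETA / 2, 0.0)):
            q = q + c * dt * v
            if d:
                v = v + d * dt * kepler_acc(q)
        return q, v

    # Eccentric Kepler orbit (e = 0.9, a = 1), starting at aphelion.
    q, v = np.array([1.9, 0.0]), np.array([0.0, np.sqrt(0.1 / 1.9)])
    E0 = 0.5 * v.dot(v) - 1.0 / np.linalg.norm(q)
    for _ in range(100_000):
        q, v = forest_ruth_step(q, v, dt=1e-3)
    E = 0.5 * v.dot(v) - 1.0 / np.linalg.norm(q)
    print("relative energy error:", abs((E - E0) / E0))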

  5. Random quantum operations

    NASA Astrophysics Data System (ADS)

    Bruzda, Wojciech; Cappellini, Valerio; Sommers, Hans-Jürgen; Życzkowski, Karol

    2009-01-01

    We define a natural ensemble of trace preserving, completely positive quantum maps and present algorithms to generate them at random. Spectral properties of the superoperator Φ associated with a given quantum map are investigated and a quantum analogue of the Frobenius-Perron theorem is proved. We derive a general formula for the density of eigenvalues of Φ and show the connection with the Ginibre ensemble of real non-symmetric random matrices. Numerical investigations of the spectral gap imply that a generic state of the system iterated several times by a fixed generic map converges exponentially to an invariant state.

  6. Randomized selection on the GPU

    SciTech Connect

    Monroe, Laura Marie; Wendelberger, Joanne R; Michalak, Sarah E

    2011-01-13

    We implement here a fast and memory-sparing probabilistic top-N selection algorithm on the GPU. To our knowledge, this is the first direct selection in the literature for the GPU. The algorithm proceeds via a probabilistic guess-and-check process searching for the Nth element. It always gives a correct result and always terminates. The use of randomization reduces the amount of data that needs heavy processing, and so reduces the average time required for the algorithm. Probabilistic Las Vegas algorithms of this kind are a form of stochastic optimization and can be well suited to more general parallel processors with limited amounts of fast memory.
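
    A serial CPU analogue of the guess-and-check idea (Python; the GPU partitioning and memory layout of the paper are not modeled): a Las Vegas selection that guesses a random pivot, checks its rank, and discards the side that cannot contain the Nth element.

    import random

    def randomized_select(data, n):
        # n-th smallest (1-based) by probabilistic guess-and-check:
        # the result is always exact; only the running time is random
        # (expected linear in len(data)).
        while True:
            pivot = random.choice(data)
            lower = [x for x in data if x < pivot]
            n_equal = sum(1 for x in data if x == pivot)
            if n <= len(lower):
                data = lower
            elif n <= len(lower) + n_equal:
                return pivot
            else:
                n -= len(lower) + n_equal
                data = [x for x in data if x > pivot]

    vals = [random.random() for _ in range(100_000)]
    assert randomized_select(vals, 5000) == sorted(vals)[4999]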

  7. Genetic algorithms

    NASA Technical Reports Server (NTRS)

    Wang, Lui; Bayer, Steven E.

    1991-01-01

    Genetic algorithms are mathematical, highly parallel, adaptive search procedures (i.e., problem solving methods) based loosely on the processes of natural genetics and Darwinian survival of the fittest. Basic genetic algorithm concepts and applications are introduced, and results are presented from a project to develop a software tool that will enable the widespread use of genetic algorithm technology.
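
    A minimal genetic algorithm sketch (Python; the one-max fitness function, population size, and operator rates are arbitrary textbook choices, not from the project described): tournament selection, one-point crossover, and bit-flip mutation evolve a population of bit strings.

    import random

    L, POP, GENS = 40, 60, 80

    def fitness(bits):
        return sum(bits)            # "one-max": count of 1 bits

    def tournament(pop):
        return max(random.sample(pop, 3), key=fitness)

    pop = [[random.randint(0, 1) for _ in range(L)] for _ in range(POP)]
    for _ in range(GENS):
        nxt = []
        while len(nxt) < POP:
            a, b = tournament(pop), tournament(pop)
            cut = random.randrange(1, L)                    # one-point crossover
            child = [bit ^ (random.random() < 1 / L)        # bit-flip mutation
                     for bit in a[:cut] + b[cut:]]
            nxt.append(child)
        pop = nxt
    print("best fitness:", fitness(max(pop, key=fitness)), "of", L)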

  8. Forest Cover Estimation in Ireland Using Radar Remote Sensing: A Comparative Analysis of Forest Cover Assessment Methodologies

    PubMed Central

    Devaney, John; Barrett, Brian; Barrett, Frank; Redmond, John; O'Halloran, John

    2015-01-01

    Quantification of spatial and temporal changes in forest cover is an essential component of forest monitoring programs. Due to its cloud free capability, Synthetic Aperture Radar (SAR) is an ideal source of information on forest dynamics in countries with near-constant cloud-cover. However, few studies have investigated the use of SAR for forest cover estimation in landscapes with highly sparse and fragmented forest cover. In this study, the potential use of L-band SAR for forest cover estimation in two regions (Longford and Sligo) in Ireland is investigated and compared to forest cover estimates derived from three national (Forestry2010, Prime2, National Forest Inventory), one pan-European (Forest Map 2006) and one global forest cover (Global Forest Change) product. Two machine-learning approaches (Random Forests and Extremely Randomised Trees) are evaluated. Both Random Forests and Extremely Randomised Trees classification accuracies were high (98.1–98.5%), with differences between the two classifiers being minimal (<0.5%). Increasing levels of post classification filtering led to a decrease in estimated forest area and an increase in overall accuracy of SAR-derived forest cover maps. All forest cover products were evaluated using an independent validation dataset. For the Longford region, the highest overall accuracy was recorded with the Forestry2010 dataset (97.42%) whereas in Sligo, highest overall accuracy was obtained for the Prime2 dataset (97.43%), although accuracies of SAR-derived forest maps were comparable. Our findings indicate that spaceborne radar could aid inventories in regions with low levels of forest cover in fragmented landscapes. The reduced accuracies observed for the global and pan-continental forest cover maps in comparison to national and SAR-derived forest maps indicate that caution should be exercised when applying these datasets for national reporting. PMID:26262681

  11. Randomized parallel speedups for list ranking

    SciTech Connect

    Vishkin, U.

    1987-06-01

    The following problem is considered: given a linked list of length n, compute the distance of each element of the linked list from the end of the list. The problem has two standard deterministic algorithms: a linear time serial algorithm, and an O((n log n)/ρ + log n) time parallel algorithm using ρ processors. The authors present a randomized parallel algorithm for the problem. The algorithm is designed for an exclusive-read exclusive-write parallel random access machine (EREW PRAM). It runs almost surely in time O(n/ρ + log n log* n) using ρ processors. Using a recently published parallel prefix sums algorithm, the list-ranking algorithm can be adapted to run on a concurrent-read concurrent-write parallel random access machine (CRCW PRAM) almost surely in time O(n/ρ + log n) using ρ processors.

  12. Searching for closely related ligands with different mechanisms of action using machine learning and mapping algorithms.

    PubMed

    Balfer, Jenny; Vogt, Martin; Bajorath, Jürgen

    2013-09-23

    Supervised machine learning approaches, including support vector machines, random forests, Bayesian classifiers, nearest-neighbor similarity searching, and a conceptually distinct mapping algorithm termed DynaMAD, have been investigated for their ability to detect structurally related ligands of a given receptor with different mechanisms of action. For this purpose, a large number of simulated virtual screening trials were carried out with models trained on mechanistic subsets of different classes of receptor ligands. The results revealed that ligands with the desired mechanism of action were frequently contained in database selection sets of limited size. All machine learning approaches successfully detected mechanistic subsets of ligands in a large background database of druglike compounds. However, the early enrichment characteristics considerably differed. Overall, random forests of relatively simple design and support vector machines with Gaussian kernels (Gaussian SVMs) displayed the highest search performance. In addition, DynaMAD was found to yield very small selection sets comprising only ~10 compounds that also contained ligands with the desired mechanism of action. Random forest, Gaussian SVM, and DynaMAD calculations revealed an enrichment of compounds with the desired mechanism over other mechanistic subsets. PMID:23952618

  13. A Geospatial Assessment of Mountain Pine Beetle Infestations and Their Effect on Forest Health in Okanogan-Wenatchee National Forest

    NASA Astrophysics Data System (ADS)

    Allain, M.; Nguyen, A.; Johnson, E.; Williams, E.; Tsai, S.; Prichard, S.; Freed, T.; Skiles, J. W.

    2010-12-01

    Fire suppression over the past century has resulted in an accumulation of forest litter and increased tree density. As nutrients are sequestered in forest litter and not recycled by forest fires, soil nutrient concentrations have decreased. The forests of northern Washington are in poor health as a result of these factors coupled with sequential droughts. The mountain pine beetle (MPB) thrives in such conditions, giving rise to an outbreak in Washington’s Okanogan-Wenatchee National Forest. These outbreaks occur in three successive stages: the green, red, and gray stages. Beetles first infest the tree in the green phase, leading to discoloration of needles in the red phase and eventually death in the gray phase. With the use of geospatial technology, these outbreaks can be better mapped and assessed to evaluate forest health. Field work on seventeen randomly selected sites was conducted using the point-centered quarter method. The stratified random sampling technique ensured that the sampled trees were representative of all classifications present. Additional measurements taken were soil nutrient concentrations (sodium [Na+], nitrate [NO3-], and potassium [K+]), soil pH, and tree temperatures. Satellite imagery was used to define infestation levels and geophysical parameters, such as land cover, vegetation classification, and vegetation stress. ASTER images were used with the Ratio Vegetation Index (RVI) to explore the differences in vegetation, while MODIS images were used to analyze the Disturbance Index (DI). Four other vegetation indices from Landsat TM5 were used to distinguish the green, red, and gray phases. Selected imagery from the Hyperion sensor was used to run a minimum-distance supervised classification in ENVI, thus testing the ability of Hyperion imagery to detect the green phase. The National Agricultural Imagery Program (NAIP) archive was used to generate accurate maps of beetle-infested regions. This algorithm was used to detect bark beetle

  14. Geological Mapping Using Machine Learning Algorithms

    NASA Astrophysics Data System (ADS)

    Harvey, A. S.; Fotopoulos, G.

    2016-06-01

    Remotely sensed spectral imagery, geophysical (magnetic and gravity), and geodetic (elevation) data are useful in a variety of Earth science applications such as environmental monitoring and mineral exploration. Using these data with Machine Learning Algorithms (MLA), which are widely used in image analysis and statistical pattern recognition applications, may enhance preliminary geological mapping and interpretation. This approach contributes towards a rapid and objective means of geological mapping, in contrast to conventional field expedition techniques. In this study, four supervised MLAs (naïve Bayes, k-nearest neighbour, random forest, and support vector machines) are compared in order to assess their performance for correctly identifying geological rock types in an area with complete ground validation information. Geological maps of the Sudbury region are used for calibration and validation. The percentage of correct classifications was used as the indicator of performance. Results show that random forest is the best approach. As expected, MLA performance improves with more calibration clusters, i.e. a more uniform distribution of calibration data over the study region. Performance is generally low, though geological trends that correspond to a ground validation map are visualized. Low performance may be the result of poor spectral imaging of bare rock, which can be covered by vegetation or water. The distribution of calibration clusters and MLA input parameters affect the performance of the MLAs. Generally, performance improves with more uniform sampling, though this increases the required computational effort and time. With the performance levels achievable in this study, the technique is useful in identifying regions of interest and general rock-type trends. In particular, phase I geological site investigations will benefit from this approach and lead to the selection of sites for advanced surveys.
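
    A schematic version of the comparison (Python with scikit-learn; the synthetic features stand in for the spectral, geophysical, and geodetic layers, so the accuracies carry no geological meaning): the four MLAs scored by cross-validated percent correct.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.svm import SVC

    # Synthetic stand-in for per-pixel predictors labeled with rock type.
    X, y = make_classification(n_samples=3000, n_features=8, n_informative=5,
                               n_classes=4, n_clusters_per_class=1, random_state=0)

    models = {"naive Bayes": GaussianNB(),
              "k-nearest neighbour": KNeighborsClassifier(n_neighbors=5),
              "random forest": RandomForestClassifier(n_estimators=300, random_state=0),
              "support vector machine": SVC(kernel="rbf")}
    for name, model in models.items():
        acc = cross_val_score(model, X, y, cv=5).mean()
        print(f"{name:>22}: {100 * acc:.1f}% correct")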

  15. Forested wetlands

    SciTech Connect

    Lugo, A.E.; Brinson, M.; Brown, S.

    1990-01-01

    Forested wetlands play important roles in global biogeochemical cycles, in supporting freshwater and saltwater commercial fisheries, and in providing a place for wildlife of all kinds to flourish. Scientific attention towards these ecosystems has lagged, with only a few comprehensive works on the forested wetlands of the world. A major emphasis of this book is to develop unifying principles and databases on the structure and function of forested wetlands, in order to stimulate scientific study of them. Wetlands are areas that are inundated or saturated by surface water or groundwater at such a frequency and duration that, under natural conditions, they support organisms adapted to poorly aerated and/or saturated soil. The major topics covered are a strategy for classifying the conditions that control the structure and behavior of forested wetlands, which assumes that the physiognomy and floristic composition of a system reflect the total energy expenditure of the ecosystem, and the structural and functional characteristics of forested wetlands from different parts of the world.

  16. Patchiness of forest landscape can predict species distribution better than abundance: the case of a forest-dwelling passerine, the short-toed treecreeper, in central Italy.

    PubMed

    Basile, Marco; Valerio, Francesco; Balestrieri, Rosario; Posillico, Mario; Bucci, Rodolfo; Altea, Tiziana; De Cinti, Bruno; Matteucci, Giorgio

    2016-01-01

    Environmental heterogeneity affects not only the distribution of a species but also its local abundance. High heterogeneity due to habitat alteration and fragmentation can influence the realized niche of a species, lowering habitat suitability as well as reducing local abundance. We investigate whether a relationship exists between habitat suitability and abundance and whether both are affected by fragmentation. Our aim was to assess the predictive power of such a relationship to derive advice for environmental management. As a model species we used a forest specialist, the short-toed treecreeper (Family: Certhiidae; Certhia brachydactyla Brehm, 1820), and sampled it in central Italy. Species distribution was modelled as a function of forest structure, productivity and fragmentation, while abundance was directly estimated in two central Italian forest stands. Different algorithms were implemented to model species distribution, employing 170 occurrence points provided mostly by the MITO2000 database: an artificial neural network, classification tree analysis, flexible discriminant analysis, generalized boosting models, generalized linear models, multivariate additive regression splines, maximum entropy and random forests. Abundance was estimated also considering detectability, through N-mixture models. Differences between forest stands in both abundance and habitat suitability were assessed as well as the existence of a relationship. Simpler algorithms resulted in higher goodness of fit than complex ones. Fragmentation was highly influential in determining potential distribution. Local abundance and habitat suitability differed significantly between the two forest stands, which were also significantly different in the degree of fragmentation. Regression showed that suitability has a weak significant effect in explaining increasing value of abundance. In particular, local abundances varied both at low and high suitability values. The study lends support to the

  20. Mapping forest composition from the Canadian National Forest Inventory and land cover classification maps.

    PubMed

    Yemshanov, Denys; McKenney, Daniel W; Pedlar, John H

    2012-08-01

    Canada's National Forest Inventory (CanFI) provides coarse-grained, aggregated information on a large number of forest attributes. Though reasonably well suited for summary reporting on national forest resources, the coarse spatial nature of this data limits its usefulness in modeling applications that require information on forest composition at finer spatial resolutions. An alternative source of information is the land cover classification produced by the Canadian Forest Service as part of its Earth Observation for Sustainable Development of Forests (EOSD) initiative. This product, which is derived from Landsat satellite imagery, provides relatively high resolution coverage, but only very general information on forest composition (such as conifer, mixedwood, and deciduous). Here we link the CanFI and EOSD products using a spatial randomization technique to distribute the forest composition information in CanFI to the forest cover classes in EOSD. The resultant geospatial coverages provide randomized predictions of forest composition, which incorporate the fine-scale spatial detail of the EOSD product and agree in general terms with the species composition summaries from the original CanFI estimates. We describe the approach and provide illustrative results for selected major commercial tree species in Canada.
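
    A toy version of the linkage (Python with numpy; the cover classes, species list, and composition proportions are invented, not CanFI or EOSD values): each fine-resolution pixel draws a leading species from the coarse-inventory composition of its cover class, so class-level proportions are honored in expectation while fine-scale spatial detail is preserved.

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical inputs: a fine-resolution cover map (0 = conifer,
    # 1 = mixedwood, 2 = deciduous) and leading-species proportions
    # per cover class.
    cover = rng.integers(0, 3, size=(200, 200))
    species = ["black spruce", "jack pine", "trembling aspen", "white birch"]
    composition = {0: [0.60, 0.35, 0.03, 0.02],
                   1: [0.30, 0.20, 0.30, 0.20],
                   2: [0.05, 0.05, 0.55, 0.35]}

    # Spatial randomization: per-pixel draws from the class distribution.
    species_map = np.empty(cover.shape, dtype=int)
    for cls, probs in composition.items():
        mask = cover == cls
        species_map[mask] = rng.choice(len(species), size=mask.sum(), p=probs)

    for i, name in enumerate(species):
        print(f"{name:>15}: {100 * (species_map == i).mean():.1f}% of pixels")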

  1. How random are random numbers generated using photons?

    NASA Astrophysics Data System (ADS)

    Solis, Aldo; Angulo Martínez, Alí M.; Ramírez Alarcón, Roberto; Cruz Ramírez, Hector; U'Ren, Alfred B.; Hirsch, Jorge G.

    2015-06-01

    Randomness is fundamental in quantum theory, with many philosophical and practical implications. In this paper we discuss the concept of algorithmic randomness, which provides a quantitative method to assess the Borel normality of a given sequence of numbers, a necessary condition for it to be considered random. We use Borel normality as a tool to investigate the randomness of ten sequences of bits generated from the differences between detection times of photon pairs generated by spontaneous parametric downconversion. These sequences are shown to fulfil the randomness criteria without difficulties. As deviations from Borel normality for photon-generated random number sequences have been reported in previous work, a strategy to understand these diverging findings is outlined.
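
    A sketch of a Borel-normality style check (Python with numpy; the pseudo-random test sequence is a placeholder for photon-generated bits, and the tolerance shown is only on the order of the bound used in Calude's criterion, sqrt(log2(n)/n)): compare the frequencies of all k-bit blocks against the ideal 2^-k.

    import numpy as np

    def block_frequencies(bits, k):
        # Frequencies of all 2**k values over non-overlapping k-bit blocks.
        n = len(bits) // k
        blocks = bits[: n * k].reshape(n, k)
        vals = blocks.dot(1 << np.arange(k - 1, -1, -1))
        return np.bincount(vals, minlength=2 ** k) / n

    rng = np.random.default_rng(0)
    bits = rng.integers(0, 2, size=2 ** 20)   # pseudo-random stand-in sequence

    n = len(bits)
    tol = np.sqrt(np.log2(n) / n)             # tolerance of order sqrt(log2(n)/n)
    for k in (1, 2, 3, 4):
        dev = np.abs(block_frequencies(bits, k) - 2.0 ** -k).max()
        print(f"k={k}: max |freq - 2^-k| = {dev:.5f}  (tolerance {tol:.5f})")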

  2. Image segmentation using random features

    NASA Astrophysics Data System (ADS)

    Bull, Geoff; Gao, Junbin; Antolovich, Michael

    2014-01-01

    This paper presents a novel algorithm for selecting random features via compressed sensing to improve the performance of Normalized Cuts in image segmentation. Normalized Cuts is a clustering algorithm that has been widely applied to segmenting images, using features such as brightness, intervening contours, and Gabor filter responses. Drawbacks of Normalized Cuts are that computation times and memory usage can be excessive, and the obtained segmentations are often poor. This paper addresses the need to reduce the processing time of Normalized Cuts while improving the segmentations. A significant proportion of the time in calculating Normalized Cuts is spent computing an affinity matrix. A new algorithm has been developed that selects random features using compressed sensing techniques to reduce the computation needed for the affinity matrix. The new algorithm, when compared to the standard implementation of Normalized Cuts for segmenting images from the BSDS500, produces better segmentations in significantly less time.

  3. The use of airborne hyperspectral data for tree species classification in a species-rich Central European forest area

    NASA Astrophysics Data System (ADS)

    Richter, Ronny; Reu, Björn; Wirth, Christian; Doktor, Daniel; Vohland, Michael

    2016-10-01

    The success of remote sensing approaches to assessing tree species diversity in a heterogeneously mixed forest stand depends on the availability of both appropriate data and suitable classification algorithms. To separate the ten broadleaf tree species present in a small-structured floodplain forest, the Leipzig Riverside Forest, we introduce a majority-based classification approach for Discriminant Analysis based on Partial Least Squares (PLS-DA), which was tested against Random Forest (RF) and Support Vector Machines (SVM). Classifier performance was tested on different sets of airborne hyperspectral image data (AISA DUAL) that were acquired on single dates in August and September and also stacked to a composite product. Shadowed gaps and shadowed crown parts were eliminated via spectral mixture analysis (SMA) prior to the pixel-based classification. Training and validation sets were defined spectrally with the conditioned Latin hypercube method as a stratified random sampling procedure. In the validation, PLS-DA consistently outperformed the RF and SVM approaches on all datasets. The additional use of spectral variable selection (CARS, "competitive adaptive reweighted sampling") combined with PLS-DA further improved classification accuracies. Up to 78.4% overall accuracy was achieved for the stacked dataset. The image recorded in August provided slightly higher accuracies than the September image, regardless of the applied classifier.
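
    A bare-bones PLS-DA sketch (Python with scikit-learn; the synthetic spectra and the number of latent components are arbitrary, and the paper's majority vote over crown objects and CARS variable selection are omitted): regress one-hot class indicators on spectra and classify by the largest predicted response.

    import numpy as np
    from sklearn.cross_decomposition import PLSRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    # Synthetic pixel spectra: 3 species, 120 bands, class-specific
    # offsets plus noise.
    n_per, bands = 200, 120
    X = np.vstack([rng.normal(mu, 1.0, size=(n_per, bands))
                   for mu in (0.0, 0.4, 0.8)])
    y = np.repeat([0, 1, 2], n_per)

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    pls = PLSRegression(n_components=10).fit(X_tr, np.eye(3)[y_tr])
    pred = pls.predict(X_te).argmax(axis=1)   # class with largest response
    print(f"overall accuracy: {100 * (pred == y_te).mean():.1f}%")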

  4. Application of AIS Technology to Forest Mapping

    NASA Technical Reports Server (NTRS)

    Yool, S. R.; Star, J. L.

    1985-01-01

    Concerns about the environmental effects of large-scale deforestation have prompted efforts to map forests over large areas using various remote sensing data and image processing techniques. Basic research on the spectral characteristics of forest vegetation is required to form a basis for the development of new techniques and for image interpretation. Examination of LANDSAT data and image processing algorithms over a portion of boreal forest has demonstrated the complexity of the relations between the various expressions of forest canopies, environmental variability, and the relative capacities of different image processing algorithms to achieve high classification accuracies under these conditions. Airborne Imaging Spectrometer (AIS) data may in part provide the means to interpret the responses of standard data and techniques to the vegetation, based on their relatively high spectral resolution.

  5. Testing an earthquake prediction algorithm

    USGS Publications Warehouse

    Kossobokov, V.G.; Healy, J.H.; Dewey, J.W.

    1997-01-01

    A test to evaluate earthquake prediction algorithms is being applied to a Russian algorithm known as M8. The M8 algorithm makes intermediate-term predictions for earthquakes to occur in a large circle, based on integral counts of transient seismicity in the circle. In a retroactive prediction for the period January 1, 1985 to July 1, 1991, the algorithm as configured for the forward test would have predicted eight of ten strong earthquakes in the test area. A null hypothesis, based on random assignment of predictions, predicts eight earthquakes in 2.87% of the trials. The forward test began July 1, 1991 and will run through December 31, 1997. As of July 1, 1995, the algorithm had forward-predicted five out of nine earthquakes in the test area, a success ratio that would have been achieved in 53% of random trials under the null hypothesis.
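
    The null hypothesis lends itself to a short simulation (Python with numpy; the alarm coverage fraction of 0.455 is back-solved here so that roughly 2.9% of random trials catch eight of ten events, mirroring the 2.87% quoted above, and is not a published M8 figure): random predictions succeed only in proportion to the space-time volume their alarms cover.

    import numpy as np

    rng = np.random.default_rng(0)

    def null_success_prob(alarm_fraction, n_quakes, n_hits, trials=200_000):
        # Probability that alarms covering the given fraction of the
        # space-time volume, placed at random, catch >= n_hits of n_quakes.
        hits = rng.binomial(n_quakes, alarm_fraction, size=trials)
        return (hits >= n_hits).mean()

    print(null_success_prob(0.455, n_quakes=10, n_hits=8))   # ~0.029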

  6. Forests & Trees.

    ERIC Educational Resources Information Center

    Gage, Susan

    1989-01-01

    This newsletter discusses the disappearance of the world's forests and the resulting environmental problems: erosion and flooding; loss of genetic diversity; climatic changes such as reduced rainfall and an intensified greenhouse effect; and the displacement and destruction of indigenous cultures. The articles, lessons, and activities are…

  7. Forest Imaging

    NASA Technical Reports Server (NTRS)

    1992-01-01

    NASA's Technology Applications Center, with other government and academic agencies, provided technology for improved resources management to the Cibola National Forest. Landsat satellite images enabled vegetation over a large area to be classified for purposes of timber analysis, wildlife habitat, range measurement and development of general vegetation maps.

  8. Entangled decision forests and their application for semantic segmentation of CT images.

    PubMed

    Montillo, Albert; Shotton, Jamie; Winn, John; Iglesias, Juan Eugenio; Metaxas, Dimitri; Criminisi, Antonio

    2011-01-01

    This work addresses the challenging problem of simultaneously segmenting multiple anatomical structures in highly varied CT scans. We propose the entangled decision forest (EDF) as a new discriminative classifier which augments the state-of-the-art decision forest, resulting in higher prediction accuracy and shortened decision time. Our main contribution is two-fold. First, we propose entangling the binary tests applied at each tree node in the forest, such that the test result can depend on the results of tests applied earlier in the same tree and at image points offset from the voxel to be classified. This is demonstrated to improve accuracy and capture long-range semantic context. Second, during training, we propose injecting randomness in a guided way, in which node feature types and parameters are randomly drawn from a learned (nonuniform) distribution. This further improves classification accuracy. We assess our probabilistic anatomy segmentation technique using a labeled database of CT image volumes from 250 different patients, acquired with various scan protocols and scanner vendors. In each volume, 12 anatomical structures have been manually segmented. The database comprises highly varied body shapes and sizes, a wide array of pathologies, scan resolutions, and diverse contrast agents. Quantitative comparisons with state-of-the-art algorithms demonstrate both superior test accuracy and computational efficiency. PMID:21761656

  9. Discriminative boosted forest with convolutional neural network-based patch descriptor for object detection

    NASA Astrophysics Data System (ADS)

    Xiang, Tao; Li, Tao; Ye, Mao; Li, Xudong

    2016-01-01

    Object detection with intraclass variations is challenging. Existing methods have not achieved optimal combinations of classifiers and features, especially features learned by convolutional neural networks (CNNs). To solve this problem, we propose an object-detection method based on an improved random forest and local image patches represented by CNN features. First, we compute CNN-based patch descriptors for each sample by modified CNNs. Then, the random forest is built, whose split functions are defined by a patch selector and a linear projection learned by a linear support vector machine. To improve the classification accuracy, the split functions at each depth of the forest make up a local classifier, and all local classifiers are assembled in a layer-wise manner by a boosting algorithm. The main contributions of our approach are summarized as follows: (1) We propose a new local patch descriptor based on CNN features. (2) We define a patch-based split function which is optimized for maximum class-label purity and minimum classification error over the samples of the node. (3) Each local classifier is assembled by minimizing the global classification error. We evaluate the method on three well-known challenging datasets: TUD pedestrians, INRIA pedestrians, and UIUC cars. The experiments demonstrate that our method achieves state-of-the-art or competitive performance.

  10. Quasi-Random Sequence Generators.

    1994-03-01

    Version 00 LPTAU generates quasi-random sequences. The sequences are uniformly distributed sets of L=2**30 points in the N-dimensional unit cube I**N=[0,1]**N. The sequences are used as nodes for multidimensional integration, as searching points in global optimization, as trial points in multicriteria decision making, and as quasi-random points for quasi-Monte Carlo algorithms.
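
    LPTAU's LP-tau sequences are Sobol'-type sequences, a family that modern libraries expose directly. A minimal sketch, assuming SciPy (>= 1.7) as a stand-in for the original Fortran code:

      # Quasi-random points in the unit cube and a toy quasi-Monte Carlo integral.
      import numpy as np
      from scipy.stats import qmc

      sampler = qmc.Sobol(d=3, scramble=False)   # points in [0,1]^3
      points = sampler.random_base2(m=10)        # 2**10 = 1024 points

      # Quasi-Monte Carlo estimate of the integral of f over the unit cube.
      f = lambda x: np.prod(np.sin(np.pi * x), axis=1)
      print(f"QMC estimate from {len(points)} points: {f(points).mean():.6f}")

    Because the points fill the cube more evenly than pseudo-random draws, the integration error typically shrinks nearly as 1/L (up to logarithmic factors) rather than as 1/sqrt(L).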

  11. Marble Algorithm: a solution to estimating ecological niches from presence-only records

    PubMed Central

    Qiao, Huijie; Lin, Congtian; Jiang, Zhigang; Ji, Liqiang

    2015-01-01

    We describe an algorithm that helps to predict potential distributional areas for species using presence-only records. The Marble Algorithm (MA) is a density-based clustering program based on Hutchinson’s concept of ecological niches as multidimensional hypervolumes in environmental space. The algorithm characterizes this niche space using the density-based spatial clustering of applications with noise (DBSCAN) algorithm. When MA is provided with a set of occurrence points in environmental space, the algorithm determines two parameters that allow the points to be grouped into several clusters. These clusters are used as reference sets describing the ecological niche, which can then be mapped onto geographic space and used as the potential distribution of the species. We used both virtual species and ten empirical datasets to compare MA with other distribution-modeling tools, including Bioclimate Analysis and Prediction System, Environmental Niche Factor Analysis, the Genetic Algorithm for Rule-set Production, Maximum Entropy Modeling, Artificial Neural Networks, Climate Space Models, Classification Tree Analysis, Generalised Additive Models, Generalised Boosted Models, Generalised Linear Models, Multivariate Adaptive Regression Splines and Random Forests. Results indicate that MA predicts potential distributional areas with high accuracy, moderate robustness, and above-average transferability on all datasets, particularly when dealing with small numbers of occurrences. PMID:26387771
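
    The clustering step at the algorithm's core can be sketched with scikit-learn's DBSCAN. The toy occurrence data and the fixed eps/min_samples below are illustrative assumptions; MA's contribution is precisely an automatic way of choosing those two parameters:

      # DBSCAN clustering of species occurrences in environmental space.
      import numpy as np
      from sklearn.cluster import DBSCAN

      rng = np.random.default_rng(0)
      # Toy occurrence records in a standardized 2-D environmental space
      # (e.g., temperature and precipitation axes); real data have more dimensions.
      occurrences = np.vstack([
          rng.normal([1.0, 0.5], 0.15, size=(40, 2)),
          rng.normal([-0.8, -0.3], 0.15, size=(30, 2)),
      ])

      labels = DBSCAN(eps=0.3, min_samples=5).fit_predict(occurrences)
      print("clusters:", sorted(set(labels) - {-1}),
            "| noise points:", int((labels == -1).sum()))
      # Each cluster is a reference set describing part of the niche; sites whose
      # environments fall near a cluster are mapped as potentially suitable.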

  12. Extremely Randomized Machine Learning Methods for Compound Activity Prediction.

    PubMed

    Czarnecki, Wojciech M; Podlewska, Sabina; Bojarski, Andrzej J

    2015-11-09

    Speed, relatively low requirements for computational resources, and high effectiveness in evaluating the bioactivity of compounds have caused a rapid growth of interest in the application of machine learning methods to virtual screening tasks. However, due to the growth of the amount of data in cheminformatics and related fields, the aim of research has shifted not only towards the development of algorithms of high predictive power but also towards the simplification of previously existing methods to obtain results more quickly. In this study, we tested two approaches belonging to the group of so-called 'extremely randomized methods', the Extreme Entropy Machine and Extremely Randomized Trees, for their ability to properly identify compounds that have activity towards particular protein targets. These methods were compared with their 'non-extreme' competitors, i.e., Support Vector Machine and Random Forest. The extreme approaches not only improved the efficiency of the classification of bioactive compounds but also proved to be less computationally complex, requiring fewer steps to perform an optimization procedure.
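
    A minimal sketch of the tree-based half of this comparison on synthetic stand-in data (the Extreme Entropy Machine has no standard scikit-learn implementation, so only Extremely Randomized Trees versus Random Forest is shown):

      # Extra-Trees versus Random Forest on a toy bioactivity-style dataset.
      from sklearn.datasets import make_classification
      from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      X, y = make_classification(n_samples=2000, n_features=100,
                                 n_informative=20, random_state=0)
      for Model in (RandomForestClassifier, ExtraTreesClassifier):
          clf = Model(n_estimators=200, random_state=0, n_jobs=-1)
          print(f"{Model.__name__}: accuracy = {cross_val_score(clf, X, y, cv=5).mean():.3f}")
      # Extra-Trees draws split thresholds at random instead of searching for the
      # best cut, which is what makes it cheaper per split than a random forest.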

  13. Electromagnetic wave extinction within a forested canopy

    NASA Technical Reports Server (NTRS)

    Karam, M. A.; Fung, A. K.

    1989-01-01

    A forested canopy is modeled by a collection of randomly oriented finite-length cylinders shaded by randomly oriented and distributed disk- or needle-shaped leaves. For a plane wave exciting the forested canopy, the extinction coefficient is formulated in terms of the extinction cross sections (ECSs) in the local frame of each forest component and the Eulerian angles of orientation (used to describe the orientation of each component). The ECSs in the local frame for the finite-length cylinders used to model the branches are obtained by using the forward-scattering theorem. ECSs in the local frame for the disk- and needle-shaped leaves are obtained by summation of the absorption and scattering cross sections. The behavior of the extinction coefficients with the incidence angle is investigated numerically for both deciduous and coniferous forests. The dependencies of the extinction coefficients on the orientation of the leaves are illustrated numerically.
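
    In symbols (notation mine, condensed from the abstract's wording): with n_j the number density of forest component j and angle brackets denoting the average over the distribution of Eulerian angles, the canopy extinction coefficient and the forward-scattering theorem used for the branch cylinders read

      \kappa_e = \sum_j n_j \left\langle \sigma_{ext}^{(j)} \right\rangle_{(\alpha,\beta,\gamma)},
      \qquad
      \sigma_{ext} = \frac{4\pi}{k}\,\operatorname{Im}\!\left[\hat{e}^{\,*}\cdot\mathbf{f}(\hat{k},\hat{k})\right]

    where f is the vector scattering amplitude in the forward direction, e-hat the incident polarization, and k the free-space wavenumber; for the leaves, sigma_ext is instead the sum of the absorption and scattering cross sections.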

  14. Forest fires

    SciTech Connect

    Fuller, M.

    1991-01-01

    This book examines the many complex and sensitive issues relating to wildland fires. Beginning with an overview of the fires of the 1980s, the book discusses the implications of continued drought and considers the behavior of wildland fires, from ignition and spread to spotting and firestorms. Topics include the effects of weather, forest fuels, fire ecology, and the effects of fire on plants and animals. In addition, the book examines firefighting methods and equipment, including new minimum impact techniques and compressed air foam; prescribed burning; and steps that can be taken to protect individuals and human structures. A history of forest fire policies in the U.S. and a discussion of solutions to fire problems around the world complete the coverage. With one percent of the earth's surface burning every year in the last decade, this is a penetrating book on a subject of undeniable importance.

  15. Biodiversity Mapping in a Tropical West African Forest with Airborne Hyperspectral Data

    PubMed Central

    Vaglio Laurin, Gaia; Chan, Jonathan Cheung-Wai; Chen, Qi; Lindsell, Jeremy A.; Coomes, David A.; Guerriero, Leila; Frate, Fabio Del; Miglietta, Franco; Valentini, Riccardo

    2014-01-01

    Tropical forests are major repositories of biodiversity, but are fast disappearing as land is converted to agriculture. Decision-makers need to know which of the remaining forests to prioritize for conservation, but the only spatial information on forest biodiversity has, until recently, come from a sparse network of ground-based plots. Here we explore whether airborne hyperspectral imagery can be used to predict the alpha diversity of upper canopy trees in a West African forest. The abundances of tree species were collected from 64 plots (each 1250 m2 in size) within a Sierra Leonean national park, and Shannon-Wiener biodiversity indices were calculated. An airborne spectrometer measured reflectances of 186 bands in the visible and near-infrared spectral range at 1 m2 resolution. The standard deviations of these reflectance values and their first-order derivatives were calculated for each plot from the c. 1250 pixels of hyperspectral information within them. Shannon-Wiener indices were then predicted from these plot-based reflectance statistics using a machine-learning algorithm (Random Forest). The regression model fitted the data well (pseudo-R2 = 84.9%), and we show that standard deviations of green-band reflectances and infra-red region derivatives had the strongest explanatory powers. Our work shows that airborne hyperspectral sensing can be very effective at mapping canopy tree diversity, because its high spatial resolution allows within-plot heterogeneity in reflectance to be characterized, making it an effective tool for monitoring forest biodiversity over large geographic scales. PMID:24937407
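
    The two computational steps can be sketched on toy data; the feature count and the abundance vectors below are placeholders, not the study's actual 186-band setup:

      # Shannon-Wiener index per plot, then Random Forest regression from
      # per-plot reflectance statistics.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      def shannon_wiener(abundances):
          p = np.asarray(abundances, dtype=float)
          p = p[p > 0] / p.sum()
          return -(p * np.log(p)).sum()

      rng = np.random.default_rng(1)
      n_plots, n_features = 64, 20            # features = std devs of bands/derivatives
      X = rng.random((n_plots, n_features))   # stand-in reflectance statistics
      H = np.array([shannon_wiener(rng.integers(1, 50, 15)) for _ in range(n_plots)])

      model = RandomForestRegressor(n_estimators=500, oob_score=True,
                                    random_state=0).fit(X, H)
      print(f"out-of-bag R^2 on toy data: {model.oob_score_:.2f}")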

  16. A hybrid color space for skin detection using genetic algorithm heuristic search and principal component analysis technique.

    PubMed

    Maktabdar Oghaz, Mahdi; Maarof, Mohd Aizaini; Zainal, Anazida; Rohani, Mohd Foad; Yaghoubyan, S Hadi

    2015-01-01

    Color is one of the most prominent features of an image and is used in many skin and face detection applications. Color space transformation is widely used by researchers to improve face and skin detection performance. Despite the substantial research efforts in this area, choosing a proper color space in terms of skin and face classification performance, one which can address issues like illumination variations, various camera characteristics and diversity in skin color tones, has remained an open issue. This research proposes a new three-dimensional hybrid color space termed SKN by employing the Genetic Algorithm heuristic and Principal Component Analysis to find the optimal representation of human skin color in over seventeen existing color spaces. The Genetic Algorithm heuristic is used to find the optimal color component combination setup in terms of skin detection accuracy, while the Principal Component Analysis projects the optimal Genetic Algorithm solution to a less complex dimension. Pixel-wise skin detection was used to evaluate the performance of the proposed color space. We employed four classifiers, including Random Forest, Naïve Bayes, Support Vector Machine and Multilayer Perceptron, in order to generate the human skin color predictive model. The proposed color space was compared to some existing color spaces and shows superior results in terms of pixel-wise skin detection accuracy. Experimental results show that by using the Random Forest classifier, the proposed SKN color space obtained an average F-score and True Positive Rate of 0.953 and a False Positive Rate of 0.0482, which outperformed the existing color spaces in terms of pixel-wise skin detection accuracy. The results also indicate that among the classifiers used in this study, Random Forest is the most suitable classifier for pixel-wise skin detection applications. PMID:26267377
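
    A sketch of the pipeline's shape only, on synthetic pixels; the random subset search below is a crude stand-in for the paper's Genetic Algorithm:

      # Pick color components, project to 3-D with PCA, score an RF pixel classifier.
      import numpy as np
      from sklearn.decomposition import PCA
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.model_selection import cross_val_score

      rng = np.random.default_rng(0)
      n_pixels, n_components = 3000, 17          # rows = pixels, cols = color components
      X_all = rng.random((n_pixels, n_components))
      y = (X_all[:, 0] + 0.3 * X_all[:, 5] > 0.8).astype(int)   # toy "skin" labels

      best_score, best_subset = 0.0, None
      for _ in range(10):                        # stand-in for GA generations
          subset = rng.choice(n_components, size=6, replace=False)
          X = PCA(n_components=3).fit_transform(X_all[:, subset])
          clf = RandomForestClassifier(n_estimators=100, random_state=0)
          score = cross_val_score(clf, X, y, cv=3, scoring="f1").mean()
          if score > best_score:
              best_score, best_subset = score, sorted(subset)
      print(f"best F1 {best_score:.3f} using components {best_subset}")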

  17. Random sequential adsorption on fractals.

    PubMed

    Ciesla, Michal; Barbasz, Jakub

    2012-07-28

    Irreversible adsorption of spheres on flat collectors having dimension d < 2 is studied. Molecules are adsorbed on Sierpinski's triangle and carpet-like fractals (1 < d < 2), and on a general Cantor set (d < 1). The adsorption process is modeled numerically using the random sequential adsorption (RSA) algorithm. The paper concentrates on the measurement of fundamental properties of coverages, i.e., the maximal random coverage ratio and the density autocorrelation function, as well as RSA kinetics. The results allow us to improve the phenomenological relation between the maximal random coverage ratio and the collector dimension. Moreover, simulations show that, in general, most of the known dimensional properties of adsorbed monolayers are valid for non-integer dimensions.
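
    The accept/reject kinetics of RSA are easy to sketch; for brevity this toy version uses a flat two-dimensional collector rather than a fractal one:

      # Random sequential adsorption of hard disks on a unit square.
      import numpy as np

      def rsa_disks(radius=0.05, attempts=20_000, seed=0):
          rng = np.random.default_rng(seed)
          centers = []
          for _ in range(attempts):
              p = rng.random(2)
              # Irreversible adsorption: accept only if no overlap with earlier disks.
              if all(np.hypot(*(p - q)) >= 2 * radius for q in centers):
                  centers.append(p)
          return np.array(centers)

      disks = rsa_disks()
      coverage = len(disks) * np.pi * 0.05**2
      # Coverage creeps toward the disk jamming limit (~0.547 on an infinite plane,
      # up to finite-size and boundary effects here).
      print(f"{len(disks)} disks adsorbed, coverage ratio ~ {coverage:.3f}")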

  18. Simulating California reservoir operation using the classification and regression-tree algorithm combined with a shuffled cross-validation scheme

    NASA Astrophysics Data System (ADS)

    Yang, Tiantian; Gao, Xiaogang; Sorooshian, Soroosh; Li, Xin

    2016-03-01

    The controlled outflows from a reservoir or dam are highly dependent on the decisions made by the reservoir operators, instead of a natural hydrological process. Differences exist between the natural upstream inflows to reservoirs and the controlled outflows from reservoirs that supply the downstream users. With the decision maker's awareness of changing climate, reservoir management requires adaptable means to incorporate more information into decision making, such as water delivery requirements, environmental constraints, and dry/wet conditions. In this paper, a robust reservoir outflow simulation model is presented, which incorporates one of the well-developed data-mining models (Classification and Regression Tree, CART) to predict the complicated human-controlled reservoir outflows and extract the reservoir operation patterns. A shuffled cross-validation approach is further implemented to improve CART's predictive performance. An application study of nine major reservoirs in California is carried out. Results produced by the enhanced CART, original CART, and random forest are compared with observation. The statistical measurements show that the enhanced CART and random forest outperform the CART control run in general, and the enhanced CART algorithm gives better predictive performance than random forest in simulating the peak flows. The results also show that the proposed model is able to consistently and reasonably predict the expert release decisions. Experiments indicate that the release operation in the Oroville Lake is significantly dominated by the SWP allocation amount and that reservoirs with low elevation are more sensitive to inflow amount than others.
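
    A minimal sketch of a CART regressor evaluated under a shuffled cross-validation scheme, with placeholder predictors standing in for inflow, storage and season:

      # CART outflow model with shuffled K-fold cross-validation.
      import numpy as np
      from sklearn.model_selection import KFold, cross_val_score
      from sklearn.tree import DecisionTreeRegressor

      rng = np.random.default_rng(0)
      n = 1500
      X = np.column_stack([
          rng.gamma(2.0, 50.0, n),      # inflow
          rng.random(n) * 100,          # storage level
          rng.integers(1, 13, n),       # month (operational season)
      ])
      y = 0.6 * X[:, 0] + 0.2 * X[:, 1] + rng.normal(0, 5, n)   # toy outflow

      cart = DecisionTreeRegressor(max_depth=8, random_state=0)
      cv = KFold(n_splits=10, shuffle=True, random_state=0)      # the shuffled scheme
      print("shuffled-CV R^2:", cross_val_score(cart, X, y, cv=cv).mean().round(3))

    Shuffling before the folds are cut mixes wet and dry periods into every fold, so each training split sees the full range of operating conditions.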

  1. Dispersal of forest insects

    NASA Technical Reports Server (NTRS)

    Mcmanus, M. L.

    1979-01-01

    Dispersal flights of selected species of forest insects which are associated with periodic outbreaks of pests that occur over large contiguous forested areas are discussed. Gypsy moths, spruce budworms, and forest tent caterpillars were studied for their massive migrations in forested areas. Results indicate that large dispersals into forested areas are due to the females, except in the case of the gypsy moth.

  2. Discriminant forest classification method and system

    DOEpatents

    Chen, Barry Y.; Hanley, William G.; Lemmond, Tracy D.; Hiller, Lawrence J.; Knapp, David A.; Mugge, Marshall J.

    2012-11-06

    A hybrid machine learning methodology and system for classification that combines classical random forest (RF) methodology with discriminant analysis (DA) techniques to provide enhanced classification capability. A DA technique, which uses feature measurements of an object to predict its class membership, such as linear discriminant analysis (LDA) or the Anderson-Bahadur linear discriminant technique (AB), is used to split the data at each node in each of its classification trees to train and grow the trees and the forest. When training is finished, a set of n DA-based decision trees of a discriminant forest is produced for use in predicting the classification of new samples of unknown class.
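
    A rough sketch of the idea using depth-one trees, where each "split" is a linear discriminant fit on a bootstrap sample and a random feature subset; the patented method grows full trees with a DA split at every node, which this compresses to one level for brevity:

      # An ensemble of LDA "stumps" as a toy discriminant forest.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

      X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
      rng = np.random.default_rng(0)
      stumps = []
      for _ in range(50):
          rows = rng.integers(0, len(X), len(X))                  # bootstrap sample
          cols = rng.choice(X.shape[1], size=5, replace=False)    # random feature subset
          lda = LinearDiscriminantAnalysis().fit(X[rows][:, cols], y[rows])
          stumps.append((cols, lda))

      votes = np.mean([lda.predict(X[:, cols]) for cols, lda in stumps], axis=0)
      print(f"training accuracy of the LDA ensemble: {((votes > 0.5) == y).mean():.3f}")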

  3. Random Vibrations

    NASA Technical Reports Server (NTRS)

    Messaro, Semma; Harrison, Phillip

    2010-01-01

    Ares I zonal random vibration environments due to acoustic impingement and combustion processes are developed for liftoff, ascent, and reentry. Random vibration test criteria for Ares I Upper Stage pyrotechnic components are developed by enveloping the applicable zonal environments where each component is located. Random vibration tests will be conducted to assure that these components will survive and function appropriately after exposure to the expected vibration environments. Methodology: Random vibration test criteria for Ares I Upper Stage pyrotechnic components were desired that would envelope all the applicable environments where each component was located. Applicable Ares I vehicle drawings and design information needed to be assessed to determine the location(s) for each component on the Ares I Upper Stage. Design and test criteria needed to be developed by plotting and enveloping the applicable environments using Microsoft Excel spreadsheet software and documenting them in a report using Microsoft Word processing software. Conclusion: Random vibration liftoff, ascent, and green run design and test criteria for the Upper Stage pyrotechnic components were developed by using Microsoft Excel to envelope zonal environments applicable to each component. Results were transferred from Excel into a report using Microsoft Word. After the report is reviewed and edited by my mentor, it will be submitted for publication as an attachment to a memorandum. Pyrotechnic component designers will extract criteria from my report for incorporation into the design and test specifications for components. Eventually the hardware will be tested to the environments I developed to assure that the components will survive and function appropriately after exposure to the expected vibration environments.
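
    The enveloping step reduces to a pointwise maximum across the zonal curves; a numpy sketch with illustrative numbers, not actual Ares I environments:

      # Envelope zonal random-vibration PSD curves for one component.
      import numpy as np

      freqs = np.array([20., 50., 100., 500., 1000., 2000.])   # Hz (illustrative grid)
      zone_psd = np.array([                                    # g^2/Hz for each zone
          [0.01, 0.04, 0.08, 0.08, 0.04, 0.01],
          [0.02, 0.03, 0.10, 0.06, 0.05, 0.02],
          [0.01, 0.05, 0.06, 0.09, 0.03, 0.01],
      ])
      envelope = zone_psd.max(axis=0)    # test criterion enveloping all zones
      for f, g in zip(freqs, envelope):
          print(f"{f:7.0f} Hz   {g:.2f} g^2/Hz")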

  4. Effects of Forest Disturbances on Forest Structural Parameters Retrieval from Lidar Waveform Data

    NASA Technical Reports Server (NTRS)

    Ranson, K. Jon; Sun, G.

    2011-01-01

    The effect of forest disturbance on the lidar waveform and the forest biomass estimation was demonstrated by model simulation. The results show that the correlation between stand biomass and the lidar waveform indices changes when the stand spatial structure changes due to disturbances rather than natural succession. This has to be considered in developing algorithms for regional or global mapping of biomass from lidar waveform data.

  5. Alerts of forest disturbance from MODIS imagery

    NASA Astrophysics Data System (ADS)

    Hammer, Dan; Kraft, Robin; Wheeler, David

    2014-12-01

    This paper reports the methodology and computational strategy for a forest cover disturbance alerting system. Analytical techniques from time series econometrics are applied to imagery from the Moderate Resolution Imaging Spectroradiometer (MODIS) sensor to detect temporal instability in vegetation indices. The characteristics from each MODIS pixel's spectral history are extracted and compared against historical data on forest cover loss to develop a geographically localized classification rule that can be applied across the humid tropical biome. The final output is a probability of forest disturbance for each 500 m pixel that is updated every 16 days. The primary objective is to provide high-confidence alerts of forest disturbance, while minimizing false positives. We find that the alerts serve this purpose exceedingly well in Pará, Brazil, with high probability alerts garnering a user accuracy of 98 percent over the training period and 93 percent after the training period (2000-2005) when compared against the PRODES deforestation data set, which is used to assess spatial accuracy. Implemented in Clojure and Java on the Hadoop distributed data processing platform, the algorithm is a fast, automated, and open source system for detecting forest disturbance. It is intended to be used in conjunction with higher-resolution imagery and data products that cannot be updated as quickly as MODIS-based data products. By highlighting hotspots of change, the algorithm and associated output can focus high-resolution data acquisition and aid in efforts to enforce local forest conservation efforts.

  6. MODIS Based Estimation of Forest Aboveground Biomass in China

    PubMed Central

    Sun, Yan; Wang, Tao; Zeng, Zhenzhong; Piao, Shilong

    2015-01-01

    Accurate estimation of forest biomass C stock is essential to understand carbon cycles. However, current estimates of Chinese forest biomass are mostly based on inventory-based timber volumes and empirical conversion factors at the provincial scale, which could introduce large uncertainties in forest biomass estimation. Here we provide a data-driven estimate of Chinese forest aboveground biomass from 2001 to 2013 at a spatial resolution of 1 km by integrating a recently reviewed plot-level ground-measured forest aboveground biomass database with geospatial information from 1-km Moderate-Resolution Imaging Spectroradiometer (MODIS) dataset in a machine learning algorithm (the model tree ensemble, MTE). We show that Chinese forest aboveground biomass is 8.56 Pg C, which is mainly contributed by evergreen needle-leaf forests and deciduous broadleaf forests. The mean forest aboveground biomass density is 56.1 Mg C ha−1, with high values observed in temperate humid regions. The responses of forest aboveground biomass density to mean annual temperature are closely tied to water conditions; that is, negative responses dominate regions with mean annual precipitation less than 1300 mm y−1 and positive responses prevail in regions with mean annual precipitation higher than 2800 mm y−1. During the 2000s, the forests in China sequestered C by 61.9 Tg C y−1, and this C sink is mainly distributed in north China and may be attributed to warming climate, rising CO2 concentration, N deposition, and growth of young forests. PMID:26115195

  7. MODIS Based Estimation of Forest Aboveground Biomass in China.

    PubMed

    Yin, Guodong; Zhang, Yuan; Sun, Yan; Wang, Tao; Zeng, Zhenzhong; Piao, Shilong

    2015-01-01

    Accurate estimation of forest biomass C stock is essential to understand carbon cycles. However, current estimates of Chinese forest biomass are mostly based on inventory-based timber volumes and empirical conversion factors at the provincial scale, which could introduce large uncertainties in forest biomass estimation. Here we provide a data-driven estimate of Chinese forest aboveground biomass from 2001 to 2013 at a spatial resolution of 1 km by integrating a recently reviewed plot-level ground-measured forest aboveground biomass database with geospatial information from 1-km Moderate-Resolution Imaging Spectroradiometer (MODIS) dataset in a machine learning algorithm (the model tree ensemble, MTE). We show that Chinese forest aboveground biomass is 8.56 Pg C, which is mainly contributed by evergreen needle-leaf forests and deciduous broadleaf forests. The mean forest aboveground biomass density is 56.1 Mg C ha-1, with high values observed in temperate humid regions. The responses of forest aboveground biomass density to mean annual temperature are closely tied to water conditions; that is, negative responses dominate regions with mean annual precipitation less than 1300 mm y-1 and positive responses prevail in regions with mean annual precipitation higher than 2800 mm y-1. During the 2000s, the forests in China sequestered C by 61.9 Tg C y-1, and this C sink is mainly distributed in north China and may be attributed to warming climate, rising CO2 concentration, N deposition, and growth of young forests. PMID:26115195

  8. Forests through the Eye of a Satellite: Understanding regional forest-cover dynamics using Landsat Imagery

    NASA Astrophysics Data System (ADS)

    Baumann, Matthias

    Forests are changing at an alarming pace worldwide. Forests are an important provider of ecosystem services that contribute to human wellbeing, including the provision of timber and non-timber products, habitat for biodiversity, and recreation amenities. Most prominently, forests serve as a sink for atmospheric carbon dioxide that ultimately helps to mitigate changes in the global climate. It is thus important to understand where, how, and why forests change worldwide. My dissertation provides answers to these questions. The overarching goal of my dissertation is to improve our understanding of regional forest-cover dynamics by analyzing Landsat satellite imagery. I answer where forests change following drastic socio-economic shocks by using the breakdown of the Soviet Union as a natural experiment. My dissertation provides innovative algorithms to answer why forests change, whether because of human activities or because of natural events such as storms. Finally, I show how dynamic forests are within a single year by providing ways to characterize green-leaf phenology from satellite imagery. With my findings I contribute directly to a better understanding of the processes on the Earth's surface, and I highlight the importance of satellite imagery for learning about regional and local forest-cover dynamics.

  9. Applying various algorithms for species distribution modelling.

    PubMed

    Li, Xinhai; Wang, Yuan

    2013-06-01

    Species distribution models have been used extensively in many fields, including climate change biology, landscape ecology and conservation biology. In the past 3 decades, a number of new models have been proposed, yet researchers still find it difficult to select appropriate models for data and objectives. In this review, we aim to provide insight into the prevailing species distribution models for newcomers in the field of modelling. We compared 11 popular models, including regression models (the generalized linear model, the generalized additive model, the multivariate adaptive regression splines model and hierarchical modelling), classification models (mixture discriminant analysis, the generalized boosting model, and classification and regression tree analysis) and complex models (artificial neural network, random forest, genetic algorithm for rule set production and maximum entropy approaches). Our objectives are: (i) to compare the strengths, weaknesses and characteristics of the models and identify the situations for which each is suitable (in terms of data type and species-environment relationships) and (ii) to provide guidelines for model application, comprising 3 steps: model selection, model formulation and parameter estimation. PMID:23731809

  10. Montana's forest resources. Forest Service resource bulletin

    SciTech Connect

    Conner, R.C.; O'Brien, R.A.

    1993-09-01

    The report includes highlights of the forest resource in Montana as of 1989. The study also describes the extent, condition, and location of the State's forests, with particular emphasis on timberland. It includes statistical tables covering area by land class, ownership, and forest type; growing-stock and sawtimber volumes; and growth, mortality, and removals for timberland.

  11. Returning forests analyzed with the forest identity

    PubMed Central

    Kauppi, Pekka E.; Ausubel, Jesse H.; Fang, Jingyun; Mather, Alexander S.; Sedjo, Roger A.; Waggoner, Paul E.

    2006-01-01

    Amid widespread reports of deforestation, some nations have nevertheless experienced transitions from deforestation to reforestation. In a causal relationship, the Forest Identity relates the carbon sequestered in forests to the changing variables of national or regional forest area, growing stock density per area, biomass per growing stock volume, and carbon concentration in the biomass. It quantifies the sources of change of a nation's forests. The Identity also logically relates the quantitative impact on forest expanse of shifting timber harvest to regions and plantations where density grows faster. Among 50 nations with extensive forests reported in the Food and Agriculture Organization's comprehensive Global Forest Resources Assessment 2005, no nation where annual per capita gross domestic product exceeded $4,600 had a negative rate of growing stock change. Using the Forest Identity and national data from the Assessment report, a single synoptic chart arrays the 50 nations with coordinates of the rates of change of basic variables, reveals both clusters of nations and outliers, and suggests trends in returning forests and their attributes. The Forest Identity also could serve as a tool for setting forest goals and illuminating how national policies accelerate or retard the forest transitions that are diffusing among nations. PMID:17101996
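
    In symbols (notation mine, following the abstract's list of variables): with A the forest area, D the growing stock density per area, B the biomass per growing stock volume, and W the carbon concentration of the biomass, the sequestered carbon M obeys

      M = A \cdot D \cdot B \cdot W,
      \qquad
      \frac{\dot M}{M} = \frac{\dot A}{A} + \frac{\dot D}{D} + \frac{\dot B}{B} + \frac{\dot W}{W}

    Taking logarithms turns the product into a sum, so national rates of change add directly; that additivity is what lets a single chart of rate coordinates attribute a country's forest transition to its separate drivers.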

  12. The MCNP5 Random number generator

    SciTech Connect

    Brown, F. B.; Nagaya, Y.

    2002-01-01

    MCNP and other Monte Carlo particle transport codes use random number generators to produce random variates from a uniform distribution on the unit interval. These random variates are then used in subsequent sampling from probability distributions to simulate the physical behavior of particles during the transport process. This paper describes the new random number generator developed for MCNP Version 5. The new generator will optionally preserve the exact random sequence of previous versions and is entirely conformant to the Fortran-90 standard, hence completely portable. In addition, skip-ahead algorithms have been implemented to efficiently initialize the generator for new histories, a capability that greatly simplifies parallel algorithms. Further, the precision of the generator has been increased, extending the period by a factor of 10^5. Finally, the new generator has been subjected to 3 different sets of rigorous and extensive statistical tests to verify that it produces a sufficiently random sequence.
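
    The skip-ahead idea is easy to sketch for a generic linear congruential generator: k steps of x -> (g*x + c) mod 2^m compose into a single affine map that binary exponentiation builds in O(log k). The constants below are illustrative, not MCNP5's actual generator parameters:

      # Jump an LCG ahead by k steps in O(log k) instead of O(k).
      M = 1 << 63
      g, c = 3512401965023503517, 5      # illustrative 63-bit LCG constants

      def lcg_skip(x, k, a=g, b=c, m=M):
          """Return the state k steps ahead of x for the map x -> (a*x + b) mod m."""
          A, B = 1, 0                    # accumulated map: x -> A*x + B
          while k:
              if k & 1:
                  A, B = (a * A) % m, (a * B + b) % m   # fold current stride in
              a, b = (a * a) % m, (a * b + b) % m       # double the stride
              k >>= 1
          return (A * x + B) % m

      # Sanity check against stepping one state at a time.
      x = 12345
      for _ in range(1000):
          x = (g * x + c) % M
      assert lcg_skip(12345, 1000) == x

    Each Monte Carlo history can therefore jump straight to its own stretch of the sequence, which is the capability the abstract credits with simplifying parallel runs.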

  13. Comparison of machine learning algorithms for their applicability in satellite-based optical rainfall retrievals

    NASA Astrophysics Data System (ADS)

    Meyer, Hanna; Kühnlein, Meike; Appelhans, Tim; Nauss, Thomas

    2015-04-01

    Machine learning (ML) algorithms have been successfully evaluated as valuable tools in satellite-based rainfall retrievals, which shows the high potential of ML algorithms when faced with high-dimensional and complex data. Moreover, the recent developments in parallel computing with ML offer new possibilities in terms of training and prediction speed and therefore make their usage in real-time systems feasible. The present study compares four ML algorithms for rainfall area detection and rainfall rate assignment during daytime, night-time and twilight using MSG SEVIRI data over Germany. Satellite-based proxies for cloud top height, cloud top temperature, cloud phase and cloud water path are applied as predictor variables. As machine learning algorithms, random forests (RF), neural networks (NNET), averaged neural networks (AVNNET) and support vector machines (SVM) are chosen. The comparison is realised in three steps. First, an extensive tuning study is carried out to customise each of the models. Secondly, the models are trained using the optimum values of the model parameters found in the tuning study. Finally, the trained models are used to detect rainfall areas and to assign rainfall rates using an independent validation dataset, which is compared against ground-based radar data. To train and validate the models, the radar-based RADOLAN RW product from the German Weather Service (DWD) is used, which provides area-wide gauge-adjusted hourly precipitation information. Though the differences in the performance of the algorithms were rather small, NNET and AVNNET have been identified as the most suitable algorithms. On average, they showed the best performance in rainfall area delineation as well as in rainfall rate assignment. The fast computation time of NNET allows working with large datasets, as is required in remote sensing based rainfall retrievals. However, since none of the algorithms performed considerably better than the others, we conclude that further research into providing suitable predictors for rainfall is of greater necessity than an optimization through the choice of the ML algorithm.

  14. Comparison of four machine learning algorithms for their applicability in satellite-based optical rainfall retrievals

    NASA Astrophysics Data System (ADS)

    Meyer, Hanna; Kühnlein, Meike; Appelhans, Tim; Nauss, Thomas

    2016-03-01

    Machine learning (ML) algorithms have successfully been demonstrated to be valuable tools in satellite-based rainfall retrievals which show the practicability of using ML algorithms when faced with high dimensional and complex data. Moreover, recent developments in parallel computing with ML present new possibilities for training and prediction speed and therefore make their usage in real-time systems feasible. This study compares four ML algorithms - random forests (RF), neural networks (NNET), averaged neural networks (AVNNET) and support vector machines (SVM) - for rainfall area detection and rainfall rate assignment using MSG SEVIRI data over Germany. Satellite-based proxies for cloud top height, cloud top temperature, cloud phase and cloud water path serve as predictor variables. The results indicate an overestimation of rainfall area delineation regardless of the ML algorithm (averaged bias = 1.8) but a high probability of detection ranging from 81% (SVM) to 85% (NNET). On a 24-hour basis, the performance of the rainfall rate assignment yielded R2 values between 0.39 (SVM) and 0.44 (AVNNET). Though the differences in the algorithms' performance were rather small, NNET and AVNNET were identified as the most suitable algorithms. On average, they demonstrated the best performance in rainfall area delineation as well as in rainfall rate assignment. NNET's computational speed is an additional advantage in work with large datasets such as in remote sensing based rainfall retrievals. However, since no single algorithm performed considerably better than the others we conclude that further research in providing suitable predictors for rainfall is of greater necessity than an optimization through the choice of the ML algorithm.
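
    The comparison's overall shape in scikit-learn terms, on synthetic stand-in data (the study itself used MSG SEVIRI proxies as predictors, an extensive tuning step, and radar-based RADOLAN references):

      # Train several regressors on the same predictors and compare on held-out data.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor
      from sklearn.metrics import r2_score
      from sklearn.model_selection import train_test_split
      from sklearn.neural_network import MLPRegressor
      from sklearn.svm import SVR

      rng = np.random.default_rng(0)
      X = rng.random((4000, 4))          # stand-ins: cloud height/temp/phase/water path
      y = 5 * X[:, 0] - 3 * X[:, 1] + rng.normal(0, 0.5, 4000)   # toy rain rate
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      models = {
          "RF":   RandomForestRegressor(n_estimators=300, random_state=0),
          "NNET": MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0),
          "SVM":  SVR(C=10.0),
      }
      for name, model in models.items():
          model.fit(X_tr, y_tr)
          print(f"{name}: R^2 = {r2_score(y_te, model.predict(X_te)):.3f}")
      # AVNNET corresponds to averaging several MLPs trained from different seeds.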

  15. [Automated mapping of urban forests' disturbance and recovery in Nanjing, China].

    PubMed

    Lyu, Ying-ying; Zhuang, Yi-lin; Ren, Xin-yu; Li, Ming-shi; Xu, Wang-gu; Wang, Zhi

    2016-02-01

    Using dense Landsat TM/ETM time series observations spanning 1987 to 2011, and taking Laoshan forest farm and Purple Mountain as the research objects, the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) algorithm was used to generate surface reflectance datasets, which were fed to the Vegetation Change Tracker (VCT) model to derive urban forest disturbance and recovery products over Nanjing, followed by an intensive validation of the products. The results showed a relatively high spatial agreement for the forest disturbance products mapped by VCT, ranging from 65.4% to 95.0%. Forest disturbance and recovery rates fluctuated apparently over time; the trends of forest disturbance at the two sites were roughly similar, but forest recovery differed markedly. Forest coverage in Purple Mountain was less than that in Laoshan forest farm, but the forest disturbance and recovery rates in Laoshan forest farm were larger than those in Purple Mountain. PMID:27396114

  16. Fractional randomness

    NASA Astrophysics Data System (ADS)

    Tapiero, Charles S.; Vallois, Pierre

    2016-11-01

    The premise of this paper is that a fractional probability distribution is based on fractional operators and the fractional (Hurst) index used, which alter the classical setting of random variables. For example, a random variable defined by its density function might not have a fractional density function defined in the conventional sense. Practically, this implies that a distribution's granularity, defined by a fractional kernel, may have properties that differ according to the fractional index used and the fractional calculus applied to define it. The purpose of this paper is to consider an application of fractional calculus to define the fractional density function of a random variable. In addition, we provide and prove a number of results, defining the functional forms of these distributions as well as their existence. In particular, we define fractional probability distributions for increasing and decreasing functions that are right continuous. Examples are used to motivate the usefulness of a statistical approach to fractional calculus and its application to economic and financial problems. In conclusion, this paper is a preliminary attempt to construct statistical fractional models; given the breadth and extent of such problems, it should be considered an initial step toward doing so.

  17. Random bits, true and unbiased, from atmospheric turbulence.

    PubMed

    Marangon, Davide G; Vallone, Giuseppe; Villoresi, Paolo

    2014-06-30

    Random numbers are a fundamental ingredient for secure communications and numerical simulation, as well as for games and, more generally, for information science. Physical processes with intrinsic unpredictability may be exploited to generate genuine random numbers. Optical propagation through strong atmospheric turbulence is exploited here for this purpose, by observing a laser beam after a 143 km free-space path. In addition, we developed an algorithm to extract the randomness of the beam images at the receiver without post-processing. The numbers passed very selective randomness tests for qualification as genuine random numbers. The extracting algorithm can be easily generalized to random images generated by different physical processes.
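
    One of the standard checks such bit streams must pass is the NIST SP 800-22 monobit (frequency) test; a minimal sketch, fed here by Python's own PRNG as a stand-in for the turbulence-derived bits:

      # Monobit test: the normalized excess of ones should look standard normal.
      import math
      import random

      def monobit_p_value(bits):
          """NIST SP 800-22 frequency test; p >= 0.01 passes at the usual level."""
          s = sum(1 if b else -1 for b in bits)
          s_obs = abs(s) / math.sqrt(len(bits))
          return math.erfc(s_obs / math.sqrt(2))

      bits = [random.getrandbits(1) for _ in range(100_000)]   # stand-in bit stream
      print(f"monobit p-value: {monobit_p_value(bits):.3f}")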

  1. A simple and practical control of the authenticity of organic sugarcane samples based on the use of machine-learning algorithms and trace elements determination by inductively coupled plasma mass spectrometry.

    PubMed

    Barbosa, Rommel M; Batista, Bruno L; Barião, Camila V; Varrique, Renan M; Coelho, Vinicius A; Campiglia, Andres D; Barbosa, Fernando

    2015-10-01

    A practical and easy control of the authenticity of organic sugarcane samples, based on the use of machine-learning algorithms and trace element determination by inductively coupled plasma mass spectrometry, is proposed. Reference ranges for 32 chemical elements in 22 samples of sugarcane (13 organic and 9 non-organic) were established, and then two algorithms, Naive Bayes (NB) and Random Forest (RF), were evaluated to classify the samples. Accurate results (>90%) were obtained when using all variables (i.e., 32 elements). However, accuracy improved (95.4% for NB) when only eight minerals (Rb, U, Al, Sr, Dy, Nb, Ta, Mo), chosen by a feature selection algorithm, were employed. Thus, a fingerprint based on trace element levels, combined with machine learning classification algorithms, may serve as a simple alternative for the authenticity evaluation of organic sugarcane samples.
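
    The workflow's shape in scikit-learn terms, on toy concentrations; the ANOVA-based selector here is a generic stand-in for the paper's feature selection algorithm:

      # Organic vs. non-organic classification from element concentrations,
      # selecting 8 of 32 elements before fitting Naive Bayes.
      import numpy as np
      from sklearn.feature_selection import SelectKBest, f_classif
      from sklearn.model_selection import cross_val_score
      from sklearn.naive_bayes import GaussianNB
      from sklearn.pipeline import make_pipeline

      rng = np.random.default_rng(0)
      n_samples, n_elements = 22, 32                     # the study's dimensions
      X = rng.lognormal(size=(n_samples, n_elements))    # toy ICP-MS concentrations
      y = np.array([1] * 13 + [0] * 9)                   # 13 organic, 9 non-organic

      clf = make_pipeline(SelectKBest(f_classif, k=8), GaussianNB())
      print(f"cross-validated accuracy: {cross_val_score(clf, X, y, cv=5).mean():.2f}")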

  2. Forest Health Detectives

    ERIC Educational Resources Information Center

    Bal, Tara L.

    2014-01-01

    "Forest health" is an important concept often not covered in tree, forest, insect, or fungal ecology and biology. With minimal, inexpensive equipment, students can investigate and conduct their own forest health survey to assess the percentage of trees with natural or artificial wounds or stress. Insects and diseases in the forest are…

  3. Aboveground Biomass and Dynamics of Forest Attributes using LiDAR Data and Vegetation Model

    NASA Astrophysics Data System (ADS)

    V V L, P. A.

    2015-12-01

    In recent years, biomass estimation for tropical forests has received much attention because regional biomass is considered a critical input to climate change studies. Biomass largely determines the potential carbon emission that could be released to the atmosphere due to deforestation or conversion to non-forest land use. Thus, accurate biomass estimation is necessary for a better understanding of deforestation impacts on global warming and environmental degradation. In this context, including forest stand height in biomass estimation plays a major role in reducing the uncertainty of the estimates. Improved accuracy in biomass estimation will also help in meeting the MRV objectives of REDD+. Along with precise estimates of biomass, it is also important to emphasize the role of vegetation models, which will most likely become an important tool for assessing the effects of climate change on potential vegetation dynamics and terrestrial carbon storage and for managing terrestrial ecosystem sustainability. Remote sensing is an efficient way to estimate forest parameters over large areas, especially at regional scale where field data are limited. LIDAR (Light Detection And Ranging) provides accurate information on the vertical structure of forests. We estimated average tree canopy heights and AGB from GLAS waveform parameters by using a multiple linear regression model in the forested area of Madhya Pradesh (308,245 km2), India. The heights derived from ICESat-GLAS were correlated with field-measured tree canopy heights for 60 plots. Results show a significant correlation of R2 = 74% for top canopy heights and R2 = 57% for stand biomass. A total biomass estimate of 320.17 Mt and canopy height maps were generated using the random forest algorithm. These canopy height and biomass maps were used in vegetation models to predict changes in the biophysical/physiological characteristics of the forest under a changing climate. In our study we have

  4. Modeling long-term suspended-sediment export from an undisturbed forest catchment

    NASA Astrophysics Data System (ADS)

    Zimmermann, Alexander; Francke, Till; Elsenbeer, Helmut

    2013-04-01

    Most estimates of suspended sediment yields from humid, undisturbed, and geologically stable forest environments fall within a range of 5 - 30 t km-2 a-1. These low natural erosion rates in small headwater catchments (≤ 1 km2) support the common impression that a well-developed forest cover prevents surface erosion. Interestingly, those estimates originate exclusively from areas with prevailing vertical hydrological flow paths. Forest environments dominated by (near-) surface flow paths (overland flow, pipe flow, and return flow) and a fast response to rainfall, however, are not an exceptional phenomenon, yet only very few sediment yields have been estimated for these areas. Not surprisingly, even fewer long-term (≥ 10 years) records exist. In this contribution we present our latest research which aims at quantifying long-term suspended-sediment export from an undisturbed rainforest catchment prone to frequent overland flow. A key aspect of our approach is the application of machine-learning techniques (Random Forest, Quantile Regression Forest) which allows not only the handling of non-Gaussian data, non-linear relations between predictors and response, and correlations between predictors, but also the assessment of prediction uncertainty. For the current study we provided the machine-learning algorithms exclusively with information from a high-resolution rainfall time series to reconstruct discharge and suspended sediment dynamics for a 21-year period. The significance of our results is threefold. First, our estimates clearly show that forest cover does not necessarily prevent erosion if wet antecedent conditions and large rainfalls coincide. During these situations, overland flow is widespread and sediment fluxes increase in a non-linear fashion due to the mobilization of new sediment sources. Second, our estimates indicate that annual suspended sediment yields of the undisturbed forest catchment show large fluctuations. Depending on the frequency of large
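
    Prediction intervals from a forest can be sketched in the spirit of Quantile Regression Forests by taking percentiles over per-tree predictions; QRF proper keeps the full conditional distribution of training targets in each leaf, and the data below are synthetic:

      # Approximate quantile intervals from an ordinary random forest.
      import numpy as np
      from sklearn.ensemble import RandomForestRegressor

      rng = np.random.default_rng(0)
      X = rng.random((2000, 1)) * 10                     # stand-in rainfall predictor
      y = 2.0 * X[:, 0] ** 1.5 + rng.normal(0, X[:, 0])  # heteroscedastic "sediment flux"

      forest = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
      X_new = np.array([[2.0], [8.0]])
      per_tree = np.stack([t.predict(X_new) for t in forest.estimators_])
      lo, hi = np.percentile(per_tree, [5, 95], axis=0)
      for x, a, b in zip(X_new[:, 0], lo, hi):
          print(f"x = {x:.0f}: rough 90% interval [{a:.1f}, {b:.1f}]")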

  5. Fast generation of sparse random kernel graphs

    SciTech Connect

    Hagberg, Aric; Lemons, Nathan; Du, Wen -Bo

    2015-09-10

    The development of kernel-based inhomogeneous random graphs has provided models that are flexible enough to capture many observed characteristics of real networks, and that are also mathematically tractable. We specify a class of inhomogeneous random graph models, called random kernel graphs, that produces sparse graphs with tunable graph properties, and we develop an efficient generation algorithm to sample random instances from this model. As real-world networks are usually large, it is essential that the run-time of generation algorithms scales better than quadratically in the number of vertices n. We show that for many practical kernels our algorithm runs in time at most O(n(log n)²). As an example, we show how to generate samples of power-law degree distribution graphs with tunable assortativity.
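
    The model being sampled can be stated in a few lines; the direct O(n^2) version below is a reference implementation of the model only, not the paper's fast algorithm:

      # Random kernel graph: edge {i, j} appears with prob. min(kappa(x_i, x_j)/n, 1).
      import numpy as np

      def kernel_graph(n, kappa, seed=0):
          rng = np.random.default_rng(seed)
          x = 1.0 - rng.random(n)            # vertex weights in (0, 1]
          edges = []
          for i in range(n):
              for j in range(i + 1, n):
                  if rng.random() < min(kappa(x[i], x[j]) / n, 1.0):
                      edges.append((i, j))
          return edges

      # A product kernel of this form yields heavy-tailed degrees on toy scales.
      edges = kernel_graph(500, lambda u, v: 2.0 / np.sqrt(u * v))
      print(f"{len(edges)} edges; mean degree {2 * len(edges) / 500:.2f}")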

  6. QUANTIFYING FOREST ABOVEGROUND CARBON POOLS AND FLUXES USING MULTI-TEMPORAL L